Travis CI linux-aarch64, linux-ppc64le jobs failing #185
Comments
I have been in touch with Travis support via email but no resolution yet. |
I've seen the issue on the gtest feedstock as well, independently of R. In any case, I'm not 100% sure this qualifies as "major". According to the status page we build around 100-150x more on Azure than on Travis, so <0.5% of our builds are affected, and it's possible (at least in principle) to switch them to Azure (either emulated or cross-compiled). I know this is splitting hairs a bit, so no need to change anything per se (I was thinking along the lines of avoiding a "boy who cried wolf" situation where people eventually don't take our status seriously, but one time isn't going to do that). That said, thanks a lot for trying to get to the bottom of this @mfansler! 🙏
|
Was debating between "degraded" and "major outage". OK with using "degraded" instead. That said, this appears to be affecting all(?) native |
Mervin, have you heard anything from Travis CI? FWIW it seems Travis users outside conda-forge have the same issue, so it is not just us. |
No word since when I created this. I just sent a ping to see if they have any updates. |
... conda-forge/conda-forge.github.io#1521 ... 🙄 |
It's been more than a week. Any affected feedstocks should consider either of the following changes in
|
Do you know of any example PR where a recipe was moved to using cross-compilation for |
You mean for R or in general? |
xref: conda/conda-build#5349 (just linking here since I tried to move a package out of PPC64le and hit this) |
That should be a very rare case though. Cross-compilation and |
In general. The actual feedstock where I'm hitting this is a |
Don't know how much of this will be helpful for other contexts, but here's an example for conversion to cross-compilation on an R feedstock: conda-forge/r-phylobase-feedstock#10 Our recipe (
For build_platform:
  linux_ppc64le: linux_64
test: native_and_emulated
NB: I usually switch
It is not infrequent that we also need to patch the source's build scripts. Since CRAN builds everything natively, our upstreams do not always consider cross-compilation, e.g., they use autoconf scripts that include run tests. Often it is easiest to simply skip such configure scripts and directly provide pre-determined compilation flags. |
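For reference, the two workarounds discussed in this thread (cross-compilation, or emulation on Azure) can be sketched as a conda-forge.yml fragment. This is a minimal sketch, not copied from any particular feedstock: the `build_platform` and `test` entries mirror the snippet quoted above, extended to `linux_aarch64`, while the commented-out `provider` mapping for emulated Azure builds is an assumption based on the conda-forge.yml documentation. Pick one option, then rerender the feedstock.

```yaml
# conda-forge.yml (illustrative sketch; choose ONE of the two options)

# Option A: cross-compile on linux_64 runners instead of building
# natively on Travis CI.
build_platform:
  linux_aarch64: linux_64
  linux_ppc64le: linux_64
# Controls on which platforms the recipe's tests are run; this value
# is the one used in the example quoted in this thread.
test: native_and_emulated

# Option B (assumption, per conda-forge.yml docs): keep the recipe
# unchanged but build under emulation on Azure. Slower than Option A,
# but avoids patching build scripts for cross-compilation.
# provider:
#   linux_aarch64: azure
#   linux_ppc64le: azure
```

Option A is usually faster but may need recipe changes. Option B trades build time for fewer changes, which matches the cases mentioned in this thread where the build must load or run what it has just built.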
Just want to clarify the explicit combinations here:
|
I can't find any real competitors to Travis for IBM architectures. But I did find that OSU's Open Source Lab hosts (IBM sponsored) Jenkins instances for ppc and s390x for open source. I'm guessing they are not really prepared to handle conda-forge's scale, but it might be worth a contact in any case. |
Thanks Min. We had access to those for a long time now. Agreed they are not really for our scale. |
Has there been any word from Travis CI on this issue? |
Nothing through my email. I am also unable to view the ticket they created (it always asks for "Sign-in" and then dumps me on the Dashboard). Maybe someone from Core should take over. |
Thanks Mervin! 🙏 Have we seen any Travis CI builds run on |
For C/C++ recipes, there's not much to do except change
The main problem in cross-compilation is that you cannot simply run things (e.g. just-built utilities) during the build process, because the architecture you're compiling for doesn't match what you're running on. That's also why you need the respective dependencies in the
Rust recipes seem to cross-compile without many complications (from a few I've looked at recently), but I'm not familiar with what's necessary for Go recipes. The org-wide github-search is very useful for finding this sort of thing though. First impression is that you'll have to pay attention to |
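The constraint described above (anything that has to execute during the build must match the build machine, not the target) usually shows up in a recipe's requirements section. Below is a hedged sketch of a recipe/meta.yaml fragment. The package name `some-codegen-tool` is hypothetical and only stands in for a utility that is run at build time; the selector `build_platform != target_platform` and the `compiler`/`stdlib` Jinja functions are the standard conda-forge ones.

```yaml
# recipe/meta.yaml (fragment, illustrative only)
requirements:
  build:
    # Compilers run on the build machine and emit code for the target.
    - {{ compiler("c") }}
    - {{ stdlib("c") }}
    # Hypothetical tool that is *executed* during the build. When
    # cross-compiling it must be installed for the build machine's
    # architecture, hence this entry with a cross-compilation selector.
    - some-codegen-tool  # [build_platform != target_platform]
  host:
    # Target-architecture copy: used directly in native builds, and
    # linked against (but not runnable) when cross-compiling.
    - some-codegen-tool
```

If the build instead needs to run a binary it has just compiled for the target, no requirements change helps; that is the situation where patching the build scripts or falling back to emulation comes in.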
Switch to cross-compilation for linux_aarch64 and linux_ppc64le conda-forge#12 => to work around Travis CI linux-ppc64le job failing (conda-forge/status#185)
Migrate to {{ stdlib("c") }} conda-forge#13 => use cos7 for all linux platforms (in os_version: of conda-forge.yml)
Yes, I've mostly been avoiding emulation except in a few edge cases that would require heavier patching. The issue in R packages is that the R build process can sometimes involve loading the built library (e.g., to render help). In such cases I'll emulate. |
Seeing aarch builds fail as well now: conda-forge/povray-feedstock#19 |
Should we edit the title and issue description to reflect this new information? |
@jaimergp there are still other linux-aarch64 jobs passing - I'm not convinced that wasn't a sporadic failure. But if non-R feedstocks are seeing consistent failures, the issue description could be generalized. |
Travis CI reports having resolved the issue, and I have confirmed with several jobs that linux-ppc64le runs are indeed running normally again. |
Sounds like we can close this soon, then? Let's keep it open for a few more hours just in case, but will close by EOD if we can confirm it's working. |
Checked https://app.travis-ci.com/github/conda-forge and there are several feedstocks with passing builds for both PPC and ARM from a few hours ago (e.g. https://app.travis-ci.com/github/conda-forge/databricks-cli-feedstock/builds/272058545?serverType=git). I'll close. Thanks for keeping an eye on this @mfansler! |
Glad this is improving! 🥳 That said, did just see a new instance of this. So it doesn't seem like this is fully resolved yet. |
Looking at the travis dashboard, this still seems to be happening to ~50% of PPC jobs (which just get cancelled). |
At least that aspect can be cured by restarting the job though. |
Yeah, looks like it's back to the previous baseline with something like 10%-25% sporadic failure. |
A month has passed and this incident is still open with no foreseeable solution. Are we still observing the 10-25% sporadic failure rate? If that's the case, is it worth studying the feasibility of disabling that platform on Travis CI and letting people cross-compile or emulate? |
Was about to ask the same thing. Did see a build here. It stalled out in the midst of the build, which is a different issue than this one, but it is an issue that we have seen with Travis CI before |
R migration bottlenecked on |
Having more data these last few weeks, I still see high failure rates (~10%-25%). I've updated the OP notification to reflect that more specifically - rather than the acute ppc64le issue - and added notice to consider moving to Azure. |
Travis CI linux-aarch64 and linux-ppc64le jobs continue to have high failure rates (10%-25%). This manifests either as infinite queuing or premature cancellation. Restarting the individual build jobs is often sufficient; however, maintainers may also consider moving builds to Azure following #185 (comment).