Intermittent segfault in apparently memory safe code, perhaps FFTW related #48722
Comments
Have you tried running this with |
On my local machine (M1 Mac) the tests run cleanly with |
Can you try running on |
The recording was indeed made on an AMD machine.
Note also this recording was made on the main branch of Sunny (commit We can also see what happens on the |
I can confirm a Github CI crash on the 1.10.0-DEV.637 nightly. |
We will attempt to produce an rr trace on an Intel machine.

Update: Despite some effort, we're having trouble creating another rr trace. Does the existing rr trace replay on an AMD machine? |
We replayed the trace on the AMD machine on which it was originally recorded. After entering
@vtjnash Is this what you were referring to? Does this indicate that the trace is unusable? We will continue the effort to capture a trace on an Intel machine using the nightly. As noted above, we have seen the segfault on Intel, but so far it has only been on computers where we don't have the ability to use rr. |
Also, any other hints to diagnose memory corruption are welcome. We tried running valgrind with a vanilla nightly using these instructions, but the output appears clean. Is it likely to be helpful to try a custom Julia build with |
Yes, though perhaps there are settings you can use to relax the ticks checks, or maybe this is controlled by the perf_event_paranoid kernel setting? MSAN can be very good for that, but can be a bit annoying to build (https://docs.julialang.org/en/v1/devdocs/sanitizers/#Sanitizer-support) |
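For readers following along, here is a minimal sketch for inspecting the `perf_event_paranoid` setting mentioned above. The helper name `rr_paranoid_status` is hypothetical, and the threshold of 1 reflects what rr's setup instructions generally ask for. Whether this setting has anything to do with the replay problem in this thread is not confirmed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rr or Julia): interpret a
# kernel.perf_event_paranoid value from rr's point of view.
# Assumption: rr needs a value of 1 or lower to use hardware
# performance counters when recording.
rr_paranoid_status() {
    local level="$1"
    if [ "$level" -le 1 ]; then
        echo "ok: perf_event_paranoid=$level"
    else
        echo "too strict: perf_event_paranoid=$level (rr typically needs <= 1)"
    fi
}

# On Linux the current value is exposed under /proc.
if [ -r /proc/sys/kernel/perf_event_paranoid ]; then
    rr_paranoid_status "$(cat /proc/sys/kernel/perf_event_paranoid)"
fi

# To lower it until the next reboot (requires root):
#   sudo sysctl kernel.perf_event_paranoid=1
```

On machines without root access, such as hosted CI runners, the setting may not be adjustable, which is consistent with the difficulty of using rr there.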
Unfortunately, the @inbounds marker seems unrelated to the crashing behavior. Attempt another temporary workaround: disable dipole-dipole in energy consistency test. JuliaLang/julia#48722
On Slack vtjnash and Gabriel Baraldi explained that |
Indeed the problem was |
We are observing intermittent segfault behavior when running the tests of the Sunny.jl package. It sometimes shows up during GitHub CI testing of our simplified `crash` branch.

It only crashes sometimes, however. On my Mac, for example, crashes are rare, but when they happen, it's in roughly the same code location. An example of the segfault output is shown from this CI action: https://github.com/SunnySuite/Sunny.jl/actions/runs/4214225988/jobs/7314550112
The segfault seems to always occur inside FFTW, but perhaps there is memory corruption happening prior to FFTW.
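Because the crash is intermittent, a single clean test run says little. One way to make it easier to reproduce is to rerun the test suite in a loop and stop at the first failure. The sketch below assumes a bash shell. The helper name `repeat_until_fail` is hypothetical, and the Julia invocation in the comment assumes the package is checked out in the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times and stop at the
# first non-zero exit status.
# Usage: repeat_until_fail N command [args...]
repeat_until_fail() {
    local max="$1"
    shift
    local i status
    for ((i = 1; i <= max; i++)); do
        status=0
        "$@" || status=$?   # capture the exit status without aborting under `set -e`
        if [ "$status" -ne 0 ]; then
            echo "run $i of $max failed with exit status $status"
            return 1
        fi
    done
    echo "all $max runs passed"
    return 0
}

# Example (assumes the package is checked out in the current directory):
#   repeat_until_fail 20 julia --project -e 'using Pkg; Pkg.test()'
```

A process killed by a segmentation fault is usually reported by the shell with exit status 139 (128 plus signal 11), so the printed status helps distinguish a crash from an ordinary test failure. Since the helper returns 1 on any failure, it could also serve as the test command for `git bisect run` when narrowing down the offending commit.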
The branch `Sunny#crash` contains no `@inbounds` annotations, or other "memory unsafe" operations from what we can tell (presumably the FFT package is intended to be memory safe?). Sunny does depend on external C libraries, which could of course corrupt memory.

I tried to bisect to a commit where the crash first appeared, and it seems to be one of these two:
SunnySuite/Sunny.jl@fb0a631 <- where crashes become very noticeable
SunnySuite/Sunny.jl@9f97b54 <- parent commit, seems suspicious to me
We recorded a log of the crash using `--bug-report=rr` and uploaded it here: https://julialang-dumps.s3.amazonaws.com/reports/2023-02-18T02-49-23-ddahlbom.tar.zst
Two example segfault outputs are below.
and
We have observed the problem on multiple machines, all using Julia 1.8.5. It primarily appears on Github Actions CI using x64, but I have also seen it on my M1 Mac, which is:
GitHub Actions Julia installer with `[Julia 1.8 - ubuntu-latest - x64]` or juliaup for Mac.

Thank you.