OSError: inotify instance limit reached #24
I went digging into the codebase of cachier and saw the following, which might be responsible for the recursive inotify-ing:

    observer.join(timeout=1.0)
    if observer.is_alive():
        # print('Timedout waiting. Starting again...')
        return self.wait_on_entry_calc(key)

What if it were instead changed to the following?

    while observer.is_alive():
        observer.join(timeout=1.0)

Would this bork how Watchdog is supposed to work? Would this help with the recursive nesting calls of …
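To illustrate the difference being asked about, here is a minimal standard-library sketch. It assumes, as the snippet suggests, that each recursive retry of `wait_on_entry_calc` sets up a fresh observer while the previous one is still running. `start_observer`, `wait_recursive`, and `wait_loop` are hypothetical names, and a plain `threading.Thread` waiting on an `Event` stands in for a watchdog observer (watchdog's observers are `threading.Thread` subclasses, so `join(timeout=...)` and `is_alive()` behave the same way).

```python
import threading
from typing import List


def start_observer(done: threading.Event) -> threading.Thread:
    # Hypothetical stand-in for creating and starting a watchdog observer:
    # the thread exits once the watched computation signals completion.
    observer = threading.Thread(target=done.wait, daemon=True)
    observer.start()
    return observer


def wait_recursive(done: threading.Event, created: List[threading.Thread]) -> int:
    # Original pattern: on timeout, start over by recursing, which creates
    # a NEW observer while the previous one is still alive.
    observer = start_observer(done)
    created.append(observer)
    observer.join(timeout=0.05)
    if observer.is_alive():
        return wait_recursive(done, created)
    return len(created)


def wait_loop(done: threading.Event, created: List[threading.Thread]) -> int:
    # Proposed pattern: keep joining the SAME observer until it stops.
    observer = start_observer(done)
    created.append(observer)
    while observer.is_alive():
        observer.join(timeout=0.05)
    return len(created)


if __name__ == "__main__":
    for waiter in (wait_recursive, wait_loop):
        done, created = threading.Event(), []
        threading.Timer(0.3, done.set).start()  # the computation "finishes" later
        total = waiter(done, created)
        print(f"{waiter.__name__}: {total} observer(s) created")
```

With the completion signalled after 0.3 s and a 0.05 s join timeout, the recursive version accumulates several observer threads that all stay alive until the end, while the loop only ever holds one. If each real observer owns an inotify instance, that accumulation is the kind of growth that would hit the per-user instance limit.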
@ericmjl Did you find the solution to this? We are encountering the same inotify problem when using cachier in our prediction server.
@tsp-kucbd I tried using a while loop, as you can see in PR #25 referenced above. Please try it out and see if it works for you; if so, we should ping @shaypal5 to see how we can fix the macOS errors that are causing the PR to fail. (I have to admit, I'm not very well-versed in the codebase, and it took me a long time to narrow down the error.)
Oh, I forgot about your PR to fix this, @ericmjl. Feel free to ping me again to get me to take a look, but I suggest you two take a jab at this.
No worries, @shaypal5 😄 I assumed you were busy and needed to finish up work!
I tried, but unfortunately it did not work. Also, this is not on macOS but on a Linux server.
Without a stack trace, it's going to be tough to see what's happening. I am not sure whether you made the change in the source code, whether the changes were reflected in your environment, or whether something else was happening.
See my comment on the related PR, which fails an important test.
I'm also getting this error when calling it with an anonymous function as the worker function. Environment: Expected: Actual:
I am also experiencing this error.
Please see my comments on PR #25 if you want to help get this fixed.
I am also experiencing this error.
I'm also getting this error when calling it from a subprocess. I used the forkserver start method on a Linux server. The traceback is shown below.
I also experience this problem. I would have thought this is exactly the scenario cachier is designed for (long-running tasks that must run to completion). I'm amazed this has stayed open for more than a year. Is this project active?
@endlisnis The project is active, yes, but I myself (the author of the package) am not actively using it anymore, so I rarely sit down to work on new features or to debug scenarios I have no idea how to reproduce. I put in enough time to answer issues, walk people through making PRs, etc. But yeah, this is an extremely small open-source project at the stage where it relies on the community for improvements and bug fixes. ¯\_(ツ)_/¯ Again, I'd love to help anyone who wants to get into researching this and making a PR, and I would advise anyone attempting this to start off where PR #25 left off.
The fix from PR #25 was released in …
@shaypal5 - I suspect that the underlying issue has not been fixed, as I've just hit it. Edit: part of my hypothesis was wrong (it is not to do with the number of files in …

Details

Situation

A long-running consumer completing the same task (downloading an image from a URL), run across about 6000 tasks, using cachier with the pickle core.

Versions

Traceback
Misc

Creating cached functions using the following:

    self.get_image_data = cachier.cachier(
        cache_dir=".tmp/cachier",
        seperate_files=True,
        pickle_reload=False,
    )(parent.get_image_data)

Cache directory:

    .tmp/cachier % ls -a | wc -l
    8309
    .tmp/cachier % ls -a | grep get_image_data | wc -l
    6099

OLD INCORRECT Analysis

Here's what I think is happening and how it might be resolved.

First - there is a limit to how many files can be watched using inotify (which is what … A workaround would be to increase this limit on the system being used, but this would just "postpone" the error.

Second - I believe that there are 2 scenarios that will trigger this limit being hit in general use:

a) cachier is being used across a large number of different functions

Third - the error is only thrown if a call to a cached function results in a …

For the fix, I don't think anything like "adding subdirectories per function" will help because of case … I'll see if I can find a way to reproduce the problem.
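The error message itself names the inotify *instance* limit rather than the watch limit. On Linux, both are exposed under `/proc/sys/fs/inotify`, so a minimal sketch for checking them from Python could look like the following. It assumes a Linux host, and `read_inotify_limits` is a hypothetical helper, not part of cachier or watchdog.

```python
from pathlib import Path
from typing import Dict, Optional

# Linux exposes inotify limits as sysctls under /proc/sys/fs/inotify:
#   max_user_instances: inotify instances per real user ID
#                       (the limit named by "inotify instance limit reached")
#   max_user_watches:   watches per real user ID
#   max_queued_events:  queued events per inotify instance
INOTIFY_SYSCTL_DIR = Path("/proc/sys/fs/inotify")
LIMIT_NAMES = ("max_user_instances", "max_user_watches", "max_queued_events")


def read_inotify_limits(base: Path = INOTIFY_SYSCTL_DIR) -> Dict[str, Optional[int]]:
    """Return each inotify limit as an int, or None if it cannot be read."""
    limits: Dict[str, Optional[int]] = {}
    for name in LIMIT_NAMES:
        try:
            limits[name] = int((base / name).read_text().strip())
        except (OSError, ValueError):
            limits[name] = None  # not Linux, file missing, or unreadable
    return limits


if __name__ == "__main__":
    for name, value in read_inotify_limits().items():
        print(f"fs.inotify.{name} = {value}")
```

Raising a limit is usually done as root with `sysctl -w fs.inotify.max_user_instances=<n>`; as the analysis says, this only postpones the error if instances keep accumulating.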
Update: I cannot reproduce the problem with the scripts below. Why? It turns out I had a lot of hanging consumer threads that were probably taking me over the … (I identified these hanging threads using the bash …). Once I shut down those consumer threads, I could run the tests below with …

Files
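For anyone trying to confirm a similar situation, one way to see which processes hold inotify instances on Linux is to count the file descriptors whose `/proc/<pid>/fd` link target is `anon_inode:inotify`. This is a sketch, not necessarily the bash approach used above; `count_inotify_fds` and `inotify_fds_by_pid` are hypothetical helpers, and processes owned by other users may not be visible without root.

```python
import os
from pathlib import Path
from typing import Dict


def count_inotify_fds(pid: str = "self") -> int:
    """Count a process's file descriptors that refer to inotify instances.

    On Linux, readlink() on such a descriptor under /proc/<pid>/fd returns
    'anon_inode:inotify'; the check below matches that loosely.
    """
    fd_dir = Path("/proc") / pid / "fd"
    try:
        entries = list(fd_dir.iterdir())
    except OSError:
        return 0  # not Linux, process already gone, or permission denied
    count = 0
    for fd in entries:
        try:
            target = os.readlink(fd)
        except OSError:
            continue  # descriptor closed between listing and readlink
        if target.startswith("anon_inode:") and "inotify" in target:
            count += 1
    return count


def inotify_fds_by_pid() -> Dict[int, int]:
    """Map each visible PID to its number of inotify instances (non-zero only)."""
    counts: Dict[int, int] = {}
    try:
        proc_entries = list(Path("/proc").iterdir())
    except OSError:
        return counts
    for entry in proc_entries:
        if entry.name.isdigit():
            n = count_inotify_fds(entry.name)
            if n:
                counts[int(entry.name)] = n
    return counts
```

A long-lived consumer that keeps a growing count in `inotify_fds_by_pid()` between tasks would be a likely candidate for leaking observers.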
Wow, amazing work tracking this down, explaining it thoroughly and reproducing it! All three mitigations, however, seem non-trivial to me (some technically, some because they bet on a change in user experience, like the directory-per-key suggestion, as I suspect some people rely on the current single-directory behavior). Still, I'd appreciate you working on a suggestion for the most sensible solution, and hopefully we'll get other key contributors to chime in on this, especially @lordjabez.
Ahh @shaypal5 - looks like you got here just as I realised I could not reproduce the problem (see edited comments 😅). That said, I think I am closer; let me spend some more time reproducing it.
That's ok. Take your time. :)
Okay, this time I can actually reproduce it. Uses the same …
@shaypal5 - I'm going to leave my analysis here for the moment (see …). Edit: I'm pretty confident in my analysis now - see "Changing limits" in the previous comment.
Great work! Let's see if we can get people to chime in on this.
So had I checked the relevant code from … It also looks like inotify observers can be used on a single path (as opposed to a directory), but this wouldn't deal with our current error of … Given … That said, it appears that the default … (Even if we do this, I still think it would be a good idea to implement the more efficient use of observers per the above.)

Various references:
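As a rough illustration of what a more efficient use of observers might mean, here is a sketch in which one shared watcher serves every waiting call, so the number of watchers stays constant no matter how many calls are waiting. This is an assumption about the direction discussed in the thread, not cachier's design. `SharedCompletionWatcher`, `notify_done`, and `wait_for` are hypothetical names, and a `threading.Condition` stands in for the single observer's event handler.

```python
import threading
from typing import Dict, Hashable


class SharedCompletionWatcher:
    """Hypothetical sketch: ONE watcher shared by all waiting calls.

    In a real implementation the single watchdog observer's event handler
    would call notify_done(); here any thread can call it directly.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._results: Dict[Hashable, object] = {}

    def notify_done(self, key: Hashable, value: object) -> None:
        # Record a finished computation and wake every waiting thread;
        # each waiter re-checks whether its own key is now available.
        with self._cond:
            self._results[key] = value
            self._cond.notify_all()

    def wait_for(self, key: Hashable, timeout: float = 1.0) -> object:
        # Loop with a timed wait (mirroring the while-loop proposal) instead
        # of recursing and creating a new watcher on each timeout.
        with self._cond:
            while key not in self._results:
                self._cond.wait(timeout=timeout)
            return self._results[key]
```

However many calls are blocked in `wait_for`, they share the same underlying watcher, so in this model the number of inotify instances would no longer grow with the number of waiting calls. The trade-off is that the single handler has to route each file event to the right key.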
Ok. Great analysis. Thank you yet again! Is this something you would consider trying to solve with a suggested PR yourself?
No problem :) Not at the moment - I'm pretty busy contributing to some other open-source projects 🥲. If that changes, I'll let you know (probably via a PR).
Ok. :)
When using cachier with some long-running functions, I get the following error:
A fuller stack trace is in the details below. I have tried my best to obscure proprietary information, so please forgive me if that messes up the syntax a bit.
What would be the best next step here to debug? (No pressure on your side, @shaypal5, I have no problems debugging on my own, just wanted to know your thoughts.)