During write_parquet, if a Ray worker crashes, an invalid partial partition may be persisted #3576
Comments
Hey @jpedrick-numeus, this should be possible to implement. We can implement a cleanup mechanism to detect duplicate/corrupted files written by Daft upon a successful write operation. Setting …
Hi @colin-ho, thanks for the reply. I'm not quite sure what you're saying here. If I set write_mode='overwrite', will Daft clean up partial writes from a crashed Ray worker? My intuition is that currently …
The way …
You don't need to call …
Ok, interesting, I'll try that then.
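For reference, a minimal sketch of the overwrite approach discussed above, assuming write_parquet accepts a write_mode argument as the reply suggests (the bucket paths are illustrative):

```python
import daft

# Illustrative input path; replace with your own source data.
df = daft.read_parquet("s3://my-bucket/input/*.parquet")

# With write_mode="overwrite", a successful write replaces the contents of the
# target directory, so stale or partial files from a previous failed run are
# not left mixed in with the new output.
df.write_parquet("s3://my-bucket/output/", write_mode="overwrite")
```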
Describe the bug
I'm running Daft + Ray on an instance with limited memory. When calling daft.DataFrame.write_parquet, if workers crash for any reason, the partially written Parquet files aren't cleaned up, which causes problems in downstream processes. At minimum, I need the entire write_parquet operation to return some kind of failed state. Ideally, Daft would clean up any potentially corrupted Parquet files and retry the task.
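For illustration, a downstream consumer can detect a truncated part file by checking whether its Parquet footer is readable; this is only a sketch and the helper name is made up:

```python
import pyarrow.parquet as pq

def is_valid_parquet(path: str) -> bool:
    """Hypothetical check: return True if the file has a readable Parquet footer.

    A partially written file from a crashed worker usually lacks a valid
    footer, so read_metadata raises for it.
    """
    try:
        pq.read_metadata(path)
        return True
    except Exception:
        return False
```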
To Reproduce
Run daft.DataFrame.write_parquet with too many threads for the memory available, resulting in OOM crashes for Ray workers.
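A rough reproduction sketch, assuming the Ray runner and a deliberately tight memory budget (the limits and paths are illustrative):

```python
import ray
import daft

# Illustrative settings: many concurrent workers plus a small object store
# makes OOM kills during the write likely on a memory-limited instance.
ray.init(num_cpus=16, object_store_memory=256 * 1024 * 1024)
daft.context.set_runner_ray()

# Illustrative input; anything large enough to exhaust memory will do.
df = daft.read_parquet("s3://my-bucket/large-input/*.parquet")
df.write_parquet("s3://my-bucket/output/")
```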
Expected behavior
Either be informed that the write operation is potentially corrupted via the return value (or a thrown exception), or ensure that the write_parquet operation cleans up any potentially corrupted parts when a Ray worker crashes.
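As a stopgap on a local filesystem, something like the following sketch could approximate that behavior from the caller's side; write_parquet_or_cleanup is a hypothetical helper and it assumes the worker crash actually surfaces as an exception on the driver, which this issue notes is not guaranteed:

```python
import shutil
import daft

def write_parquet_or_cleanup(df: daft.DataFrame, output_dir: str) -> None:
    # Hypothetical helper: if the write raises, remove whatever partial part
    # files were persisted so downstream readers never see a half-written
    # directory, then re-raise the original error.
    try:
        df.write_parquet(output_dir)
    except Exception:
        shutil.rmtree(output_dir, ignore_errors=True)
        raise
```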
Component(s)
Parquet
Additional context
No response