daft.DataFrame.write_deltalake does not support append for tables with schema evolution #3559
Comments
Thank you for reporting this. @jaychia, any thoughts on this?
Hi @anilmenon14, what is the expected behavior here? Should Daft allow missing, extra, or different columns, either by pruning columns or by adding columns filled with nulls?
Hi @jaychia,
This is how the append behaves in Spark with the merge schema option:
I forked the current main branch and altered L942 of daft/dataframe/dataframe.py to accept
In short, I'd expect
Let me know if you'd like me to look into anything else; I'm happy to help.
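To make the "add columns filled with nulls" option from the question above concrete, here is a minimal pure-Python sketch of a merge-style append. It is not Daft or delta-rs code: `merge_schemas`, `append_with_merge`, and the dict-based schema representation are hypothetical and only illustrate the behavior being discussed.

```python
from typing import Any, Dict, List, Tuple

Schema = Dict[str, str]   # column name -> type name (hypothetical representation)
Row = Dict[str, Any]


def merge_schemas(table_schema: Schema, incoming_schema: Schema) -> Schema:
    """Return the union of both schemas, keeping the table's column order first."""
    merged = dict(table_schema)
    for name, dtype in incoming_schema.items():
        if name in merged and merged[name] != dtype:
            # This sketch refuses type changes; only added/missing columns are handled.
            raise TypeError(f"column {name!r}: {merged[name]} vs {dtype}")
        merged.setdefault(name, dtype)
    return merged


def append_with_merge(
    table_rows: List[Row],
    table_schema: Schema,
    new_rows: List[Row],
    incoming_schema: Schema,
) -> Tuple[List[Row], Schema]:
    """Append new_rows, filling columns missing on either side with None."""
    merged = merge_schemas(table_schema, incoming_schema)
    conformed = [
        {name: row.get(name) for name in merged}
        for row in table_rows + new_rows
    ]
    return conformed, merged


existing_schema = {"id": "int64", "name": "string"}
existing_rows = [{"id": 1, "name": "a"}]
incoming_schema = {"id": "int64", "score": "float64"}
incoming_rows = [{"id": 2, "score": 0.5}]

rows, schema = append_with_merge(
    existing_rows, existing_schema, incoming_rows, incoming_schema
)
```

In this sketch, old rows get a null `score`, the new row gets a null `name`, and the table schema grows to include `score`.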
Describe the bug
Appends to Delta Lake are not permitted when the schema of the DataFrame differs from that of the Delta Lake table. There appears to be a safety check in place in daft.DataFrame.write_deltalake, which was noticed when working on PR #3522, that prevents this from happening. Specifically, this block of code is the one preventing this. Without knowing too much of the history, I assume this safety check is in place because delta-rs likely has/had some limitation.
I could locate https://github.com/delta-io/delta-rs/pull/2246 in the delta-rs project, which appears to mention support for what we are seeking. However, from what I see in the delta-rs writer, I believe this is the relevant block of code that explicitly prevents mode='append' and schema_mode='overwrite' from being used together.
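As a rough illustration of the kind of guard described above, the sketch below rejects the mode='append' plus schema_mode='overwrite' combination. It is not the delta-rs source: `check_write_modes` and `is_allowed` are hypothetical names, and treating schema_mode='merge' as acceptable for appends is an assumption based on the linked PR.

```python
from typing import Optional


def check_write_modes(mode: str, schema_mode: Optional[str]) -> None:
    """Raise if the requested combination is the one reported as blocked."""
    if mode == "append" and schema_mode == "overwrite":
        raise ValueError(
            "schema_mode='overwrite' cannot be combined with mode='append'"
        )


def is_allowed(mode: str, schema_mode: Optional[str]) -> bool:
    """Convenience wrapper: True when check_write_modes does not raise."""
    try:
        check_write_modes(mode, schema_mode)
    except ValueError:
        return False
    return True
```

Under these assumptions, an append that needs schema evolution would have to go through a merge-style schema mode rather than schema_mode='overwrite'.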
Handling schema evolution when writing Daft DataFrames to Delta Lake would be a great feature.
To Reproduce
daft.DataFrame.read_deltalake()
daft.DataFrame.write_deltalake(table="sometable",mode="append",schema_mode="overwrite")
Expected behavior
Appends to existing tables should be allowed even when the DataFrame schema differs from the table schema.
Component(s)
Other
Additional context
No response