Writing parquet files #16
If you wish to write dicts, as opposed to tabular data, you may be better off looking at Avro. There are working Python libraries for it: avro (official, slow), fastavro and cyavro.
My stats team say they want it stored in parquet (in S3). I have many individual big dicts that I want to store. Most of them are 1-level dicts, so the data is quite tabular. All of it needs to happen from CPython, not a JVM.
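To illustrate why flat (1-level) dicts are "quite tabular", here is a minimal stdlib-only sketch that pivots a list of flat dicts into one list per column, which is the shape a columnar format such as parquet stores. The helper `dicts_to_columns` is hypothetical and not part of this project or any library; it does not write parquet itself, it only shows the mapping, and it assumes missing keys should become nulls (`None`).

```python
from typing import Any, Dict, List, Optional


def dicts_to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Optional[Any]]]:
    """Hypothetical helper: pivot flat dicts into column lists, filling gaps with None."""
    columns: Dict[str, List[Optional[Any]]] = {}
    # First pass: collect every key in first-seen order so columns have a stable order.
    for record in records:
        for key, value in record.items():
            # Nested dicts do not map onto a single flat column, so reject them.
            if isinstance(value, dict):
                raise ValueError(f"nested dict under key {key!r} is not tabular")
            if key not in columns:
                columns[key] = []
    # Second pass: append one value per column per record; missing keys become None.
    for record in records:
        for key, values in columns.items():
            values.append(record.get(key))
    return columns
```

For example, `dicts_to_columns([{"id": 1, "name": "a"}, {"id": 2, "score": 0.5}])` gives `{"id": [1, 2], "name": ["a", None], "score": [None, 0.5]}`. A dict of equal-length lists like this can be passed straight to `pandas.DataFrame`, which is the route the reply below suggests for eventually getting the data into parquet.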
In that case, you have two options: wait for the ongoing work in Apache Arrow to enable conversion of pandas DataFrames to parquet (so, presumably, of any data structure you can store in a DataFrame), or, of course, work on the writer in this project. I personally have no plans to work on it in the near future.
Thanks! I appreciate the update and tips. I'll try to get a handle on the state of Python support inside Arrow. I see the code is there, but skimming through it, I only see support for reading (I have no idea of its completion state).
Hi,
We need to be able to write Python dicts to parquet. What are the chances that you'll have time to work on this, i.e. a writer class?
My team is totally new to parquet, so we have a lot to learn. We did see #13, which claims to add writer functionality, but that PR is out of sync and tries to solve a couple of other things at the same time.
We would appreciate your thoughts on this project's near future.
cc @adngdb