Fix for inability to read some parquet files (issue #816) #817

Merged: 13 commits, Dec 2, 2024

@daw3rd (Member) commented Nov 20, 2024

Why are these changes needed?

To allow proper reading and processing of some parquet files that are otherwise unreadable.

Related issue number (if any).

#816

@daw3rd daw3rd marked this pull request as draft November 20, 2024 14:44
@daw3rd daw3rd marked this pull request as ready for review November 20, 2024 20:48
@daw3rd daw3rd requested review from touma-I and blublinsky November 21, 2024 14:31
@blublinsky (Collaborator) left a comment

Approving, but please add the comment.

logger.error(f"Failed to convert byte array to arrow table, exception {e}. Skipping it")
return None
logger.warning(f"Could not convert bytes to pyarrow: {e}")

Collaborator

Can you please put a comment here about why polars is used? Just copy the blurb from where you found this solution.

Member Author

Done.
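
For readers following the thread, the pattern under discussion (try pyarrow first, log a warning, then fall back to polars) can be sketched roughly as below. This is a minimal illustration, not the merged code: the function names `convert_bytes_to_table`, `read_with_pyarrow`, and `read_with_polars` are hypothetical, and only the two log messages are taken from the diff quoted above. The readers are passed in as parameters purely so the control flow can be exercised without either library installed.

```python
import io
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def read_with_pyarrow(data: bytes) -> Any:
    """Primary reader (hypothetical helper): parse parquet bytes with pyarrow."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    return pq.read_table(pa.BufferReader(data))


def read_with_polars(data: bytes) -> Any:
    """Fallback reader (hypothetical helper): parse with polars, convert to a pyarrow Table."""
    import polars as pl

    return pl.read_parquet(io.BytesIO(data)).to_arrow()


def convert_bytes_to_table(
    data: bytes,
    primary: Callable[[bytes], Any] = read_with_pyarrow,
    fallback: Callable[[bytes], Any] = read_with_polars,
) -> Optional[Any]:
    """Try the primary reader; on failure warn and try the fallback; return None if both fail."""
    try:
        return primary(data)
    except Exception as e:
        # Not fatal yet: the fallback reader may still handle this file.
        logger.warning(f"Could not convert bytes to pyarrow: {e}")
    try:
        return fallback(data)
    except Exception as e:
        logger.error(f"Failed to convert byte array to arrow table, exception {e}. Skipping it")
        return None
```

With the default readers, `convert_bytes_to_table(raw_bytes)` returns a pyarrow Table when either library can parse the bytes and `None` otherwise. Whether polars succeeds on a given file that pyarrow rejects depends on the file; the sketch only shows the fallback order.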

@touma-I (Collaborator) left a comment

Let me know if the two comments can be addressed. Thanks

transforms/universal/filter/python/src/filter_transform.py (outdated, resolved)
data-processing-lib/python/requirements.txt (outdated, resolved)
@touma-I (Collaborator) left a comment

Thanks

@daw3rd daw3rd merged commit 4171dfa into dev Dec 2, 2024
88 checks passed
3 participants