ExpectedMoreSplits error when loading C4 dataset #6746
Comments
Hi! We updated the … To fix this issue you can update …
traindata = load_dataset('allenai/c4', data_files={'train': 'en/c4-train.00000-of-01024.json.gz'}, split='train')
valdata = load_dataset('allenai/c4', data_files={'validation': 'en/c4-validation.00000-of-00008.json.gz'}, split='validation')
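For anyone who wants to load other shards the same way, here is a minimal sketch that builds the `data_files` path from a split name and shard index. The helper names `c4_shard_path` and `load_c4_shard` are hypothetical (they are not part of `datasets`), and the shard counts (1024 for train, 8 for validation) are simply read off the file names in the two calls above. It assumes a recent `datasets` install.

```python
# Shard counts for the English C4 files, taken from the file names used above.
C4_EN_SHARDS = {"train": 1024, "validation": 8}


def c4_shard_path(split: str, index: int) -> str:
    """Return the relative path of one English C4 shard (hypothetical helper)."""
    if split not in C4_EN_SHARDS:
        raise ValueError(f"unknown split: {split!r}")
    total = C4_EN_SHARDS[split]
    if not 0 <= index < total:
        raise ValueError(f"shard index must be in [0, {total}), got {index}")
    # Both the index and the total are zero-padded to five digits.
    return f"en/c4-{split}.{index:05d}-of-{total:05d}.json.gz"


def load_c4_shard(split: str, index: int = 0):
    """Load a single C4 shard by passing its path through data_files."""
    # Imported lazily so the path helper works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(
        "allenai/c4",
        data_files={split: c4_shard_path(split, index)},
        split=split,
    )


# Usage (downloads one shard per call):
# traindata = load_c4_shard("train")
# valdata = load_c4_shard("validation")
```

Passing explicit `data_files` means only the listed shard is fetched, so the loader has no expected split list to compare against.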
Did you solve this problem? I have the same bug. Deleting "allenai--c4" does not help.
Did you solve it? I ran into this problem too.
But after I remove allenai--c4, it still fails.
For me it works this way. I'm using datasets version 2.17.0
First, pip install --upgrade datasets.
The error is in the Wanda repository: https://github.com/locuslab/wanda Concretely, in these code lines: Please report there and/or make the fix in their code.
Solved for me! Thanks!
Describe the bug
I encountered a bug when running the example command line
The bug occurred at these lines of code (when loading the C4 dataset)
The error message states:
Steps to reproduce the bug
Expected behavior
The error message states:
Environment info
I'm using CUDA 12.4, so I use
pip install pytorch
instead of the conda command provided in install.md. Also, I've tried another environment using the same commands as in install.md, but the same bug occurred.