Cannot reproduce Llama2 results #52
Hello, I'm opening this issue because I'm still having problems reproducing the Llama-2-7B results (both without pruning and with Wanda). Here are my intermediate and final perplexity results for the dense model (context size 4096). It seems like the last few samples are somehow inflating the perplexity, but I don't know why. Any help would be appreciated.

nsamples 333
sample 50, Perplexity 5.0264153480529785
sample 100, Perplexity 5.311441421508789
sample 150, Perplexity 5.710564136505127
sample 200, Perplexity 5.612466335296631
sample 250, Perplexity 5.526543617248535
sample 300, Perplexity 6.8109965324401855
wikitext perplexity 7.72459077835083
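For context, numbers like these come from the standard sliding-window perplexity evaluation used in SparseGPT/Wanda-style code. Below is a minimal sketch, assuming a Hugging Face causal LM and a testenc that is the whole test split tokenized into one long sequence; the repo's actual eval_ppl may differ in details such as batching and progress printing.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def sliding_window_ppl(model, testenc, seqlen=4096, device="cuda"):
    """Sliding-window perplexity over a single long tokenized test stream."""
    input_ids = testenc.input_ids.to(device)      # shape [1, n_tokens]
    nsamples = input_ids.numel() // seqlen        # non-overlapping windows
    print(f"nsamples {nsamples}")

    nlls = []
    for i in range(nsamples):
        batch = input_ids[:, i * seqlen:(i + 1) * seqlen]
        logits = model(batch).logits
        # Shift so each position predicts the next token.
        shift_logits = logits[:, :-1, :].contiguous()
        shift_labels = batch[:, 1:].contiguous()
        loss = nn.CrossEntropyLoss()(
            shift_logits.view(-1, shift_logits.size(-1)),
            shift_labels.view(-1),
        )
        # Standard approximation: scale by the full window length.
        nlls.append(loss.float() * seqlen)
        if (i + 1) % 50 == 0:
            running = torch.exp(torch.stack(nlls).sum() / ((i + 1) * seqlen))
            print(f"sample {i + 1}, Perplexity {running.item()}")

    return torch.exp(torch.stack(nlls).sum() / (nsamples * seqlen)).item()
```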
Comments

I recall that there shouldn't be 333 samples for wikitext; the count should be much smaller (in my case it is 83). Are you using the validation set?

I am using the same testenc that the get_wikitext2 function in data.py returns. If the model's sequence length is 4096, does that mean I'm somehow getting more samples?
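Assuming testenc is the BatchEncoding returned by get_wikitext2 (the whole test split tokenized into a single [1, n_tokens] tensor, as in SparseGPT-style loaders), the sample count is just the token count divided by the window length, which makes the discrepancy easy to see:

```python
def implied_nsamples(testenc, seqlen=4096):
    # testenc: hypothetical stand-in for the BatchEncoding from get_wikitext2 in data.py
    n_tokens = testenc.input_ids.numel()
    return n_tokens // seqlen

#  83 samples * 4096 ~ 0.34M tokens  -> roughly what the wikitext-2 test split yields here
# 333 samples * 4096 ~ 1.36M tokens  -> a much larger corpus was tokenized
```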
Correct, 333 does not look like the right number from what I am seeing on my end, and I was referring to the …

Thanks to your tip I was able to figure out what the problem was: I was evaluating on wikitext-103 instead of wikitext-2. The version of datasets suggested in your install file loads wikitext-103 instead of wikitext-2, so I suggest updating it.
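One way to guard against this kind of mixup is to request the WikiText-2 raw config explicitly and sanity-check what was actually loaded. The config name below is the standard Hugging Face one; how older datasets releases resolve it is not verified here, so treat this as a quick diagnostic rather than a fix.

```python
from datasets import load_dataset

# Explicitly request the WikiText-2 raw config, then inspect the loaded split.
testdata = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")

print(testdata)                                   # shows the features and num_rows
print(sum(len(t) for t in testdata["text"]))      # total characters in the test split
# If these sizes look far larger than expected for WikiText-2, the wrong corpus
# (e.g. wikitext-103) was resolved, and the pinned datasets version should be updated.
```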
Great, thank you for the update.