v0.4.0
What's Changed
- Replace stale `triviaqa` dataset link by @jon-tow in #364
- Update `actions/setup-python` in CI workflows by @jon-tow in #365
- Bump `triviaqa` version by @jon-tow in #366
- Update `lambada_openai` multilingual data source by @jon-tow in #370
- Update Pile Test/Val Download URLs by @fattorib in #373
- Added ToxiGen task by @Thartvigsen in #377
- Added CrowSPairs by @aflah02 in #379
- Add accuracy metric to crows-pairs by @haileyschoelkopf in #380
- hotfix(gpt2): Remove vocab-size logits slice by @jon-tow in #384
- Enable "low_cpu_mem_usage" to reduce the memory usage of HF models by @sxjscience in #390
- Upstream `hf-causal` and `hf-seq2seq` model implementations by @haileyschoelkopf in #381
- Hosting arithmetic dataset on HuggingFace by @fattorib in #391
- Hosting wikitext on HuggingFace by @fattorib in #396
- Change device parameter to cuda:0 to avoid runtime error by @Jeffwan in #403
- Update README installation instructions by @haileyschoelkopf in #407
- feat: evaluation using peft models with CLM by @zanussbaum in #414
- Update setup.py dependencies by @ret2libc in #416
- fix: add seq2seq peft by @zanussbaum in #418
- Add support for load_in_8bit and trust_remote_code model params by @philwee in #422
- Hotfix: patch issues with the `huggingface.py` model classes by @haileyschoelkopf in #427
- Continuing work on refactor [WIP] by @haileyschoelkopf in #425
- Document task name wildcard support in README by @haileyschoelkopf in #435
- Add non-programmatic BIG-bench-hard tasks by @yurodiviy in #406
- Updated handling for device in lm_eval/models/gpt2.py by @nikhilpinnaparaju in #447
- [WIP, Refactor] Staging more changes by @haileyschoelkopf in #465
- [Refactor, WIP] Multiple Choice + loglikelihood_rolling support for YAML tasks by @haileyschoelkopf in #467
- Configurable-Tasks by @lintangsutawika in #438
- single GPU automatic batching logic by @fattorib in #394
- Fix bugs introduced in #394 #406 and max length bug by @juletx in #472
- Sort task names to keep the same order always by @juletx in #474
- Set PAD token to EOS token by @nikhilpinnaparaju in #448
- [Refactor] Add decorator for registering YAMLs as tasks by @haileyschoelkopf in #486
- fix adaptive batch crash when there are no new requests by @jquesnelle in #490
- Add multilingual datasets (XCOPA, XStoryCloze, XWinograd, PAWS-X, XNLI, MGSM) by @juletx in #426
- Create output path directory if necessary by @janEbert in #483
- Add results of various models in json and md format by @juletx in #477
- Update config by @lintangsutawika in #501
- P3 prompt task by @lintangsutawika in #493
- Evaluation Against Portion of Benchmark Data by @kenhktsui in #480
- Add option to dump prompts and completions to a JSON file by @juletx in #492
- Add perplexity task on arbitrary JSON data by @janEbert in #481
- Update config by @lintangsutawika in #520
- Data Parallelism by @fattorib in #488
- Fix mgpt fewshot by @lintangsutawika in #522
- Extend `dtype` command line flag to `HFLM` by @haileyschoelkopf in #523
- Add support for loading GPTQ models via AutoGPTQ by @gakada in #519
- Change type signature of `quantized` and its default value for python < 3.11 compatibility by @passaglia in #532
- Fix LLaMA tokenization issue by @gakada in #531
- [Refactor] Make promptsource an extra / not required for installation by @haileyschoelkopf in #542
- Move spaces from context to continuation by @gakada in #546
- Use max_length in AutoSeq2SeqLM by @gakada in #551
- Fix typo by @kwikiel in #557
- Add load_in_4bit and fix peft loading by @gakada in #556
- Update task_guide.md by @haileyschoelkopf in #564
- [Refactor] Non-greedy generation ; WIP GSM8k yaml by @haileyschoelkopf in #559
- Dataset metric log [WIP] by @lintangsutawika in #560
- Add Anthropic support by @zphang in #562
- Add MultipleChoiceExactTask by @gakada in #537
- Revert "Add MultipleChoiceExactTask" by @StellaAthena in #568
- [Refactor] [WIP] New YAML advanced docs by @haileyschoelkopf in #567
- Remove the registration of "GPT2" as a model type by @StellaAthena in #574
- [Refactor] Docs update by @haileyschoelkopf in #577
- Better docs by @lintangsutawika in #576
- Update evaluator.py cache_db argument str if model is not str by @poedator in #575
- Add --max_batch_size and --batch_size auto:N by @gakada in #572
- [Refactor] ALL_TASKS now maintained (not static) by @haileyschoelkopf in #581
- Fix seqlen issues for bloom, remove extraneous OPT tokenizer check by @haileyschoelkopf in #582
- Fix non-callable attributes in CachingLM by @gakada in #584
- Add error handling for calling `.to(device)` by @haileyschoelkopf in #585
- fixes some minor issues on tasks. by @lintangsutawika in #580
- Add - 4bit-related args by @SONG-WONHO in #579
- Fix triviaqa task by @seopbo in #525
- [Refactor] Addressing Feedback on new docs pages by @haileyschoelkopf in #578
- Logging Samples by @farzanehnakhaee70 in #563
- Merge master into big-refactor by @gakada in #590
- [Refactor] Package YAMLs alongside pip installations of lm-eval by @haileyschoelkopf in #596
- fixes for multiple_choice by @lintangsutawika in #598
- add openbookqa config by @farzanehnakhaee70 in #600
- [Refactor] Model guide docs by @haileyschoelkopf in #606
- [Refactor] More MCQA fixes by @haileyschoelkopf in #599
- [Refactor] Hellaswag by @nopperl in #608
- [Refactor] Seq2Seq Models with Multi-Device Support by @fattorib in #565
- [Refactor] CachingLM support via `--use_cache` by @haileyschoelkopf in #619
- [Refactor] batch generation better for `hf` model ; deprecate `hf-causal` in new release by @haileyschoelkopf in #613
- [Refactor] Update task statuses on tracking list by @haileyschoelkopf in #629
- [Refactor] `device_map` options for `hf` model type by @haileyschoelkopf in #625
- [Refactor] Misc. cleanup of dead code by @haileyschoelkopf in #609
- [Refactor] Log request arguments to per-sample json by @haileyschoelkopf in #624
- [Refactor] HellaSwag YAML fix by @nopperl in #639
- [Refactor] Add caveats to `parallelize=True` docs by @haileyschoelkopf in #638
- fixed super_glue and removed unused yaml config by @lintangsutawika in #645
- [Refactor] Fix sample logging by @haileyschoelkopf in #646
- Add PEFT, quantization, remote code, LLaMA fix by @gakada in #644
- [Refactor] Handle `cuda:0` device assignment by @haileyschoelkopf in #647
- [refactor] Add prost config by @farzanehnakhaee70 in #640
- [Refactor] Misc. bugfixes ; edgecase quantized models by @haileyschoelkopf in #648
- Update `__init__.py` by @lintangsutawika in #650
- [Refactor] Add Lambada Multilingual by @haileyschoelkopf in #658
- [Refactor] Add: SWAG,RACE,Arithmetic,Winogrande,PubmedQA by @fattorib in #627
- [refactor] Add qa4mre config by @farzanehnakhaee70 in #651
- Update `generation_kwargs` by @lintangsutawika in #657
- [Refactor] Move race dataset on HF to EleutherAI group by @fattorib in #661
- [Refactor] Add Headqa by @haileyschoelkopf in #659
- [Refactor] Add Unscramble ; Toxigen ; Hendrycks_Ethics ; MathQA by @haileyschoelkopf in #660
- [Refactor] Port TruthfulQA (mc1 only) by @nopperl in #666
- [Refactor] Miscellaneous fixes by @haileyschoelkopf in #676
- [Refactor] Patch to revamp-process by @haileyschoelkopf in #678
- Revamp process by @lintangsutawika in #671
- [Refactor] Fix padding ranks by @haileyschoelkopf in #679
- [Refactor] minor edits by @baberabb in #680
- [Refactor] Migrate ANLI tasks to yaml by @yeoedward in #682
- edited output_path and added help to args by @baberabb in #684
- [Refactor] Minor changes by @haileyschoelkopf in #685
- [Refactor] typo by @baberabb in #687
- [Test] fix test_evaluator.py by @baberabb in #675
- Fix dummy model not invoking super class constructor by @yeoedward in #688
- [Refactor] Migrate webqs task to yaml by @yeoedward in #689
- [Refactor] Fix tests by @baberabb in #693
- [Refactor] Migrate xwinograd tasks to yaml by @yeoedward in #695
- Early stop bug of greedy_until (primary_until should be a list of str) by @ZZR0 in #700
- Remove condition to check for `winograd_schema` by @lintangsutawika in #690
- [Refactor] Use console script by @lintangsutawika in #703
- [Refactor] Fixes for when using `num_fewshot` by @lintangsutawika in #702
- [Refactor] Updated anthropic to new API by @baberabb in #710
- [Refactor] Cleanup for `big-refactor` by @haileyschoelkopf in #686
- Update README.md by @lintangsutawika in #720
- [Refactor] Benchmark scripts by @lintangsutawika in #612
- [Refactor] Fix Max Length arg by @lintangsutawika in #723
- Add note about MPS by @StellaAthena in #728
- Update huggingface.py by @lintangsutawika in #730
- Update README.md by @StellaAthena in #732
- [Refactor] Port over Autobatching by @fattorib in #673
- [Refactor] Fix Anthropic Import and other fixes by @lintangsutawika in #724
- [Refactor] Remove Unused Variable in Make-Table by @lintangsutawika in #734
- [Refactor] logiqav2 by @baberabb in #711
- [Refactor] Fix task packaging by @yeoedward in #739
- [Refactor] fixed openai by @baberabb in #736
- [Refactor] added some typehints by @baberabb in #742
- [Refactor] Port Babi task by @haileyschoelkopf in #752
- [Refactor] CrowS-Pairs by @haileyschoelkopf in #751
- Update README.md by @haileyschoelkopf in #745
- [Refactor] add xcopa by @lintangsutawika in #749
- Update README.md by @lintangsutawika in #764
- [Refactor] Add Blimp by @lintangsutawika in #763
- [Refactor] Use evaluation mode for accelerate to prevent OOM by @tju01 in #770
- Patch Blimp by @lintangsutawika in #768
- [Refactor] Speedup hellaswag context building by @haileyschoelkopf in #774
- [Refactor] Patch crowspairs higher_is_better by @haileyschoelkopf in #766
- [Refactor] XNLI by @lintangsutawika in #776
- [Refactor] Update Benchmark by @lintangsutawika in #777
- [WIP] Update API docs in README by @haileyschoelkopf in #747
- [Refactor] Real Toxicity Prompts by @aflah02 in #725
- [Refactor] XStoryCloze by @lintangsutawika in #759
- [Refactor] Glue by @lintangsutawika in #761
- [Refactor] Add triviaqa by @lintangsutawika in #758
- [Refactor] Paws-X by @lintangsutawika in #779
- [Refactor] MC Taco by @lintangsutawika in #783
- [Refactor] Truthfulqa by @lintangsutawika in #782
- [Refactor] fix doc_to_target processing by @lintangsutawika in #786
- [Refactor] Add README.md by @lintangsutawika in #757
- [Refactor] Don't always require Perspective API key to run by @haileyschoelkopf in #788
- [Refactor] Added HF model test by @baberabb in #791
- [Big refactor] HF test fixup by @baberabb in #793
- [Refactor] Process Whitespace for greedy_until by @lintangsutawika in #781
- [Refactor] Fix metrics in Greedy Until by @lintangsutawika in #780
- Update README.md by @Wehzie in #803
- Merge Fix metrics branch by @uSaiPrashanth in #802
- [Refactor] Update docs by @lintangsutawika in #744
- [Refactor] Superglue T5 Parity by @lintangsutawika in #769
- Update main.py by @lintangsutawika in #817
- [Refactor] Coqa by @lintangsutawika in #820
- [Refactor] drop by @lintangsutawika in #821
- [Refactor] Asdiv by @lintangsutawika in #813
- [Refactor] Fix IndexError by @lintangsutawika in #819
- [Refactor] toxicity: API inside function by @baberabb in #822
- [Refactor] wsc273 by @lintangsutawika in #807
- [Refactor] Bump min accelerate version and update documentation by @fattorib in #812
- Add mypy baseline config by @ethanhs in #809
- [Refactor] Fix wikitext task by @haileyschoelkopf in #833
- [Refactor] Add WMT tasks by @haileyschoelkopf in #775
- [Refactor] consolidated tasks tests by @baberabb in #831
- Update README.md by @lintangsutawika in #838
- [Refactor] mgsm by @lintangsutawika in #784
- [Refactor] Add top-level import by @haileyschoelkopf in #830
- Add pyproject.toml by @ethanhs in #810
- [Refactor] Additions to docs by @haileyschoelkopf in #799
- [Refactor] Fix MGSM by @lintangsutawika in #845
- [Refactor] float16 MPS works in torch nightly by @baberabb in #853
- [Refactor] Update benchmark by @lintangsutawika in #850
- Switch to pyproject.toml based project metadata by @ethanhs in #854
- Use Dict to make the code python 3.8 compatible by @chrisociepa in #857
- [Refactor] NQopen by @baberabb in #859
- [Refactor] NQ-open by @haileyschoelkopf in #798
- Fix "local variable 'docs' referenced before assignment" error in write_out.py by @chrisociepa in #856
- [Refactor] 3.8 test compatibility by @baberabb in #863
- [Refactor] Cleanup dependencies by @haileyschoelkopf in #860
- [Refactor] Qasper, MuTual, MGSM (Native CoT) by @lintangsutawika in #840
- undefined type and output_type when using promptsource fixed by @Hojjat-Mokhtarabadi in #842
- [Refactor] Deactivate select GH Actions by @haileyschoelkopf in #871
- [Refactor] squadv2 by @lintangsutawika in #785
- [Refactor] Set python3.8 as allowed version by @haileyschoelkopf in #862
- Fix positional arguments in HF model generate by @chrisociepa in #877
- [Refactor] MATH by @baberabb in #861
- Create cot_yaml by @lintangsutawika in #870
- [Refactor] Port CSATQA to refactor by @haileyschoelkopf in #865
- [Refactor] CMMLU, C-Eval port ; Add fewshot config by @haileyschoelkopf in #864
- [Refactor] README.md for Asdiv by @lintangsutawika in #878
- [Refactor] Hotfixes to big-refactor by @haileyschoelkopf in #880
- Change Python Version to 3.8 in .pre-commit-config.yaml and GitHub Actions by @chrisociepa in #895
- [Refactor] Fix PubMedQA by @tmabraham in #890
- [Refactor] Fix error when calling `lm-eval` by @lintangsutawika in #899
- [Refactor] bigbench by @lintangsutawika in #852
- [Refactor] Fix wildcards by @haileyschoelkopf in #900
- Add transformation filters by @chrisociepa in #883
- [Refactor] Flan benchmark by @lintangsutawika in #816
- [Refactor] WIP: Add MMLU by @haileyschoelkopf in #753
- Added notable contributors to the citation block by @StellaAthena in #907
- [Refactor] Improve error logging by @baberabb in #908
- [Refactor] Add _batch_scheduler in greedy_until by @AndyWolfZwei in #912
- add belebele by @ManuelFay in #885
- Update README.md by @StellaAthena in #917
- [Refactor] Precommit formatting for Belebele by @lintangsutawika in #926
- [Refactor] change all mentions of `greedy_until` to `generate_until` by @lintangsutawika in #927
- [Refactor] Squadv2 updates by @lintangsutawika in #923
- [Refactor] Verbose by @lintangsutawika in #910
- [Refactor] Fix Unit Tests by @haileyschoelkopf in #905
- Fix `generate_until` rename by @haileyschoelkopf in #929
- [Refactor] Generate_until rename by @haileyschoelkopf in #931
- Fix "'tqdm' object is not subscriptable" error in huggingface.py when batch size is auto by @jasonkrone in #916
- [Refactor] Fix Default Metric Call by @lintangsutawika in #935
- Big refactor write out adaption by @MicPie in #937
- Update pyproject.toml by @lintangsutawika in #915
- [Refactor] Fix whitespace warning by @haileyschoelkopf in #949
- [Refactor] Update documentation by @haileyschoelkopf in #954
- [Refactor] fix two bugs when ran with qasper_bool and toxigen by @AndyWolfZwei in #934
- [Refactor] Describe local dataset usage in docs by @haileyschoelkopf in #956
- [Refactor] Update README, documentation by @haileyschoelkopf in #955
- [Refactor] Don't load MMLU auxiliary_train set by @haileyschoelkopf in #953
- [Refactor] Patch for Generation Until by @lintangsutawika in #957
- [Refactor] Model written eval by @lintangsutawika in #815
- [Refactor] Bugfix: AttributeError: 'Namespace' object has no attribute 'verbose' by @haileyschoelkopf in #966
- [Refactor] Mmlu subgroups and weight avg by @lintangsutawika in #922
- [Refactor] Remove deprecated `gold_alias` task YAML option by @haileyschoelkopf in #965
- [Refactor] Logging fixes by @haileyschoelkopf in #952
- [Refactor] fixes for alternative MMLU tasks. by @lintangsutawika in #981
- [Refactor] Alias fix by @lintangsutawika in #987
- [Refactor] Minor cleanup on base `Task` subclasses by @haileyschoelkopf in #996
- [Refactor] add squad from master by @lintangsutawika in #971
- [Refactor] Squad misc by @lintangsutawika in #999
- [Refactor] Fix CI tests by @haileyschoelkopf in #997
- [Refactor] will check if group_name is None by @lintangsutawika in #1001
- [Refactor] Bugfixes by @haileyschoelkopf in #1002
- [Refactor] Verbosity rework by @lintangsutawika in #958
- add description on task/group alias by @lintangsutawika in #979
- [Refactor] Upstream ggml from big-refactor branch by @haileyschoelkopf in #967
- [Refactor] Improve Handling of Stop-Sequences for HF Batched Generation by @haileyschoelkopf in #1009
- [Refactor] Update README by @baberabb in #1020
- [Refactor] Remove `examples/` folder by @haileyschoelkopf in #1018
- [Refactor] vllm support by @baberabb in #1011
- Allow Generation arguments on greedy_until reqs by @uSaiPrashanth in #897
- Social iqa by @StellaAthena in #1030
- [Refactor] BBH fixup by @haileyschoelkopf in #1029
- Rename bigbench.yml to default.yml by @StellaAthena in #1032
- [Refactor] Num_fewshot process by @lintangsutawika in #985
- [Refactor] Use correct HF model type for MBart-like models by @haileyschoelkopf in #1024
- [Refactor] Urgent fix by @lintangsutawika in #1033
- [Refactor] Versioning by @lintangsutawika in #1031
- fixes for sampler by @baberabb in #1038
- [Refactor] Update README.md by @lintangsutawika in #1046
- [refactor] mps requirement by @baberabb in #1037
- [Refactor] Additions to example notebook by @haileyschoelkopf in #1048
- Miscellaneous documentation updates by @StellaAthena in #1047
- [Refactor] add notebook for overview by @lintangsutawika in #1025
- Update README.md by @StellaAthena in #1049
- [Refactor] Openai completions by @lintangsutawika in #1008
- [Refactor] Added support for OpenAI ChatCompletions by @DaveOkpare in #839
- [Refactor] Update docs ToC by @haileyschoelkopf in #1051
- [Refactor] Fix fewshot cot mmlu descriptions by @lintangsutawika in #1060
New Contributors
- @fattorib made their first contribution in #373
- @Thartvigsen made their first contribution in #377
- @aflah02 made their first contribution in #379
- @sxjscience made their first contribution in #390
- @Jeffwan made their first contribution in #403
- @zanussbaum made their first contribution in #414
- @ret2libc made their first contribution in #416
- @philwee made their first contribution in #422
- @yurodiviy made their first contribution in #406
- @nikhilpinnaparaju made their first contribution in #447
- @lintangsutawika made their first contribution in #438
- @juletx made their first contribution in #472
- @janEbert made their first contribution in #483
- @kenhktsui made their first contribution in #480
- @passaglia made their first contribution in #532
- @kwikiel made their first contribution in #557
- @poedator made their first contribution in #575
- @SONG-WONHO made their first contribution in #579
- @seopbo made their first contribution in #525
- @farzanehnakhaee70 made their first contribution in #563
- @nopperl made their first contribution in #608
- @yeoedward made their first contribution in #682
- @ZZR0 made their first contribution in #700
- @tju01 made their first contribution in #770
- @Wehzie made their first contribution in #803
- @uSaiPrashanth made their first contribution in #802
- @ethanhs made their first contribution in #809
- @chrisociepa made their first contribution in #857
- @Hojjat-Mokhtarabadi made their first contribution in #842
- @AndyWolfZwei made their first contribution in #912
- @ManuelFay made their first contribution in #885
- @jasonkrone made their first contribution in #916
- @MicPie made their first contribution in #937
- @DaveOkpare made their first contribution in #839
Full Changelog: v0.3.0...v0.4.0