This folder includes run time benchmark scripts for vtext
To run the benchmarks download the following datasets,
- an (adapted) copy of the 20 newsgroup dataset here, and extract
the contents under
vtext/data/
. - the UD Treebanks v2.3 and extract them under
vtext/ud-treebanks-v2.3/
Various benchmark scrips can then be run in Python. Optional dependencies include,
- scikit-learn >=0.20
- nltk
- spacy
- python-Levenshtein
- blingfire
and are used as a performance baseline.