A project to mark up A Middle English Vocabulary by J. R. R. Tolkien and extract lexical information in a more machine-actionable form.
- `43737-0.txt`: text file from Project Gutenberg
- `43737-h.htm`: HTML file from Project Gutenberg
- `5KJeiE-middleenglishvoc00tolkuoft.pdf`: PDF scan from archive.org
- `corrected.txt`: corrected file (mostly fixes for transcription errors), which also fixes the following errors in the printed version:
  - Wlaffyng is missing `[`
  - Ȝa, Ȝaa has `OE` for `OE.` in etymology
  - Ver(r)ay has `OF.` for `OFr.` in etymology
  - Noþeles has `OE` for `OE.` in etymology
  - Werkman, Workeman has `OE` for `OE.` in etymology
  - Goddesse has `OE` for `OE.` in etymology
  - Dedir has `MnE.` for `Mn.E.` in etymology
  - Breue has `Med. L.` for `Med.L.` in etymology
  - Danes has `Med. L.` for `Med.L.` in etymology
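Most of the printed-version errata above are variant spellings of language abbreviations. As a rough illustration only (this is not the code in this repository, and `normalize_abbreviations` is a hypothetical name), the four variants listed could be mapped to their canonical forms like this:

```python
import re

# Variant -> canonical form, taken only from the errata listed above.
# Each pattern is written so that the canonical form itself never matches.
ABBREVIATION_FIXES = [
    (re.compile(r"\bOE\b(?!\.)"), "OE."),       # "OE" with no full stop
    (re.compile(r"\bOF\."), "OFr."),            # "OF." for "OFr."
    (re.compile(r"\bMnE\."), "Mn.E."),          # "MnE." for "Mn.E."
    (re.compile(r"\bMed\. L\."), "Med.L."),     # stray space in "Med. L."
]


def normalize_abbreviations(etymology: str) -> str:
    """Return the etymology text with the known variants normalized."""
    for pattern, canonical in ABBREVIATION_FIXES:
        etymology = pattern.sub(canonical, etymology)
    return etymology
```

Applying such substitutions blindly to the whole file could change legitimate text, so a sketch like this is better suited to flagging candidates for manual review than to automatic correction.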
Current work consists of running `./check_etymologies.py` and building regular expressions in `etym_patterns.py` for etymological information, and running `./check_entries.py` and building regular expressions in `entry_patterns.py` for overall entry structure.
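To give a sense of what a pattern for etymological information might look like, here is a minimal sketch. It does not reproduce the contents of `etym_patterns.py`. It assumes that an etymology sits in square brackets at the end of an entry and consists of a language abbreviation followed by a source form; the names `ETYM_BLOCK`, `LANG_FORM` and `parse_etymology` are hypothetical, and the abbreviation set is limited to those mentioned in the errata list.

```python
import re

# Assumption: the etymology is the last [...] group on the entry line.
ETYM_BLOCK = re.compile(r"\[(?P<etym>[^\[\]]+)\]\s*$")

# Assumption: each component is "<language abbreviation>. <form>".
LANG_FORM = re.compile(
    r"(?P<lang>OE|OFr|Med\.L|Mn\.E)\.\s+(?P<form>[^\s;,.\]]+)"
)


def parse_etymology(entry_line):
    """Return a list of (language, form) pairs, or None if no bracket is found."""
    block = ETYM_BLOCK.search(entry_line)
    if block is None:
        return None
    return [
        (m.group("lang") + ".", m.group("form"))
        for m in LANG_FORM.finditer(block.group("etym"))
    ]


# Made-up placeholder line, not a real entry from the vocabulary.
print(parse_etymology("Sample, n. gloss. [OFr. foo; Med.L. bar.]"))
```

A checking script in this style could report every line for which the result is `None` or an empty list, which is one way to find entries the patterns do not yet cover.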
There are currently no dependencies for running the scripts above.
Source code is run through `black`, `isort` and `flake8`, which are all dev dependencies in `Pipfile`.
See Notes on Structure. If you have ideas about the TEI markup, create an issue for a particular entry, giving both the markup in `corrected.txt` and the proposed TEI markup, and we can discuss open questions there.
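When drafting proposed markup for an issue, a small script can help keep the XML well-formed. The sketch below is only one possible shape, not the markup this project has settled on: it uses elements from the TEI dictionaries module (`entry`, `form`, `orth`, `gramGrp`, `pos`, `sense`, `def`, `etym`, `lang`, `mentioned`), the function name `build_entry` is hypothetical, and the values passed in are placeholders rather than a real entry.

```python
import xml.etree.ElementTree as ET


def build_entry(headword, pos, gloss, lang, etymon):
    """Build one possible TEI <entry> element from already-extracted fields."""
    entry = ET.Element("entry")

    form = ET.SubElement(entry, "form", {"type": "lemma"})
    ET.SubElement(form, "orth").text = headword

    gram = ET.SubElement(entry, "gramGrp")
    ET.SubElement(gram, "pos").text = pos

    sense = ET.SubElement(entry, "sense")
    ET.SubElement(sense, "def").text = gloss

    etym = ET.SubElement(entry, "etym")
    lang_el = ET.SubElement(etym, "lang")
    lang_el.text = lang
    lang_el.tail = " "  # keep a space between the language and the form
    ET.SubElement(etym, "mentioned").text = etymon

    return entry


# Placeholder values only.
entry = build_entry("headword", "n.", "gloss", "OE.", "etymon")
print(ET.tostring(entry, encoding="unicode"))
```

Which elements best fit the dictionary's structure is exactly the kind of open question the issues are meant to settle.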
The underlying dictionary was published prior to 1923 and is considered to be in the public domain. The source material from Project Gutenberg is subject to the Project Gutenberg License.
Code is made available under an MIT License (see `LICENSE`).
Data is made available under a Creative Commons Attribution-ShareAlike 4.0 International Public License.