A Middle English Vocabulary

A project to mark up A Middle English Vocabulary by J. R. R. Tolkien and extract lexical information in a more machine-actionable form.

Source Material

43737-0.txt — text file from Project Gutenberg
43737-h.htm — HTML file from Project Gutenberg
5KJeiE-middleenglishvoc00tolkuoft.pdf — PDF scan from archive.org

Corrections

corrected.txt — corrected file (mostly transcription errors)

It also corrects the following errors in the printed version:

Wlaffyng is missing [
Ȝa, Ȝaa has OE for OE. in etymology
Ver(r)ay has OF. for OFr. in etymology
Noþeles has OE for OE. in etymology
Werkman, Workeman has OE for OE. in etymology
Goddesse has OE for OE. in etymology
Dedir has MnE. for Mn.E. in etymology
Breue has Med. L. for Med.L. in etymology
Danes has Med. L. for Med.L. in etymology
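
Most of the printed errors above are non-canonical language abbreviations, so they can be expressed as a small substitution table. The following is only a minimal sketch of that idea, not the code used in this repository: the `VARIANTS` table and the `normalize_abbreviations` function are hypothetical names, and the sample strings in the usage lines are invented for illustration rather than quoted from the dictionary.

```python
import re

# Hypothetical table: variant spelling (as a regex) -> canonical abbreviation.
# The four pairs are taken from the list of printed errors above.
VARIANTS = {
    r"\bOE\b(?!\.)": "OE.",      # "OE" printed without its period
    r"\bOF\.": "OFr.",           # "OF." printed for "OFr."
    r"\bMnE\.": "Mn.E.",         # "MnE." printed for "Mn.E."
    r"\bMed\. L\.": "Med.L.",    # "Med. L." printed with a stray space
}


def normalize_abbreviations(etymology: str) -> str:
    """Return the etymology with known variant abbreviations made canonical."""
    for pattern, canonical in VARIANTS.items():
        etymology = re.sub(pattern, canonical, etymology)
    return etymology


# Invented sample strings, for illustration only.
print(normalize_abbreviations("[OE foo]"))
print(normalize_abbreviations("[Med. L. bar]"))
```

Already-canonical forms such as `OE.` and `OFr.` are left untouched, so the function can be applied repeatedly without changing its result.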

Code for Patterns

Work currently proceeds on two fronts: running ./check_etymologies.py while building regular expressions in etym_patterns.py for etymological information, and running ./check_entries.py while building regular expressions in entry_patterns.py for the overall entry structure.

There are currently no dependencies for running the scripts above.
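
The check-and-refine loop described above can be sketched as follows. This is a hedged illustration that uses only the standard library; it does not reproduce the contents of etym_patterns.py or check_etymologies.py. The names `LANG`, `PATTERNS`, `classify` and `check`, the two pattern shapes, and the sample etymology strings are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical language-abbreviation fragment (a small subset, for illustration).
LANG = r"(?:OE|OFr|Med\.L|Mn\.E)\."

# Hypothetical named patterns; the real ones live in etym_patterns.py.
PATTERNS = {
    "single_source": re.compile(rf"\[{LANG} \S+\]"),
    "with_comparison": re.compile(rf"\[{LANG} \S+; cf\. {LANG} \S+\]"),
}


def classify(etymology):
    """Return the name of the first pattern that matches the whole string, or None."""
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(etymology):
            return name
    return None


def check(etymologies):
    """Count matches per pattern and collect the strings that no pattern covers."""
    counts = Counter()
    unmatched = []
    for etymology in etymologies:
        name = classify(etymology)
        if name is None:
            unmatched.append(etymology)
        else:
            counts[name] += 1
    return counts, unmatched


# Invented sample etymologies, for illustration only.
counts, unmatched = check(["[OE. foo]", "[OFr. bar; cf. OE. baz]", "[OE foo]"])
print(dict(counts))
print(unmatched)
```

Each string left in `unmatched` points to either a pattern that still needs to be written or an error in the text, such as the missing period in the last sample.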

Source code is run through black, isort and flake8, which are all dev dependencies in Pipfile.

Planning the TEI Markup

See Notes on Structure. If you have ideas about the TEI markup, create an issue for a particular entry, giving both the markup in corrected.txt and the proposed TEI markup, and we can discuss open questions there.

License

The underlying dictionary was published prior to 1923 and is considered to be in the public domain. The source material from Project Gutenberg is subject to the Project Gutenberg License.

Code is made available under an MIT License (see LICENSE).

Data is made available under a Creative Commons Attribution-ShareAlike 4.0 International Public License.
