Bilbo2 is open source software for the automatic annotation of bibliographic references. It segments and tags the references found in an input XML document. Rewritten from scratch in Python 3, it is the successor to BILBO. Compared to its predecessor, particular attention has been paid to making it easy to add new machine learning algorithms and to test parameters. It is suitable both for live systems and for research.
Bilbo2 requires the following dependencies:
- python3.5
- git >= 1.7.10 (needed by github)
- pip and setuptools, needed to run the Python installation
- libxml2-dev
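Before installing, it can help to confirm that the prerequisites listed above are present. The following is a minimal sketch, not part of Bilbo2: the function name `check_prerequisites` is hypothetical, and it assumes that "python3.5" means Python 3.5 or later. It does not check the git version or the presence of libxml2-dev, which are system-level packages.

```python
import importlib.util
import shutil
import sys

# Assumption: the "python3.5" entry in the dependency list is a minimum version.
MIN_PYTHON = (3, 5)


def check_prerequisites():
    """Return a dict mapping each prerequisite name to True (found) or False."""
    return {
        "python": sys.version_info >= MIN_PYTHON,
        # shutil.which returns None when the executable is not on PATH.
        "git": shutil.which("git") is not None,
        # find_spec returns None when the module cannot be imported.
        "pip": importlib.util.find_spec("pip") is not None,
        "setuptools": importlib.util.find_spec("setuptools") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print("{:<12}{}".format(name, "ok" if ok else "MISSING"))
```

Run as a script, it prints one line per prerequisite, marked either `ok` or `MISSING`.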
User installation
python3 setup.py install --user
The documentation includes more detailed installation instructions.
See the docs for complete CLI usage, and the examples for usage of the Python interface.
(C) Copyright 2019 OpenEdition, by Mathieu Orban. Main contributors are Yann Weber and Jérémy Trione. Special acknowledgements to Yoann Dupont (https://github.com/YoannDupont).
Bilbo2 is free and open source. This project is licensed under the GNU Affero General Public License; see the LICENSE.txt file for details.
Currently, it is based on Conditional Random Fields (CRFs), a machine learning technique for segmenting and labeling sequence data, and on Support Vector Machines (SVMs), a machine learning technique for classifying data.
As external software, python-crfsuite is used for CRF learning and inference, and LibSVM is used for sequence classification.
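To illustrate what "labeling sequence data" means for a bibliographic reference, here is a minimal sketch of how a reference string could be turned into per-token feature dictionaries, the kind of input a linear-chain CRF consumes. This is not Bilbo2's actual feature set: the function names, the features and the sample reference are all hypothetical, and only the Python standard library is used.

```python
import re


def tokenize(reference):
    """Split a reference string into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", reference)


def token_features(tokens, i):
    """Describe token i with a few illustrative (hypothetical) features."""
    tok = tokens[i]
    feats = {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_digit": tok.isdigit(),
        "is_punct": len(tok) == 1 and not tok.isalnum(),
        # Rough year pattern: 1500-1999 or 2000-2099.
        "looks_like_year": bool(re.fullmatch(r"(1[5-9]|20)\d{2}", tok)),
    }
    # Context features let the model use neighbouring tokens.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True  # beginning of sequence
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True  # end of sequence
    return feats


def sequence_features(reference):
    """Return the tokens of a reference and one feature dict per token."""
    tokens = tokenize(reference)
    return tokens, [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Made-up reference string, for illustration only.
    tokens, feats = sequence_features("Orban, M. 2019. Bilbo2.")
    for tok, f in zip(tokens, feats):
        print(tok, f)
```

A CRF is trained on such feature sequences paired with one label per token (for example author, date or title) and then predicts the most likely label sequence for unseen references. With python-crfsuite, sequences of this shape can be passed to `pycrfsuite.Trainer.append(xseq, yseq)`. How Bilbo2 builds its own features is described in its documentation.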
- Python-crfsuite: machine learning tools to segment and label sequence data with linear-chain CRFs.
- LibSVM: a library for Support Vector Machines, by Chih-Chung Chang and Chih-Jen Lin.
- Lxml: library for processing XML and HTML in the Python language.
- setuptools: to install Bilbo2.
- langdetect: language detection.
- Source code: https://github.com/OpenEdition/bilbo2
- Issue tracker: https://github.com/OpenEdition/bilbo2/Issues
Feel free to submit ideas, bug reports, pull requests or regular patches.
To run the tests, launch:
cd bilbo2
python3 -m bilbo.tests.tests