This is a library for detecting and generating IDN homographs. It is based on the ShamFinder framework, a research project published on arXiv by Mori et al.
The library is available as a package on PyPI and can be installed with pip:
pip install homograph
Alternatively, you can get the library by cloning this repository:
git clone https://github.com/FlowCrypt/idn-homographs-database.git
cd idn-homographs-database
With that done, we can try to detect a homograph. Let's replace the lowercase "l" in flowcrypt.com with the digit "1":
>>> import homograph
>>> homograph.looks_similar('flowcrypt.com', 'f1owcrypt.com')
True
Voilà! Now let's see how the library handles a non-homograph, where the "p" in flowcrypt.com has been replaced by an "l":
>>> homograph.looks_similar('flowcrylt.com', 'flowcrypt.com')
False
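To make the idea behind these two results concrete, here is a minimal, self-contained sketch of a position-by-position similarity check. It is not how the homograph library works internally: the library builds on the ShamFinder work, while the `CONFUSABLE_PAIRS` table and the `looks_similar_sketch` function below are hypothetical, hand-picked for illustration only.

```python
# Hypothetical, hand-picked confusable pairs for illustration only.
# The homograph library derives its similarity data from the ShamFinder
# work, not from this table.
CONFUSABLE_PAIRS = [
    ("l", "1"),
    ("l", "I"),
    ("o", "0"),
    ("a", "\u00e0"),  # Latin small a with grave accent
    ("c", "\u0441"),  # Cyrillic small letter es
]


def _build_table(pairs):
    """Turn a list of pairs into a symmetric char -> set-of-lookalikes map."""
    table = {}
    for x, y in pairs:
        table.setdefault(x, set()).add(y)
        table.setdefault(y, set()).add(x)
    return table


CONFUSABLES = _build_table(CONFUSABLE_PAIRS)


def chars_similar(x: str, y: str) -> bool:
    """True if two characters are identical or listed as confusable."""
    return x == y or y in CONFUSABLES.get(x, set())


def looks_similar_sketch(a: str, b: str) -> bool:
    """Hypothetical check: equal length and every position is similar."""
    if len(a) != len(b):
        return False
    return all(chars_similar(x, y) for x, y in zip(a, b))


print(looks_similar_sketch("flowcrypt.com", "f1owcrypt.com"))  # True
print(looks_similar_sketch("flowcrylt.com", "flowcrypt.com"))  # False
```

The second call returns False because "l" and "p" are not listed as lookalikes, mirroring the library's answer above. A real detector needs a far larger similarity table and may treat identical strings or length-changing substitutions differently than this sketch does.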
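In addition to detecting homographs, the library can also be used offensively to generate them. generate_similar_strings() returns a generator, so candidates are produced one at a time with next():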
>>> homograph_generator = homograph.generate_similar_strings('a.b.c')
>>> next(homograph_generator)
'à.h.o'
>>> next(homograph_generator)
'à.h.𑣎'
>>> next(homograph_generator)
'à.h.с'