Add callback to optionally "repair" fields #24
Comments
Hi! Yeah... that functionality should be out of the scope of the project, but heck, why not? In fact, almost everything in mrz.checker is already off target xDD

Because almost all of the project (especially checker) has been built from other people's requests and some ideas of mine (some very bad), I now realize that I should have planned many things differently. I'm actually trying to fix some of those bad ideas now, specifically the horrible _Report class. Please give me a few days to finish what I'm doing with checker and we'll see what we can do.

I don't know what you will think, but one option could be to transliterate the desired chars with a dict, the same way mrz.generator does with surnames and given names. Something like this:

    def __init__(self, mrz_code: str, check_expiry=False, compute_warnings=False, ocr_transliteration=None):
        """
        Params:
            mrz_code (str): MRZ string of a TD1. Must be 90 uppercase characters long
            check_expiry (bool): If set to True, an expired document, or an expiry_date more than
                                 10 years away, is reported as a warning
            compute_warnings (bool): If set to True, warnings make the result False
            ocr_transliteration (dict): Transliteration dictionary for OCR purposes. None by default
        """
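To make the dict idea concrete, here is a minimal sketch of what such a parameter might do. It is an assumption, not code from mrz.checker: `apply_ocr_transliteration` and the sample mapping are hypothetical names, and only the standard `str.maketrans` and `str.translate` are used. It also shows why applying one dict to the whole MRZ code is risky: a mapping that repairs a letters-only field damages a digits-only one.

```python
def apply_ocr_transliteration(text, ocr_transliteration=None):
    """Replace every character that has an entry in the dict (hypothetical helper)."""
    if not ocr_transliteration:
        # None (the proposed default) or an empty dict: leave the text untouched.
        return text
    return text.translate(str.maketrans(ocr_transliteration))


# Illustrative mapping for fields that may only contain letters.
digits_to_letters = {"0": "O", "5": "S"}

print(apply_ocr_transliteration("R0U", digits_to_letters))     # ROU
print(apply_ocr_transliteration("740812", digits_to_letters))  # 74O812 (a date gets damaged)
print(apply_ocr_transliteration("R0U"))                        # R0U (no dict, no change)
```

The second call is the reason a single dict probably has to be scoped to specific fields rather than to the full string.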
[...] I have some doubts:
EDIT: It's too late here. Please let me think about it a little more calmly. If I can't think of anything better, yours might be a good solution. By the way.. One of the rules for using classes that inherit from |
> - Should some specific fields be repaired or could it be applied to
> all mrz code?

I guess it makes sense to apply it to all the fields that have
constraints on what data they can contain. It is probably not
useful to have a callback for "this field may contain anything", but it is if
one knows the field is characters only, or digits only, or a date...

> - Are corrections usually the same for everyone, or does each person
> have their own specific corrections? I ask this to add a dictionary to the
> project (or several if there are not many)

It would probably be the same for everybody if the MRZ source is the
same (i.e. if I scan 1000 ID cards, then they will probably all have the
same classes of errors).
Your ocr_transliteration dict could be something along the lines:
```
{
'alpha': callback_for_replacing_numbers,
'digit': callback_for_replacing_chars,
}
```
Perhaps having specialisations for 'date' might make sense, and fall
back to 'digit' if not present?
I'm partial to having callbacks instead of just a static mapping
dictionary (1->L, 5->S), but I can live with the static mapping too. The
transliteration should run before the hash checks and the other sanity
checks.
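A rough sketch of how such a callback dict could be dispatched, including the 'date' to 'digit' fallback. Everything here is hypothetical: `repair_field`, `FALLBACKS`, the two callbacks and the character pairs (0/O, 1/I, 5/S) are illustrative assumptions, not part of mrz.checker.

```python
def digits_to_letters(value):
    # For letters-only fields: turn look-alike digits back into letters.
    return value.translate(str.maketrans("015", "OIS"))


def letters_to_digits(value):
    # For digits-only fields: turn look-alike letters back into digits.
    return value.translate(str.maketrans("OIS", "015"))


ocr_transliteration = {
    "alpha": digits_to_letters,
    "digit": letters_to_digits,
}

# A kind without its own callback falls back to a more general one.
FALLBACKS = {"date": "digit"}


def repair_field(value, kind, callbacks):
    """Run the callback registered for this kind of field, if there is one."""
    callback = callbacks.get(kind)
    if callback is None:
        callback = callbacks.get(FALLBACKS.get(kind))
    return callback(value) if callback else value


print(repair_field("SZ0BO5ZLAI", "alpha", ocr_transliteration))  # SZOBOSZLAI
print(repair_field("74O812", "date", ocr_transliteration))       # 740812
print(repair_field("L898902C3", "any", ocr_transliteration))     # L898902C3
```

Fields whose kind has no callback (here "any") pass through unchanged, which matches the point that a callback is not useful for a field that may contain anything.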
|
This sounds really great. Right now, my way of solving that issue is:
This surely applies to TD2, TD3 and the other formats too. Are there still plans to work on this? |
I made a commitment to add this feature a long time ago and have not kept my word. I'm not normally like that, but my current circumstances have robbed me of most of my time. When @mjl created this issue I thought about giving "a twist to his idea", but the truth is that I have neither the time nor the experience in CV to do it. YES, OF COURSE YOUR PR WILL BE WELCOME, and you will have my eternal gratitude 🥇. Ideally, it would work for all documents. If you propose a PR we can look at it (if possible, and if @mjl is not too angry, he could also get involved or at least give his opinion). Thank you very much in advance |
@arg0s Don't worry, we all fall off the train sometimes when life
happens.
The feature is on my back burner too at this moment in time, but if
anyone has ideas/comments/code, feel free to discuss here!
|
I'm not really sure this functionality belongs here, but as the knowledge of the MRZ internal structure is only present in this module, why not... let me know what you think!
I work with scanned MRZs and, as is usual with that process, the OCR sometimes misreads similar characters. For example, I have seen a country read as "R0U" or a name as "SZ0BO5ZLAI". The MRZ checker correctly warns that the nationality or the identifier is not valid. However, if you could add a method `repair()` to the checkers, that would allow me to do things like:
This would make the checker more useful when presented with badly scanned data.
The alternative would be that I somehow preprocess the MRZ, but then I would have to re-implement the MRZ structure definition in my code too. As said above, I'm not a big fan of shoehorning that functionality into this module, but I don't see any other place that has enough knowledge of the MRZ structure.
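To illustrate why the repair step needs the MRZ structure, here is a hedged sketch for the second line of a TD3 (passport) MRZ. None of these names exist in mrz.checker; the field offsets and the 7-3-1 check digit follow my reading of ICAO Doc 9303, and the sample is the specimen line from that document with two OCR errors injected by hand ("UT0" and "74O812").

```python
# Hypothetical look-alike tables; the character pairs are an assumption.
TO_ALPHA = str.maketrans("01258", "OIZSB")
TO_DIGIT = str.maketrans("OIZSB", "01258")

# (name, start, end, kind) for TD3 line 2; "any" fields are left alone.
TD3_LINE2_FIELDS = [
    ("document_number", 0, 9, "any"),
    ("document_number_hash", 9, 10, "digit"),
    ("nationality", 10, 13, "alpha"),
    ("birth_date", 13, 19, "digit"),
    ("birth_date_hash", 19, 20, "digit"),
    ("sex", 20, 21, "alpha"),
    ("expiry_date", 21, 27, "digit"),
    ("expiry_date_hash", 27, 28, "digit"),
    ("optional_data", 28, 42, "any"),
    ("optional_data_hash", 42, 43, "digit"),
    ("final_hash", 43, 44, "digit"),
]


def repair_td3_line2(line):
    """Repair each constrained field with the table matching its kind."""
    parts = []
    for _name, start, end, kind in TD3_LINE2_FIELDS:
        chunk = line[start:end]
        if kind == "alpha":
            chunk = chunk.translate(TO_ALPHA)
        elif kind == "digit":
            chunk = chunk.translate(TO_DIGIT)
        parts.append(chunk)
    return "".join(parts)


def check_digit(field):
    """Check digit with weights 7, 3, 1 (digits as-is, A=10..Z=35, '<'=0)."""
    total = 0
    for i, ch in enumerate(field):
        if ch in "0123456789":
            value = int(ch)
        elif ch == "<":
            value = 0
        else:
            value = ord(ch) - ord("A") + 10
        total += value * (7, 3, 1)[i % 3]
    return str(total % 10)


scanned = "L898902C36UT074O8122F1204159ZE184226B<<<<<10"
repaired = repair_td3_line2(scanned)
print(repaired)  # L898902C36UTO7408122F1204159ZE184226B<<<<<10
print(check_digit(repaired[13:19]) == repaired[19])  # True: birth date hash now matches
```

The repair runs before the check digit comparison, so the hash is verified against the corrected field rather than the misread one. The alphanumeric document number is deliberately left untouched, since there is no way to tell a real "0" from a misread "O" there.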