This repository contains experimental data and code for fast approximate modelling of the combination result, used to implement a stopping rule for video stream text recognition.
To calculate the estimation values with the evaluated methods and to measure execution time, use the precalculations.ipynb notebook (function precalculate(field_types, method, parallel_processes)). To run only the analysis and construct the plots (using the precalculated values), run comparison.ipynb. For our experiments the code was executed using Python 3.7.4 under Jupyter Notebook version 6.0.1.
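For reference, a precalculation run from the notebook could look like the sketch below. The function name and its parameters come from precalculations.ipynb; the field type names match the data directories described below, while the method identifiers ("base", "summation", "treap") and the number of parallel processes are assumptions used only for illustration.

# Hypothetical invocation of the precalculation routine defined in
# precalculations.ipynb. The method identifiers and the parallel_processes
# value are assumptions for illustration, not documented arguments.
field_types = ["date", "docnum", "latin", "mrz"]

for method in ["base", "summation", "treap"]:
    precalculate(field_types, method, parallel_processes=4)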
Description of each source code file and data directory follows.
1. metrics.py - Python module containing implementations of the basic functions required for the experiments;
2. combination.py - Python module containing an implementation of the text string combination algorithm described in [this article], used for computing the expected distance estimation with the [base stopping method];
3. combination_with_estimation.py - Python module containing a modified implementation of the text string combination algorithm, used for computing the expected distance estimation with Method A and Method B;
4. treap.py - Python module containing an implementation of a balanced binary search tree, a [treap with random priorities] (see the generic sketch after this list);
5. precalculations.ipynb - Jupyter notebook with code for precalculation of the error level (distance from the combined result to the correct text field value), the estimation value, and timing;
6. comparison.ipynb - Jupyter notebook with code for comparing the stopping methods and constructing the plots (Figures 3, 4, and 5);
7. prepare_ic15_yvt.ipynb - Jupyter notebook with code for preparing the IC15-Train and YVT datasets.
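The repository's treap.py is tailored to the combination algorithm; purely as a point of reference, a generic minimal sketch of a treap with random priorities (split/merge style) looks roughly as follows. The names used here (Node, split, merge, insert) are illustrative and are not the module's actual API.

import random

class Node:
    """Treap node: ordered as a BST by key and as a max-heap by a random priority."""
    def __init__(self, key):
        self.key = key
        self.priority = random.random()
        self.left = None
        self.right = None

def split(node, key):
    """Split a treap into two treaps: keys < key and keys >= key."""
    if node is None:
        return None, None
    if node.key < key:
        left, right = split(node.right, key)
        node.right = left
        return node, right
    left, right = split(node.left, key)
    node.left = right
    return left, node

def merge(a, b):
    """Merge two treaps, assuming every key in `a` precedes every key in `b`."""
    if a is None:
        return b
    if b is None:
        return a
    if a.priority > b.priority:
        a.right = merge(a.right, b)
        return a
    b.left = merge(a, b.left)
    return b

def insert(root, key):
    """Insert a key: split around it, then merge the pieces back with a new node."""
    left, right = split(root, key)
    return merge(merge(left, Node(key)), right)

# Example: build a treap and read the keys back in sorted order.
def inorder(node):
    return inorder(node.left) + [node.key] + inorder(node.right) if node else []

root = None
for k in [5, 1, 4, 2, 3]:
    root = insert(root, k)
print(inorder(root))  # [1, 2, 3, 4, 5]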
1. data_<DATASET>/ - directory with text field recognition results, used as a dataset for the experiments. Document fields were taken from the [MIDV-500 dataset] and the [MIDV-2019 dataset] and recognized using the text field recognition subsystem of [Smart IDReader]. Fields are grouped into four field types (data_<DATASET>/date - numeric dates, data_<DATASET>/docnum - document numbers, data_<DATASET>/latin - Latin name components, data_<DATASET>/mrz - MRZ lines). Each field clip is stored in Pickle format and has the filename format <FIELDTYPE>_<CLIPID>_<FIELDNAME>.pkl. Arbitrary text objects were taken from the [IC15 Text in Videos dataset] and the [YouTube Video Text dataset] and recognized using the pretrained @clovaai model [available here]; these objects have the field type none.
The Pickle data format for each field clip is the following:
{
    "clip_id": "<CLIPID>",
    "ideal": "<CORRECT_FIELD_VALUE>",
    "field_name": "<FIELDNAME>",
    "field_type": "<FIELDTYPE (date/docnum/latin/mrz)>",
    "clip": [                    // list of per-frame recognition results
        [                        // per-frame recognition result: a list of character classification results
            {                    // character classification result
                "A": 0.9435345435,   // "<CLASS_LABEL>": <MEMBERSHIP_ESTIMATION>
                "B": 0.0224234234,
                ...
            },
            ...
        ],
        ...
    ]
}
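As a minimal sketch, a clip stored in this format can be inspected as follows; the path used here keeps the placeholders and should be replaced with an actual file from one of the data_<DATASET>/ directories.

import pickle

# Illustrative path: substitute a real file from a data_<DATASET>/ directory.
with open("data_<DATASET>/date_<CLIPID>_<FIELDNAME>.pkl", "rb") as f:
    clip = pickle.load(f)

print(clip["field_type"], clip["field_name"], clip["ideal"])

# Take the recognition result of the first frame and print, for each character,
# the class label with the highest membership estimation.
first_frame = clip["clip"][0]
for char_result in first_frame:
    best_label = max(char_result, key=char_result.get)
    print(best_label, char_result[best_label])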
2. precalc_base_<DATASET>/ - directory with precalculated values for analysis, obtained with the [base stopping method]. Data files are generated using the corresponding functions in precalculations.ipynb. Each processed field clip is stored in JSON format and has the filename format <FIELDTYPE>_<CLIPID>_<FIELDNAME>_base_precalc.json.
The JSON data format for each field clip is the following:
[                        // list of stages (one per processed frame)
    [                    // data for the i-th stage
        0.02934239847,   // Normalized Generalized Levenshtein Distance from the combined result to the correct value (error level)
        0.00001239872,   // sum of modelled distances (see the expected distance estimation, eq. 2)
        0.00000023423,   // time required to obtain the combined result (seconds)
        0.00693821907    // time required to calculate the estimation value (seconds)
    ],
    ...
]
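As an illustration of how the four per-stage values can be consumed, the toy thresholding rule below stops at the first stage where the estimated expected distance falls under a fixed threshold; this is for demonstration only and is not the stopping method evaluated in the experiments.

import json

# Illustrative path: substitute a real file from precalc_base_<DATASET>/.
with open("precalc_base_<DATASET>/date_<CLIPID>_<FIELDNAME>_base_precalc.json") as f:
    stages = json.load(f)

# Toy rule: stop once the estimation value drops below a threshold and report
# the error level reached at that stage.
THRESHOLD = 0.001
for i, (error, estimation, t_combine, t_estimate) in enumerate(stages):
    if estimation < THRESHOLD:
        print(f"stop after frame {i + 1}: error level {error:.4f}, "
              f"estimation {estimation:.6f}, "
              f"per-frame time {t_combine + t_estimate:.6f} s")
        break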
3. precalc_summation_<DATASET>/ - directory with precalculated values for analysis, obtained with Method A. Data files are generated using the corresponding functions in precalculations.ipynb. The format is the same as for precalc_base_<DATASET>/.
4. precalc_treap_<DATASET>/ - directory with precalculated values for analysis, obtained with Method B. Data files are generated using the corresponding functions in precalculations.ipynb. The format is the same as for precalc_base_<DATASET>/.
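The three precalculated directories can also be compared directly. The rough sketch below averages the per-stage estimation time for each method; the <DATASET> suffix is a placeholder for an actual dataset directory, and comparison.ipynb contains the actual analysis used in the experiments.

import json
from pathlib import Path
from statistics import mean

# Average estimation time per stage for each method; <DATASET> must be
# replaced with an actual dataset suffix present in the repository.
directories = {
    "base": "precalc_base_<DATASET>",
    "Method A": "precalc_summation_<DATASET>",
    "Method B": "precalc_treap_<DATASET>",
}

for name, directory in directories.items():
    times = []
    for path in Path(directory).glob("*.json"):
        stages = json.loads(path.read_text())
        times.extend(stage[3] for stage in stages)  # estimation time, seconds
    if times:
        print(f"{name}: mean estimation time {mean(times):.6f} s over {len(times)} stages")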