physical_dimension.md

title: Document dimension preprocessing summary
author: Helsinki Computational History Group (COMHIS)
date: 2020-04-14
output: markdown_document

Document size comparisons

  • Some dimension information is provided in the original raw data for 471076 documents (97.9%), but it could not be interpreted for 6003 of them (i.e. dimension information was successfully estimated for 98.7% of the documents where this field was not empty).

  • Document size (area) information was obtained in the final preprocessed data for 466698 documents (97%). For the remaining documents, critical dimension information was not available or could not be interpreted; see: List of entries where document surface area could not be estimated

  • Document gatherings information is available in the original data for 464163 documents (96%); with estimation, it is available for 465073 documents (97%) in the final preprocessed data.

  • Document height information is available in the original data for 4649 documents (1%); with estimation, it is available for 466698 documents (97%) in the final preprocessed data.

  • Document width information is available in the original data for 0 documents (0%); with estimation, it is available for 466698 documents (97%) in the final preprocessed data.
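The 98.7% figure in the first bullet can be rechecked from the two counts given there. The following is a minimal arithmetic sketch in Python; the variable names are illustrative and not taken from the preprocessing code.

```python
# Counts quoted in the summary above.
with_dimension_info = 471076   # documents whose raw dimension field was not empty
uninterpretable = 6003         # documents whose dimension field could not be parsed

# Documents for which the raw dimension field was successfully interpreted.
interpreted = with_dimension_info - uninterpretable
share = interpreted / with_dimension_info

print(interpreted)        # 465073
print(f"{share:.1%}")     # 98.7%
```

The resulting count, 465073, equals the number of documents reported as having gatherings information in the final preprocessed data.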

These tables can be used to verify the accuracy of the conversions from the raw data to final estimates:

The estimated dimensions are based on the following auxiliary information sheets:

Left: final gatherings vs. final document dimension (width × height). Right: original gatherings vs. original heights, where both are available. The point size indicates the number of documents for each case. The red dots indicate the estimated height that is used when only gatherings information is available.

[Figures: plot of chunk summary (left and right panels)]
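The fallback described in the caption above (a default height is used when only gatherings information is available) can be sketched as a lookup. This is a hypothetical illustration in Python, not the project's implementation: the table `DEFAULT_SIZE_CM`, its numeric values, the function name `estimate_dimensions`, and the aspect-ratio scaling of width are all assumptions made for the example. The actual defaults come from the auxiliary information sheets mentioned earlier.

```python
from typing import Optional, Tuple

# HYPOTHETICAL placeholder sizes in cm as (width, height) per gatherings code.
# These numbers are for illustration only and are not the project's values.
DEFAULT_SIZE_CM = {
    "2fo": (30.0, 45.0),
    "4to": (22.0, 28.0),
    "8vo": (13.0, 19.0),
}


def estimate_dimensions(
    gatherings: Optional[str], height: Optional[float] = None
) -> Optional[Tuple[float, float]]:
    """Return an estimated (width, height) in cm, or None if it cannot be estimated."""
    if gatherings not in DEFAULT_SIZE_CM:
        return None  # unknown or missing gatherings: no estimate
    default_width, default_height = DEFAULT_SIZE_CM[gatherings]
    if height is None:
        # Only gatherings are known: fall back to the default size.
        return (default_width, default_height)
    # Height is known: assume (for this sketch) that width scales with the
    # default aspect ratio of the gatherings class.
    return (default_width * height / default_height, height)


def area(dimensions: Tuple[float, float]) -> float:
    """Surface area in square cm from (width, height)."""
    width, height = dimensions
    return width * height


print(estimate_dimensions("8vo"))        # default size for the class
print(estimate_dimensions("8vo", 38.0))  # width scaled from the given height
print(estimate_dimensions(None))         # None: nothing to estimate from
```

A lookup of this shape would also account for width being available for 0 documents originally but for as many documents as height after estimation, since width can only come from the defaults.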

Left: Document dimension histogram (surface area); Right: title count per gatherings.

[Figures: plot of chunk sizes (left and right panels)]