for further experimentation, some book excerpts (ca. 100 pages each) can be found in the data directory of this workshop, together with:
preprocessed page images
ground truth (incomplete for Hobbes and Zonaras)
OCR output from ABBYY and Tesseract
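One possible experiment with this material is to compare an OCR output page against its ground truth. The following is a minimal sketch, not part of the workshop code: it computes a character error rate (edit distance divided by ground-truth length) in plain Python. The function names `levenshtein` and `character_error_rate` are made up for illustration, and the sample strings are invented rather than taken from the data directory.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop so the row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(ground_truth: str, ocr_text: str) -> float:
    """Edit distance normalised by the length of the ground truth."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, ocr_text) / len(ground_truth)


if __name__ == "__main__":
    # Invented sample strings; replace with the text of a ground truth page
    # and the corresponding OCR output page.
    truth = "the state of nature"
    ocr = "tbe state of nature"
    print(f"CER: {character_error_rate(truth, ocr):.3f}")
```

To apply this to the excerpts, read a ground truth file and the matching ABBYY or Tesseract output into strings first. How pages are paired depends on the file layout of the data directory, which is not described here. Since the ground truth for Hobbes and Zonaras is incomplete, only pages that actually have ground truth can be scored.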
sources for the scans of complete volumes are given below
many thanks to Kay Würzner (Grenzboten), Federico Boschetti (Zonaras), and Jasmin Chebib and Haide Friedrich-Salgado (Hobbes) for providing us with the ground truth