DFG Project "Adaptives OCR-Postprocessing" (Adaptive OCR Postprocessing)

In this project, funded by the Deutsche Forschungsgemeinschaft (DFG), we develop "adaptive" methods for the postcorrection of OCR results, i.e., methods that reflect the topics, the domain, the vocabulary and the linguistic characteristics of the document to be processed. Since OCR software does not have a complete view of the documents to be processed, adaptive postcorrection, which takes a bird's-eye perspective on the whole document, often leads to a new level of accuracy. One branch of our research concentrates on methods for computing domain- and document-specific lexica, language models and correction models, using the web as a corpus. Another central aspect is the development of methods for optimizing the parameter settings of sophisticated correction strategies. Our correction tools use the latest finite-state technology for approximate search in large dictionaries, developed in related projects of the group.
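To make the idea of approximate search in a lexicon concrete, the following is a minimal sketch in Python. It is not the project's finite-state implementation: it swaps in a brute-force scan with a standard Levenshtein dynamic program, which is only practical for small lexica. The function names `edit_distance` and `correction_candidates`, the toy lexicon, and the distance bound are all assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def correction_candidates(token, lexicon, max_distance=1):
    """Return (distance, word) pairs within max_distance, closest first.

    Hypothetical helper: scans the whole lexicon, unlike a finite-state
    approach that avoids visiting every entry.
    """
    scored = [(edit_distance(token, word), word) for word in lexicon]
    return sorted((d, w) for d, w in scored if d <= max_distance)


# Toy, hand-picked lexicon standing in for a domain-specific one.
lexicon = {"document", "domain", "dictionary", "correction"}

# "rn" read in place of "m" is a typical OCR confusion.
candidates = correction_candidates("docurnent", lexicon, max_distance=2)
print(candidates)
```

In an adaptive setting, the lexicon passed to such a lookup would be the domain- or document-specific one rather than a general dictionary, so that the candidate list reflects the vocabulary of the text being corrected.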

Keywords: OCR, document analysis and recognition, postprocessing, postcorrection, classifier combination, context-based correction, approximate search.