Publications
2022
-
Optimizing the Training of Models for Automated Post-Correction of Arbitrary OCR-ed Historical Texts
(2022)
To appear in: Journal for Language Technology and Computational Linguistics (JLCL) 2022, Volume 35 (1)
2021
-
F-Transducers for Language Processing
(2021)
To appear in: Journal of Automata, Languages, and Combinatorics (JALC)
2019
-
Space-Efficient Bimachine Construction Based on the Equalizer Accumulation Principle
(2019)
To appear in: Theoretical Computer Science
2017
-
Failure Transducers and Applications in Knowledge-Based Text Processing
(2017)
Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing, FSMNLP 2017, Frank Drewes (Editor), Umeå, Sweden, 2017 -
Profiling of OCR'ed Historical Texts Revisited.
(2017)
Proceedings of the 2nd Conference on Digital Access to Textual Cultural Heritage 2017 (to appear). ArXiv e-prints
2016
-
Automatic quality evaluation and (semi-) automatic improvement of OCR models for historical printings
(2016)
ArXiv e-prints
2015
-
Workshop: OCR & postcorrection of early printings for digital humanities.
(2015)
2014
- Automated Assignment of Topics to OCRed Historical Texts (2014)
- PoCoTo - An Open Source System for Efficient Interactive Postcorrection of OCRed Historical Texts (2014)
- Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage (2014)
2013
-
Good parts first - a new algorithm for approximate search in lexica and string databases.
(2013)
-
Overcoming the wall effect in similarity search.
(2013)
In: Proceedings of the Joint EDBT/ICDT 2013 Workshops (EDBT '13), pp. 366-369. -
Plagiarism Detection for Indonesian Texts
(2013)
In Weippl, E., et al. (Eds.). Proceedings of the 15th International Conference on Information Integration and Web-Based Applications & Services (iiWAS2013). ACM, Vienna, Austria. ISBN: 978-1-4503-2113-6. pp. 595-599.
2012
-
Facing Uncertainty in Digitisation
(2012)
Studies in Fuzziness and Soft Computing, 2012, Volume 273, Soft Computing in Humanities and Social Sciences, Rudolf Seising & Veronica Sanz González (Eds.), pp. 195-207, Springer New York, 2012.
2011
-
Computation of Similarity - Similarity Search as Computation.
(2011)
In: Löwe, B.; Normann, D.; Soskov, I.; Soskova, A. (Eds.) Models of Computation in Context, Proc. of the 7th Int. Conf. on Computability in Europe, CiE 2011, Sofia, Bulgaria, June 27 - July 2, 2011, Springer, Lecture Notes in Computer Science, Volume 6735, 2011, pp. 201-210 -
Deciding word neighborhood with universal neighborhood automata
(2011)
Theoretical Computer Science. ISSN 0304-3975. Volume 412, Issue 22, 2011, Pages 2340-2355, doi:10.1016/j.tcs.2011.01.013 -
Recognizing garbage in OCR output on historical documents
(2011)
Proceedings of the 2011 Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data (MOCR_AND '11), ACM, New York, NY, USA. -
Towards information retrieval on historical document collections: the role of matching procedures and special lexica
(2011)
International Journal of Document Analysis and Recognition 14(2): 159-171.
2010
-
Proceedings of the Fourth Workshop on Analytics for Noisy Unstructured Text Data, AND 2010
(2010)
Toronto, Canada -
Summary of the 4th workshop on analytics for noisy unstructured text data (AND)
(2010)
Proceedings of CIKM 2010, pp. 1965-1966
2009
-
Constructing a Lexicon from a Historical Corpus.
(2009)
Conference of the American Association for Corpus Linguistics (AACL09), Edmonton 2009. -
Enabling Information Retrieval on Historical Document Collections - the Role of Matching Procedures and Special Lexica.
(2009)
Proceedings of the ACM SIGIR 2009 Workshop on Analytics for Noisy Unstructured Text Data (AND 2009), Barcelona, 2009. -
On Lexical Resources for Digitization of Historical Documents.
(2009)
The 9th ACM Symposium on Document Engineering (DOCENG 2009). -
Successfully Detecting and Correcting False Friends Using Channel Profiles.
(2009)
International Journal of Document Analysis and Recognition (IJDAR) 12(3), pp.165-174. -
Universal Levenshtein automata for a generalization of the Levenshtein distance
(2009)
Annuaire de l'Universite de Sofia "St. Kl. Ohridski", Faculte de Mathematique et Informatique, Volume 99, pp. 5-23, 2009
2008
-
A Semantic Interface for Post Secondary Education Programs.
(2008)
Annual Meeting of the American Society for Information Science and Technology (ASIS&T), Columbus, Ohio, USA. -
Efficient Techniques for Approximate Record Matching modulo Permutation.
(2008)
In G. Gross & K. U. Schulz (eds.): Linguistics, Computer Science and Language Processing: Festschrift for Franz Guenthner on the Occasion of His 60th Birthday (Tributes). College Publications, London, 2008. -
Successfully detecting and correcting false friends using channel profiles.
(2008)
Proceedings of the 2nd Workshop on Analytics for Noisy Unstructured Text Data 2008 (AND 08): 17-22.
2007
-
Adaptive Text Correction with Web-Crawled Domain Dependent Dictionaries and Language Models
(2007)
ACM Transactions on Speech and Language Processing (TSLP), 4(4), pp. 9:1-9:36 -
Deriving Symbol Dependent Edit Weights for Text Correction - the Use of Error Dictionaries
(2007)
Proceedings of International Conference on Document Analysis and Recognition 2007 (ICDAR 07), pp. 639-643, Curitiba, Brazil, September 2007. -
Fast Selection of Small and Precise Candidate Sets from Dictionaries for Text Correction Tasks
(2007)
Proceedings of International Conference on Document Analysis and Recognition 2007 (ICDAR 07). -
Genre as Noise - Noise in Genre
(2007)
Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-2007) Workshop on Analytics for Noisy Unstructured Text Data, Hyderabad, India, January 2007, pp. 9-16. -
Genre as Noise: Noise in Genre
(2007)
International Journal on Document Analysis and Recognition (IJDAR), 10(3-4), pp.199-209. -
Information Access to Historical Documents from the Early New High German Period
(2007)
Proceedings of the IJCAI-2007 Workshop on Analytics for Noisy Unstructured Text Data, Hyderabad, India, January 8, 2007. -
Text Correction Using Domain Dependent Bigram Models from Web Crawls
(2007)
Proceedings of the IJCAI-2007 Workshop on Analytics for Noisy Unstructured Text Data, Hyderabad, India, January 8, 2007, pp. 47-54. -
Tuning the Selection of Correction Candidates for Garbled Tokens using Error Dictionaries
(2007)
in: Finite State Techniques and Approximate Search, Stoyan Mihov and Klaus U. Schulz (eds.), Proceedings of the First Workshop on Finite-State Techniques and Approximate Search, Recent Advances in Natural Language Processing (RANLP - 2007), pp. 25-30, Borovets, Bulgaria. September 2007. -
Unsupervised Learning of Edit Distance Weights for Retrieving Historical Spelling Variations
(2007)
in: Finite State Techniques and Approximate Search, Stoyan Mihov and Klaus U. Schulz (eds.), Proceedings of the First Workshop on Finite-State Techniques and Approximate Search, September 30th, 2007, Borovets, Bulgaria, pp. 1-6. -
Using Automated Error Profiling of Texts for Improved Selection of Correction Candidates for Garbled Tokens
(2007)
Proceedings of the Twentieth Australian Joint Conference on Artificial Intelligence (AI07). M.A. Orgun and J. Thornton (Eds.): Lecture Notes in Artificial Intelligence 4830, pp. 456-465. © Springer-Verlag Berlin Heidelberg 2007.
2006
-
Caching Schema Information and Intermediate Results for Fast Incremental XML Query Processing in RDBs.
(2006)
CIS-Bericht, 2006 -
Conjunctive Queries over Trees.
(2006)
Journal of the ACM 53(2), March 2006. -
Integrated Document Browsing and Data Acquisition for Building Large Ontologies.
(2006)
In: Proceedings of the 10th International Conference on Knowledge-Based & Intelligent Information & Engineering Systems (KES), Part III, LNAI 4253, Invited Session "Engineered Applications of Semantic Web" (SWEA), pp. 614-622, Springer-Verlag, 2006 -
Organizing Thematic, Geographic and Temporal Knowledge in a Well-founded Navigation Space: Logical and Algorithmic Foundations for EFGT Nets
(2006)
Journal of Web Services Research, Special Issue "Bridging Communities: Semantically Augmented Metadata for Services, Grids, and Software Engineering", 2006 -
Orthographic Errors in Web Pages - Towards Cleaner Web Corpora.
(2006)
Computational Linguistics 32(3) (September 2006), pp. 295-340.
2005
-
A Corpus for Comparative Evaluation of OCR Software and Postcorrection Techniques.
(2005)
Proceedings of the 8th International Conference on Document Analysis and Recognition ICDAR'05, pp. 162-166, 2005. © IEEE Computer Society -
Decidability of Bounded Higher-Order Unification.
(2005)
Journal of Symbolic Computation, Volume 40 (2), pp. 905-954, August 2005. -
Efficient Dictionary-Based Text Rewriting using Subsequential Transducers.
(2005)
Journal of Natural Language Engineering, 2005. -
Exploiting Native XML Indexing Techniques for XML Retrieval in Relational Database Systems.
(2005)
Proceedings of the 7th ACM International Workshop on Web Information and Data Management (WIDM), 2005. -
Stable Methods for Recognizing Acronym-Expansion Pairs: From Rule Sets to Hidden Markov Models.
(2005)
Accepted for publication in the International Journal of Document Analysis and Recognition (IJDAR), 2005. -
The BIRD Numbering Scheme for XML and Tree Databases - Deciding and Reconstructing Tree Relations using Efficient Arithmetic Operations.
(2005)
Proceedings of the 3rd International XML Database Symposium (XSym), 2005. -
The Same is Not The Same - Postcorrection of Alphabet Confusion Errors in Mixed-Alphabet OCR Recognition.
(2005)
Proceedings of the 8th International Conference on Document Analysis and Recognition ICDAR'05, pp. 406-410, 2005. © IEEE Computer Society -
Visual Exploration and Retrieval of XML Document Collections with the Generic System X2.
(2005)
Journal on Digital Libraries, Special Issue on Information Visualization Interfaces for Retrieval and Analysis, 2005. Springer Verlag
2004
-
Automated Reasoning on the Web.
(2004)
Communications of Applied Logic, 2004. -
Conjunctive Queries over Trees.
(2004)
Proceedings of the ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, ACM Press, pp. 189-200, 2004. -
Content and Structure in Indexing and Ranking XML.
(2004)
Proceedings of the 7th International Workshop on the Web and Databases (WebDB), pp. 67-72, 2004. -
Content-Aware DataGuides: Interleaving IR and DB Indexing Techniques for Efficient Retrieval of Textual XML Data.
(2004)
Advances in Information Retrieval: Proceedings of the 26th European Conference on Information Retrieval (ECIR), LNCS 2997, pp. 378-393, 2004. © Springer Verlag. -
Efficient Dictionary-Based Text Rewriting using Subsequential Transducers.
(2004)
CIS-Bericht, Centrum für Informations- und Sprachverarbeitung, Universität München, 2004. -
Fast approximate search in large dictionaries.
(2004)
Computational Linguistics, Vol. 30(4), pp. 451-477, 2004. -
Precise and Efficient Text Correction using Levenshtein Automata, Dynamic Web Dictionaries and Optimized Correction Models.
(2004)
Proceedings of the Workshop on International Proofing Tools and Language Technologies, Patras, 2004. -
Ranked Retrieval of Structured Documents with the S-Term Vector Space Model.
(2004)
Advances in XML Information Retrieval: Proceedings of the 3rd International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX), LNCS 3493, pp. 238-252, 2004. © Springer Verlag.
2003
-
A Visual and Interactive Tool for Optimizing Lexical Postcorrection of OCR-Results.
(2003)
Proceedings of the Workshop on Document Image Analysis and Retrieval DIAR'03 -
Fast Approximate Search in Large Dictionaries
(2003)
CIS-Bericht-03-132 -
How to Reduce k Tapes to One
(2003)
CIS-Bericht-03-133 -
Lexical Postcorrection of OCR-results: The Web as a Dynamic Secondary Dictionary?
(2003)
Proceedings of the 7th International Conference on Document Analysis and Recognition ICDAR'03, pp. 1133-1137, 2003. -
Lexical Postcorrection of OCR-Results: The Web as a Dynamic Secondary Dictionary
(2003)
CIS-Bericht-03-131 -
One-Letter Automata: How to Reduce k Tapes to One.
(2003)
CIS-Bericht-03-133, Centrum für Informations- und Sprachverarbeitung, Universität München, 2003. -
Systematics and Architecture for a Resource Representing Knowledge about Named Entities.
(2003)
Proceedings of the Workshop on Principles and Practice of Semantic Web Reasoning (PPSWR), pp. 189-207, 2003. © Springer Verlag. -
Visual Querying and Exploration of Large Answers in XML Databases with X²: A Demonstration.
(2003)
Proceedings of the International Conference on Data Engineering, ICDE'03.
2002
-
Decidability of Bounded Higher-Order Unification.
(2002)
Computer Science Logic, 16th International Workshop, CSL 2002, J. Bradfield (Ed.), Springer LNCS 2471, pp. 522-536. -
Fast String Correction with Levenshtein-Automata.
(2002)
International Journal of Document Analysis and Recognition (IJDAR) 5(1):67-85, 2002. -
Querying.
(2002)
In L. Lemnitzer and C. Lobin (eds.). Text Technologie. Perspektiven und Anwendungen. Stauffenberg-Verlag, 2002. -
Solvability of Context Equations with Two Context Variables is Decidable.
(2002)
Journal of Symbolic Computation, 33(1):77-122, 2002.
2001
-
Combination of Constraint Systems II: Rational Amalgamation.
(2001)
Theoretical Computer Science, 266:113-157, 2001 -
Combining Constraint Solving.
(2001)
In H. Comon and R. Treinen (eds.). Constraints in Computational Logics - Theory and Applications, number 2002, pages 104-158. Springer LNCS, 2001. -
Complete Answer Aggregates for Tree-like Databases: A Novel Approach to Combine Querying and Navigation.
(2001)
ACM Transactions on Information Systems (TOIS), 19(2):161-215, 2001. -
Decidability of bounded higher-order unification.
(2001)
Forschungsbericht, Centrum für Informations- und Sprachverarbeitung, Universität München, 2001. -
Fast String Correction with Levenshtein-Automata
(2001)
CIS-Bericht-01-127 -
Fast String Correction with Levenshtein-Automata.
(2001)
CIS-Bericht-01-127, Centrum für Informations- und Sprachverarbeitung, Universität München, 2001. -
Towards Aggregated Answers for Semistructured Data.
(2001)
In Proceedings of the International Conference on Database Theory ICDT'2001.
2000
-
Complete Answer Aggregates for Answer Mappings to Sequence, Tree and Graph Databases
(2000)
CIS-Bericht-00-125 -
Tractable and Intractable Instances of Combination Problems for Unification and Disunification.
(2000)
Journal of Logic and Computation, volume 10(1), pp. 105-135, 2000. -
Why Combined Decision Problems are often Intractable.
(2000)
Frontiers of Combining Systems, 3rd International Workshop FroCoS'2000, Springer LNAI 1794, pp. 217-244, 2000.
1999
-
Aktuelles Schlagwort: Bioinformatik.
(1999)
Informatik Spektrum, Springer Verlag, October 1999. -
Between Finite State and Prolog: Constraint-Based Automata for Efficient Recognition of Phrases.
(1999)
ACL Studies in Natural Language Processing series (Cambridge University Press). -
Combining Constraint Solving.
(1999)
Proceedings of the First Summer School of Constraints in Computational Logic, Gif-Sur-Yvette, 1999. -
Complete Answer Aggregates for Answer Mappings to Sequence, Tree and Graph Databases
(1999)
-
Solvability of context equations with two context variables is decidable.
(1999)
Automated Deduction, Proceedings CADE 16, Springer LNAI, 1999.
1998
-
Abschlussbericht an die DFG.
(1998)
Final project report, 1998 (available only in German). -
Between Finite State and Prolog: Constraint-Based Automata for Efficient Recognition of Phrases.
(1998)
Journal of Natural Language Engineering, Vol. 2, No. 4. -
Combination of Constraint Solvers for Free and Quasi-Free Structures.
(1998)
Theoretical Computer Science 192:107-161, 1998. -
Complete Answer Aggregates for Structured Document Retrieval
(1998)
CIS-Bericht-98-112 -
Complete Answer Aggregates for Structured Document Retrieval.
(1998)
CIS-Bericht-98-112, Centrum für Informations- und Sprachverarbeitung, Universität München, 1998. -
On the Exponent of Periodicity of Minimal Solutions of Context Equations.
(1998)
in: Proceedings of Rewriting Techniques and Applications 1998, T. Nipkow (ed.), Springer LNCS, 1998. -
Solvability of Context Equations with two Context Variables is Decidable
(1998)
CIS-Bericht-98-114 -
Solvability of Context Equations with two Context Variables is Decidable.
(1998)
CIS-Bericht-98-114, Centrum für Informations- und Sprachverarbeitung, Universität München, 1998. -
Unification Theory -- An Introduction.
(1998)
in: Automated Deduction. A basis for application. W. Bibel and P.H. Schmitt (eds.), Kluwer Academic Publishers, 1998.
1997
-
A Criterion for Intractability of E-unification with free function symbols and its relevance for combination of unification algorithms.
(1997)
In: Hubert Comon (ed.). Rewriting Techniques and Applications, 8th International Conference (RTA-97). pp. 284-298. Springer, LNCS 1232, 1997. -
BILEDITA
(1997)
CIS-Bericht-97-108 -
Unification Theory - An Introduction
(1997)
CIS-Bericht-97-103
1996
-
Between Finite State and Prolog: Constraint-Based Automata for Efficient Recognition of Phrases
(1996)
CIS-Bericht-96-102 -
Combination of Constraint Solvers for Free and Quasi-Free Structures
(1996)
CIS-Bericht-96-90, also published in: Theoretical Computer Science, 192 (1998), pp. 107-161. -
Combination of Constraint Solvers II: Rational Amalgamation.
(1996)
in Proceedings of the Second International Conference on Constraint Programming, CP-96, Springer-Verlag, Lecture Notes in Computer Science 1118, 1996. -
Combination of Constraint Systems II: Rational Amalgamation
(1996)
CIS-Bericht-96-86 -
Combining Unification and Disunification Algorithms -- Tractable and Intractable Instances
(1996)
CIS-Bericht-96-99 -
Frontiers of Combining Systems.
(1996)
Proceedings of the First International Workshop FroCoS'96. Kluwer Academic Publishers, 1996. -
UNIF 96, Extended Abstracts of the Tenth International Workshop on Unification
(1996)
CIS-Bericht-96-91 -
Unification in the Union of Disjoint Equational Theories: Combining Decision Procedures.
(1996)
J. Symbolic Computation, 21:211-243, 1996.
1995
-
Combination of Constraint Solving Techniques: An Algebraic Point of View.
(1995)
in: Proceedings of the 6th International Conference on Rewriting Techniques and Applications, RTA-95, Springer-Verlag, Lecture Notes in Computer Science 914, pp. 352-366, 1995. -
Logic Finite Automata
(1995)
in: Applied Logic: How, What and Why? M. Masuch and László Pólos, Eds. Kluwer Academic Publishers, 1995. -
On the Combination of Symbolic Constraints, Solution Domains, and Constraint Solvers
(1995)
CIS-Bericht-95-120 -
On the Combination of Symbolic Constraints, Solution Domains, and Constraint Solvers.
(1995)
in Proceedings of the International Conference on Constraint Programming, CP-95, Springer-Verlag, Lecture Notes in Computer Science 976, 1995.
1994
-
Combination of Constraint Solving Techniques: An Algebraic Point of View
(1994)
CIS-Bericht-94-75 -
Constraints for Lists and Theories of Concatenation
(1994)
CIS-Bericht-94-80 -
On Existential Theories of List Concatenation
(1994)
in: Selected papers CSL'94 (Kazimierz). -
On the Combination of Symbolic Constraints, Solution Domains, and Constraint Solvers
(1994)
CIS-Bericht-94-82
1993
-
Combination Techniques and Decision Problems for Disunification
(1993)
CIS-Bericht-93-65 -
Word Unification and Transformation of Generalized Equations
(1993)
in: J. of Automated Reasoning 11, 1993, 149-184
1992
-
General A- and AX-Unification via Optimized Combination Procedures
(1992)
CIS-Bericht-92-58 -
Unification in the Union of Disjoint Equational Theories: Combining Decision Procedures
(1992)
CIS-Bericht-92-59 -
Word Equations: Proceedings of the October 1990 Workshop
(1992)
CIS-Bericht-92-54
1991
-
General A- and AX-Unification via Optimized Combination Procedures.
(1991)
-
Logic Finite Automata and Constraint Logic Finite Automata
(1991)
CIS-Bericht-91-45 -
Makanin's Algorithm - Two Improvements and a Generalization
(1991)
CIS-Bericht-91-39 -
Word Unification and Transformation of Generalized Equations
(1991)
CIS-Bericht-91-46
1990
-
Word Equations and Related Topics, Proceedings of the 1st International Workshop, IWWERT'90.
(1990)
Volume 572 of LNCS, Springer.
1988
-
Quantoren-Elimination bei Fastkörpern
(1988)
Abhandlungen aus dem Mathematischen Seminar der Universität Hamburg, 58, pp. 169-174, 1988.
1984
-
Algebraische Konsequenzen des Determiniertheitsaxioms
(1984)
Archiv der Mathematik, 42:557-563, 1984.