RNNTagger is a tool for annotating text with part-of-speech and lemma information. It comes with pretrained parameter files for many languages. RNNTagger is implemented in Python using the deep learning library PyTorch.
Compared to TreeTagger, the pros of RNNTagger are
Please read the license terms before you download the software. By downloading the software, you agree to the terms stated there.
The following steps are required to install RNNTagger on Linux:
> echo "This is a test." > test.txt
> cmd/rnn-tagger-english.sh test.txt
This will produce the output:
This | DT | this |
is | VBZ | be |
a | DT | a |
test | NN | test |
. | . | . |
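For downstream processing, the three columns (token, part-of-speech tag, lemma) can be read into a simple data structure. The following is a minimal Python sketch, not part of RNNTagger itself. The names `TaggedToken`, `parse_tagger_output` and `run_english_tagger` are hypothetical. It assumes that the script prints its result to standard output and that the columns are separated either by tabs or by " | " as displayed above; check the README in the tagger package for the exact format.

```python
import re
import subprocess
from typing import List, NamedTuple


class TaggedToken(NamedTuple):
    """One output line: word form, part-of-speech tag, lemma (hypothetical helper type)."""
    token: str
    tag: str
    lemma: str


# Assumption: columns are separated by a tab or by " | " (with an optional trailing " |").
_SEPARATOR = re.compile(r"\t|\s\|(?:\s|$)")


def parse_tagger_output(text: str) -> List[TaggedToken]:
    """Parse tagger output into a list of (token, tag, lemma) records."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines
        fields = [f.strip() for f in _SEPARATOR.split(line)]
        fields = [f for f in fields if f]  # drop the empty field after a trailing " |"
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        rows.append(TaggedToken(*fields))
    return rows


def run_english_tagger(path: str) -> List[TaggedToken]:
    """Run the English tagger script on a file and parse what it prints.

    Assumption: called from the RNNTagger directory, and the script writes
    its annotation to standard output.
    """
    result = subprocess.run(
        ["cmd/rnn-tagger-english.sh", path],
        capture_output=True, text=True, check=True,
    )
    return parse_tagger_output(result.stdout)
```

With the example above, `run_english_tagger("test.txt")` would return one `TaggedToken` per output line, so the lemmas can be collected with `[t.lemma for t in tokens]`.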
Currently supported modern languages: Arabic, Belarusian, Bulgarian, Catalan, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Icelandic, Italian, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swahili, Turkish, Ukrainian, Upper Sorbian
Currently supported ancient languages: Coptic, Middle Dutch, Middle English, Old French, Middle French, Early New High German, Middle High German, Old Greek, Old Icelandic, Old Italian, Latin, Syriac
The Middle High German version of RNNTagger has been integrated into the WebLicht platform. Follow these instructions to use it.
The tagger package contains a README file with further information on the tagger and the parameter files.
RNNTagger is described in this paper:
Helmut Schmid (2019). Deep Learning-Based Morphological Taggers and Lemmatizers for Annotating Historical Texts. DATeCH, May 2019, Brussels, Belgium.