RNNTagger

RNNTagger - a Neural Part-of-Speech Tagger

The RNNTagger is a tool for annotating text with part-of-speech and lemma information. It comes with pretrained parameter files for over 50 modern and historical languages. RNNTagger was implemented in Python using the PyTorch library.

Compared to TreeTagger, RNNTagger has the following advantages:

higher tagging accuracy
lemmatization of all tokens

Its disadvantages are:

slower processing
larger parameter files
dependence on Python, PyTorch, and Perl (the latter for text preprocessing)

Installation

This software is freely available for research, education and evaluation. For commercial and other licenses, please contact the developer via the email address at the bottom of the page.

Please read the license terms before you download the software. By downloading the software, you agree to the terms stated there.

The following steps are required to install RNNTagger on Linux and Windows:

Download the RNNTagger package (3.6 GB)
Extract the contents of the RNNTagger.zip archive
Install Python3, PyTorch, and Perl
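Before running the tagger, it can help to confirm that the three prerequisites are visible from your environment. The following is a minimal sketch that uses only the Python standard library. The function name `check_prerequisites` is hypothetical and not part of RNNTagger, and the sketch only checks that each component is present, not which versions RNNTagger requires.

```python
import importlib.util
import shutil
import sys


def check_prerequisites() -> dict:
    """Report whether Python 3, PyTorch and Perl appear to be available.

    Hypothetical helper, not part of RNNTagger: it checks for presence only,
    not for specific versions.
    """
    return {
        # The interpreter running this script must be Python 3 or later.
        "python3": sys.version_info.major >= 3,
        # PyTorch is importable under the module name "torch";
        # find_spec locates the module without importing it.
        "pytorch": importlib.util.find_spec("torch") is not None,
        # Perl is used for text preprocessing and must be on the PATH.
        "perl": shutil.which("perl") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:8} {'found' if ok else 'MISSING'}")
```

Run it with the same Python interpreter you intend to use for RNNTagger, since PyTorch may be installed for one interpreter but not another.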

Now, you can open a command-line shell, change to the newly created directory RNNTagger, and enter the commands:

> echo "This is a test." > test.txt
> cmd/rnn-tagger-english.sh test.txt

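This produces the following output, with one token per line and the token, its part-of-speech tag, and its lemma separated by tabs: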

This	DT	this
is	VBZ	be
a	DT	a
test	NN	test
.	.	.
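To process this output further in Python, a minimal parsing sketch could look like the following. The names `TaggedToken` and `parse_tagger_output` are hypothetical. The sketch assumes that each non-empty line has exactly the three tab-separated columns of the English example; output for other languages or parameter files may look different, so inspect it before relying on this shape.

```python
from typing import Iterable, List, NamedTuple


class TaggedToken(NamedTuple):
    word: str
    tag: str
    lemma: str


def parse_tagger_output(lines: Iterable[str]) -> List[TaggedToken]:
    """Parse 'word<TAB>tag<TAB>lemma' lines into TaggedToken tuples.

    Assumes three columns per line, as in the English example. Blank lines
    are skipped; any other line shape raises ValueError.
    """
    tokens = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")  # tolerate Unix and Windows line endings
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 columns, got {len(fields)}")
        tokens.append(TaggedToken(*fields))
    return tokens


if __name__ == "__main__":
    sample = "This\tDT\tthis\nis\tVBZ\tbe\na\tDT\ta\ntest\tNN\ttest\n.\t.\t.\n"
    for tok in parse_tagger_output(sample.splitlines()):
        print(tok.word, tok.tag, tok.lemma)
```

Assuming the tagger writes its result to standard output, you can redirect it to a file with the shell and pass the opened file (for example `open(path, encoding="utf-8")`) to `parse_tagger_output`; the UTF-8 encoding is also an assumption.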

On Windows, use the following command instead:

> cmd\rnn-tagger-english.bat test.txt
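Since the Linux and Windows invocations differ only in the script's extension and path separator, a Python caller can choose the right script based on the platform. The sketch below is an illustration under several assumptions, not part of RNNTagger: the helper names `tagger_command` and `run_tagger` are hypothetical; it assumes the scripts for other languages follow the same `cmd/rnn-tagger-<language>` naming pattern as the English example (check the `cmd` directory for the actual names); and it assumes the tagger writes UTF-8 text to standard output. The script is run with the RNNTagger directory as its working directory, mirroring the manual instructions.

```python
import subprocess
import sys
from pathlib import Path


def tagger_command(language: str, input_file: str, rnntagger_dir: str = ".") -> list:
    """Build the command line for an RNNTagger wrapper script.

    Assumption: scripts are named cmd/rnn-tagger-<language>.sh on Linux and
    cmd\\rnn-tagger-<language>.bat on Windows, as in the English example.
    Paths are made absolute so they do not depend on the caller's directory.
    """
    suffix = ".bat" if sys.platform.startswith("win") else ".sh"
    script = Path(rnntagger_dir).resolve() / "cmd" / f"rnn-tagger-{language}{suffix}"
    return [str(script), str(Path(input_file).resolve())]


def run_tagger(language: str, input_file: str, rnntagger_dir: str = ".") -> str:
    """Run the tagger inside the RNNTagger directory and return its output."""
    result = subprocess.run(
        tagger_command(language, input_file, rnntagger_dir),
        cwd=rnntagger_dir,   # run from the RNNTagger directory, as in the manual steps
        capture_output=True,
        encoding="utf-8",    # assumption: the tagger emits UTF-8 text
        check=True,          # raise CalledProcessError if the script fails
    )
    return result.stdout


if __name__ == "__main__":
    print(run_tagger("english", "test.txt"))
```

On Windows, `subprocess.run` can start a `.bat` file directly without `shell=True`. On Linux, the `.sh` script must be executable, just as when it is started from the shell.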

Supported languages

modern languages: Arabic, Belarusian, Bulgarian, Catalan, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Icelandic, Italian, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swahili, Turkish, Ukrainian, Upper Sorbian

historical languages: Coptic, Middle Dutch, Middle English, Old French, Middle French, Early New High German, Middle High German, Old Greek, Old Icelandic, Old Italian, Latin, Syriac

The Middle High German version of RNNTagger has been integrated into the WebLicht platform. Follow these instructions to use it.

The tagger package contains a README file with further information on the tagger and the parameter files.

Citation

RNNTagger is described in the paper:

Helmut Schmid (2019). Deep Learning-Based Morphological Taggers and Lemmatizers for Annotating Historical Texts, DATeCH, May 2019, Brussels, Belgium.

Please send questions, comments, suggestions and bug reports to Helmut Schmid at LastName@cis.lmu.de.