SFST
What is SFST?
SFST is a toolbox for the implementation of morphological analysers
and other tools which are based on finite state transducer technology.
The SFST tools comprise:
- a compiler which translates transducer programs into minimised transducers
- interactive and batch-mode analysis programs
- tools for comparing and printing transducers
- an efficient C++ transducer library
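To make the idea of analysis with a finite-state transducer concrete, here is a minimal Python sketch of a toy transducer that maps surface forms to analyses. It is not the SFST C++ library or the SFST programming language: the `Transducer` class, its `add_arc` and `analyse` methods, the empty-string epsilon convention and the tags `<N>`, `<sg>`, `<pl>` are all hypothetical, chosen only for illustration. It also assumes the transducer contains no cycles of epsilon-input arcs.

```python
from collections import defaultdict

EPS = ""  # hypothetical convention: empty string marks an epsilon input


class Transducer:
    """Toy non-deterministic finite-state transducer (hypothetical, not SFST's API)."""

    def __init__(self, start=0):
        self.start = start
        self.finals = set()
        # arcs[state] -> list of (input_symbol, output_symbol, target_state)
        self.arcs = defaultdict(list)

    def add_arc(self, source, inp, out, target):
        self.arcs[source].append((inp, out, target))

    def analyse(self, word):
        """Return every output string the transducer pairs with `word`.

        Assumes there are no cycles of epsilon-input arcs.
        """
        results = set()
        seen = set()
        # Depth-first search over (state, input position, output so far).
        stack = [(self.start, 0, "")]
        while stack:
            config = stack.pop()
            if config in seen:
                continue
            seen.add(config)
            state, pos, out = config
            if pos == len(word) and state in self.finals:
                results.add(out)
            for inp, sym, target in self.arcs[state]:
                if inp == EPS:
                    # Epsilon input: emit output without consuming input.
                    stack.append((target, pos, out + sym))
                elif word.startswith(inp, pos):
                    stack.append((target, pos + len(inp), out + sym))
        return sorted(results)


def build_example():
    """Analyse 'house' as house<N><sg> and 'houses' as house<N><pl>."""
    fst = Transducer()
    stem = "house"
    for i, ch in enumerate(stem):
        fst.add_arc(i, ch, ch, i + 1)       # identity pairs h:h, o:o, ...
    n = len(stem)
    fst.add_arc(n, EPS, "<N>", n + 1)       # insert the category tag
    fst.add_arc(n + 1, EPS, "<sg>", n + 2)  # singular: no suffix
    fst.add_arc(n + 1, "s", "<pl>", n + 2)  # plural: surface s maps to <pl>
    fst.finals.add(n + 2)
    return fst


print(build_example().analyse("houses"))  # ['house<N><pl>']
```

A real SFST transducer is compiled from a transducer program and minimised; the explicit arc list here only illustrates what "analysis" means at the level of states and symbol pairs.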
Features
- freely available under the GNU General Public License
- easy to learn for users who are familiar with grep, sed, or Perl
- efficient implementation in C++
- supports:
  - a wide range of transducer operations
  - UTF-8 character coding
  - weighted transducers (basic functionality only)
Downloads
- Source code of the SFST tools
  - version 1.4.7g (only minor changes)
  - version 1.4.7e (Empty lines in lexicon files are now ignored.)
  - version 1.4.7d (fst-infl2 now allows you to print all analyses on a single line by specifying a new delimiter.)
  - version 1.4.7b (the replacement operation now works correctly with alphabets that contain non-identity mappings, and problems with incompatible alphabets in fst-parse have been solved)
  - version 1.4.6j (downward replacement is now the exact opposite of upward replacement)
  - version 1.4.6h (comments are now optionally allowed in the lexicon; faster fault-tolerant lookup)
  - version 1.4.6a (Improvement of the efficiency of the minimisation and composition operations. Many thanks to Anssi Yli-Jyrä for his support!)
  - version 1.4.4 (Bug related to multi-character symbols in the input was fixed.)
  - version 1.4.3 (Optional replace operations have changed.)
  - version 1.4.2 (includes Hopcroft minimisation and other modifications which were jointly developed with the HFST team at Helsinki)
  - version 1.3 (fst-print now produces a different output format which might affect the graphical viewers listed below)
  - version 1.2
- A short manual (included in the source code package)
- A tutorial on the implementation of computational morphologies (included in the source code package)
- SMOR, a German finite-state morphology which is based on SFST
- LatMor, a Latin finite-state morphology with vowel length information
- EMOR, an English finite-state morphology using SFST
- TRMOR, a Turkish finite-state morphology created by Ayla Kayabas and documented in this paper
- mlmorph, a Malayalam finite-state morphology created by Santhosh Thottingal
- yakutmorph, a Yakut finite-state morphology created by Nicolas Cortegoso Vissio
- A Debian package for SFST (created by Francis Tyers)
- A Homebrew formula for installing SFST on Macs (contributed by Nathan Glenn)
- Python bindings for SFST focusing on transducer usage (contributed by Gregor Middell)
- SFST source code with Python bindings (repository created by Santhosh Thottingal)
- Software for finding potential errors in your SFST code (created by Eleonora Nagy)
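Version 1.4.2 in the list above mentions Hopcroft minimisation. The following Python sketch illustrates the general idea of minimising a deterministic transducer by treating each symbol pair (input:output) as one atomic label. It uses Moore-style partition refinement, a simpler relative of Hopcroft's algorithm and not Hopcroft's algorithm itself, and it is not taken from the SFST source. It assumes the transducer is deterministic over symbol pairs and trimmed (every state lies on a path to a final state); the names `minimise`, `quotient` and the dictionary representation are hypothetical.

```python
def minimise(states, finals, delta):
    """Moore-style partition refinement over symbol-pair labels.

    states: iterable of state ids
    finals: set of final states
    delta:  dict mapping (state, (inp, out)) -> target state; must be
            deterministic over pairs (hypothetical representation)
    Returns a dict mapping each state to the id of its equivalence block.
    """
    states = list(states)
    labels = sorted({label for (_, label) in delta})
    # Start with two blocks: non-final (0) and final (1) states.
    block = {q: (1 if q in finals else 0) for q in states}
    while True:
        signatures = {}
        new_block = {}
        for q in states:
            # A state's signature: its current block plus the block reached
            # under every label (-1 where the transition is missing).
            sig = (block[q],) + tuple(
                block[delta[(q, a)]] if (q, a) in delta else -1
                for a in labels
            )
            new_block[q] = signatures.setdefault(sig, len(signatures))
        # Refinement never merges blocks, so an unchanged count means stability.
        if len(signatures) == len(set(block.values())):
            return new_block
        block = new_block


def quotient(states, start, finals, delta):
    """Build the minimised transducer by merging equivalent states."""
    block = minimise(states, finals, delta)
    new_delta = {(block[q], a): block[t] for (q, a), t in delta.items()}
    new_finals = {block[q] for q in finals}
    return block[start], new_finals, new_delta


# Two redundant paths a:a b:b and c:c b:b, each with its own final state.
example_delta = {
    (0, ("a", "a")): 1, (0, ("c", "c")): 2,
    (1, ("b", "b")): 3, (2, ("b", "b")): 4,
}
print(len(set(minimise(range(5), {3, 4}, example_delta).values())))  # 3
```

In the example, states 1 and 2 are merged, as are the final states 3 and 4, so five states shrink to three. Hopcroft's algorithm reaches the same partition but processes splitters more cleverly, which gives it a better worst-case running time than this simple refinement loop.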
Publications
Please cite the following publication if you want to refer to the SFST tools:
Helmut Schmid: A Programming Language for Finite State Transducers. Proceedings of the 5th International Workshop on Finite State Methods in Natural Language Processing (FSMNLP 2005), Helsinki, Finland.
Relations to other FST Toolkits
There are two projects which aim to extend the functionality of SFST
in various ways:
- Anssi Yli-Jyrä's AFST toolkit is based on SFST.
- The HFST toolkit, developed by Krister Lindén, Kimmo Koskenniemi, and colleagues, was implemented on top of three alternative FST libraries: SFST, OpenFST, and foma.
See also the contributions by other authors in the Downloads section above.
Links
Please send comments, suggestions, and bug reports to Helmut Schmid at LastName@cis.uni-muenchen.de. (Insert his last name into the email address.)