Statistical Machine Translation - Nepal Summer School in Advanced Language Engineering

Invitation

The success of statistical machine translation systems such as Moses, Language Weaver and Google Translate has shown that it is possible to build high-performance machine translation systems with a small amount of effort using statistical learning techniques.

This course will present the basic modeling behind statistical machine translation in a concise way.

Instructor

Alex Fraser

Email Address: SubstituteMyLastName@ims.uni-stuttgart.de

University of Stuttgart

DFG Project: Models of Morphosyntax for Statistical Machine Translation

Institute for Natural Language Processing (IMS/IfNLP)

SFB 732 - Incremental Specification in Context

Schedule

Location: University of Kathmandu, see the Summer School in Advanced Language Engineering web page.

Homework assignments:

Assignment 1 - Google Translate and Manual Word Alignment. Here is the alignment browser/editor (and there is a zip file here).
DUE DATE: Midnight, September 13th
Assignment 2 - Choose one of two options: implement IBM Model 1, or try OmegaT and answer some questions about Model 1.
DUE DATE: Midnight, September 16th
Assignment 3 - Google Translate and Indian Parallel Corpora

Additional Resources:

Details of the Moses Toolkit.
The Indian Parallel Corpora; see also Matt Post's homepage for the presentation slides.
You can download two Urdu POS taggers; see also the paper, particularly for details of the tag set.
For more on language modeling, see Koehn Chapter 7 and the Chen and Goodman tutorial.
For transliteration-intensive translation (here, Hindi to Urdu), see this paper.
Our English to German reordering in preprocessing paper (EACL 2012) has citations of other preprocessing papers covering subjects such as: parser-based SOV reordering, learning parser-based reordering rules, and learning tagger-based reordering rules.

Lectures:

September 18th Part 6. Translating to morphologically rich languages: case study on German
powerpoint slides
pdf slides

September 17th Part 5. Advanced topics in SMT. Discriminative bitext alignment, morphological processing, syntax
powerpoint slides
pdf slides
Reading: Koehn 10.1, 10.2, 10.3, 11.1

September 16th Part 4. Log-linear Models for SMT and Minimum Error Rate Training
powerpoint slides
pdf slides
Reading: Koehn Chapter 5, 9.1, 9.2, 9.3
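As a rough illustration of the Part 4 topic: a log-linear model scores each candidate translation as a weighted sum of feature values and picks the highest-scoring one. The sketch below assumes a hypothetical two-entry n-best list with made-up feature names (`lm`, `tm`, `length`) and made-up values; it shows only the scoring step, not Minimum Error Rate Training itself, which searches for the weights that give the best evaluation score on held-out data.

```python
def loglinear_score(features, weights):
    """Weighted sum of feature values: sum_i lambda_i * h_i(e, f)."""
    return sum(weights[name] * value for name, value in features.items())


def best_hypothesis(nbest, weights):
    """Return the candidate with the highest log-linear score."""
    return max(nbest, key=lambda hyp: loglinear_score(hyp["features"], weights))


# Hypothetical n-best list; feature values are invented for illustration.
# "lm" and "tm" stand for language model and translation model log-probabilities.
nbest = [
    {"text": "the house is small", "features": {"lm": -4.0, "tm": -3.0, "length": 4}},
    {"text": "small is the house", "features": {"lm": -7.0, "tm": -2.0, "length": 4}},
]

# Two hypothetical weight settings: changing the weights changes the winner.
weights_a = {"lm": 1.0, "tm": 1.0, "length": 0.0}
weights_b = {"lm": 0.1, "tm": 1.0, "length": 0.0}

choice_a = best_hypothesis(nbest, weights_a)["text"]
choice_b = best_hypothesis(nbest, weights_b)["text"]
```

With the first setting the fluent candidate wins; with the language model weight turned down, the candidate with the better translation model score wins instead. This is why the choice of weights matters and why they are tuned automatically.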

September 15th Part 3. Phrase-based Models and Decoding (automatically translating a text given an already learned model)
powerpoint slides
pdf slides
Reading: Koehn 5.1, 5.2, Chapter 6

September 13th Part 2. Bitext alignment (extracting lexical knowledge from parallel corpora)
powerpoint slides
pdf slides
Reading: Koehn Chapter 4
Optional Reading: Kevin Knight's SMT Tutorial (concentrate on Model 1)
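For orientation on Model 1, here is a minimal sketch of EM training for IBM Model 1 lexical translation probabilities t(f | e). It is not the course's reference solution; the function name, the `NULL` token convention, the iteration count and the three-sentence toy corpus are all assumptions chosen for illustration.

```python
from collections import defaultdict


def train_ibm_model1(bitext, iterations=10):
    """Estimate t[(f, e)] = P(f | e) with EM.

    bitext: list of (f_tokens, e_tokens) sentence pairs.
    """
    # Prepend a NULL token so that f words may align to "nothing".
    pairs = [(f_sent, ["NULL"] + e_sent) for f_sent, e_sent in bitext]
    f_vocab = {f for f_sent, _ in pairs for f in f_sent}

    # Start from a uniform distribution over the f vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked
        # E-step: distribute each f word over the e words in its sentence.
        for f_sent, e_sent in pairs:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


# Toy corpus (an assumption for illustration): f = German, e = English.
bitext = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm_model1(bitext, iterations=20)
```

After training, `t[("Haus", "house")]` should be larger than `t[("das", "house")]`, because "das" is better explained by "the", which it co-occurs with twice.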

September 10th to 11th Part 1. Introduction, basics of statistical machine translation (SMT), evaluation of MT (I also switched to slides on BLEU from Chris Callison-Burch)
powerpoint slides
pdf slides
CCB slides
Reading: Koehn Chapters 1 and 3
OmegaT translation memory
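To make the BLEU metric from Part 1 concrete, the following is a simplified sketch: clipped n-gram precision against a single reference, combined by a geometric mean and multiplied by a brevity penalty. It works on one sentence with one reference and no smoothing, which is an assumption made for brevity; BLEU as normally reported is computed over a whole test set, often with several references.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU without smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref[g]) for g, c in cand.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1.0 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat down".split()
exact = bleu("the cat sat down".split(), reference)
clipped = bleu("the the the the".split(), reference, max_n=1)
```

The second call shows why clipping matters: repeating "the" four times matches the reference only once, so the unigram precision is 1/4 rather than 1.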
