In the first part, we will discuss the general problem of machine translation (the automatic translation of text from one language to another) and the history of machine translation research. We will then briefly consider older approaches to machine translation (predating the current focus on machine learning), followed by some particular natural language processing challenges that must be solved on the way to general approaches to machine translation. Finally, we will discuss the important topic of evaluating machine translation systems.
In the second part, we will look at statistical machine translation (SMT), which was the dominant paradigm in machine translation from about 2000 to 2015 and is still at the core of many industrial systems. We will introduce the related concepts of translational equivalence (established through word alignment), simple statistical models, and search algorithms.
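Word alignment with IBM Model 1 comes up several times below (see the optional reading and the Koehn and Knight references). As a taste of what the SMT part covers, here is a minimal, illustrative sketch of Model 1 training with expectation-maximization in plain Python; the toy bitext and all variable names are ours, not taken from the course materials:

```python
from collections import defaultdict

def train_model1(bitext, iterations=10):
    """Estimate word-translation probabilities t(f|e) with EM (IBM Model 1).

    bitext: list of (source_tokens, target_tokens) sentence pairs.
    A NULL token is appended to each target sentence so that source
    words can align to "nothing".
    """
    # Uniform initialization over all co-occurring word pairs.
    f_vocab = {f for fs, _ in bitext for f in fs}
    init = 1.0 / len(f_vocab)
    t = defaultdict(float)
    for fs, es in bitext:
        for f in fs:
            for e in es + ["NULL"]:
                t[(f, e)] = init

    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        # E-step: collect fractional alignment counts.
        for fs, es in bitext:
            es_null = es + ["NULL"]
            for f in fs:
                z = sum(t[(f, e)] for e in es_null)  # normalizer
                for e in es_null:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize to get new translation probabilities.
        for (f, e) in count:
            t[(f, e)] = count[(f, e)] / total[e]
    return t

# Toy parallel corpus (German-English).
bitext = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split()),
          ("ein buch".split(), "a book".split())]
t = train_model1(bitext)
# After a few EM iterations, "haus" should prefer "house" over "the".
assert t[("haus", "house")] > t[("haus", "the")]
```

Even on this tiny corpus, EM disambiguates the co-occurrences: "das" is pulled toward "the" by the first two sentence pairs, which in turn pushes "haus" toward "house".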
In the third and final part of the lecture, we will consider the deep learning approaches used in so-called neural machine translation (NMT). We will briefly introduce word embeddings and deep learning, then give a high-level overview of recurrent neural network (RNN) and long short-term memory (LSTM) approaches to translation, followed by the state-of-the-art Transformer architecture and a discussion of transfer learning (with applications beyond NMT).
Goals
Theoretical understanding of the challenges of machine translation and the models used to solve them.
Practical experience in solving sub-problems of machine translation, as well as familiarity with the data used for training statistical models.
Email Address: Put My Last Name Here @cis.uni-muenchen.de
Tuesdays, 16 to 18 (c.t.). Room 115.
Wednesdays, 12 to 14 (c.t.). Room 151.
| Date | Topic | Slides | Video |
|---|---|---|---|
| April 19th | Orientation and Introduction to Machine Translation | | mp4 |
| April 25th | Introduction to Statistical Machine Translation | key pdf | mp4 |
| April 26th | Bitext alignment (extracting lexical knowledge from parallel corpora) | key pdf | mp4 |
| | Optional: read about Model 1 in Koehn and/or Knight (see below) | | |
| May 2nd | Many-to-many alignments and the phrase-based model | key pdf | mp4 |
| May 2nd | Exercise 1 released. Due Monday, May 15th at 15:00. | exercise1.txt | |
| May 3rd | Log-linear model and Minimum Error Rate Training | key pdf | mp4 |
| May 9th | Decoding | | mp4 |
| May 10th | Linear Models | key pdf | part1 mp4, part2 mp4 |
| May 16th | Review Exercise 1. Exercise 2 released. Due Friday, May 26th at 15:00. | exercise2.html, tamchyna_acl_2016_slides.pdf, tamchyna_acl_2016_slides.key | |
| May 17th | Neural Networks (and Word Embeddings) | | mp4 (skip first 60 seconds) |
| May 24th | Training and RNNs/LSTMs | | mp4 |
| May 30th | Pfingstdienstag (Whit Tuesday, holiday) | | |
| May 31st | Bilingual Word Embeddings and Unsupervised SMT (Viktor Hangya) | | mp4 |
| June 6th | Encoder-Decoder and Attention (Katharina Hämmerl) | | mp4 |
| June 7th | Transformer (and Document NMT) | | mp4 |
| June 13th | Review Exercise 2. Exercise 3 released. Due Monday, June 19th at 15:00. Review of Transformers. | exercise3.pdf | |
| June 14th | Unsupervised NMT | (see Transformer slide set) | mp4 |
| June 20th | Review Exercise 3. Exercise 4 released. Due Monday, June 26th at 15:00. Also briefly presented CNNs and RNNs for image captioning. | exercise4.pdf, CNN.key, CNN.pdf | |
| June 27th | Review Exercise 4. Exercise 5 released. Due Monday, July 3rd at 15:00. | exercise5.pdf | |
| June 27th | Operation Sequence Model and OOV Translation | 14_part1_OSM.pdf, 14_part2_OOV.pdf | mp4 |
| June 28th | Overcoming Sparsity in NMT (research talk) | | mp4 |
| July 11th | Transfer Learning for Unsupervised NMT (Alexandra Chronopoulou) | | mp4 |
| July 12th | Review Exercise 5. Exercise 6 (PyTorch NLP tutorial) released; not collected, but recommended to work through on your own during the summer vacation. | exercise6.pdf | |
| July 19th | Review for exam. Also: please complete the teaching evaluations for both the lecture (VL) and the exercise session (Übung)! | | |
| July 26th | Exam *IN ROOM 123* at the usual time (12:00 c.t.) | | |
| August 2nd | Multilingual Pretrained Models (Katharina Hämmerl) | | mp4 |
Literature:
Philipp Koehn's book *Statistical Machine Translation*.
Kevin Knight's tutorial on SMT (in particular, the section on IBM Model 1).
Philipp Koehn's book *Neural Machine Translation*.