Word Sense Disambiguation (WSD) and Machine Translation (MT) are two key problems in natural language processing in which the lexicon plays a critical role. While there are many different inventories of word senses for any particular language, a minimal set of word senses can be defined by looking at translations into other languages: uses of a word that are translated by different words which are not synonyms of one another correspond to different senses.
Content:
The seminar will begin with the basics of Statistical Machine Translation and Word Sense Disambiguation, and then look at attempts to use approaches taken from the WSD literature in MT.
Goals:
The goal of the seminar is to understand the basics of MT and WSD, and in particular the important role the lexicon plays in both problems.
Email Address: SubstituteMyLastName@cis.uni-muenchen.de
DFG Project: Models of Morphosyntax for Statistical Machine Translation
Room L155, Tuesdays, 16:00 to 18:00 (c.t.)
Date | Topic | Reading (DO BEFORE THE MEETING!) | Slides |
October 7th | Organizational Meeting, Personal Information, Orientation Test | | |
October 14th | Introduction to Statistical Machine Translation | | powerpoint pdf |
October 21st | Bitext alignment (extracting lexical knowledge from parallel corpora) | | powerpoint pdf |
October 28th | Many-to-many alignments and Phrase-based Translation Modeling (also, Referat!) | | powerpoint pdf |
November 4th | Decoding | | powerpoint pdf |
November 4th and 11th | Log-linear model and Minimum Error Rate Training | | powerpoint pdf |
November 11th | SMT: Lexical Choice and Morphological Sparsity | | powerpoint pdf |
November 18th | Introduction to Word Sense Disambiguation | | powerpoint pdf |
November 25th | Referat + Introduction to Linear Models | Navigli, Sections 1 and 2 | powerpoint pdf |
December 2nd | Referat | Navigli, Sections 3 and 5 | |
December 9th | Referat + More Linear Models | (see Nov 25th) | |
December 16th | Machine Learning LAB! We will be in the *Gobi* computer lab | assignment CMU Seminars dataset tar file with scripts (UPDATED); unigram_bigram_pattern.txt (NOW WITH COMMENTS); wapiti | |
Presentation topics (Referatsthemen), listed as name: topic
Date | Topic | Materials | Term Paper (Hausarbeit) Received |
November 25th | Wurst: Supervised WSD | yes | |
December 2nd | Chebib: Dictionary-based Disambiguation | yes | |
December 9th | Eder: Unsupervised WSD | yes | |
January 13th | CANCELLED for presenter health reasons | | |
January 20th | Schätz: Project Cross-Lingual Lexical Substitution | yes | |
January 20th | Kalasouskaya: Project Supervised WSD | yes | |
January 27th | Deyringer: Project Wikification | yes | |
January 27th | Wunderlich: Project WSD of Old English | yes | |
Literature:
Philipp Koehn's book Statistical Machine Translation
Kevin Knight's tutorial on SMT (particularly look at IBM Model 1)
Roberto Navigli's tutorial on WSD (a local copy is available)