Klausur: Informationsextraktion (VL)
====================================

WS 2021/2022

Prof. Dr. Alexander Fraser
fraser@cis.uni-muenchen.de

PLEASE SAVE AS yourlastname.txt NOW.
So if your last name is Fraser, save the file as fraser.txt

Resources (Hilfsmittel): any slides or books. NO COMMUNICATION!

Time: 60 Minutes

All questions are worth 5 points.

Questions may be answered in English or German, as you prefer.

PLEASE WRITE YOUR ANSWERS IN THIS TEXT FILE. TRY NOT TO MODIFY THE TEXT THAT IS
ALREADY HERE; JUST INSERT BLANK LINES AND YOUR TEXT.

Viel Erfolg!


0 YOUR INFORMATION
==================

Nachname:
Vorname:
Matrikelnummer:


1 Question Answering
====================

Given the question "Who won the Nobel Peace Prize in 2009?", what is the first
step that needs to be performed? What would training data for this first step
look like?


2 Encoding
==========

The character "A" is represented as the byte 01000001 in ASCII encoding.

a) How is "A" represented in LATIN1?

b) How is "A" represented in UTF-8?

c) What is a common problem with LATIN1 as used on web servers?

d) List two advantages of UTF-8 over ASCII.


3 Information Retrieval
=======================

a) Give the precise definition of precision as used in information retrieval,
both as a full sentence and as a formula.

b) Give the precise definition of recall as used in information retrieval,
both as a full sentence and as a formula.

We have a set of documents D, numbered from 1 to 100. The next parts use a set
of documents S retrieved by your system (the numbers are the document numbers)
in answer to a query Q; the gold standard G contains the truly relevant
documents for Q.

c) If S={1,4,5,55,70,76} and G={5,7,55,60}, compute precision and recall
(percentages or fractions like 1/10 are fine).

d) If S={1,4,5,55,70,76} and G contains all 100 documents, compute precision
and recall.

e) If S is empty and G={5,7,55,60}, compute precision and recall.
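For reference on Section 3 (an illustration only; the sets below are
hypothetical and are not the sets from parts c) to e)): precision and recall
over document-ID sets can be computed directly from set intersections, as in
this minimal Python sketch.

# Minimal sketch: precision and recall over sets of document IDs.
# S and G here are hypothetical examples, not the sets from the exam.
def precision(S, G):
    """Fraction of retrieved documents that are relevant: |S & G| / |S|."""
    return len(S & G) / len(S)

def recall(S, G):
    """Fraction of relevant documents that were retrieved: |S & G| / |G|."""
    return len(S & G) / len(G)

S = {2, 3, 9}         # retrieved by the system (hypothetical)
G = {3, 9, 10, 42}    # truly relevant documents (hypothetical gold standard)
print(precision(S, G), recall(S, G))   # 2 of 3 retrieved are relevant; 2 of 4 relevant were retrieved
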
4 Bottom-up rule formation
==========================

a) How does bottom-up rule formation work? Give a brief description.

Suppose you are given the text "The seminar will be at 4 pm ." and you are
told that "4 pm" is a starting time.

b) Which rule would you initially learn?

c) Give an example of a rule you could learn after that, in a later iteration
of the learning algorithm.


5 Contingency tables in information retrieval evaluation
=========================================================

a) Draw a contingency table as used in information retrieval evaluation. Put
the gold standard on the top. Put the retrieved documents on the left-hand
side.

Define:

b) true negatives

c) false positives

d) accuracy

e) In information retrieval we generally have many non-relevant documents for
a given query. Which evaluation metric is this a problem for, and why?


6 Disorder and Entropy
======================

a) Order these disorder values from lowest to highest (write them in sorted
order, from lowest on the first line to highest on the last line; use = and
put values on the same line to show any that are equal):

D(0,3), D(3,0), D(5,5), D(1,2), D(3,4), D(1,1)

b) What is the relationship between the entropy of a set S and how ordered it
is? Assume the set S contains entities annotated with positive and negative
outcomes (such as "sunburned" versus "not sunburned").


7 Linear models
===============

a) How can we make any binary classifier into a simple sequence classifier?
(2 points)

b) What type of markup is used in state-of-the-art named entity recognition?
Give a short training example and indicate which tags go at each position.
(3 points)


8 Relation Extraction
=====================

Given: "King Ludwig of Bavaria was born in Nymphenburg Palace , on August 25 ,
1845 ."

a) Which classifier is used first, and what does it do? Which three decisions
will it make here? (2 points)

b) Which classifier is used second? If everything works well, which
classification decisions will be made? (1 point)

Open Information Extraction

c) Explain the nature of Open Relation Extraction as opposed to
ordinary/classic Relation Extraction. (2 points)


9 Neural Networks
=================

a) Given a binary linear classifier, how do we perform logistic regression?
(1 point)

b) Write the truth table for XNOR of two variables, "a" and "b". (2 points)

c) What is the key idea for solving XNOR with a neural network? (1 point)

d) What role does a bias term play in a neural network? How is this different
from a bias in a linear model? (1 point)


10 Word Embeddings
==================

a) What is BERT? (1 point)

b) What is Word2Vec? (1 point)

c) How can BERT and Word2Vec be used together with a basic linear classifier,
as discussed in class? (3 points)
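As a point of reference for Section 10 (an illustration only, and not
necessarily the exact setup discussed in class): pretrained word vectors in
the style of Word2Vec, or vectors extracted from BERT, can serve as fixed
input features for a basic linear classifier. In the sketch below the
vocabulary, the embedding table and the training sentences are random,
hypothetical stand-ins rather than real pretrained vectors.

# Illustrative sketch: fixed word vectors as features for a linear classifier.
# The embedding table is a random stand-in for real Word2Vec/BERT vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
vocab = ["the", "seminar", "is", "great", "boring", "talk"]
emb = {w: rng.normal(size=50) for w in vocab}          # stand-in embedding table

def featurize(tokens):
    """Average the word vectors of a sentence into one fixed-size feature vector."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(50)

# Tiny hypothetical training set: tokenized sentences with binary labels.
train = [(["the", "seminar", "is", "great"], 1),
         (["the", "talk", "is", "boring"], 0)]
X = np.array([featurize(toks) for toks, _ in train])
y = np.array([label for _, label in train])

clf = LogisticRegression().fit(X, y)                   # basic linear classifier
print(clf.predict([featurize(["great", "seminar"])]))

The same interface works whether the feature vectors come from a static
embedding table or from a contextual encoder; only the featurize step changes.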