Block Seminar, Summer Semester 2023
Linguistic Information in Large Language Models II
8 and 9 August 2023
In this seminar, we will read and discuss papers on analyzing and representing linguistic information in large language models, with a focus on semantics and the understanding of concepts and relations in LLMs, for example the identification of metaphors, the interpretation of noun compounds, and ontological knowledge.
The format of this seminar is similar to a reading group: participants prepare a paper in advance, present it during the seminar as a basis for discussion (see below for a list of papers), and write a summary of the paper after the seminar.
If you are interested in participating, please contact me (dimarco -AT- cis.uni-muenchen.de).
Location: Room 115
Time: 9:30
Presentation:
Please prepare a presentation of 20 minutes.
Written summary of the paper:
The summary should be 4 pages long (+ references), using the following template: example (PDF) / LaTeX source.
Deadline for the summary: 5 September 2023
Papers:
From chocolate bunny to chocolate crocodile: Do Language Models Understand Noun Compounds?
Jordan Coil and Vered Shwartz. Findings of the Association for Computational Linguistics: ACL 2023.
– already selected by L.L. –
Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge. Paolo Pedinotti, Giulia Rambelli, Emmanuele Chersoni, Enrico Santus, Alessandro Lenci, Philippe Blache. Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics. – already selected by H.C. –
Multilingual Multi-Figurative Language Detection. Huiyuan Lai, Antonio Toral, Malvina Nissim. Findings of the Association for Computational Linguistics: ACL 2023.
BertNet: Harvesting Knowledge Graphs with Arbitrary Relations from Pretrained Language Models. Shibo Hao, Bowen Tan, Kaiwen Tang, Bin Ni, Xiyan Shao, Hengzhe Zhang, Eric P. Xing, Zhiting Hu. Findings of the Association for Computational Linguistics: ACL 2023.
Do language models have coherent mental models of everyday things? Yuling Gu, Bhavana Dalvi Mishra, Peter Clark. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023).
Do PLMs Know and Understand Ontological Knowledge?
Weiqi Wu, Chengyue Jiang, Yong Jiang, Pengjun Xie, Kewei Tu.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023).
– already selected by H.C. –
A fine-grained comparison of pragmatic language understanding in humans and language models. Jennifer Hu, Sammy Floyd, Olessia Jouravlev, Evelina Fedorenko, Edward Gibson. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023).
– already selected by L.N. –
Language acquisition: do children and language models follow similar learning stages? Linnea Evanson, Yair Lakretz, Jean-Rémi King. Findings of the Association for Computational Linguistics: ACL 2023.
Second Language Acquisition of Neural Language Models. Miyu Oba, Tatsuki Kuribayashi, Hiroki Ouchi, Taro Watanabe. Findings of the Association for Computational Linguistics: ACL 2023.
Block Seminar, Winter Semester 2022/23
Linguistic Information in Large Language Models
14 and 15 March 2023
In this seminar, we will read and discuss papers on analyzing and representing linguistic information in large language models. In the first part, we address segmentation and sub-word representation and their implications for morphology. In the second part, we look at papers analyzing different linguistic features in pre-trained language models.
The format of this seminar is similar to a reading group: participants prepare a paper in advance and present it during the seminar (see below for a list of papers).
If you are interested in participating, please contact me (dimarco -AT- cis.uni-muenchen.de).
Presentation:
Please prepare a presentation of 20 minutes.
Written summary of the paper:
The summary should be 4 pages long (+ references), using the following template: example (PDF) / LaTeX source.
Deadline for the summary: 5 April 2023
Papers:
Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding
Samson Tan, Shafiq Joty, Lav R. Varshney, Min-Yen Kan (2020)
– already selected –
Superbizarre Is Not Superb: Derivational Morphology Improves BERT’s Interpretation of Complex Words.
Valentin Hofmann, Janet B. Pierrehumbert, Hinrich Schütze (2021)
– already selected –
Morphology Matters: A Multilingual Language Modeling Analysis.
Hyunji Hayley Park, Katherine J. Zhang, Coleman Haley, Kenneth Steimel, Han Liu, Lane Schwartz (2021)
– already selected –
A Multilabel Approach to Morphosyntactic Probing.
Naomi Tachikawa Shapiro, Amandalynne Paullada, Shane Steinert-Threlkeld (2021)
– already selected –
The Better Your Syntax, the Better Your Semantics? Probing Pretrained Language Models for the English Comparative Correlative
Leonie Weissweiler, Valentin Hofmann, Abdullatif Köksal, Hinrich Schütze (2022)
– already selected –
A Multilingual Benchmark for Probing Negation-Awareness with Minimal Pairs. Mareike Hartmann, Miryam de Lhoneux, Daniel Hershcovich, Yova Kementchedjhieva, Lukas Nielsen, Chen Qiu, Anders Søgaard (2021)
– already selected –
On the Language-specificity of Multilingual BERT and the Impact of Fine-tuning. Marc Tanti, Lonneke van der Plas, Claudia Borg, Albert Gatt (2021)
– already selected –
Is “My Favorite New Movie” My Favorite Movie? Probing the Understanding of Recursive Noun Phrases. Qing Lyu, Hua Zheng, Daoxin Li, Li Zhang, Marianna Apidianaki, Chris Callison-Burch (2022)
Investigating Language Relationships in Multilingual Sentence Encoders Through the Lens of Linguistic Typology. Rochelle Choenni, Ekaterina Shutova (2022)
– already selected –
Probing for targeted syntactic knowledge through grammatical error detection. Christopher Davis, Christopher Bryant, Andrew Caines, Marek Rei, Paula Buttery (2022)
– already selected –
Causal Analysis of Syntactic Agreement Neurons in Multilingual Language Models. Aaron Mueller, Yu Xia, Tal Linzen (2022)
When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models. Benjamin Muller, Antonios Anastasopoulos, Benoît Sagot, Djamé Seddah (2021)
– already selected –
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models. Phillip Rust, Jonas Pfeiffer, Ivan Vulić, Sebastian Ruder, Iryna Gurevych (2021)
– already selected –