Foundation Model Frontiers

Summer semester 2025
Hinrich Schütze, Shengqiang Zhang
Fri 10:15-11:45

Room

Topic

Foundation models have been a highly dynamic research area for the last few years, and remain one, in terms of scientific progress, technical innovation and real-world impact. In this seminar, we will review and discuss the latest developments in foundation models, including new breakthroughs as they happen.

Credit for MSc Computerlinguistik

Schedule

day | topic | resources | details
Apr 25 | introduction | | organization, lectures, student topics
May 2 | synthetic data | | talk by Latif Köksal, DeepMind; assignment of topics
May 9 | memory etc. | NoLiMa | talk by Ali Modaresi
May 16 | multilinguality (1) | Manchu | talk by Peiqin Lin
May 23 | multilinguality (2) | crosslingual factual inconsistency | talk by Mingyang Wang
May 30
Jun 6
Jun 13
Jun 20
Jun 27
Jul 4
Jul 11
Jul 18
Jul 25

Topics for presentation (Referat) and term paper (Hausarbeit)

The topics and papers listed below are (somewhat random) examples. Feel free to propose your own topics and papers for your presentation or term paper.

arch = architectures, including agentic systems and human-agent collaboration

reas = reasoning

interp = interpretability

tech = technical report

eval = evaluation

synth = synthetic data

misc = miscellaneous

category paper topic
all topics covered in the lectures (see above)
reas Geiping et al. (2025) test-time compute: recurrent depth approach
reas DeepSeek-AI (2025) DeepSeek-R1: reasoning through RL
tech Gemma Team (2025) Gemma 3 technical report
interp Olsson et al. (2022) induction heads
synth Maini et al. (2024) rephrasing the web
interp Sharkey et al. (2025) open problems in mechanistic interpretability
interp Park et al. (2024) linear representation hypothesis
interp Han et al. (2024) word embeddings are steers
tech Llama Team (2024) Llama 3
tech Qwen et al. (2024) Qwen 2.5
tech Abdin et al. (2024) Phi-4
interp Makelov et al. (2024) sparse autoencoders (2)
interp McDougall et al. (2023) copy suppression
interp Saphra et al. (2024) notion of mechinterp
reas Dutta et al. (2024) mechinterp: COT
interp Geva et al. (2023) factual associations/enrichment
interp nostalgebraist (2020) logit lens
interp Chughtai et al. (2024) summing up the facts
reas Shao et al. (2024) DeepSeekMath
reas Zhao et al. (2024) Marco-o1
arch Wu et al. (2024) ReFT
reas Hübotter et al. (2025) SIFT
misc Hughes et al. (2024) open-endedness
reas Turpin et al. (2023) unfaithful COT
arch Gottweis et al. (2025) AI co-scientist
reas Venhoff et al. (2025) steered reasoning
interp Yu et al. (2024) superweights
interp Bricken et al. (2023) sparse autoencoders (1)
arch Packer et al. (2023) MemGPT
misc Milliere et al. (2024) philosophy of LLMs
arch StanfordNLP (2024) DSPy
interp Durrani et al. (2020) analyzing neurons
interp Voita et al. (2023) dead neurons
arch De Peuter et al. (2023) human-agent cooperation
tech Ustun et al. (2024) Aya
tech Google NotebookLM
tech Groeneveld et al. (2024) OLMo
interp Dai et al. (2022) knowledge neurons
interp Elhelo et al. (2024) head functionality
interp Ferrando et al. (2024) information flow routes
misc Nancy Yu (2024) LLM censorship
interp Lad et al. (2024) stages of inference
interp Geva et al. (2021) LLMs as key-value memories
interp Wendler et al. (2024) multilingual representations
tech Anthropic Claude 3.7 Sonnet
tech DeepSeek-AI DeepSeek V3
interp Lindsey et al. (2025) biology of LLMs
eval Phan et al. (2025) humanity's last exam
tech Team Cohere Command A