AI Safety Readings
We run a regular reading group on AI safety research, discussing recent papers on interpretability, alignment, multi-agent safety, and related topics. Everyone is welcome, regardless of background.
The reading group usually meets on Mondays at 14:00 Copenhagen time. Email us at galke@imada.sdu.dk if you would like to join or have any questions.
Schedule
| Date | Topic | Presenter |
|---|---|---|
| Feb 3, 2026 | Activation Oracles | Federico |
| Feb 10, 2026 | Weird generalizations | Lukas |
| Feb 16, 2026 | The Dead Salmons of AI Interpretability | Andor |
| Mar 2, 2026 | Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences | Annemette |
| Mar 9, 2026 | Linear Representations can change over the course of a conversation | Federico |
| Mar 23, 2026 | EasySteer | Gianluca |
| Mar 30, 2026 | Thought Branches | Andrea |
| Apr 13, 2026 | Evaluating and Understanding Scheming Propensity in LLM Agents | Filippo |
| Apr 20, 2026, 13:00 | TBD | TBD |
| Apr 27, 2026 | Emergently Misaligned Language Models Show Behavioral Self-Awareness That Shifts With Subsequent Realignment | Laurene Vaugrante |