AI Safety Readings | Inversion Lab for AI Safety

AI Safety Readings

We run a regular reading group on AI Safety research. We discuss recent papers on interpretability, alignment, multi-agent safety, and related topics. Everyone is welcome, regardless of background.

The reading group is usually held on Mondays at 14:00 Copenhagen time. Email us at galke@imada.sdu.dk if you would like to join or have any questions.

Schedule

Date | Topic | Presenter
Feb 3, 2026 | Activation Oracles | Federico
Feb 10, 2026 | Weird generalizations | Lukas
Feb 16, 2026 | The Dead Salmons of AI Interpretability | Andor
Mar 2, 2026 | Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences | Annemette
Mar 9, 2026 | Linear Representations can change over the course of a conversation | Federico
Mar 23, 2026 | EasySteer | Gianluca
Mar 30, 2026 | Thought Branches | Andrea
Apr 13, 2026 | Evaluating and Understanding Scheming Propensity in LLM Agents | Filippo
Apr 20, 2026, 13:00 | t.b.d. | t.b.d.
Apr 27, 2026 | Emergently Misaligned Language Models Show Behavioral Self-Awareness That Shifts With Subsequent Realignment | Laurene Vaugrante

Contact us at galke@imada.sdu.dk