AI Safety Readings
We run a regular reading group on AI safety research, discussing recent papers on interpretability, alignment, multi-agent safety, and related topics. Everyone is welcome, regardless of background.
The reading group usually meets on Mondays at 14:00 Copenhagen time. Email us at galke@imada.sdu.dk if you would like to join or have any questions.
Schedule
| Date | Topic | Presenter |
|---|---|---|
| Feb 3, 2026 | Activation Oracles | Federico |
| Feb 10, 2026 | Weird generalizations | Lukas |
| Feb 16, 2026 | The Dead Salmons of AI Interpretability | Andor |
| Mar 2, 2026 | Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences | Annemette |
| Mar 9, 2026 | Linear Representations can change over the course of a conversation | Federico |
| Mar 23, 2026 | EasySteer | Gianluca |
| Mar 30, 2026 | Thought Branches | Andrea |
| Apr 13, 2026 | Evaluating and Understanding Scheming Propensity in LLM Agents | Filippo |
| Apr 20, 2026, 13:00 | TBD | TBD |
| Apr 27, 2026 | Emergently Misaligned Language Models Show Behavioral Self-Awareness That Shifts With Subsequent Realignment | Laurene Vaugrante |