Interpretability, Research

A Mathematical Framework for Transformer Circuits

Dec 22, 2021