Towards eliciting latent knowledge from LLMs with mechanistic interpretability • Paper 2505.14352 • Published May 20, 2025
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders • Paper 2501.18052 • Published Jan 29, 2025