Greg Frank PRO
gregfrank
AI & ML interests
Alignment, mechanistic interpretability, model behavior, agentic red-teaming
Recent Activity
authored a paper 8 days ago
Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails
authored a paper 8 days ago
How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models
updated a collection 9 days ago
alignment
Organizations
None yet