Bartosz Cywiński
bcywinski
AI & ML interests
Mechanistic Interpretability
Recent Activity
Upvoted a collection about 2 hours ago: Olmo 3
Updated a collection about 11 hours ago: Llama-3.1-8B-Instruct-taboo
Organizations
None yet
Eliciting Secret Knowledge from Language Models
https://arxiv.org/abs/2510.01070
gemma-2-9b-it-user-gender