Chris Olah
@ch402
Reverse engineering neural networks at @AnthropicAI. DMs open! Previously @distillpub, OpenAI Clarity Team, Google Brain. Personal account.
ID:153196789
http://colah.github.io 07-06-2010 23:08:04
5.2K Tweets
90.9K Followers
173 Following
Great visualisation library for Sparse Autoencoder features from Callum McDougall! My team has already been finding it super useful, go check it out:
lesswrong.com/posts/nAhy6Zqu…
I'm incredibly excited to have Craig joining us on the Anthropic Interpretability team!
I've been a huge fan of Colaboratory for nearly a decade (I used it internally at Google!) and have really admired Craig's work on it.
Next in our series of small monthly updates from the interpretability team, including a few fun things:
1. We use feature attribution to find features related to specific completions (following the athlete-sport association example of Neel Nanda)
Reflections on Qualitative Research:
transformer-circuits.pub/2024/qualitati…
[h/t to Chris Olah for originating & driving this!]