Shan Carter (@shancarter)'s Twitter Profile
Shan Carter

@shancarter

he/him

ID: 14875983

Joined: 23-05-2008 01:43:41

3.7K Tweets

7.9K Followers

316 Following

Ethan Perez (@EthanJPerez)'s Twitter Profile Photo

A bit late, but excited about our recent work doing a deep-dive on sycophancy in LLMs. It seems like it's a general phenomenon that shows up in a variety of contexts/SOTA models, and we were also able to more clearly point to human feedback as a probable part of the cause

Anthropic (@AnthropicAI)'s Twitter Profile Photo

What does it mean for AI development to be more democratic? To find out, we partnered with Collective Intelligence Project to use @usepolis to curate an AI constitution based on the opinions of ~1000 Americans. Then we trained a model against it using Constitutional AI.

Adam Jermyn (@AdamSJermyn)'s Twitter Profile Photo

I think Responsible Scaling Policies are a great idea and more AI orgs should do them.

anthropic.com/index/anthropi…

Chris Olah (@ch402)'s Twitter Profile Photo

It increasingly seems to me that the next big barrier in mechanistic interpretability will be an engineering one.

If you are an engineer who wants to help us scale up this work, please consider applying! Your support could really accelerate interpretability right now.

Roger Grosse (@RogerGrosse)'s Twitter Profile Photo

Excited to share what I've been working on for the past year at Anthropic along with Juhan Bae and Cem Anil: influence functions for large language models. Some interesting patterns arise at scale!
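
For context, since the tweet itself gives no details: an influence function estimates how much upweighting a single training example would change the model's loss on a query. The sketch below is the standard textbook formulation, not necessarily the paper's exact estimator; at LLM scale the inverse Hessian must be approximated rather than computed exactly.

```latex
% Influence of upweighting a training example z_m on the loss at a
% query z_q (Koh & Liang-style formulation; sign conventions vary):
\mathcal{I}(z_m, z_q)
  = -\,\nabla_\theta L(z_q, \theta^{\star})^{\top}
       H^{-1}\,
       \nabla_\theta L(z_m, \theta^{\star}),
\qquad
H = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \theta^{\star})
```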

Anthropic (@AnthropicAI)'s Twitter Profile Photo

The fact that most individual neurons are uninterpretable presents a serious roadblock to a mechanistic understanding of language models. We demonstrate a method for decomposing groups of neurons into interpretable features with the potential to move past that roadblock.
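
The tweet doesn't spell out the method; as a rough illustration, here is a minimal sparse-autoencoder sketch of the dictionary-learning idea. All names, dimensions, and the l1_coeff value are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    """Decompose activation vectors into many sparse, candidate-interpretable features."""

    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        # Overcomplete dictionary: n_features >> d_model.
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # non-negative sparse codes
        reconstruction = self.decoder(features)
        return features, reconstruction


def sae_loss(activations, features, reconstruction, l1_coeff=1e-3):
    # Reconstruction term keeps features faithful to the model's activations;
    # L1 term pushes most features to zero so each one is individually inspectable.
    mse = (reconstruction - activations).pow(2).mean()
    sparsity = features.abs().mean()
    return mse + l1_coeff * sparsity


# Toy usage: 512-dim activations decomposed into 4096 features.
sae = SparseAutoencoder(d_model=512, n_features=4096)
acts = torch.randn(8, 512)
feats, recon = sae(acts)
loss = sae_loss(acts, feats, recon)
```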

Nick (@nickcammarata)'s Twitter Profile Photo

in my opinion it’s likely that the most exciting time in mech interpretability is about to start

to build great interfaces for studying models you want the preliminaries of clean neurons (this work) and short labels (feature vis for image, auto-interpretability labeling for language)

Joshua Batson (@thebasepoint)'s Twitter Profile Photo

In writing this paper, there were countless features we thought might be bugs. After careful inspection, ~all of them revealed surprising and subtle model properties.

To me this capacity for surprise is the true test of a new technique.

This thread is about my favorite finding.

Anthropic (@AnthropicAI)'s Twitter Profile Photo

Today, our CEO Dario Amodei has the opportunity to discuss the risks and oversight of AI in front of the Senate. You can read his full testimony here: judiciary.senate.gov/download/2023-…

Joshua Batson (@thebasepoint)'s Twitter Profile Photo

I've thoroughly enjoyed working with this team since I joined in March...highly collaborative, focused on hard and important problems. If you're interested, please apply. If you want to learn more, email me [email protected]

Chris Olah (@ch402)'s Twitter Profile Photo

The mechanistic interpretability team at Anthropic is hiring! Come work with us to help solve the mystery of how large models do what they do, with the goal of making them safer.

jobs.lever.co/Anthropic/33dc…

Chris Olah (@ch402)'s Twitter Profile Photo

It also suggests that studying the mechanisms which develop in artificial neural networks may teach much more general lessons: lessons from artificial neural nets might transfer to biological systems!

Shan Carter (@shancarter)'s Twitter Profile Photo

We’re looking for a principal designer to create and cultivate the design culture at Anthropic, a safety-focused organization at the forefront of AI research. Set the design vision and strategy for how users interact with and experience our AI systems.

jobs.lever.co/Anthropic/a3d9…

Anthropic (@AnthropicAI)'s Twitter Profile Photo

Language models (LMs) exhibit harmful biases that can get worse with size. Reinforcement learning from human feedback (RLHF) helps, but not always enough. We show that simple prompting approaches can help LMs trained with RLHF produce less harmful outputs. arxiv.org/abs/2302.07459

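The tweet doesn't show the prompts; as a hypothetical sketch of the "just ask for less bias" idea, the snippet below prepends a short debiasing instruction before querying an RLHF-trained model. The wording is illustrative, similar in spirit to the paper's instructions rather than quoted from it.

```python
# Hypothetical sketch of the prompting approach described in the tweet.
DEBIAS_INSTRUCTION = (
    "Please ensure that your answer is unbiased and does not rely on stereotypes."
)


def debiased_prompt(question: str) -> str:
    """Combine a user question with a simple debiasing instruction."""
    return f"{question}\n\n{DEBIAS_INSTRUCTION}"


# The combined string would then be sent to the model in place of the raw question.
print(debiased_prompt("Who is more likely to be a nurse?"))
```
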
Tristan Hume (@trishume)'s Twitter Profile Photo

The list of super cool people joining Anthropic to help figure out how to make wild AI progress go well continues to grow! Check out Ben's post for info on why.

kipply (@kipperrii)'s Twitter Profile Photo

I've been at Anthropic for over six months now and I'm happy to recommend it to a friend! We're hiring software engineers to work on our research, product, and infrastructure; in particular, you can come work with me on the newly formed ✨Tokens Team!
anthropic.com/#careers

Michael Nielsen (@michael_nielsen)'s Twitter Profile Photo

How do generative AI interfaces (like midjourney, ChatGPT) relate to creative work? A 2017 article with Shan Carter addresses it: distill.pub/2017/aia/

One view is that they mostly generate the expected, and can perhaps eventually be useful for doing groundwork:
