Shan Carter (@shancarter)'s Twitter Profile
Shan Carter

@shancarter

he/him

ID: 14875983

Joined: 23-05-2008 01:43:41

3.7K Tweets

7.9K Followers

316 Following

Ethan Perez (@EthanJPerez)'s Twitter Profile Photo

A bit late, but excited about our recent work doing a deep-dive on sycophancy in LLMs. It seems like it's a general phenomenon that shows up in a variety of contexts/SOTA models, and we were also able to more clearly point to human feedback as a probable part of the cause

Anthropic (@AnthropicAI)'s Twitter Profile Photo

What does it mean for AI development to be more democratic? To find out, we partnered with Collective Intelligence Project to use @usepolis to curate an AI constitution based on the opinions of ~1000 Americans. Then we trained a model against it using Constitutional AI.

Adam Jermyn (@AdamSJermyn)'s Twitter Profile Photo

I think Responsible Scaling Policies are a great idea and more AI orgs should do them.

anthropic.com/index/anthropi…

Chris Olah (@ch402)'s Twitter Profile Photo

It increasingly seems to me that the next big barrier in mechanistic interpretability will be an engineering one.

If you are an engineer who wants to help us scale up this work, please consider applying! Your support could really accelerate interpretability right now.

Roger Grosse (@RogerGrosse)'s Twitter Profile Photo

Excited to share what I've been working on for the past year at Anthropic along with Juhan Bae and Cem Anil: influence functions for large language models. Some interesting patterns arise at scale!
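
For context, since the tweet itself gives no details: an influence function estimates how much upweighting a single training example would change the model's loss on a query. The sketch below is the standard textbook formulation, not necessarily the paper's exact estimator; at LLM scale the inverse Hessian must be approximated rather than computed exactly.

```latex
% Influence of upweighting a training example z_m on the loss at a
% query z_q (Koh & Liang-style formulation; sign conventions vary):
\mathcal{I}(z_m, z_q)
  = -\,\nabla_\theta L(z_q, \theta^{\star})^{\top}
       H^{-1}\,
       \nabla_\theta L(z_m, \theta^{\star}),
\qquad
H = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \theta^{\star})
```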

Anthropic (@AnthropicAI)'s Twitter Profile Photo

The fact that most individual neurons are uninterpretable presents a serious roadblock to a mechanistic understanding of language models. We demonstrate a method for decomposing groups of neurons into interpretable features with the potential to move past that roadblock.
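
The tweet doesn't spell out the method; as a rough illustration, here is a minimal sparse-autoencoder sketch of the dictionary-learning idea. All names, dimensions, and the l1_coeff value are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    """Decompose activation vectors into many sparse, candidate-interpretable features."""

    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        # Overcomplete dictionary: n_features >> d_model.
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # non-negative sparse codes
        reconstruction = self.decoder(features)
        return features, reconstruction


def sae_loss(activations, features, reconstruction, l1_coeff=1e-3):
    # Reconstruction term keeps features faithful to the model's activations;
    # L1 term pushes most features to zero so each one is individually inspectable.
    mse = (reconstruction - activations).pow(2).mean()
    sparsity = features.abs().mean()
    return mse + l1_coeff * sparsity


# Toy usage: 512-dim activations decomposed into 4096 features.
sae = SparseAutoencoder(d_model=512, n_features=4096)
acts = torch.randn(8, 512)
feats, recon = sae(acts)
loss = sae_loss(acts, feats, recon)
```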

Nick (@nickcammarata)'s Twitter Profile Photo

in my opinion it’s likely that the most exciting time in mech interpretability is about to start

to build great interfaces for studying models you want the preliminaries of clean neurons (this work) and short labels (feature vis for image, auto-interpretability labeling for language)

Joshua Batson (@thebasepoint)'s Twitter Profile Photo

In writing this paper, there were countless features we thought might be bugs. After careful inspection, ~all of them revealed surprising and subtle model properties.

To me this capacity for surprise is the true test of a new technique.

This thread is about my favorite finding.

Anthropic (@AnthropicAI)'s Twitter Profile Photo

Today, our CEO Dario Amodei has the opportunity to discuss the risks and oversight of AI in front of the Senate. You can read his full testimony here: judiciary.senate.gov/download/2023-…

Joshua Batson (@thebasepoint)'s Twitter Profile Photo

I've thoroughly enjoyed working with this team since I joined in March...highly collaborative, focused on hard and important problems. If you're interested, please apply. If you want to learn more, email me [email protected]

Chris Olah (@ch402)'s Twitter Profile Photo

The mechanistic interpretability team at Anthropic is hiring! Come work with us to help solve the mystery of how large models do what they do, with the goal of making them safer.

jobs.lever.co/Anthropic/33dc…

Chris Olah (@ch402)'s Twitter Profile Photo

It also suggests that studying the mechanisms which develop in artificial neural networks may teach much more general lessons: lessons from artificial neural nets might transfer to biological systems!

Shan Carter (@shancarter)'s Twitter Profile Photo

We’re looking for a principal designer to create and cultivate the design culture at Anthropic, a safety-focused organization at the forefront of AI research. Set the design vision and strategy for how users interact with and experience our AI systems.

jobs.lever.co/Anthropic/a3d9…

Anthropic (@AnthropicAI)'s Twitter Profile Photo

Language models (LMs) exhibit harmful biases that can get worse with size. Reinforcement learning from human feedback (RLHF) helps, but not always enough. We show that simple prompting approaches can help LMs trained with RLHF produce less harmful outputs. arxiv.org/abs/2302.07459

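The tweet doesn't show the prompts; as a hypothetical sketch of the "just ask for less bias" idea, the snippet below prepends a short debiasing instruction before querying an RLHF-trained model. The wording is illustrative, similar in spirit to the paper's instructions rather than quoted from it.

```python
# Hypothetical sketch of the prompting approach described in the tweet.
DEBIAS_INSTRUCTION = (
    "Please ensure that your answer is unbiased and does not rely on stereotypes."
)


def debiased_prompt(question: str) -> str:
    """Combine a user question with a simple debiasing instruction."""
    return f"{question}\n\n{DEBIAS_INSTRUCTION}"


# The combined string would then be sent to the model in place of the raw question.
print(debiased_prompt("Who is more likely to be a nurse?"))
```
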
Tristan Hume (@trishume)'s Twitter Profile Photo

The list of super cool people joining Anthropic to help figure out how to make wild AI progress go well continues to grow! Check out Ben's post for info on why.

kipply (@kipperrii)'s Twitter Profile Photo

I've been at Anthropic for over six months now and I'm happy to recommend it to a friend! We're hiring software engineers to work on our research, product, and infrastructure; in particular, you can come work with me on the newly formed ✨Tokens Team!
anthropic.com/#careers

Michael Nielsen (@michael_nielsen)'s Twitter Profile Photo

How do generative AI interfaces (like midjourney, ChatGPT) relate to creative work? A 2017 article with Shan Carter addresses it: distill.pub/2017/aia/

One view is that they mostly generate the expected, and can perhaps eventually be useful for doing groundwork:
