Jan Hendrik Kirchner (@janhkirchner) Twitter Tweets • TwiCopy

Joel David Hamkins

@JDHamkins

1 day ago

Take my Philosophy of Mathematics final exam!
Post your answers in response to each question.

Jacob Pfau

@jacob_pfau

2 weeks ago

Do models need to reason in words to benefit from chain-of-thought tokens?

In our experiments, the answer is no! Models can perform on par with CoT using repeated '...' filler tokens.
This raises alignment concerns: Using filler, LMs can do hidden reasoning not visible in CoT🧵

Owain Evans

@OwainEvans_UK

4 months ago

Article with 'Perspectives on the State and Future of Deep Learning' from a number of AI professors. Many interesting points.

(Image has answer to 'Why haven’t we made progress towards understanding deep learning and will we ever?' from Andrew Gordon Wilson)

28 likes, 5 retweets

Tom Gara

@tomgara

4 months ago

A teen hacked Nvidia, got arrested, was released on bail under police supervision. Police confiscated his laptop and put him in a motel room. He then used the Amazon fire stick connected to his motel room TV to hack Rockstar and steal GTA 6 clips bbc.com/news/technolog…

20.0K likes, 2.2K retweets

vittorio

@IterIntellectus

4 months ago

let me get this straight

we can just tell a piece of software “give it another shot buddy, i know you can do better” to improve its output and people are just cool with that

got it

Leopold Aschenbrenner

@leopoldasch

4 months ago

Very excited that this is out: OpenAI's RSP!

Lots of good stuff in here—particularly excited about the baseline commitments (security, alignment, and not deploying/not developing at high/critical), 'preparedness roadmap' and safety drills, and the model autonomy/CBRN evals.

Consistently Candid Data Generating Process

4 months ago

Greg Brockman

@gdb

4 months ago

New direction for AI alignment — weak-to-strong generalization.

Promising initial results: we used outputs from a weak model (fine-tuned GPT-2) to communicate a task to a stronger model (GPT-4), resulting in intermediate (GPT-3-level) performance.

AI Notkilleveryoneism Memes ⏸️

@AISafetyMemes

4 months ago

Sam Altman I dunk on yall a lot but this is actually pretty cool ♥️ Jan Leike Ilya Sutskever Leopold Aschenbrenner

25 likes, 1 retweet

Stefan Schubert

@StefanFSchubert

4 months ago

OpenAI announces $10M 'superalignment fast grants'

'To support technical research towards the alignment and safety of superhuman AI systems, including weak-to-strong generalization, interpretability, scalable oversight, and more.'

Deadline 18 February

openai.com/blog/superalig…

14 likes

Alignment Lab AI

@alignment_lab

4 months ago

YES

Thank you for making the code available! This is exactly what we need!

OpenAI redemption arc

17 likes, 3 retweets

Rajesh / Raj

@rkarmani

4 months ago

This is an important read, irrespective of a specific company or an architecture

9 likes

Leo Gao

@nabla_theta

4 months ago

new paper! one reason aligning superintelligence is hard is because it will be different from current models, so doing useful empirical research today is hard. we fix one major disanalogy of previous empirical setups. I'm excited for future work making it even more analogous.

Sam Altman

@sama

4 months ago

great work from the superalignment team:

Jan Leike

@janleike

4 months ago

Kudos especially to Collin Burns for being the visionary behind this work, Pavel Izmailov for all the great scientific inquisition, Ilya Sutskever for stoking the fires, Jan Hendrik Kirchner and Leopold Aschenbrenner for moving things forward every day. Amazing ✨

133 likes, 9 retweets

Adrien Ecoffet

@AdrienLE

4 months ago

And especially proud of Collin for being the visionary behind this line of work and driving it all the way here.

20 likes

Sam Bowman

@sleepinyourhat

4 months ago

Excited to see this. Academics interested in alignment, take a look:

82 likes, 5 retweets

Nora Belrose

@norabelrose

4 months ago

We literally saw a small version of this phenomenon when fine tuning Mistral 7B with labels from a weaker model yesterday, glad to see someone already looked into it in depth 👀

112 likes, 5 retweets

AI Notkilleveryoneism Memes ⏸️

4 months ago