Jan Hendrik Kirchner (@janhkirchner)'s Twitter Profile
Jan Hendrik Kirchner

@janhkirchner

phd student in comp neuroscience @ mpi brain research frankfurt, https://t.co/42mTlpAKYJ, ➡️ supergeneralization theorist

ID:972038953586057216

09-03-2018 09:18:40

401 Tweets

944 Followers

522 Following

Jan Hendrik Kirchner (@janhkirchner)

i figured it out (lol)! utilitarianism is about coherent actions. deontology is for doing utilitarianism with bounded cognition. and virtue ethics is about managing risks from self-modification. don’t at me

Jacob Pfau (@jacob_pfau)

Do models need to reason in words to benefit from chain-of-thought tokens?

In our experiments, the answer is no! Models can perform on par with CoT using repeated '...' filler tokens.
This raises alignment concerns: Using filler, LMs can do hidden reasoning not visible in CoT🧵

Owain Evans (@OwainEvans_UK)

Article with 'Perspectives on the State and Future of Deep Learning' from a number of AI professors. Many interesting points.

(Image has answer to 'Why haven’t we made progress towards understanding deep learning and will we ever?' from Andrew Gordon Wilson)

Tom Gara (@tomgara)

A teen hacked Nvidia, got arrested, was released on bail under police supervision. Police confiscated his laptop and put him in a motel room. He then used the Amazon fire stick connected to his motel room TV to hack Rockstar and steal GTA 6 clips bbc.com/news/technolog…

vittorio (@IterIntellectus)

let me get this straight

we can just tell a piece of software “give it another shot buddy, i know you can do better” to improve its output and people are just cool with that

got it

Leopold Aschenbrenner (@leopoldasch)

Very excited that this is out: OpenAI's RSP!

Lots of good stuff in here—particularly excited about the baseline commitments (security, alignment, and not deploying/not developing at high/critical), 'preparedness roadmap' and safety drills, and the model autonomy/CBRN evals.

Greg Brockman (@gdb)

New direction for AI alignment — weak-to-strong generalization.

Promising initial results: we used outputs from a weak model (fine-tuned GPT-2) to communicate a task to a stronger model (GPT-4), resulting in intermediate (GPT-3-level) performance.

Stefan Schubert (@StefanFSchubert)

OpenAI announces $10M 'superalignment fast grants'

'To support technical research towards the alignment and safety of superhuman AI systems, including weak-to-strong generalization, interpretability, scalable oversight, and more.'

Deadline 18 February

openai.com/blog/superalig…

Leo Gao (@nabla_theta)

new paper! one reason aligning superintelligence is hard is because it will be different from current models, so doing useful empirical research today is hard. we fix one major disanalogy of previous empirical setups. I'm excited for future work making it even more analogous.

Jan Leike (@janleike)

Kudos especially to Collin Burns for being the visionary behind this work, Pavel Izmailov for all the great scientific inquisition, Ilya Sutskever for stoking the fires, Jan Hendrik Kirchner and Leopold Aschenbrenner for moving things forward every day. Amazing ✨

Adrien Ecoffet (@AdrienLE)

And especially proud of Collin for being the visionary behind this line of work and driving it all the way here.

Nora Belrose (@norabelrose)

We literally saw a small version of this phenomenon when fine tuning Mistral 7B with labels from a weaker model yesterday, glad to see someone already looked into it in depth 👀
