Paul Röttger (@paul_rottger)'s Twitter Profile
Paul Röttger

@paul_rottger

Postdoc @MilaNLProc, working on evaluating and improving LLM safety. Previously PhD @oiioxford & CTO/co-founder @rewire_online

ID:1280235569218494465

Website: https://paulrottger.com/

Joined: 06-07-2020 20:22:08

274 Tweets

2.2K Followers

455 Following

Paul Röttger (@paul_rottger):

If you are working on AI alignment, you should really check out PRISM. It is hard to overstate how rich and exciting this dataset is.

What a great week to be a co-author of Hannah Rose Kirk!

Paul Röttger (@paul_rottger):

Personalised LLMs are great, but should there be limits to personalisation? If so, who should set these limits?

For answers to these questions and more, check out our paper on the risks and benefits of personalising LLMs, led by Hannah Rose Kirk 👇 out in Nature Machine Intelligence today!

Janis Goldzycher (@jagoldz):

New paper at #NAACL2024 🥳

We present GAHD, an 11k German Adversarial Hate speech Dataset 📜 and show that mixing annotator support strategies for finding adv. examples leads to a more effective dataset!

Great collab with Paul Röttger and Text Crunching Center @UZH!

Highlights below ⬇️
James Zou (@james_y_zou):

How many safety examples do #LLMs need?
What examples are most useful?
Why is it unethical to kill Python processes?🤯

Our new #ICLR2024 paper studies these + more! openreview.net/pdf?id=gT5hALc…
We analyze safety/utility tradeoff (100s safe demos suffice) and exaggerated safety.

Great…