Paul Röttger (@paul_rottger) Twitter Tweets • TwiCopy

Paul Röttger

@paul_rottger

Postdoc @MilaNLProc, working on evaluating and improving LLM safety. Previously PhD @oiioxford & CTO/co-founder @rewire_online

ID:1280235569218494465

https://paulrottger.com/

Joined 06-07-2020 20:22:08

274 Tweets

2.2K Followers

455 Following

Paul Röttger

@paul_rottger

1 week ago

If you are working on AI alignment, you should really check out PRISM. It is hard to overstate how rich and exciting this dataset is.

What a great week to be a co-author of Hannah Rose Kirk!

33 likes · 0 replies · 5 reposts

Personalised LLMs are great, but should there be limits to personalisation? If so, who should set these limits?

For answers to these questions and more, check out our paper on the risks and benefits of personalising LLMs, led by Hannah Rose Kirk 👇 out in Nature Machine Intelligence today!

57 likes · 0 replies · 7 reposts

Janis Goldzycher

@jagoldz

1 month ago

New paper at #NAACL2024 🥳

We present GAHD, an 11k German Adversarial Hate speech Dataset 📜 and show that mixing annotator support strategies for finding adv. examples leads to a more effective dataset!

Great collab with Paul Röttger and Text Crunching Center @UZH!

Highlights below ⬇️

49 likes · 0 replies · 4 reposts

James Zou

@james_y_zou

1 month ago

How many safety examples do #LLMs need?
What examples are most useful?
Why is it unethical to kill Python processes?🤯

Our new #ICLR2024 paper studies these + more! openreview.net/pdf?id=gT5hALc…
We analyze the safety/utility tradeoff (100s of safe demos suffice) and exaggerated safety.

Great…