Leo Gao (@nabla_theta)'s Twitter Profile
Leo Gao

@nabla_theta

Alignment researcher. cofounder & head of alignment memes @ EleutherAI. currently RE @ OpenAI. Let's make the future awesome.

ID:1174529814264332289

https://leogao.dev · Joined 19-09-2019 03:45:10

962 Tweets

5.5K Followers

363 Following

Leo Gao (@nabla_theta):

Man goes to doctor. Says he feels all alone in a world not on track to solve alignment. Doctor says, 'Treatment is simple. Great alignment researcher Pagliacci is in town. Go and see him. He has a plan to solve alignment.' Man bursts into tears. 'But doctor.. I am Pagliacci'

Leo Gao (@nabla_theta):

any swe can write code that's maintainable, but it takes a research engineer to write code that's barely maintainable

Leo Gao (@nabla_theta):

pretraining leakage disanalogy explained: we want to study the analogy where weak models supervise the strong model. but because our models are pretrained on human text, there's implicit supervision by something stronger. this could make results look better than they actually are

Leo Gao (@nabla_theta):

human simulator / imitation saliency problem explained: one very natural generalization is to just say ~what a human would say. if this is more natural than what the human would say if they knew what the AI knew, then it will systematically hide things humans can't understand

Jacob Hilton (@JacobHHilton):

There's a cute formula that appears in this paper: KL[best-of-n||best-of-1] = log(n) - (n-1)/n, where best-of-n is the distribution of the best of n i.i.d. samples according to some scoring function. Several people have asked about this so I put together an explainer. (1/6)
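The formula is easy to sanity-check numerically. A minimal sketch (my own, not from the thread or the paper it discusses), assuming the scoring function is continuous: since KL divergence is invariant under a monotone reparameterization of the score, we can take the base distribution to be uniform on [0, 1] without loss of generality. The best-of-n sample is then just the max of n uniform draws, whose density relative to the base is n·x^(n−1), so a Monte Carlo estimate of the KL is the mean of log(n·x^(n−1)) under best-of-n sampling:

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_best_of_n_mc(n, samples=1_000_000):
    """Monte Carlo estimate of KL[best-of-n || best-of-1] for a uniform base."""
    # best-of-n for a uniform base score: the max of n i.i.d. U(0,1) draws
    x = rng.random((samples, n)).max(axis=1)
    # density ratio p_bon(x) / p_base(x) = n * x**(n-1)
    return np.mean(np.log(n * x ** (n - 1)))

def kl_closed_form(n):
    """The formula from the tweet: log(n) - (n-1)/n."""
    return np.log(n) - (n - 1) / n
```

For example, `kl_best_of_n_mc(4)` should land close to `kl_closed_form(4)` = log 4 − 3/4 ≈ 0.636, and the agreement holds for any n, reflecting that the result does not depend on the base score distribution (as long as it is continuous).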
