Leo Gao (@nabla_theta)'s Twitter Profile
Leo Gao

@nabla_theta

Alignment researcher. cofounder & head of alignment memes @ EleutherAI. currently RE @ OpenAI. Let's make the future awesome.

ID:1174529814264332289

https://leogao.dev · Joined 19-09-2019 03:45:10

962 Tweets

5.5K Followers

363 Following

Leo Gao (@nabla_theta):

Man goes to doctor. Says he feels all alone in a world not on track to solve alignment. Doctor says, 'Treatment is simple. Great alignment researcher Pagliacci is in town. Go and see him. He has a plan to solve alignment.' Man bursts into tears. 'But doctor.. I am Pagliacci'

Leo Gao (@nabla_theta):

any swe can write code that's maintainable, but it takes a research engineer to write code that's barely maintainable

Leo Gao (@nabla_theta):

pretraining leakage disanalogy explained: we want to study the analogy where weak models supervise the strong model. but because our models are pretrained on human text, there's implicit supervision by something stronger. this could make results look better than they actually are

Leo Gao (@nabla_theta):

human simulator / imitation saliency problem explained: one very natural generalization is to just say ~what a human would say. if this is more natural than what the human would say if they knew what the AI knew, then it will systematically hide things humans can't understand

Jacob Hilton (@JacobHHilton):

There's a cute formula that appears in this paper: KL[best-of-n||best-of-1] = log(n) - (n-1)/n, where best-of-n is the distribution of the best of n i.i.d. samples according to some scoring function. Several people have asked about this so I put together an explainer. (1/6)
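The formula is easy to sanity-check numerically. A minimal sketch (my own, not from the thread or the paper it discusses), assuming the scoring function is continuous: since KL divergence is invariant under a monotone reparameterization of the score, we can take the base distribution to be uniform on [0, 1] without loss of generality. The best-of-n sample is then just the max of n uniform draws, whose density relative to the base is n·x^(n−1), so a Monte Carlo estimate of the KL is the mean of log(n·x^(n−1)) under best-of-n sampling:

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_best_of_n_mc(n, samples=1_000_000):
    """Monte Carlo estimate of KL[best-of-n || best-of-1] for a uniform base."""
    # best-of-n for a uniform base score: the max of n i.i.d. U(0,1) draws
    x = rng.random((samples, n)).max(axis=1)
    # density ratio p_bon(x) / p_base(x) = n * x**(n-1)
    return np.mean(np.log(n * x ** (n - 1)))

def kl_closed_form(n):
    """The formula from the tweet: log(n) - (n-1)/n."""
    return np.log(n) - (n - 1) / n
```

For example, `kl_best_of_n_mc(4)` should land close to `kl_closed_form(4)` = log 4 − 3/4 ≈ 0.636, and the agreement holds for any n, reflecting that the result does not depend on the base score distribution (as long as it is continuous).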
