Eugene Yan (@eugeneyan)'s Twitter Profile
Eugene Yan

@eugeneyan

ML, RecSys, LLMs @ Amazon. Prev Alibaba, Lazada, startup.
Building systems to serve customers at scale. Writing to learn & teach.
Ideas my own.

ID:35109534

Website: https://eugeneyan.com
Joined: 25-04-2009 01:51:11

3.4K Tweets

18.2K Followers

517 Following

Modal (@modal_labs)

What is the sushi of Texas? Who is the Albert Einstein of synchronized swimming? What is to startups as gasoline is to engines?

Discover these connections, and more, with analogy-based Wikipedia search -- powered by Weaviate (vector database) and Modal!

Try it here: vector-analogies-wikipedia.vercel.app

chris (@hingeloss)

The best part of doing a startup is getting to choose the right thing over the prettiest thing.

The hardest part is saying no to everyone who just wants the pretty thing.

tobi lutke (@tobi)

Sunday rant.

For software engineering, my sense is that the phrase “premature optimization is the root of all evil” has massively backfired. It's from a book on data structures and mainly tried to dissuade people from prematurely writing things in assembler. But the point was to…

Eugene Yan (@eugeneyan)

The doers are the major thinkers. The people that really create the things that change this industry are both the thinker and the doer in one person.

Of course it's very easy to take credit for the thinking. The doing is more concrete but it's very easy for somebody to say 'oh I…

Jo Kristian Bergum (@jobergum)

imho top performers:

- figure out what needs to be done
- advocate for why it needs to be done
- prioritize what needs to be done versus what should be done
- get it done

Charles 🎉 Frye (@charles_irl)

when i looked back at alexnet again in ~2020 and noticed it had model parallelism, i realized that i really needed to spend less time on mathematics and more on software engineering

Eugene Yan (@eugeneyan)

You NEED “privately curated, internal benchmarks for each company’s own use cases. You can’t game your customers.”

Alexandr Wang (@alexandr_wang)

How overfit are popular LLMs on public benchmarks?

New research out of @scale_ai SEAL to answer this:

- produced a new eval GSM1k
- evaluated public LLMs for overfitting on GSM8k

VERDICT: Mistral & Phi are overfitting benchmarks, while GPT, Claude, Gemini, and Llama are not.

Hamel Husain (@HamelHusain)

I’m getting lots of questions about why this is a bad idea.

Repeatedly peeking at the validation set in the process of optimizing anything makes that validation set very biased.

It’s very bad hygiene to intermingle your validation and test/eval set. The consequences of this…
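The selection bias from repeatedly peeking at a validation set can be seen in a small simulation. This is a minimal sketch under stated assumptions, none of which come from the tweet: the candidate "models" are hypothetical random guessers with a true accuracy of 50%, the labels are synthetic, and `simulate_peeking` and `accuracy` are illustrative helpers.

```python
import random


def accuracy(preds: list[int], labels: list[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def simulate_peeking(n_candidates: int = 200, n_val: int = 100,
                     n_test: int = 100, seed: int = 0) -> tuple[float, float]:
    """Select the best of many coin-flip 'models' by validation accuracy.

    Every hypothetical candidate guesses uniformly at random, so its true
    accuracy is 50%. Returns (validation accuracy, test accuracy) of the
    candidate chosen by peeking at the validation set.
    """
    rng = random.Random(seed)
    # Hypothetical binary labels for separate validation and test splits.
    y_val = [rng.randint(0, 1) for _ in range(n_val)]
    y_test = [rng.randint(0, 1) for _ in range(n_test)]

    best_val_acc, best_test_preds = -1.0, []
    for _ in range(n_candidates):
        # Each candidate predicts at random on both splits; it has no real skill.
        val_preds = [rng.randint(0, 1) for _ in range(n_val)]
        test_preds = [rng.randint(0, 1) for _ in range(n_test)]
        val_acc = accuracy(val_preds, y_val)
        # "Peeking": keep whichever candidate scores best on the validation set.
        if val_acc > best_val_acc:
            best_val_acc, best_test_preds = val_acc, test_preds

    # The test split played no part in the choice, so it gives an unbiased estimate.
    return best_val_acc, accuracy(best_test_preds, y_test)


if __name__ == "__main__":
    val_acc, test_acc = simulate_peeking()
    print(f"selected on validation: {val_acc:.2f}, held-out test: {test_acc:.2f}")
```

Because every candidate is a coin flip, any validation accuracy well above 50% for the selected candidate is pure selection bias; the held-out test split, which was never used to pick a winner, stays near 50%. Keeping the test/eval set separate from the set you optimize against is what makes that second number trustworthy.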

sarah guo // conviction (@saranormous)

HUGE problem for startups selling AI products to enterprises - buyers have no idea what is going on

you think evals are a problem in research, but way more so in the real world: noisy, snake oil marketing, top-down pressure, lack of talent, focus on credibility as safe signal
