Josh Tobin (@josh_tobin_)'s Twitter Profile
Josh Tobin

@josh_tobin_

ML-powered products @gantry_ml @full_stack_dl.

Previously @Berkeley_EECS PhD and @openai

ID:331827860

https://gantry.io · Joined 08-07-2011 19:42:10

880 Tweets

11.1K Followers

1.0K Following

Charles 🎉 Frye(@charles_irl) 's Twitter Profile Photo

d.erenrich.net/are-you-smarte…

Great way to try out MMLU and get a sense for just what, exactly, we are using to evaluate LLMs!

I doubt folks would knife fight for a percent on this benchmark if its contents were realized more broadly.

Josh Tobin(@josh_tobin_) 's Twitter Profile Photo

Lessons from the last 24H (as an outsider):
- when things get hard, you see immediately who was rooting for you to fail
- incentives matter
- come at the king you best not miss

Josh Tobin(@josh_tobin_) 's Twitter Profile Photo

There are a lot of good reasons to prefer open-source LLMs.

'We need to own the IP' isn't one.

It's like saying 'We need to own the data centers'. LLM IP is not where the value lies for most businesses. Data is.

Josh Tobin(@josh_tobin_) 's Twitter Profile Photo

This is more common than you'd think.

Early on at OpenAI, I told Pieter Abbeel and Wojciech Zaremba that I was going to spend a few weeks on domain randomization because it felt like the right baseline for domain adaptation.

Turns out it worked way better.

arxiv.org/abs/1703.06907

The Full Stack(@full_stack_dl) 's Twitter Profile Photo

🥞🦜 New LLM Bootcamp Announcement 🦜🥞

In 2023, the AI world speedran through models, architectures (e.g., RAG), and frameworks (e.g., LangChain).

After a year of hype, what's *actually* working?

This November, we'll show you, in our latest class on building prod LLM apps

Josh Tobin(@josh_tobin_) 's Twitter Profile Photo

I'm excited to teach this edition of The Full Stack at Scale by the Bay in November!

Join us to learn about building LLM applications the right way -- systematically, with users in mind, and ready for production.

Josh Tobin(@josh_tobin_) 's Twitter Profile Photo

deep learning in a nutshell:

- If you suspect you have a bug, you do
- If you don't think you have a bug, you still probably do
- If you know you don't have a bug, you still might

Josh Tobin(@josh_tobin_) 's Twitter Profile Photo

Evaluation is a key challenge for LLM builders these days.

I had a great time talking about it at the MLOps Community LLMs in Production Conference.

Check it out here: home.mlops.community/public/collect…

Josh Tobin(@josh_tobin_) 's Twitter Profile Photo

Same phenomenon is playing out for teams building LLM-powered product features.

You launch the feature and see insane retention numbers.

But users lose a bit of trust with each bad interaction. Eventually leads to a delayed churn.

Josh Tobin(@josh_tobin_) 's Twitter Profile Photo

I’m giving a talk on evaluating LLM based applications at the Databricks #DataAISummit at 1:30, come stop by if you are around!

databricks.com/dataaisummit/s…
