Soumith Chintala (@soumithchintala) Twitter Tweets • TwiCopy

Soumith Chintala

@soumithchintala

Cofounded and lead @PyTorch at Meta.
Also dabble in robotics at NYU.

AI is delicious when it is accessible and open-source.

ID:70831441

Link: http://soumith.ch · Joined: 02-09-2009 00:23:57

3,4K Tweets

187,6K Followers

889 Following

lmsys.org

1 day ago

Exciting new blog -- What’s up with Llama-3?

Since Llama 3’s release, it has quickly jumped to top of the leaderboard. We dive into our data and answer below questions:

- What are users asking? When do users prefer Llama 3?
- How challenging are the prompts?
- Are certain users…

Likes: 710

Soumith Chintala

@soumithchintala

3 days ago

I've gotten hyper-conscious of the word 'delve' now.
I now see it everywhere and I can't help but assume someone used GPT to write or rephrase whatever it is I'm reading.

Likes: 302

Hugh Zhang @ICLR '24

1 week ago

Data contamination is a huge problem for LLM evals right now. At Scale, we created a new test set for GSM8k *from scratch* to measure overfitting and found evidence that some models (most notably Mistral and Phi) do substantially worse on this new test set compared to GSM8k.

Likes: 1,0K

Gradient

1 week ago

We've been in the kitchen cooking 🔥 Excited to release the first AI at Meta LLama-3 8B with a context length of over 1M on Hugging Face - coming off of the 160K context length model we released on Friday!

A huge thank you to Crusoe Energy for sponsoring the compute. Let us know…

Likes: 1,1K

Soumith Chintala

@soumithchintala

1 week ago

apparently Google laid off their entire Python Foundations team, WTF!
( Aaron Gokaslan who is one of the pybind11 maintainers just informed me, asking what ways they can re-fund pybind11)
The team seems to have done substantial work that seems critical for Google internally as well.…

Likes: 3,9K

Sergey Edunov

2 weeks ago

There are many ways a very large and powerful model can be useful, even if no one can run it locally today:

Distillation -- think about all recent results people show distilling GPT-4 outputs and training smaller models on those, how much more can be done if the teacher model…

Likes: 69

PyTorch

2 weeks ago

PyTorch 2.3 is here 😎🔥

PyTorch 2.3 offers support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance regressions or graph breaks.

Details: hubs.la/Q02tYcYq0

Likes: 765

Soumith Chintala

@soumithchintala

2 weeks ago

nice work on Phi-3 Sebastien Bubeck and team :-)
results look really impressive.

Likes: 46

Jeremy Howard

2 weeks ago

Today at Answer.AI we've got something new for you: FSDP/QDoRA. We've tested it with AI at Meta Llama3 and the results blow away anything we've seen before.

I believe that this combination is likely to create better task-specific models than anything else at any cost. 🧵

Likes: 1,8K

Soumith Chintala

@soumithchintala

2 weeks ago

Ahmad Al-Dahle also, Llama3-70B is #1 on English-only, whut!!!!

Likes: 89

Mike Schroepfer

2 weeks ago

Jeremy Howard Soumith Chintala Absolutely! I remember this call! One of the things I learned quickly as an exec is that 'getting the true story' was always very challenging as everyone wanted to manage me! One of the many reasons I love open source is it doesn't give a damn about org charts!

Likes: 102

Mike Schroepfer

2 weeks ago

True Story!

One of the many reasons I love open source is it doesn't give a damn about the org chart or 'managing up.' If people outside of FB/Meta didn't use or like our OSS then something was wrong with it.

PyTorch succeeded because of the hyper focus on developer…

Likes: 530

Horace He

3 weeks ago

The live updating image generator on meta.ai/?icebreaker=im… is a pretty sick UX.

Likes: 114

Soumith Chintala

@soumithchintala

3 weeks ago

very early LMSys Arena results peg llama3-70B at 5th place (the variance is still pretty high, so it can jump up or down a bit).
This is so exciting.
Can't wait to see how the 405B fares once it is released.
chat.lmsys.org/?leaderboard

Likes: 198

Soumith Chintala

@soumithchintala

3 weeks ago

There's another quieter release from AI at Meta today that's really cool.
* Live Preview: As you type your image prompt, you get a live preview, making iterating for a good image easier.
* Animate: now you can animate images for short bursts

Likes: 125

Soumith Chintala

@soumithchintala

3 weeks ago

Llama3 8B and 70B are out, with pretty exciting results!
* The ~400B is still training but results already look promising.
* Meta's own Chat interface is also live at meta.ai
* TorchTune integration is shortly going live: github.com/pytorch/torcht…

Likes: 714

Andrew Ruiz

@then_there_was

3 weeks ago

Oh my god. 😂

GPT-4 uses the word “delve” so much because many of the RLHF’s (reinforcement learning human feedback) workers for GPT-4 were Nigerians who use the word “delve” a lot more relative to other countries.

So GPT-4 writes like an educated anglophone African.

Likes: 6,6K
