Soumith Chintala (@soumithchintala)'s Twitter Profile
Soumith Chintala

@soumithchintala

Cofounded and lead @PyTorch at Meta.
Also dabble in robotics at NYU.

AI is delicious when it is accessible and open-source.

ID:70831441

Link: http://soumith.ch
Joined: 02-09-2009 00:23:57

3.4K Tweets

187.6K Followers

889 Following

lmsys.org (@lmsysorg)

Exciting new blog -- What’s up with Llama-3?

Since Llama 3’s release, it has quickly jumped to top of the leaderboard. We dive into our data and answer below questions:

- What are users asking? When do users prefer Llama 3?
- How challenging are the prompts?
- Are certain users…

Soumith Chintala (@soumithchintala)

I've gotten hyper-conscious of the word 'delve' now.
I now see it everywhere and I can't help but assume someone used GPT to write or rephrase whatever it is I'm reading.

Hugh Zhang @ICLR '24 (@hughbzhang)

Data contamination is a huge problem for LLM evals right now. At Scale, we created a new test set for GSM8k *from scratch* to measure overfitting and found evidence that some models (most notably Mistral and Phi) do substantially worse on this new test set compared to GSM8k.

Gradient (@Gradient_AI_)

We've been in the kitchen cooking 🔥 Excited to release the first @AIatMeta LLama-3 8B with a context length of over 1M on @huggingface - coming off of the 160K context length model we released on Friday!

A huge thank you to @CrusoeEnergy for sponsoring the compute. Let us know…

Soumith Chintala (@soumithchintala)

apparently Google laid off their entire Python Foundations team, WTF!
( Aaron Gokaslan who is one of the pybind11 maintainers just informed me, asking what ways they can re-fund pybind11)
The team seems to have done substantial work that seems critical for Google internally as well.…

Sergey Edunov (@edunov)

There are many ways a very large and powerful model can be useful, even if no one can run it locally today:

Distillation -- think about all recent results people show distilling GPT-4 outputs and training smaller models on those, how much more can be done if the teacher model…

PyTorch (@PyTorch)

PyTorch 2.3 is here 😎🔥

PyTorch 2.3 offers support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance regressions or graph breaks.

Details: hubs.la/Q02tYcYq0

Jeremy Howard (@jeremyphoward)

Today at @answerdotai we've got something new for you: FSDP/QDoRA. We've tested it with @AIatMeta Llama3 and the results blow away anything we've seen before.

I believe that this combination is likely to create better task-specific models than anything else at any cost. 🧵

Mike Schroepfer (@schrep)

@jeremyphoward @soumithchintala Absolutely! I remember this call! One of the things I learned quickly as an exec is that 'getting the true story' was always very challenging as everyone wanted to manage me! One of the many reasons I love open source is it doesn't give a damn about org charts!

Mike Schroepfer (@schrep)

True Story!

One of the many reasons I love open source is it doesn't give a damn about the org chart or 'managing up.' If people outside of FB/Meta didn't use or like our OSS then something was wrong with it.

PyTorch succeeded because of the hyper focus on developer…

Soumith Chintala (@soumithchintala)

very early LMSys Arena results peg llama3-70B at 5th place (the variance is still pretty high, so it can jump up or down a bit).
This is so exciting.
Can't wait to see how the 405B fares once it is released.
chat.lmsys.org/?leaderboard

Soumith Chintala (@soumithchintala)

There's another quieter release from AI at Meta today that's really cool.
* Live Preview: As you type your image prompt, you get a live preview, making iterating for a good image easier.
* Animate: now you can animate images for short bursts

Soumith Chintala (@soumithchintala)

Llama3 8B and 70B are out, with pretty exciting results!
* The ~400B is still training but results already look promising.
* Meta's own Chat interface is also live at meta.ai
* TorchTune integration is shortly going live: github.com/pytorch/torcht…

Andrew Ruiz (@then_there_was)

Oh my god. 😂

GPT-4 uses the word “delve” so much because many of the RLHF’s (reinforcement learning human feedback) workers for GPT-4 were Nigerians who use the word “delve” a lot more relative to other countries.

So GPT-4 writes like an educated anglophone African.
