Andrej Karpathy (@karpathy)'s Twitter Profile
Andrej Karpathy

@karpathy

🧑‍🍳. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets 🧠🤖💥

ID:33836629

Link: https://karpathy.ai
Joined: 21-04-2009 06:49:15

8.6K Tweets

973.3K Followers

903 Following


Okay I did a first quick pass of naive CUDA kernels for the forward pass of GPT-2 and pushed everything to one file in llm.c. Still only ~1000 lines of code:
github.com/karpathy/llm.c…

Current per iteration timings on my Lambda box <3 A100 40GB PCIe, B=4, T=1024:
- llm.c: 111ms
-…


Btw writing the llm.c training code would imo be a very interesting, impressive, self-contained and very meta challenge for LLM agents. The prompt is:

Take the PyTorch code train_gpt2.py
And write, compile and unit test a single .c file that reproduces the training: train_gpt2.c…


I added a quick crappy tutorial on how PyTorch layers are moved to C, with a few possibly helpful pointers:
github.com/karpathy/llm.c…


Returning from an experimental ~2 week detox from the internet. Main takeaway is that I didn't realize how unsettled the mind can get when over-stimulating on problems/information (like a stirred liquid), and ~2 weeks is enough to settle into a lot more zen state.

I'm struck by…


Follow along with the tiny corp saga, who are (very publicly!) trying to build your commodity ~petaflop compute node.

tinybox specs: tinygrad.org
the YouTube videos from George Hotz 🌑 are actually quite great and entertaining, featuring the signature blend of…


+1 to the best AI newsletter atm that I enjoy skimming, great/ambitious work by swyx & friends:

buttondown.email/ainews/archive/

'Skimming' because they are very long. Not sure how it is built, sounds like there is a lot of LLM aid going on indexing ~356 Twitters, ~21 Discords, etc.


# automating software engineering

In my mind, automating software engineering will look similar to automating driving. E.g. in self-driving the progression of increasing autonomy and higher abstraction looks something like:

1. first the human performs all driving actions…


Beautiful work / attention to detail trying to get Gemma to finetune correctly. There are so many foot guns here to be super careful with. All of these issues don't throw any errors, they silently make your network worse.

A great example of what I wrote about in my 'A Recipe for…


Nice read on the rarely-discussed-in-the-open difficulties of training LLMs. Mature companies have dedicated teams maintaining the clusters. At scale, clusters leave the realm of engineering and become a lot more biological, hence e.g. teams dedicated to 'hardware health'.

It…


Claude 3 takes on the Tokenization book chapter challenge :) context: twitter.com/karpathy/statu…

Definitely looks quite nice, stylistically!

If you look closer, there are a number of subtle issues / hallucinations. One example: there is a claim that 'hello world' tokenizes into 3…


Setting up my shiny new fully maxed out Space Black MacBook Pro M3 Max 128GB 16-inch (upgrading from an M1 Air). I always like to set up the new one with a clean slate, from scratch - this time I will not allow my dev configuration to get out of hand. Then we'll talk to it.
