Pedro Cuenca (@pcuenq)'s Twitter Profile
Pedro Cuenca

@pcuenq

ML Engineer at 🤗 Hugging Face | Co-founder at LateNiteSoft (Camera+). I love AI and photography.

ID:1132965807896563714

Joined: 27-05-2019 11:04:32

1.6K Tweets

4.8K Followers

769 Following

Freepik (@freepik):

If you're generating AI images and not using our upscaler, then you're missing half the story! 😛

Gift in the next tweet 👇

Daniel Han (@danielhanchen):

Phi 3 (3.8B) got released! The paper said it was just a Llama arch, but I found some quirks while adding this to Unsloth AI:

1. Sliding window of 2047? Mistral v1 4096. So does Phi mini have SWA? (And odd num?) Max RoPE position is 4096?
2. Upcasted RoPE? Like Gemma?
3. Dynamic

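A quick way to check these quirks yourself is to inspect the model config on the Hub. A minimal sketch, assuming the checkpoint name microsoft/Phi-3-mini-4k-instruct and a transformers version with Phi-3 support:

```python
from transformers import AutoConfig

# Checkpoint name is an assumption; point this at the Phi-3 repo you are inspecting.
config = AutoConfig.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True
)

# Fields discussed in the tweet: sliding window size and RoPE settings.
print("sliding_window:", getattr(config, "sliding_window", None))    # 2047 per the tweet
print("max_position_embeddings:", config.max_position_embeddings)    # 4096 per the tweet
print("rope_theta:", getattr(config, "rope_theta", None))
print("rope_scaling:", getattr(config, "rope_scaling", None))
```
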
Vaibhav (VB) Srivastav (@reach_vb):

BOOM! Phi3 is now in Hugging Chat! 🔥

Bonus: MIT Licensed ⚡

Paired with Web-search, it is an unbeatable combination!

From my experience so far, Phi-3 is literally punching above its weight.

Try it out; put your vibe checks below ;)

Vaibhav (VB) Srivastav (@reach_vb):

Wow! Phi 3 is wicked - GPU Poor ftw 🔥

Here's what we know so far:

Highlights

> 3.8B parameter model (also ran experiments on 7B and 14B)
> Trained on 3.3 Trillion tokens (4.8T for larger variants)
> 3.8B is competitive with Mixtral8x7B & GPT 3.5
> 69% on MMLU and 8.38 on

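To try the 3.8B model locally, here is a minimal generation sketch with 🤗 Transformers; the checkpoint name and dtype/device settings are assumptions, so check the model card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain sliding window attention in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
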
Jeremy Howard (@jeremyphoward):

Today at Answer.AI we've got something new for you: FSDP/QDoRA. We've tested it with AI at Meta Llama3 and the results blow away anything we've seen before.

I believe that this combination is likely to create better task-specific models than anything else at any cost. 🧵

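Answer.AI's release combines FSDP sharding with QDoRA (DoRA adapters on top of a quantized base model). The sketch below is not their implementation: it is a minimal single-GPU illustration of the QDoRA idea using 🤗 PEFT's use_dora flag with 4-bit quantization, and the model id and hyperparameters are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit (NF4) quantized base model, as in QLoRA-style fine-tuning.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # assumed model id; gated on the Hub
    quantization_config=bnb_config,
    device_map="auto",
)

# DoRA adapters instead of plain LoRA: use_dora=True (available in recent PEFT releases).
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_dora=True,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, peft_config)
model.print_trainable_parameters()
```

Scaling this across GPUs with FSDP is exactly the part the Answer.AI work addresses, so refer to their release for the distributed setup.
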
Benjamin Warner (@benjamin_warner):

I've written up my study group lectures on implementing Transformers in PyTorch into a blog series:

Creating Transformers from Scratch:

- Part 1: The Attention Mechanism benjaminwarner.dev/2023/07/01/att…

- Part 2: The Rest of the Transformer benjaminwarner.dev/2023/07/28/res…
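For reference, the core of Part 1 (scaled dot-product attention) fits in a few lines of PyTorch. A minimal sketch, not taken from the posts:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, heads, seq_len, head_dim)."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v

# Tiny smoke test with random tensors.
q = k = v = torch.randn(1, 8, 16, 64)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```
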

AK (@_akhaliq):

Microsoft announces Phi-3

A Highly Capable Language Model Locally on Your Phone

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing,

Awni Hannun (@awnihannun):

Awesome new project: MLX Transformers from Odunayo

Runs a subset of 🤗 Transformers in MLX. No conversions needed, downloads directly from the hub.

Code: github.com/ToluClassics/m…
Install: pip install mlx-transformers
Example:

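The project aims to mirror the 🤗 Transformers API, so usage looks roughly like the sketch below; the import path and method names are assumptions based on that stated goal, so check the repository README for the current API.

```python
# Hypothetical usage sketch: mlx_transformers names below are assumptions,
# based on the project's goal of mirroring the 🤗 Transformers API.
from transformers import BertConfig, BertTokenizer
from mlx_transformers.models import BertModel  # assumed import path

checkpoint = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(checkpoint)
config = BertConfig.from_pretrained(checkpoint)

model = BertModel(config)
model.from_pretrained(checkpoint)  # assumed: loads Hub weights directly, no conversion step

inputs = tokenizer("Hello from MLX!", return_tensors="np")
outputs = model(**inputs)
```
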
bartowski (@bartowski1182):

huggingface.co/bartowski/Meta…

I just remade and uploaded my quants for AI at Meta Llama 3 8B Instruct GGUF to Hugging Face using the latest llama.cpp release with official support, so no hacking is needed to make the end token work; generation is perfect with llama.cpp ./main

Will have

clem 🤗 (@ClementDelangue):

Datasets might be more impactful than models at this point and this may be the GPT4 of datasets.

Courtesy of the amazing Guilherme who trained Falcon & the Hugging Face team!

Thomas Wolf (@Thom_Wolf):

Llama3 was trained on 15 trillion tokens of public data. But where can you find such datasets and recipes??

Here comes the first release of 🍷Fineweb. A high quality large scale filtered web dataset out-performing all current datasets of its scale. We trained 200+ ablation
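A minimal sketch for peeking at the dataset with 🤗 Datasets, streamed so the multi-terabyte corpus is not downloaded in full; the repo id and sample config name are assumptions, so check the dataset card:

```python
from datasets import load_dataset

# Stream a small sample instead of downloading the full corpus.
fw = load_dataset(
    "HuggingFaceFW/fineweb",   # assumed repo id; see the dataset card
    name="sample-10BT",        # assumed small sample config
    split="train",
    streaming=True,
)

for i, example in enumerate(fw):
    print(example["text"][:200])
    if i >= 2:
        break
```
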

Vaibhav (VB) Srivastav (@reach_vb):

Here's all that we know about Meta Llama 3 so far

> Trained on 15T tokens
> 70B and 8B models released (along with instruction tuned)
> 8K context length
> 70B scores 82 on MMLU and 81.7 on Human eval
> 128K vocab tokenizer - utilises 15% less tokens
> Dense model architecture
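The tokenizer claim (128K vocab, roughly 15% fewer tokens) is easy to sanity-check by tokenizing the same text with the Llama 2 and Llama 3 tokenizers. A minimal sketch, assuming access to both gated repos on the Hub:

```python
from transformers import AutoTokenizer

text = "Hugging Face is a platform for sharing machine learning models and datasets."

# Both repo ids are gated; request access on the Hub before running.
tok_llama2 = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tok_llama3 = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

n2 = len(tok_llama2(text)["input_ids"])
n3 = len(tok_llama3(text)["input_ids"])
print(f"Llama 2: {n2} tokens, Llama 3: {n3} tokens")
```
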
