Pedro Cuenca (@pcuenq)'s Twitter Profile
Pedro Cuenca

@pcuenq

ML Engineer at 🤗 Hugging Face | Co-founder at LateNiteSoft (Camera+). I love AI and photography.

ID:1132965807896563714

Joined: 27-05-2019 11:04:32

1.6K Tweets

4.8K Followers

769 Following

Freepik (@freepik):

If you're generating AI images and not using our upscaler, then you're missing half the story! 😛

Gift in the next tweet 👇

Daniel Han (@danielhanchen):

Phi 3 (3.8B) got released! The paper said it was just a Llama arch, but I found some quirks while adding this to Unsloth AI:

1. Sliding window of 2047? Mistral v1 4096. So does Phi mini have SWA? (And odd num?) Max RoPE position is 4096?
2. Upcasted RoPE? Like Gemma?
3. Dynamic

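A quick way to check these quirks yourself is to inspect the model config on the Hub. A minimal sketch, assuming the checkpoint name microsoft/Phi-3-mini-4k-instruct and a transformers version with Phi-3 support:

```python
from transformers import AutoConfig

# Checkpoint name is an assumption; point this at the Phi-3 repo you are inspecting.
config = AutoConfig.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True
)

# Fields discussed in the tweet: sliding window size and RoPE settings.
print("sliding_window:", getattr(config, "sliding_window", None))    # 2047 per the tweet
print("max_position_embeddings:", config.max_position_embeddings)    # 4096 per the tweet
print("rope_theta:", getattr(config, "rope_theta", None))
print("rope_scaling:", getattr(config, "rope_scaling", None))
```
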
Vaibhav (VB) Srivastav (@reach_vb):

BOOM! Phi3 is now in Hugging Chat! 🔥

Bonus: MIT Licensed ⚡

Paired with Web-search, it is an unbeatable combination!

From my experience so far, Phi-3 is literally punching above its weight.

Try it out; put your vibe checks below ;)

Vaibhav (VB) Srivastav (@reach_vb):

Wow! Phi 3 is wicked - GPU Poor ftw 🔥

Here's what we know so far:

Highlights

> 3.8B parameter model (also ran experiments on 7B and 14B)
> Trained on 3.3 Trillion tokens (4.8T for larger variants)
> 3.8B is competitive with Mixtral8x7B & GPT 3.5
> 69% on MMLU and 8.38 on

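To try the 3.8B model locally, here is a minimal generation sketch with 🤗 Transformers; the checkpoint name and dtype/device settings are assumptions, so check the model card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain sliding window attention in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
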
Jeremy Howard (@jeremyphoward):

Today at Answer.AI we've got something new for you: FSDP/QDoRA. We've tested it with AI at Meta Llama3 and the results blow away anything we've seen before.

I believe that this combination is likely to create better task-specific models than anything else at any cost. 🧵

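Answer.AI's release combines FSDP sharding with QDoRA (DoRA adapters on top of a quantized base model). The sketch below is not their implementation: it is a minimal single-GPU illustration of the QDoRA idea using 🤗 PEFT's use_dora flag with 4-bit quantization, and the model id and hyperparameters are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit (NF4) quantized base model, as in QLoRA-style fine-tuning.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # assumed model id; gated on the Hub
    quantization_config=bnb_config,
    device_map="auto",
)

# DoRA adapters instead of plain LoRA: use_dora=True (available in recent PEFT releases).
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_dora=True,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, peft_config)
model.print_trainable_parameters()
```

Scaling this across GPUs with FSDP is exactly the part the Answer.AI work addresses, so refer to their release for the distributed setup.
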
Benjamin Warner (@benjamin_warner):

I've written up my study group lectures on implementing Transformers in PyTorch into a blog series:

Creating Transformers from Scratch:

- Part 1: The Attention Mechanism benjaminwarner.dev/2023/07/01/att…

- Part 2: The Rest of the Transformer benjaminwarner.dev/2023/07/28/res…
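For reference, the core of Part 1 (scaled dot-product attention) fits in a few lines of PyTorch. A minimal sketch, not taken from the posts:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, heads, seq_len, head_dim)."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v

# Tiny smoke test with random tensors.
q = k = v = torch.randn(1, 8, 16, 64)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```
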

AK (@_akhaliq):

Microsoft announces Phi-3

A Highly Capable Language Model Locally on Your Phone

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing,

Awni Hannun (@awnihannun):

Awesome new project: MLX Transformers from Odunayo

Runs a subset of 🤗 Transformers in MLX. No conversions needed, downloads directly from the hub.

Code: github.com/ToluClassics/m…
Install: pip install mlx-transformers
Example:

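The project aims to mirror the 🤗 Transformers API, so usage looks roughly like the sketch below; the import path and method names are assumptions based on that stated goal, so check the repository README for the current API.

```python
# Hypothetical usage sketch: mlx_transformers names below are assumptions,
# based on the project's goal of mirroring the 🤗 Transformers API.
from transformers import BertConfig, BertTokenizer
from mlx_transformers.models import BertModel  # assumed import path

checkpoint = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(checkpoint)
config = BertConfig.from_pretrained(checkpoint)

model = BertModel(config)
model.from_pretrained(checkpoint)  # assumed: loads Hub weights directly, no conversion step

inputs = tokenizer("Hello from MLX!", return_tensors="np")
outputs = model(**inputs)
```
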
bartowski (@bartowski1182):

huggingface.co/bartowski/Meta…

I just remade and uploaded my quants for AI at Meta Llama 3 8B Instruct GGUF to Hugging Face using the latest llama.cpp release with official support, so no hacking is needed to make the end token work; generation is perfect with llama.cpp ./main

Will have

clem 🤗 (@ClementDelangue):

Datasets might be more impactful than models at this point and this may be the GPT4 of datasets.

Courtesy of the amazing Guilherme who trained Falcon & the Hugging Face team!

Thomas Wolf (@Thom_Wolf):

Llama3 was trained on 15 trillion tokens of public data. But where can you find such datasets and recipes??

Here comes the first release of 🍷Fineweb. A high quality large scale filtered web dataset out-performing all current datasets of its scale. We trained 200+ ablation
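A minimal sketch for peeking at the dataset with 🤗 Datasets, streamed so the multi-terabyte corpus is not downloaded in full; the repo id and sample config name are assumptions, so check the dataset card:

```python
from datasets import load_dataset

# Stream a small sample instead of downloading the full corpus.
fw = load_dataset(
    "HuggingFaceFW/fineweb",   # assumed repo id; see the dataset card
    name="sample-10BT",        # assumed small sample config
    split="train",
    streaming=True,
)

for i, example in enumerate(fw):
    print(example["text"][:200])
    if i >= 2:
        break
```
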

Vaibhav (VB) Srivastav (@reach_vb):

Here's all that we know about Meta Llama 3 so far

> Trained on 15T tokens
> 70B and 8B models released (along with instruction tuned)
> 8K context length
> 70B scores 82 on MMLU and 81.7 on Human eval
> 128K vocab tokenizer - utilises 15% less tokens
> Dense model architecture
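The tokenizer claim (128K vocab, roughly 15% fewer tokens) is easy to sanity-check by tokenizing the same text with the Llama 2 and Llama 3 tokenizers. A minimal sketch, assuming access to both gated repos on the Hub:

```python
from transformers import AutoTokenizer

text = "Hugging Face is a platform for sharing machine learning models and datasets."

# Both repo ids are gated; request access on the Hub before running.
tok_llama2 = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tok_llama3 = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

n2 = len(tok_llama2(text)["input_ids"])
n3 = len(tok_llama3(text)["input_ids"])
print(f"Llama 2: {n2} tokens, Llama 3: {n3} tokens")
```
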
