Jeremy Howard (@jeremyphoward)'s Twitter Profile
Jeremy Howard

@jeremyphoward

🇦🇺 Co-founder: @AnswerDotAI & @FastDotAI ;
Hon Professor: @UQSchoolITEE ;
Digital Fellow: @Stanford

ID:175282603

Link: http://answer.ai · Joined: 06-08-2010 04:58:18

54.8K Tweets

220.6K Followers

4.9K Following

JJ (@JosephJacks_)'s Twitter Profile Photo

“Open source AI has risks, but it’s potentially a much bigger risk that one institution controls the most powerful AI” 🎯🎯🎯🎯🎯🎯🎯🎯🎯🎯🎯🎯🎯

Benjamin Warner (@benjamin_warner)'s Twitter Profile Photo

If finetuning Llama 3 w/ Hugging Face, use Transformers 4.37 or 4.40.

Llama & Gemma in 4.38 & 4.39 don't use PyTorch's Flash Attention 2 kernel, leading to high memory usage.

4.40 uses FA2 in eager mode, but not with torch.compile. I'm working with HF to fully fix this.
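The version advice above can be sketched as a small helper. This is a hedged illustration, not Benjamin's code: the helper name is hypothetical, and the version cutoffs are taken directly from the tweet. "sdpa" refers to Transformers' PyTorch `scaled_dot_product_attention` backend, which can dispatch to the Flash Attention 2 kernel; "flash_attention_2" uses the separate `flash-attn` package.

```python
def recommended_attn_implementation(transformers_version: str) -> str:
    """Pick an `attn_implementation` for Llama/Gemma finetuning.

    Per the tweet: Transformers 4.38/4.39 don't use PyTorch's FA2 kernel
    for Llama & Gemma (high memory usage), while 4.37 and 4.40 are fine.
    On the affected releases, requesting the flash-attn package explicitly
    is a workaround.
    """
    major, minor = (int(part) for part in transformers_version.split(".")[:2])
    if (major, minor) in {(4, 38), (4, 39)}:
        # Affected releases: bypass the broken SDPA path.
        return "flash_attention_2"
    return "sdpa"
```

The result would then be passed to `AutoModelForCausalLM.from_pretrained(..., attn_implementation=...)`.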

Jeremy Howard (@jeremyphoward)'s Twitter Profile Photo

> 'I'm not based on LLaMA 3'

I'm surprised that most modern LLMs still aren't being fine tuned to correctly answer basic questions about themselves.

Intuitively, users expect that they can ask an LLM about itself, and they generally trust the answers provided.

Vipul Ved Prakash (@vipulved)'s Twitter Profile Photo

These models are incredible, and a massive step forward for OSS AI. Amazing work from Meta team!

On Together AI now at 350 t/s for full precision on 8B and 150 t/s on 70B.

api.together.xyz/playground/cha…

Jeremy Howard (@jeremyphoward)'s Twitter Profile Photo

Claude has a nice trick where you prefill the start of the assistant response, and it continues from there. Anyone know if Llama 3 can do the same thing?
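Because Llama 3's chat format is plain text, the same trick is possible by ending the prompt mid-way through the assistant turn and letting the model continue. A minimal sketch, assuming Llama 3's documented header tokens; the helper name is hypothetical:

```python
def llama3_prompt_with_prefill(user_msg: str, assistant_prefill: str) -> str:
    """Build a Llama 3 chat prompt whose assistant turn is pre-started.

    The assistant turn deliberately has no closing <|eot_id|>, so
    generation continues from the prefilled text.
    """
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f"{assistant_prefill}"
    )
```

The resulting string would be tokenized without adding special tokens and passed straight to `generate`.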

Maziyar PANAHI (@MaziyarPanahi)'s Twitter Profile Photo

Mixtral-8x22B-Instruct-v0.1, going wild on TOOLS & FUNCTION CALLING:

'<unk>',
'<s>',
'</s>',
'[INST]',
'[/INST]',
'[TOOL_CALLS]',
'[AVAILABLE_TOOLS]',
'[/AVAILABLE_TOOLS]',
'[TOOL_RESULT]',
'[/TOOL_RESULTS]',

Nathan Lambert (@natolambert)'s Twitter Profile Photo

Diff of the Llama 3 license vs. Llama 2: mostly around sharing, "Built with Llama 3" branding, and agreeing to Meta's brand guidelines when distributing under the trademark.
Some minor other differences.

Daniel Han (@danielhanchen)'s Twitter Profile Photo

#LLaMA3 is out! It's the same architecture as Llama-2, except for some differences:
1. 128K Tiktoken vocab vs 32K vocab of Llama-2
2. 15 Trillion tokens instead of 2T
3. 8 billion model uses GQA (unlike Llama 7b)
4. 8K Context Length
5. Chinchilla scaling laws - log linear gains!…

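Point 3 above (GQA) can be made concrete with a little arithmetic: grouped-query attention shrinks the KV cache by `num_heads / num_kv_heads`. A sketch, assuming Llama 3 8B's published config (32 layers, head dim 128, 8 KV heads, 8K context) versus a full multi-head baseline with 32 KV heads:

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Size of the KV cache in bytes: 2x for K and V, fp16/bf16 elements."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Full MHA baseline (all 32 heads keep their own K/V) vs. GQA with 8 KV heads.
mha = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=8192)
gqa = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=8192)
print(mha // gqa)  # → 4: GQA cuts KV-cache memory 4x at 8K context
```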
jackson petty (@jowenpetty)'s Twitter Profile Photo

Mikel Artetxe I'd just like to interject for a moment. What you're referring to as <model>,
is in fact, Llama 3/<model>, or as I've recently taken to calling it, Llama 3 plus <model>.
