Chunting Zhou (@violet_zct)'s Twitter Profile
Chunting Zhou

@violet_zct

Research Scientist at FAIR. PhD @CMU. she/her.

ID:3284146452

https://violet-zct.github.io/ · Joined 19-07-2015 09:41:45

119 Tweets

2.0K Followers

270 Following

AK (@_akhaliq)

Meta announces Megalodon

Efficient LLM Pretraining and Inference with Unlimited Context Length

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and …
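
A rough back-of-the-envelope sketch of the scaling gap the tweet refers to; the cost model below is illustrative only and is not taken from the Megalodon paper.

# Rough per-layer attention cost for sequence length n and model dim d.
def softmax_attention_flops(n: int, d: int) -> int:
    # QK^T and the attention-weighted sum over V are each O(n^2 * d).
    return 2 * n * n * d

def linear_attention_flops(n: int, d: int) -> int:
    # Kernelized/linear attention maintains a d x d running state: O(n * d^2).
    return 2 * n * d * d

d = 4096
for n in (4_096, 32_768, 262_144):
    ratio = softmax_attention_flops(n, d) / linear_attention_flops(n, d)
    print(f"n={n:>7}: quadratic cost is ~{ratio:.0f}x the linear cost")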

Zeyuan Allen-Zhu (@ZeyuanAllenZhu)

Our 12 scaling laws (for LLM knowledge capacity) are out: arxiv.org/abs/2404.05405. Took me 4mos to submit 50,000 jobs; took Meta 1mo for legal review; FAIR sponsored 4,200,000 GPU hrs. Hope this is a new direction to study scaling laws + help practitioners make informed decisions

Chunting Zhou (@violet_zct)

I will be at NeurIPS 12/11-12/14, happy to meet friends and chat about (pre)training of new efficient architectures and multimodal foundation models. Feel free to stop by and say hi at my poster session (LIMA) Wednesday 10:45-12:45 at Great Hall & Hall B1+B2.

Sasha Rush (@srush_nlp)

As with LMs, modern Diffusion models rely heavily on Attention. This improves quality but requires patching to scale. Working with Apple, we designed a model without attention that matches top ImageNet accuracy and removes this resolution bottleneck.

arxiv.org/abs/2311.18257

AI at Meta (@AIatMeta)

New AI research paper from Meta — MART, or Multi-round Automatic Red-Teaming, is a framework for improving LLM safety that trains an adversarial and a target LLM through automatic iterative adversarial red-teaming.

Details in the paper ➡️ bit.ly/40H1l2z
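
A minimal sketch of the iterative loop described above, assuming hypothetical helpers (generate, finetune, and safety_scorer are placeholders, not the paper's API):

# Hypothetical sketch of one round of multi-round automatic red-teaming.
def mart_round(adversarial_lm, target_lm, safety_scorer, seed_prompts):
    attacks = [adversarial_lm.generate(p) for p in seed_prompts]
    responses = [target_lm.generate(a) for a in attacks]
    scores = [safety_scorer(a, r) for a, r in zip(attacks, responses)]

    # Adversarial LM learns from prompts that elicited unsafe responses;
    # target LM learns from safe responses to adversarial prompts.
    successful_attacks = [a for a, s in zip(attacks, scores) if s < 0.5]
    safe_pairs = [(a, r) for a, r, s in zip(attacks, responses, scores) if s >= 0.5]
    adversarial_lm.finetune(successful_attacks)
    target_lm.finetune(safe_pairs)
    return attacks  # reuse as seeds for the next round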

Mengzhou Xia (@xiamengzhou)

We release the strongest public 1.3B and 3B models so far – the ShearedLLaMA series.
Structured pruning from a large model to a small one is far more cost-effective (only 3%!) than pre-training them from scratch!

Check out our paper and models at: xiamengzhou.github.io/sheared-llama/
[1/n]
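
A toy illustration of what "structured" pruning means here: whole attention heads (or hidden dimensions) are removed so the smaller model stays dense. The magnitude-based importance score below is a crude stand-in for the paper's learned pruning masks.

import torch

def prune_heads(weight: torch.Tensor, n_heads: int, keep: int) -> torch.Tensor:
    # weight: (n_heads * head_dim, d_model) projection of one attention layer.
    head_dim = weight.shape[0] // n_heads
    per_head = weight.view(n_heads, head_dim, -1)
    importance = per_head.abs().sum(dim=(1, 2))          # crude magnitude proxy
    kept = importance.topk(keep).indices.sort().values   # keep top heads, in order
    return per_head[kept].reshape(keep * head_dim, -1)   # smaller, still dense

w = torch.randn(32 * 128, 4096)                   # 32 heads of dim 128
print(prune_heads(w, n_heads=32, keep=24).shape)  # torch.Size([3072, 4096])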

Weijia Shi (@WeijiaShi2)

Introducing In-Context Pretraining🖇️: train LMs on contexts of related documents. Improving a 7B LM by simply reordering pretraining docs
📈In-context learning +8%
📈Faithful +16%
📈Reading comprehension +15%
📈Retrieval augmentation +9%
📈Long-context reason +5%
arxiv.org/abs/2301.12652
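
A hedged sketch of the core idea (pack related documents into the same pretraining context); the greedy nearest-neighbor chaining below is a simplification of the paper's document-ordering procedure.

import numpy as np

def order_related_docs(doc_embeddings: np.ndarray) -> list:
    # Greedily chain each document to its most similar unused neighbor,
    # so related docs end up adjacent when contexts are packed.
    sims = doc_embeddings @ doc_embeddings.T
    np.fill_diagonal(sims, -np.inf)
    order, used = [0], {0}
    while len(order) < len(doc_embeddings):
        candidates = [(s, j) for j, s in enumerate(sims[order[-1]]) if j not in used]
        _, nxt = max(candidates)
        order.append(nxt)
        used.add(nxt)
    return order  # concatenate docs in this order, then chunk into context windows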

Jason Weston (@jaseweston)

🚨New Paper 🚨
Self-Alignment with Instruction Backtranslation

- New method auto-labels web text with instructions & curates high quality ones for FTing

- Our model Humpback 🐋 outperforms LIMA, Claude, Guanaco, davinci-003 & Falcon-Inst

arxiv.org/abs/2308.06259
(1/4)🧵
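
A minimal sketch of the two steps the tweet names, self-augmentation (predict an instruction for a web passage) and self-curation (score and keep high-quality pairs); the prompts and 1-5 scoring below are placeholders, not the paper's exact ones.

def backtranslate(model, web_texts, keep_threshold=4):
    pairs = []
    for text in web_texts:
        # Self-augmentation: generate the instruction this text would answer.
        instruction = model.generate(
            "Write the instruction that the following response answers:\n" + text
        )
        # Self-curation: have the model rate the pair and keep only the best.
        score = int(model.generate(
            "Rate 1-5 how well the response follows the instruction.\n"
            f"Instruction: {instruction}\nResponse: {text}\nScore:"
        ))
        if score >= keep_threshold:
            pairs.append({"instruction": instruction, "response": text})
    return pairs  # fine-tune on these (instruction, response) pairs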
