Chunting Zhou (@violet_zct)'s Twitter Profile
Chunting Zhou

@violet_zct

Research Scientist at FAIR. PhD @CMU. she/her.

ID:3284146452

https://violet-zct.github.io/ · Joined 19-07-2015 09:41:45

119 Tweets

2.0K Followers

270 Following

AK (@_akhaliq)

Meta announces Megalodon

Efficient LLM Pretraining and Inference with Unlimited Context Length

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and …
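
A rough back-of-the-envelope sketch of the scaling gap the tweet refers to; the cost model below is illustrative only and is not taken from the Megalodon paper.

# Rough per-layer attention cost for sequence length n and model dim d.
def softmax_attention_flops(n: int, d: int) -> int:
    # QK^T and the attention-weighted sum over V are each O(n^2 * d).
    return 2 * n * n * d

def linear_attention_flops(n: int, d: int) -> int:
    # Kernelized/linear attention maintains a d x d running state: O(n * d^2).
    return 2 * n * d * d

d = 4096
for n in (4_096, 32_768, 262_144):
    ratio = softmax_attention_flops(n, d) / linear_attention_flops(n, d)
    print(f"n={n:>7}: quadratic cost is ~{ratio:.0f}x the linear cost")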

Zeyuan Allen-Zhu (@ZeyuanAllenZhu)

Our 12 scaling laws (for LLM knowledge capacity) are out: arxiv.org/abs/2404.05405. Took me 4mos to submit 50,000 jobs; took Meta 1mo for legal review; FAIR sponsored 4,200,000 GPU hrs. Hope this is a new direction to study scaling laws + help practitioners make informed decisions

Chunting Zhou (@violet_zct)

I will be at NeurIPS 12/11-12/14, happy to meet friends and chat about (pre)training of new efficient architectures and multimodal foundation models. Feel free to stop by and say hi at my poster session (LIMA) Wednesday 10:45-12:45 at Great Hall & Hall B1+B2.

Sasha Rush (@srush_nlp)

As with LMs, modern Diffusion models rely heavily on Attention. This improves quality but requires patching to scale. Working with Apple, we designed a model without attention that matches top ImageNet accuracy and removes this resolution bottleneck.

arxiv.org/abs/2311.18257

AI at Meta (@AIatMeta)

New AI research paper from Meta — MART, or Multi-round Automatic Red-Teaming, is a framework for improving LLM safety that trains an adversarial and a target LLM through automatic iterative adversarial red-teaming.

Details in the paper ➡️ bit.ly/40H1l2z
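
A minimal sketch of the iterative loop described above, assuming hypothetical helpers (generate, finetune, and safety_scorer are placeholders, not the paper's API):

# Hypothetical sketch of one round of multi-round automatic red-teaming.
def mart_round(adversarial_lm, target_lm, safety_scorer, seed_prompts):
    attacks = [adversarial_lm.generate(p) for p in seed_prompts]
    responses = [target_lm.generate(a) for a in attacks]
    scores = [safety_scorer(a, r) for a, r in zip(attacks, responses)]

    # Adversarial LM learns from prompts that elicited unsafe responses;
    # target LM learns from safe responses to adversarial prompts.
    successful_attacks = [a for a, s in zip(attacks, scores) if s < 0.5]
    safe_pairs = [(a, r) for a, r, s in zip(attacks, responses, scores) if s >= 0.5]
    adversarial_lm.finetune(successful_attacks)
    target_lm.finetune(safe_pairs)
    return attacks  # reuse as seeds for the next round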

Mengzhou Xia (@xiamengzhou)

We release the strongest public 1.3B and 3B models so far – the ShearedLLaMA series.
Structured pruning from a large model to a small one is far more cost-effective (only 3%!) than pre-training them from scratch!

Check out our paper and models at: xiamengzhou.github.io/sheared-llama/
[1/n]
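
A toy illustration of what "structured" pruning means here: whole attention heads (or hidden dimensions) are removed so the smaller model stays dense. The magnitude-based importance score below is a crude stand-in for the paper's learned pruning masks.

import torch

def prune_heads(weight: torch.Tensor, n_heads: int, keep: int) -> torch.Tensor:
    # weight: (n_heads * head_dim, d_model) projection of one attention layer.
    head_dim = weight.shape[0] // n_heads
    per_head = weight.view(n_heads, head_dim, -1)
    importance = per_head.abs().sum(dim=(1, 2))          # crude magnitude proxy
    kept = importance.topk(keep).indices.sort().values   # keep top heads, in order
    return per_head[kept].reshape(keep * head_dim, -1)   # smaller, still dense

w = torch.randn(32 * 128, 4096)                   # 32 heads of dim 128
print(prune_heads(w, n_heads=32, keep=24).shape)  # torch.Size([3072, 4096])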

Weijia Shi (@WeijiaShi2)

Introducing In-Context Pretraining🖇️: train LMs on contexts of related documents. Improving a 7B LM by simply reordering pretraining docs
📈In-context learning +8%
📈Faithful +16%
📈Reading comprehension +15%
📈Retrieval augmentation +9%
📈Long-context reason +5%
arxiv.org/abs/2301.12652
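
A hedged sketch of the core idea (pack related documents into the same pretraining context); the greedy nearest-neighbor chaining below is a simplification of the paper's document-ordering procedure.

import numpy as np

def order_related_docs(doc_embeddings: np.ndarray) -> list:
    # Greedily chain each document to its most similar unused neighbor,
    # so related docs end up adjacent when contexts are packed.
    sims = doc_embeddings @ doc_embeddings.T
    np.fill_diagonal(sims, -np.inf)
    order, used = [0], {0}
    while len(order) < len(doc_embeddings):
        candidates = [(s, j) for j, s in enumerate(sims[order[-1]]) if j not in used]
        _, nxt = max(candidates)
        order.append(nxt)
        used.add(nxt)
    return order  # concatenate docs in this order, then chunk into context windows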

Jason Weston (@jaseweston)

🚨New Paper 🚨
Self-Alignment with Instruction Backtranslation

- New method auto-labels web text with instructions & curates high quality ones for FTing

- Our model Humpback 🐋 outperforms LIMA, Claude, Guanaco, davinci-003 & Falcon-Inst

arxiv.org/abs/2308.06259
(1/4)🧵
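
A minimal sketch of the two steps the tweet names, self-augmentation (predict an instruction for a web passage) and self-curation (score and keep high-quality pairs); the prompts and 1-5 scoring below are placeholders, not the paper's exact ones.

def backtranslate(model, web_texts, keep_threshold=4):
    pairs = []
    for text in web_texts:
        # Self-augmentation: generate the instruction this text would answer.
        instruction = model.generate(
            "Write the instruction that the following response answers:\n" + text
        )
        # Self-curation: have the model rate the pair and keep only the best.
        score = int(model.generate(
            "Rate 1-5 how well the response follows the instruction.\n"
            f"Instruction: {instruction}\nResponse: {text}\nScore:"
        ))
        if score >= keep_threshold:
            pairs.append({"instruction": instruction, "response": text})
    return pairs  # fine-tune on these (instruction, response) pairs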
