anton (@abacaj)'s Twitter Profile
anton

@abacaj

Software engineer. Hacking on large language models

ID:70514287

Joined: 31-08-2009 22:06:04

10.8K Tweets

36.1K Followers

518 Following

Rafael Rafailov (@rm_rafailov)

We train a family of LLMs on the tiny stories dataset and indeed verify significant model collapse in the iterative (replace) setting. However, surprisingly, in the data accumulation regime the model not only does not degrade, but improves with more iterations!

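A minimal sketch of the two training regimes the tweet contrasts, written as illustrative pseudocode: `train_model` and `sample_synthetic` are hypothetical placeholders, not functions from the paper's code.

```python
# Sketch of iterative training on model-generated data in two regimes.
# "replace": each generation trains only on the previous model's outputs.
# "accumulate": synthetic data is appended to everything seen so far.
# `train_model` and `sample_synthetic` are hypothetical helpers.

def iterate(real_data, n_generations, regime="accumulate"):
    data = list(real_data)
    model = train_model(data)  # generation 0: fit on the real corpus
    for _ in range(n_generations):
        synthetic = sample_synthetic(model, n=len(real_data))
        if regime == "replace":
            data = synthetic          # discard earlier data -> collapse
        else:
            data = data + synthetic   # keep accumulating -> no collapse
        model = train_model(data)
    return model
```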
elvis (@omarsar0)

When to Retrieve?

This new paper presents an approach to train LLMs to effectively utilize information retrieval.

It first proposes a training approach to teach an LLM to generate a special token, <RET>, when it's not confident or doesn't know the answer to a question.

The…

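A hedged sketch of the inference loop implied by the <RET> token idea: if the first pass emits <RET>, fall back to retrieval before answering. `generate` and `retrieve` are assumed helpers for illustration, not the paper's API.

```python
# If the model signals low confidence with <RET>, retrieve external
# context and answer again; otherwise trust the parametric answer.
# `generate` and `retrieve` are hypothetical helpers.

RET_TOKEN = "<RET>"

def answer(question: str) -> str:
    first_pass = generate(prompt=question)
    if RET_TOKEN in first_pass:
        docs = retrieve(question, top_k=3)          # fetch supporting passages
        context = "\n".join(docs)
        return generate(prompt=f"{context}\n\nQuestion: {question}")
    return first_pass  # model was confident enough without retrieval
```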
Aran Komatsuzaki (@arankomatsuzaki)

Meta presents Better & Faster Large Language Models via Multi-token Prediction

- training language models to predict multiple future tokens at once results in higher sample efficiency
- up to 3x faster at inference

arxiv.org/abs/2404.19737

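An illustrative sketch of what multi-token prediction heads can look like, under the assumption of a shared transformer trunk feeding k independent output heads, head i trained to predict the token i+1 steps ahead. This is not Meta's implementation, only the general idea.

```python
import torch
import torch.nn as nn

class MultiTokenHead(nn.Module):
    """k output heads on top of a shared trunk; head i predicts token t+i+1."""
    def __init__(self, d_model: int, vocab_size: int, n_future: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, vocab_size) for _ in range(n_future)]
        )

    def forward(self, hidden: torch.Tensor):
        # hidden: (batch, seq, d_model) from the shared trunk
        return [head(hidden) for head in self.heads]  # k sets of logits

def multi_token_loss(logits_per_head, targets):
    # targets: (batch, seq) token ids; head i is scored against the
    # sequence shifted by i+1 positions (assumption for this sketch)
    loss = 0.0
    for i, logits in enumerate(logits_per_head):
        shift = i + 1
        pred = logits[:, :-shift, :].reshape(-1, logits.size(-1))
        gold = targets[:, shift:].reshape(-1)
        loss = loss + nn.functional.cross_entropy(pred, gold)
    return loss / len(logits_per_head)
```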
Jason Weston (@jaseweston)

🚨 Iterative Reasoning Preference Optimization 🚨
- Iterative algorithm for reasoning tasks: generate pairs & apply DPO+NLL
- Improves accuracy over iterations on GSM8K, MATH, ARC & beats baselines
E.g. Llama2-70B GSM8K: 55.6%->81.6% (88.7% maj32)
arxiv.org/abs/2404.19733
🧵(1/5)

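A rough sketch of a DPO+NLL objective of the kind the thread describes: a standard DPO term over (chosen, rejected) pairs plus an NLL term that keeps the likelihood of the chosen reasoning chain high. The weighting `alpha` and the tensor names are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def dpo_plus_nll(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps,
                 chosen_token_logps, beta=0.1, alpha=1.0):
    # DPO term: prefer chosen over rejected, relative to the reference model
    logits = beta * ((policy_chosen_logps - ref_chosen_logps)
                     - (policy_rejected_logps - ref_rejected_logps))
    dpo_loss = -F.logsigmoid(logits).mean()

    # NLL term: maximize likelihood of the chosen (correct) sequence
    nll_loss = -chosen_token_logps.mean()

    return dpo_loss + alpha * nll_loss
```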
anton (@abacaj)

Turns out you can actually just run full 32k context on a single 3090 using vllm at higher precision (bf16). Just enable 'fp8' cache dtype. This is for llama-3 8B

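A sketch of the vLLM setup the tweet describes: bf16 weights with an fp8 KV cache so the 32k context fits on a single 24 GB 3090. The model id and memory settings below are assumptions; adjust them for your checkpoint.

```python
from vllm import LLM, SamplingParams

# Weights stay at bf16; only the KV cache is quantized to fp8,
# which is what frees enough memory for the long context.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # assumed checkpoint
    dtype="bfloat16",
    kv_cache_dtype="fp8",
    max_model_len=32768,
    gpu_memory_utilization=0.95,
)

out = llm.generate(
    ["Summarize the benefits of an fp8 KV cache."],
    SamplingParams(max_tokens=128),
)
print(out[0].outputs[0].text)
```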