Johannes Oswald (@oswaldjoh) Twitter Tweets • TwiCopy

Johannes Oswald

@oswaldjoh

Research Scientist, Google Research & ETH Zurich alumni

ID:867349295464316928

24-05-2017 11:59:21

143 Tweets

747 Followers

540 Following

Robert Lange

2 weeks ago

🦎Can we teach Transformers to perform in-context Evolutionary Optimization? Surely! We propose Evolutionary Algorithm Distillation for pre-training Transformers to mimic teachers 🧑‍🏫

🎉 Work done at Google DeepMind 🗼with Yingtao Tian & Yujin Tang 🤗

📜: arxiv.org/abs/2403.02985

139 likes

Andrew Lampinen

@AndrewLampinen

1 month ago

Very cool! Reminds me a bit of the hypernetwork meta-learning architecture we found was beneficial in pnas.org/doi/abs/10.107…
— awesome to see some theoretical justification for why it might be useful!

41 likes

Johannes Oswald

1 month ago

HyperNetworks are not dead yet! 😎

14 likes

Johannes Oswald

1 month ago

😎

7 likes

Johannes Oswald

2 months ago

The power of linear self-attention!! Great reverse engineering by Max who finds a surprisingly simple optimizer inside the Transformers solving linear regression problems with varying noise levels.

14 likes

Johannes Oswald

2 months ago

Paper update alert!

Lots of recent papers about if and how well modern RNNs/SSMs like Mamba can learn in-context. A crazy(?) hypothesis (with some evidence in our paper 🥰): they actually solve these tasks by approximating attention. What do you think?

20 likes

Dimitris Papailiopoulos

@DimitrisPapail

2 months ago

arxiv drop tonite

'Can Mamba Learn How to Learn?: A Comparative Study on In-Context Learning Tasks'

with all-star set of collaborations from Krafton Inc., Seoul National University, University of Michigan, and UW–Madison

381 likes

Ekin Akyürek

3 months ago

Can insights from synthetic experiments and interpretability lead to real improvements in language modeling? We:
> propose a formal model for in-context learning
> uncover 'n-gram heads' = high order induction heads, crucial for ICLL
> improve Transformer LM perplexity by 6.7%

412 likes

Ben Recht

4 months ago

Since we just wrapped up an AI megaconference, it felt like a good day to plead for fewer papers. argmin.net/p/too-much-inf…

861 likes

Dimitris Papailiopoulos

@DimitrisPapail

9 months ago

1/ Our paper is out!

Teaching Arithmetic to Small Transformers

We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT).

paper: arxiv.org/abs/2307.03381
Work led by: Nayoung Lee & Kartik Sreenivasan

Thread below.

636 likes

Nicolas Zucchet

@NicolasZucchet

4 months ago

Sasha Rush, actually linear RNNs with GLUs can behave like Linear Attention 😉 arxiv.org/abs/2309.01775

8 likes

Johannes Oswald

4 months ago

Training RNNs without BPTT! Are there language models trained without backprop just around the corner … ?

14 likes

Stephanie Chan

@scychan_brains

5 months ago

We all know that in-context learning emerges in transformers... but our new work shows that it can actually then disappear, after long training times!

We dive into this **transience** phenomenon. arxiv.org/abs/2311.08360 🧵👇1/N

465 likes

Andrew Lampinen

@AndrewLampinen

6 months ago

Very excited to share a substantial updated version of our preprint “Language models show human-like content effects on reasoning tasks!” TL;DR: LMs and humans show strikingly similar patterns in how the content of a logic problem affects their answers. Thread: 1/

253 likes

Johannes Oswald

5 months ago

Incredible work from my friends and colleagues on a difficult credit assignment problem: 🍏, 🍏, 🗝️, 🍏, 🍏, 🚪, 🥰

7 likes

Blaise Aguera

6 months ago

Artificial General Intelligence is Already Here, from Peter Norvig and me on Noema Magazine 'Today’s most advanced AI models have many flaws, but decades from now they will be recognized as the first true examples of artificial general intelligence.' noemamag.com/artificial-gen…

91 likes

Rogério Guimarães

6 months ago

We're excited to share our latest work! We achieve SOTA results in segmentation, detection, and depth estimation, in single and cross-domain, by exploiting image-aligned text prompts in a pretrained diffusion backbone repurposed for vision tasks.

See vision.caltech.edu/tadp/
🧵👇

188 likes

Adrian Valente

@lowrank_adrian

6 months ago

Blog post!!
Rumors of the death of RNNs have been largely exaggerated...
In this post I summarize why and how RNNs are making a comeback in ML, and what this means for theorists of neural comps.
Many thanks to Nicolas Zucchet for help and corrections!
adrian-valente.github.io/2023/10/03/lin…

213 likes

Nino Scherrer

7 months ago

Very happy to share that this work got accepted to #NeurIPS2023 as a spotlight 🥳

It's my personal first ever acceptance at NeurIPS - and got an additional poster as cherry on top!

32 likes

Johannes Oswald

7 months ago

The one and only!

5 likes
