Johannes Oswald (@oswaldjoh)'s Twitter Profile
Johannes Oswald

@oswaldjoh

Research Scientist, Google Research & ETH Zurich alumni

ID:867349295464316928

Joined: 24-05-2017 11:59:21

143 Tweets

747 Followers

540 Following

Robert Lange (@RobertTLange)

🦎Can we teach Transformers to perform in-context Evolutionary Optimization? Surely! We propose Evolutionary Algorithm Distillation for pre-training Transformers to mimic teachers 🧑‍🏫

🎉 Work done at Google DeepMind 🗼 with Yingtao Tian & Yujin Tang 🤗

📜: arxiv.org/abs/2403.02985

Andrew Lampinen (@AndrewLampinen)

Very cool! Reminds me a bit of the hypernetwork meta-learning architecture we found was beneficial in pnas.org/doi/abs/10.107… — awesome to see some theoretical justification for why it might be useful!

Johannes Oswald (@oswaldjoh)

The power of linear self-attention!! Great reverse engineering by Max, who finds a surprisingly simple optimizer inside Transformers solving linear regression problems with varying noise levels.
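The core identity behind this line of work can be checked in a few lines: one gradient-descent step on in-context linear-regression examples, starting from zero weights, produces exactly the same query prediction as a single unnormalized linear self-attention operation whose keys and values are built from those examples. A minimal numpy sketch (dimensions and learning rate are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 16
X = rng.normal(size=(n, d))     # in-context inputs x_i
w_true = rng.normal(size=d)
y = X @ w_true                  # in-context targets y_i (noise-free here)
x_q = rng.normal(size=d)        # query input
eta = 0.1                       # learning rate

# One gradient-descent step on L(w) = 0.5 * sum_i (w.x_i - y_i)^2 from w = 0:
# w_1 = w_0 - eta * dL/dw |_{w=0} = eta * sum_i y_i x_i
w1 = eta * sum(yi * xi for xi, yi in zip(X, y))
pred_gd = w1 @ x_q

# Linear self-attention (no softmax) with keys k_i = x_i, values v_i = eta*y_i,
# query q = x_q: out = sum_i v_i * (k_i . q)
pred_attn = sum(eta * yi * (xi @ x_q) for xi, yi in zip(X, y))

assert np.allclose(pred_gd, pred_attn)
```

The two predictions agree term by term, which is why reverse engineering a trained linear-attention layer can surface a recognizable optimizer.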

Johannes Oswald (@oswaldjoh)

Paper update alert!

Lots of recent papers ask whether and how well modern RNNs/SSMs like Mamba can learn in-context. A crazy(?) hypothesis (with some evidence in our paper 🥰): they actually solve these tasks by approximating attention. What do you think?
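One reason the hypothesis is less crazy than it sounds: causal linear (softmax-free) attention can be computed exactly by a fixed-size recurrent state update, which is precisely the kind of computation an RNN/SSM layer carries across time. A minimal numpy sketch (shapes illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
T, d = 8, 4
Q = rng.normal(size=(T, d))   # queries q_t
K = rng.normal(size=(T, d))   # keys k_t
V = rng.normal(size=(T, d))   # values v_t

# Causal linear attention, computed in parallel over the whole sequence:
# o_t = sum_{s <= t} (q_t . k_s) v_s
out_parallel = np.stack([
    sum((Q[t] @ K[s]) * V[s] for s in range(t + 1)) for t in range(T)
])

# The same computation as a recurrence over a fixed-size (d x d) state,
# i.e. an RNN-style update:
S = np.zeros((d, d))
out_recurrent = []
for t in range(T):
    S = S + np.outer(V[t], K[t])    # state update: S_t = S_{t-1} + v_t k_t^T
    out_recurrent.append(S @ Q[t])  # readout:      o_t = S_t q_t
out_recurrent = np.stack(out_recurrent)

assert np.allclose(out_parallel, out_recurrent)
```

So a recurrent model does not need to store the whole context to reproduce (linear) attention; approximating full softmax attention is then a question of how well the fixed state can mimic the normalization.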

Ekin Akyürek (@akyurekekin)

Can insights from synthetic experiments and interpretability lead to real improvements in language modeling? We:
> propose a formal model for in-context learning

> uncover 'n-gram heads' = high-order induction heads, crucial for ICLL

> improve Transformer LM perplexity by 6.7%
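For intuition about what such an "n-gram head" could compute, here is a hypothetical in-context n-gram predictor: it looks up earlier occurrences of the current (n-1)-token context and copies the most frequent continuation (classic induction heads are the n=2 case). The function name and details are illustrative, not code from the paper:

```python
from collections import Counter

def ngram_head_predict(tokens, n=3):
    """In-context n-gram prediction: return the most frequent token that
    followed earlier occurrences of the last (n-1) tokens, else None."""
    context = tuple(tokens[-(n - 1):])
    counts = Counter(
        tokens[i + n - 1]                        # token after the match
        for i in range(len(tokens) - n + 1)
        if tuple(tokens[i:i + n - 1]) == context  # earlier context match
    )
    return counts.most_common(1)[0][0] if counts else None

# Having seen "ab" -> "c" earlier in the sequence, predict "c" after "ab":
print(ngram_head_predict(list("abcab"), n=3))  # -> c
```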

Ben Recht (@beenwrekt)

Since we just wrapped up an AI megaconference, it felt like a good day to plead for fewer papers. argmin.net/p/too-much-inf…

Dimitris Papailiopoulos (@DimitrisPapail)

1/ Our paper is out!

Teaching Arithmetic to Small Transformers

We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT).

paper: arxiv.org/abs/2307.03381
Work led by: Nayoung Lee & Kartik Sreenivasan

Thread below.

Stephanie Chan (@scychan_brains)

We all know that in-context learning emerges in transformers... but our new work shows that it can actually then disappear, after long training times!

We dive into this **transience** phenomenon. arxiv.org/abs/2311.08360 🧵👇1/N

Andrew Lampinen (@AndrewLampinen)

Very excited to share a substantial updated version of our preprint “Language models show human-like content effects on reasoning tasks!” TL;DR: LMs and humans show strikingly similar patterns in how the content of a logic problem affects their answers. Thread: 1/

Johannes Oswald (@oswaldjoh)

Incredible work from my friends and colleagues on a difficult credit assignment problem: 🍏, 🍏, 🗝️, 🍏, 🍏, 🚪, 🥰

Blaise Aguera (@blaiseaguera)

Artificial General Intelligence is Already Here, from Peter Norvig and me on Noema Magazine 'Today’s most advanced AI models have many flaws, but decades from now they will be recognized as the first true examples of artificial general intelligence.' noemamag.com/artificial-gen…

Rogério Guimarães (@rogerioagjr)

We're excited to share our latest work! We achieve SOTA results in segmentation, detection, and depth estimation, in single and cross-domain, by exploiting image-aligned text prompts in a pretrained diffusion backbone repurposed for vision tasks.

See vision.caltech.edu/tadp/
🧵👇

Adrian Valente (@lowrank_adrian)

Blog post!!
Rumors of the death of RNNs have been largely exaggerated...
In this post I summarize why and how RNNs are making a comeback in ML, and what this means for theorists of neural computation.
Many thanks to Nicolas Zucchet for help and corrections!
adrian-valente.github.io/2023/10/03/lin…

Nino Scherrer (@ninoscherrer)

Very happy to share that this work got accepted to NeurIPS as a spotlight 🥳

It's my first-ever acceptance at NeurIPS - and we got an additional poster as a cherry on top!
