Tom Sherborne (@tomsherborne):

🚨 new paper 🚨

Can we train for flat minima with less catastrophic forgetting?

We propose Trust Region Aware Minimization (TRAM) for smoothness in both parameters and representations. TL;DR: representations matter as much as parameters!

arxiv.org/abs/2310.03646 w/@nsaphra @pdasigi @haopeng_nlp
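For context on the method: below is a minimal PyTorch sketch of the SAM-style "perturb then descend" update that sharpness-aware training builds on. This is an illustration under assumptions, not the paper's algorithm: the function name sam_step and the fixed parameter-space radius rho are mine, and TRAM's key ingredient, a trust region informed by representation distance, is deliberately left out; see the paper for the real method.

```python
# Minimal sharpness-aware (SAM-style) update sketch in PyTorch.
# NOTE: plain SAM with a fixed parameter-space radius `rho`; TRAM's
# representation-aware trust region is NOT implemented here.
import torch

def sam_step(model, loss_fn, batch, base_optimizer, rho=0.05, eps=1e-12):
    inputs, targets = batch

    # First pass: gradient at the current weights.
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # Climb to the approximate worst point inside the rho-ball.
    with torch.no_grad():
        grad_norm = torch.norm(torch.stack(
            [p.grad.norm(2) for p in model.parameters() if p.grad is not None]))
        scale = rho / (grad_norm + eps)
        perturbations = []
        for p in model.parameters():
            if p.grad is None:
                continue
            e = p.grad * scale
            p.add_(e)                      # w <- w + e(w)
            perturbations.append((p, e))
    model.zero_grad()

    # Second pass: gradient at the perturbed weights.
    loss_fn(model(inputs), targets).backward()

    # Restore weights, then descend with the perturbed gradient.
    with torch.no_grad():
        for p, e in perturbations:
            p.sub_(e)
    base_optimizer.step()
    model.zero_grad()
    return loss.item()
```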
Tom Sherborne (@tomsherborne):

The camera-ready for TRAM is now live! See you at #ICLR2024 in Vienna (as a spotlight poster)

now featuring:
* Vision experiments (better ImageNet→CIFAR/Cars/Flowers transfer)
* More ablations (XL model, unusual combinations)
* Pictures (see below!)
w/ @nsaphra @pdasigi @haopeng_nlp 

openreview.net/forum?id=kxebD…
Igor Kotenkov (@stalkermustang):

@Francis_YAO_ @yizhongwyz @Guangxuan_Xiao @haopeng_nlp Really important one. Hope to see a framework that detects and stores only 10-15% of heads' cache to support longer context (with sliding attention). There are likely other heads we'd want to store, but I doubt it's more than 20% for most tasks.
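The policy being asked for, full KV cache for a small set of "retrieval heads" and a sliding window for the rest, can be sketched in a few lines. Everything here is a hypothetical illustration, not code from the paper: truncate_kv, retrieval_heads, and the window size are assumptions, and detecting the heads is out of scope.

```python
# Sketch: per-head KV-cache retention. Flagged "retrieval heads" keep
# their full cache; all other heads keep only a sliding window.
# `retrieval_heads` is a hypothetical precomputed set (e.g. ~10-15%
# of heads); downstream attention must handle ragged per-head lengths.
import torch

def truncate_kv(keys, values, retrieval_heads, window=1024):
    """keys/values: tensors of shape [n_heads, seq_len, head_dim]."""
    n_heads, seq_len, _ = keys.shape
    if seq_len <= window:
        return list(keys), list(values)
    kept_k, kept_v = [], []
    for h in range(n_heads):
        if h in retrieval_heads:
            kept_k.append(keys[h])            # full cache
            kept_v.append(values[h])
        else:
            kept_k.append(keys[h, -window:])  # sliding window only
            kept_v.append(values[h, -window:])
    return kept_k, kept_v
```

With ~10-15% of heads kept in full, the cache cost for the remaining heads stays constant in sequence length, which is where the long-context savings would come from.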
Tom Sherborne (@tomsherborne):

TRAM is accepted to ICLR 2024 as a Spotlight! See you in Vienna 🇦🇹! Thanks to Naomi Saphra, Pradeep Dasigi, Hao Peng, and the Allen Institute for AI (AllenNLP)

Vision experiments, more discussion, and visuals are coming soon in the camera-ready!

Tom Sherborne (@tomsherborne):

I'll be at #ICLR2024 next week in Vienna presenting TRAM as a Spotlight Poster! Come find me in Halle B, Thu 9 May, 10:45AM-12:45PM CEST

Let's talk about SAM, OOD generalisation, PhDing at EdinburghNLP, or working at Cohere

w/ @nsaphra @pdasigi @haopeng_nlp

TTIC (@TTIC_Connect):

Friday, April 12, 2024 at 11:00 am CT: TTIC/@UChicagoCS NLP Seminar presents Hao Peng (@haopeng_nlp) of @IllinoisCS with a talk titled 'Pushing the Boundaries of Length Generalization and Reasoning Capabilities of Open LLMs.' Please join us in Room 529, 5th floor at TTIC.
MichiganAI (@michigan_AI):

🎙️ Speaker Announcement🎙️
We're pleased to announce the keynote speakers for the 17th Midwest Speech & Language Days Symposium #MSLD2024, happening at @UMich, April 15-16:

🌟Eric Fosler-Lussier @EricFos
🌟Hao Peng @haopeng_nlp
🌟Betsy Sneller @betsysneller
🌟Emma Strubell @strubell
Heng Ji (@hengjinlp):

Can we let an LLM simulate a human tutor to guide reasoning and problem solving? My amazing PhD student Xingyao Wang has just finished another line of innovative work, based on collaborations with my new wonderful colleague Hao Peng

arxiv.org/abs/2309.10691

Genglin Liu (@genglin_liu):

Many thanks to my wonderful collaborators/advisor Xingyao Wang, @lifan__yuan, Yangyi Chen, and Hao Peng!

Check out our paper for more details: arxiv.org/abs/2311.09731

HamiltonHuaji (生活西化, 恐怖分子) (@HamiltonHuaji):

@Francis_YAO_ @yizhongwyz @Guangxuan_Xiao @haopeng_nlp If you freeze everything but retrieval heads during continual pretraining, can you still get the same perfect retrieval accuracy as full-parameter training?
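As a sketch of what that experiment could look like in PyTorch: freeze every tensor, then re-enable only the attention projections in layers containing retrieval heads, masking gradients so rows belonging to other heads stay fixed. The module paths (model.layers[i].attn.q_proj etc.) and the retrieval_heads mapping are assumptions about a generic transformer, not any particular codebase.

```python
# Sketch: continual pretraining with everything frozen except selected
# "retrieval heads". Heads share fused projection matrices, so we
# unfreeze whole tensors and zero the gradient rows of other heads.
# `retrieval_heads` maps layer index -> set of head indices (assumed
# precomputed by some detection procedure).
import torch

def freeze_except_retrieval_heads(model, retrieval_heads, n_heads, head_dim):
    for p in model.parameters():
        p.requires_grad = False
    for layer_idx, heads in retrieval_heads.items():
        attn = model.layers[layer_idx].attn   # assumed module layout
        for proj in (attn.q_proj, attn.k_proj, attn.v_proj):
            assert proj.weight.shape[0] == n_heads * head_dim
            proj.weight.requires_grad = True
            mask = torch.zeros_like(proj.weight)
            for h in heads:
                mask[h * head_dim:(h + 1) * head_dim] = 1.0
            # Keep gradients only for rows of the selected heads.
            proj.weight.register_hook(lambda g, m=mask: g * m)
```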
