Xi Ye (@xiye_nlp)'s Twitter Profile
Xi Ye

@xiye_nlp

I study NLP. CS PhD student @UTAustin. Incoming postdoc fellow @PrincetonPLI. Incoming assistant professor @UAlberta (Summer 2025).

ID:1242135548040548352

https://www.cs.utexas.edu/~xiye/ · Joined 23-03-2020 17:06:14

127 Tweets

1.5K Followers

305 Following

Prasann Singhal (@prasann_singhal)

Labeling preferences online for LLM alignment improves DPO vs using static prefs. We show we can use online prefs to train a reward model and label *even more* preferences to train the LLM.

D2PO: discriminator-guided DPO

Work w/ Nathan Lambert, Scott Niekum, Tanya Goyal, Greg Durrett

Yoonsang Lee (@yoonsang_)

Can LMs correctly distinguish🔎 confusing entity mentions in multiple documents?

We study how current LMs perform the QA task when provided with ambiguous questions and a document set📚 that requires challenging entity disambiguation.

Work done at Computer Science at UT Austin✨ w/ Xi Ye, Eunsol Choi

Liyan Tang (@LiyanTang4)

🔎📄New model & benchmark to check LLMs’ output against docs (e.g., fact-check RAG)

🕵️ MiniCheck: a model w/GPT-4 accuracy @ 400x cheaper

📚LLM-AggreFact: collects 10 human-labeled datasets of errors in model outputs

arxiv.org/abs/2404.10774
w/ Philippe Laban, Greg Durrett 🧵

Greg Durrett (@gregd_nlp)

This is a cool method, but 'superhuman' is an overclaim based on the data shown. There are better datasets than FActScore for evaluating this:
ExpertQA arxiv.org/abs/2309.07852 by Chaitanya Malaviya et al.
Factcheck-GPT arxiv.org/abs/2311.09000 by Yuxia Wang et al. (+ same methodology) 🧵

Fangyuan Xu (@brunchavecmoi)

Instruction-following capabilities of LLMs are a prerequisite for AI ✒️ writing assistance. How good are current LLMs at this task?

We present 🥝 𝗞𝗜𝗪𝗜, a dataset with instructions for knowledge-intensive, document-grounded writing for long-form answers to research questions.

Zayne Sprague @ ICLR 24 (@ZayneSprague)

Super excited to bring ChatGPT Murder Mysteries from our dataset MuSR as a spotlight presentation!

A big shout-out goes to my coauthors Xi Ye, Kaj Bostrom, Swarat Chaudhuri, and Greg Durrett

See you all there 😀

Xi Ye (@xiye_nlp)

Heading to ✈️

Happy to chat about LLM explanations, LLM reasoning, and more (as well as crawfish and oysters🦪)

📣📣I am also on the academic job market this year. Any chat welcome!

I’ll present SatLM with Jocelyn(Qiaochu) Chen and Greg Durrett on Wednesday (Poster Session 4)

Jocelyn (Qiaochu) Chen (@jocelynqchen)

🌟 I'll be at NeurIPS from Dec 10-13 and would love to discuss neurosymbolic programming, synthesis, and the use of LLMs in programming. I am on the academic job market this year, so also open to chatting about opportunities. If you'd like to talk or grab a coffee, just ping me!

Yoonsang Lee (@yoonsang_)

Known example❗️ or Unknown example❓ to prompt an LM?

We propose best practices for crafting ✏️ in-context examples according to LMs' parametric knowledge 📚.

Work done at Computer Science at UT Austin ✨ w/ Pranav Atreya, Xi Ye, Eunsol Choi

lilys012.github.io/assets/pdf/cra…

Manya Wadhwa (@ManyaWadhwa1)

Excited to share our updated preprint (w/ Jay Chen, Jessy Li, Greg Durrett)

📜 arxiv.org/pdf/2305.14770…

We show that LLMs can help understand nuances of annotation: they can convert the expressiveness of natural language explanations to a numerical form
🧵

Shankar Padmanabhan (@shankarpad8)

How do we teach LMs about new entities? Our #NeurIPS2023 paper proposes a distillation-based method to inject new entities via definitions. The LM can then make inferences that go beyond those definitions!
arxiv.org/abs/2306.09306
w/ Yasumasa Onoe, Michael Zhang, Greg Durrett, Eunsol Choi

Xi Ye (@xiye_nlp)

Check out Zayne Sprague's super interesting work, MuSR.

We use GPT-4 to create murder mysteries🕵️ (and more) to test LLMs' reasoning abilities. We all had a lot of fun solving the mysteries during the project LOL.

CLS (@ChengleiSi)

How can we humans verify the truthfulness of LLM outputs (or any claims you see on the Internet)? Should we ask ChatGPT (#LLMs)? Search on Google (retrieval)? Are they complementary?

Tldr: LLMs Help Humans Verify Truthfulness - Except When They Are Convincingly Wrong!

1/n

Xi Ye (@xiye_nlp)

SatLM is now accepted at #NeurIPS2023

One strength💪of this framework is that it handles many tasks with the same solver (see previous 🧵). We added new results on BoardgameQA (released recently). SatLM also shows SOTA💡on this dataset requiring lots of commonsense knowledge.

Jocelyn (Qiaochu) Chen (@jocelynqchen)

I will be presenting semantic regex in the Program Synthesis 2 session, Wed 2 - 3:30pm. I will be at the conference Tues to Fri. Feel free to say hi and connect or chat about potential job opportunities (I am on the academic job market this year)!
