Xi Ye (@xiye_nlp)'s Twitter Profile
Xi Ye

@xiye_nlp

I study NLP. CS PhD student @UTAustin. Incoming postdoc fellow @PrincetonPLI. Incoming assistant professor @UAlberta (Summer 2025).

ID:1242135548040548352

https://www.cs.utexas.edu/~xiye/ · Joined 23-03-2020 17:06:14

127 Tweets

1.5K Followers

305 Following

Prasann Singhal (@prasann_singhal)

Labeling preferences online for LLM alignment improves DPO vs using static prefs. We show we can use online prefs to train a reward model and label *even more* preferences to train the LLM.

D2PO: discriminator-guided DPO

Work w/ Nathan Lambert, Scott Niekum, Tanya Goyal, Greg Durrett

Yoonsang Lee (@yoonsang_)

Can LMs correctly distinguish🔎 confusing entity mentions in multiple documents?

We study how current LMs perform the QA task when provided with ambiguous questions and a document set📚 that requires challenging entity disambiguation.

Work done at Computer Science at UT Austin✨ w/ Xi Ye, Eunsol Choi

Liyan Tang (@LiyanTang4)

🔎📄New model & benchmark to check LLMs’ output against docs (e.g., fact-check RAG)

🕵️ MiniCheck: a model w/GPT-4 accuracy @ 400x cheaper

📚LLM-AggreFact: collects 10 human-labeled datasets of errors in model outputs

arxiv.org/abs/2404.10774
w/ Philippe Laban, Greg Durrett 🧵

Greg Durrett (@gregd_nlp)

This is a cool method, but 'superhuman' is an overclaim based on the data shown. There are better datasets than FActScore for evaluating this:
ExpertQA arxiv.org/abs/2309.07852 by Chaitanya Malaviya et al.
Factcheck-GPT arxiv.org/abs/2311.09000 by Yuxia Wang et al. (+ same methodology) 🧵

Fangyuan Xu (@brunchavecmoi)

Instruction-following capabilities of LLMs are a prerequisite for AI ✒️ writing assistance. How good are current LLMs at this task?

We present 🥝 𝗞𝗜𝗪𝗜, a dataset with instructions for knowledge-intensive, document-grounded writing for long-form answers to research questions.

Zayne Sprague @ ICLR 24 (@ZayneSprague)

Super excited to bring ChatGPT Murder Mysteries from our dataset MuSR as a spotlight presentation!

A big shout-out goes to my coauthors Xi Ye, Kaj Bostrom, Swarat Chaudhuri, and Greg Durrett

See you all there 😀

Xi Ye (@xiye_nlp)

Heading to ✈️

Happy to chat about LLM explanations, LLM reasoning, and more (as well as crawfish and oysters🦪)

📣📣I am also on the academic job market this year. Any chat welcome!

I’ll present SatLM with Jocelyn(Qiaochu) Chen and Greg Durrett on Wednesday (Poster Session 4)

Jocelyn (Qiaochu) Chen (@jocelynqchen)

🌟 I'll be at NeurIPS from Dec 10-13 and would love to discuss neurosymbolic programming, synthesis, and the use of LLMs in programming. I am on the academic job market this year, so also open to chatting about opportunities. If you'd like to talk or grab a coffee, just ping me!

Yoonsang Lee (@yoonsang_)

Known example❗️ or Unknown example❓ to prompt an LM?

We propose best practices for crafting ✏️ in-context examples according to LMs' parametric knowledge 📚.

Work done at Computer Science at UT Austin ✨ w/ Pranav Atreya, Xi Ye, Eunsol Choi

lilys012.github.io/assets/pdf/cra…

Manya Wadhwa (@ManyaWadhwa1)

Excited to share our updated preprint (w/ Jay Chen, Jessy Li, Greg Durrett)

📜 arxiv.org/pdf/2305.14770…

We show that LLMs can help understand nuances of annotation: they can convert the expressiveness of natural language explanations to a numerical form
🧵

Shankar Padmanabhan (@shankarpad8)

How do we teach LMs about new entities? Our #NeurIPS2023 paper proposes a distillation-based method to inject new entities via definitions. The LM can then make inferences that go beyond those definitions!
arxiv.org/abs/2306.09306
w/ Yasumasa Onoe, Michael Zhang, Greg Durrett, Eunsol Choi

Xi Ye (@xiye_nlp)

Check out Zayne Sprague's super interesting work, MuSR.

We use GPT-4 to create murder mysteries🕵️ (and more) to test LLMs' reasoning abilities. We all had a lot of fun solving the mysteries during the project LOL.

CLS (@ChengleiSi)

How can we humans verify the truthfulness of LLM outputs (or any claims you see on the Internet)? Should we ask ChatGPT (#LLMs)? Search on Google (retrieval)? Are they complementary?

Tldr: LLMs Help Humans Verify Truthfulness - Except When They Are Convincingly Wrong!

1/n

Xi Ye (@xiye_nlp)

SatLM is now accepted at #NeurIPS2023

One strength💪of this framework is that it handles many tasks with the same solver (see previous 🧵). We added new results on BoardgameQA (released recently). SatLM also shows SOTA💡on this dataset requiring lots of commonsense knowledge.

Jocelyn (Qiaochu) Chen (@jocelynqchen)

I will be presenting semantic regex in the Program Synthesis 2 session, Wed 2 - 3:30pm. I will be at the conference Tues to Fri. Feel free to say hi and connect or chat about potential job opportunities (I am on the academic job market this year)!
