Shunyu Yao (@ShunyuYao12)'s Twitter Profile
Shunyu Yao

@ShunyuYao12

Language agents (ReAct, Reflexion, Tree of Thoughts) for digital automation (WebShop, SWE-bench, SWE-agent)

ID: 1271552707464032256

http://ysymyth.github.io · Joined 12-06-2020 21:19:32

575 Tweets

7.9K Followers

886 Following

Kilian Lieret (@KLieret)

You can now apply SWE-agent to any local repository and use any text file as the input issue instead of having to use GitHub repos/issues. Lots of people were asking for this! More information in the latest release notes: github.com/princeton-nlp/…

Ofir Press (@OfirPress)

It's been just 10 days since we launched SWE-agent, but we already have 1.5k people in our Discord and lots of contributors on GitHub.

We've been making the agent easier to use and there are lots more exciting updates coming soon, including a web UI! Join us :)

Tianbao Xie (@TianbaoX)

🤔 Can we assess agents across various apps & OSes without crafting new envs?

OSWorld🖥️: A unified, real computer env for multimodal agents to evaluate open-ended computer tasks with arbitrary apps and interfaces on Ubuntu, Windows, & macOS.

+ annotated 369 real-world computer tasks…

Ruibo Liu (@RuiboLiu)

Thanks Aran for sharing our work!

This is a survey paper I’ve been thinking about for a long time, as we have seen an increasing need for synthetic data. Since we will probably run out of fresh tokens soon, the audience of this paper should be everyone who cares about AI progress.

Shunyu Yao (@ShunyuYao12)

Will visit AGI House for the first time this Saturday and talk about SWE-agent, Agent-Computer Interface (ACI), and answer questions😃

Shunyu Yao (@ShunyuYao12)

When I first saw Tree of Thoughts, I asked myself this too 😀 Great exploration into whether next-token prediction can simulate search. If you're interested in this, you probably also want to check out the last paragraph of arxiv.org/abs/2309.02427

Ofir Press (@OfirPress)

You can now download & run SWE-agent (on any GitHub issue) in 1 line!

Check our repo for deets: github.com/princeton-nlp/…

Join our Discord to hear first about updates like this: discord.gg/AVEFbBn2rH

Shunyu Yao (@ShunyuYao12)

In some sense, math is the first programming language, and a mathematician's mind (+ scratchpad) is the first compiler.

Shunyu Yao (@ShunyuYao12)

People still surprised by such things across pairs among ReAct ToT Reflexion CoALA WebShop SWE-bench SWE-agent😂

Karthik Narasimhan (@karthik_r_n)

SWE-agent is finally out. A few highlights:
1. Agent-Computer Interface (ACI) design will be critical for the success of AI agents, much like HCI is critical for how effective humans are with computers.
2. You can use SWE-agent out of the box on any GitHub issue.
(1/2)

Ofir Press (@OfirPress)

People are asking us how Claude 3 does with SWE-agent: not well. On SWE-bench Lite (a 10% subset of the test set), it scores almost 6 percentage points lower (absolute) than GPT-4.

It's also much slower.

We'll have all the data in the preprint next week.
