Language Agents: Foundations, Prospects, and Risks

Yu Su¹  Diyi Yang²  Shunyu Yao³  Tao Yu⁴

¹The Ohio State University  ²Stanford University  ³Princeton University  ⁴University of Hong Kong
[email protected], [email protected], [email protected], [email protected]

1 Introduction

A heated discussion thread in AI and NLP is autonomous agents, usually powered by large language models (LLMs), that can follow language instructions to carry out diverse and complex tasks in real-world or simulated environments. There have been numerous recent proof-of-concept efforts on such agents, including ChatGPT Plugins,¹ AutoGPT,² and generative agents (Park et al., 2023), just to name a few. The public is also showing an unprecedented level of excitement. For example, AutoGPT received 147K stars in just 4 months, making it the fastest-growing repository in GitHub history, despite its experimental nature and many known, sometimes serious, limitations.

However, the concept of an agent has been part of AI since its dawn. So what has changed recently? We argue that the most fundamental change is the capability of using language. Contemporary AI agents use language as a vehicle for both thought and communication, a trait that was previously unique to humans. This dramatically expands the breadth and depth of the problems these agents can tackle autonomously. The capability of using language, bestowed by their LLM foundations, allows these agents to 1) use a wide range of tools and reconcile their heterogeneous syntax and semantics (Parisi et al., 2022; Schick et al., 2023; Qin et al., 2023a; Patil et al., 2023; Qin et al., 2023b; Mialon et al., 2023), 2) operate in complex environments and ground to environment-specific semantics (Brohan et al., 2023b; Yao et al., 2022a; Gu et al., 2023; Wang et al., 2023a; Deng et al., 2023; Zhou et al., 2023), 3) conduct complex language-driven reasoning (Wei et al., 2022; Shinn et al., 2023; Chen et al., 2023), and 4) form spontaneous multi-agent systems (Park et al., 2023; Liu et al., 2023b). Therefore, to distinguish them from earlier AI agents, we suggest that these AI agents capable of using language for thought and communication should be called "language agents," language being their most salient trait.

Figure 1: A conceptual framework for language agents.

Language played a critical role in the evolution of biological intelligence, and now artificial intelligence may be following a similar evolutionary path. This is remarkable and concerning at the same time. Despite the rapid progress, there has been a significant lack of systematic discussion regarding the conceptual definition, theoretical foundation, promising directions, and risks associated with language agents. This proposed tutorial endeavors to fill this gap by giving a comprehensive account of language agents based on both contemporary and classic AI research, drawing connections to cognitive science, neuroscience, and linguistics when appropriate.

2 Outline of Tutorial Content

This cutting-edge tutorial will be half-day and will cover a conceptual framework for language agents as well as important topic areas including tool augmentation, grounding, reasoning and planning, multi-agent systems, and risks and societal impact.

2.1 Overview [30mins]

What are language agents, and how do they differ from previous generations of AI agents? We

¹ https://openai.com/blog/chatgpt-plugins
² https://github.com/Significant-Gravitas/Auto-GPT
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts, pages 17–24
November 12–16, 2024. ©2024 Association for Computational Linguistics
will start by discussing why the capability of using language for thought and communication, empowered by LLMs, is the defining trait of contemporary agents, drawing connections to the role language played in the evolution of biological intelligence (Dennett, 2013). We will then discuss a potential conceptual framework for language agents (Figure 1) and how each component (agent/embodiment/environment) differs from previous agents. One foundational construct is memory. We will discuss the resemblances and differences between a language agent/LLM's memory and human memory, including the storage mechanism (Kandel, 2007), long-term memory (an LLM's parametric memory/vector databases), and working memory (in-context learning), and how such memory may support general-purpose language-driven reasoning. We will wrap up this section by outlining the key technical and societal aspects that will be discussed in the rest of the tutorial.

2.2 Tool Augmentation [30mins]

Tool augmentation, or tool use (Schick et al., 2023; Mialon et al., 2023), is a natural extension of language agents due to their capability of using language for thought and communication. Language agents are starting to demonstrate the possibility of autonomously understanding and reconciling the heterogeneous syntax and semantics (e.g., XML vs. JSON) of different tools (i.e., using language for communication) and orchestrating tool execution results into a coherent reasoning process (i.e., using language for thought). At present, tool augmentation mainly serves three purposes:

• Providing up-to-date and/or domain-specific information (Nakano et al., 2021; Lazaridou et al., 2022; Guu et al., 2020).

• Providing specialized capabilities (e.g., high-precision calculation) that a language agent may not have or may not be best at (Schick et al., 2023; Shen et al., 2023; Cheng et al., 2023; Gao et al., 2022).

• Enabling a language agent to act in external environments (Liang et al., 2022; Wang et al., 2023a).

Two metrics are essential for practical tool augmentation: robustness, i.e., accuracy in using tools, and flexibility, i.e., ease of integrating a new tool. While existing efforts, e.g., ChatGPT Plugins, have made meaningful progress on flexibility, robustness still presents a significant challenge. This is particularly problematic for tools that produce side effects in the world (e.g., a tool for sending emails). We will discuss the challenges and opportunities around tool augmentation.

2.3 Grounding [30mins]

Most of the transformative applications of language agents involve connecting an agent to some real-world environment (e.g., through tools or embodiment), be it databases (Cheng et al., 2023), knowledge bases (Gu et al., 2023), the web (Deng et al., 2023; Zhou et al., 2023), or the physical world (Brohan et al., 2023a). Each environment is a unique context that provides possibly different interpretations of natural language. Grounding, i.e., the linking of (natural language) concepts to contexts (Chandu et al., 2021), thus becomes a central and pervasive challenge. There are two types of grounding related to language agents:

• Grounding natural language to an environment (Gu et al., 2023). This is also closely related to the meaning of natural language, which, as Bender and Koller (2020) put it, is the mapping from an utterance to its communicative intent.

• Grounding an agent's decisions in its own context (i.e., working memory), which includes external information from tools (Liu et al., 2023a; Yue et al., 2023; Gao et al., 2023; Cheng et al., 2023).

We will discuss current work on both types of grounding, the remaining challenges, and promising future directions.

2.4 Reasoning and Planning [30mins]

The simplest way for language agents to interact with external worlds is to generate the next action via the LLM (Nakano et al., 2021; Schick et al., 2023), but the mapping from context to action is often non-trivial, and such approaches often require fine-tuning to learn it. Inspired by prior work that leverages intermediate reasoning to improve LLM performance (Nye et al., 2021; Wei et al., 2022), approaches such as ReAct (Yao et al., 2022b) have started to leverage intermediate reasoning for better acting by flexibly analyzing environmental observations, making plans, tracking task status, recovering from exceptions, etc. Subsequent studies (Shinn et al., 2023; Chen et al., 2023) further leverage LLM reasoning for explicit self-evaluation,
critique, or reflection, to further improve agent performance. On the other hand, the simplest way for language agents to plan multiple steps of actions is to generate an action plan (Huang et al., 2022), but token-by-token autoregressive decoding makes it hard to forecast the planned future, backtrack from errors, or maintain a global exploration structure for planning. To this end, recent works have begun to enhance LMs with re-planning (Song et al., 2022) or tree search algorithms (Yao et al., 2023; Hao et al., 2023) to systematically explore and make decisions in the planning space, analogous to planning-based agents such as AlphaGo (Silver et al., 2016). We will also discuss the recent trend that blurs the boundary between reasoning and acting, which leads to a more unified methodology between reasoning and planning (e.g., Monte Carlo tree search applied to both reasoning (Hao et al., 2023) and action planning (Silver et al., 2016)).

2.5 Multi-Agent Systems [30mins]

When AI agents are equipped with the capability of using language for thought and communication, they start to enable multi-agent systems quite different from conventional ones (Ferber and Weiss, 1999): agents can now act and communicate with each other in a more autonomous fashion. On the one hand, agents may now be generated with minimal specification instead of being pre-programmed, and can continually evolve through use and communication to produce complex social behaviors (Park et al., 2023), collaborate on task solving (Wu et al., 2023; Qian et al., 2023; Hong et al., 2023), or debate for more divergent and faithful reasoning (Chan et al., 2023; Liang et al., 2023; Du et al., 2023). On the other hand, human users are also agents, and these artificial language agents can interact with human agents in much richer and more flexible ways than before. There are numerous emerging opportunities, such as providing guardrails and alignment for language agents (Bai et al., 2022) and resolving uncertainties (Yao et al., 2020). We will discuss the opportunities and challenges in this new generation of multi-agent and human-AI collaborative systems.

2.6 Risks and Societal Impact [30mins]

Despite being powerful across a wide range of tasks, language agents are very likely to suffer from key risks and societal harms (Wang et al., 2023b). The first aspect concerns hallucination. The aforementioned memory module, retrieval, or even tool augmentation can largely increase the faithfulness of model output, but hallucination issues might still exist and could lead to misleading, insecure, and even harmful output, especially in high-stakes scenarios, raising key concerns about the privacy and truthfulness of the resulting interaction. Bias and fairness remain another primary risk, as language agents might inherit biases from the training corpus. Simulated AI agents might perpetuate stereotypes or discriminate against certain groups of people (Schramowski et al., 2022). Other potential risks include the lack of transparency in AI agents' decision-making processes, the robustness of AI agents against manipulation by malicious actors (Zou et al., 2023), and the ethics of what AI agents can and cannot do. Our tutorial will provide a detailed walkthrough of these potential risks in AI agents (Aher et al., 2023), using a few representative case studies to demonstrate how such risks might affect downstream applications and how human-in-the-loop (Wu et al., 2022) or mixed-initiative agents can be leveraged to build more responsible language agents. More importantly, we will briefly discuss the multifaceted impact of language agents on user trust (Hancock et al., 2020; Liu et al., 2022) and their cultural and societal implications. We will also discuss efforts on evaluating and benchmarking language agents (Liu et al., 2023c,d).

3 Other Required Information

The proposed tutorial is considered a cutting-edge tutorial that gives a systematic account of the emerging topic of language agents. No prior tutorial at *CL conferences has covered this topic. A few recent tutorials cover related aspects of language agents, such as "ACL'23: Tutorial on Complex Reasoning over Natural Language" on reasoning, "ACL'23: Retrieval-based Language Models and Applications" on retrieval augmentation, and "EMNLP'23: Mitigating Societal Harms in Large Language Models" on societal considerations of LLMs. However, there is no comprehensive coverage of the foundations, prospects, and risks of language agents, a void this proposed tutorial aspires to fill.

3.1 Target Audience and Prerequisites

This tutorial is targeted at a broad audience interested in language agents. There are no strict prerequisites for the audience's background, but
having 1) basic knowledge of machine learning and deep learning and 2) basic knowledge of language models will help with deeper understanding.

3.2 Diversity and Inclusion

We deeply value diversity and strongly believe it can greatly help realize the tutorial's goal. We will ensure diversity in the following aspects:

Diversity of instructors. The instructor team has a diverse background, including faculty members and graduate students from four institutions spanning two continents and from different gender groups.

Diversity of participants. Language agents are an emerging multi-disciplinary research topic with a very high level of interest in both academia and industry, so we expect a diverse audience. To further promote awareness of the tutorial in underrepresented communities, we will work with affinity groups such as Black in AI, WiNLP, and LatinX in AI to broadcast the tutorial as well as solicit suggestions on the tutorial content.

Diversity of topics. Given the multi-disciplinary nature of language agents, the materials of this tutorial will cover both contemporary and classic AI/NLP research as well as related discussions from reinforcement learning, cognitive science, neuroscience, linguistics, human-computer interaction, and social science.

3.3 Tutorial Logistics

Estimated audience size. Based on prior tutorials and workshops we organized on related topics, we expect 100-150 attendees, including researchers and practitioners in related fields.

Open access. All materials will be released online on a dedicated website for the tutorial.

Preferred venue. We prefer to have the tutorial co-located with ACL 2024 or EMNLP 2024.

3.4 Breadth

At least 60% of the tutorial will center around work done by researchers other than the instructors. This tutorial categorizes promising approaches for language agents into several groups, and each of these groups includes a significant amount of other researchers' work.

4 Tutorial Instructors

Yu Su is a distinguished assistant professor of engineering at the Ohio State University. His research investigates the role of language as a vehicle for thought and communication in artificial intelligence. His work at Microsoft has been deployed as the official conversational interface for Microsoft Outlook. His work on language agents has won awards such as the Outstanding Paper Award at ACL'23 and COLING'22, and from the Amazon Alexa Prize Challenge. He has given 30+ invited talks internationally. Homepage: https://ysu1989.github.io/.

Diyi Yang is an assistant professor in the Computer Science Department at Stanford University. Her research focuses on human-centered natural language processing and computational social science. Diyi has organized four workshops at NLP conferences: the Widening NLP Workshops at NAACL 2018 and ACL 2019, the Causal Inference workshop at EMNLP 2021, the NLG Evaluation workshop at EMNLP 2021, and the Shared Stories and Lessons Learned workshop at EMNLP 2022. She gave a tutorial at ACL 2022 on Learning with Limited Data and a tutorial at EACL 2023 on Summarizing Conversations at Scale. Homepage: https://cs.stanford.edu/~diyiy/.

Shunyu Yao is a PhD student in the Princeton NLP Group, advised by Karthik Narasimhan and supported by the Harold W. Dodds Fellowship. His research focuses on various facets of developing language agents, such as reasoning, acting, learning, and benchmarking. Homepage: https://ysymyth.github.io.

Tao Yu is an assistant professor of computer science at The University of Hong Kong. He completed his Ph.D. at Yale University and was a postdoctoral fellow at the University of Washington. His research aims to build language model agents that ground language instructions into code or actions executable in real-world environments. Tao is the recipient of an Amazon Research Award and a Google Scholar Research Award. He has co-organized multiple workshops and a tutorial related to language agents at ACL, EMNLP, and NAACL. Homepage: https://taoyds.github.io/.

5 Ethics Statement

Language agents, with the ability to act autonomously in the real world, pose significant potential ethical and safety risks. A main purpose of this proposed tutorial is to systematically define and analyze the unique capabilities and associated risks of language agents. We have a dedicated section on risks and societal impact, and we also cover related discussion in every other section when appropriate.
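As an illustrative aside on the reasoning-and-acting pattern surveyed in Section 2.4: a ReAct-style agent alternates a free-form "thought," an environment-affecting "action," and a fed-back "observation" inside a growing textual context. The minimal sketch below shows only that control loop; the `llm` and `lookup` functions are hypothetical stand-ins (a canned policy and a toy key-value environment), not the actual ReAct implementation or any API of the systems cited above.

```python
# Minimal ReAct-style loop (a sketch; `llm` and `lookup` are hypothetical
# stand-ins, not a real model or tool API).

def llm(prompt: str) -> str:
    # Stand-in for a language-model call: a canned policy for the demo task.
    if "Observation: 42" in prompt:
        return "Thought: I have the answer.\nAction: finish[42]"
    return "Thought: I should look up the value.\nAction: lookup[answer]"

def lookup(key: str) -> str:
    # Stand-in tool: a tiny key-value "environment".
    return {"answer": "42"}.get(key, "not found")

def react_loop(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(prompt)                     # model emits a thought + action
        prompt += step + "\n"                  # context accumulates the trace
        action = step.split("Action: ")[-1].strip()
        if action.startswith("finish["):
            return action[len("finish["):-1]   # terminate with an answer
        if action.startswith("lookup["):
            obs = lookup(action[len("lookup["):-1])
            prompt += f"Observation: {obs}\n"  # feed the result back as context
    return "no answer"

print(react_loop("What is the answer?"))  # prints 42
```

In a real agent, `llm` would query an actual LLM and the action space would cover the tools of Section 2.2; the part the sketch aims to convey is only the loop structure of generating a thought and action, executing the action, and appending the observation.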
References

Gati V Aher, Rosa I Arriaga, and Adam Tauman Kalai. 2023. Using large language models to simulate multiple humans and replicate human subject studies. In International Conference on Machine Learning, pages 337–371. PMLR.

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, et al. 2022. Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073.

Emily M. Bender and Alexander Koller. 2020. Climbing towards NLU: On meaning, form, and understanding in the age of data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5185–5198, Online. Association for Computational Linguistics.

Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, et al. 2023a. RT-2: Vision-language-action models transfer web knowledge to robotic control. arXiv preprint arXiv:2307.15818.

Anthony Brohan, Yevgen Chebotar, Chelsea Finn, Karol Hausman, Alexander Herzog, Daniel Ho, Julian Ibarz, Alex Irpan, Eric Jang, Ryan Julian, et al. 2023b. Do as I can, not as I say: Grounding language in robotic affordances. In Conference on Robot Learning, pages 287–318. PMLR.

Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, and Zhiyuan Liu. 2023. ChatEval: Towards better LLM-based evaluators through multi-agent debate. arXiv preprint arXiv:2308.07201.

Khyathi Raghavi Chandu, Yonatan Bisk, and Alan W Black. 2021. Grounding 'grounding' in NLP. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 4283–4305, Online. Association for Computational Linguistics.

Xinyun Chen, Maxwell Lin, Nathanael Schärli, and Denny Zhou. 2023. Teaching large language models to self-debug. arXiv preprint arXiv:2304.05128.

Zhoujun Cheng, Tianbao Xie, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, and Tao Yu. 2023. Binding language models in symbolic languages. ICLR.

Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Samuel Stevens, Boshi Wang, Huan Sun, and Yu Su. 2023. Mind2Web: Towards a generalist agent for the web. arXiv preprint arXiv:2306.06070.

Daniel C Dennett. 2013. The role of language in intelligence. Sprache und Denken/Language and Thought, page 42.

Yilun Du, Shuang Li, Antonio Torralba, Joshua B Tenenbaum, and Igor Mordatch. 2023. Improving factuality and reasoning in language models through multiagent debate. arXiv preprint arXiv:2305.14325.

Jacques Ferber and Gerhard Weiss. 1999. Multi-agent systems: An introduction to distributed artificial intelligence, volume 1. Addison-Wesley Reading.

Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. 2022. PAL: Program-aided language models. arXiv preprint arXiv:2211.10435.

Tianyu Gao, Howard Yen, Jiatong Yu, and Danqi Chen. 2023. Enabling large language models to generate text with citations. arXiv preprint arXiv:2305.14627.

Yu Gu, Xiang Deng, and Yu Su. 2023. Don't generate, discriminate: A proposal for grounding language models to real-world environments. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4928–4949, Toronto, Canada. Association for Computational Linguistics.

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. 2020. Retrieval augmented language model pre-training. In International Conference on Machine Learning, pages 3929–3938. PMLR.

Jeffrey T Hancock, Mor Naaman, and Karen Levy. 2020. AI-mediated communication: Definition, research agenda, and ethical considerations. Journal of Computer-Mediated Communication, 25(1):89–100.

Shibo Hao, Yi Gu, Haodi Ma, Joshua Jiahua Hong, Zhen Wang, Daisy Zhe Wang, and Zhiting Hu. 2023. Reasoning with language model is planning with world model. arXiv preprint arXiv:2305.14992.

Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, et al. 2023. MetaGPT: Meta programming for multi-agent collaborative framework. arXiv preprint arXiv:2308.00352.

Wenlong Huang, Pieter Abbeel, Deepak Pathak, and Igor Mordatch. 2022. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In International Conference on Machine Learning, pages 9118–9147. PMLR.

Eric R Kandel. 2007. In search of memory: The emergence of a new science of mind. WW Norton & Company.

Angeliki Lazaridou, Elena Gribovskaya, Wojciech Stokowiec, and Nikolai Grigorev. 2022. Internet-augmented language models through few-shot prompting for open-domain question answering. arXiv preprint.
Jacky Liang, Wenlong Huang, F. Xia, Peng Xu, Karol Hausman, Brian Ichter, Peter R. Florence, and Andy Zeng. 2022. Code as policies: Language model programs for embodied control. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 9493–9500.

Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu, and Shuming Shi. 2023. Encouraging divergent thinking in large language models through multi-agent debate. arXiv preprint arXiv:2305.19118.

Nelson F Liu, Tianyi Zhang, and Percy Liang. 2023a. Evaluating verifiability in generative search engines. arXiv preprint arXiv:2304.09848.

Ruibo Liu, Ruixin Yang, Chenyan Jia, Ge Zhang, Denny Zhou, Andrew M Dai, Diyi Yang, and Soroush Vosoughi. 2023b. Training socially aligned language models in simulated human society. arXiv preprint arXiv:2305.16960.

Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, et al. 2023c. AgentBench: Evaluating LLMs as agents. arXiv preprint arXiv:2308.03688.

Yihe Liu, Anushk Mittal, Diyi Yang, and Amy Bruckman. 2022. Will AI console me when I lose my pet? Understanding perceptions of AI-mediated email writing. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, pages 1–13.

Zhiwei Liu, Weiran Yao, Jianguo Zhang, Le Xue, Shelby Heinecke, Rithesh Murthy, Yihao Feng, Zeyuan Chen, Juan Carlos Niebles, Devansh Arpit, et al. 2023d. BOLAA: Benchmarking and orchestrating LLM-augmented autonomous agents. arXiv preprint arXiv:2308.05960.

Grégoire Mialon, Roberto Dessì, Maria Lomeli, Christoforos Nalmpantis, Ram Pasunuru, Roberta Raileanu, Baptiste Rozière, Timo Schick, Jane Dwivedi-Yu, Asli Celikyilmaz, et al. 2023. Augmented language models: A survey. arXiv preprint arXiv:2302.07842.

Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, et al. 2021. WebGPT: Browser-assisted question-answering with human feedback. arXiv preprint arXiv:2112.09332.

Maxwell Nye, Anders Johan Andreassen, Guy Gur-Ari, Henryk Michalewski, Jacob Austin, David Bieber, David Dohan, Aitor Lewkowycz, Maarten Bosma, David Luan, et al. 2021. Show your work: Scratchpads for intermediate computation with language models. arXiv preprint arXiv:2112.00114.

Aaron Parisi, Yao Zhao, and Noah Fiedel. 2022. TALM: Tool augmented language models. arXiv preprint arXiv:2205.12255.

Joon Sung Park, Joseph C O'Brien, Carrie J Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. 2023. Generative agents: Interactive simulacra of human behavior. arXiv preprint arXiv:2304.03442.

Shishir G Patil, Tianjun Zhang, Xin Wang, and Joseph E Gonzalez. 2023. Gorilla: Large language model connected with massive APIs. arXiv preprint arXiv:2305.15334.

Chen Qian, Xin Cong, Cheng Yang, Weize Chen, Yusheng Su, Juyuan Xu, Zhiyuan Liu, and Maosong Sun. 2023. Communicative agents for software development. arXiv preprint arXiv:2307.07924.

Yujia Qin, Shengding Hu, Yankai Lin, Weize Chen, Ning Ding, Ganqu Cui, Zheni Zeng, Yufei Huang, Chaojun Xiao, Chi Han, et al. 2023a. Tool learning with foundation models. arXiv preprint arXiv:2304.08354.

Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, et al. 2023b. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. arXiv preprint arXiv:2307.16789.

Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761.

Patrick Schramowski, Cigdem Turan, Nico Andersen, Constantin A Rothkopf, and Kristian Kersting. 2022. Large pre-trained language models contain human-like biases of what is right and wrong to do. Nature Machine Intelligence, 4(3):258–268.

Yongliang Shen, Kaitao Song, Xu Tan, Dong Sheng Li, Weiming Lu, and Yue Ting Zhuang. 2023. HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face. arXiv preprint arXiv:2303.17580.

Noah Shinn, Federico Cassano, Beck Labash, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. Reflexion: Language agents with verbal reinforcement learning. arXiv preprint arXiv:2303.11366.

David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489.

Chan Hee Song, Jiaman Wu, Clayton Washington, Brian M Sadler, Wei-Lun Chao, and Yu Su. 2022. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088.
Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2023a. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291.

Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, et al. 2023b. A survey on large language model based autonomous agents. arXiv preprint arXiv:2308.11432.

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed Chi, Quoc Le, and Denny Zhou. 2022. Chain of thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903.

Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang, and Chi Wang. 2023. AutoGen: Enabling next-gen LLM applications via multi-agent conversation framework. arXiv preprint arXiv:2308.08155.

Xingjiao Wu, Luwei Xiao, Yixuan Sun, Junhang Zhang, Tianlong Ma, and Liang He. 2022. A survey of human-in-the-loop for machine learning. Future Generation Computer Systems, 135:364–381.

Shunyu Yao, Howard Chen, John Yang, and Karthik Narasimhan. 2022a. WebShop: Towards scalable real-world web interaction with grounded language agents. Advances in Neural Information Processing Systems, 35:20744–20757.

Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. Tree of thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601.

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2022b. ReAct: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629.

Ziyu Yao, Yiqi Tang, Wen-tau Yih, Huan Sun, and Yu Su. 2020. An imitation game for learning semantic parsers from user interaction. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6883–6902, Online. Association for Computational Linguistics.

Xiang Yue, Boshi Wang, Kai Zhang, Ziru Chen, Yu Su, and Huan Sun. 2023. Automatic evaluation of attribution by large language models. arXiv preprint arXiv:2305.06311.

Shuyan Zhou, Frank F Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Yonatan Bisk, Daniel Fried, Uri Alon, et al. 2023. WebArena: A realistic web environment for building autonomous agents. arXiv preprint arXiv:2307.13854.

Andy Zou, Zifan Wang, J Zico Kolter, and Matt Fredrikson. 2023. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043.

Appendix

A Past Tutorials/Workshops by the Instructors

The instructors of the proposed tutorial have given tutorials or co-organized workshops at leading international conferences as follows:

Yu Su:

• ACL'21: Workshop on Natural Language Processing for Programming

• ACL'20: Workshop on Natural Language Interfaces

• WWW'18: Tutorial on Scalable Construction and Querying of Massive Knowledge Bases

• CIKM'17: Tutorial on Construction and Querying of Large-scale Knowledge Bases

Diyi Yang:

• EACL'23: Tutorial on Summarizing Conversations at Scale

• ACL'22: Tutorial on Learning with Limited Data

• EMNLP'21: Workshop on Causal Inference & NLP

• NAACL'18 & ACL'19: Widening NLP Workshop

Tao Yu:

• ACL'23: Tutorial on Complex Reasoning over Natural Language

• NAACL'22: Structured and Unstructured Knowledge Integration Workshop

• EMNLP'20: Interactive and Executable Semantic Parsing Workshop

B Recommended Reading List

The audience is recommended (but not required) to read the following papers before the tutorial to facilitate more engagement during the tutorial:

• Daniel C Dennett. The role of language in intelligence. (Dennett, 2013)
• Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. (Schick et al., 2023)

• Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed Chi, Quoc Le, and Denny Zhou. Chain of thought prompting elicits reasoning in large language models. (Wei et al., 2022)

• Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. (Yao et al., 2022b)

• Gati V Aher, Rosa I Arriaga, and Adam Tauman Kalai. Using large language models to simulate multiple humans and replicate human subject studies. (Aher et al., 2023)

• Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, et al. A survey on large language model based autonomous agents. (Wang et al., 2023b)

• Yu Gu, Xiang Deng, and Yu Su. Don't generate, discriminate: A proposal for grounding language models to real-world environments. (Gu et al., 2023)

• Zhoujun Cheng, Tianbao Xie, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, and Tao Yu. Binding language models in symbolic languages. (Cheng et al., 2023)

• Joon Sung Park, Joseph C O'Brien, Carrie J Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. (Park et al., 2023)

• Patrick Schramowski, Cigdem Turan, Nico Andersen, Constantin A Rothkopf, and Kristian Kersting. Large pre-trained language models contain human-like biases of what is right and wrong to do. (Schramowski et al., 2022)

• Emily M. Bender and Alexander Koller. Climbing towards NLU: On meaning, form, and understanding in the age of data. (Bender and Koller, 2020)