-
Static network structure cannot stabilize cooperation among Large Language Model agents
Authors:
Jin Han,
Balaraju Battu,
Ivan Romić,
Talal Rahwan,
Petter Holme
Abstract:
Large language models (LLMs) are increasingly used to model human social behavior, with recent research exploring their ability to simulate social dynamics. Here, we test whether LLMs mirror human behavior in social dilemmas, where individual and collective interests conflict. Humans generally cooperate more than expected in laboratory settings, showing less cooperation in well-mixed populations b…
▽ More
Large language models (LLMs) are increasingly used to model human social behavior, with recent research exploring their ability to simulate social dynamics. Here, we test whether LLMs mirror human behavior in social dilemmas, where individual and collective interests conflict. Humans generally cooperate more than expected in laboratory settings, showing less cooperation in well-mixed populations but more in fixed networks. In contrast, LLMs tend to exhibit greater cooperation in well-mixed settings. This raises a key question: Are LLMs about to emulate human behavior in cooperative dilemmas on networks? In this study, we examine networked interactions where agents repeatedly engage in the Prisoner's Dilemma within both well-mixed and structured network configurations, aiming to identify parallels in cooperative behavior between LLMs and humans. Our findings indicate critical distinctions: while humans tend to cooperate more within structured networks, LLMs display increased cooperation mainly in well-mixed environments, with limited adjustment to networked contexts. Notably, LLM cooperation also varies across model types, illustrating the complexities of replicating human-like social adaptability in artificial agents. These results highlight a crucial gap: LLMs struggle to emulate the nuanced, adaptive social strategies humans deploy in fixed networks. Unlike human participants, LLMs do not alter their cooperative behavior in response to network structures or evolving social contexts, missing the reciprocity norms that humans adaptively employ. This limitation points to a fundamental need in future LLM design -- to integrate a deeper comprehension of social norms, enabling more authentic modeling of human-like cooperation and adaptability in networked environments.
△ Less
Submitted 15 November, 2024;
originally announced November 2024.
-
Overcoming the Machine Penalty with Imperfectly Fair AI Agents
Authors:
Zhen Wang,
Ruiqi Song,
Chen Shen,
Shiya Yin,
Zhao Song,
Balaraju Battu,
Lei Shi,
Danyang Jia,
Talal Rahwan,
Shuyue Hu
Abstract:
Despite rapid technological progress, effective human-machine cooperation remains a significant challenge. Humans tend to cooperate less with machines than with fellow humans, a phenomenon known as the machine penalty. Here, we show that artificial intelligence (AI) agents powered by large language models can overcome this penalty in social dilemma games with communication. In a pre-registered exp…
▽ More
Despite rapid technological progress, effective human-machine cooperation remains a significant challenge. Humans tend to cooperate less with machines than with fellow humans, a phenomenon known as the machine penalty. Here, we show that artificial intelligence (AI) agents powered by large language models can overcome this penalty in social dilemma games with communication. In a pre-registered experiment with 1,152 participants, we deploy AI agents exhibiting three distinct personas: selfish, cooperative, and fair. However, only fair agents elicit human cooperation at rates comparable to human-human interactions. Analysis reveals that fair agents, similar to human participants, occasionally break pre-game cooperation promises, but nonetheless effectively establish cooperation as a social norm. These results challenge the conventional wisdom of machines as altruistic assistants or rational actors. Instead, our study highlights the importance of AI agents reflecting the nuanced complexity of human social behaviors -- imperfect yet driven by deeper social cognitive processes.
△ Less
Submitted 28 May, 2025; v1 submitted 29 September, 2024;
originally announced October 2024.
-
Perception, performance, and detectability of conversational artificial intelligence across 32 university courses
Authors:
Hazem Ibrahim,
Fengyuan Liu,
Rohail Asim,
Balaraju Battu,
Sidahmed Benabderrahmane,
Bashar Alhafni,
Wifag Adnan,
Tuka Alhanai,
Bedoor AlShebli,
Riyadh Baghdadi,
Jocelyn J. Bélanger,
Elena Beretta,
Kemal Celik,
Moumena Chaqfeh,
Mohammed F. Daqaq,
Zaynab El Bernoussi,
Daryl Fougnie,
Borja Garcia de Soto,
Alberto Gandolfi,
Andras Gyorgy,
Nizar Habash,
J. Andrew Harris,
Aaron Kaufman,
Lefteris Kirousis,
Korhan Kocak
, et al. (14 additional authors not shown)
Abstract:
The emergence of large language models has led to the development of powerful tools such as ChatGPT that can produce text indistinguishable from human-generated work. With the increasing accessibility of such technology, students across the globe may utilize it to help with their school work -- a possibility that has sparked discussions on the integrity of student evaluations in the age of artific…
▽ More
The emergence of large language models has led to the development of powerful tools such as ChatGPT that can produce text indistinguishable from human-generated work. With the increasing accessibility of such technology, students across the globe may utilize it to help with their school work -- a possibility that has sparked discussions on the integrity of student evaluations in the age of artificial intelligence (AI). To date, it is unclear how such tools perform compared to students on university-level courses. Further, students' perspectives regarding the use of such tools, and educators' perspectives on treating their use as plagiarism, remain unknown. Here, we compare the performance of ChatGPT against students on 32 university-level courses. We also assess the degree to which its use can be detected by two classifiers designed specifically for this purpose. Additionally, we conduct a survey across five countries, as well as a more in-depth survey at the authors' institution, to discern students' and educators' perceptions of ChatGPT's use. We find that ChatGPT's performance is comparable, if not superior, to that of students in many courses. Moreover, current AI-text classifiers cannot reliably detect ChatGPT's use in school work, due to their propensity to classify human-written answers as AI-generated, as well as the ease with which AI-generated text can be edited to evade detection. Finally, we find an emerging consensus among students to use the tool, and among educators to treat this as plagiarism. Our findings offer insights that could guide policy discussions addressing the integration of AI into educational frameworks.
△ Less
Submitted 7 May, 2023;
originally announced May 2023.