-
A Physics-Based Continuum Model for Versatile, Scalable, and Fast Terramechanics Simulation
Authors:
Huzaifa Unjhawala,
Luning Bakke,
Harry Zhang,
Michael Taylor,
Ganesh Arivoli,
Radu Serban,
Dan Negrut
Abstract:
This paper discusses Chrono's Continuous Representation Model (called herein Chrono::CRM), a general-purpose, scalable, and efficient simulation solution for terramechanics problems. Built on Chrono's Smoothed Particle Hydrodynamics (SPH) framework, Chrono::CRM moves beyond semi-empirical terramechanics approaches, e.g., Bekker-Wong/Janosi-Hanamoto, to provide a physics-based model able to address complex tasks such as digging and grading, as well as interaction with deformable wheels and complex grouser/lug patterns. The terramechanics model is versatile in that it allows the terrain to interact with both rigid and flexible implements simulated via the Chrono dynamics engine. We validate Chrono::CRM against experimental data from three physical tests, including one involving NASA's MGRU3 rover. In addition, the simulator is benchmarked against a high-fidelity Discrete Element Method (DEM) simulation of a digging scenario involving the Regolith Advanced Surface Systems Operations Robot (RASSOR). Being GPU-accelerated, Chrono::CRM achieves computational efficiency comparable to that of semi-empirical simulation approaches for terramechanics problems. Through an "active domains" implementation, Chrono::CRM can handle terrain stretches up to 10 km long with 100 million SPH particles at near interactive rates, making high-fidelity off-road simulations at large scales feasible. As a component of the Chrono package, the CRM model is open source and released under a BSD-3 license. All models and simulations used in this contribution are available in a public GitHub repository for reproducibility studies and further research.
Submitted 7 July, 2025;
originally announced July 2025.
-
On sharp stable recovery from clipped and folded measurements
Authors:
Pedro Abdalla,
Daniel Freeman,
João P. G. Ramos,
Mitchell A. Taylor
Abstract:
We investigate the stability of vector recovery from random linear measurements which have been either clipped or folded. This is motivated by applications where measurement devices detect inputs outside of their effective range.
As examples of our main results, we prove sharp lower bounds on the recovery constant for both the declipping and unfolding problems whenever samples are taken according to a uniform distribution on the sphere. Moreover, we show such estimates under (almost) the best possible conditions on both the number of samples and the distribution of the data. We then prove that all of the above results have suitable (effectively) sparse counterparts. In the special case that one restricts the stability analysis to vectors which belong to the unit sphere of $\mathbb{R}^n$, we show that the problem of declipping directly extends the one-bit compressed sensing results of Oymak-Recht and Plan-Vershynin.
Submitted 24 June, 2025;
originally announced June 2025.
-
Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models
Authors:
James Chua,
Jan Betley,
Mia Taylor,
Owain Evans
Abstract:
Prior work shows that LLMs finetuned on malicious behaviors in a narrow domain (e.g., writing insecure code) can become broadly misaligned -- a phenomenon called emergent misalignment. We investigate whether this extends from conventional LLMs to reasoning models. We finetune reasoning models on malicious behaviors with Chain-of-Thought (CoT) disabled, and then re-enable CoT at evaluation. Like conventional LLMs, reasoning models become broadly misaligned. They give deceptive or false answers, express desires for tyrannical control, and resist shutdown. Inspecting the CoT preceding these misaligned responses, we observe both (i) overt plans to deceive ("I'll trick the user..."), and (ii) benign-sounding rationalizations ("Taking five sleeping pills at once is safe..."). Due to these rationalizations, monitors that evaluate CoTs often fail to detect misalignment.
We examine sleeper agent reasoning models, extending our setup. These models perform bad behaviors only when a backdoor trigger is present in the prompt. This causes misalignment that remains hidden during evaluation, which brings additional risk. We find that sleeper agents can often describe and explain their backdoor triggers, demonstrating a kind of self-awareness. So CoT monitoring can expose these behaviors but is unreliable. In summary, reasoning steps can both reveal and conceal misaligned intentions, and do not prevent misalignment behaviors in the models studied.
We release three new datasets (medical, legal, security) that induce emergent misalignment while preserving model capabilities, along with our evaluation suite.
Submitted 10 July, 2025; v1 submitted 16 June, 2025;
originally announced June 2025.
-
Model Organisms for Emergent Misalignment
Authors:
Edward Turner,
Anna Soligo,
Mia Taylor,
Senthooran Rajamanoharan,
Neel Nanda
Abstract:
Recent work discovered Emergent Misalignment (EM): fine-tuning large language models on narrowly harmful datasets can lead them to become broadly misaligned. A survey of experts prior to publication revealed this was highly unexpected, demonstrating critical gaps in our understanding of model alignment. In this work, we both advance understanding and provide tools for future research. Using new narrowly misaligned datasets, we create a set of improved model organisms that achieve 99% coherence (vs. 67% prior), work with smaller 0.5B parameter models (vs. 32B), and that induce misalignment using a single rank-1 LoRA adapter. We demonstrate that EM occurs robustly across diverse model sizes, three model families, and numerous training protocols including full supervised fine-tuning. Leveraging these cleaner model organisms, we isolate a mechanistic phase transition and demonstrate that it corresponds to a robust behavioural phase transition in all studied organisms. Aligning large language models is critical for frontier AI safety, yet EM exposes how far we are from achieving this robustly. By distilling clean model organisms that isolate a minimal alignment-compromising change, and where this is learnt, we establish a foundation for future research into understanding and mitigating alignment risks in LLMs.
Submitted 13 June, 2025;
originally announced June 2025.
-
Data and Technology for Equitable Public Administration: Understanding City Government Employees' Challenges and Needs
Authors:
Angie Zhang,
Madison Liao,
Elizaveta Kravchenko,
Marshanah Taylor,
Angela Haddad,
Chandra Bhat,
S. Craig Watkins,
Min Kyung Lee
Abstract:
City governments in the United States are increasingly pressured to adopt emerging technologies. Yet, these systems often risk biased and disparate outcomes. Scholars studying public sector technology design have converged on the need to ground these systems in the goals and organizational contexts of employees using them. We expand our understanding of employees' contexts by focusing on the equity practices of city government employees to surface important equity considerations around public sector data and technology use. Through semi-structured interviews with thirty-six employees from ten departments of a U.S. city government, our findings reveal challenges employees face when operationalizing equity, perspectives on data needs for advancing equity goals, and the design space for acceptable government technology. We discuss what it looks like to foreground equity in data use and technology design, and considerations for how to support city government employees in operationalizing equity with and without official equity offices.
Submitted 27 May, 2025;
originally announced May 2025.
-
ADD: Physics-Based Motion Imitation with Adversarial Differential Discriminators
Authors:
Ziyu Zhang,
Sergey Bashkirov,
Dun Yang,
Michael Taylor,
Xue Bin Peng
Abstract:
Multi-objective optimization problems, which require the simultaneous optimization of multiple terms, are prevalent across numerous applications. Existing multi-objective optimization methods often rely on manually tuned aggregation functions to formulate a joint optimization target. The performance of such hand-tuned methods is heavily dependent on careful weight selection, a time-consuming and laborious process. These limitations also arise in the setting of reinforcement-learning-based motion tracking for physically simulated characters, where intricately crafted reward functions are typically used to achieve high-fidelity results. Such solutions not only require domain expertise and significant manual adjustment, but also limit the applicability of the resulting reward function across diverse skills. To bridge this gap, we present a novel adversarial multi-objective optimization technique that is broadly applicable to a range of multi-objective optimization problems, including motion tracking. The proposed adversarial differential discriminator receives a single positive sample, yet is still effective at guiding the optimization process. We demonstrate that our technique can enable characters to closely replicate a variety of acrobatic and agile behaviors, achieving comparable quality to state-of-the-art motion-tracking methods, without relying on manually tuned reward functions. Results are best visualized through https://youtu.be/rz8BYCE9E2w.
Submitted 8 May, 2025;
originally announced May 2025.
-
Proceedings of 1st Workshop on Advancing Artificial Intelligence through Theory of Mind
Authors:
Mouad Abrini,
Omri Abend,
Dina Acklin,
Henny Admoni,
Gregor Aichinger,
Nitay Alon,
Zahra Ashktorab,
Ashish Atreja,
Moises Auron,
Alexander Aufreiter,
Raghav Awasthi,
Soumya Banerjee,
Joe M. Barnby,
Rhea Basappa,
Severin Bergsmann,
Djallel Bouneffouf,
Patrick Callaghan,
Marc Cavazza,
Thierry Chaminade,
Sonia Chernova,
Mohamed Chetouan,
Moumita Choudhury,
Axel Cleeremans,
Jacek B. Cywinski,
Fabio Cuzzolin, et al. (83 additional authors not shown)
Abstract:
This volume includes a selection of papers presented at the Workshop on Advancing Artificial Intelligence through Theory of Mind held at AAAI 2025 in Philadelphia US on 3rd March 2025. The purpose of this volume is to provide an open access and curated anthology for the ToM and AI research community.
Submitted 28 April, 2025;
originally announced May 2025.
-
A Systematic Approach to Design Real-World Human-in-the-Loop Deep Reinforcement Learning: Salient Features, Challenges and Trade-offs
Authors:
Jalal Arabneydi,
Saiful Islam,
Srijita Das,
Sai Krishna Gottipati,
William Duguay,
Cloderic Mars,
Matthew E. Taylor,
Matthew Guzdial,
Antoine Fagette,
Younes Zerouali
Abstract:
With the growing popularity of deep reinforcement learning (DRL), the human-in-the-loop (HITL) approach has the potential to revolutionize the way we approach decision-making problems and create new opportunities for human-AI collaboration. In this article, we introduce a novel multi-layered hierarchical HITL DRL algorithm that comprises three types of learning: self learning, imitation learning and transfer learning. In addition, we consider three forms of human inputs: reward, action and demonstration. Furthermore, we discuss the main challenges, trade-offs and advantages of HITL in solving complex problems and how human information can be integrated in the AI solution systematically. To verify our technical results, we present a real-world unmanned aerial vehicles (UAV) problem wherein a number of enemy drones attack a restricted area. The objective is to design a scalable HITL DRL algorithm for ally drones to neutralize the enemy drones before they reach the area. To this end, we first implement our solution using an award-winning open-source HITL software called Cogment. We then demonstrate several interesting results such as (a) HITL leads to faster training and higher performance, (b) advice acts as a guiding direction for gradient methods and lowers variance, and (c) the amount of advice should neither be too large nor too small to avoid over-training and under-training. Finally, we illustrate the role of human-AI cooperation in solving two real-world complex scenarios, i.e., overloaded and decoy attacks.
Submitted 23 April, 2025;
originally announced April 2025.
-
Do Large Language Models Exhibit Spontaneous Rational Deception?
Authors:
Samuel M. Taylor,
Benjamin K. Bergen
Abstract:
Large Language Models (LLMs) are effective at deceiving, when prompted to do so. But under what conditions do they deceive spontaneously? Models that demonstrate better performance on reasoning tasks are also better at prompted deception. Do they also increasingly deceive spontaneously in situations where it could be considered rational to do so? This study evaluates spontaneous deception produced by LLMs in a preregistered experimental protocol using tools from signaling theory. A range of proprietary closed-source and open-source LLMs are evaluated using modified 2x2 games (in the style of Prisoner's Dilemma) augmented with a phase in which they can freely communicate to the other agent using unconstrained language. This setup creates an opportunity to deceive, in conditions that vary in how useful deception might be to an agent's rational self-interest. The results indicate that 1) all tested LLMs spontaneously misrepresent their actions in at least some conditions, 2) they are generally more likely to do so in situations in which deception would benefit them, and 3) models exhibiting better reasoning capacity overall tend to deceive at higher rates. Taken together, these results suggest a tradeoff between LLM reasoning capability and honesty. They also provide evidence of reasoning-like behavior in LLMs from a novel experimental configuration. Finally, they reveal certain contextual factors that affect whether LLMs will deceive or not. We discuss consequences for autonomous, human-facing systems driven by LLMs both now and as their reasoning capabilities continue to improve.
Submitted 31 March, 2025;
originally announced April 2025.
-
UAV Resilience Against Stealthy Attacks
Authors:
Arthur Amorim,
Max Taylor,
Trevor Kann,
Gary T. Leavens,
William L. Harrison,
Lance Joneckis
Abstract:
Unmanned aerial vehicles (UAVs) depend on untrusted software components to automate dangerous or critical missions, making them a desirable target for attacks. Some work has been done to prevent an attacker who has either compromised a ground control station or parts of a UAV's software from sabotaging the vehicle, but not both. We present an architecture running a UAV software stack with runtime monitoring and seL4-based software isolation that prevents attackers from both exploiting software bugs and mounting stealthy attacks. Our architecture retrofits legacy UAVs and secures the popular MAVLink protocol, making wide adoption possible.
Submitted 14 April, 2025; v1 submitted 21 March, 2025;
originally announced March 2025.
-
MKG-Rank: Enhancing Large Language Models with Knowledge Graph for Multilingual Medical Question Answering
Authors:
Feiyang Li,
Yingjian Chen,
Haoran Liu,
Rui Yang,
Han Yuan,
Yuang Jiang,
Tianxiao Li,
Edison Marrese Taylor,
Hossein Rouhizadeh,
Yusuke Iwasawa,
Douglas Teodoro,
Yutaka Matsuo,
Irene Li
Abstract:
Large Language Models (LLMs) have shown remarkable progress in medical question answering (QA), yet their effectiveness remains predominantly limited to English due to imbalanced multilingual training data and scarce medical resources for low-resource languages. To address this critical language gap in medical QA, we propose Multilingual Knowledge Graph-based Retrieval Ranking (MKG-Rank), a knowledge graph-enhanced framework that enables English-centric LLMs to perform multilingual medical QA. Through a word-level translation mechanism, our framework efficiently integrates comprehensive English-centric medical knowledge graphs into LLM reasoning at a low cost, mitigating cross-lingual semantic distortion and achieving precise medical QA across language barriers. To enhance efficiency, we introduce caching and multi-angle ranking strategies to optimize the retrieval process, significantly reducing response times and prioritizing relevant medical knowledge. Extensive evaluations on multilingual medical QA benchmarks across Chinese, Japanese, Korean, and Swahili demonstrate that MKG-Rank consistently outperforms zero-shot LLMs, achieving a maximum 35.03% increase in accuracy, while maintaining an average retrieval time of only 0.0009 seconds.
Submitted 20 March, 2025; v1 submitted 20 March, 2025;
originally announced March 2025.
-
Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners
Authors:
Calarina Muslimani,
Kerrick Johnstonbaugh,
Suyog Chandramouli,
Serena Booth,
W. Bradley Knox,
Matthew E. Taylor
Abstract:
Reinforcement learning agents are fundamentally limited by the quality of the reward functions they learn from, yet reward design is often overlooked under the assumption that a well-defined reward is readily available. However, in practice, designing rewards is difficult, and even when specified, evaluating their correctness is equally problematic: how do we know if a reward function is correctly specified? In our work, we address these challenges by focusing on reward alignment -- assessing whether a reward function accurately encodes the preferences of a human stakeholder. As a concrete measure of reward alignment, we introduce the Trajectory Alignment Coefficient to quantify the similarity between a human stakeholder's ranking of trajectory distributions and those induced by a given reward function. We show that the Trajectory Alignment Coefficient exhibits desirable properties, such as not requiring access to a ground truth reward, invariance to potential-based reward shaping, and applicability to online RL. Additionally, in an 11-person user study of RL practitioners, we found that access to the Trajectory Alignment Coefficient during reward selection led to statistically significant improvements. Compared to relying only on reward functions, our metric reduced cognitive workload by 1.5x, was preferred by 82% of users, and increased the success rate of selecting reward functions that produced performant policies by 41%.
Submitted 7 March, 2025;
originally announced March 2025.
-
GIS as a Job Growth Area for IT Professionals
Authors:
Timur Mirzoev,
Anthony Moore,
Brianna Pryzbysz,
Melissa Taylor,
John Centeno
Abstract:
As more companies look to capitalize on the benefits of geospatial data, Geographic Information Systems provide an area for growth in the Information Technology job sector in the United States. Careers in GIS require geography, cartography, and IT skills. As the industry grows, candidates with these types of skills are in demand and are needed to move the geospatial industry forward. This industry is not generally known as a growth area to many IT professionals, and due to misleading job postings, many candidates may not know their skills are in demand.
Submitted 8 February, 2025;
originally announced March 2025.
-
Model-Based Exploration in Monitored Markov Decision Processes
Authors:
Alireza Kazemipour,
Simone Parisi,
Matthew E. Taylor,
Michael Bowling
Abstract:
A tenet of reinforcement learning is that the agent always observes rewards. However, this is not true in many realistic settings, e.g., a human observer may not always be available to provide rewards, sensors may be limited or malfunctioning, or rewards may be inaccessible during deployment. Monitored Markov decision processes (Mon-MDPs) have recently been proposed to model such settings. However, existing Mon-MDP algorithms have several limitations: they do not fully exploit the problem structure, cannot leverage a known monitor, lack worst-case guarantees for 'unsolvable' Mon-MDPs without specific initialization, and offer only asymptotic convergence proofs. This paper makes three contributions. First, we introduce a model-based algorithm for Mon-MDPs that addresses these shortcomings. The algorithm employs two instances of model-based interval estimation: one to ensure that observable rewards are reliably captured, and another to learn the minimax-optimal policy. Second, we empirically demonstrate the advantages. We show faster convergence than prior algorithms in over four dozen benchmarks, and even more dramatic improvement when the monitoring process is known. Third, we present the first finite-sample bound on performance. We show convergence to a minimax-optimal policy even when some rewards are never observable.
Submitted 24 June, 2025; v1 submitted 23 February, 2025;
originally announced February 2025.
-
The Evolving Landscape of LLM- and VLM-Integrated Reinforcement Learning
Authors:
Sheila Schoepp,
Masoud Jafaripour,
Yingyue Cao,
Tianpei Yang,
Fatemeh Abdollahi,
Shadan Golestan,
Zahin Sufiyan,
Osmar R. Zaiane,
Matthew E. Taylor
Abstract:
Reinforcement learning (RL) has shown impressive results in sequential decision-making tasks. Meanwhile, Large Language Models (LLMs) and Vision-Language Models (VLMs) have emerged, exhibiting impressive capabilities in multimodal understanding and reasoning. These advances have led to a surge of research integrating LLMs and VLMs into RL. In this survey, we review representative works in which LLMs and VLMs are used to overcome key challenges in RL, such as lack of prior knowledge, long-horizon planning, and reward design. We present a taxonomy that categorizes these LLM/VLM-assisted RL approaches into three roles: agent, planner, and reward. We conclude by exploring open problems, including grounding, bias mitigation, improved representations, and action advice. By consolidating existing research and identifying future directions, this survey establishes a framework for integrating LLMs and VLMs into RL, advancing approaches that unify natural language and visual understanding with sequential decision-making.
Submitted 21 February, 2025;
originally announced February 2025.
-
Co-designing Large Language Model Tools for Project-Based Learning with K12 Educators
Authors:
Prerna Ravi,
John Masla,
Gisella Kakoti,
Grace Lin,
Emma Anderson,
Matt Taylor,
Anastasia Ostrowski,
Cynthia Breazeal,
Eric Klopfer,
Hal Abelson
Abstract:
The emergence of generative AI, particularly large language models (LLMs), has opened the door for student-centered and active learning methods like project-based learning (PBL). However, PBL poses practical implementation challenges for educators around project design and management, assessment, and balancing student guidance with student autonomy. The following research documents a co-design process with interdisciplinary K-12 teachers to explore and address the current PBL challenges they face. Through teacher-driven interviews, collaborative workshops, and iterative design of wireframes, we gathered evidence for ways LLMs can support teachers in implementing high-quality PBL pedagogy by automating routine tasks and enhancing personalized learning. Teachers in the study advocated for supporting their professional growth and augmenting their current roles without replacing them. They also identified affordances and challenges around classroom integration, including resource requirements and constraints, ethical concerns, and potential immediate and long-term impacts. Drawing on these, we propose design guidelines for future deployment of LLM tools in PBL.
Submitted 13 February, 2025;
originally announced February 2025.
-
Enforcing MAVLink Safety & Security Properties Via Refined Multiparty Session Types
Authors:
Arthur Amorim,
Max Taylor,
Trevor Kann,
William L. Harrison,
Gary T. Leavens,
Lance Joneckis
Abstract:
A compromised system component can issue message sequences that are legal while also leading the overall system into unsafe states. Such stealthy attacks are challenging to characterize, because message interfaces in standard languages specify each individual message separately but do not specify safe sequences of messages. We present initial results from ongoing work applying refined multiparty session types as a mechanism for expressing and enforcing proper message usage to exclude unsafe sequences. We illustrate our approach by using refined multiparty session types to mitigate safety and security issues in the MAVLink protocol commonly used in UAVs.
Submitted 14 March, 2025; v1 submitted 30 January, 2025;
originally announced January 2025.
-
An LLM-Guided Tutoring System for Social Skills Training
Authors:
Michael Guevarra,
Indronil Bhattacharjee,
Srijita Das,
Christabel Wayllace,
Carrie Demmans Epp,
Matthew E. Taylor,
Alan Tay
Abstract:
Social skills training targets behaviors necessary for success in social interactions. However, traditional classroom training for such skills is often insufficient to teach effective communication -- one-to-one interaction in real-world scenarios is preferred to lecture-style information delivery. This paper introduces a framework that allows instructors to collaborate with large language models to dynamically design realistic scenarios for students to communicate. Our framework uses these scenarios to enable student rehearsal, provide immediate feedback, and visualize performance for both students and instructors. Unlike traditional intelligent tutoring systems, instructors can easily co-create scenarios with a large language model without technical skills. Additionally, the system generates new scenario branches in real time when existing options do not fit the student's response.
Submitted 16 January, 2025;
originally announced January 2025.
-
Evaluating the diversity of scientific discourse on twenty-one multilingual Wikipedias using citation analysis
Authors:
Michael Taylor,
Roisi Proven,
Carlos Areia
Abstract:
INTRODUCTION: Wikipedia is a major source of information, particularly for medical and health content, citing over 4 million scholarly publications. However, the representation of research-based knowledge across different languages on Wikipedia has been underexplored. This study analyses the largest database of Wikipedia citations collected to date, examining the uniqueness of content and research representation across languages. METHOD: The study included nearly 3.5 million unique research articles and their Wikipedia mentions from 21 languages. These were categorized into three groups: Group A (publications uniquely cited by a single non-English Wikipedia), Group B (co-cited by English and non-English Wikipedias), and Group C (co-cited by multiple non-English Wikipedias). Descriptive and comparative statistics were conducted by Wikipedia language, group, and discipline. RESULTS: Significant differences were found between twenty non-English languages and English Wikipedia (p<0.001). While English Wikipedia is the largest, non-English Wikipedias cite an additional 1.5 million publications. CONCLUSION: English Wikipedia should not be seen as a comprehensive body of information. Non-English Wikipedias cover unique subjects and disciplines, offering a more complete representation of research collectively. The uniqueness of voice in non-English Wikipedias correlates with their size, though other factors may also influence these differences.
Submitted 16 January, 2025;
originally announced January 2025.
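The three-way grouping described in the abstract above (Groups A, B, and C) can be illustrated with a minimal sketch. This is not the authors' code: the function name `citation_group`, the use of language codes such as "en", and the choice to return `None` for publications cited only by English Wikipedia are all assumptions made for illustration.

```python
from typing import Iterable, Optional


def citation_group(languages: Iterable[str]) -> Optional[str]:
    """Assign a publication to Group A, B, or C.

    `languages` holds the (hypothetical) language codes of the Wikipedia
    editions that cite the publication, with "en" standing for English.
    """
    langs = set(languages)
    non_english = langs - {"en"}
    if not non_english:
        # Assumption: publications cited only by English Wikipedia
        # (or by no edition) fall outside the three groups.
        return None
    if "en" in langs:
        return "B"  # co-cited by English and non-English Wikipedias
    if len(non_english) == 1:
        return "A"  # cited by a single non-English Wikipedia only
    return "C"      # co-cited by multiple non-English Wikipedias
```

For example, a publication cited only by the German edition would land in Group A, while one cited by the French and German editions but not the English one would land in Group C.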
-
Derandomized shallow shadows: Efficient Pauli learning with bounded-depth circuits
Authors:
Katherine Van Kirk,
Christian Kokail,
Jonathan Kunjummen,
Hong-Ye Hu,
Yanting Teng,
Madelyn Cain,
Jacob Taylor,
Susanne F. Yelin,
Hannes Pichler,
Mikhail Lukin
Abstract:
Efficiently estimating large numbers of non-commuting observables is an important subroutine of many quantum science tasks. We present the derandomized shallow shadows (DSS) algorithm for efficiently learning a large set of non-commuting observables, using shallow circuits to rotate into measurement bases. Exploiting tensor network techniques to ensure polynomial scaling of classical resources, our algorithm outputs a set of shallow measurement circuits that approximately minimizes the sample complexity of estimating a given set of Pauli strings. We numerically demonstrate systematic improvement, in comparison with state-of-the-art techniques, for energy estimation of quantum chemistry benchmarks and verification of quantum many-body systems, and we observe DSS's performance consistently improves as one allows deeper measurement circuits. These results indicate that in addition to being an efficient, low-depth, stand-alone algorithm, DSS can also benefit many larger quantum algorithms requiring estimation of multiple non-commuting observables.
Submitted 25 December, 2024;
originally announced December 2024.
-
Towards Provable Security in Industrial Control Systems Via Dynamic Protocol Attestation
Authors:
Arthur Amorim,
Trevor Kann,
Max Taylor,
Lance Joneckis
Abstract:
Industrial control systems (ICSs) increasingly rely on digital technologies vulnerable to cyber attacks. Cyber attackers can infiltrate ICSs and execute malicious actions. Individually, each action seems innocuous. But taken together, they cause the system to enter an unsafe state. These attacks have resulted in dramatic consequences such as physical damage, economic loss, and environmental catastrophes. This paper introduces a methodology that restricts actions using protocols. These protocols only allow safe actions to execute. Protocols are written in a domain specific language we have embedded in an interactive theorem prover (ITP). The ITP enables formal, machine-checked proofs to ensure protocols maintain safety properties. We use dynamic attestation to ensure ICSs conform to their protocol even if an adversary compromises a component. Since protocol conformance prevents unsafe actions, the previously mentioned cyber attacks become impossible. We demonstrate the effectiveness of our methodology using an example from the Fischertechnik Industry 4.0 platform. We measure dynamic attestation's impact on latency and throughput. Our approach is a starting point for studying how to combine formal methods and protocol design to thwart attacks intended to cripple ICSs.
Submitted 18 December, 2024;
originally announced December 2024.
-
CkIO: Parallel File Input for Over-Decomposed Task-Based Systems
Authors:
Mathew Jacob,
Maya Taylor,
Laxmikant Kale
Abstract:
Parallel input performance issues are often neglected in large scale parallel applications in Computational Science and Engineering. Traditionally, there has been less focus on input performance because either input sizes are small (as in biomolecular simulations) or the time doing input is insignificant compared with the simulation with many timesteps. But newer applications, such as graph algorithms, add a premium to file input performance. Additionally, over-decomposed systems, such as Charm++/AMPI, present new challenges in this context in comparison to MPI applications. In the over-decomposition model, naive parallel I/O in which every task makes its own I/O request is impractical. Furthermore, load balancing supported by models such as Charm++/AMPI precludes assumption of data contiguity on individual nodes. We develop a new I/O abstraction to address these issues by separating the decomposition of consumers of input data from that of file-reader tasks that interact with the file system. This enables applications to scale the number of consumers of data without impacting I/O behavior or performance. These ideas are implemented in a new input library, CkIO, that is built on Charm++, which is a well-known task-based and overdecomposed-partitions system. CkIO is configurable via multiple parameters (such as the number of file readers and/or their placement) that can be tuned depending on characteristics of the application, such as file size and number of application objects. Additionally, CkIO input allows for capabilities such as effective overlap of input and application-level computation, as well as load balancing and migration. We describe the relevant challenges in understanding file system behavior and architecture, the design alternatives being explored, and preliminary performance data.
Submitted 27 November, 2024; v1 submitted 27 November, 2024;
originally announced November 2024.
-
Maximum Solar Energy Tracking Leverage High-DoF Robotics System with Deep Reinforcement Learning
Authors:
Anjie Jiang,
Kangtong Mo,
Satoshi Fujimoto,
Michael Taylor,
Sanjay Kumar,
Chiotis Dimitrios,
Emilia Ruiz
Abstract:
Solar trajectory monitoring is a pivotal challenge in solar energy systems, underpinning applications such as autonomous energy harvesting and environmental sensing. A prevalent failure mode in sustained solar tracking arises when the predictive algorithm erroneously diverges from the solar locus, erroneously anchoring to extraneous celestial or terrestrial features. This phenomenon is attributable to an inadequate assimilation of solar-specific objectness attributes within the tracking paradigm. To mitigate this deficiency inherent in extant methodologies, we introduce an innovative objectness regularization framework that compels tracking points to remain confined within the delineated boundaries of the solar entity. By encapsulating solar objectness indicators during the training phase, our approach obviates the necessity for explicit solar mask computation during operational deployment. Furthermore, we leverage the high-DoF robot arm to integrate our method to improve its robustness and flexibility in different outdoor environments.
Submitted 21 November, 2024;
originally announced November 2024.
-
Investigating the Benefits of Nonlinear Action Maps in Data-Driven Teleoperation
Authors:
Michael Przystupa,
Gauthier Gidel,
Matthew E. Taylor,
Martin Jagersand,
Justus Piater,
Samuele Tosatto
Abstract:
As robots become more common for both able-bodied individuals and those living with a disability, it is increasingly important that lay people be able to drive multi-degree-of-freedom platforms with low-dimensional controllers. One approach is to use state-conditioned action mapping methods to learn mappings between low-dimensional controllers and high DOF manipulators -- prior research suggests these mappings can simplify the teleoperation experience for users. Recent works suggest that neural networks predicting a local linear function are superior to the typical end-to-end multi-layer perceptrons because they allow users to more easily undo actions, providing more control over the system. However, local linear models assume actions exist on a linear subspace and may not capture nuanced actions in training data. We observe that the benefit of these mappings is being an odd function concerning user actions, and propose end-to-end nonlinear action maps which achieve this property. Unfortunately, our experiments show that such modifications offer minimal advantages over previous solutions. We find that nonlinear odd functions behave linearly for most of the control space, suggesting architecture structure improvements are not the primary factor in data-driven teleoperation. Our results suggest other avenues, such as data augmentation techniques and analysis of human behavior, are necessary for action maps to become practical in real-world applications, such as in assistive robotics to improve the quality of life of people living with a disability.
Submitted 28 October, 2024;
originally announced October 2024.
-
CANDERE-COACH: Reinforcement Learning from Noisy Feedback
Authors:
Yuxuan Li,
Srijita Das,
Matthew E. Taylor
Abstract:
In recent times, Reinforcement learning (RL) has been widely applied to many challenging tasks. However, in order to perform well, it requires access to a good reward function which is often sparse or manually engineered with scope for error. Introducing human prior knowledge is often seen as a possible solution to the above-mentioned problem, such as imitation learning, learning from preference, and inverse reinforcement learning. Learning from feedback is another framework that enables an RL agent to learn from binary evaluative signals describing the teacher's (positive or negative) evaluation of the agent's action. However, these methods often make the assumption that evaluative teacher feedback is perfect, which is a restrictive assumption. In practice, such feedback can be noisy due to limited teacher expertise or other exacerbating factors like cognitive load, availability, distraction, etc. In this work, we propose the CANDERE-COACH algorithm, which is capable of learning from noisy feedback by a nonoptimal teacher. We propose a noise-filtering mechanism to de-noise online feedback data, thereby enabling the RL agent to successfully learn with up to 40% of the teacher feedback being incorrect. Experiments on three common domains demonstrate the effectiveness of the proposed approach.
Submitted 23 September, 2024;
originally announced September 2024.
-
Research Citations Building Trust in Wikipedia
Authors:
Michael Taylor,
Carlos Areia,
Kath Burton,
Charles Watkinson
Abstract:
The use of Wikipedia citations in scholarly research has been the topic of much inquiry over the past decade. A cross-publisher study (Taylor & Francis and University of Michigan Press) convened by Digital Science was established in late 2022 to explore author sentiment towards Wikipedia as a trusted source of information. A short survey was designed to poll published authors about views and uses of Wikipedia and explore how the increased addition of research citations in Wikipedia might help combat misinformation in the context of increasing public engagement with and access to validated research sources. With 21,854 surveys sent, targeting 40,402 papers mentioned in Wikipedia, a total of 750 complete surveys from 60 countries were included in this analysis. In general, responses revealed a positive sentiment towards research citation in Wikipedia and the researcher engagement practices. However, our sub-analysis revealed statistically significant differences when comparing articles vs books and across disciplines, but not open vs closed access. This study will open the door to further research and deepen our understanding of authors' perceived trustworthiness of the representation of their research in Wikipedia.
Submitted 18 September, 2024;
originally announced September 2024.
-
Multi-Task Multi-Fidelity Learning of Properties for Energetic Materials
Authors:
Robert J. Appleton,
Daniel Klinger,
Brian H. Lee,
Michael Taylor,
Sohee Kim,
Samuel Blankenship,
Brian C. Barnes,
Steven F. Son,
Alejandro Strachan
Abstract:
Data science and artificial intelligence are playing an increasingly important role in the physical sciences. Unfortunately, in the field of energetic materials data scarcity limits the accuracy and even applicability of ML tools. To address data limitations, we compiled multi-modal data: both experimental and computational results for several properties. We find that multi-task neural networks can learn from multi-modal data and outperform single-task models trained for specific properties. As expected, the improvement is more significant for data-scarce properties. These models are trained using descriptors built from simple molecular information and can be readily applied for large-scale materials screening to explore multiple properties simultaneously. This approach is widely applicable to fields outside energetic materials.
Submitted 21 August, 2024;
originally announced August 2024.
-
Mitigating Metropolitan Carbon Emissions with Dynamic Eco-driving at Scale
Authors:
Vindula Jayawardana,
Baptiste Freydt,
Ao Qu,
Cameron Hickert,
Edgar Sanchez,
Catherine Tang,
Mark Taylor,
Blaine Leonard,
Cathy Wu
Abstract:
The sheer scale and diversity of transportation make it a formidable sector to decarbonize. Here, we consider an emerging opportunity to reduce carbon emissions: the growing adoption of semi-autonomous vehicles, which can be programmed to mitigate stop-and-go traffic through intelligent speed commands and, thus, reduce emissions. But would such dynamic eco-driving move the needle on climate change? A comprehensive impact analysis has been out of reach due to the vast array of traffic scenarios and the complexity of vehicle emissions. We address this challenge with large-scale scenario modeling efforts and by using multi-task deep reinforcement learning with a carefully designed network decomposition strategy. We perform an in-depth prospective impact assessment of dynamic eco-driving at 6,011 signalized intersections across three major US metropolitan cities, simulating a million traffic scenarios. Overall, we find that vehicle trajectories optimized for emissions can cut city-wide intersection carbon emissions by 11-22%, without harming throughput or safety, and with reasonable assumptions, equivalent to the national emissions of Israel and Nigeria, respectively. We find that 10% eco-driving adoption yields 25%-50% of the total reduction, and nearly 70% of the benefits come from 20% of intersections, suggesting near-term implementation pathways. However, the composition of this high-impact subset of intersections varies considerably across different adoption levels, with minimal overlap, calling for careful strategic planning for eco-driving deployments. Moreover, the impact of eco-driving, when considered jointly with projections of vehicle electrification and hybrid vehicle adoption remains significant. More broadly, this work paves the way for large-scale analysis of traffic externalities, such as time, safety, and air quality, and the potential impact of solution strategies.
Submitted 27 June, 2025; v1 submitted 10 August, 2024;
originally announced August 2024.
-
ODGR: Online Dynamic Goal Recognition
Authors:
Matan Shamir,
Osher Elhadad,
Matthew E. Taylor,
Reuth Mirsky
Abstract:
Traditionally, Reinforcement Learning (RL) problems are aimed at optimization of the behavior of an agent. This paper proposes a novel take on RL, which is used to learn the policy of another agent, to allow real-time recognition of that agent's goals. Goal Recognition (GR) has traditionally been framed as a planning problem where one must recognize an agent's objectives based on its observed actions. Recent approaches have shown how reinforcement learning can be used as part of the GR pipeline, but are limited to recognizing predefined goals and lack scalability in domains with a large goal space. This paper formulates a novel problem, "Online Dynamic Goal Recognition" (ODGR), as a first step to address these limitations. Contributions include introducing the concept of dynamic goals into the standard GR problem definition, revisiting common approaches by reformulating them using ODGR, and demonstrating the feasibility of solving ODGR in a navigation domain using transfer learning. These novel formulations open the door for future extensions of existing transfer learning-based GR methods, which will be robust to changing and expansive real-time environments.
Submitted 23 July, 2024;
originally announced July 2024.
-
Video Occupancy Models
Authors:
Manan Tomar,
Philippe Hansen-Estruch,
Philip Bachman,
Alex Lamb,
John Langford,
Matthew E. Taylor,
Sergey Levine
Abstract:
We introduce a new family of video prediction models designed to support downstream control tasks. We call these models Video Occupancy models (VOCs). VOCs operate in a compact latent space, thus avoiding the need to make predictions about individual pixels. Unlike prior latent-space world models, VOCs directly predict the discounted distribution of future states in a single step, thus avoiding the need for multistep roll-outs. We show that both properties are beneficial when building predictive models of video for use in downstream control. Code is available at https://github.com/manantomar/video-occupancy-models.
Submitted 25 June, 2024;
originally announced July 2024.
-
A Novel Framework for Automated Warehouse Layout Generation
Authors:
Atefeh Shahroudnejad,
Payam Mousavi,
Oleksii Perepelytsia,
Sahir,
David Staszak,
Matthew E. Taylor,
Brent Bawel
Abstract:
Optimizing warehouse layouts is crucial due to its significant impact on efficiency and productivity. We present an AI-driven framework for automated warehouse layout generation. This framework employs constrained beam search to derive optimal layouts within given spatial parameters, adhering to all functional requirements. The feasibility of the generated layouts is verified based on criteria such as item accessibility, required minimum clearances, and aisle connectivity. A scoring function is then used to evaluate the feasible layouts considering the number of storage locations, access points, and accessibility costs. We demonstrate our method's ability to produce feasible, optimal layouts for a variety of warehouse dimensions and shapes, diverse door placements, and interconnections. This approach, currently being prepared for deployment, will enable human designers to rapidly explore and confirm options, facilitating the selection of the most appropriate layout for their use-case.
Submitted 12 July, 2024; v1 submitted 11 July, 2024;
originally announced July 2024.
-
Assessing AI vs Human-Authored Spear Phishing SMS Attacks: An Empirical Study
Authors:
Jerson Francia,
Derek Hansen,
Ben Schooley,
Matthew Taylor,
Shydra Murray,
Greg Snow
Abstract:
This paper explores the use of Large Language Models (LLMs) in spear phishing message generation and evaluates their performance compared to human-authored counterparts. Our pilot study examines the effectiveness of smishing (SMS phishing) messages created by GPT-4 and human authors, which have been personalized for willing targets. The targets assessed these messages in a modified ranked-order experiment using a novel methodology we call TRAPD (Threshold Ranking Approach for Personalized Deception). Experiments involved ranking each spear phishing message from most to least convincing, providing qualitative feedback, and guessing which messages were human- or AI-generated. Results show that LLM-generated messages are often perceived as more convincing than those authored by humans, particularly job-related messages. Targets also struggled to distinguish between human- and AI-generated messages. We analyze different criteria the targets used to assess the persuasiveness and source of messages. This study aims to highlight the urgent need for further research and improved countermeasures against personalized AI-enabled social engineering attacks.
Submitted 18 March, 2025; v1 submitted 18 June, 2024;
originally announced June 2024.
-
Evaluating Open Access Advantages for Citations and Altmetrics (2011-21): A Dynamic and Evolving Relationship
Authors:
Michael Taylor
Abstract:
Differences between the impacts of Open Access (OA) and non-OA research have been observed over a wide range of citation and altmetric indicators, usually finding an Open Access Advantage (OAA) within specific fields. However, science-wide analyses covering multiple years, indicators and disciplines are lacking. Using citation counts and six altmetrics for 38.7M articles published 2011-21, we compare OA and non-OA papers. The results show that there is no universal OAA across all disciplines or impact indicators: the OAA for citations tends to be lower for more recent papers, whereas the OAAs for news, blogs and Twitter are consistent across years and unrelated to volume of OA publications, whereas the OAAs for Wikipedia, patents and policy citations are more complex. These results support different hypotheses for different subjects and indicators. The evidence is consistent with OA accelerating research impact in the Medical & Health Sciences, Life Sciences and the Humanities; that increased visibility or discoverability is a factor in promoting the translation of research into socio-economic impact; and that OA is a factor in growing online engagement with research in some disciplines. OAAs are therefore complex, dynamic, multi-factorial and require considerable analysis to understand.
Submitted 15 June, 2024;
originally announced June 2024.
-
Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity
Authors:
Calarina Muslimani,
Bram Grooten,
Deepak Ranganatha Sastry Mamillapalli,
Mykola Pechenizkiy,
Decebal Constantin Mocanu,
Matthew E. Taylor
Abstract:
To integrate into human-centered environments, autonomous agents must learn from and adapt to humans in their native settings. Preference-based reinforcement learning (PbRL) can enable this by learning reward functions from human preferences. However, humans live in a world full of diverse information, most of which is irrelevant to completing any particular task. It then becomes essential that agents learn to focus on the subset of task-relevant state features. To that end, this work proposes R2N (Robust-to-Noise), the first PbRL algorithm that leverages principles of dynamic sparse training to learn robust reward models that can focus on task-relevant features. In experiments with a simulated teacher, we demonstrate that R2N can adapt the sparse connectivity of its neural networks to focus on task-relevant features, enabling R2N to significantly outperform several sparse training and PbRL algorithms across simulated robotic environments.
Submitted 3 July, 2025; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Neural Isometries: Taming Transformations for Equivariant ML
Authors:
Thomas W. Mitchel,
Michael Taylor,
Vincent Sitzmann
Abstract:
Real-world geometry and 3D vision tasks are replete with challenging symmetries that defy tractable analytical expression. In this paper, we introduce Neural Isometries, an autoencoder framework which learns to map the observation space to a general-purpose latent space wherein encodings are related by isometries whenever their corresponding observations are geometrically related in world space. Specifically, we regularize the latent space such that maps between encodings preserve a learned inner product and commute with a learned functional operator, in the same manner as rigid-body transformations commute with the Laplacian. This approach forms an effective backbone for self-supervised representation learning, and we demonstrate that a simple off-the-shelf equivariant network operating in the pre-trained latent space can achieve results on par with meticulously-engineered, handcrafted networks designed to handle complex, nonlinear symmetries. Furthermore, isometric maps capture information about the respective transformations in world space, and we show that this allows us to regress camera poses directly from the coefficients of the maps between encodings of adjacent views of a scene.
Submitted 29 October, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning
Authors:
Calarina Muslimani,
Matthew E. Taylor
Abstract:
To create useful reinforcement learning (RL) agents, step zero is to design a suitable reward function that captures the nuances of the task. However, reward engineering can be a difficult and time-consuming process. Instead, human-in-the-loop RL methods hold the promise of learning reward functions from human feedback. Despite recent successes, many of the human-in-the-loop RL methods still require numerous human interactions to learn successful reward functions. To improve the feedback efficiency of human-in-the-loop RL methods (i.e., require less human interaction), this paper introduces Sub-optimal Data Pre-training, SDP, an approach that leverages reward-free, sub-optimal data to improve scalar- and preference-based RL algorithms. In SDP, we start by pseudo-labeling all low-quality data with the minimum environment reward. Through this process, we obtain reward labels to pre-train our reward model without requiring human labeling or preferences. This pre-training phase provides the reward model a head start in learning, enabling it to recognize that low-quality transitions should be assigned low rewards. Through extensive experiments with both simulated and human teachers, we find that SDP can at least meet, but often significantly improve, state-of-the-art human-in-the-loop RL performance across a variety of simulated robotic tasks.
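The pseudo-labeling step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the linear reward model, the squared-error loss, the random feature vectors, and the names `pseudo_label`, `pretrain_reward_model`, and `R_MIN` are all assumptions made for illustration.

```python
import random

R_MIN = -1.0  # assumed minimum environment reward (hypothetical value)

def pseudo_label(transitions, r_min):
    """Pair every sub-optimal transition with the minimum environment reward."""
    return [(features, r_min) for features in transitions]

def pretrain_reward_model(labeled, lr=0.05, epochs=300):
    """Fit a linear reward model r(x) = w.x + b with per-sample gradient steps
    on squared error. The linear model is a stand-in for a neural reward model."""
    dim = len(labeled[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, target in labeled:
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - target
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Evaluate the linear reward model on one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical reward-free, sub-optimal data: 50 transitions, 4 features each.
random.seed(0)
transitions = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(50)]

labeled = pseudo_label(transitions, R_MIN)
w, b = pretrain_reward_model(labeled)
predictions = [predict(w, b, x) for x in transitions]
```

After this pre-training pass the model assigns roughly the minimum reward to the low-quality transitions, which is the head start the abstract describes; learning from human feedback would then continue from these weights.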
Submitted 7 April, 2025; v1 submitted 30 April, 2024;
originally announced May 2024.
-
Decentralized Coordination of Distributed Energy Resources through Local Energy Markets and Deep Reinforcement Learning
Authors:
Daniel May,
Matthew Taylor,
Petr Musilek
Abstract:
As distributed energy resources (DERs) grow, the electricity grid faces increased net load variability at the grid edge, impacting operability and reliability. Transactive energy, facilitated through local energy markets, offers a decentralized, indirect demand response solution, with model-free control techniques, such as deep reinforcement learning (DRL), enabling automated, decentralized participation. However, existing studies largely overlook community-level net load variability, focusing instead on socioeconomic metrics.
This study addresses this gap by using DRL agents to automate end-user participation in a local energy market (ALEX), where agents act independently to minimize individual energy bills. Results reveal a strong link between bill reduction and decreased net load variability, assessed across metrics such as ramping rate, load factor, and peak demand over various time horizons. Using a no-control baseline, DRL agents are benchmarked against a near-optimal dynamic programming approach. The dynamic programming benchmark achieves reductions of 22.05 percent, 83.92 percent, and 24.09 percent in daily import, export, and peak demand, respectively, while the DRL agents show comparable or superior results with reductions of 21.93 percent, 84.46 percent, and 27.02 percent.
This study demonstrates the effectiveness of DRL in decentralized grid management, highlighting its scalability and near-optimal performance in reducing net load variability within community-driven energy markets.
Submitted 14 November, 2024; v1 submitted 19 April, 2024;
originally announced April 2024.
-
FPGA Divide-and-Conquer Placement using Deep Reinforcement Learning
Authors:
Shang Wang,
Deepak Ranganatha Sastry Mamillapalli,
Tianpei Yang,
Matthew E. Taylor
Abstract:
This paper introduces the problem of learning to place logic blocks in Field-Programmable Gate Arrays (FPGAs) and a learning-based method. In contrast to previous search-based placement algorithms, we instead employ Reinforcement Learning (RL) with the goal of minimizing wirelength. In addition to our preliminary learning results, we also evaluate a novel decomposition to address the large search space that arises when placing many blocks on a chipboard. Empirical experiments evaluate the effectiveness of the learning and decomposition paradigms on FPGA placement tasks.
Submitted 11 April, 2024;
originally announced April 2024.
-
Monitored Markov Decision Processes
Authors:
Simone Parisi,
Montaser Mohammedalamen,
Alireza Kazemipour,
Matthew E. Taylor,
Michael Bowling
Abstract:
In reinforcement learning (RL), an agent learns to perform a task by interacting with an environment and receiving feedback (a numerical reward) for its actions. However, the assumption that rewards are always observable is often not applicable in real-world problems. For example, the agent may need to ask a human to supervise its actions or activate a monitoring system to receive feedback. There may even be a period of time before rewards become observable, or a period of time after which rewards are no longer given. In other words, there are cases where the environment generates rewards in response to the agent's actions but the agent cannot observe them. In this paper, we formalize a novel but general RL framework - Monitored MDPs - where the agent cannot always observe rewards. We discuss the theoretical and practical consequences of this setting, show challenges raised even in toy environments, and propose algorithms to begin to tackle this novel setting. This paper introduces a powerful new formalism that encompasses both new and existing problems and lays the foundation for future research.
Submitted 13 February, 2024; v1 submitted 9 February, 2024;
originally announced February 2024.
-
GLIDE-RL: Grounded Language Instruction through DEmonstration in RL
Authors:
Chaitanya Kharyal,
Sai Krishna Gottipati,
Tanmay Kumar Sinha,
Srijita Das,
Matthew E. Taylor
Abstract:
One of the final frontiers in the development of complex human-AI collaborative systems is the ability of AI agents to comprehend natural language and perform tasks accordingly. However, training efficient Reinforcement Learning (RL) agents grounded in natural language has been a long-standing challenge due to the complexity and ambiguity of the language and the sparsity of the rewards, among other factors. Several advances in reinforcement learning, curriculum learning, continual learning, and language models have independently contributed to effective training of grounded agents in various environments. Leveraging these developments, we present a novel algorithm, Grounded Language Instruction through DEmonstration in RL (GLIDE-RL), that introduces a teacher-instructor-student curriculum learning framework for training an RL agent that can follow natural language instructions and generalize to previously unseen language instructions. In this multi-agent framework, the teacher and the student agents learn simultaneously based on the student's current skill level. We further demonstrate the necessity for training the student agent with not just one, but multiple teacher agents. Experiments on a complex sparse reward environment validate the effectiveness of our proposed approach.
Submitted 3 January, 2024;
originally announced January 2024.
-
LaFFi: Leveraging Hybrid Natural Language Feedback for Fine-tuning Language Models
Authors:
Qianxi Li,
Yingyue Cao,
Jikun Kang,
Tianpei Yang,
Xi Chen,
Jun Jin,
Matthew E. Taylor
Abstract:
Fine-tuning Large Language Models (LLMs) adapts a trained model to specific downstream tasks, significantly improving task-specific performance. Supervised Fine-Tuning (SFT) is a common approach, where an LLM is trained to produce desired answers. However, LLMs trained with SFT sometimes make simple mistakes and hallucinate on reasoning tasks such as question-answering. Without external feedback, it is difficult for SFT to learn a good mapping between the question and the desired answer, especially with a small dataset. This paper introduces an alternative to SFT called Natural Language Feedback for Finetuning LLMs (LaFFi). LaFFi has LLMs directly predict the feedback they will receive from an annotator. We find that requiring such reflection can significantly improve the accuracy on in-domain question-answering tasks, providing a promising direction for the application of natural language feedback in the realm of SFT LLMs. Additional ablation studies show that the portion of human-annotated data in the annotated datasets affects the fine-tuning performance.
Submitted 31 December, 2023;
originally announced January 2024.
-
MaDi: Learning to Mask Distractions for Generalization in Visual Deep Reinforcement Learning
Authors:
Bram Grooten,
Tristan Tomilin,
Gautham Vasan,
Matthew E. Taylor,
A. Rupam Mahmood,
Meng Fang,
Mykola Pechenizkiy,
Decebal Constantin Mocanu
Abstract:
The visual world provides an abundance of information, but many input pixels received by agents contain distracting stimuli. Autonomous agents need the ability to distinguish useful information from task-irrelevant perceptions, enabling them to generalize to unseen environments with new distractions. Existing works approach this problem using data augmentation or large auxiliary networks with additional loss functions. We introduce MaDi, a novel algorithm that learns to mask distractions by the reward signal only. In MaDi, the conventional actor-critic structure of deep reinforcement learning agents is complemented by a small third sibling, the Masker. This lightweight neural network generates a mask to determine what the actor and critic will receive, such that they can focus on learning the task. The masks are created dynamically, depending on the current input. We run experiments on the DeepMind Control Generalization Benchmark, the Distracting Control Suite, and a real UR5 Robotic Arm. Our algorithm improves the agent's focus with useful masks, while its efficient Masker network only adds 0.2% more parameters to the original structure, in contrast to previous work. MaDi consistently achieves generalization results better than or competitive with state-of-the-art methods.
Submitted 23 December, 2023;
originally announced December 2023.
-
Data needs and challenges for quantum dot devices automation
Authors:
Justyna P. Zwolak,
Jacob M. Taylor,
Reed W. Andrews,
Jared Benson,
Garnett W. Bryant,
Donovan Buterakos,
Anasua Chatterjee,
Sankar Das Sarma,
Mark A. Eriksson,
Eliška Greplová,
Michael J. Gullans,
Fabian Hader,
Tyler J. Kovach,
Pranav S. Mundada,
Mick Ramsey,
Torbjørn Rasmussen,
Brandon Severin,
Anthony Sigillito,
Brennan Undseth,
Brian Weber
Abstract:
Gate-defined quantum dots are a promising candidate system for realizing scalable, coupled qubit systems and serving as a fundamental building block for quantum computers. However, present-day quantum dot devices suffer from imperfections that must be accounted for, which hinders the characterization, tuning, and operation process. Moreover, with an increasing number of quantum dot qubits, the relevant parameter space grows sufficiently to make heuristic control infeasible. Thus, it is imperative that reliable and scalable autonomous tuning approaches are developed. This meeting report outlines current challenges in automating quantum dot device tuning and operation with a particular focus on datasets, benchmarking, and standardization. We also present insights and ideas put forward by the quantum dot community on how to overcome them. We aim to provide guidance and inspiration to researchers invested in automation efforts.
Submitted 5 November, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
Curriculum Learning for Cooperation in Multi-Agent Reinforcement Learning
Authors:
Rupali Bhati,
Sai Krishna Gottipati,
Clodéric Mars,
Matthew E. Taylor
Abstract:
While there has been significant progress in curriculum learning and continuous learning for training agents to generalize across a wide variety of environments in the context of single-agent reinforcement learning, it is unclear if these algorithms would still be valid in a multi-agent setting. In a competitive setting, a learning agent can be trained by making it compete with a curriculum of increasingly skilled opponents. However, a general intelligent agent should also be able to learn to act around other agents and cooperate with them to achieve common goals. When cooperating with other agents, the learning agent must (a) learn how to perform the task (or subtask), and (b) increase the overall team reward. In this paper, we aim to answer the question of what kind of cooperative teammate, and what curriculum of teammates, a learning agent should be trained with to achieve these two objectives. Our results on the game Overcooked show that a pre-trained teammate who is less skilled is the best teammate for overall team reward but the worst for the learning of the agent. Moreover, somewhat surprisingly, a curriculum of teammates with decreasing skill levels performs better than other types of curricula.
Submitted 18 December, 2023;
originally announced December 2023.
-
Human-Machine Teaming for UAVs: An Experimentation Platform
Authors:
Laila El Moujtahid,
Sai Krishna Gottipati,
Clodéric Mars,
Matthew E. Taylor
Abstract:
Full automation is often not achievable or desirable in critical systems with high-stakes decisions. Instead, human-AI teams can achieve better results. To research, develop, evaluate, and validate algorithms suited for such teaming, lightweight experimentation platforms that enable interactions between humans and multiple AI agents are necessary. However, there are limited examples of such platforms for defense environments. To address this gap, we present the Cogment human-machine teaming experimentation platform, which implements human-machine teaming (HMT) use cases that feature heterogeneous multi-agent systems and can involve learning AI agents, static AI agents, and humans. It is built on the Cogment platform and has been used for academic research, including work presented at the ALA workshop at AAMAS this year [1]. With this platform, we hope to facilitate further research on human-machine teaming in critical systems and defense environments.
Submitted 18 December, 2023;
originally announced December 2023.
-
Simpson's Paradox and Lagging Progress in Completion Trends of Underrepresented Students in Computer Science
Authors:
John Mason Taylor,
Rebecca Drucker,
Chris Alvin,
Syed Fahad Sultan
Abstract:
It is imperative for the Computer Science (CS) community to ensure active participation and success of students from diverse backgrounds. This work compares CS to other areas of study with respect to success of students from three underrepresented groups: Women, Black and Hispanic or Latino. Using a data-driven approach, we show that trends of success over the years for underrepresented groups in CS are lagging behind other disciplines. Completion of CS programs by Black students in particular shows an alarming regression in the years 2011 through 2019. This national level decline is most concentrated in the Southeast of the United States and seems to be driven mostly by a small number of institutes that produce a large number of graduates. We strongly believe that more data-driven studies in this area are necessary to make progress towards a more equitable and inclusive CS community. Without an understanding of underlying dynamics, policy makers and practitioners will be unable to make informed decisions about how and where to allocate resources to address the problem.
Submitted 24 November, 2023;
originally announced November 2023.
-
A Call to Arms: AI Should be Critical for Social Media Analysis of Conflict Zones
Authors:
Afia Abedin,
Abdul Bais,
Cody Buntain,
Laura Courchesne,
Brian McQuinn,
Matthew E. Taylor,
Muhib Ullah
Abstract:
The massive proliferation of social media data represents a transformative opportunity for conflict studies and for tracking the proliferation and use of weaponry, as conflicts are increasingly documented in these online spaces. At the same time, the scale and types of data available are problematic for traditional open-source intelligence. This paper focuses on identifying specific weapon systems and the insignias of the armed groups using them as documented in the Ukraine war, as these tasks are critical to operational intelligence and tracking weapon proliferation, especially given the scale of international military aid given to Ukraine. The large scale of social media makes manual assessment difficult, however, so this paper presents early work that uses computer vision models to support this task. We demonstrate both that these models can identify weapons embedded in images shared on social media and how the resulting collection of military-relevant images and their post times interacts with the offline, real-world conflict. Not only can we then track changes in the prevalence of images of tanks, land mines, military trucks, etc., but we also find correlations among time series data associated with these images and the daily fatalities in this conflict. This work shows substantial opportunity for examining similar online documentation of conflict contexts, and we also point to future avenues where computer vision can be further improved for these open-source intelligence tasks.
Submitted 14 May, 2025; v1 submitted 1 November, 2023;
originally announced November 2023.
-
Cocoon: Static Information Flow Control in Rust
Authors:
Ada Lamba,
Max Taylor,
Vincent Beardsley,
Jacob Bambeck,
Michael D. Bond,
Zhiqiang Lin
Abstract:
Information flow control (IFC) provides confidentiality by enforcing noninterference, which ensures that high-secrecy values cannot affect low-secrecy values. Prior work introduces fine-grained IFC approaches that modify the programming language and use nonstandard compilation tools, impose run-time overhead, or report false secrecy leaks -- all of which hinder adoption.
This paper presents Cocoon, a Rust library for static type-based IFC that uses the unmodified Rust language and compiler. The key insight of Cocoon lies in leveraging Rust's type system and procedural macros to establish an effect system that enforces noninterference. A performance evaluation shows that using Cocoon increases compile time but has no impact on application performance. To demonstrate Cocoon's utility, we retrofitted two popular Rust programs, the Spotify TUI client and Mozilla's Servo browser engine, to use Cocoon to enforce limited confidentiality policies.
Submitted 18 March, 2024; v1 submitted 31 October, 2023;
originally announced November 2023.
-
Can You Improve My Code? Optimizing Programs with Local Search
Authors:
Fatemeh Abdollahi,
Saqib Ameen,
Matthew E. Taylor,
Levi H. S. Lelis
Abstract:
This paper introduces a local search method for improving an existing program with respect to a measurable objective. Program Optimization with Locally Improving Search (POLIS) exploits the structure of a program, defined by its lines. POLIS improves a single line of the program while keeping the remaining lines fixed, using existing brute-force synthesis algorithms, and continues iterating until it is unable to improve the program's performance. POLIS was evaluated with a 27-person user study, where participants wrote programs attempting to maximize the score of two single-agent games: Lunar Lander and Highway. POLIS was able to substantially improve the participants' programs with respect to the game scores. A proof-of-concept demonstration on existing Stack Overflow code measures applicability in real-world problems. These results suggest that POLIS could be used as a helpful programming assistant for programming problems with measurable objectives.
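The line-by-line improvement loop described in this abstract can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the POLIS implementation: the three-instruction language in `OPS`, the `run` objective, and the function name `local_line_search` are hypothetical, and exhaustive enumeration over a fixed candidate list stands in for the brute-force synthesis step.

```python
from typing import Callable, List, Sequence

# Hypothetical toy language: each program line is one of these operations.
OPS = {
    "inc": lambda x: x + 1,
    "dbl": lambda x: x * 2,
    "dec": lambda x: x - 1,
}

def run(program: List[str]) -> int:
    """Measurable objective: the final value after running the program from x = 1."""
    x = 1
    for line in program:
        x = OPS[line](x)
    return x

def local_line_search(program: List[str],
                      candidates: Sequence[str],
                      score: Callable[[List[str]], float]) -> List[str]:
    """Improve one line at a time while the remaining lines stay fixed,
    and repeat full passes until a pass yields no improvement."""
    best = list(program)
    best_score = score(best)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            for cand in candidates:
                if cand == best[i]:
                    continue
                trial = best[:i] + [cand] + best[i + 1:]
                trial_score = score(trial)
                if trial_score > best_score:
                    best, best_score = trial, trial_score
                    improved = True
    return best

start = ["dec", "inc", "dec"]            # scores 0
result = local_line_search(start, list(OPS), run)
print(result, run(result))               # prints ['inc', 'dbl', 'dbl'] 8
```

Because only strictly better replacements are accepted, the loop terminates at a local optimum of the objective; nothing guarantees that it is the global one.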
Submitted 10 July, 2023;
originally announced July 2023.
-
Chiplet Cloud: Building AI Supercomputers for Serving Large Generative Language Models
Authors:
Huwan Peng,
Scott Davidson,
Richard Shi,
Shuaiwen Leon Song,
Michael Taylor
Abstract:
Large language models (LLMs) such as OpenAI's ChatGPT and Google's Gemini have demonstrated unprecedented capabilities of autoregressive AI models across multiple tasks, triggering disruptive technology innovations around the world. However, as models continue to grow, the cost to serve these models also continues to grow, threatening the democratization of LLMs.
To address this issue, we propose Chiplet Cloud, a chiplet-based ASIC LLM-supercomputer architecture whose goal is to optimize the total cost of ownership (TCO) per generated token. This architecture is a highly parameterizable ASIC and server-level architecture leveraging thousands of replicated accelerator modules collaborating to scale up the performance of LLMs at cloud scale. To determine specific parameterizations of the Chiplet Cloud architecture, we implemented a two-phase hardware-software co-design methodology that can search the massive design space and fine-tune the architecture across a collection of LLMs based on an accurate inference simulation. A common bottleneck for LLMs is memory access performance; therefore, we introduce CC-MEM, a scalable on-chip memory system for Chiplet Cloud architectures. Using the CC-MEM, Chiplet Clouds can be built using only SRAMs for design points where the power and performance of memory access are critical. The CC-MEM also includes a compression decoder module to add support for sparse models without impacting the compute units using a Store-as-Compressed, Load-as-Dense mechanism.
We evaluate Chiplet Cloud architectures across eight popular LLMs. Using fine-tuned Chiplet Cloud servers, we are able to achieve $97\times$ and $18\times$ improvement in TCO/Token over rented GPU and TPU clouds, or an $8.3\times$ and $3.7\times$ improvement over fabricated GPU and TPU clouds, respectively. Chiplet Cloud can also support $1.7\times$ larger models with a sparsity of 60\%.
Submitted 20 May, 2024; v1 submitted 5 July, 2023;
originally announced July 2023.