-
RWESummary: A Framework and Test for Choosing Large Language Models to Summarize Real-World Evidence (RWE) Studies
Authors:
Arjun Mukerji,
Michael L. Jackson,
Jason Jones,
Neil Sanghavi
Abstract:
Large Language Models (LLMs) have been extensively evaluated for general summarization tasks as well as medical research assistance, but they have not been specifically evaluated for the task of summarizing real-world evidence (RWE) from structured output of RWE studies. We introduce RWESummary, a proposed addition to the MedHELM framework (Bedi, Cui, Fuentes, Unell et al., 2025) to enable benchmarking of LLMs for this task. RWESummary includes one scenario and three evaluations covering major types of errors observed in summarization of medical research studies and was developed using Atropos Health proprietary data. Additionally, we use RWESummary to compare the performance of different LLMs in our internal RWE summarization tool. At the time of publication, with 13 distinct RWE studies, we found the Gemini 2.5 models performed best overall (both Flash and Pro). We suggest RWESummary as a novel and useful foundation model benchmark for real-world evidence study summarization.
Submitted 23 June, 2025;
originally announced June 2025.
-
Be.FM: Open Foundation Models for Human Behavior
Authors:
Yutong Xie,
Zhuoheng Li,
Xiyuan Wang,
Yijun Pan,
Qijia Liu,
Xingzhi Cui,
Kuang-Yu Lo,
Ruoyi Gao,
Xingjian Zhang,
Jin Huang,
Walter Yuan,
Matthew O. Jackson,
Qiaozhu Mei
Abstract:
Despite their success in numerous fields, the potential of foundation models for modeling and understanding human behavior remains largely unexplored. We introduce Be.FM, one of the first open foundation models designed for human behavior modeling. Built upon open-source large language models and fine-tuned on a diverse range of behavioral data, Be.FM can be used to understand and predict human decision-making. We construct a comprehensive set of benchmark tasks for testing the capabilities of behavioral foundation models. Our results demonstrate that Be.FM can predict behaviors, infer characteristics of individuals and populations, generate insights about contexts, and apply behavioral science knowledge.
Submitted 29 May, 2025;
originally announced May 2025.
-
An Optimisation Framework for Unsupervised Environment Design
Authors:
Nathan Monette,
Alistair Letcher,
Michael Beukman,
Matthew T. Jackson,
Alexander Rutherford,
Alexander D. Goldie,
Jakob N. Foerster
Abstract:
For reinforcement learning agents to be deployed in high-risk settings, they must achieve a high level of robustness to unfamiliar scenarios. One method for improving robustness is unsupervised environment design (UED), a suite of methods aiming to maximise an agent's generalisability across configurations of an environment. In this work, we study UED from an optimisation perspective, providing stronger theoretical guarantees for practical settings than prior work. Whereas previous methods relied on guarantees if they reach convergence, our framework employs a nonconvex-strongly-concave objective for which we provide a provably convergent algorithm in the zero-sum setting. We empirically verify the efficacy of our method, outperforming prior methods in a number of environments with varying difficulties.
Submitted 26 May, 2025;
originally announced May 2025.
-
Online learning to accelerate nonlinear PDE solvers: applied to multiphase porous media flow
Authors:
Vinicius L S Silva,
Pablo Salinas,
Claire E Heaney,
Matthew Jackson,
Christopher C Pain
Abstract:
We propose a novel type of nonlinear solver acceleration for systems of nonlinear partial differential equations (PDEs) that is based on online/adaptive learning. It is applied in the context of multiphase flow in porous media. The proposed method relies on four pillars: (i) dimensionless numbers as input parameters for the machine learning model, (ii) a simplified (two-dimensional) numerical model for the offline training, (iii) dynamic control of a nonlinear solver tuning parameter (numerical relaxation), and (iv) online learning for real-time improvement of the machine learning model. This strategy decreases the number of nonlinear iterations by dynamically modifying a single global parameter, the relaxation factor, and by adaptively learning the attributes of each numerical model on-the-run. Furthermore, this work performs a sensitivity study in the dimensionless parameters (machine learning features), assesses the efficacy of various machine learning models, demonstrates a decrease in nonlinear iterations using our method in more intricate, realistic three-dimensional models, and fully couples a machine learning model into an open-source multiphase flow simulator, achieving up to an 85% reduction in computational time.
Submitted 25 April, 2025;
originally announced April 2025.
-
A Clean Slate for Offline Reinforcement Learning
Authors:
Matthew Thomas Jackson,
Uljad Berdica,
Jarek Liesen,
Shimon Whiteson,
Jakob Nicolaus Foerster
Abstract:
Progress in offline reinforcement learning (RL) has been impeded by ambiguous problem definitions and entangled algorithmic designs, resulting in inconsistent implementations, insufficient ablations, and unfair evaluations. Although offline RL explicitly avoids environment interaction, prior methods frequently employ extensive, undocumented online evaluation for hyperparameter tuning, complicating method comparisons. Moreover, existing reference implementations differ significantly in boilerplate code, obscuring their core algorithmic contributions. We address these challenges by first introducing a rigorous taxonomy and a transparent evaluation protocol that explicitly quantifies online tuning budgets. To resolve opaque algorithmic design, we provide clean, minimalistic, single-file implementations of various model-free and model-based offline RL methods, significantly enhancing clarity and achieving substantial speed-ups. Leveraging these streamlined implementations, we propose Unifloral, a unified algorithm that encapsulates diverse prior approaches within a single, comprehensive hyperparameter space, enabling algorithm development in a shared hyperparameter space. Using Unifloral with our rigorous evaluation protocol, we develop two novel algorithms - TD3-AWR (model-free) and MoBRAC (model-based) - which substantially outperform established baselines. Our implementation is publicly available at https://github.com/EmptyJackson/unifloral.
Submitted 15 April, 2025;
originally announced April 2025.
-
Using Language Models to Decipher the Motivation Behind Human Behaviors
Authors:
Yutong Xie,
Qiaozhu Mei,
Walter Yuan,
Matthew O. Jackson
Abstract:
AI presents a novel tool for deciphering the motivations behind human behaviors. By varying prompts to a large language model, we can elicit the full range of human behaviors in a variety of different scenarios in classic economic games. By analyzing which prompts elicit which behaviors, we infer (decipher) the motivations behind the human behaviors. We also show how one can analyze the prompts to reveal relationships between the classic economic games, providing insight into what different economic scenarios induce people to think about. We also show how this deciphering process can be used to understand differences in the behavioral tendencies of different populations. We show how AI offers a new way to examine the thinking and framing that produce different behaviors.
Submitted 11 May, 2025; v1 submitted 19 March, 2025;
originally announced March 2025.
-
Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription
Authors:
Benjamin Gutteridge,
Matthew Thomas Jackson,
Toni Kukurin,
Xiaowen Dong
Abstract:
Handwritten text recognition (HTR) remains a challenging task, particularly for multi-page documents where pages share common formatting and contextual features. While modern optical character recognition (OCR) engines are proficient with printed text, their performance on handwriting is limited, often requiring costly labeled data for fine-tuning. In this paper, we explore the use of multi-modal large language models (MLLMs) for transcribing multi-page handwritten documents in a zero-shot setting. We investigate various configurations of commercial OCR engines and MLLMs, utilizing the latter both as end-to-end transcribers and as post-processors, with and without image components. We propose a novel method, '+first page', which enhances MLLM transcription by providing the OCR output of the entire document along with just the first page image. This approach leverages shared document features without incurring the high cost of processing all images. Experiments on a multi-page version of the IAM Handwriting Database demonstrate that '+first page' improves transcription accuracy, balances cost with performance, and even enhances results on out-of-sample text by extrapolating formatting and OCR error patterns from a single page.
Submitted 27 February, 2025;
originally announced February 2025.
-
Adam on Local Time: Addressing Nonstationarity in RL with Relative Adam Timesteps
Authors:
Benjamin Ellis,
Matthew T. Jackson,
Andrei Lupu,
Alexander D. Goldie,
Mattie Fellows,
Shimon Whiteson,
Jakob Foerster
Abstract:
In reinforcement learning (RL), it is common to apply techniques used broadly in machine learning such as neural network function approximators and momentum-based optimizers. However, such tools were largely developed for supervised learning rather than nonstationary RL, leading practitioners to adopt target networks, clipped policy updates, and other RL-specific implementation tricks to combat this mismatch, rather than directly adapting this toolchain for use in RL. In this paper, we take a different approach and instead address the effect of nonstationarity by adapting the widely used Adam optimiser. We first analyse the impact of nonstationary gradient magnitude -- such as that caused by a change in target network -- on Adam's update size, demonstrating that such a change can lead to large updates and hence sub-optimal performance. To address this, we introduce Adam-Rel. Rather than using the global timestep in the Adam update, Adam-Rel uses the local timestep within an epoch, essentially resetting Adam's timestep to 0 after target changes. We demonstrate that this avoids large updates and reduces to learning rate annealing in the absence of such increases in gradient magnitude. Evaluating Adam-Rel in both on-policy and off-policy RL, we demonstrate improved performance in both Atari and Craftax. We then show that increases in gradient norm occur in RL in practice, and examine the differences between our theoretical model and the observed data.
Submitted 22 December, 2024;
originally announced December 2024.
-
How Different AI Chatbots Behave? Benchmarking Large Language Models in Behavioral Economics Games
Authors:
Yutong Xie,
Yiyao Liu,
Zhuang Ma,
Lin Shi,
Xiyuan Wang,
Walter Yuan,
Matthew O. Jackson,
Qiaozhu Mei
Abstract:
The deployment of large language models (LLMs) in diverse applications requires a thorough understanding of their decision-making strategies and behavioral patterns. As a supplement to a recent study on the behavioral Turing test, this paper presents a comprehensive analysis of five leading LLM-based chatbot families as they navigate a series of behavioral economics games. By benchmarking these AI chatbots, we aim to uncover and document both common and distinct behavioral patterns across a range of scenarios. The findings provide valuable insights into the strategic preferences of each LLM, highlighting potential implications for their deployment in critical decision-making roles.
Submitted 16 December, 2024;
originally announced December 2024.
-
Content Quality vs. Attention Allocation: An LLM-Based Case Study in Peer-to-peer Mental Health Networks
Authors:
Teng Ye,
Hanson Yan,
Xuhuan Huang,
Connor Grogan,
Walter Yuan,
Qiaozhu Mei,
Matthew O. Jackson
Abstract:
With the rise of social media and peer-to-peer networks, users increasingly rely on crowdsourced responses for information and assistance. However, the mechanisms used to rank and promote responses often favor timeliness over quality, which may result in suboptimal support for help-seekers. We analyze millions of responses to mental health-related posts, utilizing large language models (LLMs) to assess the multi-dimensional quality of content, including relevance, empathy, and cultural alignment, among other aspects. Our findings reveal a mismatch between content quality and attention allocation: earlier responses, despite being relatively lower in quality, receive disproportionately high fractions of upvotes and visibility due to platform ranking algorithms. We demonstrate that the quality of the top-ranked responses could be improved by up to 39 percent, and even the simplest re-ranking strategy could significantly improve the quality of top responses, highlighting the need for more nuanced ranking mechanisms that prioritize both timeliness and content quality, especially emotional engagement in online mental health communities.
Submitted 8 November, 2024;
originally announced November 2024.
-
Reinforcement Learning Controllers for Soft Robots using Learned Environments
Authors:
Uljad Berdica,
Matthew Jackson,
Niccolò Enrico Veronese,
Jakob Foerster,
Perla Maiolino
Abstract:
Soft robotic manipulators offer operational advantages due to their compliant and deformable structures. However, their inherently nonlinear dynamics present substantial challenges. Traditional analytical methods often depend on simplifying assumptions, while learning-based techniques can be computationally demanding and limit the control policies to existing data. This paper introduces a novel approach to soft robotic control, leveraging state-of-the-art policy gradient methods within parallelizable synthetic environments learned from data. We also propose a safety-oriented actuation space exploration protocol via cascaded updates and weighted randomness. Specifically, our recurrent forward dynamics model is learned by generating a training dataset from a physically safe mean-reverting random walk in actuation space to explore the partially-observed state-space. We demonstrate a reinforcement learning approach towards closed-loop control through state-of-the-art actor-critic methods, which efficiently learn high-performance behaviour over long horizons. This approach removes the need for any knowledge regarding the robot's operation or capabilities and sets the stage for a comprehensive benchmarking tool in soft robotics control.
Submitted 25 October, 2024; v1 submitted 24 October, 2024;
originally announced October 2024.
-
Can Learned Optimization Make Reinforcement Learning Less Difficult?
Authors:
Alexander David Goldie,
Chris Lu,
Matthew Thomas Jackson,
Shimon Whiteson,
Jakob Nicolaus Foerster
Abstract:
While reinforcement learning (RL) holds great potential for decision making in the real world, it suffers from a number of unique difficulties which often need specific consideration. In particular: it is highly non-stationary; suffers from high degrees of plasticity loss; and requires exploration to prevent premature convergence to local optima and maximize return. In this paper, we consider whether learned optimization can help overcome these problems. Our method, Learned Optimization for Plasticity, Exploration and Non-stationarity (OPEN), meta-learns an update rule whose input features and output structure are informed by previously proposed solutions to these difficulties. We show that our parameterization is flexible enough to enable meta-learning in diverse learning contexts, including the ability to use stochasticity for exploration. Our experiments demonstrate that when meta-trained on single and small sets of environments, OPEN outperforms or equals traditionally used optimizers. Furthermore, OPEN shows strong generalization characteristics across a range of environments and agent architectures.
Submitted 15 April, 2025; v1 submitted 9 July, 2024;
originally announced July 2024.
-
Answering real-world clinical questions using large language model based systems
Authors:
Yen Sia Low,
Michael L. Jackson,
Rebecca J. Hyde,
Robert E. Brown,
Neil M. Sanghavi,
Julian D. Baldwin,
C. William Pike,
Jananee Muralidharan,
Gavin Hui,
Natasha Alexander,
Hadeel Hassan,
Rahul V. Nene,
Morgan Pike,
Courtney J. Pokrzywa,
Shivam Vedak,
Adam Paul Yan,
Dong-han Yao,
Amy R. Zipursky,
Christina Dinh,
Philip Ballentine,
Dan C. Derieg,
Vladimir Polony,
Rehan N. Chawdry,
Jordan Davies,
Brigham B. Hyde
, et al. (2 additional authors not shown)
Abstract:
Evidence to guide healthcare decisions is often limited by a lack of relevant and trustworthy literature as well as difficulty in contextualizing existing research for a specific patient. Large language models (LLMs) could potentially address both challenges by either summarizing published literature or generating new studies based on real-world data (RWD). We evaluated the ability of five LLM-based systems in answering 50 clinical questions and had nine independent physicians review the responses for relevance, reliability, and actionability. As it stands, general-purpose LLMs (ChatGPT-4, Claude 3 Opus, Gemini Pro 1.5) rarely produced answers that were deemed relevant and evidence-based (2% - 10%). In contrast, retrieval augmented generation (RAG)-based and agentic LLM systems produced relevant and evidence-based answers for 24% (OpenEvidence) to 58% (ChatRWD) of questions. Only the agentic ChatRWD was able to answer novel questions compared to other LLMs (65% vs. 0-9%). These results suggest that while general-purpose LLMs should not be used as-is, a purpose-built system for evidence summarization based on RAG and one for generating novel evidence working synergistically would improve availability of pertinent evidence for patient care.
Submitted 29 June, 2024;
originally announced July 2024.
-
Rapid modelling of reactive transport in porous media using machine learning: limitations and solutions
Authors:
Vinicius L S Silva,
Geraldine Regnier,
Pablo Salinas,
Claire E Heaney,
Matthew D Jackson,
Christopher C Pain
Abstract:
Reactive transport in porous media plays a pivotal role in subsurface reservoir processes, influencing fluid properties and geochemical characteristics. However, coupling fluid flow and transport with geochemical reactions is computationally intensive, requiring geochemical calculations at each grid cell and each time step within a discretized simulation domain. Although recent advancements have integrated machine learning techniques as surrogates for geochemical simulations, ensuring computational efficiency and accuracy remains a challenge. This work investigates machine learning models as replacements for a geochemical module in a simulation of reactive transport in porous media. As a proof of concept, we test this approach on a well-documented cation exchange problem. While the surrogate models excel in isolated predictions, they fall short in rollout predictions over successive time steps. By introducing modifications, including physics-based constraints and tailored dataset generation strategies, we show that machine learning surrogates can achieve accurate rollout predictions. Our findings emphasize that even for a simple sorption equilibrium reaction (cation exchange problem), machine learning surrogates alone fail in predicting over successive time-steps. Incorporating simple physics-based modifications enables us to overcome this limitation. A detailed analysis of the limitations and potential mitigation strategies is presented in this work.
Submitted 25 April, 2025; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Risks and Opportunities of Open-Source Generative AI
Authors:
Francisco Eiras,
Aleksandar Petrov,
Bertie Vidgen,
Christian Schroeder,
Fabio Pizzati,
Katherine Elkins,
Supratik Mukhopadhyay,
Adel Bibi,
Aaron Purewal,
Csaba Botos,
Fabro Steibel,
Fazel Keshtkar,
Fazl Barez,
Genevieve Smith,
Gianluca Guadagni,
Jon Chun,
Jordi Cabot,
Joseph Imperial,
Juan Arturo Nolazco,
Lori Landay,
Matthew Jackson,
Phillip H. S. Torr,
Trevor Darrell,
Yong Lee,
Jakob Foerster
Abstract:
Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about the potential risks of the technology, and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open-source generative AI. Using a three-stage framework for Gen AI development (near, mid and long-term), we analyze the risks and opportunities of open-source generative AI models with similar capabilities to the ones currently available (near to mid-term) and with greater capabilities (long-term). We argue that, overall, the benefits of open-source Gen AI outweigh its risks. As such, we encourage the open sourcing of models, training and evaluation data, and provide a set of recommendations and best practices for managing risks associated with open-source generative AI.
Submitted 29 May, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
-
Near to Mid-term Risks and Opportunities of Open-Source Generative AI
Authors:
Francisco Eiras,
Aleksandar Petrov,
Bertie Vidgen,
Christian Schroeder de Witt,
Fabio Pizzati,
Katherine Elkins,
Supratik Mukhopadhyay,
Adel Bibi,
Botos Csaba,
Fabro Steibel,
Fazl Barez,
Genevieve Smith,
Gianluca Guadagni,
Jon Chun,
Jordi Cabot,
Joseph Marvin Imperial,
Juan A. Nolazco-Flores,
Lori Landay,
Matthew Jackson,
Paul Röttger,
Philip H. S. Torr,
Trevor Darrell,
Yong Suk Lee,
Jakob Foerster
Abstract:
In the next few years, applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about potential risks and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open-source Generative AI. We argue for the responsible open sourcing of generative AI models in the near and medium term. To set the stage, we first introduce an AI openness taxonomy system and apply it to 40 current large language models. We then outline differential benefits and risks of open versus closed source AI and present potential risk mitigation, ranging from best practices to calls for technical and scientific contributions. We hope that this report will add a much needed missing voice to the current public discourse on near to mid-term AI safety and other societal impact.
Submitted 24 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
Policy-Guided Diffusion
Authors:
Matthew Thomas Jackson,
Michael Tryfan Matthews,
Cong Lu,
Benjamin Ellis,
Shimon Whiteson,
Jakob Foerster
Abstract:
In many real-world settings, agents must learn from an offline dataset gathered by some prior behavior policy. Such a setting naturally leads to distribution shift between the behavior policy and the target policy being trained - requiring policy conservatism to avoid instability and overestimation bias. Autoregressive world models offer a different solution to this by generating synthetic, on-policy experience. However, in practice, model rollouts must be severely truncated to avoid compounding error. As an alternative, we propose policy-guided diffusion. Our method uses diffusion models to generate entire trajectories under the behavior distribution, applying guidance from the target policy to move synthetic experience further on-policy. We show that policy-guided diffusion models a regularized form of the target distribution that balances action likelihood under both the target and behavior policies, leading to plausible trajectories with high target policy probability, while retaining a lower dynamics error than an offline world model baseline. Using synthetic experience from policy-guided diffusion as a drop-in substitute for real data, we demonstrate significant improvements in performance across a range of standard offline reinforcement learning algorithms and environments. Our approach provides an effective alternative to autoregressive offline world models, opening the door to the controllable generation of synthetic training data.
Submitted 9 April, 2024;
originally announced April 2024.
-
SplAgger: Split Aggregation for Meta-Reinforcement Learning
Authors:
Jacob Beck,
Matthew Jackson,
Risto Vuorio,
Zheng Xiong,
Shimon Whiteson
Abstract:
A core ambition of reinforcement learning (RL) is the creation of agents capable of rapid learning in novel tasks. Meta-RL aims to achieve this by directly learning such agents. Black box methods do so by training off-the-shelf sequence models end-to-end. By contrast, task inference methods explicitly infer a posterior distribution over the unknown task, typically using distinct objectives and sequence models designed to enable task inference. Recent work has shown that task inference methods are not necessary for strong performance. However, it remains unclear whether task inference sequence models are beneficial even when task inference objectives are not. In this paper, we present evidence that task inference sequence models are indeed still beneficial. In particular, we investigate sequence models with permutation invariant aggregation, which exploit the fact that, due to the Markov property, the task posterior does not depend on the order of data. We empirically confirm the advantage of permutation invariant sequence models without the use of task inference objectives. However, we also find, surprisingly, that there are multiple conditions under which permutation variance remains useful. Therefore, we propose SplAgger, which uses both permutation variant and invariant components to achieve the best of both worlds, outperforming all baselines evaluated on continuous control and memory environments. Code is provided at https://github.com/jacooba/hyper.
Submitted 1 June, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning
Authors:
Michael Matthews,
Michael Beukman,
Benjamin Ellis,
Mikayel Samvelyan,
Matthew Jackson,
Samuel Coward,
Jakob Foerster
Abstract:
Benchmarks play a crucial role in the development and analysis of reinforcement learning (RL) algorithms. We identify that existing benchmarks used for research into open-ended learning fall into one of two categories. Either they are too slow for meaningful research to be performed without enormous computational resources, like Crafter, NetHack and Minecraft, or they are not complex enough to pose a significant challenge, like Minigrid and Procgen. To remedy this, we first present Craftax-Classic: a ground-up rewrite of Crafter in JAX that runs up to 250x faster than the Python-native original. A run of PPO using 1 billion environment interactions finishes in under an hour using only a single GPU and averages 90% of the optimal reward. To provide a more compelling challenge, we present the main Craftax benchmark, a significant extension of the Crafter mechanics with elements inspired by NetHack. Solving Craftax requires deep exploration, long-term planning and memory, as well as continual adaptation to novel situations as more of the world is discovered. We show that existing methods, including global and episodic exploration as well as unsupervised environment design, fail to make material progress on the benchmark. We believe that Craftax can, for the first time, allow researchers to experiment in a complex, open-ended environment with limited computational resources.
Submitted 3 June, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Discovering Temporally-Aware Reinforcement Learning Algorithms
Authors:
Matthew Thomas Jackson,
Chris Lu,
Louis Kirsch,
Robert Tjarko Lange,
Shimon Whiteson,
Jakob Nicolaus Foerster
Abstract:
Recent advancements in meta-learning have enabled the automatic discovery of novel reinforcement learning algorithms parameterized by surrogate objective functions. To improve upon manually designed algorithms, the parameterization of this learned objective function must be expressive enough to represent novel principles of learning (instead of merely recovering already established ones) while still generalizing to a wide range of settings outside of its meta-training distribution. However, existing methods focus on discovering objective functions that, like many widely used objective functions in reinforcement learning, do not take into account the total number of steps allowed for training, or "training horizon". In contrast, humans use a plethora of different learning objectives across the course of acquiring a new ability. For instance, students may alter their studying techniques based on the proximity to exam deadlines and their self-assessed capabilities. This paper contends that ignoring the optimization time horizon significantly restricts the expressive potential of discovered learning algorithms. We propose a simple augmentation to two existing objective discovery approaches that allows the discovered algorithm to dynamically update its objective function throughout the agent's training procedure, resulting in expressive schedules and increased generalization across different training horizons. In the process, we find that commonly used meta-gradient approaches fail to discover such adaptive objective functions while evolution strategies discover highly dynamic learning rules. We demonstrate the effectiveness of our approach on a wide range of tasks and analyze the resulting learned algorithms, which we find effectively balance exploration and exploitation by modifying the structure of their learning rules throughout the agent's lifetime.
Submitted 8 February, 2024;
originally announced February 2024.
-
A Turing Test: Are AI Chatbots Behaviorally Similar to Humans?
Authors:
Qiaozhu Mei,
Yutong Xie,
Walter Yuan,
Matthew O. Jackson
Abstract:
We administer a Turing Test to AI Chatbots. We examine how Chatbots behave in a suite of classic behavioral games that are designed to elicit characteristics such as trust, fairness, risk-aversion, cooperation, etc., as well as how they respond to a traditional Big-5 psychological survey that measures personality traits. ChatGPT-4 exhibits behavioral and personality traits that are statistically indistinguishable from a random human from tens of thousands of human subjects from more than 50 countries. Chatbots also modify their behavior based on previous experience and contexts "as if" they were learning from the interactions, and change their behavior in response to different framings of the same strategic situation. Their behaviors are often distinct from average and modal human behaviors, in which case they tend to behave on the more altruistic and cooperative end of the distribution. We estimate that they act as if they are maximizing an average of their own and partner's payoffs.
Submitted 1 January, 2024; v1 submitted 19 November, 2023;
originally announced December 2023.
-
Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design
Authors:
Matthew Thomas Jackson,
Minqi Jiang,
Jack Parker-Holder,
Risto Vuorio,
Chris Lu,
Gregory Farquhar,
Shimon Whiteson,
Jakob Nicolaus Foerster
Abstract:
The past decade has seen vast progress in deep reinforcement learning (RL) on the back of algorithms manually designed by human researchers. Recently, it has been shown that it is possible to meta-learn update rules, with the hope of discovering algorithms that can perform well on a wide range of RL tasks. Despite impressive initial results from algorithms such as Learned Policy Gradient (LPG), there remains a generalization gap when these algorithms are applied to unseen environments. In this work, we examine how characteristics of the meta-training distribution impact the generalization performance of these algorithms. Motivated by this analysis and building on ideas from Unsupervised Environment Design (UED), we propose a novel approach for automatically generating curricula to maximize the regret of a meta-learned optimizer, in addition to a novel approximation of regret, which we name algorithmic regret (AR). The result is our method, General RL Optimizers Obtained Via Environment Design (GROOVE). In a series of experiments, we show that GROOVE achieves superior generalization to LPG, and evaluate AR against baseline metrics from UED, identifying it as a critical component of environment design in this setting. We believe this approach is a step towards the discovery of truly general RL algorithms, capable of solving a wide range of real-world environments.
Submitted 4 October, 2023;
originally announced October 2023.
-
MTNeuro: A Benchmark for Evaluating Representations of Brain Structure Across Multiple Levels of Abstraction
Authors:
Jorge Quesada,
Lakshmi Sathidevi,
Ran Liu,
Nauman Ahad,
Joy M. Jackson,
Mehdi Azabou,
Jingyun Xiao,
Christopher Liding,
Matthew Jin,
Carolina Urzay,
William Gray-Roncal,
Erik C. Johnson,
Eva L. Dyer
Abstract:
There are multiple scales of abstraction from which we can describe the same image, depending on whether we are focusing on fine-grained details or a more global attribute of the image. In brain mapping, learning to automatically parse images to build representations of both small-scale features (e.g., the presence of cells or blood vessels) and global properties of an image (e.g., which brain region the image comes from) is a crucial and open challenge. However, most existing datasets and benchmarks for neuroanatomy consider only a single downstream task at a time. To bridge this gap, we introduce a new dataset, annotations, and multiple downstream tasks that provide diverse ways to read out information about brain structure and architecture from the same image. Our multi-task neuroimaging benchmark (MTNeuro) is built on volumetric, micrometer-resolution X-ray microtomography images spanning a large thalamocortical section of mouse brain, encompassing multiple cortical and subcortical regions. We generated a number of different prediction challenges and evaluated several supervised and self-supervised models for brain-region prediction and pixel-level semantic segmentation of microstructures. Our experiments not only highlight the rich heterogeneity of this dataset, but also provide insights into how self-supervised approaches can be used to learn representations that capture multiple attributes of a single image and perform well on a variety of downstream tasks. Datasets, code, and pre-trained baseline models are provided at: https://mtneuro.github.io/ .
Submitted 31 December, 2022;
originally announced January 2023.
-
Finite model theory for pseudovarieties and universal algebra: preservation, definability and complexity
Authors:
Lucy Ham,
Marcel Jackson
Abstract:
We explore new interactions between finite model theory and a number of classical streams of universal algebra and semigroup theory. A key result is an example of a finite algebra whose variety is not finitely axiomatisable in first order logic, but which has first order definable finite membership problem. This algebra witnesses the simultaneous failure of the Łoś-Tarski Theorem, the SP-preservation theorem and Birkhoff's HSP-preservation theorem at the finite level as well as providing a negative solution to a first order formulation of the long-standing Eilenberg-Schützenberger problem. The example also shows that a pseudovariety without any finite pseudo-identity basis may be finitely axiomatisable in first order logic. Other results include the undecidability of deciding first order definability of the pseudovariety of a finite algebra and a mapping from any fixed template constraint satisfaction problem to a first order equivalent variety membership problem, thereby providing examples of variety membership problems complete in each of the classes $\texttt{L}$, $\texttt{NL}$, $\texttt{Mod}_p(\texttt{L})$, $\texttt{P}$, and infinitely many others (depending on complexity-theoretic assumptions).
Submitted 31 December, 2023; v1 submitted 5 December, 2022;
originally announced December 2022.
-
Hypernetworks in Meta-Reinforcement Learning
Authors:
Jacob Beck,
Matthew Thomas Jackson,
Risto Vuorio,
Shimon Whiteson
Abstract:
Training a reinforcement learning (RL) agent on a real-world robotics task remains generally impractical due to sample inefficiency. Multi-task RL and meta-RL aim to improve sample efficiency by generalizing over a distribution of related tasks. However, doing so is difficult in practice: In multi-task RL, state of the art methods often fail to outperform a degenerate solution that simply learns each task separately. Hypernetworks are a promising path forward since they replicate the separate policies of the degenerate solution while also allowing for generalization across tasks, and are applicable to meta-RL. However, evidence from supervised learning suggests hypernetwork performance is highly sensitive to the initialization. In this paper, we 1) show that hypernetwork initialization is also a critical factor in meta-RL, and that naive initializations yield poor performance; 2) propose a novel hypernetwork initialization scheme that matches or exceeds the performance of a state-of-the-art approach proposed for supervised settings, as well as being simpler and more general; and 3) use this method to show that hypernetworks can improve performance in meta-RL by evaluating on multiple simulated robotics benchmarks.
Submitted 20 October, 2022;
originally announced October 2022.
-
Multi-Modal Fusion by Meta-Initialization
Authors:
Matthew T. Jackson,
Shreshth A. Malik,
Michael T. Matthews,
Yousuf Mohamed-Ahmed
Abstract:
When experience is scarce, models may have insufficient information to adapt to a new task. In this case, auxiliary information - such as a textual description of the task - can enable improved task inference and adaptation. In this work, we propose an extension to the Model-Agnostic Meta-Learning algorithm (MAML), which allows the model to adapt using auxiliary information as well as task experience. Our method, Fusion by Meta-Initialization (FuMI), conditions the model initialization on auxiliary information using a hypernetwork, rather than learning a single, task-agnostic initialization. Furthermore, motivated by the shortcomings of existing multi-modal few-shot learning benchmarks, we constructed iNat-Anim - a large-scale image classification dataset with succinct and visually pertinent textual class descriptions. On iNat-Anim, FuMI significantly outperforms uni-modal baselines such as MAML in the few-shot regime. The code for this project and a dataset exploration tool for iNat-Anim are publicly available at https://github.com/s-a-malik/multi-few .
Submitted 10 October, 2022;
originally announced October 2022.
-
Qualitative representations of chromatic algebras
Authors:
Badriah Al Juaid,
Marcel Jackson,
James Koussas,
Tomasz Kowalski
Abstract:
Conventional Ramsey-theoretic investigations for edge-colourings of complete graphs are framed around avoidance of certain configurations. Motivated by considerations arising in the field of Qualitative Reasoning, we explore edge colourings that in addition to forbidding certain triangle configurations also require others to be present. These conditions have natural combinatorial interest in their own right, but also correspond to qualitative representability of certain nonassociative relation algebras, which we will call chromatic.
Submitted 9 January, 2022;
originally announced January 2022.
-
Credit Freezes, Equilibrium Multiplicity, and Optimal Bailouts in Financial Networks
Authors:
Matthew O. Jackson,
Agathe Pernoud
Abstract:
We analyze how interdependencies between organizations in financial networks can lead to multiple possible equilibrium outcomes. A multiplicity arises if and only if there exists a certain type of dependency cycle in the network that allows for self-fulfilling chains of defaults. We provide necessary and sufficient conditions for banks' solvency in any equilibrium. Building on these conditions, we characterize the minimum bailout payments needed to ensure systemic solvency, as well as how solvency can be ensured by guaranteeing a specific set of debt payments. Bailout injections needed to eliminate self-fulfilling cycles of defaults (credit freezes) are fully recoverable, while those needed to prevent cascading defaults outside of cycles are not. We show that the minimum bailout problem is computationally hard, but provide an upper bound on optimal payments and show that the problem has intuitive solutions in specific network structures such as those with disjoint cycles or a core-periphery structure.
Submitted 6 July, 2023; v1 submitted 22 December, 2020;
originally announced December 2020.
-
Override and update
Authors:
Marcel Jackson,
Tim Stokes
Abstract:
Override and update are natural constructions for combining partial functions, which arise in various program specification contexts. We use an unexpected connection with combinatorial geometry to provide a complete finite system of equational axioms for the first order theory of the override and update constructions on partial functions, resolving the main unsolved problem in the area.
Submitted 2 January, 2021; v1 submitted 18 July, 2019;
originally announced July 2019.
-
Why understanding multiplex social network structuring processes will help us better understand the evolution of human behavior
Authors:
Curtis Atkisson,
Piotr J. Górski,
Matthew O. Jackson,
Janusz A. Hołyst,
Raissa M. D'Souza
Abstract:
Social scientists have long appreciated that relationships between individuals cannot be described from observing a single domain, and that the structure across domains of interaction can have important effects on outcomes of interest (e.g., cooperation). One debate explicitly about this surrounds food sharing. Some argue that failing to find reciprocal food sharing means that some process other than reciprocity must be occurring, whereas others argue for models that allow reciprocity to span domains in the form of trade. Multilayer networks, high-dimensional networks that allow us to consider multiple sets of relationships at the same time, are ubiquitous and have consequences, so processes giving rise to them are important social phenomena. The analysis of multi-dimensional social networks has recently garnered the attention of the network science community. Recent models of these processes show how ignoring layer interdependencies can lead one to miss why a layer formed the way it did, and/or draw erroneous conclusions. Understanding the structuring processes that underlie multiplex networks will help understand increasingly rich datasets, giving more accurate and complete pictures of social interactions.
Submitted 27 May, 2020; v1 submitted 26 March, 2019;
originally announced March 2019.
-
Learning through the Grapevine: The Impact of Noise and the Breadth and Depth of Social Networks
Authors:
Matthew O. Jackson,
Suraj Malladi,
David McAdams
Abstract:
We examine how well people learn when information is noisily relayed from person to person; and we study how communication platforms can improve learning without censoring or fact-checking messages. We analyze learning as a function of social network depth (how many times information is relayed) and breadth (the number of relay chains accessed). Noise builds up as depth increases, so learning requires greater breadth. In the presence of mutations (deliberate or random) and transmission failures of messages, we characterize sharp thresholds for breadths above which receivers learn fully and below which they learn nothing. When there is uncertainty about mutation rates, optimizing learning requires either capping depth, or if that is not possible, limiting breadth by capping the number of people to whom someone can forward a message. Limiting breadth cuts the number of messages received but also decreases the fraction originating further from the receiver, and so can increase the signal to noise ratio. Finally, we extend our model to study learning from message survival: e.g., people are more likely to pass messages with one conclusion than another. We find that as depth grows, all learning comes from either the total number of messages received or from the content of received messages, but the learner does not need to pay attention to both.
Submitted 26 June, 2020; v1 submitted 8 December, 2018;
originally announced December 2018.
-
Domain and range for angelic and demonic compositions
Authors:
Marcel Jackson,
Szabolcs Mikulas
Abstract:
We give finite axiomatizations for the varieties generated by representable domain-range algebras when the semigroup operation is interpreted as angelic or demonic composition, respectively.
Submitted 30 October, 2018;
originally announced November 2018.
-
A Typology of Social Capital and Associated Network Measures
Authors:
Matthew O. Jackson
Abstract:
I provide a typology of social capital, breaking it down into seven more fundamental forms of capital: information capital, brokerage capital, coordination and leadership capital, bridging capital, favor capital, reputation capital, and community capital. I discuss how most of these forms of social capital can be identified using different network-based measures.
Submitted 23 February, 2019; v1 submitted 26 November, 2017;
originally announced November 2017.
-
Behavioral Communities and the Atomic Structure of Networks
Authors:
Matthew O. Jackson,
Evan C. Storms
Abstract:
When people prefer to coordinate their behaviors with their friends -- e.g., choosing whether to adopt a new technology, to protest against a government, to attend university -- divisions within a social network can sustain different behaviors in different parts of the network. We define a society's 'behavioral communities' via its network's 'atoms': groups of people who adopt the same behavior in every equilibrium. We analyze how the atoms change with the intensity of the peer effects, and characterize the atoms in a prominent class of network models. We show that using knowledge of atoms to seed the diffusion of a behavior significantly increases diffusion compared to seeding based on standard community detection algorithms. We also show how to use observed behaviors to estimate the intensity of peer effects.
Submitted 19 November, 2023; v1 submitted 12 October, 2017;
originally announced October 2017.
-
Axiomatisability and hardness for universal Horn classes of hypergraphs
Authors:
Lucy Ham,
Marcel Jackson
Abstract:
We characterise finite axiomatisability and intractability of deciding membership for universal Horn classes generated by finite loop-free hypergraphs.
Submitted 7 April, 2017;
originally announced April 2017.
-
A Network Formation Model Based on Subgraphs
Authors:
Arun G. Chandrasekhar,
Matthew O. Jackson
Abstract:
We develop a new class of random graph models for the statistical estimation of network formation -- subgraph generated models (SUGMs). Various subgraphs -- e.g., links, triangles, cliques, stars -- are generated and their union results in a network. We show that SUGMs are identified and establish the consistency and asymptotic distribution of parameter estimators in empirically relevant cases. We show that a simple four-parameter SUGM matches basic patterns in empirical networks more closely than four standard models (with many more dimensions): (i) stochastic block models; (ii) models with node-level unobserved heterogeneity; (iii) latent space models; (iv) exponential random graphs. We illustrate the framework's value via several applications using networks from rural India. We study whether network structure helps enforce risk-sharing and whether cross-caste interactions are more likely to be private. We also develop a new central limit theorem for correlated random variables, which is required to prove our results and is of independent interest.
Submitted 25 November, 2024; v1 submitted 23 November, 2016;
originally announced November 2016.
-
All or nothing: toward a promise problem dichotomy for constraint problems
Authors:
Lucy Ham,
Marcel Jackson
Abstract:
A finite constraint language $\mathscr{R}$ is a finite set of relations over some finite domain $A$. We show that intractability of the constraint satisfaction problem $\operatorname{CSP}(\mathscr{R})$ can, in all known cases, be replaced by an infinite hierarchy of intractable promise problems of increasingly disparate promise conditions: where instances are guaranteed to either have no solutions at all, or to be $k$-robustly satisfiable (for any fixed $k$), meaning that every "reasonable" partial instantiation on $k$ variables extends to a solution. For example, subject to the assumption $\texttt{P}\neq \texttt{NP}$, we show that for any $k$ there is no polynomial time algorithm that can distinguish non-$3$-colourable graphs from those for which any reasonable $3$-colouring of any $k$ of the vertices can extend to a full $3$-colouring. Our main result shows that an analogous statement holds for all known intractable constraint problems over fixed finite constraint languages.
Submitted 30 April, 2017; v1 submitted 3 November, 2016;
originally announced November 2016.
-
Networks: An Economic Perspective
Authors:
Matthew O. Jackson,
Brian W. Rogers,
Yves Zenou
Abstract:
We discuss social network analysis from the perspective of economics. We organize the presentation around the theme of externalities: the effects that one's behavior has on others' well-being. Externalities underlie the interdependencies that make networks interesting. We discuss network formation, as well as interactions between people's behaviors within a given network, and the implications in a variety of settings. Finally, we highlight some empirical challenges inherent in the statistical analysis of network-based data.
Submitted 28 August, 2016;
originally announced August 2016.
-
Diffusion in Networks and the Unexpected Virtue of Burstiness
Authors:
Mohammad Akbarpour,
Matthew O. Jackson
Abstract:
Whether an idea, information, infection, or innovation diffuses throughout a society depends not only on the structure of the network of interactions, but also on the timing of those interactions. Recent studies have shown that diffusion can fail on a network in which people are only active in "bursts", active for a while and then silent for a while, but diffusion could succeed on the same network if people were active in a more random Poisson manner. Those studies generally consider models in which nodes are active according to the same random timing process and then ask which timing is optimal. In reality, people differ widely in their activity patterns -- some are bursty and others are not. Here we show that, if people differ in their activity patterns, bursty behavior does not always hurt the diffusion, and in fact having some (but not all) of the population be bursty significantly helps diffusion. We prove that maximizing diffusion requires heterogeneous activity patterns across agents, and the overall maximizing pattern of agents' activity times does not involve any Poisson behavior.
Submitted 18 December, 2017; v1 submitted 28 August, 2016;
originally announced August 2016.
-
Centrality Measures in Networks
Authors:
Francis Bloch,
Matthew O. Jackson,
Pietro Tebaldi
Abstract:
We show that prominent centrality measures in network analysis are all based on additively separable and linear treatments of statistics that capture a node's position in the network. This enables us to provide a taxonomy of centrality measures that distills them to varying on two dimensions: (i) which information they make use of about nodes' positions, and (ii) how that information is weighted as a function of distance from the node in question. The three sorts of information about nodes' positions that are usually used -- which we refer to as "nodal statistics" -- are the paths from a given node to other nodes, the walks from a given node to other nodes, and the geodesics between other nodes that include a given node. Using such statistics on nodes' positions, we also characterize the types of trees such that centrality measures all agree, and we also discuss the properties that identify some path-based centrality measures.
Submitted 22 January, 2021; v1 submitted 20 August, 2016;
originally announced August 2016.
-
Low growth equational complexity
Authors:
Marcel Jackson
Abstract:
The equational complexity function $β_\mathscr{V}:\mathbb{N}\to\mathbb{N}$ of an equational class of algebras $\mathscr{V}$ bounds the size of equation required to determine membership of $n$-element algebras in $\mathscr{V}$. Known examples of finitely generated varieties $\mathscr{V}$ with unbounded equational complexity have growth in $Ω(n^c)$, usually for $c\geq \frac{1}{2}$. We show that much slower growth is possible, exhibiting $O(\log_2^3(n))$ growth amongst varieties of semilattice ordered inverse semigroups and additive idempotent semirings. We also examine a quasivariety analogue of equational complexity, and show that a finite group has polylogarithmic quasi-equational complexity function, bounded if and only if all Sylow subgroups are abelian.
Submitted 2 January, 2021; v1 submitted 25 July, 2016;
originally announced July 2016.
-
Algebraic foundations for qualitative calculi and networks
Authors:
Robin Hirsch,
Marcel Jackson,
Tomasz Kowalski
Abstract:
A qualitative representation $φ$ is like an ordinary representation of a relation algebra, but instead of requiring $(a; b)^φ= a^φ| b^φ$, as we do for ordinary representations, we only require that $c^φ\supseteq a^φ| b^φ\iff c\geq a ; b$, for each $c$ in the algebra. A constraint network is qualitatively satisfiable if its nodes can be mapped to elements of a qualitative representation, preserving the constraints. If a constraint network is satisfiable then it is clearly qualitatively satisfiable, but the converse can fail. However, for a wide range of relation algebras including the point algebra, the Allen Interval Algebra, RCC8 and many others, a network is satisfiable if and only if it is qualitatively satisfiable.
Unlike ordinary composition, the weak composition arising from qualitative representations need not be associative, so we can generalise by considering network satisfaction problems over non-associative algebras. We prove that computationally, qualitative representations have many advantages over ordinary representations: whereas many finite relation algebras have only infinite representations, every finite qualitatively representable algebra has a finite qualitative representation; the representability problem for (the atom structures of) finite non-associative algebras is NP-complete; the network satisfaction problem over a finite qualitatively representable algebra is always in NP; the validity of equations over qualitative representations is co-NP-complete. On the other hand we prove that there is no finite axiomatisation of the class of qualitatively representable algebras.
Submitted 19 June, 2017; v1 submitted 29 June, 2016;
originally announced June 2016.
-
The Friendship Paradox and Systematic Biases in Perceptions and Social Norms
Authors:
Matthew O. Jackson
Abstract:
The "friendship paradox" (Feld, 1991) refers to the fact that, on average, people have strictly fewer friends than their friends have. I show that this over-sampling of the most popular people amplifies behaviors that involve complementarities. People with more friends experience greater interactive effects and hence engage more in socially influenced activities. Given the friendship paradox, people then perceive more engagement when sampling their friends than exists in the overall population. Given the complementarities, this feeds back to amplify average engagement. In addition, people with the greatest innate benefits from a behavior also tend to be the ones who choose to interact the most, leading to further feedback and amplification. These results are consistent with studies finding that people consistently overestimate peer consumption of alcohol, cigarettes, and drugs, and can help explain problems with adolescent abuse of drugs and binge-drinking, as well as other behaviors. I also discuss how these results change in cases of strategic substitutes, where individuals overestimate free-riding by peers.
Submitted 17 November, 2017; v1 submitted 14 May, 2016;
originally announced May 2016.
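The friendship paradox mentioned in the abstract above can be checked numerically on a small network. The sketch below is only an illustration: the function name, the adjacency-dictionary input format, and the star-shaped example network are assumptions made here, and it uses one common way of measuring the paradox (averaging, over people, the mean number of friends that their friends have).

```python
from statistics import mean


def friendship_paradox(adjacency):
    """Compare the mean number of friends with the mean number of friends' friends.

    adjacency maps each person to a list of their friends (undirected network).
    Returns (mean degree, mean over people of their friends' mean degree).
    """
    degree = {person: len(friends) for person, friends in adjacency.items()}
    mean_degree = mean(degree.values())

    # Only people with at least one friend can sample their friends.
    friends_mean_degree = mean(
        mean(degree[friend] for friend in friends)
        for friends in adjacency.values()
        if friends
    )
    return mean_degree, friends_mean_degree


if __name__ == "__main__":
    # Hypothetical star network: person 0 is friends with everyone else.
    star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
    own, friends = friendship_paradox(star)
    print(f"mean friends: {own}, mean friends of friends: {friends}")
```

In the star example the popular hub appears in every other person's friend list, so sampling friends over-represents the hub. In a network where everyone has the same number of friends, the two averages coincide.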
-
Flexible constraint satisfiability and a problem in semigroup theory
Authors:
Marcel Jackson
Abstract:
We examine some flexible notions of constraint satisfaction, observing some relationships between model theoretic notions of universal Horn class membership and robust satisfiability. We show the \texttt{NP}-completeness of $2$-robust monotone 1-in-3 3SAT in order to give very small examples of finite algebras with \texttt{NP}-hard variety membership problem. In particular we give a $3$-element algebra with this property, and solve a widely stated problem by showing that the $6$-element Brandt monoid has \texttt{NP}-hard variety membership problem. These are the smallest possible sizes for a general algebra and a semigroup to exhibit \texttt{NP}-hardness for the membership problem of finite algebras in finitely generated varieties.
Submitted 27 May, 2025; v1 submitted 9 December, 2015;
originally announced December 2015.
-
Pricing and Referrals in Diffusion on Networks
Authors:
Matt V. Leduc,
Matthew O. Jackson,
Ramesh Johari
Abstract:
When a new product or technology is introduced, potential consumers can learn its quality by trying the product, at a risk, or by letting others try it and free-riding on the information that they generate. We propose a dynamic game to study the adoption of technologies of uncertain value, when agents are connected by a network and a monopolist seller chooses a policy to maximize profits. Consumers with low degree (few friends) have incentives to adopt early, while consumers with high degree have incentives to free ride. The seller can induce high-degree consumers to adopt early by offering referral incentives - rewards to early adopters whose friends buy in the second period. Referral incentives thus lead to a `double-threshold strategy' by which low and high-degree agents adopt the product early while middle-degree agents wait. We show that referral incentives are optimal on certain networks while inter-temporal price discrimination (i.e., a first-period price discount) is optimal on others, and discuss welfare implications.
Submitted 26 June, 2017; v1 submitted 22 September, 2015;
originally announced September 2015.
-
A Dictionary Approach to EBSD Indexing
Authors:
Yu-Hui Chen,
Se Un Park,
Dennis Wei,
Gregory Newstadt,
Michael Jackson,
Jeff P. Simmons,
Marc De Graef,
Alfred O. Hero
Abstract:
We propose a framework for indexing of grain and sub-grain structures in electron backscatter diffraction (EBSD) images of polycrystalline materials. The framework is based on a previously introduced physics-based forward model by Callahan and De Graef (2013) relating measured patterns to grain orientations (Euler angles). The forward model is tuned to the microscope and the sample symmetry group. We discretize the domain of the forward model onto a dense grid of Euler angles and for each measured pattern we identify the most similar patterns in the dictionary. These patterns are used to identify boundaries, detect anomalies, and index crystal orientations. The statistical distribution of these closest matches is used in an unsupervised binary decision tree (DT) classifier to identify grain boundaries and anomalous regions. The DT classifies a pattern as an anomaly if it has an abnormally low similarity to any pattern in the dictionary. It classifies a pixel as being near a grain boundary if the highly ranked patterns in the dictionary differ significantly over the pixel's 3x3 neighborhood. Indexing is accomplished by computing the mean orientation of the closest dictionary matches to each pattern. The mean orientation is estimated using a maximum likelihood approach that models the orientation distribution as a mixture of Von Mises-Fisher distributions over the quaternionic 3-sphere. The proposed dictionary matching approach permits segmentation, anomaly detection, and indexing to be performed in a unified manner with the additional benefit of uncertainty quantification. We demonstrate the proposed dictionary-based approach on a Ni-base IN100 alloy.
Submitted 27 February, 2015; v1 submitted 26 February, 2015;
originally announced February 2015.
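The dictionary-matching step described in the abstract above (finding the most similar dictionary patterns for each measured pattern, and flagging a pattern whose best match is poor) might be sketched as follows. This is not the authors' implementation: cosine similarity is assumed as the similarity measure, the `threshold` value is arbitrary, the function names are hypothetical, and the decision-tree classifier and the Von Mises-Fisher orientation averaging are not shown.

```python
import math


def cosine_similarity(a, b):
    """Normalized inner product of two equal-length patterns (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(pattern, dictionary, k=3):
    """Return the k best (similarity, dictionary index) pairs, best first."""
    scored = [
        (cosine_similarity(pattern, entry), index)
        for index, entry in enumerate(dictionary)
    ]
    scored.sort(reverse=True)
    return scored[:k]


def is_anomaly(pattern, dictionary, threshold=0.5):
    """Flag a pattern whose best dictionary match falls below the threshold."""
    best_score, _ = top_matches(pattern, dictionary, k=1)[0]
    return best_score < threshold


if __name__ == "__main__":
    # Toy three-entry dictionary; real dictionaries hold simulated patterns
    # on a dense grid of orientations.
    dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    measured = [0.9, 0.1, 0.0]
    print(top_matches(measured, dictionary, k=2))
    print(is_anomaly(measured, dictionary))
```

Keeping the ranked list of matches, rather than only the single best one, is what would let later steps compare neighbouring pixels or average over several candidate orientations.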
-
The algebra of functions with antidomain and range
Authors:
Robin Hirsch,
Marcel Jackson,
Szabolcs Mikulás
Abstract:
We give complete, finite quasiequational axiomatisations for algebras of unary partial functions under the operations of composition, domain, antidomain, range and intersection. This completes the extensive programme of classifying algebras of unary partial functions under combinations of these operations. We look at the complexity of the equational theories and provide a nondeterministic polynomial upper bound. Finally we look at the problem of finite representability and show that finite algebras can be represented as a collection of unary functions over a finite base set provided that intersection is not in the signature.
Submitted 15 October, 2014;
originally announced October 2014.
-
Monoids with tests and the algebra of possibly non-halting programs
Authors:
Marcel Jackson,
Tim Stokes
Abstract:
We study the algebraic theory of computable functions, which can be viewed as arising from possibly non-halting computer programs or algorithms, acting on some state space, equipped with operations of composition, {\em if-then-else} and {\em while-do} defined in terms of a Boolean algebra of conditions. It has previously been shown that there is no finite axiomatisation of algebras of partial functions under these operations alone, and this holds even if one restricts attention to transformations (representing halting programs) rather than partial functions, and omits {\em while-do} from the signature. In the halting case, there is a natural "fix", which is to allow composition of halting programs with conditions, and then the resulting algebras admit a finite axiomatisation. In the current setting such compositions are not possible, but by extending the notion of {\em if-then-else}, we are able to give finite axiomatisations of the resulting algebras of (partial) functions, with {\em while-do} in the signature if the state space is assumed finite. The axiomatisations are extended to consider the partial predicate of equality. All algebras considered turn out to be enrichments of the notion of a (one-sided) restriction semigroup.
Submitted 19 August, 2014;
originally announced August 2014.
-
A finer reduction of constraint problems to digraphs
Authors:
Jakub Bulín,
Dejan Delic,
Marcel Jackson,
Todd Niven
Abstract:
It is well known that the constraint satisfaction problem over a general relational structure A is polynomial time equivalent to the constraint problem over some associated digraph. We present a variant of this construction and show that the corresponding constraint satisfaction problem is logspace equivalent to that over A. Moreover, we show that almost all of the commonly encountered polymorphism properties are held equivalently on A and the constructed digraph. As a consequence, the Algebraic CSP dichotomy conjecture as well as the conjectures characterizing CSPs solvable in logspace and in nondeterministic logspace are equivalent to their restriction to digraphs.
Submitted 27 December, 2015; v1 submitted 24 June, 2014;
originally announced June 2014.
-
Using Gossips to Spread Information: Theory and Evidence from a Randomized Controlled Trial
Authors:
Abhijit Banerjee,
Arun G. Chandrasekhar,
Esther Duflo,
Matthew O. Jackson
Abstract:
Is it possible to identify individuals who are highly central in a community without gathering any network information, simply by asking a few people? If we use people's nominees as seeds for a diffusion process, will it be successful? We explore these questions theoretically, via surveys, and via field experiments. We show via a model of information flow how members of a community can, just by tracking gossip about others, identify highly central individuals in their network. Asking villagers in rural Indian villages to name good seeds for diffusion, we find that they accurately nominate those who are central according to a measure tailored for diffusion - not just those with many friends or in powerful positions. Finally, we run a randomized field experiment in 213 other villages that tests how effective it is to use such nominations as seeds for a diffusion process. Relative to random seeds or those with high social status, hitting at least one seed nominated by villagers leads to more than a 65% increase in the spread of information.
Submitted 8 May, 2017; v1 submitted 9 June, 2014;
originally announced June 2014.