-
Turning AI Data Centers into Grid-Interactive Assets: Results from a Field Demonstration in Phoenix, Arizona
Authors:
Philip Colangelo,
Ayse K. Coskun,
Jack Megrue,
Ciaran Roberts,
Shayan Sengupta,
Varun Sivaram,
Ethan Tiao,
Aroon Vijaykar,
Chris Williams,
Daniel C. Wilson,
Zack MacFarland,
Daniel Dreiling,
Nathan Morey,
Anuja Ratnayake,
Baskar Vairamohan
Abstract:
Artificial intelligence (AI) is fueling exponential electricity demand growth, threatening grid reliability, raising prices for communities paying for new energy infrastructure, and stunting AI innovation as data centers wait for interconnection to constrained grids. This paper presents the first field demonstration, in collaboration with major corporate partners, of a software-only approach--Emer…
▽ More
Artificial intelligence (AI) is fueling exponential electricity demand growth, threatening grid reliability, raising prices for communities paying for new energy infrastructure, and stunting AI innovation as data centers wait for interconnection to constrained grids. This paper presents the first field demonstration, in collaboration with major corporate partners, of a software-only approach--Emerald Conductor--that transforms AI data centers into flexible grid resources that can efficiently and immediately harness existing power systems without massive infrastructure buildout. Conducted at a 256-GPU cluster running representative AI workloads within a commercial, hyperscale cloud data center in Phoenix, Arizona, the trial achieved a 25% reduction in cluster power usage for three hours during peak grid events while maintaining AI quality of service (QoS) guarantees. By orchestrating AI workloads based on real-time grid signals without hardware modifications or energy storage, this platform reimagines data centers as grid-interactive assets that enhance grid reliability, advance affordability, and accelerate AI's development.
△ Less
Submitted 1 July, 2025;
originally announced July 2025.
-
Toward Reliable Scientific Hypothesis Generation: Evaluating Truthfulness and Hallucination in Large Language Models
Authors:
Guangzhi Xiong,
Eric Xie,
Corey Williams,
Myles Kim,
Amir Hassan Shariatmadari,
Sikun Guo,
Stefan Bekiranov,
Aidong Zhang
Abstract:
Large language models (LLMs) have shown significant potential in scientific disciplines such as biomedicine, particularly in hypothesis generation, where they can analyze vast literature, identify patterns, and suggest research directions. However, a key challenge lies in evaluating the truthfulness of generated hypotheses, as verifying their accuracy often requires substantial time and resources.…
▽ More
Large language models (LLMs) have shown significant potential in scientific disciplines such as biomedicine, particularly in hypothesis generation, where they can analyze vast literature, identify patterns, and suggest research directions. However, a key challenge lies in evaluating the truthfulness of generated hypotheses, as verifying their accuracy often requires substantial time and resources. Additionally, the hallucination problem in LLMs can lead to the generation of hypotheses that appear plausible but are ultimately incorrect, undermining their reliability. To facilitate the systematic study of these challenges, we introduce TruthHypo, a benchmark for assessing the capabilities of LLMs in generating truthful scientific hypotheses, and KnowHD, a knowledge-based hallucination detector to evaluate how well hypotheses are grounded in existing knowledge. Our results show that LLMs struggle to generate truthful hypotheses. By analyzing hallucinations in reasoning steps, we demonstrate that the groundedness scores provided by KnowHD serve as an effective metric for filtering truthful hypotheses from the diverse outputs of LLMs. Human evaluations further validate the utility of KnowHD in identifying truthful hypotheses and accelerating scientific discovery. Our data and source code are available at https://github.com/Teddy-XiongGZ/TruthHypo.
△ Less
Submitted 8 June, 2025; v1 submitted 20 May, 2025;
originally announced May 2025.
-
Fusing Foveal Fixations Using Linear Retinal Transformations and Bayesian Experimental Design
Authors:
Christopher K. I. Williams
Abstract:
Humans (and many vertebrates) face the problem of fusing together multiple fixations of a scene in order to obtain a representation of the whole, where each fixation uses a high-resolution fovea and decreasing resolution in the periphery. In this paper we explicitly represent the retinal transformation of a fixation as a linear downsampling of a high-resolution latent image of the scene, exploitin…
▽ More
Humans (and many vertebrates) face the problem of fusing together multiple fixations of a scene in order to obtain a representation of the whole, where each fixation uses a high-resolution fovea and decreasing resolution in the periphery. In this paper we explicitly represent the retinal transformation of a fixation as a linear downsampling of a high-resolution latent image of the scene, exploiting the known geometry. This linear transformation allows us to carry out exact inference for the latent variables in factor analysis (FA) and mixtures of FA models of the scene. Further, this allows us to formulate and solve the choice of "where to look next" as a Bayesian experimental design problem using the Expected Information Gain criterion. Experiments on the Frey faces and MNIST datasets demonstrate the effectiveness of our models.
△ Less
Submitted 2 May, 2025;
originally announced May 2025.
-
When Testing AI Tests Us: Safeguarding Mental Health on the Digital Frontlines
Authors:
Sachin R. Pendse,
Darren Gergle,
Rachel Kornfield,
Jonah Meyerhoff,
David Mohr,
Jina Suh,
Annie Wescott,
Casey Williams,
Jessica Schleider
Abstract:
Red-teaming is a core part of the infrastructure that ensures that AI models do not produce harmful content. Unlike past technologies, the black box nature of generative AI systems necessitates a uniquely interactional mode of testing, one in which individuals on red teams actively interact with the system, leveraging natural language to simulate malicious actors and solicit harmful outputs. This…
▽ More
Red-teaming is a core part of the infrastructure that ensures that AI models do not produce harmful content. Unlike past technologies, the black box nature of generative AI systems necessitates a uniquely interactional mode of testing, one in which individuals on red teams actively interact with the system, leveraging natural language to simulate malicious actors and solicit harmful outputs. This interactional labor done by red teams can result in mental health harms that are uniquely tied to the adversarial engagement strategies necessary to effectively red team. The importance of ensuring that generative AI models do not propagate societal or individual harm is widely recognized -- one less visible foundation of end-to-end AI safety is also the protection of the mental health and wellbeing of those who work to keep model outputs safe. In this paper, we argue that the unmet mental health needs of AI red-teamers is a critical workplace safety concern. Through analyzing the unique mental health impacts associated with the labor done by red teams, we propose potential individual and organizational strategies that could be used to meet these needs, and safeguard the mental health of red-teamers. We develop our proposed strategies through drawing parallels between common red-teaming practices and interactional labor common to other professions (including actors, mental health professionals, conflict photographers, and content moderators), describing how individuals and organizations within these professional spaces safeguard their mental health given similar psychological demands. Drawing on these protective practices, we describe how safeguards could be adapted for the distinct mental health challenges experienced by red teaming organizations as they mitigate emerging technological risks on the new digital frontlines.
△ Less
Submitted 29 April, 2025;
originally announced April 2025.
-
Principles and Policy Recommendations for Comprehensive Genetic Data Governance
Authors:
Vivek Ramanan,
Ria Vinod,
Cole Williams,
Sohini Ramachandran,
Suresh Venkatasubramanian
Abstract:
Genetic data collection has become ubiquitous, producing genetic information about health, ancestry, and social traits. However, unregulated use, especially amid evolving scientific understanding, poses serious privacy and discrimination risks. These risks are intensified by advancing AI, particularly multi-modal systems integrating genetic, clinical, behavioral, and environmental data. In this wo…
▽ More
Genetic data collection has become ubiquitous, producing genetic information about health, ancestry, and social traits. However, unregulated use, especially amid evolving scientific understanding, poses serious privacy and discrimination risks. These risks are intensified by advancing AI, particularly multi-modal systems integrating genetic, clinical, behavioral, and environmental data. In this work, we organize the uses of genetic data along four distinct "pillars", and develop a risk assessment framework that identifies key values any governance system must preserve. In doing so, we draw on current privacy scholarship concerning contextual integrity, data relationality, and the Belmont principle. We apply the framework to four real-world case studies and identify critical gaps in existing regulatory frameworks and specific threats to privacy and personal liberties, particularly through genetic discrimination. Finally, we offer three policy recommendations for genetic data governance that safeguard individual rights in today's under-regulated ecosystem of large-scale genetic data collection and usage.
△ Less
Submitted 30 May, 2025; v1 submitted 13 February, 2025;
originally announced February 2025.
-
Score-Optimal Diffusion Schedules
Authors:
Christopher Williams,
Andrew Campbell,
Arnaud Doucet,
Saifuddin Syed
Abstract:
Denoising diffusion models (DDMs) offer a flexible framework for sampling from high dimensional data distributions. DDMs generate a path of probability distributions interpolating between a reference Gaussian distribution and a data distribution by incrementally injecting noise into the data. To numerically simulate the sampling process, a discretisation schedule from the reference back towards cl…
▽ More
Denoising diffusion models (DDMs) offer a flexible framework for sampling from high dimensional data distributions. DDMs generate a path of probability distributions interpolating between a reference Gaussian distribution and a data distribution by incrementally injecting noise into the data. To numerically simulate the sampling process, a discretisation schedule from the reference back towards clean data must be chosen. An appropriate discretisation schedule is crucial to obtain high quality samples. However, beyond hand crafted heuristics, a general method for choosing this schedule remains elusive. This paper presents a novel algorithm for adaptively selecting an optimal discretisation schedule with respect to a cost that we derive. Our cost measures the work done by the simulation procedure to transport samples from one point in the diffusion path to the next. Our method does not require hyperparameter tuning and adapts to the dynamics and geometry of the diffusion path. Our algorithm only involves the evaluation of the estimated Stein score, making it scalable to existing pre-trained models at inference time and online during training. We find that our learned schedule recovers performant schedules previously only discovered through manual search and obtains competitive FID scores on image datasets.
△ Less
Submitted 10 December, 2024;
originally announced December 2024.
-
Exploring Crowdworkers' Perceptions, Current Practices, and Desired Practices Regarding Using Non-Workstation Devices for Crowdwork
Authors:
Senjuti Dutta,
Scott Ruoti,
Rhema Linder,
Alex C. Williams,
Anastasia Kuzminykh
Abstract:
Despite a plethora of research dedicated to designing HITs for non-workstations, there is a lack of research looking specifically into workers' perceptions of the suitability of these devices for managing and completing work. In this work, we fill this research gap by conducting an online survey of 148 workers on Amazon Mechanical Turk to explore 1. how crowdworkers currently use their non-worksta…
▽ More
Despite a plethora of research dedicated to designing HITs for non-workstations, there is a lack of research looking specifically into workers' perceptions of the suitability of these devices for managing and completing work. In this work, we fill this research gap by conducting an online survey of 148 workers on Amazon Mechanical Turk to explore 1. how crowdworkers currently use their non-workstation devices to complete and manage crowdwork, 2. what challenges they face using those devices, and 3. to what extent they wish they could use those devices if their concerns were addressed. Our results show that workers unanimously favor using a desktop to complete and manage crowdwork. While workers occasionally use smartphones or tablets, they find their suitability marginal at best and have little interest in smart speakers and smartwatches, viewing them as unsuitable for crowdwork. When investigating the reason for these views, we find that the key issue is that non workstation devices lack the tooling necessary to automatically find and accept HITs, tooling that workers view as essential in their efforts to compete with bots in accepting high paying work. To address this problem, we propose a new paradigm for finding, accepting, and completing crowdwork that puts crowdworkers on equal footing with bots in these tasks. We also describe future research directions for tailoring HITs to non workstation devices and definitely answering whether smart speakers and smartwatches have a place in crowdwork.
△ Less
Submitted 6 September, 2024;
originally announced September 2024.
-
Unveiling the Inter-Related Preferences of Crowdworkers: Implications for Personalized and Flexible Platform Design
Authors:
Senjuti Dutta,
Rhema Linder,
Alex C. Williams,
Anastasia Kuzminykh,
Scott Ruoti
Abstract:
Crowdsourcing platforms have traditionally been designed with a focus on workstation interfaces, restricting the flexibility that crowdworkers need. Recognizing this limitation and the need for more adaptable platforms, prior research has highlighted the diverse work processes of crowdworkers, influenced by factors such as device type and work stage. However, these variables have largely been stud…
▽ More
Crowdsourcing platforms have traditionally been designed with a focus on workstation interfaces, restricting the flexibility that crowdworkers need. Recognizing this limitation and the need for more adaptable platforms, prior research has highlighted the diverse work processes of crowdworkers, influenced by factors such as device type and work stage. However, these variables have largely been studied in isolation. Our study is the first to explore the interconnected variabilities among these factors within the crowdwork community. Through a survey involving 150 Amazon Mechanical Turk crowdworkers, we uncovered three distinct groups characterized by their interrelated variabilities in key work aspects. The largest group exhibits a reliance on traditional devices, showing limited interest in integrating smartphones and tablets into their work routines. The second-largest group also primarily uses traditional devices but expresses a desire for supportive tools and scripts that enhance productivity across all devices, particularly smartphones and tablets. The smallest group actively uses and strongly prefers non-workstation devices, especially smartphones and tablets, for their crowdworking activities. We translate our findings into design insights for platform developers, discussing the implications for creating more personalized, flexible, and efficient crowdsourcing environments. Additionally, we highlight the unique work practices of these crowdworker clusters, offering a contrast to those of more traditional and established worker groups.
△ Less
Submitted 6 September, 2024;
originally announced September 2024.
-
Improved 3D Whole Heart Geometry from Sparse CMR Slices
Authors:
Yiyang Xu,
Hao Xu,
Matthew Sinclair,
Esther Puyol-Antón,
Steven A Niederer,
Amedeo Chiribiri,
Steven E Williams,
Michelle C Williams,
Alistair A Young
Abstract:
Cardiac magnetic resonance (CMR) imaging and computed tomography (CT) are two common non-invasive imaging methods for assessing patients with cardiovascular disease. CMR typically acquires multiple sparse 2D slices, with unavoidable respiratory motion artefacts between slices, whereas CT acquires isotropic dense data but uses ionising radiation. In this study, we explore the combination of Slice S…
▽ More
Cardiac magnetic resonance (CMR) imaging and computed tomography (CT) are two common non-invasive imaging methods for assessing patients with cardiovascular disease. CMR typically acquires multiple sparse 2D slices, with unavoidable respiratory motion artefacts between slices, whereas CT acquires isotropic dense data but uses ionising radiation. In this study, we explore the combination of Slice Shifting Algorithm (SSA), Spatial Transformer Network (STN), and Label Transformer Network (LTN) to: 1) correct respiratory motion between segmented slices, and 2) transform sparse segmentation data into dense segmentation. All combinations were validated using synthetic motion-corrupted CMR slice segmentation generated from CT in 1699 cases, where the dense CT serves as the ground truth. In 199 testing cases, SSA-LTN achieved the best results for Dice score and Huasdorff distance (94.0% and 4.7 mm respectively, average over 5 labels) but gave topological errors in 8 cases. STN was effective as a plug-in tool for correcting all topological errors with minimal impact on overall performance (93.5% and 5.0 mm respectively). SSA also proves to be a valuable plug-in tool, enhancing performance over both STN-based and LTN-based models. The code for these different combinations is available at https://github.com/XESchong/STACOM2024.
△ Less
Submitted 14 August, 2024;
originally announced August 2024.
-
Naive Bayes Classifiers and One-hot Encoding of Categorical Variables
Authors:
Christopher K. I. Williams
Abstract:
This paper investigates the consequences of encoding a $K$-valued categorical variable incorrectly as $K$ bits via one-hot encoding, when using a Naïve Bayes classifier. This gives rise to a product-of-Bernoullis (PoB) assumption, rather than the correct categorical Naïve Bayes classifier. The differences between the two classifiers are analysed mathematically and experimentally. In our experiment…
▽ More
This paper investigates the consequences of encoding a $K$-valued categorical variable incorrectly as $K$ bits via one-hot encoding, when using a Naïve Bayes classifier. This gives rise to a product-of-Bernoullis (PoB) assumption, rather than the correct categorical Naïve Bayes classifier. The differences between the two classifiers are analysed mathematically and experimentally. In our experiments using probability vectors drawn from a Dirichlet distribution, the two classifiers are found to agree on the maximum a posteriori class label for most cases, although the posterior probabilities are usually greater for the PoB case.
△ Less
Submitted 28 April, 2024;
originally announced April 2024.
-
LaPlaSS: Latent Space Planning for Stochastic Systems
Authors:
Marlyse Reeves,
Brian C. Williams
Abstract:
Autonomous mobile agents often operate in hazardous environments, necessitating an awareness of safety. These agents can have non-linear, stochastic dynamics that must be considered during planning to guarantee bounded risk. Most state of the art methods require closed-form dynamics to verify plan correctness and safety however modern robotic systems often have dynamics that are learned from data.…
▽ More
Autonomous mobile agents often operate in hazardous environments, necessitating an awareness of safety. These agents can have non-linear, stochastic dynamics that must be considered during planning to guarantee bounded risk. Most state of the art methods require closed-form dynamics to verify plan correctness and safety however modern robotic systems often have dynamics that are learned from data. Thus, there is a need to perform efficient trajectory planning with guarantees on risk for agents without known dynamics models. We propose a "generate-and-test" approach to risk-bounded planning in which a planner generates a candidate trajectory using an approximate linear dynamics model and a validator assesses the risk of the trajectory, computing additional safety constraints for the planner if the candidate does not satisfy the desired risk bound. To acquire the approximate model, we use a variational autoencoder to learn a latent linear dynamics model and encode the planning problem into the latent space to generate the candidate trajectory. The VAE also serves to sample trajectories around the candidate to use in the validator. We demonstrate that our algorithm, LaPlaSS, is able to generate trajectory plans with bounded risk for a real-world agent with learned dynamics and is an order of magnitude more efficient than the state of the art.
△ Less
Submitted 10 April, 2024;
originally announced April 2024.
-
Physics-informed generative model for drug-like molecule conformers
Authors:
David C. Williams,
Neil Inala
Abstract:
We present a diffusion-based, generative model for conformer generation. Our model is focused on the reproduction of bonded structure and is constructed from the associated terms traditionally found in classical force fields to ensure a physically relevant representation. Techniques in deep learning are used to infer atom typing and geometric parameters from a training set. Conformer sampling is a…
▽ More
We present a diffusion-based, generative model for conformer generation. Our model is focused on the reproduction of bonded structure and is constructed from the associated terms traditionally found in classical force fields to ensure a physically relevant representation. Techniques in deep learning are used to infer atom typing and geometric parameters from a training set. Conformer sampling is achieved by taking advantage of recent advancements in diffusion-based generation. By training on large, synthetic data sets of diverse, drug-like molecules optimized with the semiempirical GFN2-xTB method, high accuracy is achieved for bonded parameters, exceeding that of conventional, knowledge-based methods. Results are also compared to experimental structures from the Protein Databank (PDB) and Cambridge Structural Database (CSD).
△ Less
Submitted 14 March, 2024; v1 submitted 29 February, 2024;
originally announced March 2024.
-
The Minimum Information about CLinical Artificial Intelligence Checklist for Generative Modeling Research (MI-CLAIM-GEN)
Authors:
Brenda Y. Miao,
Irene Y. Chen,
Christopher YK Williams,
Jaysón Davidson,
Augusto Garcia-Agundez,
Shenghuan Sun,
Travis Zack,
Suchi Saria,
Rima Arnaout,
Giorgio Quer,
Hossein J. Sadaei,
Ali Torkamani,
Brett Beaulieu-Jones,
Bin Yu,
Milena Gianfrancesco,
Atul J. Butte,
Beau Norgeot,
Madhumita Sushil
Abstract:
Recent advances in generative models, including large language models (LLMs), vision language models (VLMs), and diffusion models, have accelerated the field of natural language and image processing in medicine and marked a significant paradigm shift in how biomedical models can be developed and deployed. While these models are highly adaptable to new tasks, scaling and evaluating their usage pres…
▽ More
Recent advances in generative models, including large language models (LLMs), vision language models (VLMs), and diffusion models, have accelerated the field of natural language and image processing in medicine and marked a significant paradigm shift in how biomedical models can be developed and deployed. While these models are highly adaptable to new tasks, scaling and evaluating their usage presents new challenges not addressed in previous frameworks. In particular, the ability of these models to produce useful outputs with little to no specialized training data ("zero-" or "few-shot" approaches), as well as the open-ended nature of their outputs, necessitate the development of new guidelines for robust reporting of clinical generative model research. In response to gaps in standards and best practices for the development of clinical AI tools identified by US Executive Order 141103 and several emerging national networks for clinical AI evaluation, we begin to formalize some of these guidelines by building on the original MI-CLAIM checklist. The new checklist, MI-CLAIM-GEN (Table 1), aims to address differences in training, evaluation, interpretability, and reproducibility of new generative models compared to non-generative ("predictive") AI models. This MI-CLAIM-GEN checklist also seeks to clarify cohort selection reporting with unstructured clinical data and adds additional items on alignment with ethical standards for clinical AI research.
△ Less
Submitted 11 July, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Approximations to the Fisher Information Metric of Deep Generative Models for Out-Of-Distribution Detection
Authors:
Sam Dauncey,
Chris Holmes,
Christopher Williams,
Fabian Falck
Abstract:
Likelihood-based deep generative models such as score-based diffusion models and variational autoencoders are state-of-the-art machine learning models approximating high-dimensional distributions of data such as images, text, or audio. One of many downstream tasks they can be naturally applied to is out-of-distribution (OOD) detection. However, seminal work by Nalisnick et al. which we reproduce s…
▽ More
Likelihood-based deep generative models such as score-based diffusion models and variational autoencoders are state-of-the-art machine learning models approximating high-dimensional distributions of data such as images, text, or audio. One of many downstream tasks they can be naturally applied to is out-of-distribution (OOD) detection. However, seminal work by Nalisnick et al. which we reproduce showed that deep generative models consistently infer higher log-likelihoods for OOD data than data they were trained on, marking an open problem. In this work, we analyse using the gradient of a data point with respect to the parameters of the deep generative model for OOD detection, based on the simple intuition that OOD data should have larger gradient norms than training data. We formalise measuring the size of the gradient as approximating the Fisher information metric. We show that the Fisher information matrix (FIM) has large absolute diagonal values, motivating the use of chi-square distributed, layer-wise gradient norms as features. We combine these features to make a simple, model-agnostic and hyperparameter-free method for OOD detection which estimates the joint density of the layer-wise gradient norms for a given data point. We find that these layer-wise gradient norms are weakly correlated, rendering their combined usage informative, and prove that the layer-wise gradient norms satisfy the principle of (data representation) invariance. Our empirical results indicate that this method outperforms the Typicality test for most deep generative models and image dataset pairings.
△ Less
Submitted 25 May, 2024; v1 submitted 3 March, 2024;
originally announced March 2024.
-
Libfork: portable continuation-stealing with stackless coroutines
Authors:
Conor John Williams,
James Elliott
Abstract:
Fully-strict fork-join parallelism is a powerful model for shared-memory programming due to its optimal time scaling and strong bounds on memory scaling. The latter is rarely achieved due to the difficulty of implementing continuation stealing in traditional High Performance Computing (HPC) languages -- where it is often impossible without modifying the compiler or resorting to non-portable techni…
▽ More
Fully-strict fork-join parallelism is a powerful model for shared-memory programming due to its optimal time scaling and strong bounds on memory scaling. The latter is rarely achieved due to the difficulty of implementing continuation stealing in traditional High Performance Computing (HPC) languages -- where it is often impossible without modifying the compiler or resorting to non-portable techniques. We demonstrate how stackless coroutines (a new feature in C++20) can enable fully-portable continuation stealing and present libfork a lock-free fine-grained parallelism library, combining coroutines with user-space, geometric segmented-stacks. We show our approach is able to achieve optimal time/memory scaling, both theoretically and empirically, across a variety of benchmarks. Compared to openMP (libomp), libfork is on average 7.2x faster and consumes 10x less memory. Similarly, compared to Intel's TBB, libfork is on average 2.7x faster and consumes 6.2x less memory. Additionally, we introduce non-uniform memory access (NUMA) optimizations for schedulers that demonstrate performance matching busy-waiting schedulers.
△ Less
Submitted 28 February, 2024;
originally announced February 2024.
-
LLMs as Meta-Reviewers' Assistants: A Case Study
Authors:
Eftekhar Hossain,
Sanjeev Kumar Sinha,
Naman Bansal,
Alex Knipper,
Souvika Sarkar,
John Salvador,
Yash Mahajan,
Sri Guttikonda,
Mousumi Akter,
Md. Mahadi Hassan,
Matthew Freestone,
Matthew C. Williams Jr.,
Dongji Feng,
Santu Karmaker
Abstract:
One of the most important yet onerous tasks in the academic peer-reviewing process is composing meta-reviews, which involves assimilating diverse opinions from multiple expert peers, formulating one's self-judgment as a senior expert, and then summarizing all these perspectives into a concise holistic overview to make an overall recommendation. This process is time-consuming and can be compromised…
▽ More
One of the most important yet onerous tasks in the academic peer-reviewing process is composing meta-reviews, which involves assimilating diverse opinions from multiple expert peers, formulating one's self-judgment as a senior expert, and then summarizing all these perspectives into a concise holistic overview to make an overall recommendation. This process is time-consuming and can be compromised by human factors like fatigue, inconsistency, missing tiny details, etc. Given the latest major developments in Large Language Models (LLMs), it is very compelling to rigorously study whether LLMs can help metareviewers perform this important task better. In this paper, we perform a case study with three popular LLMs, i.e., GPT-3.5, LLaMA2, and PaLM2, to assist meta-reviewers in better comprehending multiple experts perspectives by generating a controlled multi-perspective summary (MPS) of their opinions. To achieve this, we prompt three LLMs with different types/levels of prompts based on the recently proposed TELeR taxonomy. Finally, we perform a detailed qualitative study of the MPSs generated by the LLMs and report our findings.
△ Less
Submitted 8 February, 2025; v1 submitted 23 February, 2024;
originally announced February 2024.
-
Identifying Reasons for Contraceptive Switching from Real-World Data Using Large Language Models
Authors:
Brenda Y. Miao,
Christopher YK Williams,
Ebenezer Chinedu-Eneh,
Travis Zack,
Emily Alsentzer,
Atul J. Butte,
Irene Y. Chen
Abstract:
Prescription contraceptives play a critical role in supporting women's reproductive health. With nearly 50 million women in the United States using contraceptives, understanding the factors that drive contraceptives selection and switching is of significant interest. However, many factors related to medication switching are often only captured in unstructured clinical notes and can be difficult to…
▽ More
Prescription contraceptives play a critical role in supporting women's reproductive health. With nearly 50 million women in the United States using contraceptives, understanding the factors that drive contraceptives selection and switching is of significant interest. However, many factors related to medication switching are often only captured in unstructured clinical notes and can be difficult to extract. Here, we evaluate the zero-shot abilities of a recently developed large language model, GPT-4 (via HIPAA-compliant Microsoft Azure API), to identify reasons for switching between classes of contraceptives from the UCSF Information Commons clinical notes dataset. We demonstrate that GPT-4 can accurately extract reasons for contraceptive switching, outperforming baseline BERT-based models with microF1 scores of 0.849 and 0.881 for contraceptive start and stop extraction, respectively. Human evaluation of GPT-4-extracted reasons for switching showed 91.4% accuracy, with minimal hallucinations. Using extracted reasons, we identified patient preference, adverse events, and insurance as key reasons for switching using unsupervised topic modeling approaches. Notably, we also showed using our approach that "weight gain/mood change" and "insurance coverage" are disproportionately found as reasons for contraceptive switching in specific demographic populations. Our code and supplemental data are available at https://github.com/BMiao10/contraceptive-switching.
△ Less
Submitted 5 February, 2024;
originally announced February 2024.
-
Exploring Multimodal Large Language Models for Radiology Report Error-checking
Authors:
Jinge Wu,
Yunsoo Kim,
Eva C. Keller,
Jamie Chow,
Adam P. Levine,
Nikolas Pontikos,
Zina Ibrahim,
Paul Taylor,
Michelle C. Williams,
Honghan Wu
Abstract:
This paper proposes one of the first clinical applications of multimodal large language models (LLMs) as an assistant for radiologists to check errors in their reports. We created an evaluation dataset from real-world radiology datasets (including X-rays and CT scans). A subset of original reports was modified to contain synthetic errors by introducing three types of mistakes: "insert", "remove",…
▽ More
This paper proposes one of the first clinical applications of multimodal large language models (LLMs) as an assistant for radiologists to check errors in their reports. We created an evaluation dataset from real-world radiology datasets (including X-rays and CT scans). A subset of original reports was modified to contain synthetic errors by introducing three types of mistakes: "insert", "remove", and "substitute". The evaluation contained two difficulty levels: SIMPLE for binary error-checking and COMPLEX for identifying error types. At the SIMPLE level, our fine-tuned model significantly enhanced performance by 47.4% and 25.4% on MIMIC-CXR and IU X-ray data, respectively. This performance boost is also observed in unseen modality, CT scans, as the model performed 19.46% better than the baseline model. The model also surpassed the domain expert's accuracy in the MIMIC-CXR dataset by 1.67%. Notably, among the subsets (N=21) of the test set where a clinician did not achieve the correct conclusion, the LLaVA ensemble mode correctly identified 71.4% of these cases. However, all models performed poorly in identifying mistake types, underscoring the difficulty of the COMPLEX level. This study marks a promising step toward utilizing multimodal LLMs to enhance diagnostic accuracy in radiology. The ensemble model demonstrated comparable performance to clinicians, even capturing errors overlooked by humans.
△ Less
Submitted 3 March, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Adaptation and Communication in Human-Robot Teaming to Handle Discrepancies in Agents' Beliefs about Plans
Authors:
Yuening Zhang,
Brian C. Williams
Abstract:
When agents collaborate on a task, it is important that they have some shared mental model of the task routines -- the set of feasible plans towards achieving the goals. However, in reality, situations often arise that such a shared mental model cannot be guaranteed, such as in ad-hoc teams where agents may follow different conventions or when contingent constraints arise that only some agents are…
▽ More
When agents collaborate on a task, it is important that they have some shared mental model of the task routines -- the set of feasible plans towards achieving the goals. However, in reality, situations often arise that such a shared mental model cannot be guaranteed, such as in ad-hoc teams where agents may follow different conventions or when contingent constraints arise that only some agents are aware of. Previous work on human-robot teaming has assumed that the team has a set of shared routines, which breaks down in these situations. In this work, we leverage epistemic logic to enable agents to understand the discrepancy in each other's beliefs about feasible plans and dynamically plan their actions to adapt or communicate to resolve the discrepancy. We propose a formalism that extends conditional doxastic logic to describe knowledge bases in order to explicitly represent agents' nested beliefs on the feasible plans and state of execution. We provide an online execution algorithm based on Monte Carlo Tree Search for the agent to plan its action, including communication actions to explain the feasibility of plans, announce intent, and ask questions. Finally, we evaluate the success rate and scalability of the algorithm and show that our agent is better equipped to work in teams without the guarantee of a shared mental model.
△ Less
Submitted 6 July, 2023;
originally announced July 2023.
-
Revealing the impact of social circumstances on the selection of cancer therapy through natural language processing of social work notes
Authors:
Shenghuan Sun,
Travis Zack,
Christopher Y. K. Williams,
Atul J. Butte,
Madhumita Sushil
Abstract:
We aimed to investigate the impact of social circumstances on cancer therapy selection using natural language processing to derive insights from social worker documentation. We developed and employed a Bidirectional Encoder Representations from Transformers (BERT) based approach, using a hierarchical multi-step BERT model (BERT-MS) to predict the prescription of targeted cancer therapy to patients…
▽ More
We aimed to investigate the impact of social circumstances on cancer therapy selection using natural language processing to derive insights from social worker documentation. We developed and employed a Bidirectional Encoder Representations from Transformers (BERT) based approach, using a hierarchical multi-step BERT model (BERT-MS) to predict the prescription of targeted cancer therapy to patients based solely on documentation by clinical social workers. Our corpus included free-text clinical social work notes, combined with medication prescription information, for all patients treated for breast cancer. We conducted a feature importance analysis to pinpoint the specific social circumstances that impact cancer therapy selection. Using only social work notes, we consistently predicted the administration of targeted therapies, suggesting systematic differences in treatment selection exist due to non-clinical factors. The UCSF-BERT model, pretrained on clinical text at UCSF, outperformed other publicly available language models with an AUROC of 0.675 and a Macro F1 score of 0.599. The UCSF BERT-MS model, capable of leveraging multiple pieces of notes, surpassed the UCSF-BERT model in both AUROC and Macro-F1. Our feature importance analysis identified several clinically intuitive social determinants of health (SDOH) that potentially contribute to disparities in treatment. Our findings indicate that significant disparities exist among breast cancer patients receiving different types of therapies based on social determinants of health. Social work reports play a crucial role in understanding these disparities in clinical decision-making.
△ Less
Submitted 16 June, 2023;
originally announced June 2023.
-
Of Mice and Mates: Automated Classification and Modelling of Mouse Behaviour in Groups using a Single Model across Cages
Authors:
Michael P. J. Camilleri,
Rasneer S. Bains,
Christopher K. I. Williams
Abstract:
Behavioural experiments often happen in specialised arenas, but this may confound the analysis. To address this issue, we provide tools to study mice in the home-cage environment, equipping biologists with the possibility to capture the temporal aspect of the individual's behaviour and model the interaction and interdependence between cage-mates with minimal human intervention. Our main contributi…
▽ More
Behavioural experiments often happen in specialised arenas, but this may confound the analysis. To address this issue, we provide tools to study mice in the home-cage environment, equipping biologists with the possibility to capture the temporal aspect of the individual's behaviour and model the interaction and interdependence between cage-mates with minimal human intervention. Our main contribution is the novel Group Behaviour Model (GBM) which summarises the joint behaviour of groups of mice across cages, using a permutation matrix to match the mouse identities in each cage to the model. In support of the above, we also (a) developed the Activity Labelling Module (ALM) to automatically classify mouse behaviour from video, and (b) released two datasets, ABODe for training behaviour classifiers and IMADGE for modelling behaviour.
△ Less
Submitted 24 June, 2024; v1 submitted 5 June, 2023;
originally announced June 2023.
-
DeepScribe: Localization and Classification of Elamite Cuneiform Signs Via Deep Learning
Authors:
Edward C. Williams,
Grace Su,
Sandra R. Schloen,
Miller C. Prosser,
Susanne Paulus,
Sanjay Krishnan
Abstract:
Twenty-five hundred years ago, the paperwork of the Achaemenid Empire was recorded on clay tablets. In 1933, archaeologists from the University of Chicago's Oriental Institute (OI) found tens of thousands of these tablets and fragments during the excavation of Persepolis. Many of these tablets have been painstakingly photographed and annotated by expert cuneiformists, and now provide a rich datase…
▽ More
Twenty-five hundred years ago, the paperwork of the Achaemenid Empire was recorded on clay tablets. In 1933, archaeologists from the University of Chicago's Oriental Institute (OI) found tens of thousands of these tablets and fragments during the excavation of Persepolis. Many of these tablets have been painstakingly photographed and annotated by expert cuneiformists, and now provide a rich dataset consisting of over 5,000 annotated tablet images and 100,000 cuneiform sign bounding boxes. We leverage this dataset to develop DeepScribe, a modular computer vision pipeline capable of localizing cuneiform signs and providing suggestions for the identity of each sign. We investigate the difficulty of learning subtasks relevant to cuneiform tablet transcription on ground-truth data, finding that a RetinaNet object detector can achieve a localization mAP of 0.78 and a ResNet classifier can achieve a top-5 sign classification accuracy of 0.89. The end-to-end pipeline achieves a top-5 classification accuracy of 0.80. As part of the classification module, DeepScribe groups cuneiform signs into morphological clusters. We consider how this automatic clustering approach differs from the organization of standard, printed sign lists and what we may learn from it. These components, trained individually, are sufficient to produce a system that can analyze photos of cuneiform tablets from the Achaemenid period and provide useful transliteration suggestions to researchers. We evaluate the model's end-to-end performance on locating and classifying signs, providing a roadmap to a linguistically-aware transliteration system, then consider the model's potential utility when applied to other periods of cuneiform writing.
△ Less
Submitted 1 February, 2025; v1 submitted 2 June, 2023;
originally announced June 2023.
-
A Unified Framework for U-Net Design and Analysis
Authors:
Christopher Williams,
Fabian Falck,
George Deligiannidis,
Chris Holmes,
Arnaud Doucet,
Saifuddin Syed
Abstract:
U-Nets are a go-to, state-of-the-art neural architecture across numerous tasks for continuous signals on a square such as images and Partial Differential Equations (PDE), however their design and architecture is understudied. In this paper, we provide a framework for designing and analysing general U-Net architectures. We present theoretical results which characterise the role of the encoder and d…
▽ More
U-Nets are a go-to, state-of-the-art neural architecture across numerous tasks for continuous signals on a square such as images and Partial Differential Equations (PDE), however their design and architecture is understudied. In this paper, we provide a framework for designing and analysing general U-Net architectures. We present theoretical results which characterise the role of the encoder and decoder in a U-Net, their high-resolution scaling limits and their conjugacy to ResNets via preconditioning. We propose Multi-ResNets, U-Nets with a simplified, wavelet-based encoder without learnable parameters. Further, we show how to design novel U-Net architectures which encode function constraints, natural bases, or the geometry of the data. In diffusion models, our framework enables us to identify that high-frequency information is dominated by noise exponentially faster, and show how U-Nets with average pooling exploit this. In our experiments, we demonstrate how Multi-ResNets achieve competitive and often superior performance compared to classical U-Nets in image segmentation, PDE surrogate modelling, and generative modelling with diffusion models. Our U-Net framework paves the way to study the theoretical properties of U-Nets and design natural, scalable neural architectures for a multitude of problems beyond the square.
△ Less
Submitted 10 January, 2024; v1 submitted 31 May, 2023;
originally announced May 2023.
-
Shifted-Windows Transformers for the Detection of Cerebral Aneurysms in Microsurgery
Authors:
Jinfan Zhou,
William Muirhead,
Simon C. Williams,
Danail Stoyanov,
Hani J. Marcus,
Evangelos B. Mazomenos
Abstract:
Purpose: Microsurgical Aneurysm Clipping Surgery (MACS) carries a high risk for intraoperative aneurysm rupture. Automated recognition of instances when the aneurysm is exposed in the surgical video would be a valuable reference point for neuronavigation, indicating phase transitioning and more importantly designating moments of high risk for rupture. This article introduces the MACS dataset conta…
▽ More
Purpose: Microsurgical Aneurysm Clipping Surgery (MACS) carries a high risk for intraoperative aneurysm rupture. Automated recognition of instances when the aneurysm is exposed in the surgical video would be a valuable reference point for neuronavigation, indicating phase transitioning and more importantly designating moments of high risk for rupture. This article introduces the MACS dataset containing 16 surgical videos with frame-level expert annotations and proposes a learning methodology for surgical scene understanding identifying video frames with the aneurysm present in the operating microscope's field-of-view. Methods: Despite the dataset imbalance (80% no presence, 20% presence) and developed without explicit annotations, we demonstrate the applicability of Transformer-based deep learning architectures (MACSSwin-T, vidMACSSwin-T) to detect the aneurysm and classify MACS frames accordingly. We evaluate the proposed models in multiple-fold cross-validation experiments with independent sets and in an unseen set of 15 images against 10 human experts (neurosurgeons). Results: Average (across folds) accuracy of 80.8% (range 78.5%-82.4%) and 87.1% (range 85.1%-91.3%) is obtained for the image- and video-level approach respectively, demonstrating that the models effectively learn the classification task. Qualitative evaluation of the models' class activation maps show these to be localized on the aneurysm's actual location. Depending on the decision threshold, MACSWin-T achieves 66.7% to 86.7% accuracy in the unseen images, compared to 82% of human raters, with moderate to strong correlation.
△ Less
Submitted 16 March, 2023;
originally announced March 2023.
-
Augmenting Pathologists with NaviPath: Design and Evaluation of a Human-AI Collaborative Navigation System
Authors:
Hongyan Gu,
Chunxu Yang,
Mohammad Haeri,
Jing Wang,
Shirley Tang,
Wenzhong Yan,
Shujin He,
Christopher Kazu Williams,
Shino Magaki,
Xiang 'Anthony' Chen
Abstract:
Artificial Intelligence (AI) brings advancements to support pathologists in navigating high-resolution tumor images to search for pathology patterns of interest. However, existing AI-assisted tools have not realized this promised potential due to a lack of insight into pathology and HCI considerations for pathologists' navigation workflows in practice. We first conducted a formative study with six…
▽ More
Artificial Intelligence (AI) brings advancements to support pathologists in navigating high-resolution tumor images to search for pathology patterns of interest. However, existing AI-assisted tools have not realized this promised potential due to a lack of insight into pathology and HCI considerations for pathologists' navigation workflows in practice. We first conducted a formative study with six medical professionals in pathology to capture their navigation strategies. By incorporating our observations along with the pathologists' domain knowledge, we designed NaviPath -- a human-AI collaborative navigation system. An evaluation study with 15 medical professionals in pathology indicated that: (i) compared to the manual navigation, participants saw more than twice the number of pathological patterns in unit time with NaviPath, and (ii) participants achieved higher precision and recall against the AI and the manual navigation on average. Further qualitative analysis revealed that navigation was more consistent with NaviPath, which can improve the overall examination quality.
△ Less
Submitted 14 February, 2023;
originally announced February 2023.
-
Structured Generative Models for Scene Understanding
Authors:
Christopher K. I. Williams
Abstract:
This position paper argues for the use of \emph{structured generative models} (SGMs) for the understanding of static scenes. This requires the reconstruction of a 3D scene from an input image (or a set of multi-view images), whereby the contents of the image(s) are causally explained in terms of models of instantiated objects, each with their own type, shape, appearance and pose, along with global…
▽ More
This position paper argues for the use of \emph{structured generative models} (SGMs) for the understanding of static scenes. This requires the reconstruction of a 3D scene from an input image (or a set of multi-view images), whereby the contents of the image(s) are causally explained in terms of models of instantiated objects, each with their own type, shape, appearance and pose, along with global variables like scene lighting and camera parameters. This approach also requires scene models which account for the co-occurrences and inter-relationships of objects in a scene. The SGM approach has the merits that it is compositional and generative, which lead to interpretability and editability. \\\\ To pursue the SGM agenda, we need models for objects and scenes, and approaches to carry out inference. We first review models for objects, which include ``things'' (object categories that have a well defined shape), and ``stuff'' (categories which have amorphous spatial extent). We then move on to review \emph{scene models} which describe the inter-relationships of objects. Perhaps the most challenging problem for SGMs is \emph{inference} of the objects, lighting and camera parameters, and scene inter-relationships from input consisting of a single or multiple images. We conclude with a discussion of issues that need addressing to advance the SGM agenda.
△ Less
Submitted 2 September, 2024; v1 submitted 7 February, 2023;
originally announced February 2023.
-
A Multi-Resolution Framework for U-Nets with Applications to Hierarchical VAEs
Authors:
Fabian Falck,
Christopher Williams,
Dominic Danks,
George Deligiannidis,
Christopher Yau,
Chris Holmes,
Arnaud Doucet,
Matthew Willetts
Abstract:
U-Net architectures are ubiquitous in state-of-the-art deep learning, however their regularisation properties and relationship to wavelets are understudied. In this paper, we formulate a multi-resolution framework which identifies U-Nets as finite-dimensional truncations of models on an infinite-dimensional function space. We provide theoretical results which prove that average pooling corresponds…
▽ More
U-Net architectures are ubiquitous in state-of-the-art deep learning, however their regularisation properties and relationship to wavelets are understudied. In this paper, we formulate a multi-resolution framework which identifies U-Nets as finite-dimensional truncations of models on an infinite-dimensional function space. We provide theoretical results which prove that average pooling corresponds to projection within the space of square-integrable functions and show that U-Nets with average pooling implicitly learn a Haar wavelet basis representation of the data. We then leverage our framework to identify state-of-the-art hierarchical VAEs (HVAEs), which have a U-Net architecture, as a type of two-step forward Euler discretisation of multi-resolution diffusion processes which flow from a point mass, introducing sampling instabilities. We also demonstrate that HVAEs learn a representation of time which allows for improved parameter efficiency through weight-sharing. We use this observation to achieve state-of-the-art HVAE performance with half the number of parameters of existing models, exploiting the properties of our continuous-time formulation.
△ Less
Submitted 19 January, 2023;
originally announced January 2023.
-
P4P: Conflict-Aware Motion Prediction for Planning in Autonomous Driving
Authors:
Qiao Sun,
Xin Huang,
Brian C. Williams,
Hang Zhao
Abstract:
Motion prediction is crucial in enabling safe motion planning for autonomous vehicles in interactive scenarios. It allows the planner to identify potential conflicts with other traffic agents and generate safe plans. Existing motion predictors often focus on reducing prediction errors, yet it remains an open question on how well they help identify the conflicts for the planner. In this paper, we e…
▽ More
Motion prediction is crucial in enabling safe motion planning for autonomous vehicles in interactive scenarios. It allows the planner to identify potential conflicts with other traffic agents and generate safe plans. Existing motion predictors often focus on reducing prediction errors, yet it remains an open question on how well they help identify the conflicts for the planner. In this paper, we evaluate state-of-the-art predictors through novel conflict-related metrics, such as the success rate of identifying conflicts. Surprisingly, the predictors suffer from a low success rate and thus lead to a large percentage of collisions when we test the prediction-planning system in an interactive simulator. To fill the gap, we propose a simple but effective alternative that combines a physics-based trajectory generator and a learning-based relation predictor to identify conflicts and infer conflict relations. We demonstrate that our predictor, P4P, achieves superior performance over existing learning-based predictors in realistic interactive driving scenarios from Waymo Open Motion Dataset.
△ Less
Submitted 3 November, 2022;
originally announced November 2022.
-
AI Assistants: A Framework for Semi-Automated Data Wrangling
Authors:
Tomas Petricek,
Gerrit J. J. van den Burg,
Alfredo Nazábal,
Taha Ceritli,
Ernesto Jiménez-Ruiz,
Christopher K. I. Williams
Abstract:
Data wrangling tasks such as obtaining and linking data from various sources, transforming data formats, and correcting erroneous records, can constitute up to 80% of typical data engineering work. Despite the rise of machine learning and artificial intelligence, data wrangling remains a tedious and manual task. We introduce AI assistants, a class of semi-automatic interactive tools to streamline…
▽ More
Data wrangling tasks such as obtaining and linking data from various sources, transforming data formats, and correcting erroneous records, can constitute up to 80% of typical data engineering work. Despite the rise of machine learning and artificial intelligence, data wrangling remains a tedious and manual task. We introduce AI assistants, a class of semi-automatic interactive tools to streamline data wrangling. An AI assistant guides the analyst through a specific data wrangling task by recommending a suitable data transformation that respects the constraints obtained through interaction with the analyst.
We formally define the structure of AI assistants and describe how existing tools that treat data cleaning as an optimization problem fit the definition. We implement AI assistants for four common data wrangling tasks and make AI assistants easily accessible to data analysts in an open-source notebook environment for data science, by leveraging the common structure they follow. We evaluate our AI assistants both quantitatively and qualitatively through three example scenarios. We show that the unified and interactive design makes it easy to perform tasks that would be difficult to do manually or with a fully automatic tool.
△ Less
Submitted 31 October, 2022;
originally announced November 2022.
-
Incorporating Crowdsourced Annotator Distributions into Ensemble Modeling to Improve Classification Trustworthiness for Ancient Greek Papyri
Authors:
Graham West,
Matthew I. Swindall,
Ben Keener,
Timothy Player,
Alex C. Williams,
James H. Brusuelas,
John F. Wallin
Abstract:
Performing classification on noisy, crowdsourced image datasets can prove challenging even for the best neural networks. Two issues which complicate the problem on such datasets are class imbalance and ground-truth uncertainty in labeling. The AL-ALL and AL-PUB datasets - consisting of tightly cropped, individual characters from images of ancient Greek papyri - are strongly affected by both issues…
▽ More
Performing classification on noisy, crowdsourced image datasets can prove challenging even for the best neural networks. Two issues which complicate the problem on such datasets are class imbalance and ground-truth uncertainty in labeling. The AL-ALL and AL-PUB datasets - consisting of tightly cropped, individual characters from images of ancient Greek papyri - are strongly affected by both issues. The application of ensemble modeling to such datasets can help identify images where the ground-truth is questionable and quantify the trustworthiness of those samples. As such, we apply stacked generalization consisting of nearly identical ResNets with different loss functions: one utilizing sparse cross-entropy (CXE) and the other Kullback-Liebler Divergence (KLD). Both networks use labels drawn from a crowd-sourced consensus. This consensus is derived from a Normalized Distribution of Annotations (NDA) based on all annotations for a given character in the dataset. For the second network, the KLD is calculated with respect to the NDA. For our ensemble model, we apply a k-nearest neighbors model to the outputs of the CXE and KLD networks. Individually, the ResNet models have approximately 93% accuracy, while the ensemble model achieves an accuracy of > 95%, increasing the classification trustworthiness. We also perform an analysis of the Shannon entropy of the various models' output distributions to measure classification uncertainty. Our results suggest that entropy is useful for predicting model misclassifications.
△ Less
Submitted 26 January, 2024; v1 submitted 28 October, 2022;
originally announced October 2022.
-
InterSim: Interactive Traffic Simulation via Explicit Relation Modeling
Authors:
Qiao Sun,
Xin Huang,
Brian C. Williams,
Hang Zhao
Abstract:
Interactive traffic simulation is crucial to autonomous driving systems by enabling testing for planners in a more scalable and safe way compared to real-world road testing. Existing approaches learn an agent model from large-scale driving data to simulate realistic traffic scenarios, yet it remains an open question to produce consistent and diverse multi-agent interactive behaviors in crowded sce…
▽ More
Interactive traffic simulation is crucial to autonomous driving systems by enabling testing for planners in a more scalable and safe way compared to real-world road testing. Existing approaches learn an agent model from large-scale driving data to simulate realistic traffic scenarios, yet it remains an open question to produce consistent and diverse multi-agent interactive behaviors in crowded scenes. In this work, we present InterSim, an interactive traffic simulator for testing autonomous driving planners. Given a test plan trajectory from the ego agent, InterSim reasons about the interaction relations between the agents in the scene and generates realistic trajectories for each environment agent that are consistent with the relations. We train and validate our model on a large-scale interactive driving dataset. Experiment results show that InterSim achieves better simulation realism and reactivity in two simulation tasks compared to a state-of-the-art learning-based traffic simulator.
△ Less
Submitted 25 October, 2022;
originally announced October 2022.
-
The Elliptical Quartic Exponential Distribution: An Annular Distribution Obtained via Maximum Entropy
Authors:
Christopher K I Williams
Abstract:
This paper describes the Elliptical Quartic Exponential distribution in $\mathbb{R}^D$, obtained via a maximum entropy construction by imposing second and fourth moment constraints. I discuss relationships to related work, analytical expressions for the normalization constant and the entropy, and the conditional and marginal distributions.
This paper describes the Elliptical Quartic Exponential distribution in $\mathbb{R}^D$, obtained via a maximum entropy construction by imposing second and fourth moment constraints. I discuss relationships to related work, analytical expressions for the normalization constant and the entropy, and the conditional and marginal distributions.
△ Less
Submitted 9 October, 2022;
originally announced October 2022.
-
Multi-Task Dynamical Systems
Authors:
Alex Bird,
Christopher K. I. Williams,
Christopher Hawthorne
Abstract:
Time series datasets are often composed of a variety of sequences from the same domain, but from different entities, such as individuals, products, or organizations. We are interested in how time series models can be specialized to individual sequences (capturing the specific characteristics) while still retaining statistical power by sharing commonalities across the sequences. This paper describe…
▽ More
Time series datasets are often composed of a variety of sequences from the same domain, but from different entities, such as individuals, products, or organizations. We are interested in how time series models can be specialized to individual sequences (capturing the specific characteristics) while still retaining statistical power by sharing commonalities across the sequences. This paper describes the multi-task dynamical system (MTDS); a general methodology for extending multi-task learning (MTL) to time series models. Our approach endows dynamical systems with a set of hierarchical latent variables which can modulate all model parameters. To our knowledge, this is a novel development of MTL, and applies to time series both with and without control inputs. We apply the MTDS to motion-capture data of people walking in various styles using a multi-task recurrent neural network (RNN), and to patient drug-response data using a multi-task pharmacodynamic model.
△ Less
Submitted 8 October, 2022;
originally announced October 2022.
-
Hardware Trojan Threats to Cache Coherence in Modern 2.5D Chiplet Systems
Authors:
Gino A. Chacon,
Charles Williams,
Johann Knechtel,
Ozgur Sinanoglu,
Paul V. Gratz
Abstract:
As industry moves toward chiplet-based designs, the insertion of hardware Trojans poses a significant threat to the security of these systems. These systems rely heavily on cache coherence for coherent data communication, making coherence an attractive target. Critically, unlike prior work, which focuses only on malicious packet modifications, a Trojan attack that exploits coherence can modify dat…
▽ More
As industry moves toward chiplet-based designs, the insertion of hardware Trojans poses a significant threat to the security of these systems. These systems rely heavily on cache coherence for coherent data communication, making coherence an attractive target. Critically, unlike prior work, which focuses only on malicious packet modifications, a Trojan attack that exploits coherence can modify data in memory that was never touched and is not owned by the chiplet which contains the Trojan. Further, the Trojan need not even be physically between the victim and the memory controller to attack the victim's memory transactions. Here, we explore the fundamental attack vectors possible in chiplet-based systems and provide an example Trojan implementation capable of directly modifying victim data in memory. This work aims to highlight the need for developing mechanisms that can protect and secure the coherence scheme from these forms of attacks.
△ Less
Submitted 30 September, 2022;
originally announced October 2022.
-
Inference and Learning for Generative Capsule Models
Authors:
Alfredo Nazabal,
Nikolaos Tsagkas,
Christopher K. I. Williams
Abstract:
Capsule networks (see e.g. Hinton et al., 2018) aim to encode knowledge of and reason about the relationship between an object and its parts. In this paper we specify a generative model for such data, and derive a variational algorithm for inferring the transformation of each model object in a scene, and the assignments of observed parts to the objects. We derive a learning algorithm for the objec…
▽ More
Capsule networks (see e.g. Hinton et al., 2018) aim to encode knowledge of and reason about the relationship between an object and its parts. In this paper we specify a generative model for such data, and derive a variational algorithm for inferring the transformation of each model object in a scene, and the assignments of observed parts to the objects. We derive a learning algorithm for the object models, based on variational expectation maximization (Jordan et al., 1999). We also study an alternative inference algorithm based on the RANSAC method of Fischler and Bolles (1981). We apply these inference methods to (i) data generated from multiple geometric objects like squares and triangles ("constellations"), and (ii) data from a parts-based model of faces. Recent work by Kosiorek et al. (2019) has used amortized inference via stacked capsule autoencoders (SCAEs) to tackle this problem -- our results show that we significantly outperform them where we can make comparisons (on the constellations data).
△ Less
Submitted 21 October, 2022; v1 submitted 7 September, 2022;
originally announced September 2022.
-
Detecting Mitoses with a Convolutional Neural Network for MIDOG 2022 Challenge
Authors:
Hongyan Gu,
Mohammad Haeri,
Shuo Ni,
Christopher Kazu Williams,
Neda Zarrin-Khameh,
Shino Magaki,
Xiang 'Anthony' Chen
Abstract:
This work presents a mitosis detection method with only one vanilla Convolutional Neural Network (CNN). Our method consists of two steps: given an image, we first apply a CNN using a sliding window technique to extract patches that have mitoses; we then calculate each extracted patch's class activation map to obtain the mitosis's precise location. To increase the model performance on high-domain-v…
▽ More
This work presents a mitosis detection method with only one vanilla Convolutional Neural Network (CNN). Our method consists of two steps: given an image, we first apply a CNN using a sliding window technique to extract patches that have mitoses; we then calculate each extracted patch's class activation map to obtain the mitosis's precise location. To increase the model performance on high-domain-variance pathology images, we train the CNN with a data augmentation pipeline, a noise-tolerant loss that copes with unlabeled images, and a multi-rounded active learning strategy. In the MIDOG 2022 challenge, our approach, with an EfficientNet-b3 CNN model, achieved an overall F1 score of 0.7323 in the preliminary test phase, and 0.6847 in the final test phase (task 1). Our approach sheds light on the broader applicability of class activation maps for object detections in pathology images.
△ Less
Submitted 30 October, 2022; v1 submitted 26 August, 2022;
originally announced August 2022.
-
Hierarchical Constrained Stochastic Shortest Path Planning via Cost Budget Allocation
Authors:
Sungkweon Hong,
Brian C. Williams
Abstract:
Stochastic sequential decision making often requires hierarchical structure in the problem where each high-level action should be further planned with primitive states and actions. In addition, many real-world applications require a plan that satisfies constraints on the secondary costs such as risk measure or fuel consumption. In this paper, we propose a hierarchical constrained stochastic shorte…
▽ More
Stochastic sequential decision making often requires hierarchical structure in the problem where each high-level action should be further planned with primitive states and actions. In addition, many real-world applications require a plan that satisfies constraints on the secondary costs such as risk measure or fuel consumption. In this paper, we propose a hierarchical constrained stochastic shortest path problem (HC-SSP) that meets those two crucial requirements in a single framework. Although HC-SSP provides a useful framework to model such planning requirements in many real-world applications, the resulting problem has high complexity and makes it difficult to find an optimal solution fast which prevents user from applying it to real-time and risk-sensitive applications. To address this problem, we present an algorithm that iteratively allocates cost budget to lower level planning problems based on branch-and-bound scheme to find a feasible solution fast and incrementally update the incumbent solution. We demonstrate the proposed algorithm in an evacuation scenario and prove the advantage over a state-of-the-art mathematical programming based approach.
△ Less
Submitted 10 May, 2022;
originally announced May 2022.
-
Surface Vision Transformers: Flexible Attention-Based Modelling of Biomedical Surfaces
Authors:
Simon Dahan,
Hao Xu,
Logan Z. J. Williams,
Abdulah Fawaz,
Chunhui Yang,
Timothy S. Coalson,
Michelle C. Williams,
David E. Newby,
A. David Edwards,
Matthew F. Glasser,
Alistair A. Young,
Daniel Rueckert,
Emma C. Robinson
Abstract:
Recent state-of-the-art performances of Vision Transformers (ViT) in computer vision tasks demonstrate that a general-purpose architecture, which implements long-range self-attention, could replace the local feature learning operations of convolutional neural networks. In this paper, we extend ViTs to surfaces by reformulating the task of surface learning as a sequence-to-sequence learning problem…
▽ More
Recent state-of-the-art performances of Vision Transformers (ViT) in computer vision tasks demonstrate that a general-purpose architecture, which implements long-range self-attention, could replace the local feature learning operations of convolutional neural networks. In this paper, we extend ViTs to surfaces by reformulating the task of surface learning as a sequence-to-sequence learning problem, by proposing patching mechanisms for general surface meshes. Sequences of patches are then processed by a transformer encoder and used for classification or regression. We validate our method on a range of different biomedical surface domains and tasks: brain age prediction in the developing Human Connectome Project (dHCP), fluid intelligence prediction in the Human Connectome Project (HCP), and coronary artery calcium score classification using surfaces from the Scottish Computed Tomography of the Heart (SCOT-HEART) dataset, and investigate the impact of pretraining and data augmentation on model performance. Results suggest that Surface Vision Transformers (SiT) demonstrate consistent improvement over geometric deep learning methods for brain age and fluid intelligence prediction and achieve comparable performance on calcium score classification to standard metrics used in clinical practice. Furthermore, analysis of transformer attention maps offers clear and individualised predictions of the features driving each task. Code is available on Github: https://github.com/metrics-lab/surface-vision-transformers
△ Less
Submitted 7 April, 2022;
originally announced April 2022.
-
On Suspicious Coincidences and Pointwise Mutual Information
Authors:
Christopher K. I. Williams
Abstract:
Barlow (1985) hypothesized that the co-occurrence of two events $A$ and $B$ is "suspicious" if $P(A,B) \gg P(A) P(B)$. We first review classical measures of association for $2 \times 2$ contingency tables, including Yule's $Y$ (Yule, 1912), which depends only on the odds ratio $λ$, and is independent of the marginal probabilities of the table. We then discuss the mutual information (MI) and pointw…
▽ More
Barlow (1985) hypothesized that the co-occurrence of two events $A$ and $B$ is "suspicious" if $P(A,B) \gg P(A) P(B)$. We first review classical measures of association for $2 \times 2$ contingency tables, including Yule's $Y$ (Yule, 1912), which depends only on the odds ratio $λ$, and is independent of the marginal probabilities of the table. We then discuss the mutual information (MI) and pointwise mutual information (PMI), which depend on the ratio $P(A,B)/P(A)P(B)$, as measures of association. We show that, once the effect of the marginals is removed, MI and PMI behave similarly to $Y$ as functions of $λ$. The pointwise mutual information is used extensively in some research communities for flagging suspicious coincidences, but it is important to bear in mind the sensitivity of the PMI to the marginals, with increased scores for sparser events.
△ Less
Submitted 2 March, 2023; v1 submitted 15 March, 2022;
originally announced March 2022.
-
Align-Deform-Subtract: An Interventional Framework for Explaining Object Differences
Authors:
Cian Eastwood,
Li Nanbo,
Christopher K. I. Williams
Abstract:
Given two object images, how can we explain their differences in terms of the underlying object properties? To address this question, we propose Align-Deform-Subtract (ADS) -- an interventional framework for explaining object differences. By leveraging semantic alignments in image-space as counterfactual interventions on the underlying object properties, ADS iteratively quantifies and removes diff…
▽ More
Given two object images, how can we explain their differences in terms of the underlying object properties? To address this question, we propose Align-Deform-Subtract (ADS) -- an interventional framework for explaining object differences. By leveraging semantic alignments in image-space as counterfactual interventions on the underlying object properties, ADS iteratively quantifies and removes differences in object properties. The result is a set of "disentangled" error measures which explain object differences in terms of the underlying properties. Experiments on real and synthetic data illustrate the efficacy of the framework.
△ Less
Submitted 20 July, 2022; v1 submitted 9 March, 2022;
originally announced March 2022.
-
Cooperative Task and Motion Planning for Multi-Arm Assembly Systems
Authors:
Jingkai Chen,
Jiaoyang Li,
Yijiang Huang,
Caelan Garrett,
Dawei Sun,
Chuchu Fan,
Andreas Hofmann,
Caitlin Mueller,
Sven Koenig,
Brian C. Williams
Abstract:
Multi-robot assembly systems are becoming increasingly appealing in manufacturing due to their ability to automatically, flexibly, and quickly construct desired structural designs. However, effectively planning for these systems in a manner that ensures each robot is simultaneously productive, and not idle, is challenging due to (1) the close proximity that the robots must operate in to manipulate…
▽ More
Multi-robot assembly systems are becoming increasingly appealing in manufacturing due to their ability to automatically, flexibly, and quickly construct desired structural designs. However, effectively planning for these systems in a manner that ensures each robot is simultaneously productive, and not idle, is challenging due to (1) the close proximity that the robots must operate in to manipulate the structure and (2) the inherent structural partial orderings on when each part can be installed. In this paper, we present a task and motion planning framework that jointly plans safe, low-makespan plans for a team of robots to assemble complex spatial structures. Our framework takes a hierarchical approach that, at the high level, uses Mixed-integer Linear Programs to compute an abstract plan comprised of an allocation of robots to tasks subject to precedence constraints and, at the low level, builds on a state-of-the-art algorithm for Multi-Agent Path Finding to plan collision-free robot motions that realize this abstract plan. Critical to our approach is the inclusion of certain collision constraints and movement durations during high-level planning, which better informs the search for abstract plans that are likely to be both feasible and low-makespan while keeping the search tractable. We demonstrate our planning system on several challenging assembly domains with several (sometimes heterogeneous) robots with grippers or suction plates for assembling structures with up to 23 objects involving Lego bricks, bars, plates, or irregularly shaped blocks.
△ Less
Submitted 4 March, 2022;
originally announced March 2022.
-
M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction
Authors:
Qiao Sun,
Xin Huang,
Junru Gu,
Brian C. Williams,
Hang Zhao
Abstract:
Predicting future motions of road participants is an important task for driving autonomously in urban scenes. Existing models excel at predicting marginal trajectories for single agents, yet it remains an open question to jointly predict scene compliant trajectories over multiple agents. The challenge is due to exponentially increasing prediction space as a function of the number of agents. In thi…
▽ More
Predicting future motions of road participants is an important task for driving autonomously in urban scenes. Existing models excel at predicting marginal trajectories for single agents, yet it remains an open question to jointly predict scene compliant trajectories over multiple agents. The challenge is due to exponentially increasing prediction space as a function of the number of agents. In this work, we exploit the underlying relations between interacting agents and decouple the joint prediction problem into marginal prediction problems. Our proposed approach M2I first classifies interacting agents as pairs of influencers and reactors, and then leverages a marginal prediction model and a conditional prediction model to predict trajectories for the influencers and reactors, respectively. The predictions from interacting agents are combined and selected according to their joint likelihoods. Experiments show that our simple but effective approach achieves state-of-the-art performance on the Waymo Open Motion Dataset interactive prediction benchmark.
△ Less
Submitted 27 March, 2022; v1 submitted 23 February, 2022;
originally announced February 2022.
-
Persistent Animal Identification Leveraging Non-Visual Markers
Authors:
Michael P. J. Camilleri,
Li Zhang,
Rasneer S. Bains,
Andrew Zisserman,
Christopher K. I. Williams
Abstract:
Our objective is to locate and provide a unique identifier for each mouse in a cluttered home-cage environment through time, as a precursor to automated behaviour recognition for biological research. This is a very challenging problem due to (i) the lack of distinguishing visual features for each mouse, and (ii) the close confines of the scene with constant occlusion, making standard visual tracki…
▽ More
Our objective is to locate and provide a unique identifier for each mouse in a cluttered home-cage environment through time, as a precursor to automated behaviour recognition for biological research. This is a very challenging problem due to (i) the lack of distinguishing visual features for each mouse, and (ii) the close confines of the scene with constant occlusion, making standard visual tracking approaches unusable. However, a coarse estimate of each mouse's location is available from a unique RFID implant, so there is the potential to optimally combine information from (weak) tracking with coarse information on identity. To achieve our objective, we make the following key contributions: (a) the formulation of the object identification problem as an assignment problem (solved using Integer Linear Programming), and (b) a novel probabilistic model of the affinity between tracklets and RFID data. The latter is a crucial part of the model, as it provides a principled probabilistic treatment of object detections given coarse localisation. Our approach achieves 77% accuracy on this animal identification problem, and is able to reject spurious detections when the animals are hidden.
△ Less
Submitted 19 July, 2023; v1 submitted 13 December, 2021;
originally announced December 2021.
-
Identifying the Units of Measurement in Tabular Data
Authors:
Taha Ceritli,
Christopher K. I. Williams
Abstract:
We consider the problem of identifying the units of measurement in a data column that contains both numeric values and unit symbols in each row, e.g., "5.2 l", "7 pints". In this case we seek to identify the dimension of the column (e.g. volume) and relate the unit symbols to valid units (e.g. litre, pint) obtained from a knowledge graph. Below we present PUC, a Probabilistic Unit Canonicalizer th…
▽ More
We consider the problem of identifying the units of measurement in a data column that contains both numeric values and unit symbols in each row, e.g., "5.2 l", "7 pints". In this case we seek to identify the dimension of the column (e.g. volume) and relate the unit symbols to valid units (e.g. litre, pint) obtained from a knowledge graph. Below we present PUC, a Probabilistic Unit Canonicalizer that can accurately identify the units of measurement, extract semantic descriptions of quantitative data columns and canonicalize their entries. We present the first messy real-world tabular datasets annotated for units of measurement, which can enable and accelerate the research in this area. Our experiments on these datasets show that PUC achieves better results than existing solutions.
△ Less
Submitted 23 November, 2021;
originally announced November 2021.
-
ptype-cat: Inferring the Type and Values of Categorical Variables
Authors:
Taha Ceritli,
Christopher K. I. Williams
Abstract:
Type inference is the task of identifying the type of values in a data column and has been studied extensively in the literature. Most existing type inference methods support data types such as Boolean, date, float, integer and string. However, these methods do not consider non-Boolean categorical variables, where there are more than two possible values encoded by integers or strings. Therefore, s…
▽ More
Type inference is the task of identifying the type of values in a data column and has been studied extensively in the literature. Most existing type inference methods support data types such as Boolean, date, float, integer and string. However, these methods do not consider non-Boolean categorical variables, where there are more than two possible values encoded by integers or strings. Therefore, such columns are annotated either as integer or string rather than categorical, and need to be transformed into categorical manually by the user. In this paper, we propose a probabilistic type inference method that can identify the general categorical data type (including non-Boolean variables). Additionally, we identify the possible values of each categorical variable by adapting the existing type inference method ptype. Combining these methods, we present ptype-cat which achieves better results than existing applicable solutions.
△ Less
Submitted 23 November, 2021;
originally announced November 2021.
-
TIP: Task-Informed Motion Prediction for Intelligent Vehicles
Authors:
Xin Huang,
Guy Rosman,
Ashkan Jasour,
Stephen G. McGill,
John J. Leonard,
Brian C. Williams
Abstract:
When predicting trajectories of road agents, motion predictors usually approximate the future distribution by a limited number of samples. This constraint requires the predictors to generate samples that best support the task given task specifications. However, existing predictors are often optimized and evaluated via task-agnostic measures without accounting for the use of predictions in downstre…
▽ More
When predicting trajectories of road agents, motion predictors usually approximate the future distribution by a limited number of samples. This constraint requires the predictors to generate samples that best support the task given task specifications. However, existing predictors are often optimized and evaluated via task-agnostic measures without accounting for the use of predictions in downstream tasks, and thus could result in sub-optimal task performance.
In this paper, we propose a task-informed motion prediction model that better supports the tasks through its predictions, by jointly reasoning about prediction accuracy and the utility of the downstream tasks, which is commonly used to evaluate the task performance. The task utility function does not require the full task information, but rather a specification of the utility of the task, resulting in predictors that serve a wide range of downstream tasks. We demonstrate our approach on two use cases of common decision making tasks and their utility functions, in the context of autonomous driving and parallel autonomy. Experiment results show that our predictor produces accurate predictions that improve the task performance by a large margin in both tasks when compared to task-agnostic baselines on the Waymo Open Motion dataset.
△ Less
Submitted 26 May, 2022; v1 submitted 17 October, 2021;
originally announced October 2021.
-
HYPER: Learned Hybrid Trajectory Prediction via Factored Inference and Adaptive Sampling
Authors:
Xin Huang,
Guy Rosman,
Igor Gilitschenski,
Ashkan Jasour,
Stephen G. McGill,
John J. Leonard,
Brian C. Williams
Abstract:
Modeling multi-modal high-level intent is important for ensuring diversity in trajectory prediction. Existing approaches explore the discrete nature of human intent before predicting continuous trajectories, to improve accuracy and support explainability. However, these approaches often assume the intent to remain fixed over the prediction horizon, which is problematic in practice, especially over…
▽ More
Modeling multi-modal high-level intent is important for ensuring diversity in trajectory prediction. Existing approaches explore the discrete nature of human intent before predicting continuous trajectories, to improve accuracy and support explainability. However, these approaches often assume the intent to remain fixed over the prediction horizon, which is problematic in practice, especially over longer horizons. To overcome this limitation, we introduce HYPER, a general and expressive hybrid prediction framework that models evolving human intent. By modeling traffic agents as a hybrid discrete-continuous system, our approach is capable of predicting discrete intent changes over time. We learn the probabilistic hybrid model via a maximum likelihood estimation problem and leverage neural proposal distributions to sample adaptively from the exponentially growing discrete space. The overall approach affords a better trade-off between accuracy and coverage. We train and validate our model on the Argoverse dataset, and demonstrate its effectiveness through comprehensive ablation studies and comparisons with state-of-the-art models.
△ Less
Submitted 5 October, 2021;
originally announced October 2021.
-
Fast nonlinear risk assessment for autonomous vehicles using learned conditional probabilistic models of agent futures
Authors:
Ashkan Jasour,
Xin Huang,
Allen Wang,
Brian C. Williams
Abstract:
This paper presents fast non-sampling based methods to assess the risk for trajectories of autonomous vehicles when probabilistic predictions of other agents' futures are generated by deep neural networks (DNNs). The presented methods address a wide range of representations for uncertain predictions including both Gaussian and non-Gaussian mixture models to predict both agent positions and control…
▽ More
This paper presents fast non-sampling based methods to assess the risk for trajectories of autonomous vehicles when probabilistic predictions of other agents' futures are generated by deep neural networks (DNNs). The presented methods address a wide range of representations for uncertain predictions including both Gaussian and non-Gaussian mixture models to predict both agent positions and control inputs conditioned on the scene contexts. We show that the problem of risk assessment when Gaussian mixture models (GMMs) of agent positions are learned can be solved rapidly to arbitrary levels of accuracy with existing numerical methods. To address the problem of risk assessment for non-Gaussian mixture models of agent position, we propose finding upper bounds on risk using nonlinear Chebyshev's Inequality and sums-of-squares (SOS) programming; they are both of interest as the former is much faster while the latter can be arbitrarily tight. These approaches only require higher order statistical moments of agent positions to determine upper bounds on risk. To perform risk assessment when models are learned for agent control inputs as opposed to positions, we propagate the moments of uncertain control inputs through the nonlinear motion dynamics to obtain the exact moments of uncertain position over the planning horizon. To this end, we construct deterministic linear dynamical systems that govern the exact time evolution of the moments of uncertain position in the presence of uncertain control inputs. The presented methods are demonstrated on realistic predictions from DNNs trained on the Argoverse and CARLA datasets and are shown to be effective for rapidly assessing the probability of low probability events.
△ Less
Submitted 22 September, 2021; v1 submitted 21 September, 2021;
originally announced September 2021.
-
Source-Free Adaptation to Measurement Shift via Bottom-Up Feature Restoration
Authors:
Cian Eastwood,
Ian Mason,
Christopher K. I. Williams,
Bernhard Schölkopf
Abstract:
Source-free domain adaptation (SFDA) aims to adapt a model trained on labelled data in a source domain to unlabelled data in a target domain without access to the source-domain data during adaptation. Existing methods for SFDA leverage entropy-minimization techniques which: (i) apply only to classification; (ii) destroy model calibration; and (iii) rely on the source model achieving a good level o…
▽ More
Source-free domain adaptation (SFDA) aims to adapt a model trained on labelled data in a source domain to unlabelled data in a target domain without access to the source-domain data during adaptation. Existing methods for SFDA leverage entropy-minimization techniques which: (i) apply only to classification; (ii) destroy model calibration; and (iii) rely on the source model achieving a good level of feature-space class-separation in the target domain. We address these issues for a particularly pervasive type of domain shift called measurement shift which can be resolved by restoring the source features rather than extracting new ones. In particular, we propose Feature Restoration (FR) wherein we: (i) store a lightweight and flexible approximation of the feature distribution under the source data; and (ii) adapt the feature-extractor such that the approximate feature distribution under the target data realigns with that saved on the source. We additionally propose a bottom-up training scheme which boosts performance, which we call Bottom-Up Feature Restoration (BUFR). On real and synthetic data, we demonstrate that BUFR outperforms existing SFDA methods in terms of accuracy, calibration, and data efficiency, while being less reliant on the performance of the source model in the target domain.
△ Less
Submitted 17 March, 2022; v1 submitted 12 July, 2021;
originally announced July 2021.
-
On Memorization in Probabilistic Deep Generative Models
Authors:
Gerrit J. J. van den Burg,
Christopher K. I. Williams
Abstract:
Recent advances in deep generative models have led to impressive results in a variety of application domains. Motivated by the possibility that deep learning models might memorize part of the input data, there have been increased efforts to understand how memorization arises. In this work, we extend a recently proposed measure of memorization for supervised learning (Feldman, 2019) to the unsuperv…
▽ More
Recent advances in deep generative models have led to impressive results in a variety of application domains. Motivated by the possibility that deep learning models might memorize part of the input data, there have been increased efforts to understand how memorization arises. In this work, we extend a recently proposed measure of memorization for supervised learning (Feldman, 2019) to the unsupervised density estimation problem and adapt it to be more computationally efficient. Next, we present a study that demonstrates how memorization can occur in probabilistic deep generative models such as variational autoencoders. This reveals that the form of memorization to which these models are susceptible differs fundamentally from mode collapse and overfitting. Furthermore, we show that the proposed memorization score measures a phenomenon that is not captured by commonly-used nearest neighbor tests. Finally, we discuss several strategies that can be used to limit memorization in practice. Our work thus provides a framework for understanding problematic memorization in probabilistic generative models.
△ Less
Submitted 29 December, 2021; v1 submitted 6 June, 2021;
originally announced June 2021.