-
Ethics Statements in AI Music Papers: The Effective and the Ineffective
Authors:
Julia Barnett,
Patrick O'Reilly,
Jason Brent Smith,
Annie Chu,
Bryan Pardo
Abstract:
While research in AI methods for music generation and analysis has grown in scope and impact, AI researchers' engagement with the ethical consequences of this work has not kept pace. To encourage such engagement, many publication venues have introduced optional or required ethics statements for AI research papers. Though some authors use these ethics statements to critically engage with the broader implications of their research, we find that the majority of ethics statements in the AI music literature do not appear to be effectively utilized for this purpose. In this work, we conduct a review of ethics statements across ISMIR, NIME, and selected prominent works in AI music from the past five years. We then offer suggestions for both audio conferences and researchers for engaging with ethics statements in ways that foster meaningful reflection rather than formulaic compliance.
Submitted 29 September, 2025;
originally announced September 2025.
-
The Rhythm In Anything: Audio-Prompted Drums Generation with Masked Language Modeling
Authors:
Patrick O'Reilly,
Julia Barnett,
Hugo Flores García,
Annie Chu,
Nathan Pruyne,
Prem Seetharaman,
Bryan Pardo
Abstract:
Musicians and nonmusicians alike use rhythmic sound gestures, such as tapping and beatboxing, to express drum patterns. While these gestures effectively communicate musical ideas, realizing these ideas as fully-produced drum recordings can be time-consuming, potentially disrupting many creative workflows. To bridge this gap, we present TRIA (The Rhythm In Anything), a masked transformer model for mapping rhythmic sound gestures to high-fidelity drum recordings. Given an audio prompt of the desired rhythmic pattern and a second prompt to represent drumkit timbre, TRIA produces audio of a drumkit playing the desired rhythm (with appropriate elaborations) in the desired timbre. Subjective and objective evaluations show that a TRIA model trained on less than 10 hours of publicly-available drum data can generate high-quality, faithful realizations of sound gestures across a wide range of timbres in a zero-shot manner.
Submitted 19 September, 2025;
originally announced September 2025.
-
Scenarios in Computing Research: A Systematic Review of the Use of Scenario Methods for Exploring the Future of Computing Technologies in Society
Authors:
Julia Barnett,
Kimon Kieslich,
Jasmine Sinchai,
Nicholas Diakopoulos
Abstract:
Scenario building is an established method to anticipate the future of emerging technologies. Its primary goal is to use narratives to map future trajectories of technology development and sociotechnical adoption. Following this process, risks and benefits can be identified early on, and strategies can be developed that strive for desirable futures. In recent years, computer science has adopted this method and applied it to various technologies, including Artificial Intelligence (AI). Because computing technologies play such an important role in shaping modern societies, it is worth exploring how scenarios are being used as an anticipatory tool in the field -- and what possible traditional uses of scenarios are not yet covered but have the potential to enrich the field. We address this gap by conducting a systematic literature review on the use of scenario building methods in computer science over the last decade (n = 59). We guide the review along two main questions. First, we aim to uncover how scenarios are used in computing literature, focusing especially on the rationale for why scenarios are used. Second, in following the potential of scenario building to enhance inclusivity in research, we dive deeper into the participatory element of the existing scenario building literature in computer science.
Submitted 5 June, 2025;
originally announced June 2025.
-
Rashomon Sets for Prototypical-Part Networks: Editing Interpretable Models in Real-Time
Authors:
Jon Donnelly,
Zhicheng Guo,
Alina Jade Barnett,
Hayden McTavish,
Chaofan Chen,
Cynthia Rudin
Abstract:
Interpretability is critical for machine learning models in high-stakes settings because it allows users to verify the model's reasoning. In computer vision, prototypical part models (ProtoPNets) have become the dominant model type to meet this need. Users can easily identify flaws in ProtoPNets, but fixing problems in a ProtoPNet requires slow, difficult retraining that is not guaranteed to resolve the issue. This problem is called the "interaction bottleneck." We solve the interaction bottleneck for ProtoPNets by simultaneously finding many equally good ProtoPNets (i.e., a draw from a "Rashomon set"). We show that our framework - called Proto-RSet - quickly produces many accurate, diverse ProtoPNets, allowing users to correct problems in real time while maintaining performance guarantees with respect to the training set. We demonstrate the utility of this method in two settings: 1) removing synthetic bias introduced to a bird identification model and 2) debugging a skin cancer identification model. This tool empowers non-machine-learning experts, such as clinicians or domain experts, to quickly refine and correct machine learning models without repeated retraining by machine learning experts.
Submitted 2 March, 2025;
originally announced March 2025.
-
Envisioning Stakeholder-Action Pairs to Mitigate Negative Impacts of AI: A Participatory Approach to Inform Policy Making
Authors:
Julia Barnett,
Kimon Kieslich,
Natali Helberger,
Nicholas Diakopoulos
Abstract:
The potential for negative impacts of AI has rapidly become more pervasive around the world, and this has intensified a need for responsible AI governance. While many regulatory bodies endorse risk-based approaches and a multitude of risk mitigation practices are proposed by companies and academic scholars, these approaches are commonly expert-centered and thus lack the inclusion of a significant group of stakeholders. Ensuring that AI policies align with democratic expectations requires methods that prioritize the voices and needs of those impacted. In this work we develop a participative and forward-looking approach to inform policy-makers and academics that places the needs of lay stakeholders at the forefront and enriches the development of risk mitigation strategies. Our approach (1) maps potential mitigation and prevention strategies of negative AI impacts that assign responsibility to various stakeholders, (2) explores the importance and prioritization thereof in the eyes of laypeople, and (3) presents these insights in policy fact sheets, i.e., a digestible format for informing policy processes. We emphasize that this approach is not targeted towards replacing policy-makers; rather our aim is to present an informative method that enriches mitigation strategies and enables a more participatory approach to policy development.
Submitted 24 January, 2025;
originally announced February 2025.
-
Event Segmentation Applications in Large Language Model Enabled Automated Recall Assessments
Authors:
Ryan A. Panela,
Alex J. Barnett,
Morgan D. Barense,
Björn Herrmann
Abstract:
Understanding how individuals perceive and recall information in their natural environments is critical to understanding potential failures in perception (e.g., sensory loss) and memory (e.g., dementia). Event segmentation, the process of identifying distinct events within dynamic environments, is central to how we perceive, encode, and recall experiences. This cognitive process not only influences moment-to-moment comprehension but also shapes event specific memory. Despite the importance of event segmentation and event memory, current research methodologies rely heavily on human judgements for assessing segmentation patterns and recall ability, which are subjective and time-consuming. A few approaches have been introduced to automate event segmentation and recall scoring, but validity with human responses and ease of implementation require further advancements. To address these concerns, we leverage Large Language Models (LLMs) to automate event segmentation and assess recall, employing chat completion and text-embedding models, respectively. We validated these models against human annotations and determined that LLMs can accurately identify event boundaries, and that human event segmentation is more consistent with LLMs than among humans themselves. Using this framework, we advanced an automated approach for recall assessments which revealed semantic similarity between segmented narrative events and participant recall can estimate recall performance. Our findings demonstrate that LLMs can effectively simulate human segmentation patterns and provide recall evaluations that are a scalable alternative to manual scoring. This research opens novel avenues for studying the intersection between perception, memory, and cognitive impairment using methodologies driven by artificial intelligence.
Submitted 18 February, 2025;
originally announced February 2025.
-
Sound Check: Auditing Audio Datasets
Authors:
William Agnew,
Julia Barnett,
Annie Chu,
Rachel Hong,
Michael Feffer,
Robin Netzorg,
Harry H. Jiang,
Ezra Awumey,
Sauvik Das
Abstract:
Generative audio models are rapidly advancing in both capabilities and public utilization -- several powerful generative audio models have readily available open weights, and some tech companies have released high quality generative audio products. Yet, while prior work has enumerated many ethical issues stemming from the data on which generative visual and textual models have been trained, we have little understanding of similar issues with generative audio datasets, including those related to bias, toxicity, and intellectual property. To bridge this gap, we conducted a literature review of hundreds of audio datasets and selected seven of the most prominent to audit in more detail. We found that these datasets are biased against women, contain toxic stereotypes about marginalized communities, and contain significant amounts of copyrighted work. To enable artists to see if they are in popular audio datasets and facilitate exploration of the contents of these datasets, we developed a web-based audio dataset exploration tool, available at https://audio-audit.vercel.app.
Submitted 16 October, 2024;
originally announced October 2024.
-
Integrating Artificial Intelligence Models and Synthetic Image Data for Enhanced Asset Inspection and Defect Identification
Authors:
Reddy Mandati,
Vladyslav Anderson,
Po-chen Chen,
Ankush Agarwal,
Tatjana Dokic,
David Barnard,
Michael Finn,
Jesse Cromer,
Andrew Mccauley,
Clay Tutaj,
Neha Dave,
Bobby Besharati,
Jamie Barnett,
Timothy Krall
Abstract:
In the past, utilities relied on in-field inspections to identify asset defects. Recently, utilities have started using drone-based inspections to enhance the field-inspection process. We consider a vast repository of drone images, providing a wealth of information about asset health and potential issues. However, making the collected imagery data useful for automated defect detection requires significant manual labeling effort. We propose a novel solution that combines synthetic asset defect images with manually labeled drone images. This solution has several benefits: it improves the performance of defect detection, reduces the number of hours spent on manual labeling, and makes it possible to generate realistic images of rare defects where not enough real-world data is available. We employ a workflow that combines 3D modeling tools such as Maya and Unreal Engine to create photorealistic 3D models and 2D renderings of defective assets and their surroundings. These synthetic images are then integrated into our training pipeline, augmenting the real data. This study implements an end-to-end Artificial Intelligence solution to detect assets and asset defects from the combined imagery repository. The unique contribution of this research lies in the application of advanced computer vision models and the generation of photorealistic 3D renderings of defective assets, aiming to transform the asset inspection process. Our asset detection model achieved an accuracy of 92 percent, and we achieved a performance lift of 67 percent when introducing approximately 2,000 synthetic images of 2k resolution. In our tests, the defect detection model achieved an accuracy of 73 percent across two batches of images. Our analysis demonstrated that synthetic data can be successfully used in place of real-world manually labeled data to train defect detection models.
Submitted 15 October, 2024;
originally announced October 2024.
-
The role of interface boundary conditions and sampling strategies for Schwarz-based coupling of projection-based reduced order models
Authors:
Christopher R. Wentland,
Francesco Rizzi,
Joshua Barnett,
Irina Tezaur
Abstract:
This paper presents and evaluates a framework for the coupling of subdomain-local projection-based reduced order models (PROMs) using the Schwarz alternating method following a domain decomposition (DD) of the spatial domain on which a given problem of interest is posed. In this approach, the solution on the full domain is obtained via an iterative process in which a sequence of subdomain-local problems are solved, with information propagating between subdomains through transmission boundary conditions (BCs). We explore several new directions involving the Schwarz alternating method aimed at maximizing the method's efficiency and flexibility, and demonstrate it on three challenging two-dimensional nonlinear hyperbolic problems: the shallow water equations, Burgers' equation, and the compressible Euler equations. We demonstrate that, for a cell-centered finite volume discretization and a non-overlapping DD, it is possible to obtain a stable and accurate coupled model utilizing Dirichlet-Dirichlet (rather than Robin-Robin or alternating Dirichlet-Neumann) transmission BCs on the subdomain boundaries. We additionally explore the impact of boundary sampling when utilizing the Schwarz alternating method to couple subdomain-local hyper-reduced PROMs. Our numerical results suggest that the proposed methodology has the potential to improve PROM accuracy by enabling the spatial localization of these models via domain decomposition, and achieve up to two orders of magnitude speedup over equivalent coupled full order model solutions and moderate speedups over analogous monolithic solutions.
Submitted 6 October, 2024;
originally announced October 2024.
-
Text2FX: Harnessing CLAP Embeddings for Text-Guided Audio Effects
Authors:
Annie Chu,
Patrick O'Reilly,
Julia Barnett,
Bryan Pardo
Abstract:
This work introduces Text2FX, a method that leverages CLAP embeddings and differentiable digital signal processing to control audio effects, such as equalization and reverberation, using open-vocabulary natural language prompts (e.g., "make this sound in-your-face and bold"). Text2FX operates without retraining any models, relying instead on single-instance optimization within the existing embedding space, thus enabling a flexible, scalable approach to open-vocabulary sound transformations through interpretable and disentangled FX manipulation. We show that CLAP encodes valuable information for controlling audio effects and propose two optimization approaches using CLAP to map text to audio effect parameters. While we demonstrate with CLAP, this approach is applicable to any shared text-audio embedding space. Similarly, while we demonstrate with equalization and reverberation, any differentiable audio effect may be controlled. We conduct a listener study with diverse text prompts and source audio to evaluate the quality and alignment of these methods with human perception. Demos and code are available at anniejchu.github.io/text2fx.
Submitted 20 February, 2025; v1 submitted 27 September, 2024;
originally announced September 2024.
-
A Confidence Interval for the $\ell_2$ Expected Calibration Error
Authors:
Yan Sun,
Pratik Chaudhari,
Ian J. Barnett,
Edgar Dobriban
Abstract:
Recent advances in machine learning have significantly improved prediction accuracy in various applications. However, ensuring the calibration of probabilistic predictions remains a significant challenge. Despite efforts to enhance model calibration, the rigorous statistical evaluation of model calibration remains less explored. In this work, we develop confidence intervals for the $\ell_2$ Expected Calibration Error (ECE). We consider top-1-to-$k$ calibration, which includes both the popular notion of confidence calibration as well as full calibration. For a debiased estimator of the ECE, we show asymptotic normality, but with different convergence rates and asymptotic variances for calibrated and miscalibrated models. We develop methods to construct asymptotically valid confidence intervals for the ECE, accounting for this behavior as well as non-negativity. Our theoretical findings are supported through extensive experiments, showing that our methods produce valid confidence intervals with shorter lengths compared to those obtained by resampling-based methods.
Submitted 2 August, 2025; v1 submitted 16 August, 2024;
originally announced August 2024.
-
This Looks Better than That: Better Interpretable Models with ProtoPNeXt
Authors:
Frank Willard,
Luke Moffett,
Emmanuel Mokel,
Jon Donnelly,
Stark Guo,
Julia Yang,
Giyoung Kim,
Alina Jade Barnett,
Cynthia Rudin
Abstract:
Prototypical-part models are a popular interpretable alternative to black-box deep learning models for computer vision. However, they are difficult to train, with high sensitivity to hyperparameter tuning, inhibiting their application to new datasets and our understanding of which methods truly improve their performance. To facilitate the careful study of prototypical-part networks (ProtoPNets), we create a new framework for integrating components of prototypical-part models -- ProtoPNeXt. Using ProtoPNeXt, we show that applying Bayesian hyperparameter tuning and an angular prototype similarity metric to the original ProtoPNet is sufficient to produce new state-of-the-art accuracy for prototypical-part models on CUB-200 across multiple backbones. We further deploy this framework to jointly optimize for accuracy and prototype interpretability as measured by metrics included in ProtoPNeXt. Using the same resources, this produces models with substantially superior semantics and changes in accuracy between +1.3% and -1.5%. The code and trained models will be made publicly available upon publication.
Submitted 20 June, 2024;
originally announced June 2024.
-
FPN-IAIA-BL: A Multi-Scale Interpretable Deep Learning Model for Classification of Mass Margins in Digital Mammography
Authors:
Julia Yang,
Alina Jade Barnett,
Jon Donnelly,
Satvik Kishore,
Jerry Fang,
Fides Regina Schwartz,
Chaofan Chen,
Joseph Y. Lo,
Cynthia Rudin
Abstract:
Digital mammography is essential to breast cancer detection, and deep learning offers promising tools for faster and more accurate mammogram analysis. In radiology and other high-stakes environments, uninterpretable ("black box") deep learning models are unsuitable and there is a call in these fields to make interpretable models. Recent work in interpretable computer vision provides transparency to these formerly black boxes by utilizing prototypes for case-based explanations, achieving high accuracy in applications including mammography. However, these models struggle with precise feature localization, reasoning on large portions of an image when only a small part is relevant. This paper addresses this gap by proposing a novel multi-scale interpretable deep learning model for mammographic mass margin classification. Our contribution not only offers an interpretable model with reasoning aligned with radiologist practices, but also provides a general architecture for computer vision with user-configurable prototypes from coarse- to fine-grained prototypes.
Submitted 10 June, 2024;
originally announced June 2024.
-
Simulating Policy Impacts: Developing a Generative Scenario Writing Method to Evaluate the Perceived Effects of Regulation
Authors:
Julia Barnett,
Kimon Kieslich,
Nicholas Diakopoulos
Abstract:
The rapid advancement of AI technologies yields numerous future impacts on individuals and society. Policymakers are tasked to react quickly and establish policies that mitigate those impacts. However, anticipating the effectiveness of policies is a difficult task, as some impacts might only be observable in the future and respective policies might not be applicable to the future development of AI. In this work we develop a method for using large language models (LLMs) to evaluate the efficacy of a given piece of policy at mitigating specified negative impacts. We do so by using GPT-4 to generate scenarios both pre- and post-introduction of policy and translating these vivid stories into metrics based on human perceptions of impacts. We leverage an already established taxonomy of impacts of generative AI in the media environment to generate a set of scenario pairs both mitigated and non-mitigated by the transparency policy in Article 50 of the EU AI Act. We then run a user study (n=234) to evaluate these scenarios across four risk-assessment dimensions: severity, plausibility, magnitude, and specificity to vulnerable populations. We find that this transparency legislation is perceived to be effective at mitigating harms in areas such as labor and well-being, but largely ineffective in areas such as social cohesion and security. Through this case study we demonstrate the efficacy of our method as a tool to iterate on the effectiveness of policy for mitigating various negative impacts. We expect this method to be useful to researchers or other stakeholders who want to brainstorm the potential utility of different pieces of policy or other mitigation strategies.
Submitted 26 July, 2024; v1 submitted 15 May, 2024;
originally announced May 2024.
-
Exploring Musical Roots: Applying Audio Embeddings to Empower Influence Attribution for a Generative Music Model
Authors:
Julia Barnett,
Hugo Flores Garcia,
Bryan Pardo
Abstract:
Every artist has a creative process that draws inspiration from previous artists and their works. Today, "inspiration" has been automated by generative music models. The black box nature of these models obscures the identity of the works that influence their creative output. As a result, users may inadvertently appropriate, misuse, or copy existing artists' works. We establish a replicable methodology to systematically identify similar pieces of music audio in a manner that is useful for understanding training data attribution. A key aspect of our approach is to harness an effective music audio similarity measure. We compare the effect of applying CLMR and CLAP embeddings to similarity measurement in a set of 5 million audio clips used to train VampNet, a recent open source generative music model. We validate this approach with a human listening study. We also explore the effect that modifications of an audio example (e.g., pitch shifting, time stretching, background noise) have on similarity measurements. This work is foundational to incorporating automated influence attribution into generative modeling, which promises to let model creators and users move from ignorant appropriation to informed creation. Audio samples that accompany this paper are available at https://tinyurl.com/exploring-musical-roots.
Submitted 25 January, 2024;
originally announced January 2024.
-
ProtoEEGNet: An Interpretable Approach for Detecting Interictal Epileptiform Discharges
Authors:
Dennis Tang,
Frank Willard,
Ronan Tegerdine,
Luke Triplett,
Jon Donnelly,
Luke Moffett,
Lesia Semenova,
Alina Jade Barnett,
Jin Jing,
Cynthia Rudin,
Brandon Westover
Abstract:
In electroencephalogram (EEG) recordings, the presence of interictal epileptiform discharges (IEDs) serves as a critical biomarker for seizures or seizure-like events. Detecting IEDs can be difficult; even highly trained experts disagree on the same sample. As a result, specialists have turned to machine-learning models for assistance. However, many existing models are black boxes and do not provide any human-interpretable reasoning for their decisions. In high-stakes medical applications, it is critical to have interpretable models so that experts can validate the reasoning of the model before making important diagnoses. We introduce ProtoEEGNet, a model that achieves state-of-the-art accuracy for IED detection while additionally providing an interpretable justification for its classifications. Specifically, it can reason that one EEG looks similar to another "prototypical" EEG that is known to contain an IED. ProtoEEGNet can therefore help medical professionals effectively detect IEDs while maintaining a transparent decision-making process.
Submitted 3 December, 2023;
originally announced December 2023.
-
Entropy-Based Strategies for Multi-Bracket Pools
Authors:
Ryan S. Brill,
Abraham J. Wyner,
Ian J. Barnett
Abstract:
Much work in the parimutuel betting literature has discussed estimating event outcome probabilities or developing optimal wagering strategies, particularly for horse race betting. Some betting pools, however, involve betting not just on a single event, but on a tuple of events. For example, pick six betting in horse racing, March Madness bracket challenges, and predicting a randomly drawn bitstring each involve making a series of individual forecasts. Although traditional optimal wagering strategies work well when the size of the tuple is very small (e.g., betting on the winner of a horse race), they are intractable for more general betting pools in higher dimensions (e.g., March Madness bracket challenges). Hence we pose the multi-brackets problem: supposing we wish to predict a tuple of events and that we know the true probabilities of each potential outcome of each event, what is the best way to tractably generate a set of $n$ predicted tuples? The most general version of this problem is extremely difficult, so we begin with a simpler setting. In particular, we generate $n$ independent predicted tuples according to a distribution having optimal entropy. This entropy-based approach is tractable, scalable, and performs well.
Submitted 20 March, 2024; v1 submitted 28 August, 2023;
originally announced August 2023.
-
The Ethical Implications of Generative Audio Models: A Systematic Literature Review
Authors:
Julia Barnett
Abstract:
Generative audio models typically focus their applications in music and speech generation, with recent models having human-like quality in their audio output. This paper conducts a systematic literature review of 884 papers in the area of generative audio models in order to both quantify the degree to which researchers in the field are considering potential negative impacts and identify the types of ethical implications researchers in this area need to consider. Though 65% of generative audio research papers note positive potential impacts of their work, less than 10% discuss any negative impacts. This jarringly small percentage of papers considering negative impact is particularly worrying because the issues brought to light by the few papers doing so are raising serious ethical implications and concerns relevant to the broader field such as the potential for fraud, deep-fakes, and copyright infringement. By quantifying this lack of ethical consideration in generative audio research and identifying key areas of potential harm, this paper lays the groundwork for future work in the field at a critical point in time in order to guide more conscientious research as this field progresses.
Submitted 7 July, 2023;
originally announced July 2023.
-
Neural-Network-Augmented Projection-Based Model Order Reduction for Mitigating the Kolmogorov Barrier to Reducibility of CFD Models
Authors:
Joshua L Barnett,
Charbel Farhat,
Yvon Maday
Abstract:
Inspired by our previous work on mitigating the Kolmogorov barrier using a quadratic approximation manifold, we propose in this paper a computationally tractable approach for combining a projection-based reduced-order model (PROM) and an artificial neural network (ANN) for mitigating the Kolmogorov barrier to reducibility of convection-dominated flow problems. The main objective of the PROM-ANN concept that we propose is to reduce the dimensionality of the online approximation of the solution beyond what is possible using affine and quadratic approximation manifolds. In contrast to previous approaches for constructing arbitrarily nonlinear manifold approximations for nonlinear model reduction that exploited one form or another of ANN, the training of the PROM-ANN we propose in this paper does not involve data whose dimension scales with that of the high-dimensional model; and this PROM-ANN is hyperreducible using any well-established hyperreduction method. Hence, unlike many other ANN-based approaches, the PROM-ANN concept we propose in this paper is practical for large-scale and industry-relevant CFD problems. Its potential is demonstrated here for a parametric, shock-dominated, benchmark problem.
Submitted 17 December, 2022;
originally announced December 2022.
-
Improving Clinician Performance in Classification of EEG Patterns on the Ictal-Interictal-Injury Continuum using Interpretable Machine Learning
Authors:
Alina Jade Barnett,
Zhicheng Guo,
Jin Jing,
Wendong Ge,
Peter W. Kaplan,
Wan Yee Kong,
Ioannis Karakis,
Aline Herlopian,
Lakshman Arcot Jayagopal,
Olga Taraschenko,
Olga Selioutski,
Gamaleldin Osman,
Daniel Goldenholz,
Cynthia Rudin,
M. Brandon Westover
Abstract:
In intensive care units (ICUs), critically ill patients are monitored with electroencephalograms (EEGs) to prevent serious brain injury. The number of patients who can be monitored is constrained by the availability of trained physicians to read EEGs, and EEG interpretation can be subjective and prone to inter-observer variability. Automated deep learning systems for EEG could reduce human bias and accelerate the diagnostic process. However, black box deep learning models are untrustworthy, difficult to troubleshoot, and lack accountability in real-world applications, leading to a lack of trust and adoption by clinicians. To address these challenges, we propose a novel interpretable deep learning model that not only predicts the presence of harmful brainwave patterns but also provides high-quality case-based explanations of its decisions. Our model performs better than the corresponding black box model, despite being constrained to be interpretable. The learned 2D embedded space provides the first global overview of the structure of ictal-interictal-injury continuum brainwave patterns. The ability to understand how our model arrived at its decisions will not only help clinicians to diagnose and treat harmful brain activities more accurately but also increase their trust and adoption of machine learning models in clinical practice; this could be an integral component of the ICU neurologists' standard workflow.
Submitted 24 September, 2024; v1 submitted 9 November, 2022;
originally announced November 2022.
-
Crowdsourcing Impacts: Exploring the Utility of Crowds for Anticipating Societal Impacts of Algorithmic Decision Making
Authors:
Julia Barnett,
Nicholas Diakopoulos
Abstract:
With the increasing pervasiveness of algorithms across industry and government, a growing body of work has grappled with how to understand their societal impact and ethical implications. Various methods have been used at different stages of algorithm development to encourage researchers and designers to consider the potential societal impact of their research. An understudied yet promising area in this realm is using participatory foresight to anticipate these different societal impacts. We employ crowdsourcing as a means of participatory foresight to uncover four different types of impact areas based on a set of governmental algorithmic decision making tools: (1) perceived valence, (2) societal domains, (3) specific abstract impact types, and (4) ethical algorithm concerns. Our findings suggest that this method is effective at leveraging the cognitive diversity of the crowd to uncover a range of issues. We further analyze the complexities within the interaction of the impact areas identified to demonstrate how crowdsourcing can illuminate patterns around the connections between impacts. Ultimately this work establishes crowdsourcing as an effective means of anticipating algorithmic impact which complements other approaches towards assessing algorithms in society by leveraging participatory foresight and cognitive diversity.
Submitted 19 July, 2022;
originally announced July 2022.
-
Quadratic Approximation Manifold for Mitigating the Kolmogorov Barrier in Nonlinear Projection-Based Model Order Reduction
Authors:
Joshua Barnett,
Charbel Farhat
Abstract:
A quadratic approximation manifold is presented for performing nonlinear, projection-based, model order reduction (PMOR). It constitutes a departure from the traditional affine subspace approximation that is aimed at mitigating the Kolmogorov barrier for nonlinear PMOR, particularly for convection-dominated transport problems. It builds on the data-driven approach underlying the traditional construction of projection-based reduced-order models (PROMs); is application-independent; is linearization-free; and therefore is robust for highly nonlinear problems. Most importantly, this approximation leads to quadratic PROMs that deliver the same accuracy as their traditional counterparts while using a much smaller dimension -- typically, $n_2 \sim \sqrt{n_1}$, where $n_2$ and $n_1$ denote the dimensions of the quadratic and traditional PROMs, respectively. The computational advantages of the proposed high-order approach to nonlinear PMOR over the traditional approach are highlighted for the detached-eddy simulation-based prediction of the Ahmed body turbulent wake flow, which is a popular CFD benchmark problem in the automotive industry. For a fixed accuracy level, these advantages include: a reduction of the total offline computational cost by a factor greater than five; a reduction of its online wall clock time by a factor greater than 32; and a reduction of the wall clock time of the underlying high-dimensional model by a factor greater than two orders of magnitude.
Submitted 5 April, 2022;
originally announced April 2022.
-
Sparse Neural Additive Model: Interpretable Deep Learning with Feature Selection via Group Sparsity
Authors:
Shiyun Xu,
Zhiqi Bu,
Pratik Chaudhari,
Ian J. Barnett
Abstract:
Interpretable machine learning has demonstrated impressive performance while preserving explainability. In particular, neural additive models (NAM) bring interpretability to black-box deep learning and achieve state-of-the-art accuracy among the large family of generalized additive models. In order to empower NAM with feature selection and improve generalization, we propose the sparse neural additive models (SNAM) that employ group sparsity regularization (e.g. Group LASSO), where each feature is learned by a sub-network whose trainable parameters are clustered as a group. We study the theoretical properties of SNAM with novel techniques to tackle the non-parametric truth, thus extending from classical sparse linear models such as the LASSO, which work only on the parametric truth.
Specifically, we show that SNAM with subgradient and proximal gradient descents provably converges to zero training loss as $t\to\infty$, and that the estimation error of SNAM vanishes asymptotically as $n\to\infty$. We also prove that SNAM, similar to LASSO, can have exact support recovery, i.e. perfect feature selection, with appropriate regularization. Moreover, we show that SNAM can generalize well and preserve "identifiability", recovering each feature's effect. We validate our theories via extensive experiments and further testify to the good accuracy and efficiency of SNAM.
Submitted 24 February, 2022;
originally announced February 2022.
-
Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes
Authors:
Jon Donnelly,
Alina Jade Barnett,
Chaofan Chen
Abstract:
We present a deformable prototypical part network (Deformable ProtoPNet), an interpretable image classifier that integrates the power of deep learning and the interpretability of case-based reasoning. This model classifies input images by comparing them with prototypes learned during training, yielding explanations in the form of "this looks like that." However, while previous methods use spatially rigid prototypes, we address this shortcoming by proposing spatially flexible prototypes. Each prototype is made up of several prototypical parts that adaptively change their relative spatial positions depending on the input image. Consequently, a Deformable ProtoPNet can explicitly capture pose variations and context, improving both model accuracy and the richness of explanations provided. Compared to other case-based interpretable models using prototypes, our approach achieves state-of-the-art accuracy and gives an explanation with greater context. The code is available at https://github.com/jdonnelly36/Deformable-ProtoPNet.
Submitted 2 May, 2024; v1 submitted 29 November, 2021;
originally announced November 2021.
-
Interpretable Mammographic Image Classification using Case-Based Reasoning and Deep Learning
Authors:
Alina Jade Barnett,
Fides Regina Schwartz,
Chaofan Tao,
Chaofan Chen,
Yinhao Ren,
Joseph Y. Lo,
Cynthia Rudin
Abstract:
When we deploy machine learning models in high-stakes medical settings, we must ensure these models make accurate predictions that are consistent with known medical science. Inherently interpretable networks address this need by explaining the rationale behind each decision while maintaining equal or higher accuracy compared to black-box models. In this work, we present a novel interpretable neural network algorithm that uses case-based reasoning for mammography. Designed to aid a radiologist in their decisions, our network presents both a prediction of malignancy and an explanation of that prediction using known medical features. In order to yield helpful explanations, the network is designed to mimic the reasoning processes of a radiologist: our network first detects the clinically relevant semantic features of each image by comparing each new image with a learned set of prototypical image parts from the training images, then uses those clinical features to predict malignancy. Compared to other methods, our model detects clinical features (mass margins) with equal or higher accuracy, provides a more detailed explanation of its prediction, and is better able to differentiate the classification-relevant parts of the image.
Submitted 4 October, 2021; v1 submitted 12 July, 2021;
originally announced July 2021.
-
IAIA-BL: A Case-based Interpretable Deep Learning Model for Classification of Mass Lesions in Digital Mammography
Authors:
Alina Jade Barnett,
Fides Regina Schwartz,
Chaofan Tao,
Chaofan Chen,
Yinhao Ren,
Joseph Y. Lo,
Cynthia Rudin
Abstract:
Interpretability in machine learning models is important in high-stakes decisions, such as whether to order a biopsy based on a mammographic exam. Mammography poses important challenges that are not present in other computer vision tasks: datasets are small, confounding information is present, and it can be difficult even for a radiologist to decide between watchful waiting and biopsy based on a mammogram alone. In this work, we present a framework for interpretable machine learning-based mammography. In addition to predicting whether a lesion is malignant or benign, our work aims to follow the reasoning processes of radiologists in detecting clinically relevant semantic features of each image, such as the characteristics of the mass margins. The framework includes a novel interpretable neural network algorithm that uses case-based reasoning for mammography. Our algorithm can incorporate a combination of data with whole image labelling and data with pixel-wise annotations, leading to better accuracy and interpretability even with a small number of images. Our interpretable models are able to highlight the classification-relevant parts of the image, whereas other methods highlight healthy tissue and confounding information. Our models are decision aids, rather than decision makers, aimed at better overall human-machine collaboration. We do not observe a loss in mass margin classification accuracy over a black box neural network trained on the same data.
Submitted 23 March, 2021;
originally announced March 2021.
-
A Socio-Informatic Approach to Automated Account Classification on Social Media
Authors:
Laurenz A Cornelissen,
Petrus Schoonwinkel,
Richard J Barnett
Abstract:
Automated accounts on social media have become increasingly problematic. We propose a key feature in combination with existing methods to improve machine learning algorithms for bot detection. We successfully improve classification performance through including the proposed feature.
Submitted 27 April, 2019;
originally announced April 2019.
-
A Network Topology Approach to Bot Classification
Authors:
Laurenz A Cornelissen,
Richard J Barnett,
Petrus Schoonwinkel,
Brent D. Eichstadt,
Hluma B. Magodla
Abstract:
Automated social agents, or bots, are increasingly becoming a problem on social media platforms. There is a growing body of literature and multiple tools to aid in the detection of such agents on online social networking platforms. We propose that the social network topology of a user would be sufficient to determine whether the user is an automated agent or a human. To test this, we use a publicly available dataset containing users on Twitter labelled as either automated social agent or human. Using an unsupervised machine learning approach, we obtain a detection accuracy rate of 70%.
Submitted 17 September, 2018;
originally announced September 2018.
-
Deploying South African Social Honeypots on Twitter
Authors:
Laurenz A Cornelissen,
Richard J Barnett,
Morakane AM Kepa,
Daniel Loebenberg-Novitzkas,
Jacques Jordaan
Abstract:
Inspired by the simple, yet effective, method of tweeting gibberish to attract automated social agents (bots), we attempt to create localised honeypots in the South African political context. We produce a series of defined techniques and combine them to generate interactions from users on Twitter. The paper offers two key contributions. First, conceptually, an argument is made that honeypots should not be confused with bot detection methods, but are rather methods to capture low-quality users. Second, we successfully generate a list of 288 local low-quality users active in the political context.
Submitted 17 September, 2018;
originally announced September 2018.
-
This Looks Like That: Deep Learning for Interpretable Image Recognition
Authors:
Chaofan Chen,
Oscar Li,
Chaofan Tao,
Alina Jade Barnett,
Jonathan Su,
Cynthia Rudin
Abstract:
When we are faced with challenging image classification tasks, we often explain our reasoning by dissecting the image, and pointing out prototypical aspects of one class or another. The mounting evidence for each of the classes helps us make our final decision. In this work, we introduce a deep network architecture, the prototypical part network (ProtoPNet), that reasons in a similar way: the network dissects the image by finding prototypical parts, and combines evidence from the prototypes to make a final classification. The model thus reasons in a way that is qualitatively similar to the way ornithologists, physicians, and others would explain to people how to solve challenging image classification tasks. The network uses only image-level labels for training without any annotations for parts of images. We demonstrate our method on the CUB-200-2011 dataset and the Stanford Cars dataset. Our experiments show that ProtoPNet can achieve comparable accuracy with its analogous non-interpretable counterpart, and when several ProtoPNets are combined into a larger network, it can achieve an accuracy that is on par with some of the best-performing deep models. Moreover, ProtoPNet provides a level of interpretability that is absent in other interpretable deep models.
Submitted 28 December, 2019; v1 submitted 27 June, 2018;
originally announced June 2018.