-
The RSNA Lumbar Degenerative Imaging Spine Classification (LumbarDISC) Dataset
Authors:
Tyler J. Richards,
Adam E. Flanders,
Errol Colak,
Luciano M. Prevedello,
Robyn L. Ball,
Felipe Kitamura,
John Mongan,
Maryam Vazirabad,
Hui-Ming Lin,
Anne Kendell,
Thanat Kanthawang,
Salita Angkurawaranon,
Emre Altinmakas,
Hakan Dogan,
Paulo Eduardo de Aguiar Kuriki,
Arjuna Somasundaram,
Christopher Ruston,
Deniz Bulja,
Naida Spahovic,
Jennifer Sommer,
Sirui Jiang,
Eduardo Moreno Judice de Mattos Farina,
Eduardo Caminha Nunes,
Michael Brassil,
Megan McNamara, et al. (11 additional authors not shown)
Abstract:
The Radiological Society of North America (RSNA) Lumbar Degenerative Imaging Spine Classification (LumbarDISC) dataset is the largest publicly available dataset of adult MRI lumbar spine examinations annotated for degenerative changes. The dataset includes 2,697 patients with a total of 8,593 image series from 8 institutions across 6 countries and 5 continents. The dataset is available for free for non-commercial use via Kaggle and RSNA Medical Imaging Resource of AI (MIRA). The dataset was created for the RSNA 2024 Lumbar Spine Degenerative Classification competition where competitors developed deep learning models to grade degenerative changes in the lumbar spine. The degree of spinal canal, subarticular recess, and neural foraminal stenosis was graded at each intervertebral disc level in the lumbar spine. The images were annotated by expert volunteer neuroradiologists and musculoskeletal radiologists from the RSNA, American Society of Neuroradiology, and the American Society of Spine Radiology. This dataset aims to facilitate research and development in machine learning and lumbar spine imaging to lead to improved patient care and clinical efficiency.
Submitted 10 June, 2025;
originally announced June 2025.
-
Developing a Risk Identification Framework for Foundation Model Uses
Authors:
David Piorkowski,
Michael Hind,
John Richards,
Jacquelyn Martino
Abstract:
As foundation models grow in both popularity and capability, researchers have uncovered a variety of ways that the models can pose a risk to the model's owner, user, or others. Despite the efforts of measuring these risks via benchmarks and cataloging them in AI risk taxonomies, there is little guidance for practitioners on how to determine which risks are relevant for a given foundation model use. In this paper, we address this gap and develop requirements and an initial design for a risk identification framework. To do so, we look to prior literature to identify challenges for building a foundation model risk identification framework and adapt ideas from usage governance to synthesize four design requirements. We then demonstrate how a candidate framework can address these design requirements and provide a foundation model use example to show how the framework works in practice for a small subset of risks.
Submitted 1 June, 2025;
originally announced June 2025.
-
Comparative Analysis of Distributed Caching Algorithms: Performance Metrics and Implementation Considerations
Authors:
Helen Mayer,
James Richards
Abstract:
This paper presents a comprehensive comparison of distributed caching algorithms employed in modern distributed systems. We evaluate various caching strategies including Least Recently Used (LRU), Least Frequently Used (LFU), Adaptive Replacement Cache (ARC), and Time-Aware Least Recently Used (TLRU) against metrics such as hit ratio, latency reduction, memory overhead, and scalability. Our analysis reveals that while traditional algorithms like LRU remain prevalent, hybrid approaches incorporating machine learning techniques demonstrate superior performance in dynamic environments. Additionally, we analyze implementation patterns across different distributed architectures and provide recommendations for algorithm selection based on specific workload characteristics.
Submitted 2 April, 2025;
originally announced April 2025.
-
AI Risk Atlas: Taxonomy and Tooling for Navigating AI Risks and Resources
Authors:
Frank Bagehorn,
Kristina Brimijoin,
Elizabeth M. Daly,
Jessica He,
Michael Hind,
Luis Garces-Erice,
Christopher Giblin,
Ioana Giurgiu,
Jacquelyn Martino,
Rahul Nair,
David Piorkowski,
Ambrish Rawat,
John Richards,
Sean Rooney,
Dhaval Salwala,
Seshu Tirupathi,
Peter Urbanetz,
Kush R. Varshney,
Inge Vejsbjerg,
Mira L. Wolf-Bauwens
Abstract:
The rapid evolution of generative AI has expanded the breadth of risks associated with AI systems. While various taxonomies and frameworks exist to classify these risks, the lack of interoperability between them creates challenges for researchers, practitioners, and policymakers seeking to operationalise AI governance. To address this gap, we introduce the AI Risk Atlas, a structured taxonomy that consolidates AI risks from diverse sources and aligns them with governance frameworks. Additionally, we present the Risk Atlas Nexus, a collection of open-source tools designed to bridge the divide between risk definitions, benchmarks, datasets, and mitigation strategies. This knowledge-driven approach leverages ontologies and knowledge graphs to facilitate risk identification, prioritization, and mitigation. By integrating AI-assisted compliance workflows and automation strategies, our framework lowers the barrier to responsible AI adoption. We invite the broader research and open-source community to contribute to this evolving initiative, fostering cross-domain collaboration and ensuring AI governance keeps pace with technological advancements.
Submitted 26 February, 2025;
originally announced March 2025.
-
Agentic AI Needs a Systems Theory
Authors:
Erik Miehling,
Karthikeyan Natesan Ramamurthy,
Kush R. Varshney,
Matthew Riemer,
Djallel Bouneffouf,
John T. Richards,
Amit Dhurandhar,
Elizabeth M. Daly,
Michael Hind,
Prasanna Sattigeri,
Dennis Wei,
Ambrish Rawat,
Jasmina Gajcin,
Werner Geyer
Abstract:
The endowment of AI with reasoning capabilities and some degree of agency is widely viewed as a path toward more capable and generalizable systems. Our position is that the current development of agentic AI requires a more holistic, systems-theoretic perspective in order to fully understand their capabilities and mitigate any emergent risks. The primary motivation for our position is that AI development is currently overly focused on individual model capabilities, often ignoring broader emergent behavior, leading to a significant underestimation in the true capabilities and associated risks of agentic AI. We describe some fundamental mechanisms by which advanced capabilities can emerge from (comparably simpler) agents simply due to their interaction with the environment and other agents. Informed by an extensive amount of existing literature from various fields, we outline mechanisms for enhanced agent cognition, emergent causal reasoning ability, and metacognitive awareness. We conclude by presenting some key open challenges and guidance for the development of agentic AI. We emphasize that a systems-level perspective is essential for better understanding, and purposefully shaping, agentic AI systems.
Submitted 28 February, 2025;
originally announced March 2025.
-
Bridging HCI and AI Research for the Evaluation of Conversational SE Assistants
Authors:
Jonan Richards,
Mairieli Wessel
Abstract:
As Large Language Models (LLMs) are increasingly adopted in software engineering, recently in the form of conversational assistants, ensuring these technologies align with developers' needs is essential. The limitations of traditional human-centered methods for evaluating LLM-based tools at scale raise the need for automatic evaluation. In this paper, we advocate combining insights from human-computer interaction (HCI) and artificial intelligence (AI) research to enable human-centered automatic evaluation of LLM-based conversational SE assistants. We identify requirements for such evaluation and challenges down the road, working towards a framework that ensures these assistants are designed and deployed in line with user needs.
Submitted 11 February, 2025;
originally announced February 2025.
-
Deep learning joint extremes of metocean variables using the SPAR model
Authors:
Ed Mackay,
Callum Murphy-Barltrop,
Jordan Richards,
Philip Jonathan
Abstract:
This paper presents a novel deep learning framework for estimating multivariate joint extremes of metocean variables, based on the Semi-Parametric Angular-Radial (SPAR) model. When considered in polar coordinates, the problem of modelling multivariate extremes is transformed to one of modelling an angular density, and the tail of a univariate radial variable conditioned on angle. In the SPAR approach, the tail of the radial variable is modelled using a generalised Pareto (GP) distribution, providing a natural extension of univariate extreme value theory to the multivariate setting. In this work, we show how the method can be applied in higher dimensions, using a case study for five metocean variables: wind speed, wind direction, wave height, wave period, and wave direction. The angular variable is modelled using a kernel density method, while the parameters of the GP model are approximated using fully-connected deep neural networks. Our approach provides great flexibility in the dependence structures that can be represented, together with computationally efficient routines for training the model. Furthermore, the application of the method requires fewer assumptions about the underlying distribution(s) compared to existing approaches, and an asymptotically justified means for extrapolating outside the range of observations. Using various diagnostic plots, we show that the fitted models provide a good description of the joint extremes of the metocean variables considered.
Submitted 19 June, 2025; v1 submitted 20 December, 2024;
originally announced December 2024.
-
What You Need is What You Get: Theory of Mind for an LLM-Based Code Understanding Assistant
Authors:
Jonan Richards,
Mairieli Wessel
Abstract:
A growing number of tools have used Large Language Models (LLMs) to support developers' code understanding. However, developers still face several barriers to using such tools, including challenges in describing their intent in natural language, interpreting the tool outcome, and refining an effective prompt to obtain useful information. In this study, we designed an LLM-based conversational assistant that provides a personalized interaction based on inferred user mental state (e.g., background knowledge and experience). We evaluate the approach in a within-subject study with fourteen novices to capture their perceptions and preferences. Our results provide insights for researchers and tool builders who want to create or improve LLM-based conversational assistants to support novices in code understanding.
Submitted 8 August, 2024;
originally announced August 2024.
-
Language Models in Dialogue: Conversational Maxims for Human-AI Interactions
Authors:
Erik Miehling,
Manish Nagireddy,
Prasanna Sattigeri,
Elizabeth M. Daly,
David Piorkowski,
John T. Richards
Abstract:
Modern language models, while sophisticated, exhibit some inherent shortcomings, particularly in conversational settings. We claim that many of the observed shortcomings can be attributed to violation of one or more conversational principles. By drawing upon extensive research from both the social science and AI communities, we propose a set of maxims -- quantity, quality, relevance, manner, benevolence, and transparency -- for describing effective human-AI conversation. We first justify the applicability of the first four maxims (from Grice) in the context of human-AI interactions. We then argue that two new maxims, benevolence (concerning the generation of, and engagement with, harmful content) and transparency (concerning recognition of one's knowledge boundaries, operational constraints, and intents), are necessary for addressing behavior unique to modern human-AI interactions. We evaluate the degree to which various language models are able to understand these maxims and find that models possess an internal prioritization of principles that can significantly impact their ability to interpret the maxims accurately.
Submitted 22 June, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
-
A Data-Driven Autopilot for Fixed-Wing Aircraft Based on Model Predictive Control
Authors:
Riley J. Richards,
Juan A. Paredes,
Dennis S. Bernstein
Abstract:
Autopilots for fixed-wing aircraft are typically designed based on linearized aerodynamic models consisting of stability and control derivatives obtained from wind-tunnel testing. The resulting local controllers are then pieced together using gain scheduling. For applications in which the aerodynamics are unmodeled, the present paper proposes an autopilot based on predictive cost adaptive control (PCAC). As an indirect adaptive control extension of model predictive control, PCAC uses recursive least squares (RLS) with variable-rate forgetting for online, closed-loop system identification. At each time step, RLS-based system identification updates the coefficients of an input-output model whose order is a hyperparameter specified by the user. For MPC, the receding-horizon optimization can be performed by either the backward-propagating Riccati equation or quadratic programming. The present paper investigates the performance of PCAC for fixed-wing aircraft without the use of any aerodynamic modeling or offline/prior data collection.
Submitted 1 February, 2024;
originally announced February 2024.
-
Quantitative AI Risk Assessments: Opportunities and Challenges
Authors:
David Piorkowski,
Michael Hind,
John Richards
Abstract:
Although AI systems are increasingly being leveraged to provide value to organizations, individuals, and society, significant attendant risks have been identified and have manifested. These risks have led to proposed regulations, litigation, and general societal concerns.
As with any promising technology, organizations want to benefit from the positive capabilities of AI technology while reducing the risks. The best way to reduce risks is to implement comprehensive AI lifecycle governance where policies and procedures are described and enforced during the design, development, deployment, and monitoring of an AI system. Although support for comprehensive governance is beginning to emerge, organizations often need to identify the risks of deploying an already-built model without knowledge of how it was constructed or access to its original developers. Such an assessment will quantitatively assess the risks of an existing model in a manner analogous to how a home inspector might assess the risks of an already-built home or a physician might assess overall patient health based on a battery of tests.
Several AI risks can be quantified using metrics from the technical community. However, there are numerous issues in deciding how these metrics can be leveraged to create a quantitative AI risk assessment. This paper explores these issues, focusing on the opportunities, challenges, and potential impacts of such an approach, and discussing how it might influence AI regulations.
Submitted 4 December, 2024; v1 submitted 13 September, 2022;
originally announced September 2022.
-
Regression modelling of spatiotemporal extreme U.S. wildfires via partially-interpretable neural networks
Authors:
Jordan Richards,
Raphaël Huser
Abstract:
Risk management in many environmental settings requires an understanding of the mechanisms that drive extreme events. Useful metrics for quantifying such risk are extreme quantiles of response variables conditioned on predictor variables that describe, e.g., climate, biosphere and environmental states. Typically these quantiles lie outside the range of observable data and so, for estimation, require specification of parametric extreme value models within a regression framework. Classical approaches in this context utilise linear or additive relationships between predictor and response variables and suffer in either their predictive capabilities or computational efficiency; moreover, their simplicity is unlikely to capture the truly complex structures that lead to the creation of extreme wildfires. In this paper, we propose a new methodological framework for performing extreme quantile regression using artificial neural networks, which are able to capture complex non-linear relationships and scale well to high-dimensional data. The "black box" nature of neural networks means that they lack the desirable trait of interpretability often favoured by practitioners; thus, we unify linear, and additive, regression methodology with deep learning to create partially-interpretable neural networks that can be used for statistical inference but retain high prediction accuracy. To complement this methodology, we further propose a novel point process model for extreme values which overcomes the finite lower-endpoint problem associated with the generalised extreme value class of distributions. Efficacy of our unified framework is illustrated on U.S. wildfire data with a high-dimensional predictor set and we illustrate vast improvements in predictive performance over linear and spline-based regression techniques.
Submitted 7 March, 2024; v1 submitted 16 August, 2022;
originally announced August 2022.
-
Humble Machines: Attending to the Underappreciated Costs of Misplaced Distrust
Authors:
Bran Knowles,
Jason D'Cruz,
John T. Richards,
Kush R. Varshney
Abstract:
It is curious that AI increasingly outperforms human decision makers, yet much of the public distrusts AI to make decisions affecting their lives. In this paper we explore a novel theory that may explain one reason for this. We propose that public distrust of AI is a moral consequence of designing systems that prioritize reduction of costs of false positives over less tangible costs of false negatives. We show that such systems, which we characterize as 'distrustful', are more likely to miscategorize trustworthy individuals, with cascading consequences to both those individuals and the overall human-AI trust relationship. Ultimately, we argue that public distrust of AI stems from well-founded concern about the potential of being miscategorized. We propose that restoring public trust in AI will require that systems are designed to embody a stance of 'humble trust', whereby the moral costs of the misplaced distrust associated with false negatives are weighted appropriately during development and use.
Submitted 2 August, 2022;
originally announced August 2022.
-
The Many Facets of Trust in AI: Formalizing the Relation Between Trust and Fairness, Accountability, and Transparency
Authors:
Bran Knowles,
John T. Richards,
Frens Kroeger
Abstract:
Efforts to promote fairness, accountability, and transparency are assumed to be critical in fostering Trust in AI (TAI), but extant literature is frustratingly vague regarding this 'trust'. The lack of exposition on trust itself suggests that trust is commonly understood, uncomplicated, or even uninteresting. But is it? Our analysis of TAI publications reveals numerous orientations which differ in terms of who is doing the trusting (agent), in what (object), on the basis of what (basis), in order to what (objective), and why (impact). We develop an ontology that encapsulates these key axes of difference to a) illuminate seeming inconsistencies across the literature and b) more effectively manage a dizzying number of TAI considerations. We then reflect this ontology through a corpus of publications exploring fairness, accountability, and transparency to examine the variety of ways that TAI is considered within and between these approaches to promoting trust.
Submitted 1 August, 2022;
originally announced August 2022.
-
AI-assisted Optimization of the ECCE Tracking System at the Electron Ion Collider
Authors:
C. Fanelli,
Z. Papandreou,
K. Suresh,
J. K. Adkins,
Y. Akiba,
A. Albataineh,
M. Amaryan,
I. C. Arsene,
C. Ayerbe Gayoso,
J. Bae,
X. Bai,
M. D. Baker,
M. Bashkanov,
R. Bellwied,
F. Benmokhtar,
V. Berdnikov,
J. C. Bernauer,
F. Bock,
W. Boeglin,
M. Borysova,
E. Brash,
P. Brindza,
W. J. Briscoe,
M. Brooks,
S. Bueltmann, et al. (258 additional authors not shown)
Abstract:
The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years from now, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to leverage Artificial Intelligence (AI) already starting from the design and R&D phases. The EIC Comprehensive Chromodynamics Experiment (ECCE) is a consortium that proposed a detector design based on a 1.5T solenoid. The EIC detector proposal review concluded that the ECCE design will serve as the reference design for an EIC detector. Herein we describe a comprehensive optimization of the ECCE tracker using AI. The work required a complex parametrization of the simulated detector system. Our approach dealt with an optimization problem in a multidimensional design space driven by multiple objectives that encode the detector performance, while satisfying several mechanical constraints. We describe our strategy and show results obtained for the ECCE tracking system. The AI-assisted design is agnostic to the simulation framework and can be extended to other sub-detectors or to a system of sub-detectors to further optimize the performance of the EIC detector.
Submitted 19 May, 2022; v1 submitted 18 May, 2022;
originally announced May 2022.
-
Conditional $β$-VAE for De Novo Molecular Generation
Authors:
Ryan J Richards,
Austen M Groener
Abstract:
Deep learning has significantly advanced and accelerated de novo molecular generation. Generative networks, namely Variational Autoencoders (VAEs) can not only randomly generate new molecules, but also alter molecular structures to optimize specific chemical properties which are pivotal for drug-discovery. While VAEs have been proposed and researched in the past for pharmaceutical applications, they possess deficiencies which limit their ability to both optimize properties and decode syntactically valid molecules. We present a recurrent, conditional $β$-VAE which disentangles the latent space to enhance post hoc molecule optimization. We create a mutual information driven training protocol and data augmentations to both increase molecular validity and promote longer sequence generation. We demonstrate the efficacy of our framework on the ZINC-250k dataset, achieving SOTA unconstrained optimization results on the penalized LogP (pLogP) and QED scores, while also matching current SOTA results for validity, novelty and uniqueness scores for random generation. We match the current SOTA on QED for top-3 molecules at 0.948, while setting a new SOTA for pLogP optimization at 104.29, 90.12, 69.68 and demonstrating improved results on the constrained optimization task.
Submitted 1 May, 2022;
originally announced May 2022.
-
Better Together? An Evaluation of AI-Supported Code Translation
Authors:
Justin D. Weisz,
Michael Muller,
Steven I. Ross,
Fernando Martinez,
Stephanie Houde,
Mayank Agarwal,
Kartik Talamadupula,
John T. Richards
Abstract:
Generative machine learning models have recently been applied to source code, for use cases including translating code between programming languages, creating documentation from code, and auto-completing methods. Yet, state-of-the-art models often produce code that is erroneous or incomplete. In a controlled study with 32 software engineers, we examined whether such imperfect outputs are helpful in the context of Java-to-Python code translation. When aided by the outputs of a code translation model, participants produced code with fewer errors than when working alone. We also examined how the quality and quantity of AI translations affected the work process and quality of outcomes, and observed that providing multiple translations had a larger impact on the translation process than varying the quality of provided translations. Our results tell a complex, nuanced story about the benefits of generative code models and the challenges software engineers face when working with their outputs. Our work motivates the need for intelligent user interfaces that help software engineers effectively work with generative code models in order to understand and evaluate their outputs and achieve superior outcomes to working alone.
Submitted 15 February, 2022;
originally announced February 2022.
-
Evaluating a Methodology for Increasing AI Transparency: A Case Study
Authors:
David Piorkowski,
John Richards,
Michael Hind
Abstract:
In reaction to growing concerns about the potential harms of artificial intelligence (AI), societies have begun to demand more transparency about how AI models and systems are created and used. To address these concerns, several efforts have proposed documentation templates containing questions to be answered by model developers. These templates provide a useful starting point, but no single template can cover the needs of diverse documentation consumers. It is possible in principle, however, to create a repeatable methodology to generate truly useful documentation. Richards et al. [25] proposed such a methodology for identifying specific documentation needs and creating templates to address those needs. Although this is a promising proposal, it has not been evaluated.
This paper presents the first evaluation of this user-centered methodology in practice, reporting on the experiences of a team in the domain of AI for healthcare that adopted it to increase transparency for several AI models. The methodology was found to be usable by developers not trained in user-centered techniques, guiding them to create a documentation template that addressed the specific needs of their consumers while still being reusable across different models and use cases. An analysis of the benefits and costs of this methodology is reviewed, and suggestions for further improvement in both the methodology and supporting tools are summarized.
Submitted 12 March, 2024; v1 submitted 24 January, 2022;
originally announced January 2022.
-
Using Document Similarity Methods to create Parallel Datasets for Code Translation
Authors:
Mayank Agarwal,
Kartik Talamadupula,
Fernando Martinez,
Stephanie Houde,
Michael Muller,
John Richards,
Steven I Ross,
Justin D. Weisz
Abstract:
Translating source code from one programming language to another is a critical, time-consuming task in modernizing legacy applications and codebases. Recent work in this space has drawn inspiration from the software naturalness hypothesis by applying natural language processing techniques towards automating the code translation task. However, due to the paucity of parallel data in this domain, supervised techniques have only been applied to a limited set of popular programming languages. To bypass this limitation, unsupervised neural machine translation techniques have been proposed to learn code translation using only monolingual corpora. In this work, we propose to use document similarity methods to create noisy parallel datasets of code, thus enabling supervised techniques to be applied for automated code translation without having to rely on the availability or expensive curation of parallel code datasets. We explore the noise tolerance of models trained on such automatically-created datasets and show that these models perform comparably to models trained on ground truth for reasonable levels of noise. Finally, we exhibit the practical utility of the proposed method by creating parallel datasets for languages beyond the ones explored in prior work, thus expanding the set of programming languages for automated code translation.
Submitted 11 October, 2021;
originally announced October 2021.
-
AI Explainability 360: Impact and Design
Authors:
Vijay Arya,
Rachel K. E. Bellamy,
Pin-Yu Chen,
Amit Dhurandhar,
Michael Hind,
Samuel C. Hoffman,
Stephanie Houde,
Q. Vera Liao,
Ronny Luss,
Aleksandra Mojsilovic,
Sami Mourad,
Pablo Pedemonte,
Ramya Raghavendra,
John Richards,
Prasanna Sattigeri,
Karthikeyan Shanmugam,
Moninder Singh,
Kush R. Varshney,
Dennis Wei,
Yunfeng Zhang
Abstract:
As artificial intelligence and machine learning algorithms become increasingly prevalent in society, multiple stakeholders are calling for these algorithms to provide explanations. At the same time, these stakeholders, whether they be affected citizens, government regulators, domain experts, or system developers, have different explanation needs. To address these needs, in 2019, we created AI Explainability 360 (Arya et al. 2020), an open source software toolkit featuring ten diverse and state-of-the-art explainability methods and two evaluation metrics. This paper examines the impact of the toolkit with several case studies, statistics, and community feedback. The different ways in which users have experienced AI Explainability 360 have resulted in multiple types of impact and improvements in multiple metrics, highlighted by the adoption of the toolkit by the independent LF AI & Data Foundation. The paper also describes the flexible design of the toolkit, examples of its use, and the significant educational material and documentation available to its users.
Submitted 24 September, 2021;
originally announced September 2021.
-
Perfection Not Required? Human-AI Partnerships in Code Translation
Authors:
Justin D. Weisz,
Michael Muller,
Stephanie Houde,
John Richards,
Steven I. Ross,
Fernando Martinez,
Mayank Agarwal,
Kartik Talamadupula
Abstract:
Generative models have become adept at producing artifacts such as images, videos, and prose at human-like levels of proficiency. New generative techniques, such as unsupervised neural machine translation (NMT), have recently been applied to the task of generating source code, translating it from one programming language to another. The artifacts produced in this way may contain imperfections, such as compilation or logical errors. We examine the extent to which software engineers would tolerate such imperfections and explore ways to aid the detection and correction of those errors. Using a design scenario approach, we interviewed 11 software engineers to understand their reactions to the use of an NMT model in the context of application modernization, focusing on the task of translating source code from one language to another. Our three-stage scenario sparked discussions about the utility and desirability of working with an imperfect AI system, how acceptance of that system's outputs would be established, and future opportunities for generative AI in application modernization. Our study highlights how UI features such as confidence highlighting and alternate translations help software engineers work with and better understand generative NMT models.
Submitted 8 April, 2021;
originally announced April 2021.
-
The Sanction of Authority: Promoting Public Trust in AI
Authors:
Bran Knowles,
John T. Richards
Abstract:
Trusted AI literature to date has focused on the trust needs of users who knowingly interact with discrete AIs. Conspicuously absent from the literature is a rigorous treatment of public trust in AI. We argue that public distrust of AI originates from the under-development of a regulatory ecosystem that would guarantee the trustworthiness of the AIs that pervade society. Drawing from structuration theory and literature on institutional trust, we offer a model of public trust in AI that differs starkly from models driving Trusted AI efforts. This model provides a theoretical scaffolding for Trusted AI research which underscores the need to develop nothing less than a comprehensive and visibly functioning regulatory ecosystem. We elaborate the pivotal role of externally auditable AI documentation within this model and the work to be done to ensure it is effective, and outline a number of actions that would promote public trust in AI. We discuss how existing efforts to develop AI documentation within organizations -- both to inform potential adopters of AI components and support the deliberations of risk and ethics review boards -- are necessary but insufficient assurance of the trustworthiness of AI. We argue that being accountable to the public in ways that earn their trust, through elaborating rules for AI and developing resources for enforcing these rules, is what will ultimately make AI trustworthy enough to be woven into the fabric of our society.
Submitted 22 January, 2021;
originally announced February 2021.
-
Quality Estimation & Interpretability for Code Translation
Authors:
Mayank Agarwal,
Kartik Talamadupula,
Stephanie Houde,
Fernando Martinez,
Michael Muller,
John Richards,
Steven Ross,
Justin D. Weisz
Abstract:
Recently, the automated translation of source code from one programming language to another by using automatic approaches inspired by Neural Machine Translation (NMT) methods for natural languages has come under study. However, such approaches suffer from the same problem as previous NMT approaches on natural languages, viz. the lack of an ability to estimate and evaluate the quality of the translations; and consequently ascribe some measure of interpretability to the model's choices. In this paper, we attempt to estimate the quality of source code translations built on top of the TransCoder model. We consider the code translation task as an analog of machine translation (MT) for natural languages, with some added caveats. We present our main motivation from a user study built around code translation; and present a technique that correlates the confidences generated by that model to lint errors in the translated code. We conclude with some observations on these correlations, and some ideas for future work.
Submitted 26 April, 2021; v1 submitted 4 December, 2020;
originally announced December 2020.
-
Towards evaluating and eliciting high-quality documentation for intelligent systems
Authors:
David Piorkowski,
Daniel González,
John Richards,
Stephanie Houde
Abstract:
A vital component of trust and transparency in intelligent systems built on machine learning and artificial intelligence is the development of clear, understandable documentation. However, such systems are notorious for their complexity and opaqueness, making quality documentation a non-trivial task. Furthermore, little is known about what makes such documentation "good." In this paper, we propose and evaluate a set of quality dimensions to identify in what ways this type of documentation falls short. Then, using those dimensions, we evaluate three different approaches for eliciting intelligent system documentation. We show how the dimensions identify shortcomings in such documentation and posit how such dimensions can be used to further enable users to provide documentation that is suitable to a given persona or use case.
Submitted 17 November, 2020;
originally announced November 2020.
-
A Methodology for Creating AI FactSheets
Authors:
John Richards,
David Piorkowski,
Michael Hind,
Stephanie Houde,
Aleksandra Mojsilović
Abstract:
As AI models and services are used in a growing number of high-stakes areas, a consensus is forming around the need for a clearer record of how these models and services are developed to increase trust. Several proposals for higher quality and more consistent AI documentation have emerged to address ethical and legal concerns and general social impacts of such systems. However, there is little published work on how to create this documentation. This is the first work to describe a methodology for creating the form of AI documentation we call FactSheets. We have used this methodology to create useful FactSheets for nearly two dozen models. This paper describes this methodology and shares the insights we have gathered. Within each step of the methodology, we describe the issues to consider and the questions to explore with the relevant people in an organization who will be creating and consuming the AI facts in a FactSheet. This methodology will accelerate the broader adoption of transparent AI documentation.
Submitted 27 June, 2020; v1 submitted 24 June, 2020;
originally announced June 2020.
-
Distributed Inference with Sparse and Quantized Communication
Authors:
Aritra Mitra,
John A. Richards,
Saurabh Bagchi,
Shreyas Sundaram
Abstract:
We consider the problem of distributed inference where agents in a network observe a stream of private signals generated by an unknown state, and aim to uniquely identify this state from a finite set of hypotheses. We focus on scenarios where communication between agents is costly, and takes place over channels with finite bandwidth. To reduce the frequency of communication, we develop a novel event-triggered distributed learning rule that is based on the principle of diffusing low beliefs on each false hypothesis. Building on this principle, we design a trigger condition under which an agent broadcasts only those components of its belief vector that have adequate innovation, to only those neighbors that require such information. We prove that our rule guarantees convergence to the true state exponentially fast almost surely despite sparse communication, and that it has the potential to significantly reduce information flow from uninformative agents to informative agents. Next, to deal with finite-precision communication channels, we propose a distributed learning rule that leverages the idea of adaptive quantization. We show that by sequentially refining the range of the quantizers, every agent can learn the truth exponentially fast almost surely, while using just $1$ bit to encode its belief on each hypothesis. For both our proposed algorithms, we rigorously characterize the trade-offs between communication-efficiency and the learning rate.
Submitted 7 June, 2021; v1 submitted 2 April, 2020;
originally announced April 2020.
-
Business (mis)Use Cases of Generative AI
Authors:
Stephanie Houde,
Vera Liao,
Jacquelyn Martino,
Michael Muller,
David Piorkowski,
John Richards,
Justin Weisz,
Yunfeng Zhang
Abstract:
Generative AI is a class of machine learning technology that learns to generate new data from training data. While deep fakes and media- and art-related generative AI breakthroughs have recently caught people's attention and imagination, the overall area is in its infancy for business use. Further, little is known about generative AI's potential for malicious misuse at large scale. Using co-creation design fictions with AI engineers, we explore the plausibility and severity of business misuse cases.
Submitted 2 March, 2020;
originally announced March 2020.
-
Experiences with Improving the Transparency of AI Models and Services
Authors:
Michael Hind,
Stephanie Houde,
Jacquelyn Martino,
Aleksandra Mojsilovic,
David Piorkowski,
John Richards,
Kush R. Varshney
Abstract:
AI models and services are used in a growing number of high-stakes areas, resulting in a need for increased transparency. Consistent with this, several proposals for higher quality and more consistent documentation of AI data, models, and systems have emerged. Little is known, however, about the needs of those who would produce or consume these new forms of documentation. Through semi-structured developer interviews and two document creation exercises, we have assembled a clearer picture of these needs and the various challenges faced in creating accurate and useful AI documentation. Based on the observations from this work, supplemented by feedback received during multiple design explorations and stakeholder conversations, we make recommendations for easing the collection and flexible presentation of AI facts to promote transparency.
Submitted 11 November, 2019;
originally announced November 2019.
-
One Explanation Does Not Fit All: A Toolkit and Taxonomy of AI Explainability Techniques
Authors:
Vijay Arya,
Rachel K. E. Bellamy,
Pin-Yu Chen,
Amit Dhurandhar,
Michael Hind,
Samuel C. Hoffman,
Stephanie Houde,
Q. Vera Liao,
Ronny Luss,
Aleksandra Mojsilović,
Sami Mourad,
Pablo Pedemonte,
Ramya Raghavendra,
John Richards,
Prasanna Sattigeri,
Karthikeyan Shanmugam,
Moninder Singh,
Kush R. Varshney,
Dennis Wei,
Yunfeng Zhang
Abstract:
As artificial intelligence and machine learning algorithms make further inroads into society, calls are increasing from multiple stakeholders for these algorithms to explain their outputs. At the same time, these stakeholders, whether they be affected citizens, government regulators, domain experts, or system developers, present different requirements for explanations. Toward addressing these needs, we introduce AI Explainability 360 (http://aix360.mybluemix.net/), an open-source software toolkit featuring eight diverse and state-of-the-art explainability methods and two evaluation metrics. Equally important, we provide a taxonomy to help entities requiring explanations to navigate the space of explanation methods, not only those in the toolkit but also in the broader literature on explainability. For data scientists and other users of the toolkit, we have implemented an extensible software architecture that organizes methods according to their place in the AI modeling pipeline. We also discuss enhancements to bring research innovations closer to consumers of explanations, ranging from simplified, more accessible versions of algorithms, to tutorials and an interactive web demo to introduce AI explainability to different audiences and application domains. Together, our toolkit and taxonomy can help identify gaps where more explainability methods are needed and provide a platform to incorporate them as they are developed.
Submitted 14 September, 2019; v1 submitted 6 September, 2019;
originally announced September 2019.
-
A Communication-Efficient Algorithm for Exponentially Fast Non-Bayesian Learning in Networks
Authors:
Aritra Mitra,
John A. Richards,
Shreyas Sundaram
Abstract:
We introduce a simple time-triggered protocol to achieve communication-efficient non-Bayesian learning over a network. Specifically, we consider a scenario where a group of agents interact over a graph with the aim of discerning the true state of the world that generates their joint observation profiles. To address this problem, we propose a novel distributed learning rule wherein agents aggregate neighboring beliefs based on a min-protocol, and the inter-communication intervals grow geometrically at a rate $a \geq 1$. Despite such sparse communication, we show that each agent is still able to rule out every false hypothesis exponentially fast with probability $1$, as long as $a$ is finite. For the special case when communication occurs at every time-step, i.e., when $a=1$, we prove that the asymptotic learning rates resulting from our algorithm are network-structure independent, and a strict improvement upon those existing in the literature. In contrast, when $a>1$, our analysis reveals that the asymptotic learning rates vary across agents, and exhibit a non-trivial dependence on the network topology coupled with the relative entropies of the agents' likelihood models. This motivates us to consider the problem of allocating signal structures to agents to maximize appropriate performance metrics. In certain special cases, we show that the eccentricity centrality and the decay centrality of the underlying graph help identify optimal allocations; for more general scenarios, we bound the deviation from the optimal allocation as a function of the parameter $a$, and the diameter of the communication graph.
Submitted 3 September, 2019;
originally announced September 2019.
-
A New Approach to Distributed Hypothesis Testing and Non-Bayesian Learning: Improved Learning Rate and Byzantine-Resilience
Authors:
Aritra Mitra,
John A. Richards,
Shreyas Sundaram
Abstract:
We study a setting where a group of agents, each receiving partially informative private signals, seek to collaboratively learn the true underlying state of the world (from a finite set of hypotheses) that generates their joint observation profiles. To solve this problem, we propose a distributed learning rule that differs fundamentally from existing approaches, in that it does not employ any form of "belief-averaging". Instead, agents update their beliefs based on a min-rule. Under standard assumptions on the observation model and the network structure, we establish that each agent learns the truth asymptotically almost surely. As our main contribution, we prove that with probability 1, each false hypothesis is ruled out by every agent exponentially fast at a network-independent rate that is strictly larger than existing rates. We then develop a computationally-efficient variant of our learning rule that is provably resilient to agents who do not behave as expected (as represented by a Byzantine adversary model) and deliberately try to spread misinformation.
Submitted 5 July, 2019;
originally announced July 2019.
-
A New Approach for Distributed Hypothesis Testing with Extensions to Byzantine-Resilience
Authors:
Aritra Mitra,
John A. Richards,
Shreyas Sundaram
Abstract:
We study a setting where a group of agents, each receiving partially informative private observations, seek to collaboratively learn the true state (among a set of hypotheses) that explains their joint observation profiles over time. To solve this problem, we propose a distributed learning rule that differs fundamentally from existing approaches in the sense that it does not employ any form of "belief-averaging". Specifically, every agent maintains a local belief (on each hypothesis) that is updated in a Bayesian manner without any network influence, and an actual belief that is updated (up to normalization) as the minimum of its own local belief and the actual beliefs of its neighbors. Under minimal requirements on the signal structures of the agents and the underlying communication graph, we establish consistency of the proposed belief update rule, i.e., we show that the actual beliefs of the agents asymptotically concentrate on the true state almost surely. As one of the key benefits of our approach, we show that our learning rule can be extended to scenarios that capture misbehavior on the part of certain agents in the network, modeled via the Byzantine adversary model. In particular, we prove that each non-adversarial agent can asymptotically learn the true state of the world almost surely, under appropriate conditions on the observation model and the network topology.
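The two belief updates described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `bayes_update` and `min_rule_update`, the list-based belief representation, and the example likelihood values are all assumptions made for illustration, and the sketch assumes every belief entry is strictly positive so that normalization is well defined.

```python
def bayes_update(local_belief, likelihoods):
    """Bayesian update of one agent's local belief (no network influence).

    local_belief[k] is the current belief on hypothesis k;
    likelihoods[k] is the (assumed) probability of the newly observed
    signal under hypothesis k.
    """
    unnormalized = [b * l for b, l in zip(local_belief, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def min_rule_update(local_belief, neighbor_actual_beliefs):
    """Actual-belief update: per-hypothesis minimum, then normalization.

    Takes the minimum over the agent's own (updated) local belief and the
    actual beliefs received from its neighbors, as the abstract describes.
    """
    candidates = [local_belief] + list(neighbor_actual_beliefs)
    minima = [min(values) for values in zip(*candidates)]
    total = sum(minima)
    return [m / total for m in minima]


# Hypothetical two-hypothesis example with a single neighbor.
local = bayes_update([0.5, 0.5], [0.8, 0.2])
actual = min_rule_update(local, [[0.5, 0.5]])
```

In this example the agent's signal favors the first hypothesis, and the min-rule keeps the lowest belief any contributor holds on each hypothesis before renormalizing, so a hypothesis that any one agent has nearly ruled out stays suppressed across the neighborhood.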
Submitted 14 March, 2019;
originally announced March 2019.
-
AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias
Authors:
Rachel K. E. Bellamy,
Kuntal Dey,
Michael Hind,
Samuel C. Hoffman,
Stephanie Houde,
Kalapriya Kannan,
Pranay Lohia,
Jacquelyn Martino,
Sameep Mehta,
Aleksandra Mojsilovic,
Seema Nagar,
Karthikeyan Natesan Ramamurthy,
John Richards,
Diptikalyan Saha,
Prasanna Sattigeri,
Moninder Singh,
Kush R. Varshney,
Yunfeng Zhang
Abstract:
Fairness is an increasingly important concern as machine learning models are used to support decision making in high-stakes applications such as mortgage lending, hiring, and prison sentencing. This paper introduces a new open source Python toolkit for algorithmic fairness, AI Fairness 360 (AIF360), released under an Apache v2.0 license (https://github.com/ibm/aif360). The main objectives of this toolkit are to help facilitate the transition of fairness research algorithms to use in an industrial setting and to provide a common framework for fairness researchers to share and evaluate algorithms.
The package includes a comprehensive set of fairness metrics for datasets and models, explanations for these metrics, and algorithms to mitigate bias in datasets and models. It also includes an interactive Web experience (https://aif360.mybluemix.net) that provides a gentle introduction to the concepts and capabilities for line-of-business users, as well as extensive documentation, usage guidance, and industry-specific tutorials to enable data scientists and practitioners to incorporate the most appropriate tool for their problem into their work products. The architecture of the package has been engineered to conform to a standard paradigm used in data science, thereby further improving usability for practitioners. Such architectural design and abstractions enable researchers and developers to extend the toolkit with their new algorithms and improvements, and to use it for performance benchmarking. A built-in testing infrastructure maintains code quality.
Submitted 3 October, 2018;
originally announced October 2018.
-
Classification of simulated radio signals using Wide Residual Networks for use in the search for extra-terrestrial intelligence
Authors:
G. A. Cox,
S. Egly,
G. R. Harp,
J. Richards,
S. Vinodababu,
J. Voien
Abstract:
We describe a new approach and algorithm for the detection of artificial signals and their classification in the search for extraterrestrial intelligence (SETI). The characteristics of radio signals observed during SETI research are often most apparent when those signals are represented as spectrograms. Additionally, many observed signals tend to share the same characteristics, allowing for sorting of the signals into different classes. For this work, complex-valued time-series data were simulated to produce a corpus of 140,000 signals from seven different signal classes. A wide residual neural network was then trained to classify these signal types using the gray-scale 2D spectrogram representation of those signals. An average $F_1$ score of 95.11% was attained when tested on previously unobserved simulated signals. We also report on the performance of the model across a range of signal amplitudes.
Submitted 22 March, 2018;
originally announced March 2018.
-
Detecting Egregious Conversations between Customers and Virtual Agents
Authors:
Tommy Sandbank,
Michal Shmueli-Scheuer,
Jonathan Herzig,
David Konopnicki,
John Richards,
David Piorkowski
Abstract:
Virtual agents are becoming a prominent channel of interaction in customer service. Not all customer interactions are smooth, however, and some can become almost comically bad. In such instances, a human agent might need to step in and salvage the conversation. Detecting bad conversations is important since disappointing customer service may threaten customer loyalty and impact revenue. In this paper, we outline an approach to detecting such egregious conversations, using behavioral cues from the user, patterns in agent responses, and user-agent interaction. Using logs of two commercial systems, we show that using these features improves the detection F1-score by around 20% over using textual features alone. In addition, we show that those features are common across two quite different domains and, arguably, universal.
Submitted 16 April, 2018; v1 submitted 15 November, 2017;
originally announced November 2017.