-
A Systematic Review of User-Centred Evaluation of Explainable AI in Healthcare
Authors:
Ivania Donoso-Guzmán,
Kristýna Sirka Kacafírková,
Maxwell Szymanski,
An Jacobs,
Denis Parra,
Katrien Verbert
Abstract:
Despite promising developments in Explainable Artificial Intelligence (XAI), the practical value of XAI methods remains under-explored and insufficiently validated in real-world settings. Robust and context-aware evaluation is essential, not only to produce understandable explanations but also to ensure their trustworthiness and usability for intended users, yet it tends to be overlooked because there are no clear guidelines on how to design an evaluation with users.
This study addresses this gap with two main goals: (1) to develop a framework of well-defined, atomic properties that characterise the user experience of XAI in healthcare; and (2) to provide clear, context-sensitive guidelines for defining evaluation strategies based on system characteristics.
We conducted a systematic review of 82 user studies, sourced from five databases, all situated within healthcare settings and focused on evaluating AI-generated explanations. The analysis was guided by a predefined coding scheme informed by an existing evaluation framework, complemented by inductive codes developed iteratively.
The review yields three key contributions: (1) a synthesis of current evaluation practices, highlighting a growing focus on human-centred approaches in healthcare XAI; (2) insights into the interrelations among explanation properties; and (3) an updated framework and a set of actionable guidelines to support interdisciplinary teams in designing and implementing effective evaluation strategies for XAI systems tailored to specific application contexts.
Submitted 16 June, 2025;
originally announced June 2025.
-
A Neural Network Model of Spatial and Feature-Based Attention
Authors:
Ruoyang Hu,
Robert A. Jacobs
Abstract:
Visual attention is a mechanism closely intertwined with vision and memory. Top-down information influences visual processing through attention. We designed a neural network model inspired by aspects of human visual attention. This model consists of two networks: one serves as a basic processor performing a simple task, while the other processes contextual information and guides the first network through attention to adapt to more complex tasks. After training the model and visualizing the learned attention response, we discovered that the model's emergent attention patterns corresponded to spatial and feature-based attention. This similarity between human visual attention and attention in computer vision suggests a promising direction for studying human cognition using neural network models.
Submitted 5 June, 2025;
originally announced June 2025.
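The abstract describes a context network that guides a basic processor "through attention" and reports emergent spatial and feature-based patterns. One common way to model these two forms of top-down attention is multiplicative gain modulation. The sketch below assumes that mechanism: one gain per channel for feature-based attention and one gain per location for spatial attention. The function name and the idea that the gains are supplied directly, rather than computed by a context network, are hypothetical simplifications and not the paper's architecture.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [channel][row][col]


def modulate(fmap: FeatureMap, channel_gain: List[float],
             spatial_gain: List[List[float]]) -> FeatureMap:
    """Scale a base network's feature map by top-down attention gains.

    channel_gain[c]   -> feature-based attention (one gain per channel)
    spatial_gain[i][j] -> spatial attention (one gain per location)
    In a full model, a context network would produce both gain sets
    (hypothetical here: they are passed in directly).
    """
    if len(channel_gain) != len(fmap):
        raise ValueError("need exactly one gain per channel")
    return [
        [
            [value * channel_gain[c] * spatial_gain[i][j] for j, value in enumerate(row)]
            for i, row in enumerate(channel)
        ]
        for c, channel in enumerate(fmap)
    ]
```

For example, `modulate([[[1.0, 1.0]], [[3.0, 3.0]]], [1.0, 0.0], [[1.0, 2.0]])` keeps channel 0, boosts its right-hand location, and suppresses channel 1 entirely. Multiplicative gains are only one hypothesis for how such guidance could work. Additive biases or learned gating are equally plausible readings of the abstract.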
-
Towards Fair Medical AI: Adversarial Debiasing of 3D CT Foundation Embeddings
Authors:
Guangyao Zheng,
Michael A. Jacobs,
Vladimir Braverman,
Vishwa S. Parekh
Abstract:
Self-supervised learning has revolutionized medical imaging by enabling efficient and generalizable feature extraction from large-scale unlabeled datasets. Recently, self-supervised foundation models have been extended to three-dimensional (3D) computed tomography (CT) data, generating compact, information-rich embeddings with 1408 features that achieve state-of-the-art performance on downstream tasks such as intracranial hemorrhage detection and lung cancer risk forecasting. However, these embeddings have been shown to encode demographic information, such as age, sex, and race, which poses a significant risk to the fairness of clinical applications.
In this work, we propose a Variational Autoencoder (VAE)-based adversarial debiasing framework that transforms these embeddings into a new latent space where demographic information is no longer encoded, while maintaining performance on critical downstream tasks. We validated our approach on the NLST lung cancer screening dataset, demonstrating that the debiased embeddings effectively remove several types of encoded demographic information and improve fairness without compromising predictive accuracy for lung cancer risk at 1-year and 2-year intervals. Additionally, our approach makes the embeddings robust against adversarial bias attacks. These results highlight the potential of adversarial debiasing techniques to ensure fairness and equity in clinical applications of self-supervised 3D CT embeddings, paving the way for their broader adoption in unbiased medical decision-making.
Submitted 5 February, 2025;
originally announced February 2025.
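The abstract does not state its training loss. A generic way to write VAE-based adversarial debiasing is shown below as an illustrative assumption, not the paper's objective. Here $x$ is a 1408-feature CT embedding, $z$ is the debiased latent, $s$ are the demographic attributes, $q_\phi$/$p_\theta$ are the encoder/decoder, and $a_\omega$ is an adversary that tries to predict $s$ from $z$. The encoder minimises reconstruction and KL terms while maximising the adversary's loss. The adversary minimises that same loss.

```latex
$$
\min_{\phi,\theta}\;\max_{\omega}\;\;
\mathbb{E}_{q_\phi(z \mid x)}\!\big[-\log p_\theta(x \mid z)\big]
+ \beta\,\mathrm{KL}\!\big(q_\phi(z \mid x)\,\|\,p(z)\big)
- \lambda\,\mathcal{L}_{\text{adv}}\!\big(a_\omega(z),\, s\big)
$$
```

The weights $\beta$ and $\lambda$ (both assumptions here) trade off reconstruction fidelity, which preserves downstream signal, against how strongly demographic information is removed from $z$.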
-
Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge
Authors:
Hanna Wallach,
Meera Desai,
A. Feder Cooper,
Angelina Wang,
Chad Atalla,
Solon Barocas,
Su Lin Blodgett,
Alexandra Chouldechova,
Emily Corvi,
P. Alex Dow,
Jean Garcia-Gathright,
Alexandra Olteanu,
Nicholas Pangakis,
Stefanie Reed,
Emily Sheng,
Dan Vann,
Jennifer Wortman Vaughan,
Matthew Vogel,
Hannah Washington,
Abigail Z. Jacobs
Abstract:
The measurement tasks involved in evaluating generative AI (GenAI) systems lack sufficient scientific rigor, leading to what has been described as "a tangle of sloppy tests [and] apples-to-oranges comparisons" (Roose, 2024). In this position paper, we argue that the ML community would benefit from learning from and drawing on the social sciences when developing and using measurement instruments for evaluating GenAI systems. Specifically, our position is that evaluating GenAI systems is a social science measurement challenge. We present a four-level framework, grounded in measurement theory from the social sciences, for measuring concepts related to the capabilities, behaviors, and impacts of GenAI systems. This framework has two important implications: First, it can broaden the expertise involved in evaluating GenAI systems by enabling stakeholders with different perspectives to participate in conceptual debates. Second, it brings rigor to both conceptual and operational debates by offering a set of lenses for interrogating validity.
Submitted 6 June, 2025; v1 submitted 1 February, 2025;
originally announced February 2025.
-
Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice
Authors:
A. Feder Cooper,
Christopher A. Choquette-Choo,
Miranda Bogen,
Matthew Jagielski,
Katja Filippova,
Ken Ziyu Liu,
Alexandra Chouldechova,
Jamie Hayes,
Yangsibo Huang,
Niloofar Mireshghallah,
Ilia Shumailov,
Eleni Triantafillou,
Peter Kairouz,
Nicole Mitchell,
Percy Liang,
Daniel E. Ho,
Yejin Choi,
Sanmi Koyejo,
Fernando Delgado,
James Grimmelmann,
Vitaly Shmatikov,
Christopher De Sa,
Solon Barocas,
Amy Cyphert,
Mark Lemley, et al. (10 additional authors not shown)
Abstract:
We articulate fundamental mismatches between technical methods for machine unlearning in Generative AI, and documented aspirations for broader impact that these methods could have for law and policy. These aspirations are both numerous and varied, motivated by issues that pertain to privacy, copyright, safety, and more. For example, unlearning is often invoked as a solution for removing the effects of targeted information from a generative-AI model's parameters, e.g., a particular individual's personal data or in-copyright expression of Spiderman that was included in the model's training data. Unlearning is also proposed as a way to prevent a model from generating targeted types of information in its outputs, e.g., generations that closely resemble a particular individual's data or reflect the concept of "Spiderman." Both of these goals--the targeted removal of information from a model and the targeted suppression of information from a model's outputs--present various technical and substantive challenges. We provide a framework for thinking rigorously about these challenges, which enables us to be clear about why unlearning is not a general-purpose solution for circumscribing generative-AI model behavior in service of broader positive impact. We aim for conceptual clarity and to encourage more thoughtful communication among machine learning (ML), law, and policy experts who seek to develop and apply technical methods for compliance with policy objectives.
Submitted 9 December, 2024;
originally announced December 2024.
-
Demographic Predictability in 3D CT Foundation Embeddings
Authors:
Guangyao Zheng,
Michael A. Jacobs,
Vishwa S. Parekh
Abstract:
Self-supervised foundation models have recently been successfully extended to encode three-dimensional (3D) computed tomography (CT) images, with excellent performance across several downstream tasks, such as intracranial hemorrhage detection and lung cancer risk forecasting. However, as self-supervised models learn from complex data distributions, questions arise concerning whether these embeddings capture demographic information, such as age, sex, or race. Using the National Lung Screening Trial (NLST) dataset, which contains 3D CT images and demographic data, we evaluated a range of models (softmax regression, linear regression, linear support vector machine, random forest, and decision tree) to predict the sex, race, and age of the patients in the images. Our results indicate that the embeddings effectively encoded age and sex information, with a linear regression model achieving a root mean square error (RMSE) of 3.8 years for age prediction and a softmax regression model attaining an AUC of 0.998 for sex classification. Race prediction was less effective, with an AUC of 0.878. These findings suggest that a detailed exploration of the information encoded in self-supervised learning frameworks is needed to help ensure fair, responsible, and privacy-protecting healthcare AI.
Submitted 27 November, 2024;
originally announced December 2024.
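The abstract reports two metrics: RMSE (in years) for age regression, and ROC AUC for sex and race classification. The standard-library sketch below shows how these numbers are computed from a probe's predictions. The function names and toy inputs are illustrative only and do not use the paper's data. AUC is computed as the probability that a randomly chosen positive example is scored above a randomly chosen negative one (the Mann-Whitney formulation), with ties counted as one half.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, e.g. in years for age prediction."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

This pairwise version is O(P·N). For large cohorts, a rank-based implementation such as scikit-learn's `roc_auc_score` gives the same value far more efficiently.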
-
Evaluating Generative AI Systems is a Social Science Measurement Challenge
Authors:
Hanna Wallach,
Meera Desai,
Nicholas Pangakis,
A. Feder Cooper,
Angelina Wang,
Solon Barocas,
Alexandra Chouldechova,
Chad Atalla,
Su Lin Blodgett,
Emily Corvi,
P. Alex Dow,
Jean Garcia-Gathright,
Alexandra Olteanu,
Stefanie Reed,
Emily Sheng,
Dan Vann,
Jennifer Wortman Vaughan,
Matthew Vogel,
Hannah Washington,
Abigail Z. Jacobs
Abstract:
Across academia, industry, and government, there is an increasing awareness that the measurement tasks involved in evaluating generative AI (GenAI) systems are especially difficult. We argue that these measurement tasks are highly reminiscent of measurement tasks found throughout the social sciences. With this in mind, we present a framework, grounded in measurement theory from the social sciences, for measuring concepts related to the capabilities, impacts, opportunities, and risks of GenAI systems. The framework distinguishes between four levels: the background concept, the systematized concept, the measurement instrument(s), and the instance-level measurements themselves. This four-level approach differs from the way measurement is typically done in ML, where researchers and practitioners appear to jump straight from background concepts to measurement instruments, with little to no explicit systematization in between. As well as surfacing assumptions, thereby making it easier to understand exactly what the resulting measurements do and do not mean, this framework has two important implications for evaluating evaluations: First, it can enable stakeholders from different worlds to participate in conceptual debates, broadening the expertise involved in evaluating GenAI systems. Second, it brings rigor to operational debates by offering a set of lenses for interrogating the validity of measurement instruments and their resulting measurements.
Submitted 16 November, 2024;
originally announced November 2024.
-
Predicting DNA fragmentation: A non-destructive analogue to chemical assays using machine learning
Authors:
Byron A Jacobs,
Ifthakaar Shaik,
Frando Lin
Abstract:
Globally, infertility rates are increasing, with 2.5% of all births being assisted by in vitro fertilisation (IVF) in 2022. Male infertility is the cause of approximately half of these cases. The quality of sperm DNA has a substantial impact on the success of IVF. Sperm DNA is traditionally assessed through chemical assays, which render the sperm cells ineligible for IVF. Many compounding factors contribute to the population crisis, with fertility rates dropping globally in recent history. As such, assisted reproductive technologies (ART) have been the focus of recent research efforts. Simultaneously, artificial intelligence has become ubiquitous and is permeating more aspects of modern life. Building on the exceptional performance of state-of-the-art machine learning in many sectors, this work proposes a novel framework for predicting sperm cell DNA fragmentation from images of unstained sperm. The result is a predictive model that preserves sperm integrity and allows for optimal selection of sperm for IVF.
Submitted 12 February, 2025; v1 submitted 20 September, 2024;
originally announced September 2024.
-
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
Authors:
Jinghan Yao,
Sam Ade Jacobs,
Masahiro Tanaka,
Olatunji Ruwase,
Hari Subramoni,
Dhabaleswar K. Panda
Abstract:
Large Language Models (LLMs) with long context capabilities are integral to complex tasks in natural language processing and computational biology, such as text generation and protein sequence analysis. However, training LLMs directly on extremely long contexts demands considerable GPU resources and increased memory, leading to higher costs and greater complexity. Alternative approaches that introduce long context capabilities via downstream finetuning or adaptations impose significant design limitations. In this paper, we propose the Fully Pipelined Distributed Transformer (FPDT) for training long-context LLMs with extreme hardware efficiency. For GPT and Llama models, we achieve a 16x increase in the sequence length that can be trained on the same hardware compared to current state-of-the-art solutions. With our dedicated sequence chunk pipeline design, we can now train an 8B LLM with a 2-million-token sequence length on only 4 GPUs, while also maintaining over 55% model FLOPs utilization (MFU). Our proposed FPDT is agnostic to existing training techniques and is shown to work efficiently across different LLMs.
Submitted 13 May, 2025; v1 submitted 29 August, 2024;
originally announced August 2024.
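The abstract quotes "over 55% of MFU" (model FLOPs utilization). MFU is the ratio of the model FLOPs actually achieved per second to the hardware's aggregate peak FLOPs per second. The sketch below uses a common approximation for training FLOPs per token: 6·N for the dense weights plus a PaLM-style attention term of 12·L·d·s, where L is the number of layers, d is the model width, and s is the sequence length. Whether the FPDT paper counts FLOPs this exact way is an assumption. Function names are hypothetical.

```python
def training_flops_per_token(n_params: float, n_layers: int, d_model: int, seq_len: int) -> float:
    """Approximate forward+backward training FLOPs per token.

    6 * n_params: dense matrix multiplies over the weights.
    12 * n_layers * d_model * seq_len: attention score/value FLOPs, which grow
    with context length (PaLM-style estimate; ignores causal-mask savings).
    """
    return 6 * n_params + 12 * n_layers * d_model * seq_len


def model_flops_utilization(tokens_per_second: float, n_params: float, n_layers: int,
                            d_model: int, seq_len: int, n_gpus: int,
                            peak_flops_per_gpu: float) -> float:
    """MFU = achieved model FLOPs/s divided by aggregate peak FLOPs/s."""
    achieved = tokens_per_second * training_flops_per_token(n_params, n_layers, d_model, seq_len)
    return achieved / (n_gpus * peak_flops_per_gpu)
```

Consider a hypothetical 8B configuration with 32 layers, `d_model = 4096`, and `seq_len = 2**21`. There the attention term (about 3.3e12 FLOPs per token) exceeds 6N (4.8e10) by well over an order of magnitude. This is why MFU at million-token contexts depends so heavily on how attention FLOPs are counted.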
-
Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training
Authors:
Xinyu Lian,
Sam Ade Jacobs,
Lev Kurilenko,
Masahiro Tanaka,
Stas Bekman,
Olatunji Ruwase,
Minjia Zhang
Abstract:
Existing checkpointing approaches seem ill-suited for distributed training even though hardware limitations make model parallelism, i.e., sharding model state across multiple accelerators, a requirement for model scaling. Consolidating distributed model state into a single checkpoint unacceptably slows down training, and is impractical at extreme scales. Distributed checkpoints, in contrast, are tightly coupled to the model parallelism and hardware configurations of the training run, and thus unusable on different configurations. To address this problem, we propose Universal Checkpointing, a technique that enables efficient checkpoint creation while providing the flexibility of resuming on arbitrary parallelism strategies and hardware configurations. Universal Checkpointing unlocks unprecedented capabilities for large-scale training such as improved resilience to hardware failures through continued training on remaining healthy hardware, and reduced training time through opportunistic exploitation of elastic capacity.
The key insight of Universal Checkpointing is the selection of the optimal representation in each phase of the checkpointing life cycle: distributed representation for saving, and consolidated representation for loading. This is achieved using two key mechanisms. First, the universal checkpoint format, which consists of a consolidated representation of each model parameter and metadata for mapping parameter fragments into training ranks of an arbitrary model-parallelism configuration. Second, the universal checkpoint language, a simple but powerful specification language for converting distributed checkpoints into the universal checkpoint format. Our evaluation demonstrates the effectiveness and generality of Universal Checkpointing on state-of-the-art model architectures and a wide range of parallelism techniques.
Submitted 27 June, 2024; v1 submitted 26 June, 2024;
originally announced June 2024.
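The abstract describes a checkpoint life cycle that saves in a distributed representation and loads from a consolidated one. The sketch below illustrates that round trip for a single parameter under a simplifying assumption: the parameter is a flat list partitioned contiguously along one dimension, as in plain tensor parallelism. The names `shard`, `consolidate`, and `reshard` are hypothetical and are not DeepSpeed APIs. Real checkpoints also contain replicated parameters, fused tensors, and partitioned optimizer state, which is what the paper's specification language is meant to describe.

```python
from typing import Dict, List


def shard(param: List[float], degree: int) -> Dict[int, List[float]]:
    """Distributed representation: rank r owns one contiguous slice."""
    if degree <= 0 or len(param) % degree:
        raise ValueError("parameter length must divide evenly across ranks")
    size = len(param) // degree
    return {rank: param[rank * size:(rank + 1) * size] for rank in range(degree)}


def consolidate(fragments: Dict[int, List[float]]) -> List[float]:
    """Consolidated representation: concatenate fragments in rank order."""
    merged: List[float] = []
    for rank in sorted(fragments):
        merged.extend(fragments[rank])
    return merged


def reshard(fragments: Dict[int, List[float]], new_degree: int) -> Dict[int, List[float]]:
    """Resume under a different parallelism degree: consolidate, then re-slice."""
    return shard(consolidate(fragments), new_degree)
```

For example, a 12-element parameter saved across 4 ranks can be resumed on 3 ranks. Rank 1 then receives elements 4 through 7, even though no single saved fragment contains that slice.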
-
Algorithmic Transparency and Participation through the Handoff Lens: Lessons Learned from the U.S. Census Bureau's Adoption of Differential Privacy
Authors:
Amina A. Abdu,
Lauren M. Chambers,
Deirdre K. Mulligan,
Abigail Z. Jacobs
Abstract:
Emerging discussions on the responsible government use of algorithmic technologies propose transparency and public participation as key mechanisms for preserving accountability and trust. But in practice, the adoption and use of any technology shifts the social, organizational, and political context in which it is embedded. Therefore, translating transparency and participation efforts into meaningful, effective accountability must take these shifts into account. We adopt two theoretical frames, Mulligan and Nissenbaum's handoff model and Star and Griesemer's boundary objects, to reveal such shifts during the U.S. Census Bureau's adoption of differential privacy (DP) in its updated disclosure avoidance system (DAS) for the 2020 census. This update preserved (and arguably strengthened) the confidentiality protections that the Bureau is mandated to uphold, and the Bureau engaged in a range of activities to facilitate public understanding of and participation in the system design process. Using publicly available documents concerning the Census Bureau's implementation of DP, this case study seeks to expand our understanding of how technical shifts implicate values, how such shifts can afford (or fail to afford) greater transparency and participation in system design, and the importance of localized expertise throughout. We present three lessons from this case study toward grounding understandings of algorithmic transparency and participation: (1) efforts towards transparency and participation in algorithmic governance must center values and policy decisions, not just technical design decisions; (2) the handoff model is a useful tool for revealing how such values may be cloaked beneath technical decisions; and (3) boundary objects alone cannot bridge distant communities without trusted experts traveling alongside to broker their adoption.
Submitted 29 May, 2024;
originally announced May 2024.
-
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Authors:
Marah Abdin,
Jyoti Aneja,
Hany Awadalla,
Ahmed Awadallah,
Ammar Ahmad Awan,
Nguyen Bach,
Amit Bahree,
Arash Bakhtiari,
Jianmin Bao,
Harkirat Behl,
Alon Benhaim,
Misha Bilenko,
Johan Bjorck,
Sébastien Bubeck,
Martin Cai,
Qin Cai,
Vishrav Chaudhary,
Dong Chen,
Dongdong Chen,
Weizhu Chen,
Yen-Chun Chen,
Yi-Ling Chen,
Hao Cheng,
Parul Chopra,
Xiyang Dai, et al. (104 additional authors not shown)
Abstract:
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. Our training dataset is a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench, respectively). To enhance multilingual, multimodal, and long-context capabilities, we introduce three models in the phi-3.5 series: phi-3.5-mini, phi-3.5-MoE, and phi-3.5-Vision. The phi-3.5-MoE, a 16 x 3.8B MoE model with 6.6 billion active parameters, achieves superior performance in language reasoning, math, and code tasks compared to other open-source models of similar scale, such as Llama 3.1 and the Mixtral series, and on par with Gemini-1.5-Flash and GPT-4o-mini. Meanwhile, phi-3.5-Vision, a 4.2 billion parameter model derived from phi-3.5-mini, excels in reasoning tasks and is adept at handling both single-image and text prompts, as well as multi-image and text prompts.
Submitted 30 August, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
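The phi-3.5-MoE figures (16 experts, 6.6B active parameters) reflect a basic property of mixture-of-experts layers: a router sends each token to only a few experts, so only those experts' weights are active for that token. The abstract does not state the routing rule. The sketch below assumes top-2 routing with the softmax renormalised over the selected experts, a common variant. All names are hypothetical, and the experts are toy functions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, router_weights: List[Vector], experts: List[Expert],
                k: int = 2) -> Tuple[Vector, List[int]]:
    """Route one token to its top-k experts and mix their outputs by gate weight."""
    # One router logit per expert: a dot product with that expert's router row.
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in top])  # renormalise over selected experts only
    out = [0.0] * len(x)
    for gate, idx in zip(gates, top):
        y = experts[idx](x)  # only these k experts run for this token
        out = [o + gate * y_j for o, y_j in zip(out, y)]
    return out, top
```

In a real model, the experts replace only the feed-forward blocks while attention weights are shared. That is why the active count (6.6B) is not simply 2 × 3.8B.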
-
Individual Text Corpora Predict Openness, Interests, Knowledge and Level of Education
Authors:
Markus J. Hofmann,
Markus T. Jansen,
Christoph Wigbels,
Benny Briesemeister,
Arthur M. Jacobs
Abstract:
Here we examine whether the personality dimension of openness to experience can be predicted from an individual's Google search history. By web scraping, individual text corpora (ICs) were generated from 214 participants with a mean of 5 million word tokens each. We trained word2vec models and used the similarities in each IC to label words, which were derived from a lexical approach to personality. These IC-label-word similarities were used as predictive features in neural models. For training and validation, we relied on 179 participants and held out a test sample of 35 participants. A grid search over the number of predictive features, hidden units, and boost factor was performed. As the model selection criterion, we used R² in the validation samples penalized by the absolute R² difference between training and validation. The selected neural model explained 35% of the openness variance in the test sample, while an ensemble model with the same architecture often provided slightly more stable predictions for intellectual interests, knowledge in humanities and level of education. Finally, a learning curve analysis suggested that around 500 training participants are required for generalizable predictions. We discuss ICs as a complement to or replacement for survey-based psychodiagnostics.
Submitted 29 March, 2024;
originally announced April 2024.
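The model-selection rule in the abstract, validation R² penalized by the absolute train/validation R² gap, is easy to misread. The sketch below makes one reading concrete. It assumes "penalized by" means simple subtraction, which the abstract does not specify. The function names and the candidate results are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant target")
    return 1.0 - ss_res / ss_tot


def penalized_score(r2_train: float, r2_val: float) -> float:
    # Assumption: the overfitting penalty is subtracted from validation R^2.
    return r2_val - abs(r2_train - r2_val)


def select_model(results: Dict[str, Tuple[float, float]]) -> str:
    """Pick the grid-search candidate with the best penalized score.

    results maps a candidate name to (train R^2, validation R^2).
    """
    return max(results, key=lambda name: penalized_score(*results[name]))
```

With this rule, a candidate scoring (0.90, 0.40) loses to one scoring (0.45, 0.38): the first has the higher raw validation R², but its large train/validation gap drives its penalized score below zero.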
-
The Cadaver in the Machine: The Social Practices of Measurement and Validation in Motion Capture Technology
Authors:
Emma Harvey,
Hauke Sandhaus,
Abigail Z. Jacobs,
Emanuel Moss,
Mona Sloane
Abstract:
Motion capture systems, used across various domains, make body representations concrete through technical processes. We argue that the measurement of bodies and the validation of measurements for motion capture systems can be understood as social practices. By analyzing the findings of a systematic literature review (N=278) through the lens of social practice theory, we show how these practices, and their varying attention to errors, become ingrained in motion capture design and innovation over time. Moreover, we show how contemporary motion capture systems perpetuate assumptions about human bodies and their movements. We suggest that social practices of measurement and validation are ubiquitous in the development of data- and sensor-driven systems more broadly, and provide this work as a basis for investigating hidden design assumptions and their potential negative consequences in human-computer interaction.
Submitted 19 January, 2024;
originally announced January 2024.
-
Report of the 1st Workshop on Generative AI and Law
Authors:
A. Feder Cooper,
Katherine Lee,
James Grimmelmann,
Daphne Ippolito,
Christopher Callison-Burch,
Christopher A. Choquette-Choo,
Niloofar Mireshghallah,
Miles Brundage,
David Mimno,
Madiha Zahrah Choksi,
Jack M. Balkin,
Nicholas Carlini,
Christopher De Sa,
Jonathan Frankle,
Deep Ganguli,
Bryant Gipson,
Andres Guadamuz,
Swee Leng Harris,
Abigail Z. Jacobs,
Elizabeth Joh,
Gautam Kamath,
Mark Lemley,
Cass Matthews,
Christine McLeavey,
Corynne McSherry, et al. (10 additional authors not shown)
Abstract:
This report presents the takeaways of the inaugural Workshop on Generative AI and Law (GenLaw), held in July 2023. A cross-disciplinary group of practitioners and scholars from computer science and law convened to discuss the technical, doctrinal, and policy challenges presented by law for Generative AI, and by Generative AI for law, with an emphasis on U.S. law in particular. We begin the report with a high-level statement about why Generative AI is both immensely significant and immensely challenging for law. To meet these challenges, we conclude that there is an essential need for 1) a shared knowledge base that provides a common conceptual language for experts across disciplines; 2) clarification of the distinctive technical capabilities of generative-AI systems, as compared and contrasted to other computer and AI systems; 3) a logical taxonomy of the legal issues these systems raise; and, 4) a concrete research agenda to promote collaboration and knowledge-sharing on emerging issues at the intersection of Generative AI and law. In this report, we synthesize the key takeaways from the GenLaw workshop that begin to address these needs. All of the listed authors contributed to the workshop upon which this report is based, but they and their organizations do not necessarily endorse all of the specific claims in this report.
Submitted 2 December, 2023; v1 submitted 10 November, 2023;
originally announced November 2023.
-
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
Authors:
Sam Ade Jacobs,
Masahiro Tanaka,
Chengming Zhang,
Minjia Zhang,
Shuaiwen Leon Song,
Samyam Rajbhandari,
Yuxiong He
Abstract:
Computation in a typical Transformer-based large language model (LLM) can be characterized by batch size, hidden dimension, number of layers, and sequence length. Until now, system works for accelerating LLM training have focused on the first three dimensions: data parallelism for batch size, tensor parallelism for hidden size, and pipeline parallelism for model depth or layers. These widely studied forms of parallelism are not targeted or optimized for long-sequence Transformer models. Given practical application needs for long-sequence LLMs, renewed attention is being drawn to sequence parallelism. However, existing works in sequence parallelism are constrained by memory-communication inefficiency, limiting their scalability to long-sequence large models. In this work, we introduce DeepSpeed-Ulysses, a novel, portable and effective methodology for enabling highly efficient and scalable LLM training with extremely long sequence length. At its core, DeepSpeed-Ulysses partitions input data along the sequence dimension and employs an efficient all-to-all collective communication for attention computation. Theoretical communication analysis shows that whereas other methods incur communication overhead as sequence length increases, DeepSpeed-Ulysses maintains constant communication volume when sequence length and compute devices are increased proportionally. Furthermore, experimental evaluations show that DeepSpeed-Ulysses trains 2.5x faster with 4x longer sequence length than the existing state-of-the-art (SOTA) baseline.
Submitted 4 October, 2023; v1 submitted 25 September, 2023;
originally announced September 2023.
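The DeepSpeed-Ulysses abstract describes partitioning inputs along the sequence dimension and using an all-to-all collective for attention. The NumPy sketch below simulates P devices in a single process to show why one exchange suffices: after the all-to-all, each simulated device holds the full sequence for a subset of heads and can run ordinary attention locally. This is an illustration under stated assumptions, not DeepSpeed's implementation. Real collectives are replaced by array slicing, the head count is assumed divisible by P, and the helper names (`seq_to_head`, `head_to_seq`, `sent_per_device`) are hypothetical. Under the same assumptions, `sent_per_device` shows that one device's outgoing volume stays below (seq/P) x heads x head_dim, so it does not grow when sequence length and device count scale together.

```python
import numpy as np


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def attention(q, k, v):
    """Multi-head scaled dot-product attention on (seq, heads, head_dim) arrays."""
    q, k, v = (t.transpose(1, 0, 2) for t in (q, k, v))  # -> (heads, seq, head_dim)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return (softmax(scores) @ v).transpose(1, 0, 2)       # back to (seq, heads, head_dim)


def seq_to_head(shards):
    """Simulated all-to-all: P sequence shards (seq/P, H, d) -> P head shards (seq, H/P, d)."""
    P = len(shards)
    hp = shards[0].shape[1] // P
    return [np.concatenate([s[:, dst * hp:(dst + 1) * hp] for s in shards], axis=0)
            for dst in range(P)]


def head_to_seq(shards):
    """Inverse all-to-all: P head shards (seq, H/P, d) -> P sequence shards (seq/P, H, d)."""
    P = len(shards)
    n = shards[0].shape[0] // P
    return [np.concatenate([s[dst * n:(dst + 1) * n] for s in shards], axis=1)
            for dst in range(P)]


def ulysses_attention(q, k, v, P):
    """Attention where each of P simulated devices initially holds seq/P tokens."""
    qs, ks, vs = (seq_to_head(np.split(t, P, axis=0)) for t in (q, k, v))
    # After the all-to-all, every device sees the full sequence for H/P heads,
    # so attention for those heads runs locally with no further exchange.
    local = [attention(a, b, c) for a, b, c in zip(qs, ks, vs)]
    return np.concatenate(head_to_seq(local), axis=0)


def sent_per_device(seq, heads, head_dim, P):
    """Elements one device sends in one all-to-all (it keeps 1/P of its own shard)."""
    return (seq // P) * (heads // P) * head_dim * (P - 1)


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8, 4)) for _ in range(3))
print(np.allclose(ulysses_attention(q, k, v, P=4), attention(q, k, v)))  # True
```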
-
An Empirical Analysis of Racial Categories in the Algorithmic Fairness Literature
Authors:
Amina A. Abdu,
Irene V. Pasquetto,
Abigail Z. Jacobs
Abstract:
Recent work in algorithmic fairness has highlighted the challenge of defining racial categories for the purposes of anti-discrimination. These challenges are not new but have previously fallen to the state, which enacts race through government statistics, policies, and evidentiary standards in anti-discrimination law. Drawing on the history of state race-making, we examine how longstanding questions about the nature of race and discrimination appear within the algorithmic fairness literature. Through a content analysis of 60 papers published at FAccT between 2018 and 2020, we analyze how race is conceptualized and formalized in algorithmic fairness frameworks. We note that differing notions of race are adopted inconsistently, at times even within a single analysis. We also explore the institutional influences and values associated with these choices. While we find that categories used in algorithmic fairness work often echo legal frameworks, we demonstrate that values from academic computer science play an equally important role in the construction of racial categories. Finally, we examine the reasoning behind different operationalizations of race, finding that few papers explicitly describe their choices and even fewer justify them. We argue that the construction of racial categories is a value-laden process with significant social and political consequences for the project of algorithmic fairness. The widespread lack of justification around the operationalization of race reflects institutional norms that allow these political decisions to remain obscured within the backstage of knowledge production.
Submitted 12 September, 2023;
originally announced September 2023.
-
Development and validation of an interpretable machine learning-based calculator for predicting 5-year weight trajectories after bariatric surgery: a multinational retrospective cohort SOPHIA study
Authors:
Patrick Saux,
Pierre Bauvin,
Violeta Raverdy,
Julien Teigny,
Hélène Verkindt,
Tomy Soumphonphakdy,
Maxence Debert,
Anne Jacobs,
Daan Jacobs,
Valerie Monpellier,
Phong Ching Lee,
Chin Hong Lim,
Johanna C Andersson-Assarsson,
Lena Carlsson,
Per-Arne Svensson,
Florence Galtier,
Guelareh Dezfoulian,
Mihaela Moldovanu,
Severine Andrieux,
Julien Couster,
Marie Lepage,
Erminia Lembo,
Ornella Verrastro,
Maud Robert,
Paulina Salminen
, et al. (9 additional authors not shown)
Abstract:
Background: Weight loss trajectories after bariatric surgery vary widely between individuals, and predicting weight loss before the operation remains challenging. We aimed to develop a model using machine learning to provide individual preoperative prediction of 5-year weight loss trajectories after surgery.
Methods: In this multinational retrospective observational study we enrolled adult participants (aged ≥18 years) from ten prospective cohorts (including ABOS [NCT01129297], BAREVAL [NCT02310178], the Swedish Obese Subjects study, and a large cohort from the Dutch Obesity Clinic [Nederlandse Obesitas Kliniek]) and two randomised trials (SleevePass [NCT00793143] and SM-BOSS [NCT00356213]) in Europe, the Americas, and Asia, with a 5-year follow-up after Roux-en-Y gastric bypass, sleeve gastrectomy, or gastric band. Patients with a previous history of bariatric surgery or large delays between scheduled and actual visits were excluded. The training cohort comprised patients from two centres in France (ABOS and BAREVAL). The primary outcome was BMI at 5 years. A model was developed using the least absolute shrinkage and selection operator (LASSO) to select variables and the classification and regression trees (CART) algorithm to build interpretable regression trees. The performance of the model was assessed through the median absolute deviation (MAD) and root mean squared error (RMSE) of BMI.
Findings: 10,231 patients from 12 centres in ten countries were included in the analysis, corresponding to 30,602 patient-years. Among participants in all 12 cohorts, 7,701 (75.3%) were female and 2,530 (24.7%) were male. Among 434 baseline attributes available in the training cohort, seven variables were selected: height, weight, intervention type, age, diabetes status, diabetes duration, and smoking status. At 5 years, across external testing cohorts, the overall mean MAD BMI was 2.8 kg/m² (95% CI 2.6-3.0) and mean RMSE BMI was 4.7 kg/m² (4.4-5.0), and the mean difference between predicted and observed BMI was -0.3 kg/m² (SD 4.7). This model is incorporated in an easy-to-use and interpretable web-based prediction tool to help inform clinical decisions before surgery.
Interpretation: We developed a machine learning-based model, which is internationally validated, for predicting individual 5-year weight loss trajectories after three common bariatric interventions.
Submitted 31 August, 2023;
originally announced August 2023.
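The SOPHIA abstract reports model accuracy as MAD and RMSE of 5-year BMI, together with the mean predicted-minus-observed difference and its SD. A minimal sketch of those summaries follows. It assumes MAD means the median of the absolute prediction errors, since the abstract does not spell out the definition, and the function name `bmi_errors` and the toy values are hypothetical.

```python
import numpy as np


def bmi_errors(predicted, observed):
    """Error summaries for predicted vs observed BMI (kg/m^2) at 5 years."""
    err = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    return {
        "mad": float(np.median(np.abs(err))),       # assumed: median of absolute errors
        "rmse": float(np.sqrt(np.mean(err ** 2))),  # root mean squared error
        "mean_diff": float(err.mean()),             # predicted minus observed
        "sd_diff": float(err.std(ddof=1)),          # sample SD of that difference
    }


print(bmi_errors(predicted=[30.0, 32.0, 28.0], observed=[31.0, 30.0, 28.0]))
```

RMSE penalises large misses more heavily than MAD does, which is one reason to report both.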
-
ZeRO++: Extremely Efficient Collective Communication for Giant Model Training
Authors:
Guanhua Wang,
Heyang Qin,
Sam Ade Jacobs,
Connor Holmes,
Samyam Rajbhandari,
Olatunji Ruwase,
Feng Yan,
Lei Yang,
Yuxiong He
Abstract:
Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of large language models on massive GPU clusters due to its ease of use, efficiency, and good scalability. However, when training on low-bandwidth clusters, or at a scale that forces the per-GPU batch size to be small, ZeRO's effective throughput is limited by the high communication volume of gathering weights in the forward and backward passes and of averaging gradients. This paper introduces three communication volume reduction techniques, which we collectively refer to as ZeRO++, targeting each of the communication collectives in ZeRO. The first is a block-quantization-based all-gather. The second is data remapping that trades additional memory for reduced communication. The third is a novel all-to-all-based quantized gradient averaging paradigm that replaces the reduce-scatter collective and preserves accuracy despite communicating low-precision data. Collectively, ZeRO++ reduces the communication volume of ZeRO by 4x, enabling up to 2.16x better throughput at 384-GPU scale.
Submitted 16 June, 2023;
originally announced June 2023.
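ZeRO++'s first technique is a block-quantization-based all-gather. The sketch below illustrates block quantization in general, not ZeRO++'s actual kernels. Each block of 64 values (an assumed block size) is mapped to int8 with its own float32 scale, so an outlier only coarsens the block it sits in rather than the whole tensor. The function names are hypothetical.

```python
import numpy as np


def block_quantize(x, block_size=64):
    """Symmetric int8 quantization with one float32 scale per block of `block_size` values."""
    flat = np.asarray(x, dtype=np.float32).ravel()
    pad = (-flat.size) % block_size                    # zero-pad to a whole number of blocks
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0                          # all-zero block: any scale works
    q = np.round(blocks / scales).astype(np.int8)
    return q, scales.astype(np.float32), np.shape(x), pad


def block_dequantize(q, scales, shape, pad):
    """Undo block_quantize: rescale each block, drop padding, restore the original shape."""
    flat = (q.astype(np.float32) * scales).ravel()
    return flat[:flat.size - pad].reshape(shape)


rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float32)
x[:64] *= 100                                          # one block of large outliers
blockwise = block_quantize(x, block_size=64)
single = block_quantize(x, block_size=x.size)          # one global scale, for comparison
block_err = np.abs(block_dequantize(*blockwise) - x).mean()
global_err = np.abs(block_dequantize(*single) - x).mean()
print(block_err < global_err)  # True
```

The int8 payload plus one scale per 64 values is roughly half the size of the same tensor in fp16, which is the kind of saving a quantized all-gather exploits.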
-
A framework for dynamically training and adapting deep reinforcement learning models to different, low-compute, and continuously changing radiology deployment environments
Authors:
Guangyao Zheng,
Shuhao Lai,
Vladimir Braverman,
Michael A. Jacobs,
Vishwa S. Parekh
Abstract:
While deep reinforcement learning has been widely researched in medical imaging, training and deploying these models usually requires powerful GPUs. Since imaging environments evolve rapidly and can be generated by edge devices, the algorithm must continually learn and adapt to changing environments and adjust to low-compute devices. To this end, we developed three image coreset algorithms to compress and denoise medical images for selective experience replay-based lifelong reinforcement learning. We implemented neighborhood averaging coreset, neighborhood sensitivity-based sampling coreset, and maximum entropy coreset on full-body DIXON water and DIXON fat MRI images. All three coresets produced 27x compression with excellent performance in localizing five anatomical landmarks (left knee, right trochanter, left kidney, spleen, and lung) across both imaging environments. Maximum entropy coreset obtained the best performance, with an average distance error of $11.97\pm 12.02$, compared to $19.24\pm 50.77$ for the conventional lifelong learning framework.
Submitted 8 June, 2023;
originally announced June 2023.
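The abstract above reports 27x compression from its coresets. One way to reach exactly that ratio with the neighborhood-averaging variant, assumed here rather than taken from the paper, is to replace each non-overlapping 3x3x3 neighborhood of a 3D MRI volume by its mean (3³ = 27). Averaging also suppresses voxel-level noise, which fits the abstract's "compress and denoise" framing. The function name is hypothetical.

```python
import numpy as np


def neighborhood_average(volume, k=3):
    """Replace each non-overlapping k x k x k neighborhood by its mean (k**3-fold compression)."""
    d, h, w = (s - s % k for s in volume.shape)  # crop each axis to a multiple of k
    v = volume[:d, :h, :w]
    return v.reshape(d // k, k, h // k, k, w // k, k).mean(axis=(1, 3, 5))


vol = np.random.default_rng(0).random((30, 60, 90))
small = neighborhood_average(vol)
print(small.shape, vol.size // small.size)  # (10, 20, 30) 27
```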
-
Multi-environment lifelong deep reinforcement learning for medical imaging
Authors:
Guangyao Zheng,
Shuhao Lai,
Vladimir Braverman,
Michael A. Jacobs,
Vishwa S. Parekh
Abstract:
Deep reinforcement learning (DRL) is increasingly being explored in medical imaging. However, the environments for medical imaging tasks are constantly evolving in terms of imaging orientations, imaging sequences, and pathologies. To that end, we developed a lifelong DRL framework, SERIL, to continually learn new tasks in changing imaging environments without catastrophic forgetting. SERIL was developed using a selective experience replay-based lifelong learning technique for the localization of five anatomical landmarks in brain MRI on a sequence of twenty-four different imaging environments. Compared with two baseline setups, MERT (multi-environment-best-case) and SERT (single-environment-worst-case), SERIL demonstrated excellent performance, with an average distance of $9.90\pm7.35$ pixels from the desired landmark across all 120 tasks, compared to $10.29\pm9.07$ for MERT and $36.37\pm22.41$ for SERT ($p<0.05$), demonstrating excellent potential for continuously learning multiple tasks across dynamically changing imaging environments.
Submitted 31 May, 2023;
originally announced June 2023.
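SERIL builds on selective experience replay: a bounded memory of experiences from earlier imaging environments is replayed alongside new data to limit catastrophic forgetting. The abstract does not say how experiences are selected, so the sketch below assumes one common rule, reservoir sampling, which keeps a uniform random sample of each environment's stream within a fixed per-environment capacity. The class and method names are hypothetical.

```python
import random


class SelectiveReplayBuffer:
    """Bounded per-environment memory filled by reservoir sampling (assumed selection rule)."""

    def __init__(self, per_task_capacity, seed=0):
        self.capacity = per_task_capacity
        self.rng = random.Random(seed)
        self.memories = {}  # task/environment id -> list of stored experiences
        self.seen = {}      # task/environment id -> experiences offered so far

    def add(self, task, experience):
        mem = self.memories.setdefault(task, [])
        n = self.seen.get(task, 0) + 1
        self.seen[task] = n
        if len(mem) < self.capacity:
            mem.append(experience)
        else:
            j = self.rng.randrange(n)  # each of the n items survives with prob capacity / n
            if j < self.capacity:
                mem[j] = experience

    def replay_batch(self, current_batch, k):
        """Current-task batch plus up to k experiences replayed from stored memories."""
        pool = [e for mem in self.memories.values() for e in mem]
        return list(current_batch) + self.rng.sample(pool, min(k, len(pool)))


buf = SelectiveReplayBuffer(per_task_capacity=100)
for env in range(3):
    for step in range(1000):
        buf.add(env, (env, step))  # placeholder for (state, action, reward, next_state)
batch = buf.replay_batch(current_batch=[("new", i) for i in range(32)], k=32)
print(len(batch))  # 64
```

Because memory per environment is capped, storage grows linearly with the number of environments rather than with the total number of experiences seen.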
-
The Role of Relevance in Fair Ranking
Authors:
Aparna Balagopalan,
Abigail Z. Jacobs,
Asia Biega
Abstract:
Online platforms mediate access to opportunity: relevance-based rankings create and constrain options by allocating exposure to job openings and job candidates in hiring platforms, or sellers in a marketplace. In order to do so responsibly, these socially consequential systems employ various fairness measures and interventions, many of which seek to allocate exposure based on worthiness. Because these constructs are typically not directly observable, platforms must instead resort to using proxy scores such as relevance and infer them from behavioral signals such as searcher clicks. Yet, it remains an open question whether relevance fulfills its role as such a worthiness score in high-stakes fair rankings. In this paper, we combine perspectives and tools from the social sciences, information retrieval, and fairness in machine learning to derive a set of desired criteria that relevance scores should satisfy in order to meaningfully guide fairness interventions. We then empirically show that not all of these criteria are met in a case study of relevance inferred from biased user click data. We assess the impact of these violations on the estimated system fairness and analyze whether existing fairness interventions may mitigate the identified issues. Our analyses and results surface the pressing need for new approaches to relevance collection and generation that are suitable for use in fair ranking.
Submitted 6 June, 2023; v1 submitted 9 May, 2023;
originally announced May 2023.
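The fair-ranking abstract turns on allocating exposure according to a worthiness proxy such as relevance. A minimal sketch, assuming the logarithmic position-bias model that is common in the fair-ranking literature (not necessarily the paper's choice), computes each group's exposure per unit of relevance. An "exposure proportional to relevance" intervention would try to equalise these ratios, which is only meaningful if the relevance scores are themselves sound. All names are hypothetical.

```python
import math


def exposure(rank):
    """Position-bias weight for a 1-indexed rank (DCG-style log discount, an assumption)."""
    return 1.0 / math.log2(1 + rank)


def group_exposure_ratio(ranking, relevance, group):
    """Total exposure divided by total relevance, per group, for one ranking."""
    exp_sum, rel_sum = {}, {}
    for r, item in enumerate(ranking, start=1):
        g = group[item]
        exp_sum[g] = exp_sum.get(g, 0.0) + exposure(r)
        rel_sum[g] = rel_sum.get(g, 0.0) + relevance[item]
    return {g: exp_sum[g] / rel_sum[g] for g in exp_sum}


ranking = ["a", "b", "c", "d"]
relevance = {item: 1.0 for item in ranking}  # all items equally relevant
group = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
ratios = group_exposure_ratio(ranking, relevance, group)
print({g: round(r, 3) for g, r in ratios.items()})  # {'X': 0.815, 'Y': 0.465}
```

Here group Y receives much less exposure per unit of relevance despite equal relevance. If relevance were instead inferred from biased clicks, the same ratio could mask or exaggerate such gaps.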
-
Asynchronous Decentralized Federated Lifelong Learning for Landmark Localization in Medical Imaging
Authors:
Guangyao Zheng,
Michael A. Jacobs,
Vladimir Braverman,
Vishwa S. Parekh
Abstract:
Federated learning is a recent development in machine learning that allows a system of devices to train on one or more tasks without sharing their data with a single location or device. However, this framework still requires a centralized global model to consolidate individual models into one, and the devices train synchronously; both can be potential bottlenecks for using federated learning. In this paper, we propose a novel asynchronous decentralized federated lifelong learning (ADFLL) method that inherits the merits of federated learning and can train on multiple tasks simultaneously without the need for a central node or synchronous training, thus overcoming the potential drawbacks of conventional federated learning. We demonstrate excellent performance on the brain tumor segmentation (BRATS) dataset for localizing the left ventricle on multiple image sequences and image orientations. Our framework allows agents to achieve the best performance, with a mean distance error of 7.81, better than the conventional all-knowing agent's mean distance error of 11.78, and significantly (p=0.01) better than a conventional lifelong learning (LL) agent with a distance error of 15.17 after eight rounds of training. In addition, all ADFLL agents have comparable or better performance than a conventional LL agent. In conclusion, we developed an ADFLL framework with excellent performance and speed-up compared to conventional RL agents.
Submitted 10 January, 2024; v1 submitted 12 March, 2023;
originally announced March 2023.
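ADFLL removes the central aggregation node and synchronous rounds. The abstract does not describe how its agents exchange knowledge, so the sketch below substitutes a different, standard decentralized technique, randomized pairwise gossip averaging. It only illustrates how agents can reach agreement with no coordinator and no global synchronization step. A scalar stands in for each agent's model parameters, and the function name is hypothetical.

```python
import random


def gossip_average(values, rounds, seed=0):
    """Randomized pairwise gossip: in each round two random agents average their values."""
    rng = random.Random(seed)
    vals = list(values)
    for _ in range(rounds):
        i, j = rng.sample(range(len(vals)), 2)  # any pair may talk; no coordinator
        vals[i] = vals[j] = (vals[i] + vals[j]) / 2
    return vals


agents = [0.0, 4.0, 8.0, 12.0]
result = gossip_average(agents, rounds=500)
print([round(x, 6) for x in result])
```

Each exchange preserves the sum of the two values involved, so the agents drift toward the global mean (6.0 here) even though no single agent ever sees all the values.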
-
Selective experience replay compression using coresets for lifelong deep reinforcement learning in medical imaging
Authors:
Guangyao Zheng,
Samson Zhou,
Vladimir Braverman,
Michael A. Jacobs,
Vishwa S. Parekh
Abstract:
Selective experience replay is a popular strategy for integrating lifelong learning with deep reinforcement learning. Selective experience replay aims to recount selected experiences from previous tasks to avoid catastrophic forgetting. Furthermore, selective experience replay-based techniques are model-agnostic and allow experiences to be shared across different models. However, storing experiences from all previous tasks makes lifelong learning using selective experience replay computationally very expensive and impractical as the number of tasks increases. To that end, we propose a reward distribution-preserving coreset compression technique for compressing experience replay buffers stored for selective experience replay.
We evaluated the coreset compression technique on the brain tumor segmentation (BRATS) dataset for the task of ventricle localization and on whole-body MRI for localization of the left kneecap, left kidney, right trochanter, left lung, and spleen. The coreset lifelong learning models trained on a sequence of 10 different brain MR imaging environments demonstrated excellent performance localizing the ventricle, with a mean pixel error distance of 12.93 for a compression ratio of 10x. In comparison, the conventional lifelong learning model localized the ventricle with a mean pixel distance of 10.87. Similarly, for the models trained on whole-body MRI, there was no significant difference (p=0.28) between the 10x-compressed coreset lifelong learning models and conventional lifelong learning models for all the landmarks. The mean pixel distance for the 10x-compressed models across all the landmarks was 25.30, compared to 19.24 for the conventional lifelong learning models. Our results demonstrate the potential of the coreset-based experience replay buffer (ERB) compression method for compressing experiences without a significant drop in performance.
Submitted 9 January, 2024; v1 submitted 22 February, 2023;
originally announced February 2023.
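The coreset abstract above proposes compressing experience replay buffers while preserving their reward distribution. The paper's coreset construction is not given in the abstract. As a simple stand-in, the sketch below keeps roughly 1/ratio of the experiences from each reward-quantile bin, so the compressed buffer's reward histogram tracks the original's. The function name, the quantile binning, and the bin count are assumptions, and `ratio` is assumed to be at least 1.

```python
import numpy as np


def reward_stratified_compress(rewards, ratio, n_bins=10, seed=0):
    """Indices of roughly len(rewards)/ratio experiences, sampled within reward-quantile bins."""
    rng = np.random.default_rng(seed)
    rewards = np.asarray(rewards, dtype=float)
    edges = np.quantile(rewards, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, rewards, side="right") - 1, 0, n_bins - 1)
    keep = []
    for b in range(n_bins):
        idx = np.flatnonzero(bins == b)
        if idx.size == 0:
            continue  # tied rewards can leave a bin empty
        k = max(1, int(round(idx.size / ratio)))  # same fraction from every bin
        keep.extend(rng.choice(idx, size=k, replace=False).tolist())
    return np.sort(np.array(keep, dtype=int))


rng = np.random.default_rng(1)
rewards = rng.normal(size=10_000)
kept = reward_stratified_compress(rewards, ratio=10)
print(len(kept), round(float(rewards.mean()), 3), round(float(rewards[kept].mean()), 3))
```

Stratifying by reward, rather than sampling uniformly, guarantees that rare high- or low-reward experiences stay represented after 10x compression.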
-
Eilmer: an Open-Source Multi-Physics Hypersonic Flow Solver
Authors:
Nicholas N. Gibbons,
Kyle A. Damm,
Peter A. Jacobs,
Rowan J. Gollan
Abstract:
This paper introduces Eilmer, a general-purpose open-source compressible flow solver developed at the University of Queensland, designed to support research calculations in hypersonics and high-speed aerothermodynamics. Eilmer has a broad userbase in several university research groups and a wide range of capabilities, which are documented on the project's website, in the accompanying reference manuals, and in an extensive catalogue of example simulations. The first part of this paper describes the formulation of the code: the equations, physical models, and numerical methods that are used in a basic fluid dynamics simulation, as well as a handful of optional multi-physics models that are commonly added on to do calculations of hypersonic flow. The second section describes the processes used to develop and maintain the code, documenting our adherence to good programming practice and endorsing certain techniques that seem to be particularly helpful for scientific codes. The final section describes a half-dozen example simulations that span the range of Eilmer's capabilities, each consisting of some sample results and a short explanation of the problem being solved, which together will hopefully assist new users in beginning to use Eilmer in their own research projects.
Submitted 3 June, 2022;
originally announced June 2022.
-
Image Trinarization Using a Partial Differential Equations: A Novel Approach to Automatic Sperm Image Analysis
Authors:
B. A. Jacobs
Abstract:
Partial differential equations have recently garnered substantial attention as an image processing framework due to their extensibility, the ability to rigorously engineer and analyse the governing dynamics, and the ease of implementation using numerical methods. This paper explores a novel approach to image trinarization with a concrete real-world application: classifying regions of sperm images used in the automatic analysis of sperm morphology. The proposed methodology engineers a diffusion equation with a non-linear source term that exhibits three steady states. The model is implemented as an image processor using a standard finite difference method to illustrate the efficacy of the proposed approach. The performance of the proposed approach is benchmarked against standard image clustering/segmentation methods and shown to be highly effective.
Submitted 24 May, 2022;
originally announced May 2022.
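The trinarization abstract describes a diffusion equation with a non-linear source term that has three steady states, solved with a standard finite difference method. The paper's source term is not given in the abstract, so the sketch below assumes an illustrative choice, $f(u) = -\sin(4\pi u)$. On $[0, 1]$ its zeros at 0, 1/2 and 1 are stable and its zeros at 1/4 and 3/4 are unstable basin boundaries. For a continuous $f$, unstable zeros must separate the stable ones. This source also gives the three stable states equal depth, since $\int f\,du = 0$ between neighbours, which keeps region boundaries from drifting. The sketch integrates $u_t = D\,\Delta u + k f(u)$ with explicit Euler and a 5-point Laplacian. The values of D, k, the time step and the step count are assumptions chosen to keep the scheme stable.

```python
import numpy as np


def source(u):
    """Assumed source -sin(4*pi*u): stable zeros at 0, 1/2, 1; unstable zeros at 1/4, 3/4."""
    return -np.sin(4 * np.pi * u)


def laplacian(u):
    """5-point finite-difference Laplacian; edge padding gives zero-flux boundaries."""
    p = np.pad(u, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * u


def trinarize(image, D=0.2, k=0.2, dt=0.2, steps=200):
    """Explicit Euler for u_t = D*lap(u) + k*f(u), then label pixels by basin of attraction."""
    u = np.clip(image.astype(float), 0.0, 1.0)  # intensities assumed normalized to [0, 1]
    for _ in range(steps):
        u = u + dt * (D * laplacian(u) + k * source(u))
    return np.digitize(u, [0.25, 0.75])  # 0 below 1/4, 1 in [1/4, 3/4), 2 from 3/4 up


rng = np.random.default_rng(0)
truth = np.repeat(np.arange(3), 20)[None, :].repeat(60, axis=0)  # three 20-pixel-wide stripes
noisy = truth / 2 + rng.normal(0.0, 0.08, truth.shape)          # levels 0, 0.5, 1 plus noise
labels = trinarize(noisy)
print((labels == truth).mean())
```

Diffusion smooths pixel noise while the source term pulls every pixel toward the nearest stable level, so the final labels read off directly from the basins.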
-
Computational analyses of the topics, sentiments, literariness, creativity and beauty of texts in a large Corpus of English Literature
Authors:
Arthur M. Jacobs,
Annette Kinder
Abstract:
The Gutenberg Literary English Corpus (GLEC, Jacobs, 2018a) provides a rich source of textual data for research in digital humanities, computational linguistics or neurocognitive poetics. In this study we address differences among the different literature categories in GLEC, as well as differences between authors. We report the results of three studies providing i) topic and sentiment analyses for six text categories of GLEC (i.e., children and youth, essays, novels, plays, poems, stories) and its >100 authors, ii) novel measures of semantic complexity as indices of the literariness, creativity and book beauty of the works in GLEC (e.g., Jane Austen's six novels), and iii) two experiments on text classification and authorship recognition using novel features of semantic complexity. The data on two novel measures estimating a text's literariness, intratextual variance and stepwise distance (van Cranenburgh et al., 2019) revealed that plays are the most literary texts in GLEC, followed by poems and novels. Computation of a novel index of text creativity (Gray et al., 2016) revealed poems and plays as the most creative categories with the most creative authors all being poets (Milton, Pope, Keats, Byron, or Wordsworth). We also computed a novel index of perceived beauty of verbal art (Kintsch, 2012) for the works in GLEC and predict that Emma is the theoretically most beautiful of Austen's novels. Finally, we demonstrate that these novel measures of semantic complexity are important features for text classification and authorship recognition with overall predictive accuracies in the range of .75 to .97. Our data pave the way for future computational and empirical studies of literature or experiments in reading psychology and offer multiple baselines and benchmarks for analysing and validating other book corpora.
Submitted 12 January, 2022;
originally announced January 2022.
-
Cross-Domain Federated Learning in Medical Imaging
Authors:
Vishwa S Parekh,
Shuhao Lai,
Vladimir Braverman,
Jeff Leal,
Steven Rowe,
Jay J Pillai,
Michael A Jacobs
Abstract:
Federated learning is increasingly being explored in the field of medical imaging to train deep learning models on large scale datasets distributed across different data centers while preserving privacy by avoiding the need to transfer sensitive patient information. In this manuscript, we explore federated learning in a multi-domain, multi-task setting wherein different participating nodes may contain datasets sourced from different domains and are trained to solve different tasks. We evaluated cross-domain federated learning for the tasks of object detection and segmentation across two different experimental settings: multi-modal and multi-organ. The results from our experiments with the cross-domain federated learning framework were very encouraging, with an overlap similarity of 0.79 for organ localization and 0.65 for lesion segmentation. Our results demonstrate the potential of federated learning in developing multi-domain, multi-task deep learning models without sharing data from different domains.
Submitted 18 December, 2021;
originally announced December 2021.
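The cross-domain federated learning abstract trains models across data centers without moving patient data. The abstract does not name an aggregation rule, so the sketch assumes federated averaging (FedAvg): nodes share parameters, not data, and the aggregator takes a dataset-size-weighted mean. For a multi-task setting like the one described, it also assumes a plausible design in which only parameters every node shares, such as a common backbone, are averaged, while task-specific heads stay local. Parameter names are hypothetical.

```python
import numpy as np


def fed_avg_shared(client_params, client_sizes):
    """Size-weighted average of the parameters that every client has; other layers stay local."""
    shared = set.intersection(*(set(p) for p in client_params))
    total = float(sum(client_sizes))
    return {name: sum(p[name] * (n / total) for p, n in zip(client_params, client_sizes))
            for name in sorted(shared)}


detector = {"backbone": np.array([1.0, 1.0]), "detection_head": np.array([5.0])}
segmenter = {"backbone": np.array([3.0, 3.0]), "segmentation_head": np.array([7.0])}
global_params = fed_avg_shared([detector, segmenter], client_sizes=[100, 300])
print(global_params)  # {'backbone': array([2.5, 2.5])}
```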
-
Learning Interpretable Models Through Multi-Objective Neural Architecture Search
Authors:
Zachariah Carmichael,
Tim Moon,
Sam Ade Jacobs
Abstract:
Monumental advances in deep learning have led to unprecedented achievements across various domains. While the performance of deep neural networks is indubitable, the architectural design and interpretability of such models are nontrivial. Research has been introduced to automate the design of neural network architectures through neural architecture search (NAS). Recent progress has made these methods more pragmatic by exploiting distributed computation and novel optimization algorithms. However, there is little work in optimizing architectures for interpretability. To this end, we propose a multi-objective distributed NAS framework that optimizes for both task performance and "introspectability," a surrogate metric for aspects of interpretability. We leverage the non-dominated sorting genetic algorithm (NSGA-II) and explainable AI (XAI) techniques to reward architectures that can be better comprehended by domain experts. The framework is evaluated on several image classification datasets. We demonstrate that jointly optimizing for task error and introspectability leads to more disentangled and debuggable architectures that perform within tolerable error.
Submitted 4 July, 2023; v1 submitted 16 December, 2021;
originally announced December 2021.
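The NAS abstract relies on NSGA-II, whose core step sorts candidate architectures into successive Pareto fronts over several objectives, here task error and introspectability. The sketch below is a simple version of that sorting, quadratic per front. It omits NSGA-II's faster bookkeeping and its crowding-distance step. Introspectability is negated so both objectives are minimized, and the candidate scores are made up for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fronts(points):
    """Split points into successive Pareto fronts (lists of indices)."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


# (task error, -introspectability) per candidate architecture; values are made up.
candidates = [(0.10, -0.2), (0.12, -0.6), (0.15, -0.5), (0.20, -0.1)]
print(non_dominated_fronts(candidates))  # [[0, 1], [2], [3]]
```

Candidates 0 and 1 trade accuracy against introspectability, so neither dominates the other and both sit on the first front.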
-
Electoral Programs of German Parties 2021: A Computational Analysis Of Their Comprehensibility and Likeability Based On SentiArt
Authors:
Arthur M. Jacobs,
Annette Kinder
Abstract:
The electoral programs of six German parties issued before the parliamentary elections of 2021 are analyzed using state-of-the-art computational tools for quantitative narrative, topic and sentiment analysis. We compare four methods for computing the textual similarity of the programs: Jaccard Bag similarity, Latent Semantic Analysis, doc2vec, and sBERT, with representational and computational complexity increasing from the first to the fourth method. A new similarity measure for entire documents, derived from the Fowlkes-Mallows Score, is applied to k-means clustering of sBERT-transformed sentences. Using novel indices of the readability and emotion potential of texts computed via SentiArt (Jacobs, 2019), our data shed light on the similarities and differences of the programs regarding their length, main ideas, comprehensibility, likeability, and semantic complexity. Among other things, they reveal that the programs of the SPD and CDU have the best chances to be comprehensible and likeable (all other things being equal), and they raise the important issue of which similarity measure is optimal for comparing texts such as electoral programs, which necessarily share many words. While such analyses cannot replace qualitative analyses or a deep reading of the texts, they offer predictions that can be verified in empirical studies and may serve as a motivation for changing aspects of future electoral programs, potentially making them more comprehensible and/or likeable.
Submitted 26 September, 2021;
originally announced September 2021.
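Two of the measures named in the electoral-programs abstract can be sketched directly. Jaccard bag similarity is assumed here to take the multiset form: the sum of minimum word counts divided by the sum of maximum counts. The Fowlkes-Mallows score is the standard pair-counting index TP / sqrt((TP + FP)(TP + FN)) between two clusterings. The paper's document-level measure derived from it is not reproduced, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def jaccard_bag(tokens_a, tokens_b):
    """Multiset Jaccard similarity: sum of min counts over sum of max counts."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    inter = sum((a & b).values())  # Counter & keeps the minimum count per word
    union = sum((a | b).values())  # Counter | keeps the maximum count per word
    return inter / union if union else 1.0


def fowlkes_mallows(labels_true, labels_pred):
    """Pair-counting Fowlkes-Mallows index between two clusterings of the same items."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        tp += same_true and same_pred
        fp += same_pred and not same_true
        fn += same_true and not same_pred
    denom = sqrt((tp + fp) * (tp + fn))
    return tp / denom if denom else 0.0


print(jaccard_bag("a a b".split(), "a b b c".split()))           # 0.4
print(round(fowlkes_mallows([0, 0, 1, 1], [0, 0, 1, 2]), 4))     # 0.7071
```

Because the bag form counts repeated words, two programs that reuse the same vocabulary at different rates score below 1 even when their word sets are identical, which bears on the abstract's point about texts that necessarily share many words.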
-
Measurement as governance in and for responsible AI
Authors:
Abigail Z. Jacobs
Abstract:
Measurement of social phenomena is everywhere, unavoidably, in sociotechnical systems. This is not (only) an academic point: Fairness-related harms emerge when there is a mismatch in the measurement process between the thing we purport to be measuring and the thing we actually measure. However, the measurement process -- where social, cultural, and political values are implicitly encoded in sociotechnical systems -- is almost always obscured. Furthermore, this obscured process is where important governance decisions are encoded: governance about which systems are fair, which individuals belong in which categories, and so on. We can then use the language of measurement, and the tools of construct validity and reliability, to uncover hidden governance decisions. In particular, we highlight two types of construct validity, content validity and consequential validity, that are useful to elicit and characterize the feedback loops between the measurement, social construction, and enforcement of social categories. We then explore the constructs of fairness, robustness, and responsibility in the context of governance in and for responsible AI. Together, these perspectives help us unpack how measurement acts as a hidden governance process in sociotechnical systems. Understanding measurement as governance supports a richer understanding of the governance processes already happening in AI -- responsible or otherwise -- revealing paths to more effective interventions.
Submitted 12 September, 2021;
originally announced September 2021.
-
Hosting Industry Centralization and Consolidation
Authors:
Luciano Zembruzki,
Raffaele Sommese,
Lisandro Zambenedetti Granville,
Arthur Selle Jacobs,
Mattijs Jonker,
Giovane C. M. Moura
Abstract:
There have been growing concerns about the concentration and centralization of Internet infrastructure. In this work, we scrutinize the hosting industry on the Internet by using active measurements, covering 19 Top-Level Domains (TLDs). We show how the market is heavily concentrated: one third of the domains are hosted by only five hosting providers, all of them US-based companies. For country-code TLDs (ccTLDs), however, hosting is primarily done by local, national hosting providers rather than by the large American cloud and content providers. We show how shared languages (and borders) shape the hosting market: German hosting companies have a notable presence in the Austrian and Swiss markets, as all three countries share German as an official language. While hosting concentration has been relatively high and stable over the past four years, American hosting companies have continuously increased their presence in the market for popular, high-traffic domains within ccTLDs, with the notable exception of Russia.
Submitted 25 January, 2022; v1 submitted 2 September, 2021;
originally announced September 2021.
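The hosting-concentration figure above (one third of domains hosted by five providers) is a top-k market share. A minimal sketch, assuming each domain has already been mapped to exactly one hosting provider; the active-measurement step that produces this mapping is not shown, and the function name and toy data are hypothetical.

```python
from collections import Counter


def top_k_share(domain_to_provider, k=5):
    """Fraction of all domains hosted by the k providers that host the most domains."""
    if not domain_to_provider:
        return 0.0
    per_provider = Counter(domain_to_provider.values())
    top_counts = [n for _, n in per_provider.most_common(k)]
    return sum(top_counts) / len(domain_to_provider)


# Hypothetical domain-to-provider mapping, for illustration only.
mapping = {"a.nl": "P1", "b.nl": "P1", "c.nl": "P2", "d.nl": "P3"}
print(top_k_share(mapping, k=1))  # 0.5
```

Computing the share per TLD rather than over the pooled domain list is what would expose the ccTLD pattern described in the abstract, where local providers dominate.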
-
Is Einstein more agreeable and less neurotic than Hitler? A computational exploration of the emotional and personality profiles of historical persons
Authors:
Arthur M. Jacobs,
Annette Kinder
Abstract:
Recent progress in distributed semantic models (DSM) offers new ways to estimate personality traits of both fictive and real people. In this exploratory study we applied an extended version of the algorithm developed in Jacobs (2019) to compute the likeability scores, emotional figure profiles and BIG5 personality traits for 100 historical persons from the arts, politics or science domains whose names are fairly distinctive (e.g., Einstein, Kahlo, Picasso). We compared the results produced by static (word2vec) and dynamic (BERT) language model representations in four studies. The results show both the potential and the limitations of such DSM-based computations of personality profiles and point to ways of further developing this approach into a useful tool in data science, psychology or computational and neurocognitive poetics (Jacobs, 2015).
Submitted 14 June, 2021;
originally announced June 2021.
-
Quasi Error-free Text Classification and Authorship Recognition in a large Corpus of English Literature based on a Novel Feature Set
Authors:
Arthur M. Jacobs,
Annette Kinder
Abstract:
The Gutenberg Literary English Corpus (GLEC) provides a rich source of textual data for research in digital humanities, computational linguistics or neurocognitive poetics. However, so far only a small subcorpus, the Gutenberg English Poetry Corpus, has been subjected to quantitative text analyses providing predictions for scientific studies of literature. Here we show that in the entire GLEC, quasi error-free text classification and authorship recognition are possible with a method that uses the same set of five style and five content features, computed via style and sentiment analysis, in both tasks. Our results identify two standard and two novel features (i.e., type-token ratio, frequency, sonority score, surprise) as the most diagnostic in these tasks. By providing a simple tool, applicable to both short poems and long novels, that generates quantitative predictions about features that co-determine the cognitive and affective processing of specific text categories or authors, our data pave the way for many future computational and empirical studies of literature or experiments in reading psychology.
Submitted 21 October, 2020;
originally announced October 2020.
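Type-token ratio, one of the two standard features named in the GLEC abstract above, is the number of distinct word forms divided by the number of running words. A minimal sketch; the lowercase regex tokenizer is an assumption, the function name is hypothetical, and the abstract does not say whether the study normalizes the ratio for text length, which matters when comparing short poems with long novels because raw TTR tends to fall as texts grow longer.

```python
import re


def type_token_ratio(text):
    """Distinct word types divided by word tokens (lowercased, letters and apostrophes)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0


print(type_token_ratio("The cat saw the dog"))  # 0.8
```

Here "The" and "the" count as one type, giving four types over five tokens.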
-
Refining Network Intents for Self-Driving Networks
Authors:
Arthur Selle Jacobs,
Ricardo José Pfitscher,
Ronaldo Alves Ferreira,
Lisandro Zambenedetti Granville
Abstract:
Recent advances in artificial intelligence (AI) offer an opportunity for the adoption of self-driving networks. However, network operators and home-network users still lack the right tools to exploit these advances in AI, since they have to rely on low-level languages to specify network policies. Intent-based networking (IBN) allows operators to specify high-level policies that dictate how the network should behave without worrying about how they are translated into configuration commands in the network devices. However, existing research proposals for IBN fail to exploit the knowledge and feedback of the network operator to validate or improve the translation of intents. In this paper, we introduce a novel intent-refinement process that uses machine learning and feedback from the operator to translate the operator's utterances into network configurations. Our refinement process uses a sequence-to-sequence learning model to extract intents from natural language, and uses the operator's feedback to improve learning. The key insight of our process is an intermediate representation that resembles natural language, which makes it suitable for collecting feedback from the operator, yet is structured enough to facilitate precise translations. Our prototype interacts with a network operator using natural language and translates the operator's input into the intermediate representation before translating it into SDN rules. Our experimental results show that our process achieves a squared correlation coefficient (i.e., R-squared) of 0.99 for a dataset with 5000 entries and that operator feedback significantly improves the accuracy of our model.
Submitted 12 August, 2020;
originally announced August 2020.
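The intent-refinement abstract above reports a squared correlation coefficient of 0.99. A minimal sketch of that quantity, the squared Pearson correlation between two equal-length numeric sequences; which values the authors correlated is not stated, so the function name and inputs here are hypothetical. Note that the squared correlation coincides with the coefficient of determination 1 - SS_res/SS_tot only for a least-squares linear fit with an intercept; for arbitrary predictions the two can differ.

```python
def squared_correlation(xs, ys):
    """Square of the Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant sequence
    return cov * cov / (var_x * var_y)
```

For instance, `squared_correlation([1, 2, 3], [2, 4, 6])` is 1.0 (a perfect linear relation), while `squared_correlation([1, 2, 3], [1, 3, 2])` is 0.25.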
-
Internet-human infrastructures: Lessons from Havana's StreetNet
Authors:
Abigail Z. Jacobs,
Michaelanne Dye
Abstract:
We propose a mixed-methods approach to understanding the human infrastructure underlying StreetNet (SNET), a distributed, community-run intranet that serves as the primary 'Internet' in Havana, Cuba. We bridge ethnographic studies and the study of social networks and organizations to understand the way that power is embedded in the structure of Havana's SNET. By quantitatively and qualitatively unpacking the human infrastructure of SNET, this work reveals how distributed infrastructure necessarily embeds the structural aspects of inequality distributed within that infrastructure. While traditional technical measurements of networks reflect the social, organizational, spatial, and technical constraints that shape the resulting network, ethnographies can help uncover the texture and role of these hidden supporting relationships. By merging these perspectives, this work contributes to our understanding of network roles in growing and maintaining distributed infrastructures, revealing new approaches to understanding larger, more complex Internet-human infrastructures, including the Internet and the WWW.
Submitted 25 April, 2020;
originally announced April 2020.
-
Measurement and Fairness
Authors:
Abigail Z. Jacobs,
Hanna Wallach
Abstract:
We propose measurement modeling from the quantitative social sciences as a framework for understanding fairness in computational systems. Computational systems often involve unobservable theoretical constructs, such as socioeconomic status, teacher effectiveness, and risk of recidivism. Such constructs cannot be measured directly and must instead be inferred from measurements of observable properties (and other unobservable theoretical constructs) thought to be related to them -- i.e., operationalized via a measurement model. This process, which necessarily involves making assumptions, introduces the potential for mismatches between the theoretical understanding of the construct purported to be measured and its operationalization. We argue that many of the harms discussed in the literature on fairness in computational systems are direct results of such mismatches. We show how some of these harms could have been anticipated and, in some cases, mitigated if viewed through the lens of measurement modeling. To do this, we contribute fairness-oriented conceptualizations of construct reliability and construct validity that unite traditions from political science, education, and psychology and provide a set of tools for making explicit and testing assumptions about constructs and their operationalizations. We then turn to fairness itself, an essentially contested construct that has different theoretical understandings in different contexts. We argue that this contestedness underlies recent debates about fairness definitions: although these debates appear to be about different operationalizations, they are, in fact, debates about different theoretical understandings of fairness. We show how measurement modeling can provide a framework for getting to the core of these debates.
Submitted 12 March, 2021; v1 submitted 11 December, 2019;
originally announced December 2019.
-
Enabling Machine Learning-Ready HPC Ensembles with Merlin
Authors:
J. Luc Peterson,
Ben Bay,
Joe Koning,
Peter Robinson,
Jessica Semler,
Jeremy White,
Rushil Anirudh,
Kevin Athey,
Peer-Timo Bremer,
Francesco Di Natale,
David Fox,
Jim A. Gaffney,
Sam A. Jacobs,
Bhavya Kailkhura,
Bogdan Kustowski,
Steven Langer,
Brian Spears,
Jayaraman Thiagarajan,
Brian Van Essen,
Jae-Seung Yeom
Abstract:
With the growing complexity of computational and experimental facilities, many scientific researchers are turning to machine learning (ML) techniques to analyze large-scale ensemble data. With complexities such as multi-component workflows, heterogeneous machine architectures, parallel file systems, and batch scheduling, care must be taken to facilitate this analysis in a high performance computing (HPC) environment. In this paper, we present Merlin, a workflow framework to enable large ML-friendly ensembles of scientific HPC simulations. By augmenting traditional HPC with distributed compute technologies, Merlin aims to lower the barrier for scientific subject matter experts to incorporate ML into their analysis. In addition to its design, we describe some example applications that Merlin has enabled on leadership-class HPC resources, such as the ML-augmented optimization of nuclear fusion experiments and the calibration of infectious disease models to study the progression of, and possible mitigation strategies for, COVID-19.
Submitted 1 July, 2021; v1 submitted 5 December, 2019;
originally announced December 2019.
-
Parallelizing Training of Deep Generative Models on Massive Scientific Datasets
Authors:
Sam Ade Jacobs,
Brian Van Essen,
David Hysom,
Jae-Seung Yeom,
Tim Moon,
Rushil Anirudh,
Jayaraman J. Thiagaranjan,
Shusen Liu,
Peer-Timo Bremer,
Jim Gaffney,
Tom Benson,
Peter Robinson,
Luc Peterson,
Brian Spears
Abstract:
Training deep neural networks on large scientific data is a challenging task that requires enormous compute power, especially if no pre-trained models exist to initialize the process. We present a novel tournament method to train traditional as well as generative adversarial networks built on LBANN, a scalable deep learning framework optimized for HPC systems. LBANN combines multiple levels of parallelism and exploits some of the world's largest supercomputers. We demonstrate our framework by creating a complex predictive model based on multivariate data from high-energy-density physics containing hundreds of millions of images and hundreds of millions of scalar values derived from tens of millions of simulations of inertial confinement fusion. Our approach combines an HPC workflow and extends LBANN with optimized data ingestion and the new tournament-style training algorithm to produce a scalable neural network architecture using a CORAL-class supercomputer. Experimental results show that 64 trainers (1024 GPUs) achieve a speedup of 70.2 over a single-trainer (16 GPUs) baseline, an effective parallel efficiency of 109%.
Submitted 5 October, 2019;
originally announced October 2019.
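The parallel efficiency quoted in the LBANN abstract above follows from dividing the speedup by the factor by which resources grew. A short worked check in Python; the function name is hypothetical, and the only inputs are the numbers in the abstract (64 trainers versus a 1-trainer baseline, i.e. 1024 versus 16 GPUs, and a speedup of 70.2).

```python
def parallel_efficiency(speedup, resource_ratio):
    """Speedup divided by the factor by which resources grew relative to the baseline."""
    return speedup / resource_ratio


trainers, baseline_trainers = 64, 1
gpus, baseline_gpus = 1024, 16
assert trainers / baseline_trainers == gpus / baseline_gpus  # both are a 64x scale-up

efficiency = parallel_efficiency(70.2, trainers / baseline_trainers)
print(f"{efficiency:.1%}")  # 109.7%
```

So 70.2 / 64 is about 1.097, consistent with the reported 109%. An efficiency above 100% means the run scaled superlinearly; the abstract does not give a reason, so none is implied here.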
-
Multiparametric Deep Learning Tissue Signatures for Muscular Dystrophy: Preliminary Results
Authors:
Alex E. Bocchieri,
Vishwa S. Parekh,
Kathryn R. Wagner,
Shivani Ahlawat,
Vladimir Braverman,
Doris G. Leung,
Michael A. Jacobs
Abstract:
A current clinical challenge is identifying limb girdle muscular dystrophy 2I (LGMD2I) tissue changes in the thighs, in particular separating fat, fat-infiltrated muscle, and muscle tissue. Deep learning algorithms can learn different features by exploiting the inherent tissue contrasts in multiparametric magnetic resonance imaging (mpMRI). To that end, we developed a novel multiparametric deep learning network (MPDL) tissue signature model based on mpMRI and applied it to LGMD2I. We demonstrate that this new tissue signature model of muscular dystrophy, based on the MPDL algorithm, segments different tissue types with excellent results.
Submitted 31 July, 2019;
originally announced August 2019.
-
Scalable Topological Data Analysis and Visualization for Evaluating Data-Driven Models in Scientific Applications
Authors:
Shusen Liu,
Di Wang,
Dan Maljovec,
Rushil Anirudh,
Jayaraman J. Thiagarajan,
Sam Ade Jacobs,
Brian C. Van Essen,
David Hysom,
Jae-Seung Yeom,
Jim Gaffney,
Luc Peterson,
Peter B. Robinson,
Harsh Bhatia,
Valerio Pascucci,
Brian K. Spears,
Peer-Timo Bremer
Abstract:
With the rapid adoption of machine learning techniques for large-scale applications in science and engineering comes the convergence of two grand challenges in visualization. First, the use of black-box models (e.g., deep neural networks) calls for advanced techniques for exploring and interpreting model behaviors. Second, the rapid growth in computing has produced enormous datasets that require techniques that can handle millions of samples or more. Although some solutions to these interpretability challenges have been proposed, they typically do not scale beyond thousands of samples, nor do they provide the high-level intuition scientists are looking for. Here, we present the first scalable solution to explore and analyze high-dimensional functions often encountered in the scientific data analysis pipeline. By combining a new streaming neighborhood graph construction, the corresponding topology computation, and a novel data aggregation scheme, namely topology-aware datacubes, we enable interactive exploration of both the topological and the geometric aspects of high-dimensional data. Using two use cases from high-energy-density (HED) physics and computational biology, we demonstrate how these capabilities have led to crucial new insights in both applications.
Submitted 18 July, 2019;
originally announced July 2019.
-
Multiparametric Deep Learning and Radiomics for Tumor Grading and Treatment Response Assessment of Brain Cancer: Preliminary Results
Authors:
Vishwa S. Parekh,
John Laterra,
Chetan Bettegowda,
Alex E. Bocchieri,
Jay J. Pillai,
Michael A. Jacobs
Abstract:
Radiomics is an exciting new area of texture research for extracting quantitative and morphological characteristics of pathological tissue. However, to date, only single images have been used for texture analysis. We have extended radiomic texture methods to use multiparametric (mp) data to obtain more complete information from all the images. These mpRadiomic methods could potentially provide a platform for stratifying tumor grade as well as assessing treatment response in brain tumors. In the brain, multiparametric MRI (mpMRI) is based on contrast-enhanced T1-weighted imaging (T1WI), T2WI, Fluid Attenuated Inversion Recovery (FLAIR), Diffusion Weighted Imaging (DWI) and Perfusion Weighted Imaging (PWI). We applied our multiparametric radiomic framework (mpRadiomic) to 24 patients with brain tumors (8 grade II and 16 grade IV). The mpRadiomic framework distinguished grade IV from grade II tumors with a sensitivity of 93% and a specificity of 100%, and an AUC of 0.95. For treatment response, the mpRadiomic framework distinguished pseudo-progression from true progression with an AUC of 0.93. In conclusion, the mpRadiomic analysis effectively captured the multiparametric brain MRI texture, and its features could serve as potential biomarkers for distinguishing grade IV from grade II tumors as well as true progression from pseudo-progression.
Submitted 10 June, 2019;
originally announced June 2019.
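Sensitivity and specificity, as reported for the grade IV versus grade II classification in the mpRadiomic abstract above, are computed from the confusion matrix. A minimal sketch; treating grade IV as the positive class (label 1) is an assumption, the function name and labels are hypothetical, and the abstract does not describe the validation scheme used to obtain the reported 93% and 100%.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels; NaN if a class is absent."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical labels: 1 = grade IV (assumed positive class), 0 = grade II.
print(sensitivity_specificity([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0]))  # (0.75, 1.0)
```

Sensitivity is the fraction of true grade IV cases recovered, and specificity the fraction of true grade II cases correctly left out.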
-
Distinguishing between Normal and Cancer Cells Using Autoencoder Node Saliency
Authors:
Ya Ju Fan,
Jonathan E. Allen,
Sam Ade Jacobs,
Brian C. Van Essen
Abstract:
Gene expression profiles have been widely used to characterize patterns of cellular responses to diseases. As more data become available, scalable learning toolkits become essential for processing large datasets with deep learning models of complex biological processes. We present an autoencoder to capture nonlinear relationships recovered from gene expression profiles. The autoencoder is a nonlinear dimension reduction technique using an artificial neural network, which learns hidden representations of unlabeled data. We train the autoencoder on a large collection of tumor samples from the National Cancer Institute Genomic Data Commons and obtain a generalized and unsupervised latent representation. We leverage an HPC-focused deep learning toolkit, the Livermore Big Artificial Neural Network (LBANN), to efficiently parallelize the training algorithm, reducing computation times from several hours to a few minutes. With the trained autoencoder, we generate latent representations of a small dataset containing pairs of normal and cancer cells of various tumor types. A novel measure called autoencoder node saliency (ANS) is introduced to identify the hidden nodes that best differentiate various pairs of cells. We compare our findings on the best classifying nodes with principal component analysis and with visualizations from t-distributed stochastic neighbor embedding. We demonstrate that the autoencoder effectively extracts distinct gene features for multiple learning tasks in the dataset.
Submitted 30 January, 2019;
originally announced January 2019.
-
Tumor Connectomics: Mapping the intra-tumoral complex interaction network
Authors:
Vishwa S. Parekh,
Michael A. Jacobs
Abstract:
Tumors are extremely heterogeneous and comprise a number of intratumor microenvironments or sub-regions. These tumor microenvironments may interact with each other through complex high-level relationships, which could provide important insight into the organizational structure of the tumor network. To that end, we developed a tumor connectomics framework (TCF) to understand and model the complex functional and morphological interactions within the tumor. We then demonstrate the TCF's potential for predicting treatment response in breast cancer patients receiving neoadjuvant chemotherapy. The TCF was implemented on a cohort of thirty-four breast cancer patients who underwent dynamic contrast-enhanced (DCE) magnetic resonance imaging (MRI) during neoadjuvant chemotherapy. The intra-tumor network connections (tumor connectome) before and after treatment were modeled from the DCE-MRI using advanced graph-theoretic centrality, path length and clustering metrics. The percentage change in the graph metrics between two time points (baseline and after the first cycle) was computed to predict the patient's final response to treatment. The TCF visualized the inter-voxel network connections across multiple time points and was able to evaluate specific changes in the tumor connectome with treatment. Degree centrality was identified as the most significant predictor of treatment response, with an AUC of 0.83 for distinguishing responders from non-responders. In conclusion, the TCF graph metrics produced excellent biomarkers for predicting breast cancer treatment response, with improved visualization and interpretability of changes both locally and globally in the tumor.
Submitted 28 January, 2019;
originally announced January 2019.
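The tumor connectomics abstract above predicts response from the percentage change in graph metrics, with degree centrality as the strongest predictor. A minimal sketch of that computation on an already-built graph; how the study derives inter-voxel edges from DCE-MRI is not described here, and summarizing node centralities by their mean, the function names, and the four-voxel graphs are all assumptions for illustration. Normalizing degree by n - 1 follows the common convention (also used by NetworkX).

```python
def degree_centrality(nodes, edges):
    """Degree of each node divided by (n - 1), for an undirected graph without self-loops."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    return {v: d / (n - 1) for v, d in degree.items()}


def mean_degree_centrality(nodes, edges):
    """Hypothetical whole-graph summary: the mean of the node degree centralities."""
    values = list(degree_centrality(nodes, edges).values())
    return sum(values) / len(values)


def percent_change(before, after):
    return 100.0 * (after - before) / before


# Hypothetical four-voxel networks at baseline and after the first cycle.
voxels = ["v1", "v2", "v3", "v4"]
baseline = mean_degree_centrality(voxels, [("v1", "v2"), ("v2", "v3"), ("v3", "v4"), ("v1", "v3")])
after_cycle_1 = mean_degree_centrality(voxels, [("v1", "v2"), ("v3", "v4")])
print(round(percent_change(baseline, after_cycle_1), 6))  # -50.0
```

In this toy case the network loses half its edges, so mean degree centrality drops from 2/3 to 1/3, a -50% change; such per-patient changes would then serve as features for a responder/non-responder classifier.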
-
Discovering heterogeneous subpopulations for fine-grained analysis of opioid use and opioid use disorders
Authors:
Jen J. Gong,
Abigail Z. Jacobs,
Toby E. Stuart,
Mathijs de Vaan
Abstract:
The opioid epidemic in the United States claims over 40,000 lives per year, and it is estimated that well over two million Americans have an opioid use disorder. Over-prescription and misuse of prescription opioids play an important role in the epidemic. Individuals who are prescribed opioids, and who are diagnosed with opioid use disorder, have diverse underlying health states. Policy interventions targeting prescription opioid use, opioid use disorder, and overdose often fail to account for this variation. To identify latent health states, or phenotypes, pertinent to opioid use and opioid use disorders, we use probabilistic topic modeling with medical diagnosis histories from a statewide population of individuals who were prescribed opioids. We demonstrate that our learned phenotypes are predictive of future opioid use-related outcomes. In addition, we show how the learned phenotypes can provide important context for variability in opioid prescriptions. Understanding the heterogeneity in individual health states and in prescription opioid use can help identify policy interventions to address this public health crisis.
Submitted 1 May, 2019; v1 submitted 10 November, 2018;
originally announced November 2018.
-
Advanced machine learning informatics modeling using clinical and radiological imaging metrics for characterizing breast tumor characteristics with the OncotypeDX gene array
Authors:
Michael A. Jacobs,
Christopher Umbricht,
Vishwa Parekh,
Riham El Khouli,
Leslie Cope,
Katarzyna J. Macura,
Susan Harvey,
Antonio C. Wolff
Abstract:
Purpose: Optimal use of established and imaging methods, such as multiparametric magnetic resonance imaging (mpMRI), can simultaneously identify key functional parameters and provide unique imaging phenotypes of breast cancer. Therefore, we developed and implemented a new machine-learning informatic system that integrates clinical variables, derived from imaging and clinical health records, for comparison with the 21-gene array assay OncotypeDX. Materials and methods: We tested our informatics modeling in a subset of patients (n=81) who had ER+ disease and underwent OncotypeDX gene expression testing and breast mpMRI. The machine-learning informatic method, termed the Integrated Radiomic Informatic System (IRIS), was applied to the mpMRI, clinical and pathologic descriptors, as well as a gene array analysis. The IRIS method uses an advanced graph-theoretic model and quantitative metrics. Summary statistics (means and standard deviations) were obtained for the quantitative imaging parameters. Sensitivity, specificity and area under the curve (AUC) were calculated for the classification of the patients. Results: The OncotypeDX classification by the IRIS model had a sensitivity of 95% and a specificity of 89%, with an AUC of 0.92. Breast lesions were larger in the high-risk group and smaller in both the low-risk and intermediate-risk groups. There were significant differences in PK-DCE and ADC map values among the groups. The ADC map values for the high- and intermediate-risk groups were significantly lower than for the low-risk group. Conclusion: These initial studies provide a deeper understanding of the relationship between imaging features and the molecular gene array OncotypeDX score. This insight provides the foundation for relating these imaging features to the assessment of treatment response for improved personalized medicine.
Submitted 7 November, 2018;
originally announced November 2018.
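The AUC reported for the IRIS classification above can be read as the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one, which equals the normalized Mann-Whitney U statistic. A minimal sketch; how IRIS scores patients and which risk group counts as positive are not stated in the abstract, so the function name, the class choice, and the scores below are hypothetical.

```python
def auc_mann_whitney(scores_positive, scores_negative):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical classifier scores, e.g. high-risk (assumed positive) versus other patients.
print(auc_mann_whitney([0.9, 0.8, 0.4], [0.7, 0.3]))  # 5/6, about 0.833
```

This pairwise version is quadratic in the number of patients, which is fine for a cohort of 81; rank-based formulas give the same value in O(n log n) for larger data.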
-
Assembly in populations of social networks
Authors:
Abigail Z. Jacobs
Abstract:
In-depth studies of sociotechnical systems are largely limited to single instances. Network surveys are expensive, and platforms vary in important ways, from interface design, to social norms, to historical contingencies. With single examples, we cannot in general know how much of the observed network structure is explained by historical accidents, random noise, or meaningful social processes, nor can we claim that network structure predicts outcomes, such as organizational success or ecosystem health. Here, I show how we can adopt a comparative approach for settings where we have, or can cleverly construct, multiple instances of a network to estimate the natural variability in social systems. The comparative approach makes previously untested theories testable. Drawing on examples from the social networks literature, I discuss emerging directions in the study of populations of sociotechnical systems using insights from organization theory and ecology.
Submitted 4 November, 2018;
originally announced November 2018.
-
Radiomic Synthesis Using Deep Convolutional Neural Networks
Authors:
Vishwa S. Parekh,
Michael A. Jacobs
Abstract:
Radiomics is a rapidly growing field that models the textural information present in different tissues of interest for clinical decision support. However, generating radiomic images is computationally very expensive and can take substantial time per radiological image for certain higher-order features, such as the gray-level co-occurrence matrix (GLCM), even with high-end GPUs. To address this, we developed RadSynth, a deep convolutional neural network (CNN) model, to efficiently generate radiomic images. RadSynth was tested on a breast cancer cohort of twenty-four patients (ten benign, ten malignant, and four normal) for the computation of GLCM entropy images from post-contrast DCE-MRI. RadSynth produced excellent synthetic entropy images compared with traditional GLCM entropy images. The average percentage difference and correlation between the two techniques were 0.07 ± 0.06 and 0.97, respectively. In conclusion, RadSynth is a powerful new tool for fast computation and visualization of the textural information present in radiological images.
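For readers unfamiliar with the "traditional GLCM entropy images" that RadSynth is compared against, the sketch below computes them in pure Python: a normalized co-occurrence matrix of gray-level pairs at one pixel offset, its Shannon entropy, and a per-pixel entropy map obtained by sliding a window over the image. The offset (one pixel to the right), symmetric counting, log base 2, window size, and pre-quantized gray levels are all assumptions; the abstract does not state the parameters used. Rebuilding a GLCM for every pixel is what makes the conventional computation slow.

```python
import math
from typing import Dict, List, Tuple

Image = List[List[int]]  # 2-D image of already-quantized gray levels (assumed)


def glcm(image: Image, dx: int = 1, dy: int = 0, symmetric: bool = True) -> Dict[Tuple[int, int], float]:
    """Normalized gray-level co-occurrence matrix, stored sparsely as {(i, j): probability}.

    Counts how often gray level i is followed by gray level j at offset (dy, dx).
    """
    counts: Dict[Tuple[int, int], int] = {}
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[(i, j)] = counts.get((i, j), 0) + 1
                if symmetric:  # also count the reverse pair (j, i)
                    counts[(j, i)] = counts.get((j, i), 0) + 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


def glcm_entropy(image: Image, dx: int = 1, dy: int = 0) -> float:
    """Shannon entropy in bits of the GLCM: -sum p(i, j) * log2 p(i, j)."""
    return -sum(p * math.log2(p) for p in glcm(image, dx, dy).values())


def entropy_image(image: Image, half_window: int = 1) -> List[List[float]]:
    """Per-pixel entropy map: GLCM entropy of the window centred on each pixel (clipped at edges).

    A fresh GLCM is built for every pixel, which is what makes this slow on full-size images.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - half_window), min(rows, r + half_window + 1)
            c0, c1 = max(0, c - half_window), min(cols, c + half_window + 1)
            window = [row[c0:c1] for row in image[r0:r1]]
            out[r][c] = glcm_entropy(window)
    return out


if __name__ == "__main__":
    toy = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]  # hypothetical 4x4 image
    for row in entropy_image(toy):
        print([round(v, 3) for v in row])
```

Homogeneous regions yield low entropy and heterogeneous regions high entropy; a CNN such as RadSynth is trained to approximate a map of this kind without rebuilding a matrix at every pixel.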
Submitted 29 May, 2019; v1 submitted 25 October, 2018;
originally announced October 2018.
-
MPRAD: A Multiparametric Radiomics Framework
Authors:
Vishwa S. Parekh,
Michael A. Jacobs
Abstract:
Multiparametric radiological imaging is vital for the detection, characterization, and diagnosis of many different diseases. The use of radiomics for quantitative extraction of textural features from radiological imaging is increasing as the field moves towards clinical decision support. However, current radiomics methods are limited to extracting these textural features from single images, which may limit the applicable scope of radiomics in different clinical settings. In their current form, they are therefore not capable of capturing the true underlying tissue characteristics in high-dimensional multiparametric imaging space. To overcome this challenge, we have developed a multiparametric imaging radiomic framework, termed MPRAD, for extracting radiomic features from high-dimensional datasets. MPRAD was tested on two different organs and diseases: breast cancer and cerebrovascular accidents in the brain, commonly referred to as stroke. The MPRAD framework classified malignant versus benign breast lesions with excellent sensitivity and specificity of 87% and 80.5%, respectively, and an AUC of 0.88, a 9% to 28% increase in AUC over single radiomic parameters. More importantly, in the breast, MPRAD features of glandular tissue were similar between groups, with no significant differences. Similarly, in brain stroke, MPRAD features demonstrated increased performance in distinguishing the perfusion-diffusion mismatch compared with single-parameter radiomics, and there were no differences within white and gray matter tissue. In conclusion, we have introduced the use of multiparametric radiomics into a clinical setting.
Submitted 25 September, 2018;
originally announced September 2018.
-
DreamNLP: Novel NLP System for Clinical Report Metadata Extraction using Count Sketch Data Streaming Algorithm: Preliminary Results
Authors:
Sanghyun Choi,
Nikita Ivkin,
Vladimir Braverman,
Michael A. Jacobs
Abstract:
Extracting information from electronic health records (EHRs) is challenging because it requires prior knowledge of the reports and a natural language processing (NLP) algorithm. With the growing number of EHR implementations, such knowledge is increasingly difficult to obtain efficiently. We address this challenge by proposing DreamNLP, a novel methodology for analyzing large sets of EHRs using a modified Count Sketch data streaming algorithm. Using DreamNLP, we generate a dictionary of frequently occurring terms, or heavy hitters, in the EHRs while using less memory than the conventional counting approaches used by other NLP programs. We demonstrate the extraction of the most important breast diagnosis features from the EHRs of a set of patients who underwent breast imaging. Our analysis indicates that extracting these terms would be useful for defining important features for downstream tasks such as machine learning for precision medicine.
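The abstract says DreamNLP uses a *modified* Count Sketch but does not describe the modification, so the sketch below shows only the textbook Count Sketch together with a simple top-k candidate set for tracking heavy hitters in a stream of report terms. It is not DreamNLP. The hash construction (BLAKE2b via `hashlib`), the depth and width, the candidate-tracking strategy, and the example terms are assumptions chosen for illustration. Memory is `depth * width` counters plus `k` candidates, independent of vocabulary size, which is the property the abstract contrasts with conventional counting.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


class CountSketch:
    """Textbook Count Sketch: `depth` rows of `width` signed counters (not DreamNLP's variant)."""

    def __init__(self, depth: int = 5, width: int = 1024, seed: int = 0) -> None:
        # An odd depth gives a well-defined median; depth/width/seed are illustrative choices.
        self.depth, self.width, self.seed = depth, width, seed
        self.table = [[0] * width for _ in range(depth)]

    def _bucket_and_sign(self, term: str, row: int) -> Tuple[int, int]:
        # One 64-bit hash per (row, term): low bits pick the bucket, the top bit picks the sign.
        digest = hashlib.blake2b(f"{self.seed}:{row}:{term}".encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        return h % self.width, (1 if (h >> 63) & 1 else -1)

    def add(self, term: str, count: int = 1) -> None:
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(term, row)
            self.table[row][bucket] += sign * count

    def estimate(self, term: str) -> int:
        # Each row gives an unbiased signed estimate; the median suppresses rows with collisions.
        ests = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(term, row)
            ests.append(sign * self.table[row][bucket])
        ests.sort()
        return ests[len(ests) // 2]


def heavy_hitters(stream: Iterable[str], k: int, sketch: CountSketch) -> List[Tuple[str, int]]:
    """Stream terms into the sketch and keep the k terms with the largest estimated counts."""
    top: Dict[str, int] = {}
    for term in stream:
        sketch.add(term)
        est = sketch.estimate(term)
        if term in top or len(top) < k:
            top[term] = est
        else:
            weakest = min(top, key=top.get)
            if est > top[weakest]:  # evict the weakest candidate
                del top[weakest]
                top[term] = est
    return sorted(top.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical token stream standing in for terms extracted from breast imaging reports.
    tokens = ["mass"] * 50 + ["calcification"] * 30 + ["benign"] * 20 + [f"t{i}" for i in range(20)]
    print(heavy_hitters(tokens, k=3, sketch=CountSketch()))
```

Counts from a Count Sketch are estimates: a wider table reduces collisions, while more rows make the median more robust.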
Submitted 25 August, 2018;
originally announced September 2018.