Showing 1–50 of 63 results for author: Jacobs, A

Searching in archive cs.
  1. arXiv:2506.13904  [pdf]

    cs.HC cs.AI cs.LG

    A Systematic Review of User-Centred Evaluation of Explainable AI in Healthcare

    Authors: Ivania Donoso-Guzmán, Kristýna Sirka Kacafírková, Maxwell Szymanski, An Jacobs, Denis Parra, Katrien Verbert

    Abstract: Despite promising developments in Explainable Artificial Intelligence, the practical value of XAI methods remains under-explored and insufficiently validated in real-world settings. Robust and context-aware evaluation is essential, not only to produce understandable explanations but also to ensure their trustworthiness and usability for intended users, but tends to be overlooked because of no clea…

    Submitted 16 June, 2025; originally announced June 2025.

  2. arXiv:2506.05487  [pdf, other]

    cs.CV cs.CE

    A Neural Network Model of Spatial and Feature-Based Attention

    Authors: Ruoyang Hu, Robert A. Jacobs

    Abstract: Visual attention is a mechanism closely intertwined with vision and memory. Top-down information influences visual processing through attention. We designed a neural network model inspired by aspects of human visual attention. This model consists of two networks: one serves as a basic processor performing a simple task, while the other processes contextual information and guides the first network…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 6 pages, 9 figures

  3. arXiv:2502.04386  [pdf, other]

    cs.CV cs.AI cs.LG

    Towards Fair Medical AI: Adversarial Debiasing of 3D CT Foundation Embeddings

    Authors: Guangyao Zheng, Michael A. Jacobs, Vladimir Braverman, Vishwa S. Parekh

    Abstract: Self-supervised learning has revolutionized medical imaging by enabling efficient and generalizable feature extraction from large-scale unlabeled datasets. Recently, self-supervised foundation models have been extended to three-dimensional (3D) computed tomography (CT) data, generating compact, information-rich embeddings with 1408 features that achieve state-of-the-art performance on downstream t…

    Submitted 5 February, 2025; originally announced February 2025.

  4. arXiv:2502.00561  [pdf, ps, other]

    cs.CY

    Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge

    Authors: Hanna Wallach, Meera Desai, A. Feder Cooper, Angelina Wang, Chad Atalla, Solon Barocas, Su Lin Blodgett, Alexandra Chouldechova, Emily Corvi, P. Alex Dow, Jean Garcia-Gathright, Alexandra Olteanu, Nicholas Pangakis, Stefanie Reed, Emily Sheng, Dan Vann, Jennifer Wortman Vaughan, Matthew Vogel, Hannah Washington, Abigail Z. Jacobs

    Abstract: The measurement tasks involved in evaluating generative AI (GenAI) systems lack sufficient scientific rigor, leading to what has been described as "a tangle of sloppy tests [and] apples-to-oranges comparisons" (Roose, 2024). In this position paper, we argue that the ML community would benefit from learning from and drawing on the social sciences when developing and using measurement instruments fo…

    Submitted 6 June, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

    Comments: In Proceedings of the 42nd International Conference on Machine Learning (ICML), 2025

  5. arXiv:2412.06966  [pdf, other]

    cs.LG cs.AI cs.CY

    Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice

    Authors: A. Feder Cooper, Christopher A. Choquette-Choo, Miranda Bogen, Matthew Jagielski, Katja Filippova, Ken Ziyu Liu, Alexandra Chouldechova, Jamie Hayes, Yangsibo Huang, Niloofar Mireshghallah, Ilia Shumailov, Eleni Triantafillou, Peter Kairouz, Nicole Mitchell, Percy Liang, Daniel E. Ho, Yejin Choi, Sanmi Koyejo, Fernando Delgado, James Grimmelmann, Vitaly Shmatikov, Christopher De Sa, Solon Barocas, Amy Cyphert, Mark Lemley, et al. (10 additional authors not shown)

    Abstract: We articulate fundamental mismatches between technical methods for machine unlearning in Generative AI, and documented aspirations for broader impact that these methods could have for law and policy. These aspirations are both numerous and varied, motivated by issues that pertain to privacy, copyright, safety, and more. For example, unlearning is often invoked as a solution for removing the effect…

    Submitted 9 December, 2024; originally announced December 2024.

    Comments: Presented at the 2nd Workshop on Generative AI and Law at ICML (July 2024)

  6. arXiv:2412.00110  [pdf, other]

    cs.CV cs.AI cs.ET cs.LG

    Demographic Predictability in 3D CT Foundation Embeddings

    Authors: Guangyao Zheng, Michael A. Jacobs, Vishwa S. Parekh

    Abstract: Self-supervised foundation models have recently been successfully extended to encode three-dimensional (3D) computed tomography (CT) images, with excellent performance across several downstream tasks, such as intracranial hemorrhage detection and lung cancer risk forecasting. However, as self-supervised models learn from complex data distributions, questions arise concerning whether these embeddin…

    Submitted 27 November, 2024; originally announced December 2024.

    Comments: submitted to Radiology Cardiothoracic Imaging

  7. arXiv:2411.10939  [pdf, other]

    cs.CY

    Evaluating Generative AI Systems is a Social Science Measurement Challenge

    Authors: Hanna Wallach, Meera Desai, Nicholas Pangakis, A. Feder Cooper, Angelina Wang, Solon Barocas, Alexandra Chouldechova, Chad Atalla, Su Lin Blodgett, Emily Corvi, P. Alex Dow, Jean Garcia-Gathright, Alexandra Olteanu, Stefanie Reed, Emily Sheng, Dan Vann, Jennifer Wortman Vaughan, Matthew Vogel, Hannah Washington, Abigail Z. Jacobs

    Abstract: Across academia, industry, and government, there is an increasing awareness that the measurement tasks involved in evaluating generative AI (GenAI) systems are especially difficult. We argue that these measurement tasks are highly reminiscent of measurement tasks found throughout the social sciences. With this in mind, we present a framework, grounded in measurement theory from the social sciences…

    Submitted 16 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024 Workshop on Evaluating Evaluations (EvalEval)

  8. arXiv:2409.13306  [pdf, other]

    cs.LG

    Predicting DNA fragmentation: A non-destructive analogue to chemical assays using machine learning

    Authors: Byron A Jacobs, Ifthakaar Shaik, Frando Lin

    Abstract: Globally, infertility rates are increasing, with 2.5% of all births being assisted by in vitro fertilisation (IVF) in 2022. Male infertility is the cause for approximately half of these cases. The quality of sperm DNA has substantial impact on the success of IVF. The assessment of sperm DNA is traditionally done through chemical assays which render sperm cells ineligible for IVF. Many compounding…

    Submitted 12 February, 2025; v1 submitted 20 September, 2024; originally announced September 2024.

  9. arXiv:2408.16978  [pdf, other]

    cs.DC cs.AI cs.LG

    Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer

    Authors: Jinghan Yao, Sam Ade Jacobs, Masahiro Tanaka, Olatunji Ruwase, Hari Subramoni, Dhabaleswar K. Panda

    Abstract: Large Language Models (LLMs) with long context capabilities are integral to complex tasks in natural language processing and computational biology, such as text generation and protein sequence analysis. However, training LLMs directly on extremely long contexts demands considerable GPU resources and increased memory, leading to higher costs and greater complexity. Alternative approaches that intro…

    Submitted 13 May, 2025; v1 submitted 29 August, 2024; originally announced August 2024.

    Comments: The Eighth Annual Conference on Machine Learning and Systems (MLSys'25)

  10. arXiv:2406.18820  [pdf, other]

    cs.DC cs.LG

    Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training

    Authors: Xinyu Lian, Sam Ade Jacobs, Lev Kurilenko, Masahiro Tanaka, Stas Bekman, Olatunji Ruwase, Minjia Zhang

    Abstract: Existing checkpointing approaches seem ill-suited for distributed training even though hardware limitations make model parallelism, i.e., sharding model state across multiple accelerators, a requirement for model scaling. Consolidating distributed model state into a single checkpoint unacceptably slows down training, and is impractical at extreme scales. Distributed checkpoints, in contrast, are t…

    Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  11. arXiv:2405.19187  [pdf, ps, other]

    cs.CY

    Algorithmic Transparency and Participation through the Handoff Lens: Lessons Learned from the U.S. Census Bureau's Adoption of Differential Privacy

    Authors: Amina A. Abdu, Lauren M. Chambers, Deirdre K. Mulligan, Abigail Z. Jacobs

    Abstract: Emerging discussions on the responsible government use of algorithmic technologies propose transparency and public participation as key mechanisms for preserving accountability and trust. But in practice, the adoption and use of any technology shifts the social, organizational, and political context in which it is embedded. Therefore translating transparency and participation efforts into meaningf…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 21 pages, FAccT '24

  12. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Weizhu Chen, Yen-Chun Chen, Yi-Ling Chen, Hao Cheng, Parul Chopra, Xiyang Dai, et al. (104 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. Our training dataset is a scaled-up version…

    Submitted 30 August, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 24 pages

  13. arXiv:2404.00165  [pdf]

    cs.CL cs.LG

    Individual Text Corpora Predict Openness, Interests, Knowledge and Level of Education

    Authors: Markus J. Hofmann, Markus T. Jansen, Christoph Wigbels, Benny Briesemeister, Arthur M. Jacobs

    Abstract: Here we examine whether the personality dimension of openness to experience can be predicted from the individual google search history. By web scraping, individual text corpora (ICs) were generated from 214 participants with a mean number of 5 million word tokens. We trained word2vec models and used the similarities of each IC to label words, which were derived from a lexical approach of personali…

    Submitted 29 March, 2024; originally announced April 2024.

    Comments: Proceedings of the 8th workshop on Cognitive Aspects of the Lexicon (CogALex-VIII), LREC/Coling 2024

  14. arXiv:2401.10877  [pdf, other]

    cs.CY cs.CV cs.HC

    The Cadaver in the Machine: The Social Practices of Measurement and Validation in Motion Capture Technology

    Authors: Emma Harvey, Hauke Sandhaus, Abigail Z. Jacobs, Emanuel Moss, Mona Sloane

    Abstract: Motion capture systems, used across various domains, make body representations concrete through technical processes. We argue that the measurement of bodies and the validation of measurements for motion capture systems can be understood as social practices. By analyzing the findings of a systematic literature review (N=278) through the lens of social practice theory, we show how these practices, a…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: 34 pages, 9 figures. To appear in the 2024 ACM CHI Conference on Human Factors in Computing Systems (CHI '24)

  15. arXiv:2311.06477  [pdf, other]

    cs.CY

    Report of the 1st Workshop on Generative AI and Law

    Authors: A. Feder Cooper, Katherine Lee, James Grimmelmann, Daphne Ippolito, Christopher Callison-Burch, Christopher A. Choquette-Choo, Niloofar Mireshghallah, Miles Brundage, David Mimno, Madiha Zahrah Choksi, Jack M. Balkin, Nicholas Carlini, Christopher De Sa, Jonathan Frankle, Deep Ganguli, Bryant Gipson, Andres Guadamuz, Swee Leng Harris, Abigail Z. Jacobs, Elizabeth Joh, Gautam Kamath, Mark Lemley, Cass Matthews, Christine McLeavey, Corynne McSherry, et al. (10 additional authors not shown)

    Abstract: This report presents the takeaways of the inaugural Workshop on Generative AI and Law (GenLaw), held in July 2023. A cross-disciplinary group of practitioners and scholars from computer science and law convened to discuss the technical, doctrinal, and policy challenges presented by law for Generative AI, and by Generative AI for law, with an emphasis on U.S. law in particular. We begin the report…

    Submitted 2 December, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

  16. arXiv:2309.14509  [pdf, other]

    cs.LG cs.CL cs.DC

    DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

    Authors: Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He

    Abstract: Computation in a typical Transformer-based large language model (LLM) can be characterized by batch size, hidden dimension, number of layers, and sequence length. Until now, system works for accelerating LLM training have focused on the first three dimensions: data parallelism for batch size, tensor parallelism for hidden size and pipeline parallelism for model depth or layers. These widely studie…

    Submitted 4 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

  17. An Empirical Analysis of Racial Categories in the Algorithmic Fairness Literature

    Authors: Amina A. Abdu, Irene V. Pasquetto, Abigail Z. Jacobs

    Abstract: Recent work in algorithmic fairness has highlighted the challenge of defining racial categories for the purposes of anti-discrimination. These challenges are not new but have previously fallen to the state, which enacts race through government statistics, policies, and evidentiary standards in anti-discrimination law. Drawing on the history of state race-making, we examine how longstanding questio…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: 13 pages, 2 figures, FAccT '23

    Journal ref: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (pp. 1324-1333)

  18. Development and validation of an interpretable machine learning-based calculator for predicting 5-year weight trajectories after bariatric surgery: a multinational retrospective cohort SOPHIA study

    Authors: Patrick Saux, Pierre Bauvin, Violeta Raverdy, Julien Teigny, Hélène Verkindt, Tomy Soumphonphakdy, Maxence Debert, Anne Jacobs, Daan Jacobs, Valerie Monpellier, Phong Ching Lee, Chin Hong Lim, Johanna C Andersson-Assarsson, Lena Carlsson, Per-Arne Svensson, Florence Galtier, Guelareh Dezfoulian, Mihaela Moldovanu, Severine Andrieux, Julien Couster, Marie Lepage, Erminia Lembo, Ornella Verrastro, Maud Robert, Paulina Salminen, et al. (9 additional authors not shown)

    Abstract: Background Weight loss trajectories after bariatric surgery vary widely between individuals, and predicting weight loss before the operation remains challenging. We aimed to develop a model using machine learning to provide individual preoperative prediction of 5-year weight loss trajectories after surgery. Methods In this multinational retrospective observational study we enrolled adult participa…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: The Lancet Digital Health, 2023

  19. arXiv:2306.10209  [pdf, other]

    cs.DC cs.AI cs.LG cs.PF

    ZeRO++: Extremely Efficient Collective Communication for Giant Model Training

    Authors: Guanhua Wang, Heyang Qin, Sam Ade Jacobs, Connor Holmes, Samyam Rajbhandari, Olatunji Ruwase, Feng Yan, Lei Yang, Yuxiong He

    Abstract: Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of large language models on massive GPU clusters due to its ease of use, efficiency, and good scalability. However, when training on low-bandwidth clusters, or at scale which forces batch size per GPU to be small, ZeRO's effective throughput is limited because of high communication volume from gathering weights in forward pass,…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: 12 pages

  20. arXiv:2306.05310  [pdf, other]

    cs.LG

    A framework for dynamically training and adapting deep reinforcement learning models to different, low-compute, and continuously changing radiology deployment environments

    Authors: Guangyao Zheng, Shuhao Lai, Vladimir Braverman, Michael A. Jacobs, Vishwa S. Parekh

    Abstract: While Deep Reinforcement Learning has been widely researched in medical imaging, the training and deployment of these models usually require powerful GPUs. Since imaging environments evolve rapidly and can be generated by edge devices, the algorithm is required to continually learn and adapt to changing environments, and adjust to low-compute devices. To this end, we developed three image coreset…

    Submitted 8 June, 2023; originally announced June 2023.

  21. arXiv:2306.00188  [pdf, other]

    cs.LG cs.CV eess.IV

    Multi-environment lifelong deep reinforcement learning for medical imaging

    Authors: Guangyao Zheng, Shuhao Lai, Vladimir Braverman, Michael A. Jacobs, Vishwa S. Parekh

    Abstract: Deep reinforcement learning (DRL) is increasingly being explored in medical imaging. However, the environments for medical imaging tasks are constantly evolving in terms of imaging orientations, imaging sequences, and pathologies. To that end, we developed a Lifelong DRL framework, SERIL, to continually learn new tasks in changing imaging environments without catastrophic forgetting. SERIL was devel…

    Submitted 31 May, 2023; originally announced June 2023.

  22. arXiv:2305.05608  [pdf, other]

    cs.IR cs.CY cs.LG

    The Role of Relevance in Fair Ranking

    Authors: Aparna Balagopalan, Abigail Z. Jacobs, Asia Biega

    Abstract: Online platforms mediate access to opportunity: relevance-based rankings create and constrain options by allocating exposure to job openings and job candidates in hiring platforms, or sellers in a marketplace. In order to do so responsibly, these socially consequential systems employ various fairness measures and interventions, many of which seek to allocate exposure based on worthiness. Because t…

    Submitted 6 June, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: Published in SIGIR 2023

  23. arXiv:2303.06783  [pdf, other]

    cs.LG cs.CV eess.IV

    Asynchronous Decentralized Federated Lifelong Learning for Landmark Localization in Medical Imaging

    Authors: Guangyao Zheng, Michael A. Jacobs, Vladimir Braverman, Vishwa S. Parekh

    Abstract: Federated learning is a recent development in the machine learning area that allows a system of devices to train on one or more tasks without sharing their data to a single location or device. However, this framework still requires a centralized global model to consolidate individual models into one, and the devices train synchronously, which both can be potential bottlenecks for using federated l…

    Submitted 10 January, 2024; v1 submitted 12 March, 2023; originally announced March 2023.

  24. arXiv:2302.11510  [pdf, other]

    cs.LG cs.CV

    Selective experience replay compression using coresets for lifelong deep reinforcement learning in medical imaging

    Authors: Guangyao Zheng, Samson Zhou, Vladimir Braverman, Michael A. Jacobs, Vishwa S. Parekh

    Abstract: Selective experience replay is a popular strategy for integrating lifelong learning with deep reinforcement learning. Selective experience replay aims to recount selected experiences from previous tasks to avoid catastrophic forgetting. Furthermore, selective experience replay based techniques are model agnostic and allow experiences to be shared across different models. However, storing experienc…

    Submitted 9 January, 2024; v1 submitted 22 February, 2023; originally announced February 2023.

  25. Eilmer: an Open-Source Multi-Physics Hypersonic Flow Solver

    Authors: Nicholas N. Gibbons, Kyle A. Damm, Peter A. Jacobs, Rowan J. Gollan

    Abstract: This paper introduces Eilmer, a general-purpose open-source compressible flow solver developed at the University of Queensland, designed to support research calculations in hypersonics and high-speed aerothermodynamics. Eilmer has a broad userbase in several university research groups and a wide range of capabilities, which are documented on the project's website, in the accompanying reference man…

    Submitted 3 June, 2022; originally announced June 2022.

    Journal ref: Comput. Phys. Commun. 282 (2023) Article 108551

  26. arXiv:2205.11927  [pdf, other]

    cs.CV

    Image Trinarization Using a Partial Differential Equations: A Novel Approach to Automatic Sperm Image Analysis

    Authors: B. A. Jacobs

    Abstract: Partial differential equations have recently garnered substantial attention as an image processing framework due to their extensibility, the ability to rigorously engineer and analyse the governing dynamics as well as the ease of implementation using numerical methods. This paper explores a novel approach to image trinarization with a concrete real-world application of classifying regions of sperm…

    Submitted 24 May, 2022; originally announced May 2022.

    MSC Class: 68U10; 35K55; 65M12; 92C55

  27. arXiv:2201.04356  [pdf]

    cs.CL

    Computational analyses of the topics, sentiments, literariness, creativity and beauty of texts in a large Corpus of English Literature

    Authors: Arthur M. Jacobs, Annette Kinder

    Abstract: The Gutenberg Literary English Corpus (GLEC, Jacobs, 2018a) provides a rich source of textual data for research in digital humanities, computational linguistics or neurocognitive poetics. In this study we address differences among the different literature categories in GLEC, as well as differences between authors. We report the results of three studies providing i) topic and sentiment analyses for…

    Submitted 12 January, 2022; originally announced January 2022.

    Comments: 37 pages, 12 figures

  28. arXiv:2112.10001  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Cross-Domain Federated Learning in Medical Imaging

    Authors: Vishwa S Parekh, Shuhao Lai, Vladimir Braverman, Jeff Leal, Steven Rowe, Jay J Pillai, Michael A Jacobs

    Abstract: Federated learning is increasingly being explored in the field of medical imaging to train deep learning models on large scale datasets distributed across different data centers while preserving privacy by avoiding the need to transfer sensitive patient information. In this manuscript, we explore federated learning in a multi-domain, multi-task setting wherein different participating nodes may con…

    Submitted 18 December, 2021; originally announced December 2021.

    Comments: Under Review for MIDL 2022

  29. arXiv:2112.08645  [pdf, other]

    cs.LG cs.AI cs.NE

    Learning Interpretable Models Through Multi-Objective Neural Architecture Search

    Authors: Zachariah Carmichael, Tim Moon, Sam Ade Jacobs

    Abstract: Monumental advances in deep learning have led to unprecedented achievements across various domains. While the performance of deep neural networks is indubitable, the architectural design and interpretability of such models are nontrivial. Research has been introduced to automate the design of neural network architectures through neural architecture search (NAS). Recent progress has made these meth…

    Submitted 4 July, 2023; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: International Conference on Automated Machine Learning (AutoML) Workshop

  30. arXiv:2109.12500  [pdf]

    cs.CL cs.LG

    Electoral Programs of German Parties 2021: A Computational Analysis Of Their Comprehensibility and Likeability Based On SentiArt

    Authors: Arthur M. Jacobs, Annette Kinder

    Abstract: The electoral programs of six German parties issued before the parliamentary elections of 2021 are analyzed using state-of-the-art computational tools for quantitative narrative, topic and sentiment analysis. We compare different methods for computing the textual similarity of the programs, Jaccard Bag similarity, Latent Semantic Analysis, doc2vec, and sBERT, the representational and computational…

    Submitted 26 September, 2021; originally announced September 2021.

    Comments: 24 pages, 5 figures, 1 table

  31. arXiv:2109.05658  [pdf, other]

    cs.CY

    Measurement as governance in and for responsible AI

    Authors: Abigail Z. Jacobs

    Abstract: Measurement of social phenomena is everywhere, unavoidably, in sociotechnical systems. This is not (only) an academic point: Fairness-related harms emerge when there is a mismatch in the measurement process between the thing we purport to be measuring and the thing we actually measure. However, the measurement process -- where social, cultural, and political values are implicitly encoded in sociot…

    Submitted 12 September, 2021; originally announced September 2021.

    Comments: 5 pages, 1 figure; KDD Workshop on Responsible AI 2021

  32. arXiv:2109.01187  [pdf, other]

    cs.NI

    Hosting Industry Centralization and Consolidation

    Authors: Luciano Zembruzki, Raffaele Sommese, Lisandro Zambenedetti Granville, Arthur Selle Jacobs, Mattijs Jonker, Giovane C. M. Moura

    Abstract: There have been growing concerns about the concentration and centralization of Internet infrastructure. In this work, we scrutinize the hosting industry on the Internet by using active measurements, covering 19 Top-Level Domains (TLDs). We show how the market is heavily concentrated: 1/3 of the domains are hosted by only 5 hosting providers, all US-based companies. For the country-code TLDs (ccTLD…

    Submitted 25 January, 2022; v1 submitted 2 September, 2021; originally announced September 2021.

    Comments: to appear in IEEE/IFIP Network Operations and Management Symposium https://noms2022.ieee-noms.org/

  33. arXiv:2106.07237  [pdf]

    cs.CL cs.AI cs.LG

    Is Einstein more agreeable and less neurotic than Hitler? A computational exploration of the emotional and personality profiles of historical persons

    Authors: Arthur M. Jacobs, Annette Kinder

    Abstract: Recent progress in distributed semantic models (DSM) offers new ways to estimate personality traits of both fictive and real people. In this exploratory study we applied an extended version of the algorithm developed in Jacobs (2019) to compute the likeability scores, emotional figure profiles and BIG5 personality traits for 100 historical persons from the arts, politics or science domains whose n…

    Submitted 14 June, 2021; originally announced June 2021.

    Comments: 20 pages, 4 figures

  34. arXiv:2010.10801  [pdf]

    cs.CL

    Quasi Error-free Text Classification and Authorship Recognition in a large Corpus of English Literature based on a Novel Feature Set

    Authors: Arthur M. Jacobs, Annette Kinder

    Abstract: The Gutenberg Literary English Corpus (GLEC) provides a rich source of textual data for research in digital humanities, computational linguistics or neurocognitive poetics. However, so far only a small subcorpus, the Gutenberg English Poetry Corpus, has been submitted to quantitative text analyses providing predictions for scientific studies of literature. Here we show that in the entire GLEC quas…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: 18 pages, 3 tables

  35. Refining Network Intents for Self-Driving Networks

    Authors: Arthur Selle Jacobs, Ricardo José Pfitscher, Ronaldo Alves Ferreira, Lisandro Zambenedetti Granville

    Abstract: Recent advances in artificial intelligence (AI) offer an opportunity for the adoption of self-driving networks. However, network operators or home-network users still do not have the right tools to exploit these new advancements in AI, since they have to rely on low-level languages to specify network policies. Intent-based networking (IBN) allows operators to specify high-level policies that dicta…

    Submitted 12 August, 2020; originally announced August 2020.

    Comments: 9 pages, 5 figures, 3 listings, 1 grammar

    ACM Class: C.2.3; C.2.1

    Journal ref: ACM SIGCOMM Computer Communication Review (CCR), vol. 48, issue 5, p. 55-63, October 2018

  36. arXiv:2004.12207  [pdf, other]

    cs.SI cs.CY

    Internet-human infrastructures: Lessons from Havana's StreetNet

    Authors: Abigail Z. Jacobs, Michaelanne Dye

    Abstract: We propose a mixed-methods approach to understanding the human infrastructure underlying StreetNet (SNET), a distributed, community-run intranet that serves as the primary 'Internet' in Havana, Cuba. We bridge ethnographic studies and the study of social networks and organizations to understand the way that power is embedded in the structure of Havana's SNET. By quantitatively and qualitatively un…

    Submitted 25 April, 2020; originally announced April 2020.

    Comments: 5 pages, 1 figure. WebConf Workshop on Innovative Ideas in Data Science (April 2020)

  37. Measurement and Fairness

    Authors: Abigail Z. Jacobs, Hanna Wallach

    Abstract: We propose measurement modeling from the quantitative social sciences as a framework for understanding fairness in computational systems. Computational systems often involve unobservable theoretical constructs, such as socioeconomic status, teacher effectiveness, and risk of recidivism. Such constructs cannot be measured directly and must instead be inferred from measurements of observable propert…

    Submitted 12 March, 2021; v1 submitted 11 December, 2019; originally announced December 2019.

    Comments: 11 pages, 1 figure. To be published in the proceedings of the ACM Conference on Fairness, Accountability, and Transparency (FAccT '21)

  38. arXiv:1912.02892  [pdf, other]

    cs.DC cs.LG physics.comp-ph physics.plasm-ph

    Enabling Machine Learning-Ready HPC Ensembles with Merlin

    Authors: J. Luc Peterson, Ben Bay, Joe Koning, Peter Robinson, Jessica Semler, Jeremy White, Rushil Anirudh, Kevin Athey, Peer-Timo Bremer, Francesco Di Natale, David Fox, Jim A. Gaffney, Sam A. Jacobs, Bhavya Kailkhura, Bogdan Kustowski, Steven Langer, Brian Spears, Jayaraman Thiagarajan, Brian Van Essen, Jae-Seung Yeom

    Abstract: With the growing complexity of computational and experimental facilities, many scientific researchers are turning to machine learning (ML) techniques to analyze large scale ensemble data. With complexities such as multi-component workflows, heterogeneous machine architectures, parallel file systems, and batch scheduling, care must be taken to facilitate this analysis in a high performance computin…

    Submitted 1 July, 2021; v1 submitted 5 December, 2019; originally announced December 2019.

    Comments: 28 pages, 9 figures; Submitted to FGCS

    Report number: LLNL-JRNL-821884

  39. arXiv:1910.02270  [pdf, other

    cs.DC cs.LG hep-ex physics.comp-ph

    Parallelizing Training of Deep Generative Models on Massive Scientific Datasets

    Authors: Sam Ade Jacobs, Brian Van Essen, David Hysom, Jae-Seung Yeom, Tim Moon, Rushil Anirudh, Jayaraman J. Thiagarajan, Shusen Liu, Peer-Timo Bremer, Jim Gaffney, Tom Benson, Peter Robinson, Luc Peterson, Brian Spears

    Abstract: Training deep neural networks on large scientific data is a challenging task that requires enormous compute power, especially if no pre-trained models exist to initialize the process. We present a novel tournament method to train traditional as well as generative adversarial networks built on LBANN, a scalable deep learning framework optimized for HPC systems. LBANN combines multiple levels of par… ▽ More

    Submitted 5 October, 2019; originally announced October 2019.

  40. arXiv:1908.00175  [pdf

    eess.IV cs.LG physics.med-ph

    Multiparametric Deep Learning Tissue Signatures for Muscular Dystrophy: Preliminary Results

    Authors: Alex E. Bocchieri, Vishwa S. Parekh, Kathryn R. Wagner, Shivani Ahlawat, Vladimir Braverman, Doris G. Leung, Michael A. Jacobs

    Abstract: A current clinical challenge is identifying limb girdle muscular dystrophy 2I (LGMD2I) tissue changes in the thighs, in particular, separating fat, fat-infiltrated muscle, and muscle tissue. Deep learning algorithms have the ability to learn different features by using the inherent tissue contrasts from multiparametric magnetic resonance imaging (mpMRI). To that end, we developed a novel multiparame… ▽ More

    Submitted 31 July, 2019; originally announced August 2019.

    Comments: 6 pages, 3 figures. MIDL 2019 [arXiv:1907.08612]

    Report number: MIDL/2019/ExtendedAbstract/H1g3ICh4cV

  41. arXiv:1907.08325  [pdf, other

    cs.LG cs.HC cs.NE stat.ML

    Scalable Topological Data Analysis and Visualization for Evaluating Data-Driven Models in Scientific Applications

    Authors: Shusen Liu, Di Wang, Dan Maljovec, Rushil Anirudh, Jayaraman J. Thiagarajan, Sam Ade Jacobs, Brian C. Van Essen, David Hysom, Jae-Seung Yeom, Jim Gaffney, Luc Peterson, Peter B. Robinson, Harsh Bhatia, Valerio Pascucci, Brian K. Spears, Peer-Timo Bremer

    Abstract: With the rapid adoption of machine learning techniques for large-scale applications in science and engineering comes the convergence of two grand challenges in visualization. First, the utilization of black box models (e.g., deep neural networks) calls for advanced techniques in exploring and interpreting model behaviors. Second, the rapid growth in computing has produced enormous datasets that re… ▽ More

    Submitted 18 July, 2019; originally announced July 2019.

  42. arXiv:1906.04049  [pdf

    eess.IV cs.LG physics.med-ph q-bio.QM

    Multiparametric Deep Learning and Radiomics for Tumor Grading and Treatment Response Assessment of Brain Cancer: Preliminary Results

    Authors: Vishwa S. Parekh, John Laterra, Chetan Bettegowda, Alex E. Bocchieri, Jay J. Pillai, Michael A. Jacobs

    Abstract: Radiomics is an exciting new area of texture research for extracting quantitative and morphological characteristics of pathological tissue. However, to date, only single images have been used for texture analysis. We have extended radiomic texture methods to use multiparametric (mp) data to get more complete information from all the images. These mpRadiomic methods could potentially provide a plat… ▽ More

    Submitted 10 June, 2019; originally announced June 2019.

    Comments: 6 pages, 4 figures, 2 tables, radiomics, brain

    MSC Class: 94A17; 68T10 ACM Class: I.4.7; I.4.10

  43. arXiv:1901.11152  [pdf, other

    cs.LG stat.ML

    Distinguishing between Normal and Cancer Cells Using Autoencoder Node Saliency

    Authors: Ya Ju Fan, Jonathan E. Allen, Sam Ade Jacobs, Brian C. Van Essen

    Abstract: Gene expression profiles have been widely used to characterize patterns of cellular responses to diseases. As data becomes available, scalable learning toolkits become essential to processing large datasets using deep learning models to model complex biological processes. We present an autoencoder to capture nonlinear relationships recovered from gene expression profiles. The autoencoder is a nonl… ▽ More

    Submitted 30 January, 2019; originally announced January 2019.

    Comments: Second Workshop on HPC Applications in Precision Medicine, June 2018

  44. arXiv:1901.09861  [pdf

    q-bio.QM cs.SI

    Tumor Connectomics: Mapping the intra-tumoral complex interaction network

    Authors: Vishwa S. Parekh, Michael A. Jacobs

    Abstract: Tumors are extremely heterogeneous and comprise a number of intratumor microenvironments or sub-regions. These tumor microenvironments may interact with each other based on complex high-level relationships, which could provide important insight into the organizational structure of the tumor network. To that end, we developed a tumor connectomics framework (TCF) to understand and model the complex func… ▽ More

    Submitted 28 January, 2019; originally announced January 2019.

    Comments: 7 pages, 5 figures, SPIE Medical Imaging

  45. arXiv:1811.04344   

    q-bio.QM cs.LG stat.ML

    Discovering heterogeneous subpopulations for fine-grained analysis of opioid use and opioid use disorders

    Authors: Jen J. Gong, Abigail Z. Jacobs, Toby E. Stuart, Mathijs de Vaan

    Abstract: The opioid epidemic in the United States claims over 40,000 lives per year, and it is estimated that well over two million Americans have an opioid use disorder. Over-prescription and misuse of prescription opioids play an important role in the epidemic. Individuals who are prescribed opioids, and who are diagnosed with opioid use disorder, have diverse underlying health states. Policy interventio… ▽ More

    Submitted 1 May, 2019; v1 submitted 10 November, 2018; originally announced November 2018.

    Comments: Withdrawn pending data use agreement clarification

  46. arXiv:1811.03218  [pdf

    physics.med-ph cs.AI cs.CV cs.LG q-bio.QM

    Advanced machine learning informatics modeling using clinical and radiological imaging metrics for characterizing breast tumor characteristics with the OncotypeDX gene array

    Authors: Michael A. Jacobs, Christopher Umbricht, Vishwa Parekh, Riham El Khouli, Leslie Cope, Katarzyna J. Macura, Susan Harvey, Antonio C. Wolff

    Abstract: Purpose: Optimal use of established and imaging methods, such as multiparametric magnetic resonance imaging (mpMRI), can simultaneously identify key functional parameters and provide unique imaging phenotypes of breast cancer. Therefore, we have developed and implemented a new machine-learning informatic system that integrates clinical variables, derived from imaging and clinical health records, to c… ▽ More

    Submitted 7 November, 2018; originally announced November 2018.

    Comments: 32 pages, 6 figures, Abstract number SSQ01-04: Radiological Society of North America 2015 Scientific Assembly and Annual Meeting, Chicago, IL

    Report number: SSQ01-04

  47. arXiv:1811.01452  [pdf, other

    cs.SI physics.soc-ph

    Assembly in populations of social networks

    Authors: Abigail Z. Jacobs

    Abstract: In-depth studies of sociotechnical systems are largely limited to single instances. Network surveys are expensive, and platforms vary in important ways, from interface design, to social norms, to historical contingencies. With single examples, we cannot in general know how much of observed network structure is explained by historical accidents, random noise, or meaningful social processes, nor ca… ▽ More

    Submitted 4 November, 2018; originally announced November 2018.

    Comments: 4 pages, 1 figure. Position paper for CSCW Workshop on Navigating the Challenges of Multi-Site Research

    ACM Class: J.4; H.5.3

  48. arXiv:1810.11090  [pdf

    cs.CV

    Radiomic Synthesis Using Deep Convolutional Neural Networks

    Authors: Vishwa S. Parekh, Michael A. Jacobs

    Abstract: Radiomics is a rapidly growing field that deals with modeling the textural information present in the different tissues of interest for clinical decision support. However, the process of generating radiomic images is computationally very expensive and could take substantial time per radiological image for certain higher order features, such as gray-level co-occurrence matrix (GLCM), even with high… ▽ More

    Submitted 29 May, 2019; v1 submitted 25 October, 2018; originally announced October 2018.

    Comments: Submitted to ISBI 2019, 4 pages

  49. arXiv:1809.09973  [pdf

    cs.CV physics.bio-ph physics.med-ph q-bio.QM

    MPRAD: A Multiparametric Radiomics Framework

    Authors: Vishwa S. Parekh, Michael A. Jacobs

    Abstract: Multiparametric radiological imaging is vital for detection, characterization and diagnosis of many different diseases. The use of radiomics for quantitative extraction of textural features from radiological imaging is increasingly moving towards clinical decision support. However, current methods in radiomics are limited to using single images for the extraction of these textural features and may l… ▽ More

    Submitted 25 September, 2018; originally announced September 2018.

    Comments: 32 pages, 7 figures

    Journal ref: Breast Cancer Res Treat (2020)

  50. arXiv:1809.02665  [pdf

    cs.LG eess.AS stat.ML

    DreamNLP: Novel NLP System for Clinical Report Metadata Extraction using Count Sketch Data Streaming Algorithm: Preliminary Results

    Authors: Sanghyun Choi, Nikita Ivkin, Vladimir Braverman, Michael A. Jacobs

    Abstract: Extracting information from electronic health records (EHR) is a challenging task since it requires prior knowledge of the reports and some natural language processing (NLP) algorithm. With the growing number of EHR implementations, such knowledge is increasingly challenging to obtain in an efficient manner. We address this challenge by proposing a novel methodology to analyze large sets of EHRs u… ▽ More

    Submitted 25 August, 2018; originally announced September 2018.

    Comments: 13 pages, 3 figures, US patent

    ACM Class: E.1; E.2; F.2.2; I.2.7