Showing 1–50 of 66 results for author: Fox, E

Searching in archive cs.
  1. arXiv:2503.16484  [pdf, other]

    cs.HC cs.AI

    AI-Powered Episodic Future Thinking

    Authors: Sareh Ahmadi, Michelle Rockwell, Megan Stuart, Allison Tegge, Xuan Wang, Jeffrey Stein, Edward A. Fox

    Abstract: Episodic Future Thinking (EFT) is an intervention that involves vividly imagining personal future events and experiences in detail. It has shown promise as an intervention to reduce delay discounting - the tendency to devalue delayed rewards in favor of immediate gratification - and to promote behavior change in a range of maladaptive health behaviors. We present EFTeacher, an AI chatbot powered b…

    Submitted 7 March, 2025; originally announced March 2025.

  2. arXiv:2502.14059  [pdf, other]

    cs.HC

    Integrated Telehealth and Extended Reality to Enhance Home Exercise Adherence Following Total Hip and Knee Arthroplasty

    Authors: Christy L. Conroy, Gina M. Brunetti, Angelos Barmpoutis, Emily J. Fox

    Abstract: Nearly one million total hip and knee arthroplasties (THA/TKA) are performed annually in the United States, with most patients discharged home and prescribed home exercise programs (HEPs) to enhance lower extremity function. Traditional paper-based HEPs, while accessible and low-cost, often lack engagement and real-time feedback, which are critical for adherence and performance optimization. Exten…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: In Proceedings of the 3rd Workshop on XR Technologies for Healthcare, Wellbeing, and Medicine, IEEE VR Conference, March 8-12, 2025

  3. arXiv:2501.12352  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Test-time regression: a unifying framework for designing sequence models with associative memory

    Authors: Ke Alexander Wang, Jiaxin Shi, Emily B. Fox

    Abstract: Sequence models lie at the heart of modern deep learning. However, rapid advancements have produced a diversity of seemingly unrelated architectures, such as Transformers and recurrent alternatives. In this paper, we introduce a unifying framework to understand and derive these sequence models, inspired by the empirical importance of associative recall, the capability to retrieve contextually rele…

    Submitted 1 May, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

  4. arXiv:2412.06945  [pdf, ps, other]

    cs.HC

    "It's Always a Losing Game": How Workers Understand and Resist Surveillance Technologies on the Job

    Authors: Cella M. Sum, Caroline Shi, Sarah E. Fox

    Abstract: With the rise of remote work, a range of surveillance technologies are increasingly being used by business owners to track and monitor employees, raising concerns about worker rights and privacy. Through analysis of Reddit posts and in-depth semi-structured interviews, this paper seeks to understand how workers across a range of sectors make sense of and respond to layered forms of surveillance. W…

    Submitted 12 February, 2025; v1 submitted 9 December, 2024; originally announced December 2024.

  5. Automating Chapter-Level Classification for Electronic Theses and Dissertations

    Authors: Bipasha Banerjee, William A. Ingram, Edward A. Fox

    Abstract: Traditional archival practices for describing electronic theses and dissertations (ETDs) rely on broad, high-level metadata schemes that fail to capture the depth, complexity, and interdisciplinary nature of these long scholarly works. The lack of detailed, chapter-level content descriptions impedes researchers' ability to locate specific sections or themes, thereby reducing discoverability and ov…

    Submitted 26 November, 2024; originally announced November 2024.

  6. Agentic AI for Improving Precision in Identifying Contributions to Sustainable Development Goals

    Authors: William A. Ingram, Bipasha Banerjee, Edward A. Fox

    Abstract: As research institutions increasingly commit to supporting the United Nations' Sustainable Development Goals (SDGs), there is a pressing need to accurately assess their research output against these goals. Current approaches, primarily reliant on keyword-based Boolean search queries, conflate incidental keyword matches with genuine contributions, reducing retrieval precision and complicating bench…

    Submitted 26 November, 2024; originally announced November 2024.

  7. arXiv:2411.17570  [pdf, other]

    cs.LG cs.AI stat.AP stat.ML

    Learning Explainable Treatment Policies with Clinician-Informed Representations: A Practical Approach

    Authors: Johannes O. Ferstad, Emily B. Fox, David Scheinker, Ramesh Johari

    Abstract: Digital health interventions (DHIs) and remote patient monitoring (RPM) have shown great potential in improving chronic disease management through personalized care. However, barriers like limited efficacy and workload concerns hinder adoption of existing DHIs; while limited sample sizes and lack of interpretability limit the effectiveness and adoption of purely black-box algorithmic DHIs. In this…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: Proceedings of Machine Learning for Health (ML4H) 2024. Code available at: https://github.com/jferstad/ml4h-explainable-policies

  8. arXiv:2411.12275  [pdf]

    cs.CY cs.AI cs.CL

    Building Trust: Foundations of Security, Safety and Transparency in AI

    Authors: Huzaifa Sidhpurwala, Garth Mollett, Emily Fox, Mark Bestavros, Huamin Chen

    Abstract: This paper explores the rapidly evolving ecosystem of publicly available AI models, and their potential implications on the security and safety landscape. As AI models become increasingly prevalent, understanding their potential risks and vulnerabilities is crucial. We review the current security and safety scenarios while highlighting challenges such as tracking issues, remediation, and the appar…

    Submitted 19 November, 2024; originally announced November 2024.

  9. arXiv:2411.06590  [pdf, other]

    cs.LG cs.AI cs.CL

    CriticAL: Critic Automation with Language Models

    Authors: Michael Y. Li, Vivek Vajipey, Noah D. Goodman, Emily B. Fox

    Abstract: Understanding the world through models is a fundamental goal of scientific research. While large language model (LLM) based approaches show promise in automating scientific discovery, they often overlook the importance of criticizing scientific models. Criticizing models deepens scientific understanding and drives the development of more accurate models. Automating model criticism is difficult bec…

    Submitted 10 November, 2024; originally announced November 2024.

  10. arXiv:2411.04825  [pdf, other]

    cs.CL cs.DL cs.LG

    VTechAGP: An Academic-to-General-Audience Text Paraphrase Dataset and Benchmark Models

    Authors: Ming Cheng, Jiaying Gong, Chenhan Yuan, William A. Ingram, Edward Fox, Hoda Eldardiry

    Abstract: Existing text simplification or paraphrase datasets mainly focus on sentence-level text generation in a general domain. These datasets are typically developed without using domain knowledge. In this paper, we release a novel dataset, VTechAGP, which is the first academic-to-general-audience text paraphrase dataset consisting of document-level thesis and dissertation academic and general-audience ab…

    Submitted 21 February, 2025; v1 submitted 7 November, 2024; originally announced November 2024.

    Comments: 21 pages, 3 figures, accepted for publication in NAACL 2025 Main Conference

  11. arXiv:2410.08938  [pdf, other]

    q-bio.QM cs.LG

    KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors

    Authors: Benson Chen, Tomasz Danel, Patrick J. McEnaney, Nikhil Jain, Kirill Novikov, Spurti Umesh Akki, Joshua L. Turnbull, Virja Atul Pandya, Boris P. Belotserkovskii, Jared Bryce Weaver, Ankita Biswas, Dat Nguyen, Gabriel H. S. Dreiman, Mohammad Sultan, Nathaniel Stanley, Daniel M Whalen, Divya Kanichar, Christoph Klein, Emily Fox, R. Edward Watts

    Abstract: DNA-Encoded Libraries (DEL) are combinatorial small molecule libraries that offer an efficient way to characterize diverse chemical spaces. Selection experiments using DELs are pivotal to drug discovery efforts, enabling high-throughput screens for hit finding. However, limited availability of public DEL datasets hinders the advancement of computational techniques designed to process such data. To…

    Submitted 11 October, 2024; originally announced October 2024.

  12. arXiv:2409.11654  [pdf, other]

    q-bio.QM cs.AI cs.LG q-bio.NC

    How to Build the Virtual Cell with Artificial Intelligence: Priorities and Opportunities

    Authors: Charlotte Bunne, Yusuf Roohani, Yanay Rosen, Ankit Gupta, Xikun Zhang, Marcel Roed, Theo Alexandrov, Mohammed AlQuraishi, Patricia Brennan, Daniel B. Burkhardt, Andrea Califano, Jonah Cool, Abby F. Dernburg, Kirsty Ewing, Emily B. Fox, Matthias Haury, Amy E. Herr, Eric Horvitz, Patrick D. Hsu, Viren Jain, Gregory R. Johnson, Thomas Kalil, David R. Kelley, Shana O. Kelley, Anna Kreshuk, et al. (17 additional authors not shown)

    Abstract: The cell is arguably the most fundamental unit of life and is central to understanding biology. Accurate modeling of cells is important for this understanding as well as for determining the root causes of disease. Recent advances in artificial intelligence (AI), combined with the ability to generate large-scale experimental data, present novel opportunities to model cells. Here we propose a vision…

    Submitted 14 October, 2024; v1 submitted 17 September, 2024; originally announced September 2024.

  13. arXiv:2407.00138  [pdf, other]

    cs.AI cs.CV

    Analyzing Quality, Bias, and Performance in Text-to-Image Generative Models

    Authors: Nila Masrourisaadat, Nazanin Sedaghatkish, Fatemeh Sarshartehrani, Edward A. Fox

    Abstract: Advances in generative models have led to significant interest in image synthesis, demonstrating the ability to generate high-quality images for a diverse range of text prompts. Despite this progress, most studies ignore the presence of bias. In this paper, we examine several text-to-image models not only by qualitatively assessing their performance in generating accurate images of human faces, gr…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: 20 pages, 8 figures

    ACM Class: I.2.6; I.2.10; I.2.7; I.4.10

  14. Public Technologies Transforming Work of the Public and the Public Sector

    Authors: Seyun Kim, Bonnie Fan, Willa Yunqi Yang, Jessie Ramey, Sarah E Fox, Haiyi Zhu, John Zimmerman, Motahhare Eslami

    Abstract: Technologies adopted by the public sector have transformed the work practices of employees in public agencies by creating different means of communication and decision-making. Although much of the recent research in the future of work domain has concentrated on the effects of technological advancements on public sector employees, the influence on work practices of external stakeholders engaging wi…

    Submitted 28 May, 2024; originally announced May 2024.

  15. arXiv:2404.05050  [pdf, other]

    cs.HC

    Co-design Accessible Public Robots: Insights from People with Mobility Disability, Robotic Practitioners and Their Collaborations

    Authors: Howard Ziyu Han, Franklin Mingzhe Li, Alesandra Baca Vazquez, Daragh Byrne, Nikolas Martelaro, Sarah E Fox

    Abstract: Sidewalk robots are increasingly common across the globe. Yet, their operation on public paths poses challenges for people with mobility disabilities (PwMD) who face barriers to accessibility, such as insufficient curb cuts. We interviewed 15 PwMD to understand how they perceive sidewalk robots. Findings indicated that PwMD feel they have to compete for space on the sidewalk when robots are introd…

    Submitted 7 April, 2024; originally announced April 2024.

  16. arXiv:2403.12878  [pdf, other]

    cs.CG

    Fréchet Edit Distance

    Authors: Emily Fox, Amir Nayyeri, Jonathan James Perry, Benjamin Raichel

    Abstract: We define and investigate the Fréchet edit distance problem. Given two polygonal curves $π$ and $σ$ and a threshold value $δ>0$, we seek the minimum number of edits to $σ$ such that the Fréchet distance between the edited $σ$ and $π$ is at most $δ$. For the edit operations we consider three cases, namely, deletion of vertices, insertion of vertices, or both. For this basic problem we consider a n…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: To appear in SoCG 2024

  17. arXiv:2402.17879  [pdf, other]

    cs.LG cs.CL

    Automated Statistical Model Discovery with Language Models

    Authors: Michael Y. Li, Emily B. Fox, Noah D. Goodman

    Abstract: Statistical model discovery is a challenging search over a vast space of models subject to domain-specific constraints. Efficiently searching over this space requires expertise in modeling and the problem domain. Motivated by the domain knowledge and programming capabilities of large language models (LMs), we introduce a method for language model driven automated statistical model discovery. We ca…

    Submitted 22 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  18. arXiv:2402.17233  [pdf, other]

    cs.LG stat.AP stat.ME

    Hybrid$^2$ Neural ODE Causal Modeling and an Application to Glycemic Response

    Authors: Bob Junyi Zou, Matthew E. Levine, Dessi P. Zaharieva, Ramesh Johari, Emily B. Fox

    Abstract: Hybrid models composing mechanistic ODE-based dynamics with flexible and expressive neural network components have grown rapidly in popularity, especially in scientific domains where such ODE-based modeling offers important interpretability and validated causal grounding (e.g., for counterfactual reasoning). The incorporation of mechanistic models also provides inductive bias in standard blackbox…

    Submitted 11 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  19. arXiv:2312.03344  [pdf, other]

    cs.LG math.DS stat.AP stat.ML

    Interpretable Mechanistic Representations for Meal-level Glycemic Control in the Wild

    Authors: Ke Alexander Wang, Emily B. Fox

    Abstract: Diabetes encompasses a complex landscape of glycemic control that varies widely among individuals. However, current methods do not faithfully capture this variability at the meal level. On the one hand, expert-crafted features lack the flexibility of data-driven methods; on the other hand, learned representations tend to be uninterpretable which hampers clinical adoption. In this paper, we propose…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: Proceedings of Machine Learning for Health (ML4H) 2023. Code available at: https://github.com/KeAWang/interpretable-cgm-representations

  20. arXiv:2311.06300  [pdf, other]

    cs.HC cs.AI

    AI Chatbot for Generating Episodic Future Thinking (EFT) Cue Texts for Health

    Authors: Sareh Ahmadi, Edward A. Fox

    Abstract: We describe an AI-powered chatbot to aid with health improvement by generating Episodic Future Thinking (EFT) cue texts that should reduce delay discounting. In prior studies, EFT has been shown to address maladaptive health behaviors. Those studies involved participants, working with researchers, vividly imagining future events, and writing a description that they subsequently will frequently rev…

    Submitted 6 November, 2023; originally announced November 2023.

  21. arXiv:2311.04262  [pdf, other]

    cs.CV cs.AI cs.DL cs.LG

    ETDPC: A Multimodality Framework for Classifying Pages in Electronic Theses and Dissertations

    Authors: Muntabir Hasan Choudhury, Lamia Salsabil, William A. Ingram, Edward A. Fox, Jian Wu

    Abstract: Electronic theses and dissertations (ETDs) have been proposed, advocated, and generated for more than 25 years. Although ETDs are hosted by commercial or institutional digital library repositories, they are still an understudied type of scholarly big data, partially because they are usually longer than conference proceedings and journals. Segmenting ETDs will allow researchers to study sectional c…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: 10 pages, 3 figures, accepted to Innovative Applications of Artificial Intelligence (IAAI-24)

  22. Maximizing Equitable Reach and Accessibility of ETDs

    Authors: William A. Ingram, Jian Wu, Edward A. Fox

    Abstract: This poster addresses accessibility issues of electronic theses and dissertations (ETDs) in digital libraries (DLs). ETDs are available primarily as PDF files, which present barriers to equitable access, especially for users with visual impairments, cognitive or learning disabilities, or for anyone needing more efficient and effective ways of finding relevant information within these long document…

    Submitted 27 October, 2023; originally announced October 2023.

    Journal ref: 2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL), Santa Fe, NM, USA, 2023, pp. 256-257

  23. arXiv:2307.14899  [pdf, other]

    cs.CL

    Retrieval-based Text Selection for Addressing Class-Imbalanced Data in Classification

    Authors: Sareh Ahmadi, Aditya Shah, Edward Fox

    Abstract: This paper addresses the problem of selecting a set of texts for annotation in text classification using retrieval methods when there are limits on the number of annotations due to constraints on human resources. An additional challenge addressed is dealing with binary categories that have a small number of positive instances, reflecting severe class imbalance. In our situation, where annotatio…

    Submitted 9 November, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

  24. arXiv:2307.07440  [pdf, ps, other]

    cs.DS

    A simple deterministic near-linear time approximation scheme for transshipment with arbitrary positive edge costs

    Authors: Emily Fox

    Abstract: We describe a simple deterministic near-linear time approximation scheme for uncapacitated minimum cost flow in undirected graphs with real edge weights, a problem also known as transshipment. Specifically, our algorithm takes as input a (connected) undirected graph $G = (V, E)$, vertex demands $b \in \mathbb{R}^V$ such that $\sum_{v \in V} b(v) = 0$, positive edge costs $c \in \mathbb{R}_{>0}^E$,…

    Submitted 26 June, 2024; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: Accepted for ESA 2024 v3: ESA 2024 reviewer suggestions

  25. arXiv:2305.01638  [pdf, other]

    cs.LG cs.CV stat.ML

    Sequence Modeling with Multiresolution Convolutional Memory

    Authors: Jiaxin Shi, Ke Alexander Wang, Emily B. Fox

    Abstract: Efficiently capturing the long-range patterns in sequential data sources salient to a given task -- such as classification and generative modeling -- poses a fundamental challenge. Popular approaches in the space trade off between the memory burden of brute-force enumeration and comparison, as in transformers, the computational burden of complicated sequential dependencies, as in recurrent neural n…

    Submitted 1 November, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: ICML 2023, Source code: https://github.com/thjashin/multires-conv

  26. arXiv:2304.14300  [pdf, other]

    cs.LG math.DS q-bio.QM

    Learning Absorption Rates in Glucose-Insulin Dynamics from Meal Covariates

    Authors: Ke Alexander Wang, Matthew E. Levine, Jiaxin Shi, Emily B. Fox

    Abstract: Traditional models of glucose-insulin dynamics rely on heuristic parameterizations chosen to fit observations within a laboratory setting. However, these models cannot describe glucose dynamics in daily life. One source of failure is in their descriptions of glucose absorption rates after meal events. A meal's macronutritional content has nuanced effects on the absorption profile, which is difficu…

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: Work presented at NeurIPS 2022 Workshop on Learning from Time Series for Health (TS4H). arXiv admin note: substantial text overlap with arXiv:2302.11939

  27. arXiv:2303.17661  [pdf, other]

    cs.DL cs.AI cs.LG

    MetaEnhance: Metadata Quality Improvement for Electronic Theses and Dissertations of University Libraries

    Authors: Muntabir Hasan Choudhury, Lamia Salsabil, Himarsha R. Jayanetti, Jian Wu, William A. Ingram, Edward A. Fox

    Abstract: Metadata quality is crucial for digital objects to be discovered through digital library interfaces. However, due to various reasons, the metadata of digital objects often exhibits incomplete, inconsistent, and incorrect values. We investigate methods to automatically detect, correct, and canonicalize scholarly metadata, using seven key fields of electronic theses and dissertations (ETDs) as a cas…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: 7 pages, 3 tables, and 1 figure. Accepted by 2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL '23) as a short paper

  28. arXiv:2301.13321  [pdf, other]

    econ.TH cs.GT

    Censorship Resistance in On-Chain Auctions

    Authors: Elijah Fox, Mallesh Pai, Max Resnick

    Abstract: Modern blockchains guarantee that submitted transactions will be included eventually, a property formally known as liveness. But financial activity requires transactions to be included in a timely manner. Unfortunately, classical liveness is not strong enough to guarantee this, particularly in the presence of a motivated adversary who benefits from censoring transactions. We define censorship resi…

    Submitted 26 June, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

    Comments: 27 pages, 2 figures

  29. arXiv:2211.03891  [pdf, ps, other]

    cs.CG cs.DS

    A deterministic near-linear time approximation scheme for geometric transportation

    Authors: Emily Fox, Jiashuai Lu

    Abstract: Given a set of points $P = (P^+ \sqcup P^-) \subset \mathbb{R}^d$ for some constant $d$ and a supply function $μ:P\to \mathbb{R}$ such that $μ(p) > 0~\forall p \in P^+$, $μ(p) < 0~\forall p \in P^-$, and $\sum_{p\in P}{μ(p)} = 0$, the geometric transportation problem asks one to find a transportation map $τ: P^+\times P^-\to \mathbb{R}_{\ge 0}$ such that…

    Submitted 27 September, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

    Comments: To appear in FOCS 2023. 24 pages. Update 2: Added corrections for minimum cost flow approximation scheme. Addressed reviewer comments. Update 1: Adds a new randomized near-linear time approximation scheme for uncapacitated minimum cost flow in undirected graphs (transshipment) with arbitrary edge costs. References more recent work in geometric bipartite matching

  30. arXiv:2107.00516  [pdf, other]

    cs.DL cs.LG

    Automatic Metadata Extraction Incorporating Visual Features from Scanned Electronic Theses and Dissertations

    Authors: Muntabir Hasan Choudhury, Himarsha R. Jayanetti, Jian Wu, William A. Ingram, Edward A. Fox

    Abstract: Electronic Theses and Dissertations (ETDs) contain domain knowledge that can be used for many digital library tasks, such as analyzing citation networks and predicting research trends. Automatic metadata extraction is important to build scalable digital library search engines. Most existing methods are designed for born-digital documents, so they often fail to extract metadata from scanned documen…

    Submitted 1 July, 2021; originally announced July 2021.

    Comments: 7 pages, 4 figures, 1 table. Accepted by JCDL '21 as a short paper

  31. arXiv:2106.15320  [pdf, other]

    cs.CV cs.DL cs.LG

    ScanBank: A Benchmark Dataset for Figure Extraction from Scanned Electronic Theses and Dissertations

    Authors: Sampanna Yashwant Kahu, William A. Ingram, Edward A. Fox, Jian Wu

    Abstract: We focus on electronic theses and dissertations (ETDs), aiming to improve access and expand their utility, since more than 6 million are publicly available, and they constitute an important corpus to aid research and education across disciplines. The corpus is growing as new born-digital documents are included, and since millions of older theses and dissertations have been converted to digital for…

    Submitted 23 June, 2021; originally announced June 2021.

    Comments: 16 pages, 3 figures, submitted to ACM/IEEE Joint Conference on Digital Libraries

  32. arXiv:2105.02675  [pdf, other]

    stat.ME cs.LG stat.ML

    Granger Causality: A Review and Recent Advances

    Authors: Ali Shojaie, Emily B. Fox

    Abstract: Introduced more than a half century ago, Granger causality has become a popular tool for analyzing time series data in many application domains, from economics and finance to genomics and neuroscience. Despite this popularity, the validity of this notion for inferring causal relationships among time series has remained the topic of continuous debate. Moreover, while the original definition was gen…

    Submitted 6 May, 2021; v1 submitted 5 May, 2021; originally announced May 2021.

    Comments: 40 pages, 12 figures

  33. arXiv:2104.12231  [pdf, other]

    stat.ML cs.LG stat.AP stat.ME

    Model-based metrics: Sample-efficient estimates of predictive model subpopulation performance

    Authors: Andrew C. Miller, Leon A. Gatys, Joseph Futoma, Emily B. Fox

    Abstract: Machine learning models $-$ now commonly developed to screen, diagnose, or predict health conditions $-$ are evaluated with a variety of performance metrics. An important first step in assessing the practical utility of a model is to evaluate its average performance over an entire population of interest. In many settings, it is also critical that the model makes good predictions within predefined…

    Submitted 25 April, 2021; originally announced April 2021.

    Comments: 27 pages, 8 figures

  34. arXiv:2104.12219  [pdf, other]

    stat.ML cs.LG stat.ME

    Breiman's two cultures: You don't have to choose sides

    Authors: Andrew C. Miller, Nicholas J. Foti, Emily B. Fox

    Abstract: Breiman's classic paper casts data analysis as a choice between two cultures: data modelers and algorithmic modelers. Stated broadly, data modelers use simple, interpretable models with well-understood theoretical properties to analyze data. Algorithmic modelers prioritize predictive accuracy and use more flexible function approximations to analyze data. This dichotomy overlooks a third set of mod…

    Submitted 25 April, 2021; originally announced April 2021.

    Comments: Commentary to appear in a special issue of Observational Studies, discussing Leo Breiman's paper "Statistical Modeling: The Two Cultures" (https://doi.org/10.1214/ss/1009213726)

  35. Differentially Private Synthetic Medical Data Generation using Convolutional GANs

    Authors: Amirsina Torfi, Edward A. Fox, Chandan K. Reddy

    Abstract: Deep learning models have demonstrated superior performance in several application problems, such as image classification and speech processing. However, creating a deep learning model using health record data requires addressing certain privacy challenges that bring unique concerns to researchers working in this domain. One effective way to handle such private data issues is to generate realistic…

    Submitted 21 December, 2020; originally announced December 2020.

  36. arXiv:2012.00110  [pdf, other]

    stat.ML cs.LG stat.AP

    Representing and Denoising Wearable ECG Recordings

    Authors: Jeffrey Chan, Andrew C. Miller, Emily B. Fox

    Abstract: Modern wearable devices are embedded with a range of noninvasive biomarker sensors that hold promise for improving detection and treatment of disease. One such sensor is the single-lead electrocardiogram (ECG) which measures electrical signals in the heart. The benefits of the sheer volume of ECG measurements with rich longitudinal structure made possible by wearables come at the price of potentia…

    Submitted 30 November, 2020; originally announced December 2020.

    Comments: ML for Mobile Health Workshop, NeurIPS 2020

  37. arXiv:2010.03549  [pdf, other]

    cs.CV cs.AI

    On the Evaluation of Generative Adversarial Networks By Discriminative Models

    Authors: Amirsina Torfi, Mohammadreza Beyki, Edward A. Fox

    Abstract: Generative Adversarial Networks (GANs) can accurately model complex multi-dimensional data and generate realistic samples. However, due to their implicit estimation of data distributions, their evaluation is a challenging task. The majority of research efforts associated with tackling this issue were validated by qualitative visual evaluation. Such approaches do not generalize well beyond the imag…

    Submitted 7 October, 2020; originally announced October 2020.

    Comments: Accepted to be published in ICPR 2020

  38. arXiv:2009.04485  [pdf, other]

    cs.CL cs.IR cs.LG

    Aspect Classification for Legal Depositions

    Authors: Saurabh Chakravarty, Satvik Chekuri, Maanav Mehrotra, Edward A. Fox

    Abstract: Attorneys and others have a strong interest in having a digital library with suitable services (e.g., summarizing, searching, and browsing) to help them work with large corpora of legal depositions. Their needs often involve understanding the semantics of such documents. That depends in part on the role of the deponent, e.g., plaintiff, defendant, law enforcement personnel, expert, etc. In the cas…

    Submitted 9 September, 2020; originally announced September 2020.

    Comments: 19 pages, 3 figures, 11 tables, detailed version of shorter paper being submitted to a conference

  39. arXiv:2008.02852  [pdf, other]

    stat.ML cs.LG stat.AP

    Learning Insulin-Glucose Dynamics in the Wild

    Authors: Andrew C. Miller, Nicholas J. Foti, Emily Fox

    Abstract: We develop a new model of insulin-glucose dynamics for forecasting blood glucose in type 1 diabetics. We augment an existing biomedical model by introducing time-varying dynamics driven by a machine learning sequence model. Our model maintains a physiologically plausible inductive bias and clinically interpretable parameters -- e.g., insulin sensitivity -- while inheriting the flexibility of moder…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: Machine Learning for Healthcare 2020

  40. arXiv:2003.12206  [pdf, other]

    cs.LG stat.ML

    Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program)

    Authors: Joelle Pineau, Philippe Vincent-Lamarre, Koustuv Sinha, Vincent Larivière, Alina Beygelzimer, Florence d'Alché-Buc, Emily Fox, Hugo Larochelle

    Abstract: One of the challenges in machine learning research is to ensure that presented and published results are sound and reliable. Reproducibility, that is obtaining similar results as presented in a paper or talk, using the same code and data (when available), is a necessary step to verify the reliability of research findings. Reproducibility is also an important step to promote open and accessible res…

    Submitted 30 December, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

    Comments: To appear at JMLR, 16 pages + Appendix

  41. arXiv:2003.01200  [pdf, other]

    cs.CL cs.AI cs.LG

    Natural Language Processing Advancements By Deep Learning: A Survey

    Authors: Amirsina Torfi, Rouzbeh A. Shirvani, Yaser Keneshloo, Nader Tavaf, Edward A. Fox

    Abstract: Natural Language Processing (NLP) helps empower intelligent machines by enhancing a better understanding of the human language for linguistic-based human-computer communication. Recent developments in computational power and the advent of large amounts of linguistic data have heightened the need and demand for automating semantic analysis using data-driven approaches. The utilization of data-drive…

    Submitted 27 February, 2021; v1 submitted 2 March, 2020; originally announced March 2020.

  42. arXiv:2001.09346  [pdf, other]

    cs.LG stat.ML

    CorGAN: Correlation-Capturing Convolutional Generative Adversarial Networks for Generating Synthetic Healthcare Records

    Authors: Amirsina Torfi, Edward A. Fox

    Abstract: Deep learning models have demonstrated high-quality performance in areas such as image classification and speech processing. However, creating a deep learning model using electronic health record (EHR) data requires addressing particular privacy challenges that are unique to researchers in this domain. This matter focuses attention on generating realistic synthetic data while ensuring privacy. In…

    Submitted 4 March, 2020; v1 submitted 25 January, 2020; originally announced January 2020.

    Comments: Accepted to be published in the 33rd International FLAIRS Conference, AI in Healthcare Informatics

  43. arXiv:1911.05683  [pdf, other]

    cs.LG cs.HC stat.ML

    Modeling patterns of smartphone usage and their relationship to cognitive health

    Authors: Jonas Rauber, Emily B. Fox, Leon A. Gatys

    Abstract: The ubiquity of smartphone usage in many people's lives makes it a rich source of information about a person's mental and cognitive state. In this work we analyze 12 weeks of phone usage data from 113 older adults, 31 with diagnosed cognitive impairment and 82 without. We develop structured models of users' smartphone interactions to reveal differences in phone usage patterns between people with an…

    Submitted 13 November, 2019; originally announced November 2019.

    Comments: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract

  44. arXiv:1905.07473  [pdf, other]

    cs.LG math.OC stat.ML

    Adaptively Truncating Backpropagation Through Time to Control Gradient Bias

    Authors: Christopher Aicher, Nicholas J. Foti, Emily B. Fox

    Abstract: Truncated backpropagation through time (TBPTT) is a popular method for learning in recurrent neural networks (RNNs) that saves computation and memory at the cost of bias by truncating backpropagation after a fixed number of lags. In practice, choosing the optimal truncation length is difficult: TBPTT will not converge if the truncation length is too small, or will converge slowly if it is too larg…

    Submitted 1 July, 2019; v1 submitted 17 May, 2019; originally announced May 2019.

  45. arXiv:1901.10568  [pdf, other]

    stat.ML cs.LG stat.CO

    Stochastic Gradient MCMC for Nonlinear State Space Models

    Authors: Christopher Aicher, Srshti Putcha, Christopher Nemeth, Paul Fearnhead, Emily B. Fox

    Abstract: State space models (SSMs) provide a flexible framework for modeling complex time series via a latent stochastic process. Inference for nonlinear, non-Gaussian SSMs is often tackled with particle methods that do not scale well to long time series. The challenge is two-fold: not only do computations scale linearly with time, as in the linear case, but particle filters additionally suffer from increa…

    Submitted 16 July, 2023; v1 submitted 29 January, 2019; originally announced January 2019.

    Comments: To appear in Bayesian Analysis

  46. arXiv:1811.10202  [pdf, other]

    cs.SI

    A Hybrid Model for Role-related User Classification on Twitter

    Authors: Liuqing Li, Ziqian Song, Xuan Zhang, Edward A. Fox

    Abstract: To aid a variety of research studies, we propose TWIROLE, a hybrid model for role-related user classification on Twitter, which detects male-related, female-related, and brand-related (i.e., organization or institution) users. TWIROLE leverages features from tweet contents, user profiles, and profile images, and then applies our hybrid model to identify a user's role. To evaluate it, we used two e…

    Submitted 26 November, 2018; originally announced November 2018.

  47. arXiv:1810.09098  [pdf, other]

    stat.ML cs.LG stat.CO

    Stochastic Gradient MCMC for State Space Models

    Authors: Christopher Aicher, Yi-An Ma, Nicholas J. Foti, Emily B. Fox

    Abstract: State space models (SSMs) are a flexible approach to modeling complex time series. However, inference in SSMs is often computationally prohibitive for long time series. Stochastic gradient MCMC (SGMCMC) is a popular method for scalable Bayesian inference for large independent data. Unfortunately, when applied to dependent data, such as in SSMs, SGMCMC's stochastic gradient estimates are biased as t…

    Submitted 9 July, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

  48. arXiv:1807.07621  [pdf, other]

    stat.ML cs.LG stat.CO

    Approximate Collapsed Gibbs Clustering with Expectation Propagation

    Authors: Christopher Aicher, Emily B. Fox

    Abstract: We develop a framework for approximating collapsed Gibbs sampling in generative latent variable cluster models. Collapsed Gibbs is a popular MCMC method, which integrates out variables in the posterior to improve mixing. Unfortunately, for many complex models, integrating out these variables is either analytically or computationally intractable. We efficiently approximate the necessary collapsed Gi…

    Submitted 19 July, 2018; originally announced July 2018.

  49. arXiv:1806.09060  [pdf, other]

    cs.LG stat.ML

    Disentangled VAE Representations for Multi-Aspect and Missing Data

    Authors: Samuel K. Ainsworth, Nicholas J. Foti, Emily B. Fox

    Abstract: Many problems in machine learning and related application areas are fundamentally variants of conditional modeling and sampling across multi-aspect data, either multi-view, multi-modal, or simply multi-group. For example, sampling from the distribution of English sentences conditioned on a given French sentence or sampling audio waveforms conditioned on a given piece of text. Central to many of th…

    Submitted 23 June, 2018; originally announced June 2018.

  50. arXiv:1806.07137  [pdf, other]

    stat.CO cs.LG stat.ML

    Large-Scale Stochastic Sampling from the Probability Simplex

    Authors: Jack Baker, Paul Fearnhead, Emily B Fox, Christopher Nemeth

    Abstract: Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous-time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, the time-discretization error can dominate when we are near the boundary of the space. We demons…

    Submitted 26 October, 2018; v1 submitted 19 June, 2018; originally announced June 2018.

    Comments: Accepted to Advances in Neural Information Processing Systems (2018)