-
AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench
Authors:
Edan Toledo,
Karen Hambardzumyan,
Martin Josifoski,
Rishi Hazra,
Nicolas Baldwin,
Alexis Audran-Reiss,
Michael Kuchnik,
Despoina Magka,
Minqi Jiang,
Alisia Maria Lupidi,
Andrei Lupu,
Roberta Raileanu,
Kelvin Niu,
Tatiana Shavrina,
Jean-Christophe Gagnon-Audet,
Michael Shvartsman,
Shagun Sodhani,
Alexander H. Miller,
Abhishek Charnalia,
Derek Dunfield,
Carole-Jean Wu,
Pontus Stenetorp,
Nicola Cancedda,
Jakob Nicolaus Foerster,
Yoram Bachrach
Abstract:
AI research agents are demonstrating great potential to accelerate scientific progress by automating the design, implementation, and training of machine learning models. We focus on methods for improving agents' performance on MLE-bench, a challenging benchmark where agents compete in Kaggle competitions to solve real-world machine learning problems. We formalize AI research agents as search polic…
▽ More
AI research agents are demonstrating great potential to accelerate scientific progress by automating the design, implementation, and training of machine learning models. We focus on methods for improving agents' performance on MLE-bench, a challenging benchmark where agents compete in Kaggle competitions to solve real-world machine learning problems. We formalize AI research agents as search policies that navigate a space of candidate solutions, iteratively modifying them using operators. By designing and systematically varying different operator sets and search policies (Greedy, MCTS, Evolutionary), we show that their interplay is critical for achieving high performance. Our best pairing of search strategy and operator set achieves a state-of-the-art result on MLE-bench lite, increasing the success rate of achieving a Kaggle medal from 39.6% to 47.7%. Our investigation underscores the importance of jointly considering the search strategy, operator design, and evaluation methodology in advancing automated machine learning.
△ Less
Submitted 3 July, 2025;
originally announced July 2025.
-
LeanLTL: A unifying framework for linear temporal logics in Lean
Authors:
Eric Vin,
Kyle A. Miller,
Daniel J. Fremont
Abstract:
We propose LeanLTL, a unifying framework for linear temporal logics in Lean 4. LeanLTL supports reasoning about traces that represent either infinite or finite linear time. The library allows traditional LTL syntax to be combined with arbitrary Lean expressions, making it straightforward to define properties involving numerical or other types. We prove that standard flavors of LTL can be embedded…
▽ More
We propose LeanLTL, a unifying framework for linear temporal logics in Lean 4. LeanLTL supports reasoning about traces that represent either infinite or finite linear time. The library allows traditional LTL syntax to be combined with arbitrary Lean expressions, making it straightforward to define properties involving numerical or other types. We prove that standard flavors of LTL can be embedded in our framework. The library also provides automation for reasoning about LeanLTL formulas in a way that facilitates using Lean's existing tactics. Finally, we provide examples illustrating the utility of the library in reasoning about systems that come from applications.
△ Less
Submitted 2 July, 2025;
originally announced July 2025.
-
The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements
Authors:
Bingchen Zhao,
Despoina Magka,
Minqi Jiang,
Xian Li,
Roberta Raileanu,
Tatiana Shavrina,
Jean-Christophe Gagnon-Audet,
Kelvin Niu,
Shagun Sodhani,
Michael Shvartsman,
Andrei Lupu,
Alisia Lupidi,
Edan Toledo,
Karen Hambardzumyan,
Martin Josifoski,
Thomas Foster,
Lucia Cipolina-Kun,
Abhishek Charnalia,
Derek Dunfield,
Alexander H. Miller,
Oisin Mac Aodha,
Jakob Foerster,
Yoram Bachrach
Abstract:
Rapid advancements in large language models (LLMs) have the potential to assist in scientific progress. A critical capability toward this endeavor is the ability to reproduce existing work. To evaluate the ability of AI agents to reproduce results in an active research area, we introduce the Automated LLM Speedrunning Benchmark, leveraging the research community contributions on the NanoGPT speedr…
▽ More
Rapid advancements in large language models (LLMs) have the potential to assist in scientific progress. A critical capability toward this endeavor is the ability to reproduce existing work. To evaluate the ability of AI agents to reproduce results in an active research area, we introduce the Automated LLM Speedrunning Benchmark, leveraging the research community contributions on the NanoGPT speedrun, a competition to train a GPT-2 model in the shortest time. Each of the 19 speedrun tasks provides the agent with the previous records training script, optionally paired with one of three hint formats, ranging from pseudocode to paper-like descriptions of the new records improvements. Records execute quickly by design and speedrun improvements encompass diverse code-level changes, ranging from high-level algorithmic advancements to hardware-aware optimizations. These features make the benchmark both accessible and realistic for the frontier problem of improving LLM training. We find that recent reasoning LLMs combined with SoTA scaffolds struggle to reimplement already-known innovations in our benchmark, even when given detailed hints. Our benchmark thus provides a simple, non-saturated measure of an LLMs ability to automate scientific reproduction, a necessary (but not sufficient) skill for an autonomous research agent.
△ Less
Submitted 30 June, 2025; v1 submitted 27 June, 2025;
originally announced June 2025.
-
Narrowing the Gap between TEEs Threat Model and Deployment Strategies
Authors:
Filip Rezabek,
Jonathan Passerat-Palmbach,
Moe Mahhouk,
Frieder Erdmann,
Andrew Miller
Abstract:
Confidential Virtual Machines (CVMs) provide isolation guarantees for data in use, but their threat model does not include physical level protection and side-channel attacks. Therefore, current deployments rely on trusted cloud providers to host the CVMs' underlying infrastructure. However, TEE attestations do not provide information about the operator hosting a CVM. Without knowing whether a Trus…
▽ More
Confidential Virtual Machines (CVMs) provide isolation guarantees for data in use, but their threat model does not include physical level protection and side-channel attacks. Therefore, current deployments rely on trusted cloud providers to host the CVMs' underlying infrastructure. However, TEE attestations do not provide information about the operator hosting a CVM. Without knowing whether a Trusted Execution Environment (TEE) runs within a provider's infrastructure, a user cannot accurately assess the risks of physical attacks. We observe a misalignment in the threat model where the workloads are protected against other tenants but do not offer end-to-end security assurances to external users without relying on cloud providers. The attestation should be extended to bind the CVM with the provider. A possible solution can rely on the Protected Platform Identifier (PPID), a unique CPU identifier. However, the implementation details of various TEE manufacturers, attestation flows, and providers vary. This makes verification of attestations, ease of migration, and building applications without relying on a trusted party challenging, highlighting a key limitation that must be addressed for the adoption of CVMs. We discuss two points focusing on hardening and extensions of TEEs' attestation.
△ Less
Submitted 17 June, 2025;
originally announced June 2025.
-
Optimizing Storytelling, Improving Audience Retention, and Reducing Waste in the Entertainment Industry
Authors:
Andrew Cornfeld,
Ashley Miller,
Mercedes Mora-Figueroa,
Kurt Samuels,
Anthony Palomba
Abstract:
Television networks face high financial risk when making programming decisions, often relying on limited historical data to forecast episodic viewership. This study introduces a machine learning framework that integrates natural language processing (NLP) features from over 25000 television episodes with traditional viewership data to enhance predictive accuracy. By extracting emotional tone, cogni…
▽ More
Television networks face high financial risk when making programming decisions, often relying on limited historical data to forecast episodic viewership. This study introduces a machine learning framework that integrates natural language processing (NLP) features from over 25000 television episodes with traditional viewership data to enhance predictive accuracy. By extracting emotional tone, cognitive complexity, and narrative structure from episode dialogue, we evaluate forecasting performance using SARIMAX, rolling XGBoost, and feature selection models. While prior viewership remains a strong baseline predictor, NLP features contribute meaningful improvements for some series. We also introduce a similarity scoring method based on Euclidean distance between aggregate dialogue vectors to compare shows by content. Tested across diverse genres, including Better Call Saul and Abbott Elementary, our framework reveals genre-specific performance and offers interpretable metrics for writers, executives, and marketers seeking data-driven insight into audience behavior.
△ Less
Submitted 29 May, 2025;
originally announced June 2025.
-
High-Performance Reinforcement Learning on Spot: Optimizing Simulation Parameters with Distributional Measures
Authors:
AJ Miller,
Fangzhou Yu,
Michael Brauckmann,
Farbod Farshidian
Abstract:
This work presents an overview of the technical details behind a high performance reinforcement learning policy deployment with the Spot RL Researcher Development Kit for low level motor access on Boston Dynamics Spot. This represents the first public demonstration of an end to end end reinforcement learning policy deployed on Spot hardware with training code publicly available through Nvidia Isaa…
▽ More
This work presents an overview of the technical details behind a high performance reinforcement learning policy deployment with the Spot RL Researcher Development Kit for low level motor access on Boston Dynamics Spot. This represents the first public demonstration of an end to end end reinforcement learning policy deployed on Spot hardware with training code publicly available through Nvidia IsaacLab and deployment code available through Boston Dynamics. We utilize Wasserstein Distance and Maximum Mean Discrepancy to quantify the distributional dissimilarity of data collected on hardware and in simulation to measure our sim2real gap. We use these measures as a scoring function for the Covariance Matrix Adaptation Evolution Strategy to optimize simulated parameters that are unknown or difficult to measure from Spot. Our procedure for modeling and training produces high quality reinforcement learning policies capable of multiple gaits, including a flight phase. We deploy policies capable of over 5.2ms locomotion, more than triple Spots default controller maximum speed, robustness to slippery surfaces, disturbance rejection, and overall agility previously unseen on Spot. We detail our method and release our code to support future work on Spot with the low level API.
△ Less
Submitted 3 July, 2025; v1 submitted 24 April, 2025;
originally announced April 2025.
-
How new data permeates LLM knowledge and how to dilute it
Authors:
Chen Sun,
Renat Aksitov,
Andrey Zhmoginov,
Nolan Andrew Miller,
Max Vladymyrov,
Ulrich Rueckert,
Been Kim,
Mark Sandler
Abstract:
Large language models learn and continually learn through the accumulation of gradient-based updates, but how individual pieces of new information affect existing knowledge, leading to both beneficial generalization and problematic hallucination, remains poorly understood. We demonstrate that when learning new information, LLMs exhibit a "priming" effect: learning a new fact can cause the model to…
▽ More
Large language models learn and continually learn through the accumulation of gradient-based updates, but how individual pieces of new information affect existing knowledge, leading to both beneficial generalization and problematic hallucination, remains poorly understood. We demonstrate that when learning new information, LLMs exhibit a "priming" effect: learning a new fact can cause the model to inappropriately apply that knowledge in unrelated contexts. To systematically study this phenomenon, we introduce "Outlandish," a carefully curated dataset of 1320 diverse text samples designed to probe how new knowledge permeates through an LLM's existing knowledge base. Using this dataset, we show that the degree of priming after learning new information can be predicted by measuring the token probability of key words before learning. This relationship holds robustly across different model architectures (PALM-2, Gemma, Llama), sizes, and training stages. Finally, we develop two novel techniques to modulate how new knowledge affects existing model behavior: (1) a ``stepping-stone'' text augmentation strategy and (2) an ``ignore-k'' update pruning method. These approaches reduce undesirable priming effects by 50-95\% while preserving the model's ability to learn new information. Our findings provide both empirical insights into how LLMs learn and practical tools for improving the specificity of knowledge insertion in language models. Further materials: https://sunchipsster1.github.io/projects/outlandish/
△ Less
Submitted 13 April, 2025;
originally announced April 2025.
-
Using tournaments to calculate AUROC for zero-shot classification with LLMs
Authors:
Wonjin Yoon,
Ian Bulovic,
Timothy A. Miller
Abstract:
Large language models perform surprisingly well on many zero-shot classification tasks, but are difficult to fairly compare to supervised classifiers due to the lack of a modifiable decision boundary. In this work, we propose and evaluate a method that converts binary classification tasks into pairwise comparison tasks, obtaining relative rankings from LLMs. Repeated pairwise comparisons can be us…
▽ More
Large language models perform surprisingly well on many zero-shot classification tasks, but are difficult to fairly compare to supervised classifiers due to the lack of a modifiable decision boundary. In this work, we propose and evaluate a method that converts binary classification tasks into pairwise comparison tasks, obtaining relative rankings from LLMs. Repeated pairwise comparisons can be used to score instances using the Elo rating system (used in chess and other competitions), inducing a confidence ordering over instances in a dataset. We evaluate scheduling algorithms for their ability to minimize comparisons, and show that our proposed algorithm leads to improved classification performance, while also providing more information than traditional zero-shot classification.
△ Less
Submitted 20 February, 2025;
originally announced February 2025.
-
NDAI Agreements
Authors:
Matthew Stephenson,
Andrew Miller,
Xyn Sun,
Bhargav Annem,
Rohan Parikh
Abstract:
We study a fundamental challenge in the economics of innovation: an inventor must reveal details of a new idea to secure compensation or funding, yet such disclosure risks expropriation. We present a model in which a seller (inventor) and buyer (investor) bargain over an information good under the threat of hold-up. In the classical setting, the seller withholds disclosure to avoid misappropriatio…
▽ More
We study a fundamental challenge in the economics of innovation: an inventor must reveal details of a new idea to secure compensation or funding, yet such disclosure risks expropriation. We present a model in which a seller (inventor) and buyer (investor) bargain over an information good under the threat of hold-up. In the classical setting, the seller withholds disclosure to avoid misappropriation, leading to inefficiency. We show that trusted execution environments (TEEs) combined with AI agents can mitigate and even fully eliminate this hold-up problem. By delegating the disclosure and payment decisions to tamper-proof programs, the seller can safely reveal the invention without risking expropriation, achieving full disclosure and an efficient ex post transfer. Moreover, even if the invention's value exceeds a threshold that TEEs can fully secure, partial disclosure still improves outcomes compared to no disclosure. Recognizing that real AI agents are imperfect, we model "agent errors" in payments or disclosures and demonstrate that budget caps and acceptance thresholds suffice to preserve most of the efficiency gains.
Our results imply that cryptographic or hardware-based solutions can function as an "ironclad NDA," substantially mitigating the fundamental disclosure-appropriation paradox first identified by Arrow (1962) and Nelson (1959). This has far-reaching policy implications for fostering R&D, technology transfer, and collaboration.
△ Less
Submitted 11 February, 2025;
originally announced February 2025.
-
The Limits of Tolerance
Authors:
Alan D. Miller
Abstract:
I propose a model of aggregation of intervals relevant to the study of legal standards of tolerance. Seven axioms: responsiveness, anonymity, continuity, strategyproofness, and three variants of neutrality are then used to prove several important results about a new class of aggregation methods called endpoint rules. The class of endpoint rules includes extreme tolerance (allowing anything permitt…
▽ More
I propose a model of aggregation of intervals relevant to the study of legal standards of tolerance. Seven axioms: responsiveness, anonymity, continuity, strategyproofness, and three variants of neutrality are then used to prove several important results about a new class of aggregation methods called endpoint rules. The class of endpoint rules includes extreme tolerance (allowing anything permitted by anyone) and a form of majoritarianism (the median rule).
△ Less
Submitted 31 December, 2024;
originally announced January 2025.
-
Leveraging Cardiovascular Simulations for In-Vivo Prediction of Cardiac Biomarkers
Authors:
Laura Manduchi,
Antoine Wehenkel,
Jens Behrmann,
Luca Pegolotti,
Andy C. Miller,
Ozan Sener,
Marco Cuturi,
Guillermo Sapiro,
Jörn-Henrik Jacobsen
Abstract:
Whole-body hemodynamics simulators, which model blood flow and pressure waveforms as functions of physiological parameters, are now essential tools for studying cardiovascular systems. However, solving the corresponding inverse problem of mapping observations (e.g., arterial pressure waveforms at specific locations in the arterial network) back to plausible physiological parameters remains challen…
▽ More
Whole-body hemodynamics simulators, which model blood flow and pressure waveforms as functions of physiological parameters, are now essential tools for studying cardiovascular systems. However, solving the corresponding inverse problem of mapping observations (e.g., arterial pressure waveforms at specific locations in the arterial network) back to plausible physiological parameters remains challenging. Leveraging recent advances in simulation-based inference, we cast this problem as statistical inference by training an amortized neural posterior estimator on a newly built large dataset of cardiac simulations that we publicly release. To better align simulated data with real-world measurements, we incorporate stochastic elements modeling exogenous effects. The proposed framework can further integrate in-vivo data sources to refine its predictive capabilities on real-world data. In silico, we demonstrate that the proposed framework enables finely quantifying uncertainty associated with individual measurements, allowing trustworthy prediction of four biomarkers of clinical interest--namely Heart Rate, Cardiac Output, Systemic Vascular Resistance, and Left Ventricular Ejection Time--from arterial pressure waveforms and photoplethysmograms. Furthermore, we validate the framework in vivo, where our method accurately captures temporal trends in CO and SVR monitoring on the VitalDB dataset. Finally, the predictive error made by the model monotonically increases with the predicted uncertainty, thereby directly supporting the automatic rejection of unusable measurements.
△ Less
Submitted 23 December, 2024;
originally announced December 2024.
-
Wearable Accelerometer Foundation Models for Health via Knowledge Distillation
Authors:
Salar Abbaspourazad,
Anshuman Mishra,
Joseph Futoma,
Andrew C. Miller,
Ian Shapiro
Abstract:
Modern wearable devices can conveniently record various biosignals in the many different environments of daily living, enabling a rich view of individual health. However, not all biosignals are the same: high-fidelity biosignals, such as photoplethysmogram (PPG), contain more physiological information, but require optical sensors with a high power footprint. Alternatively, a lower-fidelity biosign…
▽ More
Modern wearable devices can conveniently record various biosignals in the many different environments of daily living, enabling a rich view of individual health. However, not all biosignals are the same: high-fidelity biosignals, such as photoplethysmogram (PPG), contain more physiological information, but require optical sensors with a high power footprint. Alternatively, a lower-fidelity biosignal such as accelerometry has a significantly smaller power footprint and is available in almost any wearable device. While accelerometry is widely used for activity recognition and fitness, it is less explored for health biomarkers and diagnosis. Here, we show that an accelerometry foundation model can predict a wide variety of health targets. To achieve improved performance, we distill representational knowledge from PPG encoders to accelerometery encoders using 20 million minutes of unlabeled data, collected from ~172K participants in the Apple Heart and Movement Study under informed consent. We observe strong cross-modal alignment on unseen data, e.g., 99.2% top-1 accuracy for retrieving PPG embeddings from accelerometry embeddings. We show that distilled accelerometry encoders have significantly more informative representations compared to self-supervised or supervised encoders trained directly on accelerometry data, observed by at least 23%-49% improved performance for predicting heart rate and heart rate variability. We also show that distilled accelerometry encoders are readily predictive of a wide array of downstream health targets, i.e., they are generalist foundation models. We believe accelerometry foundation models for health may unlock new opportunities for developing digital biomarkers from any wearable device.
△ Less
Submitted 31 January, 2025; v1 submitted 15 December, 2024;
originally announced December 2024.
-
Closed-loop multi-step planning with innate physics knowledge
Authors:
Giulia Lafratta,
Bernd Porr,
Christopher Chandler,
Alice Miller
Abstract:
We present a hierarchical framework to solve robot planning as an input control problem. At the lowest level are temporary closed control loops, ("tasks"), each representing a behaviour, contingent on a specific sensory input and therefore temporary. At the highest level, a supervising "Configurator" directs task creation and termination. Here resides "core" knowledge as a physics engine, where se…
▽ More
We present a hierarchical framework to solve robot planning as an input control problem. At the lowest level are temporary closed control loops, ("tasks"), each representing a behaviour, contingent on a specific sensory input and therefore temporary. At the highest level, a supervising "Configurator" directs task creation and termination. Here resides "core" knowledge as a physics engine, where sequences of tasks can be simulated. The Configurator encodes and interprets simulation results,based on which it can choose a sequence of tasks as a plan. We implement this framework on a real robot and test it in an overtaking scenario as proof-of-concept.
△ Less
Submitted 18 November, 2024;
originally announced November 2024.
-
Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability
Authors:
Yanjun Gao,
Skatje Myers,
Shan Chen,
Dmitriy Dligach,
Timothy A Miller,
Danielle Bitterman,
Guanhua Chen,
Anoop Mayampurath,
Matthew Churpek,
Majid Afshar
Abstract:
Large language models (LLMs) are being explored for diagnostic decision support, yet their ability to estimate pre-test probabilities, vital for clinical decision-making, remains limited. This study evaluates two LLMs, Mistral-7B and Llama3-70B, using structured electronic health record data on three diagnosis tasks. We examined three current methods of extracting LLM probability estimations and r…
▽ More
Large language models (LLMs) are being explored for diagnostic decision support, yet their ability to estimate pre-test probabilities, vital for clinical decision-making, remains limited. This study evaluates two LLMs, Mistral-7B and Llama3-70B, using structured electronic health record data on three diagnosis tasks. We examined three current methods of extracting LLM probability estimations and revealed their limitations. We aim to highlight the need for improved techniques in LLM confidence estimation.
△ Less
Submitted 7 November, 2024;
originally announced November 2024.
-
Reclaiming "Open AI" -- AI Model Serving Can Be Open Access, Yet Monetizable and Loyal
Authors:
Zerui Cheng,
Edoardo Contente,
Ben Finch,
Oleg Golev,
Jonathan Hayase,
Andrew Miller,
Niusha Moshrefi,
Anshul Nasery,
Sandeep Nailwal,
Sewoong Oh,
Himanshu Tyagi,
Pramod Viswanath
Abstract:
The rapid rise of AI has split model serving between open-weight distribution, which often lacks owner control and monetization, and opaque API-based approaches that risk user privacy and model transparency, forming a dichotomy that hinders an equitable AI ecosystem. This position paper introduces, rigorously formulates, and champions the Open-access, Monetizable, and Loyal (OML) paradigm for AI m…
▽ More
The rapid rise of AI has split model serving between open-weight distribution, which often lacks owner control and monetization, and opaque API-based approaches that risk user privacy and model transparency, forming a dichotomy that hinders an equitable AI ecosystem. This position paper introduces, rigorously formulates, and champions the Open-access, Monetizable, and Loyal (OML) paradigm for AI model serving: a foundational shift to securely distribute and serve AI models by synthesizing transparency with granular monetization and critical safety controls. We survey diverse OML constructions from theory and practice, analyze their security, performance, and practical trade-offs, outline a conceptual OML deployment protocol, and discuss market and policy implications. We assert that OML can foster a democratized, self-sustaining, and innovative AI landscape, mitigating centralized power risks. Finally, we call on the research community to further explore the broad design space of OML, spanning cryptographic, AI-native, and socio-economic mechanisms, to realize its full potential for a collaborative, accountable, and resilient AI future.
△ Less
Submitted 3 June, 2025; v1 submitted 1 November, 2024;
originally announced November 2024.
-
Learning and Unlearning of Fabricated Knowledge in Language Models
Authors:
Chen Sun,
Nolan Andrew Miller,
Andrey Zhmoginov,
Max Vladymyrov,
Mark Sandler
Abstract:
What happens when a new piece of knowledge is introduced into the training data and how long does it last while a large language model (LM) continues to train? We investigate this question by injecting facts into LMs from a new probing dataset, "Outlandish", which is designed to permit the testing of a spectrum of different fact types. When studying how robust these memories are, there appears to…
▽ More
What happens when a new piece of knowledge is introduced into the training data and how long does it last while a large language model (LM) continues to train? We investigate this question by injecting facts into LMs from a new probing dataset, "Outlandish", which is designed to permit the testing of a spectrum of different fact types. When studying how robust these memories are, there appears to be a sweet spot in the spectrum of fact novelty between consistency with world knowledge and total randomness, where the injected memory is the most enduring. Specifically we show that facts that conflict with common knowledge are remembered for tens of thousands of training steps, while prompts not conflicting with common knowledge (mundane), as well as scrambled prompts (randomly jumbled) are both forgotten much more rapidly. Further, knowledge-conflicting facts can "prime'' how the language model hallucinates on logically unrelated prompts, showing their propensity for non-target generalization, while both mundane and randomly jumbled facts prime significantly less. Finally, we show that impacts of knowledge-conflicting facts in LMs, though they can be long lasting, can be largely erased by novel application of multi-step sparse updates, even while the training ability of the model is preserved. As such, this very simple procedure has direct implications for mitigating the effects of data poisoning in training.
△ Less
Submitted 29 October, 2024;
originally announced October 2024.
-
Considerations for Distribution Shift Robustness of Diagnostic Models in Healthcare
Authors:
Arno Blaas,
Adam Goliński,
Andrew Miller,
Luca Zappella,
Jörn-Henrik Jacobsen,
Christina Heinze-Deml
Abstract:
We consider robustness to distribution shifts in the context of diagnostic models in healthcare, where the prediction target $Y$, e.g., the presence of a disease, is causally upstream of the observations $X$, e.g., a biomarker. Distribution shifts may occur, for instance, when the training data is collected in a domain with patients having particular demographic characteristics while the model is…
▽ More
We consider robustness to distribution shifts in the context of diagnostic models in healthcare, where the prediction target $Y$, e.g., the presence of a disease, is causally upstream of the observations $X$, e.g., a biomarker. Distribution shifts may occur, for instance, when the training data is collected in a domain with patients having particular demographic characteristics while the model is deployed on patients from a different demographic group. In the domain of applied ML for health, it is common to predict $Y$ from $X$ without considering further information about the patient. However, beyond the direct influence of the disease $Y$ on biomarker $X$, a predictive model may learn to exploit confounding dependencies (or shortcuts) between $X$ and $Y$ that are unstable under certain distribution shifts. In this work, we highlight a data generating mechanism common to healthcare settings and discuss how recent theoretical results from the causality literature can be applied to build robust predictive models. We theoretically show why ignoring covariates as well as common invariant learning approaches will in general not yield robust predictors in the studied setting, while including certain covariates into the prediction model will. In an extensive simulation study, we showcase the robustness (or lack thereof) of different predictors under various data generating processes. Lastly, we analyze the performance of the different approaches using the PTB-XL dataset, a public dataset of annotated ECG recordings.
△ Less
Submitted 25 October, 2024;
originally announced October 2024.
-
Do LLMs "know" internally when they follow instructions?
Authors:
Juyeon Heo,
Christina Heinze-Deml,
Oussama Elachqar,
Kwan Ho Ryan Chan,
Shirley Ren,
Udhay Nallasamy,
Andy Miller,
Jaya Narain
Abstract:
Instruction-following is crucial for building AI agents with large language models (LLMs), as these models must adhere strictly to user-provided constraints and guidelines. However, LLMs often fail to follow even simple and clear instructions. To improve instruction-following behavior and prevent undesirable outputs, a deeper understanding of how LLMs' internal states relate to these outcomes is r…
▽ More
Instruction-following is crucial for building AI agents with large language models (LLMs), as these models must adhere strictly to user-provided constraints and guidelines. However, LLMs often fail to follow even simple and clear instructions. To improve instruction-following behavior and prevent undesirable outputs, a deeper understanding of how LLMs' internal states relate to these outcomes is required. In this work, we investigate whether LLMs encode information in their representations that correlate with instruction-following success - a property we term knowing internally. Our analysis identifies a direction in the input embedding space, termed the instruction-following dimension, that predicts whether a response will comply with a given instruction. We find that this dimension generalizes well across unseen tasks but not across unseen instruction types. We demonstrate that modifying representations along this dimension improves instruction-following success rates compared to random changes, without compromising response quality. Further investigation reveals that this dimension is more closely related to the phrasing of prompts rather than the inherent difficulty of the task or instructions. This work provides insight into the internal workings of LLMs' instruction-following, paving the way for reliable LLM agents.
△ Less
Submitted 28 March, 2025; v1 submitted 18 October, 2024;
originally announced October 2024.
-
Future of Algorithmic Organization: Large-Scale Analysis of Decentralized Autonomous Organizations (DAOs)
Authors:
Tanusree Sharma,
Yujin Potter,
Kornrapat Pongmala,
Henry Wang,
Andrew Miller,
Dawn Song,
Yang Wang
Abstract:
Decentralized Autonomous Organizations (DAOs) resemble early online communities, particularly those centered around open-source projects, and present a potential empirical framework for complex social-computing systems by encoding governance rules within "smart contracts" on the blockchain. A key function of a DAO is collective decision-making, typically carried out through a series of proposals w…
▽ More
Decentralized Autonomous Organizations (DAOs) resemble early online communities, particularly those centered around open-source projects, and present a potential empirical framework for complex social-computing systems by encoding governance rules within "smart contracts" on the blockchain. A key function of a DAO is collective decision-making, typically carried out through a series of proposals where members vote on organizational events using governance tokens, signifying relative influence within the DAO. In just a few years, the deployment of DAOs surged with a total treasury of $24.5 billion and 11.1M governance token holders collectively managing decisions across over 13,000 DAOs as of 2024. In this study, we examine the operational dynamics of 100 DAOs, like pleasrdao, lexdao, lootdao, optimism collective, uniswap, etc. With large-scale empirical analysis of a diverse set of DAO categories and smart contracts and by leveraging on-chain (e.g., voting results) and off-chain data, we examine factors such as voting power, participation, and DAO characteristics dictating the level of decentralization, thus, the efficiency of management structures. As such, our study highlights that increased grassroots participation correlates with higher decentralization in a DAO, and lower variance in voting power within a DAO correlates with a higher level of decentralization, as consistently measured by Gini metrics. These insights closely align with key topics in political science, such as the allocation of power in decision-making and the effects of various governance models. We conclude by discussing the implications for researchers, and practitioners, emphasizing how these factors can inform the design of democratic governance systems in emerging applications that require active engagement from stakeholders in decision-making.
△ Less
Submitted 16 October, 2024;
originally announced October 2024.
-
Fast Symbolic Integer-Linear Spectra
Authors:
Jonny Luntzel,
Abraham Miller
Abstract:
Here we contribute a fast symbolic eigenvalue solver for matrices whose eigenvalues are $\mathbb{Z}$-linear combinations of their entries, alongside efficient general and stochastic $M^{X}$ generators. Users can interact with a few degrees of freedom to create linear operators, making high-dimensional symbolic analysis feasible for when numerical analyses are insufficient.
Here we contribute a fast symbolic eigenvalue solver for matrices whose eigenvalues are $\mathbb{Z}$-linear combinations of their entries, alongside efficient general and stochastic $M^{X}$ generators. Users can interact with a few degrees of freedom to create linear operators, making high-dimensional symbolic analysis feasible for when numerical analyses are insufficient.
△ Less
Submitted 12 December, 2024; v1 submitted 18 September, 2024;
originally announced October 2024.
-
Lessons Learned on Information Retrieval in Electronic Health Records: A Comparison of Embedding Models and Pooling Strategies
Authors:
Skatje Myers,
Timothy A. Miller,
Yanjun Gao,
Matthew M. Churpek,
Anoop Mayampurath,
Dmitriy Dligach,
Majid Afshar
Abstract:
Objective: Applying large language models (LLMs) to the clinical domain is challenging due to the context-heavy nature of processing medical records. Retrieval-augmented generation (RAG) offers a solution by facilitating reasoning over large text sources. However, there are many parameters to optimize in just the retrieval system alone. This paper presents an ablation study exploring how different…
▽ More
Objective: Applying large language models (LLMs) to the clinical domain is challenging due to the context-heavy nature of processing medical records. Retrieval-augmented generation (RAG) offers a solution by facilitating reasoning over large text sources. However, there are many parameters to optimize in just the retrieval system alone. This paper presents an ablation study exploring how different embedding models and pooling methods affect information retrieval for the clinical domain.
Methods: Evaluating on three retrieval tasks on two electronic health record (EHR) data sources, we compared seven models, including medical- and general-domain models, specialized encoder embedding models, and off-the-shelf decoder LLMs. We also examine the choice of embedding pooling strategy for each model, independently on the query and the text to retrieve.
Results: We found that the choice of embedding model significantly impacts retrieval performance, with BGE, a comparatively small general-domain model, consistently outperforming all others, including medical-specific models. However, our findings also revealed substantial variability across datasets and query text phrasings. We also determined the best pooling methods for each of these models to guide future design of retrieval systems.
Discussion: The choice of embedding model, pooling strategy, and query formulation can significantly impact retrieval performance and the performance of these models on other public benchmarks does not necessarily transfer to new domains. Further studies such as this one are vital for guiding empirically-grounded development of retrieval frameworks, such as in the context of RAG, for the clinical domain.
△ Less
Submitted 23 September, 2024;
originally announced September 2024.
-
"The struggle is a part of the experience": Engaging Discontents in the Design of Family Meal Technologies
Authors:
Yuxing Wu,
Andrew D Miller,
Chia-Fang Chung,
Elizabeth Kaziunas
Abstract:
Meals are a central (and messy) part of family life. Previous design framings for mealtime technologies have focused on supporting dietary needs or social and celebratory interactions at the dinner table; however, family meals involve the coordination of many activities and complicated family dynamics. In this paper, we report on findings from interviews and design sessions with 18 families from t…
▽ More
Meals are a central (and messy) part of family life. Previous design framings for mealtime technologies have focused on supporting dietary needs or social and celebratory interactions at the dinner table; however, family meals involve the coordination of many activities and complicated family dynamics. In this paper, we report on findings from interviews and design sessions with 18 families from the Midwestern United States (including both partners/parents and children) to uncover important family differences and tensions that arise around domestic meal experiences. Drawing on feminist theory, we unpack the work of feeding a family as a form of care, drawing attention to the social and emotional complexity of family meals. Critically situating our data within current design narratives, we propose the sensitizing concepts of generative and systemic discontents as a productive way towards troubling the design space of family-food interaction to contend with the struggles that are a part of everyday family meal experiences.
△ Less
Submitted 10 September, 2024;
originally announced September 2024.
-
When Raw Data Prevails: Are Large Language Model Embeddings Effective in Numerical Data Representation for Medical Machine Learning Applications?
Authors:
Yanjun Gao,
Skatje Myers,
Shan Chen,
Dmitriy Dligach,
Timothy A Miller,
Danielle Bitterman,
Matthew Churpek,
Majid Afshar
Abstract:
The introduction of Large Language Models (LLMs) has advanced data representation and analysis, bringing significant progress in their use for medical questions and answering. Despite these advancements, integrating tabular data, especially numerical data pivotal in clinical contexts, into LLM paradigms has not been thoroughly explored. In this study, we examine the effectiveness of vector represe…
▽ More
The introduction of Large Language Models (LLMs) has advanced data representation and analysis, bringing significant progress in their use for medical questions and answering. Despite these advancements, integrating tabular data, especially numerical data pivotal in clinical contexts, into LLM paradigms has not been thoroughly explored. In this study, we examine the effectiveness of vector representations from last hidden states of LLMs for medical diagnostics and prognostics using electronic health record (EHR) data. We compare the performance of these embeddings with that of raw numerical EHR data when used as feature inputs to traditional machine learning (ML) algorithms that excel at tabular data learning, such as eXtreme Gradient Boosting. We focus on instruction-tuned LLMs in a zero-shot setting to represent abnormal physiological data and evaluating their utilities as feature extractors to enhance ML classifiers for predicting diagnoses, length of stay, and mortality. Furthermore, we examine prompt engineering techniques on zero-shot and few-shot LLM embeddings to measure their impact comprehensively. Although findings suggest the raw data features still prevails in medical ML tasks, zero-shot LLM embeddings demonstrate competitive results, suggesting a promising avenue for future research in medical applications.
△ Less
Submitted 19 September, 2024; v1 submitted 14 August, 2024;
originally announced August 2024.
-
PROF: Protected Order Flow in a Profit-Seeking World
Authors:
Kushal Babel,
Nerla Jean-Louis,
Yan Ji,
Ujval Misra,
Mahimna Kelkar,
Kosala Yapa Mudiyanselage,
Andrew Miller,
Ari Juels
Abstract:
Users of decentralized finance (DeFi) applications face significant risks from adversarial actions that manipulate the order of transactions to extract value from users. Such actions -- an adversarial form of what is called maximal-extractable value (MEV) -- impact both individual outcomes and the stability of the DeFi ecosystem. MEV exploitation, moreover, is being institutionalized through an ar…
▽ More
Users of decentralized finance (DeFi) applications face significant risks from adversarial actions that manipulate the order of transactions to extract value from users. Such actions -- an adversarial form of what is called maximal-extractable value (MEV) -- impact both individual outcomes and the stability of the DeFi ecosystem. MEV exploitation, moreover, is being institutionalized through an architectural paradigm known Proposer-Builder Separation (PBS).
This work introduces a system called PROF (PRotected Order Flow) that is designed to limit harmful forms of MEV in existing PBS systems. PROF aims at this goal using two ideas. First, PROF imposes an ordering on a set ("bundle") of privately input transactions and enforces that ordering all the way through to block production -- preventing transaction-order manipulation. Second, PROF creates bundles whose inclusion is profitable to block producers, thereby ensuring that bundles see timely inclusion in blocks.
PROF is backward-compatible, meaning that it works with existing and future PBS designs. PROF is also compatible with any desired algorithm for ordering transactions within a PROF bundle (e.g., first-come, first-serve, fee-based, etc.). It executes efficiently, i.e., with low latency, and requires no additional trust assumptions among PBS entities. We quantitatively and qualitatively analyze incentive structure of PROF, and its utility to users compared with existing solutions. We also report on inclusion likelihood of PROF transactions, and concrete latency numbers through our end-to-end implementation.
△ Less
Submitted 5 August, 2024;
originally announced August 2024.
-
What is my quantum computer good for? Quantum capability learning with physics-aware neural networks
Authors:
Daniel Hothem,
Ashe Miller,
Timothy Proctor
Abstract:
Quantum computers have the potential to revolutionize diverse fields, including quantum chemistry, materials science, and machine learning. However, contemporary quantum computers experience errors that often cause quantum programs run on them to fail. Until quantum computers can reliably execute large quantum programs, stakeholders will need fast and reliable methods for assessing a quantum compu…
▽ More
Quantum computers have the potential to revolutionize diverse fields, including quantum chemistry, materials science, and machine learning. However, contemporary quantum computers experience errors that often cause quantum programs run on them to fail. Until quantum computers can reliably execute large quantum programs, stakeholders will need fast and reliable methods for assessing a quantum computer's capability-i.e., the programs it can run and how well it can run them. Previously, off-the-shelf neural network architectures have been used to model quantum computers' capabilities, but with limited success, because these networks fail to learn the complex quantum physics that determines real quantum computers' errors. We address this shortcoming with a new quantum-physics-aware neural network architecture for learning capability models. Our architecture combines aspects of graph neural networks with efficient approximations to the physics of errors in quantum programs. This approach achieves up to $\sim50\%$ reductions in mean absolute error on both experimental and simulated data, over state-of-the-art models based on convolutional neural networks.
△ Less
Submitted 26 February, 2025; v1 submitted 9 June, 2024;
originally announced June 2024.
-
Reconfiguration of Multisets with Applications to Bin Packing
Authors:
Jeffrey Kam,
Shahin Kamali,
Avery Miller,
Naomi Nishimura
Abstract:
We use the reconfiguration framework to analyze problems that involve the rearrangement of items among groups. In various applications, a group of items could correspond to the files or jobs assigned to a particular machine, and the goal of rearrangement could be improving efficiency or increasing locality.
To cover problems arising in a wide range of application areas, we define the general Rep…
▽ More
We use the reconfiguration framework to analyze problems that involve the rearrangement of items among groups. In various applications, a group of items could correspond to the files or jobs assigned to a particular machine, and the goal of rearrangement could be improving efficiency or increasing locality.
To cover problems arising in a wide range of application areas, we define the general Repacking problem as the rearrangement of multisets of multisets. We present hardness results for the general case and algorithms for various classes of instances that arise in real-life scenarios. By limiting the total size of items in each multiset, our results can be viewed as an offline approach to Bin Packing, in which each bin is represented as a multiset.
In addition to providing the first results on reconfiguration of multisets, our contributions open up several research avenues: the interplay between reconfiguration and online algorithms and parallel algorithms; the use of the tools of linear programming in reconfiguration; and, in the longer term, a focus on extra resources in reconfiguration.
△ Less
Submitted 28 October, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
-
The Artificial Intelligence Ontology: LLM-assisted construction of AI concept hierarchies
Authors:
Marcin P. Joachimiak,
Mark A. Miller,
J. Harry Caufield,
Ryan Ly,
Nomi L. Harris,
Andrew Tritt,
Christopher J. Mungall,
Kristofer E. Bouchard
Abstract:
The Artificial Intelligence Ontology (AIO) is a systematization of artificial intelligence (AI) concepts, methodologies, and their interrelations. Developed via manual curation, with the additional assistance of large language models (LLMs), AIO aims to address the rapidly evolving landscape of AI by providing a comprehensive framework that encompasses both technical and ethical aspects of AI tech…
▽ More
The Artificial Intelligence Ontology (AIO) is a systematization of artificial intelligence (AI) concepts, methodologies, and their interrelations. Developed via manual curation, with the additional assistance of large language models (LLMs), AIO aims to address the rapidly evolving landscape of AI by providing a comprehensive framework that encompasses both technical and ethical aspects of AI technologies. The primary audience for AIO includes AI researchers, developers, and educators seeking standardized terminology and concepts within the AI domain. The ontology is structured around six top-level branches: Networks, Layers, Functions, LLMs, Preprocessing, and Bias, each designed to support the modular composition of AI methods and facilitate a deeper understanding of deep learning architectures and ethical considerations in AI.
AIO's development utilized the Ontology Development Kit (ODK) for its creation and maintenance, with its content being dynamically updated through AI-driven curation support. This approach not only ensures the ontology's relevance amidst the fast-paced advancements in AI but also significantly enhances its utility for researchers, developers, and educators by simplifying the integration of new AI concepts and methodologies.
The ontology's utility is demonstrated through the annotation of AI methods data in a catalog of AI research publications and the integration into the BioPortal ontology resource, highlighting its potential for cross-disciplinary research. The AIO ontology is open source and is available on GitHub (https://github.com/berkeleybop/artificial-intelligence-ontology) and BioPortal (https://bioportal.bioontology.org/ontologies/AIO).
△ Less
Submitted 3 April, 2024;
originally announced April 2024.
-
Polaris: A Safety-focused LLM Constellation Architecture for Healthcare
Authors:
Subhabrata Mukherjee,
Paul Gamble,
Markel Sanz Ausin,
Neel Kant,
Kriti Aggarwal,
Neha Manjunath,
Debajyoti Datta,
Zhengliang Liu,
Jiayuan Ding,
Sophia Busacca,
Cezanne Bianco,
Swapnil Sharma,
Rae Lasko,
Michelle Voisard,
Sanchay Harneja,
Darya Filippova,
Gerry Meixiong,
Kevin Cha,
Amir Youssefi,
Meyhaa Buvanesh,
Howard Weingram,
Sebastian Bierman-Lytle,
Harpreet Singh Mangat,
Kim Parikh,
Saad Godil
, et al. (1 additional authors not shown)
Abstract:
We develop Polaris, the first safety-focused LLM constellation for real-time patient-AI healthcare conversations. Unlike prior LLM works in healthcare focusing on tasks like question answering, our work specifically focuses on long multi-turn voice conversations. Our one-trillion parameter constellation system is composed of several multibillion parameter LLMs as co-operative agents: a stateful pr…
▽ More
We develop Polaris, the first safety-focused LLM constellation for real-time patient-AI healthcare conversations. Unlike prior LLM works in healthcare focusing on tasks like question answering, our work specifically focuses on long multi-turn voice conversations. Our one-trillion parameter constellation system is composed of several multibillion parameter LLMs as co-operative agents: a stateful primary agent that focuses on driving an engaging conversation and several specialist support agents focused on healthcare tasks performed by nurses to increase safety and reduce hallucinations. We develop a sophisticated training protocol for iterative co-training of the agents that optimize for diverse objectives. We train our models on proprietary data, clinical care plans, healthcare regulatory documents, medical manuals, and other medical reasoning documents. We align our models to speak like medical professionals, using organic healthcare conversations and simulated ones between patient actors and experienced nurses. This allows our system to express unique capabilities such as rapport building, trust building, empathy and bedside manner. Finally, we present the first comprehensive clinician evaluation of an LLM system for healthcare. We recruited over 1100 U.S. licensed nurses and over 130 U.S. licensed physicians to perform end-to-end conversational evaluations of our system by posing as patients and rating the system on several measures. We demonstrate Polaris performs on par with human nurses on aggregate across dimensions such as medical safety, clinical readiness, conversational quality, and bedside manner. Additionally, we conduct a challenging task-based evaluation of the individual specialist support agents, where we demonstrate our LLM agents significantly outperform a much larger general-purpose LLM (GPT-4) as well as from its own medium-size class (LLaMA-2 70B).
△ Less
Submitted 20 March, 2024;
originally announced March 2024.
-
Closed-loop Multi-step Planning
Authors:
Giulia Lafratta,
Bernd Porr,
Christopher Chandler,
Alice Miller
Abstract:
Living organisms interact with their surroundings in a closed-loop fashion, where sensory inputs dictate the initiation and termination of behaviours. Even simple animals are able to develop and execute complex plans, which has not yet been replicated in robotics using pure closed-loop input control. We propose a solution to this problem by defining a set of discrete and temporary closed-loop cont…
▽ More
Living organisms interact with their surroundings in a closed-loop fashion, where sensory inputs dictate the initiation and termination of behaviours. Even simple animals are able to develop and execute complex plans, which has not yet been replicated in robotics using pure closed-loop input control. We propose a solution to this problem by defining a set of discrete and temporary closed-loop controllers, called ``Tasks'', each representing a closed-loop behaviour. We further introduce a supervisory module which has an innate understanding of physics and causality, through which it can simulate the execution of Task sequences over time and store the results in a model of the environment. On the basis of this model, plans can be made by chaining temporary closed-loop controllers. Our proposed framework was implemented for a real robot and tested in two scenarios as proof of concept.
△ Less
Submitted 29 January, 2025; v1 submitted 23 February, 2024;
originally announced February 2024.
-
A Causal Framework to Evaluate Racial Bias in Law Enforcement Systems
Authors:
Jessy Xinyi Han,
Andrew Miller,
S. Craig Watkins,
Christopher Winship,
Fotini Christia,
Devavrat Shah
Abstract:
We are interested in developing a data-driven method to evaluate race-induced biases in law enforcement systems. While the recent works have addressed this question in the context of police-civilian interactions using police stop data, they have two key limitations. First, bias can only be properly quantified if true criminality is accounted for in addition to race, but it is absent in prior works…
▽ More
We are interested in developing a data-driven method to evaluate race-induced biases in law enforcement systems. While the recent works have addressed this question in the context of police-civilian interactions using police stop data, they have two key limitations. First, bias can only be properly quantified if true criminality is accounted for in addition to race, but it is absent in prior works. Second, law enforcement systems are multi-stage and hence it is important to isolate the true source of bias within the "causal chain of interactions" rather than simply focusing on the end outcome; this can help guide reforms. In this work, we address these challenges by presenting a multi-stage causal framework incorporating criminality. We provide a theoretical characterization and an associated data-driven method to evaluate (a) the presence of any form of racial bias, and (b) if so, the primary source of such a bias in terms of race and criminality. Our framework identifies three canonical scenarios with distinct characteristics: in settings like (1) airport security, the primary source of observed bias against a race is likely to be bias in law enforcement against innocents of that race; (2) AI-empowered policing, the primary source of observed bias against a race is likely to be bias in law enforcement against criminals of that race; and (3) police-civilian interaction, the primary source of observed bias against a race could be bias in law enforcement against that race or bias from the general public in reporting against the other race. Through an extensive empirical study using police-civilian interaction data and 911 call data, we find an instance of such a counter-intuitive phenomenon: in New Orleans, the observed bias is against the majority race and the likely reason for it is the over-reporting (via 911 calls) of incidents involving the minority race by the general public.
△ Less
Submitted 20 March, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
A Survey of Deep Learning and Foundation Models for Time Series Forecasting
Authors:
John A. Miller,
Mohammed Aldosari,
Farah Saeed,
Nasid Habib Barna,
Subas Rana,
I. Budak Arpinar,
Ninghao Liu
Abstract:
Deep Learning has been successfully applied to many application domains, yet its advantages have been slow to emerge for time series forecasting. For example, in the well-known Makridakis (M) Competitions, hybrids of traditional statistical or machine learning techniques have only recently become the top performers. With the recent architectural advances in deep learning being applied to time seri…
▽ More
Deep Learning has been successfully applied to many application domains, yet its advantages have been slow to emerge for time series forecasting. For example, in the well-known Makridakis (M) Competitions, hybrids of traditional statistical or machine learning techniques have only recently become the top performers. With the recent architectural advances in deep learning being applied to time series forecasting (e.g., encoder-decoders with attention, transformers, and graph neural networks), deep learning has begun to show significant advantages. Still, in the area of pandemic prediction, there remain challenges for deep learning models: the time series is not long enough for effective training, unawareness of accumulated scientific knowledge, and interpretability of the model. To this end, the development of foundation models (large deep learning models with extensive pre-training) allows models to understand patterns and acquire knowledge that can be applied to new related problems before extensive training data becomes available. Furthermore, there is a vast amount of knowledge available that deep learning models can tap into, including Knowledge Graphs and Large Language Models fine-tuned with scientific domain knowledge. There is ongoing research examining how to utilize or inject such knowledge into deep learning models. In this survey, several state-of-the-art modeling techniques are reviewed, and suggestions for further work are provided.
△ Less
Submitted 24 January, 2024;
originally announced January 2024.
-
From low resource information extraction to identifying influential nodes in knowledge graphs
Authors:
Erica Cai,
Olga Simek,
Benjamin A. Miller,
Danielle Sullivan-Pao,
Evan Young,
Christopher L. Smith
Abstract:
We propose a pipeline for identifying important entities from intelligence reports that constructs a knowledge graph, where nodes correspond to entities of fine-grained types (e.g. traffickers) extracted from the text and edges correspond to extracted relations between entities (e.g. cartel membership). The important entities in intelligence reports then map to central nodes in the knowledge graph…
▽ More
We propose a pipeline for identifying important entities from intelligence reports that constructs a knowledge graph, where nodes correspond to entities of fine-grained types (e.g. traffickers) extracted from the text and edges correspond to extracted relations between entities (e.g. cartel membership). The important entities in intelligence reports then map to central nodes in the knowledge graph. We introduce a novel method that extracts fine-grained entities in a few-shot setting (few labeled examples), given limited resources available to label the frequently changing entity types that intelligence analysts are interested in. It outperforms other state-of-the-art methods. Next, we identify challenges facing previous evaluations of zero-shot (no labeled examples) methods for extracting relations, affecting the step of populating edges. Finally, we explore the utility of the pipeline: given the goal of identifying important entities, we evaluate the impact of relation extraction errors on the identification of central nodes in several real and synthetic networks. The impact of these errors varies significantly by graph topology, suggesting that confidence in measurements based on automatically extracted relations should depend on observed network features.
△ Less
Submitted 9 January, 2024;
originally announced January 2024.
-
Evaluating the security of CRYSTALS-Dilithium in the quantum random oracle model
Authors:
Kelsey A. Jackson,
Carl A. Miller,
Daochen Wang
Abstract:
In the wake of recent progress on quantum computing hardware, the National Institute of Standards and Technology (NIST) is standardizing cryptographic protocols that are resistant to attacks by quantum adversaries. The primary digital signature scheme that NIST has chosen is CRYSTALS-Dilithium. The hardness of this scheme is based on the hardness of three computational problems: Module Learning wi…
▽ More
In the wake of recent progress on quantum computing hardware, the National Institute of Standards and Technology (NIST) is standardizing cryptographic protocols that are resistant to attacks by quantum adversaries. The primary digital signature scheme that NIST has chosen is CRYSTALS-Dilithium. The hardness of this scheme is based on the hardness of three computational problems: Module Learning with Errors (MLWE), Module Short Integer Solution (MSIS), and SelfTargetMSIS. MLWE and MSIS have been well-studied and are widely believed to be secure. However, SelfTargetMSIS is novel and, though classically as hard as MSIS, its quantum hardness is unclear. In this paper, we provide the first proof of the hardness of SelfTargetMSIS via a reduction from MLWE in the Quantum Random Oracle Model (QROM). Our proof uses recently developed techniques in quantum reprogramming and rewinding. A central part of our approach is a proof that a certain hash function, derived from the MSIS problem, is collapsing. From this approach, we deduce a new security proof for Dilithium under appropriate parameter settings. Compared to the previous work by Kiltz, Lyubashevsky, and Schaffner (EUROCRYPT 2018) that gave the only other rigorous security proof for a variant of Dilithium, our proof has the advantage of being applicable under the condition q = 1 mod 2n, where q denotes the modulus and n the dimension of the underlying algebraic ring. This condition is part of the original Dilithium proposal and is crucial for the efficient implementation of the scheme. We provide new secure parameter sets for Dilithium under the condition q = 1 mod 2n, finding that our public key size and signature size are about 2.9 times and 1.3 times larger, respectively, than those proposed by Kiltz et al. at the same security level.
△ Less
Submitted 7 March, 2024; v1 submitted 27 December, 2023;
originally announced December 2023.
-
Large-scale Training of Foundation Models for Wearable Biosignals
Authors:
Salar Abbaspourazad,
Oussama Elachqar,
Andrew C. Miller,
Saba Emrani,
Udhyakumar Nallasamy,
Ian Shapiro
Abstract:
Tracking biosignals is crucial for monitoring wellness and preempting the development of severe medical conditions. Today, wearable devices can conveniently record various biosignals, creating the opportunity to monitor health status without disruption to one's daily routine. Despite widespread use of wearable devices and existing digital biomarkers, the absence of curated data with annotated medi…
▽ More
Tracking biosignals is crucial for monitoring wellness and preempting the development of severe medical conditions. Today, wearable devices can conveniently record various biosignals, creating the opportunity to monitor health status without disruption to one's daily routine. Despite widespread use of wearable devices and existing digital biomarkers, the absence of curated data with annotated medical labels hinders the development of new biomarkers to measure common health conditions. In fact, medical datasets are usually small in comparison to other domains, which is an obstacle for developing neural network models for biosignals. To address this challenge, we have employed self-supervised learning using the unlabeled sensor data collected under informed consent from the large longitudinal Apple Heart and Movement Study (AHMS) to train foundation models for two common biosignals: photoplethysmography (PPG) and electrocardiogram (ECG) recorded on Apple Watch. We curated PPG and ECG datasets from AHMS that include data from ~141K participants spanning ~3 years. Our self-supervised learning framework includes participant level positive pair selection, stochastic augmentation module and a regularized contrastive loss optimized with momentum training, and generalizes well to both PPG and ECG modalities. We show that the pre-trained foundation models readily encode information regarding participants' demographics and health conditions. To the best of our knowledge, this is the first study that builds foundation models using large-scale PPG and ECG data collected via wearable consumer devices $\unicode{x2013}$ prior works have commonly used smaller-size datasets collected in clinical and experimental settings. We believe PPG and ECG foundation models can enhance future wearable devices by reducing the reliance on labeled data and hold the potential to help the users improve their health.
△ Less
Submitted 6 March, 2024; v1 submitted 8 December, 2023;
originally announced December 2023.
-
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Authors:
Kristen Grauman,
Andrew Westbury,
Lorenzo Torresani,
Kris Kitani,
Jitendra Malik,
Triantafyllos Afouras,
Kumar Ashutosh,
Vijay Baiyya,
Siddhant Bansal,
Bikram Boote,
Eugene Byrne,
Zach Chavis,
Joya Chen,
Feng Cheng,
Fu-Jen Chu,
Sean Crane,
Avijit Dasgupta,
Jing Dong,
Maria Escobar,
Cristhian Forigua,
Abrham Gebreselasie,
Sanjay Haresh,
Jing Huang,
Md Mohaiminul Islam,
Suyog Jain
, et al. (76 additional authors not shown)
Abstract:
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from…
▽ More
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from 1 to 42 minutes each and 1,286 hours of video combined. The multimodal nature of the dataset is unprecedented: the video is accompanied by multichannel audio, eye gaze, 3D point clouds, camera poses, IMU, and multiple paired language descriptions -- including a novel "expert commentary" done by coaches and teachers and tailored to the skilled-activity domain. To push the frontier of first-person video understanding of skilled human activity, we also present a suite of benchmark tasks and their annotations, including fine-grained activity understanding, proficiency estimation, cross-view translation, and 3D hand/body pose. All resources are open sourced to fuel new research in the community. Project page: http://ego-exo4d-data.org/
△ Less
Submitted 25 September, 2024; v1 submitted 30 November, 2023;
originally announced November 2023.
-
Fast Deterministic Rendezvous in Labeled Lines
Authors:
Avery Miller,
Andrzej Pelc
Abstract:
Two mobile agents, starting from different nodes of a network modeled as a graph, and woken up at possibly different times, have to meet at the same node. This problem is known as rendezvous. We consider deterministic distributed rendezvous in the infinite path. Each node has a distinct label which is a positive integer. The time of rendezvous is the number of rounds until meeting, counted from th…
▽ More
Two mobile agents, starting from different nodes of a network modeled as a graph, and woken up at possibly different times, have to meet at the same node. This problem is known as rendezvous. We consider deterministic distributed rendezvous in the infinite path. Each node has a distinct label which is a positive integer. The time of rendezvous is the number of rounds until meeting, counted from the starting round of the earlier agent. We consider three scenarios. In the first scenario, each agent knows its position in the line, i.e., each of them knows its initial distance from the smallest-labeled node, on which side of this node it is located, and the direction towards it. For this scenario, we give a rendezvous algorithm working in time $O(D)$, where $D$ is the initial distance between the agents. This complexity is clearly optimal. In the second scenario, each agent initially knows only the label of its starting node and the initial distance $D$ between the agents. In this scenario, we give a rendezvous algorithm working in time $O(D\log^*\ell)$, where $\ell$ is the larger label of the starting nodes. We prove a matching lower bound $Ω(D\log^*\ell)$. Finally, in the most general scenario, where each agent initially knows only the label of its starting node, we give a rendezvous algorithm working in time $O(D^2(\log^*\ell)^3)$, which is at most cubic in the lower bound. All our results remain valid (with small changes) for arbitrary finite paths and for cycles. Our algorithms are drastically better than approaches that use graph exploration, whose running times depend on the graph's size or diameter. Our main methodological tool, and the main novelty of the paper, is a two way reduction: from fast colouring of the infinite labeled path using a constant number of colours in the LOCAL model to fast rendezvous in this path, and vice-versa.
△ Less
Submitted 21 November, 2023;
originally announced November 2023.
-
Model Checking for Closed-Loop Robot Reactive Planning
Authors:
Christopher Chandler,
Bernd Porr,
Alice Miller,
Giulia Lafratta
Abstract:
In this paper, we show how model checking can be used to create multi-step plans for a differential drive wheeled robot so that it can avoid immediate danger. Using a small, purpose built model checking algorithm in situ we generate plans in real-time in a way that reflects the egocentric reactive response of simple biological agents. Our approach is based on chaining temporary control systems whi…
▽ More
In this paper, we show how model checking can be used to create multi-step plans for a differential drive wheeled robot so that it can avoid immediate danger. Using a small, purpose built model checking algorithm in situ we generate plans in real-time in a way that reflects the egocentric reactive response of simple biological agents. Our approach is based on chaining temporary control systems which are spawned to eliminate disturbances in the local environment that disrupt an autonomous agent from its preferred action (or resting state). The method involves a novel discretization of 2D LiDAR data which is sensitive to bounded stochastic variations in the immediate environment. We operationalise multi-step planning using invariant checking by forward depth-first search, using a cul-de-sac scenario as a first test case. Our results demonstrate that model checking can be used to plan efficient trajectories for local obstacle avoidance, improving on the performance of a reactive agent which can only plan one step. We achieve this in near real-time using no pre-computed data. While our method has limitations, we believe our approach shows promise as an avenue for the development of safe, reliable and transparent trajectory planning in the context of autonomous vehicles.
△ Less
Submitted 16 November, 2023;
originally announced November 2023.
-
GPT4All: An Ecosystem of Open Source Compressed Language Models
Authors:
Yuvanesh Anand,
Zach Nussbaum,
Adam Treat,
Aaron Miller,
Richard Guo,
Ben Schmidt,
GPT4All Community,
Brandon Duderstadt,
Andriy Mulyar
Abstract:
Large language models (LLMs) have recently achieved human-level performance on a range of professional and academic benchmarks. The accessibility of these models has lagged behind their performance. State-of-the-art LLMs require costly infrastructure; are only accessible via rate-limited, geo-locked, and censored web interfaces; and lack publicly available code and technical reports. In this paper…
▽ More
Large language models (LLMs) have recently achieved human-level performance on a range of professional and academic benchmarks. The accessibility of these models has lagged behind their performance. State-of-the-art LLMs require costly infrastructure; are only accessible via rate-limited, geo-locked, and censored web interfaces; and lack publicly available code and technical reports. In this paper, we tell the story of GPT4All, a popular open source repository that aims to democratize access to LLMs. We outline the technical details of the original GPT4All model family, as well as the evolution of the GPT4All project from a single model into a fully fledged open source ecosystem. It is our hope that this paper acts as both a technical overview of the original GPT4All models as well as a case study on the subsequent growth of the GPT4All open source ecosystem.
△ Less
Submitted 6 November, 2023;
originally announced November 2023.
-
GRASP: Accelerating Shortest Path Attacks via Graph Attention
Authors:
Zohair Shafi,
Benjamin A. Miller,
Ayan Chatterjee,
Tina Eliassi-Rad,
Rajmonda S. Caceres
Abstract:
Recent advances in machine learning (ML) have shown promise in aiding and accelerating classical combinatorial optimization algorithms. ML-based speed ups that aim to learn in an end to end manner (i.e., directly output the solution) tend to trade off run time with solution quality. Therefore, solutions that are able to accelerate existing solvers while maintaining their performance guarantees, ar…
▽ More
Recent advances in machine learning (ML) have shown promise in aiding and accelerating classical combinatorial optimization algorithms. ML-based speed ups that aim to learn in an end to end manner (i.e., directly output the solution) tend to trade off run time with solution quality. Therefore, solutions that are able to accelerate existing solvers while maintaining their performance guarantees, are of great interest. We consider an APX-hard problem, where an adversary aims to attack shortest paths in a graph by removing the minimum number of edges. We propose the GRASP algorithm: Graph Attention Accelerated Shortest Path Attack, an ML aided optimization algorithm that achieves run times up to 10x faster, while maintaining the quality of solution generated. GRASP uses a graph attention network to identify a smaller subgraph containing the combinatorial solution, thus effectively reducing the input problem size. Additionally, we demonstrate how careful representation of the input graph, including node features that correlate well with the optimization task, can highlight important structure in the optimization solution.
△ Less
Submitted 23 October, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Graph-SCP: Accelerating Set Cover Problems with Graph Neural Networks
Authors:
Zohair Shafi,
Benjamin A. Miller,
Tina Eliassi-Rad,
Rajmonda S. Caceres
Abstract:
Machine learning (ML) approaches are increasingly being used to accelerate combinatorial optimization (CO) problems. We investigate the Set Cover Problem (SCP) and propose Graph-SCP, a graph neural network method that augments existing optimization solvers by learning to identify a much smaller sub-problem that contains the solution space. Graph-SCP uses both supervised learning from prior solved…
▽ More
Machine learning (ML) approaches are increasingly being used to accelerate combinatorial optimization (CO) problems. We investigate the Set Cover Problem (SCP) and propose Graph-SCP, a graph neural network method that augments existing optimization solvers by learning to identify a much smaller sub-problem that contains the solution space. Graph-SCP uses both supervised learning from prior solved instances and unsupervised learning aimed at minimizing the SCP objective. We evaluate the performance of Graph-SCP on synthetically weighted and unweighted SCP instances with diverse problem characteristics and complexities, and on instances from the OR Library, a canonical benchmark for SCP. We show that Graph-SCP reduces the problem size by 60-80% and achieves runtime speedups of up to 10x on average when compared to Gurobi (a state-of-the-art commercial solver), while maintaining solution quality. This is in contrast to fast greedy solutions that significantly compromise solution quality to achieve guaranteed polynomial runtime. We showcase Graph-SCP's ability to generalize to larger problem sizes, training on SCP instances with up to 3,000 subsets and testing on SCP instances with up to 10,000 subsets.
△ Less
Submitted 26 August, 2024; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Decoding the Alphabet Soup of Degrees in the United States Postsecondary Education System Through Hybrid Method: Database and Text Mining
Authors:
Sahar Voghoei,
James Byars,
John A Miller,
Khaled Rasheed,
Hamid A Arabnia
Abstract:
This paper proposes a model to predict the levels (e.g., Bachelor, Master, etc.) of postsecondary degree awards that have been ambiguously expressed in the student tracking reports of the National Student Clearinghouse (NSC). The model will be the hybrid of two modules. The first module interprets the relevant abbreviatory elements embedded in NSC reports by referring to a comprehensive database t…
▽ More
This paper proposes a model to predict the levels (e.g., Bachelor, Master, etc.) of postsecondary degree awards that have been ambiguously expressed in the student tracking reports of the National Student Clearinghouse (NSC). The model will be the hybrid of two modules. The first module interprets the relevant abbreviatory elements embedded in NSC reports by referring to a comprehensive database that we have made of nearly 950 abbreviations for degree titles used by American postsecondary educators. The second module is a combination of feature classification and text mining modeled with CNN-BiLSTM, which is preceded by several steps of heavy pre-processing. The model proposed in this paper was trained with four multi-label datasets of different grades of resolution and returned 97.83\% accuracy with the most sophisticated dataset. Such a thorough classification of degree levels will provide insights into the modeling patterns of student success and mobility. To date, such a classification strategy has not been attempted except using manual methods and simple text parsing logic.
△ Less
Submitted 6 September, 2023;
originally announced September 2023.
-
Considerations for health care institutions training large language models on electronic health records
Authors:
Weipeng Zhou,
Danielle Bitterman,
Majid Afshar,
Timothy A. Miller
Abstract:
Large language models (LLMs) like ChatGPT have excited scientists across fields; in medicine, one source of excitement is the potential applications of LLMs trained on electronic health record (EHR) data. But there are tough questions we must first answer if health care institutions are interested in having LLMs trained on their own data; should they train an LLM from scratch or fine-tune it from…
▽ More
Large language models (LLMs) like ChatGPT have excited scientists across fields; in medicine, one source of excitement is the potential applications of LLMs trained on electronic health record (EHR) data. But there are tough questions we must first answer if health care institutions are interested in having LLMs trained on their own data; should they train an LLM from scratch or fine-tune it from an open-source model? For healthcare institutions with a predefined budget, what are the biggest LLMs they can afford? In this study, we take steps towards answering these questions with an analysis on dataset sizes, model sizes, and costs for LLM training using EHR data. This analysis provides a framework for thinking about these questions in terms of data scale, compute scale, and training budgets.
△ Less
Submitted 23 August, 2023;
originally announced September 2023.
-
Cops and Robbers on 1-Planar Graphs
Authors:
Stephane Durocher,
Shahin Kamali,
Myroslav Kryven,
Fengyi Liu,
Amirhossein Mashghdoust,
Avery Miller,
Pouria Zamani Nezhad,
Ikaro Penha Costa,
Timothy Zapp
Abstract:
Cops and Robbers is a well-studied pursuit-evasion game in which a set of cops seeks to catch a robber in a graph G, where cops and robber move along edges of G. The cop number of G is the minimum number of cops that is sufficient to catch the robber. Every planar graph has cop number at most three, and there are planar graphs for which three cops are necessary [Aigner and Fromme, DAM 1984]. We st…
▽ More
Cops and Robbers is a well-studied pursuit-evasion game in which a set of cops seeks to catch a robber in a graph G, where cops and robber move along edges of G. The cop number of G is the minimum number of cops that is sufficient to catch the robber. Every planar graph has cop number at most three, and there are planar graphs for which three cops are necessary [Aigner and Fromme, DAM 1984]. We study the problem for beyond-planar graphs, that is, graphs that can be drawn in the plane with few crossings. In particular, we focus on 1-planar graphs, that is, graphs that can be drawn in the plane with at most one crossing per edge. In contrast to planar graphs, we show that some 1-planar graphs have unbounded cop number. Meanwhile, for maximal 1-planar graphs, we prove that three cops are always sufficient and sometimes necessary. In addition, we characterize outer 1-planar graphs with respect to their cop number.
△ Less
Submitted 6 September, 2023; v1 submitted 2 September, 2023;
originally announced September 2023.
-
Towards Exascale Computation for Turbomachinery Flows
Authors:
Yuhang Fu,
Weiqi Shen,
Jiahuan Cui,
Yao Zheng,
Guangwen Yang,
Zhao Liu,
Jifa Zhang,
Tingwei Ji,
Fangfang Xie,
Xiaojing Lv,
Hanyue Liu,
Xu Liu,
Xiyang Liu,
Xiaoyu Song,
Guocheng Tao,
Yan Yan,
Paul Tucker,
Steven A. E. Miller,
Shirui Luo,
Seid Koric,
Weimin Zheng
Abstract:
A state-of-the-art large eddy simulation code has been developed to solve compressible flows in turbomachinery. The code has been engineered with a high degree of scalability, enabling it to effectively leverage the many-core architecture of the new Sunway system. A consistent performance of 115.8 DP-PFLOPs has been achieved on a high-pressure turbine cascade consisting of over 1.69 billion mesh e…
▽ More
A state-of-the-art large eddy simulation code has been developed to solve compressible flows in turbomachinery. The code has been engineered with a high degree of scalability, enabling it to effectively leverage the many-core architecture of the new Sunway system. A consistent performance of 115.8 DP-PFLOPs has been achieved on a high-pressure turbine cascade consisting of over 1.69 billion mesh elements and 865 billion Degree of Freedoms (DOFs). By leveraging a high-order unstructured solver and its portability to large heterogeneous parallel systems, we have progressed towards solving the grand challenge problem outlined by NASA, which involves a time-dependent simulation of a complete engine, incorporating all the aerodynamic and heat transfer components.
△ Less
Submitted 29 December, 2023; v1 submitted 12 August, 2023;
originally announced August 2023.
-
Complex Network Effects on the Robustness of Graph Convolutional Networks
Authors:
Benjamin A. Miller,
Kevin Chan,
Tina Eliassi-Rad
Abstract:
Vertex classification -- the problem of identifying the class labels of nodes in a graph -- has applicability in a wide variety of domains. Examples include classifying subject areas of papers in citation networks or roles of machines in a computer network. Vertex classification using graph convolutional networks is susceptible to targeted poisoning attacks, in which both graph structure and node…
▽ More
Vertex classification -- the problem of identifying the class labels of nodes in a graph -- has applicability in a wide variety of domains. Examples include classifying subject areas of papers in citation networks or roles of machines in a computer network. Vertex classification using graph convolutional networks is susceptible to targeted poisoning attacks, in which both graph structure and node attributes can be changed in an attempt to misclassify a target node. This vulnerability decreases users' confidence in the learning method and can prevent adoption in high-stakes contexts. Defenses have also been proposed, focused on filtering edges before creating the model or aggregating information from neighbors more robustly. This paper considers an alternative: we leverage network characteristics in the training data selection process to improve robustness of vertex classifiers.
We propose two alternative methods of selecting training data: (1) to select the highest-degree nodes and (2) to iteratively select the node with the most neighbors minimally connected to the training set. In the datasets on which the original attack was demonstrated, we show that changing the training set can make the network much harder to attack. To maintain a given probability of attack success, the adversary must use far more perturbations; often a factor of 2--4 over the random training baseline. These training set selection methods often work in conjunction with the best recently published defenses to provide even greater robustness. While increasing the amount of randomly selected training data sometimes results in a more robust classifier, the proposed methods increase robustness substantially more. We also run a simulation study in which we demonstrate conditions under which each of the two methods outperforms the other, controlling for the graph topology, homophily of the labels, and node attributes.
△ Less
Submitted 10 August, 2023;
originally announced August 2023.
-
Using Overlapping Methods to Counter Adversaries in Community Detection
Authors:
Benjamin A. Miller,
Kevin Chan,
Tina Eliassi-Rad
Abstract:
When dealing with large graphs, community detection is a useful data triage tool that can identify subsets of the network that a data analyst should investigate. In an adversarial scenario, the graph may be manipulated to avoid scrutiny of certain nodes by the analyst. Robustness to such behavior is an important consideration for data analysts in high-stakes scenarios such as cyber defense and cou…
▽ More
When dealing with large graphs, community detection is a useful data triage tool that can identify subsets of the network that a data analyst should investigate. In an adversarial scenario, the graph may be manipulated to avoid scrutiny of certain nodes by the analyst. Robustness to such behavior is an important consideration for data analysts in high-stakes scenarios such as cyber defense and counterterrorism. In this paper, we evaluate the use of overlapping community detection methods in the presence of adversarial attacks aimed at lowering the priority of a specific vertex. We formulate the data analyst's choice as a Stackelberg game in which the analyst chooses a community detection method and the attacker chooses an attack strategy in response. Applying various attacks from the literature to seven real network datasets, we find that, when the attacker has a sufficient budget, overlapping community detection methods outperform non-overlapping methods, often overwhelmingly so. This is the case when the attacker can only add edges that connect to the target and when the capability is added to add edges between neighbors of the target. We also analyze the tradeoff between robustness in the presence of an attack and performance when there is no attack. Our extensible analytic framework enables network data analysts to take these considerations into account and incorporate new attacks and community detection methods as they are developed.
△ Less
Submitted 6 August, 2023;
originally announced August 2023.
-
Simulation-based Inference for Cardiovascular Models
Authors:
Antoine Wehenkel,
Laura Manduchi,
Jens Behrmann,
Luca Pegolotti,
Andrew C. Miller,
Guillermo Sapiro,
Ozan Sener,
Marco Cuturi,
Jörn-Henrik Jacobsen
Abstract:
Over the past decades, hemodynamics simulators have steadily evolved and have become tools of choice for studying cardiovascular systems in-silico. While such tools are routinely used to simulate whole-body hemodynamics from physiological parameters, solving the corresponding inverse problem of mapping waveforms back to plausible physiological parameters remains both promising and challenging. Mot…
▽ More
Over the past decades, hemodynamics simulators have steadily evolved and have become tools of choice for studying cardiovascular systems in-silico. While such tools are routinely used to simulate whole-body hemodynamics from physiological parameters, solving the corresponding inverse problem of mapping waveforms back to plausible physiological parameters remains both promising and challenging. Motivated by advances in simulation-based inference (SBI), we cast this inverse problem as statistical inference. In contrast to alternative approaches, SBI provides \textit{posterior distributions} for the parameters of interest, providing a \textit{multi-dimensional} representation of uncertainty for \textit{individual} measurements. We showcase this ability by performing an in-silico uncertainty analysis of five biomarkers of clinical interest comparing several measurement modalities. Beyond the corroboration of known facts, such as the feasibility of estimating heart rate, our study highlights the potential of estimating new biomarkers from standard-of-care measurements. SBI reveals practically relevant findings that cannot be captured by standard sensitivity analyses, such as the existence of sub-populations for which parameter estimation exhibits distinct uncertainty regimes. Finally, we study the gap between in-vivo and in-silico with the MIMIC-III waveform database and critically discuss how cardiovascular simulations can inform real-world data analysis.
△ Less
Submitted 30 December, 2024; v1 submitted 25 July, 2023;
originally announced July 2023.
-
Defense Against Shortest Path Attacks
Authors:
Benjamin A. Miller,
Zohair Shafi,
Wheeler Ruml,
Yevgeniy Vorobeychik,
Tina Eliassi-Rad,
Scott Alfeld
Abstract:
Identifying shortest paths between nodes in a network is an important task in many applications. Recent work has shown that a malicious actor can manipulate a graph to make traffic between two nodes of interest follow their target path. In this paper, we develop a defense against such attacks by modifying the edge weights that users observe. The defender must balance inhibiting the attacker agains…
▽ More
Identifying shortest paths between nodes in a network is an important task in many applications. Recent work has shown that a malicious actor can manipulate a graph to make traffic between two nodes of interest follow their target path. In this paper, we develop a defense against such attacks by modifying the edge weights that users observe. The defender must balance inhibiting the attacker against any negative effects on benign users. Specifically, the defender's goals are: (a) recommend the shortest paths to users, (b) make the lengths of the shortest paths in the published graph close to those of the same paths in the true graph, and (c) minimize the probability of an attack. We formulate the defense as a Stackelberg game in which the defender is the leader and the attacker is the follower. We also consider a zero-sum version of the game in which the defender's goal is to minimize cost while achieving the minimum possible attack probability. We show that the defense problem is NP-hard and propose heuristic solutions for both the zero-sum and non-zero-sum settings. By relaxing some constraints of the original problem, we formulate a linear program for local optimization around a feasible point. We present defense results with both synthetic and real networks and show that our methods often reach the lower bound of the defender's cost.
△ Less
Submitted 30 April, 2025; v1 submitted 30 May, 2023;
originally announced May 2023.
-
Reinforcement Learning for Legged Robots: Motion Imitation from Model-Based Optimal Control
Authors:
AJ Miller,
Shamel Fahmi,
Matthew Chignoli,
Sangbae Kim
Abstract:
We propose MIMOC: Motion Imitation from Model-Based Optimal Control. MIMOC is a Reinforcement Learning (RL) controller that learns agile locomotion by imitating reference trajectories from model-based optimal control. MIMOC mitigates challenges faced by other motion imitation RL approaches because the references are dynamically consistent, require no motion retargeting, and include torque referenc…
▽ More
We propose MIMOC: Motion Imitation from Model-Based Optimal Control. MIMOC is a Reinforcement Learning (RL) controller that learns agile locomotion by imitating reference trajectories from model-based optimal control. MIMOC mitigates challenges faced by other motion imitation RL approaches because the references are dynamically consistent, require no motion retargeting, and include torque references. Hence, MIMOC does not require fine-tuning. MIMOC is also less sensitive to modeling and state estimation inaccuracies than model-based controllers. We validate MIMOC on the Mini-Cheetah in outdoor environments over a wide variety of challenging terrain, and on the MIT Humanoid in simulation. We show cases where MIMOC outperforms model-based optimal controllers, and show that imitating torque references improves the policy's performance.
△ Less
Submitted 18 May, 2023;
originally announced May 2023.
-
Scalable Low-latency Optical Phase Sensor Array
Authors:
Zhanghao Sun,
Sunil Pai,
Carson Valdez,
Maziyar Milanizadeh,
Andrea Melloni,
Francesco Morichetti,
David A. B. Miller,
Olav Solgaard
Abstract:
Optical phase measurement is critical for many applications and traditional approaches often suffer from mechanical instability, temporal latency, and computational complexity. In this paper, we describe compact phase sensor arrays based on integrated photonics, which enable accurate and scalable reference-free phase sensing in a few measurement steps. This is achieved by connecting multiple two-p…
▽ More
Optical phase measurement is critical for many applications and traditional approaches often suffer from mechanical instability, temporal latency, and computational complexity. In this paper, we describe compact phase sensor arrays based on integrated photonics, which enable accurate and scalable reference-free phase sensing in a few measurement steps. This is achieved by connecting multiple two-port phase sensors into a graph to measure relative phases between neighboring and distant spatial locations. We propose an efficient post-processing algorithm, as well as circuit design rules to reduce random and biased error accumulations. We demonstrate the effectiveness of our system in both simulations and experiments with photonic integrated circuits. The proposed system measures the optical phase directly without the need for external references or spatial light modulators, thus providing significant benefits for applications including microscope imaging and optical phased arrays.
△ Less
Submitted 3 May, 2023;
originally announced May 2023.