-
LSM-2: Learning from Incomplete Wearable Sensor Data
Authors:
Maxwell A. Xu,
Girish Narayanswamy,
Kumar Ayush,
Dimitris Spathis,
Shun Liao,
Shyam A. Tailor,
Ahmed Metwally,
A. Ali Heydari,
Yuwei Zhang,
Jake Garrison,
Samy Abdel-Ghaffar,
Xuhai Xu,
Ken Gu,
Jacob Sunshine,
Ming-Zher Poh,
Yun Liu,
Tim Althoff,
Shrikanth Narayanan,
Pushmeet Kohli,
Mark Malhotra,
Shwetak Patel,
Yuzhe Yang,
James M. Rehg,
Xin Liu,
Daniel McDuff
Abstract:
Foundation models, a cornerstone of recent advancements in machine learning, have predominantly thrived on complete and well-structured data. Wearable sensor data frequently suffers from significant missingness, posing a substantial challenge for self-supervised learning (SSL) models that typically assume complete data inputs. This paper introduces the second generation of Large Sensor Model (LSM-…
▽ More
Foundation models, a cornerstone of recent advancements in machine learning, have predominantly thrived on complete and well-structured data. Wearable sensor data frequently suffers from significant missingness, posing a substantial challenge for self-supervised learning (SSL) models that typically assume complete data inputs. This paper introduces the second generation of Large Sensor Model (LSM-2) with Adaptive and Inherited Masking (AIM), a novel SSL approach that learns robust representations directly from incomplete data without requiring explicit imputation. AIM's core novelty lies in its use of learnable mask tokens to model both existing ("inherited") and artificially introduced missingness, enabling it to robustly handle fragmented real-world data during inference. Pre-trained on an extensive dataset of 40M hours of day-long multimodal sensor data, our LSM-2 with AIM achieves the best performance across a diverse range of tasks, including classification, regression and generative modeling. Furthermore, LSM-2 with AIM exhibits superior scaling performance, and critically, maintains high performance even under targeted missingness scenarios, reflecting clinically coherent patterns, such as the diagnostic value of nighttime biosignals for hypertension prediction. This makes AIM a more reliable choice for real-world wearable data applications.
△ Less
Submitted 5 June, 2025;
originally announced June 2025.
-
Scaling Wearable Foundation Models
Authors:
Girish Narayanswamy,
Xin Liu,
Kumar Ayush,
Yuzhe Yang,
Xuhai Xu,
Shun Liao,
Jake Garrison,
Shyam Tailor,
Jake Sunshine,
Yun Liu,
Tim Althoff,
Shrikanth Narayanan,
Pushmeet Kohli,
Jiening Zhan,
Mark Malhotra,
Shwetak Patel,
Samy Abdel-Ghaffar,
Daniel McDuff
Abstract:
Wearable sensors have become ubiquitous thanks to a variety of health tracking features. The resulting continuous and longitudinal measurements from everyday life generate large volumes of data; however, making sense of these observations for scientific and actionable insights is non-trivial. Inspired by the empirical success of generative modeling, where large neural networks learn powerful repre…
▽ More
Wearable sensors have become ubiquitous thanks to a variety of health tracking features. The resulting continuous and longitudinal measurements from everyday life generate large volumes of data; however, making sense of these observations for scientific and actionable insights is non-trivial. Inspired by the empirical success of generative modeling, where large neural networks learn powerful representations from vast amounts of text, image, video, or audio data, we investigate the scaling properties of sensor foundation models across compute, data, and model size. Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, electrodermal activity, accelerometer, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM, a multimodal foundation model built on the largest wearable-signals dataset with the most extensive range of sensor modalities to date. Our results establish the scaling laws of LSM for tasks such as imputation, interpolation and extrapolation, both across time and sensor modalities. Moreover, we highlight how LSM enables sample-efficient downstream learning for tasks like exercise and activity recognition.
△ Less
Submitted 17 October, 2024;
originally announced October 2024.
-
What Are the Odds? Language Models Are Capable of Probabilistic Reasoning
Authors:
Akshay Paruchuri,
Jake Garrison,
Shun Liao,
John Hernandez,
Jacob Sunshine,
Tim Althoff,
Xin Liu,
Daniel McDuff
Abstract:
Language models (LM) are capable of remarkably complex linguistic tasks; however, numerical reasoning is an area in which they frequently struggle. An important but rarely evaluated form of reasoning is understanding probability distributions. In this paper, we focus on evaluating the probabilistic reasoning capabilities of LMs using idealized and real-world statistical distributions. We perform a…
▽ More
Language models (LM) are capable of remarkably complex linguistic tasks; however, numerical reasoning is an area in which they frequently struggle. An important but rarely evaluated form of reasoning is understanding probability distributions. In this paper, we focus on evaluating the probabilistic reasoning capabilities of LMs using idealized and real-world statistical distributions. We perform a systematic evaluation of state-of-the-art LMs on three tasks: estimating percentiles, drawing samples, and calculating probabilities. We evaluate three ways to provide context to LMs 1) anchoring examples from within a distribution or family of distributions, 2) real-world context, 3) summary statistics on which to base a Normal approximation. Models can make inferences about distributions, and can be further aided by the incorporation of real-world context, example shots and simplified assumptions, even if these assumptions are incorrect or misspecified. To conduct this work, we developed a comprehensive benchmark distribution dataset with associated question-answer pairs that we have released publicly.
△ Less
Submitted 30 September, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
Transforming Wearable Data into Health Insights using Large Language Model Agents
Authors:
Mike A. Merrill,
Akshay Paruchuri,
Naghmeh Rezaei,
Geza Kovacs,
Javier Perez,
Yun Liu,
Erik Schenck,
Nova Hammerquist,
Jake Sunshine,
Shyam Tailor,
Kumar Ayush,
Hao-Wei Su,
Qian He,
Cory Y. McLean,
Mark Malhotra,
Shwetak Patel,
Jiening Zhan,
Tim Althoff,
Daniel McDuff,
Xin Liu
Abstract:
Despite the proliferation of wearable health trackers and the importance of sleep and exercise to health, deriving actionable personalized insights from wearable data remains a challenge because doing so requires non-trivial open-ended analysis of these data. The recent rise of large language model (LLM) agents, which can use tools to reason about and interact with the world, presents a promising…
▽ More
Despite the proliferation of wearable health trackers and the importance of sleep and exercise to health, deriving actionable personalized insights from wearable data remains a challenge because doing so requires non-trivial open-ended analysis of these data. The recent rise of large language model (LLM) agents, which can use tools to reason about and interact with the world, presents a promising opportunity to enable such personalized analysis at scale. Yet, the application of LLM agents in analyzing personal health is still largely untapped. In this paper, we introduce the Personal Health Insights Agent (PHIA), an agent system that leverages state-of-the-art code generation and information retrieval tools to analyze and interpret behavioral health data from wearables. We curate two benchmark question-answering datasets of over 4000 health insights questions. Based on 650 hours of human and expert evaluation we find that PHIA can accurately address over 84% of factual numerical questions and more than 83% of crowd-sourced open-ended questions. This work has implications for advancing behavioral health across the population, potentially enabling individuals to interpret their own wearable data, and paving the way for a new era of accessible, personalized wellness regimens that are informed by data-driven insights.
△ Less
Submitted 11 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
A Study of Undefined Behavior Across Foreign Function Boundaries in Rust Libraries
Authors:
Ian McCormack,
Joshua Sunshine,
Jonathan Aldrich
Abstract:
Developers rely on the static safety guarantees of the Rust programming language to write secure and performant applications. However, Rust is frequently used to interoperate with other languages which allow design patterns that conflict with Rust's evolving aliasing models. Miri is currently the only dynamic analysis tool that can validate applications against these models, but it does not suppor…
▽ More
Developers rely on the static safety guarantees of the Rust programming language to write secure and performant applications. However, Rust is frequently used to interoperate with other languages which allow design patterns that conflict with Rust's evolving aliasing models. Miri is currently the only dynamic analysis tool that can validate applications against these models, but it does not support finding bugs in foreign functions, indicating that there may be a critical correctness gap across the Rust ecosystem. We conducted a large-scale evaluation of Rust libraries that call foreign functions to determine whether Miri's dynamic analyses remain useful in this context. We used Miri and an LLVM interpreter to jointly execute applications that call foreign functions, where we found 46 instances of undefined or undesired behavior in 37 libraries. Three bugs were found in libraries that had more than 10,000 daily downloads on average during our observation period, and one was found in a library maintained by the Rust Project. Many of these bugs were violations of Rust's aliasing models, but the latest Tree Borrows model was significantly more permissive than the earlier Stacked Borrows model. The Rust community must invest in new, production-ready tooling for multi-language applications to ensure that developers can detect these errors.
△ Less
Submitted 2 April, 2025; v1 submitted 17 April, 2024;
originally announced April 2024.
-
A Mixed-Methods Study on the Implications of Unsafe Rust for Interoperation, Encapsulation, and Tooling
Authors:
Ian McCormack,
Tomas Dougan,
Sam Estep,
Hanan Hibshi,
Jonathan Aldrich,
Joshua Sunshine
Abstract:
The Rust programming language restricts aliasing to provide static safety guarantees. However, in certain situations, developers need to bypass these guarantees by using a set of unsafe features. If they are used incorrectly, these features can reintroduce the types of safety issues that Rust was designed to prevent. We seek to understand how current development tools can be improved to better ass…
▽ More
The Rust programming language restricts aliasing to provide static safety guarantees. However, in certain situations, developers need to bypass these guarantees by using a set of unsafe features. If they are used incorrectly, these features can reintroduce the types of safety issues that Rust was designed to prevent. We seek to understand how current development tools can be improved to better assist developers who find it necessary to interact with unsafe code. To that end, we study how developers reason about foreign function calls, the limitations of the tools that they currently use, their motivations for using unsafe code, and how they reason about encapsulating it. We conducted a mixed-methods investigation consisting of semi-structured interviews with 19 developers, followed by a survey that reached an additional 160 developers. Our participants were motivated to use unsafe code when they perceived that there was no alternative, and most avoided using it. However, limited tooling support for foreign function calls made participants uncertain about their design choices, and certain foreign aliasing and concurrency patterns were difficult to encapsulate. To overcome these challenges, Rust developers need verification tools that can provide guarantees of soundness within multi-language applications.
△ Less
Submitted 19 October, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Rose: Composable Autodiff for the Interactive Web
Authors:
Sam Estep,
Wode Ni,
Raven Rothkopf,
Joshua Sunshine
Abstract:
Reverse-mode automatic differentiation (autodiff) has been popularized by deep learning, but its ability to compute gradients is also valuable for interactive use cases such as bidirectional computer-aided design, embedded physics simulations, visualizing causal inference, and more. Unfortunately, the web is ill-served by existing autodiff frameworks, which use autodiff strategies that perform poo…
▽ More
Reverse-mode automatic differentiation (autodiff) has been popularized by deep learning, but its ability to compute gradients is also valuable for interactive use cases such as bidirectional computer-aided design, embedded physics simulations, visualizing causal inference, and more. Unfortunately, the web is ill-served by existing autodiff frameworks, which use autodiff strategies that perform poorly on dynamic scalar programs, and pull in heavy dependencies that would result in unacceptable webpage sizes. This work introduces Rose, a lightweight autodiff framework for the web using a new hybrid approach to reverse-mode autodiff, blending conventional tracing and transformation techniques in a way that uses the host language for metaprogramming while also allowing the programmer to explicitly define reusable functions that comprise a larger differentiable computation. We demonstrate the value of the Rose design by porting two differentiable physics simulations, and evaluate its performance on an optimization-based diagramming application, showing Rose outperforming the state-of-the-art in web-based autodiff by multiple orders of magnitude.
△ Less
Submitted 12 July, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Towards Accurate Differential Diagnosis with Large Language Models
Authors:
Daniel McDuff,
Mike Schaekermann,
Tao Tu,
Anil Palepu,
Amy Wang,
Jake Garrison,
Karan Singhal,
Yash Sharma,
Shekoofeh Azizi,
Kavita Kulkarni,
Le Hou,
Yong Cheng,
Yun Liu,
S Sara Mahdavi,
Sushant Prakash,
Anupam Pathak,
Christopher Semturs,
Shwetak Patel,
Dale R Webster,
Ewa Dominowska,
Juraj Gottweis,
Joelle Barral,
Katherine Chou,
Greg S Corrado,
Yossi Matias
, et al. (3 additional authors not shown)
Abstract:
An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM op…
▽ More
An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM optimized for diagnostic reasoning, and evaluate its ability to generate a DDx alone or as an aid to clinicians. 20 clinicians evaluated 302 challenging, real-world medical cases sourced from the New England Journal of Medicine (NEJM) case reports. Each case report was read by two clinicians, who were randomized to one of two assistive conditions: either assistance from search engines and standard medical resources, or LLM assistance in addition to these tools. All clinicians provided a baseline, unassisted DDx prior to using the respective assistive tools. Our LLM for DDx exhibited standalone performance that exceeded that of unassisted clinicians (top-10 accuracy 59.1% vs 33.6%, [p = 0.04]). Comparing the two assisted study arms, the DDx quality score was higher for clinicians assisted by our LLM (top-10 accuracy 51.7%) compared to clinicians without its assistance (36.1%) (McNemar's Test: 45.7, p < 0.01) and clinicians with search (44.4%) (4.75, p = 0.03). Further, clinicians assisted by our LLM arrived at more comprehensive differential lists than those without its assistance. Our study suggests that our LLM for DDx has potential to improve clinicians' diagnostic reasoning and accuracy in challenging cases, meriting further real-world evaluation for its ability to empower physicians and widen patients' access to specialist-level expertise.
△ Less
Submitted 30 November, 2023;
originally announced December 2023.
-
Can Large Language Models Write Good Property-Based Tests?
Authors:
Vasudev Vikram,
Caroline Lemieux,
Joshua Sunshine,
Rohan Padhye
Abstract:
Property-based testing (PBT), while an established technique in the software testing research community, is still relatively underused in real-world software. Pain points in writing property-based tests include implementing diverse random input generators and thinking of meaningful properties to test. Developers, however, are more amenable to writing documentation; plenty of library API documentat…
▽ More
Property-based testing (PBT), while an established technique in the software testing research community, is still relatively underused in real-world software. Pain points in writing property-based tests include implementing diverse random input generators and thinking of meaningful properties to test. Developers, however, are more amenable to writing documentation; plenty of library API documentation is available and can be used as natural language specifications for PBTs. As large language models (LLMs) have recently shown promise in a variety of coding tasks, we investigate using modern LLMs to automatically synthesize PBTs using two prompting techniques. A key challenge is to rigorously evaluate the LLM-synthesized PBTs. We propose a methodology to do so considering several properties of the generated tests: (1) validity, (2) soundness, and (3) property coverage, a novel metric that measures the ability of the PBT to detect property violations through generation of property mutants. In our evaluation on 40 Python library API methods across three models (GPT-4, Gemini-1.5-Pro, Claude-3-Opus), we find that with the best model and prompting approach, a valid and sound PBT can be synthesized in 2.4 samples on average. We additionally find that our metric for determining soundness of a PBT is aligned with human judgment of property assertions, achieving a precision of 100% and recall of 97%. Finally, we evaluate the property coverage of LLMs across all API methods and find that the best model (GPT-4) is able to automatically synthesize correct PBTs for 21% of properties extractable from API documentation.
△ Less
Submitted 21 July, 2024; v1 submitted 10 July, 2023;
originally announced July 2023.
-
Large Language Models are Few-Shot Health Learners
Authors:
Xin Liu,
Daniel McDuff,
Geza Kovacs,
Isaac Galatzer-Levy,
Jacob Sunshine,
Jiening Zhan,
Ming-Zher Poh,
Shun Liao,
Paolo Di Achille,
Shwetak Patel
Abstract:
Large language models (LLMs) can capture rich representations of concepts that are useful for real-world tasks. However, language alone is limited. While existing LLMs excel at text-based inferences, health applications require that models be grounded in numerical data (e.g., vital signs, laboratory values in clinical domains; steps, movement in the wellness domain) that is not easily or readily e…
▽ More
Large language models (LLMs) can capture rich representations of concepts that are useful for real-world tasks. However, language alone is limited. While existing LLMs excel at text-based inferences, health applications require that models be grounded in numerical data (e.g., vital signs, laboratory values in clinical domains; steps, movement in the wellness domain) that is not easily or readily expressed as text in existing training corpus. We demonstrate that with only few-shot tuning, a large language model is capable of grounding various physiological and behavioral time-series data and making meaningful inferences on numerous health tasks for both clinical and wellness contexts. Using data from wearable and medical sensor recordings, we evaluate these capabilities on the tasks of cardiac signal analysis, physical activity recognition, metabolic calculation (e.g., calories burned), and estimation of stress reports and mental health screeners.
△ Less
Submitted 24 May, 2023;
originally announced May 2023.
-
Gradual C0: Symbolic Execution for Gradual Verification
Authors:
Jenna DiVincenzo,
Ian McCormack,
Hemant Gouni,
Jacob Gorenburg,
Jan-Paul Ramos-Dávila,
Mona Zhang,
Conrad Zimmerman,
Joshua Sunshine,
Éric Tanter,
Jonathan Aldrich
Abstract:
Current static verification techniques support a wide range of programs. However, such techniques only support complete and detailed specifications, which places an undue burden on users. To solve this problem, prior work proposed gradual verification, which handles complete, partial, or missing specifications by soundly combining static and dynamic checking. Gradual verification has also been ext…
▽ More
Current static verification techniques support a wide range of programs. However, such techniques only support complete and detailed specifications, which places an undue burden on users. To solve this problem, prior work proposed gradual verification, which handles complete, partial, or missing specifications by soundly combining static and dynamic checking. Gradual verification has also been extended to programs that manipulate recursive, mutable data structures on the heap. Unfortunately, this extension does not reward users with decreased dynamic checking as specifications are refined. In fact, all properties are checked dynamically regardless of any static guarantees. Additionally, no full-fledged implementation of gradual verification exists so far, which prevents studying its performance and applicability in practice.
We present Gradual C0, the first practicable gradual verifier for recursive heap data structures, which targets C0, a safe subset of C designed for education. Static verifiers supporting separation logic or implicit dynamic frames use symbolic execution for reasoning; so Gradual C0, which extends one such verifier, adopts symbolic execution at its core instead of the weakest liberal precondition approach used in prior work. Our approach addresses technical challenges related to symbolic execution with imprecise specifications, heap ownership, and branching in both program statements and specification formulas. We also deal with challenges related to minimizing insertion of dynamic checks and extensibility to other programming languages beyond C0. Finally, we provide the first empirical performance evaluation of a gradual verifier, and found that on average, Gradual C0 decreases run-time overhead between 11-34% compared to the fully-dynamic approach used in prior work. Further, the worst-case scenarios for performance are predictable and avoidable.
△ Less
Submitted 19 January, 2024; v1 submitted 5 October, 2022;
originally announced October 2022.
-
Gradual Program Analysis for Null Pointers
Authors:
Sam Estep,
Jenna Wise,
Jonathan Aldrich,
Éric Tanter,
Johannes Bader,
Joshua Sunshine
Abstract:
Static analysis tools typically address the problem of excessive false positives by requiring programmers to explicitly annotate their code. However, when faced with incomplete annotations, many analysis tools are either too conservative, yielding false positives, or too optimistic, resulting in unsound analysis results. In order to flexibly and soundly deal with partially-annotated programs, we p…
▽ More
Static analysis tools typically address the problem of excessive false positives by requiring programmers to explicitly annotate their code. However, when faced with incomplete annotations, many analysis tools are either too conservative, yielding false positives, or too optimistic, resulting in unsound analysis results. In order to flexibly and soundly deal with partially-annotated programs, we propose to build upon and adapt the gradual typing approach to abstract-interpretation-based program analyses. Specifically, we focus on null-pointer analysis and demonstrate that a gradual null-pointer analysis hits a sweet spot, by gracefully applying static analysis where possible and relying on dynamic checks where necessary for soundness. In addition to formalizing a gradual null-pointer analysis for a core imperative language, we build a prototype using the Infer static analysis framework, and present preliminary evidence that the gradual null-pointer analysis reduces false positives compared to two existing null-pointer checkers for Infer. Further, we discuss ways in which the gradualization approach used to derive the gradual analysis from its static counterpart can be extended to support more domains. This work thus provides a basis for future analysis tools that can smoothly navigate the tradeoff between human effort and run-time overhead to reduce the number of reported false positives.
△ Less
Submitted 14 July, 2021; v1 submitted 13 May, 2021;
originally announced May 2021.
-
Containing Malicious Package Updates in npm with a Lightweight Permission System
Authors:
Gabriel Ferreira,
Limin Jia,
Joshua Sunshine,
Christian Kästner
Abstract:
The large amount of third-party packages available in fast-moving software ecosystems, such as Node.js/npm, enables attackers to compromise applications by pushing malicious updates to their package dependencies. Studying the npm repository, we observed that many packages in the npm repository that are used in Node.js applications perform only simple computations and do not need access to filesyst…
▽ More
The large amount of third-party packages available in fast-moving software ecosystems, such as Node.js/npm, enables attackers to compromise applications by pushing malicious updates to their package dependencies. Studying the npm repository, we observed that many packages in the npm repository that are used in Node.js applications perform only simple computations and do not need access to filesystem or network APIs. This offers the opportunity to enforce least-privilege design per package, protecting applications and package dependencies from malicious updates. We propose a lightweight permission system that protects Node.js applications by enforcing package permissions at runtime. We discuss the design space of solutions and show that our system makes a large number of packages much harder to be exploited, almost for free.
△ Less
Submitted 7 March, 2021;
originally announced March 2021.
-
Smart Speakers, the Next Frontier in Computational Health
Authors:
Jacob Sunshine
Abstract:
The rapid dissemination and adoption of smart speakers has enabled substantial opportunities to improve human health. Just as the introduction of the mobile phone led to considerable health innovation, smart speaker computing systems carry several unique advantages that have the potential to catalyze new fields of health research, particularly in out-of-hospital environments. The recent rise and u…
▽ More
The rapid dissemination and adoption of smart speakers has enabled substantial opportunities to improve human health. Just as the introduction of the mobile phone led to considerable health innovation, smart speaker computing systems carry several unique advantages that have the potential to catalyze new fields of health research, particularly in out-of-hospital environments. The recent rise and ubiquity of these smart computing systems hold significant potential for enhancing chronic disease management, enabling passive identification of unwitnessed medical emergencies, detecting subtle changes in human behavior and cognition, limiting isolation, and potentially allowing widespread, passive, remote monitoring of respiratory diseases that impact the public health. There are 3 broad mechanisms for how a smart speaker can interact with a person to improve health. These include (i) as an intelligent conversational agent, (ii) a passive identifier of medically relevant diagnostic sounds and (iii) active sensing using the device's internal hardware to measure physiologic parameters, such as with active sonar, radar or computer vision. Each of these different modalities have specific clinical use cases, all of which need to be balanced against potential privacy concerns, equity related to system access and regulatory frameworks which have not yet been developed for this unique type of passive data collection.
△ Less
Submitted 6 March, 2021;
originally announced March 2021.
-
PACT: Privacy Sensitive Protocols and Mechanisms for Mobile Contact Tracing
Authors:
Justin Chan,
Dean Foster,
Shyam Gollakota,
Eric Horvitz,
Joseph Jaeger,
Sham Kakade,
Tadayoshi Kohno,
John Langford,
Jonathan Larson,
Puneet Sharma,
Sudheesh Singanamalla,
Jacob Sunshine,
Stefano Tessaro
Abstract:
The global health threat from COVID-19 has been controlled in a number of instances by large-scale testing and contact tracing efforts. We created this document to suggest three functionalities on how we might best harness computing technologies to supporting the goals of public health organizations in minimizing morbidity and mortality associated with the spread of COVID-19, while protecting the…
▽ More
The global health threat from COVID-19 has been controlled in a number of instances by large-scale testing and contact tracing efforts. We created this document to suggest three functionalities on how we might best harness computing technologies to supporting the goals of public health organizations in minimizing morbidity and mortality associated with the spread of COVID-19, while protecting the civil liberties of individuals. In particular, this work advocates for a third-party free approach to assisted mobile contact tracing, because such an approach mitigates the security and privacy risks of requiring a trusted third party. We also explicitly consider the inferential risks involved in any contract tracing system, where any alert to a user could itself give rise to de-anonymizing information.
More generally, we hope to participate in bringing together colleagues in industry, academia, and civil society to discuss and converge on ideas around a critical issue rising with attempts to mitigate the COVID-19 pandemic.
△ Less
Submitted 7 May, 2020; v1 submitted 7 April, 2020;
originally announced April 2020.
-
Can Advanced Type Systems Be Usable? An Empirical Study of Ownership, Assets, and Typestate in Obsidian
Authors:
Michael Coblenz,
Jonathan Aldrich,
Joshua Sunshine,
Brad A. Myers
Abstract:
Some blockchain programs (smart contracts) have included serious security vulnerabilities. Obsidian is a new typestate-oriented programming language that uses a strong type system to rule out some of these vulnerabilities. Although Obsidian was designed to promote usability to make it as easy as possible to write programs, strong type systems can cause a language to be difficult to use. In particu…
▽ More
Some blockchain programs (smart contracts) have included serious security vulnerabilities. Obsidian is a new typestate-oriented programming language that uses a strong type system to rule out some of these vulnerabilities. Although Obsidian was designed to promote usability to make it as easy as possible to write programs, strong type systems can cause a language to be difficult to use. In particular, ownership, typestate, and assets, which Obsidian uses to provide safety guarantees, have not seen broad adoption together in popular languages and result in significant usability challenges. We performed an empirical study with 20 participants comparing Obsidian to Solidity, which is the language most commonly used for writing smart contracts today. We observed that Obsidian participants were able to successfully complete more of the programming tasks than the Solidity participants. We also found that the Solidity participants commonly inserted asset-related bugs, which Obsidian detects at compile time.
△ Less
Submitted 15 October, 2020; v1 submitted 26 March, 2020;
originally announced March 2020.
-
PLIERS: A Process that Integrates User-Centered Methods into Programming Language Design
Authors:
Michael Coblenz,
Gauri Kambhatla,
Paulette Koronkevich,
Jenna L. Wise,
Celeste Barnaby,
Joshua Sunshine,
Jonathan Aldrich,
Brad A. Myers
Abstract:
Programming language design requires making many usability-related design decisions. However, existing HCI methods can be impractical to apply to programming languages: they have high iteration costs, programmers require significant learning time, and user performance has high variance. To address these problems, we adapted both formative and summative HCI methods to make them more suitable for pr…
▽ More
Programming language design requires making many usability-related design decisions. However, existing HCI methods can be impractical to apply to programming languages: they have high iteration costs, programmers require significant learning time, and user performance has high variance. To address these problems, we adapted both formative and summative HCI methods to make them more suitable for programming language design. We integrated these methods into a new process, PLIERS, for designing programming languages in a user-centered way. We evaluated PLIERS by using it to design two new programming languages. Glacier extends Java to enable programmers to express immutability properties effectively and easily. Obsidian is a language for blockchains that includes verification of critical safety properties. Summative usability studies showed that programmers were able to program effectively in both languages after short training periods.
△ Less
Submitted 25 August, 2020; v1 submitted 10 December, 2019;
originally announced December 2019.
-
Obsidian: Typestate and Assets for Safer Blockchain Programming
Authors:
Michael Coblenz,
Reed Oei,
Tyler Etzel,
Paulette Koronkevich,
Miles Baker,
Yannick Bloem,
Brad A. Myers,
Joshua Sunshine,
Jonathan Aldrich
Abstract:
Blockchain platforms are coming into broad use for processing critical transactions among participants who have not established mutual trust. Many blockchains are programmable, supporting smart contracts, which maintain persistent state and support transactions that transform the state. Unfortunately, bugs in many smart contracts have been exploited by hackers. Obsidian is a novel programming lang…
▽ More
Blockchain platforms are coming into broad use for processing critical transactions among participants who have not established mutual trust. Many blockchains are programmable, supporting smart contracts, which maintain persistent state and support transactions that transform the state. Unfortunately, bugs in many smart contracts have been exploited by hackers. Obsidian is a novel programming language with a type system that enables static detection of bugs that are common in smart contracts today. Obsidian is based on a core calculus, Silica, for which we proved type soundness. Obsidian uses typestate to detect improper state manipulation and uses linear types to detect abuse of assets. We describe two case studies that evaluate Obsidian's applicability to the domains of parametric insurance and supply chain management, finding that Obsidian's type system facilitates reasoning about high-level states and ownership of resources. We compared our Obsidian implementation to a Solidity implementation, observing that the Solidity implementation requires much boilerplate checking and tracking of state, whereas Obsidian does this work statically.
△ Less
Submitted 8 September, 2019;
originally announced September 2019.
-
Design Dimensions for Software Certification: A Grounded Analysis
Authors:
Gabriel Ferreira,
Christian Kästner,
Joshua Sunshine,
Sven Apel,
William Scherlis
Abstract:
In many domains, software systems cannot be deployed until authorities judge them fit for use in an intended operating environment. Certification standards and processes have been devised and deployed to regulate operations of software systems and prevent their failures. However, practitioners are often unsatisfied with the efficiency and value proposition of certification efforts. In this study,…
▽ More
In many domains, software systems cannot be deployed until authorities judge them fit for use in an intended operating environment. Certification standards and processes have been devised and deployed to regulate operations of software systems and prevent their failures. However, practitioners are often unsatisfied with the efficiency and value proposition of certification efforts. In this study, we compare two certification standards, Common Criteria and DO-178C, and collect insights from literature and from interviews with subject-matter experts to identify design options relevant to the design of standards. The results of the comparison of certification efforts---leading to the identification of design dimensions that affect their quality---serve as a framework to guide the comparison, creation, and revision of certification standards and processes. This paper puts software engineering research in context and discusses key issues around process and quality assurance and includes observations from industry about relevant topics such as recertification, timely evaluations, but also technical discussions around model-driven approaches and formal methods. Our initial characterization of the design space of certification efforts can be used to inform technical discussions and to influence the directions of new or existing certification efforts. Practitioners, technical commissions, and government can directly benefit from our analytical framework.
△ Less
Submitted 23 May, 2019;
originally announced May 2019.
-
Contactless Cardiac Arrest Detection Using Smart Devices
Authors:
Justin Chan,
Thomas Rea,
Shyamnath Gollakota,
Jacob E. Sunshine
Abstract:
Out-of-hospital cardiac arrest (OHCA) is a leading cause of death worldwide. Rapid diagnosis and initiation of cardiopulmonary resuscitation (CPR) is the cornerstone of therapy for victims of cardiac arrest. Yet a significant fraction of cardiac arrest victims have no chance of survival because they experience an unwitnessed event, often in the privacy of their own homes. An under-appreciated diag…
▽ More
Out-of-hospital cardiac arrest (OHCA) is a leading cause of death worldwide. Rapid diagnosis and initiation of cardiopulmonary resuscitation (CPR) is the cornerstone of therapy for victims of cardiac arrest. Yet a significant fraction of cardiac arrest victims have no chance of survival because they experience an unwitnessed event, often in the privacy of their own homes. An under-appreciated diagnostic element of cardiac arrest is the presence of agonal breathing, an audible biomarker and brainstem reflex that arises in the setting of severe hypoxia. Here, we demonstrate that a support vector machine (SVM) can classify agonal breathing instances in real-time within a bedroom environment. Using real-world labeled 9-1-1 audio of cardiac arrests, we train the SVM to accurately classify agonal breathing instances. We obtain an area under the curve (AUC) of 0.998 and an operating point with an overall sensitivity and specificity of 97.03% (95% CI: 96.62 -- 97.41%) and 98.20% (95% CI: 97.87 -- 98.49%). We achieve a false positive rate between 0% -- 0.10% over 82 hours (117,895 audio segments) of polysomnographic sleep lab data that includes snoring, hypopnea, central and obstructive sleep apnea events. We demonstrate the effectiveness of our contactless system in identifying real-world cardiac arrest-associated agonal breathing instances and successfully evaluate our classifier using commodity smart devices (Amazon Echo and Apple iPhone).
△ Less
Submitted 27 February, 2019; v1 submitted 31 January, 2019;
originally announced February 2019.
-
Debugging Framework Applications: Benefits and Challenges
Authors:
Zack Coker,
David Gray Widder,
Claire Le Goues,
Christopher Bogart,
Joshua Sunshine
Abstract:
Aspects of frameworks, such as inversion of control and the structure of framework applications, require developers to adjust their debugging strategies as compared to debugging sequential programs. However, the benefits and challenges of framework debugging are not fully understood, and gaining this knowledge could provide guidance in debugging strategies and framework tool design. To gain insigh…
▽ More
Aspects of frameworks, such as inversion of control and the structure of framework applications, require developers to adjust their debugging strategies as compared to debugging sequential programs. However, the benefits and challenges of framework debugging are not fully understood, and gaining this knowledge could provide guidance in debugging strategies and framework tool design. To gain insight into the process developers use to fix problems in framework applications, we performed two human studies investigating how developers fix applications that use a framework API incorrectly. These studies focused on the Android Fragment class and the ROS framework. We analyzed the results of the studies using a mixed-methods approach, consisting of techniques from grounded theory, qualitative content analysis, and case studies. From our analysis, we produced a theory of the benefits and challenges of framework debugging. This theory states that developers find inversion of control challenging when debugging but find the structure of framework applications helpful. This theory could be used to guide strategies for debugging framework applications and framework tool designs.
△ Less
Submitted 16 January, 2018;
originally announced January 2018.
-
Toward Semantic Foundations for Program Editors
Authors:
Cyrus Omar,
Ian Voysey,
Michael Hilton,
Joshua Sunshine,
Claire Le Goues,
Jonathan Aldrich,
Matthew A. Hammer
Abstract:
Programming language definitions assign formal meaning to complete programs. Programmers, however, spend a substantial amount of time interacting with incomplete programs -- programs with holes, type inconsistencies and binding inconsistencies -- using tools like program editors and live programming environments (which interleave editing and evaluation). Semanticists have done comparatively little…
▽ More
Programming language definitions assign formal meaning to complete programs. Programmers, however, spend a substantial amount of time interacting with incomplete programs -- programs with holes, type inconsistencies and binding inconsistencies -- using tools like program editors and live programming environments (which interleave editing and evaluation). Semanticists have done comparatively little to formally characterize (1) the static and dynamic semantics of incomplete programs; (2) the actions available to programmers as they edit and inspect incomplete programs; and (3) the behavior of editor services that suggest likely edit actions to the programmer based on semantic information extracted from the incomplete program being edited, and from programs that the system has encountered in the past. As such, each tool designer has largely been left to develop their own ad hoc heuristics.
This paper serves as a vision statement for a research program that seeks to develop these "missing" semantic foundations. Our hope is that these contributions, which will take the form of a series of simple formal calculi equipped with a tractable metatheory, will guide the design of a variety of current and future interactive programming tools, much as various lambda calculi have guided modern language designs. Our own research will apply these principles in the design of Hazel, an experimental live lab notebook programming environment designed for data science tasks. We plan to co-design the Hazel language with the editor so that we can explore concepts such as edit-time semantic conflict resolution mechanisms and mechanisms that allow library providers to install library-specific editor services.
△ Less
Submitted 25 March, 2017;
originally announced March 2017.