Flow with FlorDB: Incremental Context Maintenance for the Machine Learning Lifecycle

Authors: Rolando Garcia, Pragya Kallanagoudar, Chithra Anand, Sarah E. Chasins, Joseph M. Hellerstein, Erin Michelle Turner Kerrison, Aditya G. Parameswaran

Abstract: In this paper we present techniques to incrementally harvest and query arbitrary metadata from machine learning pipelines, without disrupting agile practices. We center our approach on the developer-favored technique for generating metadata -- log statements -- leveraging the fact that logging creates context. We show how hindsight logging allows such statements to be added and executed post-hoc, without requiring developer foresight. Relational views of incomplete metadata can be queried to dynamically materialize new metadata in bulk and on demand across multiple versions of workflows. This is done in a "metadata later" style, off the critical path of agile development. We realize these ideas in a system called FlorDB and demonstrate how the data context framework covers a range of both ad-hoc metadata as well as special cases treated today by bespoke feature stores and model repositories. Through a usage scenario -- including both ML and human feedback -- we illustrate how the component techniques come together to resolve classic software engineering trade-offs between agility and discipline.

Submitted 15 November, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

arXiv:2405.16669 [pdf, other]

doi 10.1145/3613904.3642605

Low-resourced Languages and Online Knowledge Repositories: A Need-Finding Study

Authors: Hellina Hailu Nigatu, John Canny, Sarah E. Chasins

Abstract: Online Knowledge Repositories (OKRs) like Wikipedia offer communities a way to share and preserve information about themselves and their ways of living. However, for communities with low-resourced languages -- including most African communities -- the quality and volume of content available are often inadequate. One reason for this lack of adequate content could be that many OKRs embody Western ways of knowledge preservation and sharing, requiring many low-resourced language communities to adapt to new interactions. To understand the challenges faced by low-resourced language contributors on the popular OKR Wikipedia, we conducted (1) a thematic analysis of Wikipedia forum discussions and (2) a contextual inquiry study with 14 novice contributors. We focused on three Ethiopian languages: Afan Oromo, Amharic, and Tigrinya. Our analysis revealed several recurring themes; for example, contributors struggle to find resources to corroborate their articles in low-resourced languages, and language technology support, like translation systems and spellcheck, result in several errors that waste contributors' time. We hope our study will support designers in making online knowledge repositories accessible to low-resourced language speakers.

Submitted 26 May, 2024; originally announced May 2024.

Comments: In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI 2024)

arXiv:2205.07147 [pdf]

The Sky Above The Clouds

Authors: Sarah Chasins, Alvin Cheung, Natacha Crooks, Ali Ghodsi, Ken Goldberg, Joseph E. Gonzalez, Joseph M. Hellerstein, Michael I. Jordan, Anthony D. Joseph, Michael W. Mahoney, Aditya Parameswaran, David Patterson, Raluca Ada Popa, Koushik Sen, Scott Shenker, Dawn Song, Ion Stoica

Abstract: Technology ecosystems often undergo significant transformations as they mature. For example, telephony, the Internet, and PCs all started with a single provider, but in the United States each is now served by a competitive market that uses comprehensive and universal technology standards to provide compatibility. This white paper presents our view on how the cloud ecosystem, barely over fifteen years old, could evolve as it matures.

Submitted 14 May, 2022; originally announced May 2022.

Comments: 35 pages

arXiv:1904.05387 [pdf, other]

doi 10.1145/3332165.3347940

Tea: A High-level Language and Runtime System for Automating Statistical Analysis

Authors: Eunice Jun, Maureen Daum, Jared Roesch, Sarah E. Chasins, Emery D. Berger, Rene Just, Katharina Reinecke

Abstract: Though statistical analyses are centered on research questions and hypotheses, current statistical analysis tools are not. Users must first translate their hypotheses into specific statistical tests and then perform API calls with functions and parameters. To do so accurately requires that users have statistical expertise. To lower this barrier to valid, replicable statistical analysis, we introduce Tea, a high-level declarative language and runtime system. In Tea, users express their study design, any parametric assumptions, and their hypotheses. Tea compiles these high-level specifications into a constraint satisfaction problem that determines the set of valid statistical tests, and then executes them to test the hypothesis. We evaluate Tea using a suite of statistical analyses drawn from popular tutorials. We show that Tea generally matches the choices of experts while automatically switching to non-parametric tests when parametric assumptions are not met. We simulate the effect of mistakes made by non-expert users and show that Tea automatically avoids both false negatives and false positives that could be produced by the application of incorrect statistical tests.

Submitted 10 April, 2019; originally announced April 2019.

Comments: 11 pages

arXiv:1611.07620 [pdf, other]

doi 10.4204/EPTCS.229.3

Using SyGuS to Synthesize Reactive Motion Plans

Authors: Sarah Chasins, Julie L. Newcomb

Abstract: We present an approach for synthesizing reactive robot motion plans, based on compilation to Syntax-Guided Synthesis (SyGuS) specifications. Our method reduces the motion planning problem to the problem of synthesizing a function that can choose the next robot action in response to the current state of the system. This technique offers reactivity not by generating new motion plans throughout deployment, but by synthesizing a single program that causes the robot to reach its target from any system state that is consistent with the system model. This approach allows our tool to handle environments with adversarial obstacles. This work represents the first use of the SyGuS formalism to solve robot motion planning problems. We investigate whether using SyGuS for a bounded two-player reachability game is practical at this point in time.

Submitted 22 November, 2016; originally announced November 2016.

Comments: In Proceedings SYNT 2016, arXiv:1611.07178

Journal ref: EPTCS 229, 2016, pp. 3-20

Showing 1–5 of 5 results for author: Chasins, S