-
DeepCS-TRD, a Deep Learning-based Cross-Section Tree Ring Detector
Authors:
Henry Marichal,
Verónica Casaravilla,
Candice Power,
Karolain Mello,
Joaquín Mazarino,
Christine Lucas,
Ludmila Profumo,
Diego Passarella,
Gregory Randall
Abstract:
Here, we propose Deep CS-TRD, a new automatic algorithm for detecting tree rings in whole cross-sections. It replaces the edge detection step of CS-TRD with a deep-learning-based approach (U-Net), which allows the method to be applied to different image domains (microscopy, scanner-acquired, and smartphone-acquired images) and species (Pinus taeda, Gleditsia triacanthos, and Salix glauca). Additionally, we introduce two publicly available datasets of annotated images to the community. The proposed method outperforms state-of-the-art approaches on macro images (Pinus taeda and Gleditsia triacanthos) while showing slightly lower performance on microscopy images of Salix glauca. To our knowledge, this is the first paper that studies automatic tree ring detection for such different species and acquisition conditions. The dataset and source code are available at https://github.com/hmarichal93/deepcstrd
Submitted 22 April, 2025;
originally announced April 2025.
-
The Free Termination Property of Queries Over Time
Authors:
Conor Power,
Paraschos Koutris,
Joseph M Hellerstein
Abstract:
Building on prior work on distributed databases and the CALM Theorem, we define and study the question of free termination: in the absence of distributed coordination, what query properties allow nodes in a distributed (database) system to unilaterally terminate execution even though they may receive additional data or messages in the future? This completeness question is complementary to the soundness questions studied in the CALM literature. We also develop a new model based on semiautomata that allows us to bridge from the relational transducer model of the CALM papers to algebraic models that are popular among software engineers (e.g. CRDTs) and of increasing interest to database theory for datalog extensions and incremental view maintenance.
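The free-termination question can be pictured with a toy model. The sketch below is an illustration under our own assumptions, not the paper's semiautomaton formalism: a node folds incoming messages into a state, and if that state is absorbing (no possible message can move it), the query's answer is final and the node may stop without coordinating. The names `is_absorbing`, `run_until_free_termination`, `threshold_step`, and `parity_step` are hypothetical.

```python
from collections.abc import Callable, Hashable, Iterable

# Hypothetical toy model (not the paper's formalism): a query is a
# semiautomaton `step(state, msg) -> state` over a finite message alphabet.


def is_absorbing(state: Hashable, step: Callable[[Hashable, Hashable], Hashable],
                 alphabet: Iterable[Hashable]) -> bool:
    """True if no message in the alphabet can move the query out of `state`."""
    return all(step(state, m) == state for m in alphabet)


def run_until_free_termination(step, start, alphabet, stream):
    """Fold messages into the state; stop as soon as the state is absorbing.

    Returns (final_state, messages_consumed, terminated_early).
    """
    alphabet = list(alphabet)  # scanned once per message
    state, consumed = start, 0
    for msg in stream:
        state = step(state, msg)
        consumed += 1
        if is_absorbing(state, step, alphabet):
            return state, consumed, True
    return state, consumed, False


ALPHABET = ["vote", "other"]
THRESHOLD = 3


def threshold_step(count: int, msg: str) -> int:
    """'Have we seen at least THRESHOLD votes?' (count saturates at THRESHOLD)."""
    return min(count + 1, THRESHOLD) if msg == "vote" else count


def parity_step(parity: int, msg: str) -> int:
    """'Is the number of votes odd?' (every vote flips the answer)."""
    return (parity + 1) % 2 if msg == "vote" else parity


print(run_until_free_termination(threshold_step, 0, ALPHABET,
                                 ["vote", "other", "vote", "vote", "vote"]))  # (3, 4, True)
print(run_until_free_termination(parity_step, 0, ALPHABET, ["vote"] * 5))   # (1, 5, False)
```

The threshold query reaches an absorbing state after the third vote and can terminate before reading the rest of the stream; the parity query never can, because one more message always flips its answer.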
Submitted 31 January, 2025;
originally announced February 2025.
-
Optimizing the cloud? Don't train models. Build oracles!
Authors:
Tiemo Bang,
Conor Power,
Siavash Ameli,
Natacha Crooks,
Joseph M. Hellerstein
Abstract:
We propose cloud oracles, an alternative to machine learning for online optimization of cloud configurations. Our cloud oracle approach guarantees complete accuracy and explainability of decisions for problems that can be formulated as parametric convex optimizations. We give experimental evidence of this technique's efficacy and share a vision of research directions for expanding its applicability.
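As a toy illustration of what a parametric convex optimization looks like (the cost model and the names `cost` and `oracle_machines` are hypothetical and not taken from the paper): choosing a machine count n > 0 to minimize `price * n + weight * load / n` is convex in n, and its optimum has a closed form in the parameters. An oracle of this kind returns the exact decision, together with the formula that explains it, for any parameter values without training on examples.

```python
import math

# Hypothetical cost model (not from the paper): run n machines (relaxed to a
# real n > 0) at `price` each, plus a latency penalty `weight * load / n`.


def cost(n: float, price: float, weight: float, load: float) -> float:
    return price * n + weight * load / n


def oracle_machines(price: float, weight: float, load: float) -> float:
    """Exact minimiser of `cost` for any positive parameters.

    cost is convex on n > 0; setting d(cost)/dn = price - weight * load / n**2
    to zero gives n* = sqrt(weight * load / price).
    """
    if min(price, weight, load) <= 0:
        raise ValueError("price, weight and load must be positive")
    return math.sqrt(weight * load / price)


n_star = oracle_machines(price=1.0, weight=4.0, load=25.0)
print(n_star, cost(n_star, 1.0, 4.0, 25.0))  # 10.0 20.0
```

A real deployment would add integrality, capacity, and SLA constraints, which typically require a solver rather than a closed form; the sketch only shows why the answer is exact and explainable once the problem is convex and parametric.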
Submitted 22 December, 2023; v1 submitted 13 August, 2023;
originally announced August 2023.
-
Optimizing Stateful Dataflow with Local Rewrites
Authors:
Shadaj Laddad,
Conor Power,
Tyler Hou,
Alvin Cheung,
Joseph M. Hellerstein
Abstract:
Optimizing a stateful dataflow language is a challenging task. There are strict correctness constraints for preserving properties expected by downstream consumers, a large space of possible optimizations, and complex analyses that must reason about the behavior of the program over time. Classic compiler techniques with specialized optimization passes yield unpredictable performance and have complex correctness proofs. But with e-graphs, we can dramatically simplify the process of building a correct optimizer while yielding more consistent results! In this short paper, we discuss our early work using e-graphs to develop an optimizer for the Hydroflow dataflow language. Our prototype demonstrates that composing simple, easy-to-prove rewrite rules is sufficient to match techniques in hand-optimized systems.
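To give a flavor of optimizing with e-graphs and local rewrite rules, here is a toy e-graph with a single map-fusion rule, `map(f, map(g, x))` to `map(compose(f, g), x)`. This is a hypothetical sketch: the operator names, the rule, and the `EGraph`/`fuse_maps` API are ours and do not describe Hydroflow's actual optimizer. It shows that each rule is local and easy to check, while the e-graph keeps the original and the rewritten pipeline in the same equivalence class.

```python
# Toy e-graph (hypothetical sketch): e-classes are union-find ids, e-nodes are
# tuples (op, child_class_ids...), and a hashcons maps canonical e-nodes to classes.


class EGraph:
    def __init__(self):
        self.parent: list[int] = []
        self.hashcons: dict[tuple, int] = {}

    def find(self, a: int) -> int:
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def canon(self, node: tuple) -> tuple:
        op, *kids = node
        return (op, *(self.find(k) for k in kids))

    def add(self, node: tuple) -> int:
        node = self.canon(node)
        if node in self.hashcons:
            return self.find(self.hashcons[node])
        cid = len(self.parent)
        self.parent.append(cid)
        self.hashcons[node] = cid
        return cid

    def union(self, a: int, b: int) -> int:
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
        return a

    def rebuild(self) -> None:
        """Re-canonicalise e-nodes and merge classes that became congruent."""
        changed = True
        while changed:
            changed = False
            fresh: dict[tuple, int] = {}
            for node, cid in self.hashcons.items():
                node, cid = self.canon(node), self.find(cid)
                if node in fresh and self.find(fresh[node]) != cid:
                    cid = self.union(fresh[node], cid)
                    changed = True
                fresh[node] = cid
            self.hashcons = fresh


def fuse_maps(eg: EGraph) -> bool:
    """Local rewrite: map(f, map(g, x)) == map(compose(f, g), x)."""
    changed = False
    for node, cid in list(eg.hashcons.items()):
        if node[0] != "map":
            continue
        _, f, inner = node
        for inner_node, inner_cid in list(eg.hashcons.items()):
            if inner_node[0] == "map" and eg.find(inner_cid) == eg.find(inner):
                _, g, x = inner_node
                fused = eg.add(("map", eg.add(("compose", f, g)), x))
                if eg.find(fused) != eg.find(cid):
                    eg.union(cid, fused)
                    changed = True
    eg.rebuild()
    return changed


eg = EGraph()
src, f, g = eg.add(("src",)), eg.add(("f",)), eg.add(("g",))
pipeline = eg.add(("map", f, eg.add(("map", g, src))))
while fuse_maps(eg):
    pass  # apply the rule until saturation
fused = eg.add(("map", eg.add(("compose", f, g)), src))
print(eg.find(pipeline) == eg.find(fused))  # True
```

Because rewrites only add equalities, an extraction step (not shown) would then choose the cheapest representative from each class, so a rule never has to decide on its own whether firing is profitable.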
Submitted 18 June, 2023;
originally announced June 2023.
-
Invited Paper: Initial Steps Toward a Compiler for Distributed Programs
Authors:
Joseph M. Hellerstein,
Shadaj Laddad,
Mae Milano,
Conor Power,
Mingwei Samuel
Abstract:
In the Hydro project we are designing a compiler toolkit that can optimize for the concerns of distributed systems, including scale-up and scale-down, availability, and consistency of outcomes across replicas. This invited paper overviews the project, and provides an early walk-through of the kind of optimization that is possible. We illustrate how type transformations as well as local program transformations can combine, step by step, to convert a single-node program into a variety of distributed design points that offer the same semantics with different performance and deployment characteristics.
Submitted 23 May, 2023;
originally announced May 2023.
-
Keep CALM and CRDT On
Authors:
Shadaj Laddad,
Conor Power,
Mae Milano,
Alvin Cheung,
Natacha Crooks,
Joseph M. Hellerstein
Abstract:
Despite decades of research and practical experience, developers have few tools for programming reliable distributed applications without resorting to expensive coordination techniques. Conflict-free replicated datatypes (CRDTs) are a promising line of work that enable coordination-free replication and offer certain eventual consistency guarantees in a relatively simple object-oriented API. Yet CRDT guarantees extend only to data updates; observations of CRDT state are unconstrained and unsafe. We propose an agenda that embraces the simplicity of CRDTs, but provides richer, more uniform guarantees. We extend CRDTs with a query model that reasons about which queries are safe without coordination by applying monotonicity results from the CALM Theorem, and lay out a larger agenda for developing CRDT data stores that let developers safely and efficiently interact with replicated application state.
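The monotonicity intuition behind coordination-free queries can be sketched with a state-based grow-only counter (G-Counter), a classic CRDT whose merge is an element-wise max. This is an illustrative assumption, not the paper's query model: the hypothetical queries `at_least` and `exactly` show that a monotone query's True answer survives every later merge, whereas a non-monotone query's answer can be overturned, so only the former is safe to act on without coordination.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """State-based grow-only counter: one non-decreasing slot per replica."""
    replica: str
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica] = self.counts.get(self.replica, 0) + n

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent, so
        # replicas converge regardless of delivery order or duplication.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self) -> int:
        return sum(self.counts.values())


def at_least(counter: GCounter, k: int) -> bool:
    """Monotone: once True, no later merge can make it False."""
    return counter.value() >= k


def exactly(counter: GCounter, k: int) -> bool:
    """Non-monotone: a True answer may be overturned by a later merge."""
    return counter.value() == k


a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
print(at_least(a, 2), exactly(a, 2))  # True True  (a has not seen b yet)
a.merge(b)
b.merge(a)
print(a.value(), b.value())           # 5 5
print(at_least(a, 2), exactly(a, 2))  # True False
```

Here the replica `a` could safely act on `at_least(a, 2)` before hearing from `b`, but acting on `exactly(a, 2)` at that point would have required coordination.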
Submitted 22 October, 2022;
originally announced October 2022.
-
Katara: Synthesizing CRDTs with Verified Lifting
Authors:
Shadaj Laddad,
Conor Power,
Mae Milano,
Alvin Cheung,
Joseph M. Hellerstein
Abstract:
Conflict-free replicated data types (CRDTs) are a promising tool for designing scalable, coordination-free distributed systems. However, constructing correct CRDTs is difficult, posing a challenge for even seasoned developers. As a result, CRDT development is still largely the domain of academics, with new designs often awaiting peer review and a manual proof of correctness. In this paper, we present Katara, a program synthesis-based system that takes sequential data type implementations and automatically synthesizes verified CRDT designs from them. Key to this process is a new formal definition of CRDT correctness that combines a reference sequential type with a lightweight ordering constraint that resolves conflicts between non-commutative operations. Our process follows the tradition of work in verified lifting, including an encoding of correctness into SMT logic using synthesized inductive invariants and hand-crafted grammars for the CRDT state and runtime. Katara is able to automatically synthesize CRDTs for a wide variety of scenarios, from reproducing classic CRDTs to synthesizing novel designs based on specifications in existing literature. Crucially, our synthesized CRDTs are fully, automatically verified, eliminating entire classes of common errors and reducing the process of producing a new CRDT from a painstaking paper proof of correctness to a lightweight specification.
Submitted 21 September, 2022; v1 submitted 24 May, 2022;
originally announced May 2022.
-
KEA: Tuning an Exabyte-Scale Data Infrastructure
Authors:
Yiwen Zhu,
Subru Krishnan,
Konstantinos Karanasos,
Isha Tarte,
Conor Power,
Abhishek Modi,
Manoj Kumar,
Deli Zhang,
Kartheek Muthyala,
Nick Jurgens,
Sarvesh Sakalanaga,
Sudhir Darbha,
Minu Iyer,
Ankita Agarwal,
Carlo Curino
Abstract:
Microsoft's internal big-data infrastructure is one of the largest in the world -- with over 300k machines running billions of tasks from over 0.6M daily jobs. Operating this infrastructure is a costly and complex endeavor, and efficiency is paramount. In fact, for over 15 years, a dedicated engineering team has tuned almost every aspect of this infrastructure, achieving state-of-the-art efficiency (>60% average CPU utilization across all clusters). Despite rich telemetry and strong expertise, faced with evolving hardware/software/workloads this manual tuning approach had reached its limit -- we had plateaued.
In this paper, we present KEA, a multi-year effort to automate our tuning processes to be fully data/model-driven. KEA leverages a mix of domain knowledge and principled data science to capture the essence of our clusters' dynamic behavior in a set of machine learning (ML) models based on collected system data. These models power automated optimization procedures for parameter tuning, and inform our leadership in critical decisions around engineering and capacity management (such as hardware and data center design, software investments, etc.). We combine "observational" tuning (i.e., using models to predict system behavior without direct experimentation) with judicious use of "flighting" (i.e., conservative testing in production). This allows us to support a broad range of applications that we discuss in this paper.
KEA continuously tunes our cluster configurations and is on track to save Microsoft tens of millions of dollars per year. To the best of our knowledge, this paper is the first to discuss research challenges and practical learnings that emerge when tuning an exabyte-scale data infrastructure.
Submitted 21 June, 2021;
originally announced June 2021.
-
From Helmut Jürgensen's Former Students: The Game of Informatics Research
Authors:
Mark Daley,
Mark Eramian,
Christopher Power,
Ian McQuillan
Abstract:
Personal reflections are given on being students of Helmut Jürgensen. Then, we attempt to address his hypothesis that informatics follows trend-like behaviours through the use of a content analysis of university job advertisements, and then via simulation techniques from the area of quantitative economics.
Submitted 7 March, 2019;
originally announced March 2019.
-
Characteristics and Motivations of Players with Disabilities in Digital Games
Authors:
Jen Beeston,
Christopher Power,
Paul Cairns,
Mark Barlet
Abstract:
In research and practice into the accessibility of digital games, much of the work has focused on how to make games accessible to people with disabilities. With an increasing number of people with disabilities playing mainstream commercial games, it is important that we understand who they are and how they play in order to take a more user-centered approach as this field grows. We conducted a demographic survey of 230 players with disabilities and found that they play mainstream digital games using a variety of assistive technologies, use accessibility options such as key remapping and subtitles, and they identify themselves as gamers who play digital games as their primary hobby. This gives us a richer picture of players with disabilities and indicates that there are opportunities to begin to look at accessible player experiences (APX) in games.
Submitted 29 May, 2018;
originally announced May 2018.