
DeepCS-TRD, a Deep Learning-based Cross-Section Tree Ring Detector

Authors: Henry Marichal, Verónica Casaravilla, Candice Power, Karolain Mello, Joaquín Mazarino, Christine Lucas, Ludmila Profumo, Diego Passarella, Gregory Randall

Abstract: Here, we propose Deep CS-TRD, a new automatic algorithm for detecting tree rings in whole cross-sections. It replaces the edge detection step of CS-TRD with a deep-learning-based approach (U-Net), which allows the method to be applied to different image domains (microscopy, scanner, or smartphone acquired) and species (Pinus taeda, Gleditsia triacanthos and Salix glauca). Additionally, we introduce two publicly available datasets of annotated images to the community. The proposed method outperforms state-of-the-art approaches in macro images (Pinus taeda and Gleditsia triacanthos) while showing slightly lower performance in microscopy images of Salix glauca. To our knowledge, this is the first paper that studies automatic tree ring detection for such different species and acquisition conditions. The dataset and source code are available at https://github.com/hmarichal93/deepcstrd

Submitted 22 April, 2025; originally announced April 2025.

Comments: 12 pages, 6 figures. Accepted in ICIAP 2025

arXiv:2502.00222

The Free Termination Property of Queries Over Time

Authors: Conor Power, Paraschos Koutris, Joseph M Hellerstein

Abstract: Building on prior work on distributed databases and the CALM Theorem, we define and study the question of free termination: in the absence of distributed coordination, what query properties allow nodes in a distributed (database) system to unilaterally terminate execution even though they may receive additional data or messages in the future? This completeness question is complementary to the soundness questions studied in the CALM literature. We also develop a new model based on semiautomata that allows us to bridge from the relational transducer model of the CALM papers to algebraic models that are popular among software engineers (e.g. CRDTs) and of increasing interest to database theory for datalog extensions and incremental view maintenance.

Submitted 31 January, 2025; originally announced February 2025.
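The intuition behind unilateral termination can be illustrated, very loosely, with a monotone threshold query over a growing set of messages. This sketch is not the paper's model (which works with relational transducers and semiautomata); `at_least`, `exactly`, and `first_safe_stop` are hypothetical names, and the caller is assumed to know whether a query is monotone. Once a monotone query answers True, every superset of the input also answers True, so a node could stop listening; a non-monotone query such as an exact count gives no such guarantee.

```python
from typing import Callable, FrozenSet, List, Optional

# A query maps the set of items received so far to a boolean answer.
Query = Callable[[FrozenSet[str]], bool]


def at_least(k: int) -> Query:
    """Monotone query: adding items can flip False to True, never back."""
    return lambda items: len(items) >= k


def exactly(k: int) -> Query:
    """Non-monotone query: a later item can flip True back to False."""
    return lambda items: len(items) == k


def first_safe_stop(query: Query, stream: List[str], monotone: bool) -> Optional[int]:
    """Return the stream index at which a node could stop listening, or None.

    Assumption: the caller declares whether `query` is monotone; only then is
    a True answer treated as final, since every superset also satisfies it.
    """
    seen: set = set()
    for i, item in enumerate(stream):
        seen.add(item)
        if monotone and query(frozenset(seen)):
            return i
    return None  # no early stop: more input could still change the answer


stream = ["a", "a", "b", "c"]
print(first_safe_stop(at_least(2), stream, monotone=True))   # 2
print(first_safe_stop(exactly(2), stream, monotone=False))   # None
```

The `monotone` flag stands in for the analysis a real system would perform; deciding which queries admit such early termination is exactly the question the paper studies.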

arXiv:2308.06815

Optimizing the cloud? Don't train models. Build oracles!

Authors: Tiemo Bang, Conor Power, Siavash Ameli, Natacha Crooks, Joseph M. Hellerstein

Abstract: We propose cloud oracles, an alternative to machine learning for online optimization of cloud configurations. Our cloud oracle approach guarantees complete accuracy and explainability of decisions for problems that can be formulated as parametric convex optimizations. We give experimental evidence of this technique's efficacy and share a vision of research directions for expanding its applicability.

Submitted 22 December, 2023; v1 submitted 13 August, 2023; originally announced August 2023.

Comments: Camera-ready publication for CIDR'24: https://www.cidrdb.org/cidr2024/papers/p47-bang.pdf

arXiv:2306.10585

Optimizing Stateful Dataflow with Local Rewrites

Authors: Shadaj Laddad, Conor Power, Tyler Hou, Alvin Cheung, Joseph M. Hellerstein

Abstract: Optimizing a stateful dataflow language is a challenging task. There are strict correctness constraints for preserving properties expected by downstream consumers, a large space of possible optimizations, and complex analyses that must reason about the behavior of the program over time. Classic compiler techniques with specialized optimization passes yield unpredictable performance and have complex correctness proofs. But with e-graphs, we can dramatically simplify the process of building a correct optimizer while yielding more consistent results! In this short paper, we discuss our early work using e-graphs to develop an optimizer for the Hydroflow dataflow language. Our prototype demonstrates that composing simple, easy-to-prove rewrite rules is sufficient to match techniques in hand-optimized systems.

Submitted 18 June, 2023; originally announced June 2023.

Comments: EGRAPHS 2023

arXiv:2305.14614

doi 10.1145/3584684.3597272

Invited Paper: Initial Steps Toward a Compiler for Distributed Programs

Authors: Joseph M. Hellerstein, Shadaj Laddad, Mae Milano, Conor Power, Mingwei Samuel

Abstract: In the Hydro project we are designing a compiler toolkit that can optimize for the concerns of distributed systems, including scale-up and scale-down, availability, and consistency of outcomes across replicas. This invited paper overviews the project, and provides an early walk-through of the kind of optimization that is possible. We illustrate how type transformations as well as local program transformations can combine, step by step, to convert a single-node program into a variety of distributed design points that offer the same semantics with different performance and deployment characteristics.

Submitted 23 May, 2023; originally announced May 2023.

Journal ref: The 5th workshop on Advanced tools, programming languages, and PLatforms for Implementing and Evaluating algorithms for Distributed systems (ApPLIED 2023), June 19, 2023, Orlando, FL, USA

arXiv:2210.12605

Keep CALM and CRDT On

Authors: Shadaj Laddad, Conor Power, Mae Milano, Alvin Cheung, Natacha Crooks, Joseph M. Hellerstein

Abstract: Despite decades of research and practical experience, developers have few tools for programming reliable distributed applications without resorting to expensive coordination techniques. Conflict-free replicated datatypes (CRDTs) are a promising line of work that enables coordination-free replication and offers certain eventual consistency guarantees in a relatively simple object-oriented API. Yet CRDT guarantees extend only to data updates; observations of CRDT state are unconstrained and unsafe. We propose an agenda that embraces the simplicity of CRDTs, but provides richer, more uniform guarantees. We extend CRDTs with a query model that reasons about which queries are safe without coordination by applying monotonicity results from the CALM Theorem, and lay out a larger agenda for developing CRDT data stores that let developers safely and efficiently interact with replicated application state.

Submitted 22 October, 2022; originally announced October 2022.
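To make the update/observation distinction concrete, here is a minimal sketch using the textbook grow-only counter (G-Counter) rather than the paper's own query model; the class and method names are illustrative assumptions. Merging replicas is safe in any order, but an exact read of `value()` can be invalidated by later merges, whereas the monotone threshold read `reached(k)` cannot be retracted once it returns True, in the spirit of the CALM monotonicity argument.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT: one non-decreasing slot per replica."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.slots: Dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a G-Counter only supports non-negative increments")
        self.slots[self.replica_id] = self.slots.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent, so
        # replicas converge regardless of delivery order or duplication.
        for rid, count in other.slots.items():
            self.slots[rid] = max(self.slots.get(rid, 0), count)

    def value(self) -> int:
        # Exact read: may still grow after future merges, so it is not stable.
        return sum(self.slots.values())

    def reached(self, k: int) -> bool:
        # Monotone threshold read: once True on a replica, it stays True
        # after any later increment or merge.
        return self.value() >= k


a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
print(a.value(), a.reached(4))  # 2 False  (a has not yet seen b's update)
a.merge(b)
b.merge(a)
print(a.value(), b.value(), a.reached(4))  # 5 5 True
```

Acting on `value() == 2` at replica `a` would have been unsafe without coordination, while any decision gated on `reached(4)` remains valid after every future merge.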

arXiv:2205.12425

Katara: Synthesizing CRDTs with Verified Lifting

Authors: Shadaj Laddad, Conor Power, Mae Milano, Alvin Cheung, Joseph M. Hellerstein

Abstract: Conflict-free replicated data types (CRDTs) are a promising tool for designing scalable, coordination-free distributed systems. However, constructing correct CRDTs is difficult, posing a challenge for even seasoned developers. As a result, CRDT development is still largely the domain of academics, with new designs often awaiting peer review and a manual proof of correctness. In this paper, we present Katara, a program synthesis-based system that takes sequential data type implementations and automatically synthesizes verified CRDT designs from them. Key to this process is a new formal definition of CRDT correctness that combines a reference sequential type with a lightweight ordering constraint that resolves conflicts between non-commutative operations. Our process follows the tradition of work in verified lifting, including an encoding of correctness into SMT logic using synthesized inductive invariants and hand-crafted grammars for the CRDT state and runtime. Katara is able to automatically synthesize CRDTs for a wide variety of scenarios, from reproducing classic CRDTs to synthesizing novel designs based on specifications in existing literature. Crucially, our synthesized CRDTs are fully, automatically verified, eliminating entire classes of common errors and reducing the process of producing a new CRDT from a painstaking paper proof of correctness to a lightweight specification.

Submitted 21 September, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

ACM Class: D.1.2
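The abstract mentions an ordering constraint that resolves conflicts between non-commutative operations, such as a set's add and remove. As a hand-written illustration only (not Katara's synthesized output or its correctness definition), the classic observed-remove set below resolves a concurrent add/remove of the same element in favor of the add; the class name and the `uuid4`-based tagging scheme are assumptions of this sketch.

```python
import uuid
from typing import Set, Tuple

Tagged = Tuple[str, str]  # (element, unique tag of the add that inserted it)


class AddWinsSet:
    """Observed-remove set: a remove cancels only the adds it has already seen."""

    def __init__(self) -> None:
        self.adds: Set[Tagged] = set()
        self.removes: Set[Tagged] = set()

    def add(self, elem: str) -> None:
        # Each add gets a fresh tag, so it is distinguishable from earlier adds.
        self.adds.add((elem, uuid.uuid4().hex))

    def remove(self, elem: str) -> None:
        # Tombstone only the tags observed locally; concurrent adds survive.
        self.removes |= {(e, t) for (e, t) in self.adds if e == elem}

    def merge(self, other: "AddWinsSet") -> None:
        # Set union is commutative, associative, and idempotent.
        self.adds |= other.adds
        self.removes |= other.removes

    def contains(self, elem: str) -> bool:
        return any(e == elem for (e, _) in self.adds - self.removes)


r1, r2 = AddWinsSet(), AddWinsSet()
r1.add("x")
r2.merge(r1)   # both replicas now contain "x"
r2.remove("x")  # replica 2 removes the "x" it has seen...
r1.add("x")     # ...while replica 1 concurrently re-adds it
r1.merge(r2)
r2.merge(r1)
print(r1.contains("x"), r2.contains("x"))  # True True: the concurrent add wins
```

Sequentially, "add then remove" and "remove then add" give different results; the unique tags are what let replicas agree on a single outcome without coordinating.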

arXiv:2106.11445

doi 10.1145/3448016.3457569

KEA: Tuning an Exabyte-Scale Data Infrastructure

Authors: Yiwen Zhu, Subru Krishnan, Konstantinos Karanasos, Isha Tarte, Conor Power, Abhishek Modi, Manoj Kumar, Deli Zhang, Kartheek Muthyala, Nick Jurgens, Sarvesh Sakalanaga, Sudhir Darbha, Minu Iyer, Ankita Agarwal, Carlo Curino

Abstract: Microsoft's internal big-data infrastructure is one of the largest in the world -- with over 300k machines running billions of tasks from over 0.6M daily jobs. Operating this infrastructure is a costly and complex endeavor, and efficiency is paramount. In fact, for over 15 years, a dedicated engineering team has tuned almost every aspect of this infrastructure, achieving state-of-the-art efficiency (>60% average CPU utilization across all clusters). Despite rich telemetry and strong expertise, faced with evolving hardware/software/workloads this manual tuning approach had reached its limit -- we had plateaued. In this paper, we present KEA, a multi-year effort to automate our tuning processes to be fully data/model-driven. KEA leverages a mix of domain knowledge and principled data science to capture the essence of our cluster dynamic behavior in a set of machine learning (ML) models based on collected system data. These models power automated optimization procedures for parameter tuning, and inform our leadership in critical decisions around engineering and capacity management (such as hardware and data center design, software investments, etc.). We combine "observational" tuning (i.e., using models to predict system behavior without direct experimentation) with judicious use of "flighting" (i.e., conservative testing in production). This allows us to support a broad range of applications that we discuss in this paper. KEA continuously tunes our cluster configurations and is on track to save Microsoft tens of millions of dollars per year.
To the best of our knowledge, this paper is the first to discuss research challenges and practical learnings that emerge when tuning an exabyte-scale data infrastructure.

Submitted 21 June, 2021; originally announced June 2021.

arXiv:1903.03405

doi 10.25596/jalc-2018-127

From Helmut Jürgensen's Former Students: The Game of Informatics Research

Authors: Mark Daley, Mark Eramian, Christopher Power, Ian McQuillan

Abstract: Personal reflections are given on being students of Helmut Jürgensen. Then, we attempt to address his hypothesis that informatics follows trend-like behaviours through the use of a content analysis of university job advertisements, and then via simulation techniques from the area of quantitative economics.

Submitted 7 March, 2019; originally announced March 2019.

Journal ref: Journal of Automata, Languages and Combinatorics, 23, 127-141, 2018

arXiv:1805.11352

Characteristics and Motivations of Players with Disabilities in Digital Games

Authors: Jen Beeston, Christopher Power, Paul Cairns, Mark Barlet

Abstract: In research and practice into the accessibility of digital games, much of the work has focused on how to make games accessible to people with disabilities. With an increasing number of people with disabilities playing mainstream commercial games, it is important that we understand who they are and how they play in order to take a more user-centered approach as this field grows. We conducted a demographic survey of 230 players with disabilities and found that they play mainstream digital games using a variety of assistive technologies, use accessibility options such as key remapping and subtitles, and identify themselves as gamers who play digital games as their primary hobby. This gives us a richer picture of players with disabilities and indicates that there are opportunities to begin to look at accessible player experiences (APX) in games.

Submitted 29 May, 2018; originally announced May 2018.

Showing 1–10 of 10 results for author: Power, C