Showing 1–8 of 8 results for author: Liu, Z H

  1. arXiv:2505.20683  [pdf, ps, other]

    cs.DB

    In-memory Incremental Maintenance of Provenance Sketches [extended version]

    Authors: Pengyuan Li, Boris Glavic, Dieter Gawlick, Vasudha Krishnaswamy, Zhen Hua Liu, Danica Porobic, Xing Niu

    Abstract: Provenance-based data skipping compactly over-approximates the provenance of a query using so-called provenance sketches and utilizes such sketches to speed up the execution of subsequent queries by skipping irrelevant data. However, a sketch captured at some time in the past may become stale if the data has been updated subsequently. Thus, there is a need to maintain provenance sketches. In this…

    Submitted 26 May, 2025; originally announced May 2025.

  2. arXiv:2504.10726  [pdf, other]

    cs.SE

    Beyond the Classroom: Bridging the Gap Between Academia and Industry with a Hands-on Learning Approach

    Authors: Mingyang Xu, Ryan Zheng He Liu, Mark Stoodley, Ladan Tahvildari

    Abstract: Modern software systems require various capabilities to meet architectural and operational demands, such as the ability to scale automatically and recover from sudden failures. Self-adaptive software systems have emerged as a critical focus in software design and operation due to their capacity to autonomously adapt to changing environments. However, educating students on this topic is scarce in a…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by the 2025 IEEE/ACM 37th International Conference on Software Engineering Education and Training (CSEE&T)

  3. arXiv:2403.01003  [pdf, other]

    cs.SE cs.AI

    FlaKat: A Machine Learning-Based Categorization Framework for Flaky Tests

    Authors: Shizhe Lin, Ryan Zheng He Liu, Ladan Tahvildari

    Abstract: Flaky tests can pass or fail non-deterministically, without alterations to a software system. Such tests are frequently encountered by developers and hinder the credibility of test suites. State-of-the-art research incorporates machine learning solutions into flaky test detection and achieves reasonably good accuracy. Moreover, the majority of automated flaky test repair solutions are designed for…

    Submitted 1 March, 2024; originally announced March 2024.

  4. MultiCategory: Multi-model Query Processing Meets Category Theory and Functional Programming

    Authors: Valter Uotila, Jiaheng Lu, Dieter Gawlick, Zhen Hua Liu, Souripriya Das, Gregory Pogossiants

    Abstract: The variety of data is one of the important issues in the era of Big Data. The data are naturally organized in different formats and models, including structured data, semi-structured data, and unstructured data. Prior research has envisioned an approach to abstract multi-model data with a schema category and an instance category by using category theory. In this paper, we demonstrate a system, ca…

    Submitted 30 August, 2021; originally announced September 2021.

    Comments: VLDB'21 Demonstration paper, 4 pages, 6 figures

    Journal ref: Proceedings of the VLDB Endowment, Vol. 14, No. 12: 2663-2666, 2021

  5. arXiv:1804.07156  [pdf, other]

    cs.DB

    Heuristic and Cost-based Optimization for Diverse Provenance Tasks

    Authors: Xing Niu, Raghav Kapoor, Boris Glavic, Dieter Gawlick, Zhen Hua Liu, Vasudha Krishnaswamy, Venkatesh Radhakrishnan

    Abstract: A well-established technique for capturing database provenance as annotations on data is to instrument queries to propagate such annotations. However, even sophisticated query optimizers often fail to produce efficient execution plans for instrumented queries. We develop provenance-aware optimization techniques to address this problem. Specifically, we study algebraic equivalences targeted at inst…

    Submitted 17 April, 2018; originally announced April 2018.

    Comments: IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018, long version, 31 pages. arXiv admin note: substantial text overlap with arXiv:1701.05513

    Journal ref: IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018

  6. arXiv:1707.09930  [pdf, other]

    cs.DB

    Debugging Transactions and Tracking their Provenance with Reenactment

    Authors: Xing Niu, Bahareh Sadat Arab, Seokki Lee, Su Feng, Xun Zou, Dieter Gawlick, Vasudha Krishnaswamy, Zhen Hua Liu, Boris Glavic

    Abstract: Debugging transactions and understanding their execution are of immense importance for developing OLAP applications, to trace causes of errors in production systems, and to audit the operations of a database. However, debugging transactions is hard for several reasons: 1) after the execution of a transaction, its input is no longer available for debugging, 2) internal states of a transaction are t…

    Submitted 31 July, 2017; originally announced July 2017.

    Comments: to appear as "Debugging Transactions and Tracking their Provenance with Reenactment" in PVLDB 2017, vol. 10, no. 12

    ACM Class: H.2

  7. arXiv:1612.08050  [pdf, other]

    cs.DB

    UDBMS: Road to Unification for Multi-model Data Management

    Authors: Jiaheng Lu, Zhen Hua Liu, Pengfei Xu, Chao Zhang

    Abstract: A traditional database system is organized around a single data model that determines how data can be organized, stored and manipulated. But the vision of this paper is to develop new principles and techniques to manage multiple data models against a single, integrated backend. For example, semi-structured, graph and relational models are examples of data models that may be supported by a new sys…

    Submitted 23 December, 2016; originally announced December 2016.

  8. arXiv:1601.00073  [pdf, other]

    cs.DB cs.PL

    Mimir: Bringing CTables into Practice

    Authors: Arindam Nandi, Ying Yang, Oliver Kennedy, Boris Glavic, Ronny Fehling, Zhen Hua Liu, Dieter Gawlick

    Abstract: The present state of the art in analytics requires high upfront investment of human effort and computational resources to curate datasets, even before the first query is posed. So-called pay-as-you-go data curation techniques allow these high costs to be spread out, first by enabling queries over uncertain and incomplete data, and then by assessing the quality of the query results. We describe the…

    Submitted 1 January, 2016; originally announced January 2016.

    Comments: Under submission; the first two authors should be considered joint first authors