Showing 1–15 of 15 results for author: Kennedy, O

Searching in archive cs.

  1. arXiv:2302.08676  [pdf, other]

    cs.DB

    Efficient Approximation of Certain and Possible Answers for Ranking and Window Queries over Uncertain Data (Extended version)

    Authors: Su Feng, Boris Glavic, Oliver Kennedy

    Abstract: Uncertainty arises naturally in many application domains due to, e.g., data entry errors and ambiguity in data cleaning. Prior work in incomplete and probabilistic databases has investigated the semantics and efficient evaluation of ranking and top-k queries over uncertain data. However, most approaches deal with top-k and ranking in isolation and do represent uncertain input data and query results…

    Submitted 3 May, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

  2. arXiv:2204.02758  [pdf, ps, other]

    cs.DB cs.CC

    Computing expected multiplicities for bag-TIDBs with bounded multiplicities

    Authors: Su Feng, Boris Glavic, Aaron Huber, Oliver Kennedy, Atri Rudra

    Abstract: In this work, we study the problem of computing a tuple's expected multiplicity over probabilistic databases with bag semantics (where each tuple is associated with a multiplicity) exactly and approximately. We consider bag-TIDBs where we have a bound $c$ on the maximum multiplicity of each tuple and tuples are independent probabilistic events (we refer to such databases as c-TIDBs). We are specifi…

    Submitted 1 July, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: Added grant acknowledgements in v.3

  3. arXiv:2104.01241  [pdf, other]

    cs.PL cs.DB

    TreeToaster: Towards an IVM-Optimized Compiler

    Authors: Darshana Balakrishnan, Carl Nuessle, Oliver Kennedy, Lukasz Ziarek

    Abstract: A compiler's optimizer operates over abstract syntax trees (ASTs), continuously applying rewrite rules to replace subtrees of the AST with more efficient ones. Especially on large source repositories, even simply finding opportunities for a rewrite can be expensive, as the optimizer traverses the AST naively. In this paper, we leverage the need to repeatedly find rewrites, and explore options for maki…

    Submitted 2 April, 2021; originally announced April 2021.

    Comments: 23 pages, 17 figures

  4. arXiv:2102.11796  [pdf, other]

    cs.DB

    Efficient Uncertainty Tracking for Complex Queries with Attribute-level Bounds (extended version)

    Authors: Su Feng, Aaron Huber, Boris Glavic, Oliver Kennedy

    Abstract: Certain answers are a principled method for coping with the uncertainty that arises in many practical data management tasks. Unfortunately, this method is expensive and may exclude useful (if uncertain) answers. Prior work introduced Uncertainty Annotated Databases (UA-DBs), which combine an under- and over-approximation of certain answers. UA-DBs combine the reliability of certain answers based o…

    Submitted 23 February, 2021; originally announced February 2021.

  5. arXiv:1904.00234  [pdf, other]

    cs.DB

    Uncertainty Annotated Databases - A Lightweight Approach for Approximating Certain Answers (extended version)

    Authors: Su Feng, Aaron Huber, Boris Glavic, Oliver Kennedy

    Abstract: Certain answers are a principled method for coping with uncertainty that arises in many practical data management tasks. Unfortunately, this method is expensive and may exclude useful (if uncertain) answers. Thus, users frequently resort to less principled approaches to resolve the uncertainty. In this paper, we propose Uncertainty Annotated Databases (UA-DBs), which combine an under- and over-app…

    Submitted 30 March, 2019; originally announced April 2019.

  6. arXiv:1901.07627  [pdf, other]

    cs.DB

    Just-in-Time Index Compilation

    Authors: Darshana Balakrishnan, Lukasz Ziarek, Oliver Kennedy

    Abstract: Creating or modifying a primary index is a time-consuming process, as the index typically needs to be rebuilt from scratch. In this paper, we explore a more graceful "just-in-time" approach to index reorganization, where small changes are dynamically applied in the background. To enable this type of reorganization, we formalize a composable organizational grammar, expressive enough to capture inst…

    Submitted 22 January, 2019; originally announced January 2019.

    Comments: Work Supported by NSF Award #IIS-1617586

  7. arXiv:1809.00405  [pdf, other]

    cs.DB

    Query Log Compression for Workload Analytics

    Authors: Ting Xie, Oliver Kennedy, Varun Chandola

    Abstract: Analyzing database access logs is a key part of performance tuning, intrusion detection, benchmark development, and many other database administration tasks. Unfortunately, it is common for production databases to deal with millions or even more queries each day, so these logs must be summarized before they can be used. Designing an appropriate summary encoding requires trading off between concise…

    Submitted 29 September, 2018; v1 submitted 2 September, 2018; originally announced September 2018.

    Comments: Typos fixed, some irrelevant figures and paragraphs are trimmed

  8. arXiv:1608.01013  [pdf, other]

    cs.DB

    Summarizing Large Query Logs in Ettu

    Authors: Gokhan Kul, Duc Luong, Ting Xie, Patrick Coonan, Varun Chandola, Oliver Kennedy, Shambhu Upadhyaya

    Abstract: Database access logs are large, unwieldy, and hard for humans to inspect and summarize. In spite of this, they remain the canonical go-to resource for tasks ranging from performance tuning to security auditing. In this paper, we address the challenge of compactly encoding large sequences of SQL queries for presentation to a human user. Our approach is based on the Weisfeiler-Lehman (WL) approximat…

    Submitted 2 August, 2016; originally announced August 2016.

    Comments: there are 12 pages, 8 figures, 4 tables and 28 referenced papers in bibliography

  9. arXiv:1606.02250  [pdf, other]

    cs.DB

    Communicating Data Quality in On-Demand Curation

    Authors: Poonam Kumari, Said Achmiz, Oliver Kennedy

    Abstract: On-demand curation (ODC) tools like Paygo, KATARA, and Mimir allow users to defer expensive curation effort until it is necessary. In contrast to classical databases that do not respond to queries over potentially erroneous data, ODC systems instead answer with guesses or approximations. The quality and scope of these guesses may vary and it is critical that an ODC system be able to communicate th…

    Submitted 7 June, 2016; originally announced June 2016.

    Comments: Under submission

  10. arXiv:1606.00046  [pdf, other]

    cs.DB

    The Exception that Improves the Rule

    Authors: Juliana Freire, Boris Glavic, Oliver Kennedy, Heiko Mueller

    Abstract: The database community has developed numerous tools and techniques for data curation and exploration, from declarative languages, to specialized techniques for data repair, and more. Yet, there is currently no consensus on how to best expose these powerful tools to an analyst in a simple, intuitive, and above all, flexible way. Thus, analysts continue to rely on tools such as spreadsheets, imperat…

    Submitted 31 May, 2016; originally announced June 2016.

    Comments: Authors in alphabetical order; Preprint for HILDA 2015

  11. arXiv:1601.00073  [pdf, other]

    cs.DB cs.PL

    Mimir: Bringing CTables into Practice

    Authors: Arindam Nandi, Ying Yang, Oliver Kennedy, Boris Glavic, Ronny Fehling, Zhen Hua Liu, Dieter Gawlick

    Abstract: The present state of the art in analytics requires high upfront investment of human effort and computational resources to curate datasets, even before the first query is posed. So-called pay-as-you-go data curation techniques allow these high costs to be spread out, first by enabling queries over uncertain and incomplete data, and then by assessing the quality of the query results. We describe the…

    Submitted 1 January, 2016; originally announced January 2016.

    Comments: Under submission; The first two authors should be considered a joint first-author

  12. arXiv:1303.4471  [pdf, ps, other]

    cs.DB

    BarQL: Collaborating Through Change

    Authors: Oliver Kennedy, Lukasz Ziarek

    Abstract: Applications such as Google Docs, Office 365, and Dropbox show a growing trend towards incorporating multi-user live collaboration functionality into web applications. These collaborative applications share a need to efficiently express shared state, and a common strategy for doing so is a shared log abstraction. Extensive research efforts on log abstractions by the database, programming languages…

    Submitted 18 March, 2013; originally announced March 2013.

    Comments: BarQL reference document

  13. arXiv:1207.0137  [pdf, other]

    cs.DB

    DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views

    Authors: Yanif Ahmad, Oliver Kennedy, Christoph Koch, Milos Nikolic

    Abstract: Applications ranging from algorithmic trading to scientific data analysis require realtime analytics based on views over databases that change at very high rates. Such views have to be kept fresh at low maintenance cost and latencies. At the same time, these views have to support classical SQL, rather than window semantics, to enable applications that combine current with aged or historical data.…

    Submitted 30 June, 2012; originally announced July 2012.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 10, pp. 968-979 (2012)

  14. arXiv:1008.3551  [pdf, other]

    cs.CE

    Inventory Allocation for Online Graphical Display Advertising

    Authors: Jian Yang, Erik Vee, Sergei Vassilvitskii, John Tomlin, Jayavel Shanmugasundaram, Tasos Anastasakos, Oliver Kennedy

    Abstract: We discuss a multi-objective/goal programming model for the allocation of inventory of graphical advertisements. The model considers two types of campaigns: guaranteed delivery (GD), which are sold months in advance, and non-guaranteed delivery (NGD), which are sold using real-time auctions. We investigate various advertiser and publisher objectives such as (a) revenue from the sale of impressions…

    Submitted 20 August, 2010; originally announced August 2010.

    Report number: YL-2010-004

  15. arXiv:0810.3227  [pdf, ps, other]

    cs.DC cs.DB cs.DS

    Dynamic Approaches to In-Network Aggregation

    Authors: Oliver Kennedy, Christoph Koch, Al Demers

    Abstract: Collaboration between small-scale wireless devices hinges on their ability to infer properties shared across multiple nearby nodes. Wireless-enabled mobile devices in particular create a highly dynamic environment not conducive to distributed reasoning about such global properties. This paper addresses a specific instance of this problem: distributed aggregation. We present extensions to existin…

    Submitted 17 October, 2008; originally announced October 2008.