Showing 1–3 of 3 results for author: Marcus, R

  1. arXiv:2409.07014

    stat.ML cs.DB cs.LG

    A Practical Theory of Generalization in Selectivity Learning

    Authors: Peizhi Wu, Haoshu Xu, Ryan Marcus, Zachary G. Ives

    Abstract: Query-driven machine learning models have emerged as a promising estimation technique for query selectivities. Yet, surprisingly little is known about the efficacy of these techniques from a theoretical perspective, as there exist substantial gaps between practical solutions and state-of-the-art (SOTA) theory based on the Probably Approximately Correct (PAC) learning framework. In this paper, we a…

    Submitted 7 May, 2025; v1 submitted 11 September, 2024; originally announced September 2024.

    Comments: 15 pages. Technical Report (Extended Version)

  2. arXiv:2006.05265

    cs.LG cs.SE stat.ML

    MISIM: A Neural Code Semantics Similarity System Using the Context-Aware Semantics Structure

    Authors: Fangke Ye, Shengtian Zhou, Anand Venkat, Ryan Marcus, Nesime Tatbul, Jesmin Jahan Tithi, Niranjan Hasabnis, Paul Petersen, Timothy Mattson, Tim Kraska, Pradeep Dubey, Vivek Sarkar, Justin Gottschlich

    Abstract: Code semantics similarity can be used for many tasks such as code recommendation, automated software defect correction, and clone detection. Yet, the accuracy of such systems has not yet reached a level of general purpose reliability. To help address this, we present Machine Inferred Code Similarity (MISIM), a neural code semantics similarity system consisting of two core components: (i) MISIM uses…

    Submitted 2 June, 2021; v1 submitted 5 June, 2020; originally announced June 2020.

    Comments: arXiv admin note: text overlap with arXiv:2003.11118

  3. arXiv:2003.09758

    cs.LG cs.DB stat.ML

    ARDA: Automatic Relational Data Augmentation for Machine Learning

    Authors: Nadiia Chepurko, Ryan Marcus, Emanuel Zgraggen, Raul Castro Fernandez, Tim Kraska, David Karger

    Abstract: Automatic machine learning (AutoML) is a family of techniques to automate the process of training predictive models, aiming to both improve performance and make machine learning more accessible. While many recent works have focused on aspects of the machine learning pipeline like model selection, hyperparameter tuning, and feature selection, relatively few works have focused on automatic data augmen…

    Submitted 21 March, 2020; originally announced March 2020.