
Showing 1–11 of 11 results for author: Hilprecht, B

  1. arXiv:2310.13581  [pdf, other]

    cs.DB cs.AI

    SPARE: A Single-Pass Neural Model for Relational Databases

    Authors: Benjamin Hilprecht, Kristian Kersting, Carsten Binnig

    Abstract: While there has been extensive work on deep neural networks for images and text, deep learning for relational databases (RDBs) is still a rather unexplored field. One direction that recently gained traction is to apply Graph Neural Networks (GNNs) to RDBs. However, training GNNs on large relational databases (i.e., data stored in multiple database tables) is rather inefficient due to multiple ro…

    Submitted 20 October, 2023; originally announced October 2023.

  2. arXiv:2305.15321  [pdf, other]

    cs.DB cs.CL

    Towards Foundation Models for Relational Databases [Vision Paper]

    Authors: Liane Vogel, Benjamin Hilprecht, Carsten Binnig

    Abstract: Tabular representation learning has recently gained a lot of attention. However, existing approaches only learn a representation from a single table, and thus ignore the potential to learn from the full structure of relational databases, including neighboring tables that can contain important information for a contextualized representation. Moreover, current models are significantly limited in sca…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted at the Tabular Representation Learning Workshop at NeurIPS 2022 (TRL@NeurIPS2022)

  3. arXiv:2207.01269  [pdf, other]

    cs.DB cs.LG

    DiffML: End-to-end Differentiable ML Pipelines

    Authors: Benjamin Hilprecht, Christian Hammacher, Eduardo Reis, Mohamed Abdelaal, Carsten Binnig

    Abstract: In this paper, we present our vision of differentiable ML pipelines called DiffML to automate the construction of ML pipelines in an end-to-end fashion. The idea is that DiffML allows to jointly train not just the ML model itself but also the entire pipeline including data preprocessing steps, e.g., data cleaning, feature selection, etc. Our core idea is to formulate all pipeline steps in a differ…

    Submitted 5 July, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

  4. arXiv:2203.14144  [pdf, other]

    cs.DB cs.CL

    Demonstrating CAT: Synthesizing Data-Aware Conversational Agents for Transactional Databases

    Authors: Marius Gassen, Benjamin Hättasch, Benjamin Hilprecht, Nadja Geisler, Alexander Fraser, Carsten Binnig

    Abstract: Databases for OLTP are often the backbone for applications such as hotel room or cinema ticket booking applications. However, developing a conversational agent (i.e., a chatbot-like interface) to allow end-users to interact with an application using natural language requires both immense amounts of training data and NLP expertise. This motivates CAT, which can be used to easily create conversation…

    Submitted 26 March, 2022; originally announced March 2022.

    Comments: Submitted as demonstration proposal to VLDB 2022

  5. arXiv:2201.00561  [pdf, other]

    cs.DB cs.AI

    Zero-Shot Cost Models for Out-of-the-box Learned Cost Prediction

    Authors: Benjamin Hilprecht, Carsten Binnig

    Abstract: In this paper, we introduce zero-shot cost models which enable learned cost estimation that generalizes to unseen databases. In contrast to state-of-the-art workload-driven approaches which require to execute a large set of training queries on every new database, zero-shot cost models thus allow to instantiate a learned cost model out-of-the-box without expensive training data collection. To enabl…

    Submitted 3 January, 2022; originally announced January 2022.

  6. arXiv:2105.12457  [pdf, other]

    cs.DB

    ReStore -- Neural Data Completion for Relational Databases

    Authors: Benjamin Hilprecht, Carsten Binnig

    Abstract: Classical approaches for OLAP assume that the data of all tables is complete. However, in case of incomplete tables with missing tuples, classical approaches fail since the result of a SQL aggregate query might significantly differ from the results computed on the full dataset. Today, the only way to deal with missing data is to manually complete the dataset which causes not only high efforts but…

    Submitted 26 May, 2021; originally announced May 2021.

  7. arXiv:2105.00642  [pdf, other]

    cs.DB cs.AI

    One Model to Rule them All: Towards Zero-Shot Learning for Databases

    Authors: Benjamin Hilprecht, Carsten Binnig

    Abstract: In this paper, we present our vision of so-called zero-shot learning for databases which is a new learning approach for database components. Zero-shot learning for databases is inspired by recent advances in transfer learning of models such as GPT-3 and can support a new database out-of-the-box without the need to train a new model. Furthermore, it can easily be extended to few-shot learning by fu…

    Submitted 3 January, 2022; v1 submitted 3 May, 2021; originally announced May 2021.

  8. arXiv:1909.00607  [pdf, other]

    cs.DB

    DeepDB: Learn from Data, not from Queries!

    Authors: Benjamin Hilprecht, Andreas Schmidt, Moritz Kulessa, Alejandro Molina, Kristian Kersting, Carsten Binnig

    Abstract: The typical approach for learned DBMS components is to capture the behavior by running a representative set of queries and use the observations to train a machine learning model. This workload-driven approach, however, has two major downsides. First, collecting the training data can be very expensive, since all queries need to be executed on potentially large databases. Second, training data has t…

    Submitted 2 September, 2019; originally announced September 2019.

  9. arXiv:1906.03006  [pdf, other]

    cs.CR cs.LG

    Reconstruction and Membership Inference Attacks against Generative Models

    Authors: Benjamin Hilprecht, Martin Härterich, Daniel Bernau

    Abstract: We present two information leakage attacks that outperform previous work on membership inference against generative models. The first attack allows membership inference without assumptions on the type of the generative model. Contrary to previous evaluation metrics for generative models, like Kernel Density Estimation, it only considers samples of the model which are close to training data records…

    Submitted 7 June, 2019; originally announced June 2019.

  10. arXiv:1904.01279  [pdf, other]

    cs.DB

    Learning a Partitioning Advisor with Deep Reinforcement Learning

    Authors: Benjamin Hilprecht, Carsten Binnig, Uwe Roehm

    Abstract: Commercial data analytics products such as Microsoft Azure SQL Data Warehouse or Amazon Redshift provide ready-to-use scale-out database solutions for OLAP-style workloads in the cloud. While the provisioning of a database cluster is usually fully automated by cloud providers, customers typically still have to make important design decisions which were traditionally made by the database administra…

    Submitted 2 April, 2019; originally announced April 2019.

  11. arXiv:1811.06224  [pdf, other]

    cs.DB cs.LG

    Model-based Approximate Query Processing

    Authors: Moritz Kulessa, Alejandro Molina, Carsten Binnig, Benjamin Hilprecht, Kristian Kersting

    Abstract: Interactive visualizations are arguably the most important tool to explore, understand and convey facts about data. In the past years, the database community has been working on different techniques for Approximate Query Processing (AQP) that aim to deliver an approximate query result given a fixed time bound to support interactive visualizations better. However, classical AQP approaches suffer fr…

    Submitted 15 November, 2018; originally announced November 2018.