Showing 1–6 of 6 results for author: Rausch, J

  1. arXiv:2310.09118  [pdf, other]

    cs.LG cs.CV

    DSG: An End-to-End Document Structure Generator

    Authors: Johannes Rausch, Gentiana Rashiti, Maxim Gusev, Ce Zhang, Stefan Feuerriegel

    Abstract: Information in industry, research, and the public sector is widely stored as rendered documents (e.g., PDF files, scans). Hence, to enable downstream tasks, systems are needed that map rendered documents onto a structured hierarchical format. However, existing systems for this task are limited by heuristics and are not end-to-end trainable. In this work, we introduce the Document Structure Generat…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted at ICDM 2023

  2. arXiv:2201.01654  [pdf, other]

    cs.CV

    TableParser: Automatic Table Parsing with Weak Supervision from Spreadsheets

    Authors: Susie Xi Rao, Johannes Rausch, Peter Egger, Ce Zhang

    Abstract: Tables have been an ever-existing structure to store data. There exist now different approaches to store tabular data physically. PDFs, images, spreadsheets, and CSVs are leading examples. Being able to parse table structures and extract content bounded by these structures is of high importance in many applications. In this paper, we devise TableParser, a system capable of parsing tables in both n…

    Submitted 5 January, 2022; originally announced January 2022.

    Comments: accepted in the AAAI-22 Workshop on Scientific Document Understanding at the Thirty-Sixth AAAI Conference on Artificial Intelligence (SDU@AAAI-22)

  3. arXiv:2010.09818  [pdf, other]

    cs.LG stat.ML

    Online Active Model Selection for Pre-trained Classifiers

    Authors: Mohammad Reza Karimi, Nezihe Merve Gürel, Bojan Karlaš, Johannes Rausch, Ce Zhang, Andreas Krause

    Abstract: Given $k$ pre-trained classifiers and a stream of unlabeled data examples, how can we actively decide when to query a label so that we can distinguish the best model from the rest while making a small number of queries? Answering this question has a profound impact on a range of practical scenarios. In this work, we design an online selective sampling approach that actively selects informative exa…

    Submitted 17 April, 2021; v1 submitted 19 October, 2020; originally announced October 2020.

  4. arXiv:2009.06192  [pdf, other]

    cs.LG cs.CY stat.ML

    A Principled Approach to Data Valuation for Federated Learning

    Authors: Tianhao Wang, Johannes Rausch, Ce Zhang, Ruoxi Jia, Dawn Song

    Abstract: Federated learning (FL) is a popular technique to train machine learning (ML) models on decentralized data sources. In order to sustain long-term participation of data owners, it is important to fairly appraise each data source and compensate data owners for their contribution to the training process. The Shapley value (SV) defines a unique payoff scheme that satisfies many desiderata for a data v…

    Submitted 14 September, 2020; originally announced September 2020.

  5. arXiv:1911.01702  [pdf, other]

    cs.LG cs.CL cs.CV stat.ML

    DocParser: Hierarchical Structure Parsing of Document Renderings

    Authors: Johannes Rausch, Octavio Martinez, Fabian Bissig, Ce Zhang, Stefan Feuerriegel

    Abstract: Translating renderings (e.g. PDFs, scans) into hierarchical document structures is extensively demanded in the daily routines of many real-world applications. However, a holistic, principled approach to inferring the complete hierarchical structure of documents is missing. As a remedy, we developed "DocParser": an end-to-end system for parsing the complete document structure - including all text…

    Submitted 25 January, 2021; v1 submitted 5 November, 2019; originally announced November 2019.

    Comments: AAAI 2021

  6. Living in Parallel Realities -- Co-Existing Schema Versions with a Bidirectional Database Evolution Language

    Authors: Kai Herrmann, Hannes Voigt, Andreas Behrend, Jonas Rausch, Wolfgang Lehner

    Abstract: We introduce end-to-end support of co-existing schema versions within one database. While it is state of the art to run multiple versions of a continuously developed application concurrently, it is hard to do the same for databases. In order to keep multiple co-existing schema versions alive, which are all accessing the same data set, developers usually employ handwritten delta code (e.g. views an…

    Submitted 19 September, 2017; v1 submitted 19 August, 2016; originally announced August 2016.