Showing 1–10 of 10 results for author: Amann, B

Searching in archive cs.
  1. arXiv:2505.21329  [pdf, ps, other]

    cs.IR cs.AI cs.CL cs.DB cs.LG

    Something's Fishy In The Data Lake: A Critical Re-evaluation of Table Union Search Benchmarks

    Authors: Allaa Boutaleb, Bernd Amann, Hubert Naacke, Rafael Angarita

    Abstract: Recent table representation learning and data discovery methods tackle table union search (TUS) within data lakes, which involves identifying tables that can be unioned with a given query table to enrich its content. These methods are commonly evaluated using benchmarks that aim to assess semantic understanding in real-world TUS tasks. However, our analysis of prominent TUS benchmarks reveals seve…

    Submitted 28 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

    Comments: Accepted @ ACL 2025's Table Representation Learning Workshop (TRL)

  2. arXiv:2306.02221  [pdf, other]

    cs.IR cs.AI

    ATEM: A Topic Evolution Model for the Detection of Emerging Topics in Scientific Archives

    Authors: Hamed Rahimi, Hubert Naacke, Camelia Constantin, Bernd Amann

    Abstract: This paper presents ATEM, a novel framework for studying topic evolution in scientific archives. ATEM is based on dynamic topic modeling and dynamic graph embedding techniques that explore the dynamics of content and citations of documents within a scientific corpus. ATEM explores a new notion of contextual emergence for the discovery of emerging interdisciplinary research topics based on the dyna…

    Submitted 3 June, 2023; originally announced June 2023.

  3. arXiv:2305.14587  [pdf, other]

    cs.CL cs.IR

    Contextualized Topic Coherence Metrics

    Authors: Hamed Rahimi, Jacob Louis Hoover, David Mimno, Hubert Naacke, Camelia Constantin, Bernd Amann

    Abstract: The recent explosion in work on neural topic modeling has been criticized for optimizing automated topic evaluation metrics at the expense of actual meaningful topic identification. But human annotation remains expensive and time-consuming. We propose LLM-based methods inspired by standard human topic evaluations, in a family of metrics called Contextualized Topic Coherence (CTC). We evaluate both…

    Submitted 23 May, 2023; originally announced May 2023.

  4. arXiv:2302.01501  [pdf, other]

    cs.IR cs.AI cs.LG cs.NE cs.SI

    ANTM: An Aligned Neural Topic Model for Exploring Evolving Topics

    Authors: Hamed Rahimi, Hubert Naacke, Camelia Constantin, Bernd Amann

    Abstract: This paper presents an algorithmic family of dynamic topic models called Aligned Neural Topic Models (ANTM), which combine novel data mining algorithms to provide a modular framework for discovering evolving topics. ANTM maintains the temporal continuity of evolving topics by extracting time-aware features from documents using advanced pre-trained Large Language Models (LLMs) and employing an over…

    Submitted 4 June, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

  5. arXiv:2111.13927  [pdf, other]

    cs.DB

    Controlling the Correctness of Aggregation Operations During Sessions of Interactive Analytic Queries

    Authors: Eric Simon, Bernd Amann, Rutian Liu, Stéphane Gançarski

    Abstract: We present a comprehensive set of conditions and rules to control the correctness of aggregation queries within an interactive data analysis session. The goal is to extend self-service data preparation and BI tools to automatically detect semantically incorrect aggregate queries on analytic tables and views built by using the common analytic operations including filter, project, join, aggregate, u…

    Submitted 6 December, 2021; v1 submitted 27 November, 2021; originally announced November 2021.

    Comments: 58 pages, 23 figures

  6. arXiv:1907.00050  [pdf, other]

    cs.DC cs.DB cs.PF

    State-of-the-Art on Query & Transaction Processing Acceleration

    Authors: Bernd Amann, Youry Khmelevsky, Gaetan Hains

    Abstract: The vast amount of processing power and memory bandwidth provided by modern Graphics Processing Units (GPUs) makes them a platform for data-intensive applications. The database community identified GPUs as effective co-processors for data processing. In the past years, there were many approaches to make use of GPUs at different levels of a database system. In this Internal Technical Report, based o…

    Submitted 26 June, 2019; originally announced July 2019.

    Comments: 7 pages, 4 tables

  7. arXiv:1610.06500  [pdf, other]

    cs.DB

    Continuous Top-k Queries over Real-Time Web Streams

    Authors: Nelly Vouzoukidou, Bernd Amann, Vassilis Christophides

    Abstract: The Web has become a large-scale real-time information system forcing us to revise both how to effectively assess relevance of information for a user and how to efficiently implement information retrieval and dissemination functionality. To increase information relevance, Real-time Web applications such as Twitter and Facebook extend content and social-graph relevance scores with "real-time" user…

    Submitted 20 October, 2016; originally announced October 2016.

  8. arXiv:1604.08903  [pdf, other]

    cs.DB

    SPARQL query processing with Apache Spark

    Authors: Hubert Naacke, Olivier Curé, Bernd Amann

    Abstract: The number of linked data sources and the size of the linked open data graph keep growing every day. As a consequence, semantic RDF services are more and more confronted with various "big data" problems. Query processing is one of them and needs to be efficiently addressed with executions over scalable, highly available and fault tolerant frameworks. Data management systems requiring these propertie…

    Submitted 3 November, 2016; v1 submitted 29 April, 2016; originally announced April 2016.

    Comments: 13 pages (ACM 2 columns format), 10 figures

  9. arXiv:1510.03409  [pdf, other]

    cs.DB

    LiteMat: a scalable, cost-efficient inference encoding scheme for large RDF graphs

    Authors: Olivier Curé, Hubert Naacke, Tendry Randriamalala, Bernd Amann

    Abstract: The number of linked data sources and the size of the linked open data graph keep growing every day. As a consequence, semantic RDF services are more and more confronted with various "big data" problems. Query processing in the presence of inferences is one of them. For instance, to complete the answer set of SPARQL queries, RDF database systems evaluate semantic RDFS relationships (subPropertyOf, su…

    Submitted 12 October, 2015; originally announced October 2015.

    Comments: 8 pages, 1 figure

  10. arXiv:1507.02321  [pdf, other]

    cs.DB

    On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark

    Authors: Olivier Curé, Hubert Naacke, Mohamed-Amine Baazizi, Bernd Amann

    Abstract: Querying very large RDF data sets in an efficient manner requires a sophisticated distribution strategy. Several innovative solutions have recently been proposed for optimizing data distribution with predefined query workloads. This paper presents an in-depth analysis and experimental comparison of five representative and complementary distribution approaches. For achieving fair experimental resul…

    Submitted 8 July, 2015; originally announced July 2015.

    Comments: 16 pages, 3 figures