Showing 1–11 of 11 results for author: Minhas, U F

  1. arXiv:2501.03560

    cs.CL cs.AI cs.LG

    KG-TRICK: Unifying Textual and Relational Information Completion of Knowledge for Multilingual Knowledge Graphs

    Authors: Zelin Zhou, Simone Conia, Daniel Lee, Min Li, Shenglei Huang, Umar Farooq Minhas, Saloni Potdar, Henry Xiao, Yunyao Li

    Abstract: Multilingual knowledge graphs (KGs) provide high-quality relational and textual information for various NLP applications, but they are often incomplete, especially in non-English languages. Previous research has shown that combining information from KGs in different languages aids either Knowledge Graph Completion (KGC), the task of predicting missing relations between entities, or Knowledge Graph…

    Submitted 7 January, 2025; originally announced January 2025.

    Comments: Camera ready for COLING 2025

  2. arXiv:2411.00970

    cs.DB cs.AI cs.LG

    Incremental IVF Index Maintenance for Streaming Vector Search

    Authors: Jason Mohoney, Anil Pacaci, Shihabur Rahman Chowdhury, Umar Farooq Minhas, Jeffery Pound, Cedric Renggli, Nima Reyhani, Ihab F. Ilyas, Theodoros Rekatsinas, Shivaram Venkataraman

    Abstract: The prevalence of vector similarity search in modern machine learning applications and the continuously changing nature of data processed by these applications necessitate efficient and effective index maintenance techniques for vector search indexes. Designed primarily for static workloads, existing vector search indexes degrade in search quality and performance as the underlying data is updated…

    Submitted 1 November, 2024; originally announced November 2024.

    Comments: 14 pages, 14 figures

  3. arXiv:2410.14057

    cs.CL cs.AI

    Towards Cross-Cultural Machine Translation with Retrieval-Augmented Generation from Multilingual Knowledge Graphs

    Authors: Simone Conia, Daniel Lee, Min Li, Umar Farooq Minhas, Saloni Potdar, Yunyao Li

    Abstract: Translating text that contains entity names is a challenging task, as cultural-related references can vary significantly across languages. These variations may also be caused by transcreation, an adaptation process that entails more than transliteration and word-for-word translation. In this paper, we address the problem of cross-cultural translation on two fronts: (i) we introduce XC-Translate, t…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: Accepted at EMNLP 2024

  4. arXiv:2404.01626

    cs.CL cs.IR

    Entity Disambiguation via Fusion Entity Decoding

    Authors: Junxiong Wang, Ali Mousavi, Omar Attia, Ronak Pradeep, Saloni Potdar, Alexander M. Rush, Umar Farooq Minhas, Yunyao Li

    Abstract: Entity disambiguation (ED), which links the mentions of ambiguous entities to their referent entities in a knowledge base, serves as a core component in entity linking (EL). Existing generative approaches demonstrate improved accuracy compared to classification approaches under the standardized ZELDA benchmark. Nevertheless, generative approaches suffer from the need for large-scale pre-training a…

    Submitted 7 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted at NAACL'24 main

  5. arXiv:2311.15781

    cs.AI cs.CL cs.LG

    Increasing Coverage and Precision of Textual Information in Multilingual Knowledge Graphs

    Authors: Simone Conia, Min Li, Daniel Lee, Umar Farooq Minhas, Ihab Ilyas, Yunyao Li

    Abstract: Recent work in Natural Language Processing and Computer Vision has been using textual information -- e.g., entity names and descriptions -- available in knowledge graphs to ground neural models to high-quality structured data. However, when it comes to non-English languages, the quantity and quality of textual information are comparatively scarce. To address this issue, we introduce the novel task…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Camera ready for EMNLP 2023

  6. Growing and Serving Large Open-domain Knowledge Graphs

    Authors: Ihab F. Ilyas, JP Lacerda, Yunyao Li, Umar Farooq Minhas, Ali Mousavi, Jeffrey Pound, Theodoros Rekatsinas, Chiraag Sumanth

    Abstract: Applications of large open-domain knowledge graphs (KGs) to real-world problems pose many unique challenges. In this paper, we present extensions to Saga, our platform for continuous construction and serving of knowledge at scale. In particular, we describe a pipeline for training knowledge graph embeddings that powers key capabilities such as fact ranking, fact verification, a related entities ser…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: To be published in SIGMOD 2023

  7. arXiv:2304.01926

    cs.DB cs.AI cs.LG

    High-Throughput Vector Similarity Search in Knowledge Graphs

    Authors: Jason Mohoney, Anil Pacaci, Shihabur Rahman Chowdhury, Ali Mousavi, Ihab F. Ilyas, Umar Farooq Minhas, Jeffrey Pound, Theodoros Rekatsinas

    Abstract: There is an increasing adoption of machine learning for encoding data into vectors to serve online recommendation and search use cases. As a result, recent data management systems propose augmenting query processing with online vector similarity search. In this work, we explore vector similarity search in the context of Knowledge Graphs (KGs). Motivated by the tasks of finding related KG queries a…

    Submitted 4 April, 2023; originally announced April 2023.

    Comments: 13 pages, 7 figures, to be published in ACM SIGMOD 2023

  8. arXiv:2111.14905

    cs.DB cs.LG

    Bounding the Last Mile: Efficient Learned String Indexing

    Authors: Benjamin Spector, Andreas Kipf, Kapil Vaidya, Chi Wang, Umar Farooq Minhas, Tim Kraska

    Abstract: We introduce the RadixStringSpline (RSS) learned index structure for efficiently indexing strings. RSS is a tree of radix splines each indexing a fixed number of bytes. RSS approaches or exceeds the performance of traditional string indexes while using 7-70$\times$ less memory. RSS achieves this by using the minimal string prefix to sufficiently distinguish the data unlike most learned approaches…

    Submitted 29 November, 2021; originally announced November 2021.

    Comments: 3rd International Workshop on Applied AI for Database Systems and Applications (AIDB'21), August 20, 2021, Copenhagen, Denmark

  9. APEX: A High-Performance Learned Index on Persistent Memory

    Authors: Baotong Lu, Jialin Ding, Eric Lo, Umar Farooq Minhas, Tianzheng Wang

    Abstract: The recently released persistent memory (PM) offers high performance, persistence, and is cheaper than DRAM. This opens up new possibilities for indexes that operate and persist data directly on the memory bus. Recent learned indexes exploit data distribution and have shown great potential for some workloads. However, none support persistence or instant recovery, and existing PM-based indexes typi…

    Submitted 6 December, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: To appear at VLDB 2022 (PVLDB Vol. 15 Issue 3)

  10. arXiv:2004.10898

    cs.DB cs.DS cs.LG

    Qd-tree: Learning Data Layouts for Big Data Analytics

    Authors: Zongheng Yang, Badrish Chandramouli, Chi Wang, Johannes Gehrke, Yinan Li, Umar Farooq Minhas, Per-Åke Larson, Donald Kossmann, Rajeev Acharya

    Abstract: Corporations today collect data at an unprecedented and accelerating scale, making the need to run queries on large datasets increasingly important. Technologies such as columnar block-based data organization and compression have become standard practice in most commercial database systems. However, the problem of best assigning records to data blocks on storage is still open. For example, today's…

    Submitted 22 April, 2020; originally announced April 2020.

    Comments: ACM SIGMOD 2020

  11. arXiv:1905.08898

    cs.DB cs.DS cs.LG

    ALEX: An Updatable Adaptive Learned Index

    Authors: Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David Lomet, Tim Kraska

    Abstract: Recent work on "learned indexes" has changed the way we look at the decades-old field of DBMS indexing. The key idea is that indexes can be thought of as "models" that predict the position of a key in a dataset. Indexes can, thus, be learned. The original work by Kraska et al. shows that a learned index beats a B+Tree by a factor of up to three in search time and by an order of magnitude in memory…

    Submitted 20 May, 2020; v1 submitted 21 May, 2019; originally announced May 2019.

    Report number: MSR-TR-2020-12