Showing 1–4 of 4 results for author: Everaert, D

  1. arXiv:2411.14572  [pdf, other]

    cs.LG cs.CL

    Towards Knowledge Checking in Retrieval-augmented Generation: A Representation Perspective

    Authors: Shenglai Zeng, Jiankun Zhang, Bingheng Li, Yuping Lin, Tianqi Zheng, Dante Everaert, Hanqing Lu, Hui Liu, Hui Liu, Yue Xing, Monica Xiao Cheng, Jiliang Tang

    Abstract: Retrieval-Augmented Generation (RAG) systems have shown promise in enhancing the performance of Large Language Models (LLMs). However, these systems face challenges in effectively integrating external knowledge with the LLM's internal knowledge, often leading to issues with misleading or unhelpful information. This work aims to provide a systematic study on knowledge checking in RAG systems. We co…

    Submitted 21 November, 2024; originally announced November 2024.

  2. arXiv:2411.04129  [pdf, other]

    cs.IR cs.AI cs.LG

    AmazonQAC: A Large-Scale, Naturalistic Query Autocomplete Dataset

    Authors: Dante Everaert, Rohit Patki, Tianqi Zheng, Christopher Potts

    Abstract: Query Autocomplete (QAC) is a critical feature in modern search engines, facilitating user interaction by predicting search queries based on input prefixes. Despite its widespread adoption, the absence of large-scale, realistic datasets has hindered advancements in QAC system development. This paper addresses this gap by introducing AmazonQAC, a new QAC dataset sourced from Amazon Search logs, com…

    Submitted 22 October, 2024; originally announced November 2024.

    Comments: EMNLP 2024

  3. arXiv:2410.11655  [pdf, other]

    cs.CL cs.AI

    Retrieval Augmented Spelling Correction for E-Commerce Applications

    Authors: Xuan Guo, Rohit Patki, Dante Everaert, Christopher Potts

    Abstract: The rapid introduction of new brand names into everyday language poses a unique challenge for e-commerce spelling correction services, which must distinguish genuine misspellings from novel brand names that use unconventional spelling. We seek to address this challenge via Retrieval Augmented Generation (RAG). On this approach, product names are retrieved from a catalog and incorporated into the c…

    Submitted 15 October, 2024; originally announced October 2024.

  4. arXiv:2306.11670  [pdf, other]

    cs.LG cs.AI

    GIO: Gradient Information Optimization for Training Dataset Selection

    Authors: Dante Everaert, Christopher Potts

    Abstract: It is often advantageous to train models on a subset of the available train examples, because the examples are of variable quality or because one would like to train with fewer examples, without sacrificing performance. We present Gradient Information Optimization (GIO), a scalable, task-agnostic approach to this data selection problem that requires only a small set of (unlabeled) examples represe…

    Submitted 26 July, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: ICLR 2024 Spotlight paper