
Showing 1–5 of 5 results for author: Colak, A

  1. arXiv:2506.07309

    cs.CL

    ConfQA: Answer Only If You Are Confident

    Authors: Yin Huang, Yifan Ethan Xu, Kai Sun, Vera Yan, Alicia Sun, Haidar Khan, Jimmy Nguyen, Mohammad Kachuee, Zhaojiang Lin, Yue Liu, Aaron Colak, Anuj Kumar, Wen-tau Yih, Xin Luna Dong

    Abstract: Can we teach Large Language Models (LLMs) to refrain from hallucinating factual statements? In this paper we present a fine-tuning strategy that we call ConfQA, which can reduce hallucination rate from 20-40% to under 5% across multiple factuality benchmarks. The core idea is simple: when the LLM answers a question correctly, it is trained to continue with the answer; otherwise, it is trained to a…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: 10 pages main content, 10 pages appendix, 5 figures, 7 tables

  2. arXiv:2502.12458

    cs.CL

    An Empirical Evaluation of Encoder Architectures for Fast Real-Time Long Conversational Understanding

    Authors: Annamalai Senthilnathan, Kristjan Arumae, Mohammed Khalilia, Zhengzheng Xing, Aaron R. Colak

    Abstract: Analyzing long text data such as customer call transcripts is a cost-intensive and tedious task. Machine learning methods, namely Transformers, are leveraged to model agent-customer interactions. Unfortunately, Transformers adhere to fixed-length architectures and their self-attention mechanism scales quadratically with input length. Such limitations make it challenging to leverage traditional Tra…

    Submitted 17 February, 2025; originally announced February 2025.

  3. arXiv:2309.05619

    cs.CL

    Effective Proxy for Human Labeling: Ensemble Disagreement Scores in Large Language Models for Industrial NLP

    Authors: Wei Du, Laksh Advani, Yashmeet Gambhir, Daniel J Perry, Prashant Shiralkar, Zhengzheng Xing, Aaron Colak

    Abstract: Large language models (LLMs) have demonstrated significant capability to generalize across a large number of NLP tasks. For industry applications, it is imperative to assess the performance of the LLM on unlabeled production data from time to time to validate for a real-world setting. Human labeling to assess model error requires considerable expense and time delay. Here we demonstrate that ensemb…

    Submitted 19 November, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Camera ready version for 2023 EMNLP (The Third Workshop on Natural Language Generation, Evaluation, and Metrics (GEM))

  4. arXiv:2211.15927

    cs.CL cs.LG

    Compressing Cross-Lingual Multi-Task Models at Qualtrics

    Authors: Daniel Campos, Daniel Perry, Samir Joshi, Yashmeet Gambhir, Wei Du, Zhengzheng Xing, Aaron Colak

    Abstract: Experience management is an emerging business area where organizations focus on understanding the feedback of customers and employees in order to improve their end-to-end experiences. This results in a unique set of machine learning problems to help understand how people feel, discover issues they care about, and find which actions need to be taken on data that are different in content and distrib…

    Submitted 28 November, 2022; originally announced November 2022.

    Comments: accepted to IAAI-23 (part of AAAI-23)

    ACM Class: I.2.7

  5. arXiv:1811.12276

    cs.CL cs.AI cs.LG

    Improving Hospital Mortality Prediction with Medical Named Entities and Multimodal Learning

    Authors: Mengqi Jin, Mohammad Taha Bahadori, Aaron Colak, Parminder Bhatia, Busra Celikkaya, Ram Bhakta, Selvan Senthivel, Mohammed Khalilia, Daniel Navarro, Borui Zhang, Tiberiu Doman, Arun Ravi, Matthieu Liger, Taha Kass-hout

    Abstract: Clinical text provides essential information to estimate the acuity of a patient during hospital stays in addition to structured clinical data. In this study, we explore how clinical text can complement a clinical predictive learning task. We leverage an internal medical natural language processing service to perform named entity extraction and negation detection on clinical notes and compose sele…

    Submitted 3 December, 2018; v1 submitted 29 November, 2018; originally announced November 2018.

    Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216