Search | arXiv e-print repository

arXiv:2504.19254 [pdf, other]

Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers

Authors: Dylan Bouchard, Mohit Singh Chauhan

Abstract: Hallucinations are a persistent problem with Large Language Models (LLMs). As these models become increasingly used in high-stakes domains, such as healthcare and finance, the need for effective hallucination detection is crucial. To this end, we propose a versatile framework for zero-resource hallucination detection that practitioners can apply to real-world use cases. To achieve this, we adapt a variety of existing uncertainty quantification (UQ) techniques, including black-box UQ, white-box UQ, and LLM-as-a-Judge, transforming them as necessary into standardized response-level confidence scores ranging from 0 to 1. To enhance flexibility, we introduce a tunable ensemble approach that incorporates any combination of the individual confidence scores. This approach enables practitioners to optimize the ensemble for a specific use case for improved performance. To streamline implementation, the full suite of scorers is offered in this paper's companion Python toolkit, UQLM. To evaluate the performance of the various scorers, we conduct an extensive set of experiments using several LLM question-answering benchmarks. We find that our tunable ensemble typically surpasses its individual components and outperforms existing hallucination detection methods. Our results demonstrate the benefits of customized hallucination detection strategies for improving the accuracy and reliability of LLMs.

Submitted 30 April, 2025; v1 submitted 27 April, 2025; originally announced April 2025.

Comments: UQLM repository: https://github.com/cvs-health/uqlm

arXiv:2501.03112 [pdf, other]

doi 10.21105/joss.07570

LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases

Authors: Dylan Bouchard, Mohit Singh Chauhan, David Skarbrevik, Viren Bajaj, Zeya Ahmad

Abstract: Large Language Models (LLMs) have been observed to exhibit bias in numerous ways, potentially creating or worsening outcomes for specific groups identified by protected attributes such as sex, race, sexual orientation, or age. To help address this gap, we introduce LangFair, an open-source Python package that aims to equip LLM practitioners with the tools to evaluate bias and fairness risks relevant to their specific use cases. The package offers functionality to easily generate evaluation datasets, comprised of LLM responses to use-case-specific prompts, and subsequently calculate applicable metrics for the practitioner's use case. To guide in metric selection, LangFair offers an actionable decision framework.

Submitted 6 January, 2025; originally announced January 2025.

Comments: Journal of Open Source Software; LangFair repository: https://github.com/cvs-health/langfair

Journal ref: Journal of Open Source Software, 10(105), 7570 (2025)

arXiv:2409.04480 [pdf]

Asymmetric Bidirectional Quantum Teleportation: Arbitrary bi-modal Information State

Authors: Ankita Pathak, Madan Singh Chauhan, Ravi S. Singh

Abstract: Optical coherent states are experimentally realizable continuous-variable quantum states whose preparation by lasers, as well as manipulation and monitoring by linear optical gadgets, is well established. We propose a strategy to send an arbitrary superposition of four-component bimodal entangled coherent states from a sender to a receiver who, simultaneously, tries to transmit an unknown Schrodinger Cat coherent state to the sender by employing a cluster consisting of three superpositions of two-component bimodal entangled coherent states as the quantum channel and utilizing linear optical gadgets. Heralded detection of photons in the laboratories of sender and receiver, followed by classical communication of even and odd numbers of photons and local unitary operations, accomplishes simultaneous faithful asymmetric bidirectional quantum teleportation with a success probability of one eighth. It is seen that not all detection events implement the protocol and, therefore, one has to locally apply a displacement operator, a necessary evil. We analyze near-faithful partial asymmetric bidirectional quantum teleportation and the associated probability of success therein. We demonstrate that, for an intense coherent optical field, the fidelity approaches unity.

Submitted 26 September, 2024; v1 submitted 5 September, 2024; originally announced September 2024.

Comments: 30 pages; Figs: 1-5; Appendices: A and B; Tables (1-6); To be submitted for possible publication

arXiv:2301.05691 [pdf]

Structural phase transitions in perovskite BaCeO3 with data mining and first-principles theoretical calculations

Authors: Farha Naaz, Manendra S. Chauhan, Kedar Yadav, Surender Singh, Ashok Kumar, Dasari L. V. K. Prasad

Abstract: Several experiments conducted over decades have revealed that the perovskite-structured BaCeO3 goes through a series of temperature-induced structural phase transitions. However, it has been frequently observed that the number of phases and the sequence in which they appear as a function of temperature differ between experiments. Insofar as neutron diffraction and Raman spectroscopy experiments are concerned, four structures are well characterized with three transitions: Pnma to Imma [563 K] to R-3c [673 K] to Pm-3m [1173 K]. In contrast, thermoanalytical methods showed multiple singularities corresponding to at least three more structural transitions at around 830 K, 900 K, and 1030 K. On account of these conflicting experimental findings, we computed the free energy phase diagram for BaCeO3 employing crystal structure data mining in conjunction with first-principles electronic structure and phonon lattice dynamics. A total of 34 polymorphs have been predicted, the most stable of which follows the Glazer classification of the perovskite tilt system. It has been predicted that the Cmcm and P4/mbm phases surpass Pnma at 666 K and 1210 K, respectively. At any temperature, two alternate tetragonal phases (P42/nmc and I4/mcm) are also found to be 20 to 30 meV less favored than Pnma.
While the calculated stability order of the predicted polymorphs is in acceptable agreement with the results of neutron diffraction, the transitions observed in thermoanalytical studies could be ascribed to the development of four novel phases (Cmcm, P4/mbm, P42/nmc, and I4/mcm) at intermediate temperatures. However, our analysis indicates that the R-3c phase is predominantly stabilized over a broad temperature range, masking all subsequent phases up until the cubic Pm-3m. Consequently, the novel phases predicted to occur in thermoanalytical studies are only fleetingly metastable.

Submitted 30 November, 2023; v1 submitted 13 January, 2023; originally announced January 2023.

Comments: 20 pages, 5 figures, 1 table

arXiv:2104.08741 [pdf, other]

CEAR: Cross-Entity Aware Reranker for Knowledge Base Completion

Authors: Keshav Kolluru, Mayank Singh Chauhan, Yatin Nandwani, Parag Singla, Mausam

Abstract: Pre-trained language models (LMs) like BERT have been shown to store factual knowledge about the world. This knowledge can be used to augment the information present in Knowledge Bases, which tend to be incomplete. However, prior attempts at using BERT for the task of Knowledge Base Completion (KBC) resulted in performance worse than embedding-based techniques that rely only on the graph structure. In this work we develop a novel model, Cross-Entity Aware Reranker (CEAR), that uses BERT to re-rank the output of existing KBC models with cross-entity attention. Unlike prior work that scores each entity independently, CEAR uses BERT to score the entities together, which is effective for exploiting its factual knowledge. CEAR achieves a new state of the art for the OLPBench dataset.

Submitted 27 January, 2022; v1 submitted 18 April, 2021; originally announced April 2021.

Comments: We found a bug in the code that invalidates the reported results for FB15k-237 and WN18RR. The results for OLPBench still hold. We are in the process of updating the paper.

arXiv:1901.06358 [pdf, other]

Embedded CNN based vehicle classification and counting in non-laned road traffic

Authors: Mayank Singh Chauhan, Arshdeep Singh, Mansi Khemka, Arneish Prateek, Rijurekha Sen

Abstract: Classifying and counting vehicles in road traffic has numerous applications in the transportation engineering domain. However, the wide variety of vehicles (two-wheelers, three-wheelers, cars, buses, trucks, etc.) plying on roads of developing regions without any lane discipline makes vehicle classification and counting a hard problem to automate. In this paper, we use state-of-the-art Convolutional Neural Network (CNN) based object detection models and train them for multiple vehicle classes using data from Delhi roads. We get up to 75% MAP on an 80-20 train-test split using 5562 video frames from four different locations. As robust network connectivity is scarce in developing regions for continuous video transmissions from the road to cloud servers, we also evaluate the latency, energy and hardware cost of embedded implementations of our CNN model based inferences.

Submitted 18 January, 2019; originally announced January 2019.

Comments: *These authors contributed equally

Showing 1–6 of 6 results for author: Chauhan, M S