Search | arXiv e-print repository

FELM: Benchmarking Factuality Evaluation of Large Language Models

Authors: Shiqi Chen, Yiran Zhao, Jinghan Zhang, I-Chun Chern, Siyang Gao, Pengfei Liu, Junxian He

Abstract: Assessing factuality of text generated by large language models (LLMs) is an emerging yet crucial research area, aimed at alerting users to potential errors and guiding the development of more reliable LLMs. Nonetheless, the evaluators assessing factuality necessitate suitable evaluation themselves to gauge progress and foster advancements. This direction remains under-explored, resulting in substantial impediments to the progress of factuality evaluators. To mitigate this issue, we introduce a benchmark for Factuality Evaluation of large Language Models, referred to as felm. In this benchmark, we collect responses generated from LLMs and annotate factuality labels in a fine-grained manner. Contrary to previous studies that primarily concentrate on the factuality of world knowledge (e.g., information from Wikipedia), felm focuses on factuality across diverse domains, spanning from world knowledge to math and reasoning. Our annotation is based on text segments, which can help pinpoint specific factual errors. The factuality annotations are further supplemented by predefined error types and reference links that either support or contradict the statement. In our experiments, we investigate the performance of several LLM-based factuality evaluators on felm, including both vanilla LLMs and those augmented with retrieval mechanisms and chain-of-thought processes. Our findings reveal that while retrieval aids factuality evaluation, current LLMs are far from satisfactory to faithfully detect factual errors.

Submitted 28 November, 2023; v1 submitted 1 October, 2023; originally announced October 2023.

Comments: Accepted by NeurIPS 2023 Track on Datasets and Benchmarks

arXiv:2307.13528 [pdf, other]

FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

Authors: I-Chun Chern, Steffi Chern, Shiqi Chen, Weizhe Yuan, Kehua Feng, Chunting Zhou, Junxian He, Graham Neubig, Pengfei Liu

Abstract: The emergence of generative pre-trained models has facilitated the synthesis of high-quality text, but it has also posed challenges in identifying factual errors in the generated text. In particular: (1) A wider range of tasks now face an increasing risk of containing factual errors when handled by generative models. (2) Generated texts tend to be lengthy and lack a clearly defined granularity for individual facts. (3) There is a scarcity of explicit evidence available during the process of fact checking. With the above challenges in mind, in this paper, we propose FacTool, a task and domain agnostic framework for detecting factual errors of texts generated by large language models (e.g., ChatGPT). Experiments on four different tasks (knowledge-based QA, code generation, mathematical reasoning, and scientific literature review) show the efficacy of the proposed method. We release the code of FacTool associated with ChatGPT plugin interface at https://github.com/GAIR-NLP/factool.

Submitted 26 July, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

arXiv:2307.07748 [pdf, other]

Audio-Visual Speech Enhancement Using Self-supervised Learning to Improve Speech Intelligibility in Cochlear Implant Simulations

Authors: Richard Lee Lai, Jen-Cheng Hou, I-Chun Chern, Kuo-Hsuan Hung, Yi-Ting Chen, Mandar Gogate, Tughrul Arslan, Amir Hussain, Yu Tsao

Abstract: Individuals with hearing impairments face challenges in their ability to comprehend speech, particularly in noisy environments. The aim of this study is to explore the effectiveness of audio-visual speech enhancement (AVSE) in enhancing the intelligibility of vocoded speech in cochlear implant (CI) simulations. Notably, the study focuses on a challenging scenario where there is limited availability of training data for the AVSE task. To address this problem, we propose a novel deep neural network framework termed Self-Supervised Learning-based AVSE (SSL-AVSE). The proposed SSL-AVSE combines visual cues, such as lip and mouth movements, from the target speakers with corresponding audio signals. The contextually combined audio and visual data are then fed into a Transformer-based SSL AV-HuBERT model to extract features, which are further processed using a BLSTM-based SE model. The results demonstrate several key findings. Firstly, SSL-AVSE successfully overcomes the issue of limited data by leveraging the AV-HuBERT model. Secondly, by fine-tuning the AV-HuBERT model parameters for the target SE task, significant performance improvements are achieved. Specifically, there is a notable enhancement in PESQ (Perceptual Evaluation of Speech Quality) from 1.43 to 1.67 and in STOI (Short-Time Objective Intelligibility) from 0.70 to 0.74. Furthermore, the performance of the SSL-AVSE was evaluated using CI vocoded speech to assess the intelligibility for CI users. Comparative experimental outcomes reveal that in the presence of dynamic noises encountered during human conversations, SSL-AVSE exhibits a substantial improvement. The NCM (Normal Correlation Matrix) values indicate an increase of 26.5% to 87.2% compared to the noisy baseline.

Submitted 19 March, 2025; v1 submitted 15 July, 2023; originally announced July 2023.

arXiv:2307.04507 [pdf, other]

Improving Factuality of Abstractive Summarization via Contrastive Reward Learning

Authors: I-Chun Chern, Zhiruo Wang, Sanjan Das, Bhavuk Sharma, Pengfei Liu, Graham Neubig

Abstract: Modern abstractive summarization models often generate summaries that contain hallucinated or contradictory information. In this paper, we propose a simple but effective contrastive learning framework that incorporates recent developments in reward learning and factuality metrics. Empirical studies demonstrate that the proposed framework enables summarization models to learn from feedback of factuality metrics using contrastive reward learning, leading to more factual summaries by human evaluations. This suggests that further advances in learning and evaluation algorithms can feed directly into providing more factual summaries.

Submitted 10 July, 2023; originally announced July 2023.

Comments: TrustNLP @ ACL 2023

arXiv:2210.17456 [pdf, other]

Audio-Visual Speech Enhancement and Separation by Utilizing Multi-Modal Self-Supervised Embeddings

Authors: I-Chun Chern, Kuo-Hsuan Hung, Yi-Ting Chen, Tassadaq Hussain, Mandar Gogate, Amir Hussain, Yu Tsao, Jen-Cheng Hou

Abstract: AV-HuBERT, a multi-modal self-supervised learning model, has been shown to be effective for categorical problems such as automatic speech recognition and lip-reading. This suggests that useful audio-visual speech representations can be obtained via utilizing multi-modal self-supervised embeddings. Nevertheless, it is unclear if such representations can be generalized to solve real-world multi-modal AV regression tasks, such as audio-visual speech enhancement (AVSE) and audio-visual speech separation (AVSS). In this study, we leveraged the pre-trained AV-HuBERT model followed by an SE module for AVSE and AVSS. Comparative experimental results demonstrate that our proposed model performs better than the state-of-the-art AVSE and traditional audio-only SE models. In summary, our results confirm the effectiveness of our proposed model for the AVSS task with proper fine-tuning strategies, demonstrating that multi-modal self-supervised embeddings obtained from AV-HuBERT can be generalized to audio-visual regression tasks.

Submitted 31 May, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

Comments: ICASSP AMHAT 2023

arXiv:2102.01984 [pdf, other]

doi 10.1109/ISIT45174.2021.9518018

Decoding of Quantum Data-Syndrome Codes via Belief Propagation

Authors: Kao-Yueh Kuo, I-Chun Chern, Ching-Yi Lai

Abstract: Quantum error correction is necessary to protect logical quantum states and operations. However, no meaningful data protection can be made when the syndrome extraction is erroneous due to faulty measurement gates. Quantum data-syndrome (DS) codes are designed to protect the data qubits and syndrome bits concurrently. In this paper, we propose an efficient decoding algorithm for quantum DS codes with sparse check matrices. Based on a refined belief propagation (BP) decoding for stabilizer codes, we propose a DS-BP algorithm to handle the quaternary quantum data errors and binary syndrome bit errors. Moreover, a sparse quantum code may inherently be able to handle minor syndrome errors so that fewer redundant syndrome measurements are necessary. We demonstrate this with simulations on a quantum hypergraph-product code.

Submitted 3 February, 2021; originally announced February 2021.

Journal ref: in Proc. IEEE International Symposium on Information Theory (ISIT), 2021, pp. 1552–1557

arXiv:1509.05182 [pdf, other]

Ground state patterns and phase transition of spin-1 Bose-Einstein condensates via Γ-convergence theory

Authors: I-Liang Chern, Chiu-Fen Chou, Tien-Tsan Shieh

Abstract: We develop an analytic theory for the ground state patterns and their phase transitions for spin-1 Bose-Einstein condensates on a bounded domain in the presence of a uniform magnetic field. Within the Thomas-Fermi approximation, these ground state patterns are composed of four basic states: magnetic state, nematic state, two-component state and three-component state, separated by interfaces. A complete phase diagram of the ground state patterns is found analytically with different quadratic Zeeman energy q and total magnetization M for both ferromagnetic and antiferromagnetic systems. Using the Γ-convergence technique, it is found that the semi-classical limits of these ground states minimize an energy functional which consists of interior interface energy plus a boundary contact energy. As a consequence, the interface between two different basic states has constant mean curvature, and the contact angle between the interface and the boundary obeys Young's relation.

Submitted 30 September, 2015; v1 submitted 17 September, 2015; originally announced September 2015.

arXiv:1402.2455 [pdf, other]

doi 10.1088/0266-5611/30/5/055003

String-Averaging Expectation-Maximization for Maximum Likelihood Estimation in Emission Tomography

Authors: E. S. Helou, Y. Censor, T. -B. Chen, I-L. Chern, Á. R. De Pierro, M. Jiang, H. H. -S. Lu

Abstract: We study the maximum likelihood model in emission tomography and propose a new family of algorithms for its solution, called String-Averaging Expectation-Maximization (SAEM). In the String-Averaging algorithmic regime, the index set of all underlying equations is split into subsets, called "strings," and the algorithm separately proceeds along each string, possibly in parallel. Then, the end-points of all strings are averaged to form the next iterate. SAEM algorithms with several strings present better practical merits than the classical Row-Action Maximum-Likelihood Algorithm (RAMLA). We present numerical experiments showing the effectiveness of the algorithmic scheme in realistic situations. Performance is evaluated from the computational cost and reconstruction quality viewpoints. A complete convergence theory is also provided.

Submitted 11 February, 2014; originally announced February 2014.
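
The string-averaging scheme described in the abstract above (split the index set into strings, sweep along each string from the current iterate, average the end-points) can be illustrated with a minimal sketch. This is not the authors' implementation: the per-block multiplicative update `em_step`, the uniform default weights, and all function and variable names are assumptions made for illustration, and the paper's exact update rule and relaxation parameters are not reproduced here.

```python
def em_step(x, A, b, rows):
    """One multiplicative EM-type update using only the given rows of A.

    Assumption: this row-block update stands in for the operator applied
    along a string; it is not taken from the paper.
    """
    n = len(x)
    # Column sums of A restricted to the selected rows.
    sens = [sum(A[i][j] for i in rows) for j in range(n)]
    back = [0.0] * n
    for i in rows:
        proj = sum(A[i][j] * x[j] for j in range(n))
        ratio = b[i] / max(proj, 1e-12)  # guard against division by zero
        for j in range(n):
            back[j] += A[i][j] * ratio
    # Components not covered by the selected rows are left unchanged.
    return [x[j] * back[j] / sens[j] if sens[j] > 0 else x[j] for j in range(n)]


def string_averaging_iteration(x, A, b, strings, weights=None):
    """One outer iteration: sweep each string from x, then average end-points.

    `strings` is a list of strings; each string is a list of row-index blocks
    that are processed sequentially. `weights` should be non-negative and sum
    to one (uniform weights are an assumed default).
    """
    if weights is None:
        weights = [1.0 / len(strings)] * len(strings)
    next_x = [0.0] * len(x)
    for w, string in zip(weights, strings):
        y = list(x)  # every string starts from the same current iterate
        for block in string:
            y = em_step(y, A, b, block)
        for j in range(len(x)):
            next_x[j] += w * y[j]  # convex combination of string end-points
    return next_x
```

Because each string starts from the same iterate and does not depend on the others, the loop over strings could run in parallel, which is the property the abstract highlights. With a single string made of one block containing every row, the sketch reduces to a plain EM-type update.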

arXiv:1302.0279 [pdf, ps, other]

Phase transition between two-component and three-component ground states of spin-1 Bose-Einstein condensates

Authors: Liren Lin, I-Liang Chern

Abstract: For an antiferromagnetic spin-1 Bose-Einstein condensate under an applied uniform magnetic field, its ground state $(ψ_1,ψ_0,ψ_{-1})$ undergoes a phase transition from a two-component state ($ψ_0 \equiv 0$) to a three-component state ($ψ_j\ne 0$ for all $j$) at a critical value of the magnetic field. This phenomenon has been observed in numerical simulations as well as in experiments. In this paper, we provide a mathematical proof based on a simple principle found by the authors: a redistribution of the mass densities between different components will decrease the kinetic energy.

Submitted 30 April, 2022; v1 submitted 1 February, 2013; originally announced February 2013.

Comments: 23 pages

MSC Class: 35Q55; 47J30; 49K20

arXiv:1102.0832 [pdf, ps, other]

Proofs of some simplified characterizations of the ground states of spin-1 Bose-Einstein condensates

Authors: Liren Lin, I-Liang Chern

Abstract: We justify some characterizations of the ground states of spin-1 Bose-Einstein condensates exhibited from numerical simulations. For ferromagnetic systems, we show the validity of the single-mode approximation (SMA). For an antiferromagnetic system with nonzero magnetization, we prove the vanishing of the $m_F=0$ component. At the end of the paper some remaining degenerate situations are also discussed. The proofs of the main results are all based on a simple observation, that a redistribution of masses among different components will reduce the kinetic energy.

Submitted 14 May, 2012; v1 submitted 3 February, 2011; originally announced February 2011.

Comments: 13 pages (include bibliography)

MSC Class: 35Q55; 47J30; 49K20

Journal ref: Discrete Contin. Dyn. Syst. Ser. B 19 (2014), no. 4, 1119-1128

Showing 1–10 of 10 results for author: Chern, I