-
FELM: Benchmarking Factuality Evaluation of Large Language Models
Authors:
Shiqi Chen,
Yiran Zhao,
Jinghan Zhang,
I-Chun Chern,
Siyang Gao,
Pengfei Liu,
Junxian He
Abstract:
Assessing the factuality of text generated by large language models (LLMs) is an emerging yet crucial research area, aimed at alerting users to potential errors and guiding the development of more reliable LLMs. Nonetheless, the evaluators assessing factuality necessitate suitable evaluation themselves to gauge progress and foster advancements. This direction remains under-explored, resulting in substantial impediments to the progress of factuality evaluators. To mitigate this issue, we introduce a benchmark for Factuality Evaluation of large Language Models, referred to as FELM. In this benchmark, we collect responses generated from LLMs and annotate factuality labels in a fine-grained manner. Contrary to previous studies that primarily concentrate on the factuality of world knowledge (e.g., information from Wikipedia), FELM focuses on factuality across diverse domains, spanning from world knowledge to math and reasoning. Our annotation is based on text segments, which can help pinpoint specific factual errors. The factuality annotations are further supplemented by predefined error types and reference links that either support or contradict the statement. In our experiments, we investigate the performance of several LLM-based factuality evaluators on FELM, including both vanilla LLMs and those augmented with retrieval mechanisms and chain-of-thought processes. Our findings reveal that while retrieval aids factuality evaluation, current LLMs are far from satisfactory at faithfully detecting factual errors.
Submitted 28 November, 2023; v1 submitted 1 October, 2023;
originally announced October 2023.
-
FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios
Authors:
I-Chun Chern,
Steffi Chern,
Shiqi Chen,
Weizhe Yuan,
Kehua Feng,
Chunting Zhou,
Junxian He,
Graham Neubig,
Pengfei Liu
Abstract:
The emergence of generative pre-trained models has facilitated the synthesis of high-quality text, but it has also posed challenges in identifying factual errors in the generated text. In particular: (1) A wider range of tasks now face an increasing risk of containing factual errors when handled by generative models. (2) Generated texts tend to be lengthy and lack a clearly defined granularity for individual facts. (3) There is a scarcity of explicit evidence available during the process of fact checking. With the above challenges in mind, in this paper, we propose FacTool, a task- and domain-agnostic framework for detecting factual errors in texts generated by large language models (e.g., ChatGPT). Experiments on four different tasks (knowledge-based QA, code generation, mathematical reasoning, and scientific literature review) show the efficacy of the proposed method. We release the code of FacTool associated with ChatGPT plugin interface at https://github.com/GAIR-NLP/factool .
Submitted 26 July, 2023; v1 submitted 25 July, 2023;
originally announced July 2023.
-
Audio-Visual Speech Enhancement Using Self-supervised Learning to Improve Speech Intelligibility in Cochlear Implant Simulations
Authors:
Richard Lee Lai,
Jen-Cheng Hou,
I-Chun Chern,
Kuo-Hsuan Hung,
Yi-Ting Chen,
Mandar Gogate,
Tughrul Arslan,
Amir Hussain,
Yu Tsao
Abstract:
Individuals with hearing impairments face challenges in their ability to comprehend speech, particularly in noisy environments. The aim of this study is to explore the effectiveness of audio-visual speech enhancement (AVSE) in enhancing the intelligibility of vocoded speech in cochlear implant (CI) simulations. Notably, the study focuses on a challenging scenario where there is limited availability of training data for the AVSE task. To address this problem, we propose a novel deep neural network framework termed Self-Supervised Learning-based AVSE (SSL-AVSE). The proposed SSL-AVSE combines visual cues, such as lip and mouth movements, from the target speakers with corresponding audio signals. The contextually combined audio and visual data are then fed into a Transformer-based SSL AV-HuBERT model to extract features, which are further processed using a BLSTM-based SE model. The results demonstrate several key findings. Firstly, SSL-AVSE successfully overcomes the issue of limited data by leveraging the AV-HuBERT model. Secondly, by fine-tuning the AV-HuBERT model parameters for the target SE task, significant performance improvements are achieved. Specifically, there is a notable enhancement in PESQ (Perceptual Evaluation of Speech Quality) from 1.43 to 1.67 and in STOI (Short-Time Objective Intelligibility) from 0.70 to 0.74. Furthermore, the performance of the SSL-AVSE was evaluated using CI vocoded speech to assess the intelligibility for CI users. Comparative experimental outcomes reveal that in the presence of dynamic noises encountered during human conversations, SSL-AVSE exhibits a substantial improvement. The NCM (Normalized Covariance Measure) values indicate an increase of 26.5% to 87.2% compared to the noisy baseline.
Submitted 19 March, 2025; v1 submitted 15 July, 2023;
originally announced July 2023.
-
Improving Factuality of Abstractive Summarization via Contrastive Reward Learning
Authors:
I-Chun Chern,
Zhiruo Wang,
Sanjan Das,
Bhavuk Sharma,
Pengfei Liu,
Graham Neubig
Abstract:
Modern abstractive summarization models often generate summaries that contain hallucinated or contradictory information. In this paper, we propose a simple but effective contrastive learning framework that incorporates recent developments in reward learning and factuality metrics. Empirical studies demonstrate that the proposed framework enables summarization models to learn from feedback of factuality metrics using contrastive reward learning, leading to more factual summaries by human evaluations. This suggests that further advances in learning and evaluation algorithms can feed directly into providing more factual summaries.
Submitted 10 July, 2023;
originally announced July 2023.
-
Audio-Visual Speech Enhancement and Separation by Utilizing Multi-Modal Self-Supervised Embeddings
Authors:
I-Chun Chern,
Kuo-Hsuan Hung,
Yi-Ting Chen,
Tassadaq Hussain,
Mandar Gogate,
Amir Hussain,
Yu Tsao,
Jen-Cheng Hou
Abstract:
AV-HuBERT, a multi-modal self-supervised learning model, has been shown to be effective for categorical problems such as automatic speech recognition and lip-reading. This suggests that useful audio-visual speech representations can be obtained via utilizing multi-modal self-supervised embeddings. Nevertheless, it is unclear if such representations can be generalized to solve real-world multi-modal AV regression tasks, such as audio-visual speech enhancement (AVSE) and audio-visual speech separation (AVSS). In this study, we leveraged the pre-trained AV-HuBERT model followed by an SE module for AVSE and AVSS. Comparative experimental results demonstrate that our proposed model performs better than the state-of-the-art AVSE and traditional audio-only SE models. In summary, our results confirm the effectiveness of our proposed model for the AVSS task with proper fine-tuning strategies, demonstrating that multi-modal self-supervised embeddings obtained from AV-HuBERT can be generalized to audio-visual regression tasks.
Submitted 31 May, 2023; v1 submitted 31 October, 2022;
originally announced October 2022.
-
Decoding of Quantum Data-Syndrome Codes via Belief Propagation
Authors:
Kao-Yueh Kuo,
I-Chun Chern,
Ching-Yi Lai
Abstract:
Quantum error correction is necessary to protect logical quantum states and operations. However, no meaningful data protection can be made when the syndrome extraction is erroneous due to faulty measurement gates. Quantum data-syndrome (DS) codes are designed to protect the data qubits and syndrome bits concurrently. In this paper, we propose an efficient decoding algorithm for quantum DS codes with sparse check matrices. Based on a refined belief propagation (BP) decoding for stabilizer codes, we propose a DS-BP algorithm to handle the quaternary quantum data errors and binary syndrome bit errors. Moreover, a sparse quantum code may inherently be able to handle minor syndrome errors so that fewer redundant syndrome measurements are necessary. We demonstrate this with simulations on a quantum hypergraph-product code.
Submitted 3 February, 2021;
originally announced February 2021.
-
Ground state patterns and phase transition of spin-1 Bose-Einstein condensates via Γ-convergence theory
Authors:
I-Liang Chern,
Chiu-Fen Chou,
Tien-Tsan Shieh
Abstract:
We develop an analytic theory for the ground state patterns and their phase transitions for spin-1 Bose-Einstein condensates on a bounded domain in the presence of a uniform magnetic field. Within the Thomas-Fermi approximation, these ground state patterns are composed of four basic states: magnetic state, nematic state, two-component state and three-component state, separated by interfaces. A complete phase diagram of the ground state patterns is found analytically with different quadratic Zeeman energy q and total magnetization M for both ferromagnetic and antiferromagnetic systems. Using the Γ-convergence technique, it is found that the semi-classical limits of these ground states minimize an energy functional which consists of interior interface energy plus a boundary contact energy. As a consequence, the interface between two different basic states has constant mean curvature, and the contact angle between the interface and the boundary obeys Young's relation.
Submitted 30 September, 2015; v1 submitted 17 September, 2015;
originally announced September 2015.
-
String-Averaging Expectation-Maximization for Maximum Likelihood Estimation in Emission Tomography
Authors:
E. S. Helou,
Y. Censor,
T. -B. Chen,
I-L. Chern,
Á. R. De Pierro,
M. Jiang,
H. H. -S. Lu
Abstract:
We study the maximum likelihood model in emission tomography and propose a new family of algorithms for its solution, called String-Averaging Expectation-Maximization (SAEM). In the String-Averaging algorithmic regime, the index set of all underlying equations is split into subsets, called "strings," and the algorithm separately proceeds along each string, possibly in parallel. Then, the end-points of all strings are averaged to form the next iterate. SAEM algorithms with several strings present better practical merits than the classical Row-Action Maximum-Likelihood Algorithm (RAMLA). We present numerical experiments showing the effectiveness of the algorithmic scheme in realistic situations. Performance is evaluated from the computational cost and reconstruction quality viewpoints. A complete convergence theory is also provided.
Submitted 11 February, 2014;
originally announced February 2014.
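The string-averaging scheme described in the SAEM abstract above (split the equation indices into strings, sweep along each string separately, then average the end-points) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the per-row update is assumed to be a RAMLA-type multiplicative step, the averaging uses equal weights, and the names `row_step`, `string_averaging_iteration`, and the relaxation parameter `relax` are hypothetical.

```python
from typing import List, Sequence


def row_step(x: Sequence[float], row: Sequence[float],
             y_i: float, relax: float) -> List[float]:
    """One row-action update for a single measurement (ASSUMED RAMLA-type form).

    x_j <- x_j + relax * x_j * a_ij * (y_i / <a_i, x> - 1)
    """
    proj = sum(a * v for a, v in zip(row, x))
    if proj <= 0.0:
        # Nothing sensible to do for a non-positive projection; leave x unchanged.
        return list(x)
    ratio = y_i / proj - 1.0
    return [v + relax * v * a * ratio for v, a in zip(x, row)]


def string_averaging_iteration(x: Sequence[float],
                               A: Sequence[Sequence[float]],
                               y: Sequence[float],
                               strings: Sequence[Sequence[int]],
                               relax: float) -> List[float]:
    """One outer iteration: sweep each string from the same starting point,
    then average the end-points with equal weights (an assumption)."""
    end_points = []
    for string in strings:
        z = list(x)  # every string starts from the current iterate
        for i in string:
            z = row_step(z, A[i], y[i], relax)
        end_points.append(z)  # end-point of this string
    count = len(end_points)
    return [sum(component) / count for component in zip(*end_points)]


if __name__ == "__main__":
    # Toy problem: identity system matrix, so the maximum likelihood image is y.
    A = [[1.0, 0.0], [0.0, 1.0]]
    y = [2.0, 3.0]
    x = [1.0, 1.0]
    for _ in range(100):
        x = string_averaging_iteration(x, A, y, strings=[[0], [1]], relax=0.5)
    print(x)
```

Because each string starts from the same iterate and never reads another string's intermediate result, the inner loop over strings could be run in parallel, which matches the "possibly in parallel" remark in the abstract. The sketch assumes `relax` times the largest matrix entry stays below 1 so that the iterates remain positive.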
-
Phase transition between two-component and three-component ground states of spin-1 Bose-Einstein condensates
Authors:
Liren Lin,
I-Liang Chern
Abstract:
For an antiferromagnetic spin-1 Bose-Einstein condensate under an applied uniform magnetic field, its ground state $(ψ_1,ψ_0,ψ_{-1})$ undergoes a phase transition from a two-component state ($ψ_0 \equiv 0$) to a three-component state ($ψ_j\ne 0$ for all $j$) at a critical value of the magnetic field. This phenomenon has been observed in numerical simulations as well as in experiments. In this paper, we provide a mathematical proof based on a simple principle found by the authors: a redistribution of the mass densities between different components will decrease the kinetic energy.
Submitted 30 April, 2022; v1 submitted 1 February, 2013;
originally announced February 2013.
-
Proofs of some simplified characterizations of the ground states of spin-1 Bose-Einstein condensates
Authors:
Liren Lin,
I-Liang Chern
Abstract:
We justify some characterizations of the ground states of spin-1 Bose-Einstein condensates observed in numerical simulations. For ferromagnetic systems, we show the validity of the single-mode approximation (SMA). For an antiferromagnetic system with nonzero magnetization, we prove the vanishing of the $m_F=0$ component. At the end of the paper, some remaining degenerate situations are also discussed. The proofs of the main results are all based on a simple observation: a redistribution of masses among different components will reduce the kinetic energy.
Submitted 14 May, 2012; v1 submitted 3 February, 2011;
originally announced February 2011.