Search | arXiv e-print repository

LLM-based Corroborating and Refuting Evidence Retrieval for Scientific Claim Verification

Authors: Siyuan Wang, James R. Foulds, Md Osman Gani, Shimei Pan

Abstract: In this paper, we introduce CIBER (Claim Investigation Based on Evidence Retrieval), an extension of the Retrieval-Augmented Generation (RAG) framework designed to identify corroborating and refuting documents as evidence for scientific claim verification. CIBER addresses the inherent uncertainty in Large Language Models (LLMs) by evaluating response consistency across diverse interrogation probes. By focusing on the behavioral analysis of LLMs without requiring access to their internal information, CIBER is applicable to both white-box and black-box models. Furthermore, CIBER operates in an unsupervised manner, enabling easy generalization across various scientific domains. Comprehensive evaluations conducted using LLMs with varying levels of linguistic proficiency reveal CIBER's superior performance compared to conventional RAG approaches. These findings not only highlight the effectiveness of CIBER but also provide valuable insights for future advancements in LLM-based scientific claim verification.

Submitted 10 March, 2025; originally announced March 2025.

arXiv:2502.07790

Can Generative AI be Egalitarian?

Authors: Philip Feldman, James R. Foulds, Shimei Pan

Abstract: The recent explosion of "foundation" generative AI models has been built upon the extensive extraction of value from online sources, often without corresponding reciprocation. This pattern mirrors and intensifies the extractive practices of surveillance capitalism, while the potential for enormous profit has challenged technology organizations' commitments to responsible AI practices, raising significant ethical and societal concerns. However, a promising alternative is emerging: the development of models that rely on content willingly and collaboratively provided by users. This article explores this "egalitarian" approach to generative AI, taking inspiration from the successful model of Wikipedia. We explore the potential implications of this approach for the design, development, and constraints of future foundation models. We argue that such an approach is not only ethically sound but may also lead to models that are more responsive to user needs, more diverse in their training data, and ultimately more aligned with societal values. Furthermore, we explore potential challenges and limitations of this approach, including issues of scalability, quality control, and potential biases inherent in volunteer-contributed content.

Submitted 20 January, 2025; originally announced February 2025.

Comments: 14 pages, 5 figures

ACM Class: K.4.1

Journal ref: October 2024 IEEE Consumer Technology Society (CTSoc) News on Consumer Technology (https://ctsoc.ieee.org/images/CTSOC-NCT-2024-10-FA.pdf)

arXiv:2403.01193

RAGged Edges: The Double-Edged Sword of Retrieval-Augmented Chatbots

Authors: Philip Feldman, James R. Foulds, Shimei Pan

Abstract: Large language models (LLMs) like ChatGPT demonstrate the remarkable progress of artificial intelligence. However, their tendency to hallucinate -- generate plausible but false information -- poses a significant challenge. This issue is critical, as seen in recent court cases where ChatGPT's use led to citations of non-existent legal rulings. This paper explores how Retrieval-Augmented Generation (RAG) can counter hallucinations by integrating external knowledge with prompts. We empirically evaluate RAG against standard LLMs using prompts designed to induce hallucinations. Our results show that RAG increases accuracy in some cases, but can still be misled when prompts directly contradict the model's pre-trained understanding. These findings highlight the complex nature of hallucinations and the need for more robust solutions to ensure LLM reliability in real-world applications. We offer practical recommendations for RAG deployment and discuss implications for the development of more trustworthy LLMs.

Submitted 12 June, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

Comments: 7 Pages, 1 Figure, 1 Table

ACM Class: H.3.3; I.2.7

arXiv:2402.01663

Killer Apps: Low-Speed, Large-Scale AI Weapons

Authors: Philip Feldman, Aaron Dant, James R. Foulds

Abstract: The accelerating advancements in Artificial Intelligence (AI) and Machine Learning (ML), highlighted by the development of cutting-edge Generative Pre-trained Transformer (GPT) models by organizations such as OpenAI, Meta, and Anthropic, present new challenges and opportunities in warfare and security. Much of the current focus is on AI's integration within weapons systems and its role in rapid decision-making in kinetic conflict. However, an equally important but often overlooked aspect is the potential of AI-based psychological manipulation at internet scales within the information domain. These capabilities could pose significant threats to individuals, organizations, and societies globally. This paper explores the concept of AI weapons, their deployment, detection, and potential countermeasures.

Submitted 17 June, 2024; v1 submitted 14 January, 2024; originally announced February 2024.

Comments: 10 pages with 10 pages of appendices. 3 Figures, 2 code listings

ACM Class: I.2.7; H.4.3; J.4

Journal ref: Workshops at the International Conference on Intelligent User Interfaces (IUI) 2024

arXiv:2306.06085

Trapping LLM Hallucinations Using Tagged Context Prompts

Authors: Philip Feldman, James R. Foulds, Shimei Pan

Abstract: Recent advances in large language models (LLMs), such as ChatGPT, have led to highly sophisticated conversation agents. However, these models suffer from "hallucinations," where the model generates false or fabricated information. Addressing this challenge is crucial, particularly with AI-driven platforms being adopted across various sectors. In this paper, we propose a novel method to recognize and flag instances when LLMs perform outside their domain knowledge, ensuring users receive accurate information. We find that the use of context combined with embedded tags can successfully combat hallucinations within generative language models. To do this, we baseline hallucination frequency in no-context prompt-response pairs using generated URLs as easily-tested indicators of fabricated data. We observed a significant reduction in overall hallucination when context was supplied along with question prompts for tested generative engines. Lastly, we evaluated how placing tags within contexts impacted model responses and were able to eliminate hallucinations in responses with 98.88% effectiveness.

Submitted 9 June, 2023; originally announced June 2023.

Comments: 13 pages, 3 Figures, 2 Tables

ACM Class: I.2.7; K.4.2

arXiv:2301.05198

The Keyword Explorer Suite: A Toolkit for Understanding Online Populations

Authors: Philip Feldman, Shimei Pan, James R. Foulds

Abstract: We have developed a set of Python applications that use large language models to identify and analyze data from social media platforms relevant to a population of interest. Our pipeline begins with using OpenAI's GPT-3 to generate potential keywords for identifying relevant text content from the target population. The keywords are then validated, and the content downloaded and analyzed using GPT-3 embedding and manifold reduction. Corpora are then created to fine-tune GPT-2 models to explore latent information via prompt-based queries. These tools allow researchers and practitioners to gain valuable insights into population subgroups online. Source code at https://github.com/pgfeldman/KeywordExplorer

Submitted 13 January, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

Comments: 6 pages, 4 figures

ACM Class: H.5.2; H.1.2; I.2.7

arXiv:2209.07044

Fair Inference for Discrete Latent Variable Models

Authors: Rashidul Islam, Shimei Pan, James R. Foulds

Abstract: It is now well understood that machine learning models, trained on data without due care, often exhibit unfair and discriminatory behavior against certain populations. Traditional algorithmic fairness research has mainly focused on supervised learning tasks, particularly classification. While fairness in unsupervised learning has received some attention, the literature has primarily addressed fair representation learning of continuous embeddings. In this paper, we conversely focus on unsupervised learning using probabilistic graphical models with discrete latent variables. We develop a fair stochastic variational inference technique for the discrete latent variables, which is accomplished by including a fairness penalty on the variational distribution that aims to respect the principles of intersectionality, a critical lens on fairness from the legal, social science, and humanities literature, and then optimizing the variational parameters under this penalty. We first show the utility of our method in improving equity and fairness for clustering using naïve Bayes and Gaussian mixture models on benchmark datasets. To demonstrate the generality of our approach and its potential for real-world impact, we then develop a special-purpose graphical model for criminal justice risk assessments, and use our fairness approach to prevent the inferences from encoding unfair societal biases.

Submitted 15 September, 2022; originally announced September 2022.

arXiv:2204.07483

Polling Latent Opinions: A Method for Computational Sociolinguistics Using Transformer Language Models

Authors: Philip Feldman, Aaron Dant, James R. Foulds, Shimei Pan

Abstract: Text analysis of social media for sentiment, topic analysis, and other analysis depends initially on the selection of keywords and phrases that will be used to create the research corpora. However, keywords that researchers choose may occur infrequently, leading to errors that arise from using small samples. In this paper, we use the capacity for memorization, interpolation, and extrapolation of Transformer Language Models such as the GPT series to learn the linguistic behaviors of a subgroup within larger corpora of Yelp reviews. We then use prompt-based queries to generate synthetic text that can be analyzed to produce insights into specific opinions held by the populations that the models were trained on. Once learned, more specific sentiment queries can be made of the model with high levels of accuracy when compared to traditional keyword searches. We show that even in cases where a specific keyphrase is limited or not present at all in the training corpora, the GPT is able to accurately generate large volumes of text that have the correct sentiment.

Submitted 19 April, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

Comments: 10 pages, 9 figures, 7 tables

ACM Class: K.4.m

arXiv:2105.07996

Learning User Embeddings from Temporal Social Media Data: A Survey

Authors: Fatema Hasan, Kevin S. Xu, James R. Foulds, Shimei Pan

Abstract: User-generated data on social media contain rich information about who we are, what we like and how we make decisions. In this paper, we survey representative work on learning a concise latent user representation (a.k.a. user embedding) that can capture the main characteristics of a social media user. The learned user embeddings can later be used to support different downstream user analysis tasks such as personality modeling, suicidal risk assessment and purchase decision prediction. The temporal nature of user-generated data on social media has largely been overlooked in much of the existing user embedding literature. In this survey, we focus on research that bridges the gap by incorporating temporal/sequential information in user representation learning. We categorize relevant papers along several key dimensions, identify limitations in the current work and suggest future research directions.

Submitted 17 May, 2021; originally announced May 2021.

arXiv:2104.10259

Analyzing COVID-19 Tweets with Transformer-based Language Models

Authors: Philip Feldman, Sim Tiwari, Charissa S. L. Cheah, James R. Foulds, Shimei Pan

Abstract: This paper describes a method for using Transformer-based Language Models (TLMs) to understand public opinion from social media posts. In this approach, we train a set of GPT models on several COVID-19 tweet corpora that reflect populations of users with distinctive views. We then use prompt-based queries to probe these models to reveal insights into the biases and opinions of the users. We demonstrate how this approach can be used to produce results which resemble polling the public on diverse social, political and public health issues. The results on the COVID-19 tweet data show that transformer language models are promising tools that can help us understand public opinions on social media at scale.

Submitted 5 May, 2021; v1 submitted 20 April, 2021; originally announced April 2021.

Comments: Six pages, six tables, four figures

ACM Class: J.4; I.2.7

arXiv:2010.06820

Equitable Allocation of Healthcare Resources with Fair Cox Models

Authors: Kamrun Naher Keya, Rashidul Islam, Shimei Pan, Ian Stockwell, James R. Foulds

Abstract: Healthcare programs such as Medicaid provide crucial services to vulnerable populations, but due to limited resources, many of the individuals who need these services the most languish on waiting lists. Survival models, e.g. the Cox proportional hazards model, can potentially improve this situation by predicting individuals' levels of need, which can then be used to prioritize the waiting lists. Providing care to those in need can prevent institutionalization for those individuals, which both improves quality of life and reduces overall costs. While the benefits of such an approach are clear, care must be taken to ensure that the prioritization process is fair or independent of demographic information-based harmful stereotypes. In this work, we develop multiple fairness definitions for survival models and corresponding fair Cox proportional hazards models to ensure equitable allocation of healthcare resources. We demonstrate the utility of our methods in terms of fairness and predictive accuracy on two publicly available survival datasets.

Submitted 14 October, 2020; originally announced October 2020.

Comments: AAAI Fall Symposium on AI in Government and Public Sector (AAAI FSS-20), 2020

arXiv:1909.04702

Neural Embedding Allocation: Distributed Representations of Topic Models

Authors: Kamrun Naher Keya, Yannis Papanikolaou, James R. Foulds

Abstract: Word embedding models such as the skip-gram learn vector representations of words' semantic relationships, and document embedding models learn similar representations for documents. On the other hand, topic models provide latent representations of the documents' topical themes. To get the benefits of these representations simultaneously, we propose a unifying algorithm, called neural embedding allocation (NEA), which deconstructs topic models into interpretable vector-space embeddings of words, topics, documents, authors, and so on, by learning neural embeddings to mimic the topic models. We showcase NEA's effectiveness and generality on LDA, author-topic models and the recently proposed mixed membership skip gram topic model and achieve better performance with the embeddings compared to several state-of-the-art models. Furthermore, we demonstrate that using NEA to smooth out the topics improves coherence scores over the original topic models when the number of topics is large.

Submitted 10 September, 2019; originally announced September 2019.

Showing 1–12 of 12 results for author: Foulds, J R