-
Reliable Decision Support with LLMs: A Framework for Evaluating Consistency in Binary Text Classification Applications
Authors:
Fadel M. Megahed,
Ying-Ju Chen,
L. Allision Jones-Farmer,
Younghwa Lee,
Jiawei Brooke Wang,
Inez M. Zwetsloot
Abstract:
This study introduces a framework for evaluating consistency in large language model (LLM) binary text classification, addressing the lack of established reliability assessment methods. Adapting psychometric principles, we determine sample size requirements, develop metrics for invalid responses, and evaluate intra- and inter-rater reliability. Our case study examines financial news sentiment clas…
▽ More
This study introduces a framework for evaluating consistency in large language model (LLM) binary text classification, addressing the lack of established reliability assessment methods. Adapting psychometric principles, we determine sample size requirements, develop metrics for invalid responses, and evaluate intra- and inter-rater reliability. Our case study examines financial news sentiment classification across 14 LLMs (including claude-3-7-sonnet, gpt-4o, deepseek-r1, gemma3, llama3.2, phi4, and command-r-plus), with five replicates per model on 1,350 articles. Models demonstrated high intra-rater consistency, achieving perfect agreement on 90-98% of examples, with minimal differences between expensive and economical models from the same families. When validated against StockNewsAPI labels, models achieved strong performance (accuracy 0.76-0.88), with smaller models like gemma3:1B, llama3.2:3B, and claude-3-5-haiku outperforming larger counterparts. All models performed at chance when predicting actual market movements, indicating task constraints rather than model limitations. Our framework provides systematic guidance for LLM selection, sample size planning, and reliability assessment, enabling organizations to optimize resources for classification tasks.
△ Less
Submitted 20 May, 2025;
originally announced May 2025.
-
Adapting OpenAI's CLIP Model for Few-Shot Image Inspection in Manufacturing Quality Control: An Expository Case Study with Multiple Application Examples
Authors:
Fadel M. Megahed,
Ying-Ju Chen,
Bianca Maria Colosimo,
Marco Luigi Giuseppe Grasso,
L. Allison Jones-Farmer,
Sven Knoth,
Hongyue Sun,
Inez Zwetsloot
Abstract:
This expository paper introduces a simplified approach to image-based quality inspection in manufacturing using OpenAI's CLIP (Contrastive Language-Image Pretraining) model adapted for few-shot learning. While CLIP has demonstrated impressive capabilities in general computer vision tasks, its direct application to manufacturing inspection presents challenges due to the domain gap between its train…
▽ More
This expository paper introduces a simplified approach to image-based quality inspection in manufacturing using OpenAI's CLIP (Contrastive Language-Image Pretraining) model adapted for few-shot learning. While CLIP has demonstrated impressive capabilities in general computer vision tasks, its direct application to manufacturing inspection presents challenges due to the domain gap between its training data and industrial applications. We evaluate CLIP's effectiveness through five case studies: metallic pan surface inspection, 3D printing extrusion profile analysis, stochastic textured surface evaluation, automotive assembly inspection, and microstructure image classification. Our results show that CLIP can achieve high classification accuracy with relatively small learning sets (50-100 examples per class) for single-component and texture-based applications. However, the performance degrades with complex multi-component scenes. We provide a practical implementation framework that enables quality engineers to quickly assess CLIP's suitability for their specific applications before pursuing more complex solutions. This work establishes CLIP-based few-shot learning as an effective baseline approach that balances implementation simplicity with robust performance, demonstrated in several manufacturing quality control applications.
△ Less
Submitted 21 January, 2025;
originally announced January 2025.
-
ChatISA: A Prompt-Engineered, In-House Multi-Modal Generative AI Chatbot for Information Systems Education
Authors:
Fadel M. Megahed,
Ying-Ju Chen,
Joshua A. Ferris,
Cameron Resatar,
Kaitlyn Ross,
Younghwa Lee,
L. Allison Jones-Farmer
Abstract:
As generative AI ('GenAI') continues to evolve, educators face the challenge of preparing students for a future where AI-assisted work is integral to professional success. This paper introduces ChatISA, an in-house, multi-model AI chatbot designed to support students and faculty in an Information Systems and Analytics (ISA) department. ChatISA comprises four primary modules: Coding Companion, Proj…
▽ More
As generative AI ('GenAI') continues to evolve, educators face the challenge of preparing students for a future where AI-assisted work is integral to professional success. This paper introduces ChatISA, an in-house, multi-model AI chatbot designed to support students and faculty in an Information Systems and Analytics (ISA) department. ChatISA comprises four primary modules: Coding Companion, Project Coach, Exam Ally, and Interview Mentor, each tailored to enhance different aspects of the educational experience. Through iterative development, student feedback, and leveraging open-source frameworks, we created a robust tool that addresses coding inquiries, project management, exam preparation, and interview readiness. The implementation of ChatISA provided valuable insights and highlighted key challenges. Our findings demonstrate the benefits of ChatISA for ISA education while underscoring the need for adaptive pedagogy and proactive engagement with AI tools to fully harness their educational potential. To support broader adoption and innovation, all code for ChatISA is made publicly available on GitHub, enabling other institutions to customize and integrate similar AI-driven educational tools within their curricula.
△ Less
Submitted 16 May, 2025; v1 submitted 13 June, 2024;
originally announced July 2024.
-
Introducing ChatSQC: Enhancing Statistical Quality Control with Augmented AI
Authors:
Fadel M. Megahed,
Ying-Ju Chen,
Inez Zwetsloot,
Sven Knoth,
Douglas C. Montgomery,
L. Allison Jones-Farmer
Abstract:
We introduce ChatSQC, an innovative chatbot system that combines the power of OpenAI's Large Language Models (LLM) with a specific knowledge base in Statistical Quality Control (SQC). Our research focuses on enhancing LLMs using specific SQC references, shedding light on how data preprocessing parameters and LLM selection impact the quality of generated responses. By illustrating this process, we…
▽ More
We introduce ChatSQC, an innovative chatbot system that combines the power of OpenAI's Large Language Models (LLM) with a specific knowledge base in Statistical Quality Control (SQC). Our research focuses on enhancing LLMs using specific SQC references, shedding light on how data preprocessing parameters and LLM selection impact the quality of generated responses. By illustrating this process, we hope to motivate wider community engagement to refine LLM design and output appraisal techniques. We also highlight potential research opportunities within the SQC domain that can be facilitated by leveraging ChatSQC, thereby broadening the application spectrum of SQC. A primary goal of our work is to provide a template and proof-of-concept on how LLMs can be utilized by our community. To continuously improve ChatSQC, we ask the SQC community to provide feedback, highlight potential issues, request additional features, and/or contribute via pull requests through our public GitHub repository. Additionally, the team will continue to explore adding supplementary reference material that would further improve the contextual understanding of the chatbot. Overall, ChatSQC serves as a testament to the transformative potential of AI within SQC, and we hope it will spur further advancements in the integration of AI in this field.
△ Less
Submitted 28 March, 2024; v1 submitted 22 August, 2023;
originally announced August 2023.
-
How Generative AI models such as ChatGPT can be (Mis)Used in SPC Practice, Education, and Research? An Exploratory Study
Authors:
Fadel M. Megahed,
Ying-Ju Chen,
Joshua A. Ferris,
Sven Knoth,
L. Allison Jones-Farmer
Abstract:
Generative Artificial Intelligence (AI) models such as OpenAI's ChatGPT have the potential to revolutionize Statistical Process Control (SPC) practice, learning, and research. However, these tools are in the early stages of development and can be easily misused or misunderstood. In this paper, we give an overview of the development of Generative AI. Specifically, we explore ChatGPT's ability to pr…
▽ More
Generative Artificial Intelligence (AI) models such as OpenAI's ChatGPT have the potential to revolutionize Statistical Process Control (SPC) practice, learning, and research. However, these tools are in the early stages of development and can be easily misused or misunderstood. In this paper, we give an overview of the development of Generative AI. Specifically, we explore ChatGPT's ability to provide code, explain basic concepts, and create knowledge related to SPC practice, learning, and research. By investigating responses to structured prompts, we highlight the benefits and limitations of the results. Our study indicates that the current version of ChatGPT performs well for structured tasks, such as translating code from one language to another and explaining well-known concepts but struggles with more nuanced tasks, such as explaining less widely known terms and creating code from scratch. We find that using new AI tools may help practitioners, educators, and researchers to be more efficient and productive. However, in their current stages of development, some results are misleading and wrong. Overall, the use of generative AI models in SPC must be properly validated and used in conjunction with other methods to ensure accurate results.
△ Less
Submitted 17 February, 2023;
originally announced February 2023.
-
FA*IR: A Fair Top-k Ranking Algorithm
Authors:
Meike Zehlike,
Francesco Bonchi,
Carlos Castillo,
Sara Hajian,
Mohamed Megahed,
Ricardo Baeza-Yates
Abstract:
In this work, we define and solve the Fair Top-k Ranking problem, in which we want to determine a subset of k candidates from a large pool of n >> k candidates, maximizing utility (i.e., select the "best" candidates) subject to group fairness criteria. Our ranked group fairness definition extends group fairness using the standard notion of protected groups and is based on ensuring that the proport…
▽ More
In this work, we define and solve the Fair Top-k Ranking problem, in which we want to determine a subset of k candidates from a large pool of n >> k candidates, maximizing utility (i.e., select the "best" candidates) subject to group fairness criteria. Our ranked group fairness definition extends group fairness using the standard notion of protected groups and is based on ensuring that the proportion of protected candidates in every prefix of the top-k ranking remains statistically above or indistinguishable from a given minimum.
Utility is operationalized in two ways: (i) every candidate included in the top-$k$ should be more qualified than every candidate not included; and (ii) for every pair of candidates in the top-k, the more qualified candidate should be ranked above. An efficient algorithm is presented for producing the Fair Top-k Ranking, and tested experimentally on existing datasets as well as new datasets released with this paper, showing that our approach yields small distortions with respect to rankings that maximize utility without considering fairness criteria.
To the best of our knowledge, this is the first algorithm grounded in statistical tests that can mitigate biases in the representation of an under-represented group along a ranked list.
△ Less
Submitted 2 July, 2018; v1 submitted 20 June, 2017;
originally announced June 2017.