-
Enhancing Factual Accuracy and Citation Generation in LLMs via Multi-Stage Self-Verification
Authors:
Fernando Gabriela García,
Qiyang Shi,
Zilin Feng
Abstract:
This research introduces VeriFact-CoT (Verified Factual Chain-of-Thought), a novel method designed to address the pervasive issues of hallucination and the absence of credible citation sources in Large Language Models (LLMs) when generating complex, fact-sensitive content. By incorporating a multi-stage mechanism of 'fact verification-reflection-citation integration,' VeriFact-CoT empowers LLMs to…
▽ More
This research introduces VeriFact-CoT (Verified Factual Chain-of-Thought), a novel method designed to address the pervasive issues of hallucination and the absence of credible citation sources in Large Language Models (LLMs) when generating complex, fact-sensitive content. By incorporating a multi-stage mechanism of 'fact verification-reflection-citation integration,' VeriFact-CoT empowers LLMs to critically self-examine and revise their intermediate reasoning steps and final answers. This process significantly enhances the objective accuracy, trustworthiness, and traceability of the generated outputs, making LLMs more reliable for applications demanding high fidelity such as scientific research, news reporting, and legal consultation.
△ Less
Submitted 6 September, 2025;
originally announced September 2025.
-
Unlocking Compositional Control: Self-Supervision for LVLM-Based Image Generation
Authors:
Fernando Gabriela Garcia,
Spencer Burns,
Ryan Shaw,
Hunter Young
Abstract:
This paper introduces Hierarchical Self-Supervised LVLM (Hi-SSLVLM), a novel generative model designed to significantly advance text-to-image synthesis, particularly for complex and compositionally challenging prompts. Traditional methods often grapple with the high cost of meticulously curated paired image-text datasets and struggle with precise control over fine-grained visual attributes and int…
▽ More
This paper introduces Hierarchical Self-Supervised LVLM (Hi-SSLVLM), a novel generative model designed to significantly advance text-to-image synthesis, particularly for complex and compositionally challenging prompts. Traditional methods often grapple with the high cost of meticulously curated paired image-text datasets and struggle with precise control over fine-grained visual attributes and intricate spatial relationships. Our Hi-SSLVLM addresses these limitations through a unique two-stage self-supervised learning strategy. The first stage, Multi-Granularity Visual-Language Grounding, enables the Large Vision-Language Model (LVLM) backbone to autonomously generate and align hierarchical captions (global and local) to images, cultivating a deep internal semantic understanding without reliance on extensive human annotation. The second stage, Self-Refinement and Guided Image Generation, leverages this acquired knowledge by an Internal Compositional Planning (ICP) mechanism, where the LVLM first formulates detailed textual sub-prompts to guide the image generation process, complemented by a novel Semantic Consistency Loss for precise output alignment. Comprehensive experiments against leading baselines, including Janus-Pro-1B, Stable Diffusion XL 1.0, DeepFloyd IF v1.0, and ControlNet-XL, on multi-dimensional benchmarks such as Gemini-2.0-Flash and InternVL3-78B, demonstrate Hi-SSLVLM's superior performance across all fine-grained metrics. An in-depth ablation study confirms the critical role of each proposed component. Furthermore, human evaluations corroborate our quantitative findings, highlighting Hi-SSLVLM's enhanced fidelity to prompt, compositional accuracy, and overall aesthetic quality, marking a significant step towards more controllable and semantically consistent open-ended text-to-image generation.
△ Less
Submitted 5 July, 2025;
originally announced July 2025.
-
Efficient Few-Shot Medical Image Analysis via Hierarchical Contrastive Vision-Language Learning
Authors:
Harrison Fuller,
Fernando Gabriela Garcia,
Victor Flores
Abstract:
Few-shot learning in medical image classification presents a significant challenge due to the limited availability of annotated data and the complex nature of medical imagery. In this work, we propose Adaptive Vision-Language Fine-tuning with Hierarchical Contrastive Alignment (HiCA), a novel framework that leverages the capabilities of Large Vision-Language Models (LVLMs) for medical image analys…
▽ More
Few-shot learning in medical image classification presents a significant challenge due to the limited availability of annotated data and the complex nature of medical imagery. In this work, we propose Adaptive Vision-Language Fine-tuning with Hierarchical Contrastive Alignment (HiCA), a novel framework that leverages the capabilities of Large Vision-Language Models (LVLMs) for medical image analysis. HiCA introduces a two-stage fine-tuning strategy, combining domain-specific pretraining and hierarchical contrastive learning to align visual and textual representations at multiple levels. We evaluate our approach on two benchmark datasets, Chest X-ray and Breast Ultrasound, achieving state-of-the-art performance in both few-shot and zero-shot settings. Further analyses demonstrate the robustness, generalizability, and interpretability of our method, with substantial improvements in performance compared to existing baselines. Our work highlights the potential of hierarchical contrastive strategies in adapting LVLMs to the unique challenges of medical imaging tasks.
△ Less
Submitted 16 January, 2025;
originally announced January 2025.
-
Leveraging Large Language Models for Comparative Literature Summarization with Reflective Incremental Mechanisms
Authors:
Fernando Gabriela Garcia,
Spencer Burns,
Harrison Fuller
Abstract:
In this paper, we introduce ChatCite, a novel method leveraging large language models (LLMs) for generating comparative literature summaries. The ability to summarize research papers with a focus on key comparisons between studies is an essential task in academic research. Existing summarization models, while effective at generating concise summaries, fail to provide deep comparative insights. Cha…
▽ More
In this paper, we introduce ChatCite, a novel method leveraging large language models (LLMs) for generating comparative literature summaries. The ability to summarize research papers with a focus on key comparisons between studies is an essential task in academic research. Existing summarization models, while effective at generating concise summaries, fail to provide deep comparative insights. ChatCite addresses this limitation by incorporating a multi-step reasoning mechanism that extracts critical elements from papers, incrementally builds a comparative summary, and refines the output through a reflective memory process. We evaluate ChatCite on a custom dataset, CompLit-LongContext, consisting of 1000 research papers with annotated comparative summaries. Experimental results show that ChatCite outperforms several baseline methods, including GPT-4, BART, T5, and CoT, across various automatic evaluation metrics such as ROUGE and the newly proposed G-Score. Human evaluation further confirms that ChatCite generates more coherent, insightful, and fluent summaries compared to these baseline models. Our method provides a significant advancement in automatic literature review generation, offering researchers a powerful tool for efficiently comparing and synthesizing scientific research.
△ Less
Submitted 2 December, 2024;
originally announced December 2024.