Showing 1–3 of 3 results for author: Davoudi, S P M

Searching in archive cs.
  1. arXiv:2502.20758  [pdf, ps, other]

    stat.AP cs.AI cs.CL

    Collective Reasoning Among LLMs: A Framework for Answer Validation Without Ground Truth

    Authors: Seyed Pouyan Mousavi Davoudi, Amin Gholami Davodi, Alireza Amiri-Margavi, Mahdi Jafari

    Abstract: We introduce a new approach in which several advanced large language models (specifically GPT-4-0125-preview, Meta-LLAMA-3-70B-Instruct, Claude-3-Opus, and Gemini-1.5-Flash) collaborate to both produce and answer intricate, doctoral-level probability problems without relying on any single "correct" reference. Rather than depending on an established ground truth, our investigation focuses on how agre…

    Submitted 26 June, 2025; v1 submitted 28 February, 2025; originally announced February 2025.

    Comments: 7 pages

  2. arXiv:2411.16797  [pdf, other]

    cs.CL cs.AI

    Enhancing Answer Reliability Through Inter-Model Consensus of Large Language Models

    Authors: Alireza Amiri-Margavi, Iman Jebellat, Ehsan Jebellat, Seyed Pouyan Mousavi Davoudi

    Abstract: We propose a collaborative framework in which multiple large language models -- including GPT-4-0125-preview, Meta-LLaMA-3-70B-Instruct, Claude-3-Opus, and Gemini-1.5-Flash -- generate and answer complex, PhD-level statistical questions when definitive ground truth is unavailable. Our study examines how inter-model consensus both improves response reliability and identifies the quality of the gene…

    Submitted 23 February, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: 14 pages, 2 figures

  3. arXiv:2406.05194  [pdf, other]

    cs.CL cs.AI cs.LG

    LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs

    Authors: Arash Gholami Davoodi, Seyed Pouyan Mousavi Davoudi, Pouya Pezeshkpour

    Abstract: Large language models (LLMs) demonstrate impressive capabilities in mathematical reasoning. However, despite these achievements, current evaluations are mostly limited to specific mathematical topics, and it remains unclear whether LLMs are genuinely engaging in reasoning. To address these gaps, we present the Mathematical Topics Tree (MaTT) benchmark, a challenging and structured benchmark that o…

    Submitted 29 March, 2025; v1 submitted 7 June, 2024; originally announced June 2024.