-
Can LLMs $\textit{understand}$ Math? -- Exploring the Pitfalls in Mathematical Reasoning
Authors:
Tiasa Singha Roy,
Aditeya Baral,
Ayush Rajesh Jhaveri,
Yusuf Baig
Abstract:
Large language models (LLMs) demonstrate considerable potential in various natural language tasks but face significant challenges in mathematical reasoning, particularly in executing precise, multi-step logic. However, current evaluation frameworks judge their performance solely based on accuracy, which only accounts for the final answer. This study explores these pitfalls by employing a novel evaluation framework. We propose an evaluation metric called the MAPLE score, which holistically quantifies reasoning misalignment by integrating error rates, redundancy, and validity.
Submitted 21 May, 2025;
originally announced May 2025.
-
Anti-Diffusion in an Algae-Bacteria Microcosm: Photosynthesis, Chemotaxis, and Expulsion
Authors:
Praneet Prakash,
Yasa Baig,
Francois J. Peaudecerf,
Raymond E. Goldstein
Abstract:
In Nature there are significant relationships known between microorganisms from two kingdoms of life, as in the supply of vitamin B$_{12}$ by bacteria to algae. Such interactions motivate general investigations into the spatio-temporal dynamics of metabolite exchanges. Here we study by experiment and theory a model system: a coculture of the bacterium $B. subtilis$, an obligate aerobe that is chemotactic to oxygen, and a nonmotile mutant of the alga $C. reinhardtii$, which photosynthetically produces oxygen when illuminated. Strikingly, when a shaft of light illuminates a thin, initially uniform suspension of the two, the chemotactic influx of bacteria to the photosynthetically active region leads to expulsion of the algae from that area. This effect arises from algal transport due to spatially varying collective behavior of bacteria, and is mathematically related to the "turbulent diamagnetism" associated with magnetic flux expulsion in stars.
Submitted 13 December, 2023;
originally announced December 2023.
-
Multitask Learning for Citation Purpose Classification
Authors:
Alex Oesterling,
Angikar Ghosal,
Haoyang Yu,
Rui Xin,
Yasa Baig,
Lesia Semenova,
Cynthia Rudin
Abstract:
We present our entry into the 2021 3C Shared Task Citation Context Classification based on Purpose competition. The goal of the competition is to classify a citation in a scientific article based on its purpose. This task is important because it could potentially lead to more comprehensive ways of summarizing the purpose and uses of scientific articles, but it is also difficult, mainly due to the limited amount of available training data in which the purpose of each citation has been hand-labeled, along with the subjectivity of these labels. Our entry in the competition is a multi-task model that combines multiple modules designed to handle the problem from different perspectives, including hand-generated linguistic features, TF-IDF features, and an LSTM-with-attention model. We also provide an ablation study and feature analysis whose insights could lead to future work.
Submitted 24 June, 2021;
originally announced June 2021.