Search | arXiv e-print repository

Can LLMs $\textit{understand}$ Math? -- Exploring the Pitfalls in Mathematical Reasoning

Authors: Tiasa Singha Roy, Aditeya Baral, Ayush Rajesh Jhaveri, Yusuf Baig

Abstract: Large language models (LLMs) demonstrate considerable potential in various natural language tasks but face significant challenges in mathematical reasoning, particularly in executing precise, multi-step logic. However, current evaluation frameworks judge their performance solely based on accuracy, which only accounts for the final answer. This study explores these pitfalls by employing a novel evaluation framework. We propose an evaluation metric called the MAPLE score, which holistically quantifies reasoning misalignment by integrating error rates, redundancy, and validity.

Submitted 21 May, 2025; originally announced May 2025.

arXiv:2505.12587 [pdf, ps, other]

CMLFormer: A Dual Decoder Transformer with Switching Point Learning for Code-Mixed Language Modeling

Authors: Aditeya Baral, Allen George Ajith, Roshan Nayak, Mrityunjay Abhijeet Bhanja

Abstract: Code-mixed languages, characterized by frequent within-sentence language transitions, present structural challenges that standard language models fail to address. In this work, we propose CMLFormer, an enhanced multi-layer dual-decoder Transformer with a shared encoder and synchronized decoder cross-attention, designed to model the linguistic and semantic dynamics of code-mixed text. CMLFormer is pre-trained on an augmented Hinglish corpus with switching point and translation annotations with multiple new objectives specifically aimed at capturing switching behavior, cross-lingual structure, and code-mixing complexity. Our experiments show that CMLFormer improves F1 score, precision, and accuracy over other approaches on the HASOC-2021 benchmark under select pre-training setups. Attention analyses further show that it can identify and attend to switching points, validating its sensitivity to code-mixed structure. These results demonstrate the effectiveness of CMLFormer's architecture and multi-task pre-training strategy for modeling code-mixed languages.

Submitted 18 May, 2025; originally announced May 2025.

arXiv:2402.01007 [pdf]

Municipal cyber risk modeling using cryptographic computing to inform cyber policymaking

Authors: Avital Baral, Taylor Reynolds, Lawrence Susskind, Daniel J. Weitzner, Angelina Wu

Abstract: Municipalities are vulnerable to cyberattacks with devastating consequences, but they lack key information to evaluate their own risk and compare their security posture to peers. Using data from 83 municipalities collected via a cryptographically secure computation platform about their security posture, incidents, security control failures, and losses, we build data-driven cyber risk models and cyber security benchmarks for municipalities. We produce benchmarks of the security posture in a sector, the frequency of cyber incidents, forecasted annual losses for organizations based on their defensive posture, and a weighting of cyber controls based on their individual failure rates and associated losses. Combined, these four items can help guide cyber policymaking by quantifying the cyber risk in a sector, identifying gaps that need to be addressed, prioritizing policy interventions, and tracking progress of those interventions over time. In the case of the municipalities, these newly derived risk measures highlight the need for continuous measured improvement of cybersecurity readiness, show clear areas of weakness and strength, and provide governments with some early targets for policy focus such as security education, incident response, and focusing efforts first on municipalities at the lowest security levels that have the highest risk reduction per security dollar invested.

Submitted 5 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: Working Draft for Presentation at the Cybersecurity Law and Policy Scholars Conference - September 29, 2023

MSC Class: K.6.5 and E.3

arXiv:2305.09248 [pdf, other]

Maximum-Width Rainbow-Bisecting Empty Annulus

Authors: Sang Won Bae, Sandip Banerjee, Arpita Baral, Priya Ranjan Sinha Mahapatra, Sang Duk Yoon

Abstract: Given a set of $n$ colored points with $k$ colors in the plane, we study the problem of computing a maximum-width rainbow-bisecting empty annulus, where the annulus is an axis-parallel square, an axis-parallel rectangle, or a circle. We call a region rainbow if it contains at least one point of each color. The maximum-width rainbow-bisecting empty annulus problem asks to find an annulus $A$ of a particular shape with maximum possible width such that $A$ does not contain any input points and it bisects the input point set into two parts, each of which is a rainbow. We compute a maximum-width rainbow-bisecting empty axis-parallel square, axis-parallel rectangular, and circular annulus in $O(n^3)$ time using $O(n)$ space, in $O(k^2n^2\log n)$ time using $O(n\log n)$ space, and in $O(n^3)$ time using $O(n^2)$ space, respectively.

Submitted 26 March, 2024; v1 submitted 16 May, 2023; originally announced May 2023.

Comments: A preliminary version is accepted in EuroCG 2021 and the expanded version is accepted in the journal Computational Geometry: Theory and Applications

arXiv:1811.06217 [pdf, other]

Maximum-Width Empty Square and Rectangular Annulus

Authors: Sang Won Bae, Arpita Baral, Priya Ranjan Sinha Mahapatra

Abstract: An annulus is, informally, a ring-shaped region, often described by two concentric circles. The maximum-width empty annulus problem asks to find an annulus of a certain shape with the maximum possible width that avoids a given set of $n$ points in the plane. This problem can also be interpreted as the problem of finding an optimal location of a ring-shaped obnoxious facility among the input points. In this paper, we study square and rectangular variants of the maximum-width empty annulus problem, and present the first nontrivial algorithms. Specifically, our algorithms run in $O(n^3)$ and $O(n^2 \log n)$ time for computing a maximum-width empty axis-parallel square and rectangular annulus, respectively. Both algorithms use only $O(n)$ space.

Submitted 15 November, 2018; originally announced November 2018.

arXiv:1712.00375 [pdf, ps, other]

Maximum-width Axis-Parallel Empty Rectangular Annulus

Authors: Arpita Baral, Abhilash Gondane, Sanjib Sadhu, Priya Ranjan Sinha Mahapatra

Abstract: Given a set $P$ of $n$ points in $\mathbb{R}^{2}$, we address the problem of computing an axis-parallel empty rectangular annulus $A$ of maximum width such that no point of $P$ lies inside $A$ but all points of $P$ must lie inside, outside and on the boundaries of two parallel rectangles forming the annulus $A$. We propose an $O(n^3)$ time and $O(n)$ space algorithm to solve the problem. In the particular case where the inner rectangle of an axis-parallel empty rectangular annulus reduces to an input point, we can solve the problem in $O(n \log n)$ time and $O(n)$ space.

Submitted 1 December, 2017; originally announced December 2017.

Showing 1–6 of 6 results for author: Baral, A