-
HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class
Authors:
James V. Roggeveen,
Erik Y. Wang,
Will Flintoft,
Peter Donets,
Lucy S. Nathwani,
Nickholas Gutierrez,
David Ettel,
Anton Marius Graf,
Siddharth Dandavate,
Arjun Nageswaran,
Raglan Ward,
Ava Williamson,
Anne Mykland,
Kacper K. Migacz,
Yijun Wang,
Egemen Bostan,
Duy Thuc Nguyen,
Zhe He,
Marc L. Descoteaux,
Felix Yeung,
Shida Liu,
Jorge García Ponce,
Luke Zhu,
Yuyang Chen,
Ekaterina S. Ivshina,
et al. (20 additional authors not shown)
Abstract:
Large language models (LLMs) have shown remarkable progress in mathematical problem-solving, but evaluation has largely focused on problems that have exact analytical solutions or involve formal proofs, often overlooking approximation-based problems ubiquitous in applied science and engineering. To fill this gap, we build on prior work and present HARDMath2, a dataset of 211 original problems covering the core topics in an introductory graduate applied math class, including boundary-layer analysis, WKB methods, asymptotic solutions of nonlinear partial differential equations, and the asymptotics of oscillatory integrals. This dataset was designed and verified by the students and instructors of a core graduate applied mathematics course at Harvard. We build the dataset through a novel collaborative environment that challenges students to write and refine difficult problems consistent with the class syllabus, peer-validate solutions, test different models, and automatically check LLM-generated solutions against their own answers and numerical ground truths. Evaluation results show that leading frontier models still struggle with many of the problems in the dataset, highlighting a gap in the mathematical reasoning skills of current LLMs. Importantly, students identified strategies to create increasingly difficult problems by interacting with the models and exploiting common failure modes. This back-and-forth with the models not only resulted in a richer and more challenging benchmark but also led to qualitative improvements in the students' understanding of the course material, which is increasingly important as we enter an age where state-of-the-art language models can solve many challenging problems across a wide domain of fields.
Submitted 16 May, 2025;
originally announced May 2025.
-
List Privacy Under Function Recoverability
Authors:
Ajaykrishnan Nageswaran,
Prakash Narayan
Abstract:
For a given function of user data, a querier must recover, with at least a prescribed probability, the value of the function based on a user-provided query response. Subject to this requirement, the user forms the query response so as to minimize the likelihood of the querier guessing a list of prescribed size to which the data value belongs based on the query response. We obtain a general converse upper bound for maximum list privacy. This bound is shown to be tight for the case of a binary-valued function through an explicit achievability scheme that involves an add-noise query response.
Submitted 3 July, 2024; v1 submitted 11 July, 2023;
originally announced July 2023.
-
Gaussian Data Privacy Under Linear Function Recoverability
Authors:
Ajaykrishnan Nageswaran
Abstract:
A user's data is represented by a Gaussian random variable. Given a linear function of the data, a querier is required to recover, with at least a prescribed accuracy level, the function value based on a query response provided by the user. The user devises the query response, subject to the recoverability requirement, so as to maximize privacy of the data from the querier. Recoverability and privacy are both measured by $\ell_2$-distance criteria. An exact characterization is provided of maximum user data privacy under the recoverability condition. An explicit optimal achievability scheme for the user is given whose privacy is shown to match a converse upper bound.
Submitted 29 June, 2023;
originally announced June 2023.
-
Distribution Privacy Under Function Recoverability
Authors:
Ajaykrishnan Nageswaran,
Prakash Narayan
Abstract:
A user generates n independent and identically distributed data random variables with a probability mass function that must be guarded from a querier. The querier must recover, with a prescribed accuracy, a given function of the data from each of n independent and identically distributed query responses upon eliciting them from the user. The user chooses the data probability mass function and devises the random query responses to maximize distribution privacy as gauged by the (Kullback-Leibler) divergence between the former and the querier's best estimate of it based on the n query responses. Considering an arbitrary function, a basic achievable lower bound for distribution privacy is provided that does not depend on n and corresponds to worst-case privacy. Worst-case privacy equals the logsum cardinalities of inverse atoms under the given function, with the number of summands decreasing as the querier recovers the function with improving accuracy. Next, upper (converse) and lower (achievability) bounds for distribution privacy, dependent on n, are developed. The former improves upon worst-case privacy and the latter does so under suitable assumptions; both converge to it as n grows. The converse and achievability proofs identify explicit strategies for the user and the querier.
Submitted 30 December, 2021; v1 submitted 14 March, 2021;
originally announced March 2021.
-
Data Privacy for a $ρ$-Recoverable Function
Authors:
Ajaykrishnan Nageswaran,
Prakash Narayan
Abstract:
A user's data is represented by a finite-valued random variable. Given a function of the data, a querier is required to recover, with at least a prescribed probability, the value of the function based on a query response provided by the user. The user devises the query response, subject to the recoverability requirement, so as to maximize privacy of the data from the querier. Privacy is measured by the probability of error incurred by the querier in estimating the data from the query response. We analyze single and multiple independent query responses, with each response satisfying the recoverability requirement, that provide maximum privacy to the user. In the former setting, we also consider privacy for a predicate of the user's data. Achievability schemes with explicit randomization mechanisms for query responses are given and their privacy compared with converse upper bounds.
Submitted 10 January, 2019; v1 submitted 21 February, 2018;
originally announced February 2018.
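The two quantities named in the abstract above (recoverability of the function value and privacy as the querier's probability of error) can be illustrated with a minimal sketch. This is not the authors' scheme. It assumes that recoverability is checked separately for every data value and that the querier estimates the data with a maximum a posteriori (MAP) rule; the function names, the toy prior, and the toy query-response mechanism are all hypothetical.

```python
from typing import Callable, Dict, Hashable

Prior = Dict[Hashable, float]                    # P(X = x)
Channel = Dict[Hashable, Dict[Hashable, float]]  # P(Z = z | X = x)


def map_error_privacy(prior: Prior, channel: Channel) -> float:
    """Probability that the best (MAP) guess of X from Z is wrong."""
    responses = {z for row in channel.values() for z in row}
    # For each response z, the MAP estimator picks the x maximizing P(x) W(z|x).
    p_correct = sum(
        max(prior[x] * channel[x].get(z, 0.0) for x in prior)
        for z in responses
    )
    return 1.0 - p_correct


def worst_case_recoverability(
    prior: Prior,
    channel: Channel,
    f: Callable[[Hashable], Hashable],
    g: Callable[[Hashable], Hashable],
) -> float:
    """Smallest P(g(Z) = f(X) | X = x) over all data values x.

    Assumption: recoverability is required for each x separately.
    """
    return min(
        sum(p for z, p in channel[x].items() if g(z) == f(x))
        for x in prior
    )


# Toy example (hypothetical): X uniform on {0, 1, 2, 3}, f(x) = x mod 2.
# The response reports f(x) correctly with probability 0.9, else flips it.
prior = {x: 0.25 for x in range(4)}
parity = lambda x: x % 2
channel = {x: {parity(x): 0.9, 1 - parity(x): 0.1} for x in prior}

privacy = map_error_privacy(prior, channel)
recoverability = worst_case_recoverability(prior, channel, parity, lambda z: z)
print(f"privacy (MAP error) = {privacy:.2f}, recoverability = {recoverability:.2f}")
```

In this toy mechanism the response depends on the data only through its parity, so the querier recovers the function value reliably while remaining uncertain about which data value produced it.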