-
Machine Learning meets Algebraic Combinatorics: A Suite of Datasets Capturing Research-level Conjecturing Ability in Pure Mathematics
Authors:
Herman Chau,
Helen Jenne,
Davis Brown,
Jesse He,
Mark Raugas,
Sara Billey,
Henry Kvinge
Abstract:
With recent dramatic increases in AI system capabilities, there has been growing interest in utilizing machine learning for reasoning-heavy, quantitative tasks, particularly mathematics. While there are many resources capturing mathematics at the high-school, undergraduate, and graduate level, there are far fewer resources available that align with the level of difficulty and open-endedness encountered by professional mathematicians working on open problems. To address this, we introduce a new collection of datasets, the Algebraic Combinatorics Dataset Repository (ACD Repo), representing either foundational results or open problems in algebraic combinatorics, a subfield of mathematics that studies discrete structures arising from abstract algebra. Further differentiating our dataset collection is the fact that it aims at the conjecturing process. Each dataset includes an open-ended research-level question and a large collection of examples (up to 10M in some cases) from which conjectures should be generated. We describe all nine datasets, the different ways machine learning models can be applied to them (e.g., training with narrow models followed by interpretability analysis or program synthesis with LLMs), and discuss some of the challenges involved in designing datasets like these.
Submitted 8 March, 2025;
originally announced March 2025.
-
Machines and Mathematical Mutations: Using GNNs to Characterize Quiver Mutation Classes
Authors:
Jesse He,
Helen Jenne,
Herman Chau,
Davis Brown,
Mark Raugas,
Sara Billey,
Henry Kvinge
Abstract:
Machine learning is becoming an increasingly valuable tool in mathematics, enabling one to identify subtle patterns across collections of examples so vast that they would be impossible for a single researcher to feasibly review and analyze. In this work, we use graph neural networks to investigate \emph{quiver mutation} -- an operation that transforms one quiver (or directed multigraph) into another -- which is central to the theory of cluster algebras with deep connections to geometry, topology, and physics. In the study of cluster algebras, the question of \emph{mutation equivalence} is of fundamental concern: given two quivers, can one efficiently determine if one quiver can be transformed into the other through a sequence of mutations? In this paper, we use graph neural networks and AI explainability techniques to independently discover mutation equivalence criteria for quivers of type $\tilde{D}$. Along the way, we also show that even without explicit training to do so, our model captures structure within its hidden representation that allows us to reconstruct known criteria from type $D$, adding to the growing evidence that modern machine learning models are capable of learning abstract and parsimonious rules from mathematical data.
Submitted 23 June, 2025; v1 submitted 11 November, 2024;
originally announced November 2024.
-
Existence and hardness of conveyor belts
Authors:
Molly Baird,
Sara C. Billey,
Erik D. Demaine,
Martin L. Demaine,
David Eppstein,
Sándor Fekete,
Graham Gordon,
Sean Griffin,
Joseph S. B. Mitchell,
Joshua P. Swanson
Abstract:
An open problem of Manuel Abellanas asks whether every set of disjoint closed unit disks in the plane can be connected by a conveyor belt, which means a tight simple closed curve that touches the boundary of each disk, possibly multiple times. We prove three main results. First, for unit disks whose centers are both $x$-monotone and $y$-monotone, or whose centers have $x$-coordinates that differ by at least two units, a conveyor belt always exists and can be found efficiently. Second, it is NP-complete to determine whether disks of varying radii have a conveyor belt, and it remains NP-complete when we constrain the belt to touch disks exactly once. Third, any disjoint set of $n$ disks of arbitrary radii can be augmented by $O(n)$ "guide" disks so that the augmented system has a conveyor belt touching each disk exactly once, answering a conjecture of Demaine, Demaine, and Palop.
Submitted 20 August, 2019;
originally announced August 2019.
-
Fingerprint databases for theorems
Authors:
Sara C. Billey,
Bridget E. Tenner
Abstract:
We discuss the advantages of searchable, collaborative, language-independent databases of mathematical results, indexed by "fingerprints" of small and canonical data. Our motivating example is Neil Sloane's massively influential On-Line Encyclopedia of Integer Sequences. We hope to encourage the greater mathematical community to search for the appropriate fingerprints within each discipline, and to compile fingerprint databases of results wherever possible. The benefits of these databases are broad: advancing the state of knowledge, enhancing experimental mathematics, enabling researchers to discover unexpected connections between areas, and even improving the refereeing process for journal publication.
Submitted 13 April, 2013;
originally announced April 2013.