Search | arXiv e-print repository

Symbolic Regression with Multimodal Large Language Models and Kolmogorov Arnold Networks

Authors: Thomas R. Harvey, Fabian Ruehle, Kit Fraser-Taliente, James Halverson

Abstract: We present a novel approach to symbolic regression using vision-capable large language models (LLMs) and the ideas behind Google DeepMind's Funsearch. The LLM is given a plot of a univariate function and tasked with proposing an ansatz for that function. The free parameters of the ansatz are fitted using standard numerical optimisers, and a collection of such ansätze make up the population of a… ▽ More We present a novel approach to symbolic regression using vision-capable large language models (LLMs) and the ideas behind Google DeepMind's Funsearch. The LLM is given a plot of a univariate function and tasked with proposing an ansatz for that function. The free parameters of the ansatz are fitted using standard numerical optimisers, and a collection of such ansätze make up the population of a genetic algorithm. Unlike other symbolic regression techniques, our method does not require the specification of a set of functions to be used in regression, but with appropriate prompt engineering, we can arbitrarily condition the generative step. By using Kolmogorov Arnold Networks (KANs), we demonstrate that ``univariate is all you need'' for symbolic regression, and extend this method to multivariate functions by learning the univariate function on each edge of a trained KAN. The combined expression is then simplified by further processing with a language model. △ Less

Submitted 12 May, 2025; originally announced May 2025.

arXiv:2504.06330 [pdf, other]

Analyzing the Impact of Low-Rank Adaptation for Cross-Domain Few-Shot Object Detection in Aerial Images

Authors: Hicham Talaoubrid, Anissa Mokraoui, Ismail Ben Ayed, Axel Prouvost, Sonimith Hang, Monit Korn, Rémi Harvey

Abstract: This paper investigates the application of Low-Rank Adaptation (LoRA) to small models for cross-domain few-shot object detection in aerial images. Originally designed for large-scale models, LoRA helps mitigate overfitting, making it a promising approach for resource-constrained settings. We integrate LoRA into DiffusionDet, and evaluate its performance on the DOTA and DIOR datasets. Our results s… ▽ More This paper investigates the application of Low-Rank Adaptation (LoRA) to small models for cross-domain few-shot object detection in aerial images. Originally designed for large-scale models, LoRA helps mitigate overfitting, making it a promising approach for resource-constrained settings. We integrate LoRA into DiffusionDet, and evaluate its performance on the DOTA and DIOR datasets. Our results show that LoRA applied after an initial fine-tuning slightly improves performance in low-shot settings (e.g., 1-shot and 5-shot), while full fine-tuning remains more effective in higher-shot configurations. These findings highlight LoRA's potential for efficient adaptation in aerial object detection, encouraging further research into parameter-efficient fine-tuning strategies for few-shot learning. Our code is available here: https://github.com/HichTala/LoRA-DiffusionDet. △ Less

Submitted 8 April, 2025; originally announced April 2025.

arXiv:2503.11061 [pdf, other]

Generative Modeling for Mathematical Discovery

Authors: Jordan S. Ellenberg, Cristofero S. Fraser-Taliente, Thomas R. Harvey, Karan Srivastava, Andrew V. Sutherland

Abstract: We present a new implementation of the LLM-driven genetic algorithm {\it funsearch}, whose aim is to generate examples of interest to mathematicians and which has already had some success in problems in extremal combinatorics. Our implementation is designed to be useful in practice for working mathematicians; it does not require expertise in machine learning or access to high-performance computing… ▽ More We present a new implementation of the LLM-driven genetic algorithm {\it funsearch}, whose aim is to generate examples of interest to mathematicians and which has already had some success in problems in extremal combinatorics. Our implementation is designed to be useful in practice for working mathematicians; it does not require expertise in machine learning or access to high-performance computing resources. Applying {\it funsearch} to a new problem involves modifying a small segment of Python code and selecting a large language model (LLM) from one of many third-party providers. We benchmarked our implementation on three different problems, obtaining metrics that may inform applications of {\it funsearch} to new problems. Our results demonstrate that {\it funsearch} successfully learns in a variety of combinatorial and number-theoretic settings, and in some contexts learns principles that generalize beyond the problem originally trained on. △ Less

Submitted 16 March, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

Comments: 22 pages, 14 figures

MSC Class: 68T20

arXiv:2306.03147 [pdf, other]

Decoding Nature with Nature's Tools: Heterotic Line Bundle Models of Particle Physics with Genetic Algorithms and Quantum Annealing

Authors: Steve Abel, Andrei Constantin, Thomas R. Harvey, Andre Lukas, Luca A. Nutricati

Abstract: The string theory landscape may include a multitude of ultraviolet embeddings of the Standard Model, but identifying these has proven difficult due to the enormous number of available string compactifications. Genetic Algorithms (GAs) represent a powerful class of discrete optimisation techniques that can efficiently deal with the immensity of the string landscape, especially when enhanced with in… ▽ More The string theory landscape may include a multitude of ultraviolet embeddings of the Standard Model, but identifying these has proven difficult due to the enormous number of available string compactifications. Genetic Algorithms (GAs) represent a powerful class of discrete optimisation techniques that can efficiently deal with the immensity of the string landscape, especially when enhanced with input from quantum annealers. In this letter we focus on geometric compactifications of the $E_8\times E_8$ heterotic string theory compactified on smooth Calabi-Yau threefolds with Abelian bundles. We make use of analytic formulae for bundle-valued cohomology to impose the entire range of spectrum requirements, something that has not been possible so far. For manifolds with a relatively low number of Kahler parameters we compare the GA search results with results from previous systematic scans, showing that GAs can find nearly all the viable solutions while visiting only a tiny fraction of the solution space. Moreover, we carry out GA searches on manifolds with a larger numbers of Kahler parameters where systematic searches are not feasible. △ Less

Submitted 5 June, 2023; originally announced June 2023.

Comments: 11 pages, 4 figures

arXiv:2305.06487 [pdf, other]

doi 10.1109/LRA.2023.3315210

SMART: Self-Morphing Adaptive Replanning Tree

Authors: Zongyuan Shen, James P. Wilson, Shalabh Gupta, Ryan Harvey

Abstract: The paper presents an algorithm, called Self-Morphing Adaptive Replanning Tree (SMART), that facilitates fast replanning in dynamic environments. SMART performs risk based tree-pruning if the current path is obstructed by nearby moving obstacle(s), resulting in multiple disjoint subtrees. Then, for speedy recovery, it exploits these subtrees and performs informed tree-repair at hot-spots that lie… ▽ More The paper presents an algorithm, called Self-Morphing Adaptive Replanning Tree (SMART), that facilitates fast replanning in dynamic environments. SMART performs risk based tree-pruning if the current path is obstructed by nearby moving obstacle(s), resulting in multiple disjoint subtrees. Then, for speedy recovery, it exploits these subtrees and performs informed tree-repair at hot-spots that lie at the intersection of subtrees to find a new path. The performance of SMART is comparatively evaluated with eight existing algorithms through extensive simulations. Two scenarios are considered with: 1) dynamic obstacles and 2) both static and dynamic obstacles. The results show that SMART yields significant improvements in replanning time, success rate and travel time. Finally, the performance of SMART is validated by a real laboratory experiment. △ Less

Submitted 21 September, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: 9 pages

Journal ref: IEEE Robotics and Automation Letters, 2023

arXiv:2109.05043 [pdf, other]

SMARRT: Self-Repairing Motion-Reactive Anytime RRT for Dynamic Environments

Authors: Zongyuan Shen, James Wilson, Ryan Harvey, Shalabh Gupta

Abstract: This paper addresses the fast replanning problem in dynamic environments with moving obstacles. Since for randomly moving obstacles the future states are unpredictable, the proposed method, called SMARRT, reacts to obstacle motions and revises the path in real-time based on the current interfering obstacle state (i.e., position and velocity). SMARRT is fast and efficient and performs collision che… ▽ More This paper addresses the fast replanning problem in dynamic environments with moving obstacles. Since for randomly moving obstacles the future states are unpredictable, the proposed method, called SMARRT, reacts to obstacle motions and revises the path in real-time based on the current interfering obstacle state (i.e., position and velocity). SMARRT is fast and efficient and performs collision checking only on the partial path segment close to the robot within a feasibility checking horizon. If the path is infeasible, then tree parts associated with the path inside the horizon are pruned while maintaining the maximal tree structure of already-explored regions. Then, a multi-resolution utility map is created to capture the environmental information used to compute the replanning utility for each cell on the multi-scale tiling. A hierarchical searching method is applied on the map to find the sampling cell efficiently. Finally, uniform samples are drawn within the sampling cell for fast replanning. The SMARRT method is validated via simulation runs, and the results are evaluated in comparison to four existing methods. The SMARRT method yields significant improvements in travel time, replanning time, and success rate compared against the existing methods. △ Less

Submitted 10 September, 2021; originally announced September 2021.

arXiv:2108.07316 [pdf, other]

doi 10.1002/prop.202100186

Heterotic String Model Building with Monad Bundles and Reinforcement Learning

Authors: Andrei Constantin, Thomas R. Harvey, Andre Lukas

Abstract: We use reinforcement learning as a means of constructing string compactifications with prescribed properties. Specifically, we study heterotic SO(10) GUT models on Calabi-Yau three-folds with monad bundles, in search of phenomenologically promising examples. Due to the vast number of bundles and the sparseness of viable choices, methods based on systematic scanning are not suitable for this class… ▽ More We use reinforcement learning as a means of constructing string compactifications with prescribed properties. Specifically, we study heterotic SO(10) GUT models on Calabi-Yau three-folds with monad bundles, in search of phenomenologically promising examples. Due to the vast number of bundles and the sparseness of viable choices, methods based on systematic scanning are not suitable for this class of models. By focusing on two specific manifolds with Picard numbers two and three, we show that reinforcement learning can be used successfully to explore monad bundles. Training can be accomplished with minimal computing resources and leads to highly efficient policy networks. They produce phenomenologically promising states for nearly 100% of episodes and within a small number of steps. In this way, hundreds of new candidate standard models are found. △ Less

Submitted 16 August, 2021; originally announced August 2021.

Comments: 35 pages, 9 figures, data set of models included as ancillary material in the submission

arXiv:2108.01626 [pdf, other]

CPPNet: A Coverage Path Planning Network

Authors: Zongyuan Shen, Palash Agrawal, James P. Wilson, Ryan Harvey, Shalabh Gupta

Abstract: This paper presents a deep-learning based CPP algorithm, called Coverage Path Planning Network (CPPNet). CPPNet is built using a convolutional neural network (CNN) whose input is a graph-based representation of the occupancy grid map while its output is an edge probability heat graph, where the value of each edge is the probability of belonging to the optimal TSP tour. Finally, a greedy search is… ▽ More This paper presents a deep-learning based CPP algorithm, called Coverage Path Planning Network (CPPNet). CPPNet is built using a convolutional neural network (CNN) whose input is a graph-based representation of the occupancy grid map while its output is an edge probability heat graph, where the value of each edge is the probability of belonging to the optimal TSP tour. Finally, a greedy search is used to select the final optimized tour. CPPNet is trained and comparatively evaluated against the TSP tour. It is shown that CPPNet provides near-optimal solutions while requiring significantly less computational time, thus enabling real-time coverage path planning in partially unknown and dynamic environments. △ Less

Submitted 3 August, 2021; originally announced August 2021.

arXiv:2104.11059 [pdf, other]

MRRT: Multiple Rapidly-Exploring Random Trees for Fast Online Replanning in Dynamic Environments

Authors: Zongyuan Shen, James P. Wilson, Ryan Harvey, Shalabh Gupta

Abstract: This paper presents a novel algorithm, called MRRT, which uses multiple rapidly-exploring random trees for fast online replanning of autonomous vehicles in dynamic environments with moving obstacles. The proposed algorithm is built upon the RRT algorithm with a multi-tree structure. At the beginning, the RRT algorithm is applied to find the initial solution based on partial knowledge of the enviro… ▽ More This paper presents a novel algorithm, called MRRT, which uses multiple rapidly-exploring random trees for fast online replanning of autonomous vehicles in dynamic environments with moving obstacles. The proposed algorithm is built upon the RRT algorithm with a multi-tree structure. At the beginning, the RRT algorithm is applied to find the initial solution based on partial knowledge of the environment. Then, the robot starts to execute this path. At each iteration, the new obstacle configurations are collected by the robot's sensor and used to replan the path. This new information can come from unknown static obstacles (e.g., seafloor layout) as well as moving obstacles. Then, to accommodate the environmental changes, two procedures are adopted: 1) edge pruning, and 2) tree regrowing. Specifically, the edge pruning procedure checks the collision status through the tree and only removes the invalid edges while maintaining the tree structure of already-explored regions. Due to removal of invalid edges, the tree could be broken into multiple disjoint trees. As such, the RRT algorithm is applied to regrow the trees. Specifically, a sample is created randomly and joined to all the disjoint trees in its local neighborhood by connecting to the nearest nodes. Finally, a new solution is found for the robot. The advantages of the proposed MRRT algorithm are as follows: i) retains the maximal tree structure by only pruning the edges which collide with the obstacles, ii) guarantees probabilistic completeness, and iii) is computational efficient for fast replanning since all disjoint trees are maintained for future connections and expanded simultaneously. △ Less

Submitted 22 April, 2021; originally announced April 2021.

arXiv:2005.02163 [pdf, other]

Detecting Electric Devices in 3D Images of Bags

Authors: Anthony Bagnall, Paul Southam, James Large, Richard Harvey

Abstract: The aviation and transport security industries face the challenge of screening high volumes of baggage for threats and contraband in the minimum time possible. Automation and semi-automation of this procedure offers the potential to increase security by detecting more threats and improve the customer experience by speeding up the process. Traditional 2D x-ray images are often extremely difficult t… ▽ More The aviation and transport security industries face the challenge of screening high volumes of baggage for threats and contraband in the minimum time possible. Automation and semi-automation of this procedure offers the potential to increase security by detecting more threats and improve the customer experience by speeding up the process. Traditional 2D x-ray images are often extremely difficult to examine due to the fact that they are tightly packed and contain a wide variety of cluttered and occluded objects. Because of these limitations, major airports are introducing 3D x-ray Computed Tomography (CT) baggage scanning. We investigate whether we can automate the process of detecting electric devices in these 3D images of luggage. Detecting electrical devices is of particular concern as they can be used to conceal explosives. Given the massive volume of luggage that needs to be screened for this threat, the best way to automate the detection is to first filter whether a bag contains an electric device or not, and if it does, to identify the number of devices and their location. We present an algorithm, Unpack, Predict, eXtract, Repack (UXPR), which involves unpacking through segmenting the data at a range of scales using an algorithm known as the Sieve, predicting whether a segment is electrical or not based on the histogram of voxel intensities, then repacking the bag by ensembling the segments and predictions to identify the devices in bags. Through a range of experiments using data provided by ALERT (Awareness and Localization of Explosives-Related Threats) we show that this system can find a high proportion of devices with unsupervised segmentation if a similar device has been seen before, and shows promising results for detecting devices not seen at all based on the properties of its constituent parts. △ Less

Submitted 25 April, 2020; originally announced May 2020.

arXiv:1805.02948 [pdf, other]

Comparing heterogeneous visual gestures for measuring the diversity of visual speech signals

Authors: Helen L Bear, Richard Harvey

Abstract: Visual lip gestures observed whilst lipreading have a few working definitions, the most common two are; `the visual equivalent of a phoneme' and `phonemes which are indistinguishable on the lips'. To date there is no formal definition, in part because to date we have not established a two-way relationship or mapping between visemes and phonemes. Some evidence suggests that visual speech is highly… ▽ More Visual lip gestures observed whilst lipreading have a few working definitions, the most common two are; `the visual equivalent of a phoneme' and `phonemes which are indistinguishable on the lips'. To date there is no formal definition, in part because to date we have not established a two-way relationship or mapping between visemes and phonemes. Some evidence suggests that visual speech is highly dependent upon the speaker. So here, we use a phoneme-clustering method to form new phoneme-to-viseme maps for both individual and multiple speakers. We test these phoneme to viseme maps to examine how similarly speakers talk visually and we use signed rank tests to measure the distance between individuals. We conclude that broadly speaking, speakers have the same repertoire of mouth gestures, where they differ is in the use of the gestures. △ Less

Submitted 8 May, 2018; originally announced May 2018.

Journal ref: Computer Speech and Language, May 2018

arXiv:1805.02934 [pdf, other]

doi 10.1016/j.specom.2017.07.001

Phoneme-to-viseme mappings: the good, the bad, and the ugly

Authors: Helen L Bear, Richard Harvey

Abstract: Visemes are the visual equivalent of phonemes. Although not precisely defined, a working definition of a viseme is "a set of phonemes which have identical appearance on the lips". Therefore a phoneme falls into one viseme class but a viseme may represent many phonemes: a many to one mapping. This mapping introduces ambiguity between phonemes when using viseme classifiers. Not only is this ambiguit… ▽ More Visemes are the visual equivalent of phonemes. Although not precisely defined, a working definition of a viseme is "a set of phonemes which have identical appearance on the lips". Therefore a phoneme falls into one viseme class but a viseme may represent many phonemes: a many to one mapping. This mapping introduces ambiguity between phonemes when using viseme classifiers. Not only is this ambiguity damaging to the performance of audio-visual classifiers operating on real expressive speech, there is also considerable choice between possible mappings. In this paper we explore the issue of this choice of viseme-to-phoneme map. We show that there is definite difference in performance between viseme-to-phoneme mappings and explore why some maps appear to work better than others. We also devise a new algorithm for constructing phoneme-to-viseme mappings from labeled speech data. These new visemes, `Bear' visemes, are shown to perform better than previously known units. △ Less

Submitted 8 May, 2018; originally announced May 2018.

Journal ref: Speech Communication, Special Issue on AV expressive speech. 2017

arXiv:1805.02924 [pdf, other]

Comparing phonemes and visemes with DNN-based lipreading

Authors: Kwanchiva Thangthai, Helen L Bear, Richard Harvey

Abstract: There is debate if phoneme or viseme units are the most effective for a lipreading system. Some studies use phoneme units even though phonemes describe unique short sounds; other studies tried to improve lipreading accuracy by focusing on visemes with varying results. We compare the performance of a lipreading system by modeling visual speech using either 13 viseme or 38 phoneme units. We report t… ▽ More There is debate if phoneme or viseme units are the most effective for a lipreading system. Some studies use phoneme units even though phonemes describe unique short sounds; other studies tried to improve lipreading accuracy by focusing on visemes with varying results. We compare the performance of a lipreading system by modeling visual speech using either 13 viseme or 38 phoneme units. We report the accuracy of our system at both word and unit levels. The evaluation task is large vocabulary continuous speech using the TCD-TIMIT corpus. We complete our visual speech modeling via hybrid DNN-HMMs and our visual speech decoder is a Weighted Finite-State Transducer (WFST). We use DCT and Eigenlips as a representation of mouth ROI image. The phoneme lipreading system word accuracy outperforms the viseme based system word accuracy. However, the phoneme system achieved lower accuracy at the unit level which shows the importance of the dictionary for decoding classification outputs into words. △ Less

Submitted 8 May, 2018; originally announced May 2018.

Journal ref: BMVC Lipreading Workshop 2017

arXiv:1710.01169 [pdf, other]

Decoding visemes: improving machine lipreading

Authors: Helen L. Bear, Richard Harvey

Abstract: To undertake machine lip-reading, we try to recognise speech from a visual signal. Current work often uses viseme classification supported by language models with varying degrees of success. A few recent works suggest phoneme classification, in the right circumstances, can outperform viseme classification. In this work we present a novel two-pass method of training phoneme classifiers which uses p… ▽ More To undertake machine lip-reading, we try to recognise speech from a visual signal. Current work often uses viseme classification supported by language models with varying degrees of success. A few recent works suggest phoneme classification, in the right circumstances, can outperform viseme classification. In this work we present a novel two-pass method of training phoneme classifiers which uses previously trained visemes in the first pass. With our new training algorithm, we show classification performance which significantly improves on previous lip-reading results. △ Less