Showing 1–19 of 19 results for author: Harvey, R

Searching in archive cs.
  1. arXiv:2505.07956

    cs.LG cs.NE cs.SC

    Symbolic Regression with Multimodal Large Language Models and Kolmogorov Arnold Networks

    Authors: Thomas R. Harvey, Fabian Ruehle, Kit Fraser-Taliente, James Halverson

    Abstract: We present a novel approach to symbolic regression using vision-capable large language models (LLMs) and the ideas behind Google DeepMind's FunSearch. The LLM is given a plot of a univariate function and tasked with proposing an ansatz for that function. The free parameters of the ansatz are fitted using standard numerical optimisers, and a collection of such ansätze make up the population of a…

    Submitted 12 May, 2025; originally announced May 2025.

  2. arXiv:2504.06330

    cs.CV cs.AI

    Analyzing the Impact of Low-Rank Adaptation for Cross-Domain Few-Shot Object Detection in Aerial Images

    Authors: Hicham Talaoubrid, Anissa Mokraoui, Ismail Ben Ayed, Axel Prouvost, Sonimith Hang, Monit Korn, Rémi Harvey

    Abstract: This paper investigates the application of Low-Rank Adaptation (LoRA) to small models for cross-domain few-shot object detection in aerial images. Originally designed for large-scale models, LoRA helps mitigate overfitting, making it a promising approach for resource-constrained settings. We integrate LoRA into DiffusionDet, and evaluate its performance on the DOTA and DIOR datasets. Our results s…

    Submitted 8 April, 2025; originally announced April 2025.

  3. arXiv:2503.11061

    cs.LG math.CO

    Generative Modeling for Mathematical Discovery

    Authors: Jordan S. Ellenberg, Cristofero S. Fraser-Taliente, Thomas R. Harvey, Karan Srivastava, Andrew V. Sutherland

    Abstract: We present a new implementation of the LLM-driven genetic algorithm funsearch, whose aim is to generate examples of interest to mathematicians and which has already had some success in problems in extremal combinatorics. Our implementation is designed to be useful in practice for working mathematicians; it does not require expertise in machine learning or access to high-performance computing…

    Submitted 16 March, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    Comments: 22 pages, 14 figures

    MSC Class: 68T20

  4. arXiv:2306.03147

    hep-th cs.AI

    Decoding Nature with Nature's Tools: Heterotic Line Bundle Models of Particle Physics with Genetic Algorithms and Quantum Annealing

    Authors: Steve Abel, Andrei Constantin, Thomas R. Harvey, Andre Lukas, Luca A. Nutricati

    Abstract: The string theory landscape may include a multitude of ultraviolet embeddings of the Standard Model, but identifying these has proven difficult due to the enormous number of available string compactifications. Genetic Algorithms (GAs) represent a powerful class of discrete optimisation techniques that can efficiently deal with the immensity of the string landscape, especially when enhanced with in…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: 11 pages, 4 figures

  5. SMART: Self-Morphing Adaptive Replanning Tree

    Authors: Zongyuan Shen, James P. Wilson, Shalabh Gupta, Ryan Harvey

    Abstract: The paper presents an algorithm, called Self-Morphing Adaptive Replanning Tree (SMART), that facilitates fast replanning in dynamic environments. SMART performs risk based tree-pruning if the current path is obstructed by nearby moving obstacle(s), resulting in multiple disjoint subtrees. Then, for speedy recovery, it exploits these subtrees and performs informed tree-repair at hot-spots that lie…

    Submitted 21 September, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: 9 pages

    Journal ref: IEEE Robotics and Automation Letters, 2023

  6. arXiv:2109.05043

    cs.RO

    SMARRT: Self-Repairing Motion-Reactive Anytime RRT for Dynamic Environments

    Authors: Zongyuan Shen, James Wilson, Ryan Harvey, Shalabh Gupta

    Abstract: This paper addresses the fast replanning problem in dynamic environments with moving obstacles. Since for randomly moving obstacles the future states are unpredictable, the proposed method, called SMARRT, reacts to obstacle motions and revises the path in real-time based on the current interfering obstacle state (i.e., position and velocity). SMARRT is fast and efficient and performs collision che…

    Submitted 10 September, 2021; originally announced September 2021.

  7. Heterotic String Model Building with Monad Bundles and Reinforcement Learning

    Authors: Andrei Constantin, Thomas R. Harvey, Andre Lukas

    Abstract: We use reinforcement learning as a means of constructing string compactifications with prescribed properties. Specifically, we study heterotic SO(10) GUT models on Calabi-Yau three-folds with monad bundles, in search of phenomenologically promising examples. Due to the vast number of bundles and the sparseness of viable choices, methods based on systematic scanning are not suitable for this class…

    Submitted 16 August, 2021; originally announced August 2021.

    Comments: 35 pages, 9 figures, data set of models included as ancillary material in the submission

  8. arXiv:2108.01626

    cs.RO eess.SY

    CPPNet: A Coverage Path Planning Network

    Authors: Zongyuan Shen, Palash Agrawal, James P. Wilson, Ryan Harvey, Shalabh Gupta

    Abstract: This paper presents a deep-learning based CPP algorithm, called Coverage Path Planning Network (CPPNet). CPPNet is built using a convolutional neural network (CNN) whose input is a graph-based representation of the occupancy grid map while its output is an edge probability heat graph, where the value of each edge is the probability of belonging to the optimal TSP tour. Finally, a greedy search is…

    Submitted 3 August, 2021; originally announced August 2021.

  9. arXiv:2104.11059

    cs.RO cs.AI eess.SY

    MRRT: Multiple Rapidly-Exploring Random Trees for Fast Online Replanning in Dynamic Environments

    Authors: Zongyuan Shen, James P. Wilson, Ryan Harvey, Shalabh Gupta

    Abstract: This paper presents a novel algorithm, called MRRT, which uses multiple rapidly-exploring random trees for fast online replanning of autonomous vehicles in dynamic environments with moving obstacles. The proposed algorithm is built upon the RRT algorithm with a multi-tree structure. At the beginning, the RRT algorithm is applied to find the initial solution based on partial knowledge of the enviro…

    Submitted 22 April, 2021; originally announced April 2021.

  10. arXiv:2005.02163

    cs.CV cs.LG stat.ML

    Detecting Electric Devices in 3D Images of Bags

    Authors: Anthony Bagnall, Paul Southam, James Large, Richard Harvey

    Abstract: The aviation and transport security industries face the challenge of screening high volumes of baggage for threats and contraband in the minimum time possible. Automation and semi-automation of this procedure offers the potential to increase security by detecting more threats and improve the customer experience by speeding up the process. Traditional 2D x-ray images are often extremely difficult t…

    Submitted 25 April, 2020; originally announced May 2020.

  11. arXiv:1805.02948

    eess.IV cs.CV cs.SD eess.AS

    Comparing heterogeneous visual gestures for measuring the diversity of visual speech signals

    Authors: Helen L Bear, Richard Harvey

    Abstract: Visual lip gestures observed whilst lipreading have a few working definitions; the two most common are 'the visual equivalent of a phoneme' and 'phonemes which are indistinguishable on the lips'. To date there is no formal definition, in part because we have not yet established a two-way relationship or mapping between visemes and phonemes. Some evidence suggests that visual speech is highly…

    Submitted 8 May, 2018; originally announced May 2018.

    Journal ref: Computer Speech and Language, May 2018

  12. arXiv:1805.02934

    cs.CV cs.SD eess.AS eess.IV

    Phoneme-to-viseme mappings: the good, the bad, and the ugly

    Authors: Helen L Bear, Richard Harvey

    Abstract: Visemes are the visual equivalent of phonemes. Although not precisely defined, a working definition of a viseme is "a set of phonemes which have identical appearance on the lips". Therefore a phoneme falls into one viseme class but a viseme may represent many phonemes: a many to one mapping. This mapping introduces ambiguity between phonemes when using viseme classifiers. Not only is this ambiguit…

    Submitted 8 May, 2018; originally announced May 2018.

    Journal ref: Speech Communication, Special Issue on AV expressive speech. 2017

  13. arXiv:1805.02924

    cs.CV cs.CL cs.SD eess.AS eess.IV

    Comparing phonemes and visemes with DNN-based lipreading

    Authors: Kwanchiva Thangthai, Helen L Bear, Richard Harvey

    Abstract: There is debate over whether phoneme or viseme units are the more effective for a lipreading system. Some studies use phoneme units even though phonemes describe unique short sounds; other studies tried to improve lipreading accuracy by focusing on visemes with varying results. We compare the performance of a lipreading system by modeling visual speech using either 13 viseme or 38 phoneme units. We report t…

    Submitted 8 May, 2018; originally announced May 2018.

    Journal ref: BMVC Lipreading Workshop 2017

  14. arXiv:1710.01169

    cs.CV eess.AS

    Decoding visemes: improving machine lipreading

    Authors: Helen L. Bear, Richard Harvey

    Abstract: To undertake machine lip-reading, we try to recognise speech from a visual signal. Current work often uses viseme classification supported by language models with varying degrees of success. A few recent works suggest phoneme classification, in the right circumstances, can outperform viseme classification. In this work we present a novel two-pass method of training phoneme classifiers which uses p…

    Submitted 3 October, 2017; originally announced October 2017.

    Journal ref: Helen L Bear and Richard Harvey. Decoding visemes: improving machine lipreading. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016. p2009-2013

  15. arXiv:1710.01142

    cs.CV cs.CL eess.AS

    Finding phonemes: improving machine lip-reading

    Authors: Helen L. Bear, Richard W. Harvey, Yuxuan Lan

    Abstract: In machine lip-reading there is continued debate and research around the correct classes to be used for recognition. In this paper we use a structured approach for devising speaker-dependent viseme classes, which enables the creation of a set of phoneme-to-viseme maps where each has a different quantity of visemes ranging from two to 45. Viseme classes are based upon the mapping of articulated pho…

    Submitted 3 October, 2017; originally announced October 2017.

    Journal ref: Helen L. Bear, Richard W. Harvey, Yuxuan Lan. Finding phonemes: improving machine lip-reading. Audio-Visual Speech Processing (AVSP), 2015 p115-120

  16. arXiv:1710.01122

    cs.CV eess.AS

    Speaker-independent machine lip-reading with speaker-dependent viseme classifiers

    Authors: Helen L. Bear, Stephen J. Cox, Richard W. Harvey

    Abstract: In machine lip-reading, which is identification of speech from visual-only information, there is evidence to show that visual speech is highly dependent upon the speaker [1]. Here, we use a phoneme-clustering method to form new phoneme-to-viseme maps for both individual and multiple speakers. We use these maps to examine how similarly speakers talk visually. We conclude that broadly speaking, spea…

    Submitted 3 October, 2017; originally announced October 2017.

    Journal ref: Helen L. Bear, Stephen J. Cox, Richard W. Harvey, Speaker-independent machine lip-reading with speaker-dependent viseme classifiers. Audio-Visual Speech Processing (AVSP) 2015, p190-195

  17. arXiv:1710.01093

    cs.CV cs.CL eess.AS

    Which phoneme-to-viseme maps best improve visual-only computer lip-reading?

    Authors: Helen L. Bear, Richard W. Harvey, Barry-John Theobald, Yuxuan Lan

    Abstract: A critical assumption of all current visual speech recognition systems is that there are visual speech units called visemes which can be mapped to units of acoustic speech, the phonemes. Despite there being a number of published maps it is infrequent to see the effectiveness of these tested, particularly on visual-only lip-reading (many works use audio-visual speech). Here we examine 120 mappings…

    Submitted 3 October, 2017; originally announced October 2017.

    Journal ref: Helen L. Bear, Richard W. Harvey, Barry-John Theobald, and Yuxuan Lan. Which phoneme-to-viseme maps best improve visual-only computer lip-reading? Advances in Visual Computing 2014. p230-239

  18. arXiv:1710.01084

    cs.CV eess.IV

    Some observations on computer lip-reading: moving from the dream to the reality

    Authors: Helen L. Bear, Gari Owen, Richard Harvey, Barry-John Theobald

    Abstract: In the quest for greater computer lip-reading performance there are a number of tacit assumptions which are either present in the datasets (high resolution for example) or in the methods (recognition of spoken visual units called visemes for example). Here we review these and other assumptions and show the surprising result that computer lip-reading is not heavily constrained by video resolution,…

    Submitted 3 October, 2017; originally announced October 2017.

    Journal ref: Helen L. Bear, Gari Owen, Richard Harvey, and Barry-John Theobald. Some observations on computer lip-reading: moving from the dream to the reality. International Society for Optics and Photonics- Security and defence. 2014. p92530G--92530G

  19. arXiv:1710.01073

    cs.CV eess.IV

    Resolution limits on visual speech recognition

    Authors: Helen L. Bear, Richard Harvey, Barry-John Theobald, Yuxuan Lan

    Abstract: Visual-only speech recognition is dependent upon a number of factors that can be difficult to control, such as lighting, identity, motion, emotion and expression. But some factors, such as video resolution, are controllable, so it is surprising that there is not yet a systematic study of the effect of resolution on lip-reading. Here we use a new data set, the Rosetta Raven data, to train and test…

    Submitted 3 October, 2017; originally announced October 2017.

    Journal ref: Helen L. Bear, Richard Harvey, Barry-John Theobald, Yuxuan Lan. Resolution limits on visual speech recognition. International Conference on Image Processing (ICIP). 2014. p1371-1375