Showing 1–50 of 134 results for author: Roberts, J

Searching in archive cs.
  1. arXiv:2506.10910 [pdf, ps, other]

    cs.CL

    Magistral

    Authors: Mistral-AI, Abhinav Rastogi, Albert Q. Jiang, Andy Lo, Gabrielle Berrada, Guillaume Lample, Jason Rute, Joep Barmentlo, Karmesh Yadav, Kartik Khandelwal, Khyathi Raghavi Chandu, Léonard Blier, Lucile Saulnier, Matthieu Dinot, Maxime Darrin, Neha Gupta, Roman Soletskyi, Sagar Vaze, Teven Le Scao, Yihan Wang, Adam Yang, Alexander H. Liu, Alexandre Sablayrolles, Amélie Héliou, et al. (76 additional authors not shown)

    Abstract: We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior models, we follow a ground up approach, relying solely on our own models and infrastructure. Notably, we demonstrate a stack that enabled us to explore the limits of pure RL training of LLMs, present a s…

    Submitted 12 June, 2025; originally announced June 2025.

  2. arXiv:2506.09452 [pdf, ps, other]

    cs.LG cs.CL cs.CR cs.IT

    Learning Obfuscations Of LLM Embedding Sequences: Stained Glass Transform

    Authors: Jay Roberts, Kyle Mylonakis, Sidhartha Roy, Kaan Kale

    Abstract: The high cost of ownership of AI compute infrastructure and challenges of robust serving of large language models (LLMs) have led to a surge in managed Model-as-a-service deployments. Even when enterprises choose on-premises deployments, the compute infrastructure is typically shared across many teams in order to maximize the return on investment. In both scenarios the deployed models operate only…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Submitted to IEEE S&P 2026

    ACM Class: I.2.7; I.2.m

  3. arXiv:2505.13758 [pdf, ps, other]

    cs.CR

    BeamClean: Language Aware Embedding Reconstruction

    Authors: Kaan Kale, Kyle Mylonakis, Jay Roberts, Sidhartha Roy

    Abstract: In this work, we consider an inversion attack on the obfuscated input embeddings sent to a language model on a server, where the adversary has no access to the language model or the obfuscation mechanism and sees only the obfuscated embeddings along with the model's embedding table. We propose BeamClean, an inversion attack that jointly estimates the noise parameters and decodes token sequences by…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: 9 pages, 5 figures, under review at NeurIPS, 2025

  4. arXiv:2505.12556 [pdf, ps, other]

    cs.LG cs.AI

    Beyond Accuracy: EcoL2 Metric for Sustainable Neural PDE Solvers

    Authors: Taniya Kapoor, Abhishek Chandra, Anastasios Stamou, Stephen J Roberts

    Abstract: Real-world systems, from aerospace to railway engineering, are modeled with partial differential equations (PDEs) describing the physics of the system. Estimating robust solutions for such problems is essential. Deep learning-based architectures, such as neural PDE solvers, have recently gained traction as a reliable solution method. The current state of development of these approaches, however, p…

    Submitted 18 May, 2025; originally announced May 2025.

  5. arXiv:2505.11626 [pdf, ps, other]

    cs.CL cs.AI

    THELMA: Task Based Holistic Evaluation of Large Language Model Applications-RAG Question Answering

    Authors: Udita Patel, Rutu Mulkar, Jay Roberts, Cibi Chakravarthy Senthilkumar, Sujay Gandhi, Xiaofei Zheng, Naumaan Nayyar, Parul Kalra, Rafael Castrillo

    Abstract: We propose THELMA (Task Based Holistic Evaluation of Large Language Model Applications), a reference-free framework for RAG (Retrieval-Augmented Generation) based question answering (QA) applications. THELMA consists of six interdependent metrics specifically designed for holistic, fine-grained evaluation of RAG QA applications. THELMA framework helps developers and application owners evaluate, mon…

    Submitted 3 June, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

    Comments: Added author

  6. arXiv:2504.17321 [pdf, other]

    physics.geo-ph cs.LG

    Dargana: fine-tuning EarthPT for dynamic tree canopy mapping from space

    Authors: Michael J. Smith, Luke Fleming, James E. Geach, Ryan J. Roberts, Freddie Kalaitzis, James Banister

    Abstract: We present Dargana, a fine-tuned variant of the EarthPT time-series foundation model that achieves specialisation using <3% of its pre-training data volume and 5% of its pre-training compute. Dargana is fine-tuned to generate regularly updated classification of tree canopy cover at 10m resolution, distinguishing conifer and broadleaved tree types. Using Cornwall, UK, as a test case, the model achi…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: 9 pages, 6 figures, spotlight at 'Tackling Climate Change with Machine Learning', ICLR 2025

  7. arXiv:2503.16443 [pdf, other]

    cs.HC

    Analysis of Distracted Pedestrians Crossing Behavior: An Immersive Virtual Reality Application

    Authors: Methusela Sulle, Judith Mwakalonge, Gurcan Comert, Saidi Siuhi, Nana Kankam Gyimah, Jaylen Roberts, Denis Ruganuza

    Abstract: Pedestrian safety is a critical public health priority, with pedestrian fatalities accounting for 18% of all U.S. traffic deaths in 2022. The rising prevalence of distracted walking, exacerbated by mobile device use, poses significant risks at signalized intersections. This study utilized an immersive virtual reality (VR) environment to simulate real-world traffic scenarios and assess pedestrian b…

    Submitted 14 February, 2025; originally announced March 2025.

  8. arXiv:2503.12530 [pdf, other]

    cs.CL

    Basic Category Usage in Vision Language Models

    Authors: Hunter Sawyer, Jesse Roberts, Kyle Moore

    Abstract: The field of psychology has long recognized a basic level of categorization that humans use when labeling visual stimuli, a term coined by Rosch in 1976. This level of categorization has been found to be used most frequently, to have higher information density, and to aid in visual language tasks with priming in humans. Here, we investigate basic level categorization in two recently released, open…

    Submitted 16 March, 2025; originally announced March 2025.

  9. arXiv:2503.12528 [pdf, other]

    cs.CL

    Investigating Human-Aligned Large Language Model Uncertainty

    Authors: Kyle Moore, Jesse Roberts, Daryl Watson, Pamela Wisniewski

    Abstract: Recent work has sought to quantify large language model uncertainty to facilitate model control and modulate user trust. Previous works focus on measures of uncertainty that are theoretically grounded or reflect the average overt behavior of the model. In this work, we investigate a variety of uncertainty measures, in order to identify measures that correlate with human group-level uncertainty. We…

    Submitted 16 March, 2025; originally announced March 2025.

  10. arXiv:2502.10410 [pdf]

    cs.CY cs.AI

    Auto-Evaluation: A Critical Measure in Driving Improvements in Quality and Safety of AI-Generated Lesson Resources

    Authors: Hannah-Beth Clark, Margaux Dowland, Laura Benton, Reka Budai, Ibrahim Kaan Keskin, Emma Searle, Matthew Gregory, Mark Hodierne, William Gayne, John Roberts

    Abstract: As a publicly funded body in the UK, Oak National Academy is in a unique position to innovate within this field as we have a comprehensive curriculum of approximately 13,000 open education resources (OER) for all National Curriculum subjects, designed and quality-assured by expert, human teachers. This has provided the corpus of content needed for building a high-quality AI-powered lesson planning…

    Submitted 23 January, 2025; originally announced February 2025.

    Comments: 27 pages, Part of MIT Open Learning AI and Open Education Initiative Series, published Jan 2025 https://aiopeneducation.pubpub.org/pub/i36sncz8/release/3?readingCollection=06969c6d

  11. arXiv:2502.09696 [pdf, other]

    cs.CV

    ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models

    Authors: Jonathan Roberts, Mohammad Reza Taesiri, Ansh Sharma, Akash Gupta, Samuel Roberts, Ioana Croitoru, Simion-Vlad Bogolin, Jialu Tang, Florian Langer, Vyas Raina, Vatsal Raina, Hanyi Xiong, Vishaal Udandarao, Jingyi Lu, Shiyang Chen, Sam Purkis, Tianshuo Yan, Wenye Lin, Gyungin Shin, Qiaochu Yang, Anh Totti Nguyen, David I. Atkinson, Aaditya Baranwal, Alexandru Coca, Mikah Dang, et al. (9 additional authors not shown)

    Abstract: Large Multimodal Models (LMMs) exhibit major shortfalls when interpreting images and, by some measures, have poorer spatial cognition than small children or animals. Despite this, they attain high scores on many popular visual benchmarks, with headroom rapidly eroded by an ongoing surge of model progress. To address this, there is a pressing need for difficult benchmarks that remain relevant for l…

    Submitted 6 March, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

    Comments: 20 pages, 13 figures

  12. arXiv:2501.14249 [pdf, other]

    cs.LG cs.AI cs.CL

    Humanity's Last Exam

    Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes, et al. (1084 additional authors not shown)

    Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…

    Submitted 19 April, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 29 pages, 6 figures

  13. arXiv:2501.14176 [pdf, other]

    cs.LG cs.AI

    RL + Transformer = A General-Purpose Problem Solver

    Authors: Micah Rentschler, Jesse Roberts

    Abstract: What if artificial intelligence could not only solve problems for which it was trained but also learn to teach itself to solve new problems (i.e., meta-learn)? In this study, we demonstrate that a pre-trained transformer fine-tuned with reinforcement learning over multiple episodes develops the ability to solve problems that it has never encountered before - an emergent ability called In-Context R…

    Submitted 23 January, 2025; originally announced January 2025.

  14. Leveraging Large Language Models and Machine Learning for Smart Contract Vulnerability Detection

    Authors: S M Mostaq Hossain, Amani Altarawneh, Jesse Roberts

    Abstract: As blockchain technology and smart contracts become widely adopted, securing them throughout every stage of the transaction process is essential. The concern of improved security for smart contracts is to find and detect vulnerabilities using classical Machine Learning (ML) models and fine-tuned Large Language Models (LLM). The robustness of such work rests on a labeled smart contract dataset that…

    Submitted 4 January, 2025; originally announced January 2025.

    Comments: 7 pages, 4 figures, 1 table. This paper has been accepted at the 2025 IEEE 15th Annual Computing and Communication Workshop and Conference (CCWC)

  15. arXiv:2412.17613 [pdf, other]

    cs.LG stat.ML

    Can Stability be Detrimental? Better Generalization through Gradient Descent Instabilities

    Authors: Lawrence Wang, Stephen J. Roberts

    Abstract: Traditional analyses of gradient descent optimization show that, when the largest eigenvalue of the loss Hessian - often referred to as the sharpness - is below a critical learning-rate threshold, then training is 'stable' and training loss decreases monotonically. Recent studies, however, have suggested that the majority of modern deep neural networks achieve good performance despite operating ou…

    Submitted 23 December, 2024; originally announced December 2024.

  16. arXiv:2412.13602 [pdf, ps, other]

    cs.CL

    GAMEBoT: Transparent Assessment of LLM Reasoning in Games

    Authors: Wenye Lin, Jonathan Roberts, Yunhan Yang, Samuel Albanie, Zongqing Lu, Kai Han

    Abstract: Large Language Models (LLMs) are increasingly deployed in real-world applications that demand complex reasoning. To track progress, robust benchmarks are required to evaluate their capabilities beyond superficial pattern recognition. However, current LLM reasoning benchmarks often face challenges such as insufficient interpretability, performance saturation or data contamination. To address these…

    Submitted 1 June, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

    Comments: 9 pages, ACL 2025

  17. arXiv:2411.12846 [pdf, other]

    cs.CY cs.CV cs.HC eess.IV

    Towards Fairness in AI for Melanoma Detection: Systemic Review and Recommendations

    Authors: Laura N Montoya, Jennafer Shae Roberts, Belen Sanchez Hidalgo

    Abstract: Early and accurate melanoma detection is crucial for improving patient outcomes. Recent advancements in artificial intelligence (AI) have shown promise in this area, but the technology's effectiveness across diverse skin tones remains a critical challenge. This study conducts a systematic review and preliminary analysis of AI-based melanoma detection research published between 2013 and 2024, focusing…

    Submitted 19 November, 2024; originally announced November 2024.

    Comments: 22 pages, 4 figures, 7 tables, accepted for publication in Future of Information and Communication Conference (FICC) 2025, whose proceedings will be published in 'Lecture Notes in Networks and Systems' by Springer Nature

  18. arXiv:2411.05000 [pdf, other]

    cs.CL

    Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?

    Authors: Jonathan Roberts, Kai Han, Samuel Albanie

    Abstract: As the context limits of Large Language Models (LLMs) increase, the range of possible applications and downstream functions broadens. In many real-world tasks, decisions depend on details scattered across collections of often disparate documents containing mostly irrelevant information. Long-context LLMs appear well-suited to this form of complex information retrieval and reasoning, which has trad…

    Submitted 23 April, 2025; v1 submitted 7 November, 2024; originally announced November 2024.

    Comments: Accepted at ICLR 2025

  19. arXiv:2410.22456 [pdf, other]

    cs.CV cs.AI

    Image2Struct: Benchmarking Structure Extraction for Vision-Language Models

    Authors: Josselin Somerville Roberts, Tony Lee, Chi Heem Wong, Michihiro Yasunaga, Yifan Mai, Percy Liang

    Abstract: We introduce Image2Struct, a benchmark to evaluate vision-language models (VLMs) on extracting structure from images. Our benchmark 1) captures real-world use cases, 2) is fully automatic and does not require human judgment, and 3) is based on a renewable stream of fresh data. In Image2Struct, VLMs are prompted to generate the underlying structure (e.g., LaTeX code or HTML) from an input image (e.…

    Submitted 29 October, 2024; originally announced October 2024.

    Comments: NeurIPS 2024. First three authors contributed equally

  20. arXiv:2410.07112 [pdf, other]

    cs.CV cs.AI

    VHELM: A Holistic Evaluation of Vision Language Models

    Authors: Tony Lee, Haoqin Tu, Chi Heem Wong, Wenhao Zheng, Yiyang Zhou, Yifan Mai, Josselin Somerville Roberts, Michihiro Yasunaga, Huaxiu Yao, Cihang Xie, Percy Liang

    Abstract: Current benchmarks for assessing vision-language models (VLMs) often focus on their perception or problem-solving capabilities and neglect other critical aspects such as fairness, multilinguality, or toxicity. Furthermore, they differ in their evaluation procedures and the scope of the evaluation, making it difficult to compare models. To address these issues, we extend the HELM framework to VLMs…

    Submitted 24 October, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: NeurIPS 2024. First three authors contributed equally

  21. arXiv:2409.02823 [pdf, other]

    cs.HC

    Design Contradictions: Help or Hindrance?

    Authors: Aron E. Owen, Jonathan C. Roberts

    Abstract: The need for innovative ideas in data visualisation drives us to explore new creative approaches. Combining two or more creative words, particularly those that contradict each other, can positively impact the creative process, sparking novel ideas and designs. As we move towards AI-driven design, an open question arises: do these design contradictions work positively with AI tools? Currently, the…

    Submitted 4 September, 2024; originally announced September 2024.

  22. arXiv:2409.02036 [pdf, other]

    cs.HC

    Towards Metrics for Evaluating Creativity in Visualisation Design

    Authors: Aron E Owen, Jonathan C Roberts

    Abstract: Creativity in visualisation design is essential for designers and data scientists who need to present data in innovative ways. It is often achieved through sketching or drafting low-fidelity prototypes. However, judging this innovation is often difficult. A creative visualisation test would offer a structured approach to enhancing visual thinking and design skills, which are vital across many fiel…

    Submitted 3 September, 2024; originally announced September 2024.

  23. arXiv:2409.01882 [pdf, other]

    cs.CL

    Investigating Expert-in-the-Loop LLM Discourse Patterns for Ancient Intertextual Analysis

    Authors: Ray Umphrey, Jesse Roberts, Lindsey Roberts

    Abstract: This study explores the potential of large language models (LLMs) for identifying and examining intertextual relationships within biblical, Koine Greek texts. By evaluating the performance of LLMs on various intertextuality scenarios the study demonstrates that these models can detect direct quotations, allusions, and echoes between texts. The LLM's ability to generate novel intertextual observati…

    Submitted 29 September, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

  24. arXiv:2409.01283 [pdf, other]

    cs.HC

    Towards a Generative AI Design Dialogue

    Authors: Aron E. Owen, Jonathan C. Roberts

    Abstract: Traditional visualisation designers often start with sketches before implementation. With generative AI, these sketches can be turned into AI-generated visualisations using specific prompts. However, guiding AI to create compelling visuals can be challenging. We propose a new design process where designers verbalise their thoughts during work, later converting these narratives into AI prompts. Thi…

    Submitted 19 August, 2024; originally announced September 2024.

  25. arXiv:2408.16479 [pdf, other]

    cs.HC cs.CY

    Fostering Creative Visualisation Skills Through Data-Art Exhibitions

    Authors: Jonathan C. Roberts

    Abstract: Data-art exhibitions offer a unique and real-world setting to foster creative visualisation skills among students. They serve as a real-world platform for students to display their work, bridging the gap between classroom learning and professional practice. Students must develop a technical solution, grasp the context, and produce work that is appropriate for public presentation. This scenario helps…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 2 pages, 2 figures, Accepted for presentation in IEEE VIS Posters 2024

    ACM Class: H.5.2; K.3.0; K.4.0

  26. arXiv:2408.11817 [pdf, other]

    cs.CV

    GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models

    Authors: Jonathan Roberts, Kai Han, Samuel Albanie

    Abstract: Large multimodal models (LMMs) have exhibited proficiencies across many visual tasks. Although numerous well-known benchmarks exist to evaluate model performance, they increasingly have insufficient headroom. As such, there is a pressing need for a new generation of benchmarks challenging enough for the next generation of LMMs. One area in which LMMs show potential is graph analysis, specifically, the…

    Submitted 29 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: V2: Fixed references formatting

  27. arXiv:2408.10439 [pdf, other]

    cs.HC cs.GR

    Visual Storytelling: A Methodological Approach to Designing and Implementing a Visualisation Poster

    Authors: Rhiannon Owen, Jonathan Roberts

    Abstract: We present a design study of developing a visualisation poster. Posters can be difficult to create, and the story on a poster is not always clear. Using a case-study approach we propose three important aspects: first, the poster should have a clear focus (especially a hero visualisation); second, envisioning its use helps to drive the important aspects; and third, the essence (its fundamental concept and guiding…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 5 pages, 1 figure, accepted for publication to the EG UK Computer Graphics & Visual Computing (CGVC) 2024

    ACM Class: I.3.8; K.3.0

  28. arXiv:2408.08651 [pdf, other]

    cs.CL cs.AI

    Reasoning Beyond Bias: A Study on Counterfactual Prompting and Chain of Thought Reasoning

    Authors: Kyle Moore, Jesse Roberts, Thao Pham, Douglas Fisher

    Abstract: Language models are known to absorb biases from their training data, leading to predictions driven by statistical regularities rather than semantic relevance. We investigate the impact of these biases on answer choice preferences in the Massive Multi-Task Language Understanding (MMLU) task. Our findings reveal that differences in learned regularities across answer options are predictive of model p…

    Submitted 5 September, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

  29. arXiv:2408.07590 [pdf, other]

    cs.HC cs.GR

    Creating Data Art: Authentic Learning and Visualisation Exhibition

    Authors: Jonathan C. Roberts

    Abstract: We present an authentic learning task designed for computing students, centred on the creation of data-art visualisations from chosen datasets for a public exhibition. This exhibition was showcased in the cinema foyer for two weeks in June, providing a real-world platform for students to display their work. Over the course of two years, we implemented this active learning task with two different c…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 9 pages, 5 Figures, Accepted for publication in Proceedings EG UK Computer Graphics & Visual Computing (2024)

    ACM Class: I.3.8; K.3.0

  30. arXiv:2408.04750 [pdf, other]

    cs.HC

    Engaging Data-Art: Conducting a Public Hands-On Workshop

    Authors: Jonathan C. Roberts

    Abstract: Data-art blends visualisation, data science, and artistic expression. It allows people to transform information and data into exciting and interesting visual narratives. Hosting a public data-art hands-on workshop enables participants to engage with data and learn fundamental visualisation techniques. However, being a public event, it presents a range of challenges. We outline our approach to orga…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 6 pages, 6 figures, IEEE EduVis workshop paper, proceedings of IEEE Visualization 2024 conference

    ACM Class: I.3.8; K.3.0

  31. arXiv:2408.03681 [pdf, other]

    cs.HC

    Path-based Design Model for Constructing and Exploring Alternative Visualisations

    Authors: James Jackson, Panagiotis D. Ritsos, Peter W. S. Butcher, Jonathan C. Roberts

    Abstract: We present a path-based design model and system for designing and creating visualisations. Our model represents a systematic approach to constructing visual representations of data or concepts following a predefined sequence of steps. The initial step involves outlining the overall appearance of the visualisation by creating a skeleton structure, referred to as a flowpath. Subsequently, we specify…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 11 pages, 14 figures, accepted for publication in IEEE Transactions on Visualization and Computer Graphics

    ACM Class: J.5; I.3.0; I.3.6; I.3.8; H.5.2

  32. arXiv:2407.15695 [pdf, ps, other]

    cs.HC cs.CL

    Supporting the Digital Autonomy of Elders Through LLM Assistance

    Authors: Jesse Roberts, Lindsey Roberts, Alice Reed

    Abstract: The internet offers tremendous access to services, social connections, and needed products. However, to those without sufficient experience, engaging with businesses and friends across the internet can be daunting due to the ever present danger of scammers and thieves, to say nothing of the myriad of potential computer viruses. Like a forest rich with both edible and poisonous plants, those famili…

    Submitted 22 July, 2024; originally announced July 2024.

  33. arXiv:2407.06349 [pdf, other]

    cs.CL cs.AI

    Large Language Model Recall Uncertainty is Modulated by the Fan Effect

    Authors: Jesse Roberts, Kyle Moore, Thao Pham, Oseremhen Ewaleifoh, Doug Fisher

    Abstract: This paper evaluates whether large language models (LLMs) exhibit cognitive fan effects, similar to those discovered by Anderson in humans, after being pre-trained on human textual data. We conduct two sets of in-context recall experiments designed to elicit fan effects. Consistent with human results, we find that LLM recall uncertainty, measured via token probability, is influenced by the fan eff…

    Submitted 29 September, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  34. arXiv:2407.03483 [pdf, other]

    math.DS cs.MS math.AP

    Construct accurate multi-continuum micromorphic homogenisations in multi-D space-time with computer algebra

    Authors: A. J. Roberts

    Abstract: Homogenisation empowers the efficient macroscale system level prediction of physical scenarios with intricate microscale structures. Here we develop an innovative powerful, rigorous and flexible framework for asymptotic homogenisation of dynamics at the finite scale separation of real physics, with proven results underpinned by modern dynamical systems theory. The novel systematic approach…

    Submitted 6 April, 2025; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: 3rd version

    MSC Class: 35B27; 74H10; 37L10; 35B40

  35. arXiv:2406.11634 [pdf, other]

    cs.CL cs.AI

    The Base-Rate Effect on LLM Benchmark Performance: Disambiguating Test-Taking Strategies from Benchmark Performance

    Authors: Kyle Moore, Jesse Roberts, Thao Pham, Oseremhen Ewaleifoh, Doug Fisher

    Abstract: Cloze testing is a common method for measuring the behavior of large language models on a number of benchmark tasks. Using the MMLU dataset, we show that the base-rate probability (BRP) differences across answer tokens are significant and affect task performance (i.e., guess A if uncertain). We find that counterfactual prompting does sufficiently mitigate the BRP effect. The BRP effect is found to hav…

    Submitted 30 September, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  36. arXiv:2405.14930 [pdf, other]

    astro-ph.IM astro-ph.GA cs.LG

    AstroPT: Scaling Large Observation Models for Astronomy

    Authors: Michael J. Smith, Ryan J. Roberts, Eirini Angeloudi, Marc Huertas-Company

    Abstract: This work presents AstroPT, an autoregressive pretrained transformer developed with astronomical use-cases in mind. The AstroPT models presented here have been pretrained on 8.6 million 512 × 512 pixel grz-band galaxy postage stamp observations from the DESI Legacy Survey DR8. We train a selection of foundation models of increasing size from 1 million to 2.1 billion parameters, and find t…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 12 pages, 4 figures, 1 table. Code available at https://github.com/Smith42/astroPT

  37. arXiv:2405.08807 [pdf, other]

    cs.CV

    SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation

    Authors: Jonathan Roberts, Kai Han, Neil Houlsby, Samuel Albanie

    Abstract: Large multimodal models (LMMs) have proven flexible and generalisable across many tasks and fields. Although they have strong potential to aid scientific research, their capabilities in this domain are not well characterised. A key aspect of scientific research is the ability to understand and interpret figures, which serve as a rich, compressed source of complex information. In this work, we pres…

    Submitted 5 December, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted at NeurIPS 2024 (Datasets and Benchmarks Track)

  38. arXiv:2404.08710 [pdf, other]

    cs.GT cs.AI

    Do Large Language Models Learn Human-Like Strategic Preferences?

    Authors: Jesse Roberts, Kyle Moore, Doug Fisher

    Abstract: In this paper, we evaluate whether LLMs learn to make human-like preference judgements in strategic scenarios as compared with known empirical results. Solar and Mistral are shown to exhibit stable value-based preference consistent with humans and exhibit human-like preference for cooperation in the prisoner's dilemma (including stake-size effect) and traveler's dilemma (including penalty-size eff…

    Submitted 2 October, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  39. arXiv:2402.03116 [pdf, other]

    cs.HC cs.LG

    Feature-Action Design Patterns for Storytelling Visualizations with Time Series Data

    Authors: Saiful Khan, Scott Jones, Benjamin Bach, Jaehoon Cha, Min Chen, Julie Meikle, Jonathan C Roberts, Jeyan Thiyagalingam, Jo Wood, Panagiotis D. Ritsos

    Abstract: We present a method to create storytelling visualization with time series data. Many personal decisions nowadays rely on access to dynamic data regularly, as we have seen during the COVID-19 pandemic. It is thus desirable to construct storytelling visualization for dynamic data that is selected by an individual for a specific context. Because of the need to tell data-dependent stories, predefined…

    Submitted 5 February, 2024; originally announced February 2024.

  40. arXiv:2311.14656 [pdf, other]

    cs.CV cs.AI

    Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs

    Authors: Jonathan Roberts, Timo Lüddecke, Rehan Sheikh, Kai Han, Samuel Albanie

    Abstract: Multimodal large language models (MLLMs) have shown remarkable capabilities across a broad range of tasks but their knowledge and abilities in the geographic and geospatial domains are yet to be explored, despite potential wide-ranging benefits to navigation, environmental research, urban development, and disaster response. We conduct a series of experiments exploring various vision capabilities o…

    Submitted 16 January, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: V3: Fixed typo in Fig.1; V2: Minor formatting changes and added missing subfigure captions

  41. arXiv:2311.02211 [pdf, other]

    cs.OH cs.AI

    Rock Climbing Route Generation and Grading as Computational Creativity

    Authors: Jesse Roberts

    Abstract: In this paper, we bridge work in rock climbing route generation and grading into the computational creativity community. We provide the necessary background to situate that literature and demonstrate the domain's intellectual merit in the computational creativity community. We provide a guiding set of desiderata for future work in this area. We propose an approach to computational route grading. F…

    Submitted 3 November, 2023; originally announced November 2023.

  42. arXiv:2310.10500  [pdf, other

    q-fin.TR cs.LG q-fin.PM

    Few-Shot Learning Patterns in Financial Time-Series for Trend-Following Strategies

    Authors: Kieran Wood, Samuel Kessler, Stephen J. Roberts, Stefan Zohren

    Abstract: Forecasting models for systematic trading strategies do not adapt quickly when financial market conditions rapidly change, as was seen in the advent of the COVID-19 pandemic in 2020, causing many forecasting models to take loss-making positions. To deal with such situations, we propose a novel time-series trend-following forecaster that can quickly adapt to new market conditions, referred to as re…

    Submitted 28 March, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: minor edits

  43. arXiv:2309.10215  [pdf

    cs.CY cs.HC cs.SI

    In Consideration of Indigenous Data Sovereignty: Data Mining as a Colonial Practice

    Authors: Jennafer Shae Roberts, Laura N Montoya

    Abstract: Data mining reproduces colonialism, and Indigenous voices are being left out of the development of technology that relies on data, such as artificial intelligence. This research stresses the need for the inclusion of Indigenous Data Sovereignty and centers on the importance of Indigenous rights over their own data. Inclusion is necessary in order to integrate Indigenous knowledge into the design,…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: 12 pages, 1 Figure, Future Technologies Conference (FTC) 2023. arXiv admin note: substantial text overlap with arXiv:2208.04700

  44. arXiv:2309.08776  [pdf, other

    cs.LG cs.AI cs.RO

    Projected Task-Specific Layers for Multi-Task Reinforcement Learning

    Authors: Josselin Somerville Roberts, Julia Di

    Abstract: Multi-task reinforcement learning could enable robots to scale across a wide variety of manipulation tasks in homes and workplaces. However, generalizing from one task to another and mitigating negative task interference still remains a challenge. Addressing this challenge by successfully sharing information across tasks will depend on how well the structure underlying the tasks is captured. In th…

    Submitted 6 March, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Journal ref: ICRA 2024

  45. arXiv:2309.05139  [pdf, other

    cs.CV cs.RO

    A Skeleton-based Approach For Rock Crack Detection Towards A Climbing Robot Application

    Authors: Josselin Somerville Roberts, Paul-Emile Giacomelli, Yoni Gozlan, Julia Di

    Abstract: Conventional wheeled robots are unable to traverse scientifically interesting, but dangerous, cave environments. Multi-limbed climbing robot designs, such as ReachBot, are able to grasp irregular surface features and execute climbing motions to overcome obstacles, given suitable grasp locations. To support grasp site identification, we present a method for detecting rock cracks and edges, the SKel…

    Submitted 6 November, 2023; v1 submitted 10 September, 2023; originally announced September 2023.

    Journal ref: IEEE IRC 2023

  46. arXiv:2308.09226  [pdf, other

    cs.CE

    Efficient computational homogenisation of 2D beams of heterogeneous elasticity using the patch scheme

    Authors: Thien Tran-Duc, J. E. Bunder, A. J. Roberts

    Abstract: Modern 'smart' materials have complex heterogeneous microscale structure, often with unknown macroscale closure but one we need to realise for large scale engineering and science. The multiscale Equation-Free Patch Scheme empowers us to non-intrusively, efficiently, and accurately predict the large scale, system level, solutions through computations on only small sparse patches of the given detail…

    Submitted 17 August, 2023; originally announced August 2023.

  47. Using Artificial Populations to Study Psychological Phenomena in Neural Models

    Authors: Jesse Roberts, Kyle Moore, Drew Wilenzick, Doug Fisher

    Abstract: The recent proliferation of research into transformer based natural language processing has led to a number of studies which attempt to detect the presence of human-like cognitive behavior in the models. We contend that, as is true of human psychology, the investigation of cognitive behavior in language models must be conducted in an appropriate population of an appropriate size for the results to…

    Submitted 15 August, 2023; originally announced August 2023.

  48. arXiv:2308.07703  [pdf, other

    cs.HC

    Challenges and Opportunities in Data Visualization Education: A Call to Action

    Authors: Benjamin Bach, Mandy Keck, Fateme Rajabiyazdi, Tatiana Losev, Isabel Meirelles, Jason Dykes, Robert S. Laramee, Mashael AlKadi, Christina Stoiber, Samuel Huron, Charles Perin, Luiz Morais, Wolfgang Aigner, Doris Kosminsky, Magdalena Boucher, Søren Knudsen, Areti Manataki, Jan Aerts, Uta Hinrichs, Jonathan C. Roberts, Sheelagh Carpendale

    Abstract: This paper is a call to action for research and discussion on data visualization education. As visualization evolves and spreads through our professional and personal lives, we need to understand how to support and empower a broad and diverse community of learners in visualization. Data Visualization is a diverse and dynamic discipline that combines knowledge from different fields, is tailored to…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: Accepted for publication at VIS 2023 Conference, Melbourne, VIC

  49. arXiv:2308.02559  [pdf, other

    cs.CV cs.LG hep-ex

    DLSIA: Deep Learning for Scientific Image Analysis

    Authors: Eric J Roberts, Tanny Chavez, Alexander Hexemer, Petrus H. Zwart

    Abstract: We introduce DLSIA (Deep Learning for Scientific Image Analysis), a Python-based machine learning library that empowers scientists and researchers across diverse scientific domains with a range of customizable convolutional neural network (CNN) architectures for a wide variety of tasks in image analysis to be used in downstream data processing, or for experiment-in-the-loop computing scenarios. DL…

    Submitted 26 August, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: 10 pages, two column, 9 figures, 1 Supplementary section

  50. arXiv:2308.02399  [pdf, ps, other

    cs.CY cs.SI

    The Glamorisation of Unpaid Labour: AI and its Influencers

    Authors: Nana Mgbechikwere Nwachukwu, Jennafer Shae Roberts, Laura N Montoya

    Abstract: To harness the true potential of Artificial Intelligence (AI) for societal betterment, we need to move away from prioritising corporate interests which exploit Global South workers in the digital age. The unpaid labour and societal harms which are generated by Digital Value Networks (DVNs) disproportionately affect workers in Africa, Latin America, and India and need to be regulated. In this resea…

    Submitted 15 September, 2023; v1 submitted 31 July, 2023; originally announced August 2023.

    Comments: 4 pages, 2 pages of references, Deep Learning Indaba 2023 Short Paper