-
Voxtral
Authors:
Alexander H. Liu,
Andy Ehrenberg,
Andy Lo,
Clément Denoix,
Corentin Barreau,
Guillaume Lample,
Jean-Malo Delignon,
Khyathi Raghavi Chandu,
Patrick von Platen,
Pavankumar Reddy Muddireddy,
Sanchit Gandhi,
Soham Ghosh,
Srijan Mishra,
Thomas Foubert,
Abhinav Rastogi,
Adam Yang,
Albert Q. Jiang,
Alexandre Sablayrolles,
Amélie Héliou,
Amélie Martin,
Anmol Agarwal,
Antoine Roux,
Arthur Darcet,
Arthur Mensch,
Baptiste Bout, et al. (81 additional authors not shown)
Abstract:
We present Voxtral Mini and Voxtral Small, two multimodal audio chat models. Voxtral is trained to comprehend both spoken audio and text documents, achieving state-of-the-art performance across a diverse range of audio benchmarks while preserving strong text capabilities. Voxtral Small outperforms a number of closed-source models while being small enough to run locally. A 32K context window enables the model to handle audio files up to 40 minutes in duration and long multi-turn conversations. We also contribute three benchmarks for evaluating speech understanding models on knowledge and trivia. Both Voxtral models are released under the Apache 2.0 license.
Submitted 17 July, 2025;
originally announced July 2025.
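The two figures in the abstract (a 32K context window and audio of up to 40 minutes) imply a rough token budget per second of audio. The sketch below works that out under two assumptions that are not stated in the abstract: that "32K" means 32 * 1024 tokens, and that the entire window is spent on audio. The model's actual audio token rate is not given here, so the result is only an upper bound.

```python
# Back-of-envelope budget implied by the abstract's two figures.
# Assumption: "32K" means 32 * 1024 tokens and the whole window holds audio.
CONTEXT_TOKENS = 32 * 1024
MAX_AUDIO_SECONDS = 40 * 60


def max_tokens_per_second(context_tokens: int, audio_seconds: int) -> float:
    """Upper bound on audio tokens per second that still fits the window."""
    return context_tokens / audio_seconds


rate = max_tokens_per_second(CONTEXT_TOKENS, MAX_AUDIO_SECONDS)
print(f"at most {rate:.2f} tokens per second of audio")
```

Any text prompt or conversation history shares the same window, so the usable audio budget in practice would be smaller than this bound.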
-
Towards Fairness in AI for Melanoma Detection: Systemic Review and Recommendations
Authors:
Laura N Montoya,
Jennafer Shae Roberts,
Belen Sanchez Hidalgo
Abstract:
Early and accurate melanoma detection is crucial for improving patient outcomes. Recent advancements in artificial intelligence (AI) have shown promise in this area, but the technology's effectiveness across diverse skin tones remains a critical challenge. This study conducts a systematic review and preliminary analysis of AI-based melanoma detection research published between 2013 and 2024, focusing on deep learning methodologies, datasets, and skin tone representation. Our findings indicate that while AI can enhance melanoma detection, there is a significant bias towards lighter skin tones. To address this, we propose including skin hue in addition to skin tone, as represented by the L'Oréal Color Chart Map, for a more comprehensive skin tone assessment technique. This research highlights the need for diverse datasets and robust evaluation metrics to develop AI models that are equitable and effective for all patients. By adopting best practices outlined in a PRISMA Equity framework tailored for healthcare and melanoma detection, we can work towards reducing disparities in melanoma outcomes.
Submitted 19 November, 2024;
originally announced November 2024.
-
Optimal transmission expansion modestly reduces decarbonization costs of U.S. electricity
Authors:
Rangrang Zheng,
Greg Schivley,
Matthias Fripp,
Michael J. Roberts
Abstract:
Expanding interregional transmission is widely viewed as essential for integrating clean energy into decarbonized power systems. Using the open-source Switch capacity expansion model with detailed representation of existing U.S. generation and transmission infrastructure, solar, wind, and storage resources, and hourly operations, we evaluate the role of transmission across least-cost, socially optimal, and zero-emissions scenarios for 2050. An optimal nationwide plan would more than triple interregional transmission capacity, yet this reduces the cost of a zero-emissions system by only 7% relative to relying on existing transmission, as storage, solar and wind siting, and nuclear generation serve as close substitutes. Regional cost and rent effects vary, with transmission generally favoring wind and hydrogen resources over solar and batteries. Sensitivity analysis shows diminishing returns: one-fifth of the benefits of full expansion can be achieved with one-twelfth of the added capacity, while cost reductions for batteries and hydrogen provide comparable or greater system savings than transmission. Reconductoring, which quadruples line capacity at half the cost of new builds, achieves nearly all the benefits of unconstrained expansion. These results suggest that while substantial transmission expansion is economically justified, a diverse set of flexibility resources can substitute for large-scale grid build-out, and the relative value of transmission is highly contingent on technological and cost developments.
Submitted 29 September, 2025; v1 submitted 21 February, 2024;
originally announced February 2024.
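The diminishing-returns claim in this abstract (one-fifth of the benefits from one-twelfth of the added capacity) can be turned into a benefit-per-capacity comparison. The sketch below is plain arithmetic, not the authors' analysis. It assumes both fractions are measured relative to the full optimal expansion, so that full expansion delivers an average of 1 unit of benefit per unit of capacity.

```python
from fractions import Fraction

# Figures quoted in the abstract, as shares of the full optimal expansion.
benefit_share = Fraction(1, 5)    # share of total benefit captured
capacity_share = Fraction(1, 12)  # share of added capacity required

# Benefit per unit of capacity, normalised so full expansion averages 1.
first_tranche = benefit_share / capacity_share
remaining = (1 - benefit_share) / (1 - capacity_share)

print(f"first twelfth of capacity: {float(first_tranche):.2f}x average")
print(f"remaining capacity:        {float(remaining):.2f}x average")
print(f"ratio between the two:     {float(first_tranche / remaining):.2f}")
```

Under these assumptions the first twelfth of added capacity is 2.4 times as productive as the average unit, and 2.75 times as productive as the remaining eleven-twelfths.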
-
Statistical Design and Analysis for Robust Machine Learning: A Case Study from COVID-19
Authors:
Davide Pigoli,
Kieran Baker,
Jobie Budd,
Lorraine Butler,
Harry Coppock,
Sabrina Egglestone,
Steven G. Gilmour,
Chris Holmes,
David Hurley,
Radka Jersakova,
Ivan Kiskin,
Vasiliki Koutra,
Jonathon Mellor,
George Nicholson,
Joe Packham,
Selina Patel,
Richard Payne,
Stephen J. Roberts,
Björn W. Schuller,
Ana Tendero-Cañadas,
Tracey Thornley,
Alexander Titcomb
Abstract:
Since early in the coronavirus disease 2019 (COVID-19) pandemic, there has been interest in using artificial intelligence methods to predict COVID-19 infection status based on vocal audio signals, for example cough recordings. However, existing studies have limitations in terms of data collection and of the assessment of the performances of the proposed predictive models. This paper rigorously assesses state-of-the-art machine learning techniques used to predict COVID-19 infection status based on vocal audio signals, using a dataset collected by the UK Health Security Agency. This dataset includes acoustic recordings and extensive study participant meta-data. We provide guidelines on testing the performance of methods to classify COVID-19 infection status based on acoustic features and we discuss how these can be extended more generally to the development and assessment of predictive methods based on public health datasets.
Submitted 27 February, 2023; v1 submitted 15 December, 2022;
originally announced December 2022.
-
Rastreo muscular móvil usando magnetomicrometría -- Spanish translation of the article "Untethered Muscle Tracking Using Magnetomicrometry" by the author Cameron R. Taylor
Authors:
Cameron R. Taylor,
Seong Ho Yeon,
William H. Clark,
Ellen G. Clarrissimeaux,
Mary Kate O'Donnell,
Thomas J. Roberts,
Hugh M. Herr
Abstract:
Muscle tissue drives nearly all movement in the animal kingdom, providing power, mobility, and dexterity. Technologies for measuring muscle tissue motion, such as sonomicrometry, fluoromicrometry, and ultrasound, have significantly advanced our understanding of biomechanics. Yet, the field lacks the ability to monitor muscle tissue motion for animal behavior outside the lab. Towards addressing this issue, we previously introduced magnetomicrometry, a method that uses magnetic beads to wirelessly monitor muscle tissue length changes, and we validated magnetomicrometry via tightly-controlled in situ testing. In this study we validate the accuracy of magnetomicrometry against fluoromicrometry during untethered running in an in vivo turkey model. We demonstrate real-time muscle tissue length tracking of the freely-moving turkeys executing various motor activities, including ramp ascent and descent, vertical ascent and descent, and free roaming movement. Given the demonstrated capacity of magnetomicrometry to track muscle movement in untethered animals, we feel that this technique will enable new scientific explorations and an improved understanding of muscle function.
Submitted 19 November, 2022;
originally announced November 2022.
-
HumBugDB: A Large-scale Acoustic Mosquito Dataset
Authors:
Ivan Kiskin,
Marianne Sinka,
Adam D. Cobb,
Waqas Rafique,
Lawrence Wang,
Davide Zilli,
Benjamin Gutteridge,
Rinita Dam,
Theodoros Marinos,
Yunpeng Li,
Dickson Msaky,
Emmanuel Kaindoa,
Gerard Killeen,
Eva Herreros-Moya,
Kathy J. Willis,
Stephen J. Roberts
Abstract:
This paper presents the first large-scale multi-species dataset of acoustic recordings of mosquitoes tracked continuously in free flight. We present 20 hours of audio recordings that we have expertly labelled and tagged precisely in time. Significantly, 18 hours of recordings contain annotations from 36 different species. Mosquitoes are well-known carriers of diseases such as malaria, dengue and yellow fever. Collecting this dataset is motivated by the need to assist applications which utilise mosquito acoustics to conduct surveys to help predict outbreaks and inform intervention policy. The task of detecting mosquitoes from the sound of their wingbeats is challenging due to the difficulty in collecting recordings from realistic scenarios. To address this, as part of the HumBug project, we conducted global experiments to record mosquitoes ranging from those bred in culture cages to mosquitoes captured in the wild. Consequently, the audio recordings vary in signal-to-noise ratio and contain a broad range of indoor and outdoor background environments from Tanzania, Thailand, Kenya, the USA and the UK. In this paper we describe in detail how we collected, labelled and curated the data. The data is provided from a PostgreSQL database, which contains important metadata such as the capture method, age, feeding status and gender of the mosquitoes. Additionally, we provide code to extract features and train Bayesian convolutional neural networks for two key tasks: the identification of mosquitoes from their corresponding background environments, and the classification of detected mosquitoes into species. Our extensive dataset is both challenging to machine learning researchers focusing on acoustic identification, and critical to entomologists, geo-spatial modellers and other domain experts to understand mosquito behaviour, model their distribution, and manage the threat they pose to humans.
Submitted 14 October, 2021;
originally announced October 2021.
-
Using Temperature Sensitivity to Estimate Shiftable Electricity Demand: Implications for power system investments and climate change
Authors:
Michael J. Roberts,
Sisi Zhang,
Eleanor Yuan,
James Jones,
Matthias Fripp
Abstract:
Growth of intermittent renewable energy and climate change make it increasingly difficult to manage electricity demand variability. Centralized storage can help but is costly. An alternative is to shift demand. Cooling and heating demands are substantial and can be economically shifted using thermal storage. To estimate what thermal storage, employed at scale, might do to reshape electricity loads, we pair fine-scale weather data with hourly electricity use to estimate the share of temperature-sensitive demand across 31 regions that span the continental United States. We then show how much variability can be reduced by shifting temperature-sensitive loads, with and without improved transmission between regions. We find that approximately three quarters of within-day, within-region demand variability can be eliminated by shifting just half of temperature-sensitive demand. The variability-reducing benefits of shifting temperature-sensitive demand complement those gained from improved interregional transmission, and greatly mitigate the challenge of serving higher peaks under climate change.
Submitted 13 June, 2022; v1 submitted 1 September, 2021;
originally announced September 2021.
-
Arthroscopic Multi-Spectral Scene Segmentation Using Deep Learning
Authors:
Shahnewaz Ali,
Yaqub Jonmohamadi,
Yu Takeda,
Jonathan Roberts,
Ross Crawford,
Cameron Brown,
Ajay K. Pandey
Abstract:
Knee arthroscopy is a minimally invasive surgical (MIS) procedure performed to treat knee-joint ailments. The limited visual information that miniaturized cameras provide of the surgical site makes the procedure more complex. The knee cavity is a very confined space; therefore, surgical scenes are captured at close proximity. The limited context of the knee atlas often makes these scenes unrecognizable; as a consequence, unintentional tissue damage often occurs and new surgeons face a long learning curve. Automatic context awareness through labeling of the surgical site can be an alternative to mitigate these drawbacks. However, previous studies confirm that the surgical site exhibits several limitations, among them a lack of discriminative contextual information such as texture and features, which drastically limits this vision task. Additionally, poor imaging conditions and a lack of accurate ground-truth labels also limit accuracy. To mitigate these limitations of knee arthroscopy, in this work we propose a scene segmentation method that successfully segments multiple structures.
Submitted 3 March, 2021;
originally announced March 2021.
-
Ultrasound Diagnosis of COVID-19: Robustness and Explainability
Authors:
Jay Roberts,
Theodoros Tsiligkaridis
Abstract:
Diagnosis of COVID-19 at point of care is vital to the containment of the global pandemic. Point-of-care ultrasound (POCUS) provides rapid imagery of lungs to detect COVID-19 in patients in a repeatable and cost-effective way. Previous work has used public datasets of POCUS videos to train an AI model for diagnosis that obtains high sensitivity. Given the high-stakes application, we propose the use of robust and explainable techniques. We demonstrate experimentally that robust models have more stable predictions and offer improved interpretability. A framework of contrastive explanations based on adversarial perturbations is used to explain model predictions in a way that aligns with human visual perception.
Submitted 30 November, 2020;
originally announced December 2020.
-
Automated bird sound recognition in realistic settings
Authors:
Timos Papadopoulos,
Stephen J. Roberts,
Katherine J. Willis
Abstract:
We evaluated the effectiveness of an automated bird sound identification system in a situation that emulates a realistic, typical application. We trained classification algorithms on a crowd-sourced collection of bird audio recording data and restricted our training methods to be completely free of manual intervention. The approach is hence directly applicable to the analysis of multiple species collections, with labelling provided by crowd-sourced collection. We evaluated the performance of the bird sound recognition system on a realistic number of candidate classes, corresponding to real conditions. We investigated the use of two canonical classification methods, chosen due to their widespread use and ease of interpretation, namely a k Nearest Neighbour (kNN) classifier with histogram-based features and a Support Vector Machine (SVM) with time-summarisation features. We further investigated the use of a certainty measure, derived from the output probabilities of the classifiers, to enhance the interpretability and reliability of the class decisions. Our results demonstrate that both identification methods achieved similar performance, but we argue that the use of the kNN classifier offers somewhat more flexibility. Furthermore, we show that employing an outcome certainty measure provides a valuable and consistent indicator of the reliability of classification results. Our use of generic training data and our investigation of probabilistic classification methodologies that can flexibly address the variable number of candidate species/classes that are expected to be encountered in the field, directly contribute to the development of a practical bird sound identification system with potentially global application. Further, we show that certainty measures associated with identification outcomes can significantly contribute to the practical usability of the overall system.
Submitted 4 September, 2018;
originally announced September 2018.
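The abstract above mentions a certainty measure derived from classifier output probabilities but does not define it. As a minimal sketch of the general idea, the code below uses the margin between the two most probable classes. This choice of measure, the `threshold` value, and the species labels are all illustrative assumptions and not the authors' method.

```python
from typing import Optional, Sequence, Tuple


def certainty(probs: Sequence[float]) -> float:
    """Margin between the two highest class probabilities (assumed measure).

    0 means the top two classes are tied; values near 1 mean one class dominates.
    """
    top, runner_up = sorted(probs, reverse=True)[:2]
    return top - runner_up


def classify(
    probs: Sequence[float], labels: Sequence[str], threshold: float = 0.3
) -> Tuple[Optional[str], float]:
    """Return (label, certainty), or (None, certainty) when below threshold."""
    best = max(range(len(probs)), key=probs.__getitem__)
    score = certainty(probs)
    return (labels[best] if score >= threshold else None, score)


# Hypothetical labels and probabilities, for illustration only.
species = ["robin", "wren", "blackbird"]
print(classify([0.70, 0.20, 0.10], species))  # confident: label is kept
print(classify([0.40, 0.35, 0.25], species))  # ambiguous: label is withheld
```

Withholding a label when the score falls below the threshold is one way such a measure could flag unreliable decisions for manual review.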
-
Stochastic processes and feedback-linearisation for online identification and Bayesian adaptive control of fully-actuated mechanical systems
Authors:
Jan-Peter Calliess,
Antonis Papachristodoulou,
Stephen J. Roberts
Abstract:
This work proposes a new method for simultaneous probabilistic identification and control of an observable, fully-actuated mechanical system. Identification is achieved by conditioning stochastic process priors on observations of configurations and noisy estimates of configuration derivatives. In contrast to previous work that has used stochastic processes for identification, we leverage the structural knowledge afforded by Lagrangian mechanics and learn the drift and control input matrix functions of the control-affine system separately. We utilise feedback-linearisation to reduce, in expectation, the uncertain nonlinear control problem to one that is easy to regulate in a desired manner. Thereby, our method combines the flexibility of nonparametric Bayesian learning with epistemological guarantees on the expected closed-loop trajectory. We illustrate our method in the context of torque-actuated pendula where the dynamics are learned with a combination of normal and log-normal processes.
Submitted 1 April, 2014; v1 submitted 18 November, 2013;
originally announced November 2013.
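The feedback-linearisation step described in this abstract can be written out for a generic control-affine system. The notation below is an assumption made for illustration (configuration $q$, drift $a$, control input matrix $B$, posterior means $\hat a$ and $\hat B$, reference acceleration $v$) and is not taken from the paper.

```latex
% Control-affine second-order dynamics (assumed notation):
\ddot{q} = a(q,\dot{q}) + B(q,\dot{q})\,u

% Control law built from the posterior means \hat{a} = E[a], \hat{B} = E[B];
% full actuation is taken to mean that \hat{B} is invertible:
u = \hat{B}(q,\dot{q})^{-1}\bigl(v - \hat{a}(q,\dot{q})\bigr)

% Because u is fixed given the posterior, linearity of expectation gives
E[\ddot{q}] = \hat{a} + \hat{B}\,\hat{B}^{-1}\,(v - \hat{a}) = v
```

In expectation the closed loop therefore reduces to $\ddot{q} = v$, a linear double integrator that a standard linear controller can regulate, which matches the abstract's description of an uncertain nonlinear problem reduced to one that is easy to regulate.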