-
The Aloe Family Recipe for Open and Specialized Healthcare LLMs
Authors:
Dario Garcia-Gasulla,
Jordi Bayarri-Planas,
Ashwin Kumar Gururajan,
Enrique Lopez-Cuena,
Adrian Tormos,
Daniel Hinjos,
Pablo Bernabeu-Perez,
Anna Arias-Duart,
Pablo Agustin Martin-Torres,
Marta Gonzalez-Mallo,
Sergio Alvarez-Napagao,
Eduard Ayguadé-Parra,
Ulises Cortés
Abstract:
Purpose: With advancements in Large Language Models (LLMs) for healthcare, the need arises for competitive open-source models to protect the public interest. This work contributes to the field of open medical LLMs by optimizing key stages of data preprocessing and training, while showing how to improve model safety (through DPO) and efficacy (through RAG). The evaluation methodology used, which in…
▽ More
Purpose: With advancements in Large Language Models (LLMs) for healthcare, the need arises for competitive open-source models to protect the public interest. This work contributes to the field of open medical LLMs by optimizing key stages of data preprocessing and training, while showing how to improve model safety (through DPO) and efficacy (through RAG). The evaluation methodology used, which includes four different types of tests, defines a new standard for the field. The resultant models, shown to be competitive with the best private alternatives, are released with a permisive license.
Methods: Building on top of strong base models like Llama 3.1 and Qwen 2.5, Aloe Beta uses a custom dataset to enhance public data with synthetic Chain of Thought examples. The models undergo alignment with Direct Preference Optimization, emphasizing ethical and policy-aligned performance in the presence of jailbreaking attacks. Evaluation includes close-ended, open-ended, safety and human assessments, to maximize the reliability of results.
Results: Recommendations are made across the entire pipeline, backed by the solid performance of the Aloe Family. These models deliver competitive performance across healthcare benchmarks and medical fields, and are often preferred by healthcare professionals. On bias and toxicity, the Aloe Beta models significantly improve safety, showing resilience to unseen jailbreaking attacks. For a responsible release, a detailed risk assessment specific to healthcare is attached to the Aloe Family models.
Conclusion: The Aloe Beta models, and the recipe that leads to them, are a significant contribution to the open-source medical LLM field, offering top-of-the-line performance while maintaining high ethical requirements. This work sets a new standard for developing and reporting aligned LLMs in healthcare.
△ Less
Submitted 28 May, 2025; v1 submitted 7 May, 2025;
originally announced May 2025.
-
SteerMusic: Enhanced Musical Consistency for Zero-shot Text-Guided and Personalized Music Editing
Authors:
Xinlei Niu,
Kin Wai Cheuk,
Jing Zhang,
Naoki Murata,
Chieh-Hsin Lai,
Michele Mancusi,
Woosung Choi,
Giorgio Fabbro,
Wei-Hsiang Liao,
Charles Patrick Martin,
Yuki Mitsufuji
Abstract:
Music editing is an important step in music production, which has broad applications, including game development and film production. Most existing zero-shot text-guided methods rely on pretrained diffusion models by involving forward-backward diffusion processes for editing. However, these methods often struggle to maintain the music content consistency. Additionally, text instructions alone usua…
▽ More
Music editing is an important step in music production, which has broad applications, including game development and film production. Most existing zero-shot text-guided methods rely on pretrained diffusion models by involving forward-backward diffusion processes for editing. However, these methods often struggle to maintain the music content consistency. Additionally, text instructions alone usually fail to accurately describe the desired music. In this paper, we propose two music editing methods that enhance the consistency between the original and edited music by leveraging score distillation. The first method, SteerMusic, is a coarse-grained zero-shot editing approach using delta denoising score. The second method, SteerMusic+, enables fine-grained personalized music editing by manipulating a concept token that represents a user-defined musical style. SteerMusic+ allows for the editing of music into any user-defined musical styles that cannot be achieved by the text instructions alone. Experimental results show that our methods outperform existing approaches in preserving both music content consistency and editing fidelity. User studies further validate that our methods achieve superior music editing quality. Audio examples are available on https://steermusic.pages.dev/.
△ Less
Submitted 14 April, 2025;
originally announced April 2025.
-
Variability-Driven User-Story Generation using LLM and Triadic Concept Analysis
Authors:
Alexandre Bazin,
Alain Gutierrez,
Marianne Huchard,
Pierre Martin,
Yulin,
Zhang
Abstract:
A widely used Agile practice for requirements is to produce a set of user stories (also called ``agile product backlog''), which roughly includes a list of pairs (role, feature), where the role handles the feature for a certain purpose. In the context of Software Product Lines, the requirements for a family of similar systems is thus a family of user-story sets, one per system, leading to a 3-dime…
▽ More
A widely used Agile practice for requirements is to produce a set of user stories (also called ``agile product backlog''), which roughly includes a list of pairs (role, feature), where the role handles the feature for a certain purpose. In the context of Software Product Lines, the requirements for a family of similar systems is thus a family of user-story sets, one per system, leading to a 3-dimensional dataset composed of sets of triples (system, role, feature). In this paper, we combine Triadic Concept Analysis (TCA) and Large Language Model (LLM) prompting to suggest the user-story set required to develop a new system relying on the variability logic of an existing system family. This process consists in 1) computing 3-dimensional variability expressed as a set of TCA implications, 2) providing the designer with intelligible design options, 3) capturing the designer's selection of options, 4) proposing a first user-story set corresponding to this selection, 5) consolidating its validity according to the implications identified in step 1, while completing it if necessary, and 6) leveraging LLM to have a more comprehensive website. This process is evaluated with a dataset comprising the user-story sets of 67 similar-purpose websites.
△ Less
Submitted 11 April, 2025;
originally announced April 2025.
-
I'm Sorry Dave: How the old world of personnel security can inform the new world of AI insider risk
Authors:
Paul Martin,
Sarah Mercer
Abstract:
Organisations are rapidly adopting artificial intelligence (AI) tools to perform tasks previously undertaken by people. The potential benefits are enormous. Separately, some organisations deploy personnel security measures to mitigate the security risks arising from trusted human insiders. Unfortunately, there is no meaningful interplay between the rapidly evolving domain of AI and the traditional…
▽ More
Organisations are rapidly adopting artificial intelligence (AI) tools to perform tasks previously undertaken by people. The potential benefits are enormous. Separately, some organisations deploy personnel security measures to mitigate the security risks arising from trusted human insiders. Unfortunately, there is no meaningful interplay between the rapidly evolving domain of AI and the traditional world of personnel security. This is a problem. The complex risks from human insiders are hard enough to understand and manage, despite many decades of effort. The emerging security risks from AI insiders are even more opaque. Both sides need all the help they can get. Some of the concepts and approaches that have proved useful in dealing with human insiders are also applicable to the emerging risks from AI insiders.
△ Less
Submitted 5 April, 2025; v1 submitted 26 March, 2025;
originally announced April 2025.
-
GS-QA: Comprehensive Quality Assessment Benchmark for Gaussian Splatting View Synthesis
Authors:
Pedro Martin,
António Rodrigues,
João Ascenso,
Maria Paula Queluz
Abstract:
Gaussian Splatting (GS) offers a promising alternative to Neural Radiance Fields (NeRF) for real-time 3D scene rendering. Using a set of 3D Gaussians to represent complex geometry and appearance, GS achieves faster rendering times and reduced memory consumption compared to the neural network approach used in NeRF. However, quality assessment of GS-generated static content is not yet explored in-de…
▽ More
Gaussian Splatting (GS) offers a promising alternative to Neural Radiance Fields (NeRF) for real-time 3D scene rendering. Using a set of 3D Gaussians to represent complex geometry and appearance, GS achieves faster rendering times and reduced memory consumption compared to the neural network approach used in NeRF. However, quality assessment of GS-generated static content is not yet explored in-depth. This paper describes a subjective quality assessment study that aims to evaluate synthesized videos obtained with several static GS state-of-the-art methods. The methods were applied to diverse visual scenes, covering both 360-degree and forward-facing (FF) camera trajectories. Moreover, the performance of 18 objective quality metrics was analyzed using the scores resulting from the subjective study, providing insights into their strengths, limitations, and alignment with human perception. All videos and scores are made available providing a comprehensive database that can be used as benchmark on GS view synthesis and objective quality metrics.
△ Less
Submitted 16 June, 2025; v1 submitted 18 February, 2025;
originally announced February 2025.
-
Automatic Evaluation of Healthcare LLMs Beyond Question-Answering
Authors:
Anna Arias-Duart,
Pablo Agustin Martin-Torres,
Daniel Hinjos,
Pablo Bernabeu-Perez,
Lucia Urcelay Ganzabal,
Marta Gonzalez Mallo,
Ashwin Kumar Gururajan,
Enrique Lopez-Cuena,
Sergio Alvarez-Napagao,
Dario Garcia-Gasulla
Abstract:
Current Large Language Models (LLMs) benchmarks are often based on open-ended or close-ended QA evaluations, avoiding the requirement of human labor. Close-ended measurements evaluate the factuality of responses but lack expressiveness. Open-ended capture the model's capacity to produce discourse responses but are harder to assess for correctness. These two approaches are commonly used, either ind…
▽ More
Current Large Language Models (LLMs) benchmarks are often based on open-ended or close-ended QA evaluations, avoiding the requirement of human labor. Close-ended measurements evaluate the factuality of responses but lack expressiveness. Open-ended capture the model's capacity to produce discourse responses but are harder to assess for correctness. These two approaches are commonly used, either independently or together, though their relationship remains poorly understood. This work is focused on the healthcare domain, where both factuality and discourse matter greatly. It introduces a comprehensive, multi-axis suite for healthcare LLM evaluation, exploring correlations between open and close benchmarks and metrics. Findings include blind spots and overlaps in current methodologies. As an updated sanity check, we release a new medical benchmark --CareQA-- with both open and closed variants. Finally, we propose a novel metric for open-ended evaluations -- Relaxed Perplexity -- to mitigate the identified limitations.
△ Less
Submitted 10 February, 2025;
originally announced February 2025.
-
Brief analysis of DeepSeek R1 and its implications for Generative AI
Authors:
Sarah Mercer,
Samuel Spillard,
Daniel P. Martin
Abstract:
In late January 2025, DeepSeek released their new reasoning model (DeepSeek R1); which was developed at a fraction of the cost yet remains competitive with OpenAI's models, despite the US's GPU export ban. This report discusses the model, and what its release means for the field of Generative AI more widely. We briefly discuss other models released from China in recent weeks, their similarities; i…
▽ More
In late January 2025, DeepSeek released their new reasoning model (DeepSeek R1); which was developed at a fraction of the cost yet remains competitive with OpenAI's models, despite the US's GPU export ban. This report discusses the model, and what its release means for the field of Generative AI more widely. We briefly discuss other models released from China in recent weeks, their similarities; innovative use of Mixture of Experts (MoE), Reinforcement Learning (RL) and clever engineering appear to be key factors in the capabilities of these models. This think piece has been written to a tight timescale, providing broad coverage of the topic, and serves as introductory material for those looking to understand the model's technical advancements, as well as its place in the ecosystem. Several further areas of research are identified.
△ Less
Submitted 7 February, 2025; v1 submitted 4 February, 2025;
originally announced February 2025.
-
Measuring Grammatical Diversity from Small Corpora: Derivational Entropy Rates, Mean Length of Utterances, and Annotation Invariance
Authors:
Fermin Moscoso del Prado Martin
Abstract:
In many fields, such as language acquisition, neuropsychology of language, the study of aging, and historical linguistics, corpora are used for estimating the diversity of grammatical structures that are produced during a period by an individual, community, or type of speakers. In these cases, treebanks are taken as representative samples of the syntactic structures that might be encountered. Gene…
▽ More
In many fields, such as language acquisition, neuropsychology of language, the study of aging, and historical linguistics, corpora are used for estimating the diversity of grammatical structures that are produced during a period by an individual, community, or type of speakers. In these cases, treebanks are taken as representative samples of the syntactic structures that might be encountered. Generalizing the potential syntactic diversity from the structures documented in a small corpus requires careful extrapolation whose accuracy is constrained by the limited size of representative sub-corpora. In this article, I demonstrate -- theoretically, and empirically -- that a grammar's derivational entropy and the mean length of the utterances (MLU) it generates are fundamentally linked, giving rise to a new measure, the derivational entropy rate. The mean length of utterances becomes the most practical index of syntactic complexity; I demonstrate that MLU is not a mere proxy, but a fundamental measure of syntactic diversity. In combination with the new derivational entropy rate measure, it provides a theory-free assessment of grammatical complexity. The derivational entropy rate indexes the rate at which different grammatical annotation frameworks determine the grammatical complexity of treebanks. I introduce the Smoothed Induced Treebank Entropy (SITE) as a tool for estimating these measures accurately, even from very small treebanks. I conclude by discussing important implications of these results for both NLP and human language processing.
△ Less
Submitted 8 December, 2024;
originally announced December 2024.
-
Assessing Similarity Measures for the Evaluation of Human-Robot Motion Correspondence
Authors:
Charles Dietzel,
Patrick J. Martin
Abstract:
One key area of research in Human-Robot Interaction is solving the human-robot correspondence problem, which asks how a robot can learn to reproduce a human motion demonstration when the human and robot have different dynamics and kinematic structures. Evaluating these correspondence problem solutions often requires the use of qualitative surveys that can be time consuming to design and administer…
▽ More
One key area of research in Human-Robot Interaction is solving the human-robot correspondence problem, which asks how a robot can learn to reproduce a human motion demonstration when the human and robot have different dynamics and kinematic structures. Evaluating these correspondence problem solutions often requires the use of qualitative surveys that can be time consuming to design and administer. Additionally, qualitative survey results vary depending on the population of survey participants. In this paper, we propose the use of heterogeneous time-series similarity measures as a quantitative evaluation metric for evaluating motion correspondence to complement these qualitative surveys. To assess the suitability of these measures, we develop a behavioral cloning-based motion correspondence model, and evaluate it with a qualitative survey as well as quantitative measures. By comparing the resulting similarity scores with the human survey results, we identify Gromov Dynamic Time Warping as a promising quantitative measure for evaluating motion correspondence.
△ Less
Submitted 6 December, 2024;
originally announced December 2024.
-
Evaluation of strategies for efficient rate-distortion NeRF streaming
Authors:
Pedro Martin,
António Rodrigues,
João Ascenso,
Maria Paula Queluz
Abstract:
Neural Radiance Fields (NeRF) have revolutionized the field of 3D visual representation by enabling highly realistic and detailed scene reconstructions from a sparse set of images. NeRF uses a volumetric functional representation that maps 3D points to their corresponding colors and opacities, allowing for photorealistic view synthesis from arbitrary viewpoints. Despite its advancements, the effic…
▽ More
Neural Radiance Fields (NeRF) have revolutionized the field of 3D visual representation by enabling highly realistic and detailed scene reconstructions from a sparse set of images. NeRF uses a volumetric functional representation that maps 3D points to their corresponding colors and opacities, allowing for photorealistic view synthesis from arbitrary viewpoints. Despite its advancements, the efficient streaming of NeRF content remains a significant challenge due to the large amount of data involved. This paper investigates the rate-distortion performance of two NeRF streaming strategies: pixel-based and neural network (NN) parameter-based streaming. While in the former, images are coded and then transmitted throughout the network, in the latter, the respective NeRF model parameters are coded and transmitted instead. This work also highlights the trade-offs in complexity and performance, demonstrating that the NN parameter-based strategy generally offers superior efficiency, making it suitable for one-to-many streaming scenarios.
△ Less
Submitted 25 October, 2024;
originally announced October 2024.
-
SoundMorpher: Perceptually-Uniform Sound Morphing with Diffusion Model
Authors:
Xinlei Niu,
Jing Zhang,
Charles Patrick Martin
Abstract:
We present SoundMorpher, an open-world sound morphing method designed to generate perceptually uniform morphing trajectories. Traditional sound morphing techniques typically assume a linear relationship between the morphing factor and sound perception, achieving smooth transitions by linearly interpolating the semantic features of source and target sounds while gradually adjusting the morphing fac…
▽ More
We present SoundMorpher, an open-world sound morphing method designed to generate perceptually uniform morphing trajectories. Traditional sound morphing techniques typically assume a linear relationship between the morphing factor and sound perception, achieving smooth transitions by linearly interpolating the semantic features of source and target sounds while gradually adjusting the morphing factor. However, these methods oversimplify the complexities of sound perception, resulting in limitations in morphing quality. In contrast, SoundMorpher explores an explicit relationship between the morphing factor and the perception of morphed sounds, leveraging log Mel-spectrogram features. This approach further refines the morphing sequence by ensuring a constant target perceptual difference for each transition and determining the corresponding morphing factors using binary search. To address the lack of a formal quantitative evaluation framework for sound morphing, we propose a set of metrics based on three established objective criteria. These metrics enable comprehensive assessment of morphed results and facilitate direct comparisons between methods, fostering advancements in sound morphing research. Extensive experiments demonstrate the effectiveness and versatility of SoundMorpher in real-world scenarios, showcasing its potential in applications such as creative music composition, film post-production, and interactive audio technologies. Our demonstration and codes are available at~\url{https://xinleiniu.github.io/SoundMorpher-demo/}.
△ Less
Submitted 16 December, 2024; v1 submitted 2 October, 2024;
originally announced October 2024.
-
Tonal Cognition in Sonification: Exploring the Needs of Practitioners in Sonic Interaction Design
Authors:
Minsik Choi,
Josh Andres,
Charles Patrick Martin
Abstract:
Research into tonal music examines the structural relationships among sounds and how they align with our auditory perception. The exploration of integrating tonal cognition into sonic interaction design, particularly for practitioners lacking extensive musical knowledge, and developing an accessible software tool, remains limited. We report on a study of designers to understand the sound creation…
▽ More
Research into tonal music examines the structural relationships among sounds and how they align with our auditory perception. The exploration of integrating tonal cognition into sonic interaction design, particularly for practitioners lacking extensive musical knowledge, and developing an accessible software tool, remains limited. We report on a study of designers to understand the sound creation practices of industry experts and explore how infusing tonal music principles into a sound design tool can better support their craft and enhance the sonic experiences they create. Our study collected qualitative data through semi-structured individual and focus group interviews with six participants. We developed a low-fidelity prototype sound design tool that involves practical methods of functional harmony and interaction design discussed in focus groups. We identified four themes through reflexive thematic analysis: decision-making, domain knowledge and terminology, collaboration, and contexts in sound creation. Finally, we discussed design considerations for an accessible sonic interaction design tool that aligns auditory experience more closely with tonal cognition.
△ Less
Submitted 30 August, 2024;
originally announced August 2024.
-
UltraCortex: Submillimeter Ultra-High Field 9.4 T Brain MR Image Collection and Manual Cortical Segmentations
Authors:
Lucas Mahler,
Julius Steiglechner,
Benjamin Bender,
Tobias Lindig,
Dana Ramadan,
Jonas Bause,
Florian Birk,
Rahel Heule,
Edyta Charyasz,
Michael Erb,
Vinod Jangir Kumar,
Gisela E Hagberg,
Pascal Martin,
Gabriele Lohmann,
Klaus Scheffler
Abstract:
The UltraCortex repository (https://www.ultracortex.org) houses magnetic resonance imaging data of the human brain obtained at an ultra-high field strength of 9.4 T. It contains 86 structural MR images with spatial resolutions ranging from 0.6 to 0.8 mm. Additionally, the repository includes segmentations of 12 brains into gray and white matter compartments. These segmentations have been independe…
▽ More
The UltraCortex repository (https://www.ultracortex.org) houses magnetic resonance imaging data of the human brain obtained at an ultra-high field strength of 9.4 T. It contains 86 structural MR images with spatial resolutions ranging from 0.6 to 0.8 mm. Additionally, the repository includes segmentations of 12 brains into gray and white matter compartments. These segmentations have been independently validated by two expert neuroradiologists, thus establishing them as a reliable gold standard. This resource provides researchers with access to high-quality brain imaging data and validated segmentations, facilitating neuroimaging studies and advancing our understanding of brain structure and function. Existing repositories do not accommodate field strengths beyond 7 T, nor do they offer validated segmentations, underscoring the significance of this new resource.
△ Less
Submitted 9 January, 2025; v1 submitted 3 June, 2024;
originally announced June 2024.
-
NeRF View Synthesis: Subjective Quality Assessment and Objective Metrics Evaluation
Authors:
Pedro Martin,
Antonio Rodrigues,
Joao Ascenso,
Maria Paula Queluz
Abstract:
Neural radiance fields (NeRF) are a groundbreaking computer vision technology that enables the generation of high-quality, immersive visual content from multiple viewpoints. This capability has significant advantages for applications such as virtual/augmented reality, 3D modelling, and content creation for the film and entertainment industry. However, the evaluation of NeRF methods poses several c…
▽ More
Neural radiance fields (NeRF) are a groundbreaking computer vision technology that enables the generation of high-quality, immersive visual content from multiple viewpoints. This capability has significant advantages for applications such as virtual/augmented reality, 3D modelling, and content creation for the film and entertainment industry. However, the evaluation of NeRF methods poses several challenges, including a lack of comprehensive datasets, reliable assessment methodologies, and objective quality metrics. This paper addresses the problem of NeRF view synthesis (NVS) quality assessment thoroughly, by conducting a rigorous subjective quality assessment test that considers several scene classes and recently proposed NVS methods. Additionally, the performance of a wide range of state-of-the-art conventional and learning-based full-reference 2D image and video quality assessment metrics is evaluated against the subjective scores of the subjective study. This study found that errors in camera pose estimation can result in spatial misalignments between synthesized and reference images, which need to be corrected before applying an objective quality metric. The experimental results are analyzed in depth, providing a comparative evaluation of several NVS methods and objective quality metrics, across different classes of visual scenes, including real and synthetic content for front-face and 360-degree camera trajectories.
△ Less
Submitted 27 September, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
SoundLoCD: An Efficient Conditional Discrete Contrastive Latent Diffusion Model for Text-to-Sound Generation
Authors:
Xinlei Niu,
Jing Zhang,
Christian Walder,
Charles Patrick Martin
Abstract:
We present SoundLoCD, a novel text-to-sound generation framework, which incorporates a LoRA-based conditional discrete contrastive latent diffusion model. Unlike recent large-scale sound generation models, our model can be efficiently trained under limited computational resources. The integration of a contrastive learning strategy further enhances the connection between text conditions and the gen…
▽ More
We present SoundLoCD, a novel text-to-sound generation framework, which incorporates a LoRA-based conditional discrete contrastive latent diffusion model. Unlike recent large-scale sound generation models, our model can be efficiently trained under limited computational resources. The integration of a contrastive learning strategy further enhances the connection between text conditions and the generated outputs, resulting in coherent and high-fidelity performance. Our experiments demonstrate that SoundLoCD outperforms the baseline with greatly reduced computational resources. A comprehensive ablation study further validates the contribution of each component within SoundLoCD. Demo page: \url{https://XinleiNIU.github.io/demo-SoundLoCD/}.
△ Less
Submitted 24 May, 2024;
originally announced May 2024.
-
Integrating knowledge-guided symbolic regression and model-based design of experiments to automate process flow diagram development
Authors:
Alexander W. Rogers,
Amanda Lane,
Cesar Mendoza,
Simon Watson,
Adam Kowalski,
Philip Martin,
Dongda Zhang
Abstract:
New products must be formulated rapidly to succeed in the global formulated product market; however, key product indicators (KPIs) can be complex, poorly understood functions of the chemical composition and processing history. Consequently, scale-up must currently undergo expensive trial-and-error campaigns. To accelerate process flow diagram (PFD) optimisation and knowledge discovery, this work p…
▽ More
New products must be formulated rapidly to succeed in the global formulated product market; however, key product indicators (KPIs) can be complex, poorly understood functions of the chemical composition and processing history. Consequently, scale-up must currently undergo expensive trial-and-error campaigns. To accelerate process flow diagram (PFD) optimisation and knowledge discovery, this work proposed a novel digital framework to automatically quantify process mechanisms by integrating symbolic regression (SR) within model-based design of experiments (MBDoE). Each iteration, SR proposed a Pareto front of interpretable mechanistic expressions, and then MBDoE designed a new experiment to discriminate between them while balancing PFD optimisation. To investigate the framework's performance, a new process model capable of simulating general formulated product synthesis was constructed to generate in-silico data for different case studies. The framework could effectively discover ground-truth process mechanisms within a few iterations, indicating its great potential for use within the general chemical industry for digital manufacturing and product innovation.
△ Less
Submitted 7 May, 2024;
originally announced May 2024.
-
Aloe: A Family of Fine-tuned Open Healthcare LLMs
Authors:
Ashwin Kumar Gururajan,
Enrique Lopez-Cuena,
Jordi Bayarri-Planas,
Adrian Tormos,
Daniel Hinjos,
Pablo Bernabeu-Perez,
Anna Arias-Duart,
Pablo Agustin Martin-Torres,
Lucia Urcelay-Ganzabal,
Marta Gonzalez-Mallo,
Sergio Alvarez-Napagao,
Eduard Ayguadé-Parra,
Ulises Cortés Dario Garcia-Gasulla
Abstract:
As the capabilities of Large Language Models (LLMs) in healthcare and medicine continue to advance, there is a growing need for competitive open-source models that can safeguard public interest. With the increasing availability of highly competitive open base models, the impact of continued pre-training is increasingly uncertain. In this work, we explore the role of instruct tuning, model merging,…
▽ More
As the capabilities of Large Language Models (LLMs) in healthcare and medicine continue to advance, there is a growing need for competitive open-source models that can safeguard public interest. With the increasing availability of highly competitive open base models, the impact of continued pre-training is increasingly uncertain. In this work, we explore the role of instruct tuning, model merging, alignment, red teaming and advanced inference schemes, as means to improve current open models. To that end, we introduce the Aloe family, a set of open medical LLMs highly competitive within its scale range. Aloe models are trained on the current best base models (Mistral, LLaMA 3), using a new custom dataset which combines public data sources improved with synthetic Chain of Thought (CoT). Aloe models undergo an alignment phase, becoming one of the first few policy-aligned open healthcare LLM using Direct Preference Optimization, setting a new standard for ethical performance in healthcare LLMs. Model evaluation expands to include various bias and toxicity datasets, a dedicated red teaming effort, and a much-needed risk assessment for healthcare LLMs. Finally, to explore the limits of current LLMs in inference, we study several advanced prompt engineering strategies to boost performance across benchmarks, yielding state-of-the-art results for open healthcare 7B LLMs, unprecedented at this scale.
△ Less
Submitted 3 May, 2024;
originally announced May 2024.
-
HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts
Authors:
Xinlei Niu,
Jing Zhang,
Charles Patrick Martin
Abstract:
We introduce HybridVC, a voice conversion (VC) framework built upon a pre-trained conditional variational autoencoder (CVAE) that combines the strengths of a latent model with contrastive learning. HybridVC supports text and audio prompts, enabling more flexible voice style conversion. HybridVC models a latent distribution conditioned on speaker embeddings acquired by a pretrained speaker encoder…
▽ More
We introduce HybridVC, a voice conversion (VC) framework built upon a pre-trained conditional variational autoencoder (CVAE) that combines the strengths of a latent model with contrastive learning. HybridVC supports text and audio prompts, enabling more flexible voice style conversion. HybridVC models a latent distribution conditioned on speaker embeddings acquired by a pretrained speaker encoder and optimises style text embeddings to align with the speaker style information through contrastive learning in parallel. Therefore, HybridVC can be efficiently trained under limited computational resources. Our experiments demonstrate HybridVC's superior training efficiency and its capability for advanced multi-modal voice style conversion. This underscores its potential for widespread applications such as user-defined personalised voice in various social media platforms. A comprehensive ablation study further validates the effectiveness of our method.
△ Less
Submitted 24 September, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
Improving the interpretability of GNN predictions through conformal-based graph sparsification
Authors:
Pablo Sanchez-Martin,
Kinaan Aamir Khan,
Isabel Valera
Abstract:
Graph Neural Networks (GNNs) have achieved state-of-the-art performance in solving graph classification tasks. However, most GNN architectures aggregate information from all nodes and edges in a graph, regardless of their relevance to the task at hand, thus hindering the interpretability of their predictions. In contrast to prior work, in this paper we propose a GNN \emph{training} approach that j…
▽ More
Graph Neural Networks (GNNs) have achieved state-of-the-art performance in solving graph classification tasks. However, most GNN architectures aggregate information from all nodes and edges in a graph, regardless of their relevance to the task at hand, thus hindering the interpretability of their predictions. In contrast to prior work, in this paper we propose a GNN \emph{training} approach that jointly i) finds the most predictive subgraph by removing edges and/or nodes -- -\emph{without making assumptions about the subgraph structure} -- while ii) optimizing the performance of the graph classification task. To that end, we rely on reinforcement learning to solve the resulting bi-level optimization with a reward function based on conformal predictions to account for the current in-training uncertainty of the classifier. Our empirical results on nine different graph classification datasets show that our method competes in performance with baselines while relying on significantly sparser subgraphs, leading to more interpretable GNN-based predictions.
△ Less
Submitted 18 April, 2024;
originally announced April 2024.
-
Exploring the Boundaries of Ambient Awareness in Twitter
Authors:
Pablo Sanchez-Martin,
Sonja Utz,
Isabel Valera
Abstract:
Ambient awareness refers to the ability of social media users to obtain knowledge about who knows what (i.e., users' expertise) in their network, by simply being exposed to other users' content (e.g, tweets on Twitter). Previous work, based on user surveys, reveals that individuals self-report ambient awareness only for parts of their networks. However, it is unclear whether it is their limited co…
▽ More
Ambient awareness refers to the ability of social media users to obtain knowledge about who knows what (i.e., users' expertise) in their network, by simply being exposed to other users' content (e.g, tweets on Twitter). Previous work, based on user surveys, reveals that individuals self-report ambient awareness only for parts of their networks. However, it is unclear whether it is their limited cognitive capacity or the limited exposure to diagnostic tweets (i.e., online content) that prevents people from developing ambient awareness for their complete network. In this work, we focus on in-wall ambient awareness (IWAA) in Twitter and conduct a two-step data-driven analysis, that allows us to explore to which extent IWAA is likely, or even possible. First, we rely on reactions (e.g., likes), as strong evidence of users being aware of experts in Twitter. Unfortunately, such strong evidence can be only measured for active users, which represent the minority in the network. Thus to study the boundaries of IWAA to a larger extent, in the second part of our analysis, we instead focus on the passive exposure to content generated by other users -- which we refer to as in-wall visibility. This analysis shows that (in line with \citet{levordashka2016ambient}) only for a subset of users IWAA is plausible, while for the majority it is unlikely, if even possible, to develop IWAA. We hope that our methodology paves the way for the emergence of data-driven approaches for the study of ambient awareness.
△ Less
Submitted 26 March, 2024;
originally announced March 2024.
-
Padding Aware Neurons
Authors:
Dario Garcia-Gasulla,
Victor Gimenez-Abalos,
Pablo Martin-Torres
Abstract:
Convolutional layers are a fundamental component of most image-related models. These layers often implement by default a static padding policy (\eg zero padding), to control the scale of the internal representations, and to allow kernel activations centered on the border regions. In this work we identify Padding Aware Neurons (PANs), a type of filter that is found in most (if not all) convolutiona…
▽ More
Convolutional layers are a fundamental component of most image-related models. These layers often implement by default a static padding policy (\eg zero padding), to control the scale of the internal representations, and to allow kernel activations centered on the border regions. In this work we identify Padding Aware Neurons (PANs), a type of filter that is found in most (if not all) convolutional models trained with static padding. PANs focus on the characterization and recognition of input border location, introducing a spatial inductive bias into the model (e.g., how close to the input's border a pattern typically is). We propose a method to identify PANs through their activations, and explore their presence in several popular pre-trained models, finding PANs on all models explored, from dozens to hundreds. We discuss and illustrate different types of PANs, their kernels and behaviour. To understand their relevance, we test their impact on model performance, and find padding and PANs to induce strong and characteristic biases in the data. Finally, we discuss whether or not PANs are desirable, as well as the potential side effects of their presence in the context of model performance, generalisation, efficiency and safety.
△ Less
Submitted 14 September, 2023;
originally announced September 2023.
-
Dataset Generation and Bonobo Classification from Weakly Labelled Videos
Authors:
Pierre-Etienne Martin
Abstract:
This paper presents a bonobo detection and classification pipeline built from the commonly used machine learning methods. Such application is motivated by the need to test bonobos in their enclosure using touch screen devices without human assistance. This work introduces a newly acquired dataset based on bonobo recordings generated semi-automatically. The recordings are weakly labelled and fed to…
▽ More
This paper presents a bonobo detection and classification pipeline built from the commonly used machine learning methods. Such application is motivated by the need to test bonobos in their enclosure using touch screen devices without human assistance. This work introduces a newly acquired dataset based on bonobo recordings generated semi-automatically. The recordings are weakly labelled and fed to a macaque detector in order to spatially detect the individual present in the video. Handcrafted features coupled with different classification algorithms and deep-learning methods using a ResNet architecture are investigated for bonobo identification. Performance is compared in terms of classification accuracy on the splits of the database using different data separation methods. We demonstrate the importance of data preparation and how a wrong data separation can lead to false good results. Finally, after a meaningful separation of the data, the best classification performance is obtained using a fine-tuned ResNet model and reaches 75% of accuracy.
△ Less
Submitted 7 September, 2023;
originally announced September 2023.
-
Exploring the Role of Explainability in AI-Assisted Embryo Selection
Authors:
Lucia Urcelay,
Daniel Hinjos,
Pablo A. Martin-Torres,
Marta Gonzalez,
Marta Mendez,
Salva Cívico,
Sergio Álvarez-Napagao,
Dario Garcia-Gasulla
Abstract:
In Vitro Fertilization is among the most widespread treatments for infertility. One of its main challenges is the evaluation and selection of embryo for implantation, a process with large inter- and intra-clinician variability. Deep learning based methods are gaining attention, but their opaque nature compromises their acceptance in the clinical context, where transparency in the decision making i…
▽ More
In Vitro Fertilization is among the most widespread treatments for infertility. One of its main challenges is the evaluation and selection of embryo for implantation, a process with large inter- and intra-clinician variability. Deep learning based methods are gaining attention, but their opaque nature compromises their acceptance in the clinical context, where transparency in the decision making is key. In this paper we analyze the current work in the explainability of AI-assisted embryo analysis models, identifying the limitations. We also discuss how these models could be integrated in the clinical context as decision support systems, considering the needs of clinicians and patients. Finally, we propose guidelines for the sake of increasing interpretability and trustworthiness, pushing this technology forward towards established clinical practice.
△ Less
Submitted 1 August, 2023;
originally announced August 2023.
-
A C++20 Interface for MPI 4.0
Authors:
Ali Can Demiralp,
Philipp Martin,
Niko Sakic,
Marcel Krüger,
Tim Gerrits
Abstract:
We present a modern C++20 interface for MPI 4.0. The interface utilizes recent language features to ease development of MPI applications. An aggregate reflection system enables generation of MPI data types from user-defined classes automatically. Immediate and persistent operations are mapped to futures, which can be chained to describe sequential asynchronous operations and task graphs in a conci…
▽ More
We present a modern C++20 interface for MPI 4.0. The interface utilizes recent language features to ease development of MPI applications. An aggregate reflection system enables generation of MPI data types from user-defined classes automatically. Immediate and persistent operations are mapped to futures, which can be chained to describe sequential asynchronous operations and task graphs in a concise way. This work introduces the prominent features of the interface with examples. We further measure its performance overhead with respect to the raw C interface.
△ Less
Submitted 20 June, 2023;
originally announced June 2023.
-
Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming
Authors:
Xinlei Niu,
Christian Walder,
Jing Zhang,
Charles Patrick Martin
Abstract:
We propose the stochastic optimal path which solves the classical optimal path problem by a probability-softening solution. This unified approach transforms a wide range of DP problems into directed acyclic graphs in which all paths follow a Gibbs distribution. We show the equivalence of the Gibbs distribution to a message-passing algorithm by the properties of the Gumbel distribution and give all…
▽ More
We propose the stochastic optimal path which solves the classical optimal path problem by a probability-softening solution. This unified approach transforms a wide range of DP problems into directed acyclic graphs in which all paths follow a Gibbs distribution. We show the equivalence of the Gibbs distribution to a message-passing algorithm by the properties of the Gumbel distribution and give all the ingredients required for variational Bayesian inference of a latent path, namely Bayesian dynamic programming (BDP). We demonstrate the usage of BDP in the latent space of variational autoencoders (VAEs) and propose the BDP-VAE which captures structured sparse optimal paths as latent variables. This enables end-to-end training for generative tasks in which models rely on unobserved structural information. At last, we validate the behavior of our approach and showcase its applicability in two real-world applications: text-to-speech and singing voice synthesis. Our implementation code is available at \url{https://github.com/XinleiNIU/LatentOptimalPathsBayesianDP}.
△ Less
Submitted 25 June, 2024; v1 submitted 4 June, 2023;
originally announced June 2023.
-
NeRF-QA: Neural Radiance Fields Quality Assessment Database
Authors:
Pedro Martin,
António Rodrigues,
João Ascenso,
Maria Paula Queluz
Abstract:
This short paper proposes a new database - NeRF-QA - containing 48 videos synthesized with seven NeRF based methods, along with their perceived quality scores, resulting from subjective assessment tests; for the videos selection, both real and synthetic, 360 degrees scenes were considered. This database will allow to evaluate the suitability, to NeRF based synthesized views, of existing objective…
▽ More
This short paper proposes a new database - NeRF-QA - containing 48 videos synthesized with seven NeRF based methods, along with their perceived quality scores, resulting from subjective assessment tests; for the videos selection, both real and synthetic, 360 degrees scenes were considered. This database will allow to evaluate the suitability, to NeRF based synthesized views, of existing objective quality metrics and also the development of new quality metrics, specific for this case.
△ Less
Submitted 16 June, 2025; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Boltzmann Distribution on "Short" Integer Partitions with Power Parts: Limit Laws and Sampling
Authors:
Jean C. Peyen,
Leonid V. Bogachev,
Paul P. Martin
Abstract:
The paper is concerned with the asymptotic analysis of a family of Boltzmann (multiplicative) distributions over the set $\check{\varLambda}^{q}$ of strict integer partitions (i.e., with unequal parts) into perfect $q$-th powers. A combinatorial link is provided via a suitable conditioning by fixing the partition weight (the sum of parts) and length (the number of parts), leading to uniform distri…
▽ More
The paper is concerned with the asymptotic analysis of a family of Boltzmann (multiplicative) distributions over the set $\check{\varLambda}^{q}$ of strict integer partitions (i.e., with unequal parts) into perfect $q$-th powers. A combinatorial link is provided via a suitable conditioning by fixing the partition weight (the sum of parts) and length (the number of parts), leading to uniform distribution on the corresponding subspaces of partitions. The Boltzmann measure is calibrated through the hyper-parameters $\langle N\rangle$ and $\langle M\rangle$ controlling the expected weight and length, respectively. We study ``short'' partitions, where the parameter $\langle M\rangle$ is either fixed or grows slower than for typical plain (unconstrained) partitions. For this model, we obtain a variety of limit theorems including the asymptotics of the cumulative cardinality in the case of fixed $\langle M\rangle$ and a limit shape result in the case of slow growth of $\langle M\rangle$. In both cases, we also characterize the joint distribution of the weight and length, as well as the growth of the smallest and largest parts. Using these results we construct suitable sampling algorithms and analyse their performance.
△ Less
Submitted 23 July, 2024; v1 submitted 29 March, 2023;
originally announced March 2023.
-
Fine-Grained Action Detection with RGB and Pose Information using Two Stream Convolutional Networks
Authors:
Leonard Hacker,
Finn Bartels,
Pierre-Etienne Martin
Abstract:
As participants of the MediaEval 2022 Sport Task, we propose a two-stream network approach for the classification and detection of table tennis strokes. Each stream is a succession of 3D Convolutional Neural Network (CNN) blocks using attention mechanisms. Each stream processes different 4D inputs. Our method utilizes raw RGB data and pose information computed from MMPose toolbox. The pose informa…
▽ More
As participants of the MediaEval 2022 Sport Task, we propose a two-stream network approach for the classification and detection of table tennis strokes. Each stream is a succession of 3D Convolutional Neural Network (CNN) blocks using attention mechanisms. Each stream processes different 4D inputs. Our method utilizes raw RGB data and pose information computed from MMPose toolbox. The pose information is treated as an image by applying the pose either on a black background or on the original RGB frame it has been computed from. Best performance is obtained by feeding raw RGB data to one stream, Pose + RGB (PRGB) information to the other stream and applying late fusion on the features. The approaches were evaluated on the provided TTStroke-21 data sets. We can report an improvement in stroke classification, reaching 87.3% of accuracy, while the detection does not outperform the baseline but still reaches an IoU of 0.349 and mAP of 0.110.
△ Less
Submitted 6 February, 2023;
originally announced February 2023.
-
Baseline Method for the Sport Task of MediaEval 2022 with 3D CNNs using Attention Mechanisms
Authors:
Pierre-Etienne Martin
Abstract:
This paper presents the baseline method proposed for the Sports Video task part of the MediaEval 2022 benchmark. This task proposes two subtasks: stroke classification from trimmed videos, and stroke detection from untrimmed videos. This baseline addresses both subtasks. We propose two types of 3D-CNN architectures to solve the two subtasks. Both 3D-CNNs use Spatio-temporal convolutions and attent…
▽ More
This paper presents the baseline method proposed for the Sports Video task part of the MediaEval 2022 benchmark. This task proposes two subtasks: stroke classification from trimmed videos, and stroke detection from untrimmed videos. This baseline addresses both subtasks. We propose two types of 3D-CNN architectures to solve the two subtasks. Both 3D-CNNs use Spatio-temporal convolutions and attention mechanisms. The architectures and the training process are tailored to solve the addressed subtask. This baseline method is shared publicly online to help the participants in their investigation and alleviate eventually some aspects of the task such as video processing, training method, evaluation and submission routine. The baseline method reaches 86.4% of accuracy with our v2 model for the classification subtask. For the detection subtask, the baseline reaches a mAP of 0.131 and IoU of 0.515 with our v1 model.
△ Less
Submitted 6 February, 2023;
originally announced February 2023.
-
Universal Topological Regularities of Syntactic Structures: Decoupling Efficiency from Optimization
Authors:
Fermín Moscoso del Prado Martín
Abstract:
Human syntactic structures are usually represented as graphs. Much research has focused on the mapping between such graphs and linguistic sequences, but less attention has been paid to the shapes of the graphs themselves: their topologies. This study investigates how the topologies of syntactic graphs reveal traces of the processes that led to their emergence. I report a new universal regularity i…
▽ More
Human syntactic structures are usually represented as graphs. Much research has focused on the mapping between such graphs and linguistic sequences, but less attention has been paid to the shapes of the graphs themselves: their topologies. This study investigates how the topologies of syntactic graphs reveal traces of the processes that led to their emergence. I report a new universal regularity in syntactic structures: Their topology is communicatively efficient above chance. The pattern holds, without exception, for all 124 languages studied, across linguistic families and modalities (spoken, written, and signed). This pattern can arise from a process optimizing for communicative efficiency or, alternatively, by construction, as a by-effect of a sublinear preferential attachment process reflecting language production mechanisms known from psycholinguistics. This dual explanation shows how communicative efficiency, per se, does not require optimization. Among the two options, efficiency without optimization offers the better explanation for the new pattern.
△ Less
Submitted 31 January, 2023;
originally announced February 2023.
-
Sport Task: Fine Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2022
Authors:
Pierre-Etienne Martin,
Jordan Calandre,
Boris Mansencal,
Jenny Benois-Pineau,
Renaud Péteri,
Laurent Mascarilla,
Julien Morlier
Abstract:
Sports video analysis is a widespread research topic. Its applications are very diverse, like events detection during a match, video summary, or fine-grained movement analysis of athletes. As part of the MediaEval 2022 benchmarking initiative, this task aims at detecting and classifying subtle movements from sport videos. We focus on recordings of table tennis matches. Conducted since 2019, this t…
▽ More
Sports video analysis is a widespread research topic. Its applications are very diverse, like events detection during a match, video summary, or fine-grained movement analysis of athletes. As part of the MediaEval 2022 benchmarking initiative, this task aims at detecting and classifying subtle movements from sport videos. We focus on recordings of table tennis matches. Conducted since 2019, this task provides a classification challenge from untrimmed videos recorded under natural conditions with known temporal boundaries for each stroke. Since 2021, the task also provides a stroke detection challenge from unannotated, untrimmed videos. This year, the training, validation, and test sets are enhanced to ensure that all strokes are represented in each dataset. The dataset is now similar to the one used in [1, 2]. This research is intended to build tools for coaches and athletes who want to further evaluate their sport performances.
△ Less
Submitted 31 January, 2023;
originally announced January 2023.
-
Emergent communication enhances foraging behaviour in evolved swarms controlled by Spiking Neural Networks
Authors:
Cristian Jimenez Romero,
Alper Yegenoglu,
Aarón Pérez Martín,
Sandra Diaz-Pier,
Abigail Morrison
Abstract:
Social insects such as ants communicate via pheromones which allows them to coordinate their activity and solve complex tasks as a swarm, e.g. foraging for food. This behavior was shaped through evolutionary processes. In computational models, self-coordination in swarms has been implemented using probabilistic or simple action rules to shape the decision of each agent and the collective behavior.…
▽ More
Social insects such as ants communicate via pheromones which allows them to coordinate their activity and solve complex tasks as a swarm, e.g. foraging for food. This behavior was shaped through evolutionary processes. In computational models, self-coordination in swarms has been implemented using probabilistic or simple action rules to shape the decision of each agent and the collective behavior. However, manual tuned decision rules may limit the behavior of the swarm. In this work we investigate the emergence of self-coordination and communication in evolved swarms without defining any explicit rule. We evolve a swarm of agents representing an ant colony. We use an evolutionary algorithm to optimize a spiking neural network (SNN) which serves as an artificial brain to control the behavior of each agent. The goal of the evolved colony is to find optimal ways to forage for food and return it to the nest in the shortest amount of time. In the evolutionary phase, the ants are able to learn to collaborate by depositing pheromone near food piles and near the nest to guide other ants. The pheromone usage is not manually encoded into the network; instead, this behavior is established through the optimization procedure. We observe that pheromone-based communication enables the ants to perform better in comparison to colonies where communication via pheromone did not emerge. We assess the foraging performance by comparing the SNN based model to a rule based system. Our results show that the SNN based model can efficiently complete the foraging task in a short amount of time. Our approach illustrates self coordination via pheromone emerges as a result of the network optimization. This work serves as a proof of concept for the possibility of creating complex applications utilizing SNNs as underlying architectures for multi-agent interactions where communication and self-coordination is desired.
△ Less
Submitted 8 September, 2023; v1 submitted 16 December, 2022;
originally announced December 2022.
-
Embodying the Glitch: Perspectives on Generative AI in Dance Practice
Authors:
Benedikte Wallace,
Charles P. Martin
Abstract:
What role does the break from realism play in the potential for generative artificial intelligence as a creative tool? Through exploration of glitch, we examine the prospective value of these artefacts in creative practice. This paper describes findings from an exploration of AI-generated "mistakes" when using movement produced by a generative deep learning model as an inspiration source in dance…
▽ More
What role does the break from realism play in the potential for generative artificial intelligence as a creative tool? Through exploration of glitch, we examine the prospective value of these artefacts in creative practice. This paper describes findings from an exploration of AI-generated "mistakes" when using movement produced by a generative deep learning model as an inspiration source in dance composition.
△ Less
Submitted 5 October, 2022;
originally announced October 2022.
-
Monitoring ROS2: from Requirements to Autonomous Robots
Authors:
Ivan Perez,
Anastasia Mavridou,
Tom Pressburger,
Alexander Will,
Patrick J. Martin
Abstract:
Runtime verification (RV) has the potential to enable the safe operation of safety-critical systems that are too complex to formally verify, such as Robot Operating System 2 (ROS2) applications. Writing correct monitors can itself be complex, and errors in the monitoring subsystem threaten the mission as a whole. This paper provides an overview of a formal approach to generating runtime monitors f…
▽ More
Runtime verification (RV) has the potential to enable the safe operation of safety-critical systems that are too complex to formally verify, such as Robot Operating System 2 (ROS2) applications. Writing correct monitors can itself be complex, and errors in the monitoring subsystem threaten the mission as a whole. This paper provides an overview of a formal approach to generating runtime monitors for autonomous robots from requirements written in a structured natural language. Our approach integrates the Formal Requirement Elicitation Tool (FRET) with Copilot, a runtime verification framework, through the Ogma integration tool. FRET is used to specify requirements with unambiguous semantics, which are then automatically translated into temporal logic formulae. Ogma generates monitor specifications from the FRET output, which are compiled into hard-real time C99. To facilitate integration of the monitors in ROS2, we have extended Ogma to generate ROS2 packages defining monitoring nodes, which run the monitors when new data becomes available, and publish the results of any violations. The goal of our approach is to treat the generated ROS2 packages as black boxes and integrate them into larger ROS2 systems with minimal effort.
△ Less
Submitted 28 September, 2022;
originally announced September 2022.
-
Learning Interaction Variables and Kernels from Observations of Agent-Based Systems
Authors:
Jinchao Feng,
Mauro Maggioni,
Patrick Martin,
Ming Zhong
Abstract:
Dynamical systems across many disciplines are modeled as interacting particles or agents, with interaction rules that depend on a very small number of variables (e.g. pairwise distances, pairwise differences of phases, etc...), functions of the state of pairs of agents. Yet, these interaction rules can generate self-organized dynamics, with complex emergent behaviors (clustering, flocking, swarmin…
▽ More
Dynamical systems across many disciplines are modeled as interacting particles or agents, with interaction rules that depend on a very small number of variables (e.g. pairwise distances, pairwise differences of phases, etc...), functions of the state of pairs of agents. Yet, these interaction rules can generate self-organized dynamics, with complex emergent behaviors (clustering, flocking, swarming, etc.). We propose a learning technique that, given observations of states and velocities along trajectories of the agents, yields both the variables upon which the interaction kernel depends and the interaction kernel itself, in a nonparametric fashion. This yields an effective dimension reduction which avoids the curse of dimensionality from the high-dimensional observation data (states and velocities of all the agents). We demonstrate the learning capability of our method to a variety of first-order interacting systems.
△ Less
Submitted 4 August, 2022;
originally announced August 2022.
-
3D Convolutional Networks for Action Recognition: Application to Sport Gesture Recognition
Authors:
Pierre-Etienne Martin,
J Benois-Pineau,
R Péteri,
A Zemmari,
J Morlier
Abstract:
3D convolutional networks is a good means to perform tasks such as video segmentation into coherent spatio-temporal chunks and classification of them with regard to a target taxonomy. In the chapter we are interested in the classification of continuous video takes with repeatable actions, such as strokes of table tennis. Filmed in a free marker less ecological environment, these videos represent a…
▽ More
3D convolutional networks is a good means to perform tasks such as video segmentation into coherent spatio-temporal chunks and classification of them with regard to a target taxonomy. In the chapter we are interested in the classification of continuous video takes with repeatable actions, such as strokes of table tennis. Filmed in a free marker less ecological environment, these videos represent a challenge from both segmentation and classification point of view. The 3D convnets are an efficient tool for solving these problems with window-based approaches.
△ Less
Submitted 13 April, 2022;
originally announced April 2022.
-
Exploring hyper-parameter spaces of neuroscience models on high performance computers with Learning to Learn
Authors:
Alper Yegenoglu,
Anand Subramoney,
Thorsten Hater,
Cristian Jimenez-Romero,
Wouter Klijn,
Aaron Perez Martin,
Michiel van der Vlag,
Michael Herty,
Abigail Morrison,
Sandra Diaz-Pier
Abstract:
Neuroscience models commonly have a high number of degrees of freedom and only specific regions within the parameter space are able to produce dynamics of interest. This makes the development of tools and strategies to efficiently find these regions of high importance to advance brain research. Exploring the high dimensional parameter space using numerical simulations has been a frequently used te…
▽ More
Neuroscience models commonly have a high number of degrees of freedom and only specific regions within the parameter space are able to produce dynamics of interest. This makes the development of tools and strategies to efficiently find these regions of high importance to advance brain research. Exploring the high dimensional parameter space using numerical simulations has been a frequently used technique in the last years in many areas of computational neuroscience. High performance computing (HPC) can provide today a powerful infrastructure to speed up explorations and increase our general understanding of the model's behavior in reasonable times.
△ Less
Submitted 28 February, 2022;
originally announced February 2022.
-
RTGNN: A Novel Approach to Model Stochastic Traffic Dynamics
Authors:
Ke Sun,
Stephen Chaves,
Paul Martin,
Vijay Kumar
Abstract:
Modeling stochastic traffic dynamics is critical to developing self-driving cars. Because it is difficult to develop first principle models of cars driven by humans, there is great potential for using data driven approaches in developing traffic dynamical models. While there is extensive literature on this subject, previous works mainly address the prediction accuracy of data-driven models. Moreov…
▽ More
Modeling stochastic traffic dynamics is critical to developing self-driving cars. Because it is difficult to develop first principle models of cars driven by humans, there is great potential for using data driven approaches in developing traffic dynamical models. While there is extensive literature on this subject, previous works mainly address the prediction accuracy of data-driven models. Moreover, it is often difficult to apply these models to common planning frameworks since they fail to meet the assumptions therein. In this work, we propose a new stochastic traffic model, Recurrent Traffic Graph Neural Network (RTGNN), by enforcing additional structures on the model so that the proposed model can be seamlessly integrated with existing motion planning algorithms. RTGNN is a Markovian model and is able to infer future traffic states conditioned on the motion of the ego vehicle. Specifically, RTGNN uses a definition of the traffic state that includes the state of all players in a local region and is therefore able to make joint predictions for all agents of interest. Meanwhile, we explicitly model the hidden states of agents, "intentions," as part of the traffic state to reflect the inherent partial observability of traffic dynamics. The above mentioned properties are critical for integrating RTGNN with motion planning algorithms coupling prediction and decision making. Despite the additional structures, we show that RTGNN is able to achieve state-of-the-art accuracy through comparisons with other similar works.
△ Less
Submitted 20 February, 2022;
originally announced February 2022.
-
Spatio-Temporal CNN baseline method for the Sports Video Task of MediaEval 2021 benchmark
Authors:
Pierre-Etienne Martin
Abstract:
This paper presents the baseline method proposed for the Sports Video task part of the MediaEval 2021 benchmark. This task proposes a stroke detection and a stroke classification subtasks. This baseline addresses both subtasks. The spatio-temporal CNN architecture and the training process of the model are tailored according to the addressed subtask. The method has the purpose of helping the partic…
▽ More
This paper presents the baseline method proposed for the Sports Video task part of the MediaEval 2021 benchmark. This task proposes a stroke detection and a stroke classification subtasks. This baseline addresses both subtasks. The spatio-temporal CNN architecture and the training process of the model are tailored according to the addressed subtask. The method has the purpose of helping the participants to solve the task and is not meant to reach stateof-the-art performance. Still, for the detection task, the baseline is performing better than the other participants, which stresses the difficulty of such a task.
△ Less
Submitted 16 December, 2021;
originally announced December 2021.
-
Two Stream Network for Stroke Detection in Table Tennis
Authors:
Anam Zahra,
Pierre-Etienne Martin
Abstract:
This paper presents a table tennis stroke detection method from videos. The method relies on a two-stream Convolutional Neural Network processing in parallel the RGB Stream and its computed optical flow. The method has been developed as part of the MediaEval 2021 benchmark for the Sport task. Our contribution did not outperform the provided baseline on the test set but has performed the best among…
▽ More
This paper presents a table tennis stroke detection method from videos. The method relies on a two-stream Convolutional Neural Network processing in parallel the RGB Stream and its computed optical flow. The method has been developed as part of the MediaEval 2021 benchmark for the Sport task. Our contribution did not outperform the provided baseline on the test set but has performed the best among the other participants with regard to the mAP metric.
△ Less
Submitted 16 December, 2021;
originally announced December 2021.
-
Sports Video: Fine-Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2021
Authors:
Pierre-Etienne Martin,
Jordan Calandre,
Boris Mansencal,
Jenny Benois-Pineau,
Renaud Péteri,
Laurent Mascarilla,
Julien Morlier
Abstract:
Sports video analysis is a prevalent research topic due to the variety of application areas, ranging from multimedia intelligent devices with user-tailored digests up to analysis of athletes' performance. The Sports Video task is part of the MediaEval 2021 benchmark. This task tackles fine-grained action detection and classification from videos. The focus is on recordings of table tennis games. Ru…
▽ More
Sports video analysis is a prevalent research topic due to the variety of application areas, ranging from multimedia intelligent devices with user-tailored digests up to analysis of athletes' performance. The Sports Video task is part of the MediaEval 2021 benchmark. This task tackles fine-grained action detection and classification from videos. The focus is on recordings of table tennis games. Running since 2019, the task has offered a classification challenge from untrimmed video recorded in natural conditions with known temporal boundaries for each stroke. This year, the dataset is extended and offers, in addition, a detection challenge from untrimmed videos without annotations. This work aims at creating tools for sports coaches and players in order to analyze sports performance. Movement analysis and player profiling may be built upon such technology to enrich the training experience of athletes and improve their performance.
△ Less
Submitted 16 December, 2021;
originally announced December 2021.
-
VACA: Design of Variational Graph Autoencoders for Interventional and Counterfactual Queries
Authors:
Pablo Sanchez-Martin,
Miriam Rateike,
Isabel Valera
Abstract:
In this paper, we introduce VACA, a novel class of variational graph autoencoders for causal inference in the absence of hidden confounders, when only observational data and the causal graph are available. Without making any parametric assumptions, VACA mimics the necessary properties of a Structural Causal Model (SCM) to provide a flexible and practical framework for approximating interventions (…
▽ More
In this paper, we introduce VACA, a novel class of variational graph autoencoders for causal inference in the absence of hidden confounders, when only observational data and the causal graph are available. Without making any parametric assumptions, VACA mimics the necessary properties of a Structural Causal Model (SCM) to provide a flexible and practical framework for approximating interventions (do-operator) and abduction-action-prediction steps. As a result, and as shown by our empirical results, VACA accurately approximates the interventional and counterfactual distributions on diverse SCMs. Finally, we apply VACA to evaluate counterfactual fairness in fair classification problems, as well as to learn fair classifiers without compromising performance.
△ Less
Submitted 27 October, 2021;
originally announced October 2021.
-
Three-Stream 3D/1D CNN for Fine-Grained Action Classification and Segmentation in Table Tennis
Authors:
Pierre-Etienne Martin,
Jenny Benois-Pineau,
Renaud Péteri,
Julien Morlier
Abstract:
This paper proposes a fusion method of modalities extracted from video through a three-stream network with spatio-temporal and temporal convolutions for fine-grained action classification in sport. It is applied to TTStroke-21 dataset which consists of untrimmed videos of table tennis games. The goal is to detect and classify table tennis strokes in the videos, the first step of a bigger scheme ai…
▽ More
This paper proposes a fusion method of modalities extracted from video through a three-stream network with spatio-temporal and temporal convolutions for fine-grained action classification in sport. It is applied to TTStroke-21 dataset which consists of untrimmed videos of table tennis games. The goal is to detect and classify table tennis strokes in the videos, the first step of a bigger scheme aiming at giving feedback to the players for improving their performance. The three modalities are raw RGB data, the computed optical flow and the estimated pose of the player. The network consists of three branches with attention blocks. Features are fused at the latest stage of the network using bilinear layers. Compared to previous approaches, the use of three modalities allows faster convergence and better performances on both tasks: classification of strokes with known temporal boundaries and joint segmentation and classification. The pose is also further investigated in order to offer richer feedback to the athletes.
△ Less
Submitted 29 September, 2021;
originally announced September 2021.
-
SPOTS: An Accelerator for Sparse Convolutional Networks Leveraging Systolic General Matrix-Matrix Multiplication
Authors:
Mohammadreza Soltaniyeh,
Richard P. Martin,
Santosh Nagarakatte
Abstract:
This paper proposes a new hardware accelerator for sparse convolutional neural networks (CNNs) by building a hardware unit to perform the Image to Column (IM2COL) transformation of the input feature map coupled with a systolic array-based general matrix-matrix multiplication (GEMM) unit. Our design carefully overlaps the IM2COL transformation with the GEMM computation to maximize parallelism. We p…
▽ More
This paper proposes a new hardware accelerator for sparse convolutional neural networks (CNNs) by building a hardware unit to perform the Image to Column (IM2COL) transformation of the input feature map coupled with a systolic array-based general matrix-matrix multiplication (GEMM) unit. Our design carefully overlaps the IM2COL transformation with the GEMM computation to maximize parallelism. We propose a novel design for the IM2COL unit that uses a set of distributed local memories connected by a ring network, which improves energy efficiency and latency by streaming the input feature map only once. We propose a tall systolic array for the GEMM unit while also providing the ability to organize it as multiple small GEMM units, which enables our design to handle a wide range of CNNs and their parameters. Further, our design improves performance by effectively mapping the sparse data to the hardware units by utilizing sparsity in both input feature maps and weights. Our prototype, SPOTS, is on average 1.74X faster than Eyeriss. It is also 78X, and 12X more energy-efficient when compared to CPU and GPU implementations, respectively.
△ Less
Submitted 24 November, 2021; v1 submitted 28 July, 2021;
originally announced July 2021.
-
The Dynamic k-Mismatch Problem
Authors:
Raphaël Clifford,
Paweł Gawrychowski,
Tomasz Kociumaka,
Daniel P. Martin,
Przemysław Uznański
Abstract:
The text-to-pattern Hamming distances problem asks to compute the Hamming distances between a given pattern of length $m$ and all length-$m$ substrings of a given text of length $n\ge m$. We focus on the $k$-mismatch version of the problem, where a distance needs to be returned only if it does not exceed a threshold $k$. We assume $n\le 2m$ (in general, one can partition the text into overlapping…
▽ More
The text-to-pattern Hamming distances problem asks to compute the Hamming distances between a given pattern of length $m$ and all length-$m$ substrings of a given text of length $n\ge m$. We focus on the $k$-mismatch version of the problem, where a distance needs to be returned only if it does not exceed a threshold $k$. We assume $n\le 2m$ (in general, one can partition the text into overlapping blocks). In this work, we show data structures for the dynamic version of this problem supporting two operations: An update performs a single-letter substitution in the pattern or the text, and a query, given an index $i$, returns the Hamming distance between the pattern and the text substring starting at position $i$, or reports that it exceeds $k$.
First, we show a data structure with $\tilde{O}(1)$ update and $\tilde{O}(k)$ query time. Then we show that $\tilde{O}(k)$ update and $\tilde{O}(1)$ query time is also possible. These two provide an optimal trade-off for the dynamic $k$-mismatch problem with $k \le \sqrt{n}$: we prove that, conditioned on the strong 3SUM conjecture, one cannot simultaneously achieve $k^{1-Ω(1)}$ time for all operations.
For $k\ge \sqrt{n}$, we give another lower bound, conditioned on the Online Matrix-Vector conjecture, that excludes algorithms taking $n^{1/2-Ω(1)}$ time per operation. This is tight for constant-sized alphabets: Clifford et al. (STACS 2018) achieved $\tilde{O}(\sqrt{n})$ time per operation in that case, but with $\tilde{O}(n^{3/4})$ time per operation for large alphabets. We improve and extend this result with an algorithm that, given $1\le x\le k$, achieves update time $\tilde{O}(\frac{n}{k} +\sqrt{\frac{nk}{x}})$ and query time $\tilde{O}(x)$. In particular, for $k\ge \sqrt{n}$, an appropriate choice of $x$ yields $\tilde{O}(\sqrt[3]{nk})$ time per operation, which is $\tilde{O}(n^{2/3})$ when no threshold $k$ is provided.
△ Less
Submitted 28 March, 2022; v1 submitted 13 May, 2021;
originally announced May 2021.
-
3D attention mechanism for fine-grained classification of table tennis strokes using a Twin Spatio-Temporal Convolutional Neural Networks
Authors:
Pierre-Etienne Martin,
Jenny Benois-Pineau,
Renaud Péteri,
Julien Morlier
Abstract:
The paper addresses the problem of recognition of actions in video with low inter-class variability such as Table Tennis strokes. Two stream, "twin" convolutional neural networks are used with 3D convolutions both on RGB data and optical flow. Actions are recognized by classification of temporal windows. We introduce 3D attention modules and examine their impact on classification efficiency. In th…
▽ More
The paper addresses the problem of recognition of actions in video with low inter-class variability such as Table Tennis strokes. Two stream, "twin" convolutional neural networks are used with 3D convolutions both on RGB data and optical flow. Actions are recognized by classification of temporal windows. We introduce 3D attention modules and examine their impact on classification efficiency. In the context of the study of sportsmen performances, a corpus of the particular actions of table tennis strokes is considered. The use of attention blocks in the network speeds up the training step and improves the classification scores up to 5% with our twin model. We visualize the impact on the obtained features and notice correlation between attention and player movements and position. Score comparison of state-of-the-art action classification method and proposed approach with attentional blocks is performed on the corpus. Proposed model with attention blocks outperforms previous model without them and our baseline.
△ Less
Submitted 20 November, 2020;
originally announced December 2020.
-
Composing an Ensemble Standstill Work for Myo and Bela
Authors:
Charles Patrick Martin,
Alexander Refsum Jensenius,
Jim Torresen
Abstract:
This paper describes the process of developing a standstill performance work using the Myo gesture control armband and the Bela embedded computing platform. The combination of Myo and Bela allows a portable and extensible version of the standstill performance concept while introducing muscle tension as an additional control parameter. We describe the technical details of our setup and introduce My…
▽ More
This paper describes the process of developing a standstill performance work using the Myo gesture control armband and the Bela embedded computing platform. The combination of Myo and Bela allows a portable and extensible version of the standstill performance concept while introducing muscle tension as an additional control parameter. We describe the technical details of our setup and introduce Myo-to-Bela and Myo-to-OSC software bridges that assist with prototyping compositions using the Myo controller.
△ Less
Submitted 4 December, 2020;
originally announced December 2020.
-
A Laptop Ensemble Performance System using Recurrent Neural Networks
Authors:
Rohan Proctor,
Charles Patrick Martin
Abstract:
The popularity of applying machine learning techniques in musical domains has created an inherent availability of freely accessible pre-trained neural network (NN) models ready for use in creative applications. This work outlines the implementation of one such application in the form of an assistance tool designed for live improvisational performances by laptop ensembles. The primary intention was…
▽ More
The popularity of applying machine learning techniques in musical domains has created an inherent availability of freely accessible pre-trained neural network (NN) models ready for use in creative applications. This work outlines the implementation of one such application in the form of an assistance tool designed for live improvisational performances by laptop ensembles. The primary intention was to leverage off-the-shelf pre-trained NN models as a basis for assisting individual performers either as musical novices looking to engage with more experienced performers or as a tool to expand musical possibilities through new forms of creative expression. The system expands upon a variety of ideas found in different research areas including new interfaces for musical expression, generative music and group performance to produce a networked performance solution served via a web-browser interface. The final implementation of the system offers performers a mixture of high and low-level controls to influence the shape of sequences of notes output by locally run NN models in real time, also allowing performers to define their level of engagement with the assisting generative models. Two test performances were played, with the system shown to feasibly support four performers over a four minute piece while producing musically cohesive and engaging music. Iterations on the design of the system exposed technical constraints on the use of a JavaScript environment for generative models in a live music context, largely derived from inescapable processing overheads.
△ Less
Submitted 3 December, 2020;
originally announced December 2020.
-
Sonic Sculpture: Activating Engagement with Head-Mounted Augmented Reality
Authors:
Charles Patrick Martin,
Zeruo Liu,
Yichen Wang,
Wennan He,
Henry Gardner
Abstract:
This work examines how head-mounted AR can be used to build an interactive sonic landscape to engage with a public sculpture. We describe a sonic artwork, "Listening To Listening", that has been designed to accompany a real-world sculpture with two prototype interaction schemes. Our artwork is created for the HoloLens platform so that users can have an individual experience in a mixed reality cont…
▽ More
This work examines how head-mounted AR can be used to build an interactive sonic landscape to engage with a public sculpture. We describe a sonic artwork, "Listening To Listening", that has been designed to accompany a real-world sculpture with two prototype interaction schemes. Our artwork is created for the HoloLens platform so that users can have an individual experience in a mixed reality context. Personal head-mounted AR systems have recently become available and practical for integration into public art projects, however research into sonic sculpture works has yet to account for the affordances of current portable and mainstream AR systems. In this work, we take advantage of the HoloLens' spatial awareness to build sonic spaces that have a precise spatial relationship to a given sculpture and where the sculpture itself is modelled in the augmented scene as an "invisible hologram". We describe the artistic rationale for our artwork, the design of the two interaction schemes, and the technical and usability feedback that we have obtained from demonstrations during iterative development.
△ Less
Submitted 3 December, 2020;
originally announced December 2020.
-
Towards Movement Generation with Audio Features
Authors:
Benedikte Wallace,
Charles P. Martin,
Jim Torresen,
Kristian Nymoen
Abstract:
Sound and movement are closely coupled, particularly in dance. Certain audio features have been found to affect the way we move to music. Is this relationship between sound and movement something which can be modelled using machine learning? This work presents initial experiments wherein high-level audio features calculated from a set of music pieces are included in a movement generation model tra…
▽ More
Sound and movement are closely coupled, particularly in dance. Certain audio features have been found to affect the way we move to music. Is this relationship between sound and movement something which can be modelled using machine learning? This work presents initial experiments wherein high-level audio features calculated from a set of music pieces are included in a movement generation model trained on motion capture recordings of improvised dance. Our results indicate that the model learns to generate realistic dance movements which vary depending on the audio features.
△ Less
Submitted 26 November, 2020;
originally announced November 2020.