-
Evaluation in EEG Emotion Recognition: State-of-the-Art Review and Unified Framework
Authors:
Natia Kukhilava,
Tatia Tsmindashvili,
Rapael Kalandadze,
Anchit Gupta,
Sofio Katamadze,
François Brémond,
Laura M. Ferrari,
Philipp Müller,
Benedikt Emanuel Wirth
Abstract:
Electroencephalography-based Emotion Recognition (EEG-ER) has become a growing research area in recent years. Analyzing 216 papers published between 2018 and 2023, we uncover that the field lacks a unified evaluation protocol, which is essential to fairly define the state of the art, compare new approaches and to track the field's progress. We report the main inconsistencies between the used evalu…
▽ More
Electroencephalography-based Emotion Recognition (EEG-ER) has become a growing research area in recent years. Analyzing 216 papers published between 2018 and 2023, we uncover that the field lacks a unified evaluation protocol, which is essential to fairly define the state of the art, compare new approaches and to track the field's progress. We report the main inconsistencies between the used evaluation protocols, which are related to ground truth definition, evaluation metric selection, data splitting types (e.g., subject-dependent or subject-independent) and the use of different datasets. Capitalizing on this state-of-the-art research, we propose a unified evaluation protocol, EEGain (https://github.com/EmotionLab/EEGain), which enables an easy and efficient evaluation of new methods and datasets. EEGain is a novel open source software framework, offering the capability to compare - and thus define - state-of-the-art results. EEGain includes standardized methods for data pre-processing, data splitting, evaluation metrics, and the ability to load the six most relevant datasets (i.e., AMIGOS, DEAP, DREAMER, MAHNOB-HCI, SEED, SEED-IV) in EEG-ER with only a single line of code. In addition, we have assessed and validated EEGain using these six datasets on the four most common publicly available methods (EEGNet, DeepConvNet, ShallowConvNet, TSception). This is a significant step to make research on EEG-ER more reproducible and comparable, thereby accelerating the overall progress of the field.
△ Less
Submitted 14 May, 2025;
originally announced May 2025.
-
Masked Conditioning for Deep Generative Models
Authors:
Phillip Mueller,
Jannik Wiese,
Sebastian Mueller,
Lars Mikelsons
Abstract:
Datasets in engineering domains are often small, sparsely labeled, and contain numerical as well as categorical conditions. Additionally. computational resources are typically limited in practical applications which hinders the adoption of generative models for engineering tasks. We introduce a novel masked-conditioning approach, that enables generative models to work with sparse, mixed-type data.…
▽ More
Datasets in engineering domains are often small, sparsely labeled, and contain numerical as well as categorical conditions. Additionally. computational resources are typically limited in practical applications which hinders the adoption of generative models for engineering tasks. We introduce a novel masked-conditioning approach, that enables generative models to work with sparse, mixed-type data. We mask conditions during training to simulate sparse conditions at inference time. For this purpose, we explore the use of various sparsity schedules that show different strengths and weaknesses. In addition, we introduce a flexible embedding that deals with categorical as well as numerical conditions. We integrate our method into an efficient variational autoencoder as well as a latent diffusion model and demonstrate the applicability of our approach on two engineering-related datasets of 2D point clouds and images. Finally, we show that small models trained on limited data can be coupled with large pretrained foundation models to improve generation quality while retaining the controllability induced by our conditioning scheme.
△ Less
Submitted 22 May, 2025;
originally announced May 2025.
-
Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation
Authors:
Galann Pennec,
Zhengyuan Liu,
Nicholas Asher,
Philippe Muller,
Nancy F. Chen
Abstract:
Vision-Language Models (VLMs) often struggle to balance visual and textual information when summarizing complex multimodal inputs, such as entire TV show episodes. In this paper, we propose a zero-shot video-to-text summarization approach that builds its own screenplay representation of an episode, effectively integrating key video moments, dialogue, and character information into a unified docume…
▽ More
Vision-Language Models (VLMs) often struggle to balance visual and textual information when summarizing complex multimodal inputs, such as entire TV show episodes. In this paper, we propose a zero-shot video-to-text summarization approach that builds its own screenplay representation of an episode, effectively integrating key video moments, dialogue, and character information into a unified document. Unlike previous approaches, we simultaneously generate screenplays and name the characters in zero-shot, using only the audio, video, and transcripts as input. Additionally, we highlight that existing summarization metrics can fail to assess the multimodal content in summaries. To address this, we introduce MFactSum, a multimodal metric that evaluates summaries with respect to both vision and text modalities. Using MFactSum, we evaluate our screenplay summaries on the SummScreen3D dataset, demonstrating superiority against state-of-the-art VLMs such as Gemini 1.5 by generating summaries containing 20% more relevant visual information while requiring 75% less of the video as input.
△ Less
Submitted 10 May, 2025;
originally announced May 2025.
-
Examining the Impact of Optical Aberrations to Image Classification and Object Detection Models
Authors:
Patrick Müller,
Alexander Braun,
Margret Keuper
Abstract:
Deep neural networks (DNNs) have proven to be successful in various computer vision applications such that models even infer in safety-critical situations. Therefore, vision models have to behave in a robust way to disturbances such as noise or blur. While seminal benchmarks exist to evaluate model robustness to diverse corruptions, blur is often approximated in an overly simplistic way to model d…
▽ More
Deep neural networks (DNNs) have proven to be successful in various computer vision applications such that models even infer in safety-critical situations. Therefore, vision models have to behave in a robust way to disturbances such as noise or blur. While seminal benchmarks exist to evaluate model robustness to diverse corruptions, blur is often approximated in an overly simplistic way to model defocus, while ignoring the different blur kernel shapes that result from optical systems. To study model robustness against realistic optical blur effects, this paper proposes two datasets of blur corruptions, which we denote OpticsBench and LensCorruptions. OpticsBench examines primary aberrations such as coma, defocus, and astigmatism, i.e. aberrations that can be represented by varying a single parameter of Zernike polynomials. To go beyond the principled but synthetic setting of primary aberrations, LensCorruptions samples linear combinations in the vector space spanned by Zernike polynomials, corresponding to 100 real lenses. Evaluations for image classification and object detection on ImageNet and MSCOCO show that for a variety of different pre-trained models, the performance on OpticsBench and LensCorruptions varies significantly, indicating the need to consider realistic image corruptions to evaluate a model's robustness against blur.
△ Less
Submitted 25 April, 2025;
originally announced April 2025.
-
AutoPsyC: Automatic Recognition of Psychodynamic Conflicts from Semi-structured Interviews with Large Language Models
Authors:
Sayed Muddashir Hossain,
Simon Ostermann,
Patrick Gebhard,
Cord Benecke,
Josef van Genabith,
Philipp Müller
Abstract:
Psychodynamic conflicts are persistent, often unconscious themes that shape a person's behaviour and experiences. Accurate diagnosis of psychodynamic conflicts is crucial for effective patient treatment and is commonly done via long, manually scored semi-structured interviews. Existing automated solutions for psychiatric diagnosis tend to focus on the recognition of broad disorder categories such…
▽ More
Psychodynamic conflicts are persistent, often unconscious themes that shape a person's behaviour and experiences. Accurate diagnosis of psychodynamic conflicts is crucial for effective patient treatment and is commonly done via long, manually scored semi-structured interviews. Existing automated solutions for psychiatric diagnosis tend to focus on the recognition of broad disorder categories such as depression, and it is unclear to what extent psychodynamic conflicts which even the patient themselves may not have conscious access to could be automatically recognised from conversation. In this paper, we propose AutoPsyC, the first method for recognising the presence and significance of psychodynamic conflicts from full-length Operationalized Psychodynamic Diagnostics (OPD) interviews using Large Language Models (LLMs). Our approach combines recent advances in parameter-efficient fine-tuning and Retrieval-Augmented Generation (RAG) with a summarisation strategy to effectively process entire 90 minute long conversations. In evaluations on a dataset of 141 diagnostic interviews we show that AutoPsyC consistently outperforms all baselines and ablation conditions on the recognition of four highly relevant psychodynamic conflicts.
△ Less
Submitted 27 March, 2025;
originally announced March 2025.
-
Place Capability Graphs: A General-Purpose Model of Rust's Ownership and Borrowing Guarantees
Authors:
Zachary Grannan,
Aurel Bílý,
Jonáš Fiala,
Jasper Geer,
Markus de Medeiros,
Peter Müller,
Alexander J. Summers
Abstract:
Rust's novel type system has proved an attractive target for verification and program analysis tools, due to the rich guarantees it provides for controlling aliasing and mutability. However, fully understanding, extracting and exploiting these guarantees is subtle and challenging: existing models for Rust's type checking either support a smaller idealised language disconnected from real-world Rust…
▽ More
Rust's novel type system has proved an attractive target for verification and program analysis tools, due to the rich guarantees it provides for controlling aliasing and mutability. However, fully understanding, extracting and exploiting these guarantees is subtle and challenging: existing models for Rust's type checking either support a smaller idealised language disconnected from real-world Rust code, or come with severe limitations in terms of precise modelling of Rust borrows, composite types storing them, function signatures and loops.
In this paper, we present a novel model of Rust's type-checking called Place Capability Graphs, which lifts these limitations, and which can be directly calculated from the Rust compiler's own programmatic representations and analyses. We demonstrate that our model supports over 98% of Rust functions in the most popular public crates, and show its suitability as a general-purpose basis for verification and program analysis tools by developing promising new prototype versions of the existing Flowistry and Prusti tools.
△ Less
Submitted 4 April, 2025; v1 submitted 27 March, 2025;
originally announced March 2025.
-
MeshFleet: Filtered and Annotated 3D Vehicle Dataset for Domain Specific Generative Modeling
Authors:
Damian Boborzi,
Phillip Mueller,
Jonas Emrich,
Dominik Schmid,
Sebastian Mueller,
Lars Mikelsons
Abstract:
Generative models have recently made remarkable progress in the field of 3D objects. However, their practical application in fields like engineering remains limited since they fail to deliver the accuracy, quality, and controllability needed for domain-specific tasks. Fine-tuning large generative models is a promising perspective for making these models available in these fields. Creating high-qua…
▽ More
Generative models have recently made remarkable progress in the field of 3D objects. However, their practical application in fields like engineering remains limited since they fail to deliver the accuracy, quality, and controllability needed for domain-specific tasks. Fine-tuning large generative models is a promising perspective for making these models available in these fields. Creating high-quality, domain-specific 3D datasets is crucial for fine-tuning large generative models, yet the data filtering and annotation process remains a significant bottleneck. We present MeshFleet, a filtered and annotated 3D vehicle dataset extracted from Objaverse-XL, the most extensive publicly available collection of 3D objects. Our approach proposes a pipeline for automated data filtering based on a quality classifier. This classifier is trained on a manually labeled subset of Objaverse, incorporating DINOv2 and SigLIP embeddings, refined through caption-based analysis and uncertainty estimation. We demonstrate the efficacy of our filtering method through a comparative analysis against caption and image aesthetic score-based techniques and fine-tuning experiments with SV3D, highlighting the importance of targeted data selection for domain-specific 3D generative modeling.
△ Less
Submitted 18 March, 2025;
originally announced March 2025.
-
L-FUSION: Laplacian Fetal Ultrasound Segmentation & Uncertainty Estimation
Authors:
Johanna P. Müller,
Robert Wright,
Thomas G. Day,
Lorenzo Venturini,
Samuel F. Budd,
Hadrien Reynaud,
Joseph V. Hajnal,
Reza Razavi,
Bernhard Kainz
Abstract:
Accurate analysis of prenatal ultrasound (US) is essential for early detection of developmental anomalies. However, operator dependency and technical limitations (e.g. intrinsic artefacts and effects, setting errors) can complicate image interpretation and the assessment of diagnostic uncertainty. We present L-FUSION (Laplacian Fetal US Segmentation with Integrated FoundatiON models), a framework…
▽ More
Accurate analysis of prenatal ultrasound (US) is essential for early detection of developmental anomalies. However, operator dependency and technical limitations (e.g. intrinsic artefacts and effects, setting errors) can complicate image interpretation and the assessment of diagnostic uncertainty. We present L-FUSION (Laplacian Fetal US Segmentation with Integrated FoundatiON models), a framework that integrates uncertainty quantification through unsupervised, normative learning and large-scale foundation models for robust segmentation of fetal structures in normal and pathological scans. We propose to utilise the aleatoric logit distributions of Stochastic Segmentation Networks and Laplace approximations with fast Hessian estimations to estimate epistemic uncertainty only from the segmentation head. This enables us to achieve reliable abnormality quantification for instant diagnostic feedback. Combined with an integrated Dropout component, L-FUSION enables reliable differentiation of lesions from normal fetal anatomy with enhanced uncertainty maps and segmentation counterfactuals in US imaging. It improves epistemic and aleatoric uncertainty interpretation and removes the need for manual disease-labelling. Evaluations across multiple datasets show that L-FUSION achieves superior segmentation accuracy and consistent uncertainty quantification, supporting on-site decision-making and offering a scalable solution for advancing fetal ultrasound analysis in clinical settings.
△ Less
Submitted 12 March, 2025; v1 submitted 7 March, 2025;
originally announced March 2025.
-
Towards Generalisable Time Series Understanding Across Domains
Authors:
Özgün Turgut,
Philip Müller,
Martin J. Menten,
Daniel Rueckert
Abstract:
Recent breakthroughs in natural language processing and computer vision, driven by efficient pre-training on large datasets, have enabled foundation models to excel on a wide range of tasks. However, this potential has not yet been fully realised in time series analysis, as existing methods fail to address the heterogeneity in large time series corpora. Prevalent in domains ranging from medicine t…
▽ More
Recent breakthroughs in natural language processing and computer vision, driven by efficient pre-training on large datasets, have enabled foundation models to excel on a wide range of tasks. However, this potential has not yet been fully realised in time series analysis, as existing methods fail to address the heterogeneity in large time series corpora. Prevalent in domains ranging from medicine to finance, time series vary substantially in characteristics such as variate count, inter-variate relationships, temporal patterns, and sampling frequency. To address this, we introduce a novel pre-training paradigm specifically designed to handle time series heterogeneity. We propose a tokeniser with learnable domain signatures, a dual masking strategy, and a normalised cross-correlation loss, enabling our open model for general time series analysis (OTiS) to efficiently learn from large time series corpora. Extensive benchmarking on diverse tasks, such as classification, regression, and forecasting, demonstrates that OTiS outperforms state-of-the-art baselines. Our code and pre-trained weights are available at https://github.com/oetu/otis.
△ Less
Submitted 31 January, 2025; v1 submitted 9 October, 2024;
originally announced October 2024.
-
GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design
Authors:
Phillip Mueller,
Sebastian Mueller,
Lars Mikelsons
Abstract:
We provide a dataset for enabling Deep Generative Models (DGMs) in engineering design and propose methods to automate data labeling by utilizing large-scale foundation models. GeoBiked is curated to contain 4 355 bicycle images, annotated with structural and technical features and is used to investigate two automated labeling techniques: The utilization of consolidated latent features (Hyperfeatur…
▽ More
We provide a dataset for enabling Deep Generative Models (DGMs) in engineering design and propose methods to automate data labeling by utilizing large-scale foundation models. GeoBiked is curated to contain 4 355 bicycle images, annotated with structural and technical features and is used to investigate two automated labeling techniques: The utilization of consolidated latent features (Hyperfeatures) from image-generation models to detect geometric correspondences (e.g. the position of the wheel center) in structural images and the generation of diverse text descriptions for structural images. GPT-4o, a vision-language-model (VLM), is instructed to analyze images and produce diverse descriptions aligned with the system-prompt. By representing technical images as Diffusion-Hyperfeatures, drawing geometric correspondences between them is possible. The detection accuracy of geometric points in unseen samples is improved by presenting multiple annotated source images. GPT-4o has sufficient capabilities to generate accurate descriptions of technical images. Grounding the generation only on images leads to diverse descriptions but causes hallucinations, while grounding it on categorical labels restricts the diversity. Using both as input balances creativity and accuracy. Successfully using Hyperfeatures for geometric correspondence suggests that this approach can be used for general point-detection and annotation tasks in technical images. Labeling such images with text descriptions using VLMs is possible, but dependent on the models detection capabilities, careful prompt-engineering and the selection of input information. Applying foundation models in engineering design is largely unexplored. We aim to bridge this gap with a dataset to explore training, finetuning and conditioning DGMs in this field and suggesting approaches to bootstrap foundation models to process technical images.
△ Less
Submitted 22 May, 2025; v1 submitted 25 September, 2024;
originally announced September 2024.
-
Estimating Neural Orientation Distribution Fields on High Resolution Diffusion MRI Scans
Authors:
Mohammed Munzer Dwedari,
William Consagra,
Philip Müller,
Özgün Turgut,
Daniel Rueckert,
Yogesh Rathi
Abstract:
The Orientation Distribution Function (ODF) characterizes key brain microstructural properties and plays an important role in understanding brain structural connectivity. Recent works introduced Implicit Neural Representation (INR) based approaches to form a spatially aware continuous estimate of the ODF field and demonstrated promising results in key tasks of interest when compared to conventiona…
▽ More
The Orientation Distribution Function (ODF) characterizes key brain microstructural properties and plays an important role in understanding brain structural connectivity. Recent works introduced Implicit Neural Representation (INR) based approaches to form a spatially aware continuous estimate of the ODF field and demonstrated promising results in key tasks of interest when compared to conventional discrete approaches. However, traditional INR methods face difficulties when scaling to large-scale images, such as modern ultra-high-resolution MRI scans, posing challenges in learning fine structures as well as inefficiencies in training and inference speed. In this work, we propose HashEnc, a grid-hash-encoding-based estimation of the ODF field and demonstrate its effectiveness in retaining structural and textural features. We show that HashEnc achieves a 10% enhancement in image quality while requiring 3x less computational resources than current methods. Our code can be found at https://github.com/MunzerDw/NODF-HashEnc.
△ Less
Submitted 14 September, 2024;
originally announced September 2024.
-
Towards certifiable AI in aviation: landscape, challenges, and opportunities
Authors:
Hymalai Bello,
Daniel Geißler,
Lala Ray,
Stefan Müller-Divéky,
Peter Müller,
Shannon Kittrell,
Mengxi Liu,
Bo Zhou,
Paul Lukowicz
Abstract:
Artificial Intelligence (AI) methods are powerful tools for various domains, including critical fields such as avionics, where certification is required to achieve and maintain an acceptable level of safety. General solutions for safety-critical systems must address three main questions: Is it suitable? What drives the system's decisions? Is it robust to errors/attacks? This is more complex in AI…
▽ More
Artificial Intelligence (AI) methods are powerful tools for various domains, including critical fields such as avionics, where certification is required to achieve and maintain an acceptable level of safety. General solutions for safety-critical systems must address three main questions: Is it suitable? What drives the system's decisions? Is it robust to errors/attacks? This is more complex in AI than in traditional methods. In this context, this paper presents a comprehensive mind map of formal AI certification in avionics. It highlights the challenges of certifying AI development with an example to emphasize the need for qualification beyond performance metrics.
△ Less
Submitted 13 September, 2024;
originally announced September 2024.
-
MultiMediate'24: Multi-Domain Engagement Estimation
Authors:
Philipp Müller,
Michal Balazia,
Tobias Baur,
Michael Dietz,
Alexander Heimerl,
Anna Penzkofer,
Dominik Schiller,
François Brémond,
Jan Alexandersson,
Elisabeth André,
Andreas Bulling
Abstract:
Estimating the momentary level of participant's engagement is an important prerequisite for assistive systems that support human interactions. Previous work has addressed this task in within-domain evaluation scenarios, i.e. training and testing on the same dataset. This is in contrast to real-life scenarios where domain shifts between training and testing data frequently occur. With MultiMediate'…
▽ More
Estimating the momentary level of participant's engagement is an important prerequisite for assistive systems that support human interactions. Previous work has addressed this task in within-domain evaluation scenarios, i.e. training and testing on the same dataset. This is in contrast to real-life scenarios where domain shifts between training and testing data frequently occur. With MultiMediate'24, we present the first challenge addressing multi-domain engagement estimation. As training data, we utilise the NOXI database of dyadic novice-expert interactions. In addition to within-domain test data, we add two new test domains. First, we introduce recordings following the NOXI protocol but covering languages that are not present in the NOXI training data. Second, we collected novel engagement annotations on the MPIIGroupInteraction dataset which consists of group discussions between three to four people. In this way, MultiMediate'24 evaluates the ability of approaches to generalise across factors such as language and cultural background, group size, task, and screen-mediated vs. face-to-face interaction. This paper describes the MultiMediate'24 challenge and presents baseline results. In addition, we discuss selected challenge solutions.
△ Less
Submitted 29 August, 2024;
originally announced August 2024.
-
Recognizing Emotion Regulation Strategies from Human Behavior with Large Language Models
Authors:
Philipp Müller,
Alexander Heimerl,
Sayed Muddashir Hossain,
Lea Siegel,
Jan Alexandersson,
Patrick Gebhard,
Elisabeth André,
Tanja Schneeberger
Abstract:
Human emotions are often not expressed directly, but regulated according to internal processes and social display rules. For affective computing systems, an understanding of how users regulate their emotions can be highly useful, for example to provide feedback in job interview training, or in psychotherapeutic scenarios. However, at present no method to automatically classify different emotion re…
▽ More
Human emotions are often not expressed directly, but regulated according to internal processes and social display rules. For affective computing systems, an understanding of how users regulate their emotions can be highly useful, for example to provide feedback in job interview training, or in psychotherapeutic scenarios. However, at present no method to automatically classify different emotion regulation strategies in a cross-user scenario exists. At the same time, recent studies showed that instruction-tuned Large Language Models (LLMs) can reach impressive performance across a variety of affect recognition tasks such as categorical emotion recognition or sentiment analysis. While these results are promising, it remains unclear to what extent the representational power of LLMs can be utilized in the more subtle task of classifying users' internal emotion regulation strategy. To close this gap, we make use of the recently introduced \textsc{Deep} corpus for modeling the social display of the emotion shame, where each point in time is annotated with one of seven different emotion regulation classes. We fine-tune Llama2-7B as well as the recently introduced Gemma model using Low-rank Optimization on prompts generated from different sources of information on the \textsc{Deep} corpus. These include verbal and nonverbal behavior, person factors, as well as the results of an in-depth interview after the interaction. Our results show, that a fine-tuned Llama2-7B LLM is able to classify the utilized emotion regulation strategy with high accuracy (0.84) without needing access to data from post-interaction interviews. This represents a significant improvement over previous approaches based on Bayesian Networks and highlights the importance of modeling verbal behavior in emotion regulation.
△ Less
Submitted 8 August, 2024;
originally announced August 2024.
-
In2Core: Leveraging Influence Functions for Coreset Selection in Instruction Finetuning of Large Language Models
Authors:
Ayrton San Joaquin,
Bin Wang,
Zhengyuan Liu,
Nicholas Asher,
Brian Lim,
Philippe Muller,
Nancy F. Chen
Abstract:
Despite advancements, fine-tuning Large Language Models (LLMs) remains costly due to the extensive parameter count and substantial data requirements for model generalization. Accessibility to computing resources remains a barrier for the open-source community. To address this challenge, we propose the In2Core algorithm, which selects a coreset by analyzing the correlation between training and eval…
▽ More
Despite advancements, fine-tuning Large Language Models (LLMs) remains costly due to the extensive parameter count and substantial data requirements for model generalization. Accessibility to computing resources remains a barrier for the open-source community. To address this challenge, we propose the In2Core algorithm, which selects a coreset by analyzing the correlation between training and evaluation samples with a trained model. Notably, we assess the model's internal gradients to estimate this relationship, aiming to rank the contribution of each training point. To enhance efficiency, we propose an optimization to compute influence functions with a reduced number of layers while achieving similar accuracy. By applying our algorithm to instruction fine-tuning data of LLMs, we can achieve similar performance with just 50% of the training data. Meantime, using influence functions to analyze model coverage to certain testing samples could provide a reliable and interpretable signal on the training set's coverage of those test points.
△ Less
Submitted 2 October, 2024; v1 submitted 7 August, 2024;
originally announced August 2024.
-
Formal Foundations for Translational Separation Logic Verifiers (extended version)
Authors:
Thibault Dardinier,
Michael Sammler,
Gaurav Parthasarathy,
Alexander J. Summers,
Peter Müller
Abstract:
Program verification tools are often implemented as front-end translations of an input program into an intermediate verification language (IVL) such as Boogie, GIL, Viper, or Why3. The resulting IVL program is then verified using an existing back-end verifier. A soundness proof for such a translational verifier needs to relate the input program and verification logic to the semantics of the IVL, w…
▽ More
Program verification tools are often implemented as front-end translations of an input program into an intermediate verification language (IVL) such as Boogie, GIL, Viper, or Why3. The resulting IVL program is then verified using an existing back-end verifier. A soundness proof for such a translational verifier needs to relate the input program and verification logic to the semantics of the IVL, which in turn needs to be connected with the verification logic implemented in the back-end verifiers. Performing such proofs is challenging due to the large semantic gap between the input and output programs and logics, especially for complex verification logics such as separation logic.
This paper presents a formal framework for reasoning about translational separation logic verifiers. At its center is a generic core IVL that captures the essence of different separation logics. We define its operational semantics and formally connect it to two different back-end verifiers, which use symbolic execution and verification condition generation, resp. Crucially, this semantics uses angelic non-determinism to enable the application of different proof search algorithms and heuristics in the back-end verifiers. An axiomatic semantics for the core IVL simplifies reasoning about the front-end translation by performing essential proof steps once and for all in the equivalence proof with the operational semantics rather than for each concrete front-end translation.
We illustrate the usefulness of our formal framework by instantiating our core IVL with elements of Viper and connecting it to two Viper back-ends as well as a front-end for concurrent separation logic. All our technical results have been formalized in Isabelle/HOL, including the core IVL and its semantics, the semantics of two back-ends for a subset of Viper, and all proofs.
△ Less
Submitted 20 December, 2024; v1 submitted 29 July, 2024;
originally announced July 2024.
-
The influence of Automated Decision-Making systems in the context of street-level bureaucrats' practices
Authors:
Manuel Portela,
A. Paula Rodriguez Müller,
Luca Tangi
Abstract:
In an era of digital governance, the use of automation for individual and cooperative work is increasing in public administrations (Tangi et al., 2022). Despite the promises of efficiency and cost reduction, automation could bring new challenges to the governance schemes. Regional, national, and local governments are taking measures to regulate and measure the impact of automated decision-making s…
▽ More
In an era of digital governance, the use of automation for individual and cooperative work is increasing in public administrations (Tangi et al., 2022). Despite the promises of efficiency and cost reduction, automation could bring new challenges to the governance schemes. Regional, national, and local governments are taking measures to regulate and measure the impact of automated decision-making systems (ADMS). This research focuses on the use and adoption of ADMS in European public administrations to understand how these systems have been transforming the roles, tasks, and duties of street-level bureaucrats. We conducted a qualitative study in which we interviewed street-level bureaucrats from three administrations who had used an ADMS for several years, which was embedded in their daily work routines. The outcome of our research is an analysis of five dimensions of how collaborative work, the organizational settings, the capacities of bureaucrats and the implementation of the ADMS enable or limit the capacities for offering better services towards the citizens.
△ Less
Submitted 28 July, 2024;
originally announced July 2024.
-
Exploring the Potentials and Challenges of Deep Generative Models in Product Design Conception
Authors:
Phillip Mueller,
Lars Mikelsons
Abstract:
The synthesis of product design concepts stands at the crux of early-phase development processes for technical products, traditionally posing an intricate interdisciplinary challenge. The application of deep learning methods, particularly Deep Generative Models (DGMs), holds the promise of automating and streamlining manual iterations and therefore introducing heightened levels of innovation and e…
▽ More
The synthesis of product design concepts stands at the crux of early-phase development processes for technical products, traditionally posing an intricate interdisciplinary challenge. The application of deep learning methods, particularly Deep Generative Models (DGMs), holds the promise of automating and streamlining manual iterations and therefore introducing heightened levels of innovation and efficiency. However, DGMs have yet to be widely adopted into the synthesis of product design concepts. This paper aims to explore the reasons behind this limited application and derive the requirements for successful integration of these technologies. We systematically analyze DGM-families (VAE, GAN, Diffusion, Transformer, Radiance Field), assessing their strengths, weaknesses, and general applicability for product design conception. Our objective is to provide insights that simplify the decision-making process for engineers, helping them determine which method might be most effective for their specific challenges. Recognizing the rapid evolution of this field, we hope that our analysis contributes to a fundamental understanding and guides practitioners towards the most promising approaches. This work seeks not only to illuminate current challenges but also to propose potential solutions, thereby offering a clear roadmap for leveraging DGMs in the realm of product design conception.
△ Less
Submitted 21 May, 2025; v1 submitted 15 July, 2024;
originally announced July 2024.
-
InsertDiffusion: Identity Preserving Visualization of Objects through a Training-Free Diffusion Architecture
Authors:
Phillip Mueller,
Jannik Wiese,
Ioan Craciun,
Lars Mikelsons
Abstract:
Recent advancements in image synthesis are fueled by the advent of large-scale diffusion models. Yet, integrating realistic object visualizations seamlessly into new or existing backgrounds without extensive training remains a challenge. This paper introduces InsertDiffusion, a novel, training-free diffusion architecture that efficiently embeds objects into images while preserving their structural…
▽ More
Recent advancements in image synthesis are fueled by the advent of large-scale diffusion models. Yet, integrating realistic object visualizations seamlessly into new or existing backgrounds without extensive training remains a challenge. This paper introduces InsertDiffusion, a novel, training-free diffusion architecture that efficiently embeds objects into images while preserving their structural and identity characteristics. Our approach utilizes off-the-shelf generative models and eliminates the need for fine-tuning, making it ideal for rapid and adaptable visualizations in product design and marketing. We demonstrate superior performance over existing methods in terms of image realism and alignment with input conditions. By decomposing the generation task into independent steps, InsertDiffusion offers a scalable solution that extends the capabilities of diffusion models for practical applications, achieving high-quality visualizations that maintain the authenticity of the original objects.
△ Less
Submitted 15 July, 2024;
originally announced July 2024.
-
Specialized curricula for training vision-language models in retinal image analysis
Authors:
Robbie Holland,
Thomas R. P. Taylor,
Christopher Holmes,
Sophie Riedl,
Julia Mai,
Maria Patsiamanidi,
Dimitra Mitsopoulou,
Paul Hager,
Philip Müller,
Hendrik P. N. Scholl,
Hrvoje Bogunović,
Ursula Schmidt-Erfurth,
Daniel Rueckert,
Sobha Sivaprasad,
Andrew J. Lotery,
Martin J. Menten
Abstract:
Clinicians spend a significant amount of time reviewing medical images and transcribing their findings regarding patient diagnosis, referral and treatment in text form. Vision-language models (VLMs), which automatically interpret images and summarize their findings as text, have enormous potential to alleviate clinical workloads and increase patient access to high-quality medical care. While found…
▽ More
Clinicians spend a significant amount of time reviewing medical images and transcribing their findings regarding patient diagnosis, referral and treatment in text form. Vision-language models (VLMs), which automatically interpret images and summarize their findings as text, have enormous potential to alleviate clinical workloads and increase patient access to high-quality medical care. While foundational models have stirred considerable interest in the medical community, it is unclear whether their general capabilities translate to real-world clinical utility. In this work, we demonstrate that OpenAI's ChatGPT-4o model, in addition to two foundation VLMs designed for medical use, markedly underperform compared to practicing ophthalmologists on specialist tasks crucial to the care of patients with age-related macular degeneration (AMD). To address this, we initially identified the essential capabilities required for image-based clinical decision-making, and then developed a curriculum to selectively train VLMs in these skills. The resulting model, RetinaVLM, can be instructed to write reports that significantly outperform those written by leading foundation medical VLMs and ChatGPT-4o in disease staging (F1 score of 0.63 vs. 0.33) and patient referral (0.67 vs. 0.50), and approaches the diagnostic performance of junior ophthalmologists (who achieve 0.77 and 0.78 on the respective tasks). Furthermore, in a single-blind reader study two senior ophthalmologists with up to 32 years of experience found RetinaVLM's reports were found to be substantially more accurate than those by ChatGPT-4o (64.3% vs. 14.3%). These results reinforce that our curriculum-based approach provides a blueprint towards specializing foundation medical VLMs for real-world clinical tasks.
△ Less
Submitted 24 February, 2025; v1 submitted 11 July, 2024;
originally announced July 2024.
-
Evaluation of Language Models in the Medical Context Under Resource-Constrained Settings
Authors:
Andrea Posada,
Daniel Rueckert,
Felix Meissen,
Philip Müller
Abstract:
Since the Transformer architecture emerged, language model development has grown, driven by their promising potential. Releasing these models into production requires properly understanding their behavior, particularly in sensitive domains like medicine. Despite this need, the medical literature still lacks practical assessment of pre-trained language models, which are especially valuable in setti…
▽ More
Since the Transformer architecture emerged, language model development has grown, driven by their promising potential. Releasing these models into production requires properly understanding their behavior, particularly in sensitive domains like medicine. Despite this need, the medical literature still lacks practical assessment of pre-trained language models, which are especially valuable in settings where only consumer-grade computational resources are available. To address this gap, we have conducted a comprehensive survey of language models in the medical field and evaluated a subset of these for medical text classification and conditional text generation. The subset includes 53 models with 110 million to 13 billion parameters, spanning the Transformer-based model families and knowledge domains. Different approaches are employed for text classification, including zero-shot learning, enabling tuning without the need to train the model. These approaches are helpful in our target settings, where many users of language models find themselves. The results reveal remarkable performance across the tasks and datasets evaluated, underscoring the potential of certain models to contain medical knowledge, even without domain specialization. This study thus advocates for further exploration of model applications in medical contexts, particularly in computational resource-constrained settings, to benefit a wide range of users. The code is available on https://github.com/anpoc/Language-models-in-medicine.
△ Less
Submitted 23 October, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.
-
Resource-efficient Medical Image Analysis with Self-adapting Forward-Forward Networks
Authors:
Johanna P. Müller,
Bernhard Kainz
Abstract:
We introduce a fast Self-adapting Forward-Forward Network (SaFF-Net) for medical imaging analysis, mitigating power consumption and resource limitations, which currently primarily stem from the prevalent reliance on back-propagation for model training and fine-tuning. Building upon the recently proposed Forward-Forward Algorithm (FFA), we introduce the Convolutional Forward-Forward Algorithm (CFFA…
▽ More
We introduce a fast Self-adapting Forward-Forward Network (SaFF-Net) for medical imaging analysis, mitigating power consumption and resource limitations, which currently primarily stem from the prevalent reliance on back-propagation for model training and fine-tuning. Building upon the recently proposed Forward-Forward Algorithm (FFA), we introduce the Convolutional Forward-Forward Algorithm (CFFA), a parameter-efficient reformulation that is suitable for advanced image analysis and overcomes the speed and generalisation constraints of the original FFA. To address hyper-parameter sensitivity of FFAs we are also introducing a self-adapting framework SaFF-Net fine-tuning parameters during warmup and training in parallel. Our approach enables more effective model training and eliminates the previously essential requirement for an arbitrarily chosen Goodness function in FFA. We evaluate our approach on several benchmarking datasets in comparison with standard Back-Propagation (BP) neural networks showing that FFA-based networks with notably fewer parameters and function evaluations can compete with standard models, especially, in one-shot scenarios and large batch sizes. The code will be available at the time of the conference.
△ Less
Submitted 17 July, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Diffusion-based Generative Image Outpainting for Recovery of FOV-Truncated CT Images
Authors:
Michelle Espranita Liman,
Daniel Rueckert,
Florian J. Fintelmann,
Philip Müller
Abstract:
Field-of-view (FOV) recovery of truncated chest CT scans is crucial for accurate body composition analysis, which involves quantifying skeletal muscle and subcutaneous adipose tissue (SAT) on CT slices. This, in turn, enables disease prognostication. Here, we present a method for recovering truncated CT slices using generative image outpainting. We train a diffusion model and apply it to truncated…
▽ More
Field-of-view (FOV) recovery of truncated chest CT scans is crucial for accurate body composition analysis, which involves quantifying skeletal muscle and subcutaneous adipose tissue (SAT) on CT slices. This, in turn, enables disease prognostication. Here, we present a method for recovering truncated CT slices using generative image outpainting. We train a diffusion model and apply it to truncated CT slices generated by simulating a small FOV. Our model reliably recovers the truncated anatomy and outperforms the previous state-of-the-art despite being trained on 87% less data.
△ Less
Submitted 26 September, 2024; v1 submitted 7 June, 2024;
originally announced June 2024.
-
ADESSE: Advice Explanations in Complex Repeated Decision-Making Environments
Authors:
Sören Schleibaum,
Lu Feng,
Sarit Kraus,
Jörg P. Müller
Abstract:
In the evolving landscape of human-centered AI, fostering a synergistic relationship between humans and AI agents in decision-making processes stands as a paramount challenge. This work considers a problem setup where an intelligent agent comprising a neural network-based prediction component and a deep reinforcement learning component provides advice to a human decision-maker in complex repeated…
▽ More
In the evolving landscape of human-centered AI, fostering a synergistic relationship between humans and AI agents in decision-making processes stands as a paramount challenge. This work considers a problem setup where an intelligent agent comprising a neural network-based prediction component and a deep reinforcement learning component provides advice to a human decision-maker in complex repeated decision-making environments. Whether the human decision-maker would follow the agent's advice depends on their beliefs and trust in the agent and on their understanding of the advice itself. To this end, we developed an approach named ADESSE to generate explanations about the adviser agent to improve human trust and decision-making. Computational experiments on a range of environments with varying model sizes demonstrate the applicability and scalability of ADESSE. Furthermore, an interactive game-based user study shows that participants were significantly more satisfied, achieved a higher reward in the game, and took less time to select an action when presented with explanations generated by ADESSE. These findings illuminate the critical role of tailored, human-centered explanations in AI-assisted decision-making.
△ Less
Submitted 10 September, 2024; v1 submitted 31 May, 2024;
originally announced May 2024.
-
Verification Algorithms for Automated Separation Logic Verifiers
Authors:
Marco Eilers,
Malte Schwerhoff,
Peter Müller
Abstract:
Most automated program verifiers for separation logic use either symbolic execution or verification condition generation to extract proof obligations, which are then handed over to an SMT solver. Existing verification algorithms are designed to be sound, but differ in performance and completeness. These characteristics may also depend on the programs and properties to be verified. Consequently, de…
▽ More
Most automated program verifiers for separation logic use either symbolic execution or verification condition generation to extract proof obligations, which are then handed over to an SMT solver. Existing verification algorithms are designed to be sound, but differ in performance and completeness. These characteristics may also depend on the programs and properties to be verified. Consequently, developers and users of program verifiers have to select a verification algorithm carefully for their application domain. Taking an informed decision requires a systematic comparison of the performance and completeness characteristics of the verification algorithms used by modern separation logic verifiers, but such a comparison does not exist.
This paper describes five verification algorithms for separation logic, three that are used in existing tools and two novel algorithms that combine characteristics of existing symbolic execution and verification condition generation algorithms. A detailed evaluation of implementations of these five algorithms in the Viper infrastructure assesses their performance and completeness for different classes of input programs. Based on the experimental results, we identify candidate portfolios of algorithms that maximize completeness and performance.
△ Less
Submitted 27 May, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
Reasoning about Interior Mutability in Rust using Library-Defined Capabilities
Authors:
Federico Poli,
Xavier Denis,
Peter Müller,
Alexander J. Summers
Abstract:
Existing automated verification techniques for safe Rust code rely on the strong type-system properties to reason about programs, especially to deduce which memory locations do not change (i.e., are framed) across function calls. However, these type guarantees do not hold in the presence of interior mutability (e.g., when interacting with any concurrent data structure). As a consequence, existing…
▽ More
Existing automated verification techniques for safe Rust code rely on the strong type-system properties to reason about programs, especially to deduce which memory locations do not change (i.e., are framed) across function calls. However, these type guarantees do not hold in the presence of interior mutability (e.g., when interacting with any concurrent data structure). As a consequence, existing verification techniques for safe code such as Prusti and Creusot are either unsound or fundamentally incomplete if applied to this setting. In this work, we present the first technique capable of automatically verifying safe clients of existing interiorly mutable types. At the core of our approach, we identify a novel notion of implicit capabilities: library-defined properties that cannot be expressed using Rust's types. We propose new annotations to specify these capabilities and a first-order logic encoding suitable for program verification. We have implemented our technique in a verifier called Mendel and used it to prove absence of panics in Rust programs that make use of popular standard-library types with interior mutability, including Rc, Arc, Cell, RefCell, AtomicI32, Mutex and RwLock. Our evaluation shows that these library annotations are useful for verifying usages of real-world libraries, and powerful enough to require zero client-side annotations in many of the verified programs.
△ Less
Submitted 14 May, 2024;
originally announced May 2024.
-
Protocols to Code: Formal Verification of a Next-Generation Internet Router
Authors:
João C. Pereira,
Tobias Klenze,
Sofia Giampietro,
Markus Limbeck,
Dionysios Spiliopoulos,
Felix A. Wolf,
Marco Eilers,
Christoph Sprenger,
David Basin,
Peter Müller,
Adrian Perrig
Abstract:
We present the first formally-verified Internet router, which is part of the SCION Internet architecture. SCION routers run a cryptographic protocol for secure packet forwarding in an adversarial environment. We verify both the protocol's network-wide security properties and low-level properties of its implementation. More precisely, we develop a series of protocol models by refinement in Isabelle…
▽ More
We present the first formally-verified Internet router, which is part of the SCION Internet architecture. SCION routers run a cryptographic protocol for secure packet forwarding in an adversarial environment. We verify both the protocol's network-wide security properties and low-level properties of its implementation. More precisely, we develop a series of protocol models by refinement in Isabelle/HOL and we use an automated program verifier to prove that the router's Go code satisfies memory safety, crash freedom, freedom from data races, and adheres to the protocol model. Both verification efforts are soundly linked together. Our work demonstrates the feasibility of coherently verifying a critical network component from high-level protocol models down to performance-optimized production code, developed by an independent team. In the process, we uncovered critical bugs in both the protocol and its implementation, which were confirmed by the code developers, and we strengthened the protocol's security properties. This paper explains our approach, summarizes the main results, and distills lessons for the design and implementation of verifiable systems, for the handling of continuous changes, and for the verification techniques and tools employed.
△ Less
Submitted 9 May, 2024;
originally announced May 2024.
-
ChEX: Interactive Localization and Region Description in Chest X-rays
Authors:
Philip Müller,
Georgios Kaissis,
Daniel Rueckert
Abstract:
Report generation models offer fine-grained textual interpretations of medical images like chest X-rays, yet they often lack interactivity (i.e. the ability to steer the generation process through user queries) and localized interpretability (i.e. visually grounding their predictions), which we deem essential for future adoption in clinical practice. While there have been efforts to tackle these i…
▽ More
Report generation models offer fine-grained textual interpretations of medical images like chest X-rays, yet they often lack interactivity (i.e. the ability to steer the generation process through user queries) and localized interpretability (i.e. visually grounding their predictions), which we deem essential for future adoption in clinical practice. While there have been efforts to tackle these issues, they are either limited in their interactivity by not supporting textual queries or fail to also offer localized interpretability. Therefore, we propose a novel multitask architecture and training paradigm integrating textual prompts and bounding boxes for diverse aspects like anatomical regions and pathologies. We call this approach the Chest X-Ray Explainer (ChEX). Evaluations across a heterogeneous set of 9 chest X-ray tasks, including localized image interpretation and report generation, showcase its competitiveness with SOTA models while additional analysis demonstrates ChEX's interactive capabilities. Code: https://github.com/philip-mueller/chex
△ Less
Submitted 15 July, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
Language Models Meet Anomaly Detection for Better Interpretability and Generalizability
Authors:
Jun Li,
Su Hwan Kim,
Philip Müller,
Lina Felsner,
Daniel Rueckert,
Benedikt Wiestler,
Julia A. Schnabel,
Cosmin I. Bercea
Abstract:
This research explores the integration of language models and unsupervised anomaly detection in medical imaging, addressing two key questions: (1) Can language models enhance the interpretability of anomaly detection maps? and (2) Can anomaly maps improve the generalizability of language models in open-set anomaly detection tasks? To investigate these questions, we introduce a new dataset for mult…
▽ More
This research explores the integration of language models and unsupervised anomaly detection in medical imaging, addressing two key questions: (1) Can language models enhance the interpretability of anomaly detection maps? and (2) Can anomaly maps improve the generalizability of language models in open-set anomaly detection tasks? To investigate these questions, we introduce a new dataset for multi-image visual question-answering on brain magnetic resonance images encompassing multiple conditions. We propose KQ-Former (Knowledge Querying Transformer), which is designed to optimally align visual and textual information in limited-sample contexts. Our model achieves a 60.81% accuracy on closed questions, covering disease classification and severity across 15 different classes. For open questions, KQ-Former demonstrates a 70% improvement over the baseline with a BLEU-4 score of 0.41, and achieves the highest entailment ratios (up to 71.9%) and lowest contradiction ratios (down to 10.0%) among various natural language inference models. Furthermore, integrating anomaly maps results in an 18% accuracy increase in detecting open-set anomalies, thereby enhancing the language model's generalizability to previously unseen medical conditions. The code and dataset are available at https://github.com/compai-lab/miccai-2024-junli?tab=readme-ov-file
△ Less
Submitted 23 July, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
Towards Trustworthy Automated Program Verifiers: Formally Validating Translations into an Intermediate Verification Language (extended version)
Authors:
Gaurav Parthasarathy,
Thibault Dardinier,
Benjamin Bonneau,
Peter Müller,
Alexander J. Summers
Abstract:
Automated program verifiers are typically implemented using an intermediate verification language (IVL), such as Boogie or Why3. A verifier front-end translates the input program and specification into an IVL program, while the back-end generates proof obligations for the IVL program and employs an SMT solver to discharge them. Soundness of such verifiers therefore requires that the front-end tran…
▽ More
Automated program verifiers are typically implemented using an intermediate verification language (IVL), such as Boogie or Why3. A verifier front-end translates the input program and specification into an IVL program, while the back-end generates proof obligations for the IVL program and employs an SMT solver to discharge them. Soundness of such verifiers therefore requires that the front-end translation faithfully captures the semantics of the input program and specification in the IVL program, and that the back-end reports success only if the IVL program is actually correct. For a verification tool to be trustworthy, these soundness conditions must be satisfied by its actual implementation, not just the program logic it uses.
In this paper, we present a novel validation methodology that, given a formal semantics for the input language and IVL, provides formal soundness guarantees for front-end implementations. For each run of the verifier, we automatically generate a proof in Isabelle showing that the correctness of the produced IVL program implies the correctness of the input program. This proof can be checked independently from the verifier, in Isabelle, and can be combined with existing work on validating back-ends to obtain an end-to-end soundness result. Our methodology based on forward simulation employs several modularisation strategies to handle the large semantic gap between the input language and the IVL, as well as the intricacies of practical, optimised translations. We present our methodology for the widely-used Viper and Boogie languages. Our evaluation shows that it is effective in validating the translations performed by the existing Viper implementation.
△ Less
Submitted 9 May, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.
-
M3TCM: Multi-modal Multi-task Context Model for Utterance Classification in Motivational Interviews
Authors:
Sayed Muddashir Hossain,
Jan Alexandersson,
Philipp Müller
Abstract:
Accurate utterance classification in motivational interviews is crucial to automatically understand the quality and dynamics of client-therapist interaction, and it can serve as a key input for systems mediating such interactions. Motivational interviews exhibit three important characteristics. First, there are two distinct roles, namely client and therapist. Second, they are often highly emotiona…
▽ More
Accurate utterance classification in motivational interviews is crucial to automatically understand the quality and dynamics of client-therapist interaction, and it can serve as a key input for systems mediating such interactions. Motivational interviews exhibit three important characteristics. First, there are two distinct roles, namely client and therapist. Second, they are often highly emotionally charged, which can be expressed both in text and in prosody. Finally, context is of central importance to classify any given utterance. Previous works did not adequately incorporate all of these characteristics into utterance classification approaches for mental health dialogues. In contrast, we present M3TCM, a Multi-modal, Multi-task Context Model for utterance classification. Our approach for the first time employs multi-task learning to effectively model both joint and individual components of therapist and client behaviour. Furthermore, M3TCM integrates information from the text and speech modality as well as the conversation context. With our novel approach, we outperform the state of the art for utterance classification on the recently introduced AnnoMI dataset with a relative improvement of 20% for the client- and by 15% for therapist utterance classification. In extensive ablation studies, we quantify the improvement resulting from each contribution.
△ Less
Submitted 4 April, 2024;
originally announced April 2024.
-
Algorithmic Details behind the Predator Shape Analyser
Authors:
Kamil Dudka,
Petr Muller,
Petr Peringer,
Veronika Šoková,
Tomáš Vojnar
Abstract:
This chapter, which is an extended and revised version of the conference paper 'Predator: Byte-Precise Verification of Low-Level List Manipulation', concentrates on a detailed description of the algorithms behind the Predator shape analyser based on abstract interpretation and symbolic memory graphs. Predator is particularly suited for formal analysis and verification of sequential non-recursive C…
▽ More
This chapter, which is an extended and revised version of the conference paper 'Predator: Byte-Precise Verification of Low-Level List Manipulation', concentrates on a detailed description of the algorithms behind the Predator shape analyser based on abstract interpretation and symbolic memory graphs. Predator is particularly suited for formal analysis and verification of sequential non-recursive C code that uses low-level pointer operations to manipulate various kinds of linked lists of unbounded size as well as various other kinds of pointer structures of bounded size. The tool supports practically relevant forms of pointer arithmetic, block operations, address alignment, or memory reinterpretation. We present the overall architecture of the tool, along with selected implementation details of the tool as well as its extension into so-called Predator Hunting Party, which utilises multiple concurrently-running Predator analysers with various restrictions on their behaviour. Results of experiments with Predator within the SV-COMP competition as well as on our own benchmarks are provided.
△ Less
Submitted 27 March, 2024;
originally announced March 2024.
-
Weakly Supervised Object Detection in Chest X-Rays with Differentiable ROI Proposal Networks and Soft ROI Pooling
Authors:
Philip Müller,
Felix Meissen,
Georgios Kaissis,
Daniel Rueckert
Abstract:
Weakly supervised object detection (WSup-OD) increases the usefulness and interpretability of image classification algorithms without requiring additional supervision. The successes of multiple instance learning in this task for natural images, however, do not translate well to medical images due to the very different characteristics of their objects (i.e. pathologies). In this work, we propose We…
▽ More
Weakly supervised object detection (WSup-OD) increases the usefulness and interpretability of image classification algorithms without requiring additional supervision. The successes of multiple instance learning in this task for natural images, however, do not translate well to medical images due to the very different characteristics of their objects (i.e. pathologies). In this work, we propose Weakly Supervised ROI Proposal Networks (WSRPN), a new method for generating bounding box proposals on the fly using a specialized region of interest-attention (ROI-attention) module. WSRPN integrates well with classic backbone-head classification algorithms and is end-to-end trainable with only image-label supervision. We experimentally demonstrate that our new method outperforms existing methods in the challenging task of disease localization in chest X-ray images. Code: https://github.com/philip-mueller/wsrpn
△ Less
Submitted 19 February, 2024;
originally announced February 2024.
-
ReNeLiB: Real-time Neural Listening Behavior Generation for Socially Interactive Agents
Authors:
Daksitha Withanage Don,
Philipp Müller,
Fabrizio Nunnari,
Elisabeth André,
Patrick Gebhard
Abstract:
Flexible and natural nonverbal reactions to human behavior remain a challenge for socially interactive agents (SIAs) that are predominantly animated using hand-crafted rules. While recently proposed machine learning based approaches to conversational behavior generation are a promising way to address this challenge, they have not yet been employed in SIAs. The primary reason for this is the lack o…
▽ More
Flexible and natural nonverbal reactions to human behavior remain a challenge for socially interactive agents (SIAs) that are predominantly animated using hand-crafted rules. While recently proposed machine learning based approaches to conversational behavior generation are a promising way to address this challenge, they have not yet been employed in SIAs. The primary reason for this is the lack of a software toolkit integrating such approaches with SIA frameworks that conforms to the challenging real-time requirements of human-agent interaction scenarios. In our work, we for the first time present such a toolkit consisting of three main components: (1) real-time feature extraction capturing multi-modal social cues from the user; (2) behavior generation based on a recent state-of-the-art neural network approach; (3) visualization of the generated behavior supporting both FLAME-based and Apple ARKit-based interactive agents. We comprehensively evaluate the real-time performance of the whole framework and its components. In addition, we introduce pre-trained behavioral generation models derived from psychotherapy sessions for domain-specific listening behaviors. Our software toolkit, pivotal for deploying and assessing SIAs' listening behavior in real-time, is publicly available. Resources, including code, behavioural multi-modal features extracted from therapeutic interactions, are hosted at https://daksitha.github.io/ReNeLib
△ Less
Submitted 12 February, 2024;
originally announced February 2024.
-
Classification of Volatile Organic Compounds by Differential Mobility Spectrometry Based on Continuity of Alpha Curves
Authors:
Anton Rauhameri,
Angelo Robiños,
Osmo Anttalainen,
Timo Salpavaara,
Jussi Rantala,
Veikko Surakka,
Pasi Kallio,
Antti Vehkaoja,
Philipp Müller
Abstract:
Background: Classification of volatile organic compounds (VOCs) is of interest in many fields. Examples include but are not limited to medicine, detection of explosives, and food quality control. Measurements collected with electronic noses can be used for classification and analysis of VOCs. One type of electronic noses that has seen considerable development in recent years is Differential Mobili…
▽ More
Background: Classification of volatile organic compounds (VOCs) is of interest in many fields. Examples include but are not limited to medicine, detection of explosives, and food quality control. Measurements collected with electronic noses can be used for classification and analysis of VOCs. One type of electronic noses that has seen considerable development in recent years is Differential Mobility Spectrometry (DMS). DMS yields measurements that are visualized as dispersion plots that contain traces, also known as alpha curves. Current methods used for analyzing DMS dispersion plots do not usually utilize the information stored in the continuity of these traces, which suggests that alternative approaches should be investigated.
Results: In this work, for the first time, dispersion plots were interpreted as a series of measurements evolving sequentially. Thus, it was hypothesized that time-series classification algorithms can be effective for classification and analysis of dispersion plots. An extensive dataset of 900 dispersion plots for five chemicals measured at five flow rates and two concentrations was collected. The data was used to analyze the classification performance of six algorithms. According to our hypothesis, the highest classification accuracy of 88\% was achieved by a Long-Short Term Memory neural network, which supports our hypothesis.
Significance: A new concept for approaching classification tasks of dispersion plots is presented and compared with other well-known classification algorithms. This creates a new angle of view for analysis and classification of the dispersion plots. In addition, a new dataset of dispersion plots is openly shared to public.
△ Less
Submitted 13 March, 2024; v1 submitted 13 January, 2024;
originally announced January 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1326 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…
▽ More
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
△ Less
Submitted 9 May, 2025; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Stochastic Vision Transformers with Wasserstein Distance-Aware Attention
Authors:
Franciskus Xaverius Erick,
Mina Rezaei,
Johanna Paula Müller,
Bernhard Kainz
Abstract:
Self-supervised learning is one of the most promising approaches to acquiring knowledge from limited labeled data. Despite the substantial advancements made in recent years, self-supervised models have posed a challenge to practitioners, as they do not readily provide insight into the model's confidence and uncertainty. Tackling this issue is no simple feat, primarily due to the complexity involve…
▽ More
Self-supervised learning is one of the most promising approaches to acquiring knowledge from limited labeled data. Despite the substantial advancements made in recent years, self-supervised models have posed a challenge to practitioners, as they do not readily provide insight into the model's confidence and uncertainty. Tackling this issue is no simple feat, primarily due to the complexity involved in implementing techniques that can make use of the latent representations learned during pre-training without relying on explicit labels. Motivated by this, we introduce a new stochastic vision transformer that integrates uncertainty and distance awareness into self-supervised learning (SSL) pipelines. Instead of the conventional deterministic vector embedding, our novel stochastic vision transformer encodes image patches into elliptical Gaussian distributional embeddings. Notably, the attention matrices of these stochastic representational embeddings are computed using Wasserstein distance-based attention, effectively capitalizing on the distributional nature of these embeddings. Additionally, we propose a regularization term based on Wasserstein distance for both pre-training and fine-tuning processes, thereby incorporating distance awareness into latent representations. We perform extensive experiments across different tasks such as in-distribution generalization, out-of-distribution detection, dataset corruption, semi-supervised settings, and transfer learning to other datasets and tasks. Our proposed method achieves superior accuracy and calibration, surpassing the self-supervised baseline in a wide range of experiments on a variety of datasets.
△ Less
Submitted 30 November, 2023;
originally announced November 2023.
-
Refinement Proofs in Rust Using Ghost Locks
Authors:
Aurel Bílý,
João C. Pereira,
Jan Schär,
Peter Müller
Abstract:
Refinement transforms an abstract system model into a concrete, executable program, such that properties established for the abstract model carry over to the concrete implementation. Refinement has been used successfully in the development of substantial verified systems. Nevertheless, existing refinement techniques have limitations that impede their practical usefulness. Some techniques generate…
▽ More
Refinement transforms an abstract system model into a concrete, executable program, such that properties established for the abstract model carry over to the concrete implementation. Refinement has been used successfully in the development of substantial verified systems. Nevertheless, existing refinement techniques have limitations that impede their practical usefulness. Some techniques generate executable code automatically, which generally leads to implementations with sub-optimal performance. Others employ bottom-up program verification to reason about efficient implementations, but impose strict requirements on the structure of the code, the structure of the refinement proofs, as well as the employed verification logic and tools.
In this paper, we present a novel refinement technique that removes these limitations. It supports a wide range of program structures, data representations, and proof structures. Our approach supports reasoning about both safety and liveness properties. We implement our approach in a state-of-the-art verifier for the Rust language, which itself offers a strong foundation for memory safety. We demonstrate the practicality of our approach on a number of substantial case studies.
△ Less
Submitted 24 November, 2023;
originally announced November 2023.
-
Whole Slide Multiple Instance Learning for Predicting Axillary Lymph Node Metastasis
Authors:
Glejdis Shkëmbi,
Johanna P. Müller,
Zhe Li,
Katharina Breininger,
Peter Schüffler,
Bernhard Kainz
Abstract:
Breast cancer is a major concern for women's health globally, with axillary lymph node (ALN) metastasis identification being critical for prognosis evaluation and treatment guidance. This paper presents a deep learning (DL) classification pipeline for quantifying clinical information from digital core-needle biopsy (CNB) images, with one step less than existing methods. A publicly available datase…
▽ More
Breast cancer is a major concern for women's health globally, with axillary lymph node (ALN) metastasis identification being critical for prognosis evaluation and treatment guidance. This paper presents a deep learning (DL) classification pipeline for quantifying clinical information from digital core-needle biopsy (CNB) images, with one step less than existing methods. A publicly available dataset of 1058 patients was used to evaluate the performance of different baseline state-of-the-art (SOTA) DL models in classifying ALN metastatic status based on CNB images. An extensive ablation study of various data augmentation techniques was also conducted. Finally, the manual tumor segmentation and annotation step performed by the pathologists was assessed.
△ Less
Submitted 6 October, 2023;
originally announced October 2023.
-
Anatomy-Driven Pathology Detection on Chest X-rays
Authors:
Philip Müller,
Felix Meissen,
Johannes Brandt,
Georgios Kaissis,
Daniel Rueckert
Abstract:
Pathology detection and delineation enables the automatic interpretation of medical scans such as chest X-rays while providing a high level of explainability to support radiologists in making informed decisions. However, annotating pathology bounding boxes is a time-consuming task such that large public datasets for this purpose are scarce. Current approaches thus use weakly supervised object dete…
▽ More
Pathology detection and delineation enables the automatic interpretation of medical scans such as chest X-rays while providing a high level of explainability to support radiologists in making informed decisions. However, annotating pathology bounding boxes is a time-consuming task such that large public datasets for this purpose are scarce. Current approaches thus use weakly supervised object detection to learn the (rough) localization of pathologies from image-level annotations, which is however limited in performance due to the lack of bounding box supervision. We therefore propose anatomy-driven pathology detection (ADPD), which uses easy-to-annotate bounding boxes of anatomical regions as proxies for pathologies. We study two training approaches: supervised training using anatomy-level pathology labels and multiple instance learning (MIL) with image-level pathology labels. Our results show that our anatomy-level training approach outperforms weakly supervised methods and fully supervised detection with limited training samples, and our MIL approach is competitive with both baseline approaches, therefore demonstrating the potential of our approach.
△ Less
Submitted 5 September, 2023;
originally announced September 2023.
-
NeMig -- A Bilingual News Collection and Knowledge Graph about Migration
Authors:
Andreea Iana,
Mehwish Alam,
Alexander Grote,
Nevena Nikolajevic,
Katharina Ludwig,
Philipp Müller,
Christof Weinhardt,
Heiko Paulheim
Abstract:
News recommendation plays a critical role in shaping the public's worldviews through the way in which it filters and disseminates information about different topics. Given the crucial impact that media plays in opinion formation, especially for sensitive topics, understanding the effects of personalized recommendation beyond accuracy has become essential in today's digital society. In this work, w…
▽ More
News recommendation plays a critical role in shaping the public's worldviews through the way in which it filters and disseminates information about different topics. Given the crucial impact that media plays in opinion formation, especially for sensitive topics, understanding the effects of personalized recommendation beyond accuracy has become essential in today's digital society. In this work, we present NeMig, a bilingual news collection on the topic of migration, and corresponding rich user data. In comparison to existing news recommendation datasets, which comprise a large variety of monolingual news, NeMig covers articles on a single controversial topic, published in both Germany and the US. We annotate the sentiment polarization of the articles and the political leanings of the media outlets, in addition to extracting subtopics and named entities disambiguated through Wikidata. These features can be used to analyze the effects of algorithmic news curation beyond accuracy-based performance, such as recommender biases and the creation of filter bubbles. We construct domain-specific knowledge graphs from the news text and metadata, thus encoding knowledge-level connections between articles. Importantly, while existing datasets include only click behavior, we collect user socio-demographic and political information in addition to explicit click feedback. We demonstrate the utility of NeMig through experiments on the tasks of news recommenders benchmarking, analysis of biases in recommenders, and news trends analysis. NeMig aims to provide a useful resource for the news recommendation community and to foster interdisciplinary research into the multidimensional effects of algorithmic news curation.
△ Less
Submitted 1 September, 2023;
originally announced September 2023.
-
Classification robustness to common optical aberrations
Authors:
Patrick Müller,
Alexander Braun,
Margret Keuper
Abstract:
Computer vision using deep neural networks (DNNs) has brought about seminal changes in people's lives. Applications range from automotive, face recognition in the security industry, to industrial process monitoring. In some cases, DNNs infer even in safety-critical situations. Therefore, for practical applications, DNNs have to behave in a robust way to disturbances such as noise, pixelation, or b…
▽ More
Computer vision using deep neural networks (DNNs) has brought about seminal changes in people's lives. Applications range from automotive, face recognition in the security industry, to industrial process monitoring. In some cases, DNNs infer even in safety-critical situations. Therefore, for practical applications, DNNs have to behave in a robust way to disturbances such as noise, pixelation, or blur. Blur directly impacts the performance of DNNs, which are often approximated as a disk-shaped kernel to model defocus. However, optics suggests that there are different kernel shapes depending on wavelength and location caused by optical aberrations. In practice, as the optical quality of a lens decreases, such aberrations increase. This paper proposes OpticsBench, a benchmark for investigating robustness to realistic, practically relevant optical blur effects. Each corruption represents an optical aberration (coma, astigmatism, spherical, trefoil) derived from Zernike Polynomials. Experiments on ImageNet show that for a variety of different pre-trained DNNs, the performance varies strongly compared to disk-shaped kernels, indicating the necessity of considering realistic image degradations. In addition, we show on ImageNet-100 with OpticsAugment that robustness can be increased by using optical kernels as data augmentation. Compared to a conventionally trained ResNeXt50, training with OpticsAugment achieves an average performance gain of 21.7% points on OpticsBench and 6.8% points on 2D common corruptions.
△ Less
Submitted 29 August, 2023;
originally announced August 2023.
-
MultiMediate'23: Engagement Estimation and Bodily Behaviour Recognition in Social Interactions
Authors:
Philipp Müller,
Michal Balazia,
Tobias Baur,
Michael Dietz,
Alexander Heimerl,
Dominik Schiller,
Mohammed Guermal,
Dominike Thomas,
François Brémond,
Jan Alexandersson,
Elisabeth André,
Andreas Bulling
Abstract:
Automatic analysis of human behaviour is a fundamental prerequisite for the creation of machines that can effectively interact with- and support humans in social interactions. In MultiMediate'23, we address two key human social behaviour analysis tasks for the first time in a controlled challenge: engagement estimation and bodily behaviour recognition in social interactions. This paper describes t…
▽ More
Automatic analysis of human behaviour is a fundamental prerequisite for the creation of machines that can effectively interact with- and support humans in social interactions. In MultiMediate'23, we address two key human social behaviour analysis tasks for the first time in a controlled challenge: engagement estimation and bodily behaviour recognition in social interactions. This paper describes the MultiMediate'23 challenge and presents novel sets of annotations for both tasks. For engagement estimation we collected novel annotations on the NOvice eXpert Interaction (NOXI) database. For bodily behaviour recognition, we annotated test recordings of the MPIIGroupInteraction corpus with the BBSI annotation scheme. In addition, we present baseline results for both challenge tasks.
△ Less
Submitted 16 August, 2023;
originally announced August 2023.
-
Unlocking the diagnostic potential of electrocardiograms through information transfer from cardiac magnetic resonance imaging
Authors:
Özgün Turgut,
Philip Müller,
Paul Hager,
Suprosanna Shit,
Sophie Starck,
Martin J. Menten,
Eimo Martens,
Daniel Rueckert
Abstract:
Cardiovascular diseases (CVD) can be diagnosed using various diagnostic modalities. The electrocardiogram (ECG) is a cost-effective and widely available diagnostic aid that provides functional information of the heart. However, its ability to classify and spatially localise CVD is limited. In contrast, cardiac magnetic resonance (CMR) imaging provides detailed structural information of the heart a…
▽ More
Cardiovascular diseases (CVD) can be diagnosed using various diagnostic modalities. The electrocardiogram (ECG) is a cost-effective and widely available diagnostic aid that provides functional information of the heart. However, its ability to classify and spatially localise CVD is limited. In contrast, cardiac magnetic resonance (CMR) imaging provides detailed structural information of the heart and thus enables evidence-based diagnosis of CVD, but long scan times and high costs limit its use in clinical routine. In this work, we present a deep learning strategy for cost-effective and comprehensive cardiac screening solely from ECG. Our approach combines multimodal contrastive learning with masked data modelling to transfer domain-specific information from CMR imaging to ECG representations. In extensive experiments using data from 40,044 UK Biobank subjects, we demonstrate the utility and generalisability of our method for subject-specific risk prediction of CVD and the prediction of cardiac phenotypes using only ECG data. Specifically, our novel multimodal pre-training paradigm improves performance by up to 12.19 % for risk prediction and 27.59 % for phenotype prediction. In a qualitative analysis, we demonstrate that our learned ECG representations incorporate information from CMR image regions of interest. Our entire pipeline is publicly available at https://github.com/oetu/MMCL-ECG-CMR.
△ Less
Submitted 7 January, 2025; v1 submitted 9 August, 2023;
originally announced August 2023.
-
Interpretable 2D Vision Models for 3D Medical Images
Authors:
Alexander Ziller,
Ayhan Can Erdur,
Marwa Trigui,
Alp Güvenir,
Tamara T. Mueller,
Philip Müller,
Friederike Jungmann,
Johannes Brandt,
Jan Peeken,
Rickmer Braren,
Daniel Rueckert,
Georgios Kaissis
Abstract:
Training Artificial Intelligence (AI) models on 3D images presents unique challenges compared to the 2D case: Firstly, the demand for computational resources is significantly higher, and secondly, the availability of large datasets for pre-training is often limited, impeding training success. This study proposes a simple approach of adapting 2D networks with an intermediate feature representation…
▽ More
Training Artificial Intelligence (AI) models on 3D images presents unique challenges compared to the 2D case: Firstly, the demand for computational resources is significantly higher, and secondly, the availability of large datasets for pre-training is often limited, impeding training success. This study proposes a simple approach of adapting 2D networks with an intermediate feature representation for processing 3D images. Our method employs attention pooling to learn to assign each slice an importance weight and, by that, obtain a weighted average of all 2D slices. These weights directly quantify the contribution of each slice to the contribution and thus make the model prediction inspectable. We show on all 3D MedMNIST datasets as benchmark and two real-world datasets consisting of several hundred high-resolution CT or MRI scans that our approach performs on par with existing methods. Furthermore, we compare the in-built interpretability of our approach to HiResCam, a state-of-the-art retrospective interpretability approach.
△ Less
Submitted 5 December, 2023; v1 submitted 13 July, 2023;
originally announced July 2023.
-
Many tasks make light work: Learning to localise medical anomalies from multiple synthetic tasks
Authors:
Matthew Baugh,
Jeremy Tan,
Johanna P. Müller,
Mischa Dombrowski,
James Batten,
Bernhard Kainz
Abstract:
There is a growing interest in single-class modelling and out-of-distribution detection as fully supervised machine learning models cannot reliably identify classes not included in their training. The long tail of infinitely many out-of-distribution classes in real-world scenarios, e.g., for screening, triage, and quality control, means that it is often necessary to train single-class models that…
▽ More
There is a growing interest in single-class modelling and out-of-distribution detection as fully supervised machine learning models cannot reliably identify classes not included in their training. The long tail of infinitely many out-of-distribution classes in real-world scenarios, e.g., for screening, triage, and quality control, means that it is often necessary to train single-class models that represent an expected feature distribution, e.g., from only strictly healthy volunteer data. Conventional supervised machine learning would require the collection of datasets that contain enough samples of all possible diseases in every imaging modality, which is not realistic. Self-supervised learning methods with synthetic anomalies are currently amongst the most promising approaches, alongside generative auto-encoders that analyse the residual reconstruction error. However, all methods suffer from a lack of structured validation, which makes calibration for deployment difficult and dataset-dependant. Our method alleviates this by making use of multiple visually-distinct synthetic anomaly learning tasks for both training and validation. This enables more robust training and generalisation. With our approach we can readily outperform state-of-the-art methods, which we demonstrate on exemplars in brain MRI and chest X-rays. Code is available at https://github.com/matt-baugh/many-tasks-make-light-work .
△ Less
Submitted 3 July, 2023;
originally announced July 2023.
-
Zero-Shot Anomaly Detection with Pre-trained Segmentation Models
Authors:
Matthew Baugh,
James Batten,
Johanna P. Müller,
Bernhard Kainz
Abstract:
This technical report outlines our submission to the zero-shot track of the Visual Anomaly and Novelty Detection (VAND) 2023 Challenge. Building on the performance of the WINCLIP framework, we aim to enhance the system's localization capabilities by integrating zero-shot segmentation models. In addition, we perform foreground instance segmentation which enables the model to focus on the relevant p…
▽ More
This technical report outlines our submission to the zero-shot track of the Visual Anomaly and Novelty Detection (VAND) 2023 Challenge. Building on the performance of the WINCLIP framework, we aim to enhance the system's localization capabilities by integrating zero-shot segmentation models. In addition, we perform foreground instance segmentation which enables the model to focus on the relevant parts of the image, thus allowing the models to better identify small or subtle deviations. Our pipeline requires no external data or information, allowing for it to be directly applied to new datasets. Our team (Variance Vigilance Vanguard) ranked third in the zero-shot track of the VAND challenge, and achieve an average F1-max score of 81.5/24.2 at a sample/pixel level on the VisA dataset.
△ Less
Submitted 15 June, 2023;
originally announced June 2023.
-
Backchannel Detection and Agreement Estimation from Video with Transformer Networks
Authors:
Ahmed Amer,
Chirag Bhuvaneshwara,
Gowtham K. Addluri,
Mohammed M. Shaik,
Vedant Bonde,
Philipp Müller
Abstract:
Listeners use short interjections, so-called backchannels, to signify attention or express agreement. The automatic analysis of this behavior is of key importance for human conversation analysis and interactive conversational agents. Current state-of-the-art approaches for backchannel analysis from visual behavior make use of two types of features: features based on body pose and features based on…
▽ More
Listeners use short interjections, so-called backchannels, to signify attention or express agreement. The automatic analysis of this behavior is of key importance for human conversation analysis and interactive conversational agents. Current state-of-the-art approaches for backchannel analysis from visual behavior make use of two types of features: features based on body pose and features based on facial behavior. At the same time, transformer neural networks have been established as an effective means to fuse input from different data sources, but they have not yet been applied to backchannel analysis. In this work, we conduct a comprehensive evaluation of multi-modal transformer architectures for automatic backchannel analysis based on pose and facial information. We address both the detection of backchannels as well as the task of estimating the agreement expressed in a backchannel. In evaluations on the MultiMediate'22 backchannel detection challenge, we reach 66.4% accuracy with a one-layer transformer architecture, outperforming the previous state of the art. With a two-layer transformer architecture, we furthermore set a new state of the art (0.0604 MSE) on the task of estimating the amount of agreement expressed in a backchannel.
△ Less
Submitted 2 June, 2023;
originally announced June 2023.
-
MO-DEHB: Evolutionary-based Hyperband for Multi-Objective Optimization
Authors:
Noor Awad,
Ayushi Sharma,
Philipp Muller,
Janek Thomas,
Frank Hutter
Abstract:
Hyperparameter optimization (HPO) is a powerful technique for automating the tuning of machine learning (ML) models. However, in many real-world applications, accuracy is only one of multiple performance criteria that must be considered. Optimizing these objectives simultaneously on a complex and diverse search space remains a challenging task. In this paper, we propose MO-DEHB, an effective and f…
▽ More
Hyperparameter optimization (HPO) is a powerful technique for automating the tuning of machine learning (ML) models. However, in many real-world applications, accuracy is only one of multiple performance criteria that must be considered. Optimizing these objectives simultaneously on a complex and diverse search space remains a challenging task. In this paper, we propose MO-DEHB, an effective and flexible multi-objective (MO) optimizer that extends the recent evolutionary Hyperband method DEHB. We validate the performance of MO-DEHB using a comprehensive suite of 15 benchmarks consisting of diverse and challenging MO problems, including HPO, neural architecture search (NAS), and joint NAS and HPO, with objectives including accuracy, latency and algorithmic fairness. A comparative study against state-of-the-art MO optimizers demonstrates that MO-DEHB clearly achieves the best performance across our 15 benchmarks.
△ Less
Submitted 11 May, 2023; v1 submitted 8 May, 2023;
originally announced May 2023.
-
Flexible K Nearest Neighbors Classifier: Derivation and Application for Ion-mobility Spectrometry-based Indoor Localization
Authors:
Philipp Müller
Abstract:
The K Nearest Neighbors (KNN) classifier is widely used in many fields such as fingerprint-based localization or medicine. It determines the class membership of unlabelled sample based on the class memberships of the K labelled samples, the so-called nearest neighbors, that are closest to the unlabelled sample. The choice of K has been the topic of various studies and proposed KNN-variants. Yet no…
▽ More
The K Nearest Neighbors (KNN) classifier is widely used in many fields such as fingerprint-based localization or medicine. It determines the class membership of unlabelled sample based on the class memberships of the K labelled samples, the so-called nearest neighbors, that are closest to the unlabelled sample. The choice of K has been the topic of various studies and proposed KNN-variants. Yet no variant has been proven to outperform all other variants. In this paper a KNN-variant is discussed which ensures that the K nearest neighbors are indeed close to the unlabelled sample and finds K along the way. The algorithm is tested and compared to the standard KNN in theoretical scenarios and for indoor localization based on ion-mobility spectrometry fingerprints. It achieves a higher classification accuracy than the KNN in the tests, while having the same computational demand.
△ Less
Submitted 13 March, 2024; v1 submitted 20 April, 2023;
originally announced April 2023.