Search | arXiv e-print repository

arXiv:2503.19244 [pdf, ps, other]

Maximum number of edge colorings avoiding rainbow copies of $K_4$

Authors: Hiêp Hàn, Carlos Hoppen, Nicolas Moro Müller, Dionatan Ricardo Schmidt

Abstract: In this paper we show that for $r\geq 12$ and any sufficiently large $n$-vertex graph $G$ the number of $r$-edge-colorings of $G$ with no rainbow $K_4$ is at most $r^{ex(n,K_4)}$, where $ex(n,K_4)$ denotes the Turán number of $K_4$. Moreover, $G$ attains equality if and only if it is the Turán graph $T_3(n)$. The bound on the number of colors $r\geq 12$ is best possible. It improves upon a result of H. Lefmann, D.A. Nolibos, and the second author who showed the same result for $r \geq 5434$ and it confirms a conjecture by Gupta, Pehova, Powierski and Staden.

Submitted 30 April, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

Comments: 15 pages

MSC Class: 05C35 ACM Class: G.2.2

arXiv:2408.15775 [pdf, other]

Easy, Interpretable, Effective: openSMILE for voice deepfake detection

Authors: Octavian Pascu, Dan Oneata, Horia Cucu, Nicolas M. Müller

Abstract: In this paper, we demonstrate that attacks in the latest ASVspoof5 dataset -- a de facto standard in the field of voice authenticity and deepfake detection -- can be identified with surprising accuracy using a small subset of very simplistic features. These are derived from the openSMILE library, and are scalar-valued, easy to compute, and human interpretable. For example, attack A10's unvoiced segments have a mean length of 0.09 ± 0.02, while bona fide instances have a mean length of 0.18 ± 0.07. Using this feature alone, a threshold classifier achieves an Equal Error Rate (EER) of 10.3% for attack A10. Similarly, across all attacks, we achieve up to 0.8% EER, with an overall EER of 15.7 ± 6.0%. We explore the generalization capabilities of these features and find that some of them transfer effectively between attacks, primarily when the attacks originate from similar Text-to-Speech (TTS) architectures. This finding may indicate that voice anti-spoofing is, in part, a problem of identifying and remembering signatures or fingerprints of individual TTS systems. This allows for a better understanding of anti-spoofing models and their challenges in real-world application.

Submitted 29 August, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

arXiv:2406.03512 [pdf, other]

Harder or Different? Understanding Generalization of Audio Deepfake Detection

Authors: Nicolas M. Müller, Nicholas Evans, Hemlata Tak, Philip Sperl, Konstantin Böttinger

Abstract: Recent research has highlighted a key issue in speech deepfake detection: models trained on one set of deepfakes perform poorly on others. The question arises: is this due to the continuously improving quality of Text-to-Speech (TTS) models, i.e., are newer deepfakes just 'harder' to detect? Or is it because deepfakes generated with one model are fundamentally different to those generated using another model? We answer this question by decomposing the performance gap between in-domain and out-of-domain test data into 'hardness' and 'difference' components. Experiments performed using ASVspoof databases indicate that the hardness component is practically negligible, with the performance gap being attributed primarily to the difference component. This has direct implications for real-world deepfake detection, highlighting that merely increasing model capacity, the currently-dominant research trend, may not effectively address the generalization challenge.

Submitted 12 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

Journal ref: Interspeech 2024

arXiv:2402.11963 [pdf, other]

Imbalance in Regression Datasets

Authors: Daniel Kowatsch, Nicolas M. Müller, Kilian Tscharke, Philip Sperl, Konstantin Böttinger

Abstract: For classification, the problem of class imbalance is well known and has been extensively studied. In this paper, we argue that imbalance in regression is an equally important problem which has so far been overlooked: Due to under- and over-representations in a data set's target distribution, regressors are prone to degenerate to naive models, systematically neglecting uncommon training data and over-representing targets seen often during training. We analyse this problem theoretically and use resulting insights to develop a first definition of imbalance in regression, which we show to be a generalisation of the commonly employed imbalance measure in classification. With this, we hope to turn the spotlight on the overlooked problem of imbalance in regression and to provide common ground for future research.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.06304 [pdf, ps, other]

A New Approach to Voice Authenticity

Authors: Nicolas M. Müller, Piotr Kawa, Shen Hu, Matthias Neu, Jennifer Williams, Philip Sperl, Konstantin Böttinger

Abstract: Voice faking, driven primarily by recent advances in text-to-speech (TTS) synthesis technology, poses significant societal challenges. Currently, the prevailing assumption is that unaltered human speech can be considered genuine, while fake speech comes from TTS synthesis. We argue that this binary distinction is oversimplified. For instance, altered playback speeds can be used for malicious purposes, like in the 'Drunken Nancy Pelosi' incident. Similarly, editing of audio clips can be done ethically, e.g., for brevity or summarization in news reporting or podcasts, but editing can also create misleading narratives. In this paper, we propose a conceptual shift away from the binary paradigm of audio being either 'fake' or 'real'. Instead, our focus is on pinpointing 'voice edits', which encompass traditional modifications like filters and cuts, as well as TTS synthesis and VC systems. We delineate 6 categories and curate a new challenge dataset rooted in the M-AILABS corpus, for which we present baseline detection systems. And most importantly, we argue that merely categorizing audio as fake or real is a dangerous over-simplification that will fail to move the field of speech technology forward.

Submitted 9 February, 2024; originally announced February 2024.

arXiv:2401.09512 [pdf, other]

MLAAD: The Multi-Language Audio Anti-Spoofing Dataset

Authors: Nicolas M. Müller, Piotr Kawa, Wei Herng Choong, Edresson Casanova, Eren Gölge, Thorsten Müller, Piotr Syga, Philip Sperl, Konstantin Böttinger

Abstract: Text-to-Speech (TTS) technology offers notable benefits, such as providing a voice for individuals with speech impairments, but it also facilitates the creation of audio deepfakes and spoofing attacks. AI-based detection methods can help mitigate these risks; however, the performance of such models is inherently dependent on the quality and diversity of their training data. Presently, the available datasets are heavily skewed towards English and Chinese audio, which limits the global applicability of these anti-spoofing systems. To address this limitation, this paper presents the Multi-Language Audio Anti-Spoofing Dataset (MLAAD), created using 91 TTS models, comprising 42 different architectures, to generate 420.7 hours of synthetic voice in 38 different languages. We train and evaluate three state-of-the-art deepfake detection models with MLAAD and observe that it demonstrates superior performance over comparable datasets like InTheWild and Fake-Or-Real when used as a training resource. Moreover, compared to the renowned ASVspoof 2019 dataset, MLAAD proves to be a complementary resource. In tests across eight datasets, MLAAD and ASVspoof 2019 alternately outperformed each other, each excelling on four datasets. By publishing MLAAD and making a trained model accessible via an interactive webserver, we aim to democratize anti-spoofing technology, making it accessible beyond the realm of specialists, and contributing to global efforts against audio spoofing and deepfakes.

Submitted 26 April, 2025; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: IJCNN 2024

arXiv:2310.19381 [pdf, other]

Protecting Publicly Available Data With Machine Learning Shortcuts

Authors: Nicolas M. Müller, Maximilian Burgert, Pascal Debus, Jennifer Williams, Philip Sperl, Konstantin Böttinger

Abstract: Machine-learning (ML) shortcuts or spurious correlations are artifacts in datasets that lead to very good training and test performance but severely limit the model's generalization capability. Such shortcuts are insidious because they go unnoticed due to good in-domain test performance. In this paper, we explore the influence of different shortcuts and show that even simple shortcuts are difficult to detect by explainable AI methods. We then exploit this fact and design an approach to defend online databases against crawlers: providers such as dating platforms, clothing manufacturers, or used car dealers have to deal with a professionalized crawling industry that grabs and resells data points on a large scale. We show that a deterrent can be created by deliberately adding ML shortcuts. Such augmented datasets are then unusable for ML use cases, which deters crawlers and the unauthorized use of data from the internet. Using real-world data from three use cases, we show that the proposed approach renders such collected data unusable, while the shortcut is at the same time difficult to notice in human perception. Thus, our proposed approach can serve as a proactive protection against illegitimate data crawling.

Submitted 30 October, 2023; originally announced October 2023.

Comments: Published at BMVC 2023

arXiv:2308.11800 [pdf, other]

Complex-valued neural networks for voice anti-spoofing

Authors: Nicolas M. Müller, Philip Sperl, Konstantin Böttinger

Abstract: Current anti-spoofing and audio deepfake detection systems use either magnitude spectrogram-based features (such as CQT or Melspectrograms) or raw audio processed through convolution or sinc-layers. Both methods have drawbacks: magnitude spectrograms discard phase information, which affects audio naturalness, and raw-feature-based models cannot use traditional explainable AI methods. This paper proposes a new approach that combines the benefits of both methods by using complex-valued neural networks to process the complex-valued, CQT frequency-domain representation of the input audio. This method retains phase information and allows for explainable AI methods. Results show that this approach outperforms previous methods on the "In-the-Wild" anti-spoofing dataset and enables interpretation of the results through explainable AI. Ablation studies confirm that the model has learned to use phase information to detect voice spoofing.

Submitted 22 August, 2023; originally announced August 2023.

Comments: Interspeech 2023

arXiv:2302.04246 [pdf, other]

Shortcut Detection with Variational Autoencoders

Authors: Nicolas M. Müller, Simon Roschmann, Shahbaz Khan, Philip Sperl, Konstantin Böttinger

Abstract: For real-world applications of machine learning (ML), it is essential that models make predictions based on well-generalizing features rather than spurious correlations in the data. The identification of such spurious correlations, also known as shortcuts, is a challenging problem and has so far been scarcely addressed. In this work, we present a novel approach to detect shortcuts in image and audio datasets by leveraging variational autoencoders (VAEs). The disentanglement of features in the latent space of VAEs allows us to discover feature-target correlations in datasets and semi-automatically evaluate them for ML shortcuts. We demonstrate the applicability of our method on several real-world datasets and identify shortcuts that have not been discovered before.

Submitted 21 July, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

Comments: Accepted at the ICML 2023 Workshop on Spurious Correlations, Invariance and Stability

arXiv:2211.15510 [pdf, other]

Localized Shortcut Removal

Authors: Nicolas M. Müller, Jochen Jacobs, Jennifer Williams, Konstantin Böttinger

Abstract: Machine learning is a data-driven field, and the quality of the underlying datasets plays a crucial role in learning success. However, high performance on held-out test data does not necessarily indicate that a model generalizes or learns anything meaningful. This is often due to the existence of machine learning shortcuts - features in the data that are predictive but unrelated to the problem at hand. To address this issue for datasets where the shortcuts are smaller and more localized than true features, we propose a novel approach to detect and remove them. We use an adversarially trained lens to detect and eliminate highly predictive but semantically unconnected clues in images. In our experiments on both synthetic and real-world data, we show that our proposed approach reliably identifies and neutralizes such shortcuts without causing degradation of model performance on clean data. We believe that our approach can lead to more meaningful and generalizable machine learning models, especially in scenarios where the quality of the underlying datasets is crucial.

Submitted 23 May, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

Comments: Accepted at XAI4CV @ CVPR2023

arXiv:2203.16263 [pdf, other]

Does Audio Deepfake Detection Generalize?

Authors: Nicolas M. Müller, Pavel Czempin, Franziska Dieckmann, Adam Froghyar, Konstantin Böttinger

Abstract: Current text-to-speech algorithms produce realistic fakes of human voices, making deepfake detection a much-needed area of research. While researchers have presented various techniques for detecting audio spoofs, it is often unclear exactly why these architectures are successful: Preprocessing steps, hyperparameter settings, and the degree of fine-tuning are not consistent across related work. Which factors contribute to success, and which are accidental? In this work, we address this problem: We systematize audio spoofing detection by re-implementing and uniformly evaluating architectures from related work. We identify overarching features for successful audio deepfake detection, such as using cqtspec or logspec features instead of melspec features, which improves performance by 37% EER on average, all other factors constant. Additionally, we evaluate generalization capabilities: We collect and publish a new dataset consisting of 37.9 hours of found audio recordings of celebrities and politicians, of which 17.2 hours are deepfakes. We find that related work performs poorly on such real-world data (performance degradation of up to one thousand percent). This may suggest that the community has tailored its solutions too closely to the prevailing ASVspoof benchmark and that deepfakes are much harder to detect outside the lab than previously thought.

Submitted 27 August, 2024; v1 submitted 30 March, 2022; originally announced March 2022.

Comments: Interspeech 2022

arXiv:2203.15563 [pdf, other]

Attacker Attribution of Audio Deepfakes

Authors: Nicolas M. Müller, Franziska Dieckmann, Jennifer Williams

Abstract: Deepfakes are synthetically generated media often devised with malicious intent. They have become increasingly convincing with large training datasets and advanced neural networks. These fakes are readily being misused for slander, misinformation and fraud. For this reason, intensive research for developing countermeasures is also expanding. However, recent work is almost exclusively limited to deepfake detection - predicting if audio is real or fake. This is despite the fact that attribution (who created which fake?) is an essential building block of a larger defense strategy, as practiced in the field of cybersecurity for a long time. This paper considers the problem of deepfake attacker attribution in the domain of audio. We present several methods for creating attacker signatures using low-level acoustic descriptors and machine learning embeddings. We show that speech signal features are inadequate for characterizing attacker signatures. However, we also demonstrate that embeddings from a recurrent neural network can successfully characterize attacks from both known and unknown attackers. Our attack signature embeddings result in distinct clusters, both for seen and unseen audio deepfakes. We show that these embeddings can be used in downstream tasks to great effect, scoring 97.10% accuracy in attacker-id classification.

Submitted 28 March, 2022; originally announced March 2022.

Comments: Submitted to Interspeech 2022

arXiv:2107.09667 [pdf, other]

doi 10.1145/3552466.3556531

Human Perception of Audio Deepfakes

Authors: Nicolas M. Müller, Karla Pizzi, Jennifer Williams

Abstract: The recent emergence of deepfakes has brought manipulated and generated content to the forefront of machine learning research. Automatic detection of deepfakes has seen many new machine learning techniques; however, human detection capabilities are far less explored. In this paper, we present results from comparing the abilities of humans and machines for detecting audio deepfakes used to imitate someone's voice. For this, we use a web-based application framework formulated as a game. Participants were asked to distinguish between real and fake audio samples. In our experiment, 472 unique users competed against a state-of-the-art AI deepfake detection algorithm for a total of 14912 rounds of the game. We find that humans and deepfake detection algorithms share similar strengths and weaknesses, both struggling to detect certain types of attacks. This is in contrast to the superhuman performance of AI in many application areas such as object detection or face recognition. Concerning human success factors, we find that IT professionals have no advantage over non-professionals but native speakers have an advantage over non-native speakers. Additionally, we find that older participants tend to be more susceptible than younger ones. These insights may be helpful when designing future cybersecurity training for humans as well as developing better detection algorithms.

Submitted 27 August, 2024; v1 submitted 20 July, 2021; originally announced July 2021.

Comments: Published at DDAM, the First International Workshop on Deepfake Detection for Audio Multimedia, at ACM Multimedia 2022

arXiv:2106.12914 [pdf, other]

Speech is Silver, Silence is Golden: What do ASVspoof-trained Models Really Learn?

Authors: Nicolas M. Müller, Franziska Dieckmann, Pavel Czempin, Roman Canals, Konstantin Böttinger, Jennifer Williams

Abstract: We present our analysis of a significant data artifact in the official 2019/2021 ASVspoof Challenge Dataset. We identify an uneven distribution of silence duration in the training and test splits, which tends to correlate with the target prediction label. Bonafide instances tend to have significantly longer leading and trailing silences than spoofed instances. In this paper, we explore this phenomenon and its impact in depth. We compare several types of models trained on a) only the duration of the leading silence and b) only the duration of leading and trailing silence. Results show that models trained on only the duration of the leading silence perform particularly well, and achieve up to 85% accuracy and an equal error rate (EER) of 15.1%. At the same time, we observe that trimming silence during pre-processing and then training established antispoofing models using signal-based features leads to comparatively worse performance. In that case, EER increases from 3.6% (with silence) to 15.5% (trimmed silence). Our findings suggest that previous work may, in part, have inadvertently learned the spoof/bonafide distinction by relying on the duration of silence as it appears in the official challenge dataset. We discuss the potential consequences that this has for interpreting system scores in the challenge and discuss how the ASV community may further consider this issue.

Submitted 28 September, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

Journal ref: ASVspoof 2021 Workshop

arXiv:2104.06744 [pdf, other]

Defending Against Adversarial Denial-of-Service Data Poisoning Attacks

Authors: Nicolas M. Müller, Simon Roschmann, Konstantin Böttinger

Abstract: Data poisoning is one of the most relevant security threats against machine learning and data-driven technologies. Since many applications rely on untrusted training data, an attacker can easily craft malicious samples and inject them into the training dataset to degrade the performance of machine learning models. As recent work has shown, such Denial-of-Service (DoS) data poisoning attacks are highly effective. To mitigate this threat, we propose a new approach of detecting DoS poisoned instances. In comparison to related work, we deviate from clustering and anomaly detection based approaches, which often suffer from the curse of dimensionality and arbitrary anomaly threshold selection. Rather, our defence is based on extracting information from the training data in such a generalized manner that we can identify poisoned samples based on the information present in the unpoisoned portion of the data. We evaluate our defence against two DoS poisoning attacks and seven datasets, and find that it reliably identifies poisoned instances. In comparison to related work, our defence improves false positive / false negative rates by at least 50%, often more.

Submitted 30 November, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

Comments: Published at ACSAC DYNAMICS 2020

arXiv:2104.05557 [pdf, other]

SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

Authors: Edresson Casanova, Christopher Shulby, Eren Gölge, Nicolas Michael Müller, Frederico Santos de Oliveira, Arnaldo Candido Junior, Anderson da Silva Soares, Sandra Maria Aluisio, Moacir Antonelli Ponti

Abstract: In this paper, we propose SC-GlowTTS: an efficient zero-shot multi-speaker text-to-speech model that improves similarity for speakers unseen during training. We propose a speaker-conditional architecture that explores a flow-based decoder that works in a zero-shot scenario. As text encoders, we explore a dilated residual convolutional-based encoder, gated convolutional-based encoder, and transformer-based encoder. Additionally, we have shown that adjusting a GAN-based vocoder for the spectrograms predicted by the TTS model on the training dataset can significantly improve the similarity and speech quality for new speakers. Our model converges using only 11 speakers, reaching state-of-the-art results for similarity with new speakers, as well as high speech quality.

Submitted 15 June, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

Comments: Accepted at Interspeech 2021

arXiv:2101.10792 [pdf, other]

Adversarial Vulnerability of Active Transfer Learning

Authors: Nicolas M. Müller, Konstantin Böttinger

Abstract: Two widely used techniques for training supervised machine learning models on small datasets are Active Learning and Transfer Learning. The former helps to optimally use a limited budget to label new data. The latter uses large pre-trained models as feature extractors and enables the design of complex, non-linear models even on tiny datasets. Combining these two approaches is an effective, state-of-the-art method when dealing with small datasets. In this paper, we share an intriguing observation: the combination of these techniques is particularly susceptible to a new kind of data poisoning attack. By adding small adversarial noise on the input, it is possible to create a collision in the output space of the transfer learner. As a result, Active Learning algorithms no longer select the optimal instances, but almost exclusively the ones injected by the attacker. This allows an attacker to manipulate the active learner to select and include arbitrary images into the data set, even against an overwhelming majority of unpoisoned samples. We show that a model trained on such a poisoned dataset has a significantly deteriorated performance, dropping from 86% to 34% test accuracy. We evaluate this attack on both audio and image datasets and support our findings empirically. To the best of our knowledge, this weakness has not been described before in literature.

Submitted 26 January, 2021; originally announced January 2021.

Comments: Accepted for publication at IDA 2021

arXiv:2010.07190 [pdf, other]

doi 10.1145/3385003.3410921

Towards Resistant Audio Adversarial Examples

Authors: Tom Dörr, Karla Markert, Nicolas M. Müller, Konstantin Böttinger

Abstract: Adversarial examples tremendously threaten the availability and integrity of machine learning-based systems. While the feasibility of such attacks was first observed in the domain of image processing, recent research shows that speech recognition is also susceptible to adversarial attacks. However, reliably bridging the air gap (i.e., making the adversarial examples work when recorded via a microphone) has so far eluded researchers. We find that due to flaws in the generation process, state-of-the-art adversarial example generation methods cause overfitting because of the binning operation in the target speech recognition system (e.g., Mozilla Deepspeech). We devise an approach to mitigate this flaw and find that our method improves generation of adversarial examples with varying offsets. We confirm the significant improvement with our approach by empirical comparison of the edit distance in a realistic over-the-air setting. Our approach represents a significant step towards over-the-air attacks. We publish the code and an applicable implementation of our approach.

Submitted 14 October, 2020; originally announced October 2020.

ACM Class: I.2

Journal ref: SPAI 20: Proceedings of the 1st ACM Workshop on Security and Privacy on Artificial Intelligence, October 2020, Pages 3-10

arXiv:2009.07008 [pdf, other]

Data Poisoning Attacks on Regression Learning and Corresponding Defenses

Authors: Nicolas Michael Müller, Daniel Kowatsch, Konstantin Böttinger

Abstract: Adversarial data poisoning is an effective attack against machine learning and threatens model integrity by introducing poisoned data into the training dataset. So far, it has been studied mostly for classification, even though regression learning is used in many mission-critical systems (such as dosage of medication, control of cyber-physical systems and managing power supply). Therefore, in the present research, we aim to evaluate all aspects of data poisoning attacks on regression learning, exceeding previous work both in terms of breadth and depth. We present realistic scenarios in which data poisoning attacks threaten production systems and introduce a novel black-box attack, which is then applied to a real-world medical use-case. As a result, we observe that the mean squared error (MSE) of the regressor increases to 150 percent due to inserting only two percent of poison samples. Finally, we present a new defense strategy against the novel and previous attacks and evaluate it thoroughly on 26 datasets. As a result of the conducted experiments, we conclude that the proposed defense strategy effectively mitigates the considered attacks.

Submitted 15 September, 2020; originally announced September 2020.

arXiv:1912.05283 [pdf, other]

doi 10.1109/IJCNN.2019.8851920

Identifying Mislabeled Instances in Classification Datasets

Authors: Nicolas Michael Müller, Karla Markert

Abstract: A key requirement for supervised machine learning is labeled training data, which is created by annotating unlabeled data with the appropriate class. Because this process can in many cases not be done by machines, labeling needs to be performed by human domain experts. This process tends to be expensive both in time and money, and is prone to errors. Additionally, reviewing an entire labeled dataset manually is often prohibitively costly, so many real-world datasets contain mislabeled instances. To address this issue, we present in this paper a non-parametric end-to-end pipeline to find mislabeled instances in numerical, image and natural language datasets. We evaluate our system quantitatively by adding a small amount of label noise to 29 datasets, and show that we find mislabeled instances with an average precision of more than 0.84 when reviewing our system's top 1% recommendation. We then apply our system to publicly available datasets and find mislabeled instances in CIFAR-100, Fashion-MNIST, and others. Finally, we publish the code and an applicable implementation of our approach.

Submitted 11 December, 2019; originally announced December 2019.

Journal ref: 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 2019

arXiv:hep-th/0205021 [pdf, ps, other]

doi 10.1016/S0550-3213(02)00572-2

Quantum group symmetry and particle scattering in (2+1)-dimensional quantum gravity

Authors: F. A. Bais, N. M. Muller, B. J. Schroers

Abstract: Starting with the Chern-Simons formulation of (2+1)-dimensional gravity we show that the gravitational interactions deform the Poincare symmetry of flat space-time to a quantum group symmetry. The relevant quantum group is the quantum double of the universal cover of the (2+1)-dimensional Lorentz group, or Lorentz double for short. We construct the Hilbert space of two gravitating particles and use the universal R-matrix of the Lorentz double to derive a general expression for the scattering cross section of gravitating particles with spin. In appropriate limits our formula reproduces the semi-classical scattering formulae found by 't Hooft, Deser, Jackiw and de Sousa Gerbert.

Submitted 2 May, 2002; originally announced May 2002.

Comments: 45 pages, amslatex

Report number: HWM-01-45, EMPG-02-07, ITFA-2002-12

Journal ref: Nucl.Phys. B640 (2002) 3-45

arXiv:hep-th/9804130 [pdf, ps, other]

doi 10.1016/S0550-3213(98)00572-0

Topological field theory and the quantum double of SU(2)

Authors: F. A. Bais, N. M. Muller

Abstract: We study the quantum mechanics of a system of topologically interacting particles in 2+1 dimensions, which is described by coupling the particles to a Chern-Simons gauge field of an inhomogeneous group. Analysis of the phase space shows that for the particular case of ISO(3) Chern-Simons theory the underlying symmetry is that of the quantum double D(SU(2)), based on the homogeneous part of the gauge group. This is in contrast to the usual q-deformed gauge group itself, which occurs in the case of a homogeneous gauge group. Subsequently, we describe the structure of the quantum double of a continuous group and the classification of its unitary irreducible representations. The comultiplication and the R-element of the quantum double allow for a natural description of the fusion properties and the nonabelian braid statistics of the particles. These typically manifest themselves in generalised Aharonov-Bohm scattering processes, for which we compute the differential cross sections. Finally, we briefly describe the structure of D(SO(2,1)), the underlying quantum double symmetry of (2+1)-dimensional quantum gravity.

Submitted 15 July, 1998; v1 submitted 20 April, 1998; originally announced April 1998.

Comments: 48 pages, 3 figures, LaTeX2e; two remarks and a reference added, typos corrected; to appear in Nucl.Phys.B

Report number: UvA-WINS-ITFA 98-07

Journal ref: Nucl.Phys. B530 (1998) 349-400

arXiv:q-alg/9712042 [pdf, ps, other]

doi 10.1007/s002200050475

Tensor product representations of the quantum double of a compact group

Authors: T. H. Koornwinder, F. A. Bais, N. M. Muller

Abstract: We consider the quantum double D(G) of a compact group G, following an earlier paper. We use the explicit comultiplication on D(G) in order to build tensor products of irreducible *-representations. Then we study their behaviour under the action of the R-matrix, and their decomposition into irreducible *-representations. The example of D(SU(2)) is treated in detail, with explicit formulas for direct integral decomposition ('Clebsch-Gordan series') and Clebsch-Gordan coefficients. We point out possible physical applications.

Submitted 20 April, 1998; v1 submitted 17 December, 1997; originally announced December 1997.

Comments: LaTeX2e, 27 pages, corrected references, accepted by Comm.Math.Phys

Report number: UvA-WINS-Wisk.97-14, UvA-WINS-ITFA.97-44 MSC Class: 22D20; 22D30 (Primary); 81R50 (secondary)

Journal ref: Commun.Math.Phys. 198 (1998) 157-186

arXiv:q-alg/9605044 [pdf, ps, other]

Quantum double of a (locally) compact group

Authors: T. H. Koornwinder, N. M. Muller

Abstract: We generalise the quantum double construction of Drinfel'd to the case of the (Hopf) algebra of suitable functions on a compact or locally compact group. We will concentrate on the *-algebra structure of the quantum double. If the conjugacy classes in the group are countably separated, then we classify the irreducible *-representations by using the connection with so-called transformation group algebras. For finite groups, we will compare our description to the result of Dijkgraaf, Pasquier and Roche. Finally we will work out the explicit examples of SU(2) and SL(2,R).

Submitted 2 October, 1996; v1 submitted 29 May, 1996; originally announced May 1996.

Comments: LaTeX2e, 18 pages. Univ. of Amsterdam, Depts. of Math. and of Theor.Phys., to be published in the Journal of Lie Theory

Report number: UvA-WINS-Wisk. 96-08, UvA-WINS-ITFA 96-19 MSC Class: 22D20; 22D30 (Primary); 81R50 (secondary)

Showing 1–24 of 24 results for author: Muller, N M