-
LISArD: Learning Image Similarity to Defend Against Gray-box Adversarial Attacks
Authors:
Joana C. Costa,
Tiago Roxo,
Hugo Proença,
Pedro R. M. Inácio
Abstract:
State-of-the-art defense mechanisms are typically evaluated in the context of white-box attacks, which is not realistic, as it assumes the attacker can access the gradients of the target network. To protect against this scenario, Adversarial Training (AT) and Adversarial Distillation (AD) include adversarial examples during the training phase, and Adversarial Purification uses a generative model to reconstruct all the images given to the classifier. This paper considers an even more realistic evaluation scenario: gray-box attacks, which assume that the attacker knows the architecture and the dataset used to train the target network, but cannot access its gradients. We provide empirical evidence that models are vulnerable to gray-box attacks and propose LISArD, a defense mechanism that does not increase computational and temporal costs but provides robustness against gray-box and white-box attacks without including AT. Our method approximates a cross-correlation matrix, created with the embeddings of perturbed and clean images, to a diagonal matrix while simultaneously conducting classification learning. Our results show that LISArD can effectively protect against gray-box attacks, can be used in multiple architectures, and carries over its resilience to the white-box scenario. Also, state-of-the-art AD models underperform greatly when removing AT and/or moving to gray-box settings, highlighting the lack of robustness from existing approaches to perform in various conditions (aside from white-box settings). All the source code is available at https://github.com/Joana-Cabral/LISArD.
Submitted 27 February, 2025;
originally announced February 2025.
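Illustrative sketch (not the authors' implementation): the LISArD abstract describes pushing a cross-correlation matrix, built from the embeddings of clean and perturbed images, toward a diagonal matrix. A minimal pure-Python version of such an objective might look like the following. The identity target, the per-dimension standardization, the `off_diag_weight` value, and all function names are assumptions in the style of a Barlow Twins-like loss; the actual formulation is in the linked repository.

```python
import math


def standardize(batch):
    """Normalize each embedding dimension to zero mean and unit variance over the batch."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        # Fall back to 1.0 for a constant dimension to avoid dividing by zero.
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        for i in range(n):
            out[i][j] = (batch[i][j] - mean) / std
    return out


def cross_correlation(clean, perturbed):
    """D x D cross-correlation between clean and perturbed embeddings (each N x D)."""
    zc, zp = standardize(clean), standardize(perturbed)
    n, d = len(zc), len(zc[0])
    return [
        [sum(zc[k][i] * zp[k][j] for k in range(n)) / n for j in range(d)]
        for i in range(d)
    ]


def diagonal_loss(corr, off_diag_weight=0.005):
    """Penalize diagonal entries far from 1 and off-diagonal entries far from 0.

    The identity target and the weight are assumptions, not values from the paper.
    """
    d = len(corr)
    on_diag = sum((1.0 - corr[i][i]) ** 2 for i in range(d))
    off_diag = sum(corr[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + off_diag_weight * off_diag
```

In training, a term like this would be added to the usual classification loss, so that the network learns to embed a perturbed image close to its clean counterpart while still learning to classify.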
-
ASDnB: Merging Face with Body Cues For Robust Active Speaker Detection
Authors:
Tiago Roxo,
Joana C. Costa,
Pedro Inácio,
Hugo Proença
Abstract:
State-of-the-art Active Speaker Detection (ASD) approaches mainly use audio and facial features as input. However, the main hypothesis in this paper is that body dynamics are also highly correlated with "speaking" (and "listening") actions and should be particularly useful in wild conditions (e.g., surveillance settings), where the face cannot be reliably accessed. We propose ASDnB, a model that singularly integrates face with body information by merging the inputs at different steps of feature extraction. Our approach splits 3D convolution into 2D and 1D to reduce computation cost without loss of performance, and is trained with adaptive weight feature importance so that face and body data complement each other better. Our experiments show that ASDnB achieves state-of-the-art results in the benchmark dataset (AVA-ActiveSpeaker), in the challenging data of WASD, and in cross-domain settings using Columbia. ASDnB can thus perform in multiple settings, making it a strong baseline for robust ASD models (code available at https://github.com/Tiago-Roxo/ASDnB).
Submitted 11 December, 2024;
originally announced December 2024.
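Illustrative arithmetic (not taken from the paper): the ASDnB abstract states that 3D convolution is split into 2D and 1D to reduce computation cost. The sketch below counts weights for a full 3D kernel versus a spatial 2D convolution followed by a temporal 1D convolution. The layer order, the intermediate channel width `c_mid`, and the function names are assumptions; biases are ignored, and the actual ASDnB layers may be configured differently.

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Weights in one full 3D convolution with a kt x kh x kw kernel (no bias)."""
    return c_in * c_out * kt * kh * kw


def factorized_params(c_in, c_out, kt, kh, kw, c_mid=None):
    """Weights in a 2D spatial conv (1 x kh x kw) followed by a 1D temporal conv (kt x 1 x 1).

    c_mid is the hypothetical channel width between the two convolutions;
    it defaults to c_out.
    """
    c_mid = c_out if c_mid is None else c_mid
    spatial = c_in * c_mid * kh * kw   # 2D part: looks at each frame separately
    temporal = c_mid * c_out * kt      # 1D part: mixes information across frames
    return spatial + temporal


full = conv3d_params(64, 64, 3, 3, 3)
split = factorized_params(64, 64, 3, 3, 3)
print(full, split, full / split)
```

For 64 input and output channels with a 3x3x3 kernel, this gives 110,592 weights for the full 3D kernel versus 49,152 for the factorized pair, a 2.25x reduction under these assumptions.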
-
BIAS: A Body-based Interpretable Active Speaker Approach
Authors:
Tiago Roxo,
Joana C. Costa,
Pedro R. M. Inácio,
Hugo Proença
Abstract:
State-of-the-art Active Speaker Detection (ASD) approaches heavily rely on audio and facial features to perform, which is not a sustainable approach in wild scenarios. Although these methods achieve good results in the standard AVA-ActiveSpeaker set, a recent wilder ASD dataset (WASD) showed the limitations of such models and raised the need for new approaches. As such, we propose BIAS, a model that, for the first time, combines audio, face, and body information, to accurately predict active speakers in varying/challenging conditions. Additionally, we design BIAS to provide interpretability by proposing a novel use for Squeeze-and-Excitation blocks, namely in attention heatmaps creation and feature importance assessment. For a full interpretability setup, we annotate an ASD-related actions dataset (ASD-Text) to finetune a ViT-GPT2 for text scene description to complement BIAS interpretability. The results show that BIAS is state-of-the-art in challenging conditions where body-based features are of utmost importance (Columbia, open-settings, and WASD), and yields competitive results in AVA-ActiveSpeaker, where face is more influential than body for ASD. BIAS interpretability also shows the features/aspects more relevant towards ASD prediction in varying settings, making it a strong baseline for further developments in interpretable ASD models, and is available at https://github.com/Tiago-Roxo/BIAS.
Submitted 6 December, 2024;
originally announced December 2024.
-
How to Squeeze An Explanation Out of Your Model
Authors:
Tiago Roxo,
Joana C. Costa,
Pedro R. M. Inácio,
Hugo Proença
Abstract:
Deep learning models are widely used nowadays for their reliability in performing various tasks. However, they do not typically provide the reasoning behind their decisions, which is a significant drawback, particularly for more sensitive areas such as biometrics, security, and healthcare. The most commonly used approaches to provide interpretability create visual attention heatmaps of regions of interest on an image based on the model's gradient backpropagation. Although this is a viable approach, current methods are targeted toward image settings and default/standard deep learning models, meaning that they require significant adaptations to work on video/multi-modal settings and custom architectures. This paper proposes an approach for interpretability that is model-agnostic, based on a novel use of the Squeeze and Excitation (SE) block that creates visual attention heatmaps. By including an SE block prior to the classification layer of any model, we are able to retrieve the most influential features via SE vector manipulation, one of the key components of the SE block. Our results show that this new SE-based interpretability can be applied to various models in image and video/multi-modal settings, namely biometrics of facial features with CelebA and behavioral biometrics using Active Speaker Detection datasets. Furthermore, our proposal does not compromise model performance on the original task, and has competitive results with current interpretability approaches in state-of-the-art object datasets, highlighting its robustness across varying data beyond the biometric context.
Submitted 6 December, 2024;
originally announced December 2024.
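Illustrative sketch (not the authors' method): the abstract above relies on the SE vector produced by a Squeeze and Excitation block. The code below shows only the standard SE forward pass (global average pooling, a two-layer bottleneck with ReLU and sigmoid, then channel-wise rescaling) and a naive ranking of channels by their SE weight. The weight matrices `w_reduce` and `w_expand`, the omission of biases, and the `rank_channels` helper are assumptions for illustration; how the paper manipulates the SE vector to build heatmaps is not specified in the abstract.

```python
import math


def squeeze(feature_maps):
    """Global average pooling: one scalar per channel (input is C x H x W)."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


def dense(weights, vector):
    """Matrix-vector product for a bias-free fully connected layer."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def excite(pooled, w_reduce, w_expand):
    """Bottleneck MLP: reduce, ReLU, expand, sigmoid. Returns the SE vector."""
    hidden = [max(0.0, h) for h in dense(w_reduce, pooled)]
    return [1.0 / (1.0 + math.exp(-z)) for z in dense(w_expand, hidden)]


def se_block(feature_maps, w_reduce, w_expand):
    """Return the SE vector and the feature maps rescaled channel by channel."""
    se_vector = excite(squeeze(feature_maps), w_reduce, w_expand)
    scaled = [
        [[s * v for v in row] for row in fmap]
        for fmap, s in zip(feature_maps, se_vector)
    ]
    return se_vector, scaled


def rank_channels(se_vector):
    """Hypothetical helper: channel indices sorted from highest to lowest SE weight."""
    return sorted(range(len(se_vector)), key=lambda c: se_vector[c], reverse=True)
```

Because each SE weight lies between 0 and 1 and multiplies a whole channel, inspecting the vector gives a rough view of which features the model emphasizes before classification.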
-
How Deep Learning Sees the World: A Survey on Adversarial Attacks & Defenses
Authors:
Joana C. Costa,
Tiago Roxo,
Hugo Proença,
Pedro R. M. Inácio
Abstract:
Deep Learning is currently used to perform multiple tasks, such as object recognition, face recognition, and natural language processing. However, Deep Neural Networks (DNNs) are vulnerable to perturbations that alter the network prediction (adversarial examples), raising concerns regarding their usage in critical areas, such as self-driving vehicles, malware detection, and healthcare. This paper compiles the most recent adversarial attacks, grouped by the attacker capacity, and modern defenses clustered by protection strategies. We also present the new advances regarding Vision Transformers, summarize the datasets and metrics used in the context of adversarial settings, and compare the state-of-the-art results under different attacks, finishing with the identification of open issues.
Submitted 18 May, 2023;
originally announced May 2023.
-
WASD: A Wilder Active Speaker Detection Dataset
Authors:
Tiago Roxo,
Joana C. Costa,
Pedro R. M. Inácio,
Hugo Proença
Abstract:
Current Active Speaker Detection (ASD) models achieve great results on AVA-ActiveSpeaker (AVA), using only sound and facial features. Although this approach is applicable in movie setups (AVA), it is not suited for less constrained conditions. To demonstrate this limitation, we propose a Wilder Active Speaker Detection (WASD) dataset, with increased difficulty by targeting the two key components of current ASD: audio and face. Grouped into 5 categories, ranging from optimal conditions to surveillance settings, WASD contains incremental challenges for ASD with tactical impairment of audio and face data. We select state-of-the-art models and assess their performance in two groups of WASD: Easy (cooperative settings) and Hard (audio and/or face are specifically degraded). The results show that: 1) AVA-trained models maintain state-of-the-art performance in the WASD Easy group while underperforming in the Hard one; 2) AVA and Easy data are therefore similar; and 3) training on WASD does not raise model performance to AVA levels, particularly for audio impairment and surveillance settings. This shows that AVA does not prepare models for wild ASD and that current approaches are subpar in such conditions. The proposed dataset also contains body data annotations to provide a new source for ASD, and is available at https://github.com/Tiago-Roxo/WASD.
Submitted 9 March, 2023;
originally announced March 2023.
-
Complete identification of complex salt geometries from inaccurate migrated subsurface offset gathers using deep learning
Authors:
Ana Paula O. Muller,
Jesse C. Costa,
Clecio R. Bom,
Elisangela L. Faria,
Matheus Klatt,
Gabriel Teixeira,
Marcelo P. de Albuquerque,
Marcio P. de Albuquerque
Abstract:
Delimiting salt inclusions from migrated images is a time-consuming activity that relies on highly human-curated analysis and is subject to interpretation errors or limitations of the methods available. We propose to use migrated images produced from an inaccurate velocity model (with a reasonable approximation of sediment velocity, but without salt inclusions) to predict the correct salt inclusions shape using a Convolutional Neural Network (CNN). Our approach relies on subsurface Common Image Gathers to focus the sediments' reflections around the zero offset and to spread the energy of salt reflections over large offsets. Using synthetic data, we trained a U-Net to use common-offset subsurface images as input channels for the CNN and the correct salt-masks as network output. The network learned to predict the salt inclusions masks with high accuracy; moreover, it also performed well when applied to synthetic benchmark data sets that were not previously introduced. Our training process tuned the U-Net to successfully learn the shape of complex salt bodies from partially focused subsurface offset images.
Submitted 5 December, 2022; v1 submitted 20 April, 2022;
originally announced April 2022.
-
Recognizing pro-R closures of regular languages
Authors:
Jorge Almeida,
José Carlos Costa,
Marc Zeitoun
Abstract:
Given a regular language L, we effectively construct a unary semigroup that recognizes the topological closure of L in the free unary semigroup relative to the variety of unary semigroups generated by the pseudovariety R of all finite R-trivial semigroups. In particular, we obtain a new effective solution of the separation problem of regular languages by R-languages.
Submitted 24 May, 2019;
originally announced May 2019.
-
The linear nature of pseudowords
Authors:
Jorge Almeida,
Alfredo Costa,
José Carlos Costa,
Marc Zeitoun
Abstract:
Given a pseudoword over suitable pseudovarieties, we associate to it a labeled linear order determined by the factorizations of the pseudoword. We show that, in the case of the pseudovariety of aperiodic finite semigroups, the pseudoword can be recovered from the labeled linear order.
Submitted 9 March, 2017; v1 submitted 26 February, 2017;
originally announced February 2017.
-
McCammond's normal forms for free aperiodic semigroups revisited
Authors:
Jorge Almeida,
José Carlos Costa,
Marc Zeitoun
Abstract:
This paper revisits the solution of the word problem for $ω$-terms interpreted over finite aperiodic semigroups, obtained by J. McCammond. The original proof of correctness of McCammond's algorithm, based on normal forms for such terms, uses McCammond's solution of the word problem for certain Burnside semigroups. In this paper, we establish a new, simpler, correctness proof of McCammond's algorithm, based on properties of certain regular languages associated with the normal forms. This method leads to new applications.
Submitted 3 June, 2014;
originally announced June 2014.