-
Flow-Based Non-stationary Temporal Regime Causal Structure Learning
Authors:
Abdellah Rahmani,
Pascal Frossard
Abstract:
Understanding causal relationships in multivariate time series is crucial in many scenarios, such as those dealing with financial or neurological data. Many such time series exhibit multiple regimes, i.e., consecutive temporal segments with a priori unknown boundaries, with each regime having its own causal structure. Inferring causal dependencies and regime shifts is critical for analyzing the un…
▽ More
Understanding causal relationships in multivariate time series is crucial in many scenarios, such as those dealing with financial or neurological data. Many such time series exhibit multiple regimes, i.e., consecutive temporal segments with a priori unknown boundaries, with each regime having its own causal structure. Inferring causal dependencies and regime shifts is critical for analyzing the underlying processes. However, causal structure learning in this setting is challenging due to (1) non stationarity, i.e., each regime can have its own causal graph and mixing function, and (2) complex noise distributions, which may be non Gaussian or heteroscedastic. Existing causal discovery approaches cannot address these challenges, since generally assume stationarity or Gaussian noise with constant variance. Hence, we introduce FANTOM, a unified framework for causal discovery that handles non stationary processes along with non Gaussian and heteroscedastic noises. FANTOM simultaneously infers the number of regimes and their corresponding indices and learns each regime's Directed Acyclic Graph. It uses a Bayesian Expectation Maximization algorithm that maximizes the evidence lower bound of the data log likelihood. On the theoretical side, we prove, under mild assumptions, that temporal heteroscedastic causal models, introduced in FANTOM's formulation, are identifiable in both stationary and non stationary settings. In addition, extensive experiments on synthetic and real data show that FANTOM outperforms existing methods.
△ Less
Submitted 20 June, 2025;
originally announced June 2025.
-
Generating Directed Graphs with Dual Attention and Asymmetric Encoding
Authors:
Alba Carballo-Castro,
Manuel Madeira,
Yiming Qin,
Dorina Thanou,
Pascal Frossard
Abstract:
Directed graphs naturally model systems with asymmetric, ordered relationships, essential to applications in biology, transportation, social networks, and visual understanding. Generating such graphs enables tasks such as simulation, data augmentation and novel instance discovery; however, directed graph generation remains underexplored. We identify two key factors limiting progress in this direct…
▽ More
Directed graphs naturally model systems with asymmetric, ordered relationships, essential to applications in biology, transportation, social networks, and visual understanding. Generating such graphs enables tasks such as simulation, data augmentation and novel instance discovery; however, directed graph generation remains underexplored. We identify two key factors limiting progress in this direction: first, modeling edge directionality introduces a substantially larger dependency space, making the underlying distribution harder to learn; second, the absence of standardized benchmarks hinders rigorous evaluation. Addressing the former requires more expressive models that are sensitive to directional topologies. We propose Directo, the first generative model for directed graphs built upon the discrete flow matching framework. Our approach combines: (i) principled positional encodings tailored to asymmetric pairwise relations, (ii) a dual-attention mechanism capturing both incoming and outgoing dependencies, and (iii) a robust, discrete generative framework. To support evaluation, we introduce a benchmark suite covering synthetic and real-world datasets. It shows that our method performs strongly across diverse settings and even competes with specialized models for particular classes, such as directed acyclic graphs. Our results highlight the effectiveness and generality of our approach, establishing a solid foundation for future research in directed graph generation.
△ Less
Submitted 19 June, 2025;
originally announced June 2025.
-
MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs
Authors:
Ke Wang,
Yiming Qin,
Nikolaos Dimitriadis,
Alessandro Favero,
Pascal Frossard
Abstract:
Language models deployed in real-world systems often require post-hoc updates to incorporate new or corrected knowledge. However, editing such models efficiently and reliably - without retraining or forgetting previous information - remains a major challenge. Existing methods for lifelong model editing either compromise generalization, interfere with past edits, or fail to scale to long editing se…
▽ More
Language models deployed in real-world systems often require post-hoc updates to incorporate new or corrected knowledge. However, editing such models efficiently and reliably - without retraining or forgetting previous information - remains a major challenge. Existing methods for lifelong model editing either compromise generalization, interfere with past edits, or fail to scale to long editing sequences. We propose MEMOIR, a novel scalable framework that injects knowledge through a residual memory, i.e., a dedicated parameter module, while preserving the core capabilities of the pre-trained model. By sparsifying input activations through sample-dependent masks, MEMOIR confines each edit to a distinct subset of the memory parameters, minimizing interference among edits. At inference, it identifies relevant edits by comparing the sparse activation patterns of new queries to those stored during editing. This enables generalization to rephrased queries by activating only the relevant knowledge while suppressing unnecessary memory activation for unrelated prompts. Experiments on question answering, hallucination correction, and out-of-distribution generalization benchmarks across LLaMA-3 and Mistral demonstrate that MEMOIR achieves state-of-the-art performance across reliability, generalization, and locality metrics, scaling to thousands of sequential edits with minimal forgetting.
△ Less
Submitted 9 June, 2025;
originally announced June 2025.
-
Single-Input Multi-Output Model Merging: Leveraging Foundation Models for Dense Multi-Task Learning
Authors:
Juan Garcia Giraldo,
Nikolaos Dimitriadis,
Ke Wang,
Pascal Frossard
Abstract:
Model merging is a flexible and computationally tractable approach to merge single-task checkpoints into a multi-task model. Prior work has solely focused on constrained multi-task settings where there is a one-to-one mapping between a sample and a task, overlooking the paradigm where multiple tasks may operate on the same sample, e.g., scene understanding. In this paper, we focus on the multi-tas…
▽ More
Model merging is a flexible and computationally tractable approach to merge single-task checkpoints into a multi-task model. Prior work has solely focused on constrained multi-task settings where there is a one-to-one mapping between a sample and a task, overlooking the paradigm where multiple tasks may operate on the same sample, e.g., scene understanding. In this paper, we focus on the multi-task setting with single-input-multiple-outputs (SIMO) and show that it qualitatively differs from the single-input-single-output model merging settings studied in the literature due to the existence of task-specific decoders and diverse loss objectives. We identify that existing model merging methods lead to significant performance degradation, primarily due to representation misalignment between the merged encoder and task-specific decoders. We propose two simple and efficient fixes for the SIMO setting to re-align the feature representation after merging. Compared to joint fine-tuning, our approach is computationally effective and flexible, and sheds light into identifying task relationships in an offline manner. Experiments on NYUv2, Cityscapes, and a subset of the Taskonomy dataset demonstrate: (1) task arithmetic suffices to enable multi-task capabilities; however, the representations generated by the merged encoder has to be re-aligned with the task-specific heads; (2) the proposed architecture rivals traditional multi-task learning in performance but requires fewer samples and training steps by leveraging the existence of task-specific models.
△ Less
Submitted 15 April, 2025;
originally announced April 2025.
-
Revisiting Automatic Data Curation for Vision Foundation Models in Digital Pathology
Authors:
Boqi Chen,
Cédric Vincent-Cuaz,
Lydia A. Schoenpflug,
Manuel Madeira,
Lisa Fournier,
Vaishnavi Subramanian,
Sonali Andani,
Samuel Ruiperez-Campillo,
Julia E. Vogt,
Raphaëlle Luisier,
Dorina Thanou,
Viktor H. Koelzer,
Pascal Frossard,
Gabriele Campanella,
Gunnar Rätsch
Abstract:
Vision foundation models (FMs) are accelerating the development of digital pathology algorithms and transforming biomedical research. These models learn, in a self-supervised manner, to represent histological features in highly heterogeneous tiles extracted from whole-slide images (WSIs) of real-world patient samples. The performance of these FMs is significantly influenced by the size, diversity,…
▽ More
Vision foundation models (FMs) are accelerating the development of digital pathology algorithms and transforming biomedical research. These models learn, in a self-supervised manner, to represent histological features in highly heterogeneous tiles extracted from whole-slide images (WSIs) of real-world patient samples. The performance of these FMs is significantly influenced by the size, diversity, and balance of the pre-training data. However, data selection has been primarily guided by expert knowledge at the WSI level, focusing on factors such as disease classification and tissue types, while largely overlooking the granular details available at the tile level. In this paper, we investigate the potential of unsupervised automatic data curation at the tile-level, taking into account 350 million tiles. Specifically, we apply hierarchical clustering trees to pre-extracted tile embeddings, allowing us to sample balanced datasets uniformly across the embedding space of the pretrained FM. We further identify these datasets are subject to a trade-off between size and balance, potentially compromising the quality of representations learned by FMs, and propose tailored batch sampling strategies to mitigate this effect. We demonstrate the effectiveness of our method through improved performance on a diverse range of clinically relevant downstream tasks.
△ Less
Submitted 24 March, 2025;
originally announced March 2025.
-
How Compositional Generalization and Creativity Improve as Diffusion Models are Trained
Authors:
Alessandro Favero,
Antonio Sclocchi,
Francesco Cagnetta,
Pascal Frossard,
Matthieu Wyart
Abstract:
Natural data is often organized as a hierarchical composition of features. How many samples do generative models need in order to learn the composition rules, so as to produce a combinatorially large number of novel data? What signal in the data is exploited to learn those rules? We investigate these questions in the context of diffusion models both theoretically and empirically. Theoretically, we…
▽ More
Natural data is often organized as a hierarchical composition of features. How many samples do generative models need in order to learn the composition rules, so as to produce a combinatorially large number of novel data? What signal in the data is exploited to learn those rules? We investigate these questions in the context of diffusion models both theoretically and empirically. Theoretically, we consider a simple probabilistic context-free grammar - a tree-like graphical model used to represent the hierarchical and compositional structure of data such as language and images. We demonstrate that diffusion models learn the grammar's composition rules with the sample complexity required for clustering features with statistically similar context, a process similar to the word2vec algorithm. However, this clustering emerges hierarchically: higher-level features associated with longer contexts require more data to be identified. This mechanism leads to a sample complexity that scales polynomially with the said context size. As a result, diffusion models trained on an intermediate dataset size generate data coherent up to a certain scale, but lacking global coherence. We test these predictions across different domains and find remarkable agreement: both generated texts and images achieve progressively larger coherence lengths as the training time or dataset size grows. We discuss connections between the hierarchical clustering mechanism we introduce here and the renormalization group in physics.
△ Less
Submitted 4 June, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
NMT-Obfuscator Attack: Ignore a sentence in translation with only one word
Authors:
Sahar Sadrizadeh,
César Descalzo,
Ljiljana Dolamic,
Pascal Frossard
Abstract:
Neural Machine Translation systems are used in diverse applications due to their impressive performance. However, recent studies have shown that these systems are vulnerable to carefully crafted small perturbations to their inputs, known as adversarial attacks. In this paper, we propose a new type of adversarial attack against NMT models. In this attack, we find a word to be added between two sent…
▽ More
Neural Machine Translation systems are used in diverse applications due to their impressive performance. However, recent studies have shown that these systems are vulnerable to carefully crafted small perturbations to their inputs, known as adversarial attacks. In this paper, we propose a new type of adversarial attack against NMT models. In this attack, we find a word to be added between two sentences such that the second sentence is ignored and not translated by the NMT model. The word added between the two sentences is such that the whole adversarial text is natural in the source language. This type of attack can be harmful in practical scenarios since the attacker can hide malicious information in the automatic translation made by the target NMT model. Our experiments show that different NMT models and translation tasks are vulnerable to this type of attack. Our attack can successfully force the NMT models to ignore the second part of the input in the translation for more than 50% of all cases while being able to maintain low perplexity for the whole input.
△ Less
Submitted 19 November, 2024;
originally announced November 2024.
-
Graph-Dictionary Signal Model for Sparse Representations of Multivariate Data
Authors:
William Cappelletti,
Pascal Frossard
Abstract:
Representing and exploiting multivariate signals require capturing complex relations between variables. We define a novel Graph-Dictionary signal model, where a finite set of graphs characterizes relationships in data distribution through a weighted sum of their Laplacians. We propose a framework to infer the graph dictionary representation from observed data, along with a bilinear generalization…
▽ More
Representing and exploiting multivariate signals require capturing complex relations between variables. We define a novel Graph-Dictionary signal model, where a finite set of graphs characterizes relationships in data distribution through a weighted sum of their Laplacians. We propose a framework to infer the graph dictionary representation from observed data, along with a bilinear generalization of the primal-dual splitting algorithm to solve the learning problem. Our new formulation allows to include a priori knowledge on signal properties, as well as on underlying graphs and their coefficients. We show the capability of our method to reconstruct graphs from signals in multiple synthetic settings, where our model outperforms previous baselines. Then, we exploit graph-dictionary representations in a motor imagery decoding task on brain activity data, where we classify imagined motion better than standard methods relying on many more features.
△ Less
Submitted 8 November, 2024;
originally announced November 2024.
-
LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging
Authors:
Ke Wang,
Nikolaos Dimitriadis,
Alessandro Favero,
Guillermo Ortiz-Jimenez,
Francois Fleuret,
Pascal Frossard
Abstract:
Fine-tuning pre-trained models has become the standard approach to endow them with specialized knowledge, but it poses fundamental challenges. In particular, \textit{(i)} fine-tuning often leads to catastrophic forgetting, where improvements on a target domain degrade generalization on other tasks, and \textit{(ii)} merging fine-tuned checkpoints from disparate tasks can lead to significant perfor…
▽ More
Fine-tuning pre-trained models has become the standard approach to endow them with specialized knowledge, but it poses fundamental challenges. In particular, \textit{(i)} fine-tuning often leads to catastrophic forgetting, where improvements on a target domain degrade generalization on other tasks, and \textit{(ii)} merging fine-tuned checkpoints from disparate tasks can lead to significant performance loss. To address these challenges, we introduce LiNeS, Layer-increasing Network Scaling, a post-training editing technique designed to preserve pre-trained generalization while enhancing fine-tuned task performance. LiNeS scales parameter updates linearly based on their layer depth within the network, maintaining shallow layers close to their pre-trained values to preserve general features while allowing deeper layers to retain task-specific representations. In multi-task model merging scenarios, layer-wise scaling of merged parameters reduces negative task interference. LiNeS demonstrates significant improvements in both single-task and multi-task settings across various benchmarks in vision and natural language processing. It mitigates forgetting, enhances out-of-distribution generalization, integrates seamlessly with existing multi-task model merging baselines improving their performance across benchmarks and model sizes, and can boost generalization when merging LLM policies aligned with different rewards via RLHF. Our method is simple to implement, computationally efficient and complementary to many existing techniques. Our source code is available at https://github.com/wang-kee/LiNeS
△ Less
Submitted 28 February, 2025; v1 submitted 22 October, 2024;
originally announced October 2024.
-
Deep End-to-End Survival Analysis with Temporal Consistency
Authors:
Mariana Vargas Vieyra,
Pascal Frossard
Abstract:
In this study, we present a novel Survival Analysis algorithm designed to efficiently handle large-scale longitudinal data. Our approach draws inspiration from Reinforcement Learning principles, particularly the Deep Q-Network paradigm, extending Temporal Learning concepts to Survival Regression. A central idea in our method is temporal consistency, a hypothesis that past and future outcomes in th…
▽ More
In this study, we present a novel Survival Analysis algorithm designed to efficiently handle large-scale longitudinal data. Our approach draws inspiration from Reinforcement Learning principles, particularly the Deep Q-Network paradigm, extending Temporal Learning concepts to Survival Regression. A central idea in our method is temporal consistency, a hypothesis that past and future outcomes in the data evolve smoothly over time. Our framework uniquely incorporates temporal consistency into large datasets by providing a stable training signal that captures long-term temporal relationships and ensures reliable updates. Additionally, the method supports arbitrarily complex architectures, enabling the modeling of intricate temporal dependencies, and allows for end-to-end training. Through numerous experiments we provide empirical evidence demonstrating our framework's ability to exploit temporal consistency across datasets of varying sizes. Moreover, our algorithm outperforms benchmarks on datasets with long sequences, demonstrating its ability to capture long-term patterns. Finally, ablation studies show how our method enhances training stability.
△ Less
Submitted 9 October, 2024;
originally announced October 2024.
-
DeFoG: Discrete Flow Matching for Graph Generation
Authors:
Yiming Qin,
Manuel Madeira,
Dorina Thanou,
Pascal Frossard
Abstract:
Graph generative models are essential across diverse scientific domains by capturing complex distributions over relational data. Among them, graph diffusion models achieve superior performance but face inefficient sampling and limited flexibility due to the tight coupling between training and sampling stages. We introduce DeFoG, a novel graph generative framework that disentangles sampling from tr…
▽ More
Graph generative models are essential across diverse scientific domains by capturing complex distributions over relational data. Among them, graph diffusion models achieve superior performance but face inefficient sampling and limited flexibility due to the tight coupling between training and sampling stages. We introduce DeFoG, a novel graph generative framework that disentangles sampling from training, enabling a broader design space for more effective and efficient model optimization. DeFoG employs a discrete flow-matching formulation that respects the inherent symmetries of graphs. We theoretically ground this disentangled formulation by explicitly relating the training loss to the sampling algorithm and showing that DeFoG faithfully replicates the ground truth graph distribution. Building on these foundations, we thoroughly investigate DeFoG's design space and propose novel sampling methods that significantly enhance performance and reduce the required number of refinement steps. Extensive experiments demonstrate state-of-the-art performance across synthetic, molecular, and digital pathology datasets, covering both unconditional and conditional generation settings. It also outperforms most diffusion-based models with just 5-10% of their sampling steps.
△ Less
Submitted 15 June, 2025; v1 submitted 5 October, 2024;
originally announced October 2024.
-
Sequential Representation Learning via Static-Dynamic Conditional Disentanglement
Authors:
Mathieu Cyrille Simon,
Pascal Frossard,
Christophe De Vleeschouwer
Abstract:
This paper explores self-supervised disentangled representation learning within sequential data, focusing on separating time-independent and time-varying factors in videos. We propose a new model that breaks the usual independence assumption between those factors by explicitly accounting for the causal relationship between the static/dynamic variables and that improves the model expressivity throu…
▽ More
This paper explores self-supervised disentangled representation learning within sequential data, focusing on separating time-independent and time-varying factors in videos. We propose a new model that breaks the usual independence assumption between those factors by explicitly accounting for the causal relationship between the static/dynamic variables and that improves the model expressivity through additional Normalizing Flows. A formal definition of the factors is proposed. This formalism leads to the derivation of sufficient conditions for the ground truth factors to be identifiable, and to the introduction of a novel theoretically grounded disentanglement constraint that can be directly and efficiently incorporated into our new framework. The experiments show that the proposed approach outperforms previous complex state-of-the-art techniques in scenarios where the dynamics of a scene are influenced by its content.
△ Less
Submitted 10 August, 2024;
originally announced August 2024.
-
PriPHiT: Privacy-Preserving Hierarchical Training of Deep Neural Networks
Authors:
Yamin Sepehri,
Pedram Pad,
Pascal Frossard,
L. Andrea Dunbar
Abstract:
The training phase of deep neural networks requires substantial resources and as such is often performed on cloud servers. However, this raises privacy concerns when the training dataset contains sensitive content, e.g., facial or medical images. In this work, we propose a method to perform the training phase of a deep learning model on both an edge device and a cloud server that prevents sensitiv…
▽ More
The training phase of deep neural networks requires substantial resources and as such is often performed on cloud servers. However, this raises privacy concerns when the training dataset contains sensitive content, e.g., facial or medical images. In this work, we propose a method to perform the training phase of a deep learning model on both an edge device and a cloud server that prevents sensitive content being transmitted to the cloud while retaining the desired information. The proposed privacy-preserving method uses adversarial early exits to suppress the sensitive content at the edge and transmits the task-relevant information to the cloud. This approach incorporates noise addition during the training phase to provide a differential privacy guarantee. We extensively test our method on different facial and medical datasets with diverse attributes using various deep learning architectures, showcasing its outstanding performance. We also demonstrate the effectiveness of privacy preservation through successful defenses against different white-box, deep and GAN-based reconstruction attacks. This approach is designed for resource-constrained edge devices, ensuring minimal memory usage and computational overhead.
△ Less
Submitted 16 December, 2024; v1 submitted 9 August, 2024;
originally announced August 2024.
-
Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences
Authors:
Nikolaos Dimitriadis,
Pascal Frossard,
Francois Fleuret
Abstract:
Multi-task trade-offs in machine learning can be addressed via Pareto Front Learning (PFL) methods that parameterize the Pareto Front (PF) with a single model. PFL permits to select the desired operational point during inference, contrary to traditional Multi-Task Learning (MTL) that optimizes for a single trade-off decided prior to training. However, recent PFL methodologies suffer from limited s…
▽ More
Multi-task trade-offs in machine learning can be addressed via Pareto Front Learning (PFL) methods that parameterize the Pareto Front (PF) with a single model. PFL permits to select the desired operational point during inference, contrary to traditional Multi-Task Learning (MTL) that optimizes for a single trade-off decided prior to training. However, recent PFL methodologies suffer from limited scalability, slow convergence, and excessive memory requirements, while exhibiting inconsistent mappings from preference to objective space. We introduce PaLoRA, a novel parameter-efficient method that addresses these limitations in two ways. First, we augment any neural network architecture with task-specific low-rank adapters and continuously parameterize the PF in their convex hull. Our approach steers the original model and the adapters towards learning general and task-specific features, respectively. Second, we propose a deterministic sampling schedule of preference vectors that reinforces this division of labor, enabling faster convergence and strengthening the validity of the mapping from preference to objective space throughout training. Our experiments show that PaLoRA outperforms state-of-the-art MTL and PFL baselines across various datasets, scales to large networks, reducing the memory overhead $23.8-31.7$ times compared with competing PFL baselines in scene understanding benchmarks.
△ Less
Submitted 26 February, 2025; v1 submitted 10 July, 2024;
originally announced July 2024.
-
Subgraph Matching via Partial Optimal Transport
Authors:
Wen-Xin Pan,
Isabel Haasler,
Pascal Frossard
Abstract:
In this work, we propose a novel approach for subgraph matching, the problem of finding a given query graph in a large source graph, based on the fused Gromov-Wasserstein distance. We formulate the subgraph matching problem as a partial fused Gromov-Wasserstein problem, which allows us to build on existing theory and computational methods in order to solve this challenging problem. We extend our m…
▽ More
In this work, we propose a novel approach for subgraph matching, the problem of finding a given query graph in a large source graph, based on the fused Gromov-Wasserstein distance. We formulate the subgraph matching problem as a partial fused Gromov-Wasserstein problem, which allows us to build on existing theory and computational methods in order to solve this challenging problem. We extend our method by employing a subgraph sliding approach, which makes it efficient even for large graphs. In numerical experiments, we showcase that our new algorithms have the ability to outperform state-of-the-art methods for subgraph matching on synthetic as well as realworld datasets. In particular, our methods exhibit robustness with respect to noise in the datasets and achieve very fast query times.
△ Less
Submitted 28 June, 2024;
originally announced June 2024.
-
Generative Modelling of Structurally Constrained Graphs
Authors:
Manuel Madeira,
Clement Vignac,
Dorina Thanou,
Pascal Frossard
Abstract:
Graph diffusion models have emerged as state-of-the-art techniques in graph generation; yet, integrating domain knowledge into these models remains challenging. Domain knowledge is particularly important in real-world scenarios, where invalid generated graphs hinder deployment in practical applications. Unconstrained and conditioned graph diffusion models fail to guarantee such domain-specific str…
▽ More
Graph diffusion models have emerged as state-of-the-art techniques in graph generation; yet, integrating domain knowledge into these models remains challenging. Domain knowledge is particularly important in real-world scenarios, where invalid generated graphs hinder deployment in practical applications. Unconstrained and conditioned graph diffusion models fail to guarantee such domain-specific structural properties. We present ConStruct, a novel framework that enables graph diffusion models to incorporate hard constraints on specific properties, such as planarity or acyclicity. Our approach ensures that the sampled graphs remain within the domain of graphs that satisfy the specified property throughout the entire trajectory in both the forward and reverse processes. This is achieved by introducing an edge-absorbing noise model and a new projector operator. ConStruct demonstrates versatility across several structural and edge-deletion invariant constraints and achieves state-of-the-art performance for both synthetic benchmarks and attributed real-world datasets. For example, by incorporating planarity constraints in digital pathology graph datasets, the proposed method outperforms existing baselines, improving data validity by up to 71.1 percentage points.
△ Less
Submitted 6 December, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
Localizing Task Information for Improved Model Merging and Compression
Authors:
Ke Wang,
Nikolaos Dimitriadis,
Guillermo Ortiz-Jimenez,
François Fleuret,
Pascal Frossard
Abstract:
Model merging and task arithmetic have emerged as promising scalable approaches to merge multiple single-task checkpoints to one multi-task model, but their applicability is reduced by significant performance loss. Previous works have linked these drops to interference in the weight space and erasure of important task-specific features. Instead, in this work we show that the information required t…
▽ More
Model merging and task arithmetic have emerged as promising scalable approaches to merge multiple single-task checkpoints to one multi-task model, but their applicability is reduced by significant performance loss. Previous works have linked these drops to interference in the weight space and erasure of important task-specific features. Instead, in this work we show that the information required to solve each task is still preserved after merging as different tasks mostly use non-overlapping sets of weights. We propose TALL-masks, a method to identify these task supports given a collection of task vectors and show that one can retrieve >99% of the single task accuracy by applying our masks to the multi-task vector, effectively compressing the individual checkpoints. We study the statistics of intersections among constructed masks and reveal the existence of selfish and catastrophic weights, i.e., parameters that are important exclusively to one task and irrelevant to all tasks but detrimental to multi-task fusion. For this reason, we propose Consensus Merging, an algorithm that eliminates such weights and improves the general performance of existing model merging approaches. Our experiments in vision and NLP benchmarks with up to 20 tasks, show that Consensus Merging consistently improves existing approaches. Furthermore, our proposed compression scheme reduces storage from 57Gb to 8.2Gb while retaining 99.7% of original performance.
△ Less
Submitted 13 May, 2024;
originally announced May 2024.
-
PUMA: margin-based data pruning
Authors:
Javier Maroto,
Pascal Frossard
Abstract:
Deep learning has been able to outperform humans in terms of classification accuracy in many tasks. However, to achieve robustness to adversarial perturbations, the best methodologies require to perform adversarial training on a much larger training set that has been typically augmented using generative models (e.g., diffusion models). Our main objective in this work, is to reduce these data requi…
▽ More
Deep learning has been able to outperform humans in terms of classification accuracy in many tasks. However, to achieve robustness to adversarial perturbations, the best methodologies require to perform adversarial training on a much larger training set that has been typically augmented using generative models (e.g., diffusion models). Our main objective in this work, is to reduce these data requirements while achieving the same or better accuracy-robustness trade-offs. We focus on data pruning, where some training samples are removed based on the distance to the model classification boundary (i.e., margin). We find that the existing approaches that prune samples with low margin fails to increase robustness when we add a lot of synthetic data, and explain this situation with a perceptron learning task. Moreover, we find that pruning high margin samples for better accuracy increases the harmful impact of mislabeled perturbed data in adversarial training, hurting both robustness and accuracy. We thus propose PUMA, a new data pruning strategy that computes the margin using DeepFool, and prunes the training samples of highest margin without hurting performance by jointly adjusting the training attack norm on the samples of lowest margin. We show that PUMA can be used on top of the current state-of-the-art methodology in robustness, and it is able to significantly improve the model performance unlike the existing data pruning strategies. Not only PUMA achieves similar robustness with less data, but it also significantly increases the model accuracy, improving the performance trade-off.
△ Less
Submitted 10 May, 2024;
originally announced May 2024.
-
IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection
Authors:
Junbo Yin,
Jianbing Shen,
Runnan Chen,
Wei Li,
Ruigang Yang,
Pascal Frossard,
Wenguan Wang
Abstract:
Bird's eye view (BEV) representation has emerged as a dominant solution for describing 3D space in autonomous driving scenarios. However, objects in the BEV representation typically exhibit small sizes, and the associated point cloud context is inherently sparse, which leads to great challenges for reliable 3D perception. In this paper, we propose IS-Fusion, an innovative multimodal fusion framewo…
▽ More
Bird's eye view (BEV) representation has emerged as a dominant solution for describing 3D space in autonomous driving scenarios. However, objects in the BEV representation typically exhibit small sizes, and the associated point cloud context is inherently sparse, which leads to great challenges for reliable 3D perception. In this paper, we propose IS-Fusion, an innovative multimodal fusion framework that jointly captures the Instance- and Scene-level contextual information. IS-Fusion essentially differs from existing approaches that only focus on the BEV scene-level fusion by explicitly incorporating instance-level multimodal information, thus facilitating the instance-centric tasks like 3D object detection. It comprises a Hierarchical Scene Fusion (HSF) module and an Instance-Guided Fusion (IGF) module. HSF applies Point-to-Grid and Grid-to-Region transformers to capture the multimodal scene context at different granularities. IGF mines instance candidates, explores their relationships, and aggregates the local multimodal context for each instance. These instances then serve as guidance to enhance the scene feature and yield an instance-aware BEV representation. On the challenging nuScenes benchmark, IS-Fusion outperforms all the published multimodal works to date. Code is available at: https://github.com/yinjunbo/IS-Fusion.
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein
Authors:
Hugues Van Assel,
Cédric Vincent-Cuaz,
Nicolas Courty,
Rémi Flamary,
Pascal Frossard,
Titouan Vayer
Abstract:
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets. Traditionally, this involves using dimensionality reduction (DR) methods to project data onto lower-dimensional spaces or organizing points into meaningful clusters (clustering). In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships wi…
▽ More
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets. Traditionally, this involves using dimensionality reduction (DR) methods to project data onto lower-dimensional spaces or organizing points into meaningful clusters (clustering). In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem. This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem. We empirically demonstrate its relevance to the identification of low-dimensional prototypes representing data at different scales, across multiple image and genomic datasets.
△ Less
Submitted 27 June, 2025; v1 submitted 3 February, 2024;
originally announced February 2024.
-
Sparse Training of Discrete Diffusion Models for Graph Generation
Authors:
Yiming Qin,
Clement Vignac,
Pascal Frossard
Abstract:
Generative graph models struggle to scale due to the need to predict the existence or type of edges between all node pairs. To address the resulting quadratic complexity, existing scalable models often impose restrictive assumptions such as a cluster structure within graphs, thus limiting their applicability. To address this, we introduce SparseDiff, a novel diffusion model based on the observatio…
▽ More
Generative graph models struggle to scale due to the need to predict the existence or type of edges between all node pairs. To address the resulting quadratic complexity, existing scalable models often impose restrictive assumptions such as a cluster structure within graphs, thus limiting their applicability. To address this, we introduce SparseDiff, a novel diffusion model based on the observation that almost all large graphs are sparse. By selecting a subset of edges, SparseDiff effectively leverages sparse graph representations both during the noising process and within the denoising network, which ensures that space complexity scales linearly with the number of chosen edges. During inference, SparseDiff progressively fills the adjacency matrix with the selected subsets of edges, mirroring the training process. Our model demonstrates state-of-the-art performance across multiple metrics on both small and large datasets, confirming its effectiveness and robustness across varying graph sizes. It also ensures faster convergence, particularly on larger graphs, achieving a fourfold speedup on the large Ego dataset compared to dense models, thereby paving the way for broader applications.
△ Less
Submitted 22 May, 2024; v1 submitted 3 November, 2023;
originally announced November 2023.
-
Causal Temporal Regime Structure Learning
Authors:
Abdellah Rahmani,
Pascal Frossard
Abstract:
Understanding causal relationships in multivariate time series is essential for predicting and controlling dynamic systems in fields like economics, neuroscience, and climate science. However, existing causal discovery methods often assume stationarity, limiting their effectiveness when time series consist of sequential regimes, consecutive temporal segments with unknown boundaries and changing ca…
▽ More
Understanding causal relationships in multivariate time series is essential for predicting and controlling dynamic systems in fields like economics, neuroscience, and climate science. However, existing causal discovery methods often assume stationarity, limiting their effectiveness when time series consist of sequential regimes, consecutive temporal segments with unknown boundaries and changing causal structures. In this work, we firstly introduce a framework to describe and model such time series. Then, we present CASTOR, a novel method that concurrently learns the Directed Acyclic Graph (DAG) for each regime while determining the number of regimes and their sequential arrangement. CASTOR optimizes the data log-likelihood using an expectation-maximization algorithm, alternating between assigning regime indices (expectation step) and inferring causal relationships in each regime (maximization step). We establish the identifiability of the regimes and DAGs within our framework. Extensive experiments show that CASTOR consistently outperforms existing causal discovery models in detecting different regimes and learning their DAGs across various settings, including linear and nonlinear causal relationships, on both synthetic and real world datasets.
△ Less
Submitted 19 February, 2025; v1 submitted 2 November, 2023;
originally announced November 2023.
-
Tertiary Lymphoid Structures Generation through Graph-based Diffusion
Authors:
Manuel Madeira,
Dorina Thanou,
Pascal Frossard
Abstract:
Graph-based representation approaches have been proven to be successful in the analysis of biomedical data, due to their capability of capturing intricate dependencies between biological entities, such as the spatial organization of different cell types in a tumor tissue. However, to further enhance our understanding of the underlying governing biological mechanisms, it is important to accurately…
▽ More
Graph-based representation approaches have been proven to be successful in the analysis of biomedical data, due to their capability of capturing intricate dependencies between biological entities, such as the spatial organization of different cell types in a tumor tissue. However, to further enhance our understanding of the underlying governing biological mechanisms, it is important to accurately capture the actual distributions of such complex data. Graph-based deep generative models are specifically tailored to accomplish that. In this work, we leverage state-of-the-art graph-based diffusion models to generate biologically meaningful cell-graphs. In particular, we show that the adopted graph diffusion model is able to accurately learn the distribution of cells in terms of their tertiary lymphoid structures (TLS) content, a well-established biomarker for evaluating the cancer progression in oncology research. Additionally, we further illustrate the utility of the learned generative models for data augmentation in a TLS classification task. To the best of our knowledge, this is the first work that leverages the power of graph diffusion models in generating meaningful biological cell structures.
△ Less
Submitted 10 October, 2023;
originally announced October 2023.
-
Pi-DUAL: Using Privileged Information to Distinguish Clean from Noisy Labels
Authors:
Ke Wang,
Guillermo Ortiz-Jimenez,
Rodolphe Jenatton,
Mark Collier,
Efi Kokiopoulou,
Pascal Frossard
Abstract:
Label noise is a pervasive problem in deep learning that often compromises the generalization performance of trained models. Recently, leveraging privileged information (PI) -- information available only during training but not at test time -- has emerged as an effective approach to mitigate this issue. Yet, existing PI-based methods have failed to consistently outperform their no-PI counterparts…
▽ More
Label noise is a pervasive problem in deep learning that often compromises the generalization performance of trained models. Recently, leveraging privileged information (PI) -- information available only during training but not at test time -- has emerged as an effective approach to mitigate this issue. Yet, existing PI-based methods have failed to consistently outperform their no-PI counterparts in terms of preventing overfitting to label noise. To address this deficiency, we introduce Pi-DUAL, an architecture designed to harness PI to distinguish clean from wrong labels. Pi-DUAL decomposes the output logits into a prediction term, based on conventional input features, and a noise-fitting term influenced solely by PI. A gating mechanism steered by PI adaptively shifts focus between these terms, allowing the model to implicitly separate the learning paths of clean and wrong labels. Empirically, Pi-DUAL achieves significant performance improvements on key PI benchmarks (e.g., +6.8% on ImageNet-PI), establishing a new state-of-the-art test set accuracy. Additionally, Pi-DUAL is a potent method for identifying noisy samples post-training, outperforming other strong methods at this task. Overall, Pi-DUAL is a simple, scalable and practical approach for mitigating the effects of label noise in a variety of real-world scenarios with PI.
△ Less
Submitted 28 May, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
-
A Classification-Guided Approach for Adversarial Attacks against Neural Machine Translation
Authors:
Sahar Sadrizadeh,
Ljiljana Dolamic,
Pascal Frossard
Abstract:
Neural Machine Translation (NMT) models have been shown to be vulnerable to adversarial attacks, wherein carefully crafted perturbations of the input can mislead the target model. In this paper, we introduce ACT, a novel adversarial attack framework against NMT systems guided by a classifier. In our attack, the adversary aims to craft meaning-preserving adversarial examples whose translations in t…
▽ More
Neural Machine Translation (NMT) models have been shown to be vulnerable to adversarial attacks, wherein carefully crafted perturbations of the input can mislead the target model. In this paper, we introduce ACT, a novel adversarial attack framework against NMT systems guided by a classifier. In our attack, the adversary aims to craft meaning-preserving adversarial examples whose translations in the target language by the NMT model belong to a different class than the original translations. Unlike previous attacks, our new approach has a more substantial effect on the translation by altering the overall meaning, which then leads to a different class determined by an oracle classifier. To evaluate the robustness of NMT models to our attack, we propose enhancements to existing black-box word-replacement-based attacks by incorporating output translations of the target NMT model and the output logits of a classifier within the attack process. Extensive experiments, including a comparison with existing untargeted attacks, show that our attack is considerably more successful in altering the class of the output translation and has more effect on the translation. This new paradigm can reveal the vulnerabilities of NMT systems by focusing on the class of translation rather than the mere translation quality as studied traditionally.
△ Less
Submitted 22 February, 2024; v1 submitted 29 August, 2023;
originally announced August 2023.
-
Online Network Source Optimization with Graph-Kernel MAB
Authors:
Laura Toni,
Pascal Frossard
Abstract:
We propose Grab-UCB, a graph-kernel multi-arms bandit algorithm to learn online the optimal source placement in large scale networks, such that the reward obtained from a priori unknown network processes is maximized. The uncertainty calls for online learning, which suffers however from the curse of dimensionality. To achieve sample efficiency, we describe the network processes with an adaptive gr…
▽ More
We propose Grab-UCB, a graph-kernel multi-arms bandit algorithm to learn online the optimal source placement in large scale networks, such that the reward obtained from a priori unknown network processes is maximized. The uncertainty calls for online learning, which suffers however from the curse of dimensionality. To achieve sample efficiency, we describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations. This enables a data-efficient learning framework, whose learning rate scales with the dimension of the spectral representation model instead of the one of the network. We then propose Grab-UCB, an online sequential decision strategy that learns the parameters of the spectral representation while optimizing the action strategy. We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy We introduce a computationally simplified solving method, Grab-arm-Light, an algorithm that walks along the edges of the polytope representing the objective function. Simulations results show that the proposed online learning algorithm outperforms baseline offline methods that typically separate the learning phase from the testing one. The results confirm the theoretical findings, and further highlight the gain of the proposed online learning strategy in terms of cumulative regret, sample efficiency and computational complexity.
△ Less
Submitted 7 July, 2023;
originally announced July 2023.
-
DARE: Towards Robust Text Explanations in Biomedical and Healthcare Applications
Authors:
Adam Ivankay,
Mattia Rigotti,
Pascal Frossard
Abstract:
Along with the successful deployment of deep neural networks in several application domains, the need to unravel the black-box nature of these networks has seen a significant increase recently. Several methods have been introduced to provide insight into the inference process of deep neural networks. However, most of these explainability methods have been shown to be brittle in the face of adversa…
▽ More
Along with the successful deployment of deep neural networks in several application domains, the need to unravel the black-box nature of these networks has seen a significant increase recently. Several methods have been introduced to provide insight into the inference process of deep neural networks. However, most of these explainability methods have been shown to be brittle in the face of adversarial perturbations of their inputs in the image and generic textual domain. In this work we show that this phenomenon extends to specific and important high stakes domains like biomedical datasets. In particular, we observe that the robustness of explanations should be characterized in terms of the accuracy of the explanation in linking a model's inputs and its decisions - faithfulness - and its relevance from the perspective of domain experts - plausibility. This is crucial to prevent explanations that are inaccurate but still look convincing in the context of the domain at hand. To this end, we show how to adapt current attribution robustness estimation methods to a given domain, so as to take into account domain-specific plausibility. This results in our DomainAdaptiveAREstimator (DARE) attribution robustness estimator, allowing us to properly characterize the domain-specific robustness of faithful explanations. Next, we provide two methods, adversarial training and FAR training, to mitigate the brittleness characterized by DARE, allowing us to train networks that display robust attributions. Finally, we empirically validate our methods with extensive experiments on three established biomedical benchmarks.
△ Less
Submitted 5 July, 2023;
originally announced July 2023.
-
A Relaxed Optimization Approach for Adversarial Attacks against Neural Machine Translation Models
Authors:
Sahar Sadrizadeh,
Clément Barbier,
Ljiljana Dolamic,
Pascal Frossard
Abstract:
In this paper, we propose an optimization-based adversarial attack against Neural Machine Translation (NMT) models. First, we propose an optimization problem to generate adversarial examples that are semantically similar to the original sentences but destroy the translation generated by the target NMT model. This optimization problem is discrete, and we propose a continuous relaxation to solve it.…
▽ More
In this paper, we propose an optimization-based adversarial attack against Neural Machine Translation (NMT) models. First, we propose an optimization problem to generate adversarial examples that are semantically similar to the original sentences but destroy the translation generated by the target NMT model. This optimization problem is discrete, and we propose a continuous relaxation to solve it. With this relaxation, we find a probability distribution for each token in the adversarial example, and then we can generate multiple adversarial examples by sampling from these distributions. Experimental results show that our attack significantly degrades the translation quality of multiple NMT models while maintaining the semantic similarity between the original and adversarial sentences. Furthermore, our attack outperforms the baselines in terms of success rate, similarity preservation, effect on translation quality, and token error rate. Finally, we propose a black-box extension of our attack by sampling from an optimized probability distribution for a reference model whose gradients are accessible.
△ Less
Submitted 14 June, 2023;
originally announced June 2023.
-
Flexible Channel Dimensions for Differentiable Architecture Search
Authors:
Ahmet Caner Yüzügüler,
Nikolaos Dimitriadis,
Pascal Frossard
Abstract:
Finding optimal channel dimensions (i.e., the number of filters in DNN layers) is essential to design DNNs that perform well under computational resource constraints. Recent work in neural architecture search aims at automating the optimization of the DNN model implementation. However, existing neural architecture search methods for channel dimensions rely on fixed search spaces, which prevents ac…
▽ More
Finding optimal channel dimensions (i.e., the number of filters in DNN layers) is essential to design DNNs that perform well under computational resource constraints. Recent work in neural architecture search aims at automating the optimization of the DNN model implementation. However, existing neural architecture search methods for channel dimensions rely on fixed search spaces, which prevents achieving an efficient and fully automated solution. In this work, we propose a novel differentiable neural architecture search method with an efficient dynamic channel allocation algorithm to enable a flexible search space for channel dimensions. We show that the proposed framework is able to find DNN architectures that are equivalent to previous methods in task accuracy and inference latency for the CIFAR-10 dataset with an improvement of $1.3-1.7\times$ in GPU-hours and $1.5-1.7\times$ in the memory requirements during the architecture search stage. Moreover, the proposed frameworks do not require a well-engineered search space a priori, which is an important step towards fully automated design of DNN architectures.
△ Less
Submitted 13 June, 2023;
originally announced June 2023.
-
Frequency-Based Vulnerability Analysis of Deep Learning Models against Image Corruptions
Authors:
Harshitha Machiraju,
Michael H. Herzog,
Pascal Frossard
Abstract:
Deep learning models often face challenges when handling real-world image corruptions. In response, researchers have developed image corruption datasets to evaluate the performance of deep neural networks in handling such corruptions. However, these datasets have a significant limitation: they do not account for all corruptions encountered in real-life scenarios. To address this gap, we present MU…
▽ More
Deep learning models often face challenges when handling real-world image corruptions. In response, researchers have developed image corruption datasets to evaluate the performance of deep neural networks in handling such corruptions. However, these datasets have a significant limitation: they do not account for all corruptions encountered in real-life scenarios. To address this gap, we present MUFIA (Multiplicative Filter Attack), an algorithm designed to identify the specific types of corruptions that can cause models to fail. Our algorithm identifies the combination of image frequency components that render a model susceptible to misclassification while preserving the semantic similarity to the original image. We find that even state-of-the-art models trained to be robust against known common corruptions struggle against the low visibility-based corruptions crafted by MUFIA. This highlights the need for more comprehensive approaches to enhance model robustness against a wider range of real-world image corruptions.
△ Less
Submitted 12 June, 2023;
originally announced June 2023.
-
Bures-Wasserstein Means of Graphs
Authors:
Isabel Haasler,
Pascal Frossard
Abstract:
Finding the mean of sampled data is a fundamental task in machine learning and statistics. However, in cases where the data samples are graph objects, defining a mean is an inherently difficult task. We propose a novel framework for defining a graph mean via embeddings in the space of smooth graph signal distributions, where graph similarity can be measured using the Wasserstein metric. By finding…
▽ More
Finding the mean of sampled data is a fundamental task in machine learning and statistics. However, in cases where the data samples are graph objects, defining a mean is an inherently difficult task. We propose a novel framework for defining a graph mean via embeddings in the space of smooth graph signal distributions, where graph similarity can be measured using the Wasserstein metric. By finding a mean in this embedding space, we can recover a mean graph that preserves structural information. We establish the existence and uniqueness of the novel graph mean, and provide an iterative algorithm for computing it. To highlight the potential of our framework as a valuable tool for practical applications in machine learning, it is evaluated on various tasks, including k-means clustering of structured aligned graphs, classification of functional brain networks, and semi-supervised node classification in multi-layer graphs. Our experimental results demonstrate that our approach achieves consistent performance, outperforms existing baseline approaches, and improves the performance of state-of-the-art methods.
△ Less
Submitted 1 March, 2024; v1 submitted 31 May, 2023;
originally announced May 2023.
-
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models
Authors:
Guillermo Ortiz-Jimenez,
Alessandro Favero,
Pascal Frossard
Abstract:
Task arithmetic has recently emerged as a cost-effective and scalable approach to edit pre-trained models directly in weight space: By adding the fine-tuned weights of different tasks, the model's performance can be improved on these tasks, while negating them leads to task forgetting. Yet, our understanding of the effectiveness of task arithmetic and its underlying principles remains limited. We…
▽ More
Task arithmetic has recently emerged as a cost-effective and scalable approach to edit pre-trained models directly in weight space: By adding the fine-tuned weights of different tasks, the model's performance can be improved on these tasks, while negating them leads to task forgetting. Yet, our understanding of the effectiveness of task arithmetic and its underlying principles remains limited. We present a comprehensive study of task arithmetic in vision-language models and show that weight disentanglement is the crucial factor that makes it effective. This property arises during pre-training and manifests when distinct directions in weight space govern separate, localized regions in function space associated with the tasks. Notably, we show that fine-tuning models in their tangent space by linearizing them amplifies weight disentanglement. This leads to substantial performance improvements across multiple task arithmetic benchmarks and diverse models. Building on these findings, we provide theoretical and empirical analyses of the neural tangent kernel (NTK) of these models and establish a compelling link between task arithmetic and the spatial localization of the NTK eigenfunctions. Overall, our work uncovers novel insights into the fundamental mechanisms of task arithmetic and offers a more reliable and effective approach to edit pre-trained models through the NTK linearization.
△ Less
Submitted 21 November, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
SequeL: A Continual Learning Library in PyTorch and JAX
Authors:
Nikolaos Dimitriadis,
Francois Fleuret,
Pascal Frossard
Abstract:
Continual Learning is an important and challenging problem in machine learning, where models must adapt to a continuous stream of new data without forgetting previously acquired knowledge. While existing frameworks are built on PyTorch, the rising popularity of JAX might lead to divergent codebases, ultimately hindering reproducibility and progress. To address this problem, we introduce SequeL, a…
▽ More
Continual Learning is an important and challenging problem in machine learning, where models must adapt to a continuous stream of new data without forgetting previously acquired knowledge. While existing frameworks are built on PyTorch, the rising popularity of JAX might lead to divergent codebases, ultimately hindering reproducibility and progress. To address this problem, we introduce SequeL, a flexible and extensible library for Continual Learning that supports both PyTorch and JAX frameworks. SequeL provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches. The library is designed towards modularity and simplicity, making the API suitable for both researchers and practitioners. We release SequeL\footnote{\url{https://github.com/nik-dim/sequel}} as an open-source library, enabling researchers and developers to easily experiment and extend the library for their own purposes.
△ Less
Submitted 21 April, 2023;
originally announced April 2023.
-
Knowledge-Distilled Graph Neural Networks for Personalized Epileptic Seizure Detection
Authors:
Qinyue Zheng,
Arun Venkitaraman,
Simona Petravic,
Pascal Frossard
Abstract:
Wearable devices for seizure monitoring detection could significantly improve the quality of life of epileptic patients. However, existing solutions that mostly rely on full electrode set of electroencephalogram (EEG) measurements could be inconvenient for every day use. In this paper, we propose a novel knowledge distillation approach to transfer the knowledge from a sophisticated seizure detecto…
▽ More
Wearable devices for seizure monitoring detection could significantly improve the quality of life of epileptic patients. However, existing solutions that mostly rely on full electrode set of electroencephalogram (EEG) measurements could be inconvenient for every day use. In this paper, we propose a novel knowledge distillation approach to transfer the knowledge from a sophisticated seizure detector (called the teacher) trained on data from the full set of electrodes to learn new detectors (called the student). They are both providing lightweight implementations and significantly reducing the number of electrodes needed for recording the EEG. We consider the case where the teacher and the student seizure detectors are graph neural networks (GNN), since these architectures actively use the connectivity information. We consider two cases (a) when a single student is learnt for all the patients using preselected channels; and (b) when personalized students are learnt for every individual patient, with personalized channel selection using a Gumbelsoftmax approach. Our experiments on the publicly available Temple University Hospital EEG Seizure Data Corpus (TUSZ) show that both knowledge-distillation and personalization play significant roles in improving performance of seizure detection, particularly for patients with scarce EEG data. We observe that using as few as two channels, we are able to obtain competitive seizure detection performance. This, in turn, shows the potential of our approach in more realistic scenario of wearable devices for personalized monitoring of seizures, even with few recordings.
△ Less
Submitted 3 April, 2023;
originally announced April 2023.
-
Hierarchical Training of Deep Neural Networks Using Early Exiting
Authors:
Yamin Sepehri,
Pedram Pad,
Ahmet Caner Yüzügüler,
Pascal Frossard,
L. Andrea Dunbar
Abstract:
Deep neural networks provide state-of-the-art accuracy for vision tasks but they require significant resources for training. Thus, they are trained on cloud servers far from the edge devices that acquire the data. This issue increases communication cost, runtime and privacy concerns. In this study, a novel hierarchical training method for deep neural networks is proposed that uses early exits in a…
▽ More
Deep neural networks provide state-of-the-art accuracy for vision tasks but they require significant resources for training. Thus, they are trained on cloud servers far from the edge devices that acquire the data. This issue increases communication cost, runtime and privacy concerns. In this study, a novel hierarchical training method for deep neural networks is proposed that uses early exits in a divided architecture between edge and cloud workers to reduce the communication cost, training runtime and privacy concerns. The method proposes a brand-new use case for early exits to separate the backward pass of neural networks between the edge and the cloud during the training phase. We address the issues of most available methods that due to the sequential nature of the training phase, cannot train the levels of hierarchy simultaneously or they do it with the cost of compromising privacy. In contrast, our method can use both edge and cloud workers simultaneously, does not share the raw input data with the cloud and does not require communication during the backward pass. Several simulations and on-device experiments for different neural network architectures demonstrate the effectiveness of this method. It is shown that the proposed method reduces the training runtime for VGG-16 and ResNet-18 architectures by 29% and 61% in CIFAR-10 classification and by 25% and 81% in Tiny ImageNet classification when the communication with the cloud is done over a low bit rate channel. This gain in the runtime is achieved whilst the accuracy drop is negligible. This method is advantageous for online learning of high-accuracy deep neural networks on sensor-holding low-resource devices such as mobile phones or robots as a part of an edge-cloud system, making them more flexible in facing new tasks and classes of data.
△ Less
Submitted 20 May, 2024; v1 submitted 4 March, 2023;
originally announced March 2023.
-
Targeted Adversarial Attacks against Neural Machine Translation
Authors:
Sahar Sadrizadeh,
AmirHossein Dabiri Aghdam,
Ljiljana Dolamic,
Pascal Frossard
Abstract:
Neural Machine Translation (NMT) systems are used in various applications. However, it has been shown that they are vulnerable to very small perturbations of their inputs, known as adversarial attacks. In this paper, we propose a new targeted adversarial attack against NMT models. In particular, our goal is to insert a predefined target keyword into the translation of the adversarial sentence whil…
▽ More
Neural Machine Translation (NMT) systems are used in various applications. However, it has been shown that they are vulnerable to very small perturbations of their inputs, known as adversarial attacks. In this paper, we propose a new targeted adversarial attack against NMT models. In particular, our goal is to insert a predefined target keyword into the translation of the adversarial sentence while maintaining similarity between the original sentence and the perturbed one in the source domain. To this aim, we propose an optimization problem, including an adversarial loss term and a similarity term. We use gradient projection in the embedding space to craft an adversarial sentence. Experimental results show that our attack outperforms Seq2Sick, the other targeted adversarial attack against NMT models, in terms of success rate and decrease in translation quality. Our attack succeeds in inserting a keyword into the translation for more than 75% of sentences while similarity with the original sentence stays preserved.
△ Less
Submitted 2 March, 2023;
originally announced March 2023.
-
MiDi: Mixed Graph and 3D Denoising Diffusion for Molecule Generation
Authors:
Clement Vignac,
Nagham Osman,
Laura Toni,
Pascal Frossard
Abstract:
This work introduces MiDi, a novel diffusion model for jointly generating molecular graphs and their corresponding 3D arrangement of atoms. Unlike existing methods that rely on predefined rules to determine molecular bonds based on the 3D conformation, MiDi offers an end-to-end differentiable approach that streamlines the molecule generation process. Our experimental results demonstrate the effect…
▽ More
This work introduces MiDi, a novel diffusion model for jointly generating molecular graphs and their corresponding 3D arrangement of atoms. Unlike existing methods that rely on predefined rules to determine molecular bonds based on the 3D conformation, MiDi offers an end-to-end differentiable approach that streamlines the molecule generation process. Our experimental results demonstrate the effectiveness of this approach. On the challenging GEOM-DRUGS dataset, MiDi generates 92% of stable molecules, against 6% for the previous EDM model that uses interatomic distances for bond prediction, and 40% using EDM followed by an algorithm that directly optimize bond orders for validity. Our code is available at github.com/cvignac/MiDi.
△ Less
Submitted 5 June, 2023; v1 submitted 17 February, 2023;
originally announced February 2023.
-
TransFool: An Adversarial Attack against Neural Machine Translation Models
Authors:
Sahar Sadrizadeh,
Ljiljana Dolamic,
Pascal Frossard
Abstract:
Deep neural networks have been shown to be vulnerable to small perturbations of their inputs, known as adversarial attacks. In this paper, we investigate the vulnerability of Neural Machine Translation (NMT) models to adversarial attacks and propose a new attack algorithm called TransFool. To fool NMT models, TransFool builds on a multi-term optimization problem and a gradient projection step. By…
▽ More
Deep neural networks have been shown to be vulnerable to small perturbations of their inputs, known as adversarial attacks. In this paper, we investigate the vulnerability of Neural Machine Translation (NMT) models to adversarial attacks and propose a new attack algorithm called TransFool. To fool NMT models, TransFool builds on a multi-term optimization problem and a gradient projection step. By integrating the embedding representation of a language model, we generate fluent adversarial examples in the source language that maintain a high level of semantic similarity with the clean samples. Experimental results demonstrate that, for different translation tasks and NMT architectures, our white-box attack can severely degrade the translation quality while the semantic similarity between the original and the adversarial sentences stays high. Moreover, we show that TransFool is transferable to unknown target models. Finally, based on automatic and human evaluations, TransFool leads to improvement in terms of success rate, semantic similarity, and fluency compared to the existing attacks both in white-box and black-box settings. Thus, TransFool permits us to better characterize the vulnerability of NMT models and outlines the necessity to design strong defense mechanisms and more robust NMT systems for real-life applications.
△ Less
Submitted 16 June, 2023; v1 submitted 2 February, 2023;
originally announced February 2023.
-
Adversarial training with informed data selection
Authors:
Marcele O. K. Mendonça,
Javier Maroto,
Pascal Frossard,
Paulo S. R. Diniz
Abstract:
With the increasing amount of available data and advances in computing capabilities, deep neural networks (DNNs) have been successfully employed to solve challenging tasks in various areas, including healthcare, climate, and finance. Nevertheless, state-of-the-art DNNs are susceptible to quasi-imperceptible perturbed versions of the original images -- adversarial examples. These perturbations of t…
▽ More
With the increasing amount of available data and advances in computing capabilities, deep neural networks (DNNs) have been successfully employed to solve challenging tasks in various areas, including healthcare, climate, and finance. Nevertheless, state-of-the-art DNNs are susceptible to quasi-imperceptible perturbed versions of the original images -- adversarial examples. These perturbations of the network input can lead to disastrous implications in critical areas where wrong decisions can directly affect human lives. Adversarial training is the most efficient solution to defend the network against these malicious attacks. However, adversarial trained networks generally come with lower clean accuracy and higher computational complexity. This work proposes a data selection (DS) strategy to be applied in the mini-batch training. Based on the cross-entropy loss, the most relevant samples in the batch are selected to update the model parameters in the backpropagation. The simulation results show that a good compromise can be obtained regarding robustness and standard accuracy, whereas the computational complexity of the backpropagation pass is reduced.
△ Less
Submitted 7 January, 2023;
originally announced January 2023.
-
Estimating the Adversarial Robustness of Attributions in Text with Transformers
Authors:
Adam Ivankay,
Mattia Rigotti,
Ivan Girardi,
Chiara Marchiori,
Pascal Frossard
Abstract:
Explanations are crucial parts of deep neural network (DNN) classifiers. In high stakes applications, faithful and robust explanations are important to understand and gain trust in DNN classifiers. However, recent work has shown that state-of-the-art attribution methods in text classifiers are susceptible to imperceptible adversarial perturbations that alter explanations significantly while mainta…
▽ More
Explanations are crucial parts of deep neural network (DNN) classifiers. In high stakes applications, faithful and robust explanations are important to understand and gain trust in DNN classifiers. However, recent work has shown that state-of-the-art attribution methods in text classifiers are susceptible to imperceptible adversarial perturbations that alter explanations significantly while maintaining the correct prediction outcome. If undetected, this can critically mislead the users of DNNs. Thus, it is crucial to understand the influence of such adversarial perturbations on the networks' explanations and their perceptibility. In this work, we establish a novel definition of attribution robustness (AR) in text classification, based on Lipschitz continuity. Crucially, it reflects both attribution change induced by adversarial input alterations and perceptibility of such alterations. Moreover, we introduce a wide set of text similarity measures to effectively capture locality between two text samples and imperceptibility of adversarial perturbations in text. We then propose our novel TransformerExplanationAttack (TEA), a strong adversary that provides a tight estimation for attribution robustness in text classification. TEA uses state-of-the-art language models to extract word substitutions that result in fluent, contextual adversarial samples. Finally, with experiments on several text classification architectures, we show that TEA consistently outperforms current state-of-the-art AR estimators, yielding perturbations that alter explanations to a greater extent while being more fluent and less perceptible.
△ Less
Submitted 18 December, 2022;
originally announced December 2022.
-
SSDA3D: Semi-supervised Domain Adaptation for 3D Object Detection from Point Cloud
Authors:
Yan Wang,
Junbo Yin,
Wei Li,
Pascal Frossard,
Ruigang Yang,
Jianbing Shen
Abstract:
LiDAR-based 3D object detection is an indispensable task in advanced autonomous driving systems. Though impressive detection results have been achieved by superior 3D detectors, they suffer from significant performance degeneration when facing unseen domains, such as different LiDAR configurations, different cities, and weather conditions. The mainstream approaches tend to solve these challenges b…
▽ More
LiDAR-based 3D object detection is an indispensable task in advanced autonomous driving systems. Though impressive detection results have been achieved by superior 3D detectors, they suffer from significant performance degeneration when facing unseen domains, such as different LiDAR configurations, different cities, and weather conditions. The mainstream approaches tend to solve these challenges by leveraging unsupervised domain adaptation (UDA) techniques. However, these UDA solutions just yield unsatisfactory 3D detection results when there is a severe domain shift, e.g., from Waymo (64-beam) to nuScenes (32-beam). To address this, we present a novel Semi-Supervised Domain Adaptation method for 3D object detection (SSDA3D), where only a few labeled target data is available, yet can significantly improve the adaptation performance. In particular, our SSDA3D includes an Inter-domain Adaptation stage and an Intra-domain Generalization stage. In the first stage, an Inter-domain Point-CutMix module is presented to efficiently align the point cloud distribution across domains. The Point-CutMix generates mixed samples of an intermediate domain, thus encouraging to learn domain-invariant knowledge. Then, in the second stage, we further enhance the model for better generalization on the unlabeled target set. This is achieved by exploring Intra-domain Point-MixUp in semi-supervised learning, which essentially regularizes the pseudo label distribution. Experiments from Waymo to nuScenes show that, with only 10% labeled target data, our SSDA3D can surpass the fully-supervised oracle model with 100% target label. Our code is available at https://github.com/yinjunbo/SSDA3D.
△ Less
Submitted 6 December, 2022;
originally announced December 2022.
-
A Meta-GNN approach to personalized seizure detection and classification
Authors:
Abdellah Rahmani,
Arun Venkitaraman,
Pascal Frossard
Abstract:
In this paper, we propose a personalized seizure detection and classification framework that quickly adapts to a specific patient from limited seizure samples. We achieve this by combining two novel paradigms that have recently seen much success in a wide variety of real-world applications: graph neural networks (GNN), and meta-learning. We train a Meta-GNN based classifier that learns a global mo…
▽ More
In this paper, we propose a personalized seizure detection and classification framework that quickly adapts to a specific patient from limited seizure samples. We achieve this by combining two novel paradigms that have recently seen much success in a wide variety of real-world applications: graph neural networks (GNN), and meta-learning. We train a Meta-GNN based classifier that learns a global model from a set of training patients such that this global model can eventually be adapted to a new unseen patient using very limited samples. We apply our approach on the TUSZ-dataset, one of the largest and publicly available benchmark datasets for epilepsy. We show that our method outperforms the baselines by reaching 82.7% on accuracy and 82.08% on F1 score after only 20 iterations on new unseen patients.
△ Less
Submitted 20 March, 2023; v1 submitted 1 November, 2022;
originally announced November 2022.
-
Maximum Likelihood Distillation for Robust Modulation Classification
Authors:
Javier Maroto,
Gérôme Bovet,
Pascal Frossard
Abstract:
Deep Neural Networks are being extensively used in communication systems and Automatic Modulation Classification (AMC) in particular. However, they are very susceptible to small adversarial perturbations that are carefully crafted to change the network decision. In this work, we build on knowledge distillation ideas and adversarial training in order to build more robust AMC systems. We first outli…
▽ More
Deep Neural Networks are being extensively used in communication systems and Automatic Modulation Classification (AMC) in particular. However, they are very susceptible to small adversarial perturbations that are carefully crafted to change the network decision. In this work, we build on knowledge distillation ideas and adversarial training in order to build more robust AMC systems. We first outline the importance of the quality of the training data in terms of accuracy and robustness of the model. We then propose to use the Maximum Likelihood function, which could solve the AMC problem in offline settings, to generate better training labels. Those labels teach the model to be uncertain in challenging conditions, which permits to increase the accuracy, as well as the robustness of the model when combined with adversarial training. Interestingly, we observe that this increase in performance transfers to online settings, where the Maximum Likelihood function cannot be used in practice. Overall, this work highlights the potential of learning to be uncertain in difficult scenarios, compared to directly removing label noise.
△ Less
Submitted 1 November, 2022;
originally announced November 2022.
-
Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models
Authors:
Nikolaos Dimitriadis,
Pascal Frossard,
François Fleuret
Abstract:
In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution, superior to all its single-task trained counterparts. Since there is often not a unique solution optimal for all tasks, practitioners have to balance tradeoffs between tasks' performance, and resort to optimality in the Pareto sense. Most MTL methodo…
▽ More
In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution, superior to all its single-task trained counterparts. Since there is often not a unique solution optimal for all tasks, practitioners have to balance tradeoffs between tasks' performance, and resort to optimality in the Pareto sense. Most MTL methodologies either completely neglect this aspect, and instead of aiming at learning a Pareto Front, produce one solution predefined by their optimization schemes, or produce diverse but discrete solutions. Recent approaches parameterize the Pareto Front via neural networks, leading to complex mappings from tradeoff to objective space. In this paper, we conjecture that the Pareto Front admits a linear parameterization in parameter space, which leads us to propose \textit{Pareto Manifold Learning}, an ensembling method in weight space. Our approach produces a continuous Pareto Front in a single training run, that allows to modulate the performance on each task during inference. Experiments on multi-task learning benchmarks, ranging from image classification to tabular datasets and scene understanding, show that \textit{Pareto Manifold Learning} outperforms state-of-the-art single-point algorithms, while learning a better Pareto parameterization than multi-point baselines.
△ Less
Submitted 14 June, 2023; v1 submitted 18 October, 2022;
originally announced October 2022.
-
Equivariant 3D-Conditional Diffusion Models for Molecular Linker Design
Authors:
Ilia Igashov,
Hannes Stärk,
Clément Vignac,
Victor Garcia Satorras,
Pascal Frossard,
Max Welling,
Michael Bronstein,
Bruno Correia
Abstract:
Fragment-based drug discovery has been an effective paradigm in early-stage drug development. An open challenge in this area is designing linkers between disconnected molecular fragments of interest to obtain chemically-relevant candidate drug molecules. In this work, we propose DiffLinker, an E(3)-equivariant 3D-conditional diffusion model for molecular linker design. Given a set of disconnected…
▽ More
Fragment-based drug discovery has been an effective paradigm in early-stage drug development. An open challenge in this area is designing linkers between disconnected molecular fragments of interest to obtain chemically-relevant candidate drug molecules. In this work, we propose DiffLinker, an E(3)-equivariant 3D-conditional diffusion model for molecular linker design. Given a set of disconnected fragments, our model places missing atoms in between and designs a molecule incorporating all the initial fragments. Unlike previous approaches that are only able to connect pairs of molecular fragments, our method can link an arbitrary number of fragments. Additionally, the model automatically determines the number of atoms in the linker and its attachment points to the input fragments. We demonstrate that DiffLinker outperforms other methods on the standard datasets generating more diverse and synthetically-accessible molecules. Besides, we experimentally test our method in real-world applications, showing that it can successfully generate valid linkers conditioned on target protein pockets.
△ Less
Submitted 11 October, 2022;
originally announced October 2022.
-
CLAD: A Contrastive Learning based Approach for Background Debiasing
Authors:
Ke Wang,
Harshitha Machiraju,
Oh-Hyeon Choung,
Michael Herzog,
Pascal Frossard
Abstract:
Convolutional neural networks (CNNs) have achieved superhuman performance in multiple vision tasks, especially image classification. However, unlike humans, CNNs leverage spurious features, such as background information to make decisions. This tendency creates different problems in terms of robustness or weak generalization performance. Through our work, we introduce a contrastive learning-based…
▽ More
Convolutional neural networks (CNNs) have achieved superhuman performance in multiple vision tasks, especially image classification. However, unlike humans, CNNs leverage spurious features, such as background information to make decisions. This tendency creates different problems in terms of robustness or weak generalization performance. Through our work, we introduce a contrastive learning-based approach (CLAD) to mitigate the background bias in CNNs. CLAD encourages semantic focus on object foregrounds and penalizes learning features from irrelavant backgrounds. Our method also introduces an efficient way of sampling negative samples. We achieve state-of-the-art results on the Background Challenge dataset, outperforming the previous benchmark with a margin of 4.1\%. Our paper shows how CLAD serves as a proof of concept for debiasing of spurious features, such as background and texture (in supplementary material).
△ Less
Submitted 6 October, 2022;
originally announced October 2022.
-
DiGress: Discrete Denoising diffusion for graph generation
Authors:
Clement Vignac,
Igor Krawczuk,
Antoine Siraudin,
Bohan Wang,
Volkan Cevher,
Pascal Frossard
Abstract:
This work introduces DiGress, a discrete denoising diffusion model for generating graphs with categorical node and edge attributes. Our model utilizes a discrete diffusion process that progressively edits graphs with noise, through the process of adding or removing edges and changing the categories. A graph transformer network is trained to revert this process, simplifying the problem of distribut…
▽ More
This work introduces DiGress, a discrete denoising diffusion model for generating graphs with categorical node and edge attributes. Our model utilizes a discrete diffusion process that progressively edits graphs with noise, through the process of adding or removing edges and changing the categories. A graph transformer network is trained to revert this process, simplifying the problem of distribution learning over graphs into a sequence of node and edge classification tasks. We further improve sample quality by introducing a Markovian noise model that preserves the marginal distribution of node and edge types during diffusion, and by incorporating auxiliary graph-theoretic features. A procedure for conditioning the generation on graph-level features is also proposed. DiGress achieves state-of-the-art performance on molecular and non-molecular datasets, with up to 3x validity improvement on a planar graph dataset. It is also the first model to scale to the large GuacaMol dataset containing 1.3M drug-like molecules without the use of molecule-specific representations.
△ Less
Submitted 23 May, 2023; v1 submitted 29 September, 2022;
originally announced September 2022.
-
Catastrophic overfitting can be induced with discriminative non-robust features
Authors:
Guillermo Ortiz-Jiménez,
Pau de Jorge,
Amartya Sanyal,
Adel Bibi,
Puneet K. Dokania,
Pascal Frossard,
Gregory Rogéz,
Philip H. S. Torr
Abstract:
Adversarial training (AT) is the de facto method for building robust neural networks, but it can be computationally expensive. To mitigate this, fast single-step attacks can be used, but this may lead to catastrophic overfitting (CO). This phenomenon appears when networks gain non-trivial robustness during the first stages of AT, but then reach a breaking point where they become vulnerable in just…
▽ More
Adversarial training (AT) is the de facto method for building robust neural networks, but it can be computationally expensive. To mitigate this, fast single-step attacks can be used, but this may lead to catastrophic overfitting (CO). This phenomenon appears when networks gain non-trivial robustness during the first stages of AT, but then reach a breaking point where they become vulnerable in just a few iterations. The mechanisms that lead to this failure mode are still poorly understood. In this work, we study the onset of CO in single-step AT methods through controlled modifications of typical datasets of natural images. In particular, we show that CO can be induced at much smaller $ε$ values than it was observed before just by injecting images with seemingly innocuous features. These features aid non-robust classification but are not enough to achieve robustness on their own. Through extensive experiments we analyze this novel phenomenon and discover that the presence of these easy features induces a learning shortcut that leads to CO. Our findings provide new insights into the mechanisms of CO and improve our understanding of the dynamics of AT. The code to reproduce our experiments can be found at https://github.com/gortizji/co_features.
△ Less
Submitted 15 August, 2023; v1 submitted 16 June, 2022;
originally announced June 2022.
-
Fooling Explanations in Text Classifiers
Authors:
Adam Ivankay,
Ivan Girardi,
Chiara Marchiori,
Pascal Frossard
Abstract:
State-of-the-art text classification models are becoming increasingly reliant on deep neural networks (DNNs). Due to their black-box nature, faithful and robust explanation methods need to accompany classifiers for deployment in real-life scenarios. However, it has been shown in vision applications that explanation methods are susceptible to local, imperceptible perturbations that can significantl…
▽ More
State-of-the-art text classification models are becoming increasingly reliant on deep neural networks (DNNs). Due to their black-box nature, faithful and robust explanation methods need to accompany classifiers for deployment in real-life scenarios. However, it has been shown in vision applications that explanation methods are susceptible to local, imperceptible perturbations that can significantly alter the explanations without changing the predicted classes. We show here that the existence of such perturbations extends to text classifiers as well. Specifically, we introduceTextExplanationFooler (TEF), a novel explanation attack algorithm that alters text input samples imperceptibly so that the outcome of widely-used explanation methods changes considerably while leaving classifier predictions unchanged. We evaluate the performance of the attribution robustness estimation performance in TEF on five sequence classification datasets, utilizing three DNN architectures and three transformer architectures for each dataset. TEF can significantly decrease the correlation between unchanged and perturbed input attributions, which shows that all models and explanation methods are susceptible to TEF perturbations. Moreover, we evaluate how the perturbations transfer to other model architectures and attribution methods, and show that TEF perturbations are also effective in scenarios where the target model and explanation method are unknown. Finally, we introduce a semi-universal attack that is able to compute fast, computationally light perturbations with no knowledge of the attacked classifier nor explanation method. Overall, our work shows that explanations in text classifiers are very fragile and users need to carefully address their robustness before relying on them in critical applications.
△ Less
Submitted 7 June, 2022;
originally announced June 2022.
-
Hierarchical Spherical CNNs with Lifting-based Adaptive Wavelets for Pooling and Unpooling
Authors:
Mingxing Xu,
Chenglin Li,
Wenrui Dai,
Siheng Chen,
Junni Zou,
Pascal Frossard,
Hongkai Xiong
Abstract:
Pooling and unpooling are two essential operations in constructing hierarchical spherical convolutional neural networks (HS-CNNs) for comprehensive feature learning in the spherical domain. Most existing models employ downsampling-based pooling, which will inevitably incur information loss and cannot adapt to different spherical signals and tasks. Besides, the preserved information after pooling c…
▽ More
Pooling and unpooling are two essential operations in constructing hierarchical spherical convolutional neural networks (HS-CNNs) for comprehensive feature learning in the spherical domain. Most existing models employ downsampling-based pooling, which will inevitably incur information loss and cannot adapt to different spherical signals and tasks. Besides, the preserved information after pooling cannot be well restored by the subsequent unpooling to characterize the desirable features for a task. In this paper, we propose a novel framework of HS-CNNs with a lifting structure to learn adaptive spherical wavelets for pooling and unpooling, dubbed LiftHS-CNN, which ensures a more efficient hierarchical feature learning for both image- and pixel-level tasks. Specifically, adaptive spherical wavelets are learned with a lifting structure that consists of trainable lifting operators (i.e., update and predict operators). With this learnable lifting structure, we can adaptively partition a signal into two sub-bands containing low- and high-frequency components, respectively, and thus generate a better down-scaled representation for pooling by preserving more information in the low-frequency sub-band. The update and predict operators are parameterized with graph-based attention to jointly consider the signal's characteristics and the underlying geometries. We further show that particular properties are promised by the learned wavelets, ensuring the spatial-frequency localization for better exploiting the signal's correlation in both spatial and frequency domains. We then propose an unpooling operation that is invertible to the lifting-based pooling, where an inverse wavelet transform is performed by using the learned lifting operators to restore an up-scaled representation. Extensive empirical evaluations on various spherical domain tasks validate the superiority of the proposed LiftHS-CNN.
△ Less
Submitted 31 May, 2022;
originally announced May 2022.