Investigating the Impact of Weight Sharing Decisions on Knowledge Transfer in Continual Learning

Authors: Josh Andle, Ali Payani, Salimeh Yasaei-Sekeh

Abstract: Continual Learning (CL) has generated attention as a method of avoiding Catastrophic Forgetting (CF) in the sequential training of neural networks, improving network efficiency and adaptability to different tasks. Additionally, CL serves as an ideal setting for studying network behavior and Forward Knowledge Transfer (FKT) between tasks. Pruning methods for CL train subnetworks to handle the sequential tasks which allows us to take a structured approach to investigating FKT. Sharing prior subnetworks' weights leverages past knowledge for the current task through FKT. Understanding which weights to share is important as sharing all weights can yield sub-optimal accuracy. This paper investigates how different sharing decisions affect the FKT between tasks. Through this lens we demonstrate how task complexity and similarity influence the optimal weight sharing decisions, giving insights into the relationships between tasks and helping inform decision making in similar CL methods. We implement three sequential datasets designed to emphasize variation in task complexity and similarity, reporting results for both ResNet-18 and VGG-16. By sharing in accordance with the decisions supported by our findings, we show that we can improve task accuracy compared to other sharing decisions.

Submitted 18 December, 2023; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: 5 Figures, 4 Tables, 2 Algorithms

arXiv:2307.03803 [pdf, other]

A Theoretical Perspective on Subnetwork Contributions to Adversarial Robustness

Authors: Jovon Craig, Josh Andle, Theodore S. Nowak, Salimeh Yasaei Sekeh

Abstract: The robustness of deep neural networks (DNNs) against adversarial attacks has been studied extensively in hopes of both better understanding how deep learning models converge and in order to ensure the security of these models in safety-critical applications. Adversarial training is one approach to strengthening DNNs against adversarial attacks, and has been shown to offer a means for doing so at the cost of applying computationally expensive training methods to the entire model. To better understand these attacks and facilitate more efficient adversarial training, in this paper we develop a novel theoretical framework that investigates how the adversarial robustness of a subnetwork contributes to the robustness of the entire network. To do so we first introduce the concept of semirobustness, which is a measure of the adversarial robustness of a subnetwork. Building on this concept, we then provide a theoretical analysis to show that if a subnetwork is semirobust and there is a sufficient dependency between it and each subsequent layer in the network, then the remaining layers are also guaranteed to be robust. We validate these findings empirically across multiple DNN architectures, datasets, and adversarial attacks. Experiments show the ability of a robust subnetwork to promote full-network robustness, and investigate the layer-wise dependencies required for this full-network robustness to be achieved.

Submitted 7 July, 2023; originally announced July 2023.

Comments: 3 figures, 3 tables, 17 pages, has appendices

arXiv:2204.12010 [pdf, other]

Theoretical Understanding of the Information Flow on Continual Learning Performance

Authors: Josh Andle, Salimeh Yasaei Sekeh

Abstract: Continual learning (CL) is a setting in which an agent has to learn from an incoming stream of data sequentially. CL performance evaluates the model's ability to continually learn and solve new problems with incremental available information over time while retaining previous knowledge. Despite the numerous previous solutions to bypass the catastrophic forgetting (CF) of previously seen tasks during the learning process, most of them still suffer significant forgetting, expensive memory cost, or lack of theoretical understanding of neural networks' conduct while learning new tasks. While the issue that CL performance degrades under different training regimes has been extensively studied empirically, insufficient attention has been paid from a theoretical angle. In this paper, we establish a probabilistic framework to analyze information flow through layers in networks for task sequences and its impact on learning performance. Our objective is to optimize the information preservation between layers while learning new tasks to manage task-specific knowledge passing throughout the layers while maintaining model performance on previous tasks. In particular, we study CL performance's relationship with information flow in the network to answer the question "How can knowledge of information flow between layers be used to alleviate CF?". Our analysis provides novel insights of information adaptation within the layers during the incremental task learning process. Through our experiments, we provide empirical evidence and practically highlight the performance improvement across multiple tasks.

Submitted 2 May, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

Comments: 7 figures, 16 pages

arXiv:2203.11743 [pdf, other]

The Stanford Drone Dataset is More Complex than We Think: An Analysis of Key Characteristics

Authors: Joshua Andle, Nicholas Soucy, Simon Socolow, Salimeh Yasaei Sekeh

Abstract: Several datasets exist which contain annotated information of individuals' trajectories. Such datasets are vital for many real-world applications, including trajectory prediction and autonomous navigation. One prominent dataset currently in use is the Stanford Drone Dataset (SDD). Despite its prominence, discussion surrounding the characteristics of this dataset is insufficient. We demonstrate how this insufficiency reduces the information available to users and can impact performance. Our contributions include the outlining of key characteristics in the SDD, employment of an information-theoretic measure and custom metric to clearly visualize those characteristics, the implementation of the PECNet and Y-Net trajectory prediction models to demonstrate the outlined characteristics' impact on predictive performance, and lastly we provide a comparison between the SDD and Intersection Drone (inD) Dataset. Our analysis of the SDD's key characteristics is important because without adequate information about available datasets a user's ability to select the most suitable dataset for their methods, to reproduce one another's results, and to interpret their own results are hindered. The observations we make through this analysis provide a readily accessible and interpretable source of information for those planning to use the SDD. Our intention is to increase the performance and reproducibility of methods applied to this dataset going forward, while also clearly detailing less obvious features of the dataset for new users.

Submitted 22 March, 2022; originally announced March 2022.

Comments: 12 pages, 10 figures, 5 tables

Showing 1–4 of 4 results for author: Andle, J