Search | arXiv e-print repository

Distributed Quasi-Newton Method for Fair and Fast Federated Learning

Authors: Shayan Mohajer Hamidi, Linfeng Ye

Abstract: Federated learning (FL) is a promising technology that enables edge devices/clients to collaboratively and iteratively train a machine learning model under the coordination of a central server. The most common approach to FL is first-order methods, where clients send their local gradients to the server in each iteration. However, these methods often suffer from slow convergence rates. As a remedy,… ▽ More Federated learning (FL) is a promising technology that enables edge devices/clients to collaboratively and iteratively train a machine learning model under the coordination of a central server. The most common approach to FL is first-order methods, where clients send their local gradients to the server in each iteration. However, these methods often suffer from slow convergence rates. As a remedy, second-order methods, such as quasi-Newton, can be employed in FL to accelerate its convergence. Unfortunately, similarly to the first-order FL methods, the application of second-order methods in FL can lead to unfair models, achieving high average accuracy while performing poorly on certain clients' local datasets. To tackle this issue, in this paper we introduce a novel second-order FL framework, dubbed \textbf{d}istributed \textbf{q}uasi-\textbf{N}ewton \textbf{fed}erated learning (DQN-Fed). This approach seeks to ensure fairness while leveraging the fast convergence properties of quasi-Newton methods in the FL context. Specifically, DQN-Fed helps the server update the global model in such a way that (i) all local loss functions decrease to promote fairness, and (ii) the rate of change in local loss functions aligns with that of the quasi-Newton method. We prove the convergence of DQN-Fed and demonstrate its \textit{linear-quadratic} convergence rate. Moreover, we validate the efficacy of DQN-Fed across a range of federated datasets, showing that it surpasses state-of-the-art fair FL methods in fairness, average accuracy and convergence speed. △ Less

Submitted 18 January, 2025; originally announced January 2025.

arXiv:2501.09849 [pdf, other]

Coded Deep Learning: Framework and Algorithm

Authors: En-hui Yang, Shayan Mohajer Hamidi

Abstract: The success of deep learning (DL) is often achieved with large models and high complexity during both training and post-training inferences, hindering training in resource-limited settings. To alleviate these issues, this paper introduces a new framework dubbed ``coded deep learning'' (CDL), which integrates information-theoretic coding concepts into the inner workings of DL, to significantly comp… ▽ More The success of deep learning (DL) is often achieved with large models and high complexity during both training and post-training inferences, hindering training in resource-limited settings. To alleviate these issues, this paper introduces a new framework dubbed ``coded deep learning'' (CDL), which integrates information-theoretic coding concepts into the inner workings of DL, to significantly compress model weights and activations, reduce computational complexity at both training and post-training inference stages, and enable efficient model/data parallelism. Specifically, within CDL, (i) we first propose a novel probabilistic method for quantizing both model weights and activations, and its soft differentiable variant which offers an analytic formula for gradient calculation during training; (ii) both the forward and backward passes during training are executed over quantized weights and activations, eliminating most floating-point operations and reducing training complexity; (iii) during training, both weights and activations are entropy constrained so that they are compressible in an information-theoretic sense throughout training, thus reducing communication costs in model/data parallelism; and (iv) the trained model in CDL is by default in a quantized format with compressible quantized weights, reducing post-training inference and storage complexity. Additionally, a variant of CDL, namely relaxed CDL (R-CDL), is presented to further improve the trade-off between validation accuracy and compression though requiring full precision in training with other advantageous features of CDL intact. Extensive empirical results show that CDL and R-CDL outperform the state-of-the-art algorithms in DNN compression in the literature. △ Less

Submitted 16 January, 2025; originally announced January 2025.

arXiv:2501.03392 [pdf, other]

Over-the-Air Fair Federated Learning via Multi-Objective Optimization

Authors: Shayan Mohajer Hamidi, Ali Bereyhi, Saba Asaad, H. Vincent Poor

Abstract: In federated learning (FL), heterogeneity among the local dataset distributions of clients can result in unsatisfactory performance for some, leading to an unfair model. To address this challenge, we propose an over-the-air fair federated learning algorithm (OTA-FFL), which leverages over-the-air computation to train fair FL models. By formulating FL as a multi-objective minimization problem, we i… ▽ More In federated learning (FL), heterogeneity among the local dataset distributions of clients can result in unsatisfactory performance for some, leading to an unfair model. To address this challenge, we propose an over-the-air fair federated learning algorithm (OTA-FFL), which leverages over-the-air computation to train fair FL models. By formulating FL as a multi-objective minimization problem, we introduce a modified Chebyshev approach to compute adaptive weighting coefficients for gradient aggregation in each communication round. To enable efficient aggregation over the multiple access channel, we derive analytical solutions for the optimal transmit scalars at the clients and the de-noising scalar at the parameter server. Extensive experiments demonstrate the superiority of OTA-FFL in achieving fairness and robust performance compared to existing methods. △ Less

Submitted 6 January, 2025; originally announced January 2025.

arXiv:2501.02880 [pdf, other]

Conditional Mutual Information Based Diffusion Posterior Sampling for Solving Inverse Problems

Authors: Shayan Mohajer Hamidi, En-Hui Yang

Abstract: Inverse problems are prevalent across various disciplines in science and engineering. In the field of computer vision, tasks such as inpainting, deblurring, and super-resolution are commonly formulated as inverse problems. Recently, diffusion models (DMs) have emerged as a promising approach for addressing noisy linear inverse problems, offering effective solutions without requiring additional tas… ▽ More Inverse problems are prevalent across various disciplines in science and engineering. In the field of computer vision, tasks such as inpainting, deblurring, and super-resolution are commonly formulated as inverse problems. Recently, diffusion models (DMs) have emerged as a promising approach for addressing noisy linear inverse problems, offering effective solutions without requiring additional task-specific training. Specifically, with the prior provided by DMs, one can sample from the posterior by finding the likelihood. Since the likelihood is intractable, it is often approximated in the literature. However, this approximation compromises the quality of the generated images. To overcome this limitation and improve the effectiveness of DMs in solving inverse problems, we propose an information-theoretic approach. Specifically, we maximize the conditional mutual information $\mathrm{I}(\boldsymbol{x}_0; \boldsymbol{y} | \boldsymbol{x}_t)$, where $\boldsymbol{x}_0$ represents the reconstructed signal, $\boldsymbol{y}$ is the measurement, and $\boldsymbol{x}_t$ is the intermediate signal at stage $t$. This ensures that the intermediate signals $\boldsymbol{x}_t$ are generated in a way that the final reconstructed signal $\boldsymbol{x}_0$ retains as much information as possible about the measurement $\boldsymbol{y}$. We demonstrate that this method can be seamlessly integrated with recent approaches and, once incorporated, enhances their performance both qualitatively and quantitatively. △ Less

Submitted 6 January, 2025; originally announced January 2025.

arXiv:2412.20045 [pdf, other]

Enhancing Diffusion Models for Inverse Problems with Covariance-Aware Posterior Sampling

Authors: Shayan Mohajer Hamidi, En-Hui Yang

Abstract: Inverse problems exist in many disciplines of science and engineering. In computer vision, for example, tasks such as inpainting, deblurring, and super resolution can be effectively modeled as inverse problems. Recently, denoising diffusion probabilistic models (DDPMs) are shown to provide a promising solution to noisy linear inverse problems without the need for additional task specific training.… ▽ More Inverse problems exist in many disciplines of science and engineering. In computer vision, for example, tasks such as inpainting, deblurring, and super resolution can be effectively modeled as inverse problems. Recently, denoising diffusion probabilistic models (DDPMs) are shown to provide a promising solution to noisy linear inverse problems without the need for additional task specific training. Specifically, with the prior provided by DDPMs, one can sample from the posterior by approximating the likelihood. In the literature, approximations of the likelihood are often based on the mean of conditional densities of the reverse process, which can be obtained using Tweedie formula. To obtain a better approximation to the likelihood, in this paper we first derive a closed form formula for the covariance of the reverse process. Then, we propose a method based on finite difference method to approximate this covariance such that it can be readily obtained from the existing pretrained DDPMs, thereby not increasing the complexity compared to existing approaches. Finally, based on the mean and approximated covariance of the reverse process, we present a new approximation to the likelihood. We refer to this method as covariance-aware diffusion posterior sampling (CA-DPS). Experimental results show that CA-DPS significantly improves reconstruction performance without requiring hyperparameter tuning. The code for the paper is put in the supplementary materials. △ Less

Submitted 28 December, 2024; originally announced December 2024.

arXiv:2412.03867 [pdf, other]

GP-FL: Model-Based Hessian Estimation for Second-Order Over-the-Air Federated Learning

Authors: Shayan Mohajer Hamidi, Ali Bereyhi, Saba Asaad, H. Vincent Poor

Abstract: Second-order methods are widely adopted to improve the convergence rate of learning algorithms. In federated learning (FL), these methods require the clients to share their local Hessian matrices with the parameter server (PS), which comes at a prohibitive communication cost. A classical solution to this issue is to approximate the global Hessian matrix from the first-order information. Unlike in… ▽ More Second-order methods are widely adopted to improve the convergence rate of learning algorithms. In federated learning (FL), these methods require the clients to share their local Hessian matrices with the parameter server (PS), which comes at a prohibitive communication cost. A classical solution to this issue is to approximate the global Hessian matrix from the first-order information. Unlike in idealized networks, this solution does not perform effectively in over-the-air FL settings, where the PS receives noisy versions of the local gradients. This paper introduces a novel second-order FL framework tailored for wireless channels. The pivotal innovation lies in the PS's capability to directly estimate the global Hessian matrix from the received noisy local gradients via a non-parametric method: the PS models the unknown Hessian matrix as a Gaussian process, and then uses the temporal relation between the gradients and Hessian along with the channel model to find a stochastic estimator for the global Hessian matrix. We refer to this method as Gaussian process-based Hessian modeling for wireless FL (GP-FL) and show that it exhibits a linear-quadratic convergence rate. Numerical experiments on various datasets demonstrate that GP-FL outperforms all classical baseline first and second order FL approaches. △ Less

Submitted 4 December, 2024; originally announced December 2024.

Comments: The paper is submitted to IEEE Transactions on Signal Processing

arXiv:2409.06319 [pdf, other]

Rate-Constrained Quantization for Communication-Efficient Federated Learning

Authors: Shayan Mohajer Hamidi, Ali Bereyhi

Abstract: Quantization is a common approach to mitigate the communication cost of federated learning (FL). In practice, the quantized local parameters are further encoded via an entropy coding technique, such as Huffman coding, for efficient data compression. In this case, the exact communication overhead is determined by the bit rate of the encoded gradients. Recognizing this fact, this work deviates from… ▽ More Quantization is a common approach to mitigate the communication cost of federated learning (FL). In practice, the quantized local parameters are further encoded via an entropy coding technique, such as Huffman coding, for efficient data compression. In this case, the exact communication overhead is determined by the bit rate of the encoded gradients. Recognizing this fact, this work deviates from the existing approaches in the literature and develops a novel quantized FL framework, called \textbf{r}ate-\textbf{c}onstrained \textbf{fed}erated learning (RC-FED), in which the gradients are quantized subject to both fidelity and data rate constraints. We formulate this scheme, as a joint optimization in which the quantization distortion is minimized while the rate of encoded gradients is kept below a target threshold. This enables for a tunable trade-off between quantization distortion and communication cost. We analyze the convergence behavior of RC-FED, and show its superior performance against baseline quantized FL schemes on several datasets. △ Less

Submitted 10 September, 2024; originally announced September 2024.

arXiv:2407.18041 [pdf, other]

How to Train the Teacher Model for Effective Knowledge Distillation

Authors: Shayan Mohajer Hamidi, Xizhen Deng, Renhao Tan, Linfeng Ye, Ahmed Hussein Salamah

Abstract: Recently, it was shown that the role of the teacher in knowledge distillation (KD) is to provide the student with an estimate of the true Bayes conditional probability density (BCPD). Notably, the new findings propose that the student's error rate can be upper-bounded by the mean squared error (MSE) between the teacher's output and BCPD. Consequently, to enhance KD efficacy, the teacher should be… ▽ More Recently, it was shown that the role of the teacher in knowledge distillation (KD) is to provide the student with an estimate of the true Bayes conditional probability density (BCPD). Notably, the new findings propose that the student's error rate can be upper-bounded by the mean squared error (MSE) between the teacher's output and BCPD. Consequently, to enhance KD efficacy, the teacher should be trained such that its output is close to BCPD in MSE sense. This paper elucidates that training the teacher model with MSE loss equates to minimizing the MSE between its output and BCPD, aligning with its core responsibility of providing the student with a BCPD estimate closely resembling it in MSE terms. In this respect, through a comprehensive set of experiments, we demonstrate that substituting the conventional teacher trained with cross-entropy loss with one trained using MSE loss in state-of-the-art KD methods consistently boosts the student's accuracy, resulting in improvements of up to 2.6\%. △ Less

Submitted 25 July, 2024; originally announced July 2024.

Comments: The paper was accepted at ECCV2024

arXiv:2407.12161 [pdf, other]

Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent

Authors: Karolis Jucys, George Adamopoulos, Mehrab Hamidi, Stephanie Milani, Mohammad Reza Samsami, Artem Zholus, Sonia Joseph, Blake Richards, Irina Rish, Özgür Şimşek

Abstract: Understanding the mechanisms behind decisions taken by large foundation models in sequential decision making tasks is critical to ensuring that such systems operate transparently and safely. In this work, we perform exploratory analysis on the Video PreTraining (VPT) Minecraft playing agent, one of the largest open-source vision-based agents. We aim to illuminate its reasoning mechanisms by applyi… ▽ More Understanding the mechanisms behind decisions taken by large foundation models in sequential decision making tasks is critical to ensuring that such systems operate transparently and safely. In this work, we perform exploratory analysis on the Video PreTraining (VPT) Minecraft playing agent, one of the largest open-source vision-based agents. We aim to illuminate its reasoning mechanisms by applying various interpretability techniques. First, we analyze the attention mechanism while the agent solves its training task - crafting a diamond pickaxe. The agent pays attention to the last four frames and several key-frames further back in its six-second memory. This is a possible mechanism for maintaining coherence in a task that takes 3-10 minutes, despite the short memory span. Secondly, we perform various interventions, which help us uncover a worrying case of goal misgeneralization: VPT mistakenly identifies a villager wearing brown clothes as a tree trunk when the villager is positioned stationary under green tree leaves, and punches it to death. △ Less

Submitted 16 July, 2024; originally announced July 2024.

Comments: Mechanistic Interpretability Workshop at ICML 2024

arXiv:2406.06262 [pdf, other]

Modular Growth of Hierarchical Networks: Efficient, General, and Robust Curriculum Learning

Authors: Mani Hamidi, Sina Khajehabdollahi, Emmanouil Giannakakis, Tim Schäfer, Anna Levina, Charley M. Wu

Abstract: Structural modularity is a pervasive feature of biological neural networks, which have been linked to several functional and computational advantages. Yet, the use of modular architectures in artificial neural networks has been relatively limited despite early successes. Here, we explore the performance and functional dynamics of a modular network trained on a memory task via an iterative growth c… ▽ More Structural modularity is a pervasive feature of biological neural networks, which have been linked to several functional and computational advantages. Yet, the use of modular architectures in artificial neural networks has been relatively limited despite early successes. Here, we explore the performance and functional dynamics of a modular network trained on a memory task via an iterative growth curriculum. We find that for a given classical, non-modular recurrent neural network (RNN), an equivalent modular network will perform better across multiple metrics, including training time, generalizability, and robustness to some perturbations. We further examine how different aspects of a modular network's connectivity contribute to its computational capability. We then demonstrate that the inductive bias introduced by the modular topology is strong enough for the network to perform well even when the connectivity within modules is fixed and only the connections between modules are trained. Our findings suggest that gradual modular growth of RNNs could provide advantages for learning increasingly complex tasks on evolutionary timescales, and help build more scalable and compressible artificial networks. △ Less

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2405.13324 [pdf, other]

Adversarial Training via Adaptive Knowledge Amalgamation of an Ensemble of Teachers

Authors: Shayan Mohajer Hamidi, Linfeng Ye

Abstract: Adversarial training (AT) is a popular method for training robust deep neural networks (DNNs) against adversarial attacks. Yet, AT suffers from two shortcomings: (i) the robustness of DNNs trained by AT is highly intertwined with the size of the DNNs, posing challenges in achieving robustness in smaller models; and (ii) the adversarial samples employed during the AT process exhibit poor generaliza… ▽ More Adversarial training (AT) is a popular method for training robust deep neural networks (DNNs) against adversarial attacks. Yet, AT suffers from two shortcomings: (i) the robustness of DNNs trained by AT is highly intertwined with the size of the DNNs, posing challenges in achieving robustness in smaller models; and (ii) the adversarial samples employed during the AT process exhibit poor generalization, leaving DNNs vulnerable to unforeseen attack types. To address these dual challenges, this paper introduces adversarial training via adaptive knowledge amalgamation of an ensemble of teachers (AT-AKA). In particular, we generate a diverse set of adversarial samples as the inputs to an ensemble of teachers; and then, we adaptively amalgamate the logtis of these teachers to train a generalized-robust student. Through comprehensive experiments, we illustrate the superior efficacy of AT-AKA over existing AT methods and adversarial robustness distillation techniques against cutting-edge attacks, including AutoAttack. △ Less

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2403.03372 [pdf, other]

doi 10.1038/s41597-025-04775-6

TartanAviation: Image, Speech, and ADS-B Trajectory Datasets for Terminal Airspace Operations

Authors: Jay Patrikar, Joao Dantas, Brady Moon, Milad Hamidi, Sourish Ghosh, Nikhil Keetha, Ian Higgins, Atharva Chandak, Takashi Yoneyama, Sebastian Scherer

Abstract: We introduce TartanAviation, an open-source multi-modal dataset focused on terminal-area airspace operations. TartanAviation provides a holistic view of the airport environment by concurrently collecting image, speech, and ADS-B trajectory data using setups installed inside airport boundaries. The datasets were collected at both towered and non-towered airfields across multiple months to capture d… ▽ More We introduce TartanAviation, an open-source multi-modal dataset focused on terminal-area airspace operations. TartanAviation provides a holistic view of the airport environment by concurrently collecting image, speech, and ADS-B trajectory data using setups installed inside airport boundaries. The datasets were collected at both towered and non-towered airfields across multiple months to capture diversity in aircraft operations, seasons, aircraft types, and weather conditions. In total, TartanAviation provides 3.1M images, 3374 hours of Air Traffic Control speech data, and 661 days of ADS-B trajectory data. The data was filtered, processed, and validated to create a curated dataset. In addition to the dataset, we also open-source the code-base used to collect and pre-process the dataset, further enhancing accessibility and usability. We believe this dataset has many potential use cases and would be particularly vital in allowing AI and machine learning technologies to be integrated into air traffic control systems and advance the adoption of autonomous aircraft in the airspace. △ Less

Submitted 5 March, 2024; originally announced March 2024.

Comments: 8 pages, 6 figures, 2 tables

Journal ref: Scientific Data volume 12, Article number: 468 (2025)

arXiv:2401.08732 [pdf, other]

Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information

Authors: Linfeng Ye, Shayan Mohajer Hamidi, Renhao Tan, En-Hui Yang

Abstract: It is believed that in knowledge distillation (KD), the role of the teacher is to provide an estimate for the unknown Bayes conditional probability distribution (BCPD) to be used in the student training process. Conventionally, this estimate is obtained by training the teacher using maximum log-likelihood (MLL) method. To improve this estimate for KD, in this paper we introduce the concept of cond… ▽ More It is believed that in knowledge distillation (KD), the role of the teacher is to provide an estimate for the unknown Bayes conditional probability distribution (BCPD) to be used in the student training process. Conventionally, this estimate is obtained by training the teacher using maximum log-likelihood (MLL) method. To improve this estimate for KD, in this paper we introduce the concept of conditional mutual information (CMI) into the estimation of BCPD and propose a novel estimator called the maximum CMI (MCMI) method. Specifically, in MCMI estimation, both the log-likelihood and CMI of the teacher are simultaneously maximized when the teacher is trained. Through Eigen-CAM, it is further shown that maximizing the teacher's CMI value allows the teacher to capture more contextual information in an image cluster. Via conducting a thorough set of experiments, we show that by employing a teacher trained via MCMI estimation rather than one trained via MLL estimation in various state-of-the-art KD frameworks, the student's classification accuracy consistently increases, with the gain of up to 3.32\%. This suggests that the teacher's BCPD estimate provided by MCMI method is more accurate than that provided by MLL method. In addition, we show that such improvements in the student's accuracy are more drastic in zero-shot and few-shot settings. Notably, the student's accuracy increases with the gain of up to 5.72\% when 5\% of the training samples are available to the student (few-shot), and increases from 0\% to as high as 84\% for an omitted class (zero-shot). The code is available at \url{https://github.com/iclr2024mcmi/ICLRMCMI}. △ Less

Submitted 7 March, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: 32 pages, 19 figures, Published as a conference paper at ICLR 2024

MSC Class: 68T30 ACM Class: I.2.6

Journal ref: International Conference on Learning Representations 2024 (ICLR)

arXiv:2401.07991 [pdf, other]

Robustness Against Adversarial Attacks via Learning Confined Adversarial Polytopes

Authors: Shayan Mohajer Hamidi, Linfeng Ye

Abstract: Deep neural networks (DNNs) could be deceived by generating human-imperceptible perturbations of clean samples. Therefore, enhancing the robustness of DNNs against adversarial attacks is a crucial task. In this paper, we aim to train robust DNNs by limiting the set of outputs reachable via a norm-bounded perturbation added to a clean sample. We refer to this set as adversarial polytope, and each c… ▽ More Deep neural networks (DNNs) could be deceived by generating human-imperceptible perturbations of clean samples. Therefore, enhancing the robustness of DNNs against adversarial attacks is a crucial task. In this paper, we aim to train robust DNNs by limiting the set of outputs reachable via a norm-bounded perturbation added to a clean sample. We refer to this set as adversarial polytope, and each clean sample has a respective adversarial polytope. Indeed, if the respective polytopes for all the samples are compact such that they do not intersect the decision boundaries of the DNN, then the DNN is robust against adversarial samples. Hence, the inner-working of our algorithm is based on learning \textbf{c}onfined \textbf{a}dversarial \textbf{p}olytopes (CAP). By conducting a thorough set of experiments, we demonstrate the effectiveness of CAP over existing adversarial robustness methods in improving the robustness of models against state-of-the-art attacks including AutoAttack. △ Less

Submitted 20 January, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

Comments: The paper has been accepted in ICASSP 2024

arXiv:2401.04993 [pdf, other]

AdaFed: Fair Federated Learning via Adaptive Common Descent Direction

Authors: Shayan Mohajer Hamidi, En-Hui Yang

Abstract: Federated learning (FL) is a promising technology via which some edge devices/clients collaboratively train a machine learning model orchestrated by a server. Learning an unfair model is known as a critical problem in federated learning, where the trained model may unfairly advantage or disadvantage some of the devices. To tackle this problem, in this work, we propose AdaFed. The goal of AdaFed is… ▽ More Federated learning (FL) is a promising technology via which some edge devices/clients collaboratively train a machine learning model orchestrated by a server. Learning an unfair model is known as a critical problem in federated learning, where the trained model may unfairly advantage or disadvantage some of the devices. To tackle this problem, in this work, we propose AdaFed. The goal of AdaFed is to find an updating direction for the server along which (i) all the clients' loss functions are decreasing; and (ii) more importantly, the loss functions for the clients with larger values decrease with a higher rate. AdaFed adaptively tunes this common direction based on the values of local gradients and loss functions. We validate the effectiveness of AdaFed on a suite of federated datasets, and demonstrate that AdaFed outperforms state-of-the-art fair FL methods. △ Less

Submitted 10 January, 2024; originally announced January 2024.

Comments: This paper has been accepted in Transactions on Machine Learning Research. This is the link to the paper: https://openreview.net/forum?id=rFecyFpFUp&referrer=%5Bthe%20profile%20of%20Shayan%20Mohajer%20Hamidi%5D(%2Fprofile%3Fid%3D~Shayan_Mohajer_Hamidi1)

arXiv:2401.00532 [pdf, other]

On the Necessity of Metalearning: Learning Suitable Parameterizations for Learning Processes

Authors: Massinissa Hamidi, Aomar Osmani

Abstract: In this paper we will discuss metalearning and how we can go beyond the current classical learning paradigm. We will first address the importance of inductive biases in the learning process and what is at stake: the quantities of data necessary to learn. We will subsequently see the importance of choosing suitable parameterizations to end up with well-defined learning processes. Especially since i… ▽ More In this paper we will discuss metalearning and how we can go beyond the current classical learning paradigm. We will first address the importance of inductive biases in the learning process and what is at stake: the quantities of data necessary to learn. We will subsequently see the importance of choosing suitable parameterizations to end up with well-defined learning processes. Especially since in the context of real-world applications, we face numerous biases due, e.g., to the specificities of sensors, the heterogeneity of data sources, the multiplicity of points of view, etc. This will lead us to the idea of exploiting the structuring of the concepts to be learned in order to organize the learning process that we published previously. We conclude by discussing the perspectives around parameter-tying schemes and the emergence of universal aspects in the models thus learned. △ Less

Submitted 31 December, 2023; originally announced January 2024.

arXiv:2312.04675 [pdf, ps, other]

Reverse Engineering Deep ReLU Networks An Optimization-based Algorithm

Authors: Mehrab Hamidi

Abstract: Reverse engineering deep ReLU networks is a critical problem in understanding the complex behavior and interpretability of neural networks. In this research, we present a novel method for reconstructing deep ReLU networks by leveraging convex optimization techniques and a sampling-based approach. Our method begins by sampling points in the input space and querying the black box model to obtain the… ▽ More Reverse engineering deep ReLU networks is a critical problem in understanding the complex behavior and interpretability of neural networks. In this research, we present a novel method for reconstructing deep ReLU networks by leveraging convex optimization techniques and a sampling-based approach. Our method begins by sampling points in the input space and querying the black box model to obtain the corresponding hyperplanes. We then define a convex optimization problem with carefully chosen constraints and conditions to guarantee its convexity. The objective function is designed to minimize the discrepancy between the reconstructed networks output and the target models output, subject to the constraints. We employ gradient descent to optimize the objective function, incorporating L1 or L2 regularization as needed to encourage sparse or smooth solutions. Our research contributes to the growing body of work on reverse engineering deep ReLU networks and paves the way for new advancements in neural network interpretability and security. △ Less

Submitted 7 December, 2023; originally announced December 2023.

Comments: 9 pages, 2 supplementary pages

arXiv:2309.09123 [pdf, other]

Conditional Mutual Information Constrained Deep Learning for Classification

Authors: En-Hui Yang, Shayan Mohajer Hamidi, Linfeng Ye, Renhao Tan, Beverly Yang

Abstract: The concepts of conditional mutual information (CMI) and normalized conditional mutual information (NCMI) are introduced to measure the concentration and separation performance of a classification deep neural network (DNN) in the output probability distribution space of the DNN, where CMI and the ratio between CMI and NCMI represent the intra-class concentration and inter-class separation of the D… ▽ More The concepts of conditional mutual information (CMI) and normalized conditional mutual information (NCMI) are introduced to measure the concentration and separation performance of a classification deep neural network (DNN) in the output probability distribution space of the DNN, where CMI and the ratio between CMI and NCMI represent the intra-class concentration and inter-class separation of the DNN, respectively. By using NCMI to evaluate popular DNNs pretrained over ImageNet in the literature, it is shown that their validation accuracies over ImageNet validation data set are more or less inversely proportional to their NCMI values. Based on this observation, the standard deep learning (DL) framework is further modified to minimize the standard cross entropy function subject to an NCMI constraint, yielding CMI constrained deep learning (CMIC-DL). A novel alternating learning algorithm is proposed to solve such a constrained optimization problem. Extensive experiment results show that DNNs trained within CMIC-DL outperform the state-of-the-art models trained within the standard DL and other loss functions in the literature in terms of both accuracy and robustness against adversarial attacks. In addition, visualizing the evolution of learning process through the lens of CMI and NCMI is also advocated. △ Less

Submitted 16 September, 2023; originally announced September 2023.

arXiv:2211.06932 [pdf, other]

Challenges in Close-Proximity Safe and Seamless Operation of Manned and Unmanned Aircraft in Shared Airspace

Authors: Jay Patrikar, Joao P. A. Dantas, Sourish Ghosh, Parv Kapoor, Ian Higgins, Jasmine J. Aloor, Ingrid Navarro, Jimin Sun, Ben Stoler, Milad Hamidi, Rohan Baijal, Brady Moon, Jean Oh, Sebastian Scherer

Abstract: We propose developing an integrated system to keep autonomous unmanned aircraft safely separated and behave as expected in conjunction with manned traffic. The main goal is to achieve safe manned-unmanned vehicle teaming to improve system performance, have each (robot/human) teammate learn from each other in various aircraft operations, and reduce the manning needs of manned aircraft. The proposed… ▽ More We propose developing an integrated system to keep autonomous unmanned aircraft safely separated and behave as expected in conjunction with manned traffic. The main goal is to achieve safe manned-unmanned vehicle teaming to improve system performance, have each (robot/human) teammate learn from each other in various aircraft operations, and reduce the manning needs of manned aircraft. The proposed system anticipates and reacts to other aircraft using natural language instructions and can serve as a co-pilot or operate entirely autonomously. We point out the main technical challenges where improvements on current state-of-the-art are needed to enable Visual Flight Rules to fully autonomous aerial operations, bringing insights to these critical areas. Furthermore, we present an interactive demonstration in a prototypical scenario with one AI pilot and one human pilot sharing the same terminal airspace, interacting with each other using language, and landing safely on the same runway. We also show a demonstration of a vision-only aircraft detection system. △ Less

Submitted 13 November, 2022; originally announced November 2022.

arXiv:2209.12849 [pdf, other]

AirTrack: Onboard Deep Learning Framework for Long-Range Aircraft Detection and Tracking

Authors: Sourish Ghosh, Jay Patrikar, Brady Moon, Milad Moghassem Hamidi, Sebastian Scherer

Abstract: Detect-and-Avoid (DAA) capabilities are critical for safe operations of unmanned aircraft systems (UAS). This paper introduces, AirTrack, a real-time vision-only detect and tracking framework that respects the size, weight, and power (SWaP) constraints of sUAS systems. Given the low Signal-to-Noise ratios (SNR) of far away aircraft, we propose using full resolution images in a deep learning framew… ▽ More Detect-and-Avoid (DAA) capabilities are critical for safe operations of unmanned aircraft systems (UAS). This paper introduces, AirTrack, a real-time vision-only detect and tracking framework that respects the size, weight, and power (SWaP) constraints of sUAS systems. Given the low Signal-to-Noise ratios (SNR) of far away aircraft, we propose using full resolution images in a deep learning framework that aligns successive images to remove ego-motion. The aligned images are then used downstream in cascaded primary and secondary classifiers to improve detection and tracking performance on multiple metrics. We show that AirTrack outperforms state-of-the art baselines on the Amazon Airborne Object Tracking (AOT) Dataset. Multiple real world flight tests with a Cessna 182 interacting with general aviation traffic and additional near-collision flight tests with a Bell helicopter flying towards a UAS in a controlled setting showcase that the proposed approach satisfies the newly introduced ASTM F3442/F3442M standard for DAA. Empirical evaluations show that our system has a probability of track of more than 95% up to a range of 700m. Video available at https://youtu.be/H3lL_Wjxjpw . △ Less

Submitted 20 March, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

Comments: 7 pages, 5 figures, ICRA 2023

arXiv:2111.15046 [pdf, other]

A Secure Key Sharing Algorithm Exploiting Phase Reciprocity in Wireless Channels

Authors: Shayan Mohajer Hamidi, Amir Keyvan Khandani, Ehsan Bateni

Abstract: This article presents a secure key exchange algorithm that exploits reciprocity in wireless channels to share a secret key between two nodes $A$ and $B$. Reciprocity implies that the channel phases in the links $A\rightarrow B$ and $B\rightarrow A$ are the same. A number of such reciprocal phase values are measured at nodes $A$ and $B$, called shared phase values hereafter. Each shared phase value… ▽ More This article presents a secure key exchange algorithm that exploits reciprocity in wireless channels to share a secret key between two nodes $A$ and $B$. Reciprocity implies that the channel phases in the links $A\rightarrow B$ and $B\rightarrow A$ are the same. A number of such reciprocal phase values are measured at nodes $A$ and $B$, called shared phase values hereafter. Each shared phase value is used to mask points of a Phase Shift Keying (PSK) constellation. Masking is achieved by rotating each PSK constellation with a shared phase value. Rotation of constellation is equivalent to adding phases modulo-$2π$, and as the channel phase is uniformly distributed in $[0,2π)$, the result of summation conveys zero information about summands. To enlarge the key size over a static or slow fading channel, the Radio Frequency (RF) propagation path is perturbed to create several independent realizations of multi-path fading, each used to share a new phase value. To eavesdrop a phase value shared in this manner, the Eavesdropper (Eve) will always face an under-determined system of linear equations which will not reveal any useful information about its actual solution value. This property is used to establish a secure key between two legitimate users. △ Less

Submitted 29 November, 2021; originally announced November 2021.

arXiv:2111.12305 [pdf, other]

Thundernna: a white box adversarial attack

Authors: Linfeng Ye, Shayan Mohajer Hamidi

Abstract: The existing work shows that the neural network trained by naive gradient-based optimization method is prone to adversarial attacks, adds small malicious on the ordinary input is enough to make the neural network wrong. At the same time, the attack against a neural network is the key to improving its robustness. The training against adversarial examples can make neural networks resist some kinds o… ▽ More The existing work shows that the neural network trained by naive gradient-based optimization method is prone to adversarial attacks, adds small malicious on the ordinary input is enough to make the neural network wrong. At the same time, the attack against a neural network is the key to improving its robustness. The training against adversarial examples can make neural networks resist some kinds of adversarial attacks. At the same time, the adversarial attack against a neural network can also reveal some characteristics of the neural network, a complex high-dimensional non-linear function, as discussed in previous work. In This project, we develop a first-order method to attack the neural network. Compare with other first-order attacks, our method has a much higher success rate. Furthermore, it is much faster than second-order attacks and multi-steps first-order attacks. △ Less

Submitted 21 January, 2024; v1 submitted 24 November, 2021; originally announced November 2021.

Comments: 10 pages, 5 figures

MSC Class: 92B20 ACM Class: I.2.m

arXiv:2104.04889 [pdf, other]

Affinity-Based Hierarchical Learning of Dependent Concepts for Human Activity Recognition

Authors: Aomar Osmani, Massinissa Hamidi, Pegah Alizadeh

Abstract: In multi-class classification tasks, like human activity recognition, it is often assumed that classes are separable. In real applications, this assumption becomes strong and generates inconsistencies. Besides, the most commonly used approach is to learn classes one-by-one against the others. This computational simplification principle introduces strong inductive biases on the learned theories. In… ▽ More In multi-class classification tasks, like human activity recognition, it is often assumed that classes are separable. In real applications, this assumption becomes strong and generates inconsistencies. Besides, the most commonly used approach is to learn classes one-by-one against the others. This computational simplification principle introduces strong inductive biases on the learned theories. In fact, the natural connections among some classes, and not others, deserve to be taken into account. In this paper, we show that the organization of overlapping classes (multiple inheritances) into hierarchies considerably improves classification performances. This is particularly true in the case of activity recognition tasks featured in the SHL dataset. After theoretically showing the exponential complexity of possible class hierarchies, we propose an approach based on transfer affinity among the classes to determine an optimal hierarchy for the learning process. Extensive experiments show improved performances and a reduction in the number of examples needed to learn. △ Less

Submitted 10 April, 2021; originally announced April 2021.

arXiv:2104.04885 [pdf, other]

Description of Structural Biases and Associated Data in Sensor-Rich Environments

Authors: Massinissa Hamidi, Aomar Osmani

Abstract: In this article, we study activity recognition in the context of sensor-rich environments. We address, in particular, the problem of inductive biases and their impact on the data collection process. To be effective and robust, activity recognition systems must take these biases into account at all levels and model them as hyperparameters by which they can be controlled. Whether it is a bias relate… ▽ More In this article, we study activity recognition in the context of sensor-rich environments. We address, in particular, the problem of inductive biases and their impact on the data collection process. To be effective and robust, activity recognition systems must take these biases into account at all levels and model them as hyperparameters by which they can be controlled. Whether it is a bias related to sensor measurement, transmission protocol, sensor deployment topology, heterogeneity, dynamicity, or stochastic effects, it is important to understand their substantial impact on the quality of activity recognition models. This study highlights the need to separate the different types of biases arising in real situations so that machine learning models, e.g., adapt to the dynamicity of these environments, resist to sensor failures, and follow the evolution of the sensors topology. We propose a metamodeling process in which the sensor data is structured in layers. The lower layers encode the various biases linked to transformations, transmissions, and topology of data. The upper layers encode biases related to the data itself. This way, it becomes easier to model hyperparameters and follow changes in the data acquisition infrastructure. We illustrate our approach on the SHL dataset which provides motion sensor data for a list of human activities collected under real conditions. The trade-offs exposed and the broader implications of our approach are discussed with alternative techniques to encode and incorporate knowledge into activity recognition models. △ Less

Submitted 10 April, 2021; originally announced April 2021.

arXiv:2011.11736 [pdf, other]

Accurate and Rapid Diagnosis of COVID-19 Pneumonia with Batch Effect Removal of Chest CT-Scans and Interpretable Artificial Intelligence

Authors: Rassa Ghavami Modegh, Mehrab Hamidi, Saeed Masoudian, Amir Mohseni, Hamzeh Lotfalinezhad, Mohammad Ali Kazemi, Behnaz Moradi, Mahyar Ghafoori, Omid Motamedi, Omid Pournik, Kiara Rezaei-Kalantari, Amirreza Manteghinezhad, Shaghayegh Haghjooy Javanmard, Fateme Abdoli Nezhad, Ahmad Enhesari, Mohammad Saeed Kheyrkhah, Razieh Eghtesadi, Javid Azadbakht, Akbar Aliasgharzadeh, Mohammad Reza Sharif, Ali Khaleghi, Abbas Foroutan, Hossein Ghanaati, Hamed Dashti, Hamid R. Rabiee

Abstract: COVID-19 is a virus with high transmission rate that demands rapid identification of the infected patients to reduce the spread of the disease. The current gold-standard test, Reverse-Transcription Polymerase Chain Reaction (RT-PCR), has a high rate of false negatives. Diagnosing from CT-scan images as a more accurate alternative has the challenge of distinguishing COVID-19 from other pneumonia di… ▽ More COVID-19 is a virus with high transmission rate that demands rapid identification of the infected patients to reduce the spread of the disease. The current gold-standard test, Reverse-Transcription Polymerase Chain Reaction (RT-PCR), has a high rate of false negatives. Diagnosing from CT-scan images as a more accurate alternative has the challenge of distinguishing COVID-19 from other pneumonia diseases. Artificial intelligence can help radiologists and physicians to accelerate the process of diagnosis, increase its accuracy, and measure the severity of the disease. We designed a new interpretable deep neural network to distinguish healthy people, patients with COVID-19, and patients with other pneumonia diseases from axial lung CT-scan images. Our model also detects the infected areas and calculates the percentage of the infected lung volume. We first preprocessed the images to eliminate the batch effects of different devices, and then adopted a weakly supervised method to train the model without having any tags for the infected parts. We trained and evaluated the model on a large dataset of 3359 samples from 6 different medical centers. The model reached sensitivities of 97.75% and 98.15%, and specificities of 87% and 81.03% in separating healthy people from the diseased and COVID-19 from other diseases, respectively. It also demonstrated similar performance for 1435 samples from 6 different medical centers which proves its generalizability. The performance of the model on a large diverse dataset, its generalizability, and interpretability makes it suitable to be used as a reliable diagnostic system. △ Less

Submitted 8 January, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

Comments: 27 pages, 4 figures. Some minor changes have been applied to the text, some fomulae are added to help the descriptions become more clear, two names and two names are corrected (The full version of the names are included)

arXiv:1911.03793 [pdf, other]

doi 10.1109/ATSIP.2017.8075525

A Robust Blind 3-D Mesh Watermarking based on Wavelet Transform for Copyright Protection

Authors: Mohamed Hamidi, Mohamed El Haziti, Hocine Cherifi, Driss Aboutajdine

Abstract: Nowadays, three-dimensional meshes have been extensively used in several applications such as, industrial, medical, computer-aided design (CAD) and entertainment due to the processing capability improvement of computers and the development of the network infrastructure. Unfortunately, like digital images and videos, 3-D meshes can be easily modified, duplicated and redistributed by unauthorized us… ▽ More Nowadays, three-dimensional meshes have been extensively used in several applications such as, industrial, medical, computer-aided design (CAD) and entertainment due to the processing capability improvement of computers and the development of the network infrastructure. Unfortunately, like digital images and videos, 3-D meshes can be easily modified, duplicated and redistributed by unauthorized users. Digital watermarking came up while trying to solve this problem. In this paper, we propose a blind robust watermarking scheme for three-dimensional semiregular meshes for Copyright protection. The watermark is embedded by modifying the norm of the wavelet coefficient vectors associated with the lowest resolution level using the edge normal norms as synchronizing primitives. The experimental results show that in comparison with alternative 3-D mesh watermarking approaches, the proposed method can resist to a wide range of common attacks, such as similarity transformations including translation, rotation, uniform scaling and their combination, noise addition, Laplacian smoothing, quantization, while preserving high imperceptibility. △ Less

Submitted 9 November, 2019; originally announced November 2019.

Comments: 6 pages, 3 figures, International Conference on Advanced Technologies for Signal and Image Processing (ATSIP'2017)

arXiv:1911.00753 [pdf, other]

doi 10.1007/s11042-018-5913-9

Hybrid blind robust image watermarking technique based on DFT-DCT and Arnold transform

Authors: Mohamed Hamidi, Mohamed El Haziti, Hocine Cherifi, Mohammed El Hassouni

Abstract: In this paper, a robust blind image watermarking method is proposed for copyright protection of digital images. This hybrid method relies on combining two well-known transforms that are the discrete Fourier transform (DFT) and the discrete cosine transform (DCT). The motivation behind this combination is to enhance the imperceptibility and the robustness. The imperceptibility requirement is achiev… ▽ More In this paper, a robust blind image watermarking method is proposed for copyright protection of digital images. This hybrid method relies on combining two well-known transforms that are the discrete Fourier transform (DFT) and the discrete cosine transform (DCT). The motivation behind this combination is to enhance the imperceptibility and the robustness. The imperceptibility requirement is achieved by using magnitudes of DFT coefficients while the robustness improvement is ensured by applying DCT to the DFT coefficients magnitude. The watermark is embedded by modifying the coefficients of the middle band of the DCT using a secret key. The security of the proposed method is enhanced by applying Arnold transform (AT) to the watermark before embedding. Experiments were conducted on natural and textured images. Results show that, compared with state-of-the-art methods, the proposed method is robust to a wide range of attacks while preserving high imperceptibility. △ Less

Submitted 2 November, 2019; originally announced November 2019.

Comments: 34 page, 17 figures, published in Multimedia Tools and Applications Springer, 2018

Journal ref: Multimedia Tools and Applications, 77(20), 27181-27214 (2018)

arXiv:1910.12828 [pdf, other]

doi 10.1007/978-3-030-31332-6_15

Blind Robust 3-D Mesh Watermarking based on Mesh Saliency and QIM quantization for Copyright Protection

Authors: Mohamed Hamidi, Aladine Chetouani, Mohamed El Haziti, Mohammed El Hassouni, and Hocine Cherifi

Abstract: Due to the recent demand of 3-D models in several applications like medical imaging, video games, among others, the necessity of implementing 3-D mesh watermarking schemes aiming to protect copyright has increased considerably. The majority of robust 3-D watermarking techniques have essentially focused on the robustness against attacks while the imperceptibility of these techniques is still a real… ▽ More Due to the recent demand of 3-D models in several applications like medical imaging, video games, among others, the necessity of implementing 3-D mesh watermarking schemes aiming to protect copyright has increased considerably. The majority of robust 3-D watermarking techniques have essentially focused on the robustness against attacks while the imperceptibility of these techniques is still a real issue. In this context, a blind robust 3-D mesh watermarking method based on mesh saliency and Quantization Index Modulation (QIM) for Copyright protection is proposed. The watermark is embedded by quantifying the vertex norms of the 3-D mesh using QIM scheme since it offers a good robustness-capacity tradeoff. The choice of the vertices is adjusted by the mesh saliency to achieve watermark robustness and to avoid visual distortions. The experimental results show the high imperceptibility of the proposed scheme while ensuring a good robustness against a wide range of attacks including additive noise, similarity transformations, smoothing, quantization, etc. △ Less

Submitted 28 October, 2019; originally announced October 2019.

Comments: 11 pages, 5 figures, published in IbPRIA 2019: 9th Iberian Conference on Pattern Recognition and Image Analysis. arXiv admin note: substantial text overlap with arXiv:1910.11211

arXiv:1910.11211 [pdf, other]

doi 10.1007/978-3-030-22885-9_19

A Robust Blind 3-D Mesh Watermarking technique based on SCS quantization and mesh Saliency for Copyright Protection

Authors: Mohamed Hamidi, Aladine Chetouani, Mohamed El Haziti1, Mohammed El Hassouni, Hocine Cherifi

Abstract: Due to the recent demand of 3-D meshes in a wide range of applications such as video games, medical imaging, film special effect making, computer-aided design (CAD), among others, the necessity of implementing 3-D mesh watermarking schemes aiming to protect copyright has increased in the last decade. Nowadays, the majority of robust 3-D watermarking approaches have mainly focused on the robustness… ▽ More Due to the recent demand of 3-D meshes in a wide range of applications such as video games, medical imaging, film special effect making, computer-aided design (CAD), among others, the necessity of implementing 3-D mesh watermarking schemes aiming to protect copyright has increased in the last decade. Nowadays, the majority of robust 3-D watermarking approaches have mainly focused on the robustness against attacks while the imperceptibility of these techniques is still a serious challenge. In this context, a blind robust 3-D mesh watermarking method based on mesh saliency and scalar Costa scheme (SCS) for Copyright protection is proposed. The watermark is embedded by quantifying the vertex norms of the 3-D mesh by SCS scheme using the vertex normal norms as synchronizing primitives. The choice of these vertices is based on 3-D mesh saliency to achieve watermark robustness while ensuring high imperceptibility. The experimental results show that in comparison with the alternative methods, the proposed work can achieve a high imperceptibility performance while ensuring a good robustness against several common attacks including similarity transformations, noise addition, quantization, smoothing, elements reordering, etc. △ Less

Submitted 24 October, 2019; originally announced October 2019.

Comments: 10 pages, 11 figures, 5th International Conference on Mobile, Secure and Programmable Networking (MSPN'2019)

arXiv:1910.11185 [pdf, other]

doi 10.1109/AICCSA.2015.7507124

A blind Robust Image Watermarking Approach exploiting the DFT Magnitude

Authors: Mohamed Hamidi, Mohamed El Haziti, Hocine Cherifi, Driss Aboutajdine

Abstract: Due to the current progress in Internet, digital contents (video, audio and images) are widely used. Distribution of multimedia contents is now faster and it allows for easy unauthorized reproduction of information. Digital watermarking came up while trying to solve this problem. Its main idea is to embed a watermark into a host digital content without affecting its quality. Moreover, watermarking… ▽ More Due to the current progress in Internet, digital contents (video, audio and images) are widely used. Distribution of multimedia contents is now faster and it allows for easy unauthorized reproduction of information. Digital watermarking came up while trying to solve this problem. Its main idea is to embed a watermark into a host digital content without affecting its quality. Moreover, watermarking can be used in several applications such as authentication, copy control, indexation, Copyright protection, etc. In this paper, we propose a blind robust image watermarking approach as a solution to the problem of copyright protection of digital images. The underlying concept of our method is to apply a discrete cosine transform (DCT) to the magnitude resulting from a discrete Fourier transform (DFT) applied to the original image. Then, the watermark is embedded by modifying the coefficients of the DCT using a secret key to increase security. Experimental results show the robustness of the proposed technique to a wide range of common attacks, e.g., Low-Pass Gaussian Filtering, JPEG compression, Gaussian noise, salt & pepper noise, Gaussian Smoothing and Histogram equalization. The proposed method achieves a Peak signal-to-noise-ration (PSNR) value greater than 66 (dB) and ensures a perfect watermark extraction. △ Less

Submitted 21 October, 2019; originally announced October 2019.

Comments: 6 pages, 4 Figures, published in : (2015) IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA)

Showing 1–30 of 30 results for author: Hamidi, M