Search | arXiv e-print repository

arXiv:2504.01995 [pdf, other]

Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics

Authors: Hamed Mahdavi, Alireza Hashemi, Majid Daliri, Pegah Mohammadipour, Alireza Farhadi, Samira Malek, Yekta Yazdanifard, Amir Khasahmadi, Vasant Honavar

Abstract: Recent advances in large language models (LLMs) have shown impressive progress in mathematical reasoning tasks. However, current evaluation benchmarks predominantly focus on the accuracy of final answers, often overlooking the crucial logical rigor for mathematical problem solving. The claim that state-of-the-art LLMs can solve Math Olympiad-level problems requires closer examination. To explore t… ▽ More Recent advances in large language models (LLMs) have shown impressive progress in mathematical reasoning tasks. However, current evaluation benchmarks predominantly focus on the accuracy of final answers, often overlooking the crucial logical rigor for mathematical problem solving. The claim that state-of-the-art LLMs can solve Math Olympiad-level problems requires closer examination. To explore this, we conducted both qualitative and quantitative human evaluations of proofs generated by LLMs, and developed a schema for automatically assessing their reasoning capabilities. Our study reveals that current LLMs fall significantly short of solving challenging Olympiad-level problems and frequently fail to distinguish correct mathematical reasoning from clearly flawed solutions. Our analyses demonstrate that the occasional correct final answers provided by LLMs often result from pattern recognition or heuristic shortcuts rather than genuine mathematical reasoning. These findings underscore the substantial gap between LLM performance and human expertise in advanced mathematical reasoning and highlight the importance of developing benchmarks that prioritize the soundness of the reasoning used to arrive at an answer rather than the mere correctness of the final answers. △ Less

Submitted 10 April, 2025; v1 submitted 31 March, 2025; originally announced April 2025.

arXiv:2503.07464 [pdf, other]

Learning to Localize Leakage of Cryptographic Sensitive Variables

Authors: Jimmy Gammell, Anand Raghunathan, Abolfazl Hashemi, Kaushik Roy

Abstract: While cryptographic algorithms such as the ubiquitous Advanced Encryption Standard (AES) are secure, *physical implementations* of these algorithms in hardware inevitably 'leak' sensitive data such as cryptographic keys. A particularly insidious form of leakage arises from the fact that hardware consumes power and emits radiation in a manner that is statistically associated with the data it proces… ▽ More While cryptographic algorithms such as the ubiquitous Advanced Encryption Standard (AES) are secure, *physical implementations* of these algorithms in hardware inevitably 'leak' sensitive data such as cryptographic keys. A particularly insidious form of leakage arises from the fact that hardware consumes power and emits radiation in a manner that is statistically associated with the data it processes and the instructions it executes. Supervised deep learning has emerged as a state-of-the-art tool for carrying out *side-channel attacks*, which exploit this leakage by learning to map power/radiation measurements throughout encryption to the sensitive data operated on during that encryption. In this work we develop a principled deep learning framework for determining the relative leakage due to measurements recorded at different points in time, in order to inform *defense* against such attacks. This information is invaluable to cryptographic hardware designers for understanding *why* their hardware leaks and how they can mitigate it (e.g. by indicating the particular sections of code or electronic components which are responsible). Our framework is based on an adversarial game between a family of classifiers trained to estimate the conditional distributions of sensitive data given subsets of measurements, and a budget-constrained noise distribution which probabilistically erases individual measurements to maximize the loss of these classifiers. We demonstrate our method's efficacy and ability to overcome limitations of prior work through extensive experimental comparison with 8 baseline methods using 3 evaluation metrics and 6 publicly-available power/EM trace datasets from AES, ECC and RSA implementations. We provide an open-source PyTorch implementation of these experiments. △ Less

Submitted 10 March, 2025; originally announced March 2025.

Comments: 52 pages, 30 figures. Our code can be found at https://github.com/jimgammell/learning_to_localize_leakage

arXiv:2503.02112 [pdf, other]

Building Machine Learning Challenges for Anomaly Detection in Science

Authors: Elizabeth G. Campolongo, Yuan-Tang Chou, Ekaterina Govorkova, Wahid Bhimji, Wei-Lun Chao, Chris Harris, Shih-Chieh Hsu, Hilmar Lapp, Mark S. Neubauer, Josephine Namayanja, Aneesh Subramanian, Philip Harris, Advaith Anand, David E. Carlyn, Subhankar Ghosh, Christopher Lawrence, Eric Moreno, Ryan Raikman, Jiaman Wu, Ziheng Zhang, Bayu Adhi, Mohammad Ahmadi Gharehtoragh, Saúl Alonso Monsalve, Marta Babicz, Furqan Baig , et al. (125 additional authors not shown)

Abstract: Scientific discoveries are often made by finding a pattern or object that was not predicted by the known rules of science. Oftentimes, these anomalous events or objects that do not conform to the norms are an indication that the rules of science governing the data are incomplete, and something new needs to be present to explain these unexpected outliers. The challenge of finding anomalies can be c… ▽ More Scientific discoveries are often made by finding a pattern or object that was not predicted by the known rules of science. Oftentimes, these anomalous events or objects that do not conform to the norms are an indication that the rules of science governing the data are incomplete, and something new needs to be present to explain these unexpected outliers. The challenge of finding anomalies can be confounding since it requires codifying a complete knowledge of the known scientific behaviors and then projecting these known behaviors on the data to look for deviations. When utilizing machine learning, this presents a particular challenge since we require that the model not only understands scientific data perfectly but also recognizes when the data is inconsistent and out of the scope of its trained behavior. In this paper, we present three datasets aimed at developing machine learning-based anomaly detection for disparate scientific domains covering astrophysics, genomics, and polar science. We present the different datasets along with a scheme to make machine learning challenges around the three datasets findable, accessible, interoperable, and reusable (FAIR). Furthermore, we present an approach that generalizes to future machine learning challenges, enabling the possibility of large, more compute-intensive challenges that can ultimately lead to scientific discovery. △ Less

Submitted 29 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

Comments: 17 pages 6 figures to be submitted to Nature Communications

arXiv:2411.00143 [pdf, other]

Enhancing Brain Source Reconstruction through Physics-Informed 3D Neural Networks

Authors: Marco Morik, Ali Hashemi, Klaus-Robert Müller, Stefan Haufe, Shinichi Nakajima

Abstract: Reconstructing brain sources is a fundamental challenge in neuroscience, crucial for understanding brain function and dysfunction. Electroencephalography (EEG) signals have a high temporal resolution. However, identifying the correct spatial location of brain sources from these signals remains difficult due to the ill-posed structure of the problem. Traditional methods predominantly rely on manual… ▽ More Reconstructing brain sources is a fundamental challenge in neuroscience, crucial for understanding brain function and dysfunction. Electroencephalography (EEG) signals have a high temporal resolution. However, identifying the correct spatial location of brain sources from these signals remains difficult due to the ill-posed structure of the problem. Traditional methods predominantly rely on manually crafted priors, missing the flexibility of data-driven learning, while recent deep learning approaches focus on end-to-end learning, typically using the physical information of the forward model only for generating training data. We propose the novel hybrid method 3D-PIUNet for EEG source localization that effectively integrates the strengths of traditional and deep learning techniques. 3D-PIUNet starts from an initial physics-informed estimate by using the pseudo inverse to map from measurements to source space. Secondly, by viewing the brain as a 3D volume, we use a 3D convolutional U-Net to capture spatial dependencies and refine the solution according to the learned data prior. Training the model relies on simulated pseudo-realistic brain source data, covering different source distributions. Trained on this data, our model significantly improves spatial accuracy, demonstrating superior performance over both traditional and end-to-end data-driven methods. Additionally, we validate our findings with real EEG data from a visual task, where 3D-PIUNet successfully identifies the visual cortex and reconstructs the expected temporal behavior, thereby showcasing its practical applicability. △ Less

Submitted 31 October, 2024; originally announced November 2024.

Comments: Under Review in IEEE Transactions on Medical Imaging

arXiv:2410.19207 [pdf, other]

Equitable Federated Learning with Activation Clustering

Authors: Antesh Upadhyay, Abolfazl Hashemi

Abstract: Federated learning is a prominent distributed learning paradigm that incorporates collaboration among diverse clients, promotes data locality, and thus ensures privacy. These clients have their own technological, cultural, and other biases in the process of data generation. However, the present standard often ignores this bias/heterogeneity, perpetuating bias against certain groups rather than mit… ▽ More Federated learning is a prominent distributed learning paradigm that incorporates collaboration among diverse clients, promotes data locality, and thus ensures privacy. These clients have their own technological, cultural, and other biases in the process of data generation. However, the present standard often ignores this bias/heterogeneity, perpetuating bias against certain groups rather than mitigating it. In response to this concern, we propose an equitable clustering-based framework where the clients are categorized/clustered based on how similar they are to each other. We propose a unique way to construct the similarity matrix that uses activation vectors. Furthermore, we propose a client weighing mechanism to ensure that each cluster receives equal importance and establish $O(1/\sqrt{K})$ rate of convergence to reach an $ε-$stationary solution. We assess the effectiveness of our proposed strategy against common baselines, demonstrating its efficacy in terms of reducing the bias existing amongst various client clusters and consequently ameliorating algorithmic bias against specific groups. △ Less

Submitted 1 November, 2024; v1 submitted 24 October, 2024; originally announced October 2024.

Comments: 28 pages

arXiv:2409.08917 [pdf, other]

Latent Space Score-based Diffusion Model for Probabilistic Multivariate Time Series Imputation

Authors: Guojun Liang, Najmeh Abiri, Atiye Sadat Hashemi, Jens Lundström, Stefan Byttner, Prayag Tiwari

Abstract: Accurate imputation is essential for the reliability and success of downstream tasks. Recently, diffusion models have attracted great attention in this field. However, these models neglect the latent distribution in a lower-dimensional space derived from the observed data, which limits the generative capacity of the diffusion model. Additionally, dealing with the original missing data without labe… ▽ More Accurate imputation is essential for the reliability and success of downstream tasks. Recently, diffusion models have attracted great attention in this field. However, these models neglect the latent distribution in a lower-dimensional space derived from the observed data, which limits the generative capacity of the diffusion model. Additionally, dealing with the original missing data without labels becomes particularly problematic. To address these issues, we propose the Latent Space Score-Based Diffusion Model (LSSDM) for probabilistic multivariate time series imputation. Observed values are projected onto low-dimensional latent space and coarse values of the missing data are reconstructed without knowing their ground truth values by this unsupervised learning approach. Finally, the reconstructed values are fed into a conditional diffusion model to obtain the precise imputed values of the time series. In this way, LSSDM not only possesses the power to identify the latent distribution but also seamlessly integrates the diffusion model to obtain the high-fidelity imputed values and assess the uncertainty of the dataset. Experimental results demonstrate that LSSDM achieves superior imputation performance while also providing a better explanation and uncertainty analysis of the imputation mechanism. The website of the code is \textit{https://github.com/gorgen2020/LSSDM\_imputation}. △ Less

Submitted 13 September, 2024; originally announced September 2024.

Comments: 5 pages, conference

arXiv:2408.13683 [pdf, ps, other]

Submodular Maximization Approaches for Equitable Client Selection in Federated Learning

Authors: Andrés Catalino Castillo Jiménez, Ege C. Kaya, Lintao Ye, Abolfazl Hashemi

Abstract: In a conventional Federated Learning framework, client selection for training typically involves the random sampling of a subset of clients in each iteration. However, this random selection often leads to disparate performance among clients, raising concerns regarding fairness, particularly in applications where equitable outcomes are crucial, such as in medical or financial machine learning tasks… ▽ More In a conventional Federated Learning framework, client selection for training typically involves the random sampling of a subset of clients in each iteration. However, this random selection often leads to disparate performance among clients, raising concerns regarding fairness, particularly in applications where equitable outcomes are crucial, such as in medical or financial machine learning tasks. This disparity typically becomes more pronounced with the advent of performance-centric client sampling techniques. This paper introduces two novel methods, namely SUBTRUNC and UNIONFL, designed to address the limitations of random client selection. Both approaches utilize submodular function maximization to achieve more balanced models. By modifying the facility location problem, they aim to mitigate the fairness concerns associated with random selection. SUBTRUNC leverages client loss information to diversify solutions, while UNIONFL relies on historical client selection data to ensure a more equitable performance of the final model. Moreover, these algorithms are accompanied by robust theoretical guarantees regarding convergence under reasonable assumptions. The efficacy of these methods is demonstrated through extensive evaluations across heterogeneous scenarios, revealing significant improvements in fairness as measured by a client dissimilarity metric. △ Less

Submitted 27 August, 2024; v1 submitted 24 August, 2024; originally announced August 2024.

Comments: 13 pages

arXiv:2407.21788 [pdf, other]

Vision-Language Model Based Handwriting Verification

Authors: Mihir Chauhan, Abhishek Satbhai, Mohammad Abuzar Hashemi, Mir Basheer Ali, Bina Ramamurthy, Mingchen Gao, Siwei Lyu, Sargur Srihari

Abstract: Handwriting Verification is a critical in document forensics. Deep learning based approaches often face skepticism from forensic document examiners due to their lack of explainability and reliance on extensive training data and handcrafted features. This paper explores using Vision Language Models (VLMs), such as OpenAI's GPT-4o and Google's PaliGemma, to address these challenges. By leveraging th… ▽ More Handwriting Verification is a critical in document forensics. Deep learning based approaches often face skepticism from forensic document examiners due to their lack of explainability and reliance on extensive training data and handcrafted features. This paper explores using Vision Language Models (VLMs), such as OpenAI's GPT-4o and Google's PaliGemma, to address these challenges. By leveraging their Visual Question Answering capabilities and 0-shot Chain-of-Thought (CoT) reasoning, our goal is to provide clear, human-understandable explanations for model decisions. Our experiments on the CEDAR handwriting dataset demonstrate that VLMs offer enhanced interpretability, reduce the need for large training datasets, and adapt better to diverse handwriting styles. However, results show that the CNN-based ResNet-18 architecture outperforms the 0-shot CoT prompt engineering approach with GPT-4o (Accuracy: 70%) and supervised fine-tuned PaliGemma (Accuracy: 71%), achieving an accuracy of 84% on the CEDAR AND dataset. These findings highlight the potential of VLMs in generating human-interpretable decisions while underscoring the need for further advancements to match the performance of specialized deep learning models. △ Less

Submitted 31 July, 2024; originally announced July 2024.

Comments: 4 Pages, 1 Figure, 1 Table, Accepted as Short paper at Irish Machine Vision and Image Processing (IMVIP) Conference

arXiv:2405.18320 [pdf, other]

Self-Supervised Learning Based Handwriting Verification

Authors: Mihir Chauhan, Mohammad Abuzar Hashemi, Abhishek Satbhai, Mir Basheer Ali, Bina Ramamurthy, Mingchen Gao, Siwei Lyu, Sargur Srihari

Abstract: We present SSL-HV: Self-Supervised Learning approaches applied to the task of Handwriting Verification. This task involves determining whether a given pair of handwritten images originate from the same or different writer distribution. We have compared the performance of multiple generative, contrastive SSL approaches against handcrafted feature extractors and supervised learning on CEDAR AND data… ▽ More We present SSL-HV: Self-Supervised Learning approaches applied to the task of Handwriting Verification. This task involves determining whether a given pair of handwritten images originate from the same or different writer distribution. We have compared the performance of multiple generative, contrastive SSL approaches against handcrafted feature extractors and supervised learning on CEDAR AND dataset. We show that ResNet based Variational Auto-Encoder (VAE) outperforms other generative approaches achieving 76.3% accuracy, while ResNet-18 fine-tuned using Variance-Invariance-Covariance Regularization (VICReg) outperforms other contrastive approaches achieving 78% accuracy. Using a pre-trained VAE and VICReg for the downstream task of writer verification we observed a relative improvement in accuracy of 6.7% and 9% over ResNet-18 supervised baseline with 10% writer labels. △ Less

Submitted 1 August, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: 8 pages, 2 figures, 2 tables, Accepted at Irish Machine Vision and Image Processing Conference 2024

arXiv:2405.18237 [pdf, other]

Unveiling the Cycloid Trajectory of EM Iterations in Mixed Linear Regression

Authors: Zhankun Luo, Abolfazl Hashemi

Abstract: We study the trajectory of iterations and the convergence rates of the Expectation-Maximization (EM) algorithm for two-component Mixed Linear Regression (2MLR). The fundamental goal of MLR is to learn the regression models from unlabeled observations. The EM algorithm finds extensive applications in solving the mixture of linear regressions. Recent results have established the super-linear converg… ▽ More We study the trajectory of iterations and the convergence rates of the Expectation-Maximization (EM) algorithm for two-component Mixed Linear Regression (2MLR). The fundamental goal of MLR is to learn the regression models from unlabeled observations. The EM algorithm finds extensive applications in solving the mixture of linear regressions. Recent results have established the super-linear convergence of EM for 2MLR in the noiseless and high SNR settings under some assumptions and its global convergence rate with random initialization has been affirmed. However, the exponent of convergence has not been theoretically estimated and the geometric properties of the trajectory of EM iterations are not well-understood. In this paper, first, using Bessel functions we provide explicit closed-form expressions for the EM updates under all SNR regimes. Then, in the noiseless setting, we completely characterize the behavior of EM iterations by deriving a recurrence relation at the population level and notably show that all the iterations lie on a certain cycloid. Based on this new trajectory-based analysis, we exhibit the theoretical estimate for the exponent of super-linear convergence and further improve the statistical error bound at the finite-sample level. Our analysis provides a new framework for studying the behavior of EM for Mixed Linear Regression. △ Less

Submitted 3 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: This paper was accepted by the 41st International Conference on Machine Learning (ICML 2024). The code for numerical experiments is available at https://github.com/dassein/cycloid_em_mlr

arXiv:2405.02188 [pdf, other]

Optimistic Regret Bounds for Online Learning in Adversarial Markov Decision Processes

Authors: Sang Bin Moon, Abolfazl Hashemi

Abstract: The Adversarial Markov Decision Process (AMDP) is a learning framework that deals with unknown and varying tasks in decision-making applications like robotics and recommendation systems. A major limitation of the AMDP formalism, however, is pessimistic regret analysis results in the sense that although the cost function can change from one episode to the next, the evolution in many settings is not… ▽ More The Adversarial Markov Decision Process (AMDP) is a learning framework that deals with unknown and varying tasks in decision-making applications like robotics and recommendation systems. A major limitation of the AMDP formalism, however, is pessimistic regret analysis results in the sense that although the cost function can change from one episode to the next, the evolution in many settings is not adversarial. To address this, we introduce and study a new variant of AMDP, which aims to minimize regret while utilizing a set of cost predictors. For this setting, we develop a new policy search method that achieves a sublinear optimistic regret with high probability, that is a regret bound which gracefully degrades with the estimation power of the cost predictors. Establishing such optimistic regret bounds is nontrivial given that (i) as we demonstrate, the existing importance-weighted cost estimators cannot establish optimistic bounds, and (ii) the feedback model of AMDP is different (and more realistic) than the existing optimistic online learning works. Our result, in particular, hinges upon developing a novel optimistically biased cost estimator that leverages cost predictors and enables a high-probability regret analysis without imposing restrictive assumptions. We further discuss practical extensions of the proposed scheme and demonstrate its efficacy numerically. △ Less

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.08003 [pdf, other]

Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis

Authors: Guangchen Lan, Dong-Jun Han, Abolfazl Hashemi, Vaneet Aggarwal, Christopher G. Brinton

Abstract: To improve the efficiency of reinforcement learning (RL), we propose a novel asynchronous federated reinforcement learning (FedRL) framework termed AFedPG, which constructs a global model through collaboration among $N$ agents using policy gradient (PG) updates. To address the challenge of lagged policies in asynchronous settings, we design a delay-adaptive lookahead technique \textit{specifically… ▽ More To improve the efficiency of reinforcement learning (RL), we propose a novel asynchronous federated reinforcement learning (FedRL) framework termed AFedPG, which constructs a global model through collaboration among $N$ agents using policy gradient (PG) updates. To address the challenge of lagged policies in asynchronous settings, we design a delay-adaptive lookahead technique \textit{specifically for FedRL} that can effectively handle heterogeneous arrival times of policy gradients. We analyze the theoretical global convergence bound of AFedPG, and characterize the advantage of the proposed algorithm in terms of both the sample complexity and time complexity. Specifically, our AFedPG method achieves $O(\frac{ε^{-2.5}}{N})$ sample complexity for global convergence at each agent on average. Compared to the single agent setting with $O(ε^{-2.5})$ sample complexity, it enjoys a linear speedup with respect to the number of agents. Moreover, compared to synchronous FedPG, AFedPG improves the time complexity from $O(\frac{t_{\max}}{N})$ to $O({\sum_{i=1}^{N} \frac{1}{t_{i}}})^{-1}$, where $t_{i}$ denotes the time consumption in each iteration at agent $i$, and $t_{\max}$ is the largest one. The latter complexity $O({\sum_{i=1}^{N} \frac{1}{t_{i}}})^{-1}$ is always smaller than the former one, and this improvement becomes significant in large-scale federated settings with heterogeneous computing powers ($t_{\max}\gg t_{\min}$). Finally, we empirically verify the improved performance of AFedPG in four widely used MuJoCo environments with varying numbers of agents. We also demonstrate the advantages of AFedPG in various computing heterogeneity scenarios. △ Less

Submitted 23 January, 2025; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: Published as a conference paper at ICLR 2025

ACM Class: I.2.6; I.2.11

arXiv:2404.05919 [pdf, other]

AdaGossip: Adaptive Consensus Step-size for Decentralized Deep Learning with Communication Compression

Authors: Sai Aparna Aketi, Abolfazl Hashemi, Kaushik Roy

Abstract: Decentralized learning is crucial in supporting on-device learning over large distributed datasets, eliminating the need for a central server. However, the communication overhead remains a major bottleneck for the practical realization of such decentralized setups. To tackle this issue, several algorithms for decentralized training with compressed communication have been proposed in the literature… ▽ More Decentralized learning is crucial in supporting on-device learning over large distributed datasets, eliminating the need for a central server. However, the communication overhead remains a major bottleneck for the practical realization of such decentralized setups. To tackle this issue, several algorithms for decentralized training with compressed communication have been proposed in the literature. Most of these algorithms introduce an additional hyper-parameter referred to as consensus step-size which is tuned based on the compression ratio at the beginning of the training. In this work, we propose AdaGossip, a novel technique that adaptively adjusts the consensus step-size based on the compressed model differences between neighboring agents. We demonstrate the effectiveness of the proposed method through an exhaustive set of experiments on various Computer Vision datasets (CIFAR-10, CIFAR-100, Fashion MNIST, Imagenette, and ImageNet), model architectures, and network topologies. Our experiments show that the proposed method achieves superior performance ($0-2\%$ improvement in test accuracy) compared to the current state-of-the-art method for decentralized learning with communication compression. △ Less

Submitted 8 April, 2024; originally announced April 2024.

Comments: 11 pages, 3 figures, 8 tables. arXiv admin note: text overlap with arXiv:2305.04792, arXiv:2310.15890

arXiv:2404.03759 [pdf, other]

Localized Distributional Robustness in Submodular Multi-Task Subset Selection

Authors: Ege C. Kaya, Abolfazl Hashemi

Abstract: In this work, we approach the problem of multi-task submodular optimization with the perspective of local distributional robustness, within the neighborhood of a reference distribution which assigns an importance score to each task. We initially propose to introduce a regularization term which makes use of the relative entropy to the standard multi-task objective. We then demonstrate through duali… ▽ More In this work, we approach the problem of multi-task submodular optimization with the perspective of local distributional robustness, within the neighborhood of a reference distribution which assigns an importance score to each task. We initially propose to introduce a regularization term which makes use of the relative entropy to the standard multi-task objective. We then demonstrate through duality that this novel formulation itself is equivalent to the maximization of a monotone increasing function composed with a submodular function, which may be efficiently carried out through standard greedy selection methods. This approach bridges the existing gap in the optimization of performance-robustness trade-offs in multi-task subset selection. To numerically validate our theoretical results, we test the proposed method in two different settings, one on the selection of satellites in low Earth orbit constellations in the context of a sensor selection problem involving weak-submodular functions, and the other on an image summarization task using neural networks involving submodular functions. Our method is compared with two other algorithms focused on optimizing the performance of the worst-case task, and on directly optimizing the performance on the reference distribution itself. We conclude that our novel formulation produces a solution that is locally distributional robust, and computationally inexpensive. △ Less

Submitted 3 November, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: 41 pages, 7 figures. A preliminary version of this article was presented at the 2023 Allerton Conference on Communication, Control, and Computing. This version is to be published in IEEE Transactions on Signal Processing

arXiv:2403.13247 [pdf, other]

FedNMUT -- Federated Noisy Model Update Tracking Convergence Analysis

Authors: Vishnu Pandi Chellapandi, Antesh Upadhyay, Abolfazl Hashemi, Stanislaw H. Żak

Abstract: A novel Decentralized Noisy Model Update Tracking Federated Learning algorithm (FedNMUT) is proposed that is tailored to function efficiently in the presence of noisy communication channels that reflect imperfect information exchange. This algorithm uses gradient tracking to minimize the impact of data heterogeneity while minimizing communication overhead. The proposed algorithm incorporates noise… ▽ More A novel Decentralized Noisy Model Update Tracking Federated Learning algorithm (FedNMUT) is proposed that is tailored to function efficiently in the presence of noisy communication channels that reflect imperfect information exchange. This algorithm uses gradient tracking to minimize the impact of data heterogeneity while minimizing communication overhead. The proposed algorithm incorporates noise into its parameters to mimic the conditions of noisy communication channels, thereby enabling consensus among clients through a communication graph topology in such challenging environments. FedNMUT prioritizes parameter sharing and noise incorporation to increase the resilience of decentralized learning systems against noisy communications. Theoretical results for the smooth non-convex objective function are provided by us, and it is shown that the $ε-$stationary solution is achieved by our algorithm at the rate of $\mathcal{O}\left(\frac{1}{\sqrt{T}}\right)$, where $T$ is the total number of communication rounds. Additionally, via empirical validation, we demonstrated that the performance of FedNMUT is superior to the existing state-of-the-art methods and conventional parameter-mixing approaches in dealing with imperfect information sharing. This proves the capability of the proposed algorithm to counteract the negative effects of communication noise in a decentralized learning framework. △ Less

Submitted 24 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2303.10695

arXiv:2402.18726 [pdf, other]

Unveiling Privacy, Memorization, and Input Curvature Links

Authors: Deepak Ravikumar, Efstathia Soufleri, Abolfazl Hashemi, Kaushik Roy

Abstract: Deep Neural Nets (DNNs) have become a pervasive tool for solving many emerging problems. However, they tend to overfit to and memorize the training set. Memorization is of keen interest since it is closely related to several concepts such as generalization, noisy learning, and privacy. To study memorization, Feldman (2019) proposed a formal score, however its computational requirements limit its p… ▽ More Deep Neural Nets (DNNs) have become a pervasive tool for solving many emerging problems. However, they tend to overfit to and memorize the training set. Memorization is of keen interest since it is closely related to several concepts such as generalization, noisy learning, and privacy. To study memorization, Feldman (2019) proposed a formal score, however its computational requirements limit its practical use. Recent research has shown empirical evidence linking input loss curvature (measured by the trace of the loss Hessian w.r.t inputs) and memorization. It was shown to be ~3 orders of magnitude more efficient than calculating the memorization score. However, there is a lack of theoretical understanding linking memorization with input loss curvature. In this paper, we not only investigate this connection but also extend our analysis to establish theoretical links between differential privacy, memorization, and input loss curvature. First, we derive an upper bound on memorization characterized by both differential privacy and input loss curvature. Second, we present a novel insight showing that input loss curvature is upper-bounded by the differential privacy parameter. Our theoretical findings are further empirically validated using deep models on CIFAR and ImageNet datasets, showing a strong correlation between our theoretical predictions and results observed in practice. △ Less

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2401.05379 [pdf, other]

AutoVisual Fusion Suite: A Comprehensive Evaluation of Image Segmentation and Voice Conversion Tools on HuggingFace Platform

Authors: Amirreza Hashemi

Abstract: This study presents a comprehensive evaluation of tools available on the HuggingFace platform for two pivotal applications in artificial intelligence: image segmentation and voice conversion. The primary objective was to identify the top three tools within each category and subsequently install and configure these tools on Linux systems. We leveraged the power of pre-trained segmentation models su… ▽ More This study presents a comprehensive evaluation of tools available on the HuggingFace platform for two pivotal applications in artificial intelligence: image segmentation and voice conversion. The primary objective was to identify the top three tools within each category and subsequently install and configure these tools on Linux systems. We leveraged the power of pre-trained segmentation models such as SAM and DETR Model with ResNet-50 backbone for image segmentation, and the so-vits-svc-fork model for voice conversion. This paper delves into the methodologies and challenges encountered during the implementation process, and showcases the successful combination of video segmentation and voice conversion in a unified project named AutoVisual Fusion Suite. △ Less

Submitted 12 January, 2024; v1 submitted 17 December, 2023; originally announced January 2024.

Comments: 27 pages, 21 figures

arXiv:2310.05286 [pdf, other]

Generalizable Error Modeling for Human Data Annotation: Evidence From an Industry-Scale Search Data Annotation Program

Authors: Heinrich Peters, Alireza Hashemi, James Rae

Abstract: Machine learning (ML) and artificial intelligence (AI) systems rely heavily on human-annotated data for training and evaluation. A major challenge in this context is the occurrence of annotation errors, as their effects can degrade model performance. This paper presents a predictive error model trained to detect potential errors in search relevance annotation tasks for three industry-scale ML appl… ▽ More Machine learning (ML) and artificial intelligence (AI) systems rely heavily on human-annotated data for training and evaluation. A major challenge in this context is the occurrence of annotation errors, as their effects can degrade model performance. This paper presents a predictive error model trained to detect potential errors in search relevance annotation tasks for three industry-scale ML applications (music streaming, video streaming, and mobile apps). Drawing on real-world data from an extensive search relevance annotation program, we demonstrate that errors can be predicted with moderate model performance (AUC=0.65-0.75) and that model performance generalizes well across applications (i.e., a global, task-agnostic model performs on par with task-specific models). In contrast to past research, which has often focused on predicting annotation labels from task-specific features, our model is trained to predict errors directly from a combination of task features and behavioral features derived from the annotation process, in order to achieve a high degree of generalizability. We demonstrate the usefulness of the model in the context of auditing, where prioritizing tasks with high predicted error probabilities considerably increases the amount of corrected annotation errors (e.g., 40% efficiency gains for the music streaming application). These results highlight that behavioral error detection models can yield considerable improvements in the efficiency and quality of data annotation processes. Our findings reveal critical insights into effective error management in the data annotation process, thereby contributing to the broader field of human-in-the-loop ML. △ Less

Submitted 25 September, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

arXiv:2307.07406 [pdf, other]

doi 10.1109/ACCESS.2023.3288613

Improved Convergence Analysis and SNR Control Strategies for Federated Learning in the Presence of Noise

Authors: Antesh Upadhyay, Abolfazl Hashemi

Abstract: We propose an improved convergence analysis technique that characterizes the distributed learning paradigm of federated learning (FL) with imperfect/noisy uplink and downlink communications. Such imperfect communication scenarios arise in the practical deployment of FL in emerging communication systems and protocols. The analysis developed in this paper demonstrates, for the first time, that there… ▽ More We propose an improved convergence analysis technique that characterizes the distributed learning paradigm of federated learning (FL) with imperfect/noisy uplink and downlink communications. Such imperfect communication scenarios arise in the practical deployment of FL in emerging communication systems and protocols. The analysis developed in this paper demonstrates, for the first time, that there is an asymmetry in the detrimental effects of uplink and downlink communications in FL. In particular, the adverse effect of the downlink noise is more severe on the convergence of FL algorithms. Using this insight, we propose improved Signal-to-Noise (SNR) control strategies that, discarding the negligible higher-order terms, lead to a similar convergence rate for FL as in the case of a perfect, noise-free communication channel while incurring significantly less power resources compared to existing solutions. In particular, we establish that to maintain the $O(\frac{1}{\sqrt{K}})$ rate of convergence like in the case of noise-free FL, we need to scale down the uplink and downlink noise by $Ω({\sqrt{k}})$ and $Ω({k})$ respectively, where $k$ denotes the communication round, $k=1,\dots, K$. Our theoretical result is further characterized by two major benefits: firstly, it does not assume the somewhat unrealistic assumption of bounded client dissimilarity, and secondly, it only requires smooth non-convex loss functions, a function class better suited for modern machine learning and deep learning models. We also perform extensive empirical analysis to verify the validity of our theoretical findings. △ Less

Submitted 14 July, 2023; originally announced July 2023.

Journal ref: in IEEE Access, vol. 11, pp. 63398-63416, 2023

arXiv:2307.03298 [pdf]

Application of Spherical Convolutional Neural Networks to Image Reconstruction and Denoising in Nuclear Medicine

Authors: Amirreza Hashemi, Yuemeng Feng, Arman Rahmim, Hamid Sabet

Abstract: This work investigates use of equivariant neural networks as efficient and high-performance frameworks for image reconstruction and denoising in nuclear medicine. Our work aims to tackle limitations of conventional Convolutional Neural Networks (CNNs), which require significant training. We investigated equivariant networks, aiming to reduce CNN's dependency on specific training sets. Specifically… ▽ More This work investigates use of equivariant neural networks as efficient and high-performance frameworks for image reconstruction and denoising in nuclear medicine. Our work aims to tackle limitations of conventional Convolutional Neural Networks (CNNs), which require significant training. We investigated equivariant networks, aiming to reduce CNN's dependency on specific training sets. Specifically, we implemented and evaluated equivariant spherical CNNs (SCNNs) for 2- and 3-dimensional medical imaging problems. Our results demonstrate superior quality and computational efficiency of SCNNs in both image reconstruction and denoising benchmark problems. Furthermore, we propose a novel approach to employ SCNNs as a complement to conventional image reconstruction tools, enhancing the outcomes while reducing reliance on the training set. Across all cases, we observed significant decrease in computational cost by leveraging the inherent inclusion of equivariant representatives while achieving the same or higher quality of image processing using SCNNs compared to CNNs. Additionally, we explore the potential of SCNNs for broader tomography applications, particularly those requiring rotationally variant representation. △ Less

Submitted 30 January, 2025; v1 submitted 6 July, 2023; originally announced July 2023.

arXiv:2306.05655 [pdf, other]

doi 10.1109/ACCESS.2023.3284891

Communication-Efficient Zeroth-Order Distributed Online Optimization: Algorithm, Theory, and Applications

Authors: Ege C. Kaya, M. Berk Sahin, Abolfazl Hashemi

Abstract: This paper focuses on a multi-agent zeroth-order online optimization problem in a federated learning setting for target tracking. The agents only sense their current distances to their targets and aim to maintain a minimum safe distance from each other to prevent collisions. The coordination among the agents and dissemination of collision-prevention information is managed by a central server using… ▽ More This paper focuses on a multi-agent zeroth-order online optimization problem in a federated learning setting for target tracking. The agents only sense their current distances to their targets and aim to maintain a minimum safe distance from each other to prevent collisions. The coordination among the agents and dissemination of collision-prevention information is managed by a central server using the federated learning paradigm. The proposed formulation leads to an instance of distributed online nonconvex optimization problem that is solved via a group of communication-constrained agents. To deal with the communication limitations of the agents, an error feedback-based compression scheme is utilized for agent-to-server communication. The proposed algorithm is analyzed theoretically for the general class of distributed online nonconvex optimization problems. We provide non-asymptotic convergence rates that show the dominant term is independent of the characteristics of the compression scheme. Our theoretical results feature a new approach that employs significantly more relaxed assumptions in comparison to standard literature. The performance of the proposed solution is further analyzed numerically in terms of tracking errors and collisions between agents in two relevant applications. △ Less

Submitted 8 June, 2023; originally announced June 2023.

Comments: 21 pages, 5 figures, and this paper has been accepted by IEEE Access

arXiv:2305.04792 [pdf, other]

Global Update Tracking: A Decentralized Learning Algorithm for Heterogeneous Data

Authors: Sai Aparna Aketi, Abolfazl Hashemi, Kaushik Roy

Abstract: Decentralized learning enables the training of deep learning models over large distributed datasets generated at different locations, without the need for a central server. However, in practical scenarios, the data distribution across these devices can be significantly different, leading to a degradation in model performance. In this paper, we focus on designing a decentralized learning algorithm… ▽ More Decentralized learning enables the training of deep learning models over large distributed datasets generated at different locations, without the need for a central server. However, in practical scenarios, the data distribution across these devices can be significantly different, leading to a degradation in model performance. In this paper, we focus on designing a decentralized learning algorithm that is less susceptible to variations in data distribution across devices. We propose Global Update Tracking (GUT), a novel tracking-based method that aims to mitigate the impact of heterogeneous data in decentralized learning without introducing any communication overhead. We demonstrate the effectiveness of the proposed technique through an exhaustive set of experiments on various Computer Vision datasets (CIFAR-10, CIFAR-100, Fashion MNIST, and ImageNette), model architectures, and network topologies. Our experiments show that the proposed method achieves state-of-the-art performance for decentralized learning on heterogeneous data via a $1-6\%$ improvement in test accuracy compared to other existing techniques. △ Less

Submitted 8 May, 2023; originally announced May 2023.

Comments: 22 pages, 10 tables, 3 figures

arXiv:2303.10695 [pdf, other]

On the Convergence of Decentralized Federated Learning Under Imperfect Information Sharing

Authors: Vishnu Pandi Chellapandi, Antesh Upadhyay, Abolfazl Hashemi, Stanislaw H /. Zak

Abstract: Decentralized learning and optimization is a central problem in control that encompasses several existing and emerging applications, such as federated learning. While there exists a vast literature on this topic and most methods centered around the celebrated average-consensus paradigm, less attention has been devoted to scenarios where the communication between the agents may be imperfect. To thi… ▽ More Decentralized learning and optimization is a central problem in control that encompasses several existing and emerging applications, such as federated learning. While there exists a vast literature on this topic and most methods centered around the celebrated average-consensus paradigm, less attention has been devoted to scenarios where the communication between the agents may be imperfect. To this end, this paper presents three different algorithms of Decentralized Federated Learning (DFL) in the presence of imperfect information sharing modeled as noisy communication channels. The first algorithm, Federated Noisy Decentralized Learning (FedNDL1), comes from the literature, where the noise is added to their parameters to simulate the scenario of the presence of noisy communication channels. This algorithm shares parameters to form a consensus with the clients based on a communication graph topology through a noisy communication channel. The proposed second algorithm (FedNDL2) is similar to the first algorithm but with added noise to the parameters, and it performs the gossip averaging before the gradient optimization. The proposed third algorithm (FedNDL3), on the other hand, shares the gradients through noisy communication channels instead of the parameters. Theoretical and experimental results demonstrate that under imperfect information sharing, the third scheme that mixes gradients is more robust in the presence of a noisy channel compared with the algorithms from the literature that mix the parameters. △ Less

Submitted 19 March, 2023; originally announced March 2023.

Comments: 24 pages, 2 figures

arXiv:2301.10960 [pdf, other]

Visiting Distant Neighbors in Graph Convolutional Networks

Authors: Alireza Hashemi, Hernan Makse

Abstract: We extend the graph convolutional network method for deep learning on graph data to higher order in terms of neighboring nodes. In order to construct representations for a node in a graph, in addition to the features of the node and its immediate neighboring nodes, we also include more distant nodes in the calculations. In experimenting with a number of publicly available citation graph datasets,… ▽ More We extend the graph convolutional network method for deep learning on graph data to higher order in terms of neighboring nodes. In order to construct representations for a node in a graph, in addition to the features of the node and its immediate neighboring nodes, we also include more distant nodes in the calculations. In experimenting with a number of publicly available citation graph datasets, we show that this higher order neighbor visiting pays off by outperforming the original model especially when we have a limited number of available labeled data points for the training of the model. △ Less

Submitted 22 May, 2024; v1 submitted 26 January, 2023; originally announced January 2023.

arXiv:2207.01105 [pdf, other]

Scalable Polar Code Construction for Successive Cancellation List Decoding: A Graph Neural Network-Based Approach

Authors: Yun Liao, Seyyed Ali Hashemi, Hengjie Yang, John M. Cioffi

Abstract: While constructing polar codes for successive-cancellation decoding can be implemented efficiently by sorting the bit-channels, finding optimal polar codes for cyclic-redundancy-check-aided successive-cancellation list (CA-SCL) decoding in an efficient and scalable manner still awaits investigation. This paper first maps a polar code to a unique heterogeneous graph called the polar-code-constructi… ▽ More While constructing polar codes for successive-cancellation decoding can be implemented efficiently by sorting the bit-channels, finding optimal polar codes for cyclic-redundancy-check-aided successive-cancellation list (CA-SCL) decoding in an efficient and scalable manner still awaits investigation. This paper first maps a polar code to a unique heterogeneous graph called the polar-code-construction message-passing (PCCMP) graph. Next, a heterogeneous graph-neural-network-based iterative message-passing (IMP) algorithm is proposed which aims to find a PCCMP graph that corresponds to the polar code with minimum frame error rate under CA-SCL decoding. This new IMP algorithm's major advantage lies in its scalability power. That is, the model complexity is independent of the blocklength and code rate, and a trained IMP model over a short polar code can be readily applied to a long polar code's construction. Numerical experiments show that IMP-based polar-code constructions outperform classical constructions under CA-SCL decoding. In addition, when an IMP model trained on a length-128 polar code directly applies to the construction of polar codes with different code rates and blocklengths, simulations show that these polar code constructions deliver comparable performance to the 5G polar codes. △ Less

Submitted 13 May, 2023; v1 submitted 3 July, 2022; originally announced July 2022.

Comments: 33 pages, 11 figures, submitted to IEEE Transactions on Communications

arXiv:2202.04786 [pdf, other]

No-Regret Learning in Dynamic Stackelberg Games

Authors: Niklas Lauffer, Mahsa Ghasemi, Abolfazl Hashemi, Yagiz Savas, Ufuk Topcu

Abstract: In a Stackelberg game, a leader commits to a randomized strategy, and a follower chooses their best strategy in response. We consider an extension of a standard Stackelberg game, called a discrete-time dynamic Stackelberg game, that has an underlying state space that affects the leader's rewards and available strategies and evolves in a Markovian manner depending on both the leader and follower's… ▽ More In a Stackelberg game, a leader commits to a randomized strategy, and a follower chooses their best strategy in response. We consider an extension of a standard Stackelberg game, called a discrete-time dynamic Stackelberg game, that has an underlying state space that affects the leader's rewards and available strategies and evolves in a Markovian manner depending on both the leader and follower's selected strategies. Although standard Stackelberg games have been utilized to improve scheduling in security domains, their deployment is often limited by requiring complete information of the follower's utility function. In contrast, we consider scenarios where the follower's utility function is unknown to the leader; however, it can be linearly parameterized. Our objective then is to provide an algorithm that prescribes a randomized strategy to the leader at each step of the game based on observations of how the follower responded in previous steps. We design a no-regret learning algorithm that, with high probability, achieves a regret bound (when compared to the best policy in hindsight) which is sublinear in the number of time steps; the degree of sublinearity depends on the number of features representing the follower's utility function. The regret of the proposed learning algorithm is independent of the size of the state space and polynomial in the rest of the parameters of the game. We show that the proposed learning algorithm outperforms existing model-free reinforcement learning approaches. △ Less

Submitted 9 February, 2022; originally announced February 2022.

Comments: Preprint, under review

ACM Class: I.2.6; I.2.11

arXiv:2112.00057 [pdf, ps, other]

Successive Syndrome-Check Decoding of Polar Codes

Authors: Seyyed Ali Hashemi, Marco Mondelli, John Cioffi, Andrea Goldsmith

Abstract: A two-part successive syndrome-check decoding of polar codes is proposed with the first part successively refining the received codeword and the second part checking its syndrome. A new formulation of the successive-cancellation (SC) decoding algorithm is presented that allows for successively refining the received codeword by comparing the log-likelihood ratio value of a frozen bit with its prede… ▽ More A two-part successive syndrome-check decoding of polar codes is proposed with the first part successively refining the received codeword and the second part checking its syndrome. A new formulation of the successive-cancellation (SC) decoding algorithm is presented that allows for successively refining the received codeword by comparing the log-likelihood ratio value of a frozen bit with its predefined value. The syndrome of the refined received codeword is then checked for possible errors. In case there are no errors, the decoding process is terminated. Otherwise, the decoder continues to refine the received codeword. The proposed method is extended to the case of SC list (SCL) decoding by terminating the decoding process when the syndrome of the best candidate in the list indicates no errors. Simulation results show that the proposed method reduces the time-complexity of SC and SCL decoders and their fast variants, especially at high signal-to-noise ratios. △ Less

Submitted 30 November, 2021; originally announced December 2021.

Comments: 2021 Asilomar Conference on Signals, Systems, and Computers

arXiv:2111.01692 [pdf, other]

Efficient Hierarchical Bayesian Inference for Spatio-temporal Regression Models in Neuroimaging

Authors: Ali Hashemi, Yijing Gao, Chang Cai, Sanjay Ghosh, Klaus-Robert Müller, Srikantan S. Nagarajan, Stefan Haufe

Abstract: Several problems in neuroimaging and beyond require inference on the parameters of multi-task sparse hierarchical regression models. Examples include M/EEG inverse problems, neural encoding models for task-based fMRI analyses, and climate science. In these domains, both the model parameters to be inferred and the measurement noise may exhibit a complex spatio-temporal structure. Existing work eith… ▽ More Several problems in neuroimaging and beyond require inference on the parameters of multi-task sparse hierarchical regression models. Examples include M/EEG inverse problems, neural encoding models for task-based fMRI analyses, and climate science. In these domains, both the model parameters to be inferred and the measurement noise may exhibit a complex spatio-temporal structure. Existing work either neglects the temporal structure or leads to computationally demanding inference schemes. Overcoming these limitations, we devise a novel flexible hierarchical Bayesian framework within which the spatio-temporal dynamics of model parameters and noise are modeled to have Kronecker product covariance structure. Inference in our framework is based on majorization-minimization optimization and has guaranteed convergence properties. Our highly efficient algorithms exploit the intrinsic Riemannian geometry of temporal autocovariance matrices. For stationary dynamics described by Toeplitz matrices, the theory of circulant embeddings is employed. We prove convex bounding properties and derive update rules of the resulting algorithms. On both synthetic and real neural data from M/EEG, we demonstrate that our methods lead to improved performance. △ Less

Submitted 23 November, 2021; v1 submitted 2 November, 2021; originally announced November 2021.

Comments: Accepted to the 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

arXiv:2109.12239 [pdf, other]

doi 10.1109/ACCESS.2021.3140151

Fast Successive-Cancellation List Flip Decoding of Polar Codes

Authors: Nghia Doan, Seyyed Ali Hashemi, Warren J. Gross

Abstract: This work presents a fast successive-cancellation list flip (Fast-SCLF) decoding algorithm for polar codes that addresses the high latency issue associated with the successive-cancellation list flip (SCLF) decoding algorithm. We first propose a bit-flipping strategy tailored to the state-of-the-art fast successive-cancellation list (FSCL) decoding that avoids tree-traversal in the binary tree repr… ▽ More This work presents a fast successive-cancellation list flip (Fast-SCLF) decoding algorithm for polar codes that addresses the high latency issue associated with the successive-cancellation list flip (SCLF) decoding algorithm. We first propose a bit-flipping strategy tailored to the state-of-the-art fast successive-cancellation list (FSCL) decoding that avoids tree-traversal in the binary tree representation of SCLF, thus reducing the latency of the decoding process. We then derive a parameterized path selection error model to accurately estimate the bit index at which the correct decoding path is eliminated from the initial FSCL decoding. The trainable parameter is optimized online based on an efficient supervised learning framework. Simulation results show that for a polar code of length 512 with 256 information bits, with similar error-correction performance and memory consumption, the proposed Fast-SCLF decoder reduces up to $73.4\%$ of the average decoding latency of the SCLF decoder with the same list size at the frame error rate of $10^{-4}$, while incurring a maximum computational complexity overhead of $27.6\%$. For the same polar code of length 512 with 256 information bits and at practical signal-to-noise ratios, the proposed decoder with list size 4 reduces $89.3\%$ and $43.7\%$ of the average complexity and decoding latency of the FSCL decoder with list size 32 (FSCL-32), respectively, while also reducing $83.2\%$ of the memory consumption of FSCL-32. The significant improvements of the proposed decoder come at the cost of $0.07$ dB error-correction performance degradation compared with FSCL-32. △ Less

Submitted 23 January, 2022; v1 submitted 24 September, 2021; originally announced September 2021.

Comments: Published in IEEE Access, Volume: 10, Page(s): 5568 - 5584, Date of Publication: 04 January 2022

arXiv:2109.04993 [pdf, other]

LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation

Authors: Mohammad Abuzar Hashemi, Zhanghexuan Li, Mihir Chauhan, Yan Shen, Abhishek Satbhai, Mir Basheer Ali, Mingchen Gao, Sargur Srihari

Abstract: Pre-training visual and textual representations from large-scale image-text pairs is becoming a standard approach for many downstream vision-language tasks. The transformer-based models learn inter and intra-modal attention through a list of self-supervised learning tasks. This paper proposes LAViTeR, a novel architecture for visual and textual representation learning. The main module, Visual Text… ▽ More Pre-training visual and textual representations from large-scale image-text pairs is becoming a standard approach for many downstream vision-language tasks. The transformer-based models learn inter and intra-modal attention through a list of self-supervised learning tasks. This paper proposes LAViTeR, a novel architecture for visual and textual representation learning. The main module, Visual Textual Alignment (VTA) will be assisted by two auxiliary tasks, GAN-based image synthesis and Image Captioning. We also propose a new evaluation metric measuring the similarity between the learnt visual and textual embedding. The experimental results on two public datasets, CUB and MS-COCO, demonstrate superior visual and textual representation alignment in the joint feature embedding space △ Less

Submitted 1 October, 2024; v1 submitted 4 September, 2021; originally announced September 2021.

Comments: 15 pages, 10 Figures, 5 Tables. Accepted for Oral Presentation at Irish Machine Vision and Image Processing Conference Proceedings (IMVIP), 2024

arXiv:2109.02122 [pdf, other]

Decoding Reed-Muller Codes with Successive Codeword Permutations

Authors: Nghia Doan, Seyyed Ali Hashemi, Marco Mondelli, Warren J. Gross

Abstract: A novel recursive list decoding (RLD) algorithm for Reed-Muller (RM) codes based on successive permutations (SP) of the codeword is presented. A low-complexity SP scheme applied to a subset of the symmetry group of RM codes is first proposed to carefully select a good codeword permutation on the fly. Then, the proposed SP technique is integrated into an improved RLD algorithm that initializes diff… ▽ More A novel recursive list decoding (RLD) algorithm for Reed-Muller (RM) codes based on successive permutations (SP) of the codeword is presented. A low-complexity SP scheme applied to a subset of the symmetry group of RM codes is first proposed to carefully select a good codeword permutation on the fly. Then, the proposed SP technique is integrated into an improved RLD algorithm that initializes different decoding paths with random codeword permutations, which are sampled from the full symmetry group of RM codes. Finally, efficient latency and complexity reduction schemes are introduced that virtually preserve the error-correction performance of the proposed decoder. Simulation results demonstrate that at the target frame error rate of $10^{-3}$ for the RM code of length $256$ with $163$ information bits, the proposed decoder reduces $6\%$ of the computational complexity and $22\%$ of the decoding latency of the state-of-the-art semi-parallel simplified successive-cancellation decoder with fast Hadamard transform (SSC-FHT) that uses $96$ permutations from the full symmetry group of RM codes, while relatively maintaining the error-correction performance and memory consumption of the semi-parallel permuted SSC-FHT decoder. △ Less

Submitted 20 September, 2022; v1 submitted 5 September, 2021; originally announced September 2021.

Comments: Accepted for publication in IEEE Transactions on Communications

arXiv:2108.12550 [pdf, ps, other]

Successive-Cancellation Decoding of Reed-Muller Codes with Fast Hadamard Transform

Authors: Nghia Doan, Seyyed Ali Hashemi, Warren J. Gross

Abstract: A novel permuted fast successive-cancellation list decoding algorithm with fast Hadamard transform (FHT-FSCL) is presented. The proposed decoder initializes $L$ $(L\ge1)$ active decoding paths with $L$ random codeword permutations sampled from the full symmetry group of the codes. The path extension in the permutation domain is carried out until the first constituent RM code of order $1$ is visite… ▽ More A novel permuted fast successive-cancellation list decoding algorithm with fast Hadamard transform (FHT-FSCL) is presented. The proposed decoder initializes $L$ $(L\ge1)$ active decoding paths with $L$ random codeword permutations sampled from the full symmetry group of the codes. The path extension in the permutation domain is carried out until the first constituent RM code of order $1$ is visited. Conventional path extension of the successive-cancellation list decoder is then utilized in the information bit domain. The simulation results show that for a RM code of length $512$ with $46$ information bits, by running $20$ parallel permuted FHT-FSCL decoders with $L=4$, we reduce $72\%$ of the computational complexity, $22\%$ of the decoding latency, and $84\%$ of the memory consumption of the state-of-the-art simplified successive-cancellation decoder that uses $512$ permutations sampled from the full symmetry group of the code, with similar error-correction performance at the target frame error rate of $10^{-4}$. △ Less

Submitted 7 February, 2022; v1 submitted 27 August, 2021; originally announced August 2021.

Comments: Submitted to an IEEE journal for possible publication

arXiv:2107.08991 [pdf, ps, other]

A Tree Search Approach for Maximum-Likelihood Decoding of Reed-Muller Codes

Authors: Seyyed Ali Hashemi, Nghia Doan, Warren J. Gross, John Cioffi, Andrea Goldsmith

Abstract: A low-complexity tree search approach is presented that achieves the maximum-likelihood (ML) decoding performance of Reed-Muller (RM) codes. The proposed approach generates a bit-flipping tree that is traversed to find the ML decoding result by performing successive-cancellation decoding after each node visit. A depth-first search (DFS) and a breadth-first search (BFS) scheme are developed and a l… ▽ More A low-complexity tree search approach is presented that achieves the maximum-likelihood (ML) decoding performance of Reed-Muller (RM) codes. The proposed approach generates a bit-flipping tree that is traversed to find the ML decoding result by performing successive-cancellation decoding after each node visit. A depth-first search (DFS) and a breadth-first search (BFS) scheme are developed and a log-likelihood-ratio-based bit-flipping metric is utilized to avoid redundant node visits in the tree. Several enhancements to the proposed algorithm are presented to further reduce the number of node visits. Simulation results confirm that the BFS scheme provides a lower average number of node visits than the existing tree search approach to decode RM codes. △ Less

Submitted 19 July, 2021; originally announced July 2021.

arXiv:2107.00116 [pdf, other]

On the Benefits of Inducing Local Lipschitzness for Robust Generative Adversarial Imitation Learning

Authors: Farzan Memarian, Abolfazl Hashemi, Scott Niekum, Ufuk Topcu

Abstract: We explore methodologies to improve the robustness of generative adversarial imitation learning (GAIL) algorithms to observation noise. Towards this objective, we study the effect of local Lipschitzness of the discriminator and the generator on the robustness of policies learned by GAIL. In many robotics applications, the learned policies by GAIL typically suffer from a degraded performance at tes… ▽ More We explore methodologies to improve the robustness of generative adversarial imitation learning (GAIL) algorithms to observation noise. Towards this objective, we study the effect of local Lipschitzness of the discriminator and the generator on the robustness of policies learned by GAIL. In many robotics applications, the learned policies by GAIL typically suffer from a degraded performance at test time since the observations from the environment might be corrupted by noise. Hence, robustifying the learned policies against the observation noise is of critical importance. To this end, we propose a regularization method to induce local Lipschitzness in the generator and the discriminator of adversarial imitation learning methods. We show that the modified objective leads to learning significantly more robust policies. Moreover, we demonstrate -- both theoretically and experimentally -- that training a locally Lipschitz discriminator leads to a locally Lipschitz generator, thereby improving the robustness of the resultant policy. We perform extensive experiments on simulated robot locomotion environments from the MuJoCo suite that demonstrate the proposed method learns policies that significantly outperform the state-of-the-art generative adversarial imitation learning algorithm when applied to test scenarios with noise-corrupted observations. △ Less

Submitted 15 January, 2024; v1 submitted 30 June, 2021; originally announced July 2021.

arXiv:2106.08882 [pdf, other]

Robust Training in High Dimensions via Block Coordinate Geometric Median Descent

Authors: Anish Acharya, Abolfazl Hashemi, Prateek Jain, Sujay Sanghavi, Inderjit S. Dhillon, Ufuk Topcu

Abstract: Geometric median (\textsc{Gm}) is a classical method in statistics for achieving a robust estimation of the uncorrupted data; under gross corruption, it achieves the optimal breakdown point of 0.5. However, its computational complexity makes it infeasible for robustifying stochastic gradient descent (SGD) for high-dimensional optimization problems. In this paper, we show that by applying \textsc{G… ▽ More Geometric median (\textsc{Gm}) is a classical method in statistics for achieving a robust estimation of the uncorrupted data; under gross corruption, it achieves the optimal breakdown point of 0.5. However, its computational complexity makes it infeasible for robustifying stochastic gradient descent (SGD) for high-dimensional optimization problems. In this paper, we show that by applying \textsc{Gm} to only a judiciously chosen block of coordinates at a time and using a memory mechanism, one can retain the breakdown point of 0.5 for smooth non-convex problems, with non-asymptotic convergence rates comparable to the SGD with \textsc{Gm}. △ Less

Submitted 16 June, 2021; originally announced June 2021.

arXiv:2106.07094 [pdf, other]

On the Convergence of Differentially Private Federated Learning on Non-Lipschitz Objectives, and with Normalized Client Updates

Authors: Rudrajit Das, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S. Dhillon

Abstract: There is a dearth of convergence results for differentially private federated learning (FL) with non-Lipschitz objective functions (i.e., when gradient norms are not bounded). The primary reason for this is that the clipping operation (i.e., projection onto an $\ell_2$ ball of a fixed radius called the clipping threshold) for bounding the sensitivity of the average update to each client's update i… ▽ More There is a dearth of convergence results for differentially private federated learning (FL) with non-Lipschitz objective functions (i.e., when gradient norms are not bounded). The primary reason for this is that the clipping operation (i.e., projection onto an $\ell_2$ ball of a fixed radius called the clipping threshold) for bounding the sensitivity of the average update to each client's update introduces bias depending on the clipping threshold and the number of local steps in FL, and analyzing this is not easy. For Lipschitz functions, the Lipschitz constant serves as a trivial clipping threshold with zero bias. However, Lipschitzness does not hold in many practical settings; moreover, verifying it and computing the Lipschitz constant is hard. Thus, the choice of the clipping threshold is non-trivial and requires a lot of tuning in practice. In this paper, we provide the first convergence result for private FL on smooth \textit{convex} objectives \textit{for a general clipping threshold} -- \textit{without assuming Lipschitzness}. We also look at a simpler alternative to clipping (for bounding sensitivity) which is \textit{normalization} -- where we use only a scaled version of the unit vector along the client updates, completely discarding the magnitude information. {The resulting normalization-based private FL algorithm is theoretically shown to have better convergence than its clipping-based counterpart on smooth convex functions. We corroborate our theory with synthetic experiments as well as experiments on benchmarking datasets. △ Less

Submitted 15 April, 2022; v1 submitted 13 June, 2021; originally announced June 2021.

arXiv:2103.03191 [pdf, other]

Generalization Bounds for Sparse Random Feature Expansions

Authors: Abolfazl Hashemi, Hayden Schaeffer, Robert Shi, Ufuk Topcu, Giang Tran, Rachel Ward

Abstract: Random feature methods have been successful in various machine learning tasks, are easy to compute, and come with theoretical accuracy bounds. They serve as an alternative approach to standard neural networks since they can represent similar function spaces without a costly training phase. However, for accuracy, random feature methods require more measurements than trainable parameters, limiting t… ▽ More Random feature methods have been successful in various machine learning tasks, are easy to compute, and come with theoretical accuracy bounds. They serve as an alternative approach to standard neural networks since they can represent similar function spaces without a costly training phase. However, for accuracy, random feature methods require more measurements than trainable parameters, limiting their use for data-scarce applications or problems in scientific machine learning. This paper introduces the sparse random feature expansion to obtain parsimonious random feature models. Specifically, we leverage ideas from compressive sensing to generate random feature expansions with theoretical guarantees even in the data-scarce setting. In particular, we provide generalization bounds for functions in a certain class (that is dense in a reproducing kernel Hilbert space) depending on the number of samples and the distribution of features. The generalization bounds improve with additional structural conditions, such as coordinate sparsity, compact clusters of the spectrum, or rapid spectral decay. In particular, by introducing sparse features, i.e. features with random sparse weights, we provide improved bounds for low order functions. We show that the sparse random feature expansions outperforms shallow networks in several scientific machine learning tasks. △ Less

Submitted 20 August, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

arXiv:2012.13378 [pdf, ps, other]

Parallelism versus Latency in Simplified Successive-Cancellation Decoding of Polar Codes

Authors: Seyyed Ali Hashemi, Marco Mondelli, Arman Fazeli, Alexander Vardy, John Cioffi, Andrea Goldsmith

Abstract: This paper characterizes the latency of the simplified successive-cancellation (SSC) decoding scheme for polar codes under hardware resource constraints. In particular, when the number of processing elements $P$ that can perform SSC decoding operations in parallel is limited, as is the case in practice, the latency of SSC decoding is $O\left(N^{1-1/μ}+\frac{N}{P}\log_2\log_2\frac{N}{P}\right)$, wh… ▽ More This paper characterizes the latency of the simplified successive-cancellation (SSC) decoding scheme for polar codes under hardware resource constraints. In particular, when the number of processing elements $P$ that can perform SSC decoding operations in parallel is limited, as is the case in practice, the latency of SSC decoding is $O\left(N^{1-1/μ}+\frac{N}{P}\log_2\log_2\frac{N}{P}\right)$, where $N$ is the block length of the code and $μ$ is the scaling exponent of the channel. Three direct consequences of this bound are presented. First, in a fully-parallel implementation where $P=\frac{N}{2}$, the latency of SSC decoding is $O\left(N^{1-1/μ}\right)$, which is sublinear in the block length. This recovers a result from our earlier work. Second, in a fully-serial implementation where $P=1$, the latency of SSC decoding scales as $O\left(N\log_2\log_2 N\right)$. The multiplicative constant is also calculated: we show that the latency of SSC decoding when $P=1$ is given by $\left(2+o(1)\right) N\log_2\log_2 N$. Third, in a semi-parallel implementation, the smallest $P$ that gives the same latency as that of the fully-parallel implementation is $P=N^{1/μ}$. The tightness of our bound on SSC decoding latency and the applicability of the foregoing results is validated through extensive simulations. △ Less

Submitted 24 December, 2020; originally announced December 2020.

arXiv:2012.04061 [pdf, other]

Faster Non-Convex Federated Learning via Global and Local Momentum

Authors: Rudrajit Das, Anish Acharya, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S. Dhillon, Ufuk Topcu

Abstract: We propose \texttt{FedGLOMO}, a novel federated learning (FL) algorithm with an iteration complexity of $\mathcal{O}(ε^{-1.5})$ to converge to an $ε$-stationary point (i.e., $\mathbb{E}[\|\nabla f(\bm{x})\|^2] \leq ε$) for smooth non-convex functions -- under arbitrary client heterogeneity and compressed communication -- compared to the $\mathcal{O}(ε^{-2})$ complexity of most prior works. Our key… ▽ More We propose \texttt{FedGLOMO}, a novel federated learning (FL) algorithm with an iteration complexity of $\mathcal{O}(ε^{-1.5})$ to converge to an $ε$-stationary point (i.e., $\mathbb{E}[\|\nabla f(\bm{x})\|^2] \leq ε$) for smooth non-convex functions -- under arbitrary client heterogeneity and compressed communication -- compared to the $\mathcal{O}(ε^{-2})$ complexity of most prior works. Our key algorithmic idea that enables achieving this improved complexity is based on the observation that the convergence in FL is hampered by two sources of high variance: (i) the global server aggregation step with multiple local updates, exacerbated by client heterogeneity, and (ii) the noise of the local client-level stochastic gradients. By modeling the server aggregation step as a generalized gradient-type update, we propose a variance-reducing momentum-based global update at the server, which when applied in conjunction with variance-reduced local updates at the clients, enables \texttt{FedGLOMO} to enjoy an improved convergence rate. Moreover, we derive our results under a novel and more realistic client-heterogeneity assumption which we verify empirically -- unlike prior assumptions that are hard to verify. Our experiments illustrate the intrinsic variance reduction effect of \texttt{FedGLOMO}, which implicitly suppresses client-drift in heterogeneous data distribution settings and promotes communication efficiency. △ Less

Submitted 24 October, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

arXiv:2012.02232 [pdf, other]

doi 10.1088/2632-2153/ac1fc9

Graph Convolutional Neural Networks for Body Force Prediction

Authors: Francis Ogoke, Kazem Meidani, Amirreza Hashemi, Amir Barati Farimani

Abstract: Many scientific and engineering processes produce spatially unstructured data. However, most data-driven models require a feature matrix that enforces both a set number and order of features for each sample. They thus cannot be easily constructed for an unstructured dataset. Therefore, a graph based data-driven model to perform inference on fields defined on an unstructured mesh, using a Graph Con… ▽ More Many scientific and engineering processes produce spatially unstructured data. However, most data-driven models require a feature matrix that enforces both a set number and order of features for each sample. They thus cannot be easily constructed for an unstructured dataset. Therefore, a graph based data-driven model to perform inference on fields defined on an unstructured mesh, using a Graph Convolutional Neural Network (GCNN) is presented. The ability of the method to predict global properties from spatially irregular measurements with high accuracy is demonstrated by predicting the drag force associated with laminar flow around airfoils from scattered velocity measurements. The network can infer from field samples at different resolutions, and is invariant to the order in which the measurements within each sample are presented. The GCNN method, using inductive convolutional layers and adaptive pooling, is able to predict this quantity with a validation $R^{2}$ above 0.98, and a Normalized Mean Squared Error below 0.01, without relying on spatial structure. △ Less

Submitted 3 December, 2020; originally announced December 2020.

arXiv:2011.12882 [pdf, other]

Sparse Multi-Decoder Recursive Projection Aggregation for Reed-Muller Codes

Authors: Dorsa Fathollahi, Nariman Farsad, Seyyed Ali Hashemi, Marco Mondelli

Abstract: Reed-Muller (RM) codes are one of the oldest families of codes. Recently, a recursive projection aggregation (RPA) decoder has been proposed, which achieves a performance that is close to the maximum likelihood decoder for short-length RM codes. One of its main drawbacks, however, is the large amount of computations needed. In this paper, we devise a new algorithm to lower the computational budget… ▽ More Reed-Muller (RM) codes are one of the oldest families of codes. Recently, a recursive projection aggregation (RPA) decoder has been proposed, which achieves a performance that is close to the maximum likelihood decoder for short-length RM codes. One of its main drawbacks, however, is the large amount of computations needed. In this paper, we devise a new algorithm to lower the computational budget while keeping a performance close to that of the RPA decoder. The proposed approach consists of multiple sparse RPAs that are generated by performing only a selection of projections in each sparsified decoder. In the end, a cyclic redundancy check (CRC) is used to decide between output codewords. Simulation results show that our proposed approach reduces the RPA decoder's computations up to $80\%$ with negligible performance loss. △ Less

Submitted 26 November, 2020; v1 submitted 25 November, 2020; originally announced November 2020.

Comments: 6 pages, 12 figures

arXiv:2011.10643 [pdf, other]

On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization

Authors: Abolfazl Hashemi, Anish Acharya, Rudrajit Das, Haris Vikalo, Sujay Sanghavi, Inderjit Dhillon

Abstract: In decentralized optimization, it is common algorithmic practice to have nodes interleave (local) gradient descent iterations with gossip (i.e. averaging over the network) steps. Motivated by the training of large-scale machine learning models, it is also increasingly common to require that messages be {\em lossy compressed} versions of the local parameters. In this paper, we show that, in such co… ▽ More In decentralized optimization, it is common algorithmic practice to have nodes interleave (local) gradient descent iterations with gossip (i.e. averaging over the network) steps. Motivated by the training of large-scale machine learning models, it is also increasingly common to require that messages be {\em lossy compressed} versions of the local parameters. In this paper, we show that, in such compressed decentralized optimization settings, there are benefits to having {\em multiple} gossip steps between subsequent gradient iterations, even when the cost of doing so is appropriately accounted for e.g. by means of reducing the precision of compressed information. In particular, we show that having $O(\log\frac{1}ε)$ gradient iterations {with constant step size} - and $O(\log\frac{1}ε)$ gossip steps between every pair of these iterations - enables convergence to within $ε$ of the optimal value for smooth non-convex objectives satisfying Polyak-Łojasiewicz condition. This result also holds for smooth strongly convex objectives. To our knowledge, this is the first work that derives convergence results for nonconvex optimization under arbitrary communication compression. △ Less

Submitted 20 November, 2020; originally announced November 2020.

arXiv:2010.14919 [pdf, other]

Transferable Universal Adversarial Perturbations Using Generative Models

Authors: Atiye Sadat Hashemi, Andreas Bär, Saeed Mozaffari, Tim Fingscheidt

Abstract: Deep neural networks tend to be vulnerable to adversarial perturbations, which by adding to a natural image can fool a respective model with high confidence. Recently, the existence of image-agnostic perturbations, also known as universal adversarial perturbations (UAPs), were discovered. However, existing UAPs still lack a sufficiently high fooling rate, when being applied to an unknown target mo… ▽ More Deep neural networks tend to be vulnerable to adversarial perturbations, which by adding to a natural image can fool a respective model with high confidence. Recently, the existence of image-agnostic perturbations, also known as universal adversarial perturbations (UAPs), were discovered. However, existing UAPs still lack a sufficiently high fooling rate, when being applied to an unknown target model. In this paper, we propose a novel deep learning technique for generating more transferable UAPs. We utilize a perturbation generator and some given pretrained networks so-called source models to generate UAPs using the ImageNet dataset. Due to the similar feature representation of various model architectures in the first layer, we propose a loss formulation that focuses on the adversarial energy only in the respective first layer of the source models. This supports the transferability of our generated UAPs to any other target model. We further empirically analyze our generated UAPs and demonstrate that these perturbations generalize very well towards different target models. Surpassing the current state of the art in both, fooling rate and model-transferability, we can show the superiority of our proposed approach. Using our generated non-targeted UAPs, we obtain an average fooling rate of 93.36% on the source models (state of the art: 82.16%). Generating our UAPs on the deep ResNet-152, we obtain about a 12% absolute fooling rate advantage vs. cutting-edge methods on VGG-16 and VGG-19 target models. △ Less

Submitted 29 October, 2020; v1 submitted 28 October, 2020; originally announced October 2020.

arXiv:2009.09277 [pdf, ps, other]

Construction of Polar Codes with Reinforcement Learning

Authors: Yun Liao, Seyyed Ali Hashemi, John Cioffi, Andrea Goldsmith

Abstract: This paper formulates the polar-code construction problem for the successive-cancellation list (SCL) decoder as a maze-traversing game, which can be solved by reinforcement learning techniques. The proposed method provides a novel technique for polar-code construction that no longer depends on sorting and selecting bit-channels by reliability. Instead, this technique decides whether the input bits… ▽ More This paper formulates the polar-code construction problem for the successive-cancellation list (SCL) decoder as a maze-traversing game, which can be solved by reinforcement learning techniques. The proposed method provides a novel technique for polar-code construction that no longer depends on sorting and selecting bit-channels by reliability. Instead, this technique decides whether the input bits should be frozen in a purely sequential manner. The equivalence of optimizing the polar-code construction for the SCL decoder under this technique and maximizing the expected reward of traversing a maze is drawn. Simulation results show that the standard polar-code constructions that are designed for the successive-cancellation decoder are no longer optimal for the SCL decoder with respect to the frame error rate. In contrast, the simulations show that, with a reasonable amount of training, the game-based construction method finds code constructions that have lower frame-error rate for various code lengths and decoders compared to standard constructions. △ Less

Submitted 19 September, 2020; originally announced September 2020.

Comments: To be published in Proceedings of IEEE Globecom 2020

arXiv:2009.06796 [pdf, other]

Decoding Polar Codes with Reinforcement Learning

Authors: Nghia Doan, Seyyed Ali Hashemi, Warren Gross

Abstract: In this paper we address the problem of selecting factor-graph permutations of polar codes under belief propagation (BP) decoding to significantly improve the error-correction performance of the code. In particular, we formalize the factor-graph permutation selection as the multi-armed bandit problem in reinforcement learning and propose a decoder that acts like an online-learning agent that learn… ▽ More In this paper we address the problem of selecting factor-graph permutations of polar codes under belief propagation (BP) decoding to significantly improve the error-correction performance of the code. In particular, we formalize the factor-graph permutation selection as the multi-armed bandit problem in reinforcement learning and propose a decoder that acts like an online-learning agent that learns to select the good factor-graph permutations during the course of decoding. We use state-of-the-art algorithms for the multi-armed bandit problem and show that for a 5G polar codes of length 128 with 64 information bits, the proposed decoder has an error-correction performance gain of around 0.125 dB at the target frame error rate of 10^{-4}, when compared to the approach that randomly selects the factor-graph permutations. △ Less

Submitted 14 September, 2020; originally announced September 2020.

Comments: Accepted for presentation at IEEE GLOBECOM 2020

arXiv:2008.12483 [pdf]

A transfer learning metamodel using artificial neural networks applied to natural convection flows in enclosures

Authors: Majid Ashouri, Alireza Hashemi

Abstract: In this paper, we employed a transfer learning technique to predict the Nusselt number for natural convection flows in enclosures. Specifically, we considered the benchmark problem of a two-dimensional square enclosure with isolated horizontal walls and vertical walls at constant temperatures. The Rayleigh and Prandtl numbers are sufficient parameters to simulate this problem numerically. We adopt… ▽ More In this paper, we employed a transfer learning technique to predict the Nusselt number for natural convection flows in enclosures. Specifically, we considered the benchmark problem of a two-dimensional square enclosure with isolated horizontal walls and vertical walls at constant temperatures. The Rayleigh and Prandtl numbers are sufficient parameters to simulate this problem numerically. We adopted two approaches to this problem: Firstly, we made use of a multi-grid dataset in order to train our artificial neural network in a cost-effective manner. By monitoring the training losses for this dataset, we detected any significant anomalies that stemmed from an insufficient grid size, which we further corrected by altering the grid size or adding more data. Secondly, we sought to endow our metamodel with the ability to account for additional input features by performing transfer learning using deep neural networks. We trained a neural network with a single input feature (Rayleigh) and extended it to incorporate the effects of a second feature (Prandtl). We also considered the case of hollow enclosures, demonstrating that our learning framework can be applied to systems with higher physical complexity, while bringing the computational and training costs down. △ Less

Submitted 14 October, 2020; v1 submitted 28 August, 2020; originally announced August 2020.

arXiv:2006.01266 [pdf, other]

Leveraging Affective Bidirectional Transformers for Offensive Language Detection

Authors: AbdelRahim Elmadany, Chiyu Zhang, Muhammad Abdul-Mageed, Azadeh Hashemi

Abstract: Social media are pervasive in our life, making it necessary to ensure safe online experiences by detecting and removing offensive and hate speech. In this work, we report our submission to the Offensive Language and hate-speech Detection shared task organized with the 4th Workshop on Open-Source Arabic Corpora and Processing Tools Arabic (OSACT4). We focus on developing purely deep learning system… ▽ More Social media are pervasive in our life, making it necessary to ensure safe online experiences by detecting and removing offensive and hate speech. In this work, we report our submission to the Offensive Language and hate-speech Detection shared task organized with the 4th Workshop on Open-Source Arabic Corpora and Processing Tools Arabic (OSACT4). We focus on developing purely deep learning systems, without a need for feature engineering. For that purpose, we develop an effective method for automatic data augmentation and show the utility of training both offensive and hate speech models off (i.e., by fine-tuning) previously trained affective models (i.e., sentiment and emotion). Our best models are significantly better than a vanilla BERT model, with 89.60% acc (82.31% macro F1) for hate speech and 95.20% acc (70.51% macro F1) on official TEST data. △ Less

Submitted 16 May, 2020; originally announced June 2020.

arXiv:2005.04394 [pdf, other]

Threshold-Based Fast Successive-Cancellation Decoding of Polar Codes

Authors: Haotian Zheng, Seyyed Ali Hashemi, Alexios Balatsoukas-Stimming, Zizheng Cao, Ton Koonen, John Cioffi, Andrea Goldsmith

Abstract: Fast SC decoding overcomes the latency caused by the serial nature of the SC decoding by identifying new nodes in the upper levels of the SC decoding tree and implementing their fast parallel decoders. In this work, we first present a novel sequence repetition node corresponding to a particular class of bit sequences. Most existing special node types are special cases of the proposed sequence repe… ▽ More Fast SC decoding overcomes the latency caused by the serial nature of the SC decoding by identifying new nodes in the upper levels of the SC decoding tree and implementing their fast parallel decoders. In this work, we first present a novel sequence repetition node corresponding to a particular class of bit sequences. Most existing special node types are special cases of the proposed sequence repetition node. Then, a fast parallel decoder is proposed for this class of node. To further speed up the decoding process of general nodes outside this class, a threshold-based hard-decision-aided scheme is introduced. The threshold value that guarantees a given error-correction performance in the proposed scheme is derived theoretically. Analysis and hardware implementation results on a polar code of length $1024$ with code rates $1/4$, $1/2$, and $3/4$ show that our proposed algorithm reduces the required clock cycles by up to $8\%$, and leads to a $10\%$ improvement in the maximum operating frequency compared to state-of-the-art decoders without tangibly altering the error-correction performance. In addition, using the proposed threshold-based hard-decision-aided scheme, the decoding latency can be further reduced by $57\%$ at $\mathrm{E_b}/\mathrm{N_0} = 5.0$~dB. △ Less

Submitted 27 November, 2020; v1 submitted 9 May, 2020; originally announced May 2020.

Comments: 14 pages, 8 figures, 5 tables, submitted to IEEE Transactions on Communications

arXiv:1912.13072 [pdf, other]

AraNet: A Deep Learning Toolkit for Arabic Social Media

Authors: Muhammad Abdul-Mageed, Chiyu Zhang, Azadeh Hashemi, El Moatez Billah Nagoudi

Abstract: We describe AraNet, a collection of deep learning Arabic social media processing tools. Namely, we exploit an extensive host of publicly available and novel social media datasets to train bidirectional encoders from transformer models (BERT) to predict age, dialect, gender, emotion, irony, and sentiment. AraNet delivers state-of-the-art performance on a number of the cited tasks and competitively… ▽ More We describe AraNet, a collection of deep learning Arabic social media processing tools. Namely, we exploit an extensive host of publicly available and novel social media datasets to train bidirectional encoders from transformer models (BERT) to predict age, dialect, gender, emotion, irony, and sentiment. AraNet delivers state-of-the-art performance on a number of the cited tasks and competitively on others. In addition, AraNet has the advantage of being exclusively based on a deep learning framework and hence feature engineering free. To the best of our knowledge, AraNet is the first to performs predictions across such a wide range of tasks for Arabic NLP and thus meets a critical needs. We publicly release AraNet to accelerate research and facilitate comparisons across the different tasks. △ Less

Submitted 11 April, 2020; v1 submitted 30 December, 2019; originally announced December 2019.

Comments: Accepted by The 4th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT)

arXiv:1912.01086 [pdf, other]

Deep-Learning-Aided Successive-Cancellation Decoding of Polar Codes

Authors: Seyyed Ali Hashemi, Nghia Doan, Thibaud Tonnellier, Warren J. Gross

Abstract: A deep-learning-aided successive-cancellation list (DL-SCL) decoding algorithm for polar codes is introduced with deep-learning-aided successive-cancellation (DL-SC) decoding being a specific case of it. The DL-SCL decoder works by allowing additional rounds of SCL decoding when the first SCL decoding attempt fails, using a novel bit-flipping metric. The proposed bit-flipping metric exploits the i… ▽ More A deep-learning-aided successive-cancellation list (DL-SCL) decoding algorithm for polar codes is introduced with deep-learning-aided successive-cancellation (DL-SC) decoding being a specific case of it. The DL-SCL decoder works by allowing additional rounds of SCL decoding when the first SCL decoding attempt fails, using a novel bit-flipping metric. The proposed bit-flipping metric exploits the inherent relations between the information bits in polar codes that are represented by a correlation matrix. The correlation matrix is then optimized using emerging deep-learning techniques. Performance results on a polar code of length 128 with 64 information bits concatenated with a 24-bit cyclic redundancy check show that the proposed bit-flipping metric in the proposed DL-SCL decoder requires up to 66% fewer multiplications and up to 36% fewer additions, without any need to perform transcendental functions, and by providing almost the same error-correction performance in comparison with the state of the art. △ Less

Submitted 2 December, 2019; originally announced December 2019.

Comments: 2019 Asilomar Conference on Signals, Systems, and Computers

Showing 1–50 of 88 results for author: Hashemi, A