-
Elliptic Loss Regularization
Authors:
Ali Hasan,
Haoming Yang,
Yuting Ng,
Vahid Tarokh
Abstract:
Regularizing neural networks is important for anticipating model behavior in regions of the data space that are not well represented. In this work, we propose a regularization technique for enforcing a level of smoothness in the mapping between the data input space and the loss value. We specify the level of regularity by requiring that the loss of the network satisfies an elliptic operator over t…
▽ More
Regularizing neural networks is important for anticipating model behavior in regions of the data space that are not well represented. In this work, we propose a regularization technique for enforcing a level of smoothness in the mapping between the data input space and the loss value. We specify the level of regularity by requiring that the loss of the network satisfies an elliptic operator over the data domain. To do this, we modify the usual empirical risk minimization objective such that we instead minimize a new objective that satisfies an elliptic operator over points within the domain. This allows us to use existing theory on elliptic operators to anticipate the behavior of the error for points outside the training set. We propose a tractable computational method that approximates the behavior of the elliptic operator while being computationally efficient. Finally, we analyze the properties of the proposed regularization to understand the performance on common problems of distribution shift and group imbalance. Numerical experiments confirm the utility of the proposed regularization technique.
△ Less
Submitted 3 March, 2025;
originally announced March 2025.
-
Parabolic Continual Learning
Authors:
Haoming Yang,
Ali Hasan,
Vahid Tarokh
Abstract:
Regularizing continual learning techniques is important for anticipating algorithmic behavior under new realizations of data. We introduce a new approach to continual learning by imposing the properties of a parabolic partial differential equation (PDE) to regularize the expected behavior of the loss over time. This class of parabolic PDEs has a number of favorable properties that allow us to anal…
▽ More
Regularizing continual learning techniques is important for anticipating algorithmic behavior under new realizations of data. We introduce a new approach to continual learning by imposing the properties of a parabolic partial differential equation (PDE) to regularize the expected behavior of the loss over time. This class of parabolic PDEs has a number of favorable properties that allow us to analyze the error incurred through forgetting and the error induced through generalization. Specifically, we do this through imposing boundary conditions where the boundary is given by a memory buffer. By using the memory buffer as a boundary, we can enforce long term dependencies by bounding the expected error by the boundary loss. Finally, we illustrate the empirical performance of the method on a series of continual learning tasks.
△ Less
Submitted 3 March, 2025;
originally announced March 2025.
-
Teleportation With Null Space Gradient Projection for Optimization Acceleration
Authors:
Zihao Wu,
Juncheng Dong,
Ahmed Aloui,
Vahid Tarokh
Abstract:
Optimization techniques have become increasingly critical due to the ever-growing model complexity and data scale. In particular, teleportation has emerged as a promising approach, which accelerates convergence of gradient descent-based methods by navigating within the loss invariant level set to identify parameters with advantageous geometric properties. Existing teleportation algorithms have pri…
▽ More
Optimization techniques have become increasingly critical due to the ever-growing model complexity and data scale. In particular, teleportation has emerged as a promising approach, which accelerates convergence of gradient descent-based methods by navigating within the loss invariant level set to identify parameters with advantageous geometric properties. Existing teleportation algorithms have primarily demonstrated their effectiveness in optimizing Multi-Layer Perceptrons (MLPs), but their extension to more advanced architectures, such as Convolutional Neural Networks (CNNs) and Transformers, remains challenging. Moreover, they often impose significant computational demands, limiting their applicability to complex architectures. To this end, we introduce an algorithm that projects the gradient of the teleportation objective function onto the input null space, effectively preserving the teleportation within the loss invariant level set and reducing computational cost. Our approach is readily generalizable from MLPs to CNNs, transformers, and potentially other advanced architectures. We validate the effectiveness of our algorithm across various benchmark datasets and optimizers, demonstrating its broad applicability.
△ Less
Submitted 16 February, 2025;
originally announced February 2025.
-
S2TX: Cross-Attention Multi-Scale State-Space Transformer for Time Series Forecasting
Authors:
Zihao Wu,
Juncheng Dong,
Haoming Yang,
Vahid Tarokh
Abstract:
Time series forecasting has recently achieved significant progress with multi-scale models to address the heterogeneity between long and short range patterns. Despite their state-of-the-art performance, we identify two potential areas for improvement. First, the variates of the multivariate time series are processed independently. Moreover, the multi-scale (long and short range) representations ar…
▽ More
Time series forecasting has recently achieved significant progress with multi-scale models to address the heterogeneity between long and short range patterns. Despite their state-of-the-art performance, we identify two potential areas for improvement. First, the variates of the multivariate time series are processed independently. Moreover, the multi-scale (long and short range) representations are learned separately by two independent models without communication. In light of these concerns, we propose State Space Transformer with cross-attention (S2TX). S2TX employs a cross-attention mechanism to integrate a Mamba model for extracting long-range cross-variate context and a Transformer model with local window attention to capture short-range representations. By cross-attending to the global context, the Transformer model further facilitates variate-level interactions as well as local/global communications. Comprehensive experiments on seven classic long-short range time-series forecasting benchmark datasets demonstrate that S2TX can achieve highly robust SOTA results while maintaining a low memory footprint.
△ Less
Submitted 16 February, 2025;
originally announced February 2025.
-
Score-Based Metropolis-Hastings Algorithms
Authors:
Ahmed Aloui,
Ali Hasan,
Juncheng Dong,
Zihao Wu,
Vahid Tarokh
Abstract:
In this paper, we introduce a new approach for integrating score-based models with the Metropolis-Hastings algorithm. While traditional score-based diffusion models excel in accurately learning the score function from data points, they lack an energy function, making the Metropolis-Hastings adjustment step inaccessible. Consequently, the unadjusted Langevin algorithm is often used for sampling usi…
▽ More
In this paper, we introduce a new approach for integrating score-based models with the Metropolis-Hastings algorithm. While traditional score-based diffusion models excel in accurately learning the score function from data points, they lack an energy function, making the Metropolis-Hastings adjustment step inaccessible. Consequently, the unadjusted Langevin algorithm is often used for sampling using estimated score functions. The lack of an energy function then prevents the application of the Metropolis-adjusted Langevin algorithm and other Metropolis-Hastings methods, limiting the wealth of other algorithms developed that use acceptance functions. We address this limitation by introducing a new loss function based on the \emph{detailed balance condition}, allowing the estimation of the Metropolis-Hastings acceptance probabilities given a learned score function. We demonstrate the effectiveness of the proposed method for various scenarios, including sampling from heavy-tail distributions.
△ Less
Submitted 31 March, 2025; v1 submitted 31 December, 2024;
originally announced January 2025.
-
Learn2Mix: Training Neural Networks Using Adaptive Data Integration
Authors:
Shyam Venkatasubramanian,
Vahid Tarokh
Abstract:
Accelerating model convergence within resource-constrained environments is critical to ensure fast and efficient neural network training. This work presents learn2mix, a novel training strategy that adaptively adjusts class proportions within batches, focusing on classes with higher error rates. Unlike classical training methods that use static class proportions, learn2mix continually adapts class…
▽ More
Accelerating model convergence within resource-constrained environments is critical to ensure fast and efficient neural network training. This work presents learn2mix, a novel training strategy that adaptively adjusts class proportions within batches, focusing on classes with higher error rates. Unlike classical training methods that use static class proportions, learn2mix continually adapts class proportions during training, leading to faster convergence. Empirical evaluations conducted on benchmark datasets show that neural networks trained with learn2mix converge faster than those trained with existing approaches, achieving improved results for classification, regression, and reconstruction tasks under limited training resources and with imbalanced classes. Our empirical findings are supported by theoretical analysis.
△ Less
Submitted 13 February, 2025; v1 submitted 20 December, 2024;
originally announced December 2024.
-
Offline Stochastic Optimization of Black-Box Objective Functions
Authors:
Juncheng Dong,
Zihao Wu,
Hamid Jafarkhani,
Ali Pezeshki,
Vahid Tarokh
Abstract:
Many challenges in science and engineering, such as drug discovery and communication network design, involve optimizing complex and expensive black-box functions across vast search spaces. Thus, it is essential to leverage existing data to avoid costly active queries of these black-box functions. To this end, while Offline Black-Box Optimization (BBO) is effective for deterministic problems, it ma…
▽ More
Many challenges in science and engineering, such as drug discovery and communication network design, involve optimizing complex and expensive black-box functions across vast search spaces. Thus, it is essential to leverage existing data to avoid costly active queries of these black-box functions. To this end, while Offline Black-Box Optimization (BBO) is effective for deterministic problems, it may fall short in capturing the stochasticity of real-world scenarios. To address this, we introduce Stochastic Offline BBO (SOBBO), which tackles both black-box objectives and uncontrolled uncertainties. We propose two solutions: for large-data regimes, a differentiable surrogate allows for gradient-based optimization, while for scarce-data regimes, we directly estimate gradients under conservative field constraints, improving robustness, convergence, and data efficiency. Numerical experiments demonstrate the effectiveness of our approach on both synthetic and real-world tasks.
△ Less
Submitted 2 December, 2024;
originally announced December 2024.
-
Indiscriminate Disruption of Conditional Inference on Multivariate Gaussians
Authors:
William N. Caballero,
Matthew LaRosa,
Alexander Fisher,
Vahid Tarokh
Abstract:
The multivariate Gaussian distribution underpins myriad operations-research, decision-analytic, and machine-learning models (e.g., Bayesian optimization, Gaussian influence diagrams, and variational autoencoders). However, despite recent advances in adversarial machine learning (AML), inference for Gaussian models in the presence of an adversary is notably understudied. Therefore, we consider a se…
▽ More
The multivariate Gaussian distribution underpins myriad operations-research, decision-analytic, and machine-learning models (e.g., Bayesian optimization, Gaussian influence diagrams, and variational autoencoders). However, despite recent advances in adversarial machine learning (AML), inference for Gaussian models in the presence of an adversary is notably understudied. Therefore, we consider a self-interested attacker who wishes to disrupt a decisionmaker's conditional inference and subsequent actions by corrupting a set of evidentiary variables. To avoid detection, the attacker also desires the attack to appear plausible wherein plausibility is determined by the density of the corrupted evidence. We consider white- and grey-box settings such that the attacker has complete and incomplete knowledge about the decisionmaker's underlying multivariate Gaussian distribution, respectively. Select instances are shown to reduce to quadratic and stochastic quadratic programs, and structural properties are derived to inform solution methods. We assess the impact and efficacy of these attacks in three examples, including, real estate evaluation, interest rate estimation and signals processing. Each example leverages an alternative underlying model, thereby highlighting the attacks' broad applicability. Through these applications, we also juxtapose the behavior of the white- and grey-box attacks to understand how uncertainty and structure affect attacker behavior.
△ Less
Submitted 21 November, 2024;
originally announced November 2024.
-
Asymptotically Optimal Change Detection for Unnormalized Pre- and Post-Change Distributions
Authors:
Arman Adibi,
Sanjeev Kulkarni,
H. Vincent Poor,
Taposh Banerjee,
Vahid Tarokh
Abstract:
This paper addresses the problem of detecting changes when only unnormalized pre- and post-change distributions are accessible. This situation happens in many scenarios in physics such as in ferromagnetism, crystallography, magneto-hydrodynamics, and thermodynamics, where the energy models are difficult to normalize.
Our approach is based on the estimation of the Cumulative Sum (CUSUM) statistic…
▽ More
This paper addresses the problem of detecting changes when only unnormalized pre- and post-change distributions are accessible. This situation happens in many scenarios in physics such as in ferromagnetism, crystallography, magneto-hydrodynamics, and thermodynamics, where the energy models are difficult to normalize.
Our approach is based on the estimation of the Cumulative Sum (CUSUM) statistics, which is known to produce optimal performance. We first present an intuitively appealing approximation method. Unfortunately, this produces a biased estimator of the CUSUM statistics and may cause performance degradation. We then propose the Log-Partition Approximation Cumulative Sum (LPA-CUSUM) algorithm based on thermodynamic integration (TI) in order to estimate the log-ratio of normalizing constants of pre- and post-change distributions. It is proved that this approach gives an unbiased estimate of the log-partition function and the CUSUM statistics, and leads to an asymptotically optimal performance. Moreover, we derive a relationship between the required sample size for thermodynamic integration and the desired detection delay performance, offering guidelines for practical parameter selection. Numerical studies are provided demonstrating the efficacy of our approach.
△ Less
Submitted 11 February, 2025; v1 submitted 18 October, 2024;
originally announced October 2024.
-
Steinmetz Neural Networks for Complex-Valued Data
Authors:
Shyam Venkatasubramanian,
Ali Pezeshki,
Vahid Tarokh
Abstract:
We introduce a new approach to processing complex-valued data using DNNs consisting of parallel real-valued subnetworks with coupled outputs. Our proposed class of architectures, referred to as Steinmetz Neural Networks, incorporates multi-view learning to construct more interpretable representations in the latent space. Moreover, we present the Analytic Neural Network, which incorporates a consis…
▽ More
We introduce a new approach to processing complex-valued data using DNNs consisting of parallel real-valued subnetworks with coupled outputs. Our proposed class of architectures, referred to as Steinmetz Neural Networks, incorporates multi-view learning to construct more interpretable representations in the latent space. Moreover, we present the Analytic Neural Network, which incorporates a consistency penalty that encourages analytic signal representations in the latent space of the Steinmetz neural network. This penalty enforces a deterministic and orthogonal relationship between the real and imaginary components. Using an information-theoretic construction, we demonstrate that the generalization gap upper bound posited by the analytic neural network is lower than that of the general class of Steinmetz neural networks. Our numerical experiments depict the improved performance and robustness to additive noise, afforded by our proposed networks on benchmark datasets and synthetic examples.
△ Less
Submitted 13 February, 2025; v1 submitted 16 September, 2024;
originally announced September 2024.
-
DynamicFL: Federated Learning with Dynamic Communication Resource Allocation
Authors:
Qi Le,
Enmao Diao,
Xinran Wang,
Vahid Tarokh,
Jie Ding,
Ali Anwar
Abstract:
Federated Learning (FL) is a collaborative machine learning framework that allows multiple users to train models utilizing their local data in a distributed manner. However, considerable statistical heterogeneity in local data across devices often leads to suboptimal model performance compared with independently and identically distributed (IID) data scenarios. In this paper, we introduce DynamicF…
▽ More
Federated Learning (FL) is a collaborative machine learning framework that allows multiple users to train models utilizing their local data in a distributed manner. However, considerable statistical heterogeneity in local data across devices often leads to suboptimal model performance compared with independently and identically distributed (IID) data scenarios. In this paper, we introduce DynamicFL, a new FL framework that investigates the trade-offs between global model performance and communication costs for two widely adopted FL methods: Federated Stochastic Gradient Descent (FedSGD) and Federated Averaging (FedAvg). Our approach allocates diverse communication resources to clients based on their data statistical heterogeneity, considering communication resource constraints, and attains substantial performance enhancements compared to uniform communication resource allocation. Notably, our method bridges the gap between FedSGD and FedAvg, providing a flexible framework leveraging communication heterogeneity to address statistical heterogeneity in FL. Through extensive experiments, we demonstrate that DynamicFL surpasses current state-of-the-art methods with up to a 10% increase in model accuracy, demonstrating its adaptability and effectiveness in tackling data statistical heterogeneity challenges.
△ Less
Submitted 8 September, 2024;
originally announced September 2024.
-
Distributionally Robust Optimization as a Scalable Framework to Characterize Extreme Value Distributions
Authors:
Patrick Kuiper,
Ali Hasan,
Wenhao Yang,
Yuting Ng,
Hoda Bidkhori,
Jose Blanchet,
Vahid Tarokh
Abstract:
The goal of this paper is to develop distributionally robust optimization (DRO) estimators, specifically for multidimensional Extreme Value Theory (EVT) statistics. EVT supports using semi-parametric models called max-stable distributions built from spatial Poisson point processes. While powerful, these models are only asymptotically valid for large samples. However, since extreme data is by defin…
▽ More
The goal of this paper is to develop distributionally robust optimization (DRO) estimators, specifically for multidimensional Extreme Value Theory (EVT) statistics. EVT supports using semi-parametric models called max-stable distributions built from spatial Poisson point processes. While powerful, these models are only asymptotically valid for large samples. However, since extreme data is by definition scarce, the potential for model misspecification error is inherent to these applications, thus DRO estimators are natural. In order to mitigate over-conservative estimates while enhancing out-of-sample performance, we study DRO estimators informed by semi-parametric max-stable constraints in the space of point processes. We study both tractable convex formulations for some problems of interest (e.g. CVaR) and more general neural network based estimators. Both approaches are validated using synthetically generated data, recovering prescribed characteristics, and verifying the efficacy of the proposed techniques. Additionally, the proposed method is applied to a real data set of financial returns for comparison to a previous analysis. We established the proposed model as a novel formulation in the multivariate EVT domain, and innovative with respect to performance when compared to relevant alternate proposals.
△ Less
Submitted 31 July, 2024;
originally announced August 2024.
-
Generative Learning for Simulation of Vehicle Faults
Authors:
Patrick Kuiper,
Sirui Lin,
Jose Blanchet,
Vahid Tarokh
Abstract:
We develop a novel generative model to simulate vehicle health and forecast faults, conditioned on practical operational considerations. The model, trained on data from the US Army's Predictive Logistics program, aims to support predictive maintenance. It forecasts faults far enough in advance to execute a maintenance intervention before a breakdown occurs. The model incorporates real-world factor…
▽ More
We develop a novel generative model to simulate vehicle health and forecast faults, conditioned on practical operational considerations. The model, trained on data from the US Army's Predictive Logistics program, aims to support predictive maintenance. It forecasts faults far enough in advance to execute a maintenance intervention before a breakdown occurs. The model incorporates real-world factors that affect vehicle health. It also allows us to understand the vehicle's condition by analyzing operating data, and characterizing each vehicle into discrete states. Importantly, the model predicts the time to first fault with high accuracy. We compare its performance to other models and demonstrate its successful training.
△ Less
Submitted 30 July, 2024; v1 submitted 24 July, 2024;
originally announced July 2024.
-
Base Models for Parabolic Partial Differential Equations
Authors:
Xingzi Xu,
Ali Hasan,
Jie Ding,
Vahid Tarokh
Abstract:
Parabolic partial differential equations (PDEs) appear in many disciplines to model the evolution of various mathematical objects, such as probability flows, value functions in control theory, and derivative prices in finance. It is often necessary to compute the solutions or a function of the solutions to a parametric PDE in multiple scenarios corresponding to different parameters of this PDE. Th…
▽ More
Parabolic partial differential equations (PDEs) appear in many disciplines to model the evolution of various mathematical objects, such as probability flows, value functions in control theory, and derivative prices in finance. It is often necessary to compute the solutions or a function of the solutions to a parametric PDE in multiple scenarios corresponding to different parameters of this PDE. This process often requires resolving the PDEs from scratch, which is time-consuming. To better employ existing simulations for the PDEs, we propose a framework for finding solutions to parabolic PDEs across different scenarios by meta-learning an underlying base distribution. We build upon this base distribution to propose a method for computing solutions to parametric PDEs under different parameter settings. Finally, we illustrate the application of the proposed methods through extensive experiments in generative modeling, stochastic control, and finance. The empirical results suggest that the proposed approach improves generalization to solving PDEs under new parameter regimes.
△ Less
Submitted 16 July, 2024;
originally announced July 2024.
-
RASPNet: A Benchmark Dataset for Radar Adaptive Signal Processing Applications
Authors:
Shyam Venkatasubramanian,
Bosung Kang,
Ali Pezeshki,
Muralidhar Rangaswamy,
Vahid Tarokh
Abstract:
We present a large-scale dataset for radar adaptive signal processing (RASP) applications to support the development of data-driven models within the adaptive radar community. The dataset, RASPNet, exceeds 16 TB in size and comprises 100 realistic scenarios compiled over a variety of topographies and land types from across the contiguous United States. For each scenario, RASPNet consists of 10,000…
▽ More
We present a large-scale dataset for radar adaptive signal processing (RASP) applications to support the development of data-driven models within the adaptive radar community. The dataset, RASPNet, exceeds 16 TB in size and comprises 100 realistic scenarios compiled over a variety of topographies and land types from across the contiguous United States. For each scenario, RASPNet consists of 10,000 clutter realizations from an airborne radar setting, which can be used to benchmark radar and complex-valued learning algorithms. RASPNet intends to fill a prominent gap in the availability of a large-scale, realistic dataset that standardizes the evaluation of adaptive radar processing techniques and complex-valued neural networks. We outline its construction, organization, and several applications, including a transfer learning example to demonstrate how RASPNet can be used for realistic adaptive radar processing scenarios.
△ Less
Submitted 14 February, 2025; v1 submitted 13 June, 2024;
originally announced June 2024.
-
ColA: Collaborative Adaptation with Gradient Learning
Authors:
Enmao Diao,
Qi Le,
Suya Wu,
Xinran Wang,
Ali Anwar,
Jie Ding,
Vahid Tarokh
Abstract:
A primary function of back-propagation is to compute both the gradient of hidden representations and parameters for optimization with gradient descent. Training large models requires high computational costs due to their vast parameter sizes. While Parameter-Efficient Fine-Tuning (PEFT) methods aim to train smaller auxiliary models to save computational space, they still present computational over…
▽ More
A primary function of back-propagation is to compute both the gradient of hidden representations and parameters for optimization with gradient descent. Training large models requires high computational costs due to their vast parameter sizes. While Parameter-Efficient Fine-Tuning (PEFT) methods aim to train smaller auxiliary models to save computational space, they still present computational overheads, especially in Fine-Tuning as a Service (FTaaS) for numerous users. We introduce Collaborative Adaptation (ColA) with Gradient Learning (GL), a parameter-free, model-agnostic fine-tuning approach that decouples the computation of the gradient of hidden representations and parameters. In comparison to PEFT methods, ColA facilitates more cost-effective FTaaS by offloading the computation of the gradient to low-cost devices. We also provide a theoretical analysis of ColA and experimentally demonstrate that ColA can perform on par or better than existing PEFT methods on various benchmarks.
△ Less
Submitted 21 April, 2024;
originally announced April 2024.
-
Neural McKean-Vlasov Processes: Distributional Dependence in Diffusion Processes
Authors:
Haoming Yang,
Ali Hasan,
Yuting Ng,
Vahid Tarokh
Abstract:
McKean-Vlasov stochastic differential equations (MV-SDEs) provide a mathematical description of the behavior of an infinite number of interacting particles by imposing a dependence on the particle density. As such, we study the influence of explicitly including distributional information in the parameterization of the SDE. We propose a series of semi-parametric methods for representing MV-SDEs, an…
▽ More
McKean-Vlasov stochastic differential equations (MV-SDEs) provide a mathematical description of the behavior of an infinite number of interacting particles by imposing a dependence on the particle density. As such, we study the influence of explicitly including distributional information in the parameterization of the SDE. We propose a series of semi-parametric methods for representing MV-SDEs, and corresponding estimators for inferring parameters from data based on the properties of the MV-SDE. We analyze the characteristics of the different architectures and estimators, and consider their applicability in relevant machine learning problems. We empirically compare the performance of the different architectures and estimators on real and synthetic datasets for time series and probabilistic modeling. The results suggest that explicitly including distributional dependence in the parameterization of the SDE is effective in modeling temporal data with interaction under an exchangeability assumption while maintaining strong performance for standard Itô-SDEs due to the richer class of probability flows associated with MV-SDEs.
△ Less
Submitted 14 April, 2024;
originally announced April 2024.
-
Random Linear Projections Loss for Hyperplane-Based Optimization in Neural Networks
Authors:
Shyam Venkatasubramanian,
Ahmed Aloui,
Vahid Tarokh
Abstract:
Advancing loss function design is pivotal for optimizing neural network training and performance. This work introduces Random Linear Projections (RLP) loss, a novel approach that enhances training efficiency by leveraging geometric relationships within the data. Distinct from traditional loss functions that target minimizing pointwise errors, RLP loss operates by minimizing the distance between se…
▽ More
Advancing loss function design is pivotal for optimizing neural network training and performance. This work introduces Random Linear Projections (RLP) loss, a novel approach that enhances training efficiency by leveraging geometric relationships within the data. Distinct from traditional loss functions that target minimizing pointwise errors, RLP loss operates by minimizing the distance between sets of hyperplanes connecting fixed-size subsets of feature-prediction pairs and feature-label pairs. Our empirical evaluations, conducted across benchmark datasets and synthetic examples, demonstrate that neural networks trained with RLP loss outperform those trained with traditional loss functions, achieving improved performance with fewer data samples, and exhibiting greater robustness to additive noise. We provide theoretical analysis supporting our empirical findings.
△ Less
Submitted 30 May, 2024; v1 submitted 21 November, 2023;
originally announced November 2023.
-
Counterfactual Data Augmentation with Contrastive Learning
Authors:
Ahmed Aloui,
Juncheng Dong,
Cat P. Le,
Vahid Tarokh
Abstract:
Statistical disparity between distinct treatment groups is one of the most significant challenges for estimating Conditional Average Treatment Effects (CATE). To address this, we introduce a model-agnostic data augmentation method that imputes the counterfactual outcomes for a selected subset of individuals. Specifically, we utilize contrastive learning to learn a representation space and a simila…
▽ More
Statistical disparity between distinct treatment groups is one of the most significant challenges for estimating Conditional Average Treatment Effects (CATE). To address this, we introduce a model-agnostic data augmentation method that imputes the counterfactual outcomes for a selected subset of individuals. Specifically, we utilize contrastive learning to learn a representation space and a similarity measure such that in the learned representation space close individuals identified by the learned similarity measure have similar potential outcomes. This property ensures reliable imputation of counterfactual outcomes for the individuals with close neighbors from the alternative treatment group. By augmenting the original dataset with these reliable imputations, we can effectively reduce the discrepancy between different treatment groups, while inducing minimal imputation error. The augmented dataset is subsequently employed to train CATE estimation models. Theoretical analysis and experimental studies on synthetic and semi-synthetic benchmarks demonstrate that our method achieves significant improvements in both performance and robustness to overfitting across state-of-the-art models.
△ Less
Submitted 6 November, 2023;
originally announced November 2023.
-
Off-Policy Evaluation for Human Feedback
Authors:
Qitong Gao,
Ge Gao,
Juncheng Dong,
Vahid Tarokh,
Min Chi,
Miroslav Pajic
Abstract:
Off-policy evaluation (OPE) is important for closing the gap between offline training and evaluation of reinforcement learning (RL), by estimating performance and/or rank of target (evaluation) policies using offline trajectories only. It can improve the safety and efficiency of data collection and policy testing procedures in situations where online deployments are expensive, such as healthcare.…
▽ More
Off-policy evaluation (OPE) is important for closing the gap between offline training and evaluation of reinforcement learning (RL), by estimating performance and/or rank of target (evaluation) policies using offline trajectories only. It can improve the safety and efficiency of data collection and policy testing procedures in situations where online deployments are expensive, such as healthcare. However, existing OPE methods fall short in estimating human feedback (HF) signals, as HF may be conditioned over multiple underlying factors and is only sparsely available; as opposed to the agent-defined environmental rewards (used in policy optimization), which are usually determined over parametric functions or distributions. Consequently, the nature of HF signals makes extrapolating accurate OPE estimations to be challenging. To resolve this, we introduce an OPE for HF (OPEHF) framework that revives existing OPE methods in order to accurately evaluate the HF signals. Specifically, we develop an immediate human reward (IHR) reconstruction approach, regularized by environmental knowledge distilled in a latent space that captures the underlying dynamics of state transitions as well as issuing HF signals. Our approach has been tested over two real-world experiments, adaptive in-vivo neurostimulation and intelligent tutoring, as well as in a simulation environment (visual Q&A). Results show that our approach significantly improves the performance toward estimating HF signals accurately, compared to directly applying (variants of) existing OPE methods.
△ Less
Submitted 14 October, 2023; v1 submitted 10 October, 2023;
originally announced October 2023.
-
Perceiver-based CDF Modeling for Time Series Forecasting
Authors:
Cat P. Le,
Chris Cannella,
Ali Hasan,
Yuting Ng,
Vahid Tarokh
Abstract:
Transformers have demonstrated remarkable efficacy in forecasting time series data. However, their extensive dependence on self-attention mechanisms demands significant computational resources, thereby limiting their practical applicability across diverse tasks, especially in multimodal problems. In this work, we propose a new architecture, called perceiver-CDF, for modeling cumulative distributio…
▽ More
Transformers have demonstrated remarkable efficacy in forecasting time series data. However, their extensive dependence on self-attention mechanisms demands significant computational resources, thereby limiting their practical applicability across diverse tasks, especially in multimodal problems. In this work, we propose a new architecture, called perceiver-CDF, for modeling cumulative distribution functions (CDF) of time series data. Our approach combines the perceiver architecture with a copula-based attention mechanism tailored for multimodal time series prediction. By leveraging the perceiver, our model efficiently transforms high-dimensional and multimodal data into a compact latent space, thereby significantly reducing computational demands. Subsequently, we implement a copula-based attention mechanism to construct the joint distribution of missing data for prediction. Further, we propose an output variance testing mechanism to effectively mitigate error propagation during prediction. To enhance efficiency and reduce complexity, we introduce midpoint inference for the local attention mechanism. This enables the model to efficiently capture dependencies within nearby imputed samples without considering all previous samples. The experiments on the unimodal and multimodal benchmarks consistently demonstrate a 20% improvement over state-of-the-art methods while utilizing less than half of the computational resources.
△ Less
Submitted 24 June, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
-
Treatment Effects in Extreme Regimes
Authors:
Ahmed Aloui,
Ali Hasan,
Yuting Ng,
Miroslav Pajic,
Vahid Tarokh
Abstract:
Understanding treatment effects in extreme regimes is important for characterizing risks associated with different interventions. This is hindered by the unavailability of counterfactual outcomes and the rarity and difficulty of collecting extreme data in practice. To address this issue, we propose a new framework based on extreme value theory for estimating treatment effects in extreme regimes. W…
▽ More
Understanding treatment effects in extreme regimes is important for characterizing risks associated with different interventions. This is hindered by the unavailability of counterfactual outcomes and the rarity and difficulty of collecting extreme data in practice. To address this issue, we propose a new framework based on extreme value theory for estimating treatment effects in extreme regimes. We quantify these effects using variations in tail decay rates of potential outcomes in the presence and absence of treatments. We establish algorithms for calculating these quantities and develop related theoretical results. We demonstrate the efficacy of our approach on various standard synthetic and semi-synthetic datasets.
△ Less
Submitted 22 May, 2024; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Causal Mediation Analysis with Multi-dimensional and Indirectly Observed Mediators
Authors:
Ziyang Jiang,
Yiling Liu,
Michael H. Klein,
Ahmed Aloui,
Yiman Ren,
Keyu Li,
Vahid Tarokh,
David Carlson
Abstract:
Causal mediation analysis (CMA) is a powerful method to dissect the total effect of a treatment into direct and mediated effects within the potential outcome framework. This is important in many scientific applications to identify the underlying mechanisms of a treatment effect. However, in many scientific applications the mediator is unobserved, but there may exist related measurements. For examp…
▽ More
Causal mediation analysis (CMA) is a powerful method to dissect the total effect of a treatment into direct and mediated effects within the potential outcome framework. This is important in many scientific applications to identify the underlying mechanisms of a treatment effect. However, in many scientific applications the mediator is unobserved, but there may exist related measurements. For example, we may want to identify how changes in brain activity or structure mediate an antidepressant's effect on behavior, but we may only have access to electrophysiological or imaging brain measurements. To date, most CMA methods assume that the mediator is one-dimensional and observable, which oversimplifies such real-world scenarios. To overcome this limitation, we introduce a CMA framework that can handle complex and indirectly observed mediators based on the identifiable variational autoencoder (iVAE) architecture. We prove that the true joint distribution over observed and latent variables is identifiable with the proposed method. Additionally, our framework captures a disentangled representation of the indirectly observed mediator and yields accurate estimation of the direct and mediated effects in synthetic and semi-synthetic experiments, providing evidence of its potential utility in real-world applications.
△ Less
Submitted 13 June, 2023;
originally announced June 2023.
-
Robust Reinforcement Learning through Efficient Adversarial Herding
Authors:
Juncheng Dong,
Hao-Lun Hsu,
Qitong Gao,
Vahid Tarokh,
Miroslav Pajic
Abstract:
Although reinforcement learning (RL) is considered the gold standard for policy design, it may not always provide a robust solution in various scenarios. This can result in severe performance degradation when the environment is exposed to potential disturbances. Adversarial training using a two-player max-min game has been proven effective in enhancing the robustness of RL agents. In this work, we…
▽ More
Although reinforcement learning (RL) is considered the gold standard for policy design, it may not always provide a robust solution in various scenarios. This can result in severe performance degradation when the environment is exposed to potential disturbances. Adversarial training using a two-player max-min game has been proven effective in enhancing the robustness of RL agents. In this work, we extend the two-player game by introducing an adversarial herd, which involves a group of adversaries, in order to address ($\textit{i}$) the difficulty of the inner optimization problem, and ($\textit{ii}$) the potential over pessimism caused by the selection of a candidate adversary set that may include unlikely scenarios. We first prove that adversarial herds can efficiently approximate the inner optimization problem. Then we address the second issue by replacing the worst-case performance in the inner optimization with the average performance over the worst-$k$ adversaries. We evaluate the proposed method on multiple MuJoCo environments. Experimental results demonstrate that our approach consistently generates more robust policies.
△ Less
Submitted 12 June, 2023;
originally announced June 2023.
-
Deep Generalized Green's Functions
Authors:
Rixi Peng,
Juncheng Dong,
Jordan Malof,
Willie J. Padilla,
Vahid Tarokh
Abstract:
In this study, we address the challenge of obtaining a Green's function operator for linear partial differential equations (PDEs). The Green's function is well-sought after due to its ability to directly map inputs to solutions, bypassing the need for common numerical methods such as finite difference and finite elements methods. However, obtaining an explicit form of the Green's function kernel f…
▽ More
In this study, we address the challenge of obtaining a Green's function operator for linear partial differential equations (PDEs). The Green's function is well-sought after due to its ability to directly map inputs to solutions, bypassing the need for common numerical methods such as finite difference and finite elements methods. However, obtaining an explicit form of the Green's function kernel for most PDEs has been a challenge due to the Dirac delta function singularity present. To address this issue, we propose the Deep Generalized Green's Function (DGGF) as an alternative, which can be solved for in an efficient and accurate manner using neural network models. The DGGF provides a more efficient and precise approach to solving linear PDEs while inheriting the reusability of the Green's function, and possessing additional desirable properties such as mesh-free operation and a small memory footprint. The DGGF is compared against a variety of state-of-the-art (SOTA) PDE solvers, including direct methods, namely physics-informed neural networks (PINNs), Green's function approaches such as networks for Gaussian approximation of the Dirac delta functions (GADD), and numerical Green's functions (NGFs). The performance of all methods is compared on four representative PDE categories, each with different combinations of dimensionality and domain shape. The results confirm the advantages of DGGFs, and benefits of Generalized Greens Functions as an novel alternative approach to solve PDEs without suffering from singularities.
△ Less
Submitted 5 June, 2023;
originally announced June 2023.
-
Mode-Aware Continual Learning for Conditional Generative Adversarial Networks
Authors:
Cat P. Le,
Juncheng Dong,
Ahmed Aloui,
Vahid Tarokh
Abstract:
The main challenge in continual learning for generative models is to effectively learn new target modes with limited samples while preserving previously learned ones. To this end, we introduce a new continual learning approach for conditional generative adversarial networks by leveraging a mode-affinity score specifically designed for generative modeling. First, the generator produces samples of e…
▽ More
The main challenge in continual learning for generative models is to effectively learn new target modes with limited samples while preserving previously learned ones. To this end, we introduce a new continual learning approach for conditional generative adversarial networks by leveraging a mode-affinity score specifically designed for generative modeling. First, the generator produces samples of existing modes for subsequent replay. The discriminator is then used to compute the mode similarity measure, which identifies a set of closest existing modes to the target. Subsequently, a label for the target mode is generated and given as a weighted average of the labels within this set. We extend the continual learning model by training it on the target data with the newly-generated label, while performing memory replay to mitigate the risk of catastrophic forgetting. Experimental results on benchmark datasets demonstrate the gains of our continual learning approach over the state-of-the-art methods, even when using fewer training samples.
△ Less
Submitted 23 September, 2023; v1 submitted 18 May, 2023;
originally announced May 2023.
-
Neural Network Accelerated Process Design of Polycrystalline Microstructures
Authors:
Junrong Lin,
Mahmudul Hasan,
Pinar Acar,
Jose Blanchet,
Vahid Tarokh
Abstract:
Computational experiments are exploited in finding a well-designed processing path to optimize material structures for desired properties. This requires understanding the interplay between the processing-(micro)structure-property linkages using a multi-scale approach that connects the macro-scale (process parameters) to meso (homogenized properties) and micro (crystallographic texture) scales. Due…
▽ More
Computational experiments are exploited in finding a well-designed processing path to optimize material structures for desired properties. This requires understanding the interplay between the processing-(micro)structure-property linkages using a multi-scale approach that connects the macro-scale (process parameters) to meso (homogenized properties) and micro (crystallographic texture) scales. Due to the nature of the problem's multi-scale modeling setup, possible processing path choices could grow exponentially as the decision tree becomes deeper, and the traditional simulators' speed reaches a critical computational threshold. To lessen the computational burden for predicting microstructural evolution under given loading conditions, we develop a neural network (NN)-based method with physics-infused constraints. The NN aims to learn the evolution of microstructures under each elementary process. Our method is effective and robust in finding optimal processing paths. In this study, our NN-based method is applied to maximize the homogenized stiffness of a Copper microstructure, and it is found to be 686 times faster while achieving 0.053% error in the resulting homogenized stiffness compared to the traditional finite element simulator on a 10-process experiment.
△ Less
Submitted 3 May, 2023; v1 submitted 11 April, 2023;
originally announced May 2023.
-
Subspace Perturbation Analysis for Data-Driven Radar Target Localization
Authors:
Shyam Venkatasubramanian,
Sandeep Gogineni,
Bosung Kang,
Ali Pezeshki,
Muralidhar Rangaswamy,
Vahid Tarokh
Abstract:
Recent works exploring data-driven approaches to classical problems in adaptive radar have demonstrated promising results pertaining to the task of radar target localization. Via the use of space-time adaptive processing (STAP) techniques and convolutional neural networks, these data-driven approaches to target localization have helped benchmark the performance of neural networks for matched scena…
▽ More
Recent works exploring data-driven approaches to classical problems in adaptive radar have demonstrated promising results pertaining to the task of radar target localization. Via the use of space-time adaptive processing (STAP) techniques and convolutional neural networks, these data-driven approaches to target localization have helped benchmark the performance of neural networks for matched scenarios. However, the thorough bridging of these topics across mismatched scenarios still remains an open problem. As such, in this work, we augment our data-driven approach to radar target localization by performing a subspace perturbation analysis, which allows us to benchmark the localization accuracy of our proposed deep learning framework across mismatched scenarios. To evaluate this framework, we generate comprehensive datasets by randomly placing targets of variable strengths in mismatched constrained areas via RFView, a high-fidelity, site-specific modeling and simulation tool. For the radar returns from these constrained areas, we generate heatmap tensors in range, azimuth, and elevation using the normalized adaptive matched filter (NAMF) test statistic. We estimate target locations from these heatmap tensors using a convolutional neural network, and demonstrate that the predictive performance of our framework in the presence of mismatches can be predetermined.
△ Less
Submitted 21 March, 2023; v1 submitted 14 March, 2023;
originally announced March 2023.
-
Pruning Deep Neural Networks from a Sparsity Perspective
Authors:
Enmao Diao,
Ganghua Wang,
Jiawei Zhan,
Yuhong Yang,
Jie Ding,
Vahid Tarokh
Abstract:
In recent years, deep network pruning has attracted significant attention in order to enable the rapid deployment of AI into small devices with computation and memory constraints. Pruning is often achieved by dropping redundant weights, neurons, or layers of a deep network while attempting to retain a comparable test performance. Many deep pruning algorithms have been proposed with impressive empi…
▽ More
In recent years, deep network pruning has attracted significant attention in order to enable the rapid deployment of AI into small devices with computation and memory constraints. Pruning is often achieved by dropping redundant weights, neurons, or layers of a deep network while attempting to retain a comparable test performance. Many deep pruning algorithms have been proposed with impressive empirical success. However, existing approaches lack a quantifiable measure to estimate the compressibility of a sub-network during each pruning iteration and thus may under-prune or over-prune the model. In this work, we propose PQ Index (PQI) to measure the potential compressibility of deep neural networks and use this to develop a Sparsity-informed Adaptive Pruning (SAP) algorithm. Our extensive experiments corroborate the hypothesis that for a generic pruning procedure, PQI decreases first when a large model is being effectively regularized and then increases when its compressibility reaches a limit that appears to correspond to the beginning of underfitting. Subsequently, PQI decreases again when the model collapse and significant deterioration in the performance of the model start to occur. Additionally, our experiments demonstrate that the proposed adaptive pruning algorithm with proper choice of hyper-parameters is superior to the iterative pruning algorithms such as the lottery ticket-based pruning methods, in terms of both compression efficiency and robustness.
△ Less
Submitted 23 August, 2023; v1 submitted 10 February, 2023;
originally announced February 2023.
-
PASTA: Pessimistic Assortment Optimization
Authors:
Juncheng Dong,
Weibin Mo,
Zhengling Qi,
Cong Shi,
Ethan X. Fang,
Vahid Tarokh
Abstract:
We consider a class of assortment optimization problems in an offline data-driven setting. A firm does not know the underlying customer choice model but has access to an offline dataset consisting of the historically offered assortment set, customer choice, and revenue. The objective is to use the offline dataset to find an optimal assortment. Due to the combinatorial nature of assortment optimiza…
▽ More
We consider a class of assortment optimization problems in an offline data-driven setting. A firm does not know the underlying customer choice model but has access to an offline dataset consisting of the historically offered assortment set, customer choice, and revenue. The objective is to use the offline dataset to find an optimal assortment. Due to the combinatorial nature of assortment optimization, the problem of insufficient data coverage is likely to occur in the offline dataset. Therefore, designing a provably efficient offline learning algorithm becomes a significant challenge. To this end, we propose an algorithm referred to as Pessimistic ASsortment opTimizAtion (PASTA for short) designed based on the principle of pessimism, that can correctly identify the optimal assortment by only requiring the offline data to cover the optimal assortment under general settings. In particular, we establish a regret bound for the offline assortment optimization problem under the celebrated multinomial logit model. We also propose an efficient computational procedure to solve our pessimistic assortment optimization problem. Numerical studies demonstrate the superiority of the proposed method over the existing baseline method.
△ Less
Submitted 7 February, 2023;
originally announced February 2023.
-
Domain Adaptation via Rebalanced Sub-domain Alignment
Authors:
Yiling Liu,
Juncheng Dong,
Ziyang Jiang,
Ahmed Aloui,
Keyu Li,
Hunter Klein,
Vahid Tarokh,
David Carlson
Abstract:
Unsupervised domain adaptation (UDA) is a technique used to transfer knowledge from a labeled source domain to a different but related unlabeled target domain. While many UDA methods have shown success in the past, they often assume that the source and target domains must have identical class label distributions, which can limit their effectiveness in real-world scenarios. To address this limitati…
▽ More
Unsupervised domain adaptation (UDA) is a technique used to transfer knowledge from a labeled source domain to a different but related unlabeled target domain. While many UDA methods have shown success in the past, they often assume that the source and target domains must have identical class label distributions, which can limit their effectiveness in real-world scenarios. To address this limitation, we propose a novel generalization bound that reweights source classification error by aligning source and target sub-domains. We prove that our proposed generalization bound is at least as strong as existing bounds under realistic assumptions, and we empirically show that it is much stronger on real-world data. We then propose an algorithm to minimize this novel generalization bound. We demonstrate by numerical experiments that this approach improves performance in shifted class distribution scenarios compared to state-of-the-art methods.
△ Less
Submitted 3 February, 2023;
originally announced February 2023.
-
Quickest Change Detection for Unnormalized Statistical Models
Authors:
Suya Wu,
Enmao Diao,
Taposh Banerjee,
Jie Ding,
Vahid Tarokh
Abstract:
Classical quickest change detection algorithms require modeling pre-change and post-change distributions. Such an approach may not be feasible for various machine learning models because of the complexity of computing the explicit distributions. Additionally, these methods may suffer from a lack of robustness to model mismatch and noise. This paper develops a new variant of the classical Cumulativ…
▽ More
Classical quickest change detection algorithms require modeling pre-change and post-change distributions. Such an approach may not be feasible for various machine learning models because of the complexity of computing the explicit distributions. Additionally, these methods may suffer from a lack of robustness to model mismatch and noise. This paper develops a new variant of the classical Cumulative Sum (CUSUM) algorithm for the quickest change detection. This variant is based on Fisher divergence and the Hyvärinen score and is called the Score-based CUSUM (SCUSUM) algorithm. The SCUSUM algorithm allows the applications of change detection for unnormalized statistical models, i.e., models for which the probability density function contains an unknown normalization constant. The asymptotic optimality of the proposed algorithm is investigated by deriving expressions for average detection delay and the mean running time to a false alarm. Numerical results are provided to demonstrate the performance of the proposed algorithm.
△ Less
Submitted 1 February, 2023;
originally announced February 2023.
-
Personalized Federated Recommender Systems with Private and Partially Federated AutoEncoders
Authors:
Qi Le,
Enmao Diao,
Xinran Wang,
Ali Anwar,
Vahid Tarokh,
Jie Ding
Abstract:
Recommender Systems (RSs) have become increasingly important in many application domains, such as digital marketing. Conventional RSs often need to collect users' data, centralize them on the server-side, and form a global model to generate reliable recommendations. However, they suffer from two critical limitations: the personalization problem that the RSs trained traditionally may not be customi…
▽ More
Recommender Systems (RSs) have become increasingly important in many application domains, such as digital marketing. Conventional RSs often need to collect users' data, centralize them on the server-side, and form a global model to generate reliable recommendations. However, they suffer from two critical limitations: the personalization problem that the RSs trained traditionally may not be customized for individual users, and the privacy problem that directly sharing user data is not encouraged. We propose Personalized Federated Recommender Systems (PersonalFR), which introduces a personalized autoencoder-based recommendation model with Federated Learning (FL) to address these challenges. PersonalFR guarantees that each user can learn a personal model from the local dataset and other participating users' data without sharing local data, data embeddings, or models. PersonalFR consists of three main components, including AutoEncoder-based RSs (ARSs) that learn the user-item interactions, Partially Federated Learning (PFL) that updates the encoder locally and aggregates the decoder on the server-side, and Partial Compression (PC) that only computes and transmits active model parameters. Extensive experiments on two real-world datasets demonstrate that PersonalFR can achieve private and personalized performance comparable to that trained by centralizing all users' data. Moreover, PersonalFR requires significantly less computation and communication overhead than standard FL baselines.
△ Less
Submitted 16 December, 2022;
originally announced December 2022.
-
Transfer Learning for Individual Treatment Effect Estimation
Authors:
Ahmed Aloui,
Juncheng Dong,
Cat P. Le,
Vahid Tarokh
Abstract:
This work considers the problem of transferring causal knowledge between tasks for Individual Treatment Effect (ITE) estimation. To this end, we theoretically assess the feasibility of transferring ITE knowledge and present a practical framework for efficient transfer. A lower bound is introduced on the ITE error of the target task to demonstrate that ITE knowledge transfer is challenging due to t…
▽ More
This work considers the problem of transferring causal knowledge between tasks for Individual Treatment Effect (ITE) estimation. To this end, we theoretically assess the feasibility of transferring ITE knowledge and present a practical framework for efficient transfer. A lower bound is introduced on the ITE error of the target task to demonstrate that ITE knowledge transfer is challenging due to the absence of counterfactual information. Nevertheless, we establish generalization upper bounds on the counterfactual loss and ITE error of the target task, demonstrating the feasibility of ITE knowledge transfer. Subsequently, we introduce a framework with a new Causal Inference Task Affinity (CITA) measure for ITE knowledge transfer. Specifically, we use CITA to find the closest source task to the target task and utilize it for ITE knowledge transfer. Empirical studies are provided, demonstrating the efficacy of the proposed method. We observe that ITE knowledge transfer can significantly (up to 95%) reduce the amount of data required for ITE estimation.
△ Less
Submitted 5 June, 2023; v1 submitted 1 October, 2022;
originally announced October 2022.
-
Data-Driven Target Localization Using Adaptive Radar Processing and Convolutional Neural Networks
Authors:
Shyam Venkatasubramanian,
Sandeep Gogineni,
Bosung Kang,
Ali Pezeshki,
Muralidhar Rangaswamy,
Vahid Tarokh
Abstract:
Leveraging the advanced functionalities of modern radio frequency (RF) modeling and simulation tools, specifically designed for adaptive radar processing applications, this paper presents a data-driven approach to improve accuracy in radar target localization post adaptive radar detection. To this end, we generate a large number of radar returns by randomly placing targets of variable strengths in…
▽ More
Leveraging the advanced functionalities of modern radio frequency (RF) modeling and simulation tools, specifically designed for adaptive radar processing applications, this paper presents a data-driven approach to improve accuracy in radar target localization post adaptive radar detection. To this end, we generate a large number of radar returns by randomly placing targets of variable strengths in a predefined area, using RFView, a high-fidelity, site-specific, RF modeling & simulation tool. We produce heatmap tensors from the radar returns, in range, azimuth [and Doppler], of the normalized adaptive matched filter (NAMF) test statistic. We then train a regression convolutional neural network (CNN) to estimate target locations from these heatmap tensors, and we compare the target localization accuracy of this approach with that of peak-finding and local search methods. This empirical study shows that our regression CNN achieves a considerable improvement in target location estimation accuracy. The regression CNN offers significant gains and reasonable accuracy even at signal-to-clutter-plus-noise ratio (SCNR) regimes that are close to the breakdown threshold SCNR of the NAMF. We also study the robustness of our trained CNN to mismatches in the radar data, where the CNN is tested on heatmap tensors collected from areas that it was not trained on. We show that our CNN can be made robust to mismatches in the radar data through few-shot learning, using a relatively small number of new training samples.
△ Less
Submitted 9 July, 2024; v1 submitted 6 September, 2022;
originally announced September 2022.
-
Inference and Sampling for Archimax Copulas
Authors:
Yuting Ng,
Ali Hasan,
Vahid Tarokh
Abstract:
Understanding multivariate dependencies in both the bulk and the tails of a distribution is an important problem for many applications, such as ensuring algorithms are robust to observations that are infrequent but have devastating effects. Archimax copulas are a family of distributions endowed with a precise representation that allows simultaneous modeling of the bulk and the tails of a distribut…
▽ More
Understanding multivariate dependencies in both the bulk and the tails of a distribution is an important problem for many applications, such as ensuring algorithms are robust to observations that are infrequent but have devastating effects. Archimax copulas are a family of distributions endowed with a precise representation that allows simultaneous modeling of the bulk and the tails of a distribution. Rather than separating the two as is typically done in practice, incorporating additional information from the bulk may improve inference of the tails, where observations are limited. Building on the stochastic representation of Archimax copulas, we develop a non-parametric inference method and sampling algorithm. Our proposed methods, to the best of our knowledge, are the first that allow for highly flexible and scalable inference and sampling algorithms, enabling the increased use of Archimax copulas in practical settings. We experimentally compare to state-of-the-art density modeling techniques, and the results suggest that the proposed method effectively extrapolates to the tails while scaling to higher dimensional data. Our findings suggest that the proposed algorithms can be used in a variety of applications where understanding the interplay between the bulk and the tails of a distribution is necessary, such as healthcare and safety.
△ Less
Submitted 20 September, 2022; v1 submitted 27 May, 2022;
originally announced May 2022.
-
On The Energy Statistics of Feature Maps in Pruning of Neural Networks with Skip-Connections
Authors:
Mohammadreza Soltani,
Suya Wu,
Yuerong Li,
Jie Ding,
Vahid Tarokh
Abstract:
We propose a new structured pruning framework for compressing Deep Neural Networks (DNNs) with skip connections, based on measuring the statistical dependency of hidden layers and predicted outputs. The dependence measure defined by the energy statistics of hidden layers serves as a model-free measure of information between the feature maps and the output of the network. The estimated dependence m…
▽ More
We propose a new structured pruning framework for compressing Deep Neural Networks (DNNs) with skip connections, based on measuring the statistical dependency of hidden layers and predicted outputs. The dependence measure defined by the energy statistics of hidden layers serves as a model-free measure of information between the feature maps and the output of the network. The estimated dependence measure is subsequently used to prune a collection of redundant and uninformative layers. Model-freeness of our measure guarantees that no parametric assumptions on the feature map distribution are required, making it computationally appealing for very high dimensional feature space in DNNs. Extensive numerical experiments on various architectures show the efficacy of the proposed pruning approach with competitive performance to state-of-the-art methods.
△ Less
Submitted 26 January, 2022;
originally announced January 2022.
-
Toward Data-Driven STAP Radar
Authors:
Shyam Venkatasubramanian,
Chayut Wongkamthong,
Mohammadreza Soltani,
Bosung Kang,
Sandeep Gogineni,
Ali Pezeshki,
Muralidhar Rangaswamy,
Vahid Tarokh
Abstract:
Using an amalgamation of techniques from classical radar, computer vision, and deep learning, we characterize our ongoing data-driven approach to space-time adaptive processing (STAP) radar. We generate a rich example dataset of received radar signals by randomly placing targets of variable strengths in a predetermined region using RFView, a site-specific radio frequency modeling and simulation to…
▽ More
Using an amalgamation of techniques from classical radar, computer vision, and deep learning, we characterize our ongoing data-driven approach to space-time adaptive processing (STAP) radar. We generate a rich example dataset of received radar signals by randomly placing targets of variable strengths in a predetermined region using RFView, a site-specific radio frequency modeling and simulation tool developed by ISL Inc. For each data sample within this region, we generate heatmap tensors in range, azimuth, and elevation of the output power of a minimum variance distortionless response (MVDR) beamformer, which can be replaced with a desired test statistic. These heatmap tensors can be thought of as stacked images, and in an airborne scenario, the moving radar creates a sequence of these time-indexed image stacks, resembling a video. Our goal is to use these images and videos to detect targets and estimate their locations, a procedure reminiscent of computer vision algorithms for object detection$-$namely, the Faster Region-Based Convolutional Neural Network (Faster R-CNN). The Faster R-CNN consists of a proposal generating network for determining regions of interest (ROI), a regression network for positioning anchor boxes around targets, and an object classification algorithm; it is developed and optimized for natural images. Our ongoing research will develop analogous tools for heatmap images of radar data. In this regard, we will generate a large, representative adaptive radar signal processing database for training and testing, analogous in spirit to the COCO dataset for natural images. As a preliminary example, we present a regression network in this paper for estimating target locations to demonstrate the feasibility of and significant improvements provided by our data-driven approach.
△ Less
Submitted 9 March, 2022; v1 submitted 25 January, 2022;
originally announced January 2022.
-
Multi-Agent Adversarial Attacks for Multi-Channel Communications
Authors:
Juncheng Dong,
Suya Wu,
Mohammadreza Sultani,
Vahid Tarokh
Abstract:
Recently Reinforcement Learning (RL) has been applied as an anti-adversarial remedy in wireless communication networks. However, studying the RL-based approaches from the adversary's perspective has received little attention. Additionally, RL-based approaches in an anti-adversary or adversarial paradigm mostly consider single-channel communication (either channel selection or single channel power…
▽ More
Recently Reinforcement Learning (RL) has been applied as an anti-adversarial remedy in wireless communication networks. However, studying the RL-based approaches from the adversary's perspective has received little attention. Additionally, RL-based approaches in an anti-adversary or adversarial paradigm mostly consider single-channel communication (either channel selection or single channel power control), while multi-channel communication is more common in practice. In this paper, we propose a multi-agent adversary system (MAAS) for modeling and analyzing adversaries in a wireless communication scenario by careful design of the reward function under realistic communication scenarios. In particular, by modeling the adversaries as learning agents, we show that the proposed MAAS is able to successfully choose the transmitted channel(s) and their respective allocated power(s) without any prior knowledge of the sender strategy. Compared to the single-agent adversary (SAA), multi-agents in MAAS can achieve significant reduction in signal-to-noise ratio (SINR) under the same power constraints and partial observability, while providing improved stability and a more efficient learning process. Moreover, through empirical studies we show that the results in simulation are close to the ones in communication in reality, a conclusion that is pivotal to the validity of performance of agents evaluated in simulations.
△ Less
Submitted 27 January, 2022; v1 submitted 22 January, 2022;
originally announced January 2022.
-
A Physics-Informed Vector Quantized Autoencoder for Data Compression of Turbulent Flow
Authors:
Mohammadreza Momenifar,
Enmao Diao,
Vahid Tarokh,
Andrew D. Bragg
Abstract:
Analyzing large-scale data from simulations of turbulent flows is memory intensive, requiring significant resources. This major challenge highlights the need for data compression techniques. In this study, we apply a physics-informed Deep Learning technique based on vector quantization to generate a discrete, low-dimensional representation of data from simulations of three-dimensional turbulent fl…
▽ More
Analyzing large-scale data from simulations of turbulent flows is memory intensive, requiring significant resources. This major challenge highlights the need for data compression techniques. In this study, we apply a physics-informed Deep Learning technique based on vector quantization to generate a discrete, low-dimensional representation of data from simulations of three-dimensional turbulent flows. The deep learning framework is composed of convolutional layers and incorporates physical constraints on the flow, such as preserving incompressibility and global statistical characteristics of the velocity gradients. The accuracy of the model is assessed using statistical, comparison-based similarity and physics-based metrics. The training data set is produced from Direct Numerical Simulation of an incompressible, statistically stationary, isotropic turbulent flow. The performance of this lossy data compression scheme is evaluated not only with unseen data from the stationary, isotropic turbulent flow, but also with data from decaying isotropic turbulence, and a Taylor-Green vortex flow. Defining the compression ratio (CR) as the ratio of original data size to the compressed one, the results show that our model based on vector quantization can offer CR $=85$ with a mean square error (MSE) of $O(10^{-3})$, and predictions that faithfully reproduce the statistics of the flow, except at the very smallest scales where there is some loss. Compared to the recent study based on a conventional autoencoder where compression is performed in a continuous space, our model improves the CR by more than $30$ percent, and reduces the MSE by an order of magnitude. Our compression model is an attractive solution for situations where fast, high quality and low-overhead encoding and decoding of large data are required.
△ Less
Submitted 11 January, 2022; v1 submitted 10 January, 2022;
originally announced January 2022.
-
Emulating Spatio-Temporal Realizations of Three-Dimensional Isotropic Turbulence via Deep Sequence Learning Models
Authors:
Mohammadreza Momenifar,
Enmao Diao,
Vahid Tarokh,
Andrew D. Bragg
Abstract:
We use a data-driven approach to model a three-dimensional turbulent flow using cutting-edge Deep Learning techniques. The deep learning framework incorporates physical constraints on the flow, such as preserving incompressibility and global statistical invariants of velocity gradient tensor. The accuracy of the model is assessed using statistical and physics-based metrics. The data set comes from…
▽ More
We use a data-driven approach to model a three-dimensional turbulent flow using cutting-edge Deep Learning techniques. The deep learning framework incorporates physical constraints on the flow, such as preserving incompressibility and global statistical invariants of velocity gradient tensor. The accuracy of the model is assessed using statistical and physics-based metrics. The data set comes from Direct Numerical Simulation of an incompressible, statistically stationary, isotropic turbulent flow in a cubic box. Since the size of the dataset is memory intensive, we first generate a low-dimensional representation of the velocity data, and then pass it to a sequence prediction network that learns the spatial and temporal correlations of the underlying data. The dimensionality reduction is performed via extraction using Vector-Quantized Autoencoder (VQ-AE), which learns the discrete latent variables. For the sequence forecasting, the idea of Transformer architecture from natural language processing is used, and its performance compared against more standard Recurrent Networks (such as Convolutional LSTM). These architectures are designed and trained to perform a sequence to sequence multi-class classification task in which they take an input sequence with a fixed length (k) and predict a sequence with a fixed length (p), representing the future time instants of the flow. Our results for the short-term predictions show that the accuracy of results for both models deteriorates across predicted snapshots due to autoregressive nature of the predictions. Based on our diagnostics tests, the trained Conv-Transformer model outperforms the Conv-LSTM one and can accurately, both quantitatively and qualitatively, retain the large scales and capture well the inertial scales of flow but fails at recovering the small and intermittent fluid motions.
△ Less
Submitted 6 December, 2021;
originally announced December 2021.
-
Blaschke Product Neural Networks (BPNN): A Physics-Infused Neural Network for Phase Retrieval of Meromorphic Functions
Authors:
Juncheng Dong,
Simiao Ren,
Yang Deng,
Omar Khatib,
Jordan Malof,
Mohammadreza Soltani,
Willie Padilla,
Vahid Tarokh
Abstract:
Numerous physical systems are described by ordinary or partial differential equations whose solutions are given by holomorphic or meromorphic functions in the complex domain. In many cases, only the magnitude of these functions are observed on various points on the purely imaginary jw-axis since coherent measurement of their phases is often expensive. However, it is desirable to retrieve the lost…
▽ More
Numerous physical systems are described by ordinary or partial differential equations whose solutions are given by holomorphic or meromorphic functions in the complex domain. In many cases, only the magnitude of these functions are observed on various points on the purely imaginary jw-axis since coherent measurement of their phases is often expensive. However, it is desirable to retrieve the lost phases from the magnitudes when possible. To this end, we propose a physics-infused deep neural network based on the Blaschke products for phase retrieval. Inspired by the Helson and Sarason Theorem, we recover coefficients of a rational function of Blaschke products using a Blaschke Product Neural Network (BPNN), based upon the magnitude observations as input. The resulting rational function is then used for phase retrieval. We compare the BPNN to conventional deep neural networks (NNs) on several phase retrieval problems, comprising both synthetic and contemporary real-world problems (e.g., metamaterials for which data collection requires substantial expertise and is time consuming). On each phase retrieval problem, we compare against a population of conventional NNs of varying size and hyperparameter settings. Even without any hyper-parameter search, we find that BPNNs consistently outperform the population of optimized NNs in scarce data scenarios, and do so despite being much smaller models. The results can in turn be applied to calculate the refractive index of metamaterials, which is an important problem in emerging areas of material science.
△ Less
Submitted 25 November, 2021;
originally announced November 2021.
-
Characteristic Neural Ordinary Differential Equations
Authors:
Xingzi Xu,
Ali Hasan,
Khalil Elkhalil,
Jie Ding,
Vahid Tarokh
Abstract:
We propose Characteristic-Neural Ordinary Differential Equations (C-NODEs), a framework for extending Neural Ordinary Differential Equations (NODEs) beyond ODEs. While NODEs model the evolution of a latent variables as the solution to an ODE, C-NODE models the evolution of the latent variables as the solution of a family of first-order quasi-linear partial differential equations (PDEs) along curve…
▽ More
We propose Characteristic-Neural Ordinary Differential Equations (C-NODEs), a framework for extending Neural Ordinary Differential Equations (NODEs) beyond ODEs. While NODEs model the evolution of a latent variables as the solution to an ODE, C-NODE models the evolution of the latent variables as the solution of a family of first-order quasi-linear partial differential equations (PDEs) along curves on which the PDEs reduce to ODEs, referred to as characteristic curves. This in turn allows the application of the standard frameworks for solving ODEs, namely the adjoint method. Learning optimal characteristic curves for given tasks improves the performance and computational efficiency, compared to state of the art NODE models. We prove that the C-NODE framework extends the classical NODE on classification tasks by demonstrating explicit C-NODE representable functions not expressible by NODEs. Additionally, we present C-NODE-based continuous normalizing flows, which describe the density evolution of latent variables along multiple dimensions. Empirical results demonstrate the improvements provided by the proposed method for classification and density estimation on CIFAR-10, SVHN, and MNIST datasets under a similar computational budget as the existing NODE methods. The results also provide empirical evidence that the learned curves improve the efficiency of the system through a lower number of parameters and function evaluations compared with baselines.
△ Less
Submitted 9 November, 2022; v1 submitted 25 November, 2021;
originally announced November 2021.
-
Decentralized Multi-Target Cross-Domain Recommendation for Multi-Organization Collaborations
Authors:
Enmao Diao,
Vahid Tarokh,
Jie Ding
Abstract:
Recommender Systems (RSs) are operated locally by different organizations in many realistic scenarios. If various organizations can fully share their data and perform computation in a centralized manner, they may significantly improve the accuracy of recommendations. However, collaborations among multiple organizations in enhancing the performance of recommendations are primarily limited due to th…
▽ More
Recommender Systems (RSs) are operated locally by different organizations in many realistic scenarios. If various organizations can fully share their data and perform computation in a centralized manner, they may significantly improve the accuracy of recommendations. However, collaborations among multiple organizations in enhancing the performance of recommendations are primarily limited due to the difficulty of sharing data and models. To address this challenge, we propose Decentralized Multi-Target Cross-Domain Recommendation (DMTCDR) with Multi-Target Assisted Learning (MTAL) and Assisted AutoEncoder (AAE). Our method can help multiple organizations collaboratively improve their recommendation performance in a decentralized manner without sharing sensitive assets. Consequently, it allows decentralized organizations to collaborate and form a community of shared interest. We conduct extensive experiments to demonstrate that the new method can significantly outperform locally trained RSs and mitigate the cold start problem.
△ Less
Submitted 6 November, 2022; v1 submitted 25 October, 2021;
originally announced October 2021.
-
Task Affinity with Maximum Bipartite Matching in Few-Shot Learning
Authors:
Cat P. Le,
Juncheng Dong,
Mohammadreza Soltani,
Vahid Tarokh
Abstract:
We propose an asymmetric affinity score for representing the complexity of utilizing the knowledge of one task for learning another one. Our method is based on the maximum bipartite matching algorithm and utilizes the Fisher Information matrix. We provide theoretical analyses demonstrating that the proposed score is mathematically well-defined, and subsequently use the affinity score to propose a…
▽ More
We propose an asymmetric affinity score for representing the complexity of utilizing the knowledge of one task for learning another one. Our method is based on the maximum bipartite matching algorithm and utilizes the Fisher Information matrix. We provide theoretical analyses demonstrating that the proposed score is mathematically well-defined, and subsequently use the affinity score to propose a novel algorithm for the few-shot learning problem. In particular, using this score, we find relevant training data labels to the test data and leverage the discovered relevant data for episodically fine-tuning a few-shot model. Results on various few-shot benchmark datasets demonstrate the efficacy of the proposed approach by improving the classification accuracy over the state-of-the-art methods even when using smaller models.
△ Less
Submitted 21 January, 2022; v1 submitted 5 October, 2021;
originally announced October 2021.
-
Semi-Empirical Objective Functions for MCMC Proposal Optimization
Authors:
Chris Cannella,
Vahid Tarokh
Abstract:
Current objective functions used for training neural MCMC proposal distributions implicitly rely on architectural restrictions to yield sensible optimization results, which hampers the development of highly expressive neural MCMC proposal architectures. In this work, we introduce and demonstrate a semi-empirical procedure for determining approximate objective functions suitable for optimizing arbi…
▽ More
Current objective functions used for training neural MCMC proposal distributions implicitly rely on architectural restrictions to yield sensible optimization results, which hampers the development of highly expressive neural MCMC proposal architectures. In this work, we introduce and demonstrate a semi-empirical procedure for determining approximate objective functions suitable for optimizing arbitrarily parameterized proposal distributions in MCMC methods. Our proposed Ab Initio objective functions consist of the weighted combination of functions following constraints on their global optima and transformation invariances that we argue should be upheld by general measures of MCMC efficiency for use in proposal optimization. Our experimental results demonstrate that Ab Initio objective functions maintain favorable performance and preferable optimization behavior compared to existing objective functions for neural MCMC optimization. We find that Ab Initio objective functions are sufficiently robust to enable the confident optimization of neural proposal distributions parameterized by deep generative networks extending beyond the regimes of traditional MCMC schemes
△ Less
Submitted 9 April, 2022; v1 submitted 3 June, 2021;
originally announced June 2021.
-
SemiFL: Semi-Supervised Federated Learning for Unlabeled Clients with Alternate Training
Authors:
Enmao Diao,
Jie Ding,
Vahid Tarokh
Abstract:
Federated Learning allows the training of machine learning models by using the computation and private data resources of many distributed clients. Most existing results on Federated Learning (FL) assume the clients have ground-truth labels. However, in many practical scenarios, clients may be unable to label task-specific data due to a lack of expertise or resource. We propose SemiFL to address th…
▽ More
Federated Learning allows the training of machine learning models by using the computation and private data resources of many distributed clients. Most existing results on Federated Learning (FL) assume the clients have ground-truth labels. However, in many practical scenarios, clients may be unable to label task-specific data due to a lack of expertise or resource. We propose SemiFL to address the problem of combining communication-efficient FL such as FedAvg with Semi-Supervised Learning (SSL). In SemiFL, clients have completely unlabeled data and can train multiple local epochs to reduce communication costs, while the server has a small amount of labeled data. We provide a theoretical understanding of the success of data augmentation-based SSL methods to illustrate the bottleneck of a vanilla combination of communication-efficient FL with SSL. To address this issue, we propose alternate training to `fine-tune global model with labeled data' and `generate pseudo-labels with the global model.' We conduct extensive experiments and demonstrate that our approach significantly improves the performance of a labeled server with unlabeled clients training with multiple local epochs. Moreover, our method outperforms many existing SSFL baselines and performs competitively with the state-of-the-art FL and SSL results.
△ Less
Submitted 11 October, 2022; v1 submitted 2 June, 2021;
originally announced June 2021.
-
GAL: Gradient Assisted Learning for Decentralized Multi-Organization Collaborations
Authors:
Enmao Diao,
Jie Ding,
Vahid Tarokh
Abstract:
Collaborations among multiple organizations, such as financial institutions, medical centers, and retail markets in decentralized settings are crucial to providing improved service and performance. However, the underlying organizations may have little interest in sharing their local data, models, and objective functions. These requirements have created new challenges for multi-organization collabo…
▽ More
Collaborations among multiple organizations, such as financial institutions, medical centers, and retail markets in decentralized settings are crucial to providing improved service and performance. However, the underlying organizations may have little interest in sharing their local data, models, and objective functions. These requirements have created new challenges for multi-organization collaboration. In this work, we propose Gradient Assisted Learning (GAL), a new method for multiple organizations to assist each other in supervised learning tasks without sharing local data, models, and objective functions. In this framework, all participants collaboratively optimize the aggregate of local loss functions, and each participant autonomously builds its own model by iteratively fitting the gradients of the overarching objective function. We also provide asymptotic convergence analysis and practical case studies of GAL. Experimental studies demonstrate that GAL can achieve performance close to centralized learning when all data, models, and objective functions are fully disclosed.
△ Less
Submitted 11 October, 2022; v1 submitted 2 June, 2021;
originally announced June 2021.
-
A Methodology for Exploring Deep Convolutional Features in Relation to Hand-Crafted Features with an Application to Music Audio Modeling
Authors:
Anna K. Yanchenko,
Mohammadreza Soltani,
Robert J. Ravier,
Sayan Mukherjee,
Vahid Tarokh
Abstract:
Understanding the features learned by deep models is important from a model trust perspective, especially as deep systems are deployed in the real world. Most recent approaches for deep feature understanding or model explanation focus on highlighting input data features that are relevant for classification decisions. In this work, we instead take the perspective of relating deep features to well-s…
▽ More
Understanding the features learned by deep models is important from a model trust perspective, especially as deep systems are deployed in the real world. Most recent approaches for deep feature understanding or model explanation focus on highlighting input data features that are relevant for classification decisions. In this work, we instead take the perspective of relating deep features to well-studied, hand-crafted features that are meaningful for the application of interest. We propose a methodology and set of systematic experiments for exploring deep features in this setting, where input feature importance approaches for deep feature understanding do not apply. Our experiments focus on understanding which hand-crafted and deep features are useful for the classification task of interest, how robust these features are for related tasks and how similar the deep features are to the meaningful hand-crafted features. Our proposed method is general to many application areas and we demonstrate its utility on orchestral music audio data.
△ Less
Submitted 9 October, 2021; v1 submitted 31 May, 2021;
originally announced June 2021.
-
Fisher Task Distance and Its Application in Neural Architecture Search
Authors:
Cat P. Le,
Mohammadreza Soltani,
Juncheng Dong,
Vahid Tarokh
Abstract:
We formulate an asymmetric (or non-commutative) distance between tasks based on Fisher Information Matrices, called Fisher task distance. This distance represents the complexity of transferring the knowledge from one task to another. We provide a proof of consistency for our distance through theorems and experiments on various classification tasks from MNIST, CIFAR-10, CIFAR-100, ImageNet, and Tas…
▽ More
We formulate an asymmetric (or non-commutative) distance between tasks based on Fisher Information Matrices, called Fisher task distance. This distance represents the complexity of transferring the knowledge from one task to another. We provide a proof of consistency for our distance through theorems and experiments on various classification tasks from MNIST, CIFAR-10, CIFAR-100, ImageNet, and Taskonomy datasets. Next, we construct an online neural architecture search framework using the Fisher task distance, in which we have access to the past learned tasks. By using the Fisher task distance, we can identify the closest learned tasks to the target task, and utilize the knowledge learned from these related tasks for the target task. Here, we show how the proposed distance between a target task and a set of learned tasks can be used to reduce the neural architecture search space for the target task. The complexity reduction in search space for task-specific architectures is achieved by building on the optimized architectures for similar tasks instead of doing a full search and without using this side information. Experimental results for tasks in MNIST, CIFAR-10, CIFAR-100, ImageNet datasets demonstrate the efficacy of the proposed approach and its improvements, in terms of the performance and the number of parameters, over other gradient-based search methods, such as ENAS, DARTS, PC-DARTS.
△ Less
Submitted 30 April, 2022; v1 submitted 23 March, 2021;
originally announced March 2021.