-
A Dual Basis Approach for Structured Robust Euclidean Distance Geometry
Authors:
Chandra Kundu,
Abiy Tasissa,
HanQin Cai
Abstract:
Euclidean Distance Matrix (EDM), which consists of pairwise squared Euclidean distances of a given point configuration, finds many applications in modern machine learning. This paper considers the setting where only a set of anchor nodes is used to collect the distances between themselves and the rest. In the presence of potential outliers, this results in a structured partial observation of the EDM with partial corruptions. Note that an EDM can be connected to a positive semi-definite Gram matrix via a non-orthogonal dual basis. Inspired by recent developments on non-orthogonal dual bases in optimization, we propose a novel algorithmic framework, dubbed Robust Euclidean Distance Geometry via Dual Basis (RoDEoDB), for recovering the Euclidean distance geometry, i.e., the underlying point configuration. Exact recovery guarantees are established in terms of both the Gram matrix and the point configuration, under some mild conditions. Empirical experiments show superior performance of RoDEoDB on sensor localization and molecular conformation datasets.
Submitted 23 May, 2025;
originally announced May 2025.
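The abstract above relies on the link between an EDM and a positive semi-definite Gram matrix. The following is a minimal NumPy sketch of that classical relation only (double centering followed by an eigendecomposition, as in classical multidimensional scaling); it is not the RoDEoDB algorithm and does not handle missing or corrupted entries. The function names `edm_from_points`, `gram_from_edm`, and `points_from_gram` are hypothetical, chosen for illustration.

```python
import numpy as np

def edm_from_points(P):
    """Squared Euclidean distance matrix of the rows of P (n x d)."""
    sq = np.sum(P**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * P @ P.T
    return np.maximum(D, 0.0)  # clip tiny negative round-off

def gram_from_edm(D):
    """Gram matrix of the centered configuration: G = -1/2 * J D J."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering projector
    return -0.5 * J @ D @ J

def points_from_gram(G, d):
    """Recover a d-dimensional configuration (up to rigid motion) from G."""
    w, V = np.linalg.eigh(G)          # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:d]     # indices of the d largest eigenvalues
    w_top = np.clip(w[idx], 0.0, None)
    return V[:, idx] * np.sqrt(w_top)

# Illustrative data: 8 random points in 3 dimensions (an assumption, not from the paper).
rng = np.random.default_rng(0)
P = rng.standard_normal((8, 3))
D = edm_from_points(P)
G = gram_from_edm(D)
P_hat = points_from_gram(G, 3)
D_hat = edm_from_points(P_hat)  # distances are preserved even though P_hat != P
```

The recovered points generally differ from the originals by a rotation, reflection, and translation, which is why the check compares distance matrices rather than coordinates.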
-
Diffusion Stochastic Learning Over Adaptive Competing Networks
Authors:
Yike Zhao,
Haoyuan Cai,
Ali H. Sayed
Abstract:
This paper studies a stochastic dynamic game between two competing teams, each consisting of a network of collaborating agents. Unlike fully cooperative settings, where all agents share a common objective, each team in this game aims to minimize its own distinct objective. In the adversarial setting, their objectives could be conflicting as in zero-sum games. Throughout the competition, agents share strategic information within their own team while simultaneously inferring and adapting to the strategies of the opposing team. We propose diffusion learning algorithms to address two important classes of this network game: i) a zero-sum game characterized by weak cross-team subgraph interactions, and ii) a general non-zero-sum game exhibiting strong cross-team subgraph interactions. We analyze the stability performance of the proposed algorithms under reasonable assumptions and illustrate the theoretical results through experiments on Cournot team competition and decentralized GAN training.
Submitted 28 April, 2025;
originally announced April 2025.
-
Plus-pure thresholds of some cusp-like singularities in mixed characteristic
Authors:
Hanlin Cai,
Suchitra Pande,
Eamon Quinlan-Gallego,
Karl Schwede,
Kevin Tucker
Abstract:
Log-canonical and $F$-pure thresholds of pairs in equal characteristic admit an analog in the recent theory of singularities in mixed characteristic, which is known as the plus-pure threshold. In this paper we study plus-pure thresholds for singularities of the form $p^a + x^b \in {\bf Z}_p [[ x ]]$, showing that in a number of cases this plus-pure threshold agrees with the $F$-pure threshold of the singularity $t^a + x^b \in {\bf F}_p [[ t, x ]]$. We also discuss a few other sporadic examples.
Submitted 13 January, 2025;
originally announced January 2025.
-
Characterizing perfectoid covers of abelian varieties
Authors:
Rebecca Bellovin,
Hanlin Cai,
Sean Howe,
Tongmu He
Abstract:
We give a simple characterization of all perfectoid profinite étale covers of abelian varieties in terms of the Hodge-Tate filtration on the $p$-adic Tate module. We also compute the geometric Sen morphism for all profinite $p$-adic Lie torsors over an abelian variety, and combine this with our characterization to prove a conjecture of Rodríguez Camargo on perfectoidness of $p$-adic Lie torsors in this case. We obtain complementary results for covers of semi-abeloid varieties, $p$-divisible rigid analytic groups, and varieties with globally generated 1-forms. Our proof of perfectoidness for covers of abelian varieties is based on results of Scholze on the canonical subgroup and holds for an arbitrary abelian variety over an algebraically closed non-archimedean extension of $\mathbb{Q}_p$. In an appendix authored by Tongmu He, an alternate proof is presented in the case of abelian varieties that can be defined over a discretely valued subfield by combining our computation of the geometric Sen morphism with previous pointwise perfectoidness and purity of perfectoidness results of He.
Submitted 7 January, 2025;
originally announced January 2025.
-
Deeply Learned Robust Matrix Completion for Large-scale Low-rank Data Recovery
Authors:
HanQin Cai,
Chandra Kundu,
Jialin Liu,
Wotao Yin
Abstract:
Robust matrix completion (RMC) is a widely used machine learning tool that simultaneously tackles two critical issues in low-rank data analysis: missing data entries and extreme outliers. This paper proposes a novel scalable and learnable non-convex approach, coined Learned Robust Matrix Completion (LRMC), for large-scale RMC problems. LRMC enjoys low computational complexity with linear convergence. Motivated by the proposed theorem, the free parameters of LRMC can be effectively learned via deep unfolding to achieve optimum performance. Furthermore, this paper proposes a flexible feedforward-recurrent-mixed neural network framework that extends deep unfolding from a fixed number of iterations to infinite iterations. The superior empirical performance of LRMC is verified with extensive experiments against state-of-the-art methods on synthetic datasets and real applications, including video background subtraction, ultrasound imaging, face modeling, and cloud removal from satellite imagery.
Submitted 31 December, 2024;
originally announced January 2025.
-
Structured Sampling for Robust Euclidean Distance Geometry
Authors:
Chandra Kundu,
Abiy Tasissa,
HanQin Cai
Abstract:
This paper addresses the problem of estimating the positions of points from distance measurements corrupted by sparse outliers. Specifically, we consider a setting with two types of nodes: anchor nodes, for which exact distances to each other are known, and target nodes, for which complete but corrupted distance measurements to the anchors are available. To tackle this problem, we propose a novel algorithm powered by Nyström method and robust principal component analysis. Our method is computationally efficient as it processes only a localized subset of the distance matrix and does not require distance measurements between target nodes. Empirical evaluations on synthetic datasets, designed to mimic sensor localization, and on molecular experiments, demonstrate that our algorithm achieves accurate recovery with a modest number of anchors, even in the presence of high levels of sparse outliers.
Submitted 17 February, 2025; v1 submitted 13 December, 2024;
originally announced December 2024.
-
Guarantees of a Preconditioned Subgradient Algorithm for Overparameterized Asymmetric Low-rank Matrix Recovery
Authors:
Paris Giampouras,
HanQin Cai,
Rene Vidal
Abstract:
In this paper, we focus on a matrix factorization-based approach to recover low-rank {\it asymmetric} matrices from corrupted measurements. We propose an {\it Overparameterized Preconditioned Subgradient Algorithm (OPSA)} and provide, for the first time in the literature, linear convergence rates independent of the rank of the sought asymmetric matrix in the presence of gross corruptions. Our work goes beyond existing results in preconditioned-type approaches addressing their current limitation, i.e., the lack of convergence guarantees in the case of {\it asymmetric matrices of unknown rank}. By applying our approach to (robust) matrix sensing, we highlight its merits when the measurement operator satisfies a mixed-norm restricted isometry property. Lastly, we present extensive numerical experiments that validate our theoretical results and demonstrate the effectiveness of our approach for different levels of overparameterization and outlier corruptions.
Submitted 29 May, 2025; v1 submitted 22 October, 2024;
originally announced October 2024.
-
Riemannian Optimization for Non-convex Euclidean Distance Geometry with Global Recovery Guarantees
Authors:
Chandler Smith,
HanQin Cai,
Abiy Tasissa
Abstract:
The problem of determining the configuration of points from partial distance information, known as the Euclidean Distance Geometry (EDG) problem, is fundamental to many tasks in the applied sciences. In this paper, we propose two algorithms grounded in the Riemannian optimization framework to address the EDG problem. Our approach formulates the problem as a low-rank matrix completion task over the Gram matrix, using partial measurements represented as expansion coefficients of the Gram matrix in a non-orthogonal basis. For the first algorithm, under a uniform sampling with replacement model for the observed distance entries, we demonstrate that, with high probability, a Riemannian gradient-like algorithm on the manifold of rank-$r$ matrices converges linearly to the true solution, given initialization via a one-step hard thresholding. This holds provided the number of samples, $m$, satisfies $m \geq \mathcal{O}(n^{7/4}r^2 \log(n))$. With a more refined initialization, achieved through resampled Riemannian gradient-like descent, we further improve this bound to $m \geq \mathcal{O}(nr^2 \log(n))$. Our analysis for the first algorithm leverages a non-self-adjoint operator and depends on deriving eigenvalue bounds for an inner product matrix of restricted basis matrices, leveraging sparsity properties for tighter guarantees than previously established. The second algorithm introduces a self-adjoint surrogate for the sampling operator. This algorithm demonstrates strong numerical performance on both synthetic and real data. Furthermore, we show that optimizing over manifolds of higher-than-rank-$r$ matrices yields superior numerical results, consistent with recent literature on overparameterization in the EDG problem.
Submitted 8 October, 2024;
originally announced October 2024.
-
Accelerated Stochastic Min-Max Optimization Based on Bias-corrected Momentum
Authors:
Haoyuan Cai,
Sulaiman A. Alghunaim,
Ali H. Sayed
Abstract:
Lower-bound analyses for nonconvex strongly-concave minimax optimization problems have shown that stochastic first-order algorithms require at least $\mathcal{O}(\varepsilon^{-4})$ oracle complexity to find an $\varepsilon$-stationary point. Some works indicate that this complexity can be improved to $\mathcal{O}(\varepsilon^{-3})$ when the loss gradient is Lipschitz continuous. The question of achieving enhanced convergence rates under distinct conditions remains unresolved. In this work, we address this question for optimization problems that are nonconvex in the minimization variable and strongly concave or Polyak-Lojasiewicz (PL) in the maximization variable. We introduce novel bias-corrected momentum algorithms utilizing efficient Hessian-vector products. We establish convergence conditions and demonstrate a lower iteration complexity of $\mathcal{O}(\varepsilon^{-3})$ for the proposed algorithms. The effectiveness of the method is validated through applications to robust logistic regression using real-world datasets.
Submitted 13 May, 2025; v1 submitted 18 June, 2024;
originally announced June 2024.
-
Guaranteed Sampling Flexibility for Low-tubal-rank Tensor Completion
Authors:
Bowen Su,
Juntao You,
HanQin Cai,
Longxiu Huang
Abstract:
While Bernoulli sampling is extensively studied in tensor completion, t-CUR sampling approximates low-tubal-rank tensors via lateral and horizontal subtensors. However, both methods lack sufficient flexibility for diverse practical applications. To address this, we introduce Tensor Cross-Concentrated Sampling (t-CCS), a novel and straightforward sampling model that advances the matrix cross-concentrated sampling concept within a tensor framework. t-CCS effectively bridges the gap between Bernoulli and t-CUR sampling, offering additional flexibility that can lead to computational savings in various contexts. A key aspect of our work is the comprehensive theoretical analysis provided. We establish a sufficient condition for the successful recovery of a low-rank tensor from its t-CCS samples. In support of this, we also develop a theoretical framework validating the feasibility of t-CUR via uniform random sampling and conduct a detailed theoretical sampling complexity analysis for tensor completion problems utilizing the general Bernoulli sampling model. Moreover, we introduce an efficient non-convex algorithm, the Iterative t-CUR Tensor Completion (ITCURTC) algorithm, specifically designed to tackle the t-CCS-based tensor completion. We have intensively tested and validated the effectiveness of the t-CCS model and the ITCURTC algorithm across both synthetic and real-world datasets.
Submitted 16 June, 2024;
originally announced June 2024.
-
Accelerating Ill-conditioned Hankel Matrix Recovery via Structured Newton-like Descent
Authors:
HanQin Cai,
Longxiu Huang,
Xiliang Lu,
Juntao You
Abstract:
This paper studies the robust Hankel recovery problem, which simultaneously removes sparse outliers and fills in missing entries from partial observations. We propose a novel non-convex algorithm, coined Hankel Structured Newton-Like Descent (HSNLD), to tackle the robust Hankel recovery problem. HSNLD is highly efficient with linear convergence, and its convergence rate is independent of the condition number of the underlying Hankel matrix. The recovery guarantee has been established under some mild conditions. Numerical experiments on both synthetic and real datasets show the superior performance of HSNLD against state-of-the-art algorithms.
Submitted 10 April, 2025; v1 submitted 11 June, 2024;
originally announced June 2024.
-
Efficient quaternion CUR method for low-rank approximation to quaternion matrix
Authors:
Peng-Ling Wu,
Kit Ian Kou,
Hongmin Cai,
Zhaoyuan Yu
Abstract:
The low-rank quaternion matrix approximation has been successfully applied in many applications involving signal processing and color image processing. However, the cost of quaternion models for generating low-rank quaternion matrix approximation is sometimes considerable due to the computation of the quaternion singular value decomposition (QSVD), which limits their application to real large-scale data. To address this deficiency, an efficient quaternion matrix CUR (QMCUR) method for low-rank approximation is suggested, which provides significant acceleration in color image processing. We first explore the QMCUR approximation method, which uses actual columns and rows of the given quaternion matrix, instead of the costly QSVD. Additionally, two different sampling strategies are used to sample the above-selected columns and rows. Then, the perturbation analysis is performed on the QMCUR approximation of noisy versions of low-rank quaternion matrices. Extensive experiments on both synthetic and real data further reveal the superiority of the proposed algorithm compared with other algorithms for getting low-rank approximation, in terms of both efficiency and accuracy.
Submitted 29 February, 2024;
originally announced February 2024.
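The abstract above describes approximating a low-rank matrix from actual columns and rows instead of a full singular value decomposition. As a hedged illustration of that idea, the sketch below implements the standard CUR approximation for a real-valued matrix, which is a swapped-in simplification: it does not use quaternion arithmetic and is not the QMCUR method itself. The helper name `cur_approx`, the uniform index sampling, and the `rcond` cutoff are assumptions made for this example.

```python
import numpy as np

def cur_approx(A, rows, cols, rcond=1e-10):
    """CUR approximation A ~ C @ U @ R built from selected rows and columns."""
    C = A[:, cols]                                  # selected columns
    R = A[rows, :]                                  # selected rows
    W = A[np.ix_(rows, cols)]                       # intersection block
    U = np.linalg.pinv(W, rcond=rcond)              # pseudo-inverse of the block
    return C @ U @ R

# Illustrative data: an exactly rank-3 real matrix (an assumption for the demo).
rng = np.random.default_rng(1)
r = 3
A = rng.standard_normal((40, r)) @ rng.standard_normal((r, 30))

# Uniformly sample a few more rows and columns than the rank.
rows = rng.choice(40, size=6, replace=False)
cols = rng.choice(30, size=6, replace=False)

A_hat = cur_approx(A, rows, cols)
rel_err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
```

When the intersection block has the same rank as the matrix, the reconstruction is exact up to round-off; for noisy or only approximately low-rank data the error depends on which rows and columns are sampled.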
-
On the Robustness of Cross-Concentrated Sampling for Matrix Completion
Authors:
HanQin Cai,
Longxiu Huang,
Chandra Kundu,
Bowen Su
Abstract:
Matrix completion is one of the crucial tools in modern data science research. Recently, a novel sampling model for matrix completion, coined cross-concentrated sampling (CCS), has attracted much attention. However, the robustness of the CCS model against sparse outliers remains unclear in the existing studies. In this paper, we aim to answer this question by exploring a novel Robust CCS Completion problem. A highly efficient non-convex iterative algorithm, dubbed Robust CUR Completion (RCURC), is proposed. The empirical performance of the proposed algorithm, in terms of both efficiency and robustness, is verified on synthetic and real datasets.
Submitted 27 January, 2024;
originally announced January 2024.
-
Diffusion Stochastic Optimization for Min-Max Problems
Authors:
Haoyuan Cai,
Sulaiman A. Alghunaim,
Ali H. Sayed
Abstract:
The optimistic gradient method is useful in addressing minimax optimization problems. Motivated by the observation that the conventional stochastic version suffers from the need for a large batch size on the order of $\mathcal{O}(\varepsilon^{-2})$ to achieve an $\varepsilon$-stationary solution, we introduce and analyze a new formulation termed Diffusion Stochastic Same-Sample Optimistic Gradient (DSS-OG). We prove its convergence and resolve the large batch issue by establishing a tighter upper bound, under the more general setting of nonconvex Polyak-Lojasiewicz (PL) risk functions. We also extend the applicability of the proposed method to the distributed scenario, where agents communicate with their neighbors via a left-stochastic protocol. To implement DSS-OG, we can query the stochastic gradient oracles in parallel with some extra memory overhead, resulting in a complexity comparable to its conventional counterpart. To demonstrate the efficacy of the proposed algorithm, we conduct tests by training generative adversarial networks.
Submitted 25 January, 2024;
originally announced January 2024.
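To illustrate why the optimistic gradient method mentioned in the abstract above is useful for minimax problems, here is a minimal deterministic sketch on the toy bilinear problem min over x, max over y of f(x, y) = x * y. This is the basic single-agent optimistic gradient update, not the stochastic, distributed DSS-OG algorithm; the toy objective, step size, and function names are assumptions chosen for the example.

```python
import numpy as np

def grad(z):
    """Return (df/dx, df/dy) for the toy objective f(x, y) = x * y."""
    x, y = z
    return np.array([y, x])

def gda(z0, eta, steps):
    """Plain gradient descent-ascent: descend in x, ascend in y."""
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        g = grad(z)
        z = z + eta * np.array([-g[0], g[1]])
    return z

def optimistic_gda(z0, eta, steps):
    """Optimistic update: use the extrapolated gradient 2*g_k - g_{k-1}."""
    z = np.array(z0, dtype=float)
    g_prev = grad(z)  # with this choice the first step is a plain GDA step
    for _ in range(steps):
        g = grad(z)
        d = 2.0 * g - g_prev
        z = z + eta * np.array([-d[0], d[1]])
        g_prev = g
    return z

z_gda = gda([1.0, 1.0], eta=0.3, steps=500)
z_ogda = optimistic_gda([1.0, 1.0], eta=0.3, steps=500)
dist_gda = np.linalg.norm(z_gda)    # distance to the saddle point (0, 0)
dist_ogda = np.linalg.norm(z_ogda)
```

On this problem plain descent-ascent spirals away from the saddle point at the origin, while the optimistic correction makes the iterates contract toward it.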
-
On Efficient Inference of Causal Effects with Multiple Mediators
Authors:
Haoyu Wei,
Hengrui Cai,
Chengchun Shi,
Rui Song
Abstract:
This paper provides robust estimators and efficient inference of causal effects involving multiple interacting mediators. Most existing works either impose a linear model assumption among the mediators or are restricted to handle conditionally independent mediators given the exposure. To overcome these limitations, we define causal and individual mediation effects in a general setting, and employ a semiparametric framework to develop quadruply robust estimators for these causal effects. We further establish the asymptotic normality of the proposed estimators and prove their local semiparametric efficiencies. The proposed method is empirically validated via simulated and real datasets concerning psychiatric disorders in trauma survivors.
Submitted 10 January, 2024;
originally announced January 2024.
-
One-element Extensions of Hyperplane Arrangements
Authors:
Hang Cai,
Houshan Fu,
Suijie Wang
Abstract:
We classify one-element extensions of a hyperplane arrangement by the induced adjoint arrangement. Based on the classification, several kinds of combinatorial invariants, including Whitney polynomials, characteristic polynomials, Whitney numbers and face numbers, are constant on the strata associated with the induced adjoint arrangement, and are also order-preserving with respect to the intersection lattice of the induced adjoint arrangement. As a byproduct, we obtain a convolution formula on the characteristic polynomials $\chi(\mathcal{A}+H_{\bm{\alpha},a},t)$ when $\mathcal{A}$ is defined over a finite field $\mathbb{F}_q$ or is a rational arrangement.
Submitted 18 August, 2023;
originally announced August 2023.
-
Quaternion tensor left ring decomposition and application for color image inpainting
Authors:
Jifei Miao,
Kit Ian Kou,
Hongmin Cai,
Lizhi Liu
Abstract:
In recent years, tensor networks have emerged as powerful tools for solving large-scale optimization problems. One of the most promising tensor networks is the tensor ring (TR) decomposition, which achieves circular dimensional permutation invariance in the model through the utilization of the trace operation and equitable treatment of the latent cores. On the other hand, more recently, quaternions have gained significant attention and have been widely utilized in color image processing tasks due to their effectiveness in encoding color pixels by considering the three color channels as a unified entity. Therefore, in this paper, based on the left quaternion matrix multiplication, we propose the quaternion tensor left ring (QTLR) decomposition, which inherits the powerful and generalized representation abilities of the TR decomposition while leveraging the advantages of quaternions for color pixel representation. In addition to providing the definition of QTLR decomposition and an algorithm for learning the QTLR format, the paper further proposes a low-rank quaternion tensor completion (LRQTC) model and its algorithm for color image inpainting based on the defined QTLR decomposition. Finally, extensive experiments on color image inpainting demonstrate that the proposed LRQTC method is highly competitive.
Submitted 16 September, 2023; v1 submitted 20 July, 2023;
originally announced July 2023.
-
Towards Constituting Mathematical Structures for Learning to Optimize
Authors:
Jialin Liu,
Xiaohan Chen,
Zhangyang Wang,
Wotao Yin,
HanQin Cai
Abstract:
Learning to Optimize (L2O), a technique that utilizes machine learning to learn an optimization algorithm automatically from data, has gained rising attention in recent years. A generic L2O approach parameterizes the iterative update rule and learns the update direction as a black-box network. While the generic approach is widely applicable, the learned model can overfit and may not generalize well to out-of-distribution test sets. In this paper, we derive the basic mathematical conditions that successful update rules commonly satisfy. Consequently, we propose a novel L2O model with a mathematics-inspired structure that is broadly applicable and generalizes well to out-of-distribution problems. Numerical simulations validate our theoretical findings and demonstrate the superior empirical performance of the proposed L2O model.
Submitted 29 May, 2023;
originally announced May 2023.
-
To AI or not to AI, to Buy Local or not to Buy Local: A Mathematical Theory of Real Price
Authors:
Huan Cai,
Catherine Xu,
Weiyu Xu
Abstract:
In the past several decades, the world's economy has become increasingly globalized. On the other hand, there are also ideas advocating the practice of ``buy local'', by which people buy locally produced goods and services rather than those produced farther away. In this paper, we establish a mathematical theory of real price that determines the optimal global versus local spending of an agent which achieves the agent's optimal tradeoff between spending and obtained utility. Our theory of real price depends on the asymptotic analysis of a Markov chain transition probability matrix related to the network of producers and consumers. We show that the real price of a product or service can be determined from the involved Markov chain matrix, and can be dramatically different from the product's label price. In particular, we show that the label prices of products and services are often not ``real'' or directly ``useful'': given two products offering the same myopic utility, the one with lower label price may not necessarily offer better asymptotic utility. This theory shows that the globality or locality of the products and services does have different impacts on the spending-utility tradeoff of a customer. The established mathematical theory of real price can be used to determine whether to adopt or not to adopt certain artificial intelligence (AI) technologies from an economic perspective.
Submitted 8 May, 2023;
originally announced May 2023.
-
Robust Tensor CUR Decompositions: Rapid Low-Tucker-Rank Tensor Recovery with Sparse Corruption
Authors:
HanQin Cai,
Zehan Chao,
Longxiu Huang,
Deanna Needell
Abstract:
We study the tensor robust principal component analysis (TRPCA) problem, a tensorial extension of matrix robust principal component analysis (RPCA), that aims to split the given tensor into an underlying low-rank component and a sparse outlier component. This work proposes a fast algorithm, called Robust Tensor CUR Decompositions (RTCUR), for large-scale non-convex TRPCA problems under the Tucker rank setting. RTCUR is developed within a framework of alternating projections that projects between the set of low-rank tensors and the set of sparse tensors. We utilize the recently developed tensor CUR decomposition to substantially reduce the computational complexity in each projection. In addition, we develop four variants of RTCUR for different application settings. We demonstrate the effectiveness and computational advantages of RTCUR against state-of-the-art methods on both synthetic and real-world datasets.
Submitted 10 October, 2023; v1 submitted 6 May, 2023;
originally announced May 2023.
-
Lipschitz optimal transport metric for a wave system modeling nematic liquid crystals
Authors:
Hong Cai,
Geng Chen,
Yannan Shen
Abstract:
In this paper, we study the Lipschitz continuous dependence of conservative Hölder continuous weak solutions to a variational wave system derived from a model for nematic liquid crystals. Since the solution of this system generally forms a finite time cusp singularity, the solution flow is not Lipschitz continuous under the Sobolev metric used in the existence and uniqueness theory. We establish a Finsler type optimal transport metric, and show the Lipschitz continuous dependence of the solution on the initial data under this metric. This kind of Finsler type optimal transport metric was first established in [A. Bressan and G. Chen, Arch. Ration. Mech. Anal. 226(3) (2017), 1303-1343] for the scalar variational wave equation. This equation can be used to describe the unit direction n of mean orientation of nematic liquid crystals, when n is restricted on a circle. The model considered in this paper describes the propagation of n without this restriction, i.e. n takes any value on the unit sphere. So we need to consider a wave system instead of a scalar equation.
Submitted 23 April, 2023;
originally announced April 2023.
-
Non-convex approaches for low-rank tensor completion under tubal sampling
Authors:
Zheng Tan,
Longxiu Huang,
HanQin Cai,
Yifei Lou
Abstract:
Tensor completion is an important problem in modern data analysis. In this work, we investigate a specific sampling strategy, referred to as tubal sampling. We propose two novel non-convex tensor completion frameworks that are easy to implement, named tensor $L_1$-$L_2$ (TL12) and tensor completion via CUR (TCCUR). We test the efficiency of both methods on synthetic data and a color image inpainting problem. Empirical results reveal a trade-off between the accuracy and time efficiency of these two methods in a low sampling ratio. Each of them outperforms some classical completion methods in at least one aspect.
Submitted 17 March, 2023;
originally announced March 2023.
-
Heterogeneous Synthetic Learner for Panel Data
Authors:
Ye Shen,
Runzhe Wan,
Hengrui Cai,
Rui Song
Abstract:
In the new era of personalization, learning the heterogeneous treatment effect (HTE) becomes an inevitable trend with numerous applications. Yet, most existing HTE estimation methods focus on independently and identically distributed observations and cannot handle the non-stationarity and temporal dependency in the common panel data setting. The treatment evaluators developed for panel data, on the other hand, typically ignore the individualized information. To fill the gap, in this paper, we initiate the study of HTE estimation in panel data. Under different assumptions for HTE identifiability, we propose the corresponding heterogeneous one-side and two-side synthetic learners, namely H1SL and H2SL, by leveraging the state-of-the-art HTE estimator for non-panel data and generalizing the synthetic control method that allows a flexible data generating process. We establish the convergence rates of the proposed estimators. The superior performance of the proposed methods over existing ones is demonstrated by extensive numerical studies.
Submitted 29 January, 2023; v1 submitted 30 December, 2022;
originally announced December 2022.
-
Perfectoid signature, perfectoid Hilbert-Kunz multiplicity, and an application to local fundamental groups
Authors:
Hanlin Cai,
Seungsu Lee,
Linquan Ma,
Karl Schwede,
Kevin Tucker
Abstract:
We define a (perfectoid) mixed characteristic version of $F$-signature and Hilbert-Kunz multiplicity by utilizing the perfectoidization functor of Bhatt-Scholze and Faltings' normalized length (also developed in the work of Gabber-Ramero). We show that these definitions coincide with the classical theory in equal characteristic $p > 0$. We prove that a ring is regular if and only if either its perfectoid signature or perfectoid Hilbert-Kunz multiplicity is 1 and we show that perfectoid Hilbert-Kunz multiplicity characterizes BCM closure and extended plus closure of $\mathfrak{m}$-primary ideals. We demonstrate that perfectoid signature detects BCM-regularity and transforms similarly to $F$-signature or normalized volume under quasi-étale maps. As a consequence, we prove that BCM-regular rings have finite local étale fundamental group and also finite torsion part of their divisor class groups. Finally, we also define a mixed characteristic version of relative rational signature, and show it characterizes BCM-rational singularities.
Submitted 4 February, 2025; v1 submitted 8 September, 2022;
originally announced September 2022.
-
Matrix Completion with Cross-Concentrated Sampling: Bridging Uniform Sampling and CUR Sampling
Authors:
HanQin Cai,
Longxiu Huang,
Pengyu Li,
Deanna Needell
Abstract:
While uniform sampling has been widely studied in the matrix completion literature, CUR sampling approximates a low-rank matrix via row and column samples. Unfortunately, both sampling models lack flexibility for various circumstances in real-world applications. In this work, we propose a novel and easy-to-implement sampling strategy, coined Cross-Concentrated Sampling (CCS). By bridging uniform sampling and CUR sampling, CCS provides extra flexibility that can potentially save sampling costs in applications. In addition, we also provide a sufficient condition for CCS-based matrix completion. Moreover, we propose a highly efficient non-convex algorithm, termed Iterative CUR Completion (ICURC), for the proposed CCS model. Numerical experiments verify the empirical advantages of CCS and ICURC against uniform sampling and its baseline algorithms, on both synthetic and real-world datasets.
Submitted 21 March, 2023; v1 submitted 20 August, 2022;
originally announced August 2022.
-
Riemannian CUR Decompositions for Robust Principal Component Analysis
Authors:
Keaton Hamm,
Mohamed Meskini,
HanQin Cai
Abstract:
Robust Principal Component Analysis (PCA) has received massive attention in recent years. It aims to recover a low-rank matrix and a sparse matrix from their sum. This paper proposes a novel nonconvex Robust PCA algorithm, coined Riemannian CUR (RieCUR), which utilizes the ideas of Riemannian optimization and robust CUR decompositions. This algorithm has the same computational complexity as Iterated Robust CUR, which is currently state-of-the-art, but is more robust to outliers. RieCUR is also able to tolerate a significant amount of outliers, and is comparable to Accelerated Alternating Projections, which has high outlier tolerance but worse computational complexity than the proposed method. Thus, the proposed algorithm achieves state-of-the-art performance on Robust PCA both in terms of computational complexity and outlier tolerance.
Submitted 17 June, 2022;
originally announced June 2022.
-
Structured Gradient Descent for Fast Robust Low-Rank Hankel Matrix Completion
Authors:
HanQin Cai,
Jian-Feng Cai,
Juntao You
Abstract:
We study the robust matrix completion problem for the low-rank Hankel matrix, which detects the sparse corruptions caused by extreme outliers while we try to recover the original Hankel matrix from the partial observation. In this paper, we explore the convenient Hankel structure and propose a novel non-convex algorithm, coined Hankel Structured Gradient Descent (HSGD), for large-scale robust Hankel matrix completion problems. HSGD is highly computing- and sample-efficient compared to state-of-the-art methods. The recovery guarantee with a linear convergence rate has been established for HSGD under some mild assumptions. The empirical advantages of HSGD are verified on both synthetic datasets and real-world nuclear magnetic resonance signals.
Submitted 19 March, 2023; v1 submitted 7 April, 2022;
originally announced April 2022.
-
Jump Interval-Learning for Individualized Decision Making
Authors:
Hengrui Cai,
Chengchun Shi,
Rui Song,
Wenbin Lu
Abstract:
An individualized decision rule (IDR) is a decision function that assigns each individual a given treatment based on his/her observed characteristics. Most of the existing works in the literature consider settings with binary or finitely many treatment options. In this paper, we focus on the continuous treatment setting and propose a jump interval-learning to develop an individualized interval-valued decision rule (I2DR) that maximizes the expected outcome. Unlike IDRs that recommend a single treatment, the proposed I2DR yields an interval of treatment options for each individual, making it more flexible to implement in practice. To derive an optimal I2DR, our jump interval-learning method estimates the conditional mean of the outcome given the treatment and the covariates via jump penalized regression, and derives the corresponding optimal I2DR based on the estimated outcome regression function. The regressor is allowed to be either linear for clear interpretation or deep neural network to model complex treatment-covariates interactions. To implement jump interval-learning, we develop a searching algorithm based on dynamic programming that efficiently computes the outcome regression function. Statistical properties of the resulting I2DR are established when the outcome regression function is either a piecewise or continuous function over the treatment space. We further develop a procedure to infer the mean outcome under the (estimated) optimal policy. Extensive simulations and a real data application to a warfarin study are conducted to demonstrate the empirical validity of the proposed I2DR.
Submitted 28 January, 2023; v1 submitted 16 November, 2021;
originally announced November 2021.
-
Doubly Robust Interval Estimation for Optimal Policy Evaluation in Online Learning
Authors:
Ye Shen,
Hengrui Cai,
Rui Song
Abstract:
Evaluating the performance of an ongoing policy plays a vital role in many areas such as medicine and economics, to provide crucial instructions on the early-stop of the online experiment and timely feedback from the environment. Policy evaluation in online learning thus attracts increasing attention by inferring the mean outcome of the optimal policy (i.e., the value) in real-time. Yet, such a problem is particularly challenging due to the dependent data generated in the online environment, the unknown optimal policy, and the complex exploration and exploitation trade-off in the adaptive experiment. In this paper, we aim to overcome these difficulties in policy evaluation for online learning. We explicitly derive the probability of exploration that quantifies the probability of exploring non-optimal actions under commonly used bandit algorithms. We use this probability to conduct valid inference on the online conditional mean estimator under each action and develop the doubly robust interval estimation (DREAM) method to infer the value under the estimated optimal policy in online learning. The proposed value estimator provides double protection for consistency and is asymptotically normal with a Wald-type confidence interval provided. Extensive simulation studies and real data applications are conducted to demonstrate the empirical validity of the proposed DREAM method.
Submitted 2 August, 2024; v1 submitted 28 October, 2021;
originally announced October 2021.
-
Learned Robust PCA: A Scalable Deep Unfolding Approach for High-Dimensional Outlier Detection
Authors:
HanQin Cai,
Jialin Liu,
Wotao Yin
Abstract:
Robust principal component analysis (RPCA) is a critical tool in modern machine learning, which detects outliers in the task of low-rank matrix reconstruction. In this paper, we propose a scalable and learnable non-convex approach for high-dimensional RPCA problems, which we call Learned Robust PCA (LRPCA). LRPCA is highly efficient, and its free parameters can be effectively learned to optimize via deep unfolding. Moreover, we extend deep unfolding from finite iterations to infinite iterations via a novel feedforward-recurrent-mixed neural network model. We establish the recovery guarantee of LRPCA under mild assumptions for RPCA. Numerical experiments show that LRPCA outperforms the state-of-the-art RPCA algorithms, such as ScaledGD and AltProj, on both synthetic datasets and real-world applications.
Submitted 11 October, 2021;
originally announced October 2021.
-
Curvature-Aware Derivative-Free Optimization
Authors:
Bumsu Kim,
HanQin Cai,
Daniel McKenzie,
Wotao Yin
Abstract:
The paper discusses derivative-free optimization (DFO), which involves minimizing a function without access to gradients or directional derivatives, only function evaluations. Classical DFO methods, which mimic gradient-based methods, such as Nelder-Mead and direct search, have limited scalability for high-dimensional problems. Zeroth-order methods have been gaining popularity due to the demands of large-scale machine learning applications, and the paper focuses on the selection of the step size $α_k$ in these methods. The proposed approach, called Curvature-Aware Random Search (CARS), uses first- and second-order finite difference approximations to compute a candidate $α_{+}$. We prove that for strongly convex objective functions, CARS converges linearly provided that the search direction is drawn from a distribution satisfying very mild conditions. We also present a Cubic Regularized variant of CARS, named CARS-CR, which converges at a rate of $\mathcal{O}(k^{-1})$ without the assumption of strong convexity. Numerical experiments show that CARS and CARS-CR match or exceed the state of the art on benchmark problem sets.
Submitted 12 April, 2023; v1 submitted 27 September, 2021;
originally announced September 2021.
-
Fast Robust Tensor Principal Component Analysis via Fiber CUR Decomposition
Authors:
HanQin Cai,
Zehan Chao,
Longxiu Huang,
Deanna Needell
Abstract:
We study the problem of tensor robust principal component analysis (TRPCA), which aims to separate an underlying low-multilinear-rank tensor and a sparse outlier tensor from their sum. In this work, we propose a fast non-convex algorithm, coined Robust Tensor CUR (RTCUR), for large-scale TRPCA problems. RTCUR considers a framework of alternating projections and utilizes the recently developed tensor Fiber CUR decomposition to dramatically lower the computational complexity. The performance advantage of RTCUR is empirically verified against state-of-the-art methods on synthetic datasets and is further demonstrated on real-world applications such as color video background subtraction.
Submitted 23 August, 2021;
originally announced August 2021.
-
Estimation of high-dimensional change-points under a group sparsity structure
Authors:
Hanqing Cai,
Tengyao Wang
Abstract:
Change-points are a routine feature of 'big data' observed in the form of high-dimensional data streams. In many such data streams, the component series possess group structures and it is natural to assume that changes only occur in a small number of all groups. We propose a new change-point procedure, called 'groupInspect', that exploits the group sparsity structure to estimate a projection direction so as to aggregate information across the component series to successfully estimate the change-point in the mean structure of the series. We prove that the estimated projection direction is minimax optimal, up to logarithmic factors, when all group sizes are of comparable order. Moreover, our theory provides strong guarantees on the rate of convergence of the change-point location estimator. Numerical studies demonstrate the competitive performance of groupInspect in a wide range of settings and a real data example confirms the practical usefulness of our procedure.
Submitted 19 July, 2021;
originally announced July 2021.
-
GEAR: On Optimal Decision Making with Auxiliary Data
Authors:
Hengrui Cai,
Rui Song,
Wenbin Lu
Abstract:
Personalized optimal decision making, finding the optimal decision rule (ODR) based on individual characteristics, has attracted increasing attention recently in many fields, such as education, economics, and medicine. Current ODR methods usually require the primary outcome of interest in samples for assessing treatment effects, namely the experimental sample. However, in many studies, treatments may have a long-term effect, and as such the primary outcome of interest cannot be observed in the experimental sample due to the limited duration of experiments, which makes the estimation of ODR impossible. This paper is inspired to address this challenge by making use of an auxiliary sample to facilitate the estimation of ODR in the experimental sample. We propose an auGmented inverse propensity weighted Experimental and Auxiliary sample-based decision Rule (GEAR) by maximizing the augmented inverse propensity weighted value estimator over a class of decision rules using the experimental sample, with the primary outcome being imputed based on the auxiliary sample. The asymptotic properties of the proposed GEAR estimators and their associated value estimators are established. Simulation studies are conducted to demonstrate its empirical validity with a real AIDS application.
Submitted 21 April, 2021;
originally announced April 2021.
-
Calibrated Optimal Decision Making with Multiple Data Sources and Limited Outcome
Authors:
Hengrui Cai,
Wenbin Lu,
Rui Song
Abstract:
We consider the optimal decision-making problem in a primary sample of interest with multiple auxiliary sources available. The outcome of interest is limited in the sense that it is only observed in the primary sample. In reality, such multiple data sources may belong to heterogeneous studies and thus cannot be combined directly. This paper proposes a new framework to handle heterogeneous samples and address the limited outcome simultaneously through a novel calibrated optimal decision-making method, by leveraging the common intermediate outcomes in multiple data sources. Specifically, our method allows the baseline covariates across different samples to have either homogeneous or heterogeneous distributions. Under the equal conditional means of intermediate outcomes in different samples given baseline covariates and the treatment information, we show that the proposed estimator of the conditional mean outcome is asymptotically normal and more efficient than using the primary sample solely. Extensive experiments on simulated datasets demonstrate empirical validity and improved efficiency using our approach, followed by a real application to electronic health records.
Submitted 21 September, 2022; v1 submitted 21 April, 2021;
originally announced April 2021.
-
Mode-wise Tensor Decompositions: Multi-dimensional Generalizations of CUR Decompositions
Authors:
HanQin Cai,
Keaton Hamm,
Longxiu Huang,
Deanna Needell
Abstract:
Low rank tensor approximation is a fundamental tool in modern machine learning and data science. In this paper, we study the characterization, perturbation analysis, and an efficient sampling strategy for two primary tensor CUR approximations, namely Chidori and Fiber CUR. We characterize exact tensor CUR decompositions for low multilinear rank tensors. We also present theoretical error bounds of the tensor CUR approximations when (adversarial or Gaussian) noise appears. Moreover, we show that low cost uniform sampling is sufficient for tensor CUR approximations if the tensor has an incoherent structure. Empirical performance evaluations, with both synthetic and real-world datasets, establish the speed advantage of the tensor CUR approximations over other state-of-the-art low multilinear rank tensor approximations.
Submitted 25 June, 2021; v1 submitted 19 March, 2021;
originally announced March 2021.
-
A Zeroth-Order Block Coordinate Descent Algorithm for Huge-Scale Black-Box Optimization
Authors:
HanQin Cai,
Yuchen Lou,
Daniel McKenzie,
Wotao Yin
Abstract:
We consider the zeroth-order optimization problem in the huge-scale setting, where the dimension of the problem is so large that performing even basic vector operations on the decision variables is infeasible. In this paper, we propose a novel algorithm, coined ZO-BCD, that exhibits favorable overall query complexity and has a much smaller per-iteration computational complexity. In addition, we discuss how the memory footprint of ZO-BCD can be reduced even further by the clever use of circulant measurement matrices. As an application of our new method, we propose the idea of crafting adversarial attacks on neural network based classifiers in a wavelet domain, which can result in problem dimensions of over 1.7 million. In particular, we show that crafting adversarial examples to audio classifiers in a wavelet domain can achieve the state-of-the-art attack success rate of 97.9%.
Submitted 11 June, 2021; v1 submitted 21 February, 2021;
originally announced February 2021.
-
Rapid Robust Principal Component Analysis: CUR Accelerated Inexact Low Rank Estimation
Authors:
HanQin Cai,
Keaton Hamm,
Longxiu Huang,
Jiaqi Li,
Tao Wang
Abstract:
Robust principal component analysis (RPCA) is a widely used tool for dimension reduction. In this work, we propose a novel non-convex algorithm, coined Iterated Robust CUR (IRCUR), for solving RPCA problems, which dramatically improves the computational efficiency in comparison with the existing algorithms. IRCUR achieves this acceleration by employing CUR decomposition when updating the low rank component, which allows us to obtain an accurate low rank approximation via only three small submatrices. Consequently, IRCUR is able to process only the small submatrices and avoid expensive computing on the full matrix throughout the entire algorithm. Numerical experiments establish the computational advantage of IRCUR over state-of-the-art algorithms on both synthetic and real-world datasets.
Submitted 7 February, 2021; v1 submitted 14 October, 2020;
originally announced October 2020.
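The IRCUR abstract above relies on the fact that a low-rank matrix can be rebuilt from a few of its columns, a few of its rows, and their intersection block. The following is a minimal sketch of that plain CUR reconstruction only, not of the IRCUR algorithm itself; the matrix sizes, the uniform sampling, the oversampling factor, and the `rcond` threshold are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build an exactly rank-r test matrix (sizes are arbitrary, for illustration).
m, n, r = 60, 50, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Sample row and column indices uniformly at random, oversampled to 3r each.
I = rng.choice(m, size=3 * r, replace=False)
J = rng.choice(n, size=3 * r, replace=False)

C = A[:, J]          # selected columns, shape (m, 3r)
R = A[I, :]          # selected rows, shape (3r, n)
U = A[np.ix_(I, J)]  # intersection block, shape (3r, 3r)

# CUR reconstruction A ~ C @ pinv(U) @ R. The rcond cutoff discards the
# numerically zero singular values of the rank-deficient block U.
A_cur = C @ np.linalg.pinv(U, rcond=1e-10) @ R

rel_err = np.linalg.norm(A - A_cur) / np.linalg.norm(A)
print(f"relative reconstruction error: {rel_err:.2e}")
```

When the sampled rows and columns both span the rank-r row and column spaces, the reconstruction is exact up to rounding, which is why only the three small blocks need to be touched.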
-
A One-bit, Comparison-Based Gradient Estimator
Authors:
HanQin Cai,
Daniel Mckenzie,
Wotao Yin,
Zhenliang Zhang
Abstract:
We study zeroth-order optimization for convex functions where we further assume that function evaluations are unavailable. Instead, one only has access to a $\textit{comparison oracle}$, which given two points $x$ and $y$ returns a single bit of information indicating which point has larger function value, $f(x)$ or $f(y)$. By treating the gradient as an unknown signal to be recovered, we show how one can use tools from one-bit compressed sensing to construct a robust and reliable estimator of the normalized gradient. We then propose an algorithm, coined SCOBO, that uses this estimator within a gradient descent scheme. We show that when $f(x)$ has some low dimensional structure that can be exploited, SCOBO outperforms the state-of-the-art in terms of query complexity. Our theoretical claims are verified by extensive numerical experiments.
Submitted 23 April, 2022; v1 submitted 6 October, 2020;
originally announced October 2020.
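To make the comparison-oracle setting in the abstract above concrete, here is a small sketch that recovers the normalized gradient direction from one-bit answers alone. It uses a simple averaging estimator from the one-bit compressed sensing literature and is not the SCOBO algorithm; the function names, the toy quadratic objective, the query count, and the probe radius are all hypothetical choices.

```python
import numpy as np

def comparison_oracle(f, x, y):
    """Return +1 if f(y) > f(x), else -1. Only this single bit is exposed."""
    return 1.0 if f(y) > f(x) else -1.0

def estimate_gradient_direction(f, x, num_queries, radius, rng):
    """Estimate grad f(x) / ||grad f(x)|| using comparison bits only."""
    # Random Gaussian probe directions, one per oracle query.
    Z = rng.standard_normal((num_queries, x.size))
    # For small radius, each bit approximates sign(<grad f(x), z>).
    bits = np.array([comparison_oracle(f, x, x + radius * z) for z in Z])
    # Averaging the sign-weighted probes points along the gradient on average.
    g = bits @ Z / num_queries
    return g / np.linalg.norm(g)

rng = np.random.default_rng(0)
f = lambda v: np.sum((v - 1.0) ** 2)  # toy convex objective (an assumption)
x0 = np.zeros(10)

g_hat = estimate_gradient_direction(f, x0, num_queries=2000, radius=1e-3, rng=rng)

# The true gradient of the toy objective is known in closed form, for checking.
true_dir = 2.0 * (x0 - 1.0)
true_dir /= np.linalg.norm(true_dir)
cosine = float(g_hat @ true_dir)
print(f"cosine similarity with the true gradient direction: {cosine:.3f}")
```

The estimate could then be plugged into a gradient-descent step with a chosen step size; exploiting sparsity or other low-dimensional structure, as the abstract describes, would require a different recovery step than plain averaging.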
-
Uniqueness of Dissipative Solution for Camassa-Holm Equation with Peakon-Antipeakon Initial Data
Authors:
Hong Cai,
Geng Chen,
Hongwei Mei
Abstract:
We give a proof for the uniqueness of the dissipative solution for the Camassa-Holm equation with some peakon-antipeakon initial data following Dafermos' earlier result in [5] on the Hunter-Saxton equation. Our result shows that two existing global existence frameworks, through the vanishing viscosity method by Xin-Zhang in [11] and the transformation of coordinate method for dissipative solutions by Bressan-Constantin in [3], give the same solution, for special but typical initial data forming finite time gradient blowup.
△ Less
Submitted 21 October, 2020; v1 submitted 16 August, 2020;
originally announced August 2020.
-
A Finsler type Lipschitz optimal transport metric for a quasilinear wave equation
Authors:
Hong Cai,
Geng Chen,
Yannan Shen
Abstract:
We consider the global well-posedness of weak energy conservative solution to a general quasilinear wave equation through variational principle, where the solution may form finite time cusp singularity, when energy concentrates. As a main result in this paper, we construct a Finsler type optimal transport metric, then prove that the solution flow is Lipschitz under this metric. We also prove a generic regularity result by applying Thom's transversality theorem, then find piecewise smooth transportation paths among a dense set of solutions. The results in this paper are for large data solutions, without restriction on the size of solutions.
Submitted 14 August, 2020; v1 submitted 29 July, 2020;
originally announced July 2020.
-
Uniqueness of conservative solutions to a one-dimensional general quasilinear wave equation through variational principle
Authors:
Hong Cai,
Geng Chen,
Yi Du,
Yannan Shen
Abstract:
In this paper, we prove the uniqueness of energy conservative Hölder continuous weak solution to a general quasilinear wave equation by the analysis of characteristics. This result has no restriction on the size of solutions, i.e. it is a large data result.
Submitted 14 August, 2020; v1 submitted 29 July, 2020;
originally announced July 2020.
-
Zeroth-Order Regularized Optimization (ZORO): Approximately Sparse Gradients and Adaptive Sampling
Authors:
HanQin Cai,
Daniel Mckenzie,
Wotao Yin,
Zhenliang Zhang
Abstract:
We consider the problem of minimizing a high-dimensional objective function, which may include a regularization term, using (possibly noisy) evaluations of the function. Such optimization is also called derivative-free, zeroth-order, or black-box optimization. We propose a new $\textbf{Z}$eroth-$\textbf{O}$rder $\textbf{R}$egularized $\textbf{O}$ptimization method, dubbed ZORO. When the underlying gradient is approximately sparse at an iterate, ZORO needs very few objective function evaluations to obtain a new iterate that decreases the objective function. We achieve this with an adaptive, randomized gradient estimator, followed by an inexact proximal-gradient scheme. Under a novel approximately sparse gradient assumption and various different convex settings, we show the (theoretical and empirical) convergence rate of ZORO is only logarithmically dependent on the problem dimension. Numerical experiments show that ZORO outperforms the existing methods with similar assumptions, on both synthetic and real datasets.
Submitted 30 November, 2021; v1 submitted 29 March, 2020;
originally announced March 2020.
-
Singularity formation for radially symmetric expanding wave of Compressible Euler Equations
Authors:
Hong Cai,
Geng Chen,
Tian-Yi Wang
Abstract:
In this paper, for compressible Euler equations in multiple space dimensions, we prove the break-down of classical solutions with a large class of initial data by tracking the propagation of radially symmetric expanding wave including compression. The singularity formation corresponds to the finite time shock formation. We also provide some new global sup-norm estimates on velocity and density functions for classical solutions. The results in this paper have no restriction on the size of solutions, hence are large data results.
Submitted 18 January, 2020;
originally announced January 2020.
-
Accelerated Structured Alternating Projections for Robust Spectrally Sparse Signal Recovery
Authors:
HanQin Cai,
Jian-Feng Cai,
Tianming Wang,
Guojian Yin
Abstract:
Consider a spectrally sparse signal $\boldsymbol{x}$ that consists of $r$ complex sinusoids with or without damping. We study the robust recovery problem for the spectrally sparse signal under the fully observed setting, which is about recovering $\boldsymbol{x}$ and a sparse corruption vector $\boldsymbol{s}$ from their sum $\boldsymbol{z}=\boldsymbol{x}+\boldsymbol{s}$. In this paper, we exploit the low-rank property of the Hankel matrix formed by $\boldsymbol{x}$, and formulate the problem as the robust recovery of a corrupted low-rank Hankel matrix. We develop a highly efficient non-convex algorithm, coined Accelerated Structured Alternating Projections (ASAP). The high computational efficiency and low space complexity of ASAP are achieved by fast computations involving structured matrices, and a subspace projection method for accelerated low-rank approximation. Theoretical recovery guarantee with a linear convergence rate has been established for ASAP, under some mild assumptions on $\boldsymbol{x}$ and $\boldsymbol{s}$. Empirical performance comparisons on both synthetic and real-world data confirm the advantages of ASAP, in terms of computational efficiency and robustness aspects.
Submitted 16 January, 2021; v1 submitted 13 October, 2019;
originally announced October 2019.
-
Two-Sample Test Based on Classification Probability
Authors:
Haiyan Cai,
Bryan Goggin,
Qingtang Jiang
Abstract:
Robust classification algorithms have been developed in recent years with great success. We take advantage of this development and recast the classical two-sample test problem in the framework of classification. Based on the estimates of classification probabilities from a classifier trained from the samples, a test statistic is proposed. We explain why such a test can be a powerful test and compare its performance in terms of the power and efficiency with those of some other recently proposed tests with simulation and real-life data. The test proposed is nonparametric and can be applied to complex and high dimensional data wherever there is a classifier that provides consistent estimate of the classification probability for such data.
Submitted 17 September, 2019;
originally announced September 2019.
-
Fisher-KPP dynamics in diffusive Rosenzweig-MacArthur and Holling-Tanner models
Authors:
Hong Cai,
Anna Ghazaryan,
Vahagn Manukian
Abstract:
We prove the existence of traveling fronts in diffusive Rosenzweig-MacArthur and Holling-Tanner population models and investigate their relation with fronts in a scalar Fisher-KPP equation. More precisely, we prove the existence of fronts in a Rosenzweig-MacArthur predator-prey model in two situations: when the prey diffuses at a rate much smaller than that of the predator and when both the predator and the prey diffuse very slowly. Both situations are captured as singular perturbations of the associated limiting systems. In the first situation we demonstrate clear relations of the fronts with the fronts in a scalar Fisher-KPP equation. Indeed, we show that the underlying dynamical system in a singular limit is reduced to a scalar Fisher-KPP equation and the fronts supported by the full system are small perturbations of the Fisher-KPP fronts. We obtain a similar result for a diffusive Holling-Tanner population model. In the second situation for the Rosenzweig-MacArthur model we prove the existence of the fronts but without observing a direct relation with Fisher-KPP equation. The analysis suggests that, in a variety of reaction-diffusion systems that arise in population modeling, parameter regimes may be found when the dynamics of the system is inherited from the scalar Fisher-KPP equation.
Submitted 27 May, 2019;
originally announced May 2019.
-
Adaptive Synchrosqueezing Transform with a Time-Varying Parameter for Non-stationary Signal Separation
Authors:
Lin Li,
Haiyan Cai,
Qingtang Jiang
Abstract:
The continuous wavelet transform (CWT) is a linear time-frequency representation and a powerful tool for analyzing non-stationary signals. The synchrosqueezing transform (SST) is a special type of the reassignment method which not only enhances the energy concentration of CWT in the time-frequency plane, but also separates the components of multicomponent signals. The "bump wavelet" and Morlet's wavelet are commonly used continuous wavelets for the wavelet-based SST. There is a parameter in these wavelets which controls the widths of the time-frequency localization window. In most literature on SST, this parameter is a fixed positive constant. In this paper, we consider the CWT with a time-varying parameter (called the adaptive CWT) and the corresponding SST (called the adaptive SST) for instantaneous frequency estimation and multicomponent signal separation. We also introduce the 2nd-order adaptive SST. We analyze the separation conditions for non-stationary multicomponent signals with the local approximation of linear frequency modulation mode. We derive well-separated conditions of a multicomponent signal based on the adaptive CWT. We propose methods to select the time-varying parameter so that the corresponding adaptive SSTs of the components of a multicomponent signal have sharp representations and are well-separated, and hence the components can be recovered more accurately. We provide comparison experimental results to demonstrate the efficiency and robustness of the proposed adaptive CWT and adaptive SST in separating components of multicomponent signals with fast varying frequencies.
Submitted 26 September, 2019; v1 submitted 29 December, 2018;
originally announced December 2018.
-
Analysis of Adaptive Short-time Fourier Transform-based Synchrosqueezing Transform
Authors:
Haiyan Cai,
Qingtang Jiang,
Lin Li,
Bruce W. Suter
Abstract:
Recently the study of modeling a non-stationary signal as a superposition of amplitude and frequency-modulated Fourier-like oscillatory modes has been a very active research area. The synchrosqueezing transform (SST) is a powerful method for instantaneous frequency estimation and component separation of non-stationary multicomponent signals. The short-time Fourier transform-based SST (FSST for short) reassigns the frequency variable to sharpen the time-frequency representation and to separate the components of a multicomponent non-stationary signal. Very recently the FSST with a time-varying parameter, called the adaptive FSST, was introduced. The simulation experiments show that the adaptive FSST is very promising in instantaneous frequency estimation of the component of a multicomponent signal, and in accurate component recovery. However the theoretical analysis of the adaptive FSST has not been carried out. In this paper, we study the theoretical analysis of the adaptive FSST and obtain the error bounds for the instantaneous frequency estimation and component recovery with the adaptive FSST and the 2nd-order adaptive FSST.
Submitted 28 December, 2018;
originally announced December 2018.
-
Accelerated Alternating Projections for Robust Principal Component Analysis
Authors:
HanQin Cai,
Jian-Feng Cai,
Ke Wei
Abstract:
We study robust PCA for the fully observed setting, which is about separating a low rank matrix $\boldsymbol{L}$ and a sparse matrix $\boldsymbol{S}$ from their sum $\boldsymbol{D}=\boldsymbol{L}+\boldsymbol{S}$. In this paper, a new algorithm, dubbed accelerated alternating projections, is introduced for robust PCA which significantly improves the computational efficiency of the existing alternating projections proposed in [Netrapalli, Praneeth, et al., 2014] when updating the low rank factor. The acceleration is achieved by first projecting a matrix onto some low dimensional subspace before obtaining a new estimate of the low rank matrix via truncated SVD. Exact recovery guarantee has been established which shows linear convergence of the proposed algorithm. Empirical performance evaluations establish the advantage of our algorithm over other state-of-the-art algorithms for robust PCA.
Submitted 10 February, 2019; v1 submitted 15 November, 2017;
originally announced November 2017.