Showing 1–36 of 36 results for author: Cao, L

Searching in archive stat.
  1. arXiv:2506.04616  [pdf]

    cs.CL stat.AP stat.ML

    Subjective Perspectives within Learned Representations Predict High-Impact Innovation

    Authors: Likun Cao, Rui Pan, James Evans

    Abstract: Existing studies of innovation emphasize the power of social structures to shape innovation capacity. Emerging machine learning approaches, however, enable us to model innovators' personal perspectives and interpersonal innovation opportunities as a function of their prior trajectories of experience. We theorize then quantify subjective perspectives and innovation opportunities based on innovator…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 107 pages, 20 figures

  2. arXiv:2505.06699  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Model Steering: Learning with a Reference Model Improves Generalization Bounds and Scaling Laws

    Authors: Xiyuan Wei, Ming Lin, Fanjiang Ye, Fengguang Song, Liangliang Cao, My T. Thai, Tianbao Yang

    Abstract: This paper formalizes an emerging learning paradigm that uses a trained model as a reference to guide and enhance the training of a target model through strategic data selection or weighting, named $\textbf{model steering}$. While ad-hoc methods have been used in various contexts, including the training of large foundation models, its underlying principles remain insufficiently understood, leading…

    Submitted 16 May, 2025; v1 submitted 10 May, 2025; originally announced May 2025.

    Comments: 18 pages, 6 figures

  3. arXiv:2411.12726  [pdf, other]

    math.NA cs.LG stat.CO stat.ML

    LazyDINO: Fast, scalable, and efficiently amortized Bayesian inversion via structure-exploiting and surrogate-driven measure transport

    Authors: Lianghao Cao, Joshua Chen, Michael Brennan, Thomas O'Leary-Roseberry, Youssef Marzouk, Omar Ghattas

    Abstract: We present LazyDINO, a transport map variational inference method for fast, scalable, and efficiently amortized solutions of high-dimensional nonlinear Bayesian inverse problems with expensive parameter-to-observable (PtO) maps. Our method consists of an offline phase in which we construct a derivative-informed neural surrogate of the PtO map using joint samples of the PtO map and its Jacobian. Du…

    Submitted 19 November, 2024; originally announced November 2024.

  4. arXiv:2410.05419  [pdf, ps, other]

    cs.LG cs.AI stat.ME

    Refining Counterfactual Explanations With Joint-Distribution-Informed Shapley Towards Actionable Minimality

    Authors: Lei You, Yijun Bian, Lele Cao

    Abstract: Counterfactual explanations (CE) identify data points that closely resemble the observed data but produce different machine learning (ML) model outputs, offering critical insights into model decisions. Despite the diverse scenarios, goals and tasks to which they are tailored, existing CE methods often lack actionable efficiency because of unnecessary feature changes included within the explanation…

    Submitted 7 October, 2024; originally announced October 2024.

  5. arXiv:2403.08220  [pdf, other]

    math.NA cs.LG stat.CO stat.ML

    Derivative-informed neural operator acceleration of geometric MCMC for infinite-dimensional Bayesian inverse problems

    Authors: Lianghao Cao, Thomas O'Leary-Roseberry, Omar Ghattas

    Abstract: We propose an operator learning approach to accelerate geometric Markov chain Monte Carlo (MCMC) for solving infinite-dimensional Bayesian inverse problems (BIPs). While geometric MCMC employs high-quality proposals that adapt to posterior local geometry, it requires repeated computations of gradients and Hessians of the log-likelihood, which becomes prohibitive when the parameter-to-observable (P…

    Submitted 20 May, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Updated manuscript: changed title, changed format, typo correction, and minor terminology changes

  6. arXiv:2401.13112  [pdf, other]

    cs.AI stat.ML

    Distributional Counterfactual Explanations With Optimal Transport

    Authors: Lei You, Lele Cao, Mattias Nilsson, Bo Zhao, Lei Lei

    Abstract: Counterfactual explanations (CE) are the de facto method for providing insights into black-box decision-making models by identifying alternative inputs that lead to different outcomes. However, existing CE approaches, including group and global methods, focus predominantly on specific input modifications, lacking the ability to capture nuanced distributional characteristics that influence model ou…

    Submitted 12 March, 2025; v1 submitted 23 January, 2024; originally announced January 2024.

  7. arXiv:2401.03341  [pdf, other]

    cs.LG stat.ML

    Weakly Augmented Variational Autoencoder in Time Series Anomaly Detection

    Authors: Zhangkai Wu, Longbing Cao, Qi Zhang, Junxian Zhou, Hui Chen

    Abstract: Due to their unsupervised training and uncertainty estimation, deep Variational Autoencoders (VAEs) have become powerful tools for reconstruction-based Time Series Anomaly Detection (TSAD). Existing VAE-based TSAD methods, either statistical or deep, tune meta-priors to estimate the likelihood probability for effectively capturing spatiotemporal dependencies in the data. However, these methods con…

    Submitted 6 January, 2024; originally announced January 2024.

  8. arXiv:2309.13303  [pdf, other]

    cs.LG cs.CV stat.ML

    C$^2$VAE: Gaussian Copula-based VAE Differing Disentangled from Coupled Representations with Contrastive Posterior

    Authors: Zhangkai Wu, Longbing Cao

    Abstract: We present a self-supervised variational autoencoder (VAE) to jointly learn disentangled and dependent hidden factors and then enhance disentangled representation learning by a self-supervised classifier to eliminate coupled representations in a contrastive manner. To this end, a Contrastive Copula VAE (C$^2$VAE) is introduced without relying on prior knowledge about data in the probabilistic prin…

    Submitted 23 September, 2023; originally announced September 2023.

  9. arXiv:2306.05398  [pdf, other]

    physics.comp-ph cond-mat.mtrl-sci math.NA stat.CO

    Bayesian model calibration for diblock copolymer thin film self-assembly using power spectrum of microscopy data and machine learning surrogate

    Authors: Lianghao Cao, Keyi Wu, J. Tinsley Oden, Peng Chen, Omar Ghattas

    Abstract: Identifying parameters of computational models from experimental data, or model calibration, is fundamental for assessing and improving the predictability and reliability of computer simulations. In this work, we propose a method for Bayesian calibration of models that predict morphological patterns of diblock copolymer (Di-BCP) thin film self-assembly while accounting for various sources of uncer…

    Submitted 3 August, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: Minor changes from the original submission, including a change in the title

  10. arXiv:2210.03008  [pdf, other]

    math.NA cs.LG stat.CO stat.ML

    Residual-based error correction for neural operator accelerated infinite-dimensional Bayesian inverse problems

    Authors: Lianghao Cao, Thomas O'Leary-Roseberry, Prashant K. Jha, J. Tinsley Oden, Omar Ghattas

    Abstract: We explore using neural operators, or neural network representations of nonlinear maps between function spaces, to accelerate infinite-dimensional Bayesian inverse problems (BIPs) with models governed by nonlinear parametric partial differential equations (PDEs). Neural operators have gained significant attention in recent years for their ability to approximate the parameter-to-solution maps defin…

    Submitted 18 October, 2022; v1 submitted 6 October, 2022; originally announced October 2022.

  11. arXiv:2206.11343  [pdf, other]

    physics.comp-ph cond-mat.mtrl-sci stat.AP stat.CO stat.ML

    Bayesian model calibration for block copolymer self-assembly: Likelihood-free inference and expected information gain computation via measure transport

    Authors: Ricardo Baptista, Lianghao Cao, Joshua Chen, Omar Ghattas, Fengyi Li, Youssef M. Marzouk, J. Tinsley Oden

    Abstract: We consider the Bayesian calibration of models describing the phenomenon of block copolymer (BCP) self-assembly using image data produced by microscopy or X-ray scattering techniques. To account for the random long-range disorder in BCP equilibrium structures, we introduce auxiliary variables to represent this aleatory uncertainty. These variables, however, result in an integrated likelihood for h…

    Submitted 22 June, 2022; originally announced June 2022.

  12. arXiv:2108.10029  [pdf, other]

    stat.ML cs.LG q-bio.PE

    Modeling time evolving COVID-19 uncertainties with density dependent asymptomatic infections and social reinforcement

    Authors: Qing Liu, Longbing Cao

    Abstract: The COVID-19 pandemic has posed significant challenges in modeling its complex epidemic transmissions, infection and contagion, which are very different from known epidemics. The challenges in quantifying COVID-19 complexities include effectively modeling its process and data uncertainties. The uncertainties are embedded in implicit and high-proportional undocumented infections, asymptomatic conta…

    Submitted 5 April, 2022; v1 submitted 23 August, 2021; originally announced August 2021.

  13. arXiv:2103.11516  [pdf, other]

    cs.LG cs.AI stat.ML

    Homophily Outlier Detection in Non-IID Categorical Data

    Authors: Guansong Pang, Longbing Cao, Ling Chen

    Abstract: Most existing outlier detection methods assume that the outlier factors (i.e., outlierness scoring measures) of data entities (e.g., feature values and data objects) are Independent and Identically Distributed (IID). This assumption does not hold in real-world applications where the outlierness of different entities is dependent on each other and/or taken from different probability distribution…

    Submitted 21 March, 2021; originally announced March 2021.

    Comments: To appear in Data Mining and Knowledge Discovery Journal

  14. arXiv:2009.06847  [pdf, other]

    cs.LG cs.AI cs.CR cs.CV stat.ML

    Toward Deep Supervised Anomaly Detection: Reinforcement Learning from Partially Labeled Anomaly Data

    Authors: Guansong Pang, Anton van den Hengel, Chunhua Shen, Longbing Cao

    Abstract: We consider the problem of anomaly detection with a small set of partially labeled anomaly examples and a large-scale unlabeled dataset. This is a common scenario in many important applications. Existing related methods either exclusively fit the limited anomaly examples that typically do not span the entire set of anomalies, or proceed with unsupervised learning from the unlabeled data. We propos…

    Submitted 10 June, 2021; v1 submitted 14 September, 2020; originally announced September 2020.

    Comments: Accepted to KDD 2021

  15. Unsupervised Heterogeneous Coupling Learning for Categorical Representation

    Authors: Chengzhang Zhu, Longbing Cao, Jianping Yin

    Abstract: Complex categorical data is often hierarchically coupled with heterogeneous relationships between attributes and attribute values and the couplings between objects. Such value-to-object couplings are heterogeneous with complementary and inconsistent interactions and distributions. Limited research exists on unlabeled categorical data representations, and it ignores the heterogeneous and hierarchical coup…

    Submitted 21 July, 2020; originally announced July 2020.

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020

  16. arXiv:2007.02500  [pdf, other]

    cs.LG cs.CV stat.ML

    Deep Learning for Anomaly Detection: A Review

    Authors: Guansong Pang, Chunhua Shen, Longbing Cao, Anton van den Hengel

    Abstract: Anomaly detection, a.k.a. outlier detection or novelty detection, has been a lasting yet active research area in various research communities for several decades. There are still some unique problem complexities and challenges that require advanced approaches. In recent years, deep learning enabled anomaly detection, i.e., deep anomaly detection, has emerged as a critical direction. This paper sur…

    Submitted 4 December, 2020; v1 submitted 5 July, 2020; originally announced July 2020.

    Comments: Survey paper, 36 pages, 180 references, 2 figures, 4 tables

    Journal ref: ACM Computing Surveys, 2020

  17. arXiv:2005.08047  [pdf, other]

    cs.LG stat.ML

    Simple, Scalable, and Stable Variational Deep Clustering

    Authors: Lele Cao, Sahar Asadi, Wenfei Zhu, Christian Schmidli, Michael Sjöberg

    Abstract: Deep clustering (DC) has become the state-of-the-art for unsupervised clustering. In principle, DC represents a variety of unsupervised methods that jointly learn the underlying clusters and the latent representation directly from unstructured datasets. However, DC methods are generally poorly applied due to high operational costs, low scalability, and unstable results. In this paper, we first eva…

    Submitted 21 May, 2020; v1 submitted 16 May, 2020; originally announced May 2020.

    Comments: 17 pages, 5 figures, source code: https://github.com/king/s3vdc

  18. arXiv:1907.00526   

    cs.LG cs.AI stat.ML

    FiDi-RL: Incorporating Deep Reinforcement Learning with Finite-Difference Policy Search for Efficient Learning of Continuous Control

    Authors: Longxiang Shi, Shijian Li, Longbing Cao, Long Yang, Gang Zheng, Gang Pan

    Abstract: In recent years, significant progress has been made in dealing with challenging problems using reinforcement learning. Despite its great success, reinforcement learning still faces challenges in continuous control tasks. Conventional methods always compute the derivatives of the optimal goal with costly computation resources, and are inefficient, unstable and lack robustness when dealing with s…

    Submitted 31 January, 2020; v1 submitted 30 June, 2019; originally announced July 2019.

    Comments: I found some theoretical errors

  19. arXiv:1905.13043  [pdf, other]

    math.OC stat.ML

    Linear interpolation gives better gradients than Gaussian smoothing in derivative-free optimization

    Authors: Albert S Berahas, Liyuan Cao, Krzysztof Choromanski, Katya Scheinberg

    Abstract: In this paper, we consider derivative free optimization problems, where the objective function is smooth but is computed with some amount of noise, the function evaluations are expensive and no derivative information is available. We are motivated by policy optimization problems in reinforcement learning that have recently become popular [Choromanski et al. 2018; Fazel et al. 2018; Salimans et al.…

    Submitted 2 June, 2019; v1 submitted 28 May, 2019; originally announced May 2019.

    Comments: 14 pages, 2 figures. arXiv admin note: text overlap with arXiv:1905.01332

  20. arXiv:1905.10594  [pdf]

    cs.LG stat.ML

    Multi-view Information-theoretic Co-clustering for Co-occurrence Data

    Authors: Peng Xu, Zhaohong Deng, Kup-Sze Choi, Longbing Cao, Shitong Wang

    Abstract: Multi-view clustering has received much attention recently. Most of the existing multi-view clustering methods only focus on one-sided clustering. As the co-occurring data elements involve the counts of sample-feature co-occurrences, it is more efficient to conduct two-sided clustering along the samples and features simultaneously. To take advantage of two-sided clustering for the co-occurrences i…

    Submitted 25 May, 2019; originally announced May 2019.

    Journal ref: AAAI 2019

  21. arXiv:1905.07237  [pdf, other]

    cs.LG cs.AI stat.ML

    TBQ($σ$): Improving Efficiency of Trace Utilization for Off-Policy Reinforcement Learning

    Authors: Longxiang Shi, Shijian Li, Longbing Cao, Long Yang, Gang Pan

    Abstract: Off-policy reinforcement learning with eligibility traces is challenging because of the discrepancy between target policy and behavior policy. One common approach is to measure the difference between two policies in a probabilistic way, such as importance sampling and tree-backup. However, existing off-policy learning methods based on probabilistic policy measurement are inefficient when utilizing…

    Submitted 17 May, 2019; originally announced May 2019.

    Comments: 8 pages

    MSC Class: 68Wxx

  22. arXiv:1902.07903  [pdf, other]

    cs.NI cs.LG stat.ML

    Learning Deterministic Policy with Target for Power Control in Wireless Networks

    Authors: Yujiao Lu, Hancheng Lu, Liangliang Cao, Feng Wu, Daren Zhu

    Abstract: Inter-Cell Interference Coordination (ICIC) is a promising way to improve energy efficiency in wireless networks, especially where small base stations are densely deployed. However, traditional optimization based ICIC schemes suffer from severe performance degradation with complex interference patterns. To address this issue, we propose a Deep Reinforcement Learning with Deterministic Policy and Ta…

    Submitted 21 February, 2019; originally announced February 2019.

    Comments: 7 pages, 7 figures, GlobeCom2018

  23. arXiv:1806.04808  [pdf, other]

    cs.LG cs.AI cs.DB stat.ML

    Learning Representations of Ultrahigh-dimensional Data for Random Distance-based Outlier Detection

    Authors: Guansong Pang, Longbing Cao, Ling Chen, Huan Liu

    Abstract: Learning expressive low-dimensional representations of ultrahigh-dimensional data, e.g., data with thousands/millions of features, has been a major way to enable learning methods to address the curse of dimensionality. However, existing unsupervised representation learning methods mainly focus on preserving the data regularity information and learning the representations independently of subsequen…

    Submitted 12 June, 2018; originally announced June 2018.

    Comments: 10 pages, 4 figures, 3 tables. To appear in the proceedings of KDD18, Long presentation (oral)

  24. arXiv:1802.00324  [pdf]

    cs.LG stat.ML

    One-class Collective Anomaly Detection based on Long Short-Term Memory Recurrent Neural Networks

    Authors: Nga Nguyen Thi, Van Loi Cao, Nhien-An Le-Khac

    Abstract: Intrusion detection for computer network systems has become one of the most critical tasks for network administrators today. It has an important role for organizations, governments and our society due to the valuable resources hosted on computer networks. Traditional misuse detection strategies are unable to detect new and unknown intrusion types. In contrast, anomaly detection in network s…

    Submitted 31 January, 2018; originally announced February 2018.

    Comments: arXiv admin note: substantial text overlap with arXiv:1703.09752

  25. arXiv:1703.02391  [pdf, other]

    cs.CV cs.LG stat.ML

    Learning from Noisy Labels with Distillation

    Authors: Yuncheng Li, Jianchao Yang, Yale Song, Liangliang Cao, Jiebo Luo, Li-Jia Li

    Abstract: The ability to learn from noisy labels is very useful in many visual recognition tasks, as a vast amount of data with noisy labels are relatively easy to obtain. Traditionally, the label noises have been treated as statistical outliers, and approaches such as importance re-weighting and bootstrap have been proposed to alleviate the problem. According to our observation, the real-world noisy lab…

    Submitted 7 April, 2017; v1 submitted 7 March, 2017; originally announced March 2017.

  26. arXiv:1604.05198  [pdf, ps, other]

    cs.NE cs.LG stat.ML

    Locally Imposing Function for Generalized Constraint Neural Networks - A Study on Equality Constraints

    Authors: Linlin Cao, Ran He, Bao-Gang Hu

    Abstract: This work is a further study on the Generalized Constraint Neural Network (GCNN) model [1], [2]. Two challenges are encountered in the study, that is, to embed any type of prior information and to select its imposing schemes. The work focuses on the second challenge and studies a new constraint imposing scheme for equality constraints. A new method called locally imposing function (LIF) is propose…

    Submitted 18 April, 2016; originally announced April 2016.

    Comments: 8 pages, 7 figures

  27. arXiv:1310.1545  [pdf, ps, other]

    cs.LG cs.SI stat.ML

    Learning Hidden Structures with Relational Models by Adequately Involving Rich Information in A Network

    Authors: Xuhui Fan, Richard Yi Da Xu, Longbing Cao, Yin Song

    Abstract: Effectively modelling hidden structures in a network is very practical but theoretically challenging. Existing relational models only involve very limited information, namely the binary directional link data, embedded in a network to learn hidden networking structures. There is other rich and meaningful information (e.g., various attributes of entities and more granular information than binary ele…

    Submitted 6 October, 2013; originally announced October 2013.

  28. arXiv:1306.3003  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Non-parametric Power-law Data Clustering

    Authors: Xuhui Fan, Yiling Zeng, Longbing Cao

    Abstract: It has always been a great challenge for clustering algorithms to automatically determine the cluster numbers according to the distribution of datasets. Several approaches have been proposed to address this issue, including the recent promising work that incorporates Bayesian Nonparametrics into the $k$-means clustering procedure. This approach shows simplicity in implementation and solidity in th…

    Submitted 12 June, 2013; originally announced June 2013.

  29. arXiv:1306.3002  [pdf, ps, other]

    stat.ML cs.LG

    A Convergence Theorem for the Graph Shift-type Algorithms

    Authors: Xuhui Fan, Longbing Cao

    Abstract: Graph Shift (GS) algorithms have recently attracted attention as a promising approach for discovering dense subgraphs in noisy data. However, there are no theoretical foundations for proving the convergence of the GS Algorithm. In this paper, we propose a generic theoretical framework consisting of three key GS components: simplex of generated sequence set, monotonic and continuous objective function and close…

    Submitted 12 June, 2013; originally announced June 2013.

  30. arXiv:1306.2999  [pdf, ps, other]

    cs.SI cs.LG stat.ML

    Dynamic Infinite Mixed-Membership Stochastic Blockmodel

    Authors: Xuhui Fan, Longbing Cao, Richard Yi Da Xu

    Abstract: Directional and pairwise measurements are often used to model inter-relationships in a social network setting. The Mixed-Membership Stochastic Blockmodel (MMSB) was a seminal work in this area, and many of its capabilities have been extended since then. In this paper, we propose the \emph{Dynamic Infinite Mixed-Membership stochastic blockModel (DIM3)}, a generalised framework that extends the existing…

    Submitted 12 June, 2013; originally announced June 2013.

  31. arXiv:1306.2733  [pdf, ps, other]

    cs.LG stat.ML

    Copula Mixed-Membership Stochastic Blockmodel for Intra-Subgroup Correlations

    Authors: Xuhui Fan, Longbing Cao, Richard Yi Da Xu

    Abstract: The \emph{Mixed-Membership Stochastic Blockmodel (MMSB)} is a popular framework for modeling social network relationships. It can fully exploit each individual node's participation (or membership) in a social structure. Despite its powerful representations, this model makes an assumption that the distributions of relational membership indicators between two nodes are independent. Under many social…

    Submitted 6 October, 2013; v1 submitted 12 June, 2013; originally announced June 2013.

  32. arXiv:1305.5734  [pdf, ps, other]

    stat.ML cs.LG

    Characterizing A Database of Sequential Behaviors with Latent Dirichlet Hidden Markov Models

    Authors: Yin Song, Longbing Cao, Xuhui Fan, Wei Cao, Jian Zhang

    Abstract: This paper proposes a generative model, the latent Dirichlet hidden Markov models (LDHMM), for characterizing a database of sequential behaviors (sequences). LDHMMs posit that each sequence is generated by an underlying Markov chain process, which is controlled by the corresponding parameters (i.e., the initial state vector, transition matrix and the emission matrix). These sequence-level latent…

    Submitted 24 May, 2013; originally announced May 2013.

    ACM Class: H.2.8; F.1.2

  33. arXiv:1211.2400   

    stat.ME

    A shrinkage estimation for large dimensional precision matrices using random matrix theory

    Authors: Cheng Wang, Guangming Pan, Longbing Cao

    Abstract: In this paper, a new ridge-type shrinkage estimator for the precision matrix has been proposed. The asymptotic optimal shrinkage coefficients and the theoretical loss were derived. Data-driven estimators for the shrinkage coefficients were also conducted based on the asymptotic results deriving from random matrix theories. The new estimator which has a simple explicit formula is distribution-fre…

    Submitted 2 September, 2019; v1 submitted 11 November, 2012; originally announced November 2012.

    Comments: This paper has been withdrawn by the author because substantial content will be updated

  34. Non-parametric shrinkage mean estimation for quadratic loss functions with unknown covariance matrices

    Authors: Cheng Wang, Tiejun Tong, Longbing Cao, Baiqi Miao

    Abstract: In this paper, a shrinkage estimator for the population mean is proposed under known quadratic loss functions with unknown covariance matrices. The new estimator is non-parametric in the sense that it does not assume a specific parametric distribution for the data and it does not require the prior information on the population covariance matrix. Analytical results on the improvement of the propose…

    Submitted 6 November, 2014; v1 submitted 7 November, 2012; originally announced November 2012.

    Comments: Some technical parts of Theorem 3.1 and 3.2 were corrected in this version

    Journal ref: Journal of Multivariate Analysis, 125, 222-232, 2014

  35. Optimal feature selection for sparse linear discriminant analysis and its applications in gene expression data

    Authors: Cheng Wang, Longbing Cao, Baiqi Miao

    Abstract: This work studies the theoretical rules of feature selection in linear discriminant analysis (LDA), and a new feature selection method is proposed for sparse linear discriminant analysis. An $l_1$ minimization method is used to select the important features from which the LDA will be constructed. The asymptotic results of this proposed two-stage LDA (TLDA) are studied, demonstrating that TLDA is a…

    Submitted 22 April, 2013; v1 submitted 8 June, 2012; originally announced June 2012.

    Comments: 20 pages, 3 figures, 5 tables, accepted by Computational Statistics and Data Analysis

  36. On Identity Tests for High Dimensional Data Using RMT

    Authors: Cheng Wang, Jing Yang, Baiqi Miao, Longbing Cao

    Abstract: In this work, we redefined two important statistics, the CLRT test (Bai et al., Ann. Stat. 37 (2009) 3822-3840) and the LW test (Ledoit and Wolf, Ann. Stat. 30 (2002) 1081-1102) on identity tests for high dimensional data using random matrix theories. Compared with existing CLRT and LW tests, the new tests can accommodate data which has unknown means and non-Gaussian distributions. Simulations dem…

    Submitted 11 April, 2013; v1 submitted 15 March, 2012; originally announced March 2012.

    Comments: 16 pages, 2 figures, 3 tables, To be published in the Journal of Multivariate Analysis

    MSC Class: 62H15; 62H10