Search | arXiv e-print repository

Graph-theoretic Inference for Random Effects in High-dimensional Studies

Abstract: We study the problem of testing for the presence of random effects in mixed models with high-dimensional fixed effects. To this end, we propose a rank-based graph-theoretic approach to test whether a collection of random effects is zero. Our approach is non-parametric and model-free in the sense that we do not require correct specification of the mixed model or estimation of unknown parameters. Instead, the test statistic evaluates whether incorporating group-level correlation meaningfully improves the ability of a potentially high-dimensional covariate vector $X$ to predict a response variable $Y$. We establish the consistency of the proposed test and derive its asymptotic null distribution. Through simulation studies and a real data application, we demonstrate the practical effectiveness of the proposed test.

Submitted 9 June, 2025; originally announced June 2025.

arXiv:2505.21814 [pdf, ps, other]

Adaptive Block-Based Change-Point Detection for Sparse Spatially Clustered Data with Applications in Remote Sensing Imaging

Authors: Alan Moore, Lynna Chu, Zhengyuan Zhu

Abstract: We present a non-parametric change-point detection approach to detect potentially sparse changes in a time series of high-dimensional observations or non-Euclidean data objects. We target a change in distribution that occurs in a small, unknown subset of dimensions, where these dimensions may be correlated. Our work is motivated by a remote sensing application, where changes occur in small, spatially clustered regions over time. An adaptive block-based change-point detection framework is proposed that accounts for spatial dependencies across dimensions and leverages these dependencies to boost detection power and improve estimation accuracy. Through simulation studies, we demonstrate that our approach has superior performance in detecting sparse changes in datasets with spatial or local group structures. An application of the proposed method to detect activity, such as new construction, in remote sensing imagery of the Natanz Nuclear facility in Iran is presented to demonstrate the method's efficacy.

Submitted 27 May, 2025; originally announced May 2025.

arXiv:2402.15600 [pdf, ps, other]

A Graph-based Approach to Estimating the Number of Clusters in High-dimensional Settings

Authors: Yichuan Bai, Lynna Chu

Abstract: We consider the problem of estimating the number of clusters (k) in a dataset. We propose a non-parametric approach to the problem that utilizes similarity graphs to construct a robust statistic that effectively captures similarity information among observations. This graph-based statistic is applicable to datasets of any dimension, is computationally efficient to obtain, and can be paired with any kind of clustering technique. Asymptotic theory is developed to establish the selection consistency of the proposed approach. Simulation studies demonstrate that the graph-based statistic outperforms existing methods for estimating k, especially in the high-dimensional setting. We illustrate its utility on an imaging dataset and an RNA-seq dataset.

Submitted 11 June, 2025; v1 submitted 23 February, 2024; originally announced February 2024.

arXiv:2307.12325 [pdf, ps, other]

A Robust Framework for Graph-based Two-Sample Tests Using Weights

Authors: Yichuan Bai, Lynna Chu

Abstract: Graph-based tests are a class of non-parametric two-sample tests useful for analyzing high-dimensional data. The test statistics are constructed from similarity graphs (such as K-minimum spanning tree), and consequently, their performance is sensitive to the structure of the graph. When the graph has problematic structures (for example, hubs), as is common for high-dimensional data, this can result in low power and unstable performance among existing graph-based tests. We address this challenge by proposing new test statistics that are robust to problematic structures of the graph and can provide reliable inferences. We employ an edge-weighting strategy using intrinsic characteristics of the graph that are computationally simple and efficient to obtain. The limiting null distribution of the robust test statistics is derived and shown to work well for finite sample sizes. Simulation studies and data analysis of Chicago taxi-trip travel patterns demonstrate the new tests' improved performance across a range of settings.

Submitted 19 June, 2025; v1 submitted 23 July, 2023; originally announced July 2023.

arXiv:2301.04856 [pdf, other]

Multimodal Deep Learning

Authors: Cem Akkus, Luyang Chu, Vladana Djakovic, Steffen Jauch-Walser, Philipp Koch, Giacomo Loss, Christopher Marquardt, Marco Moldovan, Nadja Sauter, Maximilian Schneider, Rickmer Schulte, Karol Urbanczyk, Jann Goschenhofer, Christian Heumann, Rasmus Hvingelby, Daniel Schalk, Matthias Aßenmacher

Abstract: This book is the result of a seminar in which we reviewed multimodal approaches and attempted to create a solid overview of the field, starting with the current state-of-the-art approaches in the two subfields of Deep Learning individually. Further, modeling frameworks are discussed where one modality is transformed into the other, as well as models in which one modality is utilized to enhance representation learning for the other. To conclude the second part, architectures with a focus on handling both modalities simultaneously are introduced. Finally, we also cover other modalities as well as general-purpose multi-modal models, which are able to handle different tasks on different modalities within one unified architecture. One interesting application (Generative Art) eventually caps off this booklet.

Submitted 12 January, 2023; originally announced January 2023.

arXiv:2008.03655 [pdf, other]

Global Optimum Search in Quantum Deep Learning

Authors: Lanston Hau Man Chu, Tejas Bhojraj, Rui Huang

Abstract: This paper aims to solve machine learning optimization problems using quantum circuits. Two approaches, namely the average approach and the Partial Swap Test Cut-off method (PSTC), are proposed to search for the global minimum/maximum of two different objective functions. The current cost is $O(\sqrt{|\Theta|}\, N)$, but there is potential to improve PSTC further to $O(\sqrt{|\Theta|} \cdot \mathrm{sublinear}\ N)$ by enhancing the checking process.

Submitted 9 August, 2020; originally announced August 2020.

Comments: 17 pages

arXiv:2007.03797 [pdf, other]

Personalized Cross-Silo Federated Learning on Non-IID Data

Authors: Yutao Huang, Lingyang Chu, Zirui Zhou, Lanjun Wang, Jiangchuan Liu, Jian Pei, Yong Zhang

Abstract: Non-IID data present a tough challenge for federated learning. In this paper, we explore a novel idea of facilitating pairwise collaborations between clients with similar data. We propose FedAMP, a new method employing federated attentive message passing to facilitate similar clients to collaborate more. We establish the convergence of FedAMP for both convex and non-convex models, and propose a heuristic method to further improve the performance of FedAMP when clients adopt deep neural networks as personalized models. Our extensive experiments on benchmark data sets demonstrate the superior performance of the proposed methods.

Submitted 13 December, 2021; v1 submitted 7 July, 2020; originally announced July 2020.

Comments: Accepted by AAAI 2021. The API of this work is available at Huawei Cloud (https://developer.huaweicloud.com/develop/aigallery/notebook/detail?id=6d4a9521-6a4d-4b6d-b84d-943d7c7b1cbd), free registration is required before use

arXiv:1906.06857 [pdf, other]

Exact and Consistent Interpretation of Piecewise Linear Models Hidden behind APIs: A Closed Form Solution

Authors: Zicun Cong, Lingyang Chu, Lanjun Wang, Xia Hu, Jian Pei

Abstract: More and more AI services are provided through APIs on cloud where predictive models are hidden behind APIs. To build trust with users and reduce potential application risk, it is important to interpret how such predictive models hidden behind APIs make their decisions. The biggest challenge of interpreting such predictions is that no access to model parameters or training data is available. Existing works interpret the predictions of a model hidden behind an API by heuristically probing the response of the API with perturbed input instances. However, these methods do not provide any guarantee on the exactness and consistency of their interpretations. In this paper, we propose an elegant closed form solution named OpenAPI to compute exact and consistent interpretations for the family of Piecewise Linear Models (PLM), which includes many popular classification models. The major idea is to first construct a set of overdetermined linear equation systems with a small set of perturbed instances and the predictions made by the model on those instances. Then, we solve the equation systems to identify the decision features that are responsible for the prediction on an input instance. Our extensive experiments clearly demonstrate the exactness and consistency of our method.

Submitted 19 April, 2020; v1 submitted 17 June, 2019; originally announced June 2019.

arXiv:1905.06329 [pdf, ps, other]

LEMO: Learn to Equalize for MIMO-OFDM Systems with Low-Resolution ADCs

Authors: Lei Chu, Ling Pei, Husheng Li, Robert Caiming Qiu

Abstract: This paper develops a new deep neural network optimized equalization framework for massive multiple input multiple output orthogonal frequency division multiplexing (MIMO-OFDM) systems that employ low-resolution analog-to-digital converters (ADCs) at the base station (BS). The use of low-resolution ADCs could largely reduce hardware complexity and circuit power consumption; however, it makes the channel state information almost blind to the BS, hence causing difficulty in solving the equalization problem. In this paper, we consider a supervised learning architecture, where the goal is to learn a representative function that can predict the targets (constellation points) from the inputs (outputs of the low-resolution ADCs) based on the labeled training data (pilot signals). Specifically, our main contributions are two-fold: 1) First, we design a new activation function, whose outputs are close to the constellation points when the parameters are finally optimized, to help us fully exploit the stochastic gradient descent method for the discrete optimization problem. 2) Second, an unsupervised loss is designed and then added to the optimization objective, aiming to enhance the representation ability (so-called generalization). Lastly, various experimental results confirm the superiority of the proposed equalizer over some existing ones, particularly when the statistics of the channel state information are unclear.

Submitted 25 May, 2020; v1 submitted 14 May, 2019; originally announced May 2019.

arXiv:1903.00711 [pdf, other]

neuralRank: Searching and ranking ANN-based model repositories

Authors: Nirmit Desai, Linsong Chu, Raghu K. Ganti, Sebastian Stein, Mudhakar Srivatsa

Abstract: Widespread applications of deep learning have led to a plethora of pre-trained neural network models for common tasks. Such models are often adapted from other models via transfer learning. The models may have varying training sets, training algorithms, network architectures, and hyper-parameters. For a given application, what is the most suitable model in a model repository? This is a critical question for practical deployments but it has not received much attention. This paper introduces the novel problem of searching and ranking models based on suitability relative to a target dataset and proposes a ranking algorithm called neuralRank. The key idea behind this algorithm is to base model suitability on the discriminating power of a model, using a novel metric to measure it. With experimental results on the MNIST, Fashion, and CIFAR10 datasets, we demonstrate that (1) neuralRank is independent of the domain, the training set, or the network architecture and (2) the models ranked highly by neuralRank tend to have higher model accuracy in practice.

Submitted 2 March, 2019; originally announced March 2019.

arXiv:1810.05973 [pdf, other]

doi 10.1109/TSP.2022.3205763

Sequential Change-point Detection for High-dimensional and non-Euclidean Data

Authors: Lynna Chu, Hao Chen

Abstract: In many applications, it is often of practical and scientific interest to detect anomaly events in a streaming sequence of high-dimensional or non-Euclidean observations. We study a non-parametric framework that utilizes nearest neighbor information among the observations to detect changes in an online setting. It can be applied to data in arbitrary dimension and non-Euclidean data as long as a similarity measure on the sample space can be defined. We consider new test statistics under this framework that can detect anomaly events more effectively than the existing test while keeping the false discovery rate controlled at a fixed level. Analytic formulas approximating the average run lengths of the new approaches are derived to make them fast applicable to modern datasets. Simulation studies are provided to support theoretical results. The proposed approach is illustrated with an analysis of the NYC taxi dataset.

Submitted 21 October, 2022; v1 submitted 14 October, 2018; originally announced October 2018.

Journal ref: in IEEE Transactions on Signal Processing, vol. 70, pp. 4498-4511, 2022

arXiv:1808.05403 [pdf, other]

A Survey on Nonconvex Regularization Based Sparse and Low-Rank Recovery in Signal Processing, Statistics, and Machine Learning

Authors: Fei Wen, Lei Chu, Peilin Liu, Robert C. Qiu

Abstract: In the past decade, sparse and low-rank recovery have drawn much attention in many areas such as signal/image processing, statistics, bioinformatics and machine learning. To induce sparsity and/or low-rankness, the $\ell_1$ norm and nuclear norm are among the most popular regularization penalties due to their convexity. While the $\ell_1$ and nuclear norm are convenient as the related convex optimization problems are usually tractable, it has been shown in many applications that a nonconvex penalty can yield significantly better performance. Recently, nonconvex regularization based sparse and low-rank recovery has attracted considerable interest, and it is in fact a main driver of the recent progress in nonconvex and nonsmooth optimization. This paper gives an overview of this topic in various fields in signal processing, statistics and machine learning, including compressive sensing (CS), sparse regression and variable selection, sparse signals separation, sparse principal component analysis (PCA), large covariance and inverse covariance matrices estimation, matrix completion, and robust PCA. We present recent developments of nonconvex regularization based sparse and low-rank recovery in these fields, addressing the issues of penalty selection, applications and the convergence of nonconvex algorithms. Code is available at https://github.com/FWen/ncreg.git.

Submitted 6 June, 2019; v1 submitted 16 August, 2018; originally announced August 2018.

Comments: 22 pages

Journal ref: Published in IEEE Access 2018: https://ieeexplore.ieee.org/abstract/document/8531588

arXiv:1802.03503 [pdf, other]

A New Approach of Exploiting Self-Adjoint Matrix Polynomials of Large Random Matrices for Anomaly Detection and Fault Location

Authors: Zenan Ling, Robert C. Qiu, Xing He, Lei Chu

Abstract: Synchronized measurements of a large power grid enable an unprecedented opportunity to study the spatial-temporal correlations. Statistical analytics for those massive datasets start with high-dimensional data matrices. Uncertainty is ubiquitous in a future power grid. These data matrices are recognized as random matrices. This new point of view is fundamental in our theoretical analysis since true covariance matrices cannot be estimated accurately in a high-dimensional regime. As an alternative, we consider large-dimensional sample covariance matrices in the asymptotic regime to replace the true covariance matrices. The self-adjoint polynomials of large-dimensional random matrices are studied as statistics for big data analytics. The calculation of the asymptotic spectrum distribution (ASD) for such a matrix polynomial is understandably challenging. This task is made possible by a recent breakthrough in free probability, an active research branch in random matrix theory. This breakthrough is what initially inspired the work in this paper. The new approach is interesting in many aspects. The mathematical reason may be most critical. The real-world problems can be solved using this approach, however.

Submitted 9 February, 2018; originally announced February 2018.

Comments: 12 pages, 13 figures, submitted to IEEE Trans on Big Data

arXiv:1801.01669 [pdf, other]

Early Anomaly Detection and Location in Distribution Network: A Data-Driven Approach

Authors: Xin Shi, Robert Qiu, Xing He, Zenan Ling, Haosen Yang, Lei Chu

Abstract: The measurement data collected from the supervisory control and data acquisition (SCADA) system installed in a distribution network can reflect the operational state of the network effectively. In this paper, a random matrix theory (RMT) based approach is developed for early anomaly detection and localization by using the data. For every feeder in the distribution network, a corresponding data matrix is formed. Based on the Marchenko-Pastur Law for the empirical spectral analysis of the covariance 'signal+noise' matrix, the linear eigenvalue statistics are introduced to indicate the anomaly, and the outliers and their corresponding eigenvectors are analyzed for locating the anomaly. As for the low observability feeders in the distribution network, an increasing data dimension algorithm is designed so that the formulated low-dimensional matrices can be analyzed more accurately. The developed approach can detect and localize the anomaly at an early stage, and it is robust to random disturbance and measurement error. Cases on Matpower simulation data and real SCADA data corroborate the feasibility of the approach.

Submitted 11 March, 2020; v1 submitted 5 January, 2018; originally announced January 2018.

Comments: 10 pages, submitted to IET Generation, Transmission and Distribution

arXiv:1710.10745 [pdf, other]

doi 10.1109/TPWRS.2019.2935739

Invisible Units Detection and Estimation Based on Random Matrix Theory

Authors: Xing He, Lei Chu, Robert C. Qiu, Qian Ai, Zenan Ling, Jian Zhang

Abstract: Invisible units mainly refer to small-scale units that are not monitored by, and thus are not visible to, utilities. Integration of these invisible units into power systems does significantly affect the way in which a distribution grid is planned and operated. This paper, based on random matrix theory (RMT), proposes a statistical, data-driven framework to handle the massive grid data, in contrast to its deterministic, model-based counterpart. Combining the RMT-based data-mining framework with conventional techniques, some heuristics are derived as the solution to the invisible units detection and estimation task: linear eigenvalue statistic indicators (LESs) are suggested as the main ingredients of the solution; according to the statistical properties of LESs, the hypothesis testing is formulated to conduct change point detection in the high-dimensional space. The proposed method is promising for anomaly detection and pertinent to current distribution networks: it is capable of detecting invisible power usage and fraudulent behavior, and can even locate the suspect. Case studies, using both simulated data and actual data, validate the proposed method.

Submitted 9 December, 2023; v1 submitted 29 October, 2017; originally announced October 2017.

Comments: 10 pages

Journal ref: IEEE Transactions on Power Systems, 2019, 35(3): 1846-1855

arXiv:1708.04935 [pdf, other]

Spatio-Temporal Big Data Analysis for Smart Grids Based on Random Matrix Theory: A Comprehensive Study

Authors: Robert Qiu, Lei Chu, Xing He, Zenan Ling, Haichun Liu

Abstract: A cornerstone of the smart grid is the advanced monitorability of its assets and operations. Increasingly pervasive installation of the phasor measurement units (PMUs) allows the so-called synchrophasor measurements to be taken roughly 100 times faster than the legacy supervisory control and data acquisition (SCADA) measurements, time-stamped using the global positioning system (GPS) signals to capture the grid dynamics. On the other hand, the availability of low-latency two-way communication networks will pave the way to high-precision real-time grid state estimation and detection, remedial actions upon network instability, and accurate risk analysis and post-event assessment for failure prevention. In this chapter, we first model spatio-temporal PMU data in large scale grids as random matrix sequences. Secondly, some basic principles of random matrix theory (RMT), such as asymptotic spectrum laws, transforms, convergence rate and free probability, are introduced briefly to support better understanding and application of RMT technologies. Lastly, the case studies based on synthetic data and real data are developed to evaluate the performance of the RMT-based schemes in different application scenarios (i.e., state evaluation and situation awareness).

Submitted 15 August, 2017; originally announced August 2017.

Comments: Book chapter #23 for the book "Transportation and Power Grid in Smart Cities: Communication Networks and Services". arXiv admin note: text overlap with arXiv:1302.0885 by other authors

arXiv:1707.00167 [pdf, other]

Asymptotic Distribution-Free Change-Point Detection for Multivariate and non-Euclidean Data

Authors: Lynna Chu, Hao Chen

Abstract: We consider the testing and estimation of change-points, locations where the distribution abruptly changes, in a sequence of multivariate or non-Euclidean observations. We study a nonparametric framework that utilizes similarity information among observations, which can be applied to various data types as long as an informative similarity measure on the sample space can be defined. The existing approach along this line has low power and/or biased estimates for change-points under some common scenarios. We address these problems by considering new tests based on similarity information. Simulation studies show that the new approaches exhibit substantial improvements in detecting and estimating change-points. In addition, under some mild conditions, the new test statistics are asymptotically distribution free under the null hypothesis of no change. Analytic p-value approximations to the significance of the new test statistics for the single change-point alternative and changed interval alternative are derived, making the new approaches easy off-the-shelf tools for large datasets. The new approaches are illustrated in an analysis of New York taxi data.

Submitted 22 February, 2018; v1 submitted 1 July, 2017; originally announced July 2017.

arXiv:1610.05076 [pdf, other]

A Novel Data-Driven Situation Awareness Approach for Future Grids--Using Large Random Matrices for Big Data Modeling

Authors: Xing He, Lei Chu, Robert C. Qiu, Qian Ai, Zenan Ling

Abstract: Data-driven approaches, when tasked with situation awareness, are suitable for complex grids with massive datasets. It is a challenge, however, to efficiently turn these massive datasets into useful big data analytics. To address such a challenge, this paper, based on random matrix theory (RMT), proposes a data-driven approach. The approach models massive datasets as large random matrices; it is model-free and requires no knowledge about physical model parameters. In particular, the large data dimension N and the large time span T, from the spatial aspect and the temporal aspect respectively, lead to favorable results. The beautiful thing is that these linear eigenvalue statistics (LESs) built from data matrices follow Gaussian distributions under very general conditions, due to the latest breakthroughs in probability on the central limit theorems of those LESs. Numerous case studies, with both simulated data and field data, are given to validate the proposed new algorithms.

Submitted 16 January, 2018; v1 submitted 17 October, 2016; originally announced October 2016.

Comments: 10 pages, 14 figures, 2 tables. Submitted to IEEE Access. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses

arXiv:1609.03301 [pdf, other]

Massive Streaming PMU Data Modeling and Analytics in Smart Grid State Evaluation Based on Multiple High-Dimensional Covariance Tests

Authors: Lei Chu, Robert Qiu, Xing He, Zenan Ling, Yadong Liu

Abstract: The analogous deployment of phasor measurement units (PMUs), the increase of data quantum and the deregulation of the energy market all call for robust state evaluation in large-scale power systems. Implementing model-based estimators is impractical because of the scale and complexity of solving the high-dimensional power flow equations. In this paper, we first represent massive streaming PMU data as a big random matrix flow. By exploiting the variations in the covariance matrix of the massive streaming PMU data, a novel power state evaluation algorithm is then developed based on multiple high-dimensional covariance matrix tests. The proposed test statistic is flexible and nonparametric, assuming no specific parameter distribution or dimension structure for the PMU data. Moreover, it can jointly reveal the relative magnitude, duration and location of a system event. For the sake of practical application, we reduce the computation of the proposed test statistic from $O(\varepsilon n_g^4)$ to $O(\eta n_g^2)$ by principal component calculation and redundant computation elimination. The novel algorithm is numerically evaluated using the IEEE 30- and 118-bus systems, a Polish 2383-bus system, and a real 34-PMU system. The case studies illustrate and verify the superiority of the proposed state evaluation indicator.

Submitted 22 June, 2017; v1 submitted 12 September, 2016; originally announced September 2016.

Comments: IEEE Transactions on Big Data, 2017

arXiv:1512.07082 [pdf, other]

doi 10.1109/ACCESS.2016.2581838

Designing for Situation Awareness of Future Power Grids: An Indicator System Based on Linear Eigenvalue Statistics of Large Random Matrices

Authors: Xing He, Robert C. Qiu, Qian Ai, Lei Chu, Xinyi Xu, Zenan Ling

Abstract: Future power grids are fundamentally different from current ones, both in size and in complexity; this trend imposes challenges for situation awareness (SA) based on classical indicators, which are usually model-based and deterministic. As an alternative, this paper proposes a statistical indicator system based on linear eigenvalue statistics (LESs) of large random matrices: 1) from a data modeling viewpoint, we build, starting from power flow equations, the random matrix models (RMMs) using only the real-time data flow in a statistical manner; 2) for a data analysis that is fully driven by RMMs, we put forward high-dimensional indicators, called LESs, that have some unique statistical features such as Gaussian properties; and 3) we develop a three-dimensional (3D) power-map to visualize the system from both a high-dimensional viewpoint and a low-dimensional one. Therefore, a statistical methodology of SA is employed; it conducts SA with a model-free and data-driven procedure, requiring no knowledge of system topologies, unit operation/control models, causal relationships, etc. This methodology has numerous advantages, such as sensitivity, universality, speed, and flexibility. In particular, its robustness against bad data is highlighted, with potential advantages in cyber security. The theory of big-data-based stability for on-line operations may prove feasible along this line of work, although this critical development will be reported elsewhere.

Submitted 6 July, 2016; v1 submitted 22 December, 2015; originally announced December 2015.

Comments: 8 pages, 8 figures, 3 tables

Journal ref: IEEE Access, vol. 4, pp. 3557-3568, 2016

Showing 1–20 of 20 results for author: Chu, L