-
Graph-theoretic Inference for Random Effects in High-dimensional Studies
Authors:
Lynna Chu,
Yichuan Bai
Abstract:
We study the problem of testing for the presence of random effects in mixed models with high-dimensional fixed effects. To this end, we propose a rank-based graph-theoretic approach to test whether a collection of random effects is zero. Our approach is non-parametric and model-free in the sense that we do not require correct specification of the mixed model or estimation of unknown parameters. Instead, the test statistic evaluates whether incorporating group-level correlation meaningfully improves the ability of a potentially high-dimensional covariate vector $X$ to predict a response variable $Y$. We establish the consistency of the proposed test and derive its asymptotic null distribution. Through simulation studies and a real data application, we demonstrate the practical effectiveness of the proposed test.
Submitted 9 June, 2025;
originally announced June 2025.
-
Adaptive Block-Based Change-Point Detection for Sparse Spatially Clustered Data with Applications in Remote Sensing Imaging
Authors:
Alan Moore,
Lynna Chu,
Zhengyuan Zhu
Abstract:
We present a non-parametric change-point detection approach to detect potentially sparse changes in a time series of high-dimensional observations or non-Euclidean data objects. We target a change in distribution that occurs in a small, unknown subset of dimensions, where these dimensions may be correlated. Our work is motivated by a remote sensing application, where changes occur in small, spatially clustered regions over time. An adaptive block-based change-point detection framework is proposed that accounts for spatial dependencies across dimensions and leverages these dependencies to boost detection power and improve estimation accuracy. Through simulation studies, we demonstrate that our approach has superior performance in detecting sparse changes in datasets with spatial or local group structures. An application of the proposed method to detect activity, such as new construction, in remote sensing imagery of the Natanz Nuclear facility in Iran is presented to demonstrate the method's efficacy.
Submitted 27 May, 2025;
originally announced May 2025.
-
A Graph-based Approach to Estimating the Number of Clusters in High-dimensional Settings
Authors:
Yichuan Bai,
Lynna Chu
Abstract:
We consider the problem of estimating the number of clusters (k) in a dataset. We propose a non-parametric approach to the problem that utilizes similarity graphs to construct a robust statistic that effectively captures similarity information among observations. This graph-based statistic is applicable to datasets of any dimension, is computationally efficient to obtain, and can be paired with any kind of clustering technique. Asymptotic theory is developed to establish the selection consistency of the proposed approach. Simulation studies demonstrate that the graph-based statistic outperforms existing methods for estimating k, especially in the high-dimensional setting. We illustrate its utility on an imaging dataset and an RNA-seq dataset.
Submitted 11 June, 2025; v1 submitted 23 February, 2024;
originally announced February 2024.
-
A Robust Framework for Graph-based Two-Sample Tests Using Weights
Authors:
Yichuan Bai,
Lynna Chu
Abstract:
Graph-based tests are a class of non-parametric two-sample tests useful for analyzing high-dimensional data. The test statistics are constructed from similarity graphs (such as K-minimum spanning tree), and consequently, their performance is sensitive to the structure of the graph. When the graph has problematic structures (for example, hubs), as is common for high-dimensional data, this can result in low power and unstable performance among existing graph-based tests. We address this challenge by proposing new test statistics that are robust to problematic structures of the graph and can provide reliable inferences. We employ an edge-weighting strategy using intrinsic characteristics of the graph that are computationally simple and efficient to obtain. The limiting null distribution of the robust test statistics is derived and shown to work well for finite sample sizes. Simulation studies and data analysis of Chicago taxi-trip travel patterns demonstrate the new tests' improved performance across a range of settings.
Submitted 19 June, 2025; v1 submitted 23 July, 2023;
originally announced July 2023.
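For orientation, the classic edge-count idea that graph-based two-sample tests build on can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the weighted statistics proposed in the entry above: it uses a plain minimum spanning tree (rather than a K-MST), counts the edges that join the two samples, and calibrates by permutation. The function names (`mst_edges`, `edge_count_test`) and the toy data are hypothetical.

```python
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j) index pairs."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n
    parent = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    parent[v] = u
    return edges


def between_sample_edges(edges, labels):
    """Number of tree edges whose endpoints carry different sample labels."""
    return sum(1 for i, j in edges if labels[i] != labels[j])


def edge_count_test(sample_x, sample_y, n_perm=500, seed=0):
    """Permutation test: unusually few between-sample edges suggest different distributions."""
    points = list(sample_x) + list(sample_y)
    labels = [0] * len(sample_x) + [1] * len(sample_y)
    edges = mst_edges(points)
    observed = between_sample_edges(edges, labels)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        permuted = labels[:]
        rng.shuffle(permuted)
        if between_sample_edges(edges, permuted) <= observed:
            hits += 1
    # Add-one correction keeps the permutation p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data (assumed for illustration): two well-separated point clouds.
    x = [(float(i), float(i % 3)) for i in range(8)]
    y = [(float(i) + 100.0, float(i % 3)) for i in range(8)]
    print(edge_count_test(x, y))
```

Because the tree is built once on the pooled sample and only the labels are permuted, the statistic depends on the data solely through the graph, which is why the structure of the graph (for example, hubs) matters so much for this family of tests.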
-
Multimodal Deep Learning
Authors:
Cem Akkus,
Luyang Chu,
Vladana Djakovic,
Steffen Jauch-Walser,
Philipp Koch,
Giacomo Loss,
Christopher Marquardt,
Marco Moldovan,
Nadja Sauter,
Maximilian Schneider,
Rickmer Schulte,
Karol Urbanczyk,
Jann Goschenhofer,
Christian Heumann,
Rasmus Hvingelby,
Daniel Schalk,
Matthias Aßenmacher
Abstract:
This book is the result of a seminar in which we reviewed multimodal approaches and attempted to create a solid overview of the field, starting with the current state-of-the-art approaches in the two subfields of Deep Learning individually. Further, modeling frameworks are discussed where one modality is transformed into the other, as well as models in which one modality is utilized to enhance representation learning for the other. To conclude the second part, architectures with a focus on handling both modalities simultaneously are introduced. Finally, we also cover other modalities as well as general-purpose multi-modal models, which are able to handle different tasks on different modalities within one unified architecture. One interesting application (Generative Art) eventually caps off this booklet.
Submitted 12 January, 2023;
originally announced January 2023.
-
Global Optimum Search in Quantum Deep Learning
Authors:
Lanston Hau Man Chu,
Tejas Bhojraj,
Rui Huang
Abstract:
This paper aims to solve machine learning optimization problems using quantum circuits. Two approaches, namely the average approach and the Partial Swap Test Cut-off method (PSTC), are proposed to search for the global minimum/maximum of two different objective functions. The current cost is $O(\sqrt{|\Theta|} N)$, but there is potential to improve PSTC further to $O(\sqrt{|\Theta|} \cdot \mathrm{sublinear}\ N)$ by enhancing the checking process.
Submitted 9 August, 2020;
originally announced August 2020.
-
Personalized Cross-Silo Federated Learning on Non-IID Data
Authors:
Yutao Huang,
Lingyang Chu,
Zirui Zhou,
Lanjun Wang,
Jiangchuan Liu,
Jian Pei,
Yong Zhang
Abstract:
Non-IID data present a tough challenge for federated learning. In this paper, we explore a novel idea of facilitating pairwise collaborations between clients with similar data. We propose FedAMP, a new method employing federated attentive message passing to facilitate similar clients to collaborate more. We establish the convergence of FedAMP for both convex and non-convex models, and propose a heuristic method to further improve the performance of FedAMP when clients adopt deep neural networks as personalized models. Our extensive experiments on benchmark data sets demonstrate the superior performance of the proposed methods.
Submitted 13 December, 2021; v1 submitted 7 July, 2020;
originally announced July 2020.
-
Exact and Consistent Interpretation of Piecewise Linear Models Hidden behind APIs: A Closed Form Solution
Authors:
Zicun Cong,
Lingyang Chu,
Lanjun Wang,
Xia Hu,
Jian Pei
Abstract:
More and more AI services are provided through APIs in the cloud, where predictive models are hidden behind the APIs. To build trust with users and reduce potential application risk, it is important to interpret how such hidden predictive models make their decisions. The biggest challenge in interpreting such predictions is that no access to model parameters or training data is available. Existing works interpret the predictions of a model hidden behind an API by heuristically probing the response of the API with perturbed input instances. However, these methods do not provide any guarantee on the exactness and consistency of their interpretations. In this paper, we propose an elegant closed form solution named OpenAPI to compute exact and consistent interpretations for the family of Piecewise Linear Models (PLM), which includes many popular classification models. The main idea is to first construct a set of overdetermined linear equation systems from a small set of perturbed instances and the predictions made by the model on those instances. Then, we solve the equation systems to identify the decision features that are responsible for the prediction on an input instance. Our extensive experiments clearly demonstrate the exactness and consistency of our method.
Submitted 19 April, 2020; v1 submitted 17 June, 2019;
originally announced June 2019.
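The probing idea described in the entry above (query a hidden model on a few perturbed instances, then solve linear equations for the locally linear decision function) can be illustrated with a toy sketch. This is not the OpenAPI algorithm itself: it assumes a hypothetical binary classifier `hidden_api` whose log-odds are exactly linear, uses d + 1 queries to form a square system, and spends one extra query as a consistency check in the spirit of an overdetermined system. All names here are assumptions made for illustration.

```python
import math


def solve_linear_system(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; choose different perturbations")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def hidden_api(x):
    """Hypothetical black box: returns P(class 1 | x); its parameters are unknown to the caller."""
    w, b = [2.0, -1.0], 0.5
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    return math.log(p / (1.0 - p))


def recover_local_linear_model(api, x0, eps=0.1):
    """Recover (weights, bias) of the locally linear log-odds around x0 from API queries."""
    d = len(x0)
    points = [list(x0)]
    for j in range(d):
        perturbed = list(x0)
        perturbed[j] += eps  # perturb one coordinate at a time
        points.append(perturbed)
    rows = [pt + [1.0] for pt in points]  # unknowns: d weights and one bias
    rhs = [log_odds(api(pt)) for pt in points]
    theta = solve_linear_system(rows, rhs)
    # Extra query: a large residual would mean the probes crossed a linear region.
    extra = [xi - eps for xi in x0]
    predicted = sum(t * v for t, v in zip(theta, extra + [1.0]))
    residual = abs(predicted - log_odds(api(extra)))
    return theta[:-1], theta[-1], residual


if __name__ == "__main__":
    print(recover_local_linear_model(hidden_api, [0.3, -0.2]))
```

For a genuinely piecewise linear model, the residual on the extra query is what signals whether all probes fell inside the same linear region; when it is not near zero, a smaller `eps` would be the natural thing to try.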
-
LEMO: Learn to Equalize for MIMO-OFDM Systems with Low-Resolution ADCs
Authors:
Lei Chu,
Ling Pei,
Husheng Li,
Robert Caiming Qiu
Abstract:
This paper develops a new deep neural network optimized equalization framework for massive multiple input multiple output orthogonal frequency division multiplexing (MIMO-OFDM) systems that employ low-resolution analog-to-digital converters (ADCs) at the base station (BS). The use of low-resolution ADCs could largely reduce hardware complexity and circuit power consumption; however, it makes the channel state information almost blind to the BS, hence causing difficulty in solving the equalization problem. In this paper, we consider a supervised learning architecture, where the goal is to learn a representative function that can predict the targets (constellation points) from the inputs (outputs of the low-resolution ADCs) based on the labeled training data (pilot signals). In particular, our main contributions are twofold: 1) we design a new activation function, whose outputs are close to the constellation points when the parameters are finally optimized, to help us fully exploit the stochastic gradient descent method for the discrete optimization problem; 2) an unsupervised loss is designed and then added to the optimization objective, aiming to enhance the representation ability (so-called generalization). Lastly, various experimental results confirm the superiority of the proposed equalizer over some existing ones, particularly when the statistics of the channel state information are unclear.
Submitted 25 May, 2020; v1 submitted 14 May, 2019;
originally announced May 2019.
-
neuralRank: Searching and ranking ANN-based model repositories
Authors:
Nirmit Desai,
Linsong Chu,
Raghu K. Ganti,
Sebastian Stein,
Mudhakar Srivatsa
Abstract:
Widespread applications of deep learning have led to a plethora of pre-trained neural network models for common tasks. Such models are often adapted from other models via transfer learning. The models may have varying training sets, training algorithms, network architectures, and hyper-parameters. For a given application, what is the most suitable model in a model repository? This is a critical question for practical deployments, but it has not received much attention. This paper introduces the novel problem of searching and ranking models based on suitability relative to a target dataset and proposes a ranking algorithm called neuralRank. The key idea behind this algorithm is to base model suitability on the discriminating power of a model, using a novel metric to measure it. With experimental results on the MNIST, Fashion, and CIFAR10 datasets, we demonstrate that (1) neuralRank is independent of the domain, the training set, or the network architecture and (2) the models ranked highly by neuralRank tend to have higher model accuracy in practice.
Submitted 2 March, 2019;
originally announced March 2019.
-
Sequential Change-point Detection for High-dimensional and non-Euclidean Data
Authors:
Lynna Chu,
Hao Chen
Abstract:
In many applications, it is often of practical and scientific interest to detect anomalous events in a streaming sequence of high-dimensional or non-Euclidean observations. We study a non-parametric framework that utilizes nearest neighbor information among the observations to detect changes in an online setting. It can be applied to data in arbitrary dimension and non-Euclidean data as long as a similarity measure on the sample space can be defined. We consider new test statistics under this framework that can detect anomalous events more effectively than the existing test while keeping the false discovery rate controlled at a fixed level. Analytic formulas approximating the average run lengths of the new approaches are derived so that they can be applied quickly to modern datasets. Simulation studies are provided to support theoretical results. The proposed approach is illustrated with an analysis of the NYC taxi dataset.
Submitted 21 October, 2022; v1 submitted 14 October, 2018;
originally announced October 2018.
-
A Survey on Nonconvex Regularization Based Sparse and Low-Rank Recovery in Signal Processing, Statistics, and Machine Learning
Authors:
Fei Wen,
Lei Chu,
Peilin Liu,
Robert C. Qiu
Abstract:
In the past decade, sparse and low-rank recovery have drawn much attention in many areas such as signal/image processing, statistics, bioinformatics and machine learning. To induce sparsity and/or low-rankness, the $\ell_1$ norm and nuclear norm are among the most popular regularization penalties due to their convexity. While the $\ell_1$ and nuclear norm are convenient, as the related convex optimization problems are usually tractable, it has been shown in many applications that a nonconvex penalty can yield significantly better performance. Recently, nonconvex regularization based sparse and low-rank recovery has attracted considerable interest, and it is in fact a main driver of the recent progress in nonconvex and nonsmooth optimization. This paper gives an overview of this topic in various fields in signal processing, statistics and machine learning, including compressive sensing (CS), sparse regression and variable selection, sparse signals separation, sparse principal component analysis (PCA), large covariance and inverse covariance matrices estimation, matrix completion, and robust PCA. We present recent developments of nonconvex regularization based sparse and low-rank recovery in these fields, addressing the issues of penalty selection, applications and the convergence of nonconvex algorithms. Code is available at https://github.com/FWen/ncreg.git.
Submitted 6 June, 2019; v1 submitted 16 August, 2018;
originally announced August 2018.
-
A New Approach of Exploiting Self-Adjoint Matrix Polynomials of Large Random Matrices for Anomaly Detection and Fault Location
Authors:
Zenan Ling,
Robert C. Qiu,
Xing He,
Lei Chu
Abstract:
Synchronized measurements of a large power grid enable an unprecedented opportunity to study the spatial-temporal correlations. Statistical analytics for those massive datasets start with high-dimensional data matrices. Uncertainty is ubiquitous in a future power grid, so these data matrices are treated as random matrices. This new point of view is fundamental in our theoretical analysis since true covariance matrices cannot be estimated accurately in a high-dimensional regime. As an alternative, we consider large-dimensional sample covariance matrices in the asymptotic regime to replace the true covariance matrices. The self-adjoint polynomials of large-dimensional random matrices are studied as statistics for big data analytics. The calculation of the asymptotic spectrum distribution (ASD) for such a matrix polynomial is understandably challenging. This task is made possible by a recent breakthrough in free probability, an active research branch in random matrix theory, and that breakthrough is what initially inspired this work. The new approach is interesting in many aspects. The mathematical motivation may be the most critical one; nevertheless, real-world problems can be solved using this approach.
Submitted 9 February, 2018;
originally announced February 2018.
-
Early Anomaly Detection and Location in Distribution Network: A Data-Driven Approach
Authors:
Xin Shi,
Robert Qiu,
Xing He,
Zenan Ling,
Haosen Yang,
Lei Chu
Abstract:
The measurement data collected from the supervisory control and data acquisition (SCADA) system installed in a distribution network can reflect the operational state of the network effectively. In this paper, a random matrix theory (RMT) based approach is developed for early anomaly detection and localization using these data. For every feeder in the distribution network, a corresponding data matrix is formed. Based on the Marchenko-Pastur Law for the empirical spectral analysis of the covariance 'signal+noise' matrix, the linear eigenvalue statistics are introduced to indicate the anomaly, and the outliers and their corresponding eigenvectors are analyzed for locating the anomaly. For the low observability feeders in the distribution network, an algorithm that increases the data dimension is designed so that the formulated low-dimensional matrices can be analyzed more accurately. The developed approach can detect and localize the anomaly at an early stage, and it is robust to random disturbance and measurement error. Cases on Matpower simulation data and real SCADA data corroborate the feasibility of the approach.
Submitted 11 March, 2020; v1 submitted 5 January, 2018;
originally announced January 2018.
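As a rough illustration of the two ingredients named in the entry above, the Marchenko-Pastur law and a linear eigenvalue statistic (LES), the following Python sketch compares one simple LES with its pure-noise reference value. It is not the paper's detection procedure. It assumes an N x T data matrix with standardized, independent entries, and it uses the test function φ(λ) = λ², chosen here only because the resulting statistic equals the squared Frobenius norm of the sample covariance matrix and so needs no eigendecomposition. The helper names are hypothetical.

```python
import math
import random


def sample_covariance(x):
    """x holds N rows (variables) of T samples each; returns S = (1/T) X X^T."""
    n, t = len(x), len(x[0])
    return [[sum(x[i][k] * x[j][k] for k in range(t)) / t for j in range(n)]
            for i in range(n)]


def marchenko_pastur_edges(n, t, sigma2=1.0):
    """Support edges of the Marchenko-Pastur law for ratio c = n / t <= 1."""
    c = n / t
    return sigma2 * (1 - math.sqrt(c)) ** 2, sigma2 * (1 + math.sqrt(c)) ** 2


def les_second_moment(s):
    """Normalized LES with phi(lambda) = lambda^2: (1/N) sum lambda_i^2 = (1/N) ||S||_F^2."""
    return sum(v * v for row in s for v in row) / len(s)


def mp_second_moment(n, t, sigma2=1.0):
    """Second moment of the Marchenko-Pastur law: sigma^4 * (1 + c)."""
    return sigma2 ** 2 * (1 + n / t)


if __name__ == "__main__":
    rng = random.Random(0)
    n, t = 20, 200  # toy sizes, assumed for illustration
    noise = [[rng.gauss(0.0, 1.0) for _ in range(t)] for _ in range(n)]
    s = sample_covariance(noise)
    print("MP support edges:", marchenko_pastur_edges(n, t))
    print("observed LES:", les_second_moment(s))
    print("noise-only reference:", mp_second_moment(n, t))
```

Under this noise-only model the observed statistic should sit near the reference value; a correlated signal component would push it upward, which is the intuition behind using an LES as an anomaly indicator.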
-
Invisible Units Detection and Estimation Based on Random Matrix Theory
Authors:
Xing He,
Lei Chu,
Robert C. Qiu,
Qian Ai,
Zenan Ling,
Jian Zhang
Abstract:
Invisible units mainly refer to small-scale units that are not monitored by, and thus are not visible to, utilities. Integration of these invisible units into power systems significantly affects the way in which a distribution grid is planned and operated. This paper, based on random matrix theory (RMT), proposes a statistical, data-driven framework to handle the massive grid data, in contrast to its deterministic, model-based counterpart. Combining the RMT-based data-mining framework with conventional techniques, some heuristics are derived as the solution to the invisible units detection and estimation task: linear eigenvalue statistic indicators (LESs) are suggested as the main ingredients of the solution; according to the statistical properties of LESs, hypothesis testing is formulated to conduct change point detection in the high-dimensional space. The proposed method is promising for anomaly detection and pertinent to current distribution networks: it is capable of detecting invisible power usage and fraudulent behavior, and can even locate the suspect. Case studies, using both simulated data and actual data, validate the proposed method.
Submitted 9 December, 2023; v1 submitted 29 October, 2017;
originally announced October 2017.
-
Spatio-Temporal Big Data Analysis for Smart Grids Based on Random Matrix Theory: A Comprehensive Study
Authors:
Robert Qiu,
Lei Chu,
Xing He,
Zenan Ling,
Haichun Liu
Abstract:
A cornerstone of the smart grid is the advanced monitorability of its assets and operations. Increasingly pervasive installation of phasor measurement units (PMUs) allows the so-called synchrophasor measurements to be taken roughly 100 times faster than the legacy supervisory control and data acquisition (SCADA) measurements, time-stamped using the global positioning system (GPS) signals to capture the grid dynamics. On the other hand, the availability of low-latency two-way communication networks will pave the way to high-precision real-time grid state estimation and detection, remedial actions upon network instability, and accurate risk analysis and post-event assessment for failure prevention.
In this chapter, we first model spatio-temporal PMU data in large-scale grids as random matrix sequences. Second, some basic principles of random matrix theory (RMT), such as asymptotic spectrum laws, transforms, convergence rate and free probability, are introduced briefly to support a better understanding and application of RMT technologies. Lastly, case studies based on synthetic data and real data are developed to evaluate the performance of the RMT-based schemes in different application scenarios (i.e., state evaluation and situation awareness).
Submitted 15 August, 2017;
originally announced August 2017.
-
Asymptotic Distribution-Free Change-Point Detection for Multivariate and non-Euclidean Data
Authors:
Lynna Chu,
Hao Chen
Abstract:
We consider the testing and estimation of change-points, locations where the distribution abruptly changes, in a sequence of multivariate or non-Euclidean observations. We study a nonparametric framework that utilizes similarity information among observations, which can be applied to various data types as long as an informative similarity measure on the sample space can be defined. The existing approach along this line has low power and/or biased estimates for change-points under some common scenarios. We address these problems by considering new tests based on similarity information. Simulation studies show that the new approaches exhibit substantial improvements in detecting and estimating change-points. In addition, under some mild conditions, the new test statistics are asymptotically distribution free under the null hypothesis of no change. Analytic p-value approximations to the significance of the new test statistics for the single change-point alternative and changed interval alternative are derived, making the new approaches easy off-the-shelf tools for large datasets. The new approaches are illustrated in an analysis of New York taxi data.
Submitted 22 February, 2018; v1 submitted 1 July, 2017;
originally announced July 2017.
-
A Novel Data-Driven Situation Awareness Approach for Future Grids--Using Large Random Matrices for Big Data Modeling
Authors:
Xing He,
Lei Chu,
Robert C. Qiu,
Qian Ai,
Zenan Ling
Abstract:
Data-driven approaches, when tasked with situation awareness, are suitable for complex grids with massive datasets. It is a challenge, however, to efficiently turn these massive datasets into useful big data analytics. To address such a challenge, this paper, based on random matrix theory (RMT), proposes a data-driven approach. The approach models massive datasets as large random matrices; it is model-free and requires no knowledge about physical model parameters. In particular, the large data dimension N and the large time span T, from the spatial aspect and the temporal aspect respectively, lead to favorable results. A notable feature is that the linear eigenvalue statistics (LESs) built from data matrices follow Gaussian distributions under very general conditions, due to the latest breakthroughs in probability on the central limit theorems of those LESs. Numerous case studies, with both simulated data and field data, are given to validate the proposed new algorithms.
Submitted 16 January, 2018; v1 submitted 17 October, 2016;
originally announced October 2016.
-
Massive Streaming PMU Data Modeling and Analytics in Smart Grid State Evaluation Based on Multiple High-Dimensional Covariance Tests
Authors:
Lei Chu,
Robert Qiu,
Xing He,
Zenan Ling,
Yadong Liu
Abstract:
The analogous deployment of phase measurement units (PMUs), the increase in data volume and the deregulation of the energy market all call for robust state evaluation in large-scale power systems. Implementing model-based estimators is impractical because of the complexity of solving the high-dimensional power flow equations. In this paper, we first represent massive streaming PMU data as a big random matrix flow. By exploiting the variations in the covariance matrix of the massive streaming PMU data, a novel power state evaluation algorithm is then developed based on multiple high-dimensional covariance matrix tests. The proposed test statistic is flexible and nonparametric, and assumes no specific parameter distribution or dimension structure for the PMU data. Besides, it can jointly reveal the relative magnitude, duration and location of a system event. For the sake of practical application, we reduce the computation of the proposed test statistic from $O(\varepsilon n_g^4)$ to $O(\eta n_g^2)$ by principal component calculation and redundant computation elimination. The novel algorithm is numerically evaluated utilizing the IEEE 30-, 118-bus system, a Polish 2383-bus system, and a real 34-PMU system. The case studies illustrate and verify the superiority of the proposed state evaluation indicator.
Submitted 22 June, 2017; v1 submitted 12 September, 2016;
originally announced September 2016.
-
Designing for Situation Awareness of Future Power Grids: An Indicator System Based on Linear Eigenvalue Statistics of Large Random Matrices
Authors:
Xing He,
Robert C. Qiu,
Qian Ai,
Lei Chu,
Xinyi Xu,
Zenan Ling
Abstract:
Future power grids are fundamentally different from current ones, both in size and in complexity; this trend imposes challenges for situation awareness (SA) based on classical indicators, which are usually model-based and deterministic. As an alternative, this paper proposes a statistical indicator system based on linear eigenvalue statistics (LESs) of large random matrices: 1) from a data modeling viewpoint, we build, starting from power flow equations, the random matrix models (RMMs) using only the real-time data flow in a statistical manner; 2) for a data analysis that is fully driven by RMMs, we put forward the high-dimensional indicators, called LESs, that have some unique statistical features such as Gaussian properties; and 3) we develop a three-dimensional (3D) power-map to visualize the system from a high-dimensional viewpoint and a low-dimensional one, respectively. Therefore, a statistical methodology of SA is employed; it conducts SA with a model-free and data-driven procedure, requiring no knowledge of system topologies, unit operation/control models, causal relationships, etc. This methodology has numerous advantages, such as sensitivity, universality, speed, and flexibility. In particular, its robustness against bad data is highlighted, with potential advantages in cyber security. The theory of big data based stability for on-line operations may prove feasible along this line of work, although this critical development will be reported elsewhere.
Submitted 6 July, 2016; v1 submitted 22 December, 2015;
originally announced December 2015.