-
A Random Matrix Analysis of In-context Memorization for Nonlinear Attention
Authors:
Zhenyu Liao,
Jiaqing Liu,
TianQi Hou,
Difan Zou,
Zenan Ling
Abstract:
Attention mechanisms have revolutionized machine learning (ML) by enabling efficient modeling of global dependencies across inputs. Their inherently parallelizable structures allow for efficient scaling with the exponentially increasing size of both pretraining data and model parameters. Yet, despite their central role as the computational backbone of modern large language models (LLMs), the theoretical understanding of Attention, especially in the nonlinear setting, remains limited.
In this paper, we provide a precise characterization of the \emph{in-context memorization error} of \emph{nonlinear Attention}, in the high-dimensional proportional regime where the number of input tokens $n$ and their embedding dimension $p$ are both large and comparable. Leveraging recent advances in the theory of large kernel random matrices, we show that nonlinear Attention typically incurs higher memorization error than linear ridge regression on random inputs. However, this gap vanishes, and can even be reversed, when the input exhibits statistical structure, particularly when the Attention weights align with the input signal direction. Our results reveal how nonlinearity and input structure interact with each other to govern the memorization performance of nonlinear Attention. The theoretical insights are supported by numerical experiments.
Submitted 23 June, 2025;
originally announced June 2025.
-
Machine learning bridging battery field data and laboratory data
Authors:
Yanbin Zhao,
Hao Liu,
Zhihua Deng,
Tong Li,
Haoyi Jiang,
Zhenfei Ling,
Xingkai Wang,
Lei Zhang,
Xiaoping Ouyang
Abstract:
To address the problem that most laboratory data-driven diagnostic and prognostic methods cannot be applied to field batteries in passenger cars and energy storage systems, this paper proposes a method that uses machine learning to bridge field data and laboratory data. Only two field real impedances, corresponding to a medium frequency and a high frequency, are needed to predict the laboratory real impedance curve, charge/discharge curves, and relaxation curve. Based on the predicted laboratory data, laboratory data-driven methods can be used for field battery diagnosis and prognosis. Compared with field data-driven methods based on massive historical field data, the proposed method offers higher accuracy, lower cost, faster speed, readily available inputs, and no use of private data. The proposed method is tested using two open-source datasets containing 249 NMC cells. For a test set containing 76 cells, the mean absolute percentage errors of the predicted laboratory real impedance curve, charge curve, and discharge curve are 0.85%, 4.72%, and 2.69%, respectively. This work fills the gap between laboratory data-driven diagnostic and prognostic methods and field battery applications, making all laboratory data-driven methods applicable to field battery diagnosis and prognosis. Furthermore, this work overturns the conventional path of developing field battery diagnostic and prognostic methods based on massive historical field data, opening up new directions for research and breakthroughs in field battery diagnosis and prognosis.
Submitted 13 May, 2025; v1 submitted 8 May, 2025;
originally announced May 2025.
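The abstract above reports prediction accuracy as mean absolute percentage error (MAPE). As a reminder of what that figure measures, here is a minimal sketch of the standard MAPE definition; the function name and the sample values are illustrative assumptions, not the paper's code or data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent.

    MAPE = 100 / n * sum(|(a_i - p_i) / a_i|); undefined if any a_i is 0.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and have equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            raise ValueError("MAPE is undefined when an actual value is 0")
        total += abs((a - p) / a)
    return 100.0 * total / len(actual)


# Hypothetical values for illustration only; not taken from the paper's datasets.
measured_curve = [2.0, 4.0, 5.0]
predicted_curve = [2.1, 3.8, 5.0]
print(f"MAPE = {mape(measured_curve, predicted_curve):.2f}%")  # MAPE = 3.33%
```

Because each error is divided by the measured value, MAPE is scale-free, which is why it can be compared across impedance and voltage curves with different units.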
-
Machine learning accelerates fuel cell life testing
Authors:
Yanbin Zhao,
Hao Liu,
Zhihua Deng,
Haoyi Jiang,
Zhenfei Ling,
Zhiyang Liu,
Xingkai Wang,
Tong Li,
Xiaoping Ouyang
Abstract:
Accelerated life testing (ALT) can significantly reduce the economic, time, and labor costs of life testing in equipment, device, and material research and development (R&D), and improve R&D efficiency. This paper proposes a performance characterization data prediction (PCDP) method and a life prediction-driven ALT (LP-ALT) method to accelerate the life testing of polymer electrolyte membrane fuel cells (PEMFCs). The PCDP method can accurately predict different PCD using only four impedance values (the real and imaginary parts at a high frequency and a medium frequency), greatly shortening the measurement time of offline PCD and reducing the difficulty of life testing. Test results on an open-source life test dataset containing 42 PEMFCs show that, compared with the coefficient of determination (R^2) of aging indicators (limiting current, total mass transport resistance, electrochemically active surface area, and crossover current) predicted from the measured PCD, the R^2 of the same indicators predicted from the predicted PCD is reduced by only 0.04, 0.01, 0.05, and 0.06, respectively. The LP-ALT method can shorten life test time through early life prediction. Test results on the same open-source PEMFC life test dataset show that the acceleration ratio of the LP-ALT method can reach 30 while keeping the minimum R^2 of the predictions of different aging indicators, including limiting current, total mass transport resistance, and electrochemically active surface area, at no less than 0.89. By combining the different performance characterization data predicted by the PCDP method with the life prediction of the LP-ALT method, the diagnosis and prognosis of PEMFCs and their components can be achieved.
Submitted 7 May, 2025; v1 submitted 26 April, 2025;
originally announced April 2025.
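The abstract above compares coefficients of determination (R^2) obtained from measured versus predicted PCD. A minimal sketch of the standard R^2 definition, and of how such a drop is computed, is below; the indicator values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need at least two paired observations")
    mean = sum(y_true) / n
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


# Hypothetical aging-indicator values, for illustration only.
reference = [1.0, 2.0, 3.0, 4.0]
from_measured = [1.1, 1.9, 3.2, 3.8]    # indicator estimated from measured data
from_predicted = [1.3, 1.7, 3.4, 3.6]   # indicator estimated from predicted data
r2_measured = r_squared(reference, from_measured)
r2_predicted = r_squared(reference, from_predicted)
drop = r2_measured - r2_predicted
print(f"R^2 drop = {drop:.2f}")  # R^2 drop = 0.08
```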
-
Bayesian shrinkage priors subject to linear constraints
Authors:
Zhi Ling,
Shozen Dan
Abstract:
In Bayesian regression models with categorical predictors, constraints are needed to ensure identifiability when using all $K$ levels of a factor. The sum-to-zero constraint is particularly useful as it allows coefficients to represent deviations from the population average. However, implementing such constraints in Bayesian settings is challenging, especially when assigning appropriate priors that respect these constraints and general principles. Here we develop a multivariate normal prior family that satisfies arbitrary linear constraints while preserving the local adaptivity properties of shrinkage priors, with an efficient implementation algorithm for probabilistic programming languages. Our approach applies broadly to various shrinkage frameworks including Bayesian Ridge, horseshoe priors and their variants, demonstrating excellent performance in simulation studies. The covariance structure we derive generalizes beyond regression models to any Bayesian analysis requiring linear constraints on parameters, providing practitioners with a principled approach to parameter identification while maintaining proper uncertainty quantification and interpretability.
Submitted 11 April, 2025;
originally announced April 2025.
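For intuition about the sum-to-zero constraint discussed above, one standard way to obtain a normal prior that satisfies it exactly is to project iid normal draws onto the constraint subspace, i.e. subtract their mean. The sketch below shows only that textbook construction for a single sum-to-zero constraint; it is not the authors' prior family, which handles arbitrary linear constraints together with shrinkage priors. The function name and parameters are hypothetical.

```python
import math
import random


def sample_sum_to_zero(k, sigma, rng, rescale=True):
    """Draw one K-vector from a normal distribution constrained to sum to zero.

    Projects z ~ N(0, sigma^2 I_K) onto {x : sum(x) = 0}, i.e. x = (I - 11^T/K) z,
    which amounts to subtracting the mean. The projected vector has covariance
    sigma^2 (I - 11^T/K); with rescale=True it is multiplied by sqrt(K/(K-1)) so
    that every coordinate keeps marginal variance sigma^2.
    """
    if k < 2:
        raise ValueError("need at least two factor levels")
    z = [rng.gauss(0.0, sigma) for _ in range(k)]
    mean = sum(z) / k
    scale = math.sqrt(k / (k - 1)) if rescale else 1.0
    return [scale * (zi - mean) for zi in z]


rng = random.Random(0)
draws = [sample_sum_to_zero(5, 2.0, rng) for _ in range(20000)]
max_violation = max(abs(sum(d)) for d in draws)           # zero up to rounding
var_level0 = sum(d[0] ** 2 for d in draws) / len(draws)   # should be near sigma^2 = 4
cov_01 = sum(d[0] * d[1] for d in draws) / len(draws)     # should be near -sigma^2/(K-1) = -1
print(max_violation, var_level0, cov_01)
```

The negative off-diagonal covariance is the price of the constraint: if one level's coefficient is above the average, the others must compensate.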
-
Fundamental Bias in Inverting Random Sampling Matrices with Application to Sub-sampled Newton
Authors:
Chengmei Niu,
Zhenyu Liao,
Zenan Ling,
Michael W. Mahoney
Abstract:
A substantial body of work in machine learning (ML) and randomized numerical linear algebra (RandNLA) has exploited various sorts of random sketching methodologies, including random sampling and random projection, with much of the analysis using Johnson--Lindenstrauss and subspace embedding techniques. Recent studies have identified the issue of inversion bias -- the phenomenon that inverses of random sketches are not unbiased, despite the unbiasedness of the sketches themselves. This bias presents challenges for the use of random sketches in various ML pipelines, such as fast stochastic optimization, scalable statistical estimators, and distributed optimization. In the context of random projection, the inversion bias can be easily corrected for dense Gaussian projections (which are, however, too expensive for many applications). Recent work has shown how the inversion bias can be corrected for sparse sub-gaussian projections. In this paper, we show how the inversion bias can be corrected for random sampling methods, both uniform and non-uniform leverage-based, as well as for structured random projections, including those based on the Hadamard transform. Using these results, we establish problem-independent local convergence rates for sub-sampled Newton methods.
Submitted 29 May, 2025; v1 submitted 19 February, 2025;
originally announced February 2025.
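The inversion bias described above already appears in a 1x1 toy case: for a Gram "matrix" A^T A = sum(a_i^2), uniform row sampling with n/m rescaling gives an unbiased sketch, yet the reciprocal (its inverse) is biased upward, as Jensen's inequality predicts. This Monte Carlo sketch only illustrates the phenomenon under that toy assumption; it does not implement the paper's correction.

```python
import random


def sketched_gram(sq_entries, m, rng):
    """Uniform row sampling (with replacement) of a 1x1 Gram matrix sum(a_i^2).

    Rescaling by n/m makes the sketch unbiased: E[estimate] = sum(sq_entries).
    """
    n = len(sq_entries)
    return (n / m) * sum(rng.choice(sq_entries) for _ in range(m))


sq_entries = [float(v) for v in range(1, 11)]  # toy a_i^2 values, so A^T A = 55
exact = sum(sq_entries)
rng = random.Random(42)
trials = 20000
sketches = [sketched_gram(sq_entries, 3, rng) for _ in range(trials)]

mean_sketch = sum(sketches) / trials                    # close to 55: the sketch is unbiased
mean_inverse = sum(1.0 / s for s in sketches) / trials  # exceeds 1/55: its inverse is not
print(mean_sketch, exact)
print(mean_inverse, 1.0 / exact)
```

With only three sampled rows the inverse is biased upward by several percent, which is the kind of systematic error that propagates into sub-sampled Newton steps.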
-
Towards pandemic preparedness: ability to estimate high-resolution social contact patterns from longitudinal surveys
Authors:
Shozen Dan,
Joshua Tegegne,
Yu Chen,
Zhi Ling,
Veronika K. Jaeger,
André Karch,
Swapnil Mishra,
Oliver Ratmann
Abstract:
Social contact surveys are an important tool for assessing infection risks within populations and the effect of non-pharmaceutical interventions on social behaviour during disease outbreaks, epidemics, and pandemics. Numerous longitudinal social contact surveys were conducted during the COVID-19 era; however, data analysis is plagued by reporting fatigue, a phenomenon whereby the average number of social contacts reported declines with the number of repeat participations and as participants' engagement decreases over time. Using data from the German COVIMOD Study between April 2020 and December 2021, we demonstrate that reporting fatigue varied considerably by sociodemographic factors and was consistently strongest among parents reporting their children's contacts (parental proxy reporting), students, middle-aged individuals, those in full-time employment, and the self-employed. We further find that, when using data from first-time participants as the gold standard, statistical models incorporating a simple logistic function to control for reporting fatigue were associated with substantially improved estimation accuracy relative to models with no reporting fatigue adjustment, and that no cap on the number of repeat participations was required. These results indicate that existing longitudinal contact survey data can be meaningfully interpreted under an easy-to-implement statistical approach addressing reporting fatigue confounding, and that longitudinal designs including repeat participants are a viable option for future social contact surveys.
Submitted 6 November, 2024;
originally announced November 2024.
-
Nonstationary Sparse Spectral Permanental Process
Authors:
Zicheng Sun,
Yixuan Zhang,
Zenan Ling,
Xuhui Fan,
Feng Zhou
Abstract:
Existing permanental processes often impose constraints on kernel types or stationarity, limiting the model's expressiveness. To overcome these limitations, we propose a novel approach utilizing the sparse spectral representation of nonstationary kernels. This technique relaxes the constraints on kernel types and stationarity, allowing for more flexible modeling while reducing computational complexity to linear. Additionally, we introduce a deep kernel variant by hierarchically stacking multiple spectral feature mappings, further enhancing the model's expressiveness to capture complex patterns in data. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of our approach, particularly in scenarios with pronounced data nonstationarity. Finally, ablation studies provide insights into the impact of various hyperparameters on model performance.
Submitted 18 December, 2024; v1 submitted 4 October, 2024;
originally announced October 2024.
-
Deep Equilibrium Models are Almost Equivalent to Not-so-deep Explicit Models for High-dimensional Gaussian Mixtures
Authors:
Zenan Ling,
Longbo Li,
Zhanbo Feng,
Yixuan Zhang,
Feng Zhou,
Robert C. Qiu,
Zhenyu Liao
Abstract:
Deep equilibrium models (DEQs), a typical class of implicit neural networks, have demonstrated remarkable success on various tasks. There is, however, a lack of theoretical understanding of the connections and differences between implicit DEQs and explicit neural network models. In this paper, leveraging recent advances in random matrix theory (RMT), we perform an in-depth analysis of the eigenspectra of the conjugate kernel (CK) and neural tangent kernel (NTK) matrices for implicit DEQs when the input data are drawn from a high-dimensional Gaussian mixture. We prove that, in this setting, the spectral behavior of these Implicit-CKs and NTKs depends on the DEQ activation function and initial weight variances, but only via a system of four nonlinear equations. As a direct consequence of this theoretical result, we demonstrate that a shallow explicit network can be carefully designed to produce the same CK or NTK as a given DEQ. Although the theory is derived here for Gaussian mixture data, empirical results show that the proposed theory and design principle also apply to popular real-world datasets.
Submitted 19 May, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
Revisiting Logistic-softmax Likelihood in Bayesian Meta-Learning for Few-Shot Classification
Authors:
Tianjun Ke,
Haoqun Cao,
Zenan Ling,
Feng Zhou
Abstract:
Meta-learning has demonstrated promising results in few-shot classification (FSC) by learning to solve new problems using prior knowledge. Bayesian methods are effective at characterizing uncertainty in FSC, which is crucial in high-risk fields. In this context, the logistic-softmax likelihood is often employed as an alternative to the softmax likelihood in multi-class Gaussian process classification due to its conditional conjugacy property. However, the theoretical properties of logistic-softmax are not clear, and previous research indicated that the inherent uncertainty of logistic-softmax leads to suboptimal performance. To mitigate these issues, we revisit and redesign the logistic-softmax likelihood, which enables control of the \textit{a priori} confidence level through a temperature parameter. Furthermore, we theoretically and empirically show that softmax can be viewed as a special case of logistic-softmax and that logistic-softmax induces a larger family of data distributions than softmax. Utilizing the modified logistic-softmax, we integrate the data augmentation technique into the deep kernel-based Gaussian process meta-learning framework and derive an analytical mean-field approximation for task-specific updates. Our approach yields well-calibrated uncertainty estimates and achieves comparable or superior results on standard benchmark datasets. Code is publicly available at \url{https://github.com/keanson/revisit-logistic-softmax}.
Submitted 10 October, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
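To make the two likelihoods in the abstract above concrete, the sketch below implements the standard softmax and the standard logistic-softmax, p(y = k | f) = sigmoid(f_k) / sum_j sigmoid(f_j). It also shows one elementary link between them: shifting all logits by a large negative constant makes sigmoid behave like exp, so logistic-softmax approaches softmax. This is an illustration only; it does not reproduce the paper's temperature-controlled redesign, and the shift constant is an arbitrary choice.

```python
import math


def softmax(f):
    m = max(f)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in f]
    s = sum(exps)
    return [e / s for e in exps]


def sigmoid(x):
    # numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logistic_softmax(f):
    """p(y = k | f) = sigmoid(f_k) / sum_j sigmoid(f_j)."""
    s = [sigmoid(x) for x in f]
    total = sum(s)
    return [v / total for v in s]


logits = [2.0, 0.5, -1.0]
print(logistic_softmax(logits))  # flatter (less confident) than softmax
print(softmax(logits))
shifted = [x - 30.0 for x in logits]
print(logistic_softmax(shifted))  # approximately equal to softmax(logits)
```

The flatter output of plain logistic-softmax on the same logits is one way to see the "inherent uncertainty" the abstract refers to.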
-
On the Equivalence between Implicit and Explicit Neural Networks: A High-dimensional Viewpoint
Authors:
Zenan Ling,
Zhenyu Liao,
Robert C. Qiu
Abstract:
Implicit neural networks have demonstrated remarkable success in various tasks. However, there is a lack of theoretical analysis of the connections and differences between implicit and explicit networks. In this paper, we study high-dimensional implicit neural networks and provide the high-dimensional equivalents for the corresponding conjugate kernels and neural tangent kernels. Building on this, we establish the equivalence between implicit and explicit networks in high dimensions.
Submitted 30 August, 2023;
originally announced August 2023.
-
Global Convergence of Over-parameterized Deep Equilibrium Models
Authors:
Zenan Ling,
Xingyu Xie,
Qiuhao Wang,
Zongpeng Zhang,
Zhouchen Lin
Abstract:
A deep equilibrium model (DEQ) is implicitly defined through an equilibrium point of an infinite-depth weight-tied model with input injection. Instead of performing infinite computations, it solves for the equilibrium point directly with root-finding and computes gradients with implicit differentiation. This study investigates the training dynamics of over-parameterized DEQs. Assuming a condition on the initial equilibrium point, we show that a unique equilibrium point always exists during training and prove that gradient descent converges to a globally optimal solution at a linear rate for the quadratic loss function. To show that the required initial condition is satisfied under mild over-parameterization, we perform a fine-grained analysis of random DEQs. We propose a novel probabilistic framework to overcome the technical difficulty in the non-asymptotic analysis of infinite-depth weight-tied models.
Submitted 28 March, 2023; v1 submitted 27 May, 2022;
originally announced May 2022.
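The abstract above describes a DEQ as the equilibrium z* = f(z*; x) of a weight-tied layer with input injection. The sketch below shows only the forward equilibrium solve for the hypothetical layer f(z; x) = tanh(W z + U x), using plain fixed-point (Picard) iteration. Practical DEQs typically use faster root-finders such as Broyden's method or Anderson acceleration, and compute gradients by implicit differentiation, which is not shown. The contraction condition used here (absolute row sums of W below 1) is our simplifying assumption, not the paper's condition on the initial equilibrium point.

```python
import math


def layer(W, U, x, z):
    """Weight-tied layer with input injection: f(z; x) = tanh(W z + U x)."""
    return [
        math.tanh(sum(w * zj for w, zj in zip(W_row, z)) + sum(u * xk for u, xk in zip(U_row, x)))
        for W_row, U_row in zip(W, U)
    ]


def solve_equilibrium(W, U, x, tol=1e-10, max_iter=1000):
    """Find z* = f(z*; x) by plain fixed-point (Picard) iteration.

    Converges when z -> f(z; x) is a contraction, e.g. when the largest absolute
    row sum of W is below 1 (tanh is 1-Lipschitz).
    """
    z = [0.0] * len(W)
    for step in range(1, max_iter + 1):
        z_next = layer(W, U, x, z)
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            return z_next, step
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")


W = [[0.3, -0.2], [0.1, 0.4]]  # max absolute row sum 0.5 < 1
U = [[1.0, 0.0], [0.5, -1.0]]
x = [0.7, -0.3]
z_star, steps = solve_equilibrium(W, U, x)
print(z_star, steps)
```

The same z* would be reached by stacking the layer infinitely many times, which is why the "infinite-depth" model can be evaluated with a finite solve.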
-
Any Part of Bayesian Network Structure Learning
Authors:
Zhaolong Ling,
Kui Yu,
Hao Wang,
Lin Liu,
Jiuyong Li
Abstract:
We study an interesting and challenging problem: learning any part of a Bayesian network (BN) structure. For this problem, using existing global BN structure learning algorithms to find an entire BN structure just to obtain the part of interest is computationally inefficient, while local BN structure learning algorithms encounter the false edge orientation problem when applied directly. In this paper, we first present a new concept, Expand-Backtracking, to explain why local BN structure learning methods have the false edge orientation problem, and then propose APSL, an efficient and accurate Any Part of BN Structure Learning algorithm. Specifically, APSL divides the V-structures in a Markov blanket (MB) into two types, collider V-structures and non-collider V-structures; it then starts from a node of interest and recursively finds both collider and non-collider V-structures in the found MBs until the part of the BN structure of interest is oriented. To improve the efficiency of APSL, we further design APSL-FS, a variant of APSL that uses feature selection. Extensive experiments on six benchmark BNs validate the efficiency and accuracy of our methods.
Submitted 23 March, 2021;
originally announced March 2021.
-
State Alignment-based Imitation Learning
Authors:
Fangchen Liu,
Zhan Ling,
Tongzhou Mu,
Hao Su
Abstract:
Consider an imitation learning problem in which the imitator and the expert have different dynamics models. Most current imitation learning methods fail in this setting because they focus on imitating actions. We propose a novel state alignment-based imitation learning method that trains the imitator to follow the state sequences in expert demonstrations as closely as possible. The state alignment comes from both local and global perspectives, and we combine them into a reinforcement learning framework through a regularized policy update objective. We show the superiority of our method both in standard imitation learning settings and in settings where the expert and imitator have different dynamics models.
Submitted 21 November, 2019;
originally announced November 2019.
-
Causality-based Feature Selection: Methods and Evaluations
Authors:
Kui Yu,
Xianjie Guo,
Lin Liu,
Jiuyong Li,
Hao Wang,
Zhaolong Ling,
Xindong Wu
Abstract:
Feature selection is a crucial preprocessing step in data analytics and machine learning. Classical feature selection algorithms select features based on the correlations between predictive features and the class variable and do not attempt to capture causal relationships between them. It has been shown that knowledge of the causal relationships between features and the class variable has potential benefits for building interpretable and robust prediction models, since causal relationships reflect the underlying mechanism of a system. Consequently, causality-based feature selection has gradually attracted more attention, and many algorithms have been proposed. In this paper, we present a comprehensive review of recent advances in causality-based feature selection. To facilitate the development of new algorithms in this research area and to make it easy to compare new methods with existing ones, we develop the first open-source package, called CausalFS, which contains most of the representative causality-based feature selection algorithms (available at https://github.com/kuiy/CausalFS). Using CausalFS, we conduct extensive experiments comparing the representative algorithms on both synthetic and real-world data sets. Finally, we discuss some challenging problems to be tackled in future causality-based feature selection research.
Submitted 16 November, 2019;
originally announced November 2019.
-
Spectrum concentration in deep residual learning: a free probability approach
Authors:
Zenan Ling,
Xing He,
Robert C. Qiu
Abstract:
We revisit the initialization of deep residual networks (ResNets) by introducing to the deep learning community a novel analytical tool from free probability. This tool deals with non-Hermitian random matrices, rather than their conventional Hermitian counterparts in the literature. As a consequence, it enables us to evaluate the singular value spectrum of the input-output Jacobian of a fully-connected deep ResNet in both the linear and nonlinear cases. With this tool, we conduct an asymptotic analysis of the spectrum in the single-layer case and then extend the analysis to the multi-layer case with an arbitrary number of layers. In particular, we propose to rescale the classical random initialization by the number of residual units, so that the spectrum remains of order $O(1)$ even for networks of large width and depth. We empirically demonstrate that the proposed initialization scheme learns orders of magnitude faster than the classical ones, which attests to the strong practical relevance of this investigation.
Submitted 24 February, 2019; v1 submitted 31 July, 2018;
originally announced July 2018.
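Why rescaling by the number of residual units keeps the Jacobian spectrum of order O(1) can be seen in a scalar, linear toy ResNet x_{l+1} = (1 + w_l) x_l with iid w_l ~ N(0, v). The end-to-end Jacobian is J = prod_l (1 + w_l), so E[J^2] = (1 + v)^L: it explodes for a fixed variance but stays below e^{sigma^2} when v = sigma^2 / L. This sketch assumes that "rescale by the number of residual units" means dividing the weight variance by L; the paper's exact scaling and its matrix-valued free-probability analysis may differ.

```python
import math


def expected_sq_gain(num_units, weight_var):
    """E[J^2] for the scalar linear ResNet x_{l+1} = (1 + w_l) x_l with w_l ~ N(0, weight_var) iid.

    J = prod_l (1 + w_l); independence gives E[J^2] = prod_l E[(1 + w_l)^2] = (1 + weight_var)^L.
    """
    return (1.0 + weight_var) ** num_units


sigma2 = 1.0
for L in (10, 100, 1000):
    classical = expected_sq_gain(L, sigma2)     # grows like 2^L
    rescaled = expected_sq_gain(L, sigma2 / L)  # bounded by exp(sigma2)
    print(f"L={L:5d}  classical={classical:.3e}  rescaled={rescaled:.4f}  bound={math.exp(sigma2):.4f}")
```

Since (1 + sigma^2 / L)^L increases toward e^{sigma^2} but never exceeds it, the rescaled gain stays bounded no matter how many residual units are stacked.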
-
A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment
Authors:
Tomi Kinnunen,
Jaime Lorenzo-Trueba,
Junichi Yamagishi,
Tomoki Toda,
Daisuke Saito,
Fernando Villavicencio,
Zhenhua Ling
Abstract:
Voice conversion (VC) aims to convert speaker characteristics without altering content. Due to training data limitations and modeling imperfections, it is difficult to achieve believable speaker mimicry without introducing processing artifacts; performance assessment of VC therefore usually involves both speaker similarity and quality evaluation by a human panel. Because this is a time-consuming, expensive, and non-reproducible process, it hinders rapid prototyping of new VC technology. We address artifact assessment using an alternative, objective approach that leverages prior work on spoofing countermeasures (CMs) for automatic speaker verification. There, CMs are used to reject `fake' inputs such as replayed, synthetic, or converted speech, but their potential for automatic speech artifact assessment remains unknown. This study serves to fill that gap. As a supplement to subjective results for the 2018 Voice Conversion Challenge (VCC'18) data, we configure a standard constant-Q cepstral coefficient CM to quantify the extent of processing artifacts. The equal error rate (EER) of the CM, a confusability index of VC samples with real human speech, serves as our artifact measure. Two clusters of VCC'18 entries are identified: low-quality ones with detectable artifacts (low EERs), and higher-quality ones with fewer artifacts. None of the VCC'18 systems, however, is perfect: all EERs are < 30 % (the `ideal' value would be 50 %). Our preliminary findings suggest the potential of CMs outside their original application, as a supplemental optimization and benchmarking tool to enhance VC technology.
Submitted 4 September, 2018; v1 submitted 23 April, 2018;
originally announced April 2018.
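The abstract above uses the countermeasure's equal error rate (EER) as an artifact measure: the operating point where the rate of falsely rejected bona fide speech equals the rate of falsely accepted converted speech. The sketch below approximates the EER by sweeping a threshold over the observed scores. The score convention (higher means more bona fide) and the toy scores are assumptions for illustration; standard evaluation toolkits may interpolate the error curves differently.

```python
def equal_error_rate(bonafide, spoof):
    """Approximate EER by sweeping a decision threshold over all observed scores.

    Convention (an assumption): higher score = more likely bona fide; a trial is
    accepted when score >= threshold.
    """
    thresholds = sorted(set(bonafide) | set(spoof)) + [float("inf")]
    best_gap, best_eer = float("inf"), None
    for t in thresholds:
        frr = sum(s < t for s in bonafide) / len(bonafide)  # bona fide wrongly rejected
        far = sum(s >= t for s in spoof) / len(spoof)       # converted speech wrongly accepted
        if abs(frr - far) < best_gap:
            best_gap, best_eer = abs(frr - far), (frr + far) / 2
    return best_eer


print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # 0.0: fully separable, obvious artifacts
print(equal_error_rate([1, 2, 3, 4], [1, 2, 3, 4]))        # 0.5: indistinguishable from human speech
```

This is why 50 % is the "ideal" EER for a VC system in this setting: it means the countermeasure can do no better than chance at telling converted speech from human speech.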
-
The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods
Authors:
Jaime Lorenzo-Trueba,
Junichi Yamagishi,
Tomoki Toda,
Daisuke Saito,
Fernando Villavicencio,
Tomi Kinnunen,
Zhenhua Ling
Abstract:
We present the Voice Conversion Challenge 2018, designed as a follow-up to the 2016 edition with the aim of providing a common framework for evaluating and comparing different state-of-the-art voice conversion (VC) systems. The objective of the challenge was to perform speaker conversion (i.e., transform the vocal identity) from a source speaker to a target speaker while preserving linguistic information. As an update to the previous challenge, we considered both parallel and non-parallel data, forming the Hub and Spoke tasks, respectively. A total of 23 teams from around the world submitted their systems, 11 of which additionally participated in the optional Spoke task. A large-scale crowdsourced perceptual evaluation was then carried out to rate the submitted converted speech in terms of naturalness and similarity to the target speaker identity. In this paper, we present a brief summary of state-of-the-art VC techniques, followed by a detailed explanation of the challenge tasks and the results obtained.
Submitted 11 April, 2018;
originally announced April 2018.
-
A New Approach of Exploiting Self-Adjoint Matrix Polynomials of Large Random Matrices for Anomaly Detection and Fault Location
Authors:
Zenan Ling,
Robert C. Qiu,
Xing He,
Lei Chu
Abstract:
Synchronized measurements of a large power grid offer an unprecedented opportunity to study spatio-temporal correlations. Statistical analytics for such massive datasets start with high-dimensional data matrices. Since uncertainty is ubiquitous in future power grids, these data matrices are treated as random matrices. This viewpoint is fundamental to our theoretical analysis, because true covariance matrices cannot be estimated accurately in a high-dimensional regime. As an alternative, we use large-dimensional sample covariance matrices in the asymptotic regime in place of the true covariance matrices. Self-adjoint polynomials of large-dimensional random matrices are studied as statistics for big data analytics. Calculating the asymptotic spectrum distribution (ASD) of such a matrix polynomial is understandably challenging. This task is made possible by a recent breakthrough in free probability, an active research branch of random matrix theory, and this breakthrough is what initially inspired the present work. The new approach is interesting in many respects, of which the mathematical one may be the most critical; nevertheless, it can also be used to solve real-world problems.
Submitted 9 February, 2018;
originally announced February 2018.
-
Early Anomaly Detection and Location in Distribution Network: A Data-Driven Approach
Authors:
Xin Shi,
Robert Qiu,
Xing He,
Zenan Ling,
Haosen Yang,
Lei Chu
Abstract:
The measurement data collected from the supervisory control and data acquisition (SCADA) system installed in a distribution network can effectively reflect the operational state of the network. In this paper, a random matrix theory (RMT) based approach is developed for early anomaly detection and localization using these data. For every feeder in the distribution network, a corresponding data matrix is formed. Based on the Marchenko-Pastur law for the empirical spectral analysis of the covariance `signal+noise' matrix, linear eigenvalue statistics are introduced to indicate the anomaly, and the outliers and their corresponding eigenvectors are analyzed to locate the anomaly. For low-observability feeders in the distribution network, an algorithm that increases the data dimension is designed so that the formulated low-dimensional matrices can be analyzed more accurately. The developed approach can detect and localize the anomaly at an early stage, and it is robust to random disturbance and measurement error. Case studies on Matpower simulation data and real SCADA data corroborate the feasibility of the approach.
Submitted 11 March, 2020; v1 submitted 5 January, 2018;
originally announced January 2018.
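The Marchenko-Pastur reasoning in the abstract above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' algorithm: it assumes the rows of the data matrix are standardized to noise variance `sigma2`, that the aspect ratio is c = N/T (N channels, T samples) with 0 < c <= 1, and that the eigenvalues of the sample covariance matrix were computed beforehand (for example with `numpy.linalg.eigvalsh`). All helper names are hypothetical. Eigenvalues outside the bulk edges sigma2(1 ± √c)² are flagged as outliers, and a linear eigenvalue statistic is the sum of a test function over all eigenvalues.

```python
import math


def mp_edges(c, sigma2=1.0):
    """Bulk edges of the Marchenko-Pastur law for aspect ratio c = N / T."""
    if not 0 < c <= 1:
        raise ValueError("this sketch assumes 0 < c <= 1")
    root = math.sqrt(c)
    return sigma2 * (1 - root) ** 2, sigma2 * (1 + root) ** 2


def mp_outliers(eigenvalues, c, sigma2=1.0):
    """Eigenvalues outside the noise bulk: candidate 'signal' (anomaly) components."""
    lower, upper = mp_edges(c, sigma2)
    return [lam for lam in eigenvalues if lam < lower or lam > upper]


def linear_eigenvalue_statistic(eigenvalues, f):
    """LES: sum of a test function f over all eigenvalues (f(x) = x gives the trace)."""
    return sum(f(lam) for lam in eigenvalues)
```

For instance, with c = 0.25 and unit noise variance the bulk is [0.25, 2.25], so an eigenvalue of 5.0 would be flagged. With finite N and T the largest noise eigenvalues fluctuate slightly beyond the asymptotic edge, so a practical detector would likely add a small safety margin.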
-
A Data-driven Approach to Multi-event Analytics in Large-scale Power Systems Using Factor Model
Authors:
Fan Yang,
Xing He,
Robert Caiming Qiu,
Zenan Ling
Abstract:
Real-time multi-event detection and recognition is challenging for a modern grid, as the features of such events are usually non-identifiable. Based on a factor model, this paper proposes a data-driven method as an alternative solution within the framework of random matrix theory. This method maps the raw data into a high-dimensional space with two parts: 1) the principal components (factors, mapping event signals); and 2) time-series residuals (bulk, mapping white/non-Gaussian noises). The spatial information is extracted from the factors, and the temporal information from the residuals. Taking spatio-temporal correlations into account, this method is able to reveal the multiple events: their components and their respective details, e.g., occurrence time. Case studies based on the standard IEEE 118-bus system validate the proposed method.
Submitted 23 December, 2017;
originally announced December 2017.
-
Invisible Units Detection and Estimation Based on Random Matrix Theory
Authors:
Xing He,
Lei Chu,
Robert C. Qiu,
Qian Ai,
Zenan Ling,
Jian Zhang
Abstract:
Invisible units mainly refer to small-scale units that are not monitored by, and thus not visible to, utilities. Integrating these invisible units into power systems significantly affects the way in which a distribution grid is planned and operated. This paper, based on random matrix theory (RMT), proposes a statistical, data-driven framework to handle the massive grid data, in contrast to its deterministic, model-based counterpart. Combining the RMT-based data-mining framework with conventional techniques, some heuristics are derived as the solution to the invisible units detection and estimation task: linear eigenvalue statistic indicators (LESs) are suggested as the main ingredients of the solution, and, according to the statistical properties of LESs, hypothesis testing is formulated to conduct change point detection in the high-dimensional space. The proposed method is promising for anomaly detection and pertinent to current distribution networks: it is capable of detecting invisible power usage and fraudulent behavior, and can even locate the suspect. Case studies, using both simulated data and actual data, validate the proposed method.
Submitted 9 December, 2023; v1 submitted 29 October, 2017;
originally announced October 2017.
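The LES-based hypothesis testing described above can be illustrated with a minimal Python sketch. As a simplifying assumption, an LES has already been computed for each moving window of grid data, and its null-hypothesis mean and standard deviation were estimated from a period of normal operation. Because the central limit theorem for LESs makes the statistic approximately Gaussian, a two-sided z-test is a natural choice here. The function name, the threshold `z_crit`, and the numbers in the example are hypothetical, not taken from the paper.

```python
def first_change_point(les_series, null_mean, null_std, z_crit=3.0):
    """Return the index of the first window whose LES rejects the null, else None.

    Assumes (hypothetically) that, without an anomaly, each LES value is
    approximately Normal(null_mean, null_std**2).
    """
    if null_std <= 0:
        raise ValueError("null_std must be positive")
    for k, value in enumerate(les_series):
        z = (value - null_mean) / null_std  # standardized statistic
        if abs(z) > z_crit:  # two-sided test at roughly the 0.3 % level for z_crit = 3
            return k
    return None
```

For example, with a null mean of 10.0 and a null standard deviation of 0.2, the series [10.0, 10.1, 9.9, 15.0, 16.0] first rejects the null at index 3, which would be reported as the change point.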
-
Spatio-Temporal Big Data Analysis for Smart Grids Based on Random Matrix Theory: A Comprehensive Study
Authors:
Robert Qiu,
Lei Chu,
Xing He,
Zenan Ling,
Haichun Liu
Abstract:
A cornerstone of the smart grid is advanced monitorability of its assets and operations. The increasingly pervasive installation of phasor measurement units (PMUs) allows so-called synchrophasor measurements to be taken roughly 100 times faster than legacy supervisory control and data acquisition (SCADA) measurements, time-stamped using global positioning system (GPS) signals to capture the grid dynamics. In addition, the availability of low-latency two-way communication networks will pave the way for high-precision real-time grid state estimation and detection, remedial actions upon network instability, and accurate risk analysis and post-event assessment for failure prevention.
In this chapter, we first model spatio-temporal PMU data in large-scale grids as random matrix sequences. Second, some basic principles of random matrix theory (RMT), such as asymptotic spectrum laws, transforms, convergence rates, and free probability, are briefly introduced to facilitate the understanding and application of RMT techniques. Lastly, case studies based on synthetic and real data are developed to evaluate the performance of the RMT-based schemes in different application scenarios (i.e., state evaluation and situation awareness).
Submitted 15 August, 2017;
originally announced August 2017.
-
A Novel Approach for Big Data Analytics in Future Grids Based on Free Probability
Authors:
Zenan Ling,
Robert C. Qiu,
Xing He,
Chu Lei
Abstract:
Based on the random matrix model, we can build statistical models using massive datasets across the power grid, and employ hypothesis testing for anomaly detection. First, this paper makes the first attempt to apply a recent free probability result to big data analytics, in particular data fusion. The nature of this work is basic in that new algorithms and analytics tools are proposed to pave the way for future research. Second, using the new analytic tool, we are able to make discoveries related to anomaly detection that are very difficult to obtain with other approaches. To the best of our knowledge, there is no similar report in the literature. Third, both linear and nonlinear polynomials of large random matrices can be handled in this new framework. Simulations demonstrate the following: compared with linear polynomials, nonlinear ones are more flexible in problem modeling and closer to the nature of reality. In some sense, some other nonlinear matrix polynomials may be more effective for the power grid.
Submitted 4 December, 2016;
originally announced December 2016.
-
A Novel Data-Driven Situation Awareness Approach for Future Grids--Using Large Random Matrices for Big Data Modeling
Authors:
Xing He,
Lei Chu,
Robert C. Qiu,
Qian Ai,
Zenan Ling
Abstract:
Data-driven approaches, when tasked with situation awareness, are suitable for complex grids with massive datasets. It is a challenge, however, to efficiently turn these massive datasets into useful big data analytics. To address this challenge, this paper proposes a data-driven approach based on random matrix theory (RMT). The approach models massive datasets as large random matrices; it is model-free and requires no knowledge of physical model parameters. In particular, the large data dimension N and the large time span T, from the spatial and the temporal aspect respectively, lead to favorable results. Notably, the linear eigenvalue statistics (LESs) built from the data matrices follow Gaussian distributions under very general conditions, owing to recent breakthroughs in probability theory on central limit theorems for LESs. Numerous case studies, with both simulated data and field data, are given to validate the proposed new algorithms.
Submitted 16 January, 2018; v1 submitted 17 October, 2016;
originally announced October 2016.
-
Massive Streaming PMU Data Modeling and Analytics in Smart Grid State Evaluation Based on Multiple High-Dimensional Covariance Tests
Authors:
Lei Chu,
Robert Qiu,
Xing He,
Zenan Ling,
Yadong Liu
Abstract:
The analogous deployment of phase measurement units (PMUs), the increase in data volume, and the deregulation of the energy market all call for robust state evaluation in large-scale power systems. Implementing model-based estimators is impractical because of the complexity of solving the high-dimensional power flow equations. In this paper, we first represent massive streaming PMU data as a big random matrix flow. By exploiting the variations in the covariance matrix of the massive streaming PMU data, a novel power state evaluation algorithm is then developed based on multiple high-dimensional covariance matrix tests. The proposed test statistic is flexible and nonparametric, assuming no specific parameter distribution or dimension structure for the PMU data. Moreover, it can jointly reveal the relative magnitude, duration, and location of a system event. For practical application, we reduce the computation of the proposed test statistic from $O(\varepsilon n_g^4)$ to $O(\eta n_g^2)$ through principal component calculation and redundant computation elimination. The novel algorithm is numerically evaluated using the IEEE 30- and 118-bus systems, a Polish 2383-bus system, and a real 34-PMU system. The case studies illustrate and verify the superiority of the proposed state evaluation indicator.
Submitted 22 June, 2017; v1 submitted 12 September, 2016;
originally announced September 2016.
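As a rough illustration of "exploiting the variations in the covariance matrix" of streaming PMU data, the Python sketch below computes the sample covariance of two data windows and their squared Frobenius distance. It is deliberately naive and is not the multiple high-dimensional covariance test proposed in the paper; the window layout (T samples, each a list of N channel readings), the function names, and the choice of distance are assumptions made for illustration only.

```python
def sample_covariance(window):
    """Unbiased sample covariance of a window given as T rows of N channel readings."""
    t = len(window)
    if t < 2:
        raise ValueError("need at least two samples per window")
    n = len(window[0])
    means = [sum(row[j] for row in window) / t for j in range(n)]
    centered = [[row[j] - means[j] for j in range(n)] for row in window]
    return [
        [sum(r[i] * r[j] for r in centered) / (t - 1) for j in range(n)]
        for i in range(n)
    ]


def frobenius_shift(cov_a, cov_b):
    """Squared Frobenius distance between two covariance matrices (a naive change score)."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(cov_a, cov_b)
        for a, b in zip(row_a, row_b)
    )
```

Comparing consecutive windows this way yields a score that rises when the correlation structure changes. In the high-dimensional regime, however, sample covariances are noisy estimates of the true covariance, which is why dedicated high-dimensional test statistics, rather than a raw distance like this one, are needed for reliable decisions.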
-
Designing for Situation Awareness of Future Power Grids: An Indicator System Based on Linear Eigenvalue Statistics of Large Random Matrices
Authors:
Xing He,
Robert C. Qiu,
Qian Ai,
Lei Chu,
Xinyi Xu,
Zenan Ling
Abstract:
Future power grids are fundamentally different from current ones, both in size and in complexity; this trend imposes challenges for situation awareness (SA) based on classical indicators, which are usually model-based and deterministic. As an alternative, this paper proposes a statistical indicator system based on linear eigenvalue statistics (LESs) of large random matrices: 1) from a data modeling viewpoint, we build, starting from the power-flow equations, random matrix models (RMMs) that use only the real-time data flow in a statistical manner; 2) for a data analysis fully driven by the RMMs, we put forward high-dimensional indicators, called LESs, that have some unique statistical features such as Gaussian properties; and 3) we develop a three-dimensional (3D) power-map to visualize the system from a high-dimensional viewpoint and a low-dimensional one, respectively. Therefore, a statistical methodology of SA is employed; it conducts SA with a model-free and data-driven procedure, requiring no knowledge of system topologies, unit operation/control models, causal relationships, etc. This methodology has numerous advantages, such as sensitivity, universality, speed, and flexibility. In particular, its robustness against bad data is highlighted, with potential advantages in cyber security. A theory of big-data-based stability for online operations may prove feasible along this line of work, although this critical development will be reported elsewhere.
Submitted 6 July, 2016; v1 submitted 22 December, 2015;
originally announced December 2015.