-
Attentional Graph Meta-Learning for Indoor Localization Using Extremely Sparse Fingerprints
Authors:
Wenzhong Yan,
Feng Yin,
Jun Gao,
Ao Wang,
Yang Tian,
Ruizhi Chen
Abstract:
Fingerprint-based indoor localization is often labor-intensive due to the need for dense grids and repeated measurements across time and space. Maintaining high localization accuracy with extremely sparse fingerprints remains a persistent challenge. Existing benchmark methods primarily rely on the measured fingerprints, while neglecting valuable spatial and environmental characteristics. In this p…
▽ More
Fingerprint-based indoor localization is often labor-intensive due to the need for dense grids and repeated measurements across time and space. Maintaining high localization accuracy with extremely sparse fingerprints remains a persistent challenge. Existing benchmark methods primarily rely on the measured fingerprints, while neglecting valuable spatial and environmental characteristics. In this paper, we propose a systematic integration of an Attentional Graph Neural Network (AGNN) model, capable of learning spatial adjacency relationships and aggregating information from neighboring fingerprints, and a meta-learning framework that utilizes datasets with similar environmental characteristics to enhance model training. To minimize the labor required for fingerprint collection, we introduce two novel data augmentation strategies: 1) unlabeled fingerprint augmentation using moving platforms, which enables the semi-supervised AGNN model to incorporate information from unlabeled fingerprints, and 2) synthetic labeled fingerprint augmentation through environmental digital twins, which enhances the meta-learning framework through a practical distribution alignment, which can minimize the feature discrepancy between synthetic and real-world fingerprints effectively. By integrating these novel modules, we propose the Attentional Graph Meta-Learning (AGML) model. This novel model combines the strengths of the AGNN model and the meta-learning framework to address the challenges posed by extremely sparse fingerprints. To validate our approach, we collected multiple datasets from both consumer-grade WiFi devices and professional equipment across diverse environments. Extensive experiments conducted on both synthetic and real-world datasets demonstrate that the AGML model-based localization method consistently outperforms all baseline methods using sparse fingerprints across all evaluated metrics.
△ Less
Submitted 7 April, 2025;
originally announced April 2025.
-
Efficient Transformed Gaussian Process State-Space Models for Non-Stationary High-Dimensional Dynamical Systems
Authors:
Zhidi Lin,
Ying Li,
Feng Yin,
Juan Maroñas,
Alexandre H. Thiéry
Abstract:
Gaussian process state-space models (GPSSMs) offer a principled framework for learning and inference in nonlinear dynamical systems with uncertainty quantification. However, existing GPSSMs are limited by the use of multiple independent stationary Gaussian processes (GPs), leading to prohibitive computational and parametric complexity in high-dimensional settings and restricted modeling capacity f…
▽ More
Gaussian process state-space models (GPSSMs) offer a principled framework for learning and inference in nonlinear dynamical systems with uncertainty quantification. However, existing GPSSMs are limited by the use of multiple independent stationary Gaussian processes (GPs), leading to prohibitive computational and parametric complexity in high-dimensional settings and restricted modeling capacity for non-stationary dynamics. To address these challenges, we propose an efficient transformed Gaussian process state-space model (ETGPSSM) for scalable and flexible modeling of high-dimensional, non-stationary dynamical systems. Specifically, our ETGPSSM integrates a single shared GP with input-dependent normalizing flows, yielding an expressive implicit process prior that captures complex, non-stationary transition dynamics while significantly reducing model complexity. For the inference of the implicit process, we develop a variational inference algorithm that jointly approximates the posterior over the underlying GP and the neural network parameters defining the normalizing flows. To avoid explicit variational parameterization of the latent states, we further incorporate the ensemble Kalman filter (EnKF) into the variational framework, enabling accurate and efficient state estimation. Extensive empirical evaluations on synthetic and real-world datasets demonstrate the superior performance of our ETGPSSM in system dynamics learning, high-dimensional state estimation, and time-series forecasting, outperforming existing GPSSMs and neural network-based SSMs in terms of computational efficiency and accuracy.
△ Less
Submitted 14 May, 2025; v1 submitted 23 March, 2025;
originally announced March 2025.
-
Preventing Model Collapse in Gaussian Process Latent Variable Models
Authors:
Ying Li,
Zhidi Lin,
Feng Yin,
Michael Minyi Zhang
Abstract:
Gaussian process latent variable models (GPLVMs) are a versatile family of unsupervised learning models commonly used for dimensionality reduction. However, common challenges in modeling data with GPLVMs include inadequate kernel flexibility and improper selection of the projection noise, leading to a type of model collapse characterized by vague latent representations that do not reflect the unde…
▽ More
Gaussian process latent variable models (GPLVMs) are a versatile family of unsupervised learning models commonly used for dimensionality reduction. However, common challenges in modeling data with GPLVMs include inadequate kernel flexibility and improper selection of the projection noise, leading to a type of model collapse characterized by vague latent representations that do not reflect the underlying data structure. This paper addresses these issues by, first, theoretically examining the impact of projection variance on model collapse through the lens of a linear GPLVM. Second, we tackle model collapse due to inadequate kernel flexibility by integrating the spectral mixture (SM) kernel and a differentiable random Fourier feature (RFF) kernel approximation, which ensures computational scalability and efficiency through off-the-shelf automatic differentiation tools for learning the kernel hyperparameters, projection variance, and latent representations within the variational inference framework. The proposed GPLVM, named advisedRFLVM, is evaluated across diverse datasets and consistently outperforms various salient competing models, including state-of-the-art variational autoencoders (VAEs) and other GPLVM variants, in terms of informative latent representations and missing data imputation.
△ Less
Submitted 18 June, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference
Authors:
Zhidi Lin,
Yiyong Sun,
Feng Yin,
Alexandre Hoang Thiéry
Abstract:
The Gaussian process state-space models (GPSSMs) represent a versatile class of data-driven nonlinear dynamical system models. However, the presence of numerous latent variables in GPSSM incurs unresolved issues for existing variational inference approaches, particularly under the more realistic non-mean-field (NMF) assumption, including extensive training effort, compromised inference accuracy, a…
▽ More
The Gaussian process state-space models (GPSSMs) represent a versatile class of data-driven nonlinear dynamical system models. However, the presence of numerous latent variables in GPSSM incurs unresolved issues for existing variational inference approaches, particularly under the more realistic non-mean-field (NMF) assumption, including extensive training effort, compromised inference accuracy, and infeasibility for online applications, among others. In this paper, we tackle these challenges by incorporating the ensemble Kalman filter (EnKF), a well-established model-based filtering technique, into the NMF variational inference framework to approximate the posterior distribution of the latent states. This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO). Moreover, owing to the streamlined parameterization via the EnKF, the new GPSSM model can be easily accommodated in online learning applications. We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting. We also provide detailed analysis and fresh insights for the proposed algorithms. Comprehensive evaluation across diverse real and synthetic datasets corroborates the superior learning and inference performance of our EnKF-aided variational inference algorithms compared to existing methods.
△ Less
Submitted 22 July, 2024; v1 submitted 10 December, 2023;
originally announced December 2023.
-
Attentional Graph Neural Network Is All You Need for Robust Massive Network Localization
Authors:
Wenzhong Yan,
Feng Yin,
Juntao Wang,
Geert Leus,
Abdelhak M. Zoubir,
Yang Tian
Abstract:
In this paper, we design Graph Neural Networks (GNNs) with attention mechanisms to tackle an important yet challenging nonlinear regression problem: massive network localization. We first review our previous network localization method based on Graph Convolutional Network (GCN), which can exhibit state-of-the-art localization accuracy, even under severe Non-Line-of-Sight (NLOS) conditions, by care…
▽ More
In this paper, we design Graph Neural Networks (GNNs) with attention mechanisms to tackle an important yet challenging nonlinear regression problem: massive network localization. We first review our previous network localization method based on Graph Convolutional Network (GCN), which can exhibit state-of-the-art localization accuracy, even under severe Non-Line-of-Sight (NLOS) conditions, by carefully preselecting a constant threshold for determining adjacency. As an extension, we propose a specially designed Attentional GNN (AGNN) model to resolve the sensitive thresholding issue of the GCN-based method and enhance the underlying model capacity. The AGNN comprises an Adjacency Learning Module (ALM) and Multiple Graph Attention Layers (MGAL), employing distinct attention architectures to systematically address the demerits of the GCN-based method, rendering it more practical for real-world applications. Comprehensive analyses are conducted to explain the superior performance of these methods, including a theoretical analysis of the AGNN's dynamic attention property and computational complexity, along with a systematic discussion of their robust characteristic against NLOS measurements. Extensive experimental results demonstrate the effectiveness of the GCN-based and AGNN-based network localization methods. Notably, integrating attention mechanisms into the AGNN yields substantial improvements in localization accuracy, approaching the fundamental lower bound and showing approximately 37\% to 53\% reduction in localization error compared to the vanilla GCN-based method across various NLOS noise configurations. Both methods outperform all competing approaches by far in terms of localization accuracy, robustness, and computational time, especially for considerably large network sizes.
△ Less
Submitted 7 April, 2025; v1 submitted 28 November, 2023;
originally announced November 2023.
-
Should I Stop or Should I Go: Early Stopping with Heterogeneous Populations
Authors:
Hammaad Adam,
Fan Yin,
Huibin,
Hu,
Neil Tenenholtz,
Lorin Crawford,
Lester Mackey,
Allison Koenecke
Abstract:
Randomized experiments often need to be stopped prematurely due to the treatment having an unintended harmful effect. Existing methods that determine when to stop an experiment early are typically applied to the data in aggregate and do not account for treatment effect heterogeneity. In this paper, we study the early stopping of experiments for harm on heterogeneous populations. We first establish…
▽ More
Randomized experiments often need to be stopped prematurely due to the treatment having an unintended harmful effect. Existing methods that determine when to stop an experiment early are typically applied to the data in aggregate and do not account for treatment effect heterogeneity. In this paper, we study the early stopping of experiments for harm on heterogeneous populations. We first establish that current methods often fail to stop experiments when the treatment harms a minority group of participants. We then use causal machine learning to develop CLASH, the first broadly-applicable method for heterogeneous early stopping. We demonstrate CLASH's performance on simulated and real data and show that it yields effective early stopping for both clinical trials and A/B tests.
△ Less
Submitted 27 October, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Rethinking Bayesian Learning for Data Analysis: The Art of Prior and Inference in Sparsity-Aware Modeling
Authors:
Lei Cheng,
Feng Yin,
Sergios Theodoridis,
Sotirios Chatzis,
Tsung-Hui Chang
Abstract:
Sparse modeling for signal processing and machine learning has been at the focus of scientific research for over two decades. Among others, supervised sparsity-aware learning comprises two major paths paved by: a) discriminative methods and b) generative methods. The latter, more widely known as Bayesian methods, enable uncertainty evaluation w.r.t. the performed predictions. Furthermore, they can…
▽ More
Sparse modeling for signal processing and machine learning has been at the focus of scientific research for over two decades. Among others, supervised sparsity-aware learning comprises two major paths paved by: a) discriminative methods and b) generative methods. The latter, more widely known as Bayesian methods, enable uncertainty evaluation w.r.t. the performed predictions. Furthermore, they can better exploit related prior information and naturally introduce robustness into the model, due to their unique capacity to marginalize out uncertainties related to the parameter estimates. Moreover, hyper-parameters associated with the adopted priors can be learnt via the training data. To implement sparsity-aware learning, the crucial point lies in the choice of the function regularizer for discriminative methods and the choice of the prior distribution for Bayesian learning. Over the last decade or so, due to the intense research on deep learning, emphasis has been put on discriminative techniques. However, a come back of Bayesian methods is taking place that sheds new light on the design of deep neural networks, which also establish firm links with Bayesian models and inspire new paths for unsupervised learning, such as Bayesian tensor decomposition.
The goal of this article is two-fold. First, to review, in a unified way, some recent advances in incorporating sparsity-promoting priors into three highly popular data modeling tools, namely deep neural networks, Gaussian processes, and tensor decomposition. Second, to review their associated inference techniques from different aspects, including: evidence maximization via optimization and variational inference methods. Challenges such as small data dilemma, automatic model structure search, and natural prediction uncertainty evaluation are also discussed. Typical signal processing and machine learning tasks are demonstrated.
△ Less
Submitted 27 May, 2022;
originally announced May 2022.
-
Highly Scalable Maximum Likelihood and Conjugate Bayesian Inference for ERGMs on Graph Sets with Equivalent Vertices
Authors:
Fan Yin,
Carter T. Butts
Abstract:
The exponential family random graph modeling (ERGM) framework provides a flexible approach for the statistical analysis of networks. As ERGMs typically involve normalizing factors that are costly to compute, practical inference relies on a variety of approximations or other workarounds. Markov Chain Monte Carlo maximum likelihood (MCMC MLE) provides a powerful tool to approximate the MLE of ERGM p…
▽ More
The exponential family random graph modeling (ERGM) framework provides a flexible approach for the statistical analysis of networks. As ERGMs typically involve normalizing factors that are costly to compute, practical inference relies on a variety of approximations or other workarounds. Markov Chain Monte Carlo maximum likelihood (MCMC MLE) provides a powerful tool to approximate the MLE of ERGM parameters, and is feasible for typical models on single networks with as many as a few thousand nodes. MCMC-based algorithms for Bayesian analysis are more expensive, and high-quality answers are challenging to obtain on large graphs. For both strategies, extension to the pooled case - in which we observe multiple networks from a common generative process - adds further computational cost, with both time and memory scaling linearly in the number of graphs. This becomes prohibitive for large networks, or where large numbers of graph observations are available. Here, we exploit some basic properties of the discrete exponential families to develop an approach for ERGM inference in the pooled case that (where applicable) allows an arbitrarily large number of graph observations to be fit at no additional computational cost beyond preprocessing the data itself. Moreover, a variant of our approach can also be used to perform Bayesian inference under conjugate priors, again with no additional computational cost in the estimation phase. As we show, the conjugate prior is easily specified, and is well-suited to applications such as regularization. Simulation studies show that the pooled method leads to estimates with good frequentist properties, and posterior estimates under the conjugate prior are well-behaved. We demonstrate our approach with applications to pooled analysis of brain functional connectivity networks and to replicated x-ray crystal structures of hen egg-white lysozyme.
△ Less
Submitted 26 October, 2021;
originally announced October 2021.
-
Bayesian Nonparametric Estimation for Point Processes with Spatial Homogeneity: A Spatial Analysis of NBA Shot Locations
Authors:
Fan Yin,
Jieying Jiao,
Guanyu Hu,
Jun Yan
Abstract:
Basketball shot location data provide valuable summary information regarding players to coaches, sports analysts, fans, statisticians, as well as players themselves. Represented by spatial points, such data are naturally analyzed with spatial point process models. We present a novel nonparametric Bayesian method for learning the underlying intensity surface built upon a combination of Dirichlet pr…
▽ More
Basketball shot location data provide valuable summary information regarding players to coaches, sports analysts, fans, statisticians, as well as players themselves. Represented by spatial points, such data are naturally analyzed with spatial point process models. We present a novel nonparametric Bayesian method for learning the underlying intensity surface built upon a combination of Dirichlet process and Markov random field. Our method has the advantage of effectively encouraging local spatial homogeneity when estimating a globally heterogeneous intensity surface. Posterior inferences are performed with an efficient Markov chain Monte Carlo (MCMC) algorithm. Simulation studies show that the inferences are accurate and that the method is superior compared to the competing methods. Application to the shot location data of $20$ representative NBA players in the 2017-2018 regular season offers interesting insights about the shooting patterns of these players. A comparison against the competing method shows that the proposed method can effectively incorporate spatial contiguity into the estimation of intensity surfaces.
△ Less
Submitted 22 November, 2020;
originally announced November 2020.
-
Bayesian Analysis of Static Light Scattering Data for Globular Proteins
Authors:
Fan Yin,
Domarin Khago,
Rachel W. Martin,
Carter T. Butts
Abstract:
Static light scattering is a popular physical chemistry technique that enables calculation of physical attributes such as the radius of gyration and the second virial coefficient for a macromolecule (e.g., a polymer or a protein) in solution. The second virial coefficient is a physical quantity that characterizes the magnitude and sign of pairwise interactions between particles, and hence is relat…
▽ More
Static light scattering is a popular physical chemistry technique that enables calculation of physical attributes such as the radius of gyration and the second virial coefficient for a macromolecule (e.g., a polymer or a protein) in solution. The second virial coefficient is a physical quantity that characterizes the magnitude and sign of pairwise interactions between particles, and hence is related to aggregation propensity, a property of considerable scientific and practical interest. Estimating the second virial coefficient from experimental data is challenging due both to the degree of precision required and the complexity of the error structure involved. In contrast to conventional approaches based on heuristic OLS estimates, Bayesian inference for the second virial coefficient allows explicit modeling of error processes, incorporation of prior information, and the ability to directly test competing physical models. Here, we introduce a fully Bayesian model for static light scattering experiments on small-particle systems, with joint inference for concentration, index of refraction, oligomer size, and the second virial coefficient. We apply our proposed model to study the aggregation behavior of hen egg-white lysozyme and human gammaS-crystallin using in-house experimental data. Based on these observations, we also perform a simulation study on the primary drivers of uncertainty in this family of experiments, showing in particular the potential for improved monitoring and control of concentration to aid inference.
△ Less
Submitted 31 October, 2020;
originally announced November 2020.
-
Graph Neural Network for Large-Scale Network Localization
Authors:
Wenzhong Yan,
Di Jin,
Zhidi Lin,
Feng Yin
Abstract:
Graph neural networks (GNNs) are popular to use for classifying structured data in the context of machine learning. But surprisingly, they are rarely applied to regression problems. In this work, we adopt GNN for a classic but challenging nonlinear regression problem, namely the network localization. Our main findings are in order. First, GNN is potentially the best solution to large-scale network…
▽ More
Graph neural networks (GNNs) are popular to use for classifying structured data in the context of machine learning. But surprisingly, they are rarely applied to regression problems. In this work, we adopt GNN for a classic but challenging nonlinear regression problem, namely the network localization. Our main findings are in order. First, GNN is potentially the best solution to large-scale network localization in terms of accuracy, robustness and computational time. Second, proper thresholding of the communication range is essential to its superior performance. Simulation results corroborate that the proposed GNN based method outperforms all state-of-the-art benchmarks by far. Such inspiring results are theoretically justified in terms of data aggregation, non-line-of-sight (NLOS) noise removal and low-pass filtering effect, all affected by the threshold for neighbor selection. Code is available at https://github.com/Yanzongzi/GNN-For-localization.
△ Less
Submitted 15 February, 2021; v1 submitted 22 October, 2020;
originally announced October 2020.
-
Analysis of professional basketball field goal attempts via a Bayesian matrix clustering approach
Authors:
Fan Yin,
Guanyu Hu,
Weining Shen
Abstract:
We propose a Bayesian nonparametric matrix clustering approach to analyze the latent heterogeneity structure in the shot selection data collected from professional basketball players in the National Basketball Association (NBA). The proposed method adopts a mixture of finite mixtures framework and fully utilizes the spatial information via a mixture of matrix normal distribution representation. We…
▽ More
We propose a Bayesian nonparametric matrix clustering approach to analyze the latent heterogeneity structure in the shot selection data collected from professional basketball players in the National Basketball Association (NBA). The proposed method adopts a mixture of finite mixtures framework and fully utilizes the spatial information via a mixture of matrix normal distribution representation. We propose an efficient Markov chain Monte Carlo algorithm for posterior sampling that allows simultaneous inference on both the number of clusters and the cluster configurations. We also establish large sample convergence properties for the posterior distribution. The excellent empirical performance of the proposed method is demonstrated via simulation studies and an application to shot chart data from selected players in the 2017 18 NBA regular season.
△ Less
Submitted 16 October, 2020;
originally announced October 2020.
-
Optimally Combining Classifiers for Semi-Supervised Learning
Authors:
Zhiguo Wang,
Liusha Yang,
Feng Yin,
Ke Lin,
Qingjiang Shi,
Zhi-Quan Luo
Abstract:
This paper considers semi-supervised learning for tabular data. It is widely known that Xgboost based on tree model works well on the heterogeneous features while transductive support vector machine can exploit the low density separation assumption. However, little work has been done to combine them together for the end-to-end semi-supervised learning. In this paper, we find these two methods have…
▽ More
This paper considers semi-supervised learning for tabular data. It is widely known that Xgboost based on tree model works well on the heterogeneous features while transductive support vector machine can exploit the low density separation assumption. However, little work has been done to combine them together for the end-to-end semi-supervised learning. In this paper, we find these two methods have complementary properties and larger diversity, which motivates us to propose a new semi-supervised learning method that is able to adaptively combine the strengths of Xgboost and transductive support vector machine. Instead of the majority vote rule, an optimization problem in terms of ensemble weight is established, which helps to obtain more accurate pseudo labels for unlabeled data. The experimental results on the UCI data sets and real commercial data set demonstrate the superior classification performance of our method over the five state-of-the-art algorithms improving test accuracy by about $3\%-4\%$. The partial code can be found at https://github.com/hav-cam-mit/CTO.
△ Less
Submitted 7 June, 2020;
originally announced June 2020.
-
Kernel-based Approximate Bayesian Inference for Exponential Family Random Graph Models
Authors:
Fan Yin,
Carter T. Butts
Abstract:
Bayesian inference for exponential family random graph models (ERGMs) is a doubly-intractable problem because of the intractability of both the likelihood and posterior normalizing factor. Auxiliary variable based Markov Chain Monte Carlo (MCMC) methods for this problem are asymptotically exact but computationally demanding, and are difficult to extend to modified ERGM families. In this work, we p…
▽ More
Bayesian inference for exponential family random graph models (ERGMs) is a doubly-intractable problem because of the intractability of both the likelihood and posterior normalizing factor. Auxiliary variable based Markov Chain Monte Carlo (MCMC) methods for this problem are asymptotically exact but computationally demanding, and are difficult to extend to modified ERGM families. In this work, we propose a kernel-based approximate Bayesian computation algorithm for fitting ERGMs. By employing an adaptive importance sampling technique, we greatly improve the efficiency of the sampling step. Though approximate, our easily parallelizable approach is yields comparable accuracy to state-of-the-art methods with substantial improvements in compute time on multi-core hardware. Our approach also flexibly accommodates both algorithmic enhancements (including improved learning algorithms for estimating conditional expectations) and extensions to non-standard cases such as inference from non-sufficient statistics. We demonstrate the performance of this approach on two well-known network data sets, comparing its accuracy and efficiency with results obtained using the approximate exchange algorithm. Our tests show a wallclock time advantage of up to 50% with five cores, and the ability to fit models in 1/5th the time at 30 cores; further speed enhancements are possible when more cores are available.
△ Less
Submitted 13 July, 2020; v1 submitted 17 April, 2020;
originally announced April 2020.
-
FedLoc: Federated Learning Framework for Data-Driven Cooperative Localization and Location Data Processing
Authors:
Feng Yin,
Zhidi Lin,
Yue Xu,
Qinglei Kong,
Deshi Li,
Sergios Theodoridis,
Shuguang,
Cui
Abstract:
In this overview paper, data-driven learning model-based cooperative localization and location data processing are considered, in line with the emerging machine learning and big data methods. We first review (1) state-of-the-art algorithms in the context of federated learning, (2) two widely used learning models, namely the deep neural network model and the Gaussian process model, and (3) various…
▽ More
In this overview paper, data-driven learning model-based cooperative localization and location data processing are considered, in line with the emerging machine learning and big data methods. We first review (1) state-of-the-art algorithms in the context of federated learning, (2) two widely used learning models, namely the deep neural network model and the Gaussian process model, and (3) various distributed model hyper-parameter optimization schemes. Then, we demonstrate various practical use cases that are summarized from a mixture of standard, newly published, and unpublished works, which cover a broad range of location services, including collaborative static localization/fingerprinting, indoor target tracking, outdoor navigation using low-sampling GPS, and spatio-temporal wireless traffic data modeling and prediction. Experimental results show that near centralized data fitting- and prediction performance can be achieved by a set of collaborative mobile users running distributed algorithms. All the surveyed use cases fall under our newly proposed Federated Localization (FedLoc) framework, which targets on collaboratively building accurate location services without sacrificing user privacy, in particular, sensitive information related to their geographical trajectories. Future research directions are also discussed at the end of this paper.
△ Less
Submitted 25 May, 2020; v1 submitted 7 March, 2020;
originally announced March 2020.
-
Scalable Learning Paradigms for Data-Driven Wireless Communication
Authors:
Yue Xu,
Feng Yin,
Wenjun Xu,
Chia-Han Lee,
Jiaru Lin,
Shuguang Cui
Abstract:
The marriage of wireless big data and machine learning techniques revolutionizes the wireless system by the data-driven philosophy. However, the ever exploding data volume and model complexity will limit centralized solutions to learn and respond within a reasonable time. Therefore, scalability becomes a critical issue to be solved. In this article, we aim to provide a systematic discussion on the…
▽ More
The marriage of wireless big data and machine learning techniques revolutionizes the wireless system by the data-driven philosophy. However, the ever exploding data volume and model complexity will limit centralized solutions to learn and respond within a reasonable time. Therefore, scalability becomes a critical issue to be solved. In this article, we aim to provide a systematic discussion on the building blocks of scalable data-driven wireless networks. On one hand, we discuss the forward-looking architecture and computing framework of scalable data-driven systems from a global perspective. On the other hand, we discuss the learning algorithms and model training strategies performed at each individual node from a local perspective. We also highlight several promising research directions in the context of scalable data-driven wireless communications to inspire future research.
△ Less
Submitted 1 March, 2020;
originally announced March 2020.
-
Finite Mixtures of ERGMs for Modeling Ensembles of Networks
Authors:
Fan Yin,
Weining Shen,
Carter T. Butts
Abstract:
Ensembles of networks arise in many scientific fields, but there are few statistical tools for inferring their generative processes, particularly in the presence of both dyadic dependence and cross-graph heterogeneity. To fill in this gap, we propose characterizing network ensembles via finite mixtures of exponential family random graph models, a framework for parametric statistical modeling of gr…
▽ More
Ensembles of networks arise in many scientific fields, but there are few statistical tools for inferring their generative processes, particularly in the presence of both dyadic dependence and cross-graph heterogeneity. To fill in this gap, we propose characterizing network ensembles via finite mixtures of exponential family random graph models, a framework for parametric statistical modeling of graphs that has been successful in explicitly modeling the complex stochastic processes that govern the structure of edges in a network. Our proposed modeling framework can also be used for applications such as model-based clustering of ensembles of networks and density estimation for complex graph distributions. We develop a Metropolis-within-Gibbs algorithm to conduct fully Bayesian inference and adapt a version of deviance information criterion for missing data models to choose the number of latent heterogeneous generative mechanisms. Simulation studies show that the proposed procedure can recover the true number of latent heterogeneous generative processes and corresponding parameters. We demonstrate the utility of the proposed approach using an ensemble of political co-voting networks among U.S. Senators.
△ Less
Submitted 22 April, 2020; v1 submitted 24 October, 2019;
originally announced October 2019.
-
Selection of Exponential-Family Random Graph Models via Held-Out Predictive Evaluation (HOPE)
Authors:
Fan Yin,
Nolan Edward Phillips,
Carter T. Butts
Abstract:
Statistical models for networks with complex dependencies pose particular challenges for model selection and evaluation. In particular, many well-established statistical tools for selecting between models assume conditional independence of observations and/or conventional asymptotics, and their theoretical foundations are not always applicable in a network modeling context. While simulation-based…
▽ More
Statistical models for networks with complex dependencies pose particular challenges for model selection and evaluation. In particular, many well-established statistical tools for selecting between models assume conditional independence of observations and/or conventional asymptotics, and their theoretical foundations are not always applicable in a network modeling context. While simulation-based approaches to model adequacy assessment are now widely used, there remains a need for procedures that quantify a model's performance in a manner suitable for selecting among competing models. Here, we propose to address this issue by developing a predictive evaluation strategy for exponential family random graph models that is analogous to cross-validation. Our approach builds on the held-out predictive evaluation (HOPE) scheme introduced by Wang et al. (2016) to assess imputation performance. We systematically hold out parts of the observed network to: evaluate how well the model is able to predict the held-out data; identify where the model performs poorly based on which data are held-out, indicating e.g. potential weaknesses; and calculate general summaries of predictive performance that can be used for model selection. As such, HOPE can assist researchers in improving models by indicating where a model performs poorly, and by quantitatively comparing predictive performance across competing models. The proposed method is applied to model selection problem of two well-known data sets, and the results are compared to those obtained via nominal AIC and BIC scores.
△ Less
Submitted 19 August, 2019; v1 submitted 16 August, 2019;
originally announced August 2019.
-
Gaussian Processes for Analyzing Positioned Trajectories in Sports
Authors:
Yuxin Zhao,
Feng Yin,
Fredrik Gunnarsson,
Fredrik Hultkrantz
Abstract:
Kernel-based machine learning approaches are gaining increasing interest for exploring and modeling large dataset in recent years. Gaussian process (GP) is one example of such kernel-based approaches, which can provide very good performance for nonlinear modeling problems. In this work, we first propose a grey-box modeling approach to analyze the forces in cross country skiing races. To be more pr…
▽ More
Kernel-based machine learning approaches are gaining increasing interest for exploring and modeling large dataset in recent years. Gaussian process (GP) is one example of such kernel-based approaches, which can provide very good performance for nonlinear modeling problems. In this work, we first propose a grey-box modeling approach to analyze the forces in cross country skiing races. To be more precise, a disciplined set of kinetic motion model formulae is combined with data-driven Gaussian process regression model, which accounts for everything unknown in the system. Then, a modeling approach is proposed to analyze the kinetic flow of both individual and clusters of skiers. The proposed approaches can be generally applied to use cases where positioned trajectories and kinetic measurements are available. The proposed approaches are evaluated using data collected from the Falun Nordic World Ski Championships 2015, in particular the Men's cross country $4\times10$ km relay. Forces during the cross country skiing races are analyzed and compared. Velocity models for skiers at different competition stages are also evaluated. Finally, the comparisons between the grey-box and black-box approach are carried out, where the grey-box approach can reduce the predictive uncertainty by $30\%$ to $40\%$.
△ Less
Submitted 5 July, 2019;
originally announced July 2019.
-
A General $\mathcal{O}(n^2)$ Hyper-Parameter Optimization for Gaussian Process Regression with Cross-Validation and Non-linearly Constrained ADMM
Authors:
Linning Xu,
Feng Yin,
Jiawei Zhang,
Zhi-Quan Luo,
Shuguang Cui
Abstract:
Hyper-parameter optimization remains as the core issue of Gaussian process (GP) for machine learning nowadays. The benchmark method using maximum likelihood (ML) estimation and gradient descent (GD) is impractical for processing big data due to its $O(n^3)$ complexity. Many sophisticated global or local approximation models, for instance, sparse GP, distributed GP, have been proposed to address su…
▽ More
Hyper-parameter optimization remains as the core issue of Gaussian process (GP) for machine learning nowadays. The benchmark method using maximum likelihood (ML) estimation and gradient descent (GD) is impractical for processing big data due to its $O(n^3)$ complexity. Many sophisticated global or local approximation models, for instance, sparse GP, distributed GP, have been proposed to address such complexity issue. In this paper, we propose two novel and general-purpose GP hyper-parameter training schemes (GPCV-ADMM) by replacing ML with cross-validation (CV) as the fitting criterion and replacing GD with a non-linearly constrained alternating direction method of multipliers (ADMM) as the optimization method. The proposed schemes are of $O(n^2)$ complexity for any covariance matrix without special structure. We conduct various experiments based on both synthetic and real data sets, wherein the proposed schemes show excellent performance in terms of convergence, hyper-parameter estimation accuracy, and computational time in comparison with the traditional ML based routines given in the GPML toolbox.
△ Less
Submitted 6 June, 2019; v1 submitted 5 June, 2019;
originally announced June 2019.
-
Linear Multiple Low-Rank Kernel Based Stationary Gaussian Processes Regression for Time Series
Authors:
Feng Yin,
Lishuo Pan,
Xinwei He,
Tianshi Chen,
Sergios Theodoridis,
Zhi-Quan,
Luo
Abstract:
Gaussian processes (GP) for machine learning have been studied systematically over the past two decades and they are by now widely used in a number of diverse applications. However, GP kernel design and the associated hyper-parameter optimization are still hard and to a large extend open problems. In this paper, we consider the task of GP regression for time series modeling and analysis. The under…
▽ More
Gaussian processes (GP) for machine learning have been studied systematically over the past two decades and they are by now widely used in a number of diverse applications. However, GP kernel design and the associated hyper-parameter optimization are still hard and to a large extend open problems. In this paper, we consider the task of GP regression for time series modeling and analysis. The underlying stationary kernel can be approximated arbitrarily close by a new proposed grid spectral mixture (GSM) kernel, which turns out to be a linear combination of low-rank sub-kernels. In the case where a large number of the sub-kernels are used, either the Nyström or the random Fourier feature approximations can be adopted to deal efficiently with the computational demands. The unknown GP hyper-parameters consist of the non-negative weights of all sub-kernels as well as the noise variance; their estimation is performed via the maximum-likelihood (ML) estimation framework. Two efficient numerical optimization methods for solving the unknown hyper-parameters are derived, including a sequential majorization-minimization (MM) method and a non-linearly constrained alternating direction of multiplier method (ADMM). The MM matches perfectly with the proven low-rank property of the proposed GSM sub-kernels and turns out to be a part of efficiency, stable, and efficient solver, while the ADMM has the potential to generate better local minimum in terms of the test MSE. Experimental results, based on various classic time series data sets, corroborate that the proposed GSM kernel-based GP regression model outperforms several salient competitors of similar kind in terms of prediction mean-squared-error and numerical stability.
△ Less
Submitted 21 April, 2019;
originally announced April 2019.
-
Wireless Traffic Prediction with Scalable Gaussian Process: Framework, Algorithms, and Verification
Authors:
Yue Xu,
Feng Yin,
Wenjun Xu,
Jiaru Lin,
Shuguang Cui
Abstract:
The cloud radio access network (C-RAN) is a promising paradigm to meet the stringent requirements of the fifth generation (5G) wireless systems. Meanwhile, wireless traffic prediction is a key enabler for C-RANs to improve both the spectrum efficiency and energy efficiency through load-aware network managements. This paper proposes a scalable Gaussian process (GP) framework as a promising solution…
▽ More
The cloud radio access network (C-RAN) is a promising paradigm to meet the stringent requirements of the fifth generation (5G) wireless systems. Meanwhile, wireless traffic prediction is a key enabler for C-RANs to improve both the spectrum efficiency and energy efficiency through load-aware network managements. This paper proposes a scalable Gaussian process (GP) framework as a promising solution to achieve large-scale wireless traffic prediction in a cost-efficient manner. Our contribution is three-fold. First, to the best of our knowledge, this paper is the first to empower GP regression with the alternating direction method of multipliers (ADMM) for parallel hyper-parameter optimization in the training phase, where such a scalable training framework well balances the local estimation in baseband units (BBUs) and information consensus among BBUs in a principled way for large-scale executions. Second, in the prediction phase, we fuse local predictions obtained from the BBUs via a cross-validation based optimal strategy, which demonstrates itself to be reliable and robust for general regression tasks. Moreover, such a cross-validation based optimal fusion strategy is built upon a well acknowledged probabilistic model to retain the valuable closed-form GP inference properties. Third, we propose a C-RAN based scalable wireless prediction architecture, where the prediction accuracy and the time consumption can be balanced by tuning the number of the BBUs according to the real-time system demands. Experimental results show that our proposed scalable GP model can outperform the state-of-the-art approaches considerably, in terms of wireless traffic prediction performance.
△ Less
Submitted 13 February, 2019;
originally announced February 2019.
-
Multitask Gaussian Process with Hierarchical Latent Interactions
Authors:
Kai Chen,
Twan van Laarhoven,
Elena Marchiori,
Feng Yin,
Shuguang Cui
Abstract:
Multitask Gaussian process (MTGP) is powerful for joint learning of multiple tasks with complicated correlation patterns. However, due to the assembling of additive independent latent functions, all current MTGPs including the salient linear model of coregionalization (LMC) and convolution frameworks cannot effectively represent and learn the hierarchical latent interactions between its latent fun…
▽ More
Multitask Gaussian process (MTGP) is powerful for joint learning of multiple tasks with complicated correlation patterns. However, due to the assembling of additive independent latent functions, all current MTGPs including the salient linear model of coregionalization (LMC) and convolution frameworks cannot effectively represent and learn the hierarchical latent interactions between its latent functions. In this paper, we further investigate the interactions in LMC of MTGP and then propose a novel kernel representation of the hierarchical interactions, which ameliorates both the expressiveness and the interpretability of MTGP. Specifically, we express the interaction as a product of function interaction and coefficient interaction. The function interaction is modeled by using cross convolution of latent functions. The coefficient interaction between the LMCs is described as a cross coregionalization term. We validate that considering the interactions can promote knowledge transferring in MTGP and compare our approach with some state-of-the-art MTGPs on both synthetic- and real-world datasets.
△ Less
Submitted 2 October, 2021; v1 submitted 3 August, 2018;
originally announced August 2018.
-
Compressible Spectral Mixture Kernels with Sparse Dependency Structures for Gaussian Processes
Authors:
Kai Chen,
Yijue Dai,
Feng Yin,
Elena Marchiori,
Sergios Theodoridis
Abstract:
Spectral mixture (SM) kernels comprise a powerful class of generalized kernels for Gaussian processes (GPs) to describe complex patterns. This paper introduces model compression and time- and phase (TP) modulated dependency structures to the original (SM) kernel for improved generalization of GPs. Specifically, by adopting Bienaymés identity, we generalize the dependency structure through cross-co…
▽ More
Spectral mixture (SM) kernels comprise a powerful class of generalized kernels for Gaussian processes (GPs) to describe complex patterns. This paper introduces model compression and time- and phase (TP) modulated dependency structures to the original (SM) kernel for improved generalization of GPs. Specifically, by adopting Bienaymés identity, we generalize the dependency structure through cross-covariance between the SM components. Then, we propose a novel SM kernel with a dependency structure (SMD) by using cross-convolution between the SM components. Furthermore, we ameliorate the expressiveness of the dependency structure by parameterizing it with time and phase delays. The dependency structure has clear interpretations in terms of spectral density, covariance behavior, and sampling path. To enrich the SMD with effective hyperparameter initialization, compressible SM kernel components, and sparse dependency structures, we introduce a novel structure adaptation (SA) algorithm in the end. A thorough comparative analysis of the SMD on both synthetic and real-life applications corroborates its efficacy.
△ Less
Submitted 26 July, 2023; v1 submitted 1 August, 2018;
originally announced August 2018.
-
Comparisons of penalized least squares methods by simulations
Authors:
Ke Zhang,
Fan Yin,
Shifeng Xiong
Abstract:
Penalized least squares methods are commonly used for simultaneous estimation and variable selection in high-dimensional linear models. In this paper we compare several prevailing methods including the lasso, nonnegative garrote, and SCAD in this area through Monte Carlo simulations. Criterion for evaluating these methods in terms of variable selection and estimation are presented. This paper focu…
▽ More
Penalized least squares methods are commonly used for simultaneous estimation and variable selection in high-dimensional linear models. In this paper we compare several prevailing methods including the lasso, nonnegative garrote, and SCAD in this area through Monte Carlo simulations. Criterion for evaluating these methods in terms of variable selection and estimation are presented. This paper focuses on the traditional n > p cases. For larger p, our results are still helpful to practitioners after the dimensionality is reduced by a screening method. K
△ Less
Submitted 7 May, 2014;
originally announced May 2014.