-
Learning Density Evolution from Snapshot Data
Authors:
Rentian Yao,
Atsushi Nitanda,
Xiaohui Chen,
Yun Yang
Abstract:
Motivated by learning dynamical structures from static snapshot data, this paper presents a distribution-on-scalar regression approach for estimating the density evolution of a stochastic process from its noisy temporal point clouds. We propose an entropy-regularized nonparametric maximum likelihood estimator (E-NPMLE), which leverages the entropic optimal transport as a smoothing regularizer for…
▽ More
Motivated by learning dynamical structures from static snapshot data, this paper presents a distribution-on-scalar regression approach for estimating the density evolution of a stochastic process from its noisy temporal point clouds. We propose an entropy-regularized nonparametric maximum likelihood estimator (E-NPMLE), which leverages the entropic optimal transport as a smoothing regularizer for the density flow. We show that the E-NPMLE has almost dimension-free statistical rates of convergence to the ground truth distributions, which exhibit a striking phase transition phenomenon in terms of the number of snapshots and per-snapshot sample size. To efficiently compute the E-NPMLE, we design a novel particle-based and grid-free coordinate KL divergence gradient descent (CKLGD) algorithm and prove its polynomial iteration complexity. Moreover, we provide numerical evidence on synthetic data to support our theoretical findings. This work contributes to the theoretical understanding and practical computation of estimating density evolution from noisy observations in arbitrary dimensions.
△ Less
Submitted 24 February, 2025;
originally announced February 2025.
-
Optimal Transport Barycenter via Nonconvex-Concave Minimax Optimization
Authors:
Kaheon Kim,
Rentian Yao,
Changbo Zhu,
Xiaohui Chen
Abstract:
The optimal transport barycenter (a.k.a. Wasserstein barycenter) is a fundamental notion of averaging that extends from the Euclidean space to the Wasserstein space of probability distributions. Computation of the unregularized barycenter for discretized probability distributions on point clouds is a challenging task when the domain dimension $d > 1$. Most practical algorithms for approximating th…
▽ More
The optimal transport barycenter (a.k.a. Wasserstein barycenter) is a fundamental notion of averaging that extends from the Euclidean space to the Wasserstein space of probability distributions. Computation of the unregularized barycenter for discretized probability distributions on point clouds is a challenging task when the domain dimension $d > 1$. Most practical algorithms for approximating the barycenter problem are based on entropic regularization. In this paper, we introduce a nearly linear time $O(m \log{m})$ and linear space complexity $O(m)$ primal-dual algorithm, the Wasserstein-Descent $\dot{\mathbb{H}}^1$-Ascent (WDHA) algorithm, for computing the exact barycenter when the input probability density functions are discretized on an $m$-point grid. The key success of the WDHA algorithm hinges on alternating between two different yet closely related Wasserstein and Sobolev optimization geometries for the primal barycenter and dual Kantorovich potential subproblems. Under reasonable assumptions, we establish the convergence rate and iteration complexity of WDHA to its stationary point when the step size is appropriately chosen. Superior computational efficacy, scalability, and accuracy over the existing Sinkhorn-type algorithms are demonstrated on high-resolution (e.g., $1024 \times 1024$ images) 2D synthetic and real data.
△ Less
Submitted 25 May, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
Agent-based Simulation Evaluation of CBD Tolling: A Case Study from New York City
Authors:
Qingnan Liang,
Ruili Yao,
Ruixuan Zhang,
Zhibin Chen,
Guoyuan Wu
Abstract:
Congestion tollings have been widely developed and adopted as an effective tool to mitigate urban traffic congestion and enhance transportation system sustainability. Nevertheless, these tolling schemes are often tailored on a city-by-city or even area-by-area basis, and the cost of conducting field experiments often makes the design and evaluation process challenging. In this work, we leverage MA…
▽ More
Congestion tollings have been widely developed and adopted as an effective tool to mitigate urban traffic congestion and enhance transportation system sustainability. Nevertheless, these tolling schemes are often tailored on a city-by-city or even area-by-area basis, and the cost of conducting field experiments often makes the design and evaluation process challenging. In this work, we leverage MATSim, a simulation platform that provides microscopic behaviors at the agent level, to evaluate performance on tolling schemes. Specifically, we conduct a case study of the Manhattan Central Business District (CBD) in New York City (NYC) using a fine-granularity traffic network model in the large-scale agent behavior setting. The flexibility of MATSim enables the implementation of a customized tolling policy proposed yet not deployed by the NYC agency while providing detailed interpretations. The quantitative and qualitative results indicate that the tested tolling program can regulate the personal vehicle volume in the CBD area and encourage the usage of public transportation, which proves to be a practical move towards sustainable transportation systems. More importantly, our work demonstrates that agent-based simulation helps better understand the travel pattern change subject to tollings in dense and complex urban environments, and it has the potential to facilitate efficient decision-making for the devotion to sustainable traffic management.
△ Less
Submitted 16 February, 2024;
originally announced February 2024.
-
A Gromov--Wasserstein Geometric View of Spectrum-Preserving Graph Coarsening
Authors:
Yifan Chen,
Rentian Yao,
Yun Yang,
Jie Chen
Abstract:
Graph coarsening is a technique for solving large-scale graph problems by working on a smaller version of the original graph, and possibly interpolating the results back to the original graph. It has a long history in scientific computing and has recently gained popularity in machine learning, particularly in methods that preserve the graph spectrum. This work studies graph coarsening from a diffe…
▽ More
Graph coarsening is a technique for solving large-scale graph problems by working on a smaller version of the original graph, and possibly interpolating the results back to the original graph. It has a long history in scientific computing and has recently gained popularity in machine learning, particularly in methods that preserve the graph spectrum. This work studies graph coarsening from a different perspective, developing a theory for preserving graph distances and proposing a method to achieve this. The geometric approach is useful when working with a collection of graphs, such as in graph classification and regression. In this study, we consider a graph as an element on a metric space equipped with the Gromov--Wasserstein (GW) distance, and bound the difference between the distance of two graphs and their coarsened versions. Minimizing this difference can be done using the popular weighted kernel $K$-means method, which improves existing spectrum-preserving methods with the proper choice of the kernel. The study includes a set of experiments to support the theory and method, including approximating the GW distance, preserving the graph spectrum, classifying graphs using spectral information, and performing regression using graph convolutional networks. Code is available at https://github.com/ychen-stat-ml/GW-Graph-Coarsening .
△ Less
Submitted 15 June, 2023;
originally announced June 2023.
-
Fast Linear Model Trees by PILOT
Authors:
Jakob Raymaekers,
Peter J. Rousseeuw,
Tim Verdonck,
Ruicong Yao
Abstract:
Linear model trees are regression trees that incorporate linear models in the leaf nodes. This preserves the intuitive interpretation of decision trees and at the same time enables them to better capture linear relationships, which is hard for standard decision trees. But most existing methods for fitting linear model trees are time consuming and therefore not scalable to large data sets. In addit…
▽ More
Linear model trees are regression trees that incorporate linear models in the leaf nodes. This preserves the intuitive interpretation of decision trees and at the same time enables them to better capture linear relationships, which is hard for standard decision trees. But most existing methods for fitting linear model trees are time consuming and therefore not scalable to large data sets. In addition, they are more prone to overfitting and extrapolation issues than standard regression trees. In this paper we introduce PILOT, a new algorithm for linear model trees that is fast, regularized, stable and interpretable. PILOT trains in a greedy fashion like classic regression trees, but incorporates an $L^2$ boosting approach and a model selection rule for fitting linear models in the nodes. The abbreviation PILOT stands for $PI$ecewise $L$inear $O$rganic $T$ree, where `organic' refers to the fact that no pruning is carried out. PILOT has the same low time and space complexity as CART without its pruning. An empirical study indicates that PILOT tends to outperform standard decision trees and other linear model trees on a variety of data sets. Moreover, we prove its consistency in an additive model setting under weak assumptions. When the data is generated by a linear model, the convergence rate is polynomial.
△ Less
Submitted 8 February, 2023;
originally announced February 2023.
-
Efficient Truncated Linear Regression with Unknown Noise Variance
Authors:
Constantinos Daskalakis,
Patroklos Stefanou,
Rui Yao,
Manolis Zampetakis
Abstract:
Truncated linear regression is a classical challenge in Statistics, wherein a label, $y = w^T x + \varepsilon$, and its corresponding feature vector, $x \in \mathbb{R}^k$, are only observed if the label falls in some subset $S \subseteq \mathbb{R}$; otherwise the existence of the pair $(x, y)$ is hidden from observation. Linear regression with truncated observations has remained a challenge, in it…
▽ More
Truncated linear regression is a classical challenge in Statistics, wherein a label, $y = w^T x + \varepsilon$, and its corresponding feature vector, $x \in \mathbb{R}^k$, are only observed if the label falls in some subset $S \subseteq \mathbb{R}$; otherwise the existence of the pair $(x, y)$ is hidden from observation. Linear regression with truncated observations has remained a challenge, in its general form, since the early works of~\citet{tobin1958estimation,amemiya1973regression}. When the distribution of the error is normal with known variance, recent work of~\citet{daskalakis2019truncatedregression} provides computationally and statistically efficient estimators of the linear model, $w$.
In this paper, we provide the first computationally and statistically efficient estimators for truncated linear regression when the noise variance is unknown, estimating both the linear model and the variance of the noise. Our estimator is based on an efficient implementation of Projected Stochastic Gradient Descent on the negative log-likelihood of the truncated sample. Importantly, we show that the error of our estimates is asymptotically normal, and we use this to provide explicit confidence regions for our estimates.
△ Less
Submitted 25 August, 2022;
originally announced August 2022.
-
Spatio-Temporal Wildfire Prediction using Multi-Modal Data
Authors:
Chen Xu,
Yao Xie,
Daniel A. Zuniga Vazquez,
Rui Yao,
Feng Qiu
Abstract:
Due to severe societal and environmental impacts, wildfire prediction using multi-modal sensing data has become a highly sought-after data-analytical tool by various stakeholders (such as state governments and power utility companies) to achieve a more informed understanding of wildfire activities and plan preventive measures. A desirable algorithm should precisely predict fire risk and magnitude…
▽ More
Due to severe societal and environmental impacts, wildfire prediction using multi-modal sensing data has become a highly sought-after data-analytical tool by various stakeholders (such as state governments and power utility companies) to achieve a more informed understanding of wildfire activities and plan preventive measures. A desirable algorithm should precisely predict fire risk and magnitude for a location in real time. In this paper, we develop a flexible spatio-temporal wildfire prediction framework using multi-modal time series data. We first predict the wildfire risk (the chance of a wildfire event) in real-time, considering the historical events using discrete mutually exciting point process models. Then we further develop a wildfire magnitude prediction set method based on the flexible distribution-free time-series conformal prediction (CP) approach. Theoretically, we prove a risk model parameter recovery guarantee, as well as coverage and set size guarantees for the CP sets. Through extensive real-data experiments with wildfire data in California, we demonstrate the effectiveness of our methods, as well as their flexibility and scalability in large regions.
△ Less
Submitted 10 October, 2023; v1 submitted 26 July, 2022;
originally announced July 2022.
-
Mean-field Variational Inference via Wasserstein Gradient Flow
Authors:
Rentian Yao,
Yun Yang
Abstract:
Variational inference, such as the mean-field (MF) approximation, requires certain conjugacy structures for efficient computation. These can impose unnecessary restrictions on the viable prior distribution family and further constraints on the variational approximation family. In this work, we introduce a general computational framework to implement MF variational inference for Bayesian models, wi…
▽ More
Variational inference, such as the mean-field (MF) approximation, requires certain conjugacy structures for efficient computation. These can impose unnecessary restrictions on the viable prior distribution family and further constraints on the variational approximation family. In this work, we introduce a general computational framework to implement MF variational inference for Bayesian models, with or without latent variables, using the Wasserstein gradient flow (WGF), a modern mathematical technique for realizing a gradient flow over the space of probability measures. Theoretically, we analyze the algorithmic convergence of the proposed approaches, providing an explicit expression for the contraction factor. We also strengthen existing results on MF variational posterior concentration from a polynomial to an exponential contraction, by utilizing the fixed point equation of the time-discretized WGF. Computationally, we propose a new constraint-free function approximation method using neural networks to numerically realize our algorithm. This method is shown to be more precise and efficient than traditional particle approximation methods based on Langevin dynamics.
△ Less
Submitted 8 September, 2023; v1 submitted 17 July, 2022;
originally announced July 2022.
-
Quantifying Grid Resilience Against Extreme Weather Using Large-Scale Customer Power Outage Data
Authors:
Shixiang Zhu,
Rui Yao,
Yao Xie,
Feng Qiu,
Yueming,
Qiu,
Xuan Wu
Abstract:
In recent decades, the weather around the world has become more irregular and extreme, often causing large-scale extended power outages. Resilience -- the capability of withstanding, adapting to, and recovering from a large-scale disruption -- has become a top priority for the power sector. However, the understanding of power grid resilience still stays on the conceptual level mostly or focuses on…
▽ More
In recent decades, the weather around the world has become more irregular and extreme, often causing large-scale extended power outages. Resilience -- the capability of withstanding, adapting to, and recovering from a large-scale disruption -- has become a top priority for the power sector. However, the understanding of power grid resilience still stays on the conceptual level mostly or focuses on particular components, yielding no actionable results or revealing few insights on the system level. This study provides a quantitatively measurable definition of power grid resilience, using a statistical model inspired by patterns observed from data and domain knowledge. We analyze a large-scale quarter-hourly historical electricity customer outage data and the corresponding weather records, and draw connections between the model and industry resilience practice. We showcase the resilience analysis using three major service territories on the east coast of the United States. Our analysis suggests that cumulative weather effects play a key role in causing immediate, sustained outages, and these outages can propagate and cause secondary outages in neighboring areas. The proposed model also provides some interesting insights into grid resilience enhancement planning. For example, our simulation results indicate that enhancing the power infrastructure in a small number of critical locations can reduce nearly half of the number of customer power outages in Massachusetts. In addition, we have shown that our model achieves promising accuracy in predicting the progress of customer power outages throughout extreme weather events, which can be very valuable for system operators and federal agencies to prepare disaster response.
△ Less
Submitted 4 September, 2022; v1 submitted 20 September, 2021;
originally announced September 2021.
-
A variational autoencoder approach for choice set generation and implicit perception of alternatives in choice modeling
Authors:
Rui Yao,
Shlomo Bekhor
Abstract:
This paper derives the generalized extreme value (GEV) model with implicit availability/perception (IAP) of alternatives and proposes a variational autoencoder (VAE) approach for choice set generation and implicit perception of alternatives. Specifically, the cross-nested logit (CNL) model with IAP is derived as an example of IAP-GEV models. The VAE approach is adapted to model the choice set gene…
▽ More
This paper derives the generalized extreme value (GEV) model with implicit availability/perception (IAP) of alternatives and proposes a variational autoencoder (VAE) approach for choice set generation and implicit perception of alternatives. Specifically, the cross-nested logit (CNL) model with IAP is derived as an example of IAP-GEV models. The VAE approach is adapted to model the choice set generation process, in which the likelihood of perceiving chosen alternatives in the choice set is maximized. The VAE approach for route choice set generation is exemplified using a real dataset. IAP- CNL model estimated has the best performance in terms of goodness-of-fit and prediction performance, compared to multinomial logit models and conventional choice set generation methods.
△ Less
Submitted 18 June, 2021;
originally announced June 2021.
-
Deep Active Learning for Solvability Prediction in Power Systems
Authors:
Yichen Zhang,
Jianzhe Liu,
Feng Qiu,
Tianqi Hong,
Rui Yao
Abstract:
Traditional methods for solvability region analysis can only have inner approximations with inconclusive conservatism. Machine learning methods have been proposed to approach the real region. In this letter, we propose a deep active learning framework for power system solvability prediction. Compared with the passive learning methods where the training is performed after all instances are labeled,…
▽ More
Traditional methods for solvability region analysis can only have inner approximations with inconclusive conservatism. Machine learning methods have been proposed to approach the real region. In this letter, we propose a deep active learning framework for power system solvability prediction. Compared with the passive learning methods where the training is performed after all instances are labeled, the active learning selects most informative instances to be label and therefore significantly reduce the size of labeled dataset for training. In the active learning framework, the acquisition functions, which correspond to different sampling strategies, are defined in terms of the on-the-fly posterior probability from the classifier. The IEEE 39-bus system is employed to validate the proposed framework, where a two-dimensional case is illustrated to visualize the effectiveness of the sampling method followed by the full-dimensional numerical experiments.
△ Less
Submitted 22 December, 2020; v1 submitted 26 July, 2020;
originally announced July 2020.
-
Experiments on route choice set generation using a large GPS trajectory set
Authors:
Rui Yao,
Shlomo Bekhor
Abstract:
Several route choice models developed in the literature were based on a relatively small number of observations. With the extensive use of tracking devices in recent surveys, there is a possibility to obtain insights with respect to the traveler's choice behavior. In this paper, different path generation algorithms are evaluated using a large GPS trajectory dataset. The dataset contains 6,000 obse…
▽ More
Several route choice models developed in the literature were based on a relatively small number of observations. With the extensive use of tracking devices in recent surveys, there is a possibility to obtain insights with respect to the traveler's choice behavior. In this paper, different path generation algorithms are evaluated using a large GPS trajectory dataset. The dataset contains 6,000 observations from Tel-Aviv metropolitan area. An initial analysis is performed by generating a single route based on the shortest path. Almost 60% percent of the 6,000 observations can be covered (assuming a threshold of 80% overlap) using a single path. This result significantly contrasts previous literature findings. Link penalty, link elimination, simulation and via-node methods are applied to generate route sets, and the consistency of the algorithms are compared. A modified link penalty method, which accounts for preference of using higher hierarchical roads, provides a route set with 97% coverage (80% overlap threshold). The via-node method produces route set with satisfying coverage, and generates routes that are more heterogeneous (in terms number of links and routes ratio).
△ Less
Submitted 22 May, 2020;
originally announced June 2020.
-
Method and Dataset Mining in Scientific Papers
Authors:
Rujing Yao,
Linlin Hou,
Yingchun Ye,
Ou Wu,
Ji Zhang,
Jian Wu
Abstract:
Literature analysis facilitates researchers better understanding the development of science and technology. The conventional literature analysis focuses on the topics, authors, abstracts, keywords, references, etc., and rarely pays attention to the content of papers. In the field of machine learning, the involved methods (M) and datasets (D) are key information in papers. The extraction and mining…
▽ More
Literature analysis facilitates researchers better understanding the development of science and technology. The conventional literature analysis focuses on the topics, authors, abstracts, keywords, references, etc., and rarely pays attention to the content of papers. In the field of machine learning, the involved methods (M) and datasets (D) are key information in papers. The extraction and mining of M and D are useful for discipline analysis and algorithm recommendation. In this paper, we propose a novel entity recognition model, called MDER, and constructe datasets from the papers of the PAKDD conferences (2009-2019). Some preliminary experiments are conducted to assess the extraction performance and the mining results are visualized.
△ Less
Submitted 29 November, 2019;
originally announced November 2019.
-
Online detection of cascading change-points
Authors:
Rui Zhang,
Yao Xie,
Rui Yao,
Feng Qiu
Abstract:
We propose an online detection procedure for cascading failures in the network from sequential data, which can be modeled as multiple correlated change-points happening during a short period. We consider a temporal diffusion network model to capture the temporal dynamic structure of multiple change-points and develop a sequential Shewhart procedure based on the generalized likelihood ratio statist…
▽ More
We propose an online detection procedure for cascading failures in the network from sequential data, which can be modeled as multiple correlated change-points happening during a short period. We consider a temporal diffusion network model to capture the temporal dynamic structure of multiple change-points and develop a sequential Shewhart procedure based on the generalized likelihood ratio statistics based on the diffusion network model assuming unknown post-change distribution parameters. We also tackle the computational complexity posed by the unknown propagation. Numerical experiments demonstrate the good performance for detecting cascade failures.
△ Less
Submitted 5 February, 2021; v1 submitted 28 October, 2019;
originally announced November 2019.