Search | arXiv e-print repository

Pedestrian Volume Prediction Using a Diffusion Convolutional Gated Recurrent Unit Model

Authors: Yiwei Dong, Tingjin Chu, Lele Zhang, Hadi Ghaderi, Hanfang Yang

Abstract: Effective models for analysing and predicting pedestrian flow are important to ensure the safety of both pedestrians and other road users. These tools also play a key role in optimising infrastructure design and geometry and supporting the economic utility of interconnected communities. The implementation of city-wide automatic pedestrian counting systems provides researchers with invaluable data,… ▽ More Effective models for analysing and predicting pedestrian flow are important to ensure the safety of both pedestrians and other road users. These tools also play a key role in optimising infrastructure design and geometry and supporting the economic utility of interconnected communities. The implementation of city-wide automatic pedestrian counting systems provides researchers with invaluable data, enabling the development and training of deep learning applications that offer better insights into traffic and crowd flows. Benefiting from real-world data provided by the City of Melbourne pedestrian counting system, this study presents a pedestrian flow prediction model, as an extension of Diffusion Convolutional Grated Recurrent Unit (DCGRU) with dynamic time warping, named DCGRU-DTW. This model captures the spatial dependencies of pedestrian flow through the diffusion process and the temporal dependency captured by Gated Recurrent Unit (GRU). Through extensive numerical experiments, we demonstrate that the proposed model outperforms the classic vector autoregressive model and the original DCGRU across multiple model accuracy metrics. △ Less

Submitted 4 November, 2024; originally announced November 2024.

arXiv:2406.13725 [pdf, ps, other]

Tree-Sliced Wasserstein Distance: A Geometric Perspective

Authors: Viet-Hoang Tran, Trang Pham, Tho Tran, Minh Khoi Nguyen Nhat, Thanh Chu, Tam Le, Tan M. Nguyen

Abstract: Many variants of Optimal Transport (OT) have been developed to address its heavy computation. Among them, notably, Sliced Wasserstein (SW) is widely used for application domains by projecting the OT problem onto one-dimensional lines, and leveraging the closed-form expression of the univariate OT to reduce the computational burden. However, projecting measures onto low-dimensional spaces can lead… ▽ More Many variants of Optimal Transport (OT) have been developed to address its heavy computation. Among them, notably, Sliced Wasserstein (SW) is widely used for application domains by projecting the OT problem onto one-dimensional lines, and leveraging the closed-form expression of the univariate OT to reduce the computational burden. However, projecting measures onto low-dimensional spaces can lead to a loss of topological information. To mitigate this issue, in this work, we propose to replace one-dimensional lines with a more intricate structure, called tree systems. This structure is metrizable by a tree metric, which yields a closed-form expression for OT problems on tree systems. We provide an extensive theoretical analysis to formally define tree systems with their topological properties, introduce the concept of splitting maps, which operate as the projection mechanism onto these structures, then finally propose a novel variant of Radon transform for tree systems and verify its injectivity. This framework leads to an efficient metric between measures, termed Tree-Sliced Wasserstein distance on Systems of Lines (TSW-SL). By conducting a variety of experiments on gradient flows, image style transfer, and generative models, we illustrate that our proposed approach performs favorably compared to SW and its variants. △ Less

Submitted 9 June, 2025; v1 submitted 19 June, 2024; originally announced June 2024.

Comments: Accepted to ICML 2025

arXiv:2402.17595 [pdf, other]

Implicit Regularization via Spectral Neural Networks and Non-linear Matrix Sensing

Authors: Hong T. M. Chu, Subhro Ghosh, Chi Thanh Lam, Soumendu Sundar Mukherjee

Abstract: The phenomenon of implicit regularization has attracted interest in recent years as a fundamental aspect of the remarkable generalizing ability of neural networks. In a nutshell, it entails that gradient descent dynamics in many neural nets, even without any explicit regularizer in the loss function, converges to the solution of a regularized learning problem. However, known results attempting to… ▽ More The phenomenon of implicit regularization has attracted interest in recent years as a fundamental aspect of the remarkable generalizing ability of neural networks. In a nutshell, it entails that gradient descent dynamics in many neural nets, even without any explicit regularizer in the loss function, converges to the solution of a regularized learning problem. However, known results attempting to theoretically explain this phenomenon focus overwhelmingly on the setting of linear neural nets, and the simplicity of the linear structure is particularly crucial to existing arguments. In this paper, we explore this problem in the context of more realistic neural networks with a general class of non-linear activation functions, and rigorously demonstrate the implicit regularization phenomenon for such networks in the setting of matrix sensing problems, together with rigorous rate guarantees that ensure exponentially fast convergence of gradient descent.In this vein, we contribute a network architecture called Spectral Neural Networks (abbrv. SNN) that is particularly suitable for matrix learning problems. Conceptually, this entails coordinatizing the space of matrices by their singular values and singular vectors, as opposed to by their entries, a potentially fruitful perspective for matrix learning. We demonstrate that the SNN architecture is inherently much more amenable to theoretical analysis than vanilla neural nets and confirm its effectiveness in the context of matrix sensing, via both mathematical guarantees and empirical investigations. We believe that the SNN architecture has the potential to be of wide applicability in a broad class of matrix learning scenarios. △ Less

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.14029 [pdf, other]

Partially Frozen Random Networks Contain Compact Strong Lottery Tickets

Authors: Hikari Otsuka, Daiki Chijiwa, Ángel López García-Arias, Yasuyuki Okoshi, Kazushi Kawamura, Thiem Van Chu, Daichi Fujiki, Susumu Takeuchi, Masato Motomura

Abstract: Randomly initialized dense networks contain subnetworks that achieve high accuracy without weight learning--strong lottery tickets (SLTs). Recently, Gadhikar et al. (2023) demonstrated that SLTs could also be found within a randomly pruned source network. This phenomenon can be exploited to further compress the small memory size required by SLTs. However, their method is limited to SLTs that are e… ▽ More Randomly initialized dense networks contain subnetworks that achieve high accuracy without weight learning--strong lottery tickets (SLTs). Recently, Gadhikar et al. (2023) demonstrated that SLTs could also be found within a randomly pruned source network. This phenomenon can be exploited to further compress the small memory size required by SLTs. However, their method is limited to SLTs that are even sparser than the source, leading to worse accuracy due to unintentionally high sparsity. This paper proposes a method for reducing the SLT memory size without restricting the sparsity of the SLTs that can be found. A random subset of the initial weights is frozen by either permanently pruning them or locking them as a fixed part of the SLT, resulting in a smaller model size. Experimental results show that Edge-Popup (Ramanujan et al., 2020; Sreenivasan et al., 2022) finds SLTs with better accuracy-to-model size trade-off within frozen networks than within dense or randomly pruned source networks. In particular, freezing $70\%$ of a ResNet on ImageNet provides $3.3 \times$ compression compared to the SLT found within a dense counterpart, raises accuracy by up to $14.12$ points compared to the SLT found within a randomly pruned counterpart, and offers a better accuracy-model size trade-off than both. △ Less

Submitted 8 February, 2025; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: Accepted at TMLR

arXiv:2312.03236 [pdf, other]

Multicoated and Folded Graph Neural Networks with Strong Lottery Tickets

Authors: Jiale Yan, Hiroaki Ito, Ángel López García-Arias, Yasuyuki Okoshi, Hikari Otsuka, Kazushi Kawamura, Thiem Van Chu, Masato Motomura

Abstract: The Strong Lottery Ticket Hypothesis (SLTH) demonstrates the existence of high-performing subnetworks within a randomly initialized model, discoverable through pruning a convolutional neural network (CNN) without any weight training. A recent study, called Untrained GNNs Tickets (UGT), expanded SLTH from CNNs to shallow graph neural networks (GNNs). However, discrepancies persist when comparing ba… ▽ More The Strong Lottery Ticket Hypothesis (SLTH) demonstrates the existence of high-performing subnetworks within a randomly initialized model, discoverable through pruning a convolutional neural network (CNN) without any weight training. A recent study, called Untrained GNNs Tickets (UGT), expanded SLTH from CNNs to shallow graph neural networks (GNNs). However, discrepancies persist when comparing baseline models with learned dense weights. Additionally, there remains an unexplored area in applying SLTH to deeper GNNs, which, despite delivering improved accuracy with additional layers, suffer from excessive memory requirements. To address these challenges, this work utilizes Multicoated Supermasks (M-Sup), a scalar pruning mask method, and implements it in GNNs by proposing a strategy for setting its pruning thresholds adaptively. In the context of deep GNNs, this research uncovers the existence of untrained recurrent networks, which exhibit performance on par with their trained feed-forward counterparts. This paper also introduces the Multi-Stage Folding and Unshared Masks methods to expand the search space in terms of both architecture and parameters. Through the evaluation of various datasets, including the Open Graph Benchmark (OGB), this work establishes a triple-win scenario for SLTH-based GNNs: by achieving high sparsity, competitive performance, and high memory efficiency with up to 98.7\% reduction, it demonstrates suitability for energy-efficient graph processing. △ Less

Submitted 5 December, 2023; originally announced December 2023.

Comments: 9 pages, accepted in the Second Learning on Graphs Conference (LoG 2023)

Journal ref: Proceedings of the Second Learning on Graphs Conference (LoG 2023), PMLR 231

arXiv:2011.02466 [pdf, ps, other]

Algorithms and Hardness for Linear Algebra on Geometric Graphs

Authors: Josh Alman, Timothy Chu, Aaron Schild, Zhao Song

Abstract: For a function $\mathsf{K} : \mathbb{R}^{d} \times \mathbb{R}^{d} \to \mathbb{R}_{\geq 0}$, and a set $P = \{ x_1, \ldots, x_n\} \subset \mathbb{R}^d$ of $n$ points, the $\mathsf{K}$ graph $G_P$ of $P$ is the complete graph on $n$ nodes where the weight between nodes $i$ and $j$ is given by $\mathsf{K}(x_i, x_j)$. In this paper, we initiate the study of when efficient spectral graph theory is poss… ▽ More For a function $\mathsf{K} : \mathbb{R}^{d} \times \mathbb{R}^{d} \to \mathbb{R}_{\geq 0}$, and a set $P = \{ x_1, \ldots, x_n\} \subset \mathbb{R}^d$ of $n$ points, the $\mathsf{K}$ graph $G_P$ of $P$ is the complete graph on $n$ nodes where the weight between nodes $i$ and $j$ is given by $\mathsf{K}(x_i, x_j)$. In this paper, we initiate the study of when efficient spectral graph theory is possible on these graphs. We investigate whether or not it is possible to solve the following problems in $n^{1+o(1)}$ time for a $\mathsf{K}$-graph $G_P$ when $d < n^{o(1)}$: $\bullet$ Multiply a given vector by the adjacency matrix or Laplacian matrix of $G_P$ $\bullet$ Find a spectral sparsifier of $G_P$ $\bullet$ Solve a Laplacian system in $G_P$'s Laplacian matrix For each of these problems, we consider all functions of the form $\mathsf{K}(u,v) = f(\|u-v\|_2^2)$ for a function $f:\mathbb{R} \rightarrow \mathbb{R}$. We provide algorithms and comparable hardness results for many such $\mathsf{K}$, including the Gaussian kernel, Neural tangent kernels, and more. For example, in dimension $d = Ω(\log n)$, we show that there is a parameter associated with the function $f$ for which low parameter values imply $n^{1+o(1)}$ time algorithms for all three of these problems and high parameter values imply the nonexistence of subquadratic time algorithms assuming Strong Exponential Time Hypothesis ($\mathsf{SETH}$), given natural assumptions on $f$. As part of our results, we also show that the exponential dependence on the dimension $d$ in the celebrated fast multipole method of Greengard and Rokhlin cannot be improved, assuming $\mathsf{SETH}$, for a broad class of functions $f$. To the best of our knowledge, this is the first formal limitation proven about fast multipole methods. △ Less

Submitted 4 November, 2020; originally announced November 2020.

Comments: FOCS 2020

arXiv:2006.10897 [pdf, other]

Efficient Ridesharing Dispatch Using Multi-Agent Reinforcement Learning

Authors: Oscar de Lima, Hansal Shah, Ting-Sheng Chu, Brian Fogelson

Abstract: With the advent of ride-sharing services, there is a huge increase in the number of people who rely on them for various needs. Most of the earlier approaches tackling this issue required handcrafted functions for estimating travel times and passenger waiting times. Traditional Reinforcement Learning (RL) based methods attempting to solve the ridesharing problem are unable to accurately model the c… ▽ More With the advent of ride-sharing services, there is a huge increase in the number of people who rely on them for various needs. Most of the earlier approaches tackling this issue required handcrafted functions for estimating travel times and passenger waiting times. Traditional Reinforcement Learning (RL) based methods attempting to solve the ridesharing problem are unable to accurately model the complex environment in which taxis operate. Prior Multi-Agent Deep RL based methods based on Independent DQN (IDQN) learn decentralized value functions prone to instability due to the concurrent learning and exploring of multiple agents. Our proposed method based on QMIX is able to achieve centralized training with decentralized execution. We show that our model performs better than the IDQN baseline on a fixed grid size and is able to generalize well to smaller or larger grid sizes. Also, our algorithm is able to outperform IDQN baseline in the scenario where we have a variable number of passengers and cars in each episode. Code for our paper is publicly available at: https://github.com/UMich-ML-Group/RL-Ridesharing. △ Less

Submitted 18 June, 2020; originally announced June 2020.

arXiv:2004.09589 [pdf, other]

Weighted Cheeger and Buser Inequalities, with Applications to Clustering and Cutting Probability Densities

Authors: Timothy Chu, Gary L. Miller, Noel J. Walkington, Alex L. Wang

Abstract: In this paper, we show how sparse or isoperimetric cuts of a probability density function relate to Cheeger cuts of its principal eigenfunction, for appropriate definitions of `sparse cut' and `principal eigenfunction'. We construct these appropriate definitions of sparse cut and principal eigenfunction in the probability density setting. Then, we prove Cheeger and Buser type inequalities simila… ▽ More In this paper, we show how sparse or isoperimetric cuts of a probability density function relate to Cheeger cuts of its principal eigenfunction, for appropriate definitions of `sparse cut' and `principal eigenfunction'. We construct these appropriate definitions of sparse cut and principal eigenfunction in the probability density setting. Then, we prove Cheeger and Buser type inequalities similar to those for the normalized graph Laplacian of Alon-Milman. We demonstrate that no such inequalities hold for most prior definitions of sparse cut and principal eigenfunction. We apply this result to generate novel algorithms for cutting probability densities and clustering data, including a principled variant of spectral clustering. △ Less

Submitted 6 May, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

arXiv:2004.01339 [pdf, other]

Multi-agent Reinforcement Learning for Networked System Control

Authors: Tianshu Chu, Sandeep Chinchali, Sachin Katti

Abstract: This paper considers multi-agent reinforcement learning (MARL) in networked system control. Specifically, each agent learns a decentralized control policy based on local observations and messages from connected neighbors. We formulate such a networked MARL (NMARL) problem as a spatiotemporal Markov decision process and introduce a spatial discount factor to stabilize the training of each local age… ▽ More This paper considers multi-agent reinforcement learning (MARL) in networked system control. Specifically, each agent learns a decentralized control policy based on local observations and messages from connected neighbors. We formulate such a networked MARL (NMARL) problem as a spatiotemporal Markov decision process and introduce a spatial discount factor to stabilize the training of each local agent. Further, we propose a new differentiable communication protocol, called NeurComm, to reduce information loss and non-stationarity in NMARL. Based on experiments in realistic NMARL scenarios of adaptive traffic signal control and cooperative adaptive cruise control, an appropriate spatial discount factor effectively enhances the learning curves of non-communicative MARL algorithms, while NeurComm outperforms existing communication protocols in both learning efficiency and control performance. △ Less

Submitted 23 April, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

Comments: ICLR 2020

arXiv:1903.04527 [pdf, other]

Multi-Agent Deep Reinforcement Learning for Large-scale Traffic Signal Control

Authors: Tianshu Chu, Jie Wang, Lara Codecà, Zhaojian Li

Abstract: Reinforcement learning (RL) is a promising data-driven approach for adaptive traffic signal control (ATSC) in complex urban traffic networks, and deep neural networks further enhance its learning power. However, centralized RL is infeasible for large-scale ATSC due to the extremely high dimension of the joint action space. Multi-agent RL (MARL) overcomes the scalability issue by distributing the g… ▽ More Reinforcement learning (RL) is a promising data-driven approach for adaptive traffic signal control (ATSC) in complex urban traffic networks, and deep neural networks further enhance its learning power. However, centralized RL is infeasible for large-scale ATSC due to the extremely high dimension of the joint action space. Multi-agent RL (MARL) overcomes the scalability issue by distributing the global control to each local RL agent, but it introduces new challenges: now the environment becomes partially observable from the viewpoint of each local agent due to limited communication among agents. Most existing studies in MARL focus on designing efficient communication and coordination among traditional Q-learning agents. This paper presents, for the first time, a fully scalable and decentralized MARL algorithm for the state-of-the-art deep RL agent: advantage actor critic (A2C), within the context of ATSC. In particular, two methods are proposed to stabilize the learning procedure, by improving the observability and reducing the learning difficulty of each local agent. The proposed multi-agent A2C is compared against independent A2C and independent Q-learning algorithms, in both a large synthetic traffic grid and a large real-world traffic network of Monaco city, under simulated peak-hour traffic dynamics. Results demonstrate its optimality, robustness, and sample efficiency over other state-of-the-art decentralized MARL algorithms. △ Less

Submitted 11 March, 2019; originally announced March 2019.

arXiv:1706.05544 [pdf]

Rgtsvm: Support Vector Machines on a GPU in R

Authors: Zhong Wang, Tinyi Chu, Lauren A Choate, Charles G Danko

Abstract: Rgtsvm provides a fast and flexible support vector machine (SVM) implementation for the R language. The distinguishing feature of Rgtsvm is that support vector classification and support vector regression tasks are implemented on a graphical processing unit (GPU), allowing the libraries to scale to millions of examples with >100-fold improvement in performance over existing implementations. Nevert… ▽ More Rgtsvm provides a fast and flexible support vector machine (SVM) implementation for the R language. The distinguishing feature of Rgtsvm is that support vector classification and support vector regression tasks are implemented on a graphical processing unit (GPU), allowing the libraries to scale to millions of examples with >100-fold improvement in performance over existing implementations. Nevertheless, Rgtsvm retains feature parity and has an interface that is compatible with the popular e1071 SVM package in R. Altogether, Rgtsvm enables large SVM models to be created by both experienced and novice practitioners. △ Less

Submitted 17 June, 2017; originally announced June 2017.

Comments: 6 pages, 1 figure

arXiv:1301.2261 [pdf]

Semi-Instrumental Variables: A Test for Instrument Admissibility

Authors: Tianjiao Chu, Richard Scheines, Peter L. Spirtes

Abstract: In a causal graphical model, an instrument for a variable X and its effect Y is a random variable that is a cause of X and independent of all the causes of Y except X. (Pearl (1995), Spirtes et al (2000)). Instrumental variables can be used to estimate how the distribution of an effect will respond to a manipulation of its causes, even in the presence of unmeasured common causes (confounders). In… ▽ More In a causal graphical model, an instrument for a variable X and its effect Y is a random variable that is a cause of X and independent of all the causes of Y except X. (Pearl (1995), Spirtes et al (2000)). Instrumental variables can be used to estimate how the distribution of an effect will respond to a manipulation of its causes, even in the presence of unmeasured common causes (confounders). In typical instrumental variable estimation, instruments are chosen based on domain knowledge. There is currently no statistical test for validating a variable as an instrument. In this paper, we introduce the concept of semi-instrument, which generalizes the concept of instrument. We show that in the framework of additive models, under certain conditions, we can test whether a variable is semi-instrumental. Moreover, adding some distribution assumptions, we can test whether two semi-instruments are instrumental. We give algorithms to estimate the p-value that a random variable is semi-instrumental, and the p-value that two semi-instruments are both instrumental. These algorithms can be used to test the experts' choice of instruments, or to identify instruments automatically. △ Less

Submitted 10 January, 2013; originally announced January 2013.

Comments: Appears in Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI2001)

Report number: UAI-P-2001-PG-83-90

arXiv:1109.0320 [pdf, ps, other]

doi 10.1214/11-AOS919

Penalized maximum likelihood estimation and variable selection in geostatistics

Authors: Tingjin Chu, Jun Zhu, Haonan Wang

Abstract: We consider the problem of selecting covariates in spatial linear models with Gaussian process errors. Penalized maximum likelihood estimation (PMLE) that enables simultaneous variable selection and parameter estimation is developed and, for ease of computation, PMLE is approximated by one-step sparse estimation (OSE). To further improve computational efficiency, particularly with large sample siz… ▽ More We consider the problem of selecting covariates in spatial linear models with Gaussian process errors. Penalized maximum likelihood estimation (PMLE) that enables simultaneous variable selection and parameter estimation is developed and, for ease of computation, PMLE is approximated by one-step sparse estimation (OSE). To further improve computational efficiency, particularly with large sample sizes, we propose penalized maximum covariance-tapered likelihood estimation (PMLE$_{\mathrm{T}}$) and its one-step sparse estimation (OSE$_{\mathrm{T}}$). General forms of penalty functions with an emphasis on smoothly clipped absolute deviation are used for penalized maximum likelihood. Theoretical properties of PMLE and OSE, as well as their approximations PMLE$_{\mathrm{T}}$ and OSE$_{\mathrm{T}}$ using covariance tapering, are derived, including consistency, sparsity, asymptotic normality and the oracle properties. For covariance tapering, a by-product of our theoretical results is consistency and asymptotic normality of maximum covariance-tapered likelihood estimates. Finite-sample properties of the proposed methods are demonstrated in a simulation study and, for illustration, the methods are applied to analyze two real data sets. △ Less

Submitted 23 February, 2012; v1 submitted 1 September, 2011; originally announced September 2011.

Comments: Published in at http://dx.doi.org/10.1214/11-AOS919 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS919

Journal ref: Annals of Statistics 2011, Vol. 39, No. 5, 2607-2625

Showing 1–13 of 13 results for author: Chu, T