Place Cells as Proximity-Preserving Embeddings: From Multi-Scale Random Walk to Straight-Forward Path Planning

Authors: Minglu Zhao, Dehong Xu, Deqian Kong, Wen-Hao Zhang, Ying Nian Wu

Abstract: The hippocampus enables spatial navigation through place cell populations forming cognitive maps. We propose proximity-preserving neural embeddings to encode multi-scale random walk transitions, where the inner product $\langle h(x, t), h(y, t) \rangle = q(y|x, t)$ represents normalized transition probabilities, with $h(x, t)$ as the embedding at location $x$ and $q(y|x, t)$ as the transition probability at scale $\sqrt{t}$. This scale hierarchy mirrors hippocampal dorsoventral organization. The embeddings $h(x, t)$ reduce pairwise spatial proximity into an environmental map, with Euclidean distances preserving proximity information. We use gradient ascent on $q(y|x, t)$ for straight-forward path planning, employing adaptive scale selection for trap-free, smooth trajectories, equivalent to minimizing embedding space distances. Matrix squaring ($P_{2t} = P_t^2$) efficiently builds global transitions from local ones ($P_1$), enabling preplay-like shortcut prediction. Experiments demonstrate localized place fields, multi-scale tuning, adaptability, and remapping, achieving robust navigation in complex environments. Our biologically plausible framework, extensible to theta-phase precession, unifies spatial and temporal coding for scalable navigation.

Submitted 2 June, 2025; v1 submitted 20 May, 2025; originally announced May 2025.
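
The matrix-squaring step mentioned in the abstract above ($P_{2t} = P_t^2$) can be illustrated with a minimal sketch. The lazy random walk on a line of cells, and the helper names `matmul`, `local_walk`, and `multiscale`, are assumptions made for illustration only; they are not taken from the paper.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def local_walk(n):
    """One-step transition matrix P_1 for a lazy random walk on a line of n cells.

    From each cell the walker stays put or moves to an adjacent cell with
    equal probability (an assumed toy environment).
    """
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbors = [j for j in (i - 1, i, i + 1) if 0 <= j < n]
        for j in neighbors:
            p[i][j] = 1.0 / len(neighbors)
    return p


def multiscale(p1, levels):
    """Return [P_1, P_2, P_4, ...], squaring the previous matrix at each level."""
    scales = [p1]
    for _ in range(levels):
        scales.append(matmul(scales[-1], scales[-1]))
    return scales


scales = multiscale(local_walk(5), 3)  # P_1, P_2, P_4, P_8
```

Each squaring doubles the time scale, so `levels` squarings reach $t = 2^{\text{levels}}$ with only `levels` matrix products; cells that are unreachable in one step under $P_1$ gain nonzero probability at the coarser scales.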

arXiv:2502.01567 [pdf, ps, other]

Latent Thought Models with Variational Bayes Inference-Time Computation

Authors: Deqian Kong, Minglu Zhao, Dehong Xu, Bo Pang, Shu Wang, Edouardo Honig, Zhangzhang Si, Chuan Li, Jianwen Xie, Sirui Xie, Ying Nian Wu

Abstract: We propose a novel class of language models, Latent Thought Models (LTMs), which incorporate explicit latent thought vectors that follow an explicit prior model in latent space. These latent thought vectors guide the autoregressive generation of ground tokens through a Transformer decoder. Training employs a dual-rate optimization process within the classical variational Bayes framework: fast learning of local variational parameters for the posterior distribution of latent vectors (inference-time computation), and slow learning of global decoder parameters. Empirical studies reveal that LTMs possess additional scaling dimensions beyond traditional Large Language Models (LLMs), such as the number of iterations in inference-time computation and number of latent thought vectors. Higher sample efficiency can be achieved by increasing training compute per token, with further gains possible by trading model size for more inference steps. Designed based on these scaling properties, LTMs demonstrate superior sample and parameter efficiency compared to autoregressive models and discrete diffusion models. They significantly outperform these counterparts in validation perplexity and zero-shot language modeling tasks. Additionally, LTMs exhibit emergent few-shot in-context reasoning capabilities that scale with model size, and achieve competitive performance in conditional and unconditional text generation.

Submitted 6 June, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

arXiv:2501.11869 [pdf, other]

Saturation in Snapshot Compressive Imaging

Authors: Mengyu Zhao, Shirin Jalali

Abstract: Snapshot Compressive Imaging (SCI) maps three-dimensional (3D) data cubes, such as videos or hyperspectral images, into two-dimensional (2D) measurements via optical modulation, enabling efficient data acquisition and reconstruction. Recent advances have shown the potential of mask optimization to enhance SCI performance, but most studies overlook nonlinear distortions caused by saturation in practical systems. Saturation occurs when high-intensity measurements exceed the sensor's dynamic range, leading to information loss that standard reconstruction algorithms cannot fully recover. This paper addresses the challenge of optimizing binary masks in SCI under saturation. We theoretically characterize the performance of compression-based SCI recovery in the presence of saturation and leverage these insights to optimize masks for such conditions. Our analysis reveals trade-offs between mask statistics and reconstruction quality in saturated systems. Experimental results using a Plug-and-Play (PnP) style network validate the theory, demonstrating improved recovery performance and robustness to saturation with our optimized binary masks.

Submitted 20 January, 2025; originally announced January 2025.

Comments: 13 pages
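
As a rough illustration of the saturation effect described in the abstract above, the sketch below sums mask-modulated frames into a single 2D measurement and clips it at the sensor's maximum value. The function name `sci_measure`, the `saturation_level` parameter, and the toy data are hypothetical; the paper's actual forward model and mask design are not reproduced here.

```python
def sci_measure(frames, masks, saturation_level):
    """Collapse a stack of frames into one 2D measurement, then clip it.

    frames, masks: lists of equally sized 2D lists (one binary mask per frame).
    saturation_level: assumed upper limit of the sensor's dynamic range.
    """
    height, width = len(frames[0]), len(frames[0][0])
    measurement = [[0.0] * width for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for r in range(height):
            for c in range(width):
                # Each frame is modulated pixel-wise by its mask before summing.
                measurement[r][c] += mask[r][c] * frame[r][c]
    # Saturation: values above the sensor limit are clipped, losing information.
    return [[min(value, saturation_level) for value in row] for row in measurement]


frames = [[[0.5, 0.9]], [[0.4, 0.8]]]   # two 1x2 frames
masks = [[[1, 1]], [[1, 1]]]            # all-ones binary masks
clipped = sci_measure(frames, masks, saturation_level=1.0)
```

In this toy case the second pixel sums to 1.7 and is clipped to 1.0, while the first pixel (0.9) passes through unchanged. Masks with fewer ones lower the sum and therefore the chance of clipping, which hints at the trade-off between mask statistics and reconstruction quality that the abstract mentions.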

arXiv:2501.06653 [pdf, other]

Theoretical Characterization of Effect of Masks in Snapshot Compressive Imaging

Authors: Mengyu Zhao, Shirin Jalali

Abstract: Snapshot compressive imaging (SCI) refers to the recovery of three-dimensional data cubes, such as videos or hyperspectral images, from their two-dimensional projections, which are generated by a special encoding of the data with a mask. SCI systems commonly use binary-valued masks that follow certain physical constraints. Optimizing these masks subject to these constraints is expected to improve system performance. However, prior theoretical work on SCI systems focuses solely on independently and identically distributed (i.i.d.) Gaussian masks, which do not permit such optimization. On the other hand, existing practical mask optimizations rely on computationally intensive joint optimizations that provide limited insight into the role of masks and are expected to be sub-optimal due to the non-convexity and complexity of the optimization. In this paper, we analytically characterize the performance of SCI systems employing binary masks and leverage our analysis to optimize hardware parameters. Our findings provide a comprehensive and fundamental understanding of the role of binary masks, with both independent and dependent elements, and their optimization. We also present simulation results that confirm our theoretical findings and further illuminate different aspects of mask design.

Submitted 11 January, 2025; originally announced January 2025.

Comments: 27 pages. arXiv admin note: substantial text overlap with arXiv:2307.07796

arXiv:2411.15189 [pdf, other]

Order is All You Need for Categorical Data Clustering

Authors: Yiqun Zhang, Mingjie Zhao, Hong Jia, Yang Lu, Mengke Li, Yiu-ming Cheung

Abstract: Categorical data composed of qualitative valued attributes are ubiquitous in machine learning tasks. Due to the lack of a well-defined metric space, categorical data distributions are difficult to understand intuitively. Clustering is a popular data analysis technique suitable for data distribution understanding. However, the success of clustering often relies on reasonable distance metrics, which happen to be what categorical data naturally lack. This paper therefore introduces a new finding that the order relation among attribute values is the decisive factor in clustering accuracy, and is also the key to understanding categorical data clusters, because the essence of clustering is to order the clusters in terms of their admission to samples. To obtain the orders, we propose a new learning paradigm that allows joint learning of clusters and the orders. It alternately partitions the data into clusters based on the distance metric built upon the orders and estimates the most likely orders according to the clusters. The algorithm achieves superior clustering accuracy with a convergence guarantee, and the learned orders facilitate the understanding of the non-intuitive cluster distribution of categorical data. Extensive experiments with ablation studies, statistical evidence, and case studies have validated the new insight into the importance of value order and the method proposition. The source code is temporarily available at https://anonymous.4open.science/r/OCL-demo.

Submitted 18 April, 2025; v1 submitted 19 November, 2024; originally announced November 2024.

arXiv:2411.10596 [pdf, other]

A minimalistic representation model for head direction system

Authors: Minglu Zhao, Dehong Xu, Deqian Kong, Wen-Hao Zhang, Ying Nian Wu

Abstract: We present a minimalistic representation model for the head direction (HD) system, aiming to learn a high-dimensional representation of head direction that captures essential properties of HD cells. Our model is a representation of rotation group $U(1)$, and we study both the fully connected version and convolutional version. We demonstrate the emergence of Gaussian-like tuning profiles and a 2D circle geometry in both versions of the model. We also demonstrate that the learned model is capable of accurate path integration.

Submitted 2 June, 2025; v1 submitted 15 November, 2024; originally announced November 2024.

Comments: Proceedings of the Annual Meeting of the Cognitive Science Society (CogSci 2025)

arXiv:2407.11336 [pdf, other]

Redistricting Reforms Reduce Gerrymandering by Constraining Partisan Actors

Authors: Cory McCartan, Christopher T. Kenny, Tyler Simko, Emma Ebowe, Michael Y. Zhao, Kosuke Imai

Abstract: Political actors often manipulate redistricting plans to gain electoral advantages, a process known as gerrymandering. Several states have implemented institutional reforms to address this problem, such as establishing map-drawing commissions. Estimating the impact of such reforms is challenging because each state structures bundles of rules in different ways. We model redistricting as a sequential game, where each state's equilibrium solution summarizes multi-step institutional interactions as a single-dimensional score. We argue this score measures the leeway political actors have over the partisan lean of the final plan. Using a differences-in-differences design, we demonstrate that reforms reduce partisan bias and increase competitiveness when they constrain partisan actors. We perform a counterfactual policy analysis to estimate the effects of enacting recent reforms nationwide. Though commissions generally reduce bias, reforms that restrict partisan actors in multiple ways like removing veto points (Michigan) are much more effective than commissions where parties retain some control (Ohio).

Submitted 16 February, 2025; v1 submitted 15 July, 2024; originally announced July 2024.

Comments: 21 pages, 7 figures, 1 table, plus references and appendices

arXiv:2404.04399 [pdf, ps, other]

Longitudinal Targeted Minimum Loss-based Estimation with Temporal-Difference Heterogeneous Transformer

Authors: Toru Shirakawa, Yi Li, Yulun Wu, Sky Qiu, Yuxuan Li, Mingduo Zhao, Hiroyasu Iso, Mark van der Laan

Abstract: We propose Deep Longitudinal Targeted Minimum Loss-based Estimation (Deep LTMLE), a novel approach to estimate the counterfactual mean of outcome under dynamic treatment policies in longitudinal problem settings. Our approach utilizes a transformer architecture with heterogeneous type embedding trained using temporal-difference learning. After obtaining an initial estimate using the transformer, following the targeted minimum loss-based likelihood estimation (TMLE) framework, we statistically corrected for the bias commonly associated with machine learning algorithms. Furthermore, our method also facilitates statistical inference by enabling the provision of 95% confidence intervals grounded in asymptotic statistical theory. Simulation results demonstrate our method's superior performance over existing approaches, particularly in complex, long time-horizon scenarios. It remains effective in small-sample, short-duration contexts, matching the performance of asymptotically efficient estimators. To demonstrate our method in practice, we applied our method to estimate counterfactual mean outcomes for standard versus intensive blood pressure management strategies in a real-world cardiovascular epidemiology cohort study.

Submitted 5 June, 2025; v1 submitted 5 April, 2024; originally announced April 2024.

Comments: Published in ICML 2024, PMLR 235

Journal ref: Proceedings of the 41st International Conference on Machine Learning, PMLR 235:45097-45113, 2024

arXiv:2403.15999 [pdf, ps, other]

Near-Optimal differentially private low-rank trace regression with guaranteed private initialization

Authors: Mengyue Zha

Abstract: We study differentially private (DP) estimation of a rank-$r$ matrix $M \in \mathbb{R}^{d_1\times d_2}$ under the trace regression model with Gaussian measurement matrices. Theoretically, the sensitivity of non-private spectral initialization is precisely characterized, and the differential-privacy-constrained minimax lower bound for estimating $M$ under the Schatten-$q$ norm is established. Methodologically, the paper introduces a computationally efficient algorithm for DP-initialization with a sample size of $n \geq \widetilde O (r^2 (d_1\vee d_2))$. Under certain regularity conditions, the DP-initialization falls within a local ball surrounding $M$. We also propose a differentially private algorithm for estimating $M$ based on Riemannian optimization (DP-RGrad), which achieves a near-optimal convergence rate with the DP-initialization and sample size of $n \geq \widetilde O(r (d_1 + d_2))$. Finally, the paper discusses the non-trivial gap between the minimax lower bound and the upper bound of low-rank matrix estimation under the trace regression model. It is shown that the estimator given by DP-RGrad attains the optimal convergence rate in a weaker notion of differential privacy. Our powerful technique for analyzing the sensitivity of initialization requires no eigengap condition between $r$ non-zero singular values.

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2401.16320 [pdf, ps, other]

A Strategy for Preparing Quantum Squeezed States Using Reinforcement Learning

Authors: X. L. Zhao, Y. M. Zhao, M. Li, T. T. Li, Q. Liu, S. Guo, X. X. Yi

Abstract: We propose a scheme leveraging reinforcement learning to engineer control fields for generating non-classical states. It is exemplified by the application to prepare spin-squeezed states for an open collective spin model where a linear control field is designed to govern the dynamics. The reinforcement learning agent determines the temporal sequence of control pulses, commencing from a coherent spin state in an environment characterized by dissipation and dephasing. Compared to the constant control scenario, this approach provides various control sequences maintaining collective spin squeezing and entanglement. It is observed that denser application of the control pulses enhances the performance of the outcomes. However, there is a minor enhancement in the performance by adding control actions. The proposed strategy demonstrates increased effectiveness for larger systems. Thermal excitations of the reservoir are detrimental to the control outcomes. Feasible experiments are suggested to implement this control proposal based on the comparison with the others. The extensions to continuous control problems and another quantum system are discussed. The replaceability of the reinforcement learning module is also emphasized. This research paves the way for its application in manipulating other quantum systems.

Submitted 14 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.03820 [pdf, other]

Optimal Differentially Private PCA and Estimation for Spiked Covariance Matrices

Authors: T. Tony Cai, Dong Xia, Mengyue Zha

Abstract: Estimating a covariance matrix and its associated principal components is a fundamental problem in contemporary statistics. While optimal estimation procedures have been developed with well-understood properties, the increasing demand for privacy preservation introduces new complexities to this classical problem. In this paper, we study optimal differentially private Principal Component Analysis (PCA) and covariance estimation within the spiked covariance model. We precisely characterize the sensitivity of eigenvalues and eigenvectors under this model and establish the minimax rates of convergence for estimating both the principal components and covariance matrix. These rates hold up to logarithmic factors and encompass general Schatten norms, including spectral norm, Frobenius norm, and nuclear norm as special cases. We propose computationally efficient differentially private estimators and prove their minimax optimality for sub-Gaussian distributions, up to logarithmic factors. Additionally, matching minimax lower bounds are established. Notably, compared to the existing literature, our results accommodate a diverging rank, a broader range of signal strengths, and remain valid even when the sample size is much smaller than the dimension, provided the signal strength is sufficiently strong. Both simulation studies and real data experiments demonstrate the merits of our method.

Submitted 27 September, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

arXiv:2311.11922 [pdf, other]

Evaluating the Surrogate Index as a Decision-Making Tool Using 200 A/B Tests at Netflix

Authors: Vickie Zhang, Michael Zhao, Maria Dimakopoulou, Anh Le, Nathan Kallus

Abstract: Surrogate index approaches have recently become a popular method of estimating longer-term impact from shorter-term outcomes. In this paper, we leverage 1098 test arms from 200 A/B tests at Netflix to empirically investigate to what degree decisions made using a surrogate index utilizing 14 days of data would align with those made using direct measurement of day 63 treatment effects. Focusing specifically on linear "auto-surrogate" models that utilize the shorter-term observations of the long-term outcome of interest, we find that the statistical inferences that we would draw from using the surrogate index are ~95% consistent with those from directly measuring the long-term treatment effect. Moreover, when we restrict ourselves to the set of tests that would be "launched" (i.e. positive and statistically significant) based on the 63-day directly measured treatment effects, we find that relying instead on the surrogate index achieves 79% and 65% recall.

Submitted 30 January, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.04657 [pdf, other]

Long-Term Causal Inference with Imperfect Surrogates using Many Weak Experiments, Proxies, and Cross-Fold Moments

Authors: Aurélien Bibaut, Nathan Kallus, Simon Ejdemyr, Michael Zhao

Abstract: Inferring causal effects on long-term outcomes using short-term surrogates is crucial to rapid innovation. However, even when treatments are randomized and surrogates fully mediate their effect on outcomes, it's possible that we get the direction of causal effects wrong due to confounding between surrogates and outcomes, a situation famously known as the surrogate paradox. The availability of many historical experiments offers the opportunity to instrument for the surrogate and bypass this confounding. However, even as the number of experiments grows, two-stage least squares has non-vanishing bias if each experiment has a bounded size, and this bias is exacerbated when most experiments barely move metrics, as occurs in practice. We show how to eliminate this bias using cross-fold procedures, JIVE being one example, and construct valid confidence intervals for the long-term effect in new experiments where the long-term outcome has not yet been observed. Our methodology further allows proxying for effects not perfectly mediated by the surrogates, allowing us to handle both confounding and effect leakage as violations of standard statistical surrogacy conditions.

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2209.00541 [pdf, ps, other]

Variable selection for varying multi-index coefficients models with applications to synergistic GxE interactions

Authors: Shunjie Guan, Mingtao Zhao, Yuehua Cui

Abstract: Epidemiological evidence suggests that simultaneous exposures to multiple environmental risk factors (Es) can increase disease risk by more than the additive effect of individual exposures acting alone. The interaction between a gene and multiple Es on a disease risk is termed synergistic gene-environment interaction (synG$\times$E). Varying multi-index coefficients models (VMICM) have been a promising tool to model synergistic G$\times$E effect and to understand how multiple Es jointly influence genetic risks on a disease outcome. In this work, we proposed a 3-step variable selection approach for VMICM to estimate different effects of gene variables: varying, non-zero constant and zero effects, which respectively correspond to nonlinear synG$\times$E, no synG$\times$E and no genetic effect. For multiple environmental exposure variables, we also estimated and selected important environmental variables that contribute to the synergistic interaction effect. We theoretically evaluated the oracle property of the proposed variable selection approach. Extensive simulation studies were conducted to evaluate the finite sample performance of the method, considering both continuous and discrete gene variables. Application to a real dataset further demonstrated the utility of the method. Our method has broad applications in areas where the purpose is to identify synergistic interaction effect.

Submitted 1 September, 2022; originally announced September 2022.

arXiv:2208.13059 [pdf]

Emergent Spatial Characteristics from Strategic Games Simulated on Random and Real Networks

Authors: Louis Zhao, Chen Ye Gan, Minglu Zhao

Abstract: Complex networks are a great tool for simulating the outcomes of different strategies used within the iterated prisoners' dilemma game. However, because the strategies themselves rely on the connection between nodes, the initial network structure should have an impact on the progression of the game. By defining each interaction in terms of a prisoner's dilemma and using its payoff matrix as a basis for investigation, we implemented players with various interaction and edge attachment strategies, and ran this dynamic process on real and random networks with varying network structure. We found that both network size and small world properties played an important role in deciding not only the convergence rate of the simulation but also the dominant status of nodes, under the conditions where identical strategies are employed by every player.

Submitted 27 August, 2022; originally announced August 2022.

Comments: Iterated Prisoner's Dilemma, Simulation, Complex Network, 7 Pages, 6 Figures, PNAS format

arXiv:2008.08844 [pdf, other]

Complete the Missing Half: Augmenting Aggregation Filtering with Diversification for Graph Convolutional Networks

Authors: Sitao Luan, Mingde Zhao, Chenqing Hua, Xiao-Wen Chang, Doina Precup

Abstract: The core operation of current Graph Neural Networks (GNNs) is the aggregation enabled by the graph Laplacian or message passing, which filters the neighborhood node information. Though effective for various tasks, in this paper, we show that they are potentially a problematic factor underlying all GNN methods for learning on certain datasets, as they force the node representations to be similar, making the nodes gradually lose their identity and become indistinguishable. Hence, we augment the aggregation operations with their dual, i.e. diversification operators that make the node more distinct and preserve the identity. Such augmentation replaces the aggregation with a two-channel filtering process that, in theory, is beneficial for enriching the node representations. In practice, the proposed two-channel filters can be easily patched on existing GNN methods with diverse training strategies, including spectral and spatial (message passing) methods. In the experiments, we observe desired characteristics of the models and significant performance boost upon the baselines on 9 node classification tasks.

Submitted 2 November, 2022; v1 submitted 20 August, 2020; originally announced August 2020.

Comments: New Frontiers in Graph Learning (GLFrontiers) Workshop (Oral), NeurIPS 2022

arXiv:2008.08838 [pdf, ps, other]

Training Matters: Unlocking Potentials of Deeper Graph Convolutional Neural Networks

Authors: Sitao Luan, Mingde Zhao, Xiao-Wen Chang, Doina Precup

Abstract: The performance limit of Graph Convolutional Networks (GCNs) and the fact that we cannot stack more of them to increase the performance, which we usually do for other deep learning paradigms, are pervasively thought to be caused by the limitations of the GCN layers, including insufficient expressive power, etc. However, if so, for a fixed architecture, it would be unlikely to lower the training difficulty and to improve performance by changing only the training procedure, which we show in this paper to be not only possible but possible in several ways. This paper first identifies the training difficulty of GCNs from the perspective of graph signal energy loss. More specifically, we find that the loss of energy in the backward pass during training nullifies the learning of the layers closer to the input. Then, we propose several methodologies to mitigate the training problem by slightly modifying the GCN operator, from the energy perspective. After empirical validation, we confirm that these changes of operator lead to a significant decrease in the training difficulties and a notable performance boost, without changing the composition of parameters. With these, we conclude that the root cause of the problem is more likely the training difficulty than the others.

Submitted 3 November, 2023; v1 submitted 20 August, 2020; originally announced August 2020.

Comments: Accepted by 12th International Conference on Complex Networks and Their Applications

arXiv:2007.06178 [pdf, other]

Bridging Maximum Likelihood and Adversarial Learning via $α$-Divergence

Authors: Miaoyun Zhao, Yulai Cong, Shuyang Dai, Lawrence Carin

Abstract: Maximum likelihood (ML) and adversarial learning are two popular approaches for training generative models, and from many perspectives these techniques are complementary. ML learning encourages the capture of all data modes, and it is typically characterized by stable training. However, ML learning tends to distribute probability mass diffusely over the data space, $e.g.$, yielding blurry synthetic images. Adversarial learning is well known to synthesize highly realistic natural images, despite practical challenges like mode dropping and delicate training. We propose an $α$-Bridge to unify the advantages of ML and adversarial learning, enabling the smooth transfer from one to the other via the $α$-divergence. We reveal that generalizations of the $α$-Bridge are closely related to approaches developed recently to regularize adversarial learning, providing insights into that prior work, and further understanding of why the $α$-Bridge performs well in practice.

Submitted 13 July, 2020; originally announced July 2020.

Comments: AAAI 2020

arXiv:2006.08906 [pdf, other]

META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning

Authors: Mingde Zhao

Abstract: Temporal-Difference (TD) learning is a standard and very successful reinforcement learning approach, at the core of both algorithms that learn the value of a given policy, as well as algorithms which learn how to improve policies. TD-learning with eligibility traces provides a way to do temporal credit assignment, i.e. decide which portion of a reward should be assigned to predecessor states that occurred at different previous times, controlled by a parameter $λ$. However, tuning this parameter can be time-consuming, and not tuning it can lead to inefficient learning. To improve the sample efficiency of TD-learning, we propose a meta-learning method for adjusting the eligibility trace parameter, in a state-dependent manner. The adaptation is achieved with the help of auxiliary learners that learn distributional information about the update targets online, incurring roughly the same computational complexity per step as the usual value learner. Our approach can be used both in on-policy and off-policy learning. We prove that, under some assumptions, the proposed method improves the overall quality of the update targets, by minimizing the overall target error. This method can be viewed as a plugin which can also be used to assist prediction with function approximation by meta-learning feature (observation)-based $λ$ online, or even in the control case to assist policy improvement. Our empirical evaluation demonstrates significant performance improvements, as well as improved robustness of the proposed algorithm to learning rate variation.

Submitted 15 June, 2020; originally announced June 2020.

Comments: A thesis submitted to McGill University in partial fulfillment of the requirements of the degree of Master of Computer Science

arXiv:2006.08873 [pdf, other]

GO Hessian for Expectation-Based Objectives

Authors: Yulai Cong, Miaoyun Zhao, Jianqiao Li, Junya Chen, Lawrence Carin

Abstract: An unbiased low-variance gradient estimator, termed GO gradient, was proposed recently for expectation-based objectives $\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})} [f(\boldsymbol{y})]$, where the random variable (RV) $\boldsymbol{y}$ may be drawn from a stochastic computation graph with continuous (non-reparameterizable) internal nodes and continuous/discrete leaves. Upgrading the GO gradient, we present for $\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})} [f(\boldsymbol{y})]$ an unbiased low-variance Hessian estimator, named GO Hessian. Considering practical implementation, we reveal that GO Hessian is easy-to-use with auto-differentiation and Hessian-vector products, enabling efficient cheap exploitation of curvature information over stochastic computation graphs. As representative examples, we present the GO Hessian for non-reparameterizable gamma and negative binomial RVs/nodes. Based on the GO Hessian, we design a new second-order method for $\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})} [f(\boldsymbol{y})]$, with rigorous experiments conducted to verify its effectiveness and efficiency.

Submitted 15 June, 2020; originally announced June 2020.

arXiv:2003.12795 [pdf, ps, other]

Semi-Federated Learning

Authors: Zhikun Chen, Daofeng Li, Ming Zhao, Sihai Zhang, Jinkang Zhu

Abstract: Federated learning (FL) enables massive distributed Information and Communication Technology (ICT) devices to learn a global consensus model without any participants revealing their own data to the central server. However, the practicality, communication expense and non-independent and identical distribution (Non-IID) data challenges in FL still need to be addressed. In this work, we propose Semi-Federated Learning (Semi-FL), which differs from FL in two aspects: local client clustering and in-cluster training. A sequential training manner is designed for our in-cluster training in this paper, which enables the neighboring clients to share their learning models. The proposed Semi-FL can be easily applied to future mobile communication networks and requires less up-link transmission bandwidth. Numerical experiments validate the feasibility, learning performance and the robustness to Non-IID data of the proposed Semi-FL. The Semi-FL extends the existing potentials of FL.

Submitted 28 March, 2020; originally announced March 2020.

arXiv:2002.07605 [pdf]

doi 10.1016/j.neucom.2020.07.088

A comprehensive review on convolutional neural network in machine fault diagnosis

Authors: Jinyang Jiao, Ming Zhao, Jing Lin, Kaixuan Liang

Abstract: With the rapid development of manufacturing industry, machine fault diagnosis has become increasingly significant to ensure safe equipment operation and production. Consequently, multifarious approaches have been explored and developed in the past years, of which intelligent algorithms develop particularly rapidly. Convolutional neural network, as a typical representative of intelligent diagnostic models, has been extensively studied and applied in the recent five years, and a large amount of literature has been published in academic journals and conference proceedings. However, there has not been a systematic review to cover these studies and make a prospect for the further research. To fill in this gap, this work attempts to review and summarize the development of the Convolutional Network based Fault Diagnosis (CNFD) approaches comprehensively. Generally, a typical CNFD framework is composed of the following steps, namely, data collection, model construction, and feature learning and decision making; thus this paper is organized by following this stream. Firstly, data collection process is described, in which several popular datasets are introduced. Then, the fundamental theory from the basic convolutional neural network to its variants is elaborated. After that, the applications of CNFD are reviewed in terms of three mainstream directions, i.e. classification, prediction and transfer diagnosis. Finally, conclusions and prospects are presented to point out the characteristics of current development, facing challenges and future trends. Last but not least, it is expected that this work would provide convenience and inspire further exploration for researchers in this field.

Submitted 13 February, 2020; originally announced February 2020.

arXiv:2002.07601 [pdf, other]

ADMM-based Decoder for Binary Linear Codes Aided by Deep Learning

Authors: Yi Wei, Ming-Min Zhao, Min-Jian Zhao, Ming Lei

Abstract: Inspired by the recent advances in deep learning (DL), this work presents a deep neural network aided decoding algorithm for binary linear codes. Based on the concept of deep unfolding, we design a decoding network by unfolding the alternating direction method of multipliers (ADMM)-penalized decoder. In addition, we propose two improved versions of the proposed network. The first one transforms the penalty parameter into a set of iteration-dependent ones, and the second one adopts a specially designed penalty function, which is based on a piecewise linear function with adjustable slopes. Numerical results show that the resulting DL-aided decoders outperform the original ADMM-penalized decoder for various low density parity check (LDPC) codes with similar computational complexity.

Submitted 13 February, 2020; originally announced February 2020.

Comments: 5 pages, 4 figures, accepted for publication in IEEE communications letters

arXiv:1912.06444 [pdf]

Deep Self-representative Concept Factorization Network for Representation Learning

Authors: Yan Zhang, Zhao Zhang, Zheng Zhang, Mingbo Zhao, Li Zhang, Zhengjun Zha, Meng Wang

Abstract: In this paper, we investigate the unsupervised deep representation learning issue and technically propose a novel framework called Deep Self-representative Concept Factorization Network (DSCF-Net), for clustering deep features. To improve the representation and clustering abilities, DSCF-Net explicitly considers discovering hidden deep semantic features, enhancing the robustness properties of the deep factorization to noise and preserving the local manifold structures of deep features. Specifically, DSCF-Net seamlessly integrates the robust deep concept factorization, deep self-expressive representation and adaptive locality preserving feature learning into a unified framework. To discover hidden deep representations, DSCF-Net designs a hierarchical factorization architecture using multiple layers of linear transformations, where the hierarchical representation is performed by formulating the problem as optimizing the basis concepts in each layer to improve the representation indirectly. DSCF-Net also improves the robustness by subspace recovery for sparse error correction firstly and then performs the deep factorization in the recovered visual subspace. To obtain locality-preserving representations, we also present an adaptive deep self-representative weighting strategy by using the coefficient matrix as the adaptive reconstruction weights to keep the locality of representations. Extensive comparison results with several other related models show that DSCF-Net delivers state-of-the-art performance on several public databases.

Submitted 29 December, 2019; v1 submitted 13 December, 2019; originally announced December 2019.

Comments: Accepted by SDM 2020

arXiv:1911.08678 [pdf]

Robust Triple-Matrix-Recovery-Based Auto-Weighted Label Propagation for Classification

Authors: Huan Zhang, Zhao Zhang, Mingbo Zhao, Qiaolin Ye, Min Zhang, Meng Wang

Abstract: The graph-based semi-supervised label propagation algorithm has delivered impressive classification results. However, the estimated soft labels typically contain mixed signs and noise, which cause inaccurate predictions due to the lack of suitable constraints. Moreover, available methods typically calculate the weights and estimate the labels in the original input space, which typically contains noise and corruption. Thus, the encoded similarities and manifold smoothness may be inaccurate for label estimation. In this paper, we present effective schemes for resolving these issues and propose a novel and robust semi-supervised classification algorithm, namely, the triple-matrix-recovery-based robust auto-weighted label propagation framework (ALP-TMR). Our ALP-TMR introduces a triple matrix recovery mechanism to remove noise or mixed signs from the estimated soft labels and improve the robustness to noise and outliers in the steps of assigning weights and predicting the labels simultaneously. Our method can jointly recover the underlying clean data, clean labels and clean weighting spaces by decomposing the original data, predicted soft labels or weights into a clean part plus an error part by fitting noise. In addition, ALP-TMR integrates the auto-weighting process by minimizing reconstruction errors over the recovered clean data and clean soft labels, which can encode the weights more accurately to improve both data representation and classification. By classifying samples in the recovered clean label and weight spaces, one can potentially improve the label prediction results. The results of extensive experiments demonstrated the satisfactory performance of our ALP-TMR.

Submitted 19 November, 2019; originally announced November 2019.

Comments: Accepted by IEEE TNNLS

arXiv:1906.03814 [pdf, other]

doi 10.1109/TSP.2020.3035832

Learned Conjugate Gradient Descent Network for Massive MIMO Detection

Authors: Yi Wei, Ming-Min Zhao, Mingyi Hong, Min-jian Zhao, Ming Lei

Abstract: In this work, we consider the use of model-driven deep learning techniques for massive multiple-input multiple-output (MIMO) detection. Compared with conventional MIMO systems, massive MIMO promises improved spectral efficiency, coverage and range. Unfortunately, these benefits are coming at the cost of significantly increased computational complexity. To reduce the complexity of signal detection and guarantee the performance, we present a learned conjugate gradient descent network (LcgNet), which is constructed by unfolding the iterative conjugate gradient descent (CG) detector. In the proposed network, instead of calculating the exact values of the scalar step-sizes, we explicitly learn their universal values. Also, we can enhance the proposed network by augmenting the dimensions of these step-sizes. Furthermore, in order to reduce the memory costs, a novel quantized LcgNet is proposed, where a low-resolution nonuniform quantizer is integrated into the LcgNet to smartly quantize the aforementioned step-sizes. The quantizer is based on a specially designed soft staircase function with learnable parameters to adjust its shape. Meanwhile, due to the fact that the number of learnable parameters is limited, the proposed networks are easy and fast to train. Numerical results demonstrate that the proposed network can achieve promising performance with much lower complexity.

Submitted 1 June, 2020; v1 submitted 10 June, 2019; originally announced June 2019.

Comments: Part of this work has been accepted by IEEE ICC 2020

arXiv:1906.02174 [pdf, other]

Break the Ceiling: Stronger Multi-scale Deep Graph Convolutional Networks

Authors: Sitao Luan, Mingde Zhao, Xiao-Wen Chang, Doina Precup

Abstract: Recently, neural network based approaches have achieved significant improvement for solving large, complex, graph-structured problems. However, their bottlenecks still need to be addressed, and the advantages of multi-scale information and deep architectures have not been sufficiently exploited. In this paper, we theoretically analyze how existing Graph Convolutional Networks (GCNs) have limited expressive power due to the constraint of the activation functions and their architectures. We generalize spectral graph convolution and deep GCN in block Krylov subspace forms and devise two architectures, both with the potential to be scaled deeper but each making use of the multi-scale information in different ways. We further show that the equivalence of these two architectures can be established under certain conditions. On several node classification tasks, with or without the help of validation, the two new architectures achieve better performance compared to many state-of-the-art methods.

Submitted 8 September, 2019; v1 submitted 5 June, 2019; originally announced June 2019.

Comments: Accepted and to be published by NeurIPS 2019

arXiv:1905.04413 [pdf, other]

Knowledge-aware Graph Neural Networks with Label Smoothness Regularization for Recommender Systems

Authors: Hongwei Wang, Fuzheng Zhang, Mengdi Zhang, Jure Leskovec, Miao Zhao, Wenjie Li, Zhongyuan Wang

Abstract: Knowledge graphs capture structured information and relations between a set of entities or items. As such knowledge graphs represent an attractive source of information that could help improve recommender systems. However, existing approaches in this domain rely on manual feature engineering and do not allow for an end-to-end training. Here we propose Knowledge-aware Graph Neural Networks with Label Smoothness regularization (KGNN-LS) to provide better recommendations. Conceptually, our approach computes user-specific item embeddings by first applying a trainable function that identifies important knowledge graph relationships for a given user. This way we transform the knowledge graph into a user-specific weighted graph and then apply a graph neural network to compute personalized item embeddings. To provide better inductive bias, we rely on label smoothness assumption, which posits that adjacent items in the knowledge graph are likely to have similar user relevance labels/scores. Label smoothness provides regularization over the edge weights and we prove that it is equivalent to a label propagation scheme on a graph. We also develop an efficient implementation that shows strong scalability with respect to the knowledge graph size. Experiments on four datasets show that our method outperforms state of the art baselines. KGNN-LS also achieves strong performance in cold-start scenarios where user-item interactions are sparse.

Submitted 13 June, 2019; v1 submitted 10 May, 2019; originally announced May 2019.

arXiv:1904.12575 [pdf, other]

doi 10.1145/3308558.3313417

Knowledge Graph Convolutional Networks for Recommender Systems

Authors: Hongwei Wang, Miao Zhao, Xing Xie, Wenjie Li, Minyi Guo

Abstract: To alleviate sparsity and cold start problem of collaborative filtering based recommender systems, researchers and engineers usually collect attributes of users and items, and design delicate algorithms to exploit this additional information. In general, the attributes are not isolated but connected with each other, which forms a knowledge graph (KG). In this paper, we propose Knowledge Graph Convolutional Networks (KGCN), an end-to-end framework that captures inter-item relatedness effectively by mining their associated attributes on the KG. To automatically discover both high-order structure information and semantic information of the KG, we sample from the neighbors for each entity in the KG as their receptive field, then combine neighborhood information with bias when calculating the representation of a given entity. The receptive field can be extended to multiple hops away to model high-order proximity information and capture users' potential long-distance interests. Moreover, we implement the proposed KGCN in a minibatch fashion, which enables our model to operate on large datasets and KGs. We apply the proposed model to three datasets about movie, book, and music recommendation, and experiment results demonstrate that our approach outperforms strong recommender baselines.

Submitted 18 March, 2019; originally announced April 2019.

Comments: Proceedings of the 2019 World Wide Web Conference

arXiv:1902.02037 [pdf, other]

Bidirectional Inference Networks: A Class of Deep Bayesian Networks for Health Profiling

Authors: Hao Wang, Chengzhi Mao, Hao He, Mingmin Zhao, Tommi S. Jaakkola, Dina Katabi

Abstract: We consider the problem of inferring the values of an arbitrary set of variables (e.g., risk of diseases) given other observed variables (e.g., symptoms and diagnosed diseases) and high-dimensional signals (e.g., MRI images or EEG). This is a common problem in healthcare since variables of interest often differ for different patients. Existing methods including Bayesian networks and structured prediction either do not incorporate high-dimensional signals or fail to model conditional dependencies among variables. To address these issues, we propose bidirectional inference networks (BIN), which stitch together multiple probabilistic neural networks, each modeling a conditional dependency. Predictions are then made via iteratively updating variables using backpropagation (BP) to maximize corresponding posterior probability. Furthermore, we extend BIN to composite BIN (CBIN), which involves the iterative prediction process in the training stage and improves both accuracy and computational efficiency by adaptively smoothing the optimization landscape. Experiments on synthetic and real-world datasets (a sleep study and a dermatology dataset) show that CBIN is a single model that can achieve state-of-the-art performance and obtain better accuracy in most inference tasks than multiple models each specifically trained for a different task.

Submitted 6 February, 2019; originally announced February 2019.

Comments: Appeared at AAAI 2019

arXiv:1901.08907 [pdf, other]

Multi-Task Feature Learning for Knowledge Graph Enhanced Recommendation

Authors: Hongwei Wang, Fuzheng Zhang, Miao Zhao, Wenjie Li, Xing Xie, Minyi Guo

Abstract: Collaborative filtering often suffers from sparsity and cold start problems in real recommendation scenarios; therefore, researchers and engineers usually use side information to address the issues and improve the performance of recommender systems. In this paper, we consider knowledge graphs as the source of side information. We propose MKR, a Multi-task feature learning approach for Knowledge graph enhanced Recommendation. MKR is a deep end-to-end framework that utilizes knowledge graph embedding task to assist recommendation task. The two tasks are associated by cross&compress units, which automatically share latent features and learn high-order interactions between items in recommender systems and entities in the knowledge graph. We prove that cross&compress units have sufficient capability of polynomial approximation, and show that MKR is a generalized framework over several representative methods of recommender systems and multi-task learning. Through extensive experiments on real-world datasets, we demonstrate that MKR achieves substantial gains in movie, book, music, and news recommendation, over state-of-the-art baselines. MKR is also shown to be able to maintain a decent performance even if user-item interactions are sparse.

Submitted 23 January, 2019; originally announced January 2019.

Comments: In Proceedings of The 2019 Web Conference (WWW 2019)

arXiv:1901.06020 [pdf, other]

GO Gradient for Expectation-Based Objectives

Authors: Yulai Cong, Miaoyun Zhao, Ke Bai, Lawrence Carin

Abstract: Within many machine learning algorithms, a fundamental problem concerns efficient calculation of an unbiased gradient wrt parameters $\boldsymbol{\gamma}$ for expectation-based objectives $\mathbb{E}_{q_{\boldsymbol{\gamma}} (\boldsymbol{y})} [f(\boldsymbol{y})]$. Most existing methods either (i) suffer from high variance, seeking help from (often) complicated variance-reduction techniques; or (ii) they only apply to reparameterizable continuous random variables and employ a reparameterization trick. To address these limitations, we propose a General and One-sample (GO) gradient that (i) applies to many distributions associated with non-reparameterizable continuous or discrete random variables, and (ii) has the same low-variance as the reparameterization trick. We find that the GO gradient often works well in practice based on only one Monte Carlo sample (although one can of course use more samples if desired). Alongside the GO gradient, we develop a means of propagating the chain rule through distributions, yielding statistical back-propagation, coupling neural networks to common random variables.

Submitted 17 January, 2019; originally announced January 2019.

arXiv:1808.03587 [pdf]

A simplified convolutional sparse filter for impulsive signature enhancement and its application to the prognostic of rotating machinery

Authors: Xiaodong Jia, Ming Zhao, Haoshu Cai, Jay Lee

Abstract: Impulsive signature enhancement (ISE) is an important topic in the monitoring of rotating machinery and many different methods have been proposed. Even so, the topic of how to leverage these ISE techniques to improve the data quality in terms of prognostics and health management (PHM) still needs to be investigated. In this work, a systematic view for data quality enhancement is presented. The data quality issues for the PHM of rotating machinery are identified, and the major steps to enhance data quality are organized. Based on this, a novel ISE algorithm is proposed, the importance of extracting scale invariant features is explained, and related features are also proposed for the PHM of rotating machinery. In order to demonstrate the effectiveness of the novelties, two experimental studies are presented. The final results indicate that the proposed method can be effectively employed to enhance the data quality for machine failure detection and diagnosis.

Submitted 10 August, 2018; originally announced August 2018.

arXiv:1803.03467 [pdf, other]

doi 10.1145/3269206.3271739

RippleNet: Propagating User Preferences on the Knowledge Graph for Recommender Systems

Authors: Hongwei Wang, Fuzheng Zhang, Jialin Wang, Miao Zhao, Wenjie Li, Xing Xie, Minyi Guo

Abstract: To address the sparsity and cold start problem of collaborative filtering, researchers usually make use of side information, such as social networks or item attributes, to improve recommendation performance. This paper considers the knowledge graph as the source of side information. To address the limitations of existing embedding-based and path-based methods for knowledge-graph-aware recommendation, we propose Ripple Network, an end-to-end framework that naturally incorporates the knowledge graph into recommender systems. Similar to actual ripples propagating on the surface of water, Ripple Network stimulates the propagation of user preferences over the set of knowledge entities by automatically and iteratively extending a user's potential interests along links in the knowledge graph. The multiple "ripples" activated by a user's historically clicked items are thus superposed to form the preference distribution of the user with respect to a candidate item, which could be used for predicting the final clicking probability. Through extensive experiments on real-world datasets, we demonstrate that Ripple Network achieves substantial gains in a variety of scenarios, including movie, book and news recommendation, over several state-of-the-art baselines.

Submitted 25 August, 2018; v1 submitted 9 March, 2018; originally announced March 2018.

Comments: CIKM 2018

arXiv:1712.00731 [pdf, other]

doi 10.1145/3132847.3132889

Joint Topic-Semantic-aware Social Recommendation for Online Voting

Authors: Hongwei Wang, Jia Wang, Miao Zhao, Jiannong Cao, Minyi Guo

Abstract: Online voting is an emerging feature in social networks, in which users can express their attitudes toward various issues and show their unique interest. Online voting imposes new challenges on recommendation, because the propagation of votings heavily depends on the structure of social networks as well as the content of votings. In this paper, we investigate how to utilize these two factors in a comprehensive manner when doing voting recommendation. First, because existing text mining methods such as topic models and semantic models cannot adequately process the content of votings, which is typically short and ambiguous, we propose a novel Topic-Enhanced Word Embedding (TEWE) method to learn word and document representations by jointly considering their topics and semantics. Then we propose our Joint Topic-Semantic-aware social Matrix Factorization (JTS-MF) model for voting recommendation. The JTS-MF model calculates similarity among users and votings by combining their TEWE representations and the structural information of social networks, and preserves this topic-semantic-social similarity during matrix factorization. To evaluate the performance of the TEWE representation and the JTS-MF model, we conduct extensive experiments on a real online voting dataset. The results prove the efficacy of our approach against several state-of-the-art baselines.

Submitted 3 December, 2017; originally announced December 2017.

Comments: The 26th ACM International Conference on Information and Knowledge Management (CIKM 2017)

arXiv:1711.08267 [pdf, other]

GraphGAN: Graph Representation Learning with Generative Adversarial Nets

Authors: Hongwei Wang, Jia Wang, Jialin Wang, Miao Zhao, Weinan Zhang, Fuzheng Zhang, Xing Xie, Minyi Guo

Abstract: The goal of graph representation learning is to embed each vertex in a graph into a low-dimensional vector space. Existing graph representation learning methods can be classified into two categories: generative models that learn the underlying connectivity distribution in the graph, and discriminative models that predict the probability of edge existence between a pair of vertices. In this paper, we propose GraphGAN, an innovative graph representation learning framework unifying the above two classes of methods, in which the generative model and discriminative model play a game-theoretical minimax game. Specifically, for a given vertex, the generative model tries to fit its underlying true connectivity distribution over all other vertices and produces "fake" samples to fool the discriminative model, while the discriminative model tries to detect whether the sampled vertex is from ground truth or generated by the generative model. With the competition between these two models, both of them can alternately and iteratively boost their performance. Moreover, when considering the implementation of the generative model, we propose a novel graph softmax to overcome the limitations of the traditional softmax function, which can be proven to satisfy the desirable properties of normalization, graph structure awareness, and computational efficiency. Through extensive experiments on real-world datasets, we demonstrate that GraphGAN achieves substantial gains in a variety of applications, including link prediction, node classification, and recommendation, over state-of-the-art baselines.

Submitted 22 November, 2017; originally announced November 2017.

Comments: The 32nd AAAI Conference on Artificial Intelligence (AAAI 2018), 8 pages

arXiv:1603.09453 [pdf]

An overview and perspective on social network monitoring

Authors: William H. Woodall, Meng J. Zhao, Kamran Paynabar, Ross Sparks, James D. Wilson

Abstract: In this expository paper we give an overview of some statistical methods for the monitoring of social networks. We discuss the advantages and limitations of various methods as well as some relevant issues. One of our primary contributions is to give the relationships between network monitoring methods and monitoring methods in engineering statistics and public health surveillance. We encourage researchers in the industrial process monitoring area to work on developing and comparing the performance of social network monitoring methods. We also discuss some of the issues in social network monitoring and give a number of research ideas.

Submitted 31 March, 2016; originally announced March 2016.

Comments: To appear, IIE Transactions

arXiv:1205.1496 [pdf, ps, other]

Graph-based Learning with Unbalanced Clusters

Authors: Jing Qian, Venkatesh Saligrama, Manqi Zhao

Abstract: Graph construction is a crucial step in spectral clustering (SC) and graph-based semi-supervised learning (SSL). Spectral methods applied on standard graphs such as full-RBF, $ε$-graphs and $k$-NN graphs can lead to poor performance in the presence of proximal and unbalanced data. This is because spectral methods based on minimizing RatioCut or normalized cut on these graphs tend to put more importance on balancing cluster sizes over reducing cut values. We propose a novel graph construction technique and show that the RatioCut solution on this new graph is able to handle proximal and unbalanced data. Our method is based on adaptively modulating the neighborhood degrees in a $k$-NN graph, which tends to sparsify neighborhoods in low density regions. Our method adapts to data with varying levels of unbalancedness and can be naturally used for small cluster detection. We justify our ideas through limit cut analysis. Unsupervised and semi-supervised experiments on synthetic and real data sets demonstrate the superiority of our method.

Submitted 8 May, 2012; v1 submitted 7 May, 2012; originally announced May 2012.

Comments: 21 pages, 7 figures

arXiv:1112.2319 [pdf, ps, other]

Graph Construction for Learning with Unbalanced Data

Authors: Jing Qian, Venkatesh Saligrama, Manqi Zhao

Abstract: Unbalanced data arises in many learning tasks such as clustering of multi-class data, hierarchical divisive clustering and semi-supervised learning. Graph-based approaches are popular tools for these problems. Graph construction is an important aspect of graph-based learning. We show that graph-based algorithms can fail for unbalanced data for many popular graphs such as k-NN, ε-neighborhood and full-RBF graphs. We propose a novel graph construction technique that encodes global statistical information into node degrees through a ranking scheme. The rank of a data sample is an estimate of its p-value and is proportional to the total number of data samples with smaller density. This ranking scheme serves as a surrogate for density, can be reliably estimated, and indicates whether a data sample is close to valleys/modes. This rank-modulated degree (RMD) scheme is able to significantly sparsify the graph near valleys and provides an adaptive way to cope with unbalanced data. We then theoretically justify our method through limit cut analysis. Unsupervised and semi-supervised experiments on synthetic and real data sets demonstrate the superiority of our method.

Submitted 10 December, 2011; originally announced December 2011.

Comments: 14 pages, 6 figures, 2 tables

Showing 1–39 of 39 results for author: Zhao, M