
doi 10.1016/j.jfluidstructs.2022.103793

Analysis of Full-scale Riser Responses in Field Conditions Based on Gaussian Mixture Model

Authors: Jie Wu, Sølve Eidnes, Jingzhe Jin, Halvor Lie, Decao Yin, Elizabeth Passano, Svein Sævik, Signe Riemer-Sorensen

Abstract: Offshore slender marine structures experience complex and combined load conditions from waves, current and vessel motions that may result in both wave frequency and vortex shedding response patterns. Field measurements often consist of records of environmental conditions and riser responses, typically with 30-minute intervals. These data can be represented in a high-dimensional parameter space. However, it is difficult to visualize and understand the structural responses, as they are affected by many of these parameters. It becomes easier to identify trends and key parameters if the measurements with the same characteristics can be grouped together. Cluster analysis is an unsupervised learning method, which groups the data based on their relative distance, density of the data space, intervals, or statistical distributions. In the present study, a Gaussian mixture model guided by domain knowledge has been applied to analyze field measurements. Using the 242 measurement events of the Helland-Hansen riser, it is demonstrated that riser responses can be grouped into 12 clusters by the identification of key environmental parameters. This results in an improved understanding of complex structure responses. Furthermore, the cluster results are valuable for evaluating the riser response prediction accuracy.

Submitted 25 June, 2024; originally announced June 2024.

Comments: Matches accepted version

Journal ref: Journal of Fluids and Structures, Volume 116, 2023, 103793

arXiv:2403.10070 [pdf, other]

A Structure-Preserving Kernel Method for Learning Hamiltonian Systems

Authors: Jianyu Hu, Juan-Pablo Ortega, Daiying Yin

Abstract: A structure-preserving kernel ridge regression method is presented that allows the recovery of nonlinear Hamiltonian functions out of datasets made of noisy observations of Hamiltonian vector fields. The method proposes a closed-form solution that yields excellent numerical performances that surpass other techniques proposed in the literature in this setup. From the methodological point of view, the paper extends kernel regression methods to problems in which loss functions involving linear functions of gradients are required and, in particular, a differential reproducing property and a Representer Theorem are proved in this context. The relation between the structure-preserving kernel estimator and the Gaussian posterior mean estimator is analyzed. A full error analysis is conducted that provides convergence rates using fixed and adaptive regularization parameters. The good performance of the proposed estimator together with the convergence rate is illustrated with various numerical experiments.

Submitted 4 April, 2025; v1 submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.00019 [pdf, other]

Transformer-based Parameter Estimation in Statistics

Authors: Xiaoxin Yin, David S. Yin

Abstract: Parameter estimation is one of the most important tasks in statistics, and is key to helping people understand the distribution behind a sample of observations. Traditionally parameter estimation is done either by closed-form solutions (e.g., maximum likelihood estimation for Gaussian distribution), or by iterative numerical methods such as Newton-Raphson method when closed-form solution does not exist (e.g., for Beta distribution). In this paper we propose a transformer-based approach to parameter estimation. Compared with existing solutions, our approach does not require a closed-form solution or any mathematical derivations. It does not even require knowing the probability density function, which is needed by numerical methods. After the transformer model is trained, only a single inference is needed to estimate the parameters of the underlying distribution based on a sample of observations. In the empirical study we compared our approach with maximum likelihood estimation on commonly used distributions such as normal distribution, exponential distribution and beta distribution. It is shown that our approach achieves similar or better accuracy as measured by mean-square-errors.

Submitted 27 February, 2024; originally announced March 2024.

arXiv:2108.05533 [pdf, ps, other]

Efficient Local Planning with Linear Function Approximation

Authors: Dong Yin, Botao Hao, Yasin Abbasi-Yadkori, Nevena Lazić, Csaba Szepesvári

Abstract: We study query and computationally efficient planning algorithms with linear function approximation and a simulator. We assume that the agent only has local access to the simulator, meaning that the agent can only query the simulator at states that have been visited before. This setting is more practical than many prior works on reinforcement learning with a generative model. We propose two algorithms, named confident Monte Carlo least square policy iteration (Confident MC-LSPI) and confident Monte Carlo Politex (Confident MC-Politex) for this setting. Under the assumption that the Q-functions of all policies are linear in known features of the state-action pairs, we show that our algorithms have polynomial query and computational costs in the dimension of the features, the effective planning horizon, and the targeted sub-optimality, while these costs are independent of the size of the state space. One technical contribution of our work is the introduction of a novel proof technique that makes use of a virtual policy iteration algorithm. We use this method to leverage existing results on $\ell_\infty$-bounded approximate policy iteration to show that our algorithm can learn the optimal policy for the given initial state even only with local access to the simulator. We believe that this technique can be extended to broader settings beyond this work.

Submitted 4 February, 2022; v1 submitted 12 August, 2021; originally announced August 2021.

Comments: Algorithmic Learning Theory 2022

arXiv:2102.12611 [pdf, other]

Improved Regret Bound and Experience Replay in Regularized Policy Iteration

Authors: Nevena Lazic, Dong Yin, Yasin Abbasi-Yadkori, Csaba Szepesvari

Abstract: In this work, we study algorithms for learning in infinite-horizon undiscounted Markov decision processes (MDPs) with function approximation. We first show that the regret analysis of the Politex algorithm (a version of regularized policy iteration) can be sharpened from $O(T^{3/4})$ to $O(\sqrt{T})$ under nearly identical assumptions, and instantiate the bound with linear function approximation. Our result provides the first high-probability $O(\sqrt{T})$ regret bound for a computationally efficient algorithm in this setting. The exact implementation of Politex with neural network function approximation is inefficient in terms of memory and computation. Since our analysis suggests that we need to approximate the average of the action-value functions of past policies well, we propose a simple efficient implementation where we train a single Q-function on a replay buffer with past data. We show that this often leads to superior performance over other implementation choices, especially in terms of wall-clock time. Our work also provides a novel theoretical justification for using experience replay within policy iteration algorithms.

Submitted 24 February, 2021; originally announced February 2021.

arXiv:2006.10974 [pdf, ps, other]

Optimization and Generalization of Regularization-Based Continual Learning: a Loss Approximation Viewpoint

Authors: Dong Yin, Mehrdad Farajtabar, Ang Li, Nir Levine, Alex Mott

Abstract: Neural networks have achieved remarkable success in many cognitive tasks. However, when they are trained sequentially on multiple tasks without access to old data, their performance on early tasks tends to drop significantly. This problem is often referred to as catastrophic forgetting, a key challenge in continual learning of neural networks. The regularization-based approach is one of the primary classes of methods to alleviate catastrophic forgetting. In this paper, we provide a novel viewpoint of regularization-based continual learning by formulating it as a second-order Taylor approximation of the loss function of each task. This viewpoint leads to a unified framework that can be instantiated to derive many existing algorithms such as Elastic Weight Consolidation and Kronecker factored Laplace approximation. Based on this viewpoint, we study the optimization aspects (i.e., convergence) as well as generalization properties (i.e., finite-sample guarantees) of regularization-based continual learning. Our theoretical results indicate the importance of accurate approximation of the Hessian matrix. The experimental results on several benchmarks provide empirical validation of our theoretical findings.

Submitted 8 February, 2021; v1 submitted 19 June, 2020; originally announced June 2020.

Comments: Preliminary version with a different title presented at ICML Workshop on Continual Learning, 2020 (spotlight)

arXiv:2006.04435 [pdf, other]

CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

Authors: Xiang Li, Ben Kao, Caihua Shan, Dawei Yin, Martin Ester

Abstract: We study the problem of applying spectral clustering to cluster multi-scale data, which is data whose clusters are of various sizes and densities. Traditional spectral clustering techniques discover clusters by processing a similarity matrix that reflects the proximity of objects. For multi-scale data, distance-based similarity is not effective because objects of a sparse cluster could be far apart while those of a dense cluster have to be sufficiently close. Following [16], we solve the problem of spectral clustering on multi-scale data by integrating the concept of objects' "reachability similarity" with a given distance-based similarity to derive an objects' coefficient matrix. We propose the algorithm CAST that applies trace Lasso to regularize the coefficient matrix. We prove that the resulting coefficient matrix has the "grouping effect" and that it exhibits "sparsity". We show that these two characteristics imply very effective spectral clustering. We evaluate CAST and 10 other clustering methods on a wide range of datasets w.r.t. various measures. Experimental results show that CAST provides excellent performance and is highly robust across test cases of multi-scale data.

Submitted 8 June, 2020; originally announced June 2020.

arXiv:2006.04088 [pdf, other]

An Efficient Framework for Clustered Federated Learning

Authors: Avishek Ghosh, Jichan Chung, Dong Yin, Kannan Ramchandran

Abstract: We address the problem of federated learning (FL) where users are distributed and partitioned into clusters. This setup captures settings where different groups of users have their own objectives (learning tasks) but by aggregating their data with others in the same cluster (same learning task), they can leverage the strength in numbers in order to perform more efficient federated learning. For this new framework of clustered federated learning, we propose the Iterative Federated Clustering Algorithm (IFCA), which alternately estimates the cluster identities of the users and optimizes model parameters for the user clusters via gradient descent. We analyze the convergence rate of this algorithm first in a linear model with squared loss and then for generic strongly convex and smooth loss functions. We show that in both settings, with good initialization, IFCA is guaranteed to converge, and discuss the optimality of the statistical error rate. In particular, for the linear model with two clusters, we can guarantee that our algorithm converges as long as the initialization is slightly better than random. When the clustering structure is ambiguous, we propose to train the models by combining IFCA with the weight sharing technique in multi-task learning. In the experiments, we show that our algorithm can succeed even if we relax the requirements on initialization with random initialization and multiple restarts. We also present experimental results showing that our algorithm is efficient in non-convex problems such as neural networks. We demonstrate the benefits of IFCA over the baselines on several clustered FL benchmarks.

Submitted 8 June, 2021; v1 submitted 7 June, 2020; originally announced June 2020.

Comments: Preliminary results appeared at NeurIPS 2020

arXiv:2006.00945 [pdf, other]

Robust Reinforcement Learning with Wasserstein Constraint

Authors: Linfang Hou, Liang Pang, Xin Hong, Yanyan Lan, Zhiming Ma, Dawei Yin

Abstract: Robust Reinforcement Learning aims to find the optimal policy with some extent of robustness to environmental dynamics. Existing learning algorithms usually enable the robustness through disturbing the current state or simulating environmental parameters in a heuristic way, which lack quantified robustness to the system dynamics (i.e. transition probability). To overcome this issue, we leverage Wasserstein distance to measure the disturbance to the reference transition kernel. With Wasserstein distance, we are able to connect transition kernel disturbance to the state disturbance, i.e. reduce an infinite-dimensional optimization problem to a finite-dimensional risk-aware problem. Through the derived risk-aware optimal Bellman equation, we show the existence of optimal robust policies, provide a sensitivity analysis for the perturbations, and then design a novel robust learning algorithm--Wasserstein Robust Advantage Actor-Critic algorithm (WRAAC). The effectiveness of the proposed algorithm is verified in the Cart-Pole environment.

Submitted 1 June, 2020; originally announced June 2020.

arXiv:2001.06216 [pdf, other]

GraphLIME: Local Interpretable Model Explanations for Graph Neural Networks

Authors: Qiang Huang, Makoto Yamada, Yuan Tian, Dinesh Singh, Dawei Yin, Yi Chang

Abstract: Graph structured data has wide applicability in various domains such as physics, chemistry, biology, computer vision, and social networks, to name a few. Recently, graph neural networks (GNN) were shown to be successful in effectively representing graph structured data because of their good performance and generalization ability. GNN is a deep learning based method that learns a node representation by combining specific nodes and the structural/topological information of a graph. However, like other deep models, explaining the effectiveness of GNN models is a challenging task because of the complex nonlinear transformations made over the iterations. In this paper, we propose GraphLIME, a local interpretable model explanation for graphs using the Hilbert-Schmidt Independence Criterion (HSIC) Lasso, which is a nonlinear feature selection method. GraphLIME is a generic GNN-model explanation framework that learns a nonlinear interpretable model locally in the subgraph of the node being explained. More specifically, to explain a node, we generate a nonlinear interpretable model from its $N$-hop neighborhood and then compute the K most representative features as the explanations of its prediction using HSIC Lasso. Through experiments on two real-world datasets, the explanations of GraphLIME are found to be of extraordinary degree and more descriptive in comparison to the existing explanation methods.

Submitted 27 September, 2020; v1 submitted 17 January, 2020; originally announced January 2020.

arXiv:1907.09652 [pdf, other]

Off-policy Learning for Multiple Loggers

Authors: Li He, Long Xia, Wei Zeng, Zhi-Ming Ma, Yihong Zhao, Dawei Yin

Abstract: It is well known that the historical logs are used for evaluating and learning policies in interactive systems, e.g. recommendation, search, and online advertising. Since direct online policy learning usually harms user experiences, it is more crucial to apply off-policy learning in real-world applications instead. Though there have been some existing works, most are focusing on learning with one single historical policy. However, in practice, usually a number of parallel experiments, e.g. multiple AB tests, are performed simultaneously. To make full use of such historical data, learning policies from multiple loggers becomes necessary. Motivated by this, in this paper, we investigate off-policy learning when the training data comes from multiple historical policies. Specifically, policies, e.g. neural networks, can be learned directly from multi-logger data, with counterfactual estimators. In order to understand the generalization ability of such estimator better, we conduct generalization error analysis for the empirical risk minimization problem. We then introduce the generalization error bound as the new risk function, which can be reduced to a constrained optimization problem. Finally, we give the corresponding learning algorithm for the new constrained problem, where we can appeal to the minimax problems to control the constraints. Extensive experiments on benchmark datasets demonstrate that the proposed methods achieve better performances than the state-of-the-arts.

Submitted 5 August, 2019; v1 submitted 22 July, 2019; originally announced July 2019.

arXiv:1907.03215 [pdf, other]

Stochastic Gradient and Langevin Processes

Authors: Xiang Cheng, Dong Yin, Peter L. Bartlett, Michael I. Jordan

Abstract: We prove quantitative convergence rates at which discrete Langevin-like processes converge to the invariant distribution of a related stochastic differential equation. We study the setup where the additive noise can be non-Gaussian and state-dependent and the potential function can be non-convex. We show that the key properties of these processes depend on the potential function and the second moment of the additive noise. We apply our theoretical findings to studying the convergence of Stochastic Gradient Descent (SGD) for non-convex problems and corroborate them with experiments using SGD to train deep neural networks on the CIFAR-10 dataset.

Submitted 18 November, 2020; v1 submitted 6 July, 2019; originally announced July 2019.

Comments: ICML 2020, code available at https://github.com/dongyin92/noise_covariance

arXiv:1906.08988 [pdf, other]

A Fourier Perspective on Model Robustness in Computer Vision

Authors: Dong Yin, Raphael Gontijo Lopes, Jonathon Shlens, Ekin D. Cubuk, Justin Gilmer

Abstract: Achieving robustness to distributional shift is a longstanding and challenging goal of computer vision. Data augmentation is a commonly used approach for improving robustness, however robustness gains are typically not uniform across corruption types. Indeed increasing performance in the presence of random noise is often met with reduced performance on other corruptions such as contrast change. Understanding when and why these sorts of trade-offs occur is a crucial step towards mitigating them. Towards this end, we investigate recently observed trade-offs caused by Gaussian data augmentation and adversarial training. We find that both methods improve robustness to corruptions that are concentrated in the high frequency domain while reducing robustness to corruptions that are concentrated in the low frequency domain. This suggests that one way to mitigate these trade-offs via data augmentation is to use a more diverse set of augmentations. Towards this end we observe that AutoAugment, a recently proposed data augmentation policy optimized for clean accuracy, achieves state-of-the-art robustness on the CIFAR-10-C benchmark.

Submitted 16 September, 2020; v1 submitted 21 June, 2019; originally announced June 2019.

Comments: NeurIPS 2019

arXiv:1906.06629 [pdf, other]

Robust Federated Learning in a Heterogeneous Environment

Authors: Avishek Ghosh, Justin Hong, Dong Yin, Kannan Ramchandran

Abstract: We study a recently proposed large-scale distributed learning paradigm, namely Federated Learning, where the worker machines are end users' own devices. Statistical and computational challenges arise in Federated Learning particularly in the presence of heterogeneous data distribution (i.e., data points on different devices belong to different distributions signifying different clusters) and Byzantine machines (i.e., machines that may behave abnormally, or even exhibit arbitrary and potentially adversarial behavior). To address the aforementioned challenges, first we propose a general statistical model for this problem which takes both the cluster structure of the users and the Byzantine machines into account. Then, leveraging the statistical model, we solve the robust heterogeneous Federated Learning problem optimally; in particular our algorithm matches the lower bound on the estimation error in dimension and the number of data points. Furthermore, as a by-product, we prove statistical guarantees for an outlier-robust clustering algorithm, which can be considered as the Lloyd algorithm with robust estimation. Finally, we show via synthetic as well as real data experiments that the estimation error obtained by our proposed algorithm is significantly better than the non-Byzantine-robust algorithms; in particular, we gain at least by 53% and 33% for synthetic and real data experiments, respectively, in typical settings.

Submitted 9 October, 2019; v1 submitted 15 June, 2019; originally announced June 2019.

Comments: Fixing technical issues. Please discard any previous version

arXiv:1906.02611 [pdf, other]

Improving Robustness Without Sacrificing Accuracy with Patch Gaussian Augmentation

Authors: Raphael Gontijo Lopes, Dong Yin, Ben Poole, Justin Gilmer, Ekin D. Cubuk

Abstract: Deploying machine learning systems in the real world requires both high accuracy on clean data and robustness to naturally occurring corruptions. While architectural advances have led to improved accuracy, building robust models remains challenging. Prior work has argued that there is an inherent trade-off between robustness and accuracy, which is exemplified by standard data augment techniques such as Cutout, which improves clean accuracy but not robustness, and additive Gaussian noise, which improves robustness but hurts accuracy. To overcome this trade-off, we introduce Patch Gaussian, a simple augmentation scheme that adds noise to randomly selected patches in an input image. Models trained with Patch Gaussian achieve state of the art on the CIFAR-10 and ImageNet Common Corruptions benchmarks while also improving accuracy on clean data. We find that this augmentation leads to reduced sensitivity to high frequency noise (similar to Gaussian) while retaining the ability to take advantage of relevant high frequency information in the image (similar to Cutout). Finally, we show that Patch Gaussian can be used in conjunction with other regularization methods and data augmentation policies such as AutoAugment, and improves performance on the COCO object detection benchmark.

Submitted 6 June, 2019; originally announced June 2019.

arXiv:1810.11914 [pdf, other]

Rademacher Complexity for Adversarially Robust Generalization

Authors: Dong Yin, Kannan Ramchandran, Peter Bartlett

Abstract: Many machine learning models are vulnerable to adversarial attacks; for example, adding adversarial perturbations that are imperceptible to humans can often make machine learning models produce wrong predictions with high confidence. Moreover, although we may obtain robust models on the training dataset via adversarial training, in some problems the learned models cannot generalize well to the test data. In this paper, we focus on $\ell_\infty$ attacks, and study the adversarially robust generalization problem through the lens of Rademacher complexity. For binary linear classifiers, we prove tight bounds for the adversarial Rademacher complexity, and show that the adversarial Rademacher complexity is never smaller than its natural counterpart, and it has an unavoidable dimension dependence, unless the weight vector has bounded $\ell_1$ norm. The results also extend to multi-class linear classifiers. For (nonlinear) neural networks, we show that the dimension dependence in the adversarial Rademacher complexity also exists. We further consider a surrogate adversarial loss for one-hidden layer ReLU network and prove margin bounds for this setting. Our results indicate that having $\ell_1$ norm constraints on the weight matrices might be a potential way to improve generalization in the adversarial setting. We demonstrate experimental results that validate our theoretical findings.

Submitted 29 July, 2020; v1 submitted 28 October, 2018; originally announced October 2018.

Comments: ICML 2019

arXiv:1810.10627 [pdf, other]

Streaming Graph Neural Networks

Authors: Yao Ma, Ziyi Guo, Zhaochun Ren, Eric Zhao, Jiliang Tang, Dawei Yin

Abstract: Graphs are essential representations of many real-world data such as social networks. Recent years have witnessed the increasing efforts made to extend the neural network models to graph-structured data. These methods, which are usually known as the graph neural networks, have been applied to advance many graph-related tasks such as reasoning dynamics of the physical system, graph classification, and node classification. Most of the existing graph neural network models have been designed for static graphs, while many real-world graphs are inherently dynamic. For example, social networks are naturally evolving as new users join and new relations are created. Current graph neural network models cannot utilize the dynamic information in dynamic graphs. However, the dynamic information has been proven to enhance the performance of many graph analytic tasks such as community detection and link prediction. Hence, it is necessary to design dedicated graph neural networks for dynamic graphs. In this paper, we propose DGNN, a new Dynamic Graph Neural Network model, which can model the dynamic information as the graph evolves. In particular, the proposed framework can keep updating node information by capturing the sequential information of edges (interactions), the time intervals between edges and information propagation coherently. Experimental results on various dynamic graphs demonstrate the effectiveness of the proposed framework.

Submitted 6 November, 2018; v1 submitted 24 October, 2018; originally announced October 2018.

arXiv:1808.06170 [pdf, other]

Linked Recurrent Neural Networks

Authors: Zhiwei Wang, Yao Ma, Dawei Yin, Jiliang Tang

Abstract: Recurrent Neural Networks (RNNs) have been proven to be effective in modeling sequential data and they have been applied to boost a variety of tasks such as document classification, speech recognition and machine translation. Most of existing RNN models have been designed for sequences assumed to be identically and independently distributed (i.i.d). However, in many real-world applications, sequences are naturally linked. For example, web documents are connected by hyperlinks; and genes interact with each other. On the one hand, linked sequences are inherently not i.i.d., which poses tremendous challenges to existing RNN models. On the other hand, linked sequences offer link information in addition to the sequential information, which enables unprecedented opportunities to build advanced RNN models. In this paper, we study the problem of RNN for linked sequences. In particular, we introduce a principled approach to capture link information and propose a linked Recurrent Neural Network (LinkedRNN), which models sequential and link information coherently. We conduct experiments on real-world datasets from multiple domains and the experimental results validate the effectiveness of the proposed framework.

Submitted 19 August, 2018; originally announced August 2018.

arXiv:1806.05358 [pdf, ps, other]

Defending Against Saddle Point Attack in Byzantine-Robust Distributed Learning

Authors: Dong Yin, Yudong Chen, Kannan Ramchandran, Peter Bartlett

Abstract: We study robust distributed learning that involves minimizing a non-convex loss function with saddle points. We consider the Byzantine setting where some worker machines have abnormal or even arbitrary and adversarial behavior. In this setting, the Byzantine machines may create fake local minima near a saddle point that is far away from any true local minimum, even when robust gradient estimators are used. We develop ByzantinePGD, a robust first-order algorithm that can provably escape saddle points and fake local minima, and converge to an approximate true local minimizer with low iteration complexity. As a by-product, we give a simpler algorithm and analysis for escaping saddle points in the usual non-Byzantine setting. We further discuss three robust gradient estimators that can be used in ByzantinePGD, including median, trimmed mean, and iterative filtering. We characterize their performance in concrete statistical settings, and argue for their near-optimality in low and high dimensional regimes.

Submitted 29 July, 2020; v1 submitted 14 June, 2018; originally announced June 2018.

Comments: ICML 2019

arXiv:1803.01498 [pdf, other]

Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates

Authors: Dong Yin, Yudong Chen, Kannan Ramchandran, Peter Bartlett

Abstract: In large-scale distributed learning, security issues have become increasingly important. Particularly in a decentralized environment, some computing units may behave abnormally, or even exhibit Byzantine failures -- arbitrary and potentially adversarial behavior. In this paper, we develop distributed learning algorithms that are provably robust against such failures, with a focus on achieving optimal statistical performance. A main result of this work is a sharp analysis of two robust distributed gradient descent algorithms based on median and trimmed mean operations, respectively. We prove statistical error rates for three kinds of population loss functions: strongly convex, non-strongly convex, and smooth non-convex. In particular, these algorithms are shown to achieve order-optimal statistical error rates for strongly convex losses. To achieve better communication efficiency, we further propose a median-based distributed algorithm that is provably robust, and uses only one communication round. For strongly convex quadratic loss, we show that this algorithm achieves the same optimal error rate as the robust distributed gradient descent algorithms.

Submitted 25 February, 2021; v1 submitted 5 March, 2018; originally announced March 2018.

Comments: ICML 2018

arXiv:1802.06501 [pdf, other]

doi 10.1145/3219819.3219886

Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning

Authors: Xiangyu Zhao, Liang Zhang, Zhuoye Ding, Long Xia, Jiliang Tang, Dawei Yin

Abstract: Recommender systems play a crucial role in mitigating the problem of information overload by suggesting users' personalized items or services. The vast majority of traditional recommender systems consider the recommendation procedure as a static process and make recommendations following a fixed strategy. In this paper, we propose a novel recommender system with the capability of continuously improving its strategies during the interactions with users. We model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies via recommending trial-and-error items and receiving reinforcements of these items from users' feedback. Users' feedback can be positive and negative and both types of feedback have great potentials to boost recommendations. However, the number of negative feedback is much larger than that of positive one; thus incorporating them simultaneously is challenging since positive feedback could be buried by negative one. In this paper, we develop a novel approach to incorporate them into the proposed deep recommender system (DEERS) framework. The experimental results based on real-world e-commerce data demonstrate the effectiveness of the proposed framework. Further experiments have been conducted to understand the importance of both positive and negative feedback in recommendations.

Submitted 9 August, 2018; v1 submitted 18 February, 2018; originally announced February 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1801.00209

arXiv:1802.05315 [pdf, other]

Online Learning for Non-Stationary A/B Tests

Authors: Andrés Muñoz Medina, Sergei Vassilvitskii, Dong Yin

Abstract: The rollout of new versions of a feature in modern applications is a manual multi-stage process, as the feature is released to ever larger groups of users, while its performance is carefully monitored. This kind of A/B testing is ubiquitous, but suboptimal, as the monitoring requires heavy human intervention, is not guaranteed to capture consistent, but short-term fluctuations in performance, and is inefficient, as better versions take a long time to reach the full population. In this work we formulate this question as that of expert learning, and give a new algorithm Follow-The-Best-Interval, FTBI, that works in dynamic, non-stationary environments. Our approach is practical, simple, and efficient, and has rigorous guarantees on its performance. Finally, we perform a thorough evaluation on synthetic and real world datasets and show that our approach outperforms current state-of-the-art methods.

Submitted 27 May, 2018; v1 submitted 14 February, 2018; originally announced February 2018.

arXiv:1801.00209 [pdf, other]

Deep Reinforcement Learning for List-wise Recommendations

Authors: Xiangyu Zhao, Liang Zhang, Long Xia, Zhuoye Ding, Dawei Yin, Jiliang Tang

Abstract: Recommender systems play a crucial role in mitigating the problem of information overload by suggesting users' personalized items or services. The vast majority of traditional recommender systems consider the recommendation procedure as a static process and make recommendations following a fixed strategy. In this paper, we propose a novel recommender system with the capability of continuously improving its strategies during the interactions with users. We model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies via recommending trial-and-error items and receiving reinforcements of these items from users' feedbacks. In particular, we introduce an online user-agent interacting environment simulator, which can pre-train and evaluate model parameters offline before applying the model online. Moreover, we validate the importance of list-wise recommendations during the interactions between users and agent, and develop a novel approach to incorporate them into the proposed framework LIRD for list-wide recommendations. The experimental results based on a real-world e-commerce dataset demonstrate the effectiveness of the proposed framework.

Submitted 27 June, 2019; v1 submitted 30 December, 2017; originally announced January 2018.

arXiv:1608.04048 [pdf, other]

Ultra High-Dimensional Nonlinear Feature Selection for Big Biological Data

Authors: Makoto Yamada, Jiliang Tang, Jose Lugo-Martinez, Ermin Hodzic, Raunak Shrestha, Avishek Saha, Hua Ouyang, Dawei Yin, Hiroshi Mamitsuka, Cenk Sahinalp, Predrag Radivojac, Filippo Menczer, Yi Chang

Abstract: Machine learning methods are used to discover complex nonlinear relationships in biological and medical data. However, sophisticated learning models are computationally unfeasible for data with millions of features. Here we introduce the first feature selection method for nonlinear learning problems that can scale up to large, ultra-high dimensional biological data. More specifically, we scale up the novel Hilbert-Schmidt Independence Criterion Lasso (HSIC Lasso) to handle millions of features with tens of thousand samples. The proposed method is guaranteed to find an optimal subset of maximally predictive features with minimal redundancy, yielding higher predictive power and improved interpretability. Its effectiveness is demonstrated through applications to classify phenotypes based on module expression in human prostate cancer patients and to detect enzymes among protein structures. We achieve high accuracy with as few as 20 out of one million features --- a dimensionality reduction of 99.998%. Our algorithm can be implemented on commodity cloud computing platforms. The dramatic reduction of features may lead to the ubiquitous deployment of sophisticated prediction models in mobile health care applications.

Submitted 13 August, 2016; originally announced August 2016.

Comments: Substantially improved version of arXiv:1411.2331

arXiv:1605.08346 [pdf, other]

Distributed Sequence Memory of Multidimensional Inputs in Recurrent Networks

Authors: Adam Charles, Dong Yin, Christopher Rozell

Abstract: Recurrent neural networks (RNNs) have drawn interest from machine learning researchers because of their effectiveness at preserving past inputs for time-varying data processing tasks. To understand the success and limitations of RNNs, it is critical that we advance our analysis of their fundamental memory properties. We focus on echo state networks (ESNs), which are RNNs with simple memoryless nodes and random connectivity. In most existing analyses, the short-term memory (STM) capacity results conclude that the ESN network size must scale linearly with the input size for unstructured inputs. The main contribution of this paper is to provide general results characterizing the STM capacity for linear ESNs with multidimensional input streams when the inputs have common low-dimensional structure: sparsity in a basis or significant statistical dependence between inputs. In both cases, we show that the number of nodes in the network must scale linearly with the information rate and poly-logarithmically with the ambient input dimension. The analysis relies on advanced applications of random matrix theory and results in explicit non-asymptotic bounds on the recovery error. Taken together, this analysis provides a significant step forward in our understanding of the STM properties in RNNs.

Submitted 27 January, 2017; v1 submitted 26 May, 2016; originally announced May 2016.

Comments: 37 pages, 3 figures

Journal ref: Journal of Machine Learning Research, 18:1-37 Jan. 2017

arXiv:1412.2113 [pdf, other]

Consistent Collective Matrix Completion under Joint Low Rank Structure

Authors: Suriya Gunasekar, Makoto Yamada, Dawei Yin, Yi Chang

Abstract: We address the collective matrix completion problem of jointly recovering a collection of matrices with shared structure from partial (and potentially noisy) observations. To ensure well-posedness of the problem, we impose a joint low rank structure, wherein each component matrix is low rank and the latent space of the low rank factors corresponding to each entity is shared across the entire collection. We first develop a rigorous algebra for representing and manipulating collective-matrix structure, and identify sufficient conditions for consistent estimation of collective matrices. We then propose a tractable convex estimator for solving the collective matrix completion problem, and provide the first non-trivial theoretical guarantees for consistency of collective matrix completion. We show that under reasonable assumptions stated in Section 3.1, with high probability, the proposed estimator exactly recovers the true matrices whenever sample complexity requirements dictated by Theorem 1 are met. The sample complexity requirement derived in the paper are optimum up to logarithmic factors, and significantly improve upon the requirements obtained by trivial extensions of standard matrix completion. Finally, we propose a scalable approximate algorithm to solve the proposed convex program, and corroborate our results through simulated experiments.

Submitted 7 April, 2015; v1 submitted 5 December, 2014; originally announced December 2014.

Comments: 19 pages, 3 figures

arXiv:1411.2331 [pdf, ps, other]

N$^3$LARS: Minimum Redundancy Maximum Relevance Feature Selection for Large and High-dimensional Data

Authors: Makoto Yamada, Avishek Saha, Hua Ouyang, Dawei Yin, Yi Chang

Abstract: We propose a feature selection method that finds non-redundant features from a large and high-dimensional data in nonlinear way. Specifically, we propose a nonlinear extension of the non-negative least-angle regression (LARS) called N${}^3$LARS, where the similarity between input and output is measured through the normalized version of the Hilbert-Schmidt Independence Criterion (HSIC). An advantage of N${}^3$LARS is that it can easily incorporate with map-reduce frameworks such as Hadoop and Spark. Thus, with the help of distributed computing, a set of features can be efficiently selected from a large and high-dimensional data. Moreover, N${}^3$LARS is a convex method and can find a global optimum solution. The effectiveness of the proposed method is first demonstrated through feature selection experiments for classification and regression with small and high-dimensional datasets. Finally, we evaluate our proposed method over a large and high-dimensional biology dataset.

Submitted 10 November, 2014; originally announced November 2014.

Comments: arXiv admin note: text overlap with arXiv:1202.0515

Showing 1–27 of 27 results for author: Yin, D