Search | arXiv e-print repository

GLL: A Differentiable Graph Learning Layer for Neural Networks

Authors: Jason Brown, Bohan Chen, Harris Hardiman-Mostow, Jeff Calder, Andrea L. Bertozzi

Abstract: Standard deep learning architectures used for classification generate label predictions with a projection head and softmax activation function. Although successful, these methods fail to leverage the relational information between samples in the batch for generating label predictions. In recent works, graph-based learning techniques, namely Laplace learning, have been heuristically combined with n… ▽ More Standard deep learning architectures used for classification generate label predictions with a projection head and softmax activation function. Although successful, these methods fail to leverage the relational information between samples in the batch for generating label predictions. In recent works, graph-based learning techniques, namely Laplace learning, have been heuristically combined with neural networks for both supervised and semi-supervised learning (SSL) tasks. However, prior works approximate the gradient of the loss function with respect to the graph learning algorithm or decouple the processes; end-to-end integration with neural networks is not achieved. In this work, we derive backpropagation equations, via the adjoint method, for inclusion of a general family of graph learning layers into a neural network. This allows us to precisely integrate graph Laplacian-based label propagation into a neural network layer, replacing a projection head and softmax activation function for classification tasks. Using this new framework, our experimental results demonstrate smooth label transitions across data, improved robustness to adversarial attacks, improved generalization, and improved training dynamics compared to the standard softmax-based approach. △ Less

Submitted 10 December, 2024; originally announced December 2024.

Comments: 44 pages, 11 figures. Preprint. Submitted to the Journal of Machine Learning Research

MSC Class: 68T05; 68T07; 35R02 ACM Class: I.2.6; I.2.10; I.4.0

arXiv:2406.13781 [pdf, other]

A Primal-Dual Framework for Transformers and Neural Networks

Authors: Tan M. Nguyen, Tam Nguyen, Nhat Ho, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher

Abstract: Self-attention is key to the remarkable success of transformers in sequence modeling tasks including many applications in natural language processing and computer vision. Like neural network layers, these attention mechanisms are often developed by heuristics and experience. To provide a principled framework for constructing attention layers in transformers, we show that the self-attention corresp… ▽ More Self-attention is key to the remarkable success of transformers in sequence modeling tasks including many applications in natural language processing and computer vision. Like neural network layers, these attention mechanisms are often developed by heuristics and experience. To provide a principled framework for constructing attention layers in transformers, we show that the self-attention corresponds to the support vector expansion derived from a support vector regression problem, whose primal formulation has the form of a neural network layer. Using our framework, we derive popular attention layers used in practice and propose two new attentions: 1) the Batch Normalized Attention (Attention-BN) derived from the batch normalization layer and 2) the Attention with Scaled Head (Attention-SH) derived from using less training data to fit the SVR model. We empirically demonstrate the advantages of the Attention-BN and Attention-SH in reducing head redundancy, increasing the model's accuracy, and improving the model's efficiency in a variety of practical applications including image and time-series classification. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: Accepted to ICLR 2023, 26 pages, 4 figures, 14 tables

arXiv:2311.11163 [pdf, other]

Hate speech and hate crimes: a data-driven study of evolving discourse around marginalized groups

Authors: Malvina Bozhidarova, Jonathn Chang, Aaishah Ale-rasool, Yuxiang Liu, Chongyao Ma, Andrea L. Bertozzi, P. Jeffrey Brantingham, Junyuan Lin, Sanjukta Krishnagopal

Abstract: This study explores the dynamic relationship between online discourse, as observed in tweets, and physical hate crimes, focusing on marginalized groups. Leveraging natural language processing techniques, including keyword extraction and topic modeling, we analyze the evolution of online discourse after events affecting these groups. Examining sentiment and polarizing tweets, we establish correlati… ▽ More This study explores the dynamic relationship between online discourse, as observed in tweets, and physical hate crimes, focusing on marginalized groups. Leveraging natural language processing techniques, including keyword extraction and topic modeling, we analyze the evolution of online discourse after events affecting these groups. Examining sentiment and polarizing tweets, we establish correlations with hate crimes in Black and LGBTQ+ communities. Using a knowledge graph, we connect tweets, users, topics, and hate crimes, enabling network analyses. Our findings reveal divergent patterns in the evolution of user communities for Black and LGBTQ+ groups, with notable differences in sentiment among influential users. This analysis sheds light on distinctive online discourse patterns and emphasizes the need to monitor hate speech to prevent hate crimes, especially following significant events impacting marginalized communities. △ Less

Submitted 18 November, 2023; originally announced November 2023.

arXiv:2110.07739 [pdf]

doi 10.1007/s42967-023-00328-z

Model-Change Active Learning in Graph-Based Semi-Supervised Learning

Authors: Kevin Miller, Andrea L. Bertozzi

Abstract: Active learning in semi-supervised classification involves introducing additional labels for unlabelled data to improve the accuracy of the underlying classifier. A challenge is to identify which points to label to best improve performance while limiting the number of new labels. "Model Change" active learning quantifies the resulting change incurred in the classifier by introducing the additional… ▽ More Active learning in semi-supervised classification involves introducing additional labels for unlabelled data to improve the accuracy of the underlying classifier. A challenge is to identify which points to label to best improve performance while limiting the number of new labels. "Model Change" active learning quantifies the resulting change incurred in the classifier by introducing the additional label(s). We pair this idea with graph-based semi-supervised learning methods, that use the spectrum of the graph Laplacian matrix, which can be truncated to avoid prohibitively large computational and storage costs. We consider a family of convex loss functions for which the acquisition function can be efficiently approximated using the Laplace approximation of the posterior distribution. We show a variety of multiclass examples that illustrate improved performance over prior state-of-art. △ Less

Submitted 14 November, 2024; v1 submitted 14 October, 2021; originally announced October 2021.

Journal ref: Commun. Appl. Math. Comput. 6, 1270-1298 (2024)

arXiv:2007.12809 [pdf, other]

doi 10.1088/1361-6420/ac1e80

Posterior Consistency of Semi-Supervised Regression on Graphs

Authors: Andrea L. Bertozzi, Bamdad Hosseini, Hao Li, Kevin Miller, Andrew M. Stuart

Abstract: Graph-based semi-supervised regression (SSR) is the problem of estimating the value of a function on a weighted graph from its values (labels) on a small subset of the vertices. This paper is concerned with the consistency of SSR in the context of classification, in the setting where the labels have small noise and the underlying graph weighting is consistent with well-clustered nodes. We present… ▽ More Graph-based semi-supervised regression (SSR) is the problem of estimating the value of a function on a weighted graph from its values (labels) on a small subset of the vertices. This paper is concerned with the consistency of SSR in the context of classification, in the setting where the labels have small noise and the underlying graph weighting is consistent with well-clustered nodes. We present a Bayesian formulation of SSR in which the weighted graph defines a Gaussian prior, using a graph Laplacian, and the labeled data defines a likelihood. We analyze the rate of contraction of the posterior measure around the ground truth in terms of parameters that quantify the small label error and inherent clustering in the graph. We obtain bounds on the rates of contraction and illustrate their sharpness through numerical experiments. The analysis also gives insight into the choice of hyperparameters that enter the definition of the prior. △ Less

Submitted 24 March, 2021; v1 submitted 24 July, 2020; originally announced July 2020.

arXiv:2007.11126 [pdf, other]

Efficient Graph-Based Active Learning with Probit Likelihood via Gaussian Approximations

Authors: Kevin Miller, Hao Li, Andrea L. Bertozzi

Abstract: We present a novel adaptation of active learning to graph-based semi-supervised learning (SSL) under non-Gaussian Bayesian models. We present an approximation of non-Gaussian distributions to adapt previously Gaussian-based acquisition functions to these more general cases. We develop an efficient rank-one update for applying "look-ahead" based methods as well as model retraining. We also introduc… ▽ More We present a novel adaptation of active learning to graph-based semi-supervised learning (SSL) under non-Gaussian Bayesian models. We present an approximation of non-Gaussian distributions to adapt previously Gaussian-based acquisition functions to these more general cases. We develop an efficient rank-one update for applying "look-ahead" based methods as well as model retraining. We also introduce a novel "model change" acquisition function based on these approximations that further expands the available collection of active learning acquisition functions for such methods. △ Less

Submitted 21 July, 2020; originally announced July 2020.

Comments: Accepted in ICML Workshop on Real World Experiment Design and Active Learning 2020

arXiv:2006.06919 [pdf, other]

MomentumRNN: Integrating Momentum into Recurrent Neural Networks

Authors: Tan M. Nguyen, Richard G. Baraniuk, Andrea L. Bertozzi, Stanley J. Osher, Bao Wang

Abstract: Designing deep neural networks is an art that often involves an expensive search over candidate architectures. To overcome this for recurrent neural nets (RNNs), we establish a connection between the hidden state dynamics in an RNN and gradient descent (GD). We then integrate momentum into this framework and propose a new family of RNNs, called {\em MomentumRNNs}. We theoretically prove and numeri… ▽ More Designing deep neural networks is an art that often involves an expensive search over candidate architectures. To overcome this for recurrent neural nets (RNNs), we establish a connection between the hidden state dynamics in an RNN and gradient descent (GD). We then integrate momentum into this framework and propose a new family of RNNs, called {\em MomentumRNNs}. We theoretically prove and numerically demonstrate that MomentumRNNs alleviate the vanishing gradient issue in training RNNs. We study the momentum long-short term memory (MomentumLSTM) and verify its advantages in convergence speed and accuracy over its LSTM counterpart across a variety of benchmarks. We also demonstrate that MomentumRNN is applicable to many types of recurrent cells, including those in the state-of-the-art orthogonal RNNs. Finally, we show that other advanced momentum-based optimization methods, such as Adam and Nesterov accelerated gradients with a restart, can be easily incorporated into the MomentumRNN framework for designing new recurrent cells with even better performance. The code is available at https://github.com/minhtannguyen/MomentumRNN. △ Less

Submitted 11 October, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: 21 pages, 11 figures, Accepted for publication at Advances in Neural Information Processing Systems (NeurIPS) 2020

MSC Class: 68T07 ACM Class: I.2

Journal ref: Advances in Neural Information Processing Systems (NeurIPS) 2020

arXiv:2003.00631 [pdf, other]

Sparsity Meets Robustness: Channel Pruning for the Feynman-Kac Formalism Principled Robust Deep Neural Nets

Authors: Thu Dinh, Bao Wang, Andrea L. Bertozzi, Stanley J. Osher

Abstract: Deep neural nets (DNNs) compression is crucial for adaptation to mobile devices. Though many successful algorithms exist to compress naturally trained DNNs, developing efficient and stable compression algorithms for robustly trained DNNs remains widely open. In this paper, we focus on a co-design of efficient DNN compression algorithms and sparse neural architectures for robust and accurate deep l… ▽ More Deep neural nets (DNNs) compression is crucial for adaptation to mobile devices. Though many successful algorithms exist to compress naturally trained DNNs, developing efficient and stable compression algorithms for robustly trained DNNs remains widely open. In this paper, we focus on a co-design of efficient DNN compression algorithms and sparse neural architectures for robust and accurate deep learning. Such a co-design enables us to advance the goal of accommodating both sparsity and robustness. With this objective in mind, we leverage the relaxed augmented Lagrangian based algorithms to prune the weights of adversarially trained DNNs, at both structured and unstructured levels. Using a Feynman-Kac formalism principled robust and sparse DNNs, we can at least double the channel sparsity of the adversarially trained ResNet20 for CIFAR10 classification, meanwhile, improve the natural accuracy by $8.69$\% and the robust accuracy under the benchmark $20$ iterations of IFGSM attack by $5.42$\%. The code is available at \url{https://github.com/BaoWangMath/rvsm-rgsm-admm}. △ Less

Submitted 1 March, 2020; originally announced March 2020.

Comments: 16 pages, 7 figures

MSC Class: 68T01

arXiv:2002.10583 [pdf, other]

Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent

Authors: Bao Wang, Tan M. Nguyen, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher

Abstract: Stochastic gradient descent (SGD) with constant momentum and its variants such as Adam are the optimization algorithms of choice for training deep neural networks (DNNs). Since DNN training is incredibly computationally expensive, there is great interest in speeding up the convergence. Nesterov accelerated gradient (NAG) improves the convergence rate of gradient descent (GD) for convex optimizatio… ▽ More Stochastic gradient descent (SGD) with constant momentum and its variants such as Adam are the optimization algorithms of choice for training deep neural networks (DNNs). Since DNN training is incredibly computationally expensive, there is great interest in speeding up the convergence. Nesterov accelerated gradient (NAG) improves the convergence rate of gradient descent (GD) for convex optimization using a specially designed momentum; however, it accumulates error when an inexact gradient is used (such as in SGD), slowing convergence at best and diverging at worst. In this paper, we propose Scheduled Restart SGD (SRSGD), a new NAG-style scheme for training DNNs. SRSGD replaces the constant momentum in SGD by the increasing momentum in NAG but stabilizes the iterations by resetting the momentum to zero according to a schedule. Using a variety of models and benchmarks for image classification, we demonstrate that, in training DNNs, SRSGD significantly improves convergence and generalization; for instance in training ResNet200 for ImageNet classification, SRSGD achieves an error rate of 20.93% vs. the benchmark of 22.13%. These improvements become more significant as the network grows deeper. Furthermore, on both CIFAR and ImageNet, SRSGD reaches similar or even better error rates with significantly fewer training epochs compared to the SGD baseline. △ Less

Submitted 26 April, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

Comments: 35 pages, 16 figures, 18 tables

arXiv:1904.09062 [pdf, other]

Semi-Supervised First-Person Activity Recognition in Body-Worn Video

Authors: Honglin Chen, Hao Li, Alexander Song, Matt Haberland, Osman Akar, Adam Dhillon, Tiankuang Zhou, Andrea L. Bertozzi, P. Jeffrey Brantingham

Abstract: Body-worn cameras are now commonly used for logging daily life, sports, and law enforcement activities, creating a large volume of archived footage. This paper studies the problem of classifying frames of footage according to the activity of the camera-wearer with an emphasis on application to real-world police body-worn video. Real-world datasets pose a different set of challenges from existing e… ▽ More Body-worn cameras are now commonly used for logging daily life, sports, and law enforcement activities, creating a large volume of archived footage. This paper studies the problem of classifying frames of footage according to the activity of the camera-wearer with an emphasis on application to real-world police body-worn video. Real-world datasets pose a different set of challenges from existing egocentric vision datasets: the amount of footage of different activities is unbalanced, the data contains personally identifiable information, and in practice it is difficult to provide substantial training footage for a supervised approach. We address these challenges by extracting features based exclusively on motion information then segmenting the video footage using a semi-supervised classification algorithm. On publicly available datasets, our method achieves results comparable to, if not better than, supervised and/or deep learning methods using a fraction of the training data. It also shows promising results on real-world police body-worn video. △ Less

Submitted 18 April, 2019; originally announced April 2019.

arXiv:1902.05113 [pdf, other]

A Study on Graph-Structured Recurrent Neural Networks and Sparsification with Application to Epidemic Forecasting

Authors: Zhijian Li, Xiyang Luo, Bao Wang, Andrea L. Bertozzi, Jack Xin

Abstract: We study epidemic forecasting on real-world health data by a graph-structured recurrent neural network (GSRNN). We achieve state-of-the-art forecasting accuracy on the benchmark CDC dataset. To improve model efficiency, we sparsify the network weights via transformed-$\ell_1$ penalty and maintain prediction accuracy at the same level with 70% of the network weights being zero. We study epidemic forecasting on real-world health data by a graph-structured recurrent neural network (GSRNN). We achieve state-of-the-art forecasting accuracy on the benchmark CDC dataset. To improve model efficiency, we sparsify the network weights via transformed-$\ell_1$ penalty and maintain prediction accuracy at the same level with 70% of the network weights being zero. △ Less

Submitted 13 February, 2019; originally announced February 2019.

arXiv:1811.06321 [pdf, other]

Multivariate Spatiotemporal Hawkes Processes and Network Reconstruction

Authors: Baichuan Yuan, Hao Li, Andrea L. Bertozzi, P. Jeffrey Brantingham, Mason A. Porter

Abstract: There is often latent network structure in spatial and temporal data and the tools of network analysis can yield fascinating insights into such data. In this paper, we develop a nonparametric method for network reconstruction from spatiotemporal data sets using multivariate Hawkes processes. In contrast to prior work on network reconstruction with point-process models, which has often focused on e… ▽ More There is often latent network structure in spatial and temporal data and the tools of network analysis can yield fascinating insights into such data. In this paper, we develop a nonparametric method for network reconstruction from spatiotemporal data sets using multivariate Hawkes processes. In contrast to prior work on network reconstruction with point-process models, which has often focused on exclusively temporal information, our approach uses both temporal and spatial information and does not assume a specific parametric form of network dynamics. This leads to an effective way of recovering an underlying network. We illustrate our approach using both synthetic networks and networks constructed from real-world data sets (a location-based social media network, a narrative of crime events, and violent gang crimes). Our results demonstrate that, in comparison to using only temporal data, our spatiotemporal approach yields improved network reconstruction, providing a basis for meaningful subsequent analysis --- such as community structure and motif analysis --- of the reconstructed networks. △ Less

Submitted 15 November, 2018; originally announced November 2018.

arXiv:1809.08516 [pdf, other]

Adversarial Defense via Data Dependent Activation Function and Total Variation Minimization

Authors: Bao Wang, Alex T. Lin, Wei Zhu, Penghang Yin, Andrea L. Bertozzi, Stanley J. Osher

Abstract: We improve the robustness of Deep Neural Net (DNN) to adversarial attacks by using an interpolating function as the output activation. This data-dependent activation remarkably improves both the generalization and robustness of DNN. In the CIFAR10 benchmark, we raise the robust accuracy of the adversarially trained ResNet20 from $\sim 46\%$ to $\sim 69\%$ under the state-of-the-art Iterative Fast… ▽ More We improve the robustness of Deep Neural Net (DNN) to adversarial attacks by using an interpolating function as the output activation. This data-dependent activation remarkably improves both the generalization and robustness of DNN. In the CIFAR10 benchmark, we raise the robust accuracy of the adversarially trained ResNet20 from $\sim 46\%$ to $\sim 69\%$ under the state-of-the-art Iterative Fast Gradient Sign Method (IFGSM) based adversarial attack. When we combine this data-dependent activation with total variation minimization on adversarial images and training data augmentation, we achieve an improvement in robust accuracy by 38.9$\%$ for ResNet56 under the strongest IFGSM attack. Furthermore, We provide an intuitive explanation of our defense by analyzing the geometry of the feature space. △ Less

Submitted 29 April, 2020; v1 submitted 22 September, 2018; originally announced September 2018.

Comments: 17 pages, 6 figures

MSC Class: 68Pxx

Journal ref: Inverse Problems and Imaging, 2020

arXiv:1806.02485 [pdf, other]

doi 10.1007/s00332-019-09541-8

Stochastic Block Models are a Discrete Surface Tension

Authors: Zachary M. Boyd, Mason A. Porter, Andrea L. Bertozzi

Abstract: Networks, which represent agents and interactions between them, arise in myriad applications throughout the sciences, engineering, and even the humanities. To understand large-scale structure in a network, a common task is to cluster a network's nodes into sets called "communities", such that there are dense connections within communities but sparse connections between them. A popular and statisti… ▽ More Networks, which represent agents and interactions between them, arise in myriad applications throughout the sciences, engineering, and even the humanities. To understand large-scale structure in a network, a common task is to cluster a network's nodes into sets called "communities", such that there are dense connections within communities but sparse connections between them. A popular and statistically principled method to perform such clustering is to use a family of generative models known as stochastic block models (SBMs). In this paper, we show that maximum likelihood estimation in an SBM is a network analog of a well-known continuum surface-tension problem that arises from an application in metallurgy. To illustrate the utility of this relationship, we implement network analogs of three surface-tension algorithms, with which we successfully recover planted community structure in synthetic networks and which yield fascinating insights on empirical networks that we construct from hyperspectral videos. △ Less

Submitted 24 March, 2019; v1 submitted 6 June, 2018; originally announced June 2018.

Comments: to appear in Journal of Nonlinear Science

MSC Class: 65K10; 49M20; 35Q56; 62H30; 91C20; 91D30; 94C15

arXiv:1804.00684 [pdf, other]

Graph-Based Deep Modeling and Real Time Forecasting of Sparse Spatio-Temporal Data

Authors: Bao Wang, Xiyang Luo, Fangbo Zhang, Baichuan Yuan, Andrea L. Bertozzi, P. Jeffrey Brantingham

Abstract: We present a generic framework for spatio-temporal (ST) data modeling, analysis, and forecasting, with a special focus on data that is sparse in both space and time. Our multi-scaled framework is a seamless coupling of two major components: a self-exciting point process that models the macroscale statistical behaviors of the ST data and a graph structured recurrent neural network (GSRNN) to discov… ▽ More We present a generic framework for spatio-temporal (ST) data modeling, analysis, and forecasting, with a special focus on data that is sparse in both space and time. Our multi-scaled framework is a seamless coupling of two major components: a self-exciting point process that models the macroscale statistical behaviors of the ST data and a graph structured recurrent neural network (GSRNN) to discover the microscale patterns of the ST data on the inferred graph. This novel deep neural network (DNN) incorporates the real time interactions of the graph nodes to enable more accurate real time forecasting. The effectiveness of our method is demonstrated on both crime and traffic forecasting. △ Less

Submitted 2 April, 2018; originally announced April 2018.

Comments: 9 pages, 19 figures

MSC Class: 65-06

arXiv:1711.08833 [pdf, other]

Deep Learning for Real-Time Crime Forecasting and its Ternarization

Authors: Bao Wang, Penghang Yin, Andrea L. Bertozzi, P. Jeffrey Brantingham, Stanley J. Osher, Jack Xin

Abstract: Real-time crime forecasting is important. However, accurate prediction of when and where the next crime will happen is difficult. No known physical model provides a reasonable approximation to such a complex system. Historical crime data are sparse in both space and time and the signal of interests is weak. In this work, we first present a proper representation of crime data. We then adapt the spa… ▽ More Real-time crime forecasting is important. However, accurate prediction of when and where the next crime will happen is difficult. No known physical model provides a reasonable approximation to such a complex system. Historical crime data are sparse in both space and time and the signal of interests is weak. In this work, we first present a proper representation of crime data. We then adapt the spatial temporal residual network on the well represented data to predict the distribution of crime in Los Angeles at the scale of hours in neighborhood-sized parcels. These experiments as well as comparisons with several existing approaches to prediction demonstrate the superiority of the proposed model in terms of accuracy. Finally, we present a ternarization technique to address the resource consumption issue for its deployment in real world. This work is an extension of our short conference proceeding paper [Wang et al, Arxiv 1707.03340]. △ Less

Submitted 23 November, 2017; originally announced November 2017.

Comments: 14 pages, 7 figures

MSC Class: 62-07

arXiv:1707.09285 [pdf, other]

Simplified Energy Landscape for Modularity Using Total Variation

Authors: Zachary Boyd, Egil Bae, Xue-Cheng Tai, Andrea L. Bertozzi

Abstract: Networks capture pairwise interactions between entities and are frequently used in applications such as social networks, food networks, and protein interaction networks, to name a few. Communities, cohesive groups of nodes, often form in these applications, and identifying them gives insight into the overall organization of the network. One common quality function used to identify community struct… ▽ More Networks capture pairwise interactions between entities and are frequently used in applications such as social networks, food networks, and protein interaction networks, to name a few. Communities, cohesive groups of nodes, often form in these applications, and identifying them gives insight into the overall organization of the network. One common quality function used to identify community structure is modularity. In Hu et al. [SIAM J. App. Math., 73(6), 2013], it was shown that modularity optimization is equivalent to minimizing a particular nonconvex total variation (TV) based functional over a discrete domain. They solve this problem, assuming the number of communities is known, using a Merriman, Bence, Osher (MBO) scheme. We show that modularity optimization is equivalent to minimizing a convex TV-based functional over a discrete domain, again, assuming the number of communities is known. Furthermore, we show that modularity has no convex relaxation satisfying certain natural conditions. We therefore, find a manageable non-convex approximation using a Ginzburg Landau functional, which provably converges to the correct energy in the limit of a certain parameter. We then derive an MBO algorithm with fewer hand-tuned parameters than in Hu et al. and which is 7 times faster at solving the associated diffusion equation due to the fact that the underlying discretization is unconditionally stable. Our numerical tests include a hyperspectral video whose associated graph has 2.9x10^7 edges, which is roughly 37 times larger than was handled in the paper of Hu et al. △ Less

Submitted 2 April, 2018; v1 submitted 28 July, 2017; originally announced July 2017.

Comments: 25 pages, 3 figures, 3 tables, submitted to SIAM J. App. Math

arXiv:1707.03340 [pdf, other]

Deep Learning for Real Time Crime Forecasting

Authors: Bao Wang, Duo Zhang, Duanhao Zhang, P. Jeffery Brantingham, Andrea L. Bertozzi

Abstract: Accurate real time crime prediction is a fundamental issue for public safety, but remains a challenging problem for the scientific community. Crime occurrences depend on many complex factors. Compared to many predictable events, crime is sparse. At different spatio-temporal scales, crime distributions display dramatically different patterns. These distributions are of very low regularity in both s… ▽ More Accurate real time crime prediction is a fundamental issue for public safety, but remains a challenging problem for the scientific community. Crime occurrences depend on many complex factors. Compared to many predictable events, crime is sparse. At different spatio-temporal scales, crime distributions display dramatically different patterns. These distributions are of very low regularity in both space and time. In this work, we adapt the state-of-the-art deep learning spatio-temporal predictor, ST-ResNet [Zhang et al, AAAI, 2017], to collectively predict crime distribution over the Los Angeles area. Our models are two staged. First, we preprocess the raw crime data. This includes regularization in both space and time to enhance predictable signals. Second, we adapt hierarchical structures of residual convolutional units to train multi-factor crime prediction models. Experiments over a half year period in Los Angeles reveal highly accurate predictive power of our models. △ Less

Submitted 9 July, 2017; originally announced July 2017.

Comments: 4 pages, 6 figures, NOLTA, 2017

MSC Class: 68T05

arXiv:1703.08816 [pdf, other]

Uncertainty quantification in graph-based classification of high dimensional data

Authors: Andrea L. Bertozzi, Xiyang Luo, Andrew M. Stuart, Konstantinos C. Zygalakis

Abstract: Classification of high dimensional data finds wide-ranging applications. In many of these applications equipping the resulting classification with a measure of uncertainty may be as important as the classification itself. In this paper we introduce, develop algorithms for, and investigate the properties of, a variety of Bayesian models for the task of binary classification; via the posterior distr… ▽ More Classification of high dimensional data finds wide-ranging applications. In many of these applications equipping the resulting classification with a measure of uncertainty may be as important as the classification itself. In this paper we introduce, develop algorithms for, and investigate the properties of, a variety of Bayesian models for the task of binary classification; via the posterior distribution on the classification labels, these methods automatically give measures of uncertainty. The methods are all based around the graph formulation of semi-supervised learning. We provide a unified framework which brings together a variety of methods which have been introduced in different communities within the mathematical sciences. We study probit classification in the graph-based setting, generalize the level-set method for Bayesian inverse problems to the classification setting, and generalize the Ginzburg-Landau optimization-based classifier to a Bayesian setting; we also show that the probit and level set approaches are natural relaxations of the harmonic function approach introduced in [Zhu et al 2003]. We introduce efficient numerical methods, suited to large data-sets, for both MCMC-based sampling as well as gradient-based MAP estimation. Through numerical experiments we study classification accuracy and uncertainty quantification for our models; these experiments showcase a suite of datasets commonly used to evaluate graph-based semi-supervised learning algorithms. △ Less

Submitted 8 February, 2018; v1 submitted 26 March, 2017; originally announced March 2017.

Comments: 33 pages, 14 figures

arXiv:1302.3913 [pdf, other]

Multiclass Data Segmentation using Diffuse Interface Methods on Graphs

Authors: Cristina Garcia-Cardona, Ekaterina Merkurjev, Andrea L. Bertozzi, Arjuna Flenner, Allon Percus

Abstract: We present two graph-based algorithms for multiclass segmentation of high-dimensional data. The algorithms use a diffuse interface model based on the Ginzburg-Landau functional, related to total variation compressed sensing and image processing. A multiclass extension is introduced using the Gibbs simplex, with the functional's double-well potential modified to handle the multiclass case. The firs… ▽ More We present two graph-based algorithms for multiclass segmentation of high-dimensional data. The algorithms use a diffuse interface model based on the Ginzburg-Landau functional, related to total variation compressed sensing and image processing. A multiclass extension is introduced using the Gibbs simplex, with the functional's double-well potential modified to handle the multiclass case. The first algorithm minimizes the functional using a convex splitting numerical scheme. The second algorithm is a uses a graph adaptation of the classical numerical Merriman-Bence-Osher (MBO) scheme, which alternates between diffusion and thresholding. We demonstrate the performance of both algorithms experimentally on synthetic data, grayscale and color images, and several benchmark data sets such as MNIST, COIL and WebKB. We also make use of fast numerical solvers for finding the eigenvectors and eigenvalues of the graph Laplacian, and take advantage of the sparsity of the matrix. Experiments indicate that the results are competitive with or better than the current state-of-the-art multiclass segmentation algorithms. △ Less

Submitted 17 January, 2014; v1 submitted 15 February, 2013; originally announced February 2013.

Comments: 14 pages

MSC Class: 62-XX

arXiv:1206.4969 [pdf, other]

Community detection using spectral clustering on sparse geosocial data

Authors: Yves van Gennip, Blake Hunter, Raymond Ahn, Peter Elliott, Kyle Luh, Megan Halvorson, Shannon Reid, Matt Valasik, James Wo, George E. Tita, Andrea L. Bertozzi, P. Jeffrey Brantingham

Abstract: In this article we identify social communities among gang members in the Hollenbeck policing district in Los Angeles, based on sparse observations of a combination of social interactions and geographic locations of the individuals. This information, coming from LAPD Field Interview cards, is used to construct a similarity graph for the individuals. We use spectral clustering to identify clusters i… ▽ More In this article we identify social communities among gang members in the Hollenbeck policing district in Los Angeles, based on sparse observations of a combination of social interactions and geographic locations of the individuals. This information, coming from LAPD Field Interview cards, is used to construct a similarity graph for the individuals. We use spectral clustering to identify clusters in the graph, corresponding to communities in Hollenbeck, and compare these with the LAPD's knowledge of the individuals' gang membership. We discuss different ways of encoding the geosocial information using a graph structure and the influence on the resulting clusterings. Finally we analyze the robustness of this technique with respect to noisy and incomplete data, thereby providing suggestions about the relative importance of quantity versus quality of collected data. △ Less

Submitted 8 November, 2012; v1 submitted 21 June, 2012; originally announced June 2012.

Comments: 22 pages, 6 figures (with subfigures)

MSC Class: 62H30; 91C20; 91D30; 94C15

Showing 1–21 of 21 results for author: Bertozzi, A L