Search | arXiv e-print repository

Graph Semi-Supervised Learning for Point Classification on Data Manifolds

Authors: Caio F. Deberaldini Netto, Zhiyang Wang, Luana Ruiz

Abstract: We propose a graph semi-supervised learning framework for classification tasks on data manifolds. Motivated by the manifold hypothesis, we model data as points sampled from a low-dimensional manifold $\mathcal{M} \subset \mathbb{R}^F$. The manifold is approximated in an unsupervised manner using a variational autoencoder (VAE), where the trained encoder maps data to embeddings that represent their… ▽ More We propose a graph semi-supervised learning framework for classification tasks on data manifolds. Motivated by the manifold hypothesis, we model data as points sampled from a low-dimensional manifold $\mathcal{M} \subset \mathbb{R}^F$. The manifold is approximated in an unsupervised manner using a variational autoencoder (VAE), where the trained encoder maps data to embeddings that represent their coordinates in $\mathbb{R}^F$. A geometric graph is constructed with Gaussian-weighted edges inversely proportional to distances in the embedding space, transforming the point classification problem into a semi-supervised node classification task on the graph. This task is solved using a graph neural network (GNN). Our main contribution is a theoretical analysis of the statistical generalization properties of this data-to-manifold-to-graph pipeline. We show that, under uniform sampling from $\mathcal{M}$, the generalization gap of the semi-supervised task diminishes with increasing graph size, up to the GNN training error. Leveraging a training procedure which resamples a slightly larger graph at regular intervals during training, we then show that the generalization gap can be reduced even further, vanishing asymptotically. Finally, we validate our findings with numerical experiments on image classification benchmarks, demonstrating the empirical effectiveness of our approach. △ Less

Submitted 13 June, 2025; originally announced June 2025.

Comments: 26 pages

arXiv:2504.08216 [pdf, other]

Local Distance-Preserving Node Embeddings and Their Performance on Random Graphs

Authors: My Le, Luana Ruiz, Souvik Dhara

Abstract: Learning node representations is a fundamental problem in graph machine learning. While existing embedding methods effectively preserve local similarity measures, they often fail to capture global functions like graph distances. Inspired by Bourgain's seminal work on Hilbert space embeddings of metric spaces (1985), we study the performance of local distance-preserving node embeddings. Known as la… ▽ More Learning node representations is a fundamental problem in graph machine learning. While existing embedding methods effectively preserve local similarity measures, they often fail to capture global functions like graph distances. Inspired by Bourgain's seminal work on Hilbert space embeddings of metric spaces (1985), we study the performance of local distance-preserving node embeddings. Known as landmark-based algorithms, these embeddings approximate pairwise distances by computing shortest paths from a small subset of reference nodes (i.e., landmarks). Our main theoretical contribution shows that random graphs, such as Erdős-Rényi random graphs, require lower dimensions in landmark-based embeddings compared to worst-case graphs. Empirically, we demonstrate that the GNN-based approximations for the distances to landmarks generalize well to larger networks, offering a scalable alternative for graph representation learning. △ Less

Submitted 10 April, 2025; originally announced April 2025.

arXiv:2311.10610 [pdf, ps, other]

A Poincaré Inequality and Consistency Results for Signal Sampling on Large Graphs

Authors: Thien Le, Luana Ruiz, Stefanie Jegelka

Abstract: Large-scale graph machine learning is challenging as the complexity of learning models scales with the graph size. Subsampling the graph is a viable alternative, but sampling on graphs is nontrivial as graphs are non-Euclidean. Existing graph sampling techniques require not only computing the spectra of large matrices but also repeating these computations when the graph changes, e.g., grows. In th… ▽ More Large-scale graph machine learning is challenging as the complexity of learning models scales with the graph size. Subsampling the graph is a viable alternative, but sampling on graphs is nontrivial as graphs are non-Euclidean. Existing graph sampling techniques require not only computing the spectra of large matrices but also repeating these computations when the graph changes, e.g., grows. In this paper, we introduce a signal sampling theory for a type of graph limit -- the graphon. We prove a Poincaré inequality for graphon signals and show that complements of node subsets satisfying this inequality are unique sampling sets for Paley-Wiener spaces of graphon signals. Exploiting connections with spectral clustering and Gaussian elimination, we prove that such sampling sets are consistent in the sense that unique sampling sets on a convergent graph sequence converge to unique sampling sets on the graphon. We then propose a related graphon signal sampling algorithm for large graphs, and demonstrate its good empirical performance on graph machine learning tasks. △ Less

Submitted 9 October, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: 23 pages

arXiv:2310.10953 [pdf, other]

A Local Graph Limits Perspective on Sampling-Based GNNs

Authors: Yeganeh Alimohammadi, Luana Ruiz, Amin Saberi

Abstract: We propose a theoretical framework for training Graph Neural Networks (GNNs) on large input graphs via training on small, fixed-size sampled subgraphs. This framework is applicable to a wide range of models, including popular sampling-based GNNs, such as GraphSAGE and FastGCN. Leveraging the theory of graph local limits, we prove that, under mild assumptions, parameters learned from training sampl… ▽ More We propose a theoretical framework for training Graph Neural Networks (GNNs) on large input graphs via training on small, fixed-size sampled subgraphs. This framework is applicable to a wide range of models, including popular sampling-based GNNs, such as GraphSAGE and FastGCN. Leveraging the theory of graph local limits, we prove that, under mild assumptions, parameters learned from training sampling-based GNNs on small samples of a large input graph are within an $ε$-neighborhood of the outcome of training the same architecture on the whole graph. We derive bounds on the number of samples, the size of the graph, and the training steps required as a function of $ε$. Our results give a novel theoretical understanding for using sampling in training GNNs. They also suggest that by training GNNs on small samples of the input graph, practitioners can identify and select the best models, hyperparameters, and sampling algorithms more efficiently. We empirically illustrate our results on a node classification task on large citation graphs, observing that sampling-based GNNs trained on local subgraphs 12$\times$ smaller than the original graph achieve comparable performance to those trained on the input graph. △ Less

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2301.10808 [pdf, other]

Graph Neural Tangent Kernel: Convergence on Large Graphs

Authors: Sanjukta Krishnagopal, Luana Ruiz

Abstract: Graph neural networks (GNNs) achieve remarkable performance in graph machine learning tasks but can be hard to train on large-graph data, where their learning dynamics are not well understood. We investigate the training dynamics of large-graph GNNs using graph neural tangent kernels (GNTKs) and graphons. In the limit of large width, optimization of an overparametrized NN is equivalent to kernel r… ▽ More Graph neural networks (GNNs) achieve remarkable performance in graph machine learning tasks but can be hard to train on large-graph data, where their learning dynamics are not well understood. We investigate the training dynamics of large-graph GNNs using graph neural tangent kernels (GNTKs) and graphons. In the limit of large width, optimization of an overparametrized NN is equivalent to kernel regression on the NTK. Here, we investigate how the GNTK evolves as another independent dimension is varied: the graph size. We use graphons to define limit objects -- graphon NNs for GNNs, and graphon NTKs for GNTKs -- , and prove that, on a sequence of graphs, the GNTKs converge to the graphon NTK. We further prove that the spectrum of the GNTK, which is related to the directions of fastest learning which becomes relevant during early stopping, converges to the spectrum of the graphon NTK. This implies that in the large-graph limit, the GNTK fitted on a graph of moderate size can be used to solve the same task on the large graph, and to infer the learning dynamics of the large-graph GNN. These results are verified empirically on node regression and classification tasks. △ Less

Submitted 31 May, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

Comments: Accepted in the 40th International Conference on Machine Learning (ICML), Honolulu, Hawaii

arXiv:2008.01767 [pdf, other]

Graph Neural Networks: Architectures, Stability and Transferability

Authors: Luana Ruiz, Fernando Gama, Alejandro Ribeiro

Abstract: Graph Neural Networks (GNNs) are information processing architectures for signals supported on graphs. They are presented here as generalizations of convolutional neural networks (CNNs) in which individual layers contain banks of graph convolutional filters instead of banks of classical convolutional filters. Otherwise, GNNs operate as CNNs. Filters are composed with pointwise nonlinearities and s… ▽ More Graph Neural Networks (GNNs) are information processing architectures for signals supported on graphs. They are presented here as generalizations of convolutional neural networks (CNNs) in which individual layers contain banks of graph convolutional filters instead of banks of classical convolutional filters. Otherwise, GNNs operate as CNNs. Filters are composed with pointwise nonlinearities and stacked in layers. It is shown that GNN architectures exhibit equivariance to permutation and stability to graph deformations. These properties help explain the good performance of GNNs that can be observed empirically. It is also shown that if graphs converge to a limit object, a graphon, GNNs converge to a corresponding limit object, a graphon neural network. This convergence justifies the transferability of GNNs across networks with different number of nodes. Concepts are illustrated by the application of GNNs to recommendation systems, decentralized collaborative control, and wireless communication networks. △ Less

Submitted 29 January, 2021; v1 submitted 4 August, 2020; originally announced August 2020.

Comments: Accepted for publication in the Proceedings of the IEEE

arXiv:2008.01175 [pdf, other]

doi 10.1016/j.eswa.2021.114971

Identifying meaningful clusters in malware data

Authors: Renato Cordeiro de Amorim, Carlos David Lopez Ruiz

Abstract: Finding meaningful clusters in drive-by-download malware data is a particularly difficult task. Malware data tends to contain overlapping clusters with wide variations of cardinality. This happens because there can be considerable similarity between malware samples (some are even said to belong to the same family), and these tend to appear in bursts. Clustering algorithms are usually applied to no… ▽ More Finding meaningful clusters in drive-by-download malware data is a particularly difficult task. Malware data tends to contain overlapping clusters with wide variations of cardinality. This happens because there can be considerable similarity between malware samples (some are even said to belong to the same family), and these tend to appear in bursts. Clustering algorithms are usually applied to normalised data sets. However, the process of normalisation aims at setting features with different range values to have a similar contribution to the clustering. It does not favour more meaningful features over those that are less meaningful, an effect one should perhaps expect of the data pre-processing stage. In this paper we introduce a method to deal precisely with the problem above. This is an iterative data pre-processing method capable of aiding to increase the separation between clusters. It does so by calculating the within-cluster degree of relevance of each feature, and then it uses these as a data rescaling factor. By repeating this until convergence our malware data was separated in clear clusters, leading to a higher average silhouette width. △ Less

Submitted 31 July, 2020; originally announced August 2020.

arXiv:2006.03548 [pdf, other]

Graphon Neural Networks and the Transferability of Graph Neural Networks

Authors: Luana Ruiz, Luiz F. O. Chamon, Alejandro Ribeiro

Abstract: Graph neural networks (GNNs) rely on graph convolutions to extract local features from network data. These graph convolutions combine information from adjacent nodes using coefficients that are shared across all nodes. Since these coefficients are shared and do not depend on the graph, one can envision using the same coefficients to define a GNN on another graph. This motivates analyzing the trans… ▽ More Graph neural networks (GNNs) rely on graph convolutions to extract local features from network data. These graph convolutions combine information from adjacent nodes using coefficients that are shared across all nodes. Since these coefficients are shared and do not depend on the graph, one can envision using the same coefficients to define a GNN on another graph. This motivates analyzing the transferability of GNNs across graphs. In this paper we introduce graphon NNs as limit objects of GNNs and prove a bound on the difference between the output of a GNN and its limit graphon-NN. This bound vanishes with growing number of nodes if the graph convolutional filters are bandlimited in the graph spectral domain. This result establishes a tradeoff between discriminability and transferability of GNNs. △ Less

Submitted 20 October, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

arXiv:2003.01795 [pdf, other]

Graphon Pooling in Graph Neural Networks

Authors: Alejandro Parada-Mayorga, Luana Ruiz, Alejandro Ribeiro

Abstract: Graph neural networks (GNNs) have been used effectively in different applications involving the processing of signals on irregular structures modeled by graphs. Relying on the use of shift-invariant graph filters, GNNs extend the operation of convolution to graphs. However, the operations of pooling and sampling are still not clearly defined and the approaches proposed in the literature either mod… ▽ More Graph neural networks (GNNs) have been used effectively in different applications involving the processing of signals on irregular structures modeled by graphs. Relying on the use of shift-invariant graph filters, GNNs extend the operation of convolution to graphs. However, the operations of pooling and sampling are still not clearly defined and the approaches proposed in the literature either modify the graph structure in a way that does not preserve its spectral properties, or require defining a policy for selecting which nodes to keep. In this work, we propose a new strategy for pooling and sampling on GNNs using graphons which preserves the spectral properties of the graph. To do so, we consider the graph layers in a GNN as elements of a sequence of graphs that converge to a graphon. In this way we have no ambiguity in the node labeling when mapping signals from one layer to the other and a spectral representation that is consistent throughout the layers. We evaluate this strategy in a synthetic and a real-world numerical experiment where we show that graphon pooling GNNs are less prone to overfitting and improve upon other pooling techniques, especially when the dimensionality reduction ratios between layers is large. △ Less

Submitted 3 March, 2020; originally announced March 2020.

arXiv:1907.10235 [pdf, other]

doi 10.1145/3292500.3330783

Predicting Different Types of Conversions with Multi-Task Learning in Online Advertising

Authors: Junwei Pan, Yizhi Mao, Alfonso Lobos Ruiz, Yu Sun, Aaron Flores

Abstract: Conversion prediction plays an important role in online advertising since Cost-Per-Action (CPA) has become one of the primary campaign performance objectives in the industry. Unlike click prediction, conversions have different types in nature, and each type may be associated with different decisive factors. In this paper, we formulate conversion prediction as a multi-task learning problem, so that… ▽ More Conversion prediction plays an important role in online advertising since Cost-Per-Action (CPA) has become one of the primary campaign performance objectives in the industry. Unlike click prediction, conversions have different types in nature, and each type may be associated with different decisive factors. In this paper, we formulate conversion prediction as a multi-task learning problem, so that the prediction models for different types of conversions can be learned together. These models share feature representations, but have their specific parameters, providing the benefit of information-sharing across all tasks. We then propose Multi-Task Field-weighted Factorization Machine (MT-FwFM) to solve these tasks jointly. Our experiment results show that, compared with two state-of-the-art models, MT-FwFM improve the AUC by 0.74% and 0.84% on two conversion types, and the weighted AUC across all conversion types is also improved by 0.50%. △ Less

Submitted 8 March, 2020; v1 submitted 24 July, 2019; originally announced July 2019.

Comments: SIGKDD

Journal ref: SIGKDD 2019

arXiv:1903.01888 [pdf, ps, other]

Gated Graph Convolutional Recurrent Neural Networks

Authors: Luana Ruiz, Fernando Gama, Alejandro Ribeiro

Abstract: Graph processes model a number of important problems such as identifying the epicenter of an earthquake or predicting weather. In this paper, we propose a Graph Convolutional Recurrent Neural Network (GCRNN) architecture specifically tailored to deal with these problems. GCRNNs use convolutional filter banks to keep the number of trainable parameters independent of the size of the graph and of the… ▽ More Graph processes model a number of important problems such as identifying the epicenter of an earthquake or predicting weather. In this paper, we propose a Graph Convolutional Recurrent Neural Network (GCRNN) architecture specifically tailored to deal with these problems. GCRNNs use convolutional filter banks to keep the number of trainable parameters independent of the size of the graph and of the time sequences considered. We also put forward Gated GCRNNs, a time-gated variation of GCRNNs akin to LSTMs. When compared with GNNs and another graph recurrent architecture in experiments using both synthetic and real-word data, GCRNNs significantly improve performance while using considerably less parameters. △ Less

Submitted 27 June, 2019; v1 submitted 5 March, 2019; originally announced March 2019.

Comments: Accepted at EUSIPCO 2019

arXiv:1810.12165 [pdf, other]

Median activation functions for graph neural networks

Authors: Luana Ruiz, Fernando Gama, Antonio G. Marques, Alejandro Ribeiro

Abstract: Graph neural networks (GNNs) have been shown to replicate convolutional neural networks' (CNNs) superior performance in many problems involving graphs. By replacing regular convolutions with linear shift-invariant graph filters (LSI-GFs), GNNs take into account the (irregular) structure of the graph and provide meaningful representations of network data. However, LSI-GFs fail to encode local nonli… ▽ More Graph neural networks (GNNs) have been shown to replicate convolutional neural networks' (CNNs) superior performance in many problems involving graphs. By replacing regular convolutions with linear shift-invariant graph filters (LSI-GFs), GNNs take into account the (irregular) structure of the graph and provide meaningful representations of network data. However, LSI-GFs fail to encode local nonlinear graph signal behavior, and so do regular activation functions, which are nonlinear but pointwise. To address this issue, we propose median activation functions with support on graph neighborhoods instead of individual nodes. A GNN architecture with a trainable multirresolution version of this activation function is then tested on synthetic and real-word datasets, where we show that median activation functions can improve GNN capacity with marginal increase in complexity. △ Less

Submitted 11 February, 2019; v1 submitted 29 October, 2018; originally announced October 2018.

Comments: Submitted to ICASSP 2019

arXiv:1806.03514 [pdf, other]

doi 10.1145/3178876.3186040

Field-weighted Factorization Machines for Click-Through Rate Prediction in Display Advertising

Authors: Junwei Pan, Jian Xu, Alfonso Lobos Ruiz, Wenliang Zhao, Shengjun Pan, Yu Sun, Quan Lu

Abstract: Click-through rate (CTR) prediction is a critical task in online display advertising. The data involved in CTR prediction are typically multi-field categorical data, i.e., every feature is categorical and belongs to one and only one field. One of the interesting characteristics of such data is that features from one field often interact differently with features from different other fields. Recent… ▽ More Click-through rate (CTR) prediction is a critical task in online display advertising. The data involved in CTR prediction are typically multi-field categorical data, i.e., every feature is categorical and belongs to one and only one field. One of the interesting characteristics of such data is that features from one field often interact differently with features from different other fields. Recently, Field-aware Factorization Machines (FFMs) have been among the best performing models for CTR prediction by explicitly modeling such difference. However, the number of parameters in FFMs is in the order of feature number times field number, which is unacceptable in the real-world production systems. In this paper, we propose Field-weighted Factorization Machines (FwFMs) to model the different feature interactions between different fields in a much more memory-efficient way. Our experimental evaluations show that FwFMs can achieve competitive prediction performance with only as few as 4% parameters of FFMs. When using the same number of parameters, FwFMs can bring 0.92% and 0.47% AUC lift over FFMs on two real CTR prediction data sets. △ Less

Submitted 8 March, 2020; v1 submitted 9 June, 2018; originally announced June 2018.

Showing 1–13 of 13 results for author: Ruiz, L