-
A State of the Art on Recent Progress and Emerging Challenges on Energy Transfer Between Vibrating Modes Under an External Mechanical Force With Time-Varying Frequency From 2020 to 2025
Authors:
Jose Manoel Balthazar,
Jorge Luis Palacios Felix,
Mauricio A. Ribeiro,
Angelo Marcelo Tusset,
Jeferson Jose de Lima,
Vinicius Piccirillo,
Julijana Simonovic,
Nikola D. Nevsic,
Marcos Varanis,
Clivaldo de Oliveira,
Raphaela C. Machado,
Gabriella O M Oliveira
Abstract:
In this paper, we discuss an example of current importance with a future perspective in engineering, in which excitation sources always have limited power, limited inertia, and their frequencies vary according to the instantaneous state of the vibrating system. Practical examples of non-ideal systems are considered. The most common phenomenon for this kind of system is discussed. The period considered is from 2020 to 2025. The specific properties of various models are also discussed. Directions for future investigations are provided. In this paper, the authors revisited some publications based on the assumption that the external excitations are produced by non-ideal sources (RNIS), that is, with limited power supply. Among these applications, nonlinear phenomena such as the Sommerfeld effect and saturation phenomenon were observed, considering fractional damping. Energy harvesters and the Jacobi-Anger expansion were used in the governing equations of motion. We also used the Jacobi-Anger expansion in the case of energy transfer between vibrating modes under an external force with time-varying frequency, which represents one of the future directions of research on non-ideal vibrating systems (RNIS).
Submitted 2 June, 2025;
originally announced June 2025.
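For reference, the abstract above names the Jacobi-Anger expansion without stating it. The listing does not show how the authors apply it, so the following is only the standard textbook identity, with $J_n$ the Bessel function of the first kind and $z$, $\theta$ generic symbols chosen here for illustration:

```latex
e^{\,i z \sin\theta} = \sum_{n=-\infty}^{\infty} J_n(z)\, e^{\,i n \theta},
\qquad
\begin{aligned}
\cos(z \sin\theta) &= J_0(z) + 2 \sum_{n=1}^{\infty} J_{2n}(z) \cos(2 n \theta), \\
\sin(z \sin\theta) &= 2 \sum_{n=1}^{\infty} J_{2n-1}(z) \sin\big((2n-1)\theta\big).
\end{aligned}
```

Expansions of this kind turn a sinusoid with a sinusoidally modulated phase into a sum of harmonics, which is presumably why the identity is useful for excitations with time-varying frequency.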
-
Alignment of large language models with constrained learning
Authors:
Botong Zhang,
Shuo Li,
Ignacio Hounie,
Osbert Bastani,
Dongsheng Ding,
Alejandro Ribeiro
Abstract:
We study the problem of computing an optimal large language model (LLM) policy for a constrained alignment problem, where the goal is to maximize a primary reward objective while satisfying constraints on secondary utilities. Despite the popularity of Lagrangian-based LLM policy search in constrained alignment, iterative primal-dual methods often fail to converge, and non-iterative dual-based methods do not achieve optimality in the LLM parameter space. To address these challenges, we employ Lagrangian duality to develop an iterative dual-based alignment method that alternates between updating the LLM policy via Lagrangian maximization and updating the dual variable via dual descent. In theory, we characterize the primal-dual gap between the primal value in the distribution space and the dual value in the LLM parameter space. We further quantify the optimality gap of the learned LLM policies at near-optimal dual variables with respect to both the objective and the constraint functions. These results prove that dual-based alignment methods can find an optimal constrained LLM policy, up to an LLM parametrization gap. We demonstrate the effectiveness and merits of our approach through extensive experiments conducted on the PKU-SafeRLHF dataset.
Submitted 25 May, 2025;
originally announced May 2025.
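The alternation described in the abstract above (Lagrangian maximization for the policy, dual descent for the multiplier) can be illustrated on a toy problem. The sketch below is not the paper's method: it assumes a finite set of three "responses", a KL-regularized objective with a closed-form Lagrangian maximizer, and hypothetical names and numbers (`ref`, `reward`, `utility`, `threshold`, `beta`, `step`) chosen only for illustration.

```python
import math


def lagrangian_maximizer(ref, reward, utility, lam, beta):
    """Maximize E[r + lam * g] - beta * KL(pi || ref) over distributions pi.

    For a finite action set the maximizer is pi ∝ ref * exp((r + lam * g) / beta).
    """
    logits = [math.log(p) + (r + lam * g) / beta
              for p, r, g in zip(ref, reward, utility)]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def dual_descent(ref, reward, utility, threshold, beta=0.5, step=0.5, iters=500):
    """Alternate policy updates with projected gradient descent on the dual variable."""
    lam = 0.0
    for _ in range(iters):
        pi = lagrangian_maximizer(ref, reward, utility, lam, beta)
        # Constraint slack E_pi[g] - threshold is the gradient of the dual function.
        slack = sum(p * g for p, g in zip(pi, utility)) - threshold
        lam = max(0.0, lam - step * slack)  # keep the multiplier non-negative
    return lagrangian_maximizer(ref, reward, utility, lam, beta), lam


# Toy data (hypothetical): reward and utility pull in opposite directions.
ref = [1 / 3, 1 / 3, 1 / 3]
reward = [1.0, 0.5, 0.0]
utility = [0.0, 0.5, 1.0]
policy, lam = dual_descent(ref, reward, utility, threshold=0.6)
expected_utility = sum(p * g for p, g in zip(policy, utility))
```

In this toy setting the unconstrained policy violates the utility constraint, so the multiplier grows until the expected utility meets the threshold. With an LLM the maximization step has no closed form, which is where the parametrization gap discussed in the abstract enters.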
-
Wireless Link Scheduling with State-Augmented Graph Neural Networks
Authors:
Romina Garcia Camargo,
Zhiyang Wang,
Navid NaderiAlizadeh,
Alejandro Ribeiro
Abstract:
We consider the problem of optimal link scheduling in large-scale wireless ad hoc networks. We specifically aim for the maximum long-term average performance, subject to a minimum transmission requirement for each link to ensure fairness. With a graph structure utilized to represent the conflicts of links, we formulate a constrained optimization problem to learn the scheduling policy, which is parameterized with a graph neural network (GNN). To address the challenge of long-term performance, we use the state-augmentation technique. In particular, by augmenting the Lagrangian dual variables as dynamic inputs to the scheduling policy, the GNN can be trained to gradually adapt the scheduling decisions to achieve the minimum transmission requirements. We verify the efficacy of our proposed policy through numerical simulations and compare its performance with several baselines in various network settings.
Submitted 12 May, 2025;
originally announced May 2025.
-
Generative Diffusion Models for Resource Allocation in Wireless Networks
Authors:
Yigit Berkay Uslu,
Samar Hadou,
Shirin Saeedi Bidokhti,
Alejandro Ribeiro
Abstract:
This paper proposes a supervised training algorithm for learning stochastic resource allocation policies with generative diffusion models (GDMs). We formulate the allocation problem as the maximization of an ergodic utility function subject to ergodic Quality of Service (QoS) constraints. Given samples from a stochastic expert policy that yields a near-optimal solution to the problem, we train a GDM policy to imitate the expert and generate new samples from the optimal distribution. We achieve near-optimal performance through sequential execution of the generated samples. To enable generalization to a family of network configurations, we parameterize the backward diffusion process with a graph neural network (GNN) architecture. We present numerical results in a case study of power control in multi-user interference networks.
Submitted 28 April, 2025;
originally announced April 2025.
-
A CNN-based Local-Global Self-Attention via Averaged Window Embeddings for Hierarchical ECG Analysis
Authors:
Arthur Buzelin,
Pedro Robles Dutenhefner,
Turi Rezende,
Luisa G. Porfirio,
Pedro Bento,
Yan Aquino,
Jose Fernandes,
Caio Santana,
Gabriela Miana,
Gisele L. Pappa,
Antonio Ribeiro,
Wagner Meira Jr
Abstract:
Cardiovascular diseases remain the leading cause of global mortality, emphasizing the critical need for efficient diagnostic tools such as electrocardiograms (ECGs). Recent advancements in deep learning, particularly transformers, have revolutionized ECG analysis by capturing detailed waveform features as well as global rhythm patterns. However, traditional transformers struggle to effectively capture local morphological features that are critical for accurate ECG interpretation. We propose a novel Local-Global Attention ECG model (LGA-ECG) to address this limitation, integrating convolutional inductive biases with global self-attention mechanisms. Our approach extracts queries by averaging embeddings obtained from overlapping convolutional windows, enabling fine-grained morphological analysis, while simultaneously modeling global context through attention to keys and values derived from the entire sequence. Experiments conducted on the CODE-15 dataset demonstrate that LGA-ECG outperforms state-of-the-art models and ablation studies validate the effectiveness of the local-global attention strategy. By capturing the hierarchical temporal dependencies and morphological patterns in ECG signals, this new design showcases its potential for clinical deployment with robust automated ECG classification.
Submitted 12 April, 2025;
originally announced April 2025.
-
Offshore Wind Turbine Tower Design and Optimization: A Review and AI-Driven Future Directions
Authors:
João Alves Ribeiro,
Bruno Alves Ribeiro,
Francisco Pimenta,
Sérgio M. O. Tavares,
Jie Zhang,
Faez Ahmed
Abstract:
Offshore wind energy leverages the high intensity and consistency of oceanic winds, playing a key role in the transition to renewable energy. As energy demands grow, larger turbines are required to optimize power generation and reduce the Levelized Cost of Energy (LCoE), which represents the average cost of electricity over a project's lifetime. However, upscaling turbines introduces engineering challenges, particularly in the design of supporting structures, especially towers. These towers must support increased loads while maintaining structural integrity, cost-efficiency, and transportability, making them essential to offshore wind projects' success. This paper presents a comprehensive review of the latest advancements, challenges, and future directions driven by Artificial Intelligence (AI) in the design optimization of Offshore Wind Turbine (OWT) structures, with a focus on towers. It provides an in-depth background on key areas such as design types, load types, analysis methods, design processes, monitoring systems, Digital Twin (DT), software, standards, reference turbines, economic factors, and optimization techniques. Additionally, it includes a state-of-the-art review of optimization studies related to tower design optimization, presenting a detailed examination of turbine, software, loads, optimization method, design variables and constraints, analysis, and findings, motivating future research to refine design approaches for effective turbine upscaling and improved efficiency. Lastly, the paper explores future directions where AI can revolutionize tower design optimization, enabling the development of efficient, scalable, and sustainable structures. By addressing the upscaling challenges and supporting the growth of renewable energy, this work contributes to shaping the future of offshore wind turbine towers and other supporting structures.
Submitted 28 December, 2024;
originally announced February 2025.
-
Explainable Brain Age Gap Prediction in Neurodegenerative Conditions using coVariance Neural Networks
Authors:
Saurabh Sihag,
Gonzalo Mateos,
Alejandro Ribeiro
Abstract:
Brain age is the estimate of biological age derived from neuroimaging datasets using machine learning algorithms. An increasing brain age gap, characterized by an elevated brain age relative to the chronological age, can reflect increased vulnerability to neurodegeneration and cognitive decline. Hence, brain age gap is a promising biomarker for monitoring brain health. However, black-box machine learning approaches to brain age gap prediction have limited practical utility. Recent studies on coVariance neural networks (VNN) have proposed a relatively transparent deep learning pipeline for neuroimaging data analyses, which possesses two key features: (i) inherent anatomical interpretability of derived biomarkers; and (ii) a methodologically interpretable perspective based on linkage with the eigenvectors of the anatomic covariance matrix. In this paper, we apply the VNN-based approach to study brain age gap using cortical thickness features for various prevalent neurodegenerative conditions. Our results reveal distinct anatomic patterns for brain age gap in Alzheimer's disease, frontotemporal dementia, and atypical Parkinsonian disorders. Furthermore, we demonstrate that the distinct anatomic patterns of brain age gap are linked with the differences in how VNN leverages the eigenspectrum of the anatomic covariance matrix, thus lending explainability to the reported results.
Submitted 2 January, 2025;
originally announced January 2025.
-
Convolutional Filtering with RKHS Algebras
Authors:
Alejandro Parada-Mayorga,
Leopoldo Agorio,
Alejandro Ribeiro,
Juan Bazerque
Abstract:
In this paper, we develop a generalized theory of convolutional signal processing and neural networks for Reproducing Kernel Hilbert Spaces (RKHS). Leveraging the theory of algebraic signal processing (ASP), we show that any RKHS allows the formal definition of multiple algebraic convolutional models. We show that any RKHS induces algebras whose elements determine convolutional operators acting on RKHS elements. This approach allows us to achieve scalable filtering and learning as a byproduct of the convolutional model, and simultaneously take advantage of the well-known benefits of processing information in an RKHS. To emphasize the generality and usefulness of our approach, we show how algebraic RKHS can be used to define convolutional signal models on groups, graphons, and traditional Euclidean signal spaces. Furthermore, using algebraic RKHS models, we build convolutional networks, formally defining the notion of pointwise nonlinearities and deriving explicit expressions for the training. Such derivations are obtained in terms of the algebraic representation of the RKHS. We present a set of numerical experiments on real data in which wireless coverage is predicted from measurements captured by unmanned aerial vehicles. This particular real-life scenario emphasizes the benefits of the convolutional RKHS models in neural networks compared to fully connected and standard convolutional operators.
Submitted 1 June, 2025; v1 submitted 2 November, 2024;
originally announced November 2024.
-
Generalizability of Graph Neural Networks for Decentralized Unlabeled Motion Planning
Authors:
Shreyas Muthusamy,
Damian Owerko,
Charilaos I. Kanatsoulis,
Saurav Agarwal,
Alejandro Ribeiro
Abstract:
Unlabeled motion planning involves assigning a set of robots to target locations while ensuring collision avoidance, aiming to minimize the total distance traveled. The problem forms an essential building block for multi-robot systems in applications such as exploration, surveillance, and transportation. We address this problem in a decentralized setting where each robot knows only the positions of its $k$-nearest robots and $k$-nearest targets. This scenario combines elements of combinatorial assignment and continuous-space motion planning, posing significant scalability challenges for traditional centralized approaches. To overcome these challenges, we propose a decentralized policy learned via a Graph Neural Network (GNN). The GNN enables robots to determine (1) what information to communicate to neighbors and (2) how to integrate received information with local observations for decision-making. We train the GNN using imitation learning with the centralized Hungarian algorithm as the expert policy, and further fine-tune it using reinforcement learning to avoid collisions and enhance performance. Extensive empirical evaluations demonstrate the scalability and effectiveness of our approach. The GNN policy trained on 100 robots generalizes to scenarios with up to 500 robots, outperforming state-of-the-art solutions by 8.6% on average and significantly surpassing greedy decentralized methods. This work lays the foundation for solving multi-robot coordination problems in settings where scalability is important.
Submitted 29 September, 2024;
originally announced September 2024.
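The centralized expert in the abstract above solves a minimum-total-distance assignment of robots to targets. The sketch below shows that objective on a three-robot example. It deliberately swaps the Hungarian algorithm for a brute-force search over permutations, which is only practical for a handful of robots, and it ignores collision avoidance. The function name `optimal_assignment` and the coordinates are hypothetical.

```python
import itertools
import math


def optimal_assignment(robots, targets):
    """Return the assignment minimizing total Euclidean distance.

    Brute-force stand-in for the Hungarian algorithm: tries every permutation,
    so the cost grows factorially with the number of robots.
    """
    best_cost, best_perm = math.inf, None
    for perm in itertools.permutations(range(len(targets))):
        # perm[i] is the index of the target assigned to robot i.
        cost = sum(math.dist(robots[i], targets[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm), best_cost


# Hypothetical positions in the plane.
robots = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
targets = [(4.0, 1.0), (0.0, 4.0), (1.0, 0.0)]
assignment, total = optimal_assignment(robots, targets)
```

Here each robot has a target at distance 1, so the optimal assignment pairs robot 0 with target 2, robot 1 with target 0, and robot 2 with target 1. In an imitation-learning setup, labels of this kind would serve as the supervision signal for the decentralized policy.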
-
Constrained Learning for Decentralized Multi-Objective Coverage Control
Authors:
Juan Cervino,
Saurav Agarwal,
Vijay Kumar,
Alejandro Ribeiro
Abstract:
The multi-objective coverage control problem requires a robot swarm to collaboratively provide sensor coverage to multiple heterogeneous importance density fields (IDFs) simultaneously. We pose this as an optimization problem with constraints and study two different formulations: (1) Fair coverage, where we minimize the maximum coverage cost for any field, promoting equitable resource distribution among all fields; and (2) Constrained coverage, where each field must be covered below a certain cost threshold, ensuring that critical areas receive adequate coverage according to predefined importance levels. We study the decentralized setting where robots have limited communication and local sensing capabilities, making the system more realistic, scalable, and robust. Given the complexity, we propose a novel decentralized constrained learning approach that combines primal-dual optimization with a Learnable Perception-Action-Communication (LPAC) neural network architecture. We show that the Lagrangian of the dual problem can be reformulated as a linear combination of the IDFs, enabling the LPAC policy to serve as a primal solver. We empirically demonstrate that the proposed method (i) significantly outperforms state-of-the-art decentralized controllers by 30% on average in terms of coverage cost, (ii) transfers well to larger environments with more robots, and (iii) is scalable in the number of IDFs and robots in the swarm.
Submitted 13 March, 2025; v1 submitted 17 September, 2024;
originally announced September 2024.
-
Generalization of Geometric Graph Neural Networks
Authors:
Zhiyang Wang,
Juan Cervino,
Alejandro Ribeiro
Abstract:
In this paper, we study the generalization capabilities of geometric graph neural networks (GNNs). We consider GNNs over a geometric graph constructed from a finite set of randomly sampled points over an embedded manifold with topological information captured. We prove a generalization gap between the optimal empirical risk and the optimal statistical risk of this GNN, which decreases with the number of sampled points from the manifold and increases with the dimension of the underlying manifold. This generalization gap ensures that the GNN trained on a graph on a set of sampled points can be utilized to process other unseen graphs constructed from the same underlying manifold. The most important observation is that the generalization capability can be realized with one large graph instead of being limited to the size of the graph as in previous results. The generalization gap is derived based on the non-asymptotic convergence result of a GNN on the sampled graph to the underlying manifold neural networks (MNNs). We verify this theoretical result with experiments on both the Arxiv and Cora datasets.
Submitted 8 September, 2024;
originally announced September 2024.
-
Generalization of Graph Neural Networks is Robust to Model Mismatch
Authors:
Zhiyang Wang,
Juan Cervino,
Alejandro Ribeiro
Abstract:
Graph neural networks (GNNs) have demonstrated their effectiveness in various tasks supported by their generalization capabilities. However, the current analysis of GNN generalization relies on the assumption that training and testing data are independent and identically distributed (i.i.d). This imposes limitations on the cases where a model mismatch exists when generating testing data. In this paper, we examine GNNs that operate on geometric graphs generated from manifold models, explicitly focusing on scenarios where there is a mismatch between manifold models generating training and testing data. Our analysis reveals the robustness of the GNN generalization in the presence of such model mismatch. This indicates that GNNs trained on graphs generated from a manifold can still generalize well to unseen nodes and graphs generated from a mismatched manifold. We attribute this mismatch to both node feature perturbations and edge perturbations within the generated graph. Our findings indicate that the generalization gap decreases as the number of nodes grows in the training graph while increasing with larger manifold dimension as well as larger mismatch. Importantly, we observe a trade-off between the generalization of GNNs and the capability to discriminate high-frequency components when facing a model mismatch. The most important practical consequence of this analysis is to shed light on the filter design of generalizable GNNs robust to model mismatch. We verify our theoretical findings with experiments on multiple real-world datasets.
Submitted 10 September, 2024; v1 submitted 25 August, 2024;
originally announced August 2024.
-
Distributed Training of Large Graph Neural Networks with Variable Communication Rates
Authors:
Juan Cervino,
Md Asadullah Turja,
Hesham Mostafa,
Nageen Himayat,
Alejandro Ribeiro
Abstract:
Training Graph Neural Networks (GNNs) on large graphs presents unique challenges due to the large memory and computing requirements. Distributed GNN training, where the graph is partitioned across multiple machines, is a common approach to training GNNs on large graphs. However, as the graph cannot generally be decomposed into small non-interacting components, data communication between the training machines quickly limits training speeds. Compressing the communicated node activations by a fixed amount improves the training speeds, but lowers the accuracy of the trained GNN. In this paper, we introduce a variable compression scheme for reducing the communication volume in distributed GNN training without compromising the accuracy of the learned model. Based on our theoretical analysis, we derive a variable compression method that converges to a solution equivalent to the full communication case, for all graph partitioning schemes. Our empirical results show that our method attains a comparable performance to the one obtained with full communication. We outperform full communication at any fixed compression ratio for any communication budget.
Submitted 25 June, 2024;
originally announced June 2024.
-
Learning to Slice Wi-Fi Networks: A State-Augmented Primal-Dual Approach
Authors:
Yiğit Berkay Uslu,
Roya Doostnejad,
Alejandro Ribeiro,
Navid NaderiAlizadeh
Abstract:
Network slicing is a key feature in 5G/NG cellular networks that creates customized slices for different service types with various quality-of-service (QoS) requirements, which can achieve service differentiation and guarantee service-level agreement (SLA) for each service type. In Wi-Fi networks, there is limited prior work on slicing, and a potential solution is based on a multi-tenant architecture on a single access point (AP) that dedicates different channels to different slices. In this paper, we define a flexible, constrained learning framework to enable slicing in Wi-Fi networks subject to QoS requirements. We specifically propose an unsupervised learning-based network slicing method that leverages a state-augmented primal-dual algorithm, where a neural network policy is trained offline to optimize a Lagrangian function and the dual variable dynamics are updated online in the execution phase. We show that state augmentation is crucial for generating slicing decisions that meet the ergodic QoS requirements.
Submitted 27 January, 2025; v1 submitted 9 May, 2024;
originally announced May 2024.
-
Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks
Authors:
Xingran Chen,
Navid NaderiAlizadeh,
Alejandro Ribeiro,
Shirin Saeedi Bidokhti
Abstract:
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a multi-hop wireless network with statistically identical agents. Agents cache the most recent samples from others and communicate over wireless collision channels governed by an underlying graph topology. Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies, considering both oblivious (where decision-making is independent of the physical processes) and non-oblivious policies (where decision-making depends on physical processes). We prove that in oblivious policies, minimizing estimation error is equivalent to minimizing the age of information. The complexity of the problem, especially the multi-dimensional action spaces and arbitrary network topologies, makes theoretical methods for finding optimal transmission policies intractable. We optimize the policies using a graphical multi-agent reinforcement learning framework, where each agent employs a permutation-equivariant graph neural network architecture. Theoretically, we prove that our proposed framework exhibits desirable transferability properties, allowing transmission policies trained on small- or moderate-size networks to be executed effectively on large-scale topologies. Numerical experiments demonstrate that (i) our proposed framework outperforms state-of-the-art baselines; (ii) the trained policies are transferable to larger networks, and their performance gains increase with the number of agents; (iii) the training procedure withstands non-stationarity even if we utilize independent learning techniques; and (iv) recurrence is pivotal in both independent learning and centralized training and decentralized execution, and improves the resilience to non-stationarity in independent learning.
Submitted 8 March, 2025; v1 submitted 4 April, 2024;
originally announced April 2024.
-
Near-Optimal Solutions of Constrained Learning Problems
Authors:
Juan Elenter,
Luiz F. O. Chamon,
Alejandro Ribeiro
Abstract:
With the widespread adoption of machine learning systems, the need to curtail their behavior has become increasingly apparent. This is evidenced by recent advancements towards developing models that satisfy robustness, safety, and fairness requirements. These requirements can be imposed (with generalization guarantees) by formulating constrained learning problems that can then be tackled by dual ascent algorithms. Yet, though these algorithms converge in objective value, even in non-convex settings, they cannot guarantee that their outcome is feasible. Doing so requires randomizing over all iterates, which is impractical in virtually any modern application. Still, final iterates have been observed to perform well in practice. In this work, we address this gap between theory and practice by characterizing the constraint violation of Lagrangian minimizers associated with optimal dual variables, despite lack of convexity. To do this, we leverage the fact that non-convex, finite-dimensional constrained learning problems can be seen as parametrizations of convex, functional problems. Our results show that rich parametrizations effectively mitigate the issue of feasibility in dual methods, shedding light on prior empirical successes of dual learning. We illustrate our findings in fair learning tasks.
Submitted 18 March, 2024;
originally announced March 2024.
-
Sampling and Uniqueness Sets in Graphon Signal Processing
Authors:
Alejandro Parada-Mayorga,
Alejandro Ribeiro
Abstract:
In this work, we study the properties of sampling sets on families of large graphs by leveraging the theory of graphons and graph limits. To this end, we extend to graphon signals the notion of removable and uniqueness sets, which was developed originally for the analysis of signals on graphs. We state the formal definition of a $Λ-$removable set and conditions under which a bandlimited graphon signal can be represented in a unique way when its samples are obtained from the complement of a given $Λ-$removable set in the graphon. By leveraging such results we show that graphon representations of graphs and graph signals can be used as a common framework to compare sampling sets between graphs with different numbers of nodes and edges, and different node labelings. Additionally, given a sequence of graphs that converges to a graphon, we show that the sequences of sampling sets whose graphon representation is identical in $[0,1]$ are convergent as well. We exploit the convergence results to provide an algorithm that obtains approximately close to optimal sampling sets. Performing a set of numerical experiments, we evaluate the quality of these sampling sets. Our results open the door for the efficient computation of optimal sampling sets in graphs of large size.
Submitted 1 June, 2025; v1 submitted 11 January, 2024;
originally announced January 2024.
-
Reply to 'Comments on Graphon Signal Processing' [arXiv:2310.14683]
Authors:
Luana Ruiz,
Luiz F. O. Chamon,
Alejandro Ribeiro
Abstract:
This technical note addresses an issue [arXiv:2310.14683] with the proof (but not the statement) of [arXiv:2003.05030, Proposition 4]. The statement of the proposition is correct, but the proof as written in [arXiv:2003.05030] is not and due to a typo in the manuscript, a reference to the correct proof is effectively missing. In the sequel, we present [arXiv:2003.05030, Proposition 4] and its proof. The proof follows from results in [2] that we reproduce here for clarity of exposition.
Since the statement of the proposition remains correct, no change in the results of [arXiv:2003.05030] is required. In particular, Lemma 3 and Lemma 4 showing spectral convergence of graphs to graphons, Theorem 1 showing convergence of the GFT to the WFT, and Theorems 3 and 4 showing convergence of graph to graphon filters, remain valid.
Submitted 5 January, 2024;
originally announced January 2024.
-
Resilient Constrained Reinforcement Learning
Authors:
Dongsheng Ding,
Zhengyan Huan,
Alejandro Ribeiro
Abstract:
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before training. It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward maximization objective and the constraint satisfaction, which is ubiquitous in constrained decision-making. To tackle this issue, we propose a new constrained RL approach that searches for policy and constraint specifications together. This method features the adaptation of relaxing the constraint according to a relaxation cost introduced in the learning objective. Since this feature mimics how ecological systems adapt to disruptions by altering operation, our approach is termed as resilient constrained RL. Specifically, we provide a set of sufficient conditions that balance the constraint satisfaction and the reward maximization in notion of resilient equilibrium, propose a tractable formulation of resilient constrained policy optimization that takes this equilibrium as an optimal solution, and advocate two resilient constrained policy search algorithms with non-asymptotic convergence guarantees on the optimality gap and constraint satisfaction. Furthermore, we demonstrate the merits and the effectiveness of our approach in computational experiments.
Submitted 29 December, 2023; v1 submitted 28 December, 2023;
originally announced December 2023.
-
Robust Stochastically-Descending Unrolled Networks
Authors:
Samar Hadou,
Navid NaderiAlizadeh,
Alejandro Ribeiro
Abstract:
Deep unrolling, or unfolding, is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network. However, the convergence guarantees and generalizability of the unrolled networks are still open theoretical problems. To tackle these problems, we provide deep unrolled architectures with a stochastic descent nature by imposing descending constraints during training. The descending constraints are forced layer by layer to ensure that each unrolled layer takes, on average, a descent step toward the optimum during training. We theoretically prove that the sequence constructed by the outputs of the unrolled layers is then guaranteed to converge for unseen problems, assuming no distribution shift between training and test problems. We also show that standard unrolling is brittle to perturbations, and our imposed constraints provide the unrolled networks with robustness to additive noise and perturbations. We numerically assess unrolled architectures trained under the proposed constraints in two different applications, including the sparse coding using learnable iterative shrinkage and thresholding algorithm (LISTA) and image inpainting using proximal generative flow (GLOW-Prox), and demonstrate the performance and robustness benefits of the proposed method.
Submitted 29 November, 2024; v1 submitted 25 December, 2023;
originally announced December 2023.
-
MEDPSeg: Hierarchical polymorphic multitask learning for the segmentation of ground-glass opacities, consolidation, and pulmonary structures on computed tomography
Authors:
Diedre S. Carmo,
Jean A. Ribeiro,
Alejandro P. Comellas,
Joseph M. Reinhardt,
Sarah E. Gerard,
Letícia Rittner,
Roberto A. Lotufo
Abstract:
The COVID-19 pandemic response highlighted the potential of deep learning methods in facilitating the diagnosis, prognosis and understanding of lung diseases through automated segmentation of pulmonary structures and lesions in chest computed tomography (CT). Automated separation of lung lesion into ground-glass opacity (GGO) and consolidation is hindered due to the labor-intensive and subjective nature of this task, resulting in scarce availability of ground truth for supervised learning. To tackle this problem, we propose MEDPSeg. MEDPSeg learns from heterogeneous chest CT targets through hierarchical polymorphic multitask learning (HPML). HPML explores the hierarchical nature of GGO and consolidation, lung lesions, and the lungs, with further benefits achieved through multitasking airway and pulmonary artery segmentation. Over 6000 volumetric CT scans from different partially labeled sources were used for training and testing. Experiments show PML enabling new state-of-the-art performance for GGO and consolidation segmentation tasks. In addition, MEDPSeg simultaneously performs segmentation of the lung parenchyma, airways, pulmonary artery, and lung lesions, all in a single forward prediction, with performance comparable to state-of-the-art methods specialized in each of those targets. Finally, we provide an open-source implementation with a graphical user interface at https://github.com/MICLab-Unicamp/medpseg.
Submitted 25 March, 2024; v1 submitted 4 December, 2023;
originally announced December 2023.
-
Learning State-Augmented Policies for Information Routing in Communication Networks
Authors:
Sourajit Das,
Navid NaderiAlizadeh,
Alejandro Ribeiro
Abstract:
This paper examines the problem of information routing in a large-scale communication network, which can be formulated as a constrained statistical learning problem having access to only local information. We delineate a novel State Augmentation (SA) strategy to maximize the aggregate information at source nodes using graph neural network (GNN) architectures, by deploying graph convolutions over the topological links of the communication network. The proposed technique leverages only the local information available at each node and efficiently routes desired information to the destination nodes. We leverage an unsupervised learning procedure to convert the output of the GNN architecture to optimal information routing strategies. In the experiments, we perform the evaluation on real-time network topologies to validate our algorithms. Numerical simulations depict the improved performance of the proposed method in training a GNN parameterization as compared to baseline algorithms.
Submitted 6 December, 2024; v1 submitted 30 September, 2023;
originally announced October 2023.
-
Primal Dual Continual Learning: Balancing Stability and Plasticity through Adaptive Memory Allocation
Authors:
Juan Elenter,
Navid NaderiAlizadeh,
Tara Javidi,
Alejandro Ribeiro
Abstract:
Continual learning is inherently a constrained learning problem. The goal is to learn a predictor under a no-forgetting requirement. Although several prior studies formulate it as such, they do not solve the constrained problem explicitly. In this work, we show that it is both possible and beneficial to undertake the constrained optimization problem directly. To do this, we leverage recent results in constrained learning through Lagrangian duality. We focus on memory-based methods, where a small subset of samples from previous tasks can be stored in a replay buffer. In this setting, we analyze two versions of the continual learning problem: a coarse approach with constraints at the task level and a fine approach with constraints at the sample level. We show that dual variables indicate the sensitivity of the optimal value of the continual learning problem with respect to constraint perturbations. We then leverage this result to partition the buffer in the coarse approach, allocating more resources to harder tasks, and to populate the buffer in the fine approach, including only impactful samples. We derive a deviation bound on dual variables as sensitivity indicators, and empirically corroborate this result in diverse continual learning benchmarks. We also discuss the limitations of these methods with respect to the amount of memory available and the expressiveness of the parametrization.
Submitted 31 May, 2024; v1 submitted 29 September, 2023;
originally announced October 2023.
-
Transferability of Convolutional Neural Networks in Stationary Learning Tasks
Authors:
Damian Owerko,
Charilaos I. Kanatsoulis,
Jennifer Bondarchuk,
Donald J. Bucci Jr,
Alejandro Ribeiro
Abstract:
Recent advances in hardware and big data acquisition have accelerated the development of deep learning techniques. For an extended period of time, increasing the model complexity has led to performance improvements for various tasks. However, this trend is becoming unsustainable and there is a need for alternative, computationally lighter methods. In this paper, we introduce a novel framework for efficient training of convolutional neural networks (CNNs) for large-scale spatial problems. To accomplish this we investigate the properties of CNNs for tasks where the underlying signals are stationary. We show that a CNN trained on small windows of such signals achieves nearly the same performance on much larger windows without retraining. This claim is supported by our theoretical analysis, which provides a bound on the performance degradation. Additionally, we conduct thorough experimental analysis on two tasks: multi-target tracking and mobile infrastructure on demand. Our results show that the CNN is able to tackle problems with many hundreds of agents after being trained with fewer than ten. Thus, CNN architectures provide solutions to these problems at previously computationally intractable scales.
Submitted 21 July, 2023;
originally announced July 2023.
-
Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs
Authors:
Dongsheng Ding,
Chen-Yu Wei,
Kaiqing Zhang,
Alejandro Ribeiro
Abstract:
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP). Despite the popularity of Lagrangian-based policy search methods used in practice, the oscillation of policy iterates in these methods has not been fully understood, bringing out issues such as violation of constraints and sensitivity to hyper-parameters. To fill this gap, we employ the Lagrangian method to cast a constrained MDP into a constrained saddle-point problem in which max/min players correspond to primal/dual variables, respectively, and develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy. Specifically, we first propose a regularized policy gradient primal-dual (RPG-PD) method that updates the policy using an entropy-regularized policy gradient, and the dual variable via a quadratic-regularized gradient ascent, simultaneously. We prove that the policy primal-dual iterates of RPG-PD converge to a regularized saddle point with a sublinear rate, while the policy iterates converge sublinearly to an optimal constrained policy. We further instantiate RPG-PD in large state or action spaces by including function approximation in policy parametrization, and establish similar sublinear last-iterate policy convergence. Second, we propose an optimistic policy gradient primal-dual (OPG-PD) method that employs the optimistic gradient method to update primal/dual variables, simultaneously. We prove that the policy primal-dual iterates of OPG-PD converge to a saddle point that contains an optimal constrained policy, with a linear rate. To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
Submitted 16 January, 2024; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Solving Large-scale Spatial Problems with Convolutional Neural Networks
Authors:
Damian Owerko,
Charilaos I. Kanatsoulis,
Alejandro Ribeiro
Abstract:
Over the past decade, deep learning research has been accelerated by increasingly powerful hardware, which facilitated rapid growth in the model complexity and the amount of data ingested. This is becoming unsustainable and therefore refocusing on efficiency is necessary. In this paper, we employ transfer learning to improve training efficiency for large-scale spatial problems. We propose that a convolutional neural network (CNN) can be trained on small windows of signals, but evaluated on arbitrarily large signals with little to no performance degradation, and provide a theoretical bound on the resulting generalization error. Our proof leverages shift-equivariance of CNNs, a property that is underexploited in transfer learning. The theoretical results are experimentally supported in the context of mobile infrastructure on demand (MID). The proposed approach is able to tackle MID at large scales with hundreds of agents, which was computationally intractable prior to this work.
Submitted 7 February, 2024; v1 submitted 13 June, 2023;
originally announced June 2023.
-
Geometric Graph Filters and Neural Networks: Limit Properties and Discriminability Trade-offs
Authors:
Zhiyang Wang,
Luana Ruiz,
Alejandro Ribeiro
Abstract:
This paper studies the relationship between a graph neural network (GNN) and a manifold neural network (MNN) when the graph is constructed from a set of points sampled from the manifold, thus encoding geometric information. We consider convolutional MNNs and GNNs where the manifold and the graph convolutions are respectively defined in terms of the Laplace-Beltrami operator and the graph Laplacian. Using the appropriate kernels, we analyze both dense and moderately sparse graphs. We prove non-asymptotic error bounds showing that convolutional filters and neural networks on these graphs converge to convolutional filters and neural networks on the continuous manifold. As a byproduct of this analysis, we observe an important trade-off between the discriminability of graph filters and their ability to approximate the desired behavior of manifold filters. We then discuss how this trade-off is ameliorated in neural networks due to the frequency mixing property of nonlinearities. We further derive a transferability corollary for geometric graphs sampled from the same manifold. We validate our results numerically on a navigation control problem and a point cloud classification task.
Submitted 27 June, 2023; v1 submitted 29 May, 2023;
originally announced May 2023.
-
Stochastic Unrolled Federated Learning
Authors:
Samar Hadou,
Navid NaderiAlizadeh,
Alejandro Ribeiro
Abstract:
Algorithm unrolling has emerged as a learning-based optimization paradigm that unfolds truncated iterative algorithms in trainable neural-network optimizers. We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning in order to expedite its convergence. Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers to find a descent direction and the decentralized nature of federated learning. We circumvent the former challenge by feeding stochastic mini-batches to each unrolled layer and imposing descent constraints to guarantee its convergence. We address the latter challenge by unfolding the distributed gradient descent (DGD) algorithm in a graph neural network (GNN)-based unrolled architecture, which preserves the decentralized nature of training in federated learning. We theoretically prove that our proposed unrolled optimizer converges to a near-optimal region infinitely often. Through extensive numerical experiments, we also demonstrate the effectiveness of the proposed framework in collaborative training of image classifiers.
Submitted 6 February, 2024; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Lie Group Algebra Convolutional Filters
Authors:
Harshat Kumar,
Alejandro Parada-Mayorga,
Alejandro Ribeiro
Abstract:
In this paper we propose a framework to leverage Lie group symmetries on arbitrary spaces exploiting \textit{algebraic signal processing} (ASP). We show that traditional group convolutions are one particular instantiation of a more general Lie group algebra homomorphism associated to an algebraic signal model rooted in the Lie group algebra $L^{1}(G)$ for given Lie group $G$. Exploiting this fact, we decouple the discretization of the Lie group convolution elucidating two separate sampling instances: the filter and the signal. To discretize the filters, we exploit the exponential map that links a Lie group with its associated Lie algebra. We show that the discrete Lie group filter learned from the data determines a unique filter in $L^{1}(G)$, and we show how this uniqueness of representation is defined by the bandwidth of the filter given a spectral representation. We also derive error bounds for the approximations of the filters in $L^{1}(G)$ with respect to its learned discrete representations. The proposed framework allows the processing of signals on spaces of arbitrary dimension and where the actions of some elements of the group are not necessarily well defined. Finally, we show that multigraph convolutional signal models come as the natural discrete realization of Lie group signal processing models, and we use this connection to establish stability results for Lie group algebra filters. To evaluate numerically our results, we build neural networks with these filters and we apply them in multiple datasets, including a knot classification problem.
Submitted 26 January, 2024; v1 submitted 7 May, 2023;
originally announced May 2023.
-
Tangent Bundle Convolutional Learning: from Manifolds to Cellular Sheaves and Back
Authors:
Claudio Battiloro,
Zhiyang Wang,
Hans Riess,
Paolo Di Lorenzo,
Alejandro Ribeiro
Abstract:
In this work we introduce a convolution operation over the tangent bundle of Riemann manifolds in terms of exponentials of the Connection Laplacian operator. We define tangent bundle filters and tangent bundle neural networks (TNNs) based on this convolution operation, which are novel continuous architectures operating on tangent bundle signals, i.e. vector fields over the manifolds. Tangent bundle filters admit a spectral representation that generalizes the ones of scalar manifold filters, graph filters and standard convolutional filters in continuous time. We then introduce a discretization procedure, both in the space and time domains, to make TNNs implementable, showing that their discrete counterpart is a novel principled variant of the very recently introduced sheaf neural networks. We formally prove that this discretized architecture converges to the underlying continuous TNN. Finally, we numerically evaluate the effectiveness of the proposed architecture on various learning tasks, both on synthetic and real data.
Submitted 15 March, 2024; v1 submitted 20 March, 2023;
originally announced March 2023.
-
Invertible Kernel PCA with Random Fourier Features
Authors:
Daniel Gedon,
Antônio H. Ribeiro,
Niklas Wahlström,
Thomas B. Schön
Abstract:
Kernel principal component analysis (kPCA) is a widely studied method to construct a low-dimensional data representation after a nonlinear transformation. The prevailing method to reconstruct the original input signal from kPCA -- an important task for denoising -- requires us to solve a supervised learning problem. In this paper, we present an alternative method where the reconstruction follows naturally from the compression step. We first approximate the kernel with random Fourier features. Then, we exploit the fact that the nonlinear transformation is invertible in a certain subdomain. Hence, the name \emph{invertible kernel PCA (ikPCA)}. We experiment with different data modalities and show that ikPCA performs similarly to kPCA with supervised reconstruction on denoising tasks, making it a strong alternative.
Submitted 9 March, 2023;
originally announced March 2023.
-
Deep networks for system identification: a Survey
Authors:
Gianluigi Pillonetto,
Aleksandr Aravkin,
Daniel Gedon,
Lennart Ljung,
Antônio H. Ribeiro,
Thomas B. Schön
Abstract:
Deep learning is a topic of considerable current interest. The availability of massive data collections and powerful software resources has led to an impressive amount of results in many application areas that reveal essential but hidden properties of the observations. System identification learns mathematical descriptions of dynamic systems from input-output data and can thus benefit from the advances of deep neural networks to enrich the possible range of models to choose from. For this reason, we provide a survey of deep learning from a system identification perspective. We cover a wide spectrum of topics to enable researchers to understand the methods, providing rigorous practical and theoretical insights into the benefits and challenges of using them. The main aim of the identified model is to predict new data from previous observations. This can be achieved with different deep learning based modelling techniques and we discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks. Their parameters have to be estimated from past data trying to optimize the prediction performance. For this purpose, we discuss a specific set of first-order optimization tools that has emerged as efficient. The survey then draws connections to the well-studied area of kernel-based methods. They control the data fit by regularization terms that penalize models not in line with prior assumptions. We illustrate how to cast them in deep architectures to obtain deep kernel-based methods. The success of deep learning also resulted in surprising empirical observations, like the counter-intuitive behaviour of models with many parameters. We discuss the role of overparameterized models, including their connection to kernels, as well as implicit regularization mechanisms which affect generalization, specifically the interesting phenomena of benign overfitting ...
Submitted 30 January, 2023;
originally announced January 2023.
-
ECG-Based Electrolyte Prediction: Evaluating Regression and Probabilistic Methods
Authors:
Philipp Von Bachmann,
Daniel Gedon,
Fredrik K. Gustafsson,
Antônio H. Ribeiro,
Erik Lampa,
Stefan Gustafsson,
Johan Sundström,
Thomas B. Schön
Abstract:
Objective: Imbalances of the electrolyte concentration levels in the body can lead to catastrophic consequences, but accurate and accessible measurements could improve patient outcomes. While blood tests provide accurate measurements, they are invasive and the laboratory analysis can be slow or inaccessible. In contrast, an electrocardiogram (ECG) is a widely adopted tool which is quick and simple to acquire. However, the problem of estimating continuous electrolyte concentrations directly from ECGs is not well-studied. We therefore investigate if regression methods can be used for accurate ECG-based prediction of electrolyte concentrations. Methods: We explore the use of deep neural networks (DNNs) for this task. We analyze the regression performance across four electrolytes, utilizing a novel dataset containing over 290000 ECGs. For improved understanding, we also study the full spectrum from continuous predictions to binary classification of extreme concentration levels. To enhance clinical usefulness, we finally extend to a probabilistic regression approach and evaluate different uncertainty estimates. Results: We find that the performance varies significantly between different electrolytes, which is clinically justified in the interplay of electrolytes and their manifestation in the ECG. We also compare the regression accuracy with that of traditional machine learning models, demonstrating superior performance of DNNs. Conclusion: Discretization can lead to good classification performance, but does not help solve the original problem of predicting continuous concentration levels. While probabilistic regression demonstrates potential practical usefulness, the uncertainty estimates are not particularly well-calibrated. Significance: Our study is a first step towards accurate and reliable ECG-based prediction of electrolyte concentration levels.
Submitted 21 December, 2022;
originally announced December 2022.
-
Convolutional Filtering on Sampled Manifolds
Authors:
Zhiyang Wang,
Luana Ruiz,
Alejandro Ribeiro
Abstract:
The increasing availability of geometric data has motivated the need for information processing over non-Euclidean domains modeled as manifolds. The building block for information processing architectures with desirable theoretical properties such as invariance and stability is convolutional filtering. Manifold convolutional filters are defined from the manifold diffusion sequence, constructed by successive applications of the Laplace-Beltrami operator to manifold signals. However, the continuous manifold model can only be accessed by sampling discrete points and building an approximate graph model from the sampled manifold. Effective linear information processing on the manifold requires quantifying the error incurred when approximating manifold convolutions with graph convolutions. In this paper, we derive a non-asymptotic error bound for this approximation, showing that convolutional filtering on the sampled manifold converges to continuous manifold filtering. Our findings are further demonstrated empirically on a problem of navigation control.
Submitted 20 November, 2022;
originally announced November 2022.
-
Algebraic Convolutional Filters on Lie Group Algebras
Authors:
Harshat Kumar,
Alejandro Parada-Mayorga,
Alejandro Ribeiro
Abstract:
Group convolutional neural networks are a useful tool for utilizing symmetries known to be in a signal; however, they require that the signal is defined on the group itself. Existing approaches either work directly with group signals, or they impose a lifting step with heuristics to compute the convolution which can be computationally costly. Taking an algebraic signal processing perspective, we propose a novel convolutional filter from the Lie group algebra directly, thereby removing the need to lift altogether. Furthermore, we establish stability of the filter by drawing connections to multigraph signal processing. The proposed filter is evaluated on a classification problem on two datasets with $SO(3)$ group symmetries.
Submitted 31 October, 2022;
originally announced October 2022.
-
A State-Augmented Approach for Learning Optimal Resource Management Decisions in Wireless Networks
Authors:
Yiğit Berkay Uslu,
Navid NaderiAlizadeh,
Mark Eisen,
Alejandro Ribeiro
Abstract:
We consider a radio resource management (RRM) problem in a multi-user wireless network, where the goal is to optimize a network-wide utility function subject to constraints on the ergodic average performance of users. We propose a state-augmented parameterization for the RRM policy, where alongside the instantaneous network states, the RRM policy takes as input the set of dual variables corresponding to the constraints. We provide theoretical justification for the feasibility and near-optimality of the RRM decisions generated by the proposed state-augmented algorithm. Focusing on the power allocation problem with RRM policies parameterized by a graph neural network (GNN) and dual variables sampled from the dual descent dynamics, we numerically demonstrate that the proposed approach achieves a superior trade-off between mean, minimum, and 5th percentile rates than baseline methods.
Submitted 8 November, 2022; v1 submitted 28 October, 2022;
originally announced October 2022.
-
Learning with Multigraph Convolutional Filters
Authors:
Landon Butler,
Alejandro Parada-Mayorga,
Alejandro Ribeiro
Abstract:
In this paper, we introduce a convolutional architecture to perform learning when information is supported on multigraphs. Exploiting algebraic signal processing (ASP), we propose a convolutional signal processing model on multigraphs (MSP). Then, we introduce multigraph convolutional neural networks (MGNNs) as stacked and layered structures where information is processed according to an MSP model. We also develop a procedure for tractable computation of filter coefficients in the MGNN and a low cost method to reduce the dimensionality of the information transferred between layers. We conclude by comparing the performance of MGNNs against other learning architectures on an optimal resource allocation task for multi-channel communication systems.
Submitted 28 October, 2022;
originally announced October 2022.
-
Space-Time Graph Neural Networks with Stochastic Graph Perturbations
Authors:
Samar Hadou,
Charilaos Kanatsoulis,
Alejandro Ribeiro
Abstract:
Space-time graph neural networks (ST-GNNs) are recently developed architectures that learn efficient graph representations of time-varying data. ST-GNNs are particularly useful in multi-agent systems, due to their stability properties and their ability to respect communication delays between the agents. In this paper we revisit the stability properties of ST-GNNs and prove that they are stable to stochastic graph perturbations. Our analysis suggests that ST-GNNs are suitable for transfer learning on time-varying graphs and enables the design of generalized convolutional architectures that jointly process time-varying graphs and time-varying signals. Numerical experiments on decentralized control systems validate our theoretical results and showcase the benefits of traditional and generalized ST-GNN architectures.
Submitted 28 October, 2022;
originally announced October 2022.
-
Multi-Target Tracking with Transferable Convolutional Neural Networks
Authors:
Damian Owerko,
Charilaos I. Kanatsoulis,
Jennifer Bondarchuk,
Donald J. Bucci Jr,
Alejandro Ribeiro
Abstract:
Multi-target tracking (MTT) is a classical signal processing task, where the goal is to estimate the states of an unknown number of moving targets from noisy sensor measurements. In this paper, we revisit MTT from a deep learning perspective and propose a convolutional neural network (CNN) architecture to tackle it. We represent the target states and sensor measurements as images and recast the problem as an image-to-image prediction task. Then we train a fully convolutional model at small tracking areas and transfer it to much larger areas with numerous targets and sensors. This transfer learning approach enables MTT at a large scale and is also theoretically supported by our novel analysis that bounds the generalization error. In practice, the proposed transferable CNN architecture outperforms random finite set filters on the MTT task with 10 targets and transfers without re-training to a larger MTT task with 250 targets with a 29% performance improvement.
Submitted 25 July, 2023; v1 submitted 27 October, 2022;
originally announced October 2022.
-
Tangent Bundle Filters and Neural Networks: from Manifolds to Cellular Sheaves and Back
Authors:
Claudio Battiloro,
Zhiyang Wang,
Hans Riess,
Paolo Di Lorenzo,
Alejandro Ribeiro
Abstract:
In this work we introduce a convolution operation over the tangent bundle of Riemannian manifolds exploiting the Connection Laplacian operator. We use the convolution to define tangent bundle filters and tangent bundle neural networks (TNNs), novel continuous architectures operating on tangent bundle signals, i.e. vector fields over manifolds. We discretize TNNs both in space and time domains, showing that their discrete counterpart is a principled variant of the recently introduced Sheaf Neural Networks. We formally prove that this discrete architecture converges to the underlying continuous TNN. We numerically evaluate the effectiveness of the proposed architecture on a denoising task of a tangent vector field over the unit 2-sphere.
Submitted 18 November, 2022; v1 submitted 26 October, 2022;
originally announced October 2022.
-
Unsupervised Optimal Power Flow Using Graph Neural Networks
Authors:
Damian Owerko,
Fernando Gama,
Alejandro Ribeiro
Abstract:
Optimal power flow (OPF) is a critical optimization problem that allocates power to the generators in order to satisfy the demand at a minimum cost. Solving this problem exactly is computationally infeasible in the general case. In this work, we propose to leverage graph signal processing and machine learning. More specifically, we use a graph neural network to learn a nonlinear parametrization between the power demanded and the corresponding allocation. We learn the solution in an unsupervised manner, minimizing the cost directly. In order to take into account the electrical constraints of the grid, we propose a novel barrier method that is differentiable and works on initially infeasible points. We show through simulations that the use of GNNs in this unsupervised learning context leads to solutions comparable to standard solvers while being computationally efficient and avoiding constraint violations most of the time.
Submitted 17 October, 2022;
originally announced October 2022.
-
Convolutional Neural Networks on Manifolds: From Graphs and Back
Authors:
Zhiyang Wang,
Luana Ruiz,
Alejandro Ribeiro
Abstract:
Geometric deep learning has gained much attention in recent years due to more available data acquired from non-Euclidean domains. Some examples include point clouds for 3D models and wireless sensor networks in communications. Graphs are common models to connect these discrete data points and capture the underlying geometric structure. Given the large amount of such geometric data, graphs of arbitrarily large size tend to converge to a limit model -- the manifold. Deep neural network architectures have proven to be a powerful technique for solving problems based on data residing on the manifold. In this paper, we propose a manifold neural network (MNN) composed of a bank of manifold convolutional filters and point-wise nonlinearities. We define a manifold convolution operation which is consistent with the discrete graph convolution by discretizing in both space and time domains. To sum up, we focus on the manifold model as the limit of large graphs and construct MNNs, while we can still recover graph neural networks by discretizing MNNs. We carry out experiments on a point-cloud dataset to showcase the performance of our proposed MNNs.
Submitted 1 October, 2022;
originally announced October 2022.
-
Learning Globally Smooth Functions on Manifolds
Authors:
Juan Cervino,
Luiz F. O. Chamon,
Benjamin D. Haeffele,
Rene Vidal,
Alejandro Ribeiro
Abstract:
Smoothness and low dimensional structures play central roles in improving generalization and stability in learning and statistics. This work combines techniques from semi-infinite constrained learning and manifold regularization to learn representations that are globally smooth on a manifold. To do so, it shows that under typical conditions the problem of learning a Lipschitz continuous function on a manifold is equivalent to a dynamically weighted manifold regularization problem. This observation leads to a practical algorithm based on a weighted Laplacian penalty whose weights are adapted using stochastic gradient techniques. It is shown that under mild conditions, this method estimates the Lipschitz constant of the solution, learning a globally smooth solution as a byproduct. Experiments on real world data illustrate the advantages of the proposed method relative to existing alternatives.
Submitted 1 February, 2023; v1 submitted 1 October, 2022;
originally announced October 2022.
-
Federated Representation Learning via Maximal Coding Rate Reduction
Authors:
Juan Cervino,
Navid NaderiAlizadeh,
Alejandro Ribeiro
Abstract:
We propose a federated methodology to learn low-dimensional representations from a dataset that is distributed among several clients. In particular, we move away from the commonly-used cross-entropy loss in federated learning, and seek to learn shared low-dimensional representations of the data in a decentralized manner via the principle of maximal coding rate reduction (MCR2). Our proposed method, which we refer to as FLOW, utilizes MCR2 as the objective of choice, hence resulting in representations that are both between-class discriminative and within-class compressible. We theoretically show that our distributed algorithm achieves a first-order stationary point. Moreover, we demonstrate, via numerical experiments, the utility of the learned low-dimensional representations.
Submitted 1 October, 2022;
originally announced October 2022.
-
Convolutional Learning on Multigraphs
Authors:
Landon Butler,
Alejandro Parada-Mayorga,
Alejandro Ribeiro
Abstract:
Graph convolutional learning has led to many exciting discoveries in diverse areas. However, in some applications, traditional graphs are insufficient to capture the structure and intricacies of the data. In such scenarios, multigraphs arise naturally as discrete structures in which complex dynamics can be embedded. In this paper, we develop convolutional information processing on multigraphs and introduce convolutional multigraph neural networks (MGNNs). To capture the complex dynamics of information diffusion within and across each of the multigraph's classes of edges, we formalize a convolutional signal processing model, defining the notions of signals, filtering, and frequency representations on multigraphs. Leveraging this model, we develop a multigraph learning architecture, including a sampling procedure to reduce computational complexity. The introduced architecture is applied towards optimal wireless resource allocation and a hate speech localization task, offering improved performance over traditional graph neural networks.
Submitted 8 February, 2023; v1 submitted 22 September, 2022;
originally announced September 2022.
-
On Merging Feature Engineering and Deep Learning for Diagnosis, Risk-Prediction and Age Estimation Based on the 12-Lead ECG
Authors:
Eran Zvuloni,
Jesse Read,
Antônio H. Ribeiro,
Antonio Luiz P. Ribeiro,
Joachim A. Behar
Abstract:
Objective: Machine learning techniques have been used extensively for 12-lead electrocardiogram (ECG) analysis. For physiological time series, deep learning (DL) superiority to feature engineering (FE) approaches based on domain knowledge is still an open question. Moreover, it remains unclear whether combining DL with FE may improve performance. Methods: We considered three tasks intending to address these research gaps: cardiac arrhythmia diagnosis (multiclass-multilabel classification), atrial fibrillation risk prediction (binary classification), and age estimation (regression). We used an overall dataset of 2.3M 12-lead ECG recordings to train the following models for each task: i) a random forest taking the FE as input, trained as a classical machine learning approach; ii) an end-to-end DL model; and iii) a merged model of FE+DL. Results: FE yielded comparable results to DL while necessitating significantly less data for the two classification tasks, and it was outperformed by DL for the regression task. For all tasks, merging FE with DL did not improve performance over DL alone. Conclusion: We found that for traditional 12-lead ECG based diagnosis tasks DL did not yield a meaningful improvement over FE, while it significantly improved the nontraditional regression task. We also found that combining FE with DL did not improve over DL alone, which suggests that the FE were redundant with the features learned by DL. Significance: Our findings provide important recommendations on what machine learning strategy and data regime to choose with respect to the task at hand for the development of new machine learning models based on the 12-lead ECG.
Submitted 16 July, 2022; v1 submitted 13 July, 2022;
originally announced July 2022.
-
State-Augmented Learnable Algorithms for Resource Management in Wireless Networks
Authors:
Navid NaderiAlizadeh,
Mark Eisen,
Alejandro Ribeiro
Abstract:
We consider resource management problems in multi-user wireless networks, which can be cast as optimizing a network-wide utility function, subject to constraints on the long-term average performance of users across the network. We propose a state-augmented algorithm for solving the aforementioned radio resource management (RRM) problems, where, alongside the instantaneous network state, the RRM policy takes as input the set of dual variables corresponding to the constraints, which evolve depending on how much the constraints are violated during execution. We theoretically show that the proposed state-augmented algorithm leads to feasible and near-optimal RRM decisions. Moreover, focusing on the problem of wireless power control using graph neural network (GNN) parameterizations, we demonstrate the superiority of the proposed RRM algorithm over baseline methods across a suite of numerical experiments.
Submitted 11 December, 2022; v1 submitted 5 July, 2022;
originally announced July 2022.
-
Surprises in adversarially-trained linear regression
Authors:
Antônio H. Ribeiro,
Dave Zachariah,
Thomas B. Schön
Abstract:
State-of-the-art machine learning models can be vulnerable to very small input perturbations that are adversarially constructed. Adversarial training is an effective approach to defend against such examples. It is formulated as a min-max problem, searching for the best solution when the training data was corrupted by the worst-case attacks. For linear regression problems, adversarial training can be formulated as a convex problem. We use this reformulation to make two technical contributions: First, we formulate the training problem as an instance of robust regression to reveal its connection to parameter-shrinking methods, specifically that $\ell_\infty$-adversarial training produces sparse solutions. Secondly, we study adversarial training in the overparameterized regime, i.e. when there are more parameters than data. We prove that adversarial training with small disturbances gives the solution with the minimum-norm that interpolates the training data. Ridge regression and lasso approximate such interpolating solutions as their regularization parameter vanishes. By contrast, for adversarial training, the transition into the interpolation regime is abrupt and for non-zero values of disturbance. This result is proved and illustrated with numerical examples.
Submitted 20 October, 2022; v1 submitted 25 May, 2022;
originally announced May 2022.
-
Representation Power of Graph Neural Networks: Improved Expressivity via Algebraic Analysis
Authors:
Charilaos I. Kanatsoulis,
Alejandro Ribeiro
Abstract:
Despite the remarkable success of Graph Neural Networks (GNNs), the common belief is that their representation power is limited and that they are at most as expressive as the Weisfeiler-Lehman (WL) algorithm. In this paper, we argue the opposite and show that standard GNNs, with anonymous inputs, produce more discriminative representations than the WL algorithm. Our novel analysis employs linear algebraic tools and characterizes the representation power of GNNs with respect to the eigenvalue decomposition of the graph operators. We prove that GNNs are able to generate distinctive outputs from white uninformative inputs for at least all graphs that have different eigenvalues. We also show that simple convolutional architectures with white inputs produce equivariant features that count the closed paths in the graph and are provably more expressive than the WL representations. Thorough experimental analysis on graph isomorphism and graph classification datasets corroborates our theoretical results and demonstrates the effectiveness of the proposed approach.
Submitted 21 July, 2023; v1 submitted 19 May, 2022;
originally announced May 2022.
-
Learning Graph Structure from Convolutional Mixtures
Authors:
Max Wasserman,
Saurabh Sihag,
Gonzalo Mateos,
Alejandro Ribeiro
Abstract:
Machine learning frameworks such as graph neural networks typically rely on a given, fixed graph to exploit relational inductive biases and thus effectively learn from network data. However, when said graphs are (partially) unobserved, noisy, or dynamic, the problem of inferring graph structure from data becomes relevant. In this paper, we postulate a graph convolutional relationship between the observed and latent graphs, and formulate the graph learning task as a network inverse (deconvolution) problem. In lieu of eigendecomposition-based spectral methods or iterative optimization solutions, we unroll and truncate proximal gradient iterations to arrive at a parameterized neural network architecture that we call a Graph Deconvolution Network (GDN). GDNs can learn a distribution of graphs in a supervised fashion, perform link prediction or edge-weight regression tasks by adapting the loss function, and they are inherently inductive. We corroborate GDN's superior graph recovery performance and its generalization to larger graphs using synthetic data in supervised settings. Furthermore, we demonstrate the robustness and representation power of GDNs on real world neuroimaging and social network datasets.
Submitted 19 May, 2022;
originally announced May 2022.