Search | arXiv e-print repository

Monitoring snow avalanches from SAR data with deep learning

Authors: Filippo Maria Bianchi, Jakob Grahn

Abstract: Snow avalanches present significant risks to human life and infrastructure, particularly in mountainous regions, making effective monitoring crucial. Traditional monitoring methods, such as field observations, are limited by accessibility, weather conditions, and cost. Satellite-borne Synthetic Aperture Radar (SAR) data has become an important tool for large-scale avalanche detection, as it can ca… ▽ More Snow avalanches present significant risks to human life and infrastructure, particularly in mountainous regions, making effective monitoring crucial. Traditional monitoring methods, such as field observations, are limited by accessibility, weather conditions, and cost. Satellite-borne Synthetic Aperture Radar (SAR) data has become an important tool for large-scale avalanche detection, as it can capture data in all weather conditions and across remote areas. However, traditional processing methods struggle with the complexity and variability of avalanches. This chapter reviews the application of deep learning for detecting and segmenting snow avalanches from SAR data. Early efforts focused on the binary classification of SAR images, while recent advances have enabled pixel-level segmentation, providing greater accuracy and spatial resolution. A case study using Sentinel-1 SAR data demonstrates the effectiveness of deep learning models for avalanche segmentation, achieving superior results over traditional methods. We also present an extension of this work, testing recent state-of-the-art segmentation architectures on an expanded dataset of over 4,500 annotated SAR images. The best-performing model among those tested was applied for large-scale avalanche detection across the whole of Norway, revealing important spatial and temporal patterns over several winter seasons. △ Less

Submitted 25 February, 2025; originally announced February 2025.

arXiv:2502.09443 [pdf, ps, other]

Relational Conformal Prediction for Correlated Time Series

Authors: Andrea Cini, Alexander Jenkins, Danilo Mandic, Cesare Alippi, Filippo Maria Bianchi

Abstract: We address the problem of uncertainty quantification in time series forecasting by exploiting observations at correlated sequences. Relational deep learning methods leveraging graph representations are among the most effective tools for obtaining point estimates from spatiotemporal data and correlated time series. However, the problem of exploiting relational structures to estimate the uncertainty… ▽ More We address the problem of uncertainty quantification in time series forecasting by exploiting observations at correlated sequences. Relational deep learning methods leveraging graph representations are among the most effective tools for obtaining point estimates from spatiotemporal data and correlated time series. However, the problem of exploiting relational structures to estimate the uncertainty of such predictions has been largely overlooked in the same context. To this end, we propose a novel distribution-free approach based on the conformal prediction framework and quantile regression. Despite the recent applications of conformal prediction to sequential data, existing methods operate independently on each target time series and do not account for relationships among them when constructing the prediction interval. We fill this void by introducing a novel conformal prediction method based on graph deep learning operators. Our approach, named Conformal Relational Prediction (CoRel), does not require the relational structure (graph) to be known a priori and can be applied on top of any pre-trained predictor. Additionally, CoRel includes an adaptive component to handle non-exchangeable data and changes in the input time series. Our approach provides accurate coverage and achieves state-of-the-art uncertainty quantification in relevant benchmarks. △ Less

Submitted 5 June, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

Comments: ICML 2025

arXiv:2501.09821 [pdf, other]

BN-Pool: a Bayesian Nonparametric Approach to Graph Pooling

Authors: Daniele Castellana, Filippo Maria Bianchi

Abstract: We introduce BN-Pool, the first clustering-based pooling method for Graph Neural Networks (GNNs) that adaptively determines the number of supernodes in a coarsened graph. By leveraging a Bayesian non-parametric framework, BN-Pool employs a generative model capable of partitioning graph nodes into an unbounded number of clusters. During training, we learn the node-to-cluster assignments by combinin… ▽ More We introduce BN-Pool, the first clustering-based pooling method for Graph Neural Networks (GNNs) that adaptively determines the number of supernodes in a coarsened graph. By leveraging a Bayesian non-parametric framework, BN-Pool employs a generative model capable of partitioning graph nodes into an unbounded number of clusters. During training, we learn the node-to-cluster assignments by combining the supervised loss of the downstream task with an unsupervised auxiliary term, which encourages the reconstruction of the original graph topology while penalizing unnecessary proliferation of clusters. This adaptive strategy allows BN-Pool to automatically discover an optimal coarsening level, offering enhanced flexibility and removing the need to specify sensitive pooling ratios. We show that BN-Pool achieves superior performance across diverse benchmarks. △ Less

Submitted 16 January, 2025; originally announced January 2025.

arXiv:2410.13469 [pdf, other]

Interpreting Temporal Graph Neural Networks with Koopman Theory

Authors: Michele Guerra, Simone Scardapane, Filippo Maria Bianchi

Abstract: Spatiotemporal graph neural networks (STGNNs) have shown promising results in many domains, from forecasting to epidemiology. However, understanding the dynamics learned by these models and explaining their behaviour is significantly more complex than for models dealing with static data. Inspired by Koopman theory, which allows a simpler description of intricate, nonlinear dynamical systems, we in… ▽ More Spatiotemporal graph neural networks (STGNNs) have shown promising results in many domains, from forecasting to epidemiology. However, understanding the dynamics learned by these models and explaining their behaviour is significantly more complex than for models dealing with static data. Inspired by Koopman theory, which allows a simpler description of intricate, nonlinear dynamical systems, we introduce an explainability approach for temporal graphs. We present two methods to interpret the STGNN's decision process and identify the most relevant spatial and temporal patterns in the input for the task at hand. The first relies on dynamic mode decomposition (DMD), a Koopman-inspired dimensionality reduction method. The second relies on sparse identification of nonlinear dynamics (SINDy), a popular method for discovering governing equations, which we use for the first time as a general tool for explainability. We show how our methods can correctly identify interpretable features such as infection times and infected nodes in the context of dissemination processes. △ Less

Submitted 17 October, 2024; originally announced October 2024.

arXiv:2409.05100 [pdf, other]

MaxCutPool: differentiable feature-aware Maxcut for pooling in graph neural networks

Authors: Carlo Abate, Filippo Maria Bianchi

Abstract: We propose a novel approach to compute the MAXCUT in attributed graphs, i.e., graphs with features associated with nodes and edges. Our approach works well on any kind of graph topology and can find solutions that jointly optimize the MAXCUT along with other objectives. Based on the obtained MAXCUT partition, we implement a hierarchical graph pooling layer for Graph Neural Networks, which is spars… ▽ More We propose a novel approach to compute the MAXCUT in attributed graphs, i.e., graphs with features associated with nodes and edges. Our approach works well on any kind of graph topology and can find solutions that jointly optimize the MAXCUT along with other objectives. Based on the obtained MAXCUT partition, we implement a hierarchical graph pooling layer for Graph Neural Networks, which is sparse, trainable end-to-end, and particularly suitable for downstream tasks on heterophilic graphs. △ Less

Submitted 31 January, 2025; v1 submitted 8 September, 2024; originally announced September 2024.

Comments: Accepted at ICLR 2025

arXiv:2402.10634 [pdf, other]

Graph-based Forecasting with Missing Data through Spatiotemporal Downsampling

Authors: Ivan Marisca, Cesare Alippi, Filippo Maria Bianchi

Abstract: Given a set of synchronous time series, each associated with a sensor-point in space and characterized by inter-series relationships, the problem of spatiotemporal forecasting consists of predicting future observations for each point. Spatiotemporal graph neural networks achieve striking results by representing the relationships across time series as a graph. Nonetheless, most existing methods rel… ▽ More Given a set of synchronous time series, each associated with a sensor-point in space and characterized by inter-series relationships, the problem of spatiotemporal forecasting consists of predicting future observations for each point. Spatiotemporal graph neural networks achieve striking results by representing the relationships across time series as a graph. Nonetheless, most existing methods rely on the often unrealistic assumption that inputs are always available and fail to capture hidden spatiotemporal dynamics when part of the data is missing. In this work, we tackle this problem through hierarchical spatiotemporal downsampling. The input time series are progressively coarsened over time and space, obtaining a pool of representations that capture heterogeneous temporal and spatial dynamics. Conditioned on observations and missing data patterns, such representations are combined by an interpretable attention mechanism to generate the forecasts. Our approach outperforms state-of-the-art methods on synthetic and real-world benchmarks under different missing data distributions, particularly in the presence of contiguous blocks of missing values. △ Less

Submitted 8 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

Comments: Accepted at ICML 2024

arXiv:2308.12844 [pdf, other]

Probabilistic load forecasting with Reservoir Computing

Authors: Michele Guerra, Simone Scardapane, Filippo Maria Bianchi

Abstract: Some applications of deep learning require not only to provide accurate results but also to quantify the amount of confidence in their prediction. The management of an electric power grid is one of these cases: to avoid risky scenarios, decision-makers need both precise and reliable forecasts of, for example, power loads. For this reason, point forecasts are not enough hence it is necessary to ado… ▽ More Some applications of deep learning require not only to provide accurate results but also to quantify the amount of confidence in their prediction. The management of an electric power grid is one of these cases: to avoid risky scenarios, decision-makers need both precise and reliable forecasts of, for example, power loads. For this reason, point forecasts are not enough hence it is necessary to adopt methods that provide an uncertainty quantification. This work focuses on reservoir computing as the core time series forecasting method, due to its computational efficiency and effectiveness in predicting time series. While the RC literature mostly focused on point forecasting, this work explores the compatibility of some popular uncertainty quantification methods with the reservoir setting. Both Bayesian and deterministic approaches to uncertainty assessment are evaluated and compared in terms of their prediction accuracy, computational resource efficiency and reliability of the estimated uncertainty, based on a set of carefully chosen performance metrics. △ Less

Submitted 24 August, 2023; originally announced August 2023.

arXiv:2304.07152 [pdf, other]

Combining Stochastic Explainers and Subgraph Neural Networks can Increase Expressivity and Interpretability

Authors: Indro Spinelli, Michele Guerra, Filippo Maria Bianchi, Simone Scardapane

Abstract: Subgraph-enhanced graph neural networks (SGNN) can increase the expressive power of the standard message-passing framework. This model family represents each graph as a collection of subgraphs, generally extracted by random sampling or with hand-crafted heuristics. Our key observation is that by selecting "meaningful" subgraphs, besides improving the expressivity of a GNN, it is also possible to o… ▽ More Subgraph-enhanced graph neural networks (SGNN) can increase the expressive power of the standard message-passing framework. This model family represents each graph as a collection of subgraphs, generally extracted by random sampling or with hand-crafted heuristics. Our key observation is that by selecting "meaningful" subgraphs, besides improving the expressivity of a GNN, it is also possible to obtain interpretable results. For this purpose, we introduce a novel framework that jointly predicts the class of the graph and a set of explanatory sparse subgraphs, which can be analyzed to understand the decision process of the classifier. We compare the performance of our framework against standard subgraph extraction policies, like random node/edge deletion strategies. The subgraphs produced by our framework allow to achieve comparable performance in terms of accuracy, with the additional benefit of providing explanations. △ Less

Submitted 14 April, 2023; originally announced April 2023.

arXiv:2304.01575 [pdf, other]

The expressive power of pooling in Graph Neural Networks

Authors: Filippo Maria Bianchi, Veronica Lachi

Abstract: In Graph Neural Networks (GNNs), hierarchical pooling operators generate local summaries of the data by coarsening the graph structure and the vertex features. While considerable attention has been devoted to analyzing the expressive power of message-passing (MP) layers in GNNs, a study on how graph pooling affects the expressiveness of a GNN is still lacking. Additionally, despite the recent adva… ▽ More In Graph Neural Networks (GNNs), hierarchical pooling operators generate local summaries of the data by coarsening the graph structure and the vertex features. While considerable attention has been devoted to analyzing the expressive power of message-passing (MP) layers in GNNs, a study on how graph pooling affects the expressiveness of a GNN is still lacking. Additionally, despite the recent advances in the design of pooling operators, there is not a principled criterion to compare them. In this work, we derive sufficient conditions for a pooling operator to fully preserve the expressive power of the MP layers before it. These conditions serve as a universal and theoretically grounded criterion for choosing among existing pooling operators or designing new ones. Based on our theoretical findings, we analyze several existing pooling operators and identify those that fail to satisfy the expressiveness conditions. Finally, we introduce an experimental setup to verify empirically the expressive power of a GNN equipped with pooling layers, in terms of its capability to perform a graph isomorphism test. △ Less

Submitted 12 October, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

arXiv:2211.06218 [pdf, other]

Total Variation Graph Neural Networks

Authors: Jonas Berg Hansen, Filippo Maria Bianchi

Abstract: Recently proposed Graph Neural Networks (GNNs) for vertex clustering are trained with an unsupervised minimum cut objective, approximated by a Spectral Clustering (SC) relaxation. However, the SC relaxation is loose and, while it offers a closed-form solution, it also yields overly smooth cluster assignments that poorly separate the vertices. In this paper, we propose a GNN model that computes clu… ▽ More Recently proposed Graph Neural Networks (GNNs) for vertex clustering are trained with an unsupervised minimum cut objective, approximated by a Spectral Clustering (SC) relaxation. However, the SC relaxation is loose and, while it offers a closed-form solution, it also yields overly smooth cluster assignments that poorly separate the vertices. In this paper, we propose a GNN model that computes cluster assignments by optimizing a tighter relaxation of the minimum cut based on graph total variation (GTV). The cluster assignments can be used directly to perform vertex clustering or to implement graph pooling in a graph classification framework. Our model consists of two core components: i) a message-passing layer that minimizes the $\ell_1$ distance in the features of adjacent vertices, which is key to achieving sharp transitions between clusters; ii) an unsupervised loss function that minimizes the GTV of the cluster assignments while ensuring balanced partitions. Experimental results show that our model outperforms other GNNs for vertex clustering and graph classification. △ Less

Submitted 27 April, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

arXiv:2209.07926 [pdf, other]

doi 10.7557/18.6796

Explainability in subgraphs-enhanced Graph Neural Networks

Authors: Michele Guerra, Indro Spinelli, Simone Scardapane, Filippo Maria Bianchi

Abstract: Recently, subgraphs-enhanced Graph Neural Networks (SGNNs) have been introduced to enhance the expressive power of Graph Neural Networks (GNNs), which was proved to be not higher than the 1-dimensional Weisfeiler-Leman isomorphism test. The new paradigm suggests using subgraphs extracted from the input graph to improve the model's expressiveness, but the additional complexity exacerbates an alread… ▽ More Recently, subgraphs-enhanced Graph Neural Networks (SGNNs) have been introduced to enhance the expressive power of Graph Neural Networks (GNNs), which was proved to be not higher than the 1-dimensional Weisfeiler-Leman isomorphism test. The new paradigm suggests using subgraphs extracted from the input graph to improve the model's expressiveness, but the additional complexity exacerbates an already challenging problem in GNNs: explaining their predictions. In this work, we adapt PGExplainer, one of the most recent explainers for GNNs, to SGNNs. The proposed explainer accounts for the contribution of all the different subgraphs and can produce a meaningful explanation that humans can interpret. The experiments that we performed both on real and synthetic datasets show that our framework is successful in explaining the decision process of an SGNN on graph classification tasks. △ Less

Submitted 19 January, 2023; v1 submitted 16 September, 2022; originally announced September 2022.

Comments: The source code implementing our workflow is publicly available online at https://github.com/MicheleUIT/Explaining_SGNN

arXiv:2209.06520 [pdf, other]

Scalable Spatiotemporal Graph Neural Networks

Authors: Andrea Cini, Ivan Marisca, Filippo Maria Bianchi, Cesare Alippi

Abstract: Neural forecasting of spatiotemporal time series drives both research and industrial innovation in several relevant application domains. Graph neural networks (GNNs) are often the core component of the forecasting architecture. However, in most spatiotemporal GNNs, the computational complexity scales up to a quadratic factor with the length of the sequence times the number of links in the graph, h… ▽ More Neural forecasting of spatiotemporal time series drives both research and industrial innovation in several relevant application domains. Graph neural networks (GNNs) are often the core component of the forecasting architecture. However, in most spatiotemporal GNNs, the computational complexity scales up to a quadratic factor with the length of the sequence times the number of links in the graph, hence hindering the application of these models to large graphs and long temporal sequences. While methods to improve scalability have been proposed in the context of static graphs, few research efforts have been devoted to the spatiotemporal case. To fill this gap, we propose a scalable architecture that exploits an efficient encoding of both temporal and spatial dynamics. In particular, we use a randomized recurrent neural network to embed the history of the input time series into high-dimensional state representations encompassing multi-scale temporal dynamics. Such representations are then propagated along the spatial dimension using different powers of the graph adjacency matrix to generate node embeddings characterized by a rich pool of spatiotemporal features. The resulting node embeddings can be efficiently pre-computed in an unsupervised manner, before being fed to a feed-forward decoder that learns to map the multi-scale spatiotemporal representations to predictions. The training procedure can then be parallelized node-wise by sampling the node embeddings without breaking any dependency, thus enabling scalability to large networks. Empirical results on relevant datasets show that our approach achieves results competitive with the state of the art, while dramatically reducing the computational burden. △ Less

Submitted 20 February, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

Comments: Published as conference paper at AAAI 23

arXiv:2207.08779 [pdf, other]

doi 10.7557/18.6790

Simplifying Clustering with Graph Neural Networks

Authors: Filippo Maria Bianchi

Abstract: The objective functions used in spectral clustering are usually composed of two terms: i) a term that minimizes the local quadratic variation of the cluster assignments on the graph and; ii) a term that balances the clustering partition and helps avoiding degenerate solutions. This paper shows that a graph neural network, equipped with suitable message passing layers, can generate good cluster ass… ▽ More The objective functions used in spectral clustering are usually composed of two terms: i) a term that minimizes the local quadratic variation of the cluster assignments on the graph and; ii) a term that balances the clustering partition and helps avoiding degenerate solutions. This paper shows that a graph neural network, equipped with suitable message passing layers, can generate good cluster assignments by optimizing only a balancing term. Results on attributed graph datasets show the effectiveness of the proposed approach in terms of clustering performance and computation time. △ Less

Submitted 27 November, 2022; v1 submitted 18 July, 2022; originally announced July 2022.

arXiv:2203.16401 [pdf, other]

doi 10.1109/TGRS.2022.3204886

Recognition of polar lows in Sentinel-1 SAR images with deep learning

Authors: Jakob Grahn, Filippo Maria Bianchi

Abstract: In this paper, we explore the possibility of detecting polar lows in C-band SAR images by means of deep learning. Specifically, we introduce a novel dataset consisting of Sentinel-1 images divided into two classes, representing the presence and absence of a maritime mesocyclone, respectively. The dataset is constructed using the ERA5 dataset as baseline and it consists of 2004 annotated images. To… ▽ More In this paper, we explore the possibility of detecting polar lows in C-band SAR images by means of deep learning. Specifically, we introduce a novel dataset consisting of Sentinel-1 images divided into two classes, representing the presence and absence of a maritime mesocyclone, respectively. The dataset is constructed using the ERA5 dataset as baseline and it consists of 2004 annotated images. To our knowledge, this is the first dataset of its kind to be publicly released. The dataset is used to train a deep learning model to classify the labeled images. Evaluated on an independent test set, the model yields an F-1 score of 0.95, indicating that polar lows can be consistently detected from SAR images. Interpretability techniques applied to the deep learning model reveal that atmospheric fronts and cyclonic eyes are key features in the classification. Moreover, experimental results show that the model is accurate even if: (i) such features are significantly cropped due to the limited swath width of the SAR, (ii) the features are partly covered by sea ice and (iii) land is covering significant parts of the images. By evaluating the model performance on multiple input image resolutions (pixel sizes of 500m, 1km and 2km), it is found that higher resolution yield the best performance. This emphasises the potential of using high resolution sensors like SAR for detecting polar lows, as compared to conventionally used sensors such as scatterometers. △ Less

Submitted 5 September, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

Comments: 11 pages (+4 supplementary), 11 figures (+2 supplementary)

arXiv:2203.07080 [pdf, other]

Probabilistic forecasts of wind power generation in regions with complex topography using deep learning methods: An Arctic case

Authors: Odin Foldvik Eikeland, Finn Dag Hovem, Tom Eirik Olsen, Matteo Chiesa, Filippo Maria Bianchi

Abstract: The energy market relies on forecasting capabilities of both demand and power generation that need to be kept in dynamic balance. Today, when it comes to renewable energy generation, such decisions are increasingly made in a liberalized electricity market environment, where future power generation must be offered through contracts and auction mechanisms, hence based on forecasts. The increased sha… ▽ More The energy market relies on forecasting capabilities of both demand and power generation that need to be kept in dynamic balance. Today, when it comes to renewable energy generation, such decisions are increasingly made in a liberalized electricity market environment, where future power generation must be offered through contracts and auction mechanisms, hence based on forecasts. The increased share of highly intermittent power generation from renewable energy sources increases the uncertainty about the expected future power generation. Point forecast does not account for such uncertainties. To account for these uncertainties, it is possible to make probabilistic forecasts. This work first presents important concepts and approaches concerning probabilistic forecasts with deep learning. Then, deep learning models are used to make probabilistic forecasts of day-ahead power generation from a wind power plant located in Northern Norway. The performance in terms of obtained quality of the prediction intervals is compared for different deep learning models and sets of covariates. The findings show that the accuracy of the predictions improves when historical data on measured weather and numerical weather predictions (NWPs) were included as exogenous variables. This allows the model to auto-correct systematic biases in the NWPs using the historical measurement data. Using only NWPs, or only measured weather as exogenous variables, worse prediction performances were obtained. △ Less

Submitted 10 March, 2022; originally announced March 2022.

Comments: 16 pages, 8 Figures, 4 Tables

arXiv:2202.08756 [pdf, other]

doi 10.1109/TNNLS.2022.3217694

Ensemble Conformalized Quantile Regression for Probabilistic Time Series Forecasting

Authors: Vilde Jensen, Filippo Maria Bianchi, Stian Norman Anfinsen

Abstract: This paper presents a novel probabilistic forecasting method called ensemble conformalized quantile regression (EnCQR). EnCQR constructs distribution-free and approximately marginally valid prediction intervals (PIs), which are suitable for nonstationary and heteroscedastic time series data. EnCQR can be applied on top of a generic forecasting model, including deep learning architectures. EnCQR ex… ▽ More This paper presents a novel probabilistic forecasting method called ensemble conformalized quantile regression (EnCQR). EnCQR constructs distribution-free and approximately marginally valid prediction intervals (PIs), which are suitable for nonstationary and heteroscedastic time series data. EnCQR can be applied on top of a generic forecasting model, including deep learning architectures. EnCQR exploits a bootstrap ensemble estimator, which enables the use of conformal predictors for time series by removing the requirement of data exchangeability. The ensemble learners are implemented as generic machine learning algorithms performing quantile regression, which allow the length of the PIs to adapt to local variability in the data. In the experiments, we predict time series characterized by a different amount of heteroscedasticity. The results demonstrate that EnCQR outperforms models based only on quantile regression or conformal prediction, and it provides sharper, more informative, and valid PIs. △ Less

Submitted 6 November, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

Journal ref: IEEE Transactions on Neural Networks and Learning Systems, 2022

arXiv:2111.02169 [pdf, other]

doi 10.1109/TPWRS.2022.3195301

Power Flow Balancing with Decentralized Graph Neural Networks

Authors: Jonas Berg Hansen, Stian Normann Anfinsen, Filippo Maria Bianchi

Abstract: We propose an end-to-end framework based on a Graph Neural Network (GNN) to balance the power flows in energy grids. The balancing is framed as a supervised vertex regression task, where the GNN is trained to predict the current and power injections at each grid branch that yield a power flow balance. By representing the power grid as a line graph with branches as vertices, we can train a GNN that… ▽ More We propose an end-to-end framework based on a Graph Neural Network (GNN) to balance the power flows in energy grids. The balancing is framed as a supervised vertex regression task, where the GNN is trained to predict the current and power injections at each grid branch that yield a power flow balance. By representing the power grid as a line graph with branches as vertices, we can train a GNN that is accurate and robust to changes in topology. In addition, by using specialized GNN layers, we are able to build a very deep architecture that accounts for large neighborhoods on the graph, while implementing only localized operations. We perform three different experiments to evaluate: i) the benefits of using localized rather than global operations and the tendency of deep GNN models to oversmooth the quantities on the nodes; ii) the resilience to perturbations in the graph topology; and iii) the capability to train the model simultaneously on multiple grid topologies and the consequential improvement in generalization to new, unseen grids. The proposed framework is efficient and, compared to other solvers based on deep learning, is robust to perturbations not only to the physical quantities on the grid components, but also to the topology. △ Less

Submitted 11 August, 2022; v1 submitted 3 November, 2021; originally announced November 2021.

arXiv:2110.05292 [pdf, other]

doi 10.1109/TNNLS.2022.3190922

Understanding Pooling in Graph Neural Networks

Authors: Daniele Grattarola, Daniele Zambon, Filippo Maria Bianchi, Cesare Alippi

Abstract: Inspired by the conventional pooling layers in convolutional neural networks, many recent works in the field of graph machine learning have introduced pooling operators to reduce the size of graphs. The great variety in the literature stems from the many possible strategies for coarsening a graph, which may depend on different assumptions on the graph structure or the specific downstream task. In… ▽ More Inspired by the conventional pooling layers in convolutional neural networks, many recent works in the field of graph machine learning have introduced pooling operators to reduce the size of graphs. The great variety in the literature stems from the many possible strategies for coarsening a graph, which may depend on different assumptions on the graph structure or the specific downstream task. In this paper we propose a formal characterization of graph pooling based on three main operations, called selection, reduction, and connection, with the goal of unifying the literature under a common framework. Following this formalization, we introduce a taxonomy of pooling operators and categorize more than thirty pooling methods proposed in recent literature. We propose criteria to evaluate the performance of a pooling operator and use them to investigate and contrast the behavior of different classes of the taxonomy on a variety of tasks. △ Less

Submitted 11 October, 2021; originally announced October 2021.

Comments: 10 pages, 6 figures

Journal ref: IEEE Transactions on Neural Networks and Learning Systems (Volume: 35, Issue: 2, February 2024)

arXiv:2108.07060 [pdf, other]

Detecting and interpreting faults in vulnerable power grids with machine learning

Authors: Odin Foldvik Eikeland, Inga Setså Holmstrand, Sigurd Bakkejord, Matteo Chiesa, Filippo Maria Bianchi

Abstract: Unscheduled power disturbances cause severe consequences both for customers and grid operators. To defend against such events, it is necessary to identify the causes of interruptions in the power distribution network. In this work, we focus on the power grid of a Norwegian community in the Arctic that experiences several faults whose sources are unknown. First, we construct a data set consisting o… ▽ More Unscheduled power disturbances cause severe consequences both for customers and grid operators. To defend against such events, it is necessary to identify the causes of interruptions in the power distribution network. In this work, we focus on the power grid of a Norwegian community in the Arctic that experiences several faults whose sources are unknown. First, we construct a data set consisting of relevant meteorological data and information about the current power quality logged by power-quality meters. Then, we adopt machine-learning techniques to predict the occurrence of faults. Experimental results show that both linear and non-linear classifiers achieve good classification performance. This indicates that the considered power-quality and weather variables explain well the power disturbances. Interpreting the decision process of the classifiers provides valuable insights to understand the main causes of disturbances. Traditional features selection methods can only indicate which are the variables that, on average, mostly explain the fault occurrences in the dataset. Besides providing such a global interpretation, it is also important to identify the specific set of variables that explain each individual fault. To address this challenge, we adopt a recent technique to interpret the decision process of a deep learning model, called Integrated Gradients. The proposed approach allows to gain detailed insights on the occurrence of a specific fault, which are valuable for the distribution system operators to implement strategies to prevent and mitigate power disturbances. △ Less

Submitted 16 August, 2021; originally announced August 2021.

arXiv:2104.04710 [pdf, other]

Pyramidal Reservoir Graph Neural Network

Authors: Filippo Maria Bianchi, Claudio Gallicchio, Alessio Micheli

Abstract: We propose a deep Graph Neural Network (GNN) model that alternates two types of layers. The first type is inspired by Reservoir Computing (RC) and generates new vertex features by iterating a non-linear map until it converges to a fixed point. The second type of layer implements graph pooling operations, that gradually reduce the support graph and the vertex features, and further improve the compu… ▽ More We propose a deep Graph Neural Network (GNN) model that alternates two types of layers. The first type is inspired by Reservoir Computing (RC) and generates new vertex features by iterating a non-linear map until it converges to a fixed point. The second type of layer implements graph pooling operations, that gradually reduce the support graph and the vertex features, and further improve the computational efficiency of the RC-based GNN. The architecture is, therefore, pyramidal. In the last layer, the features of the remaining vertices are combined into a single vector, which represents the graph embedding. Through a mathematical derivation introduced in this paper, we show formally how graph pooling can reduce the computational complexity of the model and speed-up the convergence of the dynamical updates of the vertex features. Our proposed approach to the design of RC-based GNNs offers an advantageous and principled trade-off between accuracy and complexity, which we extensively demonstrate in experiments on a large set of graph datasets. △ Less

Submitted 10 April, 2021; originally announced April 2021.

Comments: this is a pre-print version of a paper submitted for journal publication

arXiv:2006.13575 [pdf, other]

Large-scale detection and categorization of oil spills from SAR images with deep learning

Authors: Filippo Maria Bianchi, Martine M. Espeseth, Njål Borch

Abstract: We propose a deep learning framework to detect and categorize oil spills in synthetic aperture radar (SAR) images at a large scale. By means of a carefully designed neural network model for image segmentation trained on an extensive dataset, we are able to obtain state-of-the-art performance in oil spill detection, achieving results that are comparable to results produced by human operators. We al… ▽ More We propose a deep learning framework to detect and categorize oil spills in synthetic aperture radar (SAR) images at a large scale. By means of a carefully designed neural network model for image segmentation trained on an extensive dataset, we are able to obtain state-of-the-art performance in oil spill detection, achieving results that are comparable to results produced by human operators. We also introduce a classification task, which is novel in the context of oil spill detection in SAR. Specifically, after being detected, each oil spill is also classified according to different categories pertaining to its shape and texture characteristics. The classification results provide valuable insights for improving the design of oil spill services by world-leading providers. As the last contribution, we present our operational pipeline and a visualization tool for large-scale data, which allows to detect and analyze the historical presence of oil spills worldwide. △ Less

Submitted 24 June, 2020; originally announced June 2020.

arXiv:2004.07011 [pdf, other]

Code-Aligned Autoencoders for Unsupervised Change Detection in Multimodal Remote Sensing Images

Authors: Luigi T. Luppino, Mads A. Hansen, Michael Kampffmeyer, Filippo M. Bianchi, Gabriele Moser, Robert Jenssen, Stian N. Anfinsen

Abstract: Image translation with convolutional autoencoders has recently been used as an approach to multimodal change detection in bitemporal satellite images. A main challenge is the alignment of the code spaces by reducing the contribution of change pixels to the learning of the translation function. Many existing approaches train the networks by exploiting supervised information of the change areas, whi… ▽ More Image translation with convolutional autoencoders has recently been used as an approach to multimodal change detection in bitemporal satellite images. A main challenge is the alignment of the code spaces by reducing the contribution of change pixels to the learning of the translation function. Many existing approaches train the networks by exploiting supervised information of the change areas, which, however, is not always available. We propose to extract relational pixel information captured by domain-specific affinity matrices at the input and use this to enforce alignment of the code spaces and reduce the impact of change pixels on the learning objective. A change prior is derived in an unsupervised fashion from pixel pair affinities that are comparable across domains. To achieve code space alignment we enforce that pixel with similar affinity relations in the input domains should be correlated also in code space. We demonstrate the utility of this procedure in combination with cycle consistency. The proposed approach are compared with state-of-the-art deep learning algorithms. Experiments conducted on four real datasets show the effectiveness of our methodology. △ Less

Submitted 15 April, 2020; originally announced April 2020.

arXiv:2001.04271 [pdf, other]

doi 10.1109/TGRS.2021.3056196

Deep Image Translation with an Affinity-Based Change Prior for Unsupervised Multimodal Change Detection

Authors: Luigi Tommaso Luppino, Michael Kampffmeyer, Filippo Maria Bianchi, Gabriele Moser, Sebastiano Bruno Serpico, Robert Jenssen, Stian Normann Anfinsen

Abstract: Image translation with convolutional neural networks has recently been used as an approach to multimodal change detection. Existing approaches train the networks by exploiting supervised information of the change areas, which, however, is not always available. A main challenge in the unsupervised problem setting is to avoid that change pixels affect the learning of the translation function. We pro… ▽ More Image translation with convolutional neural networks has recently been used as an approach to multimodal change detection. Existing approaches train the networks by exploiting supervised information of the change areas, which, however, is not always available. A main challenge in the unsupervised problem setting is to avoid that change pixels affect the learning of the translation function. We propose two new network architectures trained with loss functions weighted by priors that reduce the impact of change pixels on the learning objective. The change prior is derived in an unsupervised fashion from relational pixel information captured by domain-specific affinity matrices. Specifically, we use the vertex degrees associated with an absolute affinity difference matrix and demonstrate their utility in combination with cycle consistency and adversarial training. The proposed neural networks are compared with state-of-the-art algorithms. Experiments conducted on three real datasets show the effectiveness of our methodology. △ Less

Submitted 8 March, 2021; v1 submitted 13 January, 2020; originally announced January 2020.

arXiv:1910.11436 [pdf, other]

doi 10.1109/TNNLS.2020.3044146

Hierarchical Representation Learning in Graph Neural Networks with Node Decimation Pooling

Authors: Filippo Maria Bianchi, Daniele Grattarola, Lorenzo Livi, Cesare Alippi

Abstract: In graph neural networks (GNNs), pooling operators compute local summaries of input graphs to capture their global properties, and they are fundamental for building deep GNNs that learn hierarchical representations. In this work, we propose the Node Decimation Pooling (NDP), a pooling operator for GNNs that generates coarser graphs while preserving the overall graph topology. During training, the… ▽ More In graph neural networks (GNNs), pooling operators compute local summaries of input graphs to capture their global properties, and they are fundamental for building deep GNNs that learn hierarchical representations. In this work, we propose the Node Decimation Pooling (NDP), a pooling operator for GNNs that generates coarser graphs while preserving the overall graph topology. During training, the GNN learns new node representations and fits them to a pyramid of coarsened graphs, which is computed offline in a pre-processing stage. NDP consists of three steps. First, a node decimation procedure selects the nodes belonging to one side of the partition identified by a spectral algorithm that approximates the \maxcut{} solution. Afterwards, the selected nodes are connected with Kron reduction to form the coarsened graph. Finally, since the resulting graph is very dense, we apply a sparsification procedure that prunes the adjacency matrix of the coarsened graph to reduce the computational cost in the GNN. Notably, we show that it is possible to remove many edges without significantly altering the graph structure. Experimental results show that NDP is more efficient compared to state-of-the-art graph pooling operators while reaching, at the same time, competitive performance on a significant variety of graph classification tasks. △ Less

Submitted 20 April, 2024; v1 submitted 24 October, 2019; originally announced October 2019.

arXiv:1910.05411 [pdf, other]

Snow avalanche segmentation in SAR images with Fully Convolutional Neural Networks

Authors: Filippo Maria Bianchi, Jakob Grahn, Markus Eckerstorfer, Eirik Malnes, Hannah Vickers

Abstract: Knowledge about frequency and location of snow avalanche activity is essential for forecasting and mapping of snow avalanche hazard. Traditional field monitoring of avalanche activity has limitations, especially when surveying large and remote areas. In recent years, avalanche detection in Sentinel-1 radar satellite imagery has been developed to improve monitoring. However, the current state-of-th… ▽ More Knowledge about frequency and location of snow avalanche activity is essential for forecasting and mapping of snow avalanche hazard. Traditional field monitoring of avalanche activity has limitations, especially when surveying large and remote areas. In recent years, avalanche detection in Sentinel-1 radar satellite imagery has been developed to improve monitoring. However, the current state-of-the-art detection algorithms, based on radar signal processing techniques, are still much less accurate than human experts. To reduce this gap, we propose a deep learning architecture for detecting avalanches in Sentinel-1 radar images. We trained a neural network on 6,345 manually labelled avalanches from 117 Sentinel-1 images, each one consisting of six channels that include backscatter and topographical information. Then, we tested our trained model on a new SAR image. Comparing to the manual labelling (the gold standard), we achieved an F1 score above 66\%, while the state-of-the-art detection algorithm sits at an F1 score of only 38\%. A visual inspection of the results generated by our deep learning model shows that only small avalanches are undetected, while some avalanches that were originally not labelled by the human expert are discovered. △ Less

Submitted 6 November, 2020; v1 submitted 11 October, 2019; originally announced October 2019.

arXiv:1909.05948 [pdf, other]

doi 10.1109/TGRS.2019.2930348

Unsupervised Image Regression for Heterogeneous Change Detection

Authors: Luigi T. Luppino, Filippo M. Bianchi, Gabriele Moser, Stian N. Anfinsen

Abstract: Change detection in heterogeneous multitemporal satellite images is an emerging and challenging topic in remote sensing. In particular, one of the main challenges is to tackle the problem in an unsupervised manner. In this paper we propose an unsupervised framework for bitemporal heterogeneous change detection based on the comparison of affinity matrices and image regression. First, our method qua… ▽ More Change detection in heterogeneous multitemporal satellite images is an emerging and challenging topic in remote sensing. In particular, one of the main challenges is to tackle the problem in an unsupervised manner. In this paper we propose an unsupervised framework for bitemporal heterogeneous change detection based on the comparison of affinity matrices and image regression. First, our method quantifies the similarity of affinity matrices computed from co-located image patches in the two images. This is done to automatically identify pixels that are likely to be unchanged. With the identified pixels as pseudo-training data, we learn a transformation to map the first image to the domain of the other image, and vice versa. Four regression methods are selected to carry out the transformation: Gaussian process regression, support vector regression, random forest regression, and a recently proposed kernel regression method called homogeneous pixel transformation. To evaluate the potentials and limitations of our framework, and also the benefits and disadvantages of each regression method, we perform experiments on two real data sets. The results indicate that the comparison of the affinity matrices can already be considered a change detection method by itself. However, image regression is shown to improve the results obtained by the previous step alone and produces accurate change detection maps despite of the heterogeneity of the multitemporal input data. Notably, the random forest regression approach excels by achieving similar accuracy as the other methods, but with a significantly lower computational cost and with fast and robust tuning of hyperparameters. △ Less

Submitted 7 September, 2019; originally announced September 2019.

Comments: arXiv admin note: text overlap with arXiv:1807.11766

arXiv:1907.05251 [pdf, other]

Time series cluster kernels to exploit informative missingness and incomplete label information

Authors: Karl Øyvind Mikalsen, Cristina Soguero-Ruiz, Filippo Maria Bianchi, Arthur Revhaug, Robert Jenssen

Abstract: The time series cluster kernel (TCK) provides a powerful tool for analysing multivariate time series subject to missing data. TCK is designed using an ensemble learning approach in which Bayesian mixture models form the base models. Because of the Bayesian approach, TCK can naturally deal with missing values without resorting to imputation and the ensemble strategy ensures robustness to hyperparam… ▽ More The time series cluster kernel (TCK) provides a powerful tool for analysing multivariate time series subject to missing data. TCK is designed using an ensemble learning approach in which Bayesian mixture models form the base models. Because of the Bayesian approach, TCK can naturally deal with missing values without resorting to imputation and the ensemble strategy ensures robustness to hyperparameters, making it particularly well suited for unsupervised learning. However, TCK assumes missing at random and that the underlying missingness mechanism is ignorable, i.e. uninformative, an assumption that does not hold in many real-world applications, such as e.g. medicine. To overcome this limitation, we present a kernel capable of exploiting the potentially rich information in the missing values and patterns, as well as the information from the observed data. In our approach, we create a representation of the missing pattern, which is incorporated into mixed mode mixture models in such a way that the information provided by the missing patterns is effectively exploited. Moreover, we also propose a semi-supervised kernel, capable of taking advantage of incomplete label information to learn more accurate similarities. Experiments on benchmark data, as well as a real-world case study of patients described by longitudinal electronic health record data who potentially suffer from hospital-acquired infections, demonstrate the effectiveness of the proposed methods. △ Less

Submitted 10 July, 2019; originally announced July 2019.

Comments: arXiv admin note: text overlap with arXiv:1803.07879

arXiv:1907.00481 [pdf, other]

Spectral Clustering with Graph Neural Networks for Graph Pooling

Authors: Filippo Maria Bianchi, Daniele Grattarola, Cesare Alippi

Abstract: Spectral clustering (SC) is a popular clustering technique to find strongly connected communities on a graph. SC can be used in Graph Neural Networks (GNNs) to implement pooling operations that aggregate nodes belonging to the same cluster. However, the eigendecomposition of the Laplacian is expensive and, since clustering results are graph-specific, pooling methods based on SC must perform a new… ▽ More Spectral clustering (SC) is a popular clustering technique to find strongly connected communities on a graph. SC can be used in Graph Neural Networks (GNNs) to implement pooling operations that aggregate nodes belonging to the same cluster. However, the eigendecomposition of the Laplacian is expensive and, since clustering results are graph-specific, pooling methods based on SC must perform a new optimization for each new sample. In this paper, we propose a graph clustering approach that addresses these limitations of SC. We formulate a continuous relaxation of the normalized minCUT problem and train a GNN to compute cluster assignments that minimize this objective. Our GNN-based implementation is differentiable, does not require to compute the spectral decomposition, and learns a clustering function that can be quickly evaluated on out-of-sample graphs. From the proposed clustering method, we design a graph pooling operator that overcomes some important limitations of state-of-the-art graph pooling techniques and achieves the best performance in several supervised and unsupervised tasks. △ Less

Submitted 29 December, 2020; v1 submitted 30 June, 2019; originally announced July 2019.

arXiv:1902.07517 [pdf, other]

doi 10.1016/j.patcog.2019.01.033

Noisy multi-label semi-supervised dimensionality reduction

Authors: Karl Øyvind Mikalsen, Cristina Soguero-Ruiz, Filippo Maria Bianchi, Robert Jenssen

Abstract: Noisy labeled data represent a rich source of information that often are easily accessible and cheap to obtain, but label noise might also have many negative consequences if not accounted for. How to fully utilize noisy labels has been studied extensively within the framework of standard supervised machine learning over a period of several decades. However, very little research has been conducted… ▽ More Noisy labeled data represent a rich source of information that often are easily accessible and cheap to obtain, but label noise might also have many negative consequences if not accounted for. How to fully utilize noisy labels has been studied extensively within the framework of standard supervised machine learning over a period of several decades. However, very little research has been conducted on solving the challenge posed by noisy labels in non-standard settings. This includes situations where only a fraction of the samples are labeled (semi-supervised) and each high-dimensional sample is associated with multiple labels. In this work, we present a novel semi-supervised and multi-label dimensionality reduction method that effectively utilizes information from both noisy multi-labels and unlabeled data. With the proposed Noisy multi-label semi-supervised dimensionality reduction (NMLSDR) method, the noisy multi-labels are denoised and unlabeled data are labeled simultaneously via a specially designed label propagation algorithm. NMLSDR then learns a projection matrix for reducing the dimensionality by maximizing the dependence between the enlarged and denoised multi-label space and the features in the projected space. Extensive experiments on synthetic data, benchmark datasets, as well as a real-world case study, demonstrate the effectiveness of the proposed algorithm and show that it outperforms state-of-the-art multi-label feature extraction algorithms. △ Less

Submitted 20 February, 2019; originally announced February 2019.

Comments: 38 pages

Journal ref: Pattern Recognition, Vol 90, June 2019, Pages 257-270

arXiv:1902.04981 [pdf, other]

doi 10.1016/j.neunet.2019.01.015

Deep Divergence-Based Approach to Clustering

Authors: Michael Kampffmeyer, Sigurd Løkse, Filippo M. Bianchi, Lorenzo Livi, Arnt-Børre Salberg, Robert Jenssen

Abstract: A promising direction in deep learning research consists in learning representations and simultaneously discovering cluster structure in unlabeled data by optimizing a discriminative loss function. As opposed to supervised deep learning, this line of research is in its infancy, and how to design and optimize suitable loss functions to train deep neural networks for clustering is still an open ques… ▽ More A promising direction in deep learning research consists in learning representations and simultaneously discovering cluster structure in unlabeled data by optimizing a discriminative loss function. As opposed to supervised deep learning, this line of research is in its infancy, and how to design and optimize suitable loss functions to train deep neural networks for clustering is still an open question. Our contribution to this emerging field is a new deep clustering network that leverages the discriminative power of information-theoretic divergence measures, which have been shown to be effective in traditional clustering. We propose a novel loss function that incorporates geometric regularization constraints, thus avoiding degenerate structures of the resulting clustering partition. Experiments on synthetic benchmarks and real datasets show that the proposed network achieves competitive performance with respect to other state-of-the-art methods, scales well to large datasets, and does not require pre-training steps. △ Less

Submitted 13 February, 2019; originally announced February 2019.

arXiv:1901.01343 [pdf, other]

doi 10.1109/TPAMI.2021.3054830

Graph Neural Networks with convolutional ARMA filters

Authors: Filippo Maria Bianchi, Daniele Grattarola, Lorenzo Livi, Cesare Alippi

Abstract: Popular graph neural networks implement convolution operations on graphs based on polynomial spectral filters. In this paper, we propose a novel graph convolutional layer inspired by the auto-regressive moving average (ARMA) filter that, compared to polynomial ones, provides a more flexible frequency response, is more robust to noise, and better captures the global graph structure. We propose a gr… ▽ More Popular graph neural networks implement convolution operations on graphs based on polynomial spectral filters. In this paper, we propose a novel graph convolutional layer inspired by the auto-regressive moving average (ARMA) filter that, compared to polynomial ones, provides a more flexible frequency response, is more robust to noise, and better captures the global graph structure. We propose a graph neural network implementation of the ARMA filter with a recursive and distributed formulation, obtaining a convolutional layer that is efficient to train, localized in the node space, and can be transferred to new graphs at test time. We perform a spectral analysis to study the filtering effect of the proposed ARMA layer and report experiments on four downstream tasks: semi-supervised node classification, graph signal classification, graph classification, and graph regression. Results show that the proposed ARMA layer brings significant improvements over graph neural networks based on polynomial filters. △ Less

Submitted 24 January, 2021; v1 submitted 4 January, 2019; originally announced January 2019.

arXiv:1807.11766 [pdf, other]

Remote sensing image regression for heterogeneous change detection

Authors: Luigi T. Luppino, Filippo M. Bianchi, Gabriele Moser, Stian N. Anfinsen

Abstract: Change detection in heterogeneous multitemporal satellite images is an emerging topic in remote sensing. In this paper we propose a framework, based on image regression, to perform change detection in heterogeneous multitemporal satellite images, which has become a main topic in remote sensing. Our method learns a transformation to map the first image to the domain of the other image, and vice ver… ▽ More Change detection in heterogeneous multitemporal satellite images is an emerging topic in remote sensing. In this paper we propose a framework, based on image regression, to perform change detection in heterogeneous multitemporal satellite images, which has become a main topic in remote sensing. Our method learns a transformation to map the first image to the domain of the other image, and vice versa. Four regression methods are selected to carry out the transformation: Gaussian processes, support vector machines, random forests, and a recently proposed kernel regression method called homogeneous pixel transformation. To evaluate not only potentials and limitations of our framework, but also the pros and cons of each regression method, we perform experiments on two data sets. The results indicates that random forests achieve good performance, are fast and robust to hyperparameters, whereas the homogeneous pixel transformation method can achieve better accuracy at the cost of a higher complexity. △ Less

Submitted 31 July, 2018; originally announced July 2018.

Comments: Accepted to Machine Learning for Signal Processing 2018

arXiv:1807.07868 [pdf, other]

doi 10.1016/j.asoc.2018.07.029

The Deep Kernelized Autoencoder

Authors: Michael Kampffmeyer, Sigurd Løkse, Filippo M. Bianchi, Robert Jenssen, Lorenzo Livi

Abstract: Autoencoders learn data representations (codes) in such a way that the input is reproduced at the output of the network. However, it is not always clear what kind of properties of the input data need to be captured by the codes. Kernel machines have experienced great success by operating via inner-products in a theoretically well-defined reproducing kernel Hilbert space, hence capturing topologica… ▽ More Autoencoders learn data representations (codes) in such a way that the input is reproduced at the output of the network. However, it is not always clear what kind of properties of the input data need to be captured by the codes. Kernel machines have experienced great success by operating via inner-products in a theoretically well-defined reproducing kernel Hilbert space, hence capturing topological properties of input data. In this paper, we enhance the autoencoder's ability to learn effective data representations by aligning inner products between codes with respect to a kernel matrix. By doing so, the proposed kernelized autoencoder allows learning similarity-preserving embeddings of input data, where the notion of similarity is explicitly controlled by the user and encoded in a positive semi-definite kernel matrix. Experiments are performed for evaluating both reconstruction and kernel alignment performance in classification tasks and visualization of high-dimensional data. Additionally, we show that our method is capable to emulate kernel principal component analysis on a denoising task, obtaining competitive results at a much lower computational cost. △ Less

Submitted 23 July, 2018; v1 submitted 19 July, 2018; originally announced July 2018.

Comments: This work extends the preliminary (conference) version of this paper (arXiv:1702.02526), Applied Soft Computing, Elsevier, 2018

arXiv:1805.03473 [pdf, other]

Learning representations for multivariate time series with missing data using Temporal Kernelized Autoencoders

Authors: Filippo Maria Bianchi, Lorenzo Livi, Karl Øyvind Mikalsen, Michael Kampffmeyer, Robert Jenssen

Abstract: Learning compressed representations of multivariate time series (MTS) facilitates data analysis in the presence of noise and redundant information, and for a large number of variates and time steps. However, classical dimensionality reduction approaches are designed for vectorial data and cannot deal explicitly with missing values. In this work, we propose a novel autoencoder architecture based on… ▽ More Learning compressed representations of multivariate time series (MTS) facilitates data analysis in the presence of noise and redundant information, and for a large number of variates and time steps. However, classical dimensionality reduction approaches are designed for vectorial data and cannot deal explicitly with missing values. In this work, we propose a novel autoencoder architecture based on recurrent neural networks to generate compressed representations of MTS. The proposed model can process inputs characterized by variable lengths and it is specifically designed to handle missing data. Our autoencoder learns fixed-length vectorial representations, whose pairwise similarities are aligned to a kernel function that operates in input space and that handles missing values. This allows to learn good representations, even in the presence of a significant amount of missing data. To show the effectiveness of the proposed approach, we evaluate the quality of the learned representations in several classification tasks, including those involving medical data, and we compare to other methods for dimensionality reduction. Successively, we design two frameworks based on the proposed architecture: one for imputing missing data and another for one-class classification. Finally, we analyze under what circumstances an autoencoder with recurrent layers can learn better compressed representations of MTS than feed-forward architectures. △ Less

Submitted 16 July, 2019; v1 submitted 9 May, 2018; originally announced May 2018.

arXiv:1803.07879 [pdf, other]

An Unsupervised Multivariate Time Series Kernel Approach for Identifying Patients with Surgical Site Infection from Blood Samples

Authors: Karl Øyvind Mikalsen, Cristina Soguero-Ruiz, Filippo Maria Bianchi, Arthur Revhaug, Robert Jenssen

Abstract: A large fraction of the electronic health records consists of clinical measurements collected over time, such as blood tests, which provide important information about the health status of a patient. These sequences of clinical measurements are naturally represented as time series, characterized by multiple variables and the presence of missing data, which complicate analysis. In this work, we pro… ▽ More A large fraction of the electronic health records consists of clinical measurements collected over time, such as blood tests, which provide important information about the health status of a patient. These sequences of clinical measurements are naturally represented as time series, characterized by multiple variables and the presence of missing data, which complicate analysis. In this work, we propose a surgical site infection detection framework for patients undergoing colorectal cancer surgery that is completely unsupervised, hence alleviating the problem of getting access to labelled training data. The framework is based on powerful kernels for multivariate time series that account for missing data when computing similarities. Our approach show superior performance compared to baselines that have to resort to imputation techniques and performs comparable to a supervised classification baseline. △ Less

Submitted 21 March, 2018; originally announced March 2018.

arXiv:1803.07870 [pdf, other]

Reservoir computing approaches for representation and classification of multivariate time series

Authors: Filippo Maria Bianchi, Simone Scardapane, Sigurd Løkse, Robert Jenssen

Abstract: Classification of multivariate time series (MTS) has been tackled with a large variety of methodologies and applied to a wide range of scenarios. Reservoir Computing (RC) provides efficient tools to generate a vectorial, fixed-size representation of the MTS that can be further processed by standard classifiers. Despite their unrivaled training speed, MTS classifiers based on a standard RC architec… ▽ More Classification of multivariate time series (MTS) has been tackled with a large variety of methodologies and applied to a wide range of scenarios. Reservoir Computing (RC) provides efficient tools to generate a vectorial, fixed-size representation of the MTS that can be further processed by standard classifiers. Despite their unrivaled training speed, MTS classifiers based on a standard RC architecture fail to achieve the same accuracy of fully trainable neural networks. In this paper we introduce the reservoir model space, an unsupervised approach based on RC to learn vectorial representations of MTS. Each MTS is encoded within the parameters of a linear model trained to predict a low-dimensional embedding of the reservoir dynamics. Compared to other RC methods, our model space yields better representations and attains comparable computational performance, thanks to an intermediate dimensionality reduction procedure. As a second contribution we propose a modular RC framework for MTS classification, with an associated open-source Python library. The framework provides different modules to seamlessly implement advanced RC architectures. The architectures are compared to other MTS classifiers, including deep learning models and time series kernels. Results obtained on benchmark and real-world MTS datasets show that RC classifiers are dramatically faster and, when implemented using our proposed representation, also achieve superior classification accuracy. △ Less

Submitted 7 June, 2020; v1 submitted 21 March, 2018; originally announced March 2018.

arXiv:1801.06845 [pdf, other]

Time series kernel similarities for predicting Paroxysmal Atrial Fibrillation from ECGs

Authors: Filippo Maria Bianchi, Lorenzo Livi, Alberto Ferrante, Jelena Milosevic, Miroslaw Malek

Abstract: We tackle the problem of classifying Electrocardiography (ECG) signals with the aim of predicting the onset of Paroxysmal Atrial Fibrillation (PAF). Atrial fibrillation is the most common type of arrhythmia, but in many cases PAF episodes are asymptomatic. Therefore, in order to help diagnosing PAF, it is important to design procedures for detecting and, more importantly, predicting PAF episodes.… ▽ More We tackle the problem of classifying Electrocardiography (ECG) signals with the aim of predicting the onset of Paroxysmal Atrial Fibrillation (PAF). Atrial fibrillation is the most common type of arrhythmia, but in many cases PAF episodes are asymptomatic. Therefore, in order to help diagnosing PAF, it is important to design procedures for detecting and, more importantly, predicting PAF episodes. We propose a method for predicting PAF events whose first step consists of a feature extraction procedure that represents each ECG as a multi-variate time series. Successively, we design a classification framework based on kernel similarities for multi-variate time series, capable of handling missing data. We consider different approaches to perform classification in the original space of the multi-variate time series and in an embedding space, defined by the kernel similarity measure. We achieve a classification accuracy comparable with state of the art methods, with the additional advantage of detecting the PAF onset up to 15 minutes in advance. △ Less

Submitted 4 April, 2018; v1 submitted 21 January, 2018; originally announced January 2018.

arXiv:1712.03560 [pdf, other]

A novel algorithm for online inexact string matching and its FPGA implementation

Authors: Alessandro Cinti, Filippo Maria Bianchi, Alessio Martino, Antonello Rizzi

Abstract: Accelerating inexact string matching procedures is of utmost importance when dealing with practical applications where huge amount of data must be processed in real time, as usual in bioinformatics or cybersecurity. Inexact matching procedures can yield multiple shadow hits, which must be filtered, according to some criterion, to obtain a concise and meaningful list of occurrences. The filtering p… ▽ More Accelerating inexact string matching procedures is of utmost importance when dealing with practical applications where huge amount of data must be processed in real time, as usual in bioinformatics or cybersecurity. Inexact matching procedures can yield multiple shadow hits, which must be filtered, according to some criterion, to obtain a concise and meaningful list of occurrences. The filtering procedures are often computationally demanding and are performed offline in a post-processing phase. This paper introduces a novel algorithm for Online Approximate String Matching (OASM) able to filter shadow hits on the fly, according to general purpose priority rules that greedily assign priorities to overlapping hits. An FPGA hardware implementation of OASM is proposed and compared with a serial software version. Even when implemented on entry level FPGAs, the proposed procedure can reach a high degree of parallelism and superior performance in time compared to the software implementation, while keeping low the usage of logic elements. This makes the developed architecture very competitive in terms of both performance and cost of the overall computing system. △ Less

Submitted 26 November, 2018; v1 submitted 10 December, 2017; originally announced December 2017.

arXiv:1711.08398 [pdf, other]

doi 10.1016/j.engappai.2016.05.007

Identifying user habits through data mining on call data records

Authors: Filippo Maria Bianchi, Antonello Rizzi, Alireza Sadeghian, Corrado Moiso

Abstract: In this paper we propose a framework for identifying patterns and regularities in the pseudo-anonymized Call Data Records (CDR) pertaining a generic subscriber of a mobile operator. We face the challenging task of automatically deriving meaningful information from the available data, by using an unsupervised procedure of cluster analysis and without including in the model any \textit{a-priori} kno… ▽ More In this paper we propose a framework for identifying patterns and regularities in the pseudo-anonymized Call Data Records (CDR) pertaining a generic subscriber of a mobile operator. We face the challenging task of automatically deriving meaningful information from the available data, by using an unsupervised procedure of cluster analysis and without including in the model any \textit{a-priori} knowledge on the applicative context. Clusters mining results are employed for understanding users' habits and to draw their characterizing profiles. We propose two implementations of the data mining procedure; the first is based on a novel system for clusters and knowledge discovery called LD-ABCD, capable of retrieving clusters and, at the same time, to automatically discover for each returned cluster the most appropriate dissimilarity measure (local metric). The second approach instead is based on PROCLUS, the well-know subclustering algorithm. The dataset under analysis contains records characterized only by few features and, consequently, we show how to generate additional fields which describe implicit information hidden in data. Finally, we propose an effective graphical representation of the results of the data-mining procedure, which can be easily understood and employed by analysts for practical applications. △ Less

Submitted 22 November, 2017; originally announced November 2017.

Journal ref: Engineering Applications of Artificial Intelligence, Elsevier, 2016

arXiv:1711.06516 [pdf, other]

Classification of postoperative surgical site infections from blood measurements with missing data using recurrent neural networks

Authors: Andreas Storvik Strauman, Filippo Maria Bianchi, Karl Øyvind Mikalsen, Michael Kampffmeyer, Cristina Soguero-Ruiz, Robert Jenssen

Abstract: Clinical measurements that can be represented as time series constitute an important fraction of the electronic health records and are often both uncertain and incomplete. Recurrent neural networks are a special class of neural networks that are particularly suitable to process time series data but, in their original formulation, cannot explicitly deal with missing data. In this paper, we explore… ▽ More Clinical measurements that can be represented as time series constitute an important fraction of the electronic health records and are often both uncertain and incomplete. Recurrent neural networks are a special class of neural networks that are particularly suitable to process time series data but, in their original formulation, cannot explicitly deal with missing data. In this paper, we explore imputation strategies for handling missing values in classifiers based on recurrent neural network (RNN) and apply a recently proposed recurrent architecture, the Gated Recurrent Unit with Decay, specifically designed to handle missing data. We focus on the problem of detecting surgical site infection in patients by analyzing time series of their blood sample measurements and we compare the results obtained with different RNN-based classifiers. △ Less

Submitted 17 November, 2017; originally announced November 2017.

arXiv:1711.06509 [pdf, other]

Bidirectional deep-readout echo state networks

Authors: Filippo Maria Bianchi, Simone Scardapane, Sigurd Løkse, Robert Jenssen

Abstract: We propose a deep architecture for the classification of multivariate time series. By means of a recurrent and untrained reservoir we generate a vectorial representation that embeds temporal relationships in the data. To improve the memorization capability, we implement a bidirectional reservoir, whose last state captures also past dependencies in the input. We apply dimensionality reduction to th… ▽ More We propose a deep architecture for the classification of multivariate time series. By means of a recurrent and untrained reservoir we generate a vectorial representation that embeds temporal relationships in the data. To improve the memorization capability, we implement a bidirectional reservoir, whose last state captures also past dependencies in the input. We apply dimensionality reduction to the final reservoir states to obtain compressed fixed size representations of the time series. These are subsequently fed into a deep feedforward network trained to perform the final classification. We test our architecture on benchmark datasets and on a real-world use-case of blood samples classification. Results show that our method performs better than a standard echo state network and, at the same time, achieves results comparable to a fully-trained recurrent network, but with a faster training. △ Less

Submitted 13 February, 2018; v1 submitted 17 November, 2017; originally announced November 2017.

arXiv:1710.07547 [pdf, other]

Learning compressed representations of blood samples time series with missing data

Authors: Filippo Maria Bianchi, Karl Øyvind Mikalsen, Robert Jenssen

Abstract: Clinical measurements collected over time are naturally represented as multivariate time series (MTS), which often contain missing data. An autoencoder can learn low dimensional vectorial representations of MTS that preserve important data characteristics, but cannot deal explicitly with missing data. In this work, we propose a new framework that combines an autoencoder with the Time series Cluste… ▽ More Clinical measurements collected over time are naturally represented as multivariate time series (MTS), which often contain missing data. An autoencoder can learn low dimensional vectorial representations of MTS that preserve important data characteristics, but cannot deal explicitly with missing data. In this work, we propose a new framework that combines an autoencoder with the Time series Cluster Kernel (TCK), a kernel that accounts for missingness patterns in MTS. Via kernel alignment, we incorporate TCK in the autoencoder to improve the learned representations in presence of missing data. We consider a classification problem of MTS with missing values, representing blood samples of patients with surgical site infection. With our approach, rather than with a standard autoencoder, we learn representations in low dimensions that can be classified better. △ Less

Submitted 20 October, 2017; originally announced October 2017.

arXiv:1705.04378 [pdf, other]

doi 10.1007/978-3-319-70338-1

An overview and comparative analysis of Recurrent Neural Networks for Short Term Load Forecasting

Authors: Filippo Maria Bianchi, Enrico Maiorino, Michael C. Kampffmeyer, Antonello Rizzi, Robert Jenssen

Abstract: The key component in forecasting demand and consumption of resources in a supply network is an accurate prediction of real-valued time series. Indeed, both service interruptions and resource waste can be reduced with the implementation of an effective forecasting system. Significant research has thus been devoted to the design and development of methodologies for short term load forecasting over t… ▽ More The key component in forecasting demand and consumption of resources in a supply network is an accurate prediction of real-valued time series. Indeed, both service interruptions and resource waste can be reduced with the implementation of an effective forecasting system. Significant research has thus been devoted to the design and development of methodologies for short term load forecasting over the past decades. A class of mathematical models, called Recurrent Neural Networks, are nowadays gaining renewed interest among researchers and they are replacing many practical implementation of the forecasting systems, previously based on static methods. Despite the undeniable expressive power of these architectures, their recurrent nature complicates their understanding and poses challenges in the training procedures. Recently, new important families of recurrent architectures have emerged and their applicability in the context of load forecasting has not been investigated completely yet. In this paper we perform a comparative study on the problem of Short-Term Load Forecast, by using different classes of state-of-the-art Recurrent Neural Networks. We test the reviewed models first on controlled synthetic tasks and then on different real datasets, covering important practical cases of study. We provide a general overview of the most important architectures and we define guidelines for configuring the recurrent networks to predict real-valued time series. △ Less

Submitted 20 July, 2018; v1 submitted 11 May, 2017; originally announced May 2017.

Comments: Springer Briefs in Computer Science (ISBN 978-3-319-70338-1), 2017

arXiv:1704.00794 [pdf, other]

Time Series Cluster Kernel for Learning Similarities between Multivariate Time Series with Missing Data

Authors: Karl Øyvind Mikalsen, Filippo Maria Bianchi, Cristina Soguero-Ruiz, Robert Jenssen

Abstract: Similarity-based approaches represent a promising direction for time series analysis. However, many such methods rely on parameter tuning, and some have shortcomings if the time series are multivariate (MTS), due to dependencies between attributes, or the time series contain missing data. In this paper, we address these challenges within the powerful context of kernel methods by proposing the robu… ▽ More Similarity-based approaches represent a promising direction for time series analysis. However, many such methods rely on parameter tuning, and some have shortcomings if the time series are multivariate (MTS), due to dependencies between attributes, or the time series contain missing data. In this paper, we address these challenges within the powerful context of kernel methods by proposing the robust \emph{time series cluster kernel} (TCK). The approach taken leverages the missing data handling properties of Gaussian mixture models (GMM) augmented with informative prior distributions. An ensemble learning approach is exploited to ensure robustness to parameters by combining the clustering results of many GMM to form the final kernel. We evaluate the TCK on synthetic and real data and compare to other state-of-the-art techniques. The experimental results demonstrate that the TCK is robust to parameter choices, provides competitive results for MTS without missing data and outstanding results for missing data. △ Less

Submitted 29 June, 2017; v1 submitted 3 April, 2017; originally announced April 2017.

Comments: 23 pages, 6 figures

arXiv:1702.03176 [pdf, other]

A clustering approach to heterogeneous change detection

Authors: Luigi Tommaso Luppino, Stian Normann Anfinsen, Gabriele Moser, Robert Jenssen, Filippo Maria Bianchi, Sebastiano Serpico, Gregoire Mercier

Abstract: Change detection in heterogeneous multitemporal satellite images is a challenging and still not much studied topic in remote sensing and earth observation. This paper focuses on comparison of image pairs covering the same geographical area and acquired by two different sensors, one optical radiometer and one synthetic aperture radar, at two different times. We propose a clustering-based technique… ▽ More Change detection in heterogeneous multitemporal satellite images is a challenging and still not much studied topic in remote sensing and earth observation. This paper focuses on comparison of image pairs covering the same geographical area and acquired by two different sensors, one optical radiometer and one synthetic aperture radar, at two different times. We propose a clustering-based technique to detect changes, identified as clusters that split or merge in the different images. To evaluate potentials and limitations of our method, we perform experiments on real data. Preliminary results confirm the relationship between splits and merges of clusters and the occurrence of changes. However, it becomes evident that it is necessary to incorporate prior, ancillary, or application-specific information to improve the interpretation of clustering results and to identify unambiguously the areas of change. △ Less

Submitted 10 February, 2017; originally announced February 2017.

arXiv:1702.02526 [pdf, other]

Deep Kernelized Autoencoders

Authors: Michael Kampffmeyer, Sigurd Løkse, Filippo Maria Bianchi, Robert Jenssen, Lorenzo Livi

Abstract: In this paper we introduce the deep kernelized autoencoder, a neural network model that allows an explicit approximation of (i) the mapping from an input space to an arbitrary, user-specified kernel space and (ii) the back-projection from such a kernel space to input space. The proposed method is based on traditional autoencoders and is trained through a new unsupervised loss function. During trai… ▽ More In this paper we introduce the deep kernelized autoencoder, a neural network model that allows an explicit approximation of (i) the mapping from an input space to an arbitrary, user-specified kernel space and (ii) the back-projection from such a kernel space to input space. The proposed method is based on traditional autoencoders and is trained through a new unsupervised loss function. During training, we optimize both the reconstruction accuracy of input samples and the alignment between a kernel matrix given as prior and the inner products of the hidden representations computed by the autoencoder. Kernel alignment provides control over the hidden representation learned by the autoencoder. Experiments have been performed to evaluate both reconstruction and kernel alignment performance. Additionally, we applied our method to emulate kPCA on a denoising task obtaining promising results. △ Less

Submitted 8 February, 2017; originally announced February 2017.

arXiv:1701.05159 [pdf, other]

Temporal Overdrive Recurrent Neural Network

Authors: Filippo Maria Bianchi, Michael Kampffmeyer, Enrico Maiorino, Robert Jenssen

Abstract: In this work we present a novel recurrent neural network architecture designed to model systems characterized by multiple characteristic timescales in their dynamics. The proposed network is composed by several recurrent groups of neurons that are trained to separately adapt to each timescale, in order to improve the system identification process. We test our framework on time series prediction ta… ▽ More In this work we present a novel recurrent neural network architecture designed to model systems characterized by multiple characteristic timescales in their dynamics. The proposed network is composed by several recurrent groups of neurons that are trained to separately adapt to each timescale, in order to improve the system identification process. We test our framework on time series prediction tasks and we show some promising, preliminary results achieved on synthetic data. To evaluate the capabilities of our network, we compare the performance with several state-of-the-art recurrent architectures. △ Less

Submitted 18 January, 2017; originally announced January 2017.

arXiv:1609.03068 [pdf, other]

doi 10.1038/srep44037

Multiplex visibility graphs to investigate recurrent neural networks dynamics

Authors: Filippo Maria Bianchi, Lorenzo Livi, Cesare Alippi, Robert Jenssen

Abstract: A recurrent neural network (RNN) is a universal approximator of dynamical systems, whose performance often depends on sensitive hyperparameters. Tuning of such hyperparameters may be difficult and, typically, based on a trial-and-error approach. In this work, we adopt a graph-based framework to interpret and characterize the internal RNN dynamics. Through this insight, we are able to design a prin… ▽ More A recurrent neural network (RNN) is a universal approximator of dynamical systems, whose performance often depends on sensitive hyperparameters. Tuning of such hyperparameters may be difficult and, typically, based on a trial-and-error approach. In this work, we adopt a graph-based framework to interpret and characterize the internal RNN dynamics. Through this insight, we are able to design a principled unsupervised method to derive configurations with maximized performances, in terms of prediction error and memory capacity. In particular, we propose to model time series of neurons activations with the recently introduced horizontal visibility graphs, whose topological properties reflect important dynamical features of the underlying dynamic system. Successively, each graph becomes a layer of a larger structure, called multiplex. We show that topological properties of such a multiplex reflect important features of RNN dynamics and are used to guide the tuning procedure. To validate the proposed method, we consider a class of RNNs called echo state networks. We perform experiments and discuss results on several benchmarks and real-world dataset of call data records. △ Less

Submitted 20 January, 2017; v1 submitted 10 September, 2016; originally announced September 2016.

arXiv:1608.04622 [pdf, other]

doi 10.1007/s12559-017-9450-z

Training Echo State Networks with Regularization through Dimensionality Reduction

Authors: Sigurd Løkse, Filippo Maria Bianchi, Robert Jenssen

Abstract: In this paper we introduce a new framework to train an Echo State Network to predict real valued time-series. The method consists in projecting the output of the internal layer of the network on a space with lower dimensionality, before training the output layer to learn the target task. Notably, we enforce a regularization constraint that leads to better generalization capabilities. We evaluate t… ▽ More In this paper we introduce a new framework to train an Echo State Network to predict real valued time-series. The method consists in projecting the output of the internal layer of the network on a space with lower dimensionality, before training the output layer to learn the target task. Notably, we enforce a regularization constraint that leads to better generalization capabilities. We evaluate the performances of our approach on several benchmark tests, using different techniques to train the readout of the network, achieving superior predictive performance when using the proposed framework. Finally, we provide an insight on the effectiveness of the implemented mechanics through a visualization of the trajectory in the phase space and relying on the methodologies of nonlinear time-series analysis. By applying our method on well known chaotic systems, we provide evidence that the lower dimensional embedding retains the dynamical properties of the underlying system better than the full-dimensional internal states of the network. △ Less

Submitted 16 August, 2016; originally announced August 2016.

arXiv:1603.03685 [pdf, other]

doi 10.1109/TNNLS.2016.2644268

Determination of the edge of criticality in echo state networks through Fisher information maximization

Authors: Lorenzo Livi, Filippo Maria Bianchi, Cesare Alippi

Abstract: It is a widely accepted fact that the computational capability of recurrent neural networks is maximized on the so-called "edge of criticality". Once the network operates in this configuration, it performs efficiently on a specific application both in terms of (i) low prediction error and (ii) high short-term memory capacity. Since the behavior of recurrent networks is strongly influenced by the p… ▽ More It is a widely accepted fact that the computational capability of recurrent neural networks is maximized on the so-called "edge of criticality". Once the network operates in this configuration, it performs efficiently on a specific application both in terms of (i) low prediction error and (ii) high short-term memory capacity. Since the behavior of recurrent networks is strongly influenced by the particular input signal driving the dynamics, a universal, application-independent method for determining the edge of criticality is still missing. In this paper, we aim at addressing this issue by proposing a theoretically motivated, unsupervised method based on Fisher information for determining the edge of criticality in recurrent neural networks. It is proven that Fisher information is maximized for (finite-size) systems operating in such critical regions. However, Fisher information is notoriously difficult to compute and either requires the probability density function or the conditional dependence of the system states with respect to the model parameters. The paper takes advantage of a recently-developed non-parametric estimator of the Fisher information matrix and provides a method to determine the critical region of echo state networks, a particular class of recurrent networks. The considered control parameters, which indirectly affect the echo state network performance, are explored to identify those configurations lying on the edge of criticality and, as such, maximizing Fisher information and computational performance. Experimental results on benchmarks and real-world data demonstrate the effectiveness of the proposed method. △ Less

Submitted 2 September, 2016; v1 submitted 11 March, 2016; originally announced March 2016.

Showing 1–50 of 54 results for author: Bianchi, F M