-
Information-theoretic Quantification of High-order Feature Effects in Classification Problems
Authors:
Ivan Lazic,
Chiara Barà,
Marta Iovino,
Sebastiano Stramaglia,
Niksa Jakovljevic,
Luca Faes
Abstract:
Understanding the contribution of individual features in predictive models remains a central goal in interpretable machine learning, and while many model-agnostic methods exist to estimate feature importance, they often fall short in capturing high-order interactions and disentangling overlapping contributions. In this work, we present an information-theoretic extension of the High-order interactions for Feature importance (Hi-Fi) method, leveraging Conditional Mutual Information (CMI) estimated via a k-Nearest Neighbor (kNN) approach that works on mixed discrete and continuous random variables. Our framework decomposes feature contributions into unique, synergistic, and redundant components, offering a richer, model-independent understanding of their predictive roles. We validate the method on synthetic datasets with known Gaussian structure, for which the ground-truth interaction patterns can be derived analytically, and further test it on non-Gaussian data and on real-world gene expression data from TCGA-BRCA. Results indicate that the proposed estimator accurately recovers the theoretical and expected findings, suggesting potential uses in feature selection algorithms and in model development based on interaction analysis.
Submitted 6 July, 2025;
originally announced July 2025.
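The Hi-Fi entry above validates its kNN estimator on Gaussian systems whose ground truth is derived analytically. For jointly Gaussian variables, CMI has a closed form in terms of log-determinants of covariance sub-matrices: I(X;Y|Z) = 1/2 ln(|S_XZ| |S_YZ| / (|S_Z| |S_XYZ|)), where S_A is the covariance sub-matrix of the variables in A. The minimal sketch below computes that reference value; it is not the authors' kNN estimator, and the helper names (`det`, `logdet`, `gaussian_cmi`) and the toy covariance are hypothetical.

```python
import math


def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def logdet(cov, idx):
    """Log-determinant of the covariance sub-matrix on indices idx (0 if idx is empty)."""
    if not idx:
        return 0.0
    return math.log(det([[cov[i][j] for j in idx] for i in idx]))


def gaussian_cmi(cov, x, y, z=()):
    """I(X;Y|Z) in nats for jointly Gaussian variables; x, y, z are index lists."""
    x, y, z = list(x), list(y), list(z)
    return 0.5 * (logdet(cov, x + z) + logdet(cov, y + z)
                  - logdet(cov, z) - logdet(cov, x + y + z))


# Toy system (hypothetical): X0, X1 independent with unit variance,
# Y = X0 + X1 + noise with noise variance 0.1; variable order [X0, X1, Y].
cov = [[1.0, 0.0, 1.0],
       [0.0, 1.0, 1.0],
       [1.0, 1.0, 2.1]]
mi = gaussian_cmi(cov, [0], [2])        # I(Y;X0) = 0.5*ln(2.1/1.1)
cmi = gaussian_cmi(cov, [0], [2], [1])  # I(Y;X0|X1) = 0.5*ln(1.1/0.1)
```

Here conditioning on X1 raises the information X0 carries about Y, the kind of synergistic effect a unique/synergistic/redundant decomposition is meant to expose. Comparing an estimator's output with such closed-form values is one way to check its bias.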
-
Localizing synergies of hidden factors across complex systems: resting brain networks and HeLa gene expression profile as case studies
Authors:
Marlis Ontivero-Ortega,
Gorana Mijatovic,
Luca Faes,
Daniele Marinazzo,
Sebastiano Stramaglia
Abstract:
Factor analysis is a well-known statistical method to describe the variability of observed variables in terms of a smaller number of unobserved latent variables called factors. Even though latent factors are conceptually independent of each other, their influence on the observed variables is often joint and synergistic. We propose to quantify the synergy of the joint influence of factors on the observed variables using the O-information, a recently introduced metric for assessing high-order dependencies in complex systems, within a new framework in which latent factors and observed variables are analyzed together in terms of their joint informational character. Two case studies are reported: analyzing resting fMRI data, we find that the default mode network (DMN) and the frontoparietal (FP) network show the highest synergy, consistent with their crucial role in higher cognitive functions; concerning HeLa cells, we find that the most synergistic gene is STK-12 (AURKB), suggesting that this gene is involved in controlling the HeLa cell cycle. We believe that this approach, representing a bridge between factor analysis and the field of high-order interactions, will find wide application across several domains.
Submitted 27 May, 2025;
originally announced June 2025.
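The O-information used in the entry above is Omega = (n - 2) H(X) + sum_j [H(X_j) - H(X_-j)], where X_-j denotes all variables except X_j; positive values indicate redundancy-dominated dependencies and negative values synergy-dominated ones. For jointly Gaussian variables the ln(2*pi*e) constants cancel, so Omega depends only on log-determinants of the covariance matrix. The sketch below assumes Gaussian data with a known covariance; it does not reproduce the paper's joint factor/observed-variable framework, and the function name and toy covariances are illustrative.

```python
import math


def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def o_information(cov):
    """O-information (nats) of a jointly Gaussian vector with covariance `cov`.

    Omega = (n-2) H(X) + sum_j [H(X_j) - H(X_{-j})]; the Gaussian entropy
    constants cancel, leaving only log-determinants.
    """
    n = len(cov)
    total = (n - 2) * math.log(det(cov))
    for j in range(n):
        rest = [i for i in range(n) if i != j]
        sub = [[cov[a][b] for b in rest] for a in rest]
        total += math.log(cov[j][j]) - math.log(det(sub))
    return 0.5 * total


# Redundancy-dominated: three variables with pairwise correlation 0.5.
redundant = [[1.0, 0.5, 0.5],
             [0.5, 1.0, 0.5],
             [0.5, 0.5, 1.0]]
# Synergy-dominated: X2 = X0 + X1 + noise (variance 0.1), X0 and X1 independent.
synergistic = [[1.0, 0.0, 1.0],
               [0.0, 1.0, 1.0],
               [1.0, 1.0, 2.1]]
omega_red = o_information(redundant)    # positive
omega_syn = o_information(synergistic)  # negative
```

The sign flip between the two toy systems is what allows Omega to label a group of variables as redundancy- or synergy-dominated.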
-
Assessing high-order effects in feature importance via predictability decomposition
Authors:
Marlis Ontivero-Ortega,
Luca Faes,
Jesus M Cortes,
Daniele Marinazzo,
Sebastiano Stramaglia
Abstract:
Leveraging the large body of work devoted in recent years to describing redundancy and synergy in multivariate interactions among random variables, we propose a novel approach to quantify cooperative effects in feature importance, one of the most widely used techniques for explainable artificial intelligence. In particular, we propose an adaptive version of a well-known metric of feature importance, named Leave One Covariate Out (LOCO), to disentangle high-order effects involving a given input feature in regression problems. LOCO is the reduction of the prediction error when the feature under consideration is added to the set of all the features used for regression. Instead of calculating LOCO using all the available features, as in its standard version, our method searches for the multiplet of features that maximizes LOCO and for the one that minimizes it. This provides a decomposition of LOCO as the sum of a two-body component and higher-order components (redundant and synergistic), also highlighting the features that contribute to building these high-order effects alongside the driving feature. We report an application to proton/pion discrimination from detector measurements simulated with GEANT.
Submitted 12 March, 2025; v1 submitted 13 December, 2024;
originally announced December 2024.
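For jointly Gaussian variables and linear regression, the population residual variance of a target y given a feature set S is the Schur complement |S_{S,y}| / |S_S| of the covariance matrix, so the LOCO of feature i given a multiplet S can be written as the drop in that residual variance when i joins S. The sketch below searches every multiplet exhaustively for the minimum and maximum LOCO. The split into higher-order parts (redundancy = pairwise LOCO - minimum, synergy = maximum - pairwise LOCO) is an assumption made for illustration, as are the helper names and the toy covariance; exhaustive search is only feasible for a handful of features.

```python
from itertools import combinations


def det(m):
    """Determinant via Gaussian elimination with partial pivoting (1.0 for an empty matrix)."""
    a = [list(row) for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def resvar(cov, target, s):
    """Residual variance of `target` after linear regression on features s (Schur complement)."""
    s = list(s)
    full = s + [target]
    num = det([[cov[a][b] for b in full] for a in full])
    den = det([[cov[a][b] for b in s] for a in s])
    return num / den


def loco(cov, target, i, s):
    """Reduction in residual variance when feature i is added to multiplet s."""
    return resvar(cov, target, s) - resvar(cov, target, list(s) + [i])


def loco_extremes(cov, target, i, features):
    """Exhaustive search over all multiplets of the other features: (min, max) as (value, set)."""
    others = [f for f in features if f != i]
    scores = [(loco(cov, target, i, s), s)
              for k in range(len(others) + 1)
              for s in combinations(others, k)]
    return min(scores), max(scores)


# Toy covariance (hypothetical), order [x0, x1, x2, y]:
# x0, x1 independent, x2 = x0 + noise (var 0.09), y = x0 + x1 + noise (var 0.1).
cov = [[1.0, 0.0, 1.0, 1.0],
       [0.0, 1.0, 0.0, 1.0],
       [1.0, 0.0, 1.09, 1.0],
       [1.0, 1.0, 1.0, 2.1]]
pairwise = loco(cov, 3, 0, ())
(lo_val, lo_set), (hi_val, hi_set) = loco_extremes(cov, 3, 0, [0, 1, 2])
redundancy = pairwise - lo_val  # assumed definition
synergy = hi_val - pairwise     # assumed definition
```

Because x2 is a noisy copy of x0, conditioning on it collapses the LOCO of x0 from 1.0 to about 0.083, and the minimizing multiplet contains x2; this is the "redundant partner" the abstract says the method highlights.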
-
Partial information decomposition for mixed discrete and continuous random variables
Authors:
Chiara Barà,
Yuri Antonacci,
Marta Iovino,
Ivan Lazic,
Luca Faes
Abstract:
The framework of Partial Information Decomposition (PID) unveils complex nonlinear interactions in network systems by dissecting the mutual information (MI) between a target variable and several source variables. While PID measures have been formulated mostly for discrete variables, with only recent extensions to continuous systems, the case of mixed variables, where the target is discrete and the sources are continuous, is not yet properly covered. Here, we introduce a PID scheme whereby the MI between a specific state of the discrete target and (subsets of) the continuous sources is expressed as a Kullback-Leibler divergence and is estimated through a data-efficient nearest-neighbor strategy. The effectiveness of this PID is demonstrated in simulated systems of mixed variables and showcased in a physiological application. Our approach is relevant to many scientific problems, including sensory coding in neuroscience and feature selection in machine learning.
Submitted 20 September, 2024;
originally announced September 2024.
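The entry above expresses the MI between a specific state of a discrete target and continuous sources as a Kullback-Leibler divergence. The underlying identity is I(X;Y) = sum_y p(y) D_KL(p(x|y) || p(x)), where each term D_KL(p(x|y) || p(x)) is the information carried about target state y. The sketch below evaluates it for a binary target with Gaussian class-conditional densities using trapezoidal quadrature, which stands in for the paper's nearest-neighbor estimator; the Gaussian model, grid limits, and function names are assumptions.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def specific_kl(priors, mus, sigma=1.0, lo=-12.0, hi=12.0, n=4001):
    """D_KL(p(x|y) || p(x)) in nats for each state y of a discrete target whose
    class-conditional densities are N(mu_y, sigma^2); trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / (n - 1)
    grid = [lo + k * h for k in range(n)]
    # Marginal density of x: mixture of the class-conditional densities.
    marg = [sum(p * normal_pdf(x, m, sigma) for p, m in zip(priors, mus)) for x in grid]
    out = []
    for mu_y in mus:
        f = []
        for x, px in zip(grid, marg):
            pc = normal_pdf(x, mu_y, sigma)
            f.append(pc * math.log(pc / px) if pc > 0.0 else 0.0)
        out.append(h * (sum(f) - 0.5 * (f[0] + f[-1])))
    return out


def mixed_mi(priors, mus, sigma=1.0):
    """I(X;Y) = sum_y p(y) * D_KL(p(x|y) || p(x))."""
    return sum(p * d for p, d in zip(priors, specific_kl(priors, mus, sigma)))


mi_none = mixed_mi([0.5, 0.5], [0.0, 0.0])   # identical classes: no information
mi_some = mixed_mi([0.5, 0.5], [-1.0, 1.0])  # overlapping classes
mi_full = mixed_mi([0.5, 0.5], [-5.0, 5.0])  # nearly separable: close to ln 2
```

For a balanced binary target the MI is bounded by ln 2, reached only when the class-conditional densities do not overlap; the per-state terms returned by `specific_kl` are the state-specific quantities that a PID scheme of this kind builds on.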
-
Disentangling high order effects in the transfer entropy
Authors:
Sebastiano Stramaglia,
Luca Faes,
Jesus M. Cortes,
Daniele Marinazzo
Abstract:
Transfer Entropy (TE), the primary method for determining directed information flow within a network system, can be biased, by either deficiency or excess, in both pairwise and conditioned calculations, owing to high-order dependencies between the dynamic processes under consideration and the remaining processes in the system used for conditioning. Here, we propose a novel approach: instead of conditioning TE on all network processes except the driver and target, as in its fully conditioned version, or not conditioning at all, as in the pairwise approach, our method searches for both the multiplets of variables that maximize information flow and those that minimize it. This provides a decomposition of TE into unique, redundant, and synergistic atoms. Our approach enables the quantification of the relative importance of high-order effects compared to pure two-body effects in information transfer between two processes, while also highlighting the processes that contribute to building these high-order effects alongside the driver. We demonstrate the application of our approach in climatology by analyzing data from El Niño and the Southern Oscillation.
Submitted 13 February, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
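The entry above notes that pairwise and conditioned TE can both be biased by high-order dependencies. A minimal illustration, assuming linear Gaussian dynamics where TE equals half the log-ratio of restricted and full residual variances, with a single lag and least-squares fits: in the hypothetical toy system below, X and Z carry largely redundant information about the future of Y, so conditioning on Z makes the X-to-Y TE almost vanish. This does not implement the paper's multiplet search; the system and all names are assumptions.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for k in range(c, n + 1):
                    m[r][k] -= f * m[c][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def residual_variance(target, regressors):
    """Mean squared residual of a least-squares fit without intercept."""
    xtx = [[sum(u * v for u, v in zip(ri, rj)) for rj in regressors] for ri in regressors]
    xty = [sum(u * v for u, v in zip(ri, target)) for ri in regressors]
    beta = solve(xtx, xty)
    res = [t - sum(b * col[k] for b, col in zip(beta, regressors)) for k, t in enumerate(target)]
    return sum(e * e for e in res) / len(res)


def gaussian_te(y, x, z=None):
    """Linear-Gaussian TE from x to y (nats) using one lag, optionally conditioned on z."""
    target = y[1:]
    base = [y[:-1]] + ([z[:-1]] if z is not None else [])
    restricted = residual_variance(target, base)
    full = residual_variance(target, base + [x[:-1]])
    return 0.5 * math.log(restricted / full)


# Toy system (hypothetical): x is a noisy copy of z, and both drive y one step later.
rng = random.Random(0)
n = 5000
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [zi + 0.3 * rng.gauss(0.0, 1.0) for zi in z]
y = [0.0] + [0.5 * (x[t - 1] + z[t - 1]) + 0.5 * rng.gauss(0.0, 1.0) for t in range(1, n)]

te_pairwise = gaussian_te(y, x)        # population value about 0.77 nats
te_conditioned = gaussian_te(y, x, z)  # population value 0.5*ln(1.09), about 0.04 nats
```

Neither number alone tells the whole story: the pairwise value includes information X shares with Z, while the conditioned value hides it, which is the motivation for searching over conditioning multiplets instead.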
-
Gradients of O-information: low-order descriptors of high-order dependencies
Authors:
Tomas Scagliarini,
Davide Nuzzi,
Yuri Antonacci,
Luca Faes,
Fernando E. Rosas,
Daniele Marinazzo,
Sebastiano Stramaglia
Abstract:
O-information is an information-theoretic metric that captures the overall balance between redundant and synergistic information shared by groups of three or more variables. To complement the global assessment provided by this metric, here we propose the gradients of the O-information as low-order descriptors that can characterise how high-order effects are localised across a system of interest. We illustrate the capabilities of the proposed framework by revealing the role of specific spins in Ising models with frustration, and in a practical analysis of US macroeconomic data. Our theoretical and empirical analyses demonstrate the potential of these gradients to highlight the contribution of variables in forming high-order informational circuits.
Submitted 1 July, 2022;
originally announced July 2022.
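A first-order gradient of the O-information can be taken as the change in Omega when one variable is removed, grad_k = Omega(X) - Omega(X_-k); this definition is assumed here for illustration. The sketch evaluates it for Gaussian variables with a known covariance: a synergistic triplet plus one independent variable. Function names and the toy covariance are hypothetical.

```python
import math


def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def logdet(cov, idx):
    return math.log(det([[cov[i][j] for j in idx] for i in idx])) if idx else 0.0


def o_info(cov, idx):
    """Gaussian O-information of the variables in idx (identically zero for fewer than three)."""
    n = len(idx)
    if n < 3:
        return 0.0
    total = (n - 2) * logdet(cov, idx)
    for j in idx:
        total += math.log(cov[j][j]) - logdet(cov, [i for i in idx if i != j])
    return 0.5 * total


def first_order_gradients(cov):
    """Assumed definition: grad_k = Omega(all variables) - Omega(all but k)."""
    idx = list(range(len(cov)))
    full = o_info(cov, idx)
    return [full - o_info(cov, [i for i in idx if i != k]) for k in idx]


# Synergistic triplet (X2 = X0 + X1 + noise, variance 0.1) plus an independent X3.
cov = [[1.0, 0.0, 1.0, 0.0],
       [0.0, 1.0, 1.0, 0.0],
       [1.0, 1.0, 2.1, 0.0],
       [0.0, 0.0, 0.0, 1.0]]
grads = first_order_gradients(cov)
```

The gradient for X3 is zero, since removing a variable that takes no part in the interaction leaves Omega unchanged, while each member of the triplet carries the full negative (synergistic) value: the gradients localise the high-order effect that the global Omega only summarises.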
-
Local Granger Causality
Authors:
Sebastiano Stramaglia,
Tomas Scagliarini,
Yuri Antonacci,
Luca Faes
Abstract:
Granger causality is a statistical notion of causal influence based on prediction via vector autoregression. For Gaussian variables it is equivalent to transfer entropy, an information-theoretic measure of time-directed information transfer between jointly dependent processes. We exploit this equivalence and calculate exactly the 'local Granger causality', i.e. the profile of the information transfer at each discrete time point in Gaussian processes; in this framework, Granger causality is the average of its local version. Our approach offers a robust and computationally fast method to follow the information transfer along the time history of linear stochastic processes, as well as of nonlinear complex systems studied in the Gaussian approximation.
Submitted 26 October, 2020;
originally announced October 2020.
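Under Gaussian assumptions the local information transfer at time t is the log-ratio of the full and restricted predictive densities of y_t. Taking Granger causality in the log-ratio form F = ln(s_r^2 / s_f^2), which for Gaussian processes equals twice the TE in nats, the local value is ln(s_r^2 / s_f^2) + e_r(t)^2 / s_r^2 - e_f(t)^2 / s_f^2, where e_r and e_f are residuals of the restricted and full regressions. When the variances are the mean squared residuals, the time average of this profile equals F exactly. The sketch uses one lag, least-squares fits, and a hypothetical VAR(1); it is not the authors' code, and the names are illustrative.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for k in range(c, n + 1):
                    m[r][k] -= f * m[c][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_residuals(target, regressors):
    """Residuals of a least-squares fit of target on the regressor columns (no intercept)."""
    xtx = [[sum(u * v for u, v in zip(ri, rj)) for rj in regressors] for ri in regressors]
    xty = [sum(u * v for u, v in zip(ri, target)) for ri in regressors]
    beta = solve(xtx, xty)
    return [t - sum(b * col[k] for b, col in zip(beta, regressors)) for k, t in enumerate(target)]


def local_granger(y, x):
    """GC from x to y (one lag, log-ratio form) and its local profile over time."""
    target = y[1:]
    e_r = ols_residuals(target, [y[:-1]])          # restricted model: own past only
    e_f = ols_residuals(target, [y[:-1], x[:-1]])  # full model: own past + driver past
    s_r = sum(e * e for e in e_r) / len(e_r)
    s_f = sum(e * e for e in e_f) / len(e_f)
    gc = math.log(s_r / s_f)
    # Twice the log-ratio of the Gaussian predictive densities at each time point.
    local = [gc + er * er / s_r - ef * ef / s_f for er, ef in zip(e_r, e_f)]
    return gc, local


# Toy VAR(1) (hypothetical): x drives y with one lag.
rng = random.Random(42)
n = 2000
x, y = [0.0], [0.0]
for t in range(1, n):
    x.append(0.5 * x[t - 1] + rng.gauss(0.0, 1.0))
    y.append(0.4 * y[t - 1] + 0.6 * x[t - 1] + rng.gauss(0.0, 1.0))

gc, local = local_granger(y, x)
```

Averaging `local` recovers `gc`, matching the statement that Granger causality is the average of its local version, while individual time points can be negative, i.e. locally misinformative.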
-
Multiscale Analysis of Information Dynamics for Linear Multivariate Processes
Authors:
Luca Faes,
Alessandro Montalto,
Sebastiano Stramaglia,
Giandomenico Nollo,
Daniele Marinazzo
Abstract:
In the study of complex physical and physiological systems represented by multivariate time series, an issue of great interest is the description of the system dynamics over a range of different temporal scales. While information-theoretic approaches to the multiscale analysis of complex dynamics are being increasingly used, the theoretical properties of the applied measures are poorly understood. This study introduces for the first time a framework for the analytical computation of information dynamics for linear multivariate stochastic processes explored at different time scales. After showing that the multiscale processing of a vector autoregressive (VAR) process introduces a moving average (MA) component, we describe how to represent the resulting VARMA process using state-space (SS) models and how to exploit the SS model parameters to compute analytical measures of information storage and information transfer for the original and rescaled processes. The framework is then used to quantify multiscale information dynamics for simulated unidirectionally and bidirectionally coupled VAR processes, showing that rescaling may lead to insightful patterns of information storage and transfer but also to potentially misleading behaviors.
Submitted 24 February, 2016; v1 submitted 19 February, 2016;
originally announced February 2016.
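The claim that multiscale processing of a VAR process introduces an MA component can be checked on the simplest case, a scalar AR(1) x_t = a x_{t-1} + e_t. Assuming rescaling by a length-tau moving-average filter followed by downsampling by tau (the coarse-graining of classical multiscale entropy; the paper's filter may differ), the autocovariance of the rescaled series follows from a double sum of AR(1) autocovariances. A pure AR(1) must satisfy rho(2) = rho(1)^2; the rescaled series does not, as expected for an ARMA(1,1). Function names are illustrative.

```python
def ar1_autocov(a, k, noise_var=1.0):
    """Autocovariance at lag k of the AR(1) process x_t = a*x_{t-1} + e_t."""
    return noise_var * a ** abs(k) / (1.0 - a * a)


def rescaled_acf(a, tau, max_lag):
    """Autocorrelation of the AR(1) after a length-tau moving average and
    downsampling by tau (lags counted on the coarse time scale)."""
    def gamma_y(k):
        # y_t = (1/tau) * sum_{j<tau} x_{t-j}, so its autocovariance is a double sum.
        return sum(ar1_autocov(a, k + i - j)
                   for i in range(tau) for j in range(tau)) / tau ** 2
    g0 = gamma_y(0)
    return [gamma_y(m * tau) / g0 for m in range(max_lag + 1)]


rho_orig = rescaled_acf(0.7, 1, 2)    # tau = 1: the original AR(1)
rho_coarse = rescaled_acf(0.7, 2, 2)  # tau = 2
```

For tau = 2 and a = 0.7, the ratio rho(2)/rho(1) equals a^2 = 0.49, the AR coefficient of the downsampled process, while rho(1) = a(1 + a)/2 = 0.595 differs from it; that mismatch is the footprint of the moving-average term, and it is why a state-space (VARMA) representation is needed for the rescaled process.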
-
Neural Networks with Non-Uniform Embedding and Explicit Validation Phase to Assess Granger Causality
Authors:
Alessandro Montalto,
Sebastiano Stramaglia,
Luca Faes,
Giovanni Tessitore,
Roberto Prevete,
Daniele Marinazzo
Abstract:
A challenging problem when studying a dynamical system is to find the interdependencies among its individual components. Several algorithms have been proposed to detect directed dynamical influences between time series. Two of the most widely used approaches are a model-free one (transfer entropy) and a model-based one (Granger causality). Several pitfalls are related to the presence or absence of assumptions in modeling the relevant features of the data. We tried to overcome those pitfalls using a neural network approach in which a model is built without any a priori assumptions. In this sense, the method can be seen as a bridge between model-free and model-based approaches. The experiments show that the method presented in this work can detect the correct dynamical information flows occurring in a system of time series. Additionally, we adopt a non-uniform embedding framework according to which only the past states that actually help the prediction are entered into the model, improving the prediction and avoiding the risk of overfitting. This method also leads to a further improvement over traditional Granger causality approaches when redundant variables (i.e., variables sharing the same information about the future of the system) are involved. Neural networks are also able to recognize dynamics in data sets completely different from the ones used during the training phase.
Submitted 8 September, 2015; v1 submitted 1 July, 2015;
originally announced July 2015.
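Non-uniform embedding builds the predictor set greedily: at each step the lagged candidate that most improves prediction is added, and selection stops when no candidate helps enough. The sketch below mirrors that loop with an explicit validation phase (fit on the first half of the samples, score on the second half), but swaps the paper's neural network for ordinary least squares to stay short; the 1% minimum-gain stopping rule, the maximum lag, the toy data, and all names are assumptions.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for k in range(c, n + 1):
                    m[r][k] -= f * m[c][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def validation_mse(cols, target, split):
    """Fit least squares (no intercept) on samples [0, split), return MSE on the rest."""
    n_val = len(target) - split
    if not cols:
        return sum(v * v for v in target[split:]) / n_val
    train = [c[:split] for c in cols]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in train] for ci in train]
    xty = [sum(u * v for u, v in zip(ci, target[:split])) for ci in train]
    beta = solve(xtx, xty)
    err = 0.0
    for k in range(split, len(target)):
        pred = sum(b * c[k] for b, c in zip(beta, cols))
        err += (target[k] - pred) ** 2
    return err / n_val


def non_uniform_embedding(series, target_name, max_lag=3, min_gain=0.01):
    """Greedy selection of (variable, lag) terms that improve validation error."""
    n = len(series[target_name])
    target = series[target_name][max_lag:]
    candidates = {(name, lag): [s[t - lag] for t in range(max_lag, n)]
                  for name, s in series.items() for lag in range(1, max_lag + 1)}
    split = len(target) // 2
    selected, best = [], validation_mse([], target, split)
    while len(selected) < len(candidates):
        trials = [(validation_mse([candidates[k] for k in selected + [key]], target, split), key)
                  for key in candidates if key not in selected]
        mse, key = min(trials)
        if mse > best * (1.0 - min_gain):  # stop: no candidate gives the minimum gain
            break
        selected.append(key)
        best = mse
    return selected


# Toy data (hypothetical): y depends on its own lag 1 and on x at lag 2 only.
rng = random.Random(1)
n = 3000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.0, 0.0]
for t in range(2, n):
    y.append(0.6 * y[t - 1] + 0.8 * x[t - 2] + 0.5 * rng.gauss(0.0, 1.0))
chosen = non_uniform_embedding({"x": x, "y": y}, "y")
```

Only the two terms that actually drive y survive; the other four lagged candidates are never entered, which is how non-uniform embedding limits model size and the risk of overfitting.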
-
Synergetic and redundant information flow detected by unnormalized Granger causality: application to resting state fMRI
Authors:
Sebastiano Stramaglia,
Leonardo Angelini,
Guorong Wu,
Jesus M. Cortés,
Luca Faes,
Daniele Marinazzo
Abstract:
Objectives: We develop a framework for the analysis of synergy and redundancy in the pattern of information flow between subsystems of a complex network. Methods: The presence of redundancy and/or synergy in multivariate time series data makes it difficult to estimate the net flow of information from each driver variable to a given target. We show that, by adopting an unnormalized definition of Granger causality, one can reveal redundant multiplets of variables influencing the target by maximizing the total Granger causality to a given target over all the possible partitions of the set of driving variables. Consequently, we introduce a pairwise index of synergy which, unlike previous definitions of synergy, is zero when two independent sources additively influence the future state of the system.
Results: We report the application of the proposed approach to resting state fMRI data from the Human Connectome Project, showing that redundant pairs of regions arise mainly due to spatial contiguity and interhemispheric symmetry, whilst synergy occurs mainly between non-homologous pairs of regions in opposite hemispheres. Conclusions: Redundancy and synergy, in healthy resting brains, display characteristic patterns, revealed by the proposed approach.
Significance: The pairwise synergy index introduced here maps the informational character of the system at hand into a weighted complex network: the same approach can be applied to other complex systems whose normal state corresponds to a balance between redundant and synergetic circuits.
Submitted 16 May, 2016; v1 submitted 14 April, 2015;
originally announced April 2015.
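The key property claimed for the pairwise synergy index is that it vanishes for two independent sources acting additively on the target, which a log-ratio (normalized) Granger causality does not guarantee. The sketch below checks this in closed form for y_t = a x1_{t-1} + b x2_{t-1} + e_t with independent, unit-variance white drivers, assuming unnormalized GC = reduction in the target's residual variance, normalized GC = log of the variance ratio, and synergy psi = GC(x1, x2) - GC(x1) - GC(x2). These definitions are simplifications for illustration, not the paper's exact formulation, and the function names are hypothetical.

```python
import math


def residual_variance(a, b, noise, drivers):
    """Residual variance of y_t = a*x1_{t-1} + b*x2_{t-1} + e_t after regressing on the
    lagged drivers in `drivers` (drivers: independent, unit-variance, white)."""
    var = a * a + b * b + noise
    if "x1" in drivers:
        var -= a * a
    if "x2" in drivers:
        var -= b * b
    return var


def synergy_index(a, b, noise, normalized):
    """psi = GC({x1, x2}) - GC({x1}) - GC({x2}) under the assumed GC definitions."""
    base = residual_variance(a, b, noise, set())

    def gc(drivers):
        res = residual_variance(a, b, noise, drivers)
        # normalized: log variance ratio; unnormalized: raw variance reduction
        return math.log(base / res) if normalized else base - res

    return gc({"x1", "x2"}) - gc({"x1"}) - gc({"x2"})


unnormalized = synergy_index(1.0, 1.0, 0.5, normalized=False)
normalized = synergy_index(1.0, 1.0, 0.5, normalized=True)
```

With a = b = 1 and noise variance 0.5, the unnormalized index is exactly zero, whereas the log-ratio version gives ln(1.8), about 0.59, reporting spurious synergy for sources that merely add up.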