-
Scientific Hypothesis Generation by a Large Language Model: Laboratory Validation in Breast Cancer Treatment
Authors:
Abbi Abdel-Rehim,
Hector Zenil,
Oghenejokpeme Orhobor,
Marie Fisher,
Ross J. Collins,
Elizabeth Bourne,
Gareth W. Fearnley,
Emma Tate,
Holly X. Smith,
Larisa N. Soldatova,
Ross D. King
Abstract:
Large language models (LLMs) have transformed AI and achieved breakthrough performance on a wide range of tasks. In science, the most interesting application of LLMs is for hypothesis formation. A feature of LLMs, which results from their probabilistic structure, is that the output text is not necessarily a valid inference from the training text. These are termed "hallucinations", and are harmful in many applications. In science, some hallucinations may be useful: novel hypotheses whose validity may be tested by laboratory experiments. Here, we experimentally test the application of LLMs as a source of scientific hypotheses using the domain of breast cancer treatment. We applied the LLM GPT4 to hypothesize novel synergistic pairs of FDA-approved noncancer drugs that target the MCF7 breast cancer cell line relative to the nontumorigenic breast cell line MCF10A. In the first round of laboratory experiments, GPT4 succeeded in discovering three drug combinations (out of twelve tested) with synergy scores above the positive controls. GPT4 then generated new combinations based on its initial results; this generated three more combinations with positive synergy scores (out of four tested). We conclude that LLMs are a valuable source of scientific hypotheses.
Submitted 8 May, 2025; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Assembly Theory is an approximation to algorithmic complexity based on LZ compression that does not explain selection or evolution
Authors:
Felipe S. Abrahão,
Santiago Hernández-Orozco,
Narsis A. Kiani,
Jesper Tegnér,
Hector Zenil
Abstract:
We prove the full equivalence between Assembly Theory (AT) and Shannon Entropy via a method based upon the principles of statistical compression renamed 'assembly index' that belongs to the LZ family of popular compression algorithms (ZIP, GZIP, JPEG). Such popular algorithms have been shown to empirically reproduce the results of AT, results that have also been reported before in successful applications to separating organic from non-organic molecules and in the context of the study of selection and evolution. We show that the assembly index value is equivalent to the size of a minimal context-free grammar. The statistical compressibility of such a method is bounded by Shannon Entropy and other equivalent traditional LZ compression schemes, such as LZ77, LZ78, or LZW. In addition, we demonstrate that AT, and the algorithms supporting its pathway complexity, assembly index, and assembly number, define compression schemes and methods that are subsumed into the theory of algorithmic (Kolmogorov-Solomonoff-Chaitin) complexity. Due to AT's current lack of logical consistency in defining causality for non-stochastic processes and the lack of empirical evidence that it outperforms other complexity measures found in the literature capable of explaining the same phenomena, we conclude that the assembly index and the assembly number do not lead to an explanation or quantification of biases in generative (physical or biological) processes, including those brought about by (abiotic or Darwinian) selection and evolution, that could not have been arrived at using Shannon Entropy or that have not been reported before using classical information theory or algorithmic complexity.
Submitted 1 April, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
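The abstract above compares the assembly index with Shannon Entropy and LZ-family compression. As a rough illustration only, and not the authors' proof or code, the following sketch computes two of the classical quantities it names for a short string: the empirical Shannon entropy per symbol and the number of phrases in an LZ78-style parse. The function names `shannon_entropy` and `lz78_phrase_count` are hypothetical, chosen for this example.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Empirical Shannon entropy of s, in bits per symbol."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def lz78_phrase_count(s: str) -> int:
    """Number of phrases in an LZ78-style incremental parse of s.

    Each phrase is the shortest prefix of the remaining input that has
    not been seen as a phrase before; a trailing incomplete phrase is
    counted as one more phrase.
    """
    phrases = set()
    current = ""
    count = 0
    for ch in s:
        current += ch
        if current not in phrases:
            phrases.add(current)
            count += 1
            current = ""
    if current:  # leftover phrase that was already in the dictionary
        count += 1
    return count


if __name__ == "__main__":
    for text in ("aaaaaaaaaaaaaaaa", "abababababababab", "abcdefghijklmnop"):
        print(text, shannon_entropy(text), lz78_phrase_count(text))
```

Both quantities are purely statistical: they depend only on symbol frequencies and repeated substrings, which is the kind of regularity the abstract argues the assembly index also captures.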
-
Routine haematological markers can predict and discriminate health status and biological age even from noisy sources
Authors:
Santiago Hernández-Orozco,
Abicumaran Uthamacumaran,
Francisco Hernández-Quiroz,
Kourosh Saeb-Parsy,
Hector Zenil
Abstract:
For more than two decades, advances in personalised medicine and precision healthcare have largely been based on genomics and other omics data. These strategies aim to tailor interventions to individual patient profiles, promising greater treatment efficacy and more efficient allocation of healthcare resources. Here, we show that widely collected common haematologic markers can reliably predict and discriminate individual chronological age and health status from even noisy sources. Our analysis includes synthetic and real retrospective patient data, including medically relevant and extreme cases, and draws on more than 100,000 complete blood count records over 13 years from the United States Centers for Disease Control and Prevention's National Health and Nutrition Examination Survey (CDC NHANES). We combine fully explainable risk assessment scores with machine and deep learning techniques to focus on clinically significant patterns and characteristics without functioning purely as a "black-box" model, allowing interpretation and control. We validated the results with the UK Biobank, a larger cohort independent of the CDC NHANES and with very different collection techniques (NHANES is a survey, whereas the UK Biobank is a longitudinal study). Unlike current biological ageing indicators, this approach may offer rapid and scalable implementations of personalised, precision and predictive approaches to healthcare and medicine without or before requiring other specialised, uncommon or costly tests.
Submitted 7 May, 2025; v1 submitted 2 March, 2023;
originally announced March 2023.
-
A Review of Mathematical and Computational Methods in Cancer Dynamics
Authors:
Abicumaran Uthamacumaran,
Hector Zenil
Abstract:
Cancers are complex adaptive diseases regulated by the nonlinear feedback systems between genetic instabilities, environmental signals, cellular protein flows, and gene regulatory networks. Understanding the cybernetics of cancer requires the integration of information dynamics across multidimensional spatiotemporal scales, including genetic, transcriptional, metabolic, proteomic, epigenetic, and multi-cellular networks. However, the time-series analysis of these complex networks remains largely absent in cancer research. With longitudinal screening and time-series analysis of cellular dynamics, universally observed causal patterns pertaining to dynamical systems may self-organize in the signaling or gene expression state-space of cancer-triggering processes. A class of these patterns, strange attractors, may be mathematical biomarkers of cancer progression. The emergence of intracellular chaos and chaotic cell population dynamics remains a new paradigm in systems oncology. As such, chaotic and complex dynamics are discussed as mathematical hallmarks of cancer cell fate dynamics herein. Given the assumption that time-resolved single-cell datasets are made available, interdisciplinary tools and algorithms from complexity theory are surveyed as means to investigate critical phenomena and chaotic dynamics in cancer ecosystems. To conclude, the perspective cultivates an intuition for computational systems oncology in terms of nonlinear dynamics, information theory, inverse problems and complexity. We highlight the limitations we see in the area of statistical machine learning, but also the opportunity of combining it with the symbolic computational power offered by the mathematical tools explored.
Submitted 27 August, 2022; v1 submitted 5 January, 2022;
originally announced January 2022.
-
Evolving Neural Networks through a Reverse Encoding Tree
Authors:
Haoling Zhang,
Chao-Han Huck Yang,
Hector Zenil,
Narsis A. Kiani,
Yue Shen,
Jesper N. Tegner
Abstract:
NeuroEvolution is one of the most competitive evolutionary learning frameworks for designing novel neural networks for use in specific tasks, such as logic circuit design and digital gaming. However, the application of benchmark methods such as the NeuroEvolution of Augmenting Topologies (NEAT) remains a challenge, in terms of their computational cost and search time inefficiency. This paper advances a method which incorporates a type of topological edge coding, named Reverse Encoding Tree (RET), for evolving scalable neural networks efficiently. Using RET, two types of approaches -- NEAT with Binary search encoding (Bi-NEAT) and NEAT with Golden-Section search encoding (GS-NEAT) -- have been designed to solve problems in benchmark continuous learning environments such as logic gates, Cartpole, and Lunar Lander, and tested against classical NEAT and FS-NEAT as baselines. Additionally, we conduct a robustness test to evaluate the resilience of the proposed NEAT algorithms. The results show that the two proposed strategies deliver improved performance, characterized by (1) a higher accumulated reward within a finite number of time steps; (2) using fewer episodes to solve problems in targeted environments; and (3) maintaining adaptive robustness under noisy perturbations, outperforming the baselines in all tested cases. Our analysis also demonstrates that RET expands potential future research directions in dynamic environments. Code is available from https://github.com/HaolingZHANG/ReverseEncodingTree.
Submitted 31 March, 2020; v1 submitted 2 February, 2020;
originally announced February 2020.
-
Estimations of Integrated Information Based on Algorithmic Complexity and Dynamic Querying
Authors:
Alberto Hernández-Espinosa,
Héctor Zenil,
Narsis A. Kiani,
Jesper Tegnér
Abstract:
The concept of information has emerged as a language in its own right, bridging several disciplines that analyze natural phenomena and man-made systems. Integrated information has been introduced as a metric to quantify the amount of information generated by a system beyond the information generated by its elements. Yet, this intriguing notion comes with the price of being prohibitively expensive to calculate, since the calculations require an exponential number of sub-divisions of a system. Here we introduce a novel framework to connect algorithmic randomness and integrated information and a numerical method for estimating integrated information using a perturbation test rooted in algorithmic information dynamics. This method quantifies the change in program size of a system when subjected to a perturbation. The intuition behind this is that if an object is random, then random perturbations have little to no effect compared with what happens to an object that has a shorter program; when an object has the ability to move in both directions (towards or away from randomness), it will be shown to be better integrated, as a measure of sophistication telling apart randomness and simplicity from structure. We show that an object with a high integrated information value is also more compressible, and is, therefore, more sensitive to perturbations. We find that such a perturbation test quantifying compression sensitivity provides a system with a means to extract explanations--causal accounts--of its own behaviour. Our technique can reduce the number of calculations to arrive at some bounds or estimations, as the algorithmic perturbation test guides an efficient search for estimating integrated information. Our work sets the stage for a systematic exploration of connections between algorithmic complexity and integrated information at the level of both theory and practice.
Submitted 6 June, 2019; v1 submitted 9 April, 2019;
originally announced April 2019.
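The perturbation test described in the abstract above quantifies how the program size of a system changes when it is perturbed. A minimal sketch of that idea follows, under loud assumptions: the compressed length from Python's `zlib` stands in for program size (a crude statistical proxy, not the algorithmic-complexity estimator the authors use), and a perturbation is taken to be the deletion of a single byte. The helper names `compressed_size` and `perturbation_profile` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (proxy for program size)."""
    return len(zlib.compress(data, 9))


def perturbation_profile(data: bytes) -> list[int]:
    """Change in compressed size caused by deleting each byte in turn.

    Entry i is compressed_size(data without byte i) minus
    compressed_size(data). Values near zero mean the perturbation
    barely changes the estimated description length.
    """
    base = compressed_size(data)
    return [
        compressed_size(data[:i] + data[i + 1:]) - base
        for i in range(len(data))
    ]


if __name__ == "__main__":
    structured = b"ab" * 32
    profile = perturbation_profile(structured)
    print("base size:", compressed_size(structured))
    print("min/max change:", min(profile), max(profile))
```

Comparing the spread of such a profile between a highly regular input and a random one gives a feel for the compression sensitivity the abstract refers to, although the magnitudes produced by `zlib` should not be read as estimates of integrated information.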
-
Controllability, Multiplexing, and Transfer Learning in Networks using Evolutionary Learning
Authors:
Rise Ooi,
Chao-Han Huck Yang,
Pin-Yu Chen,
Víctor Eguíluz,
Narsis Kiani,
Hector Zenil,
David Gomez-Cabrero,
Jesper Tegnér
Abstract:
Networks are fundamental building blocks for representing data and computations. Remarkable progress in learning in structurally defined (shallow or deep) networks has recently been achieved. Here we introduce an evolutionary exploratory search and learning method for topologically flexible networks under the constraint of producing elementary computational steady-state input-output operations.
Our results include: (1) the identification of networks, over four orders of magnitude, implementing computation of steady-state input-output functions, such as a band-pass filter, a threshold function, and an inverse band-pass function. (2) The learned networks are technically controllable, as only a small number of driver nodes are required to move the system to a new state. Furthermore, we find that the fraction of required driver nodes is constant during evolutionary learning, suggesting a stable system design. (3) Our framework allows multiplexing of different computations using the same network. For example, using a binary representation of the inputs, the network can readily compute three different input-output functions. Finally, (4) the proposed evolutionary learning demonstrates transfer learning. If the system learns one function A, then learning B requires, on average, fewer steps than learning B from tabula rasa.
We conclude that the constrained evolutionary learning produces large, robust, controllable circuits, capable of multiplexing and transfer learning. Our study suggests that network-based computations of steady-state functions, representing either cellular modules of cell-to-cell communication networks or internal molecular circuits communicating within a cell, could be a powerful model for biologically inspired computing. This complements conceptualizations such as attractor-based models or reservoir computing.
Submitted 3 November, 2019; v1 submitted 13 November, 2018;
originally announced November 2018.
-
Algorithmic Complexity and Reprogrammability of Chemical Structure Networks
Authors:
Hector Zenil,
Narsis A. Kiani,
Ming-Mei Shang,
Jesper Tegnér
Abstract:
Here we address the challenge of profiling causal properties and tracking the transformation of chemical compounds from an algorithmic perspective. We explore the potential of applying a computational interventional calculus based on the principles of algorithmic probability to chemical structure networks. We profile the sensitivity of the elements and covalent bonds in a chemical structure network algorithmically, asking whether reprogrammability affords information about thermodynamic and chemical processes involved in the transformation of different compound classes. We arrive at numerical results suggesting a correspondence between some physical, structural and functional properties. Our methods are capable of separating chemical classes that reflect functional and natural differences without considering any information about atomic and molecular properties. We conclude that these methods, with their links to chemoinformatics via algorithmic probability, hold promise for future research.
Submitted 18 March, 2018; v1 submitted 16 February, 2018;
originally announced February 2018.
-
Predictive Systems Toxicology
Authors:
Narsis A. Kiani,
Ming-Mei Shang,
Hector Zenil,
Jesper Tegnér
Abstract:
In this review we address to what extent computational techniques can augment our ability to predict toxicity. The first section provides a brief history of empirical observations on toxicity dating back to the dawn of Sumerian civilization. Interestingly, the concept of dose emerged very early on, leading up to the modern emphasis on kinetic properties, which in turn encodes the insight that toxicity is not solely a property of a compound but instead depends on the interaction with the host organism. The next logical step is the current conception of evaluating drugs from a personalized medicine point-of-view. We review recent work on integrating what could be referred to as classical pharmacokinetic analysis with emerging systems biology approaches incorporating multiple omics data. These systems approaches employ advanced statistical analytical data processing complemented with machine learning techniques and use both pharmacokinetic and omics data. We find that such integrated approaches not only provide improved predictions of toxicity but also enable mechanistic interpretations of the molecular mechanisms underpinning toxicity and drug resistance. We conclude the chapter by discussing some of the main challenges, such as how to balance the inherent tension between the predictive capacity of models, which in practice amounts to constraining the number of features in the models versus allowing for rich mechanistic interpretability, i.e. equipping models with numerous molecular features. This challenge also requires patient-specific predictions on toxicity, which in turn requires proper stratification of patients as regards how they respond, with or without adverse toxic effects. In summary, the transformation of the ancient concept of dose is currently successfully operationalized using rich integrative data encoded in patient-specific models.
Submitted 15 January, 2018;
originally announced January 2018.
-
An Algorithmic Information Calculus for Causal Discovery and Reprogramming Systems
Authors:
Hector Zenil,
Narsis A. Kiani,
Francesco Marabita,
Yue Deng,
Szabolcs Elias,
Angelika Schmidt,
Gordon Ball,
Jesper Tegnér
Abstract:
We demonstrate that the algorithmic information content of a system is deeply connected to its potential dynamics, thus affording an avenue for moving systems in the information-theoretic space and controlling them in the phase space. To this end we performed experiments and validated the results on (1) a very large set of small graphs, (2) a number of larger networks with different topologies, and (3) biological networks from a widely studied and validated genetic network (E. coli) as well as on a significant number of differentiating (Th17) and differentiated human cells from high quality databases (Harvard's CellNet) with results conforming to experimentally validated biological data. Based on these results we introduce a conceptual framework, a model-based interventional calculus and a reprogrammability measure with which to steer, manipulate, and reconstruct the dynamics of nonlinear dynamical systems from partial and disordered observations. The method consists in finding and applying a series of controlled interventions to a dynamical system to estimate how its algorithmic information content is affected when each of its elements is perturbed. The approach represents an alternative to numerical simulation and statistical approaches for inferring causal mechanistic/generative models and finding first principles. We demonstrate the framework's capabilities by reconstructing the phase space of some discrete dynamical systems (cellular automata) as a case study and reconstructing their generating rules. We thus advance tools for reprogramming artificial and living systems without full knowledge of or access to the system's actual kinetic equations or probability distributions, yielding a suite of universal and parameter-free algorithms of wide applicability, ranging from causation and dimension reduction to feature selection and model generation.
Submitted 5 April, 2018; v1 submitted 15 September, 2017;
originally announced September 2017.
-
Algorithmically probable mutations reproduce aspects of evolution such as convergence rate, genetic memory, and modularity
Authors:
Santiago Hernández-Orozco,
Narsis A. Kiani,
Hector Zenil
Abstract:
Natural selection explains how life has evolved over millions of years from more primitive forms. The speed at which this happens, however, has sometimes defied formal explanations when based on random (uniformly distributed) mutations. Here we investigate the application of a simplicity bias based on a natural but algorithmic distribution of mutations (no recombination) in various examples, particularly binary matrices, in order to compare evolutionary convergence rates. Results both on synthetic and on small biological examples indicate an accelerated rate when mutations are not statistically uniform but algorithmically uniform. We show that algorithmic distributions can evolve modularity and genetic memory by preservation of structures when they first occur, sometimes leading to an accelerated production of diversity but also population extinctions, possibly explaining naturally occurring phenomena such as diversity explosions (e.g. the Cambrian) and massive extinctions (e.g. the End Triassic) whose causes are currently a matter of debate. The natural approach introduced here appears to be a better approximation to biological evolution than models based exclusively upon random uniform mutations, and it also approaches a formal version of open-ended evolution based on previous formal results. These results validate some suggestions in the direction that computation may be an equally important driver of evolution. We also show that inducing the method on problems of optimization, such as genetic algorithms, has the potential to accelerate convergence of artificial evolutionary algorithms.
Submitted 20 June, 2018; v1 submitted 1 September, 2017;
originally announced September 2017.
-
Training-free Measures Based on Algorithmic Probability Identify High Nucleosome Occupancy in DNA Sequences
Authors:
Hector Zenil,
Peter Minary
Abstract:
We introduce and study a set of training-free methods of an information-theoretic and algorithmic-complexity nature applied to DNA sequences to identify their potential capabilities to determine nucleosomal binding sites. We test our measures on well-studied genomic sequences of different sizes drawn from different sources. The measures reveal the known in vivo versus in vitro predictive discrepancies and uncover their potential to pinpoint (high) nucleosome occupancy. We explore different possible signals within and beyond the nucleosome length and find that complexity indices are informative of nucleosome occupancy. We compare against the gold standard (Kaplan model) and find similar and complementary results, with the main difference being that our sequence complexity approach is training-free. For example, for high occupancy, complexity-based scores outperform the Kaplan model for predicting binding, representing a significant advancement in predicting the highest nucleosome occupancy following a training-free approach.
Submitted 16 October, 2018; v1 submitted 5 August, 2017;
originally announced August 2017.
-
HiDi: An efficient reverse engineering schema for large scale dynamic regulatory network reconstruction using adaptive differentiation
Authors:
Yue Deng,
Hector Zenil,
Jesper Tegnér,
Narsis A. Kiani
Abstract:
The use of ordinary differential equations (ODEs) is one of the most promising approaches to network inference. The success of ODE-based approaches has, however, been limited, due to the difficulty in estimating parameters and by their lack of scalability. Here we introduce a novel method and pipeline to reverse engineer gene regulatory networks from time series gene expression and perturbation data, based upon an improved scheme for calculating the derivatives and a pre-filtration step to reduce the number of possible links. The method introduces a linear differential equation model with adaptive numerical differentiation that is scalable to extremely large regulatory networks. We demonstrate the ability of this method to outperform current state-of-the-art methods applied to experimental and synthetic data using test data from the DREAM4 and DREAM5 challenges. Our method displays greater accuracy and scalability. We benchmark the performance of the pipeline with respect to data set size and levels of noise. We show that the computation time is linear over various network sizes.
Submitted 7 June, 2017; v1 submitted 5 June, 2017;
originally announced June 2017.
-
Interacting Behavior and Emerging Complexity
Authors:
Alyssa Adams,
Hector Zenil,
Eduardo Hermo Reyes,
Joost Joosten
Abstract:
Can we quantify the change of complexity throughout evolutionary processes? We attempt to address this question through an empirical approach. In very general terms, we simulate two simple organisms on a computer that compete over limited available resources. We implement Global Rules that determine the interaction between two Elementary Cellular Automata on the same grid. Global Rules change the complexity of the state evolution output which suggests that some complexity is intrinsic to the interaction rules themselves. The largest increases in complexity occurred when the interacting elementary rules had very little complexity, suggesting that they are able to accept complexity through interaction only. We also found that some Class 3 or 4 CA rules are more fragile than others to Global Rules, while others are more robust, hence suggesting some intrinsic properties of the rules independent of the Global Rule choice. We provide statistical mappings of Elementary Cellular Automata exposed to Global Rules and different initial conditions onto different complexity classes.
Submitted 4 January, 2016; v1 submitted 23 December, 2015;
originally announced December 2015.
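The abstract above builds on Elementary Cellular Automata (ECA). For readers unfamiliar with them, the sketch below shows one synchronous update of a single ECA under the standard Wolfram rule numbering, with periodic boundary conditions assumed. It does not implement the paper's Global Rules, whose definition is not given in the abstract; the function name `eca_step` is hypothetical.

```python
def eca_step(cells: list[int], rule: int) -> list[int]:
    """Apply one step of an Elementary Cellular Automaton.

    cells: list of 0/1 values; the row wraps around (periodic boundary,
           an assumption made for this sketch).
    rule:  Wolfram rule number in 0..255. Bit k of the rule gives the
           new state for the neighbourhood whose (left, centre, right)
           values read as the binary number k.
    """
    n = len(cells)
    new_cells = []
    for i in range(n):
        left = cells[(i - 1) % n]
        centre = cells[i]
        right = cells[(i + 1) % n]
        index = (left << 2) | (centre << 1) | right
        new_cells.append((rule >> index) & 1)
    return new_cells


if __name__ == "__main__":
    row = [0] * 15
    row[7] = 1  # single live cell in the middle
    for _ in range(8):
        print("".join("#" if c else "." for c in row))
        row = eca_step(row, 90)
```

Running the example with rule 90 prints the first rows of the familiar nested-triangle pattern; swapping in another rule number changes the behaviour class without any other change to the code.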
-
Evaluating Network Inference Methods in Terms of Their Ability to Preserve the Topology and Complexity of Genetic Networks
Authors:
Narsis A. Kiani,
Hector Zenil,
Jakub Olczak,
Jesper Tegnér
Abstract:
Network inference is a rapidly advancing field, with new methods being proposed on a regular basis. Understanding the advantages and limitations of different network inference methods is key to their effective application in different circumstances. The common structural properties shared by diverse networks naturally pose a challenge when it comes to devising accurate inference methods, but surprisingly, there is a paucity of comparison and evaluation methods. Historically, every new methodology has only been tested against gold standard (true values) purpose-designed synthetic and real-world (validated) biological networks. In this paper we aim to assess the impact of taking into consideration aspects of topological and information content in the evaluation of the final accuracy of an inference procedure. Specifically, we will compare the best inference methods, in both graph-theoretic and information-theoretic terms, for preserving topological properties and the original information content of synthetic and biological networks. New methods for performance comparison are introduced by borrowing ideas from gene set enrichment analysis and by applying concepts from algorithmic complexity. Experimental results show that no individual algorithm outperforms all others in all cases, and that the challenging and non-trivial nature of network inference is evident in the struggle of some of the algorithms to turn in a performance that is superior to random guesswork. Therefore special care should be taken to suit the method to the purpose at hand. Finally, we show that evaluations from data generated using different underlying topologies have different signatures that can be used to better choose a network reconstruction method.
Submitted 14 September, 2016; v1 submitted 3 December, 2015;
originally announced December 2015.
-
Approximations of Algorithmic and Structural Complexity Validate Cognitive-behavioural Experimental Results
Authors:
Hector Zenil,
James A. R. Marshall,
Jesper Tegnér
Abstract:
Being able to objectively characterise the intrinsic complexity of behavioural patterns resulting from human or animal decisions is fundamental for deconvolving cognition and designing autonomous artificial intelligence systems. Yet complexity is difficult to estimate in practice, particularly when strings are short. By numerically approximating algorithmic (Kolmogorov) complexity (K), we establish an objective tool to characterise behavioural complexity. Next, we approximate structural (Bennett's Logical Depth) complexity (LD) to assess the amount of computation required for generating a behavioural string. We apply our toolbox to three landmark studies of animal behaviour of increasing sophistication and degree of environmental influence, including studies of foraging communication by ants, flight patterns of fruit flies, and tactical deception and competition (e.g., predator-prey) strategies. We find that ants harness the environmental condition in their internal decision process, modulating their behavioural complexity accordingly. Our analysis of flight (fruit flies) invalidated the common hypothesis that animals navigating in an environment devoid of stimuli adopt a random strategy. Fruit flies exposed to a featureless environment deviated the most from Lévy flight, suggesting an algorithmic bias in their attempt to devise a useful (navigation) strategy. Similarly, a logical depth analysis of rats revealed that the structural complexity of the rat always ends up matching the structural complexity of the competitor, with the rats' behaviour simulating algorithmic randomness. Finally, we discuss how experiments on how humans perceive randomness suggest the existence of an algorithmic bias in our reasoning and decision processes, in line with our analysis of the animal experiments.
Submitted 20 December, 2022; v1 submitted 21 September, 2015;
originally announced September 2015.
-
Quantifying Loss of Information in Network-based Dimensionality Reduction Techniques
Authors:
Hector Zenil,
Narsis A. Kiani,
Jesper Tegnér
Abstract:
To cope with the complexity of large networks, a number of dimensionality reduction techniques for graphs have been developed. However, the extent to which information is lost or preserved when these techniques are employed has not yet been clear. Here we develop a framework, based on algorithmic information theory, to quantify the extent to which information is preserved when network motif analysis, graph spectra and spectral sparsification methods are applied to over twenty different biological and artificial networks. We find that spectral sparsification is highly sensitive to a high number of edge deletions, leading to significant inconsistencies, and that graph spectral methods are the most irregular, capturing algebraic information in a condensed fashion but losing most of the information content of the original networks. However, the approach shows that network motif analysis excels at preserving the relative algorithmic information content of a network, hence validating and generalizing the remarkable fact that, despite their inherent combinatorial possibilities, local regularities preserve information to such an extent that essential properties are fully recoverable across different networks to determine the family group to which they belong (e.g. genetic vs social networks). Our algorithmic information methodology thus provides a rigorous framework enabling a fundamental assessment and comparison between different data dimensionality reduction methods, thereby facilitating the identification and evaluation of the capabilities of old and new methods.
Submitted 27 August, 2015; v1 submitted 23 April, 2015;
originally announced April 2015.
-
Methods of Information Theory and Algorithmic Complexity for Network Biology
Authors:
Hector Zenil,
Narsis A. Kiani,
Jesper Tegnér
Abstract:
We survey and introduce concepts and tools located at the intersection of information theory and network biology. We show that Shannon's information entropy, compressibility and algorithmic complexity quantify different local and global aspects of synthetic and biological data. We show examples such as the emergence of giant components in Erdos-Renyi random graphs, and the recovery of topological properties from numerical kinetic properties simulating gene expression data. We provide exact theoretical calculations, numerical approximations and error estimations of entropy, algorithmic probability and Kolmogorov complexity for different types of graphs, characterizing their variant and invariant properties. We introduce formal definitions of complexity for both labeled and unlabeled graphs and prove that the Kolmogorov complexity of a labeled graph is a good approximation of its unlabeled Kolmogorov complexity and thus a robust definition of graph complexity.
Submitted 11 December, 2015; v1 submitted 15 January, 2014;
originally announced January 2014.
-
Correlation of Automorphism Group Size and Topological Properties with Program-size Complexity Evaluations of Graphs and Complex Networks
Authors:
Hector Zenil,
Fernando Soler-Toscano,
Kamaludin Dingle,
Ard A. Louis
Abstract:
We show that numerical approximations of Kolmogorov complexity (K) applied to graph adjacency matrices capture some group-theoretic and topological properties of graphs and empirical networks ranging from metabolic to social networks. That K and the size of the group of automorphisms of a graph are correlated opens up interesting connections to problems in computational geometry, and thus connects several measures and concepts from complexity science. We show that approximations of K characterise synthetic and natural networks by their generating mechanisms, assigning lower algorithmic randomness to complex network models (Watts-Strogatz and Barabasi-Albert networks) and high Kolmogorov complexity to (random) Erdos-Renyi graphs. We derive these results via two different Kolmogorov complexity approximation methods applied to the adjacency matrices of the graphs and networks. The methods used are the traditional lossless compression approach to Kolmogorov complexity, and a normalised version of a Block Decomposition Method (BDM) measure, based on algorithmic probability theory.
Submitted 22 February, 2014; v1 submitted 3 June, 2013;
originally announced June 2013.