-
StepProof: Step-by-step verification of natural language mathematical proofs
Authors:
Xiaolin Hu,
Qinghua Zhou,
Bogdan Grechuk,
Ivan Y. Tyukin
Abstract:
Interactive theorem provers (ITPs) are powerful tools for the formal verification of mathematical proofs down to the axiom level. However, their lack of a natural language interface remains a significant limitation. Recent advancements in large language models (LLMs) have enhanced the understanding of natural language inputs, paving the way for autoformalization - the process of translating natural language proofs into formal proofs that can be verified. Despite these advancements, existing autoformalization approaches are limited to verifying complete proofs and lack the capability for finer, sentence-level verification. To address this gap, we propose StepProof, a novel autoformalization method designed for granular, step-by-step verification. StepProof breaks down complete proofs into multiple verifiable subproofs, enabling sentence-level verification. Experimental results demonstrate that StepProof significantly improves proof success rates and efficiency compared to traditional methods. Additionally, we found that minor manual adjustments to the natural language proofs, tailoring them for step-level verification, further enhanced StepProof's performance in autoformalization.
Submitted 12 June, 2025;
originally announced June 2025.
-
When fractional quasi p-norms concentrate
Authors:
Ivan Y. Tyukin,
Bogdan Grechuk,
Evgeny M. Mirkes,
Alexander N. Gorban
Abstract:
Concentration of distances in high dimension is an important factor for the development and design of stable and reliable data analysis algorithms. In this paper, we address the fundamental long-standing question about the concentration of distances in high dimension for fractional quasi $p$-norms, $p\in(0,1)$. The topic has been at the centre of various theoretical and empirical controversies. Here we, for the first time, identify conditions when fractional quasi $p$-norms concentrate and when they do not. We show that, contrary to some earlier suggestions, for broad classes of distributions, fractional quasi $p$-norms admit exponential and uniform in $p$ concentration bounds. For these distributions, the results effectively rule out previously proposed approaches to alleviate concentration by "optimally" setting the value of $p$ in $(0,1)$. At the same time, we specify conditions and the corresponding families of distributions for which one can still control concentration rates by appropriate choices of $p$. We also show that in an arbitrarily small vicinity of a distribution from a large class of distributions for which uniform concentration occurs, there are uncountably many other distributions featuring anti-concentration properties. Importantly, this behavior enables devising relevant data encoding or representation schemes favouring or discouraging distance concentration. The results shed new light on this long-standing problem and resolve the tension around the topic in both theory and empirical evidence reported in the literature.
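As an informal illustration of the concentration effect described in this abstract, the sketch below estimates the relative contrast (max minus min, divided by min) of quasi $p$-norm distances from the origin to random points. The choice of i.i.d. uniform coordinates on $[0,1]$, the sample size, and the function names `quasi_p_norm` and `relative_contrast` are all assumptions made here for illustration; this is not the authors' experiment and covers only one simple distribution.

```python
import random


def quasi_p_norm(x, p):
    """Return (sum_i |x_i|^p)^(1/p); this is only a quasi-norm when 0 < p < 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def relative_contrast(dim, p, n_points=200, seed=0):
    """Relative contrast (max - min) / min of quasi p-norm distances from the
    origin to n_points samples with i.i.d. uniform [0, 1] coordinates.

    The uniform distribution is an illustrative assumption, not taken from
    the paper.
    """
    rng = random.Random(seed)
    dists = [
        quasi_p_norm([rng.random() for _ in range(dim)], p)
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # Print the contrast for several values of p as the dimension grows.
    for p in (0.25, 0.5, 1.0, 2.0):
        row = [relative_contrast(dim, p) for dim in (10, 100, 1000)]
        print("p = %.2f:" % p, ", ".join("%.3f" % r for r in row))
```

For this particular distribution the contrast should shrink as the dimension grows for every listed $p$, including fractional values; the abstract's point is that other distributions can behave differently.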
Submitted 26 May, 2025;
originally announced May 2025.
-
Physics-informed neural networks for aggregation kinetics
Authors:
Farzona Mukhamedova,
Ivan Tyukin,
Nikolai Brilliantov
Abstract:
We introduce a novel physics-informed approach for accurately modeling aggregation kinetics which provides a comprehensive solution in a single run by outputting all model parameters simultaneously, a clear advancement over traditional single-output networks that require multiple executions. This method effectively captures the density distributions of both large and small clusters, showcasing a notable improvement in predicting small particles, which have historically posed challenges in computational models. This approach yields significant advancements in computational efficiency and accuracy for solving the Smoluchowski equations by minimizing the interval over which the physics-informed loss function operates, allowing for efficient computation over extended time-frames with minimal increase in computational cost. Because it does not rely on predefined shapes for bias or weight outputs, it removes the dependency on prior assumptions about output structures. Furthermore, our physics-informed framework exhibits high compatibility with the generalized Brownian kernel, maintaining robust accuracy for this previously unaddressed kernel type. The framework's notable novelty also lies in addressing four different kernels with one neural network architecture. Its high computational efficiency, combined with low error margins, therefore indicates significant potential for long-term predictions and integration into broader computational systems.
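For readers unfamiliar with them, the Smoluchowski equations referred to in this abstract are commonly written in the discrete form below, where $n_k(t)$ denotes the density of clusters of size $k$ and $K_{ij}$ is the aggregation kernel. This is the standard textbook form, stated here as an assumption; the paper may work with a variant (for example, with additional source terms).

```latex
\frac{dn_k}{dt}
  = \frac{1}{2} \sum_{i+j=k} K_{ij}\, n_i n_j
  \;-\; n_k \sum_{j \ge 1} K_{kj}\, n_j,
  \qquad k = 1, 2, \dots
```

The first sum counts clusters of size $k$ created by merging two smaller clusters; the second counts clusters of size $k$ lost by merging with any other cluster.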
Submitted 15 October, 2024; v1 submitted 8 October, 2024;
originally announced October 2024.
-
Stealth edits to large language models
Authors:
Oliver J. Sutton,
Qinghua Zhou,
Wei Wang,
Desmond J. Higham,
Alexander N. Gorban,
Alexander Bastounis,
Ivan Y. Tyukin
Abstract:
We reveal the theoretical foundations of techniques for editing large language models, and present new methods which can do so without requiring retraining. Our theoretical insights show that a single metric (a measure of the intrinsic dimension of the model's features) can be used to assess a model's editability and reveals its previously unrecognised susceptibility to malicious stealth attacks. This metric is fundamental to predicting the success of a variety of editing approaches, and reveals new bridges between disparate families of editing methods. We collectively refer to these as stealth editing methods, because they directly update a model's weights to specify its response to specific known hallucinating prompts without affecting other model behaviour. By carefully applying our theoretical insights, we are able to introduce a new jet-pack network block which is optimised for highly selective model editing, uses only standard network operations, and can be inserted into existing networks. We also reveal the vulnerability of language models to stealth attacks: a small change to a model's weights which fixes its response to a single attacker-chosen prompt. Stealth attacks are computationally simple, do not require access to or knowledge of the model's training data, and therefore represent a potent yet previously unrecognised threat to redistributed foundation models. Extensive experimental results illustrate and support our methods and their theoretical underpinnings. Demos and source code are available at https://github.com/qinghua-zhou/stealth-edits.
Submitted 30 October, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
Agile gesture recognition for low-power applications: customisation for generalisation
Authors:
Ying Liu,
Liucheng Guo,
Valeri A. Makarov,
Alexander Gorban,
Evgeny Mirkes,
Ivan Y. Tyukin
Abstract:
Automated hand gesture recognition has long been a focal point in the AI community. Traditionally, research in this field has predominantly focused on scenarios with access to a continuous flow of hand images. This focus has been driven by the widespread use of cameras and the abundant availability of image data. However, there is an increasing demand for gesture recognition technologies that operate on low-power sensor devices. This is due to rising concerns about data leakage and end-user privacy, as well as the limited battery capacity and computing power of low-cost devices. Moreover, the challenge of data collection for individually designed hardware also hinders the generalisation of a gesture recognition model.
In this study, we unveil a novel methodology for pattern recognition systems using adaptive and agile error correction, designed to enhance the performance of legacy gesture recognition models on devices with limited battery capacity and computing power. This system comprises a compact Support Vector Machine as the base model for live gesture recognition. Additionally, it features an adaptive agile error corrector that employs few-shot learning within the feature space induced by high-dimensional kernel mappings. The error corrector can be customised for each user, allowing for dynamic adjustments to the gesture prediction based on their movement patterns while maintaining the agile performance of its base model on a low-cost and low-power micro-controller. This proposed system is distinguished by its compact size, rapid processing speed, and low power consumption, making it ideal for a wide range of embedded systems.
Submitted 12 March, 2024;
originally announced March 2024.
-
Fast social-like learning of complex behaviors based on motor motifs
Authors:
Carlos Calvo Tapia,
Ivan Y. Tyukin,
Valeriy A. Makarov Slizneva
Abstract:
Social learning is widely observed in many species. Less experienced agents copy successful behaviors, exhibited by more experienced individuals. Nevertheless, the dynamical mechanisms behind this process remain largely unknown. Here we assume that a complex behavior can be decomposed into a sequence of $n$ motor motifs. Then a neural network capable of activating motor motifs in a given sequence can drive an agent. To account for $(n-1)!$ possible sequences of motifs in a neural network, we employ the winner-less competition approach. We then consider a teacher-learner situation: one agent exhibits a complex movement, while another one aims at mimicking the teacher's behavior. Despite the huge variety of possible motif sequences we show that the learner, equipped with the provided learning model, can rewire ``on the fly'' its synaptic couplings in no more than $(n-1)$ learning cycles and converge exponentially to the durations of the teacher's motifs. We validate the learning model on mobile robots. Experimental results show that indeed the learner is capable of copying the teacher's behavior composed of six motor motifs in a few learning cycles. The reported mechanism of learning is general and can be used for replicating different functions, including, for example, sound patterns or speech.
Submitted 3 February, 2024;
originally announced February 2024.
-
Weakly Supervised Learners for Correction of AI Errors with Provable Performance Guarantees
Authors:
Ivan Y. Tyukin,
Tatiana Tyukina,
Daniel van Helden,
Zedong Zheng,
Evgeny M. Mirkes,
Oliver J. Sutton,
Qinghua Zhou,
Alexander N. Gorban,
Penelope Allison
Abstract:
We present a new methodology for handling AI errors by introducing weakly supervised AI error correctors with a priori performance guarantees. These AI correctors are auxiliary maps whose role is to moderate the decisions of some previously constructed underlying classifier by either approving or rejecting its decisions. The rejection of a decision can be used as a signal to suggest abstaining from making a decision. A key technical focus of the work is in providing performance guarantees for these new AI correctors through bounds on the probabilities of incorrect decisions. These bounds are distribution agnostic and do not rely on assumptions on the data dimension. Our empirical example illustrates how the framework can be applied to improve the performance of an image classifier in a challenging real-world task where training data are scarce.
Submitted 13 February, 2024; v1 submitted 31 January, 2024;
originally announced February 2024.
-
Knowledge-Informed Neuro-Integrators for Aggregation Kinetics
Authors:
Dmitrii Lukashevich,
Ivan Tyukin,
Nikolay Brilliantov
Abstract:
We report a novel approach for the efficient computation of solutions of a broad class of large-scale systems of non-linear ordinary differential equations, describing aggregation kinetics. The method is based on a new take on the dimensionality reduction for this class of equations which can be naturally implemented by a cascade of small feed-forward artificial neural networks. We show that this cascade, of otherwise static models, is capable of predicting solutions of the original large-scale system over large intervals of time, using the information about the solution computed over much smaller intervals. The computational cost of the method depends very mildly on the temporal horizon, which is a major improvement over the current state-of-the-art methods, whose complexity increases super-linearly with the system's size and proportionally to the simulation time. In cases when prior information about the values of solutions over a relatively small interval of time is already available, the method's computational complexity does not depend explicitly on the system's size. The successful application of the new method is illustrated for spatially-homogeneous systems, with a source of monomers, for a number of the most representative reaction rates kernels.
Submitted 11 December, 2023;
originally announced December 2023.
-
Relative intrinsic dimensionality is intrinsic to learning
Authors:
Oliver J. Sutton,
Qinghua Zhou,
Alexander N. Gorban,
Ivan Y. Tyukin
Abstract:
High dimensional data can have a surprising property: pairs of data points may be easily separated from each other, or even from arbitrary subsets, with high probability using just simple linear classifiers. However, this is more of a rule of thumb than a reliable property as high dimensionality alone is neither necessary nor sufficient for successful learning. Here, we introduce a new notion of the intrinsic dimension of a data distribution, which precisely captures the separability properties of the data. For this intrinsic dimension, the rule of thumb above becomes a law: high intrinsic dimension guarantees highly separable data. We extend this notion to that of the relative intrinsic dimension of two data distributions, which we show provides both upper and lower bounds on the probability of successfully learning and generalising in a binary classification problem.
Submitted 10 October, 2023;
originally announced November 2023.
-
The Boundaries of Verifiable Accuracy, Robustness, and Generalisation in Deep Learning
Authors:
Alexander Bastounis,
Alexander N. Gorban,
Anders C. Hansen,
Desmond J. Higham,
Danil Prokhorov,
Oliver Sutton,
Ivan Y. Tyukin,
Qinghua Zhou
Abstract:
In this work, we assess the theoretical limitations of determining guaranteed stability and accuracy of neural networks in classification tasks. We consider a classical distribution-agnostic framework and algorithms minimising empirical risks, potentially subject to some weight regularisation. We show that there is a large family of tasks for which computing and verifying ideal stable and accurate neural networks in the above settings is extremely challenging, if at all possible, even when such ideal solutions exist within the given class of neural architectures.
Submitted 21 November, 2024; v1 submitted 13 September, 2023;
originally announced September 2023.
-
How adversarial attacks can disrupt seemingly stable accurate classifiers
Authors:
Oliver J. Sutton,
Qinghua Zhou,
Ivan Y. Tyukin,
Alexander N. Gorban,
Alexander Bastounis,
Desmond J. Higham
Abstract:
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data. Paradoxically, empirical evidence indicates that even systems which are robust to large random perturbations of the input data remain susceptible to small, easily constructed, adversarial perturbations of their inputs. Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data. We introduce a simple generic and generalisable framework for which key behaviours observed in practical systems arise with high probability -- notably the simultaneous susceptibility of the (otherwise accurate) model to easily constructed adversarial attacks, and robustness to random perturbations of the input data. We confirm that the same phenomena are directly observed in practical neural networks trained on standard image classification problems, where even large additive random noise fails to trigger the adversarial instability of the network. A surprising takeaway is that even small margins separating a classifier's decision surface from training and testing data can hide adversarial susceptibility from being detected using randomly sampled perturbations. Counterintuitively, using additive noise during training or testing is therefore inefficient for eradicating or detecting adversarial examples, and more demanding adversarial training is required.
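The contrast between random and adversarial perturbations described in this abstract can be seen in a toy setting. The sketch below uses a linear classifier in dimension 1000 and a correctly classified point at a small distance (`margin`) from the decision hyperplane. It then compares a short adversarial step along the hyperplane normal with much longer steps in uniformly random directions. The linear model, all parameter values, and the function names are assumptions chosen for illustration; this is not the framework or the experiments of the paper.

```python
import math
import random


def predict(w, x):
    """Linear classifier: +1 if <w, x> >= 0, otherwise -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def random_direction(dim, rng):
    """Unit vector drawn uniformly from the sphere in R^dim."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def flip_rates(dim=1000, margin=0.1, random_radius=0.5, trials=300, seed=0):
    """Return (adversarial_flipped, random_flip_rate) for a toy linear model.

    All defaults are illustrative assumptions: the random perturbations are
    five times longer than the margin, the adversarial one only 10% longer.
    """
    rng = random.Random(seed)
    w = [1.0] + [0.0] * (dim - 1)     # unit normal of the decision hyperplane
    x = [margin] + [0.0] * (dim - 1)  # point at distance `margin` from it
    label = predict(w, x)

    # Adversarial perturbation: step of length 1.1 * margin against w.
    x_adv = [xi - 1.1 * margin * wi for xi, wi in zip(x, w)]
    adversarial_flipped = predict(w, x_adv) != label

    # Random perturbations: steps of length random_radius in random directions.
    flips = 0
    for _ in range(trials):
        u = random_direction(dim, rng)
        x_rand = [xi + random_radius * ui for xi, ui in zip(x, u)]
        flips += predict(w, x_rand) != label
    return adversarial_flipped, flips / trials


if __name__ == "__main__":
    adv, rate = flip_rates()
    print("adversarial step flips the label:", adv)
    print("fraction of random steps that flip the label:", rate)
```

The random steps rarely change the prediction because a random unit vector in high dimension has a component of order $1/\sqrt{d}$ along any fixed direction, so only a small fraction of its length is spent crossing the margin.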
Submitted 9 September, 2024; v1 submitted 7 September, 2023;
originally announced September 2023.
-
Agile gesture recognition for capacitive sensing devices: adapting on-the-job
Authors:
Ying Liu,
Liucheng Guo,
Valeri A. Makarov,
Yuxiang Huang,
Alexander Gorban,
Evgeny Mirkes,
Ivan Y. Tyukin
Abstract:
Automated hand gesture recognition has been a focus of the AI community for decades. Traditionally, work in this domain revolved largely around scenarios assuming the availability of a flow of images of the user's hands. This has partly been due to the prevalence of camera-based devices and the wide availability of image data. However, there is growing demand for gesture recognition technology that can be implemented on low-power devices using limited sensor data instead of high-dimensional inputs like hand images. In this work, we demonstrate a hand gesture recognition system and method that uses signals from capacitive sensors embedded into the etee hand controller. The controller generates real-time signals from each of the wearer's five fingers. We use a machine learning technique to analyse the time series signals and identify three features that can represent 5 fingers within 500 ms. The analysis is composed of a two-stage training strategy, including dimension reduction through principal component analysis and classification with K nearest neighbour. Remarkably, we found that this combination showed a level of performance which was comparable to more advanced methods such as a supervised variational autoencoder. The base system can also be equipped with the capability to learn from occasional errors by providing it with an additional adaptive error correction mechanism. The results showed that the error corrector improves the classification performance in the base system without compromising its performance. The system requires no more than 1 ms of computing time per input sample, and is smaller than deep neural networks, demonstrating the feasibility of agile gesture recognition systems based on this technology.
Submitted 12 May, 2023;
originally announced May 2023.
-
MyI-Net: Fully Automatic Detection and Quantification of Myocardial Infarction from Cardiovascular MRI Images
Authors:
Shuihua Wang,
Ahmed M. S. E. K Abdelaty,
Kelly Parke,
J Ranjit Arnold,
Gerry P McCann,
Ivan Y Tyukin
Abstract:
A "heart attack", or myocardial infarction (MI), occurs when an artery supplying blood to the heart is abruptly occluded. The "gold standard" method for imaging MI is Cardiovascular Magnetic Resonance Imaging (MRI), with intravenously administered gadolinium-based contrast (late gadolinium enhancement). However, no "gold standard" fully automated method for the quantification of MI exists. In this work, we propose an end-to-end fully automatic system (MyI-Net) for the detection and quantification of MI in MRI images. This has the potential to reduce the uncertainty due to the technical variability across labs and inherent problems of the data and labels. Our system consists of four processing stages designed to maintain the flow of information across scales. First, features from raw MRI images are generated using feature extractors built on ResNet and MobileNet architectures. This is followed by the Atrous Spatial Pyramid Pooling (ASPP) to produce spatial information at different scales to preserve more image context. High-level features from ASPP and initial low-level features are concatenated at the third stage and then passed to the fourth stage, where spatial information is recovered via up-sampling to produce the final image segmentation output into: i) background, ii) heart muscle, iii) blood and iv) scar areas. New models were compared with state-of-the-art models and manual quantification. Our models showed favorable performance in global segmentation and scar tissue detection relative to state-of-the-art work, including a four-fold better performance in matching scar pixels to contours produced by clinicians.
Submitted 28 December, 2022;
originally announced December 2022.
-
Towards a mathematical understanding of learning from few examples with nonlinear feature maps
Authors:
Oliver J. Sutton,
Alexander N. Gorban,
Ivan Y. Tyukin
Abstract:
We consider the problem of data classification where the training set consists of just a few data points. We explore this phenomenon mathematically and reveal key relationships between the geometry of an AI model's feature space, the structure of the underlying data distributions, and the model's generalisation capabilities. The main thrust of our analysis is to reveal the influence on the model's generalisation capabilities of nonlinear feature transformations mapping the original data into high, and possibly infinite, dimensional spaces.
Submitted 7 November, 2022;
originally announced November 2022.
-
Learning from few examples with nonlinear feature maps
Authors:
Ivan Y. Tyukin,
Oliver Sutton,
Alexander N. Gorban
Abstract:
In this work we consider the problem of data classification in post-classical settings where the training set consists of merely a few data points. We explore the phenomenon and reveal key relationships between the dimensionality of an AI model's feature space, the non-degeneracy of data distributions, and the model's generalisation capabilities. The main thrust of our present analysis is on the influence of nonlinear feature transformations, mapping original data into higher- and possibly infinite-dimensional spaces, on the resulting model's generalisation capabilities. Subject to appropriate assumptions, we establish new relationships between intrinsic dimensions of the transformed data and the probabilities to learn successfully from few presentations.
Submitted 31 March, 2022;
originally announced March 2022.
-
Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation
Authors:
Qinghua Zhou,
Alexander N. Gorban,
Evgeny M. Mirkes,
Jonathan Bac,
Andrei Zinovyev,
Ivan Y. Tyukin
Abstract:
Finding the best architectures of learning machines, such as deep neural networks, is a well-known technical and theoretical challenge. Recent work by Mellor et al (2021) showed that there may exist correlations between the accuracies of trained networks and the values of some easily computable measures defined on randomly initialised networks, which may make it possible to search tens of thousands of neural architectures without training. Mellor et al used the Hamming distance evaluated over all ReLU neurons as such a measure. Motivated by these findings, in our work, we ask whether there exist other, perhaps more principled, measures which could be used as determinants of success of a given neural architecture. In particular, we examine whether the dimensionality and quasi-orthogonality of a neural network's feature space could be correlated with the network's performance after training. We showed, using the same setup as Mellor et al, that dimensionality and quasi-orthogonality may jointly serve as discriminants of a network's performance. In addition to offering new opportunities to accelerate neural architecture search, our findings suggest important relationships between the networks' final performance and properties of their randomly initialised feature spaces: data dimension and quasi-orthogonality.
Submitted 30 March, 2022;
originally announced March 2022.
-
Situation-based memory in spiking neuron-astrocyte network
Authors:
Susanna Gordleeva,
Yuliya A. Tsybina,
Mikhail I. Krivonosov,
Ivan Y. Tyukin,
Victor B. Kazantsev,
Alexey A. Zaikin,
Alexander N. Gorban
Abstract:
Mammalian brains operate in a very special surrounding: to survive they have to react quickly and effectively to the pool of stimuli patterns previously recognized as danger. Many learning tasks often encountered by living organisms involve a specific set-up centered around a relatively small set of patterns presented in a particular environment. For example, at a party, people recognize friends immediately, without deep analysis, just by seeing a fragment of their clothes. This set-up with reduced "ontology" is referred to as a "situation". Situations are usually local in space and time. In this work, we propose that neuron-astrocyte networks provide a network topology that is effectively adapted to accommodate situation-based memory. In order to illustrate this, we numerically simulate and analyze a well-established model of a neuron-astrocyte network, which is subjected to stimuli conforming to the situation-driven environment. Three pools of stimuli patterns are considered: external patterns, patterns from the situation associative pool regularly presented to the network and learned by the network, and patterns already learned and remembered by astrocytes. Patterns from the external world are added to and removed from the associative pool. Then we show that astrocytes are structurally necessary for an effective function in such a learning and testing set-up. To demonstrate this we present a novel neuromorphic model for short-term memory implemented by a two-net spiking neural-astrocytic network. Our results show that such a system tested on synthesized data with selective astrocyte-induced modulation of neuronal activity provides an enhancement of retrieval quality in comparison to standard spiking neural networks trained via Hebbian plasticity only. We argue that the proposed set-up may offer a new way to analyze, model, and understand neuromorphic artificial intelligence systems.
Submitted 15 February, 2022;
originally announced February 2022.
-
Scikit-dimension: a Python package for intrinsic dimension estimation
Authors:
Jonathan Bac,
Evgeny M. Mirkes,
Alexander N. Gorban,
Ivan Tyukin,
Andrei Zinovyev
Abstract:
Dealing with uncertainty in applications of machine learning to real-life data critically depends on the knowledge of intrinsic dimensionality (ID). A number of methods have been suggested for the purpose of estimating ID, but no standard package to easily apply them one by one or all at once has been implemented in Python. This technical note introduces \texttt{scikit-dimension}, an open-source Python package for intrinsic dimension estimation. The \texttt{scikit-dimension} package provides a uniform implementation of most of the known ID estimators, based on the scikit-learn application programming interface, to evaluate global and local intrinsic dimension, as well as generators of synthetic toy and benchmark datasets widespread in the literature. The package is developed with tools assessing the code quality, coverage, unit testing and continuous integration. We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation in real-life and synthetic data. The source code is available from https://github.com/j-bac/scikit-dimension and the documentation from https://scikit-dimension.readthedocs.io.
Submitted 6 September, 2021;
originally announced September 2021.
-
Learning from scarce information: using synthetic data to classify Roman fine ware pottery
Authors:
Santos J. Núñez Jareño,
Daniël P. van Helden,
Evgeny M. Mirkes,
Ivan Y. Tyukin,
Penelope M. Allison
Abstract:
In this article we consider a version of the challenging problem of learning from datasets whose size is too limited to allow generalisation beyond the training set. To address the challenge we propose to use a transfer learning approach whereby the model is first trained on a synthetic dataset replicating features of the original objects. In this study the objects were smartphone photographs of near-complete Roman terra sigillata pottery vessels from the collection of the Museum of London. Taking the replicated features from published profile drawings of pottery forms allowed the integration of expert knowledge into the process through our synthetic data generator. After this initial training the model was fine-tuned with data from photographs of real vessels. We show, through exhaustive experiments across several popular deep learning architectures, different test priors, and considering the impact of the photograph viewpoint and excessive damage to the vessels, that the proposed hybrid approach enables the creation of classifiers with appropriate generalisation performance. This performance is significantly better than that of classifiers trained exclusively on the original data, which shows the promise of the approach to alleviate the fundamental issue of learning from small datasets.
Submitted 3 July, 2021;
originally announced July 2021.
-
High-dimensional separability for one- and few-shot learning
Authors:
Alexander N. Gorban,
Bogdan Grechuk,
Evgeny M. Mirkes,
Sergey V. Stasenko,
Ivan Y. Tyukin
Abstract:
This work is driven by a practical question: corrections of Artificial Intelligence (AI) errors. These corrections should be quick and non-iterative. To solve this problem without modification of a legacy AI system, we propose special `external' devices, correctors. Elementary correctors consist of two parts: a classifier that separates the situations with high risk of error from the situations in which the legacy AI system works well, and a new decision for situations with potential errors. Input signals for the correctors can be the inputs of the legacy AI system, its internal signals, and outputs. If the intrinsic dimensionality of data is high enough then the classifiers for correction of a small number of errors can be very simple. According to the blessing of dimensionality effects, even simple and robust Fisher's discriminants can be used for one-shot learning of AI correctors. Stochastic separation theorems provide the mathematical basis for this one-shot learning. However, as the number of correctors needed grows, the cluster structure of data becomes important and a new family of stochastic separation theorems is required. We reject the classical hypothesis of the regularity of the data distribution and assume that the data can have a fine-grained structure with many clusters and peaks in the probability density. New stochastic separation theorems for data with fine-grained structure are formulated and proved. Multi-correctors for granular data are proposed. The advantages of the multi-corrector technology are demonstrated by examples of correcting errors and learning new classes of objects by a deep convolutional neural network on the CIFAR-10 dataset. The key problems of the non-classical high-dimensional data analysis are reviewed together with the basic preprocessing steps including supervised, semi-supervised and domain adaptation Principal Component Analysis.
Submitted 22 October, 2021; v1 submitted 28 June, 2021;
originally announced June 2021.
-
The Feasibility and Inevitability of Stealth Attacks
Authors:
Ivan Y. Tyukin,
Desmond J. Higham,
Alexander Bastounis,
Eliyas Woldegeorgis,
Alexander N. Gorban
Abstract:
We develop and study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence (AI) systems including deep learning neural networks. In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself. Such a stealth attack could be conducted by a mischievous, corrupt or disgruntled member of a software development team. It could also be made by those wishing to exploit a ``democratization of AI'' agenda, where network architectures and trained parameter sets are shared publicly. We develop a range of new implementable attack strategies with accompanying analysis, showing that with high probability a stealth attack can be made transparent, in the sense that system performance is unchanged on a fixed validation set which is unknown to the attacker, while evoking any desired output on a trigger input of interest. The attacker only needs to have estimates of the size of the validation set and the spread of the AI's relevant latent space. In the case of deep learning neural networks, we show that a one neuron attack is possible - a modification to the weights and bias associated with a single neuron - revealing a vulnerability arising from over-parameterization. We illustrate these concepts using state of the art architectures on two standard image data sets. Guided by the theory and computational results, we also propose strategies to guard against stealth attacks.
Submitted 4 January, 2023; v1 submitted 26 June, 2021;
originally announced June 2021.
-
Demystification of Few-shot and One-shot Learning
Authors:
Ivan Y. Tyukin,
Alexander N. Gorban,
Muhammad H. Alkhudaydi,
Qinghua Zhou
Abstract:
Few-shot and one-shot learning have been the subject of active and intensive research in recent years, with mounting evidence pointing to successful implementation and exploitation of few-shot learning algorithms in practice. Classical statistical learning theories do not fully explain why few- or one-shot learning is at all possible since traditional generalisation bounds normally require large training and testing samples to be meaningful. This sharply contrasts with numerous examples of successful one- and few-shot learning systems and applications.
In this work we present mathematical foundations for a theory of one-shot and few-shot learning and reveal conditions specifying when such learning schemes are likely to succeed. Our theory is based on intrinsic properties of high-dimensional spaces. We show that if the ambient or latent decision space of a learning machine is sufficiently high-dimensional then a large class of objects in this space can indeed be easily learned from few examples, provided that certain data non-concentration conditions are met.
Submitted 29 May, 2021; v1 submitted 25 April, 2021;
originally announced April 2021.
-
General stochastic separation theorems with optimal bounds
Authors:
Bogdan Grechuk,
Alexander N. Gorban,
Ivan Y. Tyukin
Abstract:
The phenomenon of stochastic separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and analyze AI instabilities. In high-dimensional datasets under broad assumptions each point can be separated from the rest of the set by a simple and robust Fisher's discriminant (is Fisher separable). Errors or clusters of errors can be separated from the rest of the data. The ability to correct an AI system also opens up the possibility of an attack on it, and the high dimensionality induces vulnerabilities caused by the same stochastic separability that holds the keys to understanding the fundamentals of robustness and adaptivity in high-dimensional data-driven AI. To manage errors and analyze vulnerabilities, the stochastic separation theorems should evaluate the probability that the dataset will be Fisher separable in a given dimensionality and for a given class of distributions. Explicit and optimal estimates of these separation probabilities are required, and this problem is solved in the present work. The general stochastic separation theorems with optimal probability estimates are obtained for important classes of distributions: log-concave distributions, their convex combinations and product distributions. The standard i.i.d. assumption is significantly relaxed. These theorems and estimates can be used both for correction of high-dimensional data-driven AI systems and for analysis of their vulnerabilities. The third area of application is the emergence of memories in ensembles of neurons, the phenomena of grandmother's cells and sparse coding in the brain, and explanation of the unexpected effectiveness of small neural ensembles in the high-dimensional brain.
Submitted 9 January, 2021; v1 submitted 11 October, 2020;
originally announced October 2020.
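As a rough illustration of the Fisher separability described in the abstract above, the following minimal Python sketch estimates the fraction of sample points x that satisfy <x, y> < alpha <x, x> for every other sample point y. The sampling distribution (uniform in the cube [-1, 1]^d, which is already centred), the threshold alpha = 0.8, the sample sizes and all function names are assumptions made for illustration only; they are not taken from the paper, and the paper's theorems cover far more general distributions.

```python
import random


def sample_cube(n, d, rng):
    """Draw n points uniformly from the centred cube [-1, 1]^d (illustrative choice)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fisher_separable_fraction(points, alpha=0.8):
    """Fraction of points x with <x, y> < alpha * <x, x> for all other points y.

    This is the Fisher separability test with threshold alpha, applied to data
    that is assumed to be centred at the origin.
    """
    separable = 0
    for i, x in enumerate(points):
        threshold = alpha * dot(x, x)
        if all(dot(x, y) < threshold for j, y in enumerate(points) if j != i):
            separable += 1
    return separable / len(points)


rng = random.Random(0)  # fixed seed so the run is repeatable
low = fisher_separable_fraction(sample_cube(200, 2, rng))
high = fisher_separable_fraction(sample_cube(200, 100, rng))
print("dimension 2:  ", low)
print("dimension 100:", high)
```

With these assumed settings one would expect only a small fraction of points to be separable in dimension 2 and nearly all of them in dimension 100, which is the qualitative effect the stochastic separation theorems quantify.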
-
On Adversarial Examples and Stealth Attacks in Artificial Intelligence Systems
Authors:
Ivan Y. Tyukin,
Desmond J. Higham,
Alexander N. Gorban
Abstract:
In this work we present a formal theoretical framework for assessing and analyzing two classes of malevolent action towards generic Artificial Intelligence (AI) systems. Our results apply to general multi-class classifiers that map from an input space into a decision space, including artificial neural networks used in deep learning applications. Two classes of attacks are considered. The first class involves adversarial examples and concerns the introduction of small perturbations of the input data that cause misclassification. The second class, introduced here for the first time and named stealth attacks, involves small perturbations to the AI system itself. Here the perturbed system produces whatever output is desired by the attacker on a specific small data set, perhaps even a single input, but performs as normal on a validation set (which is unknown to the attacker). We show that in both cases, i.e., in the case of an attack based on adversarial examples and in the case of a stealth attack, the dimensionality of the AI's decision-making space is a major contributor to the AI's susceptibility. For attacks based on adversarial examples, a second crucial parameter is the absence of local concentrations in the data probability distribution, a property known as Smeared Absolute Continuity. According to our findings, robustness to adversarial examples requires either (a) the data distributions in the AI's feature space to have concentrated probability density functions or (b) the dimensionality of the AI's decision variables to be sufficiently small. We also show how to construct stealth attacks on high-dimensional AI systems that are hard to spot unless the validation set is made exponentially large.
Submitted 9 April, 2020;
originally announced April 2020.
-
High-Dimensional Brain in a High-Dimensional World: Blessing of Dimensionality
Authors:
Alexander N. Gorban,
Valery A. Makarov,
Ivan Y. Tyukin
Abstract:
High-dimensional data and high-dimensional representations of reality are inherent features of modern Artificial Intelligence systems and applications of machine learning. The well-known phenomenon of the "curse of dimensionality" states: many problems become exponentially difficult in high dimensions. Recently, the other side of the coin, the "blessing of dimensionality", has attracted much attention. It turns out that generic high-dimensional datasets exhibit fairly simple geometric properties. Thus, there is a fundamental tradeoff between complexity and simplicity in high dimensional spaces. Here we present a brief explanatory review of recent ideas, results and hypotheses about the blessing of dimensionality and related simplifying effects relevant to machine learning and neuroscience.
Submitted 14 January, 2020;
originally announced January 2020.
-
Universal principles justify the existence of concept cells
Authors:
Carlos Calvo Tapia,
Ivan Tyukin,
Valeri A. Makarov
Abstract:
It is widely believed that complex cognitive phenomena require the perfectly orchestrated collaboration of many neurons. However, this is not what converging experimental evidence suggests. Single neurons, the so-called concept cells, may be responsible for complex tasks performed by an individual. Here, starting from a few first principles, we lay out physical foundations showing that concept cells are not only possible but highly likely, given that neurons work in a high-dimensional space.
Submitted 4 December, 2019;
originally announced December 2019.
-
Blessing of dimensionality at the edge
Authors:
Ivan Y. Tyukin,
Alexander N. Gorban,
Alistair A. McEwan,
Sepehr Meshkinfamfard,
Lixin Tang
Abstract:
In this paper we present theory and algorithms enabling classes of Artificial Intelligence (AI) systems to continuously and incrementally improve with a-priori quantifiable guarantees - or more specifically remove classification errors - over time. This is distinct from state-of-the-art machine learning, AI, and software approaches. Another feature of this approach is that, in the supervised setting, the computational complexity of training is linear in the number of training samples. At the time of classification, the computational complexity is bounded by a few inner product calculations. Moreover, the implementation is shown to be very scalable. This makes it viable for deployment in applications where computational power and memory are limited, such as embedded environments. It enables fast on-line optimisation using improved training samples. The approach is based on the concentration of measure effects and stochastic separation theorems and is illustrated with an example on the identification of faulty processes in Computer Numerical Control (CNC) milling and with a case study on adaptive removal of false positives in an industrial video surveillance and analytics system.
Submitted 10 July, 2020; v1 submitted 30 September, 2019;
originally announced October 2019.
-
Symphony of high-dimensional brain
Authors:
Alexander N. Gorban,
Valeri A. Makarov,
Ivan Y. Tyukin
Abstract:
This paper is the final part of the scientific discussion organised by the journal "Physics of Life Reviews" about the simplicity revolution in neuroscience and AI. This discussion was initiated by the review paper "The unreasonable effectiveness of small neural ensembles in high-dimensional brain", Phys Life Rev 2019, doi 10.1016/j.plrev.2018.09.005, arXiv:1809.07656. The topics of the discussion varied from the necessity to take into account the difference between the theoretical random distributions and "extremely non-random" real distributions and revise the common machine learning theory, to different forms of the curse of dimensionality and high-dimensional pitfalls in neuroscience. V. Kůrková, A. Tozzi and J.F. Peters, R. Quian Quiroga, P. Varona, R. Barrio, G. Kreiman, L. Fortuna, C. van Leeuwen, R. Quian Quiroga, and V. Kreinovich, A.N. Gorban, V.A. Makarov, and I.Y. Tyukin participated in the discussion. In this paper we analyse the symphony of opinions and the possible outcomes of the simplicity revolution for machine learning and neuroscience.
Submitted 27 June, 2019;
originally announced June 2019.
-
Simple model of complex dynamics of activity patterns in developing networks of neuronal cultures
Authors:
I. Y. Tyukin,
D. Iudin,
F. Iudin,
T. Tyukina,
V. Kazantsev,
I. Mukhina,
A. N. Gorban
Abstract:
Living neuronal networks in dissociated neuronal cultures are widely known for their ability to generate highly robust spatiotemporal activity patterns in various experimental conditions. These include neuronal avalanches satisfying the power scaling law and thereby exemplifying self-organized criticality in living systems. A crucial question is how these patterns can be explained and modeled in a way that is biologically meaningful, mathematically tractable and yet broad enough to account for neuronal heterogeneity and complexity. Here we propose a simple model which may offer an answer to this question. Our derivations are based on just a few phenomenological observations concerning the input-output behavior of an isolated neuron. A distinctive feature of the model is that at the simplest level of description it comprises only two variables: a network activity variable and an exogenous variable corresponding to the energy needed to sustain the activity and modulate the efficacy of signal transmission. Strikingly, this simple model is already capable of explaining the emergence of network spikes and bursts in developing neuronal cultures. The model behavior and predictions are supported by empirical observations and published experimental evidence on the behavior of cultured neurons exposed to oxygen and energy deprivation. At the larger, network scale, introduction of the energy-dependent regulatory mechanism enables the network to balance on the edge of the network percolation transition. Network activity in this state shows population bursts satisfying the scaling avalanche conditions. This network state is self-sustainable and represents a balance between global network-wide processes and spontaneous activity of individual elements.
Submitted 22 December, 2018;
originally announced December 2018.
-
Correction of AI systems by linear discriminants: Probabilistic foundations
Authors:
A. N. Gorban,
A. Golubkov,
B. Grechuk,
E. M. Mirkes,
I. Y. Tyukin
Abstract:
Artificial Intelligence (AI) systems sometimes make errors and will make errors in the future, from time to time. These errors are usually unexpected, and can lead to dramatic consequences. Intensive development of AI and its practical applications makes the problem of errors more important. Total re-engineering of the systems can create new errors and is not always possible due to the resources involved. The important challenge is to develop fast methods to correct errors without damaging existing skills. We formulate the technical requirements for the 'ideal' correctors. Such correctors include binary classifiers, which separate the situations with high risk of errors from the situations where the AI systems work properly. Surprisingly, for essentially high-dimensional data such methods are possible: a simple linear Fisher discriminant can separate the situations with errors from correctly solved tasks even for exponentially large samples. The paper presents the probabilistic basis for fast non-destructive correction of AI systems. A series of new stochastic separation theorems is proven. These theorems provide new instruments for fast non-iterative correction of errors of legacy AI systems. The new approaches become efficient in high dimensions, for correction of high-dimensional systems in a high-dimensional world (i.e. for processing of essentially high-dimensional data by large systems).
Submitted 11 November, 2018;
originally announced November 2018.
-
Fast Construction of Correcting Ensembles for Legacy Artificial Intelligence Systems: Algorithms and a Case Study
Authors:
Ivan Y. Tyukin,
Alexander N. Gorban,
Stephen Green,
Danil Prokhorov
Abstract:
This paper presents a technology for simple and computationally efficient improvements of a generic Artificial Intelligence (AI) system, including Multilayer and Deep Learning neural networks. The improvements are, in essence, small network ensembles constructed on top of the existing AI architectures. Theoretical foundations of the technology are based on Stochastic Separation Theorems and the ideas of the concentration of measure. We show that, subject to mild technical assumptions on statistical properties of internal signals in the original AI system, the technology enables instantaneous and computationally efficient removal of spurious and systematic errors with probability close to one on the datasets which are exponentially large in dimension. The method is illustrated with numerical examples and a case study of ten digits recognition from American Sign Language.
Submitted 13 February, 2019; v1 submitted 12 October, 2018;
originally announced October 2018.
-
The unreasonable effectiveness of small neural ensembles in high-dimensional brain
Authors:
A. N. Gorban,
V. A. Makarov,
I. Y. Tyukin
Abstract:
Despite the widespread consensus on the brain complexity, sprouts of the single neuron revolution emerged in neuroscience in the 1970s. They brought many unexpected discoveries, including grandmother or concept cells and sparse coding of information in the brain.
In machine learning for a long time, the famous curse of dimensionality seemed to be an unsolvable problem. Nevertheless, the idea of the blessing of dimensionality is gradually becoming more and more popular. Ensembles of non-interacting or weakly interacting simple units prove to be an effective tool for solving essentially multidimensional problems. This approach is especially useful for one-shot (non-iterative) correction of errors in large legacy artificial intelligence systems.
These simplicity revolutions in the era of complexity have deep fundamental reasons grounded in the geometry of multidimensional data spaces. To explore and understand these reasons we revisit the background ideas of statistical physics. In the course of the 20th century they were developed into the concentration of measure theory. New stochastic separation theorems reveal the fine structure of the data clouds.
We review and analyse biological, physical, and mathematical problems at the core of the fundamental question: how can a high-dimensional brain organise reliable and fast learning in a high-dimensional world of data by simple tools?
Two critical applications are reviewed to exemplify the approach: one-shot correction of errors in intellectual systems and emergence of static and associative memories in ensembles of single neurons.
Submitted 10 November, 2018; v1 submitted 20 September, 2018;
originally announced September 2018.
-
How deep should be the depth of convolutional neural networks: a backyard dog case study
Authors:
A. N. Gorban,
E. M. Mirkes,
I. Y. Tyukin
Abstract:
The work concerns the problem of reducing a pre-trained deep neuronal network to a smaller network, with just few layers, whilst retaining the network's functionality on a given task
The proposed approach is motivated by the observation that the aim to deliver the highest accuracy possible in the broadest range of operational conditions, which many deep neural networks models strive to achieve,…
▽ More
The work concerns the problem of reducing a pre-trained deep neuronal network to a smaller network, with just few layers, whilst retaining the network's functionality on a given task
The proposed approach is motivated by the observation that the aim to deliver the highest accuracy possible in the broadest range of operational conditions, which many deep neural networks models strive to achieve, may not necessarily be always needed, desired, or even achievable due to the lack of data or technical constraints. In relation to the face recognition problem, we formulated an example of such a usecase, the `backyard dog' problem. The `backyard dog', implemented by a lean network, should correctly identify members from a limited group of individuals, a `family', and should distinguish between them. At the same time, the network must produce an alarm to an image of an individual who is not in a member of the family. To produce such a network, we propose a shallowing algorithm. The algorithm takes an existing deep learning model on its input and outputs a shallowed version of it. The algorithm is non-iterative and is based on the Advanced Supervised Principal Component Analysis. Performance of the algorithm is assessed in exhaustive numerical experiments. In the above usecase, the `backyard dog' problem, the method is capable of drastically reducing the depth of deep learning neural networks, albeit at the cost of mild performance deterioration.
We developed a simple non-iterative method for shallowing down pre-trained deep networks. The method is generic in the sense that it applies to a broad class of feed-forward networks, and is based on the Advanced Supervise Principal Component Analysis. The method enables generation of families of smaller-size shallower specialized networks tuned for specific operational conditions and tasks from a single larger and more universal legacy network.
Submitted 8 December, 2019; v1 submitted 3 May, 2018;
originally announced May 2018.
-
Augmented Artificial Intelligence: a Conceptual Framework
Authors:
Alexander N. Gorban,
Bogdan Grechuk,
Ivan Y. Tyukin
Abstract:
All artificial intelligence (AI) systems make errors. These errors are unexpected, and often differ from typical human mistakes ("non-human" errors). The AI errors should be corrected without damaging existing skills and, hopefully, without direct human expertise. This paper presents an initial summary report of a project taking a new and systematic approach to improving the intellectual effectiveness of the individual AI by communities of AIs. We combine some ideas of learning in heterogeneous multiagent systems with new and original mathematical approaches for non-iterative corrections of errors of legacy AI systems. The mathematical foundations of AI non-destructive correction are presented and a series of new stochastic separation theorems is proven. These theorems provide a new instrument for the development, analysis, and assessment of machine learning methods and algorithms in high dimension. They demonstrate that in high dimensions and even for exponentially large samples, linear classifiers in their classical Fisher's form are powerful enough to separate errors from correct responses with high probability and to provide an efficient solution to the non-destructive corrector problem. In particular, we prove some hypotheses formulated in our paper `Stochastic Separation Theorems' (Neural Networks, 94, 255--259, 2017), and answer one general problem published by Donoho and Tanner in 2009.
Submitted 24 March, 2018; v1 submitted 6 February, 2018;
originally announced February 2018.
-
Blessing of dimensionality: mathematical foundations of the statistical physics of data
Authors:
A. N. Gorban,
I. Y. Tyukin
Abstract:
The concentration of measure phenomena were discovered as the mathematical background of statistical mechanics at the end of the XIX - beginning of the XX century and were then explored in mathematics of the XX-XXI centuries. At the beginning of the XXI century, it became clear that the proper utilisation of these phenomena in machine learning might transform the curse of dimensionality into the blessing of dimensionality.
This paper summarises recently discovered phenomena of measure concentration which drastically simplify some machine learning problems in high dimension, and allow us to correct legacy artificial intelligence systems. The classical concentration of measure theorems state that i.i.d. random points are concentrated in a thin layer near a surface (a sphere or equators of a sphere, an average or median level set of energy or another Lipschitz function, etc.).
The new stochastic separation theorems describe the thin structure of these thin layers: the random points are not only concentrated in a thin layer but are all linearly separable from the rest of the set, even for exponentially large random sets. The linear functionals for separation of points can be selected in the form of the linear Fisher's discriminant.
All artificial intelligence systems make errors. Non-destructive correction requires separation of the situations (samples) with errors from the samples corresponding to correct behaviour by a simple and robust classifier. The stochastic separation theorems provide us with such classifiers and with a non-iterative (one-shot) procedure for learning.
Submitted 10 January, 2018;
originally announced January 2018.
-
High-dimensional brain. A tool for encoding and rapid learning of memories by single neurons
Authors:
Ivan Y. Tyukin,
Alexander N. Gorban,
Carlos Calvo,
Julia Makarova,
Valeri A. Makarov
Abstract:
Codifying memories is one of the fundamental problems of modern Neuroscience. The functional mechanisms behind this phenomenon remain largely unknown. Experimental evidence suggests that some of the memory functions are performed by stratified brain structures such as, e.g., the hippocampus. In this particular case, single neurons in the CA1 region receive a highly multidimensional input from the CA3 area, which is a hub for information processing. We thus assess the implication of the abundance of neuronal signalling routes converging onto single cells on the information processing. We show that single neurons can selectively detect and learn arbitrary information items, given that they operate in high dimensions. The argument is based on Stochastic Separation Theorems and the concentration of measure phenomena. We demonstrate that a simple enough functional neuronal model is capable of explaining: i) the extreme selectivity of single neurons to the information content, ii) simultaneous separation of several uncorrelated stimuli or informational items from a large set, and iii) dynamic learning of new items by associating them with already "known" ones. These results constitute a basis for organization of complex memories in ensembles of single neurons. Moreover, they show that no a priori assumptions on the structural organization of neuronal ensembles are necessary for explaining basic concepts of static and dynamic memories.
Submitted 27 January, 2018; v1 submitted 30 October, 2017;
originally announced October 2017.
-
Knowledge Transfer Between Artificial Intelligence Systems
Authors:
Ivan Y. Tyukin,
Alexander N. Gorban,
Konstantin Sofeikov,
Ilya Romanenko
Abstract:
We consider the fundamental question: how could a legacy "student" Artificial Intelligence (AI) system learn from a legacy "teacher" AI system or a human expert without complete re-training and, most importantly, without requiring significant computational resources? Here "learning" is understood as an ability of one system to mimic responses of the other and vice versa. We call such learning an Artificial Intelligence knowledge transfer. We show that if internal variables of the "student" Artificial Intelligence system have the structure of an $n$-dimensional topological vector space and $n$ is sufficiently high then, with probability close to one, the required knowledge transfer can be implemented by simple cascades of linear functionals. In particular, for $n$ sufficiently large, with probability close to one, the "student" system can successfully and non-iteratively learn $k\ll n$ new examples from the "teacher" (or correct the same number of mistakes) at the cost of two additional inner products. The concept is illustrated with an example of knowledge transfer from a pre-trained convolutional neural network to a simple linear classifier with HOG features.
Submitted 14 November, 2017; v1 submitted 5 September, 2017;
originally announced September 2017.
-
Stochastic Separation Theorems
Authors:
A. N. Gorban,
I. Y. Tyukin
Abstract:
The problem of non-iterative one-shot and non-destructive correction of unavoidable mistakes arises in all Artificial Intelligence applications in the real world. Its solution requires robust separation of samples with errors from samples where the system works properly. We demonstrate that in (moderately) high dimension this separation could be achieved with probability close to one by linear discriminants. Surprisingly, separation of a new image from a very large set of known images is almost always possible even in moderately high dimensions by linear functionals, and coefficients of these functionals can be found explicitly. Based on fundamental properties of measure concentration, we show that for $M<a\exp(bn)$ random $M$-element sets in $\mathbb{R}^n$ are linearly separable with probability $p$, $p>1-\vartheta$, where $1>\vartheta>0$ is a given small constant. Exact values of $a,b>0$ depend on the probability distribution that determines how the random $M$-element sets are drawn, and on the constant $\vartheta$. These stochastic separation theorems provide a new instrument for the development, analysis, and assessment of machine learning methods and algorithms in high dimension. Theoretical statements are illustrated with numerical examples.
Submitted 3 August, 2017; v1 submitted 3 March, 2017;
originally announced March 2017.
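The separation effect described in the abstract above can be illustrated numerically. The sketch below is not the authors' experiment: it assumes points drawn uniformly from the cube $[-1,1]^n$ and uses a simplified inner-product test, $\langle x, y\rangle < \alpha \langle x, x\rangle$, as a stand-in for a linear discriminant. The function names, the threshold `alpha`, and the sample sizes are all hypothetical choices made for illustration.

```python
import random


def inner(x, y):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def separable_fraction(n, m, alpha=0.8, seed=0):
    """Fraction of points x with <x, y> < alpha * <x, x> for every other y.

    Assumption: the m points are i.i.d. uniform in the cube [-1, 1]^n.
    A point passing the test is cut off from all other points by the
    hyperplane {z : <x, z> = alpha * <x, x>}.
    """
    rng = random.Random(seed)
    pts = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(m)]
    count = 0
    for i, x in enumerate(pts):
        threshold = alpha * inner(x, x)
        if all(inner(x, y) < threshold for j, y in enumerate(pts) if j != i):
            count += 1
    return count / m


# Compare low and high dimension for the same sample size.
for n in (2, 10, 50, 200):
    print(n, separable_fraction(n, m=100))
```

Under these assumptions the fraction of separable points should be small in dimension 2 and approach one as $n$ grows, which is the qualitative behaviour the abstract describes.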
-
One-Trial Correction of Legacy AI Systems and Stochastic Separation Theorems
Authors:
Alexander N. Gorban,
Ilya Romanenko,
Richard Burton,
Ivan Y. Tyukin
Abstract:
We consider the problem of efficient "on the fly" tuning of existing, or legacy, Artificial Intelligence (AI) systems. The legacy AI systems are allowed to be of arbitrary class, albeit the data they are using for computing interim or final decision responses should possess an underlying structure of a high-dimensional topological real vector space. The tuning method that we propose enables dealing with errors without the need to re-train the system. Instead of re-training, a simple cascade of perceptron nodes is added to the legacy system. The added cascade modulates the AI legacy system's decisions. If applied repeatedly, the process results in a network of modulating rules "dressing up" and improving performance of existing AI systems. The mathematical rationale behind the method is based on the fundamental property of measure concentration in high-dimensional spaces. The method is illustrated with an example of fine-tuning a deep convolutional network that has been pre-trained to detect pedestrians in images.
Submitted 13 February, 2019; v1 submitted 3 October, 2016;
originally announced October 2016.
-
Fast Sampling of Evolving Systems with Periodic Trajectories
Authors:
I. Yu. Tyukin,
A. N. Gorban,
T. A. Tyukina,
J. Al Ameri,
Yu. A. Korablev
Abstract:
We propose a novel method for fast and scalable evaluation of periodic solutions of systems of ordinary differential equations for a given set of parameter values and initial conditions. The equations governing the system dynamics are supposed to be of a special class, albeit admitting nonlinear parametrization and state nonlinearities. The method makes it possible to represent a given periodic solution as sums of computable integrals and functions that are explicitly dependent on parameters of interest and initial conditions. This allows invoking parallel computational streams in order to increase the speed of calculations. Performance and practical implications of the method are illustrated with examples including a classical predator-prey system and models of neuronal cells.
Submitted 27 May, 2016; v1 submitted 10 November, 2015;
originally announced November 2015.
-
Approximation with Random Bases: Pro et Contra
Authors:
Alexander N. Gorban,
Ivan Yu. Tyukin,
Danil V. Prokhorov,
Konstantin I. Sofeikov
Abstract:
In this work we discuss the problem of selecting suitable approximators from families of parameterized elementary functions that are known to be dense in a Hilbert space of functions. We consider and analyze published procedures, both randomized and deterministic, for selecting elements from these families that have been shown to ensure the rate of convergence in $L_2$ norm of order $O(1/N)$, where $N$ is the number of elements. We show that both randomized and deterministic procedures are successful if additional information about the families of functions to be approximated is provided. In the absence of such additional information one may observe exponential growth of the number of terms needed to approximate the function and/or extreme sensitivity of the outcome of the approximation to parameters. Implications of our analysis for applications of neural networks in modeling and control are illustrated with examples.
Submitted 24 October, 2015; v1 submitted 15 June, 2015;
originally announced June 2015.
-
Leaders do not look back, or do they?
Authors:
Alexander N. Gorban,
Nick Jarman,
Erik Steur,
Cees van Leeuwen,
Ivan Tyukin
Abstract:
We study the effect of adding to a directed chain of interconnected systems a directed feedback from the last element in the chain to the first. The problem is closely related to the fundamental question of how a change in network topology may influence the behavior of coupled systems. We begin the analysis by investigating a simple linear system. The matrix that specifies the system dynamics is the transpose of the network Laplacian matrix, which codes the connectivity of the network. Our analysis shows that for any nonzero complex eigenvalue $\lambda$ of this matrix, the following inequality holds: $\frac{|\Im \lambda|}{|\Re \lambda|} \leq \cot\frac{\pi}{n}$. This bound is sharp, as it becomes an equality for an eigenvalue of a simple directed cycle with uniform interaction weights. The latter has the slowest decay of oscillations among all other network configurations with the same number of states. The result is generalized to directed rings and chains of identical nonlinear oscillators. For directed rings, a lower bound $\sigma_c$ for the connection strengths that guarantees asymptotic synchronization is found to follow a similar pattern: $\sigma_c=\frac{1}{1-\cos\left(2\pi/n\right)}$. Numerical analysis revealed that, depending on the network size $n$, multiple dynamic regimes co-exist in the state space of the system. In addition to the fully synchronous state a rotating wave solution occurs. The effect is observed in networks exceeding a certain critical size. The emergence of a rotating wave highlights the importance of long chains and loops in networks of oscillators: the larger the size of chains and loops, the more sensitive the network dynamics becomes to removal or addition of a single connection.
Submitted 21 May, 2015; v1 submitted 6 May, 2015;
originally announced May 2015.
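The sharpness of the eigenvalue bound quoted in the abstract above can be checked for the simple directed cycle. The sketch below assumes a Laplacian of the form $L = w(I - P)$, with $P$ a cyclic permutation matrix and uniform weight $w$, and takes the system matrix to be $-L^T$; the sign convention and the function names are illustrative assumptions, not taken from the paper. Since $L$ is circulant, its eigenvalues are $w(1 - e^{2\pi i k/n})$, so no linear algebra library is needed.

```python
import cmath
import math


def directed_cycle_eigenvalues(n, weight=1.0):
    """Eigenvalues of -L^T for a directed cycle on n nodes.

    Assumption: L = weight * (I - P) with P a cyclic permutation matrix.
    L is circulant, so its eigenvalues are weight * (1 - w**k) with
    w = exp(2*pi*i/n); transposition leaves the spectrum unchanged.
    """
    return [-weight * (1 - cmath.exp(2j * math.pi * k / n)) for k in range(n)]


def max_oscillation_ratio(n):
    """Largest |Im(lam)| / |Re(lam)| over the nonzero eigenvalues."""
    ratios = []
    for lam in directed_cycle_eigenvalues(n):
        if abs(lam) > 1e-12:  # skip the zero eigenvalue (k = 0)
            ratios.append(abs(lam.imag) / abs(lam.real))
    return max(ratios)


def critical_coupling(n):
    """sigma_c = 1 / (1 - cos(2*pi/n)), the expression quoted in the abstract."""
    return 1.0 / (1.0 - math.cos(2 * math.pi / n))


# Print the attained ratio next to the bound cot(pi/n), plus sigma_c.
for n in (3, 5, 10, 50):
    bound = 1.0 / math.tan(math.pi / n)
    print(n, max_oscillation_ratio(n), bound, critical_coupling(n))
```

For each $n$ the two printed ratios should agree up to rounding, because $\sin\theta/(1-\cos\theta) = \cot(\theta/2)$ with $\theta = 2\pi k/n$ is largest at $k = 1$. Both the bound and $\sigma_c$ grow with $n$.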
-
Adaptive observers for nonlinearly parameterized systems subjected to parametric constraints
Authors:
I. Yu. Tyukin,
P. A. Rogachev,
H. Nijmeijer
Abstract:
We consider the problem of adaptive observer design in the setting where the system is allowed to be nonlinear in the parameters and, furthermore, the parameters are required to satisfy additional feasibility constraints. A solution to the problem is proposed that is based on the idea of universal observers and the non-uniform small-gain theorem. The procedure is illustrated with an example.
Submitted 17 December, 2014;
originally announced December 2014.
-
Further Results on Lyapunov-Like Conditions of Forward Invariance and Boundedness for a Class of Unstable Systems
Authors:
A. N. Gorban,
I. Yu. Tyukin,
H. Nijmeijer
Abstract:
We provide several characterizations of convergence to unstable equilibria in nonlinear systems. Our current contribution is three-fold. First, we present simple algebraic conditions for establishing local convergence of non-trivial solutions of nonlinear systems to unstable equilibria. The conditions are based on the earlier work (A.N. Gorban, I.Yu. Tyukin, E. Steur, and H. Nijmeijer, SIAM Journal on Control and Optimization, Vol. 51, No. 3, 2013) and can be viewed as an extension of Lyapunov's first method in that they apply to systems in which the corresponding Jacobian has one zero eigenvalue. Second, we show that for a relevant subclass of systems, persistency of excitation of a function of time in the right-hand side of the equations governing dynamics of the system ensures the existence of an attractor basin such that solutions passing through this basin in forward time converge to the origin exponentially. Finally, we demonstrate that the conditions developed in (A.N. Gorban, I.Yu. Tyukin, E. Steur, and H. Nijmeijer, SIAM Journal on Control and Optimization, Vol. 51, No. 3, 2013) may be remarkably tight.
Submitted 1 December, 2014;
originally announced December 2014.
-
Optimal measurement of visual motion across spatial and temporal scales
Authors:
Sergei Gepshtein,
Ivan Tyukin
Abstract:
Sensory systems use limited resources to mediate the perception of a great variety of objects and events. Here a normative framework is presented for exploring how the problem of efficient allocation of resources can be solved in visual perception. Starting with a basic property of every measurement, captured by Gabor's uncertainty relation about the location and frequency content of signals, prescriptions are developed for optimal allocation of sensors for reliable perception of visual motion. This study reveals that a large-scale characteristic of human vision (the spatiotemporal contrast sensitivity function) is similar to the optimal prescription, and it suggests that some previously puzzling phenomena of visual sensitivity, adaptation, and perceptual organization have simple principled explanations.
Submitted 2 May, 2014;
originally announced May 2014.
-
Supplementary material for: Adaptive Observers and Parameter Estimation for a Class of Systems Nonlinear in the Parameters
Authors:
Ivan Tyukin,
Erik Steur,
Henk Nijmeijer,
Cees van Leeuwen
Abstract:
This supplement illustrates application of adaptive observer design from (Tyukin et al, 2013) for systems which are not uniquely identifiable. It also provides an example of adaptive observer design for a magnetic bearings benchmark system (Lin, Knospe, 2000).
Submitted 16 April, 2013; v1 submitted 15 April, 2013;
originally announced April 2013.
-
Explicit Reduced-Order Integral Formulations of State and Parameter Estimation Problems for a Class of Nonlinear Systems
Authors:
I. Yu. Tyukin,
A. N. Gorban
Abstract:
We propose a technique for reformulating state and parameter estimation problems as problems of matching explicitly computable definite integrals with known kernels to data. The technique applies to a class of systems of nonlinear ordinary differential equations and aims to exploit parallel computational streams in order to increase the speed of calculations. The idea is based on the classical adaptive observer design. It is shown that if the data is periodic, it may be possible to reduce the dimensionality of the inference problem to the dimension of the vector of parameters entering the right-hand side of the model nonlinearly. Performance and practical implications of the method are illustrated on a benchmark model governing the dynamics of voltage generated in barnacle giant muscle.
Submitted 10 September, 2013; v1 submitted 5 April, 2013;
originally announced April 2013.
-
Uncertainty of visual measurement and efficient allocation of sensory resources
Authors:
Sergei Gepshtein,
Ivan Tyukin
Abstract:
We review the reasoning underlying two approaches to the combination of sensory uncertainties. The first approach is noncommittal, making no assumptions about properties of uncertainty or parameters of stimulation. Then we explain the relationship between this approach and the one commonly used in modeling "higher level" aspects of sensory systems, such as in visual cue integration, where assumptions are made about properties of stimulation. The two approaches follow similar logic, except that in one case maximal uncertainty is minimized, and in the other minimal certainty is maximized. Then we demonstrate how optimal solutions are found to the problem of resource allocation under uncertainty.
Submitted 2 May, 2014; v1 submitted 1 July, 2010;
originally announced July 2010.
-
Feasibility of random basis function approximators for modeling and control
Authors:
Ivan Tyukin,
Danil Prokhorov
Abstract:
We discuss the role of random basis function approximators in modeling and control. We analyze the published work on random basis function approximators and demonstrate that their favorable error rate of convergence O(1/n) is guaranteed only with very substantial computational resources. We also discuss implications of our analysis for applications of neural networks in modeling and control.
Submitted 5 May, 2009;
originally announced May 2009.
-
Observers for canonic models of neural oscillators
Authors:
David Fairhurst,
Ivan Tyukin,
Henk Nijmeijer,
Cees van Leeuwen
Abstract:
We consider the problem of state and parameter estimation for a wide class of nonlinear oscillators. Observable variables are limited to a few components of the state vector and an input signal. The problem of state and parameter reconstruction is viewed within the classical framework of observer design. This framework offers computationally efficient solutions to the problem of state and parameter reconstruction of a system of nonlinear differential equations, provided that these equations are in the so-called adaptive observer canonical form. We show that despite typical neural oscillators being locally observable they are not in the adaptive observer canonical form. Furthermore, we show that no parameter-independent diffeomorphism exists such that the original equations of these models can be transformed into the adaptive observer canonical form. We demonstrate, however, that for the class of Hindmarsh-Rose and FitzHugh-Nagumo models, parameter-dependent coordinate transformations can be used to render these systems into the adaptive observer canonical form. This allows reconstruction, at least partially and up to a (bi)linear transformation, of unknown state and parameter values with exponential rate of convergence. In order to avoid the problem of only partial reconstruction and to deal with more general nonlinear models in which the unknown parameters enter the system nonlinearly, we present a new method for state and parameter reconstruction for these systems. The method combines advantages of standard Lyapunov-based design with more flexible design and analysis techniques based on the non-uniform small-gain theorems. Effectiveness of the method is illustrated with simple numerical examples.
Submitted 18 June, 2010; v1 submitted 1 May, 2009;
originally announced May 2009.