Showing 1–32 of 32 results for author: Mirkes, E M

  1. arXiv:2505.19635  [pdf, ps, other]

    cs.LG math.ST stat.ML

    When fractional quasi p-norms concentrate

    Authors: Ivan Y. Tyukin, Bogdan Grechuk, Evgeny M. Mirkes, Alexander N. Gorban

    Abstract: Concentration of distances in high dimension is an important factor for the development and design of stable and reliable data analysis algorithms. In this paper, we address the fundamental long-standing question about the concentration of distances in high dimension for fractional quasi $p$-norms, $p\in(0,1)$. The topic has been at the centre of various theoretical and empirical controversies. He…

    Submitted 26 May, 2025; originally announced May 2025.

    MSC Class: 68T09; 62R07; 94A16

  2. arXiv:2402.06563  [pdf]

    cs.LG cs.AI cs.CL cs.HC cs.IT

    What is Hiding in Medicine's Dark Matter? Learning with Missing Data in Medical Practices

    Authors: Neslihan Suzen, Evgeny M. Mirkes, Damian Roland, Jeremy Levesley, Alexander N. Gorban, Tim J. Coats

    Abstract: Electronic patient records (EPRs) produce a wealth of data but contain significant missing information. Understanding and handling this missing data is an important part of clinical data analysis and if left unaddressed could result in bias in analysis and distortion in critical conclusions. Missing data may be linked to health care professional practice patterns and imputation of missing data can…

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: 8 pages

    Journal ref: 2023 IEEE International Conference on Big Data (BigData), 4979-4986

  3. arXiv:2402.00899  [pdf, other]

    cs.LG cs.AI stat.ML

    Weakly Supervised Learners for Correction of AI Errors with Provable Performance Guarantees

    Authors: Ivan Y. Tyukin, Tatiana Tyukina, Daniel van Helden, Zedong Zheng, Evgeny M. Mirkes, Oliver J. Sutton, Qinghua Zhou, Alexander N. Gorban, Penelope Allison

    Abstract: We present a new methodology for handling AI errors by introducing weakly supervised AI error correctors with a priori performance guarantees. These AI correctors are auxiliary maps whose role is to moderate the decisions of some previously constructed underlying classifier by either approving or rejecting its decisions. The rejection of a decision can be used as a signal to suggest abstaining fro…

    Submitted 13 February, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    MSC Class: 68T05; 68T37
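
    The approve-or-reject role of a corrector described in the abstract above can be sketched roughly as follows. This is only an illustrative sketch, not the authors' method: the names `moderated_decision`, `toy_classifier` and `toy_corrector` are hypothetical, and the toy rejection rule (distance from a decision boundary) is an assumption made for the example.

```python
from typing import Callable, Hashable, Optional, Sequence

Features = Sequence[float]


def moderated_decision(
    classifier: Callable[[Features], Hashable],
    corrector: Callable[[Features, Hashable], bool],
    x: Features,
) -> Optional[Hashable]:
    """Return the classifier's label if the corrector approves it, else None (abstain)."""
    label = classifier(x)
    return label if corrector(x, label) else None


# Hypothetical toy components, for illustration only.
def toy_classifier(x: Features) -> str:
    # Classify by the sign of the coordinate sum.
    return "positive" if sum(x) > 0 else "negative"


def toy_corrector(x: Features, label: Hashable) -> bool:
    # Assumed rule: approve only inputs that are not too close to the boundary sum(x) == 0.
    return abs(sum(x)) > 0.5


if __name__ == "__main__":
    for x in ([2.0, 1.0], [0.1, 0.2], [-3.0, 0.5]):
        print(x, moderated_decision(toy_classifier, toy_corrector, x))
```

    Returning `None` on rejection leaves the underlying classifier untouched and lets the caller decide how to handle an abstention.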

  4. Domain Adaptation Principal Component Analysis: base linear method for learning with out-of-distribution data

    Authors: Evgeny M Mirkes, Jonathan Bac, Aziz Fouché, Sergey V. Stasenko, Andrei Zinovyev, Alexander N. Gorban

    Abstract: Domain adaptation is a popular paradigm in modern machine learning which aims at tackling the problem of divergence (or shift) between the labeled training and validation datasets (source domain) and a potentially large unlabeled dataset (target domain). The task is to embed both datasets into a common space in which the source dataset is informative for training while the divergence between s…

    Submitted 15 December, 2022; v1 submitted 28 August, 2022; originally announced August 2022.

    Journal ref: Entropy, 25(1), 33, 2023

  5. arXiv:2205.15696  [pdf]

    cs.CL cs.AI cs.HC cs.IT

    An Informational Space Based Semantic Analysis for Scientific Texts

    Authors: Neslihan Suzen, Alexander N. Gorban, Jeremy Levesley, Evgeny M. Mirkes

    Abstract: One major problem in Natural Language Processing is the automatic analysis and representation of human language. Human language is ambiguous and deeper understanding of semantics and creating human-to-machine interaction have required an effort in creating the schemes for act of communication and building common-sense knowledge bases for the 'meaning' in texts. This paper introduces computational…

    Submitted 31 May, 2022; originally announced May 2022.

    Comments: 19 pages. arXiv admin note: substantial text overlap with arXiv:2009.08859, arXiv:2004.13717

    Journal ref: Computer Science & Information Technology, volume 12, number 08, pp. 81-99, 2022. CS & IT - CSCP 2022

  6. arXiv:2203.16687  [pdf, other]

    cs.LG

    Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation

    Authors: Qinghua Zhou, Alexander N. Gorban, Evgeny M. Mirkes, Jonathan Bac, Andrei Zinovyev, Ivan Y. Tyukin

    Abstract: Finding best architectures of learning machines, such as deep neural networks, is a well-known technical and theoretical challenge. Recent work by Mellor et al (2021) showed that there may exist correlations between the accuracies of trained networks and the values of some easily computable measures defined on randomly initialised networks which may enable to search tens of thousands of neural arc…

    Submitted 30 March, 2022; originally announced March 2022.

    MSC Class: 68T05; 68Q32

  7. arXiv:2109.02596  [pdf, other]

    cs.LG stat.ML

    Scikit-dimension: a Python package for intrinsic dimension estimation

    Authors: Jonathan Bac, Evgeny M. Mirkes, Alexander N. Gorban, Ivan Tyukin, Andrei Zinovyev

    Abstract: Dealing with uncertainty in applications of machine learning to real-life data critically depends on the knowledge of intrinsic dimensionality (ID). A number of methods have been suggested for the purpose of estimating ID, but no standard package to easily apply them one by one or all at once has been implemented in Python. This technical note introduces \texttt{scikit-dimension}, an open-source P…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: 12 pages, 4 figures, 1 table

    Journal ref: Entropy, 2021, 23(10), 1368

  8. Learning from scarce information: using synthetic data to classify Roman fine ware pottery

    Authors: Santos J. Núñez Jareño, Daniël P. van Helden, Evgeny M. Mirkes, Ivan Y. Tyukin, Penelope M. Allison

    Abstract: In this article we consider a version of the challenging problem of learning from datasets whose size is too limited to allow generalisation beyond the training set. To address the challenge we propose to use a transfer learning approach whereby the model is first trained on a synthetic dataset replicating features of the original objects. In this study the objects were smartphone photographs of n…

    Submitted 3 July, 2021; originally announced July 2021.

    MSC Class: 68T07; 68T45

  9. arXiv:2106.15416  [pdf, other]

    cs.LG cs.AI stat.ML

    High-dimensional separability for one- and few-shot learning

    Authors: Alexander N. Gorban, Bogdan Grechuk, Evgeny M. Mirkes, Sergey V. Stasenko, Ivan Y. Tyukin

    Abstract: This work is driven by a practical question: corrections of Artificial Intelligence (AI) errors. These corrections should be quick and non-iterative. To solve this problem without modification of a legacy AI system, we propose special `external' devices, correctors. Elementary correctors consist of two parts, a classifier that separates the situations with high risk of error from the situations in…

    Submitted 22 October, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

    Comments: Corrected and restructured version with some extensions

    Journal ref: Entropy. 2021; 23(8):1090

  10. arXiv:2106.08966  [pdf]

    physics.soc-ph q-bio.PE

    Social stress drives the multi-wave dynamics of COVID-19 outbreaks

    Authors: I. A. Kastalskiy, E. V. Pankratova, E. M. Mirkes, V. B. Kazantsev, A. N. Gorban

    Abstract: The dynamics of epidemics depend on how people's behavior changes during an outbreak. At the beginning of the epidemic, people do not know about the virus, then, after the outbreak of epidemics and alarm, they begin to comply with the restrictions and the spreading of epidemics may decline. Over time, some people get tired/frustrated by the restrictions and stop following them (exhaustion), especi…

    Submitted 19 October, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: Minor corrections, enriched discussion and extended bibliography

    Journal ref: Sci Rep 11, 22497 (2021)

  11. Coloring Panchromatic Nighttime Satellite Images: Comparing the Performance of Several Machine Learning Methods

    Authors: N. Rybnikova, B. A. Portnov, E. M. Mirkes, A. Zinovyev, A. Brook, A. N. Gorban

    Abstract: Artificial light-at-night (ALAN), emitted from the ground and visible from space, marks human presence on Earth. Since the launch of the Suomi National Polar Partnership satellite with the Visible Infrared Imaging Radiometer Suite Day/Night Band (VIIRS/DNB) onboard, global nighttime images have significantly improved; however, they remained panchromatic. Although multispectral images are also avai…

    Submitted 10 April, 2021; v1 submitted 21 August, 2020; originally announced August 2020.

    Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 60, Art no. 4702715. 2022

  12. Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data

    Authors: Sergey E. Golovenkin, Jonathan Bac, Alexander Chervov, Evgeny M. Mirkes, Yuliya V. Orlova, Emmanuel Barillot, Alexander N. Gorban, Andrei Zinovyev

    Abstract: Large observational clinical datasets become increasingly available for mining associations between various disease traits and administered therapy. These datasets can be considered as representations of the landscape of all possible disease conditions, in which a concrete pathology develops through a number of stereotypical routes, characterized by `points of no return' and `final states' (such a…

    Submitted 5 October, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

    ACM Class: I.2.6; J.3; J.2

    Journal ref: GigaScience, Volume 9, Issue 11, 2020, giaa128

  13. arXiv:2005.06284  [pdf, other]

    cs.LG cs.NE stat.ML

    Pruning coupled with learning, ensembles of minimal neural networks, and future of XAI

    Authors: Alexander N. Gorban, Evgeny M. Mirkes

    Abstract: Pruning coupled with learning aims to optimize the neural network (NN) structure for solving specific problems. This optimization can be used for various purposes: to prevent overfitting, to save resources for implementation and training, to provide explainability of the trained NN, and many others. The minimal structure that cannot be pruned further is not unique. Ensemble of minimal structures c…

    Submitted 22 January, 2023; v1 submitted 13 May, 2020; originally announced May 2020.

    Comments: Significantly modified and extended version, 23 pages, 5 figures

  14. arXiv:2004.14249  [pdf, other]

    physics.chem-ph

    Universal Gorban's Entropies: Geometric Case Study

    Authors: Evgeny M Mirkes

    Abstract: Recently, A.N. Gorban presented a rich family of universal Lyapunov functions for any linear or non-linear reaction network with detailed or complex balance. Two main elements of the construction algorithm are partial equilibria of reactions and convex envelopes of families of functions. These new functions aimed to resolve "the mystery" about the difference between the rich family of Lyapunov fun…

    Submitted 29 April, 2020; originally announced April 2020.

    Journal ref: Entropy, 22(3), p.264 (2020)

  15. arXiv:2004.14230  [pdf, other]

    cs.LG stat.ML

    Fractional norms and quasinorms do not help to overcome the curse of dimensionality

    Authors: Evgeny M. Mirkes, Jeza Allohibi, Alexander N. Gorban

    Abstract: The curse of dimensionality causes the well-known and widely discussed problems for machine learning methods. There is a hypothesis that using of the Manhattan distance and even fractional quasinorms lp (for p less than 1) can help to overcome the curse of dimensionality in classification problems. In this study, we systematically test this hypothesis. We confirm that fractional quasinorms have a…

    Submitted 29 April, 2020; originally announced April 2020.

    Journal ref: Entropy. 2020; 22(10):1105
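
    The lp quasinorms mentioned in the abstract above, and the way distances concentrate as dimension grows, can be explored with a small experiment. The sketch below rests on assumptions not taken from the paper: it uses relative contrast, (max - min) / min of the distances from the origin, as the concentration measure (the truncated abstract does not say which measure the authors use), and it draws points uniformly from the unit cube. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def lp_length(v: Sequence[float], p: float) -> float:
    """(sum |v_i|^p)^(1/p): a norm for p >= 1, a quasinorm for 0 < p < 1."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)


def relative_contrast(points: List[List[float]], p: float) -> float:
    """(max - min) / min of the l_p lengths of the points (distances to the origin)."""
    lengths = [lp_length(x, p) for x in points]
    return (max(lengths) - min(lengths)) / min(lengths)


def uniform_sample(n: int, dim: int, rng: random.Random) -> List[List[float]]:
    """n points drawn uniformly from the unit cube [0, 1)^dim (an assumed test distribution)."""
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    for dim in (2, 20, 200):
        pts = uniform_sample(200, dim, rng)
        row = ", ".join(
            f"p={p}: {relative_contrast(pts, p):.2f}" for p in (0.5, 1.0, 2.0)
        )
        print(f"dim={dim:4d}  {row}")
```

    Comparing the printed contrasts across p for a fixed dimension, and across dimensions for a fixed p, gives a rough feel for the effect; it does not reproduce the classification experiments of the paper.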

  16. arXiv:2004.13717  [pdf, other]

    cs.CL cs.DL

    Informational Space of Meaning for Scientific Texts

    Authors: Neslihan Suzen, Evgeny M. Mirkes, Alexander N. Gorban

    Abstract: In Natural Language Processing, automatic extracting the meaning of texts constitutes an important problem. Our focus is the computational analysis of meaning of short scientific texts (abstracts or brief reports). In this paper, a vector space model is developed for quantifying the meaning of words and texts. We introduce the Meaning Space, in which the meaning of a word is represented by a vecto…

    Submitted 28 April, 2020; originally announced April 2020.

    Comments: 320 pages

  17. Personality Traits and Drug Consumption. A Story Told by Data

    Authors: Elaine Fehrman, Vincent Egan, Alexander N. Gorban, Jeremy Levesley, Evgeny M. Mirkes, Awaz K. Muhammad

    Abstract: This is a preprint version of the first book from the series: "Stories told by data". In this book a story is told about the psychological traits associated with drug consumption. The book includes: - A review of published works on the psychological profiles of drug users. - Analysis of a new original database with information on 1885 respondents and usage of 18 drugs. (Database is available o…

    Submitted 17 January, 2020; originally announced January 2020.

    Comments: A preprint version prepared by the authors before the Springer editorial work. 124 pages, 27 figures, 63 tables, bibl. 244

    Journal ref: Springer, Cham, Research Monograph, 2019, ISBN 978-3-030-10441-2

  18. arXiv:1912.06858  [pdf, other]

    cs.CL cs.DL

    LScDC-new large scientific dictionary

    Authors: Neslihan Suzen, Evgeny M. Mirkes, Alexander N. Gorban

    Abstract: In this paper, we present a scientific corpus of abstracts of academic papers in English -- Leicester Scientific Corpus (LSC). The LSC contains 1,673,824 abstracts of research articles and proceeding papers indexed by Web of Science (WoS) in which publication year is 2014. Each abstract is assigned to at least one of 252 subject categories. Paper metadata include these categories and the number of…

    Submitted 14 December, 2019; originally announced December 2019.

    Comments: 63 pages

  19. arXiv:1811.05321  [pdf, other]

    cs.LG cs.AI stat.ML

    Correction of AI systems by linear discriminants: Probabilistic foundations

    Authors: A. N. Gorban, A. Golubkov, B. Grechuk, E. M. Mirkes, I. Y. Tyukin

    Abstract: Artificial Intelligence (AI) systems sometimes make errors and will make errors in the future, from time to time. These errors are usually unexpected, and can lead to dramatic consequences. Intensive development of AI and its practical applications makes the problem of errors more important. Total re-engineering of the systems can create new errors and is not always possible due to the resources i…

    Submitted 11 November, 2018; originally announced November 2018.

    Comments: arXiv admin note: text overlap with arXiv:1809.07656 and arXiv:1802.02172

    Journal ref: Information Sciences 466 (2018), 303-322

  20. arXiv:1805.01516  [pdf, ps, other]

    cs.NE cs.LG stat.ML

    How deep should be the depth of convolutional neural networks: a backyard dog case study

    Authors: A. N. Gorban, E. M. Mirkes, I. Y. Tyukin

    Abstract: The work concerns the problem of reducing a pre-trained deep neuronal network to a smaller network, with just few layers, whilst retaining the network's functionality on a given task. The proposed approach is motivated by the observation that the aim to deliver the highest accuracy possible in the broadest range of operational conditions, which many deep neural networks models strive to achieve,…

    Submitted 8 December, 2019; v1 submitted 3 May, 2018; originally announced May 2018.

    Comments: Edited and extended version with more detailed description of numerical experiments

  21. arXiv:1804.07580  [pdf]

    cs.LG q-bio.QM stat.ML

    Robust And Scalable Learning Of Complex Dataset Topologies Via Elpigraph

    Authors: Luca Albergante, Evgeny M. Mirkes, Huidong Chen, Alexis Martin, Louis Faure, Emmanuel Barillot, Luca Pinello, Alexander N. Gorban, Andrei Zinovyev

    Abstract: Large datasets represented by multidimensional data point clouds often possess non-trivial distributions with branching trajectories and excluded regions, with the recent single-cell transcriptomic studies of developing embryo being notable examples. Reducing the complexity and producing compact and interpretable representations of such data remains a challenging task. Most of the existing computa…

    Submitted 20 June, 2018; v1 submitted 20 April, 2018; originally announced April 2018.

    Comments: 32 pages, 14 figures

    Journal ref: Entropy 22, no. 3: 296, 2020

  22. Pseudo-Outcrop Visualization of Borehole Images and Core Scans

    Authors: Evgeny M. Mirkes, Alexander N. Gorban, Jeremy Levesley, Peter A. S. Elkington, James A. Whetton

    Abstract: A pseudo-outcrop visualization is demonstrated for borehole and full-diameter rock core images to augment the ubiquitous unwrapped cylinder view and thereby to assist non-specialist interpreters. The pseudo-outcrop visualization is equivalent to a nonlinear projection of the image from borehole to earth frame of reference that creates a solid volume sliced longitudinally to reveal two or more face…

    Submitted 3 September, 2017; v1 submitted 8 February, 2017; originally announced February 2017.

    Comments: Updated and corrected version with extended set of figures

    Journal ref: Mathematical Geosciences, 2017

  23. Piece-wise quadratic approximations of arbitrary error functions for fast and robust machine learning

    Authors: A. N. Gorban, E. M. Mirkes, A. Zinovyev

    Abstract: Most of machine learning approaches have stemmed from the application of minimizing the mean squared distance principle, based on the computationally efficient quadratic optimization methods. However, when faced with high-dimensional and noisy data, the quadratic error functionals demonstrated many weaknesses including high sensitivity to contaminating factors and dimensionality curse. Therefore,…

    Submitted 21 August, 2016; v1 submitted 20 May, 2016; originally announced May 2016.

    Comments: Edited and extended version with algorithms of regularized regression

    Journal ref: Neural Networks, Volume 84, December 2016, 28-38

  24. Handling missing data in large healthcare dataset: a case study of unknown trauma outcomes

    Authors: E. M. Mirkes, T. J. Coats, J. Levesley, A. N. Gorban

    Abstract: Handling of missed data is one of the main tasks in data preprocessing especially in large public service datasets. We have analysed data from the Trauma Audit and Research Network (TARN) database, the largest trauma database in Europe. For the analysis we used 165,559 trauma cases. Among them, there are 19,289 cases (13.19\%) with unknown outcome. We have demonstrated that these outcomes are not…

    Submitted 18 May, 2020; v1 submitted 3 April, 2016; originally announced April 2016.

    Comments: Minor editing and additions

    Journal ref: Computers in Biology and Medicine, 75 (2016) 203-216

  25. Robust principal graphs for data approximation

    Authors: A. N. Gorban, E. M. Mirkes, A. Zinovyev

    Abstract: Revealing hidden geometry and topology in noisy data sets is a challenging task. Elastic principal graph is a computationally efficient and flexible data approximator based on embedding a graph into the data space and minimizing the energy functional penalizing the deviation of graph nodes both from data points and from pluri-harmonic configuration (generalization of linearity). The structure of p…

    Submitted 24 November, 2016; v1 submitted 22 March, 2016; originally announced March 2016.

    Comments: A talk given at ECDA2015 (European Conference on Data Analysis, September 2nd to 4th 2015, University of Essex, Colchester, UK), to be published in Archives of Data Science

    Journal ref: Archives of Data Science, Series A, Vol. 2, No. 1, 2017

  26. arXiv:1506.06297  [pdf, ps, other]

    stat.AP

    The Five Factor Model of personality and evaluation of drug consumption risk

    Authors: E. Fehrman, A. K. Muhammad, E. M. Mirkes, V. Egan, A. N. Gorban

    Abstract: The problem of evaluating an individual's risk of drug consumption and misuse is highly important. An online survey methodology was employed to collect data including Big Five personality traits (NEO-FFI-R), impulsivity (BIS-11), sensation seeking (ImpSS), and demographic information. The data set contained information on the consumption of 18 central nervous system psychoactive drugs. Correlation…

    Submitted 15 January, 2017; v1 submitted 20 June, 2015; originally announced June 2015.

    Comments: Significantly extended report with 67 pages, 27 tables, 21 figures

  27. arXiv:1503.05869  [pdf]

    q-bio.GN

    Long and short range multi-locus QTL interactions in a complex trait of yeast

    Authors: Evgeny M. Mirkes, Thomas Walsh, Edward J. Louis, Alexander N. Gorban

    Abstract: We analyse interactions of Quantitative Trait Loci (QTL) in heat selected yeast by comparing them to an unselected pool of random individuals. Here we re-examine data on individual F12 progeny selected for heat tolerance, which have been genotyped at 25 locations identified by sequencing a selected pool [Parts, L., Cubillos, F. A., Warringer, J., Jain, K., Salinas, F., Bumpstead, S. J., Molin, M.,…

    Submitted 19 March, 2015; originally announced March 2015.

  28. Computational diagnosis and risk evaluation for canine lymphoma

    Authors: E. M. Mirkes, I. Alexandrakis, K. Slater, R. Tuli, A. N. Gorban

    Abstract: The canine lymphoma blood test detects the levels of two biomarkers, the acute phase proteins (C-Reactive Protein and Haptoglobin). This test can be used for diagnostics, for screening, and for remission monitoring as well. We analyze clinical data, test various machine learning methods and select the best approach to these problems. Three family of methods, decision trees, kNN (including advanced…

    Submitted 3 July, 2014; v1 submitted 21 May, 2013; originally announced May 2013.

    Comments: 24 pages, 86 references in the bibliography, Significantly extended version with review of lymphoma biomarkers and data mining methods (Three new sections are added: 1.1. Biomarkers for canine lymphoma, 1.2. Acute phase proteins as lymphoma biomarkers and 3.1. Data mining methods for biomarker cancer diagnosis. Flowcharts of data analysis are included as supplementary material (20 pages)

    Journal ref: Computers in Biology and Medicine, Volume 53, 1 October 2014, 279-290

  29. Geometrical complexity of data approximators

    Authors: E. M. Mirkes, A. Zinovyev, A. N. Gorban

    Abstract: There are many methods developed to approximate a cloud of vectors embedded in high-dimensional space by simpler objects: starting from principal points and linear manifolds to self-organizing maps, neural gas, elastic maps, various types of principal curves and principal trees, and so on. For each type of approximators the measure of the approximator complexity was developed too. These measures a…

    Submitted 3 May, 2013; v1 submitted 11 February, 2013; originally announced February 2013.

    Comments: 10 pages, 3 figures, minor correction and extension

    Journal ref: IWANN 2013, Advances in Computation Intelligence, Springer LNCS 7902, pp. 500-509, 2013

  30. arXiv:1210.5873  [pdf]

    stat.ML cs.LG

    Initialization of Self-Organizing Maps: Principal Components Versus Random Initialization. A Case Study

    Authors: A. A. Akinduko, E. M. Mirkes

    Abstract: The performance of the Self-Organizing Map (SOM) algorithm is dependent on the initial weights of the map. The different initialization methods can broadly be classified into random and data analysis based initialization approach. In this paper, the performance of random initialization (RI) approach is compared to that of principal component initialization (PCI) in which the initial map weights ar…

    Submitted 22 October, 2012; originally announced October 2012.

    Comments: 18 pages, 6 figures

  31. arXiv:1207.2507  [pdf, ps, other]

    cond-mat.stat-mech physics.chem-ph

    Thermodynamics in the Limit of Irreversible Reactions

    Authors: A. N. Gorban, E. M. Mirkes, G. S. Yablonsky

    Abstract: For many real physico-chemical complex systems detailed mechanism includes both reversible and irreversible reactions. Such systems are typical in homogeneous combustion and heterogeneous catalytic oxidation. Most complex enzyme reactions include irreversible steps. The classical thermodynamics has no limit for irreversible reactions whereas the kinetic equations may have such a limit. We represen…

    Submitted 11 October, 2012; v1 submitted 10 July, 2012; originally announced July 2012.

    Comments: 23 pages, extended version with figs

    Journal ref: Physica A, Volume 392, Issue 6, 2013, Pages 1318-1335

  32. arXiv:cond-mat/0307083  [pdf]

    cond-mat cs.NE physics.data-an

    Generation of Explicit Knowledge from Empirical Data through Pruning of Trainable Neural Networks

    Authors: A. N. Gorban, Eu. M. Mirkes, V. G. Tsaregorodtsev

    Abstract: This paper presents a generalized technology of extraction of explicit knowledge from data. The main ideas are 1) maximal reduction of network complexity (not only removal of neurons or synapses, but removal all the unnecessary elements and signals and reduction of the complexity of elements), 2) using of adjustable and flexible pruning process (the pruning sequence shouldn't be predetermined -…

    Submitted 3 July, 2003; originally announced July 2003.

    Comments: 9 pages, The talk was given at the IJCNN '99 (Washington DC, July 1999)