
Showing 1–30 of 30 results for author: Tsagris, M

Searching in archive stat.
  1. arXiv:2501.02849  [pdf, other]

    stat.CO

    Fast and light-weight energy statistics using the R package Rfast

    Authors: Michail Tsagris, Manos Papadakis

    Abstract: Energy statistics, also known as $\mathcal{\varepsilon}$-statistics, are functions of distances between statistical observations. This class of functions has enabled the development of non-linear statistical concepts, such as distance variance, distance covariance, and distance correlation. However, the computational burden associated with $\mathcal{\varepsilon}$-statistics is substantial, particu…

    Submitted 21 February, 2025; v1 submitted 6 January, 2025; originally announced January 2025.

  2. arXiv:2412.05199  [pdf]

    stat.ME

    Energy Based Equality of Distributions Testing for Compositional Data

    Authors: Volkan Sevinc, Michail Tsagris

    Abstract: Few tests exist for testing the equality of two or more multivariate distributions of compositional data, perhaps because of their constrained sample space. At present, only one such test has been suggested, and it relies upon random projections. We propose a novel test termed α-Energy Based Test (α-EBT) to compare the multivariate distributions of two (or more) compositional data sets. Similar to…

    Submitted 11 December, 2024; v1 submitted 6 December, 2024; originally announced December 2024.

  3. Directional data analysis using the spherical Cauchy and the Poisson kernel-based distribution

    Authors: Michail Tsagris, Panagiotis Papastamoulis, Shogo Kato

    Abstract: In 2020, two novel distributions for the analysis of directional data were introduced: the spherical Cauchy distribution and the Poisson kernel-based distribution. This paper provides a detailed exploration of both distributions within various analytical frameworks. To enhance the practical utility of these distributions, alternative parametrizations that offer advantages in numerical stability an…

    Submitted 10 November, 2024; v1 submitted 5 September, 2024; originally announced September 2024.

    Journal ref: Statistics and Computing 35, 51 (2025)

  4. arXiv:2403.19835  [pdf, other]

    stat.ME

    Constrained least squares simplicial-simplicial regression

    Authors: Michail Tsagris

    Abstract: Simplicial-simplicial regression refers to the regression setting where both the responses and predictor variables lie within the simplex space, i.e. they are compositional. For this setting, constrained least squares, where the regression coefficients themselves lie within the simplex, is proposed. The model is transformation-free, but adopting a power transformation is straightforward; it…

    Submitted 23 December, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  5. arXiv:2302.02468  [pdf, other]

    stat.ME

    Circular and Spherical Projected Cauchy Distributions: A Novel Framework for Circular and Directional Data Modeling

    Authors: Michail Tsagris, Omar Alzeley

    Abstract: We introduce a novel family of projected distributions on the circle and the sphere, namely the circular and spherical projected Cauchy distributions, as promising alternatives for modelling circular and spherical data. The circular distribution encompasses the wrapped Cauchy distribution as a special case, while featuring a more convenient parameterisation. We also propose a generalised wrapped C…

    Submitted 11 September, 2024; v1 submitted 5 February, 2023; originally announced February 2023.

    Comments: Preprint

    MSC Class: 62H11; 62H10

  6. arXiv:2211.03181  [pdf, other]

    stat.ME

    Cauchy robust principal component analysis with applications to high-dimensional data sets

    Authors: Ayisha Fayomi, Yannis Pantazis, Michail Tsagris, Andrew T. A. Wood

    Abstract: Principal component analysis (PCA) is a standard dimensionality reduction technique used in various research and applied fields. From an algorithmic point of view, classical PCA can be formulated in terms of operations on a multivariate Gaussian likelihood. As a consequence of the implied Gaussian formulation, the principal components are not robust to outliers. In this paper, we propose a modifie…

    Submitted 6 November, 2022; originally announced November 2022.

  7. arXiv:2211.02582  [pdf, other]

    stat.ME stat.CO

    Inference for Network Count Time Series with the R Package PNAR

    Authors: Mirko Armillotta, Michail Tsagris, Konstantinos Fokianos

    Abstract: We introduce a new R package for inference about network count time series. Such data are frequently encountered in statistics and are usually treated as multivariate time series. Their statistical analysis is based on linear or log-linear models. Nonlinear models, which have been applied successfully in several research areas, have been neglected in such applications mainly because…

    Submitted 25 October, 2023; v1 submitted 4 November, 2022; originally announced November 2022.

  8. arXiv:2208.13073  [pdf, ps, other]

    stat.ME

    Modelling structural zeros in compositional data via a zero-censored multivariate normal model

    Authors: Michail Tsagris

    Abstract: We present a new model for analyzing compositional data with structural zeros. Inspired by Butler and Glasbey (2008), who suggested a model for data containing zero values, we propose a model that treats the zero values differently. Instead of projecting every zero value towards a vertex, we project the zero values onto their corresponding edge and fit a zero-censored multivariate model.

    Submitted 27 August, 2022; originally announced August 2022.

    MSC Class: 62F30; 62H12

  9. The FEDHC Bayesian network learning algorithm

    Authors: Michail Tsagris

    Abstract: The paper proposes a new hybrid Bayesian network learning algorithm, termed Forward Early Dropping Hill Climbing (FEDHC), devised to work with either continuous or categorical variables. The paper also shows that the only implementation of MMHC in the statistical software R is prohibitively expensive, and a new implementation is offered. Further, specifically for the case of conti…

    Submitted 12 August, 2022; v1 submitted 30 November, 2020; originally announced December 2020.

    Comments: This is a preprint of the paper published in Mathematics

    MSC Class: 62H22

    Journal ref: Mathematics 2022, 10(15), 2604

  10. Estimating NBA players' salary share according to their performance on court: A machine learning approach

    Authors: Ioanna Papadaki, Michail Tsagris

    Abstract: It is customary for researchers and practitioners to fit linear models in order to predict NBA players' salaries based on their performance on court. In contrast, we focus on the players' salary share (with regard to the team payroll), first selecting the most important determinants or statistics (years of experience in the league, games played, etc.) and then utilising them to predict th…

    Submitted 31 October, 2020; v1 submitted 29 July, 2020; originally announced July 2020.

    Comments: 19 pages

  11. arXiv:2004.00281  [pdf, other]

    stat.ML cs.LG q-bio.GN

    A generalised OMP algorithm for feature selection with application to gene expression data

    Authors: Michail Tsagris, Zacharias Papadovasilakis, Kleanthi Lakiotaki, Ioannis Tsamardinos

    Abstract: Feature selection for predictive analytics is the problem of identifying a minimal-size subset of features that is maximally predictive of an outcome of interest. To apply to molecular data, feature selection algorithms need to be scalable to tens of thousands of available features. In this paper, we propose gOMP, a highly scalable generalisation of the Orthogonal Matching Pursuit feature selectio…

    Submitted 1 April, 2020; originally announced April 2020.

  12. arXiv:2002.05137  [pdf, other]

    stat.ME

    Flexible non-parametric regression models for compositional data

    Authors: Michail Tsagris, Abdulaziz Alenazi, Connie Stewart

    Abstract: Compositional data arise in many real-life applications, and versatile methods for properly analyzing this type of data in the regression context are needed. When parametric assumptions do not hold or are difficult to verify, non-parametric regression models can provide a convenient alternative method for prediction. To this end, we consider an extension to the classical $k$-NN regression, terme…

    Submitted 6 September, 2023; v1 submitted 12 February, 2020; originally announced February 2020.

    Comments: This is a preprint

    MSC Class: 62H99; 62G08

  13. arXiv:2002.04691  [pdf, other]

    stat.ME stat.CO

    Computationally efficient univariate filtering for massive data

    Authors: M. Tsagris, A. Alenazi, S. Fafalios

    Abstract: The wide availability of large-scale, massive data has increased the computational cost of data analysis. One such case is univariate filtering, which typically involves fitting many univariate regression models and is an essential step in numerous variable selection algorithms for reducing the number of predictor variables. The paper shows how to dramatically reduce…

    Submitted 11 February, 2020; originally announced February 2020.

    Comments: The paper has been submitted

  14. arXiv:1812.11361  [pdf, other]

    stat.ME

    Hypothesis testing for two population means: parametric or non-parametric test?

    Authors: Michail Tsagris, Abdulaziz Alenazi, Kleio-Maria Verrou, Nikolaos Pandis

    Abstract: The parametric Welch $t$-test and the non-parametric Wilcoxon-Mann-Whitney test are the most commonly used tests for comparing the means of two independent samples. More recent non-parametric testing approaches include empirical likelihood and exponential empirical likelihood. However, the applicability of these non-parametric likelihood testing procedures is limited, partly because of their tendency to inflate…

    Submitted 4 October, 2019; v1 submitted 29 December, 2018; originally announced December 2018.

    Comments: Accepted for publication in the Journal of Statistical Computation and Simulation

    MSC Class: 62G10; 62G09

  15. Gaussian asymptotic limits for the $α$-transformation in the analysis of compositional data

    Authors: Yannis Pantazis, Michail Tsagris, Andrew T. A. Wood

    Abstract: Compositional data consist of vectors of proportions whose components sum to 1. Such vectors lie in the standard simplex, which is a manifold with boundary. One issue that has been rather controversial within the field of compositional data analysis is the choice of metric on the simplex. One popular possibility has been to use the metric implied by log-transforming the data, as proposed by Aitchi…

    Submitted 21 February, 2019; v1 submitted 29 November, 2018; originally announced December 2018.

    Comments: This is a preprint of the original publication that is available at https://link.springer.com/article/10.1007/s13171-018-00160-1

    MSC Class: 62E20; 62H12

  16. arXiv:1806.10947  [pdf]

    stat.ME

    Extremely efficient permutation and bootstrap hypothesis tests using R

    Authors: Christina Chatzipantsiou, Marios Dimitriadis, Manos Papadakis, Michail Tsagris

    Abstract: Resampling-based statistical tests are known to be computationally heavy but reliable when only small sample sizes are available. Despite their nice theoretical properties, little effort has been put into making them efficient. In this paper we treat the cases of the Pearson correlation coefficient and the two independent samples t-test. We propose a highly computationally efficient method for calculating permut…

    Submitted 28 June, 2018; originally announced June 2018.

    Comments: This is a preprint of the paper that was accepted in the Journal of Modern Applied Statistical Methods

  17. arXiv:1802.07330  [pdf, ps, other]

    stat.ML stat.ME

    A folded model for compositional data analysis

    Authors: Michail Tsagris, Connie Stewart

    Abstract: A folded type model is developed for analyzing compositional data. The proposed model involves an extension of the $α$-transformation for compositional data and provides a new and flexible class of distributions for modeling data defined on the simplex sample space. Despite its seemingly complex structure, employing the EM algorithm guarantees efficient parameter estimation. The model i…

    Submitted 26 February, 2019; v1 submitted 20 February, 2018; originally announced February 2018.

  18. arXiv:1706.02046  [pdf, other]

    stat.ME

    Conditional independence test for categorical data using Poisson log-linear model

    Authors: Michail Tsagris

    Abstract: We demonstrate how to test for conditional independence of two variables with categorical data using Poisson log-linear models. The size of the conditioning set of variables can vary from 0 (simple independence) up to many variables. We also provide a function in R for performing the test. Instead of calculating all possible tables with a for loop, we perform the test using log-linear models and…

    Submitted 7 June, 2017; originally announced June 2017.

    Comments: 11 pages and 1 Figure

    Journal ref: Journal of Data Science, 2017, Volume 15(2): 347-356

  19. arXiv:1611.03227  [pdf, ps, other]

    stat.ML q-bio.QM

    Feature Selection with the R Package MXM: Discovering Statistically-Equivalent Feature Subsets

    Authors: Vincenzo Lagani, Giorgos Athineou, Alessio Farcomeni, Michail Tsagris, Ioannis Tsamardinos

    Abstract: The statistically equivalent signature (SES) algorithm is a method for feature selection inspired by the principles of constraint-based learning of Bayesian Networks. Most of the currently available feature-selection methods return only a single subset of features, supposedly the one with the highest predictive power. We argue that in several domains multiple subsets can achieve close to maximal…

    Submitted 10 November, 2016; originally announced November 2016.

    Comments: Accepted for publication in Journal of Statistical Software

  20. arXiv:1607.07974  [pdf, ps, other]

    stat.ME

    Nonparametric hypothesis testing for equality of means on the simplex

    Authors: Michail Tsagris, Simon Preston, Andrew T. A. Wood

    Abstract: In the context of data that lie on the simplex, we investigate the use of empirical and exponential empirical likelihood, and Hotelling and James statistics, to test the null hypothesis of equal population means based on two independent samples. We perform an extensive numerical study using data simulated from various distributions on the simplex. The results, taken together with practical considerati…

    Submitted 4 August, 2016; v1 submitted 27 July, 2016; originally announced July 2016.

    Comments: This is a preprint of the article to be published by Taylor & Francis Group in Journal of Statistical Computation and Simulation

  21. arXiv:1511.07601  [pdf]

    stat.ME

    Exploring the Distribution for the Estimator of Rosenthal's 'Fail-Safe' Number of Unpublished Studies in Meta-analysis

    Authors: Konstantinos C. Fragkos, Michail Tsagris, Christos C. Frangos

    Abstract: The present paper discusses the statistical distribution for the estimator of Rosenthal's 'Fail-Safe' number $N_R$, which is an estimator of the number of unpublished studies in meta-analysis. We calculate the probability distribution function of $N_R$. This is achieved based on the Central Limit Theorem and the proposition that certain components of the estimator $N_R$ follow a half-normal distribution, derived from th…

    Submitted 24 November, 2015; originally announced November 2015.

    Comments: This is a preprint of the paper accepted for publication in Communications in Statistics: Theory and Methods. arXiv admin note: text overlap with arXiv:1509.01365

  22. arXiv:1511.07600  [pdf, ps, other]

    stat.ME

    A novel, divergence based, regression for compositional data

    Authors: Michail Tsagris

    Abstract: In compositional data, an observation is a vector with non-negative components which sum to a constant, typically 1. Data of this type arise in many areas, such as geology, archaeology, biology, economics and political science amongst others. The goal of this paper is to propose a new, divergence based, regression modelling technique for compositional data. To do so, a recently proved metric which…

    Submitted 24 November, 2015; originally announced November 2015.

    Comments: This is a preprint of the paper accepted for publication in the Proceedings of the 28th Panhellenic Statistics Conference, 15-18/4/2015, Athens, Greece

    MSC Class: 62H99; 62J02

  23. The Assessment of Performance of Correlation Estimates in Discrete Bivariate Distributions Using Bootstrap Methodology

    Authors: Michael Tsagris, Ioannis Elmatzoglou, Christos C. Frangos

    Abstract: Little attention has been given to the correlation coefficient when data come from discrete or continuous non-normal populations. In this article, we consider the efficiency of two correlation coefficients from the same family, Pearson's and Spearman's estimators. Two discrete bivariate distributions were examined: the Poisson and the Negative Binomial. The comparison between these two e…

    Submitted 5 November, 2015; originally announced November 2015.

    Comments: 22 pages with 5 tables and 2 figures

    MSC Class: Primary 62G05; Secondary 62F40

    Journal ref: Communications in Statistics - Theory and Methods, 2012, 41(1): 138–152

  24. arXiv:1509.01365  [pdf]

    stat.ME stat.OT

    Publication Bias in Meta-Analysis: Confidence Intervals for Rosenthal's Fail-Safe Number

    Authors: Konstantinos C. Fragkos, Michail Tsagris, Christos C. Frangos

    Abstract: The purpose of the present paper is to assess the efficacy of confidence intervals for Rosenthal's fail-safe number. Although Rosenthal's estimator is widely used by researchers, its statistical properties are largely unexplored. First of all, we developed statistical theory which allowed us to produce confidence intervals for Rosenthal's fail-safe number. This was produced by discerning whether th…

    Submitted 4 September, 2015; originally announced September 2015.

    Comments: Published in the International Scholarly Research Notices in December 2014

    MSC Class: 91E99

  25. arXiv:1508.01913  [pdf, ps, other]

    stat.ME

    Regression analysis with compositional data containing zero values

    Authors: Michail Tsagris

    Abstract: Regression analysis with compositional data containing zero values

    Submitted 8 August, 2015; originally announced August 2015.

    Comments: The paper has been accepted for publication in the Chilean Journal of Statistics. It consists of 12 pages with 4 figures

    MSC Class: Primary 62J02; Secondary 62H99

  26. arXiv:1506.05216  [pdf, ps, other]

    stat.ME

    The k-NN algorithm for compositional data: a revised approach with and without zero values present

    Authors: Michail Tsagris

    Abstract: In compositional data, an observation is a vector with non-negative components which sum to a constant, typically 1. Data of this type arise in many areas, such as geology, archaeology, biology, economics and political science among others. The goal of this paper is to extend the taxicab metric and a newly suggested metric for compositional data by employing a power transformation. Both metrics ar…

    Submitted 17 June, 2015; originally announced June 2015.

    Comments: This manuscript will appear in the Journal of Data Science: http://www.jds-online.com/volume-12-number-3-july-2014

    MSC Class: 62H30

    Journal ref: Journal of Data Science, Vol 12, Number 3, July 2014

  27. arXiv:1506.04976  [pdf, ps, other]

    stat.ME

    Improved classification for compositional data using the $α$-transformation

    Authors: Michail Tsagris, Simon Preston, Andrew T. A. Wood

    Abstract: In compositional data analysis an observation is a vector containing non-negative values, only the relative sizes of which are considered to be of interest. Without loss of generality, a compositional vector can be taken to be a vector of proportions that sum to one. Data of this type arise in many areas including geology, archaeology, biology, economics and political science. In this paper we inv…

    Submitted 17 June, 2015; v1 submitted 16 June, 2015; originally announced June 2015.

    Comments: This is a 17-page preprint and has been accepted for publication in the Journal of Classification

    MSC Class: 62H30

  28. arXiv:1410.5011  [pdf, ps, other]

    stat.ME

    A Dirichlet Regression Model for Compositional Data with Zeros

    Authors: Michail Tsagris, Connie Stewart

    Abstract: Compositional data arise in many different fields, such as economics, archaeometry, ecology, geology and political sciences. Regression where the dependent variable is a composition is usually carried out via a log-ratio transformation of the composition or via the Dirichlet distribution. However, when there are zero values in the data these two approaches are not readily applicable. Suggestions for t…

    Submitted 7 June, 2017; v1 submitted 18 October, 2014; originally announced October 2014.

    Comments: Research article consisting of 18 pages, 4 figures

    MSC Class: 62H99 (Primary); 62P12 (Secondary)

  29. On the folded normal distribution

    Authors: Michail Tsagris, Christina Beneki, Hossein Hassani

    Abstract: The characteristic function of the folded normal distribution and its moment function are derived. The entropy of the folded normal distribution and the Kullback–Leibler divergences from the normal and half-normal distributions are approximated using Taylor series. The accuracy of the results is also assessed using different criteria. The maximum likelihood estimates and confidence intervals for the paramet…

    Submitted 14 February, 2014; originally announced February 2014.

    Comments: Published in Mathematics. http://www.mdpi.com/2227-7390/2/1/12

    Journal ref: Mathematics 2014, 2(1), 12-28

  30. arXiv:1106.1451  [pdf, ps, other]

    stat.ME

    A data-based power transformation for compositional data

    Authors: Michail T. Tsagris, Simon Preston, Andrew T. A. Wood

    Abstract: Compositional data analysis is carried out either by neglecting the compositional constraint and applying standard multivariate data analysis, or by transforming the data using the logs of the ratios of the components. In this work we examine a more general transformation which includes both approaches as special cases. It is a power transformation and involves a single parameter, α. The transform…

    Submitted 16 June, 2011; v1 submitted 7 June, 2011; originally announced June 2011.

    Comments: Published in the proceedings of the 4th international workshop on Compositional Data Analysis. http://congress.cimne.com/codawork11/frontal/default.asp

    Journal ref: Proceedings of CoDaWork'11: 4th international workshop on Compositional Data Analysis, Egozcue, J.J., Tolosana-Delgado, R. and Ortego, M.I. (eds.) 2011. ISBN: 978-84-87867-76-7
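
Entry 1 in this list concerns energy statistics such as distance covariance and distance correlation. As a rough illustration only, the Python/NumPy sketch below computes the standard sample distance correlation from double-centred Euclidean distance matrices. It is a naive version that needs O(n²) time and memory; it is not the Rfast implementation described in that entry, and the function names `distance_correlation` and `_double_centred_distances` are hypothetical.

```python
import numpy as np


def _double_centred_distances(x):
    """Euclidean distance matrix of the rows of x, double-centred.

    Row means and column means are subtracted and the grand mean is added back.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # a 1-D array is treated as n observations of one variable
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)  # n x n distances
    return (d
            - d.mean(axis=1, keepdims=True)   # subtract row means
            - d.mean(axis=0, keepdims=True)   # subtract column means
            + d.mean())                       # add back the grand mean


def distance_correlation(x, y):
    """Naive sample distance correlation (V-statistic form); O(n^2) time and memory."""
    a = _double_centred_distances(x)
    b = _double_centred_distances(y)
    if a.shape != b.shape:
        raise ValueError("x and y must have the same number of observations")
    dcov2_xy = (a * b).mean()  # squared sample distance covariance
    dvar2_x = (a * a).mean()   # squared sample distance variance of x
    dvar2_y = (b * b).mean()   # squared sample distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0  # convention used here when either sample is constant
    # max() guards against tiny negative values caused by rounding
    return float(np.sqrt(max(dcov2_xy / denom, 0.0)))
```

For example, with `t` spaced symmetrically around zero, `distance_correlation(t, t**2)` is clearly positive even though the Pearson correlation of `t` and `t**2` is essentially zero, which illustrates the kind of non-linear dependence that distance correlation is designed to detect.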