Showing 1–16 of 16 results for author: Karlis, D

  1. arXiv:2411.06298  [pdf, other]

    stat.ME

    Efficient subsampling for high-dimensional data

    Authors: Vasilis Chasiotis, Lin Wang, Dimitris Karlis

    Abstract: In the field of big data analytics, the search for efficient subdata selection methods that enable robust statistical inferences with minimal computational resources is of high importance. A procedure prior to subdata selection could perform variable selection, as only a subset of a large number of variables is active. We propose an approach when both the size of the full dataset and the number of…

    Submitted 9 November, 2024; originally announced November 2024.

  2. A Model-Based Approach to Shot Charts Estimation in Basketball

    Authors: Luca Scrucca, Dimitris Karlis

    Abstract: Shot charts in basketball analytics provide an indispensable tool for evaluating players' shooting performance by visually representing the distribution of field goal attempts across different court locations. However, conventional methods often overlook the bounded nature of the basketball court, leading to inaccurate representations, particularly along the boundaries and corners. In this paper,…

    Submitted 2 May, 2024; originally announced May 2024.

  3. arXiv:2404.05702  [pdf, other]

    stat.ME

    On the estimation of complex statistics combining different surveys

    Authors: Vasilis Chasiotis, Dimitris Karlis

    Abstract: The importance of exploring a potential integration among surveys has been acknowledged in order to enhance effectiveness and minimize expenses. In this work, we employ the alignment method to combine information from two different surveys for the estimation of complex statistics. The derivation of the alignment weights poses challenges in case of complex statistics due to their non-linear form. T…

    Submitted 8 April, 2024; originally announced April 2024.

  4. arXiv:2404.04213  [pdf, other]

    stat.ME stat.AP

    Modelling handball outcomes using univariate and bivariate approaches

    Authors: Dimitris Karlis, Rouven Michels, Marius Ötting

    Abstract: Handball has received growing interest during the last years, including academic research for many different aspects of the sport. On the other hand modelling the outcome of the game has attracted less interest mainly because of the additional challenges that occur. Data analysis has revealed that the number of goals scored by each team are under-dispersed relative to a Poisson distribution and he…

    Submitted 5 April, 2024; originally announced April 2024.

    MSC Class: 62P99

  5. arXiv:2402.09888  [pdf, other]

    stat.ME

    Multinomial mixture for spatial data

    Authors: Anna Nalpantidi, Dimitris Karlis, Panagiotis Papastamoulis

    Abstract: The purpose of this paper is to extend standard finite mixture models in the context of multinomial mixtures for spatial data, in order to cluster geographical units according to demographic characteristics. The spatial information is incorporated on the model through the mixing probabilities of each component. To be more specific, a Gibbs distribution is assumed for prior probabilities. In this w…

    Submitted 15 February, 2024; originally announced February 2024.

    MSC Class: 62P25

  6. arXiv:2307.02139  [pdf, other]

    stat.ME

    Extending the Dixon and Coles model: an application to women's football data

    Authors: Rouven Michels, Marius Ötting, Dimitris Karlis

    Abstract: The prevalent model by Dixon and Coles (1997) extends the double Poisson model where two independent Poisson distributions model the number of goals scored by each team by moving probabilities between the scores 0-0, 0-1, 1-0, and 1-1. We show that this is a special case of a multiplicative model known as the Sarmanov family. Based on this family, we create more suitable models by moving probabili…

    Submitted 5 July, 2023; originally announced July 2023.

  7. arXiv:2306.10980  [pdf, other]

    stat.ME

    Optimal subdata selection for linear model selection

    Authors: Vasilis Chasiotis, Dimitris Karlis

    Abstract: If the assumed model does not accurately capture the underlying structure of the data, a statistical method is likely to yield sub-optimal results, and so model selection is crucial in order to conduct any statistical analysis. However, in case of massive datasets, the selection of an appropriate model from a large pool of candidates becomes computationally challenging, and limited research has be…

    Submitted 19 June, 2023; originally announced June 2023.

  8. arXiv:2305.01597  [pdf, other]

    stat.ME stat.AP

    On the selection of optimal subdata for big data regression based on leverage scores

    Authors: Vasilis Chasiotis, Dimitris Karlis

    Abstract: The demand of computational resources for the modeling process increases as the scale of the datasets does, since traditional approaches for regression involve inverting huge data matrices. The main problem relies on the large data size, and so a standard approach is subsampling that aims at obtaining the most informative portion of the big data. In the current paper, we explore an existing approa…

    Submitted 5 July, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: text overlap with arXiv:2305.00218

  9. Subdata selection for big data regression: an improved approach

    Authors: Vasilis Chasiotis, Dimitris Karlis

    Abstract: In the big data era researchers face a series of problems. Even standard approaches/methodologies, like linear regression, can be difficult or problematic with huge volumes of data. Traditional approaches for regression in big datasets may suffer due to the large sample size, since they involve inverting huge data matrices or even because the data cannot fit to the memory. Proposed approaches are…

    Submitted 17 April, 2024; v1 submitted 29 April, 2023; originally announced May 2023.

    Journal ref: Journal of Data Science, Statistics, and Visualisation 2024, 4(3)

  10. arXiv:2112.03688  [pdf, other]

    stat.AP

    Piecewise survival models: a change-point analysis on herpes zoster associated pain data revisited and extended

    Authors: Dimitra Eleftheriou, Dimitris Karlis

    Abstract: For many diseases it is reasonable to assume that the hazard rate is not constant across time, but also that it changes in different time intervals. To capture this, we work here with a piecewise survival model. One of the major problems in such piecewise models is to determine the time points of change of the hazard rate. From the practical point of view this can provide very important informatio…

    Submitted 7 December, 2021; originally announced December 2021.

  11. arXiv:2011.06045  [pdf, other]

    stat.ME stat.AP

    Bayesian inference for transportation origin-destination matrices: the Poisson-inverse Gaussian and other Poisson mixtures

    Authors: Konstantinos Perrakis, Dimitris Karlis, Mario Cools, Davy Janssens

    Abstract: In this paper we present Poisson mixture approaches for origin-destination (OD) modeling in transportation analysis. We introduce covariate-based models which incorporate different transport modeling phases and also allow for direct probabilistic inference on link traffic based on Bayesian predictions. Emphasis is placed on the Poisson-inverse Gaussian as an alternative to the commonly-used Poisso…

    Submitted 11 November, 2020; originally announced November 2020.

  12. arXiv:2005.05324  [pdf, other]

    stat.ME

    Infinite mixtures of multivariate normal-inverse Gaussian distributions for clustering of skewed data

    Authors: Yuan Fang, Dimitris Karlis, Sanjeena Subedi

    Abstract: Mixtures of multivariate normal inverse Gaussian (MNIG) distributions can be used to cluster data that exhibit features such as skewness and heavy tails. However, for cluster analysis, using a traditional finite mixture model framework, either the number of components needs to be known $a$-$priori$ or needs to be estimated $a$-$posteriori$ using some model selection criterion after deriving result…

    Submitted 11 May, 2020; originally announced May 2020.

    Comments: 61 pages. arXiv admin note: text overlap with arXiv:2005.02585

    MSC Class: 62H30

  13. arXiv:2005.02585  [pdf, other]

    stat.CO

    A Bayesian approach for clustering skewed data using mixtures of multivariate normal-inverse Gaussian distributions

    Authors: Yuan Fang, Dimitris Karlis, Sanjeena Subedi

    Abstract: Non-Gaussian mixture models are gaining increasing attention for mixture model-based clustering particularly when dealing with data that exhibit features such as skewness and heavy tails. Here, such a mixture distribution is presented, based on the multivariate normal inverse Gaussian (MNIG) distribution. For parameter estimation of the mixture, a Bayesian approach via Gibbs sampler is used; for t…

    Submitted 5 May, 2020; originally announced May 2020.

    Comments: 40 pages, 7 figures

    MSC Class: 62H30

  14. arXiv:1901.09249  [pdf, other]

    stat.ME stat.AP stat.ML

    Clustering Discrete-Valued Time Series

    Authors: Tyler Roick, Dimitris Karlis, Paul D. McNicholas

    Abstract: There is a need for the development of models that are able to account for discreteness in data, along with its time series properties and correlation. Our focus falls on INteger-valued AutoRegressive (INAR) type models. The INAR type models can be used in conjunction with existing model-based clustering techniques to cluster discrete-valued time series data. With the use of a finite mixture model…

    Submitted 27 March, 2020; v1 submitted 26 January, 2019; originally announced January 2019.

  15. arXiv:1805.08561  [pdf, other]

    stat.AP

    An integer-valued time series model for multivariate surveillance

    Authors: Xanthi Pedeli, Dimitris Karlis

    Abstract: In recent days different types of surveillance data are becoming available for public health reasons. In most cases several variables are monitored and events of different types are reported. As the amount of surveillance data increases, statistical methods that can effectively address multivariate surveillance scenarios are demanded. Even though research activity in this field is increasing rapid…

    Submitted 13 September, 2019; v1 submitted 22 May, 2018; originally announced May 2018.

  16. Model-based clustering using copulas with applications

    Authors: Ioannis Kosmidis, Dimitris Karlis

    Abstract: The majority of model-based clustering techniques is based on multivariate Normal models and their variants. In this paper copulas are used for the construction of flexible families of models for clustering applications. The use of copulas in model-based clustering offers two direct advantages over current methods: i) the appropriate choice of copulas provides the ability to obtain a range of exot…

    Submitted 2 July, 2015; v1 submitted 15 April, 2014; originally announced April 2014.

    Journal ref: Stat.Comput. 26 (2016) 1079-1099