
Showing 1–17 of 17 results for author: Viroli, C

  1. arXiv:2404.13589  [pdf, other]

    stat.ME

    The quantile-based classifier with variable-wise parameters

    Authors: Marco Berrettini, Christian Hennig, Cinzia Viroli

    Abstract: Quantile-based classifiers can classify high-dimensional observations by minimising a discrepancy of an observation to a class based on suitable quantiles of the within-class distributions, corresponding to a unique percentage for all variables. The present work extends these classifiers by introducing a way to determine potentially different optimal percentages for different variables. Furthermor…

    Submitted 21 April, 2024; originally announced April 2024.

  2. Dealing with overdispersion in multivariate count data

    Authors: Noemi Corsini, Cinzia Viroli

    Abstract: The problem of overdispersion in multivariate count data is a challenging issue. Nowadays, it covers a central role mainly due to the relevance of modern technologies data, such as Next Generation Sequencing and textual data from the web or digital collections. This work presents a comprehensive analysis of the likelihood-based models for extra-variation data proposed in the scientific literature.…

    Submitted 1 July, 2021; originally announced July 2021.

    Comments: 21 pages, 4 figures, 3 tables

    Journal ref: Computational Statistics & Data Analysis 170 (2022) 107447

  3. Mixed data Deep Gaussian Mixture Model: A clustering model for mixed datasets

    Authors: Robin Fuchs, Denys Pommeret, Cinzia Viroli

    Abstract: Clustering mixed data presents numerous challenges inherent to the very heterogeneous nature of the variables. A clustering algorithm should be able, despite this heterogeneity, to extract discriminant pieces of information from the variables in order to design groups. In this work we introduce a multilayer architecture model-based clustering method called Mixed Deep Gaussian Mixture Model (MDG…

    Submitted 10 March, 2021; v1 submitted 13 October, 2020; originally announced October 2020.

  4. arXiv:2009.05007  [pdf, other]

    stat.ME

    Directional quantile classifiers

    Authors: Alessio Farcomeni, Marco Geraci, Cinzia Viroli

    Abstract: We introduce classifiers based on directional quantiles. We derive theoretical results for selecting optimal quantile levels given a direction, and, conversely, an optimal direction given a quantile level. We also show that the misclassification rate is infinitesimal if population distributions differ by at most a location shift and if the number of directions is allowed to diverge at the same rat…

    Submitted 11 September, 2020; v1 submitted 10 September, 2020; originally announced September 2020.

    Comments: 23 pages, 2 figures, 5 tables

    MSC Class: 62G05; 62G20

  5. arXiv:1902.07068  [pdf, ps, other]

    cs.CL cs.IR cs.LG stat.ML

    Classifying textual data: shallow, deep and ensemble methods

    Authors: Laura Anderlucci, Lucia Guastadisegni, Cinzia Viroli

    Abstract: This paper focuses on a comparative evaluation of the most common and modern methods for text classification, including the recent deep learning strategies and ensemble methods. The study is motivated by a challenging real data problem, characterized by high-dimensional and extremely sparse data, deriving from incoming calls to the customer care of an Italian phone company. We will show that deep…

    Submitted 18 February, 2019; originally announced February 2019.

  6. arXiv:1902.06615  [pdf, other]

    stat.ML cs.LG

    Deep Mixtures of Unigrams for uncovering Topics in Textual Data

    Authors: Cinzia Viroli, Laura Anderlucci

    Abstract: Mixtures of Unigrams are one of the simplest and most efficient tools for clustering textual data, as they assume that documents related to the same topic have similar distributions of terms, naturally described by Multinomials. When the classification task is particularly challenging, such as when the document-term matrix is high-dimensional and extremely sparse, a more composite representation c…

    Submitted 9 December, 2020; v1 submitted 18 February, 2019; originally announced February 2019.

  7. arXiv:1806.10403  [pdf, ps, other]

    stat.ME

    Quantile-based clustering

    Authors: Christian Hennig, Cinzia Viroli, Laura Anderlucci

    Abstract: A new cluster analysis method, $K$-quantiles clustering, is introduced. $K$-quantiles clustering can be computed by a simple greedy algorithm in the style of the classical Lloyd's algorithm for $K$-means. It can be applied to large and high-dimensional datasets. It allows for within-cluster skewness and internal variable scaling based on within-cluster variation. Different versions allow for diffe…

    Submitted 8 November, 2019; v1 submitted 27 June, 2018; originally announced June 2018.

  8. arXiv:1711.06929  [pdf, ps, other]

    stat.ML cs.LG

    Deep Gaussian Mixture Models

    Authors: Cinzia Viroli, Geoffrey J. McLachlan

    Abstract: Deep learning is a hierarchical inference method formed by subsequent multiple layers of learning able to more efficiently describe complex relationships. In this work, Deep Gaussian Mixture Models are introduced and discussed. A Deep Gaussian Mixture model (DGMM) is a network of multiple layers of latent variables, where, at each layer, the variables follow a mixture of Gaussian distributions. Th…

    Submitted 18 November, 2017; originally announced November 2017.

    Comments: 19 pages, 4 figures

  9. arXiv:1709.03563  [pdf, other]

    stat.AP

    The Importance of Being Clustered: Uncluttering the Trends of Statistics from 1970 to 2015

    Authors: Laura Anderlucci, Angela Montanari, Cinzia Viroli

    Abstract: In this paper we retrace the recent history of statistics by analyzing all the papers published in five prestigious statistical journals since 1970, namely: Annals of Statistics, Biometrika, Journal of the American Statistical Association, Journal of the Royal Statistical Society, series B and Statistical Science. The aim is to construct a kind of "taxonomy" of the statistical papers by organizing…

    Submitted 11 September, 2017; originally announced September 2017.

  10. Infinite Mixtures of Infinite Factor Analysers

    Authors: Keefe Murphy, Cinzia Viroli, Isobel Claire Gormley

    Abstract: Factor-analytic Gaussian mixture models are often employed as a model-based approach to clustering high-dimensional data. Typically, the numbers of clusters and latent factors must be specified in advance of model fitting, and remain fixed. The pair which optimises some model selection criterion is then chosen. For computational reasons, models in which the number of latent factors differ across c…

    Submitted 13 July, 2021; v1 submitted 24 January, 2017; originally announced January 2017.

    Comments: Published in Bayesian Analysis

    Journal ref: Bayesian Analysis, 15(3): 937-963 (2020)

  11. arXiv:1604.02318  [pdf, other]

    stat.ME

    Bayesian Smooth-and-Match strategy for ordinary differential equations models that are linear in the parameters

    Authors: Saverio Ranciati, Cinzia Viroli, Ernst Wit

    Abstract: In many fields of application, dynamic processes that evolve through time are well described by systems of ordinary differential equations (ODEs). The analytical solution of the ODEs is often not available and different methods have been proposed to infer these quantities: from numerical optimization to regularized (penalized) models, these procedures aim to estimate indirectly the parameters with…

    Submitted 18 July, 2017; v1 submitted 8 April, 2016; originally announced April 2016.

    Comments: 31 pages, 4 tables, 5 figures

  12. arXiv:1601.04879  [pdf, other]

    stat.AP

    Mixture model with multiple allocations for clustering spatially correlated observations in the analysis of ChIP-Seq data

    Authors: Saverio Ranciati, Cinzia Viroli, Ernst Wit

    Abstract: Model-based clustering is a technique widely used to group a collection of units into mutually exclusive groups. There are, however, situations in which an observation could in principle belong to more than one cluster. In the context of Next-Generation Sequencing (NGS) experiments, for example, the signal observed in the data might be produced by two (or more) different biological processes opera…

    Submitted 12 May, 2016; v1 submitted 19 January, 2016; originally announced January 2016.

    Comments: 25 pages; 3 tables, 6 figures

  13. arXiv:1410.8093  [pdf, ps, other]

    stat.ME

    Modelling overdispersion heterogeneity in differential expression analysis using mixtures

    Authors: Elisabetta Bonafede, Franck Picard, Stéphane Robin, Cinzia Viroli

    Abstract: Next-generation sequencing technologies now constitute a method of choice to measure gene expression. Data to analyze are read counts, commonly modeled using Negative Binomial distributions. A relevant issue associated with this probabilistic framework is the reliable estimation of the overdispersion parameter, reinforced by the limited number of replicates generally observable for each gene. Many…

    Submitted 7 November, 2014; v1 submitted 23 October, 2014; originally announced October 2014.

  14. arXiv:1401.1301  [pdf, ps, other]

    stat.ME stat.AP

    Covariance pattern mixture models for the analysis of multivariate heterogeneous longitudinal data

    Authors: Laura Anderlucci, Cinzia Viroli

    Abstract: We propose a novel approach for modeling multivariate longitudinal data in the presence of unobserved heterogeneity for the analysis of the Health and Retirement Study (HRS) data. Our proposal can be cast within the framework of linear mixed models with discrete individual random intercepts; however, differently from the standard formulation, the proposed Covariance Pattern Mixture Model (CPMM) do…

    Submitted 16 September, 2015; v1 submitted 7 January, 2014; originally announced January 2014.

    Comments: Published at http://dx.doi.org/10.1214/15-AOAS816 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS816

    Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 2, 777-800

  15. arXiv:1303.1282  [pdf, ps, other]

    stat.ME

    Quantile-based classifiers

    Authors: Christian Hennig, Cinzia Viroli

    Abstract: Quantile classifiers for potentially high-dimensional data are defined by classifying an observation according to a sum of appropriately weighted component-wise distances of the components of the observation to the within-class quantiles. An optimal percentage for the quantiles can be chosen by minimizing the misclassification error in the training sample. It is shown that this is consistent, fo…

    Submitted 12 November, 2013; v1 submitted 6 March, 2013; originally announced March 2013.

  16. arXiv:1010.2314  [pdf, ps, other]

    stat.ME

    A factor mixture analysis model for multivariate binary data

    Authors: Silvia Cagnone, Cinzia Viroli

    Abstract: The paper proposes a latent variable model for binary data coming from an unobserved heterogeneous population. The heterogeneity is taken into account by replacing the traditional assumption of Gaussian distributed factors by a finite mixture of multivariate Gaussians. The aim of the proposed model is twofold: it allows to achieve dimension reduction when the data are dichotomous and, simultaneous…

    Submitted 12 October, 2010; originally announced October 2010.

    Comments: 27 pages, 2 figures

  17. arXiv:1010.2310

    stat.ME stat.CO

    Stochastic model selection for Mixtures of Matrix-Normals

    Authors: Cinzia Viroli

    Abstract: Finite mixtures of matrix normal distributions are a powerful tool for classifying three-way data in unsupervised problems. The distribution of each component is assumed to be a matrix variate normal density. The mixture model can be estimated through the EM algorithm under the assumption that the number of components is known and fixed. In this work we introduce, develop and explore a Bayesian an…

    Submitted 6 March, 2013; v1 submitted 12 October, 2010; originally announced October 2010.

    Comments: This paper has been withdrawn by the author. Some content of the work paper has been extended

    MSC Class: 62H30