Showing 1–8 of 8 results for author: Anderlucci, L

  1. arXiv:1912.00905  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    Matrix sketching for supervised classification with imbalanced classes

    Authors: Roberta Falcone, Angela Montanari, Laura Anderlucci

    Abstract: Matrix sketching is a recently developed data compression technique. An input matrix A is efficiently approximated with a smaller matrix B, so that B preserves most of the properties of A up to some guaranteed approximation ratio. In so doing numerical operations on big data sets become faster. Sketching algorithms generally use random projections to compress the original dataset and this stochast…

    Submitted 2 December, 2019; originally announced December 2019.

  2. arXiv:1909.10832  [pdf, other]

    stat.ME

    High-dimensional clustering via Random Projections

    Authors: Laura Anderlucci, Francesca Fortunato, Angela Montanari

    Abstract: In this work, we address the unsupervised classification issue by exploiting the general idea of Random Projection Ensemble. Specifically, we propose to generate a set of low dimensional independent random projections and to perform model-based clustering on each of them. The top $B^*$ projections, i.e. the projections which show the best grouping structure are then retained. The final partition i…

    Submitted 23 November, 2020; v1 submitted 24 September, 2019; originally announced September 2019.

  3. arXiv:1905.02406  [pdf, ps, other]

    stat.AP math.ST

    One-class classification with application to forensic analysis

    Authors: Laura Anderlucci, Francesca Fortunato, Angela Montanari

    Abstract: The analysis of broken glass is forensically important to reconstruct the events of a criminal act. In particular, the comparison between the glass fragments found on a suspect (recovered cases) and those collected on the crime scene (control cases) may help the police to correctly identify the offender(s). The forensic issue can be framed as a one-class classification problem. One-class classific…

    Submitted 7 May, 2019; originally announced May 2019.

  4. arXiv:1902.07068  [pdf, ps, other]

    cs.CL cs.IR cs.LG stat.ML

    Classifying textual data: shallow, deep and ensemble methods

    Authors: Laura Anderlucci, Lucia Guastadisegni, Cinzia Viroli

    Abstract: This paper focuses on a comparative evaluation of the most common and modern methods for text classification, including the recent deep learning strategies and ensemble methods. The study is motivated by a challenging real data problem, characterized by high-dimensional and extremely sparse data, deriving from incoming calls to the customer care of an Italian phone company. We will show that deep…

    Submitted 18 February, 2019; originally announced February 2019.

  5. arXiv:1902.06615  [pdf, other]

    stat.ML cs.LG

    Deep Mixtures of Unigrams for uncovering Topics in Textual Data

    Authors: Cinzia Viroli, Laura Anderlucci

    Abstract: Mixtures of Unigrams are one of the simplest and most efficient tools for clustering textual data, as they assume that documents related to the same topic have similar distributions of terms, naturally described by Multinomials. When the classification task is particularly challenging, such as when the document-term matrix is high-dimensional and extremely sparse, a more composite representation c…

    Submitted 9 December, 2020; v1 submitted 18 February, 2019; originally announced February 2019.

  6. arXiv:1806.10403  [pdf, ps, other]

    stat.ME

    Quantile-based clustering

    Authors: Christian Hennig, Cinzia Viroli, Laura Anderlucci

    Abstract: A new cluster analysis method, $K$-quantiles clustering, is introduced. $K$-quantiles clustering can be computed by a simple greedy algorithm in the style of the classical Lloyd's algorithm for $K$-means. It can be applied to large and high-dimensional datasets. It allows for within-cluster skewness and internal variable scaling based on within-cluster variation. Different versions allow for diffe…

    Submitted 8 November, 2019; v1 submitted 27 June, 2018; originally announced June 2018.

  7. arXiv:1709.03563  [pdf, other]

    stat.AP

    The Importance of Being Clustered: Uncluttering the Trends of Statistics from 1970 to 2015

    Authors: Laura Anderlucci, Angela Montanari, Cinzia Viroli

    Abstract: In this paper we retrace the recent history of statistics by analyzing all the papers published in five prestigious statistical journals since 1970, namely: Annals of Statistics, Biometrika, Journal of the American Statistical Association, Journal of the Royal Statistical Society, series B and Statistical Science. The aim is to construct a kind of "taxonomy" of the statistical papers by organizing…

    Submitted 11 September, 2017; originally announced September 2017.

  8. arXiv:1401.1301  [pdf, ps, other]

    stat.ME stat.AP

    Covariance pattern mixture models for the analysis of multivariate heterogeneous longitudinal data

    Authors: Laura Anderlucci, Cinzia Viroli

    Abstract: We propose a novel approach for modeling multivariate longitudinal data in the presence of unobserved heterogeneity for the analysis of the Health and Retirement Study (HRS) data. Our proposal can be cast within the framework of linear mixed models with discrete individual random intercepts; however, differently from the standard formulation, the proposed Covariance Pattern Mixture Model (CPMM) do…

    Submitted 16 September, 2015; v1 submitted 7 January, 2014; originally announced January 2014.

    Comments: Published at http://dx.doi.org/10.1214/15-AOAS816 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS816

    Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 2, 777-800