-
MSmix: An R Package for clustering partial rankings via mixtures of Mallows Models with Spearman distance
Authors:
Marta Crispino,
Cristina Mollica,
Lucia Modugno
Abstract:
MSmix is a recently developed R package implementing maximum likelihood estimation of finite mixtures of Mallows models with Spearman distance for full and partial rankings. The package is designed to implement computationally tractable estimation routines of the model parameters, with the ability to handle arbitrary forms of partial rankings and sequences of a large number of items. The frequentist estimation task is accomplished via EM algorithms, integrating data augmentation strategies to recover the unobserved heterogeneity and the missing ranks. The package also provides functionalities for uncertainty quantification of the estimated parameters, via diverse bootstrap methods and asymptotic confidence intervals. Generic methods for S3 class objects are constructed for more effectively managing the output of the main routines. The usefulness of the package and its computational performance compared with competing software is illustrated via applications to both simulated and original real ranking datasets.
Submitted 20 June, 2024;
originally announced June 2024.
-
Efficient and accurate inference for mixtures of Mallows models with Spearman distance
Authors:
Marta Crispino,
Cristina Mollica,
Valerio Astuti,
Luca Tardella
Abstract:
The Mallows model occupies a central role in parametric modelling of ranking data to learn preferences of a population of judges. Despite the wide range of metrics for rankings that can be considered in the model specification, the choice is typically limited to the Kendall, Cayley or Hamming distances, due to the closed-form expression of the related model normalizing constant. This work instead focuses on the Mallows model with Spearman distance. An efficient and accurate EM algorithm for estimating finite mixtures of Mallows models with Spearman distance is developed, by relying on a twofold data augmentation strategy aimed at i) enlarging the applicability of Mallows models to samples drawn from heterogeneous populations; ii) dealing with partial rankings affected by diverse forms of censoring. Additionally, a novel approximation of the model normalizing constant is introduced to support the challenging model-based clustering of rankings with a large number of items. The inferential ability of the EM scheme and the effectiveness of the approximation are assessed by extensive simulation studies. Finally, we show that the application to three real-world datasets endorses our proposals also in the comparison with competing mixtures of ranking models.
Submitted 20 September, 2022;
originally announced September 2022.
-
BayesMallows: An R Package for the Bayesian Mallows Model
Authors:
Øystein Sørensen,
Marta Crispino,
Qinghua Liu,
Valeria Vitelli
Abstract:
BayesMallows is an R package for analyzing data in the form of rankings or preferences with the Mallows rank model, and its finite mixture extension, in a Bayesian probabilistic framework.
The Mallows model is a well-known model, grounded on the idea that the probability density of an observed ranking decreases exponentially fast as its distance to the location parameter increases. Despite the model being quite popular, this is the first Bayesian implementation that allows a wide choice of distances, and that works well with a large number of items to be ranked. BayesMallows supports footrule, Spearman, Kendall, Cayley, Hamming and Ulam distances, allowing full use of the rich expressiveness of the Mallows model. This is possible thanks to the implementation of fast algorithms for approximating the partition function of the model under various distances. Although developed for use in computing the posterior distribution of the model, these algorithms may be of interest in their own right.
BayesMallows handles non-standard data: partial rankings and pairwise comparisons, even in cases including non-transitive preference patterns. The advantage of the Bayesian paradigm in this context comes from its ability to coherently quantify posterior uncertainties of estimates of any quantity of interest. These posteriors are fully available to the user, and the package comes with convenient tools for summarizing and visualizing the posterior distributions.
Submitted 22 February, 2019;
originally announced February 2019.
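
The abstract above describes the Mallows model as assigning each ranking a probability that decays exponentially with its distance from a location parameter. A minimal Python sketch of that idea follows. It is not the BayesMallows implementation: the function names (`spearman_distance`, `mallows_pmf`) are hypothetical, the parameterization exp(-theta * d) is one common convention assumed here (packages may scale the exponent differently), and the partition function is computed by brute-force enumeration, which is only feasible for a handful of items.

```python
from itertools import permutations
from math import exp


def spearman_distance(r, rho):
    """Spearman distance: sum of squared differences between two rank vectors."""
    return sum((a - b) ** 2 for a, b in zip(r, rho))


def mallows_pmf(rho, theta, distance=spearman_distance):
    """Return {ranking: probability} under a Mallows model centred at rho.

    Assumed parameterization (illustrative only):
        P(r | rho, theta) = exp(-theta * d(r, rho)) / Z(theta)
    Z(theta) is obtained by enumerating all n! rankings, so keep n small.
    """
    n = len(rho)
    rankings = list(permutations(range(1, n + 1)))
    weights = [exp(-theta * distance(r, rho)) for r in rankings]
    z = sum(weights)  # brute-force partition function
    return {r: w / z for r, w in zip(rankings, weights)}


# Example: four items, consensus ranking (1, 2, 3, 4), moderate concentration.
pmf = mallows_pmf(rho=(1, 2, 3, 4), theta=0.5)
mode = max(pmf, key=pmf.get)
```

With `theta > 0` the most probable ranking is the location parameter itself, and `theta = 0` gives the uniform distribution over all rankings. The factorial cost of the enumeration is the reason the abstracts in this listing emphasize approximations of the partition function for larger numbers of items.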
-
Dependence properties and Bayesian inference for asymmetric multivariate copulas
Authors:
Julyan Arbel,
Marta Crispino,
Stéphane Girard
Abstract:
We study a broad class of asymmetric copulas introduced by Liebscher (2008) as a combination of multiple - usually symmetric - copulas. The main thrust of the paper is to provide new theoretical properties including exact tail dependence expressions and stability properties. A subclass of Liebscher copulas obtained by combining Fréchet copulas is studied in more detail. We establish further dependence properties for copulas of this class and show that they are characterized by an arbitrary number of singular components. Furthermore, we introduce a novel iterative representation for general Liebscher copulas which de facto ensures uniform margins, thus relaxing a constraint of Liebscher's original construction. Moreover, we show that this iterative construction proves useful for inference by developing an Approximate Bayesian computation sampling scheme. This inferential procedure is demonstrated on simulated data.
Submitted 26 June, 2019; v1 submitted 2 February, 2019;
originally announced February 2019.
-
Informative extended Mallows priors in the Bayesian Mallows model
Authors:
Marta Crispino,
Isadora Antoniano-Villalobos
Abstract:
The aim of this work is to study the problem of prior elicitation for the Mallows model with Spearman's distance, a popular distance-based model for rankings or permutation data. Previous Bayesian inference for such a model has been limited to the use of the uniform prior over the space of permutations. We present a novel strategy to elicit subjective prior beliefs on the location parameter of the model, discussing the interpretation of hyper-parameters and the implication of prior choices for the posterior analysis.
Submitted 30 January, 2019;
originally announced January 2019.
-
A Bayesian Mallows approach to non-transitive pair comparison data: how human are sounds?
Authors:
Marta Crispino,
Elja Arjas,
Valeria Vitelli,
Natasha Barrett,
Arnoldo Frigessi
Abstract:
We are interested in learning how listeners perceive sounds as having human origins. An experiment was performed with a series of electronically synthesized sounds, and listeners were asked to compare them in pairs. We propose a Bayesian probabilistic method to learn individual preferences from non-transitive pairwise comparison data, as happens when one (or more) individual preferences in the data contradicts what is implied by the others. We build a Bayesian Mallows model in order to handle non-transitive data, with a latent layer of uncertainty which captures the generation of preference misreporting. We then develop a mixture extension of the Mallows model, able to learn individual preferences in a heterogeneous population. The results of our analysis of the musicology experiment are of interest to electroacoustic composers and sound designers, and to the audio industry in general, whose aim is to understand how computer generated sounds can be produced in order to sound more human.
Submitted 31 August, 2018; v1 submitted 24 May, 2017;
originally announced May 2017.
-
Probabilistic preference learning with the Mallows rank model
Authors:
Valeria Vitelli,
Øystein Sørensen,
Marta Crispino,
Arnoldo Frigessi,
Elja Arjas
Abstract:
Ranking and comparing items is crucial for collecting information about preferences in many areas, from marketing to politics. The Mallows rank model is among the most successful approaches to analyse rank data, but its computational complexity has limited its use to a particular form based on Kendall distance. We develop new computationally tractable methods for Bayesian inference in Mallows models that work with any right-invariant distance. Our method performs inference on the consensus ranking of the items, also when based on partial rankings, such as top-k items or pairwise comparisons. We prove that items that none of the assessors has ranked do not influence the maximum a posteriori consensus ranking, and can therefore be ignored. When assessors are many or heterogeneous, we propose a mixture model for clustering them in homogeneous subgroups, with cluster-specific consensus rankings. We develop approximate stochastic algorithms that allow a fully probabilistic analysis, leading to coherent quantifications of uncertainties. We make probabilistic predictions on the class membership of assessors based on their ranking of just some items, and predict missing individual preferences, as needed in recommendation systems. We test our approach using several experimental and benchmark datasets.
Submitted 27 April, 2017; v1 submitted 30 May, 2014;
originally announced May 2014.