-
Multiple hypothesis screening using mixtures of non-local distributions with applications to genomic studies
Authors:
Francesco Denti,
Stefano Peluso,
Michele Guindani,
Antonietta Mira
Abstract:
The analysis of large-scale datasets, especially in biomedical contexts, frequently involves a principled screening of multiple hypotheses. The celebrated two-group model jointly models the distribution of the test statistics with mixtures of two competing densities, the null and the alternative distributions. We investigate the use of weighted densities and, in particular, non-local densities as working alternative distributions, to enforce separation from the null and thus refine the screening procedure. We show that, for a fixed mixture proportion, these weighted alternatives improve various operating characteristics of the resulting tests, such as the Bayesian false discovery rate, relative to a local, unweighted likelihood approach. Parametric and nonparametric model specifications are proposed, along with efficient samplers for posterior inference. By means of a simulation study, we show how our model compares with both well-established and state-of-the-art alternatives in terms of various operating characteristics. Finally, to illustrate the versatility of our method, we conduct three differential expression analyses with publicly available datasets from genomic studies of heterogeneous nature.
Submitted 9 March, 2023; v1 submitted 2 May, 2022;
originally announced May 2022.
-
A Bayesian Semiparametric Vector Multiplicative Error Model
Authors:
Nicola Donelli,
Stefano Peluso,
Antonietta Mira
Abstract:
Interactions among multiple time series of positive random variables are crucial in diverse financial applications, from spillover effects to volatility interdependence. A popular model in this setting is the vector Multiplicative Error Model (vMEM), which imposes a linear iterative structure on the dynamics of the conditional mean, perturbed by a multiplicative innovation term. A main limitation of vMEM, however, is its restrictive assumption on the distribution of the random innovation term. A Bayesian semiparametric approach that models the innovation vector as an infinite location-scale mixture of multidimensional kernels with support on the positive orthant is used to address this major shortcoming of vMEM. Computational complications arising from the constraints to the positive orthant are avoided through the formulation of a slice sampler on the parameter-extended unconstrained version of the model. The method is applied to simulated and real data, yielding a flexible specification that outperforms the classical ones in terms of fit and predictive power.
Submitted 9 July, 2021;
originally announced July 2021.
-
Equivalence class selection of categorical graphical models
Authors:
Federico Castelletti,
Stefano Peluso
Abstract:
Learning the structure of dependence relations between variables is a pervasive issue in the statistical literature. A directed acyclic graph (DAG) can represent a set of conditional independences, but different DAGs may encode the same set of relations and are indistinguishable using observational data. Equivalent DAGs can be collected into classes, each represented by a partially directed graph known as an essential graph (EG). Structure learning directly conducted on the EG space, rather than on the allied space of DAGs, leads to theoretical and computational benefits. Still, most efforts in the literature have been dedicated to Gaussian data, with less attention to methods designed for multivariate categorical data. We therefore propose a Bayesian methodology for structure learning of categorical EGs. Combining a constructive parameter prior elicitation with a graph-driven likelihood decomposition, we derive a closed-form expression for the marginal likelihood of a categorical EG model. Asymptotic properties are studied, and an MCMC sampling scheme is developed for approximate posterior inference. We evaluate our methodology on both simulated scenarios and real data, with appreciable performance in comparison with state-of-the-art methods.
Submitted 12 February, 2021;
originally announced February 2021.
-
Conditionally Gaussian Random Sequences for an Integrated Variance Estimator with Correlation between Noise and Returns
Authors:
Stefano Peluso,
Antonietta Mira,
Pietro Muliere
Abstract:
Correlation between microstructure noise and latent financial logarithmic returns is an empirically relevant phenomenon with sound theoretical justification. With few notable exceptions, the integrated variance estimators proposed in the financial literature either are not designed to handle such dependence explicitly or handle it only in special settings. We provide an integrated variance estimator that is robust to correlated noise and returns. For this purpose, a generalization of the Forward Filtering Backward Sampling algorithm is proposed, to provide a sampling technique for a latent conditionally Gaussian random sequence. We apply our methodology to intra-day Microsoft prices, and compare it in a simulation study with established alternatives, showing an advantage in terms of root mean square error and dispersion.
Submitted 28 May, 2019;
originally announced May 2019.
-
The semi-Markov beta-Stacy process: a Bayesian non-parametric prior for semi-Markov processes
Authors:
Andrea Arfè,
Stefano Peluso,
Pietro Muliere
Abstract:
The literature on Bayesian methods for the analysis of discrete-time semi-Markov processes is sparse. In this paper, we introduce the semi-Markov beta-Stacy process, a stochastic process useful for the Bayesian non-parametric analysis of semi-Markov processes. The semi-Markov beta-Stacy process is conjugate with respect to data generated by a semi-Markov process, a property which makes it easy to obtain probabilistic forecasts. Its predictive distributions are characterized by a reinforced random walk on a system of urns.
Submitted 23 July, 2020; v1 submitted 1 December, 2018;
originally announced December 2018.
-
Reinforced urns and the subdistribution beta-Stacy process prior for competing risks analysis
Authors:
Andrea Arfè,
Stefano Peluso,
Pietro Muliere
Abstract:
In this paper we introduce the subdistribution beta-Stacy process, a novel Bayesian nonparametric process prior for subdistribution functions useful for the analysis of competing risks data. In particular, we i) characterize this process from a predictive perspective by means of an urn model with reinforcement, ii) show that it is conjugate with respect to right-censored data, and iii) highlight its relations with other prior processes for competing risks data. Additionally, we consider the subdistribution beta-Stacy process prior in a nonparametric regression model for competing risks data which, contrary to most others available in the literature, is not based on the proportional hazards assumption.
Submitted 29 November, 2018;
originally announced November 2018.
-
Marginal models with individual-specific effects for the analysis of longitudinal bipartite networks
Authors:
Francesco Bartolucci,
Antonietta Mira,
Stefano Peluso
Abstract:
A new modeling framework for bipartite social networks arising from a sequence of partially time-ordered relational events is proposed. We directly model the joint distribution of the binary variables indicating if each single actor is involved or not in an event. The adopted parametrization is based on first- and second-order effects, formulated as in marginal models for categorical data and free higher order effects. In particular, second-order effects are log-odds ratios with meaningful interpretation from the social perspective in terms of tendency to cooperate, in contrast to first-order effects interpreted in terms of tendency of each single actor to participate in an event. These effects are parametrized on the basis of the event times, so that suitable latent trajectories of individual behaviors may be represented. Inference is based on a composite likelihood function, maximized by an algorithm with numerical complexity proportional to the square of the number of units in the network. A classification composite likelihood is used to cluster the actors, simplifying the interpretation of the data structure. The proposed approach is illustrated on a dataset of scientific articles published in four top statistical journals from 2003 to 2012.
Submitted 20 October, 2018;
originally announced October 2018.
-
International Trade: a Reinforced Urn Network Model
Authors:
Stefano Peluso,
Antonietta Mira,
Pietro Muliere,
Alessandro Lomi
Abstract:
We propose a unified modelling framework that theoretically justifies the main empirical regularities characterizing the international trade network. Each country is associated with a Polya urn whose composition controls the propensity of the country to trade with other countries. The urn composition is updated through the walk of the Reinforced Urn Process of Muliere et al. (2000). The model implies a local preferential attachment scheme and a power-law right-tail behaviour of bilateral trade flows. Different assumptions on the urns' reinforcement parameters account for local clustering, path-shortening and sparsity. Likelihood-based estimation approaches are facilitated by the feasibility of deriving the likelihood analytically in various network settings. A simulated example and the empirical results on the international trade network are discussed.
Submitted 12 January, 2016;
originally announced January 2016.