-
Dimensions of Higher Order Factor Analysis Models
Authors:
Muhammad Ardiyansyah,
Luca Sodomaco
Abstract:
The factor analysis model is a statistical model in which a certain number of hidden random variables, called factors, linearly affect the behaviour of another set of observed random variables, with additional random noise. The main assumption of the model is that the factors and the noise are Gaussian random variables. This implies that the feasible set lies in the cone of positive semidefinite matrices. In this paper, we do not assume that the factors and the noise are Gaussian, hence the higher order moment and cumulant tensors of the observed variables are generally nonzero. This motivates the notion of a kth-order factor analysis model, that is, the family of all random vectors in a factor analysis model where the factors and the noise have finite and possibly nonzero moment and cumulant tensors up to order k. This subset may be described as the image of a polynomial map into a Cartesian product of symmetric tensor spaces. Our goal is to compute its dimension, and we provide conditions under which the image has positive codimension.
Submitted 9 July, 2023; v1 submitted 29 September, 2022;
originally announced September 2022.
-
Embeddability of centrosymmetric matrices capturing the double-helix structure in natural and synthetic DNA
Authors:
Muhammad Ardiyansyah,
Dimitra Kosta,
Jordi Roca-Lacostena
Abstract:
In this paper, we discuss the embedding problem for centrosymmetric matrices, which are higher order generalizations of the matrices occurring in Strand Symmetric Models. These models capture the substitution symmetries arising from the double-helix structure of DNA. Deciding whether a transition matrix is embeddable tells us whether the observed substitution probabilities are consistent with a homogeneous continuous-time substitution model, such as the Kimura models, the Jukes-Cantor model or the general time-reversible model. The generalization to higher order matrices, in turn, is motivated by the setting of synthetic biology, which works with genetic alphabets of different sizes.
Submitted 8 November, 2022; v1 submitted 11 February, 2022;
originally announced February 2022.
-
Distinguishing Level-2 Phylogenetic Networks Using Phylogenetic Invariants
Authors:
Muhammad Ardiyansyah
Abstract:
In phylogenetics, it is important for the phylogenetic network model parameters to be identifiable so that the evolutionary histories of a group of species can be consistently inferred. However, as the complexity of the phylogenetic network models grows, the identifiability of network models becomes increasingly difficult to analyze. As a step towards analyzing the identifiability of network models, we check whether two networks are distinguishable. In this paper, we specifically study the distinguishability of phylogenetic network models associated with level-2 networks. Using an algebraic approach, namely the discrete Fourier transform, we present some results on the distinguishability of some level-2 networks, which generalize earlier work on the distinguishability of level-1 networks. In particular, we study simple and semisimple level-2 networks, which can be thought of as generalizations of level-1 sunlet and cycle networks, respectively. Moreover, we compare the varieties associated with semisimple level-2 and cycle networks.
Submitted 26 April, 2021;
originally announced April 2021.
-
The Model-Specific Markov Embedding Problem for Symmetric Group-Based Models
Authors:
Muhammad Ardiyansyah,
Dimitra Kosta,
Kaie Kubjas
Abstract:
We study model embeddability, a variation of the famous embedding problem in probability theory in which, apart from the requirement that the Markov matrix is the matrix exponential of a rate matrix, we additionally ask that the rate matrix follows the model structure. We provide a characterisation of model embeddable Markov matrices corresponding to symmetric group-based phylogenetic models. In particular, we provide necessary and sufficient conditions in terms of the eigenvalues of symmetric group-based matrices. To showcase our main result on model embeddability, we provide an application to hachimoji models, which are eight-state models for synthetic DNA. Moreover, our main result on model embeddability enables us to compute the volume of the set of model embeddable Markov matrices relative to the volume of other relevant sets of Markov matrices within the model.
Submitted 1 April, 2021; v1 submitted 14 May, 2020;
originally announced May 2020.
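
As an illustration of what an eigenvalue condition for model embeddability can look like, the sketch below treats only the smallest familiar case, the Kimura 3-parameter model (group Z2 x Z2), and is not taken from the paper itself. It assumes the Markov matrix has rows [a, b, c, d], [b, a, d, c], [c, d, a, b], [d, c, b, a] with a = 1 - b - c - d. Because a rate matrix with the same structure is symmetric, its eigenvalues are the real logarithms of the eigenvalues of the Markov matrix, so the candidate rates can be recovered by inverting the Fourier (Hadamard) transform and checked for non-negativity. The function names are hypothetical.

```python
import math


def k3p_eigenvalues(b, c, d):
    """Nontrivial eigenvalues of the assumed K3P Markov matrix.

    The matrix has off-diagonal parameters b, c, d and diagonal a = 1 - b - c - d.
    The remaining eigenvalue is always 1.
    """
    a = 1.0 - b - c - d
    return (a + b - c - d, a - b + c - d, a - b - c + d)


def k3p_rate_offdiagonals(b, c, d):
    """Off-diagonal entries (qb, qc, qd) of the K3P-structured logarithm.

    Returns None when some eigenvalue is not positive, since then no real
    logarithm with the same symmetric structure exists.
    """
    x, y, z = k3p_eigenvalues(b, c, d)
    if min(x, y, z) <= 0:
        return None
    lx, ly, lz = math.log(x), math.log(y), math.log(z)
    # Inverse Fourier transform on Z2 x Z2, using that the eigenvalue 1 has log 0.
    return ((lx - ly - lz) / 4, (ly - lx - lz) / 4, (lz - lx - ly) / 4)


def is_k3p_model_embeddable(b, c, d, tol=1e-12):
    """True if the logarithm above is a rate matrix (non-negative off-diagonals)."""
    rates = k3p_rate_offdiagonals(b, c, d)
    return rates is not None and all(q >= -tol for q in rates)


if __name__ == "__main__":
    print(is_k3p_model_embeddable(0.1, 0.1, 0.1))  # Jukes-Cantor with small rates
    print(is_k3p_model_embeddable(0.0, 0.2, 0.2))  # positive eigenvalues, one rate negative
```

In this special case the non-negativity checks are equivalent to the inequalities x >= yz, y >= xz and z >= xy on the three nontrivial eigenvalues x, y, z, together with their positivity.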