Markov chain order estimation with parametric significance tests of conditional mutual information

Papapetrou, Maria; Kugiumtzis, Dimitris

Abstract: Despite the different approaches suggested in the literature, accurate estimation of the order of a Markov chain from a given symbol sequence remains an open issue, especially when the order is moderately large. Here, parametric significance tests of the conditional mutual information (CMI) of order $m$, $I_c(m)$, are conducted on a symbol sequence for increasing $m$ in order to estimate the true order $L$ of the underlying Markov chain. CMI of order $m$ is the mutual information of two variables of the Markov chain that are $m$ time steps apart, conditioned on the intermediate variables of the chain. The null distribution of CMI is approximated in three ways: with a normal distribution and with a gamma distribution, both with analytically derived parameters, and with a gamma distribution whose parameters are derived from the mean and variance of the normal distribution. The accuracy of order estimation is assessed for the three parametric tests, which are compared to the randomization significance test and to other known order estimation criteria using Monte Carlo simulations of Markov chains with different order $L$, symbol sequence length $N$, and number of symbols $K$. The parametric test using the gamma distribution (with directly defined parameters) is consistently better than the other two parametric tests and closely matches the performance of the randomization test. The tests are applied to genes and intergenic regions of DNA sequences, and the estimated orders are interpreted in view of the results of the simulation study. The application shows the usefulness of the parametric gamma test for long symbol sequences, where the randomization test becomes prohibitively slow to compute.
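The procedure described above (estimate $I_c(m)$ for $m = 1, 2, \dots$, test it against a parametric null, stop when it is no longer significant) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not give the analytic normal/gamma parameters, so the sketch instead uses the standard asymptotic G-test result that $2 n \hat I_c(m)$ (in nats) is approximately chi-square with $(K-1)^2 K^{m-1}$ degrees of freedom, which is a gamma distribution with shape $df/2$ and scale 2. The chi-square tail is approximated with the Wilson-Hilferty transform to keep to the standard library. The stopping rule (estimate $L = m-1$ at the first non-rejected $m$) and all names (`cmi`, `cmi_pvalue`, `estimate_order`, `simulate_order2`) are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def cmi(seq, m):
    """Plug-in estimate (nats) of I_c(m) = I(x_t; x_{t+m} | x_{t+1}, ..., x_{t+m-1}).

    Returns the estimate and the number of (m+1)-length windows used.
    """
    windows = [tuple(seq[i:i + m + 1]) for i in range(len(seq) - m)]
    n = len(windows)
    c_xzy = Counter(windows)                 # (x_t, intermediates, x_{t+m})
    c_xz = Counter(w[:-1] for w in windows)  # (x_t, intermediates)
    c_zy = Counter(w[1:] for w in windows)   # (intermediates, x_{t+m})
    c_z = Counter(w[1:-1] for w in windows)  # intermediates only (empty tuple when m == 1)
    total = 0.0
    for w, c in c_xzy.items():
        ratio = c * c_z[w[1:-1]] / (c_xz[w[:-1]] * c_zy[w[1:]])
        total += (c / n) * math.log(ratio)
    return total, n


def cmi_pvalue(seq, m, K):
    """p-value of H0: I_c(m) = 0 under an ASSUMED chi-square (gamma) null.

    Assumption: G = 2 * n * I_c(m) ~ chi-square with (K-1)^2 * K^(m-1) degrees of
    freedom; its upper tail is approximated by the Wilson-Hilferty cube-root transform.
    """
    I, n = cmi(seq, m)
    G = 2.0 * n * I
    df = (K - 1) ** 2 * K ** (m - 1)
    if G <= 0.0:
        return 1.0
    h = 2.0 / (9.0 * df)
    z = ((G / df) ** (1.0 / 3.0) - (1.0 - h)) / math.sqrt(h)
    return 1.0 - NormalDist().cdf(z)


def estimate_order(seq, K, max_order=6, alpha=0.05):
    """Test m = 1, 2, ... and return m - 1 at the first m where H0 is not rejected."""
    for m in range(1, max_order + 1):
        if cmi_pvalue(seq, m, K) >= alpha:
            return m - 1
    return max_order


def simulate_order2(N, seed=0):
    """Hypothetical binary order-2 chain: P(x_t = 1) = 0.1, 0.5, 0.9 for x_{t-1} + x_{t-2} = 0, 1, 2."""
    rng = random.Random(seed)
    p_one = (0.1, 0.5, 0.9)
    x = [rng.randint(0, 1), rng.randint(0, 1)]
    while len(x) < N:
        x.append(1 if rng.random() < p_one[x[-1] + x[-2]] else 0)
    return x


if __name__ == "__main__":
    seq = simulate_order2(5000)
    for m in range(1, 5):
        print(m, round(cmi(seq, m)[0], 4), cmi_pvalue(seq, m, K=2))
    print("estimated order:", estimate_order(seq, K=2))
```

The plug-in estimate is in nats so that $2n\hat I_c(m)$ coincides with the G statistic; a version in bits would need a factor of $\ln 2$. For large $m$ the counts become sparse and the asymptotic degrees of freedom overstate the effective ones, which is one reason a better-calibrated null distribution, such as the gamma approximation studied in the paper, matters.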

Comments:	19 pages, 7 figures
Subjects:	Methodology (stat.ME); Information Theory (cs.IT); Data Analysis, Statistics and Probability (physics.data-an)
Cite as:	arXiv:1511.02339 [stat.ME]
	(or arXiv:1511.02339v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.1511.02339