-
Clustering genomic words in human DNA using peaks and trends of distributions
Abstract: In this work we seek clusters of genomic words in human DNA by studying their inter-word lag distributions. Due to the particularly spiked nature of these histograms, a clustering procedure is proposed that first decomposes each distribution into a baseline and a peak distribution. An outlier-robust fitting method is used to estimate the baseline distribution (the 'trend'), and a sparse vector of…
Submitted 13 August, 2018; originally announced August 2018.
Journal ref: Advances in Data Analysis and Classification, 2020
-
Comparing reverse complementary genomic words based on their distance distributions and frequencies
Abstract: In this work we study reverse complementary genomic word pairs in the human DNA, by comparing both the distance distribution and the frequency of a word to those of its reverse complement. Several measures of dissimilarity between distance distributions are considered, and it is found that the peak dissimilarity works best in this setting. We report the existence of reverse complementary word pair…
Submitted 6 October, 2017; originally announced October 2017.
Comments: Post-print of a paper accepted for publication in "Interdisciplinary Sciences: Computational Life Sciences" (ISSN: 1913-2751, ESSN: 1867-1462)
MSC Class: 62P10
Journal ref: Interdisciplinary Sciences: Computational Life Sciences, 2018, Vol. 10, 1-11
-
Dissimilar Symmetric Word Pairs in the Human Genome
Abstract: In this work we explore the dissimilarity between symmetric word pairs, by comparing the inter-word distance distribution of a word to that of its reversed complement. We propose a new measure of dissimilarity between such distributions. Since symmetric pairs with different patterns could point to evolutionary features, we search for the pairs with the most dissimilar behaviour. We focus our study…
Submitted 5 July, 2017; v1 submitted 14 February, 2017; originally announced February 2017.
Comments: Submitted 13-Feb-2017; accepted, after a minor revision, 17-Mar-2017; 11th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2017), Porto, Portugal, 21-23 June 2017
Journal ref: Advances in Intelligent Systems and Computing, Vol 616, 248-256. Springer, 2017