Showing 1–25 of 25 results for author: Aurell, E

Searching in archive q-bio.
  1. arXiv:2403.14202

    q-bio.PE q-bio.QM

    Two fitness inference schemes compared using allele frequencies from 1,068,391 sequences sampled in the UK during the COVID-19 pandemic

    Authors: Hong-Li Zeng, Cheng-Long Yang, Bo Jing, John Barton, Erik Aurell

    Abstract: Throughout the course of the SARS-CoV-2 pandemic, genetic variation has contributed to the spread and persistence of the virus. For example, various mutations have allowed SARS-CoV-2 to escape antibody neutralization or to bind more strongly to the receptors that it uses to enter human cells. Here, we compared two methods that estimate the fitness effects of viral mutations using the abundant sequ…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 10 pages, 6 figures

  2. arXiv:2112.12957

    q-bio.GN

    Temporal epistasis inference from more than 3,500,000 SARS-CoV-2 Genomic Sequences

    Authors: Hong-Li Zeng, Yue Liu, Vito Dichio, Erik Aurell

    Abstract: We use Direct Coupling Analysis (DCA) to determine epistatic interactions between loci of variability of the SARS-CoV-2 virus, segmenting genomes by month of sampling. We use full-length, high-quality genomes from the GISAID repository up to October 2021, in total over 3,500,000 genomes. We find that DCA terms are more stable over time than correlations, but nevertheless change over time as mutati…

    Submitted 5 June, 2022; v1 submitted 24 December, 2021; originally announced December 2021.

    Comments: 15 pages, 9 figures

  3. arXiv:2109.02962

    q-bio.PE q-bio.GN stat.CO

    Mutation frequency time series reveal complex mixtures of clones in the world-wide SARS-CoV-2 viral population

    Authors: Hong-Li Zeng, Yue Liu, Vito Dichio, Kaisa Thorell, Rickard Nordén, Erik Aurell

    Abstract: We compute the allele frequencies of the alpha (B.1.1.7), beta (B.1.351) and delta (B.1.617.2) variants of SARS-CoV-2 from almost two million genome sequences on the GISAID repository. We find that the frequencies of a majority of the defining mutations in alpha rose towards the end of 2020 but drifted apart during spring 2021, a similar pattern being followed by delta during summer of 2021. For bet…

    Submitted 7 September, 2021; originally announced September 2021.

  4. arXiv:2105.01428

    q-bio.PE cond-mat.dis-nn physics.bio-ph

    Statistical Genetics in and out of Quasi-Linkage Equilibrium (Extended)

    Authors: Vito Dichio, Hong-Li Zeng, Erik Aurell

    Abstract: This review is about statistical genetics, an interdisciplinary topic between statistical physics and population biology. The focus is on the phase of quasi-linkage equilibrium (QLE). Our goals here are to clarify under which conditions the QLE phase can be expected to hold in population biology and how the stability of the QLE phase is lost. The QLE state, which has many similarities to a thermal…

    Submitted 3 February, 2023; v1 submitted 4 May, 2021; originally announced May 2021.

    Comments: 58 pages, 13 figures

    Journal ref: 2023 Rep. Prog. Phys. 86 052601

  5. Global analysis of more than 50,000 SARS-CoV-2 genomes reveals epistasis between 8 viral genes

    Authors: Hong-Li Zeng, Vito Dichio, Edwin Rodríguez Horta, Kaisa Thorell, Erik Aurell

    Abstract: Genome-wide epistasis analysis is a powerful tool to infer gene interactions, which can guide drug and vaccine development and lead to a deeper understanding of microbial pathogenesis. We have considered all complete SARS-CoV-2 genomes deposited in the GISAID repository until four different cut-off dates, and used Direct Coupling Analysis together with an assumption of Quasi-Linkage Equil…

    Submitted 2 October, 2020; originally announced October 2020.

    Comments: 22 pages, 11 pages

  6. arXiv:2006.16735

    q-bio.PE cond-mat.stat-mech

    Inferring epistasis from genomic data with comparable mutation and outcrossing rate

    Authors: Hong-Li Zeng, Eugenio Mauri, Vito Dichio, Simona Cocco, Remi Monasson, Erik Aurell

    Abstract: We consider a population evolving due to mutation, selection and recombination, where selection includes single-locus terms (additive fitness) and two-loci terms (pairwise epistatic fitness). We further consider the problem of inferring fitness in the evolutionary dynamics from one or several snap-shots of the distribution of genotypes in the population. In the recent literature this has been done…

    Submitted 4 May, 2021; v1 submitted 30 June, 2020; originally announced June 2020.

    Comments: 16 pages, 9 figures. Substantial revision from second version, previous suggestions and comments gratefully acknowledged

  7. Inferring genetic fitness from genomic data

    Authors: Hong-Li Zeng, Erik Aurell

    Abstract: The genetic composition of a naturally developing population is considered as due to mutation, selection, genetic drift and recombination. Selection is modeled as single-locus terms (additive fitness) and two-loci terms (pairwise epistatic fitness). The problem is posed to infer epistatic fitness from population-wide whole-genome data from a time series of a developing population. We generate such…

    Submitted 7 January, 2020; originally announced January 2020.

    Journal ref: Phys. Rev. E 101, 052409 (2020)

  8. arXiv:1808.03478

    q-bio.PE cond-mat.stat-mech

    DCA for genome-wide epistasis analysis: the statistical genetics perspective

    Authors: Chen-Yi Gao, Fabio Cecconi, Angelo Vulpiani, Hai-Jun Zhou, Erik Aurell

    Abstract: Direct Coupling Analysis (DCA) is a now widely used method to leverage statistical information from many similar biological systems to draw meaningful conclusions on each system separately. DCA has been applied with great success to sequences of homologous proteins, and also more recently to whole-genome population-wide sequencing data. We here argue that the use of DCA on the genome scale is cont…

    Submitted 10 August, 2018; originally announced August 2018.

    Comments: 9 pages, 5 figures

  9. arXiv:1710.04819

    physics.bio-ph q-bio.QM

    Correlation-Compressed Direct Coupling Analysis

    Authors: Chen-Yi Gao, Hai-Jun Zhou, Erik Aurell

    Abstract: Learning Ising or Potts models from data has become an important topic in statistical physics and computational biology, with applications to predictions of structural contacts in proteins and other areas of biological data analysis. The corresponding inference problems are challenging since the normalization constant (partition function) of the Ising/Potts distributions cannot be computed efficie…

    Submitted 13 October, 2017; originally announced October 2017.

    Comments: 15 pages, including 11 figures

    Journal ref: Phys. Rev. E 98, 032407 (2018)

  10. arXiv:1706.00784

    cond-mat.stat-mech cond-mat.soft q-bio.QM

    Steady diffusion in a drift field: a comparison of large deviation techniques and multiple-scale analysis

    Authors: Erik Aurell, Stefano Bo

    Abstract: A particle with internal unobserved states diffusing in a force field will generally display effective advection-diffusion. The drift velocity is proportional to the mobility averaged over the internal states, or effective mobility, while the effective diffusion has two terms. One is of the equilibrium type and satisfies an Einstein relation with the effective mobility while the other is quadratic…

    Submitted 12 October, 2017; v1 submitted 2 June, 2017; originally announced June 2017.

    Journal ref: Phys. Rev. E 96, 032140 (2017)

  11. arXiv:1606.04576

    q-bio.GN

    An observation of circular RNAs in bacterial RNA-seq data

    Authors: Nicolas Innocenti, Hoang-Son Nguyen, Aymeric Fouquier d'Hérouël, Erik Aurell

    Abstract: Circular RNAs (circRNAs) are a class of RNA with an important role in micro RNA (miRNA) regulation recently discovered in humans and various other eukaryotes as well as in archaea. Here, we have analyzed RNA-seq data obtained from Enterococcus faecalis and Escherichia coli in a way similar to previous studies performed on eukaryotes. We report observations of circRNAs in RNA-seq data th…

    Submitted 14 June, 2016; originally announced June 2016.

  12. arXiv:1509.05188

    q-bio.GN physics.bio-ph

    The Bulk and The Tail of Minimal Absent Words in Genome Sequences

    Authors: Erik Aurell, Nicolas Innocenti, Hai-Jun Zhou

    Abstract: Minimal absent words (MAW) of a genomic sequence are subsequences that are absent themselves but the subwords of which are all present in the sequence. The characteristic distribution of genomic MAWs as a function of their length has been observed to be qualitatively similar for all living organisms, the bulk being rather short, and only relatively few being long. It has been an open issue whether…

    Submitted 17 September, 2015; originally announced September 2015.

    Comments: Supplemental Information to the paper is available as ancillary file

  13. arXiv:1410.1925

    q-bio.GN q-bio.BM q-bio.QM

    Whole genome mapping of 5' RNA ends in bacteria by tagged sequencing : A comprehensive view in Enterococcus faecalis

    Authors: Nicolas Innocenti, Monica Golumbeanu, Aymeric Fouquier d'Hérouël, Caroline Lacoux, Rémy A. Bonnin, Sean P. Kennedy, Françoise Wessner, Pascale Serror, Philippe Bouloc, Francis Repoila, Erik Aurell

    Abstract: Enterococcus faecalis is the third cause of nosocomial infections. To obtain the first comprehensive view of transcriptional organizations in this bacterium, we used a modified RNA-seq approach enabling to discriminate primary from processed 5'RNA ends. We also validated our approach by confirming known features in Escherichia coli. We mapped 559 transcription start sites and 352 processing site…

    Submitted 7 October, 2014; originally announced October 2014.

  14. SEK: Sparsity exploiting $k$-mer-based estimation of bacterial community composition

    Authors: Saikat Chatterjee, David Koslicki, Siyuan Dong, Nicolas Innocenti, Lu Cheng, Yueheng Lan, Mikko Vehkaperä, Mikael Skoglund, Lars K. Rasmussen, Erik Aurell, Jukka Corander

    Abstract: Motivation: Estimation of bacterial community composition from a high-throughput sequenced sample is an important task in metagenomics applications. Since the sample sequence data typically harbors reads of variable lengths and different levels of biological and technical noise, accurate statistical analysis of such data is challenging. Currently popular estimation methods are typically very time…

    Submitted 1 July, 2014; originally announced July 2014.

    Comments: 10 pages

  15. arXiv:1403.0379

    q-bio.BM cond-mat.stat-mech

    Improving contact prediction along three dimensions

    Authors: Christoph Feinauer, Marcin J. Skwark, Andrea Pagnani, Erik Aurell

    Abstract: Correlation patterns in multiple sequence alignments of homologous proteins can be exploited to infer information on the three-dimensional structure of their members. The typical pipeline to address this task, which we in this paper refer to as the three dimensions of contact prediction, is to: (i) filter and align the raw sequence data representing the evolutionarily related proteins; (ii) choose…

    Submitted 5 March, 2014; v1 submitted 3 March, 2014; originally announced March 2014.

    Comments: 19 pages, 8 figures in main text; 7 pages, 6 figures in supporting information

    MSC Class: 92C40; 62P10; ACM Class: J.3

  16. arXiv:1401.4832

    q-bio.QM physics.comp-ph physics.data-an

    Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences

    Authors: Magnus Ekeberg, Tuomo Hartonen, Erik Aurell

    Abstract: Direct-Coupling Analysis is a group of methods to harvest information about coevolving residues in a protein family by learning a generative model in an exponential family from data. In protein families of realistic size, this learning can only be done approximately, and there is a trade-off between inference precision and computational speed. We here show that an earlier introduced $l_2$-regulari…

    Submitted 20 January, 2014; originally announced January 2014.

    Comments: 33 pages, 4 figures; M. Ekeberg and T. Hartonen are joint first authors; code and supplementary information on http://plmdca.csc.kth.se/

    Journal ref: Journal of Computational Physics 276 (2014) 341-356

  17. Lognormality and oscillations in the coverage of high-throughput transcriptomic data towards gene ends

    Authors: Nicolas Innocenti, Erik Aurell

    Abstract: High-throughput transcriptomics experiments have reached the stage where the count of the number of reads alignable to a given position can be treated as an almost-continuous signal. This allows to ask questions of biophysical/biotechnical nature, but which may still have biological implications. Here we show that when sequencing RNA fragments from one end, as it is the case on most platforms, an…

    Submitted 28 August, 2013; v1 submitted 18 March, 2013; originally announced March 2013.

    Journal ref: J. Stat. Mech. (2013) P10013

  18. arXiv:1211.1281

    q-bio.QM cond-mat.dis-nn cond-mat.stat-mech physics.data-an

    Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models

    Authors: Magnus Ekeberg, Cecilia Lövkvist, Yueheng Lan, Martin Weigt, Erik Aurell

    Abstract: Spatially proximate amino acids in a protein tend to coevolve. A protein's three-dimensional (3D) structure hence leaves an echo of correlations in the evolutionary record. Reverse engineering 3D structures from such correlations is an open problem in structural biology, pursued with increasing vigor as more and more protein sequences continue to fill the data banks. Within this task lies a statis…

    Submitted 12 January, 2013; v1 submitted 6 November, 2012; originally announced November 2012.

    Comments: 19 pages, 16 figures, published version

    Journal ref: M. Ekeberg, C. Lövkvist, Y. Lan, M. Weigt, E. Aurell, Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models, Phys. Rev. E 87, 012707 (2013)

  19. arXiv:1206.2311

    q-bio.MN cond-mat.other

    Quasi-potential landscape in complex multi-stable systems

    Authors: Joseph Xu Zhou, M. D. S. Aliyu, Erik Aurell, Sui Huang

    Abstract: Developmental dynamics of multicellular organism is a process that takes place in a multi-stable system in which each attractor state represents a cell type and attractor transitions correspond to cell differentiation paths. This new understanding has revived the idea of a quasi-potential landscape, first proposed by Waddington as a metaphor. To describe development one is interested in the "relat…

    Submitted 11 June, 2012; originally announced June 2012.

    Comments: 30 pages, 6 figures

  20. arXiv:0905.1410

    q-bio.QM cond-mat.dis-nn q-bio.NC

    Statistical physics of pairwise probability models

    Authors: Yasser Roudi, Erik Aurell, John Hertz

    Abstract: Statistical models for describing the probability distribution over the states of biological systems are commonly used for dimensional reduction. Among these models, pairwise models are very attractive in part because they can be fit using a reasonable amount of data: knowledge of the means and correlations between pairs of elements in the system is sufficient. Not surprisingly, then, using pair…

    Submitted 9 May, 2009; originally announced May 2009.

    Comments: 25 pages, 3 figures

    Report number: NORDITA-2009-25

    Journal ref: Front. Comput. Neurosci (2009) 3:22

  21. arXiv:0712.3711

    q-bio.MN physics.bio-ph

    A computational systems biology study of the lambda-lac mutants

    Authors: Maria Werner, Erik Aurell

    Abstract: We present a comprehensive computational study of some 900 possible "lambda-lac" mutants of the lysogeny maintenance switch in phage lambda, of which up to date 19 have been studied experimentally (Atsumi & Little, PNAS 103: 4558-4563, (2006)). We clarify that these mutants realise regulatory schemes quite different from wild-type lambda, and can therefore be expected to behave differently, with…

    Submitted 21 December, 2007; originally announced December 2007.

    Comments: 13 pages, 6 figures

  22. arXiv:0706.1852

    q-bio.SC cond-mat.soft q-bio.MN

    Cooperative action in eukaryotic gene regulation: physical properties of a viral example

    Authors: Maria Werner, LiZhe Zhu, Erik Aurell

    Abstract: The Epstein-Barr virus (EBV) infects more than 90% of the human population, and is the cause of several both serious and mild diseases. It is a tumorivirus, and has been widely studied as a model system for gene (de)regulation in human. A central feature of the EBV life cycle is its ability to persist in human B cells in states denoted latency I, II and III. In latency III the host cell is drive…

    Submitted 13 June, 2007; originally announced June 2007.

    Comments: 7 pages, 6 figures, 1 table

  23. arXiv:q-bio/0703063

    q-bio.GN

    Noise-filtering features of transcription regulation in the yeast S. cerevisiae

    Authors: Erik Aurell, Aymeric Fouquier d'Herouel, Claes Malmnas, Massimo Vergassola

    Abstract: Transcription regulation is largely governed by the profile and the dynamics of transcription factors' binding to DNA. Stochastic effects are intrinsic to this dynamics and the binding to functional sites must be controlled with a certain specificity for living organisms to be able to elicit specific cellular responses. Specificity stems here from the interplay between binding affinity and cellul…

    Submitted 29 March, 2007; originally announced March 2007.

    Comments: 15 pages, 5 figures

  24. Epigenetics as a first exit problem

    Authors: E. Aurell, K. Sneppen

    Abstract: We develop a framework to discuss stability of epigenetic states as first exit problems in dynamical systems with noise. We consider in particular the stability of the lysogenic state of the lambda prophage, which is known to exhibit exceptionally large stability. The formalism defines a quantitative measure of robustness of inherited states. In contrast to Kramers' well-known problem of escape…

    Submitted 5 March, 2001; originally announced March 2001.

    Comments: 6 pages, 3 figures, in REVTeX

  25. arXiv:cond-mat/0010286

    cond-mat.soft cond-mat.stat-mech q-bio

    Stability Puzzles in Phage Lambda

    Authors: Erik Aurell, Stanley Brown, Johan Johanson, Kim Sneppen

    Abstract: The lysogeny maintenance switch in phage lambda is one of the simplest examples on the molecular level of computation, command and control in a living system. If, following infection of the bacterium E. coli, the virus enters the lysogenic pathway, it represses its developmental functions, and integrates its DNA into the host chromosome. In this state the prophage may be passively replicated for…

    Submitted 19 October, 2000; originally announced October 2000.

    Comments: LaTeX 24 pages, 7 figures in PostScript