bioSBM: a random graph model to integrate epigenomic data in chromatin structure prediction
Authors:
Alex Chen Yi Zhang,
Angelo Rosa,
Guido Sanguinetti
Abstract:
The spatial organization of chromatin within the nucleus plays a crucial role in gene expression and genome function. However, the quantitative relationship between this organization and nuclear biochemical processes remains under debate. In this study, we present a graph-based generative model, bioSBM, designed to capture long-range chromatin interaction patterns from Hi-C data and, importantly, to simultaneously link these patterns to biochemical features. Applying bioSBM to Hi-C maps of the GM12878 lymphoblastoid cell line, we identified a latent structure of chromatin interactions, revealing 12 distinct communities that strongly align with known biological annotations. Additionally, we infer a linear transformation that maps biochemical observables, such as histone marks, to the parameters of the generative graph model, enabling accurate genome-wide predictions of chromatin contact maps on out-of-sample data, both within the same cell line and on the completely unseen HCT116 cell line under RAD21 depletion. These findings highlight bioSBM's potential as a powerful tool for elucidating the relationship between biochemistry and chromatin architecture and for predicting long-range genome organization from independent biochemical data.
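To make the idea of "a linear transformation that maps biochemical observables to the parameters of the generative graph model" concrete, here is a minimal sketch of a generic mixed-membership stochastic block model. It assumes (hypothetically) that each genomic bin's membership vector is a softmax of a linear map W applied to its feature vector, and that contacts are independent Bernoulli draws whose probability comes from a K x K block matrix B. The names `memberships`, `contact_probabilities`, `sample_contacts`, the softmax link, and all numbers are illustrative assumptions, not the published bioSBM formulation or its parameters.

```python
import math
import random

# Hypothetical sketch of a mixed-membership stochastic block model (SBM) whose
# community memberships are a linear function of per-bin biochemical features.
# The softmax link, parameter shapes, and all numbers are illustrative
# assumptions, not the published bioSBM formulation.

def softmax(scores):
    """Numerically stable softmax: turns K real scores into K weights summing to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def memberships(features, W):
    """Map each genomic bin's feature vector x_i to theta_i = softmax(W x_i)."""
    return [softmax([sum(w * x for w, x in zip(row, f)) for row in W]) for f in features]

def contact_probabilities(theta, B):
    """Expected contact probability p_ij = sum_ab theta_i[a] * B[a][b] * theta_j[b]."""
    n, k = len(theta), len(B)
    return [[sum(theta[i][a] * B[a][b] * theta[j][b]
                 for a in range(k) for b in range(k))
             for j in range(n)] for i in range(n)]

def sample_contacts(P, rng):
    """Draw a symmetric 0/1 contact graph with independent Bernoulli(p_ij) edges for i < j."""
    n = len(P)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < P[i][j]:
                A[i][j] = A[j][i] = 1
    return A

# Toy example: 4 bins, 2 hypothetical marks (columns), K = 2 communities.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
W = [[3.0, -3.0], [-3.0, 3.0]]   # K x F linear map (assumed values)
B = [[0.8, 0.05], [0.05, 0.6]]   # symmetric K x K block interaction matrix (assumed values)

theta = memberships(features, W)
P = contact_probabilities(theta, B)
A = sample_contacts(P, random.Random(0))
```

In this toy setup, bins 0 and 1 share a dominant community and so receive a high contact probability, while cross-community pairs such as bins 0 and 2 fall back to the small off-diagonal entry of B. Because the map from features to memberships is explicit, the same W and B can be applied to feature vectors from unseen data to produce a predicted contact map, which is the kind of out-of-sample use the abstract describes.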
Submitted 22 September, 2024;
originally announced September 2024.
Bottom-up data integration in polymer models of chromatin organisation
Authors:
Alex Chen Yi Zhang,
Angelo Rosa,
Guido Sanguinetti
Abstract:
Cellular functions crucially depend on the precise execution of complex biochemical reactions taking place on the chromatin fiber in the tightly packed environment of the cell nucleus. Despite the availability of large data sets probing this process from multiple angles, we still lack a bottom-up framework that can incorporate the sequence-specific nature of biochemistry in a unified model of 3D chromatin dynamics. Here we propose SEMPER (Sequence Enhanced Magnetic PolymER), a novel stochastic polymer model that naturally incorporates observational data about sequence-driven biochemical processes, such as the binding of transcription factor proteins, in a 3D model of chromatin structure. By introducing a new algorithm for approximate Bayesian inference, we show how to robustly estimate the relative importance of biochemical vs. polymer signals in determining chromatin epigenetic states, which leads to a significant revision of the interpretation of previous models. Furthermore, we show that, without additional input from the 3D structure of the genome, our model can predict with reasonable accuracy some notable and non-trivial conformational features of chromatin folding within the nucleus. Our work highlights the importance of introducing physically realistic statistical models for predicting chromatin states from epigenetic data, and opens the way to a new class of more systematic approaches to interpreting epigenomic data.
Submitted 16 March, 2023; v1 submitted 20 October, 2022;
originally announced October 2022.