-
inMOTIFin: a lightweight end-to-end simulation software for regulatory sequences
Authors:
Katalin Ferenc,
Lorenzo Martini,
Ieva Rauluseviciute,
Geir Kjetil Sandve,
Anthony Mathelier
Abstract:
The accurate development, assessment, interpretation, and benchmarking of bioinformatics frameworks for analyzing transcriptional regulatory grammars rely on controlled simulations to validate the underlying methods. However, existing simulators often lack end-to-end flexibility or ease of integration, which limits their practical use. We present inMOTIFin, a lightweight, modular, and user-friendly Python-based software that addresses these gaps by providing versatile and efficient simulation and modification of DNA regulatory sequences. inMOTIFin enables users to simulate or modify regulatory sequences efficiently for the customizable generation of motifs and insertion of motif instances with precise control over their positions, co-occurrences, and spacing, as well as direct modification of real sequences, facilitating a comprehensive evaluation of motif-based methods and interpretation tools. We demonstrate inMOTIFin applications for the assessment of de novo motif discovery prediction, the analysis of transcription factor cooperativity, and the support of explainability analyses for deep learning models. inMOTIFin ensures robust and reproducible analyses for studying transcriptional regulatory grammars.
inMOTIFin is available at PyPI https://pypi.org/project/inMOTIFin/ and Docker Hub https://hub.docker.com/r/cbgr/inmotifin. Detailed documentation is available at https://inmotifin.readthedocs.io/en/latest/. The code for use case analyses is available at https://bitbucket.org/CBGR/inmotifin_evaluation/src/main/.
Submitted 25 June, 2025;
originally announced June 2025.
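The controlled insertion described in this abstract (motif instances placed at chosen positions with chosen spacing) can be sketched in plain Python. This is only an illustration and does not use the inMOTIFin API: the function names `sample_background`, `sample_instance`, and `insert_pair`, as well as the toy position weight matrices, are assumptions made for the example.

```python
import random

ALPHABET = "ACGT"

def sample_background(length, rng):
    """Draw a uniform random DNA background of the given length."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))

def sample_instance(pwm, rng):
    """Draw one motif instance; each PWM row holds probabilities for A, C, G, T."""
    return "".join(rng.choices(ALPHABET, weights=row, k=1)[0] for row in pwm)

def insert_pair(background, first, second, start, spacing):
    """Overwrite the background with two instances separated by `spacing` bases."""
    second_start = start + len(first) + spacing
    end = second_start + len(second)
    if start < 0 or spacing < 0 or end > len(background):
        raise ValueError("motif pair does not fit in the background sequence")
    return (
        background[:start]
        + first
        + background[start + len(first):second_start]  # untouched spacer bases
        + second
        + background[end:]
    )

# Hypothetical toy motifs (rows sum to 1); a fixed seed keeps the run reproducible.
rng = random.Random(0)
toy_pwm_a = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1]]
toy_pwm_b = [[0.1, 0.1, 0.1, 0.7], [0.7, 0.1, 0.1, 0.1]]
background = sample_background(40, rng)
simulated = insert_pair(
    background,
    sample_instance(toy_pwm_a, rng),
    sample_instance(toy_pwm_b, rng),
    start=10,
    spacing=3,
)
```

Because the insertion positions are known exactly, a sequence built this way can serve as ground truth when scoring a motif discovery tool.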
-
Biological Random Walks: multi-omics integration for disease gene prioritization
Authors:
Michele Gentili,
Leonardo Martini,
Marialuisa Sponziello,
Luca Becchetti
Abstract:
Motivation: Over the past decade, network-based approaches have proven useful in identifying disease modules within the human interactome, often providing insights into key mechanisms and guiding the quest for therapeutic targets. This is all the more important because experimental investigation of potential gene candidates is expensive and thus not always feasible. On the other hand, many sources of biological information exist beyond the interactome, and an important research direction is the design of effective techniques for their integration. Results: In this work, we introduce the Biological Random Walks (BRW) approach for disease gene prioritization in the human interactome. The proposed approach leverages multiple biological sources within an integrated framework. We perform an extensive, comparative study of BRW's performance against well-established baselines. Availability and implementation: All code is publicly available and can be downloaded at https://github.com/LeoM93/BiologicalRandomWalks. We used publicly available datasets; details on their retrieval and preprocessing are provided in the supplementary material.
Submitted 29 November, 2022; v1 submitted 23 November, 2022;
originally announced November 2022.
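Network propagation methods of this kind are commonly built on a random walk with restart. The sketch below shows that generic technique on a toy graph; it is not the BRW implementation, and the function name `random_walk_with_restart`, the `alpha` default, and the restart weights are assumptions chosen for illustration. The restart weights are where additional biological evidence could enter.

```python
def random_walk_with_restart(neighbors, restart, alpha=0.5, tol=1e-12, max_iter=10_000):
    """Score nodes by a random walk that jumps back to seed nodes with probability alpha.

    neighbors: dict mapping each node to a non-empty list of adjacent nodes.
    restart:   dict mapping seed nodes to non-negative weights (hypothetical evidence).
    """
    if any(len(adjacent) == 0 for adjacent in neighbors.values()):
        raise ValueError("every node needs at least one neighbor")
    total = sum(restart.get(node, 0.0) for node in neighbors)
    if total <= 0:
        raise ValueError("restart weights must have a positive sum")
    r = {node: restart.get(node, 0.0) / total for node in neighbors}

    p = dict(r)  # start the walk from the restart distribution
    for _ in range(max_iter):
        nxt = {node: alpha * r[node] for node in neighbors}
        for node, adjacent in neighbors.items():
            share = (1 - alpha) * p[node] / len(adjacent)
            for other in adjacent:
                nxt[other] += share  # spread remaining mass evenly over neighbors
        if sum(abs(nxt[node] - p[node]) for node in neighbors) < tol:
            return nxt
        p = nxt
    return p

# Toy path graph A - B - C - D with a single seed gene A.
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(toy_graph, {"A": 1.0})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Genes are then prioritized by descending score; nodes closer to the seeds in the network receive more probability mass.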
-
Network Based Approach to Gene Prioritization at Genome-Wide Association Study Loci
Authors:
Leonardo Martini,
Adriano Fazzone,
Michele Gentili,
Luca Becchetti,
Brian Hobbs
Abstract:
Motivation: Genome-wide association studies (GWAS) have successfully identified thousands of genetic risk loci for complex traits and diseases. Most of these GWAS loci lie in regulatory regions of the genome, and the gene through which each GWAS risk locus exerts its effects is not always clear. Many computational methods utilizing biological data sources have been proposed to identify putative causal genes at GWAS loci; however, these methods can be improved upon. Results: We present the Relations-Maximization Method, a dense module searching method to identify putative causal genes at GWAS loci through the generation of candidate sub-networks derived by integrating association signals from GWAS data into the gene co-regulation network. We employ our method in a chronic obstructive pulmonary disease GWAS. We perform an extensive, comparative study of the Relations-Maximization Method's performance against well-established baselines.
Submitted 28 October, 2022;
originally announced October 2022.
-
Network and Sequence-Based Prediction of Protein-Protein Interactions
Authors:
Leonardo Martini,
Adriano Fazzone,
Luca Becchetti
Abstract:
Background: Typically, proteins perform key biological functions by interacting with each other. As a consequence, predicting which protein pairs interact is a fundamental problem. Experimental methods are slow, expensive, and may be error-prone. Many computational methods have been proposed to identify candidate interacting pairs. When accurate, they can serve as an inexpensive, preliminary filtering stage, to be followed by downstream experimental validation. Among such methods, sequence-based ones are very promising. Results: We present a new algorithm that leverages both topological and biological information to predict protein-protein interactions. We comprehensively compare our framework with state-of-the-art approaches on reliable PPI datasets, showing that it has competitive or higher accuracy on biologically validated test sets. Conclusion: We show that computational methods combining topological and sequence-based information can predict the entire human interactome more effectively than methods that leverage only one source of biological information.
Submitted 6 February, 2022; v1 submitted 8 July, 2021;
originally announced July 2021.
-
Biological Random Walks: integrating heterogeneous data in disease gene prioritization
Authors:
Michele Gentili,
Leonardo Martini,
Manuela Petti,
Lorenzo Farina,
Luca Becchetti
Abstract:
This work proposes a unified framework to leverage biological information in network propagation-based gene prioritization algorithms. Preliminary results on breast cancer data show significant improvements over state-of-the-art baselines, such as the prioritization of genes that are not identified as potential candidates by interactome-based algorithms but that appear to be involved in, or potentially related to, breast cancer, according to a functional analysis based on recent literature.
Submitted 14 February, 2020;
originally announced February 2020.
-
Identification and Analysis of Transition and Metastable Markov States
Authors:
Linda Martini,
Adam Kells,
Gerhard Hummer,
Nicolae-Viorel Buchete,
Edina Rosta
Abstract:
We present a new method that enables the identification and analysis of both transition and metastable conformational states from atomistic or coarse-grained molecular dynamics (MD) trajectories. Our algorithm is presented and studied using both analytical and actual examples from MD simulations of the helix-forming peptide Ala5, and of a larger system, the epidermal growth factor receptor (EGFR) protein. In all cases, given a set of relevant coordinates as input, our method automatically identifies the corresponding transition states and metastable conformations in an optimal way by accurately capturing the intrinsic slowest relaxation rate. Our approach is a general and easy-to-implement analysis method that provides unique insight into the molecular mechanism and the rare but crucial rate-limiting conformational pathways occurring in complex dynamical systems such as molecular trajectories.
Submitted 13 May, 2016;
originally announced May 2016.
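The "slowest relaxation rate" mentioned in this abstract has a standard meaning for a discrete Markov chain: if λ2 is the second-largest eigenvalue of the transition matrix estimated at lag time τ, the slowest rate is k = -ln(λ2)/τ. The sketch below illustrates only that textbook relation and is not the authors' algorithm; the function names, the power-iteration approach, and the iteration counts are assumptions, and it presumes a reversible, ergodic chain whose second eigenvalue dominates the remaining spectrum.

```python
import math

def transition_matrix(trajectory, n_states, lag=1):
    """Row-normalized transition counts from a discrete state trajectory."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for i, j in zip(trajectory[:-lag], trajectory[lag:]):
        counts[i][j] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a state has no outgoing transitions")
        matrix.append([c / total for c in row])
    return matrix

def stationary_distribution(T, iters=2000):
    """Left power iteration; very slow chains may need more iterations."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
    return pi

def second_eigenvalue(T, iters=2000):
    """Power iteration after projecting out the constant (eigenvalue 1) mode."""
    n = len(T)
    pi = stationary_distribution(T, iters)
    v = [float(i) for i in range(n)]  # arbitrary non-constant starting vector
    lam = 0.0
    for _ in range(iters):
        mean = sum(p * x for p, x in zip(pi, v))
        v = [x - mean for x in v]
        w = [sum(T[i][j] * v[j] for j in range(n)) for i in range(n)]
        denom = sum(p * x * x for p, x in zip(pi, v))
        scale = max(abs(x) for x in w)
        if denom == 0.0 or scale == 0.0:
            return 0.0
        lam = sum(p * a * b for p, a, b in zip(pi, w, v)) / denom  # pi-weighted Rayleigh quotient
        v = [x / scale for x in w]
    return lam

def relaxation_rate(T, lag_time=1.0):
    """Slowest relaxation rate k = -ln(lambda_2) / lag_time."""
    lam = second_eigenvalue(T)
    if not 0.0 < lam < 1.0:
        raise ValueError("second eigenvalue must lie strictly between 0 and 1")
    return -math.log(lam) / lag_time
```

For a two-state chain with transition probabilities a and b between the states, the second eigenvalue is 1 - a - b, which gives a quick sanity check for the numerical result.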