Scalable Genomic Context Analysis with GCsnap2 on HPC Clusters
Authors: Reto Krummenacher, Osman Seckin Simsek, Michèle Leemann, Leila T. Alexander, Torsten Schwede, Florina M. Ciorba, Joana Pereira
Abstract:
GCsnap2 Cluster is a scalable, high-performance tool for genomic context analysis, developed to overcome the limitations of its predecessor, GCsnap1 Desktop. Leveraging distributed computing with mpi4py.futures, GCsnap2 Cluster achieved a 22x improvement in execution time and can now perform genomic context analysis for hundreds of thousands of input sequences on HPC clusters. Its modular architecture enables the creation of task-specific workflows and flexible deployment in various computational environments, making it well suited for bioinformatics studies of large-scale datasets. This work highlights the potential for applying similar approaches to solve scalability challenges in other scientific domains that rely on large-scale data analysis pipelines.
Submitted 13 May, 2025; v1 submitted 4 May, 2025; originally announced May 2025.