-
Scaling a Variant Calling Genomics Pipeline with FaaS
Authors:
Aitor Arjona,
Arnau Gabriel-Atienza,
Sara Lanuza-Orna,
Xavier Roca-Canals,
Ayman Bourramouss,
Tyler K. Chafin,
Lucio Marcello,
Paolo Ribeca,
Pedro García-López
Abstract:
With the escalating complexity and volume of genomic data, the capacity of biology institutions' HPC faces limitations. While the Cloud presents a viable solution for short-term elasticity, its intricacies pose challenges for bioinformatics users. Alternatively, serverless computing allows for workload scalability with minimal developer burden. However, porting a scientific application to serverless is not a straightforward process. In this article, we present a Variant Calling genomics pipeline migrated from single-node HPC to a serverless architecture. We describe the inherent challenges of this approach and the engineering efforts required to achieve scalability. We contribute by open-sourcing the pipeline for future systems research and as a scalable user-friendly tool for the bioinformatics community.
Submitted 12 December, 2023;
originally announced December 2023.
-
Networks of Ethereum Non-Fungible Tokens: A graph-based analysis of the ERC-721 ecosystem
Authors:
S. Casale-Brunet,
P. Ribeca,
P. Doyle,
M. Mattavelli
Abstract:
Non-fungible tokens (NFTs) as a decentralized proof of ownership represent one of the main reasons why Ethereum is a disruptive technology. This paper presents the first systematic study of the interactions occurring in a number of NFT ecosystems. We illustrate how to retrieve transaction data available on the blockchain and structure it as a graph-based model. Thanks to this methodology, we are able to study for the first time the topological structure of NFT networks and show that their properties (degree distribution and others) are similar to those of interaction graphs in social networks. Time-dependent analysis metrics, useful to characterize market influencers and interactions between different wallets, are also introduced. Based on those, we identify across a number of NFT networks the widespread presence of both investors accumulating NFTs and individuals who make large profits.
Submitted 24 October, 2021;
originally announced October 2021.
-
CARGO: Effective format-free compressed storage of genomic information
Authors:
Łukasz Roguski,
Paolo Ribeca
Abstract:
The recent super-exponential growth in the amount of sequencing data generated worldwide has put techniques for compressed storage into focus. Most available solutions, however, are strictly tied to specific bioinformatics formats, sometimes inheriting from them suboptimal design choices; this hinders flexible and effective data sharing. Here we present CARGO (Compressed ARchiving for GenOmics), a high-level framework to automatically generate software systems optimized for the compressed storage of arbitrary types of large genomic data collections. Straightforward applications of our approach to FASTQ and SAM archives require a few lines of code, produce solutions that match and sometimes outperform specialized format-tailored compressors, and scale well to multi-TB datasets.
Submitted 16 June, 2015;
originally announced June 2015.