Showing 1–27 of 27 results for author: Firtina, C

  1. MARS: Processing-In-Memory Acceleration of Raw Signal Genome Analysis Inside the Storage Subsystem

    Authors: Melina Soysal, Konstantina Koliogeorgi, Can Firtina, Nika Mansouri Ghiasi, Rakesh Nadig, Haiyu Mao, Geraldo F. Oliveira, Yu Liang, Klea Zambaku, Mohammad Sadrosadati, Onur Mutlu

    Abstract: Raw signal genome analysis (RSGA) has emerged as a promising approach to enable real-time genome analysis by directly analyzing raw electrical signals. However, rapid advancements in sequencing technologies make it increasingly difficult for software-based RSGA to match the throughput of raw signal generation. This paper demonstrates that while hardware acceleration techniques can significantly ac…

    Submitted 3 July, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

  2. arXiv:2504.03732  [pdf, other]

    cs.AR cs.DC q-bio.GN

    SAGe: A Lightweight Algorithm-Architecture Co-Design for Mitigating the Data Preparation Bottleneck in Large-Scale Genome Analysis

    Authors: Nika Mansouri Ghiasi, Talu Güloglu, Harun Mustafa, Can Firtina, Konstantina Koliogeorgi, Konstantinos Kanellopoulos, Haiyu Mao, Rakesh Nadig, Mohammad Sadrosadati, Jisung Park, Onur Mutlu

    Abstract: Given the exponentially growing volumes of genomic data, there are extensive efforts to accelerate genome analysis. We demonstrate a major bottleneck that greatly limits and diminishes the benefits of state-of-the-art genome analysis accelerators: the data preparation bottleneck, where genomic data is stored in compressed form and needs to be decompressed and formatted first before an accelerator…

    Submitted 21 April, 2025; v1 submitted 31 March, 2025; originally announced April 2025.

  3. arXiv:2503.02997  [pdf, other]

    q-bio.GN cs.AR cs.DS cs.ET

    Enabling Fast, Accurate, and Efficient Real-Time Genome Analysis via New Algorithms and Techniques

    Authors: Can Firtina

    Abstract: The advent of high-throughput sequencing technologies has revolutionized genome analysis by enabling the rapid and cost-effective sequencing of large genomes. Despite these advancements, the increasing complexity and volume of genomic data present significant challenges related to accuracy, scalability, and computational efficiency. These challenges are mainly due to various forms of unwanted and…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: PhD Thesis submitted to ETH Zurich

  4. arXiv:2410.17801  [pdf, other]

    q-bio.GN

    Rawsamble: Overlapping and Assembling Raw Nanopore Signals using a Hash-based Seeding Mechanism

    Authors: Can Firtina, Maximilian Mordig, Harun Mustafa, Sayan Goswami, Nika Mansouri Ghiasi, Stefano Mercogliano, Furkan Eris, Joël Lindegger, Andre Kahles, Onur Mutlu

    Abstract: Raw nanopore signal analysis is a common approach in genomics to provide fast and resource-efficient analysis without translating the signals to bases (i.e., without basecalling). However, existing solutions cannot interpret raw signals directly if a reference genome is unknown due to a lack of accurate mechanisms to handle increased noise in pairwise raw signal comparison. Our goal is to enable t…

    Submitted 23 October, 2024; originally announced October 2024.

  5. arXiv:2406.19113  [pdf, other]

    cs.AR cs.DC q-bio.GN

    MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing

    Authors: Nika Mansouri Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Mao, Joël Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park, Onur Mutlu

    Abstract: Metagenomics has led to significant advances in many fields. Metagenomic analysis commonly involves the key tasks of determining the species present in a sample and their relative abundances. These tasks require searching large metagenomic databases. Metagenomic analysis suffers from significant data movement overhead due to moving large amounts of low-reuse data from the storage system. In-storag…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: To appear in ISCA 2024. arXiv admin note: substantial text overlap with arXiv:2311.12527

  6. arXiv:2311.12527  [pdf, other]

    cs.AR q-bio.GN q-bio.QM

    MetaStore: High-Performance Metagenomic Analysis via In-Storage Computing

    Authors: Nika Mansouri Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Mao, Joël Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park, Onur Mutlu

    Abstract: Metagenomics has led to significant advancements in many fields. Metagenomic analysis commonly involves the key tasks of determining the species present in a sample and their relative abundances. These tasks require searching large metagenomic databases containing information on different species' genomes. Metagenomic analysis suffers from significant data movement overhead due to moving large amo…

    Submitted 21 November, 2023; originally announced November 2023.

  7. arXiv:2311.02029  [pdf]

    q-bio.GN cs.AR q-bio.QM

    MetaTrinity: Enabling Fast Metagenomic Classification via Seed Counting and Edit Distance Approximation

    Authors: Arvid E. Gollwitzer, Mohammed Alser, Joel Bergtholdt, Joel Lindegger, Maximilian-David Rumpf, Can Firtina, Serghei Mangul, Onur Mutlu

    Abstract: Metagenomics, the study of genome sequences of diverse organisms cohabiting in a shared environment, has experienced significant advancements across various medical and biological fields. Metagenomic analysis is crucial, for instance, in clinical applications such as infectious disease screening and the diagnosis and early detection of diseases such as cancer. A key task in metagenomics is to dete…

    Submitted 23 January, 2025; v1 submitted 3 November, 2023; originally announced November 2023.

  8. arXiv:2310.16908  [pdf]

    q-bio.GN cs.AR q-bio.QM

    SequenceLab: A Comprehensive Benchmark of Computational Methods for Comparing Genomic Sequences

    Authors: Maximilian-David Rumpf, Mohammed Alser, Arvid E. Gollwitzer, Joel Lindegger, Nour Almadhoun, Can Firtina, Serghei Mangul, Onur Mutlu

    Abstract: Computational complexity is a key limitation of genomic analyses. Thus, over the last 30 years, researchers have proposed numerous fast heuristic methods that provide computational relief. Comparing genomic sequences is one of the most fundamental computational steps in most genomic analyses. Due to its high computational complexity, optimized exact and heuristic algorithms are still being develop…

    Submitted 23 January, 2025; v1 submitted 25 October, 2023; originally announced October 2023.

  9. arXiv:2310.05037  [pdf, other]

    q-bio.GN q-bio.QM

    RawAlign: Accurate, Fast, and Scalable Raw Nanopore Signal Mapping via Combining Seeding and Alignment

    Authors: Joël Lindegger, Can Firtina, Nika Mansouri Ghiasi, Mohammad Sadrosadati, Mohammed Alser, Onur Mutlu

    Abstract: Nanopore sequencers generate raw electrical signals representing the contents of a biological sequence molecule passing through the nanopore. These signals can be analyzed directly, avoiding basecalling entirely. We observe that while existing proposals for raw signal analysis typically do well in all metrics for small genomes (e.g., viral genomes), they all perform poorly for large genomes (e.g.,…

    Submitted 23 October, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

  10. arXiv:2310.04366  [pdf, other]

    cs.AR cs.ET q-bio.GN

    Swordfish: A Framework for Evaluating Deep Neural Network-based Basecalling using Computation-In-Memory with Non-Ideal Memristors

    Authors: Taha Shahroodi, Gagandeep Singh, Mahdi Zahedi, Haiyu Mao, Joel Lindegger, Can Firtina, Stephan Wong, Onur Mutlu, Said Hamdioui

    Abstract: Basecalling, an essential step in many genome analysis studies, relies on large Deep Neural Networks (DNNs) to achieve high accuracy. Unfortunately, these DNNs are computationally slow and inefficient, leading to considerable delays and resource constraints in the sequence analysis process. A Computation-In-Memory (CIM) architecture using memristors can significantly accelerate the performance of…

    Submitted 26 November, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: To appear in 56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023

  11. RawHash2: Mapping Raw Nanopore Signals Using Hash-Based Seeding and Adaptive Quantization

    Authors: Can Firtina, Melina Soysal, Joël Lindegger, Onur Mutlu

    Abstract: Summary: Raw nanopore signals can be analyzed while they are being generated, a process known as real-time analysis. Real-time analysis of raw signals is essential to utilize the unique features that nanopore sequencing provides, enabling the early stopping of the sequencing of a read or the entire sequencing run based on the analysis. The state-of-the-art mechanism, RawHash, offers the first hash…

    Submitted 13 August, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Accepted in Bioinformatics: https://doi.org/10.1093/bioinformatics/btae478

    Journal ref: Bioinformatics, 2024, btae478

  12. arXiv:2305.00492  [pdf, ps, other]

    cs.AR q-bio.GN

    Accelerating Genome Analysis via Algorithm-Architecture Co-Design

    Authors: Onur Mutlu, Can Firtina

    Abstract: High-throughput sequencing (HTS) technologies have revolutionized the field of genomics, enabling rapid and cost-effective genome analysis for various applications. However, the increasing volume of genomic data generated by HTS technologies presents significant challenges for computational techniques to effectively analyze genomes. To address these challenges, several algorithm-architecture co-de…

    Submitted 31 May, 2023; v1 submitted 30 April, 2023; originally announced May 2023.

    Comments: To appear as an invited special session paper at DAC 2023

  13. RawHash: Enabling Fast and Accurate Real-Time Analysis of Raw Nanopore Signals for Large Genomes

    Authors: Can Firtina, Nika Mansouri Ghiasi, Joel Lindegger, Gagandeep Singh, Meryem Banu Cavlak, Haiyu Mao, Onur Mutlu

    Abstract: Nanopore sequencers generate electrical raw signals in real-time while sequencing long genomic strands. These raw signals can be analyzed as they are generated, providing an opportunity for real-time genome analysis. An important feature of nanopore sequencing, Read Until, can eject strands from sequencers without fully sequencing them, which provides opportunities to computationally reduce the se…

    Submitted 1 June, 2023; v1 submitted 22 January, 2023; originally announced January 2023.

    Comments: To appear in proceedings of ISMB/ECCB 2023

  14. arXiv:2212.04953  [pdf, other]

    q-bio.GN cs.AI cs.LG

    TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering

    Authors: Meryem Banu Cavlak, Gagandeep Singh, Mohammed Alser, Can Firtina, Joël Lindegger, Mohammad Sadrosadati, Nika Mansouri Ghiasi, Can Alkan, Onur Mutlu

    Abstract: Basecalling is an essential step in nanopore sequencing analysis where the raw signals of nanopore sequencers are converted into nucleotide sequences, i.e., reads. State-of-the-art basecallers employ complex deep learning models to achieve high basecalling accuracy. This makes basecalling computationally inefficient and memory-hungry, bottlenecking the entire genome analysis pipeline. However, for…

    Submitted 23 October, 2024; v1 submitted 9 December, 2022; originally announced December 2022.

    Journal ref: Frontiers in Genetics, 15: 1429306, 2024

  15. arXiv:2211.03079  [pdf, other]

    cs.AR cs.DC q-bio.GN

    RUBICON: A Framework for Designing Efficient Deep Learning-Based Genomic Basecallers

    Authors: Gagandeep Singh, Mohammed Alser, Kristof Denolf, Can Firtina, Alireza Khodamoradi, Meryem Banu Cavlak, Henk Corporaal, Onur Mutlu

    Abstract: Nanopore sequencing generates noisy electrical signals that need to be converted into a standard string of DNA nucleotide bases using a computational step called basecalling. The accuracy and speed of basecalling have critical implications for all later steps in genome analysis. Many researchers adopt complex deep learning-based models to perform basecalling without considering the compute demands…

    Submitted 5 February, 2024; v1 submitted 6 November, 2022; originally announced November 2022.

  16. arXiv:2209.08600  [pdf, other]

    cs.AR cs.DS q-bio.GN

    GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping

    Authors: Haiyu Mao, Mohammed Alser, Mohammad Sadrosadati, Can Firtina, Akanksha Baranwal, Damla Senol Cali, Aditya Manglik, Nour Almadhoun Alserr, Onur Mutlu

    Abstract: Nanopore sequencing is a widely-used high-throughput genome sequencing technology that can sequence long fragments of a genome into raw electrical signals at low cost. Nanopore sequencing requires two computationally-costly processing steps for accurate downstream genome analysis. The first step, basecalling, translates the raw electrical signals into nucleotide bases (i.e., A, C, G, T). The secon…

    Submitted 17 December, 2023; v1 submitted 18 September, 2022; originally announced September 2022.

    Comments: 17 pages, 13 figures

  17. arXiv:2207.09765  [pdf, other]

    cs.AR cs.AI cs.LG q-bio.GN q-bio.QM

    ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis

    Authors: Can Firtina, Kamlesh Pillai, Gurpreet S. Kalsi, Bharathwaj Suresh, Damla Senol Cali, Jeremie Kim, Taha Shahroodi, Meryem Banu Cavlak, Joel Lindegger, Mohammed Alser, Juan Gómez Luna, Sreenivas Subramoney, Onur Mutlu

    Abstract: Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures, where states and edges capture modifications (i.e., insertions, deletions, and substitutions) by assigning probabilities to them. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highl…

    Submitted 21 October, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted to ACM TACO

  18. arXiv:2206.01932  [pdf, other]

    cs.AR q-bio.GN

    Demeter: A Fast and Energy-Efficient Food Profiler using Hyperdimensional Computing in Memory

    Authors: Taha Shahroodi, Mahdi Zahedi, Can Firtina, Mohammed Alser, Stephan Wong, Onur Mutlu, Said Hamdioui

    Abstract: Food profiling is an essential step in any food monitoring system needed to prevent health risks and potential frauds in the food industry. Significant improvements in sequencing technologies are pushing food profiling to become the main computational bottleneck. State-of-the-art profilers are unfortunately too costly for food profiling. Our goal is to design a food profiler that solves the main…

    Submitted 24 August, 2022; v1 submitted 4 June, 2022; originally announced June 2022.

  19. arXiv:2205.07957  [pdf]

    q-bio.GN cs.AR q-bio.QM

    Going From Molecules to Genomic Variations to Scientific Discovery: Intelligent Algorithms and Architectures for Intelligent Genome Analysis

    Authors: Mohammed Alser, Joel Lindegger, Can Firtina, Nour Almadhoun, Haiyu Mao, Gagandeep Singh, Juan Gomez-Luna, Onur Mutlu

    Abstract: We now need more than ever to make genome analysis more intelligent. We need to read, analyze, and interpret our genomes not only quickly, but also accurately and efficiently enough to scale the analysis to population level. There currently exist major computational bottlenecks and inefficiencies throughout the entire genome analysis pipeline, because state-of-the-art genome sequencing technologie…

    Submitted 16 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: text overlap with arXiv:2008.00961

  20. SeGraM: A Universal Hardware Accelerator for Genomic Sequence-to-Graph and Sequence-to-Sequence Mapping

    Authors: Damla Senol Cali, Konstantinos Kanellopoulos, Joel Lindegger, Zülal Bingöl, Gurpreet S. Kalsi, Ziyi Zuo, Can Firtina, Meryem Banu Cavlak, Jeremie Kim, Nika Mansouri Ghiasi, Gagandeep Singh, Juan Gómez-Luna, Nour Almadhoun Alserr, Mohammed Alser, Sreenivas Subramoney, Can Alkan, Saugata Ghose, Onur Mutlu

    Abstract: A critical step of genome sequence analysis is the mapping of sequenced DNA fragments (i.e., reads) collected from an individual to a known linear reference genome sequence (i.e., sequence-to-sequence mapping). Recent works replace the linear reference sequence with a graph-based representation of the reference genome, which captures the genetic variations and diversity across many individuals in…

    Submitted 31 May, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

    Comments: To appear in ISCA'22

  21. arXiv:2203.16261  [pdf]

    q-bio.GN cs.DC cs.SE stat.AP

    Packaging, containerization, and virtualization of computational omics methods: Advances, challenges, and opportunities

    Authors: Mohammed Alser, Sharon Waymost, Ram Ayyala, Brendan Lawlor, Richard J. Abdill, Neha Rajkumar, Nathan LaPierre, Jaqueline Brito, Andre M. Ribeiro-dos-Santos, Can Firtina, Nour Almadhoun, Varuni Sarwal, Eleazar Eskin, Qiyang Hu, Derek Strong, Byoung-Do Kim, Malak S. Abedalthagafi, Onur Mutlu, Serghei Mangul

    Abstract: Omics software tools have reshaped the landscape of modern biology and become an essential component of biomedical research. The increasing dependence of biomedical scientists on these powerful tools creates a need for easier installation and greater usability. Packaging, virtualization, and containerization are different approaches to satisfy this need by wrapping omics tools in additional softwa…

    Submitted 30 March, 2022; originally announced March 2022.

  22. arXiv:2202.10400  [pdf, other]

    cs.AR cs.DC cs.OS q-bio.GN

    GenStore: A High-Performance and Energy-Efficient In-Storage Computing System for Genome Sequence Analysis

    Authors: Nika Mansouri Ghiasi, Jisung Park, Harun Mustafa, Jeremie Kim, Ataberk Olgun, Arvid Gollwitzer, Damla Senol Cali, Can Firtina, Haiyu Mao, Nour Almadhoun Alserr, Rachata Ausavarungnirun, Nandita Vijaykumar, Mohammed Alser, Onur Mutlu

    Abstract: Read mapping is a fundamental, yet computationally-expensive step in many genomics applications. It is used to identify potential matches and differences between fragments (called reads) of a sequenced genome and an already known genome (called a reference genome). To address the computational challenges in genome analysis, many prior works propose various approaches such as filters that select th…

    Submitted 6 April, 2023; v1 submitted 21 February, 2022; originally announced February 2022.

    Comments: Published at ASPLOS 2022

  23. FastRemap: A Tool for Quickly Remapping Reads between Genome Assemblies

    Authors: Jeremie S. Kim, Can Firtina, Meryem Banu Cavlak, Damla Senol Cali, Can Alkan, Onur Mutlu

    Abstract: A genome read data set can be quickly and efficiently remapped from one reference to another similar reference (e.g., between two reference versions or two similar species) using a variety of tools, e.g., the commonly-used CrossMap tool. With the explosion of available genomic data sets and references, high-performance remapping tools will be even more important for keeping up with the computation…

    Submitted 4 September, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

    Comments: FastRemap is open source and all scripts needed to replicate the results in this paper can be found at https://github.com/CMU-SAFARI/FastRemap

    Journal ref: Bioinformatics, Sep 30; 38(19):4633-4635, 2022

  24. BLEND: A Fast, Memory-Efficient, and Accurate Mechanism to Find Fuzzy Seed Matches in Genome Analysis

    Authors: Can Firtina, Jisung Park, Mohammed Alser, Jeremie S. Kim, Damla Senol Cali, Taha Shahroodi, Nika Mansouri Ghiasi, Gagandeep Singh, Konstantinos Kanellopoulos, Can Alkan, Onur Mutlu

    Abstract: Generating the hash values of short subsequences, called seeds, enables quickly identifying similarities between genomic sequences by matching seeds with a single lookup of their hash values. However, these hash values can be used only for finding exact-matching seeds as the conventional hashing methods assign distinct hash values for different seeds, including highly similar seeds. Finding only e…

    Submitted 23 May, 2023; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Published in NARGAB

    Journal ref: NAR Genomics and Bioinformatics, vol. 5, no. 1, p. lqad004, Mar. 2023

  25. arXiv:2009.07692  [pdf, other]

    cs.AR q-bio.GN

    GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis

    Authors: Damla Senol Cali, Gurpreet S. Kalsi, Zülal Bingöl, Can Firtina, Lavanya Subramanian, Jeremie S. Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand, Anant Nori, Allison Scibisz, Sreenivas Subramoney, Can Alkan, Saugata Ghose, Onur Mutlu

    Abstract: Genome sequence analysis has enabled significant advancements in medical and scientific areas such as personalized medicine, outbreak tracing, and the understanding of evolution. Unfortunately, it is currently bottlenecked by the computational power and memory bandwidth limitations of existing systems, as many of the steps in genome sequence analysis must process a large amount of data. A major co…

    Submitted 16 September, 2020; originally announced September 2020.

    Comments: To appear in MICRO 2020

  26. arXiv:1912.08735  [pdf, other]

    q-bio.GN cs.CE

    AirLift: A Fast and Comprehensive Technique for Remapping Alignments between Reference Genomes

    Authors: Jeremie S. Kim, Can Firtina, Meryem Banu Cavlak, Damla Senol Cali, Mohammed Alser, Nastaran Hajinazar, Can Alkan, Onur Mutlu

    Abstract: AirLift is the first read remapping tool that enables users to quickly and comprehensively map a read set, that had been previously mapped to one reference genome, to another similar reference. Users can then quickly run a downstream analysis of read sets for each latest reference release. Compared to the state-of-the-art method for remapping reads (i.e., full mapping), AirLift reduces the overall…

    Submitted 11 September, 2024; v1 submitted 18 December, 2019; originally announced December 2019.

    Comments: Published in the IEEE/ACM TCBB journal: https://ieeexplore.ieee.org/document/10638724

  27. Apollo: A Sequencing-Technology-Independent, Scalable, and Accurate Assembly Polishing Algorithm

    Authors: Can Firtina, Jeremie S. Kim, Mohammed Alser, Damla Senol Cali, A. Ercument Cicek, Can Alkan, Onur Mutlu

    Abstract: Long reads produced by third-generation sequencing technologies are used to construct an assembly (i.e., the subject's genome), which is further used in downstream genome analysis. Unfortunately, long reads have high sequencing error rates and a large proportion of bps in these long reads are incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis.…

    Submitted 7 March, 2020; v1 submitted 12 February, 2019; originally announced February 2019.

    Comments: 9 pages, 1 figure. Accepted in Bioinformatics

    Journal ref: Bioinformatics. 2020 Jun 1;36(12):3669-3679