Showing 1–34 of 34 results for author: Saeed, F

Searching in archive cs.
  1. arXiv:2504.04267  [pdf, other]

    cs.DS cs.DM cs.IT math.PR stat.CO

    Efficient Rejection Sampling in the Entropy-Optimal Range

    Authors: Thomas L. Draper, Feras A. Saad

    Abstract: The problem of generating a random variate $X$ from a finite discrete probability distribution $P$ using an entropy source of independent unbiased coin flips is considered. The Knuth and Yao complexity theory of nonuniform random number generation furnishes a family of "entropy-optimal" sampling algorithms that consume between $H(P)$ and $H(P)+2$ coin flips per generated output, where $H$ is the S…

    Submitted 5 April, 2025; originally announced April 2025.

    Comments: 37 pages, 11 figures, 4 tables, 4 algorithms

  2. arXiv:2503.14484  [pdf, other]

    cs.MA cs.AI cs.CL

    Gricean Norms as a Basis for Effective Collaboration

    Authors: Fardin Saad, Pradeep K. Murukannaiah, Munindar P. Singh

    Abstract: Effective human-AI collaboration hinges not only on the AI agent's ability to follow explicit instructions but also on its capacity to navigate ambiguity, incompleteness, invalidity, and irrelevance in communication. Gricean conversational and inference norms facilitate collaboration by aligning unclear instructions with cooperative principles. We propose a normative framework that integrates Gric…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: Accepted to AAMAS 2025. 8 pages (excl. references), 9 figures/tables. (Appendix: 5 pages, 6 figures/tables). Code available at: https://github.com/fardinsaad/Gricean-Norms

  3. GenSQL: A Probabilistic Programming System for Querying Generative Models of Database Tables

    Authors: Mathieu Huot, Matin Ghavami, Alexander K. Lew, Ulrich Schaechtle, Cameron E. Freer, Zane Shelby, Martin C. Rinard, Feras A. Saad, Vikash K. Mansinghka

    Abstract: This article presents GenSQL, a probabilistic programming system for querying probabilistic generative models of database tables. By augmenting SQL with only a few key primitives for querying probabilistic models, GenSQL enables complex Bayesian inference workflows to be concisely implemented. GenSQL's query planner rests on a unified programmatic interface for interacting with probabilistic model…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 54 pages, 30 figures, 1 table, published at PLDI 2024

  4. arXiv:2403.07657  [pdf, other]

    cs.LG cs.AI stat.AP stat.ME

    Scalable Spatiotemporal Prediction with Bayesian Neural Fields

    Authors: Feras Saad, Jacob Burnim, Colin Carroll, Brian Patton, Urs Köster, Rif A. Saurous, Matthew Hoffman

    Abstract: Spatiotemporal datasets, which consist of spatially-referenced time series, are ubiquitous in diverse applications, such as air pollution monitoring, disease tracking, and cloud-demand forecasting. As the scale of modern datasets increases, there is a growing need for statistical methods that are flexible enough to capture complex spatiotemporal dynamics and scalable enough to handle many observat…

    Submitted 26 November, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: 29 pages, 7 figures, 2 tables, 1 listing

    Journal ref: Nature Communications 15(7942), 2024

  5. arXiv:2401.13912  [pdf, other]

    cs.LG

    A Survey of Deep Learning and Foundation Models for Time Series Forecasting

    Authors: John A. Miller, Mohammed Aldosari, Farah Saeed, Nasid Habib Barna, Subas Rana, I. Budak Arpinar, Ninghao Liu

    Abstract: Deep Learning has been successfully applied to many application domains, yet its advantages have been slow to emerge for time series forecasting. For example, in the well-known Makridakis (M) Competitions, hybrids of traditional statistical or machine learning techniques have only recently become the top performers. With the recent architectural advances in deep learning being applied to time seri…

    Submitted 24 January, 2024; originally announced January 2024.

  6. arXiv:2308.07501  [pdf, other]

    cs.DB

    Data-CASE: Grounding Data Regulations for Compliant Data Processing Systems

    Authors: Vishal Chakraborty, Stacy Ann-Elvy, Sharad Mehrotra, Faisal Nawab, Mohammad Sadoghi, Shantanu Sharma, Nalini Venkatsubhramanian, Farhan Saeed

    Abstract: Data regulations, such as GDPR, are increasingly being adopted globally to protect against unsafe data management practices. Such regulations are often ambiguous (with multiple valid interpretations) when it comes to defining the expected dynamic behavior of data processing systems. This paper argues that it is possible to represent regulations such as GDPR formally as invariants using a (small s…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: To appear in EDBT '24

  7. arXiv:2307.09607  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    Sequential Monte Carlo Learning for Time Series Structure Discovery

    Authors: Feras A. Saad, Brian J. Patton, Matthew D. Hoffman, Rif A. Saurous, Vikash K. Mansinghka

    Abstract: This paper presents a new approach to automatically discovering accurate models of complex time series data. Working within a Bayesian nonparametric prior over a symbolic space of Gaussian process time series models, we present a novel structure learning algorithm that integrates sequential Monte Carlo (SMC) and involutive MCMC for highly effective posterior inference. Our method can be used both…

    Submitted 13 July, 2023; originally announced July 2023.

    Comments: 17 pages, 8 figures, 2 tables. Appearing in ICML 2023

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:29473-29489, 2023

  8. arXiv:2202.12363  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    Estimators of Entropy and Information via Inference in Probabilistic Models

    Authors: Feras A. Saad, Marco Cusumano-Towner, Vikash K. Mansinghka

    Abstract: Estimating information-theoretic quantities such as entropy and mutual information is central to many problems in statistics and machine learning, but challenging in high dimensions. This paper presents estimators of entropy via inference (EEVI), which deliver upper and lower bounds on many information quantities for arbitrary variables in a probabilistic generative model. These estimators use imp…

    Submitted 12 December, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

    Comments: 18 pages, 8 figures. Appearing in AISTATS 2022

    Journal ref: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:5604-5621, 2022

  9. arXiv:2111.10776  [pdf]

    cs.CL cs.HC cs.LG

    A Case Study on the Independence of Speech Emotion Recognition in Bangla and English Languages using Language-Independent Prosodic Features

    Authors: Fardin Saad, Hasan Mahmud, Mohammad Ridwan Kabir, Md. Alamin Shaheen, Paresha Farastu, Md. Kamrul Hasan

    Abstract: A language agnostic approach to recognizing emotions from speech remains an incomplete and challenging task. In this paper, we performed a step-by-step comparative analysis of Speech Emotion Recognition (SER) using Bangla and English languages to assess whether distinguishing emotions from speech is independent of language. Six emotions were categorized for this study, such as - happy, angry, neut…

    Submitted 13 May, 2022; v1 submitted 21 November, 2021; originally announced November 2021.

    Comments: 13 pages [currently under review]

  10. arXiv:2110.03877  [pdf]

    eess.IV cs.AI cs.CV

    Designing the Architecture of a Convolutional Neural Network Automatically for Diabetic Retinopathy Diagnosis

    Authors: Fahman Saeed, Muhammad Hussain, Hatim A Aboalsamh, Fadwa Al Adel, Adi Mohammed Al Owaifeer

    Abstract: The prevalence of diabetic retinopathy (DR) has reached 34.6% worldwide and is a major cause of blindness among middle-aged diabetic patients. Regular DR screening using fundus photography helps detect its complications and prevent its progression to advanced levels. As manual screening is time-consuming and subjective, machine learning (ML) and deep learning (DL) have been employed to aid graders…

    Submitted 7 November, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: 20 pages, 6 figures

  11. arXiv:2108.07208  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    Hierarchical Infinite Relational Model

    Authors: Feras A. Saad, Vikash K. Mansinghka

    Abstract: This paper describes the hierarchical infinite relational model (HIRM), a new probabilistic generative model for noisy, sparse, and heterogeneous relational data. Given a set of relations defined over a collection of domains, the model first infers multiple non-overlapping clusters of relations using a top-level Chinese restaurant process. Within each cluster of relations, a Dirichlet process mixt…

    Submitted 16 August, 2021; originally announced August 2021.

    Comments: 11 pages, 6 figures, 4 tables. Appearing in UAI 2021

    Journal ref: Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence, PMLR 161:1067-1077, 2021

  12. Communication-avoiding micro-architecture to compute Xcorr scores for peptide identification

    Authors: Sumesh Kumar, Fahad Saeed

    Abstract: Database algorithms play a crucial part in systems biology studies by identifying proteins from mass spectrometry data. Many of these database search algorithms incur huge computational costs by computing similarity scores for each pair of sparse experimental spectrum and candidate theoretical spectrum vectors. Modern MS instrumentation techniques which are capable of generating high-resolution sp…

    Submitted 5 August, 2021; v1 submitted 31 July, 2021; originally announced August 2021.

    Comments: 4 pages, 5 figures

  13. arXiv:2102.07004  [pdf]

    cs.LG cs.AI

    Weight Initialization Techniques for Deep Learning Algorithms in Remote Sensing: Recent Trends and Future Perspectives

    Authors: Wadii Boulila, Maha Driss, Mohamed Al-Sarem, Faisal Saeed, Moez Krichen

    Abstract: During the last decade, several research works have focused on providing novel deep learning methods in many application fields. However, few of them have investigated the weight initialization process for deep learning, although its importance is revealed in improving deep learning performance. This can be justified by the technical difficulties in proposing new techniques for this promising rese…

    Submitted 13 February, 2021; originally announced February 2021.

  14. arXiv:2102.02286  [pdf, other]

    cs.DC q-bio.MN

    HiCOPS: High Performance Computing Framework for Tera-Scale Database Search of Mass Spectrometry based Omics Data

    Authors: Muhammad Haseeb, Fahad Saeed

    Abstract: Database-search algorithms, that deduce peptides from Mass Spectrometry (MS) data, have tried to improve the computational efficiency to accomplish larger, and more complex systems biology studies. Existing serial, and high-performance computing (HPC) search engines, otherwise highly successful, are known to exhibit poor-scalability with increasing size of theoretical search-space needed for incre…

    Submitted 5 February, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

    Comments: Under peer review. 37 pages, 9 figures, 5 tables

  15. arXiv:2010.03485  [pdf, other]

    cs.PL cs.LG cs.SC stat.CO stat.ML

    SPPL: Probabilistic Programming with Fast Exact Symbolic Inference

    Authors: Feras A. Saad, Martin C. Rinard, Vikash K. Mansinghka

    Abstract: We present the Sum-Product Probabilistic Language (SPPL), a new probabilistic programming language that automatically delivers exact solutions to a broad range of probabilistic inference queries. SPPL translates probabilistic programs into sum-product expressions, a new symbolic representation and associated semantic domain that extends standard sum-product networks to support mixed-type distribut…

    Submitted 11 June, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

    Journal ref: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI '21), June 20-25, 2021, Virtual, Canada. ACM, New York, NY, USA

  16. arXiv:2009.14123  [pdf, other]

    cs.DC q-bio.GN q-bio.MN

    Communication Lower-Bounds for Distributed-Memory Computations for Mass Spectrometry based Omics Data

    Authors: Fahad Saeed, Muhammad Haseeb, SS Iyengar

    Abstract: Mass spectrometry (MS) based omics data analysis requires significant time and resources. To date, few parallel algorithms have been proposed for deducing peptides from mass spectrometry-based data. However, these parallel algorithms were designed and developed when the amount of data that needed to be processed was smaller in scale. In this paper, we prove that the communication bound that is rea…

    Submitted 11 August, 2021; v1 submitted 29 September, 2020; originally announced September 2020.

    Comments: 20 pages, 6 figures, preprint

  17. arXiv:2006.02570  [pdf, other]

    eess.IV cs.CV cs.LG

    Exploration of Interpretability Techniques for Deep COVID-19 Classification using Chest X-ray Images

    Authors: Soumick Chatterjee, Fatima Saad, Chompunuch Sarasaen, Suhita Ghosh, Valerie Krug, Rupali Khatun, Rahul Mishra, Nirja Desai, Petia Radeva, Georg Rose, Sebastian Stober, Oliver Speck, Andreas Nürnberger

    Abstract: The outbreak of COVID-19 has shocked the entire world with its fairly rapid spread and has challenged different sectors. One of the most effective ways to limit its spread is the early and accurate diagnosis of infected patients. Medical imaging, such as X-ray and Computed Tomography (CT), combined with the potential of Artificial Intelligence (AI), plays an essential role in supporting medical pers…

    Submitted 24 January, 2024; v1 submitted 3 June, 2020; originally announced June 2020.

    Journal ref: Journal of Imaging. 2024; 10(2):45

  18. arXiv:2003.03830  [pdf, other]

    stat.CO cs.DM cs.DS cs.IT math.PR

    The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions

    Authors: Feras A. Saad, Cameron E. Freer, Martin C. Rinard, Vikash K. Mansinghka

    Abstract: This paper introduces a new algorithm for the fundamental problem of generating a random integer from a discrete probability distribution using a source of independent and unbiased random coin flips. We prove that this algorithm, which we call the Fast Loaded Dice Roller (FLDR), is highly efficient in both space and time: (i) the size of the sampler is guaranteed to be linear in the number of bits…

    Submitted 1 June, 2020; v1 submitted 8 March, 2020; originally announced March 2020.

    Comments: 12 pages, 5 figures, 1 table. Appearing in AISTATS 2020

    Journal ref: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, PMLR 108:1036-1046, 2020

  19. arXiv:2003.01541  [pdf]

    q-bio.NC cs.LG stat.ML

    Explainable and Scalable Machine-Learning Algorithms for Detection of Autism Spectrum Disorder using fMRI Data

    Authors: Taban Eslami, Joseph S. Raiker, Fahad Saeed

    Abstract: Diagnosing Autism Spectrum Disorder (ASD) is a challenging problem, and is based purely on behavioral descriptions of symptomology (DSM-5/ICD-10), and requires informants to observe children with disorder across different settings (e.g. home, school). Numerous limitations (e.g., informant discrepancies, lack of adherence to assessment guidelines, informant biases) to current diagnostic practices h…

    Submitted 2 March, 2020; originally announced March 2020.

  20. arXiv:2001.04555  [pdf, other]

    cs.DS cs.DM cs.IT math.PR stat.CO

    Optimal Approximate Sampling from Discrete Probability Distributions

    Authors: Feras A. Saad, Cameron E. Freer, Martin C. Rinard, Vikash K. Mansinghka

    Abstract: This paper addresses a fundamental problem in random variate generation: given access to a random source that emits a stream of independent fair bits, what is the most accurate and entropy-efficient algorithm for sampling from a discrete probability distribution $(p_1, \dots, p_n)$, where the probabilities of the output distribution $(\hat{p}_1, \dots, \hat{p}_n)$ of the sampling algorithm must be…

    Submitted 13 January, 2020; originally announced January 2020.

    Journal ref: Proc. ACM Program. Lang. 4, POPL, Article 36 (January 2020)

  21. arXiv:1907.06249  [pdf, other]

    cs.PL cs.AI cs.LG stat.CO

    Bayesian Synthesis of Probabilistic Programs for Automatic Data Modeling

    Authors: Feras A. Saad, Marco F. Cusumano-Towner, Ulrich Schaechtle, Martin C. Rinard, Vikash K. Mansinghka

    Abstract: We present new techniques for automatically constructing probabilistic programs for data analysis, interpretation, and prediction. These techniques work with probabilistic domain-specific data modeling languages that capture key properties of a broad class of data generating processes, using Bayesian inference to synthesize probabilistic programs in these modeling languages given observed data. We…

    Submitted 14 July, 2019; originally announced July 2019.

    Journal ref: Proc. ACM Program. Lang. 3, POPL, Article 37 (January 2019)

  22. arXiv:1904.07577  [pdf, other]

    cs.LG eess.IV stat.ML

    ASD-DiagNet: A hybrid learning approach for detection of Autism Spectrum Disorder using fMRI data

    Authors: Taban Eslami, Vahid Mirjalili, Alvis Fong, Angela Laird, Fahad Saeed

    Abstract: Mental disorders such as Autism Spectrum Disorders (ASD) are heterogeneous disorders that are notoriously difficult to diagnose, especially in children. The current psychiatric diagnostic process is based purely on the behavioural observation of symptomology (DSM-5/ICD-10) and may be prone to over-prescribing of drugs due to misdiagnosis. In order to move the field towards more quantitative fashio…

    Submitted 16 April, 2019; originally announced April 2019.

  23. arXiv:1902.10142  [pdf, other]

    math.ST cs.LG stat.ME

    A Family of Exact Goodness-of-Fit Tests for High-Dimensional Discrete Distributions

    Authors: Feras A. Saad, Cameron E. Freer, Nathanael L. Ackerman, Vikash K. Mansinghka

    Abstract: The objective of goodness-of-fit testing is to assess whether a dataset of observations is likely to have been drawn from a candidate probability distribution. This paper presents a rank-based family of goodness-of-fit tests that is specialized to discrete distributions on high-dimensional domains. The test is readily implemented using a simulation-based, linear-time procedure. The testing procedu…

    Submitted 26 February, 2019; originally announced February 2019.

    Comments: 20 pages, 6 figures. Appearing in AISTATS 2019

    Journal ref: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, PMLR 89:1640-1649, 2019

  24. arXiv:1710.06900  [pdf, other]

    stat.ME cs.LG stat.ML

    Temporally-Reweighted Chinese Restaurant Process Mixtures for Clustering, Imputing, and Forecasting Multivariate Time Series

    Authors: Feras A. Saad, Vikash K. Mansinghka

    Abstract: This article proposes a Bayesian nonparametric method for forecasting, imputation, and clustering in sparsely observed, multivariate time series data. The method is appropriate for jointly modeling hundreds of time series with widely varying, non-stationary dynamics. Given a collection of $N$ time series, the Bayesian model first partitions them into independent clusters using a Chinese restaurant…

    Submitted 1 April, 2018; v1 submitted 18 October, 2017; originally announced October 2017.

    Comments: 19 pages, 10 figures, 2 tables. Appearing in AISTATS 2018

    Journal ref: Proceedings of the 21st International Conference on Artificial Intelligence and Statistics, PMLR 84:755-764, 2018

  25. arXiv:1704.01087  [pdf, other]

    cs.AI cs.DB cs.LG stat.ML

    Probabilistic Search for Structured Data via Probabilistic Programming and Nonparametric Bayes

    Authors: Feras Saad, Leonardo Casarsa, Vikash Mansinghka

    Abstract: Databases are widespread, yet extracting relevant data can be difficult. Without substantial domain knowledge, multivariate search queries often return sparse or uninformative results. This paper introduces an approach for searching structured data based on probabilistic programming and nonparametric Bayes. Users specify queries in a probabilistic language that combines standard SQL database searc…

    Submitted 4 April, 2017; originally announced April 2017.

  26. arXiv:1611.01708  [pdf, other]

    stat.ML cs.AI cs.LG

    Detecting Dependencies in Sparse, Multivariate Databases Using Probabilistic Programming and Non-parametric Bayes

    Authors: Feras Saad, Vikash Mansinghka

    Abstract: Datasets with hundreds of variables and many missing values are commonplace. In this setting, it is both statistically and computationally challenging to detect true predictive relationships between variables and also to suppress false positives. This paper proposes an approach that combines probabilistic programming, information theory, and non-parametric Bayes. It shows how to use Bayesian non-p…

    Submitted 26 March, 2017; v1 submitted 5 November, 2016; originally announced November 2016.

    Journal ref: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:632-641, 2017

  27. arXiv:1608.05347  [pdf, other]

    cs.AI cs.LG stat.ML

    Probabilistic Data Analysis with Probabilistic Programming

    Authors: Feras Saad, Vikash Mansinghka

    Abstract: Probabilistic techniques are central to data analysis, but different approaches can be difficult to apply, combine, and compare. This paper introduces composable generative population models (CGPMs), a computational abstraction that extends directed graphical models and can be used to describe and compose a broad class of probabilistic data analysis techniques. Examples include hierarchical Bayesi…

    Submitted 18 August, 2016; originally announced August 2016.

  28. arXiv:1301.0834  [pdf, ps, other]

    cs.DS q-bio.GN q-bio.QM

    An Efficient Algorithm for Clustering of Large-Scale Mass Spectrometry Data

    Authors: Fahad Saeed, Trairak Pisitkun, Mark A. Knepper, Jason D. Hoffert

    Abstract: High-throughput spectrometers are capable of producing data sets containing thousands of spectra for a single biological sample. These data sets contain a substantial amount of redundancy from peptides that may get selected multiple times in a LC-MS/MS experiment. In this paper, we present an efficient algorithm, CAMS (Clustering Algorithm for Mass Spectra) for clustering mass spectrometry data wh…

    Submitted 4 January, 2013; originally announced January 2013.

    Comments: 4 pages, 4 figures, Bioinformatics and Biomedicine (BIBM), 2012 IEEE International Conference on

    Journal ref: IEEE Proceedings publications 2012

  29. arXiv:1104.5510  [pdf, other]

    q-bio.QM cs.CE cs.DB cs.DS q-bio.MN

    Mining Temporal Patterns from iTRAQ Mass Spectrometry(LC-MS/MS) Data

    Authors: Fahad Saeed, Trairak Pisitkun, Mark A. Knepper, Jason D. Hoffert

    Abstract: Large-scale proteomic analysis is emerging as a powerful technique in biology and relies heavily on data acquired by state-of-the-art mass spectrometers. As with any other field in Systems Biology, computational tools are required to deal with this ocean of data. iTRAQ (isobaric Tags for Relative and Absolute quantification) is a technique that allows simultaneous quantification of proteins from m…

    Submitted 28 April, 2011; originally announced April 2011.

    Comments: 12 pages, 10 figures, The Proceedings of the ISCA 3rd International Conference on Bioinformatics and Computational Biology (BiCoB), pp 152-159 New Orleans, Louisiana, USA, March 23-25, 2011 (ISBN: 978-1-880843-81-9)

  30. arXiv:0906.3771  [pdf]

    cs.NI

    High Transmission Bit Rate of A thermal Arrayed Waveguide Grating (AWG) Module in Passive Optical Networks

    Authors: Abd El Naser A. Mohammed, Ahmed Nabih Zaki Rashed, Gaber E. S. M. El Abyad, Abd El Fattah A. Saad

    Abstract: In the present paper, the high transmission bit rate of a thermal arrayed waveguide grating (AWG), which is composed of lithium niobate (LiNbO3)/polymethyl methacrylate (PMMA) hybrid materials on a silicon substrate in Passive Optical Networks (PONs), has been parametrically analyzed and investigated over a wide range of the affecting parameters. We have theoretically investigated the temperature dependent…

    Submitted 19 June, 2009; originally announced June 2009.

    Comments: 9 pages, International Journal of Computer Science and Information Security

    Journal ref: IJCSIS, May 2009, Vol. 1

  31. A Domain Decomposition Strategy for Alignment of Multiple Biological Sequences on Multiprocessor Platforms

    Authors: Fahad Saeed, Ashfaq Khokhar

    Abstract: Multiple Sequences Alignment (MSA) of biological sequences is a fundamental problem in computational biology due to its critical significance in wide ranging applications including haplotype reconstruction, sequence homology, phylogenetic analysis, and prediction of evolutionary origins. The MSA problem is considered NP-hard and known heuristics for the problem do not scale well with increasing…

    Submitted 11 May, 2009; originally announced May 2009.

    Comments: 36 pages, 17 figures, Accepted manuscript in Journal of Parallel and Distributed Computing (JPDC)

    Journal ref: F. Saeed, A. Khokhar, A domain decomposition strategy for alignment of multiple biological sequences on multiprocessor platforms, J. Parallel Distrib. Comput. (2009)

  32. arXiv:0901.2751  [pdf, other]

    cs.DS cs.DC q-bio.GN q-bio.QM

    Pyro-Align: Sample-Align based Multiple Alignment system for Pyrosequencing Reads of Large Number

    Authors: Fahad Saeed

    Abstract: Pyro-Align is a multiple alignment program specifically designed for pyrosequencing reads of huge number. Multiple sequence alignment is shown to be NP-hard and heuristics are designed for approximate solutions. Multiple sequence alignment of pyrosequencing reads is complex mainly because of 2 factors. One being the huge number of reads, making the use of traditional heuristics, that scale very…

    Submitted 18 January, 2009; originally announced January 2009.

    Comments: 6 pages, 1 figure, Technical Report, Department of Biosystems Science and Engineering, ETH Zurich Switzerland

    Report number: DBSSE-08-2008

  33. arXiv:0901.2747  [pdf]

    cs.DS q-bio.GN q-bio.QM

    An Overview of Multiple Sequence Alignment Systems

    Authors: Fahad Saeed, Ashfaq Khokhar

    Abstract: An overview of current multiple alignment systems is presented. The useful algorithms, the procedures adopted, and their limitations are described. We also present the quality of the alignments obtained and the cases (kinds of alignments, kinds of sequences, etc.) in which the particular systems are useful.

    Submitted 18 January, 2009; originally announced January 2009.

    Comments: 24 pages, 15 figures, Technical Report Parallel Algorithms & Multimedia System Laboratory

    Report number: PAMS-05-2007

  34. arXiv:0901.2742  [pdf]

    cs.DC q-bio.GN q-bio.QM

    Sample-Align-D: A High Performance Multiple Sequence Alignment System using Phylogenetic Sampling and Domain Decomposition

    Authors: Fahad Saeed, Ashfaq Khokhar

    Abstract: Multiple Sequence Alignment (MSA) is one of the most computationally intensive tasks in Computational Biology. Existing best known solutions for multiple sequence alignment take several hours (in some cases days) of computation time to align, for example, 2000 homologous sequences of average length 300. Inspired by the Sample Sort approach in parallel processing, in this paper we propose a highl…

    Submitted 18 January, 2009; originally announced January 2009.

    Comments: 12 pages, 8 figures, paper appeared in HICOMB, IPDPS 2008