
Showing 1–25 of 25 results for author: Patterson, M

  1. arXiv:2503.14574  [pdf, other]

    q-bio.QM cs.LG

    Sequence Analysis Using the Bezier Curve

    Authors: Taslim Murad, Sarwan Ali, Murray Patterson

    Abstract: The analysis of sequences (e.g., protein, DNA, and SMILES string) is essential for disease diagnosis, biomaterial engineering, genetic engineering, and drug discovery domains. Conventional analytical methods focus on transforming sequences into numerical representations for applying machine learning/deep learning-based sequence characterization. However, their efficacy is constrained by the intrin…

    Submitted 18 March, 2025; originally announced March 2025.

  2. arXiv:2412.20616  [pdf, other]

    cs.LG q-bio.OT

    Hilbert Curve Based Molecular Sequence Analysis

    Authors: Sarwan Ali, Tamkanat E Ali, Imdad Ullah Khan, Murray Patterson

    Abstract: Accurate molecular sequence analysis is a key task in the field of bioinformatics. To apply molecular sequence classification algorithms, we first need to generate the appropriate representations of the sequences. Traditional numeric sequence representation techniques are mostly based on sequence alignment that faces limitations in the form of lack of accuracy. Although several alignment-free tech…

    Submitted 29 December, 2024; originally announced December 2024.

  3. arXiv:2409.06694  [pdf, ps, other]

    cs.LG q-bio.QM

    DANCE: Deep Learning-Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images

    Authors: Taslim Murad, Prakash Chourasia, Sarwan Ali, Imdad Ullah Khan, Murray Patterson

    Abstract: Cancer is a complex disease characterized by uncontrolled cell growth. T cell receptors (TCRs), crucial proteins in the immune system, play a key role in recognizing antigens, including those associated with cancer. Recent advancements in sequencing technologies have facilitated comprehensive profiling of TCR repertoires, uncovering TCRs with potent anti-cancer activity and enabling TCR-based immu…

    Submitted 11 June, 2025; v1 submitted 10 September, 2024; originally announced September 2024.

  4. arXiv:2409.04922  [pdf, other]

    q-bio.GN cs.AI cs.CC cs.LG

    Nearest Neighbor CCP-Based Molecular Sequence Analysis

    Authors: Sarwan Ali, Prakash Chourasia, Bipin Koirala, Murray Patterson

    Abstract: Molecular sequence analysis is crucial for comprehending several biological processes, including protein-protein interactions, functional annotation, and disease classification. The large number of sequences and the inherently complicated nature of protein structures make it challenging to analyze such data. Finding patterns and enhancing subsequent research requires the use of dimensionality redu…

    Submitted 7 September, 2024; originally announced September 2024.

  5. arXiv:2403.19844  [pdf, other]

    q-bio.BM cs.LG physics.chem-ph

    Expanding Chemical Representation with k-mers and Fragment-based Fingerprints for Molecular Fingerprinting

    Authors: Sarwan Ali, Prakash Chourasia, Murray Patterson

    Abstract: This study introduces a novel approach, combining substruct counting, $k$-mers, and Daylight-like fingerprints, to expand the representation of chemical structures in SMILES strings. The integrated method generates comprehensive molecular embeddings that enhance discriminative power and information content. Experimental evaluations demonstrate its superiority over traditional Morgan fingerprinting…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 12 Pages, 3 tables, Accepted at SimBig2023

    Journal ref: SimBig2023

  6. arXiv:2402.08117  [pdf, other]

    cs.LG q-bio.QM

    A Universal Non-Parametric Approach For Improved Molecular Sequence Analysis

    Authors: Sarwan Ali, Tamkanat E Ali, Prakash Chourasia, Murray Patterson

    Abstract: In the field of biological research, it is essential to comprehend the characteristics and functions of molecular sequences. The classification of molecular sequences has seen widespread use of neural network-based techniques. Despite their astounding accuracy, these models often require a substantial number of parameters and more data collection. In this work, we present a novel approach based on…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: Accepted at The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2024

  7. arXiv:2308.01920  [pdf, other]

    q-bio.BM cs.LG

    Sequence-Based Nanobody-Antigen Binding Prediction

    Authors: Usama Sardar, Sarwan Ali, Muhammad Sohaib Ayub, Muhammad Shoaib, Khurram Bashir, Imdad Ullah Khan, Murray Patterson

    Abstract: Nanobodies (Nb) are monomeric heavy-chain fragments derived from heavy-chain only antibodies naturally found in Camelids and Sharks. Their considerably small size (~3-4 nm; 13 kDa) and favorable biophysical properties make them attractive targets for recombinant production. Furthermore, their unique ability to bind selectively to specific antigens, such as toxins, chemicals, bacteria, and viruses,…

    Submitted 14 July, 2023; originally announced August 2023.

  8. arXiv:2306.05514  [pdf, other]

    eess.IV cs.CV cs.LG q-bio.NC

    Robust Brain Age Estimation via Regression Models and MRI-derived Features

    Authors: Mansoor Ahmed, Usama Sardar, Sarwan Ali, Shafiq Alam, Murray Patterson, Imdad Ullah Khan

    Abstract: The determination of biological brain age is a crucial biomarker in the assessment of neurological disorders and understanding of the morphological changes that occur during aging. Various machine learning models have been proposed for estimating brain age through Magnetic Resonance Imaging (MRI) of healthy controls. However, developing a robust brain age estimation (BAE) framework has been challe…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: Published at the 15th International Conference on Computational Collective Intelligence

  9. arXiv:2304.13145  [pdf, other]

    cs.LG q-bio.QM

    T Cell Receptor Protein Sequences and Sparse Coding: A Novel Approach to Cancer Classification

    Authors: Zahra Tayebi, Sarwan Ali, Prakash Chourasia, Taslim Murad, Murray Patterson

    Abstract: Cancer is a complex disease characterized by uncontrolled cell growth and proliferation. T cell receptors (TCRs) are essential proteins for the adaptive immune system, and their specific recognition of antigens plays a crucial role in the immune response against diseases, including cancer. The diversity and specificity of TCRs make them ideal for targeting cancer cells, and recent advancements in…

    Submitted 5 September, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: Accepted at ICONIP 2023

  10. arXiv:2304.12328  [pdf, other]

    q-bio.GN cs.AI cs.LG

    Virus2Vec: Viral Sequence Classification Using Machine Learning

    Authors: Sarwan Ali, Babatunde Bello, Prakash Chourasia, Ria Thazhe Punathil, Pin-Yu Chen, Imdad Ullah Khan, Murray Patterson

    Abstract: Understanding the host-specificity of different families of viruses sheds light on the origin of, e.g., SARS-CoV-2, rabies, and other such zoonotic pathogens in humans. It enables epidemiologists, medical professionals, and policymakers to curb existing epidemics and prevent future ones promptly. In the family Coronaviridae (of which SARS-CoV-2 is a member), it is well-known that the spike protein…

    Submitted 24 April, 2023; originally announced April 2023.

    Comments: 11 pages, 6 figures. Accepted at the Conference on Health, Inference, and Learning (CHIL) 2023

  11. arXiv:2304.06731  [pdf, other]

    q-bio.QM cs.LG

    PCD2Vec: A Poisson Correction Distance-Based Approach for Viral Host Classification

    Authors: Sarwan Ali, Taslim Murad, Murray Patterson

    Abstract: Coronaviruses are membrane-enveloped, non-segmented positive-strand RNA viruses belonging to the Coronaviridae family. Various animal species, mainly mammalian and avian, are severely infected by various coronaviruses, causing serious concerns like the recent pandemic (COVID-19). Therefore, building a deeper understanding of these viruses is essential to devise prevention and mitigation mechanisms…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: Accepted at International Joint Conference on Neural Networks (IJCNN) 2023

  12. arXiv:2304.02891  [pdf, other]

    q-bio.GN cs.LG

    ViralVectors: Compact and Scalable Alignment-free Virome Feature Generation

    Authors: Sarwan Ali, Prakash Chourasia, Zahra Tayebi, Babatunde Bello, Murray Patterson

    Abstract: The amount of sequencing data for SARS-CoV-2 is several orders of magnitude larger than any virus. This will continue to grow geometrically for SARS-CoV-2, and other viruses, as many countries heavily finance genomic surveillance efforts. Hence, we need methods for processing large amounts of sequence data to allow for effective yet timely decision-making. Such data will come from heterogeneous so…

    Submitted 7 April, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

    Comments: 24 pages, 5 figures, accepted to Springer Medical & Biological Engineering & Computing

  13. arXiv:2304.00291  [pdf, ps, other]

    cs.LG q-bio.GN

    BioSequence2Vec: Efficient Embedding Generation For Biological Sequences

    Authors: Sarwan Ali, Usama Sardar, Murray Patterson, Imdad Ullah Khan

    Abstract: Representation learning is an important step in the machine learning pipeline. Given the current biological sequencing data volume, learning an explicit representation is prohibitive due to the dimensionality of the resulting feature vectors. Kernel-based methods, e.g., SVM, are a proven efficient and useful alternative for several machine learning (ML) tasks such as sequence classification. Three…

    Submitted 1 April, 2023; originally announced April 2023.

    Comments: Accepted to PAKDD 2023

  14. arXiv:2303.02421  [pdf, other]

    cs.LG q-bio.QM

    Exploring The Potential Of GANs In Biological Sequence Analysis

    Authors: Taslim Murad, Sarwan Ali, Murray Patterson

    Abstract: Biological sequence analysis is an essential step toward building a deeper understanding of the underlying functions, structures, and behaviors of the sequences. It can help in identifying the characteristics of the associated organisms, like viruses, etc., and building prevention mechanisms to eradicate their spread and impact, as viruses are known to cause epidemics that can become pandemics glo…

    Submitted 4 March, 2023; originally announced March 2023.

  15. arXiv:2211.08267  [pdf, other]

    q-bio.QM cs.LG q-bio.GN

    Reads2Vec: Efficient Embedding of Raw High-Throughput Sequencing Reads Data

    Authors: Prakash Chourasia, Sarwan Ali, Simone Ciccolella, Gianluca Della Vedova, Murray Patterson

    Abstract: The massive amount of genomic data appearing for SARS-CoV-2 since the beginning of the COVID-19 pandemic has challenged traditional methods for studying its dynamics. As a result, new methods such as Pangolin, which can scale to the millions of samples of SARS-CoV-2 currently available, have appeared. Such a tool is tailored to take as input assembled, aligned and curated full-length sequences, su…

    Submitted 15 November, 2022; originally announced November 2022.

  16. arXiv:2209.04952  [pdf, other]

    cs.LG q-bio.QM

    Efficient Approximate Kernel Based Spike Sequence Classification

    Authors: Sarwan Ali, Bikram Sahoo, Muhammad Asad Khan, Alexander Zelikovsky, Imdad Ullah Khan, Murray Patterson

    Abstract: Machine learning (ML) models, such as SVM, for tasks like classification and clustering of sequences, require a definition of distance/similarity between pairs of sequences. Several methods have been proposed to compute the similarity between sequences, such as the exact approach that counts the number of matches between $k$-mers (sub-sequences of length $k$) and an approximate approach that estim…

    Submitted 11 September, 2022; originally announced September 2022.

    Comments: Accepted for publication at "IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)"

  17. arXiv:2207.08898  [pdf, other]

    q-bio.GN cs.LG

    Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence Classification

    Authors: Sarwan Ali, Bikram Sahoo, Alexander Zelikovskiy, Pin-Yu Chen, Murray Patterson

    Abstract: The rapid spread of the COVID-19 pandemic has resulted in an unprecedented amount of sequence data of the SARS-CoV-2 genome -- millions of sequences and counting. This amount of data, while being orders of magnitude beyond the capacity of traditional approaches to understanding the diversity, dynamics, and evolution of viruses is nonetheless a rich resource for machine learning (ML) approaches as…

    Submitted 18 July, 2022; originally announced July 2022.

  18. arXiv:2201.02273  [pdf, other]

    q-bio.GN cs.LG q-bio.QM

    PWM2Vec: An Efficient Embedding Approach for Viral Host Specification from Coronavirus Spike Sequences

    Authors: Sarwan Ali, Babatunde Bello, Prakash Chourasia, Ria Thazhe Punathil, Yijing Zhou, Murray Patterson

    Abstract: COVID-19 pandemic, is still unknown and is an important open question. There are speculations that bats are a possible origin. Likewise, there are many closely related (corona-) viruses, such as SARS, which was found to be transmitted through civets. The study of the different hosts which can be potential carriers and transmitters of deadly viruses to humans is crucial to understanding, mitigating…

    Submitted 6 January, 2022; originally announced January 2022.

  19. arXiv:2110.09622  [pdf, other]

    cs.LG q-bio.QM

    Robust Representation and Efficient Feature Selection Allows for Effective Clustering of SARS-CoV-2 Variants

    Authors: Zahra Tayebi, Sarwan Ali, Murray Patterson

    Abstract: The widespread availability of large amounts of genomic data on the SARS-CoV-2 virus, as a result of the COVID-19 pandemic, has created an opportunity for researchers to analyze the disease at a level of detail unlike any virus before it. On one hand, this will help biologists, policy makers and other authorities to make timely and appropriate decisions to control the spread of the coronavirus. On…

    Submitted 18 October, 2021; originally announced October 2021.

  20. arXiv:2110.00809  [pdf, other]

    cs.LG q-bio.QM

    Characterizing SARS-CoV-2 Spike Sequences Based on Geographical Location

    Authors: Sarwan Ali, Babatunde Bello, Zahra Tayebi, Murray Patterson

    Abstract: With the rapid spread of COVID-19 worldwide, viral genomic data is available in the order of millions of sequences on public databases such as GISAID. This Big Data creates a unique opportunity for analysis towards the research of effective vaccine development for current pandemics, and avoiding or mitigating future pandemics. One piece of information that comes with every such viral sequence is t…

    Submitted 12 October, 2022; v1 submitted 2 October, 2021; originally announced October 2021.

    Comments: Accepted at Journal of Computational Biology (JCB)

  21. arXiv:2109.05019  [pdf, other]

    q-bio.GN cs.LG

    Spike2Vec: An Efficient and Scalable Embedding Approach for COVID-19 Spike Sequences

    Authors: Sarwan Ali, Murray Patterson

    Abstract: With the rapid global spread of COVID-19, more and more data related to this virus is becoming available, including genomic sequence data. The total number of genomic sequences that are publicly available on platforms such as GISAID is currently several million, and is increasing with every day. The availability of such Big Data creates a new opportunity for researchers to study this virus…

    Submitted 15 November, 2021; v1 submitted 11 September, 2021; originally announced September 2021.

    Comments: Accepted at IEEE International Conference on Big Data (IEEE Big Data)

  22. arXiv:2108.08143  [pdf, other]

    q-bio.PE cs.LG

    Effective and scalable clustering of SARS-CoV-2 sequences

    Authors: Sarwan Ali, Tamkanat-E-Ali, Muhammad Asad Khan, Imdadullah Khan, Murray Patterson

    Abstract: SARS-CoV-2, like any other virus, continues to mutate as it spreads, according to an evolutionary process. Unlike any other virus, the number of currently available sequences of SARS-CoV-2 in public databases such as GISAID is already several million. This amount of data has the potential to uncover the evolutionary dynamics of a virus like never before. However, a million is already several order…

    Submitted 12 October, 2021; v1 submitted 18 August, 2021; originally announced August 2021.

    Comments: To Appear in: International Conference on Big Data Research (ICBDR)

  23. arXiv:2108.03465  [pdf, other]

    q-bio.QM cs.LG

    A k-mer Based Approach for SARS-CoV-2 Variant Identification

    Authors: Sarwan Ali, Bikram Sahoo, Naimat Ullah, Alexander Zelikovskiy, Murray Patterson, Imdadullah Khan

    Abstract: With the rapid spread of the novel coronavirus (COVID-19) across the globe and its continuous mutation, it is of pivotal importance to design a system to identify different known (and unknown) variants of SARS-CoV-2. Identifying particular variants helps to understand and model their spread patterns, design effective mitigation strategies, and prevent future outbreaks. It also plays a crucial role…

    Submitted 12 October, 2021; v1 submitted 7 August, 2021; originally announced August 2021.

    Comments: Accepted for publication at the International Symposium on Bioinformatics Research and Applications (ISBRA), 2021

  24. arXiv:1903.04377  [pdf, other]

    cs.LG cs.NE q-bio.NC stat.ML

    SleepNet: Automated Sleep Analysis via Dense Convolutional Neural Network Using Physiological Time Series

    Authors: Bahareh Pourbabaee, Matthew Howe-Patterson, Matthew Patterson, Frederic Benard

    Abstract: In this work, a dense recurrent convolutional neural network (DRCNN) was constructed to detect sleep disorders including arousal, apnea and hypopnea using Polysomnography (PSG) measurement channels provided in the 2018 Physionet challenge database. Our model structure is composed of multiple dense convolutional units (DCU) followed by a bidirectional long-short term memory (LSTM) layer followed by…

    Submitted 24 July, 2019; v1 submitted 11 March, 2019; originally announced March 2019.

    Comments: 20 pages, 4 figures, Accepted to be published by Physiological Measurement Journal

  25. arXiv:1306.6656  [pdf, other]

    q-bio.PE q-bio.GN

    Lateral Gene Transfer, Rearrangement and Reconciliation

    Authors: Murray Patterson, Gergely J Szöllősi, Vincent Daubin, Eric Tannier

    Abstract: Background. Models of ancestral gene order reconstruction have progressively integrated different evolutionary patterns and processes such as unequal gene content, gene duplications, and implicitly sequence evolution via reconciled gene trees. In unicellular organisms, these models have so far ignored lateral gene transfer, even though it can have an important confounding effect on such models,…

    Submitted 27 June, 2013; originally announced June 2013.

    Comments: submitted for RECOMB CG 2013