-
Functional Classification of Spiking Signal Data Using Artificial Intelligence Techniques: A Review
Authors:
Danial Sharifrazi,
Nouman Javed,
Javad Hassannataj Joloudari,
Roohallah Alizadehsani,
Prasad N. Paradkar,
Ru-San Tan,
U. Rajendra Acharya,
Asim Bhatti
Abstract:
The study of neuronal activity in the human brain is of growing importance. Neuronal behavior is assessed by analyzing signal data such as electroencephalography (EEG), which can offer scientists valuable information about diseases and human-computer interaction. One of the difficulties researchers confront while evaluating these signals is the existence of large volumes of spike data. Spikes are prominent segments of signal data that can arise from vital biomarkers or from physical issues such as electrode movements. Hence, distinguishing types of spikes is important, and this is the motivation for spike classification. Previously, researchers classified spikes manually. Manual classification was not precise enough because it involves extensive analysis. Consequently, Artificial Intelligence (AI) was introduced into neuroscience to assist clinicians in classifying spikes correctly. This review discusses the importance and use of AI in spike classification, focusing on the recognition of neural activity noises. The task is divided into three main components: preprocessing, classification, and evaluation. Existing methods are introduced and their importance is determined. The review also highlights the need for more efficient algorithms. The primary goal is to provide a perspective on spike classification for future research and a comprehensive understanding of the methodologies and issues involved. The review organizes materials in the spike classification field for future studies. In this work, numerous studies were extracted from different databases. The PRISMA-related research guidelines were then used to choose papers. Finally, research studies based on spike classification using machine learning and deep learning approaches with effective preprocessing were selected.
Submitted 25 September, 2024;
originally announced September 2024.
-
Artificial Intelligence and Diabetes Mellitus: An Inside Look Through the Retina
Authors:
Yasin Sadeghi Bazargani,
Majid Mirzaei,
Navid Sobhi,
Mirsaeed Abdollahi,
Ali Jafarizadeh,
Siamak Pedrammehr,
Roohallah Alizadehsani,
Ru San Tan,
Sheikh Mohammed Shariful Islam,
U. Rajendra Acharya
Abstract:
Diabetes mellitus (DM) predisposes patients to vascular complications. Retinal images and vasculature reflect the body's micro- and macrovascular health. They can be used to diagnose DM complications, including diabetic retinopathy (DR), neuropathy, nephropathy, and atherosclerotic cardiovascular disease, as well as forecast the risk of cardiovascular events. Artificial intelligence (AI)-enabled systems developed for high-throughput detection of DR using digitized retinal images have become clinically adopted. Beyond DR screening, AI integration also holds immense potential to address challenges associated with the holistic care of the patient with DM. In this work, we aim to comprehensively review the literature for studies on AI applications based on retinal images related to DM diagnosis, prognostication, and management. We will describe the findings of holistic AI-assisted diabetes care, including but not limited to DR screening, and discuss barriers to implementing such systems, including issues concerning ethics, data privacy, equitable access, and explainability. With the ability to evaluate the patient's health status vis a vis DM complication as well as risk prognostication of future cardiovascular complications, AI-assisted retinal image analysis has the potential to become a central tool for modern personalized medicine in patients with DM.
Submitted 27 February, 2024;
originally announced February 2024.
-
Automated detection of Zika and dengue in Aedes aegypti using neural spiking analysis
Authors:
Danial Sharifrazi,
Nouman Javed,
Roohallah Alizadehsani,
Prasad N. Paradkar,
U. Rajendra Acharya,
Asim Bhatti
Abstract:
Mosquito-borne diseases present considerable risks to the health of both animals and humans. Aedes aegypti mosquitoes are the primary vectors for numerous medically important viruses such as dengue, Zika, yellow fever, and chikungunya. To characterize the neural activity of this mosquito, it is essential to classify the generated electrical spikes. However, no open-source neural spike classification method is currently available for mosquitoes. The work presented in this paper provides an innovative artificial intelligence-based method to classify the neural spikes in uninfected, dengue-infected, and Zika-infected mosquitoes. Aiming for outstanding performance, the method employs a fusion of normalization, feature importance, and dimension reduction for preprocessing and combines a convolutional neural network with extreme gradient boosting (XGBoost) for classification. The method uses the electrical spiking activity data of mosquito neurons recorded by microelectrode array technology. We used data from 0, 1, 2, 3, and 7 days post-infection, containing over 15 million samples, to analyze the method's performance. The performance of the proposed method was evaluated using accuracy, precision, recall, and F1 scores. The results highlight its remarkable performance in differentiating infected vs uninfected mosquito samples, achieving an average of 98.1%. The performance was also compared with 6 other machine learning algorithms to further assess the method's capability, and the method outperformed all of them. Overall, this research provides an efficient method to classify the neural spikes of Aedes aegypti mosquitoes and can assist in unraveling the complex interactions between pathogens and mosquitoes.
Submitted 13 December, 2023;
originally announced December 2023.
-
Artificial Intelligence in Assessing Cardiovascular Diseases and Risk Factors via Retinal Fundus Images: A Review of the Last Decade
Authors:
Mirsaeed Abdollahi,
Ali Jafarizadeh,
Amirhosein Ghafouri Asbagh,
Navid Sobhi,
Keysan Pourmoghtader,
Siamak Pedrammehr,
Houshyar Asadi,
Roohallah Alizadehsani,
Ru-San Tan,
U. Rajendra Acharya
Abstract:
Background: Cardiovascular diseases (CVDs) are the leading cause of death globally. The use of artificial intelligence (AI) methods, in particular deep learning (DL), has been on the rise lately for the analysis of different CVD-related topics. The use of fundus images and optical coherence tomography angiography (OCTA) in the diagnosis of retinal diseases has also been extensively studied. To better understand heart function and anticipate changes based on microvascular characteristics and function, researchers are currently exploring the integration of AI with non-invasive retinal scanning. There is great potential to reduce the number of cardiovascular events and the financial strain on healthcare systems by utilizing AI-assisted early detection and prediction of cardiovascular diseases on a large scale. Method: A comprehensive search was conducted across various databases, including PubMed, MEDLINE, Google Scholar, Scopus, Web of Science, IEEE Xplore, and ACM Digital Library, using specific keywords related to cardiovascular diseases and artificial intelligence. Results: The study included 87 English-language publications selected for relevance, and additional references were considered. This paper provides an overview of the recent developments and difficulties in using artificial intelligence and retinal imaging to diagnose cardiovascular diseases. It provides insights for further exploration in this field. Conclusion: Researchers are trying to develop precise disease prognosis patterns in response to the aging population and the growing global burden of CVD. AI and deep learning are revolutionizing healthcare by potentially diagnosing multiple CVDs from a single retinal image. However, swifter adoption of these technologies in healthcare systems is required.
Submitted 28 April, 2024; v1 submitted 11 November, 2023;
originally announced November 2023.
-
Empowering Precision Medicine: AI-Driven Schizophrenia Diagnosis via EEG Signals: A Comprehensive Review from 2002-2023
Authors:
Mahboobeh Jafari,
Delaram Sadeghi,
Afshin Shoeibi,
Hamid Alinejad-Rokny,
Amin Beheshti,
David López García,
Zhaolin Chen,
U. Rajendra Acharya,
Juan M. Gorriz
Abstract:
Schizophrenia (SZ) is a prevalent mental disorder characterized by cognitive, emotional, and behavioral changes. Symptoms of SZ include hallucinations, illusions, delusions, lack of motivation, and difficulties in concentration. Diagnosing SZ involves employing various tools, including clinical interviews, physical examinations, psychological evaluations, the Diagnostic and Statistical Manual of Mental Disorders (DSM), and neuroimaging techniques. Electroencephalography (EEG) recording is a significant functional neuroimaging modality that provides valuable insights into brain function during SZ. However, EEG signal analysis poses challenges for neurologists and scientists due to the presence of artifacts, long-term recordings, and the utilization of multiple channels. To address these challenges, researchers have introduced artificial intelligence (AI) techniques, encompassing conventional machine learning (ML) and deep learning (DL) methods, to aid in SZ diagnosis. This study reviews papers focused on SZ diagnosis utilizing EEG signals and AI methods. The introduction section provides a comprehensive explanation of SZ diagnosis methods and intervention techniques. Subsequently, review papers in this field are discussed, followed by an introduction to the AI methods employed for SZ diagnosis and a summary of relevant papers presented in tabular form. Additionally, this study reports on the most significant challenges encountered in SZ diagnosis, as identified through a review of papers in this field. Future directions to overcome these challenges are also addressed. The discussion section examines the specific details of each paper, culminating in the presentation of conclusions and findings.
Submitted 14 September, 2023;
originally announced September 2023.
-
Automatic autism spectrum disorder detection using artificial intelligence methods with MRI neuroimaging: A review
Authors:
Parisa Moridian,
Navid Ghassemi,
Mahboobeh Jafari,
Salam Salloum-Asfar,
Delaram Sadeghi,
Marjane Khodatars,
Afshin Shoeibi,
Abbas Khosravi,
Sai Ho Ling,
Abdulhamit Subasi,
Roohallah Alizadehsani,
Juan M. Gorriz,
Sara A Abdulla,
U. Rajendra Acharya
Abstract:
Autism spectrum disorder (ASD) is a brain condition characterized by diverse signs and symptoms that appear in early childhood. ASD is also associated with communication deficits and repetitive behavior in affected individuals. Various ASD detection methods have been developed, including neuroimaging modalities and psychological tests. Among these methods, magnetic resonance imaging (MRI) modalities are of paramount importance to physicians. Clinicians rely on MRI modalities to diagnose ASD accurately. The MRI modalities are non-invasive methods that include functional (fMRI) and structural (sMRI) neuroimaging methods. However, diagnosing ASD with fMRI and sMRI is often laborious and time-consuming for specialists; therefore, several computer-aided diagnosis systems (CADS) based on artificial intelligence (AI) have been developed to assist specialist physicians. Conventional machine learning (ML) and deep learning (DL) are the most popular schemes of AI used for diagnosing ASD. This study aims to review the automated detection of ASD using AI. We review several CADS that have been developed using ML techniques for the automated diagnosis of ASD using MRI modalities. There has been very limited work on the use of DL techniques to develop automated diagnostic models for ASD. A summary of the studies developed using DL is provided in the Supplementary Appendix. Then, the challenges encountered during the automated diagnosis of ASD using MRI and AI techniques are described in detail. Additionally, a graphical comparison of studies using ML and DL to diagnose ASD automatically is discussed. We suggest future approaches to detecting ASD using AI techniques and MRI neuroimaging.
Submitted 6 October, 2022; v1 submitted 20 June, 2022;
originally announced June 2022.
-
ECG Language Processing (ELP): a New Technique to Analyze ECG Signals
Authors:
Sajad Mousavi,
Fatemeh Afghah,
Fatemeh Khadem,
U. Rajendra Acharya
Abstract:
A language is constructed of a finite or infinite set of sentences composed of words. Similar to natural languages, the electrocardiogram (ECG) signal, the most common noninvasive tool to study the functionality of the heart and diagnose several abnormal arrhythmias, is made up of sequences of three or four distinct waves including the P-wave, QRS complex, T-wave and U-wave. An ECG signal may contain several different varieties of each wave (e.g., the QRS complex can have various appearances). For this reason, the ECG signal is a sequence of heartbeats (similar to sentences in natural languages) and each heartbeat is composed of a set of waves (similar to words in a sentence) of different morphologies. Analogous to natural language processing (NLP), which is used to help computers understand and interpret human natural language, it is possible to develop methods inspired by NLP to help computers gain a deeper understanding of electrocardiogram signals. In this work, our goal is to propose a novel ECG analysis technique, ECG language processing (ELP), focusing on empowering computers to understand ECG signals the way physicians do. We evaluated the proposed method on two tasks: the classification of heartbeats and the detection of atrial fibrillation in ECG signals. Experimental results on three databases (i.e., PhysioNet's MIT-BIH, MIT-BIH AFIB and PhysioNet Challenge 2017 AFIB Dataset databases) reveal that the proposed method is a general idea that can be applied to a variety of biomedical applications and is able to achieve remarkable performance.
Submitted 12 June, 2020;
originally announced June 2020.
-
HAN-ECG: An Interpretable Atrial Fibrillation Detection Model Using Hierarchical Attention Networks
Authors:
Sajad Mousavi,
Fatemeh Afghah,
U. Rajendra Acharya
Abstract:
Atrial fibrillation (AF) is one of the most prevalent cardiac arrhythmias; it affects the lives of more than 3 million people in the U.S. and over 33 million people around the world and is associated with a five-fold increased risk of stroke and mortality. As with other problems in the healthcare domain, artificial intelligence (AI)-based algorithms have been used to reliably detect AF from patients' physiological signals. Cardiologist-level performance in detecting this arrhythmia is often achieved by deep learning-based methods; however, they suffer from a lack of interpretability. In other words, these approaches are unable to explain the reasons behind their decisions. The lack of interpretability is a common obstacle to the wide application of machine learning-based approaches in healthcare, and it limits the trust of clinicians in such methods. To address this challenge, we propose HAN-ECG, an interpretable bidirectional-recurrent-neural-network-based approach for the AF detection task. HAN-ECG employs three attention mechanism levels to provide a multi-resolution analysis of the patterns in the ECG leading to AF. The first level, the wave level, computes the wave weights; the second level, the heartbeat level, calculates the heartbeat weights; and the third level, the window (i.e., multiple heartbeats) level, produces the window weights in triggering a class of interest. The patterns detected by this hierarchical attention model facilitate the interpretation of the neural network decision process by identifying the patterns in the signal that contributed the most to the final prediction. Experimental results on two AF databases demonstrate that our proposed model performs significantly better than the existing algorithms. Visualization of these attention layers illustrates that our model bases its decisions on the important waves and heartbeats that are clinically meaningful in the detection task.
Submitted 12 February, 2020;
originally announced February 2020.
-
SleepEEGNet: Automated Sleep Stage Scoring with Sequence to Sequence Deep Learning Approach
Authors:
Sajad Mousavi,
Fatemeh Afghah,
U. Rajendra Acharya
Abstract:
Electroencephalogram (EEG) is a common base signal used to monitor brain activity and diagnose sleep disorders. Manual sleep stage scoring is a time-consuming task for sleep experts and is limited by inter-rater reliability. In this paper, we propose an automatic sleep stage annotation method called SleepEEGNet using a single-channel EEG signal. SleepEEGNet is composed of deep convolutional neural networks (CNNs) to extract time-invariant features and frequency information, and a sequence to sequence model to capture the complex and long short-term context dependencies between sleep epochs and scores. In addition, to reduce the effect of the class imbalance problem present in the available sleep datasets, we applied novel loss functions to have an equal misclassified error for each sleep stage while training the network. We evaluated the proposed method on different single-EEG channels (i.e., Fpz-Cz and Pz-Oz EEG channels) from the PhysioNet Sleep-EDF datasets published in 2013 and 2018. The evaluation results demonstrate that the proposed method achieved the best annotation performance compared to current literature, with an overall accuracy of 84.26%, a macro F1-score of 79.66% and Cohen's Kappa coefficient = 0.79. Our developed model is ready to be tested with more sleep EEG signals and to help sleep specialists arrive at an accurate diagnosis. The source code is available at https://github.com/SajadMo/SleepEEGNet.
Submitted 5 March, 2019;
originally announced March 2019.
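The three figures reported in the SleepEEGNet abstract above (overall accuracy, macro F1-score, and Cohen's Kappa) can all be derived from a single confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `classification_metrics` and the toy matrix in the usage note are illustrative assumptions.

```python
def classification_metrics(confusion):
    """Return (accuracy, macro_f1, kappa) for a square confusion matrix.

    confusion[i][j] counts samples whose true class is i and whose
    predicted class is j. Hypothetical helper, for illustration only.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Row sums are true-class counts, column sums are predicted-class counts.
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(n_classes))
                for j in range(n_classes)]
    correct = sum(confusion[i][i] for i in range(n_classes))

    accuracy = correct / total

    # Per-class F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (row_sum + col_sum).
    f1_scores = []
    for k in range(n_classes):
        denom = row_sums[k] + col_sums[k]
        f1_scores.append(2 * confusion[k][k] / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / n_classes  # unweighted mean over classes

    # Cohen's kappa compares observed agreement with chance agreement.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0
    return accuracy, macro_f1, kappa
```

For a toy two-class matrix `[[5, 1], [2, 4]]`, the function returns an accuracy of 0.75 and a kappa of 0.5. Macro averaging weights every class equally, which is why it is commonly reported alongside accuracy on imbalanced data.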
-
ECGNET: Learning where to attend for detection of atrial fibrillation with deep visual attention
Authors:
Sajad Mousavi,
Fatemeh Afghah,
Abolfazl Razi,
U. Rajendra Acharya
Abstract:
The complexity of the patterns associated with Atrial Fibrillation (AF) and the high level of noise affecting these patterns have significantly limited the ability of current signal processing and shallow machine learning approaches to produce accurate AF detection results. Deep neural networks have been shown to be very powerful at learning non-linear patterns in data. While a deep learning approach attempts to learn complex patterns related to the presence of AF in the ECG, it can benefit from knowing which parts of the signal are more important to focus on during learning. In this paper, we introduce a two-channel deep neural network to more accurately detect AF present in the ECG signal. The first channel takes in a preprocessed ECG signal and automatically learns where to attend for detection of AF. The second channel simultaneously takes in the preprocessed ECG signal to consider all features of the entire signal. The model shows via visualization which parts of the given ECG signal are important to attend to when detecting atrial fibrillation. In addition, this combination significantly improves the performance of atrial fibrillation detection (achieving a sensitivity of 99.53%, a specificity of 99.26%, and an accuracy of 99.40% on the MIT-BIH atrial fibrillation database with 5-s ECG segments).
Submitted 14 February, 2019; v1 submitted 8 December, 2018;
originally announced December 2018.
-
Inter- and intra- patient ECG heartbeat classification for arrhythmia detection: a sequence to sequence deep learning approach
Authors:
Sajad Mousavi,
Fatemeh Afghah,
U. Rajendra Acharya
Abstract:
The electrocardiogram (ECG) signal is a common and powerful tool for studying heart function and diagnosing several abnormal arrhythmias. While there have been remarkable improvements in cardiac arrhythmia classification methods, they still cannot offer acceptable performance in detecting different heart conditions, especially when dealing with imbalanced datasets. In this paper, we propose a solution to address this limitation of current classification approaches by developing an automatic heartbeat classification method using deep convolutional neural networks and sequence to sequence models. We evaluated the proposed method on the MIT-BIH arrhythmia database, considering the intra-patient and inter-patient paradigms, and the AAMI EC57 standard. The evaluation results for both paradigms show that our method achieves the best performance in the literature (a positive predictive value of 96.46% and sensitivity of 100% for the category S, and a positive predictive value of 98.68% and sensitivity of 97.40% for the category F for the intra-patient scheme; a positive predictive value of 92.57% and sensitivity of 88.94% for the category S, and a positive predictive value of 99.50% and sensitivity of 99.94% for the category V for the inter-patient scheme). The source code is available at https://github.com/SajadMo/ECG-Heartbeat-Classification-seq2seq-model.
Submitted 12 March, 2019; v1 submitted 8 December, 2018;
originally announced December 2018.
-
A frame-based representation of genomic sequences for removing errors and rare variant detection in NGS data
Authors:
Raunaq Malhotra,
Manjari Mukhopadhyay,
Mary Poss,
Raj Acharya
Abstract:
We propose a frame-based representation of k-mers for detecting sequencing errors and rare variants in next generation sequencing data obtained from populations of closely related genomes. Frames are sets of non-orthogonal basis functions, traditionally used in signal processing for noise removal. We define a frame for genomes and sequenced reads to consist of discrete spatial signals of every k-mer of a given size. We show that each k-mer in the sequenced data can be projected onto multiple frames and these projections are maximized for spatial signals corresponding to the k-mer's substrings. Our proposed classifier, MultiRes, is trained on the projections of k-mers as features used for marking k-mers as erroneous or true variations in the genome. We evaluate MultiRes on simulated and real viral population datasets and compare it to other error correction methods known in the literature. MultiRes has 4 to 500 times fewer false positive k-mer predictions compared to other methods, which is essential for accurate estimation of viral population diversity and their de-novo assembly. It has high recall of the true k-mers, comparable to other error correction methods. MultiRes also has greater than 95% recall for detecting single nucleotide polymorphisms (SNPs) and fewer false positive SNPs, while detecting a higher number of rare variants compared to other variant calling methods for viral populations. The software is freely available from the GitHub link (https://github.com/raunaq-m/MultiRes).
Submitted 16 April, 2016;
originally announced April 2016.
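The k-mers that the MultiRes abstract above builds on are simply all length-k substrings of a read. The sketch below shows only that generic extraction and counting step; it is not the frame-based projection or the MultiRes classifier, and the function names `kmers` and `count_kmers` are illustrative assumptions.

```python
from collections import Counter


def kmers(sequence, k):
    """Return every overlapping substring of length k, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across a collection of reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts
```

For example, `kmers("ACGTA", 3)` yields `["ACG", "CGT", "GTA"]`. A k-mer seen only rarely across many reads may be a sequencing error or a true rare variant, which is the ambiguity an error-correction classifier has to resolve.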
-
Maximum Likelihood de novo reconstruction of viral populations using paired end sequencing data
Authors:
Raunaq Malhotra,
Manjari Mukhopadhyay,
Steven Wu,
Allen Rodrigo,
Mary Poss,
Raj Acharya
Abstract:
We present MLEHaplo, a maximum likelihood de novo assembly algorithm for reconstructing viral haplotypes in a virus population from paired-end next generation sequencing (NGS) data. Using the pairing information of reads in our proposed Viral Path Reconstruction Algorithm (ViPRA), we generate a small subset of paths from a De Bruijn graph of reads that serve as candidate paths for true viral haplotypes. Our proposed method MLEHaplo then generates a maximum likelihood estimate of the viral population using the paths reconstructed by ViPRA. We evaluate and compare MLEHaplo on simulated datasets of 1200 base pairs at different sequence coverage, on HCV strains with sequencing errors, and on a lab mixture of five HIV-1 strains. MLEHaplo reconstructs full length viral haplotypes having a 100% sequence identity to the true viral haplotypes in most of the small genome simulated viral populations at 250x sequencing coverage. While reference based methods either under-estimate or over-estimate the viral haplotypes, MLEHaplo limits the over-estimation to 3 times the size of true viral haplotypes, reconstructs the full phylogeny in the HCV to greater than 99% sequencing identity and captures more sequencing variation for the HIV-1 strains dataset compared to their known consensus sequences.
Submitted 16 April, 2016; v1 submitted 14 February, 2015;
originally announced February 2015.
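The MLEHaplo abstract above starts from a De Bruijn graph of reads, in which each k-mer contributes an edge from its (k-1)-length prefix to its (k-1)-length suffix. The sketch below builds only that generic graph; it does not implement ViPRA or the maximum likelihood step, and the function name `de_bruijn_graph` is an illustrative assumption.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer node to the sorted list of its successor nodes."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix, both of length k-1.
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in graph.items()}
```

For example, `de_bruijn_graph(["ACGT", "CGTA"], 3)` gives `{"AC": ["CG"], "CG": ["GT"], "GT": ["TA"]}`. Paths through such a graph spell out candidate sequences, and nodes with several successors mark positions where the reads diverge.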
-
Clustering pipeline for determining consensus sequences in targeted next-generation sequencing
Authors:
Raunaq Malhotra,
Daniel Elleder,
Le Bao,
David R Hunter,
Raj Acharya,
Mary Poss
Abstract:
Analyses of targeted genomic sequencing data from next-generation-sequencing (NGS) technologies typically involve mapping reads to a reference sequence or clustering reads. For a number of species a reference genome is not available, so the analysis of targeted sequencing data (for example, polymorphic structural variation caused by mobile elements) is difficult; clustering methods are preferred for such data analysis. Clustering of reads requires a clustering threshold parameter, which is used to compare and group reads. However, determining the optimal clustering threshold for a read dataset is challenging because of differences in sequence composition, the number of sequences present, and the amount of sequencing errors in the dataset. High values of the clustering threshold parameter can falsely inflate the number of recovered genomic regions, while low values of the clustering threshold can merge reads from distinct regions into a single cluster. Thus, an algorithm that can empirically determine the clustering threshold is needed. We propose a pipeline for clustering genomic sequences wherein the clustering threshold is empirically determined from the NGS data. The optimal threshold is decided based on two internal clustering measures which assess clusters for small intra-cluster diameters and large inter-cluster distances. We evaluate the pipeline on two simulated datasets derived from human genome sequence, simulating different genomic regions and sequencing depths. The total number of clusters obtained from our pipeline is closer to the actual number of reference sequences when compared to a single round of clustering. Also, the number of clusters whose consensus sequence matches a corresponding reference sequence is higher in our pipeline. We observe that the presence of repeat regions affects clustering accuracy.
Submitted 13 February, 2016; v1 submitted 6 October, 2014;
originally announced October 2014.