-
Multimodal joint prediction of traffic spatial-temporal data with graph sparse attention mechanism and bidirectional temporal convolutional network
Authors:
Dongran Zhang,
Jiangnan Yan,
Kemal Polat,
Adi Alhudhaif,
Jun Li
Abstract:
Traffic flow prediction plays a crucial role in the management and operation of urban transportation systems. While extensive research has been conducted on predictions for individual transportation modes, there is relatively limited research on joint prediction across different transportation modes. Furthermore, existing multimodal traffic joint modeling methods often lack flexibility in spatial-…
▽ More
Traffic flow prediction plays a crucial role in the management and operation of urban transportation systems. While extensive research has been conducted on predictions for individual transportation modes, there is relatively limited research on joint prediction across different transportation modes. Furthermore, existing multimodal traffic joint modeling methods often lack flexibility in spatial-temporal feature extraction. To address these issues, we propose a method called Graph Sparse Attention Mechanism with Bidirectional Temporal Convolutional Network (GSABT) for multimodal traffic spatial-temporal joint prediction. First, we use a multimodal graph multiplied by self-attention weights to capture spatial local features, and then employ the Top-U sparse attention mechanism to obtain spatial global features. Second, we utilize a bidirectional temporal convolutional network to enhance the temporal feature correlation between the output and input data, and extract inter-modal and intra-modal temporal features through the share-unique module. Finally, we have designed a multimodal joint prediction framework that can be flexibly extended to both spatial and temporal dimensions. Extensive experiments conducted on three real datasets indicate that the proposed model consistently achieves state-of-the-art predictive performance.
△ Less
Submitted 24 December, 2024;
originally announced December 2024.
-
Emotional Speaker Identification using a Novel Capsule Nets Model
Authors:
Ali Bou Nassif,
Ismail Shahin,
Ashraf Elnagar,
Divya Velayudhan,
Adi Alhudhaif,
Kemal Polat
Abstract:
Speaker recognition systems are widely used in various applications to identify a person by their voice; however, the high degree of variability in speech signals makes this a challenging task. Dealing with emotional variations is very difficult because emotions alter the voice characteristics of a person; thus, the acoustic features differ from those used to train models in a neutral environment.…
▽ More
Speaker recognition systems are widely used in various applications to identify a person by their voice; however, the high degree of variability in speech signals makes this a challenging task. Dealing with emotional variations is very difficult because emotions alter the voice characteristics of a person; thus, the acoustic features differ from those used to train models in a neutral environment. Therefore, speaker recognition models trained on neutral speech fail to correctly identify speakers under emotional stress. Although considerable advancements in speaker identification have been made using convolutional neural networks (CNN), CNNs cannot exploit the spatial association between low-level features. Inspired by the recent introduction of capsule networks (CapsNets), which are based on deep learning to overcome the inadequacy of CNNs in preserving the pose relationship between low-level features with their pooling technique, this study investigates the performance of using CapsNets in identifying speakers from emotional speech recordings. A CapsNet-based speaker identification model is proposed and evaluated using three distinct speech databases, i.e., the Emirati Speech Database, SUSAS Dataset, and RAVDESS (open-access). The proposed model is also compared to baseline systems. Experimental results demonstrate that the novel proposed CapsNet model trains faster and provides better results over current state-of-the-art schemes. The effect of the routing algorithm on speaker identification performance was also studied by varying the number of iterations, both with and without a decoder network.
△ Less
Submitted 9 January, 2022;
originally announced January 2022.
-
Novel Hybrid DNN Approaches for Speaker Verification in Emotional and Stressful Talking Environments
Authors:
Ismail Shahin,
Ali Bou Nassif,
Nawel Nemmour,
Ashraf Elnagar,
Adi Alhudhaif,
Kemal Polat
Abstract:
In this work, we conducted an empirical comparative study of the performance of text-independent speaker verification in emotional and stressful environments. This work combined deep models with shallow architecture, which resulted in novel hybrid classifiers. Four distinct hybrid models were utilized: deep neural network-hidden Markov model (DNN-HMM), deep neural network-Gaussian mixture model (D…
▽ More
In this work, we conducted an empirical comparative study of the performance of text-independent speaker verification in emotional and stressful environments. This work combined deep models with shallow architecture, which resulted in novel hybrid classifiers. Four distinct hybrid models were utilized: deep neural network-hidden Markov model (DNN-HMM), deep neural network-Gaussian mixture model (DNN-GMM), Gaussian mixture model-deep neural network (GMM-DNN), and hidden Markov model-deep neural network (HMM-DNN). All models were based on novel implemented architecture. The comparative study used three distinct speech datasets: a private Arabic dataset and two public English databases, namely, Speech Under Simulated and Actual Stress (SUSAS) and Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The test results of the aforementioned hybrid models demonstrated that the proposed HMM-DNN leveraged the verification performance in emotional and stressful environments. Results also showed that HMM-DNN outperformed all other hybrid models in terms of equal error rate (EER) and area under the curve (AUC) evaluation metrics. The average resulting verification system based on the three datasets yielded EERs of 7.19%, 16.85%, 11.51%, and 11.90% based on HMM-DNN, DNN-HMM, DNN-GMM, and GMM-DNN, respectively. Furthermore, we found that the DNN-GMM model demonstrated the least computational complexity compared to all other hybrid models in both talking environments. Conversely, the HMM-DNN model required the greatest amount of training time. Findings also demonstrated that EER and AUC values depended on the database when comparing average emotional and stressful performances.
△ Less
Submitted 26 December, 2021;
originally announced December 2021.
-
Novel Dual-Channel Long Short-Term Memory Compressed Capsule Networks for Emotion Recognition
Authors:
Ismail Shahin,
Noor Hindawi,
Ali Bou Nassif,
Adi Alhudhaif,
Kemal Polat
Abstract:
Recent analysis on speech emotion recognition has made considerable advances with the use of MFCCs spectrogram features and the implementation of neural network approaches such as convolutional neural networks (CNNs). Capsule networks (CapsNet) have gained gratitude as alternatives to CNNs with their larger capacities for hierarchical representation. To address these issues, this research introduc…
▽ More
Recent analysis on speech emotion recognition has made considerable advances with the use of MFCCs spectrogram features and the implementation of neural network approaches such as convolutional neural networks (CNNs). Capsule networks (CapsNet) have gained gratitude as alternatives to CNNs with their larger capacities for hierarchical representation. To address these issues, this research introduces a text-independent and speaker-independent SER novel architecture, where a dual-channel long short-term memory compressed-CapsNet (DC-LSTM COMP-CapsNet) algorithm is proposed based on the structural features of CapsNet. Our proposed novel classifier can ensure the energy efficiency of the model and adequate compression method in speech emotion recognition, which is not delivered through the original structure of a CapsNet. Moreover, the grid search approach is used to attain optimal solutions. Results witnessed an improved performance and reduction in the training and testing running time. The speech datasets used to evaluate our algorithm are: Arabic Emirati-accented corpus, English speech under simulated and actual stress corpus, English Ryerson audio-visual database of emotional speech and song corpus, and crowd-sourced emotional multimodal actors dataset. This work reveals that the optimum feature extraction method compared to other known methods is MFCCs delta-delta. Using the four datasets and the MFCCs delta-delta, DC-LSTM COMP-CapsNet surpasses all the state-of-the-art systems, classical classifiers, CNN, and the original CapsNet. Using the Arabic Emirati-accented corpus, our results demonstrate that the proposed work yields average emotion recognition accuracy of 89.3% compared to 84.7%, 82.2%, 69.8%, 69.2%, 53.8%, 42.6%, and 31.9% based on CapsNet, CNN, support vector machine, multi-layer perceptron, k-nearest neighbor, radial basis function, and naive Bayes, respectively.
△ Less
Submitted 26 December, 2021;
originally announced December 2021.
-
Estimation of atrial fibrillation from lead-I ECGs: Comparison with cardiologists and machine learning model (CurAlive), a clinical validation study
Authors:
N. Korucuk,
C. Polat,
E. S. Gunduz,
O. Karaman,
V. Tosun,
M. Onac,
N. Yildirim,
Y. Cete,
K. Polat
Abstract:
Electrocardiogram recognition of cardiac arrhythmias is critical for cardiac abnormality diagnosis. Because of their strong prediction characteristics, artificial neural networks are the preferred method in medical diagnosis systems. This study presents a method to detect atrial fibrillation with lead-I ECGs using artificial intelligence. The aim of the study is to compare the accuracy of the diag…
▽ More
Electrocardiogram recognition of cardiac arrhythmias is critical for cardiac abnormality diagnosis. Because of their strong prediction characteristics, artificial neural networks are the preferred method in medical diagnosis systems. This study presents a method to detect atrial fibrillation with lead-I ECGs using artificial intelligence. The aim of the study is to compare the accuracy of the diagnoses estimated by cardiologists and artificial intelligence over lead-I ECGs using 12-lead ECGs as references. To evaluate the performance of the proposed model, dataset were collected from China Physiological Signal Challenge 2018. In the study, diagnoses were examined in three groups as normal sinus rhythm, atrial fibrillation and OTHER. All rhythm and beat types except NSR and AFIB were labeled as OTHER super-class. OTHER contains First-degree atrioventricular blocks, Conduction disturbances, Left bundle branch block, Right bundle branch block, Premature atrial contraction, Premature ventricular contraction, ST-segment depression and ST-segment elevated type ECGs. CurAlive A.I. model which is using DenseNet as a CNN architecture and continuous wavelet transform as feature extraction method, showed a great performance on classifying ECGs from only lead-I compared to cardiologists. The AI model reached the weighted average precision, recall, F1-score and total accuracy 94.1%, 93.6%, 93.7% and 93.6% respectively, and the average of each of the three cardiologists has reached weighted average precision, recall, F1-score and total accuracy 82.2%, 54.6%, 57.5% and 54.6% respectively. This study showed that the proposed CNN model CurAlive, can be used to accurately diagnose AFIB, NSR, and OTHER rhythm using lead-I ECGs to accelerate the early detection of AFIB as a cardiologist assistant. It is also able to identify patients into different risk groups as part of remote patient monitoring systems.
△ Less
Submitted 15 April, 2021;
originally announced April 2021.
-
Classification of COVID-19 in Chest CT Images using Convolutional Support Vector Machines
Authors:
Umut Özkaya,
Şaban Öztürk,
Serkan Budak,
Farid Melgani,
Kemal Polat
Abstract:
Purpose: Coronavirus 2019 (COVID-19), which emerged in Wuhan, China and affected the whole world, has cost the lives of thousands of people. Manual diagnosis is inefficient due to the rapid spread of this virus. For this reason, automatic COVID-19 detection studies are carried out with the support of artificial intelligence algorithms. Methods: In this study, a deep learning model that detects COV…
▽ More
Purpose: Coronavirus 2019 (COVID-19), which emerged in Wuhan, China and affected the whole world, has cost the lives of thousands of people. Manual diagnosis is inefficient due to the rapid spread of this virus. For this reason, automatic COVID-19 detection studies are carried out with the support of artificial intelligence algorithms. Methods: In this study, a deep learning model that detects COVID-19 cases with high performance is presented. The proposed method is defined as Convolutional Support Vector Machine (CSVM) and can automatically classify Computed Tomography (CT) images. Unlike the pre-trained Convolutional Neural Networks (CNN) trained with the transfer learning method, the CSVM model is trained as a scratch. To evaluate the performance of the CSVM method, the dataset is divided into two parts as training (%75) and testing (%25). The CSVM model consists of blocks containing three different numbers of SVM kernels. Results: When the performance of pre-trained CNN networks and CSVM models is assessed, CSVM (7x7, 3x3, 1x1) model shows the highest performance with 94.03% ACC, 96.09% SEN, 92.01% SPE, 92.19% PRE, 94.10% F1-Score, 88.15% MCC and 88.07% Kappa metric values. Conclusion: The proposed method is more effective than other methods. It has proven in experiments performed to be an inspiration for combating COVID and for future studies.
△ Less
Submitted 11 November, 2020;
originally announced November 2020.