Search | arXiv e-print repository

C2PO: Coherent Co-packaged Optics using offset-QAM-16 for Beyond PAM-4 Optical I/O

Authors: Dan Sturm, Marzieyh Rezaei, Alana Dee, Sajjad Moazeni

Abstract: Co-packaged optics (CPO) has emerged as a promising solution for achieving the ultra-high bandwidths, shoreline densities, and energy efficiencies required by future GPUs and network switches for AI. Microring modulators (MRMs) are well suited for transmitters due to their compact size, high energy efficiency, and natural compatibility with dense wavelength-division multiplexing (DWDM). However, extending beyond the recently demonstrated 200 Gb/s will require more advanced modulation formats, such as higher-order coherent modulation (e.g., QAM-16). In this work, we show how microring resonators (MRMs) can be efficiently used to implement phase-constant amplitude modulators and form the building blocks of a transmitter for offset QAM-16, which has been shown to simplify carrier-phase recovery relative to conventional QAM. We simulate and evaluate the performance of our proposed MRM-based coherent CPO (C2PO) transmitters using a foundry-provided commercial silicon photonics process, demonstrating an input-normalized electric field amplitude contrast of 0.64 per dimension. Through full link-level bit error rate modeling, we show that our design achieves 400 Gb/s using offset QAM-16 at a total optical laser power of 9.65 dBm, comparable to that required by conventional QAM-16 MZI-based links, despite using 10-100x less area. We further conduct a thermal simulation to assess the transmitter's thermal stability at the MRM input optical power required to meet a target BER at the desired data rates. Finally, as a proof of concept, we demonstrate 25 Gb/s MRM-based offset QAM-4 modulation with a chip fabricated in the GlobalFoundries 45 nm monolithic silicon photonics process.

Submitted 13 June, 2025; originally announced June 2025.

arXiv:2506.12006 [pdf, ps, other]

crossMoDA Challenge: Evolution of Cross-Modality Domain Adaptation Techniques for Vestibular Schwannoma and Cochlea Segmentation from 2021 to 2023

Authors: Navodini Wijethilake, Reuben Dorent, Marina Ivory, Aaron Kujawa, Stefan Cornelissen, Patrick Langenhuizen, Mohamed Okasha, Anna Oviedova, Hexin Dong, Bogyeong Kang, Guillaume Sallé, Luyi Han, Ziyuan Zhao, Han Liu, Tao Yang, Shahad Hardan, Hussain Alasmawi, Santosh Sanjeev, Yuzhou Zhuang, Satoshi Kondo, Maria Baldeon Calisto, Shaikh Muhammad Uzair Noman, Cancan Chen, Ipek Oguz, Rongguo Zhang , et al. (14 additional authors not shown)

Abstract: The cross-Modality Domain Adaptation (crossMoDA) challenge series, initiated in 2021 in conjunction with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), focuses on unsupervised cross-modality segmentation, learning from contrast-enhanced T1 (ceT1) and transferring to T2 MRI. The task is an extreme example of domain shift chosen to serve as a meaningful and illustrative benchmark. From a clinical application perspective, it aims to automate Vestibular Schwannoma (VS) and cochlea segmentation on T2 scans for more cost-effective VS management. Over time, the challenge objectives have evolved to enhance its clinical relevance. The challenge evolved from using single-institutional data and basic segmentation in 2021 to incorporating multi-institutional data and Koos grading in 2022, and by 2023, it included heterogeneous routine data and sub-segmentation of intra- and extra-meatal tumour components. In this work, we report the findings of the 2022 and 2023 editions and perform a retrospective analysis of the challenge progression over the years. The observations from the successive challenge contributions indicate that the number of outliers decreases with an expanding dataset. This is notable since the diversity of scanning protocols of the datasets concurrently increased. The winning approach of the 2023 edition reduced the number of outliers on the 2021 and 2022 testing data, demonstrating how increased data heterogeneity can enhance segmentation performance even on homogeneous data. However, the cochlea Dice score declined in 2023, likely due to the added complexity from tumour sub-annotations affecting overall segmentation performance. While progress is still needed for clinically acceptable VS segmentation, the plateauing performance suggests that a more challenging cross-modal task may better serve future benchmarking.

Submitted 24 June, 2025; v1 submitted 13 June, 2025; originally announced June 2025.

arXiv:2505.18534 [pdf, ps, other]

A DSP-Free Carrier Phase Recovery System using 16-Offset-QAM Laser Forwarded Links for 400Gb/s and Beyond

Authors: Marziyeh Rezaei, Dan Sturm, Pengyu Zeng, Sajjad Moazeni

Abstract: Optical interconnects are becoming a major bottleneck in scaling up future GPU racks and network switches within data centers. Although 200 Gb/s optical transceivers using PAM-4 modulation have been demonstrated, achieving higher data rates and energy efficiencies requires high-order coherent modulations like 16-QAM. Current coherent links rely on energy-intensive digital signal processing (DSP) for channel impairment compensation and carrier phase recovery (CPR), which consumes approximately 50 pJ/b, 10x higher than future intra-data center requirements. For shorter links, simpler or DSP-free CPR methods can significantly reduce power and complexity. While Costas loops enable CPR for QPSK, they face challenges in scaling to higher-order modulations (e.g., 16/64-QAM) due to varying symbol amplitudes. In this work, we propose an optical coherent link architecture using laser forwarding and a novel DSP-free CPR system using offset-QAM modulation. The proposed analog CPR feedback loop is highly scalable, capable of supporting arbitrary offset-QAM modulations without requiring architectural modifications. This scalability is achieved through its phase error detection mechanism, which operates independently of the data rate and modulation type. We validated this method using GlobalFoundries' monolithic 45 nm silicon photonics PDK models, with circuit- and system-level implementation at 100 GBaud in the O-band. We will investigate the feedback loop dynamics, circuit-level implementations, and phase-noise performance of the proposed CPR loop. Our method can be adopted to realize low-power QAM optical interconnects for future coherent-lite pluggable transceivers as well as co-packaged optics (CPO) applications.

Submitted 24 May, 2025; originally announced May 2025.

arXiv:2409.07645 [pdf, other]

Feature Importance in Pedestrian Intention Prediction: A Context-Aware Review

Authors: Mohsen Azarmi, Mahdi Rezaei, He Wang, Ali Arabian

Abstract: Recent advancements in predicting pedestrian crossing intentions for Autonomous Vehicles using Computer Vision and Deep Neural Networks are promising. However, the black-box nature of DNNs poses challenges in understanding how the model works and how input features contribute to final predictions. This lack of interpretability limits the trust in model performance and hinders informed decisions on feature selection, representation, and model optimisation, thereby affecting the efficacy of future research in the field. To address this, we introduce Context-aware Permutation Feature Importance (CAPFI), a novel approach tailored for pedestrian intention prediction. CAPFI enables more interpretable and reliable assessments of feature importance by leveraging subdivided scenario contexts, mitigating the randomness of feature values through targeted shuffling. This aims to reduce variance and prevent biased estimations in importance scores during permutations. We divide the Pedestrian Intention Estimation (PIE) dataset into 16 comparable context sets, measure the baseline performance of five distinct neural network architectures for intention prediction in each context, and assess input feature importance using CAPFI. We observed nuanced differences among models across various contextual characteristics. The research reveals the critical role of pedestrian bounding boxes and ego-vehicle speed in predicting pedestrian intentions, and potential prediction biases due to the speed feature through cross-context permutation evaluation. We propose an alternative feature representation by considering proximity change rate for rendering dynamic pedestrian-vehicle locomotion, thereby enhancing the contributions of input features to intention prediction. These findings underscore the importance of contextual features and their diversity to develop accurate and robust intent-predictive models.

Submitted 11 September, 2024; originally announced September 2024.

arXiv:2405.14044 [pdf, other]

Single Input Multi Output Model of Molecular Communication via Diffusion with Spheroidal Receivers

Authors: Ibrahim Isik, Mitra Rezaei, Adam Noel

Abstract: Spheroids are aggregates of cells that can mimic the cellular organization often found in tissues. They are typically formed through the self-assembly of cells in a culture where there is a promotion of interactions and cell-to-cell communication. Spheroids can be created from various cell types, including cancer cells, stem cells, and primary cells, and they serve as valuable tools in biological research. In this letter, molecule propagation from a point source is simulated in the presence of multiple spheroids to observe the impact of the spheroids on the spatial molecule distribution. The spheroids are modeled as porous media with a corresponding effective diffusion coefficient. System variations are considered with a higher spheroid porosity (i.e., with a higher effective diffusion coefficient) and with molecule uptake by the spheroid cells (approximated as a first-order degradation reaction while molecules diffuse within the spheroid). Results provide initial insights about the molecule propagation dynamics and their potential to model transport and drug delivery within crowded spheroid systems.

Submitted 22 May, 2024; originally announced May 2024.

Comments: submitted to TMBMC journal

arXiv:2402.12810 [pdf, ps, other]

doi 10.1109/TITS.2025.3570794

PIP-Net: Pedestrian Intention Prediction in the Wild

Authors: Mohsen Azarmi, Mahdi Rezaei, He Wang

Abstract: Accurate pedestrian intention prediction (PIP) by Autonomous Vehicles (AVs) is one of the current research challenges in this field. In this article, we introduce PIP-Net, a novel framework designed to predict pedestrian crossing intentions by AVs in real-world urban scenarios. We offer two variants of PIP-Net designed for different camera mounts and setups. Leveraging both kinematic data and spatial features from the driving scene, the proposed model employs a recurrent and temporal attention-based solution, outperforming the state of the art. To enhance the visual representation of road users and their proximity to the ego vehicle, we introduce a categorical depth feature map, combined with a local motion flow feature, providing rich insights into the scene dynamics. Additionally, we explore the impact of expanding the camera's field of view, from one to three cameras surrounding the ego vehicle, leading to an enhancement in the model's contextual perception. Depending on the traffic scenario and road environment, the model excels in predicting pedestrian crossing intentions up to 4 seconds in advance, which is a breakthrough in current research studies in pedestrian intention prediction. Finally, for the first time, we present the Urban-PIP dataset, a customised pedestrian intention prediction dataset, with multi-camera annotations in real-world automated driving scenarios.

Submitted 6 July, 2025; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: Author Accepted Version in IEEE Transactions on Intelligent Transportation Systems

Journal ref: in IEEE Transactions on Intelligent Transportation Systems, vol. 26, no. 7, pp. 9824-9837, July 2025

arXiv:2308.10033 [pdf]

CRC-ICM: Colorectal Cancer Immune Cell Markers Pattern Dataset

Authors: Zahra Mokhtari, Elham Amjadi, Hamidreza Bolhasani, Zahra Faghih, AmirReza Dehghanian, Marzieh Rezaei

Abstract: Colorectal Cancer (CRC) is the second most common cause of cancer death in the world, and can be identified by the location of the primary tumor in the large intestine: right and left colon, and rectum. Based on the location, CRC shows differences in chromosomal and molecular characteristics, microbiome incidence, pathogenesis, and outcome. It has been shown that tumors on the left and right sides also have different immune landscapes, so the prognosis may differ based on the primary tumor location. It is widely accepted that immune components of the tumor microenvironment (TME) play a critical role in tumor development. Among the critical regulatory molecules in the TME are immune checkpoints, which, as the gatekeepers of immune responses, regulate the functions of infiltrated immune cells. Inhibitory immune checkpoints such as PD-1, Tim3, and LAG3, the main mechanism of immune suppression in the TME, are overexpressed and result in further development of the tumor. The images of this dataset have been taken from colon tissues of patients with CRC, stained with specific antibodies for CD3, CD8, CD45RO, PD-1, LAG3 and Tim3. The dataset is named CRC-ICM and contains 1756 images related to 136 patients. The initial version of CRC-ICM is published on the Elsevier Mendeley dataset portal, and the latest version is accessible via: https://databiox.com

Submitted 19 August, 2023; originally announced August 2023.

arXiv:2305.15421 [pdf]

Generative Adversarial Networks for Brain Images Synthesis: A Review

Authors: Firoozeh Shomal Zadeh, Sevda Molani, Maysam Orouskhani, Marziyeh Rezaei, Mehrzad Shafiei, Hossein Abbasi

Abstract: In medical imaging, image synthesis is the estimation process of one image (sequence, modality) from another image (sequence, modality). Since images with different modalities provide diverse biomarkers and capture various features, multi-modality imaging is crucial in medicine. While multi-screening is costly and time-consuming for radiologists to report, image synthesis methods are capable of artificially generating missing modalities. Deep learning models can automatically capture and extract high-dimensional features. In particular, the generative adversarial network (GAN), one of the most popular generative deep learning methods, uses convolutional networks as generators, and a discriminator network classifies the estimated images as true or false. This review covers brain image synthesis via GANs. We summarize the recent developments of GANs for cross-modality brain image synthesis including CT to PET, CT to MRI, MRI to PET, and vice versa.

Submitted 16 May, 2023; originally announced May 2023.

Comments: 9 pages, 3 tables, 4 figures

MSC Class: 68T07 ACM Class: I.2.m

arXiv:2305.10421 [pdf]

Evolving Tsukamoto Neuro Fuzzy Model for Multiclass Covid 19 Classification with Chest X Ray Images

Authors: Marziyeh Rezaei, Sevda Molani, Negar Firoozeh, Hossein Abbasi, Farzan Vahedifard, Maysam Orouskhani

Abstract: Due to rapid population growth and the need to use artificial intelligence to make quick decisions, developing a machine learning-based disease detection model and abnormality identification system has greatly improved the level of medical diagnosis. Since COVID-19 has become one of the most severe diseases in the world, developing an automatic COVID-19 detection framework helps medical doctors in the diagnostic process of disease and provides correct and fast results. In this paper, we propose a machine learning-based framework for the detection of Covid 19. The proposed model employs a Tsukamoto Neuro Fuzzy Inference network to identify and distinguish Covid 19 disease from normal and pneumonia cases. While the traditional training methods tune the parameters of the neuro-fuzzy model by gradient-based algorithms and the recursive least squares method, we use an evolutionary optimization method, the Cat swarm algorithm, to update the parameters. In addition, six texture features extracted from chest X-ray images are given as input to the model. Finally, the proposed model is evaluated on the chest X-ray dataset to detect Covid 19. The simulation results indicate that the proposed model achieves an accuracy of 98.51%, sensitivity of 98.35%, specificity of 98.08%, and F1 score of 98.17%.

Submitted 17 May, 2023; originally announced May 2023.

Comments: 14 pages, 5 figures, 3 tables

MSC Class: 68W50 ACM Class: I.5.0

arXiv:2212.07560 [pdf, other]

Multi-level and multi-modal feature fusion for accurate 3D object detection in Connected and Automated Vehicles

Authors: Yiming Hou, Mahdi Rezaei, Richard Romano

Abstract: Aiming at highly accurate object detection for connected and automated vehicles (CAVs), this paper presents a Deep Neural Network based 3D object detection model that leverages a three-stage feature extractor by developing a novel LIDAR-Camera fusion scheme. The proposed feature extractor extracts high-level features from two input sensory modalities and recovers the important features discarded during the convolutional process. The novel fusion scheme effectively fuses features across sensory modalities and convolutional layers to find the best representative global features. The fused features are shared by a two-stage network: the region proposal network (RPN) and the detection head (DH). The RPN generates high-recall proposals, and the DH produces final detection results. The experimental results show the proposed model outperforms more recent research on the KITTI 2D and 3D detection benchmark, particularly for distant and highly occluded instances.

Submitted 19 December, 2022; v1 submitted 14 December, 2022; originally announced December 2022.

arXiv:2211.09979 [pdf]

Comparison between EM and FCM algorithms in skin tone extraction

Authors: Elham Ravanbakhsh, Mosab Rezaei, Ehsan Namjoo, Padideh Choobdar

Abstract: This study aims to investigate implementing EM and FCM algorithms for skin color extraction. The capabilities of three well-known color spaces, namely, RGB, HSV, and YCbCr for skin-tone extraction are assessed by using statistical modeling of skin tones using EM and FCM algorithms. The results show that utilizing a Gaussian mixture model for parametric modeling of skin tones using EM algorithm works well in HSV color space when all three components of the color vector are used. In spite of discarding the luminance components in YCbCr and HSV color spaces, EM algorithm provides the best results. The results of the detailed comparisons are explained in the conclusion.

Submitted 17 November, 2022; originally announced November 2022.

Comments: 2016 1st International Conference on New Research Achievements in Electrical and Computer Engineering (ICNRAECE)

arXiv:2204.01729 [pdf, other]

Analyzing the Effects of Handling Data Imbalance on Learned Features from Medical Images by Looking Into the Models

Authors: Ashkan Khakzar, Yawei Li, Yang Zhang, Mirac Sanisoglu, Seong Tae Kim, Mina Rezaei, Bernd Bischl, Nassir Navab

Abstract: One challenging property lurking in medical datasets is the imbalanced data distribution, where the frequency of the samples between the different classes is not balanced. Training a model on an imbalanced dataset can introduce unique challenges to the learning problem where a model is biased towards the highly frequent class. Many methods are proposed to tackle the distributional differences and the imbalanced problem. However, the impact of these approaches on the learned features is not well studied. In this paper, we look deeper into the internal units of neural networks to observe how handling data imbalance affects the learned features. We study several popular cost-sensitive approaches for handling data imbalance and analyze the feature maps of the convolutional neural networks from multiple perspectives: analyzing the alignment of salient features with pathologies and analyzing the pathology-related concepts encoded by the networks. Our study reveals differences and insights regarding the trained models that are not reflected by quantitative metrics such as AUROC and AP and show up only by looking at the models through a lens.

Submitted 4 April, 2022; originally announced April 2022.

arXiv:2109.10777 [pdf, other]

Deep Variational Clustering Framework for Self-labeling of Large-scale Medical Images

Authors: Farzin Soleymani, Mohammad Eslami, Tobias Elze, Bernd Bischl, Mina Rezaei

Abstract: We propose a Deep Variational Clustering (DVC) framework for unsupervised representation learning and clustering of large-scale medical images. DVC simultaneously learns the multivariate Gaussian posterior through the probabilistic convolutional encoder and the likelihood distribution with the probabilistic convolutional decoder, and optimizes the assignment of cluster labels. Here, the learned multivariate Gaussian posterior captures the latent distribution of a large set of unlabeled images. Then, we perform unsupervised clustering on top of the variational latent space using a clustering loss. In this approach, the probabilistic decoder helps to prevent the distortion of data points in the latent space and to preserve the local structure of the data-generating distribution. The training process can be considered as a self-training process that iteratively refines the latent space while simultaneously optimizing cluster assignments. We evaluated our proposed framework on three public datasets that represented different medical imaging modalities. Our experimental results show that our proposed framework generalizes better across different datasets. It achieves compelling results on several medical imaging benchmarks. Thus, our approach offers potential advantages over conventional deep unsupervised learning in real-world applications. The source code of the method and all the experiments are available publicly at: https://github.com/csfarzin/DVC

Submitted 22 September, 2021; originally announced September 2021.

Comments: arXiv admin note: text overlap with arXiv:2109.05232

arXiv:2109.09435 [pdf]

Incremental Learning Techniques for Online Human Activity Recognition

Authors: Meysam Vakili, Masoumeh Rezaei

Abstract: Unobtrusive and smart recognition of human activities using smartphone inertial sensors is an interesting topic in the field of artificial intelligence that has acquired tremendous popularity among researchers, especially in recent years. A considerable challenge that needs more attention is the real-time detection of physical activities, since for many real-world applications such as health monitoring and elderly care, it is required to recognize users' activities immediately to prevent severe damage to individuals' wellness. In this paper, we propose a human activity recognition (HAR) approach for the online prediction of physical movements, benefiting from the capabilities of incremental learning algorithms. We develop a HAR system containing monitoring software and a mobile application that collects accelerometer and gyroscope data and sends them to a remote server via the Internet for classification and recognition operations. Six incremental learning algorithms are employed and evaluated in this work and compared with several batch learning algorithms commonly used for developing offline HAR systems. The final results indicate that, considering all performance evaluation metrics, Incremental K-Nearest Neighbors and Incremental Naive Bayesian outperformed other algorithms, exceeding a recognition accuracy of 95% in real-time.

Submitted 20 September, 2021; originally announced September 2021.

Comments: 16 pages, 5 figures, 7 tables

arXiv:2105.04881 [pdf]

doi 10.1016/j.compbiomed.2021.104697

Applications of Deep Learning Techniques for Automated Multiple Sclerosis Detection Using Magnetic Resonance Imaging: A Review

Authors: Afshin Shoeibi, Marjane Khodatars, Mahboobeh Jafari, Parisa Moridian, Mitra Rezaei, Roohallah Alizadehsani, Fahime Khozeimeh, Juan Manuel Gorriz, Jónathan Heras, Maryam Panahiazar, Saeid Nahavandi, U. Rajendra Acharya

Abstract: Multiple Sclerosis (MS) is a type of brain disease which causes visual, sensory, and motor problems for people with a detrimental effect on the functioning of the nervous system. In order to diagnose MS, multiple screening methods have been proposed so far; among them, magnetic resonance imaging (MRI) has received considerable attention among physicians. MRI modalities provide physicians with fundamental information about the structure and function of the brain, which is crucial for the rapid diagnosis of MS lesions. Diagnosing MS using MRI is time-consuming, tedious, and prone to manual errors. Hence, computer aided diagnosis systems (CADS) based on artificial intelligence (AI) methods have been proposed in recent years for accurate diagnosis of MS using MRI neuroimaging modalities. In the AI field, automated MS diagnosis is being conducted using (i) conventional machine learning and (ii) deep learning (DL) techniques. The conventional machine learning approach is based on feature extraction and selection by trial and error. In DL, these steps are performed by the DL model itself. In this paper, a complete review of automated MS diagnosis methods performed using DL techniques with MRI neuroimaging modalities is presented. Also, each work is thoroughly reviewed and discussed. Finally, the most important challenges and future directions in the automated MS diagnosis using DL techniques coupled with MRI modalities are presented in detail.

Submitted 9 August, 2021; v1 submitted 11 May, 2021; originally announced May 2021.

Journal ref: Computers in Biology and Medicine, Volume 136, 2021, 104697

arXiv:2011.13851 [pdf, other]

Real-time Active Vision for a Humanoid Soccer Robot Using Deep Reinforcement Learning

Authors: Soheil Khatibi, Meisam Teimouri, Mahdi Rezaei

Abstract: In this paper, we present an active vision method using a deep reinforcement learning approach for a humanoid soccer-playing robot. The proposed method adaptively optimises the viewpoint of the robot to acquire the most useful landmarks for self-localisation while keeping the ball into its viewpoint. Active vision is critical for humanoid decision-maker robots with a limited field of view. To deal with an active vision problem, several probabilistic entropy-based approaches have previously been proposed which are highly dependent on the accuracy of the self-localisation model. However, in this research, we formulate the problem as an episodic reinforcement learning problem and employ a Deep Q-learning method to solve it. The proposed network only requires the raw images of the camera to move the robot's head toward the best viewpoint. The model shows a very competitive rate of 80% success rate in achieving the best viewpoint. We implemented the proposed method on a humanoid robot simulated in Webots simulator. Our evaluations and experimental results show that the proposed method outperforms the entropy-based methods in the RoboCup context, in cases with high self-localisation errors.

Submitted 27 November, 2020; originally announced November 2020.

Comments: The paper has been accepted in ICAART 2021

arXiv:2008.11672 [pdf, other]

doi 10.3390/app10217514

DeepSOCIAL: Social Distancing Monitoring and Infection Risk Assessment in COVID-19 Pandemic

Authors: Mahdi Rezaei, Mohsen Azarmi

Abstract: Social distancing is a recommended solution by the World Health Organisation (WHO) to minimise the spread of COVID-19 in public places. The majority of governments and national health authorities have set the 2-meter physical distancing as a mandatory safety measure in shopping centres, schools and other covered areas. In this research, we develop a hybrid Computer Vision and YOLOv4-based Deep Neural Network model for automated people detection in the crowd in indoor and outdoor environments using common CCTV security cameras. The proposed DNN model in combination with an adapted inverse perspective mapping (IPM) technique and SORT tracking algorithm leads to a robust people detection and social distancing monitoring. The model has been trained against two most comprehensive datasets by the time of the research the Microsoft Common Objects in Context (MS COCO) and Google Open Image datasets. The system has been evaluated against the Oxford Town Centre dataset with superior performance compared to three state-of-the-art methods. The evaluation has been conducted in challenging conditions, including occlusion, partial visibility, and under lighting variations with the mean average precision of 99.8% and the real-time speed of 24.1 fps. We also provide an online infection risk assessment scheme by statistical analysis of the Spatio-temporal data from people's moving trajectories and the rate of social distancing violations.
The developed model is a generic and accurate people detection and tracking solution that can be applied in many other fields such as autonomous vehicles, human action recognition, anomaly detection, sports, crowd analysis, or any other research areas where the human detection is in the centre of attention.

Submitted 28 November, 2020; v1 submitted 26 August, 2020; originally announced August 2020.

Journal ref: Applied Sciences. 2020, 10, 7514

arXiv:2007.06392 [pdf, other]

DeepHAZMAT: Hazardous Materials Sign Detection and Segmentation with Restricted Computational Resources

Authors: Amir Sharifi, Ahmadreza Zibaei, Mahdi Rezaei

Abstract: One of the most challenging and non-trivial tasks in robot-based rescue operations is the Hazardous Materials or HAZMATs sign detection in the operation field, to prevent further unexpected disasters. Each Hazmat sign has a specific meaning that the rescue robot should detect and interpret it to take a safe action, accordingly. Accurate Hazmat detection and real-time processing are the two most important factors in such robotics applications. Furthermore, we also have to cope with some secondary challenges such as image distortion and restricted CPU and computational resources which are embedded in a rescue robot. In this paper, we propose a CNN-Based pipeline called DeepHAZMAT for detecting and segmenting Hazmats in four steps; 1) optimising the number of input images that are fed into the CNN network, 2) using the YOLOv3-tiny structure to collect the required visual information from the hazardous areas, 3) Hazmat sign segmentation and separation from the background using GrabCut technique, and 4) post-processing the result with morphological operators and convex hull algorithm. In spite of the utilisation of a very limited memory and CPU resources, the experimental results show the proposed method has successfully maintained a better performance in terms of detection-speed and detection-accuracy, compared with the state-of-the-art methods.

Submitted 18 July, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

arXiv:2007.02811 [pdf, other]

Complex Human Action Recognition in Live Videos Using Hybrid FR-DL Method

Authors: Fatemeh Serpush, Mahdi Rezaei

Abstract: Automated human action recognition is one of the most attractive and practical research fields in computer vision, in spite of its high computational costs. In such systems, the human action labelling is based on the appearance and patterns of the motions in the video sequences; however, the conventional methodologies and classic neural networks cannot use temporal information for action recognition prediction in the upcoming frames in a video sequence. On the other hand, the computational cost of the preprocessing stage is high. In this paper, we address challenges of the preprocessing phase, by an automated selection of representative frames among the input sequences. Furthermore, we extract the key features of the representative frame rather than the entire features. We propose a hybrid technique using background subtraction and HOG, followed by application of a deep neural network and skeletal modelling method. The combination of a CNN and the LSTM recursive network is considered for feature selection and maintaining the previous information, and finally, a Softmax-KNN classifier is used for labelling human activities. We name our model as Feature Reduction & Deep Learning based action recognition method, or FR-DL in short. To evaluate the proposed method, we use the UCF dataset for the benchmarking which is widely-used among researchers in action recognition research. The dataset includes 101 complicated activities in the wild.
Experimental results show a significant improvement in terms of accuracy and speed in comparison with six state-of-the-art articles.

Submitted 6 July, 2020; originally announced July 2020.

Showing 1–19 of 19 results for author: Rezaei, M