
arXiv:2409.16823 [pdf, other]

Exploring Brain Network Organization in Alzheimer Disease and Frontotemporal Dementia: A Crossplot Transition Entropy Approach

Authors: Shivani Ranjan, Lalan Kumar

Abstract: Dementia poses a growing challenge in our aging society. Frontotemporal dementia (FTD) and Alzheimer disease (AD) are the leading causes of early-onset dementia. FTD and AD display unique traits in their onset, progression, and treatment responses. In particular, FTD often faces a prolonged diagnostic process and is commonly misdiagnosed as AD due to overlapping symptoms. This study utilizes a complex network model of brain electrical activity using resting-state EEG recordings to address the misdiagnosis. It compares the network organization between AD and FTD, highlighting connectivity differences and examining the significance of EEG signals across frequency bands in distinguishing AD and FTD. The publicly available EEG dataset of 36 AD and 23 FTD patients is utilized for analyses. Cross-plot transition entropy (CPTE) is employed to measure synchronization between EEG signals and construct connection matrices. CPTE offers advantages in parameter setting, computational efficiency, and robustness. The analysis reveals significantly different clustering coefficients (CC), subgraph centrality (SC), and eigenvector centrality (EC) between the two groups. FTD shows higher connectivity, particularly in delta, theta, and gamma bands, owing to lower neurodegeneration. The CPTE-based network parameters effectively classify the two groups with an accuracy of 87.58%, with the gamma band demonstrating the highest accuracy of 92.87%.
Consequently, CPTE-based complex network analysis of EEG data from AD and FTD patients reveals significant differences in brain network organization. The approach shows potential for identifying unique characteristics and providing insights into the underlying pathophysiological processes of the various forms of dementia, thereby assisting in accurate diagnosis and treatment.

Submitted 25 September, 2024; originally announced September 2024.

arXiv:2408.10816 [pdf, other]

Deep Learning-based Classification of Dementia using Image Representation of Subcortical Signals

Authors: Shivani Ranjan, Ayush Tripathi, Harshal Shende, Robin Badal, Amit Kumar, Pramod Yadav, Deepak Joshi, Lalan Kumar

Abstract: Dementia is a neurological syndrome marked by cognitive decline. Alzheimer's disease (AD) and frontotemporal dementia (FTD) are common forms of dementia, each with distinct progression patterns. EEG, a non-invasive tool for recording brain activity, has shown potential in distinguishing AD from FTD and mild cognitive impairment (MCI). Previous studies have utilized various EEG features, such as subband power and connectivity patterns, to differentiate these conditions. However, artifacts in EEG signals can obscure crucial information, necessitating advanced signal processing techniques. This study aims to develop a deep learning-based classification system for dementia by analyzing scout time-series signals from deep brain regions, specifically the hippocampus, amygdala, and thalamus. The study utilizes scout time series extracted via the standardized low-resolution brain electromagnetic tomography (sLORETA) technique. The time series is converted to image representations using continuous wavelet transform (CWT) and fed as input to deep learning models. Two high-density EEG datasets are utilized to evaluate the efficacy of the proposed method: the online BrainLat dataset (comprising AD, FTD, and healthy controls (HC)) and the in-house IITD-AIIA dataset (including subjects with AD, MCI, and HC). Different classification strategies and classifier combinations have been utilized for the accurate mapping of classes on both datasets.
The best results were achieved by using a product of probabilities from classifiers for left and right subcortical regions in conjunction with the DenseNet model architecture. It yields accuracies of 94.17% and 77.72% on the BrainLat and IITD-AIIA datasets, respectively. This highlights the potential of this approach for early and accurate differentiation of neurodegenerative disorders.

Submitted 20 August, 2024; originally announced August 2024.

arXiv:2406.09443 [pdf, other]

Comparative Analysis of Personalized Voice Activity Detection Systems: Assessing Real-World Effectiveness

Authors: Satyam Kumar, Sai Srujana Buddi, Utkarsh Oggy Sarawgi, Vineet Garg, Shivesh Ranjan, Ognjen Rudovic, Ahmed Hussen Abdelaziz, Saurabh Adya

Abstract: Voice activity detection (VAD) is a critical component in various applications such as speech recognition, speech enhancement, and hands-free communication systems. With the increasing demand for personalized and context-aware technologies, the need for effective personalized VAD systems has become paramount. In this paper, we present a comparative analysis of Personalized Voice Activity Detection (PVAD) systems to assess their real-world effectiveness. We introduce a comprehensive approach to assess PVAD systems, incorporating various performance metrics such as frame-level and utterance-level error rates, detection latency and accuracy, alongside user-level analysis. Through extensive experimentation and evaluation, we provide a thorough understanding of the strengths and limitations of various PVAD variants. This paper advances the understanding of PVAD technology by offering insights into its efficacy and viability in practical applications using a comprehensive set of metrics.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2302.04577 [pdf, other]

Incorporating Total Variation Regularization in the design of an intelligent Query by Humming system

Authors: Shivangi Ranjan, Vishal Srivastava

Abstract: A Query-By-Humming (QBH) system constitutes a particular case of music information retrieval where the input is a user-hummed melody and the output is the original song that contains that melody. A typical QBH system consists of melody extraction and candidate melody retrieval. For melody extraction, accurate note transcription is the key enabling technology. However, current transcription methods are unable to definitively capture the melody and address inaccuracies in user-hummed queries. In this paper, we incorporate Total Variation Regularization (TVR) to denoise queries. This approach accounts for user error in humming without loss of meaningful data and reliably captures the underlying melody. For candidate melody retrieval, we employ a deep learning approach to time series classification using a Fully Convolutional Neural Network. The trained network classifies the incoming query as belonging to one of the target songs. For our experiments, we use Roger Jang's MIR-QBSH dataset, which is the standard MIREX dataset. We demonstrate that inclusion of TVR-denoised queries in the training set enhances the overall accuracy of the system to 93%, which is higher than other state-of-the-art QBH systems.

Submitted 9 February, 2023; originally announced February 2023.

arXiv:2204.02455 [pdf, other]

Improving Voice Trigger Detection with Metric Learning

Authors: Prateeth Nayak, Takuya Higuchi, Anmol Gupta, Shivesh Ranjan, Stephen Shum, Siddharth Sigtia, Erik Marchi, Varun Lakshminarasimhan, Minsik Cho, Saurabh Adya, Chandra Dhir, Ahmed Tewfik

Abstract: Voice trigger detection is an important task, which enables activating a voice assistant when a target user speaks a keyword phrase. A detector is typically trained on speech data independent of speaker information and used for the voice trigger detection task. However, such a speaker-independent voice trigger detector typically suffers from performance degradation on speech from underrepresented groups, such as accented speakers. In this work, we propose a novel voice trigger detector that can use a small number of utterances from a target speaker to improve detection accuracy. Our proposed model employs an encoder-decoder architecture. While the encoder performs speaker-independent voice trigger detection, similar to the conventional detector, the decoder predicts a personalized embedding for each utterance. A personalized voice trigger score is then obtained as a similarity score between the embeddings of enrollment utterances and a test utterance. The personalized embedding allows adapting to the target speaker's speech when computing the voice trigger score, hence improving voice trigger detection accuracy. Experimental results show that the proposed approach achieves a 38% relative reduction in false rejection rate (FRR) compared to a baseline speaker-independent voice trigger model.

Submitted 13 September, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

Comments: Accepted at InterSpeech 2022

arXiv:2011.05186 [pdf, other]

Pristine annotations-based multi-modal trained artificial intelligence solution to triage chest X-ray for COVID-19

Authors: Tao Tan, Bipul Das, Ravi Soni, Mate Fejes, Sohan Ranjan, Daniel Attila Szabo, Vikram Melapudi, K S Shriram, Utkarsh Agrawal, Laszlo Rusko, Zita Herczeg, Barbara Darazs, Pal Tegzes, Lehel Ferenczi, Rakesh Mullick, Gopal Avinash

Abstract: The COVID-19 pandemic continues to spread and impact the well-being of the global population. The front-line modalities, including computed tomography (CT) and X-ray, play an important role in triaging COVID patients. Considering the limited access to resources (both hardware and trained personnel) and decontamination considerations, CT may not be ideal for triaging suspected subjects. Artificial intelligence (AI) assisted X-ray based applications for triaging and monitoring, which require experienced radiologists to identify COVID patients in a timely manner and to further delineate the disease region boundary, are seen as a promising solution. Our proposed solution differs from existing solutions by industry and academic communities, and demonstrates a functional AI model to triage by inferencing using a single X-ray image, while the deep-learning model is trained using both X-ray and CT data. We report on how such multi-modal training improves the solution compared to X-ray only training. The multi-modal solution increases the AUC (area under the receiver operating characteristic curve) from 0.89 to 0.93 and also positively impacts the Dice coefficient (0.59 to 0.62) for localizing the pathology. To the best of our knowledge, it is the first X-ray solution to leverage multi-modal information in its development.

Submitted 10 November, 2020; originally announced November 2020.

arXiv:1904.07386 [pdf, other]

I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

Authors: Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, et al. (21 additional authors not shown)

Abstract: The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest edition of such a joint submission was in SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of the I4U consortium's participation in the NIST SRE series of evaluations. The primary objective of the current paper is to summarize the results and lessons learned based on the twelve sub-systems and their fusion submitted to SRE'18. It is also our intention to present a shared view on the advancements, progress, and major paradigm shifts that we have witnessed as an SRE participant in the past decade from SRE'08 to SRE'18. In this regard, we have seen, among others, a paradigm shift from supervector representation to deep speaker embedding, and a switch in research challenge from channel compensation to domain adaptation.

Submitted 15 April, 2019; originally announced April 2019.

Comments: 5 pages

arXiv:1903.12149 [pdf]

Performance and Energy Conservation of 3GPP IFOM Protocol for Dual Connectivity in Heterogeneous LTE-WLAN Network

Authors: Shubhada Gadgil, Shashi Ranjan, Abhay Karandikar

Abstract: For the 5th Generation (5G) networks, the Third Generation Partnership Project (3GPP) is considering standardization of various solutions for traffic aggregation using licensed and unlicensed spectrum, to meet the rising data demands. IP Flow Mobility (IFOM) is a multi-access connectivity solution/protocol standardized by the Internet Engineering Task Force (IETF) and 3GPP in Release 10. It enables concurrent access for a User Equipment (UE) to Heterogeneous Networks (HetNets) such as Long Term Evolution (LTE) and IEEE 802.11 Wireless Local Area Network (WLAN). IFOM-enabled UEs have multiple interfaces to connect to HetNets. They can have concurrent flows with different traffic types over these networks and can seamlessly switch the flows from one network to the other. In this paper, we focus on two objectives. The first is to investigate the performance parameters (e.g., throughput, latency, tunnelling overhead, packet loss, and energy cost) of IFOM-enabled UEs (IeUs) in HetNets of LTE and WLAN. We have proposed a novel mechanism to maximize the throughput of IeUs, achieving a significant throughput gain with low latency for the IeUs. We have explored further and observed a throughput-energy trade-off for low data rate flows. To address this, we also propose a smart energy-efficient throughput optimization algorithm for the IeUs, resulting in a substantial reduction in energy cost while maintaining high throughput at lower latency and satisfying the Quality of Service (QoS) requirements of the IeUs.

Submitted 28 March, 2019; originally announced March 2019.

Comments: 12 pages, 15 figures, journal

Showing 1–8 of 8 results for author: Ranjan, S