Showing 1–50 of 59 results for author: Chowdhury, A

Searching in archive eess.
  1. arXiv:2503.12616  [pdf, other]

    eess.SY

    Equivalent-Circuit Thermal Model for Batteries with One-Shot Parameter Identification

    Authors: Myisha A. Chowdhury, Qiugang Lu

    Abstract: Accurate state of temperature (SOT) estimation for batteries is crucial for regulating their temperature within a desired range to ensure safe operation and optimal performance. The existing measurement-based methods often generate noisy signals and cannot scale up for large-scale battery packs. The electrochemical model-based methods, on the contrary, offer high accuracy but are computationally e… ▽ More

    Submitted 16 March, 2025; originally announced March 2025.

  2. arXiv:2503.12258  [pdf, other]

    eess.SY

    Lithium-ion Battery Capacity Prediction via Conditional Recurrent Generative Adversarial Network-based Time-Series Regeneration

    Authors: Myisha A. Chowdhury, Gift Modekwe, Qiugang Lu

    Abstract: Accurate capacity prediction is essential for the safe and reliable operation of batteries by anticipating potential failures beforehand. The performance of state-of-the-art capacity prediction methods is significantly hindered by the limited availability of training data, primarily attributed to the expensive experimentation and data sharing restrictions. To tackle this issue, this paper presents… ▽ More

    Submitted 15 March, 2025; originally announced March 2025.

    Comments: 7 pages, 6 figures

  3. arXiv:2411.19549  [pdf, other]

    eess.IV cs.CV cs.LG

    Contextual Checkerboard Denoise -- A Novel Neural Network-Based Approach for Classification-Aware OCT Image Denoising

    Authors: Md. Touhidul Islam, Md. Abtahi M. Chowdhury, Sumaiya Salekin, Aye T. Maung, Akil A. Taki, Hafiz Imtiaz

    Abstract: In contrast to non-medical image denoising, where enhancing image clarity is the primary goal, medical image denoising warrants preservation of crucial features without introduction of new artifacts. However, many denoising methods that improve the clarity of the image, inadvertently alter critical information of the denoised images, potentially compromising classification performance and diagnost… ▽ More

    Submitted 29 November, 2024; originally announced November 2024.

    Comments: Under review in Springer Journal of Medical Systems. Code available: https://github.com/AbtahiMajeed/CheckerBoardDenoiser/tree/main

  4. arXiv:2409.01962  [pdf, other]

    eess.SP cs.CV cs.HC cs.LG

    AttDiCNN: Attentive Dilated Convolutional Neural Network for Automatic Sleep Staging using Visibility Graph and Force-directed Layout

    Authors: Md Jobayer, Md. Mehedi Hasan Shawon, Tasfin Mahmud, Md. Borhan Uddin Antor, Arshad M. Chowdhury

    Abstract: Sleep stages play an essential role in the identification of sleep patterns and the diagnosis of sleep disorders. In this study, we present an automated sleep stage classifier termed the Attentive Dilated Convolutional Neural Network (AttDiCNN), which uses deep learning methodologies to address challenges related to data heterogeneity, computational complexity, and reliable automatic sleep staging… ▽ More

    Submitted 21 August, 2024; originally announced September 2024.

    Comments: Under review at IEEE Trans. NNLS; 15-page main paper and 3-page supplementary material

  5. arXiv:2408.14927  [pdf, other]

    eess.IV cs.CV

    Automatic Detection of COVID-19 from Chest X-ray Images Using Deep Learning Model

    Authors: Alloy Das, Rohit Agarwal, Rituparna Singh, Arindam Chowdhury, Debashis Nandi

    Abstract: The infectious disease caused by novel corona virus (2019-nCoV) has been widely spreading since last year and has shaken the entire world. It has caused an unprecedented effect on daily life, global economy and public health. Hence this disease detection has life-saving importance for both patients as well as doctors. Due to limited test kits, it is also a daunting task to test every patient with… ▽ More

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted in AIP Conference Proceedings (Vol. 2424, No. 1)

  6. arXiv:2408.02430  [pdf, other]

    eess.AS

    Beyond Orthography: Automatic Recovery of Short Vowels and Dialectal Sounds in Arabic

    Authors: Yassine El Kheir, Hamdy Mubarak, Ahmed Ali, Shammur Absar Chowdhury

    Abstract: This paper presents a novel Dialectal Sound and Vowelization Recovery framework, designed to recognize borrowed and dialectal sounds within phonologically diverse and dialect-rich languages, that extends beyond its standard orthographic sound sets. The proposed framework utilized a quantized sequence of input with(out) continuous pretrained self-supervised representation. We show the efficacy of t… ▽ More

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted ACL 2024 Main Conference

  7. arXiv:2406.17124  [pdf, other]

    cs.SD cs.LG eess.AS

    Investigating Confidence Estimation Measures for Speaker Diarization

    Authors: Anurag Chowdhury, Abhinav Misra, Mark C. Fuhs, Monika Woszczyna

    Abstract: Speaker diarization systems segment a conversation recording based on the speakers' identity. Such systems can misclassify the speaker of a portion of audio due to a variety of factors, such as speech pattern variation, background noise, and overlapping speech. These errors propagate to, and can adversely affect, downstream systems that rely on the speaker's identity, such as speaker-adapted speec… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted in INTERSPEECH 2024

  8. arXiv:2406.16099  [pdf, other]

    cs.SD eess.AS

    Speech Representation Analysis based on Inter- and Intra-Model Similarities

    Authors: Yassine El Kheir, Ahmed Ali, Shammur Absar Chowdhury

    Abstract: Self-supervised models have revolutionized speech processing, achieving new levels of performance in a wide variety of tasks with limited resources. However, the inner workings of these models are still opaque. In this paper, we aim to analyze the encoded contextual representation of these foundation models based on their inter- and intra-model similarity, independent of any external annotation an… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 5 pages, Accepted to appear in ICASSP XAI-SA Workshop

  9. arXiv:2406.13431  [pdf, other]

    cs.CL cs.SD eess.AS

    Children's Speech Recognition through Discrete Token Enhancement

    Authors: Vrunda N. Sukhadia, Shammur Absar Chowdhury

    Abstract: Children's speech recognition is considered a low-resource task mainly due to the lack of publicly available data. There are several reasons for such data scarcity, including expensive data collection and annotation processes, and data privacy, among others. Transforming speech signals into discrete tokens that do not carry sensitive information but capture both linguistic and acoustic information… ▽ More

    Submitted 24 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  10. arXiv:2406.12309  [pdf, ps, other]

    eess.SY

    Adaptive Safe Reinforcement Learning-Enabled Optimization of Battery Fast-Charging Protocols

    Authors: Myisha A. Chowdhury, Saif S. S. Al-Wahaibi, Qiugang Lu

    Abstract: Optimizing charging protocols is critical for reducing battery charging time and decelerating battery degradation in applications such as electric vehicles. Recently, reinforcement learning (RL) methods have been adopted for such purposes. However, RL-based methods may not ensure system (safety) constraints, which can cause irreversible damages to batteries and reduce their lifetime. To this end,… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  11. arXiv:2406.08914  [pdf, other]

    cs.SD cs.LG eess.AS

    Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition

    Authors: William Ravenscroft, George Close, Stefan Goetze, Thomas Hain, Mohammad Soleymanpour, Anurag Chowdhury, Mark C. Fuhs

    Abstract: One solution to automatic speech recognition (ASR) of overlapping speakers is to separate speech and then perform ASR on the separated signals. Commonly, the separator produces artefacts which often degrade ASR performance. Addressing this issue typically requires reference transcriptions to jointly train the separation and ASR networks. This is often not viable for training on real-world in-domai… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 Figures, 3 Tables, Accepted for Interspeech 2024

  12. arXiv:2404.03606  [pdf, other]

    cs.SD cs.AI cs.IR eess.AS

    Analyzing Musical Characteristics of National Anthems in Relation to Global Indices

    Authors: S M Rakib Hasan, Aakar Dhakal, Ms. Ayesha Siddiqua, Mohammad Mominur Rahman, Md Maidul Islam, Mohammed Arfat Raihan Chowdhury, S M Masfequier Rahman Swapno, SM Nuruzzaman Nobel

    Abstract: Music plays a huge part in shaping people's psychology and behavioral patterns. This paper investigates the connection between national anthems and different global indices with computational music analysis and statistical correlation analysis. We analyze national anthem musical data to determine whether certain musical characteristics are associated with peace, happiness, suicide rate, crime rate… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

  13. arXiv:2401.10297  [pdf, other]

    eess.SP cs.LG cs.NI

    Learning Non-myopic Power Allocation in Constrained Scenarios

    Authors: Arindam Chowdhury, Santiago Paternain, Gunjan Verma, Ananthram Swami, Santiago Segarra

    Abstract: We propose a learning-based framework for efficient power allocation in ad hoc interference networks under episodic constraints. The problem of optimal power allocation -- for maximizing a given network utility metric -- under instantaneous constraints has recently gained significant popularity. Several learnable algorithms have been proposed to obtain fast, effective, and near-optimal performance… ▽ More

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: ASILOMAR 2023

  14. arXiv:2310.13974  [pdf, other]

    cs.CL cs.SD eess.AS

    Automatic Pronunciation Assessment -- A Review

    Authors: Yassine El Kheir, Ahmed Ali, Shammur Absar Chowdhury

    Abstract: Pronunciation assessment and its application in computer-aided pronunciation training (CAPT) have seen impressive progress in recent years. With the rapid growth in language processing and deep learning over the past few years, there is a need for an updated review. In this paper, we review methods employed in pronunciation assessment for both phonemic and prosodic. We categorize the main challeng… ▽ More

    Submitted 21 October, 2023; originally announced October 2023.

    Comments: 9 pages, accepted to EMNLP Findings

  15. arXiv:2309.07739  [pdf, other]

    cs.CL cs.SD eess.AS

    The complementary roles of non-verbal cues for Robust Pronunciation Assessment

    Authors: Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali

    Abstract: Research on pronunciation assessment systems focuses on utilizing phonetic and phonological aspects of non-native (L2) speech, often neglecting the rich layer of information hidden within the non-verbal cues. In this study, we propose a novel pronunciation assessment framework, IntraVerbalPA. The framework innovatively incorporates both fine-grained frame- and abstract utterance-level non-verba… ▽ More

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: 5 pages, submitted to ICASSP 2024

  16. arXiv:2309.07719  [pdf, other]

    cs.CL cs.SD eess.AS

    L1-aware Multilingual Mispronunciation Detection Framework

    Authors: Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali

    Abstract: The phonological discrepancies between a speaker's native (L1) and the non-native language (L2) serve as a major factor for mispronunciation. This paper introduces a novel multilingual MDD architecture, L1-MultiMDD, enriched with L1-aware speech representation. An end-to-end speech encoder is trained on the input signal and its corresponding reference phoneme sequence. First, an attention mechani… ▽ More

    Submitted 21 September, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: 5 pages, submitted to ICASSP 2024

  17. arXiv:2309.02404  [pdf, other]

    cs.SD cs.CV eess.AS

    Voice Morphing: Two Identities in One Voice

    Authors: Sushanta K. Pani, Anurag Chowdhury, Morgan Sandler, Arun Ross

    Abstract: In a biometric system, each biometric sample or template is typically associated with a single identity. However, recent research has demonstrated the possibility of generating "morph" biometric samples that can successfully match more than a single identity. Morph attacks are now recognized as a potential security threat to biometric systems. However, most morph attacks have been studied on biome… ▽ More

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted oral paper at BIOSIG 2023

  18. arXiv:2309.01481  [pdf, ps, other]

    eess.SP

    Half-Duplex APs with Dynamic TDD vs. Full-Duplex APs in Cell-Free Systems

    Authors: Anubhab Chowdhury, Chandra R. Murthy

    Abstract: In this paper, we present a comparative study of half-duplex (HD) access points (APs) with dynamic time-division duplex (DTDD) and full-duplex (FD) APs in cell-free (CF) systems. Although both DTDD and FD CF systems support concurrent downlink (DL) transmission and uplink (UL) reception capability, the sum spectral efficiency (SE) is limited by various cross-link interferences. We first present a… ▽ More

    Submitted 31 January, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

    Comments: 18 pages, 13 Figures, accepted in IEEE Transactions on Communications, Jan. 2024

  19. arXiv:2308.02503  [pdf, other]

    eess.AS cs.CL cs.SD

    MyVoice: Arabic Speech Resource Collaboration Platform

    Authors: Yousseif Elshahawy, Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali

    Abstract: We introduce MyVoice, a crowdsourcing platform designed to collect Arabic speech to enhance dialectal speech technologies. This platform offers an opportunity to design large dialectal speech datasets; and makes them publicly available. MyVoice allows contributors to select city/country-level fine-grained dialect and record the displayed utterances. Users can switch roles between contributors and… ▽ More

    Submitted 23 July, 2023; originally announced August 2023.

    Comments: 2 pages, accepted at InterSpeech23 Show and Tell Session

  20. arXiv:2308.02405  [pdf, other]

    eess.SP eess.IV

    Development Of Automated Cardiac Arrhythmia Detection Methods Using Single Channel ECG Signal

    Authors: Arpita Paul, Avik Kumar Das, Manas Rakshit, Ankita Ray Chowdhury, Susmita Saha, Hrishin Roy, Sajal Sarkar, Dongiri Prasanth, Eravelli Saicharan

    Abstract: Arrhythmia, an abnormal cardiac rhythm, is one of the most common types of cardiac disease. Automatic detection and classification of arrhythmia can be significant in reducing deaths due to cardiac diseases. This work proposes a multi-class arrhythmia detection algorithm using single channel electrocardiogram (ECG) signal. In this work, heart rate variability (HRV) along with morphological feature… ▽ More

    Submitted 23 July, 2023; originally announced August 2023.

    Comments: 17 pages, 7 figures

  21. arXiv:2306.12913  [pdf, other]

    eess.AS cs.CL cs.SD

    Implicit spoken language diarization

    Authors: Jagabandhu Mishra, Amartya Chowdhury, S. R. Mahadeva Prasanna

    Abstract: Spoken language diarization (LD) and related tasks are mostly explored using the phonotactic approach. Phonotactic approaches mostly use explicit way of language modeling, hence requiring intermediate phoneme modeling and transcribed data. Alternatively, the ability of deep learning approaches to model temporal dynamics may help for the implicit modeling of language information through deep embedd… ▽ More

    Submitted 22 June, 2023; originally announced June 2023.

  22. arXiv:2306.01894  [pdf]

    eess.SP

    Atmospheric Influence on the Path Loss at High Frequencies for Deployment of 5G Cellular Communication Networks

    Authors: Rashed Hasan Ratul, S M Mehedi Zaman, Hasib Arman Chowdhury, Md. Zayed Hassan Sagor, Mohammad Tawhid Kawser, Mirza Muntasir Nishat

    Abstract: Over the past few decades, the development of cellular communication technology has spanned several generations in order to add sophisticated features in the updated versions. Moreover, different high-frequency bands are considered for advanced cellular generations. The presence of updated generations like 4G and 5G is driven by the rising demand for a greater data rate and a better experience for… ▽ More

    Submitted 27 July, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted for presentation at THE 14th INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND NETWORKING TECHNOLOGIES (ICCCNT)

  23. arXiv:2306.01845  [pdf, other]

    cs.SD eess.AS

    Multi-View Multi-Task Representation Learning for Mispronunciation Detection

    Authors: Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali

    Abstract: The disparity in phonology between learner's native (L1) and target (L2) language poses a significant challenge for mispronunciation detection and diagnosis (MDD) systems. This challenge is further intensified by lack of annotated L2 data. This paper proposes a novel MDD architecture that exploits multiple 'views' of the same input data assisted by auxiliary tasks to learn more distinctive phoneti… ▽ More

    Submitted 7 August, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: 5 pages, Accepted SLaTE23

  24. arXiv:2305.07790  [pdf]

    cond-mat.mtrl-sci cs.CV eess.IV

    Automated Grain Boundary (GB) Segmentation and Microstructural Analysis in 347H Stainless Steel Using Deep Learning and Multimodal Microscopy

    Authors: Shoieb Ahmed Chowdhury, M. F. N. Taufique, Jing Wang, Marissa Masden, Madison Wenzlick, Ram Devanathan, Alan L Schemer-Kohrn, Keerti S Kappagantula

    Abstract: Austenitic 347H stainless steel offers superior mechanical properties and corrosion resistance required for extreme operating conditions such as high temperature. The change in microstructure due to composition and process variations is expected to impact material properties. Identifying microstructural features such as grain boundaries thus becomes an important task in the process-microstructure-… ▽ More

    Submitted 12 May, 2023; originally announced May 2023.

  25. arXiv:2305.07445  [pdf, other]

    eess.AS cs.CL cs.SD

    QVoice: Arabic Speech Pronunciation Learning Application

    Authors: Yassine El Kheir, Fouad Khnaisser, Shammur Absar Chowdhury, Hamdy Mubarak, Shazia Afzal, Ahmed Ali

    Abstract: This paper introduces a novel Arabic pronunciation learning application QVoice, powered with end-to-end mispronunciation detection and feedback generator module. The application is designed to support non-native Arabic speakers in enhancing their pronunciation skills, while also helping native speakers mitigate any potential influence from regional dialects on their Modern Standard Arabic (MSA) pr… ▽ More

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: 2 pages, Accepted InterSpeech23 Show & Tell Demo Session

    Journal ref: InterSpeech 2023

  26. arXiv:2304.00649  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Multilingual Word Error Rate Estimation: e-WER3

    Authors: Shammur Absar Chowdhury, Ahmed Ali

    Abstract: The success of the multilingual automatic speech recognition systems empowered many voice-driven applications. However, measuring the performance of such systems remains a major challenge, due to its dependency on manually transcribed speech data in both mono- and multilingual scenarios. In this paper, we propose a novel multilingual framework -- eWER3 -- jointly trained on acoustic and lexical re… ▽ More

    Submitted 2 April, 2023; originally announced April 2023.

    Comments: Accepted in ICASSP, Multilingual WER estimation, End-to-End systems, multilingual model, automatic word error rate estimation

  27. arXiv:2304.00446  [pdf, other]

    eess.SP cs.LG

    Deep Graph Unfolding for Beamforming in MU-MIMO Interference Networks

    Authors: Arindam Chowdhury, Gunjan Verma, Ananthram Swami, Santiago Segarra

    Abstract: We develop an efficient and near-optimal solution for beamforming in multi-user multiple-input-multiple-output single-hop wireless ad-hoc interference networks. Inspired by the weighted minimum mean squared error (WMMSE) method, a classical approach to solving this problem, and the principle of algorithm unfolding, we present unfolded WMMSE (UWMMSE) for MU-MIMO. This method learns a parameterized… ▽ More

    Submitted 2 April, 2023; originally announced April 2023.

    Comments: Under review at IEEE Trans. in Wireless Comm

  28. Transthoracic super-resolution ultrasound localisation microscopy of myocardial vasculature in patients

    Authors: Jipeng Yan, Biao Huang, Johanna Tonko, Matthieu Toulemonde, Joseph Hansen-Shearer, Qingyuan Tan, Kai Riemer, Konstantinos Ntagiantas, Rasheda A Chowdhury, Pier Lambiase, Roxy Senior, Meng-Xing Tang

    Abstract: Micro-vascular flow in the myocardium is of significant importance clinically but remains poorly understood. Up to 25% of patients with symptoms of coronary heart diseases have no obstructive coronary arteries and have suspected microvascular diseases. However, such microvasculature is difficult to image in vivo with existing modalities due to the lack of resolution and sensitivity. Here, we demon… ▽ More

    Submitted 28 March, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

    Comments: 22 pages, 10 figures

  29. arXiv:2302.02477  [pdf, other]

    cs.LG eess.SP q-bio.QM

    Offline Learning of Closed-Loop Deep Brain Stimulation Controllers for Parkinson Disease Treatment

    Authors: Qitong Gao, Stephen L. Schimdt, Afsana Chowdhury, Guangyu Feng, Jennifer J. Peters, Katherine Genty, Warren M. Grill, Dennis A. Turner, Miroslav Pajic

    Abstract: Deep brain stimulation (DBS) has shown great promise toward treating motor symptoms caused by Parkinson's disease (PD), by delivering electrical pulses to the Basal Ganglia (BG) region of the brain. However, DBS devices approved by the U.S. Food and Drug Administration (FDA) can only deliver continuous DBS (cDBS) stimuli at a fixed amplitude; this energy inefficient operation reduces battery lifet… ▽ More

    Submitted 15 March, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

    Comments: Accepted to International Conference on Cyber Physical Systems (ICCPS) 2023

  30. arXiv:2211.00923  [pdf, other]

    cs.SD cs.CL eess.AS

    SpeechBlender: Speech Augmentation Framework for Mispronunciation Data Generation

    Authors: Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali, Hamdy Mubarak, Shazia Afzal

    Abstract: The lack of labeled second language (L2) speech data is a major challenge in designing mispronunciation detection models. We introduce SpeechBlender - a fine-grained data augmentation pipeline for generating mispronunciation errors to overcome such data scarcity. The SpeechBlender utilizes varieties of masks to target different regions of phonetic units, and use the mixing factors to linearly inte… ▽ More

    Submitted 12 July, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: 5 pages

  31. arXiv:2210.02381  [pdf, other]

    eess.SY

    A Novel Entropy-Maximizing TD3-based Reinforcement Learning for Automatic PID Tuning

    Authors: Myisha A. Chowdhury, Qiugang Lu

    Abstract: Proportional-integral-derivative (PID) controllers have been widely used in the process industry. However, the satisfactory control performance of a PID controller depends strongly on the tuning parameters. Conventional PID tuning methods require extensive knowledge of the system model, which is not always known especially in the case of complex dynamical systems. In contrast, reinforcement learni… ▽ More

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: 6 pages, 7 figures

  32. arXiv:2206.08835  [pdf, other]

    cs.CL cs.SD eess.AS

    What can Speech and Language Tell us About the Working Alliance in Psychotherapy

    Authors: Sebastian P. Bayerl, Gabriel Roccabruna, Shammur Absar Chowdhury, Tommaso Ciulli, Morena Danieli, Korbinian Riedhammer, Giuseppe Riccardi

    Abstract: We are interested in the problem of conversational analysis and its application to the health domain. Cognitive Behavioral Therapy is a structured approach in psychotherapy, allowing the therapist to help the patient to identify and modify the malicious thoughts, behavior, or actions. This cooperative effort can be evaluated using the Working Alliance Inventory Observer-rated Shortened - a 12 item… ▽ More

    Submitted 27 June, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

    Comments: Accepted at Interspeech 2022

  33. arXiv:2205.15543  [pdf, other]

    q-bio.QM cs.CV eess.IV

    AI-based automated Meibomian gland segmentation, classification and reflection correction in infrared Meibography

    Authors: Ripon Kumar Saha, A. M. Mahmud Chowdhury, Kyung-Sun Na, Gyu Deok Hwang, Youngsub Eom, Jaeyoung Kim, Hae-Gon Jeon, Ho Sik Hwang, Euiheon Chung

    Abstract: Purpose: Develop a deep learning-based automated method to segment meibomian glands (MG) and eyelids, quantitatively analyze the MG area and MG ratio, estimate the meiboscore, and remove specular reflections from infrared images. Methods: A total of 1600 meibography images were captured in a clinical setting. 1000 images were precisely annotated with multiple revisions by investigators and graded… ▽ More

    Submitted 31 May, 2022; originally announced May 2022.

    Comments: 11 pages, 13 Figures, 5 Supplementary Figures

  34. arXiv:2201.02550  [pdf, other]

    cs.CL cs.SD eess.AS

    Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition

    Authors: Amir Hussein, Shammur Absar Chowdhury, Ahmed Abdelali, Najim Dehak, Ahmed Ali, Sanjeev Khudanpur

    Abstract: The pervasiveness of intra-utterance code-switching (CS) in spoken content requires that speech recognition (ASR) systems handle mixed language. Designing a CS-ASR system has many challenges, mainly due to data scarcity, grammatical structure complexity, and domain mismatch. The most common method for addressing CS is to train an ASR system with the available transcribed CS speech, along with mono… ▽ More

    Submitted 11 January, 2023; v1 submitted 7 January, 2022; originally announced January 2022.

  35. arXiv:2112.15187  [pdf, other]

    eess.SY

    Stability-Preserving Automatic Tuning of PID Control with Reinforcement Learning

    Authors: Ayub I. Lakhani, Myisha A. Chowdhury, Qiugang Lu

    Abstract: PID control has been the dominant control strategy in the process industry due to its simplicity in design and effectiveness in controlling a wide range of processes. However, traditional methods on PID tuning often require extensive domain knowledge and field experience. To address the issue, this work proposes an automatic PID tuning framework based on reinforcement learning (RL), particularly t… ▽ More

    Submitted 11 February, 2022; v1 submitted 30 December, 2021; originally announced December 2021.

    Comments: 9 figures, 3 tables, 18 pages

  36. arXiv:2110.11292  [pdf, other]

    cs.LG cs.AI eess.SY

    OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis

    Authors: Animesh Basak Chowdhury, Benjamin Tan, Ramesh Karri, Siddharth Garg

    Abstract: Logic synthesis is a challenging and widely-researched combinatorial optimization problem during integrated circuit (IC) design. It transforms a high-level description of hardware in a programming language like Verilog into an optimized digital circuit netlist, a network of interconnected Boolean logic gates, that implements the function. Spurred by the success of ML in solving combinatorial and g… ▽ More

    Submitted 21 October, 2021; originally announced October 2021.

    Comments: 18 pages

  37. arXiv:2110.09968  [pdf, ps, other]

    eess.SP cs.IT

    Can Dynamic TDD Enabled Half-Duplex Cell-Free Massive MIMO Outperform Full-Duplex Cellular Massive MIMO?

    Authors: Anubhab Chowdhury, Ribhu Chopra, Chandra R. Murthy

    Abstract: We consider a dynamic time division duplex (DTDD) enabled cell-free massive multiple-input multiple-output (CF-mMIMO) system, where each half-duplex (HD) access point (AP) is scheduled to operate in the uplink (UL) or downlink (DL) mode based on the data demands of the user equipments (UEs), with the goal of maximizing the sum UL-DL spectral efficiency (SE). We develop a new, low complexity, greed… ▽ More

    Submitted 21 May, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: Accepted, IEEE Transactions on Communications

    Journal ref: IEEE Transactions on Communications, May, 2022

  38. arXiv:2110.07471  [pdf, other]

    eess.SP cs.LG

    Stability Analysis of Unfolded WMMSE for Power Allocation

    Authors: Arindam Chowdhury, Fernando Gama, Santiago Segarra

    Abstract: Power allocation is one of the fundamental problems in wireless networks and a wide variety of algorithms address this problem from different perspectives. A common element among these algorithms is that they rely on an estimation of the channel state, which may be inaccurate on account of hardware defects, noisy feedback systems, and environmental and adversarial disturbances. Therefore, it is es… ▽ More

    Submitted 9 January, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: Under review at IEEE ICASSP 2022

  39. arXiv:2110.03763  [pdf, other]

    cs.LG eess.SP

    Label Propagation across Graphs: Node Classification using Graph Neural Tangent Kernels

    Authors: Artun Bayer, Arindam Chowdhury, Santiago Segarra

    Abstract: Graph neural networks (GNNs) have achieved superior performance on node classification tasks in the last few years. Commonly, this is framed in a transductive semi-supervised learning setup wherein the entire graph, including the target nodes to be labeled, is available for training. Driven in part by scalability, recent works have focused on the inductive case where only the labeled portion of a… ▽ More

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: Under review at IEEE ICASSP 2022

  40. Moving Object Detection for Event-based vision using Graph Spectral Clustering

    Authors: Anindya Mondal, Shashant R, Jhony H. Giraldo, Thierry Bouwmans, Ananda S. Chowdhury

    Abstract: Moving object detection has been a central topic of discussion in computer vision for its wide range of applications like in self-driving cars, video surveillance, security, and enforcement. Neuromorphic Vision Sensors (NVS) are bio-inspired sensors that mimic the working of the human eye. Unlike conventional frame-based cameras, these sensors capture a stream of asynchronous 'events' that pose mu… ▽ More

    Submitted 2 December, 2021; v1 submitted 30 September, 2021; originally announced September 2021.

    Comments: Ten pages, five figures, Published in 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, QC, Canada

  41. arXiv:2109.06992  [pdf, other]

    cs.IT eess.SP

    ML-aided power allocation for Tactical MIMO

    Authors: Arindam Chowdhury, Gunjan Verma, Chirag Rao, Ananthram Swami, Santiago Segarra

    Abstract: We study the problem of optimal power allocation in single-hop multi-antenna ad-hoc wireless networks. A standard technique to solve this problem involves optimizing a tri-convex function under power constraints using a block-coordinate-descent based iterative algorithm. This approach, termed WMMSE, tends to be computationally complex and time consuming. Several learning-based approaches have been…

    Submitted 28 October, 2021; v1 submitted 14 September, 2021; originally announced September 2021.

    Comments: Accepted at MILCOM 2021

  42. arXiv:2107.00439  [pdf, other]

    cs.CL cs.SD eess.AS

    What do End-to-End Speech Models Learn about Speaker, Language and Channel Information? A Layer-wise and Neuron-level Analysis

    Authors: Shammur Absar Chowdhury, Nadir Durrani, Ahmed Ali

    Abstract: Deep neural networks are inherently opaque and challenging to interpret. Unlike hand-crafted feature-based models, we struggle to comprehend the concepts learned and how they interact within these models. This understanding is crucial not only for debugging purposes but also for ensuring fairness in ethical decision-making. In our study, we conduct a post-hoc functional interpretability analysis o…

    Submitted 10 July, 2023; v1 submitted 1 July, 2021; originally announced July 2021.

    Comments: Accepted in CSL journal. Keywords: Speech, Neuron Analysis, Interpretability, Diagnostic Classifier, AI explainability, End-to-End Architecture

  43. arXiv:2106.13000  [pdf, other]

    cs.CL cs.SD eess.AS

    QASR: QCRI Aljazeera Speech Resource -- A Large Scale Annotated Arabic Speech Corpus

    Authors: Hamdy Mubarak, Amir Hussein, Shammur Absar Chowdhury, Ahmed Ali

    Abstract: We introduce the largest transcribed Arabic speech corpus, QASR, collected from the broadcast domain. This multi-dialect speech dataset contains 2,000 hours of speech sampled at 16kHz crawled from Aljazeera news channel. The dataset is released with lightly supervised transcriptions, aligned with the audio segments. Unlike previous datasets, QASR contains linguistically motivated segmentation, pun…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: Speech Corpus, Spoken Conversation, ASR, Dialect Identification, Punctuation Restoration, Speaker Verification, NER, Named Entity, Arabic, Speaker gender, Turn-taking. Accepted in ACL 2021

  44. arXiv:2106.07048  [pdf]

    eess.SP

    Ultrasound Classification of Breast Masses Using a Comprehensive Nakagami Imaging and Machine Learning Framework

    Authors: Ahmad Chowdhury, Rezwana R. Razzaque, Ahmad Shafiullah, Sabiq Muhtadi, Brian S. Garra, S. Kaisar Alam

    Abstract: In this study we investigate the potential of parametric images formed from ultrasound B-mode scans using the Nakagami distribution for non-invasive classification of breast lesions. Through a sliding window technique, we generated seven types of parametric images from each patient scan in our dataset using basic as well as derived parameters of the Nakagami distribution. To determine the most…

    Submitted 20 June, 2021; v1 submitted 13 June, 2021; originally announced June 2021.

    Comments: 25 pages, 12 figures

  45. arXiv:2105.14779  [pdf, other]

    cs.CL cs.HC cs.SD eess.AS

    Towards One Model to Rule All: Multilingual Strategy for Dialectal Code-Switching Arabic ASR

    Authors: Shammur Absar Chowdhury, Amir Hussein, Ahmed Abdelali, Ahmed Ali

    Abstract: With the advent of globalization, there is an increasing demand for multilingual automatic speech recognition (ASR), handling language and dialectal variation of spoken content. Recent studies show its efficacy over monolingual systems. In this study, we design a large multilingual end-to-end ASR using self-attention based conformer architecture. We trained the system using Arabic (Ar), English (E…

    Submitted 5 July, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

    Comments: Accepted in INTERSPEECH 2021, Multilingual ASR, Multi-dialectal ASR, Code-Switching ASR, Arabic ASR, Conformer, Transformer, E2E ASR, Speech Recognition, ASR, Arabic, English, French

  46. arXiv:2012.05084  [pdf, other]

    cs.SD cs.LG eess.AS

    DeepTalk: Vocal Style Encoding for Speaker Recognition and Speech Synthesis

    Authors: Anurag Chowdhury, Arun Ross, Prabu David

    Abstract: Automatic speaker recognition algorithms typically characterize speech audio using short-term spectral features that encode the physiological and anatomical aspects of speech production. Such algorithms do not fully capitalize on speaker-dependent characteristics present in behavioral speech features. In this work, we propose a prosody encoding network called DeepTalk for extracting vocal style fe…

    Submitted 14 February, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

    Comments: Accepted in IEEE ICASSP 2021, 5 pages, 3 figures

  47. arXiv:2012.02250  [pdf, other]

    eess.SP cs.LG

    Efficient power allocation using graph neural networks and deep algorithm unfolding

    Authors: Arindam Chowdhury, Gunjan Verma, Chirag Rao, Ananthram Swami, Santiago Segarra

    Abstract: We study the problem of optimal power allocation in a single-hop ad hoc wireless network. In solving this problem, we propose a hybrid neural architecture inspired by the algorithmic unfolding of the iterative weighted minimum mean squared error (WMMSE) method, that we denote as unfolded WMMSE (UWMMSE). The learnable weights within UWMMSE are parameterized using graph neural networks (GNNs), where…

    Submitted 18 November, 2020; originally announced December 2020.

    Comments: Under review at IEEE ICASSP 2021. arXiv admin note: substantial text overlap with arXiv:2009.10812

  48. arXiv:2009.10812  [pdf, ps, other]

    eess.SP

    Unfolding WMMSE using Graph Neural Networks for Efficient Power Allocation

    Authors: Arindam Chowdhury, Gunjan Verma, Chirag Rao, Ananthram Swami, Santiago Segarra

    Abstract: We study the problem of optimal power allocation in a single-hop ad hoc wireless network. In solving this problem, we depart from classical purely model-based approaches and propose a hybrid method that retains key modeling elements in conjunction with data-driven components. More precisely, we put forth a neural network architecture inspired by the algorithmic unfolding of the iterative weighted…

    Submitted 8 April, 2021; v1 submitted 22 September, 2020; originally announced September 2020.

    Comments: Accepted at IEEE Transactions on Wireless Communications

  49. arXiv:2008.11668  [pdf, other]

    eess.AS cs.LG

    DeepVOX: Discovering Features from Raw Audio for Speaker Recognition in Non-ideal Audio Signals

    Authors: Anurag Chowdhury, Arun Ross

    Abstract: Automatic speaker recognition algorithms typically use pre-defined filterbanks, such as Mel-Frequency and Gammatone filterbanks, for characterizing speech audio. However, it has been observed that the features extracted using these filterbanks are not resilient to diverse audio degradations. In this work, we propose a deep learning-based technique to deduce the filterbank design from vast amounts…

    Submitted 12 June, 2022; v1 submitted 26 August, 2020; originally announced August 2020.

  50. arXiv:2008.09866  [pdf, other]

    cs.CV eess.IV

    Symbolic Semantic Segmentation and Interpretation of COVID-19 Lung Infections in Chest CT volumes based on Emergent Languages

    Authors: Aritra Chowdhury, Alberto Santamaria-Pang, James R. Kubricht, Jianwei Qiu, Peter Tu

    Abstract: The coronavirus disease (COVID-19) has resulted in a pandemic crippling a breadth of services critical to daily life. Segmentation of lung infections in computerized tomography (CT) slices could be used to improve diagnosis and understanding of COVID-19 in patients. Deep learning systems lack interpretability because of their black box nature. Inspired by human communication of complex idea…

    Submitted 22 August, 2020; originally announced August 2020.