-
StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models
Authors:
Yeona Hong,
Hyewon Han,
Woo-jin Chung,
Hong-Goo Kang
Abstract:
In this paper, we propose StableQuant, a novel adaptive post-training quantization (PTQ) algorithm for widely used speech foundation models (SFMs). While PTQ has been successfully employed for compressing large language models (LLMs) due to its ability to bypass additional fine-tuning, directly applying these techniques to SFMs may not yield optimal results, as SFMs utilize distinct network archit…
▽ More
In this paper, we propose StableQuant, a novel adaptive post-training quantization (PTQ) algorithm for widely used speech foundation models (SFMs). While PTQ has been successfully employed for compressing large language models (LLMs) due to its ability to bypass additional fine-tuning, directly applying these techniques to SFMs may not yield optimal results, as SFMs utilize distinct network architecture for feature extraction. StableQuant demonstrates optimal quantization performance regardless of the network architecture type, as it adaptively determines the quantization range for each layer by analyzing both the scale distributions and overall performance. We evaluate our algorithm on two SFMs, HuBERT and wav2vec2.0, for an automatic speech recognition (ASR) task, and achieve superior performance compared to traditional PTQ methods. StableQuant successfully reduces the sizes of SFM models to a quarter and doubles the inference speed while limiting the word error rate (WER) performance drop to less than 0.3% with 8-bit quantization.
△ Less
Submitted 21 April, 2025;
originally announced April 2025.
-
NeuroAMP: A Novel End-to-end General Purpose Deep Neural Amplifier for Personalized Hearing Aids
Authors:
Shafique Ahmed,
Ryandhimas E. Zezario,
Hui-Guan Yuan,
Amir Hussain,
Hsin-Min Wang,
Wei-Ho Chung,
Yu Tsao
Abstract:
The prevalence of hearing aids is increasing. However, optimizing the amplification processes of hearing aids remains challenging due to the complexity of integrating multiple modular components in traditional methods. To address this challenge, we present NeuroAMP, a novel deep neural network designed for end-to-end, personalized amplification in hearing aids. NeuroAMP leverages both spectral fea…
▽ More
The prevalence of hearing aids is increasing. However, optimizing the amplification processes of hearing aids remains challenging due to the complexity of integrating multiple modular components in traditional methods. To address this challenge, we present NeuroAMP, a novel deep neural network designed for end-to-end, personalized amplification in hearing aids. NeuroAMP leverages both spectral features and the listener's audiogram as inputs, and we investigate four architectures: Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Convolutional Recurrent Neural Network (CRNN), and Transformer. We also introduce Denoising NeuroAMP, an extension that integrates noise reduction along with amplification capabilities for improved performance in real-world scenarios. To enhance generalization, a comprehensive data augmentation strategy was employed during training on diverse speech (TIMIT and TMHINT) and music (Cadenza Challenge MUSIC) datasets. Evaluation using the Hearing Aid Speech Perception Index (HASPI), Hearing Aid Speech Quality Index (HASQI), and Hearing Aid Audio Quality Index (HAAQI) demonstrates that the Transformer architecture within NeuroAMP achieves the best performance, with SRCC scores of 0.9927 (HASQI) and 0.9905 (HASPI) on TIMIT, and 0.9738 (HAAQI) on the Cadenza Challenge MUSIC dataset. Notably, our data augmentation strategy maintains high performance on unseen datasets (e.g., VCTK, MUSDB18-HQ). Furthermore, Denoising NeuroAMP outperforms both the conventional NAL-R+WDRC approach and a two-stage baseline on the VoiceBank+DEMAND dataset, achieving a 10% improvement in both HASPI (0.90) and HASQI (0.59) scores. These results highlight the potential of NeuroAMP and Denoising NeuroAMP to deliver notable improvements in personalized hearing aid amplification.
△ Less
Submitted 15 February, 2025;
originally announced February 2025.
-
LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration
Authors:
Sangmin Lee,
Woo-Jin Chung,
Hong-Goo Kang
Abstract:
Building a universal multilingual automatic speech recognition (ASR) model that performs equitably across languages has long been a challenge due to its inherent difficulties. To address this task we introduce a Language-Agnostic Multilingual ASR pipeline through orthography Unification and language-specific Transliteration (LAMA-UT). LAMA-UT operates without any language-specific modules while ma…
▽ More
Building a universal multilingual automatic speech recognition (ASR) model that performs equitably across languages has long been a challenge due to its inherent difficulties. To address this task we introduce a Language-Agnostic Multilingual ASR pipeline through orthography Unification and language-specific Transliteration (LAMA-UT). LAMA-UT operates without any language-specific modules while matching the performance of state-of-the-art models trained on a minimal amount of data. Our pipeline consists of two key steps. First, we utilize a universal transcription generator to unify orthographic features into Romanized form and capture common phonetic characteristics across diverse languages. Second, we utilize a universal converter to transform these universal transcriptions into language-specific ones. In experiments, we demonstrate the effectiveness of our proposed method leveraging universal transcriptions for massively multilingual ASR. Our pipeline achieves a relative error reduction rate of 45% when compared to Whisper and performs comparably to MMS, despite being trained on only 0.1% of Whisper's training data. Furthermore, our pipeline does not rely on any language-specific modules. However, it performs on par with zero-shot ASR approaches which utilize additional language-specific lexicons and language models. We expect this framework to serve as a cornerstone for flexible multilingual ASR systems that are generalizable even to unseen languages.
△ Less
Submitted 22 December, 2024; v1 submitted 19 December, 2024;
originally announced December 2024.
-
OneNet: A Channel-Wise 1D Convolutional U-Net
Authors:
Sanghyun Byun,
Kayvan Shah,
Ayushi Gang,
Christopher Apton,
Jacob Song,
Woo Seong Chung
Abstract:
Many state-of-the-art computer vision architectures leverage U-Net for its adaptability and efficient feature extraction. However, the multi-resolution convolutional design often leads to significant computational demands, limiting deployment on edge devices. We present a streamlined alternative: a 1D convolutional encoder that retains accuracy while enhancing its suitability for edge applications…
▽ More
Many state-of-the-art computer vision architectures leverage U-Net for its adaptability and efficient feature extraction. However, the multi-resolution convolutional design often leads to significant computational demands, limiting deployment on edge devices. We present a streamlined alternative: a 1D convolutional encoder that retains accuracy while enhancing its suitability for edge applications. Our novel encoder architecture achieves semantic segmentation through channel-wise 1D convolutions combined with pixel-unshuffle operations. By incorporating PixelShuffle, known for improving accuracy in super-resolution tasks while reducing computational load, OneNet captures spatial relationships without requiring 2D convolutions, reducing parameters by up to 47%. Additionally, we explore a fully 1D encoder-decoder that achieves a 71% reduction in size, albeit with some accuracy loss. We benchmark our approach against U-Net variants across diverse mask-generation tasks, demonstrating that it preserves accuracy effectively. Although focused on image segmentation, this architecture is adaptable to other convolutional applications. Code for the project is available at https://github.com/shbyun080/OneNet .
△ Less
Submitted 14 November, 2024;
originally announced November 2024.
-
GPT-4o System Card
Authors:
OpenAI,
:,
Aaron Hurst,
Adam Lerer,
Adam P. Goucher,
Adam Perelman,
Aditya Ramesh,
Aidan Clark,
AJ Ostrow,
Akila Welihinda,
Alan Hayes,
Alec Radford,
Aleksander MÄ…dry,
Alex Baker-Whitcomb,
Alex Beutel,
Alex Borzunov,
Alex Carney,
Alex Chow,
Alex Kirillov,
Alex Nichol,
Alex Paino,
Alex Renzin,
Alex Tachard Passos,
Alexander Kirillov,
Alexi Christakis
, et al. (395 additional authors not shown)
Abstract:
GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 mil…
▽ More
GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50\% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and measures we've implemented to ensure the model is safe and aligned. We also include third-party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o's text and vision capabilities.
△ Less
Submitted 25 October, 2024;
originally announced October 2024.
-
Optimization of DNN-based speaker verification model through efficient quantization technique
Authors:
Yeona Hong,
Woo-Jin Chung,
Hong-Goo Kang
Abstract:
As Deep Neural Networks (DNNs) rapidly advance in various fields, including speech verification, they typically involve high computational costs and substantial memory consumption, which can be challenging to manage on mobile systems. Quantization of deep models offers a means to reduce both computational and memory expenses. Our research proposes an optimization framework for the quantization of…
▽ More
As Deep Neural Networks (DNNs) rapidly advance in various fields, including speech verification, they typically involve high computational costs and substantial memory consumption, which can be challenging to manage on mobile systems. Quantization of deep models offers a means to reduce both computational and memory expenses. Our research proposes an optimization framework for the quantization of the speaker verification model. By analyzing performance changes and model size reductions in each layer of a pre-trained speaker verification model, we have effectively minimized performance degradation while significantly reducing the model size. Our quantization algorithm is the first attempt to maintain the performance of the state-of-the-art pre-trained speaker verification model, ECAPATDNN, while significantly compressing its model size. Overall, our quantization approach resulted in reducing the model size by half, with an increase in EER limited to 0.07%.
△ Less
Submitted 12 July, 2024;
originally announced July 2024.
-
Speaker-Independent Acoustic-to-Articulatory Inversion through Multi-Channel Attention Discriminator
Authors:
Woo-Jin Chung,
Hong-Goo Kang
Abstract:
We present a novel speaker-independent acoustic-to-articulatory inversion (AAI) model, overcoming the limitations observed in conventional AAI models that rely on acoustic features derived from restricted datasets. To address these challenges, we leverage representations from a pre-trained self-supervised learning (SSL) model to more effectively estimate the global, local, and kinematic pattern in…
▽ More
We present a novel speaker-independent acoustic-to-articulatory inversion (AAI) model, overcoming the limitations observed in conventional AAI models that rely on acoustic features derived from restricted datasets. To address these challenges, we leverage representations from a pre-trained self-supervised learning (SSL) model to more effectively estimate the global, local, and kinematic pattern information in Electromagnetic Articulography (EMA) signals during the AAI process. We train our model using an adversarial approach and introduce an attention-based Multi-duration phoneme discriminator (MDPD) designed to fully capture the intricate relationship among multi-channel articulatory signals. Our method achieves a Pearson correlation coefficient of 0.847, marking state-of-the-art performance in speaker-independent AAI models. The implementation details and code can be found online.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
Mental Stress Detection: Development and Evaluation of a Wearable In-Ear Plethysmography
Authors:
Hika Barki,
Wan-Young Chung
Abstract:
Mental stress is a prevalent condition that can have negative impacts on one's health. Early detection and treatment are crucial for preventing related illnesses and maintaining overall wellness. This study presents a new method for identifying mental stress using a wearable biosensor worn in the ear. Data was gathered from 14 participants in a controlled environment using stress-inducing tasks su…
▽ More
Mental stress is a prevalent condition that can have negative impacts on one's health. Early detection and treatment are crucial for preventing related illnesses and maintaining overall wellness. This study presents a new method for identifying mental stress using a wearable biosensor worn in the ear. Data was gathered from 14 participants in a controlled environment using stress-inducing tasks such as memory and math tests. The raw photoplethysmography data was then processed by filtering, segmenting, and transforming it into scalograms using a continuous wavelet transform (CWT) which are based on two different mother wavelets, namely, a generalized Morse wavelet and the analytic Morlet (Gabor) wavelet. The scalograms were then passed through a convolutional neural network classifier, GoogLeNet, to classify the signals as stressed or non-stressed. The method achieved an outstanding result using the generalized Morse wavelet with an accuracy of 91.02% and an F1-score of 90.95%. This method demonstrates promise as a reliable tool for early detection and treatment of mental stress by providing real-time monitoring and allowing for preventive measures to be taken before it becomes a serious issue.
△ Less
Submitted 13 May, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
A Robust UWOC-assisted Multi-hop Topology for Underwater Sensor Network Nodes
Authors:
Maaz Salman,
Javad Bolboli,
Wan-Young Chung
Abstract:
Underwater environment is substantially less explored territory as compared to earth surface due to lack of robust underwater communication infrastructure. For Internet of Underwater things connectivity, underwater wireless optical communication can play a vital role, compared to conventional radio frequency communication, due to longer range, high data rate, low latency, and unregulated bandwidth…
▽ More
Underwater environment is substantially less explored territory as compared to earth surface due to lack of robust underwater communication infrastructure. For Internet of Underwater things connectivity, underwater wireless optical communication can play a vital role, compared to conventional radio frequency communication, due to longer range, high data rate, low latency, and unregulated bandwidth. This study proposes underwater wireless optical communication driven local area network UWOC LAN, comprised of multiple network nodes with optical transceivers. Moreover, the temperature sensor data is encapsulated with individual authentication identity to enhance the security of the framework at the user end. The proposed system is evaluated in a specially designed water tank of 4 meters. The proposed system evaluation analysis shows that the system can transmit underwater temperature data reliably in real time. The proposed secure UWOC LAN is tested within a communication range of 16 meters by incorporating multi hop connectivity to monitor the underwater environment.
△ Less
Submitted 31 March, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
Dimensionality Reduction of Dynamics on Lie Manifolds via Structure-Aware Canonical Correlation Analysis
Authors:
Wooyoung Chung,
Daniel Polani,
Stas Tiomkin
Abstract:
Incorporating prior knowledge into a data-driven modeling problem can drastically improve performance, reliability, and generalization outside of the training sample. The stronger the structural properties, the more effective these improvements become. Manifolds are a powerful nonlinear generalization of Euclidean space for modeling finite dimensions. Structural impositions in constrained systems…
▽ More
Incorporating prior knowledge into a data-driven modeling problem can drastically improve performance, reliability, and generalization outside of the training sample. The stronger the structural properties, the more effective these improvements become. Manifolds are a powerful nonlinear generalization of Euclidean space for modeling finite dimensions. Structural impositions in constrained systems increase when applying group structure, converting them into Lie manifolds. The range of their applications is very wide and includes the important case of robotic tasks. Canonical Correlation Analysis (CCA) can construct a hierarchical sequence of maximal correlations of up to two paired data sets in these Euclidean spaces. We present a method to generalize this concept to Lie Manifolds and demonstrate its efficacy through the substantial improvements it achieves in making structure-consistent predictions about changes in the state of a robotic hand.
△ Less
Submitted 16 November, 2023;
originally announced November 2023.
-
C2C: Cough to COVID-19 Detection in BHI 2023 Data Challenge
Authors:
Woo-Jin Chung,
Miseul Kim,
Hong-Goo Kang
Abstract:
This report describes our submission to BHI 2023 Data Competition: Sensor challenge. Our Audio Alchemists team designed an acoustic-based COVID-19 diagnosis system, Cough to COVID-19 (C2C), and won the 1st place in the challenge. C2C involves three key contributions: pre-processing of input signals, cough-related representation extraction leveraging Wav2vec2.0, and data augmentation. Through exper…
▽ More
This report describes our submission to BHI 2023 Data Competition: Sensor challenge. Our Audio Alchemists team designed an acoustic-based COVID-19 diagnosis system, Cough to COVID-19 (C2C), and won the 1st place in the challenge. C2C involves three key contributions: pre-processing of input signals, cough-related representation extraction leveraging Wav2vec2.0, and data augmentation. Through experimental findings, we demonstrate C2C's promising potential to enhance the diagnostic accuracy of COVID-19 via cough signals. Our proposed model achieves a ROC-AUC value of 0.7810 in the context of COVID-19 diagnosis. The implementation details and the python code can be found in the following link: https://github.com/Woo-jin-Chung/BHI_2023_challenge_Audio_Alchemists
△ Less
Submitted 1 November, 2023;
originally announced November 2023.
-
Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition
Authors:
Samuel Cahyawijaya,
Holy Lovenia,
Willy Chung,
Rita Frieske,
Zihan Liu,
Pascale Fung
Abstract:
Speech emotion recognition plays a crucial role in human-computer interactions. However, most speech emotion recognition research is biased toward English-speaking adults, which hinders its applicability to other demographic groups in different languages and age groups. In this work, we analyze the transferability of emotion recognition across three different languages--English, Mandarin Chinese,…
▽ More
Speech emotion recognition plays a crucial role in human-computer interactions. However, most speech emotion recognition research is biased toward English-speaking adults, which hinders its applicability to other demographic groups in different languages and age groups. In this work, we analyze the transferability of emotion recognition across three different languages--English, Mandarin Chinese, and Cantonese; and 2 different age groups--adults and the elderly. To conduct the experiment, we develop an English-Mandarin speech emotion benchmark for adults and the elderly, BiMotion, and a Cantonese speech emotion dataset, YueMotion. This study concludes that different language and age groups require specific speech features, thus making cross-lingual inference an unsuitable method. However, cross-group data augmentation is still beneficial to regularize the model, with linguistic distance being a significant influence on cross-lingual transferability. We release publicly release our code at https://github.com/HLTCHKUST/elderly_ser.
△ Less
Submitted 26 June, 2023;
originally announced June 2023.
-
MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion
Authors:
Woo-Jin Chung,
Doyeon Kim,
Soo-Whan Chung,
Hong-Goo Kang
Abstract:
We introduce Multi-level feature Fusion-based Periodicity Analysis Model (MF-PAM), a novel deep learning-based pitch estimation model that accurately estimates pitch trajectory in noisy and reverberant acoustic environments. Our model leverages the periodic characteristics of audio signals and involves two key steps: extracting pitch periodicity using periodic non-periodic convolution (PNP-Conv) b…
▽ More
We introduce Multi-level feature Fusion-based Periodicity Analysis Model (MF-PAM), a novel deep learning-based pitch estimation model that accurately estimates pitch trajectory in noisy and reverberant acoustic environments. Our model leverages the periodic characteristics of audio signals and involves two key steps: extracting pitch periodicity using periodic non-periodic convolution (PNP-Conv) blocks and estimating pitch by aggregating multi-level features using a modified bi-directional feature pyramid network (BiFPN). We evaluate our model on speech and music datasets and achieve superior pitch estimation performance compared to state-of-the-art baselines while using fewer model parameters. Our model achieves 99.20 % accuracy in pitch estimation on a clean musical dataset. Overall, our proposed model provides a promising solution for accurate pitch estimation in challenging acoustic environments and has potential applications in audio signal processing.
△ Less
Submitted 16 June, 2023;
originally announced June 2023.
-
An empirical study on speech restoration guided by self supervised speech representation
Authors:
Jaeuk Byun,
Youna Ji,
Soo Whan Chung,
Soyeon Choe,
Min Seok Choi
Abstract:
Enhancing speech quality is an indispensable yet difficult task as it is often complicated by a range of degradation factors. In addition to additive noise, reverberation, clipping, and speech attenuation can all adversely affect speech quality. Speech restoration aims to recover speech components from these distortions. This paper focuses on exploring the impact of self-supervised speech represen…
▽ More
Enhancing speech quality is an indispensable yet difficult task as it is often complicated by a range of degradation factors. In addition to additive noise, reverberation, clipping, and speech attenuation can all adversely affect speech quality. Speech restoration aims to recover speech components from these distortions. This paper focuses on exploring the impact of self-supervised speech representation learning on the speech restoration task. Specifically, we employ speech representation in various speech restoration networks and evaluate their performance under complicated distortion scenarios. Our experiments demonstrate that the contextual information provided by the self-supervised speech representation can enhance speech restoration performance in various distortion scenarios, while also increasing robustness against the duration of speech attenuation and mismatched test conditions.
△ Less
Submitted 30 May, 2023;
originally announced May 2023.
-
WiRiS: Transformer for RIS-Assisted Device-Free Sensing for Joint People Counting and Localization using Wi-Fi CSI
Authors:
Wei-Yu Chung,
Li-Hsiang Shen,
Kai-Ten Feng,
Yuan-Chun Lin,
Shih-Cheng Lin,
Sheng-Fuh Chang
Abstract:
Channel State Information (CSI) is widely adopted as a feature for indoor localization. Taking advantage of the abundant information from the CSI, people can be accurately sensed even without equipped devices. However, the positioning error increases severely in non-line-of-sight (NLoS) regions. Reconfigurable intelligent surface (RIS) has been introduced to improve signal coverage in NLoS areas,…
▽ More
Channel State Information (CSI) is widely adopted as a feature for indoor localization. Taking advantage of the abundant information from the CSI, people can be accurately sensed even without equipped devices. However, the positioning error increases severely in non-line-of-sight (NLoS) regions. Reconfigurable intelligent surface (RIS) has been introduced to improve signal coverage in NLoS areas, which can re-direct and enhance reflective signals with massive meta-material elements. In this paper, we have proposed a Transformer-based RIS-assisted device-free sensing for joint people counting and localization (WiRiS) system to precisely predict the number of people and their corresponding locations through configuring RIS. A series of predefined RIS beams is employed to create inputs of fingerprinting CSI features as sequence-to-sequence learning database for Transformer. We have evaluated the performance of proposed WiRiS system in both ray-tracing simulators and experiments. Both simulation and real-world experiments demonstrate that people counting accuracy exceeds 90\%, and the localization error can achieve the centimeter-level, which outperforms the existing benchmarks without employment of RIS.
△ Less
Submitted 9 November, 2023; v1 submitted 25 March, 2023;
originally announced April 2023.
-
Unsupervised Learning Based Hybrid Beamforming with Low-Resolution Phase Shifters for MU-MIMO Systems
Authors:
Chia-Ho Kuo,
Hsin-Yuan Chang,
Ronald Y. Chang,
Wei-Ho Chung
Abstract:
Millimeter wave (mmWave) is a key technology for fifth-generation (5G) and beyond communications. Hybrid beamforming has been proposed for large-scale antenna systems in mmWave communications. Existing hybrid beamforming designs based on infinite-resolution phase shifters (PSs) are impractical due to hardware cost and power consumption. In this paper, we propose an unsupervised-learning-based sche…
▽ More
Millimeter wave (mmWave) is a key technology for fifth-generation (5G) and beyond communications. Hybrid beamforming has been proposed for large-scale antenna systems in mmWave communications. Existing hybrid beamforming designs based on infinite-resolution phase shifters (PSs) are impractical due to hardware cost and power consumption. In this paper, we propose an unsupervised-learning-based scheme to jointly design the analog precoder and combiner with low-resolution PSs for multiuser multiple-input multiple-output (MU-MIMO) systems. We transform the analog precoder and combiner design problem into a phase classification problem and propose a generic neural network architecture, termed the phase classification network (PCNet), capable of producing solutions of various PS resolutions. Simulation results demonstrate the superior sum-rate and complexity performance of the proposed scheme, as compared to state-of-the-art hybrid beamforming designs for the most commonly used low-resolution PS configurations.
△ Less
Submitted 3 February, 2022;
originally announced February 2022.
-
Supervised Segmentation with Domain Adaptation for Small Sampled Orbital CT Images
Authors:
Sungho Suh,
Sojeong Cheon,
Wonseo Choi,
Yeon Woong Chung,
Won-Kyung Cho,
Ji-Sun Paik,
Sung Eun Kim,
Dong-Jin Chang,
Yong Oh Lee
Abstract:
Deep neural networks (DNNs) have been widely used for medical image analysis. However, the lack of access a to large-scale annotated dataset poses a great challenge, especially in the case of rare diseases, or new domains for the research society. Transfer of pre-trained features, from the relatively large dataset is a considerable solution. In this paper, we have explored supervised segmentation…
▽ More
Deep neural networks (DNNs) have been widely used for medical image analysis. However, the lack of access a to large-scale annotated dataset poses a great challenge, especially in the case of rare diseases, or new domains for the research society. Transfer of pre-trained features, from the relatively large dataset is a considerable solution. In this paper, we have explored supervised segmentation using domain adaptation for optic nerve and orbital tumor, when only small sampled CT images are given. Even the lung image database consortium image collection (LIDC-IDRI) is a cross-domain to orbital CT, but the proposed domain adaptation method improved the performance of attention U-Net for the segmentation in public optic nerve dataset and our clinical orbital tumor dataset. The code and dataset are available at https://github.com/cmcbigdata.
△ Less
Submitted 1 July, 2021;
originally announced July 2021.
-
Deep Metric Learning-based Image Retrieval System for Chest Radiograph and its Clinical Applications in COVID-19
Authors:
Aoxiao Zhong,
Xiang Li,
Dufan Wu,
Hui Ren,
Kyungsang Kim,
Younggon Kim,
Varun Buch,
Nir Neumark,
Bernardo Bizzo,
Won Young Tak,
Soo Young Park,
Yu Rim Lee,
Min Kyu Kang,
Jung Gil Park,
Byung Seok Kim,
Woo Jin Chung,
Ning Guo,
Ittai Dayan,
Mannudeep K. Kalra,
Quanzheng Li
Abstract:
In recent years, deep learning-based image analysis methods have been widely applied in computer-aided detection, diagnosis and prognosis, and has shown its value during the public health crisis of the novel coronavirus disease 2019 (COVID-19) pandemic. Chest radiograph (CXR) has been playing a crucial role in COVID-19 patient triaging, diagnosing and monitoring, particularly in the United States.…
▽ More
In recent years, deep learning-based image analysis methods have been widely applied in computer-aided detection, diagnosis and prognosis, and has shown its value during the public health crisis of the novel coronavirus disease 2019 (COVID-19) pandemic. Chest radiograph (CXR) has been playing a crucial role in COVID-19 patient triaging, diagnosing and monitoring, particularly in the United States. Considering the mixed and unspecific signals in CXR, an image retrieval model of CXR that provides both similar images and associated clinical information can be more clinically meaningful than a direct image diagnostic model. In this work we develop a novel CXR image retrieval model based on deep metric learning. Unlike traditional diagnostic models which aims at learning the direct mapping from images to labels, the proposed model aims at learning the optimized embedding space of images, where images with the same labels and similar contents are pulled together. It utilizes multi-similarity loss with hard-mining sampling strategy and attention mechanism to learn the optimized embedding space, and provides similar images to the query image. The model is trained and validated on an international multi-site COVID-19 dataset collected from 3 different sources. Experimental results of COVID-19 image retrieval and diagnosis tasks show that the proposed model can serve as a robust solution for CXR analysis and patient management for COVID-19. The model is also tested on its transferability on a different clinical decision support task, where the pre-trained model is applied to extract image features from a new dataset without any further training. These results demonstrate our deep metric learning based image retrieval model is highly efficient in the CXR retrieval, diagnosis and prognosis, and thus has great clinical value for the treatment and management of COVID-19 patients.
△ Less
Submitted 25 November, 2020;
originally announced December 2020.
-
Deep Learning-based Four-region Lung Segmentation in Chest Radiography for COVID-19 Diagnosis
Authors:
Young-Gon Kim,
Kyungsang Kim,
Dufan Wu,
Hui Ren,
Won Young Tak,
Soo Young Park,
Yu Rim Lee,
Min Kyu Kang,
Jung Gil Park,
Byung Seok Kim,
Woo Jin Chung,
Mannudeep K. Kalra,
Quanzheng Li
Abstract:
Purpose. Imaging plays an important role in assessing severity of COVID 19 pneumonia. However, semantic interpretation of chest radiography (CXR) findings does not include quantitative description of radiographic opacities. Most current AI assisted CXR image analysis framework do not quantify for regional variations of disease. To address these, we proposed a four region lung segmentation method t…
▽ More
Purpose. Imaging plays an important role in assessing severity of COVID 19 pneumonia. However, semantic interpretation of chest radiography (CXR) findings does not include quantitative description of radiographic opacities. Most current AI assisted CXR image analysis framework do not quantify for regional variations of disease. To address these, we proposed a four region lung segmentation method to assist accurate quantification of COVID 19 pneumonia. Methods. A segmentation model to separate left and right lung is firstly applied, and then a carina and left hilum detection network is used, which are the clinical landmarks to separate the upper and lower lungs. To improve the segmentation performance of COVID 19 images, ensemble strategy incorporating five models is exploited. Using each region, we evaluated the clinical relevance of the proposed method with the Radiographic Assessment of the Quality of Lung Edema (RALE). Results. The proposed ensemble strategy showed dice score of 0.900, which is significantly higher than conventional methods (0.854 0.889). Mean intensities of segmented four regions indicate positive correlation to the extent and density scores of pulmonary opacities under the RALE framework. Conclusion. A deep learning based model in CXR can accurately segment and quantify regional distribution of pulmonary opacities in patients with COVID 19 pneumonia.
△ Less
Submitted 26 September, 2020;
originally announced September 2020.
-
A Passivity-based Nonlinear Admittance Control with Application to Powered Upper-limb Control under Unknown Environmental Interactions
Authors:
Min Jun Kim,
Woongyong Lee,
Jae Yeon Choi,
Goobong Chung,
Kyung-Lyong Han,
Il Seop Choi,
Christian Ott,
Wan Kyun Chung
Abstract:
This paper presents an admittance controller based on the passivity theory for a powered upper-limb exoskeleton robot which is governed by the nonlinear equation of motion. Passivity allows us to include a human operator and environmental interaction in the control loop. The robot interacts with the human operator via F/T sensor and interacts with the environment mainly via end-effectors. Although…
▽ More
This paper presents an admittance controller based on the passivity theory for a powered upper-limb exoskeleton robot which is governed by the nonlinear equation of motion. Passivity allows us to include a human operator and environmental interaction in the control loop. The robot interacts with the human operator via F/T sensor and interacts with the environment mainly via end-effectors. Although the environmental interaction cannot be detected by any sensors (hence unknown), passivity allows us to have natural interaction. An analysis shows that the behavior of the actual system mimics that of a nominal model as the control gain goes to infinity, which implies that the proposed approach is an admittance controller. However, because the control gain cannot grow infinitely in practice, the performance limitation according to the achievable control gain is also analyzed. The result of this analysis indicates that the performance in the sense of infinite norm increases linearly with the control gain. In the experiments, the proposed properties were verified using 1 degree-of-freedom testbench, and an actual powered upper-limb exoskeleton was used to lift and maneuver the unknown payload.
△ Less
Submitted 18 April, 2019;
originally announced April 2019.
-
Local Cyber-Physical Attack for Masking Line Outage and Topology Attack in Smart Grid
Authors:
Hwei-Ming Chung,
Wen-Tai Li,
Chau Yuen,
Wei-Ho Chung,
Yan Zhang,
Chao-Kai Wen
Abstract:
Malicious attacks in the power system can eventually result in a large-scale cascade failure if not attended on time. These attacks, which are traditionally classified into \emph{physical} and \emph{cyber attacks}, can be avoided by using the latest and advanced detection mechanisms. However, a new threat called \emph{cyber-physical attacks} which jointly target both the physical and cyber layers…
▽ More
Malicious attacks in the power system can eventually result in a large-scale cascade failure if not attended on time. These attacks, which are traditionally classified into \emph{physical} and \emph{cyber attacks}, can be avoided by using the latest and advanced detection mechanisms. However, a new threat called \emph{cyber-physical attacks} which jointly target both the physical and cyber layers of the system to interfere the operations of the power grid is more malicious as compared with the traditional attacks. In this paper, we propose a new cyber-physical attack strategy where the transmission line is first physically disconnected, and then the line-outage event is masked, such that the control center is misled into detecting as an obvious line outage at a different position in the local area of the power system. Therefore, the topology information in the control center is interfered by our attack. We also propose a novel procedure for selecting vulnerable lines, and analyze the observability of our proposed framework. Our proposed method can effectively and continuously deceive the control center into detecting fake line-outage positions, and thereby increase the chance of cascade failure because the attention is given to the fake outage. The simulation results validate the efficiency of our proposed attack strategy.
△ Less
Submitted 24 July, 2018;
originally announced July 2018.
-
Fundamental Limits on Data Acquisition: Trade-offs between Sample Complexity and Query Difficulty
Authors:
Hye Won Chung,
Ji Oon Lee,
Alfred O. Hero
Abstract:
We consider query-based data acquisition and the corresponding information recovery problem, where the goal is to recover $k$ binary variables (information bits) from parity measurements of those variables. The queries and the corresponding parity measurements are designed using the encoding rule of Fountain codes. By using Fountain codes, we can design potentially limitless number of queries, and…
▽ More
We consider query-based data acquisition and the corresponding information recovery problem, where the goal is to recover $k$ binary variables (information bits) from parity measurements of those variables. The queries and the corresponding parity measurements are designed using the encoding rule of Fountain codes. By using Fountain codes, we can design potentially limitless number of queries, and corresponding parity measurements, and guarantee that the original $k$ information bits can be recovered with high probability from any sufficiently large set of measurements of size $n$. In the query design, the average number of information bits that is associated with one parity measurement is called query difficulty ($\bar{d}$) and the minimum number of measurements required to recover the $k$ information bits for a fixed $\bar{d}$ is called sample complexity ($n$). We analyze the fundamental trade-offs between the query difficulty and the sample complexity, and show that the sample complexity of $n=c\max\{k,(k\log k)/\bar{d}\}$ for some constant $c>0$ is necessary and sufficient to recover $k$ information bits with high probability as $k\to\infty$.
△ Less
Submitted 2 January, 2018; v1 submitted 30 November, 2017;
originally announced December 2017.
-
Local Cyber-physical Attack with Leveraging Detection in Smart Grid
Authors:
Hwei-Ming Chung,
Wen-Tai Li,
Chau Yuen,
Wei-Ho Chung,
Chao-Kai Wen
Abstract:
A well-designed attack in the power system can cause an initial failure and then results in large-scale cascade failure. Several works have discussed power system attack through false data injection, line-maintaining attack, and line-removing attack. However, the existing methods need to continuously attack the system for a long time, and, unfortunately, the performance cannot be guaranteed if the…
▽ More
A well-designed attack in the power system can cause an initial failure and then results in large-scale cascade failure. Several works have discussed power system attack through false data injection, line-maintaining attack, and line-removing attack. However, the existing methods need to continuously attack the system for a long time, and, unfortunately, the performance cannot be guaranteed if the system states vary. To overcome this issue, we consider a new type of attack strategy called combinational attack which masks a line-outage at one position but misleads the control center on line outage at another position. Therefore, the topology information in the control center is interfered by our attack. We also offer a procedure of selecting the vulnerable lines of its kind. The proposed method can effectively and continuously deceive the control center in identifying the actual position of line-outage. The system under attack will be exposed to increasing risks as the attack continuously. Simulation results validate the efficiency of the proposed attack strategy.
△ Less
Submitted 10 August, 2017;
originally announced August 2017.