-
Prediction of Permissioned Blockchain Performance for Resource Scaling Configurations
Authors:
Seungwoo Jung,
Yeonho Yoo,
Gyeongsik Yang,
Chuck Yoo
Abstract:
Blockchain is increasingly offered as blockchain-as-a-service (BaaS) by cloud service providers. However, configuring BaaS appropriately for optimal performance and reliability relies on trial and error. A key challenge is that BaaS is often perceived as a "black box," leading to uncertainties in performance and resource provisioning. Previous studies attempted to address this challenge; however, the impacts of both vertical and horizontal scaling remain elusive. To this end, we present machine learning-based models to predict network reliability and throughput based on scaling configurations. In our evaluation, the models exhibit prediction errors of ~1.9%, making them accurate enough to be applied in the real world.
Submitted 19 March, 2025;
originally announced March 2025.
-
Equalization-Enhanced Phase Noise: Modeling and DSP-aware Analysis
Authors:
Sebastian Jung,
Tim Janz,
Vahid Aref,
Stephan ten Brink
Abstract:
In coherent optical communication systems, the laser phase noise is commonly modeled as a Wiener process. We propose a sliding-window based linearization of the phase noise, enabling a novel description. We show that, by stochastically modeling the residual error introduced by this approximation, equalization-enhanced phase noise (EEPN) can be described and decomposed into four different components. Furthermore, we analyze the four components separately and provide a stochastic model for each of them. This novel model makes it possible to predict the impact of well-known algorithms in coherent digital signal processing (DSP) pipelines, such as timing recovery (TR) and carrier phase recovery (CPR), on each of the terms. Thus, it enables approximating the resulting signal affected by EEPN after each of these DSP steps and helps to derive appropriate ways of mitigating such effects.
Submitted 17 March, 2025;
originally announced March 2025.
-
ValSub: Subsampling Validation Data to Mitigate Forgetting during ASR Personalization
Authors:
Haaris Mehmood,
Karthikeyan Saravanan,
Pablo Peso Parada,
David Tuckey,
Mete Ozay,
Gil Ho Lee,
Jungin Lee,
Seokyeong Jung
Abstract:
Automatic Speech Recognition (ASR) is widely used within consumer devices such as mobile phones. Recently, personalization or on-device model fine-tuning has shown that adaptation of ASR models towards target user speech improves their performance over rare words or accented speech. Despite these gains, fine-tuning on user data (target domain) risks the personalized model forgetting knowledge about its original training distribution (source domain), i.e., catastrophic forgetting, leading to subpar general ASR performance. A simple and efficient approach to combat catastrophic forgetting is to measure forgetting via a validation set that represents the source domain distribution. However, such validation sets are large and impractical for mobile devices. To this end, we propose a novel method to subsample a substantially large validation set into a smaller one while maintaining the ability to estimate forgetting. We demonstrate the efficacy of such a dataset in mitigating forgetting by utilizing it to dynamically determine the ideal number of fine-tuning epochs. When measuring the deviations in per-user fine-tuning epochs against a 50x larger validation set (oracle), our method achieves a lower mean-absolute-error (3.39) compared to randomly selected subsets of the same size (3.78-8.65). Unlike random baselines, our method consistently tracks the oracle's behaviour across three different forgetting thresholds.
Submitted 7 April, 2025; v1 submitted 12 March, 2025;
originally announced March 2025.
-
DeePen: Penetration Testing for Audio Deepfake Detection
Authors:
Nicolas Müller,
Piotr Kawa,
Adriana Stan,
Thien-Phuc Doan,
Souhwan Jung,
Wei Herng Choong,
Philip Sperl,
Konstantin Böttinger
Abstract:
Deepfakes - manipulated or forged audio and video media - pose significant security risks to individuals, organizations, and society at large. To address these challenges, machine learning-based classifiers are commonly employed to detect deepfake content. In this paper, we assess the robustness of such classifiers through a systematic penetration testing methodology, which we introduce as DeePen. Our approach operates without prior knowledge of or access to the target deepfake detection models. Instead, it leverages a set of carefully selected signal processing modifications - referred to as attacks - to evaluate model vulnerabilities. Using DeePen, we analyze both real-world production systems and publicly available academic model checkpoints, demonstrating that all tested systems exhibit weaknesses and can be reliably deceived by simple manipulations such as time-stretching or echo addition. Furthermore, our findings reveal that while some attacks can be mitigated by retraining detection systems with knowledge of the specific attack, others remain persistently effective. We release all associated code.
Submitted 5 March, 2025; v1 submitted 27 February, 2025;
originally announced February 2025.
-
SPARK: A Modular Benchmark for Humanoid Robot Safety
Authors:
Yifan Sun,
Rui Chen,
Kai S. Yun,
Yikuan Fang,
Sebin Jung,
Feihan Li,
Bowei Li,
Weiye Zhao,
Changliu Liu
Abstract:
This paper introduces the Safe Protective and Assistive Robot Kit (SPARK), a comprehensive benchmark designed to ensure safety in humanoid autonomy and teleoperation. Humanoid robots pose significant safety risks due to their physical capabilities of interacting with complex environments. The physical structures of humanoid robots further add complexity to the design of general safety solutions. To facilitate the safe deployment of complex robot systems, SPARK can be used as a toolbox that comes with state-of-the-art safe control algorithms in a modular and composable robot control framework. Users can easily configure safety criteria and sensitivity levels to optimize the balance between safety and performance. To accelerate humanoid safety research and development, SPARK provides a simulation benchmark that compares safety approaches in a variety of environments, tasks, and robot models. Furthermore, SPARK allows quick deployment of synthesized safe controllers on real robots. For hardware deployment, SPARK supports Apple Vision Pro (AVP) or a Motion Capture System as external sensors, while also offering interfaces for seamless integration with alternative hardware setups. This paper demonstrates SPARK's capability with both simulation experiments and case studies with a Unitree G1 humanoid robot. Leveraging these advantages of SPARK, users and researchers can significantly improve the safety of their humanoid systems as well as accelerate relevant research. The open-source code is available at https://github.com/intelligent-control-lab/spark.
Submitted 5 February, 2025;
originally announced February 2025.
-
Separated Inter/Intra-Modal Fusion Prompts for Compositional Zero-Shot Learning
Authors:
Sua Jung
Abstract:
Compositional Zero-Shot Learning (CZSL) aims to recognize subtle differences in meaning or the combination of states and objects through the use of known and unknown concepts during training. Existing methods either focused on prompt configuration or on using prompts to tune the pre-trained Vision-Language model. However, these methods faced challenges in accurately identifying subtle differences in meaning or combining states with objects. To jointly eradicate the above issues and construct an efficient and effective CZSL technique, we suggest a method to improve attribute recognition performance by utilizing diverse Prompt Learning with an Inter/Intra-Modality Fusion Synthesizer in scene understanding involving subtle semantic differences and multiple objects.
Submitted 21 January, 2025;
originally announced January 2025.
-
persoDA: Personalized Data Augmentation for Personalized ASR
Authors:
Pablo Peso Parada,
Spyros Fontalis,
Md Asif Jalal,
Karthikeyan Saravanan,
Anastasios Drosou,
Mete Ozay,
Gil Ho Lee,
Jungin Lee,
Seokyeong Jung
Abstract:
Data augmentation (DA) is ubiquitously used in training of Automatic Speech Recognition (ASR) models. DA offers increased data variability, robustness and generalization against different acoustic distortions. Recently, personalization of ASR models on mobile devices has been shown to improve Word Error Rate (WER). This paper evaluates data augmentation in this context and proposes persoDA, a DA method driven by the user data used to personalize ASR. persoDA aims to augment training with data specifically tuned towards the acoustic characteristics of the end-user, as opposed to standard augmentation based on Multi-Condition Training (MCT), which applies random reverberation and noise. Our evaluation with an ASR conformer-based baseline trained on LibriSpeech and personalized for VOICES shows that persoDA achieves a 13.9% relative WER reduction over using standard data augmentation (using random noise & reverberation). Furthermore, persoDA shows 16% to 20% faster convergence over MCT.
Submitted 17 January, 2025; v1 submitted 15 January, 2025;
originally announced January 2025.
-
Deformation-Aware Segmentation Network Robust to Motion Artifacts for Brain Tissue Segmentation using Disentanglement Learning
Authors:
Sunyoung Jung,
Yoonseok Choi,
Mohammed A. Al-masni,
Minyoung Jung,
Dong-Hyun Kim
Abstract:
Motion artifacts caused by prolonged acquisition time are a significant challenge in Magnetic Resonance Imaging (MRI), hindering accurate tissue segmentation. These artifacts appear as blurred images that mimic tissue-like appearances, making segmentation difficult. This study proposes a novel deep learning framework that demonstrates superior performance in both motion correction and robust brain tissue segmentation in the presence of artifacts. The core concept lies in a complementary process: a disentanglement learning network progressively removes artifacts, leading to cleaner images and, consequently, more accurate segmentation by a jointly trained motion estimation and segmentation network. This network generates three outputs: a motion-corrected image, a motion deformation map that identifies artifact-affected regions, and a brain tissue segmentation mask. The deformation map serves as a guidance mechanism for the disentanglement process, aiding the model in recovering lost information or removing artificial structures introduced by the artifacts. Extensive in-vivo experiments on pediatric motion data demonstrate that our proposed framework outperforms state-of-the-art methods in segmenting motion-corrupted MRI scans.
Submitted 5 December, 2024;
originally announced December 2024.
-
ISLES'24: Final Infarct Prediction with Multimodal Imaging and Clinical Data. Where Do We Stand?
Authors:
Ezequiel de la Rosa,
Ruisheng Su,
Mauricio Reyes,
Evamaria O. Riedel,
Hakim Baazaoui,
Roland Wiest,
Florian Kofler,
Kaiyuan Yang,
David Robben,
Mahsa Mojtahedi,
Laura van Poppel,
Lucas de Vries,
Anthony Winder,
Kimberly Amador,
Nils D. Forkert,
Gyeongyeon Hwang,
Jiwoo Song,
Dohyun Kim,
Eneko Uruñuela,
Annabella Bregazzi,
Matthias Wilms,
Hyun Yang,
Jin Tae Kwak,
Sumin Jung,
Luan Matheus Trindade Dalmazo
, et al. (15 additional authors not shown)
Abstract:
Accurate estimation of brain infarction (i.e., irreversibly damaged tissue) is critical for guiding treatment decisions in acute ischemic stroke. Reliable infarct prediction informs key clinical interventions, including the need for patient transfer to comprehensive stroke centers, the potential benefit of additional reperfusion attempts during mechanical thrombectomy, decisions regarding secondary neuroprotective treatments, and ultimately, prognosis of clinical outcomes. This work introduces the Ischemic Stroke Lesion Segmentation (ISLES) 2024 challenge, which focuses on the prediction of final infarct volumes from pre-interventional acute stroke imaging and clinical data. ISLES24 provides a comprehensive, multimodal setting where participants can leverage all clinically and practically available data, including full acute CT imaging, sub-acute follow-up MRI, and structured clinical information, across a train set of 150 cases. On the hidden test set of 98 cases, the top-performing model, a multimodal nnU-Net-based architecture, achieved a Dice score of 0.285 (+/- 0.213) and an absolute volume difference of 21.2 (+/- 37.2) mL, underlining the significant challenges posed by this task and the need for further advances in multimodal learning. This work makes two primary contributions: first, we establish a standardized, clinically realistic benchmark for post-treatment infarct prediction, enabling systematic evaluation of multimodal algorithmic strategies on a longitudinal stroke dataset; second, we analyze current methodological limitations and outline key research directions to guide the development of next-generation infarct prediction models.
Submitted 7 July, 2025; v1 submitted 20 August, 2024;
originally announced August 2024.
-
BPMP-Tracker: A Versatile Aerial Target Tracker Using Bernstein Polynomial Motion Primitives
Authors:
Yunwoo Lee,
Jungwon Park,
Boseong Jeon,
Seungwoo Jung,
H. Jin Kim
Abstract:
This letter presents a versatile trajectory planning pipeline for aerial tracking. The proposed tracker is capable of handling various chasing settings such as complex unstructured environments, crowded dynamic obstacles and multiple-target following. Within the pipeline, we focus on developing a predictor for future target motion and a chasing trajectory planner. For rapid computation, we employ the sample-check-select strategy: modules sample a set of candidate movements, check multiple constraints, and then select the best trajectory. We also leverage the properties of Bernstein polynomials for quick calculations. The prediction module predicts trajectories of the targets that do not overlap with static and dynamic obstacles. The trajectory planner then outputs a trajectory that satisfies various conditions such as occlusion and collision avoidance, the visibility of all targets within a camera image, and dynamical limits. We fully test the proposed tracker in simulations and hardware experiments under challenging scenarios, including dual-target following, environments with dozens of dynamic obstacles and complex indoor and outdoor spaces.
Submitted 8 August, 2024;
originally announced August 2024.
-
Quantum Multi-Agent Reinforcement Learning for Cooperative Mobile Access in Space-Air-Ground Integrated Networks
Authors:
Gyu Seon Kim,
Yeryeong Cho,
Jaehyun Chung,
Soohyun Park,
Soyi Jung,
Zhu Han,
Joongheon Kim
Abstract:
Achieving global space-air-ground integrated network (SAGIN) access only with CubeSats presents significant challenges such as the access sustainability limitations in specific regions (e.g., polar regions) and the energy efficiency limitations in CubeSats. To tackle these problems, high-altitude long-endurance unmanned aerial vehicles (HALE-UAVs) can complement these CubeSat shortcomings by cooperatively providing global access sustainability and energy efficiency. However, as the number of CubeSats and HALE-UAVs increases, the scheduling dimension of each ground station (GS) increases. As a result, each GS can fall into the curse of dimensionality, and this challenge becomes one major hurdle for efficient global access. Therefore, this paper provides a quantum multi-agent reinforcement learning (QMARL)-based method for scheduling between GSs and CubeSats/HALE-UAVs in order to improve global access availability and energy efficiency. The main reason why the QMARL-based scheduler can be beneficial is that the algorithm facilitates a logarithmic-scale reduction in scheduling action dimensions, which is one critical feature as the number of CubeSats and HALE-UAVs expands. Additionally, individual GSs have different traffic demands depending on their locations and characteristics, so it is essential to provide differentiated access services. The superiority of the proposed scheduler is validated through data-intensive experiments in realistic CubeSat/HALE-UAV settings.
Submitted 24 June, 2024;
originally announced June 2024.
-
Lesion-Aware Cross-Phase Attention Network for Renal Tumor Subtype Classification on Multi-Phase CT Scans
Authors:
Kwang-Hyun Uhm,
Seung-Won Jung,
Sung-Hoo Hong,
Sung-Jea Ko
Abstract:
Multi-phase computed tomography (CT) has been widely used for the preoperative diagnosis of kidney cancer due to its non-invasive nature and ability to characterize renal lesions. However, since enhancement patterns of renal lesions across CT phases are different even for the same lesion type, the visual assessment by radiologists suffers from inter-observer variability in clinical practice. Although deep learning-based approaches have been recently explored for differential diagnosis of kidney cancer, they do not explicitly model the relationships between CT phases in the network design, limiting the diagnostic performance. In this paper, we propose a novel lesion-aware cross-phase attention network (LACPANet) that can effectively capture temporal dependencies of renal lesions across CT phases to accurately classify the lesions into five major pathological subtypes from time-series multi-phase CT images. We introduce a 3D inter-phase lesion-aware attention mechanism to learn effective 3D lesion features that are used to estimate attention weights describing the inter-phase relations of the enhancement patterns. We also present a multi-scale attention scheme to capture and aggregate temporal patterns of lesion features at different spatial scales for further improvement. Extensive experiments on multi-phase CT scans of kidney cancer patients from the collected dataset demonstrate that our LACPANet outperforms state-of-the-art approaches in diagnostic accuracy.
Submitted 24 June, 2024;
originally announced June 2024.
-
Towards Efficient and Accurate CT Segmentation via Edge-Preserving Probabilistic Downsampling
Authors:
Shahzad Ali,
Yu Rim Lee,
Soo Young Park,
Won Young Tak,
Soon Ki Jung
Abstract:
Downsampling images and labels, often necessitated by limited resources or to expedite network training, leads to the loss of small objects and thin boundaries. This undermines the segmentation network's capacity to interpret images accurately and predict detailed labels, resulting in diminished performance compared to processing at original resolutions. This situation exemplifies the trade-off between efficiency and accuracy, with higher downsampling factors further impairing segmentation outcomes. Preserving information during downsampling is especially critical for medical image segmentation tasks. To tackle this challenge, we introduce a novel method named Edge-preserving Probabilistic Downsampling (EPD). It utilizes class uncertainty within a local window to produce soft labels, with the window size dictating the downsampling factor. This enables a network to produce quality predictions at low resolutions. EPD preserves edge details more effectively than conventional nearest-neighbor downsampling. In addition, a similar algorithm applied to images surpasses bilinear interpolation in image downsampling, enhancing overall performance. Our method significantly improved Intersection over Union (IoU) to 2.85%, 8.65%, and 11.89% when downsampling data to 1/2, 1/4, and 1/8, respectively, compared to conventional interpolation methods.
Submitted 5 April, 2024;
originally announced April 2024.
-
LR-FHSS Transceiver for Direct-to-Satellite IoT Communications: Design, Implementation, and Verification
Authors:
Sooyeob Jung,
Seongah Jeong,
Jinkyu Kang,
Gyeongrae Im,
Sangjae Lee,
Mi-Kyung Oh,
Joon Gyu Ryu,
Joonhyuk Kang
Abstract:
This paper proposes a long range-frequency hopping spread spectrum (LR-FHSS) transceiver design for the Direct-to-Satellite Internet of Things (DtS-IoT) communication system. The DtS-IoT system has recently attracted attention as a promising nonterrestrial network (NTN) solution to provide high-traffic and low-latency data transfer services to IoT devices in global coverage. In particular, this study provides guidelines for the overall DtS-IoT system architecture and design details that conform to the Long Range Wide-Area Network (LoRaWAN). Furthermore, we also detail various DtS-IoT use cases. Considering the multiple low-Earth orbit (LEO) satellites, we developed the LR-FHSS transceiver to improve system efficiency, which is the first attempt in real satellite communication systems using LR-FHSS. Moreover, as an extension of our previous work with perfect synchronization, we applied a robust synchronization scheme against the Doppler effect and co-channel interference (CCI) caused by LEO satellite channel environments, including signal detection for the simultaneous reception of numerous frequency hopping signals and an enhanced soft-output-Viterbi-algorithm (SOVA) for the header and payload receptions. Lastly, we present proof-of-concept implementation and testbeds using an application-specific integrated circuit (ASIC) chipset and a field-programmable gate array (FPGA) that verify the performance of the proposed LR-FHSS transceiver design of DtS-IoT communication systems. The laboratory test results reveal that the proposed LR-FHSS-based framework with the robust synchronization technique can provide wide coverage, seamless connectivity, and high throughput communication links for the realization of future sixth-generation (6G) networks.
Submitted 21 March, 2024;
originally announced March 2024.
-
Spectrum Translation for Refinement of Image Generation (STIG) Based on Contrastive Learning and Spectral Filter Profile
Authors:
Seokjun Lee,
Seung-Won Jung,
Hyunseok Seo
Abstract:
Image generation and synthesis have progressed remarkably with generative models. Despite photo-realistic results, intrinsic discrepancies are still observed in the frequency domain. The spectral discrepancy appears not only in generative adversarial networks (GANs) but also in diffusion models. In this study, we propose a framework to effectively mitigate the disparity in the frequency domain of the generated images to improve the generative performance of both GAN and diffusion models. This is realized by spectrum translation for the refinement of image generation (STIG) based on contrastive learning. We adopt the theoretical logic of frequency components in various generative networks. The key idea is to refine the spectrum of the generated image via the concept of image-to-image translation and contrastive learning in terms of digital signal processing. We evaluate our framework across eight fake image datasets and various cutting-edge models to demonstrate the effectiveness of STIG. Our framework outperforms other cutting-edge methods, showing significant decreases in FID and log frequency distance of spectrum. We further emphasize that STIG improves image quality by decreasing the spectral anomaly. Additionally, validation results show that a frequency-based deepfake detector is confused more often when fake spectra are manipulated by STIG.
Submitted 8 March, 2024;
originally announced March 2024.
-
Intelli-Z: Toward Intelligible Zero-Shot TTS
Authors:
Sunghee Jung,
Won Jang,
Jaesam Yoon,
Bongwan Kim
Abstract:
Although numerous recent studies have suggested new frameworks for zero-shot TTS using large-scale, real-world data, studies that focus on the intelligibility of zero-shot TTS are relatively scarce. Zero-shot TTS demands additional efforts to ensure clear pronunciation and speech quality due to its inherent requirement of replacing a core parameter (speaker embedding or acoustic prompt) with a new one at the inference stage. In this study, we propose a zero-shot TTS model focused on intelligibility, which we refer to as Intelli-Z. Intelli-Z learns speaker embeddings by using multi-speaker TTS as its teacher and is trained with a cycle-consistency loss to include mismatched text-speech pairs for training. Additionally, it selectively aggregates speaker embeddings along the temporal dimension to minimize the interference of the text content of reference speech at the inference stage. We substantiate the effectiveness of the proposed methods with an ablation study. The Mean Opinion Score (MOS) increases by 9% for unseen speakers when the first two methods are applied, and it further improves by 16% when selective temporal aggregation is applied.
Submitted 24 January, 2024;
originally announced January 2024.
-
Locality enhanced dynamic biasing and sampling strategies for contextual ASR
Authors:
Md Asif Jalal,
Pablo Peso Parada,
George Pavlidis,
Vasileios Moschopoulos,
Karthikeyan Saravanan,
Chrysovalantis-Giorgos Kontoulis,
Jisi Zhang,
Anastasios Drosou,
Gil Ho Lee,
Jungin Lee,
Seokyeong Jung
Abstract:
Automatic Speech Recognition (ASR) still faces challenges when recognizing time-variant rare phrases. Contextual biasing (CB) modules bias the ASR model towards such contextually relevant phrases. During training, a list of biasing phrases is selected from a large pool of phrases following a sampling strategy. In this work, we first analyse different sampling strategies to provide insights into the training of CB for ASR, with correlation plots between the bias embeddings among various training stages. Second, we introduce a neighbourhood attention (NA) that localizes self attention (SA) to the nearest neighbouring frames to further refine the CB output. The results show that the proposed approach provides on average a 25.84% relative WER improvement on LibriSpeech sets and rare-word evaluation compared to the baseline.
Submitted 23 January, 2024;
originally announced January 2024.
-
Consistency Based Unsupervised Self-training For ASR Personalisation
Authors:
Jisi Zhang,
Vandana Rajan,
Haaris Mehmood,
David Tuckey,
Pablo Peso Parada,
Md Asif Jalal,
Karthikeyan Saravanan,
Gil Ho Lee,
Jungin Lee,
Seokyeong Jung
Abstract:
On-device Automatic Speech Recognition (ASR) models trained on speech data of a large population might underperform for individuals unseen during training. This is due to a domain shift between user data and the original training data, which differ in the user's speaking characteristics and environmental acoustic conditions. ASR personalisation is a solution that aims to exploit user data to improve model robustness. The majority of ASR personalisation methods assume labelled user data for supervision. Personalisation without any labelled data is challenging due to limited data size and poor quality of recorded audio samples. This work addresses unsupervised personalisation by developing a novel consistency based training method via pseudo-labelling. Our method achieves a relative Word Error Rate Reduction (WERR) of 17.3% on unlabelled training data and 8.1% on held-out data compared to a pre-trained model, and outperforms the current state-of-the-art methods.
Submitted 22 January, 2024;
originally announced January 2024.
-
A Unified Multi-Phase CT Synthesis and Classification Framework for Kidney Cancer Diagnosis with Incomplete Data
Authors:
Kwang-Hyun Uhm,
Seung-Won Jung,
Moon Hyung Choi,
Sung-Hoo Hong,
Sung-Jea Ko
Abstract:
Multi-phase CT is widely adopted for the diagnosis of kidney cancer due to the complementary information among phases. However, the complete set of multi-phase CT is often not available in practical clinical applications. In recent years, some studies have attempted to generate the missing modality image from the available data. Nevertheless, the generated images are not guaranteed to be effective for the diagnosis task. In this paper, we propose a unified framework for kidney cancer diagnosis with incomplete multi-phase CT, which simultaneously recovers missing CT images and classifies cancer subtypes using the completed set of images. The advantage of our framework is that it encourages a synthesis model to explicitly learn to generate missing CT phases that are helpful for classifying cancer subtypes. We further incorporate a lesion segmentation network into our framework to exploit lesion-level features for effective cancer classification in whole CT volumes. The proposed framework is based on fully 3D convolutional neural networks to jointly optimize both synthesis and classification of 3D CT volumes. Extensive experiments on both in-house and external datasets demonstrate the effectiveness of our framework for diagnosis with incomplete data compared with state-of-the-art baselines. In particular, cancer subtype classification using the CT data completed by our method achieves higher performance than classification using the given incomplete data.
Submitted 9 December, 2023;
originally announced December 2023.
-
Exploring 3D U-Net Training Configurations and Post-Processing Strategies for the MICCAI 2023 Kidney and Tumor Segmentation Challenge
Authors:
Kwang-Hyun Uhm,
Hyunjun Cho,
Zhixin Xu,
Seohoon Lim,
Seung-Won Jung,
Sung-Hoo Hong,
Sung-Jea Ko
Abstract:
In 2023, it is estimated that 81,800 kidney cancer cases will be newly diagnosed, and 14,890 people will die from this cancer in the United States. Preoperative dynamic contrast-enhanced abdominal computed tomography (CT) is often used for detecting lesions. However, there exists inter-observer variability due to subtle differences in the imaging features of kidney and kidney tumors. In this paper, we explore various 3D U-Net training configurations and effective post-processing strategies for accurate segmentation of kidneys, cysts, and kidney tumors in CT images. We validated our model on the dataset of the 2023 Kidney and Kidney Tumor Segmentation (KiTS23) challenge. Our method took second place in the final ranking of the KiTS23 challenge on unseen test data with an average Dice score of 0.820 and an average Surface Dice of 0.712.
Submitted 9 December, 2023;
originally announced December 2023.
-
J-Net: Improved U-Net for Terahertz Image Super-Resolution
Authors:
Woon-Ha Yeo,
Seung-Hwan Jung,
Seung Jae Oh,
Inhee Maeng,
Eui Su Lee,
Han-Cheol Ryu
Abstract:
Terahertz (THz) waves are electromagnetic waves in the 0.1 to 10 THz frequency range, and THz imaging is utilized in a range of applications, including security inspections, biomedical fields, and the non-destructive examination of materials. However, THz images have low resolution due to the long wavelength of THz waves. Therefore, improving the resolution of THz images is one of the current hot research topics. We propose a novel network architecture called J-Net, which is an improved version of U-Net, to solve THz image super-resolution. It employs simple baseline blocks which can extract low-resolution (LR) image features and learn the mapping of LR images to high-resolution (HR) images efficiently. All training was conducted using the DIV2K+Flickr2K dataset, and we employed the peak signal-to-noise ratio (PSNR) for quantitative comparison. In our comparisons with other THz image super-resolution methods, J-Net achieved a PSNR of 32.52 dB, surpassing other techniques by more than 1 dB. J-Net also demonstrates superior performance on real THz images compared to other methods. Experiments show that the proposed J-Net achieves better PSNR and visual improvement compared with other THz image super-resolution methods.
Submitted 4 December, 2023;
originally announced December 2023.
-
Ultrasensitive Textile Strain Sensors Redefine Wearable Silent Speech Interfaces with High Machine Learning Efficiency
Authors:
Chenyu Tang,
Muzi Xu,
Wentian Yi,
Zibo Zhang,
Edoardo Occhipinti,
Chaoqun Dong,
Dafydd Ravenscroft,
Sung-Min Jung,
Sanghyo Lee,
Shuo Gao,
Jong Min Kim,
Luigi G. Occhipinti
Abstract:
Our research presents a wearable Silent Speech Interface (SSI) technology that excels in device comfort, time-energy efficiency, and speech decoding accuracy for real-world use. We developed a biocompatible, durable textile choker with an embedded graphene-based strain sensor, capable of accurately detecting subtle throat movements. This sensor, surpassing other strain sensors in sensitivity by 420%, simplifies signal processing compared to traditional voice recognition methods. Our system uses a computationally efficient neural network, specifically a one-dimensional convolutional neural network with residual structures, to decode speech signals. This network is energy and time-efficient, reducing computational load by 90% while achieving 95.25% accuracy for a 20-word lexicon and swiftly adapting to new users and words with minimal samples. This innovation demonstrates a practical, sensitive, and precise wearable SSI suitable for daily communication applications.
Submitted 7 December, 2023; v1 submitted 27 November, 2023;
originally announced November 2023.
-
On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer
Authors:
Md Asif Jalal,
Pablo Peso Parada,
Jisi Zhang,
Karthikeyan Saravanan,
Mete Ozay,
Myoungji Han,
Jung In Lee,
Seokyeong Jung
Abstract:
Smart devices serviced by large-scale AI models necessitate user data transfer to the cloud for inference. For speech applications, this means transferring private user information, e.g., speaker identity. Our paper proposes a privacy-enhancing framework that targets speaker identity anonymization while preserving speech recognition accuracy for our downstream task, Automatic Speech Recognition (ASR). The proposed framework attaches flexible gradient reversal based speaker adversarial layers to target layers within an ASR model, where speaker adversarial training anonymizes acoustic embeddings generated by the targeted layers to remove speaker identity. We propose on-device deployment by executing the initial layers of the ASR model and transmitting anonymized embeddings to the cloud, where the rest of the model is executed while preserving privacy. Experimental results show that our method efficiently reduces speaker recognition relative accuracy by 33%, and improves ASR performance by achieving 6.2% relative Word Error Rate (WER) reduction.
Submitted 25 July, 2023;
originally announced July 2023.
-
Sound Demixing Challenge 2023 Music Demixing Track Technical Report: TFC-TDF-UNet v3
Authors:
Minseok Kim,
Jun Hyung Lee,
Soonyoung Jung
Abstract:
In this report, we present our award-winning solutions for the Music Demixing Track of Sound Demixing Challenge 2023. First, we propose TFC-TDF-UNet v3, a time-efficient music source separation model that achieves state-of-the-art results on the MUSDB benchmark. We then give full details regarding our solutions for each Leaderboard, including a loss masking approach for noise-robust training. Code for reproducing model training and final submissions is available at github.com/kuielab/sdx23.
Submitted 21 July, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Multi-Agent Reinforcement Learning for Cooperative Air Transportation Services in City-Wide Autonomous Urban Air Mobility
Authors:
Chanyoung Park,
Gyu Seon Kim,
Soohyun Park,
Soyi Jung,
Joongheon Kim
Abstract:
The development of urban air mobility (UAM) is progressing rapidly, and the demand for efficient transportation management systems is rising due to multifaceted environmental uncertainties. Thus, this paper proposes a novel air transportation service management algorithm based on multi-agent deep reinforcement learning (MADRL) to address the challenges of multi-UAM cooperation. Specifically, the proposed algorithm is based on the communication network (CommNet) method, utilizing centralized training and distributed execution (CTDE) in multiple UAMs to provide efficient air transportation services to passengers collaboratively. Furthermore, this paper adopts actual vertiport maps and UAM specifications for constructing realistic air transportation networks. Evaluation of the proposed algorithm in data-intensive simulations shows that it outperforms existing approaches in terms of air transportation service quality. Furthermore, no UAM performs worse than the others, owing to parameter sharing in CommNet and a centralized critic network in CTDE. Therefore, the results in this paper can provide a promising solution for autonomous air transportation management systems in city-wide urban areas.
Submitted 7 June, 2023;
originally announced June 2023.
-
Transceiver Design and Performance Analysis for LR-FHSS-based Direct-to-Satellite IoT
Authors:
Sooyeob Jung,
Seongah Jeong,
Jinkyu Kang,
Joon Gyu Ryu,
Joonhyuk Kang
Abstract:
This paper presents a novel transceiver design aimed at enabling Direct-to-Satellite Internet of Things (DtS-IoT) systems based on long range-frequency hopping spread spectrum (LR-FHSS). Our focus lies in developing an accurate transmission method through the analysis of the frame structure and key parameters outlined in Long Range Wide-Area Network (LoRaWAN) [1]. To address the Doppler effect in DtS-IoT networks and simultaneously receive numerous frequency hopping signals, a robust signal detector for the receiver is proposed. We verify the performance of the proposed LR-FHSS transceiver design through simulations conducted in a realistic satellite channel environment, assessing metrics such as miss detection probability and packet error probability.
Submitted 25 May, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Cross-domain Denoising for Low-dose Multi-frame Spiral Computed Tomography
Authors:
Yucheng Lu,
Zhixin Xu,
Moon Hyung Choi,
Jimin Kim,
Seung-Won Jung
Abstract:
Computed tomography (CT) has been used worldwide as a non-invasive test to assist in diagnosis. However, the ionizing nature of X-ray exposure raises concerns about potential health risks such as cancer. The desire for lower radiation doses has driven researchers to improve reconstruction quality. Although previous studies on low-dose computed tomography (LDCT) denoising have demonstrated the effectiveness of learning-based methods, most were developed on simulated data. However, the real-world scenario differs significantly from the simulation domain, especially when using the multi-slice spiral scanner geometry. This paper proposes a two-stage method for commercially available multi-slice spiral CT scanners that better exploits the complete reconstruction pipeline for LDCT denoising across different domains. Our approach makes good use of the high redundancy of multi-slice projections and the volumetric reconstructions while alleviating the over-smoothing problem in conventional cascaded frameworks caused by aggressive denoising. The dedicated design also provides a more explicit interpretation of the data flow. Extensive experiments on various datasets showed that the proposed method could remove up to 70% of noise without compromising spatial resolution, and subjective evaluations by two experienced radiologists further supported its superior performance against state-of-the-art methods in clinical practice.
Submitted 28 June, 2024; v1 submitted 21 April, 2023;
originally announced April 2023.
-
Learning to exploit z-Spatial Diversity for Coherent Nonlinear Optical Fiber Communication
Authors:
Sebastian Jung,
Tim Uhlemann,
Alexander Span,
Maximilian Bauhofer,
Stephan ten Brink
Abstract:
Higher-order solitons inherently possess a spatial periodicity along the propagation axis. The pulse expands and compresses in both the frequency and time domains. This property is exploited for a bandwidth-limited receiver by sampling the optical signal at two different distances. Numerical simulations show that when pure solitons are transmitted and the second (i.e., further propagated) signal is also processed, a significant gain in terms of required receiver bandwidth is obtained. Since all pulses propagating in a nonlinear optical fiber exhibit solitonic behavior given sufficient input power and propagation distance, the above concept can also be applied to spectrally efficient Nyquist pulse shaping and higher symbol rates. Transmitter and receiver are trainable structures as part of an autoencoder, aiming to learn a suitable predistortion and post-equalization using both signals to increase the spectral efficiency.
Submitted 12 April, 2023;
originally announced April 2023.
-
QP Chaser: Polynomial Trajectory Generation for Autonomous Aerial Tracking
Authors:
Yunwoo Lee,
Jungwon Park,
Seungwoo Jung,
Boseong Jeon,
Dahyun Oh,
H. Jin Kim
Abstract:
Maintaining the visibility of the target is one of the major objectives of aerial tracking missions. This paper proposes a target-visible trajectory planning pipeline using quadratic programming (QP). Our approach can handle various tracking settings, including 1) single- and dual-target following and 2) both static and dynamic environments, unlike other works that focus on a single specific setup. In contrast to other studies that fully trust the predicted trajectory of the target and consider only the visibility of the target's center, our pipeline considers error in target path prediction and the entire body of the target to maintain the target visibility robustly. First, a prediction module uses a sample-check strategy to quickly calculate the reachable sets of moving objects, which represent the areas their bodies can reach, considering obstacles. Subsequently, the planning module formulates a single QP problem, considering path topology, to generate a tracking trajectory that maximizes the visibility of the target's reachable set among obstacles. The performance of the planner is validated in multiple scenarios, through high-fidelity simulations and real-world experiments.
Submitted 26 November, 2024; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Marine IoT Systems with Space-Air-Sea Integrated Networks: Hybrid LEO and UAV Edge Computing
Authors:
Sooyeob Jung,
Seongah Jeong,
Jinkyu Kang,
Joonhyuk Kang
Abstract:
Marine Internet of Things (IoT) systems have grown substantially with the development of non-terrestrial networks (NTN) via aerial and space vehicles in the upcoming sixth-generation (6G), thereby assisting environment protection, military reconnaissance, and sea transportation. Due to unpredictable climate changes and the extreme channel conditions of maritime networks, however, it is challenging to efficiently and reliably collect and compute a huge amount of maritime data. In this paper, we propose a hybrid low-Earth orbit (LEO) and unmanned aerial vehicle (UAV) edge computing method in space-air-sea integrated networks for marine IoT systems. Specifically, two types of edge servers mounted on UAVs and LEO satellites are endowed with computational capabilities for the real-time utilization of the sizable data collected from ocean IoT sensors. Our system aims at minimizing the total energy consumption of the battery-constrained UAV by jointly optimizing the bit allocation of communication and computation along with the UAV path planning under latency, energy budget and operational constraints. For availability and practicality, the proposed methods were developed for three different cases according to the accessibility of the LEO satellite, "Always On," "Always Off" and "Intermediate Disconnected", by leveraging successive convex approximation (SCA) strategies. Via numerical results, we verify that significant energy savings can be accrued for all cases of LEO accessibility by means of joint optimization of bit allocation and UAV path planning compared to partial optimization schemes that design for only the bit allocation or trajectory of the UAV.
Submitted 10 January, 2023;
originally announced January 2023.
-
Situation-Aware Deep Reinforcement Learning for Autonomous Nonlinear Mobility Control in Cyber-Physical Loitering Munition Systems
Authors:
Hyunsoo Lee,
Soohyun Park,
Won Joon Yun,
Soyi Jung,
Joongheon Kim
Abstract:
With the rapid development of drone technologies, drones are widely used in many applications, including military domains. This paper proposes a novel situation-aware DRL-based autonomous nonlinear drone mobility control algorithm for cyber-physical loitering munition applications. On the battlefield, the design of a DRL-based autonomous control algorithm is not straightforward because real-world data gathering is generally not possible. Therefore, the approach in this paper is to construct a cyber-physical virtual environment with Unity. Based on the virtual cyber-physical battlefield scenarios, a DRL-based automated nonlinear drone mobility control algorithm can be designed, evaluated, and visualized. Moreover, real-world battlefield scenarios contain many obstacles, which are harmful to linear trajectory control. Thus, our proposed autonomous nonlinear drone mobility control algorithm utilizes situation-aware components that are implemented with a Raycast function in Unity virtual scenarios. Based on the gathered situation-aware information, the drone can autonomously and nonlinearly adjust its trajectory during flight. This approach is therefore beneficial for avoiding obstacles in obstacle-deployed battlefields. Our visualization-based performance evaluation shows that the proposed algorithm is superior to the other linear mobility control algorithms.
Submitted 31 December, 2022;
originally announced January 2023.
-
Neural Architectural Nonlinear Pre-Processing for mmWave Radar-based Human Gesture Perception
Authors:
Hankyul Baek,
Yoo Jeong Ha,
Minjae Yoo,
Soyi Jung,
Joongheon Kim
Abstract:
In modern on-driving computing environments, many sensors are used for context-aware applications. This paper utilizes two deep learning models, U-Net and EfficientNet, which consist of a convolutional neural network (CNN), to detect hand gestures and remove noise in the Range Doppler Map image that was measured through a millimeter-wave (mmWave) radar. To improve the performance of classification, accurate pre-processing algorithms are essential. Therefore, a novel pre-processing approach to denoise images before entering the first deep learning model stage increases the accuracy of classification. Thus, this paper proposes a deep neural network based high-performance nonlinear pre-processing method.
Submitted 7 November, 2022;
originally announced November 2022.
-
RAWtoBit: A Fully End-to-end Camera ISP Network
Authors:
Wooseok Jeong,
Seung-Won Jung
Abstract:
Image compression is the essential final processing unit in the camera image signal processing (ISP) pipeline. While many studies have sought to replace the conventional ISP pipeline with a single end-to-end optimized deep learning model, image compression is rarely considered as a part of the model. In this paper, we investigate the design of a fully end-to-end optimized camera ISP incorporating image compression. To this end, we propose the RAWtoBit network (RBN), which can effectively perform both tasks simultaneously. RBN is further improved with a novel knowledge distillation scheme by introducing two teacher networks, each specialized in one task. Extensive experiments demonstrate that our proposed method significantly outperforms alternative approaches in terms of rate-distortion trade-off.
Submitted 16 August, 2022;
originally announced August 2022.
-
Lightweight Encoder-Decoder Architecture for Foot Ulcer Segmentation
Authors:
Shahzad Ali,
Arif Mahmood,
Soon Ki Jung
Abstract:
Continuous monitoring of foot ulcer healing is needed to ensure the efficacy of a given treatment and to avoid any possibility of deterioration. Foot ulcer segmentation is an essential step in wound diagnosis. We developed a model that is similar in spirit to the well-established encoder-decoder and residual convolution neural networks. Our model includes a residual connection along with channel and spatial attention integrated within each convolution block. A simple patch-based approach for model training, test time augmentations, and majority voting on the obtained predictions resulted in superior performance. Our model did not leverage any readily available backbone architecture, pre-training on a similar external dataset, or any transfer learning techniques. With around 5 million network parameters in total, it is a significantly lightweight model compared with the available state-of-the-art models used for the foot ulcer segmentation task. Our experiments presented results at the patch level and image level. Applied to the publicly available Foot Ulcer Segmentation (FUSeg) Challenge dataset from MICCAI 2021, our model achieved state-of-the-art image-level performance of 88.22% in terms of Dice similarity score and ranked second on the official challenge leaderboard. We also showed an extremely simple solution that could be compared against the more advanced architectures.
Submitted 6 July, 2022;
originally announced July 2022.
-
FrequencyLowCut Pooling -- Plug & Play against Catastrophic Overfitting
Authors:
Julia Grabinski,
Steffen Jung,
Janis Keuper,
Margret Keuper
Abstract:
Over the last years, Convolutional Neural Networks (CNNs) have been the dominating neural architecture in a wide range of computer vision tasks. From an image and signal processing point of view, this success might be a bit surprising, as the inherent spatial pyramid design of most CNNs apparently violates basic signal processing laws, i.e., the Sampling Theorem, in their down-sampling operations. However, since poor sampling appeared not to affect model accuracy, this issue was broadly neglected until model robustness started to receive more attention. Recent work [17] in the context of adversarial attacks and distribution shifts showed that there is, after all, a strong correlation between the vulnerability of CNNs and aliasing artifacts induced by poor down-sampling operations. This paper builds on these findings and introduces an aliasing-free down-sampling operation which can easily be plugged into any CNN architecture: FrequencyLowCut pooling. Our experiments show that, in combination with simple and fast FGSM adversarial training, our hyper-parameter-free operator significantly improves model robustness and avoids catastrophic overfitting.
Submitted 20 September, 2022; v1 submitted 1 April, 2022;
originally announced April 2022.
-
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
Authors:
Dan Lim,
Sunghee Jung,
Eesung Kim
Abstract:
In neural text-to-speech (TTS), a two-stage system, or a cascade of separately learned models, has shown synthesis quality close to human speech. For example, FastSpeech2 transforms an input text to a mel-spectrogram and then HiFi-GAN generates a raw waveform from the mel-spectrogram; they are called an acoustic feature generator and a neural vocoder, respectively. However, their training pipeline is somewhat cumbersome in that it requires fine-tuning and an accurate speech-text alignment for optimal performance. In this work, we present an end-to-end text-to-speech (E2E-TTS) model which has a simplified training pipeline and outperforms a cascade of separately learned models. Specifically, our proposed model is a jointly trained FastSpeech2 and HiFi-GAN with an alignment module. Since there is no acoustic feature mismatch between training and inference, it does not require fine-tuning. Furthermore, we remove the dependency on an external speech-text alignment tool by adopting an alignment learning objective in our joint training framework. Experiments on the LJSpeech corpus show that the proposed model outperforms publicly available, state-of-the-art implementations of ESPNet2-TTS on subjective evaluation (MOS) and some objective evaluations.
Submitted 1 July, 2022; v1 submitted 31 March, 2022;
originally announced March 2022.
-
Feasibility Study of Multi-Site Split Learning for Privacy-Preserving Medical Systems under Data Imbalance Constraints in COVID-19, X-Ray, and Cholesterol Dataset
Authors:
Yoo Jeong Ha,
Gusang Lee,
Minjae Yoo,
Soyi Jung,
Seehwan Yoo,
Joongheon Kim
Abstract:
It seems as though progressively more people are in the race to upload content, data, and information online, and hospitals haven't neglected this trend either. Hospitals are now at the forefront of multi-site medical data sharing to provide groundbreaking advancements in the way health records are shared and patients are diagnosed. Sharing of medical data is essential in modern medical research. Yet, as with all data sharing technology, the challenge is to balance improved treatment with protecting patients' personal information. This paper provides a novel split learning algorithm, termed "multi-site split learning", which enables a secure transfer of medical data between multiple hospitals without fear of exposing personal data contained in patient records. It also explores the effects of varying the number of end-systems and the ratio of data imbalance on the deep learning performance. A guideline for the optimal configuration of split learning that ensures privacy of patient data whilst achieving performance is empirically given. We argue the benefits of our multi-site split learning algorithm, especially regarding the privacy-preserving factor, using CT scans of COVID-19 patients, X-ray bone scans, and cholesterol level medical data.
Submitted 20 February, 2022;
originally announced February 2022.
-
Cooperative Multi-Agent Deep Reinforcement Learning for Reliable Surveillance via Autonomous Multi-UAV Control
Authors:
Won Joon Yun,
Soohyun Park,
Joongheon Kim,
MyungJae Shin,
Soyi Jung,
David A. Mohaisen,
Jae-Hyun Kim
Abstract:
CCTV-based surveillance using unmanned aerial vehicles (UAVs) is considered a key technology for security in smart city environments. This paper creates a case where UAVs with CCTV cameras fly over the city area for flexible and reliable surveillance services. UAVs should be deployed to cover a large area while minimizing overlapping and shadow areas for a reliable surveillance system. However, the operation of UAVs is subject to high uncertainty, necessitating autonomous recovery systems. This work develops a multi-agent deep reinforcement learning-based management scheme for reliable industry surveillance in smart city applications. The core idea this paper employs is autonomously replenishing the UAV's deficient network requirements with communications. In intensive simulations, our proposed algorithm outperforms the state-of-the-art algorithms in terms of surveillance coverage, user support capability, and computational costs.
Submitted 15 January, 2022;
originally announced January 2022.
-
Learning source-aware representations of music in a discrete latent space
Authors:
Jinsung Kim,
Yeong-Seok Jeong,
Woosung Choi,
Jaehwa Chung,
Soonyoung Jung
Abstract:
In recent years, neural network based methods have been proposed that can generate representations from music, but these representations are not human readable and are hardly analyzable or editable by a human. To address this issue, we propose a novel method to learn source-aware latent representations of music through a Vector-Quantized Variational Auto-Encoder (VQ-VAE). We train our VQ-VAE to encode an input mixture into a tensor of integers in a discrete latent space, and design them to have a decomposed structure which allows humans to manipulate the latent vector in a source-aware manner. This paper also shows that we can generate basslines by estimating latent vectors in a discrete space.
Submitted 26 November, 2021;
originally announced November 2021.
-
LightSAFT: Lightweight Latent Source Aware Frequency Transform for Source Separation
Authors:
Yeong-Seok Jeong,
Jinsung Kim,
Woosung Choi,
Jaehwa Chung,
Soonyoung Jung
Abstract:
Conditioned source separations have attracted significant attention because of their flexibility, applicability and extensibility. Their performance was usually inferior to the existing approaches, such as the single source separation model. However, a recently proposed method called LaSAFT-Net has shown that conditioned models can show comparable performance against existing single-source separation models. This paper presents LightSAFT-Net, a lightweight version of LaSAFT-Net. As a baseline, it provided a sufficient SDR performance for comparison during the Music Demixing Challenge at ISMIR 2021. This paper also enhances the existing LightSAFT-Net by replacing the LightSAFT blocks in the encoder with TFC-TDF blocks. Our enhanced LightSAFT-Net outperforms the previous one with fewer parameters.
Submitted 26 January, 2022; v1 submitted 24 November, 2021;
originally announced November 2021.
-
KUIELab-MDX-Net: A Two-Stream Neural Network for Music Demixing
Authors:
Minseok Kim,
Woosung Choi,
Jaehwa Chung,
Daewon Lee,
Soonyoung Jung
Abstract:
Recently, many methods based on deep learning have been proposed for music source separation. Some state-of-the-art methods have shown that stacking many layers with many skip connections improves the SDR performance. Although such a deep and complex architecture shows outstanding performance, it usually requires numerous computing resources and time for training and evaluation. This paper proposes a two-stream neural network for music demixing, called KUIELab-MDX-Net, which shows a good balance of performance and required resources. The proposed model has a time-frequency branch and a time-domain branch, each of which separates stems. It blends results from the two streams to generate the final estimation. KUIELab-MDX-Net took second place on leaderboard A and third place on leaderboard B in the Music Demixing Challenge at ISMIR 2021. This paper also summarizes experimental results on another benchmark, MUSDB18. Our source code is available online.
Submitted 23 November, 2021;
originally announced November 2021.
-
Stable Marriage Matching for Traffic-Aware Space-Air-Ground Integrated Networks: A Gale-Shapley Algorithmic Approach
Authors:
Hyunsoo Lee,
Haemin Lee,
Soyi Jung,
Joongheon Kim
Abstract:
In keeping with the rapid development of communication technology, a new communication structure is required in next-generation communication systems. In particular, research using High Altitude Platforms (HAPs) or Unmanned Aerial Vehicles (UAVs) in existing terrestrial networks is active. In this paper, we propose matching HAPs and UAVs using the Gale-Shapley algorithm in a relay communication situation. The numerical simulation results demonstrate that applying the Gale-Shapley algorithm yields superior performance compared to random matching.
Submitted 17 October, 2021;
originally announced October 2021.
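As background for the abstract above, the following is a minimal sketch of the textbook Gale-Shapley stable matching procedure. The abstract does not state which side proposes or how preferences are built, so the choice of UAVs as proposers, the one-to-one matching, the complete preference lists, and all names below are illustrative assumptions rather than the paper's formulation.

```python
def gale_shapley(proposer_prefs, acceptor_prefs):
    """Return a stable one-to-one matching as {proposer: acceptor}.

    Both arguments map each participant to a complete preference list
    (most preferred first) over the other side; both sides have equal size.
    """
    # rank[a][p] = position of proposer p in acceptor a's list (lower is better).
    rank = {a: {p: i for i, p in enumerate(prefs)}
            for a, prefs in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next index each proposer tries
    engaged = {}                                  # acceptor -> current proposer
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(a)
        if current is None:
            engaged[a] = p                 # acceptor was unmatched
        elif rank[a][p] < rank[a][current]:
            engaged[a] = p                 # acceptor prefers the new proposer
            free.append(current)
        else:
            free.append(p)                 # proposal rejected, try next choice

    return {p: a for a, p in engaged.items()}


if __name__ == "__main__":
    # Hypothetical preferences, e.g. derived from link quality or traffic load.
    uav_prefs = {"uav1": ["hapA", "hapB"], "uav2": ["hapA", "hapB"]}
    hap_prefs = {"hapA": ["uav2", "uav1"], "hapB": ["uav1", "uav2"]}
    print(gale_shapley(uav_prefs, hap_prefs))
```

With proposers on the UAV side, the resulting matching is stable and is the best stable matching from the UAVs' point of view; swapping the two arguments would favor the HAPs instead.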
-
Spatio-Temporal Split Learning for Privacy-Preserving Medical Platforms: Case Studies with COVID-19 CT, X-Ray, and Cholesterol Data
Authors:
Yoo Jeong Ha,
Minjae Yoo,
Gusang Lee,
Soyi Jung,
Sae Won Choi,
Joongheon Kim,
Seehwan Yoo
Abstract:
Machine learning requires a large volume of sample data, especially when it is used in high-accuracy medical applications. However, patient records are among the most sensitive private information and are not usually shared among institutes. This paper presents spatio-temporal split learning, a distributed deep neural network framework, which is a turning point in allowing collaboration among privacy-sensitive organizations. Our spatio-temporal split learning presents how distributed machine learning can be efficiently conducted with minimal privacy concerns. The proposed split learning consists of a number of clients and a centralized server. Each client has only one hidden layer, which acts as the privacy-preserving layer, and the centralized server comprises the other hidden layers and the output layer. Since the centralized server does not need to access the training data and trains the deep neural network with parameters received from the privacy-preserving layer, privacy of original data is guaranteed. We have coined the term spatio-temporal split learning, as multiple clients are spatially distributed to cover diverse datasets from different participants, and we can temporally split the learning process, detaching the privacy-preserving layer from the rest of the learning process to minimize privacy breaches. This paper shows how we can analyze the medical data whilst ensuring privacy using our proposed multi-site spatio-temporal split learning algorithm on Coronavirus Disease-19 (COVID-19) chest Computed Tomography (CT) scans, MUsculoskeletal RAdiographs (MURA) X-ray images, and cholesterol levels.
Submitted 20 August, 2021;
originally announced August 2021.
-
Quantum Scheduling for Millimeter-Wave Observation Satellite Constellation
Authors:
Joongheon Kim,
Yunseok Kwak,
Soyi Jung,
Jae-Hyun Kim
Abstract:
In beyond-5G and 6G network scenarios, the use of satellites has been actively discussed for extending target monitoring areas, even in extreme circumstances, where the monitoring functionalities can be realized through millimeter-wave wireless links. This paper designs an efficient scheduling algorithm which minimizes overlapping monitoring areas among observation satellite constellations. To achieve this objective, a quantum optimization based algorithm is used, because the overlapping can be mathematically modelled as a max-weight independent set (MWIS) problem, which is a well-known NP-hard problem.
Submitted 2 August, 2021;
originally announced August 2021.
-
Distributed and Autonomous Aerial Data Collection in Smart City Surveillance Applications
Authors:
Haemin Lee,
Soyi Jung,
Joongheon Kim
Abstract:
The massive growth of Smart City and Internet of Things applications enables safety and security. The data produced by surveillance cameras in aerial devices such as unmanned aerial vehicles (UAVs) need to be transferred to ground stations for secure data analysis. When the scale of the network is relatively large compared to the wireless communication coverage of a device, it is not always possible to transmit the data to the ground stations; thus, distributed and autonomous algorithms are essential. Based on these needs, we propose a novel algorithm for collecting surveillance data that considers the mobility and flexibility of UAV networks. Due to the battery limitation in UAVs, we selectively collect data from the UAVs by setting rules that consider distance and similarity. As a consequence, the UAV devices have to compete for a chance to get data processing. For this purpose, this paper designs a Myerson auction-based deep learning algorithm to increase the UAV's revenue compared to the traditional second-price auction while preserving truthfulness. Based on simulation results, we verify that our proposed algorithm achieves the desired performance improvements.
Submitted 25 July, 2021;
originally announced July 2021.
-
Progressive Joint Low-light Enhancement and Noise Removal for Raw Images
Authors:
Yucheng Lu,
Seung-Won Jung
Abstract:
Low-light imaging on mobile devices is typically challenging due to insufficient incident light coming through the relatively small aperture, resulting in a low signal-to-noise ratio. Most of the previous works on low-light image processing focus either only on a single task such as illumination adjustment, color enhancement, or noise removal; or on a joint illumination adjustment and denoising task that heavily relies on short-long exposure image pairs collected from specific camera models, and thus these approaches are less practical and generalizable in real-world settings where camera-specific joint enhancement and restoration is required. To tackle this problem, in this paper, we propose a low-light image processing framework that performs joint illumination adjustment, color enhancement, and denoising. Considering the difficulty in model-specific data collection and the ultra-high definition of the captured images, we design two branches: a coefficient estimation branch as well as a joint enhancement and denoising branch. The coefficient estimation branch works in a low-resolution space and predicts the coefficients for enhancement via bilateral learning, whereas the joint enhancement and denoising branch works in a full-resolution space and progressively performs joint enhancement and denoising. In contrast to existing methods, our framework does not need to recollect massive data when being adapted to another camera model, which significantly reduces the efforts required to fine-tune our approach for practical usage. Through extensive experiments, we demonstrate its great potential in real-world low-light imaging applications when compared with current state-of-the-art methods.
Submitted 2 September, 2022; v1 submitted 28 June, 2021;
originally announced June 2021.
-
AMSS-Net: Audio Manipulation on User-Specified Sources with Textual Queries
Authors:
Woosung Choi,
Minseok Kim,
Marco A. Martínez Ramírez,
Jaehwa Chung,
Soonyoung Jung
Abstract:
This paper proposes a neural network that performs audio transformations to user-specified sources (e.g., vocals) of a given audio track according to a given description while preserving other sources not mentioned in the description. Audio Manipulation on a Specific Source (AMSS) is challenging because a sound object (i.e., a waveform sample or frequency bin) is `transparent'; it usually carries information from multiple sources, in contrast to a pixel in an image. To address this challenging problem, we propose AMSS-Net, which extracts latent sources and selectively manipulates them while preserving irrelevant sources. We also propose an evaluation benchmark for several AMSS tasks, and we show that AMSS-Net outperforms baselines on several AMSS tasks via objective metrics and empirical verification.
Submitted 27 April, 2021;
originally announced April 2021.
-
LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation
Authors:
Woosung Choi,
Minseok Kim,
Jaehwa Chung,
Soonyoung Jung
Abstract:
Recent deep-learning approaches have shown that Frequency Transformation (FT) blocks can significantly improve spectrogram-based single-source separation models by capturing frequency patterns. The goal of this paper is to extend the FT block to fit the multi-source task. We propose the Latent Source Attentive Frequency Transformation (LaSAFT) block to capture source-dependent frequency patterns. We also propose the Gated Point-wise Convolutional Modulation (GPoCM), an extension of Feature-wise Linear Modulation (FiLM), to modulate internal features. By employing these two novel methods, we extend the Conditioned-U-Net (CUNet) for multi-source separation, and the experimental results indicate that our LaSAFT and GPoCM can improve the CUNet's performance, achieving state-of-the-art SDR performance on several MUSDB18 source separation tasks.
Submitted 14 April, 2021; v1 submitted 22 October, 2020;
originally announced October 2020.
-
A 6.3-Nanowatt-per-Channel 96-Channel Neural Spike Processor for a Movement-Intention-Decoding Brain-Computer-Interface Implant
Authors:
Zhewei Jiang,
Jiangyi Li,
Pavan K. Chundi,
Sung Justin Kim,
Minhao Yang,
Joonseong Kang,
Seungchul Jung,
Sang Joon Kim,
Mingoo Seok
Abstract:
This paper presents microwatt end-to-end neural signal processing hardware for deployment-stage real-time upper-limb movement intent decoding. This module features intercellular spike detection, sorting, and decoding operations for a 96-channel prosthetic implant. We design the algorithms for those operations to achieve minimal computation complexity while matching or advancing the accuracy of state-of-the-art Brain-Computer-Interface sorting and movement decoding. Based on those algorithms, we devise the architecture of the neural signal processing hardware with a focus on hardware reuse and event-driven operation. The design achieves among the highest levels of integration, reducing the wireless data rate by more than four orders of magnitude. The chip prototype, in a 180-nm high-VTH process, achieves the lowest power dissipation of 0.61 uW for 96 channels, 21X lower than the prior art at comparable or better accuracy, even with the integration of kinematic state estimation computation.
Submitted 10 September, 2020;
originally announced September 2020.
-
Adaptable Multi-Domain Language Model for Transformer ASR
Authors:
Taewoo Lee,
Min-Joong Lee,
Tae Gyoon Kang,
Seokyeoung Jung,
Minseok Kwon,
Yeona Hong,
Jungin Lee,
Kyoung-Gu Woo,
Ho-Gyeong Kim,
Jiseung Jeong,
Jihyun Lee,
Hosik Lee,
Young Sang Choi
Abstract:
We propose an adapter-based multi-domain Transformer-based language model (LM) for Transformer ASR. The model consists of a large common LM and small adapters. The model can perform multi-domain adaptation with only the small adapters and their related layers. The proposed model can reuse the fully fine-tuned LM, which is fine-tuned using all layers of an original model. The proposed LM can be expanded to new domains by adding about 2% of parameters for a first domain and 13% of parameters from the second domain onward. The proposed model is also effective in reducing the model maintenance cost because it is possible to omit the costly and time-consuming common LM pre-training process. Using the proposed adapter-based approach, we observed that a general LM with adapters can outperform a dedicated music domain LM in terms of word error rate (WER).
Submitted 10 February, 2021; v1 submitted 14 August, 2020;
originally announced August 2020.