Search | arXiv e-print repository

SEDD-PCC: A Single Encoder-Dual Decoder Framework For End-To-End Learned Point Cloud Compression

Authors: Kai Hsiang Hsieh, Monyneath Yim, Jui Chiu Chiang

Abstract: To encode point clouds containing both geometry and attributes, most learning-based compression schemes treat geometry and attribute coding separately, employing distinct encoders and decoders. This not only increases computational complexity but also fails to fully exploit shared features between geometry and attributes. To address this limitation, we propose SEDD-PCC, an end-to-end learning-base… ▽ More To encode point clouds containing both geometry and attributes, most learning-based compression schemes treat geometry and attribute coding separately, employing distinct encoders and decoders. This not only increases computational complexity but also fails to fully exploit shared features between geometry and attributes. To address this limitation, we propose SEDD-PCC, an end-to-end learning-based framework for lossy point cloud compression that jointly compresses geometry and attributes. SEDD-PCC employs a single encoder to extract shared geometric and attribute features into a unified latent space, followed by dual specialized decoders that sequentially reconstruct geometry and attributes. Additionally, we incorporate knowledge distillation to enhance feature representation learning from a teacher model, further improving coding efficiency. With its simple yet effective design, SEDD-PCC provides an efficient and practical solution for point cloud compression. Comparative evaluations against both rule-based and learning-based methods demonstrate its competitive performance, highlighting SEDD-PCC as a promising AI-driven compression approach. △ Less

Submitted 22 May, 2025; originally announced May 2025.

arXiv:2409.15586 [pdf, other]

TFT-multi: simultaneous forecasting of vital sign trajectories in the ICU

Authors: Rosemary Y. He, Jeffrey N. Chiang

Abstract: Trajectory forecasting in healthcare data has been an important area of research in precision care and clinical integration for computational methods. In recent years, generative AI models have demonstrated promising results in capturing short and long range dependencies in time series data. While these models have also been applied in healthcare, most of them only predict one value at a time, whi… ▽ More Trajectory forecasting in healthcare data has been an important area of research in precision care and clinical integration for computational methods. In recent years, generative AI models have demonstrated promising results in capturing short and long range dependencies in time series data. While these models have also been applied in healthcare, most of them only predict one value at a time, which is unrealistic in a clinical setting where multiple measures are taken at once. In this work, we extend the framework temporal fusion transformer (TFT), a multi-horizon time series prediction tool, and propose TFT-multi, an end-to-end framework that can predict multiple vital trajectories simultaneously. We apply TFT-multi to forecast 5 vital signs recorded in the intensive care unit: blood pressure, pulse, SpO2, temperature and respiratory rate. We hypothesize that by jointly predicting these measures, which are often correlated with one another, we can make more accurate predictions, especially in variables with large missingness. We validate our model on the public MIMIC dataset and an independent institutional dataset, and demonstrate that this approach outperforms state-of-the-art univariate prediction tools including the original TFT and Prophet, as well as vector regression modeling for multivariate prediction. Furthermore, we perform a study case analysis by applying our pipeline to forecast blood pressure changes in response to actual and hypothetical pressor administration. △ Less

Submitted 6 December, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

arXiv:2403.11249 [pdf, other]

YOLOv9 for Fracture Detection in Pediatric Wrist Trauma X-ray Images

Authors: Chun-Tse Chien, Rui-Yang Ju, Kuang-Yi Chou, Jen-Shiun Chiang

Abstract: The introduction of YOLOv9, the latest version of the You Only Look Once (YOLO) series, has led to its widespread adoption across various scenarios. This paper is the first to apply the YOLOv9 algorithm model to the fracture detection task as computer-assisted diagnosis (CAD) to help radiologists and surgeons to interpret X-ray images. Specifically, this paper trained the model on the GRAZPEDWRI-D… ▽ More The introduction of YOLOv9, the latest version of the You Only Look Once (YOLO) series, has led to its widespread adoption across various scenarios. This paper is the first to apply the YOLOv9 algorithm model to the fracture detection task as computer-assisted diagnosis (CAD) to help radiologists and surgeons to interpret X-ray images. Specifically, this paper trained the model on the GRAZPEDWRI-DX dataset and extended the training set using data augmentation techniques to improve the model performance. Experimental results demonstrate that compared to the mAP 50-95 of the current state-of-the-art (SOTA) model, the YOLOv9 model increased the value from 42.16% to 43.73%, with an improvement of 3.7%. The implementation code is publicly available at https://github.com/RuiyangJu/YOLOv9-Fracture-Detection. △ Less

Submitted 27 May, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

Comments: Accepted by Electronics Letters

arXiv:2312.06668 [pdf]

Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus

Authors: Yi-Hui Chou, Kalvin Chang, Meng-Ju Wu, Winston Ou, Alice Wen-Hsin Bi, Carol Yang, Bryan Y. Chen, Rong-Wei Pai, Po-Yen Yeh, Jo-Peng Chiang, Iu-Tshian Phoann, Winnie Chang, Chenxuan Cui, Noel Chen, Jiatong Shi

Abstract: Taiwanese Hokkien is declining in use and status due to a language shift towards Mandarin in Taiwan. This is partly why it is a low resource language in NLP and speech research today. To ensure that the state of the art in speech processing does not leave Taiwanese Hokkien behind, we contribute a 1.5-hour dataset of Taiwanese Hokkien to ML-SUPERB's hidden set. Evaluating ML-SUPERB's suite of self-… ▽ More Taiwanese Hokkien is declining in use and status due to a language shift towards Mandarin in Taiwan. This is partly why it is a low resource language in NLP and speech research today. To ensure that the state of the art in speech processing does not leave Taiwanese Hokkien behind, we contribute a 1.5-hour dataset of Taiwanese Hokkien to ML-SUPERB's hidden set. Evaluating ML-SUPERB's suite of self-supervised learning (SSL) speech representations on our dataset, we find that model size does not consistently determine performance. In fact, certain smaller models outperform larger ones. Furthermore, linguistic alignment between pretraining data and the target language plays a crucial role. △ Less

Submitted 5 December, 2023; originally announced December 2023.

Comments: Accepted to ASRU 2023

arXiv:2308.14763 [pdf, other]

VoiceBank-2023: A Multi-Speaker Mandarin Speech Corpus for Constructing Personalized TTS Systems for the Speech Impaired

Authors: Jia-Jyu Su, Pang-Chen Liao, Yen-Ting Lin, Wu-Hao Li, Guan-Ting Liou, Cheng-Che Kao, Wei-Cheng Chen, Jen-Chieh Chiang, Wen-Yang Chang, Pin-Han Lin, Chen-Yu Chiang

Abstract: Services of personalized TTS systems for the Mandarin-speaking speech impaired are rarely mentioned. Taiwan started the VoiceBanking project in 2020, aiming to build a complete set of services to deliver personalized Mandarin TTS systems to amyotrophic lateral sclerosis patients. This paper reports the corpus design, corpus recording, data purging and correction for the corpus, and evaluations of… ▽ More Services of personalized TTS systems for the Mandarin-speaking speech impaired are rarely mentioned. Taiwan started the VoiceBanking project in 2020, aiming to build a complete set of services to deliver personalized Mandarin TTS systems to amyotrophic lateral sclerosis patients. This paper reports the corpus design, corpus recording, data purging and correction for the corpus, and evaluations of the developed personalized TTS systems, for the VoiceBanking project. The developed corpus is named after the VoiceBank-2023 speech corpus because of its release year. The corpus contains 29.78 hours of utterances with prompts of short paragraphs and common phrases spoken by 111 native Mandarin speakers. The corpus is labeled with information about gender, degree of speech impairment, types of users, transcription, SNRs, and speaking rates. The VoiceBank-2023 is available by request for non-commercial use and welcomes all parties to join the VoiceBanking project to improve the services for the speech impaired. △ Less

Submitted 27 August, 2023; originally announced August 2023.

Comments: submitted to 26th International Conference of the ORIENTAL-COCOSDA

arXiv:2112.10184 [pdf]

A Deep Learning Based Workflow for Detection of Lung Nodules With Chest Radiograph

Authors: Yang Tai, Yu-Wen Fang, Fang-Yi Su, Jung-Hsien Chiang

Abstract: PURPOSE: This study aimed to develop a deep learning-based tool to detect and localize lung nodules with chest radiographs(CXRs). We expected it to enhance the efficiency of interpreting CXRs and reduce the possibilities of delayed diagnosis of lung cancer. MATERIALS AND METHODS: We collected CXRs from NCKUH database and VBD, an open-source medical image dataset, as our training and validation d… ▽ More PURPOSE: This study aimed to develop a deep learning-based tool to detect and localize lung nodules with chest radiographs(CXRs). We expected it to enhance the efficiency of interpreting CXRs and reduce the possibilities of delayed diagnosis of lung cancer. MATERIALS AND METHODS: We collected CXRs from NCKUH database and VBD, an open-source medical image dataset, as our training and validation data. A number of CXRs from the Ministry of Health and Welfare(MOHW) database served as our test data. We built a segmentation model to identify lung areas from CXRs, and sliced them into 16 patches. Physicians labeled the CXRs by clicking the patches. These labeled patches were then used to train and fine-tune a deep neural network(DNN) model, classifying the patches as positive or negative. Finally, we test the DNN model with the lung patches of CXRs from MOHW. RESULTS: Our segmentation model identified the lung regions well from the whole CXR. The Intersection over Union(IoU) between the ground truth and the segmentation result was 0.9228. In addition, our DNN model achieved a sensitivity of 0.81, specificity of 0.82, and AUROC of 0.869 in 98 of 125 cases. For the other 27 difficult cases, the sensitivity was 0.54, specificity 0.494, and AUROC 0.682. Overall, we obtained a sensitivity of 0.78, specificity of 0.79, and AUROC 0.837. CONCLUSIONS: Our two-step workflow is comparable to state-of-the-art algorithms in the sensitivity and specificity of localizing lung nodules from CXRs. Notably, our workflow provides an efficient way for specialists to label the data, which is valuable for relevant researches because of the relative rarity of labeled medical image data. △ Less

Submitted 11 March, 2022; v1 submitted 19 December, 2021; originally announced December 2021.

arXiv:2105.01840 [pdf]

CUAB: Convolutional Uncertainty Attention Block Enhanced the Chest X-ray Image Analysis

Authors: Chi-Shiang Wang, Fang-Yi Su, Tsung-Lu Michael Lee, Yi-Shan Tsai, Jung-Hsien Chiang

Abstract: In recent years, convolutional neural networks (CNNs) have been successfully implemented to various image recognition applications, such as medical image analysis, object detection, and image segmentation. Many studies and applications have been working on improving the performance of CNN algorithms and models. The strategies that aim to improve the performance of CNNs can be grouped into three ma… ▽ More In recent years, convolutional neural networks (CNNs) have been successfully implemented to various image recognition applications, such as medical image analysis, object detection, and image segmentation. Many studies and applications have been working on improving the performance of CNN algorithms and models. The strategies that aim to improve the performance of CNNs can be grouped into three major approaches: (1) deeper and wider network architecture, (2) automatic architecture search, and (3) convolutional attention block. Unlike approaches (1) and (2), the convolutional attention block approach is more flexible with lower cost. It enhances the CNN performance by extracting more efficient features. However, the existing attention blocks focus on enhancing the significant features, which lose some potential features in the uncertainty information. Inspired by the test time augmentation and test-time dropout approaches, we developed a novel convolutional uncertainty attention block (CUAB) that can leverage the uncertainty information to improve CNN-based models. The proposed module discovers potential information from the uncertain regions on feature maps in computer vision tasks. It is a flexible functional attention block that can be applied to any position in the convolutional block in CNN models. We evaluated the CUAB with notable backbone models, ResNet and ResNeXt, on a medical image segmentation task. The CUAB achieved a dice score of 73% and 84% in pneumonia and pneumothorax segmentation, respectively, thereby outperforming the original model and other notable attention approaches. The results demonstrated that the CUAB can efficiently utilize the uncertainty information to improve the model performance. △ Less

Submitted 4 May, 2021; originally announced May 2021.

arXiv:1908.10454 [pdf, other]

Embracing Imperfect Datasets: A Review of Deep Learning Solutions for Medical Image Segmentation

Authors: Nima Tajbakhsh, Laura Jeyaseelan, Qian Li, Jeffrey Chiang, Zhihao Wu, Xiaowei Ding

Abstract: The medical imaging literature has witnessed remarkable progress in high-performing segmentation models based on convolutional neural networks. Despite the new performance highs, the recent advanced segmentation models still require large, representative, and high quality annotated datasets. However, rarely do we have a perfect training dataset, particularly in the field of medical imaging, where… ▽ More The medical imaging literature has witnessed remarkable progress in high-performing segmentation models based on convolutional neural networks. Despite the new performance highs, the recent advanced segmentation models still require large, representative, and high quality annotated datasets. However, rarely do we have a perfect training dataset, particularly in the field of medical imaging, where data and annotations are both expensive to acquire. Recently, a large body of research has studied the problem of medical image segmentation with imperfect datasets, tackling two major dataset limitations: scarce annotations where only limited annotated data is available for training, and weak annotations where the training data has only sparse annotations, noisy annotations, or image-level annotations. In this article, we provide a detailed review of the solutions above, summarizing both the technical novelties and empirical results. We further compare the benefits and requirements of the surveyed methodologies and provide our recommended solutions. We hope this survey article increases the community awareness of the techniques that are available to handle imperfect medical image segmentation datasets. △ Less

Submitted 11 February, 2020; v1 submitted 27 August, 2019; originally announced August 2019.

Comments: Accepted for publication in the journal of Medical Image Analysis

arXiv:1903.01380 [pdf]

Saliency Prediction for Omnidirectional Images Considering Optimization on Sphere Domain

Authors: Bhishma Dedhia, Jui-Chiu Chiang, Yi-Fan Char

Abstract: There are several formats to describe the omnidirectional images. Among them, equirectangular projection (ERP), represented as 2D image, is the most widely used format. There exist many outstanding methods capable of well predicting the saliency maps for the conventional 2D images. But these works cannot be directly extended to predict the saliency map of the ERP image, since the content on ERP is… ▽ More There are several formats to describe the omnidirectional images. Among them, equirectangular projection (ERP), represented as 2D image, is the most widely used format. There exist many outstanding methods capable of well predicting the saliency maps for the conventional 2D images. But these works cannot be directly extended to predict the saliency map of the ERP image, since the content on ERP is not for direct display. Instead, the viewport image on demand is generated after converting the ERP image to the sphere domain, followed by rectilinear projection. In this paper, we propose a model to predict the saliency maps of the ERP images using existing saliency predictors for the 2D image. Some pre-processing and post-processing are used to manage the problem mentioned above. In particular, a smoothing based optimization is realized on the sphere domain. A public dataset of omnidirectional images is used to perform all the experiments and competitive results are achieved. △ Less

Submitted 4 March, 2019; originally announced March 2019.

Showing 1–9 of 9 results for author: Chiang, J