Search | arXiv e-print repository

IntelliCardiac: An Intelligent Platform for Cardiac Image Segmentation and Classification

Authors: Ting Yu Tsai, An Yu, Meghana Spurthi Maadugundu, Ishrat Jahan Mohima, Umme Habiba Barsha, Mei-Hwa F. Chen, Balakrishnan Prabhakaran, Ming-Ching Chang

Abstract: Precise and effective processing of cardiac imaging data is critical for the identification and management of the cardiovascular diseases. We introduce IntelliCardiac, a comprehensive, web-based medical image processing platform for the automatic segmentation of 4D cardiac images and disease classification, utilizing an AI model trained on the publicly accessible ACDC dataset. The system, intended… ▽ More Precise and effective processing of cardiac imaging data is critical for the identification and management of the cardiovascular diseases. We introduce IntelliCardiac, a comprehensive, web-based medical image processing platform for the automatic segmentation of 4D cardiac images and disease classification, utilizing an AI model trained on the publicly accessible ACDC dataset. The system, intended for patients, cardiologists, and healthcare professionals, offers an intuitive interface and uses deep learning models to identify essential heart structures and categorize cardiac diseases. The system supports analysis of both the right and left ventricles as well as myocardium, and then classifies patient's cardiac images into five diagnostic categories: dilated cardiomyopathy, myocardial infarction, hypertrophic cardiomyopathy, right ventricular abnormality, and no disease. IntelliCardiac combines a deep learning-based segmentation model with a two-step classification pipeline. The segmentation module gains an overall accuracy of 92.6%. The classification module, trained on characteristics taken from segmented heart structures, achieves 98% accuracy in five categories. These results exceed the performance of the existing state-of-the-art methods that integrate both segmentation and classification models. IntelliCardiac, which supports real-time visualization, workflow integration, and AI-assisted diagnostics, has great potential as a scalable, accurate tool for clinical decision assistance in cardiac imaging and diagnosis. △ Less

Submitted 7 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

arXiv:2503.13481 [pdf, other]

An On-Chip Ultra-wideband Antenna with Area-Bandwidth Optimization for Sub-Terahertz Transceivers and Radars

Authors: Boxun Yan, Runzhou Chen, Mau-Chung Frank Chang

Abstract: In this paper, we present an on-chip antenna at 290 GHz that achieves a maximum efficiency of 42\% on a low-resistivity silicon substrate for sub-terahertz integrated transceivers. The proposed antenna is based on a dual-slot structure to accommodate a limited ground plane and maintain desired radiation and impedance characteristics across the target frequency range. The antenna impedance bandwidt… ▽ More In this paper, we present an on-chip antenna at 290 GHz that achieves a maximum efficiency of 42\% on a low-resistivity silicon substrate for sub-terahertz integrated transceivers. The proposed antenna is based on a dual-slot structure to accommodate a limited ground plane and maintain desired radiation and impedance characteristics across the target frequency range. The antenna impedance bandwidth reaches 39\% with compact physical dimensions of 0.24$λ_0\times$0.42$λ_0$. Simulation and measurement results confirm its promising antenna performance for potential transceiver and radar applications. △ Less

Submitted 5 March, 2025; originally announced March 2025.

Comments: Accepted by the 2025 IEEE International Symposium on Antennas & Propagation (AP-S)

arXiv:2407.18465 [pdf]

Multiphysics Modeling on Photoconductive Antennas for Terahertz Applications

Authors: Boxun Yan, Bundel Pooja, Chi-Hou Chan, Mau-Chung Frank Chang

Abstract: Terahertz lies at the juncture between RF and optical electromagnetism, serving as a transition from mm-Wave to infrared photonics. Terahertz technology has been used for industrial quality control, security imaging, and high-speed communications, and often generated through optoelectronic solutions by using photoconductive antennas. In this paper, Multiphysics simulations on semi insulating GaAs,… ▽ More Terahertz lies at the juncture between RF and optical electromagnetism, serving as a transition from mm-Wave to infrared photonics. Terahertz technology has been used for industrial quality control, security imaging, and high-speed communications, and often generated through optoelectronic solutions by using photoconductive antennas. In this paper, Multiphysics simulations on semi insulating GaAs, grapheneenhanced photoconductive antennas are conducted to effectively decouple optical responses of semiconductor carrier generation/drift from Terahertz radiation computation, which provides a comprehensive and integrated platform for future terahertz photoconductive antenna designs △ Less

Submitted 25 July, 2024; originally announced July 2024.

Comments: 3 pages, 4 figures, accepted by 2024 IEEE MTT-S International Conference on Numerical Electromagnetic and Multiphysics Modeling and Optimization (NEMO'2024)

arXiv:2405.17496 [pdf, other]

UU-Mamba: Uncertainty-aware U-Mamba for Cardiac Image Segmentation

Authors: Ting Yu Tsai, Li Lin, Shu Hu, Ming-Ching Chang, Hongtu Zhu, Xin Wang

Abstract: Biomedical image segmentation is critical for accurate identification and analysis of anatomical structures in medical imaging, particularly in cardiac MRI. Manual segmentation is labor-intensive, time-consuming, and prone to errors, highlighting the need for automated methods. However, current machine learning approaches face challenges like overfitting and data demands. To tackle these issues, w… ▽ More Biomedical image segmentation is critical for accurate identification and analysis of anatomical structures in medical imaging, particularly in cardiac MRI. Manual segmentation is labor-intensive, time-consuming, and prone to errors, highlighting the need for automated methods. However, current machine learning approaches face challenges like overfitting and data demands. To tackle these issues, we propose a new UU-Mamba model, integrating the U-Mamba model with the Sharpness-Aware Minimization (SAM) optimizer and an uncertainty-aware loss function. SAM enhances generalization by locating flat minima in the loss landscape, thus reducing overfitting. The uncertainty-aware loss combines region-based, distribution-based, and pixel-based loss designs to improve segmentation accuracy and robustness. Evaluation of our method is performed on the ACDC cardiac dataset, outperforming state-of-the-art models including TransUNet, Swin-Unet, nnUNet, and nnFormer. Our approach achieves Dice Similarity Coefficient (DSC) and Mean Squared Error (MSE) scores, demonstrating its effectiveness in cardiac MRI segmentation. △ Less

Submitted 27 August, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

arXiv:2403.16350 [pdf, other]

3D-EffiViTCaps: 3D Efficient Vision Transformer with Capsule for Medical Image Segmentation

Authors: Dongwei Gan, Ming Chang, Juan Chen

Abstract: Medical image segmentation (MIS) aims to finely segment various organs. It requires grasping global information from both parts and the entire image for better segmenting, and clinically there are often certain requirements for segmentation efficiency. Convolutional neural networks (CNNs) have made considerable achievements in MIS. However, they are difficult to fully collect global context inform… ▽ More Medical image segmentation (MIS) aims to finely segment various organs. It requires grasping global information from both parts and the entire image for better segmenting, and clinically there are often certain requirements for segmentation efficiency. Convolutional neural networks (CNNs) have made considerable achievements in MIS. However, they are difficult to fully collect global context information and their pooling layer may cause information loss. Capsule networks, which combine the benefits of CNNs while taking into account additional information such as relative location that CNNs do not, have lately demonstrated some advantages in MIS. Vision Transformer (ViT) employs transformers in visual tasks. Transformer based on attention mechanism has excellent global inductive modeling capabilities and is expected to capture longrange information. Moreover, there have been resent studies on making ViT more lightweight to minimize model complexity and increase efficiency. In this paper, we propose a U-shaped 3D encoder-decoder network named 3D-EffiViTCaps, which combines 3D capsule blocks with 3D EfficientViT blocks for MIS. Our encoder uses capsule blocks and EfficientViT blocks to jointly capture local and global semantic information more effectively and efficiently with less information loss, while the decoder employs CNN blocks and EfficientViT blocks to catch ffner details for segmentation. We conduct experiments on various datasets, including iSeg-2017, Hippocampus and Cardiac to verify the performance and efficiency of 3D-EffiViTCaps, which performs better than previous 3D CNN-based, 3D Capsule-based and 3D Transformer-based models. We further implement a series of ablation experiments on the main blocks. Our code is available at: https://github.com/HidNeuron/3D-EffiViTCaps. △ Less

Submitted 24 March, 2024; originally announced March 2024.

Comments: 15 pages, 4 figures, submitted to ICPR2024

arXiv:2402.09655 [pdf, other]

Evaluating Atypical Gaze Patterns through Vision Models: The Case of Cortical Visual Impairment

Authors: Kleanthis Avramidis, Melinda Y. Chang, Rahul Sharma, Mark S. Borchert, Shrikanth Narayanan

Abstract: A wide range of neurological and cognitive disorders exhibit distinct behavioral markers aside from their clinical manifestations. Cortical Visual Impairment (CVI) is a prime example of such conditions, resulting from damage to visual pathways in the brain, and adversely impacting low- and high-level visual function. The characteristics impacted by CVI are primarily described qualitatively, challe… ▽ More A wide range of neurological and cognitive disorders exhibit distinct behavioral markers aside from their clinical manifestations. Cortical Visual Impairment (CVI) is a prime example of such conditions, resulting from damage to visual pathways in the brain, and adversely impacting low- and high-level visual function. The characteristics impacted by CVI are primarily described qualitatively, challenging the establishment of an objective, evidence-based measure of CVI severity. To study those characteristics, we propose to create visual saliency maps by adequately prompting deep vision models with attributes of clinical interest. After extracting saliency maps for a curated set of stimuli, we evaluate fixation traces on those from children with CVI through eye tracking technology. Our experiments reveal significant gaze markers that verify clinical knowledge and yield nuanced discriminability when compared to those of age-matched control subjects. Using deep learning to unveil atypical visual saliency is an important step toward establishing an eye-tracking signature for severe neurodevelopmental disorders, like CVI. △ Less

Submitted 14 February, 2024; originally announced February 2024.

Comments: 5 pages, 4 figures, submitted to IEEE EMBC 2024

arXiv:2307.07096 [pdf, other]

Low Rank Properties for Estimating Microphones Start Time and Sources Emission Time

Authors: Faxian Cao, Yongqiang Cheng, Adil Mehmood Khan, Zhijing Yang, S. M. Ahsan Kazmiand Yingxiu Chang

Abstract: Uncertainty in timing information pertaining to the start time of microphone recordings and sources' emission time pose significant challenges in various applications, such as joint microphones and sources localization. Traditional optimization methods, which directly estimate this unknown timing information (UTIm), often fall short compared to approaches exploiting the low-rank property (LRP). LR… ▽ More Uncertainty in timing information pertaining to the start time of microphone recordings and sources' emission time pose significant challenges in various applications, such as joint microphones and sources localization. Traditional optimization methods, which directly estimate this unknown timing information (UTIm), often fall short compared to approaches exploiting the low-rank property (LRP). LRP encompasses an additional low-rank structure, facilitating a linear constraint on UTIm to help formulate related low-rank structure information. This method allows us to attain globally optimal solutions for UTIm, given proper initialization. However, the initialization process often involves randomness, leading to suboptimal, local minimum values. This paper presents a novel, combined low-rank approximation (CLRA) method designed to mitigate the effects of this random initialization. We introduce three new LRP variants, underpinned by mathematical proof, which allow the UTIm to draw on a richer pool of low-rank structural information. Utilizing this augmented low-rank structural information from both LRP and the proposed variants, we formulate four linear constraints on the UTIm. Employing the proposed CLRA algorithm, we derive global optimal solutions for the UTIm via these four linear constraints.Experimental results highlight the superior performance of our method over existing state-of-the-art approaches, measured in terms of both the recovery number and reduced estimation errors of UTIm. △ Less

Submitted 21 July, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

Comments: 13 pages for main content; 9 pages for proof of proposed low rank properties; 13 figures

arXiv:2207.04565 [pdf, other]

Automating Detection of Papilledema in Pediatric Fundus Images with Explainable Machine Learning

Authors: Kleanthis Avramidis, Mohammad Rostami, Melinda Chang, Shrikanth Narayanan

Abstract: Papilledema is an ophthalmic neurologic disorder in which increased intracranial pressure leads to swelling of the optic nerves. Undiagnosed papilledema in children may lead to blindness and may be a sign of life-threatening conditions, such as brain tumors. Robust and accurate clinical diagnosis of this syndrome can be facilitated by automated analysis of fundus images using deep learning, especi… ▽ More Papilledema is an ophthalmic neurologic disorder in which increased intracranial pressure leads to swelling of the optic nerves. Undiagnosed papilledema in children may lead to blindness and may be a sign of life-threatening conditions, such as brain tumors. Robust and accurate clinical diagnosis of this syndrome can be facilitated by automated analysis of fundus images using deep learning, especially in the presence of challenges posed by pseudopapilledema that has similar fundus appearance but distinct clinical implications. We present a deep learning-based algorithm for the automatic detection of pediatric papilledema. Our approach is based on optic disc localization and detection of explainable papilledema indicators through data augmentation. Experiments on real-world clinical data demonstrate that our proposed method is effective with a diagnostic accuracy comparable to expert ophthalmologists. △ Less

Submitted 10 July, 2022; originally announced July 2022.

Comments: 5 pages, 4 figures, 2 tables, 2022 IEEE International Conference on Image Processing (ICIP)

arXiv:2204.03204 [pdf]

Convolutional Neural Network for Early Pulmonary Embolism Detection via Computed Tomography Pulmonary Angiography

Authors: Ching-Yuan Yu, Ming-Che Chang, Yun-Chien Cheng, Chin Kuo

Abstract: This study was conducted to develop a computer-aided detection (CAD) system for triaging patients with pulmonary embolism (PE). The purpose of the system was to reduce the death rate during the waiting period. Computed tomography pulmonary angiography (CTPA) is used for PE diagnosis. Because CTPA reports require a radiologist to review the case and suggest further management, this creates a waitin… ▽ More This study was conducted to develop a computer-aided detection (CAD) system for triaging patients with pulmonary embolism (PE). The purpose of the system was to reduce the death rate during the waiting period. Computed tomography pulmonary angiography (CTPA) is used for PE diagnosis. Because CTPA reports require a radiologist to review the case and suggest further management, this creates a waiting period during which patients may die. Our proposed CAD method was thus designed to triage patients with PE from those without PE. In contrast to related studies involving CAD systems that identify key PE lesion images to expedite PE diagnosis, our system comprises a novel classification-model ensemble for PE detection and a segmentation model for PE lesion labeling. The models were trained using data from National Cheng Kung University Hospital and open resources. The classification model yielded 0.73 for receiver operating characteristic curve (accuracy = 0.85), while the mean intersection over union was 0.689 for the segmentation model. The proposed CAD system can distinguish between patients with and without PE and automatically label PE lesions to expedite PE diagnosis △ Less

Submitted 7 April, 2022; originally announced April 2022.

arXiv:2112.03099 [pdf, other]

VocBench: A Neural Vocoder Benchmark for Speech Synthesis

Authors: Ehab A. AlBadawy, Andrew Gibiansky, Qing He, Jilong Wu, Ming-Ching Chang, Siwei Lyu

Abstract: Neural vocoders, used for converting the spectral representations of an audio signal to the waveforms, are a commonly used component in speech synthesis pipelines. It focuses on synthesizing waveforms from low-dimensional representation, such as Mel-Spectrograms. In recent years, different approaches have been introduced to develop such vocoders. However, it becomes more challenging to assess thes… ▽ More Neural vocoders, used for converting the spectral representations of an audio signal to the waveforms, are a commonly used component in speech synthesis pipelines. It focuses on synthesizing waveforms from low-dimensional representation, such as Mel-Spectrograms. In recent years, different approaches have been introduced to develop such vocoders. However, it becomes more challenging to assess these new vocoders and compare their performance to previous ones. To address this problem, we present VocBench, a framework that benchmark the performance of state-of-the art neural vocoders. VocBench uses a systematic study to evaluate different neural vocoders in a shared environment that enables a fair comparison between them. In our experiments, we use the same setup for datasets, training pipeline, and evaluation metrics for all neural vocoders. We perform a subjective and objective evaluation to compare the performance of each vocoder along a different axis. Our results demonstrate that the framework is capable of showing the competitive efficacy and the quality of the synthesized samples for each vocoder. VocBench framework is available at https://github.com/facebookresearch/vocoder-benchmark. △ Less

Submitted 6 December, 2021; originally announced December 2021.

Comments: To appear in icassp 2022

arXiv:2111.11665 [pdf, other]

RadFusion: Benchmarking Performance and Fairness for Multimodal Pulmonary Embolism Detection from CT and EHR

Authors: Yuyin Zhou, Shih-Cheng Huang, Jason Alan Fries, Alaa Youssef, Timothy J. Amrhein, Marcello Chang, Imon Banerjee, Daniel Rubin, Lei Xing, Nigam Shah, Matthew P. Lungren

Abstract: Despite the routine use of electronic health record (EHR) data by radiologists to contextualize clinical history and inform image interpretation, the majority of deep learning architectures for medical imaging are unimodal, i.e., they only learn features from pixel-level information. Recent research revealing how race can be recovered from pixel data alone highlights the potential for serious bias… ▽ More Despite the routine use of electronic health record (EHR) data by radiologists to contextualize clinical history and inform image interpretation, the majority of deep learning architectures for medical imaging are unimodal, i.e., they only learn features from pixel-level information. Recent research revealing how race can be recovered from pixel data alone highlights the potential for serious biases in models which fail to account for demographics and other key patient attributes. Yet the lack of imaging datasets which capture clinical context, inclusive of demographics and longitudinal medical history, has left multimodal medical imaging underexplored. To better assess these challenges, we present RadFusion, a multimodal, benchmark dataset of 1794 patients with corresponding EHR data and high-resolution computed tomography (CT) scans labeled for pulmonary embolism. We evaluate several representative multimodal fusion models and benchmark their fairness properties across protected subgroups, e.g., gender, race/ethnicity, age. Our results suggest that integrating imaging and EHR data can improve classification performance and robustness without introducing large disparities in the true positive rate between population groups. △ Less

Submitted 26 November, 2021; v1 submitted 23 November, 2021; originally announced November 2021.

Comments: RadFusion dataset: https://stanfordaimi.azurewebsites.net/datasets/3a7548a4-8f65-4ab7-85fa-3d68c9efc1bd

arXiv:2109.06638 [pdf, other]

Learnable Discrete Wavelet Pooling (LDW-Pooling) For Convolutional Networks

Authors: Bor-Shiun Wang, Jun-Wei Hsieh, Ming-Ching Chang, Ping-Yang Chen, Lipeng Ke, Siwei Lyu

Abstract: Pooling is a simple but essential layer in modern deep CNN architectures for feature aggregation and extraction. Typical CNN design focuses on the conv layers and activation functions, while leaving the pooling layers with fewer options. We introduce the Learning Discrete Wavelet Pooling (LDW-Pooling) that can be applied universally to replace standard pooling operations to better extract features… ▽ More Pooling is a simple but essential layer in modern deep CNN architectures for feature aggregation and extraction. Typical CNN design focuses on the conv layers and activation functions, while leaving the pooling layers with fewer options. We introduce the Learning Discrete Wavelet Pooling (LDW-Pooling) that can be applied universally to replace standard pooling operations to better extract features with improved accuracy and efficiency. Motivated from the wavelet theory, we adopt the low-pass (L) and high-pass (H) filters horizontally and vertically for pooling on a 2D feature map. Feature signals are decomposed into four (LL, LH, HL, HH) subbands to retain features better and avoid information dropping. The wavelet transform ensures features after pooling can be fully preserved and recovered. We next adopt an energy-based attention learning to fine-select crucial and representative features. LDW-Pooling is effective and efficient when compared with other state-of-the-art pooling techniques such as WaveletPooling and LiftPooling. Extensive experimental validation shows that LDW-Pooling can be applied to a wide range of standard CNN architectures and consistently outperform standard (max, mean, mixed, and stochastic) pooling operations. △ Less

Submitted 20 October, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

Comments: Accepted by BMVC 2021

arXiv:2107.02672 [pdf, other]

COVID-19 Pneumonia Severity Prediction using Hybrid Convolution-Attention Neural Architectures

Authors: Nam Nguyen, J. Morris Chang

Abstract: This study proposed a novel framework for COVID-19 severity prediction, which is a combination of data-centric and model-centric approaches. First, we propose a data-centric pre-training for extremely scare data scenarios of the investigating dataset. Second, we propose two hybrid convolution-attention neural architectures that leverage the self-attention from the Transformer and the Dense Associa… ▽ More This study proposed a novel framework for COVID-19 severity prediction, which is a combination of data-centric and model-centric approaches. First, we propose a data-centric pre-training for extremely scare data scenarios of the investigating dataset. Second, we propose two hybrid convolution-attention neural architectures that leverage the self-attention from the Transformer and the Dense Associative Memory (Modern Hopfield networks). Our proposed approach achieves significant improvement from the conventional baseline approach. The best model from our proposed approach achieves $R^2 = 0.85 \pm 0.05$ and Pearson correlation coefficient $ρ= 0.92 \pm 0.02$ in geographic extend and $R^2 = 0.72 \pm 0.09, ρ= 0.85\pm 0.06$ in opacity prediction. △ Less

Submitted 7 July, 2021; v1 submitted 6 July, 2021; originally announced July 2021.

arXiv:2102.10260 [pdf, other]

Wireless sensor network for in situ soil moisture monitoring

Authors: Jianing Fang, Chuheng Hu, Nour Smaoui, Doug Carlson, Jayant Gupchup, Razvan Musaloiu-E., Chieh-Jan Mike Liang, Marcus Chang, Omprakash Gnawali, Tamas Budavari, Andreas Terzis, Katalin Szlavecz, Alexander S. Szalay

Abstract: We discuss the history and lessons learned from a series of deployments of environmental sensors measuring soil parameters and CO2 fluxes over the last fifteen years, in an outdoor environment. We present the hardware and software architecture of our current Gen-3 system, and then discuss how we are simplifying the user facing part of the software, to make it easier and friendlier for the environm… ▽ More We discuss the history and lessons learned from a series of deployments of environmental sensors measuring soil parameters and CO2 fluxes over the last fifteen years, in an outdoor environment. We present the hardware and software architecture of our current Gen-3 system, and then discuss how we are simplifying the user facing part of the software, to make it easier and friendlier for the environmental scientist to be in full control of the system. Finally, we describe the current effort to build a large-scale Gen-4 sensing platform consisting of hundreds of nodes to track the environmental parameters for urban green spaces in Baltimore, Maryland. △ Less

Submitted 20 February, 2021; originally announced February 2021.

Comments: 12 pages, 16 figures, Sensornets 2021 Conference

arXiv:2007.00199 [pdf, other]

Low-light Image Restoration with Short- and Long-exposure Raw Pairs

Authors: Meng Chang, Huajun Feng, Zhihai Xu, Qi Li

Abstract: Low-light imaging with handheld mobile devices is a challenging issue. Limited by the existing models and training data, most existing methods cannot be effectively applied in real scenarios. In this paper, we propose a new low-light image restoration method by using the complementary information of short- and long-exposure images. We first propose a novel data generation method to synthesize real… ▽ More Low-light imaging with handheld mobile devices is a challenging issue. Limited by the existing models and training data, most existing methods cannot be effectively applied in real scenarios. In this paper, we propose a new low-light image restoration method by using the complementary information of short- and long-exposure images. We first propose a novel data generation method to synthesize realistic short- and longexposure raw images by simulating the imaging pipeline in lowlight environment. Then, we design a new long-short-exposure fusion network (LSFNet) to deal with the problems of low-light image fusion, including high noise, motion blur, color distortion and misalignment. The proposed LSFNet takes pairs of shortand long-exposure raw images as input, and outputs a clear RGB image. Using our data generation method and the proposed LSFNet, we can recover the details and color of the original scene, and improve the low-light image quality effectively. Experiments demonstrate that our method can outperform the state-of-the art methods. △ Less

Submitted 28 February, 2021; v1 submitted 30 June, 2020; originally announced July 2020.

arXiv:2004.12064 [pdf, other]

CS-AF: A Cost-sensitive Multi-classifier Active Fusion Framework for Skin Lesion Classification

Authors: Di Zhuang, Keyu Chen, J. Morris Chang

Abstract: Convolutional neural networks (CNNs) have achieved the state-of-the-art performance in skin lesion analysis. Compared with single CNN classifier, combining the results of multiple classifiers via fusion approaches shows to be more effective and robust. Since the skin lesion datasets are usually limited and statistically biased, while designing an effective fusion approach, it is important to consi… ▽ More Convolutional neural networks (CNNs) have achieved the state-of-the-art performance in skin lesion analysis. Compared with single CNN classifier, combining the results of multiple classifiers via fusion approaches shows to be more effective and robust. Since the skin lesion datasets are usually limited and statistically biased, while designing an effective fusion approach, it is important to consider not only the performance of each classifier on the training/validation dataset, but also the relative discriminative power (e.g., confidence) of each classifier regarding an individual sample in the testing phase, which calls for an active fusion approach. Furthermore, in skin lesion analysis, the data of certain classes (e.g., the benign lesions) is usually abundant making them an over-represented majority, while the data of some other classes (e.g., the cancerous lesions) is deficient, making them an underrepresented minority. It is more crucial to precisely identify the samples from an underrepresented (i.e., in terms of the amount of data) but more important minority class (e.g., certain cancerous lesion). In other words, misclassifying a more severe lesion to a benign or less severe lesion should have relative more cost (e.g., money, time and even lives). To address such challenges, we present CS-AF, a cost-sensitive multi-classifier active fusion framework for skin lesion classification. In the experimental evaluation, we prepared 96 base classifiers (of 12 CNN architectures) on the ISIC research datasets. Our experimental results show that our framework consistently outperforms the static fusion competitors. △ Less

Submitted 9 September, 2020; v1 submitted 25 April, 2020; originally announced April 2020.

Comments: 16 pages, 8 figures, 2 table

arXiv:2002.10201 [pdf, other]

Beyond Camera Motion Blur Removing: How to Handle Outliers in Deblurring

Authors: Meng Chang, Chenwei Yang, Huajun Feng, Zhihai Xu, Qi Li

Abstract: Camera motion deblurring is an important low-level vision task for achieving better imaging quality. When a scene has outliers such as saturated pixels, the captured blurred image becomes more difficult to restore. In this paper, we propose a novel method to handle camera motion blur with outliers. We first propose an edge-aware scale-recurrent network (EASRN) to conduct deblurring. EASRN has a se… ▽ More Camera motion deblurring is an important low-level vision task for achieving better imaging quality. When a scene has outliers such as saturated pixels, the captured blurred image becomes more difficult to restore. In this paper, we propose a novel method to handle camera motion blur with outliers. We first propose an edge-aware scale-recurrent network (EASRN) to conduct deblurring. EASRN has a separate deblurring module that removes blur at multiple scales and an upsampling module that fuses different input scales. Then a salient edge detection network is proposed to supervise the training process and constraint the edges restoration. By simulating camera motion and adding various light sources, we can generate blurred images with saturation cutoff. Using the proposed data generation method, our network can learn to deal with outliers effectively. We evaluate our method on public test datasets including the GoPro dataset, Kohler's dataset and Lai's dataset. Both objective evaluation indexes and subjective visualization show that our method results in better deblurring quality than other state-of-the-art approaches. △ Less

Submitted 27 April, 2021; v1 submitted 24 February, 2020; originally announced February 2020.

arXiv:2001.10291 [pdf, other]

Spatial-Adaptive Network for Single Image Denoising

Authors: Meng Chang, Qi Li, Huajun Feng, Zhihai Xu

Abstract: Previous works have shown that convolutional neural networks can achieve good performance in image denoising tasks. However, limited by the local rigid convolutional operation, these methods lead to oversmoothing artifacts. A deeper network structure could alleviate these problems, but more computational overhead is needed. In this paper, we propose a novel spatial-adaptive denoising network (SADN… ▽ More Previous works have shown that convolutional neural networks can achieve good performance in image denoising tasks. However, limited by the local rigid convolutional operation, these methods lead to oversmoothing artifacts. A deeper network structure could alleviate these problems, but more computational overhead is needed. In this paper, we propose a novel spatial-adaptive denoising network (SADNet) for efficient single image blind noise removal. To adapt to changes in spatial textures and edges, we design a residual spatial-adaptive block. Deformable convolution is introduced to sample the spatially correlated features for weighting. An encoder-decoder structure with a context block is introduced to capture multiscale information. With noise removal from the coarse to fine, a high-quality noisefree image can be obtained. We apply our method to both synthetic and real noisy image datasets. The experimental results demonstrate that our method can surpass the state-of-the-art denoising methods both quantitatively and visually. △ Less

Submitted 13 July, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

arXiv:1912.04844 [pdf, other]

Quantifying the Chaos Level of Infants' Environment via Unsupervised Learning

Authors: Priyanka Khante, Mai Lee Chang, Domingo Martinez, Kaya de Barbaro, Edison Thomaz

Abstract: Acoustic environments vary dramatically within the home setting. They can be a source of comfort and tranquility or chaos that can lead to less optimal cognitive development in children. Research to date has only subjectively measured household chaos. In this work, we use three unsupervised machine learning techniques to quantify household chaos in infants' homes. These unsupervised techniques inc… ▽ More Acoustic environments vary dramatically within the home setting. They can be a source of comfort and tranquility or chaos that can lead to less optimal cognitive development in children. Research to date has only subjectively measured household chaos. In this work, we use three unsupervised machine learning techniques to quantify household chaos in infants' homes. These unsupervised techniques include hierarchical clustering using K-Means, clustering using self-organizing map (SOM) and deep learning. We evaluated these techniques using data from 9 participants which is a total of 197 hours. Results show that these techniques are promising to quantify household chaos. △ Less

Submitted 10 December, 2019; originally announced December 2019.

arXiv:1903.03474 [pdf, other]

doi 10.1109/JLT.2019.2945017

Demonstration of multivariate photonics: blind dimensionality reduction with analog integrated photonics

Authors: Alexander N. Tait, Philip Y. Ma, Thomas Ferreira de Lima, Eric C. Blow, Matthew P. Chang, Mitchell A. Nahmias, Bhavin J. Shastri, Paul R. Prucnal

Abstract: Multi-antenna radio front-ends generate a multi-dimensional flood of information, most of which is partially redundant. Redundancy is eliminated by dimensionality reduction, but contemporary digital processing techniques face harsh fundamental tradeoffs when implementing this class of functions. These tradeoffs can be broken in the analog domain, in which the performance of optical technologies gr… ▽ More Multi-antenna radio front-ends generate a multi-dimensional flood of information, most of which is partially redundant. Redundancy is eliminated by dimensionality reduction, but contemporary digital processing techniques face harsh fundamental tradeoffs when implementing this class of functions. These tradeoffs can be broken in the analog domain, in which the performance of optical technologies greatly exceeds that of electronic counterparts. Here, we present concepts, methods, and a first demonstration of multivariate photonics: a combination of integrated photonic hardware, analog dimensionality reduction, and blind algorithmic techniques. We experimentally demonstrate 2-channel, 1.0 GHz principal component analysis in a photonic weight bank using recently proposed algorithms for synthesizing the multivariate properties of signals to which the receiver is blind. Novel methods are introduced for controlling blindness conditions in a laboratory context. This work provides a foundation for further research in multivariate photonic information processing, which is poised to play a role in future generations of wireless technology. △ Less

Submitted 10 February, 2019; originally announced March 2019.

Comments: 24 pages, 7 figures

Showing 1–20 of 20 results for author: Chang, M