
Showing 1–22 of 22 results for author: Liang, P

Searching in archive eess.
  1. arXiv:2506.08534  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    DCD: A Semantic Segmentation Model for Fetal Ultrasound Four-Chamber View

    Authors: Donglian Li, Hui Guo, Minglang Chen, Huizhen Chen, Jialing Chen, Bocheng Liang, Pengchen Liang, Ying Tan

    Abstract: Accurate segmentation of anatomical structures in the apical four-chamber (A4C) view of fetal echocardiography is essential for early diagnosis and prenatal evaluation of congenital heart disease (CHD). However, precise segmentation remains challenging due to ultrasound artifacts, speckle noise, anatomical variability, and boundary ambiguity across different gestational stages. To reduce the workl…

    Submitted 10 June, 2025; originally announced June 2025.

  2. arXiv:2503.07667  [pdf, other]

    cs.LG cs.AI cs.CV eess.SP

    CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models

    Authors: Wei Dai, Peilin Chen, Malinda Lu, Daniel Li, Haowen Wei, Hejie Cui, Paul Pu Liang

    Abstract: Recent advances in clinical AI have enabled remarkable progress across many clinical domains. However, existing benchmarks and models are primarily limited to a small set of modalities and tasks, which hinders the development of large-scale multimodal methods that can make holistic assessments of patient health and well-being. To bridge this gap, we introduce Clinical Large-Scale Integrative Multi…

    Submitted 20 March, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

  3. arXiv:2503.02321  [pdf, ps, other]

    eess.IV cs.CV

    Rapid Bone Scintigraphy Enhancement via Semantic Prior Distillation from Segment Anything Model

    Authors: Pengchen Liang, Leijun Shi, Huiping Yao, Bin Pu, Jianguo Chen, Lei Zhao, Haishan Huang, Zhuangzhuang Chen, Zhaozhao Xu, Lite Xu, Qing Chang, Yiwei Li

    Abstract: Rapid bone scintigraphy is crucial for diagnosing skeletal disorders and detecting tumor metastases in children, as it shortens scan duration and reduces discomfort. However, accelerated acquisition often degrades image quality, impairing the visibility of fine anatomical details and potentially compromising diagnosis. To overcome this limitation, we introduce the first application of SAM-based se…

    Submitted 4 June, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

    Comments: 12 pages, 9 figures, 8 tables

  4. arXiv:2502.14584  [pdf, other]

    eess.IV cs.CV

    Vision Foundation Models in Medical Image Analysis: Advances and Challenges

    Authors: Pengchen Liang, Bin Pu, Haishan Huang, Yiwei Li, Hualiang Wang, Weibo Ma, Qing Chang

    Abstract: The rapid development of Vision Foundation Models (VFMs), particularly Vision Transformers (ViT) and Segment Anything Model (SAM), has sparked significant advances in the field of medical image analysis. These models have demonstrated exceptional capabilities in capturing long-range dependencies and achieving high generalization in segmentation tasks. However, adapting these large models to medica…

    Submitted 20 February, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: 17 pages, 1 figure

  5. arXiv:2502.14363  [pdf, other]

    eess.IV cs.CV

    Topology-Aware Wavelet Mamba for Airway Structure Segmentation in Postoperative Recurrent Nasopharyngeal Carcinoma CT Scans

    Authors: Haishan Huang, Pengchen Liang, Naier Lin, Luxi Wang, Bin Pu, Jianguo Chen, Qing Chang, Xia Shen, Guo Ran

    Abstract: Nasopharyngeal carcinoma (NPC) patients often undergo radiotherapy and chemotherapy, which can lead to postoperative complications such as limited mouth opening and joint stiffness, particularly in recurrent cases that require re-surgery. These complications can affect airway function, making accurate postoperative airway risk assessment essential for managing patient care. Accurate segmentation o…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: 20 pages, 11 figures, 6 tables

  6. arXiv:2501.16368  [pdf, other]

    cs.LG cs.AI eess.SY

    Foundation Models for CPS-IoT: Opportunities and Challenges

    Authors: Ozan Baris, Yizhuo Chen, Gaofeng Dong, Liying Han, Tomoyoshi Kimura, Pengrui Quan, Ruijie Wang, Tianchen Wang, Tarek Abdelzaher, Mario Bergés, Paul Pu Liang, Mani Srivastava

    Abstract: Methods from machine learning (ML) have transformed the implementation of Perception-Cognition-Communication-Action loops in Cyber-Physical Systems (CPS) and the Internet of Things (IoT), replacing mechanistic and basic statistical models with those derived from data. However, the first generation of ML approaches, which depend on supervised learning with annotated data to create task-specific mod…

    Submitted 4 February, 2025; v1 submitted 22 January, 2025; originally announced January 2025.

  7. arXiv:2410.01395  [pdf, other]

    eess.IV cs.CV

    Toward Zero-Shot Learning for Visual Dehazing of Urological Surgical Robots

    Authors: Renkai Wu, Xianjin Wang, Pengchen Liang, Zhenyu Zhang, Qing Chang, Hao Tang

    Abstract: Robot-assisted surgery has profoundly influenced current forms of minimally invasive surgery. However, transurethral suburethral urological surgical robots must work in a liquid environment. Shearing and heating vaporize the liquid, producing bubble atomization that impairs the robot's visual perception. This can lead to the need for uninter…

    Submitted 2 October, 2024; originally announced October 2024.

  8. arXiv:2404.13362  [pdf, other]

    cs.CL cs.AI cs.LG eess.AS

    Semantically Corrected Amharic Automatic Speech Recognition

    Authors: Samuael Adnew, Paul Pu Liang

    Abstract: Automatic Speech Recognition (ASR) can play a crucial role in enhancing the accessibility of spoken languages worldwide. In this paper, we build a set of ASR tools for Amharic, a language spoken by more than 50 million people primarily in eastern Africa. Amharic is written in the Ge'ez script, a sequence of graphemes with spacings denoting word boundaries. This makes computational processing of Am…

    Submitted 20 April, 2024; originally announced April 2024.

  9. UltraLight VM-UNet: Parallel Vision Mamba Significantly Reduces Parameters for Skin Lesion Segmentation

    Authors: Renkai Wu, Yinghao Liu, Pengchen Liang, Qing Chang

    Abstract: Traditionally, most approaches improve a model's segmentation performance by adding more complex modules. This is not suitable for the medical field, especially for mobile medical devices, where computationally heavy models are impractical in real clinical environments due to limited computational resources. Recently, state-space models (SSMs), represented by Mamba,…

    Submitted 24 April, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

    Journal ref: Patterns. 2025

  10. arXiv:2310.06339  [pdf, other]

    eess.IV cs.LG

    Automatic nodule identification and differentiation in ultrasound videos to facilitate per-nodule examination

    Authors: Siyuan Jiang, Yan Ding, Yuling Wang, Lei Xu, Wenli Dai, Wanru Chang, Jianfeng Zhang, Jie Yu, Jianqiao Zhou, Chunquan Zhang, Ping Liang, Dexing Kong

    Abstract: Ultrasound is a vital diagnostic technique in health screening; because it is non-invasive, cost-effective, and radiation-free, it is widely applied in the diagnosis of nodules. However, it relies heavily on the expertise and clinical experience of the sonographer. In ultrasound images, a single nodule might present heterogeneous appearances in different cross-sectional views w…

    Submitted 10 October, 2023; originally announced October 2023.

  11. arXiv:2306.08620  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Anticipatory Music Transformer

    Authors: John Thickstun, David Hall, Chris Donahue, Percy Liang

    Abstract: We introduce anticipation: a method for constructing a controllable generative model of a temporal point process (the event process) conditioned asynchronously on realizations of a second, correlated process (the control process). We achieve this by interleaving sequences of events and controls, such that controls appear following stopping times in the event sequence. This work is motivated by pro…

    Submitted 25 July, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: TMLR accepted version

  12. arXiv:2305.13583  [pdf, other]

    cs.CL cs.MM eess.AS eess.IV

    Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition

    Authors: Yaoting Wang, Yuanchao Li, Paul Pu Liang, Louis-Philippe Morency, Peter Bell, Catherine Lai

    Abstract: Fusing multiple modalities has proven effective for multimodal information processing. However, the incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition. In this study, we first analyze how the salient affective information in one modality can be affected by the other, and demonstrate that inter-modal incongruity exists latently in crossmodal att…

    Submitted 12 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: *First two authors contributed equally

  13. arXiv:2304.01620  [pdf, other]

    cs.CV eess.IV

    Image Blind Denoising Using Dual Convolutional Neural Network with Skip Connection

    Authors: Wencong Wu, Shicheng Liao, Guannan Lv, Peng Liang, Yungang Zhang

    Abstract: In recent years, deep convolutional neural networks have shown fascinating performance in the field of image denoising. However, deeper network architectures are often accompanied by large numbers of model parameters, leading to high training costs and long inference times, which limit their application in practical denoising tasks. In this paper, we propose a novel dual convolutional blind denoi…

    Submitted 4 April, 2023; originally announced April 2023.

  14. arXiv:2304.01498  [pdf, other]

    cs.CV eess.IV

    DCANet: Dual Convolutional Neural Network with Attention for Image Blind Denoising

    Authors: Wencong Wu, Guannan Lv, Yingying Duan, Peng Liang, Yungang Zhang, Yuelong Xia

    Abstract: Noise removal is an essential preprocessing step for many computer vision tasks. Currently, many denoising models based on deep neural networks perform well at removing noise with known distributions (e.g., additive white Gaussian noise). However, eliminating real noise is still a very challenging task, since real-world noise often does not simply follow one single type of…

    Submitted 16 June, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

  15. arXiv:2212.01884  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Melody transcription via generative pre-training

    Authors: Chris Donahue, John Thickstun, Percy Liang

    Abstract: Despite the central role that melody plays in music perception, it remains an open challenge in music information retrieval to reliably detect the notes of the melody present in an arbitrary music recording. A key challenge in melody transcription is building methods which can handle broad audio containing any number of instrument ensembles and musical styles - existing strategies work well for so…

    Submitted 4 December, 2022; originally announced December 2022.

    Comments: Published as a conference paper at ISMIR 2022

  16. arXiv:2207.00156  [pdf, other]

    eess.IV cs.CV cs.LG

    Usable Region Estimate for Assessing Practical Usability of Medical Image Segmentation Models

    Authors: Yizhe Zhang, Suraj Mishra, Peixian Liang, Hao Zheng, Danny Z. Chen

    Abstract: We aim to quantitatively measure the practical usability of medical image segmentation models: to what extent, how often, and on which samples a model's predictions can be used/trusted. We first propose a measure, Correctness-Confidence Rank Correlation (CCRC), to capture how predictions' confidence estimates correlate with their correctness scores in rank. A model with a high value of CCRC means…

    Submitted 30 June, 2022; originally announced July 2022.

    Comments: Accepted by MICCAI2022

  17. arXiv:2202.04312  [pdf, other]

    cs.NI eess.SP

    Using 5G in Smart Cities: A Systematic Mapping Study

    Authors: Chen Yang, Peng Liang, Liming Fu, Guorui Cui, Fei Huang, Feng Teng, Yawar Abbas Bangash

    Abstract: 5G is the fifth-generation wireless network, with characteristics such as high bandwidth and high data rates. The scenarios of using 5G include enhanced Mobile Broadband (eMBB), massive Machine Type Communications (mMTC), and ultra-Reliable and Low-Latency Communications (uRLLC). 5G is expected to support a wide variety of applications. We conducted a systematic mapping study that covers the li…

    Submitted 15 February, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

    Comments: Preprint accepted for publication in Intelligent Systems with Applications, 2022

  18. arXiv:2202.02487  [pdf, other]

    eess.SP

    An Olfactory EEG Signal Classification Network Based on Frequency Band Feature Extraction

    Authors: Biao Sun, Zhigang Wei, Pei Liang, Huirang Hou

    Abstract: Classification of olfactory-induced electroencephalogram (EEG) signals has shown great potential in many fields. Since different frequency bands within EEG signals contain different information, extracting specific frequency bands is important for classification performance. Moreover, due to the large inter-subject variability of EEG signals, extracting frequency bands with subject-specifi…

    Submitted 4 February, 2022; originally announced February 2022.

  19. arXiv:2107.05677  [pdf, other]

    cs.SD cs.IR cs.LG cs.MM eess.AS

    Codified audio language modeling learns useful representations for music information retrieval

    Authors: Rodrigo Castellon, Chris Donahue, Percy Liang

    Abstract: We demonstrate that language models pre-trained on codified (discretely-encoded) music audio learn representations that are useful for downstream MIR tasks. Specifically, we explore representations from Jukebox (Dhariwal et al. 2020): a music generation system containing a language model trained on codified audio from 1M songs. To determine if Jukebox's representations contain useful information f…

    Submitted 12 July, 2021; originally announced July 2021.

    Comments: To appear in the proceedings of ISMIR 2021

  20. arXiv:2105.14766  [pdf, other]

    eess.IV cs.CV

    BaMBNet: A Blur-aware Multi-branch Network for Defocus Deblurring

    Authors: Pengwei Liang, Junjun Jiang, Xianming Liu, Jiayi Ma

    Abstract: Defocus deblurring, a problem arising from finite aperture size and exposure time, is an essential problem in computational photography. It is very challenging because the blur kernel is spatially varying and difficult to estimate with traditional methods. Owing to their breakthroughs in low-level tasks, convolutional neural networks (CNNs) have been introduced to the defocus deblurring problem and…

    Submitted 31 May, 2021; originally announced May 2021.

    Comments: 11 pages, 8 figures

  21. arXiv:2101.08919  [pdf, other]

    eess.AS cs.CR cs.SD

    Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks

    Authors: Peter Wu, Paul Pu Liang, Jiatong Shi, Ruslan Salakhutdinov, Shinji Watanabe, Louis-Philippe Morency

    Abstract: As users increasingly rely on cloud-based computing services, it is important to ensure that uploaded speech data remains private. Existing solutions rely either on server-side methods or focus on hiding speaker identity. While these approaches reduce certain security concerns, they do not give users client-side control over whether their biometric information is sent to the server. In this paper,…

    Submitted 22 October, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

  22. arXiv:1906.02125  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS stat.ML

    Strong and Simple Baselines for Multimodal Utterance Embeddings

    Authors: Paul Pu Liang, Yao Chong Lim, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Human language is a rich multimodal signal consisting of spoken words, facial expressions, body gestures, and vocal intonations. Learning representations for these spoken utterances is a complex research problem due to the presence of multiple heterogeneous sources of information. Recent advances in multimodal learning have followed the general trend of building more complex models that utilize va…

    Submitted 28 February, 2020; v1 submitted 14 May, 2019; originally announced June 2019.

    Comments: NAACL 2019 oral presentation