Showing 1–50 of 57 results for author: Dai, W

Searching in archive eess.
  1. arXiv:2505.15868  [pdf]

    q-bio.QM cs.AI eess.IV

    An Inclusive Foundation Model for Generalizable Cytogenetics in Precision Oncology

    Authors: Changchun Yang, Weiqian Dai, Yilan Zhang, Siyuan Chen, Jingdong Hu, Junkai Su, Yuxuan Chen, Ao Xu, Na Li, Xin Gao, Yongguo Yu

    Abstract: Chromosome analysis is vital for diagnosing genetic disorders and guiding cancer therapy decisions through the identification of somatic clonal aberrations. However, developing an AI model are hindered by the overwhelming complexity and diversity of chromosomal abnormalities, requiring extensive annotation efforts, while automated methods remain task-specific and lack generalizability due to the s…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: These authors contributed equally to this work: Changchun Yang, Weiqian Dai, Yilan Zhang

  2. arXiv:2503.07667  [pdf, other]

    cs.LG cs.AI cs.CV eess.SP

    CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models

    Authors: Wei Dai, Peilin Chen, Malinda Lu, Daniel Li, Haowen Wei, Hejie Cui, Paul Pu Liang

    Abstract: Recent advances in clinical AI have enabled remarkable progress across many clinical domains. However, existing benchmarks and models are primarily limited to a small set of modalities and tasks, which hinders the development of large-scale multimodal methods that can make holistic assessments of patient health and well-being. To bridge this gap, we introduce Clinical Large-Scale Integrative Multi…

    Submitted 20 March, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

  3. arXiv:2502.07012  [pdf, ps, other]

    eess.SP

    Bayesian Beamforming for Integrated Sensing and Communication Systems

    Authors: Zongyao Zhao, Zhenyu Liu, Wei Dai, Xinke Tang, Xiao-Ping Zhang, Yuhan Dong

    Abstract: The uncertainty of the sensing target brings great challenge to the beamforming design of the integrated sensing and communication (ISAC) system. To address this issue, we model the scattering coefficient and azimuth angle of the target as random variables and introduce a novel metric, expected detection probability (EPd), to quantify the average detection performance from a Bayesian perspective.…

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: 6 pages, 6 figures

  4. arXiv:2501.13751  [pdf, other]

    eess.IV cs.CV

    On Disentangled Training for Nonlinear Transform in Learned Image Compression

    Authors: Han Li, Shaohui Li, Wenrui Dai, Maida Cao, Nuowen Kan, Chenglin Li, Junni Zou, Hongkai Xiong

    Abstract: Learned image compression (LIC) has demonstrated superior rate-distortion (R-D) performance compared to traditional codecs, but is challenged by training inefficiency that could incur more than two weeks to train a state-of-the-art model from scratch. Existing LIC methods overlook the slow convergence caused by compacting energy in learning nonlinear transforms. In this paper, we first reveal that…

    Submitted 15 February, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

    Comments: Accepted by ICLR2025

  5. arXiv:2412.14369  [pdf, ps, other]

    eess.SP

    Uncertainty Awareness in Wireless Communications and Sensing

    Authors: Shixiong Wang, Wei Dai, Jianyong Sun, Zongben Xu, Geoffrey Ye Li

    Abstract: Wireless communications and sensing (WCS) establish the backbone of modern information exchange and environment perception. Typical applications range from mobile networks and the Internet of Things to radar and sensor grids. Despite transformative capabilities, wireless systems often face diverse uncertainties in design and operation, such as modeling errors due to incomplete physical knowledge,…

    Submitted 7 April, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

    Journal ref: IEEE Communications Magazine, May 2025

  6. arXiv:2411.06564  [pdf, other]

    eess.SP

    Robust Beamforming with Application in High-Resolution Sensing

    Authors: Shixiong Wang, Wei Dai, Geoffrey Ye Li

    Abstract: As a fundamental technique in array signal processing, beamforming plays a crucial role in amplifying signals of interest while mitigating interference and noise. When uncertainties exist in the signal model or the data size of snapshots is limited, the performance of beamformers significantly degrades. In this article, we comprehensively study the conceptual system, theoretical analysis, and algo…

    Submitted 10 November, 2024; originally announced November 2024.

  7. arXiv:2410.17343  [pdf]

    eess.SP cs.AI cs.LG

    EEG-DIF: Early Warning of Epileptic Seizures through Generative Diffusion Model-based Multi-channel EEG Signals Forecasting

    Authors: Zekun Jiang, Wei Dai, Qu Wei, Ziyuan Qin, Kang Li, Le Zhang

    Abstract: Multi-channel EEG signals are commonly used for the diagnosis and assessment of diseases such as epilepsy. Currently, various EEG diagnostic algorithms based on deep learning have been developed. However, most research efforts focus solely on diagnosing and classifying current signal data but do not consider the prediction of future trends for early warning. Additionally, since multi-channel EEG c…

    Submitted 22 October, 2024; originally announced October 2024.

    Comments: 9 pages, 4 figures, 3 tables, accepted by ACM BCB 2024

  8. arXiv:2409.00356  [pdf, other]

    cs.SD cs.AI eess.AS

    Contrastive Augmentation: An Unsupervised Learning Approach for Keyword Spotting in Speech Technology

    Authors: Weinan Dai, Yifeng Jiang, Yuanjing Liu, Jinkun Chen, Xin Sun, Jinglei Tao

    Abstract: This paper addresses the persistent challenge in Keyword Spotting (KWS), a fundamental component in speech technology, regarding the acquisition of substantial labeled data for training. Given the difficulty in obtaining large quantities of positive samples and the laborious process of collecting new target samples when the keyword changes, we introduce a novel approach combining unsupervised cont…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: This paper has been accepted by the ICPR2024

  9. arXiv:2407.07720  [pdf, other]

    eess.IV cs.CV

    Exploiting Scale-Variant Attention for Segmenting Small Medical Objects

    Authors: Wei Dai, Rui Liu, Zixuan Wu, Tianyi Wu, Min Wang, Junxian Zhou, Yixuan Yuan, Jun Liu

    Abstract: Early detection and accurate diagnosis can predict the risk of malignant disease transformation, thereby increasing the probability of effective treatment. Identifying mild syndrome with small pathological regions serves as an ominous warning and is fundamental in the early diagnosis of diseases. While deep learning algorithms, particularly convolutional neural networks (CNNs), have shown promise…

    Submitted 5 August, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: 14 pages, 9 figures, under review

  10. arXiv:2406.16692  [pdf, other]

    eess.SP

    Stationary and Sparse Denoising Approach for Corticomuscular Causality Estimation

    Authors: Farwa Abbas, Verity McClelland, Zoran Cvetkovic, Wei Dai

    Abstract: Objective: Cortico-muscular communication patterns are instrumental in understanding movement control. Estimating significant causal relationships between motor cortex electroencephalogram (EEG) and surface electromyogram (sEMG) from concurrently active muscles presents a formidable challenge since the relevant processes underlying muscle control are typically weak in comparison to measurement noi…

    Submitted 21 January, 2025; v1 submitted 24 June, 2024; originally announced June 2024.

  11. arXiv:2406.03228  [pdf, other]

    eess.AS

    Reference Channel Selection by Multi-Channel Masking for End-to-End Multi-Channel Speech Enhancement

    Authors: Wang Dai, Xiaofei Li, Archontis Politis, Tuomas Virtanen

    Abstract: In end-to-end multi-channel speech enhancement, the traditional approach of designating one microphone signal as the reference for processing may not always yield optimal results. The limitation is particularly in scenarios with large distributed microphone arrays with varying speaker-to-microphone distances or compact, highly directional microphone arrays where speaker or microphone positions cha…

    Submitted 11 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by EUSIPCO 2024

  12. arXiv:2405.07739  [pdf, ps, other]

    eess.SP

    A Low-rank Projected Proximal Gradient Method for Spectral Compressed Sensing

    Authors: Xi Yao, Wei Dai

    Abstract: This paper presents a new approach to the recovery of a spectrally sparse signal (SSS) from partially observed entries, focusing on challenges posed by large-scale data and heavy noise environments. The SSS reconstruction can be formulated as a non-convex low-rank Hankel recovery problem. Traditional formulations for SSS recovery often suffer from reconstruction inaccuracies due to unequally weigh…

    Submitted 13 May, 2024; originally announced May 2024.

  13. arXiv:2402.19176  [pdf, other]

    math.OC eess.SP

    Proximal Dogleg Opportunistic Majorization for Nonconvex and Nonsmooth Optimization

    Authors: Yiming Zhou, Wei Dai

    Abstract: We consider minimizing a function consisting of a quadratic term and a proximable term which is possibly nonconvex and nonsmooth. This problem is also known as scaled proximal operator. Despite its simple form, existing methods suffer from slow convergence or high implementation complexity or both. To overcome these limitations, we develop a fast and user-friendly second-order proximal algorithm.…

    Submitted 29 February, 2024; originally announced February 2024.

  14. arXiv:2401.12345  [pdf, other]

    eess.SP

    Distributionally Robust Receive Beamforming

    Authors: Shixiong Wang, Wei Dai, Geoffrey Ye Li

    Abstract: This article investigates signal estimation in wireless transmission (i.e., receive beamforming) from the perspective of statistical machine learning, where the transmit signals may be from an integrated sensing and communication system; that is, 1) signals may be not only discrete constellation points but also arbitrary complex values; 2) signals may be spatially correlated. Particular attention…

    Submitted 10 June, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

  15. arXiv:2311.00071  [pdf, other]

    eess.SP

    Robust Waveform Design for Integrated Sensing and Communication

    Authors: Shixiong Wang, Wei Dai, Haowei Wang, Geoffrey Ye Li

    Abstract: Integrated sensing and communication (ISAC), which enables hardware, resources (e.g., spectra), and waveforms sharing, is becoming a key feature in future-generation communication systems. This paper investigates performance characterization and waveform design for ISAC systems when the underlying true communication channels are not accurately known. With uncertainty in a nominal communication cha…

    Submitted 3 June, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

    Comments: Accepted by IEEE Transactions on Signal Processing; Source Codes: https://github.com/Spratm-Asleaf/Robust-Waveform

  16. arXiv:2310.16387  [pdf, other]

    eess.IV cs.CV

    Frequency-Aware Transformer for Learned Image Compression

    Authors: Han Li, Shaohui Li, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong

    Abstract: Learned image compression (LIC) has gained traction as an effective solution for image storage and transmission in recent years. However, existing LIC methods are redundant in latent representation due to limitations in capturing anisotropic frequency components and preserving directional details. To overcome these challenges, we propose a novel frequency-aware transformer (FAT) block that for the…

    Submitted 16 December, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: ICLR2024 poster

  17. arXiv:2310.06339  [pdf, other]

    eess.IV cs.LG

    Automatic nodule identification and differentiation in ultrasound videos to facilitate per-nodule examination

    Authors: Siyuan Jiang, Yan Ding, Yuling Wang, Lei Xu, Wenli Dai, Wanru Chang, Jianfeng Zhang, Jie Yu, Jianqiao Zhou, Chunquan Zhang, Ping Liang, Dexing Kong

    Abstract: Ultrasound is a vital diagnostic technique in health screening, with the advantages of non-invasive, cost-effective, and radiation free, and therefore is widely applied in the diagnosis of nodules. However, it relies heavily on the expertise and clinical experience of the sonographer. In ultrasound images, a single nodule might present heterogeneous appearances in different cross-sectional views w…

    Submitted 10 October, 2023; originally announced October 2023.

  18. arXiv:2310.04705  [pdf, other]

    eess.IV cs.CV

    Multi-scale MRI reconstruction via dilated ensemble networks

    Authors: Wendi Ma, Marlon Bran Lorenzana, Wei Dai, Hongfu Sun, Shekhar S. Chandra

    Abstract: As aliasing artefacts are highly structural and non-local, many MRI reconstruction networks use pooling to enlarge filter coverage and incorporate global context. However, this inadvertently impedes fine detail recovery as downsampling creates a resolution bottleneck. Moreover, real and imaginary features are commonly split into separate channels, discarding phase information particularly importan…

    Submitted 30 November, 2023; v1 submitted 7 October, 2023; originally announced October 2023.

  19. arXiv:2308.14983  [pdf]

    cs.AI cs.LG eess.SP

    Constructive Incremental Learning for Fault Diagnosis of Rolling Bearings with Ensemble Domain Adaptation

    Authors: Jiang Liu, Wei Dai

    Abstract: Given the prevalence of rolling bearing fault diagnosis as a practical issue across various working conditions, the limited availability of samples compounds the challenge. Additionally, the complexity of the external environment and the structure of rolling bearings often manifests faults characterized by randomness and fuzziness, hindering the effective extraction of fault characteristics and re…

    Submitted 28 August, 2023; originally announced August 2023.

  20. arXiv:2308.00291  [pdf, other]

    eess.IV cs.CV

    Fundus-Enhanced Disease-Aware Distillation Model for Retinal Disease Classification from OCT Images

    Authors: Lehan Wang, Weihang Dai, Mei Jin, Chubin Ou, Xiaomeng Li

    Abstract: Optical Coherence Tomography (OCT) is a novel and effective screening tool for ophthalmic examination. Since collecting OCT images is relatively more expensive than fundus photographs, existing methods use multi-modal learning to complement limited OCT data with additional context from fundus images. However, the multi-modal framework requires eye-paired datasets of both modalities, which is impra…

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: Accepted as a conference paper at MICCAI 2023

  21. arXiv:2306.04286  [pdf, other]

    cs.SD cs.AI eess.AS

    A Mask Free Neural Network for Monaural Speech Enhancement

    Authors: Liang Liu, Haixin Guan, Jinlong Ma, Wei Dai, Guangyong Wang, Shaowei Ding

    Abstract: In speech enhancement, the lack of clear structural characteristics in the target speech phase requires the use of conservative and cumbersome network frameworks. It seems difficult to achieve competitive performance using direct methods and simple network architectures. However, we propose the MFNet, a direct and simple network that can not only map speech but also map reverse noise. This network…

    Submitted 7 June, 2023; originally announced June 2023.

  22. arXiv:2304.03209  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV eess.SP

    Implicit Anatomical Rendering for Medical Image Segmentation with Stochastic Experts

    Authors: Chenyu You, Weicheng Dai, Yifei Min, Lawrence Staib, James S. Duncan

    Abstract: Integrating high-level semantically correlated contents and low-level anatomical features is of central importance in medical image segmentation. Towards this end, recent deep learning-based medical segmentation methods have shown great promise in better modeling such information. However, convolution operators for medical segmentation typically operate on regular grids, which inherently blur the…

    Submitted 17 July, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

    Comments: Accepted at International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2023)

  23. arXiv:2304.02689  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    ACTION++: Improving Semi-supervised Medical Image Segmentation with Adaptive Anatomical Contrast

    Authors: Chenyu You, Weicheng Dai, Yifei Min, Lawrence Staib, Jasjeet S. Sekhon, James S. Duncan

    Abstract: Medical data often exhibits long-tail distributions with heavy class imbalance, which naturally leads to difficulty in classifying the minority classes (i.e., boundary regions or rare objects). Recent work has significantly improved semi-supervised medical image segmentation in long-tailed scenarios by equipping them with unsupervised contrastive criteria. However, it remains unclear how well they…

    Submitted 17 July, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

    Comments: Accepted by International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2023)

  24. arXiv:2303.07816  [pdf, other]

    eess.AS cs.SD

    Multi-Channel Masking with Learnable Filterbank for Sound Source Separation

    Authors: Wang Dai, Archontis Politis, Tuomas Virtanen

    Abstract: This work proposes a learnable filterbank based on a multi-channel masking framework for multi-channel source separation. The learnable filterbank is a 1D Conv layer, which transforms the raw waveform into a 2D representation. In contrast to the conventional single-channel masking method, we estimate a mask for each individual microphone channel. The estimated masks are then applied to the transfo…

    Submitted 14 March, 2023; originally announced March 2023.

  25. arXiv:2303.05696  [pdf, other]

    eess.IV cs.CV cs.LG

    Explainable Semantic Medical Image Segmentation with Style

    Authors: Wei Dai, Siyu Liu, Craig B. Engstrom, Shekhar S. Chandra

    Abstract: Semantic medical image segmentation using deep learning has recently achieved high accuracy, making it appealing to clinical problems such as radiation therapy. However, the lack of high-quality semantically labelled data remains a challenge leading to model brittleness to small shifts to input data. Most works require extra data for semi-supervised learning and lack the interpretability of the bo…

    Submitted 9 March, 2023; originally announced March 2023.

  26. arXiv:2303.02666  [pdf, other]

    eess.IV cs.CV cs.LG

    Learned Lossless Compression for JPEG via Frequency-Domain Prediction

    Authors: Jixiang Luo, Shaohui Li, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong

    Abstract: JPEG images can be further compressed to enhance the storage and transmission of large-scale image datasets. Existing learned lossless compressors for RGB images cannot be well transferred to JPEG images due to the distinguishing distribution of DCT coefficients and raw pixels. In this paper, we propose a novel framework for learned lossless compression of JPEG images that achieves end-to-end opti…

    Submitted 5 March, 2023; originally announced March 2023.

  27. arXiv:2302.01735  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective

    Authors: Chenyu You, Weicheng Dai, Yifei Min, Fenglin Liu, David A. Clifton, S Kevin Zhou, Lawrence Hamilton Staib, James S Duncan

    Abstract: For medical image segmentation, contrastive learning is the dominant practice to improve the quality of visual representations by contrasting semantically similar and dissimilar pairs of samples. This is enabled by the observation that without accessing ground truth labels, negative examples with truly dissimilar anatomical features, if sampled, can significantly improve the performance. In realit…

    Submitted 23 October, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: Accepted by Advances in Neural Information Processing Systems (NeurIPS 2023)

  28. Discovering Limitations of Image Quality Assessments with Noised Deep Learning Image Sets

    Authors: Wei Dai, Daniel Berleant

    Abstract: Image quality is important, and can affect overall performance in image processing and computer vision as well as for numerous other reasons. Image quality assessment (IQA) is consequently a vital task in different applications from aerial photography interpretation to object detection to medical image analysis. In previous research, the BRISQUE algorithm and the PSNR algorithm were evaluated with…

    Submitted 29 January, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

    Comments: 10 pages, 11 figures, 10 tables

    ACM Class: I.4

    Journal ref: 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 2022, pp. 3735-3744

  29. arXiv:2209.13476  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Mine yOur owN Anatomy: Revisiting Medical Image Segmentation with Extremely Limited Labels

    Authors: Chenyu You, Weicheng Dai, Fenglin Liu, Yifei Min, Nicha C. Dvornek, Xiaoxiao Li, David A. Clifton, Lawrence Staib, James S. Duncan

    Abstract: Recent studies on contrastive learning have achieved remarkable performance solely by leveraging few labels in the context of medical image segmentation. Existing methods mainly focus on instance discrimination and invariant mapping. However, they face three common pitfalls: (1) tailness: medical image data usually follows an implicit long-tail class distribution. Blindly leveraging all pixels in…

    Submitted 22 September, 2024; v1 submitted 27 September, 2022; originally announced September 2022.

    Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE-TPAMI)

  30. arXiv:2208.03028  [pdf, other]

    eess.IV cs.CV

    Multimodal Brain Disease Classification with Functional Interaction Learning from Single fMRI Volume

    Authors: Wei Dai, Ziyao Zhang, Lixia Tian, Shengyuan Yu, Shuhui Wang, Zhao Dong, Hairong Zheng

    Abstract: In neuroimaging analysis, fMRI can well assess the function changes for brain diseases with no obvious structural lesions. To date, most deep-learning-based fMRI studies have employed functional connectivity (FC) as the basic feature for disease classification. However, FC is calculated on time series of predefined regions of interest and neglects detailed information contained in each voxel. Anot…

    Submitted 1 March, 2023; v1 submitted 5 August, 2022; originally announced August 2022.

  31. arXiv:2207.02663  [pdf, other]

    cs.CL cs.SD eess.AS

    Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands

    Authors: Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J Barezi, Pascale Fung

    Abstract: With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities. In-car smart assistants should be able to process general as well as car-related commands and perform corresponding actions, which eases driving and improves safety. However, in this research field, most datasets are in major…

    Submitted 6 July, 2022; originally announced July 2022.

  32. arXiv:2206.07519  [pdf, other]

    eess.SP cs.DB cs.LG

    Smart Meter Data Anomaly Detection using Variational Recurrent Autoencoders with Attention

    Authors: Wenjing Dai, Xiufeng Liu, Alfred Heller, Per Sieverts Nielsen

    Abstract: In the digitization of energy systems, sensors and smart meters are increasingly being used to monitor production, operation and demand. Detection of anomalies based on smart meter data is crucial to identify potential risks and unusual events at an early stage, which can serve as a reference for timely initiation of appropriate actions and improving management. However, smart meter data from ener…

    Submitted 8 June, 2022; originally announced June 2022.

  33. arXiv:2206.02307  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Bootstrapping Semi-supervised Medical Image Segmentation with Anatomical-aware Contrastive Distillation

    Authors: Chenyu You, Weicheng Dai, Yifei Min, Lawrence Staib, James S. Duncan

    Abstract: Contrastive learning has shown great promise over annotation scarcity problems in the context of medical image segmentation. Existing approaches typically assume a balanced class distribution for both labeled and unlabeled medical images. However, medical image data in reality is commonly imbalanced (i.e., multi-class label imbalance), which naturally yields blurry contours and usually incorrectly…

    Submitted 10 March, 2023; v1 submitted 5 June, 2022; originally announced June 2022.

    Comments: Accepted at Information Processing in Medical Imaging (IPMI 2023)

  34. arXiv:2205.04326  [pdf, other]

    eess.IV cs.CV cs.LG

    Deeply Supervised Skin Lesions Diagnosis with Stage and Branch Attention

    Authors: Wei Dai, Rui Liu, Tianyi Wu, Min Wang, Jianqin Yin, Jun Liu

    Abstract: Accurate and unbiased examinations of skin lesions are critical for the early diagnosis and treatment of skin diseases. Visual features of skin lesions vary significantly because the images are collected from patients with different lesion colours and morphologies by using dissimilar imaging equipment. Recent studies have reported that ensembled convolutional neural networks (CNNs) are practical t…

    Submitted 23 August, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

    Comments: 11 pages, 9 figures

  35. arXiv:2204.11640  [pdf, other]

    cs.CV cs.LG eess.IV

    Hybrid ISTA: Unfolding ISTA With Convergence Guarantees Using Free-Form Deep Neural Networks

    Authors: Ziyang Zheng, Wenrui Dai, Duoduo Xue, Chenglin Li, Junni Zou, Hongkai Xiong

    Abstract: It is promising to solve linear inverse problems by unfolding iterative algorithms (e.g., iterative shrinkage thresholding algorithm (ISTA)) as deep neural networks (DNNs) with learnable parameters. However, existing ISTA-based unfolded algorithms restrict the network architectures for iterative updates with the partial weight coupling structure to guarantee convergence. In this paper, we propose…

    Submitted 5 May, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

    Comments: 109 pages, 16 figures; this is a draft and the final version has been accepted by TPAMI (DOI: 10.1109/TPAMI.2022.3172214)

  36. arXiv:2203.16954  [pdf, other]

    cs.CL cs.SD eess.AS

    An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer

    Authors: Wenlin Dai, Changhe Song, Xiang Li, Zhiyong Wu, Huashan Pan, Xiulin Li, Helen Meng

    Abstract: Text normalization, defined as a procedure transforming non standard words to spoken-form words, is crucial to the intelligibility of synthesized speech in text-to-speech system. Rule-based methods without considering context can not eliminate ambiguation, whereas sequence-to-sequence neural network based methods suffer from the unexpected and uninterpretable errors problem. Recently proposed hybr…

    Submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted by ICASSP 2022

  37. arXiv:2201.02419  [pdf, other]

    cs.CL cs.SD eess.AS

    Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset

    Authors: Tiezheng Yu, Rita Frieske, Peng Xu, Samuel Cahyawijaya, Cheuk Tung Shadow Yiu, Holy Lovenia, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

    Abstract: Automatic speech recognition (ASR) on low resource languages improves the access of linguistic minorities to technological advantages provided by artificial intelligence (AI). In this paper, we address the problem of data scarcity for the Hong Kong Cantonese language by creating a new Cantonese dataset. Our dataset, Multi-Domain Cantonese Corpus (MDCC), consists of 73.6 hours of clean read speech…

    Submitted 17 January, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

  38. arXiv:2110.12786  [pdf, ps, other]

    eess.SP cs.LG

    Dictionary Learning Using Rank-One Atomic Decomposition (ROAD)

    Authors: Cheng Cheng, Wei Dai

    Abstract: Dictionary learning aims at seeking a dictionary under which the training data can be sparsely represented. Methods in the literature typically formulate the dictionary learning problem as an optimization w.r.t. two variables, i.e., dictionary and sparse coefficients, and solve it by alternating between two stages: sparse coding and dictionary update. The key contribution of this work is a Rank-On…

    Submitted 26 October, 2021; v1 submitted 25 October, 2021; originally announced October 2021.

    Comments: arXiv admin note: text overlap with arXiv:1911.08975

  39. arXiv:2110.06641  [pdf, ps, other]

    eess.SP cs.LG

    Dictionary Learning with Convex Update (ROMD)

    Authors: Cheng Cheng, Wei Dai

    Abstract: Dictionary learning aims to find a dictionary under which the training data can be sparsely represented, and it is usually achieved by iteratively applying two stages: sparse coding and dictionary update. Typical methods for dictionary update focuses on refining both dictionary atoms and their corresponding sparse coefficients by using the sparsity patterns obtained from sparse coding stage, and h…

    Submitted 25 October, 2021; v1 submitted 13 October, 2021; originally announced October 2021.

  40. arXiv:2110.02141  [pdf, ps, other]

    eess.SP

    Short-and-Sparse Deconvolution Via Rank-One Constrained Optimization (ROCO)

    Authors: Cheng Cheng, Wei Dai

    Abstract: Short-and-sparse deconvolution (SaSD) aims to recover a short kernel and a long and sparse signal from their convolution. In the literature, formulations of blind deconvolution is either a convex programming via a matrix lifting of convolution, or a bilinear Lasso. Optimization solvers are typically based on bilinear factorizations. In this paper, we formulate SaSD as a non-convex optimization wit…

    Submitted 22 November, 2021; v1 submitted 5 October, 2021; originally announced October 2021.

  41. arXiv:2109.05443  [pdf, other]

    eess.IV cs.CV

    CAN3D: Fast 3D Medical Image Segmentation via Compact Context Aggregation

    Authors: Wei Dai, Boyeong Woo, Siyu Liu, Matthew Marques, Craig B. Engstrom, Peter B. Greer, Stuart Crozier, Jason A. Dowling, Shekhar S. Chandra

    Abstract: Direct automatic segmentation of objects from 3D medical imaging, such as magnetic resonance (MR) imaging, is challenging as it often involves accurately identifying a number of individual objects with complex geometries within a large volume under investigation. To address these challenges, most deep learning approaches typically enhance their learning capability by substantially increasing the c…

    Submitted 22 September, 2021; v1 submitted 12 September, 2021; originally announced September 2021.

    Comments: 21 pages, 7 figures

  42. arXiv:2107.12829  [pdf, other]

    eess.SY

    Conflict-Free Four-Dimensional Path Planning for Urban Air Mobility Considering Airspace Occupations

    Authors: Wei Dai, Bizhao Pang, Kin Huat Low

    Abstract: Urban air mobility (UAM) has attracted the attention of aircraft manufacturers, air navigation service providers and governments in recent years. Preventing conflicts among urban aircraft is crucial to UAM traffic safety, which is key to enabling large-scale UAM operation. Pre-flight conflict-free path planning can provide a strategic layer in the maintenance of safety performance, thus becom…

    Submitted 27 July, 2021; originally announced July 2021.

  43. arXiv:2107.05097  [pdf, other]

    cs.LG cs.CV eess.IV q-bio.NC

    BrainNNExplainer: An Interpretable Graph Neural Network Framework for Brain Network based Disease Analysis

    Authors: Hejie Cui, Wei Dai, Yanqiao Zhu, Xiaoxiao Li, Lifang He, Carl Yang

    Abstract: Interpretable brain network models for disease prediction are of great value for the advancement of neuroscience. GNNs are promising for modeling complicated network data, but they are prone to overfitting and suffer from poor interpretability, which prevents their usage in decision-critical scenarios like healthcare. To bridge this gap, we propose BrainNNExplainer, an interpretable GNN framework for…

    Submitted 11 July, 2021; originally announced July 2021.

    Comments: This paper has been accepted to ICML 2021 Workshop on Interpretable Machine Learning in Healthcare

    MSC Class: 68T07; 68T45; 68T20

    ACM Class: I.2.6; I.2.10; J.3

  44. arXiv:2106.09910  [pdf, other]

    cs.LG cs.AI eess.SP stat.ML

    Message Passing in Graph Convolution Networks via Adaptive Filter Banks

    Authors: Xing Gao, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong, Pascal Frossard

    Abstract: Graph convolution networks, like message passing graph convolution networks (MPGCNs), have been a powerful tool in representation learning of networked data. However, when data is heterogeneous, most architectures are limited as they employ a single strategy to handle multi-channel graph signals and they typically focus on low-frequency information. In this paper, we present a novel graph convolut…

    Submitted 18 June, 2021; originally announced June 2021.

  45. arXiv:2103.07770  [pdf, other]

    eess.IV cs.CV

    VMAF And Variants: Towards A Unified VQA

    Authors: Pankaj Topiwala, Wei Dai, Jiangfeng Pian, Katalina Biondi, Arvind Krovvidi

    Abstract: Video quality assessment (VQA) is now a fast-growing subject, maturing in the full reference (FR) case, yet challenging in the exploding no reference (NR) case. We investigate variants of the popular VMAF video quality assessment algorithm for the FR case, using both support vector regression and feedforward neural networks. We extend it to the NR case, using some different features but similar le…

    Submitted 8 October, 2021; v1 submitted 13 March, 2021; originally announced March 2021.

    Comments: Some calculational errors have been fixed in this version

  46. arXiv:2006.15578  [pdf, other]

    eess.IV cs.CV

    Generalisable 3D Fabric Architecture for Streamlined Universal Multi-Dataset Medical Image Segmentation

    Authors: Siyu Liu, Wei Dai, Craig Engstrom, Jurgen Fripp, Stuart Crozier, Jason A. Dowling, Shekhar S. Chandra

    Abstract: Data scarcity is common in deep learning models for medical image segmentation. Previous works proposed multi-dataset learning, either simultaneously or via transfer learning, to expand training sets. However, medical image datasets have diverse-sized images and features, and developing a model simultaneously for multiple datasets is challenging. This work proposes Fabric Image Representation Encod…

    Submitted 28 November, 2022; v1 submitted 28 June, 2020; originally announced June 2020.

  47. arXiv:2006.11118  [pdf, other]

    cs.LG eess.SP stat.ML

    Graph Pooling with Node Proximity for Hierarchical Representation Learning

    Authors: Xing Gao, Wenrui Dai, Chenglin Li, Hongkai Xiong, Pascal Frossard

    Abstract: Graph neural networks have recently attracted wide attention for enabling representation learning of graph data. Complementing graph convolution operators, graph pooling is crucial for extracting hierarchical representations of graph data. However, most recent graph pooling methods still fail to efficiently exploit the geometry of graph data. In this paper, we propose a novel graph pooling…

    Submitted 19 June, 2020; originally announced June 2020.

  48. arXiv:2005.10803  [pdf, other]

    eess.AS cs.SD

    Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism

    Authors: Wang Dai, Jinsong Zhang, Yingming Gao, Wei Wei, Dengfeng Ke, Binghuai Lin, Yanlu Xie

    Abstract: Formant tracking is one of the most fundamental problems in speech processing. Traditionally, formants are estimated using signal processing methods. Recent studies showed that generic convolutional architectures can outperform recurrent networks on temporal tasks such as speech synthesis and machine translation. In this paper, we explored the use of a Temporal Convolutional Network (TCN) for forman…

    Submitted 8 August, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

    Comments: Accepted by Interspeech 2020

  49. arXiv:2001.02908  [pdf, other]

    eess.SP cs.LG

    Spatial-Temporal Transformer Networks for Traffic Flow Forecasting

    Authors: Mingxing Xu, Wenrui Dai, Chunmiao Liu, Xing Gao, Weiyao Lin, Guo-Jun Qi, Hongkai Xiong

    Abstract: Traffic forecasting has emerged as a core component of intelligent transportation systems. However, timely and accurate traffic forecasting, especially long-term forecasting, remains an open challenge due to the highly nonlinear and dynamic spatial-temporal dependencies of traffic flows. In this paper, we propose a novel paradigm of Spatial-Temporal Transformer Networks (STTNs) that leverages dy…

    Submitted 29 March, 2021; v1 submitted 9 January, 2020; originally announced January 2020.

  50. arXiv:1911.08975  [pdf, other]

    eess.SP

    Dictionary Learning Using Rank-One Projection (ROP)

    Authors: Cheng Cheng, Wei Dai

    Abstract: Dictionary learning aims to find a dictionary that can sparsely represent the training data. Methods in the literature typically formulate the dictionary learning problem as an optimisation with respect to two variables, i.e., dictionary and sparse coefficients, and solve it by alternating between two stages: sparse coding and dictionary update. The key contribution of this work is a Rank-One Proj…

    Submitted 20 November, 2019; originally announced November 2019.