Showing 1–50 of 229 results for author: Chen, Q

Searching in archive eess.
  1. arXiv:2507.03814

    eess.SP

    SHAP-AAD: DeepSHAP-Guided Channel Reduction for EEG Auditory Attention Detection

    Authors: Rayan Salmi, Guorui Lu, Qinyu Chen

    Abstract: Electroencephalography (EEG)-based auditory attention detection (AAD) offers a non-invasive way to enhance hearing aids, but conventional methods rely on too many electrodes, limiting wearability and comfort. This paper presents SHAP-AAD, a two-stage framework that combines DeepSHAP-based channel selection with a lightweight temporal convolutional network (TCN) for efficient AAD using fewer channe…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: 5 pages, conference

  2. arXiv:2507.01291

    eess.IV cs.CV

    PanTS: The Pancreatic Tumor Segmentation Dataset

    Authors: Wenxuan Li, Xinze Zhou, Qi Chen, Tianyu Lin, Pedro R. A. S. Bassi, Szymon Plotka, Jaroslaw B. Cwikla, Xiaoxi Chen, Chen Ye, Zheren Zhu, Kai Ding, Heng Li, Kang Wang, Yang Yang, Yucheng Tang, Daguang Xu, Alan L. Yuille, Zongwei Zhou

    Abstract: PanTS is a large-scale, multi-institutional dataset curated to advance research in pancreatic CT analysis. It contains 36,390 CT scans from 145 medical centers, with expert-validated, voxel-wise annotations of over 993,000 anatomical structures, covering pancreatic tumors, pancreas head, body, and tail, and 24 surrounding anatomical structures such as vascular/skeletal structures and abdominal/tho…

    Submitted 1 July, 2025; originally announced July 2025.

  3. arXiv:2506.21448

    eess.AS cs.CV cs.SD

    ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing

    Authors: Huadai Liu, Jialei Wang, Kaicheng Luo, Wen Wang, Qian Chen, Zhou Zhao, Wei Xue

    Abstract: While end-to-end video-to-audio generation has greatly improved, producing high-fidelity audio that authentically captures the nuances of visual content remains challenging. Like professionals in the creative industries, such generation requires sophisticated reasoning about items such as visual dynamics, acoustic environments, and temporal relationships. We present ThinkSound, a novel framework t…

    Submitted 28 June, 2025; v1 submitted 26 June, 2025; originally announced June 2025.

  4. arXiv:2506.17540

    eess.IV cs.CV cs.LG

    MTSIC: Multi-stage Transformer-based GAN for Spectral Infrared Image Colorization

    Authors: Tingting Liu, Yuan Liu, Jinhui Tang, Liyin Yuan, Chengyu Liu, Chunlai Li, Xiubao Sui, Qian Chen

    Abstract: Thermal infrared (TIR) images, acquired through thermal radiation imaging, are unaffected by variations in lighting conditions and atmospheric haze. However, TIR images inherently lack color and texture information, limiting downstream tasks and potentially causing visual fatigue. Existing colorization methods primarily rely on single-band images with limited spectral information and insufficient…

    Submitted 20 June, 2025; originally announced June 2025.

  5. arXiv:2506.12165

    eess.SP cs.AI

    TCN-DPD: Parameter-Efficient Temporal Convolutional Networks for Wideband Digital Predistortion

    Authors: Huanqiang Duan, Manno Versluis, Qinyu Chen, Leo C. N. de Vreede, Chang Gao

    Abstract: Digital predistortion (DPD) is essential for mitigating nonlinearity in RF power amplifiers, particularly for wideband applications. This paper presents TCN-DPD, a parameter-efficient architecture based on temporal convolutional networks, integrating noncausal dilated convolutions with optimized activation functions. Evaluated on the OpenDPD framework with the DPA_200MHz dataset, TCN-DPD achieves…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: Accepted to IEEE MTT-S International Microwave Symposium (IMS) 2025

  6. arXiv:2506.06566

    eess.AS cs.AI

    AS-ASR: A Lightweight Framework for Aphasia-Specific Automatic Speech Recognition

    Authors: Chen Bao, Chuanbing Huo, Qinyu Chen, Chang Gao

    Abstract: This paper proposes AS-ASR, a lightweight aphasia-specific speech recognition framework based on Whisper-tiny, tailored for low-resource deployment on edge devices. Our approach introduces a hybrid training strategy that systematically combines standard and aphasic speech at varying ratios, enabling robust generalization, and a GPT-4-based reference enhancement method that refines noisy aphasic tr…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: Under review

  7. arXiv:2506.02093

    eess.IV cs.CV

    Are Pixel-Wise Metrics Reliable for Sparse-View Computed Tomography Reconstruction?

    Authors: Tianyu Lin, Xinran Li, Chuntung Zhuang, Qi Chen, Yuanhao Cai, Kai Ding, Alan L. Yuille, Zongwei Zhou

    Abstract: Widely adopted evaluation metrics for sparse-view CT reconstruction--such as Structural Similarity Index Measure and Peak Signal-to-Noise Ratio--prioritize pixel-wise fidelity but often fail to capture the completeness of critical anatomical structures, particularly small or thin regions that are easily missed. To address this limitation, we propose a suite of novel anatomy-aware evaluation metric…

    Submitted 2 June, 2025; originally announced June 2025.

  8. arXiv:2505.24496

    eess.AS

    Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation

    Authors: Wenrui Liu, Qian Chen, Wen Wang, Yafeng Chen, Jin Xu, Zhifang Guo, Guanrou Yang, Weiqin Li, Xiaoda Yang, Tao Jin, Minghui Fang, Jialong Zuo, Bai Jionghao, Zemin Liu

    Abstract: Neural audio codecs, used as speech tokenizers, have demonstrated remarkable potential in the field of speech generation. However, to ensure high-fidelity audio reconstruction, neural audio codecs typically encode audio into long sequences of speech tokens, posing a significant challenge for downstream language models in long-context modeling. We observe that speech token sequences exhibit short-r…

    Submitted 30 May, 2025; originally announced May 2025.

  9. arXiv:2505.17589

    cs.SD cs.AI eess.AS

    CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

    Authors: Zhihao Du, Changfeng Gao, Yuxuan Wang, Fan Yu, Tianyu Zhao, Hao Wang, Xiang Lv, Hui Wang, Chongjia Ni, Xian Shi, Keyu An, Guanrou Yang, Yabin Li, Yanni Chen, Zhifu Gao, Qian Chen, Yue Gu, Mengzhe Chen, Yafeng Chen, Shiliang Zhang, Wen Wang, Jieping Ye

    Abstract: In our prior works, we introduced a scalable streaming speech synthesis model, CosyVoice 2, which integrates a large language model (LLM) and a chunk-aware flow matching (FM) model, and achieves low-latency bi-streaming speech synthesis and human-parity quality. Despite these advancements, CosyVoice 2 exhibits limitations in language coverage, domain diversity, data volume, text formats, and post-…

    Submitted 27 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

    Comments: Preprint, work in progress

  10. arXiv:2505.13826

    eess.AS cs.SD

    Pushing the Frontiers of Self-Distillation Prototypes Network with Dimension Regularization and Score Normalization

    Authors: Yafeng Chen, Chong Deng, Hui Wang, Yiheng Jiang, Han Yin, Qian Chen, Wen Wang

    Abstract: Developing robust speaker verification (SV) systems without speaker labels has been a longstanding challenge. Earlier research has highlighted a considerable performance gap between self-supervised and fully supervised approaches. In this paper, we enhance the non-contrastive self-supervised framework, Self-Distillation Prototypes Network (SDPN), by introducing dimension regularization that explic…

    Submitted 19 May, 2025; originally announced May 2025.

  11. arXiv:2505.06250

    eess.SP cs.AI cs.CV cs.LG

    DeltaDPD: Exploiting Dynamic Temporal Sparsity in Recurrent Neural Networks for Energy-Efficient Wideband Digital Predistortion

    Authors: Yizhuo Wu, Yi Zhu, Kun Qian, Qinyu Chen, Anding Zhu, John Gajadharsing, Leo C. N. de Vreede, Chang Gao

    Abstract: Digital Predistortion (DPD) is a popular technique to enhance signal quality in wideband RF power amplifiers (PAs). With increasing bandwidth and data rates, DPD faces significant energy consumption challenges during deployment, contrasting with its efficiency goals. State-of-the-art DPD models rely on recurrent neural networks (RNN), whose computational complexity hinders system efficiency. This…

    Submitted 29 April, 2025; originally announced May 2025.

    Comments: Accepted to IEEE Microwave and Wireless Technology Letters (MWTL)

  12. arXiv:2505.03266

    physics.optics cs.IT eess.SP

    Rapid diagnostics of reconfigurable intelligent surfaces using space-time-coding modulation

    Authors: Yi Ning Zheng, Lei Zhang, Xiao Qing Chen, Marco Rossi, Giuseppe Castaldi, Shuo Liu, Tie Jun Cui, Vincenzo Galdi

    Abstract: Reconfigurable intelligent surfaces (RISs) have emerged as a key technology for shaping smart wireless environments in next-generation wireless communication systems. To support the large-scale deployment of RISs, a reliable and efficient diagnostic method is essential to ensure optimal performance. In this work, a robust and efficient approach for RIS diagnostics is proposed using a space-time co…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 30 pages, 6 figures, 1 table, supporting information

  13. arXiv:2504.14906

    eess.AS cs.CV cs.SD

    OmniAudio: Generating Spatial Audio from 360-Degree Video

    Authors: Huadai Liu, Tianyi Luo, Kaicheng Luo, Qikai Jiang, Peiwen Sun, Jialei Wang, Rongjie Huang, Qian Chen, Wen Wang, Xiangtai Li, Shiliang Zhang, Zhijie Yan, Zhou Zhao, Wei Xue

    Abstract: Traditional video-to-audio generation techniques primarily focus on perspective video and non-spatial audio, often missing the spatial cues necessary for accurately representing sound sources in 3D environments. To address this limitation, we introduce a novel task, 360V2SA, to generate spatial audio from 360-degree videos, specifically producing First-order Ambisonics (FOA) audio - a standard for…

    Submitted 2 June, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

    Comments: ICML 2025

  14. arXiv:2504.12867

    eess.AS cs.AI cs.CL

    EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

    Authors: Guanrou Yang, Chen Yang, Qian Chen, Ziyang Ma, Wenxi Chen, Wen Wang, Tianrui Wang, Yifan Yang, Zhikang Niu, Wenrui Liu, Fan Yu, Zhihao Du, Zhifu Gao, ShiLiang Zhang, Xie Chen

    Abstract: Human speech goes beyond the mere transfer of information; it is a profound exchange of emotions and a connection between individuals. While Text-to-Speech (TTS) models have made huge progress, they still face challenges in controlling the emotional expression in the generated speech. In this work, we propose EmoVoice, a novel emotion-controllable TTS model that exploits large language models (LLM…

    Submitted 21 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

  15. arXiv:2504.09601

    cs.CV cs.LG cs.MM eess.IV physics.med-ph

    Mixture-of-Shape-Experts (MoSE): End-to-End Shape Dictionary Framework to Prompt SAM for Generalizable Medical Segmentation

    Authors: Jia Wei, Xiaoqi Zhao, Jonghye Woo, Jinsong Ouyang, Georges El Fakhri, Qingyu Chen, Xiaofeng Liu

    Abstract: Single domain generalization (SDG) has recently attracted growing attention in medical image segmentation. One promising strategy for SDG is to leverage consistent semantic shape priors across different imaging protocols, scanner vendors, and clinical sites. However, existing dictionary learning methods that encode shape priors often suffer from limited representational power with a small set of o…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: Accepted to CVPR 2025 workshop

  16. arXiv:2503.21818

    eess.IV cs.CV

    Deep Learning-Based Quantitative Assessment of Renal Chronicity Indices in Lupus Nephritis

    Authors: Tianqi Tu, Hui Wang, Jiangbo Pei, Xiaojuan Yu, Aidong Men, Suxia Wang, Qingchao Chen, Ying Tan, Feng Yu, Minghui Zhao

    Abstract: Background: Renal chronicity indices (CI) have been identified as strong predictors of long-term outcomes in lupus nephritis (LN) patients. However, assessment by pathologists is hindered by challenges such as substantial time requirements, high interobserver variation, and susceptibility to fatigue. This study aims to develop an effective deep learning (DL) pipeline that automates the assessment…

    Submitted 26 March, 2025; originally announced March 2025.

  17. arXiv:2503.14933

    eess.IV cs.CV physics.med-ph

    A Language Vision Model Approach for Automated Tumor Contouring in Radiation Oncology

    Authors: Yi Luo, Hamed Hooshangnejad, Xue Feng, Gaofeng Huang, Xiaojian Chen, Rui Zhang, Quan Chen, Wil Ngwa, Kai Ding

    Abstract: Background: Lung cancer ranks as the leading cause of cancer-related mortality worldwide. The complexity of tumor delineation, crucial for radiation therapy, requires expertise often unavailable in resource-limited settings. Artificial Intelligence (AI), particularly with advancements in deep learning (DL) and natural language processing (NLP), offers potential solutions yet is challenged by high f…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: 19 pages, 4 figures

  18. arXiv:2503.12010

    eess.AS

    Adaptive Mixture of Low-Rank Experts for Robust Audio Spoofing Detection

    Authors: Qixian Chen, Yuxiong Xu, Sara Mandelli, Sheng Li, Bin Li

    Abstract: In audio spoofing detection, most studies rely on clean datasets, making models susceptible to real-world post-processing attacks, such as channel compression and noise. To overcome this challenge, we propose the Adaptive MixtUre Low-rank ExperTs (AMULET) framework, which enhances resilience by leveraging attack-specific knowledge and dynamically adapting to varied attack conditions. Specifically,…

    Submitted 10 May, 2025; v1 submitted 15 March, 2025; originally announced March 2025.

    Comments: 5 pages, 1 figure, 4 tables

  19. arXiv:2503.10522

    cs.MM cs.CV cs.LG cs.SD eess.AS

    AudioX: Diffusion Transformer for Anything-to-Audio Generation

    Authors: Zeyue Tian, Yizhu Jin, Zhaoyang Liu, Ruibin Yuan, Xu Tan, Qifeng Chen, Wei Xue, Yike Guo

    Abstract: Audio and music generation have emerged as crucial tasks in many applications, yet existing approaches face significant limitations: they operate in isolation without unified capabilities across modalities, suffer from scarce high-quality, multi-modal training data, and struggle to effectively integrate diverse inputs. In this work, we propose AudioX, a unified Diffusion Transformer model for Anyt…

    Submitted 23 April, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    Comments: The code and datasets will be available at https://zeyuet.github.io/AudioX/

  20. arXiv:2503.01485

    cs.SD cs.LG eess.AS eess.SP

    FlowDec: A flow-based full-band general audio codec with high perceptual quality

    Authors: Simon Welker, Matthew Le, Ricky T. Q. Chen, Wei-Ning Hsu, Timo Gerkmann, Alexander Richard, Yi-Chiao Wu

    Abstract: We propose FlowDec, a neural full-band audio codec for general audio sampled at 48 kHz that combines non-adversarial codec training with a stochastic postfilter based on a novel conditional flow matching method. Compared to the prior work ScoreDec which is based on score matching, we generalize from speech to general audio and move from 24 kbit/s to as low as 4 kbit/s, while improving output quali…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Accepted at ICLR 2025

  21. arXiv:2503.00084

    cs.SD cs.AI cs.CL eess.AS

    InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation

    Authors: Chong Zhang, Yukun Ma, Qian Chen, Wen Wang, Shengkui Zhao, Zexu Pan, Hao Wang, Chongjia Ni, Trung Hieu Nguyen, Kun Zhou, Yidi Jiang, Chaohong Tan, Zhifu Gao, Zhihao Du, Bin Ma

    Abstract: We introduce InspireMusic, a framework integrating super resolution and a large language model for high-fidelity long-form music generation. A unified framework generates high-fidelity music, songs, and audio, which incorporates an autoregressive transformer with a super-resolution flow-matching model. This framework enables the controllable generation of high-fidelity long-form music at a higher sam…

    Submitted 28 February, 2025; originally announced March 2025.

    Comments: Work in progress. Correspondence regarding this technical report should be directed to {chong.zhang, yukun.ma}@alibaba-inc.com. Online demo available at https://modelscope.cn/studios/iic/InspireMusic and https://huggingface.co/spaces/FunAudioLLM/InspireMusic

  22. arXiv:2502.20067

    eess.AS cs.SD

    UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook

    Authors: Yidi Jiang, Qian Chen, Shengpeng Ji, Yu Xi, Wen Wang, Chong Zhang, Xianghu Yue, ShiLiang Zhang, Haizhou Li

    Abstract: The emergence of audio language models is empowered by neural audio codecs, which establish critical mappings between continuous waveforms and discrete tokens compatible with language model paradigms. The evolutionary trends from multi-layer residual vector quantizer to single-layer quantizer are beneficial for language-autoregressive decoding. However, the capability to handle multi-domain audio…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: 12 pages, 9 tables

  23. arXiv:2502.17468

    eess.SP cs.LG

    CSSSTN: A Class-sensitive Subject-to-subject Semantic Style Transfer Network for EEG Classification in RSVP Tasks

    Authors: Ziyue Yang, Chengrui Chen, Yong Peng, Qiong Chen, Wanzeng Kong

    Abstract: The Rapid Serial Visual Presentation (RSVP) paradigm represents a promising application of electroencephalography (EEG) in Brain-Computer Interface (BCI) systems. However, cross-subject variability remains a critical challenge, particularly for BCI-illiterate users who struggle to effectively interact with these systems. To address this issue, we propose the Class-Sensitive Subject-to-Subject Sema…

    Submitted 12 February, 2025; originally announced February 2025.

  24. arXiv:2502.06289

    eess.IV cs.AI cs.CV

    Is an Ultra Large Natural Image-Based Foundation Model Superior to a Retina-Specific Model for Detecting Ocular and Systemic Diseases?

    Authors: Qingshan Hou, Yukun Zhou, Jocelyn Hui Lin Goh, Ke Zou, Samantha Min Er Yew, Sahana Srinivasan, Meng Wang, Thaddaeus Lo, Xiaofeng Lei, Siegfried K. Wagner, Mark A. Chia, Dawei Yang, Hongyang Jiang, AnRan Ran, Rui Santos, Gabor Mark Somfai, Juan Helen Zhou, Haoyu Chen, Qingyu Chen, Carol Yim-Lui Cheung, Pearse A. Keane, Yih Chung Tham

    Abstract: The advent of foundation models (FMs) is transforming the medical domain. In ophthalmology, RETFound, a retina-specific FM pre-trained sequentially on 1.4 million natural images and 1.6 million retinal images, has demonstrated high adaptability across clinical applications. Conversely, DINOv2, a general-purpose vision FM pre-trained on 142 million natural images, has shown promise in non-medical domai…

    Submitted 10 February, 2025; originally announced February 2025.

  25. arXiv:2502.05845

    eess.SY

    Exploiting the Hidden Capacity of MMC Through Accurate Quantification of Modulation Indices

    Authors: Qianhao Sun, Jingwei Meng, Ruofan Li, Mingchao Xia, Qifang Chen, Jiejie Zhou, Meiqi Fan, Peiqian Guo

    Abstract: The modular multilevel converter (MMC) has become increasingly important in voltage-source converter-based high-voltage direct current (VSC-HVDC) systems. Direct and indirect modulation are widely used as mainstream modulation techniques in MMCs. However, due to the challenge of quantitatively evaluating the operation of different modulation schemes, the academic and industrial communities still h…

    Submitted 9 February, 2025; originally announced February 2025.

  26. arXiv:2502.05842

    eess.SY

    A Grid-Forming HVDC Series Tapping Converter Using Extended Techniques of Flex-LCC

    Authors: Qianhao Sun, Ruofan Li, Jichen Wang, Mingchao Xia, Qifang Chen, Meiqi Fan, Gen Li, Xuebo Qiao

    Abstract: This paper discusses an extension technology for the previously proposed Flexible Line-Commutated Converter (Flex-LCC) [1]. The proposed extension involves modifying the arm internal-electromotive-force control, redesigning the main-circuit parameters, and integrating a low-power coordination strategy. As a result, the Flex-LCC transforms from a grid-forming (GFM) voltage source converter (VSC) ba…

    Submitted 9 February, 2025; originally announced February 2025.

  27. arXiv:2501.08566

    cs.SD cs.AI eess.AS

    Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement

    Authors: Qianniu Chen, Xiaoyang Hao, Bowen Li, Yue Liu, Li Lu

    Abstract: Zero-shot Text-To-Speech (TTS) synthesis shows great promise for personalized voice customization through voice cloning. However, current methods for achieving zero-shot TTS heavily rely on large model scales and extensive training datasets to ensure satisfactory performance and generalizability across various speakers. This raises concerns regarding both deployment costs and data security. In thi…

    Submitted 14 January, 2025; originally announced January 2025.

    Comments: 5 pages, 4 figures

  28. arXiv:2501.06282

    cs.CL cs.AI cs.HC cs.SD eess.AS

    MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

    Authors: Qian Chen, Yafeng Chen, Yanni Chen, Mengzhe Chen, Yingda Chen, Chong Deng, Zhihao Du, Ruize Gao, Changfeng Gao, Zhifu Gao, Yabin Li, Xiang Lv, Jiaqing Liu, Haoneng Luo, Bin Ma, Chongjia Ni, Xian Shi, Jialong Tang, Hui Wang, Hao Wang, Wen Wang, Yuxuan Wang, Yunlan Xu, Fan Yu, Zhijie Yan, et al. (11 additional authors not shown)

    Abstract: Recent advancements in large language models (LLMs) and multimodal speech-text models have laid the groundwork for seamless voice interactions, enabling real-time, natural, and human-like conversations. Previous models for voice interactions are categorized as native and aligned. Native models integrate speech and text processing in one framework but struggle with issues like differing sequence le…

    Submitted 10 January, 2025; originally announced January 2025.

    Comments: Work in progress. Authors are listed in alphabetical order by family name

  29. arXiv:2412.20914

    cs.SD cs.IR eess.AS

    Language-based Audio Retrieval with Co-Attention Networks

    Authors: Haoran Sun, Zimu Wang, Qiuyi Chen, Jianjun Chen, Jia Wang, Haiyang Zhang

    Abstract: In recent years, user-generated audio content has proliferated across various media platforms, creating a growing need for efficient retrieval methods that allow users to search for audio clips using natural language queries. This task, known as language-based audio retrieval, presents significant challenges due to the complexity of learning semantic representations from heterogeneous data across…

    Submitted 30 December, 2024; originally announced December 2024.

    Comments: Accepted at UIC 2024 proceedings. Accepted version

  30. arXiv:2412.18589

    eess.IV cs.CV

    Text-Driven Tumor Synthesis

    Authors: Xinran Li, Yi Shuai, Chen Liu, Qi Chen, Qilong Wu, Pengfei Guo, Dong Yang, Can Zhao, Pedro R. A. S. Bassi, Daguang Xu, Kang Wang, Yang Yang, Alan Yuille, Zongwei Zhou

    Abstract: Tumor synthesis can generate examples that AI often misses or over-detects, improving AI performance by training on these challenging cases. However, existing synthesis methods, which are typically unconditional -- generating images from random variables -- or conditioned only by tumor shapes, lack controllability over specific tumor characteristics such as texture, heterogeneity, boundaries, and…

    Submitted 24 December, 2024; originally announced December 2024.

  31. arXiv:2412.10117

    cs.SD cs.AI cs.LG eess.AS

    CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models

    Authors: Zhihao Du, Yuxuan Wang, Qian Chen, Xian Shi, Xiang Lv, Tianyu Zhao, Zhifu Gao, Yexin Yang, Changfeng Gao, Hui Wang, Fan Yu, Huadai Liu, Zhengyan Sheng, Yue Gu, Chong Deng, Wen Wang, Shiliang Zhang, Zhijie Yan, Jingren Zhou

    Abstract: In our previous work, we introduced CosyVoice, a multilingual speech synthesis model based on supervised discrete speech tokens. By employing progressive semantic decoding with two popular generative models, language models (LMs) and Flow Matching, CosyVoice demonstrated high prosody naturalness, content consistency, and speaker similarity in speech in-context learning. Recently, significant progr…

    Submitted 25 December, 2024; v1 submitted 13 December, 2024; originally announced December 2024.

    Comments: Tech report, work in progress

  32. arXiv:2412.04783

    cs.CV cs.AI eess.SP

    KNN-MMD: Cross Domain Wireless Sensing via Local Distribution Alignment

    Authors: Zijian Zhao, Zhijie Cai, Tingwei Chen, Xiaoyang Li, Hang Li, Qimei Chen, Guangxu Zhu

    Abstract: Wireless sensing has recently found widespread applications in diverse environments, including homes, offices, and public spaces. By analyzing patterns in channel state information (CSI), it is possible to infer human actions for tasks such as person identification, gesture recognition, and fall detection. However, CSI is highly sensitive to environmental changes, where even minor alterations can…

    Submitted 27 June, 2025; v1 submitted 6 December, 2024; originally announced December 2024.

  33. arXiv:2412.01525

    eess.IV cs.CV

    Towards Clinical Practice in CT-Based Pulmonary Disease Screening: An Efficient and Reliable Framework

    Authors: Qian Shao, Bang Du, Kai Zhang, Yixuan Wu, Zepeng Li, Qiyuan Chen, Qianqian Tang, Jian Wu, Jintai Chen, Honghao Gao, Hongxia Xu

    Abstract: Deep learning models for pulmonary disease screening from Computed Tomography (CT) scans promise to alleviate the immense workload on radiologists. Still, their high computational cost, stemming from processing entire 3D volumes, remains a major barrier to widespread clinical adoption. Current sub-sampling techniques often compromise diagnostic integrity by introducing artifacts or discarding crit…

    Submitted 12 June, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

  34. arXiv:2411.18369

    cs.RO cs.AI cs.CV eess.SY

    G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation

    Authors: Tianxing Chen, Yao Mu, Zhixuan Liang, Zanxin Chen, Shijia Peng, Qiangyu Chen, Mingkun Xu, Ruizhen Hu, Hongyuan Zhang, Xuelong Li, Ping Luo

    Abstract: Recent advances in imitation learning for 3D robotic manipulation have shown promising results with diffusion-based policies. However, achieving human-level dexterity requires seamless integration of geometric precision and semantic understanding. We present G3Flow, a novel framework that constructs real-time semantic flow, a dynamic, object-centric 3D semantic representation by leveraging foundat…

    Submitted 21 June, 2025; v1 submitted 27 November, 2024; originally announced November 2024.

    Comments: Webpage: https://tianxingchen.github.io/G3Flow/, accepted to CVPR 2025

  35. arXiv:2411.17488

    eess.IV cs.CV

    Structure-Guided MR-to-CT Synthesis with Spatial and Semantic Alignments for Attenuation Correction of Whole-Body PET/MR Imaging

    Authors: Jiaxu Zheng, Zhenrong Shen, Lichi Zhang, Qun Chen

    Abstract: Deep-learning-based MR-to-CT synthesis can estimate the electron density of tissues, thereby facilitating PET attenuation correction in whole-body PET/MR imaging. However, whole-body MR-to-CT synthesis faces several challenges including the issue of spatial misalignment and the complexity of intensity mapping, primarily due to the variety of tissues and organs throughout the whole body. Here we pr…

    Submitted 26 November, 2024; originally announced November 2024.

  36. Computation-power Coupled Modeling for IDCs and Collaborative Optimization in ADNs

    Authors: Chuyi Li, Kedi Zheng, Hongye Guo, Chongqing Kang, Qixin Chen

    Abstract: The batch and online workloads of Internet data centers (IDCs) offer temporal and spatial scheduling flexibility. Given that power generation costs vary over time and location, harnessing the flexibility of IDCs' energy consumption through workload regulation can optimize the power flow within the system. This paper focuses on multi-geographically distributed IDCs managed by an Internet service com…

    Submitted 25 November, 2024; originally announced November 2024.

    Comments: Paper accepted for IEEE Transactions on Smart Grid. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses

    Journal ref: IEEE Transactions on Smart Grid, vol. 15, no. 3, May 2024

  37. Optimal Energy Dispatch of Grid-Connected Electric Vehicle Considering Lithium Battery Electrochemical Model

    Authors: Yuanbo Chen, Kedi Zheng, Yuxuan Gu, Jianxiao Wang, Qixin Chen

    Abstract: The grid-connected electric vehicles (EVs) serve as a promising regulating resource in the distribution grid with Vehicle-to-Grid (V2G) facilities. In the day-ahead stage, electric vehicle batteries (EVBs) need to be precisely dispatched and controlled to ensure high efficiency and prevent degradation. This article focuses on considering a refined battery model, i.e. the electrochemical model (EM)…

    Submitted 21 November, 2024; originally announced November 2024.

    Comments: Paper accepted for IEEE Transactions on Smart Grid. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses

    Journal ref: IEEE Transactions on Smart Grid, vol. 15, no. 3, pp. 3000-3015, May 2024

  38. A Data-Driven Pool Strategy for Price-Makers Under Imperfect Information

    Authors: Kedi Zheng, Hongye Guo, Qixin Chen

    Abstract: This paper studies the pool strategy for price-makers under imperfect information. In this setting, market participants cannot obtain essential transmission parameters of the power system. Thus, price-makers should estimate the market results with respect to their offer curves using available historical information. The linear programming model of economic dispatch is analyzed with the theory of…

    Submitted 21 November, 2024; originally announced November 2024.

    Comments: Paper accepted for IEEE Transactions on Power Systems. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses

    Journal ref: IEEE Transactions on Power Systems, vol. 38, no. 1, pp. 278-289, Jan. 2023

  39. arXiv:2411.11699

    eess.SP

    LiTformer: Efficient Modeling and Analysis of High-Speed Link Transmitters Using Non-Autoregressive Transformer

    Authors: Songyu Sun, Xiao Dong, Yanliang Sha, Quan Chen, Cheng Zhuo

    Abstract: High-speed serial links are fundamental to energy-efficient and high-performance computing systems such as artificial intelligence, 5G mobile and automotive, enabling low-latency and high-bandwidth communication. Transmitters (TXs) within these links are key to signal quality, while their modeling presents challenges due to nonlinear behavior and dynamic interactions with links. In this paper, we…

    Submitted 18 November, 2024; originally announced November 2024.

  40. Unsupervised Congestion Status Identification Using LMP Data

    Authors: Kedi Zheng, Qixin Chen, Yi Wang, Chongqing Kang, Le Xie

    Abstract: Having a better understanding of how locational marginal prices (LMPs) change helps in price forecasting and market strategy making. This paper investigates the fundamental distribution of the congestion part of LMPs in high-dimensional Euclidean space using an unsupervised approach. LMP models based on the lossless and lossy DC optimal power flow (DC-OPF) are analyzed to show the overlapping subs…

    Submitted 15 November, 2024; originally announced November 2024.

    Comments: Paper accepted for IEEE Transactions on Smart Grid. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses

    Journal ref: IEEE Transactions on Smart Grid, vol. 12, no. 1, pp. 726-736, Jan. 2021

  41. arXiv:2411.06649  [pdf, other

    eess.SY cs.LG eess.SP

    A Novel Combined Data-Driven Approach for Electricity Theft Detection

    Authors: Kedi Zheng, Qixin Chen, Yi Wang, Chongqing Kang, Qing Xia

    Abstract: The two-way flow of information and energy is an important feature of the Energy Internet. Data analytics is a powerful tool in the information flow that aims to solve practical problems using data mining techniques. As electricity theft via tampering with smart meters continues to increase, the abnormal behaviors associated with theft become more diverse and more difficult to detect. Thus…

    Submitted 10 November, 2024; originally announced November 2024.

    Comments: Paper accepted for IEEE Transactions on Industrial Informatics. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses

    Journal ref: IEEE Transactions on Industrial Informatics, vol. 15, no. 3, pp. 1809-1819, March 2019

  42. Coherent Hierarchical Probabilistic Forecasting of Electric Vehicle Charging Demand

    Authors: Kedi Zheng, Hanwei Xu, Zeyang Long, Yi Wang, Qixin Chen

    Abstract: The growing penetration of electric vehicles (EVs) significantly changes typical load curves in smart grids. With the development of fast charging technology, the volatility of EV charging demand is increasing, which requires additional flexibility for real-time power balance. The forecasting of EV charging demand involves probabilistic modeling of high dimensional time series dynamics across dive…

    Submitted 3 November, 2024; v1 submitted 31 October, 2024; originally announced November 2024.

    Comments: Paper accepted for IEEE Transactions on Industry Applications. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses

  43. arXiv:2410.23919  [pdf, other

    cs.IT eess.SY

    Intelligent Angle Map-based Beam Alignment for RIS-aided mmWave Communication Networks

    Authors: Hao Xia, Qing Xue, Yanping Liu, Binggui Zhou, Meng Hua, Qianbin Chen

    Abstract: Recently, reconfigurable intelligent surfaces (RISs) have been widely used to enhance the performance of millimeter wave (mmWave) communication systems, making beam alignment more challenging. To ensure efficient communication, this paper proposes a novel intelligent angle map-based beam alignment scheme for both general user equipments (UEs) and RIS-aided UEs simultaneously in a fast and effective w…

    Submitted 31 October, 2024; originally announced October 2024.

  44. arXiv:2410.19813  [pdf, other

    eess.IV

    Threshold-Based Automated Pest Detection System for Sustainable Agriculture

    Authors: Tianle Li, Jia Shu, Qinghong Chen, Murad Mehrab Abrar, John Raiti

    Abstract: This paper presents a threshold-based automated pea weevil detection system, developed as part of the Microsoft FarmVibes project. Based on Internet-of-Things (IoT) and computer vision, the system is designed to monitor and manage pea weevil populations in agricultural settings, with the goal of enhancing crop production and promoting sustainable farming practices. Unlike the machine learning-base…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: Accepted for publication at the 7th IEEE International Conference on Internet of Things and Intelligence System (IOTAIS 2024)

  45. arXiv:2410.17799  [pdf, other

    cs.CL cs.AI cs.SD eess.AS

    OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

    Authors: Qinglin Zhang, Luyao Cheng, Chong Deng, Qian Chen, Wen Wang, Siqi Zheng, Jiaqing Liu, Hai Yu, Chaohong Tan, Zhihao Du, Shiliang Zhang

    Abstract: Full-duplex spoken dialogue systems significantly surpass traditional turn-based dialogue systems, as they allow simultaneous bidirectional communication, closely mirroring human-human interactions. However, achieving low latency and natural interactions in full-duplex dialogue systems remains a significant challenge, especially considering human conversation dynamics such as interruptions, backch…

    Submitted 3 January, 2025; v1 submitted 23 October, 2024; originally announced October 2024.

    Comments: Work in progress

  46. arXiv:2410.11373  [pdf, other

    cs.CV eess.IV

    DRACO: A Denoising-Reconstruction Autoencoder for Cryo-EM

    Authors: Yingjun Shen, Haizhao Dai, Qihe Chen, Yan Zeng, Jiakai Zhang, Yuan Pei, Jingyi Yu

    Abstract: Foundation models in computer vision have demonstrated exceptional performance in zero-shot and few-shot tasks by extracting multi-purpose features from large-scale datasets through self-supervised pre-training methods. However, these models often overlook the severe corruption of cryogenic electron microscopy (cryo-EM) images by high levels of noise. We introduce DRACO, a Denoising-Reconstruction Au…

    Submitted 28 October, 2024; v1 submitted 15 October, 2024; originally announced October 2024.

  47. arXiv:2410.11180  [pdf, other

    cs.LG eess.SY

    Reinforcement Learning Based Bidding Framework with High-dimensional Bids in Power Markets

    Authors: Jinyu Liu, Hongye Guo, Yun Li, Qinghu Tang, Fuquan Huang, Tunan Chen, Haiwang Zhong, Qixin Chen

    Abstract: Over the past decade, bidding in power markets has attracted widespread attention. Reinforcement Learning (RL) has been widely used for power market bidding as a powerful AI tool to make decisions under real-world uncertainties. However, current RL methods mostly employ low-dimensional bids, which significantly diverge from the N price-power pairs commonly used in the current power markets. The N-…

    Submitted 14 October, 2024; originally announced October 2024.

  48. arXiv:2410.11062  [pdf, other

    cs.SD cs.AI cs.CV eess.AS

    CleanUMamba: A Compact Mamba Network for Speech Denoising using Channel Pruning

    Authors: Sjoerd Groot, Qinyu Chen, Jan C. van Gemert, Chang Gao

    Abstract: This paper presents CleanUMamba, a time-domain neural network architecture designed for real-time causal audio denoising directly applied to raw waveforms. CleanUMamba leverages a U-Net encoder-decoder structure, incorporating the Mamba state-space model in the bottleneck layer. By replacing conventional self-attention and LSTM mechanisms with Mamba, our architecture offers superior denoising perf…

    Submitted 10 February, 2025; v1 submitted 14 October, 2024; originally announced October 2024.

    Comments: This paper has been accepted for presentation at the 2025 International Symposium on Circuits and Systems (ISCAS)

    Journal ref: 2025 IEEE International Symposium on Circuits and Systems (ISCAS)

  49. arXiv:2409.16115  [pdf, other

    eess.SY

    Mean Age of Information in Partial Offloading Mobile Edge Computing Networks

    Authors: Ying Dong, Hang Xiao, Haonan Hu, Jiliang Zhang, Qianbin Chen, Jie Zhang

    Abstract: The age of information (AoI) performance analysis is essential for evaluating the information freshness in large-scale mobile edge computing (MEC) networks. This work presents the first analysis of the mean AoI (MAoI) performance of large-scale partial offloading MEC networks. Firstly, we derive and validate the closed-form expressions of MAoI by using queueing theory and stochastic geometr…

    Submitted 24 September, 2024; originally announced September 2024.

  50. arXiv:2409.15710  [pdf, other

    cs.RO cs.AI eess.SY

    Autotuning Bipedal Locomotion MPC with GRFM-Net for Efficient Sim-to-Real Transfer

    Authors: Qianzhong Chen, Junheng Li, Sheng Cheng, Naira Hovakimyan, Quan Nguyen

    Abstract: Bipedal locomotion control is essential for humanoid robots to navigate complex, human-centric environments. While optimization-based control designs are popular for integrating sophisticated models of humanoid robots, they often require labor-intensive manual tuning. In this work, we address the challenges of parameter selection in bipedal locomotion control using DiffTune, a model-based autotuni…

    Submitted 23 September, 2024; originally announced September 2024.