
Showing 1–50 of 164 results for author: HU, S

Searching in archive eess.
  1. arXiv:2507.07647  [pdf, ps, other]

    eess.SP cs.IT

    Consistent and Asymptotically Efficient Localization from Bearing-only Measurements

    Authors: Shenghua Hu, Guangyang Zeng, Wenchao Xue, Haitao Fang, Biqiang Mu

    Abstract: We study the problem of signal source localization using bearing-only measurements. Initially, we present easily verifiable geometric conditions for sensor deployment to ensure the asymptotic identifiability of the model and demonstrate the consistency and asymptotic efficiency of the maximum likelihood (ML) estimator. However, obtaining the ML estimator is challenging due to its association with…

    Submitted 10 July, 2025; originally announced July 2025.

  2. arXiv:2507.03915  [pdf, ps, other]

    cs.IT eess.SP

    Resource Allocation for Multi-waveguide Pinching Antenna-assisted Broadcast Networks

    Authors: Ruotong Zhao, Shaokang Hu, Deepak Mishra, Derrick Wing Kwan Ng

    Abstract: In this paper, we investigate the resource allocation for multi-dielectric waveguide-assisted broadcast systems, where each waveguide employs multiple pinching antennas (PAs), aiming to maximize the minimum achievable rate among multiple users. To capture realistic propagation effects, we propose a novel generalized frequency-dependent power attenuation model for dielectric waveguide PA systems. W…

    Submitted 5 July, 2025; originally announced July 2025.

  3. arXiv:2506.20762  [pdf, ps, other]

    cs.NI eess.SP

    Drift-Adaptive Slicing-Based Resource Management for Cooperative ISAC Networks

    Authors: Shisheng Hu, Jie Gao, Xue Qin, Conghao Zhou, Xinyu Huang, Mushu Li, Mingcheng He, Xuemin Shen

    Abstract: In this paper, we propose a novel drift-adaptive slicing-based resource management scheme for cooperative integrated sensing and communication (ISAC) networks. Particularly, we establish two network slices to provide sensing and communication services, respectively. In the large-timescale planning for the slices, we partition the sensing region of interest (RoI) of each mobile device and reserve n…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Accepted by IEEE Transactions on Cognitive Communications and Networking

  4. arXiv:2506.11069  [pdf, ps, other]

    eess.AS cs.AI cs.CL cs.SD

    Regularized Federated Learning for Privacy-Preserving Dysarthric and Elderly Speech Recognition

    Authors: Tao Zhong, Mengzhe Geng, Shujie Hu, Guinan Li, Xunying Liu

    Abstract: Accurate recognition of dysarthric and elderly speech remains challenging to date. While privacy concerns have driven a shift from centralized approaches to federated learning (FL) to ensure data confidentiality, this further exacerbates the challenges of data scarcity, imbalanced data distribution and speaker heterogeneity. To this end, this paper conducts a systematic investigation of regularize…

    Submitted 1 June, 2025; originally announced June 2025.

  5. arXiv:2506.08063  [pdf, ps, other]

    cs.LG eess.SY

    Lite-RVFL: A Lightweight Random Vector Functional-Link Neural Network for Learning Under Concept Drift

    Authors: Songqiao Hu, Zeyi Liu, Xiao He

    Abstract: The change in data distribution over time, also known as concept drift, poses a significant challenge to the reliability of online learning methods. Existing methods typically require model retraining or drift detection, both of which demand high computational costs and are often unsuitable for real-time applications. To address these limitations, a lightweight, fast and efficient random vector fu…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 6 pages, 4 figures, accepted by the 2025 CAA Symposium on Fault Detection, Supervision and Safety for Technical Processes (SAFEPROCESS 2025)

  6. arXiv:2505.24224  [pdf, ps, other]

    eess.AS

    MOPSA: Mixture of Prompt-Experts Based Speaker Adaptation for Elderly Speech Recognition

    Authors: Chengxi Deng, Xurong Xie, Shujie Hu, Mengzhe Geng, Yicong Jiang, Jiankun Zhao, Jiajun Deng, Guinan Li, Youjun Chen, Huimeng Wang, Haoning Xu, Mingyu Cui, Xunying Liu

    Abstract: This paper proposes a novel Mixture of Prompt-Experts based Speaker Adaptation approach (MOPSA) for elderly speech recognition. It allows zero-shot, real-time adaptation to unseen speakers, and leverages domain knowledge tailored to elderly speakers. Top-K most distinctive speaker prompt clusters derived using K-means serve as experts. A router network is trained to dynamically combine clustered p…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  7. arXiv:2505.24160  [pdf, ps, other]

    eess.IV cs.CV

    Beyond the LUMIR challenge: The pathway to foundational registration models

    Authors: Junyu Chen, Shuwen Wei, Joel Honkamaa, Pekka Marttinen, Hang Zhang, Min Liu, Yichao Zhou, Zuopeng Tan, Zhuoyuan Wang, Yi Wang, Hongchao Zhou, Shunbo Hu, Yi Zhang, Qian Tao, Lukas Förner, Thomas Wendler, Bailiang Jian, Benedikt Wiestler, Tim Hable, Jin Kim, Dan Ruan, Frederic Madesta, Thilo Sentker, Wiebke Heyer, Lianrui Zuo, et al. (11 additional authors not shown)

    Abstract: Medical image challenges have played a transformative role in advancing the field, catalyzing algorithmic innovation and establishing new performance standards across diverse clinical applications. Image registration, a foundational task in neuroimaging pipelines, has similarly benefited from the Learn2Reg initiative. Building on this foundation, we introduce the Large-scale Unsupervised Brain MRI…

    Submitted 29 May, 2025; originally announced May 2025.

  8. arXiv:2505.23236  [pdf, ps, other]

    cs.SD cs.HC eess.AS

    Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion Recognition

    Authors: Youjun Chen, Xurong Xie, Haoning Xu, Mengzhe Geng, Guinan Li, Chengxi Deng, Huimeng Wang, Shujie Hu, Xunying Liu

    Abstract: This paper presents a novel end-to-end LLM-empowered explainable speech emotion recognition (SER) approach. Fine-grained speech emotion descriptor (SED) features, e.g., pitch, tone and emphasis, are disentangled from HuBERT SSL representations via alternating LLM fine-tuning to joint SER-SED prediction and ASR tasks. VAE compressed HuBERT features derived via Information Bottleneck (IB) are used t…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  9. arXiv:2505.22072  [pdf, other]

    cs.SD eess.AS

    On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition

    Authors: Shujie Hu, Xurong Xie, Mengzhe Geng, Jiajun Deng, Huimeng Wang, Guinan Li, Chengxi Deng, Tianzi Wang, Mingyu Cui, Helen Meng, Xunying Liu

    Abstract: This paper proposes a novel MoE-based speaker adaptation framework for foundation model-based dysarthric speech recognition. This approach enables zero-shot adaptation and real-time processing while incorporating domain knowledge. Speech impairment severity and gender conditioned adapter experts are dynamically combined using on-the-fly predicted speaker-dependent routing parameters. KL-divergenc…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  10. arXiv:2505.21245  [pdf, ps, other]

    cs.SD eess.AS

    Towards One-bit ASR: Extremely Low-bit Conformer Quantization Using Co-training and Stochastic Precision

    Authors: Zhaoqing Li, Haoning Xu, Zengrui Jin, Lingwei Meng, Tianzi Wang, Huimeng Wang, Youjun Chen, Mingyu Cui, Shujie Hu, Xunying Liu

    Abstract: Model compression has become an emerging need as the sizes of modern speech systems rapidly increase. In this paper, we study model weight quantization, which directly reduces the memory footprint to accommodate computationally resource-constrained applications. We propose novel approaches to perform extremely low-bit (i.e., 2-bit and 1-bit) quantization of Conformer automatic speech recognition s…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  11. arXiv:2505.19077  [pdf, other]

    eess.SY

    An Autocovariance Least-Squares-Based Data-Driven Kalman Filter for Unknown Systems

    Authors: Suyang Hu, Xiaoxu Lyu, Peihu Duan, Dawei Shi, Ling Shi

    Abstract: This article investigates the problem of data-driven state estimation for linear systems with both unknown system dynamics and noise covariances. We propose an Autocovariance Least-squares-based Data-driven Kalman Filter (ADKF), which provides a unified framework for simultaneous system identification and state estimation by utilizing pre-collected input-output trajectories and estimated initial s…

    Submitted 25 May, 2025; originally announced May 2025.

  12. arXiv:2505.13070  [pdf, ps, other]

    eess.SY

    RSS-Based Localization: Ensuring Consistency and Asymptotic Efficiency

    Authors: Shenghua Hu, Guangyang Zeng, Wenchao Xue, Haitao Fang, Junfeng Wu, Biqiang Mu

    Abstract: We study the problem of signal source localization using received signal strength measurements. We begin by presenting verifiable geometric conditions for sensor deployment that ensure the model's asymptotic localizability. Then we establish the consistency and asymptotic efficiency of the maximum likelihood (ML) estimator. However, computing the ML estimator is challenging due to its reliance on…

    Submitted 19 May, 2025; originally announced May 2025.

  13. arXiv:2505.04753  [pdf, other]

    cs.IT eess.SP

    Hybrid-Field 6D Movable Antenna for Terahertz Communications: Channel Modeling and Estimation

    Authors: Xiaodan Shao, Yixiao Zhang, Shisheng Hu, Zhixuan Tang, Mingcheng He, Xinyu Huang, Weihua Zhuang, Xuemin Shen

    Abstract: In this work, we study a six-dimensional movable antenna (6DMA)-enhanced Terahertz (THz) network that supports a large number of users with a few antennas by controlling the three-dimensional (3D) positions and 3D rotations of antenna surfaces/subarrays at the base station (BS). However, the short wavelength of THz signals combined with a large 6DMA movement range extends the near-field region. As…

    Submitted 7 May, 2025; originally announced May 2025.

  14. arXiv:2505.02613  [pdf, ps, other]

    eess.IV cs.LG

    Lane-Wise Highway Anomaly Detection

    Authors: Mei Qiu, William Lorenz Reindl, Yaobin Chen, Stanley Chien, Shu Hu

    Abstract: This paper proposes a scalable and interpretable framework for lane-wise highway traffic anomaly detection, leveraging multi-modal time series data extracted from surveillance cameras. Unlike traditional sensor-dependent methods, our approach uses AI-powered vision models to extract lane-specific features, including vehicle count, occupancy, and truck percentage, without relying on costly hardware…

    Submitted 5 May, 2025; originally announced May 2025.

  15. arXiv:2503.16551  [pdf, other]

    cs.RO eess.SY

    CoIn-SafeLink: Safety-critical Control With Cost-sensitive Incremental Random Vector Functional Link Network

    Authors: Songqiao Hu, Zeyi Liu, Xiao He, Zhen Shen

    Abstract: Control barrier functions (CBFs) play a crucial role in achieving the safety-critical control of robotic systems theoretically. However, most existing methods rely on the analytical expressions of unsafe state regions, which are often impractical for irregular and dynamic unsafe regions. In this paper, a novel CBF construction approach, called CoIn-SafeLink, is proposed based on cost-sensitive incr…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: 8 pages, 8 figures, submitted to The 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)

  16. arXiv:2503.15581  [pdf, other]

    cs.LG eess.SY

    Performance-bounded Online Ensemble Learning Method Based on Multi-armed bandits and Its Applications in Real-time Safety Assessment

    Authors: Songqiao Hu, Zeyi Liu, Xiao He

    Abstract: Ensemble learning plays a crucial role in practical applications of online learning due to its enhanced classification performance and adaptable adjustment mechanisms. However, most weight allocation strategies in ensemble learning are heuristic, making it challenging to theoretically guarantee that the ensemble classifier outperforms its base classifiers. To address this issue, a performance-boun…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: 14 pages, 9 figures

  17. arXiv:2503.10060  [pdf, other]

    eess.SP

    Sum-Rate Maximization for Pinching Antenna-assisted NOMA Systems with Multiple Dielectric Waveguides

    Authors: Shaokang Hu, Ruotong Zhao, Yihuan Liao, Derrick Wing Kwan Ng, Jinhong Yuan

    Abstract: This paper investigates the resource allocation design for a pinching antenna (PA)-assisted multiuser multiple-input single-output (MISO) non-orthogonal multiple access (NOMA) system featuring multiple dielectric waveguides. To enhance model accuracy, we propose a novel frequency-dependent power attenuation model for the dielectric waveguides in PA-assisted systems. By jointly optimizing the preco…

    Submitted 6 April, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    Comments: 7 pages, 3 figures, conference

  18. arXiv:2503.04402  [pdf]

    physics.optics eess.SP

    Mid-infrared laser chaos lidar

    Authors: Kai-Li Lin, Peng-Lei Wang, Yi-Bo Peng, Shiyu Hu, Chunfang Cao, Cheng-Ting Lee, Qian Gong, Fan-Yi Lin, Wenxiang Huang, Cheng Wang

    Abstract: Chaos lidars detect targets through the cross-correlation between the back-scattered chaos signal from the target and the local reference one. Chaos lidars have excellent anti-jamming and anti-interference capabilities, owing to the random nature of chaotic oscillations. However, most chaos lidars operate in the near-infrared spectral regime, where the atmospheric attenuation is significant. Here…

    Submitted 6 March, 2025; originally announced March 2025.

  19. arXiv:2502.09291  [pdf, ps, other]

    eess.SP cs.LG

    Joint Attention Mechanism Learning to Facilitate Opto-physiological Monitoring during Physical Activity

    Authors: Xiaoyu Zheng, Sijung Hu, Vincent Dwyer, Mahsa Derakhshani, Laura Barrett

    Abstract: Opto-physiological monitoring is a non-contact technique for measuring cardiac signals, i.e., photoplethysmography (PPG). Quality PPG signals directly lead to reliable physiological readings. However, PPG signal acquisition procedures are often accompanied by spurious motion artefacts (MAs), especially during low-to-high-intensity physical activity. This study proposes a practical adversarial lear…

    Submitted 13 February, 2025; originally announced February 2025.

  20. arXiv:2502.06817  [pdf, other]

    eess.IV cs.GR cs.LG

    Diffusion-empowered AutoPrompt MedSAM

    Authors: Peng Huang, Shu Hu, Bo Peng, Xun Gong, Penghang Yin, Hongtu Zhu, Xi Wu, Xin Wang

    Abstract: MedSAM, a medical foundation model derived from the SAM architecture, has demonstrated notable success across diverse medical domains. However, its clinical application faces two major challenges: the dependency on labor-intensive manual prompt generation, which imposes a significant burden on clinicians, and the absence of semantic labeling in the generated segmentation masks for organs or lesion…

    Submitted 15 April, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

  21. arXiv:2501.07127  [pdf, ps, other]

    eess.IV

    QoE-oriented Communication Service Provision for Annotation Rendering in Mobile Augmented Reality

    Authors: Lulu Sun, Conghao Zhou, Shisheng Hu, Yupeng Zhu, Nan Cheng, Xu Xia

    Abstract: As mobile augmented reality (MAR) continues to evolve, future 6G networks will play a pivotal role in supporting immersive and personalized user experiences. In this paper, we address the communication service provision problem for annotation rendering in edge-assisted MAR, with the objective of optimizing spectrum resource utilization while ensuring the required quality of experience (QoE) for MA…

    Submitted 3 March, 2025; v1 submitted 13 January, 2025; originally announced January 2025.

    Comments: 6 pages, 4 figures

  22. arXiv:2501.04379  [pdf, other]

    cs.SD eess.AS

    Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition

    Authors: Huimeng Wang, Xurong Xie, Mengzhe Geng, Shujie Hu, Haoning Xu, Youjun Chen, Zhaoqing Li, Jiajun Deng, Xunying Liu

    Abstract: Extracted discrete tokens provide efficient and domain-adaptable speech features. Their application to disordered speech that exhibits articulation imprecision and large mismatch against normal voice remains unexplored. To improve their phonetic discrimination that is weakened during unsupervised K-means or vector quantization of continuous features, this paper proposes novel phone-purity guided (…

    Submitted 8 January, 2025; originally announced January 2025.

    Comments: ICASSP 2025

  23. arXiv:2501.03643  [pdf, other]

    cs.SD cs.AI eess.AS

    Effective and Efficient Mixed Precision Quantization of Speech Foundation Models

    Authors: Haoning Xu, Zhaoqing Li, Zengrui Jin, Huimeng Wang, Youjun Chen, Guinan Li, Mengzhe Geng, Shujie Hu, Jiajun Deng, Xunying Liu

    Abstract: This paper presents a novel mixed-precision quantization approach for speech foundation models that tightly integrates mixed-precision learning and quantized model parameter estimation into one single model compression stage. Experiments conducted on the LibriSpeech dataset with fine-tuned wav2vec2.0-base and HuBERT-large models suggest the resulting mixed-precision quantized models increased the loss…

    Submitted 11 January, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

    Comments: To appear at IEEE ICASSP 2025

  24. arXiv:2412.19279  [pdf, other]

    cs.SD cs.LG eess.AS

    Improving Generalization for AI-Synthesized Voice Detection

    Authors: Hainan Ren, Li Lin, Chun-Hao Liu, Xin Wang, Shu Hu

    Abstract: AI-synthesized voice technology has the potential to create realistic human voices for beneficial applications, but it can also be misused for malicious purposes. While existing AI-synthesized voice detection models excel in intra-domain evaluation, they face challenges in generalizing across different domains, potentially becoming obsolete as new voice generators emerge. Current solutions use div…

    Submitted 30 December, 2024; v1 submitted 26 December, 2024; originally announced December 2024.

    Comments: AAAI25

  25. arXiv:2412.18832  [pdf, other]

    eess.AS cs.SD

    Structured Speaker-Deficiency Adaptation of Foundation Models for Dysarthric and Elderly Speech Recognition

    Authors: Shujie Hu, Xurong Xie, Mengzhe Geng, Jiajun Deng, Zengrui Jin, Tianzi Wang, Mingyu Cui, Guinan Li, Zhaoqing Li, Helen Meng, Xunying Liu

    Abstract: Data-intensive fine-tuning of speech foundation models (SFMs) to scarce and diverse dysarthric and elderly speech leads to data bias and poor generalization to unseen speakers. This paper proposes novel structured speaker-deficiency adaptation approaches for SSL pre-trained SFMs on such data. Speaker and speech deficiency invariant SFMs were constructed in their supervised adaptive fine-tuning sta…

    Submitted 25 December, 2024; originally announced December 2024.

  26. arXiv:2412.18619  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG cs.MM eess.AS

    Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

    Authors: Liang Chen, Zekun Wang, Shuhuai Ren, Lei Li, Haozhe Zhao, Yunshui Li, Zefan Cai, Hongcheng Guo, Lei Zhang, Yizhe Xiong, Yichi Zhang, Ruoyu Wu, Qingxiu Dong, Ge Zhang, Jian Yang, Lingwei Meng, Shujie Hu, Yulong Chen, Junyang Lin, Shuai Bai, Andreas Vlachos, Xu Tan, Minjia Zhang, Wen Xiao, Aaron Yee, et al. (2 additional authors not shown)

    Abstract: Building on the foundations of language modeling in natural language processing, Next Token Prediction (NTP) has evolved into a versatile training objective for machine learning tasks across various modalities, achieving considerable success. As Large Language Models (LLMs) have advanced to unify understanding and generation tasks within the textual modality, recent research has shown that tasks f…

    Submitted 29 December, 2024; v1 submitted 16 December, 2024; originally announced December 2024.

    Comments: 69 pages, 18 figures, repo at https://github.com/LMM101/Awesome-Multimodal-Next-Token-Prediction

  27. arXiv:2412.10559  [pdf, other]

    math.NA eess.SY math.DS

    Error Estimation and Stopping Criteria for Krylov-Based Model Order Reduction in Acoustics

    Authors: Siyang Hu, Nick Wulbusch, Alexey Chernov, Tamara Bechtold

    Abstract: Depending on the frequency range of interest, finite element-based modeling of acoustic problems leads to dynamical systems with very high dimensional state spaces. As these models can mostly be described as second-order linear dynamical systems with sparse matrices, mathematical model order reduction provides an interesting possibility to speed up the simulation process. In this work, we tackle…

    Submitted 13 December, 2024; originally announced December 2024.

    Comments: 8 pages, 5 figures

    MSC Class: 37M05; 65P99; 33J05

  28. arXiv:2411.14656  [pdf, other]

    eess.SP cs.ET stat.AP

    mmWave Radar for Sit-to-Stand Analysis: A Comparative Study with Wearables and Kinect

    Authors: Shuting Hu, Peggy Ackun, Xiang Zhang, Siyang Cao, Jennifer Barton, Melvin G. Hector, Mindy J. Fain, Nima Toosizadeh

    Abstract: This study explores a novel approach for analyzing Sit-to-Stand (STS) movements using millimeter-wave (mmWave) radar technology. The goal is to develop a non-contact sensing, privacy-preserving, and all-day operational method for healthcare applications, including fall risk assessment. We used a 60 GHz mmWave radar system to collect radar point cloud data, capturing STS motions from 45 participants…

    Submitted 28 November, 2024; v1 submitted 21 November, 2024; originally announced November 2024.

  29. arXiv:2411.13766  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Tiny-Align: Bridging Automatic Speech Recognition and Large Language Model on the Edge

    Authors: Ruiyang Qin, Dancheng Liu, Gelei Xu, Zheyu Yan, Chenhui Xu, Yuting Hu, X. Sharon Hu, Jinjun Xiong, Yiyu Shi

    Abstract: The combination of Large Language Models (LLM) and Automatic Speech Recognition (ASR), when deployed on edge devices (called edge ASR-LLM), can serve as a powerful personalized assistant to enable audio-based interaction for users. Compared to text-based interaction, edge ASR-LLM allows accessible and natural audio interactions. Unfortunately, existing ASR-LLM models are mainly trained in high-per…

    Submitted 9 July, 2025; v1 submitted 20 November, 2024; originally announced November 2024.

    Comments: Accepted by ICCAD'25

  30. arXiv:2411.03099  [pdf]

    eess.SY

    Optimized Cryo-CMOS Technology with VTH<0.2V and Ion>1.2mA/um for High-Performance Computing

    Authors: Chang He, Yue Xin, Longfei Yang, Zewei Wang, Zhidong Tang, Xin Luo, Renhe Chen, Zirui Wang, Shuai Kong, Jianli Wang, Jianshi Tang, Xiaoxu Kang, Shoumian Chen, Yuhang Zhao, Shaojian Hu, Xufeng Kou

    Abstract: We report the design-technology co-optimization (DTCO) scheme to develop a 28-nm cryogenic CMOS (Cryo-CMOS) technology for high-performance computing (HPC). The precise adjustment of halo implants compensates for the threshold voltage (VTH) shift at low temperatures. The optimized NMOS and PMOS transistors, featuring VTH<0.2V, sub-threshold swing (SS)<30 mV/dec, and on-state current (Ion)>…

    Submitted 5 November, 2024; originally announced November 2024.

  31. arXiv:2410.02592  [pdf, other]

    cs.CV cs.AI cs.LG eess.SY

    IC3M: In-Car Multimodal Multi-object Monitoring for Abnormal Status of Both Driver and Passengers

    Authors: Zihan Fang, Zheng Lin, Senkang Hu, Hangcheng Cao, Yiqin Deng, Xianhao Chen, Yuguang Fang

    Abstract: Recently, in-car monitoring has emerged as a promising technology for detecting early-stage abnormal status of the driver and providing timely alerts to prevent traffic accidents. Although training models with multimodal data enhances the reliability of abnormal status detection, the scarcity of labeled data and the imbalance of class distribution impede the extraction of critical abnormal state f…

    Submitted 21 November, 2024; v1 submitted 3 October, 2024; originally announced October 2024.

    Comments: 16 pages, 17 figures

  32. arXiv:2409.13167  [pdf, ps, other]

    eess.SP cs.AI

    Unsupervised Attention-Based Multi-Source Domain Adaptation Framework for Drift Compensation in Electronic Nose Systems

    Authors: Wenwen Zhang, Shuhao Hu, Zhengyuan Zhang, Yuanjin Zheng, Qi Jie Wang, Zhiping Lin

    Abstract: Continuous, long-term monitoring of hazardous, noxious, explosive, and flammable gases in industrial environments using electronic nose (E-nose) systems faces the significant challenge of reduced gas identification accuracy due to time-varying drift in gas sensors. To address this issue, we propose a novel unsupervised attention-based multi-source domain shared-private feature fusion adaptation (A…

    Submitted 19 September, 2024; originally announced September 2024.

  33. arXiv:2409.08797  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR

    Authors: Mingyu Cui, Yifan Yang, Jiajun Deng, Jiawen Kang, Shujie Hu, Tianzi Wang, Zhaoqing Li, Shiliang Zhang, Xie Chen, Xunying Liu

    Abstract: Self-supervised learning (SSL) based discrete speech representations are highly compact and domain adaptable. In this paper, SSL discrete speech features extracted from WavLM models are used as additional cross-utterance acoustic context features in Zipformer-Transducer ASR systems. The efficacy of replacing Fbank features with discrete token features for modelling either cross-utterance contexts…

    Submitted 9 June, 2025; v1 submitted 13 September, 2024; originally announced September 2024.

    Comments: Accepted by Interspeech 2025

  34. arXiv:2409.08596  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions

    Authors: Lingwei Meng, Shujie Hu, Jiawen Kang, Zhaoqing Li, Yuejiao Wang, Wenxuan Wu, Xixin Wu, Xunying Liu, Helen Meng

    Abstract: Recent advancements in large language models (LLMs) have revolutionized various domains, bringing significant progress and new opportunities. Despite progress in speech-related tasks, LLMs have not been sufficiently explored in multi-talker scenarios. In this work, we present a pioneering effort to investigate the capability of LLMs in transcribing speech in multi-talker environments, following ve…

    Submitted 2 April, 2025; v1 submitted 13 September, 2024; originally announced September 2024.

    Comments: Accepted to IEEE ICASSP 2025. Update code link

  35. arXiv:2408.08881  [pdf, other]

    eess.IV cs.AI cs.CV

    Challenge Summary U-MedSAM: Uncertainty-aware MedSAM for Medical Image Segmentation

    Authors: Xin Wang, Xiaoyu Liu, Peng Huang, Pu Huang, Shu Hu, Hongtu Zhu

    Abstract: Medical Image Foundation Models have proven to be powerful tools for mask prediction across various datasets. However, accurately assessing the uncertainty of their predictions remains a significant challenge. To address this, we propose a new model, U-MedSAM, which integrates the MedSAM model with an uncertainty-aware loss function and the Sharpness-Aware Minimization (SharpMin) optimizer. The un…

    Submitted 16 January, 2025; v1 submitted 3 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2405.17496

  36. arXiv:2408.04300  [pdf, other]

    eess.IV cs.CV

    An Explainable Non-local Network for COVID-19 Diagnosis

    Authors: Jingfu Yang, Peng Huang, Jing Hu, Shu Hu, Siwei Lyu, Xin Wang, Jun Guo, Xi Wu

    Abstract: The CNN has achieved excellent results in the automatic classification of medical images. In this study, we propose a novel deep residual 3D attention non-local network (NL-RAN) to classify CT images into COVID-19, common pneumonia, and normal classes to perform rapid and explainable COVID-19 diagnosis. We built a deep residual 3D attention non-local network that could achieve end-to-end training. The…

    Submitted 8 August, 2024; originally announced August 2024.

  37. arXiv:2408.02943  [pdf, other]

    eess.SP

    Recent Advances in Data-driven Intelligent Control for Wireless Communication: A Comprehensive Survey

    Authors: Wei Huo, Huiwen Yang, Nachuan Yang, Zhaohua Yang, Jiuzhou Zhang, Fuhai Nan, Xingzhou Chen, Yifan Mao, Suyang Hu, Pengyu Wang, Xuanyu Zheng, Mingming Zhao, Ling Shi

    Abstract: The advent of next-generation wireless communication systems heralds an era characterized by high data rates, low latency, massive connectivity, and superior energy efficiency. These systems necessitate innovative and adaptive strategies for resource allocation and device behavior control in wireless networks. Traditional optimization-based methods have been found inadequate in meeting the complex…

    Submitted 6 August, 2024; originally announced August 2024.

  38. arXiv:2407.13782  [pdf, other]

    eess.AS cs.AI cs.SD

    Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition

    Authors: Shujie Hu, Xurong Xie, Mengzhe Geng, Zengrui Jin, Jiajun Deng, Guinan Li, Yi Wang, Mingyu Cui, Tianzi Wang, Helen Meng, Xunying Liu

    Abstract: Self-supervised learning (SSL) based speech foundation models have been applied to a wide range of ASR tasks. However, their application to dysarthric and elderly speech via data-intensive parameter fine-tuning is confronted by in-domain data scarcity and mismatch. To this end, this paper explores a series of approaches to integrate domain fine-tuned SSL pre-trained models and their features into…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing

  39. arXiv:2407.08551  [pdf, other]

    cs.CL cs.SD eess.AS

    Autoregressive Speech Synthesis without Vector Quantization

    Authors: Lingwei Meng, Long Zhou, Shujie Liu, Sanyuan Chen, Bing Han, Shujie Hu, Yanqing Liu, Jinyu Li, Sheng Zhao, Xixin Wu, Helen Meng, Furu Wei

    Abstract: We present MELLE, a novel continuous-valued token based language modeling approach for text-to-speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from text condition, bypassing the need for vector quantization, which is typically designed for audio compression and sacrifices fidelity compared to continuous representations. Specifically, (i) instead…

    Submitted 27 May, 2025; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted to ACL 2025 Main

  40. arXiv:2407.06310  [pdf, other]

    cs.SD cs.AI cs.HC cs.LG eess.AS

    Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation

    Authors: Mengzhe Geng, Xurong Xie, Jiajun Deng, Zengrui Jin, Guinan Li, Tianzi Wang, Shujie Hu, Zhaoqing Li, Helen Meng, Xunying Liu

    Abstract: The application of data-intensive automatic speech recognition (ASR) technologies to dysarthric and elderly adult speech is confronted by their mismatch against healthy and nonaged voices, data scarcity and large speaker-level variability. To this end, this paper proposes two novel data-efficient methods to learn homogeneous dysarthric and elderly speaker-level features for rapid, on-the-fly test-…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: In submission to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  41. arXiv:2406.17338  [pdf, other]

    eess.IV cs.CV cs.LG

    Robustly Optimized Deep Feature Decoupling Network for Fatty Liver Diseases Detection

    Authors: Peng Huang, Shu Hu, Bo Peng, Jiashu Zhang, Xi Wu, Xin Wang

    Abstract: Current medical image classification efforts mainly aim for higher average performance, often neglecting the balance between different classes. This can lead to significant differences in recognition accuracy between classes and obvious recognition weaknesses. Without the support of massive data, deep learning faces challenges in fine-grained classification of fatty liver. In this paper, we propos…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024

  42. arXiv:2406.15093  [pdf, other]

    cs.CR cs.CV eess.IV

    ECLIPSE: Expunging Clean-label Indiscriminate Poisons via Sparse Diffusion Purification

    Authors: Xianlong Wang, Shengshan Hu, Yechao Zhang, Ziqi Zhou, Leo Yu Zhang, Peng Xu, Wei Wan, Hai Jin

    Abstract: Clean-label indiscriminate poisoning attacks add invisible perturbations to correctly labeled training images, thus dramatically reducing the generalization capability of the victim models. Recently, some defense mechanisms have been proposed such as adversarial training, image transformation techniques, and image purification. However, these schemes are either susceptible to adaptive attacks, bui…

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by ESORICS 2024

  43. arXiv:2406.10160  [pdf, other]

    cs.SD cs.AI eess.AS

    One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model

    Authors: Zhaoqing Li, Haoning Xu, Tianzi Wang, Shoukang Hu, Zengrui Jin, Shujie Hu, Jiajun Deng, Mingyu Cui, Mengzhe Geng, Xunying Liu

    Abstract: We propose a novel one-pass multiple ASR systems joint compression and quantization approach using an all-in-one neural model. A single compression cycle allows multiple nested systems with varying Encoder depths, widths, and quantization precision settings to be simultaneously constructed without the need to train and store individual target systems separately. Experiments consistently demonstrat…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  44. arXiv:2406.10152  [pdf, other]

    cs.SD eess.AS

    Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition

    Authors: Guinan Li, Jiajun Deng, Youjun Chen, Mengzhe Geng, Shujie Hu, Zhe Li, Zengrui Jin, Tianzi Wang, Xurong Xie, Helen Meng, Xunying Liu

    Abstract: This paper proposes joint speaker feature learning methods for zero-shot adaptation of audio-visual multichannel speech separation and recognition systems. xVector and ECAPA-TDNN speaker encoders are connected using purpose-built fusion blocks and tightly integrated with the complete system training. Experiments conducted on LRS3-TED data simulated multichannel overlapped speech suggest that joint…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  45. arXiv:2406.10034  [pdf, other]

    cs.SD cs.AI eess.AS

    Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

    Authors: Tianzi Wang, Xurong Xie, Zhaoqing Li, Shoukang Hu, Zengrui Jin, Jiajun Deng, Mingyu Cui, Shujie Hu, Mengzhe Geng, Guinan Li, Helen Meng, Xunying Liu

    Abstract: This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output labels that are concealed using attention masks, while conducting left-to-right AR prediction and history context amalgamation between blocks. A beam s…

    Submitted 30 August, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, 2 tables, Interspeech24 conference

  46. arXiv:2405.19527  [pdf]

    eess.SY

    Flexible Agent-based Modeling Framework to Evaluate Integrated Microtransit and Fixed-route Transit Designs: Mode Choice, Supernetworks, and Fleet Simulation

    Authors: Siwei Hu, Michael F. Hyland, Ritun Saha, Jacob J. Berkel, Geoffrey Vander Veen

    Abstract: The integration of traditional fixed-route transit (FRT) and more flexible microtransit has been touted as a means of improving mobility and access to opportunity, increasing transit ridership, and promoting environmental sustainability. To help evaluate integrated FRT and microtransit public transit (PT) system (henceforth "integrated fixed-flex PT system") designs, we propose a high-fidelity m…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 49 pages, 25 figures, 8 tables; Submitted To: Transportation Research Part C: Emerging Technologies on May 1st, 2024

  47. arXiv:2405.17496  [pdf, other]

    eess.IV

    UU-Mamba: Uncertainty-aware U-Mamba for Cardiac Image Segmentation

    Authors: Ting Yu Tsai, Li Lin, Shu Hu, Ming-Ching Chang, Hongtu Zhu, Xin Wang

    Abstract: Biomedical image segmentation is critical for accurate identification and analysis of anatomical structures in medical imaging, particularly in cardiac MRI. Manual segmentation is labor-intensive, time-consuming, and prone to errors, highlighting the need for automated methods. However, current machine learning approaches face challenges like overfitting and data demands. To tackle these issues, w…

    Submitted 27 August, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  48. arXiv:2405.09443  [pdf, other]

    cs.IT eess.SP

    Low-Complexity Joint Azimuth-Range-Velocity Estimation for Integrated Sensing and Communication with OFDM Waveform

    Authors: Jun Zhang, Gang Yang, Qibin Ye, Yixuan Huang, Su Hu

    Abstract: Integrated sensing and communication (ISAC) is a main application scenario of the sixth-generation mobile communication systems. Due to the fast-growing number of antennas and subcarriers in cellular systems, the computational complexity of joint azimuth-range-velocity estimation (JARVE) in ISAC systems is extremely high. This paper studies the JARVE problem for a monostatic ISAC system with ortho…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 16 pages, 12 figures, submitted to IEEE journal

  49. arXiv:2404.15212  [pdf, other]

    cs.CV eess.IV

    Real-time Lane-wise Traffic Monitoring in Optimal ROIs

    Authors: Mei Qiu, Wei Lin, Lauren Ann Christopher, Stanley Chien, Yaobin Chen, Shu Hu

    Abstract: In the US, thousands of Pan, Tilt, and Zoom (PTZ) traffic cameras monitor highway conditions. There is a great interest in using these highway cameras to gather valuable road traffic data to support traffic analysis and decision-making for highway safety and efficient traffic management. However, there are too many cameras for a few human traffic operators to effectively monitor, so a fully automa…

    Submitted 28 March, 2024; originally announced April 2024.

  50. arXiv:2404.12908  [pdf, other]

    cs.CV cs.LG eess.IV

    Robust CLIP-Based Detector for Exposing Diffusion Model-Generated Images

    Authors: Santosh, Li Lin, Irene Amerini, Xin Wang, Shu Hu

    Abstract: Diffusion models (DMs) have revolutionized image generation, producing high-quality images with applications spanning various fields. However, their ability to create hyper-realistic images poses significant challenges in distinguishing between real and synthetic content, raising concerns about digital authenticity and potential misuse in creating deepfakes. This work introduces a robust detection…

    Submitted 8 September, 2024; v1 submitted 19 April, 2024; originally announced April 2024.