
Showing 1–50 of 145 results for author: Yang, M

Searching in archive eess.
  1. arXiv:2507.07105  [pdf, ps, other]

    cs.CV eess.IV

    4KAgent: Agentic Any Image to 4K Super-Resolution

    Authors: Yushen Zuo, Qi Zheng, Mingyang Wu, Xinrui Jiang, Renjie Li, Jian Wang, Yide Zhang, Gengchen Mai, Lihong V. Wang, James Zou, Xiaoyu Wang, Ming-Hsuan Yang, Zhengzhong Tu

    Abstract: We present 4KAgent, a unified agentic super-resolution generalist system designed to universally upscale any image to 4K resolution (and even higher, if applied iteratively). Our system can transform images from extremely low resolutions with severe degradations, for example, highly distorted inputs at 256x256, into crystal-clear, photorealistic 4K outputs. 4KAgent comprises three core components:…

    Submitted 9 July, 2025; originally announced July 2025.

    Comments: Project page: https://4kagent.github.io

  2. arXiv:2507.02289  [pdf, ps, other]

    eess.IV cs.CV

    CineMyoPS: Segmenting Myocardial Pathologies from Cine Cardiac MR

    Authors: Wangbin Ding, Lei Li, Junyi Qiu, Bogen Lin, Mingjing Yang, Liqin Huang, Lianming Wu, Sihan Wang, Xiahai Zhuang

    Abstract: Myocardial infarction (MI) is a leading cause of death worldwide. Late gadolinium enhancement (LGE) and T2-weighted cardiac magnetic resonance (CMR) imaging can respectively identify scarring and edema areas, both of which are essential for MI risk stratification and prognosis assessment. Although combining complementary information from multi-sequence CMR is useful, acquiring these sequences can…

    Submitted 2 July, 2025; originally announced July 2025.

  3. arXiv:2506.17425  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    Trans²-CBCT: A Dual-Transformer Framework for Sparse-View CBCT Reconstruction

    Authors: Minmin Yang, Huantao Ren, Senem Velipasalar

    Abstract: Cone-beam computed tomography (CBCT) using only a few X-ray projection views enables faster scans with lower radiation dose, but the resulting severe under-sampling causes strong artifacts and poor spatial coverage. We address these challenges in a unified framework. First, we replace conventional UNet/ResNet encoders with TransUNet, a hybrid CNN-Transformer model. Convolutional layers capture loc…

    Submitted 20 June, 2025; originally announced June 2025.

  4. arXiv:2506.16961  [pdf, ps, other]

    cs.CV eess.IV

    Reversing Flow for Image Restoration

    Authors: Haina Qin, Wenyang Luo, Libin Wang, Dandan Zheng, Jingdong Chen, Ming Yang, Bing Li, Weiming Hu

    Abstract: Image restoration aims to recover high-quality (HQ) images from degraded low-quality (LQ) ones by reversing the effects of degradation. Existing generative models for image restoration, including diffusion and score-based models, often treat the degradation process as a stochastic transformation, which introduces inefficiency and complexity. In this work, we propose ResFlow, a novel image restorat…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: CVPR2025 Final Version; Corresponding Author: Bing Li

    MSC Class: 68U10; ACM Class: I.4.4

  5. arXiv:2506.05706  [pdf, other]

    eess.AS

    Bridging the Modality Gap: Softly Discretizing Audio Representation for LLM-based Automatic Speech Recognition

    Authors: Mu Yang, Szu-Jui Chen, Jiamin Xie, John Hansen

    Abstract: One challenge of integrating speech input with large language models (LLMs) stems from the discrepancy between the continuous nature of audio data and the discrete token-based paradigm of LLMs. To mitigate this gap, we propose a method for integrating vector quantization (VQ) into LLM-based automatic speech recognition (ASR). Using the LLM embedding table as the VQ codebook, the VQ module aligns t…

    Submitted 5 June, 2025; originally announced June 2025.

  6. arXiv:2505.24651  [pdf, ps, other]

    cs.IT eess.SP

    Robust Distributed Phase Retrieval for Multi-View Compressive Networked Sensing With Outliers

    Authors: Ming-Hsun Yang

    Abstract: This work examines the multi-view compressive phase retrieval problem in a distributed sensor network, where each sensor device, limited by storage and sensing capabilities, can access only intensity measurements from an unknown part of the global sparse vector. The goal is to enable each sensor to recover its observable sparse signal when measurements are corrupted by outliers. To achieve reliabl…

    Submitted 30 May, 2025; originally announced May 2025.

  7. arXiv:2505.16027  [pdf]

    eess.IV cs.AI cs.CV

    Benchmarking Chest X-ray Diagnosis Models Across Multinational Datasets

    Authors: Qinmei Xu, Yiheng Li, Xianghao Zhan, Ahmet Gorkem Er, Brittany Dashevsky, Chuanjun Xu, Mohammed Alawad, Mengya Yang, Liu Ya, Changsheng Zhou, Xiao Li, Haruka Itakura, Olivier Gevaert

    Abstract: Foundation models leveraging vision-language pretraining have shown promise in chest X-ray (CXR) interpretation, yet their real-world performance across diverse populations and diagnostic tasks remains insufficiently evaluated. This study benchmarks the diagnostic performance and generalizability of foundation models versus traditional convolutional neural networks (CNNs) on multinational CXR data…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 78 pages, 7 figures, 2 tables

    MSC Class: I.2; ACM Class: I.2

  8. arXiv:2505.07916  [pdf, ps, other]

    eess.AS cs.SD

    MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder

    Authors: Bowen Zhang, Congchao Guo, Geng Yang, Hang Yu, Haozhe Zhang, Heidi Lei, Jialong Mai, Junjie Yan, Kaiyue Yang, Mingqi Yang, Peikai Huang, Ruiyang Jin, Sitan Jiang, Weihua Cheng, Yawei Li, Yichen Xiao, Yiying Zhou, Yongmao Zhang, Yuan Lu, Yucen He

    Abstract: We introduce MiniMax-Speech, an autoregressive Transformer-based Text-to-Speech (TTS) model that generates high-quality speech. A key innovation is our learnable speaker encoder, which extracts timbre features from a reference audio without requiring its transcription. This enables MiniMax-Speech to produce highly expressive speech with timbre consistent with the reference in a zero-shot manner, w…

    Submitted 12 May, 2025; originally announced May 2025.

  9. arXiv:2504.18022  [pdf, ps, other]

    cs.IT eess.SY

    Iterative Joint Detection of Kalman Filter and Channel Decoder for Sensor-to-Controller Link in Wireless Networked Control Systems

    Authors: Jinnan Piao, Dong Li, Yiming Sun, Zhibo Li, Ming Yang, Xueting Yu

    Abstract: In this letter, we propose an iterative joint detection algorithm of Kalman filter (KF) and channel decoder for the sensor-to-controller link of wireless networked control systems, which utilizes the prior information of control system to improve control and communication performance. In this algorithm, we first use the KF to estimate the probability density of the control system outputs and calcu…

    Submitted 29 May, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

    Comments: 5 pages, 4 figures

  10. arXiv:2503.13468  [pdf, other]

    eess.SP cs.LG

    A CGAN-LSTM-Based Framework for Time-Varying Non-Stationary Channel Modeling

    Authors: Keying Guo, Ruisi He, Mi Yang, Yuxin Zhang, Bo Ai, Haoxiang Zhang, Jiahui Han, Ruifeng Chen

    Abstract: Time-varying non-stationary channels, with complex dynamic variations and temporal evolution characteristics, pose significant challenges in channel modeling and communication system performance evaluation. Most existing methods of time-varying channel modeling focus on predicting channel state at a given moment or simulating short-term channel fluctuations, which are unable to capture the long-te…

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: 11 pages, 7 figures

  11. arXiv:2503.11124  [pdf, other]

    cs.RO eess.SY physics.flu-dyn

    Flow-Aware Navigation of Magnetic Micro-Robots in Complex Fluids via PINN-Based Prediction

    Authors: Yongyi Jia, Shu Miao, Jiayu Wu, Ming Yang, Chengzhi Hu, Xiang Li

    Abstract: While magnetic micro-robots have demonstrated significant potential across various applications, including drug delivery and microsurgery, the open issue of precise navigation and control in complex fluid environments is crucial for in vivo implementation. This paper introduces a novel flow-aware navigation and control strategy for magnetic micro-robots that explicitly accounts for the impact of f…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 8

  12. arXiv:2503.01383  [pdf, other]

    eess.SP

    Channel Semantic Characterization for Integrated Sensing and Communication Scenarios: From Measurements to Modeling

    Authors: Zhengyu Zhang, Ruisi He, Bo Ai, Mi Yang, Xuejian Zhang, Ziyi Qi, Zhangdui Zhong

    Abstract: With the advancement of sixth-generation (6G) wireless communication systems, integrated sensing and communication (ISAC) is crucial for perceiving and interacting with the environment via electromagnetic propagation, termed channel semantics, to support tasks like decision-making. However, channel models focusing on physical characteristics face challenges in representing semantics embedded in…

    Submitted 3 March, 2025; originally announced March 2025.

  13. arXiv:2502.18846  [pdf, other]

    cs.RO eess.SY

    RL-OGM-Parking: Lidar OGM-Based Hybrid Reinforcement Learning Planner for Autonomous Parking

    Authors: Zhitao Wang, Zhe Chen, Mingyang Jiang, Tong Qin, Ming Yang

    Abstract: Autonomous parking has become a critical application in automatic driving research and development. Parking operations often suffer from limited space and complex environments, requiring accurate perception and precise maneuvering. Traditional rule-based parking algorithms struggle to adapt to diverse and unpredictable conditions, while learning-based algorithms lack consistent and stable performa…

    Submitted 26 February, 2025; originally announced February 2025.

  14. arXiv:2502.16342  [pdf, other]

    eess.IV cs.CV

    Revealing Microscopic Objects in Fluorescence Live Imaging by Video-to-video Translation Based on A Spatial-temporal Generative Adversarial Network

    Authors: Yang Jiao, Mei Yang, Mo Weng

    Abstract: In spite of being a valuable tool to simultaneously visualize multiple types of subcellular structures using spectrally distinct fluorescent labels, a standard fluorescence microscope is only able to identify a few microscopic objects; such a limit is largely imposed by the number of fluorescent labels available to the sample. In order to simultaneously visualize more objects, in this paper, we propo…

    Submitted 22 February, 2025; originally announced February 2025.

  15. arXiv:2502.09654  [pdf, other]

    eess.IV cs.CV

    Heterogeneous Mixture of Experts for Remote Sensing Image Super-Resolution

    Authors: Bowen Chen, Keyan Chen, Mohan Yang, Zhengxia Zou, Zhenwei Shi

    Abstract: Remote sensing image super-resolution (SR) aims to reconstruct high-resolution remote sensing images from low-resolution inputs, thereby addressing limitations imposed by sensors and imaging conditions. However, the inherent characteristics of remote sensing images, including diverse ground object types and complex details, pose significant challenges to achieving high-quality reconstruction. Exis…

    Submitted 2 April, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

  16. arXiv:2502.00800  [pdf, other]

    cs.CV eess.IV

    Adversarial Semantic Augmentation for Training Generative Adversarial Networks under Limited Data

    Authors: Mengping Yang, Zhe Wang, Ziqiu Chi, Dongdong Li, Wenli Du

    Abstract: Generative adversarial networks (GANs) have made remarkable achievements in synthesizing images in recent years. Typically, training GANs requires massive data, and the performance of GANs deteriorates significantly when training data is limited. To improve the synthesis performance of GANs in low-data regimes, existing approaches use various data augmentation techniques to enlarge the training se…

    Submitted 2 February, 2025; originally announced February 2025.

    Comments: This work was completed in 2022 and submitted to an IEEE journal for potential publication

  17. arXiv:2501.15726  [pdf, other]

    cs.IT eess.SP

    Vision-Aided Channel Prediction Based on Image Segmentation at Street Intersection Scenarios

    Authors: Xuejian Zhang, Ruisi He, Mi Yang, Ziyi Qi, Zhengyu Zhang, Bo Ai, Zhangdui Zhong

    Abstract: Intelligent vehicular communication with vehicle road collaboration capability is a key technology enabled by 6G, and the integration of various visual sensors on vehicles and infrastructures plays a crucial role. Moreover, accurate channel prediction is foundational to realizing intelligent vehicular communication. Traditional methods are still limited by the inability to balance accuracy and ope…

    Submitted 26 January, 2025; originally announced January 2025.

    Comments: 12 pages, 9 figures, submitted to IEEE Transactions on Cognitive Communications and Networking

  18. arXiv:2501.13134  [pdf, ps, other]

    eess.IV cs.LG

    UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior

    Authors: I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu, Yuan-Chun Chiang, Sy-Yen Kuo, Ming-Hsuan Yang

    Abstract: Image restoration aims to recover content from inputs degraded by various factors, such as adverse weather, blur, and noise. Perceptual Image Restoration (PIR) methods improve visual quality but often do not support downstream tasks effectively. On the other hand, Task-oriented Image Restoration (TIR) methods focus on enhancing image utility for high-level vision tasks, sometimes compromising visu…

    Submitted 1 June, 2025; v1 submitted 22 January, 2025; originally announced January 2025.

    Comments: Accepted by CVPR2025 (Highlight); Project Page: https://unirestore.github.io

  19. arXiv:2501.08639  [pdf]

    cs.CV eess.IV

    Detecting Wildfire Flame and Smoke through Edge Computing using Transfer Learning Enhanced Deep Learning Models

    Authors: Giovanny Vazquez, Shengjie Zhai, Mei Yang

    Abstract: Autonomous unmanned aerial vehicles (UAVs) integrated with edge computing capabilities empower real-time data processing directly on the device, dramatically reducing latency in critical scenarios such as wildfire detection. This study underscores Transfer Learning's (TL) significance in boosting the performance of object detectors for identifying wildfire smoke and flames, especially when trained…

    Submitted 15 January, 2025; originally announced January 2025.

    Comments: 11 pages, 7 figures

  20. arXiv:2412.11393  [pdf]

    cs.LG eess.SP

    STDHL: Spatio-Temporal Dynamic Hypergraph Learning for Wind Power Forecasting

    Authors: Xiaochong Dong, Xuemin Zhang, Ming Yang, Shengwei Mei

    Abstract: Leveraging spatio-temporal correlations among wind farms can significantly enhance the accuracy of ultra-short-term wind power forecasting. However, the complex and dynamic nature of these correlations presents significant modeling challenges. To address this, we propose a spatio-temporal dynamic hypergraph learning (STDHL) model. This model uses a hypergraph structure to represent spatial feature…

    Submitted 15 December, 2024; originally announced December 2024.

  21. arXiv:2412.07074  [pdf, other]

    eess.SP

    Channel Spreading Function-Inspired Channel Transfer Function Estimation for OFDM Systems with High-Mobility

    Authors: Yiyan Ma, Bo Ai, Guoyu Ma, Akram Shafie, Qingqing Cheng, Mi Yang, Jingli Li, Xuebo Pang, Jinhong Yuan, Zhangdui Zhong

    Abstract: In this letter, we propose a novel channel transfer function (CTF) estimation approach for orthogonal frequency division multiplexing (OFDM) systems in high-mobility scenarios, that leverages the stationary properties of the delay-Doppler domain channel spreading function (CSF). First, we develop a CSF estimation model for OFDM systems that relies solely on discrete pilot symbols in the time-frequ…

    Submitted 9 December, 2024; originally announced December 2024.

  22. arXiv:2411.19509  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis

    Authors: Tianqi Li, Ruobing Zheng, Minghui Yang, Jingdong Chen, Ming Yang

    Abstract: Recent advances in diffusion models have endowed talking head synthesis with subtle expressions and vivid head movements, but have also led to slow inference speed and insufficient control over generated results. To address these issues, we propose Ditto, a diffusion-based talking head framework that enables fine-grained controls and real-time inference. Specifically, we utilize an off-the-shelf m…

    Submitted 30 April, 2025; v1 submitted 29 November, 2024; originally announced November 2024.

    Comments: Project Page: https://digital-avatar.github.io/ai/Ditto/

  23. arXiv:2411.11798  [pdf]

    cs.IT cs.AI eess.SP

    COST CA20120 INTERACT Framework of Artificial Intelligence Based Channel Modeling

    Authors: Ruisi He, Nicola D. Cicco, Bo Ai, Mi Yang, Yang Miao, Mate Boban

    Abstract: Accurate channel models are the prerequisite for communication-theoretic investigations as well as system design. Channel modeling generally relies on statistical and deterministic approaches. However, there are still significant limits for the traditional modeling methods in terms of accuracy, generalization ability, and computational complexity. The fundamental reason is that establishing a quan…

    Submitted 31 October, 2024; originally announced November 2024.

    Comments: to appear in IEEE Wireless Communications Magazine

  24. arXiv:2411.11539  [pdf, ps, other]

    cs.IT eess.SP

    Channel Capacity-Aware Distributed Encoding for Multi-View Sensing and Edge Inference

    Authors: Mingjie Yang, Guangming Liang, Dongzhu Liu, Lei Zhang, Kaibin Huang

    Abstract: Integrated sensing and communication (ISAC) unifies wireless communication and sensing by sharing spectrum and hardware, which often incurs trade-offs between two functions due to limited resources. However, this paper shifts focus to exploring the synergy between communication and sensing, using WiFi sensing as an exemplary scenario where communication signals are repurposed to probe the environm…

    Submitted 18 November, 2024; originally announced November 2024.

  25. arXiv:2411.05835  [pdf, other]

    eess.SY

    Improved Convolution-Based Analysis for Worst-Case Probability Response Time of CAN

    Authors: Haozhe Yi, Junyi Liu, Maolin Yang, Zewei Chen, Xu Jiang

    Abstract: Controller Area Networks (CANs) are widely adopted in real-time automotive control and are increasingly standard in factory automation. Considering their critical application in safety-critical systems, the error rate of the system must be accurately predicted and guaranteed. Through simulation, it is possible to obtain a low-precision overview of the system's behavior. However, for low-probabilit…

    Submitted 28 November, 2024; v1 submitted 6 November, 2024; originally announced November 2024.

  26. arXiv:2411.05141  [pdf, ps, other]

    eess.AS cs.SD

    Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation

    Authors: Mu Yang, Bowen Shi, Matthew Le, Wei-Ning Hsu, Andros Tjandra

    Abstract: This work focuses on improving Text-To-Audio (TTA) generation on zero-shot and few-shot settings (i.e. generating unseen or uncommon audio events). Inspired by the success of Retrieval-Augmented Generation (RAG) in Large Language Models, we propose Audiobox TTA-RAG, a novel retrieval-augmented TTA approach based on Audiobox, a flow-matching audio generation model. Unlike the vanilla Audiobox TTA s…

    Submitted 6 June, 2025; v1 submitted 7 November, 2024; originally announced November 2024.

    Comments: Interspeech 2025

  27. arXiv:2410.21276  [pdf, other]

    cs.CL cs.AI cs.CV cs.CY cs.LG cs.SD eess.AS

    GPT-4o System Card

    Authors: OpenAI: Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis, et al. (395 additional authors not shown)

    Abstract: GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 mil…

    Submitted 25 October, 2024; originally announced October 2024.

  28. arXiv:2410.12177  [pdf, other]

    physics.optics eess.SY

    Towards Large Scale Atomic Manufacturing: Heterodyne Grating Interferometer with Zero Dead-Zone

    Authors: Can Cui, Lvye Gao, Pengbo Zhao, Menghan Yang, Lifu Liu, Yu Ma, Guangyao Huang, Shengtong Wang, Linbin Luo, Xinghui Li

    Abstract: This paper presents a novel heterodyne grating interferometer designed to meet the precise measurement requirements of next-generation lithography systems and large-scale atomic-level manufacturing. Utilizing a dual-frequency light source, the interferometer enables simultaneous measurement of three degrees of freedom. Key advancements include a compact zero Dead-Zone optical path configuration, s…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: 8 pages, 11 figures

  29. arXiv:2410.02957  [pdf, other]

    eess.SY

    Human Balancing on a Log: A Switched Multi-Layer Controller

    Authors: Jiayi Zhao, Mo Yang, Jing Shuang Li

    Abstract: We study the task of balancing a human on a log that is fixed in place. Balancing on a log is substantially more challenging than balancing on a flat surface due to increased instability -- nonetheless, we are able to balance by composing simple (e.g., PID, LQR) controllers in a bio-inspired switched multi-layer configuration. The controller consists of an upper-layer LQR planner (akin to the cent…

    Submitted 19 March, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

    Comments: to appear at 2025 IEEE American Control Conference (ACC)

  30. arXiv:2409.06722  [pdf, other]

    eess.IV cs.CV cs.LG

    Automated Quantification of White Blood Cells in Light Microscopic Images of Injured Skeletal Muscle

    Authors: Yang Jiao, Hananeh Derakhshan, Barbara St. Pierre Schneider, Emma Regentova, Mei Yang

    Abstract: White blood cells (WBCs) are the most diverse cell types observed in the healing process of injured skeletal muscles. In the course of healing, WBCs exhibit dynamic cellular response and undergo multiple protein expression changes. The progress of healing can be analyzed by quantifying the number of WBCs or the amount of specific proteins in light microscopic images obtained at different time poin…

    Submitted 26 August, 2024; originally announced September 2024.

    Comments: 2 tables, 7 figures, 8 pages

  31. arXiv:2408.09241  [pdf, other]

    cs.CV eess.IV

    Re-boosting Self-Collaboration Parallel Prompt GAN for Unsupervised Image Restoration

    Authors: Xin Lin, Yuyan Zhou, Jingtong Yue, Chao Ren, Kelvin C. K. Chan, Lu Qi, Ming-Hsuan Yang

    Abstract: Unsupervised restoration approaches based on generative adversarial networks (GANs) offer a promising solution without requiring paired datasets. Yet, these GAN-based approaches struggle to surpass the performance of conventional unsupervised GAN-based frameworks without significantly modifying model structures or increasing the computational complexity. To address these issues, we propose a self-…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: This paper is an extended and revised version of our previous work "Unsupervised Image Denoising in Real-World Scenarios via Self-Collaboration Parallel Generative Adversarial Branches" (https://openaccess.thecvf.com/content/ICCV2023/papers/Lin_Unsupervised_Image_Denoising_in_Real-World_Scenarios_via_Self-Collaboration_Parallel_Generative_ICCV_2023_paper.pdf)

  32. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  33. arXiv:2406.16871  [pdf, other]

    eess.SY

    Neural network based model predictive control of voltage for a polymer electrolyte fuel cell system with constraints

    Authors: Xiufei Li, Miao Yang, Yuanxin Qi, Miao Zhang

    Abstract: A fuel cell system must output a steady voltage as a power source in practical use. A neural network (NN) based model predictive control (MPC) approach is developed in this work to regulate the fuel cell output voltage with safety constraints. The developed NN MPC controller stabilizes the polymer electrolyte fuel cell system's output voltage by controlling the hydrogen and air flow rates at the s…

    Submitted 24 March, 2024; originally announced June 2024.

  34. arXiv:2406.12596  [pdf, ps, other]

    eess.SP

    Beyond Near-Field: Far-Field Location Division Multiple Access in Downlink MIMO Systems

    Authors: Haoyan Liu, Caijian Jie, Min Yang, Chengguang Li

    Abstract: Exploring channel dimensions has been the driving force behind breakthroughs in successive generations of mobile communication systems. In 5G, space division multiple access (SDMA) leveraging massive MIMO has been crucial in enhancing system capacity through spatial differentiation of users. However, SDMA can only finely distinguish users at adjacent angles in ultra-dense networks by extremely lar…

    Submitted 30 January, 2025; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: We have omitted an important detail of the baseband equivalent model, which may mislead the reader. We are currently trying to resolve this issue; please withdraw our submission.

  35. arXiv:2406.10137  [pdf, ps, other]

    cs.IT cs.LG eess.SP

    Compressed Sensor Caching and Collaborative Sparse Data Recovery with Anchor Alignment

    Authors: Yi-Jen Yang, Ming-Hsun Yang, Jwo-Yuh Wu, Y. -W. Peter Hong

    Abstract: This work examines the compressed sensor caching problem in wireless sensor networks and devises efficient distributed sparse data recovery algorithms to enable collaboration among multiple caches. In this problem, each cache is only allowed to access measurements from a small subset of sensors within its vicinity to reduce both cache size and data acquisition overhead. To enable reliable data rec…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: v1 was submitted to IEEE Transactions on Signal Processing on Sept. 18, 2023

  36. arXiv:2405.10589  [pdf, other]

    cs.CV cs.AI eess.IV

    Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance

    Authors: I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu, Ming-Hsuan Yang, Sy-Yen Kuo

    Abstract: Crowd counting and localization have become increasingly important in computer vision due to their wide-ranging applications. While point-based strategies have been widely used in crowd counting methods, they face a significant challenge, i.e., the lack of an effective learning strategy to guide the matching process. This deficiency leads to instability in matching point proposals to target points…

    Submitted 17 May, 2024; originally announced May 2024.

  37. arXiv:2405.07442  [pdf]

    cs.SD cs.AI eess.AS q-bio.QM

    Rene: A Pre-trained Multi-modal Architecture for Auscultation of Respiratory Diseases

    Authors: Pengfei Zhang, Zhihang Zheng, Shichen Zhang, Minghao Yang, Shaojun Tang

    Abstract: Compared with invasive examinations that require tissue sampling, respiratory sound testing is a non-invasive examination method that is safer and easier for patients to accept. In this study, we introduce Rene, a pioneering large-scale model tailored for respiratory sound recognition. Rene has been rigorously fine-tuned with an extensive dataset featuring a broad array of respiratory audio sample…

    Submitted 6 June, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

  38. arXiv:2405.01200  [pdf, other]

    eess.SY cs.LG

    Learning-to-solve unit commitment based on few-shot physics-guided spatial-temporal graph convolution network

    Authors: Mei Yang, Gao Qiu, Junyong Liu, Kai Liu

    Abstract: This letter proposes a few-shot physics-guided spatial temporal graph convolutional network (FPG-STGCN) to fast solve unit commitment (UC). Firstly, STGCN is tailored to parameterize UC. Then, few-shot physics-guided learning scheme is proposed. It exploits few typical UC solutions yielded via commercial optimizer to escape from local minimum, and leverages the augmented Lagrangian method for cons…

    Submitted 2 May, 2024; originally announced May 2024.

  39. arXiv:2404.17736  [pdf, other]

    eess.SP cs.CV cs.IT eess.IV

    Diffusion-Aided Joint Source Channel Coding For High Realism Wireless Image Transmission

    Authors: Mingyu Yang, Bowen Liu, Boyang Wang, Hun-Seok Kim

    Abstract: Deep learning-based joint source-channel coding (deep JSCC) has been demonstrated to be an effective approach for wireless image transmission. Nevertheless, most existing work adopts an autoencoder framework to optimize conventional criteria such as Mean Squared Error (MSE) and Structural Similarity Index (SSIM) which do not suffice to maintain the perceptual quality of reconstructed images. Such…

    Submitted 21 March, 2025; v1 submitted 26 April, 2024; originally announced April 2024.

  40. arXiv:2404.13153  [pdf, other]

    eess.IV cs.CV

    Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring

    Authors: Chengxu Liu, Xuan Wang, Xiangyu Xu, Ruhao Tian, Shuai Li, Xueming Qian, Ming-Hsuan Yang

    Abstract: Eliminating image blur produced by various kinds of motion has been a challenging problem. Dominant approaches rely heavily on model capacity to remove blurring by reconstructing residual from blurry observation in feature space. These practices not only prevent the capture of spatially variable motion in the real world but also ignore the tailored handling of various motions in image space. In th…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  41. arXiv:2404.11836  [pdf, other]

    eess.SP

    AI-Empowered RIS-Assisted Networks: CV-Enabled RIS Selection and DNN-Enabled Transmission

    Authors: Conggang Hu, Yang Lu, Hongyang Du, Mi Yang, Bo Ai, Dusit Niyato

    Abstract: This paper investigates artificial intelligence (AI) empowered schemes for reconfigurable intelligent surface (RIS) assisted networks from the perspective of fast implementation. We formulate a weighted sum-rate maximization problem for a multi-RIS-assisted network. To avoid the huge channel estimation overhead of activating all RISs, we propose a computer vision (CV) enabled RIS selection scheme ba…

    Submitted 17 April, 2024; originally announced April 2024.

  42. arXiv:2404.11313

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), in which various excellent solutions were submitted and evaluated on KVQ, a dataset collected from a popular short-form video platform, i.e., the Kuaishou/Kwai Platform. The KVQ database is divided into three parts: 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024 Workshop. The challenge report for the CVPR NTIRE 2024 Short-form UGC Video Quality Assessment Challenge.

  43. arXiv:2404.06265

    cs.CV eess.IV

    Spatial-Temporal Multi-level Association for Video Object Segmentation

    Authors: Deshui Miao, Xin Li, Zhenyu He, Huchuan Lu, Ming-Hsuan Yang

    Abstract: Existing semi-supervised video object segmentation methods focus on either temporal feature matching or spatial-temporal feature modeling. However, they do not simultaneously address the issues of sufficient target interaction and efficient parallel processing, thereby constraining the learning of dynamic, target-aware features. To tackle these limitations, this paper proposes a spatial-temporal m…

    Submitted 9 April, 2024; originally announced April 2024.

  44. arXiv:2403.16170

    eess.SY

    Voltage Regulation in Polymer Electrolyte Fuel Cell Systems Using Gaussian Process Model Predictive Control

    Authors: Xiufei Li, Miao Zhang, Yuanxin Qi, Miao Yang

    Abstract: This study introduces a novel approach utilizing Gaussian process model predictive control (MPC) to stabilize the output voltage of a polymer electrolyte fuel cell (PEFC) system by simultaneously regulating hydrogen and airflow rates. Two Gaussian process models are developed to capture PEFC dynamics, taking into account constraints including hydrogen pressure and input change rates, thereby aidin…

    Submitted 24 March, 2024; originally announced March 2024.

  45. arXiv:2403.00605

    eess.SP

    Channel Measurements and Modeling for Dynamic Vehicular ISAC Scenarios at 28 GHz

    Authors: Zhengyu Zhang, Ruisi He, Bo Ai, Mi Yang, Xuejian Zhang, Ziyi Qi, Yuan Yuan

    Abstract: Integrated sensing and communication (ISAC) is a promising technology for 6G, with the goal of providing end-to-end information processing and inherent perception capabilities for future communication systems. Within emerging ISAC application scenarios, vehicular ISAC technologies have the potential to enhance traffic efficiency and safety through the integration of communication and synchronized perc…

    Submitted 1 March, 2024; originally announced March 2024.

  46. arXiv:2403.00569

    eess.SP

    Characterization of Wireless Channel Semantics: A New Paradigm

    Authors: Zhengyu Zhang, Ruisi He, Mi Yang, Xuejian Zhang, Ziyi Qi, Yuan Yuan, Bo Ai

    Abstract: Recently, deep learning-enabled semantic communications have been developed to understand transmission content at the semantic level, realizing effective and accurate information transfer. Toward the vision of sixth-generation (6G) networks, wireless devices are expected to have native perception and intelligence capabilities, which associate wireless channels with surrounding environments from…

    Submitted 1 March, 2024; originally announced March 2024.

  47. arXiv:2403.00557

    eess.SP

    Non-stationarity Characteristics in Dynamic Vehicular ISAC Channels at 28 GHz

    Authors: Zhengyu Zhang, Ruisi He, Mi Yang, Xuejian Zhang, Ziyi Qi, Hang Mi, Guiqi Sun, Jingya Yang, Bo Ai

    Abstract: Integrated sensing and communications (ISAC) is a potential technology for 6G, aiming to enable end-to-end information processing ability and native perception capability for future communication systems. As an important part of the ISAC application scenarios, ISAC-aided vehicle-to-everything (V2X) can improve traffic efficiency and safety through intercommunication and synchronous perception.…

    Submitted 1 March, 2024; originally announced March 2024.

  48. arXiv:2403.00505

    eess.SP

    A Cluster-Based Statistical Channel Model for Integrated Sensing and Communication Channels

    Authors: Zhengyu Zhang, Ruisi He, Bo Ai, Mi Yang, Yong Niu, Zhangdui Zhong, Yujian Li, Xuejian Zhang, Jing Li

    Abstract: The emerging 6G network envisions integrated sensing and communication (ISAC) as a promising solution to meet the growing demand for native perception ability. To optimize and evaluate ISAC systems and techniques, it is crucial to have an accurate and realistic wireless channel model. However, some important features of ISAC channels have not been well characterized; for example, most existing ISAC ch…

    Submitted 1 March, 2024; originally announced March 2024.

  49. arXiv:2402.10427

    cs.CL cs.AI cs.SD eess.AS

    Evaluating and Improving Continual Learning in Spoken Language Understanding

    Authors: Muqiao Yang, Xiang Li, Umberto Cappellazzo, Shinji Watanabe, Bhiksha Raj

    Abstract: Continual learning has emerged as an increasingly important challenge across various tasks, including Spoken Language Understanding (SLU). In SLU, its objective is to effectively handle the emergence of new concepts and evolving environments. The evaluation of continual learning algorithms typically involves assessing the model's stability, plasticity, and generalizability as fundamental aspects o…

    Submitted 15 February, 2024; originally announced February 2024.

  50. arXiv:2401.02046

    eess.AS cs.SD

    CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based Speech Recognition

    Authors: Junfeng Hou, Peiyao Wang, Jincheng Zhang, Meng Yang, Minwei Feng, Jingcheng Yin

    Abstract: Deploying end-to-end speech recognition models with limited computing resources remains challenging, despite their impressive performance. Given the gradual increase in model size and the wide range of model applications, selectively executing model components for different inputs to improve the inference efficiency is of great interest. In this paper, we propose a dynamic layer-skipping method th…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: accepted by ASRU 2023