Showing 1–50 of 1,438 results for author: Wang, J

Searching in archive eess.
  1. arXiv:2507.07481  [pdf, ps, other]

    cs.NI eess.SP

    Energy Transfer and Data Collection from Batteryless Sensors in Low-altitude Wireless Networks

    Authors: Wen Zhang, Aimin Wang, Jiahui Li, Geng Sun, Jiacheng Wang, Weijie Yuan, Dusit Niyato

    Abstract: The integration of wireless power transfer (WPT) with Internet of Things (IoT) offers promising solutions for sensing applications, but faces significant challenges when deployed in hard-to-access areas such as high-temperature environments. In such extreme conditions, traditional fixed WPT infrastructure cannot be safely installed, and batteries rapidly degrade due to hardware failures. In this p…

    Submitted 10 July, 2025; originally announced July 2025.

  2. arXiv:2507.07384  [pdf, ps, other]

    cs.SD eess.AS

    VP-SelDoA: Visual-prompted Selective DoA Estimation of Target Sound via Semantic-Spatial Matching

    Authors: Yu Chen, Xinyuan Qian, Hongxu Zhu, Jiadong Wang, Kainan Chen, Haizhou Li

    Abstract: Audio-visual sound source localization (AV-SSL) identifies the position of a sound source by exploiting the complementary strengths of auditory and visual signals. However, existing AV-SSL methods encounter three major challenges: 1) inability to selectively isolate the target sound source in multi-source scenarios, 2) misalignment between semantic visual features and spatial acoustic features, an…

    Submitted 9 July, 2025; originally announced July 2025.

    Comments: Under Review

  3. arXiv:2507.07105  [pdf, ps, other]

    cs.CV eess.IV

    4KAgent: Agentic Any Image to 4K Super-Resolution

    Authors: Yushen Zuo, Qi Zheng, Mingyang Wu, Xinrui Jiang, Renjie Li, Jian Wang, Yide Zhang, Gengchen Mai, Lihong V. Wang, James Zou, Xiaoyu Wang, Ming-Hsuan Yang, Zhengzhong Tu

    Abstract: We present 4KAgent, a unified agentic super-resolution generalist system designed to universally upscale any image to 4K resolution (and even higher, if applied iteratively). Our system can transform images from extremely low resolutions with severe degradations, for example, highly distorted inputs at 256x256, into crystal-clear, photorealistic 4K outputs. 4KAgent comprises three core components:…

    Submitted 9 July, 2025; originally announced July 2025.

    Comments: Project page: https://4kagent.github.io

  4. arXiv:2507.05177  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model

    Authors: Chen Wang, Tianyu Peng, Wen Yang, Yinan Bai, Guangfu Wang, Jun Lin, Lanpeng Jia, Lingxiang Wu, Jinqiao Wang, Chengqing Zong, Jiajun Zhang

    Abstract: Empathetic interaction is a cornerstone of human-machine communication, due to the need for understanding speech enriched with paralinguistic cues and generating emotional and expressive responses. However, the most powerful empathetic LSLMs are increasingly closed off, leaving the crucial details about the architecture, data and development opaque to researchers. Given the critical need for trans…

    Submitted 8 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: Technical Report

  5. arXiv:2507.04776  [pdf, ps, other]

    cs.SD cs.LG cs.MM eess.AS

    Improving BERT for Symbolic Music Understanding Using Token Denoising and Pianoroll Prediction

    Authors: Jun-You Wang, Li Su

    Abstract: We propose a pre-trained BERT-like model for symbolic music understanding that achieves competitive performance across a wide range of downstream tasks. To achieve this target, we design two novel pre-training objectives, namely token correction and pianoroll prediction. First, we sample a portion of note tokens and corrupt them with a limited amount of noise, and then train the model to denoise t…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: Accepted at ISMIR 2025

  6. arXiv:2507.02666  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    ASDA: Audio Spectrogram Differential Attention Mechanism for Self-Supervised Representation Learning

    Authors: Junyu Wang, Tianrui Wang, Meng Ge, Longbiao Wang, Jianwu Dang

    Abstract: In recent advancements in audio self-supervised representation learning, the standard Transformer architecture has emerged as the predominant approach, yet its attention mechanism often allocates a portion of attention weights to irrelevant information, potentially impairing the model's discriminative ability. To address this, we introduce a differential attention mechanism, which effectively miti…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: Accepted at Interspeech2025

  7. arXiv:2507.02374  [pdf, ps, other]

    eess.SP

    Predictive Control over LAWN: Joint Trajectory Design and Resource Allocation

    Authors: Haijia Jin, Jun Wu, Weijie Yuan, Ruizhi Ruan, Jiacheng Wang, Dusit Niyato, Dong In Kim, Abbas Jamalipour

    Abstract: Low-altitude wireless networks (LAWNs) have been envisioned as flexible and transformative platforms for enabling delay-sensitive control applications in Internet of Things (IoT) systems. In this work, we investigate the real-time wireless control over a LAWN system, where an aerial drone is employed to serve multiple mobile automated guided vehicles (AGVs) via finite blocklength (FBL) transmissio…

    Submitted 3 July, 2025; originally announced July 2025.

  8. arXiv:2507.01360  [pdf, ps, other]

    cs.NI eess.SP

    MmBack: Clock-free Multi-Sensor Backscatter with Synchronous Acquisition and Multiplexing

    Authors: Yijie Li, Weichong Ling, Taiting Lu, Yi-Chao Chen, Vaishnavi Ranganathan, Lili Qiu, Jingxian Wang

    Abstract: Backscatter tags provide a low-power solution for sensor applications, yet many real-world scenarios require multiple sensors, often of different types, for complex sensing tasks. However, existing designs support only a single sensor per tag, increasing spatial overhead. State-of-the-art approaches to multiplexing multiple sensor streams on a single tag rely on onboard clocks or multiple modulation…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 16 pages, 14 figures

  9. arXiv:2507.00398  [pdf, ps, other]

    eess.IV cs.CV

    Accurate and Efficient Fetal Birth Weight Estimation from 3D Ultrasound

    Authors: Jian Wang, Qiongying Ni, Hongkui Yu, Ruixuan Yao, Jinqiao Ying, Bin Zhang, Xingyi Yang, Jin Peng, Jiongquan Chen, Junxuan Yu, Wenlong Shi, Chaoyu Chen, Zhongnuo Yan, Mingyuan Luo, Gaocheng Cai, Dong Ni, Jing Lu, Xin Yang

    Abstract: Accurate fetal birth weight (FBW) estimation is essential for optimizing delivery decisions and reducing perinatal mortality. However, clinical methods for FBW estimation are inefficient, operator-dependent, and challenging to apply in cases of complex fetal anatomy. Existing deep learning methods are based on 2D standard ultrasound (US) images or videos that lack spatial information, limiting the…

    Submitted 30 June, 2025; originally announced July 2025.

    Comments: Accepted by MICCAI 2025

  10. arXiv:2507.00366  [pdf, ps, other]

    cs.IT eess.SP

    Wireless AI Evolution: From Statistical Learners to Electromagnetic-Guided Foundation Models

    Authors: Jian Xiao, Ji Wang, Kunrui Cao, Xingwang Li, Zhao Chen, Chau Yuen

    Abstract: While initial applications of artificial intelligence (AI) in wireless communications over the past decade have demonstrated considerable potential using specialized models for targeted communication tasks, the revolutionary demands of sixth-generation (6G) networks for holographic communications, ubiquitous sensing, and native intelligence are propelling a necessary evolution towards AI-native wi…

    Submitted 30 June, 2025; originally announced July 2025.

  11. arXiv:2506.23874  [pdf, ps, other]

    eess.AS cs.SD

    URGENT-PK: Perceptually-Aligned Ranking Model Designed for Speech Enhancement Competition

    Authors: Jiahe Wang, Chenda Li, Wei Wang, Wangyou Zhang, Samuele Cornell, Marvin Sach, Robin Scheibler, Kohei Saijo, Yihui Fu, Zhaoheng Ni, Anurag Kumar, Tim Fingscheidt, Shinji Watanabe, Yanmin Qian

    Abstract: The Mean Opinion Score (MOS) is fundamental to speech quality assessment. However, its acquisition requires significant human annotation. Although deep neural network approaches, such as DNSMOS and UTMOS, have been developed to predict MOS to avoid this issue, they often suffer from insufficient training data. Recognizing that the comparison of speech enhancement (SE) systems prioritizes a reliabl…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Submitted to ASRU2025

  12. arXiv:2506.23557  [pdf, ps, other]

    eess.SP

    Data-Driven Modulation Optimization with LMMSE Equalization for Reliability Enhancement in Underwater Acoustic Communications

    Authors: Xuehan Wang, Hengyu Zhang, Jintao Wang, Zhi Sun, Bo Ai

    Abstract: Ultra-reliable underwater acoustic (UWA) communications serve as one of the key enabling technologies for future space-air-ground-underwater integrated networks. However, the reliability of current UWA transmission is still insufficient since severe performance degradation occurs for conventional multicarrier systems in UWA channels with severe delay-scale spread. To solve this problem, we exploit…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: 6 pages, 3 figures. This paper has been accepted for presentation in IEEE/CIC ICCC 2025

  13. arXiv:2506.23493  [pdf, ps, other]

    cs.NI eess.SP

    Securing the Sky: Integrated Satellite-UAV Physical Layer Security for Low-Altitude Wireless Networks

    Authors: Jiahui Li, Geng Sun, Xiaoyu Sun, Fang Mei, Jingjing Wang, Xiangwang Hou, Daxin Tian, Victor C. M. Leung

    Abstract: Low-altitude wireless networks (LAWNs) have garnered significant attention in the forthcoming 6G networks. In LAWNs, satellites with wide coverage and unmanned aerial vehicles (UAVs) with flexible mobility can complement each other to form integrated satellite-UAV networks, providing ubiquitous and high-speed connectivity for low-altitude operations. However, the higher line-of-sight probability i…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: This paper has been submitted to IEEE Wireless Communications

  14. arXiv:2506.23203  [pdf, ps, other]

    eess.SP cs.AI

    Multi-Branch DNN and CRLB-Ratio-Weight Fusion for Enhanced DOA Sensing via a Massive H$^2$AD MIMO Receiver

    Authors: Feng Shu, Jiatong Bai, Di Wu, Wei Zhu, Bin Deng, Fuhui Zhou, Jiangzhou Wang

    Abstract: As a green MIMO structure, massive H$^2$AD is viewed as a potential technology for the future 6G wireless network. For such a structure, it is a challenging task to design a low-complexity and high-performance fusion of target direction values sensed by different sub-array groups with fewer use of prior knowledge. To address this issue, a lightweight Cramer-Rao lower bound (CRLB)-ratio-weight fusi…

    Submitted 29 June, 2025; originally announced June 2025.

  15. arXiv:2506.22646  [pdf, ps, other]

    eess.AS cs.SD

    Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR

    Authors: Weiqing Wang, Taejin Park, Ivan Medennikov, Jinhan Wang, Kunal Dhawan, He Huang, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg

    Abstract: We propose a self-speaker adaptation method for streaming multi-talker automatic speech recognition (ASR) that eliminates the need for explicit speaker queries. Unlike conventional approaches requiring target speaker embeddings or enrollment audio, our technique dynamically adapts individual ASR instances through speaker-wise speech activity prediction. The key innovation involves injecting speake…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Accepted by INTERSPEECH 2025

  16. arXiv:2506.22277  [pdf, ps, other]

    eess.SP

    A Self-scaled Approximate $\ell_0$ Regularization Robust Model for Outlier Detection

    Authors: Pengyang Song, Jue Wang

    Abstract: Robust regression models in the presence of outliers have significant practical relevance in areas such as signal processing, financial econometrics, and energy management. Many existing robust regression methods, either grounded in statistical theory or sparse signal recovery, typically rely on the explicit or implicit assumption of outlier sparsity to filter anomalies and recover the underlying…

    Submitted 27 June, 2025; originally announced June 2025.

  17. arXiv:2506.21796  [pdf, ps, other]

    eess.SP cs.AI

    Demonstrating Interoperable Channel State Feedback Compression with Machine Learning

    Authors: Dani Korpi, Rachel Wang, Jerry Wang, Abdelrahman Ibrahim, Carl Nuzman, Runxin Wang, Kursat Rasim Mestav, Dustin Zhang, Iraj Saniee, Shawn Winston, Gordana Pavlovic, Wei Ding, William J. Hillery, Chenxi Hao, Ram Thirunagari, Jung Chang, Jeehyun Kim, Bartek Kozicki, Dragan Samardzija, Taesang Yoo, Andreas Maeder, Tingfang Ji, Harish Viswanathan

    Abstract: Neural network-based compression and decompression of channel state feedback has been one of the most widely studied applications of machine learning (ML) in wireless networks. Various simulation-based studies have shown that ML-based feedback compression can result in reduced overhead and more accurate channel information. However, to the best of our knowledge, there are no real-life proofs of co…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  18. arXiv:2506.21619  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech

    Authors: Siyi Zhou, Yiquan Zhou, Yi He, Xun Zhou, Jinchao Wang, Wei Deng, Jingchen Shu

    Abstract: Large-scale text-to-speech (TTS) models are typically categorized into autoregressive and non-autoregressive systems. Although autoregressive systems exhibit certain advantages in speech naturalness, their token-by-token generation mechanism makes it difficult to precisely control the duration of synthesized speech. This is a key limitation in applications such as video dubbing that require strict…

    Submitted 23 June, 2025; originally announced June 2025.

  19. arXiv:2506.21448  [pdf, ps, other]

    eess.AS cs.CV cs.SD

    ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing

    Authors: Huadai Liu, Jialei Wang, Kaicheng Luo, Wen Wang, Qian Chen, Zhou Zhao, Wei Xue

    Abstract: While end-to-end video-to-audio generation has greatly improved, producing high-fidelity audio that authentically captures the nuances of visual content remains challenging. Like professionals in the creative industries, such generation requires sophisticated reasoning about items such as visual dynamics, acoustic environments, and temporal relationships. We present ThinkSound, a novel framework t…

    Submitted 28 June, 2025; v1 submitted 26 June, 2025; originally announced June 2025.

  20. arXiv:2506.20244  [pdf, ps, other]

    eess.SY

    Cooperative Sensing and Communication Beamforming Design for Low-Altitude Economy

    Authors: Fangzhi Li, Zhichu Ren, Cunhua Pan, Hong Ren, Jing Jin, Qixing Wang, Jiangzhou Wang

    Abstract: To empower the low-altitude economy with high-accuracy sensing and high-rate communication, this paper proposes a cooperative integrated sensing and communication (ISAC) framework for aerial-ground networks. In the proposed system, the ground base stations (BSs) cooperatively serve the unmanned aerial vehicles (UAVs), which are equipped for either joint communication and sensing or sensing-only op…

    Submitted 25 June, 2025; originally announced June 2025.

  21. arXiv:2506.19975  [pdf, ps, other]

    eess.IV cs.AI cs.CV eess.SP

    VoxelOpt: Voxel-Adaptive Message Passing for Discrete Optimization in Deformable Abdominal CT Registration

    Authors: Hang Zhang, Yuxi Zhang, Jiazheng Wang, Xiang Chen, Renjiu Hu, Xin Tian, Gaolei Li, Min Liu

    Abstract: Recent developments in neural networks have improved deformable image registration (DIR) by amortizing iterative optimization, enabling fast and accurate DIR results. However, learning-based methods often face challenges with limited training data, large deformations, and tend to underperform compared to iterative approaches when label supervision is unavailable. While iterative methods can achiev…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: Accepted for publication at MICCAI 2025

  22. arXiv:2506.19774  [pdf, ps, other]

    eess.AS cs.AI cs.CL cs.SD

    Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation

    Authors: Jun Wang, Xijuan Zeng, Chunyu Qiang, Ruilong Chen, Shiyao Wang, Le Wang, Wangjing Zhou, Pengfei Cai, Jiahui Zhao, Nan Li, Zihan Li, Yuzhe Liang, Xiaopeng Wang, Haorui Zheng, Ming Wen, Kang Yin, Yiran Wang, Nan Li, Feng Deng, Liang Dong, Chen Zhang, Di Zhang, Kun Gai

    Abstract: We propose Kling-Foley, a large-scale multimodal Video-to-Audio generation model that synthesizes high-quality audio synchronized with video content. In Kling-Foley, we introduce multimodal diffusion transformers to model the interactions between video, audio, and text modalities, and combine it with a visual semantic representation module and an audio-visual synchronization module to enhance alig…

    Submitted 24 June, 2025; originally announced June 2025.

  23. arXiv:2506.19376  [pdf, ps, other]

    eess.SP

    Holographic Communication via Recordable and Reconfigurable Metasurface

    Authors: Jinzhe Wang, Qinghua Guo, Xiaojun Yuan

    Abstract: Holographic surface based communication technologies are anticipated to play a significant role in the next generation of wireless networks. The existing reconfigurable holographic surface (RHS)-based scheme only utilizes the reconstruction process of the holographic principle for beamforming, where the channel state information (CSI) is needed. However, channel estimation for CSI acquirement is a…

    Submitted 24 June, 2025; originally announced June 2025.

  24. arXiv:2506.18680  [pdf, ps, other]

    cs.GR cs.CV cs.SD eess.AS

    DuetGen: Music Driven Two-Person Dance Generation via Hierarchical Masked Modeling

    Authors: Anindita Ghosh, Bing Zhou, Rishabh Dabral, Jian Wang, Vladislav Golyanik, Christian Theobalt, Philipp Slusallek, Chuan Guo

    Abstract: We present DuetGen, a novel framework for generating interactive two-person dances from music. The key challenge of this task lies in the inherent complexities of two-person dance interactions, where the partners need to synchronize both with each other and with the music. Inspired by the recent advances in motion synthesis, we propose a two-stage solution: encoding two-person motions into discret…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: 11 pages, 7 figures, 2 tables, accepted in ACM SIGGRAPH 2025 conference track

  25. arXiv:2506.18067  [pdf, ps, other]

    eess.SP cs.IT

    Cooperative Bistatic ISAC Systems for Low-Altitude Economy

    Authors: Zhenkun Zhang, Yining Xu, Cunhua Pan, Hong Ren, Yiming Yu, Jiangzhou Wang

    Abstract: The burgeoning low-altitude economy (LAE) necessitates integrated sensing and communication (ISAC) systems capable of high-accuracy multi-target localization and velocity estimation under hardware and coverage constraints inherent in conventional ISAC architectures. This paper addresses these challenges by proposing a cooperative bistatic ISAC framework within MIMO-OFDM cellular networks, enabling…

    Submitted 22 June, 2025; originally announced June 2025.

  26. arXiv:2506.17184  [pdf, ps, other]

    cs.RO eess.SY

    Judo: A User-Friendly Open-Source Package for Sampling-Based Model Predictive Control

    Authors: Albert H. Li, Brandon Hung, Aaron D. Ames, Jiuguang Wang, Simon Le Cleac'h, Preston Culbertson

    Abstract: Recent advancements in parallel simulation and successful robotic applications are spurring a resurgence in sampling-based model predictive control. To build on this progress, however, the robotics community needs common tooling for prototyping, evaluating, and deploying sampling-based controllers. We introduce Judo, a software package designed to address this need. To facilitate rapid prototyping…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: Accepted at the 2025 RSS Workshop on Fast Motion Planning and Control in the Era of Parallelism. 5 Pages

  27. arXiv:2506.16173  [pdf, ps, other]

    cs.RO cs.SD eess.AS

    Single-Microphone-Based Sound Source Localization for Mobile Robots in Reverberant Environments

    Authors: Jiang Wang, Runwu Shi, Benjamin Yen, He Kong, Kazuhiro Nakadai

    Abstract: Accurately estimating sound source positions is crucial for robot audition. However, existing sound source localization methods typically rely on a microphone array with at least two spatially preconfigured microphones. This requirement hinders the applicability of microphone-based robot audition systems and technologies. To alleviate these challenges, we propose an online sound source localizatio…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: This paper has been accepted and will appear in the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  28. arXiv:2506.15835  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    MoNetV2: Enhanced Motion Network for Freehand 3D Ultrasound Reconstruction

    Authors: Mingyuan Luo, Xin Yang, Zhongnuo Yan, Yan Cao, Yuanji Zhang, Xindi Hu, Jin Wang, Haoxuan Ding, Wei Han, Litao Sun, Dong Ni

    Abstract: Three-dimensional (3D) ultrasound (US) aims to provide sonographers with the spatial relationships of anatomical structures, playing a crucial role in clinical diagnosis. Recently, deep-learning-based freehand 3D US has made significant advancements. It reconstructs volumes by estimating transformations between images without external tracking. However, image-only reconstruction poses difficulties…

    Submitted 16 June, 2025; originally announced June 2025.

  29. arXiv:2506.12308  [pdf, ps, other]

    eess.SP eess.SY

    From Ground to Sky: Architectures, Applications, and Challenges Shaping Low-Altitude Wireless Networks

    Authors: Weijie Yuan, Yuanhao Cui, Jiacheng Wang, Fan Liu, Geng Sun, Tao Xiang, Jie Xu, Shi Jin, Dusit Niyato, Sinem Coleri, Sumei Sun, Shiwen Mao, Abbas Jamalipour, Dong In Kim, Mohamed-Slim Alouini, Xuemin Shen

    Abstract: In this article, we introduce a novel low-altitude wireless network (LAWN), which is a reconfigurable, three-dimensional (3D) layered architecture. In particular, the LAWN integrates connectivity, sensing, control, and computing across aerial and terrestrial nodes that enable seamless operation in complex, dynamic, and mission-critical environments. Different from the conventional aerial communica…

    Submitted 16 June, 2025; v1 submitted 13 June, 2025; originally announced June 2025.

    Comments: 10 pages, 5 figures

  30. arXiv:2506.11443  [pdf, ps, other]

    eess.IV

    Hadamard Encoded Row Column Ultrasonic Expansive Scanning (HERCULES) with Bias-Switchable Row-Column Arrays

    Authors: Darren Olufemi Dahunsi, Randy Palmar, Tyler Henry, Mohammad Rahim Sobhani, Negar Majidi, Joy Wang, Afshin Kashani Ilkhechi, Jeremy Brown, Roger Zemp

    Abstract: Top-Orthogonal-to-Bottom-Electrode (TOBE) arrays, also known as bias-switchable row-column arrays (RCAs), allow for imaging techniques otherwise impossible for non-bias-switchable RCAs. Hadamard Encoded Row Column Ultrasonic Expansive Scanning (HERCULES) is a novel imaging technique that allows for expansive 3D scanning by transmitting plane or cylindrical wavefronts and receiving using Hadamard-…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: 10 pages, 10 figures, 6 supplementary videos

  31. arXiv:2506.10958  [pdf, ps, other]

    eess.IV

    Bias-Switchable Row-Column Array Imaging using Fast Orthogonal Row-Column Electronic Scanning (FORCES) Compared with Conventional Row-Column Array Imaging

    Authors: Randy Palamar, Mohammad Rahim Sobhani, Darren Dahunsi, Negar Majidi, Afshin Kashani Ilkhechi, Joy Wang, Jeremy Brown, Roger Zemp

    Abstract: Row-Column Arrays (RCAs) offer an attractive alternative to fully wired 2D-arrays for 3D-ultrasound, due to their greatly simplified wiring. However, conventional RCAs face challenges related to their long elements. These include an inability to image beyond the shadow of the aperture and an inability to focus in both transmit and receive for desired scan planes. To address these limitations, we r…

    Submitted 12 June, 2025; originally announced June 2025.

  32. arXiv:2506.10459  [pdf, ps, other]

    cs.CV eess.IV

    Boosting Adversarial Transferability for Hyperspectral Image Classification Using 3D Structure-invariant Transformation and Intermediate Feature Distance

    Authors: Chun Liu, Bingqian Zhu, Tao Xu, Zheng Zheng, Zheng Li, Wei Yang, Zhigang Han, Jiayao Wang

    Abstract: Deep Neural Networks (DNNs) are vulnerable to adversarial attacks, which pose security challenges to hyperspectral image (HSI) classification technologies based on DNNs. In the domain of natural images, numerous transfer-based adversarial attack methods have been studied. However, HSIs differ from natural images due to their high-dimensional and rich spectral information. Current research on HSI a…

    Submitted 12 June, 2025; originally announced June 2025.

  33. arXiv:2506.08967  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

    Authors: Ailin Huang, Bingxin Li, Bruce Wang, Boyong Wu, Chao Yan, Chengli Feng, Heng Wang, Hongyu Zhou, Hongyuan Wang, Jingbei Li, Jianjian Sun, Joanna Wang, Mingrui Chen, Peng Liu, Ruihang Miao, Shilei Jiang, Tian Fei, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Ge, Zheng Gong, Zhewei Huang, et al. (51 additional authors not shown)

    Abstract: Large Audio-Language Models (LALMs) have significantly advanced intelligent human-computer interaction, yet their reliance on text-based outputs limits their ability to generate natural speech responses directly, hindering seamless audio interactions. To address this, we introduce Step-Audio-AQAA, a fully end-to-end LALM designed for Audio Query-Audio Answer (AQAA) tasks. The model integrates a du…

    Submitted 13 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: 12 pages, 3 figures

  34. arXiv:2506.07876  [pdf, ps, other]

    cs.RO eess.SY

    Versatile Loco-Manipulation through Flexible Interlimb Coordination

    Authors: Xinghao Zhu, Yuxin Chen, Lingfeng Sun, Farzad Niroui, Simon Le Cleac'h, Jiuguang Wang, Kuan Fang

    Abstract: The ability to flexibly leverage limbs for loco-manipulation is essential for enabling autonomous robots to operate in unstructured environments. Yet, prior work on loco-manipulation is often constrained to specific tasks or predetermined limb configurations. In this work, we present Reinforcement Learning for Interlimb Coordination (ReLIC), an approach that enables versatile loco-manipulation thr…

    Submitted 10 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

  35. arXiv:2506.06360  [pdf]

    eess.SP cs.LG

    Towards Generalizable Drowsiness Monitoring with Physiological Sensors: A Preliminary Study

    Authors: Jiyao Wang, Suzan Ayas, Jiahao Zhang, Xiao Wen, Dengbo He, Birsen Donmez

    Abstract: Accurately detecting drowsiness is vital to driving safety. Among all measures, physiological-signal-based drowsiness monitoring can be more privacy-preserving than a camera-based approach. However, conflicts exist regarding how physiological metrics are associated with different drowsiness labels across datasets. Thus, we analyzed key features from electrocardiograms (ECG), electrodermal activity…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Accepted by HFES2025

  36. arXiv:2506.06156  [pdf, ps, other]

    cs.IT eess.SP

    Resource Allocation for Pinching-Antenna Systems: State-of-the-Art, Key Techniques and Open Issues

    Authors: Ming Zeng, Ji Wang, Octavia A. Dobre, Zhiguo Ding, George K. Karagiannidis, Robert Schober, H. Vincent Poor

    Abstract: Pinching antennas have emerged as a promising technology for reconfiguring wireless propagation environments, particularly in high-frequency communication systems operating in the millimeter-wave and terahertz bands. By enabling dynamic activation at arbitrary positions along a dielectric waveguide, pinching antennas offer unprecedented channel reconfigurability and the ability to provide line-of-…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: submitted to IEEE WCM, 8 pages, 5 figures

  37. arXiv:2506.04134  [pdf, other]

    cs.CV cs.SD eess.AS

    UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation

    Authors: Jinting Wang, Shan Yang, Li Liu

    Abstract: Cued Speech (CS) enhances lipreading through hand coding, providing precise speech perception support for the hearing-impaired. CS Video-to-Speech generation (CSV2S) task aims to convert the CS visual expressions (CS videos) of hearing-impaired individuals into comprehensible speech signals. Direct generation of speech from CS video (called single CSV2S) yields poor performance due to insufficient…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 10 pages, 10 figures

  38. arXiv:2506.03976  [pdf, ps, other]

    cs.IT eess.SP math.ST

    Large Deviations for Sequential Tests of Statistical Sequence Matching

    Authors: Lin Zhou, Qianyun Wang, Yun Wei, Jingjing Wang

    Abstract: We revisit the problem of statistical sequence matching initiated by Unnikrishnan (TIT 2015) and derive theoretical performance guarantees for sequential tests that have bounded expected stopping times. Specifically, in this problem, one is given two databases of sequences and the task is to identify all matched pairs of sequences. In each database, each sequence is generated i.i.d. from a distinc…

    Submitted 4 June, 2025; originally announced June 2025.

  39. arXiv:2506.02610  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm

    Authors: Zhaoyang Li, Jie Wang, XiaoXiao Li, Wangjie Li, Longjie Luo, Lin Li, Qingyang Hong

    Abstract: In speaker diarization, traditional clustering-based methods remain widely used in real-world applications. However, these methods struggle with the complex distribution of speaker embeddings and overlapping speech segments. To address these limitations, we propose an Overlapping Community Detection method based on Graph Attention networks and the Label Propagation Algorithm (OCDGALP). The propose…

    Submitted 3 June, 2025; originally announced June 2025.

  40. arXiv:2506.02574  [pdf, other]

    eess.IV cs.CV cs.MM

    Dynamic mapping from static labels: remote sensing dynamic sample generation with temporal-spectral embedding

    Authors: Shuai Yuan, Shuang Chen, Tianwu Lin, Jie Wang, Peng Gong

    Abstract: Accurate remote sensing geographic mapping depends heavily on representative and timely sample data. However, rapid changes in land surface dynamics necessitate frequent updates, quickly rendering previously collected samples obsolete and imposing significant labor demands for continuous manual updates. In this study, we aim to address this problem by dynamic sample generation using existing singl…

    Submitted 3 June, 2025; originally announced June 2025.

  41. arXiv:2506.02414  [pdf, ps, other]

    cs.MM cs.CL cs.SD eess.AS

    StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion

    Authors: Fengjin Li, Jie Wang, Yadong Niu, Yongqing Wang, Meng Meng, Jian Luan, Zhiyong Wu

    Abstract: Voice Conversion (VC) modifies speech to match a target speaker while preserving linguistic content. Traditional methods usually extract speaker information directly from speech while neglecting the explicit utilization of linguistic content. Since VC fundamentally involves disentangling speaker identity from linguistic content, leveraging structured semantic features could enhance conversion perf…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 5 pages, 2 figures, Accepted by Interspeech 2025, Demo: https://thuhcsi.github.io/StarVC/

  42. arXiv:2506.02197  [pdf, ps, other]

    eess.IV cs.CV

    NTIRE 2025 Challenge on RAW Image Restoration and Super-Resolution

    Authors: Marcos V. Conde, Radu Timofte, Zihao Lu, Xiangyu Kong, Xiaoxia Xing, Fan Wang, Suejin Han, MinKyu Park, Tianyu Zhang, Xin Luo, Yeda Chen, Dong Liu, Li Pang, Yuhang Yang, Hongzhong Wang, Xiangyong Cao, Ruixuan Jiang, Senyan Xu, Siyuan Jiang, Xueyang Fu, Zheng-Jun Zha, Tianyu Hao, Yuhong He, Ruoqi Li, Yueqi Yang, et al. (14 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2025 RAW Image Restoration and Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Restoration and Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines; however, this problem is not as explored as in the RGB domain. The goal of this challenge is twofold: (i) restore RAW images with blur and…

    Submitted 4 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

    Comments: CVPR 2025 - New Trends in Image Restoration and Enhancement (NTIRE)

  43. arXiv:2506.01038  [pdf, ps, other]

    eess.SP

    Self-Supervised-ISAR-Net Enables Fast Sparse ISAR Imaging

    Authors: Ziwen Wang, Jianping Wang, Pucheng Li, Yifan Wu, Zegang Ding

    Abstract: Numerous sparse inverse synthetic aperture radar (ISAR) imaging methods based on unfolded neural networks have been developed for high-quality image reconstruction with sparse measurements. However, their training typically requires paired ISAR images and echoes, which are often difficult to obtain. Meanwhile, it can be observed that, for a certain sparse measurement configuration of ISAR…

    Submitted 1 June, 2025; originally announced June 2025.

  44. arXiv:2505.24493  [pdf, ps, other]

    cs.AI cs.SD eess.AS

    MELT: Towards Automated Multimodal Emotion Data Annotation by Leveraging LLM Embedded Knowledge

    Authors: Xin Jing, Jiadong Wang, Iosif Tsangko, Andreas Triantafyllopoulos, Björn W. Schuller

    Abstract: Although speech emotion recognition (SER) has advanced significantly with deep learning, annotation remains a major hurdle. Human annotation is not only costly but also subject to inconsistencies: annotators often have different preferences and may lack the necessary contextual knowledge, which can lead to varied and inaccurate labels. Meanwhile, Large Language Models (LLMs) have emerged as a scala…

    Submitted 30 May, 2025; originally announced May 2025.

  45. arXiv:2505.24437  [pdf, ps, other]

    cs.SD eess.AS

    SwitchCodec: A High-Fidelity Neural Audio Codec With Sparse Quantization

    Authors: Jin Wang, Wenbin Jiang, Xiangbo Wang

    Abstract: Neural audio compression has emerged as a promising technology for efficiently representing speech, music, and general audio. However, existing methods suffer from significant performance degradation at limited bitrates, where the available embedding space is sharply constrained. To address this, we propose a universal high-fidelity neural audio compression algorithm featuring Residual Experts Vec…

    Submitted 4 July, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

    Comments: 11 pages, 7 figures

  46. arXiv:2505.24356  [pdf, ps, other]

    eess.SP

    Joint Transmit and Receive Beamforming for Tri-directional Coil-Based Magnetic Induction Communications

    Authors: Jinyang Li, Jianyu Wang, Wenchi Cheng, Yudong Fang, Wei Guo

    Abstract: In this paper, we enhance the omnidirectional coverage performance of tri-directional coil-based magnetic induction communication (TC-MIC) and reduce the pathloss with a joint transmit and receive magnetic beamforming method. An iterative optimization algorithm incorporating the transmit current vector and receive weight matrix is developed to minimize the pathloss under constant transmit power co…

    Submitted 30 May, 2025; originally announced May 2025.

  47. arXiv:2505.23980  [pdf, other]

    cs.CV cs.LG eess.IV

    DeepTopoNet: A Framework for Subglacial Topography Estimation on the Greenland Ice Sheets

    Authors: Bayu Adhi Tama, Mansa Krishna, Homayra Alam, Mostafa Cham, Omar Faruque, Gong Cheng, Jianwu Wang, Mathieu Morlighem, Vandana Janeja

    Abstract: Understanding Greenland's subglacial topography is critical for projecting the future mass loss of the ice sheet and its contribution to global sea-level rise. However, the complex and sparse nature of observational data, particularly information about the bed topography under the ice sheet, significantly increases the uncertainty in model projections. Bed topography is traditionally measured by a…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Submitted to SIGSPATIAL 2025

  48. arXiv:2505.23249  [pdf, ps, other]

    cs.NI eess.SP

    Context-Aware Semantic Communication for the Wireless Networks

    Authors: Guangyuan Liu, Yinqiu Liu, Jiacheng Wang, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Abbas Jamalipour

    Abstract: In next-generation wireless networks, supporting real-time applications such as augmented reality, autonomous driving, and immersive Metaverse services demands stringent constraints on bandwidth, latency, and reliability. Existing semantic communication (SemCom) approaches typically rely on static models, overlooking dynamic conditions and contextual cues vital for efficient transmission. To addre…

    Submitted 29 May, 2025; originally announced May 2025.

  49. arXiv:2505.22053  [pdf, other]

    cs.SD cs.MA cs.MM eess.AS

    AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation

    Authors: Yan Rong, Jinting Wang, Shan Yang, Guangzhi Lei, Li Liu

    Abstract: Multimodality-to-Multiaudio (MM2MA) generation faces significant challenges in synthesizing diverse and contextually aligned audio types (e.g., sound effects, speech, music, and songs) from multimodal inputs (e.g., video, text, images), owing to the scarcity of high-quality paired datasets and the lack of robust multi-task learning frameworks. Recently, multi-agent systems show great potential in…

    Submitted 28 May, 2025; originally announced May 2025.

  50. arXiv:2505.20149  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    Improvement Strategies for Few-Shot Learning in OCT Image Classification of Rare Retinal Diseases

    Authors: Cheng-Yu Tai, Ching-Wen Chen, Chi-Chin Wu, Bo-Chen Chiu, Cheng-Hung Lin, Cheng-Kai Lu, Jia-Kang Wang, Tzu-Lun Huang

    Abstract: This paper focuses on using few-shot learning to improve the accuracy of classifying OCT diagnosis images with major and rare classes. We used the GAN-based augmentation strategy as a baseline and introduced several novel methods to further enhance our model. The proposed strategy contains U-GAT-IT for improving the generative part and uses the data balance technique to narrow down the skew of acc…

    Submitted 26 May, 2025; originally announced May 2025.