Showing 1–50 of 263 results for author: Yang, K

Searching in archive eess.
  1. arXiv:2507.08293  [pdf, ps, other]

    eess.SP

    Ambiguity Function Analysis of AFDM Signals for Integrated Sensing and Communications

    Authors: Haoran Yin, Yanqun Tang, Yuanhan Ni, Zulin Wang, Gaojie Chen, Jun Xiong, Kai Yang, Marios Kountouris, Yong Liang Guan, Yong Zeng

    Abstract: Affine frequency division multiplexing (AFDM) is a promising chirp-based waveform with high flexibility and resilience, making it well-suited for next-generation wireless networks, particularly in high-mobility scenarios. In this paper, we investigate the ambiguity functions (AFs) of AFDM signals, which fundamentally characterize their range and velocity estimation capabilities in both monostatic… ▽ More

    Submitted 10 July, 2025; originally announced July 2025.

    Comments: 14 pages, 14 figures. Under revision in an IEEE Journal

  2. arXiv:2507.06971  [pdf, ps, other]

    cs.CV cs.RO eess.IV

    Hallucinating 360°: Panoramic Street-View Generation via Local Scenes Diffusion and Probabilistic Prompting

    Authors: Fei Teng, Kai Luo, Sheng Wu, Siyu Li, Pujun Guo, Jiale Wei, Kunyu Peng, Jiaming Zhang, Kailun Yang

    Abstract: Panoramic perception holds significant potential for autonomous driving, enabling vehicles to acquire a comprehensive 360° surround view in a single shot. However, autonomous driving is a data-driven task. Complete panoramic data acquisition requires complex sampling systems and annotation pipelines, which are time-consuming and labor-intensive. Although existing street view generation models have… ▽ More

    Submitted 9 July, 2025; v1 submitted 9 July, 2025; originally announced July 2025.

    Comments: The source code will be publicly available at https://github.com/Bryant-Teng/Percep360

  3. arXiv:2507.04002  [pdf, ps, other]

    cs.CV cs.RO eess.IV

    NRSeg: Noise-Resilient Learning for BEV Semantic Segmentation via Driving World Models

    Authors: Siyu Li, Fei Teng, Yihong Cao, Kailun Yang, Zhiyong Li, Yaonan Wang

    Abstract: Bird's-Eye View (BEV) semantic segmentation is an indispensable perception task in end-to-end autonomous driving systems. Unsupervised and semi-supervised learning for BEV tasks, while pivotal for real-world applications, underperform due to the homogeneous distribution of the labeled data. In this work, we explore the potential of synthetic data from driving world models to enhance the diversity of… ▽ More

    Submitted 5 July, 2025; originally announced July 2025.

    Comments: The source code will be made publicly available at https://github.com/lynn-yu/NRSeg

  4. arXiv:2506.22902  [pdf, ps, other]

    cs.CV eess.IV

    Point Cloud Compression and Objective Quality Assessment: A Survey

    Authors: Yiling Xu, Yujie Zhang, Shuting Xia, Kaifa Yang, He Huang, Ziyu Shan, Wenjie Huang, Qi Yang, Le Yang

    Abstract: The rapid growth of 3D point cloud data, driven by applications in autonomous driving, robotics, and immersive environments, has led to a critical demand for efficient compression and quality assessment techniques. Unlike traditional 2D media, point clouds present unique challenges due to their irregular structure, high data volume, and complex attributes. This paper provides a comprehensive survey… ▽ More

    Submitted 28 June, 2025; originally announced June 2025.

  5. arXiv:2506.22507  [pdf, ps, other]

    cs.NI cs.MA eess.SP

    Integrated Multimodal Sensing and Communication: Challenges, Technologies, and Architectures

    Authors: Yubo Peng, Luping Xiang, Kun Yang, Feibo Jiang, Kezhi Wang, Christos Masouros

    Abstract: The evolution towards 6G networks requires the intelligent integration of communication and sensing capabilities to support diverse and complex applications, such as autonomous driving and immersive services. However, existing integrated sensing and communication (ISAC) systems predominantly rely on single-modal sensors as primary participants, which leads to a limited representation of environmen… ▽ More

    Submitted 25 June, 2025; originally announced June 2025.

  6. arXiv:2506.21198  [pdf, ps, other]

    cs.CV cs.RO eess.IV

    Unlocking Constraints: Source-Free Occlusion-Aware Seamless Segmentation

    Authors: Yihong Cao, Jiaming Zhang, Xu Zheng, Hao Shi, Kunyu Peng, Hang Liu, Kailun Yang, Hui Zhang

    Abstract: Panoramic image processing is essential for omni-context perception, yet faces constraints like distortions, perspective occlusions, and limited annotations. Previous unsupervised domain adaptation methods transfer knowledge from labeled pinhole data to unlabeled panoramic images, but they require access to source pinhole data. To address these, we introduce a more practical task, i.e., Source-Fre… ▽ More

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted to ICCV 2025. All data and code will be made publicly available at https://github.com/yihong-97/UNLOCK

  7. arXiv:2506.21185  [pdf, ps, other]

    cs.CV cs.RO eess.IV

    Out-of-Distribution Semantic Occupancy Prediction

    Authors: Yuheng Zhang, Mengfei Duan, Kunyu Peng, Yuhang Wang, Ruiping Liu, Fei Teng, Kai Luo, Zhiyong Li, Kailun Yang

    Abstract: 3D Semantic Occupancy Prediction is crucial for autonomous driving, providing a dense, semantically rich environmental representation. However, existing methods focus on in-distribution scenes, making them susceptible to Out-of-Distribution (OoD) objects and long-tail distributions, which increases the risk of undetected anomalies and misinterpretations, posing safety hazards. To address these cha… ▽ More

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: The established datasets and source code will be made publicly available at https://github.com/7uHeng/OccOoD

  8. arXiv:2506.20293  [pdf, ps, other]

    cs.CV eess.IV

    Breaking Spatial Boundaries: Spectral-Domain Registration Guided Hyperspectral and Multispectral Blind Fusion

    Authors: Kunjing Yang, Libin Zheng, Minru Bai, Ting Lu, Leyuan Fang

    Abstract: The blind fusion of unregistered hyperspectral images (HSIs) and multispectral images (MSIs) has attracted growing attention recently. To address the registration challenge, most existing methods employ spatial transformations on the HSI to achieve alignment with the MSI. However, due to the substantial differences in spatial resolution of the images, the performance of these methods is often unsa… ▽ More

    Submitted 25 June, 2025; originally announced June 2025.

  9. arXiv:2506.09650  [pdf, ps, other]

    cs.CV cs.LG cs.MM cs.RO eess.IV

    HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios

    Authors: Kunyu Peng, Junchao Huang, Xiangsheng Huang, Di Wen, Junwei Zheng, Yufan Chen, Kailun Yang, Jiamin Wu, Chongqing Hao, Rainer Stiefelhagen

    Abstract: Action segmentation is a core challenge in high-level video understanding, aiming to partition untrimmed videos into segments and assign each a label from a predefined action set. Existing methods primarily address single-person activities with fixed action sequences, overlooking multi-person scenarios. In this work, we pioneer textual reference-guided human action segmentation in multi-person set… ▽ More

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: The code is available at https://github.com/KPeng9510/HopaDIFF.git

  10. arXiv:2505.18194  [pdf, ps, other]

    eess.SP cs.AI cs.CV

    Large Language Model-Driven Distributed Integrated Multimodal Sensing and Semantic Communications

    Authors: Yubo Peng, Luping Xiang, Bingxin Zhang, Kun Yang

    Abstract: Traditional single-modal sensing systems, based solely on either radio frequency (RF) or visual data, struggle to cope with the demands of complex and dynamic environments. Furthermore, single-device systems are constrained by limited perspectives and insufficient spatial coverage, which impairs their effectiveness in urban or non-line-of-sight scenarios. To overcome these challenges, we propose a n… ▽ More

    Submitted 30 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  11. arXiv:2505.17543  [pdf, ps, other]

    cs.SD cs.MM eess.AS

    MEGADance: Mixture-of-Experts Architecture for Genre-Aware 3D Dance Generation

    Authors: Kaixing Yang, Xulong Tang, Ziqiao Peng, Yuxuan Hu, Jun He, Hongyan Liu

    Abstract: Music-driven 3D dance generation has attracted increasing attention in recent years, with promising applications in choreography, virtual reality, and creative content creation. Previous research has generated promising realistic dance movement from audio signals. However, traditional methods underutilize genre conditioning, often treating it as auxiliary modifiers rather than core semantic driver… ▽ More

    Submitted 31 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

    Comments: arXiv admin note: text overlap with arXiv:2505.14222

  12. arXiv:2505.14222  [pdf, other]

    cs.SD cs.GR cs.MM eess.AS

    MatchDance: Collaborative Mamba-Transformer Architecture Matching for High-Quality 3D Dance Synthesis

    Authors: Kaixing Yang, Xulong Tang, Yuxuan Hu, Jiahao Yang, Hongyan Liu, Qinnan Zhang, Jun He, Zhaoxin Fan

    Abstract: Music-to-dance generation represents a challenging yet pivotal task at the intersection of choreography, virtual reality, and creative content generation. Despite its significance, existing methods face substantial limitations in achieving choreographic consistency. To address the challenge, we propose MatchDance, a novel framework for music-to-dance generation that constructs a latent representati… ▽ More

    Submitted 21 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  13. arXiv:2505.11618  [pdf, ps, other]

    cs.AI cs.LG eess.SP

    Benchmarking Spatiotemporal Reasoning in LLMs and Reasoning Models: Capabilities and Challenges

    Authors: Pengrui Quan, Brian Wang, Kang Yang, Liying Han, Mani Srivastava

    Abstract: Spatiotemporal reasoning plays a key role in Cyber-Physical Systems (CPS). Despite advances in Large Language Models (LLMs) and Large Reasoning Models (LRMs), their capacity to reason about complex spatiotemporal signals remains underexplored. This paper proposes a hierarchical SpatioTemporal reAsoning benchmaRK, STARK, to systematically evaluate LLMs across three levels of reasoning complexity: s… ▽ More

    Submitted 27 May, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

  14. arXiv:2505.07916  [pdf, ps, other]

    eess.AS cs.SD

    MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder

    Authors: Bowen Zhang, Congchao Guo, Geng Yang, Hang Yu, Haozhe Zhang, Heidi Lei, Jialong Mai, Junjie Yan, Kaiyue Yang, Mingqi Yang, Peikai Huang, Ruiyang Jin, Sitan Jiang, Weihua Cheng, Yawei Li, Yichen Xiao, Yiying Zhou, Yongmao Zhang, Yuan Lu, Yucen He

    Abstract: We introduce MiniMax-Speech, an autoregressive Transformer-based Text-to-Speech (TTS) model that generates high-quality speech. A key innovation is our learnable speaker encoder, which extracts timbre features from a reference audio without requiring its transcription. This enables MiniMax-Speech to produce highly expressive speech with timbre consistent with the reference in a zero-shot manner, w… ▽ More

    Submitted 12 May, 2025; originally announced May 2025.

  15. arXiv:2505.07219  [pdf, ps, other]

    cs.CV cs.RO eess.IV

    Language-Driven Dual Style Mixing for Single-Domain Generalized Object Detection

    Authors: Hongda Qin, Xiao Lu, Zhiyong Wei, Yihong Cao, Kailun Yang, Ningjiang Chen

    Abstract: Generalizing an object detector trained on a single domain to multiple unseen domains is a challenging task. Existing methods typically introduce image or feature augmentation to diversify the source domain to raise the robustness of the detector. Vision-Language Model (VLM)-based augmentation techniques have been proven to be effective, but they require that the detector's backbone has the same s… ▽ More

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: The source code and pre-trained models will be publicly available at https://github.com/qinhongda8/LDDS

  16. arXiv:2505.06122  [pdf, other]

    eess.SY

    Interaction-Aware Parameter Privacy-Preserving Data Sharing in Coupled Systems via Particle Filter Reinforcement Learning

    Authors: Haokun Yu, Jingyuan Zhou, Kaidi Yang

    Abstract: This paper addresses the problem of parameter privacy-preserving data sharing in coupled systems, where a data provider shares data with a data user but wants to protect its sensitive parameters. The shared data affects not only the data user's decision-making but also the data provider's operations through system interactions. To trade off control performance and privacy, we propose an interactio… ▽ More

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 21 pages, 8 figures, accepted at the 7th Annual Learning for Dynamics and Control (L4DC) Conference, 2025

  17. arXiv:2505.03539  [pdf, other]

    cs.CV cs.RO eess.IV

    Panoramic Out-of-Distribution Segmentation

    Authors: Mengfei Duan, Kailun Yang, Yuheng Zhang, Yihong Cao, Fei Teng, Kai Luo, Jiaming Zhang, Zhiyong Li, Shutao Li

    Abstract: Panoramic imaging enables capturing 360° images with an ultra-wide Field-of-View (FoV) for dense omnidirectional perception. However, current panoramic semantic segmentation methods fail to identify outliers, and pinhole Out-of-distribution Segmentation (OoS) models perform unsatisfactorily in the panoramic domain due to background clutter and pixel distortions. To address these issues, we introdu… ▽ More

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Code and datasets will be available at https://github.com/MengfeiD/PanOoS

  18. arXiv:2505.00991  [pdf, other]

    cs.RO eess.SY

    DexCtrl: Towards Sim-to-Real Dexterity with Adaptive Controller Learning

    Authors: Shuqi Zhao, Ke Yang, Yuxin Chen, Chenran Li, Yichen Xie, Xiang Zhang, Changhao Wang, Masayoshi Tomizuka

    Abstract: Dexterous manipulation has seen remarkable progress in recent years, with policies capable of executing many complex and contact-rich tasks in simulation. However, transferring these policies from simulation to real world remains a significant challenge. One important issue is the mismatch in low-level controller dynamics, where identical trajectories can lead to vastly different contact forces an… ▽ More

    Submitted 2 May, 2025; originally announced May 2025.

  19. arXiv:2504.14874  [pdf, other]

    eess.SY

    Event triggered optimal formation control for nonlinear multi-agent systems under Denial-of-Service attacks

    Authors: Jianqiang Zhang, Kaijun Yang

    Abstract: This paper investigates the optimal formation control problem of a class of nonlinear multi-agent systems (MASs) under Denial-of-Service (DoS) attacks. We design the optimal formation control law using an event-triggered control scheme to achieve formation objectives under DoS attacks. A critic neural network (NN)-based approach is employed to achieve the optimal control policy under DoS attacks. Even… ▽ More

    Submitted 21 April, 2025; originally announced April 2025.

  20. arXiv:2504.14653  [pdf, other]

    cs.IT eess.SP

    Wireless Large AI Model: Shaping the AI-Native Future of 6G and Beyond

    Authors: Fenghao Zhu, Xinquan Wang, Xinyi Li, Maojun Zhang, Yixuan Chen, Chongwen Huang, Zhaohui Yang, Xiaoming Chen, Zhaoyang Zhang, Richeng Jin, Yongming Huang, Wei Feng, Tingting Yang, Baoming Bai, Feifei Gao, Kun Yang, Yuanwei Liu, Sami Muhaidat, Chau Yuen, Kaibin Huang, Kai-Kit Wong, Dusit Niyato, Mérouane Debbah

    Abstract: The emergence of sixth-generation and beyond communication systems is expected to fundamentally transform digital experiences through introducing unparalleled levels of intelligence, efficiency, and connectivity. A promising technology poised to enable this revolutionary vision is the wireless large AI model (WLAM), characterized by its exceptional capabilities in data processing, inference, and d… ▽ More

    Submitted 28 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

  21. arXiv:2504.11966  [pdf, other]

    cs.CV cs.LG cs.RO eess.IV

    Exploring Video-Based Driver Activity Recognition under Noisy Labels

    Authors: Linjuan Fan, Di Wen, Kunyu Peng, Kailun Yang, Jiaming Zhang, Ruiping Liu, Yufan Chen, Junwei Zheng, Jiamin Wu, Xudong Han, Rainer Stiefelhagen

    Abstract: As an open research topic in the field of deep learning, learning with noisy labels has attracted much attention and grown rapidly over the past ten years. Learning with label noise is crucial for driver distraction behavior recognition, as real-world video data often contains mislabeled samples, impacting model reliability and performance. However, label noise learning is barely explored in the d… ▽ More

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: The source code is available at https://github.com/ilonafan/DAR-noisy-labels

  22. arXiv:2503.23883  [pdf, ps, other]

    eess.SP

    Algorithm Design and Prototype Validation for Reconfigurable Intelligent Sensing Surface: Forward-Only Transmission

    Authors: Cheng Luo, Luping Xiang, Jie Hu, Kun Yang

    Abstract: Sensing-assisted communication schemes have recently garnered significant research attention. In this work, we design a dual-function reconfigurable intelligent surface (RIS), integrating both active and passive elements, referred to as the reconfigurable intelligent sensing surface (RISS), to enhance communication. By leveraging sensing results from the active elements, we propose communication e… ▽ More

    Submitted 19 June, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

  23. arXiv:2503.12419  [pdf, other]

    cs.CV cs.RO eess.IV physics.optics

    EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera

    Authors: Luming Wang, Hao Shi, Xiaoting Yin, Kailun Yang, Kaiwei Wang, Jian Bai

    Abstract: Egocentric gesture recognition is a pivotal technology for enhancing natural human-computer interaction, yet traditional RGB-based solutions suffer from motion blur and illumination variations in dynamic scenarios. While event cameras show distinct advantages in handling high dynamic range with ultra-low power consumption, existing RGB-based architectures face inherent limitations in processing as… ▽ More

    Submitted 13 April, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

    Comments: The dataset and models are made available at https://github.com/3190105222/EgoEv_Gesture

  24. arXiv:2503.12233  [pdf, ps, other]

    cs.IT eess.SP

    Robust Full-Space Physical Layer Security for STAR-RIS-Aided Wireless Networks: Eavesdropper with Uncertain Location and Channel

    Authors: Han Xiao, Xiaoyan Hu, Ang Li, Wenjie Wang, Kun Yang

    Abstract: A robust full-space physical layer security (PLS) transmission scheme is proposed in this paper considering the full-space wiretapping challenge of wireless networks supported by simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS). Different from the existing schemes, the proposed PLS scheme takes account of the uncertainty on the eavesdropper's position within t… ▽ More

    Submitted 15 March, 2025; originally announced March 2025.

  25. arXiv:2503.08726  [pdf, other]

    cs.LG cs.AI eess.SP

    SIMAC: A Semantic-Driven Integrated Multimodal Sensing And Communication Framework

    Authors: Yubo Peng, Luping Xiang, Kun Yang, Feibo Jiang, Kezhi Wang, Dapeng Oliver Wu

    Abstract: Traditional single-modality sensing faces limitations in accuracy and capability, and its decoupled implementation with communication systems increases latency in bandwidth-constrained environments. Additionally, single-task-oriented sensing systems fail to address users' diverse demands. To overcome these challenges, we propose a semantic-driven integrated multimodal sensing and communication (SI… ▽ More

    Submitted 10 March, 2025; originally announced March 2025.

  26. arXiv:2503.08220  [pdf, other]

    eess.SP

    Bedrock Models in Communication and Sensing: Advancing Generalization, Transferability, and Performance

    Authors: Cheng Luo, Luping Xiang, Jie Hu, Kun Yang

    Abstract: Deep learning (DL) has emerged as a powerful tool for addressing the intricate challenges inherent in communication and sensing systems, significantly enhancing the intelligence of future sixth-generation (6G) networks. A substantial body of research has highlighted the promise of DL-based techniques in these domains. However, in addition to improving accuracy, new challenges must be addressed reg… ▽ More

    Submitted 11 March, 2025; originally announced March 2025.

  27. arXiv:2503.08202  [pdf, other]

    eess.SP

    Low-Complexity Beamforming Design for Null Space-based Simultaneous Wireless Information and Power Transfer Systems

    Authors: Cheng Luo, Jie Hu, Luping Xiang, Kun Yang

    Abstract: Simultaneous wireless information and power transfer (SWIPT) is a promising technology for the upcoming sixth-generation (6G) communication networks, enabling internet of things (IoT) devices and sensors to extend their operational lifetimes. In this paper, we propose a SWIPT scheme by projecting the interference signals from both intra-wireless information transfer (WIT) and inter-wireless energy… ▽ More

    Submitted 11 March, 2025; originally announced March 2025.

  28. arXiv:2503.08198  [pdf, other]

    eess.SP

    Reconfigurable Intelligent Sensing Surface enables Wireless Powered Communication Networks: Interference Suppression and Massive Wireless Energy Transfer

    Authors: Cheng Luo, Jie Hu, Luping Xiang, Kun Yang

    Abstract: Recently, a novel structure of reconfigurable intelligent surface (RIS) integrating both passive and active elements, termed reconfigurable intelligent sensing surface (RISS), efficiently addresses challenges in RIS channel estimation and mitigates issues related to multiplicative path loss by processing the signal at the RISS. In this paper, we propose a sensing-assisted wirelessly powered commu… ▽ More

    Submitted 11 March, 2025; originally announced March 2025.

  29. arXiv:2503.07252  [pdf, other]

    cs.CV eess.IV eess.SP

    Semantic Communications with Computer Vision Sensing for Edge Video Transmission

    Authors: Yubo Peng, Luping Xiang, Kun Yang, Kezhi Wang, Merouane Debbah

    Abstract: Despite the widespread adoption of vision sensors in edge applications, such as surveillance, the transmission of video data consumes substantial spectrum resources. Semantic communication (SC) offers a solution by extracting and compressing information at the semantic level, preserving the accuracy and relevance of transmitted data while significantly reducing the volume of transmitted informatio… ▽ More

    Submitted 10 March, 2025; originally announced March 2025.

  30. arXiv:2503.06821  [pdf, other]

    cs.CV cs.RO eess.IV

    HierDAMap: Towards Universal Domain Adaptive BEV Mapping via Hierarchical Perspective Priors

    Authors: Siyu Li, Yihong Cao, Hao Shi, Yongsheng Zang, Xuan He, Kailun Yang, Zhiyong Li

    Abstract: The exploration of Bird's-Eye View (BEV) mapping technology has driven significant innovation in visual perception technology for autonomous driving. BEV mapping models need to be applied to the unlabeled real world, making the study of unsupervised domain adaptation models an essential path. However, research on unsupervised domain adaptation for BEV mapping remains limited and cannot perfectly a… ▽ More

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: The source code will be made publicly available at https://github.com/lynn-yu/HierDAMap

  31. arXiv:2503.06125  [pdf, other]

    eess.IV cs.CV

    RGB-Phase Speckle: Cross-Scene Stereo 3D Reconstruction via Wrapped Pre-Normalization

    Authors: Kai Yang, Zijian Bai, Yang Xiao, Xinyu Li, Xiaohan Shi

    Abstract: 3D reconstruction garners increasing attention alongside the advancement of high-level image applications, where dense stereo matching (DSM) serves as a pivotal technique. Previous studies often rely on publicly available datasets for training, focusing on modifying network architectures or incorporating specialized modules to extract domain-invariant features and thus improve model robustness. In… ▽ More

    Submitted 17 April, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

    Comments: Submitted to ICCV 2025

  32. arXiv:2503.04565  [pdf, other]

    cs.CV cs.RO eess.IV

    Omnidirectional Multi-Object Tracking

    Authors: Kai Luo, Hao Shi, Sheng Wu, Fei Teng, Mengfei Duan, Chang Huang, Yuhang Wang, Kaiwei Wang, Kailun Yang

    Abstract: Panoramic imagery, with its 360° field of view, offers comprehensive information to support Multi-Object Tracking (MOT) in capturing spatial and temporal relationships of surrounding objects. However, most MOT algorithms are tailored for pinhole images with limited views, impairing their effectiveness in panoramic settings. Additionally, panoramic image distortions, such as resolution loss, geomet… ▽ More

    Submitted 23 March, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR 2025. The established dataset and source code are available at https://github.com/xifen523/OmniTrack

  33. arXiv:2503.02600  [pdf, other]

    cs.CV cs.RO eess.IV

    Resource-Efficient Affordance Grounding with Complementary Depth and Semantic Prompts

    Authors: Yizhou Huang, Fan Yang, Guoliang Zhu, Gen Li, Hao Shi, Yukun Zuo, Wenrui Chen, Zhiyong Li, Kailun Yang

    Abstract: Affordance refers to the functional properties that an agent perceives and utilizes from its environment, and is key perceptual information required for robots to perform actions. This information is rich and multimodal in nature. Existing multimodal affordance methods face limitations in extracting useful information, mainly due to simple structural designs, basic fusion methods, and large model… ▽ More

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: The source code will be made publicly available at https://github.com/DAWDSE/BiT-Align

  34. arXiv:2503.02581  [pdf, other]

    cs.CV cs.RO eess.IV

    Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance

    Authors: Jiayi Zhao, Fei Teng, Kai Luo, Guoqiang Zhao, Zhiyong Li, Xu Zheng, Kailun Yang

    Abstract: The perception capability of robotic systems relies on the richness of the dataset. Although Segment Anything Model 2 (SAM2), trained on large datasets, demonstrates strong perception potential in perception tasks, its inherent training paradigm prevents it from being suitable for RGB-T tasks. To address these challenges, we propose SHIFNet, a novel SAM2-driven Hybrid Interaction Paradigm that unl… ▽ More

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: The source code will be made publicly available at https://github.com/iAsakiT3T/SHIFNet

  35. arXiv:2503.02578  [pdf, other]

    cs.CV cs.RO eess.IV

    TS-CGNet: Temporal-Spatial Fusion Meets Centerline-Guided Diffusion for BEV Mapping

    Authors: Xinying Hong, Siyu Li, Kang Zeng, Hao Shi, Bomin Peng, Kailun Yang, Zhiyong Li

    Abstract: Bird's Eye View (BEV) perception technology is crucial for autonomous driving, as it generates top-down 2D maps for environment perception, navigation, and decision-making. Nevertheless, the majority of current BEV map generation studies focusing on visual map generation lack depth-aware reasoning capabilities. They exhibit limited efficacy in managing occlusions and handling complex environments,… ▽ More

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: The source code will be publicly available at https://github.com/krabs-H/TS-CGNet

  36. arXiv:2503.01092  [pdf, other]

    cs.CV cs.RO eess.IV

    One-Shot Affordance Grounding of Deformable Objects in Egocentric Organizing Scenes

    Authors: Wanjun Jia, Fan Yang, Mengfei Duan, Xianchi Chen, Yinxi Wang, Yiming Jiang, Wenrui Chen, Kailun Yang, Zhiyong Li

    Abstract: Deformable object manipulation in robotics presents significant challenges due to uncertainties in component properties, diverse configurations, visual interference, and ambiguous prompts. These factors complicate both perception and control tasks. To address these challenges, we propose a novel method for One-Shot Affordance Grounding of Deformable Objects (OS-AGDO) in egocentric organizing scene… ▽ More

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: Source code and benchmark dataset will be publicly available at https://github.com/Dikay1/OS-AGDO

  37. arXiv:2503.00747  [pdf, other]

    cs.CV cs.RO eess.IV

    Unifying Light Field Perception with Field of Parallax

    Authors: Fei Teng, Buyin Deng, Boyuan Zheng, Kai Luo, Kunyu Peng, Jiaming Zhang, Kailun Yang

    Abstract: Field of Parallax (FoP), a spatial field that distills the common features from different LF representations to provide flexible and consistent support for multi-task learning. FoP is built upon three core features--projection difference, adjacency divergence, and contextual consistency--which are essential for cross-task adaptability. To implement FoP, we design a two-step angular adapter: the f… ▽ More

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: The source code will be made publicly available at https://github.com/warriordby/LFX

  38. arXiv:2502.20018  [pdf, other]

    cs.RO cs.CV eess.IV

    Multi-Keypoint Affordance Representation for Functional Dexterous Grasping

    Authors: Fan Yang, Dongsheng Luo, Wenrui Chen, Jiacheng Lin, Junjie Cai, Kailun Yang, Zhiyong Li, Yaonan Wang

    Abstract: Functional dexterous grasping requires precise hand-object interaction, going beyond simple gripping. Existing affordance-based methods primarily predict coarse interaction regions and cannot directly constrain the grasping posture, leading to a disconnection between visual perception and manipulation. To address this issue, we propose a multi-keypoint affordance representation for functional dext… ▽ More

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: The source code and demo videos will be publicly available at https://github.com/PopeyePxx/MKA

  39. arXiv:2502.16466  [pdf, other]

    eess.SY

    Robust Nonlinear Data-Driven Predictive Control for Mixed Vehicle Platoons via Koopman Operator and Reachability Analysis

    Authors: Shuai Li, Jiawei Wang, Kaidi Yang, Qing Xu, Jianqiang Wang, Keqiang Li

    Abstract: Mixed vehicle platoons, comprising connected and automated vehicles (CAVs) and human-driven vehicles (HDVs), hold significant potential for enhancing traffic performance. Most existing research assumes linear system dynamics and often ignores the impact of critical factors such as noise, disturbances, and attacks, which are inherent to real-world scenarios. To address these limitations, we propose… ▽ More

    Submitted 23 February, 2025; originally announced February 2025.

    Comments: 16 pages, 6 figures

  40. arXiv:2502.16419  [pdf, other]

    cs.CV cs.RO eess.IV

    DeProPose: Deficiency-Proof 3D Human Pose Estimation via Adaptive Multi-View Fusion

    Authors: Jianbin Jiao, Xina Cheng, Kailun Yang, Xiangrong Zhang, Licheng Jiao

    Abstract: 3D human pose estimation has wide applications in fields such as intelligent surveillance, motion capture, and virtual reality. However, in real-world scenarios, issues such as occlusion, noise interference, and missing viewpoints can severely affect pose estimation. To address these challenges, we introduce the task of Deficiency-Aware 3D Pose Estimation. Traditional 3D pose estimation methods of… ▽ More

    Submitted 22 February, 2025; originally announced February 2025.

    Comments: The source code will be available at https://github.com/WUJINHUAN/DeProPose

  41. arXiv:2502.10812  [pdf, other]

    eess.IV cs.IT

    ResiComp: Loss-Resilient Image Compression via Dual-Functional Masked Visual Token Modeling

    Authors: Sixian Wang, Jincheng Dai, Xiaoqi Qin, Ke Yang, Kai Niu, Ping Zhang

    Abstract: Recent advancements in neural image codecs (NICs) are of significant compression performance, but limited attention has been paid to their error resilience. These resulting NICs tend to be sensitive to packet losses, which are prevalent in real-time communications. In this paper, we investigate how to elevate the resilience ability of NICs to combat packet losses. We propose ResiComp, a pion…

    Submitted 28 February, 2025; v1 submitted 15 February, 2025; originally announced February 2025.

    Comments: Accepted by IEEE TCSVT

  42. arXiv:2502.05330  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Multi-Class Segmentation of Aortic Branches and Zones in Computed Tomography Angiography: The AortaSeg24 Challenge

    Authors: Muhammad Imran, Jonathan R. Krebs, Vishal Balaji Sivaraman, Teng Zhang, Amarjeet Kumar, Walker R. Ueland, Michael J. Fassler, Jinlong Huang, Xiao Sun, Lisheng Wang, Pengcheng Shi, Maximilian Rokuss, Michael Baumgartner, Yannick Kirchhof, Klaus H. Maier-Hein, Fabian Isensee, Shuolin Liu, Bing Han, Bong Thanh Nguyen, Dong-jin Shin, Park Ji-Woo, Mathew Choi, Kwang-Hyun Uhm, Sung-Jea Ko, Chanwoong Lee, et al. (38 additional authors not shown)

    Abstract: Multi-class segmentation of the aorta in computed tomography angiography (CTA) scans is essential for diagnosing and planning complex endovascular treatments for patients with aortic dissections. However, existing methods reduce aortic segmentation to a binary problem, limiting their ability to measure diameters across different branches and zones. Furthermore, no open-source dataset is currently…

    Submitted 7 February, 2025; originally announced February 2025.

  43. arXiv:2502.04735  [pdf, other]

    eess.SP

    Affine Frequency Division Multiplexing: Extending OFDM for Scenario-Flexibility and Resilience

    Authors: Haoran Yin, Yanqun Tang, Ali Bemani, Marios Kountouris, Yu Zhou, Xingyao Zhang, Yuqing Liu, Gaojie Chen, Kai Yang, Fan Liu, Christos Masouros, Shuangyang Li, Giuseppe Caire, Pei Xiao

    Abstract: Next-generation wireless networks are conceived to provide reliable and high-data-rate communication services for diverse scenarios, such as vehicle-to-vehicle, unmanned aerial vehicles, and satellite networks. The severe Doppler spreads in the underlying time-varying channels induce destructive inter-carrier interference (ICI) in the extensively adopted orthogonal frequency division multiplexing…

    Submitted 7 February, 2025; originally announced February 2025.

    Comments: Magazine paper submitted to IEEE

  44. arXiv:2502.02334  [pdf, other]

    cs.CV cs.RO eess.IV

    Event-aided Semantic Scene Completion

    Authors: Shangwei Guo, Hao Shi, Song Wang, Xiaoting Yin, Kailun Yang, Kaiwei Wang

    Abstract: Autonomous driving systems rely on robust 3D scene understanding. Recent advances in Semantic Scene Completion (SSC) for autonomous driving underscore the limitations of RGB-based approaches, which struggle under motion blur, poor lighting, and adverse weather. Event cameras, offering high dynamic range and low latency, address these challenges by providing asynchronous data that complements RGB i…

    Submitted 4 February, 2025; originally announced February 2025.

    Comments: The established datasets and codebase will be made publicly available at https://github.com/Pandapan01/EvSSC

  45. arXiv:2501.16803  [pdf, other]

    cs.RO cs.CV cs.NI eess.IV

    RG-Attn: Radian Glue Attention for Multi-modality Multi-agent Cooperative Perception

    Authors: Lantao Li, Kang Yang, Wenqi Zhang, Xiaoxue Wang, Chen Sun

    Abstract: Cooperative perception offers an optimal solution to overcome the perception limitations of single-agent systems by leveraging Vehicle-to-Everything (V2X) communication for data sharing and fusion across multiple agents. However, most existing approaches focus on single-modality data exchange, limiting the potential of both homogeneous and heterogeneous fusion across agents. This overlooks the opp…

    Submitted 31 March, 2025; v1 submitted 28 January, 2025; originally announced January 2025.

  46. Design-Agnostic Distributed Timing Fault Injection Monitor With End-to-End Design Automation

    Authors: Yan He, Yumin Su, Kaiyuan Yang

    Abstract: Fault injection attacks (FIAs) induce hardware failures in circuits and exploit these faults to compromise the security of the system. It has been demonstrated that FIAs can bypass system security mechanisms, cause faulty outputs, and gain access to secret information. Certain types of FIAs can be mounted with little effort by tampering with clock signals and/or the chip operating conditions. To mitigate…

    Submitted 16 January, 2025; originally announced January 2025.

    Comments: 12 pages, 26 figures

    Journal ref: IEEE Journal of Solid-State Circuits, 04 December 2024

  47. arXiv:2501.06700  [pdf, other]

    cs.IT cs.LG cs.NI eess.SP

    Average Reward Reinforcement Learning for Wireless Radio Resource Management

    Authors: Kun Yang, Jing Yang, Cong Shen

    Abstract: In this paper, we address a crucial but often overlooked issue in applying reinforcement learning (RL) to radio resource management (RRM) in wireless communications: the mismatch between the discounted reward RL formulation and the undiscounted goal of wireless network optimization. To the best of our knowledge, we are the first to systematically investigate this discrepancy, starting with a discu…

    Submitted 11 January, 2025; originally announced January 2025.

    Comments: Accepted by Asilomar 2024

  48. arXiv:2501.03880  [pdf, other]

    eess.IV cs.CV cs.LG

    SELMA3D challenge: Self-supervised learning for 3D light-sheet microscopy image segmentation

    Authors: Ying Chen, Rami Al-Maskari, Izabela Horvath, Mayar Ali, Luciano Hoher, Kaiyuan Yang, Zengming Lin, Zhiwei Zhai, Mengzhe Shen, Dejin Xun, Yi Wang, Tony Xu, Maged Goubran, Yunheng Wu, Kensaku Mori, Johannes C. Paetzold, Ali Erturk

    Abstract: Recent innovations in light sheet microscopy, paired with developments in tissue clearing techniques, enable the 3D imaging of large mammalian tissues with cellular resolution. Combined with the progress in large-scale data analysis, driven by deep learning, these innovations empower researchers to rapidly investigate the morphological and functional properties of diverse biological samples. Segme…

    Submitted 12 January, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

    Comments: 2nd version

  49. arXiv:2412.19123  [pdf, other]

    cs.SD cs.MM eess.AS

    CoheDancers: Enhancing Interactive Group Dance Generation through Music-Driven Coherence Decomposition

    Authors: Kaixing Yang, Xulong Tang, Haoyu Wu, Qinliang Xue, Biao Qin, Hongyan Liu, Zhaoxin Fan

    Abstract: Dance generation is crucial and challenging, particularly in domains like dance performance and virtual gaming. In the current body of literature, most methodologies focus on Solo Music2Dance. While there are efforts directed towards Group Music2Dance, these often suffer from a lack of coherence, resulting in aesthetically poor dance performances. Thus, we introduce CoheDancers, a novel framework…

    Submitted 26 December, 2024; originally announced December 2024.

  50. arXiv:2412.18342  [pdf, other]

    cs.CV cs.LG eess.IV

    Mitigating Label Noise using Prompt-Based Hyperbolic Meta-Learning in Open-Set Domain Generalization

    Authors: Kunyu Peng, Di Wen, Sarfraz M. Saquib, Yufan Chen, Junwei Zheng, David Schneider, Kailun Yang, Jiamin Wu, Alina Roitberg, Rainer Stiefelhagen

    Abstract: Open-Set Domain Generalization (OSDG) is a challenging task requiring models to accurately predict familiar categories while minimizing confidence for unknown categories to effectively reject them in unseen domains. While the OSDG field has seen considerable advancements, the impact of label noise--a common issue in real-world datasets--has been largely overlooked. Label noise can mislead model op…

    Submitted 24 December, 2024; originally announced December 2024.

    Comments: The source code of this work is released at https://github.com/KPeng9510/HyProMeta