Showing 1–21 of 21 results for author: Shan, W

  1. arXiv:2504.04061  [pdf, other]

    cs.RO cs.AI

    Mapping at First Sense: A Lightweight Neural Network-Based Indoor Structures Prediction Method for Robot Autonomous Exploration

    Authors: Haojia Gao, Haohua Que, Kunrong Li, Weihao Shan, Mingkai Liu, Rong Zhao, Lei Mu, Xinghua Yang, Qi Wei, Fei Qiao

    Abstract: Autonomous exploration in unknown environments is a critical challenge in robotics, particularly for applications such as indoor navigation, search and rescue, and service robotics. Traditional exploration strategies, such as frontier-based methods, often struggle to efficiently utilize prior knowledge of structural regularities in indoor spaces. To address this limitation, we propose Mapping at F…

    Submitted 5 April, 2025; originally announced April 2025.

  2. arXiv:2503.16000  [pdf, other]

    cs.CV

    SenseExpo: Efficient Autonomous Exploration with Prediction Information from Lightweight Neural Networks

    Authors: Haojia Gao, Haohua Que, Hoiian Au, Weihao Shan, Mingkai Liu, Yusen Qin, Lei Mu, Rong Zhao, Xinghua Yang, Qi Wei, Fei Qiao

    Abstract: This paper proposes SenseExpo, an efficient autonomous exploration framework based on a lightweight prediction network, which addresses the limitations of traditional methods in computational overhead and environmental generalization. By integrating Generative Adversarial Networks (GANs), Transformer, and Fast Fourier Convolution (FFC), we designed a lightweight prediction model with merely 709k p…

    Submitted 20 March, 2025; originally announced March 2025.

  3. arXiv:2502.15178  [pdf, other]

    eess.AS cs.SD

    Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders

    Authors: Weiqiao Shan, Yuang Li, Yuhao Zhang, Yingfeng Luo, Chen Xu, Xiaofeng Zhao, Long Meng, Yunfei Lu, Min Zhang, Hao Yang, Tong Xiao, Jingbo Zhu

    Abstract: Connecting audio encoders with large language models (LLMs) allows the LLM to perform various audio understanding tasks, such as automatic speech recognition (ASR) and audio captioning (AC). Most research focuses on training an adapter layer to generate a unified audio feature for the LLM. However, different tasks may require distinct features that emphasize either semantic or acoustic aspects, ma…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: 12 pages, 4 figures, 7 tables

  4. arXiv:2501.08057  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Optimizing Speech Multi-View Feature Fusion through Conditional Computation

    Authors: Weiqiao Shan, Yuhao Zhang, Yuchen Han, Bei Li, Xiaofeng Zhao, Yuang Li, Min Zhang, Hao Yang, Tong Xiao, Jingbo Zhu

    Abstract: Recent advancements have highlighted the efficacy of self-supervised learning (SSL) features in various speech-related tasks, providing lightweight and versatile multi-view speech representations. However, our study reveals that while SSL features expedite model convergence, they conflict with traditional spectral features like FBanks in terms of update directions. In response, we propose a novel…

    Submitted 14 January, 2025; originally announced January 2025.

    Comments: ICASSP 2025

  5. arXiv:2412.01455  [pdf, other]

    cs.CL

    Early Exit Is a Natural Capability in Transformer-based Models: An Empirical Study on Early Exit without Joint Optimization

    Authors: Weiqiao Shan, Long Meng, Tong Zheng, Yingfeng Luo, Bei Li, Junxin Wang, Tong Xiao, Jingbo Zhu

    Abstract: Large language models (LLMs) exhibit exceptional performance across various downstream tasks. However, they encounter limitations due to slow inference speeds stemming from their extensive parameters. The early exit (EE) is an approach that aims to accelerate auto-regressive decoding. EE generates outputs from intermediate layers instead of using the whole model, which offers a promising solution…

    Submitted 2 December, 2024; originally announced December 2024.

  6. arXiv:2411.16238  [pdf, other]

    cs.AR

    UVLLM: An Automated Universal RTL Verification Framework using LLMs

    Authors: Yuchen Hu, Junhao Ye, Ke Xu, Jialin Sun, Shiyue Zhang, Xinyao Jiao, Dingrong Pan, Jie Zhou, Ning Wang, Weiwei Shan, Xinwei Fang, Xi Wang, Nan Guan, Zhe Jiang

    Abstract: Verifying hardware designs in embedded systems is crucial but often labor-intensive and time-consuming. While existing solutions have improved automation, they frequently rely on unrealistic assumptions. To address these challenges, we introduce a novel framework, UVLLM, which combines Large Language Models (LLMs) with the Universal Verification Methodology (UVM) to relax these assumptions. UVLLM…

    Submitted 25 November, 2024; originally announced November 2024.

  7. arXiv:2409.08534  [pdf, other]

    cs.AR

    AnalogGym: An Open and Practical Testing Suite for Analog Circuit Synthesis

    Authors: Jintao Li, Haochang Zhi, Ruiyu Lyu, Wangzhen Li, Zhaori Bi, Keren Zhu, Yanhan Zeng, Weiwei Shan, Changhao Yan, Fan Yang, Yun Li, Xuan Zeng

    Abstract: Recent advances in machine learning (ML) for automating analog circuit synthesis have been significant, yet challenges remain. A critical gap is the lack of a standardized evaluation framework, compounded by various process design kits (PDKs), simulation tools, and a limited variety of circuit topologies. These factors hinder direct comparisons and the validation of algorithms. To address these sh…

    Submitted 13 September, 2024; originally announced September 2024.

  8. arXiv:2408.08708  [pdf, other]

    cs.CV

    Decoupling Feature Representations of Ego and Other Modalities for Incomplete Multi-modal Brain Tumor Segmentation

    Authors: Kaixiang Yang, Wenqi Shan, Xudong Li, Xuan Wang, Xikai Yang, Xi Wang, Pheng-Ann Heng, Qiang Li, Zhiwei Wang

    Abstract: Multi-modal brain tumor segmentation typically involves four magnetic resonance imaging (MRI) modalities, while incomplete modalities significantly degrade performance. Existing solutions employ explicit or implicit modality adaptation, aligning features across modalities or learning a fused feature robust to modality incompleteness. They share a common goal of encouraging each modality to express…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 8 pages, 4 figures

  9. arXiv:2405.06840  [pdf, other]

    cs.AR cs.SE

    MEIC: Re-thinking RTL Debug Automation using LLMs

    Authors: Ke Xu, Jialin Sun, Yuchen Hu, Xinwei Fang, Weiwei Shan, Xi Wang, Zhe Jiang

    Abstract: The deployment of Large Language Models (LLMs) for code debugging (e.g., C and Python) is widespread, benefiting from their ability to understand and interpret intricate concepts. However, in the semiconductor industry, utilising LLMs to debug Register Transfer Level (RTL) code is still insufficient, largely due to the underrepresentation of RTL-specific data in training sets. This work introduces…

    Submitted 10 May, 2024; originally announced May 2024.

  10. arXiv:2311.18259  [pdf, other]

    cs.CV cs.AI

    Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

    Authors: Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, et al. (76 additional authors not shown)

    Abstract: We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from…

    Submitted 25 September, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Expanded manuscript (compared to arxiv v1 from Nov 2023 and CVPR 2024 paper from June 2024) for more comprehensive dataset and benchmark presentation, plus new results on v2 data release

  11. arXiv:2310.14921  [pdf, other]

    cs.CL cs.AI

    PartialFormer: Modeling Part Instead of Whole for Machine Translation

    Authors: Tong Zheng, Bei Li, Huiwen Bao, Jiale Wang, Weiqiao Shan, Tong Xiao, Jingbo Zhu

    Abstract: The design choices in Transformer feed-forward neural networks have resulted in significant computational and parameter overhead. In this work, we emphasize the importance of hidden dimensions in designing lightweight FFNs, a factor often overlooked in previous architectures. Guided by this principle, we introduce PartialFormer, a parameter-efficient Transformer architecture utilizing multiple sma…

    Submitted 5 June, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted by ACL2024 Findings

  12. arXiv:2308.11088  [pdf, other]

    cs.AI cs.MA

    Collaborative Route Planning of UAVs, Workers and Cars for Crowdsensing in Disaster Response

    Authors: Lei Han, Chunyu Tu, Zhiwen Yu, Zhiyong Yu, Weihua Shan, Liang Wang, Bin Guo

    Abstract: Efficiently obtaining the up-to-date information in the disaster-stricken area is the key to successful disaster response. Unmanned aerial vehicles (UAVs), workers and cars can collaborate to accomplish sensing tasks, such as data collection, in disaster-stricken areas. In this paper, we explicitly address the route planning for a group of agents, including UAVs, workers, and cars, with the goal o…

    Submitted 21 August, 2023; originally announced August 2023.

  13. arXiv:2306.03576  [pdf, other]

    cs.CV

    Human 3D Avatar Modeling with Implicit Neural Representation: A Brief Survey

    Authors: Mingyang Sun, Dingkang Yang, Dongliang Kou, Yang Jiang, Weihua Shan, Zhe Yan, Lihua Zhang

    Abstract: A human 3D avatar is one of the important elements in the metaverse, and the modeling effect directly affects people's visual experience. However, the human body has a complex topology and diverse details, so it is often expensive, time-consuming, and laborious to build a satisfactory model. Recent studies have proposed a novel method, implicit neural representation, which is a continuous represen…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: A Brief Survey

  14. arXiv:2303.11579  [pdf, other]

    cs.CV

    Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis Aggregation

    Authors: Wenkang Shan, Zhenhua Liu, Xinfeng Zhang, Zhao Wang, Kai Han, Shanshe Wang, Siwei Ma, Wen Gao

    Abstract: In this paper, a novel Diffusion-based 3D Pose estimation (D3DP) method with Joint-wise reProjection-based Multi-hypothesis Aggregation (JPMA) is proposed for probabilistic 3D human pose estimation. On the one hand, D3DP generates multiple possible 3D pose hypotheses for a single 2D observation. It gradually diffuses the ground truth 3D poses to a random distribution, and learns a denoiser conditi…

    Submitted 22 August, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: ICCV 2023

  15. arXiv:2301.01914  [pdf]

    cs.CV

    Accuracy and Fidelity Comparison of Luna and DALL-E 2 Diffusion-Based Image Generation Systems

    Authors: Michael Cahyadi, Muhammad Rafi, William Shan, Jurike Moniaga, Henry Lucky

    Abstract: We qualitatively examine the accuracy and fidelity between two diffusion-based image generation systems, namely DALL-E 2 and Luna, which have massive differences in training datasets, algorithmic approaches, prompt resolvement, and output upscaling. The methodology used is a qualitative benchmark created by Saharia et al. and in our research we conclude that DALL-E 2 significantly edges Luna in bo…

    Submitted 27 February, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

  16. arXiv:2203.07628  [pdf, other]

    cs.CV

    P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation

    Authors: Wenkang Shan, Zhenhua Liu, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Wen Gao

    Abstract: This paper introduces a novel Pre-trained Spatial Temporal Many-to-One (P-STMO) model for 2D-to-3D human pose estimation task. To reduce the difficulty of capturing spatial and temporal information, we divide this task into two stages: pre-training (Stage I) and fine-tuning (Stage II). In Stage I, a self-supervised pre-training sub-task, termed masked pose modeling, is proposed. The human joints i…

    Submitted 28 July, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

    Comments: ECCV 2022

  17. Improving Robustness and Accuracy via Relative Information Encoding in 3D Human Pose Estimation

    Authors: Wenkang Shan, Haopeng Lu, Shanshe Wang, Xinfeng Zhang, Wen Gao

    Abstract: Most of the existing 3D human pose estimation approaches mainly focus on predicting 3D positional relationships between the root joint and other human joints (local motion) instead of the overall trajectory of the human body (global motion). Despite the great progress achieved by these approaches, they are not robust to global motion, and lack the ability to accurately predict local motion with a…

    Submitted 29 July, 2021; originally announced July 2021.

    Comments: In Proceedings of the 29th ACM International Conference on Multimedia (MM '21)

  18. arXiv:2104.09749  [pdf, ps, other]

    cs.CE

    Interpolation of Microscale Stress and Strain Fields Based on Mechanical Models

    Authors: Wenzhe Shan, Udo Nackenhorst

    Abstract: In this short contribution we introduce a new procedure to recover the stress and strain fields for particle systems by mechanical models. Numerical tests for simple loading conditions have shown an excellent match between the estimated values and the reference values. The estimated stress field is also consistent with the so called Quasicontinuum stress field, which suggests its potential applica…

    Submitted 20 April, 2021; originally announced April 2021.

    Comments: 16 pages, 7 figures

  19. arXiv:2104.09746  [pdf, other]

    cs.CE

    Computing Arlequin coupling coefficient for concurrent FE-MD approaches

    Authors: Wenzhe Shan, Udo Nackenhorst

    Abstract: Arlequin coupling coefficient is essential for concurrent FE-MD models with overlapping domains, but the calculation of its value is quite difficult when the geometry of the coupling region is complicated. In this work, we introduce a general procedure for the preprocessing of a concurrent FE-MD model, given that the mesh and atoms have already been created. The procedure is independent of the geo…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: 19 pages, 19 figures

  20. arXiv:2102.00369  [pdf, other]

    cs.CV cs.LG

    Spectral Roll-off Points Variations: Exploring Useful Information in Feature Maps by Its Variations

    Authors: Yunkai Yu, Yuyang You, Zhihong Yang, Guozheng Liu, Peiyao Li, Zhicheng Yang, Wenjing Shan

    Abstract: Useful information (UI) is an elusive concept in neural networks. A quantitative measurement of UI is absent, even though the variations of UI can be recognized by prior knowledge. The communication bandwidth of feature maps decreases after downscaling operations, but UI flows smoothly after training due to lower Nyquist frequency. Inspired by the low-Nyquist-frequency nature of UI, we propose the use…

    Submitted 12 August, 2021; v1 submitted 30 January, 2021; originally announced February 2021.

    Comments: 11 pages, 5 figures. This work has been submitted to the IEEE for possible publication

  21. arXiv:2002.03793  [pdf, other]

    cs.LG stat.ML

    Adversarial Data Encryption

    Authors: Yingdong Hu, Liang Zhang, Wei Shan, Xiaoxiao Qin, Jing Qi, Zhenzhou Wu, Yang Yuan

    Abstract: In the big data era, many organizations face the dilemma of data sharing. Regular data sharing is often necessary for human-centered discussion and communication, especially in medical scenarios. However, unprotected data sharing may also lead to data leakage. Inspired by adversarial attack, we propose a method for data encryption, so that for human beings the encrypted data look identical to the…

    Submitted 11 February, 2020; v1 submitted 10 February, 2020; originally announced February 2020.