Showing 1–50 of 201 results for author: Wei, M

  1. arXiv:2505.08712  [pdf, ps, other]

    cs.RO

    NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance

    Authors: Wenzhe Cai, Jiaqi Peng, Yuqiang Yang, Yujian Zhang, Meng Wei, Hanqing Wang, Yilun Chen, Tai Wang, Jiangmiao Pang

    Abstract: Learning navigation in dynamic open-world environments is an important yet challenging skill for robots. Most previous methods rely on precise localization and mapping or learn from expensive real-world demonstrations. In this paper, we propose the Navigation Diffusion Policy (NavDP), an end-to-end framework that is trained solely in simulation and can transfer zero-shot to different embodiments in divers…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 14 pages, 6 figures

  2. arXiv:2505.04369  [pdf, other]

    cs.CV

    WDMamba: When Wavelet Degradation Prior Meets Vision Mamba for Image Dehazing

    Authors: Jie Sun, Heng Liu, Yongzhen Wang, Xiao-Ping Zhang, Mingqiang Wei

    Abstract: In this paper, we reveal a novel haze-specific wavelet degradation prior observed through wavelet transform analysis, which shows that haze-related information predominantly resides in low-frequency components. Exploiting this insight, we propose a novel dehazing framework, WDMamba, which decomposes the image dehazing task into two sequential stages: low-frequency restoration followed by detail en…

    Submitted 7 May, 2025; originally announced May 2025.

  3. arXiv:2505.03981  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains

    Authors: Qianchu Liu, Sheng Zhang, Guanghui Qin, Timothy Ossowski, Yu Gu, Ying Jin, Sid Kiblawi, Sam Preston, Mu Wei, Paul Vozila, Tristan Naumann, Hoifung Poon

    Abstract: Recent proprietary models (e.g., o3) have begun to demonstrate strong multimodal reasoning capabilities. Yet, most existing open-source research concentrates on training text-only reasoning models, with evaluations limited to mainly mathematical and general-domain tasks. Therefore, it remains unclear how to effectively extend reasoning capabilities beyond text input and general domains. This paper…

    Submitted 6 May, 2025; originally announced May 2025.

  4. arXiv:2504.21497  [pdf, other]

    cs.CV

    MagicPortrait: Temporally Consistent Face Reenactment with 3D Geometric Guidance

    Authors: Mengting Wei, Yante Li, Tuomas Varanka, Yan Jiang, Guoying Zhao

    Abstract: In this study, we propose a method for video face reenactment that integrates a 3D face parametric model into a latent diffusion framework, aiming to improve shape consistency and motion control in existing video-based face generation approaches. Our approach employs the FLAME (Faces Learned with an Articulated Model and Expressions) model as the 3D face parametric representation, providing a unif…

    Submitted 10 May, 2025; v1 submitted 30 April, 2025; originally announced April 2025.

  5. arXiv:2504.17414  [pdf, ps, other]

    cs.CV

    3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models

    Authors: Min Wei, Chaohui Yu, Jingkai Zhou, Fan Wang

    Abstract: Video try-on replaces clothing in videos with target garments. Existing methods struggle to generate high-quality and temporally consistent results when handling complex clothing patterns and diverse body poses. We present 3DV-TON, a novel diffusion-based framework for generating high-fidelity and temporally consistent video try-on results. Our approach employs generated animatable textured 3D mes…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Project page: https://2y7c3.github.io/3DV-TON/

  6. arXiv:2504.15223  [pdf]

    cs.LG

    A Deep Learning Framework for Sequence Mining with Bidirectional LSTM and Multi-Scale Attention

    Authors: Tao Yang, Yu Cheng, Yaokun Ren, Yujia Lou, Minggu Wei, Honghui Xin

    Abstract: This paper addresses the challenges of mining latent patterns and modeling contextual dependencies in complex sequence data. A sequence pattern mining algorithm is proposed by integrating Bidirectional Long Short-Term Memory (BiLSTM) with a multi-scale attention mechanism. The BiLSTM captures both forward and backward dependencies in sequences, enhancing the model's ability to perceive global cont…

    Submitted 21 April, 2025; originally announced April 2025.

  7. arXiv:2504.14977  [pdf, other]

    cs.CV

    RealisDance-DiT: Simple yet Strong Baseline towards Controllable Character Animation in the Wild

    Authors: Jingkai Zhou, Yifan Wu, Shikai Li, Min Wei, Chao Fan, Weihua Chen, Wei Jiang, Fan Wang

    Abstract: Controllable character animation remains a challenging problem, particularly in handling rare poses, stylized characters, character-object interactions, complex illumination, and dynamic scenes. To tackle these issues, prior work has largely focused on injecting pose and appearance guidance via elaborate bypass networks, but often struggles to generalize to open-world scenarios. In this paper, we…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Project Page: https://thefoxofsky.github.io/project_pages_new/RealisDance-DiT/index

  8. arXiv:2504.08685  [pdf, other]

    cs.CV cs.AI

    Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

    Authors: Team Seawead, Ceyuan Yang, Zhijie Lin, Yang Zhao, Shanchuan Lin, Zhibei Ma, Haoyuan Guo, Hao Chen, Lu Qi, Sen Wang, Feng Cheng, Feilong Zuo, Xuejiao Zeng, Ziyan Yang, Fangyuan Kong, Meng Wei, Zhiwu Qing, Fei Xiao, Tuyen Hoang, Siyu Zhang, Peihao Zhu, Qi Zhao, Jiangqiao Yan, Liangke Gui, Sheng Bi , et al. (30 additional authors not shown)

    Abstract: This technical report presents a cost-efficient strategy for training a video generation foundation model. We present a mid-sized research model with approximately 7 billion parameters (7B) called Seaweed-7B trained from scratch using 665,000 H100 GPU hours. Despite being trained with moderate computational resources, Seaweed-7B demonstrates highly competitive performance compared to contemporary…

    Submitted 4 May, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

    Comments: Technical report (some typos fixed)

  9. arXiv:2504.05135  [pdf, other]

    cs.CV

    DA2Diff: Exploring Degradation-aware Adaptive Diffusion Priors for All-in-One Weather Restoration

    Authors: Jiamei Xiong, Xuefeng Yan, Yongzhen Wang, Wei Zhao, Xiao-Ping Zhang, Mingqiang Wei

    Abstract: Image restoration under adverse weather conditions is a critical task for many vision-based applications. Recent all-in-one frameworks that handle multiple weather degradations within a unified model have shown potential. However, the diversity of degradation patterns across different weather conditions, as well as the complex and varied nature of real-world degradations, pose significant challeng…

    Submitted 7 April, 2025; originally announced April 2025.

  10. arXiv:2503.15949  [pdf, other]

    cs.CV

    CausalCLIPSeg: Unlocking CLIP's Potential in Referring Medical Image Segmentation with Causal Intervention

    Authors: Yaxiong Chen, Minghong Wei, Zixuan Zheng, Jingliang Hu, Yilei Shi, Shengwu Xiong, Xiao Xiang Zhu, Lichao Mou

    Abstract: Referring medical image segmentation targets delineating lesions indicated by textual descriptions. Aligning visual and textual cues is challenging due to their distinct data properties. Inspired by large-scale pre-trained vision-language models, we propose CausalCLIPSeg, an end-to-end framework for referring medical image segmentation that leverages CLIP. Despite not being trained on medical data…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: MICCAI 2024

  11. arXiv:2503.15144  [pdf, other]

    cs.CV

    PointSFDA: Source-free Domain Adaptation for Point Cloud Completion

    Authors: Xing He, Zhe Zhu, Liangliang Nan, Honghua Chen, Jing Qin, Mingqiang Wei

    Abstract: Conventional methods for point cloud completion, typically trained on synthetic datasets, face significant challenges when applied to out-of-distribution real-world scans. In this paper, we propose an effective yet simple source-free domain adaptation framework for point cloud completion, termed PointSFDA. Unlike unsupervised domain adaptation that reduces the domain gap by directly lever…

    Submitted 19 March, 2025; originally announced March 2025.

  12. arXiv:2503.11038  [pdf, other]

    cs.CV

    ACMo: Attribute Controllable Motion Generation

    Authors: Mingjie Wei, Xuemei Xie, Guangming Shi

    Abstract: Attributes such as style, fine-grained text, and trajectory are specific conditions for describing motion. However, existing methods often lack precise user control over motion attributes and suffer from limited generalizability to unseen motions. This work introduces an Attribute Controllable Motion generation architecture to address these challenges by decoupling any conditions and controlling them…

    Submitted 13 March, 2025; originally announced March 2025.

  13. arXiv:2503.10592  [pdf, other]

    cs.CV

    CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models

    Authors: Hao He, Ceyuan Yang, Shanchuan Lin, Yinghao Xu, Meng Wei, Liangke Gui, Qi Zhao, Gordon Wetzstein, Lu Jiang, Hongsheng Li

    Abstract: This paper introduces CameraCtrl II, a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model. Previous camera-conditioned video generative models suffer from diminished video dynamics and limited range of viewpoints when generating videos with large camera movement. We take an approach that progressively expands the generation of dynamic sce…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: Project page: https://hehao13.github.io/Projects-CameraCtrl-II/

  14. arXiv:2503.02841  [pdf, other]

    cs.CV

    Boltzmann Attention Sampling for Image Analysis with Small Objects

    Authors: Theodore Zhao, Sid Kiblawi, Naoto Usuyama, Ho Hin Lee, Sam Preston, Hoifung Poon, Mu Wei

    Abstract: Detecting and segmenting small objects, such as lung nodules and tumor lesions, remains a critical challenge in image analysis. These objects often occupy less than 0.1% of an image, making traditional transformer architectures inefficient and prone to performance degradation due to redundant attention computations on irrelevant regions. Existing sparse attention mechanisms rely on rigid hierarchi…

    Submitted 26 March, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

  15. arXiv:2503.00801  [pdf, other]

    cs.CV

    STAR-Edge: Structure-aware Local Spherical Curve Representation for Thin-walled Edge Extraction from Unstructured Point Clouds

    Authors: Zikuan Li, Honghua Chen, Yuecheng Wang, Sibo Wu, Mingqiang Wei, Jun Wang

    Abstract: Extracting geometric edges from unstructured point clouds remains a significant challenge, particularly in thin-walled structures that are commonly found in everyday objects. Traditional geometric methods and recent learning-based approaches frequently struggle with these structures, as both rely heavily on sufficient contextual information from local point neighborhoods. However, 3D measurement d…

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR 2025

  16. arXiv:2502.19896  [pdf, other]

    cs.CV

    GenPC: Zero-shot Point Cloud Completion via 3D Generative Priors

    Authors: An Li, Zhe Zhu, Mingqiang Wei

    Abstract: Existing point cloud completion methods, which typically depend on predefined synthetic training datasets, encounter significant challenges when applied to out-of-distribution, real-world scans. To overcome this limitation, we introduce a zero-shot completion framework, termed GenPC, designed to reconstruct high-quality real-world scans by leveraging explicit 3D generative priors. Our key insight…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: Accepted by CVPR 2025

  17. arXiv:2502.17053  [pdf, other]

    cs.CV

    PointSea: Point Cloud Completion via Self-structure Augmentation

    Authors: Zhe Zhu, Honghua Chen, Xing He, Mingqiang Wei

    Abstract: Point cloud completion is a fundamental yet not well-solved problem in 3D vision. Current approaches often rely on 3D coordinate information and/or additional data (e.g., images and scanning viewpoints) to fill in missing parts. Unlike these methods, we explore self-structure augmentation and propose PointSea for global-to-local point cloud completion. In the global stage, consider how we inspect…

    Submitted 26 February, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

    Comments: Accepted by International Journal of Computer Vision (IJCV). Extension of our ICCV 2023 work: arXiv:2307.08492

  18. arXiv:2502.04377  [pdf, other]

    cs.CV cs.AI

    MapFusion: A Novel BEV Feature Fusion Network for Multi-modal Map Construction

    Authors: Xiaoshuai Hao, Yunfeng Diao, Mengchuan Wei, Yifan Yang, Peng Hao, Rong Yin, Hui Zhang, Weiming Li, Shu Zhao, Yu Liu

    Abstract: The map construction task plays a vital role in providing precise and comprehensive static environmental information essential for autonomous driving systems. Primary sensors include cameras and LiDAR, with configurations varying between camera-only, LiDAR-only, or camera-LiDAR fusion, based on cost-performance considerations. While fusion-based methods typically perform best, existing approaches ofte…

    Submitted 5 February, 2025; originally announced February 2025.

  19. arXiv:2502.02465  [pdf, other]

    cs.CV

    Towards Consistent and Controllable Image Synthesis for Face Editing

    Authors: Mengting Wei, Tuomas Varanka, Yante Li, Xingxun Jiang, Huai-Qian Khor, Guoying Zhao

    Abstract: Face editing methods, essential for tasks like virtual avatars, digital human synthesis and identity preservation, have traditionally been built upon GAN-based techniques, while recent focus has shifted to diffusion-based models due to their success in image reconstruction. However, diffusion models still face challenges in controlling specific attributes and preserving the consistency of other un…

    Submitted 9 February, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

  20. arXiv:2502.00943  [pdf, other]

    cs.CL

    Universal Abstraction: Harnessing Frontier Models to Structure Real-World Data at Scale

    Authors: Cliff Wong, Sam Preston, Qianchu Liu, Zelalem Gero, Jass Bagga, Sheng Zhang, Shrey Jain, Theodore Zhao, Yu Gu, Yanbo Xu, Sid Kiblawi, Roshanthi Weerasinghe, Rom Leidner, Kristina Young, Brian Piening, Carlo Bifulco, Tristan Naumann, Mu Wei, Hoifung Poon

    Abstract: The vast majority of real-world patient information resides in unstructured clinical text, and the process of medical abstraction seeks to extract and normalize structured information from this unstructured input. However, traditional medical abstraction methods can require significant manual efforts that can include crafting rules or annotating training labels, limiting scalability. In this paper…

    Submitted 2 February, 2025; originally announced February 2025.

  21. arXiv:2502.00637  [pdf]

    cs.HC

    Constructing AI ethics narratives based on real-world data: Human-AI collaboration in data-driven visual storytelling

    Authors: Mengyi Wei, Chenjing Jiao, Chenyu Zuo, Lorenz Hurni, Liqiu Meng

    Abstract: AI ethics narratives have the potential to shape the public's accurate understanding of AI technologies and promote communication among different stakeholders. However, AI ethics narratives are largely lacking. Existing limited narratives tend to center on works of science fiction or corporate marketing campaigns of large technology companies. Misuse of "socio-technical imaginary" can blur the line…

    Submitted 1 February, 2025; originally announced February 2025.

    Comments: 13 pages, 9 figures

  22. arXiv:2501.06367  [pdf, other]

    cs.CR

    Resilient Endurance-Aware NVM-based PUF against Learning-based Attacks

    Authors: Hassan Nassar, Ming-Liang Wei, Chia-Lin Yang, Jörg Henkel, Kuan-Hsun Chen

    Abstract: Physical Unclonable Functions (PUFs) based on Non-Volatile Memory (NVM) technology have emerged as a promising solution for secure authentication and cryptographic applications. By leveraging the multi-level cell (MLC) characteristic of NVMs, these PUFs can generate a wide range of unique responses, enhancing their resilience to machine learning (ML) modeling attacks. However, a significant issue…

    Submitted 10 January, 2025; originally announced January 2025.

  23. arXiv:2501.02260  [pdf, other]

    cs.CV

    MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control

    Authors: Mengting Wei, Tuomas Varanka, Xingxun Jiang, Huai-Qian Khor, Guoying Zhao

    Abstract: We address the problem of facial expression editing by controlling the relative variation of facial action-unit (AU) from the same person. This enables us to edit this specific person's expression in a fine-grained, continuous and interpretable manner, while preserving their identity, pose, background and detailed facial attributes. Key to our model, which we dub MagicFace, is a diffusion model con…

    Submitted 9 January, 2025; v1 submitted 4 January, 2025; originally announced January 2025.

  24. arXiv:2501.01320  [pdf, other]

    cs.CV

    SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration

    Authors: Jianyi Wang, Zhijie Lin, Meng Wei, Yang Zhao, Ceyuan Yang, Fei Xiao, Chen Change Loy, Lu Jiang

    Abstract: Video restoration poses non-trivial challenges in maintaining fidelity while recovering temporally consistent details from unknown degradations in the wild. Despite recent advances in diffusion-based restoration, these methods often face limitations in generation capability and sampling efficiency. In this work, we present SeedVR, a diffusion transformer designed to handle real-world video restora…

    Submitted 22 March, 2025; v1 submitted 2 January, 2025; originally announced January 2025.

    Comments: CVPR25 CR ver., add a co-author additionally. Project page: https://iceclear.github.io/projects/seedvr/

  25. arXiv:2501.01037  [pdf, other]

    cs.RO cs.AI cs.CV

    MSC-Bench: Benchmarking and Analyzing Multi-Sensor Corruption for Driving Perception

    Authors: Xiaoshuai Hao, Guanqun Liu, Yuting Zhao, Yuheng Ji, Mengchuan Wei, Haimei Zhao, Lingdong Kong, Rong Yin, Yu Liu

    Abstract: Multi-sensor fusion models play a crucial role in autonomous driving perception, particularly in tasks like 3D object detection and HD map construction. These models provide essential and comprehensive static environmental information for autonomous driving systems. While camera-LiDAR fusion methods have shown promising results by integrating data from both modalities, they often depend on complet…

    Submitted 1 January, 2025; originally announced January 2025.

  26. Improving Acoustic Scene Classification in Low-Resource Conditions

    Authors: Zhi Chen, Yun-Fei Shao, Yong Ma, Mingsheng Wei, Le Zhang, Wei-Qiang Zhang

    Abstract: Acoustic Scene Classification (ASC) identifies an environment based on an audio signal. This paper explores ASC in low-resource conditions and proposes a novel model, DS-FlexiNet, which combines depthwise separable convolutions from MobileNetV2 with ResNet-inspired residual connections for a balance of efficiency and accuracy. To address hardware limitations and device heterogeneity, DS-FlexiNet e…

    Submitted 27 April, 2025; v1 submitted 30 December, 2024; originally announced December 2024.

    Comments: Copyright 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component

    Journal ref: ICASSP (2025)

  27. arXiv:2412.16085  [pdf, other]

    eess.IV cs.CV

    Efficient MedSAMs: Segment Anything in Medical Images on Laptop

    Authors: Jun Ma, Feifei Li, Sumin Kim, Reza Asakereh, Bao-Hiep Le, Dang-Khoa Nguyen-Vu, Alexander Pfefferle, Muxin Wei, Ruochen Gao, Donghang Lyu, Songxiao Yang, Lennart Purucker, Zdravko Marinov, Marius Staring, Haisheng Lu, Thuy Thanh Dao, Xincheng Ye, Zhi Li, Gianluca Brugnara, Philipp Vollmuth, Martha Foltyn-Dumitru, Jaeyoung Cho, Mustafa Ahmed Mahmutoglu, Martin Bendszus, Irada Pflüger , et al. (57 additional authors not shown)

    Abstract: Promptable segmentation foundation models have emerged as a transformative approach to addressing the diverse needs in medical images, but most existing models require expensive computing, posing a big barrier to their adoption in clinical practice. In this work, we organized the first international competition dedicated to promptable medical image segmentation, featuring a large-scale dataset spa…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: CVPR 2024 MedSAM on Laptop Competition Summary: https://www.codabench.org/competitions/1847/

  28. arXiv:2412.13693  [pdf, other]

    cs.SE

    UITrans: Seamless UI Translation from Android to HarmonyOS

    Authors: Lina Gong, Chen Wang, Yujun Huang, Di Cui, Mingqiang Wei

    Abstract: Seamless user interface (i.e., UI) translation has emerged as a pivotal technique for modern mobile developers, addressing the challenge of developing separate UI applications for Android and HarmonyOS platforms due to fundamental differences in layout structures and development paradigms. In this paper, we present UITrans, the first automated UI translation tool designed for Android to HarmonyOS.…

    Submitted 5 February, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

    Comments: 5 pages

  29. arXiv:2412.13471  [pdf, other]

    cs.AI cs.CL

    Gradual Vigilance and Interval Communication: Enhancing Value Alignment in Multi-Agent Debates

    Authors: Rui Zou, Mengqi Wei, Jintian Feng, Qian Wan, Jianwen Sun, Sannyuya Liu

    Abstract: In recent years, large language models have shown exceptional performance in fulfilling diverse human needs. However, their training data can introduce harmful content, underscoring the necessity for robust value alignment. Mainstream methods, which depend on feedback learning and supervised training, are resource-intensive and may constrain the full potential of the models. Multi-Agent Debate (MA…

    Submitted 17 December, 2024; originally announced December 2024.

  30. arXiv:2412.12770  [pdf, other]

    cs.IR

    A Survey on Sequential Recommendation

    Authors: Liwei Pan, Weike Pan, Meiyan Wei, Hongzhi Yin, Zhong Ming

    Abstract: Different from most conventional recommendation problems, sequential recommendation focuses on learning users' preferences by exploiting the internal order and dependency among the interacted items, which has received significant attention from both researchers and practitioners. In recent years, we have witnessed great progress and achievements in this field, necessitating a new survey. In this s…

    Submitted 13 March, 2025; v1 submitted 17 December, 2024; originally announced December 2024.

  31. ESA: Example Sieve Approach for Multi-Positive and Unlabeled Learning

    Authors: Zhongnian Li, Meng Wei, Peng Ying, Xinzheng Xu

    Abstract: Learning from Multi-Positive and Unlabeled (MPU) data has gradually attracted significant attention from practical applications. Unfortunately, the risk of MPU also suffers from the shift of minimum risk, particularly when the models are very flexible as shown in Fig.\ref{moti}. In this paper, to alleviate the shifting of minimum risk problem, we propose an Example Sieve Approach (ESA) to select ex…

    Submitted 3 December, 2024; originally announced December 2024.

    Comments: 12 pages, 6 figures

  32. Learning from Concealed Labels

    Authors: Zhongnian Li, Meng Wei, Peng Ying, Tongfeng Sun, Xinzheng Xu

    Abstract: Annotating data for sensitive labels (e.g., disease, smoking) poses a potential threat to individual privacy in many real-world scenarios. To cope with this problem, we propose a novel setting to protect the privacy of each instance, namely learning from concealed labels for multi-class classification. Concealed labels prevent sensitive labels from appearing in the label set during the label collecti…

    Submitted 3 December, 2024; originally announced December 2024.

    Comments: 12 pages, 2 figures

  33. arXiv:2412.00370  [pdf, other]

    cs.GT

    Incentive-Driven Task Offloading and Collaborative Computing in Device-Assisted MEC Networks

    Authors: Yang Li, Xing Zhang, Bo Lei, Qianying Zhao, Min Wei, Zheyan Qu, Wenbo Wang

    Abstract: Edge computing (EC), positioned near end devices, holds significant potential for delivering low-latency, energy-efficient, and secure services. This makes it a crucial component of the Internet of Things (IoT). However, the increasing number of IoT devices and emerging services place tremendous pressure on edge servers (ESs). To better handle dynamically arriving heterogeneous tasks, ESs and IoT…

    Submitted 30 November, 2024; originally announced December 2024.

    Comments: Accepted to IEEE Internet of Things Journal

  34. arXiv:2411.18850  [pdf, other]

    cs.CV

    CrossTracker: Robust Multi-modal 3D Multi-Object Tracking via Cross Correction

    Authors: Lipeng Gu, Xuefeng Yan, Weiming Wang, Honghua Chen, Dingkun Zhu, Liangliang Nan, Mingqiang Wei

    Abstract: The fusion of camera- and LiDAR-based detections offers a promising solution to mitigate tracking failures in 3D multi-object tracking (MOT). However, existing methods predominantly exploit camera detections to correct tracking failures caused by potential LiDAR detection problems, neglecting the reciprocal benefit of refining camera detections using LiDAR data. This limitation is rooted in their…

    Submitted 27 November, 2024; originally announced November 2024.

  35. arXiv:2411.17406  [pdf, other]

    cs.CV

    CoA: Chain-of-Action for Generative Semantic Labels

    Authors: Meng Wei, Zhongnian Li, Peng Ying, Xinzheng Xu

    Abstract: Recent advances in vision-language models (VLM) have demonstrated remarkable capability in image classification. These VLMs leverage a predefined set of categories to construct text prompts for zero-shot reasoning. However, in more open-ended domains like autonomous driving, using a predefined set of labels becomes impractical, as the semantic label space is unknown and constantly evolving. Additi…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: 15 pages, 8 figures

  36. arXiv:2411.06041  [pdf, other]

    cs.CV cs.AI

    PointCG: Self-supervised Point Cloud Learning via Joint Completion and Generation

    Authors: Yun Liu, Peng Li, Xuefeng Yan, Liangliang Nan, Bing Wang, Honghua Chen, Lina Gong, Wei Zhao, Mingqiang Wei

    Abstract: The core of self-supervised point cloud learning lies in setting up appropriate pretext tasks, to construct a pre-training framework that enables the encoder to perceive 3D objects effectively. In this paper, we integrate two prevalent methods, masked point modeling (MPM) and 3D-to-2D generation, as pretext tasks within a pre-training framework. We leverage the spatial awareness and precise superv…

    Submitted 8 November, 2024; originally announced November 2024.

  37. arXiv:2410.20593  [pdf, other]

    cs.CV

    Normal-GS: 3D Gaussian Splatting with Normal-Involved Rendering

    Authors: Meng Wei, Qianyi Wu, Jianmin Zheng, Hamid Rezatofighi, Jianfei Cai

    Abstract: Rendering and reconstruction are long-standing topics in computer vision and graphics. Achieving both high rendering quality and accurate geometry is a challenge. Recent advancements in 3D Gaussian Splatting (3DGS) have enabled high-fidelity novel view synthesis at real-time speeds. However, the noisy and discrete nature of 3D Gaussian primitives hinders accurate surface estimation. Previous attem…

    Submitted 27 October, 2024; originally announced October 2024.

    Comments: 9 pages, 5 figures, accepted at NeurIPS 2024

  38. arXiv:2410.19550   

    cs.SE cs.AI

    DeMuVGN: Effective Software Defect Prediction Model by Learning Multi-view Software Dependency via Graph Neural Networks

    Authors: Yu Qiao, Lina Gong, Yu Zhao, Yongwei Wang, Mingqiang Wei

    Abstract: Software defect prediction (SDP) aims to identify high-risk defect modules in software development, optimizing resource allocation. While previous studies show that dependency network metrics improve defect prediction, most methods focus on code-based dependency graphs, overlooking developer factors. Current metrics, based on handcrafted features like ego and global network metrics, fail to fully…

    Submitted 7 May, 2025; v1 submitted 25 October, 2024; originally announced October 2024.

    Comments: The current paper is not comprehensive enough. We are seeking further improvement

  39. arXiv:2410.14743  [pdf, other]

    cs.LG cs.AI

    Efficient Deep Learning Board: Training Feedback Is Not All You Need

    Authors: Lina Gong, Qi Gao, Peng Li, Mingqiang Wei, Fei Wu

    Abstract: Current automatic deep learning (i.e., AutoDL) frameworks rely on training feedback from actual runs, which often hinders their ability to provide quick and clear performance predictions for selecting suitable DL systems. To address this issue, we propose EfficientDL, an innovative deep learning board designed for automatic performance prediction and component recommendation. EfficientDL can quickl…

    Submitted 17 October, 2024; originally announced October 2024.

  40. arXiv:2410.08531  [pdf, other]

    cs.CV

    Diffusion Models Need Visual Priors for Image Generation

    Authors: Xiaoyu Yue, Zidong Wang, Zeyu Lu, Shuyang Sun, Meng Wei, Wanli Ouyang, Lei Bai, Luping Zhou

    Abstract: Conventional class-guided diffusion models generally succeed in generating images with correct semantic content, but often struggle with texture details. This limitation stems from the usage of class priors, which only provide coarse and limited conditional information. To address this issue, we propose Diffusion on Diffusion (DoD), an innovative multi-stage generation framework that first extract…

    Submitted 11 October, 2024; originally announced October 2024.

    Comments: Preprint

  41. arXiv:2410.06542  [pdf, other]

    eess.IV cs.CV

    MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging

    Authors: Noel C. F. Codella, Ying Jin, Shrey Jain, Yu Gu, Ho Hin Lee, Asma Ben Abacha, Alberto Santamaria-Pang, Will Guyman, Naiteek Sangani, Sheng Zhang, Hoifung Poon, Stephanie Hyland, Shruthi Bannur, Javier Alvarez-Valle, Xue Li, John Garrett, Alan McMillan, Gaurav Rajguru, Madhu Maddi, Nilesh Vijayrania, Rehaan Bhimai, Nick Mecklenburg, Rupal Jain, Daniel Holstein, Naveen Gaur , et al. (6 additional authors not shown)

    Abstract: In this work, we present MedImageInsight, an open-source medical imaging embedding model. MedImageInsight is trained on medical images with associated text and labels across a diverse collection of domains, including X-Ray, CT, MRI, dermoscopy, OCT, fundus photography, ultrasound, histopathology, and mammography. Rigorous evaluations demonstrate MedImageInsight's ability to achieve state-of-the-ar…

    Submitted 9 October, 2024; originally announced October 2024.

  42. arXiv:2410.06440  [pdf, other]

    cs.SE

    Checker Bug Detection and Repair in Deep Learning Libraries

    Authors: Nima Shiri Harzevili, Mohammad Mahdi Mohajer, Jiho Shin, Moshi Wei, Gias Uddin, Jinqiu Yang, Junjie Wang, Song Wang, Zhen Ming Jiang, Nachiappan Nagappan

    Abstract: Checker bugs in Deep Learning (DL) libraries are critical yet not well-explored. These bugs are often concealed in the input validation and error-checking code of DL libraries and can lead to silent failures, incorrect results, or unexpected program behavior in DL applications. Despite their potential to significantly impact the reliability and performance of DL-enabled systems built with these li…

    Submitted 8 October, 2024; originally announced October 2024.

  43. arXiv:2410.04762  [pdf]

    cs.CV

    WTCL-Dehaze: Rethinking Real-world Image Dehazing via Wavelet Transform and Contrastive Learning

    Authors: Divine Joseph Appiah, Donghai Guan, Abdul Nasser Kasule, Mingqiang Wei

    Abstract: Images captured in hazy outdoor conditions often suffer from colour distortion, low contrast, and loss of detail, which impair high-level vision tasks. Single image dehazing is essential for applications such as autonomous driving and surveillance, with the aim of restoring image clarity. In this work, we propose WTCL-Dehaze, an enhanced semi-supervised dehazing network that integrates Contrastive…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: 15 pages, 4 figures

  44. arXiv:2410.02796  [pdf, other]

    eess.SP cs.ET cs.IT cs.NI

    Toward Adaptive Tracking and Communication via an Airborne Maneuverable Bi-Static ISAC System

    Authors: Mingliang Wei, Ruoguang Li, Li Wang, Lianming Xu, Zhu Han

    Abstract: In this letter, we propose an airborne maneuverable bi-static integrated sensing and communication system where both the transmitter and receiver are unmanned aerial vehicles. By timely forming a dynamic bi-static range based on the motion information of the target, such a system can provide adaptive two-dimensional tracking and communication services. Towards this end, a trajectory optimizatio…

    Submitted 18 September, 2024; originally announced October 2024.

  45. arXiv:2410.00872  [pdf]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Do Music Generation Models Encode Music Theory?

    Authors: Megan Wei, Michael Freeman, Chris Donahue, Chen Sun

    Abstract: Music foundation models possess impressive music generation capabilities. When people compose music, they may infuse their understanding of music into their work, by using notes and intervals to craft melodies, chords to build progressions, and tempo to create a rhythmic feel. To what extent is this true of music generation models? More specifically, are fundamental Western music theory concepts o…

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: Accepted at ISMIR 2024. Dataset: https://huggingface.co/datasets/meganwei/syntheory Code: https://github.com/brown-palm/syntheory Website: https://brown-palm.github.io/music-theory

  46. arXiv:2409.20196  [pdf, other]

    cs.SD cs.AI eess.AS

    Melody-Guided Music Generation

    Authors: Shaopeng Wei, Manzhen Wei, Haoyu Wang, Yu Zhao, Gang Kou

    Abstract: We present the Melody-Guided Music Generation (MG2) model, a novel approach using melody to guide the text-to-music generation that, despite a simple method and limited resources, achieves excellent performance. Specifically, we first align the text with audio waveforms and their associated melodies using the newly proposed Contrastive Language-Music Pretraining, enabling the learned text represen…

    Submitted 30 December, 2024; v1 submitted 30 September, 2024; originally announced September 2024.

    Comments: 16 pages, 8 figures, 8 tables

  47. arXiv:2409.09378  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    Prevailing Research Areas for Music AI in the Era of Foundation Models

    Authors: Megan Wei, Mateusz Modrzejewski, Aswin Sivaraman, Dorien Herremans

    Abstract: In tandem with the recent advancements in foundation model research, there has been a surge of generative music AI applications within the past few years. As the idea of AI-generated or AI-augmented music becomes more mainstream, many researchers in the music AI community may be wondering what avenues of research are left. With regard to music generative models, we outline the current areas of re…

    Submitted 14 September, 2024; originally announced September 2024.

    MSC Class: 68T05; 68T20 ACM Class: I.2; I.5.4; I.2.6; I.2.7; H.5.5

  48. arXiv:2409.00618  [pdf, other]

    cs.CV

    YOLOO: You Only Learn from Others Once

    Authors: Lipeng Gu, Mingqiang Wei, Xuefeng Yan, Dingkun Zhu, Wei Zhao, Haoran Xie, Yong-Jin Liu

    Abstract: Multi-modal 3D multi-object tracking (MOT) typically necessitates extensive computational costs of deep neural networks (DNNs) to extract multi-modal representations. In this paper, we propose an intriguing question: May we learn from multiple modalities only during training to avoid multi-modal input in the inference phase? To answer it, we propose YOLOO, a novel multi-modal 3D MOT parad…

    Submitted 1 September, 2024; originally announced September 2024.

  49. arXiv:2408.13716  [pdf, other]

    eess.IV cs.CV

    FreqINR: Frequency Consistency for Implicit Neural Representation with Adaptive DCT Frequency Loss

    Authors: Meiyi Wei, Liu Xie, Ying Sun, Gang Chen

    Abstract: Recent advancements in local Implicit Neural Representation (INR) demonstrate its exceptional capability in handling images at various resolutions. However, frequency discrepancies between high-resolution (HR) and ground-truth images, especially at larger scales, result in significant artifacts and blurring in HR images. This paper introduces Frequency Consistency for Implicit Neural Representatio…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: 9 pages, 7 figures

  50. arXiv:2408.08554  [pdf, other]

    cs.LG

    ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models

    Authors: Chao Zeng, Songwei Liu, Yusheng Xie, Hong Liu, Xiaojian Wang, Miao Wei, Shu Yang, Fangmin Chen, Xing Mei

    Abstract: Large Language Models (LLMs) have revolutionized natural language processing tasks. However, their practical application is constrained by substantial memory and computational demands. Post-training quantization (PTQ) is considered an effective method to accelerate LLM inference. Despite its growing popularity in LLM model compression, PTQ deployment faces two major challenges. First, low-bit quan…

    Submitted 22 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.