Showing 201–250 of 3,900 results for author: Zhe

  1. arXiv:2503.21525

    cs.CV

    ICG-MVSNet: Learning Intra-view and Cross-view Relationships for Guidance in Multi-View Stereo

    Authors: Yuxi Hu, Jun Zhang, Zhe Zhang, Rafael Weilharter, Yuchen Rao, Kuangyi Chen, Runze Yuan, Friedrich Fraundorfer

    Abstract: Multi-view Stereo (MVS) aims to estimate depth and reconstruct 3D point clouds from a series of overlapping images. Recent learning-based MVS frameworks overlook the geometric information embedded in features and correlations, leading to weak cost matching. In this paper, we propose ICG-MVSNet, which explicitly integrates intra-view and cross-view relationships for depth estimation. Specifically,…

    Submitted 27 March, 2025; originally announced March 2025.

  2. arXiv:2503.21057

    eess.SY

    Validation and Calibration of Energy Models with Real Vehicle Data from Chassis Dynamometer Experiments

    Authors: Joy Carpio, Sulaiman Almatrudi, Nour Khoudari, Zhe Fu, Kenneth Butts, Jonathan Lee, Benjamin Seibold, Alexandre Bayen

    Abstract: Accurate estimation of vehicle fuel consumption typically requires detailed modeling of complex internal powertrain dynamics, often resulting in computationally intensive simulations. However, many transportation applications (such as traffic flow modeling, optimization, and control) require simplified models that are fast, interpretable, and easy to implement, while still maintaining fidelity to ph…

    Submitted 26 March, 2025; originally announced March 2025.

  3. arXiv:2503.20377

    cs.AR cs.NI

    UB-Mesh: a Hierarchically Localized nD-FullMesh Datacenter Network Architecture

    Authors: Heng Liao, Bingyang Liu, Xianping Chen, Zhigang Guo, Chuanning Cheng, Jianbing Wang, Xiangyu Chen, Peng Dong, Rui Meng, Wenjie Liu, Zhe Zhou, Ziyang Zhang, Yuhang Gai, Cunle Qian, Yi Xiong, Zhongwu Cheng, Jing Xia, Yuli Ma, Xi Chen, Wenhua Du, Shizhong Xiao, Chungang Li, Yong Qin, Liudong Xiong, Zhou Yu, et al. (9 additional authors not shown)

    Abstract: As Large-scale Language Models (LLMs) continue to scale, the requisite computational power and bandwidth escalate. To address this, we introduce UB-Mesh, a novel AI datacenter network architecture designed to enhance scalability, performance, cost-efficiency and availability. Unlike traditional datacenters that provide symmetrical node-to-node bandwidth, UB-Mesh employs a hierarchically locali…

    Submitted 17 May, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

  4. arXiv:2503.20265

    cs.SE

    Fixseeker: An Empirical Driven Graph-based Approach for Detecting Silent Vulnerability Fixes in Open Source Software

    Authors: Yiran Cheng, Ting Zhang, Lwin Khin Shar, Zhe Lang, David Lo, Shichao Lv, Dongliang Fang, Zhiqiang Shi, Limin Sun

    Abstract: Open source software vulnerabilities pose significant security risks to downstream applications. While vulnerability databases provide valuable information for mitigation, many security patches are released silently in new commits of OSS repositories without explicit indications of their security impact. This makes it challenging for software maintainers and users to detect and address these vulne…

    Submitted 26 March, 2025; originally announced March 2025.

  5. arXiv:2503.20117

    cs.LG cs.DC

    Exact and Linear Convergence for Federated Learning under Arbitrary Client Participation is Attainable

    Authors: Bicheng Ying, Zhe Li, Haibo Yang

    Abstract: This work tackles the fundamental challenges in Federated Learning (FL) posed by arbitrary client participation and data heterogeneity, prevalent characteristics in practical FL settings. It is well-established that popular FedAvg-style algorithms struggle with exact convergence and can suffer from slow convergence rates since a decaying learning rate is required to mitigate these scenarios. To ad…

    Submitted 3 June, 2025; v1 submitted 25 March, 2025; originally announced March 2025.

    Comments: Under review

  6. arXiv:2503.19941

    cs.RO cs.AI cs.NE

    Body Discovery of Embodied AI

    Authors: Zhe Sun, Pengfei Tian, Xiaozhu Hu, Xiaoyu Zhao, Huiying Li, Zhenliang Zhang

    Abstract: In the pursuit of realizing artificial general intelligence (AGI), the importance of embodied artificial intelligence (AI) becomes increasingly apparent. Following this trend, research integrating robots with AGI has become prominent. As various kinds of embodiments have been designed, adaptability to diverse embodiments will become important to AGI. We introduce a new challenge, termed "Body Disc…

    Submitted 25 March, 2025; originally announced March 2025.

  7. arXiv:2503.18943

    cs.CV

    SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding

    Authors: Mingze Xu, Mingfei Gao, Shiyu Li, Jiasen Lu, Zhe Gan, Zhengfeng Lai, Meng Cao, Kai Kang, Yinfei Yang, Afshin Dehghan

    Abstract: We introduce SlowFast-LLaVA-1.5 (abbreviated as SF-LLaVA-1.5), a family of video large language models (LLMs) offering a token-efficient solution for long-form video understanding. We incorporate the two-stream SlowFast mechanism into a streamlined training pipeline, and perform joint video-image training on a carefully curated data mixture of only publicly available datasets. Our primary focus is…

    Submitted 27 March, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

    Comments: Technical report

  8. arXiv:2503.18888

    cs.SE cs.CL cs.IR

    Toward building next-generation Geocoding systems: a systematic review

    Authors: Zhengcong Yin, Daniel W. Goldberg, Binbin Lin, Bing Zhou, Diya Li, Andong Ma, Ziqian Ming, Heng Cai, Zhe Zhang, Shaohua Wang, Shanzhen Gao, Joey Ying Lee, Xiao Li, Da Huo

    Abstract: Geocoding systems are widely used in both scientific research for spatial analysis and everyday life through location-based services. The quality of geocoded data significantly impacts subsequent processes and applications, underscoring the need for next-generation systems. In response to this demand, this review first examines the evolving requirements for geocoding inputs and outputs across vari…

    Submitted 24 March, 2025; originally announced March 2025.

  9. arXiv:2503.18091

    cond-mat.mes-hall cond-mat.dis-nn

    Fermi energy sensitive universal conductance fluctuations in anisotropic materials

    Authors: Qiang Yang, Yayun Hu, Zhe Hou, Peiqing Tong

    Abstract: Universal conductance fluctuations (UCF) are a hallmark of quantum interference in mesoscopic devices. According to the Altshuler-Lee-Stone theory, the amplitude of UCF remains independent of system parameters such as Fermi energy and disorder strength. However, recent experiments have demonstrated a significant variation in UCF with respect to Fermi energy in the anisotropic Dirac semimetal…

    Submitted 10 June, 2025; v1 submitted 23 March, 2025; originally announced March 2025.

    Comments: 19 pages, 11 figures

    Journal ref: Phys. Rev. B 111, 214203 (2025)

  10. arXiv:2503.17788

    cs.CV cs.AI

    Aligning Foundation Model Priors and Diffusion-Based Hand Interactions for Occlusion-Resistant Two-Hand Reconstruction

    Authors: Gaoge Han, Yongkang Cheng, Zhe Chen, Shaoli Huang, Tongliang Liu

    Abstract: Two-hand reconstruction from monocular images faces persistent challenges due to complex and dynamic hand postures and occlusions, causing significant difficulty in achieving plausible interaction alignment. Existing approaches struggle with such alignment issues, often resulting in misalignment and penetration artifacts. To tackle this, we propose a novel framework that attempts to precisely alig…

    Submitted 22 March, 2025; originally announced March 2025.

  11. arXiv:2503.16965

    cs.CL cs.CV

    Praxis-VLM: Vision-Grounded Decision Making via Text-Driven Reinforcement Learning

    Authors: Zhe Hu, Jing Li, Zhongzhu Pu, Hou Pong Chan, Yu Yin

    Abstract: Vision Language Models have exhibited immense potential for embodied AI, yet they often lack the sophisticated situational reasoning required for complex decision-making. This paper shows that VLMs can achieve surprisingly strong decision-making performance when visual scenes are represented merely as text-only descriptions, suggesting foundational reasoning can be effectively learned from language. Mo…

    Submitted 22 May, 2025; v1 submitted 21 March, 2025; originally announced March 2025.

  12. arXiv:2503.16886

    astro-ph.HE

    Insight-HXMT observations of the 2023 outburst in Aql X-1

    Authors: Zhe Yan, Guobao Zhang, Yu-Peng Chen, Mariano Méndez, Jirong Mao, Ming Lyu, Shu Zhang, Pei Jin

    Abstract: We conducted an analysis of the continuum during the onset and initial decline phases of the 2023 outburst in the transient neutron star low-mass X-ray binary Aql X-1 using broadband observations from the Insight-Hard X-ray Modulation Telescope (Insight-HXMT) instrument. To determine the most appropriate model for the continuum of this outburst, we employed three models to explore the evolu…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: 6 figures

  13. Giant Self Spin-Valve Effect in the Kagome Helimagnet

    Authors: Xitong Xu, Yonglai Liu, Kesen Zhao, Che-Min Lin, Miao He, Haitian Zhao, Qingqi Zeng, Yubin Hou, Qingyou Lu, Ding-Fu Shao, Shuang Jia, Haifeng Du, Wenjie Meng, Tay-Rong Chang, Zhe Qu

    Abstract: Kagome magnets can combine non-trivial band topology and electron correlations, offering a versatile playground for various quantum phenomena. In this work, we propose that kagome magnets with frustrated interlayer interactions can intrinsically support a self spin-valve effect, and experimentally confirm this in the kagome helimagnet TmMn$_6$Sn$_6$. Under a magnetic field perpendicular to the heli…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: Accepted version

    Journal ref: Nat. Commun. 16, 2630 (2025)

  14. arXiv:2503.15837

    cs.CL cs.AI

    Fùxì: A Benchmark for Evaluating Language Models on Ancient Chinese Text Understanding and Generation

    Authors: Shangqing Zhao, Yuhao Zhou, Yupei Ren, Zhe Chen, Chenghao Jia, Fang Zhe, Zhaogaung Long, Shu Liu, Man Lan

    Abstract: Ancient Chinese text processing presents unique challenges for large language models (LLMs) due to its distinct linguistic features, complex structural constraints, and rich cultural context. While existing benchmarks have primarily focused on evaluating comprehension through multiple-choice questions, there remains a critical gap in assessing models' generative capabilities in classical Chinese.…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: work in progress

  15. arXiv:2503.15558

    cs.AI cs.CV cs.LG cs.RO

    Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning

    Authors: NVIDIA: Alisson Azzolini, Junjie Bai, Hannah Brandon, Jiaxin Cao, Prithvijit Chattopadhyay, Huayu Chen, Jinju Chu, Yin Cui, Jenna Diamond, Yifan Ding, Liang Feng, Francesco Ferroni, Rama Govindaraju, Jinwei Gu, Siddharth Gururani, Imad El Hanafi, Zekun Hao, Jacob Huffman, Jingyi Jin, Brendan Johnson, Rizwan Khan, George Kurian, Elena Lantz, et al. (29 additional authors not shown)

    Abstract: Physical AI systems need to perceive, understand, and perform complex actions in the physical world. In this paper, we present the Cosmos-Reason1 models that can understand the physical world and generate appropriate embodied decisions (e.g., next step action) in natural language through long chain-of-thought reasoning processes. We begin by defining key capabilities for Physical AI reasoning, wit…

    Submitted 19 May, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

  16. arXiv:2503.15404

    cs.CV cs.CR

    Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement

    Authors: Yuchen Ren, Zhengyu Zhao, Chenhao Lin, Bo Yang, Lu Zhou, Zhe Liu, Chao Shen

    Abstract: Vision Transformers (ViTs) have been widely applied in various computer vision and vision-language tasks. To gain insights into their robustness in practical scenarios, transferable adversarial examples on ViTs have been extensively studied. A typical approach to improving adversarial transferability is by refining the surrogate model. However, existing work on ViTs has restricted their surrogate…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  17. arXiv:2503.15144

    cs.CV

    PointSFDA: Source-free Domain Adaptation for Point Cloud Completion

    Authors: Xing He, Zhe Zhu, Liangliang Nan, Honghua Chen, Jing Qin, Mingqiang Wei

    Abstract: Conventional methods for point cloud completion, typically trained on synthetic datasets, face significant challenges when applied to out-of-distribution real-world scans. In this paper, we propose an effective yet simple source-free domain adaptation framework for point cloud completion, termed PointSFDA. Unlike unsupervised domain adaptation that reduces the domain gap by directly lever…

    Submitted 19 March, 2025; originally announced March 2025.

  18. arXiv:2503.15091

    cs.RO cs.CV

    Intelligent Spatial Perception by Building Hierarchical 3D Scene Graphs for Indoor Scenarios with the Help of LLMs

    Authors: Yao Cheng, Zhe Han, Fengyang Jiang, Huaizhen Wang, Fengyu Zhou, Qingshan Yin, Lei Wei

    Abstract: This paper addresses the high demand in advanced intelligent robot navigation for a more holistic understanding of spatial environments, by introducing a novel system that harnesses the capabilities of Large Language Models (LLMs) to construct hierarchical 3D Scene Graphs (3DSGs) for indoor scenarios. The proposed framework constructs 3DSGs consisting of a fundamental layer with rich metric-semant…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: accepted by WRC SARA 2024

  19. arXiv:2503.14000

    cs.SE

    LLM-based Unit Test Generation for Dynamically-Typed Programs

    Authors: Runlin Liu, Zhe Zhang, Yunge Hu, Yuhang Lin, Xiang Gao, Hailong Sun

    Abstract: Automated unit test generation has been widely studied, but generating effective tests for dynamically typed programs remains a significant challenge. Existing approaches, including search-based software testing (SBST) and recent LLM-based methods, often suffer from type errors, leading to invalid inputs and assertion failures, ultimately reducing testing effectiveness. To address this, we propose…

    Submitted 18 March, 2025; originally announced March 2025.

  20. arXiv:2503.13880

    physics.optics physics.acc-ph

    The development of vibration modes propagation method to perform wave-optics simulation of beamline vibration

    Authors: Han Xu, Xiao Li, Ming Li, Zhe Ren, Yi Zhang, Peng Liu, Yuhui Dong, Liang Zhou

    Abstract: The evolution from 3rd to 4th generation synchrotron radiation (SR) sources provides promising potential improvements in X-ray techniques, particularly in spatial resolution for imaging, temporal resolution for dynamic studies, and beam size control for nanoprobes. Achieving these enhancements demands effective vibration suppression in beamline systems. This challenge drives the need for optical de…

    Submitted 18 March, 2025; originally announced March 2025.

  21. arXiv:2503.13848

    cs.AR cs.DC

    FlexStep: Enabling Flexible Error Detection in Multi/Many-core Real-time Systems

    Authors: Tinglue Wang, Yiming Li, Wei Tang, Jiapeng Guan, Zhenghui Guo, Renshuang Jiang, Ran Wei, Jing Li, Zhe Jiang

    Abstract: Reliability and real-time responsiveness in safety-critical systems have traditionally been achieved using error detection mechanisms, such as LockStep, which require pre-configured checker cores, strict synchronisation between main and checker cores, static error detection regions, or limited preemption capabilities. However, these core-bound hardware mechanisms often lead to significant resource…

    Submitted 17 March, 2025; originally announced March 2025.

  22. arXiv:2503.13555

    eess.IV cs.CV

    Feasibility study for reconstruction of knee MRI from one corresponding X-ray via CNN

    Authors: Zhe Wang, Aladine Chetouani, Rachid Jennane

    Abstract: Generally, X-ray, as an inexpensive and popular medical imaging technique, is widely chosen by medical practitioners. With the development of medical technology, Magnetic Resonance Imaging (MRI), an advanced medical imaging technique, has already become a supplementary option for diagnosing knee osteoarthritis (KOA). We propose in this paper a deep-learning-based approach for generating MRI from one co…

    Submitted 16 March, 2025; originally announced March 2025.

  23. arXiv:2503.13012

    cs.CV cs.AI

    Test-Time Domain Generalization via Universe Learning: A Multi-Graph Matching Approach for Medical Image Segmentation

    Authors: Xingguo Lv, Xingbo Dong, Liwen Wang, Jiewen Yang, Lei Zhao, Bin Pu, Zhe Jin, Xuejun Li

    Abstract: Although domain generalization (DG) has significantly addressed the performance degradation of pre-trained models caused by domain shifts, it often falls short in real-world deployment. Test-time adaptation (TTA), which adjusts a learned model using unlabeled test data, presents a promising solution. However, most existing TTA methods struggle to deliver strong performance in medical image segmenta…

    Submitted 17 March, 2025; originally announced March 2025.

  24. arXiv:2503.12652

    cs.CV

    UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing

    Authors: Tsu-Jui Fu, Yusu Qian, Chen Chen, Wenze Hu, Zhe Gan, Yinfei Yang

    Abstract: Text-to-Image (T2I) diffusion models have shown impressive results in generating visually compelling images following user prompts. Building on this, various methods further fine-tune the pre-trained T2I model for specific tasks. However, this requires separate model architectures, training designs, and multiple parameter sets to handle different tasks. In this paper, we introduce UniVG, a general…

    Submitted 22 April, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

  25. arXiv:2503.12482

    eess.SP

    Fuzzy Clustering for Low-Complexity Time Domain Chromatic Dispersion Compensation Scheme in Coherent Optical Fiber Communication Systems

    Authors: Wenkai Wan, Aiying Yang, Peng Guo, Zhe Zhao, Tianjia Xu, Jinxuan Wu, Zhiheng Liu

    Abstract: Chromatic dispersion compensation (CDC), implemented in either the time domain or the frequency domain, is crucial for enhancing power efficiency in the digital signal processing of modern optical fiber communication systems. Developing low-complexity CDC schemes is essential for hardware implementation, particularly for high-speed and long-haul optical fiber communication systems. In this work, we prop…

    Submitted 16 March, 2025; originally announced March 2025.

  26. arXiv:2503.12339

    cs.LG cs.AI

    Augmented Adversarial Trigger Learning

    Authors: Zhe Wang, Yanjun Qi

    Abstract: Gradient optimization-based adversarial attack methods automate the learning of adversarial triggers to generate jailbreak prompts or leak system prompts. In this work, we take a closer look at the optimization objective of adversarial trigger learning and propose ATLA: Adversarial Trigger Learning with Augmented objectives. ATLA improves the negative log-likelihood loss used by previous studies i…

    Submitted 15 March, 2025; originally announced March 2025.

  27. arXiv:2503.12278

    eess.SY

    Power Swing Trajectory Influenced by Virtual Impedance-Based Current-Limiting Strategy

    Authors: Yanshu Niu, Zhe Yang, Bikash C. Pal

    Abstract: Grid-forming (GFM) inverter-based resources (IBRs) can emulate the external characteristics of synchronous generators (SGs) through appropriate control loop design. However, in systems with GFM IBRs, the apparent impedance trajectory under current limitation differs significantly from that of SG-based systems due to the limited overcurrent capability of power electronic devices. This difference ch…

    Submitted 15 March, 2025; originally announced March 2025.

  28. arXiv:2503.12218

    cs.CV

    Adaptive Label Correction for Robust Medical Image Segmentation with Noisy Labels

    Authors: Chengxuan Qian, Kai Han, Siqi Ma, Chongwen Lyu, Zhenlong Yuan, Jun Chen, Zhe Liu

    Abstract: Deep learning has shown remarkable success in medical image analysis, but its reliance on large volumes of high-quality labeled data limits its applicability. While noisy labeled data are easier to obtain, directly incorporating them into training can degrade model performance. To address this challenge, we propose a Mean Teacher-based Adaptive Label Correction (ALC) self-ensemble framework for ro…

    Submitted 15 March, 2025; originally announced March 2025.

  29. arXiv:2503.12018

    cs.CV cs.AI

    Compose Your Aesthetics: Empowering Text-to-Image Models with the Principles of Art

    Authors: Zhe Jin, Tat-Seng Chua

    Abstract: Text-to-Image (T2I) diffusion models (DM) have garnered widespread adoption due to their capability in generating high-fidelity outputs and accessibility to anyone able to put imagination into words. However, DMs are often predisposed to generate unappealing outputs, much like the random images on the internet they were trained on. Existing approaches to address this are founded on the implicit pr…

    Submitted 15 March, 2025; originally announced March 2025.

  30. arXiv:2503.12006

    cs.CV

    ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object

    Authors: Zhe Shan, Yang Liu, Lei Zhou, Cheng Yan, Heng Wang, Xia Xie

    Abstract: The availability of large-scale remote sensing video data underscores the importance of high-quality interactive segmentation. However, challenges such as small object sizes, ambiguous features, and limited generalization make it difficult for current methods to achieve this goal. In this work, we propose ROS-SAM, a method designed to achieve high-quality interactive segmentation while preserving…

    Submitted 15 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR 2025

  31. arXiv:2503.11899

    cs.LG eess.SP

    Spatio-temporal Fourier Transformer (StFT) for Long-term Dynamics Prediction

    Authors: Da Long, Shandian Zhe, Samuel Williams, Leonid Oliker, Zhe Bai

    Abstract: Simulating the long-term dynamics of multi-scale and multi-physics systems poses a significant challenge in understanding complex phenomena across science and engineering. The complexity arises from the intricate interactions between scales and the interplay of diverse physical processes. Neural operators have emerged as promising models for predicting such dynamics due to their flexibility and co…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 16 pages, 10 figures

  32. arXiv:2503.11185

    cs.CR cs.AI

    Align in Depth: Defending Jailbreak Attacks via Progressive Answer Detoxification

    Authors: Yingjie Zhang, Tong Liu, Zhe Zhao, Guozhu Meng, Kai Chen

    Abstract: Large Language Models (LLMs) are vulnerable to jailbreak attacks, which use crafted prompts to elicit toxic responses. These attacks exploit LLMs' difficulty in dynamically detecting harmful intents during the generation process. Traditional safety alignment methods, often relying on the initial few generation steps, are ineffective due to limited computational budget. This paper proposes DEEPALIG…

    Submitted 14 March, 2025; originally announced March 2025.

  33. arXiv:2503.11182

    cs.CL

    Palette of Language Models: A Solver for Controlled Text Generation

    Authors: Zhe Yang, Yi Huang, Yaqin Chen, Xiaoting Wu, Junlan Feng, Chao Deng

    Abstract: Recent advancements in large language models have revolutionized text generation with their remarkable capabilities. These models can produce controlled texts that closely adhere to specific requirements when prompted appropriately. However, designing an optimal prompt to control multiple attributes simultaneously can be challenging. A common approach is to linearly combine single-attribute models…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: Accepted to NAACL 2025, Main, Long Paper

  34. arXiv:2503.10721

    cs.SE cs.AI

    From Understanding to Excelling: Template-Free Algorithm Design through Structural-Functional Co-Evolution

    Authors: Zhe Zhao, Haibin Wen, Pengkun Wang, Ye Wei, Zaixi Zhang, Xi Lin, Fei Liu, Bo An, Hui Xiong, Yang Wang, Qingfu Zhang

    Abstract: Large language models (LLMs) have greatly accelerated the automation of algorithm generation and optimization. However, current methods such as EoH and FunSearch mainly rely on predefined templates and expert-specified functions that focus solely on the local evolution of key functionalities. Consequently, they fail to fully leverage the synergistic benefits of the overall architecture and the pot…

    Submitted 13 March, 2025; originally announced March 2025.

    MSC Class: 68W20; 68T20

    ACM Class: I.2.7

  35. arXiv:2503.10299

    hep-ph hep-ex

    $D^{(*)}\bar{B}^{(*)}$ Dynamics in Chiral Effective Field Theory

    Authors: Zhe Liu, Hao Xu, Xiang Liu

    Abstract: In this work, we systematically study the interactions of the $S$-wave $D^{(*)}\bar{B}^{(*)}$ systems within the framework of chiral effective field theory in heavy hadron formalism. We calculate the $D^{(*)}\bar{B}^{(*)}$ effective potentials up to next-to-leading order, explore the bound state formations, and investigate the $D^{(*)}\bar{B}^{(*)}$ scattering properties such as scattering rate, s…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: 20 pages, 11 figures, 10 tables

  36. arXiv:2503.10291

    cs.CV cs.CL

    VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

    Authors: Weiyun Wang, Zhangwei Gao, Lianjie Chen, Zhe Chen, Jinguo Zhu, Xiangyu Zhao, Yangzhou Liu, Yue Cao, Shenglong Ye, Xizhou Zhu, Lewei Lu, Haodong Duan, Yu Qiao, Jifeng Dai, Wenhai Wang

    Abstract: We introduce VisualPRM, an advanced multimodal Process Reward Model (PRM) with 8B parameters, which improves the reasoning abilities of existing Multimodal Large Language Models (MLLMs) across different model scales and families with Best-of-N (BoN) evaluation strategies. Specifically, our model improves the reasoning performance of three types of MLLMs and four different model scales. Even when a…

    Submitted 13 March, 2025; originally announced March 2025.

  37. arXiv:2503.10239

    cs.CR

    I Can Tell Your Secrets: Inferring Privacy Attributes from Mini-app Interaction History in Super-apps

    Authors: Yifeng Cai, Ziqi Zhang, Mengyu Yao, Junlin Liu, Xiaoke Zhao, Xinyi Fu, Ruoyu Li, Zhe Li, Xiangqun Chen, Yao Guo, Ding Li

    Abstract: Super-apps have emerged as comprehensive platforms integrating various mini-apps to provide diverse services. While super-apps offer convenience and enriched functionality, they can introduce new privacy risks. This paper reveals a new privacy leakage source in super-apps: mini-app interaction history, including mini-app usage history (Mini-H) and operation history (Op-H). Mini-H refers to the his…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: Accepted by USENIX Security 2025

  38. arXiv:2503.09499

    cs.CV cs.AI cs.CL

    MindGYM: What Matters in Question Synthesis for Thinking-Centric Fine-Tuning?

    Authors: Zhe Xu, Daoyuan Chen, Zhenqing Ling, Yaliang Li, Ying Shen

    Abstract: Large foundation models face challenges in acquiring transferable, structured thinking abilities, especially when supervised with rigid templates or crowd-annotated instruction datasets. Unlike prior approaches, we focus on a thinking-centric data synthesis paradigm that enables models to evolve through self-generated, cognitively guided data. We propose MindGYM, a structured and scalable framewor…

    Submitted 22 May, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

    Comments: 22 pages, 7 tables

  39. arXiv:2503.09394

    cs.CV

    Bidirectional Prototype-Reward co-Evolution for Test-Time Adaptation of Vision-Language Models

    Authors: Xiaozhen Qiao, Peng Huang, Jiakang Yuan, Xianda Guo, Bowen Ye, Zhe Sun, Xuelong Li

    Abstract: Test-time adaptation (TTA) is crucial in maintaining Vision-Language Models (VLMs) performance when facing real-world distribution shifts, particularly when the source data or target labels are inaccessible. Existing TTA methods rely on CLIP's output probability distribution for feature evaluation, which can introduce biases under domain shifts. This misalignment may cause features to be misclassi…

    Submitted 12 March, 2025; originally announced March 2025.

  40. arXiv:2503.08710

    eess.IV cs.CV

    Large model enhanced computational ghost imaging

    Authors: Yifan Chen, Hongjun An, Zhe Sun, Tong Tian, Mingliang Chen, Christian Spielmann, Xuelong Li

    Abstract: Ghost imaging (GI) achieves 2D image reconstruction through high-order correlation of 1D bucket signals and 2D light field information, particularly demonstrating enhanced detection sensitivity and high-quality image reconstruction via efficient photon collection in scattering media. Recent investigations have established that deep learning (DL) can substantially enhance the ghost imaging reconstr…

    Submitted 9 March, 2025; originally announced March 2025.

  41. arXiv:2503.08377

    cs.CV

    Layton: Latent Consistency Tokenizer for 1024-pixel Image Reconstruction and Generation by 256 Tokens

    Authors: Qingsong Xie, Zhao Zhang, Zhe Huang, Yanhao Zhang, Haonan Lu, Zhenyu Yang

    Abstract: Image tokenization has significantly advanced visual generation and multimodal modeling, particularly when paired with autoregressive models. However, current methods face challenges in balancing efficiency and fidelity: high-resolution image reconstruction either requires an excessive number of tokens or compromises critical details through token reduction. To resolve this, we propose Latent Cons…

    Submitted 13 March, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

  42. arXiv:2503.08354

    cs.CV cs.AI

    Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis

    Authors: Kai Qiu, Xiang Li, Jason Kuen, Hao Chen, Xiaohao Xu, Jiuxiang Gu, Yinyi Luo, Bhiksha Raj, Zhe Lin, Marios Savvides

    Abstract: Recent image generation schemes typically capture image distribution in a pre-constructed latent space relying on a frozen image tokenizer. Though the tokenizer's performance plays an essential role in successful generation, its current evaluation metrics (e.g. rFID) fail to precisely assess the tokenizer and correlate its performance to the generation quality (e.g. gFID). In this paper, we c…

    Submitted 17 March, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

    Comments: 17 pages, 13 figures, 6 tables

  43. arXiv:2503.08338  [pdf, other]

    cs.RO

    Trinity: A Modular Humanoid Robot AI System

    Authors: Jingkai Sun, Qiang Zhang, Gang Han, Wen Zhao, Zhe Yong, Yan He, Jiaxu Wang, Jiahang Cao, Yijie Guo, Renjing Xu

    Abstract: In recent years, research on humanoid robots has garnered increasing attention. With breakthroughs in various types of artificial intelligence algorithms, embodied intelligence, exemplified by humanoid robots, has been highly anticipated. The advancements in reinforcement learning (RL) algorithms have significantly improved the motion control and generalization capabilities of humanoid robots. Sim…

    Submitted 11 March, 2025; originally announced March 2025.

  44. arXiv:2503.08219  [pdf, other]

    cs.CV cs.AI

    CL-MVSNet: Unsupervised Multi-view Stereo with Dual-level Contrastive Learning

    Authors: Kaiqiang Xiong, Rui Peng, Zhe Zhang, Tianxing Feng, Jianbo Jiao, Feng Gao, Ronggang Wang

    Abstract: Unsupervised Multi-View Stereo (MVS) methods have achieved promising progress recently. However, previous methods primarily depend on the photometric consistency assumption, which may suffer from two limitations: indistinguishable regions and view-dependent effects, e.g., low-textured areas and reflections. To address these issues, in this paper, we propose a new dual-level contrastive learning ap…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: Accepted by ICCV 2023

  45. arXiv:2503.08037  [pdf, other]

    cs.GR cs.AI cs.CV

    ObjectMover: Generative Object Movement with Video Prior

    Authors: Xin Yu, Tianyu Wang, Soo Ye Kim, Paul Guerrero, Xi Chen, Qing Liu, Zhe Lin, Xiaojuan Qi

    Abstract: Simple as it seems, moving an object to another location within an image is, in fact, a challenging image-editing task that requires re-harmonizing the lighting, adjusting the pose based on perspective, accurately filling occluded regions, and ensuring coherent synchronization of shadows and reflections while maintaining the object identity. In this paper, we present ObjectMover, a generative mode…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: CVPR 2025, Project Page: https://xinyu-andy.github.io/ObjMover

  46. arXiv:2503.08019  [pdf, other]

    cs.CV

    Multi-Cue Adaptive Visual Token Pruning for Large Vision-Language Models

    Authors: Bozhi Luan, Wengang Zhou, Hao Feng, Zhe Wang, Xiaosong Li, Houqiang Li

    Abstract: As the computational needs of Large Vision-Language Models (LVLMs) increase, visual token pruning has proven effective in improving inference speed and memory efficiency. Traditional pruning methods in LVLMs predominantly focus on attention scores to determine token relevance, overlooking critical aspects such as spatial position and token similarity. To this end, we introduce AdaptPrune, a novel…

    Submitted 10 March, 2025; originally announced March 2025.

  47. arXiv:2503.07891  [pdf, other]

    cs.CL cs.AI

    Gemini Embedding: Generalizable Embeddings from Gemini

    Authors: Jinhyuk Lee, Feiyang Chen, Sahil Dua, Daniel Cer, Madhuri Shanbhogue, Iftekhar Naim, Gustavo Hernández Ábrego, Zhe Li, Kaifeng Chen, Henrique Schechter Vera, Xiaoqi Ren, Shanfeng Zhang, Daniel Salz, Michael Boratko, Jay Han, Blair Chen, Shuo Huang, Vikram Rao, Paul Suganthan, Feng Han, Andreas Doumanoglou, Nithi Gupta, Fedor Moiseev, Cathy Yip, Aashi Jain , et al. (22 additional authors not shown)

    Abstract: In this report, we introduce Gemini Embedding, a state-of-the-art embedding model leveraging the power of Gemini, Google's most capable large language model. Capitalizing on Gemini's inherent multilingual and code understanding capabilities, Gemini Embedding produces highly generalizable embeddings for text spanning numerous languages and textual modalities. The representations generated by Gemini…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: 19 pages

  48. arXiv:2503.07661  [pdf, other]

    cs.LG cs.AI cs.CR

    Disrupting Model Merging: A Parameter-Level Defense Without Sacrificing Accuracy

    Authors: Wei Junhao, Yu Zhe, Sakuma Jun

    Abstract: Model merging is a technique that combines multiple finetuned models into a single model without additional training, allowing a free-rider to cheaply inherit specialized capabilities. This study investigates methodologies to suppress unwanted model merging by free-riders. Existing methods such as model watermarking or fingerprinting can only detect merging in hindsight. In contrast, we propose a…

    Submitted 8 March, 2025; originally announced March 2025.

    Comments: 12 pages, 7 figures

  49. arXiv:2503.07478  [pdf, other]

    cs.CV

    VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models

    Authors: Jiacheng Ruan, Wenzhen Yuan, Xian Gao, Ye Guo, Daoxin Zhang, Zhe Xu, Yao Hu, Ting Liu, Yuzhuo Fu

    Abstract: Although large visual-language models (LVLMs) have demonstrated strong performance in multimodal tasks, errors may occasionally arise due to biases during the reasoning process. Recently, reward models (RMs) have become increasingly pivotal in the reasoning process. Specifically, process RMs evaluate each reasoning step, outcome RMs focus on the assessment of reasoning results, and critique RMs pe…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: 12 pages, 4 figures. This work is in progress

  50. arXiv:2503.07174  [pdf, other]

    cond-mat.str-el

    Quantum spin dynamics of the honeycomb magnet K$_2$Co$_2$TeO$_6$ in high magnetic fields

    Authors: Patrick Pilch, Laur Peedu, Urmas Nagel, Toomas Rõõm, Changqing Zhu, Yurii Skourski, Xianghan Xu, Robert J. Cava, Zhe Wang

    Abstract: We present terahertz spectroscopic measurements of quantum spin dynamics in the honeycomb magnet K$_2$Co$_2$TeO$_6$ as a function of temperature, polarization and in an external magnetic field applied in the honeycomb plane. Magnetic excitations are resolved below the magnetic ordering temperature of $T_\text{N}$ = 12 K. In the applied magnetic field, we reveal characteristic field dependence not…

    Submitted 10 March, 2025; originally announced March 2025.