
Showing 1–50 of 2,039 results for author: Hu, Y

Searching in archive cs.
  1. arXiv:2505.10349  [pdf, ps, other]

    cs.CR

    Locally Differentially Private Frequency Estimation via Joint Randomized Response

    Authors: Ye Zheng, Shafizur Rahman Seeam, Yidan Hu, Rui Zhang, Yanchao Zhang

    Abstract: Local Differential Privacy (LDP) has been widely recognized as a powerful tool for providing a strong theoretical guarantee of data privacy to data contributors against an untrusted data collector. Under a typical LDP scheme, each data contributor independently randomly perturbs their data before submitting them to the data collector, which in turn infers valuable statistics about the original dat…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: Accepted by PETS'25 (issue 3)

    ACM Class: E.3

  2. arXiv:2505.09998  [pdf, other]

    cs.CV

    From Air to Wear: Personalized 3D Digital Fashion with AR/VR Immersive 3D Sketching

    Authors: Ying Zang, Yuanqi Hu, Xinyu Chen, Yuxia Xu, Suhui Wang, Chunan Yu, Lanyun Zhu, Deyi Ji, Xin Xu, Tianrun Chen

    Abstract: In the era of immersive consumer electronics, such as AR/VR headsets and smart devices, people increasingly seek ways to express their identity through virtual fashion. However, existing 3D garment design tools remain inaccessible to everyday users due to steep technical barriers and limited data. In this work, we introduce a 3D sketch-driven 3D garment generation framework that empowers ordinary…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 8 pages, 5 figures

  3. arXiv:2505.09568  [pdf, ps, other]

    cs.CV cs.AI

    BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

    Authors: Jiuhai Chen, Zhiyang Xu, Xichen Pan, Yushi Hu, Can Qin, Tom Goldstein, Lifu Huang, Tianyi Zhou, Saining Xie, Silvio Savarese, Le Xue, Caiming Xiong, Ran Xu

    Abstract: Unifying image understanding and generation has gained growing attention in recent research on multimodal models. Although design choices for image understanding have been extensively studied, the optimal model architecture and training recipe for a unified framework with image generation remain underexplored. Motivated by the strong potential of autoregressive and diffusion models for high-qualit…

    Submitted 14 May, 2025; originally announced May 2025.

  4. arXiv:2505.09456  [pdf, ps, other]

    quant-ph cs.AI cs.LG

    Quantum state-agnostic work extraction (almost) without dissipation

    Authors: Josep Lumbreras, Ruo Cheng Huang, Yanglin Hu, Mile Gu, Marco Tomamichel

    Abstract: We investigate work extraction protocols designed to transfer the maximum possible energy to a battery using sequential access to $N$ copies of an unknown pure qubit state. The core challenge is designing interactions to optimally balance two competing goals: charging of the battery optimally using the qubit in hand, and acquiring more information by qubit to improve energy harvesting in subsequen…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 5 pages+14 pages, 2 figures

  5. arXiv:2505.09430  [pdf, ps, other]

    cs.RO cs.LG

    Train a Multi-Task Diffusion Policy on RLBench-18 in One Day with One GPU

    Authors: Yutong Hu, Pinhao Song, Kehan Wen, Renaud Detry

    Abstract: We present a method for training multi-task vision-language robotic diffusion policies that reduces training time and memory usage by an order of magnitude. This improvement arises from a previously underexplored distinction between action diffusion and the image diffusion techniques that inspired it: image generation targets are high-dimensional, while robot actions lie in a much lower-dimensiona…

    Submitted 14 May, 2025; originally announced May 2025.

  6. arXiv:2505.09259  [pdf, ps, other]

    cs.NI

    Interplay Between AI and Space-Air-Ground Integrated Network: The Road Ahead

    Authors: Chenyu Wu, Xi Wang, Yi Hu, Shuai Han, Dusit Niyato

    Abstract: Space-air-ground integrated network (SAGIN) is envisioned as a key network architecture for achieving ubiquitous coverage in the next-generation communication system. Concurrently, artificial intelligence (AI) plays a pivotal role in managing the complex control of SAGIN, thereby enhancing its automation and flexibility. Despite this, there remains a significant research gap concerning the interac…

    Submitted 14 May, 2025; originally announced May 2025.

  7. arXiv:2505.08838  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    Ultrasound Report Generation with Multimodal Large Language Models for Standardized Texts

    Authors: Peixuan Ge, Tongkun Su, Faqin Lv, Baoliang Zhao, Peng Zhang, Chi Hong Wong, Liang Yao, Yu Sun, Zenan Wang, Pak Kin Wong, Ying Hu

    Abstract: Ultrasound (US) report generation is a challenging task due to the variability of US images, operator dependence, and the need for standardized text. Unlike X-ray and CT, US imaging lacks consistent datasets, making automation difficult. In this study, we propose a unified framework for multi-organ and multilingual US report generation, integrating fragment-based multilingual training and leveragi…

    Submitted 13 May, 2025; originally announced May 2025.

  8. arXiv:2505.08765  [pdf, other]

    cs.CV cs.AI

    Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology

    Authors: Yatai Ji, Zhengqiu Zhu, Yong Zhao, Beidan Liu, Chen Gao, Yihao Zhao, Sihang Qiu, Yue Hu, Quanjun Yin, Yong Li

    Abstract: Aerial Visual Object Search (AVOS) tasks in urban environments require Unmanned Aerial Vehicles (UAVs) to autonomously search for and identify target objects using visual and textual cues without external guidance. Existing approaches struggle in complex urban environments due to redundant semantic processing, similar object distinction, and the exploration-exploitation dilemma. To bridge this gap…

    Submitted 13 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

  9. arXiv:2505.08557  [pdf, ps, other]

    cs.LG

    Online Learning and Unlearning

    Authors: Yaxi Hu, Bernhard Schölkopf, Amartya Sanyal

    Abstract: We formalize the problem of online learning-unlearning, where a model is updated sequentially in an online setting while accommodating unlearning requests between updates. After a data point is unlearned, all subsequent outputs must be statistically indistinguishable from those of a model trained without that point. We present two online learner-unlearner (OLU) algorithms, both built upon online g…

    Submitted 13 May, 2025; originally announced May 2025.

  10. arXiv:2505.08229  [pdf, other]

    cs.RO eess.SY

    Constrained Factor Graph Optimization for Robust Networked Pedestrian Inertial Navigation

    Authors: Yingjie Hu, Wang Hu

    Abstract: This paper presents a novel constrained Factor Graph Optimization (FGO)-based approach for networked inertial navigation in pedestrian localization. To effectively mitigate the drift inherent in inertial navigation solutions, we incorporate kinematic constraints directly into the nonlinear optimization framework. Specifically, we utilize equality constraints, such as Zero-Velocity Updates (ZUPTs),…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 6 pages, 5 figures. Accepted by 2025 IEEE/ION Position, Location and Navigation Symposium (PLANS)

  11. arXiv:2505.07294  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    HuB: Learning Extreme Humanoid Balance

    Authors: Tong Zhang, Boyuan Zheng, Ruiqian Nai, Yingdong Hu, Yen-Jen Wang, Geng Chen, Fanqi Lin, Jiongye Li, Chuye Hong, Koushil Sreenath, Yang Gao

    Abstract: The human body demonstrates exceptional motor capabilities-such as standing steadily on one foot or performing a high kick with the leg raised over 1.5 meters-both requiring precise balance control. While recent research on humanoid control has leveraged reinforcement learning to track human motions for skill acquisition, applying this paradigm to balance-intensive tasks remains challenging. In th…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Project website: https://hub-robot.github.io

  12. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed with a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  13. arXiv:2505.06678  [pdf, other]

    cs.NI eess.SP

    Distributionally Robust Contract Theory for Edge AIGC Services in Teleoperation

    Authors: Zijun Zhan, Yaxian Dong, Daniel Mawunyo Doe, Yuqing Hu, Shuai Li, Shaohua Cao, Lei Fan, Zhu Han

    Abstract: Advanced AI-Generated Content (AIGC) technologies have injected new impetus into teleoperation, further enhancing its security and efficiency. Edge AIGC networks have been introduced to meet the stringent low-latency requirements of teleoperation. However, the inherent uncertainty of AIGC service quality and the need to incentivize AIGC service providers (ASPs) make the design of a robust incentiv…

    Submitted 10 May, 2025; originally announced May 2025.

  14. arXiv:2505.05336  [pdf, other]

    cs.CV

    Progressive Inertial Poser: Progressive Real-Time Kinematic Chain Estimation for 3D Full-Body Pose from Three IMU Sensors

    Authors: Zunjie Zhu, Yan Zhao, Yihan Hu, Guoxiang Wang, Hai Qiu, Bolun Zheng, Chenggang Yan, Feng Xu

    Abstract: The motion capture system that supports full-body virtual representation is of key significance for virtual reality. Compared to vision-based systems, full-body pose estimation from sparse tracking signals is not limited by environmental conditions or recording range. However, previous works either face the challenge of wearing additional sensors on the pelvis and lower-body or rely on external vi…

    Submitted 8 May, 2025; originally announced May 2025.

  15. arXiv:2505.05195  [pdf, other]

    cs.LG cs.AI cs.CV

    Concept-Based Unsupervised Domain Adaptation

    Authors: Xinyue Xu, Yueying Hu, Hui Tang, Yi Qin, Lu Mi, Hao Wang, Xiaomeng Li

    Abstract: Concept Bottleneck Models (CBMs) enhance interpretability by explaining predictions through human-understandable concepts but typically assume that training and test data share the same distribution. This assumption often fails under domain shifts, leading to degraded performance and poor generalization. To address these limitations and improve the robustness of CBMs, we propose the Concept-based…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025

  16. arXiv:2505.05098  [pdf, ps, other]

    cs.RO cs.CL cs.CV cs.ET

    X-Driver: Explainable Autonomous Driving with Vision-Language Models

    Authors: Wei Liu, Jiyuan Zhang, Binxiong Zheng, Yufeng Hu, Yingzhan Lin, Zengfeng Zeng

    Abstract: End-to-end autonomous driving has advanced significantly, offering benefits such as system simplicity and stronger driving performance in both open-loop and closed-loop settings than conventional pipelines. However, existing frameworks still suffer from low success rates in closed-loop evaluations, highlighting their limitations in real-world deployment. In this paper, we introduce X-Driver, a uni…

    Submitted 8 May, 2025; originally announced May 2025.

  17. arXiv:2505.04087  [pdf, other]

    cs.CV

    SEVA: Leveraging Single-Step Ensemble of Vicinal Augmentations for Test-Time Adaptation

    Authors: Zixuan Hu, Yichun Hu, Ling-Yu Duan

    Abstract: Test-Time adaptation (TTA) aims to enhance model robustness against distribution shifts through rapid model adaptation during inference. While existing TTA methods often rely on entropy-based unsupervised training and achieve promising results, the common practice of a single round of entropy training is typically unable to adequately utilize reliable samples, hindering adaptation efficiency. In t…

    Submitted 6 May, 2025; originally announced May 2025.

  18. arXiv:2505.03769  [pdf, other]

    cs.SI cs.AI cs.IR

    The Influence of Text Variation on User Engagement in Cross-Platform Content Sharing

    Authors: Yibo Hu, Yiqiao Jin, Meng Ye, Ajay Divakaran, Srijan Kumar

    Abstract: In today's cross-platform social media landscape, understanding factors that drive engagement for multimodal content, especially text paired with visuals, remains complex. This study investigates how rewriting Reddit post titles adapted from YouTube video titles affects user engagement. First, we build and analyze a large dataset of Reddit posts sharing YouTube videos, revealing that 21% of post t…

    Submitted 26 April, 2025; originally announced May 2025.

  19. arXiv:2505.03694  [pdf, other]

    cs.RO cs.AI

    Demonstrating ViSafe: Vision-enabled Safety for High-speed Detect and Avoid

    Authors: Parv Kapoor, Ian Higgins, Nikhil Keetha, Jay Patrikar, Brady Moon, Zelin Ye, Yao He, Ivan Cisneros, Yaoyu Hu, Changliu Liu, Eunsuk Kang, Sebastian Scherer

    Abstract: Assured safe-separation is essential for achieving seamless high-density operation of airborne vehicles in a shared airspace. To equip resource-constrained aerial systems with this safety-critical capability, we present ViSafe, a high-speed vision-only airborne collision avoidance system. ViSafe offers a full-stack solution to the Detect and Avoid (DAA) problem by tightly integrating a learning-ba…

    Submitted 8 May, 2025; v1 submitted 6 May, 2025; originally announced May 2025.

    Comments: 13 pages, RSS 2025 Demo track, https://theairlab.org/visafe/

  20. arXiv:2505.03426  [pdf, other]

    cs.CV cs.AI

    Phenotype-Guided Generative Model for High-Fidelity Cardiac MRI Synthesis: Advancing Pretraining and Clinical Applications

    Authors: Ziyu Li, Yujian Hu, Zhengyao Ding, Yiheng Mao, Haitao Li, Fan Yi, Hongkun Zhang, Zhengxing Huang

    Abstract: Cardiac Magnetic Resonance (CMR) imaging is a vital non-invasive tool for diagnosing heart diseases and evaluating cardiac health. However, the limited availability of large-scale, high-quality CMR datasets poses a major challenge to the effective application of artificial intelligence (AI) in this domain. Even the amount of unlabeled data and the health status it covers are difficult to meet the…

    Submitted 6 May, 2025; originally announced May 2025.

  21. arXiv:2505.02185  [pdf, other]

    q-fin.PM cs.LG econ.EM stat.ME stat.ML

    Latent Variable Estimation in Bayesian Black-Litterman Models

    Authors: Thomas Y. L. Lin, Jerry Yao-Chieh Hu, Paul W. Chiou, Peter Lin

    Abstract: We revisit the Bayesian Black-Litterman (BL) portfolio model and remove its reliance on subjective investor views. Classical BL requires an investor "view": a forecast vector $q$ and its uncertainty matrix $Ω$ that describe how much a chosen portfolio should outperform the market. Our key idea is to treat $(q,Ω)$ as latent variables and learn them from market data within a single Bayesian network.…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: Accepted at ICML 2025

  22. arXiv:2505.01687  [pdf, other]

    cs.IT eess.SP

    Resilient Vehicular Communications under Imperfect Channel State Information

    Authors: Tingyu Shui, Walid Saad, Ye Hu, Mingzhe Chen

    Abstract: Cellular vehicle-to-everything (C-V2X) networks provide a promising solution to improve road safety and traffic efficiency. One key challenge in such systems lies in meeting quality-of-service (QoS) requirements of vehicular communication links given limited network resources, particularly under imperfect channel state information (CSI) conditions caused by the highly dynamic environment. In this…

    Submitted 3 May, 2025; originally announced May 2025.

  23. arXiv:2505.01652  [pdf, other]

    cs.LG cs.AI

    Causally Fair Node Classification on Non-IID Graph Data

    Authors: Yucong Dai, Lu Zhang, Yaowei Hu, Susan Gauch, Yongkai Wu

    Abstract: Fair machine learning seeks to identify and mitigate biases in predictions against unfavorable populations characterized by demographic attributes, such as race and gender. Recently, a few works have extended fairness to graph data, such as social networks, but most of them neglect the causal relationships among data instances. This paper addresses the prevalent challenge in fairness-aware ML algo…

    Submitted 2 May, 2025; originally announced May 2025.

  24. arXiv:2505.01572  [pdf, other]

    cs.AI cs.DC

    PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding

    Authors: Bradley McDanel, Sai Qian Zhang, Yunhai Hu, Zining Liu

    Abstract: Speculative decoding accelerates large language model inference by using smaller draft models to generate candidate tokens for parallel verification. However, current approaches are limited by sequential stage dependencies that prevent full hardware utilization. We present PipeSpec, a framework that generalizes speculative decoding to $k$ models arranged in a hierarchical pipeline, enabling asynch…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: 10 pages, 5 figures, 2 tables

  25. arXiv:2505.00598  [pdf, ps, other]

    cs.LG cs.AI

    Fast and Low-Cost Genomic Foundation Models via Outlier Removal

    Authors: Haozheng Luo, Chenghao Qiu, Maojiang Su, Zhihan Zhou, Zoe Mehta, Guo Ye, Jerry Yao-Chieh Hu, Han Liu

    Abstract: To address the challenge of scarce computational resources in genomic modeling, we introduce GERM, a genomic foundation model with strong compression performance and fast adaptability. GERM improves upon models like DNABERT-2 by eliminating outliers that hinder low-rank adaptation and post-training quantization, enhancing both efficiency and robustness. We replace the vanilla attention layer with…

    Submitted 2 May, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

    Comments: International Conference on Machine Learning (ICML) 2025

  26. arXiv:2505.00359  [pdf, other]

    cs.LG cs.AI cs.NE

    TNStream: Applying Tightest Neighbors to Micro-Clusters to Define Multi-Density Clusters in Streaming Data

    Authors: Qifen Zeng, Haomin Bao, Yuanzhuo Hu, Zirui Zhang, Yuheng Zheng, Luosheng Wen

    Abstract: In data stream clustering, systematic theory of stream clustering algorithms remains relatively scarce. Recently, density-based methods have gained attention. However, existing algorithms struggle to simultaneously handle arbitrarily shaped, multi-density, high-dimensional data while maintaining strong outlier resistance. Clustering quality significantly deteriorates when data density varies compl…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: 21 pages, 9 figures, 8 tables, under review at Expert Systems with Applications (ESWA)

    MSC Class: 68T05; 68W20

    ACM Class: H.2.8; I.5.3

  27. arXiv:2505.00304  [pdf, other]

    stat.ML cs.LG stat.ME

    Reinforcement Learning with Continuous Actions Under Unmeasured Confounding

    Authors: Yuhan Li, Eugene Han, Yifan Hu, Wenzhuo Zhou, Zhengling Qi, Yifan Cui, Ruoqing Zhu

    Abstract: This paper addresses the challenge of offline policy learning in reinforcement learning with continuous action spaces when unmeasured confounders are present. While most existing research focuses on policy evaluation within partially observable Markov decision processes (POMDPs) and assumes discrete action spaces, we advance this field by establishing a novel identification result to enable the no…

    Submitted 1 May, 2025; originally announced May 2025.

  28. arXiv:2504.20763  [pdf, other]

    cs.SE

    Understanding Large Language Model Supply Chain: Structure, Domain, and Vulnerabilities

    Authors: Yanzhe Hu, Shenao Wang, Tianyuan Nie, Yanjie Zhao, Haoyu Wang

    Abstract: Large Language Models (LLMs) have revolutionized artificial intelligence (AI), driving breakthroughs in natural language understanding, text generation, and autonomous systems. However, the rapid growth of LLMs presents significant challenges in the security and reliability of the Large Language Model Supply Chain (LLMSC), a complex network of open-source components, libraries, and tools essential…

    Submitted 29 April, 2025; originally announced April 2025.

  29. arXiv:2504.20624  [pdf, other]

    cs.AI

    PaRT: Enhancing Proactive Social Chatbots with Personalized Real-Time Retrieval

    Authors: Zihan Niu, Zheyong Xie, Shaosheng Cao, Chonggang Lu, Zheyu Ye, Tong Xu, Zuozhu Liu, Yan Gao, Jia Chen, Zhe Xu, Yi Wu, Yao Hu

    Abstract: Social chatbots have become essential intelligent companions in daily scenarios ranging from emotional support to personal interaction. However, conventional chatbots with passive response mechanisms usually rely on users to initiate or sustain dialogues by bringing up new topics, resulting in diminished engagement and shortened dialogue duration. In this paper, we present PaRT, a novel framework…

    Submitted 29 April, 2025; originally announced April 2025.

  30. arXiv:2504.20607  [pdf, other]

    cs.CV

    EfficientHuman: Efficient Training and Reconstruction of Moving Human using Articulated 2D Gaussian

    Authors: Hao Tian, Rui Liu, Wen Shen, Yilong Hu, Zhihao Zheng, Xiaolin Qin

    Abstract: 3D Gaussian Splatting (3DGS) has been recognized as a pioneering technique in scene reconstruction and novel view synthesis. Recent work on reconstructing the 3D human body using 3DGS attempts to leverage prior information on human pose to enhance rendering quality and improve training speed. However, it struggles to effectively fit dynamic surface planes due to multi-view inconsistency and redund…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: 11 pages, 3 figures

  31. arXiv:2504.19959  [pdf, ps, other]

    cs.AR

    From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification

    Authors: Junhao Ye, Yuchen Hu, Ke Xu, Dingrong Pan, Qichun Chen, Jie Zhou, Shuai Zhao, Xinwei Fang, Xi Wang, Nan Guan, Zhe Jiang

    Abstract: Verification presents a major bottleneck in Integrated Circuit (IC) development, consuming nearly 70% of the total development effort. While the Universal Verification Methodology (UVM) is widely used in industry to improve verification efficiency through structured and reusable testbenches, constructing these testbenches and generating sufficient stimuli remain challenging. These challenges arise…

    Submitted 28 April, 2025; v1 submitted 28 April, 2025; originally announced April 2025.

  32. arXiv:2504.19901  [pdf, other]

    cs.LG cs.AI stat.ML

    Attention Mechanism, Max-Affine Partition, and Universal Approximation

    Authors: Hude Liu, Jerry Yao-Chieh Hu, Zhao Song, Han Liu

    Abstract: We establish the universal approximation capability of single-layer, single-head self- and cross-attention mechanisms with minimal attached structures. Our key insight is to interpret single-head attention as an input domain-partition mechanism that assigns distinct values to subregions. This allows us to engineer the attention weights such that this assignment imitates the target function. Buildi…

    Submitted 28 April, 2025; originally announced April 2025.

  33. arXiv:2504.19637  [pdf, other]

    cs.CV

    Exploiting Inter-Sample Correlation and Intra-Sample Redundancy for Partially Relevant Video Retrieval

    Authors: Junlong Ren, Gangjian Zhang, Yu Hu, Jian Shu, Hao Wang

    Abstract: Partially Relevant Video Retrieval (PRVR) aims to retrieve the target video that is partially relevant to the text query. The primary challenge in PRVR arises from the semantic asymmetry between textual and visual modalities, as videos often contain substantial content irrelevant to the query. Existing methods coarsely align paired videos and text queries to construct the semantic space, neglectin…

    Submitted 28 April, 2025; originally announced April 2025.

  34. arXiv:2504.19099  [pdf, other]

    cs.SE cs.AI cs.AR

    VeriDebug: A Unified LLM for Verilog Debugging via Contrastive Embedding and Guided Correction

    Authors: Ning Wang, Bingkun Yao, Jie Zhou, Yuchen Hu, Xi Wang, Nan Guan, Zhe Jiang

    Abstract: Large Language Models (LLMs) have demonstrated remarkable potential in debugging for various programming languages. However, the application of LLMs to Verilog debugging remains insufficiently explored. Here, we present VeriDebug, an approach that integrates contrastive representation and guided correction capabilities for automated Verilog debugging. Unlike existing methods, VeriDebug employs an…

    Submitted 27 April, 2025; originally announced April 2025.

  35. arXiv:2504.18509  [pdf, other]

    cs.CV

    Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation

    Authors: Shivam Duggal, Yushi Hu, Oscar Michel, Aniruddha Kembhavi, William T. Freeman, Noah A. Smith, Ranjay Krishna, Antonio Torralba, Ali Farhadi, Wei-Chiu Ma

    Abstract: Despite the unprecedented progress in the field of 3D generation, current systems still often fail to produce high-quality 3D assets that are visually appealing and geometrically and semantically consistent across multiple viewpoints. To effectively assess the quality of the generated 3D data, there is a need for a reliable 3D evaluation tool. Unfortunately, existing 3D evaluation metrics often ov…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: CVPR 2025. Project page and codes: https://eval3d.github.io/

  36. arXiv:2504.17789  [pdf, other]

    cs.CV

    Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models

    Authors: Xu Ma, Peize Sun, Haoyu Ma, Hao Tang, Chih-Yao Ma, Jialiang Wang, Kunpeng Li, Xiaoliang Dai, Yujun Shi, Xuan Ju, Yushi Hu, Artsiom Sanakoyeu, Felix Juefei-Xu, Ji Hou, Junjiao Tian, Tao Xu, Tingbo Hou, Yen-Cheng Liu, Zecheng He, Zijian He, Matt Feiszli, Peizhao Zhang, Peter Vajda, Sam Tsai, Yun Fu

    Abstract: Autoregressive (AR) models, long dominant in language generation, are increasingly applied to image synthesis but are often considered less competitive than Diffusion-based models. A primary limitation is the substantial number of image tokens required for AR models, which constrains both training and inference efficiency, as well as image resolution. To address this, we present Token-Shuffle, a n…

    Submitted 27 April, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

    Comments: Project Page: https://ma-xu.github.io/token-shuffle/ Add related works

  37. arXiv:2504.17705  [pdf, other]

    cs.HC

    LUIDA: Large-scale Unified Infrastructure for Digital Assessments based on Commercial Metaverse Platform

    Authors: Yong-Hao Hu, Sotaro Yokoi, Yuji Hatada, Yuichi Hiroi, Takuji Narumi, Takefumi Hiraki

    Abstract: Online experiments using metaverse platforms have gained significant traction in Human-Computer Interaction and Virtual Reality (VR) research. However, current research workflows are highly fragmented, as researchers must use separate tools for system implementation, participant recruitment, experiment execution, and data collection, reducing consistency and increasing workload. We present LUIDA (…

    Submitted 24 April, 2025; originally announced April 2025.

  38. arXiv:2504.16214  [pdf, other]

    cs.LG cs.AI cs.PL

    Hexcute: A Tile-based Programming Language with Automatic Layout and Task-Mapping Synthesis

    Authors: Xiao Zhang, Yaoyao Ding, Yang Hu, Gennady Pekhimenko

    Abstract: Deep learning (DL) workloads mainly run on accelerators like GPUs. Recent DL quantization techniques demand a new matrix multiplication operator with mixed input data types, further complicating GPU optimization. Prior high-level compilers like Triton lack the expressiveness to implement key optimizations like fine-grained data pipelines and hardware-friendly memory layouts for these operators, wh…

    Submitted 30 April, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

    Comments: 17 pages, 24 figures

  39. arXiv:2504.16074  [pdf, other]

    cs.CL

    PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

    Authors: Shi Qiu, Shaoyang Guo, Zhuo-Yang Song, Yunbo Sun, Zeyu Cai, Jiashen Wei, Tianyu Luo, Yixuan Yin, Haoxu Zhang, Yi Hu, Chenyang Wang, Chencheng Tang, Haoling Chang, Qi Liu, Ziheng Zhou, Tianyu Zhang, Jingtian Zhang, Zhangyi Liu, Minghao Li, Yuku Zhang, Boxuan Jing, Xianqi Yin, Yutong Ren, Zizhuo Fu, Weike Wang, et al. (27 additional authors not shown)

    Abstract: We introduce PHYBench, a novel, high-quality benchmark designed for evaluating reasoning capabilities of large language models (LLMs) in physical contexts. PHYBench consists of 500 meticulously curated physics problems based on real-world physical scenarios, designed to assess the ability of models to understand and reason about realistic physical processes. Covering mechanics, electromagnetism, t…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: 21 pages, 8 figures, 4 tables

  40. arXiv:2504.16068  [pdf, other]

    physics.comp-ph cs.LG physics.chem-ph

    High-performance training and inference for deep equivariant interatomic potentials

    Authors: Chuin Wei Tan, Marc L. Descoteaux, Mit Kotak, Gabriel de Miranda Nascimento, Seán R. Kavanagh, Laura Zichi, Menghang Wang, Aadit Saluja, Yizhong R. Hu, Tess Smidt, Anders Johansson, William C. Witt, Boris Kozinsky, Albert Musaelian

    Abstract: Machine learning interatomic potentials, particularly those based on deep equivariant neural networks, have demonstrated state-of-the-art accuracy and computational efficiency in atomistic modeling tasks like molecular dynamics and high-throughput screening. The size of datasets and demands of downstream workflows are growing rapidly, making robust and scalable software essential. This work presen…

    Submitted 22 April, 2025; originally announced April 2025.

  41. arXiv:2504.15956  [pdf, other]

    cs.LG cs.AI stat.ML

    Universal Approximation with Softmax Attention

    Authors: Jerry Yao-Chieh Hu, Hude Liu, Hong-Yu Chen, Weimin Wu, Han Liu

    Abstract: We prove that with linear transformations, both (i) two-layer self-attention and (ii) one-layer self-attention followed by a softmax function are universal approximators for continuous sequence-to-sequence functions on compact domains. Our main technique is a new interpolation-based method for analyzing attention's internal mechanism. This leads to our key insight: self-attention is able to approx…

    Submitted 22 April, 2025; originally announced April 2025.

  42. arXiv:2504.15804  [pdf, other]

    cs.AR cs.AI

    Insights from Verification: Training a Verilog Generation LLM with Reinforcement Learning with Testbench Feedback

    Authors: Ning Wang, Bingkun Yao, Jie Zhou, Yuchen Hu, Xi Wang, Nan Guan, Zhe Jiang

    Abstract: Large language models (LLMs) have shown strong performance in Verilog generation from natural language description. However, ensuring the functional correctness of the generated code remains a significant challenge. This paper introduces a method that integrates verification insights from testbench into the training of Verilog generation LLMs, aligning the training with the fundamental goal of har…

    Submitted 22 April, 2025; originally announced April 2025.

  43. arXiv:2504.15281  [pdf, other]

    cs.CV

    StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians

    Authors: Cailin Zhuang, Yaoqi Hu, Xuanyang Zhang, Wei Cheng, Jiacheng Bao, Shengqi Liu, Yiying Yang, Xianfang Zeng, Gang Yu, Ming Li

    Abstract: 3D Gaussian Splatting (3DGS) excels in photorealistic scene reconstruction but struggles with stylized scenarios (e.g., cartoons, games) due to fragmented textures, semantic misalignment, and limited adaptability to abstract aesthetics. We propose StyleMe3D, a holistic framework for 3D GS style transfer that integrates multi-modal style conditioning, multi-level semantic alignment, and perceptual…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 16 pages; Project page: https://styleme3d.github.io/

  44. arXiv:2504.14174  [pdf, other]

    cs.LG cs.AI

    A Physics-guided Multimodal Transformer Path to Weather and Climate Sciences

    Authors: Jing Han, Hanting Chen, Kai Han, Xiaomeng Huang, Yongyun Hu, Wenjun Xu, Dacheng Tao, Ping Zhang

    Abstract: With the rapid development of machine learning in recent years, many problems in meteorology can now be addressed using AI models. In particular, data-driven algorithms have significantly improved accuracy compared to traditional methods. Meteorological data is often transformed into 2D images or 3D videos, which are then fed into AI models for learning. Additionally, these models often incorporat…

    Submitted 19 April, 2025; originally announced April 2025.

    Comments: Perspective article

  45. arXiv:2504.13950  [pdf, other]

    cs.LG cs.AI

    Open-Medical-R1: How to Choose Data for RLVR Training at Medicine Domain

    Authors: Zhongxi Qiu, Zhang Zhang, Yan Hu, Heng Li, Jiang Liu

    Abstract: This paper explores optimal data selection strategies for Reinforcement Learning with Verified Rewards (RLVR) training in the medical domain. While RLVR has shown exceptional potential for enhancing reasoning capabilities in large language models, most prior implementations have focused on mathematics and logical puzzles, with limited exploration of domain-specific applications like medicine. We i…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: 15 figures

  46. arXiv:2504.13916  [pdf, other]

    cs.HC cs.RO

    Task Matters: Investigating Human Questioning Behavior in Different Household Service for Learning by Asking Robots

    Authors: Yuanda Hu, Hou Jiani, Zhang Junyu, Yate Ge, Xiaohua Sun, Weiwei Guo

    Abstract: Learning by Asking (LBA) enables robots to identify knowledge gaps during task execution and acquire the missing information by asking targeted questions. However, different tasks often require different types of questions, and how to adapt questioning strategies accordingly remains underexplored. This paper investigates human questioning behavior in two representative household service tasks: a G…

    Submitted 10 April, 2025; originally announced April 2025.

  47. arXiv:2504.13413  [pdf, other]

    cs.LG cs.RO eess.SY

    A Model-Based Approach to Imitation Learning through Multi-Step Predictions

    Authors: Haldun Balim, Yang Hu, Yuyang Zhang, Na Li

    Abstract: Imitation learning is a widely used approach for training agents to replicate expert behavior in complex decision-making tasks. However, existing methods often struggle with compounding errors and limited generalization, due to the inherent challenge of error correction and the distribution shift between training and deployment. In this paper, we present a novel model-based imitation learning fram…

    Submitted 17 April, 2025; originally announced April 2025.

  48. arXiv:2504.12285  [pdf, other]

    cs.CL cs.LG

    BitNet b1.58 2B4T Technical Report

    Authors: Shuming Ma, Hongyu Wang, Shaohan Huang, Xingxing Zhang, Ying Hu, Ting Song, Yan Xia, Furu Wei

    Abstract: We introduce BitNet b1.58 2B4T, the first open-source, native 1-bit Large Language Model (LLM) at the 2-billion parameter scale. Trained on a corpus of 4 trillion tokens, the model has been rigorously evaluated across benchmarks covering language understanding, mathematical reasoning, coding proficiency, and conversational ability. Our results demonstrate that BitNet b1.58 2B4T achieves performanc…

    Submitted 24 April, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

    Comments: Work in progress

  49. arXiv:2504.11967  [pdf, other]

    cs.CV cs.AI cs.RO

    Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions

    Authors: Yifei Dong, Fengyi Wu, Sanjian Zhang, Guangyu Chen, Yuzhi Hu, Masumi Yano, Jingdong Sun, Siyu Huang, Feng Liu, Qi Dai, Zhi-Qi Cheng

    Abstract: Unmanned Aerial Vehicles (UAVs) are indispensable for infrastructure inspection, surveillance, and related tasks, yet they also introduce critical security challenges. This survey provides a wide-ranging examination of the anti-UAV domain, centering on three core objectives-classification, detection, and tracking-while detailing emerging methodologies such as diffusion-based data synthesis, multi-…

    Submitted 17 April, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

    Comments: Accepted at CVPR Workshop Anti-UAV 2025. 15 pages

  50. arXiv:2504.11354  [pdf, other]

    cs.AI

    Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning

    Authors: Haiming Wang, Mert Unsal, Xiaohan Lin, Mantas Baksys, Junqi Liu, Marco Dos Santos, Flood Sung, Marina Vinyes, Zhenzhe Ying, Zekai Zhu, Jianqiao Lu, Hugues de Saxcé, Bolton Bailey, Chendong Song, Chenjun Xiao, Dehao Zhang, Ebony Zhang, Frederick Pu, Han Zhu, Jiawei Liu, Jonas Bayer, Julien Michel, Longhui Yu, Léo Dreyfus-Schmidt, Lewis Tunstall, et al. (15 additional authors not shown)

    Abstract: We introduce Kimina-Prover Preview, a large language model that pioneers a novel reasoning-driven exploration paradigm for formal theorem proving, as showcased in this preview release. Trained with a large-scale reinforcement learning pipeline from Qwen2.5-72B, Kimina-Prover demonstrates strong performance in Lean 4 proof generation by employing a structured reasoning pattern we term \textit{forma…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: 22 pages