Showing 1–50 of 1,735 results for author: Yu, H

Searching in archive cs.
  1. arXiv:2507.01131

    cs.LG physics.comp-ph

    Tensor Decomposition Networks for Fast Machine Learning Interatomic Potential Computations

    Authors: Yuchao Lin, Cong Fu, Zachary Krueger, Haiyang Yu, Maho Nakata, Jianwen Xie, Emine Kucukbenli, Xiaofeng Qian, Shuiwang Ji

    Abstract: $\mathrm{SO}(3)$…

    Submitted 1 July, 2025; originally announced July 2025.

  2. arXiv:2507.00672

    cs.NI cs.DC

    Toward Edge General Intelligence with Multiple-Large Language Model (Multi-LLM): Architecture, Trust, and Orchestration

    Authors: Haoxiang Luo, Yinqiu Liu, Ruichen Zhang, Jiacheng Wang, Gang Sun, Dusit Niyato, Hongfang Yu, Zehui Xiong, Xianbin Wang, Xuemin Shen

    Abstract: Edge computing enables real-time data processing closer to its source, thus improving the latency and performance of edge-enabled AI applications. However, traditional AI models often fall short when dealing with complex, dynamic tasks that require advanced reasoning and multimodal data processing. This survey explores the integration of multi-LLMs (Large Language Models) to address this in edge c…

    Submitted 1 July, 2025; originally announced July 2025.

  3. arXiv:2507.00407

    physics.chem-ph cs.AI q-bio.QM

    Augmenting Molecular Graphs with Geometries via Machine Learning Interatomic Potentials

    Authors: Cong Fu, Yuchao Lin, Zachary Krueger, Haiyang Yu, Maho Nakata, Jianwen Xie, Emine Kucukbenli, Xiaofeng Qian, Shuiwang Ji

    Abstract: Accurate molecular property predictions require 3D geometries, which are typically obtained using expensive methods such as density functional theory (DFT). Here, we attempt to obtain molecular geometries by relying solely on machine learning interatomic potential (MLIP) models. To this end, we first curate a large-scale molecular relaxation dataset comprising 3.5 million molecules and 300 million…

    Submitted 30 June, 2025; originally announced July 2025.

  4. arXiv:2507.00398

    eess.IV cs.CV

    Accurate and Efficient Fetal Birth Weight Estimation from 3D Ultrasound

    Authors: Jian Wang, Qiongying Ni, Hongkui Yu, Ruixuan Yao, Jinqiao Ying, Bin Zhang, Xingyi Yang, Jin Peng, Jiongquan Chen, Junxuan Yu, Wenlong Shi, Chaoyu Chen, Zhongnuo Yan, Mingyuan Luo, Gaocheng Cai, Dong Ni, Jing Lu, Xin Yang

    Abstract: Accurate fetal birth weight (FBW) estimation is essential for optimizing delivery decisions and reducing perinatal mortality. However, clinical methods for FBW estimation are inefficient, operator-dependent, and challenging to apply in cases of complex fetal anatomy. Existing deep learning methods are based on 2D standard ultrasound (US) images or videos that lack spatial information, limiting the…

    Submitted 30 June, 2025; originally announced July 2025.

    Comments: Accepted by MICCAI 2025

  5. arXiv:2506.23982

    cs.CV cs.RO

    StyleDrive: Towards Driving-Style Aware Benchmarking of End-To-End Autonomous Driving

    Authors: Ruiyang Hao, Bowen Jing, Haibao Yu, Zaiqing Nie

    Abstract: While personalization has been explored in traditional autonomous driving systems, it remains largely overlooked in end-to-end autonomous driving (E2EAD), despite its growing prominence. This gap is critical, as user-aligned behavior is essential for trust, comfort, and widespread adoption of autonomous vehicles. A core challenge is the lack of large-scale real-world datasets annotated with divers…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: 14 pages, 4 figures

    ACM Class: I.4.9

  6. arXiv:2506.23485

    cs.CL cs.AI cs.IR

    Thought-Augmented Planning for LLM-Powered Interactive Recommender Agent

    Authors: Haocheng Yu, Yaxiong Wu, Hao Wang, Wei Guo, Yong Liu, Yawen Li, Yuyang Ye, Junping Du, Enhong Chen

    Abstract: Interactive recommendation is a typical information-seeking task that allows users to interactively express their needs through natural language and obtain personalized recommendations. Large language model-powered (LLM-powered) agents have become a new paradigm in interactive recommendations, effectively capturing users' real-time needs and enhancing personalized experiences. However, due to limi…

    Submitted 29 June, 2025; originally announced June 2025.

  7. arXiv:2506.23351

    cs.RO cs.AI cs.LG cs.MA

    Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge at CVPR 2025 MEIS Workshop

    Authors: Tianxing Chen, Kaixuan Wang, Zhaohui Yang, Yuhao Zhang, Zanxin Chen, Baijun Chen, Wanxi Dong, Ziyuan Liu, Dong Chen, Tianshuo Yang, Haibao Yu, Xiaokang Yang, Yusen Qin, Zhiqiang Xie, Yao Mu, Ping Luo, Tian Nian, Weiliang Deng, Yiheng Ge, Yibin Liu, Zixuan Li, Dehui Wang, Zhixuan Liang, Haohui Xie, Rijie Zeng, et al. (74 additional authors not shown)

    Abstract: Embodied Artificial Intelligence (Embodied AI) is an emerging frontier in robotics, driven by the need for autonomous systems that can perceive, reason, and act in complex physical environments. While single-arm systems have shown strong task performance, collaborative dual-arm systems are essential for handling more intricate tasks involving rigid, deformable, and tactile-sensitive objects. To ad…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Challenge Webpage: https://robotwin-benchmark.github.io/cvpr-2025-challenge/

  8. arXiv:2506.23263

    cs.CV

    Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis

    Authors: Lei-lei Li, Jianwu Fang, Junbin Xiao, Shanmin Pang, Hongkai Yu, Chen Lv, Jianru Xue, Tat-Seng Chua

    Abstract: Egocentrically comprehending the causes and effects of car accidents is crucial for the safety of self-driving cars, and synthesizing causal-entity reflected accident videos can facilitate the capability test to respond to unaffordable accidents in reality. However, incorporating causal relations as seen in real-world videos into synthetic videos remains challenging. This work argues that precisely…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Accepted by ICCV2025

  9. arXiv:2506.23152

    cs.RO

    DexH2R: A Benchmark for Dynamic Dexterous Grasping in Human-to-Robot Handover

    Authors: Youzhuo Wang, Jiayi Ye, Chuyang Xiao, Yiming Zhong, Heng Tao, Hang Yu, Yumeng Liu, Jingyi Yu, Yuexin Ma

    Abstract: Handover between a human and a dexterous robotic hand is a fundamental yet challenging task in human-robot collaboration. It requires handling dynamic environments and a wide variety of objects and demands robust and adaptive grasping strategies. However, progress in developing effective dynamic dexterous grasping methods is limited by the absence of high-quality, real-world human-to-robot handove…

    Submitted 2 July, 2025; v1 submitted 29 June, 2025; originally announced June 2025.

    Comments: Accepted by ICCV 2025. Project page: https://dexh2r.github.io/

  10. arXiv:2506.19349

    cs.DC

    A Heuristic Algorithm for Shortest Path Search

    Authors: Huashan Yu, Xiaolin Wang, Yingwei Luo

    Abstract: The Single-Source Shortest Path (SSSP) problem is well-known for the challenges in developing fast, practical, and work-efficient parallel algorithms. This work introduces a novel shortest path search method. It allows paths with different lengths to be extended in parallel at the cost of almost negligible repeated relaxations. A dynamic-stepping heuristic is proposed for the method to efficiently…

    Submitted 24 June, 2025; originally announced June 2025.

  11. arXiv:2506.18732

    cs.LG

    Towards Group Fairness with Multiple Sensitive Attributes in Federated Foundation Models

    Authors: Yuning Yang, Han Yu, Tianrun Gao, Xiaodong Xu, Guangyu Wang

    Abstract: The deep integration of foundation models (FM) with federated learning (FL) enhances personalization and scalability for diverse downstream tasks, making it crucial in sensitive domains like healthcare. Achieving group fairness has become an increasingly prominent issue in the era of federated foundation models (FFMs), since biases in sensitive attributes might lead to inequitable treatment for un…

    Submitted 23 June, 2025; originally announced June 2025.

  12. arXiv:2506.18559

    cs.AI cs.LO

    T-CPDL: A Temporal Causal Probabilistic Description Logic for Developing Logic-RAG Agent

    Authors: Hong Qing Yu

    Abstract: Large language models excel at generating fluent text but frequently struggle with structured reasoning involving temporal constraints, causal relationships, and probabilistic reasoning. To address these limitations, we propose Temporal Causal Probabilistic Description Logic (T-CPDL), an integrated framework that extends traditional Description Logic with temporal interval operators, explicit caus…

    Submitted 23 June, 2025; originally announced June 2025.

    ACM Class: I.2.7; F.4.1

  13. arXiv:2506.17068

    q-bio.NC cs.ET eess.SP

    Cross-Modal Epileptic Signal Harmonization: Frequency Domain Mapping Quantization for Pre-training a Unified Neurophysiological Transformer

    Authors: Runkai Zhang, Hua Yu, John Q. Gan, Haixian Wang

    Abstract: Scalp electroencephalography (EEG) and intracranial EEG (iEEG) are vital for epilepsy diagnosis and treatment. Their unified analysis offers the potential to harness the complementary strengths of each modality but is challenging due to variations in recording montages, amplitude and signal-to-noise ratio (SNR), and frequency components. To address the aforementioned challenges, this paper introdu…

    Submitted 20 June, 2025; originally announced June 2025.

  14. arXiv:2506.16643

    cs.RO cs.HC

    See What I Mean? Expressiveness and Clarity in Robot Display Design

    Authors: Matthew Ebisu, Hang Yu, Reuben Aronson, Elaine Short

    Abstract: Nonverbal visual symbols and displays play an important role in communication when humans and robots work collaboratively. However, few studies have investigated how different types of non-verbal cues affect objective task performance, especially in a dynamic environment that requires real-time decision-making. In this work, we designed a collaborative navigation task where the user and the robot…

    Submitted 19 June, 2025; originally announced June 2025.

    Journal ref: RO-MAN 2025

  15. arXiv:2506.15834

    cs.HC

    Machine Learning-based Context-Aware EMAs: An Offline Feasibility Study

    Authors: Zachary D King, Maryam Khalid, Han Yu, Kei Shibuya, Khadija Zanna, Marzieh Majd, Ryan L Brown, Yufei Shen, Thomas Vaessen, George Kypriotakis, Christopher P Fagundes, Akane Sano

    Abstract: Mobile health (mHealth) systems help researchers monitor and care for patients in real-world settings. Studies utilizing mHealth applications use Ecological Momentary Assessment (EMAs), passive sensing, and contextual features to develop emotion recognition models, which rely on EMA responses as ground truth. Due to this, it is crucial to consider EMA compliance when conducting a successful mHealt…

    Submitted 18 June, 2025; originally announced June 2025.

  16. arXiv:2506.15228

    eess.IV cs.MM

    ABC: Adaptive BayesNet Structure Learning for Computational Scalable Multi-task Image Compression

    Authors: Yufeng Zhang, Wenrui Dai, Hang Yu, Shizhan Liu, Junhui Hou, Jianguo Li, Weiyao Lin

    Abstract: Neural Image Compression (NIC) has revolutionized image compression with its superior rate-distortion performance and multi-task capabilities, supporting both human visual perception and machine vision tasks. However, its widespread adoption is hindered by substantial computational demands. While existing approaches attempt to address this challenge through module-specific optimizations or pre-def…

    Submitted 18 June, 2025; originally announced June 2025.

  17. arXiv:2506.14285

    cs.CL

    From What to Respond to When to Respond: Timely Response Generation for Open-domain Dialogue Agents

    Authors: Seongbo Jang, Minjin Jeon, Jaehoon Lee, Seonghyeon Lee, Dongha Lee, Hwanjo Yu

    Abstract: While research on dialogue response generation has primarily focused on generating coherent responses conditioning on textual context, the critical question of when to respond grounded on the temporal context remains underexplored. To bridge this gap, we propose a novel task called timely dialogue response generation and introduce the TimelyChat benchmark, which evaluates the capabilities of langu…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Work in progress

  18. arXiv:2506.14168

    cs.CV cs.AI

    VideoMAR: Autoregressive Video Generation with Continuous Tokens

    Authors: Hu Yu, Biao Gong, Hangjie Yuan, DanDan Zheng, Weilong Chai, Jingdong Chen, Kecheng Zheng, Feng Zhao

    Abstract: Masked-based autoregressive models have demonstrated promising image generation capability in continuous space. However, their potential for video generation remains under-explored. In this paper, we propose VideoMAR, a concise and efficient decoder-only autoregressive image-to-video model with continuous tokens, composing temporal frame-by-frame and spatial masked generation. We first id…

    Submitted 18 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

  19. arXiv:2506.13915

    cs.RO

    Sequence Modeling for Time-Optimal Quadrotor Trajectory Optimization with Sampling-based Robustness Analysis

    Authors: Katherine Mao, Hongzhan Yu, Ruipeng Zhang, Igor Spasojevic, M Ani Hsieh, Sicun Gao, Vijay Kumar

    Abstract: Time-optimal trajectories drive quadrotors to their dynamic limits, but computing such trajectories involves solving non-convex problems via iterative nonlinear optimization, making them prohibitively costly for real-time applications. In this work, we investigate learning-based models that imitate a model-based time-optimal trajectory planner to accelerate trajectory generation. Given a dataset o…

    Submitted 16 June, 2025; originally announced June 2025.

  20. arXiv:2506.13079

    cs.RO cs.HC

    CHARM: Considering Human Attributes for Reinforcement Modeling

    Authors: Qidi Fang, Hang Yu, Shijie Fang, Jindan Huang, Qiuyu Chen, Reuben M. Aronson, Elaine S. Short

    Abstract: Reinforcement Learning from Human Feedback has recently achieved significant success in various fields, and its performance is highly related to feedback quality. While much prior work acknowledged that human teachers' characteristics would affect human feedback patterns, there is little work that has closely investigated the actual effects. In this work, we designed an exploratory study investiga…

    Submitted 15 June, 2025; originally announced June 2025.

    Journal ref: RO-MAN 2025

  21. arXiv:2506.13058

    cs.CV cs.AI

    DualFast: Dual-Speedup Framework for Fast Sampling of Diffusion Models

    Authors: Hu Yu, Hao Luo, Fan Wang, Feng Zhao

    Abstract: Diffusion probabilistic models (DPMs) have achieved impressive success in visual generation. However, they suffer from slow inference speed due to iterative sampling. Employing fewer sampling steps is an intuitive solution, but this also introduces discretization error. Existing fast samplers make inspiring efforts to reduce discretization error through the adoption of high-order solvers, poten…

    Submitted 15 June, 2025; originally announced June 2025.

  22. arXiv:2506.12728

    cs.SE

    MCTS-Refined CoT: High-Quality Fine-Tuning Data for LLM-Based Repository Issue Resolution

    Authors: Yibo Wang, Zhihao Peng, Ying Wang, Zhao Wei, Hai Yu, Zhiliang Zhu

    Abstract: LLMs demonstrate strong performance in automated software engineering, particularly for code generation and issue resolution. While proprietary models like GPT-4o achieve high benchmark scores on SWE-bench, their API dependence, cost, and privacy concerns limit adoption. Open-source alternatives offer transparency but underperform in complex tasks, especially sub-100B parameter models. Although…

    Submitted 15 June, 2025; originally announced June 2025.

  23. Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark

    Authors: Suyeon Kim, SeongKu Kang, Dongwoo Kim, Jungseul Ok, Hwanjo Yu

    Abstract: Graph Neural Networks (GNNs) have achieved state-of-the-art performance in node classification tasks but struggle with label noise in real-world data. Existing studies on graph learning with label noise commonly rely on class-dependent label noise, overlooking the complexities of instance-dependent noise and falling short of capturing real-world corruption patterns. We introduce BeGIN (Benchmarkin…

    Submitted 16 June, 2025; v1 submitted 14 June, 2025; originally announced June 2025.

    Comments: 12 pages

    Journal ref: KDD 2025

  24. arXiv:2506.12376

    cs.AI cs.CL

    ConsistencyChecker: Tree-based Evaluation of LLM Generalization Capabilities

    Authors: Zhaochen Hong, Haofei Yu, Jiaxuan You

    Abstract: Evaluating consistency in large language models (LLMs) is crucial for ensuring reliability, particularly in complex, multi-step interactions between humans and LLMs. Traditional self-consistency methods often miss subtle semantic changes in natural language and functional shifts in code or equations, which can accumulate over multiple transformations. To address this, we propose ConsistencyChecker…

    Submitted 17 June, 2025; v1 submitted 14 June, 2025; originally announced June 2025.

    Comments: Accepted at ACL 2025 Main Conference

  25. arXiv:2506.12220

    cs.LG cs.AI

    Two Heads Are Better than One: Simulating Large Transformers with Small Ones

    Authors: Hantao Yu, Josh Alman

    Abstract: The quadratic complexity of self-attention prevents transformers from scaling effectively to long input sequences. On the other hand, modern GPUs and other specialized hardware accelerators are well-optimized for processing small input sequences in transformers during both training and inference. A natural question arises: can we take advantage of the efficiency of small transformers to deal with…

    Submitted 18 June, 2025; v1 submitted 13 June, 2025; originally announced June 2025.

  26. arXiv:2506.11499

    cs.CL

    On the Effectiveness of Integration Methods for Multimodal Dialogue Response Retrieval

    Authors: Seongbo Jang, Seonghyeon Lee, Dongha Lee, Hwanjo Yu

    Abstract: Multimodal chatbots have become one of the major topics for dialogue systems in both the research community and industry. Recently, researchers have shed light on the multimodality of responses as well as dialogue contexts. This work explores how a dialogue system can output responses in various modalities such as text and image. To this end, we first formulate a multimodal dialogue response retrieval…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: 9 pages, 1 figure

  27. arXiv:2506.11262

    cs.RO cs.LG

    Demonstration Sidetracks: Categorizing Systematic Non-Optimality in Human Demonstrations

    Authors: Shijie Fang, Hang Yu, Qidi Fang, Reuben M. Aronson, Elaine S. Short

    Abstract: Learning from Demonstration (LfD) is a popular approach for robots to acquire new skills, but most LfD methods suffer from imperfections in human demonstrations. Prior work typically treats these suboptimalities as random noise. In this paper we study non-optimal behaviors in non-expert demonstrations and show that they are systematic, forming what we call demonstration sidetracks. Using a public…

    Submitted 12 June, 2025; originally announced June 2025.

    Journal ref: RO-MAN 2025

  28. arXiv:2506.11109

    cs.CL cs.AI

    Enhancing Large Language Models for Mobility Analytics with Semantic Location Tokenization

    Authors: Yile Chen, Yicheng Tao, Yue Jiang, Shuai Liu, Han Yu, Gao Cong

    Abstract: The widespread adoption of location-based services has led to the generation of vast amounts of mobility data, providing significant opportunities to model user movement dynamics within urban environments. Recent advancements have focused on adapting Large Language Models (LLMs) for mobility analytics. However, existing methods face two primary limitations: inadequate semantic representation of lo…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: Accepted by KDD'25

  29. arXiv:2506.10756

    cs.RO cs.AI

    Grounded Vision-Language Navigation for UAVs with Open-Vocabulary Goal Understanding

    Authors: Yuhang Zhang, Haosheng Yu, Jiaping Xiao, Mir Feroskhan

    Abstract: Vision-and-language navigation (VLN) is a long-standing challenge in autonomous robotics, aiming to empower agents with the ability to follow human instructions while navigating complex environments. Two key bottlenecks remain in this field: generalization to out-of-distribution environments and reliance on fixed discrete action spaces. To address these challenges, we propose Vision-Language Fly (…

    Submitted 12 June, 2025; originally announced June 2025.

  30. arXiv:2506.10002

    cs.MM cs.AI cs.CV cs.RO

    EQ-TAA: Equivariant Traffic Accident Anticipation via Diffusion-Based Accident Video Synthesis

    Authors: Jianwu Fang, Lei-Lei Li, Zhedong Zheng, Hongkai Yu, Jianru Xue, Zhengguo Li, Tat-Seng Chua

    Abstract: Traffic Accident Anticipation (TAA) in traffic scenes is a challenging problem for achieving zero fatalities in the future. Current approaches typically treat TAA as a supervised learning task needing the laborious annotation of accident occurrence duration. However, the inherent long-tailed, uncertain, and fast-evolving nature of traffic scenes has the problem that real causal parts of accidents…

    Submitted 15 March, 2025; originally announced June 2025.

    Comments: Accepted by IEEE-TMM

  31. arXiv:2506.09398

    cs.LG physics.comp-ph

    Efficient Prediction of SO(3)-Equivariant Hamiltonian Matrices via SO(2) Local Frames

    Authors: Haiyang Yu, Yuchao Lin, Xuan Zhang, Xiaofeng Qian, Shuiwang Ji

    Abstract: We consider the task of predicting Hamiltonian matrices to accelerate electronic structure calculations, which plays an important role in physics, chemistry, and materials science. Motivated by the inherent relationship between the off-diagonal blocks of the Hamiltonian matrix and the SO(2) local frame, we propose a novel and efficient network, called QHNetV2, that achieves global SO(3) equivarian…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Code available at: https://github.com/divelab/AIRS

  32. arXiv:2506.09349

    cs.CL

    OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment

    Authors: Chao-Hong Tan, Qian Chen, Wen Wang, Chong Deng, Qinglin Zhang, Luyao Cheng, Hai Yu, Xin Zhang, Xiang Lv, Tianyu Zhao, Chong Zhang, Yukun Ma, Yafeng Chen, Hui Wang, Jiaqing Liu, Jieping Ye

    Abstract: Recent studies on end-to-end speech generation with large language models (LLMs) have attracted significant community attention, with multiple works extending text-based LLMs to generate discrete speech tokens. Existing approaches primarily fall into two categories: (1) Methods that generate discrete speech tokens independently without incorporating them into the LLM's autoregressive process, resu…

    Submitted 10 June, 2025; originally announced June 2025.

  33. arXiv:2506.08640

    cs.CV

    Orientation Matters: Making 3D Generative Models Orientation-Aligned

    Authors: Yichong Lu, Yuzhuo Tian, Zijin Jiang, Yikun Zhao, Yuanbo Yang, Hao Ouyang, Haoji Hu, Huimin Yu, Yujun Shen, Yiyi Liao

    Abstract: Humans intuitively perceive object shape and orientation from a single image, guided by strong priors about canonical poses. However, existing 3D generative models often produce misaligned results due to inconsistent training data, limiting their usability in downstream tasks. To address this gap, we introduce the task of orientation-aligned 3D object generation: producing 3D objects from single i…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: Project Page: https://xdimlab.github.io/Orientation_Matters

  34. arXiv:2506.08516

    cs.LG

    NeurIPS 2024 ML4CFD Competition: Results and Retrospective Analysis

    Authors: Mouadh Yagoubi, David Danan, Milad Leyli-Abadi, Ahmed Mazari, Jean-Patrick Brunet, Abbas Kabalan, Fabien Casenave, Yuxin Ma, Giovanni Catalani, Jean Fesquet, Jacob Helwig, Xuan Zhang, Haiyang Yu, Xavier Bertrand, Frederic Tost, Michael Baurheim, Joseph Morlier, Shuiwang Ji

    Abstract: The integration of machine learning (ML) into the physical sciences is reshaping computational paradigms, offering the potential to accelerate demanding simulations such as computational fluid dynamics (CFD). Yet, persistent challenges in accuracy, generalization, and physical consistency hinder the practical deployment of ML models in scientific domains. To address these limitations and systemati…

    Submitted 10 June, 2025; originally announced June 2025.

  35. arXiv:2506.08427

    cs.CL

    Know-MRI: A Knowledge Mechanisms Revealer&Interpreter for Large Language Models

    Authors: Jiaxiang Liu, Boxuan Xing, Chenhao Yuan, Chenxiang Zhang, Di Wu, Xiusheng Huang, Haida Yu, Chuhan Lang, Pengfei Cao, Jun Zhao, Kang Liu

    Abstract: As large language models (LLMs) continue to advance, there is a growing urgency to enhance the interpretability of their internal knowledge mechanisms. Consequently, many interpretation methods have emerged, aiming to unravel the knowledge mechanisms of LLMs from various perspectives. However, current interpretation methods differ in input data formats and interpreting outputs. The tools integrati…

    Submitted 10 June, 2025; originally announced June 2025.

  36. arXiv:2506.08381

    physics.geo-ph cs.LG

    TS-PIELM: Time-Stepping Physics-Informed Extreme Learning Machine Facilitates Soil Consolidation Analyses

    Authors: He Yang, Fei Ren, Hai-Sui Yu, Xueyu Geng, Pei-Zhi Zhuang

    Abstract: Accuracy and efficiency of the conventional physics-informed neural network (PINN) need to be improved before it can be a competitive alternative for soil consolidation analyses. This paper aims to overcome these limitations by proposing a highly accurate and efficient physics-informed machine learning (PIML) approach, termed time-stepping physics-informed extreme learning machine (TS-PIELM). In t…

    Submitted 10 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

  37. arXiv:2506.08375

    cs.CL

    EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models

    Authors: Tao Zou, Xinghua Zhang, Haiyang Yu, Minzheng Wang, Fei Huang, Yongbin Li

    Abstract: With the development and widespread application of large language models (LLMs), the new paradigm of "Model as Product" is rapidly evolving, and demands higher capabilities to address complex user needs, often requiring precise workflow execution which involves the accurate understanding of multiple tasks. However, existing benchmarks focusing on single-task environments with limited constraints l…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 24 pages

  38. arXiv:2506.08249

    cs.DB cs.CL

    RADAR: Benchmarking Language Models on Imperfect Tabular Data

    Authors: Ken Gu, Zhihan Zhang, Kate Lin, Yuwei Zhang, Akshay Paruchuri, Hong Yu, Mehran Kazemi, Kumar Ayush, A. Ali Heydari, Maxwell A. Xu, Girish Narayanswamy, Yun Liu, Ming-Zher Poh, Yuzhe Yang, Mark Malhotra, Shwetak Patel, Hamid Palangi, Xuhai Xu, Daniel McDuff, Tim Althoff, Xin Liu

    Abstract: Language models (LMs) are increasingly being deployed to perform autonomous data analyses. However, their data awareness -- the ability to recognize, reason over, and appropriately handle data artifacts such as missing values, outliers, and logical inconsistencies -- remains underexplored. These artifacts are especially common in real-world tabular data and, if mishandled, can significantly compro…

    Submitted 9 June, 2025; originally announced June 2025.

  39. arXiv:2506.07969

    cs.LG physics.flu-dyn

    A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling

    Authors: Jacob Helwig, Sai Sreeharsha Adavi, Xuan Zhang, Yuchao Lin, Felix S. Chim, Luke Takeshi Vizzini, Haiyang Yu, Muhammad Hasnain, Saykat Kumar Biswas, John J. Holloway, Narendra Singh, N. K. Anand, Swagnik Guhathakurta, Shuiwang Ji

    Abstract: We consider the problem of modeling high-speed flows using machine learning methods. While most prior studies focus on low-speed fluid flows in which uniform time-stepping is practical, flows approaching and exceeding the speed of sound exhibit sudden changes such as shock waves. In such cases, it is essential to use adaptive time-stepping methods to allow a temporal resolution sufficient to resol…

    Submitted 9 June, 2025; originally announced June 2025.

  40. arXiv:2506.07466

    cs.IR

    Leveraging Historical and Current Interests for Continual Sequential Recommendation

    Authors: Gyuseok Lee, Hyunsik Yoo, Junyoung Hwang, SeongKu Kang, Hwanjo Yu

    Abstract: Sequential recommendation models based on the Transformer architecture show superior performance in harnessing long-range dependencies within user behavior via self-attention. However, naively updating them on continuously arriving non-stationary data streams incurs prohibitive computation costs or leads to catastrophic forgetting. To address this, we propose Continual Sequential Transformer for R…

    Submitted 9 June, 2025; originally announced June 2025.

  41. arXiv:2506.06539

    cs.CL cs.AI

    Beyond Facts: Evaluating Intent Hallucination in Large Language Models

    Authors: Yijie Hao, Haofei Yu, Jiaxuan You

    Abstract: When exposed to complex queries containing multiple conditions, today's large language models (LLMs) tend to produce responses that only partially satisfy the query while neglecting certain conditions. We therefore introduce the concept of Intent Hallucination. In this phenomenon, LLMs either omit (neglecting to address certain parts) or misinterpret (responding to invented query parts) elements o…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: Accepted to ACL 2025 main conference

    Journal ref: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)

  42. arXiv:2506.06400

    eess.IV cs.CV

    ResPF: Residual Poisson Flow for Efficient and Physically Consistent Sparse-View CT Reconstruction

    Authors: Changsheng Fang, Yongtong Liu, Bahareh Morovati, Shuo Han, Yu Shi, Li Zhou, Shuyi Fan, Hengyong Yu

    Abstract: Sparse-view computed tomography (CT) is a practical solution to reduce radiation dose, but the resulting ill-posed inverse problem poses significant challenges for accurate image reconstruction. Although deep learning and diffusion-based methods have shown promising results, they often lack physical interpretability or suffer from high computational costs due to iterative sampling starting from ra…

    Submitted 5 June, 2025; originally announced June 2025.

  43. Optimizing Recall or Relevance? A Multi-Task Multi-Head Approach for Item-to-Item Retrieval in Recommendation

    Authors: Jiang Zhang, Sumit Kumar, Wei Chang, Yubo Wang, Feng Zhang, Weize Mao, Hanchao Yu, Aashu Singh, Min Li, Qifan Wang

    Abstract: The task of item-to-item (I2I) retrieval is to identify a set of relevant and highly engaging items based on a given trigger item. It is a crucial component in modern recommendation systems, where users' previously engaged items serve as trigger items to retrieve relevant content for future engagement. However, existing I2I retrieval models in industry are primarily built on co-engagement data and…

    Submitted 6 June, 2025; originally announced June 2025.

    Journal ref: KDD 2025

  44. arXiv:2506.05820

    cs.CV

    DeformCL: Learning Deformable Centerline Representation for Vessel Extraction in 3D Medical Image

    Authors: Ziwei Zhao, Zhixing Zhang, Yuhang Liu, Zhao Zhang, Haojun Yu, Dong Wang, Liwei Wang

    Abstract: In the field of 3D medical imaging, accurately extracting and representing blood vessels with curvilinear structures is of paramount importance for clinical diagnosis. Previous methods have commonly relied on discrete representations such as masks, often resulting in local fractures or scattered fragments due to the inherent limitations of the per-pixel classification paradigm. In this work, we int…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: Accepted by CVPR 2025

  45. arXiv:2506.05806

    cs.CV

    LLIA -- Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models

    Authors: Haojie Yu, Zhaonian Wang, Yihan Pan, Meng Cheng, Hao Yang, Chao Wang, Tao Xie, Xiaoming Xu, Xiaoming Wei, Xunliang Cai

    Abstract: Diffusion-based models have gained wide adoption in virtual human generation due to their outstanding expressiveness. However, their substantial computational requirements have constrained their deployment in real-time interactive avatar applications, where stringent speed, latency, and duration requirements are paramount. We present a novel audio-driven portrait video generation framework bas…

    Submitted 6 June, 2025; originally announced June 2025.

  46. arXiv:2506.05276

    cs.LG

    How to Unlock Time Series Editing? Diffusion-Driven Approach with Multi-Grained Control

    Authors: Hao Yu, Chu Xin Cheng, Runlong Yu, Yuyang Ye, Shiwei Tong, Zhaofeng Liu, Defu Lian

    Abstract: Recent advances in time series generation have shown promise, yet controlling properties in generated sequences remains challenging. Time Series Editing (TSE), which makes precise modifications while preserving temporal coherence, requires both point-level constraints and segment-level controls that current methods struggle to provide. We introduce the CocktailEdit framework to enable simultaneous, f…

    Submitted 5 June, 2025; originally announced June 2025.

  47. arXiv:2506.05242

    cs.CR

    SECNEURON: Reliable and Flexible Abuse Control in Local LLMs via Hybrid Neuron Encryption

    Authors: Zhiqiang Wang, Haohua Du, Junyang Wang, Haifeng Sun, Kaiwen Guo, Haikuo Yu, Chao Liu, Xiang-Yang Li

    Abstract: Large language models (LLMs) with diverse capabilities are increasingly being deployed in local environments, presenting significant security and controllability challenges. These locally deployed LLMs operate outside the direct control of developers, rendering them more susceptible to abuse. Existing mitigation techniques, mainly designed for cloud-based LLM services, are frequently circumvented or…

    Submitted 5 June, 2025; originally announced June 2025.

  48. arXiv:2506.03798

    cs.CV

    CoLa: Chinese Character Decomposition with Compositional Latent Components

    Authors: Fan Shi, Haiyang Yu, Bin Li, Xiangyang Xue

    Abstract: Humans can decompose Chinese characters into compositional components and recombine them to recognize unseen characters. This reflects two cognitive principles: Compositionality, the idea that complex concepts are built on simpler parts; and Learning-to-learn, the ability to learn strategies for decomposing and recombining components to form new concepts. These principles provide inductive biases…

    Submitted 4 June, 2025; originally announced June 2025.

  49. arXiv:2506.03737

    cs.CV cs.AI

    ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices

    Authors: Hao Yu, Tangyu Jiang, Shuning Jia, Shannan Yan, Shunning Liu, Haolong Qian, Guanghao Li, Shuting Dong, Huaisong Zhang, Chun Yuan

    Abstract: The Transformer architecture has revolutionized various fields since it was proposed, and its effectiveness largely depends on its ability to encode positional information. Traditional position encoding methods exhibit significant limitations due to their lack of robustness and positional flexibility. Therefore, Rotary Positional Encoding (RoPE) was proposed to alleviate these issues, which integrates…

    Submitted 4 June, 2025; originally announced June 2025.

  50. arXiv:2506.03608

    cs.CV

    PDSE: A Multiple Lesion Detector for CT Images using PANet and Deformable Squeeze-and-Excitation Block

    Authors: Di Fan, Heng Yu, Zhiyuan Xu

    Abstract: Detecting lesions in Computed Tomography (CT) scans is a challenging task in medical image processing due to the diverse types, sizes, and locations of lesions. Recently, various one-stage and two-stage detection networks have been developed to focus on lesion localization. We introduce a one-stage lesion detection framework, PDSE, by redesigning RetinaNet to achieve higher accuracy and efficiency…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: MIUA 2024