Showing 1–50 of 2,025 results for author: Lin, Y

Searching in archive cs.
  1. arXiv:2507.01857  [pdf, ps, other]

    cs.RO

    TypeTele: Releasing Dexterity in Teleoperation by Dexterous Manipulation Types

    Authors: Yuhao Lin, Yi-Lin Wei, Haoran Liao, Mu Lin, Chengyi Xing, Hao Li, Dandan Zhang, Mark Cutkosky, Wei-Shi Zheng

    Abstract: Dexterous teleoperation plays a crucial role in robotic manipulation for real-world data collection and remote robot control. Previous dexterous teleoperation mostly relies on hand retargeting to closely mimic human hand postures. However, these approaches may fail to fully leverage the inherent dexterity of dexterous hands, which can execute unique actions through their structural advantages comp…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Project Page: https://isee-laboratory.github.io/TypeTele

  2. arXiv:2507.01564  [pdf, ps, other]

    eess.IV cs.CV

    Multi Source COVID-19 Detection via Kernel-Density-based Slice Sampling

    Authors: Chia-Ming Lee, Bo-Cheng Qiu, Ting-Yao Chen, Ming-Han Sun, Fang-Ying Lin, Jung-Tse Tsai, I-An Tsai, Yu-Fan Lin, Chih-Chung Hsu

    Abstract: We present our solution for the Multi-Source COVID-19 Detection Challenge, which classifies chest CT scans from four distinct medical centers. To address multi-source variability, we employ the Spatial-Slice Feature Learning (SSFL) framework with Kernel-Density-based Slice Sampling (KDS). Our preprocessing pipeline combines lung region extraction, quality control, and adaptive slice sampling to se…

    Submitted 2 July, 2025; originally announced July 2025.

  3. arXiv:2507.01551  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Self-Guided Process Reward Optimization with Masked Step Advantage for Process Reinforcement Learning

    Authors: Wu Fei, Hao Kong, Shuxian Liang, Yang Lin, Yibo Yang, Jing Tang, Lei Chen, Xiansheng Hua

    Abstract: Process Reinforcement Learning (PRL) has demonstrated considerable potential in enhancing the reasoning capabilities of Large Language Models (LLMs). However, introducing additional process reward models incurs substantial computational overhead, and there is no unified theoretical framework for process-level advantage estimation. To bridge this gap, we propose Self-Guided Proces…

    Submitted 2 July, 2025; originally announced July 2025.

  4. arXiv:2507.01131  [pdf, ps, other]

    cs.LG physics.comp-ph

    Tensor Decomposition Networks for Fast Machine Learning Interatomic Potential Computations

    Authors: Yuchao Lin, Cong Fu, Zachary Krueger, Haiyang Yu, Maho Nakata, Jianwen Xie, Emine Kucukbenli, Xiaofeng Qian, Shuiwang Ji

    Abstract: SO(3)…

    Submitted 1 July, 2025; originally announced July 2025.

  5. arXiv:2507.00407  [pdf, ps, other]

    physics.chem-ph cs.AI q-bio.QM

    Augmenting Molecular Graphs with Geometries via Machine Learning Interatomic Potentials

    Authors: Cong Fu, Yuchao Lin, Zachary Krueger, Haiyang Yu, Maho Nakata, Jianwen Xie, Emine Kucukbenli, Xiaofeng Qian, Shuiwang Ji

    Abstract: Accurate molecular property predictions require 3D geometries, which are typically obtained using expensive methods such as density functional theory (DFT). Here, we attempt to obtain molecular geometries by relying solely on machine learning interatomic potential (MLIP) models. To this end, we first curate a large-scale molecular relaxation dataset comprising 3.5 million molecules and 300 million…

    Submitted 30 June, 2025; originally announced July 2025.

  6. arXiv:2506.24102  [pdf, ps, other]

    cs.CV

    DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World

    Authors: Xiangtai Li, Tao Zhang, Yanwei Li, Haobo Yuan, Shihao Chen, Yikang Zhou, Jiahao Meng, Yueyi Sun, Shilin Xu, Lu Qi, Tianheng Cheng, Yi Lin, Zilong Huang, Wenhao Huang, Jiashi Feng, Guang Shi

    Abstract: Multimodal Large Language Models (MLLMs) demonstrate a complex understanding of scenes, benefiting from large-scale and high-quality datasets. Most existing caption datasets lack the ground locations and relations for visual entities. Several grounded caption datasets face the problems of missing detailed descriptions, relations, and massive object descriptions on high-resolution images. To fill t…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Datasets and Models: https://github.com/lxtGH/DenseWorld-1M

  7. arXiv:2506.23340  [pdf]

    cs.CL

    Information Loss in LLMs' Multilingual Translation: The Role of Training Data, Language Proximity, and Language Family

    Authors: Yumeng Lin, Xufeng Duan, David Haslett, Yige Chen, Zhenguang G. Cai

    Abstract: Large language models have achieved impressive progress in multilingual translation, yet they continue to face challenges with certain language pairs, particularly those with limited training data or significant linguistic divergence from English. This study systematically investigates how training data, language proximity, and language family affect information loss in multilingual translation. We…

    Submitted 29 June, 2025; originally announced June 2025.

  8. arXiv:2506.23329  [pdf, ps, other]

    cs.CV

    IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering

    Authors: Parker Liu, Chenxin Li, Zhengxin Li, Yipeng Wu, Wuyang Li, Zhiqin Yang, Zhenyuan Zhang, Yunlong Lin, Sirui Han, Brandon Y. Feng

    Abstract: Vision-language models (VLMs) excel at descriptive tasks, but whether they truly understand scenes from visual observations remains uncertain. We introduce IR3D-Bench, a benchmark challenging VLMs to demonstrate understanding through active creation rather than passive recognition. Grounded in the analysis-by-synthesis paradigm, IR3D-Bench tasks Vision-Language Agents (VLAs) with actively using pr…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Project Page: https://ir3d-bench.github.io/

  9. arXiv:2506.23088  [pdf, ps, other]

    cs.CV

    Where, What, Why: Towards Explainable Driver Attention Prediction

    Authors: Yuchen Zhou, Jiayu Tang, Xiaoyan Xiao, Yueyao Lin, Linkai Liu, Zipeng Guo, Hao Fei, Xiaobo Xia, Chao Gou

    Abstract: Modeling task-driven attention in driving is a fundamental challenge for both autonomous vehicles and cognitive science. Existing methods primarily predict where drivers look by generating spatial heatmaps, but fail to capture the cognitive motivations behind attention allocation in specific contexts, which limits deeper understanding of attention mechanisms. To bridge this gap, we introduce Expla…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Accepted by ICCV 2025

  10. arXiv:2506.22589  [pdf, ps, other]

    cs.CV

    LIGHT: Multi-Modal Text Linking on Historical Maps

    Authors: Yijun Lin, Rhett Olson, Junhan Wu, Yao-Yi Chiang, Jerod Weinman

    Abstract: Text on historical maps provides valuable information for studies in history, economics, geography, and other related fields. Unlike structured or semi-structured documents, text on maps varies significantly in orientation, reading order, shape, and placement. Many modern methods can detect and transcribe text regions, but they struggle to effectively "link" the recognized text fragments, e.g.,…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Accepted at ICDAR2025

  11. arXiv:2506.22246  [pdf, ps, other]

    cs.CV

    EAMamba: Efficient All-Around Vision State Space Model for Image Restoration

    Authors: Yu-Cheng Lin, Yu-Syuan Xu, Hao-Wei Chen, Hsien-Kai Kuo, Chun-Yi Lee

    Abstract: Image restoration is a key task in low-level computer vision that aims to reconstruct high-quality images from degraded inputs. The emergence of Vision Mamba, which draws inspiration from the advanced state space model Mamba, marks a significant advancement in this field. Vision Mamba demonstrates excellence in modeling long-range dependencies with linear complexity, a crucial advantage for image…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: ICCV 2025

  12. arXiv:2506.22133  [pdf, ps, other]

    cs.GT math.CO

    A few good choices

    Authors: Thanh Nguyen, Haoyu Song, Young-San Lin

    Abstract: A Condorcet winning set addresses the Condorcet paradox by selecting a few candidates--rather than a single winner--such that no unselected alternative is preferred to all of them by a majority of voters. This idea extends to $α$-undominated sets, which ensure the same property for any $α$-fraction of voters and are guaranteed to exist in constant size for any $α$. However, the requirement that an…

    Submitted 29 June, 2025; v1 submitted 27 June, 2025; originally announced June 2025.

  13. arXiv:2506.22068  [pdf, ps, other]

    cs.AI

    Query as Test: An Intelligent Driving Test and Data Storage Method for Integrated Cockpit-Vehicle-Road Scenarios

    Authors: Shengyue Yao, Runqing Guo, Yangyang Qin, Miangbing Meng, Jipeng Cao, Yilun Lin, Yisheng Lv, Fei-Yue Wang

    Abstract: With the deep penetration of Artificial Intelligence (AI) in the transportation sector, intelligent cockpits, autonomous driving, and intelligent road networks are developing at an unprecedented pace. However, the data ecosystems of these three key areas are increasingly fragmented and incompatible. Especially, existing testing methods rely on data stacking, fail to cover all edge cases, and lack…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Submitted to IEEE Transactions on Vehicular Technology

  14. arXiv:2506.19863  [pdf, ps, other]

    physics.comp-ph cs.AI

    Exploring the Capabilities of the Frontier Large Language Models for Nuclear Energy Research

    Authors: Ahmed Almeldein, Mohammed Alnaggar, Rick Archibald, Tom Beck, Arpan Biswas, Rike Bostelmann, Wes Brewer, Chris Bryan, Christopher Calle, Cihangir Celik, Rajni Chahal, Jong Youl Choi, Arindam Chowdhury, Mark Cianciosa, Franklin Curtis, Gregory Davidson, Sebastian De Pascuale, Lisa Fassino, Ana Gainaru, Yashika Ghai, Luke Gibson, Qian Gong, Christopher Greulich, Scott Greenwood, Cory Hauck, et al. (25 additional authors not shown)

    Abstract: The AI for Nuclear Energy workshop at Oak Ridge National Laboratory evaluated the potential of Large Language Models (LLMs) to accelerate fusion and fission research. Fourteen interdisciplinary teams explored diverse nuclear science challenges using ChatGPT, Gemini, Claude, and other AI models over a single day. Applications ranged from developing foundation models for fusion reactor control to au…

    Submitted 26 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  15. arXiv:2506.19852  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation

    Authors: Xingyang Li, Muyang Li, Tianle Cai, Haocheng Xi, Shuo Yang, Yujun Lin, Lvmin Zhang, Songlin Yang, Jinbo Hu, Kelly Peng, Maneesh Agrawala, Ion Stoica, Kurt Keutzer, Song Han

    Abstract: Recent advances in diffusion models have enabled high-quality video generation, but the additional temporal dimension significantly increases computational costs, making training and inference on long videos prohibitively expensive. In this paper, we identify a phenomenon we term Spatiotemporal Energy Decay in video diffusion models: post-softmax attention scores diminish as spatial and temporal d…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: Code: https://github.com/mit-han-lab/radial-attention

  16. arXiv:2506.18193  [pdf, ps, other]

    cs.LG cs.AI cs.DC

    DeInfoReg: A Decoupled Learning Framework for Better Training Throughput

    Authors: Zih-Hao Huang, You-Teng Lin, Hung-Hsuan Chen

    Abstract: This paper introduces Decoupled Supervised Learning with Information Regularization (DeInfoReg), a novel approach that transforms a long gradient flow into multiple shorter ones, thereby mitigating the vanishing gradient problem. Integrating a pipeline strategy, DeInfoReg enables model parallelization across multiple GPUs, significantly improving training throughput. We compare our proposed method…

    Submitted 22 June, 2025; originally announced June 2025.

  17. arXiv:2506.17612  [pdf, ps, other]

    cs.CV

    JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

    Authors: Yunlong Lin, Zixu Lin, Kunjie Lin, Jinbin Bai, Panwang Pan, Chenxin Li, Haoyu Chen, Zhongdao Wang, Xinghao Ding, Wenbo Li, Shuicheng Yan

    Abstract: Photo retouching has become integral to contemporary visual storytelling, enabling users to capture aesthetics and express creativity. While professional tools such as Adobe Lightroom offer powerful capabilities, they demand substantial expertise and manual effort. In contrast, existing AI-based solutions provide automation but often suffer from limited adjustability and poor generalization, faili…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: 40 pages, 26 figures

  18. arXiv:2506.17562  [pdf, ps, other]

    cs.CV cs.CL

    LLM-driven Medical Report Generation via Communication-efficient Heterogeneous Federated Learning

    Authors: Haoxuan Che, Haibo Jin, Zhengrui Guo, Yi Lin, Cheng Jin, Hao Chen

    Abstract: LLMs have demonstrated significant potential in Medical Report Generation (MRG), yet their development requires large amounts of medical image-report pairs, which are commonly scattered across multiple centers. Centralizing these data is exceptionally challenging due to privacy regulations, thereby impeding model development and broader adoption of LLM-driven MRG models. To address this challenge,…

    Submitted 20 June, 2025; originally announced June 2025.

  19. arXiv:2506.17561  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models

    Authors: Chongkai Gao, Zixuan Liu, Zhenghao Chi, Junshan Huang, Xin Fei, Yiwen Hou, Yuxuan Zhang, Yudi Lin, Zhirui Fang, Zeyu Jiang, Lin Shao

    Abstract: Recent studies on Vision-Language-Action (VLA) models have shifted from the end-to-end action-generation paradigm toward a pipeline involving task planning followed by action generation, demonstrating improved performance on various complex, long-horizon manipulation tasks. However, existing approaches vary significantly in terms of network architectures, planning paradigms, representations, and t…

    Submitted 20 June, 2025; originally announced June 2025.

  20. arXiv:2506.17302  [pdf, ps, other]

    cs.CV cs.LG

    Fine-Scale Soil Mapping in Alaska with Multimodal Machine Learning

    Authors: Yijun Lin, Theresa Chen, Colby Brungard, Grunwald Sabine, Sue Ives, Matt Macander, Timm Nawrocki, Yao-Yi Chiang, Nic Jelinski

    Abstract: Fine-scale soil mapping in Alaska, traditionally relying on fieldwork and localized simulations, remains a critical yet underdeveloped task, despite the region's ecological importance and extensive permafrost coverage. As permafrost thaw accelerates due to climate change, it threatens infrastructure stability and key ecosystem services, such as soil carbon storage. High-resolution soil maps are es…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 12 pages, Submitted to SIGSPATIAL 2025

  21. arXiv:2506.16006  [pdf, ps, other]

    cs.CV cs.AI

    DIGMAPPER: A Modular System for Automated Geologic Map Digitization

    Authors: Weiwei Duan, Michael P. Gerlek, Steven N. Minton, Craig A. Knoblock, Fandel Lin, Theresa Chen, Leeje Jang, Sofia Kirsanova, Zekun Li, Yijun Lin, Yao-Yi Chiang

    Abstract: Historical geologic maps contain rich geospatial information, such as rock units, faults, folds, and bedding planes, that is critical for assessing mineral resources essential to renewable energy, electric vehicles, and national security. However, digitizing maps remains a labor-intensive and time-consuming task. We present DIGMAPPER, a modular, scalable system developed in collaboration with the…

    Submitted 18 June, 2025; originally announced June 2025.

  22. arXiv:2506.15959  [pdf]

    cs.DL physics.soc-ph

    Can Recombination Displace Dominant Scientific Ideas

    Authors: Linzhuo Li, Yiling Lin, Lingfei Wu

    Abstract: Scientific breakthroughs are widely attributed to the novel recombination of existing ideas. Yet despite explosive global growth in scientific labor and publications -- creating more opportunities to reconfigure knowledge -- the rate of breakthroughs has not kept pace. To investigate this disconnect, we analyze 49 million scholarly works from 1960 to 2024 using measures of atypical recombination a…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 10 figures

  23. arXiv:2506.15706  [pdf, ps, other]

    cs.LG cs.AI

    MDPO: Multi-Granularity Direct Preference Optimization for Mathematical Reasoning

    Authors: Yunze Lin

    Abstract: Mathematical reasoning presents a significant challenge for Large Language Models (LLMs) as it requires ensuring the correctness of each reasoning step. Researchers have been strengthening the mathematical reasoning abilities of LLMs through supervised fine-tuning, but due to the inability to suppress incorrect outputs, illusions can easily arise. Recently, Direct Preference Optimization (DPO) has…

    Submitted 30 May, 2025; originally announced June 2025.

  24. arXiv:2506.15524  [pdf, ps, other]

    cs.CV

    NTIRE 2025 Image Shadow Removal Challenge Report

    Authors: Florin-Alexandru Vasluianu, Tim Seizinger, Zhuyun Zhou, Cailian Chen, Zongwei Wu, Radu Timofte, Mingjia Li, Jin Hu, Hainuo Wang, Hengxing Liu, Jiarui Wang, Qiming Hu, Xiaojie Guo, Xin Lu, Jiarong Yang, Yuanfei Bao, Anya Hu, Zihao Fan, Kunyu Wang, Jie Xiao, Xi Wang, Xueyang Fu, Zheng-Jun Zha, Yu-Fan Lin, Chia-Ming Lee, et al. (57 additional authors not shown)

    Abstract: This work examines the findings of the NTIRE 2025 Shadow Removal Challenge. A total of 306 participants have registered, with 17 teams successfully submitting their solutions during the final evaluation phase. Following the last two editions, this challenge had two evaluation tracks: one focusing on reconstruction fidelity and the other on visual perception through a user study. Both tracks were e…

    Submitted 18 June, 2025; originally announced June 2025.

  25. arXiv:2506.15087  [pdf, ps, other]

    cs.RO

    3D Vision-tactile Reconstruction from Infrared and Visible Images for Robotic Fine-grained Tactile Perception

    Authors: Yuankai Lin, Xiaofan Lu, Jiahui Chen, Hua Yang

    Abstract: To achieve human-like haptic perception in anthropomorphic grippers, the compliant sensing surfaces of vision tactile sensor (VTS) must evolve from conventional planar configurations to biomimetically curved topographies with continuous surface gradients. However, planar VTSs have challenges when extended to curved surfaces, including insufficient lighting of surfaces, blurring in reconstruction,…

    Submitted 17 June, 2025; originally announced June 2025.

  26. arXiv:2506.15010  [pdf, ps, other]

    cs.CV

    Hyper-Local Deformable Transformers for Text Spotting on Historical Maps

    Authors: Yijun Lin, Yao-Yi Chiang

    Abstract: Text on historical maps contains valuable information providing georeferenced historical, political, and cultural contexts. However, text extraction from historical maps is challenging due to the lack of (1) effective methods and (2) training data. Previous approaches use ad-hoc steps tailored to only specific map styles. Recent machine learning-based text spotters (e.g., for scene images) have th…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Published in KDD2024

  27. arXiv:2506.13905  [pdf, ps, other]

    cs.AR

    Spec2RTL-Agent: Automated Hardware Code Generation from Complex Specifications Using LLM Agent Systems

    Authors: Zhongzhi Yu, Mingjie Liu, Michael Zimmer, Yingyan Celine Lin, Yong Liu, Haoxing Ren

    Abstract: Despite recent progress in generating hardware RTL code with LLMs, existing solutions still suffer from a substantial gap between practical application scenarios and the requirements of real-world RTL code development. Prior approaches either focus on overly simplified hardware descriptions or depend on extensive human guidance to process complex specifications, limiting their scalability and auto…

    Submitted 16 June, 2025; originally announced June 2025.

  28. arXiv:2506.13777  [pdf, other]

    physics.soc-ph cs.AI cs.CY

    A Survey of Physics-Informed AI for Complex Urban Systems

    Authors: En Xu, Huandong Wang, Yunke Zhang, Sibo Li, Yinzhou Tang, Zhilun Zhou, Yuming Lin, Yuan Yuan, Xiaochen Fan, Jingtao Ding, Yong Li

    Abstract: Urban systems are typical examples of complex systems, where the integration of physics-based modeling with artificial intelligence (AI) presents a promising paradigm for enhancing predictive accuracy, interpretability, and decision-making. In this context, AI excels at capturing complex, nonlinear relationships, while physics-based models ensure consistency with real-world laws and provide interp…

    Submitted 9 June, 2025; originally announced June 2025.

  29. arXiv:2506.12860  [pdf, ps, other]

    cs.CL

    QFFT, Question-Free Fine-Tuning for Adaptive Reasoning

    Authors: Wanlong Liu, Junxiao Xu, Fei Yu, Yukang Lin, Ke Ji, Wenyu Chen, Yan Xu, Yasheng Wang, Lifeng Shang, Benyou Wang

    Abstract: Recent advancements in Long Chain-of-Thought (CoT) reasoning models have improved performance on complex tasks, but they suffer from overthinking, which generates redundant reasoning steps, especially for simple questions. This paper revisits the reasoning patterns of Long and Short CoT models, observing that the Short CoT patterns offer concise reasoning efficiently, while the Long CoT patterns e…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: 23 pages

  30. arXiv:2506.11948  [pdf, ps, other]

    cs.RO cs.AI

    SAIL: Faster-than-Demonstration Execution of Imitation Learning Policies

    Authors: Nadun Ranawaka Arachchige, Zhenyang Chen, Wonsuhk Jung, Woo Chul Shin, Rohan Bansal, Pierre Barroso, Yu Hang He, Yingyan Celine Lin, Benjamin Joffe, Shreyas Kousik, Danfei Xu

    Abstract: Offline Imitation Learning (IL) methods such as Behavior Cloning are effective at acquiring complex robotic manipulation skills. However, existing IL-trained policies are confined to executing the task at the same speed as shown in demonstration data. This limits the task throughput of a robotic system, a critical requirement for applications such as industrial automation. In this paper, we introd…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: The first two authors contributed equally

  31. arXiv:2506.11908  [pdf, ps, other]

    cs.LG cs.AI

    Spectra-to-Structure and Structure-to-Spectra Inference Across the Periodic Table

    Authors: Yufeng Wang, Peiyao Wang, Lu Ma, Yuewei Lin, Qun Liu, Haibin Ling

    Abstract: X-ray Absorption Spectroscopy (XAS) is a powerful technique for probing local atomic environments, yet its interpretation remains limited by the need for expert-driven analysis, computationally expensive simulations, and element-specific heuristics. Recent advances in machine learning have shown promise for accelerating XAS interpretation, but many existing models are narrowly focused on specific…

    Submitted 13 June, 2025; originally announced June 2025.

  32. From Persona to Person: Enhancing the Naturalness with Multiple Discourse Relations Graph Learning in Personalized Dialogue Generation

    Authors: Chih-Hao Hsu, Ying-Jia Lin, Hung-Yu Kao

    Abstract: In dialogue generation, the naturalness of responses is crucial for effective human-machine interaction. Personalized response generation poses even greater challenges, as the responses must remain coherent and consistent with the user's personal traits or persona descriptions. We propose MUDI (Multiple Discourse Relations Graph Learning) for personalized dialogue generation.…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: Accepted by PAKDD 2025

  33. arXiv:2506.11538  [pdf, ps, other]

    cs.IR

    Dual-Perspective Disentangled Multi-Intent Alignment for Enhanced Collaborative Filtering

    Authors: Shanfan Zhang, Yongyi Lin, Yuan Rao, Chenlong Zhang

    Abstract: Disentangling user intents from implicit feedback has emerged as a promising strategy for enhancing both the accuracy and interpretability of recommendation systems. However, existing methods often model user and item intents independently and rely heavily on implicit structural signals, lacking explicit guidance to uncover the joint semantics that drive user-item interactions. To address these li…

    Submitted 30 June, 2025; v1 submitted 13 June, 2025; originally announced June 2025.

    Comments: 27 pages, 11 figures

  34. arXiv:2506.11403  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    A correlation-permutation approach for speech-music encoders model merging

    Authors: Fabian Ritter-Gutierrez, Yi-Cheng Lin, Jeremy H. M Wong, Hung-yi Lee, Eng Siong Chng, Nancy F. Chen

    Abstract: Creating a unified speech and music model requires expensive pre-training. Model merging can instead create a unified audio model with minimal computational expense. However, direct merging is challenging when the models are not aligned in the weight space. Motivated by Git Re-Basin, we introduce a correlation-permutation approach that aligns a music encoder's internal layers with a speech encode…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Under review

  35. arXiv:2506.11338  [pdf, ps, other]

    cs.CL

    Surprisal from Larger Transformer-based Language Models Predicts fMRI Data More Poorly

    Authors: Yi-Chien Lin, William Schuler

    Abstract: As Transformers become more widely incorporated into natural language processing tasks, there has been considerable interest in using surprisal from these models as predictors of human sentence processing difficulty. Recent work has observed a positive relationship between Transformer-based models' perplexity and the predictive power of their surprisal estimates on reading times, showing that lang…

    Submitted 12 June, 2025; originally announced June 2025.

  36. arXiv:2506.11116  [pdf, ps, other]

    cs.CL cs.AI

    Infinity Instruct: Scaling Instruction Selection and Synthesis to Enhance Language Models

    Authors: Jijie Li, Li Du, Hanyu Zhao, Bo-wen Zhang, Liangdong Wang, Boyan Gao, Guang Liu, Yonghua Lin

    Abstract: Large Language Models (LLMs) demonstrate strong performance in real-world applications, yet existing open-source instruction datasets often concentrate on narrow domains, such as mathematics or coding, limiting generalization and widening the gap with proprietary models. To bridge this gap, we introduce Infinity-Instruct, a high-quality instruction dataset designed to enhance both foundational and…

    Submitted 9 June, 2025; originally announced June 2025.

  37. arXiv:2506.11012  [pdf, other]

    cs.AI cs.CL

    A Survey of Task-Oriented Knowledge Graph Reasoning: Status, Applications, and Prospects

    Authors: Guanglin Niu, Bo Li, Yangguang Lin

    Abstract: Knowledge graphs (KGs) have emerged as a powerful paradigm for structuring and leveraging diverse real-world knowledge, which serve as a fundamental technology for enabling cognitive intelligence systems with advanced understanding and reasoning capabilities. Knowledge graph reasoning (KGR) aims to infer new knowledge based on existing facts in KGs, playing a crucial role in applications such as p…

    Submitted 27 April, 2025; originally announced June 2025.

    Comments: 45 pages, 17 figures, 12 tables

    ACM Class: I.2.7

  38. arXiv:2506.10741  [pdf, ps, other]

    cs.CV

    PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework

    Authors: SiXiang Chen, Jianyu Lai, Jialin Gao, Tian Ye, Haoyu Chen, Hengyu Shi, Shitong Shao, Yunlong Lin, Song Fei, Zhaohu Xing, Yeying Jin, Junfeng Luo, Xiaoming Wei, Lei Zhu

    Abstract: Generating aesthetic posters is more challenging than simple design images: it requires not only precise text rendering but also the seamless integration of abstract artistic content, striking layouts, and overall stylistic harmony. To address this, we propose PosterCraft, a unified framework that abandons prior modular pipelines and rigid, predefined layouts, allowing the model to freely explore…

    Submitted 12 June, 2025; originally announced June 2025.

  39. arXiv:2506.09398  [pdf, ps, other]

    cs.LG physics.comp-ph

    Efficient Prediction of SO(3)-Equivariant Hamiltonian Matrices via SO(2) Local Frames

    Authors: Haiyang Yu, Yuchao Lin, Xuan Zhang, Xiaofeng Qian, Shuiwang Ji

    Abstract: We consider the task of predicting Hamiltonian matrices to accelerate electronic structure calculations, which plays an important role in physics, chemistry, and materials science. Motivated by the inherent relationship between the off-diagonal blocks of the Hamiltonian matrix and the SO(2) local frame, we propose a novel and efficient network, called QHNetV2, that achieves global SO(3) equivarian…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Code available at: https://github.com/divelab/AIRS

  40. arXiv:2506.09066  [pdf, other]

    cs.CV cs.AI

    ReStNet: A Reusable & Stitchable Network for Dynamic Adaptation on IoT Devices

    Authors: Maoyu Wang, Yao Lu, Jiaqi Nie, Zeyu Wang, Yun Lin, Qi Xuan, Guan Gui

    Abstract: With the rapid development of deep learning, a growing number of pre-trained models have been publicly available. However, deploying these fixed models in real-world IoT applications is challenging because different devices possess heterogeneous computational and memory resources, making it impossible to deploy a single model across all platforms. Although traditional compression methods, such as…

    Submitted 8 June, 2025; originally announced June 2025.

  41. arXiv:2506.08931  [pdf, ps, other]

    cs.RO

    CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks

    Authors: Yixuan Li, Yutang Lin, Jieming Cui, Tengyu Liu, Wei Liang, Yixin Zhu, Siyuan Huang

    Abstract: Humanoid teleoperation plays a vital role in demonstrating and collecting data for complex humanoid-scene interactions. However, current teleoperation systems face critical limitations: they decouple upper- and lower-body control to maintain stability, restricting natural coordination, and operate open-loop without real-time position feedback, leading to accumulated drift. The fundamental challeng…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 18 pages, 13 figures

  42. arXiv:2506.07969  [pdf, ps, other]

    cs.LG physics.flu-dyn

    A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling

    Authors: Jacob Helwig, Sai Sreeharsha Adavi, Xuan Zhang, Yuchao Lin, Felix S. Chim, Luke Takeshi Vizzini, Haiyang Yu, Muhammad Hasnain, Saykat Kumar Biswas, John J. Holloway, Narendra Singh, N. K. Anand, Swagnik Guhathakurta, Shuiwang Ji

    Abstract: We consider the problem of modeling high-speed flows using machine learning methods. While most prior studies focus on low-speed fluid flows in which uniform time-stepping is practical, flows approaching and exceeding the speed of sound exhibit sudden changes such as shock waves. In such cases, it is essential to use adaptive time-stepping methods to allow a temporal resolution sufficient to resol…

    Submitted 9 June, 2025; originally announced June 2025.

  43. arXiv:2506.07900  [pdf, ps, other]

    cs.CL cs.AI

    MiniCPM4: Ultra-Efficient LLMs on End Devices

    Authors: MiniCPM Team, Chaojun Xiao, Yuxuan Li, Xu Han, Yuzhuo Bai, Jie Cai, Haotian Chen, Wentong Chen, Xin Cong, Ganqu Cui, Ning Ding, Shengdan Fan, Yewei Fang, Zixuan Fu, Wenyu Guan, Yitong Guan, Junshao Guo, Yufeng Han, Bingxiang He, Yuxiang Huang, Cunliang Kong, Qiuzuo Li, Siyuan Li, Wenhao Li, Yanghao Li, et al. (50 additional authors not shown)

    Abstract: This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelera…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: MiniCPM4 Technical Report

  44. arXiv:2506.07851  [pdf, ps, other]

    cs.CL

    Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning

    Authors: Yiju Guo, Wenkai Yang, Zexu Sun, Ning Ding, Zhiyuan Liu, Yankai Lin

    Abstract: Large language models (LLMs) have demonstrated significant improvements in contextual understanding. However, their ability to attend to truly critical information during long-context reasoning and generation still falls behind the pace. Specifically, our preliminary experiments reveal that certain distracting patterns can misdirect the model's attention during inference, and removing these patter…

    Submitted 9 June, 2025; originally announced June 2025.

  45. arXiv:2506.07463  [pdf, ps, other]

    cs.CL cs.AI

    CCI4.0: A Bilingual Pretraining Dataset for Enhancing Reasoning in Large Language Models

    Authors: Guang Liu, Liangdong Wang, Jijie Li, Yang Yu, Yao Xu, Jiabei Chen, Yu Bai, Feng Liao, Yonghua Lin

    Abstract: We introduce CCI4.0, a large-scale bilingual pre-training dataset engineered for superior data quality and diverse human-like reasoning trajectory. CCI4.0 occupies roughly $35$ TB of disk space and comprises two sub-datasets: CCI4.0-M2-Base and CCI4.0-M2-CoT. CCI4.0-M2-Base combines a $5.2$ TB carefully curated Chinese web corpus, a $22.5$ TB English subset from Nemotron-CC, and diverse sources fr…

    Submitted 9 June, 2025; originally announced June 2025.

  46. arXiv:2506.07385  [pdf, ps, other]

    cs.SE

    GUIPilot: A Consistency-based Mobile GUI Testing Approach for Detecting Application-specific Bugs

    Authors: Ruofan Liu, Xiwen Teoh, Yun Lin, Guanjie Chen, Ruofei Ren, Denys Poshyvanyk, Jin Song Dong

    Abstract: In this work, we propose GUIPilot, an approach for detecting inconsistencies between the mobile design and their implementations. The mobile design usually consists of design mock-ups that specify (1) the expected screen appearances (e.g., widget layouts, colors, and shapes) and (2) the expected screen behaviors, regarding how one screen can transition into another (e.g., labeled widgets with text…

    Submitted 8 June, 2025; originally announced June 2025.

  47. arXiv:2506.07368  [pdf, ps, other]

    cs.CV cs.AI

    C3S3: Complementary Competition and Contrastive Selection for Semi-Supervised Medical Image Segmentation

    Authors: Jiaying He, Yitong Lin, Jiahe Chen, Honghui Xu, Jianwei Zheng

    Abstract: For the immanent challenge of insufficiently annotated samples in the medical field, semi-supervised medical image segmentation (SSMIS) offers a promising solution. Despite achieving impressive results in delineating primary target areas, most current methodologies struggle to precisely capture the subtle details of boundaries. This deficiency often leads to significant diagnostic inaccuracies. To…

    Submitted 25 June, 2025; v1 submitted 8 June, 2025; originally announced June 2025.

    Comments: Accepted to ICME 2025

  48. arXiv:2506.07091  [pdf, ps, other]

    cs.CV

    SceneLCM: End-to-End Layout-Guided Interactive Indoor Scene Generation with Latent Consistency Model

    Authors: Yangkai Lin, Jiabao Lei, Kui Jia

    Abstract: Our project page: https://scutyklin.github.io/SceneLCM/. Automated generation of complex, interactive indoor scenes tailored to user prompt remains a formidable challenge. While existing methods achieve indoor scene synthesis, they struggle with rigid editing constraints, physical incoherence, excessive human effort, single-room limitations, and suboptimal material quality. To address these limita…

    Submitted 8 June, 2025; originally announced June 2025.

  49. Multi-StyleGS: Stylizing Gaussian Splatting with Multiple Styles

    Authors: Yangkai Lin, Jiabao Lei, Kui Jia

    Abstract: In recent years, there has been a growing demand to stylize a given 3D scene to align with the artistic style of reference images for creative purposes. While 3D Gaussian Splatting (GS) has emerged as a promising and efficient method for realistic 3D scene modeling, there remains a challenge in adapting it to stylize 3D GS to match with multiple styles through automatic local style transfer or manu…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: AAAI 2025

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 39(5), 5289-5297 (2025)

  50. arXiv:2506.06804  [pdf, ps, other]

    cs.RO

    IRS: Instance-Level 3D Scene Graphs via Room Prior Guided LiDAR-Camera Fusion

    Authors: Hongming Chen, Yiyang Lin, Ziliang Li, Biyu Ye, Yuying Zhang, Ximin Lyu

    Abstract: Indoor scene understanding remains a fundamental challenge in robotics, with direct implications for downstream tasks such as navigation and manipulation. Traditional approaches often rely on closed-set recognition or loop closure, limiting their adaptability in open-world environments. With the advent of visual foundation models (VFMs), open-vocabulary recognition and natural language querying ha…

    Submitted 7 June, 2025; originally announced June 2025.