Showing 1–50 of 2,449 results for author: Liu, L

Searching in archive cs.
  1. Spatial tangible user interfaces for cognitive assessment and training

    Authors: Ehud Sharlin, Yuichi Itoh, Benjamin Watson, Yoshifumi Kitamura, Steve Sutphen, Lili Liu, Fumio Kishino

    Abstract: This paper discusses Tangible User Interfaces (TUIs) and their potential impact on cognitive assessment and cognitive training. We believe that TUIs, and particularly a subset that we dub spatial TUIs, can extend human computer interaction beyond some of its current limitations. Spatial TUIs exploit human innate spatial and tactile ability in an intuitive and direct manner, affording interaction p…

    Submitted 2 July, 2025; originally announced July 2025.

    Journal ref: Proc. Bio-ADIT 2004 (Lausanne, Switzerland), 410-425. Also in Lecture Notes in Computer Science, 3141, 137-152

  2. arXiv:2507.01795  [pdf, ps, other]

    math.NA cs.LG math-ph

    Neural Entropy-stable conservative flux form neural networks for learning hyperbolic conservation laws

    Authors: Lizuo Liu, Lu Zhang, Anne Gelb

    Abstract: We propose a neural entropy-stable conservative flux form neural network (NESCFN) for learning hyperbolic conservation laws and their associated entropy functions directly from solution trajectories, without requiring any predefined numerical discretization. While recent neural network architectures have successfully integrated classical numerical principles into learned models, most rely on prior…

    Submitted 2 July, 2025; originally announced July 2025.

    MSC Class: 65M08; 68T07; 65M22; 65M32; 65D25

  3. arXiv:2507.01182  [pdf, ps, other]

    cs.CV

    Rapid Salient Object Detection with Difference Convolutional Neural Networks

    Authors: Zhuo Su, Li Liu, Matthias Müller, Jiehua Zhang, Diana Wofk, Ming-Ming Cheng, Matti Pietikäinen

    Abstract: This paper addresses the challenge of deploying salient object detection (SOD) on resource-constrained devices with real-time performance. While recent advances in deep neural networks have improved SOD, existing top-leading models are computationally expensive. We propose an efficient network design that combines traditional wisdom on SOD and the representation power of modern CNNs. Like biologic…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 16 pages, accepted in TPAMI

  4. arXiv:2507.00690  [pdf, ps, other]

    cs.CV cs.CR

    Cage-Based Deformation for Transferable and Undefendable Point Cloud Attack

    Authors: Keke Tang, Ziyong Du, Weilong Peng, Xiaofei Wang, Peican Zhu, Ligang Liu, Zhihong Tian

    Abstract: Adversarial attacks on point clouds often impose strict geometric constraints to preserve plausibility; however, such constraints inherently limit transferability and undefendability. While deformation offers an alternative, existing unstructured approaches may introduce unnatural distortions, making adversarial point clouds conspicuous and undermining their plausibility. In this paper, we propose…

    Submitted 1 July, 2025; originally announced July 2025.

  5. arXiv:2507.00016  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    Gradient-based Fine-Tuning through Pre-trained Model Regularization

    Authors: Xuanbo Liu, Liu Liu, Fuxiang Wu, Fusheng Hao, Xianglong Liu

    Abstract: Large pre-trained models have demonstrated extensive applications across various fields. However, fine-tuning these models for specific downstream tasks demands significant computational resources and storage. One fine-tuning method, gradient-based parameter selection (GPS), focuses on fine-tuning only the parameters with high gradients in each neuron, thereby reducing the number of training param…

    Submitted 14 June, 2025; originally announced July 2025.

  6. arXiv:2506.23897  [pdf, ps, other]

    cs.CV

    PriOr-Flow: Enhancing Primitive Panoramic Optical Flow with Orthogonal View

    Authors: Longliang Liu, Miaojie Feng, Junda Cheng, Jijun Xiang, Xuan Zhu, Xin Yang

    Abstract: Panoramic optical flow enables a comprehensive understanding of temporal dynamics across wide fields of view. However, severe distortions caused by sphere-to-plane projections, such as the equirectangular projection (ERP), significantly degrade the performance of conventional perspective-based optical flow methods, especially in polar regions. To address this challenge, we propose PriOr-Flow, a no…

    Submitted 30 June, 2025; v1 submitted 30 June, 2025; originally announced June 2025.

  7. arXiv:2506.23648  [pdf, ps, other]

    cs.CV

    MReg: A Novel Regression Model with MoE-based Video Feature Mining for Mitral Regurgitation Diagnosis

    Authors: Zhe Liu, Yuhao Huang, Lian Liu, Chengrui Zhang, Haotian Lin, Tong Han, Zhiyuan Zhu, Yanlin Chen, Yuerui Chen, Dong Ni, Zhongshan Gou, Xin Yang

    Abstract: Color Doppler echocardiography is a crucial tool for diagnosing mitral regurgitation (MR). Recent studies have explored intelligent methods for MR diagnosis to minimize user dependence and improve accuracy. However, these approaches often fail to align with clinical workflow and may lead to suboptimal accuracy and interpretability. In this study, we introduce an automated MR diagnosis model (MReg)…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: 10 pages, 5 figures, accepted by MICCAI 2025

  8. arXiv:2506.23490  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    UltraTwin: Towards Cardiac Anatomical Twin Generation from Multi-view 2D Ultrasound

    Authors: Junxuan Yu, Yaofei Duan, Yuhao Huang, Yu Wang, Rongbo Ling, Weihao Luo, Ang Zhang, Jingxian Xu, Qiongying Ni, Yongsong Zhou, Binghan Li, Haoran Dou, Liping Liu, Yanfen Chu, Feng Geng, Zhe Sheng, Zhifeng Ding, Dingxin Zhang, Rui Huang, Yuhang Zhang, Xiaowei Xu, Tao Tan, Dong Ni, Zhongshan Gou, Xin Yang

    Abstract: Echocardiography is routine for cardiac examination. However, 2D ultrasound (US) struggles with accurate metric calculation and direct observation of 3D cardiac structures. Moreover, 3D US is limited by low resolution, small field of view and scarce availability in practice. Constructing the cardiac anatomical twin from 2D images is promising to provide precise treatment planning and clinical quan…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Accepted by MICCAI 2025

  9. arXiv:2506.23482  [pdf, ps, other]

    cs.CV

    MTADiffusion: Mask Text Alignment Diffusion Model for Object Inpainting

    Authors: Jun Huang, Ting Liu, Yihang Wu, Xiaochao Qu, Luoqi Liu, Xiaolin Hu

    Abstract: Advancements in generative models have enabled image inpainting models to generate content within specific regions of an image based on provided prompts and masks. However, existing inpainting methods often suffer from problems such as semantic misalignment, structural distortion, and style inconsistency. In this work, we present MTADiffusion, a Mask-Text Alignment diffusion model designed for obj…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: CVPR 2025

  10. arXiv:2506.23322  [pdf, ps, other]

    cs.DB cs.AI cs.CL cs.IR

    GaussMaster: An LLM-based Database Copilot System

    Authors: Wei Zhou, Ji Sun, Xuanhe Zhou, Guoliang Li, Luyang Liu, Hao Wu, Tianyuan Wang

    Abstract: In the financial industry, data is the lifeblood of operations, and DBAs shoulder significant responsibilities for SQL tuning, database deployment, diagnosis, and service repair. In recent years, both database vendors and customers have increasingly turned to autonomous database platforms in an effort to alleviate the heavy workload of DBAs. However, existing autonomous database platforms are limi…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: We welcome contributions from the community. For reference, please see the code at: https://gitcode.com/opengauss/openGauss-GaussMaster

  11. Multi-task Offline Reinforcement Learning for Online Advertising in Recommender Systems

    Authors: Langming Liu, Wanyu Wang, Chi Zhang, Bo Li, Hongzhi Yin, Xuetao Wei, Wenbo Su, Bo Zheng, Xiangyu Zhao

    Abstract: Online advertising in recommendation platforms has gained significant attention, with a predominant focus on channel recommendation and budget allocation strategies. However, current offline reinforcement learning (RL) methods face substantial challenges when applied to sparse advertising scenarios, primarily due to severe overestimation, distributional shifts, and overlooking budget constraints.…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: KDD 2025

  12. arXiv:2506.23088  [pdf, ps, other]

    cs.CV

    Where, What, Why: Towards Explainable Driver Attention Prediction

    Authors: Yuchen Zhou, Jiayu Tang, Xiaoyan Xiao, Yueyao Lin, Linkai Liu, Zipeng Guo, Hao Fei, Xiaobo Xia, Chao Gou

    Abstract: Modeling task-driven attention in driving is a fundamental challenge for both autonomous vehicles and cognitive science. Existing methods primarily predict where drivers look by generating spatial heatmaps, but fail to capture the cognitive motivations behind attention allocation in specific contexts, which limits deeper understanding of attention mechanisms. To bridge this gap, we introduce Expla…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Accepted by ICCV 2025

  13. arXiv:2506.22763  [pdf, ps, other]

    q-fin.PM cs.LG q-fin.CP

    Can We Reliably Predict the Fed's Next Move? A Multi-Modal Approach to U.S. Monetary Policy Forecasting

    Authors: Fiona Xiao Jingyi, Lili Liu

    Abstract: Forecasting central bank policy decisions remains a persistent challenge for investors, financial institutions, and policymakers due to the wide-reaching impact of monetary actions. In particular, anticipating shifts in the U.S. federal funds rate is vital for risk management and trading strategies. Traditional methods relying only on structured macroeconomic indicators often fall short in capturi…

    Submitted 28 June, 2025; originally announced June 2025.

    Comments: 9 pages, 15 figures

  14. A tangible user interface for assessing cognitive mapping ability

    Authors: Ehud Sharlin, Benjamin Watson, Steve Sutphen, Lili Liu, Robert Lederer, John Frazer

    Abstract: Wayfinding, the ability to recall the environment and navigate through it, is an essential cognitive skill relied upon almost every day in a person's life. A crucial component of wayfinding is the construction of cognitive maps, mental representations of the environments through which a person travels. Age, disease or injury can severely affect cognitive mapping, making assessment of this basic su…

    Submitted 27 June, 2025; originally announced June 2025.

    Journal ref: International Journal of Human-Computer Studies (2009). Volume 67, Issue 3, Pages 269-278. Academic Press

  15. arXiv:2506.22500  [pdf, ps, other]

    cs.CV cs.AI

    Visual-Semantic Knowledge Conflicts in Operating Rooms: Synthetic Data Curation for Surgical Risk Perception in Multimodal Large Language Models

    Authors: Weiyi Zhao, Xiaoyu Tan, Liang Liu, Sijia Li, Youwei Song, Xihe Qiu

    Abstract: Surgical risk identification is critical for patient safety and reducing preventable medical errors. While multimodal large language models (MLLMs) show promise for automated operating room (OR) risk detection, they often exhibit visual-semantic knowledge conflicts (VS-KC), failing to identify visual safety violations despite understanding textual rules. To address this, we introduce a dataset com…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 13 pages, 5 figures. The dataset and appendix are available at https://github.com/zgg2577/VS-KC

    MSC Class: 68T07; 68U10; 92C55 ACM Class: I.2.10; I.2.7; J.3; I.2.6

  16. arXiv:2506.22319  [pdf, ps, other]

    math.AP cs.GR math-ph physics.comp-ph

    Asymptotic analysis and design of shell-based thermal lattice metamaterials

    Authors: Di Zhang, Ligang Liu

    Abstract: We present a rigorous asymptotic analysis framework for investigating the thermal conductivity of shell lattice metamaterials, extending prior work from mechanical stiffness to heat transfer. Central to our analysis is a new metric, the asymptotic directional conductivity (ADC), which captures the leading-order influence of the middle surface geometry on the effective thermal conductivity in the v…

    Submitted 27 June, 2025; originally announced June 2025.

    MSC Class: 74Q15 (Primary) 35Q74; 74Q20; 74K25 (Secondary) ACM Class: I.3.5; J.2

  17. arXiv:2506.21860  [pdf, ps, other]

    cs.RO cs.CV

    Embodied Domain Adaptation for Object Detection

    Authors: Xiangyu Shi, Yanyuan Qiao, Lingqiao Liu, Feras Dayoub

    Abstract: Mobile robots rely on object detectors for perception and object localization in indoor environments. However, standard closed-set methods struggle to handle the diverse objects and dynamic conditions encountered in real homes and labs. Open-vocabulary object detection (OVOD), driven by Vision Language Models (VLMs), extends beyond fixed labels but still struggles with domain shifts in indoor envi…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted by IROS 2025

  18. arXiv:2506.21763  [pdf, ps, other]

    cs.AI

    THE-Tree: Can Tracing Historical Evolution Enhance Scientific Verification and Reasoning?

    Authors: Xin Wang, Jiyao Liu, Yulong Xiao, Junzhi Ning, Lihao Liu, Junjun He, Botian Shi, Kaicheng Yu

    Abstract: Large Language Models (LLMs) are accelerating scientific idea generation, but rigorously evaluating these numerous, often superficial, AI-generated propositions for novelty and factual accuracy is a critical bottleneck; manual verification is too slow. Existing validation methods are inadequate: LLMs as standalone verifiers may hallucinate and lack domain knowledge (our findings show ~60% unawaren…

    Submitted 26 June, 2025; originally announced June 2025.

  19. arXiv:2506.21573  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Instruction Learning Paradigms: A Dual Perspective on White-box and Black-box LLMs

    Authors: Yanwei Ren, Liu Liu, Baosheng Yu, Jiayan Qiu, Quan Chen

    Abstract: Optimizing instructions for large language models (LLMs) is critical for harnessing their full potential in complex and diverse tasks. However, relying solely on white-box approaches demands extensive computational resources and offers limited representational capacity, while black-box models can incur prohibitive financial costs. To address these challenges, we introduce a novel framework that se…

    Submitted 14 June, 2025; originally announced June 2025.

  20. arXiv:2506.21063  [pdf, ps, other]

    cs.RO

    Control of Marine Robots in the Era of Data-Driven Intelligence

    Authors: Lin Hong, Lu Liu, Zhouhua Peng, Fumin Zhang

    Abstract: The control of marine robots has long relied on model-based methods grounded in classical and modern control theory. However, the nonlinearity and uncertainties inherent in robot dynamics, coupled with the complexity of marine environments, have revealed the limitations of conventional control methods. The rapid evolution of machine learning has opened new avenues for incorporating data-driven int…

    Submitted 26 June, 2025; originally announced June 2025.

  21. arXiv:2506.20756  [pdf, ps, other]

    cs.CV

    StereoDiff: Stereo-Diffusion Synergy for Video Depth Estimation

    Authors: Haodong Li, Chen Wang, Jiahui Lei, Kostas Daniilidis, Lingjie Liu

    Abstract: Recent video depth estimation methods achieve great performance by following the paradigm of image depth estimation, i.e., typically fine-tuning pre-trained video diffusion models with massive data. However, we argue that video depth estimation is not a naive extension of image depth estimation. The temporal consistency requirements for dynamic and static regions in videos are fundamentally differ…

    Submitted 30 June, 2025; v1 submitted 25 June, 2025; originally announced June 2025.

    Comments: Work done in Nov 2024, during an internship at the University of Pennsylvania. Project page: https://stereodiff.github.io/

  22. arXiv:2506.20513  [pdf, ps, other]

    physics.geo-ph cs.LG eess.SP

    Fast ground penetrating radar dual-parameter full waveform inversion method accelerated by hybrid compilation of CUDA kernel function and PyTorch

    Authors: Lei Liu, Chao Song, Liangsheng He, Silin Wang, Xuan Feng, Cai Liu

    Abstract: This study proposes a high-performance dual-parameter full waveform inversion framework (FWI) for ground-penetrating radar (GPR), accelerated through the hybrid compilation of CUDA kernel functions and PyTorch. The method leverages the computational efficiency of GPU programming while preserving the flexibility and usability of Python-based deep learning frameworks. By integrating customized CUDA…

    Submitted 25 June, 2025; originally announced June 2025.

  23. Semantic-enhanced Modality-asymmetric Retrieval for Online E-commerce Search

    Authors: Zhigong Zhou, Ning Ding, Xiaochuan Fan, Yue Shang, Yiming Qiu, Jingwei Zhuo, Zhiwei Ge, Songlin Wang, Lin Liu, Sulong Xu, Han Zhang

    Abstract: Semantic retrieval, which retrieves semantically matched items given a textual query, has been an essential component to enhance system effectiveness in e-commerce search. In this paper, we study the multimodal retrieval problem, where the visual information (e.g., image) of an item is leveraged as a supplement to textual information to enrich item representation and further improve retrieval perform…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Published in SIGIR 2023

  24. arXiv:2506.19833  [pdf, ps, other]

    cs.CV

    Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router

    Authors: Yubo Huang, Weiqiang Wang, Sirui Zhao, Tong Xu, Lin Liu, Enhong Chen

    Abstract: Recent years have witnessed remarkable advances in audio-driven talking head generation. However, existing approaches predominantly focus on single-character scenarios. While some methods can create separate conversation videos between two individuals, the critical challenge of generating unified conversation videos with multiple physically co-present characters sharing the same spatial environmen…

    Submitted 24 June, 2025; originally announced June 2025.

  25. arXiv:2506.19660  [pdf, ps, other]

    cs.DC

    PS-WL: A Probability-Sensitive Wear Leveling scheme for SSD array scaling

    Authors: Shuhang Xu, Yunfei Gu, Linhui Liu, Chentao Wu

    Abstract: As flash-based Solid State Drive (SSD) arrays become essential to modern data centers, scaling these arrays to meet explosive data growth is a frequent and critical operation. However, the conventional wear-leveling (WL) paradigm applied during scaling suffers from a fundamental flaw: it ignores the non-linear relationship between wear and failure probability, potentially pushing the most vulnerab…

    Submitted 24 June, 2025; originally announced June 2025.

  26. arXiv:2506.19368  [pdf, ps, other]

    cs.CR

    Yotta: A Large-Scale Trustless Data Trading Scheme for Blockchain System

    Authors: Xiang Liu, Zhanpeng Guo, Liangxi Liu, Mengyao Zheng, Yiming Qiu, Linshan Jiang

    Abstract: Data trading is one of the key focuses of Web 3.0. However, all the current methods that rely on blockchain-based smart contracts for data exchange cannot support large-scale data trading while ensuring data security, which falls short of fulfilling the spirit of Web 3.0. Even worse, there is currently a lack of discussion on the essential properties that large-scale data trading should satisfy. I…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: 9 pages, 2 figures, Exploratory Paper

    Journal ref: Nanyang Blockchain Conference 2025

  27. arXiv:2506.19118   

    cs.CE

    LKA: Large Kernel Adapter for Enhanced Medical Image Classification

    Authors: Ziquan Zhu, Si-Yuan Lu, Tianjin Huang, Lu Liu, Zhe Liu

    Abstract: Despite the notable success of current Parameter-Efficient Fine-Tuning (PEFT) methods across various domains, their effectiveness on medical datasets falls short of expectations. This limitation arises from two key factors: (1) medical images exhibit extensive anatomical variation and low contrast, necessitating a large receptive field to capture critical features, and (2) existing PEFT methods do…

    Submitted 25 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

    Comments: Some aspects of the experimental setup were not clearly described in the current version. We plan to revise and clarify these points before resubmitting

  28. arXiv:2506.18881  [pdf, ps, other]

    cs.CV cs.MM

    Let Your Video Listen to Your Music!

    Authors: Xinyu Zhang, Dong Gong, Zicheng Duan, Anton van den Hengel, Lingqiao Liu

    Abstract: Aligning the rhythm of visual motion in a video with a given music track is a practical need in multimedia production, yet remains an underexplored task in autonomous video editing. Effective alignment between motion and musical beats enhances viewer engagement and visual appeal, particularly in music videos, promotional content, and cinematic editing. Existing methods typically depend on labor-in…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Project page: https://zhangxinyu-xyz.github.io/MVAA/

  29. arXiv:2506.18851  [pdf, ps, other]

    cs.CV

    Phantom-Data : Towards a General Subject-Consistent Video Generation Dataset

    Authors: Zhuowei Chen, Bingchuan Li, Tianxiang Ma, Lijie Liu, Mingcong Liu, Yi Zhang, Gen Li, Xinghui Li, Siyu Zhou, Qian He, Xinglong Wu

    Abstract: Subject-to-video generation has witnessed substantial progress in recent years. However, existing models still face significant challenges in faithfully following textual instructions. This limitation, commonly known as the copy-paste problem, arises from the widely used in-pair training paradigm. This approach inherently entangles subject identity with background and contextual attributes by samp…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Project page: https://phantom-video.github.io/Phantom-Data/

  30. arXiv:2506.18244  [pdf]

    cs.LG

    Dual-Forward Path Teacher Knowledge Distillation: Bridging the Capacity Gap Between Teacher and Student

    Authors: Tong Li, Long Liu, Yihang Hu, Hu Chen, Shifeng Chen

    Abstract: Knowledge distillation (KD) provides an effective way to improve the performance of a student network under the guidance of pre-trained teachers. However, this approach usually brings in a large capacity gap between teacher and student networks, limiting the distillation gains. Previous methods addressing this problem either discard accurate knowledge representation or fail to dynamically adjust t…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 15 pages

  31. arXiv:2506.18145  [pdf, ps, other]

    cs.LG cs.AI

    Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection

    Authors: Zheng Zhan, Liliang Ren, Shuohang Wang, Liyuan Liu, Yang Liu, Yeyun Gong, Yanzhi Wang, Yelong Shen

    Abstract: Linear State Space Models (SSMs) offer remarkable performance gains in efficient sequence modeling, with constant inference-time computation and memory complexity. Recent advances, such as Mamba, further enhance SSMs with input-dependent gating and hardware-aware implementations, positioning them as strong alternatives to Transformers for long sequence modeling. However, efficiently scaling the ex…

    Submitted 22 June, 2025; originally announced June 2025.

  32. arXiv:2506.17913  [pdf, ps, other]

    cs.AI

    Learning, Reasoning, Refinement: A Framework for Kahneman's Dual-System Intelligence in GUI Agents

    Authors: Jinjie Wei, Jiyao Liu, Lihao Liu, Ming Hu, Junzhi Ning, Mingcheng Li, Weijie Yin, Junjun He, Xiao Liang, Chao Feng, Dingkang Yang

    Abstract: Graphical User Interface (GUI) agents have made significant progress in automating digital tasks through the utilization of computer vision and language models. Nevertheless, existing agent systems encounter notable limitations. Firstly, they predominantly depend on trial and error decision making rather than progressive reasoning, thereby lacking the capability to learn and adapt from interactive…

    Submitted 22 June, 2025; originally announced June 2025.

  33. arXiv:2506.16962  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    Enhancing Step-by-Step and Verifiable Medical Reasoning in MLLMs

    Authors: Haoran Sun, Yankai Jiang, Wenjie Lou, Yujie Zhang, Wenjie Li, Lilong Wang, Mianxin Liu, Lei Liu, Xiaosong Wang

    Abstract: Multimodal large language models (MLLMs) have begun to demonstrate robust reasoning capabilities on general tasks, yet their application in the medical domain remains in its early stages. Constructing chain-of-thought (CoT) training data is essential for bolstering the reasoning abilities of medical MLLMs. However, existing approaches exhibit a deficiency in offering a comprehensive framework for…

    Submitted 20 June, 2025; originally announced June 2025.

  34. arXiv:2506.15241  [pdf]

    cs.CL

    Research on Graph-Retrieval Augmented Generation Based on Historical Text Knowledge Graphs

    Authors: Yang Fan, Zhang Qi, Xing Wenqian, Liu Chang, Liu Liu

    Abstract: This article addresses domain knowledge gaps in general large language models for historical text analysis in the context of computational humanities and AIGC technology. We propose the Graph RAG framework, combining chain-of-thought prompting, self-instruction generation, and process supervision to create a character relationship dataset for The First Four Histories with minimal manual annotation. Th…

    Submitted 18 June, 2025; originally announced June 2025.

  35. arXiv:2506.15050  [pdf, ps, other]

    cs.AI

    Truncated Proximal Policy Optimization

    Authors: Tiantian Fan, Lingjun Liu, Yu Yue, Jiaze Chen, Chengyi Wang, Qiying Yu, Chi Zhang, Zhiqi Lin, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Bole Ma, Mofan Zhang, Gaohong Liu, Ru Zhang, Haotian Zhou, Cong Xie, Ruidong Zhu, Zhi Zhang, Xin Liu, Mingxuan Wang, Lin Yan, Yonghui Wu

    Abstract: Recently, test-time scaling Large Language Models (LLMs) have demonstrated exceptional reasoning capabilities across scientific and professional tasks by generating long chains-of-thought (CoT). As a crucial component for developing these reasoning models, reinforcement learning (RL), exemplified by Proximal Policy Optimization (PPO) and its variants, allows models to learn through trial and error…

    Submitted 17 June, 2025; originally announced June 2025.

  36. arXiv:2506.15025  [pdf, ps, other]

    cs.LG cs.AI cs.CL stat.ML

    Optimal Embedding Learning Rate in LLMs: The Effect of Vocabulary Size

    Authors: Soufiane Hayou, Liyuan Liu

    Abstract: Pretraining large language models is a costly process. To make this process more efficient, several methods have been proposed to optimize model architecture/parametrization and hardware use. On the parametrization side, $μP$ (Maximal Update Parametrization) parametrizes model weights and learning rate (LR) in a way that makes hyperparameters (HPs) transferable with width (embedding dimension): HP…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: TD,LR: How to set the learning rate for the embedding layer in LLMs?

  37. arXiv:2506.14248  [pdf, ps, other]

    cs.CL cs.AI

    Re-Initialization Token Learning for Tool-Augmented Large Language Models

    Authors: Chenghao Li, Liu Liu, Baosheng Yu, Jiayan Qiu, Yibing Zhan

    Abstract: Large language models have demonstrated exceptional performance, yet struggle with complex tasks such as numerical reasoning and plan generation. Integrating external tools, such as calculators and databases, into large language models (LLMs) is crucial for enhancing problem-solving capabilities. Current methods assign a unique token to each tool, enabling LLMs to call tools through token prediction-…

    Submitted 17 June, 2025; originally announced June 2025.

  38. arXiv:2506.13814  [pdf, ps, other]

    cs.GR cs.LG eess.IV

    ReFrame: Layer Caching for Accelerated Inference in Real-Time Rendering

    Authors: Lufei Liu, Tor M. Aamodt

    Abstract: Graphics rendering applications increasingly leverage neural networks in tasks such as denoising, supersampling, and frame extrapolation to improve image quality while maintaining frame rates. The temporal coherence inherent in these tasks presents an opportunity to reuse intermediate results from previous frames and avoid redundant computations. Recent work has shown that caching intermediate fea…

    Submitted 14 June, 2025; originally announced June 2025.

    Comments: Published at ICML 2025

  39. arXiv:2506.13363  [pdf, ps, other]

    cs.CL

    Efficient Medical VIE via Reinforcement Learning

    Authors: Lijun Liu, Ruiyang Li, Zhaocheng Liu, Chenglin Zhu, Chong Li, Jiehan Cheng, Qiang Ju, Jian Xie

    Abstract: Visual Information Extraction (VIE) converts unstructured document images into structured formats like JSON, critical for medical applications such as report analysis and online consultations. Traditional methods rely on OCR and language models, while end-to-end multimodal models offer direct JSON generation. However, domain-specific schemas and high annotation costs limit their effectiveness in m…

    Submitted 16 June, 2025; originally announced June 2025.

  40. arXiv:2506.12766  [pdf, ps, other]

    cs.CV

    Probing Deep into Temporal Profile Makes the Infrared Small Target Detector Much Better

    Authors: Ruojing Li, Wei An, Xinyi Ying, Yingqian Wang, Yimian Dai, Longguang Wang, Miao Li, Yulan Guo, Li Liu

    Abstract: Infrared small target (IRST) detection is challenging in simultaneously achieving precise, universal, robust and efficient performance due to extremely dim targets and strong interference. Current learning-based methods attempt to leverage "more" information from both the spatial and the short-term temporal domains, but suffer from unreliable performance under complex conditions while incurring c…

    Submitted 15 June, 2025; originally announced June 2025.

  41. arXiv:2506.12394  [pdf, ps, other]

    cs.CV cs.AI

    LARGO: Low-Rank Regulated Gradient Projection for Robust Parameter Efficient Fine-Tuning

    Authors: Haotian Zhang, Liu Liu, Baosheng Yu, Jiayan Qiu, Yanwei Ren, Xianglong Liu

    Abstract: The advent of parameter-efficient fine-tuning methods has significantly reduced the computational burden of adapting large-scale pretrained models to diverse downstream tasks. However, existing approaches often struggle to achieve robust performance under domain shifts while maintaining computational efficiency. To address this challenge, we propose Low-rAnk Regulated Gradient Projection (LARGO) a…

    Submitted 14 June, 2025; originally announced June 2025.

  42. arXiv:2506.12103  [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue , et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  43. arXiv:2506.11661  [pdf, ps, other]

    cs.CV

    Prohibited Items Segmentation via Occlusion-aware Bilayer Modeling

    Authors: Yunhan Ren, Ruihuang Li, Lingbo Liu, Changwen Chen

    Abstract: Instance segmentation of prohibited items in security X-ray images is a critical yet challenging task. This is mainly caused by the significant appearance gap between prohibited items in X-ray images and natural objects, as well as the severe overlapping among objects in X-ray images. To address these issues, we propose an occlusion-aware instance segmentation pipeline designed to identify prohibi…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: Accepted by ICME 2025

  44. arXiv:2506.11150  [pdf, ps, other]

    eess.IV cs.CV

    ADAgent: LLM Agent for Alzheimer's Disease Analysis with Collaborative Coordinator

    Authors: Wenlong Hou, Guangqian Yang, Ye Du, Yeung Lau, Lihao Liu, Junjun He, Ling Long, Shujun Wang

    Abstract: Alzheimer's disease (AD) is a progressive and irreversible neurodegenerative disease. Early and precise diagnosis of AD is crucial for timely intervention and treatment planning to alleviate the progressive neurodegeneration. However, most existing methods rely on single-modality data, which contrasts with the multifaceted approach used by medical experts. While some deep learning approaches proce…

    Submitted 15 June, 2025; v1 submitted 11 June, 2025; originally announced June 2025.

  45. arXiv:2506.10932  [pdf]

    cs.HC cs.CY cs.MM

    Video-Mediated Emotion Disclosure: Expressions of Fear, Sadness, and Joy by People with Schizophrenia on YouTube

    Authors: Jiaying Lizzy Liu, Yan Zhang

    Abstract: Individuals with schizophrenia frequently experience intense emotions and often turn to vlogging as a medium for emotional expression. While previous research has predominantly focused on text-based disclosure, little is known about how individuals construct narratives around emotions and emotional experiences in video blogs. Our study addresses this gap by analyzing 200 YouTube videos created by…

    Submitted 18 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

    Comments: 10 pages

    Journal ref: ASIS&T 2025

  46. arXiv:2506.10600  [pdf, ps, other]

    cs.RO cs.CV

    EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence

    Authors: Xinjie Wang, Liu Liu, Yu Cao, Ruiqi Wu, Wenkang Qin, Dehui Wang, Wei Sui, Zhizhong Su

    Abstract: Constructing a physically realistic and accurately scaled simulated 3D world is crucial for the training and evaluation of embodied intelligence tasks. The diversity, realism, low cost accessibility and affordability of 3D data assets are critical for achieving generalization and scalability in embodied AI. However, most current embodied intelligence tasks still rely heavily on traditional 3D comp…

    Submitted 16 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

  47. arXiv:2506.10312  [pdf, other]

    eess.AS cs.CL cs.SD

    AC/DC: LLM-based Audio Comprehension via Dialogue Continuation

    Authors: Yusuke Fujita, Tomoya Mizumoto, Atsushi Kojima, Lianbo Liu, Yui Sudo

    Abstract: We propose an instruction-following audio comprehension model that leverages the dialogue continuation ability of large language models (LLMs). Instead of directly generating target captions in training data, the proposed method trains a model to produce responses as if the input caption triggered a dialogue. This dialogue continuation training mitigates the caption variation problem. Learning to…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Accepted to Interspeech 2025

  48. arXiv:2506.10145  [pdf, ps, other]

    cs.CV

    RoCA: Robust Cross-Domain End-to-End Autonomous Driving

    Authors: Rajeev Yasarla, Shizhong Han, Hsin-Pai Cheng, Litian Liu, Shweta Mahajan, Apratim Bhattacharyya, Yunxiao Shi, Risheek Garrepalli, Hong Cai, Fatih Porikli

    Abstract: End-to-end (E2E) autonomous driving has recently emerged as a new paradigm, offering significant potential. However, few studies have looked into the practical challenge of deployment across domains (e.g., cities). Although several works have incorporated Large Language Models (LLMs) to leverage their open-world knowledge, LLMs do not guarantee cross-domain driving performance and may incur prohib…

    Submitted 17 June, 2025; v1 submitted 11 June, 2025; originally announced June 2025.

  49. arXiv:2506.09990  [pdf, ps, other]

    cs.RO cs.CV cs.LG

    Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation

    Authors: Wenbo Zhang, Tianrun Hu, Yanyuan Qiao, Hanbo Zhang, Yuchu Qin, Yang Li, Jiajun Liu, Tao Kong, Lingqiao Liu, Xiao Ma

    Abstract: We present Chain-of-Action (CoA), a novel visuo-motor policy paradigm built upon Trajectory Autoregressive Modeling. Unlike conventional approaches that predict next step action(s) forward, CoA generates an entire trajectory by explicit backward reasoning with task-specific goals through an action-level Chain-of-Thought (CoT) process. This process is unified within a single autoregressive structur…

    Submitted 11 June, 2025; originally announced June 2025.

  50. arXiv:2506.09448  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary

    Authors: Yui Sudo, Yusuke Fujita, Atsushi Kojima, Tomoya Mizumoto, Lianbo Liu

    Abstract: Speech foundation models (SFMs), such as Open Whisper-Style Speech Models (OWSM), are trained on massive datasets to achieve accurate automatic speech recognition. However, even SFMs struggle to accurately recognize rare and unseen words. While contextual biasing (CB) is a promising approach to improve recognition of such words, most CB methods are trained from scratch, resulting in lower performa…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Accepted to Interspeech 2025