Showing 1–50 of 8,811 results for author: Yi

Searching in archive cs.
  1. arXiv:2506.18204  [pdf, ps, other]

    cs.CV cs.AI

    Multimodal Fusion SLAM with Fourier Attention

    Authors: Youjie Zhou, Guofeng Mei, Yiming Wang, Yi Wan, Fabio Poiesi

    Abstract: Visual SLAM is particularly challenging in environments affected by noise, varying lighting conditions, and darkness. Learning-based optical flow algorithms can leverage multiple modalities to address these challenges, but traditional optical flow-based visual SLAM approaches often require significant computational resources. To overcome this limitation, we propose FMF-SLAM, an efficient multimodal…

    Submitted 22 June, 2025; originally announced June 2025.

  2. arXiv:2506.18023  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    PP-DocBee2: Improved Baselines with Efficient Data for Multimodal Document Understanding

    Authors: Kui Huang, Xinrong Chen, Wenyu Lv, Jincheng Liao, Guanzhong Wang, Yi Liu

    Abstract: This report introduces PP-DocBee2, an advanced version of the PP-DocBee, designed to enhance multimodal document understanding. Built on a large multimodal model architecture, PP-DocBee2 addresses the limitations of its predecessor through key technological improvements, including enhanced synthetic data quality, improved visual feature fusion strategy, and optimized inference methodologies. These…

    Submitted 22 June, 2025; originally announced June 2025.

  3. arXiv:2506.17798  [pdf, ps, other]

    cs.SE cs.CR

    SAVANT: Vulnerability Detection in Application Dependencies through Semantic-Guided Reachability Analysis

    Authors: Wang Lingxiang, Quanzhi Fu, Wenjia Song, Gelei Deng, Yi Liu, Dan Williams, Ying Zhang

    Abstract: The integration of open-source third-party library dependencies in Java development introduces significant security risks when these libraries contain known vulnerabilities. Existing Software Composition Analysis (SCA) tools struggle to effectively detect vulnerable API usage from these libraries due to limitations in understanding API usage semantics and computational challenges in analyzing comp…

    Submitted 21 June, 2025; originally announced June 2025.

  4. arXiv:2506.17728  [pdf, ps, other]

    cs.CL cs.AI

    KAG-Thinker: Teaching Large Language Models to Think with Human-like Reasoning Process

    Authors: Dalong Zhang, Jun Xu, Jun Zhou, Lei Liang, Lin Yuan, Ling Zhong, Mengshu Sun, Peilong Zhao, QiWei Wang, Xiaorui Wang, Xinkai Du, YangYang Hou, Yu Ao, ZhaoYang Wang, Zhengke Gui, ZhiYing Yi, Zhongpu Bo

    Abstract: In this paper, we introduce KAG-Thinker, a novel human-like reasoning framework built upon a parameter-light large language model (LLM). Our approach enhances the logical coherence and contextual consistency of the thinking process in question-answering (Q&A) tasks on domain-specific knowledge bases (KBs) within LLMs. This framework simulates human cognitive mechanisms for handling complex proble…

    Submitted 21 June, 2025; originally announced June 2025.

  5. arXiv:2506.17562  [pdf, ps, other]

    cs.CV cs.CL

    LLM-driven Medical Report Generation via Communication-efficient Heterogeneous Federated Learning

    Authors: Haoxuan Che, Haibo Jin, Zhengrui Guo, Yi Lin, Cheng Jin, Hao Chen

    Abstract: LLMs have demonstrated significant potential in Medical Report Generation (MRG), yet their development requires large amounts of medical image-report pairs, which are commonly scattered across multiple centers. Centralizing these data is exceptionally challenging due to privacy regulations, thereby impeding model development and broader adoption of LLM-driven MRG models. To address this challenge,…

    Submitted 20 June, 2025; originally announced June 2025.

  6. arXiv:2506.17328  [pdf, ps, other]

    cs.RO

    Reflective VLM Planning for Dual-Arm Desktop Cleaning: Bridging Open-Vocabulary Perception and Precise Manipulation

    Authors: Yufan Liu, Yi Wu, Gweneth Ge, Haoliang Cheng, Rui Liu

    Abstract: Desktop cleaning demands open-vocabulary recognition and precise manipulation for heterogeneous debris. We propose a hierarchical framework integrating reflective Vision-Language Model (VLM) planning with dual-arm execution via structured scene representation. Grounded-SAM2 facilitates open-vocabulary detection, while a memory-augmented VLM generates, critiques, and revises manipulation sequences.…

    Submitted 18 June, 2025; originally announced June 2025.

  7. arXiv:2506.17249  [pdf, ps, other]

    cs.LG cs.AI

    Improving Prediction Certainty Estimation for Reliable Early Exiting via Null Space Projection

    Authors: Jianing He, Qi Zhang, Duoqian Miao, Yi Kun, Shufeng Hao, Hongyun Zhang, Zhihua Wei

    Abstract: Early exiting has demonstrated great potential in accelerating the inference of pre-trained language models (PLMs) by enabling easy samples to exit at shallow layers, eliminating the need for executing deeper layers. However, existing early exiting methods primarily rely on class-relevant logits to formulate their exiting signals for estimating prediction certainty, neglecting the detrimental infl…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: IJCAI 2025, 9 pages

  8. arXiv:2506.17114  [pdf, ps, other]

    cs.AI

    Mathematical Proof as a Litmus Test: Revealing Failure Modes of Advanced Large Reasoning Models

    Authors: Dadi Guo, Jiayu Liu, Zhiyuan Fan, Zhitao He, Haoran Li, Yumeng Wang, Yi R. Fung

    Abstract: Large reasoning models (e.g., R1, o3) have demonstrated remarkable mathematical problem-solving abilities. However, the high reported accuracy of these advanced models on popular datasets, reliance on purely numerical evaluation and potential benchmark leakage, often masks their true reasoning shortcomings. To address this, we propose leveraging the inherent rigor and methodological complexity of…

    Submitted 23 June, 2025; v1 submitted 20 June, 2025; originally announced June 2025.

  9. arXiv:2506.16677  [pdf, ps, other]

    cs.HC cs.RO

    PPTP: Performance-Guided Physiological Signal-Based Trust Prediction in Human-Robot Collaboration

    Authors: Hao Guo, Wei Fan, Shaohui Liu, Feng Jiang, Chunzhi Yi

    Abstract: Trust prediction is a key issue in human-robot collaboration, especially in construction scenarios where maintaining appropriate trust calibration is critical for safety and efficiency. This paper introduces the Performance-guided Physiological signal-based Trust Prediction (PPTP), a novel framework designed to improve trust assessment. We designed a human-robot construction scenario with three di…

    Submitted 19 June, 2025; originally announced June 2025.

  10. arXiv:2506.16447  [pdf, ps, other]

    cs.CR cs.CL

    Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models

    Authors: Biao Yi, Tiansheng Huang, Sishuo Chen, Tong Li, Zheli Liu, Zhixuan Chu, Yiming Li

    Abstract: Backdoor unalignment attacks against Large Language Models (LLMs) enable the stealthy compromise of safety alignment using a hidden trigger while evading normal safety auditing. These attacks pose significant threats to the applications of LLMs in the real-world Large Language Model as a Service (LLMaaS) setting, where the deployed model is a fully black-box system that can only interact through t…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Accepted at ICLR 2025

    Journal ref: Proceedings of The Thirteenth International Conference on Learning Representations (ICLR 2025)

  11. arXiv:2506.16400  [pdf, ps, other]

    cs.CR

    Physical-Layer Signal Injection Attacks on EV Charging Ports: Bypassing Authentication via Electrical-Level Exploits

    Authors: Hetian Shi, Yi He, Shangru Song, Jianwei Zhuge, Jian Mao

    Abstract: The proliferation of electric vehicles in recent years has significantly expanded the charging infrastructure while introducing new security risks to both vehicles and chargers. In this paper, we investigate the security of major charging protocols such as SAE J1772, CCS, IEC 61851, GB/T 20234, and NACS, uncovering new physical signal spoofing attacks in their authentication mechanisms. By inserti…

    Submitted 19 June, 2025; originally announced June 2025.

  12. arXiv:2506.16157  [pdf, ps, other]

    cs.CV

    MBA: Multimodal Bidirectional Attack for Referring Expression Segmentation Models

    Authors: Xingbai Chen, Tingchao Fu, Renyang Liu, Wei Zhou, Chao Yi

    Abstract: Referring Expression Segmentation (RES) enables precise object segmentation in images based on natural language descriptions, offering high flexibility and broad applicability in real-world vision tasks. Despite its impressive performance, the robustness of RES models against adversarial examples remains largely unexplored. While prior adversarial attack methods have explored adversarial robustnes…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 17 pages, 5 pages

  13. arXiv:2506.16141  [pdf, ps, other]

    cs.CV cs.AI cs.CL cs.LG

    GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning

    Authors: Yi Chen, Yuying Ge, Rui Wang, Yixiao Ge, Junhao Cheng, Ying Shan, Xihui Liu

    Abstract: Recent reinforcement learning approaches, such as outcome-supervised GRPO, have advanced Chain-of-Thought reasoning in large language models (LLMs), yet their adaptation to multimodal LLMs (MLLMs) is unexplored. To address the lack of rigorous evaluation for MLLM post-training methods, we introduce SEED-Bench-R1, a benchmark with complex real-world videos requiring balanced perception and reasonin…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Code released at: https://github.com/TencentARC/GRPO-CARE

  14. arXiv:2506.15977  [pdf, ps, other]

    cs.CV

    Towards Classifying Histopathological Microscope Images as Time Series Data

    Authors: Sungrae Hong, Hyeongmin Park, Youngsin Ko, Sol Lee, Bryan Wong, Mun Yong Yi

    Abstract: As the frontline data for cancer diagnosis, microscopic pathology images are fundamental for providing patients with rapid and accurate treatment. However, despite their practical value, the deep learning community has largely overlooked their usage. This paper proposes a novel approach to classifying microscopy images as time series data, addressing the unique challenges posed by their manual acq…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 5 pages, 4 figures, Accepted by International Symposium on Biomedical Imaging (ISBI) 2025

  15. arXiv:2506.15961  [pdf]

    cs.DC cs.AI cs.LG

    TrainVerify: Equivalence-Based Verification for Distributed LLM Training

    Authors: Yunchi Lu, Youshan Miao, Cheng Tan, Peng Huang, Yi Zhu, Xian Zhang, Fan Yang

    Abstract: Training large language models (LLMs) at scale requires parallel execution across thousands of devices, incurring enormous computational costs. Yet, these costly distributed trainings are rarely verified, leaving them prone to silent errors and potentially wasting millions of GPU hours. We introduce TrainVerify, a system for verifiable distributed training of LLMs. Given a deep learning model's lo…

    Submitted 18 June, 2025; originally announced June 2025.

  16. arXiv:2506.15755  [pdf, ps, other]

    cs.CV cs.CL

    VLMInferSlow: Evaluating the Efficiency Robustness of Large Vision-Language Models as a Service

    Authors: Xiasi Wang, Tianliang Yao, Simin Chen, Runqi Wang, Lei YE, Kuofeng Gao, Yi Huang, Yuan Yao

    Abstract: Vision-Language Models (VLMs) have demonstrated great potential in real-world applications. While existing research primarily focuses on improving their accuracy, the efficiency remains underexplored. Given the real-time demands of many applications and the high inference overhead of VLMs, efficiency robustness is a critical issue. However, previous studies evaluate efficiency robustness under unr…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: Accepted by ACL 2025

  17. arXiv:2506.15741  [pdf, ps, other]

    cs.AI cs.CL

    OAgents: An Empirical Study of Building Effective Agents

    Authors: He Zhu, Tianrui Qin, King Zhu, Heyuan Huang, Yeyi Guan, Jinxiang Xia, Yi Yao, Hanhao Li, Ningning Wang, Pai Liu, Tianhao Peng, Xin Gui, Xiaowan Li, Yuhui Liu, Yuchen Eleanor Jiang, Jun Wang, Changwang Zhang, Xiangru Tang, Ge Zhang, Jian Yang, Minghao Liu, Xitong Gao, Wangchunshu Zhou, Jiaheng Liu

    Abstract: Recently, Agentic AI has become an increasingly popular research field. However, we argue that current agent research practices lack standardization and scientific rigor, making it hard to conduct fair comparisons among methods. As a result, it is still unclear how different design choices in agent frameworks affect effectiveness, and measuring their progress remains challenging. In this work, we…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 28 pages

  18. arXiv:2506.15697  [pdf, ps, other]

    cs.AR cs.CL cs.LG

    DeepRTL2: A Versatile Model for RTL-Related Tasks

    Authors: Yi Liu, Hongji Zhang, Yunhao Zhou, Zhengyuan Shi, Changran Xu, Qiang Xu

    Abstract: The integration of large language models (LLMs) into electronic design automation (EDA) has significantly advanced the field, offering transformative benefits, particularly in register transfer level (RTL) code generation and understanding. While previous studies have demonstrated the efficacy of fine-tuning LLMs for these generation-based tasks, embedding-based tasks, which are equally critical t…

    Submitted 28 May, 2025; originally announced June 2025.

    Comments: ACL 2025 Findings

  19. arXiv:2506.15610  [pdf, ps, other]

    cs.CV

    BoxFusion: Reconstruction-Free Open-Vocabulary 3D Object Detection via Real-Time Multi-View Box Fusion

    Authors: Yuqing Lan, Chenyang Zhu, Zhirui Gao, Jiazhao Zhang, Yihan Cao, Renjiao Yi, Yijie Wang, Kai Xu

    Abstract: Open-vocabulary 3D object detection has gained significant interest due to its critical applications in autonomous driving and embodied AI. Existing detection methods, whether offline or online, typically rely on dense point cloud reconstruction, which imposes substantial computational overhead and memory constraints, hindering real-time deployment in downstream tasks. To address this, we propose…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 11 pages, 6 figures

  20. arXiv:2506.15402  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    MCOO-SLAM: A Multi-Camera Omnidirectional Object SLAM System

    Authors: Miaoxin Pan, Jinnan Li, Yaowen Zhang, Yi Yang, Yufeng Yue

    Abstract: Object-level SLAM offers structured and semantically meaningful environment representations, making it more interpretable and suitable for high-level robotic tasks. However, most existing approaches rely on RGB-D sensors or monocular views, which suffer from narrow fields of view, occlusion sensitivity, and limited depth perception, especially in large-scale or outdoor environments. These limitatio…

    Submitted 18 June, 2025; originally announced June 2025.

  21. arXiv:2506.15267  [pdf, ps, other]

    cs.IR

    Next-User Retrieval: Enhancing Cold-Start Recommendations via Generative Next-User Modeling

    Authors: Yu-Ting Lan, Yang Huo, Yi Shen, Xiao Yang, Zuotao Liu

    Abstract: The item cold-start problem is critical for online recommendation systems, as the success of this phase determines whether high-quality new items can transition to popular ones, receive essential feedback to inspire creators, and thus lead to the long-term retention of creators. However, modern recommendation systems still struggle to address item cold-start challenges due to the heavy reliance on…

    Submitted 18 June, 2025; originally announced June 2025.

  22. arXiv:2506.15155  [pdf, ps, other]

    cs.DC

    eLLM: Elastic Memory Management Framework for Efficient LLM Serving

    Authors: Jiale Xu, Rui Zhang, Yi Xiong, Cong Guo, Zihan Liu, Yangjie Zhou, Weiming Hu, Hao Wu, Changxu Shao, Ziqing Wang, Yongjie Yuan, Junping Zhao, Minyi Guo, Jingwen Leng

    Abstract: Large Language Models are increasingly being deployed in datacenters. Serving these models requires careful memory management, as their memory usage includes static weights, dynamic activations, and key-value caches. While static weights are constant and predictable, dynamic components such as activations and KV caches change frequently during runtime, presenting significant challenges for efficie…

    Submitted 18 June, 2025; originally announced June 2025.

  23. arXiv:2506.14965  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

    Authors: Zhoujun Cheng, Shibo Hao, Tianyang Liu, Fan Zhou, Yutao Xie, Feng Yao, Yuexin Bian, Yonghao Zhuang, Nilabjo Dey, Yuheng Zha, Yi Gu, Kun Zhou, Yuqi Wang, Yuan Li, Richard Fan, Jianshu She, Chengqian Gao, Abulhair Saparov, Haonan Li, Taylor W. Killian, Mikhail Yurochkin, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

    Abstract: Reinforcement learning (RL) has emerged as a promising approach to improve large language model (LLM) reasoning, yet most open efforts focus narrowly on math and code, limiting our understanding of its broader applicability to general reasoning. A key challenge lies in the lack of reliable, scalable RL reward signals across diverse reasoning domains. We introduce Guru, a curated RL reasoning corpu…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 38 pages, 9 figures. Under review

  24. SimSpark: Interactive Simulation of Social Media Behaviors

    Authors: Ziyue Lin, Yi Shan, Lin Gao, Xinghua Jia, Siming Chen

    Abstract: Understanding user behaviors on social media has garnered significant scholarly attention, enhancing our comprehension of how virtual platforms impact society and empowering decision-makers. Simulating social media behaviors provides a robust tool for capturing the patterns of social media behaviors, testing hypotheses, and predicting the effects of various interventions, ultimately contributing t…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 32 pages, 7 figures

    Journal ref: Proc. ACM Hum.-Comput. Interact. 9, 2, Article CSCW168 (April 2025), 32 pages

  25. arXiv:2506.14437  [pdf, ps, other]

    cs.IR

    Similarity = Value? Consultation Value Assessment and Alignment for Personalized Search

    Authors: Weicong Qin, Yi Xu, Weijie Yu, Teng Shi, Chenglei Shen, Ming He, Jianping Fan, Xiao Zhang, Jun Xu

    Abstract: Personalized search systems in e-commerce platforms increasingly involve user interactions with AI assistants, where users consult about products, usage scenarios, and more. Leveraging consultation to personalize search services is trending. Existing methods typically rely on semantic similarity to align historical consultations with current queries due to the absence of 'value' labels, but we obs…

    Submitted 17 June, 2025; originally announced June 2025.

  26. arXiv:2506.14396  [pdf, ps, other]

    cs.SD cs.MM

    Manipulated Regions Localization For Partially Deepfake Audio: A Survey

    Authors: Jiayi He, Jiangyan Yi, Jianhua Tao, Siding Zeng, Hao Gu

    Abstract: With the development of audio deepfake techniques, attacks with partially deepfake audio are beginning to rise. Compared to fully deepfake, it is much harder to be identified by the detector due to the partially cryptic manipulation, resulting in higher security risks. Although some studies have been launched, there is no comprehensive review to systematically introduce the current situations and…

    Submitted 17 June, 2025; originally announced June 2025.

  27. arXiv:2506.14356  [pdf, ps, other]

    cs.CV cs.AI

    EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization

    Authors: Xiaoqi Wang, Yi Wang, Lap-Pui Chau

    Abstract: Egocentric video-language understanding demands both high efficiency and accurate spatial-temporal modeling. Existing approaches face three key challenges: 1) Excessive pre-training cost arising from multi-stage pre-training pipelines, 2) Ineffective spatial-temporal encoding due to manually split 3D rotary positional embeddings that hinder feature interactions, and 3) Imprecise learning objective…

    Submitted 17 June, 2025; originally announced June 2025.

  28. arXiv:2506.14189  [pdf, ps, other]

    cs.CV

    Egocentric Human-Object Interaction Detection: A New Benchmark and Method

    Authors: Kunyuan Deng, Yi Wang, Lap-Pui Chau

    Abstract: Understanding the interaction between humans and objects has gained much attention in recent years. Existing human-object interaction (HOI) detection methods mainly focus on the third-person perspectives, overlooking a more intuitive way from the egocentric view of HOI, namely Ego-HOI. This paper introduces Ego-HOIBench, a new dataset to promote the benchmarking and development of Ego-HOI detec…

    Submitted 17 June, 2025; originally announced June 2025.

  29. arXiv:2506.14142  [pdf, ps, other]

    cs.CV cs.CL

    RadFabric: Agentic AI System with Reasoning Capability for Radiology

    Authors: Wenting Chen, Yi Dong, Zhaojun Ding, Yucheng Shi, Yifan Zhou, Fang Zeng, Yijun Luo, Tianyu Lin, Yihang Su, Yichen Wu, Kai Zhang, Zhen Xiang, Tianming Liu, Ninghao Liu, Lichao Sun, Yixuan Yuan, Xiang Li

    Abstract: Chest X-ray (CXR) imaging remains a critical diagnostic tool for thoracic conditions, but current automated systems face limitations in pathology coverage, diagnostic accuracy, and integration of visual and textual reasoning. To address these gaps, we propose RadFabric, a multi-agent, multimodal reasoning framework that unifies visual and textual analysis for comprehensive CXR interpretation. RadF…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 4 figures, 2 tables

  30. arXiv:2506.14028  [pdf, ps, other]

    cs.CL

    MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

    Authors: Xueqing Peng, Lingfei Qian, Yan Wang, Ruoyu Xiang, Yueru He, Yang Ren, Mingyang Jiang, Jeff Zhao, Huan He, Yi Han, Yun Feng, Yuechen Jiang, Yupeng Cao, Haohang Li, Yangyang Yu, Xiaoyu Wang, Penglei Gao, Shengyuan Lin, Keyi Wang, Shanshan Yang, Yilun Zhao, Zhiwei Liu, Peng Lu, Jerry Huang, Suyuchen Wang, et al. (19 additional authors not shown)

    Abstract: Recent advances in large language models (LLMs) have accelerated progress in financial NLP and applications, yet existing benchmarks remain limited to monolingual and unimodal settings, often over-relying on simple tasks and failing to reflect the complexity of real-world financial communication. We introduce MultiFinBen, the first multilingual and multimodal benchmark tailored to the global finan…

    Submitted 19 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

  31. arXiv:2506.13695  [pdf, ps, other]

    cs.IR

    OneRec Technical Report

    Authors: Guorui Zhou, Jiaxin Deng, Jinghao Zhang, Kuo Cai, Lejian Ren, Qiang Luo, Qianqian Wang, Qigen Hu, Rui Huang, Shiyao Wang, Weifeng Ding, Wuchao Li, Xinchen Luo, Xingmei Wang, Zexuan Cheng, Zixing Zhang, Bin Zhang, Boxuan Wang, Chaoyi Ma, Chengru Song, Chenhui Wang, Di Wang, Dongxue Meng, Fan Yang, Fangyu Zhang, et al. (40 additional authors not shown)

    Abstract: Recommender systems have been widely used in various large-scale user-oriented platforms for many years. However, compared to the rapid developments in the AI community, recommendation systems have not achieved a breakthrough in recent years. For instance, they still rely on a multi-stage cascaded architecture rather than an end-to-end approach, leading to computational fragmentation and optimizat…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Authors are listed alphabetically by their first name

  32. arXiv:2506.13678  [pdf]

    cs.LG

    A Gravity-informed Spatiotemporal Transformer for Human Activity Intensity Prediction

    Authors: Yi Wang, Zhenghong Wang, Fan Zhang, Chengling Tang, Chaogui Kang, Di Zhu, Zhongfu Ma, Sijie Ruan, Weiyu Zhang, Yu Zheng, Philip S. Yu, Yu Liu

    Abstract: Human activity intensity prediction is crucial to many location-based services. Although tremendous progress has been made to model dynamic spatiotemporal patterns of human activity, most existing methods, including spatiotemporal graph neural networks (ST-GNNs), overlook physical constraints of spatial interactions and the over-smoothing phenomenon in spatial correlation modeling. To address th…

    Submitted 18 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

    Comments: 18 pages, 13 figures, under review

  33. arXiv:2506.13226  [pdf, ps, other]

    cs.CE

    A modified Newmark/Newton-Raphson method with automatic differentiation for general nonlinear dynamics analysis

    Authors: Yifan Jiang, Yuhong Jin, Lei Hou, Yi Chen, Andong Cong

    Abstract: The Newmark/Newton-Raphson (NNR) method is widely employed for solving nonlinear dynamic systems. However, the current NNR method exhibits limited applicability in complex nonlinear dynamic systems, as the acquisition of the Jacobian matrix required for Newton iterations incurs substantial computational costs and may even prove intractable in certain cases. To address these limitations, we integra…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 18 pages, 9 figures

  34. arXiv:2506.13205  [pdf, ps, other]

    cs.CR cs.AI

    Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments

    Authors: Xuan Wang, Siyuan Liang, Zhe Liu, Yi Yu, Yuliang Lu, Xiaochun Cao, Ee-Chien Chang

    Abstract: With the growing integration of vision-language models (VLMs), mobile agents are now widely used for tasks like UI automation and camera-based user assistance. These agents are often fine-tuned on limited user-generated datasets, leaving them vulnerable to covert threats during the training process. In this work we present GHOST, the first clean-label backdoor attack specifically designed for mobi…

    Submitted 19 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

    Comments: 12 pages

  35. arXiv:2506.13151  [pdf, ps, other]

    cs.AR

    Reconfigurable Digital RRAM Logic Enables In-Situ Pruning and Learning for Edge AI

    Authors: Songqi Wang, Yue Zhang, Jia Chen, Xinyuan Zhang, Yi Li, Ning Lin, Yangu He, Jichang Yang, Yingjie Yu, Yi Li, Zhongrui Wang, Xiaojuan Qi, Han Wang

    Abstract: The human brain simultaneously optimizes synaptic weights and topology by growing, pruning, and strengthening synapses while performing all computation entirely in memory. In contrast, modern artificial-intelligence systems separate weight optimization from topology optimization and depend on energy-intensive von Neumann architectures. Here, we present a software-hardware co-design that bridges th…

    Submitted 16 June, 2025; originally announced June 2025.

  36. arXiv:2506.13129  [pdf, ps, other]

    cs.HC

    ChartBlender: An Interactive System for Authoring and Synchronizing Visualization Charts in Video

    Authors: Yi He, Yuqi Liu, Chenpu Li, Ruoyan Chen, Chuer Chen, Shengqi Dang, Nan Cao

    Abstract: Embedding data visualizations in video can enhance the communication of complex information. However, this process is often labor-intensive, requiring designers to adjust visualizations frame by frame manually. In this work, we present ChartBlender, a novel system that streamlines this process by enabling users to create data visualizations, embed them seamlessly into video scenes, and automatical…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 11 pages, 7 figures

  37. arXiv:2506.13065  [pdf, ps, other]

    cs.CL cs.AI

    MotiveBench: How Far Are We From Human-Like Motivational Reasoning in Large Language Models?

    Authors: Xixian Yong, Jianxun Lian, Xiaoyuan Yi, Xiao Zhou, Xing Xie

    Abstract: Large language models (LLMs) have been widely adopted as the core of agent frameworks in various scenarios, such as social simulations and AI companions. However, the extent to which they can replicate human-like motivations remains an underexplored question. Existing benchmarks are constrained by simplistic scenarios and the absence of character identities, resulting in an information asymmetry w…

    Submitted 15 June, 2025; originally announced June 2025.

  38. arXiv:2506.13063  [pdf, ps, other]

    cs.CV cs.CL

    PRISM2: Unlocking Multi-Modal General Pathology AI with Clinical Dialogue

    Authors: George Shaikovski, Eugene Vorontsov, Adam Casson, Julian Viret, Eric Zimmermann, Neil Tenenholtz, Yi Kan Wang, Jan H. Bernhard, Ran A. Godrich, Juan A. Retamero, Razik Yousfi, Nicolo Fusi, Thomas J. Fuchs, Kristen Severson, Siqi Liu

    Abstract: Recent pathology foundation models can provide rich tile-level representations but fall short of delivering general-purpose clinical utility without further extensive model development. These models lack whole-slide image (WSI) understanding and are not trained with large-scale diagnostic data, limiting their performance on diverse downstream tasks. We introduce PRISM2, a multi-modal slide-level f…

    Submitted 15 June, 2025; originally announced June 2025.

  39. arXiv:2506.12804  [pdf, ps, other]

    cs.AI

    Fuzzy Propositional Formulas under the Stable Model Semantics

    Authors: Joohyung Lee, Yi Wang

    Abstract: We define a stable model semantics for fuzzy propositional formulas, which generalizes both fuzzy propositional logic and the stable model semantics of classical propositional formulas. The syntax of the language is the same as the syntax of fuzzy propositional logic, but its semantics distinguishes stable models from non-stable models. The generality of the language allows for highly configurable…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: In the Special Issue on Logics for Reasoning about Preferences, Uncertainty and Vagueness of the IfCoLog Journal of Logics and their Applications, pages 1927-1972, 2017

  40. arXiv:2506.12751  [pdf, ps, other]

    stat.ML cs.LG

    Single Index Bandits: Generalized Linear Contextual Bandits with Unknown Reward Functions

    Authors: Yue Kang, Mingshuo Liu, Bongsoo Yi, Jing Lyu, Zhi Zhang, Doudou Zhou, Yao Li

    Abstract: Generalized linear bandits have been extensively studied due to their broad applicability in real-world online decision-making problems. However, these methods typically assume that the expected reward function is known to the users, an assumption that is often unrealistic in practice. Misspecification of this link function can lead to the failure of all existing algorithms. In this work, we addre…

    Submitted 15 June, 2025; originally announced June 2025.

  41. arXiv:2506.12708  [pdf, ps, other]

    cs.DC cs.AI cs.AR cs.LG

    Serving Large Language Models on Huawei CloudMatrix384

    Authors: Pengfei Zuo, Huimin Lin, Junbo Deng, Nan Zou, Xingkun Yang, Yingyu Diao, Weifeng Gao, Ke Xu, Zhangyu Chen, Shirui Lu, Zhao Qiu, Peiyang Li, Xianyu Chang, Zhengzhong Yu, Fangzheng Miao, Jia Zheng, Ying Li, Yuan Feng, Bei Wang, Zaijian Zong, Mosong Zhou, Wenli Zhou, Houjiang Chen, Xingyu Liao, Yipeng Li, et al. (21 additional authors not shown)

    Abstract: The rapid evolution of large language models (LLMs), driven by growing parameter scales, adoption of mixture-of-experts (MoE) architectures, and expanding context lengths, imposes unprecedented demands on AI infrastructure. Traditional AI clusters face limitations in compute intensity, memory bandwidth, inter-chip communication, and latency, compounded by variable workloads and strict service-leve…

    Submitted 19 June, 2025; v1 submitted 14 June, 2025; originally announced June 2025.

    Comments: 59 pages, 24 figures

  42. arXiv:2506.12530  [pdf, ps, other]

    cs.CV

    Towards Seamless Borders: A Method for Mitigating Inconsistencies in Image Inpainting and Outpainting

    Authors: Xingzhong Hou, Jie Wu, Boxiao Liu, Yi Zhang, Guanglu Song, Yunpeng Liu, Yu Liu, Haihang You

    Abstract: Image inpainting is the task of reconstructing missing or damaged parts of an image in a way that seamlessly blends with the surrounding content. With the advent of advanced generative models, especially diffusion models and generative adversarial networks, inpainting has achieved remarkable improvements in visual quality and coherence. However, achieving seamless continuity remains a significant…

    Submitted 14 June, 2025; originally announced June 2025.

  43. arXiv:2506.12379  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Training-free LLM Merging for Multi-task Learning

    Authors: Zichuan Fu, Xian Wu, Yejing Wang, Wanyu Wang, Shanshan Ye, Hongzhi Yin, Yi Chang, Yefeng Zheng, Xiangyu Zhao

    Abstract: Large Language Models (LLMs) have demonstrated exceptional capabilities across diverse natural language processing (NLP) tasks. The release of open-source LLMs like LLaMA and Qwen has triggered the development of numerous fine-tuned models tailored for various tasks and languages. In this paper, we explore an important question: is it possible to combine these specialized models to create a unifie…

    Submitted 14 June, 2025; originally announced June 2025.

    Comments: 14 pages, 6 figures

    Journal ref: ACL 2025 Main

  44. arXiv:2506.12331  [pdf, ps, other]

    cs.MA cs.AI

    IndoorWorld: Integrating Physical Task Solving and Social Simulation in A Heterogeneous Multi-Agent Environment

    Authors: Dekun Wu, Frederik Brudy, Bang Liu, Yi Wang

    Abstract: Virtual environments are essential to AI agent research. Existing environments for LLM agent research typically focus on either physical task solving or social simulation, with the former oversimplifying agent individuality and social dynamics, and the latter lacking physical grounding of social behaviors. We introduce IndoorWorld, a heterogeneous multi-agent environment that tightly integrates ph…

    Submitted 13 June, 2025; originally announced June 2025.

  45. arXiv:2506.12258  [pdf, ps, other]

    cs.CV cs.CY

    EgoPrivacy: What Your First-Person Camera Says About You?

    Authors: Yijiang Li, Genpei Zhang, Jiacheng Cheng, Yi Li, Xiaojun Shan, Dashan Gao, Jiancheng Lyu, Yuan Li, Ning Bi, Nuno Vasconcelos

    Abstract: While the rapid proliferation of wearable cameras has raised significant concerns about egocentric video privacy, prior work has largely overlooked the unique privacy threats posed to the camera wearer. This work investigates the core question: How much privacy information about the camera wearer can be inferred from their first-person view videos? We introduce EgoPrivacy, the first large-scale be…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: ICML 2025

  46. arXiv:2506.12257  [pdf]

    cs.CR

    Lessons for Cybersecurity from the American Public Health System

    Authors: Adam Shostack, L. Jean Camp, Yi Ting Chua, Josiah Dykstra, Brian LaMacchia, Daniel Lopresti

    Abstract: The United States needs national institutions and frameworks to systematically collect cybersecurity data, measure outcomes, and coordinate responses across government and private sectors, similar to how public health systems track and address disease outbreaks.

    Submitted 3 March, 2025; originally announced June 2025.

  47. arXiv:2506.12241  [pdf, ps, other]

    cs.AI cs.LG

    Privacy Reasoning in Ambiguous Contexts

    Authors: Ren Yi, Octavian Suciu, Adria Gascon, Sarah Meiklejohn, Eugene Bagdasarian, Marco Gruteser

    Abstract: We study the ability of language models to reason about appropriate information disclosure - a central aspect of the evolving field of agentic privacy. Whereas previous works have focused on evaluating a model's ability to align with human decisions, we examine the role of ambiguity and missing context on model performance when making information-sharing decisions. We identify context ambiguity as…

    Submitted 13 June, 2025; originally announced June 2025.

  48. arXiv:2506.12117  [pdf, ps, other]

    q-bio.NC cs.AI

    Scale-Invariance Drives Convergence in AI and Brain Representations

    Authors: Junjie Yu, Wenxiao Ma, Jianyu Zhang, Haotian Deng, Zihan Deng, Yi Guo, Quanying Liu

    Abstract: Despite variations in architecture and pretraining strategies, recent studies indicate that large-scale AI models often converge toward similar internal representations that also align with neural activity. We propose that scale-invariance, a fundamental structural principle in natural systems, is a key driver of this convergence. In this work, we propose a multi-scale analytical framework to quan…

    Submitted 13 June, 2025; originally announced June 2025.

  49. arXiv:2506.12103  [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  50. arXiv:2506.11418  [pdf, ps, other]

    cs.CL

    Efficient Long-Context LLM Inference via KV Cache Clustering

    Authors: Jie Hu, Shengnan Wang, Yutong He, Ping Gong, Jiawei Yi, Juncheng Zhang, Youhui Bai, Renhai Chen, Gong Zhang, Cheng Li, Kun Yuan

    Abstract: Large language models (LLMs) with extended context windows have become increasingly prevalent for tackling complex tasks. However, the substantial Key-Value (KV) cache required for long-context LLMs poses significant deployment challenges. Existing approaches either discard potentially critical information needed for future generations or offer limited efficiency gains due to high computational ov…

    Submitted 12 June, 2025; originally announced June 2025.