Showing 1–50 of 1,975 results for author: Zhao, J

Searching in archive cs.
  1. arXiv:2505.10351  [pdf, other]

    cs.CV

    A Unified and Scalable Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability

    Authors: Jie Zhu, Jirong Zha, Ding Li, Leye Wang

    Abstract: Self-supervised learning shows promise in harnessing extensive unlabeled data, but it also confronts significant privacy concerns, especially in vision. In this paper, we perform membership inference on visual self-supervised models in a more realistic setting: self-supervised training method and details are unknown for an adversary when attacking as he usually faces a black-box system in practice…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: An extension of our ACM CCS2024 conference paper (arXiv:2404.02462). We show the impacts of scaling from both data and model aspects on membership inference for self-supervised visual encoders

  2. arXiv:2505.10315  [pdf, other]

    cs.CR cs.AI

    Private Transformer Inference in MLaaS: A Survey

    Authors: Yang Li, Xinyu Zhou, Yitong Wang, Liangxin Qian, Jun Zhao

    Abstract: Transformer models have revolutionized AI, powering applications like content generation and sentiment analysis. However, their deployment in Machine Learning as a Service (MLaaS) raises significant privacy concerns, primarily due to the centralized processing of sensitive user data. Private Transformer Inference (PTI) offers a solution by utilizing cryptographic techniques such as secure multi-pa…

    Submitted 15 May, 2025; originally announced May 2025.

  3. arXiv:2505.09161  [pdf, ps, other]

    cond-mat.mtrl-sci cs.LG

    Bridging Theory and Experiment in Materials Discovery: Machine-Learning-Assisted Prediction of Synthesizable Structures

    Authors: Yu Xin, Peng Liu, Zhuohang Xie, Wenhui Mi, Pengyue Gao, Hong Jian Zhao, Jian Lv, Yanchao Wang, Yanming Ma

    Abstract: Even though thermodynamic energy-based crystal structure prediction (CSP) has revolutionized materials discovery, the energy-driven CSP approaches often struggle to identify experimentally realizable metastable materials synthesized through kinetically controlled pathways, creating a critical gap between theoretical predictions and experimental synthesis. Here, we propose a synthesizability-driven…

    Submitted 14 May, 2025; originally announced May 2025.

  4. arXiv:2505.09062  [pdf, other]

    cs.SE cs.AI cs.LG

    Variational Prefix Tuning for Diverse and Accurate Code Summarization Using Pre-trained Language Models

    Authors: Junda Zhao, Yuliang Song, Eldan Cohen

    Abstract: Recent advancements in source code summarization have leveraged transformer-based pre-trained models, including Large Language Models of Code (LLMCs), to automate and improve the generation of code summaries. However, existing methods often focus on generating a single high-quality summary for a given source code, neglecting scenarios where the generated summary might be inadequate and alternative…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Accepted by the Journal of Systems and Software

    ACM Class: D.2.7

  5. arXiv:2505.08833  [pdf, ps, other]

    cs.CV cs.LG

    Generative AI for Urban Planning: Synthesizing Satellite Imagery via Diffusion Models

    Authors: Qingyi Wang, Yuebing Liang, Yunhan Zheng, Kaiyuan Xu, Jinhua Zhao, Shenhao Wang

    Abstract: Generative AI offers new opportunities for automating urban planning by creating site-specific urban layouts and enabling flexible design exploration. However, existing approaches often struggle to produce realistic and practical designs at scale. Therefore, we adapt a state-of-the-art Stable Diffusion model, extended with ControlNet, to generate high-fidelity satellite imagery conditioned on land…

    Submitted 13 May, 2025; originally announced May 2025.

  6. arXiv:2505.08768  [pdf, other]

    cs.LG

    SPAT: Sensitivity-based Multihead-attention Pruning on Time Series Forecasting Models

    Authors: Suhan Guo, Jiahong Deng, Mengjun Yi, Furao Shen, Jian Zhao

    Abstract: Attention-based architectures have achieved superior performance in multivariate time series forecasting but are computationally expensive. Techniques such as patching and adaptive masking have been developed to reduce their sizes and latencies. In this work, we propose a structured pruning method, SPAT (Sensitivity Pruner for Attention), which selectively removes…

    Submitted 13 May, 2025; originally announced May 2025.

  7. arXiv:2505.08529  [pdf, ps, other]

    cs.LG cs.AI

    ExEBench: Benchmarking Foundation Models on Extreme Earth Events

    Authors: Shan Zhao, Zhitong Xiong, Jie Zhao, Xiao Xiang Zhu

    Abstract: Our planet is facing increasingly frequent extreme events, which pose major risks to human lives and ecosystems. Recent advances in machine learning (ML), especially with foundation models (FMs) trained on extensive datasets, excel in extracting features and show promise in disaster management. Nevertheless, these models often inherit biases from training data, challenging their performance over e…

    Submitted 13 May, 2025; originally announced May 2025.

  8. arXiv:2505.08448  [pdf, ps, other]

    cs.MA

    Scalable UAV Multi-Hop Networking via Multi-Agent Reinforcement Learning with Large Language Models

    Authors: Yanggang Xu, Weijie Hong, Jirong Zha, Geng Chen, Jianfeng Zheng, Chen-Chun Hsia, Xinlei Chen

    Abstract: In disaster scenarios, establishing robust emergency communication networks is critical, and unmanned aerial vehicles (UAVs) offer a promising solution to rapidly restore connectivity. However, organizing UAVs to form multi-hop networks in large-scale dynamic environments presents significant challenges, including limitations in algorithmic scalability and the vast exploration space required for c…

    Submitted 13 May, 2025; originally announced May 2025.

  9. arXiv:2505.08265  [pdf, other]

    cs.LG cs.AI

    LLM Enhancers for GNNs: An Analysis from the Perspective of Causal Mechanism Identification

    Authors: Hang Gao, Wenxuan Huang, Fengge Wu, Junsuo Zhao, Changwen Zheng, Huaping Liu

    Abstract: The use of large language models (LLMs) as feature enhancers to optimize node representations, which are then used as inputs for graph neural networks (GNNs), has shown significant potential in graph representation learning. However, the fundamental properties of this approach remain underexplored. To address this issue, we propose conducting a more in-depth analysis of this issue based on the int…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025

  10. arXiv:2505.07596  [pdf, other]

    cs.CL cs.AI

    Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent

    Authors: Ziyang Huang, Xiaowei Yuan, Yiming Ju, Jun Zhao, Kang Liu

    Abstract: Retrieval-augmented generation (RAG) is a common strategy to reduce hallucinations in Large Language Models (LLMs). While reinforcement learning (RL) can enable LLMs to act as search agents by activating retrieval capabilities, existing ones often underutilize their internal knowledge. This can lead to redundant retrievals, potentially harmful knowledge conflicts, and increased inference latency. To…

    Submitted 12 May, 2025; originally announced May 2025.

  11. arXiv:2505.07460  [pdf, other]

    cs.AI cs.CL

    A Survey on Collaborative Mechanisms Between Large and Small Language Models

    Authors: Yi Chen, JiaHao Zhao, HaoHao Han

    Abstract: Large Language Models (LLMs) deliver powerful AI capabilities but face deployment challenges due to high resource costs and latency, whereas Small Language Models (SLMs) offer efficiency and deployability at the cost of reduced performance. Collaboration between LLMs and SLMs emerges as a crucial paradigm to synergistically balance these trade-offs, enabling advanced AI applications, especially on…

    Submitted 12 May, 2025; originally announced May 2025.

  12. arXiv:2505.07243  [pdf, ps, other]

    cs.SE quant-ph

    A Black-box Testing Framework for Oracle Quantum Programs

    Authors: Peixun Long, Jianjun Zhao

    Abstract: Oracle quantum programs are a fundamental class of quantum programs that serve as a critical bridge between quantum computing and classical computing. Many important quantum algorithms are built upon oracle quantum programs, making it essential to ensure their correctness during development. While software testing is a well-established approach for improving program reliability, no systematic meth…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 35 pages, 8 figures

  13. arXiv:2505.07198  [pdf, other]

    cs.CV

    Ranking-aware Continual Learning for LiDAR Place Recognition

    Authors: Xufei Wang, Gengxuan Tian, Junqiao Zhao, Siyue Tao, Qiwen Gu, Qiankun Yu, Tiantian Feng

    Abstract: Place recognition plays a significant role in SLAM, robot navigation, and autonomous driving applications. Benefiting from deep learning, the performance of LiDAR place recognition (LPR) has been greatly improved. However, many existing learning-based LPR methods suffer from catastrophic forgetting, which severely harms the performance of LPR on previously trained places after training on a new en…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 8 pages, 4 figures

  14. arXiv:2505.06850  [pdf, other]

    cs.NE

    Visual Evolutionary Optimization on Combinatorial Problems with Multimodal Large Language Models: A Case Study of Influence Maximization

    Authors: Jie Zhao, Kang Hao Cheong

    Abstract: Graph-structured combinatorial problems in complex networks are prevalent in many domains, and are computationally demanding due to their complexity and non-linear nature. Traditional evolutionary algorithms (EAs), while robust, often face obstacles due to content-shallow encoding limitations and lack of structural awareness, necessitating hand-crafted modifications for effective application. In t…

    Submitted 11 May, 2025; originally announced May 2025.

  15. arXiv:2505.06828  [pdf, ps, other]

    cs.DS

    An Improved Algorithm for a Bipartite Traveling Tournament in Interleague Sports Scheduling

    Authors: Jingyang Zhao, Mingyu Xiao

    Abstract: The bipartite traveling tournament problem (BTTP) addresses inter-league sports scheduling, which aims to design a feasible bipartite tournament between two $n$-team leagues under some constraints such that the total traveling distance of all participating teams is minimized. Since its introduction, several methods have been developed to design feasible schedules for NBA, NPB and so on. In terms o…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: A preliminary version of this article was presented at IJCAI 2024

  16. arXiv:2505.06321  [pdf, other]

    cs.LG cs.AI

    Learn to Think: Bootstrapping LLM Reasoning Capability Through Graph Learning

    Authors: Hang Gao, Chenhao Zhang, Tie Wang, Junsuo Zhao, Fengge Wu, Changwen Zheng, Huaping Liu

    Abstract: Large Language Models (LLMs) have achieved remarkable success across various domains. However, they still face significant challenges, including high computational costs for training and limitations in solving complex reasoning problems. Although existing methods have extended the reasoning capabilities of LLMs through structured paradigms, these approaches often rely on task-specific prompts and…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCAI 2025

  17. arXiv:2505.05811  [pdf, ps, other]

    cs.RO

    Unsupervised Anomaly Detection for Autonomous Robots via Mahalanobis SVDD with Audio-IMU Fusion

    Authors: Yizhuo Yang, Jiulin Zhao, Xinhang Xu, Kun Cao, Shenghai Yuan, Lihua Xie

    Abstract: Reliable anomaly detection is essential for ensuring the safety of autonomous robots, particularly when conventional detection systems based on vision or LiDAR become unreliable in adverse or unpredictable conditions. In such scenarios, alternative sensing modalities are needed to provide timely and robust feedback. To this end, we explore the use of audio and inertial measurement unit (IMU) senso…

    Submitted 9 May, 2025; originally announced May 2025.

  18. arXiv:2505.05130  [pdf, ps, other]

    cs.DC

    CacheFL: Efficient Federated Cache Model Fine-Tuning for Vision-Language Models

    Authors: Mengjun Yi, Hanwen Zhang, Hui Dou, Jian Zhao, Furao Shen

    Abstract: Large pre-trained Vision-Language Models (VLMs), such as Contrastive Language-Image Pre-training (CLIP), have exhibited remarkable zero-shot performance across various image classification tasks. Fine-tuning these models on domain-specific datasets further enhances their effectiveness for downstream applications. However, fine-tuning in cloud environments raises significant concerns regarding data…

    Submitted 8 May, 2025; originally announced May 2025.

  19. arXiv:2505.04946  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models

    Authors: Xuyang Guo, Jiayan Huo, Zhenmei Shi, Zhao Song, Jiahao Zhang, Jiale Zhao

    Abstract: Thanks to recent advancements in scalable deep architectures and large-scale pretraining, text-to-video generation has achieved unprecedented capabilities in producing high-fidelity, instruction-following content across a wide range of styles, enabling applications in advertising, entertainment, and education. However, these models' ability to render precise on-screen text, such as captions or mat…

    Submitted 8 May, 2025; originally announced May 2025.

  20. arXiv:2505.04519  [pdf, other]

    cs.CL

    Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs

    Authors: Yehui Tang, Yichun Yin, Yaoyuan Wang, Hang Zhou, Yu Pan, Wei Guo, Ziyang Zhang, Miao Rang, Fangcheng Liu, Naifu Zhang, Binghan Li, Yonghan Dong, Xiaojun Meng, Yasheng Wang, Dong Li, Yin Li, Dandan Tu, Can Chen, Youliang Yan, Fisher Yu, Ruiming Tang, Yunhe Wang, Botian Huang, Bo Wang, Boxiao Liu, et al. (49 additional authors not shown)

    Abstract: Sparse large language models (LLMs) with Mixture of Experts (MoE) and close to a trillion parameters are dominating the realm of most capable language models. However, the massive model scale poses significant challenges for the underlying software and hardware systems. In this paper, we aim to uncover a recipe to harness such scale on Ascend NPUs. The key goals are better usage of the computing r…

    Submitted 7 May, 2025; originally announced May 2025.

  21. arXiv:2505.03507  [pdf, ps, other]

    cs.CV

    Modality-Guided Dynamic Graph Fusion and Temporal Diffusion for Self-Supervised RGB-T Tracking

    Authors: Shenglan Li, Rui Yao, Yong Zhou, Hancheng Zhu, Kunyang Sun, Bing Liu, Zhiwen Shao, Jiaqi Zhao

    Abstract: To reduce the reliance on large-scale annotations, self-supervised RGB-T tracking approaches have garnered significant attention. However, the omission of the object region by erroneous pseudo-label or the introduction of background noise affects the efficiency of modality fusion, while pseudo-label noise triggered by similar object noise can further affect the tracking performance. In this paper,…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Accepted by the 34th International Joint Conference on Artificial Intelligence (IJCAI 2025)

  22. arXiv:2505.03281  [pdf, other]

    cs.LG cs.AI

    Physics-inspired Energy Transition Neural Network for Sequence Learning

    Authors: Zhou Wu, Junyi An, Baile Xu, Furao Shen, Jian Zhao

    Abstract: Recently, the superior performance of Transformers has made them a more robust and scalable solution for sequence modeling than traditional recurrent neural networks (RNNs). However, the effectiveness of Transformers in capturing long-term dependencies is primarily attributed to their comprehensive pair-modeling process rather than inherent inductive biases toward sequence semantics. In this study,…

    Submitted 6 May, 2025; originally announced May 2025.

  23. arXiv:2505.03184  [pdf, other]

    cs.CV

    Interactive Instance Annotation with Siamese Networks

    Authors: Xiang Xu, Ruotong Li, Mengjun Yi, Baile Xu, Furao Shen, Jian Zhao

    Abstract: Annotating instance masks is time-consuming and labor-intensive. A promising solution is to predict contours using a deep learning model and then allow users to refine them. However, most existing methods focus on in-domain scenarios, limiting their effectiveness for cross-domain annotation tasks. In this paper, we propose SiamAnno, a framework inspired by the use of Siamese networks in object tra…

    Submitted 6 May, 2025; originally announced May 2025.

  24. arXiv:2505.03102  [pdf, ps, other]

    cs.AR

    Hardware vs. Software Implementation of Warp-Level Features in Vortex RISC-V GPU

    Authors: Huanzhi Pu, Rishabh Ravi, Shinnung Jeong, Udit Subramanya, Euijun Chung, Jisheng Zhao, Chihyo Ahn, Hyesoon Kim

    Abstract: RISC-V GPUs present a promising path for supporting GPU applications. Traditionally, GPUs achieve high efficiency through the SPMD (Single Program Multiple Data) programming model. However, modern GPU programming increasingly relies on warp-level features, which diverge from the conventional SPMD paradigm. In this paper, we explore how RISC-V GPUs can support these warp-level features both through…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 4 pages, 6 figures, Workshop W05.4.1 at DATE 2025

  25. arXiv:2505.02471  [pdf, other]

    cs.CV

    Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

    Authors: Inclusion AI, Biao Gong, Cheng Zou, Dandan Zheng, Hu Yu, Jingdong Chen, Jianxin Sun, Junbo Zhao, Jun Zhou, Kaixiang Ji, Lixiang Ru, Libin Wang, Qingpei Guo, Rui Liu, Weilong Chai, Xinyu Xiao, Ziyuan Huang

    Abstract: We introduce Ming-Lite-Uni, an open-source multimodal framework featuring a newly designed unified visual generator and a native multimodal autoregressive model tailored for unifying vision and language. Specifically, this project provides an open-source implementation of the integrated MetaQueries and M2-omni framework, while introducing the novel multi-scale learnable tokens and multi-scale repr…

    Submitted 7 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

    Comments: https://github.com/inclusionAI/Ming/tree/main/Ming-unify

  26. arXiv:2505.02311  [pdf, other]

    cs.CL

    Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering

    Authors: Jihao Zhao, Chunlai Zhou, Biao Qin

    Abstract: The collaborative paradigm of large and small language models (LMs) effectively balances performance and cost, yet its pivotal challenge lies in precisely pinpointing the moment of invocation when hallucinations arise in small LMs. Previous optimization efforts primarily focused on post-processing techniques, which were separate from the reasoning process of LMs, resulting in high computational co…

    Submitted 4 May, 2025; originally announced May 2025.

  27. arXiv:2505.01974  [pdf, other]

    cs.RO

    KineDex: Learning Tactile-Informed Visuomotor Policies via Kinesthetic Teaching for Dexterous Manipulation

    Authors: Di Zhang, Chengbo Yuan, Chuan Wen, Hai Zhang, Junqiao Zhao, Yang Gao

    Abstract: Collecting demonstrations enriched with fine-grained tactile information is critical for dexterous manipulation, particularly in contact-rich tasks that require precise force control and physical interaction. While prior works primarily focus on teleoperation or video-based retargeting, they often suffer from kinematic mismatches and the absence of real-time tactile feedback, hindering the acquisi…

    Submitted 3 May, 2025; originally announced May 2025.

  28. arXiv:2505.00963  [pdf, other]

    cs.LG cs.PL

    Adaptive Branch-and-Bound Tree Exploration for Neural Network Verification

    Authors: Kota Fukuda, Guanqin Zhang, Zhenya Zhang, Yulei Sui, Jianjun Zhao

    Abstract: Formal verification is a rigorous approach that can provably ensure the quality of neural networks, and to date, Branch and Bound (BaB) is the state-of-the-art that performs verification by splitting the problem as needed and applying off-the-shelf verifiers to sub-problems for improved performance. However, existing BaB may not be efficient, due to its naive way of exploring the space of sub-prob…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: 7 pages, 6 figures

  29. arXiv:2505.00337  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation

    Authors: Xuyang Guo, Jiayan Huo, Zhenmei Shi, Zhao Song, Jiahao Zhang, Jiale Zhao

    Abstract: Text-to-video generative models have made significant strides in recent years, producing high-quality videos that excel in both aesthetic appeal and accurate instruction following, and have become central to digital art creation and user engagement online. Yet, despite these advancements, their ability to respect fundamental physical laws remains largely untested: many outputs still violate basic…

    Submitted 1 May, 2025; originally announced May 2025.

  30. arXiv:2504.20437  [pdf, other]

    cs.LG cs.AI

    GaLore 2: Large-Scale LLM Pre-Training by Gradient Low-Rank Projection

    Authors: DiJia Su, Andrew Gu, Jane Xu, Yuandong Tian, Jiawei Zhao

    Abstract: Large language models (LLMs) have revolutionized natural language understanding and generation but face significant memory bottlenecks during training. GaLore, Gradient Low-Rank Projection, addresses this issue by leveraging the inherent low-rank structure of weight gradients, enabling substantial memory savings without sacrificing performance. Recent works further extend GaLore from various aspec…

    Submitted 29 April, 2025; originally announced April 2025.

  31. arXiv:2504.19341  [pdf, other]

    cs.RO cs.AI

    PolyTouch: A Robust Multi-Modal Tactile Sensor for Contact-rich Manipulation Using Tactile-Diffusion Policies

    Authors: Jialiang Zhao, Naveen Kuppuswamy, Siyuan Feng, Benjamin Burchfiel, Edward Adelson

    Abstract: Achieving robust dexterous manipulation in unstructured domestic environments remains a significant challenge in robotics. Even with state-of-the-art robot learning methods, haptic-oblivious control strategies (i.e. those relying only on external vision and/or proprioception) often fall short due to occlusions, visual complexities, and the need for precise contact interaction control. To address t…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: Nominated for the best paper award at ICRA 2025

  32. arXiv:2504.18857  [pdf, other]

    cs.CL cs.AI

    Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation

    Authors: Yi Lu, Wanxu Zhao, Xin Zhou, Chenxin An, Chenglong Wang, Shuo Li, Yuming Yang, Jun Zhao, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: Large Language Models (LLMs) often struggle to process and generate coherent context when the number of input tokens exceeds the pre-trained length. Recent advancements in long-context extension have significantly expanded the context window of LLMs but require expensive overhead to train the large-scale models with longer context. In this work, we propose Dimension-Wise Positional Embeddings Mani…

    Submitted 26 April, 2025; originally announced April 2025.

  33. arXiv:2504.18842  [pdf]

    cs.RO

    A Microgravity Simulation Experimental Platform For Small Space Robots In Orbit

    Authors: Hang Luo, Nanlin Zhou, Haoxiang Zhang, Kai Han, Ning Zhao, Zhiyuan Yang, Jian Qi, Sikai Zhao, Jie Zhao, Yanhe Zhu

    Abstract: This study describes the development and validation of a novel microgravity experimental platform that is mainly applied to small robots such as modular self-reconfigurable robots. This platform mainly consists of an air supply system, a microporous platform and glass. By supplying air to the microporous platform to form an air film, the influence of the weight of the air foot and the ventilation…

    Submitted 26 April, 2025; originally announced April 2025.

  34. arXiv:2504.18604  [pdf, other]

    cs.AI

    A Cognitive-Mechanistic Human Reliability Analysis Framework: A Nuclear Power Plant Case Study

    Authors: Xingyu Xiao, Peng Chen, Jiejuan Tong, Shunshun Liu, Hongru Zhao, Jun Zhao, Qianqian Jia, Jingang Liang, Haitao Wang

    Abstract: Traditional human reliability analysis (HRA) methods, such as IDHEAS-ECA, rely on expert judgment and empirical rules that often overlook the cognitive underpinnings of human error. Moreover, conducting human-in-the-loop experiments for advanced nuclear power plants is increasingly impractical due to novel interfaces and limited operational data. This study proposes a cognitive-mechanistic framewo…

    Submitted 5 May, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

  35. arXiv:2504.18189  [pdf, other]

    cs.HC

    ClassComet: Exploring and Designing AI-generated Danmaku in Educational Videos to Enhance Online Learning

    Authors: Zipeng Ji, Pengcheng An, Jian Zhao

    Abstract: Danmaku, users' live comments synchronized with, and overlaying on videos, has recently shown potential in promoting online video-based learning. However, user-generated danmaku can be scarce, especially in newer or less viewed videos, and its quality is unpredictable, limiting its educational impact. This paper explores how large multimodal models (LMM) can be leveraged to automatically generate ef…

    Submitted 25 April, 2025; originally announced April 2025.

  36. arXiv:2504.18012  [pdf, other]

    cs.CL cs.AI

    Memory Reviving, Continuing Learning and Beyond: Evaluation of Pre-trained Encoders and Decoders for Multimodal Machine Translation

    Authors: Zhuang Yu, Shiliang Sun, Jing Zhao, Tengfei Song, Hao Yang

    Abstract: Multimodal Machine Translation (MMT) aims to improve translation quality by leveraging auxiliary modalities such as images alongside textual input. While recent advances in large-scale pre-trained language and vision models have significantly benefited unimodal natural language processing tasks, their effectiveness and role in MMT remain underexplored. In this work, we conduct a systematic study o…

    Submitted 24 April, 2025; originally announced April 2025.

  37. arXiv:2504.16795  [pdf, other]

    cs.CL cs.AI

    Random Long-Context Access for Mamba via Hardware-aligned Hierarchical Sparse Attention

    Authors: Xiang Hu, Jiaqi Leng, Jun Zhao, Kewei Tu, Wei Wu

    Abstract: A key advantage of Recurrent Neural Networks (RNNs) over Transformers is that their linear computational and space complexity enables faster training and inference for long sequences. However, RNNs are fundamentally unable to randomly access historical context, and simply integrating attention mechanisms may undermine their efficiency advantages. To overcome this limitation, we propose Hierarc…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: preprint

  38. arXiv:2504.16615  [pdf, other]

    cs.HC

    Algorithmic Mirror: Designing an Interactive Tool to Promote Self-Reflection for YouTube Recommendations

    Authors: Yui Kondo, Kevin Dunnell, Qing Xiao, Jun Zhao, Luc Rocher

    Abstract: Big Data analytics and Artificial Intelligence systems derive non-intuitive and often unverifiable inferences about individuals' behaviors, preferences, and private lives. Drawing on diverse, feature-rich datasets of unpredictable value, these systems erode the intuitive connection between our actions and how we are perceived, diminishing control over our digital identities. While Explainable Arti…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: Presented at the 2025 ACM Workshop on Human-AI Interaction for Augmented Reasoning, Report Number: CHI25-WS-AUGMENTED-REASONING

    Report number: CHI25-WS-AUGMENTED-REASONING

    Journal ref: Proceedings of the 2025 ACM CHI Workshop on Human-AI Interaction for Augmented Reasoning

  39. arXiv:2504.16327  [pdf, ps, other]

    cs.DS cs.GT

    Universal Online Contention Resolution with Preselected Order

    Authors: Junyao Zhao

    Abstract: Online contention resolution scheme (OCRS) is a powerful technique for online decision making, which--in the case of matroids--given a matroid and a prior distribution of active elements, selects a subset of active elements that satisfies the matroid constraint in an online fashion. OCRS has been studied mostly for product distributions in the literature. Recently, universal OCRS, that works even…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: ICALP 2025

  40. arXiv:2504.15630  [pdf, other]

    cs.CL

    Exploiting Contextual Knowledge in LLMs through V-usable Information based Layer Enhancement

    Authors: Xiaowei Yuan, Zhao Yang, Ziyang Huang, Yequan Wang, Siqi Fan, Yiming Ju, Jun Zhao, Kang Liu

    Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet they often struggle with context-faithfulness generations that properly reflect contextual knowledge. While existing approaches focus on enhancing the decoding strategies, they ignore the fundamental mechanism of how contextual information is processed within LLMs' internal states. As a result, LLMs remain…

    Submitted 22 April, 2025; originally announced April 2025.

  41. arXiv:2504.15171  [pdf, other]

    cs.LG

    Audio-Visual Class-Incremental Learning for Fish Feeding Intensity Assessment in Aquaculture

    Authors: Meng Cui, Xianghu Yue, Xinyuan Qian, Jinzheng Zhao, Haohe Liu, Xubo Liu, Daoliang Li, Wenwu Wang

    Abstract: Fish Feeding Intensity Assessment (FFIA) is crucial in industrial aquaculture management. Recent multi-modal approaches have shown promise in improving FFIA robustness and efficiency. However, these methods face significant challenges when adapting to new fish species or environments due to catastrophic forgetting and the lack of suitable datasets. To address these limitations, we first introduce…

    Submitted 21 April, 2025; originally announced April 2025.

  42. arXiv:2504.15066  [pdf, other]

    cs.MM cs.AI

    Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides

    Authors: Jinghua Zhao, Yuhang Jia, Shiyao Wang, Jiaming Zhou, Hui Wang, Yong Qin

    Abstract: Incorporating visual modalities to assist Automatic Speech Recognition (ASR) tasks has led to significant improvements. However, existing Audio-Visual Speech Recognition (AVSR) datasets and methods typically rely solely on lip-reading information or speaking contextual video, neglecting the potential of combining these different valuable visual cues within the speaking context. In this paper, we r…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 6 pages, 7 figures

  43. arXiv:2504.14991  [pdf, other]

    cs.IR

    Understanding Accuracy-Fairness Trade-offs in Re-ranking through Elasticity in Economics

    Authors: Chen Xu, Jujia Zhao, Wenjie Wang, Liang Pang, Jun Xu, Tat-Seng Chua, Maarten de Rijke

    Abstract: Fairness is an increasingly important factor in re-ranking tasks. Prior work has identified a trade-off between ranking accuracy and item fairness. However, the underlying mechanisms are still not fully understood. An analogy can be drawn between re-ranking and the dynamics of economic transactions. The accuracy-fairness trade-off parallels the coupling of the commodity tax transfer process. Fairn…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Accepted in SIGIR2025

  44. arXiv:2504.14856  [pdf, other]

    cs.CL

    Transparentize the Internal and External Knowledge Utilization in LLMs with Trustworthy Citation

    Authors: Jiajun Shen, Tong Zhou, Yubo Chen, Delai Qiu, Shengping Liu, Kang Liu, Jun Zhao

    Abstract: While hallucinations of large language models could be alleviated through retrieval-augmented generation and citation generation, how the model utilizes internal knowledge is still opaque, and the trustworthiness of its generated answers remains questionable. In this work, we introduce the Context-Prior Augmented Citation Generation task, requiring models to generate citations considering both exter…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 19 pages, 14 figures

  45. arXiv:2504.14820  [pdf, other]

    cs.RO

    Accelerating Visual Reinforcement Learning with Separate Primitive Policy for Peg-in-Hole Tasks

    Authors: Zichun Xu, Zhaomin Wang, Yuntao Li, Lei Zhuang, Zhiyuan Zhao, Guocai Yang, Jingdong Zhao

    Abstract: For peg-in-hole tasks, humans rely on binocular visual perception to locate the peg above the hole surface and then proceed with insertion. This paper draws insights from this behavior to enable agents to learn efficient assembly strategies through visual reinforcement learning. Hence, we propose a Separate Primitive Policy (S2P) to simultaneously learn how to derive location and insertion actions…

    Submitted 20 April, 2025; originally announced April 2025.

  46. arXiv:2504.14582  [pdf, other]

    cs.CV

    NTIRE 2025 Challenge on Image Super-Resolution ($\times$4): Methods and Results

    Authors: Zheng Chen, Kai Liu, Jue Gong, Jingkai Wang, Lei Sun, Zongwei Wu, Radu Timofte, Yulun Zhang, Xiangyu Kong, Xiaoxuan Yu, Hyunhee Park, Suejin Han, Hakjae Jeon, Dafeng Zhang, Hyung-Ju Chun, Donghun Ryou, Inju Ha, Bohyung Han, Lu Zhao, Yuyi Zhang, Pengyu Yan, Jiawei Hu, Pengwei Liu, Fengjun Guo, Hongyuan Yu, et al. (86 additional authors not shown)

    Abstract: This paper presents the NTIRE 2025 image super-resolution ($\times$4) challenge, one of the associated competitions of the 10th NTIRE Workshop at CVPR 2025. The challenge aims to recover high-resolution (HR) images from low-resolution (LR) counterparts generated through bicubic downsampling with a $\times$4 scaling factor. The objective is to develop effective network designs or solutions that ach…

    Submitted 28 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Comments: NTIRE 2025 webpage: https://www.cvlai.net/ntire/2025. Code: https://github.com/zhengchen1999/NTIRE2025_ImageSR_x4

  47. arXiv:2504.14065  [pdf, other]

    cs.HC

    AnywhereXR: On-the-fly 3D Environments as a Basis for Open Source Immersive Digital Twin Applications

    Authors: Alexander Klippel, Bart Knuiman, Jiayan Zhao, Jan Oliver Wallgrün, Jascha Grübel

    Abstract: Visualization has long been fundamental to human communication and decision-making. Today, we stand at the threshold of integrating veridical, high-fidelity visualizations into immersive digital environments, alongside digital twinning techniques. This convergence heralds powerful tools for communication, co-design, and participatory decision-making. Our paper delves into the development of lightw…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: 59 pages, 23 figures, currently resubmitted to International Journal of Digital Earth

    ACM Class: J.2; J.4; I.3; H.4

  48. arXiv:2504.13407  [pdf, other]

    cs.CV cs.AI

    LoRA-Based Continual Learning with Constraints on Critical Parameter Changes

    Authors: Shimou Ling, Liang Zhang, Jiangwei Zhao, Lili Pan, Hongliang Li

    Abstract: LoRA-based continual learning represents a promising avenue for leveraging pre-trained models in downstream continual learning tasks. Recent studies have shown that orthogonal LoRA tuning effectively mitigates forgetting. However, this work unveils that under orthogonal LoRA tuning, the critical parameters for pre-tasks still change notably after learning post-tasks. To address this problem, we di…

    Submitted 17 April, 2025; originally announced April 2025.

  49. arXiv:2504.12997  [pdf, other]

    cs.CV

    All-in-One Transferring Image Compression from Human Perception to Multi-Machine Perception

    Authors: Jiancheng Zhao, Xiang Ji, Zhuoxiao Li, Zunian Wan, Weihang Ran, Mingze Ma, Muyao Niu, Yifan Zhan, Cheng-Ching Tseng, Yinqiang Zheng

    Abstract: Efficiently transferring a Learned Image Compression (LIC) model from human perception to machine perception is an emerging challenge in vision-centric representation learning. Existing approaches typically adapt LIC to downstream tasks in a single-task manner, which is inefficient, lacks task interaction, and results in multiple task-specific bitstreams. To address these limitations, we propose an…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: 8 pages, 5 figures

  50. arXiv:2504.12899  [pdf, other]

    cs.CV

    Tree-NeRV: A Tree-Structured Neural Representation for Efficient Non-Uniform Video Encoding

    Authors: Jiancheng Zhao, Yifan Zhan, Qingtian Zhu, Mingze Ma, Muyao Niu, Zunian Wan, Xiang Ji, Yinqiang Zheng

    Abstract: Implicit Neural Representations for Videos (NeRV) have emerged as a powerful paradigm for video representation, enabling direct mappings from frame indices to video frames. However, existing NeRV-based methods do not fully exploit temporal redundancy, as they rely on uniform sampling along the temporal axis, leading to suboptimal rate-distortion (RD) performance. To address this limitation, we pro…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: 16 pages, 14 figures