Showing 1–50 of 513 results for author: Ren, Z

Searching in archive cs.
  1. arXiv:2507.04294  [pdf, ps, other]

    cs.IR

    BiFair: A Fairness-aware Training Framework for LLM-enhanced Recommender Systems via Bi-level Optimization

    Authors: Jiaming Zhang, Yuyuan Li, Yiqun Xu, Li Zhang, Xiaohua Feng, Zhifei Ren, Chaochao Chen

    Abstract: Large Language Model-enhanced Recommender Systems (LLM-enhanced RSs) have emerged as a powerful approach to improving recommendation quality by leveraging LLMs to generate item representations. Despite these advancements, the integration of LLMs raises severe fairness concerns. Existing studies reveal that LLM-based RSs exhibit greater unfairness than traditional RSs, yet fairness issues in LLM-en…

    Submitted 6 July, 2025; originally announced July 2025.

  2. arXiv:2507.03227  [pdf, ps, other]

    cs.RO

    Dexterous Teleoperation of 20-DoF ByteDexter Hand via Human Motion Retargeting

    Authors: Ruoshi Wen, Jiajun Zhang, Guangzeng Chen, Zhongren Cui, Min Du, Yang Gou, Zhigang Han, Junkai Hu, Liqun Huang, Hao Niu, Wei Xu, Haoxiang Zhang, Zhengming Zhu, Hang Li, Zeyu Ren

    Abstract: Replicating human-level dexterity remains a fundamental robotics challenge, requiring integrated solutions from mechatronic design to the control of high degree-of-freedom (DoF) robotic hands. While imitation learning shows promise in transferring human dexterity to robots, the efficacy of trained policies relies on the quality of human demonstration data. We bridge this gap with a hand-arm te…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: Tech Report. Project page: https://byte-dexter.github.io/

  3. arXiv:2507.01321  [pdf, ps, other]

    cs.LG cs.AI cs.CR

    ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks

    Authors: Zhiyao Ren, Siyuan Liang, Aishan Liu, Dacheng Tao

    Abstract: In-context learning (ICL) has demonstrated remarkable success in large language models (LLMs) due to its adaptability and parameter-free nature. However, it also introduces a critical vulnerability to backdoor attacks, where adversaries can manipulate LLM behaviors by simply poisoning a few ICL demonstrations. In this paper, we propose, for the first time, the dual-learning hypothesis, which posit…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: ICML 2025

  4. arXiv:2506.23075  [pdf, ps, other]

    cs.HC cs.LG eess.SP q-bio.NC

    CSBrain: A Cross-scale Spatiotemporal Brain Foundation Model for EEG Decoding

    Authors: Yuchen Zhou, Jiamin Wu, Zichen Ren, Zhouheng Yao, Weiheng Lu, Kunyu Peng, Qihao Zheng, Chunfeng Song, Wanli Ouyang, Chao Gou

    Abstract: Understanding and decoding brain activity from electroencephalography (EEG) signals is a fundamental challenge in neuroscience and AI, with applications in cognition, emotion recognition, diagnosis, and brain-computer interfaces. While recent EEG foundation models advance generalized decoding via unified architectures and large-scale pretraining, they adopt a scale-agnostic dense modeling paradigm…

    Submitted 28 June, 2025; originally announced June 2025.

  5. arXiv:2506.19425  [pdf, ps, other]

    cs.SE

    What Makes the Best Decomposition? Investigating Binary Decomposition Under FCG Variance

    Authors: Ang Jia, He Jiang, Zhilei Ren, Xiaochen Li, Ming Fan, Ting Liu

    Abstract: Binary decomposition, which decomposes binary files into modules, plays a critical role in binary reuse detection. Existing binary decomposition works either apply anchor-based methods by extending anchor functions to generate modules, or apply clustering-based methods by using clustering algorithms to group binary functions, which all rely on that reused code shares similar function call relation…

    Submitted 24 June, 2025; originally announced June 2025.

  6. arXiv:2506.18879  [pdf, ps, other]

    cs.CL cs.AI

    CommVQ: Commutative Vector Quantization for KV Cache Compression

    Authors: Junyan Li, Yang Zhang, Muhammad Yusuf Hassan, Talha Chafekar, Tianle Cai, Zhile Ren, Pengsheng Guo, Foroozan Karimzadeh, Colorado Reed, Chong Wang, Chuang Gan

    Abstract: Large Language Models (LLMs) are increasingly used in applications requiring long context lengths, but the key-value (KV) cache often becomes a memory bottleneck on GPUs as context grows. To address this, we propose Commutative Vector Quantization (CommVQ) to significantly reduce memory usage for long-context LLM inference. We first introduce additive quantization with a lightweight encoder and co…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: ICML 2025 poster

  7. arXiv:2506.17963  [pdf, ps, other]

    q-bio.BM cs.AI

    OmniESI: A unified framework for enzyme-substrate interaction prediction with progressive conditional deep learning

    Authors: Zhiwei Nie, Hongyu Zhang, Hao Jiang, Yutian Liu, Xiansong Huang, Fan Xu, Jie Fu, Zhixiang Ren, Yonghong Tian, Wen-Bin Zhang, Jie Chen

    Abstract: Understanding and modeling enzyme-substrate interactions is crucial for catalytic mechanism research, enzyme engineering, and metabolic engineering. Although a large number of predictive methods have emerged, they do not incorporate prior knowledge of enzyme catalysis to rationally modulate general protein-molecule features that are misaligned with catalytic patterns. To address this issue, we int…

    Submitted 22 June, 2025; originally announced June 2025.

  8. arXiv:2506.13415  [pdf, other]

    eess.IV cs.AI cs.CV

    Simple is what you need for efficient and accurate medical image segmentation

    Authors: Xiang Yu, Yayan Chen, Guannan He, Qing Zeng, Yue Qin, Meiling Liang, Dandan Luo, Yimei Liao, Zeyu Ren, Cheng Kang, Delong Yang, Bocheng Liang, Bin Pu, Ying Yuan, Shengli Li

    Abstract: While modern segmentation models often prioritize performance over practicality, we advocate a design philosophy prioritizing simplicity and efficiency, and attempted high performance segmentation model design. This paper presents SimpleUNet, a scalable ultra-lightweight medical image segmentation model with three key innovations: (1) A partial feature selection mechanism in skip connections for r…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 15 pages, 11 figures

    ACM Class: I.4.6

  9. arXiv:2506.12078  [pdf, ps, other]

    cs.MA cs.AI cs.CL cs.CY cs.SI

    Modeling Earth-Scale Human-Like Societies with One Billion Agents

    Authors: Haoxiang Guan, Jiyan He, Liyang Fan, Zhenzhen Ren, Shaobin He, Xin Yu, Yuan Chen, Shuxin Zheng, Tie-Yan Liu, Zhen Liu

    Abstract: Understanding how complex societal behaviors emerge from individual cognition and interactions requires both high-fidelity modeling of human behavior and large-scale simulations. Traditional agent-based models (ABMs) have been employed to study these dynamics for decades, but are constrained by simplified agent behaviors that fail to capture human complexity. Recent advances in large language mode…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: Work in progress

  10. arXiv:2506.10857  [pdf, ps, other]

    cs.CV cs.AI cs.MM

    VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

    Authors: Jiashuo Yu, Yue Wu, Meng Chu, Zhifei Ren, Zizheng Huang, Pei Chu, Ruijie Zhang, Yinan He, Qirui Li, Songze Li, Zhenxiang Li, Zhongying Tu, Conghui He, Yu Qiao, Yali Wang, Yi Wang, Limin Wang

    Abstract: We present VRBench, the first long narrative video benchmark crafted for evaluating large models' multi-step reasoning capabilities, addressing limitations in existing evaluations that overlook temporal reasoning and procedural validity. It comprises 1,010 long videos (with an average duration of 1.6 hours), along with 9,468 human-labeled multi-step question-answering pairs and 30,292 reasoning st…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Technical Report

  11. arXiv:2506.02839  [pdf, ps, other]

    cs.IR cs.AI

    DeepShop: A Benchmark for Deep Research Shopping Agents

    Authors: Yougang Lyu, Xiaoyu Zhang, Lingyong Yan, Maarten de Rijke, Zhaochun Ren, Xiuying Chen

    Abstract: Web agents for online shopping have shown great promise in automating user interactions across e-commerce platforms. Benchmarks for assessing such agents do not reflect the complexity of real-world shopping scenarios, as they often consist of overly simple queries with deterministic paths, such as "Find iPhone 15." Real shopping scenarios are inherently more layered, involving multi-dimensional pr…

    Submitted 3 June, 2025; originally announced June 2025.

  12. arXiv:2505.23705  [pdf, ps, other]

    cs.LG cs.RO

    Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better

    Authors: Danny Driess, Jost Tobias Springenberg, Brian Ichter, Lili Yu, Adrian Li-Bell, Karl Pertsch, Allen Z. Ren, Homer Walke, Quan Vuong, Lucy Xiaoyang Shi, Sergey Levine

    Abstract: Vision-language-action (VLA) models provide a powerful approach to training control policies for physical systems, such as robots, by combining end-to-end learning with transfer of semantic knowledge from web-scale vision-language model (VLM) training. However, the constraints of real-time control are often at odds with the design of VLMs: the most powerful VLMs have tens or hundreds of billions o…

    Submitted 29 May, 2025; originally announced May 2025.

  13. arXiv:2505.20128  [pdf, other]

    cs.CL

    Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers

    Authors: Zhengliang Shi, Lingyong Yan, Dawei Yin, Suzan Verberne, Maarten de Rijke, Zhaochun Ren

    Abstract: Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques. However, effectively enabling LLMs to seek accurate knowledge in complex tasks remains a challenge due to the complexity of multi-hop queries as well as the irrelevant retrieved content. To address these limitations, we propose EXSEARCH, an agentic search framework, where the LLM…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Working in process

  14. arXiv:2505.18774  [pdf, other]

    cs.CL

    Disentangling Knowledge Representations for Large Language Model Editing

    Authors: Mengqi Zhang, Zisheng Zhou, Xiaotian Ye, Qiang Liu, Zhaochun Ren, Zhumin Chen, Pengjie Ren

    Abstract: Knowledge Editing has emerged as a promising solution for efficiently updating embedded knowledge in large language models (LLMs). While existing approaches demonstrate effectiveness in integrating new knowledge and preserving the original capabilities of LLMs, they fail to maintain fine-grained irrelevant knowledge facts that share the same subject as edited knowledge but differ in relation and o…

    Submitted 24 May, 2025; originally announced May 2025.

  15. arXiv:2505.16785  [pdf, other]

    cs.CR cs.AI

    CoTSRF: Utilize Chain of Thought as Stealthy and Robust Fingerprint of Large Language Models

    Authors: Zhenzhen Ren, GuoBiao Li, Sheng Li, Zhenxing Qian, Xinpeng Zhang

    Abstract: Despite providing superior performance, open-source large language models (LLMs) are vulnerable to abusive usage. To address this issue, recent works propose LLM fingerprinting methods to identify the specific source LLMs behind suspect applications. However, these methods fail to provide stealthy and robust fingerprint verification. In this paper, we propose a novel LLM fingerprinting scheme, nam…

    Submitted 22 May, 2025; originally announced May 2025.

  16. arXiv:2505.15801  [pdf, other]

    cs.CL cs.AI

    VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models

    Authors: Yuchen Yan, Jin Jiang, Zhenbang Ren, Yijun Li, Xudong Cai, Yang Liu, Xin Xu, Mengdi Zhang, Jian Shao, Yongliang Shen, Jun Xiao, Yueting Zhuang

    Abstract: Large reasoning models such as OpenAI o1 and DeepSeek-R1 have achieved remarkable performance in the domain of reasoning. A key component of their training is the incorporation of verifiable rewards within reinforcement learning (RL). However, existing reward benchmarks do not evaluate reference-based reward systems, leaving researchers with limited understanding of the accuracy of verifiers used…

    Submitted 25 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

    Comments: Project Page: https://zju-real.github.io/VerifyBench Dataset: https://huggingface.co/datasets/ZJU-REAL/VerifyBench Code: https://github.com/ZJU-REAL/VerifyBench

  17. arXiv:2505.15467  [pdf, ps, other]

    cs.CL cs.AI

    Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning

    Authors: Yukun Zhao, Lingyong Yan, Zhenyang Li, Shuaiqiang Wang, Zhumin Chen, Zhaochun Ren, Dawei Yin

    Abstract: Large language models have achieved remarkable success in various tasks. However, it is challenging for them to learn new tasks incrementally due to catastrophic forgetting. Existing approaches rely on experience replay, optimization constraints, or task differentiation, which encounter strict limitations in real-world scenarios. To address these issues, we propose Joint Flashback Adaptation. We f…

    Submitted 21 May, 2025; originally announced May 2025.

  18. arXiv:2505.14170  [pdf, other]

    cs.LG

    Nonparametric Teaching for Graph Property Learners

    Authors: Chen Zhang, Weixin Bu, Zeyi Ren, Zhengwu Liu, Yik-Chung Wu, Ngai Wong

    Abstract: Inferring properties of graph-structured data, e.g., the solubility of molecules, essentially involves learning the implicit mapping from graphs to their properties. This learning process is often costly for graph property learners like Graph Convolutional Networks (GCNs). To address this, we propose a paradigm called Graph Neural Teaching (GraNT) that reinterprets the learning process through a n…

    Submitted 21 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: ICML 2025 Spotlight (25 pages, 17 figures)

  19. arXiv:2505.12781  [pdf, ps, other]

    cs.CL cs.AI

    A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone

    Authors: Jitai Hao, Qiang Huang, Hao Liu, Xinyan Xiao, Zhaochun Ren, Jun Yu

    Abstract: Training high-performing Small Language Models (SLMs) remains costly, even with knowledge distillation and pruning from larger teacher models. Existing work often faces three key challenges: (1) information loss from hard pruning, (2) inefficient alignment of representations, and (3) underutilization of informative activations, particularly from Feed-Forward Networks (FFNs). To address these chall…

    Submitted 19 May, 2025; originally announced May 2025.

  20. arXiv:2505.12736  [pdf, ps, other]

    cs.LG

    Deep Unfolding with Kernel-based Quantization in MIMO Detection

    Authors: Zeyi Ren, Jingreng Lei, Yichen Jin, Ermo Hua, Qingfeng Lin, Chen Zhang, Bowen Zhou, Yik-Chung Wu

    Abstract: The development of edge computing places critical demands on energy-efficient model deployment for multiple-input multiple-output (MIMO) detection tasks. Deploying deep unfolding models such as PGD-Nets and ADMM-Nets into resource-constrained edge devices using quantization methods is challenging. Existing quantization methods based on quantization aware training (QAT) suffer from performance degr…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: submitted to ICML ML4Wireless workshop

  21. arXiv:2505.07728  [pdf, other]

    cs.RO cs.AI cs.LG

    Guiding Data Collection via Factored Scaling Curves

    Authors: Lihan Zha, Apurva Badithela, Michael Zhang, Justin Lidard, Jeremy Bao, Emily Zhou, David Snyder, Allen Z. Ren, Dhruv Shah, Anirudha Majumdar

    Abstract: Generalist imitation learning policies trained on large datasets show great promise for solving diverse manipulation tasks. However, to ensure generalization to different conditions, policies need to be trained with data collected across a large set of environmental factor variations (e.g., camera pose, table height, distractors), a prohibitively expensive undertaking if done exhaustively. We…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Project website: https://factored-data-scaling.github.io

  22. arXiv:2505.06576  [pdf, other]

    cs.CV cs.AI

    Two-Stage Random Alternation Framework for One-Shot Pansharpening

    Authors: Haorui Chen, Zeyu Ren, Jiaxuan Ren, Ran Ran, Jinliang Shao, Jie Huang, Liangjian Deng

    Abstract: Deep learning has substantially advanced pansharpening, achieving impressive fusion quality. However, a prevalent limitation is that conventional deep learning models, which typically rely on training datasets, often exhibit suboptimal generalization to unseen real-world image pairs. This restricts their practical utility when faced with real-world scenarios not included in the training datasets.…

    Submitted 16 May, 2025; v1 submitted 10 May, 2025; originally announced May 2025.

  23. arXiv:2505.03075  [pdf, other]

    cs.IR

    Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models

    Authors: Zhengliang Shi, Lingyong Yan, Weiwei Sun, Yue Feng, Pengjie Ren, Xinyu Ma, Shuaiqiang Wang, Dawei Yin, Maarten de Rijke, Zhaochun Ren

    Abstract: Retrieval-augmented generation (RAG) integrates large language models (LLMs) with retrievers to access external knowledge, improving the factuality of LLM generation in knowledge-grounded tasks. To optimize the RAG performance, most previous work independently fine-tunes the retriever to adapt to frozen LLMs or trains the LLMs to use documents retrieved by off-the-shelf retrievers, lacking end-…

    Submitted 5 May, 2025; originally announced May 2025.

  24. arXiv:2504.21801  [pdf, other]

    cs.CL cs.AI

    DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

    Authors: Z. Z. Ren, Zhihong Shao, Junxiao Song, Huajian Xin, Haocheng Wang, Wanjia Zhao, Liyue Zhang, Zhe Fu, Qihao Zhu, Dejian Yang, Z. F. Wu, Zhibin Gou, Shirong Ma, Hongxuan Tang, Yuxuan Liu, Wenjun Gao, Daya Guo, Chong Ruan

    Abstract: We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a ch…

    Submitted 30 April, 2025; originally announced April 2025.

  25. arXiv:2504.19749  [pdf, other]

    cs.CV

    STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction

    Authors: Zhimin Liao, Ping Wei, Shuaijia Chen, Haoxuan Wang, Ziyang Ren

    Abstract: 3D occupancy and scene flow offer a detailed and dynamic representation of 3D scene. Recognizing the sparsity and complexity of 3D space, previous vision-centric methods have employed implicit learning-based approaches to model spatial and temporal information. However, these approaches struggle to capture local details and diminish the model's spatial discriminative ability. To address these chal…

    Submitted 28 April, 2025; originally announced April 2025.

  26. arXiv:2504.17519  [pdf, other]

    cs.IR

    Replication and Exploration of Generative Retrieval over Dynamic Corpora

    Authors: Zhen Zhang, Xinyu Ma, Weiwei Sun, Pengjie Ren, Zhumin Chen, Shuaiqiang Wang, Dawei Yin, Maarten de Rijke, Zhaochun Ren

    Abstract: Generative retrieval (GR) has emerged as a promising paradigm in information retrieval (IR). However, most existing GR models are developed and evaluated using a static document collection, and their performance in dynamic corpora where document collections evolve continuously is rarely studied. In this paper, we first reproduce and systematically evaluate various representative GR approaches over…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Accepted at SIGIR 2025 (Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval)

  27. arXiv:2504.16054  [pdf, other]

    cs.LG cs.RO

    $π_{0.5}$: a Vision-Language-Action Model with Open-World Generalization

    Authors: Physical Intelligence, Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Manuel Y. Galliker, Dibya Ghosh, Lachy Groom, Karol Hausman, Brian Ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Devin LeBlanc, Sergey Levine, Adrian Li-Bell, Mohith Mothukuri, Suraj Nair, Karl Pertsch, Allen Z. Ren, et al. (11 additional authors not shown)

    Abstract: In order for robots to be useful, they must perform practically relevant tasks in the real world, outside of the lab. While vision-language-action (VLA) models have demonstrated impressive results for end-to-end robot control, it remains an open question how far such models can generalize in the wild. We describe $π_{0.5}$, a new model based on $π_{0}$ that uses co-training on heterogeneous tasks…

    Submitted 22 April, 2025; originally announced April 2025.

  28. arXiv:2504.14680  [pdf, ps, other]

    cs.RO

    A Complete and Bounded-Suboptimal Algorithm for a Moving Target Traveling Salesman Problem with Obstacles in 3D

    Authors: Anoop Bhat, Geordan Gutow, Bhaskar Vundurthy, Zhongqiang Ren, Sivakumar Rathinam, Howie Choset

    Abstract: The moving target traveling salesman problem with obstacles (MT-TSP-O) seeks an obstacle-free trajectory for an agent that intercepts a given set of moving targets, each within specified time windows, and returns to the agent's starting position. Each target moves with a constant velocity within its time windows, and the agent has a speed limit no smaller than any target's speed. We present FMC*-T…

    Submitted 22 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Comments: Accepted to ICRA 2025

  29. arXiv:2504.14425  [pdf, ps, other]

    stat.ML cs.LG math.CA math.FA

    Optimal Scheduling of Dynamic Transport

    Authors: Panos Tsimpos, Zhi Ren, Jakob Zech, Youssef Marzouk

    Abstract: Flow-based methods for sampling and generative modeling use continuous-time dynamical systems to represent a transport map that pushes forward a source measure to a target measure. The introduction of a time axis provides considerable design freedom, and a central question is how to exploit this freedom. Though many popular methods seek straight line (i.e., zero acceleration) trajectories, we sh…

    Submitted 17 June, 2025; v1 submitted 19 April, 2025; originally announced April 2025.

  30. arXiv:2504.13482  [pdf, other]

    cs.IR

    Improving Sequential Recommenders through Counterfactual Augmentation of System Exposure

    Authors: Ziqi Zhao, Zhaochun Ren, Jiyuan Yang, Zuming Yan, Zihan Wang, Liu Yang, Pengjie Ren, Zhumin Chen, Maarten de Rijke, Xin Xin

    Abstract: In sequential recommendation (SR), system exposure refers to items that are exposed to the user. Typically, only a few of the exposed items would be interacted with by the user. Although SR has achieved great success in predicting future user interests, existing SR methods still fail to fully exploit system exposure data. Most methods only model items that have been interacted with, while the larg…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: accepted at SIGIR 2025 (Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval)

  31. arXiv:2504.11331  [pdf, other]

    cs.CL cs.MM

    Dependency Structure Augmented Contextual Scoping Framework for Multimodal Aspect-Based Sentiment Analysis

    Authors: Hao Liu, Lijun He, Jiaxi Liang, Zhihan Ren, Fan Li

    Abstract: Multimodal Aspect-Based Sentiment Analysis (MABSA) seeks to extract fine-grained information from image-text pairs to identify aspect terms and determine their sentiment polarity. However, existing approaches often fall short in simultaneously addressing three core challenges: Sentiment Cue Perception (SCP), Multimodal Information Misalignment (MIM), and Semantic Noise Elimination (SNE). To overco…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: submitted to ACM MM2025

  32. arXiv:2504.10281  [pdf, other]

    cond-mat.mtrl-sci cond-mat.mes-hall cs.AI cs.CV cs.LG

    Zero-shot Autonomous Microscopy for Scalable and Intelligent Characterization of 2D Materials

    Authors: Jingyun Yang, Ruoyan Avery Yin, Chi Jiang, Yuepeng Hu, Xiaokai Zhu, Xingjian Hu, Sutharsika Kumar, Xiao Wang, Xiaohua Zhai, Keran Rong, Yunyue Zhu, Tianyi Zhang, Zongyou Yin, Jing Kong, Neil Zhenqiang Gong, Zhichu Ren, Haozhe Wang

    Abstract: Characterization of atomic-scale materials traditionally requires human experts with months to years of specialized training. Even for trained human operators, accurate and reliable characterization remains challenging when examining newly discovered materials such as two-dimensional (2D) structures. This bottleneck drives demand for fully autonomous experimentation systems capable of comprehendin…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 13 pages, 4 figures

  33. arXiv:2504.10065  [pdf, other]

    cs.CL

    A Computational Cognitive Model for Processing Repetitions of Hierarchical Relations

    Authors: Zeng Ren, Xinyi Guan, Martin Rohrmeier

    Abstract: Patterns are fundamental to human cognition, enabling the recognition of structure and regularity across diverse domains. In this work, we focus on structural repeats, patterns that arise from the repetition of hierarchical relations within sequential data, and develop a candidate computational model of how humans detect and understand such structural repeats. Based on a weighted deduction system,…

    Submitted 14 April, 2025; originally announced April 2025.

  34. Constrained Auto-Regressive Decoding Constrains Generative Retrieval

    Authors: Shiguang Wu, Zhaochun Ren, Xin Xin, Jiyuan Yang, Mengqi Zhang, Zhumin Chen, Maarten de Rijke, Pengjie Ren

    Abstract: Generative retrieval seeks to replace traditional search index data structures with a single large-scale neural network, offering the potential for improved efficiency and seamless integration with generative large language models. As an end-to-end paradigm, generative retrieval adopts a learned differentiable search index to conduct retrieval by directly generating document identifiers through co…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 13 pages, 6 figures, 2 tables, accepted by SIGIR 2025 (Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval)

  35. arXiv:2504.06714  [pdf, other]

    cs.IR

    Unifying Search and Recommendation: A Generative Paradigm Inspired by Information Theory

    Authors: Jujia Zhao, Wenjie Wang, Chen Xu, Xiuying Chen, Zhaochun Ren, Suzan Verberne

    Abstract: Recommender systems and search engines serve as foundational elements of online platforms, with the former delivering information proactively and the latter enabling users to seek information actively. Unifying both tasks in a shared model is promising since it can enhance user modeling and item understanding. Previous approaches mainly follow a discriminative paradigm, utilizing shared encoders t…

    Submitted 22 May, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

  36. arXiv:2504.05732  [pdf, other]

    cs.CL

    LLM$\times$MapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources

    Authors: Haoyu Wang, Yujia Fu, Zhu Zhang, Shuo Wang, Zirui Ren, Xiaorong Wang, Zhili Li, Chaoqun He, Bo An, Zhiyuan Liu, Maosong Sun

    Abstract: Long-form generation is crucial for a wide range of practical applications, typically categorized into short-to-long and long-to-long generation. While short-to-long generations have received considerable attention, generating long texts from extremely long resources remains relatively underexplored. The primary challenge in long-to-long generation lies in effectively integrating and analyzing rel…

    Submitted 14 April, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

  37. arXiv:2504.04141  [pdf, other]

    cs.CL

    Cognitive Debiasing Large Language Models for Decision-Making

    Authors: Yougang Lyu, Shijie Ren, Yue Feng, Zihan Wang, Zhumin Chen, Zhaochun Ren, Maarten de Rijke

    Abstract: Large language models (LLMs) have shown potential in supporting decision-making applications, particularly as personal assistants in the financial, healthcare, and legal domains. While prompt engineering strategies have enhanced the capabilities of LLMs in decision-making, cognitive biases inherent to LLMs present significant challenges. Cognitive biases are systematic patterns of deviation from n…

    Submitted 23 May, 2025; v1 submitted 5 April, 2025; originally announced April 2025.

  38. arXiv:2503.24047  [pdf, other]

    cs.AI cs.MA

    Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents

    Authors: Shuo Ren, Pu Jian, Zhenjiang Ren, Chunlin Leng, Can Xie, Jiajun Zhang

    Abstract: As scientific research becomes increasingly complex, innovative tools are needed to manage vast data, facilitate interdisciplinary collaboration, and accelerate discovery. Large language models (LLMs) are now evolving into LLM-based scientific agents that automate critical tasks, ranging from hypothesis generation and experiment design to data analysis and simulation. Unlike general-purpose LLMs,…

    Submitted 17 April, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

    Comments: 34 pages, 10 figures

  39. arXiv:2503.22394  [pdf, other]

    cs.CV cs.AI

    Endo-TTAP: Robust Endoscopic Tissue Tracking via Multi-Facet Guided Attention and Hybrid Flow-point Supervision

    Authors: Rulin Zhou, Wenlong He, An Wang, Qiqi Yao, Haijun Hu, Jiankun Wang, Xi Zhang, Hongliang Ren

    Abstract: Accurate tissue point tracking in endoscopic videos is critical for robotic-assisted surgical navigation and scene understanding, but remains challenging due to complex deformations, instrument occlusion, and the scarcity of dense trajectory annotations. Existing methods struggle with long-term tracking under these conditions due to limited feature utilization and annotation dependence. We present…

    Submitted 28 March, 2025; originally announced March 2025.

  40. arXiv:2503.19937  [pdf, other]

    cs.CV cs.AI

    Reverse Prompt: Cracking the Recipe Inside Text-to-Image Generation

    Authors: Zhiyao Ren, Yibing Zhan, Baosheng Yu, Dacheng Tao

    Abstract: Text-to-image generation has become increasingly popular, but achieving the desired images often requires extensive prompt engineering. In this paper, we explore how to decode textual prompts from reference images, a process we refer to as image reverse prompt engineering. This technique enables us to gain insights from reference images, understand the creative processes of great artists, and gene…

    Submitted 24 March, 2025; originally announced March 2025.

  41. arXiv:2503.19383  [pdf, other]

    cs.CV

    MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation

    Authors: Yukang Lin, Hokit Fung, Jianjin Xu, Zeping Ren, Adela S. M. Lau, Guosheng Yin, Xiu Li

    Abstract: Recent portrait animation methods have made significant strides in generating realistic lip synchronization. However, they often lack explicit control over head movements and facial expressions, and cannot produce videos from multiple viewpoints, resulting in less controllable and expressive animations. Moreover, text-guided portrait animation remains underexplored, despite its user-friendly natur…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  42. arXiv:2503.13169  [pdf]

    cs.AI

    Collaborative AI Enhances Image Understanding in Materials Science

    Authors: Ruoyan Avery Yin, Zhichu Ren, Zongyou Yin, Zhen Zhang, So Yeon Kim, Chia-Wei Hsu, Ju Li

    Abstract: The Copilot for Real-world Experimental Scientist (CRESt) system empowers researchers to control autonomous laboratories through conversational AI, providing a seamless interface for managing complex experimental workflows. We have enhanced CRESt by integrating a multi-agent collaboration mechanism that utilizes the complementary strengths of the ChatGPT and Gemini models for precise image analysi…

    Submitted 17 March, 2025; originally announced March 2025.

    Comments: 10 pages, 4 figures

    ACM Class: I.2.1; I.2.10

  43. arXiv:2503.08703  [pdf, ps, other]

    cs.NE cs.CV

    SDTrack: A Baseline for Event-based Tracking via Spiking Neural Networks

    Authors: Yimeng Shan, Zhenbang Ren, Haodi Wu, Wenjie Wei, Rui-Jie Zhu, Shuai Wang, Dehao Zhang, Yichen Xiao, Jieyuan Zhang, Kexin Shi, Jingzhinan Wang, Jason K. Eshraghian, Haicheng Qu, Jiqing Zhang, Malu Zhang, Yang Yang

    Abstract: Event cameras provide superior temporal resolution, dynamic range, power efficiency, and pixel bandwidth. Spiking Neural Networks (SNNs) naturally complement event data through discrete spike signals, making them ideal for event-based tracking. However, current approaches that combine Artificial Neural Networks (ANNs) and SNNs, along with suboptimal architectures, compromise energy efficiency and…

    Submitted 17 June, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

    Comments: 11 pages, 7 figures, 4 tables

  44. arXiv:2503.08160  [pdf, other]

    cs.LG

    Concept-Driven Deep Learning for Enhanced Protein-Specific Molecular Generation

    Authors: Taojie Kuang, Qianli Ma, Athanasios V. Vasilakos, Yu Wang, Qiang Cheng, Zhixiang Ren

    Abstract: In recent years, deep learning techniques have made significant strides in molecular generation for specific targets, driving advancements in drug discovery. However, existing molecular generation methods present significant limitations: those operating at the atomic level often lack synthetic feasibility, drug-likeness, and interpretability, while fragment-based approaches frequently overlook com…

    Submitted 11 March, 2025; originally announced March 2025.

  45. arXiv:2503.04693  [pdf, other]

    cs.CL

    UIPE: Enhancing LLM Unlearning by Removing Knowledge Related to Forgetting Targets

    Authors: Wenyu Wang, Mengqi Zhang, Xiaotian Ye, Zhaochun Ren, Zhumin Chen, Pengjie Ren

    Abstract: Large Language Models (LLMs) inevitably acquire harmful information during training on massive datasets. LLM unlearning aims to eliminate the influence of such harmful information while maintaining the model's overall performance. Existing unlearning methods, represented by gradient ascent-based approaches, primarily focus on forgetting target data while overlooking the crucial impact of logically…

    Submitted 6 March, 2025; originally announced March 2025.

  46. arXiv:2503.01763  [pdf, other]

    cs.CL cs.AI cs.IR

    Retrieval Models Aren't Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models

    Authors: Zhengliang Shi, Yuhan Wang, Lingyong Yan, Pengjie Ren, Shuaiqiang Wang, Dawei Yin, Zhaochun Ren

    Abstract: Tool learning aims to augment large language models (LLMs) with diverse tools, enabling them to act as agents for solving practical tasks. Due to the limited context length of tool-using LLMs, adopting information retrieval (IR) models to select useful tools from large toolsets is a critical initial step. However, the performance of IR models in tool retrieval tasks remains underexplored and uncle…

    Submitted 26 May, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

    Comments: ACL 2025. Code: https://github.com/mangopy/tool-retrieval-benchmark

  47. arXiv:2503.01273  [pdf]

    cs.AI physics.flu-dyn

    OptMetaOpenFOAM: Large Language Model Driven Chain of Thought for Sensitivity Analysis and Parameter Optimization based on CFD

    Authors: Yuxuan Chen, Long Zhang, Xu Zhu, Hua Zhou, Zhuyin Ren

    Abstract: Merging natural language interfaces with computational fluid dynamics (CFD) workflows presents transformative opportunities for both industry and research. In this study, we introduce OptMetaOpenFOAM - a novel framework that bridges MetaOpenFOAM with external analysis and optimization tool libraries through a large language model (LLM)-driven chain-of-thought (COT) methodology. By automating compl…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: 26 pages, 11 figures

  48. arXiv:2502.20183  [pdf, ps, other]

    cs.LG

    Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems

    Authors: Zeyi Ren, Qingfeng Lin, Jingreng Lei, Yang Li, Yik-Chung Wu

    Abstract: In the realm of activity detection for massive machine-type communications, intelligent reflecting surfaces (IRS) have shown significant potential in enhancing coverage for devices lacking direct connections to the base station (BS). However, traditional activity detection methods are typically designed for a single type of channel model, which does not reflect the complexities of real-world scena…

    Submitted 26 June, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

    Comments: 5 pages, 5 figures, Accepted in IEEE Wireless Communications Letters

  49. arXiv:2502.18702  [pdf, other]

    cs.IR cs.CL

    A Cooperative Multi-Agent Framework for Zero-Shot Named Entity Recognition

    Authors: Zihan Wang, Ziqi Zhao, Yougang Lyu, Zhumin Chen, Maarten de Rijke, Zhaochun Ren

    Abstract: Zero-shot named entity recognition (NER) aims to develop entity recognition systems from unannotated text corpora. This task presents substantial challenges due to minimal human intervention. Recent work has adapted large language models (LLMs) for zero-shot NER by crafting specialized prompt templates. It advances model self-learning abilities by incorporating self-annotated demonstrations. Howev…

    Submitted 25 February, 2025; originally announced February 2025.

    Comments: Accepted at WWW 2025

  50. arXiv:2502.17848  [pdf, ps, other]

    cs.CL

    LR^2Bench: Evaluating Long-chain Reflective Reasoning Capabilities of Large Language Models via Constraint Satisfaction Problems

    Authors: Jianghao Chen, Zhenlin Wei, Zhenjiang Ren, Ziyong Li, Jiajun Zhang

    Abstract: Recent progress in Large Reasoning Models (LRMs) has significantly enhanced the reasoning abilities of Large Language Models (LLMs), empowering them to tackle increasingly complex tasks through reflection capabilities, such as making assumptions, backtracking, and self-refinement. However, effectively evaluating such reflection capabilities remains challenging due to the lack of appropriate benchm…

    Submitted 25 June, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

    Comments: ACL-2025, our code is available at https://github.com/ZNLP/LR2Bench