Showing 1–50 of 137 results for author: Pan, M

  1. arXiv:2507.05633

    cs.CL cs.AI cs.IR

    SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression

    Authors: Yiqiao Jin, Kartik Sharma, Vineeth Rakesh, Yingtong Dou, Menghai Pan, Mahashweta Das, Srijan Kumar

    Abstract: Retrieval-augmented Generation (RAG) extends large language models (LLMs) with external knowledge but faces key challenges: restricted effective context length and redundancy in retrieved documents. Pure compression-based approaches reduce input size but often discard fine-grained details essential for factual accuracy. We propose SARA, a unified RAG framework that balances local precision and glo…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 20 pages

  2. arXiv:2507.01216

    cs.LG cs.CR

    PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning

    Authors: Xingke Yang, Liang Li, Zhiyi Wan, Sicong Li, Hao Wang, Xiaoqi Qin, Jiang Liu, Tomoaki Ohtsuki, Xin Fu, Miao Pan

    Abstract: There is a huge gap between numerous intriguing applications fostered by on-device large language model (LLM) fine-tuning (FT) from fresh mobile data and the limited resources of a mobile device. While existing server-assisted methods (e.g., split learning or side-tuning) may enable LLM FT on the local mobile device, they suffer from heavy communication burdens of activation transmissions, and may…

    Submitted 1 July, 2025; originally announced July 2025.

  3. arXiv:2506.19054

    cs.CR

    GuardSet-X: Massive Multi-Domain Safety Policy-Grounded Guardrail Dataset

    Authors: Mintong Kang, Zhaorun Chen, Chejian Xu, Jiawei Zhang, Chengquan Guo, Minzhou Pan, Ivan Revilla, Yu Sun, Bo Li

    Abstract: As LLMs become widespread across diverse applications, concerns about the security and safety of LLM interactions have intensified. Numerous guardrail models and benchmarks have been developed to ensure LLM content safety. However, existing guardrail benchmarks are often built upon ad hoc risk taxonomies that lack a principled grounding in standardized safety policies, limiting their alignment wit…

    Submitted 25 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

  4. arXiv:2506.15751

    cs.AI cs.CL cs.LG

    Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts

    Authors: Kartik Sharma, Yiqiao Jin, Vineeth Rakesh, Yingtong Dou, Menghai Pan, Mahashweta Das, Srijan Kumar

    Abstract: As large language models (LLMs) are deployed in safety-critical settings, it is essential to ensure that their responses comply with safety standards. Prior research has revealed that LLMs often fail to grasp the notion of safe behaviors, resulting in either unjustified refusals to harmless prompts or the generation of harmful content. While substantial efforts have been made to improve their robu…

    Submitted 18 June, 2025; originally announced June 2025.

  5. arXiv:2506.15402

    cs.RO cs.AI cs.CV

    MCOO-SLAM: A Multi-Camera Omnidirectional Object SLAM System

    Authors: Miaoxin Pan, Jinnan Li, Yaowen Zhang, Yi Yang, Yufeng Yue

    Abstract: Object-level SLAM offers structured and semantically meaningful environment representations, making it more interpretable and suitable for high-level robotic tasks. However, most existing approaches rely on RGB-D sensors or monocular views, which suffer from narrow fields of view, occlusion sensitivity, and limited depth perception, especially in large-scale or outdoor environments. These limitatio…

    Submitted 18 June, 2025; originally announced June 2025.

  6. arXiv:2506.09343

    cs.CV cs.RO

    CheckManual: A New Challenge and Benchmark for Manual-based Appliance Manipulation

    Authors: Yuxing Long, Jiyao Zhang, Mingjie Pan, Tianshu Wu, Taewhan Kim, Hao Dong

    Abstract: Correct use of electrical appliances has significantly improved human life quality. Unlike simple tools that can be manipulated with common sense, different parts of electrical appliances have specific functions defined by manufacturers. If we want a robot to heat bread in a microwave, we should enable it to review the microwave manual first. From the manual, it can learn about component functio…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: CVPR 2025 Highlight

  7. arXiv:2505.23821

    cs.CR cs.SD eess.AS

    SpeechVerifier: Robust Acoustic Fingerprint against Tampering Attacks via Watermarking

    Authors: Lingfeng Yao, Chenpei Huang, Shengyao Wang, Junpei Xue, Hanqing Guo, Jiang Liu, Xun Chen, Miao Pan

    Abstract: With the surge of social media, maliciously tampered public speeches, especially those from influential figures, have seriously affected social stability and public trust. Existing speech tampering detection methods remain insufficient: they either rely on external reference data or fail to be both sensitive to attacks and robust to benign operations, such as compression and resampling. To tackle…

    Submitted 1 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  8. arXiv:2505.22358

    cs.LG cs.AI

    Budget-Adaptive Adapter Tuning in Orthogonal Subspaces for Continual Learning in LLMs

    Authors: Zhiyi Wan, Wanrou Du, Liang Li, Miao Pan, Xiaoqi Qin

    Abstract: Large language models (LLMs) often suffer from catastrophic forgetting in continual learning (CL) scenarios, where performance on previously learned tasks degrades severely while training on sequentially arriving tasks. Although pioneering CL approaches using orthogonal subspaces can mitigate task interference, they typically employ fixed budget allocation, neglecting the varying complexity across…

    Submitted 28 May, 2025; originally announced May 2025.

  9. arXiv:2505.21528

    cs.CV cs.AI cs.LG

    UniDB++: Fast Sampling of Unified Diffusion Bridge

    Authors: Mokai Pan, Kaizhen Zhu, Yuexin Ma, Yanwei Fu, Jingyi Yu, Jingya Wang, Ye Shi

    Abstract: Diffusion Bridges enable transitions between arbitrary distributions, with the Unified Diffusion Bridge (UniDB) framework achieving high-fidelity image generation via a Stochastic Optimal Control (SOC) formulation. However, UniDB's reliance on iterative Euler sampling methods results in slow, computationally expensive inference, while existing acceleration techniques for diffusion or diffusion bri…

    Submitted 23 May, 2025; originally announced May 2025.

  10. arXiv:2505.12759

    cs.LG

    Your Offline Policy is Not Trustworthy: Bilevel Reinforcement Learning for Sequential Portfolio Optimization

    Authors: Haochen Yuan, Minting Pan, Yunbo Wang, Siyu Gao, Philip S. Yu, Xiaokang Yang

    Abstract: Reinforcement learning (RL) has shown significant promise for sequential portfolio optimization tasks, such as stock trading, where the objective is to maximize cumulative returns while minimizing risks using historical data. However, traditional RL approaches often produce policies that merely memorize the optimal yet impractical buying and selling behaviors within the fixed dataset. These offlin…

    Submitted 19 May, 2025; originally announced May 2025.

  11. arXiv:2505.08735

    cs.LG

    Preference Optimization for Combinatorial Optimization Problems

    Authors: Mingjun Pan, Guanquan Lin, You-Wei Luo, Bin Zhu, Zhien Dai, Lijun Sun, Chun Yuan

    Abstract: Reinforcement Learning (RL) has emerged as a powerful tool for neural combinatorial optimization, enabling models to learn heuristics that solve complex problems without requiring expert knowledge. Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast combinatorial action spaces, leading to inefficiency. In this…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: This paper has been accepted by ICML 2025

  12. arXiv:2505.06482

    cs.LG cs.AI cs.RO

    Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach

    Authors: Minting Pan, Yitao Zheng, Jiajian Li, Yunbo Wang, Xiaokang Yang

    Abstract: Offline reinforcement learning (RL) enables policy optimization using static datasets, avoiding the risks and costs of extensive real-world exploration. However, it struggles with suboptimal offline behaviors and inaccurate value estimation due to the lack of environmental interaction. We present Video-Enhanced Offline RL (VeoRL), a model-based method that constructs an interactive world model fro…

    Submitted 17 May, 2025; v1 submitted 9 May, 2025; originally announced May 2025.

  13. arXiv:2504.10707

    physics.geo-ph cs.LG

    Distinct hydrologic response patterns and trends worldwide revealed by physics-embedded learning

    Authors: Haoyu Ji, Yalan Song, Tadd Bindas, Chaopeng Shen, Yuan Yang, Ming Pan, Jiangtao Liu, Farshid Rahmani, Ather Abbas, Hylke Beck, Kathryn Lawson, Yoshihide Wada

    Abstract: To track rapid changes within our water sector, Global Water Models (GWMs) need to realistically represent hydrologic systems' response patterns - such as baseflow fraction - but are hindered by their limited ability to learn from data. Here we introduce a high-resolution physics-embedded big-data-trained model as a breakthrough in reliably capturing characteristic hydrologic response patterns ('s…

    Submitted 22 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

  14. arXiv:2504.01260

    cs.RO cs.HC eess.SY

    The Social Life of Industrial Arms: How Arousal and Attention Shape Human-Robot Interaction

    Authors: Roy El-Helou, Matthew K. X. J. Pan

    Abstract: This study explores how human perceptions of a non-anthropomorphic robotic manipulator are shaped by two key dimensions of behaviour: arousal, defined as the robot's movement energy and expressiveness, and attention, defined as the robot's capacity to selectively orient toward and engage with a user. We introduce a novel control architecture that integrates a gaze-like attention engine with an aro…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 7 pages, 3 figures, 1 table

  15. arXiv:2503.17078

    cs.RO cs.ET

    Exploring psychophysiological methods for human-robot collaboration in construction

    Authors: Saika Wong, Zhentao Chen, Mi Pan, Miroslaw J. Skibniewski

    Abstract: Psychophysiological methods present a promising approach to fostering enhanced mutual communication and collaboration between human workers and robots. Despite their potential, there is still limited understanding of how to effectively integrate psychophysiological methods to improve human-robot collaboration (HRC) in construction. This paper addresses this gap by critically reviewing the use of p…

    Submitted 21 March, 2025; originally announced March 2025.

  16. arXiv:2503.13657

    cs.AI

    Why Do Multi-Agent LLM Systems Fail?

    Authors: Mert Cemri, Melissa Z. Pan, Shuyi Yang, Lakshya A. Agrawal, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Dan Klein, Kannan Ramchandran, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica

    Abstract: Despite growing enthusiasm for Multi-Agent LLM Systems (MAS), their performance gains on popular benchmarks often remain minimal compared with single-agent frameworks. This gap highlights the need to systematically analyze the challenges hindering MAS effectiveness. We present MAST (Multi-Agent System Failure Taxonomy), the first empirically grounded taxonomy designed to understand MAS failures.…

    Submitted 22 April, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

    Comments: ArXiv v2

  17. arXiv:2502.20421

    cs.LG

    MobiLLM: Enabling LLM Fine-Tuning on the Mobile Device via Server Assisted Side Tuning

    Authors: Liang Li, Xingke Yang, Wen Wu, Hao Wang, Tomoaki Ohtsuki, Xin Fu, Miao Pan, Xuemin Shen

    Abstract: Large Language Model (LLM) at mobile devices and its potential applications never fail to fascinate. However, on-device LLM fine-tuning poses great challenges due to extremely high memory requirements and slow training speeds. Even with parameter-efficient fine-tuning (PEFT) methods that update only a small subset of parameters, resource-constrained mobile devices cannot afford them. In this paper…

    Submitted 27 February, 2025; originally announced February 2025.

  18. arXiv:2502.12602

    cs.RO

    Learning-based Dynamic Robot-to-Human Handover

    Authors: Hyeonseong Kim, Chanwoo Kim, Matthew Pan, Kyungjae Lee, Sungjoon Choi

    Abstract: This paper presents a novel learning-based approach to dynamic robot-to-human handover, addressing the challenges of delivering objects to a moving receiver. We hypothesize that dynamic handover, where the robot adjusts to the receiver's movements, results in more efficient and comfortable interaction compared to static handover, where the receiver is assumed to be stationary. To validate this, we…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: Accepted to ICRA 2025. For associated videos, see https://zerotohero7886.github.io/dyn-r2h-handover

  19. arXiv:2502.05749

    cs.CV cs.AI eess.SY

    UniDB: A Unified Diffusion Bridge Framework via Stochastic Optimal Control

    Authors: Kaizhen Zhu, Mokai Pan, Yuexin Ma, Yanwei Fu, Jingyi Yu, Jingya Wang, Ye Shi

    Abstract: Recent advances in diffusion bridge models leverage Doob's $h$-transform to establish fixed endpoints between distributions, demonstrating promising results in image translation and restoration tasks. However, these approaches frequently produce blurred or excessively smoothed image details and lack a comprehensive theoretical foundation to explain these shortcomings. To address these limitations,…

    Submitted 6 June, 2025; v1 submitted 8 February, 2025; originally announced February 2025.

  20. arXiv:2501.16397

    cs.LG

    THOR: A Generic Energy Estimation Approach for On-Device Training

    Authors: Jiaru Zhang, Zesong Wang, Hao Wang, Tao Song, Huai-an Su, Rui Chen, Yang Hua, Xiangwei Zhou, Ruhui Ma, Miao Pan, Haibing Guan

    Abstract: Battery-powered mobile devices (e.g., smartphones, AR/VR glasses, and various IoT devices) are increasingly being used for AI training due to their growing computational power and easy access to valuable, diverse, and real-time data. On-device training is highly energy-intensive, making accurate energy consumption estimation crucial for effective job scheduling and sustainable AI. However, the het…

    Submitted 26 January, 2025; originally announced January 2025.

    Comments: Under review

  21. arXiv:2501.03841

    cs.RO

    OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints

    Authors: Mingjie Pan, Jiyao Zhang, Tianshu Wu, Yinghao Zhao, Wenlong Gao, Hao Dong

    Abstract: The development of general robotic systems capable of manipulating in unstructured environments is a significant challenge. While Vision-Language Models (VLM) excel in high-level commonsense reasoning, they lack the fine-grained 3D spatial understanding required for precise manipulation tasks. Fine-tuning VLM on robotic datasets to create Vision-Language-Action Models (VLA) is a potential solution,…

    Submitted 7 January, 2025; originally announced January 2025.

  22. arXiv:2501.00332

    cs.CL cs.IR

    MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation

    Authors: Chia-Yuan Chang, Zhimeng Jiang, Vineeth Rakesh, Menghai Pan, Chin-Chia Michael Yeh, Guanchu Wang, Mingzhi Hu, Zhichao Xu, Yan Zheng, Mahashweta Das, Na Zou

    Abstract: Large Language Models (LLMs) are becoming essential tools for various natural language processing tasks but often suffer from generating outdated or incorrect information. Retrieval-Augmented Generation (RAG) addresses this issue by incorporating external, real-time information retrieval to ground LLM responses. However, the existing RAG systems frequently struggle with the quality of retrieval do…

    Submitted 31 December, 2024; originally announced January 2025.

  23. arXiv:2412.19908

    cs.PL

    Comprehensive Verification of Packet Processing

    Authors: Shengyi Wang, Mengying Pan, Andrew W. Appel

    Abstract: To prove the functional correctness of a P4 program running in a programmable network switch or smart NIC, prior works have focused mainly on verifiers for the "control block" (match-action pipeline). But to verify that a switch handles packets according to a desired specification, proving the control block is not enough. We demonstrate a new comprehensive framework for formally specifying and pro…

    Submitted 27 December, 2024; originally announced December 2024.

    ACM Class: F.3.1

  24. arXiv:2412.14961

    cs.CV

    TDCNet: Transparent Objects Depth Completion with CNN-Transformer Dual-Branch Parallel Network

    Authors: Xianghui Fan, Chao Ye, Anping Deng, Xiaotian Wu, Mengyang Pan, Hang Yang

    Abstract: The sensing and manipulation of transparent objects present a critical challenge in industrial and laboratory robotics. Conventional sensors face challenges in obtaining the full depth of transparent objects due to the refraction and reflection of light on their surfaces and their lack of visible texture. Previous research has attempted to obtain complete depth maps of transparent objects from RGB…

    Submitted 19 December, 2024; originally announced December 2024.

  25. arXiv:2412.12387

    quant-ph cs.DC

    Differential Privacy Preserving Distributed Quantum Computing

    Authors: Hui Zhong, Keyi Ju, Jiachen Shen, Xinyue Zhang, Xiaoqi Qin, Tomoaki Ohtsuki, Miao Pan, Zhu Han

    Abstract: Existing quantum computers can only operate with hundreds of qubits in the Noisy Intermediate-Scale Quantum (NISQ) state, while quantum distributed computing (QDC) is regarded as a reliable way to address this limitation, allowing quantum computers to achieve their full computational potential. However, similar to classical distributed computing, QDC also faces the problem of privacy leakage. Exis…

    Submitted 6 January, 2025; v1 submitted 16 December, 2024; originally announced December 2024.

  26. arXiv:2412.10644

    eess.SP cs.AI

    Model-driven deep neural network for enhanced direction finding with commodity 5G gNodeB

    Authors: Shengheng Liu, Zihuan Mao, Xingkang Li, Mengguan Pan, Peng Liu, Yongming Huang, Xiaohu You

    Abstract: Pervasive and high-accuracy positioning has become increasingly important as a fundamental enabler for intelligent connected devices in mobile networks. Nevertheless, current wireless networks heavily rely on pure model-driven techniques to achieve positioning functionality, often succumbing to performance deterioration due to hardware impairments in practical scenarios. Here we reformulate the di…

    Submitted 13 December, 2024; originally announced December 2024.

    Comments: To appear in ACM TOSN. A preliminary version of this article was presented at the AAAI'2024 Main Technical Track

  27. arXiv:2412.06878

    cs.CV cs.LG

    SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations

    Authors: Zhaorun Chen, Francesco Pinto, Minzhou Pan, Bo Li

    Abstract: With the rise of generative AI and rapid growth of high-quality video generation, video guardrails have become more crucial than ever to ensure safety and security across platforms. Current video guardrails, however, are either overly simplistic, relying on pure classification models trained on simple policies with limited unsafe categories, which lack detailed explanations, or prompting multimoda…

    Submitted 9 December, 2024; originally announced December 2024.

    Comments: 43 pages, 20 figures

  28. arXiv:2412.05781

    cs.CV cs.AI cs.LG

    Open-Source Acceleration of Stable-Diffusion.cpp Deployable on All Devices

    Authors: Jingxu Ng, Cheng Lv, Pu Zhao, Wei Niu, Juyi Lin, Minzhou Pan, Yun Liang, Yanzhi Wang

    Abstract: Stable diffusion plays a crucial role in generating high-quality images. However, image generation is time-consuming and memory-intensive. To address this, stable-diffusion.cpp (Sdcpp) emerges as an efficient inference framework to accelerate the diffusion models. Although it is lightweight, the current implementation of ggml_conv_2d operator in Sdcpp is suboptimal, exhibiting both high inference…

    Submitted 7 January, 2025; v1 submitted 7 December, 2024; originally announced December 2024.

  29. arXiv:2411.11091

    cs.DB

    KV-Tandem -- a Modular Approach to Building High-Speed LSM Storage Engines

    Authors: Edward Bortnikov, Michael Azran, Asa Bornstein, Shmuel Dashevsky, Dennis Huang, Omer Kepten, Michael Pan, Gali Sheffi, Moshe Twitto, Tamar Weiss Orzech, Idit Keidar, Guy Gueta, Roey Maor, Niv Dayan

    Abstract: We present KV-Tandem, a modular architecture for building LSM-based storage engines on top of simple, non-ordered persistent key-value stores (KVSs). KV-Tandem enables advanced functionalities such as range queries and snapshot reads, while maintaining the native KVS performance for random reads and writes. Its modular design offers better performance trade-offs compared to previous KV-sepa…

    Submitted 17 November, 2024; originally announced November 2024.

  30. arXiv:2410.02406

    cs.HC

    ELLMA-T: an Embodied LLM-agent for Supporting English Language Learning in Social VR

    Authors: Mengxu Pan, Alexandra Kitson, Hongyu Wan, Mirjana Prpa

    Abstract: Many people struggle with learning a new language, with traditional tools falling short in providing contextualized learning tailored to each learner's needs. The recent development of large language models (LLMs) and embodied conversational agents (ECAs) in social virtual reality (VR) provide new opportunities to practice language learning in a contextualized and naturalistic way that takes into…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: 20 pages, 6 figures

  31. arXiv:2409.20343

    cs.SE

    Demystifying and Assessing Code Understandability in Java Decompilation

    Authors: Ruixin Qin, Yifan Xiong, Yifei Lu, Minxue Pan

    Abstract: Decompilation, the process of converting machine-level code into readable source code, plays a critical role in reverse engineering. Given that the main purpose of decompilation is to facilitate code comprehension in scenarios where the source code is unavailable, the understandability of decompiled code is of great importance. In this paper, we propose the first empirical study on the understanda…

    Submitted 30 September, 2024; originally announced September 2024.

    Comments: 18 pages, 16 figures

  32. arXiv:2409.11394

    eess.SY cs.RO

    Distributed Perception Aware Safe Leader Follower System via Control Barrier Methods

    Authors: Richie R. Suganda, Tony Tran, Miao Pan, Lei Fan, Qin Lin, Bin Hu

    Abstract: This paper addresses a distributed leader-follower formation control problem for a group of agents, each using a body-fixed camera with a limited field of view (FOV) for state estimation. The main challenge arises from the need to coordinate the agents' movements with their cameras' FOV to maintain visibility of the leader for accurate and reliable state estimation. To address this challenge, we p…

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: 8 pages, 10 figures

  33. arXiv:2409.00694

    cs.CV

    IAFI-FCOS: Intra- and across-layer feature interaction FCOS model for lesion detection of CT images

    Authors: Qiu Guan, Mengjie Pan, Feng Chen, Zhiqiang Yang, Zhongwen Yu, Qianwei Zhou, Haigen Hu

    Abstract: Effective lesion detection in medical images relies not only on the features of the lesion region but also on the surrounding information. However, most current methods do not fully utilize it. Moreover, the multi-scale feature fusion mechanisms of most traditional detectors cannot transmit detail information without loss, which makes it hard to detect small and boundary ambiguous l…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 2024 IJCNN

  34. arXiv:2408.04900

    cs.CL

    Communicate to Play: Pragmatic Reasoning for Efficient Cross-Cultural Communication in Codenames

    Authors: Isadora White, Sashrika Pandey, Michelle Pan

    Abstract: Cultural differences in common ground may result in pragmatic failure and misunderstandings during communication. We develop our method Rational Speech Acts for Cross-Cultural Communication (RSA+C3) to resolve cross-cultural differences in common ground. To measure the success of our method, we study RSA+C3 in the collaborative referential game of Codenames Duet and show that our method successful…

    Submitted 9 August, 2024; originally announced August 2024.

  35. arXiv:2407.17436

    cs.CY cs.AI

    AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies

    Authors: Yi Zeng, Yu Yang, Andy Zhou, Jeffrey Ziwei Tan, Yuheng Tu, Yifan Mai, Kevin Klyman, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li

    Abstract: Foundation models (FMs) provide societal benefits but also amplify risks. Governments, companies, and researchers have proposed regulatory frameworks, acceptable use policies, and safety benchmarks in response. However, existing public benchmarks often define safety categories based on previous literature, intuitions, or common sense, leading to disjointed sets of categories for risks specified in…

    Submitted 5 August, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  36. arXiv:2407.14485

    cs.GT

    On Sybil-proof Mechanisms

    Authors: Minghao Pan, Bruno Mazorra, Christoph Schlegel, Akaki Mamageishvili

    Abstract: We show that in the single-parameter mechanism design environment, the only non-wasteful, symmetric, incentive compatible and Sybil-proof direct mechanism is a second price auction with symmetric tie-breaking. Thus, if there is private information, lotteries or other mechanisms that do not always allocate to a highest-value bidder are not Sybil-proof or not incentive compatible. Moreover, we show…

    Submitted 29 May, 2025; v1 submitted 19 July, 2024; originally announced July 2024.

  37. arXiv:2407.14118

    cs.SE

    Beyond Code Generation: Assessing Code LLM Maturity with Postconditions

    Authors: Fusen He, Juan Zhai, Minxue Pan

    Abstract: Most existing code Large Language Model (LLM) benchmarks, e.g., EvalPlus, focus on the code generation tasks. Namely, they contain a natural language description of a problem and ask the LLM to write code to solve the problem. We argue that they do not capture all capabilities needed to assess the quality of a code LLM. In this paper, we propose a code LLM maturity model, based on the postconditio…

    Submitted 19 July, 2024; originally announced July 2024.

  38. arXiv:2407.11418

    cs.DB cs.AI cs.CL

    Semantic Operators: A Declarative Model for Rich, AI-based Data Processing

    Authors: Liana Patel, Siddharth Jha, Melissa Pan, Harshit Gupta, Parth Asawa, Carlos Guestrin, Matei Zaharia

    Abstract: The semantic capabilities of large language models (LLMs) have the potential to enable rich analytics and reasoning over vast knowledge corpora. Unfortunately, existing systems either empirically optimize expensive LLM-powered operations with no performance guarantees, or serve a limited set of row-wise LLM operations, providing limited robustness, expressiveness and usability. We introduce semant…

    Submitted 28 February, 2025; v1 submitted 16 July, 2024; originally announced July 2024.

  39. arXiv:2407.00943

    cs.DC cs.LG

    FedEx: Expediting Federated Learning over Heterogeneous Mobile Devices by Overlapping and Participant Selection

    Authors: Jiaxiang Geng, Boyu Li, Xiaoqi Qin, Yixuan Li, Liang Li, Yanzhao Hou, Miao Pan

    Abstract: Training latency is critical for the success of numerous intriguing applications ignited by federated learning (FL) over heterogeneous mobile devices. By revolutionarily overlapping local gradient transmission with continuous local computing, FL can remarkably reduce its training latency over homogeneous clients, yet encounter severe model staleness, model drifts, memory cost and straggler issues i…

    Submitted 20 May, 2025; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: 17 pages, 10 figures, Published in Transactions on Mobile Computing

  40. arXiv:2406.18069

    eess.SP cs.AI cs.CL

    Large Language Models for Cuffless Blood Pressure Measurement From Wearable Biosignals

    Authors: Zengding Liu, Chen Chen, Jiannong Cao, Minglei Pan, Jikui Liu, Nan Li, Fen Miao, Ye Li

    Abstract: Large language models (LLMs) have captured significant interest from both academia and industry due to their impressive performance across various textual tasks. However, the potential of LLMs to analyze physiological time-series data remains an emerging research field. Particularly, there is a notable gap in the utilization of LLMs for analyzing wearable biosignals to achieve cuffless blood press…

    Submitted 4 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  41. arXiv:2406.17864

    cs.CY cs.AI

    AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies

    Authors: Yi Zeng, Kevin Klyman, Andy Zhou, Yu Yang, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li

    Abstract: We present a comprehensive AI risk taxonomy derived from eight government policies from the European Union, United States, and China and 16 company policies worldwide, making a significant step towards establishing a unified language for generative AI safety evaluation. We identify 314 unique risk categories organized into a four-tiered taxonomy. At the highest level, this taxonomy encompasses Sys…

    Submitted 25 June, 2024; originally announced June 2024.

  42. arXiv:2406.06714

    cs.LG cs.AI cs.HC

    Coprocessor Actor Critic: A Model-Based Reinforcement Learning Approach For Adaptive Brain Stimulation

    Authors: Michelle Pan, Mariah Schrum, Vivek Myers, Erdem Bıyık, Anca Dragan

    Abstract: Adaptive brain stimulation can treat neurological conditions such as Parkinson's disease and post-stroke motor deficits by influencing abnormal neural activity. Because of patient heterogeneity, each patient requires a unique stimulation policy to achieve optimal neural responses. Model-free reinforcement learning (MFRL) holds promise in learning effective policies for a variety of similar control…

    Submitted 7 October, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Proceedings of the 41st International Conference on Machine Learning (ICML 2024)

    Journal ref: International Conference on Machine Learning 2024

  43. arXiv:2406.04662  [pdf, other]

    cs.CV

    Evaluating and Mitigating IP Infringement in Visual Generative AI

    Authors: Zhenting Wang, Chen Chen, Vikash Sehwag, Minzhou Pan, Lingjuan Lyu

    Abstract: The popularity of visual generative AI models like DALL-E 3, Stable Diffusion XL, Stable Video Diffusion, and Sora has been increasing. Through extensive evaluation, we discovered that the state-of-the-art visual generative models can generate content that bears a striking resemblance to characters protected by intellectual property rights held by major entertainment companies (such as Sony, Marve…

    Submitted 7 June, 2024; originally announced June 2024.

  44. arXiv:2406.04491  [pdf, other]

    cs.RO

    Towards Robotic Haptic Proxies in Virtual Reality

    Authors: Eric Godden, Matthew Pan

    Abstract: This work represents the initial development of a haptic display system for increased presence in virtual experiences. The developed system creates a two-way connection between a virtual space, mediated through a virtual reality headset, and a physical space, mediated through a robotic manipulator, creating the foundation for future haptic display development using the haptic proxy framework. Here…

    Submitted 6 June, 2024; originally announced June 2024.

  45. arXiv:2406.03785  [pdf, other]

    cs.CR

    Count-mean Sketch as an Optimized Framework for Frequency Estimation with Local Differential Privacy

    Authors: Mingen Pan

    Abstract: This paper identifies that a group of state-of-the-art locally-differentially-private (LDP) algorithms for frequency estimation are equivalent to the private Count-Mean Sketch (CMS) algorithm with different parameters. Therefore, we revisit the private CMS, correct errors in the original CMS paper regarding expectation and variance, modify the CMS implementation to eliminate existing bias, and exp…

    Submitted 6 June, 2024; originally announced June 2024.

  46. arXiv:2406.03720  [pdf, other]

    cs.CV cs.MM

    JIGMARK: A Black-Box Approach for Enhancing Image Watermarks against Diffusion Model Edits

    Authors: Minzhou Pan, Yi Zeng, Xue Lin, Ning Yu, Cho-Jui Hsieh, Peter Henderson, Ruoxi Jia

    Abstract: In this study, we investigate the vulnerability of image watermarks to diffusion-model-based image editing, a challenge exacerbated by the computational cost of accessing gradient information and the closed-source nature of many diffusion models. To address this issue, we introduce JIGMARK. This first-of-its-kind watermarking technique enhances robustness through contrastive learning with pairs of…

    Submitted 5 June, 2024; originally announced June 2024.

  47. arXiv:2406.03711  [pdf, other]

    physics.flu-dyn cs.AI

    Pi-fusion: Physics-informed diffusion model for learning fluid dynamics

    Authors: Jing Qiu, Jiancheng Huang, Xiangdong Zhang, Zeng Lin, Minglei Pan, Zengding Liu, Fen Miao

    Abstract: Physics-informed deep learning has been developed as a novel paradigm for learning physical dynamics recently. While general physics-informed deep learning methods have shown early promise in learning fluid dynamics, they are difficult to generalize in arbitrary time instants in real-world scenario, where the fluid motion can be considered as a time-variant trajectory involved large-scale particle…

    Submitted 5 June, 2024; originally announced June 2024.

  48. arXiv:2405.11416  [pdf, other]

    cs.LG

    Discrete-state Continuous-time Diffusion for Graph Generation

    Authors: Zhe Xu, Ruizhong Qiu, Yuzhong Chen, Huiyuan Chen, Xiran Fan, Menghai Pan, Zhichen Zeng, Mahashweta Das, Hanghang Tong

    Abstract: Graph is a prevalent discrete data structure, whose generation has wide applications such as drug discovery and circuit design. Diffusion generative models, as an emerging research focus, have been applied to graph generation tasks. Overall, according to the space of states and time steps, diffusion generative models can be categorized into discrete-/continuous-state discrete-/continuous-time fash…

    Submitted 3 November, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

  49. arXiv:2405.00885  [pdf, other]

    cs.LG cs.NI eess.IV

    WHALE-FL: Wireless and Heterogeneity Aware Latency Efficient Federated Learning over Mobile Devices via Adaptive Subnetwork Scheduling

    Authors: Huai-an Su, Jiaxiang Geng, Liang Li, Xiaoqi Qin, Yanzhao Hou, Hao Wang, Xin Fu, Miao Pan

    Abstract: As a popular distributed learning paradigm, federated learning (FL) over mobile devices fosters numerous applications, while their practical deployment is hindered by participating devices' computing and communication heterogeneity. Some pioneering research efforts proposed to extract subnetworks from the global model, and assign as large a subnetwork as possible to the device for local training b…

    Submitted 27 February, 2025; v1 submitted 1 May, 2024; originally announced May 2024.

  50. arXiv:2403.15955  [pdf, other]

    cs.CV cs.AI

    Finding needles in a haystack: A Black-Box Approach to Invisible Watermark Detection

    Authors: Minzhou Pan, Zhenting Wang, Xin Dong, Vikash Sehwag, Lingjuan Lyu, Xue Lin

    Abstract: In this paper, we propose WaterMark Detection (WMD), the first invisible watermark detection method under a black-box and annotation-free setting. WMD is capable of detecting arbitrary watermarks within a given reference dataset using a clean non-watermarked dataset as a reference, without relying on specific decoding methods or prior knowledge of the watermarking techniques. We develop WMD using…

    Submitted 30 March, 2024; v1 submitted 23 March, 2024; originally announced March 2024.