
Showing 1–50 of 711 results for author: Zhang, A

Searching in archive cs. Search in all archives.
  1. arXiv:2505.09094  [pdf, ps, other]

    cs.HC

    PLanet: Formalizing Experimental Design

    Authors: London Bielicke, Anna Zhang, Shruti Tyagi, Emery Berger, Adam Chlipala, Eunice Jun

    Abstract: Carefully constructed experimental designs are essential for drawing valid, generalizable conclusions from scientific studies. Unfortunately, experimental design plans can be difficult to specify, communicate clearly, and relate to alternatives. In response, we introduce a grammar of experimental design that provides composable operators for constructing assignment procedures (e.g., Latin square).…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 14 pages, 4 tables, 6 figures, human-computer interaction, domain specific language, experimental design

  2. arXiv:2505.07147  [pdf, other]

    cs.CG math.MG

    All Polyhedral Manifolds are Connected by a 2-Step Refolding

    Authors: Lily Chung, Erik D. Demaine, Jenny Diomidova, Tonan Kamata, Jayson Lynch, Ryuhei Uehara, Hanyu Alice Zhang

    Abstract: We prove that, for any two polyhedral manifolds $\mathcal P, \mathcal Q$, there is a polyhedral manifold $\mathcal I$ such that $\mathcal P, \mathcal I$ share a common unfolding and $\mathcal I,\mathcal Q$ share a common unfolding. In other words, we can unfold $\mathcal P$, refold (glue) that unfolding into $\mathcal I$, unfold $\mathcal I$, and then refold into $\mathcal Q$. Furthermore, if…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 14 pages, 10 figures. Presented at JCDCG^3 2024. arXiv admin note: substantial text overlap with arXiv:2412.02174

  3. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng , et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed with a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  4. arXiv:2505.06537  [pdf, ps, other]

    cs.CV cs.AI

    ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images

    Authors: Xianghao Kong, Qiaosong Qi, Yuanbin Wang, Anyi Rao, Biaolong Chen, Aixi Zhang, Si Liu, Hao Jiang

    Abstract: Fashion video generation aims to synthesize temporally consistent videos from reference images of a designated character. Despite significant progress, existing diffusion-based methods only support a single reference image as input, severely limiting their capability to generate view-consistent fashion videos, especially when there are different patterns on the clothes from different perspectives.…

    Submitted 10 May, 2025; originally announced May 2025.

  5. arXiv:2505.04105  [pdf]

    eess.IV cs.CV

    MAISY: Motion-Aware Image SYnthesis for Medical Image Motion Correction

    Authors: Andrew Zhang, Hao Wang, Shuchang Ye, Michael Fulham, Jinman Kim

    Abstract: Patient motion during medical image acquisition causes blurring, ghosting, and distorts organs, which makes image interpretation challenging. Current state-of-the-art algorithms using Generative Adversarial Network (GAN)-based methods with their ability to learn the mappings between corrupted images and their ground truth via Structural Similarity Index Measure (SSIM) loss effectively generate mot…

    Submitted 8 May, 2025; v1 submitted 6 May, 2025; originally announced May 2025.

  6. arXiv:2505.03729  [pdf, other]

    cs.RO cs.CV

    Visual Imitation Enables Contextual Humanoid Control

    Authors: Arthur Allshire, Hongsuk Choi, Junyi Zhang, David McAllister, Anthony Zhang, Chung Min Kim, Trevor Darrell, Pieter Abbeel, Jitendra Malik, Angjoo Kanazawa

    Abstract: How can we teach humanoids to climb staircases and sit on chairs using the surrounding environment context? Arguably, the simplest way is to just show them-casually capture a human motion video and feed it to humanoids. We introduce VIDEOMIMIC, a real-to-sim-to-real pipeline that mines everyday videos, jointly reconstructs the humans and the environment, and produces whole-body control policies fo…

    Submitted 13 May, 2025; v1 submitted 6 May, 2025; originally announced May 2025.

    Comments: Project website: https://www.videomimic.net/

  7. arXiv:2505.03172  [pdf, ps, other]

    cs.LG cs.AI

    Null Counterfactual Factor Interactions for Goal-Conditioned Reinforcement Learning

    Authors: Caleb Chuck, Fan Feng, Carl Qi, Chang Shi, Siddhant Agarwal, Amy Zhang, Scott Niekum

    Abstract: Hindsight relabeling is a powerful tool for overcoming sparsity in goal-conditioned reinforcement learning (GCRL), especially in certain domains such as navigation and locomotion. However, hindsight relabeling can struggle in object-centric domains. For example, suppose that the goal space consists of a robotic arm pushing a particular target block to a goal location. In this case, hindsight relab…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Published at ICLR 2025

    Journal ref: The Thirteenth International Conference on Learning Representations. 2025

  8. arXiv:2505.00742  [pdf, other]

    cs.CV cs.AI eess.IV

    Zoomer: Adaptive Image Focus Optimization for Black-box MLLM

    Authors: Jiaxu Qian, Chendong Wang, Yifan Yang, Chaoyun Zhang, Huiqiang Jiang, Xufang Luo, Yu Kang, Qingwei Lin, Anlan Zhang, Shiqi Jiang, Ting Cao, Tianjun Mao, Suman Banerjee, Guyue Liu, Saravan Rajmohan, Dongmei Zhang, Yuqing Yang, Qi Zhang, Lili Qiu

    Abstract: Recent advancements in multimodal large language models (MLLMs) have broadened the scope of vision-language tasks, excelling in applications like image captioning and interactive question-answering. However, these models struggle with accurately processing visual data, particularly in tasks requiring precise object recognition and fine visual details. Stringent token limits often result in the omi…

    Submitted 29 April, 2025; originally announced May 2025.

  9. AlphaFuse: Learn ID Embeddings for Sequential Recommendation in Null Space of Language Embeddings

    Authors: Guoqing Hu, An Zhang, Shuo Liu, Zhibo Cai, Xun Yang, Xiang Wang

    Abstract: Recent advancements in sequential recommendation have underscored the potential of Large Language Models (LLMs) for enhancing item embeddings. However, existing approaches face three key limitations: 1) the degradation of the semantic space when high-dimensional language embeddings are mapped to lower-dimensional ID embeddings, 2) the underutilization of language embeddings, and 3) the reliance on…

    Submitted 29 April, 2025; v1 submitted 27 April, 2025; originally announced April 2025.

    Comments: Accepted by SIGIR'25

  10. arXiv:2504.18496  [pdf, other]

    cs.HC

    Facets, Taxonomies, and Syntheses: Navigating Structured Representations in LLM-Assisted Literature Review

    Authors: Raymond Fok, Joseph Chee Chang, Marissa Radensky, Pao Siangliulue, Jonathan Bragg, Amy X. Zhang, Daniel S. Weld

    Abstract: Comprehensive literature review requires synthesizing vast amounts of research -- a labor intensive and cognitively demanding process. Most prior work focuses either on helping researchers deeply understand a few papers (e.g., for triaging or reading), or retrieving from and visualizing a vast corpus. Deep analysis and synthesis of large paper collections (e.g., to produce a survey paper) is large…

    Submitted 25 April, 2025; originally announced April 2025.

  11. arXiv:2504.17441  [pdf, other]

    cs.CV

    Predict-Optimize-Distill: A Self-Improving Cycle for 4D Object Understanding

    Authors: Mingxuan Wu, Huang Huang, Justin Kerr, Chung Min Kim, Anthony Zhang, Brent Yi, Angjoo Kanazawa

    Abstract: Humans can resort to long-form inspection to build intuition on predicting the 3D configurations of unseen objects. The more we observe the object motion, the better we get at predicting its 3D state immediately. Existing systems either optimize underlying representations from multi-view observations or train a feed-forward predictor from supervised datasets. We introduce Predict-Optimize-Distill…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: See our website at: https://predict-optimize-distill.github.io/pod.github.io First two authors contributed equally

  12. arXiv:2504.16552  [pdf, other]

    cs.DC

    DTVM: Revolutionizing Smart Contract Execution with Determinism and Compatibility

    Authors: Wei Zhou, Changzheng Wei, Ying Yan, Wei Tang, Zhihao Chen, Xiong Xu, Xuebing Huang, Wengang Chen, Jie Zhang, Yang Chen, Xiaofu Zheng, Hanghang Wu, Shenglong Chen, Ermei Wang, Xiangfei Chen, Yang Yu, Meng Wu, Tao Zhu, Liwei Yuan, Feng Yu, Alex Zhang, Wei Wang, Ji Luo, Zhengyu He, Wenbiao Zhao

    Abstract: We introduce the DeTerministic Virtual Machine (DTVM) Stack, a next-generation smart contract execution framework designed to address critical performance, determinism, and ecosystem compatibility challenges in blockchain networks. Building upon WebAssembly (Wasm) while maintaining full Ethereum Virtual Machine (EVM) ABI compatibility, DTVM introduces a Deterministic Middle Intermediate Representa…

    Submitted 23 April, 2025; originally announced April 2025.

  13. arXiv:2504.15817  [pdf, other]

    cs.CR cs.AR

    EFFACT: A Highly Efficient Full-Stack FHE Acceleration Platform

    Authors: Yi Huang, Xinsheng Gong, Xiangyu Kong, Dibei Chen, Jianfeng Zhu, Wenping Zhu, Liangwei Li, Mingyu Gao, Shaojun Wei, Aoyang Zhang, Leibo Liu

    Abstract: Fully Homomorphic Encryption (FHE) is a set of powerful cryptographic schemes that allows computation to be performed directly on encrypted data with an unlimited depth. Despite FHE's promise in privacy-preserving computing, in most FHE schemes, ciphertext generally blows up thousands of times compared to the original message, and the massive amount of data load from off-chip memory for boot…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: Accepted by HPCA 2025

  14. arXiv:2504.15600  [pdf, other]

    cs.RO eess.SY

    Research on Navigation Methods Based on LLMs

    Authors: Anlong Zhang, Jianmin Ji

    Abstract: In recent years, the field of indoor navigation has witnessed groundbreaking advancements through the integration of Large Language Models (LLMs). Traditional navigation approaches relying on pre-built maps or reinforcement learning exhibit limitations such as poor generalization and limited adaptability to dynamic environments. In contrast, LLMs offer a novel paradigm for complex indoor navigatio…

    Submitted 22 April, 2025; originally announced April 2025.

  15. arXiv:2504.14391  [pdf, other]

    cs.CV

    How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?

    Authors: Rahul Thapa, Andrew Li, Qingyang Wu, Bryan He, Yuki Sahashi, Christina Binder, Angela Zhang, Ben Athiwaratkun, Shuaiwen Leon Song, David Ouyang, James Zou

    Abstract: Publicly available biomedical videos, such as those on YouTube, serve as valuable educational resources for medical students. Unlike standard machine learning datasets, these videos are designed for human learners, often mixing medical imagery with narration, explanatory diagrams, and contextual framing. In this work, we investigate whether such pedagogically rich, yet non-standardized and heterog…

    Submitted 19 April, 2025; originally announced April 2025.

  16. arXiv:2504.14076  [pdf, other]

    cs.SD cs.LG eess.AS

    Transformation of audio embeddings into interpretable, concept-based representations

    Authors: Alice Zhang, Edison Thomaz, Lie Lu

    Abstract: Advancements in audio neural networks have established state-of-the-art results on downstream audio tasks. However, the black-box structure of these models makes it difficult to interpret the information encoded in their internal audio representations. In this work, we explore the semantic interpretability of audio embeddings extracted from these neural networks by leveraging CLAP, a contrastive l…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: Accepted to International Joint Conference on Neural Networks (IJCNN) 2025

  17. arXiv:2504.13392  [pdf, ps, other]

    cs.CV cs.HC

    POET: Supporting Prompting Creativity and Personalization with Automated Expansion of Text-to-Image Generation

    Authors: Evans Xu Han, Alice Qian Zhang, Hong Shen, Haiyi Zhu, Paul Pu Liang, Jane Hsieh

    Abstract: State-of-the-art visual generative AI tools hold immense potential to assist users in the early ideation stages of creative tasks -- offering the ability to generate (rather than search for) novel and unprecedented (instead of existing) images of considerable quality that also adhere to boundless combinations of user specifications. However, many large-scale text-to-image systems are designed for…

    Submitted 17 April, 2025; originally announced April 2025.

  18. arXiv:2504.13368  [pdf, other]

    cs.LG cs.AI

    An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning

    Authors: Haoran Xu, Shuozhe Li, Harshit Sikchi, Scott Niekum, Amy Zhang

    Abstract: We introduce Iterative Dual Reinforcement Learning (IDRL), a new method that takes an optimal discriminator-weighted imitation view of solving RL. Our method is motivated by a simple experiment in which we find training a discriminator using the offline dataset plus an additional expert dataset and then performing discriminator-weighted behavior cloning gives strong results on various types of dat…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: ICLR 2025

  19. arXiv:2504.13131  [pdf, other]

    eess.IV cs.AI cs.CV

    NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results

    Authors: Xin Li, Kun Yuan, Bingchen Li, Fengbin Guan, Yizhen Shao, Zihao Yu, Xijun Wang, Yiting Lu, Wei Luo, Suhang Yao, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Yabin Zhang, Ao-Xiang Zhang, Tianwu Zhi, Jianzhao Liu, Yang Li, Jingwen Xu, Yiting Liao, Yushen Zuo, Mingyang Wu, Renjie Li, Shengyun Zhong , et al. (88 additional authors not shown)

    Abstract: This paper presents a review for the NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement. The challenge comprises two tracks: (i) Efficient Video Quality Assessment (KVQ), and (ii) Diffusion-based Image Super-Resolution (KwaiSR). Track 1 aims to advance the development of lightweight and efficient video quality assessment (VQA) models, with an emphasis on eliminating re…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of NTIRE 2025; Methods from 18 Teams; Accepted by CVPR Workshop; 21 pages

  20. arXiv:2504.10961  [pdf]

    cs.HC cs.AI

    Evaluating Trust in AI, Human, and Co-produced Feedback Among Undergraduate Students

    Authors: Audrey Zhang, Yifei Gao, Wannapon Suraworachet, Tanya Nazaretsky, Mutlu Cukurova

    Abstract: As generative AI transforms educational feedback practices, understanding students' perceptions of different feedback providers becomes crucial for effective implementation. This study addresses a critical gap by comparing undergraduate students' trust in AI-generated, human-created, and human-AI co-produced feedback, informing how institutions can adapt feedback practices in this new era. Through…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: 35 pages, 6 figures. Under review at Assessment and Evaluation in Higher Education

  21. arXiv:2504.09466  [pdf, other]

    cs.CR cs.CL

    AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender

    Authors: Weixiang Zhao, Jiahe Guo, Yulin Hu, Yang Deng, An Zhang, Xingyu Sui, Xinyang Han, Yanyan Zhao, Bing Qin, Tat-Seng Chua, Ting Liu

    Abstract: Despite extensive efforts in safety alignment, large language models (LLMs) remain vulnerable to jailbreak attacks. Activation steering offers a training-free defense method but relies on fixed steering coefficients, resulting in suboptimal protection and increased false rejections of benign inputs. To address this, we propose AdaSteer, an adaptive activation steering method that dynamically adjus…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: 17 pages, 6 figures, 9 tables

  22. arXiv:2504.08813  [pdf, other]

    cs.LG cs.AI cs.CR

    SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models

    Authors: Junfeng Fang, Yukai Wang, Ruipeng Wang, Zijun Yao, Kun Wang, An Zhang, Xiang Wang, Tat-Seng Chua

    Abstract: The rapid advancement of multi-modal large reasoning models (MLRMs) -- enhanced versions of multimodal language models (MLLMs) equipped with reasoning capabilities -- has revolutionized diverse applications. However, their safety implications remain underexplored. While prior work has exposed critical vulnerabilities in unimodal reasoning models, MLRMs introduce distinct risks from cross-modal rea…

    Submitted 9 April, 2025; originally announced April 2025.

  23. arXiv:2504.07896  [pdf, other]

    cs.LG cs.AI cs.RO

    Fast Adaptation with Behavioral Foundation Models

    Authors: Harshit Sikchi, Andrea Tirinzoni, Ahmed Touati, Yingchen Xu, Anssi Kanervisto, Scott Niekum, Amy Zhang, Alessandro Lazaric, Matteo Pirotta

    Abstract: Unsupervised zero-shot reinforcement learning (RL) has emerged as a powerful paradigm for pretraining behavioral foundation models (BFMs), enabling agents to solve a wide range of downstream tasks specified via reward functions in a zero-shot fashion, i.e., without additional test-time learning or planning. This is achieved by learning self-supervised task embeddings alongside corresponding near-o…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: 25 pages

  24. arXiv:2504.06935  [pdf]

    cs.LG

    ASRL:A robust loss function with potential for development

    Authors: Chenyu Hui, Anran Zhang, Xintong Li

    Abstract: In this article, we proposed a partition-wise robust loss function based on the previous robust loss function. The characteristics of this loss function are that it achieves high robustness and a wide range of applicability through partition-wise design and adaptive parameter adjustment. Finally, the advantages and development potential of this loss function were verified by applying this loss fun…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: five pages and three figures

  25. arXiv:2504.06020  [pdf, other]

    cs.AI cs.CL cs.LG

    Information-Theoretic Reward Decomposition for Generalizable RLHF

    Authors: Liyuan Mao, Haoran Xu, Amy Zhang, Weinan Zhang, Chenjia Bai

    Abstract: A generalizable reward model is crucial in Reinforcement Learning from Human Feedback (RLHF) as it enables correctly evaluating unseen prompt-response pairs. However, existing reward models lack this ability, as they are typically trained by increasing the reward gap between chosen and rejected responses, while overlooking the prompts that the responses are conditioned on. Consequently, when the t…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: Work done during internships at Institute of Artificial Intelligence (TeleAI), China Telecom

  26. arXiv:2504.05781  [pdf, other]

    cs.HC cs.CY

    Building Proactive and Instant-Reactive Safety Designs to Address Harassment in Social Virtual Reality

    Authors: Zhehui Liao, Hanwen Zhao, Ayush Kulkarni, Shaan Singh Chattrath, Amy X. Zhang

    Abstract: Social Virtual Reality (VR) games offer immersive socialization experiences but pose significant challenges of harassment. Common solutions, such as reporting and moderation, address harassment after it happens but fail to prevent or stop harassment in the moment. In this study, we explore and design proactive and instant-reactive safety designs to mitigate harassment in social VR. Proactive desig…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: 37 pages, 11 figures

  27. arXiv:2504.05419  [pdf, other]

    cs.AI cs.CL

    Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification

    Authors: Anqi Zhang, Yulin Chen, Jane Pan, Chen Zhao, Aurojit Panda, Jinyang Li, He He

    Abstract: Reasoning models have achieved remarkable performance on tasks like math and logical reasoning thanks to their ability to search during reasoning. However, they still suffer from overthinking, often performing unnecessary reasoning steps even after reaching the correct answer. This raises the question: can models evaluate the correctness of their intermediate answers during reasoning? In this work…

    Submitted 7 April, 2025; originally announced April 2025.

  28. arXiv:2504.05408  [pdf, other]

    cs.CR cs.AI cs.CY

    Frontier AI's Impact on the Cybersecurity Landscape

    Authors: Wenbo Guo, Yujin Potter, Tianneng Shi, Zhun Wang, Andy Zhang, Dawn Song

    Abstract: As frontier AI advances rapidly, understanding its impact on cybersecurity and inherent risks is essential to ensuring safe AI evolution (e.g., guiding risk mitigation and informing policymakers). While some studies review AI applications in cybersecurity, none of them comprehensively discuss AI's future impacts or provide concrete recommendations for navigating its safe and secure usage. This pap…

    Submitted 14 April, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

  29. arXiv:2504.04156  [pdf, other]

    cs.CV

    CoMBO: Conflict Mitigation via Branched Optimization for Class Incremental Segmentation

    Authors: Kai Fang, Anqi Zhang, Guangyu Gao, Jianbo Jiao, Chi Harold Liu, Yunchao Wei

    Abstract: Effective Class Incremental Segmentation (CIS) requires simultaneously mitigating catastrophic forgetting and ensuring sufficient plasticity to integrate new classes. The inherent conflict above often leads to a back-and-forth, which turns the objective into finding the balance between the performance of previous (old) and incremental (new) classes. To address this conflict, we introduce a novel a…

    Submitted 5 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR 2025

  30. arXiv:2504.01054  [pdf]

    cs.CY cs.SE

    Open, Small, Rigmarole -- Evaluating Llama 3.2 3B's Feedback for Programming Exercises

    Authors: Imen Azaiz, Natalie Kiesler, Sven Strickroth, Anni Zhang

    Abstract: Large Language Models (LLMs) have been subject to extensive research in the past few years. This is particularly true for the potential of LLMs to generate formative programming feedback for novice learners at university. In contrast to Generative AI (GenAI) tools based on LLMs, such as GPT, smaller and open models have received much less attention. Yet, they offer several benefits, as educators c…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: accepted to the International Journal of Engineering Pedagogy (iJEP; eISSN: 2192-4880)

  31. arXiv:2503.22743  [pdf, other]

    cs.LG

    Adaptive State-Space Mamba for Real-Time Sensor Data Anomaly Detection

    Authors: Alice Zhang, Chao Li

    Abstract: State-space modeling has emerged as a powerful paradigm for sequence analysis in various tasks such as natural language processing, time-series forecasting, and signal processing. In this work, we propose an Adaptive State-Space Mamba (ASSM) framework for real-time sensor data anomaly detection. While state-space models have been previously employed for image processing application…

    Submitted 26 March, 2025; originally announced March 2025.

  32. arXiv:2503.22140  [pdf, other]

    eess.IV cs.CV eess.SP

    Score-Based Turbo Message Passing for Plug-and-Play Compressive Image Recovery

    Authors: Chang Cai, Xiaojun Yuan, Ying-Jun Angela Zhang

    Abstract: Message passing algorithms have been tailored for compressive imaging applications by plugging in different types of off-the-shelf image denoisers. These off-the-shelf denoisers mostly rely on some generic or hand-crafted priors for denoising. Due to their insufficient accuracy in capturing the true image prior, these methods often fail to produce satisfactory results, especially in largely underd…

    Submitted 28 March, 2025; originally announced March 2025.

  33. arXiv:2503.22116  [pdf, ps, other]

    cs.CY cs.HC

    Effective Automation to Support the Human Infrastructure in AI Red Teaming

    Authors: Alice Qian Zhang, Jina Suh, Mary L. Gray, Hong Shen

    Abstract: As artificial intelligence (AI) systems become increasingly embedded in critical societal functions, the need for robust red teaming methodologies continues to grow. In this forum piece, we examine emerging approaches to automating AI red teaming, with a particular focus on how the application of automated methods affects human-driven efforts. We discuss the role of labor in automated red teaming…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: This piece has been accepted to the ACM Interactions Publication Tech Labor Forum For August 2025

  34. arXiv:2503.21893  [pdf, other]

    cs.CV cs.AI cs.LG

    Exponentially Weighted Instance-Aware Repeat Factor Sampling for Long-Tailed Object Detection Model Training in Unmanned Aerial Vehicles Surveillance Scenarios

    Authors: Taufiq Ahmed, Abhishek Kumar, Constantino Álvarez Casado, Anlan Zhang, Tuomo Hänninen, Lauri Loven, Miguel Bordallo López, Sasu Tarkoma

    Abstract: Object detection models often struggle with class imbalance, where rare categories appear significantly less frequently than common ones. Existing sampling-based rebalancing strategies, such as Repeat Factor Sampling (RFS) and Instance-Aware Repeat Factor Sampling (IRFS), mitigate this issue by adjusting sample frequencies based on image and instance counts. However, these methods are based on lin…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: 6 pages, 2 figures, 9 tables, 6 formulas, conference paper

  35. arXiv:2503.21018  [pdf, other]

    cs.LG

    Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets

    Authors: Alexander Levine, Peter Stone, Amy Zhang

    Abstract: While sequential decision-making environments often involve high-dimensional observations, not all features of these observations are relevant for control. In particular, the observation space may capture factors of the environment which are not controllable by the agent, but which add complexity to the observation space. The need to ignore these "noise" features in order to operate in a tractably…

    Submitted 26 March, 2025; originally announced March 2025.

  36. arXiv:2503.20666  [pdf, other]

    cs.HC cs.CL

    TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews

    Authors: Huimin Xu, Seungjun Yi, Terence Lim, Jiawei Xu, Andrew Well, Carlos Mery, Aidong Zhang, Yuji Zhang, Heng Ji, Keshav Pingali, Yan Leng, Ying Ding

    Abstract: Thematic analysis (TA) is a widely used qualitative approach for uncovering latent meanings in unstructured text data. TA provides valuable insights in healthcare but is resource-intensive. Large Language Models (LLMs) have been introduced to perform TA, yet their applications in healthcare remain unexplored. Here, we propose TAMA: A Human-AI Collaborative Thematic Analysis framework using Multi-A…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: Submitted to the American Medical Informatics Association (AMIA) 2025 Annual Symposium, 10 pages

  37. arXiv:2503.20540  [pdf, other]

    cs.CV

    Beyond Intermediate States: Explaining Visual Redundancy through Language

    Authors: Dingchen Yang, Bowen Cao, Anran Zhang, Weibo Gu, Winston Hu, Guang Chen

    Abstract: Multi-modal Large Language Models (MLLMs) often process thousands of visual tokens, which consume a significant portion of the context window and impose a substantial computational burden. Prior work has empirically explored visual token pruning methods based on MLLMs' intermediate states (e.g., attention scores). However, they have limitations in precisely defining visual redundancy due to their in…

    Submitted 26 March, 2025; originally announced March 2025.

  38. arXiv:2503.20297  [pdf, other]

    cs.CV

    Traversing Distortion-Perception Tradeoff using a Single Score-Based Generative Model

    Authors: Yuhan Wang, Suzhi Bi, Ying-Jun Angela Zhang, Xiaojun Yuan

    Abstract: The distortion-perception (DP) tradeoff reveals a fundamental conflict between distortion metrics (e.g., MSE and PSNR) and perceptual quality. Recent research has increasingly concentrated on evaluating denoising algorithms within the DP framework. However, existing algorithms either prioritize perceptual quality by sacrificing acceptable distortion, or focus on minimizing MSE for faithful restora…

    Submitted 3 April, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

    Comments: Accepted by IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025

  39. arXiv:2503.17017  [pdf, other]

    cs.LG cs.CV

    Specifying What You Know or Not for Multi-Label Class-Incremental Learning

    Authors: Aoting Zhang, Dongbao Yang, Chang Liu, Xiaopeng Hong, Yu Zhou

    Abstract: Existing class incremental learning is mainly designed for single-label classification task, which is ill-equipped for multi-label scenarios due to the inherent contradiction of learning objectives for samples with incomplete labels. We argue that the main challenge to overcome this contradiction in multi-label class-incremental learning (MLCIL) lies in the model's inability to clearly distinguish…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: Accepted by AAAI 2025

  40. arXiv:2503.16465  [pdf, other]

    cs.HC cs.AI

    OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents

    Authors: Pengzhou Cheng, Zheng Wu, Zongru Wu, Aston Zhang, Zhuosheng Zhang, Gongshen Liu

    Abstract: Autonomous graphical user interface (GUI) agents powered by multimodal large language models have shown great promise. However, a critical yet underexplored issue persists: over-execution, where the agent executes tasks in a fully autonomous way, without adequate assessment of its action confidence to compromise an adaptive human-agent collaboration. This poses substantial risks in complex scenari…

    Submitted 26 February, 2025; originally announced March 2025.

    Comments: 25 pages, 24 figures, 11 tables

  41. arXiv:2503.15295  [pdf, other]

    cs.CV

    DCA: Dividing and Conquering Amnesia in Incremental Object Detection

    Authors: Aoting Zhang, Dongbao Yang, Chang Liu, Xiaopeng Hong, Miao Shang, Yu Zhou

    Abstract: Incremental object detection (IOD) aims to cultivate an object detector that can continuously localize and recognize novel classes while preserving its performance on previous classes. Existing methods achieve certain success by improving knowledge distillation and exemplar replay for transformer-based detection frameworks, but the intrinsic forgetting mechanisms remain underexplored. In this pape…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: Accepted by AAAI 2025

  42. arXiv:2503.14734  [pdf, other]

    cs.RO cs.AI cs.LG

    GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

    Authors: NVIDIA, :, Johan Bjorck, Fernando Castañeda, Nikita Cherniadev, Xingye Da, Runyu Ding, Linxi "Jim" Fan, Yu Fang, Dieter Fox, Fengyuan Hu, Spencer Huang, Joel Jang, Zhenyu Jiang, Jan Kautz, Kaushil Kundalia, Lawrence Lao, Zhiqi Li, Zongyu Lin, Kevin Lin, Guilin Liu, Edith Llontop, Loic Magne, Ajay Mandlekar, Avnish Narayan , et al. (18 additional authors not shown)

    Abstract: General-purpose robots need a versatile body and an intelligent mind. Recent advancements in humanoid robots have shown great promise as a hardware platform for building generalist autonomy in the human world. A robot foundation model, trained on massive and diverse data sources, is essential for enabling the robots to reason about novel situations, robustly handle real-world variability, and rapi…

    Submitted 26 March, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

    Comments: Authors are listed alphabetically. Project leads are Linxi "Jim" Fan and Yuke Zhu. For more information, see https://developer.nvidia.com/isaac/gr00t

  43. arXiv:2503.11355  [pdf, ps, other

    math.NA cs.MS cs.SE

    TypedMatrices.jl: An Extensible and Type-Based Matrix Collection for Julia

    Authors: Anzhi Zhang, Massimiliano Fasi

    Abstract: TypedMatrices.jl is a Julia package to organize test matrices. By default, the package comes with a number of built-in matrices and interfaces to help users select test cases based on their properties. The package is designed to be extensible, allowing users to define their own matrix types. We discuss the design and implementation of the package and demonstrate its usage with a number of examples…

    Submitted 14 March, 2025; originally announced March 2025.

  44. arXiv:2503.07135  [pdf, other]

    cs.RO cs.CV

    VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation

    Authors: Hanzhi Chen, Boyang Sun, Anran Zhang, Marc Pollefeys, Stefan Leutenegger

    Abstract: Future robots are envisioned as versatile systems capable of performing a variety of household tasks. The big question remains, how can we bridge the embodiment gap while minimizing physical robot learning, which fundamentally does not scale well. We argue that learning from in-the-wild human videos offers a promising solution for robotic manipulation tasks, as vast amounts of relevant data alread…

    Submitted 27 March, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR 2025

  45. arXiv:2503.06002  [pdf, ps, other

    cs.HC

    Knowledge Workers' Perspectives on AI Training for Responsible AI Use

    Authors: Angie Zhang, Min Kyung Lee

    Abstract: AI expansion has accelerated workplace adoption of new technologies. Yet, it is unclear whether and how knowledge workers are supported and trained to safely use AI. Inadequate training may lead to unrealized benefits if workers abandon tools, or perpetuate biases if workers misinterpret AI-based outcomes. In a workshop with 39 workers from 26 countries specializing in human resources, labor law,…

    Submitted 7 March, 2025; originally announced March 2025.

    Comments: Upcoming at CHI 2025

    ACM Class: H.5; K.5.0

  46. arXiv:2503.03921  [pdf, other]

    cs.RO cs.AI cs.CV

    CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance

    Authors: Arthur Zhang, Harshit Sikchi, Amy Zhang, Joydeep Biswas

    Abstract: We address the long-horizon mapless navigation problem: enabling robots to traverse novel environments without relying on high-definition maps or precise waypoints that specify exactly where to navigate. Achieving this requires overcoming two major challenges -- learning robust, generalizable perceptual representations of the environment without pre-enumerating all possible navigation factors and…

    Submitted 5 March, 2025; originally announced March 2025.

    Comments: 19 pages, 10 figures, 5 tables

  47. arXiv:2502.20968  [pdf, other]

    cs.CL

    Beware of Your Po! Measuring and Mitigating AI Safety Risks in Role-Play Fine-Tuning of LLMs

    Authors: Weixiang Zhao, Yulin Hu, Yang Deng, Jiahe Guo, Xingyu Sui, Xinyang Han, An Zhang, Yanyan Zhao, Bing Qin, Tat-Seng Chua, Ting Liu

    Abstract: Role-playing enables large language models (LLMs) to engage users in immersive and personalized interactions, but it also introduces significant safety risks. Existing role-play fine-tuning techniques improve role adaptability but may degrade safety performance, particularly for villainous characters. In this work, we conduct the first comprehensive assessment of role-play fine-tuning risks by tra…

    Submitted 28 February, 2025; originally announced February 2025.

    Comments: 25 pages, 10 figures, 13 tables

  48. arXiv:2502.19041  [pdf, other]

    cs.CR

    Beyond Surface-Level Patterns: An Essence-Driven Defense Framework Against Jailbreak Attacks in LLMs

    Authors: Shiyu Xiang, Ansen Zhang, Yanfei Cao, Yang Fan, Ronghao Chen

    Abstract: Although Aligned Large Language Models (LLMs) are trained to refuse harmful requests, they remain vulnerable to jailbreak attacks. Unfortunately, existing methods often focus on surface-level patterns, overlooking the deeper attack essences. As a result, defenses fail when attack prompts change, even though the underlying "attack essence" remains the same. To address this issue, we introduce EDDF,…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: 15 pages, 12 figures

  49. arXiv:2502.16614  [pdf, other]

    cs.CL

    CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

    Authors: Alexander Zhang, Marcus Dong, Jiaheng Liu, Wei Zhang, Yejie Wang, Jian Yang, Ge Zhang, Tianyu Liu, Zhongyuan Peng, Yingshui Tan, Yuanxing Zhang, Zhexu Wang, Weixun Wang, Yancheng He, Ken Deng, Wangchunshu Zhou, Wenhao Huang, Zhaoxiang Zhang

    Abstract: The critique capacity of Large Language Models (LLMs) is essential for reasoning abilities, which can provide necessary suggestions (e.g., detailed analysis and constructive feedback). Therefore, how to evaluate the critique capacity of LLMs has drawn great attention and several critique benchmarks have been proposed. However, existing critique benchmarks usually have the following limitations: (1…

    Submitted 23 February, 2025; originally announced February 2025.

  50. arXiv:2502.16043  [pdf, other]

    cs.CY

    African Data Ethics: A Discursive Framework for Black Decolonial Data Science

    Authors: Teanna Barrett, Chinasa T. Okolo, B. Biira, Eman Sherif, Amy X. Zhang, Leilani Battle

    Abstract: Most artificial intelligence (AI) and other data-driven systems (DDS) are created by and for the benefit of global superpowers. The shift towards pluralism in global data ethics acknowledges the importance of including perspectives from the Global Majority to develop responsible data science (RDS) practices that mitigate systemic harms inherent to the current data science ecosystem. African practi…

    Submitted 26 February, 2025; v1 submitted 21 February, 2025; originally announced February 2025.