Showing 1–50 of 67 results for author: Mei, K

  1. arXiv:2506.12204  [pdf, ps, other]

    cs.LG cs.AI cs.OS

    Semantic Scheduling for LLM Inference

    Authors: Wenyue Hua, Dujian Ding, Yile Gu, Yujie Ren, Kai Mei, Minghua Ma, William Yang Wang

    Abstract: Conventional operating system scheduling algorithms are largely content-ignorant, making decisions based on factors such as latency or fairness without considering the actual intents or semantics of processes. Consequently, these algorithms often do not prioritize tasks that require urgent attention or carry higher importance, such as in emergency management scenarios. However, recent advances in…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: 18 pages, 3 figures

  2. arXiv:2506.08846  [pdf, ps, other]

    cs.CY cs.CL cs.SD eess.AS

    Addressing Pitfalls in Auditing Practices of Automatic Speech Recognition Technologies: A Case Study of People with Aphasia

    Authors: Katelyn Xiaoying Mei, Anna Seo Gyeong Choi, Hilke Schellmann, Mona Sloane, Allison Koenecke

    Abstract: Automatic Speech Recognition (ASR) has transformed daily tasks from video transcription to workplace hiring. ASR systems' growing use warrants robust and standardized auditing approaches to ensure automated transcriptions of high and equitable quality. This is especially critical for people with speech and language disorders (such as aphasia) who may disproportionately depend on ASR systems to nav…

    Submitted 10 June, 2025; originally announced June 2025.

  3. arXiv:2505.21905  [pdf, ps, other]

    cs.CV cs.MM

    Reference-Guided Identity Preserving Face Restoration

    Authors: Mo Zhou, Keren Ye, Viraj Shah, Kangfu Mei, Mauricio Delbracio, Peyman Milanfar, Vishal M. Patel, Hossein Talebi

    Abstract: Preserving face identity is a critical yet persistent challenge in diffusion-based image restoration. While reference faces offer a path forward, existing reference-based methods often fail to fully exploit their potential. This paper introduces a novel approach that maximizes reference face utility for improved face restoration and identity preservation. Our method makes three key contributions:…

    Submitted 27 May, 2025; originally announced May 2025.

  4. arXiv:2505.18829  [pdf, ps, other]

    cs.AI cs.HC cs.OS

    LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS

    Authors: Kai Mei, Xi Zhu, Hang Gao, Shuhang Lin, Yongfeng Zhang

    Abstract: We present AIOS 1.0, a novel platform designed to advance computer-use agent (CUA) capabilities through environmental contextualization. While existing approaches primarily focus on building more powerful agent frameworks or enhancing agent models, we identify a fundamental limitation: the semantic disconnect between how language models understand the world and how computer interfaces are structur…

    Submitted 24 May, 2025; originally announced May 2025.

  5. arXiv:2505.01537  [pdf, other]

    cs.HC

    Passing the Buck to AI: How Individuals' Decision-Making Patterns Affect Reliance on AI

    Authors: Katelyn Xiaoying Mei, Rock Yuren Pang, Alex Lyford, Lucy Lu Wang, Katharina Reinecke

    Abstract: Psychological research has identified different patterns individuals have while making decisions, such as vigilance (making decisions after thorough information gathering), hypervigilance (rushed and anxious decision-making), and buckpassing (deferring decisions to others). We examine whether these decision-making patterns shape people's likelihood of seeking out or relying on AI. In an online exp…

    Submitted 2 May, 2025; originally announced May 2025.

  6. arXiv:2504.14689  [pdf, ps, other]

    cs.HC

    Designing AI Systems that Augment Human Performed vs. Demonstrated Critical Thinking

    Authors: Katelyn Xiaoying Mei, Nic Weber

    Abstract: The recent rapid advancement of LLM-based AI systems has accelerated our search and production of information. While the advantages brought by these systems seemingly improve the performance or efficiency of human activities, they do not necessarily enhance human capabilities. Recent research has started to examine the impact of generative AI on individuals' cognitive abilities, especially critica…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: Presented at the 2025 ACM Workshop on Human-AI Interaction for Augmented Reasoning

    Report number: CHI25-WS-AUGMENTED-REASONING

    Journal ref: Proceedings of the 2025 ACM CHI Workshop on Human-AI Interaction for Augmented Reasoning

  7. arXiv:2503.14503  [pdf, other]

    cs.CV cs.AI cs.LG

    The Power of Context: How Multimodality Improves Image Super-Resolution

    Authors: Kangfu Mei, Hossein Talebi, Mojtaba Ardakani, Vishal M. Patel, Peyman Milanfar, Mauricio Delbracio

    Abstract: Single-image super-resolution (SISR) remains challenging due to the inherent difficulty of recovering fine-grained details and preserving perceptual quality from low-resolution inputs. Existing methods often rely on limited image priors, leading to suboptimal results. We propose a novel approach that leverages the rich contextual information available in multiple modalities -- including depth, seg…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: accepted by CVPR2025

  8. arXiv:2503.11444  [pdf, other]

    cs.MA cs.AI cs.CL cs.OS

    Cerebrum (AIOS SDK): A Platform for Agent Development, Deployment, Distribution, and Discovery

    Authors: Balaji Rama, Kai Mei, Yongfeng Zhang

    Abstract: Autonomous LLM-based agents have emerged as a powerful paradigm for complex task execution, yet the field lacks standardized tools for development, deployment, distribution and discovery of agents. We present Cerebrum, an Agent SDK for AIOS that addresses this gap through three key components: (1) a comprehensive SDK featuring a modular four-layer architecture for agent development, encompassing L…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: Accepted to the 2025 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) - System Demonstration Track

  9. arXiv:2503.01725  [pdf, other]

    cs.CV

    HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization

    Authors: Zitang Zhou, Ke Mei, Yu Lu, Tianyi Wang, Fengyun Rao

    Abstract: This paper introduces HarmonySet, a comprehensive dataset designed to advance video-music understanding. HarmonySet consists of 48,328 diverse video-music pairs, annotated with detailed information on rhythmic synchronization, emotional alignment, thematic coherence, and cultural relevance. We propose a multi-step human-machine collaborative framework for efficient annotation, combining human insi…

    Submitted 4 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR 2025. Project page: https://harmonyset.github.io/

  10. arXiv:2502.20576  [pdf, ps, other]

    cs.DB cs.CL

    OmniRouter: Budget and Performance Controllable Multi-LLM Routing

    Authors: Kai Mei, Wujiang Xu, Shuhang Lin, Yongfeng Zhang

    Abstract: Large language models (LLMs) deliver superior performance but require substantial computational resources and operate with relatively low efficiency, while smaller models can efficiently handle simpler tasks with fewer resources. LLM routing is a crucial paradigm that dynamically selects the most suitable large language models from a pool of candidates to process diverse inputs, ensuring optimal r…

    Submitted 31 May, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

  11. arXiv:2502.14662  [pdf, ps, other]

    cs.CL cs.IR

    iAgent: LLM Agent as a Shield between User and Recommender Systems

    Authors: Wujiang Xu, Yunxiao Shi, Zujie Liang, Xuying Ning, Kai Mei, Kun Wang, Xi Zhu, Min Xu, Yongfeng Zhang

    Abstract: Traditional recommender systems usually take the user-platform paradigm, where users are directly exposed under the control of the platform's recommendation algorithms. However, the defect of recommendation algorithms may put users in very vulnerable positions under this paradigm. First, many sophisticated models are often designed with commercial objectives in mind, focusing on the platform's ben…

    Submitted 29 May, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: Findings of ACL 2025 and WWW2025@HCRS

  12. arXiv:2502.12110  [pdf, ps, other]

    cs.CL cs.HC

    A-MEM: Agentic Memory for LLM Agents

    Authors: Wujiang Xu, Kai Mei, Hang Gao, Juntao Tan, Zujie Liang, Yongfeng Zhang

    Abstract: While large language model (LLM) agents can effectively use external tools for complex real-world tasks, they require memory systems to leverage historical experiences. Current memory systems enable basic storage and retrieval but lack sophisticated memory organization, despite recent attempts to incorporate graph databases. Moreover, these systems' fixed operations and structures limit their adap…

    Submitted 2 June, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  13. arXiv:2502.01563  [pdf, other]

    cs.CL

    Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding

    Authors: Mingyu Jin, Kai Mei, Wujiang Xu, Mingjie Sun, Ruixiang Tang, Mengnan Du, Zirui Liu, Yongfeng Zhang

    Abstract: Large language models (LLMs) have achieved remarkable success in contextual knowledge understanding. In this paper, we show that these concentrated massive values consistently emerge in specific regions of attention queries (Q) and keys (K) while not having such patterns in values (V) in various modern transformer-based LLMs (Q, K, and V mean the representations output by the query, key, and value…

    Submitted 20 May, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

    Comments: International Conference on Machine Learning (ICML 2025)

  14. arXiv:2412.02054  [pdf, other]

    cs.CV

    Redundant Queries in DETR-Based 3D Detection Methods: Unnecessary and Prunable

    Authors: Lizhen Xu, Shanmin Pang, Wenzhao Qiu, Zehao Wu, Xiuxiu Bai, Kuizhi Mei, Jianru Xue

    Abstract: Query-based models are extensively used in 3D object detection tasks, with a wide range of pre-trained checkpoints readily available online. However, despite their popularity, these models often require an excessive number of object queries, far surpassing the actual number of objects to detect. The redundant queries result in unnecessary computational and memory costs. In this paper, we find that…

    Submitted 2 December, 2024; originally announced December 2024.

    Comments: 13 pages, 5 figures

  15. arXiv:2410.11843  [pdf, other]

    cs.HC cs.AI cs.DB cs.LG

    From Commands to Prompts: LLM-based Semantic File System for AIOS

    Authors: Zeru Shi, Kai Mei, Mingyu Jin, Yongye Su, Chaoji Zuo, Wenyue Hua, Wujiang Xu, Yujie Ren, Zirui Liu, Mengnan Du, Dong Deng, Yongfeng Zhang

    Abstract: Large language models (LLMs) have demonstrated significant potential in the development of intelligent applications and systems such as LLM-based agents and agent operating systems (AIOS). However, when these applications and systems interact with the underlying file system, the file system still remains the traditional paradigm: reliant on manual navigation through precise commands. This paradigm…

    Submitted 18 March, 2025; v1 submitted 23 September, 2024; originally announced October 2024.

    Comments: Accepted by International Conference on Learning Representations 2025 (ICLR 2025)

  16. arXiv:2410.02644  [pdf, ps, other]

    cs.CR cs.AI

    Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

    Authors: Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, Yongfeng Zhang

    Abstract: Although LLM-based agents, powered by Large Language Models (LLMs), can use external tools and memory mechanisms to solve complex real-world tasks, they may also introduce critical security vulnerabilities. However, the existing literature does not comprehensively evaluate attacks and defenses against LLM-based agents. To address this, we introduce Agent Security Bench (ASB), a comprehensive frame…

    Submitted 29 May, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

    Comments: Accepted by ICLR 2025

  17. arXiv:2407.12821  [pdf, other]

    cs.CL cs.AI cs.LG

    AutoFlow: Automated Workflow Generation for Large Language Model Agents

    Authors: Zelong Li, Shuyuan Xu, Kai Mei, Wenyue Hua, Balaji Rama, Om Raheja, Hao Wang, He Zhu, Yongfeng Zhang

    Abstract: Recent advancements in Large Language Models (LLMs) have shown significant progress in understanding complex natural language. One important application of LLM is LLM-based AI Agent, which leverages the ability of LLM as well as external tools for complex-task solving. To make sure LLM Agents follow an effective and reliable procedure to solve the given task, manually designed workflows are usuall…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Open source code available at https://github.com/agiresearch/AutoFlow

  18. arXiv:2406.08266  [pdf, other]

    eess.AS cs.SD

    Refining Self-Supervised Learnt Speech Representation using Brain Activations

    Authors: Hengyu Li, Kangdi Mei, Zhaoci Liu, Yang Ai, Liping Chen, Jie Zhang, Zhenhua Ling

    Abstract: It was shown in literature that speech representations extracted by self-supervised pre-trained models exhibit similarities with brain activations of human for speech perception and fine-tuning speech representation models on downstream tasks can further improve the similarity. However, it still remains unclear if this similarity can be used to optimize the pre-trained speech models. In this work,…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  19. arXiv:2405.06907  [pdf, other]

    cs.CL cs.AI cs.LG cs.PL

    AIOS Compiler: LLM as Interpreter for Natural Language Programming and Flow Programming of AI Agents

    Authors: Shuyuan Xu, Zelong Li, Kai Mei, Yongfeng Zhang

    Abstract: Since their inception, programming languages have trended towards greater readability and lower barriers for programmers. Following this trend, natural language can be a promising type of programming language that provides great flexibility and usability and helps towards the democracy of programming. However, the inherent vagueness, ambiguity, and verbosity of natural language pose significant ch…

    Submitted 21 May, 2024; v1 submitted 11 May, 2024; originally announced May 2024.

    Comments: 12 pages, 6 figures, comments and suggestions are welcome

  20. arXiv:2404.07066  [pdf, other]

    cs.CL cs.AI cs.LG

    Exploring Concept Depth: How Large Language Models Acquire Knowledge and Concept at Different Layers?

    Authors: Mingyu Jin, Qinkai Yu, Jingyuan Huang, Qingcheng Zeng, Zhenting Wang, Wenyue Hua, Haiyan Zhao, Kai Mei, Yanda Meng, Kaize Ding, Fan Yang, Mengnan Du, Yongfeng Zhang

    Abstract: Large language models (LLMs) have shown remarkable performances across a wide range of tasks. However, the mechanisms by which these models encode tasks of varying complexities remain poorly understood. In this paper, we explore the hypothesis that LLMs process concepts of varying complexities in different layers, introducing the idea of "Concept Depth" to suggest that more complex concepts are ty…

    Submitted 4 February, 2025; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: COLING 2025

  21. arXiv:2404.01367  [pdf, other]

    cs.CV cs.LG

    Bigger is not Always Better: Scaling Properties of Latent Diffusion Models

    Authors: Kangfu Mei, Zhengzhong Tu, Mauricio Delbracio, Hossein Talebi, Vishal M. Patel, Peyman Milanfar

    Abstract: We study the scaling properties of latent diffusion models (LDMs) with an emphasis on their sampling efficiency. While improved network architecture and inference algorithms have been shown to effectively boost sampling efficiency of diffusion models, the role of model size -- a critical determinant of sampling efficiency -- has not been thoroughly examined. Through empirical analysis of established te…

    Submitted 10 December, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted to TMLR. Camera-ready version

  22. arXiv:2403.16971  [pdf, other]

    cs.OS cs.AI cs.CL

    AIOS: LLM Agent Operating System

    Authors: Kai Mei, Xi Zhu, Wujiang Xu, Wenyue Hua, Mingyu Jin, Zelong Li, Shuyuan Xu, Ruosong Ye, Yingqiang Ge, Yongfeng Zhang

    Abstract: LLM-based intelligent agents face significant deployment challenges, particularly related to resource management. Allowing unrestricted access to LLM or tool resources can lead to inefficient or even potentially harmful resource allocation and utilization for agents. Furthermore, the absence of proper scheduling and resource management mechanisms in current agent designs hinders concurrent process…

    Submitted 11 May, 2025; v1 submitted 25 March, 2024; originally announced March 2024.

  23. arXiv:2402.13184  [pdf, ps, other]

    cs.CL

    What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents

    Authors: Zhaoqian Xue, Beichen Wang, Suiyuan Zhu, Kai Mei, Hua Tang, Wenyue Hua, Mengnan Du, Yongfeng Zhang

    Abstract: This study introduces "CosmoAgent," an innovative artificial intelligence system that utilizes Large Language Models (LLMs) to simulate complex interactions between human and extraterrestrial civilizations. This paper introduces a mathematical model for quantifying the levels of civilization development and further employs a state transition matrix approach to evaluate their trajectories. Through…

    Submitted 8 June, 2025; v1 submitted 20 February, 2024; originally announced February 2024.

  24. Careless Whisper: Speech-to-Text Hallucination Harms

    Authors: Allison Koenecke, Anna Seo Gyeong Choi, Katelyn X. Mei, Hilke Schellmann, Mona Sloane

    Abstract: Speech-to-text services aim to transcribe input audio as accurately as possible. They increasingly play a role in everyday life, for example in personal voice assistants or in customer-company interactions. We evaluate OpenAI's Whisper, a state-of-the-art automated speech recognition service outperforming industry competitors, as of 2023. While many of Whisper's transcriptions were highly accurat…

    Submitted 2 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  25. arXiv:2312.02156  [pdf, other]

    cs.CV cs.AI

    Latent Feature-Guided Diffusion Models for Shadow Removal

    Authors: Kangfu Mei, Luis Figueroa, Zhe Lin, Zhihong Ding, Scott Cohen, Vishal M. Patel

    Abstract: Recovering textures under shadows has remained a challenging problem due to the difficulty of inferring shadow-free scenes from shadow images. In this paper, we propose the use of diffusion models as they offer a promising approach to gradually refine the details of shadow regions during the diffusion process. Our method improves this process by conditioning on a learned latent feature space that…

    Submitted 10 May, 2025; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: project page see https://kfmei.com/shadow-diffusion/

  26. arXiv:2311.17227  [pdf, other]

    cs.AI cs.CL cs.CY

    War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars

    Authors: Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, Yongfeng Zhang

    Abstract: Can we avoid wars at the crossroads of history? This question has been pursued by individuals, scholars, policymakers, and organizations throughout human history. In this research, we attempt to answer the question based on the recent advances of Artificial Intelligence (AI) and Large Language Models (LLMs). We propose WarAgent, an LLM-powered multi-agent AI system, to simulate the partic…

    Submitted 30 January, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: 47 pages, 9 figures, 5 tables

  27. arXiv:2310.17488  [pdf, other]

    cs.IR cs.CL

    LightLM: A Lightweight Deep and Narrow Language Model for Generative Recommendation

    Authors: Kai Mei, Yongfeng Zhang

    Abstract: This paper presents LightLM, a lightweight Transformer-based language model for generative recommendation. While Transformer-based generative modeling has gained importance in various AI sub-fields such as NLP and vision, generative recommendation is still in its infancy due to its unique demand on personalized generative modeling. Existing works on generative recommendation often use NLP-oriented…

    Submitted 29 October, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

  28. arXiv:2310.01407  [pdf, other]

    cs.CV cs.AI cs.LG

    CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation

    Authors: Kangfu Mei, Mauricio Delbracio, Hossein Talebi, Zhengzhong Tu, Vishal M. Patel, Peyman Milanfar

    Abstract: Large generative diffusion models have revolutionized text-to-image generation and offer immense potential for conditional generation tasks such as image enhancement, restoration, editing, and compositing. However, their widespread adoption is hindered by the high computational cost, which limits their real-time application. To address this challenge, we introduce a novel method dubbed CoDi, that…

    Submitted 17 February, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

  29. Bias Against 93 Stigmatized Groups in Masked Language Models and Downstream Sentiment Classification Tasks

    Authors: Katelyn X. Mei, Sonia Fereidooni, Aylin Caliskan

    Abstract: The rapid deployment of artificial intelligence (AI) models demands a thorough investigation of biases and risks inherent in these models to understand their impact on individuals and society. This study extends the focus of bias evaluation in extant work by examining bias against social stigmas on a large scale. It focuses on 93 stigmatized groups in the United States, including a wide range of c…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: 20 pages, 12 figures, 2 tables; ACM FAccT 2023

    ACM Class: K.4; I.2.7; I.2.0

  30. arXiv:2305.17826  [pdf, other]

    cs.CL cs.CR

    NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models

    Authors: Kai Mei, Zheng Li, Zhenting Wang, Yang Zhang, Shiqing Ma

    Abstract: Prompt-based learning is vulnerable to backdoor attacks. Existing backdoor attacks against prompt-based models consider injecting backdoors into the entire embedding layers or word embedding vectors. Such attacks can be easily affected by retraining on downstream tasks and with different prompting strategies, limiting the transferability of backdoor attacks. In this work, we propose transferable b…

    Submitted 28 May, 2023; originally announced May 2023.

  31. arXiv:2305.14674  [pdf, other]

    cs.CV

    T1: Scaling Diffusion Probabilistic Fields to High-Resolution on Unified Visual Modalities

    Authors: Kangfu Mei, Mo Zhou, Vishal M. Patel

    Abstract: Diffusion Probabilistic Field (DPF) models the distribution of continuous functions defined over metric spaces. While DPF shows great potential for unifying data generation of various modalities including images, videos, and 3D geometry, it does not scale to a higher data resolution. This can be attributed to the "scaling property", where it is difficult for the model to capture local structures…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: for project page, see https://t1-diffusion-model.github.io

  32. arXiv:2304.05959  [pdf, other]

    cs.RO cs.AI

    UAV Obstacle Avoidance by Human-in-the-Loop Reinforcement in Arbitrary 3D Environment

    Authors: Xuyang Li, Jianwu Fang, Kai Du, Kuizhi Mei, Jianru Xue

    Abstract: This paper focuses on the continuous control of the unmanned aerial vehicle (UAV) based on a deep reinforcement learning method for a large-scale 3D complex environment. The purpose is to make the UAV reach any target point from a certain starting point, and the flying height and speed are variable during navigation. In this work, we propose a deep reinforcement learning (DRL)-based method combine…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: accepted in CCC2023

  33. arXiv:2304.04370  [pdf, other]

    cs.AI cs.CL cs.LG

    OpenAGI: When LLM Meets Domain Experts

    Authors: Yingqiang Ge, Wenyue Hua, Kai Mei, Jianchao Ji, Juntao Tan, Shuyuan Xu, Zelong Li, Yongfeng Zhang

    Abstract: Human Intelligence (HI) excels at combining basic skills to solve complex tasks. This capability is vital for Artificial Intelligence (AI) and should be embedded in comprehensive AI Agents, enabling them to harness expert models for complex task-solving towards Artificial General Intelligence (AGI). Large Language Models (LLMs) show promising learning and reasoning abilities, and can effectively u…

    Submitted 3 November, 2023; v1 submitted 9 April, 2023; originally announced April 2023.

    Comments: In NeurIPS 2023

  34. arXiv:2304.02786  [pdf, other]

    cs.LG cs.AI cs.CR cs.CV

    UNICORN: A Unified Backdoor Trigger Inversion Framework

    Authors: Zhenting Wang, Kai Mei, Juan Zhai, Shiqing Ma

    Abstract: The backdoor attack, where the adversary uses inputs stamped with triggers (e.g., a patch) to activate pre-planted malicious behaviors, is a severe threat to Deep Neural Network (DNN) models. Trigger inversion is an effective way of identifying backdoor models and understanding embedded adversarial behaviors. A challenge of trigger inversion is that there are many ways of constructing the trigger.…

    Submitted 5 April, 2023; originally announced April 2023.

  35. arXiv:2302.06992  [pdf, other]

    cs.CV

    Hard-aware Instance Adaptive Self-training for Unsupervised Cross-domain Semantic Segmentation

    Authors: Chuang Zhu, Kebin Liu, Wenqi Tang, Ke Mei, Jiaqi Zou, Tiejun Huang

    Abstract: The divergence between labeled training data and unlabeled testing data is a significant challenge for recent deep learning models. Unsupervised domain adaptation (UDA) attempts to solve this problem. Recent works show that self-training is a powerful approach to UDA. However, existing methods have difficulty in balancing the scalability and performance. In this paper, we propose a hard-aware inst…

    Submitted 24 March, 2025; v1 submitted 14 February, 2023; originally announced February 2023.

    Comments: arXiv admin note: text overlap with arXiv:2008.12197

  36. arXiv:2212.07352  [pdf, other]

    cs.CV

    Bi-Noising Diffusion: Towards Conditional Diffusion Models with Generative Restoration Priors

    Authors: Kangfu Mei, Nithin Gopalakrishnan Nair, Vishal M. Patel

    Abstract: Conditional diffusion probabilistic models can model the distribution of natural images and can generate diverse and realistic samples based on given conditions. However, oftentimes their results can be unrealistic with observable color shifts and textures. We believe that this issue results from the divergence between the probabilistic distribution learned by the model and the distribution of nat…

    Submitted 14 December, 2022; originally announced December 2022.

  37. arXiv:2212.00235  [pdf, other]

    cs.CV

    VIDM: Video Implicit Diffusion Models

    Authors: Kangfu Mei, Vishal M. Patel

    Abstract: Diffusion models have emerged as a powerful generative method for synthesizing high-quality and diverse set of images. In this paper, we propose a video generation method based on diffusion models, where the effects of motion are modeled in an implicit condition manner, i.e. one can sample plausible video motions according to the latent feature of frames. We improve the quality of the generated vi…

    Submitted 30 November, 2022; originally announced December 2022.

    Comments: AAAI2023 https://kfmei.page/vidm/

  38. arXiv:2210.15127  [pdf, other]

    cs.CR cs.AI cs.CV cs.LG

    Rethinking the Reverse-engineering of Trojan Triggers

    Authors: Zhenting Wang, Kai Mei, Hailun Ding, Juan Zhai, Shiqing Ma

    Abstract: Deep Neural Networks are vulnerable to Trojan (or backdoor) attacks. Reverse-engineering methods can reconstruct the trigger and thus identify affected models. Existing reverse-engineering methods only consider input space constraints, e.g., trigger size in the input space. Expressly, they assume the triggers are static patterns in the input space and fail to detect models with feature space trigg…

    Submitted 26 October, 2022; originally announced October 2022.

  39. arXiv:2208.11284  [pdf, other]

    cs.CV

    AT-DDPM: Restoring Faces degraded by Atmospheric Turbulence using Denoising Diffusion Probabilistic Models

    Authors: Nithin Gopalakrishnan Nair, Kangfu Mei, Vishal M. Patel

    Abstract: Although many long-range imaging systems are designed to support extended vision applications, a natural obstacle to their operation is degradation due to atmospheric turbulence. Atmospheric turbulence causes significant degradation to image quality by introducing blur and geometric distortion. In recent years, various deep learning-based single image atmospheric turbulence mitigation methods, inc…

    Submitted 20 September, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

    Comments: Accepted to IEEE WACV 2023

  40. arXiv:2207.09302  [pdf, other]

    cs.CV

    Deep Semantic Statistics Matching (D2SM) Denoising Network

    Authors: Kangfu Mei, Vishal M. Patel, Rui Huang

    Abstract: The ultimate aim of image restoration like denoising is to find an exact correlation between the noisy and clear image domains. But the optimization of end-to-end denoising learning like pixel-wise losses is performed in a sample-to-sample manner, which ignores the intrinsic correlation of images, especially semantics. In this paper, we introduce the Deep Semantic Statistics Matching (D2SM) Denois…

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: ECCV2022, for Project Page, see https://kfmei.page/d2sm/

  41. arXiv:2204.08974  [pdf, other]

    cs.CV eess.IV

    A comparison of different atmospheric turbulence simulation methods for image restoration

    Authors: Nithin Gopalakrishnan Nair, Kangfu Mei, Vishal M. Patel

    Abstract: Atmospheric turbulence deteriorates the quality of images captured by long-range imaging systems by introducing blur and geometric distortions to the captured scene. This leads to a drastic drop in performance when computer vision algorithms like object/face recognition and detection are performed on these images. In recent years, various deep learning-based atmospheric turbulence mitigation metho…

    Submitted 19 April, 2022; originally announced April 2022.

  42. arXiv:2204.03057  [pdf, other]

    cs.CV

    Thermal to Visible Image Synthesis under Atmospheric Turbulence

    Authors: Kangfu Mei, Yiqun Mei, Vishal M. Patel

    Abstract: In many practical applications of long-range imaging such as biometrics and surveillance, thermal imaging modalities are often used to capture images in low-light and nighttime conditions. However, such imaging systems often suffer from atmospheric turbulence, which introduces severe blur and deformation artifacts to the captured images. Such an issue is unavoidable in long-range imaging and sig…

    Submitted 6 April, 2022; originally announced April 2022.

    Comments: 4 pages, 3 figures

  43. arXiv:2202.09954  [pdf, other]

    eess.SP cs.IT cs.LG

    Theoretical Analysis of Deep Neural Networks in Physical Layer Communication

    Authors: Jun Liu, Haitao Zhao, Dongtang Ma, Kai Mei, Jibo Wei

    Abstract: Recently, deep neural network (DNN)-based physical layer communication techniques have attracted considerable interest. Although their potential to enhance communication systems and superb performance have been validated by simulation experiments, little attention has been paid to the theoretical analysis. Specifically, most studies in the physical layer have tended to focus on the application of…

    Submitted 26 August, 2022; v1 submitted 20 February, 2022; originally announced February 2022.

    Comments: 15 pages, 13 figures, has been accepted for publication in IEEE Transactions on Communications. arXiv admin note: substantial text overlap with arXiv:2106.01124

    Journal ref: IEEE Transactions on Communications, 2022

  44. LTT-GAN: Looking Through Turbulence by Inverting GANs

    Authors: Kangfu Mei, Vishal M. Patel

    Abstract: In many applications of long-range imaging, we are faced with a scenario where a person appearing in the captured imagery is often degraded by atmospheric turbulence. However, restoring such degraded images for face verification is difficult since the degradation causes images to be geometrically distorted and blurry. To mitigate the turbulence effect, in this paper, we propose the first turbulenc…

    Submitted 4 December, 2021; originally announced December 2021.

    Comments: Project Page: https://kfmei.page/LTT-GAN/

  45. Mobile App Crowdsourced Test Report Consistency Detection via Deep Image-and-Text Fusion Understanding

    Authors: Shengcheng Yu, Chunrong Fang, Quanjun Zhang, Zhihao Cao, Yexiao Yun, Zhenfei Cao, Kai Mei, Zhenyu Chen

    Abstract: Crowdsourced testing, as a distinct testing paradigm, has attracted much attention in software testing, especially in the mobile application (app) testing field. Compared with in-house testing, crowdsourced testing shows superiority with the diverse testing environments when faced with the mobile testing fragmentation problem. However, crowdsourced testing also encounters the low-quality test report p…

    Submitted 12 June, 2023; v1 submitted 16 August, 2021; originally announced August 2021.

  46. A Low Complexity Learning-based Channel Estimation for OFDM Systems with Online Training

    Authors: Kai Mei, Jun Liu, Xiaoying Zhang, Kuo Cao, Nandana Rajatheva, Jibo Wei

    Abstract: In this paper, we devise a highly efficient machine learning-based channel estimation for orthogonal frequency division multiplexing (OFDM) systems, in which the training of the estimator is performed online. A simple learning module is employed for the proposed learning-based estimator. The training process is thus much faster and the required training data is reduced significantly. Besides, a tr…

    Submitted 14 July, 2021; originally announced July 2021.

    Comments: 12 pages, 12 figures. To appear in IEEE Transactions on Communications

  47. arXiv:2106.01124  [pdf, other]

    eess.SP cs.IT cs.LG

    Opening the Black Box of Deep Neural Networks in Physical Layer Communication

    Authors: Jun Liu, Haitao Zhao, Dongtang Ma, Kai Mei, Jibo Wei

    Abstract: Deep Neural Network (DNN)-based physical layer techniques are attracting considerable interest due to their potential to enhance communication systems. However, most studies in the physical layer have tended to focus on the application of DNN models to wireless communication problems but not to theoretically understand how a DNN works in a communication system. In this paper, we aim to quantit…

    Submitted 18 February, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

    Comments: 6 pages, 5 figures, to be presented in the IEEE Wireless Communications and Networking Conference (WCNC) 2022 Workshop on Machine Learning for Communications: Future Large Scale MIMO and AI-Native Air-Interface

  48. arXiv:2104.00848  [pdf, other]

    cs.CV

    SDAN: Squared Deformable Alignment Network for Learning Misaligned Optical Zoom

    Authors: Kangfu Mei, Shenglong Ye, Rui Huang

    Abstract: Deep Neural Network (DNN) based super-resolution algorithms have greatly improved the quality of the generated images. However, these algorithms often yield significant artifacts when dealing with real-world super-resolution problems due to the difficulty in learning misaligned optical zoom. In this paper, we introduce a Squared Deformable Alignment Network (SDAN) to address this issue. Our networ…

    Submitted 25 November, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

    Comments: ICME21. Code is available at https://github.com/MKFMIKU/SDAN

  49. arXiv:2103.05930  [pdf, other]

    cs.CV

    AttaNet: Attention-Augmented Network for Fast and Accurate Scene Parsing

    Authors: Qi Song, Kangfu Mei, Rui Huang

    Abstract: Two factors have proven to be very important to the performance of semantic segmentation models: global context and multi-level semantics. However, generating features that capture both factors always leads to high computational complexity, which is problematic in real-time scenarios. In this paper, we propose a new model, called Attention-Augmented Network (AttaNet), to capture both global contex…

    Submitted 10 March, 2021; originally announced March 2021.

    Comments: AAAI 2021

  50. arXiv:2101.01479  [pdf, other]

    cs.CV

    Scale-Aware Network with Regional and Semantic Attentions for Crowd Counting under Cluttered Background

    Authors: Qiaosi Yi, Yunxing Liu, Aiwen Jiang, Juncheng Li, Kangfu Mei, Mingwen Wang

    Abstract: Crowd counting is an important task that has shown great application value in public safety-related fields, which has attracted increasing attention in recent years. In the current research, the accuracy of counting numbers and crowd density estimation are the main concerns. Although the emergence of deep learning has greatly promoted the development of this field, crowd counting under cluttered backg…

    Submitted 7 January, 2021; v1 submitted 5 January, 2021; originally announced January 2021.