Showing 1–50 of 75 results for author: Yi, C

Searching in archive cs.
  1. arXiv:2507.05283  [pdf]

    cs.AI cs.CL

    Chat2SPaT: A Large Language Model Based Tool for Automating Traffic Signal Control Plan Management

    Authors: Yue Wang, Miao Zhou, Guijing Huang, Rui Zhuo, Chao Yi, Zhenliang Ma

    Abstract: Pre-timed traffic signal control, commonly used for operating signalized intersections and coordinated arterials, requires tedious manual work for signaling plan creating and updating. When the time-of-day or day-of-week plans are utilized, one intersection is often associated with multiple plans, leading to further repetitive manual plan parameter inputting. To enable a user-friendly traffic sign…

    Submitted 4 July, 2025; originally announced July 2025.

  2. arXiv:2507.01949  [pdf, ps, other]

    cs.CV

    Kwai Keye-VL Technical Report

    Authors: Kwai Keye Team, Biao Yang, Bin Wen, Changyi Liu, Chenglong Chu, Chengru Song, Chongling Rao, Chuan Yi, Da Li, Dunju Zang, Fan Yang, Guorui Zhou, Hao Peng, Haojie Ding, Jiaming Huang, Jiangxia Cao, Jiankang Chen, Jingyun Hua, Jin Ouyang, Kaibing Chen, Kaiyu Jiang, Kaiyu Tang, Kun Gai, Shengnan Zhang, Siyang Mao, et al. (35 additional authors not shown)

    Abstract: While Multimodal Large Language Models (MLLMs) demonstrate remarkable capabilities on static images, they often fall short in comprehending dynamic, information-dense short-form videos, a dominant medium in today's digital landscape. To bridge this gap, we introduce Kwai Keye-VL, an 8-billion-parameter multimodal foundation model engineered for leading-edge performance in short-video unde…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Technical Report: https://github.com/Kwai-Keye/Keye

  3. arXiv:2507.01354  [pdf, ps, other]

    cs.LG physics.ao-ph

    Efficient Kilometer-Scale Precipitation Downscaling with Conditional Wavelet Diffusion

    Authors: Chugang Yi, Minghan Yu, Weikang Qian, Yixin Wen, Haizhao Yang

    Abstract: Effective hydrological modeling and extreme weather analysis demand precipitation data at a kilometer-scale resolution, which is significantly finer than the 10 km scale offered by standard global products like IMERG. To address this, we propose the Wavelet Diffusion Model (WDM), a generative framework that achieves 10x spatial super-resolution (downscaling to 1 km) and delivers a 9x inference spe…

    Submitted 2 July, 2025; originally announced July 2025.

    MSC Class: 86A10 (Primary) 86A22; 68U10 (Secondary) ACM Class: J.2; I.4.4

  4. arXiv:2506.23724  [pdf, ps, other]

    cs.CV cs.AI

    When Small Guides Large: Cross-Model Co-Learning for Test-Time Adaptation

    Authors: Chang'an Yi, Xiaohui Deng, Guohao Chen, Yan Zhou, Qinghua Lu, Shuaicheng Niu

    Abstract: Test-time Adaptation (TTA) adapts a given model to testing domain data with potential domain shifts through online unsupervised learning, yielding impressive performance. However, to date, existing TTA methods primarily focus on single-model adaptation. In this work, we investigate an intriguing question: how does cross-model knowledge influence the TTA process? Our findings reveal that, in TTA's…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: 15 pages, 5 figures

  5. arXiv:2506.23467  [pdf, ps, other]

    cs.CV cs.LG

    AdFair-CLIP: Adversarial Fair Contrastive Language-Image Pre-training for Chest X-rays

    Authors: Chenlang Yi, Zizhan Xiong, Qi Qi, Xiyuan Wei, Girish Bathla, Ching-Long Lin, Bobak Jack Mortazavi, Tianbao Yang

    Abstract: Contrastive Language-Image Pre-training (CLIP) models have demonstrated superior performance across various visual tasks including medical image classification. However, fairness concerns, including demographic biases, have received limited attention for CLIP models. This oversight leads to critical issues, particularly those related to race and gender, resulting in disparities in diagnostic outco…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: This preprint has been accepted by MICCAI 2025

  6. arXiv:2506.19406  [pdf, ps, other]

    cs.CV cs.AI

    A Global-Local Cross-Attention Network for Ultra-high Resolution Remote Sensing Image Semantic Segmentation

    Authors: Chen Yi, Shan LianLei

    Abstract: With the rapid development of ultra-high resolution (UHR) remote sensing technology, the demand for accurate and efficient semantic segmentation has increased significantly. However, existing methods face challenges in computational efficiency and multi-scale feature fusion. To address these issues, we propose GLCANet (Global-Local Cross-Attention Network), a lightweight segmentation framework des…

    Submitted 24 June, 2025; originally announced June 2025.

  7. arXiv:2506.16677  [pdf, ps, other]

    cs.HC cs.RO

    PPTP: Performance-Guided Physiological Signal-Based Trust Prediction in Human-Robot Collaboration

    Authors: Hao Guo, Wei Fan, Shaohui Liu, Feng Jiang, Chunzhi Yi

    Abstract: Trust prediction is a key issue in human-robot collaboration, especially in construction scenarios where maintaining appropriate trust calibration is critical for safety and efficiency. This paper introduces the Performance-guided Physiological signal-based Trust Prediction (PPTP), a novel framework designed to improve trust assessment. We designed a human-robot construction scenario with three di…

    Submitted 19 June, 2025; originally announced June 2025.

  8. arXiv:2506.16157  [pdf, ps, other]

    cs.CV

    MBA: Multimodal Bidirectional Attack for Referring Expression Segmentation Models

    Authors: Xingbai Chen, Tingchao Fu, Renyang Liu, Wei Zhou, Chao Yi

    Abstract: Referring Expression Segmentation (RES) enables precise object segmentation in images based on natural language descriptions, offering high flexibility and broad applicability in real-world vision tasks. Despite its impressive performance, the robustness of RES models against adversarial examples remains largely unexplored. While prior adversarial attack methods have explored adversarial robustnes…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 17 pages, 5pages

  9. arXiv:2506.11143  [pdf, ps, other]

    cs.CV

    On the development of an AI performance and behavioural measures for teaching and classroom management

    Authors: Andreea I. Niculescu, Jochen Ehnen, Chen Yi, Du Jiawei, Tay Chiat Pin, Joey Tianyi Zhou, Vigneshwaran Subbaraju, Teh Kah Kuan, Tran Huy Dat, John Komar, Gi Soong Chee, Kenneth Kwok

    Abstract: This paper presents a two-year research project focused on developing AI-driven measures to analyze classroom dynamics, with particular emphasis on teacher actions captured through multimodal sensor data. We applied real-time data from classroom sensors and AI techniques to extract meaningful insights and support teacher development. Key outcomes include a curated audio-visual dataset, novel behav…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 7 pages, 10 figures, A video demonstration of the teacher trainer dashboard can be accessed here: https://vimeo.com/1076482827

    ACM Class: H.5; J.4; I.2.7; I.2.10

  10. arXiv:2505.15829  [pdf, ps, other]

    cs.NI

    Distributionally Robust Optimization for Digital Twin Service Provisioning over Edge Computing

    Authors: Yuxiang Li, Jiayuan Chen, Changyan Yi

    Abstract: Digital Twin (DT) is a transformative technology poised to revolutionize a wide range of applications. This advancement has led to the emergence of digital twin as a service (DTaaS), enabling users to interact with DT models that accurately reflect the real-time status of their physical counterparts. Quality of DTaaS primarily depends on the freshness of DT data, which can be quantified by the age…

    Submitted 15 May, 2025; originally announced May 2025.

  11. arXiv:2505.15828  [pdf, ps, other]

    cs.NI cs.AI

    Generative AI-Aided QoE Maximization for RIS-Assisted Digital Twin Interaction

    Authors: Jiayuan Chen, Yuxiang Li, Changyan Yi, Shimin Gong

    Abstract: In this paper, we investigate a quality of experience (QoE)-aware resource allocation problem for reconfigurable intelligent surface (RIS)-assisted digital twin (DT) interaction with uncertain evolution. In the considered system, mobile users are expected to interact with a DT model maintained on a DT server that is deployed on a base station, via effective uplink and downlink channels assisted by…

    Submitted 14 May, 2025; originally announced May 2025.

  12. arXiv:2505.05023  [pdf, ps, other]

    cs.CV

    Split Matching for Inductive Zero-shot Semantic Segmentation

    Authors: Jialei Chen, Xu Zheng, Dongyue Li, Chong Yi, Seigo Ito, Danda Pani Paudel, Luc Van Gool, Hiroshi Murase, Daisuke Deguchi

    Abstract: Zero-shot Semantic Segmentation (ZSS) aims to segment categories that are not annotated during training. While fine-tuning vision-language models has achieved promising results, these models often overfit to seen categories due to the lack of supervision for unseen classes. As an alternative to fully supervised approaches, query-based segmentation has shown great latent in ZSS, as it enables objec…

    Submitted 27 June, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

  13. arXiv:2505.03799  [pdf, other]

    cs.LG cs.AI cs.CL

    Scalability Matters: Overcoming Challenges in InstructGLM with Similarity-Degree-Based Sampling

    Authors: Hyun Lee, Chris Yi, Maminur Islam, B. D. S. Aritra

    Abstract: Large Language Models (LLMs) have demonstrated strong capabilities in various natural language processing tasks; however, their application to graph-related problems remains limited, primarily due to scalability constraints and the absence of dedicated mechanisms for processing graph structures. Existing approaches predominantly integrate LLMs with Graph Neural Networks (GNNs), using GNNs as featu…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: To be published in International Joint Conference on Neural Networks (IJCNN), 2025

  14. arXiv:2504.19506  [pdf, other]

    cs.CV

    SynergyAmodal: Deocclude Anything with Text Control

    Authors: Xinyang Li, Chengjie Yi, Jiawei Lai, Mingbao Lin, Yansong Qu, Shengchuan Zhang, Liujuan Cao

    Abstract: Image deocclusion (or amodal completion) aims to recover the invisible regions (i.e., shape and appearance) of occluded instances in images. Despite recent advances, the scarcity of high-quality data that balances diversity, plausibility, and fidelity remains a major obstacle. To address this challenge, we identify three critical elements: leveraging in-the-wild image data for diversity, incorporat…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 17 pages

  15. arXiv:2504.08781  [pdf, other]

    cs.CL cs.AI cs.IR

    Efficient Evaluation of Large Language Models via Collaborative Filtering

    Authors: Xu-Xiang Zhong, Chao Yi, Han-Jia Ye

    Abstract: With the development of Large Language Models (LLMs), numerous benchmarks have been proposed to measure and compare the capabilities of different LLMs. However, evaluating LLMs is costly due to the large number of test instances and their slow inference speed. In this paper, we aim to explore how to efficiently estimate a model's real performance on a given benchmark based on its evaluation result…

    Submitted 5 April, 2025; originally announced April 2025.

  16. arXiv:2503.19240  [pdf, other]

    cs.CV cs.HC

    Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding

    Authors: Hao Guo, Jianfei Zhu, Wei Fan, Chunzhi Yi, Feng Jiang

    Abstract: Referring expression comprehension (REC) aims at achieving object localization based on natural language descriptions. However, existing REC approaches are constrained by object category descriptions and single-attribute intention descriptions, hindering their application in real-world scenarios. In natural human-robot interactions, users often express their desires through individual states and i…

    Submitted 24 March, 2025; originally announced March 2025.

  17. arXiv:2503.16823  [pdf, other]

    cs.ET cs.GT eess.SY

    Federated Digital Twin Construction via Distributed Sensing: A Game-Theoretic Online Optimization with Overlapping Coalitions

    Authors: Ruoyang Chen, Changyan Yi, Fuhui Zhou, Jiawen Kang, Yuan Wu, Dusit Niyato

    Abstract: In this paper, we propose a novel federated framework for constructing the digital twin (DT) model, referring to a living and self-evolving visualization model empowered by artificial intelligence, enabled by distributed sensing under edge-cloud collaboration. In this framework, the DT model to be built at the cloud is regarded as a global one being split into and integrating from multiple functio…

    Submitted 20 March, 2025; originally announced March 2025.

  18. arXiv:2503.04170  [pdf, other]

    cs.ET cs.AI

    Towards Intelligent Transportation with Pedestrians and Vehicles In-the-Loop: A Surveillance Video-Assisted Federated Digital Twin Framework

    Authors: Xiaolong Li, Jianhao Wei, Haidong Wang, Li Dong, Ruoyang Chen, Changyan Yi, Jun Cai, Dusit Niyato, Xuemin Shen

    Abstract: In intelligent transportation systems (ITSs), incorporating pedestrians and vehicles in-the-loop is crucial for developing realistic and safe traffic management solutions. However, there is falls short of simulating complex real-world ITS scenarios, primarily due to the lack of a digital twin implementation framework for characterizing interactions between pedestrians and vehicles at different loc…

    Submitted 6 March, 2025; originally announced March 2025.

  19. arXiv:2502.20223  [pdf, other]

    cs.CV cs.AI cs.LG

    Deep Convolutional Neural Networks for Palm Fruit Maturity Classification

    Authors: Mingqiang Han, Chunlin Yi

    Abstract: To maximize palm oil yield and quality, it is essential to harvest palm fruit at the optimal maturity stage. This project aims to develop an automated computer vision system capable of accurately classifying palm fruit images into five ripeness levels. We employ deep Convolutional Neural Networks (CNNs) to classify palm fruit images based on their maturity stage. A shallow CNN serves as the baseli…

    Submitted 27 February, 2025; originally announced February 2025.

  20. arXiv:2502.13539  [pdf, other]

    cs.IR

    Bursting Filter Bubble: Enhancing Serendipity Recommendations with Aligned Large Language Models

    Authors: Yunjia Xi, Muyan Weng, Wen Chen, Chao Yi, Dian Chen, Gaoyang Guo, Mao Zhang, Jian Wu, Yuning Jiang, Qingwen Liu, Yong Yu, Weinan Zhang

    Abstract: Recommender systems (RSs) often suffer from the feedback loop phenomenon, e.g., RSs are trained on data biased by their recommendations. This leads to the filter bubble effect that reinforces homogeneous content and reduces user satisfaction. To this end, serendipity recommendations, which offer unexpected yet relevant items, are proposed. Recently, large language models (LLMs) have shown potentia…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: 15 pages

  21. arXiv:2502.09624  [pdf, other]

    cs.AI cs.CR

    Efficient and Trustworthy Block Propagation for Blockchain-enabled Mobile Embodied AI Networks: A Graph Resfusion Approach

    Authors: Jiawen Kang, Jiana Liao, Runquan Gao, Jinbo Wen, Huawei Huang, Maomao Zhang, Changyan Yi, Tao Zhang, Dusit Niyato, Zibin Zheng

    Abstract: By synergistically integrating mobile networks and embodied artificial intelligence (AI), Mobile Embodied AI Networks (MEANETs) represent an advanced paradigm that facilitates autonomous, context-aware, and interactive behaviors within dynamic environments. Nevertheless, the rapid development of MEANETs is accompanied by challenges in trustworthiness and operational efficiency. Fortunately, blockc…

    Submitted 26 January, 2025; originally announced February 2025.

    Comments: 15 pages, 11 figures

  22. A 3-Step Optimization Framework with Hybrid Models for a Humanoid Robot's Jump Motion

    Authors: Haoxiang Qi, Zhangguo Yu, Xuechao Chen, Yaliang Liu, Chuanku Yi, Chencheng Dong, Fei Meng, Qiang Huang

    Abstract: High dynamic jump motions are challenging tasks for humanoid robots to achieve environment adaptation and obstacle crossing. The trajectory optimization is a practical method to achieve high-dynamic and explosive jumping. This paper proposes a 3-step trajectory optimization framework for generating a jump motion for a humanoid robot. To improve iteration speed and achieve ideal performance, the fr…

    Submitted 21 January, 2025; originally announced January 2025.

  23. arXiv:2412.20654  [pdf]

    cs.RO cs.HC

    Impact of Cognitive Load on Human Trust in Hybrid Human-Robot Collaboration

    Authors: Hao Guo, Bangan Wu, Qi Li, Zhen Ding, Feng Jiang, Chunzhi Yi

    Abstract: Human trust plays a crucial role in the effectiveness of human-robot collaboration. Despite its significance, the development and maintenance of an optimal trust level are obstructed by the complex nature of influencing factors and their mechanisms. This study investigates the effects of cognitive load on human trust within the context of a hybrid human-robot collaboration task. An experiment is c…

    Submitted 29 December, 2024; originally announced December 2024.

  24. arXiv:2412.20484  [pdf, other]

    cs.NI eess.SP

    Exploiting NOMA Transmissions in Multi-UAV-assisted Wireless Networks: From Aerial-RIS to Mode-switching UAVs

    Authors: Songhan Zhao, Shimin Gong, Bo Gu, Lanhua Li, Bin Lyu, Dinh Thai Hoang, Changyan Yi

    Abstract: In this paper, we consider an aerial reconfigurable intelligent surface (ARIS)-assisted wireless network, where multiple unmanned aerial vehicles (UAVs) collect data from ground users (GUs) by using the non-orthogonal multiple access (NOMA) method. The ARIS provides enhanced channel controllability to improve the NOMA transmissions and reduce the co-channel interference among UAVs. We also propose…

    Submitted 29 December, 2024; originally announced December 2024.

  25. arXiv:2412.18111  [pdf, other]

    cs.AI

    AIGT: AI Generative Table Based on Prompt

    Authors: Mingming Zhang, Zhiqing Xiao, Guoshan Lu, Sai Wu, Weiqiang Wang, Xing Fu, Can Yi, Junbo Zhao

    Abstract: Tabular data, which accounts for over 80% of enterprise data assets, is vital in various fields. With growing concerns about privacy protection and data-sharing restrictions, generating high-quality synthetic tabular data has become essential. Recent advancements show that large language models (LLMs) can effectively generate realistic tabular data by leveraging semantic information and overcomin…

    Submitted 23 December, 2024; originally announced December 2024.

  26. arXiv:2411.15504  [pdf, other]

    physics.med-ph cs.RO

    Effects of Muscle Synergy during Overhead Work with a Passive Shoulder Exoskeleton: A Case Study

    Authors: Jin Tian, Baichun Wei, Chifu Yang, Suo Luo, Jiadong Feng, Ping Li, Changbing Chen, Yingjie Liu, Haiqi Zhu, Chunzhi Yi

    Abstract: Objective: Shoulder exoskeletons can effectively assist with overhead work. However, their impacts on muscle synergy remain unclear. The objective is to systematically investigate the effects of the shoulder exoskeleton on muscle synergies during overhead work. Methods: Eight male participants were recruited to perform a screwing task both with (Intervention) and without (Normal) the exoskeleton. E…

    Submitted 23 November, 2024; originally announced November 2024.

  27. arXiv:2411.13770  [pdf, other]

    cs.RO

    A Novel Passive Occupational Shoulder Exoskeleton With Adjustable Peak Assistive Torque Angle For Overhead Tasks

    Authors: Jin Tian, Haiqi Zhu, Changjia Lu, Chifu Yang, Yingjie Liu, Baichun Wei, Chunzhi Yi

    Abstract: Objective: Overhead tasks are a primary inducement to work-related musculoskeletal disorders. Aiming to reduce shoulder physical loads, passive shoulder exoskeletons are increasingly prevalent in the industry due to their lightweight, affordability, and effectiveness. However, they can only accommodate a specific task and cannot effectively balance between compactness and sufficient range of motio…

    Submitted 23 November, 2024; v1 submitted 20 November, 2024; originally announced November 2024.

  28. arXiv:2411.08451  [pdf, other]

    cs.CV

    AD-DINO: Attention-Dynamic DINO for Distance-Aware Embodied Reference Understanding

    Authors: Hao Guo, Wei Fan, Baichun Wei, Jianfei Zhu, Jin Tian, Chunzhi Yi, Feng Jiang

    Abstract: Embodied reference understanding is crucial for intelligent agents to predict referents based on human intention through gesture signals and language descriptions. This paper introduces the Attention-Dynamic DINO, a novel framework designed to mitigate misinterpretations of pointing gestures across various interaction contexts. Our approach integrates visual and textual features to simultaneously…

    Submitted 13 November, 2024; originally announced November 2024.

  29. arXiv:2410.22309  [pdf]

    cs.HC cs.CY

    GPT-4o reads the mind in the eyes

    Authors: James W. A. Strachan, Oriana Pansardi, Eugenio Scaliti, Marco Celotto, Krati Saxena, Chunzhi Yi, Fabio Manzi, Alessandro Rufo, Guido Manzi, Michael S. A. Graziano, Stefano Panzeri, Cristina Becchio

    Abstract: Large Language Models (LLMs) are capable of reproducing human-like inferences, including inferences about emotions and mental states, from text. Whether this capability extends beyond text to other modalities remains unclear. Humans possess a sophisticated ability to read the mind in the eyes of other people. Here we tested whether this ability is also present in GPT-4o, a multimodal LLM. Using tw…

    Submitted 30 October, 2024; v1 submitted 29 October, 2024; originally announced October 2024.

  30. arXiv:2410.02176  [pdf, ps, other]

    cs.LG stat.ML

    Towards Better Generalization: Weight Decay Induces Low-rank Bias for Neural Networks

    Authors: Ke Chen, Chugang Yi, Haizhao Yang

    Abstract: We study the implicit bias towards low-rank weight matrices when training neural networks (NN) with Weight Decay (WD). We prove that when a ReLU NN is sufficiently trained with Stochastic Gradient Descent (SGD) and WD, its weight matrix is approximately a rank-two matrix. Empirically, we demonstrate that WD is a necessary condition for inducing this low-rank bias across both regression and classif…

    Submitted 2 October, 2024; originally announced October 2024.

  31. arXiv:2409.15905  [pdf, other]

    cs.SD cs.AI eess.AS

    Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM

    Authors: Fengrun Zhang, Wang Geng, Hukai Huang, Yahui Shan, Cheng Yi, He Qu

    Abstract: In this paper, we introduce a speech-conditioned Large Language Model (LLM) integrated with a Mixture of Experts (MoE) based connector to address the challenge of Code-Switching (CS) in Automatic Speech Recognition (ASR). Specifically, we propose an Insertion and Deletion of Interruption Token (IDIT) mechanism for better transfer text generation ability of LLM to speech recognition task. We also p…

    Submitted 30 October, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP 2025

  32. arXiv:2407.10979  [pdf, ps, other]

    cs.NI

    Diffusion Model-based Incentive Mechanism with Prospect Theory for Edge AIGC Services in 6G IoT

    Authors: Jinbo Wen, Jiangtian Nie, Yue Zhong, Changyan Yi, Xiaohuan Li, Jiangming Jin, Yang Zhang, Dusit Niyato

    Abstract: The fusion of the Internet of Things (IoT) with Sixth-Generation (6G) technology has significant potential to revolutionize the IoT landscape. With the ultra-reliable and low-latency communication capabilities of 6G, 6G-IoT networks can transmit high-quality and diverse data to enhance edge learning. Artificial Intelligence-Generated Content (AIGC) harnesses advanced AI algorithms to automatically…

    Submitted 25 July, 2024; v1 submitted 10 June, 2024; originally announced July 2024.

  33. arXiv:2407.08174  [pdf, other]

    cs.HC q-bio.NC

    An Adaptively Weighted Averaging Method for Regional Time Series Extraction of fMRI-based Brain Decoding

    Authors: Jianfei Zhu, Baichun Wei, Jiaru Tian, Feng Jiang, Chunzhi Yi

    Abstract: Brain decoding that classifies cognitive states using the functional fluctuations of the brain can provide insightful information for understanding the brain mechanisms of cognitive functions. Among the common procedures of decoding the brain cognitive states with functional magnetic resonance imaging (fMRI), extracting the time series of each brain region after brain parcellation traditionally av…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 17 pages, 4 figures

    ACM Class: J.3

  34. arXiv:2406.17349  [pdf, other]

    cs.CR cs.CV

    Semantic Deep Hiding for Robust Unlearnable Examples

    Authors: Ruohan Meng, Chenyu Yi, Yi Yu, Siyuan Yang, Bingquan Shen, Alex C. Kot

    Abstract: Ensuring data privacy and protection has become paramount in the era of deep learning. Unlearnable examples are proposed to mislead the deep learning models and prevent data from unauthorized exploration by adding small perturbations to data. However, such perturbations (e.g., noise, texture, color change) predominantly impact low-level features, making them vulnerable to common countermeasures. I…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by TIFS 2024

  35. arXiv:2406.02539  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Parrot: Multilingual Visual Instruction Tuning

    Authors: Hai-Long Sun, Da-Wei Zhou, Yang Li, Shiyin Lu, Chao Yi, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye

    Abstract: The rapid development of Multimodal Large Language Models (MLLMs), such as GPT-4o, marks a significant step toward artificial general intelligence. Existing methods typically align vision encoders with LLMs via supervised fine-tuning (SFT), but this often deteriorates their ability to handle multiple languages as training progresses. We empirically observe that imbalanced SFT datasets, largely Eng…

    Submitted 25 May, 2025; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2025. Code and dataset are available at: https://github.com/AIDC-AI/Parrot

  36. arXiv:2404.17753  [pdf, other]

    cs.CV cs.AI

    Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification

    Authors: Chao Yi, Lu Ren, De-Chuan Zhan, Han-Jia Ye

    Abstract: CLIP showcases exceptional cross-modal matching capabilities due to its training on image-text contrastive learning tasks. However, without specific optimization for unimodal scenarios, its performance in single-modality feature extraction might be suboptimal. Despite this, some studies have directly used CLIP's image encoder for tasks like few-shot classification, introducing a misalignment betwe…

    Submitted 26 April, 2024; originally announced April 2024.

  37. arXiv:2403.13797  [pdf, other]

    cs.LG cs.CV

    Bridge the Modality and Capability Gaps in Vision-Language Model Selection

    Authors: Chao Yi, Yu-Hang He, De-Chuan Zhan, Han-Jia Ye

    Abstract: Vision Language Models (VLMs) excel in zero-shot image classification by pairing images with textual category names. The expanding variety of Pre-Trained VLMs enhances the likelihood of identifying a suitable VLM for specific tasks. To better reuse the VLM resource and fully leverage its potential on different zero-shot image classification tasks, a promising strategy is selecting appropriate Pre-…

    Submitted 18 May, 2025; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: fix typo in figure 2 "Capability Gap"

  38. arXiv:2403.13237  [pdf, ps, other]

    cs.CR math.OC

    Graph Attention Network-based Block Propagation with Optimal AoI and Reputation in Web 3.0

    Authors: Jiana Liao, Jinbo Wen, Jiawen Kang, Changyan Yi, Yang Zhang, Yutao Jiao, Dusit Niyato, Dong In Kim, Shengli Xie

    Abstract: Web 3.0 is recognized as a pioneering paradigm that empowers users to securely oversee data without reliance on a centralized authority. Blockchains, as a core technology to realize Web 3.0, can facilitate decentralized and transparent data management. Nevertheless, the evolution of blockchain-enabled Web 3.0 is still in its nascent phase, grappling with challenges such as ensuring efficiency and…

    Submitted 8 May, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  39. arXiv:2402.18936  [pdf, ps, other]

    cs.NI eess.SP

    Energy-Efficient UAV Swarm Assisted MEC with Dynamic Clustering and Scheduling

    Authors: Jialiuyuan Li, Jiayuan Chen, Changyan Yi, Tong Zhang, Kun Zhu, Jun Cai

    Abstract: In this paper, the energy-efficient unmanned aerial vehicle (UAV) swarm assisted mobile edge computing (MEC) with dynamic clustering and scheduling is studied. In the considered system model, UAVs are divided into multiple swarms, with each swarm consisting of a leader UAV and several follower UAVs to provide computing services to end-users. Unlike existing work, we allow UAVs to dynamically clust…

    Submitted 29 February, 2024; originally announced February 2024.

  40. arXiv:2402.18927  [pdf, other]

    cs.CV cs.MM cs.NI

    Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering

    Authors: Xiang Chen, Wenjie Zhu, Jiayuan Chen, Tong Zhang, Changyan Yi, Jun Cai

    Abstract: This paper proposes a novel edge computing enabled real-time video analysis system for intelligent visual devices. The proposed system consists of a tracking-assisted object detection module (TAODM) and a region of interesting module (ROIM). TAODM adaptively determines the offloading decision to process each video frame locally with a tracking algorithm or to offload it to the edge server inferred…

    Submitted 29 February, 2024; originally announced February 2024.

  41. arXiv:2402.11500  [pdf, other]

    cs.GT cs.NI

    A Three-Party Repeated Coalition Formation Game for PLS in Wireless Communications with IRSs

    Authors: Haipeng Zhou, Ruoyang Chen, Changyan Yi, Juan Li, Jun Cai

    Abstract: In this paper, a repeated coalition formation game (RCFG) with dynamic decision-making for physical layer security (PLS) in wireless communications with intelligent reflecting surfaces (IRSs) has been investigated. In the considered system, one central legitimate transmitter (LT) aims to transmit secret signals to a group of legitimate receivers (LRs) under the threat of a proactive eavesdropper (…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: Accepted to IEEE WCNC 2024

  42. arXiv:2401.16710  [pdf, other]

    cs.NI

    Dynamic Human Digital Twin Deployment at the Edge for Task Execution: A Two-Timescale Accuracy-Aware Online Optimization

    Authors: Yuye Yang, You Shi, Changyan Yi, Jun Cai, Jiawen Kang, Dusit Niyato, Xuemin Shen

    Abstract: Human digital twin (HDT) is an emerging paradigm that bridges physical twins (PTs) with powerful virtual twins (VTs) for assisting complex task executions in human-centric services. In this paper, we study a two-timescale online optimization for building HDT under an end-edge-cloud collaborative framework. As a unique feature of HDT, we consider that PTs' corresponding VTs are deployed on edge ser…

    Submitted 29 January, 2024; originally announced January 2024.

  43. arXiv:2401.13699  [pdf, other]

    cs.HC cs.AI cs.LG

    Generative AI-Driven Human Digital Twin in IoT-Healthcare: A Comprehensive Survey

    Authors: Jiayuan Chen, You Shi, Changyan Yi, Hongyang Du, Jiawen Kang, Dusit Niyato

    Abstract: The Internet of things (IoT) can significantly enhance the quality of human life, specifically in healthcare, attracting extensive attention to IoT-healthcare services. Meanwhile, the human digital twin (HDT) is proposed as an innovative paradigm that can comprehensively characterize the replication of the individual human body in the digital world and reflect its physical status in real time. Na…

    Submitted 28 June, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

  44. arXiv:2401.02705  [pdf, other]

    cs.AI

    XUAT-Copilot: Multi-Agent Collaborative System for Automated User Acceptance Testing with Large Language Model

    Authors: Zhitao Wang, Wei Wang, Zirao Li, Long Wang, Can Yi, Xinjie Xu, Luyang Cao, Hanjing Su, Shouzhi Chen, Jun Zhou

    Abstract: In past years, we have been dedicated to automating the user acceptance testing (UAT) process of WeChat Pay, one of the most influential mobile payment applications in China. A system titled XUAT has been developed for this purpose. However, there is still a human-labor-intensive stage, i.e., test script generation, in the current system. Therefore, in this paper, we concentrate on methods of boosting…

    Submitted 10 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

  45. arXiv:2312.12063  [pdf, other]

    cs.NI cs.AI cs.GT

    Resource-efficient Generative Mobile Edge Networks in 6G Era: Fundamentals, Framework and Case Study

    Authors: Bingkun Lai, Jinbo Wen, Jiawen Kang, Hongyang Du, Jiangtian Nie, Changyan Yi, Dong In Kim, Shengli Xie

    Abstract: As the next-generation wireless communication system, Sixth-Generation (6G) technologies are emerging, enabling various mobile edge networks that can revolutionize wireless communication and connectivity. By integrating Generative Artificial Intelligence (GAI) with mobile edge networks, generative mobile edge networks possess immense potential to enhance the intelligence and efficiency of wireless…

    Submitted 19 December, 2023; originally announced December 2023.

  46. arXiv:2312.02896  [pdf, other]

    cs.CV

    BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

    Authors: Rizhao Cai, Zirui Song, Dayan Guan, Zhenhao Chen, Xing Luo, Chenyu Yi, Alex Kot

    Abstract: Large Multimodal Models (LMMs) such as GPT-4V and LLaVA have shown remarkable capabilities in visual reasoning with common image styles. However, their robustness against diverse style shifts, crucial for practical applications, remains largely unexplored. In this paper, we propose a new benchmark, BenchLMM, to assess the robustness of LMMs against three different styles: artistic image style, ima…

    Submitted 5 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Code is available at https://github.com/AIFEG/BenchLMM

  47. arXiv:2310.05341  [pdf, other]

    cs.CV cs.AI

    From Question to Exploration: Test-Time Adaptation in Semantic Segmentation?

    Authors: Chang'an Yi, Haotian Chen, Yifan Zhang, Yonghui Xu, Yan Zhou, Lizhen Cui

    Abstract: Test-time adaptation (TTA) aims to adapt a model, initially trained on training data, to test data with potential distribution shifts. Most existing TTA methods focus on classification problems. The pronounced success of classification might lead numerous newcomers and engineers to assume that classic TTA techniques can be directly applied to the more challenging task of semantic segmentation. How…

    Submitted 31 October, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

  48. arXiv:2309.09984  [pdf]

    q-bio.NC cs.NE

    BDEC: Brain Deep Embedded Clustering model

    Authors: Xiaoxiao Ma, Chunzhi Yi, Zhicai Zhong, Hui Zhou, Baichun Wei, Haiqi Zhu, Feng Jiang

    Abstract: An essential premise for neuroscience brain network analysis is the successful segmentation of the cerebral cortex into functionally homogeneous regions. Resting-state functional magnetic resonance imaging (rs-fMRI), capturing the spontaneous activities of the brain, provides the potential for cortical parcellation. Previous parcellation methods can be roughly categorized into three groups, mainly…

    Submitted 11 September, 2023; originally announced September 2023.

  49. arXiv:2308.09158  [pdf, other]

    cs.LG cs.CL cs.CV

    ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse

    Authors: Yi-Kai Zhang, Lu Ren, Chao Yi, Qi-Wei Wang, De-Chuan Zhan, Han-Jia Ye

    Abstract: The rapid expansion of foundation pre-trained models and their fine-tuned counterparts has significantly contributed to the advancement of machine learning. Leveraging pre-trained models to extract knowledge and expedite learning in real-world tasks, known as "Model Reuse", has become crucial in various applications. Previous research focuses on reusing models within a certain aspect, including re…

    Submitted 17 August, 2023; originally announced August 2023.

  50. arXiv:2307.12115  [pdf, other]

    cs.NI cs.AI cs.LG

    A Revolution of Personalized Healthcare: Enabling Human Digital Twin with Mobile AIGC

    Authors: Jiayuan Chen, Changyan Yi, Hongyang Du, Dusit Niyato, Jiawen Kang, Jun Cai, Xuemin Shen

    Abstract: Mobile Artificial Intelligence-Generated Content (AIGC) technology refers to the adoption of AI algorithms deployed at mobile edge networks to automate the information creation process while fulfilling the requirements of end users. Mobile AIGC has recently attracted phenomenal attention and can be a key enabling technology for an emerging application, called human digital twin (HDT). HDT empower…

    Submitted 22 July, 2023; originally announced July 2023.