Showing 1–50 of 68 results for author: Chu, Q

  1. arXiv:2507.04600  [pdf, ps, other]

    cs.AI

    DisMS-TS: Eliminating Redundant Multi-Scale Features for Time Series Classification

    Authors: Zhipeng Liu, Peibo Duan, Binwu Wang, Xuan Tang, Qi Chu, Changsheng Zhang, Yongsheng Huang, Bin Zhang

    Abstract: Real-world time series typically exhibit complex temporal variations, making the time series classification task notably challenging. Recent advancements have demonstrated the potential of multi-scale analysis approaches, which provide an effective solution for capturing these complex temporal patterns. However, existing multi-scale analysis-based time series prediction methods fail to eliminate r…

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: This paper has been accepted for presentation at the ACM International Conference on Multimedia (ACM MM 2025)

  2. arXiv:2506.23292  [pdf, ps, other]

    cs.CV

    DDL: A Dataset for Interpretable Deepfake Detection and Localization in Real-World Scenarios

    Authors: Changtao Miao, Yi Zhang, Weize Gao, Man Luo, Weiwei Feng, Zhiya Tan, Jianshu Li, Ajian Liu, Yunfeng Diao, Qi Chu, Tao Gong, Zhe Li, Weibin Yao, Joey Tianyi Zhou

    Abstract: Recent advances in AIGC have exacerbated the misuse of malicious deepfake content, making the development of reliable deepfake detection methods an essential means to address this challenge. Although existing deepfake detection models demonstrate outstanding performance in detection metrics, most methods only provide simple binary classification results, lacking interpretability. In critical domai…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: This paper is a preliminary version, with an extended and comprehensive version currently under development

  3. arXiv:2506.21975  [pdf, ps, other]

    cs.CV

    TASeg: Text-aware RGB-T Semantic Segmentation based on Fine-tuning Vision Foundation Models

    Authors: Meng Yu, Te Cui, Qitong Chu, Wenjie Song, Yi Yang, Yufeng Yue

    Abstract: Reliable semantic segmentation of open environments is essential for intelligent systems, yet significant problems remain: 1) Existing RGB-T semantic segmentation models mainly rely on low-level visual features and lack high-level textual information, which struggle with accurate segmentation when categories share similar visual characteristics. 2) While SAM excels in instance-level segmentation,…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: 6 pages, accepted for publication in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)

  4. arXiv:2506.09553  [pdf, ps, other]

    cs.CV

    GLD-Road: A global-local decoding road network extraction model for remote sensing images

    Authors: Ligao Deng, Yupeng Deng, Yu Meng, Jingbo Chen, Zhihao Xi, Diyou Liu, Qifeng Chu

    Abstract: Road networks are crucial for mapping, autonomous driving, and disaster response. While manual annotation is costly, deep learning offers efficient extraction. Current methods include postprocessing (prone to errors), global parallel (fast but misses nodes), and local iterative (accurate but slow). We propose GLD-Road, a two-stage model combining global efficiency and local precision. First, it de…

    Submitted 11 June, 2025; originally announced June 2025.

  5. arXiv:2506.03710  [pdf, ps, other]

    cs.CV cs.AI

    OSGNet @ Ego4D Episodic Memory Challenge 2025

    Authors: Yisen Feng, Haoyu Zhang, Qiaohui Chu, Meng Liu, Weili Guan, Yaowei Wang, Liqiang Nie

    Abstract: In this report, we present our champion solutions for the three egocentric video localization tracks of the Ego4D Episodic Memory Challenge at CVPR 2025. All tracks require precise localization of the interval within an untrimmed egocentric video. Previous unified video localization approaches often rely on late fusion strategies, which tend to yield suboptimal results. To address this, we adopt a…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: The champion solutions for the three egocentric video localization tracks (Natural Language Queries, Goal Step, and Moment Queries tracks) of the Ego4D Episodic Memory Challenge at CVPR EgoVis Workshop 2025

  6. arXiv:2506.02550  [pdf, ps, other]

    cs.CV cs.AI

    Technical Report for Ego4D Long-Term Action Anticipation Challenge 2025

    Authors: Qiaohui Chu, Haoyu Zhang, Yisen Feng, Meng Liu, Weili Guan, Yaowei Wang, Liqiang Nie

    Abstract: In this report, we present a novel three-stage framework developed for the Ego4D Long-Term Action Anticipation (LTA) task. Inspired by recent advances in foundation models, our method consists of three stages: feature extraction, action recognition, and long-term action anticipation. First, visual features are extracted using a high-performance visual encoder. The features are then fed into a Tran…

    Submitted 11 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

    Comments: The champion solution for the Ego4D Long-Term Action Anticipation Challenge at the CVPR EgoVis Workshop 2025

  7. arXiv:2505.23810  [pdf, ps, other]

    cs.CL cs.AI

    MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation

    Authors: Chenghao Yang, Yinbo Luo, Zhoufutu Wen, Qi Chu, Tao Gong, Longxiang Liu, Kaiyuan Zhang, Jianpeng Jiao, Ge Zhang, Wenhao Huang, Nenghai Yu

    Abstract: Large Language Models (LLMs), e.g. ChatGPT, have been widely adopted in real-world dialogue applications. However, LLMs' robustness, especially in handling long complex dialogue sessions, including frequent motivation transfer, sophisticated cross-turn dependency, is criticized all along. Nevertheless, no existing benchmarks can fully reflect these weaknesses. We present MARS-Benc…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 29 pages, 13 figures

  8. arXiv:2505.20644  [pdf, other]

    cs.CV cs.AI

    HCQA-1.5 @ Ego4D EgoSchema Challenge 2025

    Authors: Haoyu Zhang, Yisen Feng, Qiaohui Chu, Meng Liu, Weili Guan, Yaowei Wang, Liqiang Nie

    Abstract: In this report, we present the method that achieves third place for Ego4D EgoSchema Challenge in CVPR 2025. To improve the reliability of answer prediction in egocentric video question answering, we propose an effective extension to the previously proposed HCQA framework. Our approach introduces a multi-source aggregation strategy to generate diverse predictions, followed by a confidence-based fil…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: The third-place solution for the Ego4D EgoSchema Challenge at the CVPR EgoVis Workshop 2025

  9. arXiv:2505.19459  [pdf, ps, other]

    cs.LG cs.AI

    Your Classifier Can Do More: Towards Bridging the Gaps in Classification, Robustness, and Generation

    Authors: Kaichao Jiang, He Wang, Xiaoshuai Hao, Xiulong Yang, Ajian Liu, Qi Chu, Yunfeng Diao

    Abstract: Joint Energy-based Models (JEMs), a class of hybrid generative-discriminative models, are well known for their ability to achieve both high classification accuracy and generative capability within a single model. However, their robustness still lags significantly behind that of classifiers based on adversarial training (AT). Conversely, while AT is currently the most effective approach to improving the c…

    Submitted 25 May, 2025; originally announced May 2025.

  10. arXiv:2505.13327  [pdf, other]

    cs.CV

    Benchmarking Unified Face Attack Detection via Hierarchical Prompt Tuning

    Authors: Ajian Liu, Haocheng Yuan, Xiao Guo, Hui Ma, Wanyi Zhuang, Changtao Miao, Yan Hong, Chuanbiao Song, Jun Lan, Qi Chu, Tao Gong, Yanyan Liang, Weiqiang Wang, Jun Wan, Xiaoming Liu, Zhen Lei

    Abstract: Presentation Attack Detection and Face Forgery Detection are designed to protect face data from physical media-based Presentation Attacks and digital editing-based DeepFakes respectively. But separate training of these two models makes them vulnerable to unknown attacks and burdens deployment environments. The lack of a Unified Face Attack Detection model to handle both types of attacks is mainly…

    Submitted 19 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  11. arXiv:2503.20294  [pdf, ps, other]

    cs.CV cs.AI

    Context-Aware Weakly Supervised Image Manipulation Localization with SAM Refinement

    Authors: Xinghao Wang, Tao Gong, Qi Chu, Bin Liu, Nenghai Yu

    Abstract: Malicious image manipulation poses societal risks, increasing the importance of effective image manipulation detection methods. Recent approaches in image manipulation detection have largely been driven by fully supervised approaches, which require labor-intensive pixel-level annotations. Thus, it is essential to explore weakly supervised image manipulation localization methods that only require i…

    Submitted 31 March, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

  12. arXiv:2503.09143  [pdf, other]

    cs.CV

    Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding

    Authors: Haoyu Zhang, Qiaohui Chu, Meng Liu, Yunxiao Wang, Bin Wen, Fan Yang, Tingting Gao, Di Zhang, Yaowei Wang, Liqiang Nie

    Abstract: AI personal assistants, deployed through robots or wearables, require embodied understanding to collaborate effectively with humans. Current Multimodal Large Language Models (MLLMs) primarily focus on third-person (exocentric) vision, overlooking the unique aspects of first-person (egocentric) videos. Additionally, high acquisition costs limit data size, impairing MLLM performance. To address thes…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: Project: https://egovisiongroup.github.io/Exo2Ego.github.io/

  13. arXiv:2503.01932  [pdf]

    cond-mat.mtrl-sci cs.LG

    A General Neural Network Potential for Energetic Materials with C, H, N, and O elements

    Authors: Mingjie Wen, Jiahe Han, Wenjuan Li, Xiaoya Chang, Qingzhao Chu, Dongping Chen

    Abstract: The discovery and optimization of high-energy materials (HEMs) are constrained by the prohibitive computational expense and prolonged development cycles inherent in conventional approaches. In this work, we develop a general neural network potential (NNP) that efficiently predicts the structural, mechanical, and decomposition properties of HEMs composed of C, H, N, and O. Our framework leverages p…

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: 41 pages, 16 figures

  14. arXiv:2412.20833  [pdf, ps, other]

    cs.CV cs.MM

    Inclusion 2024 Global Multimedia Deepfake Detection Challenge: Towards Multi-dimensional Face Forgery Detection

    Authors: Yi Zhang, Weize Gao, Changtao Miao, Man Luo, Jianshu Li, Wenzhong Deng, Zhe Li, Bingyu Hu, Weibin Yao, Yunfeng Diao, Wenbo Zhou, Tao Gong, Qi Chu

    Abstract: In this paper, we present the Global Multimedia Deepfake Detection held concurrently with the Inclusion 2024. Our Multimedia Deepfake Detection aims to detect automatic image and audio-video manipulations including but not limited to editing, synthesis, generation, Photoshop, etc. Our challenge has attracted 1500 teams from all over the world, with about 5000 valid result submission counts. We invi…

    Submitted 3 June, 2025; v1 submitted 30 December, 2024; originally announced December 2024.

    Comments: Inclusion 2024 Global Multimedia Deepfake Detection Competition Top Team Technical Report

  15. arXiv:2410.18032  [pdf, other]

    cs.AI cs.CL cs.MA

    GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration

    Authors: Xin Sky Li, Qizhi Chu, Yubin Chen, Yang Liu, Yaoqi Liu, Zekai Yu, Weize Chen, Chen Qian, Chuan Shi, Cheng Yang

    Abstract: Graphs are widely used for modeling relational data in real-world scenarios, such as social networks and urban computing. Existing LLM-based graph analysis approaches either integrate graph neural networks (GNNs) for specific machine learning tasks, limiting their transferability, or rely solely on LLMs' internal reasoning ability, resulting in suboptimal performance. To address these limitations,…

    Submitted 24 February, 2025; v1 submitted 23 October, 2024; originally announced October 2024.

  16. arXiv:2410.13879  [pdf, ps, other]

    cs.LG

    Mixed-curvature decision trees and random forests

    Authors: Philippe Chlenski, Quentin Chu, Raiyan R. Khan, Kaizhu Du, Antonio Khalil Moretti, Itsik Pe'er

    Abstract: Decision trees (DTs) and their random forest (RF) extensions are workhorses of classification and regression in Euclidean spaces. However, algorithms for learning in non-Euclidean spaces are still limited. We extend DT and RF algorithms to product manifolds: Cartesian products of several hyperbolic, hyperspherical, or Euclidean components. Such manifolds handle heterogeneous curvature while still…

    Submitted 6 June, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: 30 pages, 12 figures, 13 tables. Camera-ready version for ICML 2025

  17. arXiv:2410.02330  [pdf, other]

    cs.CL

    Llama SLayer 8B: Shallow Layers Hold the Key to Knowledge Injection

    Authors: Tianxiang Chen, Zhentao Tan, Tao Gong, Yue Wu, Qi Chu, Bin Liu, Jieping Ye, Nenghai Yu

    Abstract: As a manner to augment pre-trained large language models (LLM), knowledge injection is critical to develop vertical domain large models and has been widely studied. Although most current approaches, including parameter-efficient fine-tuning (PEFT) and block expansion methods, uniformly apply knowledge across all LLM layers, it raises the question: are all layers equally crucial for knowledge injec…

    Submitted 3 October, 2024; originally announced October 2024.

  18. arXiv:2409.19667  [pdf, other]

    cs.CL cs.AI

    Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models

    Authors: Xin Sky Li, Weize Chen, Qizhi Chu, Haopeng Li, Zhaojun Sun, Ran Li, Chen Qian, Yiwei Wei, Zhiyuan Liu, Chuan Shi, Maosong Sun, Cheng Yang

    Abstract: The need to analyze graphs is ubiquitous across various fields, from social networks to biological research and recommendation systems. Therefore, enabling the ability of large language models (LLMs) to process graphs is an important step toward more advanced general intelligence. However, current LLM benchmarks on graph analysis require models to directly reason over the prompts describing graph…

    Submitted 24 February, 2025; v1 submitted 29 September, 2024; originally announced September 2024.

    Comments: NeurIPS 2024

  19. arXiv:2408.02306  [pdf, other]

    cs.CV

    Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization

    Authors: Changtao Miao, Qi Chu, Tao Gong, Zhentao Tan, Zhenchao Jin, Wanyi Zhuang, Man Luo, Honggang Hu, Nenghai Yu

    Abstract: With the advancement of face manipulation technology, forgery images in multi-face scenarios are gradually becoming a more complex and realistic challenge. Despite this, detection and localization methods for such multi-face manipulations remain underdeveloped. Traditional manipulation localization methods either indirectly derive detection results from localization masks, resulting in limited det…

    Submitted 5 August, 2024; originally announced August 2024.

  20. arXiv:2406.07457  [pdf, other]

    cs.LG stat.ML

    Estimating the Hallucination Rate of Generative AI

    Authors: Andrew Jesson, Nicolas Beltran-Velez, Quentin Chu, Sweta Karlekar, Jannik Kossen, Yarin Gal, John P. Cunningham, David Blei

    Abstract: This paper presents a method for estimating the hallucination rate for in-context learning (ICL) with generative AI. In ICL, a conditional generative model (CGM) is prompted with a dataset and a prediction question and asked to generate a response. One interpretation of ICL assumes that the CGM computes the posterior predictive of an unknown Bayesian model, which implicitly defines a joint distrib…

    Submitted 8 December, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  21. arXiv:2406.05227   

    cs.LG

    Mixed-Curvature Decision Trees and Random Forests

    Authors: Philippe Chlenski, Quentin Chu, Itsik Pe'er

    Abstract: We extend decision tree and random forest algorithms to product space manifolds: Cartesian products of Euclidean, hyperspherical, and hyperbolic manifolds. Such spaces have extremely expressive geometries capable of representing many arrangements of distances with low metric distortion. To date, all classifiers for product spaces fit a single linear decision boundary, and no regressor has been des…

    Submitted 7 May, 2025; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: This paper has been replaced by a newer version at arXiv:2410.13879

  22. arXiv:2405.06699  [pdf]

    cs.CL cs.AI

    ChatSOS: Vector Database Augmented Generative Question Answering Assistant in Safety Engineering

    Authors: Haiyang Tang, Dongping Chen, Qingzhao Chu

    Abstract: With the rapid advancement of natural language processing technologies, generative artificial intelligence techniques, represented by large language models (LLMs), are gaining increasing prominence and demonstrating significant potential for applications in safety engineering. However, fundamental LLMs face constraints such as limited training data coverage and unreliable responses. This study dev…

    Submitted 8 May, 2024; originally announced May 2024.

  23. arXiv:2404.00513  [pdf, other]

    cs.CV

    Transformer based Pluralistic Image Completion with Reduced Information Loss

    Authors: Qiankun Liu, Yuqi Jiang, Zhentao Tan, Dongdong Chen, Ying Fu, Qi Chu, Gang Hua, Nenghai Yu

    Abstract: Transformer based methods have achieved great success in image inpainting recently. However, we find that these solutions regard each pixel as a token, thus suffering from an information loss issue from two aspects: 1) They downsample the input image into much lower resolutions for efficiency consideration. 2) They quantize $256^3$ RGB values to a small number (such as 512) of quantized color valu…

    Submitted 14 April, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: Accepted by TPAMI (2024). arXiv admin note: text overlap with arXiv:2205.05076

  24. arXiv:2403.18405  [pdf, other]

    cs.AI cs.IR

    Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval

    Authors: Shengjie Ma, Qi Chu, Jiaxin Mao, Xuhui Jiang, Haozhe Duan, Chong Chen

    Abstract: Determining which legal cases are relevant to a given query involves navigating lengthy texts and applying nuanced legal reasoning. Traditionally, this task has demanded significant time and domain expertise to identify key Legal Facts and reach sound juridical conclusions. In addition, existing data with legal case similarities often lack interpretability, making it difficult to understand the ra…

    Submitted 28 May, 2025; v1 submitted 27 March, 2024; originally announced March 2024.

  25. arXiv:2403.02148  [pdf, other]

    cs.CV

    MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection

    Authors: Tianxiang Chen, Zi Ye, Zhentao Tan, Tao Gong, Yue Wu, Qi Chu, Bin Liu, Nenghai Yu, Jieping Ye

    Abstract: Recently, infrared small target detection (ISTD) has made significant progress, thanks to the development of basic models. Specifically, the models combining CNNs with transformers can successfully extract both local and global features. However, the disadvantage of the transformer is also inherited, i.e., the quadratic computational complexity to sequence length. Inspired by the recent basic mode…

    Submitted 24 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: The first Mamba-based model for infrared small target detection

  26. arXiv:2402.02327  [pdf, other]

    cs.CV cs.SD eess.AS

    Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues

    Authors: Tianxiang Chen, Zhentao Tan, Tao Gong, Qi Chu, Yue Wu, Bin Liu, Le Lu, Jieping Ye, Nenghai Yu

    Abstract: How to effectively interact audio with vision has garnered considerable interest within the multi-modality research field. Recently, a novel audio-visual segmentation (AVS) task has been proposed, aiming to segment the sounding objects in video frames under the guidance of audio cues. However, most existing AVS methods are hindered by a modality imbalance where the visual features tend to dominate…

    Submitted 6 February, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

  27. arXiv:2402.02046  [pdf, other]

    cs.CV

    TCI-Former: Thermal Conduction-Inspired Transformer for Infrared Small Target Detection

    Authors: Tianxiang Chen, Zhentao Tan, Qi Chu, Yue Wu, Bin Liu, Nenghai Yu

    Abstract: Infrared small target detection (ISTD) is critical to national security and has been extensively applied in military areas. ISTD aims to segment small target pixels from background. Most ISTD networks focus on designing feature extraction blocks or feature fusion modules, but rarely describe the ISTD process from the feature map evolution perspective. In the ISTD process, the network attention gra…

    Submitted 3 February, 2024; originally announced February 2024.

  28. arXiv:2312.08629  [pdf]

    cs.AI

    ChatSOS: LLM-based knowledge Q&A system for safety engineering

    Authors: Haiyang Tang, Zhenyi Liu, Dongping Chen, Qingzhao Chu

    Abstract: Recent advancements in large language models (LLMs) have notably propelled natural language processing (NLP) capabilities, demonstrating significant potential in safety engineering applications. Despite these advancements, LLMs face constraints in processing specialized tasks, attributed to factors such as corpus size, input processing limitations, and privacy concerns. Obtaining useful informatio…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: in Chinese language

  29. arXiv:2312.02520  [pdf, other]

    cs.CV

    Towards More Unified In-context Visual Understanding

    Authors: Dianmo Sheng, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Tao Gong, Bin Liu, Shengwei Xu, Nenghai Yu

    Abstract: The rapid advancement of large language models (LLMs) has accelerated the emergence of in-context learning (ICL) as a cutting-edge approach in the natural language processing domain. Recently, ICL has been employed in visual understanding tasks, such as semantic segmentation and image captioning, yielding promising results. However, existing visual ICL frameworks cannot enable producing content ac…

    Submitted 16 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR 2024

  30. arXiv:2310.15624  [pdf, other]

    cs.CV cs.LG

    GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection

    Authors: Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Tong He, Yonghui Li, Wanli Ouyang

    Abstract: Geometry plays a significant role in monocular 3D object detection. It can be used to estimate object depth by using the perspective projection between object's physical size and 2D projection in the image plane, which can introduce mathematical priors into deep models. However, this projection process also introduces error amplification, where the error of the estimated height is amplified and re…

    Submitted 6 January, 2025; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: 18 pages, 9 figures

  31. arXiv:2309.16668  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    RealFill: Reference-Driven Generation for Authentic Image Completion

    Authors: Luming Tang, Nataniel Ruiz, Qinghao Chu, Yuanzhen Li, Aleksander Holynski, David E. Jacobs, Bharath Hariharan, Yael Pritch, Neal Wadhwa, Kfir Aberman, Michael Rubinstein

    Abstract: Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions. However, the content these models hallucinate is necessarily inauthentic, since they are unaware of the true scene. In this work, we propose RealFill, a novel generative approach for image completion that fills in missing regions of a…

    Submitted 14 May, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: SIGGRAPH 2024 (Journal Track). Project page: https://realfill.github.io

  32. arXiv:2309.12657  [pdf, other]

    cs.CV

    Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding

    Authors: Jiazhen Wang, Bin Liu, Changtao Miao, Zhiwei Zhao, Wanyi Zhuang, Qi Chu, Nenghai Yu

    Abstract: AI-synthesized text and images have gained significant attention, particularly due to the widespread dissemination of multi-modal manipulations on the internet, which has resulted in numerous negative impacts on society. Existing methods for multi-modal manipulation detection and grounding primarily focus on fusing vision-language features to make predictions, while overlooking the importance of m…

    Submitted 13 January, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Camera-ready version and supplementary material

  33. arXiv:2306.10900  [pdf, other]

    cs.CV cs.AI

    MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators

    Authors: Yaqi Zhang, Di Huang, Bin Liu, Shixiang Tang, Yan Lu, Lu Chen, Lei Bai, Qi Chu, Nenghai Yu, Wanli Ouyang

    Abstract: Generating realistic human motion from given action descriptions has experienced significant advancements because of the emerging requirement of digital humans. While recent works have achieved impressive results in generating motion directly from textual action descriptions, they often support only a single modality of the control signal, which limits their application in the real digital human i…

    Submitted 18 March, 2024; v1 submitted 19 June, 2023; originally announced June 2023.

    Comments: 18 pages, 8 figures, accepted by AAAI 2024

  34. EVOPOSE: A Recursive Transformer For 3D Human Pose Estimation With Kinematic Structure Priors

    Authors: Yaqi Zhang, Yan Lu, Bin Liu, Zhiwei Zhao, Qi Chu, Nenghai Yu

    Abstract: Transformer is popular in recent 3D human pose estimation, which utilizes long-term modeling to lift 2D keypoints into the 3D space. However, current transformer-based methods do not fully exploit the prior knowledge of the human skeleton provided by the kinematic structure. In this paper, we propose a novel transformer-based model EvoPose to introduce the human body prior knowledge for 3D human p…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: 5 pages, 2 figures, 4 tables, published in the proceedings of IEEE ICASSP 2023

  35. arXiv:2306.09008  [pdf, other]

    cs.CV

    Exploring the Application of Large-scale Pre-trained Models on Adverse Weather Removal

    Authors: Zhentao Tan, Yue Wu, Qiankun Liu, Qi Chu, Le Lu, Jieping Ye, Nenghai Yu

    Abstract: Image restoration under adverse weather conditions (e.g., rain, snow and haze) is a fundamental computer vision problem and has important implications for various downstream applications. Different from early methods that are specially designed for a specific type of weather, most recent works tend to remove various adverse weather effects simultaneously through either spatial feature representation…

    Submitted 15 June, 2023; originally announced June 2023.

  36. arXiv:2306.05390  [pdf, other]

    cs.CV

    HQ-50K: A Large-scale, High-quality Dataset for Image Restoration

    Authors: Qinhong Yang, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Lu Yuan, Gang Hua, Nenghai Yu

    Abstract: This paper introduces a new large-scale image restoration dataset, called HQ-50K, which contains 50,000 high-quality images with rich texture details and semantic diversity. We analyze existing image restoration datasets from five different perspectives, including data scale, resolution, compression rates, texture details, and semantic coverage. However, we find that all of these datasets are defi…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: Dataset and code will be available at https://github.com/littleYaang/HQ-50K

  37. arXiv:2305.10794  [pdf, other]

    cs.CV

    Multi-spectral Class Center Network for Face Manipulation Detection and Localization

    Authors: Changtao Miao, Qi Chu, Zhentao Tan, Zhenchao Jin, Tao Gong, Wanyi Zhuang, Yue Wu, Bin Liu, Honggang Hu, Nenghai Yu

    Abstract: As deepfake content proliferates online, advancing face manipulation forensics has become crucial. To combat this emerging threat, previous methods mainly focus on studying how to distinguish authentic and manipulated face images. Although impressive, image-level classification lacks explainability and is limited to specific application scenarios, spurring recent research on pixel-level prediction…

    Submitted 13 July, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Update Version

  38. arXiv:2305.06145  [pdf, other]

    cs.CV

    Clothes-Invariant Feature Learning by Causal Intervention for Clothes-Changing Person Re-identification

    Authors: Xulin Li, Yan Lu, Bin Liu, Yuenan Hou, Yating Liu, Qi Chu, Wanli Ouyang, Nenghai Yu

    Abstract: Clothes-invariant feature extraction is critical to the clothes-changing person re-identification (CC-ReID). It can provide discriminative identity features and eliminate the negative effects caused by the confounder--clothing changes. But we argue that there exists a strong spurious correlation between clothes and human identity, that restricts the common likelihood-based ReID method P(Y|X) to ex…

    Submitted 10 May, 2023; originally announced May 2023.

  39. arXiv:2303.09522  [pdf, other]

    cs.CV cs.CL cs.GR cs.LG

    P+: Extended Textual Conditioning in Text-to-Image Generation

    Authors: Andrey Voynov, Qinghao Chu, Daniel Cohen-Or, Kfir Aberman

    Abstract: We introduce an Extended Textual Conditioning space in text-to-image models, referred to as $P+$. This space consists of multiple textual conditions, derived from per-layer prompts, each corresponding to a layer of the denoising U-net of the diffusion model. We show that the extended space provides greater disentangling and control over image synthesis. We further introduce Extended Textual Inve…

    Submitted 15 July, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

  40. arXiv:2301.04265  [pdf, other]

    cs.CV cs.AI

    Adversarial Alignment for Source Free Object Detection

    Authors: Qiaosong Chu, Shuyan Li, Guangyi Chen, Kai Li, Xiu Li

    Abstract: Source-free object detection (SFOD) aims to transfer a detector pre-trained on a label-rich source domain to an unlabeled target domain without seeing source data. While most existing SFOD methods generate pseudo labels via a source-pretrained model to guide training, these pseudo labels usually contain high noises due to heavy domain discrepancy. In order to obtain better pseudo supervisions, we…

    Submitted 10 January, 2023; originally announced January 2023.

  41. arXiv:2212.03863  [pdf, other]

    cs.CV cs.LG

    X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion

    Authors: Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, Nenghai Yu

    Abstract: Copy-Paste is a simple and effective data augmentation strategy for instance segmentation. By randomly pasting object instances onto new background images, it creates new training data for free and significantly boosts the segmentation performance, especially for rare object categories. Although diverse, high-quality object instances used in Copy-Paste result in more performance gain, previous wor…

    Submitted 31 May, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: ICML 2023, code is available at https://github.com/yoctta/XPaste

  42. arXiv:2210.12752  [pdf, other]

    cs.CV

    UIA-ViT: Unsupervised Inconsistency-Aware Method based on Vision Transformer for Face Forgery Detection

    Authors: Wanyi Zhuang, Qi Chu, Zhentao Tan, Qiankun Liu, Haojie Yuan, Changtao Miao, Zixiang Luo, Nenghai Yu

    Abstract: Intra-frame inconsistency has been proved to be effective for the generalization of face forgery detection. However, learning to focus on these inconsistencies requires extra pixel-level forged location annotations. Acquiring such annotations is non-trivial. Some existing methods generate large-scale synthesized data with location annotations, which is only composed of real images and cannot capture…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: accepted by ECCV 2022 (oral)

  43. arXiv:2208.00967  [pdf, other]

    cs.CV

    Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification

    Authors: Xulin Li, Yan Lu, Bin Liu, Yating Liu, Guojun Yin, Qi Chu, Jinyang Huang, Feng Zhu, Rui Zhao, Nenghai Yu

    Abstract: Graph-based models have recently achieved great success in person re-identification tasks. They first compute the graph topology structure (affinities) among different people and then pass information across them to obtain stronger features. But we find that existing graph-based methods in the visible-infrared person re-identification task (VI-ReID) suffer from bad generalization because of two i…

    Submitted 14 November, 2022; v1 submitted 1 August, 2022; originally announced August 2022.

  44. arXiv:2207.03776  [pdf, other]

    cs.CV

    Towards Intrinsic Common Discriminative Features Learning for Face Forgery Detection using Adversarial Learning

    Authors: Wanyi Zhuang, Qi Chu, Haojie Yuan, Changtao Miao, Bin Liu, Nenghai Yu

    Abstract: Existing face forgery detection methods usually treat face forgery detection as a binary classification problem and adopt deep convolutional neural networks to learn discriminative features. The ideal discriminative features should be only related to the real/fake labels of facial images. However, we observe that the features learned by vanilla classification networks are correlated with unnecessary p…

    Submitted 8 July, 2022; originally announced July 2022.

  45. arXiv:2205.05076  [pdf, other]

    cs.CV cs.GR

    Reduce Information Loss in Transformers for Pluralistic Image Inpainting

    Authors: Qiankun Liu, Zhentao Tan, Dongdong Chen, Qi Chu, Xiyang Dai, Yinpeng Chen, Mengchen Liu, Lu Yuan, Nenghai Yu

    Abstract: Transformers have recently achieved great success in pluralistic image inpainting. However, we find that existing transformer-based solutions regard each pixel as a token and thus suffer from information loss in two respects: 1) They downsample the input image into much lower resolutions for efficiency, incurring information loss and extra misalignment for the boundaries of masked reg…

    Submitted 15 May, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

    Comments: CVPR 2022, code is available at https://github.com/liuqk3/PUT

  46. arXiv:2201.01297  [pdf, other]

    cs.CV

    Online Multi-Object Tracking with Unsupervised Re-Identification Learning and Occlusion Estimation

    Authors: Qiankun Liu, Dongdong Chen, Qi Chu, Lu Yuan, Bin Liu, Lei Zhang, Nenghai Yu

    Abstract: Occlusion between different objects is a typical challenge in Multi-Object Tracking (MOT), which often leads to inferior tracking results because occluded objects go undetected. The common practice in multi-object tracking is re-identifying the missed objects after their reappearance. Though tracking performance can be boosted by the re-identification, the annotation of identity is required to train…

    Submitted 4 January, 2022; originally announced January 2022.

    Comments: To Appear at Neurocomputing 2022

  47. arXiv:2110.09510  [pdf, other]

    cs.CV cs.LG

    Unsupervised Finetuning

    Authors: Suichan Li, Dongdong Chen, Yinpeng Chen, Lu Yuan, Lei Zhang, Qi Chu, Bin Liu, Nenghai Yu

    Abstract: This paper studies "unsupervised finetuning", the symmetrical problem of the well-known "supervised finetuning". Given a pretrained model and small-scale unlabeled target data, unsupervised finetuning aims to adapt the representation pretrained from the source domain to the target domain so that better transfer performance can be obtained. This problem is more challenging than the supervised counter…

    Submitted 18 October, 2021; originally announced October 2021.

  48. arXiv:2109.03495  [pdf, other]

    cs.CV

    Temporal RoI Align for Video Object Recognition

    Authors: Tao Gong, Kai Chen, Xinjiang Wang, Qi Chu, Feng Zhu, Dahua Lin, Nenghai Yu, Huamin Feng

    Abstract: Video object detection is challenging in the presence of appearance deterioration in certain video frames. Therefore, it is a natural choice to aggregate temporal information from other frames of the same video into the current frame. However, RoI Align, one of the core procedures of video detectors, still extracts features from a single-frame feature map for proposals, making th…

    Submitted 10 September, 2021; v1 submitted 8 September, 2021; originally announced September 2021.

    Comments: Accepted by AAAI 2021

  49. arXiv:2108.12382  [pdf, other]

    cs.CV

    ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation

    Authors: Zhenchao Jin, Bin Liu, Qi Chu, Nenghai Yu

    Abstract: Co-occurring visual patterns make aggregating contextual information a common paradigm to enhance the pixel representation for semantic image segmentation. The existing approaches focus on modeling the context from the perspective of the whole image, i.e., aggregating the image-level contextual information. Although impressive, these methods weaken the significance of the pixel representations of t…

    Submitted 27 August, 2021; originally announced August 2021.

    Comments: Accepted by ICCV2021

  50. arXiv:2108.11819  [pdf, other]

    cs.CV

    Mining Contextual Information Beyond Image for Semantic Segmentation

    Authors: Zhenchao Jin, Tao Gong, Dongdong Yu, Qi Chu, Jian Wang, Changhu Wang, Jie Shao

    Abstract: This paper studies the context aggregation problem in semantic image segmentation. Existing research focuses on improving the pixel representations by aggregating the contextual information within individual images. Though impressive, these methods neglect the significance of the representations of the pixels of the corresponding class beyond the input image. To address this, this paper propos…

    Submitted 26 August, 2021; originally announced August 2021.

    Comments: Accepted by ICCV2021