Showing 1–50 of 51 results for author: Fei, J

Searching in archive cs.
  1. arXiv:2509.25811  [pdf, ps, other]

    cs.CV cs.LG

    Logo-VGR: Visual Grounded Reasoning for Open-world Logo Recognition

    Authors: Zichen Liang, Jingjing Fei, Jie Wang, Zheming Yang, Changqing Li, Pei Wu, Minghui Qiu, Fei Yang, Xialei Liu

    Abstract: Recent advances in multimodal large language models (MLLMs) have been primarily evaluated on general-purpose benchmarks, while their applications in domain-specific scenarios, such as intelligent product moderation, remain underexplored. To address this gap, we introduce an open-world logo recognition benchmark, a core challenge in product moderation. Unlike traditional logo recognition methods th…

    Submitted 30 September, 2025; originally announced September 2025.

  2. arXiv:2509.22640  [pdf, ps, other]

    quant-ph cs.DS math-ph math.RT

    High-dimensional quantum Schur transforms

    Authors: Adam Burchardt, Jiani Fei, Dmitry Grinko, Martin Larocca, Maris Ozols, Sydney Timmerman, Vladyslav Visnevskyi

    Abstract: The quantum Schur transform has become a foundational quantum algorithm, yet even after two decades since the seminal 2005 paper by Bacon, Chuang, and Harrow (BCH), some aspects of the transform remain insufficiently understood. Moreover, an alternative approach proposed by Krovi in 2018 was recently found to contain a crucial error. In this paper, we present a corrected version of Krovi's algorit…

    Submitted 26 September, 2025; originally announced September 2025.

  3. arXiv:2508.05658  [pdf, ps, other]

    cs.CR cs.CV cs.MM

    Universally Unfiltered and Unseen:Input-Agnostic Multimodal Jailbreaks against Text-to-Image Model Safeguards

    Authors: Song Yan, Hui Wei, Jinlong Fei, Guoliang Yang, Zhengyu Zhao, Zheng Wang

    Abstract: Various (text) prompt filters and (image) safety checkers have been implemented to mitigate the misuse of Text-to-Image (T2I) models in creating Not-Safe-For-Work (NSFW) content. In order to expose potential security vulnerabilities of such safeguards, multimodal jailbreaks have been studied. However, existing jailbreaks are limited to prompt-specific and image-specific perturbations, which suffer…

    Submitted 11 August, 2025; v1 submitted 30 July, 2025; originally announced August 2025.

    Comments: This paper has been accepted by ACM MM 2025

  4. arXiv:2508.05502  [pdf, ps, other]

    cs.CV cs.CL

    MELLA: Bridging Linguistic Capability and Cultural Groundedness for Low-Resource Language MLLMs

    Authors: Yufei Gao, Jiaying Fei, Nuo Chen, Ruirui Chen, Guohang Yan, Yunshi Lan, Botian Shi

    Abstract: Multimodal Large Language Models (MLLMs) have shown remarkable performance in high-resource languages. However, their effectiveness diminishes significantly in the contexts of low-resource languages. Current multilingual enhancement methods are often limited to text modality or rely solely on machine translation. While such approaches help models acquire basic linguistic capabilities and produce "…

    Submitted 7 August, 2025; originally announced August 2025.

  5. arXiv:2506.07016  [pdf, ps, other]

    cs.CV cs.AI

    MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks

    Authors: Sanjoy Chowdhury, Mohamed Elmoghany, Yohan Abeysinghe, Junjie Fei, Sayan Nag, Salman Khan, Mohamed Elhoseiny, Dinesh Manocha

    Abstract: Large multimodal models (LMMs) have shown remarkable progress in audio-visual understanding, yet they struggle with real-world scenarios that require complex reasoning across extensive video collections. Existing benchmarks for video question answering remain limited in scope, typically involving one clip per query, which falls short of representing the challenges of large-scale, audio-visual retr…

    Submitted 13 June, 2025; v1 submitted 8 June, 2025; originally announced June 2025.

    Comments: Audio-visual learning, Audio-Visual RAG, Multi-Video Linkage

  6. arXiv:2503.19065  [pdf, ps, other]

    cs.CV

    WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation

    Authors: Zhongyu Yang, Jun Chen, Dannong Xu, Junjie Fei, Xiaoqian Shen, Liangbing Zhao, Chun-Mei Feng, Mohamed Elhoseiny

    Abstract: Knowledge discovery and collection are intelligence-intensive tasks that traditionally require significant human effort to ensure high-quality outputs. Recent research has explored multi-agent frameworks for automating Wikipedia-style article generation by retrieving and synthesizing information from the internet. However, these methods primarily focus on text-only generation, overlooking the impo…

    Submitted 5 September, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

    Comments: ICCV 2025, Project in https://wikiautogen.github.io/

  7. arXiv:2503.14853  [pdf, ps, other]

    cs.CV

    Unlocking the Capabilities of Large Vision-Language Models for Generalizable and Explainable Deepfake Detection

    Authors: Peipeng Yu, Jianwei Fei, Hui Gao, Xuan Feng, Zhihua Xia, Chip Hong Chang

    Abstract: Current Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in understanding multimodal data, but their potential remains underexplored for deepfake detection due to the misalignment of their knowledge and forensics patterns. To this end, we present a novel framework that unlocks LVLMs' potential capabilities for deepfake detection. Our framework includes a Knowledge-gui…

    Submitted 7 June, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

    Comments: Accepted by ICML 2025

  8. arXiv:2502.18274  [pdf, other]

    cs.AI cs.CL

    Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support

    Authors: Guoxin Wang, Minyu Gao, Shuai Yang, Ya Zhang, Lizhi He, Liang Huang, Hanlin Xiao, Yexuan Zhang, Wanyue Li, Lu Chen, Jintao Fei, Xin Li

    Abstract: Large language models (LLMs), particularly those with reasoning capabilities, have rapidly advanced in recent years, demonstrating significant potential across a wide range of applications. However, their deployment in healthcare, especially in disease reasoning tasks, is hindered by the challenge of acquiring expert-level cognitive data. In this paper, we introduce Citrus, a medical language mode…

    Submitted 25 February, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

  9. arXiv:2502.04515  [pdf, other]

    cs.LG cs.AI

    MedGNN: Towards Multi-resolution Spatiotemporal Graph Learning for Medical Time Series Classification

    Authors: Wei Fan, Jingru Fei, Dingyu Guo, Kun Yi, Xiaozhuang Song, Haolong Xiang, Hangting Ye, Min Li

    Abstract: Medical time series has been playing a vital role in real-world healthcare systems as valuable information in monitoring health conditions of patients. Accurate classification for medical time series, e.g., Electrocardiography (ECG) signals, can help for early detection and diagnosis. Traditional methods towards medical time series classification rely on handcrafted feature extraction and statisti…

    Submitted 6 February, 2025; originally announced February 2025.

    Comments: Accepted by WWW 2025

  10. arXiv:2501.17216  [pdf, other]

    cs.LG

    Amplifier: Bringing Attention to Neglected Low-Energy Components in Time Series Forecasting

    Authors: Jingru Fei, Kun Yi, Wei Fan, Qi Zhang, Zhendong Niu

    Abstract: We propose an energy amplification technique to address the issue that existing models easily overlook low-energy components in time series forecasting. This technique comprises an energy amplification block and an energy restoration block. The energy amplification block enhances the energy of low-energy components to improve the model's learning efficiency for these components, while the energy r…

    Submitted 22 February, 2025; v1 submitted 28 January, 2025; originally announced January 2025.

    Comments: Accepted by AAAI 2025

  11. arXiv:2412.08984  [pdf, other]

    q-bio.QM cs.LG

    Predicting Emergency Department Visits for Patients with Type II Diabetes

    Authors: Javad M Alizadeh, Jay S Patel, Gabriel Tajeu, Yuzhou Chen, Ilene L Hollin, Mukesh K Patel, Junchao Fei, Huanmei Wu

    Abstract: Over 30 million Americans are affected by Type II diabetes (T2D), a treatable condition with significant health risks. This study aims to develop and validate predictive models using machine learning (ML) techniques to estimate emergency department (ED) visits among patients with T2D. Data for these patients was obtained from the HealthShare Exchange (HSX), focusing on demographic details, diagnos…

    Submitted 12 December, 2024; originally announced December 2024.

    Comments: This manuscript has been accepted and presented at AI-PHSS 2024: The 2024 International Workshop on AI Applications in Public Health and Social Services in conjunction with the 22nd International Conference of Artificial Intelligence in Medicine (AIME 2024)

  12. arXiv:2412.07260  [pdf, other]

    cs.CV

    DFREC: DeepFake Identity Recovery Based on Identity-aware Masked Autoencoder

    Authors: Peipeng Yu, Hui Gao, Jianwei Fei, Zhitao Huang, Zhihua Xia, Chip-Hong Chang

    Abstract: Recent advances in deepfake forensics have primarily focused on improving the classification accuracy and generalization performance. Despite enormous progress in detection accuracy across a wide variety of forgery algorithms, existing algorithms lack intuitive interpretability and identity traceability to help with forensic investigation. In this paper, we introduce a novel DeepFake Identity Reco…

    Submitted 5 March, 2025; v1 submitted 10 December, 2024; originally announced December 2024.

  13. arXiv:2411.16740  [pdf, other]

    cs.CV cs.AI

    Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents

    Authors: Jun Chen, Dannong Xu, Junjie Fei, Chun-Mei Feng, Mohamed Elhoseiny

    Abstract: Large multimodal models (LMMs) have achieved impressive progress in vision-language understanding, yet they face limitations in real-world applications requiring complex reasoning over a large number of images. Existing benchmarks for multi-image question-answering are limited in scope, each question is paired with only up to 30 images, which does not fully capture the demands of large-scale retri…

    Submitted 6 December, 2024; v1 submitted 23 November, 2024; originally announced November 2024.

    Comments: the correct arxiv version

  14. arXiv:2411.09209  [pdf, other]

    cs.CV

    JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation

    Authors: Xuyang Cao, Guoxin Wang, Sheng Shi, Jun Zhao, Yang Yao, Jintao Fei, Minyu Gao

    Abstract: Audio-driven portrait animation has made significant advances with diffusion-based models, improving video quality and lipsync accuracy. However, the increasing complexity of these models has led to inefficiencies in training and inference, as well as constraints on video length and inter-frame continuity. In this paper, we propose JoyVASA, a diffusion-based method for generating facial dynamics a…

    Submitted 27 November, 2024; v1 submitted 14 November, 2024; originally announced November 2024.

  15. arXiv:2411.01623  [pdf, other]

    cs.LG cs.AI eess.SP

    FilterNet: Harnessing Frequency Filters for Time Series Forecasting

    Authors: Kun Yi, Jingru Fei, Qi Zhang, Hui He, Shufeng Hao, Defu Lian, Wei Fan

    Abstract: While numerous forecasters have been proposed using different network architectures, the Transformer-based models have state-of-the-art performance in time series forecasting. However, forecasters based on Transformers are still suffering from vulnerability to high-frequency signals, efficiency in computation, and bottleneck in full-spectrum utilization, which essentially are the cornerstones for…

    Submitted 4 November, 2024; v1 submitted 3 November, 2024; originally announced November 2024.

    Comments: Accepted by NeurIPS 2024

  16. arXiv:2410.18094  [pdf, other]

    q-bio.QM cs.AI cs.LG eess.SP

    Self-supervised inter-intra period-aware ECG representation learning for detecting atrial fibrillation

    Authors: Xiangqian Zhu, Mengnan Shi, Xuexin Yu, Chang Liu, Xiaocong Lian, Jintao Fei, Jiangying Luo, Xin Jin, Ping Zhang, Xiangyang Ji

    Abstract: Atrial fibrillation is a commonly encountered clinical arrhythmia associated with stroke and increased mortality. Since professional medical knowledge is required for annotation, exploiting a large corpus of ECGs to develop accurate supervised learning-based atrial fibrillation algorithms remains challenging. Self-supervised learning (SSL) is a promising recipe for generalized ECG representation l…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: Preprint submitted to Biomedical Signal Processing and Control

  17. MHAD: Multimodal Home Activity Dataset with Multi-Angle Videos and Synchronized Physiological Signals

    Authors: Lei Yu, Jintao Fei, Xinyi Liu, Yang Yao, Jun Zhao, Guoxin Wang, Xin Li

    Abstract: Video-based physiology, exemplified by remote photoplethysmography (rPPG), extracts physiological signals such as pulse and respiration by analyzing subtle changes in video recordings. This non-contact, real-time monitoring method holds great potential for home settings. Despite the valuable contributions of public benchmark datasets to this technology, there is currently no dataset specifically d…

    Submitted 14 September, 2024; originally announced September 2024.

  18. arXiv:2408.14023  [pdf, other]

    cs.CV cs.AI

    Video-CCAM: Enhancing Video-Language Understanding with Causal Cross-Attention Masks for Short and Long Videos

    Authors: Jiajun Fei, Dian Li, Zhidong Deng, Zekun Wang, Gang Liu, Hui Wang

    Abstract: Multi-modal large language models (MLLMs) have demonstrated considerable potential across various downstream tasks that require cross-domain knowledge. MLLMs capable of processing videos, known as Video-MLLMs, have attracted broad interest in video-language understanding. However, videos, especially long videos, contain more visual tokens than images, making them difficult for LLMs to process. Exi…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures

  19. arXiv:2405.18937  [pdf, ps, other]

    cs.CV cs.CL

    Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description

    Authors: Mahmoud Ahmed, Junjie Fei, Jian Ding, Eslam Mohamed Bakr, Mohamed Elhoseiny

    Abstract: In this paper, we introduce Part-Aware Point Grounded Description (PaPGD), a challenging task aimed at advancing 3D multimodal learning for fine-grained, part-aware segmentation grounding and detailed explanation of 3D objects. Existing 3D datasets largely focus on either vision-only part segmentation or vision-language scene segmentation, lacking the fine-grained multimodal segmentation needed fo…

    Submitted 4 August, 2025; v1 submitted 29 May, 2024; originally announced May 2024.

  20. arXiv:2311.05478  [pdf, other]

    cs.CV eess.IV

    Robust Retraining-free GAN Fingerprinting via Personalized Normalization

    Authors: Jianwei Fei, Zhihua Xia, Benedetta Tondi, Mauro Barni

    Abstract: In recent years, there has been significant growth in the commercial applications of generative models, licensed and distributed by model developers to users, who in turn use them to offer services. In this scenario, there is a need to track and identify the responsible user in the presence of a violation of the license agreement or any kind of malicious usage. Although there are methods enabling…

    Submitted 9 November, 2023; originally announced November 2023.

  21. arXiv:2310.16919  [pdf, other]

    cs.CV cs.AI

    Wide Flat Minimum Watermarking for Robust Ownership Verification of GANs

    Authors: Jianwei Fei, Zhihua Xia, Benedetta Tondi, Mauro Barni

    Abstract: We propose a novel multi-bit box-free watermarking method for the protection of Intellectual Property Rights (IPR) of GANs with improved robustness against white-box attacks like fine-tuning, pruning, quantization, and surrogate model attacks. The watermark is embedded by adding an extra watermarking loss term during GAN training, ensuring that the images generated by the GAN contain an invisible…

    Submitted 25 October, 2023; originally announced October 2023.

  22. arXiv:2307.16525  [pdf, other]

    cs.CV cs.CL

    Transferable Decoding with Visual Entities for Zero-Shot Image Captioning

    Authors: Junjie Fei, Teng Wang, Jinrui Zhang, Zhenyu He, Chengjie Wang, Feng Zheng

    Abstract: Image-to-text generation aims to describe images using natural language. Recently, zero-shot image captioning based on pre-trained vision-language models (VLMs) and large language models (LLMs) has made significant progress. However, we have observed and empirically demonstrated that these methods are susceptible to modality bias induced by LLMs and tend to generate descriptions containing objects…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: Accepted by ICCV 2023

  23. arXiv:2306.02061  [pdf, other]

    cs.CV

    Balancing Logit Variation for Long-tailed Semantic Segmentation

    Authors: Yuchao Wang, Jingjing Fei, Haochen Wang, Wei Li, Tianpeng Bao, Liwei Wu, Rui Zhao, Yujun Shen

    Abstract: Semantic segmentation usually suffers from a long-tail data distribution. Due to the imbalanced number of samples across categories, the features of those tail classes may get squeezed into a narrow area in the feature space. Towards a balanced feature distribution, we introduce category-wise variation into the network predictions in the training phase such that an instance is no longer projected…

    Submitted 3 June, 2023; originally announced June 2023.

  24. arXiv:2305.13752  [pdf, other]

    cs.CV

    Pulling Target to Source: A New Perspective on Domain Adaptive Semantic Segmentation

    Authors: Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Liwei Wu, Yuxi Wang, Zhaoxiang Zhang

    Abstract: Domain adaptive semantic segmentation aims to transfer knowledge from a labeled source domain to an unlabeled target domain. However, existing methods primarily focus on directly learning qualified target features, making it challenging to guarantee their discrimination in the absence of target labels. This work provides a new perspective. We observe that the features learned with source data mana…

    Submitted 23 October, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted by IJCV

  25. arXiv:2305.02677  [pdf, other]

    cs.CV

    Caption Anything: Interactive Image Description with Diverse Multimodal Controls

    Authors: Teng Wang, Jinrui Zhang, Junjie Fei, Hao Zheng, Yunlong Tang, Zhe Li, Mingqi Gao, Shanshan Zhao

    Abstract: Controllable image captioning is an emerging multimodal topic that aims to describe the image with natural language following human purpose, $\textit{e.g.}$, looking at the specified regions or telling in a particular text style. State-of-the-art methods are trained on annotated pairs of input controls and output captions. However, the scarcity of such well-annotated multimodal data largely limits…

    Submitted 6 July, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: Tech-report

  26. arXiv:2301.12178  [pdf, other]

    cs.AI

    MVKT-ECG: Efficient Single-lead ECG Classification on Multi-Label Arrhythmia by Multi-View Knowledge Transferring

    Authors: Yuzhen Qin, Li Sun, Hui Chen, Wei-qiang Zhang, Wenming Yang, Jintao Fei, Guijin Wang

    Abstract: The widespread emergence of smart devices for ECG has sparked demand for intelligent single-lead ECG-based diagnostic systems. However, it is challenging to develop a single-lead-based ECG interpretation model for multiple diseases diagnosis due to the lack of some key disease information. In this work, we propose inter-lead Multi-View Knowledge Transferring of ECG (MVKT-ECG) to boost single-lead…

    Submitted 28 January, 2023; originally announced January 2023.

  27. arXiv:2212.14309   

    cs.CV

    Learning to mask: Towards generalized face forgery detection

    Authors: Jianwei Fei, Yunshu Dai, Huaming Wang, Zhihua Xia

    Abstract: Generalizability to unseen forgery types is crucial for face forgery detectors. Recent works have made significant progress in terms of generalization by synthetic forgery data augmentation. In this work, we explore another path for improving the generalization. Our goal is to reduce the features that are easy to learn in the training phase, so as to reduce the risk of overfitting on specific forg…

    Submitted 18 November, 2024; v1 submitted 29 December, 2022; originally announced December 2022.

    Comments: Incorrect experimental setting

  28. arXiv:2212.13466  [pdf, other]

    cs.CV

    General GAN-generated image detection by data augmentation in fingerprint domain

    Authors: Huaming Wang, Jianwei Fei, Yunshu Dai, Lingyun Leng, Zhihua Xia

    Abstract: In this work, we investigate improving the generalizability of GAN-generated image detectors by performing data augmentation in the fingerprint domain. Specifically, we first separate the fingerprints and contents of the GAN-generated images using an autoencoder based GAN fingerprint extractor, followed by random perturbations of the fingerprints. Then the original fingerprints are substituted wit…

    Submitted 9 April, 2023; v1 submitted 27 December, 2022; originally announced December 2022.

  29. arXiv:2211.13968  [pdf, other]

    cs.CV

    MIAD: A Maintenance Inspection Dataset for Unsupervised Anomaly Detection

    Authors: Tianpeng Bao, Jiadong Chen, Wei Li, Xiang Wang, Jingjing Fei, Liwei Wu, Rui Zhao, Ye Zheng

    Abstract: Visual anomaly detection plays a crucial role in not only manufacturing inspection to find defects of products during manufacturing processes, but also maintenance inspection to keep equipment in optimum working condition particularly outdoors. Due to the scarcity of the defective samples, unsupervised anomaly detection has attracted great attention in recent years. However, existing datasets for…

    Submitted 28 November, 2022; v1 submitted 25 November, 2022; originally announced November 2022.

  30. arXiv:2211.07052  [pdf, other]

    cs.LG

    Treatment-RSPN: Recurrent Sum-Product Networks for Sequential Treatment Regimes

    Authors: Adam Dejl, Harsh Deep, Jonathan Fei, Ardavan Saeedi, Li-wei H. Lehman

    Abstract: Sum-product networks (SPNs) have recently emerged as a novel deep learning architecture enabling highly efficient probabilistic inference. Since their introduction, SPNs have been applied to a wide range of data modalities and extended to time-sequence data. In this paper, we propose a general framework for modelling sequential treatment decision-making behaviour and treatment response using recur…

    Submitted 13 November, 2022; originally announced November 2022.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 14 pages

    ACM Class: G.3; I.2

  31. arXiv:2209.15490  [pdf, other]

    cs.CV

    Learning Second Order Local Anomaly for General Face Forgery Detection

    Authors: Jianwei Fei, Yunshu Dai, Peipeng Yu, Tianrun Shen, Zhihua Xia, Jian Weng

    Abstract: In this work, we propose a novel method to improve the generalization ability of CNN-based face forgery detectors. Our method considers the feature anomalies of forged faces caused by the prevalent blending operations in face forgery algorithms. Specifically, we propose a weakly supervised Second Order Local Anomaly (SOLA) learning module to mine anomalies in local regions using deep feature maps.…

    Submitted 30 September, 2022; originally announced September 2022.

  32. arXiv:2209.09434  [pdf, other]

    cs.AR

    BP-Im2col: Implicit Im2col Supporting AI Backpropagation on Systolic Arrays

    Authors: Jianchao Yang, Mei Wen, Junzhong Shen, Yasong Cao, Minjin Tang, Renyu Yang, Jiawei Fei, Chunyuan Zhang

    Abstract: State-of-the-art systolic array-based accelerators adopt the traditional im2col algorithm to accelerate the inference of convolutional layers. However, traditional im2col cannot efficiently support AI backpropagation. Backpropagation in convolutional layers involves performing transposed convolution and dilated convolution, which usually introduces plenty of zero-spaces into the feature map or ker…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: Accepted in ICCD 2022, The 40th IEEE International Conference on Computer Design

  33. arXiv:2209.07237  [pdf, other]

    cs.CV

    Robust Implementation of Foreground Extraction and Vessel Segmentation for X-ray Coronary Angiography Image Sequence

    Authors: Zeyu Fu, Zhuang Fu, Chenzhuo Lu, Jun Yan, Jian Fei, Hui Han

    Abstract: The extraction of contrast-filled vessels from X-ray coronary angiography (XCA) image sequence has important clinical significance for intuitively diagnosis and therapy. In this study, the XCA image sequence is regarded as a 3D tensor input, the vessel layer is regarded as a sparse tensor, and the background layer is regarded as a low-rank tensor. Using tensor nuclear norm (TNN) minimization, a no…

    Submitted 27 February, 2023; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: 34pages, 14figures, 5tables

  34. arXiv:2209.06993  [pdf, other]

    cs.CV

    Learning from Future: A Novel Self-Training Framework for Semantic Segmentation

    Authors: Ye Du, Yujun Shen, Haochen Wang, Jingjing Fei, Wei Li, Liwei Wu, Rui Zhao, Zehua Fu, Qingjie Liu

    Abstract: Self-training has shown great potential in semi-supervised learning. Its core idea is to use the model learned on labeled data to generate pseudo-labels for unlabeled samples, and in turn teach itself. To obtain valid supervision, active attempts typically employ a momentum teacher for pseudo-label prediction yet observe the confirmation bias issue, where the incorrect predictions may provide wron…

    Submitted 18 September, 2022; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: Accepted to NeurIPS 2022

  35. arXiv:2209.03466  [pdf, other]

    cs.CV cs.AI

    Supervised GAN Watermarking for Intellectual Property Protection

    Authors: Jianwei Fei, Zhihua Xia, Benedetta Tondi, Mauro Barni

    Abstract: We propose a watermarking method for protecting the Intellectual Property (IP) of Generative Adversarial Networks (GANs). The aim is to watermark the GAN model so that any image generated by the GAN contains an invisible watermark (signature), whose presence inside the image can be checked at a later stage for ownership verification. To achieve this goal, a pre-trained CNN watermarking decoding bl…

    Submitted 7 September, 2022; originally announced September 2022.

  36. arXiv:2206.11476  [pdf, other]

    cs.CV

    Dynamic Scene Deblurring Based on Continuous Cross-Layer Attention Transmission

    Authors: Xia Hua, Mingxin Li, Junxiong Fei, Yu Shi, JianGuo Liu, Hanyu Hong

    Abstract: The deep convolutional neural networks (CNNs) using attention mechanism have achieved great success for dynamic scene deblurring. In most of these networks, only the features refined by the attention maps can be passed to the next layer and the attention maps of different layers are separated from each other, which does not make full use of the attention information from different layers in the CN…

    Submitted 28 January, 2023; v1 submitted 23 June, 2022; originally announced June 2022.

  37. DuMLP-Pin: A Dual-MLP-dot-product Permutation-invariant Network for Set Feature Extraction

    Authors: Jiajun Fei, Ziyu Zhu, Wenlei Liu, Zhidong Deng, Mingyang Li, Huanjun Deng, Shuo Zhang

    Abstract: Existing permutation-invariant methods can be divided into two categories according to the aggregation scope, i.e. global aggregation and local one. Although the global aggregation methods, e. g., PointNet and Deep Sets, get involved in simpler structures, their performance is poorer than the local aggregation ones like PointNet++ and Point Transformer. It remains an open problem whether there exi…

    Submitted 30 August, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: 16 pages, accepted by AAAI 2022 (https://ojs.aaai.org/index.php/AAAI/article/view/19939), with technical appendix

  38. arXiv:2203.03884  [pdf, other]

    cs.CV

    Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

    Authors: Yuchao Wang, Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Guoqiang Jin, Liwei Wu, Rui Zhao, Xinyi Le

    Abstract: The crux of semi-supervised semantic segmentation is to assign adequate pseudo-labels to the pixels of unlabeled images. A common practice is to select the highly confident predictions as the pseudo ground-truth, but it leads to a problem that most pixels may be left unused due to their unreliability. We argue that every pixel matters to the model training, even its prediction is ambiguous. Intuit…

    Submitted 14 March, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022. Project: https://haochen-wang409.github.io/U2PL/

  39. arXiv:2202.13067  [pdf, other]

    cs.CV

    A Robust Document Image Watermarking Scheme using Deep Neural Network

    Authors: Sulong Ge, Zhihua Xia, Jianwei Fei, Xingming Sun, Jian Weng

    Abstract: Watermarking is an important copyright protection technology which generally embeds the identity information into the carrier imperceptibly. Then the identity can be extracted to prove the copyright from the watermarked carrier even after suffering various attacks. Most of the existing watermarking technologies take the nature images as carriers. Different from the natural images, document images…

    Submitted 26 February, 2022; originally announced February 2022.

  40. arXiv:2112.06095  [pdf, other]

    cs.NI cs.DC

    Unlocking the Power of Inline Floating-Point Operations on Programmable Switches

    Authors: Yifan Yuan, Omar Alama, Amedeo Sapio, Jiawei Fei, Jacob Nelson, Dan R. K. Ports, Marco Canini, Nam Sung Kim

    Abstract: The advent of switches with programmable dataplanes has enabled the rapid development of new network functionality, as well as providing a platform for acceleration of a broad range of application-level functionality. However, existing switch hardware was not designed with application acceleration in mind, and thus applications requiring operations or datatypes not used in traditional network prot…

    Submitted 11 December, 2021; originally announced December 2021.

    Comments: This paper has been accepted by NSDI'22. This arxiv paper is not the final camera-ready version

  41. arXiv:2108.08166  [pdf, other]

    cs.CV cs.RO

    Deployment of Deep Neural Networks for Object Detection on Edge AI Devices with Runtime Optimization

    Authors: Lukas Stäcker, Juncong Fei, Philipp Heidenreich, Frank Bonarens, Jason Rambach, Didier Stricker, Christoph Stiller

    Abstract: Deep neural networks have proven increasingly important for automotive scene understanding with new algorithms offering constant improvements of the detection performance. However, there is little emphasis on experiences and needs for deployment in embedded environments. We therefore perform a case study of the deployment of two representative object detection networks on an edge AI platform. In p…

    Submitted 18 August, 2021; originally announced August 2021.

    Comments: To be presented at ICCV 2021 (ERCVAD Workshop)

  42. arXiv:2107.00346  [pdf, other]

    cs.CV cs.RO

    MASS: Multi-Attentional Semantic Segmentation of LiDAR Data for Dense Top-View Understanding

    Authors: Kunyu Peng, Juncong Fei, Kailun Yang, Alina Roitberg, Jiaming Zhang, Frank Bieder, Philipp Heidenreich, Christoph Stiller, Rainer Stiefelhagen

    Abstract: At the heart of all automated driving systems is the ability to sense the surroundings, e.g., through semantic segmentation of LiDAR sequences, which has seen remarkable progress due to the release of large datasets such as SemanticKITTI and nuScenes-LidarSeg. While most previous works focus on sparse segmentation of the LiDAR input, dense output masks provide self-driving cars with almost co…

    Submitted 20 January, 2022; v1 submitted 1 July, 2021; originally announced July 2021.

    Comments: Accepted to IEEE Transactions on Intelligent Transportation Systems (T-ITS). Code is publicly available at https://github.com/KPeng9510/MASS

  43. arXiv:2105.04169  [pdf, other]

    cs.CV cs.RO

    PillarSegNet: Pillar-based Semantic Grid Map Estimation using Sparse LiDAR Data

    Authors: Juncong Fei, Kunyu Peng, Philipp Heidenreich, Frank Bieder, Christoph Stiller

    Abstract: Semantic understanding of the surrounding environment is essential for automated vehicles. The recent publication of the SemanticKITTI dataset stimulates the research on semantic segmentation of LiDAR point clouds in urban scenarios. While most existing approaches predict sparse pointwise semantic classes for the sparse input LiDAR scan, we propose PillarSegNet to be able to output a dense semanti…

    Submitted 5 July, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

    Comments: Accepted for presentation at the 2021 IEEE Intelligent Vehicles Symposium (IV21)

  44. SemanticVoxels: Sequential Fusion for 3D Pedestrian Detection using LiDAR Point Cloud and Semantic Segmentation

    Authors: Juncong Fei, Wenbo Chen, Philipp Heidenreich, Sascha Wirges, Christoph Stiller

    Abstract: 3D pedestrian detection is a challenging task in automated driving because pedestrians are relatively small, frequently occluded and easily confused with narrow vertical objects. LiDAR and camera are two commonly used sensor modalities for this task, which should provide complementary information. Unexpectedly, LiDAR-only detection methods tend to outperform multisensor fusion methods in public be…

    Submitted 25 September, 2020; originally announced September 2020.

    Comments: Accepted for presentation at the 2020 IEEE International Conference on Multisensor Fusion and Integration (MFI 2020)

  45. arXiv:2007.13902  [pdf, other]

    cs.CY cs.LG econ.GN stat.AP

    Leveraging the Power of Place: A Data-Driven Decision Helper to Improve the Location Decisions of Economic Immigrants

    Authors: Jeremy Ferwerda, Nicholas Adams-Cohen, Kirk Bansak, Jennifer Fei, Duncan Lawrence, Jeremy M. Weinstein, Jens Hainmueller

    Abstract: A growing number of countries have established programs to attract immigrants who can contribute to their economy. Research suggests that an immigrant's initial arrival location plays a key role in shaping their economic success. Yet immigrants currently lack access to personalized information that would help them identify optimal destinations. Instead, they often rely on availability heuristics,…

    Submitted 27 July, 2020; originally announced July 2020.

    Comments: 51 pages (including appendix), 13 figures. Immigration Policy Lab (IPL) Working Paper Series, Working Paper No. 20-06

  46. arXiv:2002.11573  [pdf, other]

    cs.RO cs.AI

    Efficient reinforcement learning control for continuum robots based on Inexplicit Prior Knowledge

    Authors: Junjia Liu, Jiaying Shou, Zhuang Fu, Hangfei Zhou, Rongli Xie, Jun Zhang, Jian Fei, Yanna Zhao

    Abstract: Compared to rigid robots that are generally studied in reinforcement learning, the physical characteristics of some sophisticated robots such as soft or continuum robots are considerably more complicated. Moreover, recent reinforcement learning methods are data-inefficient and cannot be directly deployed to the robot without simulation. In this paper, we propose an efficient reinforcement learning method ba…

    Submitted 2 October, 2020; v1 submitted 26 February, 2020; originally announced February 2020.

    Comments: 11 pages, 12 figures

  47. arXiv:1906.03647  [pdf, other]

    stat.ML cs.LG

    A Variant of Gaussian Process Dynamical Systems

    Authors: Jing Zhao, Jingjing Fei, Shiliang Sun

    Abstract: In order to better model high-dimensional sequential data, we propose a collaborative multi-output Gaussian process dynamical system (CGPDS), which is a novel variant of GPDSs. The proposed model assumes that the output on each dimension is controlled by a shared global latent process and a private local latent process. Thus, the dependence among different dimensions of the sequences can be captur…

    Submitted 9 June, 2019; originally announced June 2019.

    Comments: Technical Report, East China Normal University, November 2018

  48. arXiv:1905.05761  [pdf, ps, other]

    cs.LG stat.ML

    Online Anomaly Detection with Sparse Gaussian Processes

    Authors: Jingjing Fei, Shiliang Sun

    Abstract: Online anomaly detection of time-series data is an important and challenging task in machine learning. Gaussian processes (GPs) are powerful and flexible models for modeling time-series data. However, the high time complexity of GPs limits their applications in online anomaly detection. Owing to internal or external changes, concept drift usually occurs in time-series data, where the cha…

    Submitted 14 May, 2019; originally announced May 2019.

  49. arXiv:1901.05571  [pdf, other]

    cs.NI

    Metaflow: A DAG-Based Network Abstraction for Distributed Applications

    Authors: Jiawei Fei, Yang Shi, Qun Huang, Mei Wen

    Abstract: In the past decade, an increasing number of network scheduling techniques have been proposed to boost distributed application performance. Flow-level metrics, such as flow completion time (FCT), are based on the abstraction of flows, yet they cannot capture the semantics of communication in a cluster application. Being aware of this problem, coflow is proposed as a new network abstraction. However, it is…

    Submitted 16 January, 2019; originally announced January 2019.

  50. arXiv:1311.3105  [pdf]

    cs.NI

    k-DAG Based Lifetime Aware Data Collection in Wireless Sensor Networks

    Authors: Jingjing Fei, Hui Wu, Yongxin Wang

    Abstract: Wireless Sensor Networks need to be organized for efficient data collection and lifetime maximization. In this paper, we propose a novel routing structure, namely k-DAG, to balance the load of the base station's neighbours while providing the worst-case latency guarantee for data collection, and a distributed algorithm for constructing a k-DAG based on a SPD (Shortest Path DAG). In a k-DAG, the le…

    Submitted 13 November, 2013; originally announced November 2013.

    Comments: 17 pages, 10 figures

    Journal ref: International Journal of Wireless & Mobile Networks (IJWMN) Vol. 5, No. 5, October 2013, pp.17-33