Showing 1–50 of 55 results for author: Gui, J

Searching in archive cs.
  1. arXiv:2505.07968 [pdf, other]

    cs.CL

    Assessing and Mitigating Medical Knowledge Drift and Conflicts in Large Language Models

    Authors: Weiyi Wu, Xinwen Xu, Chongyang Gao, Xingjian Diao, Siting Li, Lucas A. Salas, Jiang Gui

    Abstract: Large Language Models (LLMs) have great potential in the field of health care, yet they face great challenges in adapting to rapidly evolving medical knowledge. This can lead to outdated or contradictory treatment suggestions. This study investigated how LLMs respond to evolving clinical guidelines, focusing on concept drift and internal inconsistencies. We developed the DriftMedQA benchmark to si…

    Submitted 12 May, 2025; originally announced May 2025.

  2. arXiv:2504.14800 [pdf, other]

    cs.LG cs.CV

    A Survey on Small Sample Imbalance Problem: Metrics, Feature Analysis, and Solutions

    Authors: Shuxian Zhao, Jie Gui, Minjing Dong, Baosheng Yu, Zhipeng Gui, Lu Dong, Yuan Yan Tang, James Tin-Yau Kwok

    Abstract: The small sample imbalance (S&I) problem is a major challenge in machine learning and data analysis. It is characterized by a small number of samples and an imbalanced class distribution, which leads to poor model performance. In addition, indistinct inter-class feature distributions further complicate classification tasks. Existing methods often rely on algorithmic heuristics without sufficiently…

    Submitted 20 April, 2025; originally announced April 2025.

  3. arXiv:2504.14253 [pdf, other]

    cs.CV

    ColorVein: Colorful Cancelable Vein Biometrics

    Authors: Yifan Wang, Jie Gui, Xinli Shi, Linqing Gui, Yuan Yan Tang, James Tin-Yau Kwok

    Abstract: Vein recognition technologies have become one of the primary solutions for high-security identification systems. However, the issue of biometric information leakage can still pose a serious threat to user privacy and anonymity. Currently, there is no cancelable biometric template generation scheme specifically designed for vein biometrics. Therefore, this paper proposes an innovative cancelable ve…

    Submitted 19 April, 2025; originally announced April 2025.

  4. arXiv:2503.20281 [pdf, other]

    cs.CR cs.AI

    Are We There Yet? Unraveling the State-of-the-Art Graph Network Intrusion Detection Systems

    Authors: Chenglong Wang, Pujia Zheng, Jiaping Gui, Cunqing Hua, Wajih Ul Hassan

    Abstract: Network Intrusion Detection Systems (NIDS) are vital for ensuring enterprise security. Recently, Graph-based NIDS (GIDS) have attracted considerable attention because of their capability to effectively capture the complex relationships within the graph structures of data communications. Despite their promise, the reproducibility and replicability of these GIDS remain largely unexplored, posing cha…

    Submitted 26 March, 2025; originally announced March 2025.

  5. arXiv:2503.13962 [pdf, ps, other]

    cs.CV

    Survey of Adversarial Robustness in Multimodal Large Language Models

    Authors: Chengze Jiang, Zhuangzhuang Wang, Minjing Dong, Jie Gui

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated exceptional performance in artificial intelligence by facilitating integrated understanding across diverse modalities, including text, images, video, audio, and speech. However, their deployment in real-world applications raises significant concerns about adversarial vulnerabilities that could compromise their safety and reliability. Unlik…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: 9 pages

  6. arXiv:2503.12958 [pdf, other]

    cs.CR

    FedSDP: Explainable Differential Privacy in Federated Learning via Shapley Values

    Authors: Yunbo Li, Jiaping Gui, Yue Wu

    Abstract: Federated learning (FL) enables participants to store data locally while collaborating in training, yet it remains vulnerable to privacy attacks, such as data reconstruction. Existing differential privacy (DP) technologies inject noise dynamically into the training process to mitigate the impact of excessive noise. However, this dynamic scheduling is often grounded in factors indirectly related to…

    Submitted 17 March, 2025; originally announced March 2025.

  7. arXiv:2503.03107 [pdf, other]

    cs.CL cs.AI

    External Reliable Information-enhanced Multimodal Contrastive Learning for Fake News Detection

    Authors: Biwei Cao, Qihang Wu, Jiuxin Cao, Bo Liu, Jie Gui

    Abstract: With the rapid development of the Internet, the information dissemination paradigm has changed and its efficiency has improved greatly. However, this also brings the quick spread of fake news and leads to negative impacts on cyberspace. Currently, the information presentation formats have evolved gradually, with the news formats shifting from texts to multimodal contents. As a result, detecting…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: accepted by AAAI'25

  8. arXiv:2502.06710 [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Learning Musical Representations for Music Performance Question Answering

    Authors: Xingjian Diao, Chunhui Zhang, Tingxuan Wu, Ming Cheng, Zhongyu Ouyang, Weiyi Wu, Jiang Gui

    Abstract: Music performances are representative scenarios for audio-visual modeling. Unlike common scenarios with sparse audio, music performances continuously involve dense audio signals throughout. While existing multimodal learning methods on the audio-video QA demonstrate impressive capabilities in general scenarios, they are incapable of dealing with fundamental problems within the music performances:…

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: Accepted at EMNLP 2024

  9. arXiv:2502.06020 [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding

    Authors: Xingjian Diao, Chunhui Zhang, Weiyi Wu, Zhongyu Ouyang, Peijun Qing, Ming Cheng, Soroush Vosoughi, Jiang Gui

    Abstract: Multimodal foundation models (MFMs) have demonstrated significant success in tasks such as visual captioning, question answering, and image-text retrieval. However, these models face inherent limitations due to their finite internal capacity, which restricts their ability to process extended temporal sequences, a crucial requirement for comprehensive video and audio analysis. To overcome these cha…

    Submitted 9 February, 2025; originally announced February 2025.

    Comments: Accepted at NAACL 2025

  10. arXiv:2501.13795 [pdf, other]

    cs.CV

    Training-Free Zero-Shot Temporal Action Detection with Vision-Language Models

    Authors: Chaolei Han, Hongsong Wang, Jidong Kuang, Lei Zhang, Jie Gui

    Abstract: Existing zero-shot temporal action detection (ZSTAD) methods predominantly use fully supervised or unsupervised strategies to recognize unseen activities. However, these training-based methods are prone to domain shifts and require high computational costs, which hinder their practical applicability in real-world scenarios. In this paper, unlike previous works, we propose a training-Free Zero-shot…

    Submitted 23 January, 2025; originally announced January 2025.

  11. Outage Probability Analysis of Uplink Heterogeneous Non-terrestrial Networks: A Novel Stochastic Geometry Model

    Authors: Wen-Yu Dong, Shaoshi Yang, Wei Lin, Wei Zhao, Jia-Xing Gui, Sheng Chen

    Abstract: In harsh environments such as mountainous terrain, dense vegetation areas, or urban landscapes, a single type of unmanned aerial vehicles (UAVs) may encounter challenges like flight restrictions, difficulty in task execution, or increased risk. Therefore, employing multiple types of UAVs, along with satellite assistance, to collaborate becomes essential in such scenarios. In this context, we prese…

    Submitted 23 December, 2024; originally announced December 2024.

    Comments: 5 pages, 6 figures, conference

    Journal ref: in Proc. IEEE Global Communications Conference (GLOBECOM 2024), Cape Town, South Africa, Dec. 8-12, 2024, pp. 2593-2598

  12. arXiv:2412.17210 [pdf, other]

    cs.CV

    Dual Conditioned Motion Diffusion for Pose-Based Video Anomaly Detection

    Authors: Hongsong Wang, Andi Xu, Pinle Ding, Jie Gui

    Abstract: Video Anomaly Detection (VAD) is essential for computer vision research. Existing VAD methods utilize either reconstruction-based or prediction-based frameworks. The former excels at detecting irregular patterns or structures, whereas the latter is capable of spotting abnormal deviations or trends. We address pose-based video anomaly detection and introduce a novel framework called Dual Conditione…

    Submitted 8 March, 2025; v1 submitted 22 December, 2024; originally announced December 2024.

    Comments: Code is on https://github.com/guijiejie/DCMD-main

  13. BackdoorMBTI: A Backdoor Learning Multimodal Benchmark Tool Kit for Backdoor Defense Evaluation

    Authors: Haiyang Yu, Tian Xie, Jiaping Gui, Pengyang Wang, Ping Yi, Yue Wu

    Abstract: Over the past few years, the emergence of backdoor attacks has presented significant challenges to deep learning systems, allowing attackers to insert backdoors into neural networks. When data with a trigger is processed by a backdoor model, it can lead to mispredictions targeted by attackers, whereas normal data yields regular results. The scope of backdoor attacks is expanding beyond computer vi…

    Submitted 6 March, 2025; v1 submitted 17 November, 2024; originally announced November 2024.

  14. arXiv:2410.05500 [pdf, other]

    cs.CV cs.AI cs.LG

    Residual Kolmogorov-Arnold Network for Enhanced Deep Learning

    Authors: Ray Congrui Yu, Sherry Wu, Jiang Gui

    Abstract: Despite their immense success, deep neural networks (CNNs) are costly to train, while modern architectures can retain hundreds of convolutional layers in network depth. Standard convolutional operations are fundamentally limited by their linear nature along with fixed activations, where multiple layers are needed to learn complex patterns, making this approach computationally inefficient and prone…

    Submitted 4 March, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: Code is available at https://github.com/withray/residualKAN.git

    ACM Class: I.2.10; I.4.10; I.4.3; I.4.9

  15. arXiv:2409.19685 [pdf, other]

    cs.CV

    Underwater Organism Color Enhancement via Color Code Decomposition, Adaptation and Interpolation

    Authors: Xiaofeng Cong, Jing Zhang, Yeying Jin, Junming Hou, Yu Zhao, Jie Gui, James Tin-Yau Kwok, Yuan Yan Tang

    Abstract: Underwater images often suffer from quality degradation due to absorption and scattering effects. Most existing underwater image enhancement algorithms produce a single, fixed-color image, limiting user flexibility and application. To address this limitation, we propose a method called ColorCode, which enhances underwater images while offering a range of controllable color outputs. Our ap…

    Submitted 29 September, 2024; originally announced September 2024.

  16. arXiv:2409.17589 [pdf, other]

    cs.CV cs.AI

    Improving Fast Adversarial Training via Self-Knowledge Guidance

    Authors: Chengze Jiang, Junkai Wang, Minjing Dong, Jie Gui, Xinli Shi, Yuan Cao, Yuan Yan Tang, James Tin-Yau Kwok

    Abstract: Adversarial training has achieved remarkable advancements in defending against adversarial attacks. Among them, fast adversarial training (FAT) is gaining attention for its ability to achieve competitive robustness with fewer computing resources. Existing FAT methods typically employ a uniform strategy that optimizes all training data equally without considering the influence of different examples…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: 13 pages

  17. CFVNet: An End-to-End Cancelable Finger Vein Network for Recognition

    Authors: Yifan Wang, Jie Gui, Yuan Yan Tang, James Tin-Yau Kwok

    Abstract: Finger vein recognition technology has become one of the primary solutions for high-security identification systems. However, it still has information leakage problems, which seriously jeopardize users' privacy and anonymity and cause great security risks. In addition, there is no work to consider a fully integrated secure finger vein recognition system. So, different from the previous systems, we…

    Submitted 23 September, 2024; originally announced September 2024.

    Journal ref: in IEEE Transactions on Information Forensics and Security, vol. 19, pp. 7810-7823, 2024

  18. arXiv:2409.14336 [pdf, other]

    cs.CV

    Zero-Shot Skeleton-based Action Recognition with Dual Visual-Text Alignment

    Authors: Jidong Kuang, Hongsong Wang, Chaolei Han, Jie Gui

    Abstract: Zero-shot action recognition, which addresses the issue of scalability and generalization in action recognition and allows the models to adapt to new and unseen actions dynamically, is an important research topic in computer vision communities. The key to zero-shot action recognition lies in aligning visual features with semantic vectors representing action categories. Most existing methods either…

    Submitted 22 September, 2024; originally announced September 2024.

  19. arXiv:2409.06420 [pdf, other]

    eess.IV cs.CV

    Unrevealed Threats: A Comprehensive Study of the Adversarial Robustness of Underwater Image Enhancement Models

    Authors: Siyu Zhai, Zhibo He, Xiaofeng Cong, Junming Hou, Jie Gui, Jian Wei You, Xin Gong, James Tin-Yau Kwok, Yuan Yan Tang

    Abstract: Learning-based methods for underwater image enhancement (UWIE) have undergone extensive exploration. However, learning-based models are usually vulnerable to adversarial examples, and so are UWIE models. To the best of our knowledge, there is no comprehensive study on the adversarial robustness of UWIE models, which indicates that UWIE models are potentially under the threat of adversarial attacks.…

    Submitted 10 September, 2024; originally announced September 2024.

  20. arXiv:2408.17129 [pdf, ps, other]

    cs.LG cs.AI

    Controllable Edge-Type-Specific Interpretation in Multi-Relational Graph Neural Networks for Drug Response Prediction

    Authors: Xiaodi Li, Jianfeng Gui, Qian Gao, Haoyuan Shi, Zhenyu Yue

    Abstract: Graph Neural Networks have been widely applied in critical decision-making areas that demand interpretable predictions, leading to the flourishing development of interpretability algorithms. However, current graph interpretability algorithms tend to emphasize generality and often overlook biological significance, thereby limiting their applicability in predicting cancer drug responses. In this pap…

    Submitted 3 September, 2024; v1 submitted 30 August, 2024; originally announced August 2024.

  21. arXiv:2408.15778 [pdf, other]

    cs.AI cs.CL

    LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models

    Authors: Jiayi Gui, Yiming Liu, Jiale Cheng, Xiaotao Gu, Xiao Liu, Hongning Wang, Yuxiao Dong, Jie Tang, Minlie Huang

    Abstract: Large Language Models (LLMs) have demonstrated notable capabilities across various tasks, showcasing complex problem-solving abilities. Understanding and executing complex rules, along with multi-step planning, are fundamental to logical reasoning and critical for practical LLM agents and decision-making systems. However, evaluating LLMs as effective rule-based executors and planners remains under…

    Submitted 12 October, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

  22. arXiv:2408.03944 [pdf, other]

    cs.CV cs.LG

    Improving Fast Adversarial Training Paradigm: An Example Taxonomy Perspective

    Authors: Jie Gui, Chengze Jiang, Minjing Dong, Kun Tong, Xinli Shi, Yuan Yan Tang, Dacheng Tao

    Abstract: While adversarial training is an effective defense method against adversarial attacks, it notably increases the training cost. To this end, fast adversarial training (FAT) is presented for efficient training and has become a hot research topic. However, FAT suffers from catastrophic overfitting, which leads to a performance drop compared with multi-step adversarial training. However, the cause of…

    Submitted 26 September, 2024; v1 submitted 21 July, 2024; originally announced August 2024.

    Comments: 15 pages

  23. arXiv:2407.09924 [pdf, other]

    cs.CV

    Region-aware Image-based Human Action Retrieval with Transformers

    Authors: Hongsong Wang, Jianhua Zhao, Jie Gui

    Abstract: Human action understanding is a fundamental and challenging task in computer vision. Although there exists tremendous research in this area, most works focus on action recognition, while action retrieval has received less attention. In this paper, we focus on the neglected but important task of image-based action retrieval which aims to find images that depict the same action as a query image. We…

    Submitted 28 July, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

  24. arXiv:2406.12793 [pdf, other]

    cs.CL

    ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

    Authors: Team GLM: Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Dan Zhang, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Jingyu Sun, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, et al. (34 additional authors not shown)

    Abstract: We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained…

    Submitted 29 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  25. arXiv:2406.09333 [pdf, other]

    cs.CV

    Memory-Efficient Sparse Pyramid Attention Networks for Whole Slide Image Analysis

    Authors: Weiyi Wu, Chongyang Gao, Xinwen Xu, Siting Li, Jiang Gui

    Abstract: Whole Slide Images (WSIs) are crucial for modern pathological diagnosis, yet their gigapixel-scale resolutions and sparse informative regions pose significant computational challenges. Traditional dense attention mechanisms, widely used in computer vision and natural language processing, are impractical for WSI analysis due to the substantial data scale and the redundant processing of uninformativ…

    Submitted 13 June, 2024; originally announced June 2024.

  26. arXiv:2405.19684 [pdf, other]

    cs.CV

    A Comprehensive Survey on Underwater Image Enhancement Based on Deep Learning

    Authors: Xiaofeng Cong, Yu Zhao, Jie Gui, Junming Hou, Dacheng Tao

    Abstract: Underwater image enhancement (UIE) presents a significant challenge within computer vision research. Despite the development of numerous UIE algorithms, a thorough and systematic review is still absent. To foster future advancements, we provide a detailed overview of the UIE task from several perspectives. Firstly, we introduce the physical models, data construction processes, evaluation metrics,…

    Submitted 25 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: A survey on the underwater image enhancement task

  27. arXiv:2405.19062 [pdf, other]

    cs.LG cs.AI

    SIG: Efficient Self-Interpretable Graph Neural Network for Continuous-time Dynamic Graphs

    Authors: Lanting Fang, Yulian Yang, Kai Wang, Shanshan Feng, Kaiyu Feng, Jie Gui, Shuliang Wang, Yew-Soon Ong

    Abstract: While dynamic graph neural networks have shown promise in various applications, explaining their predictions on continuous-time dynamic graphs (CTDGs) is difficult. This paper investigates a new research task: self-interpretable GNNs for CTDGs. We aim to predict future links within the dynamic graph while simultaneously providing causal explanations for these predictions. There are two key challen…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 19 pages

  28. arXiv:2405.16086 [pdf, other]

    cs.DC cs.PF

    An Experimental Study of Different Aggregation Schemes in Semi-Asynchronous Federated Learning

    Authors: Yunbo Li, Jiaping Gui, Yue Wu

    Abstract: Federated learning is highly valued due to its high-performance computing in distributed environments while safeguarding data privacy. To address resource heterogeneity, researchers have proposed a semi-asynchronous federated learning (SAFL) architecture. However, the performance gap between different aggregation targets in SAFL remains unexplored. In this paper, we systematically compare the per…

    Submitted 25 May, 2024; originally announced May 2024.

  29. arXiv:2404.13830 [pdf, other]

    cs.CV

    Deep Learning-Based Point Cloud Registration: A Comprehensive Survey and Taxonomy

    Authors: Yu-Xin Zhang, Jie Gui, Baosheng Yu, Xiaofeng Cong, Xin Gong, Wenbing Tao, Dacheng Tao

    Abstract: Point cloud registration involves determining a rigid transformation to align a source point cloud with a target point cloud. This alignment is fundamental in applications such as autonomous driving, robotics, and medical imaging, where precise spatial correspondence is essential. Deep learning has greatly advanced point cloud registration by providing robust and efficient methods that address the…

    Submitted 1 February, 2025; v1 submitted 21 April, 2024; originally announced April 2024.

  30. arXiv:2403.18548 [pdf, other]

    cs.CV

    A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint

    Authors: Xiaofeng Cong, Jie Gui, Jing Zhang, Junming Hou, Hao Shen

    Abstract: Existing research based on deep learning has extensively explored the problem of daytime image dehazing. However, few studies have considered the characteristics of nighttime hazy scenes. There are two distinctions between nighttime and daytime haze. First, there may be multiple active colored light sources with lower illumination intensity in nighttime scenes, which may cause haze, glow and noise…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: This paper is accepted by CVPR2024

  31. arXiv:2312.04065 [pdf, other]

    cs.LG

    A Robust and Efficient Boundary Point Detection Method by Measuring Local Direction Dispersion

    Authors: Dehua Peng, Zhipeng Gui, Jie Gui, Huayi Wu

    Abstract: Boundary point detection aims to outline the external contour structure of clusters and enhance the inter-cluster discrimination, thus bolstering the performance of the downstream classification and clustering tasks. However, existing boundary point detectors are sensitive to density heterogeneity or cannot identify boundary points in concave structures and high-dimensional manifolds. In this work…

    Submitted 25 February, 2025; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: 14 pages, 12 figures, 3 tables

    ACM Class: I.5.2

  32. arXiv:2306.05675 [pdf, other]

    cs.CV

    Illumination Controllable Dehazing Network based on Unsupervised Retinex Embedding

    Authors: Jie Gui, Xiaofeng Cong, Lei He, Yuan Yan Tang, James Tin-Yau Kwok

    Abstract: On the one hand, the dehazing task is an ill-posed problem, which means that no unique solution exists. On the other hand, the dehazing task should take into account the subjective factor, which is to give the user selectable dehazed images rather than a single result. Therefore, this paper proposes a multi-output dehazing network by introducing illumination controllable ability, called IC-Deha…

    Submitted 9 June, 2023; originally announced June 2023.

  33. arXiv:2303.18049 [pdf, other]

    cs.CL

    No Place to Hide: Dual Deep Interaction Channel Network for Fake News Detection based on Data Augmentation

    Authors: Biwei Cao, Lulu Hua, Jiuxin Cao, Jie Gui, Bo Liu, James Tin-Yau Kwok

    Abstract: Online Social Network (OSN) has become a hotbed of fake news due to the low cost of information dissemination. Although the existing methods have made many attempts in news content and propagation structure, the detection of fake news is still facing two challenges: one is how to mine the unique key features and evolution patterns, and the other is how to tackle the problem of small samples to bui…

    Submitted 31 March, 2023; originally announced March 2023.

  34. arXiv:2303.17255 [pdf, other]

    cs.CV cs.CR eess.IV

    Fooling the Image Dehazing Models by First Order Gradient

    Authors: Jie Gui, Xiaofeng Cong, Chengwei Peng, Yuan Yan Tang, James Tin-Yau Kwok

    Abstract: The research on the single image dehazing task has been widely explored. However, as far as we know, no comprehensive study has been conducted on the robustness of the well-trained dehazing models. Therefore, there is no evidence that the dehazing networks can resist malicious attacks. In this paper, we focus on designing a group of attack methods based on first order gradient to verify the robust…

    Submitted 15 February, 2024; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: This paper is accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

  35. arXiv:2301.05712 [pdf, other]

    cs.LG

    A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends

    Authors: Jie Gui, Tuo Chen, Jing Zhang, Qiong Cao, Zhenan Sun, Hao Luo, Dacheng Tao

    Abstract: Deep supervised learning algorithms typically require a large volume of labeled data to achieve satisfactory performance. However, the process of collecting and labeling such data can be expensive and time-consuming. Self-supervised learning (SSL), a subset of unsupervised learning, aims to learn discriminative features from unlabeled data without relying on human-annotated labels. SSL has garnere…

    Submitted 14 July, 2024; v1 submitted 13 January, 2023; originally announced January 2023.

    Comments: This paper is accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  36. Learning the Relation between Similarity Loss and Clustering Loss in Self-Supervised Learning

    Authors: Jidong Ge, Yuxiang Liu, Jie Gui, Lanting Fang, Ming Lin, James Tin-Yau Kwok, LiGuo Huang, Bin Luo

    Abstract: Self-supervised learning enables networks to learn discriminative features from massive data itself. Most state-of-the-art methods maximize the similarity between two augmentations of one image based on contrastive learning. By utilizing the consistency of two augmentations, the burden of manual annotations can be freed. Contrastive learning exploits instance-level information to learn robust feat…

    Submitted 5 June, 2023; v1 submitted 8 January, 2023; originally announced January 2023.

    Comments: This paper is accepted by IEEE Transactions on Image Processing

  37. arXiv:2212.03112 [pdf, other]

    cs.DB cs.IR cs.LG

    Fast Online Hashing with Multi-Label Projection

    Authors: Wenzhe Jia, Yuan Cao, Junwei Liu, Jie Gui

    Abstract: Hashing has been widely researched to solve the large-scale approximate nearest neighbor search problem owing to its time and storage superiority. In recent years, a number of online hashing methods have emerged, which can update the hash functions to adapt to the new stream data and realize dynamic retrieval. However, existing online hashing methods are required to update the whole database with…

    Submitted 2 December, 2022; originally announced December 2022.

    Comments: This paper is accepted by AAAI Conference on Artificial Intelligence (AAAI), 2023

  38. arXiv:2211.15362 [pdf, other]

    cs.CV cs.LG

    Exploring the Coordination of Frequency and Attention in Masked Image Modeling

    Authors: Jie Gui, Tuo Chen, Minjing Dong, Zhengqi Liu, Hao Luo, James Tin-Yau Kwok, Yuan Yan Tang

    Abstract: Recently, masked image modeling (MIM), which learns visual representations by reconstructing the masked patches of an image, has dominated self-supervised learning in computer vision. However, the pre-training of MIM always takes massive time due to the large-scale data and large-size backbones. We mainly attribute it to the random patch masking in previous MIM works, which fails to leverage the c…

    Submitted 28 September, 2024; v1 submitted 28 November, 2022; originally announced November 2022.

  39. AlignVE: Visual Entailment Recognition Based on Alignment Relations

    Authors: Biwei Cao, Jiuxin Cao, Jie Gui, Jiayun Shen, Bo Liu, Lei He, Yuan Yan Tang, James Tin-Yau Kwok

    Abstract: Visual entailment (VE) is to recognize whether the semantics of a hypothesis text can be inferred from the given premise image, which is one special task among recently emerged vision and language understanding tasks. Currently, most of the existing VE approaches are derived from the methods of visual question answering. They recognize visual entailment by quantifying the similarity between the hypo…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: This paper is accepted for publication as a REGULAR paper in the IEEE Transactions on Multimedia

  40. arXiv:2110.10709 [pdf]

    physics.med-ph cs.LG eess.IV

    Predicting Tau Accumulation in Cerebral Cortex with Multivariate MRI Morphometry Measurements, Sparse Coding, and Correntropy

    Authors: Jianfeng Wu, Wenhui Zhu, Yi Su, Jie Gui, Natasha Lepore, Eric M. Reiman, Richard J. Caselli, Paul M. Thompson, Kewei Chen, Yalin Wang

    Abstract: Biomarker-assisted diagnosis and intervention in Alzheimer's disease (AD) may be the key to prevention breakthroughs. One of the hallmarks of AD is the accumulation of tau plaques in the human brain. However, current methods to detect tau pathology are either invasive (lumbar puncture) or quite costly and not widely available (Tau PET). In our previous work, structural MRI-based hippocampal multiv…

    Submitted 20 October, 2021; originally announced October 2021.

    Comments: 10 pages, 5 figures, 17th International Symposium on Medical Information Processing and Analysis

  41. arXiv:2106.06996 [pdf, other]

    eess.IV cs.CV

    Pyramidal Dense Attention Networks for Lightweight Image Super-Resolution

    Authors: Huapeng Wu, Jie Gui, Jun Zhang, James T. Kwok, Zhihui Wei

    Abstract: Recently, deep convolutional neural network methods have achieved an excellent performance in image super-resolution (SR), but they cannot be easily applied to embedded devices due to large memory cost. To solve this problem, we propose a pyramidal dense attention network (PDAN) for lightweight image super-resolution in this paper. In our method, the proposed pyramidal dense learning can gradually…

    Submitted 13 June, 2021; originally announced June 2021.

  42. arXiv:2106.06966 [pdf, other]

    eess.IV cs.CV

    Feedback Pyramid Attention Networks for Single Image Super-Resolution

    Authors: Huapeng Wu, Jie Gui, Jun Zhang, James T. Kwok, Zhihui Wei

    Abstract: Recently, convolutional neural network (CNN) based image super-resolution (SR) methods have achieved significant performance improvement. However, most CNN-based methods mainly focus on feed-forward architecture design and neglect to explore the feedback mechanism, which usually exists in the human visual system. In this paper, we propose feedback pyramid attention networks (FPAN) to fully exploit…

    Submitted 13 June, 2021; originally announced June 2021.

  43. arXiv:2106.03323  [pdf, other]

    cs.CV cs.LG

    A Comprehensive Survey and Taxonomy on Single Image Dehazing Based on Deep Learning

    Authors: Jie Gui, Xiaofeng Cong, Yuan Cao, Wenqi Ren, Jun Zhang, Jing Zhang, Jiuxin Cao, Dacheng Tao

    Abstract: With the development of convolutional neural networks, hundreds of deep learning based dehazing methods have been proposed. In this paper, we provide a comprehensive survey on supervised, semi-supervised, and unsupervised single image dehazing. We first discuss the physical model, datasets, network modules, loss functions, and evaluation metrics that are commonly used. Then, the main contributions…

    Submitted 20 December, 2022; v1 submitted 6 June, 2021; originally announced June 2021.

    Comments: This paper is accepted by ACM Computing Surveys

  44. arXiv:2104.00453  [pdf, ps, other]

    cs.LG math.FA

    Learning Rates for Multi-task Regularization Networks

    Authors: Jie Gui, Haizhang Zhang

    Abstract: Multi-task learning is an important trend in machine learning in the era of artificial intelligence and big data. Despite a large body of research on learning rate estimates of various single-task machine learning algorithms, there is little parallel work for multi-task learning. We present a mathematical analysis of the learning rate estimate of multi-task learning based on the theory of…

    Submitted 28 September, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

  45. arXiv:2103.11590  [pdf, other]

    cs.CV

    Delving into Variance Transmission and Normalization: Shift of Average Gradient Makes the Network Collapse

    Authors: Yuxiang Liu, Jidong Ge, Chuanyi Li, Jie Gui

    Abstract: Normalization operations are essential for state-of-the-art neural networks and enable us to train a network from scratch with a large learning rate (LR). We attempt to explain the real effect of Batch Normalization (BN) from the perspective of variance transmission by investigating the relationship between BN and Weights Normalization (WN). In this work, we demonstrate that the problem of the shi…

    Submitted 22 March, 2021; originally announced March 2021.

    Comments: This paper has been accepted by AAAI21

  46. arXiv:2005.07427  [pdf, other]

    cs.LG cs.SI stat.ML

    Structural Temporal Graph Neural Networks for Anomaly Detection in Dynamic Graphs

    Authors: Lei Cai, Zhengzhang Chen, Chen Luo, Jiaping Gui, Jingchao Ni, Ding Li, Haifeng Chen

    Abstract: Detecting anomalies in dynamic graphs is a vital task, with numerous practical applications in areas such as security, finance, and social media. Previous network embedding based methods have mostly focused on learning good node representations, while largely ignoring the subgraph structural changes related to the target nodes in dynamic graphs. In this paper, we propose StrGNN, an end-to-…

    Submitted 25 May, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

  47. arXiv:2004.01143  [pdf, other]

    stat.ML cs.LG

    Randomized Kernel Multi-view Discriminant Analysis

    Authors: Xiaoyun Li, Jie Gui, Ping Li

    Abstract: In many artificial intelligence and computer vision systems, the same object can be observed from distinct viewpoints or by diverse sensors, which raises challenges for recognizing objects from different, even heterogeneous, views. Multi-view discriminant analysis (MvDA) is an effective multi-view subspace learning method, which finds a discriminant common subspace by jointly learning multiple vi…

    Submitted 2 April, 2020; originally announced April 2020.

  48. arXiv:2001.06937  [pdf, other]

    cs.LG stat.ML

    A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications

    Authors: Jie Gui, Zhenan Sun, Yonggang Wen, Dacheng Tao, Jieping Ye

    Abstract: Generative adversarial networks (GANs) have recently become a hot research topic. GANs have been widely studied since 2014, and a large number of algorithms have been proposed. However, there are few comprehensive studies explaining the connections among different GAN variants and how they have evolved. In this paper, we attempt to provide a review of various GAN methods from the perspectives of algorithm…

    Submitted 19 January, 2020; originally announced January 2020.

  49. arXiv:1912.00398  [pdf, other]

    cs.CL cs.AI cs.LG

    Deep Human Answer Understanding for Natural Reverse QA

    Authors: Rujing Yao, Linlin Hou, Lei Yang, Jie Gui, Qing Yin, Ou Wu

    Abstract: This study focuses on a reverse question answering (QA) procedure, in which machines proactively raise questions and humans supply the answers. This procedure exists in many real human-machine interaction applications. However, a crucial problem in human-machine interaction is answer understanding. The existing solutions have relied on mandatory option term selection to avoid automatic answer unde…

    Submitted 28 November, 2020; v1 submitted 1 December, 2019; originally announced December 2019.

  50. arXiv:1910.08074  [pdf, other]

    cs.CR cs.LG stat.ML

    Heterogeneous Graph Matching Networks

    Authors: Shen Wang, Zhengzhang Chen, Xiao Yu, Ding Li, Jingchao Ni, Lu-An Tang, Jiaping Gui, Zhichun Li, Haifeng Chen, Philip S. Yu

    Abstract: Information systems have widely been the target of malware attacks. Traditional signature-based malicious program detection algorithms can only detect known malware and are prone to evasion techniques such as binary obfuscation, while behavior-based approaches rely heavily on malware training samples and incur prohibitively high training costs. To address the limitations of existing techniques,…

    Submitted 17 October, 2019; originally announced October 2019.