Showing 1–46 of 46 results for author: Fang, P

Searching in archive cs.
  1. arXiv:2506.05952  [pdf, ps, other]

    cs.CV cs.AI

    MOGO: Residual Quantized Hierarchical Causal Transformer for High-Quality and Real-Time 3D Human Motion Generation

    Authors: Dongjie Fu, Tengjiao Sun, Pengcheng Fang, Xiaohao Cai, Hansung Kim

    Abstract: Recent advances in transformer-based text-to-motion generation have led to impressive progress in synthesizing high-quality human motion. Nevertheless, jointly achieving high fidelity, streaming capability, real-time responsiveness, and scalability remains a fundamental challenge. In this paper, we propose MOGO (Motion Generation with One-pass), a novel autoregressive framework tailored for effici…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: 9 pages, 4 figures, conference

  2. arXiv:2504.13534  [pdf, other]

    cs.CL cs.AI

    CoT-RAG: Integrating Chain of Thought and Retrieval-Augmented Generation to Enhance Reasoning in Large Language Models

    Authors: Feiyang Li, Peng Fang, Zhan Shi, Arijit Khan, Fang Wang, Dan Feng, Weihao Wang, Xin Zhang, Yongjian Cui

    Abstract: Chain-of-thought (CoT) reasoning boosts large language models' (LLMs) performance on complex tasks but faces two key limitations: a lack of reliability when solely relying on LLM-generated reasoning chains and interference from natural language reasoning steps with the models' inference process, also known as the inference logic of LLMs. To address these issues, we propose CoT-RAG, a novel reasoni…

    Submitted 18 May, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

  3. arXiv:2412.11017  [pdf, other]

    cs.LG cs.CV

    On Distilling the Displacement Knowledge for Few-Shot Class-Incremental Learning

    Authors: Pengfei Fang, Yongchun Qin, Hui Xue

    Abstract: Few-shot Class-Incremental Learning (FSCIL) addresses the challenges of evolving data distributions and the difficulty of data acquisition in real-world scenarios. To counteract the catastrophic forgetting typically encountered in FSCIL, knowledge distillation is employed as a way to maintain the knowledge from learned data distribution. Recognizing the limitations of generating discriminative fea…

    Submitted 17 December, 2024; v1 submitted 14 December, 2024; originally announced December 2024.

  4. arXiv:2412.10900  [pdf, other]

    cs.LG cs.CV

    PEARL: Input-Agnostic Prompt Enhancement with Negative Feedback Regulation for Class-Incremental Learning

    Authors: Yongchun Qin, Pengfei Fang, Hui Xue

    Abstract: Class-incremental learning (CIL) aims to continuously introduce novel categories into a classification system without forgetting previously learned ones, thus adapting to evolving data distributions. Researchers are currently focusing on leveraging the rich semantic information of pre-trained models (PTMs) in CIL tasks. Prompt learning has been adopted in CIL for its ability to adjust data distrib…

    Submitted 25 December, 2024; v1 submitted 14 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI-25

  5. arXiv:2412.09073  [pdf, other]

    cs.CV cs.LG

    SVasP: Self-Versatility Adversarial Style Perturbation for Cross-Domain Few-Shot Learning

    Authors: Wenqian Li, Pengfei Fang, Hui Xue

    Abstract: Cross-Domain Few-Shot Learning (CD-FSL) aims to transfer knowledge from seen source domains to unseen target domains, which is crucial for evaluating the generalization and robustness of models. Recent studies focus on utilizing visual styles to bridge the domain gap between different domains. However, the serious dilemma of gradient instability and local optimization problem occurs in those style…

    Submitted 12 December, 2024; originally announced December 2024.

  6. arXiv:2411.11244  [pdf, other]

    cs.GR cs.CG cs.PF

    gDist: Efficient Distance Computation between 3D Meshes on GPU

    Authors: Peng Fang, Wei Wang, Ruofeng Tong, Hailong Li, Min Tang

    Abstract: Computing maximum/minimum distances between 3D meshes is crucial for various applications, e.g., robotics, CAD, and VR/AR. In this work, we introduce a highly parallel algorithm (gDist) optimized for Graphics Processing Units (GPUs), which is capable of computing the distance between two meshes with over 15 million triangles in less than 0.4 milliseconds (Fig. 1). By testing on benchmarks with va…

    Submitted 17 November, 2024; originally announced November 2024.

  7. HorGait: A Hybrid Model for Accurate Gait Recognition in LiDAR Point Cloud Planar Projections

    Authors: Jiaxing Hao, Yanxi Wang, Zhigang Chang, Hongmin Gao, Zihao Cheng, Chen Wu, Xin Zhao, Peiye Fang, Rachmat Muwardi

    Abstract: Gait recognition is a remote biometric technology that utilizes the dynamic characteristics of human movement to identify individuals even under various extreme lighting conditions. Due to the limitation in spatial perception capability inherent in 2D gait representations, LiDAR can directly capture 3D gait features and represent them as point clouds, reducing environmental and lighting interferen…

    Submitted 23 October, 2024; v1 submitted 10 October, 2024; originally announced October 2024.

  8. arXiv:2408.09722  [pdf, other]

    cs.LG stat.ML

    Towards Few-Shot Learning in the Open World: A Review and Beyond

    Authors: Hui Xue, Yuexuan An, Yongchun Qin, Wenqian Li, Yixin Wu, Yongjuan Che, Pengfei Fang, Minling Zhang

    Abstract: Human intelligence is characterized by our ability to absorb and apply knowledge from the world around us, especially in rapidly acquiring new concepts from minimal examples, underpinned by prior knowledge. Few-shot learning (FSL) aims to mimic this capacity by enabling significant generalizations and transferability. However, traditional FSL frameworks often rely on assumptions of clean, complete…

    Submitted 19 August, 2024; originally announced August 2024.

  9. arXiv:2407.20962  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

    Authors: Xiaowei Chi, Yatian Wang, Aosong Cheng, Pengjun Fang, Zeyue Tian, Yingqing He, Zhaoyang Liu, Xingqun Qi, Jiahao Pan, Rongyu Zhang, Mengfei Li, Ruibin Yuan, Yanbing Jiang, Wei Xue, Wenhan Luo, Qifeng Chen, Shanghang Zhang, Qifeng Liu, Yike Guo

    Abstract: Massive multi-modality datasets play a significant role in facilitating the success of large video-language models. However, current video-language datasets primarily provide text descriptions for visual frames, considering audio to be weakly related information. They usually overlook exploring the potential of inherent audio-visual correlation, leading to monotonous annotation within each modalit…

    Submitted 17 December, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: 15 Pages. Dataset report

  10. arXiv:2401.14832  [pdf, other]

    cs.CV

    Text Image Inpainting via Global Structure-Guided Diffusion Models

    Authors: Shipeng Zhu, Pengfei Fang, Chenjie Zhu, Zuoyan Zhao, Qiang Xu, Hui Xue

    Abstract: Real-world text can be damaged by corrosion issues caused by environmental or human factors, which hinder the preservation of the complete styles of texts, e.g., texture and structure. These corrosion issues, such as graffiti signs and incomplete signatures, bring difficulties in understanding the texts, thereby posing significant challenges to downstream applications, e.g., scene text recognition…

    Submitted 1 August, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI-24

  11. arXiv:2401.00430  [pdf, other]

    cs.AI

    Brain-Conditional Multimodal Synthesis: A Survey and Taxonomy

    Authors: Weijian Mai, Jian Zhang, Pengfei Fang, Zhijun Zhang

    Abstract: In the era of Artificial Intelligence Generated Content (AIGC), conditional multimodal synthesis technologies (e.g., text-to-image, text-to-video, text-to-audio, etc.) are gradually reshaping the natural content in the real world. The key to multimodal synthesis technology is to establish the mapping relationship between different modalities. Brain signals, serving as potential reflections of how t…

    Submitted 3 January, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

  12. arXiv:2311.17955  [pdf, other]

    cs.CV

    PEAN: A Diffusion-Based Prior-Enhanced Attention Network for Scene Text Image Super-Resolution

    Authors: Zuoyan Zhao, Hui Xue, Pengfei Fang, Shipeng Zhu

    Abstract: Scene text image super-resolution (STISR) aims at simultaneously increasing the resolution and readability of low-resolution scene text images, thus boosting the performance of the downstream recognition task. Two factors in scene text images, visual structure and semantic information, affect the recognition performance significantly. To mitigate the effects from these factors, this paper proposes…

    Submitted 23 July, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: Accepted by ACMMM 2024

  13. arXiv:2308.12558  [pdf, other]

    cs.CV

    Hyperbolic Audio-visual Zero-shot Learning

    Authors: Jie Hong, Zeeshan Hayder, Junlin Han, Pengfei Fang, Mehrtash Harandi, Lars Petersson

    Abstract: Audio-visual zero-shot learning aims to classify samples consisting of a pair of corresponding audio and video sequences from classes that are not present during training. An analysis of the audio-visual data reveals a large degree of hyperbolicity, indicating the potential benefit of using a hyperbolic transformation to achieve curvature-aware geometric learning, with the aim of exploring more co…

    Submitted 16 December, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  14. arXiv:2306.07106  [pdf, other]

    cs.LG cs.AI cs.GT cs.IR

    Adversarial Constrained Bidding via Minimax Regret Optimization with Causality-Aware Reinforcement Learning

    Authors: Haozhe Wang, Chao Du, Panyan Fang, Li He, Liang Wang, Bo Zheng

    Abstract: The proliferation of the Internet has led to the emergence of online advertising, driven by the mechanics of online auctions. In these repeated auctions, software agents participate on behalf of aggregated advertisers to optimize for their long-term utility. To fulfill the diverse demands, bidding strategies are employed to optimize advertising objectives subject to different spending constraints.…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: Accepted by SIGKDD2023

  15. arXiv:2305.12691  [pdf, other]

    cs.CV

    Hi-ResNet: Edge Detail Enhancement for High-Resolution Remote Sensing Segmentation

    Authors: Yuxia Chen, Pengcheng Fang, Jianhui Yu, Xiaoling Zhong, Xiaoming Zhang, Tianrui Li

    Abstract: High-resolution remote sensing (HRS) semantic segmentation extracts key objects from high-resolution coverage areas. However, objects of the same category within HRS images generally show significant differences in scale and shape across diverse geographical environments, making it difficult to fit the data distribution. Additionally, a complex background environment causes similar appearances of…

    Submitted 14 August, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

  16. arXiv:2304.10764  [pdf, other]

    cs.CV

    Hyperbolic Geometry in Computer Vision: A Survey

    Authors: Pengfei Fang, Mehrtash Harandi, Trung Le, Dinh Phung

    Abstract: Hyperbolic geometry, a Riemannian manifold endowed with constant sectional negative curvature, has been considered an alternative embedding space in many learning scenarios, e.g., natural language processing, graph learning, etc., as a result of its intriguing property of encoding the data's hierarchical structure (like irregular graph or tree-likeness data). Recent studies prove that such data hie…

    Submitted 21 April, 2023; originally announced April 2023.

    Comments: First survey paper for the hyperbolic geometry in CV applications

  17. arXiv:2303.15702  [pdf, ps, other]

    cs.DC cs.LG

    Distributed Graph Embedding with Information-Oriented Random Walks

    Authors: Peng Fang, Arijit Khan, Siqiang Luo, Fang Wang, Dan Feng, Zhenli Li, Wei Yin, Yuchao Cao

    Abstract: Graph embedding maps graph nodes to low-dimensional vectors, and is widely adopted in machine learning tasks. The increasing availability of billion-edge graphs underscores the importance of learning efficient and effective embeddings on large graphs, such as link prediction on Twitter with over one billion edges. Most existing graph embedding methods fall short of reaching high data scalability.…

    Submitted 25 February, 2024; v1 submitted 27 March, 2023; originally announced March 2023.

    Journal ref: 49th International Conference on Very Large Data Bases (VLDB 2023), Vancouver, Canada - August 28 to September 1, 2023

  18. arXiv:2302.10414  [pdf, other]

    cs.CV

    Improving Scene Text Image Super-resolution via Dual Prior Modulation Network

    Authors: Shipeng Zhu, Zuoyan Zhao, Pengfei Fang, Hui Xue

    Abstract: Scene text image super-resolution (STISR) aims to simultaneously increase the resolution and legibility of the text images, and the resulting images will significantly affect the performance of downstream tasks. Although much progress has been made, existing approaches raise two crucial issues: (1) They neglect the global structure of the text, which bounds the semantic determinism of the scen…

    Submitted 29 November, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

    Comments: Accepted by AAAI-2023

  19. arXiv:2301.08170  [pdf, other]

    cs.LG cs.AI cs.CR

    On the Vulnerability of Backdoor Defenses for Federated Learning

    Authors: Pei Fang, Jinghui Chen

    Abstract: Federated Learning (FL) is a popular distributed machine learning paradigm that enables jointly training a global model without sharing clients' data. However, its repetitive server-client communication leaves room for backdoor attacks that aim to mislead the global model into a targeted misprediction when a specific trigger pattern is presented. In response to such backdoor threats on federated le…

    Submitted 19 January, 2023; originally announced January 2023.

    Comments: Accepted by AAAI 2023 (15 pages, 12 figures, 7 tables)

  20. arXiv:2211.07625  [pdf, other]

    cs.CV cs.AI cs.LG

    What Images are More Memorable to Machines?

    Authors: Junlin Han, Huangying Zhan, Jie Hong, Pengfei Fang, Hongdong Li, Lars Petersson, Ian Reid

    Abstract: This paper studies the problem of measuring and predicting how memorable an image is to pattern recognition machines, as a path to explore machine intelligence. Firstly, we propose a self-supervised machine memory quantification pipeline, dubbed "MachineMem measurer", to collect machine memorability scores of images. Similar to humans, machines also tend to memorize certain kinds of images, wher…

    Submitted 11 July, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

    Comments: Code: https://github.com/JunlinHan/MachineMem Project page: https://junlinhan.github.io/projects/machinemem.html

  21. arXiv:2209.14026  [pdf, other]

    cs.RO cs.HC

    Human-in-the-loop Robotic Grasping using BERT Scene Representation

    Authors: Yaoxian Song, Penglei Sun, Pengfei Fang, Linyi Yang, Yanghua Xiao, Yue Zhang

    Abstract: Current NLP techniques have been widely applied in different domains. In this paper, we propose a human-in-the-loop framework for robotic grasping in cluttered scenes, investigating a language interface to the grasping process, which allows the user to intervene by natural language commands. This framework is constructed on a state-of-the-art grasping baseline, where we substitute a scene-graph re…

    Submitted 28 September, 2022; originally announced September 2022.

    Comments: 15 pages, 10 figures, Coling2022 Oral

  22. arXiv:2208.01188  [pdf, other]

    cs.CV

    Curved Geometric Networks for Visual Anomaly Recognition

    Authors: Jie Hong, Pengfei Fang, Weihao Li, Junlin Han, Lars Petersson, Mehrtash Harandi

    Abstract: Learning a latent embedding to understand the underlying nature of data distribution is often formulated in Euclidean spaces with zero curvature. However, the success of the geometry constraints, posed in the embedding space, indicates that curved spaces might encode more structural information, leading to better discriminative power and hence richer representations. In this work, we investigate b…

    Submitted 1 August, 2022; originally announced August 2022.

  23. arXiv:2206.06803  [pdf, other]

    cs.CV eess.IV

    Asymmetric Dual-Decoder U-Net for Joint Rain and Haze Removal

    Authors: Yuan Feng, Yaojun Hu, Pengfei Fang, Yanhong Yang, Sheng Liu, Shengyong Chen

    Abstract: This work studies the joint rain and haze removal problem. In real-life scenarios, rain and haze, two often co-occurring common weather phenomena, can greatly degrade the clarity and quality of the scene images, leading to a performance drop in the visual applications, such as autonomous driving. However, jointly removing the rain and haze in scene images is ill-posed and challenging, where the ex…

    Submitted 21 June, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: 12 pages, 35 figures

  24. ROI-Constrained Bidding via Curriculum-Guided Bayesian Reinforcement Learning

    Authors: Haozhe Wang, Chao Du, Panyan Fang, Shuo Yuan, Xuming He, Liang Wang, Bo Zheng

    Abstract: Real-Time Bidding (RTB) is an important mechanism in modern online advertising systems. Advertisers employ bidding strategies in RTB to optimize their advertising effects subject to various financial requirements, especially the return-on-investment (ROI) constraint. ROIs change non-monotonically during the sequential bidding process, and often induce a see-saw effect between constraint satisfacti…

    Submitted 16 July, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

    Comments: Accepted by SIGKDD 2022

  25. arXiv:2203.12116  [pdf, other]

    cs.CV cs.RO

    GOSS: Towards Generalized Open-set Semantic Segmentation

    Authors: Jie Hong, Weihao Li, Junlin Han, Jiyang Zheng, Pengfei Fang, Mehrtash Harandi, Lars Petersson

    Abstract: In this paper, we present and study a new image segmentation task, called Generalized Open-set Semantic Segmentation (GOSS). Previously, with the well-known open-set semantic segmentation (OSS), the intelligent agent only detects the unknown regions without further processing, limiting their perception of the environment. It stands to reason that a further analysis of the detected unknown pixels w…

    Submitted 22 March, 2022; originally announced March 2022.

  26. arXiv:2203.03442  [pdf, other]

    cs.CL cs.AI

    Towards Automated Real-time Evaluation in Text-based Counseling

    Authors: Anqi Li, Jingsong Ma, Lizhi Ma, Pengfei Fang, Hongliang He, Zhenzhong Lan

    Abstract: Automated real-time evaluation of counselor-client interaction is important for ensuring quality counseling but the rules are difficult to articulate. Recent advancements in machine learning methods show the possibility of learning such rules automatically. However, these methods often demand large scale and high quality counseling data, which are difficult to collect. To address this issue, we bu…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: 15 pages, 4 figures

  27. arXiv:2201.12078  [pdf, other]

    cs.CV cs.LG

    You Only Cut Once: Boosting Data Augmentation with a Single Cut

    Authors: Junlin Han, Pengfei Fang, Weihao Li, Jie Hong, Mohammad Ali Armin, Ian Reid, Lars Petersson, Hongdong Li

    Abstract: We present You Only Cut Once (YOCO) for performing data augmentations. YOCO cuts one image into two pieces and performs data augmentations individually within each piece. Applying YOCO improves the diversity of the augmentation per sample and encourages neural networks to recognize objects from partial information. YOCO is parameter-free, easy to use, and boosts almost all a…

    Submitted 15 June, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: ICML 2022, Code: https://github.com/JunlinHan/YOCO

  28. arXiv:2112.03494  [pdf, other]

    cs.CV

    Learning Instance and Task-Aware Dynamic Kernels for Few Shot Learning

    Authors: Rongkai Ma, Pengfei Fang, Gil Avraham, Yan Zuo, Tianyu Zhu, Tom Drummond, Mehrtash Harandi

    Abstract: Learning and generalizing to novel concepts with few samples (Few-Shot Learning) is still an essential challenge to real-world applications. A principal way of achieving few-shot learning is to realize a model that can rapidly adapt to the context of a given task. Dynamic networks have been shown capable of learning content-adaptive parameters efficiently, making them suitable for few-shot learnin…

    Submitted 12 July, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

    Comments: ECCV2022

  29. arXiv:2112.01719  [pdf, other]

    cs.CV cs.LG

    Adaptive Poincaré Point to Set Distance for Few-Shot Classification

    Authors: Rongkai Ma, Pengfei Fang, Tom Drummond, Mehrtash Harandi

    Abstract: Learning and generalizing from limited examples, i.e., few-shot learning, is of core importance to many real-world vision applications. A principal way of achieving few-shot learning is to realize an embedding where samples from different classes are distinctive. Recent studies suggest that embedding via hyperbolic geometry enjoys low distortion for hierarchical and structured data, making it suita…

    Submitted 3 December, 2021; originally announced December 2021.

    Comments: Accepted at AAAI2022

  30. TSGB: Target-Selective Gradient Backprop for Probing CNN Visual Saliency

    Authors: Lin Cheng, Pengfei Fang, Yanjie Liang, Liao Zhang, Chunhua Shen, Hanzi Wang

    Abstract: The explanation for deep neural networks has drawn extensive attention in the deep learning community over the past few years. In this work, we study the visual saliency, a.k.a. visual explanation, to interpret convolutional neural networks. Compared to iteration based saliency methods, single backward pass based saliency methods benefit from faster speed, and they are widely used in downstream vi…

    Submitted 6 March, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: Accepted by IEEE Transactions on Image Processing. Index Terms: Model interpretability, explanation, saliency map, CNN visualization

  31. arXiv:2109.09300  [pdf, other]

    cs.LG cs.CV

    Feature Correlation Aggregation: on the Path to Better Graph Neural Networks

    Authors: Jieming Zhou, Tong Zhang, Pengfei Fang, Lars Petersson, Mehrtash Harandi

    Abstract: Prior to the introduction of Graph Neural Networks (GNNs), modeling and analyzing irregular data, particularly graphs, was thought to be the Achilles' heel of deep learning. The core concept of GNNs is to find a representation by recursively aggregating the representations of a central node and those of its neighbors. …

    Submitted 20 September, 2021; originally announced September 2021.

  32. arXiv:2108.11364  [pdf, other]

    cs.CV eess.IV

    Blind Image Decomposition

    Authors: Junlin Han, Weihao Li, Pengfei Fang, Chunyi Sun, Jie Hong, Mohammad Ali Armin, Lars Petersson, Hongdong Li

    Abstract: We propose and study a novel task named Blind Image Decomposition (BID), which requires separating a superimposed image into constituent underlying images in a blind setting, that is, both the source components involved in mixing as well as the mixing mechanism are unknown. For example, rain may consist of multiple components, such as rain streaks, raindrops, snow, and haze. Rainy images can be tr…

    Submitted 18 July, 2022; v1 submitted 25 August, 2021; originally announced August 2021.

    Comments: ECCV 2022. Project page: https://junlinhan.github.io/projects/BID.html. Code: https://github.com/JunlinHan/BID

  33. arXiv:2106.15288  [pdf, other]

    cs.CV

    MFR 2021: Masked Face Recognition Competition

    Authors: Fadi Boutros, Naser Damer, Jan Niklas Kolf, Kiran Raja, Florian Kirchbuchner, Raghavendra Ramachandra, Arjan Kuijper, Pengcheng Fang, Chao Zhang, Fei Wang, David Montero, Naiara Aginako, Basilio Sierra, Marcos Nieto, Mustafa Ekrem Erakin, Ugur Demir, Hazim Kemal Ekenel, Asaki Kataoka, Kohei Ichikawa, Shizuma Kubo, Jie Zhang, Mingjie He, Dan Han, Shiguang Shan, et al. (10 additional authors not shown)

    Abstract: This paper presents a summary of the Masked Face Recognition Competitions (MFR) held within the 2021 International Joint Conference on Biometrics (IJCB 2021). The competition attracted a total of 10 participating teams with valid submissions. The affiliations of these teams are diverse and associated with academia and industry in nine different countries. These teams successfully submitted 18 vali…

    Submitted 29 June, 2021; originally announced June 2021.

    Comments: Accepted at International Joint Conference on Biometrics (IJCB 2021)

  34. arXiv:2106.01263  [pdf, other]

    cs.CL cs.AI

    Uni-Encoder: A Fast and Accurate Response Selection Paradigm for Generation-Based Dialogue Systems

    Authors: Chiyu Song, Hongliang He, Haofei Yu, Pengfei Fang, Leyang Cui, Zhenzhong Lan

    Abstract: Sample-and-rank is a key decoding strategy for modern generation-based dialogue systems. It helps achieve diverse and high-quality responses by selecting an answer from a small pool of generated candidates. The current state-of-the-art ranking methods mainly use an encoding paradigm called Cross-Encoder, which separately encodes each context-candidate pair and ranks the candidates according to the…

    Submitted 15 May, 2023; v1 submitted 2 June, 2021; originally announced June 2021.

    Comments: Accepted to the Findings of ACL 2023

  35. arXiv:2105.03591  [pdf, other]

    cs.LG cs.AI cs.NI

    Loss Tolerant Federated Learning

    Authors: Pengyuan Zhou, Pei Fang, Pan Hui

    Abstract: Federated learning has attracted attention in recent years for collaboratively training data on distributed devices with privacy-preservation. The limited network capacity of mobile and IoT devices has been seen as one of the major challenges for cross-device federated learning. Recent solutions have been focusing on threshold-based client selection schemes to guarantee the communication efficienc…

    Submitted 8 May, 2021; originally announced May 2021.

  36. arXiv:2104.04192  [pdf, other]

    cs.CV

    Reinforced Attention for Few-Shot Learning and Beyond

    Authors: Jie Hong, Pengfei Fang, Weihao Li, Tong Zhang, Christian Simon, Mehrtash Harandi, Lars Petersson

    Abstract: Few-shot learning aims to correctly recognize query samples from unseen classes given a limited number of support samples, often by relying on global embeddings of images. In this paper, we propose to equip the backbone network with an attention agent, which is trained by reinforcement learning. The policy gradient algorithm is employed to train the agent towards adaptively localizing the represen…

    Submitted 9 April, 2021; originally announced April 2021.

  37. arXiv:2103.04059  [pdf, other]

    cs.CV

    Semantic-aware Knowledge Distillation for Few-Shot Class-Incremental Learning

    Authors: Ali Cheraghian, Shafin Rahman, Pengfei Fang, Soumava Kumar Roy, Lars Petersson, Mehrtash Harandi

    Abstract: Few-shot class incremental learning (FSCIL) portrays the problem of learning new concepts gradually, where only a few examples per concept are available to the learner. Due to the limited number of examples for training, the techniques developed for standard incremental learning cannot be applied verbatim to FSCIL. In this work, we introduce a distillation algorithm to address the problem of FSCIL…

    Submitted 30 March, 2021; v1 submitted 6 March, 2021; originally announced March 2021.

    Comments: Accepted at CVPR 2021

  38. arXiv:2011.00774  [pdf, other]

    cs.CV

    Set Augmented Triplet Loss for Video Person Re-Identification

    Authors: Pengfei Fang, Pan Ji, Lars Petersson, Mehrtash Harandi

    Abstract: Modern video person re-identification (re-ID) machines are often trained using a metric learning approach, supervised by a triplet loss. The triplet loss used in video re-ID is usually based on so-called clip features, each aggregated from a few frame features. In this paper, we propose to model the video clip as a set and instead study the distance between sets in the corresponding triplet loss.…

    Submitted 6 November, 2020; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: to appear in WACV 2021

  39. arXiv:2010.03108  [pdf, other]

    cs.CV cs.LG

    Channel Recurrent Attention Networks for Video Pedestrian Retrieval

    Authors: Pengfei Fang, Pan Ji, Jieming Zhou, Lars Petersson, Mehrtash Harandi

    Abstract: Full attention, which generates an attention value per element of the input feature maps, has been successfully demonstrated to be beneficial in visual tasks. In this work, we propose a fully attentional network, termed channel recurrent attention network, for the task of video pedestrian retrieval. The main attention unit, channel recurrent attention, identifies attention maps at t…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: To appear in ACCV 2020

  40. arXiv:2009.02557  [pdf, other]

    cs.LG cs.AI stat.ML

    FLFE: A Communication-Efficient and Privacy-Preserving Federated Feature Engineering Framework

    Authors: Pei Fang, Zhendong Cai, Hui Chen, QingJiang Shi

    Abstract: Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques and is a key step to improve the performance of machine learning algorithms. In the multi-party feature engineering scenario (features are stored in many different IoT devices), direct and unlimited multivariate feature transformations will quickly exhaust memory, power, and ba…

    Submitted 5 September, 2020; originally announced September 2020.

    Comments: 11 pages, multi-party feature engineering problem

  41. arXiv:2006.09597  [pdf, other]

    cs.CV

    Cross-Correlated Attention Networks for Person Re-Identification

    Authors: Jieming Zhou, Soumava Kumar Roy, Pengfei Fang, Mehrtash Harandi, Lars Petersson

    Abstract: Deep neural networks need to make robust inference in the presence of occlusion, background clutter, pose and viewpoint variations -- to name a few -- when the task of person re-identification is considered. Attention mechanisms have recently proven to be successful in handling the aforementioned challenges to some degree. However, previous designs fail to capture inherent inter-dependencies betwee…

    Submitted 16 June, 2020; originally announced June 2020.

    Comments: Accepted by Image and Vision Computing

    Journal ref: Image and Vision Computing, Vol. 100, 2020, p. 103931

  42. arXiv:2005.05106  [pdf, other]

    cs.SD eess.AS

    Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech

    Authors: Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie

    Abstract: In this paper, we propose multi-band MelGAN, a much faster waveform generation model targeting high-quality text-to-speech. Specifically, we improve the original MelGAN in the following aspects. First, we increase the receptive field of the generator, which is proven to be beneficial to speech generation. Second, we substitute the feature matching loss with the multi-resolution STFT loss to bet…

    Submitted 17 November, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

    Comments: Submitted to Interspeech2020

  43. arXiv:2003.12143  [pdf]

    cs.CY

    Coronavirus Geographic Dissemination at Chicago and its Potential Proximity to Public Commuter Rail

    Authors: Peter Fang

    Abstract: The community spread of coronavirus in the greater Chicago area has severely threatened residents' health, families, and normal activities. CDC daily updates on infected cases at the county level are not sufficient to address the public's concern about virus spread. On March 20th, NBC5 published case information of 435 coronavirus infections. The data is relatively comprehensive and of high value for understanding on…

    Submitted 24 March, 2020; originally announced March 2020.

  44. arXiv:1911.03464  [pdf, other]

    eess.IV cs.CV

    Perception-oriented Single Image Super-Resolution via Dual Relativistic Average Generative Adversarial Networks

    Authors: Yuan Ma, Kewen Liu, Hongxia Xiong, Panpan Fang, Xiaojun Li, Yalei Chen, Chaoyang Liu

    Abstract: Residual and dense neural networks have greatly promoted the development of image super-resolution (SR) and produced many impressive results. In our observation, although more layers and connections can always improve performance, the increase in model parameters hinders the practical deployment of SR algorithms. Furthermore, algorithms supervised by L1/L2 l…

    Submitted 20 February, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

    Comments: Re-submit after codes reviewing

  45. arXiv:1906.06575   

    eess.IV cs.CV

    Single Image Super-resolution via Dense Blended Attention Generative Adversarial Network for Clinical Diagnosis

    Authors: Kewen Liu, Yuan Ma, Hongxia Xiong, Zejun Yan, Zhijun Zhou, Chaoyang Liu, Panpan Fang, Xiaojun Li, Yalei Chen

    Abstract: During the training phase, more connections (e.g., channel concatenation in the last layer of DenseNet) mean more occupied GPU memory and lower GPU utilization, requiring more training time. The increase in training time also hinders the practical deployment of SR algorithms. This is why we abandoned DenseNet as the basic network. Furthermore, we abandoned this paper due to its limitation only applied on…

    Submitted 23 February, 2020; v1 submitted 15 June, 2019; originally announced June 2019.

    Comments: We abandoned this paper because it applies only to medical images; please see our latest work at arXiv:1911.03464

  46. arXiv:1905.05084  [pdf, other]

    cs.CV physics.med-ph

    Medical image super-resolution method based on dense blended attention network

    Authors: Kewen Liu, Yuan Ma, Hongxia Xiong, Zejun Yan, Zhijun Zhou, Panpan Fang, Chaoyang Liu

    Abstract: To address the severe blurring that medical images suffer from due to the lack of high-frequency details during image super-resolution reconstruction, a novel medical image super-resolution method based on a dense neural network and a blended attention mechanism is proposed. The proposed method adds blended attention blocks to the dense neural network (DenseNet), so that t…

    Submitted 13 May, 2019; originally announced May 2019.

    Comments: 12 pages, 4 figures, 32 references