Showing 1–27 of 27 results for author: Pu, T

  1. arXiv:2507.04587  [pdf, ps, other]

    cs.CV

    CVFusion: Cross-View Fusion of 4D Radar and Camera for 3D Object Detection

    Authors: Hanzhi Zhong, Zhiyu Xiang, Ruoyu Xu, Jingyun Fu, Peng Xu, Shaohong Wang, Zhihao Yang, Tianyu Pu, Eryun Liu

    Abstract: 4D radar has received significant attention in autonomous driving thanks to its robustness under adverse weathers. Due to the sparse points and noisy measurements of the 4D radar, most of the research finish the 3D object detection task by integrating images from camera and perform modality fusion in BEV space. However, the potential of the radar and the fusion mechanism is still largely unexplore…

    Submitted 6 July, 2025; originally announced July 2025.

  2. arXiv:2505.09943  [pdf, ps, other]

    cs.CV

    CSPENet: Contour-Aware and Saliency Priors Embedding Network for Infrared Small Target Detection

    Authors: Jiakun Deng, Kexuan Li, Xingye Cui, Jiaxuan Li, Chang Long, Tian Pu, Zhenming Peng

    Abstract: Infrared small target detection (ISTD) plays a critical role in a wide range of civilian and military applications. Existing methods suffer from deficiencies in the localization of dim targets and the perception of contour information under dense clutter environments, severely limiting their detection performance. To tackle these issues, we propose a contour-aware and saliency priors embedding net…

    Submitted 14 May, 2025; originally announced May 2025.

  3. arXiv:2503.04376  [pdf, other]

    cs.CV

    MIDAS: Modeling Ground-Truth Distributions with Dark Knowledge for Domain Generalized Stereo Matching

    Authors: Peng Xu, Zhiyu Xiang, Jingyun Fu, Tianyu Pu, Hanzhi Zhong, Eryun Liu

    Abstract: Despite the significant advances in domain generalized stereo matching, existing methods still exhibit domain-specific preferences when transferring from synthetic to real domains, hindering their practical applications in complex and diverse scenarios. The probability distributions predicted by the stereo network naturally encode rich similarity and uncertainty information. Inspired by this obser…

    Submitted 6 March, 2025; originally announced March 2025.

  4. arXiv:2412.14571  [pdf, other]

    cs.CV cs.AI eess.IV

    SCKD: Semi-Supervised Cross-Modality Knowledge Distillation for 4D Radar Object Detection

    Authors: Ruoyu Xu, Zhiyu Xiang, Chenwei Zhang, Hanzhi Zhong, Xijun Zhao, Ruina Dang, Peng Xu, Tianyu Pu, Eryun Liu

    Abstract: 3D object detection is one of the fundamental perception tasks for autonomous vehicles. Fulfilling such a task with a 4D millimeter-wave radar is very attractive since the sensor is able to acquire 3D point clouds similar to Lidar while maintaining robust measurements under adverse weather. However, due to the high sparsity and noise associated with the radar point clouds, the performance of the e…

    Submitted 19 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI 2025

  5. arXiv:2412.06190  [pdf, other]

    cs.CV

    Category-Adaptive Cross-Modal Semantic Refinement and Transfer for Open-Vocabulary Multi-Label Recognition

    Authors: Haijing Liu, Tao Pu, Hefeng Wu, Keze Wang, Liang Lin

    Abstract: Benefiting from the generalization capability of CLIP, recent vision language pre-training (VLP) models have demonstrated an impressive ability to capture virtually any visual concept in daily images. However, due to the presence of unseen categories in open-vocabulary settings, existing algorithms struggle to effectively capture strong semantic correlations between categories, resulting in sub-op…

    Submitted 8 December, 2024; originally announced December 2024.

    Comments: 15 pages

  6. arXiv:2409.19599  [pdf, other]

    cs.CV

    DATransNet: Dynamic Attention Transformer Network for Infrared Small Target Detection

    Authors: Chen Hu, Yian Huang, Kexuan Li, Luping Zhang, Chang Long, Yiming Zhu, Tian Pu, Zhenming Peng

    Abstract: Infrared small target detection (ISTD) is widely used in civilian and military applications. However, ISTD encounters several challenges, including the tendency for small and dim targets to be obscured by complex backgrounds. To address this issue, we propose the Dynamic Attention Transformer Network (DATransNet), which aims to extract and preserve detailed information vital for small targets. DAT…

    Submitted 1 March, 2025; v1 submitted 29 September, 2024; originally announced September 2024.

  7. arXiv:2409.15345  [pdf]

    cs.CV cs.RO

    Neuromorphic spatiotemporal optical flow: Enabling ultrafast visual perception beyond human capabilities

    Authors: Shengbo Wang, Jingwen Zhao, Tongming Pu, Liangbing Zhao, Xiaoyu Guo, Yue Cheng, Cong Li, Weihao Ma, Chenyu Tang, Zhenyu Xu, Ningli Wang, Luigi Occhipinti, Arokia Nathan, Ravinder Dahiya, Huaqiang Wu, Li Tao, Shuo Gao

    Abstract: Optical flow, inspired by the mechanisms of biological visual systems, calculates spatial motion vectors within visual scenes that are necessary for enabling robotics to excel in complex and dynamic working environments. However, current optical flow algorithms, despite human-competitive task performance on benchmark datasets, remain constrained by unacceptable time delays (~0.6 seconds per infere…

    Submitted 30 January, 2025; v1 submitted 10 September, 2024; originally announced September 2024.

    Comments: 22 pages, 6 figures

  8. arXiv:2407.18487  [pdf, other]

    cs.CV

    SMPISD-MTPNet: Scene Semantic Prior-Assisted Infrared Ship Detection Using Multi-Task Perception Networks

    Authors: Chen Hu, Xiaogang Dong, Yian Huang, Lele Wang, Liang Xu, Tian Pu, Zhenming Peng

    Abstract: Infrared ship detection (IRSD) has received increasing attention in recent years due to the robustness of infrared images to adverse weather. However, a large number of false alarms may occur in complex scenes. To address these challenges, we propose the Scene Semantic Prior-Assisted Multi-Task Perception Network (SMPISD-MTPNet), which includes three stages: scene semantic extraction, deep feature…

    Submitted 25 July, 2024; originally announced July 2024.

  9. arXiv:2407.06844  [pdf, other]

    cs.CV

    Dynamic Correlation Learning and Regularization for Multi-Label Confidence Calibration

    Authors: Tianshui Chen, Weihang Wang, Tao Pu, Jinghui Qin, Zhijing Yang, Jie Liu, Liang Lin

    Abstract: Modern visual recognition models often display overconfidence due to their reliance on complex deep neural networks and one-hot target supervision, resulting in unreliable confidence scores that necessitate calibration. While current confidence calibration techniques primarily address single-label scenarios, there is a lack of focus on more practical and generalizable multi-label contexts. This pa…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: submitted to TIP

  10. arXiv:2406.00378  [pdf]

    physics.app-ph cs.NE

    Real-Time State Modulation and Acquisition Circuit in Neuromorphic Memristive Systems

    Authors: Shengbo Wang, Cong Li, Tongming Pu, Jian Zhang, Weihao Ma, Luigi Occhipinti, Arokia Nathan, Shuo Gao

    Abstract: Memristive neuromorphic systems are designed to emulate human perception and cognition, where the memristor states represent essential historical information to perform both low-level and high-level tasks. However, current systems face challenges with the separation of state modulation and acquisition, leading to undesired time delays that impact real-time performance. To overcome this issue, we i…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 5 pages, 8 figures

    Journal ref: 2024 IEEE Biomedical Circuits and Systems Conference (BioCAS)

  11. arXiv:2404.04661  [pdf, other]

    cs.LG cs.AI

    Transform then Explore: a Simple and Effective Technique for Exploratory Combinatorial Optimization with Reinforcement Learning

    Authors: Tianle Pu, Changjun Fan, Mutian Shen, Yizhou Lu, Li Zeng, Zohar Nussinov, Chao Chen, Zhong Liu

    Abstract: Many complex problems encountered in both production and daily life can be conceptualized as combinatorial optimization problems (COPs) over graphs. Recent years, reinforcement learning (RL) based models have emerged as a promising direction, which treat the COPs solving as a heuristic learning problem. However, current finite-horizon-MDP based RL models have inherent limitations. They are not all…

    Submitted 6 April, 2024; originally announced April 2024.

  12. arXiv:2309.13237  [pdf, other]

    cs.CV

    Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation

    Authors: Tao Pu, Tianshui Chen, Hefeng Wu, Yongyi Lu, Liang Lin

    Abstract: Video scene graph generation (VidSGG) aims to identify objects in visual scenes and infer their relationships for a given video. It requires not only a comprehensive understanding of each object scattered on the whole scene but also a deep dive into their temporal motions and interactions. Inherently, object pairs and their relationships enjoy spatial co-occurrence correlations within each image a…

    Submitted 15 December, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

    Comments: Accepted at IEEE T-IP, 2024

  13. arXiv:2306.15612  [pdf, other]

    cs.CV

    Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching

    Authors: Peng Xu, Zhiyu Xiang, Chenyu Qiao, Jingyun Fu, Tianyu Pu

    Abstract: Despite the great success of deep learning in stereo matching, recovering accurate disparity maps is still challenging. Currently, L1 and cross-entropy are the two most widely used losses for stereo network training. Compared with the former, the latter usually performs better thanks to its probability modeling and direct supervision to the cost volume. However, how to accurately model the stereo…

    Submitted 15 March, 2024; v1 submitted 27 June, 2023; originally announced June 2023.

  14. arXiv:2211.07846  [pdf, other]

    cs.CV

    Category-Adaptive Label Discovery and Noise Rejection for Multi-label Image Recognition with Partial Positive Labels

    Authors: Tao Pu, Qianru Lao, Hefeng Wu, Tianshui Chen, Liang Lin

    Abstract: As a promising solution of reducing annotation cost, training multi-label models with partial positive labels (MLR-PPL), in which merely few positive labels are known while other are missing, attracts increasing attention. Due to the absence of any negative labels, previous works regard unknown labels as negative and adopt traditional MLR algorithms. To reject noisy labels, recent works regard lar…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2205.13092

  15. arXiv:2205.13092  [pdf, other]

    cs.CV

    Dual-Perspective Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels

    Authors: Tao Pu, Tianshui Chen, Hefeng Wu, Yukai Shi, Zhijing Yang, Liang Lin

    Abstract: Despite achieving impressive progress, current multi-label image recognition (MLR) algorithms heavily depend on large-scale datasets with complete labels, making collecting large-scale datasets extremely time-consuming and labor-intensive. Training the multi-label image recognition models with partial labels (MLR-PL) is an alternative way, in which merely some labels are known while others are unk…

    Submitted 27 January, 2024; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: Technical Report. arXiv admin note: text overlap with arXiv:2203.02172

  16. arXiv:2205.11131  [pdf, other]

    cs.CV

    Heterogeneous Semantic Transfer for Multi-label Recognition with Partial Labels

    Authors: Tianshui Chen, Tao Pu, Lingbo Liu, Yukai Shi, Zhijing Yang, Liang Lin

    Abstract: Multi-label image recognition with partial labels (MLR-PL), in which some labels are known while others are unknown for each image, may greatly reduce the cost of annotation and thus facilitate large-scale MLR. We find that strong semantic correlations exist within each image and across different images, and these correlations can help transfer the knowledge possessed by the known labels to retrie…

    Submitted 15 July, 2024; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: Accepted at IJCV, 2024

  17. arXiv:2204.03795  [pdf, other]

    cs.CV

    Semantic Representation and Dependency Learning for Multi-Label Image Recognition

    Authors: Tao Pu, Mingzhan Sun, Hefeng Wu, Tianshui Chen, Ling Tian, Liang Lin

    Abstract: Recently many multi-label image recognition (MLR) works have made significant progress by introducing pre-trained object detection models to generate lots of proposals or utilizing statistical label co-occurrence enhance the correlation among different categories. However, these works have some limitations: (1) the effectiveness of the network significantly depends on pre-trained object detection…

    Submitted 9 January, 2023; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: accepted by Neurocomputing

  18. arXiv:2203.02172  [pdf, other]

    cs.CV

    Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels

    Authors: Tao Pu, Tianshui Chen, Hefeng Wu, Liang Lin

    Abstract: Training the multi-label image recognition models with partial labels, in which merely some labels are known while others are unknown for each image, is a considerably challenging and practical task. To address this task, current algorithms mainly depend on pre-training classification or similarity models to generate pseudo labels for the unknown labels. However, these algorithms depend on suffici…

    Submitted 4 March, 2022; originally announced March 2022.

    Comments: Accepted by AAAI'22

  19. arXiv:2112.10941  [pdf, other]

    cs.CV

    Structured Semantic Transfer for Multi-Label Recognition with Partial Labels

    Authors: Tianshui Chen, Tao Pu, Hefeng Wu, Yuan Xie, Liang Lin

    Abstract: Multi-label image recognition is a fundamental yet practical task because real-world images inherently possess multiple semantic labels. However, it is difficult to collect large-scale multi-label annotations due to the complexity of both the input images and output label spaces. To reduce the annotation cost, we propose a structured semantic transfer (SST) framework that enables training multi-la…

    Submitted 4 March, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: Accepted by AAAI'22

  20. arXiv:2111.03580  [pdf, other]

    cs.CV

    AGPCNet: Attention-Guided Pyramid Context Networks for Infrared Small Target Detection

    Authors: Tianfang Zhang, Siying Cao, Tian Pu, Zhenming Peng

    Abstract: Infrared small target detection is an important problem in many fields such as earth observation, military reconnaissance, disaster relief, and has received widespread attention recently. This paper presents the Attention-Guided Pyramid Context Network (AGPCNet) algorithm. Its main components are an Attention-Guided Context Block (AGCB), a Context Pyramid Module (CPM), and an Asymmetric Fusion Mod…

    Submitted 5 November, 2021; originally announced November 2021.

    Comments: 12 pages, 13 figures, 8 tables

  21. arXiv:2101.03285  [pdf, other]

    cs.CV cs.LG

    Detecting, Localising and Classifying Polyps from Colonoscopy Videos using Deep Learning

    Authors: Yu Tian, Leonardo Zorron Cheng Tao Pu, Yuyuan Liu, Gabriel Maicas, Johan W. Verjans, Alastair D. Burt, Seon Ho Shin, Rajvinder Singh, Gustavo Carneiro

    Abstract: In this paper, we propose and analyse a system that can automatically detect, localise and classify polyps from colonoscopy videos. The detection of frames with polyps is formulated as a few-shot anomaly classification problem, where the training set is highly imbalanced with the large majority of frames consisting of normal images and a small minority comprising frames with polyps. Colonoscopy vi…

    Submitted 8 January, 2021; originally announced January 2021.

    Comments: Preprint to submit to IEEE journals

  22. arXiv:2012.15717  [pdf, other]

    cs.CL

    FGraDA: A Dataset and Benchmark for Fine-Grained Domain Adaptation in Machine Translation

    Authors: Wenhao Zhu, Shujian Huang, Tong Pu, Pingxuan Huang, Xu Zhang, Jian Yu, Wei Chen, Yanfeng Wang, Jiajun Chen

    Abstract: Previous research for adapting a general neural machine translation (NMT) model into a specific domain usually neglects the diversity in translation within the same domain, which is a core problem for domain adaptation in real-world scenarios. One representative of such challenging scenarios is to deploy a translation system for a conference with a specific topic, e.g., global warming or coronavir…

    Submitted 7 November, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

  23. arXiv:2012.14587  [pdf, other]

    cs.CV

    AU-Expression Knowledge Constrained Representation Learning for Facial Expression Recognition

    Authors: Tao Pu, Tianshui Chen, Yuan Xie, Hefeng Wu, Liang Lin

    Abstract: Recognizing human emotion/expressions automatically is quite an expected ability for intelligent robotics, as it can promote better communication and cooperation with humans. Current deep-learning-based algorithms may achieve impressive performance in some lab-controlled environments, but they always fail to recognize the expressions accurately for the uncontrolled in-the-wild situation. Fortunate…

    Submitted 2 April, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

    Comments: Accepted at ICRA 2021

  24. arXiv:2008.00923  [pdf, other]

    cs.CV

    Cross-Domain Facial Expression Recognition: A Unified Evaluation Benchmark and Adversarial Graph Learning

    Authors: Tianshui Chen, Tao Pu, Hefeng Wu, Yuan Xie, Lingbo Liu, Liang Lin

    Abstract: To address the problem of data inconsistencies among different facial expression recognition (FER) datasets, many cross-domain FER methods (CD-FERs) have been extensively devised in recent years. Although each declares to achieve superior performance, fair comparisons are lacking due to the inconsistent choices of the source/target datasets and feature extractors. In this work, we first analyze th…

    Submitted 30 November, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

    Comments: Accepted at T-PAMI, 2021. arXiv admin note: text overlap with arXiv:2008.00859

  25. arXiv:2008.00859  [pdf, other]

    cs.CV

    Adversarial Graph Representation Adaptation for Cross-Domain Facial Expression Recognition

    Authors: Yuan Xie, Tianshui Chen, Tao Pu, Hefeng Wu, Liang Lin

    Abstract: Data inconsistency and bias are inevitable among different facial expression recognition (FER) datasets due to subjective annotating process and different collecting conditions. Recent works resort to adversarial mechanisms that learn domain-invariant features to mitigate domain shift. However, most of these works focus on holistic feature adaptation, and they ignore local features that are more t…

    Submitted 4 August, 2020; v1 submitted 3 August, 2020; originally announced August 2020.

    Comments: Accepted at ACM MM 2020

  26. arXiv:2006.14811  [pdf, other]

    cs.CV

    Few-Shot Anomaly Detection for Polyp Frames from Colonoscopy

    Authors: Yu Tian, Gabriel Maicas, Leonardo Zorron Cheng Tao Pu, Rajvinder Singh, Johan W. Verjans, Gustavo Carneiro

    Abstract: Anomaly detection methods generally target the learning of a normal image distribution (i.e., inliers showing healthy cases) and during testing, samples relatively far from the learned distribution are classified as anomalies (i.e., outliers showing disease cases). These approaches tend to be sensitive to outliers that lie relatively close to inliers (e.g., a colonoscopy image with a small polyp).…

    Submitted 26 June, 2020; originally announced June 2020.

    Comments: Accepted at MICCAI 2020

  27. arXiv:1910.10345  [pdf, other]

    eess.IV cs.CV

    Unsupervised Dual Adversarial Learning for Anomaly Detection in Colonoscopy Video Frames

    Authors: Yuyuan Liu, Yu Tian, Gabriel Maicas, Leonardo Z. C. T. Pu, Rajvinder Singh, Johan W. Verjans, Gustavo Carneiro

    Abstract: The automatic detection of frames containing polyps from a colonoscopy video sequence is an important first step for a fully automated colonoscopy analysis tool. Typically, such detection system is built using a large annotated data set of frames with and without polyps, which is expensive to be obtained. In this paper, we introduce a new system that detects frames containing polyps as anomalies f…

    Submitted 6 February, 2021; v1 submitted 23 October, 2019; originally announced October 2019.

    Comments: Accepted by ISBI 2020