Showing 1–41 of 41 results for author: Tao, D

Searching in archive eess.
  1. arXiv:2503.05794  [pdf, other]

    cs.CR cs.AI cs.LG cs.SD eess.AS

    CBW: Towards Dataset Ownership Verification for Speaker Verification via Clustering-based Backdoor Watermarking

    Authors: Yiming Li, Kaiying Yan, Shuo Shao, Tongqing Zhai, Shu-Tao Xia, Zhan Qin, Dacheng Tao

    Abstract: With the increasing adoption of deep learning in speaker verification, large-scale speech datasets have become valuable intellectual property. To audit and prevent the unauthorized usage of these valuable released datasets, especially in commercial or open-source scenarios, we propose a novel dataset ownership verification method. Our approach introduces a clustering-based backdoor watermark (CBW)…

    Submitted 5 April, 2025; v1 submitted 1 March, 2025; originally announced March 2025.

    Comments: 14 pages. The journal extension of our ICASSP'21 paper (arXiv:2010.11607)

  2. arXiv:2501.04515  [pdf, other]

    eess.IV cs.CV cs.RO

    SplineFormer: An Explainable Transformer-Based Approach for Autonomous Endovascular Navigation

    Authors: Tudor Jianu, Shayan Doust, Mengyun Li, Baoru Huang, Tuong Do, Hoan Nguyen, Karl Bates, Tung D. Ta, Sebastiano Fichera, Pierre Berthet-Rayne, Anh Nguyen

    Abstract: Endovascular navigation is a crucial aspect of minimally invasive procedures, where precise control of curvilinear instruments like guidewires is critical for successful interventions. A key challenge in this task is accurately predicting the evolving shape of the guidewire as it navigates through the vasculature, which presents complex deformations due to interactions with the vessel walls. Tradi…

    Submitted 8 January, 2025; originally announced January 2025.

    Comments: 8 pages

  3. arXiv:2410.03798  [pdf, other]

    cs.CL cs.SD eess.AS

    Self-Powered LLM Modality Expansion for Large Speech-Text Models

    Authors: Tengfei Yu, Xuebo Liu, Zhiyi Hou, Liang Ding, Dacheng Tao, Min Zhang

    Abstract: Large language models (LLMs) exhibit remarkable performance across diverse tasks, indicating their potential for expansion into large speech-text models (LSMs) by integrating speech capabilities. Although unified speech-text pre-training and multimodal data instruction-tuning offer considerable benefits, these methods generally entail significant resource demands and tend to overfit specific tasks…

    Submitted 13 October, 2024; v1 submitted 4 October, 2024; originally announced October 2024.

    Comments: Accepted to EMNLP 2024

  4. arXiv:2409.02466  [pdf, other]

    eess.AS cs.SD

    CUEMPATHY: A Counseling Speech Dataset for Psychotherapy Research

    Authors: Dehua Tao, Harold Chui, Sarah Luk, Tan Lee

    Abstract: Psychotherapy or counseling is typically conducted through spoken conversation between a therapist and a client. Analyzing the speech characteristics of psychotherapeutic interactions can help understand the factors associated with effective psychotherapy. This paper introduces CUEMPATHY, a large-scale speech dataset collected from actual counseling sessions. The dataset consists of 156 counseling…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Accepted by ISCSLP 2022

  5. arXiv:2407.00717  [pdf, other]

    cs.LG cs.AI eess.SY

    Learning System Dynamics without Forgetting

    Authors: Xikun Zhang, Dongjin Song, Yushan Jiang, Yixin Chen, Dacheng Tao

    Abstract: Observation-based trajectory prediction for systems with unknown dynamics is essential in fields such as physics and biology. Most existing approaches are limited to learning within a single system with fixed dynamics patterns. However, many real-world applications require learning across systems with evolving dynamics patterns, a challenge that has been largely overlooked. To address this, we sys…

    Submitted 24 February, 2025; v1 submitted 30 June, 2024; originally announced July 2024.

  6. arXiv:2406.11519  [pdf, other]

    cs.CV eess.IV

    HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

    Authors: Di Wang, Meiqi Hu, Yao Jin, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, Jing Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, Dacheng Tao, Liangpei Zhang

    Abstract: Accurate hyperspectral image (HSI) interpretation is critical for providing valuable insights into various earth observation-related applications such as urban planning, precision agriculture, and environmental monitoring. However, existing HSI processing methods are predominantly task-specific and scene-dependent, which severely limits their ability to transfer knowledge across tasks and scenes,…

    Submitted 1 April, 2025; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE TPAMI. Project website: https://whu-sigma.github.io/HyperSIGMA

  7. arXiv:2406.08989  [pdf, other]

    eess.AS cs.SD

    ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis

    Authors: Dehua Tao, Daxin Tan, Yu Ting Yeung, Xiao Chen, Tan Lee

    Abstract: Representing speech as discretized units has numerous benefits in supporting downstream spoken language processing tasks. However, the approach has been less explored in speech synthesis of tonal languages like Mandarin Chinese. Our preliminary experiments on Chinese speech synthesis reveal the issue of "tone shift", where a synthesized speech utterance contains correct base syllables but incorrec…

    Submitted 3 September, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  8. arXiv:2403.10980  [pdf, other]

    cs.GT eess.SY math.OC

    Inverse learning of black-box aggregator for robust Nash equilibrium

    Authors: Guanpu Chen, Gehui Xu, Fengxiang He, Dacheng Tao, Thomas Parisini, Karl Henrik Johansson

    Abstract: In this note, we investigate the robustness of Nash equilibria (NE) in multi-player aggregative games with coupling constraints. There are many algorithms for computing an NE of an aggregative game given a known aggregator. When the coupling parameters are affected by uncertainty, robust NE need to be computed. We consider a scenario where players' weight in the aggregator is unknown, making the a…

    Submitted 16 March, 2024; originally announced March 2024.

  9. arXiv:2403.04228  [pdf, other]

    cs.CV eess.IV

    Single-Image HDR Reconstruction Assisted Ghost Suppression and Detail Preservation Network for Multi-Exposure HDR Imaging

    Authors: Huafeng Li, Zhenmei Yang, Yafei Zhang, Dapeng Tao, Zhengtao Yu

    Abstract: The reconstruction of high dynamic range (HDR) images from multi-exposure low dynamic range (LDR) images in dynamic scenes presents significant challenges, especially in preserving and restoring information in oversaturated regions and avoiding ghosting artifacts. While current methods often struggle to address these challenges, our work aims to bridge this gap by developing a multi-exposure HDR i…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: IEEE Transactions on Computational Imaging

  10. arXiv:2311.13254  [pdf, other]

    cs.CV cs.AI eess.IV

    Unified Domain Adaptive Semantic Segmentation

    Authors: Zhe Zhang, Gaochang Wu, Jing Zhang, Xiatian Zhu, Dacheng Tao, Tianyou Chai

    Abstract: Unsupervised Domain Adaptive Semantic Segmentation (UDA-SS) aims to transfer the supervision from a labeled source domain to an unlabeled target domain. The majority of existing UDA-SS works typically consider images whilst recent attempts have extended further to tackle videos by modeling the temporal dimension. Although the two lines of research share the major challenges -- overcoming the under…

    Submitted 17 April, 2025; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: 17 pages, 11 figures, 11 tables. Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025

  11. arXiv:2310.14181  [pdf, other]

    eess.AS

    A Study on Prosodic Entrainment in Relation to Therapist Empathy in Counseling Conversation

    Authors: Dehua Tao, Tan Lee, Harold Chui, Sarah Luk

    Abstract: Counseling is carried out as spoken conversation between a therapist and a client. The empathy level expressed by the therapist is considered an important index of the quality of counseling and often assessed by an observer or the client. This research investigates the entrainment of speech prosody in relation to subjectively rated empathy. Experimental results show that the entrainment of intensi…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: Accepted by INTERSPEECH 2023

  12. arXiv:2310.14178  [pdf, other]

    eess.AS

    Modeling Intrapersonal and Interpersonal Influences for Automatic Estimation of Therapist Empathy in Counseling Conversation

    Authors: Dehua Tao, Tan Lee, Harold Chui, Sarah Luk

    Abstract: Counseling is usually conducted through spoken conversation between a therapist and a client. The empathy level of therapist is a key indicator of outcomes. Presuming that therapist's empathy expression is shaped by their past behavior and their perception of the client's behavior, we propose a model to estimate the therapist empathy by considering both intrapersonal and interpersonal influences.…

    Submitted 4 September, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: Accepted by ICASSP 2024

  13. arXiv:2307.12180  [pdf, other]

    eess.IV cs.CV cs.LG

    Prototype-Driven and Multi-Expert Integrated Multi-Modal MR Brain Tumor Image Segmentation

    Authors: Yafei Zhang, Zhiyuan Li, Huafeng Li, Dapeng Tao

    Abstract: For multi-modal magnetic resonance (MR) brain tumor image segmentation, current methods usually directly extract the discriminative features from input images for tumor sub-region category determination and localization. However, the impact of information aliasing caused by the mutual inclusion of tumor sub-regions is often ignored. Moreover, existing methods usually do not take tailored efforts t…

    Submitted 22 July, 2023; originally announced July 2023.

  14. arXiv:2306.02913  [pdf, other]

    cs.LG cs.CY cs.DC eess.SY stat.ML

    Decentralized SGD and Average-direction SAM are Asymptotically Equivalent

    Authors: Tongtian Zhu, Fengxiang He, Kaixuan Chen, Mingli Song, Dacheng Tao

    Abstract: Decentralized stochastic gradient descent (D-SGD) allows collaborative learning on massive devices simultaneously without the control of a central server. However, existing theories claim that decentralization invariably undermines generalization. In this paper, we challenge the conventional belief and present a completely new perspective for understanding decentralized learning. We prove that D-S…

    Submitted 9 November, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: 40th International Conference on Machine Learning (ICML 2023)

  15. arXiv:2305.16690  [pdf, other]

    eess.AS

    Learning Representation of Therapist Empathy in Counseling Conversation Using Siamese Hierarchical Attention Network

    Authors: Dehua Tao, Tan Lee, Harold Chui, Sarah Luk

    Abstract: Counseling is an activity of conversational speaking between a therapist and a client. Therapist empathy is an essential indicator of counseling quality and assessed subjectively by considering the entire conversation. This paper proposes to encode long counseling conversation using a hierarchical attention network. Conversations with extreme values of empathy rating are used to train a Siamese ne…

    Submitted 4 September, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted by INTERSPEECH 2024

  16. arXiv:2305.01899  [pdf, other]

    cs.AI cs.CY eess.IV

    Empowering Agrifood System with Artificial Intelligence: A Survey of the Progress, Challenges and Opportunities

    Authors: Tao Chen, Liang Lv, Di Wang, Jing Zhang, Yue Yang, Zeyang Zhao, Chen Wang, Xiaowei Guo, Hao Chen, Qingye Wang, Yufei Xu, Qiming Zhang, Bo Du, Liangpei Zhang, Dacheng Tao

    Abstract: With the world population rapidly increasing, transforming our agrifood systems to be more productive, efficient, safe, and sustainable is crucial to mitigate potential food shortages. Recently, artificial intelligence (AI) techniques such as deep learning (DL) have demonstrated their strong abilities in various areas, including language, vision, remote sensing (RS), and agrifood systems applicati…

    Submitted 26 September, 2024; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: Accepted by ACM Computing Surveys

  17. arXiv:2303.06682  [pdf, other]

    cs.CV eess.IV

    DDS2M: Self-Supervised Denoising Diffusion Spatio-Spectral Model for Hyperspectral Image Restoration

    Authors: Yuchun Miao, Lefei Zhang, Liangpei Zhang, Dacheng Tao

    Abstract: Diffusion models have recently received a surge of interest due to their impressive performance for image restoration, especially in terms of noise robustness. However, existing diffusion-based methods are trained on a large amount of training data and perform very well in-distribution, but can be quite susceptible to distribution shift. This is especially inappropriate for data-starved hyperspect…

    Submitted 19 March, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

    Comments: 11 pages, 5 figures

  18. arXiv:2302.05726  [pdf, other]

    eess.SY

    Enhance Local Consistency in Federated Learning: A Multi-Step Inertial Momentum Approach

    Authors: Yixing Liu, Yan Sun, Zhengtao Ding, Li Shen, Bo Liu, Dacheng Tao

    Abstract: Federated learning (FL), as a collaborative distributed training paradigm with several edge computing devices under the coordination of a centralized server, is plagued by inconsistent local stationary points due to the heterogeneity of the local partial participation clients, which precipitates the local client-drifts problems and sparks off the unstable and slow convergence, especially on the ag…

    Submitted 11 February, 2023; originally announced February 2023.

  19. arXiv:2212.07867  [pdf, other]

    eess.IV cs.CV cs.RO

    Localizing Scan Targets from Human Pose for Autonomous Lung Ultrasound Imaging

    Authors: Jianzhi Long, Jicang Cai, Abdullah Al-Battal, Shiwei Jin, Jing Zhang, Dacheng Tao, Truong Nguyen

    Abstract: Ultrasound is progressing toward becoming an affordable and versatile solution to medical imaging. With the advent of COVID-19 global pandemic, there is a need to fully automate ultrasound imaging as it requires trained operators in close proximity to patients for a long period of time, therefore increasing risk of infection. In this work, we investigate the important yet seldom-studied problem of…

    Submitted 25 February, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: v2 2023/02/25

    ACM Class: I.4.9

  20. arXiv:2204.12139  [pdf, other]

    cs.CV eess.IV

    Neural Maximum A Posteriori Estimation on Unpaired Data for Motion Deblurring

    Authors: Youjian Zhang, Chaoyue Wang, Dacheng Tao

    Abstract: Real-world dynamic scene deblurring has long been a challenging task since paired blurry-sharp training data is unavailable. Conventional Maximum A Posteriori estimation and deep learning-based deblurring methods are restricted by handcrafted priors and synthetic blurry-sharp training pairs respectively, thereby failing to generalize to real dynamic blurriness. To this end, we propose a Neural Max…

    Submitted 26 April, 2022; originally announced April 2022.

  21. arXiv:2203.16847  [pdf, other]

    eess.AS

    Hierarchical Attention Network for Evaluating Therapist Empathy in Counseling Session

    Authors: Dehua Tao, Tan Lee, Harold Chui, Sarah Luk

    Abstract: Counseling typically takes the form of spoken conversation between a therapist and a client. The empathy level expressed by the therapist is considered to be an essential quality factor of counseling outcome. This paper proposes a hierarchical recurrent network combined with two-level attention mechanisms to determine the therapist's empathy level solely from the acoustic features of conversationa…

    Submitted 26 May, 2023; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted by INTERSPEECH 2022

  22. arXiv:2203.13127  [pdf, other]

    eess.AS

    Characterizing Therapist's Speaking Style in Relation to Empathy in Psychotherapy

    Authors: Dehua Tao, Tan Lee, Harold Chui, Sarah Luk

    Abstract: In conversation-based psychotherapy, therapists use verbal techniques to help clients express thoughts and feelings, and change behavior. In particular, how well therapists convey empathy is an essential quality index of psychotherapy sessions and is associated with psychotherapy outcome. In this paper, we analyze the prosody of therapist speech and attempt to associate the therapist's speaking st…

    Submitted 26 May, 2023; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: Accepted by INTERSPEECH 2022

  23. Brain Age Estimation From MRI Using Cascade Networks with Ranking Loss

    Authors: Jian Cheng, Ziyang Liu, Hao Guan, Zhenzhou Wu, Haogang Zhu, Jiyang Jiang, Wei Wen, Dacheng Tao, Tao Liu

    Abstract: Chronological age of healthy people is able to be predicted accurately using deep neural networks from neuroimaging data, and the predicted brain age could serve as a biomarker for detecting aging-related diseases. In this paper, a novel 3D convolutional network, called two-stage-age-network (TSAN), is proposed to estimate brain age from T1-weighted MRI data. Compared with existing methods, TSAN h…

    Submitted 6 June, 2021; originally announced June 2021.

    Comments: Accepted by IEEE transactions on Medical Imaging, 13 pages, 6 figures

  24. MRI-based Alzheimer's disease prediction via distilling the knowledge in multi-modal data

    Authors: Hao Guan, Chaoyue Wang, Dacheng Tao

    Abstract: Mild cognitive impairment (MCI) conversion prediction, i.e., identifying MCI patients of high risks converting to Alzheimer's disease (AD), is essential for preventing or slowing the progression of AD. Although previous studies have shown that the fusion of multi-modal data can effectively improve the prediction accuracy, their applications are largely restricted by the limited availability or hig…

    Submitted 24 September, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

  25. arXiv:2012.11887  [pdf, other]

    eess.SP

    Joint Optimization of Trajectory, Propulsion and Thrust Powers for Covert UAV-on-UAV Video Tracking and Surveillance

    Authors: Shuyan Hu, Wei Ni, Xin Wang, Abbas Jamalipour, Dean Ta

    Abstract: Autonomous tracking of suspicious unmanned aerial vehicles (UAVs) by legitimate monitoring UAVs (or monitors) can be crucial to public safety and security. It is non-trivial to optimize the trajectory of a monitor while conceiving its monitoring intention, due to typically non-convex propulsion and thrust power functions. This paper presents a novel framework to jointly optimize the propulsion and…

    Submitted 22 December, 2020; originally announced December 2020.

    Comments: IEEE Transactions on Information Forensics and Security, accepted, 21 Dec. 2020

  26. arXiv:2011.14611  [pdf, other]

    cs.CV eess.IV

    SIR: Self-supervised Image Rectification via Seeing the Same Scene from Multiple Different Lenses

    Authors: Jinlong Fan, Jing Zhang, Dacheng Tao

    Abstract: Deep learning has demonstrated its power in image rectification by leveraging the representation capacity of deep neural networks via supervised training based on a large-scale synthetic dataset. However, the model may overfit the synthetic images and generalize not well on real-world fisheye images due to the limited universality of a specific distortion model and the lack of explicitly modeling…

    Submitted 18 June, 2021; v1 submitted 30 November, 2020; originally announced November 2020.

  27. arXiv:2011.12108  [pdf, other]

    cs.CV eess.IV

    Wide-angle Image Rectification: A Survey

    Authors: Jinlong Fan, Jing Zhang, Stephen J. Maybank, Dacheng Tao

    Abstract: Wide field-of-view (FOV) cameras, which capture a larger scene area than narrow FOV cameras, are used in many applications including 3D reconstruction, autonomous driving, and video surveillance. However, wide-angle images contain distortions that violate the assumptions underlying pinhole camera models, resulting in object distortion, difficulties in estimating scene distance, area, and direction…

    Submitted 1 December, 2021; v1 submitted 30 October, 2020; originally announced November 2020.

    Comments: Accepted by the International Journal of Computer Vision (IJCV). Both the datasets and source code are available at https://github.com/loong8888/WAIR

  28. arXiv:2010.16188  [pdf, other]

    cs.CV cs.LG eess.IV

    Bridging Composite and Real: Towards End-to-end Deep Image Matting

    Authors: Jizhizi Li, Jing Zhang, Stephen J. Maybank, Dacheng Tao

    Abstract: Extracting accurate foregrounds from natural images benefits many downstream applications such as film production and augmented reality. However, the furry characteristics and various appearance of the foregrounds, e.g., animal and portrait, challenge existing matting methods, which usually require extra user inputs such as trimap or scribbles. To resolve these problems, we study the distinct role…

    Submitted 26 October, 2021; v1 submitted 30 October, 2020; originally announced October 2020.

    Comments: Accepted by the International Journal of Computer Vision (IJCV). Both the datasets and source code are available at https://github.com/JizhiziLi/GFM

  29. arXiv:2009.08891  [pdf, other]

    eess.IV cs.CV

    AdderSR: Towards Energy Efficient Image Super-Resolution

    Authors: Dehua Song, Yunhe Wang, Hanting Chen, Chang Xu, Chunjing Xu, Dacheng Tao

    Abstract: This paper studies the single image super-resolution problem using adder neural networks (AdderNet). Compared with convolutional neural networks, AdderNet utilizing additions to calculate the output features thus avoid massive energy consumptions of conventional multiplications. However, it is very hard to directly inherit the existing success of AdderNet on large-scale image classification to the…

    Submitted 4 May, 2021; v1 submitted 18 September, 2020; originally announced September 2020.

  30. arXiv:2008.03864  [pdf, other]

    cs.CV cs.LG eess.IV

    Nighttime Dehazing with a Synthetic Benchmark

    Authors: Jing Zhang, Yang Cao, Zheng-Jun Zha, Dacheng Tao

    Abstract: Increasing the visibility of nighttime hazy images is challenging because of uneven illumination from active artificial light sources and haze absorbing/scattering. The absence of large-scale benchmark datasets hampers progress in this area. To address this issue, we propose a novel synthetic method called 3R to simulate nighttime hazy images from daytime clear images, which first reconstructs the…

    Submitted 18 October, 2020; v1 submitted 9 August, 2020; originally announced August 2020.

    Comments: ACM MM 2020. Both the dataset and source code will be available at https://github.com/chaimi2013/3R

  31. arXiv:2002.11474  [pdf, other]

    cs.SD cs.LG eess.AS

    RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition

    Authors: Peiyan Dong, Siyue Wang, Wei Niu, Chengming Zhang, Sheng Lin, Zhengang Li, Yifan Gong, Bin Ren, Xue Lin, Yanzhi Wang, Dingwen Tao

    Abstract: Recurrent neural networks (RNNs) based automatic speech recognition has nowadays become prevalent on mobile devices such as smart phones. However, previous RNN compression techniques either suffer from hardware performance overhead due to irregularity or significant accuracy loss due to the preserved regularity for hardware friendliness. In this work, we propose RTMobile that leverages both a nove…

    Submitted 18 February, 2020; originally announced February 2020.

  32. arXiv:2002.00537  [pdf, other]

    cs.CV cs.LG eess.IV

    Towards High Performance Human Keypoint Detection

    Authors: Jing Zhang, Zhe Chen, Dacheng Tao

    Abstract: Human keypoint detection from a single image is very challenging due to occlusion, blur, illumination and scale variance. In this paper, we address this problem from three aspects by devising an efficient network structure, proposing three effective training strategies, and exploiting four useful postprocessing techniques. First, we find that context information plays an important role in reasonin…

    Submitted 22 May, 2021; v1 submitted 2 February, 2020; originally announced February 2020.

    Comments: Accepted by IJCV

  33. arXiv:1912.01447  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    Transform-Invariant Convolutional Neural Networks for Image Classification and Search

    Authors: Xu Shen, Xinmei Tian, Anfeng He, Shaoyan Sun, Dacheng Tao

    Abstract: Convolutional neural networks (CNNs) have achieved state-of-the-art results on many visual recognition tasks. However, current CNN models still exhibit a poor ability to be invariant to spatial transformations of images. Intuitively, with sufficient layers and parameters, hierarchical combinations of convolution (matrix multiplication and non-linear activation) and pooling operations should be abl…

    Submitted 28 November, 2019; originally announced December 2019.

    Comments: Accepted by ACM Multimedia. arXiv admin note: text overlap with arXiv:1911.12682

  34. arXiv:1911.12682  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    Patch Reordering: a Novel Way to Achieve Rotation and Translation Invariance in Convolutional Neural Networks

    Authors: Xu Shen, Xinmei Tian, Shaoyan Sun, Dacheng Tao

    Abstract: Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance on many visual recognition tasks. However, the combination of convolution and pooling operations only shows invariance to small local location changes in meaningful objects in input. Sometimes, such networks are trained using data augmentation to encode this invariance into the parameters, which restricts the capac…

    Submitted 28 November, 2019; originally announced November 2019.

    Comments: Accepted AAAI17

  35. arXiv:1911.12501  [pdf, other]

    cs.CV cs.LG eess.IV

    An End-to-end Framework for Unconstrained Monocular 3D Hand Pose Estimation

    Authors: Sanjeev Sharma, Shaoli Huang, Dacheng Tao

    Abstract: This work addresses the challenging problem of unconstrained 3D hand pose estimation using monocular RGB images. Most of the existing approaches assume some prior knowledge of hand (such as hand locations and side information) is available for 3D hand pose estimation. This restricts their use in unconstrained environments. We, therefore, present an end-to-end framework that robustly predicts hand…

    Submitted 27 November, 2019; originally announced November 2019.

  36. arXiv:1910.12223  [pdf, other]

    cs.CV cs.LG eess.IV

    Human Keypoint Detection by Progressive Context Refinement

    Authors: Jing Zhang, Zhe Chen, Dacheng Tao

    Abstract: Human keypoint detection from a single image is very challenging due to occlusion, blur, illumination and scale variance of person instances. In this paper, we find that context information plays an important role in addressing these issues, and propose a novel method named progressive context refinement (PCR) for human keypoint detection. First, we devise a simple but effective context-aware modu…

    Submitted 27 October, 2019; originally announced October 2019.

    Comments: Technical Report for "Joint COCO and Mapillary Workshop at ICCV 2019: COCO Keypoint Detection Challenge Track"

  37. arXiv:1910.09830  [pdf]

    cs.CV eess.IV

    Hetero-Center Loss for Cross-Modality Person Re-Identification

    Authors: Yuanxin Zhu, Zhao Yang, Li Wang, Sai Zhao, Xiao Hu, Dapeng Tao

    Abstract: Cross-modality person re-identification is a challenging problem which retrieves a given pedestrian image in RGB modality among all the gallery images in infrared modality. The task can address the limitation of RGB-based person Re-ID in dark environments. Existing researches mainly focus on enlarging inter-class differences of feature to solve the problem. However, few studies investigate improvi…

    Submitted 22 October, 2019; originally announced October 2019.

    Comments: 16 pages, 10 figures

  38. arXiv:1908.07307  [pdf, other]

    cs.LG eess.SP stat.ML

    Investigation of wind pressures on tall building under interference effects using machine learning techniques

    Authors: Gang Hu, Lingbo Liu, Dacheng Tao, Jie Song, K. C. S. Kwok

    Abstract: Interference effects of tall buildings have attracted numerous studies due to the boom of clusters of tall buildings in megacities. To fully understand the interference effects of buildings, it often requires a substantial amount of wind tunnel tests. Limited wind tunnel tests that only cover part of interference scenarios are unable to fully reveal the interference effects. This study used machin…

    Submitted 20 August, 2019; originally announced August 2019.

    Comments: 15 pages, 14 figures

  39. arXiv:1906.01796  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    One-pass Multi-task Networks with Cross-task Guided Attention for Brain Tumor Segmentation

    Authors: Chenhong Zhou, Changxing Ding, Xinchao Wang, Zhentai Lu, Dacheng Tao

    Abstract: Class imbalance has emerged as one of the major challenges for medical image segmentation. The model cascade (MC) strategy significantly alleviates the class imbalance issue via running a set of individual deep models for coarse-to-fine segmentation. Despite its outstanding performance, however, this method leads to undesired system complexity and also ignores the correlation among the models. To…

    Submitted 7 February, 2020; v1 submitted 4 June, 2019; originally announced June 2019.

    Comments: 14 pages, 7 figures, To appear in IEEE Transactions on Image Processing

  40. arXiv:1802.07101  [pdf, other]

    cs.CV eess.IV

    Stroke Controllable Fast Style Transfer with Adaptive Receptive Fields

    Authors: Yongcheng Jing, Yang Liu, Yezhou Yang, Zunlei Feng, Yizhou Yu, Dacheng Tao, Mingli Song

    Abstract: The Fast Style Transfer methods have been recently proposed to transfer a photograph to an artistic style in real-time. This task involves controlling the stroke size in the stylized results, which remains an open challenge. In this paper, we present a stroke controllable style transfer network that can achieve continuous and spatial stroke size control. By analyzing the factors that influence the…

    Submitted 18 October, 2018; v1 submitted 20 February, 2018; originally announced February 2018.

    Comments: Accepted by ECCV2018. Supplementary material: https://yongchengjing.com/pdf/strokeControllable_supp.pdf

  41. arXiv:1709.05077  [pdf, other]

    cs.AI eess.SY

    Transforming Cooling Optimization for Green Data Center via Deep Reinforcement Learning

    Authors: Yuanlong Li, Yonggang Wen, Kyle Guan, Dacheng Tao

    Abstract: Cooling system plays a critical role in a modern data center (DC). Developing an optimal control policy for DC cooling system is a challenging task. The prevailing approaches often rely on approximating system models that are built upon the knowledge of mechanical cooling, electrical and thermal management, which is difficult to design and may lead to sub-optimal or unstable performances. In this…

    Submitted 18 July, 2018; v1 submitted 15 September, 2017; originally announced September 2017.