Showing 1–50 of 54 results for author: Qiao, Y

  1. arXiv:2505.14103  [pdf, other]

    cs.CR cs.AI cs.LG cs.SD eess.AS

    AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models

    Authors: Guangke Chen, Fu Song, Zhe Zhao, Xiaojun Jia, Yang Liu, Yanchen Qiao, Weizhe Zhang

    Abstract: Jailbreak attacks on large audio-language models (LALMs) have been studied recently, but they achieve suboptimal effectiveness, applicability, and practicability, particularly because they assume that the adversary can fully manipulate user prompts. In this work, we first conduct an extensive experiment showing that advanced text jailbreak attacks cannot be easily ported to end-to-end LALMs via text-to-speech (TT…

    Submitted 20 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  2. arXiv:2505.06248  [pdf, ps, other]

    eess.SP cs.IT

    Low-Complexity Channel Estimation in OTFS Systems with Fractional Effects

    Authors: Guangyu Lei, Yanduo Qiao, Tianhao Liang, Weijie Yuan, Tingting Zhang

    Abstract: Orthogonal Time Frequency Space (OTFS) modulation exploits the sparsity of Delay-Doppler domain channels, making it highly effective in high-mobility scenarios. Its accurate channel estimation supports integrated sensing and communication (ISAC) systems. The letter introduces a low-complexity technique for estimating delay and Doppler shifts under fractional effects, while addressing inter-path in…

    Submitted 28 April, 2025; originally announced May 2025.

  3. arXiv:2504.02852  [pdf, other]

    eess.SY cs.RO

    Curvature-Constrained Vector Field for Motion Planning of Nonholonomic Robots

    Authors: Yike Qiao, Xiaodong He, An Zhuo, Zhiyong Sun, Weimin Bao, Zhongkui Li

    Abstract: Vector fields are advantageous in handling nonholonomic motion planning as they provide reference orientation for robots. However, additionally incorporating curvature constraints becomes challenging, due to the interconnection between the design of the curvature-bounded vector field and the tracking controller under underactuation. In this paper, we present a novel framework to co-develop the vec…

    Submitted 25 March, 2025; originally announced April 2025.

  4. arXiv:2504.01375  [pdf, other]

    eess.SP

    Simultaneous Pre-compensation for Bandwidth Limitation and Fiber Dispersion in Cost-Sensitive IM/DD Transmission Systems

    Authors: Zhe Zhao, Aiying Yang, Xiaoqian Huang, Peng Guo, Shuhua Zhao, Tianjia Xu, Wenkai Wan, Tianwai Bo, Zhongwei Tan, Yi Dong, Yaojun Qiao

    Abstract: We propose a pre-compensation scheme for bandwidth limitation and fiber dispersion (pre-BL-EDC) based on the modified Gerchberg-Saxton (GS) algorithm. Experimental results demonstrate 1.0/1.0/2.0 dB gains compared to modified GS pre-EDC for 20/28/32 Gbit/s bandwidth-limited systems.

    Submitted 2 April, 2025; originally announced April 2025.

  5. arXiv:2503.20323  [pdf, other]

    eess.SP

    Derivation and analysis of power offset in fiber-longitudinal power profile estimation using pre-FEC hard-decision data

    Authors: Du Tang, Yingjie Jiang, Ji Luo, Yu Chen, Bofang Zheng, Yaojun Qiao

    Abstract: Utilizing the precise reference waveform regenerated by post-forward error correction (FEC) data, the fiber-longitudinal power profile estimation based on the minimum-mean-square-error method (MMSE-PPE) has been validated as an effective tool for absolute power monitoring. However, when post-FEC data is unavailable, it becomes necessary to rely on pre-FEC hard-decision data, which inevitably intro…

    Submitted 26 March, 2025; originally announced March 2025.

  6. arXiv:2503.13560  [pdf, other]

    eess.IV cs.CV

    MSWAL: 3D Multi-class Segmentation of Whole Abdominal Lesions Dataset

    Authors: Zhaodong Wu, Qiaochu Zhao, Ming Hu, Yulong Li, Haochen Xue, Kang Dang, Zhengyong Jiang, Angelos Stefanidis, Qiufeng Wang, Imran Razzak, Zongyuan Ge, Junjun He, Yu Qiao, Zhong Zheng, Feilong Tang, Jionglong Su

    Abstract: With the significantly increasing incidence and prevalence of abdominal diseases, there is a need to embrace greater use of new innovations and technology for the diagnosis and treatment of patients. Although deep-learning methods have notably been developed to assist radiologists in diagnosing abdominal diseases, existing models have a restricted ability to segment common lesions in the abdomen…

    Submitted 17 March, 2025; originally announced March 2025.

  7. arXiv:2501.15368  [pdf, other]

    cs.CL cs.SD eess.AS

    Baichuan-Omni-1.5 Technical Report

    Authors: Yadong Li, Jun Liu, Tao Zhang, Tao Zhang, Song Chen, Tianpeng Li, Zehuan Li, Lijun Liu, Lingfeng Ming, Guosheng Dong, Da Pan, Chong Li, Yuanbo Fang, Dongdong Kuang, Mingrui Wang, Chenglin Zhu, Youwei Zhang, Hongyu Guo, Fengyu Zhang, Yuran Wang, Bowen Ding, Wei Song, Xu Li, Yuqi Huo, Zheng Liang, et al. (68 additional authors not shown)

    Abstract: We introduce Baichuan-Omni-1.5, an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. To achieve fluent and high-quality interaction across modalities without compromising the capabilities of any modality, we prioritized optimizing three key aspects. First, we establish a comprehensive data cleaning and synthesis pip…

    Submitted 25 January, 2025; originally announced January 2025.

  8. arXiv:2411.00772  [pdf, other]

    eess.AS

    SANN-PSZ: Spatially Adaptive Neural Network for Head-Tracked Personal Sound Zones

    Authors: Yue Qiao, Edgar Choueiri

    Abstract: A deep learning framework for dynamically rendering personal sound zones (PSZs) with head tracking is presented, utilizing a spatially adaptive neural network (SANN) that inputs listeners' head coordinates and outputs PSZ filter coefficients. The SANN model is trained using either simulated acoustic transfer functions (ATFs) with data augmentation for robustness in uncertain environments or a mix…

    Submitted 1 November, 2024; originally announced November 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  9. arXiv:2409.06954  [pdf, other]

    eess.AS

    Neural Ambisonic Encoding For Multi-Speaker Scenarios Using A Circular Microphone Array

    Authors: Yue Qiao, Vinay Kothapally, Meng Yu, Dong Yu

    Abstract: Spatial audio formats like Ambisonics are playback device layout-agnostic and well-suited for applications such as teleconferencing and virtual reality. Conventional Ambisonic encoding methods often rely on spherical microphone arrays for efficient sound field capture, which limits their flexibility in practical scenarios. We propose a deep learning (DL)-based approach, leveraging a two-stage netw…

    Submitted 16 September, 2024; v1 submitted 10 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP 2025

  10. arXiv:2408.03361  [pdf, other]

    eess.IV cs.CV

    GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI

    Authors: Pengcheng Chen, Jin Ye, Guoan Wang, Yanjun Li, Zhongying Deng, Wei Li, Tianbin Li, Haodong Duan, Ziyan Huang, Yanzhou Su, Benyou Wang, Shaoting Zhang, Bin Fu, Jianfei Cai, Bohan Zhuang, Eric J Seibel, Junjun He, Yu Qiao

    Abstract: Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals, and can be applied in various fields. In the medical field, LVLMs have a high potential to offer substantial assistance for diagnosis and treatment. Before that, it is crucial to develop benchmarks to evaluate LVLMs' effectiveness in various medical applications. Curren…

    Submitted 21 October, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

    Comments: GitHub: https://github.com/uni-medical/GMAI-MMBench Hugging face: https://huggingface.co/datasets/OpenGVLab/GMAI-MMBench

  11. arXiv:2408.02865  [pdf, other]

    eess.IV cs.AI cs.CL cs.CV

    VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge

    Authors: Zihan Li, Diping Song, Zefeng Yang, Deming Wang, Fei Li, Xiulan Zhang, Paul E. Kinahan, Yu Qiao

    Abstract: The need for improved diagnostic methods in ophthalmology is acute, especially in the less developed regions with limited access to specialists and advanced equipment. Therefore, we introduce VisionUnite, a novel vision-language foundation model for ophthalmology enhanced with clinical knowledge. VisionUnite has been pretrained on an extensive dataset comprising 1.24 million image-text pairs, and…

    Submitted 5 August, 2024; originally announced August 2024.

  12. arXiv:2407.07667  [pdf, other]

    cs.CV eess.IV

    VEnhancer: Generative Space-Time Enhancement for Video Generation

    Authors: Jingwen He, Tianfan Xue, Dongyang Liu, Xinqi Lin, Peng Gao, Dahua Lin, Yu Qiao, Wanli Ouyang, Ziwei Liu

    Abstract: We present VEnhancer, a generative space-time enhancement framework that improves the existing text-to-video results by adding more details in spatial domain and synthetic detailed motion in temporal domain. Given a generated low-quality video, our approach can increase its spatial and temporal resolution simultaneously with arbitrary up-sampling space and time scales through a unified video diffu…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: technical report

  13. arXiv:2406.09656  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    RSEND: Retinex-based Squeeze and Excitation Network with Dark Region Detection for Efficient Low Light Image Enhancement

    Authors: Jingcheng Li, Ye Qiao, Haocheng Xu, Sitao Huang

    Abstract: Images captured under low-light scenarios often suffer from low quality. Previous CNN-based deep learning methods often involve using Retinex theory. Nevertheless, most of them cannot perform well on more complicated datasets like LOL-v2 while consuming too many computational resources. Besides, some of these methods require sophisticated training at different stages, making the procedure even mor…

    Submitted 23 April, 2025; v1 submitted 13 June, 2024; originally announced June 2024.

  14. arXiv:2404.19500  [pdf, other]

    cs.CV cs.AI cs.MM eess.IV

    Towards Real-world Video Face Restoration: A New Benchmark

    Authors: Ziyan Chen, Jingwen He, Xinqi Lin, Yu Qiao, Chao Dong

    Abstract: Blind face restoration (BFR) on images has significantly progressed over the last several years, while real-world video face restoration (VFR), which is more challenging for more complex face motions such as moving gaze directions and facial orientations involved, remains unsolved. Typical BFR methods are evaluated on privately synthesized datasets or self-collected real-world low-quality face ima…

    Submitted 4 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: Project page: https://ziyannchen.github.io/projects/VFRxBenchmark/

  15. A Multi-loudspeaker Binaural Room Impulse Response Dataset with High-Resolution Translational and Rotational Head Coordinates in a Listening Room

    Authors: Yue Qiao, Ryan Miguel Gonzales, Edgar Choueiri

    Abstract: Data report for the 3D3A Lab Binaural Room Impulse Response (BRIR) Dataset (https://doi.org/10.34770/6gc9-5787).

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Submitted to Frontiers in Signal Processing

  16. arXiv:2402.09181  [pdf, other]

    eess.IV cs.CV

    OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM

    Authors: Yutao Hu, Tianbin Li, Quanfeng Lu, Wenqi Shao, Junjun He, Yu Qiao, Ping Luo

    Abstract: Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in various multimodal tasks. However, their potential in the medical domain remains largely unexplored. A significant challenge arises from the scarcity of diverse medical images spanning various modalities and anatomical regions, which is essential in real-world medical applications. To solve this problem, in this pape…

    Submitted 21 April, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  17. arXiv:2402.01246  [pdf, other]

    cs.RO eess.SY

    LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving

    Authors: Daocheng Fu, Wenjie Lei, Licheng Wen, Pinlong Cai, Song Mao, Min Dou, Botian Shi, Yu Qiao

    Abstract: The emergence of Multimodal Large Language Models ((M)LLMs) has ushered in new avenues in artificial intelligence, particularly for autonomous driving by offering enhanced understanding and reasoning capabilities. This paper introduces LimSim++, an extended version of LimSim designed for the application of (M)LLMs in autonomous driving. Acknowledging the limitations of existing simulation platform…

    Submitted 12 April, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted by 35th IEEE Intelligent Vehicles Symposium (IV 2024)

  18. arXiv:2311.11969  [pdf, other]

    eess.IV cs.CV

    SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks

    Authors: Jin Ye, Junlong Cheng, Jianpin Chen, Zhongying Deng, Tianbin Li, Haoyu Wang, Yanzhou Su, Ziyan Huang, Jilong Chen, Lei Jiang, Hui Sun, Min Zhu, Shaoting Zhang, Junjun He, Yu Qiao

    Abstract: Segment Anything Model (SAM) has achieved impressive results for natural image segmentation with input prompts such as points and bounding boxes. Its success largely owes to massive labeled training data. However, directly applying SAM to medical image segmentation cannot perform well because SAM lacks medical knowledge -- it does not use medical images for training. To incorporate medical knowled…

    Submitted 20 November, 2023; originally announced November 2023.

  19. arXiv:2310.15413  [pdf, other]

    eess.SY

    Sensor Attacks and Resilient Defense on HVAC Systems for Energy Market Signal Tracking

    Authors: Guanyu Tian, Qun Zhou Sun, Yiyuan Qiao

    Abstract: The power flexibility from smart buildings makes them suitable candidates for providing grid services. The building automation system (BAS) that employs model predictive control (MPC) for grid services relies heavily on sensor data gathered from IoT-based HVAC systems through communication networks. However, cyber-attacks that tamper with sensor values can compromise the accuracy and flexibility of HVA…

    Submitted 23 October, 2023; originally announced October 2023.

  20. arXiv:2310.10513  [pdf, other]

    cs.CV eess.IV

    Unifying Image Processing as Visual Prompting Question Answering

    Authors: Yihao Liu, Xiangyu Chen, Xianzheng Ma, Xintao Wang, Jiantao Zhou, Yu Qiao, Chao Dong

    Abstract: Image processing is a fundamental task in computer vision, which aims at enhancing image quality and extracting essential features for subsequent vision applications. Traditionally, task-specific models are developed for individual tasks and designing such models requires distinct expertise. Building upon the success of large language models (LLMs) in natural language processing (NLP), there is a…

    Submitted 20 February, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: 16 pages, 12 figures

  21. arXiv:2309.04084  [pdf, other]

    cs.CV cs.MM eess.IV

    Towards Efficient SDRTV-to-HDRTV by Learning from Image Formation

    Authors: Xiangyu Chen, Zheyuan Li, Zhengwen Zhang, Jimmy S. Ren, Yihao Liu, Jingwen He, Yu Qiao, Jiantao Zhou, Chao Dong

    Abstract: Modern displays can render video content with high dynamic range (HDR) and wide color gamut (WCG). However, most resources are still in standard dynamic range (SDR). Therefore, transforming existing SDR content into the HDRTV standard holds significant value. This paper defines and analyzes the SDRTV-to-HDRTV task by modeling the formation of SDRTV/HDRTV content. Our findings reveal that a naive e…

    Submitted 20 September, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

    Comments: Extended version of HDRTVNet

  22. A-Eval: A Benchmark for Cross-Dataset Evaluation of Abdominal Multi-Organ Segmentation

    Authors: Ziyan Huang, Zhongying Deng, Jin Ye, Haoyu Wang, Yanzhou Su, Tianbin Li, Hui Sun, Junlong Cheng, Jianpin Chen, Junjun He, Yun Gu, Shaoting Zhang, Lixu Gu, Yu Qiao

    Abstract: Although deep learning has revolutionized abdominal multi-organ segmentation, models often struggle with generalization due to training on small, specific datasets. With the recent emergence of large-scale datasets, some important questions arise: Can models trained on these datasets generalize well on different ones? If yes/no, how to further improve their generalizability? To address t…

    Submitted 7 September, 2023; originally announced September 2023.

  23. arXiv:2309.03905  [pdf, other]

    cs.MM cs.CL cs.CV cs.LG cs.SD eess.AS

    ImageBind-LLM: Multi-modality Instruction Tuning

    Authors: Jiaming Han, Renrui Zhang, Wenqi Shao, Peng Gao, Peng Xu, Han Xiao, Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Xiangyu Yue, Hongsheng Li, Yu Qiao

    Abstract: We present ImageBind-LLM, a multi-modality instruction tuning method of large language models (LLMs) via ImageBind. Existing works mainly focus on language and image instruction tuning, different from which, our ImageBind-LLM can respond to multi-modality conditions, including audio, 3D point clouds, video, and their embedding-space arithmetic by only image-text alignment training. During training…

    Submitted 11 September, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

    Comments: Code is available at https://github.com/OpenGVLab/LLaMA-Adapter

  24. arXiv:2307.06648  [pdf, other]

    eess.SY cs.RO

    LimSim: A Long-term Interactive Multi-scenario Traffic Simulator

    Authors: Licheng Wen, Daocheng Fu, Song Mao, Pinlong Cai, Min Dou, Yikang Li, Yu Qiao

    Abstract: With the growing popularity of digital twin and autonomous driving in transportation, the demand for simulation systems capable of generating high-fidelity and reliable scenarios is increasing. Existing simulation systems suffer from a lack of support for different types of scenarios, and the vehicle models used in these systems are too simplistic. Thus, such systems fail to represent driving styl…

    Submitted 26 July, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: Accepted by 26th IEEE International Conference on Intelligent Transportation Systems (ITSC 2023)

  25. arXiv:2306.11504  [pdf, other]

    cs.GR cs.CV cs.SD eess.AS

    Align, Adapt and Inject: Sound-guided Unified Image Generation

    Authors: Yue Yang, Kaipeng Zhang, Yuying Ge, Wenqi Shao, Zeyue Xue, Yu Qiao, Ping Luo

    Abstract: Text-guided image generation has witnessed unprecedented progress due to the development of diffusion models. Beyond text and image, sound is a vital element within the sphere of human perception, offering vivid representations and naturally coinciding with corresponding scenes. Taking advantage of sound therefore presents a promising avenue for exploration within image generation research. Howeve…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: Tech Report

  26. arXiv:2306.01808  [pdf, other]

    eess.IV cs.CV

    Morphology Edge Attention Network and Optimal Geometric Matching Connection model for vascular segmentation

    Authors: Yuntao Zhu, Yuxuan Qiao, Xiaoping Yang

    Abstract: There are many unsolved problems in vascular image segmentation, including vascular structural connectivity, scarce branches and missing small vessels. Obtaining vessels that preserve their correct topological structures is currently a crucial research issue, as it provides an overall view of one vascular system. In order to preserve the topology and accuracy of vessel segmentation, we proposed a…

    Submitted 13 December, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: 6 pages

  27. arXiv:2305.01319  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Long-Term Rhythmic Video Soundtracker

    Authors: Jiashuo Yu, Yaohui Wang, Xinyuan Chen, Xiao Sun, Yu Qiao

    Abstract: We consider the problem of generating musical soundtracks in sync with rhythmic visual cues. Most existing works rely on pre-defined music representations, leading to the incompetence of generative flexibility and complexity. Other methods directly generating video-conditioned waveforms suffer from limited scenarios, short lengths, and unstable generation quality. To this end, we present Long-Term…

    Submitted 30 May, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: ICML2023

    Report number: 15

  28. arXiv:2210.05960  [pdf, other]

    eess.IV cs.CV

    Efficient Image Super-Resolution using Vast-Receptive-Field Attention

    Authors: Lin Zhou, Haoming Cai, Jinjin Gu, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Yu Qiao, Chao Dong

    Abstract: The attention mechanism plays a pivotal role in designing advanced super-resolution (SR) networks. In this work, we design an efficient SR network by improving the attention mechanism. We start from a simple pixel attention module and gradually modify it to achieve better super-resolution performance with reduced parameters. The specific approaches include: (1) increasing the receptive field of th…

    Submitted 12 October, 2022; originally announced October 2022.

  29. Isolation performance metrics for personal sound zone reproduction systems

    Authors: Yue Qiao, Léo Guadagnin, Edgar Choueiri

    Abstract: Two isolation performance metrics, Inter-Zone Isolation (IZI) and Inter-Program Isolation (IPI), are introduced for evaluating Personal Sound Zone (PSZ) systems. Compared to the commonly-used Acoustic Contrast metric, IZI and IPI are generalized for multichannel audio, and quantify the isolation of sound zones and of audio programs, respectively. The two metrics are shown to be generally non-inter…

    Submitted 22 September, 2022; originally announced September 2022.

  30. arXiv:2208.00883  [pdf, other]

    eess.SP cs.AI cs.HC cs.LG

    A Two-Stage Efficient 3-D CNN Framework for EEG Based Emotion Recognition

    Authors: Ye Qiao, Mohammed Alnemari, Nader Bagherzadeh

    Abstract: This paper proposes a novel two-stage framework for emotion recognition using EEG data that outperforms state-of-the-art models while keeping the model size small and computationally efficient. The framework consists of two stages; the first stage involves constructing efficient models named EEGNet, which is inspired by the state-of-the-art efficient architecture and employs inverted-residual bloc…

    Submitted 26 July, 2022; originally announced August 2022.

  31. arXiv:2206.12512  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Placental Vessel Segmentation and Registration in Fetoscopy: Literature Review and MICCAI FetReg2021 Challenge Findings

    Authors: Sophia Bano, Alessandro Casella, Francisco Vasconcelos, Abdul Qayyum, Abdesslam Benzinou, Moona Mazher, Fabrice Meriaudeau, Chiara Lena, Ilaria Anita Cintorrino, Gaia Romana De Paolis, Jessica Biagioli, Daria Grechishnikova, Jing Jiao, Bizhe Bai, Yanyan Qiao, Binod Bhattarai, Rebati Raman Gaire, Ronast Subedi, Eduard Vazquez, Szymon Płotka, Aneta Lisowska, Arkadiusz Sitek, George Attilakos, Ruwan Wimalasundera, Anna L David, et al. (6 additional authors not shown)

    Abstract: Fetoscopy laser photocoagulation is a widely adopted procedure for treating Twin-to-Twin Transfusion Syndrome (TTTS). The procedure involves photocoagulating pathological anastomoses to regulate blood exchange among twins. The procedure is particularly challenging due to the limited field of view, poor manoeuvrability of the fetoscope, poor visibility, and variability in illumination. These challe…

    Submitted 26 February, 2023; v1 submitted 24 June, 2022; originally announced June 2022.

    Comments: Accepted at MedIA (Medical Image Analysis)

  32. arXiv:2205.07019  [pdf, other]

    cs.CV eess.IV

    Evaluating the Generalization Ability of Super-Resolution Networks

    Authors: Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, Chao Dong

    Abstract: Performance and generalization ability are two important aspects to evaluate the deep learning models. However, research on the generalization ability of Super-Resolution (SR) networks is currently absent. Assessing the generalization ability of deep models not only helps us to understand their intrinsic mechanisms, but also allows us to quantitatively measure their applicability boundaries, which…

    Submitted 3 September, 2023; v1 submitted 14 May, 2022; originally announced May 2022.

    Comments: Accepted by TPAMI

  33. arXiv:2205.05996  [pdf, other]

    cs.CV eess.IV

    Blueprint Separable Residual Network for Efficient Image Super-Resolution

    Authors: Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Jinjin Gu, Yu Qiao, Chao Dong

    Abstract: Recent advances in single image super-resolution (SISR) have achieved extraordinary performance, but the computational cost is too heavy to apply in edge devices. To alleviate this problem, many novel and effective solutions have been proposed. Convolutional neural network (CNN) with the attention mechanism has attracted increasing attention due to its efficiency and effectiveness. However, there…

    Submitted 12 May, 2022; originally announced May 2022.

    Comments: Accepted to CVPR Workshops

  34. arXiv:2205.05675  [pdf, other]

    cs.CV eess.IV

    NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

    Authors: Yawei Li, Kai Zhang, Radu Timofte, Luc Van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, et al. (86 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of ×4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of e…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: Validation code of the baseline model is available at https://github.com/ofsoundof/IMDN. Validation of all submitted models is available at https://github.com/ofsoundof/NTIRE2022_ESR

  35. arXiv:2205.04437  [pdf, other]

    eess.IV cs.CV

    Activating More Pixels in Image Super-Resolution Transformer

    Authors: Xiangyu Chen, Xintao Wang, Jiantao Zhou, Yu Qiao, Chao Dong

    Abstract: Transformer-based methods have shown impressive performance in low-level vision tasks, such as image super-resolution. However, we find that these networks can only utilize a limited spatial range of input information through attribution analysis. This implies that the potential of Transformer is still not fully exploited in existing networks. In order to activate more input pixels for better reco…

    Submitted 18 March, 2023; v1 submitted 9 May, 2022; originally announced May 2022.

    Comments: Accepted to CVPR2023

  36. arXiv:2203.13310  [pdf, other]

    cs.CV cs.AI eess.IV

    MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection

    Authors: Renrui Zhang, Han Qiu, Tai Wang, Ziyu Guo, Yiwen Tang, Xuanzhuo Xu, Ziteng Cui, Yu Qiao, Peng Gao, Hongsheng Li

    Abstract: Monocular 3D object detection has long been a challenging task in autonomous driving. Most existing methods follow conventional 2D detectors to first localize object centers, and then predict 3D attributes by neighboring features. However, only using local visual features is insufficient to understand the scene-level 3D spatial structures and ignores the long-range inter-object depth relations. In…

    Submitted 13 February, 2025; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: Accepted by ICCV 2023. Code is available at https://github.com/ZrrSkywalker/MonoDETR

  37. arXiv:2110.04562  [pdf, other]

    cs.CV eess.IV

    Temporally Consistent Video Colorization with Deep Feature Propagation and Self-regularization Learning

    Authors: Yihao Liu, Hengyuan Zhao, Kelvin C. K. Chan, Xintao Wang, Chen Change Loy, Yu Qiao, Chao Dong

    Abstract: Video colorization is a challenging and highly ill-posed problem. Although recent years have witnessed remarkable progress in single image colorization, there is relatively less research effort on video colorization and existing methods always suffer from severe flickering artifacts (temporal inconsistency) or unsatisfying colorization performance. We address this problem from a new perspective, b…

    Submitted 9 October, 2021; originally announced October 2021.

    Comments: 13 pages, 10 figures

  38. Automated Aerial Animal Detection When Spatial Resolution Conditions Are Varied

    Authors: Jasper Brown, Yongliang Qiao, Cameron Clark, Sabrina Lomax, Khalid Rafique, Salah Sukkarieh

    Abstract: Knowing where livestock are located enables optimized management and mustering. However, Australian farms are large, meaning that many of Australia's livestock are unmonitored, which impacts farm profit, animal welfare and the environment. Effective animal localisation and counting by analysing satellite imagery overcomes this management hurdle; however, high resolution satellite imagery is expensive…

    Submitted 4 October, 2021; originally announced October 2021.

    Comments: 20 pages, 9 figures, 4 tables in appendix

    Journal ref: Computers and Electronics in Agriculture, Volume 193, 2022

  39. arXiv:2109.12634  [pdf, other]

    eess.IV cs.CV

    A Novel Hybrid Convolutional Neural Network for Accurate Organ Segmentation in 3D Head and Neck CT Images

    Authors: Zijie Chen, Cheng Li, Junjun He, Jin Ye, Diping Song, Shanshan Wang, Lixu Gu, Yu Qiao

    Abstract: Radiation therapy (RT) is widely employed in the clinic for the treatment of head and neck (HaN) cancers. An essential step of RT planning is the accurate segmentation of various organs-at-risk (OARs) in HaN CT images. Nevertheless, segmenting OARs manually is time-consuming, tedious, and error-prone considering that typical HaN CT images contain tens to hundreds of slices. Automated segmentation…

    Submitted 26 September, 2021; originally announced September 2021.

    Comments: 10 pages, 2 figures

  40. arXiv:2109.12629  [pdf, ps, other]

    eess.IV cs.CV

    Group Shift Pointwise Convolution for Volumetric Medical Image Segmentation

    Authors: Junjun He, Jin Ye, Cheng Li, Diping Song, Wanli Chen, Shanshan Wang, Lixu Gu, Yu Qiao

    Abstract: Recent studies have witnessed the effectiveness of 3D convolutions on segmenting volumetric medical images. Compared with the 2D counterparts, 3D convolutions can capture the spatial context in three dimensions. Nevertheless, models employing 3D convolutions introduce more trainable parameters and are more computationally complex, which may lead easily to model overfitting especially for medical a…

    Submitted 26 September, 2021; originally announced September 2021.

    Comments: 10 pages, 2 figures

  41. arXiv:2108.07978  [pdf, other]

    eess.IV cs.CV

    A New Journey from SDRTV to HDRTV

    Authors: Xiangyu Chen, Zhengwen Zhang, Jimmy S. Ren, Lynhoo Tian, Yu Qiao, Chao Dong

    Abstract: Modern displays are capable of rendering video content with high dynamic range (HDR) and wide color gamut (WCG). However, most available resources are still in standard dynamic range (SDR). Therefore, there is an urgent demand to transform existing SDR-TV content into HDR-TV versions. In this paper, we conduct an analysis of the SDRTV-to-HDRTV task by modeling the formation of SDRTV/HDRTV…

    Submitted 25 September, 2021; v1 submitted 18 August, 2021; originally announced August 2021.

    Comments: Accepted to ICCV

  42. arXiv:2106.08689  [pdf, other]

    cs.CL cs.SD eess.AS

    Alzheimer's Disease Detection from Spontaneous Speech through Combining Linguistic Complexity and (Dis)Fluency Features with Pretrained Language Models

    Authors: Yu Qiao, Xuefeng Yin, Daniel Wiechmann, Elma Kerz

    Abstract: In this paper, we combined linguistic complexity and (dis)fluency features with pretrained language models for the task of Alzheimer's disease detection of the 2021 ADReSSo (Alzheimer's Dementia Recognition through Spontaneous Speech) challenge. An accuracy of 83.1% was achieved on the test set, which amounts to an improvement of 4.23% over the baseline model. Our best-performing model that integr…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: accepted at Interspeech2021

  43. arXiv:2105.13084  [pdf, other]

    eess.IV cs.CV

    HDRUNet: Single Image HDR Reconstruction with Denoising and Dequantization

    Authors: Xiangyu Chen, Yihao Liu, Zhengwen Zhang, Yu Qiao, Chao Dong

    Abstract: Most consumer-grade digital cameras can only capture a limited range of luminance in real-world scenes due to sensor constraints. Besides, noise and quantization errors are often introduced in the imaging process. In order to obtain high dynamic range (HDR) images with excellent visual quality, the most common solution is to combine multiple images with different exposures. However, it is not alwa…

    Submitted 19 June, 2021; v1 submitted 27 May, 2021; originally announced May 2021.

  44. arXiv:2105.03072  [pdf, other]

    eess.IV cs.CV

    NTIRE 2021 Challenge on Perceptual Image Quality Assessment

    Authors: Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Yu Qiao, Shuhang Gu, Radu Timofte, Manri Cheon, Sungjun Yoon, Byungyeon Kang, Junwoo Lee, Qing Zhang, Haiyang Guo, Yi Bin, Yuqing Hou, Hengliang Luo, Jingyu Guo, Zirui Wang, Hai Wang, Wenming Yang, Qingyan Bai, Shuwei Shi, Weihao Xia, Mingdeng Cao, Jiahao Wang, et al. (25 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021. As a new type of image processing technology, perceptual image processing algorithms based on Generative Adversarial Networks (GAN) have produced images with more realistic textures. These o…

    Submitted 28 June, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

  45. arXiv:2103.10339  [pdf, other]

    cs.CV eess.IV

    Investigate Indistinguishable Points in Semantic Segmentation of 3D Point Cloud

    Authors: Mingye Xu, Zhipeng Zhou, Junhao Zhang, Yu Qiao

    Abstract: This paper investigates the indistinguishable points (those whose labels are difficult to predict) in semantic segmentation for large-scale 3D point clouds. The indistinguishable points consist of those located on complex boundaries, points with similar local textures but different categories, and points in isolated small hard areas, which largely harm the performance of 3D semantic segmentation. To address this chal…

    Submitted 25 August, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

    Comments: AAAI2021

  46. arXiv:2102.00969  [pdf]

    cs.DC eess.SY

    Blockchain for Decentralized Multi-Drone to Combat COVID-19

    Authors: S. H. Alsamhi, B. Lee, M. Guizani, N. Kumar, Y. Qiao, Xuan Liu

    Abstract: Currently, drones represent a promising technology for combating Coronavirus disease 2019 (COVID-19) because they can transport goods and medical supplies to a given target location in quarantine areas experiencing an epidemic outbreak. Drone missions will increasingly rely on drone collaboration, which requires the drones to reduce communication complexity and be controlled in a decentralized fashi…

    Submitted 1 February, 2021; originally announced February 2021.

    Journal ref: Transactions on Emerging Telecommunication Technologies, 2021

  47. arXiv:2010.16127  [pdf, other]

    eess.SP

    Fixed-State Log-MAP Detection for Intensity-Modulation and Direct-Detection Optical Systems over Dispersion-Uncompensated Links

    Authors: Shuangyue Liu, Ji Zhou, Haide Wang, Mengqi Guo, Yueming Lu, Yaojun Qiao

    Abstract: In this paper, an optimized detection based on log-maximum a posteriori estimation with a fixed number of surviving states (fixed-state Log-MAP) is proposed to cooperate with equalizers to deal with the spectral distortions caused by limited bandwidth and chromatic dispersion for intensity-modulation and direct-detection (IM/DD) optical systems. The equalizers compensate the spectral distortion…

    Submitted 11 January, 2021; v1 submitted 30 October, 2020; originally announced October 2020.

    Comments: 7 pages, 9 figures

  48. Modified QPSK Partition Algorithm Based on MAP Estimation for Probabilistically-Shaped 16-QAM

    Authors: Jin Hu, Zhongliang Sun, Xuekai Xu, Mengqi Guo, Xizi Tang, Yueming Lu, Yaojun Qiao

    Abstract: Probabilistic shaping (PS) is investigated as a potential technique to approach the Shannon limit. However, it has been shown that conventional carrier phase recovery (CPR) algorithms designed for uniform distributions may incur an extra penalty in PS systems. In this paper, we find that the performance of the QPSK partition algorithm degrades when PS is implemented. To solve this issue, a modified Q…

    Submitted 20 October, 2020; originally announced October 2020.

  49. arXiv:2010.01073  [pdf, other]

    eess.IV cs.CV

    Efficient Image Super-Resolution Using Pixel Attention

    Authors: Hengyuan Zhao, Xiangtao Kong, Jingwen He, Yu Qiao, Chao Dong

    Abstract: This work aims at designing a lightweight convolutional neural network for image super-resolution (SR). With simplicity in mind, we construct a concise and effective network with a newly proposed pixel attention scheme. Pixel attention (PA) is similar to channel attention and spatial attention in formulation. The difference is that PA produces 3D attention maps instead of a 1D attentio…

    Submitted 2 October, 2020; originally announced October 2020.

    Comments: 17 pages, 5 figures, conference, accepted by ECCVW (AIM2020 ESR Challenge)

  50. arXiv:2009.06943  [pdf, other]

    eess.IV cs.CV

    AIM 2020 Challenge on Efficient Super-Resolution: Methods and Results

    Authors: Kai Zhang, Martin Danelljan, Yawei Li, Radu Timofte, Jie Liu, Jie Tang, Gangshan Wu, Yu Zhu, Xiangyu He, Wenjie Xu, Chenghua Li, Cong Leng, Jian Cheng, Guangyang Wu, Wenyi Wang, Xiaohong Liu, Hengyuan Zhao, Xiangtao Kong, Jingwen He, Yu Qiao, Chao Dong, Xiaotong Luo, Liang Chen, Jiangtao Zhang, Maitreya Suin, et al. (60 additional authors not shown)

    Abstract: This paper reviews the AIM 2020 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor x4 based on a set of prior examples of low and corresponding high resolution images. The goal is to devise a network that reduces one or several aspects such as runtime, parameter co…

    Submitted 15 September, 2020; originally announced September 2020.