
Showing 1–50 of 66 results for author: Hao, J

Searching in archive eess.
  1. arXiv:2507.03872  [pdf, ps, other]

    eess.IV cs.CV

    PLUS: Plug-and-Play Enhanced Liver Lesion Diagnosis Model on Non-Contrast CT Scans

    Authors: Jiacheng Hao, Xiaoming Zhang, Wei Liu, Xiaoli Yin, Yuan Gao, Chunli Li, Ling Zhang, Le Lu, Yu Shi, Xu Han, Ke Yan

    Abstract: Focal liver lesions (FLL) are common clinical findings during physical examination. Early diagnosis and intervention of liver malignancies are crucial to improving patient survival. Although the current 3D segmentation paradigm can accurately detect lesions, it faces limitations in distinguishing between malignant and benign liver lesions, primarily due to its inability to differentiate subtle var…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: MICCAI 2025 (Early Accepted)

  2. arXiv:2506.20303  [pdf]

    eess.IV cs.CL cs.CV

    FundaQ-8: A Clinically-Inspired Scoring Framework for Automated Fundus Image Quality Assessment

    Authors: Lee Qi Zun, Oscar Wong Jin Hao, Nor Anita Binti Che Omar, Zalifa Zakiah Binti Asnir, Mohamad Sabri bin Sinal Zainal, Goh Man Fye

    Abstract: Automated fundus image quality assessment (FIQA) remains a challenge due to variations in image acquisition and subjective expert evaluations. We introduce FundaQ-8, a novel expert-validated framework for systematically assessing fundus image quality using eight critical parameters, including field coverage, anatomical visibility, illumination, and image artifacts. Using FundaQ-8 as a structured s…

    Submitted 25 June, 2025; originally announced June 2025.

  3. arXiv:2505.08536  [pdf, ps, other]

    eess.SP cs.IT

    Short Wins Long: Short Codes with Language Model Semantic Correction Outperform Long Codes

    Authors: Jiafu Hao, Chentao Yue, Hao Chang, Branka Vucetic, Yonghui Li

    Abstract: This paper presents a novel semantic-enhanced decoding scheme for transmitting natural language sentences with multiple short block codes over noisy wireless channels. After ASCII source coding, the natural language sentence message is divided into segments, where each is encoded with short block channel codes independently before transmission. At the receiver, each short block of codewords is dec…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 6 pages, 3 figures

  4. arXiv:2505.07159  [pdf, ps, other]

    eess.IV cs.CV

    Skull stripping with purely synthetic data

    Authors: Jong Sung Park, Juhyung Ha, Siddhesh Thakur, Alexandra Badea, Spyridon Bakas, Eleftherios Garyfallidis

    Abstract: While many skull stripping algorithms have been developed for multi-modal and multi-species cases, there is still a lack of a fundamentally generalizable approach. We present PUMBA (PUrely synthetic Multimodal/species invariant Brain extrAction), a strategy to train a model for brain extraction with no real brain images or labels. Our results show that even without any real images or anatomical pri…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: Oral at ISMRM 2025

  5. arXiv:2504.13936  [pdf, other]

    cs.HC cs.LG eess.SY

    ViMo: A Generative Visual GUI World Model for App Agents

    Authors: Dezhao Luo, Bohan Tang, Kang Li, Georgios Papoudakis, Jifei Song, Shaogang Gong, Jianye Hao, Jun Wang, Kun Shao

    Abstract: App agents, which autonomously operate mobile Apps through Graphical User Interfaces (GUIs), have gained significant interest in real-world applications. Yet, they often struggle with long-horizon planning, failing to find the optimal actions for complex tasks with longer steps. To address this, world models are used to predict the next GUI observation based on user actions, enabling more effectiv…

    Submitted 20 May, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

    Comments: https://ai-agents-2030.github.io/ViMo/

  6. arXiv:2504.02373  [pdf, other]

    eess.IV cs.CV

    HPGN: Hybrid Priors-Guided Network for Compressed Low-Light Image Enhancement

    Authors: Hantang Li, Jinhua Hao, Lei Xiong, Shuyuan Zhu

    Abstract: In practical applications, conventional methods generate large volumes of low-light images that require compression for efficient storage and transmission. However, most existing methods either disregard the removal of potential compression artifacts during the enhancement process or fail to establish a unified framework for joint task enhancement of images with varying compression qualities. To s…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 7 pages, 5 figures

  7. arXiv:2503.19591  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    Boosting the Transferability of Audio Adversarial Examples with Acoustic Representation Optimization

    Authors: Weifei Jin, Junjie Su, Hejia Wang, Yulin Ye, Jie Hao

    Abstract: With the widespread application of automatic speech recognition (ASR) systems, their vulnerability to adversarial attacks has been extensively studied. However, most existing adversarial examples are generated on specific individual models, resulting in a lack of transferability. In real-world scenarios, attackers often cannot access detailed information about the target model, making query-based…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: Accepted to ICME 2025

  8. arXiv:2503.18836  [pdf, other]

    eess.IV cs.AI cs.CV

    Dual-domain Multi-path Self-supervised Diffusion Model for Accelerated MRI Reconstruction

    Authors: Yuxuan Zhang, Jinkui Hao, Bo Zhou

    Abstract: Magnetic resonance imaging (MRI) is a vital diagnostic tool, but its inherently long acquisition times reduce clinical efficiency and patient comfort. Recent advancements in deep learning, particularly diffusion models, have improved accelerated MRI reconstruction. However, existing diffusion models' training often relies on fully sampled data, models incur high computational costs, and often lack…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: 10 pages, 8 figures, 5 tables

  9. arXiv:2502.09805  [pdf, other]

    eess.IV cs.CV

    Towards Patient-Specific Surgical Planning for Bicuspid Aortic Valve Repair: Fully Automated Segmentation of the Aortic Valve in 4D CT

    Authors: Zaiyang Guo, Ningjun J Dong, Harold Litt, Natalie Yushkevich, Melanie Freas, Jessica Nunez, Victor Ferrari, Jilei Hao, Shir Goldfinger, Matthew A. Jolley, Joseph Bavaria, Nimesh Desai, Alison M. Pouch

    Abstract: The bicuspid aortic valve (BAV) is the most prevalent congenital heart defect and may require surgery for complications such as stenosis, regurgitation, and aortopathy. BAV repair surgery is effective but challenging due to the heterogeneity of BAV morphology. Multiple imaging modalities can be employed to assist the quantitative assessment of BAVs for surgical planning. Contrast-enhanced 4D compu…

    Submitted 13 February, 2025; originally announced February 2025.

  10. arXiv:2502.03974  [pdf]

    eess.SY

    Spatiotemporal Trajectory Tracking Method for Vehicles Incorporating Lead-Lag Judgement

    Authors: Yuan Li, Xiang Dong, Tao Li, Junfeng Hao, Xiaoxue Xu, Sana Ullaha, Yincai Cai, Peng Wu, Ting Peng

    Abstract: In the domain of intelligent transportation systems, especially within the context of autonomous vehicle control, the preemptive holistic collaborative system has been presented as a promising solution to bring a remarkable enhancement in traffic efficiency and a substantial reduction in the accident rate, demonstrating a great potential of development. In order to ensure this system operates as i…

    Submitted 6 February, 2025; originally announced February 2025.

  11. arXiv:2501.10128  [pdf, other]

    eess.IV cs.CV

    FECT: Classification of Breast Cancer Pathological Images Based on Fusion Features

    Authors: Jiacheng Hao, Yiqing Liu, Siqi Zeng, Yonghong He

    Abstract: Breast cancer is one of the most common cancers among women globally, with early diagnosis and precise classification being crucial. With the advancement of deep learning and computer vision, the automatic classification of breast tissue pathological images has emerged as a research focus. Existing methods typically rely on singular cell or tissue features and lack design considerations for morpho…

    Submitted 17 January, 2025; originally announced January 2025.

  12. arXiv:2501.08868  [pdf, other]

    eess.SY cs.HC

    Processing and Analyzing Real-World Driving Data: Insights on Trips, Scenarios, and Human Driving Behaviors

    Authors: Jihun Han, Dominik Karbowski, Ayman Moawad, Namdoo Kim, Aymeric Rousseau, Shihong Fan, Jason Hoon Lee, Jinho Ha

    Abstract: Analyzing large volumes of real-world driving data is essential for providing meaningful and reliable insights into real-world trips, scenarios, and human driving behaviors. To this end, we developed a multi-level data processing approach that adds new information, segments data, and extracts desired parameters. Leveraging a confidential but extensive dataset (over 1 million km), this approach lea…

    Submitted 15 January, 2025; originally announced January 2025.

  13. arXiv:2412.13508  [pdf, other]

    eess.IV cs.CV

    Plug-and-Play Tri-Branch Invertible Block for Image Rescaling

    Authors: Jingwei Bao, Jinhua Hao, Pengcheng Xu, Ming Sun, Chao Zhou, Shuyuan Zhu

    Abstract: High-resolution (HR) images are commonly downscaled to low-resolution (LR) to reduce bandwidth, followed by upscaling to restore their original details. Recent advancements in image rescaling algorithms have employed invertible neural networks (INNs) to create a unified framework for downscaling and upscaling, ensuring a one-to-one mapping between LR and HR images. Traditional methods, utilizing d…

    Submitted 18 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI 2025. Code is available at https://github.com/Jingwei-Bao/T-InvBlocks

  14. arXiv:2412.00575  [pdf, other]

    eess.IV cs.CV

    Multi-resolution Guided 3D GANs for Medical Image Translation

    Authors: Juhyung Ha, Jong Sung Park, David Crandall, Eleftherios Garyfallidis, Xuhong Zhang

    Abstract: Medical image translation is the process of converting from one imaging modality to another, in order to reduce the need for multiple image acquisitions from the same patient. This can enhance the efficiency of treatment by reducing the time, equipment, and labor needed. In this paper, we introduce a multi-resolution guided Generative Adversarial Network (GAN)-based framework for 3D medical image…

    Submitted 30 November, 2024; originally announced December 2024.

  15. arXiv:2410.20742  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Mitigating Unauthorized Speech Synthesis for Voice Protection

    Authors: Zhisheng Zhang, Qianyi Yang, Derui Wang, Pengyang Huang, Yuxin Cao, Kai Ye, Jie Hao

    Abstract: In recent years, it has become possible to replicate a speaker's voice almost perfectly from just a few speech samples, while malicious voice exploitation (e.g., telecom fraud for illegal financial gain) has brought huge hazards in our daily lives. Therefore, it is crucial to protect publicly accessible speech data that contains sensitive information, such as personal voiceprints. Most previous defense methods h…

    Submitted 28 October, 2024; originally announced October 2024.

    Comments: Accepted to ACM CCS Workshop (LAMPS) 2024

  16. arXiv:2410.14803  [pdf, other]

    cs.LG cs.AI cs.DC eess.SY

    DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents

    Authors: Taiyi Wang, Zhihao Wu, Jianheng Liu, Jianye Hao, Jun Wang, Kun Shao

    Abstract: On-device control agents, especially on mobile devices, are responsible for operating mobile devices to fulfill users' requests, enabling seamless and intuitive interactions. Integrating Multimodal Large Language Models (MLLMs) into these agents enhances their ability to understand and execute complex commands, thereby improving user experience. However, fine-tuning MLLMs for on-device control pre…

    Submitted 21 February, 2025; v1 submitted 18 October, 2024; originally announced October 2024.

    Comments: Paper and Appendix, 26 pages

  17. arXiv:2408.11480  [pdf, other]

    eess.IV cs.CV

    OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal

    Authors: Qiao Mo, Yukang Ding, Jinhua Hao, Qiang Zhu, Ming Sun, Chao Zhou, Feiyu Chen, Shuyuan Zhu

    Abstract: Deep learning-based methods have shown remarkable performance in the single JPEG artifacts removal task. However, existing methods tend to degrade on double JPEG images, which are prevalent in real-world scenarios. To address this issue, we propose Offset-Aware Partition Transformer for double JPEG artifacts removal, termed OAPT. We conduct an analysis of double JPEG compression that results in up…

    Submitted 24 September, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: 14 pages, 9 figures. Codes and models are available at https://github.com/QMoQ/OAPT.git

  18. arXiv:2408.05117  [pdf, other]

    eess.IV cs.AI cs.CV

    Beyond the Eye: A Relational Model for Early Dementia Detection Using Retinal OCTA Images

    Authors: Shouyue Liu, Ziyi Zhang, Yuanyuan Gu, Jinkui Hao, Yonghuai Liu, Huazhu Fu, Xinyu Guo, Hong Song, Shuting Zhang, Yitian Zhao

    Abstract: Early detection of dementia, such as Alzheimer's disease (AD) or mild cognitive impairment (MCI), is essential to enable timely intervention and potential treatment. Accurate detection of AD/MCI is challenging due to the high complexity, cost, and often invasive nature of current diagnostic techniques, which limit their suitability for large-scale population screening. Given the shared embryologic…

    Submitted 12 March, 2025; v1 submitted 9 August, 2024; originally announced August 2024.

  19. Multiscale Spatio-Temporal Enhanced Short-term Load Forecasting of Electric Vehicle Charging Stations

    Authors: Zongbao Zhang, Jiao Hao, Wenmeng Zhao, Yan Liu, Yaohui Huang, Xinhang Luo

    Abstract: The rapid expansion of electric vehicles (EVs) has rendered the load forecasting of electric vehicle charging stations (EVCS) increasingly critical. The primary challenge in achieving precise load forecasting for EVCS lies in accounting for the nonlinearity of charging behaviors, the spatial interactions among different stations, and the intricate temporal variations in usage patterns. To address the…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 5 pages, 1 figure, AEEES 2024

  20. arXiv:2405.09470  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer

    Authors: Weifei Jin, Yuxin Cao, Junjie Su, Qi Shen, Kai Ye, Derui Wang, Jie Hao, Ziyao Liu

    Abstract: In light of the widespread application of Automatic Speech Recognition (ASR) systems, their security concerns have received much more attention than ever before, primarily due to the susceptibility of Deep Neural Networks. Previous studies have illustrated that surreptitiously crafting adversarial perturbations enables the manipulation of speech recognition systems, resulting in the production of…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted to SecTL (AsiaCCS Workshop) 2024

  21. arXiv:2403.10636  [pdf]

    physics.soc-ph econ.GN eess.SY stat.AP

    Resilient by Design: Simulating Street Network Disruptions across Every Urban Area in the World

    Authors: Geoff Boeing, Jaehyun Ha

    Abstract: Street networks allow people and goods to move through cities, but they are vulnerable to disasters like floods, earthquakes, and terrorist attacks. Well-planned network design can make a city more resilient and robust to such disruptions, but we still know little about worldwide patterns of vulnerability, or worldwide empirical relationships between specific design characteristics and resilience.…

    Submitted 15 March, 2024; originally announced March 2024.

    Journal ref: Transportation Research Part A: Policy and Practice, 2024

  22. arXiv:2403.10362  [pdf, other]

    eess.IV cs.CV

    CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement

    Authors: Qiang Zhu, Jinhua Hao, Yukang Ding, Yu Liu, Qiao Mo, Ming Sun, Chao Zhou, Shuyuan Zhu

    Abstract: Recently, numerous approaches have achieved notable success in compressed video quality enhancement (VQE). However, these methods usually ignore the utilization of valuable coding priors inherently embedded in compressed videos, such as motion vectors and residual frames, which carry abundant temporal and spatial information. To remedy this problem, we propose the Coding Priors-Guided Aggregation…

    Submitted 19 November, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: 11 pages, 8 figures, 6 tables

  23. arXiv:2402.05706  [pdf, other]

    cs.CL cs.SD eess.AS

    Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation

    Authors: Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Soyoon Kim, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Jung-Woo Ha, Sungroh Yoon, Kang Min Yoo

    Abstract: Recent work shows promising results in expanding the capabilities of large language models (LLM) to directly understand and synthesize speech. However, an LLM-based strategy for modeling spoken dialogs remains elusive, calling for further investigation. This paper introduces an extensive speech-text LLM framework, the Unified Spoken Dialog Model (USDM), designed to generate coherent spoken respons…

    Submitted 27 November, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: NeurIPS 2024, Project Page: https://unifiedsdm.github.io/

  24. arXiv:2402.04171  [pdf, other]

    eess.IV cs.CV

    3D Volumetric Super-Resolution in Radiology Using 3D RRDB-GAN

    Authors: Juhyung Ha, Nian Wang, Surendra Maharjan, Xuhong Zhang

    Abstract: This study introduces the 3D Residual-in-Residual Dense Block GAN (3D RRDB-GAN) for 3D super-resolution for radiology imagery. A key aspect of 3D RRDB-GAN is the integration of a 2.5D perceptual loss function, which contributes to improved volumetric image quality and realism. The effectiveness of our model was evaluated through 4x super-resolution experiments across diverse datasets, including Mi…

    Submitted 6 February, 2024; originally announced February 2024.

  25. arXiv:2401.16017  [pdf, other]

    eess.SP

    DMCE: Diffusion Model Channel Enhancer for Multi-User Semantic Communication Systems

    Authors: Youcheng Zeng, Xinxin He, Xu Chen, Haonan Tong, Zhaohui Yang, Yijun Guo, Jianjun Hao

    Abstract: To achieve continuous massive data transmission with significantly reduced data payload, the users can adopt semantic communication techniques to compress the redundant information by transmitting semantic features instead. However, current works on semantic communication mainly focus on high compression ratio, neglecting the wireless channel effects including dynamic distortion and multi-user int…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: accepted by IEEE ICC 2024

  26. Polar-Net: A Clinical-Friendly Model for Alzheimer's Disease Detection in OCTA Images

    Authors: Shouyue Liu, Jinkui Hao, Yanwu Xu, Huazhu Fu, Xinyu Guo, Jiang Liu, Yalin Zheng, Yonghuai Liu, Jiong Zhang, Yitian Zhao

    Abstract: Optical Coherence Tomography Angiography (OCTA) is a promising tool for detecting Alzheimer's disease (AD) by imaging the retinal microvasculature. Ophthalmologists commonly use region-based analysis, such as the ETDRS grid, to study OCTA image biomarkers and understand the correlation with AD. However, existing studies have used general deep computer vision methods, which present challenges in pr…

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: Accepted by MICCAI2023

  27. PIPO-Net: A Penalty-based Independent Parameters Optimization Deep Unfolding Network

    Authors: Xiumei Li, Zhijie Zhang, Huang Bai, Ljubiša Stanković, Junpeng Hao, Junmei Sun

    Abstract: Compressive sensing (CS) has been widely applied in signal and image processing fields. Traditional CS reconstruction algorithms have a complete theoretical foundation but suffer from high computational complexity, while popular deep network-based methods can achieve high-accuracy CS reconstruction but lack interpretability. These facts motivate us to develop a deep unfolding ne…

    Submitted 4 November, 2023; originally announced November 2023.

  28. arXiv:2310.12507  [pdf, other]

    eess.IV

    Multi-granularity Backprojection Transformer for Remote Sensing Image Super-Resolution

    Authors: Jinglei Hao, Wukai Li, Binglu Wang, Shunzhou Wang, Yuting Lu, Ning Li, Yongqiang Zhao

    Abstract: Backprojection networks have achieved promising super-resolution performance for natural images but have not been well explored in the remote sensing image super-resolution (RSISR) field due to their high computation costs. In this paper, we propose a Multi-granularity Backprojection Transformer termed MBT for RSISR. MBT incorporates the backprojection learning strategy into a Transformer framework. It cons…

    Submitted 19 October, 2023; originally announced October 2023.

  29. arXiv:2310.00009  [pdf, other]

    cs.RO eess.SY

    Dataset Generation for Drone Optimal Placement Using Machine Learning

    Authors: Jialin Hao

    Abstract: The unmanned aerial vehicle (UAV), or drone, is increasingly becoming a promising tool in communication systems. This report explains the generation details of a dataset which will be used to design an algorithm for the optimal placement of UAVs in the drone-assisted vehicular network (DAVN). The goal is to improve the drones' communication and energy efficiency after our previous work. The report is…

    Submitted 4 September, 2023; originally announced October 2023.

  30. arXiv:2305.19569  [pdf]

    cs.LG cs.AI cs.CY eess.SP

    Domain knowledge-informed Synthetic fault sample generation with Health Data Map for cross-domain Planetary Gearbox Fault Diagnosis

    Authors: Jong Moon Ha, Olga Fink

    Abstract: Extensive research has been conducted on fault diagnosis of planetary gearboxes using vibration signals and deep learning (DL) approaches. However, DL-based methods are susceptible to the domain shift problem caused by varying operating conditions of the gearbox. Although domain adaptation and data synthesis methods have been proposed to overcome such domain shifts, they are often not directly app…

    Submitted 26 November, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: Under review / added arXiv identifier / Updated to revised version

    Journal ref: Published in Mechanical Systems and Signal Processing Volume 202, 1 November 2023, 110680

  31. arXiv:2305.03387  [pdf, other]

    eess.IV cs.CV

    AsConvSR: Fast and Lightweight Super-Resolution Network with Assembled Convolutions

    Authors: Jiaming Guo, Xueyi Zou, Yuyi Chen, Yi Liu, Jia Hao, Jianzhuang Liu, Youliang Yan

    Abstract: In recent years, videos and images in 720p (HD), 1080p (FHD) and 4K (UHD) resolution have become more popular for display devices such as TVs, mobile phones and VR. However, these high resolution images cannot achieve the expected visual effect due to the limitation of the internet bandwidth, and bring a great challenge for super-resolution networks to achieve real-time performance. Following this…

    Submitted 5 May, 2023; originally announced May 2023.

  32. arXiv:2305.00505  [pdf, ps, other]

    eess.SY math-ph

    Fixed-time safe tracking control of uncertain high-order nonlinear pure-feedback systems via unified transformation functions

    Authors: Chaoqun Guo, Jiangping Hu, Jiasheng Hao, Sergej Celikovsky, Xiaoming Hu

    Abstract: In this paper, a fixed-time safe control problem is investigated for an uncertain high-order nonlinear pure-feedback system with state constraints. A new nonlinear transformation function is firstly proposed to handle both the constrained and unconstrained cases in a unified way. Further, a radial basis function neural network is constructed to approximate the unknown dynamics in the system and a…

    Submitted 30 April, 2023; originally announced May 2023.

  33. arXiv:2304.03433  [pdf, other]

    cs.IT eess.SP

    Multi-User Cooperation for Covert Communication Under Quasi-Static Fading

    Authors: Jinyoung Lee, Duc Trung Dinh, Hyeonsik Yeom, Si-Hyeon Lee, Jeongseok Ha

    Abstract: This work studies a covert communication scheme for an uplink multi-user scenario in which some users are opportunistically selected to help a covert user. In particular, the selected users emit interfering signals via an orthogonal resource dedicated to the covert user together with signals for their own communications using orthogonal resources allocated to the selected users, which helps the co…

    Submitted 10 April, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

    Comments: 13 pages, 8 figures, This work has been submitted to the IEEE for possible publication

  34. arXiv:2211.13920  [pdf, other]

    eess.SP

    Secure Power Control for Downlink Cell-Free Massive MIMO With Passive Eavesdroppers

    Authors: Junguk Park, Sangseok Yun, Jeongseok Ha

    Abstract: This work studies secure communications for a cell-free massive multiple-input multiple-output (CF-mMIMO) network which is attacked by multiple passive eavesdroppers overhearing communications between access points (APs) and users in the network. It will be revealed that the distributed APs in CF-mMIMO allow not only legitimate users but also eavesdroppers to reap the diversity gain, which seriou…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: 5 pages, 3 figures. This work has been submitted to the IEEE for possible publication

  35. arXiv:2211.12704  [pdf, other]

    eess.SP

    Joint Design of Power Control and Access Point Scheduling for Uplink Cell-Free Massive MIMO Networks

    Authors: Hyeonsik Yeom, Junguk Park, Jinho Choi, Jeongseok Ha

    Abstract: This work proposes a joint power control and access point (AP) scheduling algorithm for uplink cell-free massive multiple-input multiple-output (CF-mMIMO) networks without the channel hardening assumption. Extensive studies have been done on the joint optimization problem assuming channel hardening. However, it has been reported that channel hardening may not hold in some CF-mMIMO environ…

    Submitted 25 November, 2022; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: 30 pages, 7 Figures. This work has been submitted to the IEEE for possible publication

  36. arXiv:2208.10745  [pdf, other]

    eess.IV cs.CV

    Retinal Structure Detection in OCTA Image via Voting-based Multi-task Learning

    Authors: Jinkui Hao, Ting Shen, Xueli Zhu, Yonghuai Liu, Ardhendu Behera, Dan Zhang, Bang Chen, Jiang Liu, Jiong Zhang, Yitian Zhao

    Abstract: Automated detection of retinal structures, such as retinal vessels (RV), the foveal avascular zone (FAZ), and retinal vascular junctions (RVJ), is of great importance for understanding diseases of the eye and clinical decision-making. In this paper, we propose a novel Voting-based Adaptive Feature Fusion multi-task network (VAFF-Net) for joint segmentation, detection, and classification of RV, FA…

    Submitted 23 August, 2022; originally announced August 2022.

  37. arXiv:2208.03324  [pdf, other]

    eess.IV cs.CV

    Perception-Distortion Balanced ADMM Optimization for Single-Image Super-Resolution

    Authors: Yuehan Zhang, Bo Ji, Jia Hao, Angela Yao

    Abstract: In image super-resolution, both pixel-wise accuracy and perceptual fidelity are desirable. However, most deep learning methods only achieve high performance in one aspect due to the perception-distortion trade-off, and works that successfully balance the trade-off rely on fusing results from separately trained models with ad-hoc post-processing. In this paper, we propose a novel super-resolution m…

    Submitted 16 August, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  38. arXiv:2207.11697  [pdf, other]

    cs.CL cs.SD eess.AS

    Improving Mandarin Speech Recogntion with Block-augmented Transformer

    Authors: Xiaoming Ren, Huifeng Zhu, Liuwei Wei, Minghui Wu, Jie Hao

    Abstract: Recently, the Convolution-augmented Transformer (Conformer) has shown promising results in Automatic Speech Recognition (ASR), outperforming the previous best published Transformer Transducer. In this work, we believe that the output information of each block in the encoder and decoder is not completely inclusive; in other words, their output information may be complementary. We study how to take advan…

    Submitted 1 December, 2022; v1 submitted 24 July, 2022; originally announced July 2022.

  39. arXiv:2206.14962  [pdf, other]

    eess.AS cs.SD

    GLD-Net: Improving Monaural Speech Enhancement by Learning Global and Local Dependency Features with GLD Block

    Authors: Xinmeng Xu, Yang Wang, Jie Jia, Binbin Chen, Jianjun Hao

    Abstract: For monaural speech enhancement, contextual information is important for accurate speech estimation. However, commonly used convolution neural networks (CNNs) are weak in capturing temporal contexts since they only build blocks that process one local neighborhood at a time. To address this problem, we learn from human auditory perception to introduce a two-stage trainable reasoning mechanism, refe…

    Submitted 29 June, 2022; originally announced June 2022.

    Comments: Accepted by Interspeech 2022

  40. arXiv:2205.08681  [pdf, other]

    eess.AS

    U-Former: Improving Monaural Speech Enhancement with Multi-head Self and Cross Attention

    Authors: Xinmeng Xu, Jianjun Hao

    Abstract: For supervised speech enhancement, contextual information is important for accurate spectral mapping. However, commonly used deep neural networks (DNNs) are limited in capturing temporal contexts. To leverage long-term contexts for tracking a target speaker, this paper treats speech enhancement as sequence-to-sequence mapping, and proposes a novel monaural speech enhancement U-net structure bas…

    Submitted 12 October, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

    Comments: Accepted by ICPR 2022

  41. arXiv:2203.05784  [pdf]

    eess.IV cs.AI cs.CV

    AI-enabled Automatic Multimodal Fusion of Cone-Beam CT and Intraoral Scans for Intelligent 3D Tooth-Bone Reconstruction and Clinical Applications

    Authors: Jin Hao, Jiaxiang Liu, Jin Li, Wei Pan, Ruizhe Chen, Huimin Xiong, Kaiwei Sun, Hangzheng Lin, Wanlu Liu, Wanghui Ding, Jianfei Yang, Haoji Hu, Yueling Zhang, Yang Feng, Zeyu Zhao, Huikai Wu, Youyi Zheng, Bing Fang, Zuozhu Liu, Zhihe Zhao

    Abstract: A critical step in virtual dental treatment planning is to accurately delineate all tooth-bone structures from CBCT with high fidelity and accurate anatomical information. Previous studies have established several methods for CBCT segmentation using deep learning. However, the inherent resolution discrepancy of CBCT and the loss of occlusal and dentition information largely limited its clinical ap…

    Submitted 11 March, 2022; originally announced March 2022.

    Comments: 30 pages, 6 figures, 3 tables

  42. arXiv:2202.10456  [pdf, other]

    cs.LG cs.CR cs.CV eess.IV

    Feasibility Study of Multi-Site Split Learning for Privacy-Preserving Medical Systems under Data Imbalance Constraints in COVID-19, X-Ray, and Cholesterol Dataset

    Authors: Yoo Jeong Ha, Gusang Lee, Minjae Yoo, Soyi Jung, Seehwan Yoo, Joongheon Kim

    Abstract: It seems as though progressively more people are in the race to upload content, data, and information online; and hospitals haven't neglected this trend either. Hospitals are now at the forefront for multi-site medical data sharing to provide groundbreaking advancements in the way health records are shared and patients are diagnosed. Sharing of medical data is essential in modern medical research.…

    Submitted 20 February, 2022; originally announced February 2022.

  43. arXiv:2111.08857  [pdf, other]

    cs.LG cs.AI cs.MA cs.RO eess.SY

    SEIHAI: A Sample-efficient Hierarchical AI for the MineRL Competition

    Authors: Hangyu Mao, Chao Wang, Xiaotian Hao, Yihuan Mao, Yiming Lu, Chengjie Wu, Jianye Hao, Dong Li, Pingzhong Tang

    Abstract: The MineRL competition is designed for the development of reinforcement learning and imitation learning algorithms that can efficiently leverage human demonstrations to drastically reduce the number of environment interactions needed to solve the complex ObtainDiamond task with sparse rewards. To address the challenge, in this paper, we present SEIHAI, a Sample-ef…

    Submitted 16 November, 2021; originally announced November 2021.

    Comments: The winner solution of NeurIPS 2020 MineRL competition (https://www.aicrowd.com/challenges/neurips-2020-minerl-competition/leaderboards). The paper has been accepted by DAI 2021 (the third International Conference on Distributed Artificial Intelligence)

  44. arXiv:2108.10147  [pdf, other]

    cs.LG cs.AI eess.IV

    Spatio-Temporal Split Learning for Privacy-Preserving Medical Platforms: Case Studies with COVID-19 CT, X-Ray, and Cholesterol Data

    Authors: Yoo Jeong Ha, Minjae Yoo, Gusang Lee, Soyi Jung, Sae Won Choi, Joongheon Kim, Seehwan Yoo

    Abstract: Machine learning requires a large volume of sample data, especially when it is used in high-accuracy medical applications. However, patient records are among the most sensitive private information and are not usually shared among institutes. This paper presents spatio-temporal split learning, a distributed deep neural network framework, which is a turning point in allowing collaboration among pri…

    Submitted 20 August, 2021; originally announced August 2021.

  45. arXiv:2104.10781  [pdf, other]

    eess.IV cs.CV

    NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

    Authors: Ren Yang, Radu Timofte, Jing Liu, Yi Xu, Xinjian Zhang, Minyi Zhao, Shuigeng Zhou, Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy, Xin Li, Fanglong Liu, He Zheng, Lielin Jiang, Qi Zhang, Dongliang He, Fu Li, Qingqing Dang, Yibin Huang, Matteo Maggioni, Zhongqian Fu, Shuai Xiao, Cheng li, Thomas Tanay, et al. (47 additional authors not shown)

    Abstract: This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. In this challenge, the new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing the videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing the videos compressed by x265 at…

    Submitted 31 August, 2022; v1 submitted 21 April, 2021; originally announced April 2021.

    Comments: Corrected the MOS values in Table 2, and corrected some minor typos

  46. arXiv:2102.13588  [pdf, other]

    eess.IV cs.CV cs.LG

    3D Vessel Reconstruction in OCT-Angiography via Depth Map Estimation

    Authors: Shuai Yu, Jianyang Xie, Jinkui Hao, Yalin Zheng, Jiong Zhang, Yan Hu, Jiang Liu, Yitian Zhao

    Abstract: Optical Coherence Tomography Angiography (OCTA) has been increasingly used in the management of eye and systemic diseases in recent years. Manual or automatic analysis of blood vessels in 2D OCTA images (en face angiograms) is commonly used in clinical practice; however, it may lose rich 3D spatial distribution information of blood vessels or capillaries that is useful for clinical decision-making.…

    Submitted 26 February, 2021; originally announced February 2021.

  47. arXiv:2101.06268  [pdf, other]

    eess.AS cs.SD

    AMFFCN: Attentional Multi-layer Feature Fusion Convolution Network for Audio-visual Speech Enhancement

    Authors: Xinmeng Xu, Jianjun Hao

    Abstract: Audio-visual speech enhancement is regarded as one of the promising solutions for isolating and enhancing the speech of a desired speaker. Conventional methods focus on predicting the clean speech spectrum via a naive convolutional neural network based encoder-decoder architecture, and these methods a) do not use data fully and effectively, and b) cannot process features selectively. The proposed m…

    Submitted 26 September, 2022; v1 submitted 15 January, 2021; originally announced January 2021.

    Comments: arXiv admin note: text overlap with arXiv:2101.05975

  48. arXiv:2101.05975  [pdf, other]

    eess.AS cs.SD eess.IV

    Multi-layer Feature Fusion Convolution Network for Audio-visual Speech Enhancement

    Authors: Xinmeng Xu, Jianjun Hao

    Abstract: Speech enhancement can potentially benefit from the visual information from the target speaker, such as lip movement and facial expressions, because the visual aspect of speech is essentially unaffected by the acoustic environment. In this paper, we address the problem of enhancing a corrupted speech signal from videos by using audio-visual (AV) neural processing. Most recent AV speech enhancement ap…

    Submitted 23 May, 2022; v1 submitted 15 January, 2021; originally announced January 2021.

  49. arXiv:2010.09776  [pdf, other]

    cs.MA cs.AI cs.GT cs.LG eess.SY

    SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving

    Authors: Ming Zhou, Jun Luo, Julian Villella, Yaodong Yang, David Rusu, Jiayu Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Aurora Chongxi Huang, Ying Wen, Kimia Hassanzadeh, Daniel Graves, Dong Chen, Zhengbang Zhu, Nhat Nguyen, Mohamed Elsayed, Kun Shao, Sanjeevan Ahilan, Baokuan Zhang, Jiannan Wu, Zhengang Fu, Kasra Rezaee, Peyman Yadmellat , et al. (12 additional authors not shown)

    Abstract: Multi-agent interaction is a fundamental aspect of autonomous driving in the real world. Despite more than a decade of research and development, the problem of how to competently interact with diverse road users in diverse scenarios remains largely unsolved. Learning methods have much to offer towards solving this problem. But they require a realistic multi-agent simulator that generates diverse a…

    Submitted 31 October, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

    Comments: 20 pages, 11 figures. Paper accepted to CoRL 2020

  50. arXiv:2010.05440  [pdf]

    eess.SY

    Using Empirical Trajectory Data to Design Connected Autonomous Vehicle Controllers for Traffic Stabilization

    Authors: Yujie Li, Sikai Chen, Runjia Du, Paul Young Joun Ha, Jiqian Dong, Samuel Labi

    Abstract: Emerging transportation technologies offer unprecedented opportunities to improve the efficiency of the transportation system from the perspectives of energy consumption, congestion, and emissions. One of these technologies is connected and autonomous vehicles (CAVs). With the prospective duality of operations of CAVs and human driven vehicles in the same roadway space (also referred to as a mixed…

    Submitted 11 October, 2020; originally announced October 2020.

    Comments: TRB 2021 Annual Meeting