Showing 1–44 of 44 results for author: Huang, G

Searching in archive eess.
  1. arXiv:2509.15333  [pdf, ps, other]

    cs.CV cs.AI cs.LG eess.IV

    Emulating Human-like Adaptive Vision for Efficient and Flexible Machine Visual Perception

    Authors: Yulin Wang, Yang Yue, Yang Yue, Huanqian Wang, Haojun Jiang, Yizeng Han, Zanlin Ni, Yifan Pu, Minglei Shi, Rui Lu, Qisen Yang, Andrew Zhao, Zhuofan Xia, Shiji Song, Gao Huang

    Abstract: Human vision is highly adaptive, efficiently sampling intricate environments by sequentially fixating on task-relevant regions. In contrast, prevailing machine vision models passively process entire scenes at once, resulting in excessive resource demands scaling with spatial-temporal input resolution and model size, yielding critical limitations impeding both future advancements and real-world app…

    Submitted 18 September, 2025; originally announced September 2025.

  2. arXiv:2508.00391  [pdf, ps, other]

    cs.CV eess.AS

    Cued-Agent: A Collaborative Multi-Agent System for Automatic Cued Speech Recognition

    Authors: Guanjie Huang, Danny H. K. Tsang, Shan Yang, Guangzhi Lei, Li Liu

    Abstract: Cued Speech (CS) is a visual communication system that combines lip-reading with hand coding to facilitate communication for individuals with hearing impairments. Automatic CS Recognition (ACSR) aims to convert CS hand gestures and lip movements into text via AI-driven methods. Traditionally, the temporal asynchrony between hand and lip movements requires the design of complex modules to facilitat…

    Submitted 1 August, 2025; originally announced August 2025.

    Comments: 9 pages

  3. arXiv:2507.16632  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Step-Audio 2 Technical Report

    Authors: Boyong Wu, Chao Yan, Chen Hu, Cheng Yi, Chengli Feng, Fei Tian, Feiyu Shen, Gang Yu, Haoyang Zhang, Jingbei Li, Mingrui Chen, Peng Liu, Wang You, Xiangyu Tony Zhang, Xingyuan Li, Xuerui Yang, Yayue Deng, Yechang Huang, Yuxin Li, Yuxin Zhang, Zhao You, Brian Li, Changyi Wan, Hanpeng Hu, Jiangjie Zhen, et al. (84 additional authors not shown)

    Abstract: This paper presents Step-Audio 2, an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation. By integrating a latent audio encoder and reasoning-centric reinforcement learning (RL), Step-Audio 2 achieves promising performance in automatic speech recognition (ASR) and audio understanding. To facilitate genuine end-to-end speech convers…

    Submitted 27 August, 2025; v1 submitted 22 July, 2025; originally announced July 2025.

    Comments: v3: Added introduction and evaluation results of Step-Audio 2 mini

  4. arXiv:2507.16564  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    TTMBA: Towards Text To Multiple Sources Binaural Audio Generation

    Authors: Yuxuan He, Xiaoran Yang, Ningning Pan, Gongping Huang

    Abstract: Most existing text-to-audio (TTA) generation methods produce mono outputs, neglecting essential spatial information for immersive auditory experiences. To address this issue, we propose a cascaded method for text-to-multisource binaural audio generation (TTMBA) with both temporal and spatial control. First, a pretrained large language model (LLM) segments the text into a structured format with tim…

    Submitted 22 July, 2025; originally announced July 2025.

    Comments: 5 pages, 3 figures, 2 tables

  5. arXiv:2506.12368  [pdf, ps, other]

    cs.IT eess.SP

    Stacked Intelligent Metasurfaces for Multi-Modal Semantic Communications

    Authors: Guojun Huang, Jiancheng An, Lu Gan, Dusit Niyato, Mérouane Debbah, Tie Jun Cui

    Abstract: Semantic communication (SemCom) powered by generative artificial intelligence enables highly efficient and reliable information transmission. However, it still necessitates the transmission of substantial amounts of data when dealing with complex scene information. In contrast, the stacked intelligent metasurface (SIM), leveraging wave-domain computing, provides a cost-effective solution for direc…

    Submitted 14 June, 2025; originally announced June 2025.

    Comments: 6 pages, 6 figures; accepted by IEEE WCL

  6. arXiv:2506.08967  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

    Authors: Ailin Huang, Bingxin Li, Bruce Wang, Boyong Wu, Chao Yan, Chengli Feng, Heng Wang, Hongyu Zhou, Hongyuan Wang, Jingbei Li, Jianjian Sun, Joanna Wang, Mingrui Chen, Peng Liu, Ruihang Miao, Shilei Jiang, Tian Fei, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Ge, Zheng Gong, Zhewei Huang, et al. (51 additional authors not shown)

    Abstract: Large Audio-Language Models (LALMs) have significantly advanced intelligent human-computer interaction, yet their reliance on text-based outputs limits their ability to generate natural speech responses directly, hindering seamless audio interactions. To address this, we introduce Step-Audio-AQAA, a fully end-to-end LALM designed for Audio Query-Audio Answer (AQAA) tasks. The model integrates a du…

    Submitted 13 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: 12 pages, 3 figures

  7. arXiv:2505.07555  [pdf, ps, other]

    cs.IT eess.SP

    Energy-Efficient Resource Allocation for NOMA-Assisted Uplink Pinching-Antenna Systems

    Authors: Ming Zeng, Xingwang Li, Ji Wang, Gaojian Huang, Octavia A. Dobre, Zhiguo Ding

    Abstract: The pinching-antenna architecture has emerged as a promising solution for reconfiguring wireless propagation environments and enhancing system performance. While prior research has primarily focused on sum-rate maximization or transmit power minimization of pinching-antenna systems, the critical aspect of energy efficiency (EE) has received limited attention. Given the increasing importance of EE…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: submitted to IEEE WCL; 4 figures; 5 pages

  8. arXiv:2504.12711  [pdf, other]

    cs.CV cs.AI eess.IV

    NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

    Authors: Xin Li, Yeying Jin, Xin Jin, Zongwei Wu, Bingchen Li, Yufei Wang, Wenhan Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Qiyu Rong, Hongyuan Jing, Mengmeng Zhang, Jinglong Li, Xiangyu Lu, Yi Ren, Yuting Liu, Meng Zhang, Xiang Chen, Qiyuan Guan, Jiangxin Dong, Jinshan Pan, Conglin Gou, et al. (112 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which are developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, which includ…

    Submitted 19 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of CVPR NTIRE 2025; 26 pages; Methods from 32 teams

  9. arXiv:2504.01392  [pdf, other]

    eess.AS

    Spatial-Filter-Bank-Based Neural Method for Multichannel Speech Enhancement

    Authors: Tianqin Zheng, Jilu Jin, Hanchen Pei, Gongping Huang, Jingdong Chen, Jacob Benesty

    Abstract: The performance of deep learning-based multi-channel speech enhancement methods often deteriorates when the geometric parameters of the microphone array change. Traditional approaches to mitigate this issue typically involve training on multiple microphone arrays, which can be costly. To address this challenge, we focus on uniform circular arrays and propose the use of a spatial filter bank to ext…

    Submitted 2 April, 2025; originally announced April 2025.

  10. arXiv:2503.21785  [pdf, other]

    eess.AS cs.SD

    Lend a Hand: Semi Training-Free Cued Speech Recognition via MLLM-Driven Hand Modeling for Barrier-free Communication

    Authors: Guanjie Huang, Danny Hin Kwok Tsang, Li Liu

    Abstract: Cued Speech (CS) is an innovative visual communication system that integrates lip-reading with hand coding, designed to enhance effective communication for individuals with hearing impairments. Automatic CS Recognition (ACSR) refers to the AI-driven process of automatically recognizing hand gestures and lip movements in CS, converting them into text. However, previous work often relies on complex…

    Submitted 11 March, 2025; originally announced March 2025.

  11. arXiv:2503.14933  [pdf, ps, other]

    eess.IV cs.CV physics.med-ph

    A Language Vision Model Approach for Automated Tumor Contouring in Radiation Oncology

    Authors: Yi Luo, Hamed Hooshangnejad, Xue Feng, Gaofeng Huang, Xiaojian Chen, Rui Zhang, Quan Chen, Wil Ngwa, Kai Ding

    Abstract: Background: Lung cancer ranks as the leading cause of cancer-related mortality worldwide. The complexity of tumor delineation, crucial for radiation therapy, requires expertise often unavailable in resource-limited settings. Artificial Intelligence (AI), particularly with advancements in deep learning (DL) and natural language processing (NLP), offers potential solutions yet is challenged by high f…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: 19 pages, 4 figures

  12. arXiv:2503.06809  [pdf, other]

    eess.IV cs.CV

    Interactive Tumor Progression Modeling via Sketch-Based Image Editing

    Authors: Gexin Huang, Ruinan Jin, Yucheng Tang, Can Zhao, Tatsuya Harada, Xiaoxiao Li, Gu Lin

    Abstract: Accurately visualizing and editing tumor progression in medical imaging is crucial for diagnosis, treatment planning, and clinical communication. To address the challenges of subjectivity and limited precision in existing methods, we propose SkEditTumor, a sketch-based diffusion model for controllable tumor progression editing. By leveraging sketches as structural priors, our method enables precis…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: 9 pages, 4 figures

  13. arXiv:2502.19281  [pdf, ps, other]

    eess.SP cs.AI cs.LG

    Integrating Biological and Machine Intelligence: Attention Mechanisms in Brain-Computer Interfaces

    Authors: Jiyuan Wang, Weishan Ye, Jialin He, Li Zhang, Gan Huang, Zhuliang Yu, Zhen Liang

    Abstract: With the rapid advancement of deep learning, attention mechanisms have become indispensable in electroencephalography (EEG) signal analysis, significantly enhancing Brain-Computer Interface (BCI) applications. This paper presents a comprehensive review of traditional and Transformer-based attention mechanisms, their embedding strategies, and their applications in EEG-based BCI, with a particular e…

    Submitted 7 July, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

  14. arXiv:2502.11946  [pdf, other]

    cs.CL cs.AI cs.HC cs.SD eess.AS

    Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

    Authors: Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, Hongyu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, et al. (120 additional authors not shown)

    Abstract: Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contribu…

    Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  15. arXiv:2502.11462  [pdf, other]

    eess.AS cs.LG cs.SD

    LMFCA-Net: A Lightweight Model for Multi-Channel Speech Enhancement with Efficient Narrow-Band and Cross-Band Attention

    Authors: Yaokai Zhang, Hanchen Pei, Wanqi Wang, Gongping Huang

    Abstract: Deep learning based end-to-end multi-channel speech enhancement methods have achieved impressive performance by leveraging sub-band, cross-band, and spatial information. However, these methods often demand substantial computational resources, limiting their practicality on terminal devices. This paper presents a lightweight multi-channel speech enhancement network with decoupled fully connected at…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: Accepted at ICASSP 2025

  16. arXiv:2502.09037  [pdf, ps, other]

    eess.AS

    Advances in Microphone Array Processing and Multichannel Speech Enhancement

    Authors: Gongping Huang, Jesper R. Jensen, Jingdong Chen, Jacob Benesty, Mads G. Christensen, Akihiko Sugiyama, Gary Elko, Tomas Gaensler

    Abstract: This paper reviews pioneering works in microphone array processing and multichannel speech enhancement, highlighting historical achievements, technological evolution, commercialization aspects, and key challenges. It provides valuable insights into the progression and future direction of these areas. The paper examines foundational developments in microphone array design and optimization, showcasi…

    Submitted 13 February, 2025; originally announced February 2025.

    Comments: accepted by ICASSP 2025

  17. arXiv:2501.14264  [pdf, ps, other]

    eess.IV cs.CV

    CDI: Blind Image Restoration Fidelity Evaluation based on Consistency with Degraded Image

    Authors: Xiaojun Tang, Jingru Wang, Guangwei Huang, Guannan Chen, Rui Zheng, Lian Huai, Yuyu Liu, Xingqun Jiang

    Abstract: Recent advancements in Blind Image Restoration (BIR) methods, based on Generative Adversarial Networks and Diffusion Models, have significantly improved visual quality. However, they present significant challenges for Image Quality Assessment (IQA), as the existing Full-Reference IQA methods often rate images with high perceptual quality poorly. In this paper, we reassess the Solution Non-Uniquene…

    Submitted 8 August, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

  18. arXiv:2412.08801  [pdf, other]

    eess.SY

    Development dilemma of ride-sharing: Revenue or social welfare?

    Authors: Wang Chen, Guan Huang, Jintao Ke

    Abstract: This study investigates the development dilemma of ride-sharing services using real-world mobility datasets from nine cities and calibrated customers' price and detour elasticity. Through massive numerical experiments, this study reveals that while ride-sharing can benefit social welfare, it may also lead to a loss of revenue for transportation network companies (TNCs) or drivers compared with sol…

    Submitted 11 December, 2024; originally announced December 2024.

  19. arXiv:2411.19770  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Noro: Noise-Robust One-shot Voice Conversion with Hidden Speaker Representation Learning

    Authors: Haorui He, Yuchen Song, Yuancheng Wang, Haoyang Li, Xueyao Zhang, Li Wang, Gongping Huang, Eng Siong Chng, Zhizheng Wu

    Abstract: The effectiveness of one-shot voice conversion (VC) decreases in real-world scenarios where reference speeches, which are often sourced from the internet, contain various disturbances like background noise. To address this issue, we introduce Noro, a noise-robust one-shot VC system. Noro features innovative components tailored for VC using noisy reference speeches, including a dual-branch referenc…

    Submitted 28 August, 2025; v1 submitted 29 November, 2024; originally announced November 2024.

    Comments: Accepted by APSIPA ASC 2025

  20. arXiv:2410.12177  [pdf, other]

    physics.optics eess.SY

    Towards Large Scale Atomic Manufacturing: Heterodyne Grating Interferometer with Zero Dead-Zone

    Authors: Can Cui, Lvye Gao, Pengbo Zhao, Menghan Yang, Lifu Liu, Yu Ma, Guangyao Huang, Shengtong Wang, Linbin Luo, Xinghui Li

    Abstract: This paper presents a novel heterodyne grating interferometer designed to meet the precise measurement requirements of next-generation lithography systems and large-scale atomic-level manufacturing. Utilizing a dual-frequency light source, the interferometer enables simultaneous measurement of three degrees of freedom. Key advancements include a compact zero Dead-Zone optical path configuration, s…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: 8 pages, 11 figures

  21. arXiv:2409.13371  [pdf]

    eess.IV cs.CV

    MCICSAM: Monte Carlo-guided Interpolation Consistency Segment Anything Model for Semi-Supervised Prostate Zone Segmentation

    Authors: Guantian Huang, Beibei Li, Xiaobing Fan, Aritrick Chatterjee, Cheng Wei, Shouliang Qi, Wei Qian, Dianning He

    Abstract: Accurate segmentation of various regions within the prostate is pivotal for diagnosing and treating prostate-related diseases. However, the scarcity of labeled data, particularly in specialized medical fields like prostate imaging, poses a significant challenge. Segment Anything Model (SAM) is a new large model for natural image segmentation, but there are some challenges in medical imaging. In or…

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: 13 pages, 5 figures

  22. Towards pedestrian head tracking: A benchmark dataset and a multi-source data fusion network

    Authors: Kailai Sun, Xinwei Wang, Shaobo Liu, Qianchuan Zhao, Gao Huang, Chang Liu

    Abstract: Pedestrian detection and tracking in crowded video sequences have many applications, including autonomous driving, robot navigation and pedestrian flow analysis. However, detecting and tracking pedestrians in high-density crowds face many challenges, including intra-class occlusions, complex motions, and diverse poses. Although artificial intelligence (AI) models have achieved great progress in he…

    Submitted 19 August, 2025; v1 submitted 11 August, 2024; originally announced August 2024.

    Comments: dataset: https://doi.org/10.34740/kaggle/ds/7494891

    Journal ref: Engineering Applications of Artificial Intelligence, 158, 111265 (2025)

  23. arXiv:2407.03089  [pdf, other]

    eess.SP cs.LG q-bio.NC

    Generative AI Enables EEG Super-Resolution via Spatio-Temporal Adaptive Diffusion Learning

    Authors: Shuqiang Wang, Tong Zhou, Yanyan Shen, Ye Li, Guoheng Huang, Yong Hu

    Abstract: Electroencephalogram (EEG) technology, particularly high-density EEG (HD EEG) devices, is widely used in fields such as neuroscience. HD EEG devices improve the spatial resolution of EEG by placing more electrodes on the scalp, which meet the requirements of clinical diagnostic applications such as epilepsy focus localization. However, this technique faces challenges, such as high acquisition cost…

    Submitted 22 February, 2025; v1 submitted 3 July, 2024; originally announced July 2024.

  24. arXiv:2406.13165  [pdf, other]

    eess.IV cs.AI cs.CV cs.RO

    Cardiac Copilot: Automatic Probe Guidance for Echocardiography with World Model

    Authors: Haojun Jiang, Zhenguo Sun, Ning Jia, Meng Li, Yu Sun, Shaqi Luo, Shiji Song, Gao Huang

    Abstract: Echocardiography is the only technique capable of real-time imaging of the heart and is vital for diagnosing the majority of cardiac diseases. However, there is a severe shortage of experienced cardiac sonographers, due to the heart's complex structure and significant operational challenges. To mitigate this situation, we present a Cardiac Copilot system capable of providing real-time probe moveme…

    Submitted 21 October, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by MICCAI 2024

  25. arXiv:2404.01609  [pdf]

    eess.SY

    Identifying the Largest RoCoF and Its Implications

    Authors: Licheng Wang, Luochen Xie, Gang Huang, Changsen Feng

    Abstract: The rate of change of frequency (RoCoF) is a critical factor in ensuring frequency security, particularly in power systems with low inertia. Currently, most RoCoF security constrained optimal inertia dispatch methods and inertia market mechanisms predominantly rely on the center of inertia (COI) model. This model, however, does not account for the disparities in post-contingency frequency dynamics…

    Submitted 1 April, 2024; originally announced April 2024.

  26. arXiv:2403.17701   

    eess.IV cs.CV cs.LG

    Rotate to Scan: UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation

    Authors: Hao Tang, Lianglun Cheng, Guoheng Huang, Zhengguang Tan, Junhao Lu, Kaihong Wu

    Abstract: Image segmentation holds a vital position in the realms of diagnosis and treatment within the medical domain. Traditional convolutional neural networks (CNNs) and Transformer models have made significant advancements in this realm, but they still encounter challenges because of limited receptive field or high computing complexity. Recently, State Space Models (SSMs), particularly Mamba and its var…

    Submitted 3 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Experimental method encountered errors, undergoing experiment again

  27. arXiv:2403.11626  [pdf, other]

    cs.GR cs.AI cs.CV cs.MM cs.SD eess.AS

    QEAN: Quaternion-Enhanced Attention Network for Visual Dance Generation

    Authors: Zhizhen Zhou, Yejing Huo, Guoheng Huang, An Zeng, Xuhang Chen, Lian Huang, Zinuo Li

    Abstract: The study of music-generated dance is a novel and challenging Image generation task. It aims to input a piece of music and seed motions, then generate natural dance movements for the subsequent music. Transformer-based methods face challenges in time series prediction tasks related to human movements and music due to their struggle in capturing the nonlinear relationship and temporal aspects. This…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted by The Visual Computer Journal

  28. arXiv:2402.14099  [pdf, other]

    eess.IV cs.CV physics.med-ph

    EXACT-Net:EHR-guided lung tumor auto-segmentation for non-small cell lung cancer radiotherapy

    Authors: Hamed Hooshangnejad, Xue Feng, Gaofeng Huang, Rui Zhang, Katelyn Kelly, Quan Chen, Kai Ding

    Abstract: Lung cancer is a devastating disease with the highest mortality rate among cancer types. Over 60% of non-small cell lung cancer (NSCLC) patients, which accounts for 87% of diagnoses, require radiation therapy. Rapid treatment initiation significantly increases the patient's survival rate and reduces the mortality rate. Accurate tumor segmentation is a critical step in the diagnosis and treatment o…

    Submitted 31 July, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  29. arXiv:2402.12143  [pdf, other]

    eess.SP

    Joint mode switching and resource allocation in wireless-powered RIS-aided multiuser communication systems

    Authors: Mingang Yuan, Wenzhe Zhang, Gaofei Huang

    Abstract: This paper investigates a wireless-powered hybrid reflecting intelligent surface (hybrid RIS)-assisted multiple access system, where the RIS can harvest energy from energy station (ES) transmitted radio frequency signal (RF), and each reflecting element can flexibly switch between active mode, passive mode, and idle mode. The objective is to minimize the maximum energy consumption of the users by…

    Submitted 19 February, 2024; originally announced February 2024.

  30. arXiv:2312.05279  [pdf]

    eess.IV cs.CV

    Quantitative perfusion maps using a novelty spatiotemporal convolutional neural network

    Authors: Anbo Cao, Pin-Yu Le, Zhonghui Qie, Haseeb Hassan, Yingwei Guo, Asim Zaman, Jiaxi Lu, Xueqiang Zeng, Huihui Yang, Xiaoqiang Miao, Taiyu Han, Guangtao Huang, Yan Kang, Yu Luo, Jia Guo

    Abstract: Dynamic susceptibility contrast magnetic resonance imaging (DSC-MRI) is widely used to evaluate acute ischemic stroke to distinguish salvageable tissue and infarct core. For this purpose, traditional methods employ deconvolution techniques, like singular value decomposition, which are known to be vulnerable to noise, potentially distorting the derived perfusion parameters. However, deep learning t…

    Submitted 8 December, 2023; originally announced December 2023.

  31. IoT-based Analysis for Smart Energy Management

    Authors: Guang-Li Huang, Adnan Anwar, Seng W. Loke, Arkady Zaslavsky, Jinho Choi

    Abstract: Smart energy management based on the Internet of Things (IoT) aims to achieve optimal energy utilization through real-time energy monitoring and analyses of power consumption patterns in IoT networks (e.g., residential homes and offices) supported by wireless technologies. This is of great significance for the sustainable development of energy. Energy disaggregation is an important technology to r…

    Submitted 25 August, 2023; originally announced November 2023.

  32. arXiv:2305.02596  [pdf]

    eess.SY

    A Soft Coordination Method of Heterogeneous Devices in Distribution System Voltage Control

    Authors: Licheng Wang, Tao Wang, Gang Huang, Ruifeng Yan, Kai Wang, Youbing Zhang, Shijie Cheng

    Abstract: With the continuous increase of photovoltaic (PV) penetration, the voltage control interactions between newly installed PV inverters and previously deployed on-load tap-changer (OLTC) transformers become ever more significant. To achieve coordinated voltage regulation, current methods often rely on a decision-making algorithm to fully take over the control of all devices, requiring OLTC to give up…

    Submitted 4 May, 2023; originally announced May 2023.

  33. arXiv:2304.06496  [pdf, other]

    eess.SP cs.HC cs.LG

    EEGMatch: Learning with Incomplete Labels for Semi-Supervised EEG-based Cross-Subject Emotion Recognition

    Authors: Rushuang Zhou, Weishan Ye, Zhiguo Zhang, Yanyang Luo, Li Zhang, Linling Li, Gan Huang, Yining Dong, Yuan-Ting Zhang, Zhen Liang

    Abstract: Electroencephalography (EEG) is an objective tool for emotion recognition and shows promising performance. However, the label scarcity problem is a main challenge in this field, which limits the wide application of EEG-based emotion recognition. In this paper, we propose a novel semi-supervised learning framework (EEGMatch) to leverage both labeled and unlabeled EEG data. First, an EEG-Mixup based…

    Submitted 29 August, 2024; v1 submitted 27 March, 2023; originally announced April 2023.

  34. arXiv:2210.15285  [pdf, other]

    cs.SD cs.CL eess.AS

    SAN: a robust end-to-end ASR model architecture

    Authors: Zeping Min, Qian Ge, Guanhua Huang

    Abstract: In this paper, we propose a novel Siamese Adversarial Network (SAN) architecture for automatic speech recognition, which aims at solving the difficulty of fuzzy audio recognition. Specifically, SAN constructs two sub-networks to differentiate the audio feature input and then introduces a loss to unify the output distribution of these sub-networks. Adversarial learning enables the network to captur…

    Submitted 27 October, 2022; originally announced October 2022.

  35. arXiv:2209.07581  [pdf, other]

    physics.space-ph cs.LG eess.SP

    The Development of Spatial Attention U-Net for The Recovery of Ionospheric Measurements and The Extraction of Ionospheric Parameters

    Authors: Guan-Han Huang, Alexei V. Dmitriev, Chia-Hsien Lin, Yu-Chi Chang, Mon-Chai Hsieh, Enkhtuya Tsogtbaatar, Merlin M. Mendoza, Hao-Wei Hsu, Yu-Chiang Lin, Lung-Chih Tsai, Yung-Hui Li

    Abstract: We train a deep learning artificial neural network model, Spatial Attention U-Net to recover useful ionospheric signals from noisy ionogram data measured by Hualien's Vertical Incidence Pulsed Ionospheric Radar. Our results show that the model can well identify F2 layer ordinary and extraordinary modes (F2o, F2x) and the combined signals of the E layer (ordinary and extraordinary modes and sporadi…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: 17 pages, 7 figures, 3 tables

    Journal ref: Radio Science 57 (2022) e2022RS007471

  36. arXiv:2205.01858  [pdf]

    q-bio.QM cs.LG eess.SP

    DeeptDCS: Deep Learning-Based Estimation of Currents Induced During Transcranial Direct Current Stimulation

    Authors: Xiaofan Jia, Sadeed Bin Sayed, Nahian Ibn Hasan, Luis J. Gomez, Guang-Bin Huang, Abdulkadir C. Yucel

    Abstract: Objective: Transcranial direct current stimulation (tDCS) is a non-invasive brain stimulation technique used to generate conduction currents in the head and disrupt brain functions. To rapidly evaluate the tDCS-induced current density in near real-time, this paper proposes a deep learning-based emulator, named DeeptDCS. Methods: The emulator leverages Attention U-net taking the volume conductor mo…

    Submitted 6 October, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

  37. Cross-Modality Deep Feature Learning for Brain Tumor Segmentation

    Authors: Dingwen Zhang, Guohai Huang, Qiang Zhang, Jungong Han, Junwei Han, Yizhou Yu

    Abstract: Recent advances in machine learning and prevalence of digital medical images have opened up an opportunity to address the challenging brain tumor segmentation (BTS) task by using deep convolutional neural networks. However, different from the RGB image data that are very widespread, the medical image data used in brain tumor segmentation are relatively scarce in terms of the data scale but contain…

    Submitted 7 January, 2022; originally announced January 2022.

    Comments: published in Pattern Recognition 2021

  38. arXiv:2105.03245  [pdf, other]

    cs.CV cs.LG eess.IV

    Adaptive Focus for Efficient Video Recognition

    Authors: Yulin Wang, Zhaoxi Chen, Haojun Jiang, Shiji Song, Yizeng Han, Gao Huang

    Abstract: In this paper, we explore the spatial redundancy in video recognition with the aim to improve the computational efficiency. It is observed that the most informative region in each frame of a video is usually a small image patch, which shifts smoothly across frames. Therefore, we model the patch localization problem as a sequential decision task, and propose a reinforcement learning based approach…

    Submitted 18 August, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

    Comments: ICCV 2021 (oral presentation)

  39. arXiv:2102.03777  [pdf, other]

    cs.HC cs.LG eess.SP

    EEGFuseNet: Hybrid Unsupervised Deep Feature Characterization and Fusion for High-Dimensional EEG with An Application to Emotion Recognition

    Authors: Zhen Liang, Rushuang Zhou, Li Zhang, Linling Li, Gan Huang, Zhiguo Zhang, Shin Ishii

    Abstract: How to effectively and efficiently extract valid and reliable features from high-dimensional electroencephalography (EEG), particularly how to fuse the spatial and temporal dynamic brain information into a better feature representation, is a critical issue in brain data analysis. Most current EEG studies work in a task driven manner and explore the valid EEG features with a supervised model, which…

    Submitted 27 August, 2021; v1 submitted 7 February, 2021; originally announced February 2021.

    Journal ref: IEEE Transactions on Neural Systems and Rehabilitation Engineering, 29 (2021) 1913-1925

  40. arXiv:2102.03357  [pdf, other]

    eess.SP cs.AI cs.LG eess.SY

    Machine Learning for Electronic Design Automation: A Survey

    Authors: Guyue Huang, Jingbo Hu, Yifan He, Jialong Liu, Mingyuan Ma, Zhaoyang Shen, Juejian Wu, Yuanfan Xu, Hengrui Zhang, Kai Zhong, Xuefei Ning, Yuzhe Ma, Haoyu Yang, Bei Yu, Huazhong Yang, Yu Wang

    Abstract: With the down-scaling of CMOS technology, the design complexity of very large-scale integrated (VLSI) is increasing. Although the application of machine learning (ML) techniques in electronic design automation (EDA) can trace its history back to the 90s, the recent breakthrough of ML and the increasing complexity of EDA tasks have aroused more interests in incorporating ML to solve EDA tasks. In t…

    Submitted 8 March, 2021; v1 submitted 10 January, 2021; originally announced February 2021.

    Comments: Accepted by TODAES. The first 10 authors are ordered alphabetically

  41. arXiv:2010.12876  [pdf, other]

    eess.IV cs.LG eess.SP

    Electromagnetic Source Imaging via a Data-Synthesis-Based Convolutional Encoder-Decoder Network

    Authors: Gexin Huang, Jiawen Liang, Ke Liu, Chang Cai, ZhengHui Gu, Feifei Qi, Yuan Qing Li, Zhu Liang Yu, Wei Wu

    Abstract: Electromagnetic source imaging (ESI) requires solving a highly ill-posed inverse problem. To seek a unique solution, traditional ESI methods impose various forms of priors that may not accurately reflect the actual source properties, which may hinder their broad applications. To overcome this limitation, in this paper a novel data-synthesized spatio-temporally convolutional encoder-decoder network…

    Submitted 13 July, 2022; v1 submitted 24 October, 2020; originally announced October 2020.

    Comments: 15 pages, 14 figures, and journal

  42. arXiv:1911.11417  [pdf, other]

    eess.SP

    COOK: Chirp-OOK Communication with Self-reliant Bitrate Adaptation in Backscatter Networks

    Authors: Gang Huang, Panlong Yang, Xin He, Yubo Yan, Hao Zhou, Xiangyang Li, Pengjun Wan

    Abstract: For large-scale Internet of Things (IoT), backscatter communication is a promising technology to reduce power consumption and simplify deployment. However, backscatter communication lacks stability, along with limited communication range within a few meters. Due to the limited computation ability of backscatter tags, it is burdensome to effectively adapt the bitrate for the time-varying channel. T…

    Submitted 26 November, 2019; originally announced November 2019.

    Comments: 7 pages

  43. arXiv:1910.13799  [pdf, other]

    eess.AS cs.LG cs.SD

    Multimodal Learning For Classroom Activity Detection

    Authors: Hang Li, Yu Kang, Wenbiao Ding, Song Yang, Songfan Yang, Gale Yan Huang, Zitao Liu

    Abstract: Classroom activity detection (CAD) focuses on accurately classifying whether the teacher or student is speaking and recording both the length of individual utterances during a class. A CAD solution helps teachers get instant feedback on their pedagogical instructions. This greatly improves educators' teaching skills and hence leads to students' achievement. However, CAD is very challenging because…

    Submitted 10 February, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: The 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)

  44. arXiv:1908.09445  [pdf, other]

    cs.RO cs.CV eess.IV

    High Performance Visual Object Tracking with Unified Convolutional Networks

    Authors: Zheng Zhu, Wei Zou, Guan Huang, Dalong Du, Chang Huang

    Abstract: Convolutional neural networks (CNN) based tracking approaches have shown favorable performance in recent benchmarks. Nonetheless, the chosen CNN features are always pre-trained in different tasks and individual components in tracking systems are learned separately, thus the achieved tracking performance may be suboptimal. Besides, most of these trackers are not designed towards real-time applicati…

    Submitted 25 August, 2019; originally announced August 2019.

    Comments: Extended version of [arXiv:1711.04661] our UCT tracker in ICCV VOT2017