Search | arXiv e-print repository

Ambiguity Function Analysis of AFDM Signals for Integrated Sensing and Communications

Authors: Haoran Yin, Yanqun Tang, Yuanhan Ni, Zulin Wang, Gaojie Chen, Jun Xiong, Kai Yang, Marios Kountouris, Yong Liang Guan, Yong Zeng

Abstract: Affine frequency division multiplexing (AFDM) is a promising chirp-based waveform with high flexibility and resilience, making it well-suited for next-generation wireless networks, particularly in high-mobility scenarios. In this paper, we investigate the ambiguity functions (AFs) of AFDM signals, which fundamentally characterize their range and velocity estimation capabilities in both monostatic and bistatic settings. Specifically, we first derive the auto-ambiguity function (AAF) of an AFDM chirp subcarrier, revealing its "spike-like" local property and "periodic-like" global property along the rotated delay and Doppler dimensions. This structure naturally forms a parallelogram for each localized pulse of the AAF of the AFDM chirp subcarrier, enabling unambiguous target sensing. Then, we study the cross-ambiguity function (CAF) between two different AFDM chirp subcarriers, which exhibits the same local and global properties as the AAF but with an additional shift along the Doppler dimension. We then extend our analysis to the AF of various typical AFDM frames, considering both deterministic pilot and random data symbols. In particular, we demonstrate that inserting guard symbols in AFDM facilitates interference-free sensing. Simulation results validate our theoretical findings, highlighting AFDM's strong potential for ISAC applications.

Submitted 10 July, 2025; originally announced July 2025.

Comments: 14 pages, 14 figures. Under revision in an IEEE Journal
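The "spike-like" AAF and the Doppler-shifted CAF can be seen in a small numerical sketch. This is not the authors' derivation: it assumes the common discrete AFDM chirp subcarrier form s_m[n] = exp(j2π(c1·n² + c2·m² + n·m/N))/√N, evaluates a discrete periodic (cyclic) ambiguity function rather than the continuous-time AFs analyzed in the paper, and uses illustrative parameters (N = 64, c1 = 3/(2N), c2 = 0.01). The function names are hypothetical. Under these assumptions, each delay l yields a single unit-height peak at Doppler bin k = (2N·c1·l + m1 − m2) mod N, so the peaks lie on a rotated line in the delay-Doppler plane, and the CAF repeats the AAF pattern shifted by m1 − m2 Doppler bins.

```python
import numpy as np


def afdm_chirp_subcarrier(m: int, N: int, c1: float, c2: float) -> np.ndarray:
    """Length-N AFDM chirp subcarrier m in an assumed inverse-DAFT form:
    s_m[n] = exp(j*2*pi*(c1*n^2 + c2*m^2 + n*m/N)) / sqrt(N)."""
    n = np.arange(N)
    return np.exp(1j * 2 * np.pi * (c1 * n**2 + c2 * m**2 + n * m / N)) / np.sqrt(N)


def periodic_ambiguity(s1: np.ndarray, s2: np.ndarray) -> np.ndarray:
    """Discrete periodic cross-ambiguity magnitude
    |A[l, k]| = |sum_n s1[n] * conj(s2[(n - l) mod N]) * exp(-j*2*pi*k*n/N)|.
    Passing the same signal twice gives the auto-ambiguity function (AAF)."""
    N = len(s1)
    A = np.empty((N, N))
    for l in range(N):
        # np.roll(s2, l)[n] == s2[(n - l) % N]: cyclic delay by l samples
        prod = s1 * np.conj(np.roll(s2, l))
        # One FFT over n evaluates every Doppler bin k for this delay
        A[l] = np.abs(np.fft.fft(prod))
    return A


N = 64
c1 = 3 / (2 * N)  # assumption: 2*N*c1 is an integer and N is even, so the chirp is N-periodic
c2 = 0.01         # arbitrary; c2 only adds a constant phase, so it does not affect |A|

sub5 = afdm_chirp_subcarrier(5, N, c1, c2)
sub2 = afdm_chirp_subcarrier(2, N, c1, c2)
aaf = periodic_ambiguity(sub5, sub5)  # AAF of subcarrier 5
caf = periodic_ambiguity(sub5, sub2)  # CAF between subcarriers 5 and 2

# Peak Doppler bin for the first few delays
print([int(np.argmax(aaf[l])) for l in range(4)])  # [0, 3, 6, 9]
print([int(np.argmax(caf[l])) for l in range(4)])  # [3, 6, 9, 12]
```

Choosing 2N·c1 as an integer (with N even) keeps the chirp N-periodic, which is what makes the cyclic shift via np.roll an exact delay model here; with other parameter choices that shortcut no longer holds exactly.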

arXiv:2506.12537 [pdf, ps, other]

Speech-Language Models with Decoupled Tokenizers and Multi-Token Prediction

Authors: Xiaoran Fan, Zhichao Sun, Yangfan Gao, Jingfei Xiong, Hang Yan, Yifei Cao, Jiajun Sun, Shuo Li, Zhihao Zhang, Zhiheng Xi, Yuhao Zhou, Senjie Jin, Changhao Jiang, Junjie Ye, Ming Zhang, Rui Zheng, Zhenhua Han, Yunke Zhang, Demei Yan, Shaokang Dong, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang

Abstract: Speech-language models (SLMs) offer a promising path toward unifying speech and text understanding and generation. However, challenges remain in achieving effective cross-modal alignment and high-quality speech generation. In this work, we systematically investigate the impact of key components (i.e., speech tokenizers, speech heads, and speaker modeling) on the performance of LLM-centric SLMs. We compare coupled, semi-decoupled, and fully decoupled speech tokenizers under a fair SLM framework and find that decoupled tokenization significantly improves alignment and synthesis quality. To address the information density mismatch between speech and text, we introduce multi-token prediction (MTP) into SLMs, enabling each hidden state to decode multiple speech tokens. This leads to up to 12× faster decoding and a substantial drop in word error rate (from 6.07 to 3.01). Furthermore, we propose a speaker-aware generation paradigm and introduce RoleTriviaQA, a large-scale role-playing knowledge QA benchmark with diverse speaker identities. Experiments demonstrate that our methods enhance both knowledge understanding and speaker consistency.

Submitted 14 June, 2025; originally announced June 2025.
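The idea of letting each hidden state decode several speech tokens can be illustrated with a minimal sketch. It assumes k parallel linear output heads on one shared hidden state, a common way to realize multi-token prediction; the paper's actual speech-head design may differ, and the names mtp_decode_step and num_decode_steps are hypothetical. The sketch only shows why the number of backbone forward passes drops by roughly a factor of k; wall-clock speedup also depends on head cost and other overheads.

```python
import numpy as np


def mtp_decode_step(hidden: np.ndarray, heads: np.ndarray) -> np.ndarray:
    """Greedily decode k speech tokens from a single hidden state.

    hidden: (d,) backbone hidden state at one position.
    heads:  (k, d, V) one hypothetical linear projection per predicted token.
    Returns k token ids, one per head.
    """
    logits = np.einsum("d,kdv->kv", hidden, heads)  # (k, V) logits
    return logits.argmax(axis=-1)


def num_decode_steps(num_speech_tokens: int, k: int) -> int:
    """Backbone forward passes needed when each pass emits k tokens (ceiling division)."""
    return -(-num_speech_tokens // k)


# Toy heads: head i scores token i highest whenever hidden[0] is positive
heads = np.zeros((3, 4, 5))
for i in range(3):
    heads[i, 0, i] = 1.0
hidden = np.array([1.0, 0.0, 0.0, 0.0])

print(mtp_decode_step(hidden, heads))                        # [0 1 2]
print(num_decode_steps(120, 1), num_decode_steps(120, 12))  # 120 10
```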

arXiv:2505.22855 [pdf, ps, other]

IRS: Incremental Relationship-guided Segmentation for Digital Pathology

Authors: Ruining Deng, Junchao Zhu, Juming Xiong, Can Cui, Tianyuan Yao, Junlin Guo, Siqi Lu, Marilyn Lionts, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Yihe Yang, Paul Dennis Simonson, Mert R. Sabuncu, Haichun Yang, Yuankai Huo

Abstract: Continual learning is rapidly emerging as a key focus in computer vision, aiming to develop AI systems capable of continuous improvement, thereby enhancing their value and practicality in diverse real-world applications. In healthcare, continual learning holds great promise for continuously acquired digital pathology data, which is collected in hospitals on a daily basis. However, panoramic segmentation on digital whole slide images (WSIs) presents significant challenges, as it is often infeasible to obtain comprehensive annotations for all potential objects, spanning from coarse structures (e.g., regions and unit objects) to fine structures (e.g., cells). This results in temporally and partially annotated data, posing a major challenge in developing a holistic segmentation framework. Moreover, an ideal segmentation model should incorporate new phenotypes, unseen diseases, and diverse populations, making this task even more complex. In this paper, we introduce a novel and unified Incremental Relationship-guided Segmentation (IRS) learning scheme to address temporally acquired, partially annotated data while maintaining out-of-distribution (OOD) continual learning capacity in digital pathology. The key innovation of IRS lies in its ability to realize a new spatial-temporal OOD continual learning paradigm by mathematically modeling anatomical relationships between existing and newly introduced classes through a simple incremental universal proposition matrix. Experimental results demonstrate that the IRS method effectively handles the multi-scale nature of pathological segmentation, enabling precise kidney segmentation across various structures (regions, units, and cells) as well as OOD disease lesions at multiple magnifications. This capability significantly enhances domain generalization, making IRS a robust approach for real-world digital pathology applications.

Submitted 28 May, 2025; originally announced May 2025.

arXiv:2505.19709 [pdf, ps, other]

Capacity-Optimized Pre-Equalizer Design for Visible Light Communication Systems

Authors: Runxin Zhang, Yulin Shao, Jian Xiong, Lu Lu, Murat Uysal

Abstract: Since commercial LEDs are primarily designed for illumination rather than data transmission, their modulation bandwidth is inherently limited to a few MHz. This becomes a major bottleneck in the implementation of visible light communication (VLC) systems, necessitating the design of pre-equalizers. While state-of-the-art equalizer designs primarily focus on increasing the data rate through bandwidth expansion, they often overlook the accompanying degradation in signal-to-noise ratio (SNR). Achieving effective bandwidth extension without introducing excessive SNR penalties remains a significant challenge, since the channel capacity is a non-linear function of both parameters. In this paper, we present a fundamental analysis of how the parameters of the LED and pre-equalization circuits influence the channel capacity in intensity modulation and direct detection (IMDD)-based VLC systems. We derive a closed-form expression for the channel capacity that is an explicit function of the analog pre-equalizer circuit parameters. Building upon the derived capacity expression, we propose a systematic design methodology for analog pre-equalizers that effectively balances bandwidth and SNR, thereby maximizing the overall channel capacity across a wide range of channel attenuations. We present extensive numerical results to validate the effectiveness of the proposed design and demonstrate the improvements over conventional bandwidth-optimized pre-equalizer designs.

Submitted 26 May, 2025; originally announced May 2025.
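Why bandwidth extension alone does not guarantee more capacity can be shown with a short sketch. It deliberately swaps in the textbook Shannon capacity of a flat AWGN channel, C = B·log2(1 + SNR), as a stand-in; the paper derives its own closed-form IMDD capacity in terms of pre-equalizer circuit parameters, which is not reproduced here. The operating points (5 MHz at 30 dB versus 20 MHz at 15 dB or 0 dB) are hypothetical.

```python
import math


def awgn_capacity_bps(bandwidth_hz: float, snr_db: float) -> float:
    """Shannon capacity of a flat AWGN channel, C = B * log2(1 + SNR).
    Used only as a stand-in for the paper's closed-form IMDD capacity."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)


# Hypothetical operating points: an unequalized 5 MHz LED link at 30 dB SNR
# versus a pre-equalized link whose bandwidth grows to 20 MHz at an SNR penalty.
baseline = awgn_capacity_bps(5e6, 30)
for snr_db in (15, 0):
    extended = awgn_capacity_bps(20e6, snr_db)
    print(f"20 MHz @ {snr_db:>2} dB: {extended / 1e6:6.1f} Mb/s "
          f"vs baseline {baseline / 1e6:.1f} Mb/s")
```

With a moderate SNR penalty the wider bandwidth roughly doubles capacity, but with an excessive penalty capacity falls well below the unequalized baseline, which is the trade-off the abstract describes.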

arXiv:2502.04199 [pdf, other]

Expanding Training Data for Endoscopic Phenotyping of Eosinophilic Esophagitis

Authors: Juming Xiong, Hou Xiong, Quan Liu, Ruining Deng, Regina N Tyree, Girish Hiremath, Yuankai Huo

Abstract: Eosinophilic esophagitis (EoE) is a chronic esophageal disorder marked by eosinophil-dominated inflammation. Diagnosing EoE usually involves endoscopic inspection of the esophageal mucosa and obtaining esophageal biopsies for histologic confirmation. Recent advances have seen AI-assisted endoscopic imaging, guided by the EREFS system, emerge as a potential alternative to reduce reliance on invasive histological assessments. Despite these advancements, significant challenges persist due to the limited availability of data for training AI models, a common issue even in the development of AI for more prevalent diseases. This study seeks to improve the performance of deep learning-based EoE phenotype classification by augmenting our training data with a diverse set of images from online platforms, public datasets, and electronic textbooks, increasing our dataset from 435 to 7,050 images. We used the Data-efficient Image Transformer for image classification and incorporated attention map visualizations to boost interpretability. The findings show that our expanded dataset and model enhancements improved diagnostic accuracy, robustness, and comprehensive analysis, enhancing patient outcomes.

Submitted 6 February, 2025; originally announced February 2025.

arXiv:2411.15942 [pdf, other]

Cross-organ Deployment of EOS Detection AI without Retraining: Feasibility and Limitation

Authors: Yifei Wu, Juming Xiong, Tianyuan Yao, Ruining Deng, Junlin Guo, Jialin Yue, Naweed Chowdhury, Yuankai Huo

Abstract: Chronic rhinosinusitis (CRS) is characterized by persistent inflammation in the paranasal sinuses, leading to typical symptoms of nasal congestion, facial pressure, olfactory dysfunction, and discolored nasal drainage, which can significantly impact quality of life. Eosinophils (Eos), a crucial component in the mucosal immune response, have been linked to disease severity in CRS. The diagnosis of eosinophilic CRS typically uses a threshold of 10-20 eos per high-power field (HPF). However, manually counting Eos in histological samples is laborious and time-consuming, making the use of AI-driven methods for automated evaluations highly desirable. Interestingly, eosinophils are predominantly located in the gastrointestinal (GI) tract, which has prompted the release of numerous deep learning models trained on GI data. This study leverages a CircleSnake model initially trained on upper-GI data to segment Eos cells in whole slide images (WSIs) of nasal tissues. It aims to determine the extent to which Eos segmentation models developed for the GI tract can be adapted to nasal applications without retraining. The experimental results show promising accuracy in some WSIs, although, unsurprisingly, the performance varies across cases. This paper details these performance outcomes, delves into the reasons for such variations, and aims to provide insights that could guide future development of deep learning models for eosinophilic CRS.

Submitted 24 November, 2024; originally announced November 2024.

Comments: 8 pages, 5 figures. Accepted by SPIE Medical Imaging 2025 on October 28, 2024

arXiv:2411.13766 [pdf, ps, other]

Tiny-Align: Bridging Automatic Speech Recognition and Large Language Model on the Edge

Authors: Ruiyang Qin, Dancheng Liu, Gelei Xu, Zheyu Yan, Chenhui Xu, Yuting Hu, X. Sharon Hu, Jinjun Xiong, Yiyu Shi

Abstract: The combination of Large Language Models (LLM) and Automatic Speech Recognition (ASR), when deployed on edge devices (called edge ASR-LLM), can serve as a powerful personalized assistant to enable audio-based interaction for users. Compared to text-based interaction, edge ASR-LLM allows accessible and natural audio interactions. Unfortunately, existing ASR-LLM models are mainly trained in high-performance computing environments and produce substantial model weights, making them difficult to deploy on edge devices. More importantly, to better serve users' personalized needs, the ASR-LLM must be able to learn from each distinct user, given that audio input often contains highly personalized characteristics that necessitate personalized on-device training. Since individually fine-tuning the ASR or LLM often leads to suboptimal results due to modality-specific limitations, end-to-end training ensures seamless integration of audio features and language understanding (cross-modal alignment), ultimately enabling a more personalized and efficient adaptation on edge devices. However, due to the complex training requirements and substantial computational demands of existing approaches, cross-modal alignment between ASR audio and LLM can be challenging on edge devices. In this work, we propose a resource-efficient cross-modal alignment framework that bridges ASR and LLMs on edge devices to handle personalized audio input. Our framework enables efficient ASR-LLM alignment on resource-constrained devices like NVIDIA Jetson Orin (8GB RAM), achieving a 50× training-time speedup while improving the alignment quality by more than 50%. To the best of our knowledge, this is the first work to study efficient ASR-LLM alignment on resource-constrained edge devices.

Submitted 9 July, 2025; v1 submitted 20 November, 2024; originally announced November 2024.

Comments: Accepted by ICCAD'25

arXiv:2411.00078 [pdf, other]

How Good Are We? Evaluating Cell AI Foundation Models in Kidney Pathology with Human-in-the-Loop Enrichment

Authors: Junlin Guo, Siqi Lu, Can Cui, Ruining Deng, Tianyuan Yao, Zhewen Tao, Yizhe Lin, Marilyn Lionts, Quan Liu, Juming Xiong, Yu Wang, Shilin Zhao, Catie Chang, Mitchell Wilkes, Mengmeng Yin, Haichun Yang, Yuankai Huo

Abstract: Training AI foundation models has emerged as a promising large-scale learning approach for addressing real-world healthcare challenges, including digital pathology. While many of these models have been developed for tasks like disease diagnosis and tissue quantification using extensive and diverse training datasets, their readiness for deployment on arguably some of the simplest tasks, such as nuclei segmentation within a single organ (e.g., the kidney), remains uncertain. This paper seeks to answer this key question, "How good are we?", by thoroughly evaluating the performance of recent cell foundation models on a curated multi-center, multi-disease, and multi-species external testing dataset. Additionally, we tackle a more challenging question, "How can we improve?", by developing and assessing human-in-the-loop data enrichment strategies aimed at enhancing model performance while minimizing the reliance on pixel-level human annotation. To address the first question, we curated a multicenter, multidisease, and multispecies dataset consisting of 2,542 kidney whole slide images (WSIs). Three state-of-the-art (SOTA) cell foundation models (Cellpose, StarDist, and CellViT) were selected for evaluation. To tackle the second question, we explored data enrichment algorithms by distilling predictions from the different foundation models with a human-in-the-loop framework, aiming to further enhance foundation model performance with minimal human effort. Our experimental results showed that all three foundation models improved over their baselines after fine-tuning on the enriched data. Interestingly, the baseline model with the highest F1 score does not yield the best segmentation outcomes after fine-tuning. This study establishes a benchmark for the development and deployment of cell vision foundation models tailored for real-world data applications.

Submitted 31 October, 2024; originally announced November 2024.
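The abstract compares models by F1 score but does not define it; a common object-level definition for nuclei instance segmentation matches predicted and ground-truth nuclei by intersection-over-union (IoU). The sketch below assumes that definition with an IoU threshold of 0.5 and labeled integer masks (0 = background); the paper's exact matching rule may differ, and instance_f1 is a hypothetical helper.

```python
import numpy as np


def instance_f1(gt: np.ndarray, pred: np.ndarray, iou_thr: float = 0.5) -> float:
    """Object-level F1 between two labeled instance maps (0 = background).

    A ground-truth nucleus counts as matched (true positive) if some predicted
    nucleus has IoU > iou_thr with it. With iou_thr >= 0.5 and non-overlapping
    instances, each object can match at most one partner.
    """
    gt_ids = [i for i in np.unique(gt) if i != 0]
    pred_ids = [j for j in np.unique(pred) if j != 0]
    tp = 0
    for i in gt_ids:
        g = gt == i
        for j in pred_ids:
            p = pred == j
            inter = np.logical_and(g, p).sum()
            if inter and inter / np.logical_or(g, p).sum() > iou_thr:
                tp += 1
                break
    fp = len(pred_ids) - tp  # predictions with no matching nucleus
    fn = len(gt_ids) - tp    # nuclei that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Two true nuclei; the prediction finds one and adds one spurious detection
gt = np.zeros((10, 10), dtype=int)
gt[0:4, 0:4] = 1
gt[6:9, 6:9] = 2
pred = np.zeros_like(gt)
pred[0:4, 0:4] = 1
pred[0:2, 7:9] = 2
print(instance_f1(gt, pred))  # 0.5
```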

arXiv:2410.11865 [pdf, other]

Automatic Screening for Children with Speech Disorder using Automatic Speech Recognition: Opportunities and Challenges

Authors: Dancheng Liu, Jason Yang, Ishan Albrecht-Buehler, Helen Qin, Sophie Li, Yuting Hu, Amir Nassereldine, Jinjun Xiong

Abstract: Speech is a fundamental aspect of human life, crucial not only for communication but also for cognitive, social, and academic development. Children with speech disorders (SD) face significant challenges that, if unaddressed, can result in lasting negative impacts. Traditionally, speech and language assessments (SLA) have been conducted by skilled speech-language pathologists (SLPs), but there is a growing need for efficient and scalable SLA methods powered by artificial intelligence. This position paper presents a survey of existing techniques suitable for automating SLA pipelines, with an emphasis on adapting automatic speech recognition (ASR) models for children's speech, an overview of current SLAs and their automated counterparts to demonstrate the feasibility of AI-enhanced SLA pipelines, and a discussion of practical considerations, including accessibility and privacy concerns, associated with the deployment of AI-powered SLAs.

Submitted 7 October, 2024; originally announced October 2024.

Comments: AAAI-FSS 24

arXiv:2409.16277 [pdf, other]

Compressed Depth Map Super-Resolution and Restoration: AIM 2024 Challenge Results

Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Jinhui Xiong, Wei Ye, Rakesh Ranjan, Radu Timofte

Abstract: The increasing demand for augmented reality (AR) and virtual reality (VR) applications highlights the need for efficient depth information processing. Depth maps, essential for rendering realistic scenes and supporting advanced functionalities, are typically large and therefore challenging to stream efficiently. This challenge focuses on developing innovative depth upsampling techniques to reconstruct high-quality depth maps from compressed data. These techniques are crucial for overcoming the limitations posed by depth compression, which often degrades quality, loses scene details, and introduces artifacts. By enhancing depth upsampling methods, this challenge aims to improve the efficiency and quality of depth map reconstruction. Our goal is to advance the state of the art in depth processing technologies, thereby enhancing the overall user experience in AR and VR applications.

Submitted 24 September, 2024; originally announced September 2024.

Comments: ECCV 2024 - Advances in Image Manipulation (AIM)

arXiv:2409.13117 [pdf, other]

Breaking the Barriers of One-to-One Usage of Implicit Neural Representation in Image Compression: A Linear Combination Approach with Performance Guarantees

Authors: Sai Sanjeet, Seyyedali Hosseinalipour, Jinjun Xiong, Masahiro Fujita, Bibhu Datta Sahoo

Abstract: In an era where the exponential growth of image data driven by the Internet of Things (IoT) is outpacing traditional storage solutions, this work explores and advances the potential of Implicit Neural Representation (INR) as a transformative approach to image compression. INR leverages the function approximation capabilities of neural networks to represent various types of data. While previous research has employed INR to achieve compression by training small networks to reconstruct large images, this work proposes a novel advancement: representing multiple images with a single network. By modifying the loss function during training, the proposed approach allows a small number of weights to represent a large number of images, even those significantly different from each other. A thorough analytical study of the convergence of this new training method is also carried out, establishing upper bounds that not only confirm the validity of the method but also offer insights into optimal hyperparameter design. The proposed method is evaluated on the Kodak, ImageNet, and CIFAR-10 datasets. Experimental results demonstrate that all 24 images in the Kodak dataset can be represented by linear combinations of two sets of weights, achieving a peak signal-to-noise ratio (PSNR) of 26.5 dB with as low as 0.2 bits per pixel (BPP). The proposed method matches the rate-distortion performance of state-of-the-art image codecs, such as BPG, on the CIFAR-10 dataset. Additionally, the proposed method maintains the fundamental properties of INR, such as arbitrary resolution reconstruction of images.

Submitted 19 September, 2024; originally announced September 2024.

Comments: 10 pages, 13 figures
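The idea of representing several images as linear combinations of two shared weight sets, and the PSNR and bits-per-pixel metrics quoted above, can be sketched as follows. Everything architectural here is an assumption for illustration: one coefficient pair per layer tensor, a tiny sine-activated MLP mapping (x, y) to RGB, and the helper names (combine_weights, inr_forward, amortized_bpp) are hypothetical. The sketch does not reproduce the paper's modified loss, training procedure, or reported numbers; PSNR follows the standard definition 10·log10(MAX²/MSE).

```python
import numpy as np


def combine_weights(coeffs, basis_a, basis_b):
    """Per-image weights as a linear combination of two shared weight sets.
    Assumption: one coefficient pair (a, b) per layer tensor."""
    return [a * wa + b * wb for (a, b), wa, wb in zip(coeffs, basis_a, basis_b)]


def inr_forward(coords, weights):
    """Hypothetical tiny sine-activated MLP mapping (x, y) coordinates to RGB."""
    h = coords
    for w in weights[:-1]:
        h = np.sin(h @ w)
    return h @ weights[-1]


def psnr_db(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio, 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((np.asarray(reference) - np.asarray(reconstruction)) ** 2)
    return 10 * np.log10(max_value**2 / mse)


def amortized_bpp(params_per_set, bits_per_param, num_images, pixels_per_image,
                  coeff_bits_per_image):
    """Bits per pixel when two weight sets are shared by all images and each
    image stores only its own combination coefficients."""
    total_bits = 2 * params_per_set * bits_per_param + num_images * coeff_bits_per_image
    return total_bits / (num_images * pixels_per_image)


rng = np.random.default_rng(0)
shapes = [(2, 32), (32, 32), (32, 3)]
basis_a = [rng.normal(size=s) for s in shapes]
basis_b = [rng.normal(size=s) for s in shapes]
weights = combine_weights([(0.7, 0.3)] * len(shapes), basis_a, basis_b)

ys, xs = np.meshgrid(np.linspace(-1, 1, 8), np.linspace(-1, 1, 8), indexing="ij")
coords = np.stack([xs.ravel(), ys.ravel()], axis=1)  # (64, 2) pixel coordinates
rgb = inr_forward(coords, weights)                   # (64, 3) predicted colors
print(rgb.shape)
```

Sharing the two weight sets is what lets the storage cost be amortized: only the per-image coefficients grow with the number of images.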

arXiv:2408.06381 [pdf, other]

Assessment of Cell Nuclei AI Foundation Models in Kidney Pathology

Authors: Junlin Guo, Siqi Lu, Can Cui, Ruining Deng, Tianyuan Yao, Zhewen Tao, Yizhe Lin, Marilyn Lionts, Quan Liu, Juming Xiong, Yu Wang, Shilin Zhao, Catie Chang, Mitchell Wilkes, Mengmeng Yin, Haichun Yang, Yuankai Huo

Abstract: Cell nuclei instance segmentation is a crucial task in digital kidney pathology. Traditional automatic segmentation methods often lack generalizability when applied to unseen datasets. Recently, the success of foundation models (FMs) has provided a more generalizable solution, potentially enabling the segmentation of any cell type. In this study, we perform a large-scale evaluation of three widely used state-of-the-art (SOTA) cell nuclei foundation models (Cellpose, StarDist, and CellViT). Specifically, we created a highly diverse evaluation dataset consisting of 2,542 kidney whole slide images (WSIs) collected from both human and rodent sources, encompassing various tissue types, sizes, and staining methods. To our knowledge, this is the largest-scale evaluation of its kind to date. Our quantitative analysis of the prediction distribution reveals a persistent performance gap in kidney pathology. Among the evaluated models, CellViT demonstrated superior performance in segmenting nuclei in kidney pathology. However, none of the foundation models are perfect; a performance gap remains in general nuclei segmentation for kidney pathology.

Submitted 6 February, 2025; v1 submitted 9 August, 2024; originally announced August 2024.

arXiv:2407.06662 [pdf, other]

Experimental Demonstration of 16D Voronoi Constellation with Two-Level Coding over 50km Four-Core Fiber

Authors: Can Zhao, Bin Chen, Jiaqi Cai, Zhiwei Liang, Yi Lei, Junjie Xiong, Lin Ma, Daohui Hu, Lin Sun, Gangxiang Shen

Abstract: A 16-dimensional Voronoi constellation concatenated with multilevel coding is experimentally demonstrated over a 50km four-core fiber transmission system. The proposed scheme reduces the required launch power by 6dB and provides a 17dB larger operating range than 16QAM with BICM at the outer HD-FEC BER threshold.

Submitted 9 July, 2024; originally announced July 2024.

Comments: 4 pages, 4 figures, accepted by 2024 European Conference on Optical Communication (ECOC)

arXiv:2407.00596 [pdf, other]

HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis

Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Juming Xiong, Shunxing Bao, Hao Li, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

Abstract: Panoramic image segmentation in computational pathology presents a considerable challenge due to the morphologically complex and variably scaled anatomy. For instance, the intricate organization in kidney pathology spans multiple layers, from regions like the cortex and medulla to functional units such as glomeruli, tubules, and vessels, down to various cell types. In this paper, we propose a novel Hierarchical Adaptive Taxonomy Segmentation (HATs) method, which is designed to thoroughly segment panoramic views of kidney structures by leveraging detailed anatomical insights. Our approach entails (1) the innovative HATs technique, which translates spatial relationships among 15 distinct object classes into a versatile "plug-and-play" loss function that spans regions, functional units, and cells, (2) the incorporation of anatomical hierarchies and scale considerations into a unified simple matrix representation for all panoramic entities, and (3) the adoption of the latest AI foundation model (EfficientSAM) as a feature extraction tool to boost the model's adaptability, while eliminating the manual prompt generation required by the conventional Segment Anything Model (SAM). Experimental findings demonstrate that the HATs method offers an efficient and effective strategy for integrating clinical insights and imaging precedents into a unified segmentation model across more than 15 categories. The official implementation is publicly available at https://github.com/hrlblab/HATs.

Submitted 30 June, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2402.19286

arXiv:2406.17926 [pdf, other]

FASA: a Flexible and Automatic Speech Aligner for Extracting High-quality Aligned Children Speech Data

Authors: Dancheng Liu, Jinjun Xiong

Abstract: Automatic Speech Recognition (ASR) for adult speech has recently made significant progress by employing deep neural network (DNN) models, but performance on children's speech remains unsatisfactory because of its distinct characteristics. DNN models pre-trained on adult data often struggle to generalize to children's speech through fine-tuning because high-quality aligned children's speech data are scarce. When generating datasets, human annotations are not scalable, and existing forced-alignment tools are not usable as they make impractical assumptions about the quality of the input transcriptions. To address these challenges, we propose a new forced-alignment tool, FASA, a flexible and automatic speech aligner that extracts high-quality aligned children's speech data from existing noisy children's speech data. We demonstrate its usage on the CHILDES dataset and show that FASA can improve data quality by 13.6× over human annotations.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 4 pages, 1 figure

arXiv:2406.15668 [pdf, other]

PI-Whisper: Designing an Adaptive and Incremental Automatic Speech Recognition System for Edge Devices

Authors: Amir Nassereldine, Dancheng Liu, Chenhui Xu, Ruiyang Qin, Yiyu Shi, Jinjun Xiong

Abstract: Edge-based automatic speech recognition (ASR) technologies are increasingly prevalent in the development of intelligent and personalized assistants. However, resource-constrained ASR models face significant challenges in adaptivity, incrementality, and inclusivity when serving a diverse population. To tackle those challenges, we propose PI-Whisper, a novel ASR system that adaptively enhances recognition capabilities by identifying speakers' characteristics in real time. In this work, we show how the design of PI-Whisper allows for incremental adaptation to new characteristics without the need for repetitive retraining, enhances recognition capabilities, and improves equity and fairness across diverse speaker groups. PI-Whisper demonstrates these advantages by achieving state-of-the-art accuracy, reducing the word error rate (WER) by up to 13.7% relative to baselines while scaling linearly with computing resources.

Submitted 23 December, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

Comments: in submission

arXiv:2403.04945

MEIT: Multimodal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation

Authors: Zhongwei Wan, Che Liu, Xin Wang, Chaofan Tao, Hui Shen, Jing Xiong, Rossella Arcucci, Huaxiu Yao, Mi Zhang

Abstract: The electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions and is crucial in assisting clinicians. Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation, which is time-consuming and requires clinical expertise. To automate ECG report generation and ensure its versatility, we propose the Multimodal ECG Instruction Tuning (MEIT) framework, the first attempt to tackle ECG report generation with LLMs and multimodal instructions. To facilitate future research, we establish a benchmark to evaluate MEIT with various LLM backbones across two large-scale ECG datasets. Our approach uniquely aligns the representations of the ECG signal and the report, and we conduct extensive experiments to benchmark MEIT with nine open-source LLMs using more than 800,000 ECG reports. MEIT's results underscore the superior performance of instruction-tuned LLMs, showcasing their proficiency in high-quality report generation, zero-shot capabilities, resilience to signal perturbation, and alignment with human expert evaluation. These findings emphasize the efficacy of MEIT and its potential for real-world clinical application.

Submitted 7 July, 2025; v1 submitted 7 March, 2024; originally announced March 2024.

Comments: ACL 2025

arXiv:2402.19286

PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation

Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Jialin Yue, Juming Xiong, Lining Yu, Yifei Wu, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

Abstract: Understanding the anatomy of renal pathology is crucial for advancing disease diagnostics, treatment evaluation, and clinical research. The complex kidney system comprises various components across multiple levels, including regions (cortex, medulla), functional units (glomeruli, tubules), and cells (podocytes, mesangial cells in the glomerulus). Prior studies have predominantly overlooked the intricate spatial interrelations among these objects that are established in clinical knowledge. In this research, we introduce a novel universal proposition learning approach, called panoramic renal pathology segmentation (PrPSeg), designed to comprehensively segment panoramic structures within the kidney by integrating extensive knowledge of kidney anatomy. In this paper, we propose (1) the design of a comprehensive universal proposition matrix for renal pathology, facilitating the incorporation of classification and spatial relationships into the segmentation process; (2) a token-based dynamic head single network architecture, which improves partial-label image segmentation and can accommodate future data expansion; and (3) an anatomy loss function, quantifying the inter-object relationships across the kidney.

Submitted 20 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: IEEE / CVF Computer Vision and Pattern Recognition Conference 2024

arXiv:2311.08880

Motion Control of Two Mobile Robots under Allowable Collisions

Authors: Li Tan, Wei Ren, Xi-Ming Sun, Junlin Xiong

Abstract: This letter investigates the motion control problem of two mobile robots under allowable collisions. Here, allowable collisions are collisions that do not damage the mobile robots. The occurrence of the collisions is discussed, and the effects of the collisions on the mobile robots are analyzed to develop a hybrid model of each mobile robot under allowable collisions. Based on the effects of the collisions, we show the necessity of redesigning the motion control strategy for mobile robots. Furthermore, impulsive control techniques are applied to redesign the motion control strategy to guarantee that each mobile robot accomplishes its task. Finally, an example is used to illustrate the redesigned motion control strategy.

Submitted 26 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: 8 pages, 5 figures

arXiv:2308.08974

Eosinophils Instance Object Segmentation on Whole Slide Imaging Using Multi-label Circle Representation

Authors: Yilin Liu, Ruining Deng, Juming Xiong, Regina N Tyree, Hernan Correa, Girish Hiremath, Yaohong Wang, Yuankai Huo

Abstract: Eosinophilic esophagitis (EoE) is a chronic and relapsing disease characterized by esophageal inflammation. Symptoms of EoE include difficulty swallowing, food impaction, and chest pain, which significantly impair quality of life, resulting in nutritional impairments, social limitations, and psychological distress. EoE is typically diagnosed using a threshold of 15 to 20 eosinophils (Eos) per high-power field (HPF). Because counting Eos is a resource-intensive process for human pathologists, automatic methods are desirable. Circle representation has been shown to be a more precise, yet less complicated, representation for automatic cell instance segmentation, as in the CircleSnake approach. However, CircleSnake was designed as a single-label model and cannot handle multi-label scenarios. In this paper, we propose the multi-label CircleSnake model for instance segmentation of Eos. It extends the original CircleSnake model from a single-label design to a multi-label model, allowing segmentation of multiple object types. Experimental results illustrate the CircleSnake model's superiority over the traditional Mask R-CNN model and the DeepSnake model in terms of average precision (AP) in identifying and segmenting eosinophils, thereby enabling enhanced characterization of EoE. This automated approach holds promise for streamlining the assessment process and improving diagnostic accuracy in EoE analysis. The source code has been made publicly available at https://github.com/yilinliu610730/EoE.

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2308.06333

Deep Learning-Based Open Source Toolkit for Eosinophil Detection in Pediatric Eosinophilic Esophagitis

Authors: Juming Xiong, Yilin Liu, Ruining Deng, Regina N Tyree, Hernan Correa, Girish Hiremath, Yaohong Wang, Yuankai Huo

Abstract: Eosinophilic esophagitis (EoE) is a chronic, immune/antigen-mediated esophageal disease characterized by symptoms related to esophageal dysfunction and histological evidence of eosinophil-dominant inflammation. Owing to the intricate microscopic presentation of EoE in imaging, current methods that depend on manual identification are not only labor-intensive but also prone to inaccuracies. In this study, we develop an open-source toolkit, named Open-EoE, to perform end-to-end whole slide image (WSI) level eosinophil (Eos) detection with a single command via Docker. Specifically, the toolkit supports three state-of-the-art deep learning-based object detection models. Open-EoE further optimizes performance by implementing an ensemble learning strategy, enhancing the precision and reliability of the results. The experimental results demonstrate that the Open-EoE toolkit can efficiently detect Eos on a testing set of 289 WSIs. At the widely accepted diagnostic threshold of ≥15 Eos per high-power field (HPF), Open-EoE achieved an accuracy of 91%, showing decent consistency with pathologist evaluations. This suggests a promising avenue for integrating machine learning methodologies into the diagnostic process for EoE. The Docker image and source code have been made publicly available at https://github.com/hrlblab/Open-EoE.

Submitted 11 August, 2023; originally announced August 2023.

arXiv:2307.14778

MATNilm: Multi-appliance-task Non-intrusive Load Monitoring with Limited Labeled Data

Authors: Jing Xiong, Tianqi Hong, Dongbo Zhao, Yu Zhang

Abstract: Non-intrusive load monitoring (NILM) identifies the status and power consumption of various household appliances by disaggregating the total power usage signal of an entire house. Efficient and accurate load monitoring facilitates user profile establishment, intelligent household energy management, and peak load shifting. This benefits both end users and utilities by improving the overall efficiency of a power distribution network. Existing approaches mainly focus on developing an individual model for each appliance and typically rely on a large amount of household-labeled data, which are hard to collect. In this paper, we propose a multi-appliance-task framework with a training-efficient sample augmentation (SA) scheme that boosts disaggregation performance with limited labeled data. For each appliance, we develop a shared-hierarchical split structure for its regression and classification tasks. In addition, we propose a two-dimensional attention mechanism to capture spatio-temporal correlations among all appliances. With only one day of training data and limited appliance operation profiles, the proposed SA algorithm achieves test performance comparable to training with the full dataset. Finally, simulation results show that our proposed approach significantly outperforms many baseline models, reducing the relative errors by more than 50% on average. The code for this work is available at https://github.com/jxiong22/MATNilm

Submitted 29 July, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

arXiv:2307.14076

A Phase-Coded Time-Domain Interleaved OTFS Waveform with Improved Ambiguity Function

Authors: Jiajun Zhu, Yanqun Tang, Chao Yang, Chi Zhang, Haoran Yin, Jiaojiao Xiong, Yuhua Chen

Abstract: Integrated sensing and communication (ISAC) is a significant application scenario in future wireless communication networks, and the sensing capability of a waveform is commonly evaluated by its ambiguity function. To enhance the sensing performance of the orthogonal time frequency space (OTFS) waveform, we propose a novel time-domain interleaved cyclic-shifted P4-coded OTFS (TICP4-OTFS) waveform with an improved ambiguity function. TICP4-OTFS achieves superior autocorrelation features in both the time and frequency domains by exploiting the multicarrier-like form of OTFS after interleaving and the favorable autocorrelation attributes of the P4 code. Furthermore, we present the vectorized formulation of TICP4-OTFS modulation as well as its signal structure in each domain. Numerical simulations show that the proposed TICP4-OTFS waveform outperforms OTFS, with a narrower mainlobe and lower, more distant sidelobes in the delay- and Doppler-dimensional ambiguity functions, and a range-estimation example using pulse compression illustrates the proposed waveform's higher resolution. In addition, TICP4-OTFS achieves a lower bit error rate for communication in low signal-to-noise ratio (SNR) scenarios.

Submitted 23 September, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

Comments: This paper has been accepted by 2023 IEEE Globecom Workshops (GC Wkshps): Workshop on Integrated Sensing and Communications for Internet of Things

arXiv:2307.09279

Regression-free Blind Image Quality Assessment with Content-Distortion Consistency

Authors: Xiaoqi Wang, Jian Xiong, Hao Gao, Weisi Lin

Abstract: The optimization objective of regression-based blind image quality assessment (IQA) models is to minimize the mean prediction error across the training dataset, which can lead to biased parameter estimation due to potential biases in the training data. To mitigate this issue, we propose a regression-free framework for image quality evaluation based on retrieving locally similar instances in combined semantic and distortion feature spaces. The approach is motivated by the observation that the human visual system (HVS) exhibits analogous perceptual responses to semantically similar image content impaired by identical distortions, which we term content-distortion consistency. The proposed method constructs a hierarchical k-nearest neighbor (k-NN) algorithm for instance retrieval through two classification modules: a semantic classification (SC) module and a distortion classification (DC) module. Given a test image and an IQA database, the SC module retrieves multiple pristine images semantically similar to the test image. The DC module then retrieves instances based on distortion similarity from the distorted images that correspond to each retrieved pristine image. Finally, the quality prediction is obtained by aggregating the subjective scores of the retrieved instances. Without training on subjective quality scores, the proposed regression-free method achieves performance competitive with, and in some cases superior to, state-of-the-art regression-based methods on authentic and synthetic distortion IQA benchmarks.

Submitted 21 October, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

arXiv:2306.02306

Cross-CBAM: A Lightweight network for Scene Segmentation

Authors: Zhengbin Zhang, Zhenhao Xu, Xingsheng Gu, Juan Xiong

Abstract: Scene parsing is a great challenge for real-time semantic segmentation. Although traditional semantic segmentation networks have made remarkable leaps forward in semantic accuracy, their inference speed remains unsatisfactory. Moreover, this progress has been achieved with fairly large networks and powerful computational resources. However, it is difficult to run extremely large models on edge computing devices with limited computing power, which poses a major challenge for real-time semantic segmentation. In this paper, we present the Cross-CBAM network, a novel lightweight network for real-time semantic segmentation. Specifically, a Squeeze-and-Excitation Atrous Spatial Pyramid Pooling Module (SE-ASPP) is proposed to obtain variable field-of-view and multiscale information. We also propose a Cross Convolutional Block Attention Module (CCBAM), which employs a cross-multiply operation so that high-level semantic information guides low-level detail information. Unlike previous works, which use attention to focus on the desired information in the backbone, CCBAM uses cross-attention for feature fusion in the FPN structure. Extensive experiments on the Cityscapes and CamVid datasets demonstrate the effectiveness of the proposed Cross-CBAM model, which achieves a promising trade-off between segmentation accuracy and inference speed. On the Cityscapes test set, we achieve 73.4% mIoU at 240.9 FPS and 77.2% mIoU at 88.6 FPS on an NVIDIA GTX 1080 Ti.

Submitted 4 June, 2023; originally announced June 2023.

arXiv:2305.08465

An Overview of Resource Allocation in Integrated Sensing and Communication

Authors: Jinming Du, Yanqun Tang, Xizhang Wei, Jiaojiao Xiong, Jiajun Zhu, Haoran Yin, Chi Zhang, Haibo Chen

Abstract: Integrated sensing and communication (ISAC) is considered a promising solution for improving spectrum efficiency and relieving wireless spectrum congestion. This paper systematically introduces the evolutionary path of ISAC technologies and then reviews and summarizes the current research status of ISAC resource allocation. From the perspective of the different levels of ISAC integration, we describe the research progress on resource allocation in each stage, namely the resource-separated, orthogonal, converged, and collaborative stages. In addition, we propose a new resource allocation framework from a multi-granularity perspective. Finally, we demonstrate the feasibility of the proposed framework with a case study of a full-duplex ISAC system.

Submitted 15 May, 2023; originally announced May 2023.

Comments: 6 pages, 4 figures, conference

arXiv:2305.05082

doi 10.1109/ACCESS.2023.3275095

A Unifying Framework of Attention-based Neural Load Forecasting

Authors: Jing Xiong, Yu Zhang

Abstract: Accurate load forecasting is critical for reliable and efficient planning and operation of electric power grids. In this paper, we propose a unifying deep learning framework for load forecasting, which includes time-varying feature weighting, hierarchical temporal attention, and feature-reinforced error correction. Our framework adopts a modular design with good generalization capability. First, the feature-weighting mechanism assigns temporal weights to the input features. Second, a recurrent encoder-decoder structure with hierarchical attention is developed as a load predictor. The hierarchical attention enables similar-day selection, which re-evaluates the importance of historical information at each time step. Third, we develop an error correction module that exploits the errors and the hidden information in the learned features to further improve the model's forecasting performance. Experimental results demonstrate that the proposed framework outperforms existing methods on two public datasets across the reported performance metrics, with the feature weighting mechanism and error correction module being critical to achieving superior performance. Our framework provides an effective solution to the electric load forecasting problem and can be further adapted to many other forecasting tasks.

Submitted 8 May, 2023; originally announced May 2023.

arXiv:2302.14224

Overview and Performance Analysis of Various Waveforms in High Mobility Scenarios

Authors: Yu Zhou, Haoran Yin, Jiaojiao Xiong, Shiyu Song, Jiajun Zhu, Jinming Du, Haibo Chen, Yanqun Tang

Abstract: In the high-mobility scenarios of next-generation wireless communication systems (beyond 5G/6G), the performance of orthogonal frequency division multiplexing (OFDM) deteriorates drastically due to the loss of orthogonality between subcarriers caused by large Doppler frequency shifts. Various emerging waveforms have been proposed for fast time-varying channels and have shown excellent results. In this paper, we classify these waveforms by their modulation domain and establish a unified framework to provide a comprehensive comparison of their system structures. We then analyze the bit error rate (BER) performance of each waveform in doubly selective channels. Finally, based on a discussion of their complexity and compatibility with OFDM systems, we suggest candidate waveforms.

Submitted 27 February, 2023; originally announced February 2023.

arXiv:2302.11179

Cyclic Delay-Doppler Shift: A Simple Transmit Diversity Technique for Delay-Doppler Waveforms in Doubly Selective Channels

Authors: Haoran Yin, Jiaojiao Xiong, Yu Zhou, Chi Zhang, Di Zhang, Xizhang Wei, Yanqun Tang

Abstract: Delay-Doppler waveform design is considered a promising solution for achieving reliable communication over high-mobility channels in space-air-ground integrated networks (SAGIN). In this paper, we introduce the cyclic delay-Doppler shift (CDDS) technique for delay-Doppler waveforms to extract transmit diversity in doubly selective channels. Two simple CDDS schemes, named time-domain CDDS (TD-CDDS) and modulation-domain CDDS (MD-CDDS), are proposed for multiple-input multiple-output (MIMO) settings. We demonstrate the application of CDDS to two representative delay-Doppler waveforms, namely orthogonal time frequency space (OTFS) and affine frequency division multiplexing (AFDM), by deriving their corresponding CDDS matrices. Furthermore, we show theoretically and experimentally that CDDS can provide OTFS and AFDM with full transmit diversity gain in most cases.

Submitted 14 January, 2025; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: We are requesting the withdrawal of this paper due to critical issues identified in the document. Specifically, in Section III of the paper, the expression in Equation (7) is ambiguous and leads to inconsistencies in the subsequent derivations and conclusions. As a result, this could potentially confuse readers and misguide further research. Significant changes are made to the document.

arXiv:2211.08658

Consistent Direct Time-of-Flight Video Depth Super-Resolution

Authors: Zhanghao Sun, Wei Ye, Jinhui Xiong, Gyeongmin Choe, Jialiang Wang, Shuochen Su, Rakesh Ranjan

Abstract: Direct time-of-flight (dToF) sensors are promising for next-generation on-device 3D sensing. However, limited by manufacturing capabilities in a compact module, dToF data have a low spatial resolution (e.g., ~20×30 for iPhone dToF) and require a super-resolution step before being passed to downstream tasks. In this paper, we solve this super-resolution problem by fusing the low-resolution dToF data with the corresponding high-resolution RGB guidance. Unlike conventional RGB-guided depth enhancement approaches, which perform the fusion in a per-frame manner, we propose the first multi-frame fusion scheme to mitigate the spatial ambiguity resulting from low-resolution dToF imaging. In addition, dToF sensors provide unique depth histogram information for each local patch, and we incorporate this dToF-specific feature in our network design to further alleviate spatial ambiguity. To evaluate our models in complex dynamic indoor environments and to provide a large-scale dToF sensor dataset, we introduce DyDToF, the first synthetic RGB-dToF video dataset that features dynamic objects and a realistic dToF simulator following the physical imaging process. We believe the methods and dataset will benefit a broad community as dToF depth sensing becomes mainstream on mobile devices. Our code and data are publicly available: https://github.com/facebookresearch/DVSR/

Submitted 3 May, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

arXiv:2211.03577

Regrowth-free AlGaInAs MQW polarization controller integrated with sidewall grating DFB laser

Authors: Xiao Sun, Song Liang, Weiqing Cheng, Shengwei Ye, Yiming Sun, Yongguang Huang, Ruikang Zhang, Jichuan Xiong, Xuefeng Liu, John H. Marsh, Lianping Hou

Abstract: We report an AlGaInAs multiple quantum well integrated source of polarization-controlled light consisting of a polarization mode converter (PMC), a differential phase shifter (DPS), and a sidewall grating distributed-feedback (DFB) laser. We demonstrate an asymmetric stepped-height ridge waveguide PMC to realize TE-to-TM polarization conversion and a symmetric straight waveguide DPS to enable polarization rotation from approximately counterclockwise circular polarization to linear polarization. Because all components share an identical epitaxial layer scheme, the PMC, DPS, and DFB laser can all be integrated monolithically using only a single step of metalorganic vapor phase epitaxy and two steps of III-V material dry etching. For the DFB-PMC device, a high TE-to-TM polarization conversion efficiency of 98% over a wide range of DFB injection currents is reported at a wavelength of 1555 nm. For the DFB-PMC-DPS device, a 60° rotation of the Stokes vector was obtained on the Poincaré sphere over a bias voltage range of 0 V to -4.0 V with a DFB current (I_DFB) of 170 mA.

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2210.10519

arXiv:2208.07655

A Hybrid Deep Feature-Based Deformable Image Registration Method for Pathology Images

Authors: Chulong Zhang, Yuming Jiang, Na Li, Zhicheng Zhang, Md Tauhidul Islam, Jingjing Dai, Lin Liu, Wenfeng He, Wenjian Qin, Jing Xiong, Yaoqin Xie, Xiaokun Liang

Abstract: Pathologists need to combine information from differently stained pathology slices for accurate diagnosis. Deformable image registration is a necessary technique for fusing multi-modal pathology slices. This paper proposes a hybrid deep feature-based deformable image registration framework for stained pathology samples. We first extract dense feature points via detector-based and detector-free deep learning feature networks and perform point matching. Then, to further reduce false matches, we propose an outlier detection method that combines an isolation forest statistical model with a local affine correction model. Finally, an interpolation method generates the deformable vector field for pathology image registration from the matched points. We evaluate our method on the dataset of the Automatic Non-rigid Histological Image Registration (ANHIR) challenge, which was co-organized with the IEEE ISBI 2019 conference. Our technique outperforms traditional approaches by 17%, with the average-average relative target registration error (rTRE) reaching 0.0034. The proposed method achieved state-of-the-art performance and ranked first on the test dataset. The proposed hybrid deep feature-based registration method can potentially become a reliable method for pathology image registration.

Submitted 10 April, 2023; v1 submitted 16 August, 2022; originally announced August 2022.

Comments: 22 pages, 12 figures. This work has been submitted to the IEEE for possible publication

arXiv:2205.02939 [pdf]

Modelling Pre-fatigue, Low-velocity Impact and Fatigue behaviours of Composite Helicopter Tail Structures under Multipoint Coordinated Loading Spectrum

Authors: Zheng-Qiang Cheng, Wei Tan, Jun-Jiang Xiong

Abstract: This paper aims to numerically study the pre-fatigue, low-velocity impact (LVI) and fatigue progressive damage behaviours of a full-scale composite helicopter tail structure under a multipoint coordinated loading spectrum. First, a fatigue progressive damage model (PDM) incorporating a multiaxial fatigue residual strength degradation rule, fatigue failure criteria based on the fatigue residual strength concept, and a sudden stiffness degradation rule was proposed. Then, an LVI progressive damage model for plain-weave (PW) and unidirectional (UD) composites was developed. Moreover, a full-process analysis algorithm with a reasonable damage transfer strategy for pre-fatigue, LVI and fatigue progressive damage analysis was proposed. Finally, a computationally efficient and accurate full-scale global-local finite element (FE) model of the helicopter tail structure was built to predict strain distribution under two flight working conditions, to predict LVI damage under impact loading, and to assess fatigue damage behaviours under the multipoint coordinated loading spectrum. The numerical predictions agree well with test results from this work and literature data, indicating that the developed pre-fatigue, LVI and fatigue PDMs and algorithms, as well as the global-local FE modelling based on shell-to-solid coupling, can effectively analyse the impact damage tolerance of full-scale aircraft structures.

Submitted 5 May, 2022; originally announced May 2022.

Comments: 43 pages, 16 figures

arXiv:2203.02655 [pdf, other]

Audio-visual speech separation based on joint feature representation with cross-modal attention

Authors: Junwen Xiong, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha, Yanning Zhang

Abstract: Multi-modal speech separation has exhibited a specific advantage in isolating the target speaker in multi-talker noisy environments. Unfortunately, most current separation strategies prefer a straightforward fusion based on feature learning of each single modality, which falls far short of sufficiently considering the inter-relationships between modalities. Inspired by learning joint feature representations from audio and visual streams with attention mechanisms, this study proposes a novel cross-modal fusion strategy that benefits the whole framework with semantic correlations between different modalities. To further improve audio-visual speech separation, the dense optical flow of lip motion is incorporated to strengthen the robustness of the visual representation. The proposed work is evaluated on two public audio-visual speech separation benchmark datasets. The overall performance improvement demonstrates that the additional motion network effectively enhances the visual representation of the combined lip images and audio signal, and that the proposed cross-modal fusion outperforms the baseline in terms of all metrics.
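
The abstract does not give the attention formula, so the sketch below assumes standard scaled dot-product cross-attention in which audio frames supply the queries and visual (lip) frames supply the keys and values. The projection matrices `w_q`, `w_k`, `w_v` are hypothetical placeholders for learned weights; this is not the paper's architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(audio, visual, w_q, w_k, w_v):
    """Audio frames (T_a, d_a) attend over visual frames (T_v, d_v).

    Queries come from the audio stream; keys and values come from the visual
    stream, so each output row is a visual summary weighted by audio relevance.
    """
    q = audio @ w_q   # (T_a, d_k)
    k = visual @ w_k  # (T_v, d_k)
    v = visual @ w_v  # (T_v, d_out)
    scores = q @ k.T / np.sqrt(k.shape[-1])  # scaled dot-product similarities
    weights = softmax(scores, axis=-1)       # (T_a, T_v), each row sums to 1
    return weights @ v, weights
```

In a full model the fused output would typically be concatenated with or added to the audio features before the separation mask is predicted; that step is left out here.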

Submitted 4 March, 2022; originally announced March 2022.

Comments: 5 pages, 3 figures

arXiv:2203.02216 [pdf, other]

Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement

Authors: Junwen Xiong, Yu Zhou, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha

Abstract: Active speaker detection and speech enhancement have become two increasingly attractive topics in audio-visual scenario understanding. Given their respective characteristics, independently designed architectures have been widely used for each single task. This may lead the model to learn task-specific representations and inevitably result in a lack of generalization ability for features based on multi-modal modeling. More recent studies have shown that establishing cross-modal relationships between auditory and visual streams is a promising solution to the challenge of audio-visual multi-task learning. Therefore, to bridge the multi-modal associations in audio-visual tasks, this study proposes a unified framework that achieves target speaker detection and speech enhancement through joint learning of audio-visual modeling.

Submitted 7 July, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

Comments: 13 pages, 8 figures

arXiv:2201.12862 [pdf, ps, other]

Lyapunov Conditions for Input-to-State Stability of Hybrid Systems with Memory

Authors: Wei Ren, Junlin Xiong

Abstract: This paper studies input-to-state stability for hybrid systems with memory, which model hybrid dynamics affected by time delays. Using both Lyapunov-Razumikhin functions and Lyapunov-Krasovskii functionals, Lyapunov-based sufficient conditions are established for input-to-state stability. In addition, further extensions and relaxations are proposed for special cases, such as the stable flow/jump cases and the cases in which Lyapunov functions do not decrease strictly during flows/jumps. Finally, two examples are used to illustrate the developed results.

Submitted 30 January, 2022; originally announced January 2022.

Comments: 8 pages, IEEE Transactions on Automatic Control

arXiv:2201.08514 [pdf, other]

How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis

Authors: Shuai Zhang, Meng Wang, Sijia Liu, Pin-Yu Chen, Jinjun Xiong

Abstract: Self-training, a semi-supervised learning algorithm, leverages a large amount of unlabeled data to improve learning when labeled data are limited. Despite empirical successes, its theoretical characterization remains elusive. To the best of our knowledge, this work establishes the first theoretical analysis for the well-known iterative self-training paradigm and proves the benefits of unlabeled data in both training convergence and generalization ability. To make our theoretical analysis feasible, we focus on the case of one-hidden-layer neural networks. However, theoretical understanding of iterative self-training is non-trivial even for a shallow neural network. One of the key challenges is that existing neural network landscape analysis built upon supervised learning no longer holds in the (semi-supervised) self-training paradigm. We address this challenge and prove that iterative self-training converges linearly, with both the convergence rate and the generalization accuracy improved in the order of $1/\sqrt{M}$, where $M$ is the number of unlabeled samples. Experiments on both shallow and deep neural networks are also provided to justify the correctness of our theoretical insights on self-training.
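
The iterative self-training loop the abstract analyzes alternates between pseudo-labeling unlabeled data with the current model and retraining on labeled plus pseudo-labeled data. The toy sketch below shows only that loop structure; it deliberately swaps the paper's one-hidden-layer network for a nearest-centroid classifier, and all function names are illustrative.

```python
import numpy as np


def fit_centroids(X, y, n_classes):
    """Nearest-centroid 'model': one mean vector per class."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])


def predict(centroids, X):
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def self_train(X_lab, y_lab, X_unlab, n_classes, n_iters=5):
    """Iterative self-training: pseudo-label the unlabeled pool, then refit on everything."""
    centroids = fit_centroids(X_lab, y_lab, n_classes)  # initial model from labeled data only
    for _ in range(n_iters):
        pseudo = predict(centroids, X_unlab)
        X_all = np.vstack([X_lab, X_unlab])
        y_all = np.concatenate([y_lab, pseudo])
        # Labeled points keep every class non-empty, so each mean is defined.
        centroids = fit_centroids(X_all, y_all, n_classes)
    return centroids
```

With only a few labeled points per class, the initial centroids are noisy; refitting on many pseudo-labeled points averages that noise down, which is the intuition (though not the proof) behind the abstract's $1/\sqrt{M}$ improvement.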

Submitted 14 February, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: 36 pages

Journal ref: Tenth International Conference on Learning Representations 2022

arXiv:2110.14307 [pdf, other]

doi 10.1109/TMC.2021.3073969

RF-Based Human Activity Recognition Using Signal Adapted Convolutional Neural Network

Authors: Zhe Chen, Chao Cai, Tianyue Zheng, Jun Luo, Jie Xiong, Xin Wang

Abstract: Human Activity Recognition (HAR) plays a critical role in a wide range of real-world applications, and it is traditionally achieved via wearable sensing. Recently, to avoid the burden and discomfort caused by wearable devices, device-free approaches exploiting RF signals have emerged as a promising alternative for HAR. Most of the latest device-free approaches require training a large deep neural network model in either the time or frequency domain, entailing extensive storage for the model and intensive computation to infer activities. Consequently, even with some major advances in device-free HAR, current device-free approaches are still far from practical in real-world scenarios where the computation and storage resources of, for example, edge devices are limited. Therefore, we introduce HAR-SAnet, a novel RF-based HAR framework. It adopts an original signal adapted convolutional neural network architecture: instead of feeding handcrafted features of RF signals into a classifier, HAR-SAnet fuses them adaptively from both time and frequency domains to design an end-to-end neural network model. We apply point-wise grouped convolution and depth-wise separable convolutions to confine the model scale and speed up inference. The experimental results show that HAR-SAnet achieves higher recognition accuracy than state-of-the-art algorithms and systems.
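
To see why depth-wise separable and point-wise grouped convolutions shrink a model, it helps to count weights. The sketch below assumes 2D k x k kernels and ignores biases; the layer sizes (64 to 128 channels, 3 x 3 kernels, 4 point-wise groups) are arbitrary examples, not HAR-SAnet's configuration.

```python
def conv_weights(c_in, c_out, k, groups=1):
    """Weight count of a k x k 2D convolution with `groups` groups (bias ignored)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    return k * k * (c_in // groups) * c_out


def separable_weights(c_in, c_out, k, pointwise_groups=1):
    depthwise = conv_weights(c_in, c_in, k, groups=c_in)                # one k x k filter per input channel
    pointwise = conv_weights(c_in, c_out, 1, groups=pointwise_groups)  # 1 x 1 channel mixing
    return depthwise + pointwise


standard = conv_weights(64, 128, 3)                          # 73,728
separable = separable_weights(64, 128, 3)                    # 576 + 8,192 = 8,768
grouped = separable_weights(64, 128, 3, pointwise_groups=4)  # 576 + 2,048 = 2,624
```

In this example, the separable layer needs about 12% of the standard layer's weights, and grouping the 1 x 1 stage cuts that by roughly another factor of three. Multiply-accumulate counts shrink by the same ratios per output position.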

Submitted 27 October, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: 13 pages

Journal ref: IEEE Transactions on Mobile Computing, 19 April 2021

arXiv:2110.09123 [pdf, other]

Joint Spatial Division and Coaxial Multiplexing for Downlink Multi-User OAM Wireless Backhaul

Authors: Wen-Xuan Long, Rui Chen, Marco Moretti, Jian Xiong, Jiandong Li

Abstract: Orbital angular momentum (OAM) at radio frequency (RF) provides a novel approach of multiplexing a set of orthogonal modes on the same frequency channel to achieve high spectral efficiencies (SEs). However, existing research on OAM wireless communications mainly focuses on point-to-point transmission in the line-of-sight (LoS) scenario. In this paper, we propose an overall scheme of downlink multi-user OAM (MU-OAM) wireless backhaul based on uniform circular arrays (UCAs) for broadcasting networks, which can achieve joint spatial division and coaxial multiplexing (JSDCM). A salient feature of the proposed downlink MU-OAM wireless backhaul systems is that the channel matrices are completely characterized by the position of each small base station (SBS), independent of the numbers of subcarriers and antennas, which avoids estimating the large channel matrices required by traditional downlink multi-user multiple-input multiple-output (MU-MIMO) wireless backhaul systems. Thereafter, we propose an OAM-based multiuser distance and angle of arrival (AoA) estimation method, which is able to simultaneously estimate the positions of multiple SBSs with a flexible number of training symbols. With the estimated distances and AoAs, an MU-OAM preprocessing scheme is applied to eliminate the co-mode and inter-mode interferences in the downlink MU-OAM channel. Finally, the proposed methods are extended to the downlink MU-OAM-MIMO wireless backhaul system equipped with uniform concentric circular arrays (UCCAs), which can achieve much higher SE and energy efficiency (EE) than traditional MU-MIMO systems. Both mathematical analysis and simulation results validate that the proposed scheme can effectively eliminate both interferences of the practical downlink MU-OAM channel and approach the performance of the ideal MU-OAM channel.
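
The "orthogonal modes" that a UCA multiplexes can be illustrated by the element excitations alone: driving element n with phase exp(j l phi_n) produces OAM mode l, and distinct modes (modulo the number of elements) give orthogonal weight vectors. This sketch assumes an idealized N-element UCA and ignores the propagation channel (Bessel-function gains, distance, AoA) that the paper's MU-OAM model includes; `oam_weights` is an illustrative name.

```python
import numpy as np


def oam_weights(mode, n_elements):
    """Excitation weights that give an N-element UCA an OAM phase profile exp(j*l*phi_n)."""
    phi = 2 * np.pi * np.arange(n_elements) / n_elements  # azimuth of each element
    return np.exp(1j * mode * phi) / np.sqrt(n_elements)  # unit-norm weight vector


modes = np.arange(-2, 3)  # l = -2, ..., 2
W = np.stack([oam_weights(l, 8) for l in modes], axis=1)  # (8 elements, 5 modes)
gram = W.conj().T @ W  # inner products between mode weight vectors
```

The weight vectors are columns of a DFT matrix, so `gram` is the identity. Modes that differ by a multiple of N (for example l = 1 and l = 9 on an 8-element array) alias onto the same vector, which is why an N-element UCA supports at most N distinct modes.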

Submitted 18 October, 2021; originally announced October 2021.

arXiv:2110.07378 [pdf, other]

Scheduler-Pointed False Data Injection Attack for Event-Based Remote State Estimation

Authors: Qiulin Xu, Junlin Xiong

Abstract: In this paper, an attack problem is investigated for event-based remote state estimation in cyber-physical systems. Our objective is to degrade the effect of the event-based scheduler while bypassing a $\chi^2$ false data detector. A two-channel scheduler-pointed false data injection attack strategy is proposed by modifying the numerical characteristics of innovation signals. The attack strategy is proved to always exist, and an algorithm is provided to find it. Under the proposed attack strategy, the scheduler becomes almost invalid and the performance of the remote estimator is degraded. Numerical simulations are used to illustrate our theoretical results.
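
The $\chi^2$ detector that the attack must bypass is usually implemented as a test on the normalized innovation squared. The sketch below assumes a Gaussian innovation with known covariance and shows only the detector, not the proposed attack; the 5% false-alarm rate and 2-dimensional innovation are example choices.

```python
import numpy as np


def chi2_statistic(z, sigma):
    """Normalized innovation squared g = z^T Sigma^{-1} z."""
    return float(z @ np.linalg.solve(sigma, z))


def chi2_alarm(z, sigma, threshold):
    """The detector raises an alarm when g exceeds the threshold."""
    return chi2_statistic(z, sigma) > threshold


# For a 2-dimensional innovation the chi^2 CDF is 1 - exp(-x/2), so the
# threshold for a 5% false-alarm rate is -2 ln(0.05) ~= 5.99.
threshold = -2 * np.log(0.05)
```

An attacker who reshapes the innovation while keeping `chi2_statistic` below `threshold` stays invisible to this test, which is the constraint the abstract's "modifying the numerical characteristics of innovation signals" has to respect.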

Submitted 14 October, 2021; originally announced October 2021.

Comments: 10 pages, 5 figures

arXiv:2108.11763 [pdf, other]

doi 10.1109/PESGM46819.2021.9637992

Attention-based Neural Load Forecasting: A Dynamic Feature Selection Approach

Authors: Jing Xiong, Pengyang Zhou, Alan Chen, Yu Zhang

Abstract: Encoder-decoder recurrent neural networks (RNNs) have made significant progress in sequence-to-sequence learning tasks such as machine translation and conversational models. Recent works have shown the advantage of this type of network in various time series forecasting tasks. The present paper focuses on the problem of multi-horizon short-term load forecasting, which plays a key role in power system planning and operation. Leveraging the encoder-decoder RNN, we develop an attention model to adaptively select the relevant features and similar temporal information. First, input features are assigned different weights by a feature selection attention layer, while the updated historical features are encoded by a bi-directional long short-term memory (BiLSTM) layer. Then, a decoder with hierarchical temporal attention enables similar-day selection, which re-evaluates the importance of historical information at each time step. Numerical results on the dataset of the Global Energy Forecasting Competition 2014 show that our proposed model significantly outperforms some existing forecasting schemes.

Submitted 24 August, 2021; originally announced August 2021.

arXiv:2108.10122 [pdf, other]

Ghost Panorama

Authors: Zhiyuan Ye, Hai-Bo Wang, Jun Xiong, Kaige Wang

Abstract: Computational ghost imaging, or single-pixel imaging, enables image formation of an unknown scene using a lens-free photodetector. In this Letter, we present a computational panoramic ghost imaging system that can achieve a full-color panorama using a single-pixel photodetector, where a convex mirror performs the optical transformation of the engineered Hadamard-based circular illumination pattern from unidirectional to omnidirectional. To the best of our knowledge, this is the first time the concept of ghost panorama has been proposed and realized in preliminary experiments. It is foreseeable that ghost panorama will have more advantages in imaging and detection in many extreme conditions (e.g., scattering/turbulence, cryogenic temperatures, and unconventional spectra), as well as broad application prospects in the positioning of fast-moving targets and situation awareness for autonomous vehicles.
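
The Hadamard-based illumination mentioned above works because the patterns form an orthogonal basis: each bucket reading is one inner product between the scene and a pattern, and the orthogonality H^T H = nI lets the scene be recovered exactly from n readings. The sketch assumes ideal +/-1 patterns (in practice realized as complementary 0/1 pattern pairs), a noise-free detector, and a grayscale scene; it ignores the convex-mirror circular mapping and color.

```python
import numpy as np


def hadamard(n):
    """Sylvester-construction Hadamard matrix of order n (n must be a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H


def single_pixel_reconstruct(scene):
    """Simulate bucket measurements with +/-1 Hadamard patterns and invert them."""
    n = scene.size
    H = hadamard(n)
    y = H @ scene.ravel()  # one bucket-detector reading per illumination pattern (row of H)
    x = H.T @ y / n        # H^T H = n I, so this recovers the scene exactly
    return x.reshape(scene.shape)
```

For an 8 x 8 scene, 64 patterns and 64 single-pixel readings suffice; with noise, the same inverse still applies but reconstruction quality depends on how many patterns are measured.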

Submitted 13 August, 2021; originally announced August 2021.

Comments: 5 pages, 4 figures

arXiv:2106.15283 [pdf, other]

doi 10.1145/3448021

Similarity Embedding Networks for Robust Human Activity Recognition

Authors: Chenglin Li, Carrie Lu Tong, Di Niu, Bei Jiang, Xiao Zuo, Lei Cheng, Jian Xiong, Jianming Yang

Abstract: Deep learning models for human activity recognition (HAR) based on sensor data have been heavily studied recently. However, the generalization ability of deep models on complex real-world HAR data is limited by the availability of high-quality labeled activity data, which are hard to obtain. In this paper, we design a similarity embedding neural network that maps input sensor signals onto real vectors through carefully designed convolutional and LSTM layers. The embedding network is trained with a pairwise similarity loss, encouraging the clustering of samples from the same class in the embedded real space, and can be effectively trained on a small dataset and even on a noisy dataset with mislabeled samples. Based on the learned embeddings, we further propose both nonparametric and parametric approaches for activity recognition. Extensive evaluation based on two public datasets has shown that the proposed similarity embedding network significantly outperforms state-of-the-art deep models on HAR classification tasks, is robust to mislabeled samples in the training set, and can also be used to effectively denoise a noisy dataset.
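
The abstract does not spell out its pairwise similarity loss. A common choice with the described effect (same-class samples pulled together, different-class samples pushed apart) is the margin-based contrastive loss sketched below; treat it as an assumed stand-in, with `margin` a hypothetical hyperparameter.

```python
import numpy as np


def contrastive_loss(e1, e2, same, margin=1.0):
    """Pairwise contrastive loss over a batch of embedding pairs.

    same[i] = 1 pulls pair i together (squared distance);
    same[i] = 0 pushes it apart until the distance reaches `margin`.
    """
    d = np.linalg.norm(e1 - e2, axis=-1)
    pull = same * d ** 2
    push = (1 - same) * np.maximum(0.0, margin - d) ** 2
    return float(np.mean(pull + push))
```

Dissimilar pairs already farther apart than `margin` contribute nothing, so training effort concentrates on confusable pairs. This behavior may also help explain why pair-based training tolerates a few mislabeled samples.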

Submitted 31 May, 2021; originally announced June 2021.

arXiv:2106.08519 [pdf, other]

Global Rhythm Style Transfer Without Text Transcriptions

Authors: Kaizhi Qian, Yang Zhang, Shiyu Chang, Jinjun Xiong, Chuang Gan, David Cox, Mark Hasegawa-Johnson

Abstract: Prosody plays an important role in characterizing the style of a speaker or an emotion, but most non-parallel voice or emotion style transfer algorithms do not convert any prosody information. Two major components of prosody are pitch and rhythm. Disentangling the prosody information, particularly the rhythm component, from the speech is challenging because it involves breaking the synchrony between the input speech and the disentangled speech representation. As a result, most existing prosody style transfer algorithms rely on some form of text transcription to identify the content information, which confines their application to high-resource languages only. Recently, SpeechSplit has made sizeable progress towards unsupervised prosody style transfer, but it is unable to extract high-level global prosody style in an unsupervised manner. In this paper, we propose AutoPST, which can disentangle global prosody style from speech without relying on any text transcriptions. AutoPST is an Autoencoder-based Prosody Style Transfer framework with a thorough rhythm removal module guided by self-expressive representation learning. Experiments on different style transfer tasks show that AutoPST can effectively convert prosody that correctly reflects the styles of the target domains.

Submitted 15 June, 2021; originally announced June 2021.

arXiv:2105.15174 [pdf, ps, other]

doi 10.1109/TVT.2021.3085296

Energy-Efficient Precoding in Electromagnetic Exposure-Constrained Uplink Multiuser MIMO

Authors: Jiayuan Xiong, Li You, Derrick Wing Kwan Ng, Wenjin Wang, Xiqi Gao

Abstract: User electromagnetic (EM) exposure is continuously being exacerbated by the evolution of multi-antenna portable devices. To mitigate the effects of EM radiation, portable devices must satisfy tight regulations on user exposure level, generally measured by the specific absorption rate (SAR). To this end, we investigate SAR-aware uplink precoder design for energy efficiency (EE) maximization in multiuser multiple-input multiple-output transmission exploiting statistical channel state information (CSI). As the objective function of the design problem is computationally demanding in the absence of a closed form, we present an asymptotic approximation of the objective to facilitate the precoder design. An iterative algorithm based on Dinkelbach's method and sequential optimization is proposed to obtain an optimal solution of the asymptotic EE optimization problem. Based on the transformed problem, an iterative SAR-aware water-filling scheme is further conceived for the EE optimization precoding design with statistical CSI. Numerical results illustrate substantial performance improvements provided by our proposed SAR-aware energy-efficient transmission scheme over traditional baseline schemes.
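
Dinkelbach's method, named above, turns a ratio objective such as energy efficiency (rate divided by power) into a sequence of subtractive problems. The toy sketch below applies it to a hypothetical single-link problem with channel gain `g`, circuit power `p_c`, and power budget `p_max`; it does not model the paper's multiuser precoders, statistical CSI, or SAR constraints.

```python
import math


def dinkelbach_ee(g, p_c, p_max, tol=1e-9, max_iter=100):
    """Maximize EE(p) = log2(1 + g p) / (p + p_c) over 0 <= p <= p_max.

    Dinkelbach's method replaces the ratio with the parametric problem
    max_p log2(1 + g p) - lam * (p + p_c) and updates lam to the current ratio.
    """
    lam = 0.0
    p = p_max
    for _ in range(max_iter):
        # Closed-form maximizer of the concave subproblem (stationary point, clipped to [0, p_max]).
        p = p_max if lam == 0 else 1.0 / (lam * math.log(2)) - 1.0 / g
        p = min(max(p, 0.0), p_max)
        rate = math.log2(1.0 + g * p)
        if rate - lam * (p + p_c) < tol:  # F(lam) ~ 0  =>  lam is the optimal EE
            break
        lam = rate / (p + p_c)
    return p, lam
```

For example, with `g = 1`, `p_c = 1`, and `p_max = 10`, the iteration converges to p = e - 1 and an optimal EE of log2(e)/e, about 0.531 bits per unit energy. Each step only requires solving a concave problem. This is what makes the method attractive when, as in the paper, the subproblem becomes a water-filling-type precoder update.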

Submitted 31 May, 2021; originally announced May 2021.

Comments: We investigate the SAR-aware uplink precoder design for the EE maximization in multiuser MIMO transmission exploiting statistical CSI

Journal ref: IEEE Transactions on Vehicular Technology, vol. 70, no. 7, pp. 7226-7231, Jul. 2021

arXiv:2105.10299 [pdf, other]

Optimal Estimator Design and Properties Analysis for Interconnected Systems with Asymmetric Information Structure

Authors: Yan Wang, Junlin Xiong, Zaiyue Yang, Rong Su

Abstract: This paper studies the optimal state estimation problem for interconnected systems. Each subsystem can obtain its own measurement in real time, while the measurements transmitted between subsystems suffer from random delay. The optimal estimator is analytically designed to minimize the conditional error covariance. The boundedness of the expected error covariance (EEC) is analyzed. In particular, a new, easily verified condition is established for the boundedness of the EEC. Further, the properties of the EEC with respect to the delay probability are studied. We find that there exists a critical probability such that the EEC is bounded if the delay probability is below it. Lower and upper bounds on the critical probability are also derived. Finally, the proposed results are applied to a power system, and the effectiveness of the designed methods is illustrated by simulations.

Submitted 2 May, 2023; v1 submitted 21 May, 2021; originally announced May 2021.

arXiv:2104.05463 [pdf, other]

Scalable Power Control/Beamforming in Heterogeneous Wireless Networks with Graph Neural Networks

Authors: Xiaochen Zhang, Haitao Zhao, Jun Xiong, Li Zhou, Jibo Wei

Abstract: Machine learning (ML) has been widely used for efficient resource allocation (RA) in wireless networks. Although superb performance is achieved on small and simple networks, most existing ML-based approaches face difficulties when heterogeneity occurs and network size expands. In this paper, focusing specifically on power control/beamforming (PC/BF) in heterogeneous device-to-device (D2D) networks, we propose a novel unsupervised learning-based framework named heterogeneous interference graph neural network (HIGNN) to handle these challenges. First, we characterize diversified link features and interference relations with heterogeneous graphs. Then, HIGNN is proposed to enable each link to obtain its individual transmission scheme after limited information exchange with neighboring links. Notably, after being trained on small networks, HIGNN scales to wireless networks of growing sizes with robust performance. Numerical results show that, compared with state-of-the-art benchmarks, HIGNN achieves much higher execution efficiency while providing strong performance.

Submitted 8 December, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: 6 pages, 6 figures, accepted by IEEE GLOBECOM 2021

arXiv:2103.16051 [pdf, ps, other]

Reduced Dynamics and Control for an Autonomous Bicycle

Authors: Jiaming Xiong, Bo Li, Ruihan Yu, Daolin Ma, Wei Wang, Caishan Liu

Abstract: In this paper, we propose a reduced model for the full dynamics of a bicycle and analyze its nonlinear behavior under a proportional control law for steering. Based on the Gibbs-Appell equations for the Whipple bicycle, we obtain a second-order nonlinear ordinary differential equation (ODE) that governs the bicycle's controlled motion. Two types of equilibrium points for the governing equation are found, which correspond to the bicycle's uniform straight forward and circular motions, respectively. By applying the Hurwitz criterion to the linearized equation, we find that the steer coefficient must be negative, consistent with the human intuition of turning toward a fall. Under this condition, a critical angular velocity of the rear wheel exists, above which the uniform straight forward motion is stable, and slightly below which a pair of symmetrical stable uniform circular motions occurs. These theoretical findings are verified by both numerical simulations and experiments performed on a powered autonomous bicycle.
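
Linearizing a second-order ODE about an equilibrium gives a characteristic polynomial s^2 + a1 s + a0, and the Hurwitz criterion reduces to requiring a1 > 0 and a0 > 0. The abstract does not give how a1 and a0 depend on the bicycle parameters, steer coefficient, and rear-wheel angular velocity, so the sketch treats them as generic placeholders and cross-checks the criterion against the roots directly.

```python
import numpy as np


def hurwitz_second_order(a1, a0):
    """s^2 + a1*s + a0 has all roots in the open left half-plane iff a1 > 0 and a0 > 0."""
    return a1 > 0 and a0 > 0


def roots_stable(coeffs):
    """Direct check: every root of the characteristic polynomial has negative real part."""
    return bool(np.all(np.roots(coeffs).real < 0))
```

In the paper's setting, the sign conditions on a1 and a0 are what yield the negative steer coefficient and the critical rear-wheel angular velocity. Here they are only checked numerically for sample values.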

Submitted 29 March, 2021; originally announced March 2021.

Journal ref: ICRA 2021

arXiv:2103.06549 [pdf, other]

Advanced Geometry Surface Coding for Dynamic Point Cloud Compression

Authors: Jian Xiong, Hao Gao, Miaohui Wang, Hongliang Li, King Ngi Ngan, Weisi Lin

Abstract: In video-based dynamic point cloud compression (V-PCC), 3D point clouds are projected onto 2D images for compression with existing video codecs. However, existing video codecs were originally designed for natural visual signals and fail to account for the characteristics of point clouds. Thus, problems remain in the compression of geometry information generated from point clouds. Firstly, the distortion model in the existing rate-distortion optimization (RDO) is not consistent with the geometry quality assessment metrics. Secondly, the prediction methods in video codecs fail to account for the fact that the highest depth values of a far layer are greater than or equal to the corresponding lowest depth values of a near layer. This paper proposes an advanced geometry surface coding (AGSC) method for dynamic point cloud (DPC) compression. The proposed method consists of two modules: an error projection model-based (EPM-based) RDO and an occupancy map-based (OM-based) merge prediction. Firstly, the EPM model is proposed to describe the relationship between the distortion model in the existing video codec and the geometry quality metric. Secondly, the EPM-based RDO method is presented to project the existing distortion model onto the plane normal, and it is simplified to estimate the average normal vectors of coding units (CUs). Finally, we propose the OM-based merge prediction approach, in which the prediction pixels of merge modes are refined based on the occupancy map. Experiments on the standard point clouds show that the proposed method achieves an average bitrate saving of 9.84% for geometry compression.
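
Why project the codec's distortion onto the plane normal? A depth error e along the projection axis changes a point-to-plane (D2-style) geometry metric by (e * n_z)^2 rather than e^2, where n_z is the component of the surface normal along that axis. The sketch below is a hypothetical simplification, not the authors' EPM: it assumes known point correspondences and precomputed normals (real metrics use nearest neighbours), and shows the Lagrangian cost J = D + lambda * R that RDO compares across coding modes, with `lam` an arbitrary example value.

```python
import numpy as np


def point_to_plane_mse(orig, recon, normals):
    """D2-style geometry error: squared offsets projected onto unit surface normals.

    Assumes recon[i] is already paired with orig[i] (real metrics use nearest neighbours).
    """
    n = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    proj = np.sum((recon - orig) * n, axis=1)  # signed error along the normal
    return float(np.mean(proj ** 2))


def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R used to compare coding modes."""
    return distortion + lam * rate_bits
```

For a surface tilted 60 degrees from the projection axis (n_z = 0.5), a depth error of 2 costs only 1 under the point-to-plane metric but 4 under plain squared depth error. This mismatch motivates weighting the codec distortion by the normal when choosing modes.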

Submitted 11 March, 2021; originally announced March 2021.

arXiv:2103.02894 [pdf, ps, other]

Stability and $\mathcal{H}_{\infty}$ Performance Analysis of Stochastic Linear Networked and Quantized Control Systems

Authors: Wei Ren, Junlin Xiong

Abstract: This paper studies the stability and $\mathcal{H}_{\infty}$ performance analysis problem for linear networked and quantized control systems with both communication delays and random packet losses. To deal with the network-induced uncertainties and random packet dropouts, a novel discrete-time stochastic system model is developed for continuous-time networked control systems and further overapproximated via a polytopic system with norm-bounded uncertainty. Based on the overapproximated system model, sufficient conditions are established for linear networked and quantized control systems in different cases to guarantee input-to-state stability and $\mathcal{H}_{\infty}$ performance with respect to the network-induced errors. Finally, a numerical example is presented to illustrate the developed results.

Submitted 4 March, 2021; originally announced March 2021.

Comments: 8 pages, 2 figures, extended version of the ACC paper

Showing 1–50 of 75 results for author: Xiong, J