
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

Authors: Ailin Huang, Bingxin Li, Bruce Wang, Boyong Wu, Chao Yan, Chengli Feng, Heng Wang, Hongyu Zhou, Hongyuan Wang, Jingbei Li, Jianjian Sun, Joanna Wang, Mingrui Chen, Peng Liu, Ruihang Miao, Shilei Jiang, Tian Fei, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Ge, Zheng Gong, Zhewei Huang, et al. (51 additional authors not shown)

Abstract: Large Audio-Language Models (LALMs) have significantly advanced intelligent human-computer interaction, yet their reliance on text-based outputs limits their ability to generate natural speech responses directly, hindering seamless audio interactions. To address this, we introduce Step-Audio-AQAA, a fully end-to-end LALM designed for Audio Query-Audio Answer (AQAA) tasks. The model integrates a dual-codebook audio tokenizer for linguistic and semantic feature extraction, a 130-billion-parameter backbone LLM, and a neural vocoder for high-fidelity speech synthesis. Our post-training approach employs interleaved token output of text and audio to enhance semantic coherence and combines Direct Preference Optimization (DPO) with model merging to improve performance. Evaluations on the StepEval-Audio-360 benchmark demonstrate that Step-Audio-AQAA excels especially in speech control, outperforming state-of-the-art LALMs in key areas. This work contributes a promising solution for end-to-end LALMs and highlights the critical role of a token-based vocoder in enhancing overall performance for AQAA tasks.

Submitted 10 June, 2025; originally announced June 2025.

Comments: 12 pages, 3 figures
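The abstract names DPO and model merging but does not give their exact form. As a hedged illustration, the sketch below computes the standard DPO loss for one preference pair from whole-response log-probabilities, plus a plain weighted parameter average standing in for "model merge". The `beta` default, the function names, and the averaging rule are assumptions for illustration, not Step-Audio-AQAA's implementation.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) response pair.

    Each *_logp argument is the summed token log-probability of a whole
    response under the trainable policy or the frozen reference model.
    """
    # Reward margin implied by the policy/reference log-ratios.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)) == softplus(-margin), computed stably for either sign.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def merge_models(state_dicts, weights):
    """Hypothetical 'model merge': weighted average of same-named parameters."""
    total = sum(weights)
    return {name: sum(w * sd[name] for sd, w in zip(state_dicts, weights)) / total
            for name in state_dicts[0]}
```

When the policy and the reference assign identical log-ratios the loss equals log 2; it drops below that as the policy favors the chosen response more strongly than the reference does.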

arXiv:2506.07129

Energy Efficiency Maximization for Movable Antenna Communication Systems

Authors: Jingze Ding, Zijian Zhou, Lipeng Zhu, Yuping Zhao, Bingli Jiao, Rui Zhang

Abstract: This paper investigates energy efficiency maximization for movable antenna (MA)-aided multi-user uplink communication systems by considering the time delay and energy consumption incurred by practical antenna movement. We first examine the special case with a single user and propose an optimization algorithm based on the one-dimensional (1D) exhaustive search to maximize the user's energy efficiency. Moreover, we derive an upper bound on the energy efficiency and analyze the conditions required to achieve this performance bound under different numbers of channel paths. Then, for the general multi-user scenario, we propose an iterative algorithm to fairly maximize the minimum energy efficiency among all users. Simulation results demonstrate the effectiveness of the proposed scheme in improving energy efficiency compared to existing MA schemes that do not account for movement-related costs, as well as the conventional fixed-position antenna (FPA) scheme. In addition, the results show the robustness of the proposed scheme to imperfect channel state information (CSI) and provide valuable insights for practical system deployment.

Submitted 8 June, 2025; originally announced June 2025.
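A minimal sketch of the single-user case: a 1D exhaustive search over antenna positions that scores each position by bits per joule, charging for movement time and motor energy. The channel model (a sum of plane-wave paths), the motor power, speed, frame length, and all numeric defaults are hypothetical. The bound shown (triangle inequality on the channel, plus zero movement cost) is a simple one and is not necessarily the bound derived in the paper.

```python
import cmath
import math


def channel_gain(x, paths, wavelength):
    """|h(x)|^2 for an antenna at 1D position x, with h(x) a sum of plane-wave paths.

    paths: list of (complex path coefficient, angle of arrival in radians).
    """
    h = sum(a * cmath.exp(2j * math.pi * x * math.cos(theta) / wavelength)
            for a, theta in paths)
    return abs(h) ** 2


def energy_efficiency(x, x0, paths, wavelength, p=0.1, p_circuit=0.05,
                      p_motor=1.0, speed=1.0, frame=1.0, bandwidth=1e6, noise=1e-9):
    """Bits delivered per joule in one frame when the MA first moves from x0 to x."""
    t_move = abs(x - x0) / speed          # assumption: no data is sent while moving
    if t_move >= frame:
        return 0.0
    t_tx = frame - t_move
    rate = bandwidth * math.log2(1 + p * channel_gain(x, paths, wavelength) / noise)
    energy = p * t_tx + p_circuit * frame + p_motor * t_move
    return rate * t_tx / energy


def exhaustive_search(x0, region, paths, wavelength, steps=2000, **kw):
    """1D exhaustive search over a uniform grid of the moving region."""
    lo, hi = region
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda x: energy_efficiency(x, x0, paths, wavelength, **kw))


def ee_upper_bound(paths, p=0.1, p_circuit=0.05, bandwidth=1e6, noise=1e-9):
    """Bound from |h| <= sum |a_l| and ignoring movement time and energy."""
    g_max = sum(abs(a) for a, _ in paths) ** 2
    return bandwidth * math.log2(1 + p * g_max / noise) / (p + p_circuit)
```

Because the grid includes the starting position, the search can never do worse than not moving at all.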

arXiv:2505.20760

Polarforming for Wireless Networks: Opportunities and Challenges

Authors: Jingze Ding, Zijian Zhou, Xiaodan Shao, Bingli Jiao, Rui Zhang

Abstract: Polarforming emerges as a promising technique for manipulating the polarization of electromagnetic (EM) waves by shaping the polarization of an antenna into a desired state. By dynamically adjusting antenna polarization, polarforming enables real-time polarization matching or mismatching with received EM waves, thereby leveraging polarization degrees of freedom (DoFs) to enhance wireless communication performance. In this article, we first present an overview of the fundamental principles and design approaches underlying the polarforming technique. We then analyze the key advantages of polarforming, including hardware cost reduction, depolarization mitigation, channel adaptation, signal power enhancement, and interference suppression. Furthermore, we explore promising applications of polarforming for next-generation wireless networks. Numerical case studies demonstrate the substantial performance gains of polarforming over conventional fixed-polarization antenna (FPA) systems, along with a discussion of implementation challenges to motivate future research.

Submitted 2 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.
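To make "polarization matching" concrete, here is a hedged sketch. It describes the wave and the antenna by unit Jones vectors, scores the match as |p_ant^H p_wave|^2 (1 means perfectly matched), and grid-searches the antenna's polarization state. The (eps, delta) parametrization and the grid search are generic assumptions; they do not model the phase-shifter hardware the authors discuss.

```python
import cmath
import math


def jones(eps, delta):
    """Unit Jones vector (horizontal, vertical): amplitude angle eps, relative phase delta."""
    return (complex(math.cos(eps)), math.sin(eps) * cmath.exp(1j * delta))


def match_factor(p_ant, p_wave):
    """Polarization matching efficiency |p_ant^H p_wave|^2, in [0, 1] for unit vectors."""
    inner = sum(a.conjugate() * w for a, w in zip(p_ant, p_wave))
    return abs(inner) ** 2


def polarform(p_wave, n_eps=90, n_delta=180):
    """Grid search for the antenna polarization state that best matches p_wave."""
    grid = [(k * (math.pi / 2) / n_eps, m * 2 * math.pi / n_delta)
            for k in range(n_eps + 1) for m in range(n_delta)]
    best = max(grid, key=lambda ed: match_factor(jones(*ed), p_wave))
    return best, match_factor(jones(*best), p_wave)
```

For an elliptically polarized wave such as `jones(0.6, 1.0)`, a fixed horizontal antenna `(1, 0)` captures only cos^2(0.6), about 0.68 of the power, while the adjusted state gets close to 1.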

arXiv:2504.05966

doi 10.1109/TMI.2025.3554785

AVP-AP: Self-supervised Automatic View Positioning in 3D cardiac CT via Atlas Prompting

Authors: Xiaolin Fan, Yan Wang, Yingying Zhang, Mingkun Bao, Bosen Jia, Dong Lu, Yifan Gu, Jian Cheng, Haogang Zhu

Abstract: Automatic view positioning is crucial for cardiac computed tomography (CT) examinations, including disease diagnosis and surgical planning. However, it is highly challenging due to individual variability and the large 3D search space. Existing work needs labor-intensive and time-consuming manual annotations to train view-specific models, which are limited to predicting only a fixed set of planes. However, in real clinical scenarios, the challenge of positioning semantic 2D slices of any orientation into the varying coordinate spaces of arbitrary 3D volumes remains unsolved. We thus introduce a novel framework, AVP-AP, the first to use Atlas Prompting for self-supervised Automatic View Positioning in the 3D CT volume. Specifically, this paper first proposes an atlas prompting method, which generates a 3D canonical atlas and trains a network to map slices into their corresponding positions in the atlas space in a self-supervised manner. Then, guided by atlas prompts corresponding to the given query images in a reference CT, we identify the coarse positions of slices in the target CT volume using a rigid transformation between the 3D atlas and the target CT volume, effectively reducing the search space. Finally, we refine the coarse positions by maximizing the similarity between the predicted slices and the query images in the feature space of a given foundation model.
Our framework is more flexible and efficient than other methods, outperforming them by 19.8% in average structural similarity (SSIM) for arbitrary view positioning and achieving 9% SSIM in the two-chamber view compared to four radiologists. Meanwhile, experiments on a public dataset validate our framework's generalizability.

Submitted 8 April, 2025; originally announced April 2025.

Comments: 12 pages, 8 figures, published in TMI

Journal ref: IEEE TRANSACTIONS ON MEDICAL IMAGING, March 2025
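A toy illustration of the final refinement idea, assuming pixel-space similarity in place of the foundation-model feature space the paper actually uses: slices are sampled from a 3D volume by nearest-neighbour lookup along a plane, and candidate planes are ranked by SSIM against the query. `global_ssim` treats the whole image as a single window, a simplification of standard SSIM (which averages over local windows); both helper names are hypothetical.

```python
def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM computed over the whole image as one window (flattened pixel lists)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def sample_plane(volume, origin, u, v, size):
    """Nearest-neighbour slice origin + i*u + j*v (i, j < size) from volume[z][y][x]."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    out = []
    for i in range(size):
        for j in range(size):
            z, y, x = (round(origin[k] + i * u[k] + j * v[k]) for k in range(3))
            inside = 0 <= z < depth and 0 <= y < height and 0 <= x < width
            out.append(volume[z][y][x] if inside else 0.0)  # zero-pad outside the volume
    return out
```

Ranking candidate planes by `global_ssim(query, sample_plane(...))` and keeping the best one mimics, in a very reduced form, refining a coarse position by similarity maximization.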

arXiv:2502.11946

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

Authors: Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, Hongyu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, et al. (120 additional authors not shown)

Abstract: Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contributions include: 1) a 130B-parameter unified speech-text multi-modal model that achieves unified understanding and generation, with the Step-Audio-Chat version open-sourced; 2) a generative speech data engine that establishes an affordable voice cloning framework and produces the open-sourced lightweight Step-Audio-TTS-3B model through distillation; 3) an instruction-driven fine control system enabling dynamic adjustments across dialects, emotions, singing, and rap; 4) an enhanced cognitive architecture augmented with tool calling and role-playing abilities to manage complex tasks effectively. Based on our new StepEval-Audio-360 evaluation benchmark, Step-Audio achieves state-of-the-art performance in human evaluations, especially in terms of instruction following. On open-source benchmarks such as LLaMA Question, it shows a 9.3% average performance improvement, demonstrating our commitment to advancing the development of open-source multi-modal language technologies. Our code and models are available at https://github.com/stepfun-ai/Step-Audio.

Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

arXiv:2412.19470

Movable Antenna-Aided Near-Field Integrated Sensing and Communication

Authors: Jingze Ding, Zijian Zhou, Xiaodan Shao, Bingli Jiao, Rui Zhang

Abstract: Integrated sensing and communication (ISAC) is emerging as a pivotal technology for next-generation wireless networks. However, existing ISAC systems are based on fixed-position antennas (FPAs), which inevitably incur a loss in performance when balancing the trade-off between sensing and communication. Movable antenna (MA) technology offers promising potential to enhance ISAC performance by enabling flexible antenna movement. Nevertheless, exploiting more spatial channel variations requires larger antenna moving regions, which may invalidate the conventional far-field assumption for channels between transceivers. Therefore, this paper utilizes MAs to enhance sensing and communication capabilities in near-field ISAC systems, where a full-duplex base station (BS) is equipped with multiple transmit and receive MAs movable in large-size regions to simultaneously sense multiple targets and serve multiple uplink (UL) and downlink (DL) users for communication. We aim to maximize the weighted sum of sensing and communication rates (WSR) by jointly designing the transmit beamformers, sensing signal covariance matrices, receive beamformers, and MA positions at the BS, as well as the UL power allocation. The resulting optimization problem is challenging to solve, so we propose an efficient two-layer random position (RP) algorithm to tackle it. In addition, to reduce movement delay and cost, we design an antenna position matching (APM) algorithm based on the greedy strategy to minimize the total MA movement distance.
Extensive simulation results demonstrate the substantial performance improvement achieved by deploying MAs in near-field ISAC systems. Moreover, the results show the effectiveness of the proposed APM algorithm in reducing the antenna movement distance, which is helpful for energy saving and time overhead reduction for MA-aided near-field ISAC systems with large moving regions.

Submitted 27 December, 2024; originally announced December 2024.
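The abstract describes the APM step only as greedy. One plausible reading, sketched here as an assumption: once new MA positions have been chosen, assign each antenna to a target position by repeatedly taking the shortest remaining (antenna, target) move. Greedy matching is fast but not guaranteed to minimize the total distance; the Hungarian algorithm solves that assignment problem exactly.

```python
import math


def greedy_position_matching(current, targets):
    """Greedily pair current MA positions with target positions, shortest moves first.

    Returns (assignment {current index: target index}, total movement distance).
    """
    candidates = sorted((math.dist(c, t), i, j)
                        for i, c in enumerate(current)
                        for j, t in enumerate(targets))
    assignment, used_targets = {}, set()
    for _, i, j in candidates:
        if i not in assignment and j not in used_targets:
            assignment[i] = j
            used_targets.add(j)
    total = sum(math.dist(current[i], targets[j]) for i, j in assignment.items())
    return assignment, total
```

For antennas at (0, 0), (1, 0), (2, 0) and targets listed as (2, 0.1), (0, 0.1), (1, 0.1), matching by index order would move two antennas about 1 m and one about 2 m, whereas the greedy pairing moves each antenna only 0.1.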

arXiv:2411.11029

Wafer Map Defect Classification Using Autoencoder-Based Data Augmentation and Convolutional Neural Network

Authors: Yin-Yin Bao, Er-Chao Li, Hong-Qiang Yang, Bin-Bin Jia

Abstract: In semiconductor manufacturing, wafer defect maps (WDMs) play a crucial role in diagnosing issues and enhancing process yields by revealing critical defect patterns. However, accurately categorizing WDM defects presents significant challenges due to noisy data, imbalanced defect classes, and the complexity of failure modes. To address these challenges, this study proposes a novel method combining an autoencoder-based data augmentation technique with a convolutional neural network (CNN). By introducing noise into the latent space, the autoencoder enhances data diversity and mitigates class imbalance, thereby improving the model's generalization capabilities. The augmented dataset is subsequently used to train the CNN, enabling it to deliver precise classification of both common and rare defect patterns. Experimental results on the WM-811K dataset demonstrate that the proposed method achieves a classification accuracy of 98.56%, surpassing Random Forest, SVM, and Logistic Regression by 19%, 21%, and 27%, respectively. These findings highlight the robustness and effectiveness of the proposed approach, offering a reliable solution for wafer defect detection and classification.

Submitted 17 November, 2024; originally announced November 2024.

Comments: 26 pages, 11 figures, including dataset preprocessing, proposed methods, and experimental results

MSC Class: 68T07; 68U10

ACM Class: I.2.10; I.5.1; I.5.4; I.4.8
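A hedged sketch of latent-space noise augmentation for class balancing. `encode` and `decode` are placeholders for a trained autoencoder; the noise level `sigma` and the policy of oversampling every class up to the size of the largest one are assumptions, not details from the paper.

```python
import random


def latent_noise_augment(samples, encode, decode, n_new, sigma=0.1, seed=0):
    """Create n_new synthetic samples by adding Gaussian noise to latent codes."""
    rng = random.Random(seed)
    latents = [encode(s) for s in samples]
    synthetic = []
    for _ in range(n_new):
        z = rng.choice(latents)                           # pick a real sample's code
        z_noisy = [v + rng.gauss(0.0, sigma) for v in z]  # perturb it in latent space
        synthetic.append(decode(z_noisy))                 # map back to input space
    return synthetic


def balance_classes(dataset, encode, decode, sigma=0.1, seed=0):
    """Oversample each class with latent-noise samples up to the largest class size."""
    by_class = {}
    for x, y in dataset:
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    balanced = list(dataset)
    for y, xs in by_class.items():
        for x in latent_noise_augment(xs, encode, decode, target - len(xs), sigma, seed):
            balanced.append((x, y))
    return balanced
```

The balanced list would then serve as the CNN's training set; with a real autoencoder, `sigma` controls how far synthetic rare-class maps stray from the observed ones.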

arXiv:2409.07771

Polarforming for Wireless Communications: Modeling and Performance Analysis

Authors: Zijian Zhou, Jingze Ding, Chenbo Wang, Bingli Jiao, Rui Zhang

Abstract: This paper presents, for the first time, the concept of polarforming for wireless communications. Polarforming refers to a novel technique that enables the polarization of an antenna to be shaped into a desired polarization state for aligning with the polarization of an electromagnetic (EM) wave. It can fully leverage polarization diversity to enhance the performance of wireless communication systems through polarization matching. To implement polarforming, we propose a new paradigm of phase shifter (PS)-based polarization-reconfigurable antennas (PRAs) that can form linear, circular, and general elliptical polarizations by phase shift control. To further demonstrate the benefits of polarforming, we investigate a PRA-aided wireless communication system equipped with antennas of tunable polarization. We characterize the multiple-input multiple-output (MIMO) channel capacity of the considered system as a function of the phase shifts of PS-based PRAs. We also provide a detailed polarforming interpretation under the single-input single-output (SISO) scenario and theoretically show how polarforming differs from conventional (analog) beamforming based on PSs. Moreover, we develop an alternating optimization approach to maximize the channel capacity for systems with a single-antenna transmitter/receiver. Based on the water-filling principle, we also derive an upper bound on the MIMO channel capacity with PS-based PRAs and then maximize this capacity bound by optimizing the phase shifts through alternating optimization.
Finally, comprehensive simulation results are presented, which not only validate the effectiveness of polarforming in combating channel depolarization but also exhibit substantial performance improvements over conventional systems.

Submitted 20 March, 2025; v1 submitted 12 September, 2024; originally announced September 2024.

Comments: 13 pages, 11 figures
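The water-filling principle mentioned in this abstract can be sketched as follows. Given the power gains of parallel eigen-channels (for MIMO, the squared singular values of the channel matrix), a fixed power budget is poured so that stronger channels receive more power and weak ones may receive none. This shows only the classic water-filling step with gains supplied directly; the phase-shift optimization of PS-based PRAs is not modeled.

```python
import math


def water_filling(gains, total_power, noise=1.0, iters=200):
    """Classic water-filling over parallel channels with power gains g_i.

    p_i = max(0, mu - noise / g_i), with the water level mu found by bisection
    so that sum(p_i) == total_power.
    """
    floors = [noise / g for g in gains]           # "ground level" of each channel
    lo, hi = 0.0, max(floors) + total_power       # hi is guaranteed to overfill
    for _ in range(iters):
        mu = (lo + hi) / 2
        if sum(max(0.0, mu - f) for f in floors) > total_power:
            hi = mu
        else:
            lo = mu
    mu = (lo + hi) / 2
    powers = [max(0.0, mu - f) for f in floors]
    capacity = sum(math.log2(1 + p * g / noise) for p, g in zip(powers, gains))
    return powers, capacity
```

For gains (2, 1, 0.1) and unit power, the water level settles at 1.25: the powers are 0.75, 0.25, and 0, and the capacity is log2(3.125) bit/s/Hz.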

arXiv:2409.06502

doi 10.1109/LWC.2024.3519367

Power-Efficient Full-Duplex Satellite Communications Aided by Movable Antennas

Authors: Lifeng Lin, Jingze Ding, Zijian Zhou, Bingli Jiao

Abstract: This letter investigates a movable antenna (MA)-aided full-duplex (FD) satellite communication system, where the satellite, equipped with both transmit and receive MAs, serves multiple uplink (UL) and downlink (DL) user terminals (UTs) in FD mode. Specifically, we formulate a multiobjective optimization problem to minimize the UL and DL transmit powers under imperfect channel state information (CSI). To jointly optimize the MA positions and transmit powers, we propose a two-loop particle swarm optimization (PSO) algorithm based on a multiobjective optimization framework. Simulation results show that flexible adjustments of MA positions can effectively reduce the total UL and DL transmit powers, while also alleviating the burden on self-interference (SI) cancellation modules.

Submitted 20 December, 2024; v1 submitted 10 September, 2024; originally announced September 2024.

Comments: This paper has been accepted by IEEE Wireless Communications Letters

arXiv:2409.05430

Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge

Authors: Hongfei Xue, Rong Gong, Mingchen Shao, Xin Xu, Lezhi Wang, Lei Xie, Hui Bu, Jiaming Zhou, Yong Qin, Jun Du, Ming Li, Binbin Zhang, Bin Jia

Abstract: The StutteringSpeech Challenge focuses on advancing speech technologies for people who stutter, specifically targeting Stuttering Event Detection (SED) and Automatic Speech Recognition (ASR) in Mandarin. The challenge comprises three tracks: (1) SED, which aims to develop systems for detection of stuttering events; (2) ASR, which focuses on creating robust systems for recognizing stuttered speech; and (3) a research track for innovative approaches utilizing the provided dataset. We utilize an open-source Mandarin stuttering dataset, AS-70, which has been split into new training and test sets for the challenge. This paper presents the dataset, details the challenge tracks, and analyzes the performance of the top systems, highlighting improvements in detection accuracy and reductions in recognition error rates. Our findings underscore the potential of specialized models and augmentation strategies in developing stuttered speech technologies.

Submitted 9 September, 2024; originally announced September 2024.

Comments: 8 pages, 2 figures, accepted by SLT 2024
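Mandarin ASR systems are commonly scored by character error rate (CER). Assuming that metric here, a minimal sketch computes it from the Levenshtein edit distance between reference and hypothesis character sequences. How stuttered repetitions are handled in the reference transcripts is a challenge-specific convention that this example does not cover.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion of r
                           cur[j - 1] + 1,           # insertion of h
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(refs, hyps):
    """Corpus-level CER: total edits divided by total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return edits / sum(len(r) for r in refs)
```

For example, if the reference is 我想喝水 and the recognizer outputs the repeated syllable as 我我想喝水, that is one insertion over four reference characters, giving a CER of 0.25.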

arXiv:2408.13435

Prototype of Secure Wire-Line Telephone

Authors: Lifeng Lin, Zijian Zhou, Peihe Jiang, Sanjun Liu, Lai Wei, Bingli Jiao

Abstract: This paper presents a secure wire-line telephone system that employs physical layer security (PLS) to protect against wiretapping. The system generates artificial noise (AN) in both transmission directions and uses a telephone hybrid circuit to effectively suppress the AN for the purpose of secure communication. Furthermore, we analyze the secrecy capacity of the system and evaluate its performance through theoretical analysis and practical experiments. The results demonstrate that the proposed system can significantly enhance communication security while preserving the integrity of legitimate signals. The results also validate that the proposed system is a robust and effective solution for securing wire-line telephone communications.

Submitted 14 September, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

Comments: 5 pages, 7 figures. Submitted for possible publication
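For a Gaussian wiretap channel, the secrecy capacity is C_s = max(0, log2(1 + SNR_B) - log2(1 + SNR_E)). A minimal sketch under a hypothetical noise model: the wiretapper receives the full artificial noise, while the legitimate receiver keeps only a `residual` fraction of it after hybrid-circuit suppression. The function names, parameter names, and values are illustrative assumptions, not the paper's model.

```python
import math


def secrecy_capacity(snr_legit, snr_eve):
    """Gaussian wiretap secrecy capacity in bit/s/Hz."""
    return max(0.0, math.log2(1 + snr_legit) - math.log2(1 + snr_eve))


def snrs_with_artificial_noise(p_signal, p_an, noise, residual):
    """Hypothetical SNRs: the legitimate end keeps `residual` of the AN, Eve sees all of it."""
    snr_legit = p_signal / (noise + residual * p_an)
    snr_eve = p_signal / (noise + p_an)
    return snr_legit, snr_eve
```

Without AN, both ends see the same SNR and the secrecy capacity is zero. With strong AN that the hybrid largely cancels at the legitimate end (for example, signal power 1, AN power 10, noise 0.1, residual 1%), it rises to about 2.45 bit/s/Hz.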

arXiv:2408.10552

doi 10.1109/LWC.2024.3490697

Near-Field Multiuser Communications Aided by Movable Antennas

Authors: Jingze Ding, Lipeng Zhu, Zijian Zhou, Bingli Jiao, Rui Zhang

Abstract: This letter investigates movable antenna (MA)-aided downlink (DL) multiuser communication systems under the near-field channel condition, where both the base station (BS) and the users are equipped with MAs to fully exploit the degrees of freedom (DoFs) in antenna position optimization. We develop a general channel model to accurately describe the channel characteristics in the near-field region and formulate an MA-position optimization problem to minimize the BS's transmit power subject to users' individual rate constraints. To solve this problem, we propose a two-loop dynamic neighborhood pruning particle swarm optimization (DNPPSO) algorithm that significantly reduces the computational complexity as compared to the standard particle swarm optimization (PSO) algorithm while achieving similar performance. Simulation results validate the effectiveness and advantages of the proposed scheme in power-saving for near-field multiuser communications.

Submitted 6 November, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

Comments: This paper has been accepted by IEEE Wireless Communications Letters

arXiv:2407.10393

doi 10.1109/TWC.2024.3520806

Movable Antenna-Aided Secure Full-Duplex Multi-User Communications

Authors: Jingze Ding, Zijian Zhou, Bingli Jiao

Abstract: In this paper, we investigate physical layer security (PLS) for full-duplex (FD) multi-user systems. We consider a base station (BS) that operates in FD mode and transmits artificial noise (AN) to simultaneously protect uplink (UL) and downlink (DL) transmissions. Conventional fixed-position antennas (FPAs) at the FD BS struggle to fully exploit spatial degrees of freedom (DoFs) to improve signal reception and suppress interference. To overcome this limitation, we propose a novel FD BS architecture equipped with multiple transmit and receive movable antennas (MAs). The MAs introduce the DoFs in antenna position optimization, which can improve the performance of secure communication systems. To serve users and counter the cooperative interception of multiple eavesdroppers (Eves), we formulate a sum of secrecy rates (SSR) maximization problem to jointly optimize the MA positions, the transmit, receive, and AN beamformers at the BS, and the UL powers. We propose an alternating optimization (AO) algorithm, which decomposes the original problem into three sub-problems, to solve the challenging non-convex optimization problem with highly coupled variables. Specifically, we propose the multi-velocity particle swarm optimization (MVPSO), which is an improved version of the standard particle swarm optimization (PSO), to simultaneously optimize all MA positions. The transmit/AN beamformers and the UL powers are solved by successive convex approximation (SCA). The optimal receive beamformer is derived as a closed-form solution.
Simulation results demonstrate the effectiveness of the proposed algorithms and the advantages of MAs over conventional FPAs in enhancing the security of FD multi-user systems.

Submitted 30 December, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

Comments: This paper has been accepted by IEEE Transactions on Wireless Communications

arXiv:2406.18088

doi 10.21437/Interspeech.2024-2550

LLM-Driven Multimodal Opinion Expression Identification

Authors: Bonian Jia, Huiyao Chen, Yueheng Sun, Meishan Zhang, Min Zhang

Abstract: Opinion Expression Identification (OEI) is essential in NLP for applications ranging from voice assistants to depression diagnosis. This study extends OEI to encompass multimodal inputs, underlining the significance of auditory cues in delivering emotional subtleties beyond the capabilities of text. We introduce a novel multimodal OEI (MOEI) task, integrating text and speech to mirror real-world scenarios. Utilizing the CMU MOSEI and IEMOCAP datasets, we construct the CI-MOEI dataset. Additionally, Text-to-Speech (TTS) technology is applied to the MPQA dataset to obtain the CIM-OEI dataset. We design a template for the OEI task to take full advantage of the generative power of large language models (LLMs). Advancing further, we propose an LLM-driven method, STOEI, which combines the speech and text modalities to identify opinion expressions. Our experiments demonstrate that MOEI significantly improves performance, while our method outperforms existing methods by 9.20% and obtains SOTA results.

Submitted 29 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Comments: 5 pages, 3 figures, accepted by Interspeech 2024

Journal ref: Proceedings of Interspeech 2024

arXiv:2404.10343

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of ×4 based on pairs of low- and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks: the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered, and the ranking is calculated as a weighted sum of the scores of all the sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered, and the score calculated from the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered, and the score calculated from the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions.
These submissions gauge the state of the art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.

Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024
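Since the challenge fixes a PSNR target, a minimal sketch of the PSNR computation on flattened 8-bit pixel sequences may help. The challenge's exact evaluation protocol (for example, color space or border cropping) is not stated in the abstract and is not modeled here.

```python
import math


def psnr(reference, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf        # identical images
    return 10 * math.log10(max_val ** 2 / mse)
```

An error of one gray level on every pixel gives MSE = 1 and PSNR = 20 log10(255), about 48.13 dB; larger errors lower the score.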

arXiv:2403.20025

doi 10.1109/GLOBECOM52923.2024.10901646

Secure Full-Duplex Communication via Movable Antennas

Authors: Jingze Ding, Zijian Zhou, Chenbo Wang, Wenyao Li, Lifeng Lin, Bingli Jiao

Abstract: This paper investigates physical layer security (PLS) in a movable antenna (MA)-assisted full-duplex (FD) system. In this system, an FD base station (BS) with multiple MAs for transmission and reception provides services for an uplink (UL) user and a downlink (DL) user. Each user operates in half-duplex (HD) mode and is equipped with a single fixed-position antenna (FPA), in the presence of a single-FPA eavesdropper (Eve). To ensure secure communication, artificial noise (AN) is transmitted to obstruct the interception of Eve. The objective of this paper is to maximize the sum secrecy rate (SSR) of the UL and DL users by jointly optimizing the beamformers of the BS and the positions of MAs. This paper also proposes an alternating optimization (AO) method to address the non-convex problem, which decomposes the optimization problem into three subproblems and solves them iteratively. Simulation results demonstrate a significant performance gain in the SSR achieved by the proposed scheme compared to the benchmark schemes.

Submitted 7 September, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

Comments: The paper has been accepted by Globecom2024

arXiv:2401.17049

doi 10.1109/LCOMM.2024.3453296

Movable Antenna-Enabled Co-Frequency Co-Time Full-Duplex Wireless Communication

Authors: Jingze Ding, Zijian Zhou, Wenyao Li, Chenbo Wang, Lifeng Lin, Bingli Jiao

Abstract: Movable antenna (MA) provides an innovative way to arrange antennas that can contribute to improved signal quality and more effective interference management. This technology is especially beneficial for co-frequency co-time full-duplex (CCFD) wireless communication, which struggles with self-interference (SI) that usually overpowers the desired incoming signals. By dynamically repositioning transmit/receive antennas, we can mitigate the SI and enhance the reception of incoming signals. Thus, this paper proposes a novel MA-enabled point-to-point CCFD system and formulates the minimum achievable rate of two CCFD terminals. To maximize the minimum achievable rate and determine the positions of the MAs, we introduce a solution based on projected particle swarm optimization (PPSO), which can circumvent common suboptimal positioning issues. Moreover, simulation results reveal that the PPSO method achieves better performance than the conventional alternating position optimization (APO). The results also demonstrate that an MA-enabled CCFD system outperforms one using fixed-position antennas (FPAs).
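The PPSO idea can be illustrated with a generic projected particle swarm: a standard global-best PSO whose positions are projected back onto the feasible region after every update. This is a minimal sketch under stated assumptions, not the paper's algorithm: the feasible region is assumed to be a simple box (constraints such as minimum antenna spacing are not modeled), and the objective is a hypothetical toy function standing in for the minimum achievable rate.

```python
import random


def projected_pso(objective, dim, lo, hi, n_particles=30, iters=200,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `objective` over the box [lo, hi]^dim with a global-best particle
    swarm; after each velocity update, positions are projected (clipped) onto the box."""
    rng = random.Random(seed)

    def project(v):
        # Projection onto the feasible interval for one coordinate.
        return min(max(v, lo), hi)

    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = project(pos[i][d] + vel[i][d])
            val = objective(pos[i])
            if val > pbest_val[i]:  # update personal and global bests
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical toy objective with its peak at (0.3, -0.2) inside [-1, 1]^2.
best, val = projected_pso(lambda x: -((x[0] - 0.3) ** 2 + (x[1] + 0.2) ** 2),
                          dim=2, lo=-1.0, hi=1.0)
print(best, val)
```

Clipping is the simplest projection for box constraints; a real antenna-placement problem would need a projection that also enforces spacing between the MAs.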

Submitted 2 September, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: This paper has been accepted by IEEE Communications Letters

arXiv:2312.15993

Adaptive Kalman-based hybrid car following strategy using TD3 and CACC

Authors: Yuqi Zheng, Ruidong Yan, Bin Jia, Rui Jiang, Adriana TAPUS, Xiaojing Chen, Shiteng Zheng, Ying Shang

Abstract: In autonomous driving, the hybrid strategy of deep reinforcement learning and cooperative adaptive cruise control (CACC) can fully utilize the advantages of the two algorithms and significantly improve the performance of car following. However, it is challenging for the traditional hybrid strategy based on fixed coefficients to adapt to mixed traffic flow scenarios, which may decrease the performance and even lead to accidents. To address these problems, a hybrid car-following strategy based on an adaptive Kalman filter is proposed that combines the CACC and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms. Unlike the traditional hybrid strategy based on fixed coefficients, the Kalman gain H, used as an adaptive coefficient, is derived from multi-timestep predictions and Monte Carlo Tree Search. Finally, simulation results with 4,157,745 timesteps indicate that, compared with the TD3 and HCFS algorithms, the proposed algorithm can substantially enhance the safety of car following in mixed traffic flow without compromising comfort and efficiency.

Submitted 26 December, 2023; originally announced December 2023.

Comments: 32 pages, 13 figures

arXiv:2309.05042

High-Precision Channel Estimation for Sub-Noise Self-Interference Cancellation

Authors: Dongsheng Zheng, Lifeng Lin, Wenyao Li, Bingli Jiao

Abstract: Self-interference cancellation plays a crucial role in achieving reliable full-duplex communications. In general, it is essential to cancel the self-interference signal below the thermal noise level, which necessitates accurate reconstruction of the self-interference signal. In this paper, we propose a high-precision channel estimation method specifically designed for sub-noise self-interference cancellation. Exploiting the fact that all transmitted symbols are known to their respective receivers, our method utilizes all transmitted symbols for self-interference channel estimation. Through analytical derivations and numerical simulations, we validate the effectiveness of the proposed method. The results demonstrate the superior performance of our approach in achieving sub-noise self-interference cancellation.
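Why using all transmitted symbols helps can be seen with a textbook least-squares estimator. The Python sketch below assumes a single-tap (flat) self-interference channel, QPSK symbols, and additive white Gaussian noise; the channel value and noise level are hypothetical, and this generic LS estimator is not claimed to be the paper's exact method.

```python
import cmath
import random


def ls_channel_estimate(tx, rx):
    """Least-squares estimate of a single-tap channel h from known transmitted
    symbols x[n] and received samples y[n] = h * x[n] + noise:
    h_hat = sum(conj(x) * y) / sum(|x|^2)."""
    num = sum(x.conjugate() * y for x, y in zip(tx, rx))
    den = sum(abs(x) ** 2 for x in tx)
    return num / den


def simulate(n_symbols, noise_std, h=0.8 * cmath.exp(0.6j), seed=1):
    """Return |h_hat - h| for one random QPSK block (hypothetical h and noise)."""
    rng = random.Random(seed)
    qpsk = [complex(a, b) / abs(complex(1, 1)) for a in (1, -1) for b in (1, -1)]
    tx = [rng.choice(qpsk) for _ in range(n_symbols)]
    rx = [h * x + complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std)) for x in tx]
    return abs(ls_channel_estimate(tx, rx) - h)


for n in (16, 256, 4096):
    print(n, simulate(n, noise_std=0.5))
```

Because every known symbol contributes to the sum, the estimation error shrinks roughly as 1/sqrt(N) with the number of symbols N, which is what allows the reconstructed SI to be accurate enough for cancellation below the noise floor.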

Submitted 10 September, 2023; originally announced September 2023.

arXiv:2305.01341

Next-Generation Full Duplex Networking System Empowered by Reconfigurable Intelligent Surfaces

Authors: Yingyang Chen, Yuncong Li, Miaowen Wen, Duoying Zhang, Bingli Jiao, Zhiguo Ding, Theodoros A. Tsiftsis, H. Vincent Poor

Abstract: Full duplex (FD) radio has attracted extensive attention due to its co-time and co-frequency transceiving capability. However, the potential gain brought by FD radios is closely related to the management of self-interference (SI), which imposes high or even stringent requirements on SI cancellation (SIC) techniques. When the FD deployment evolves into next-generation mobile networking, the SI problem becomes more complicated, significantly limiting its potential gains. In this paper, we conceive a multi-cell FD networking scheme by deploying a reconfigurable intelligent surface (RIS) at the cell boundary to configure the radio environment proactively. To achieve the full potential of the system, we aim to maximize the sum rate (SR) of multiple cells by jointly optimizing the transmit precoding (TPC) matrices at FD base stations (BSs) and users and the phase shift matrix at RIS. Since the original problem is non-convex, we reformulate and decouple it into a pair of subproblems by utilizing the relationship between the SR and minimum mean square error (MMSE). The optimal solutions of TPC matrices are obtained in closed form, while both complex circle manifold (CCM) and successive convex approximation (SCA) based algorithms are developed to resolve the phase shift matrix suboptimally. Our simulation results show that introducing an RIS into an FD networking system not only improves the overall SR significantly but also enhances the cell edge performance prominently.
More importantly, we validate that the RIS deployment with optimized phase shifts can reduce the requirement for SIC and the number of BS antennas, which further reduces the hardware cost and power consumption, especially with a sufficient number of reflecting elements. As a result, the utilization of an RIS enables the originally cumbersome FD networking system to become efficient and practical.

Submitted 2 May, 2023; originally announced May 2023.

Comments: 15 pages, 14 figures

arXiv:2212.05715

Integrated optimization of train timetables rescheduling and response vehicles on a disrupted metro line

Authors: Hui Wang, Jialin Liu, Feng Li, Hao Ji, Bin Jia, Ziyou Gao

Abstract: When an unexpected metro disruption occurs, metro managers need to reschedule timetables to avoid trains going into the disruption area, and to transport passengers stranded at disruption stations as quickly as possible. This paper proposes a two-stage optimization model to jointly make decisions for these two tasks. In the first stage, the timetable rescheduling problem with cancellation and short-turning strategies is formulated as a mixed integer linear program (MILP). In particular, instantaneous parameters and variables are used to describe the accumulation of time-varying passenger flow. In the second stage, a system-optimal dynamic traffic assignment (SODTA) model is employed to dynamically schedule response vehicles, which is able to capture dynamic traffic and congestion. Numerical cases of Beijing Metro Line 9 verify the efficiency and effectiveness of our proposed model, and the results show that: (1) when a disruption occurs during peak hours, the impact on the normal timetable is greater, and passengers in the direction with fewer train services are more affected; (2) if passengers stranded at the terminal stations of the disruption area are not transported in time, their number will rapidly increase at a rate of more than 300 passengers per minute; (3) compared with the fixed shortest path, using the response vehicles reduces the total travel time by about 7%. However, it results in increased travel time for some passengers.

Submitted 12 December, 2022; originally announced December 2022.

Comments: 32 pages, 21 figures

arXiv:2211.11275

doi 10.1109/TMM.2023.3275873

VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

Authors: Qiushi Zhu, Long Zhou, Ziqiang Zhang, Shujie Liu, Binxing Jiao, Jie Zhang, Lirong Dai, Daxin Jiang, Jinyu Li, Furu Wei

Abstract: Although speech is a simple and effective way for humans to communicate with the outside world, a more realistic speech interaction contains multimodal information, e.g., vision and text. How to design a unified framework to integrate different modal information and leverage different resources (e.g., visual-audio pairs, audio-text pairs, unlabeled speech, and unlabeled text) to facilitate speech representation learning has not been well explored. In this paper, we propose a unified cross-modal representation learning framework, VATLM (Visual-Audio-Text Language Model). The proposed VATLM employs a unified backbone network to model the modality-independent information and utilizes three simple modality-dependent modules to preprocess visual, speech, and text inputs. In order to integrate these three modalities into one shared semantic space, VATLM is optimized with a masked prediction task of unified tokens, given by our proposed unified tokenizer. We evaluate the pre-trained VATLM on audio-visual related downstream tasks, including audio-visual speech recognition (AVSR) and visual speech recognition (VSR). Results show that the proposed VATLM outperforms previous state-of-the-art models, such as the audio-visual pre-trained AV-HuBERT model, and analysis also demonstrates that VATLM is capable of aligning different modalities into the same space. To facilitate future research, we release the code and pre-trained models at https://aka.ms/vatlm.

Submitted 19 May, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: 11 pages, Accepted by IEEE Transactions on Multimedia

arXiv:2209.01739

Auxiliary Factor Method to Remove ISI of Nyquist Filters

Authors: Zijian Zhou, Lifeng Lin, Bingli Jiao

Abstract: As is well known, the Nyquist first condition guarantees zero intersymbol interference (ISI), as derived in the frequency domain. However, the practical implementation using an FIR filter truncates the Fourier transform by its window and prevents the mathematical calculation from reaching the ideal zero-ISI solution. To obtain better results, the window's length generally has to be increased. To address this problem, a new approach is presented that uses auxiliary factors (AFs) to compensate for the shortcomings of the truncated Fourier transform and remove the ISI completely, regardless of the window's length. In addition, the performance in the presence of timing jitter is also improved significantly. The closed-form solution of the AFs is derived and its effectiveness is confirmed by simulation results. Finally, the problems of transmission delay and additional computational complexity are analysed.

Submitted 7 February, 2024; v1 submitted 4 September, 2022; originally announced September 2022.

Comments: This paper was accepted by IEEE Communications Letters

arXiv:2108.00469

Delay Aware Secure Offloading for NOMA-Assisted Mobile Edge Computing in Internet of Vehicles

Authors: Ling He, Miaowen Wen, Yingyang Chen, Bingli Jiao

Abstract: In this paper, a multi-vehicle multi-task nonorthogonal multiple access (NOMA)-assisted mobile edge computing (MEC) system with passive eavesdropping vehicles is investigated. To enhance the performance of edge vehicles, we propose a vehicle grouping pairing method, which utilizes vehicles near the MEC as full-duplex relays to assist edge vehicles. To promote transmission security, we employ artificial noise to interrupt eavesdropping vehicles. Furthermore, we derive an approximate expression for the secrecy outage probability of the system. The joint optimization of vehicle task division, power allocation, and transmit beamforming is formulated to minimize the total task-completion delay of edge vehicles. Then, we design a power allocation and task scheduling algorithm based on a genetic algorithm to solve the resulting mixed-integer nonlinear programming problem. Numerical results demonstrate the superiority of our proposed scheme in terms of system security and transmission delay.

Submitted 1 August, 2021; originally announced August 2021.

Comments: 12 pages, 9 figures

arXiv:2105.11273

Application of Opportunistic Bit to Multilevel Codes

Authors: Bingli Jiao, Mingxi Yin, Yuli Yang

Abstract: In this paper, we propose a new signal organization method that works within the structure of multilevel coding (MLC). The transmit bits are divided into opportunistic bits (OB) and conventional bits (CB), which are mapped, in parallel to the MLC, to the lower-level and higher-level signals, respectively. Because the OB mapping does not explicitly require signal power, the energy of the CB-modulated symbol can be doubled. As a result, the overall mutual information of the proposed method is found to be higher than that of conventional BPSK in the one-dimensional case. Moreover, extending the method to two complex dimensions yields better performance than QPSK. The numerical results confirm this approach.

Submitted 25 May, 2021; v1 submitted 24 May, 2021; originally announced May 2021.

arXiv:2105.11272

A Practical Consideration on Convex Mutual Information

Authors: Mingxi Yin, Bingli Jiao, Dongsheng Zheng, Yuli Yang

Abstract: In this paper, we focus on the convex mutual information, which was found at the lowest-level split in multilevel coding schemes for communications over the additive white Gaussian noise (AWGN) channel. Theoretical analysis shows that communication achievable rates (ARs) do not necessarily lie below the mutual information in the convex region. In addition, simulation results are provided as evidence.

Submitted 24 May, 2021; originally announced May 2021.

Comments: Submitted to IEEE Transactions on Information Theory with ID IT-20-0794

arXiv:2103.03796

doi 10.1109/TASE.2021.3100709

Hybrid Car-Following Strategy based on Deep Deterministic Policy Gradient and Cooperative Adaptive Cruise Control

Authors: Ruidong Yan, Rui Jiang, Bin Jia, Jin Huang, Diange Yang

Abstract: A deep deterministic policy gradient (DDPG)-based car-following strategy can break through the constraints of the differential equation model owing to its ability to explore complex environments. However, the car-following performance of DDPG is usually degraded by unreasonable reward function design, insufficient training, and low sampling efficiency. To solve this kind of problem, a hybrid car-following strategy based on DDPG and cooperative adaptive cruise control (CACC) is proposed. First, the car-following process is modeled as a Markov decision process to compute CACC and DDPG simultaneously at each frame. Given a current state, two actions are obtained from CACC and DDPG, respectively. Then, the optimal action, i.e., the one offering the larger reward, is chosen as the output of the hybrid strategy. Meanwhile, a rule is designed to ensure that the change rate of acceleration is smaller than the desired value. Therefore, the proposed strategy not only guarantees the basic car-following performance through CACC but also makes full use of DDPG's ability to explore complex environments. Finally, simulation results show that the car-following performance of the proposed strategy is improved compared with that of DDPG and CACC.
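The selection rule described in the abstract (take whichever of the CACC and DDPG actions earns the larger reward, then cap the change rate of acceleration) can be sketched in a few lines of Python. The reward function here is hypothetical, since the abstract does not give one: it penalizes the predicted gap error after one step plus a small comfort term. The state keys, time step, and jerk limit are likewise illustrative assumptions.

```python
def toy_reward(state: dict, accel: float, dt: float = 0.1) -> float:
    """Hypothetical one-step reward: penalize the predicted gap error after dt,
    plus a small comfort penalty on the acceleration itself."""
    predicted_gap = state["gap"] + state["rel_speed"] * dt - 0.5 * accel * dt ** 2
    return -(predicted_gap - state["desired_gap"]) ** 2 - 0.1 * accel ** 2


def hybrid_action(state, a_cacc, a_ddpg, reward_fn, prev_accel, max_jerk, dt=0.1):
    """Choose the candidate acceleration with the larger reward, then limit the
    change rate of acceleration (jerk) to at most max_jerk."""
    chosen = a_cacc if reward_fn(state, a_cacc) >= reward_fn(state, a_ddpg) else a_ddpg
    max_step = max_jerk * dt
    return min(max(chosen, prev_accel - max_step), prev_accel + max_step)


# rel_speed = leader speed minus follower speed (m/s); gaps in metres.
state = {"gap": 25.0, "desired_gap": 20.0, "rel_speed": 0.0}
a = hybrid_action(state, a_cacc=0.5, a_ddpg=1.2, reward_fn=toy_reward,
                  prev_accel=0.0, max_jerk=2.0)
print(a)  # → 0.2
```

In this example the CACC action wins on reward, but the jerk rule still clips it from 0.5 to 0.2 because the previous acceleration was 0.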

Submitted 10 January, 2022; v1 submitted 24 February, 2021; originally announced March 2021.

Comments: 9 pages, 11 figures

ACM Class: J.7

Journal ref: published online 2021

arXiv:2007.07023

A Quasi-Doppler Method for Doubling Transmission Efficiency Through Two Orthogonal Directions

Authors: Bingli Jiao

Abstract: Inspired by the anisotropy of the Doppler effect in wave propagation, we propose a new method that uses one information symbol to serve two users located in two geometrically orthogonal directions. Specifically, in broadband wireless communication, we use multiple antennas with the proposed signal switching method to emulate a moving emission source and yield a frequency shift, referred to as the Quasi Doppler effect, which is converted to discrete phase modulation. Further, this discretely modulated phase can adjust the phase of one transmit symbol so that it exhibits two different phases in different directions. The modulation mechanism is explained through theoretical derivations, together with an analysis of performance robustness in crossroad application scenarios with small geometric deviations. In contrast to the use of conventional symbols, this approach can double the transmission efficiency, which is confirmed by our simulation results.

Submitted 14 July, 2020; originally announced July 2020.

arXiv:1911.01819

A Quasi Doppler Method for Signal Transmission to Spatial Perpendicular Directions

Authors: Bingli Jiao

Abstract: This paper introduces a communication method that can use one information symbol to provide two independent sets of bits to two receivers in spatially perpendicular directions. The new communication scheme is realized by switching one signal source among the elements of a linear antenna array, which are used to emulate a moving transmitter. The theoretical derivations are presented in the paper.

Submitted 5 November, 2019; originally announced November 2019.

Comments: 3 pages, 2 figures

arXiv:1906.03870

doi 10.1007/s00530-019-00607-x

Deep Learning-Based Automatic Downbeat Tracking: A Brief Review

Authors: Bijue Jia, Jiancheng Lv, Dayiheng Liu

Abstract: As an important form of multimedia, music fills almost everyone's life. Automatically analyzing music is a significant step toward satisfying people's need for effortless music retrieval and music recommendation. Within this field, downbeat tracking has been a fundamental and ongoing problem in Music Information Retrieval (MIR). Despite significant research efforts, downbeat tracking still remains a challenge. Previous studies either focus on feature engineering (extracting certain features by signal processing, which are semi-automatic solutions) or have some limitations: they can only model music audio recordings within limited time signatures and tempo ranges. Recently, deep learning has surpassed traditional machine learning methods and has become the primary algorithm in feature learning; combinations of traditional and deep learning methods have also achieved better performance. In this paper, we begin with a background introduction to the downbeat tracking problem. Then, we give detailed discussions of the following topics: system architecture, feature extraction, deep neural network algorithms, datasets, and evaluation strategy. In addition, we take a look at the results from the annual benchmark evaluation, the Music Information Retrieval Evaluation eXchange (MIREX), as well as the developments in software implementations. Although much has been achieved in the area of automatic downbeat tracking, some problems still remain. We point out these problems and conclude with possible directions and challenges for future research.

Submitted 10 June, 2019; originally announced June 2019.

Comments: 22 pages, 7 figures. arXiv admin note: text overlap with arXiv:1605.08396 by other authors

Journal ref: Multimedia Systems, 2019, 25(6): 617-638

arXiv:1801.09350

Estimating Distances via Received Signal Strength and Connectivity in Wireless Sensor Networks

Authors: Qing Miao, Baoqi Huang, Bing Jia

Abstract: Distance estimation is vital for localization and many other applications in wireless sensor networks (WSNs). In particular, it is desirable to implement distance estimation as well as localization without specialized hardware in low-cost WSNs. As such, both the received signal strength (RSS) based approach and the connectivity based approach have gained much attention. The RSS based approach is suitable for estimating short distances, whereas the connectivity based approach obtains relatively good performance for estimating long distances. Considering the complementary features of these two approaches, we propose a fusion method based on the maximum-likelihood estimator (MLE) to estimate the distance between any pair of neighboring nodes in a WSN by efficiently fusing the information from the RSS and local connectivity. Additionally, the method is developed under the practical log-normal shadowing model, and the associated Cramer-Rao lower bound (CRLB) is also derived for performance analysis. Both simulations and experiments based on practical measurements are carried out, and they demonstrate that the proposed method outperforms either approach alone and approaches the CRLB.
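For the RSS half of the fusion, the maximum-likelihood estimate under log-normal shadowing has a simple closed form: with the log-distance model RSS = P0 − 10·n·log10(d/d0) + X and Gaussian shadowing X in dB, the MLE averages the K samples in dB and inverts the mean path-loss curve. The Python sketch below shows only that RSS-only estimator; the path-loss parameters (P0, d0, n) are hypothetical, and the paper's fusion with connectivity information is not reproduced.

```python
import math
from statistics import fmean


def mean_rss_dbm(d, p0_dbm, d0, n):
    """Mean received power (dBm) at distance d under the log-distance path-loss model."""
    return p0_dbm - 10.0 * n * math.log10(d / d0)


def ml_distance_from_rss(rss_samples_dbm, p0_dbm, d0, n):
    """ML distance estimate from K i.i.d. RSS samples (dBm) under log-normal
    shadowing: average the samples in dB, then invert the path-loss curve."""
    mean_rss = fmean(rss_samples_dbm)
    return d0 * 10.0 ** ((p0_dbm - mean_rss) / (10.0 * n))


# Hypothetical parameters: P0 = -40 dBm at d0 = 1 m, path-loss exponent n = 3.
print(ml_distance_from_rss([mean_rss_dbm(10.0, -40.0, 1.0, 3.0)], -40.0, 1.0, 3.0))  # → 10.0
```

Because the shadowing enters exponentially after inversion, a fixed dB error translates into a distance error proportional to d, which is why RSS alone degrades at long range and benefits from fusion with connectivity.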

Submitted 28 January, 2018; originally announced January 2018.

arXiv:1711.04365

doi 10.1109/DASC.2017.8102137

Dynamic Multi-Arm Bandit Game Based Multi-Agents Spectrum Sharing Strategy Design

Authors: Jingyang Lu, Lun Li, Dan Shen, Genshe Chen, Bin Jia, Erik Blasch, Khanh Pham

Abstract: For a wireless avionics communication system, a multi-arm bandit game is mathematically formulated, which includes channel states, strategies, and rewards. The simple case of only two agents sharing the spectrum is fully studied in terms of maximizing the cumulative reward over a finite time horizon. An Upper Confidence Bound (UCB) algorithm is used to achieve the optimal solutions for the stochastic Multi-Arm Bandit (MAB) problem. The MAB problem can also be solved from the perspective of the Markov game framework. Meanwhile, Thompson Sampling (TS) is used as a benchmark to evaluate the performance of the proposed approach. Numerical results are also provided regarding minimizing the expectation of the regret and choosing the best parameter for the upper confidence bound.
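The UCB principle can be illustrated with the classic single-agent UCB1 rule: play each arm once, then always pick the arm maximizing its empirical mean plus sqrt(2·ln t / n_a). This Python sketch assumes Bernoulli rewards with hypothetical per-channel success probabilities; it does not model the paper's two-agent game, the Markov-game formulation, or the tuned confidence parameter.

```python
import math
import random


def ucb1(success_probs, horizon, seed=0):
    """Play a stochastic Bernoulli bandit with UCB1; return pull counts and total reward."""
    rng = random.Random(seed)
    k = len(success_probs)
    counts = [0] * k
    means = [0.0] * k
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play each arm (channel) once to initialize its estimate
        else:
            arm = max(range(k),
                      key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
        total += reward
    return counts, total


# Hypothetical channels with transmission-success probabilities 0.2, 0.5, 0.8.
counts, total = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts, total)
```

The exploration bonus shrinks as an arm is pulled more often, so suboptimal channels are sampled only logarithmically often and the best channel dominates the pull counts.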

Submitted 12 November, 2017; originally announced November 2017.

Showing 1–32 of 32 results for author: Jia, B