Search | arXiv e-print repository

When Near Becomes Far: From Rayleigh to Optimal Near-Field and Far-Field Boundaries

Authors: Sajad Daei, Gabor Fodor, Mikael Skoglund

Abstract: The transition toward 6G is pushing wireless communication into a regime where the classical plane-wave assumption no longer holds. Millimeter-wave and sub-THz frequencies shrink wavelengths to millimeters, while meter-scale arrays featuring hundreds of antenna elements dramatically enlarge the aperture. Together, these trends collapse the classical Rayleigh far-field boundary from kilometers to mere single-digit meters. Consequently, most practical 6G indoor, vehicular, and industrial deployments will inherently operate within the radiating near-field, where reliance on the plane-wave approximation leads to severe array-gain losses, degraded localization accuracy, and excessive pilot overhead. This paper re-examines the fundamental question: Where does the far-field truly begin? Rather than adopting purely geometric definitions, we introduce an application-oriented approach based on user-defined error budgets and a rigorous Fresnel-zone analysis that fully accounts for both amplitude and phase curvature. We propose three practical mismatch metrics: worst-case element mismatch, worst-case normalized mean square error, and spectral efficiency loss. For each metric, we derive a provably optimal transition distance (the minimal range beyond which the mismatch permanently remains below a given tolerance) and provide closed-form solutions. Extensive numerical evaluations across diverse frequencies and antenna-array dimensions show that our proposed thresholds can exceed the Rayleigh distance by more than an order of magnitude.
By transforming the near-field from a design nuisance into a precise, quantifiable tool, our results provide a clear roadmap for enabling reliable and resource-efficient near-field communications and sensing in emerging 6G systems.

Submitted 12 May, 2025; originally announced May 2025.
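For reference, the classical boundary this abstract compares against can be checked numerically. The minimal sketch below computes the Rayleigh distance 2D²/λ for a linear aperture of length D and verifies the textbook property that, at this distance, the exact spherical-wavefront phase at the array edge differs from the planar model by about π/8 for a broadside user. The 100 GHz carrier and 1 m aperture are hypothetical values, and the helper names are illustrative; this shows only the classical criterion, not the error-budget thresholds proposed in the paper.

```python
import math

def rayleigh_distance(aperture_m: float, wavelength_m: float) -> float:
    """Classical Rayleigh (Fraunhofer) distance 2 * D^2 / wavelength."""
    return 2.0 * aperture_m ** 2 / wavelength_m

def edge_phase_error(aperture_m: float, wavelength_m: float, range_m: float) -> float:
    """Spherical-vs-planar phase error (rad) at the array edge for a user on
    broadside at distance range_m from the array centre.

    At broadside the planar model predicts equal phase on every element, while
    the exact spherical model adds k * (sqrt(r^2 + x^2) - r) at offset x.
    """
    k = 2.0 * math.pi / wavelength_m
    x = aperture_m / 2.0  # offset of the edge element from the array centre
    # Numerically stable rewrite of sqrt(r^2 + x^2) - r
    extra_path = x ** 2 / (math.sqrt(range_m ** 2 + x ** 2) + range_m)
    return k * extra_path

# Hypothetical example values: 100 GHz carrier, 1 m aperture
wavelength = 3e8 / 100e9
aperture = 1.0

d_r = rayleigh_distance(aperture, wavelength)
err = edge_phase_error(aperture, wavelength, d_r)
print(f"Rayleigh distance: {d_r:.1f} m")
print(f"edge phase error at d_R: {err:.4f} rad, pi/8 = {math.pi / 8:.4f} rad")
```

The printed edge error should sit very close to π/8, the phase tolerance behind the classical 2D²/λ rule.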

arXiv:2505.01223

One Target, Many Views: Multi-User Fusion for Collaborative Uplink ISAC

Authors: Sajad Daei, Gabor Fodor, Mikael Skoglund

Abstract: We propose a novel pilot-free multi-user uplink framework for integrated sensing and communication (ISAC) in mm-wave networks, where single-antenna users transmit orthogonal frequency division multiplexing signals without dedicated pilots. The base station exploits the spatial and velocity diversities of users to simultaneously decode messages and detect targets, transforming user transmissions into a powerful sensing tool. Each user's signal, structured by a known codebook, propagates through a sparse multi-path channel with shared moving targets and user-specific scatterers. Notably, common targets induce distinct delay-Doppler-angle signatures, while stationary scatterers cluster in parameter space. We formulate the joint multi-path parameter estimation and data decoding as a 3D super-resolution problem, extracting delays, Doppler shifts, and angles-of-arrival via atomic norm minimization, efficiently solved using semidefinite programming. A core innovation is multiuser fusion, where diverse user observations are collaboratively combined to enhance sensing and decoding. This approach improves robustness and integrates multi-user perspectives into a unified estimation framework, enabling high-resolution sensing and reliable communication. Numerical results show that the proposed framework significantly enhances both target estimation and communication performance, highlighting its potential for next-generation ISAC systems.

Submitted 2 May, 2025; originally announced May 2025.
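In an OFDM signal model, a path's delay and Doppler shift appear as linear phase ramps across subcarriers and symbols. As a deliberately simpler stand-in for the gridless atomic-norm/SDP estimator described in the abstract, the sketch below builds the noise-free response of a single path on a hypothetical 16-subcarrier by 8-symbol grid and recovers its on-grid delay and Doppler bins with a plain 2D DFT periodogram. Assumptions: unit, known data symbols on every resource element, no cyclic prefix, no angle dimension, and illustrative function names; this is not the paper's method.

```python
import cmath
import math

def ofdm_path_response(n_sub, n_sym, delta_f, t_sym, delay, doppler, gain=1.0):
    """Noise-free response of one path on an n_sub x n_sym OFDM grid,
    assuming unit (known) data symbols on every resource element."""
    return [[gain
             * cmath.exp(-2j * math.pi * n * delta_f * delay)   # delay: ramp over subcarriers
             * cmath.exp(2j * math.pi * m * t_sym * doppler)    # Doppler: ramp over symbols
             for m in range(n_sym)]
            for n in range(n_sub)]

def delay_doppler_peak(grid):
    """Return the (delay_bin, doppler_bin) that maximises a 2D DFT periodogram."""
    n_sub, n_sym = len(grid), len(grid[0])
    best, best_bins = -1.0, (0, 0)
    for p in range(n_sub):          # candidate delay bins
        for q in range(n_sym):      # candidate Doppler bins
            acc = 0j
            for n in range(n_sub):
                for m in range(n_sym):
                    acc += (grid[n][m]
                            * cmath.exp(2j * math.pi * n * p / n_sub)
                            * cmath.exp(-2j * math.pi * m * q / n_sym))
            if abs(acc) > best:
                best, best_bins = abs(acc), (p, q)
    return best_bins

N, M = 16, 8                        # hypothetical grid size
delta_f, t_sym = 120e3, 1 / 120e3   # hypothetical spacing (Hz) and symbol time (s)
true_delay = 3 / (N * delta_f)      # lies exactly on delay bin 3
true_doppler = 2 / (M * t_sym)      # lies exactly on Doppler bin 2
y = ofdm_path_response(N, M, delta_f, t_sym, true_delay, true_doppler)
print(delay_doppler_peak(y))
```

A DFT like this is limited to the grid resolution 1/(NΔf) and 1/(MT); gridless super-resolution methods such as the one the authors describe aim to estimate parameters that fall between these bins.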

arXiv:2501.13870

Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference

Authors: Shuqi Dai, Yunyun Wang, Roger B. Dannenberg, Zeyu Jin

Abstract: We propose a unified framework for Singing Voice Synthesis (SVS) and Conversion (SVC), addressing the limitations of existing approaches in cross-domain SVS/SVC, poor output musicality, and scarcity of singing data. Our framework enables control over multiple aspects, including language content based on lyrics, performance attributes based on a musical score, singing style and vocal techniques based on a selector, and voice identity based on a speech sample. The proposed zero-shot learning paradigm consists of one SVS model and two SVC models, utilizing pre-trained content embeddings and a diffusion-based generator. The proposed framework is also trained on mixed datasets comprising both singing and speech audio, allowing singing voice cloning based on speech reference. Experiments show substantial improvements in timbre similarity and musicality over state-of-the-art baselines, providing insights into other low-data music tasks such as instrumental style transfer. Examples can be found at: everyone-can-sing.github.io.

Submitted 23 January, 2025; originally announced January 2025.

arXiv:2412.13504

Urban Air Temperature Prediction using Conditional Diffusion Models

Authors: Siyang Dai, Jun Liu, Ngai-Man Cheung

Abstract: Urbanization as a global trend has led to many environmental challenges, including the urban heat island (UHI) effect. The increase in temperature has a significant impact on the well-being of urban residents. Air temperature ($T_a$) at 2 m above the surface is a key indicator of the UHI effect. How land use and land cover (LULC) affect $T_a$ is a critical research question that requires high-resolution (HR) $T_a$ data at the neighborhood scale. However, weather stations providing $T_a$ measurements are sparsely distributed (e.g., more than 10 km apart), and numerical models are impractically slow and computationally expensive. In this work, we propose a novel method to predict HR $T_a$ at a 100 m ground separation distance (gsd) using land surface temperature (LST) and other LULC-related features that can be easily obtained from satellite imagery. Our method is the first to leverage diffusion models to generate accurate and visually realistic HR $T_a$ maps, and it outperforms prior methods. We pave the way for meteorological research using computer vision techniques by providing a dataset with extended spatial and temporal coverage and high spatial resolution as a benchmark for future research. Furthermore, we show that our model can be applied to urban planning by simulating the impact of different urban designs on $T_a$.

Submitted 17 December, 2024; originally announced December 2024.

arXiv:2412.05580

Self-Supervised Masked Mesh Learning for Unsupervised Anomaly Detection on 3D Cortical Surfaces

Authors: Hao-Chun Yang, Sicheng Dai, Saige Rutherford, Christian Gaser, Andre F Marquand, Christian F Beckmann, Thomas Wolfers

Abstract: Unsupervised anomaly detection in brain imaging is challenging. In this paper, we propose self-supervised masked mesh learning for unsupervised anomaly detection on 3D cortical surfaces. Our framework leverages the intrinsic geometry of the cortical surface to learn a self-supervised representation that captures the underlying structure of the brain. We introduce a masked mesh convolutional neural network (MMN) that learns to predict masked regions of the cortical surface. By training the MMN on a large dataset of healthy subjects, we learn a representation that captures the normal variation in the cortical surface. We then use this representation to detect anomalies in unseen individuals by calculating anomaly scores based on the reconstruction error of the MMN. We evaluated our framework by training on the population-scale datasets UKB and HCP-Aging and testing on two datasets of Alzheimer's disease patients, ADNI and OASIS3. Our results show that our framework can detect anomalies in cortical thickness, cortical volume, and cortical sulcus characteristics, which are known to be biomarkers of Alzheimer's disease. Our proposed framework provides a promising approach for unsupervised anomaly detection based on normative variation of cortical features.

Submitted 30 March, 2025; v1 submitted 7 December, 2024; originally announced December 2024.

arXiv:2411.12183

Action-Attentive Deep Reinforcement Learning for Autonomous Alignment of Beamlines

Authors: Siyu Wang, Shengran Dai, Jianhui Jiang, Shuang Wu, Yufei Peng, Junbin Zhang

Abstract: Synchrotron radiation sources play a crucial role in fields such as materials science, biology, and chemistry. The beamline, a key subsystem of the synchrotron, modulates and directs the radiation to the sample for analysis. However, the alignment of beamlines is a complex and time-consuming process, primarily carried out manually by experienced engineers. Even minor misalignments in optical components can significantly affect the beam's properties, leading to suboptimal experimental outcomes. Current automated methods, such as Bayesian optimization (BO) and reinforcement learning (RL), enhance performance, but limitations remain. The relationship between the current and target beam properties, which is crucial for determining the adjustment, is not fully considered. Additionally, the physical characteristics of optical elements are overlooked, such as the need to adjust specific devices to control the output beam's spot size or position. This paper addresses beamline alignment by modeling it as a Markov Decision Process (MDP) and training an intelligent agent using RL. The agent calculates adjustment values based on the current and target beam states, executes actions, and iterates until optimal parameters are achieved. A policy network with action attention is designed to improve decision-making by considering both state differences and the impact of optical components.
Experiments on two simulated beamlines demonstrate that our algorithm outperforms existing methods, with ablation studies highlighting the effectiveness of the action attention-based policy network.

Submitted 18 November, 2024; originally announced November 2024.

Comments: 17 pages, 5 figures

arXiv:2410.08224

doi 10.62762/CJIF.2024.876830

A Survey of Spatio-Temporal EEG data Analysis: from Models to Applications

Authors: Pengfei Wang, Huanran Zheng, Silong Dai, Yiqiao Wang, Xiaotian Gu, Yuanbin Wu, Xiaoling Wang

Abstract: In recent years, the field of electroencephalography (EEG) analysis has witnessed remarkable advancements, driven by the integration of machine learning and artificial intelligence. This survey aims to encapsulate the latest developments, focusing on emerging methods and technologies that are poised to transform our comprehension and interpretation of brain activity. We delve into self-supervised learning methods that enable the robust representation of brain signals, which are fundamental for a variety of downstream applications. We also explore emerging discriminative methods, including graph neural networks (GNN), foundation models, and large language model (LLM)-based approaches. Furthermore, we examine generative technologies that harness EEG data to produce images or text, offering novel perspectives on brain activity visualization and interpretation. The survey provides an extensive overview of these cutting-edge techniques, their current applications, and the profound implications they hold for future research and clinical practice. The relevant literature and open-source materials have been compiled and are continuously updated at https://github.com/wpf535236337/LLMs4TS

Submitted 26 September, 2024; originally announced October 2024.

Comments: submitted to IECE Chinese Journal of Information Fusion

Journal ref: Chinese Journal of Information Fusion, 2024, 1(3): 183-211

arXiv:2410.04930

Near-Field ISAC in 6G: Addressing Phase Nonlinearity via Lifted Super-Resolution

Authors: Sajad Daei, Amirreza Zamani, Saikat Chatterjee, Mikael Skoglund, Gabor Fodor

Abstract: Integrated sensing and communications (ISAC) is a promising component of 6G networks, fusing communication and radar technologies to facilitate new services. Additionally, the use of extremely large-scale antenna arrays (ELAA) at the ISAC common receiver not only facilitates terahertz-rate communication links but also significantly enhances the accuracy of target detection in radar applications. In practical scenarios, communication scatterers and radar targets often reside in close proximity to the ISAC receiver. This, combined with the use of ELAA, fundamentally alters the electromagnetic characteristics of wireless and radar channels, shifting from far-field planar-wave propagation to near-field spherical wave propagation. Under the far-field planar-wave model, the phase of the array response vector varies linearly with the antenna index. In contrast, in the near-field spherical wave model, this phase relationship becomes nonlinear. This shift presents a fundamental challenge: the widely-used Fourier analysis can no longer be directly applied for target detection and communication channel estimation at the ISAC common receiver. In this work, we propose a feasible solution to address this fundamental issue. Specifically, we demonstrate that there exists a high-dimensional space in which the phase nonlinearity can be expressed as linear. Leveraging this insight, we develop a lifted super-resolution framework that simultaneously performs communication channel estimation and extracts target parameters with high precision.

Submitted 25 December, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

Comments: To appear in IEEE ICASSP 2025
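The linear-versus-nonlinear phase behaviour described in this abstract can be checked on a uniform linear array. The sketch below (hypothetical 30 GHz, 64 half-wavelength-spaced elements, user at 5 m and 30° from broadside; helper names are illustrative) computes per-element phases under the planar model and under the exact spherical model, then takes discrete second differences: they vanish for the linear planar phase but not for the spherical one, where they match the Fresnel (second-order) prediction -k d² cos²θ / r. Under that approximation the phase is k d sinθ · n - (k d² cos²θ / (2r)) · n², i.e. linear in the lifted pair (n, n²); this is only a textbook illustration of lifting a quadratic phase, and the abstract does not say it is the lifting used in the paper.

```python
import math

def element_phases(n_elem, spacing, wavelength, rng, theta, near_field=True):
    """Phase (rad) of each element of a uniform linear array, relative to
    element 0, for a user at distance rng and angle theta from broadside
    (both measured from element 0)."""
    k = 2.0 * math.pi / wavelength
    phases = []
    for n in range(n_elem):
        x = n * spacing  # position of element n along the array axis
        if near_field:
            # Exact spherical-wave distance from element n to the user
            r_n = math.sqrt(rng ** 2 + x ** 2 - 2.0 * rng * x * math.sin(theta))
            phases.append(-k * (r_n - rng))
        else:
            # Planar-wave model: path difference is linear in x
            phases.append(k * x * math.sin(theta))
    return phases

def second_differences(seq):
    """Discrete second difference; identically zero for a linear sequence."""
    return [seq[i + 1] - 2.0 * seq[i] + seq[i - 1] for i in range(1, len(seq) - 1)]

# Hypothetical setup: 30 GHz, 64 half-wavelength-spaced elements, user at 5 m, 30 degrees
lam = 0.01
d = lam / 2
r0, theta0 = 5.0, math.radians(30)

far = element_phases(64, d, lam, r0, theta0, near_field=False)
near = element_phases(64, d, lam, r0, theta0, near_field=True)

# Fresnel (second-order) prediction of the near-field second difference
fresnel = -(2 * math.pi / lam) * d ** 2 * math.cos(theta0) ** 2 / r0

print(max(abs(v) for v in second_differences(far)))  # essentially zero: linear phase
print(second_differences(near)[0], fresnel)           # nonzero and close to each other
```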

arXiv:2408.14340

Foundation Models for Music: A Survey

Authors: Yinghao Ma, Anders Øland, Anton Ragni, Bleiz MacSen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg, Ruibin Yuan , et al. (17 additional authors not shown)

Abstract: In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning representation learning, generative learning, and multimodal learning. We first contextualise the significance of music in various industries and trace the evolution of AI in music. By delineating the modalities targeted by foundation models, we find that many music representations are underexplored in FM development. Then, emphasis is placed on the lack of versatility of previous methods on diverse music applications, along with the potential of FMs in music understanding, generation and medical application. By comprehensively exploring the details of the model pre-training paradigm, architectural choices, tokenisation, finetuning methodologies and controllability, we emphasise important topics that have not yet been well explored, such as instruction tuning and in-context learning, scaling laws and emergent abilities, and long-sequence modelling. A dedicated section presents insights into music agents, accompanied by a thorough analysis of datasets and evaluations essential for pre-training and downstream tasks. Finally, by underscoring the vital importance of ethical considerations, we advocate that future research on FMs for music focus more on issues such as interpretability, transparency, human responsibility, and copyright.
The paper offers insights into future challenges and trends in FMs for music, aiming to shape the trajectory of human-AI collaboration in the music realm.

Submitted 3 September, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

arXiv:2407.17777

Babel: A Scalable Pre-trained Model for Multi-Modal Sensing via Expandable Modality Alignment

Authors: Shenghong Dai, Shiqi Jiang, Yifan Yang, Ting Cao, Mo Li, Suman Banerjee, Lili Qiu

Abstract: This paper presents Babel, the expandable modality alignment model, specially designed for multi-modal sensing. While there has been considerable work on multi-modality alignment, existing methods all struggle to effectively incorporate multiple sensing modalities due to data scarcity. How to utilize multi-modal data with partial pairings in sensing remains an unresolved challenge. Babel tackles this challenge by introducing the concept of expandable modality alignment. The key idea involves transforming the N-modality alignment into a series of binary-modality alignments. Novel techniques are also proposed to further mitigate data scarcity and to balance the contribution of the newly incorporated modality with the previously established modality alignment during the expandable alignment process. We provide a comprehensive implementation. In the pre-training phase, Babel currently aligns 6 sensing modalities, namely Wi-Fi, mmWave, IMU, LiDAR, video, and depth. For the deployment phase, as a foundation model, any single aligned modality or combination of aligned modalities can be selected from Babel and applied to downstream tasks. Evaluation demonstrates Babel's outstanding performance on eight human activity recognition datasets, compared to a broad range of baselines, e.g., the SOTA single-modal sensing networks, multi-modal sensing frameworks, and multi-modal large language models.
Babel not only improves the performance of individual modality sensing (12% average accuracy improvement), but also effectively fuses multiple available modalities (up to 22% accuracy increase). Case studies also highlight emerging application scenarios empowered by Babel, including cross-modality retrieval (i.e., sensing imaging), and bridging LLM for sensing comprehension.

Submitted 21 March, 2025; v1 submitted 25 July, 2024; originally announced July 2024.

Comments: Accepted by SenSys'25

arXiv:2405.07906

Improved Downlink Channel Estimation in Time-Varying FDD Massive MIMO Systems

Authors: Sajad Daei, Mikael Skoglund, Gabor Fodor

Abstract: In this work, we address the challenge of accurately obtaining channel state information at the transmitter (CSIT) for frequency division duplexing (FDD) multiple input multiple output systems. Although CSIT is vital for maximizing spatial multiplexing gains, traditional CSIT estimation methods often suffer from impracticality due to the substantial training and feedback overhead they require. To address this challenge, we leverage two sources of prior information simultaneously: the presence of limited local scatterers at the base station (BS) and the time-varying characteristics of the channel. The former results in a redundant angular sparsity of users' channels exceeding the spatial dimension (i.e., the number of BS antennas), while the latter provides a prior non-uniform distribution in the angular domain. We propose a weighted optimization framework that simultaneously reflects both of these features. The optimal weights are then obtained by minimizing the expected recovery error of the optimization problem. This establishes an analytical closed-form relationship between the optimal weights and the angular domain characteristics. Numerical experiments verify the effectiveness of our proposed approach in reducing the recovery error and consequently resulting in decreased training and feedback overhead.

Submitted 24 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.
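The abstract does not state the exact form of its weighted optimization problem. Purely as an assumed illustration of how per-entry weights can encode a non-uniform prior over which (angular) coefficients are active, the sketch below solves a weighted ℓ1-regularized least-squares problem with the proximal-gradient method ISTA on a hypothetical real-valued toy dictionary. The weights are hand-picked; the paper's optimal weight selection is not implemented, and all names are illustrative.

```python
import math

def weighted_soft_threshold(v, thresholds):
    """Proximal operator of sum_i t_i * |x_i|: element-wise shrinkage."""
    return [math.copysign(max(abs(a) - t, 0.0), a) for a, t in zip(v, thresholds)]

def weighted_ista(A, y, weights, lam, step, iters=1000):
    """Minimise 0.5 * ||y - A x||^2 + lam * sum_i w_i * |x_i| with ISTA.

    A is a list of rows; step should not exceed 1 / ||A||_2^2 for convergence.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    for _ in range(iters):
        # Residual r = y - A x
        r = [y[i] - sum(A[i][j] * x[j] for j in range(n_cols)) for i in range(n_rows)]
        # Gradient step on the quadratic term: x + step * A^T r
        g = [x[j] + step * sum(A[i][j] * r[i] for i in range(n_rows)) for j in range(n_cols)]
        # Weighted shrinkage: smaller weights penalise "expected" entries less
        x = weighted_soft_threshold(g, [step * lam * w for w in weights])
    return x

# Hypothetical toy dictionary with three unit-norm columns; y = 0.6*a0 + 0.8*a1 = a2
A = [[1.0, 0.0, 0.6],
     [0.0, 1.0, 0.8]]
y = [0.6, 0.8]

x_plain = weighted_ista(A, y, weights=[1.0, 1.0, 1.0], lam=0.1, step=0.5)
x_prior = weighted_ista(A, y, weights=[0.2, 0.2, 1.0], lam=0.1, step=0.5)
print([round(v, 3) for v in x_plain])  # about [0, 0, 0.9]: plain l1 prefers the single column a2
print([round(v, 3) for v in x_prior])  # about [0.58, 0.78, 0]: weights steer towards a0, a1
```

Both solutions explain y up to the shrinkage bias; the weights decide which sparse explanation wins, which is the kind of leverage a prior over the angular domain can provide.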

arXiv:2405.07895

Optimal Transmitter Design and Pilot Spacing in MIMO Non-Stationary Aging Channels

Authors: Sajad Daei, Gabor Fodor, Mikael Skoglund

Abstract: This work considers an uplink wireless communication system where multiple users with multiple antennas transmit data frames over dynamic channels. Previous studies have shown that multiple transmit and receive antennas can substantially enhance the sum-capacity of all users when the channel is known at the transmitter and in the case of uncorrelated transmit and receive antennas. However, spatial correlations stemming from the close proximity of transmit antennas and channel variation between pilot and data time slots, known as channel aging, can substantially degrade the transmission rate if they are not properly taken into account. In this work, we provide an analytical framework to concurrently exploit both of these features. Specifically, we first propose a beamforming framework to capture spatial correlations. Then, based on random matrix theory tools, we introduce a deterministic expression that approximates the average sum-capacity of all users. Subsequently, we obtain the optimal values of pilot spacing and beamforming vectors by maximizing this expression. Simulation results show the impacts of path loss, velocity of mobile users and Rician factor on the resulting sum-capacity and underscore the efficacy of our methodology compared to prior works.

Submitted 24 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.07890

Subspace-Informed Matrix Completion

Authors: Hamideh. Sadat Fazael Ardakani, Sajad Daei, Arash Amini, Mikael Skoglund, Gabor Fodor

Abstract: In this work, we consider the matrix completion problem, where the objective is to reconstruct a low-rank matrix from a few observed entries. A commonly employed approach involves nuclear norm minimization. For this method to succeed, the number of observed entries needs to scale at least proportionally to both the rank of the ground-truth matrix and the coherence parameter. While the only prior information is oftentimes the low-rank nature of the ground-truth matrix, in various real-world scenarios additional knowledge about the ground-truth low-rank matrix is available. For instance, in collaborative filtering, the Netflix problem, and dynamic channel estimation in wireless communications, we have partial or full knowledge about the signal subspace in advance. Specifically, we are aware of some subspaces that form multiple angles with the column and row spaces of the ground-truth matrix. Leveraging this valuable information has the potential to significantly reduce the required number of observations. To this end, we introduce a multi-weight nuclear norm optimization problem that concurrently promotes the low-rank property as well as the information about the available subspaces. The proposed weights are tailored to penalize each angle corresponding to each basis of the prior subspace independently. We further propose an optimal weight selection strategy by minimizing the coherence parameter of the ground-truth matrix, which is equivalent to minimizing the required number of observations.
Simulation results validate the advantages of incorporating multiple weights in the completion procedure. Specifically, our proposed multi-weight optimization problem demonstrates a substantial reduction in the required number of observations compared to the state-of-the-art methods.

Submitted 24 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2111.00235

arXiv:2405.07882

Exploiting Spatial and Temporal Correlations in Massive MIMO Systems Over Non-Stationary Aging Channels

Authors: Sajad Daei, Gabor Fodor, Mikael Skoglund

Abstract: This work investigates a multi-user, multi-antenna uplink wireless system, where multiple users transmit signals to a base station. Previous research has explored the potential for linear growth in spectral efficiency by employing multiple transmit and receive antennas. This gain depends on the quality of channel state information and uncorrelated antennas. However, spatial correlations, arising from closely-spaced antennas, and channel aging effects, stemming from the difference between the channel at pilot and data time instances, can substantially counteract these benefits and degrade the transmission rate, especially in non-stationary environments. To address these challenges, this work introduces a real-time beamforming framework to compensate for the spatial correlation effect. A channel estimation scheme is then developed, leveraging temporal channel correlations and considering mobile device velocity and antenna spacing. Subsequently, an expression approximating the average spectral efficiency is obtained, dependent on pilot spacing, pilot and data powers, and beamforming vectors. By maximizing this expression, optimal parameters are identified. Numerical results reveal the effectiveness of the proposed approach compared to prior works. Moreover, optimal pilot spacing remains unaffected by interference components such as path loss and the velocity of interference users. The impact of interference components also diminishes with an increasing number of transmit antennas.

Submitted 18 March, 2025; v1 submitted 13 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2401.13368 by other authors

arXiv:2401.13368

Towards Optimal Pilot Spacing and Power Control in Multi-Antenna Systems Operating Over Non-Stationary Rician Aging Channels

Authors: Sajad Daei, Gabor Fodor, Mikael Skoglund, Miklos Telek

Abstract: Several previous works have addressed the inherent trade-off between allocating resources in the power and time domains to pilot and data signals in multiple input multiple output systems over block-fading channels. In particular, when the channel changes rapidly in time, channel aging degrades the performance in terms of spectral efficiency without proper pilot spacing and power control. Although non-stationary stochastic processes are recognized as more accurate models for time-varying wireless channels, the problem of pilot spacing and power control in multi-antenna systems operating over non-stationary channels has not been addressed in the literature. In this paper, we address this gap by introducing a refined first-order autoregressive model that exploits the inherent temporal correlations over non-stationary Rician aging channels. We design a multi-frame structure for data transmission that better reflects the non-stationary fading environment than previously developed single-frame structures. Subsequently, to determine optimal pilot spacing and power control within this multi-frame structure, we develop an optimization framework and an efficient algorithm based on maximizing a deterministic equivalent expression for the spectral efficiency, demonstrating its generality by encompassing previous channel aging results. Our numerical results indicate the efficacy of the proposed method in terms of spectral efficiency gains over the single-frame structure.

Submitted 24 January, 2024; originally announced January 2024.
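
The abstract builds on first-order autoregressive (AR(1)) channel aging. As a rough point of reference only, the sketch below simulates the conventional stationary Gauss-Markov AR(1) model with a fixed Rician K-factor, i.e., the kind of baseline that the paper says it refines; it is not the paper's non-stationary model. The function name and the parameters `rho`, `k_factor`, and `los` are hypothetical, and the array geometry in the usage lines is an assumption.

```python
import cmath
import math
import random


def rician_ar1_channel(num_slots, los, rho, k_factor, seed=0):
    """Return num_slots channel snapshots h[n] = a_los * los + a_nlos * g[n].

    The diffuse part follows a stationary AR(1) (Gauss-Markov) recursion
        g[n] = rho * g[n-1] + sqrt(1 - rho**2) * w[n],  w[n] ~ CN(0, I),
    which keeps every entry of g[n] at unit variance. This is a generic
    baseline, not the refined non-stationary model from the abstract.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    rng = random.Random(seed)

    def cn01():
        # Circularly symmetric complex Gaussian sample with unit variance.
        s = math.sqrt(0.5)
        return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

    a_los = math.sqrt(k_factor / (k_factor + 1.0))  # weight of the deterministic LoS part
    a_nlos = math.sqrt(1.0 / (k_factor + 1.0))      # weight of the diffuse (NLoS) part
    innov = math.sqrt(1.0 - rho * rho)              # innovation scaling

    g = [cn01() for _ in los]
    snapshots = []
    for _ in range(num_slots):
        snapshots.append([a_los * l_m + a_nlos * g_m for l_m, g_m in zip(los, g)])
        g = [rho * g_m + innov * cn01() for g_m in g]  # age the diffuse part by one slot
    return snapshots


# Usage: LoS steering vector of a 4-element half-wavelength ULA at 30 degrees (assumed geometry).
theta = math.radians(30.0)
los = [cmath.exp(1j * math.pi * m * math.sin(theta)) for m in range(4)]
H = rician_ar1_channel(num_slots=10, los=los, rho=0.95, k_factor=3.0)
```

Setting `rho` closer to 1 models slower aging. A non-stationary variant would let quantities such as `rho`, `k_factor`, or `los` vary over time, which this stationary sketch deliberately does not do.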

arXiv:2308.03518 [pdf, ps, other]

Off-the-grid Blind Deconvolution and Demixing

Authors: Saeed Razavikia, Sajad Daei, Mikael Skoglund, Gabor Fodor, Carlo Fischione

Abstract: We consider the problem of gridless blind deconvolution and demixing (GB2D) in scenarios where multiple users communicate messages through multiple unknown channels and a single base station (BS) collects their contributions. This scenario arises in various communication fields, including wireless communications, the Internet of Things, over-the-air computation, and integrated sensing and communications. In this setup, each user's message is convolved with a multipath channel formed by several scaled and delayed Dirac spikes. The BS receives a linear combination of the convolved signals, and the goal is to recover the unknown amplitudes, continuous-valued delays, and transmitted waveforms from a compressed vector of measurements at the BS. However, without any prior knowledge of the transmitted messages and channels, GB2D is highly challenging and, in general, intractable. To address this issue, we assume that each user's message follows a distinct modulation scheme living in a known low-dimensional subspace. By exploiting these subspace assumptions and the sparsity of the users' multipath channels, we transform the nonlinear GB2D problem into a matrix tuple recovery problem from a few linear measurements. To this end, we propose a semidefinite programming optimization that exploits the specific low-dimensional structure of the matrix tuple to recover the messages and the continuous delays of the different communication paths from a single received signal at the BS.
Finally, our numerical experiments show that the proposed method effectively recovers all transmitted messages and the continuous delay parameters of the channels given a sufficient number of samples.

Submitted 7 August, 2023; originally announced August 2023.
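
The key step in this abstract is turning a nonlinear model (unknown delays and unknown messages) into linear measurements of a tuple of matrices. The sketch below assumes a common frequency-domain formulation from the blind super-resolution literature, which may differ in detail from the paper's: the message sample of user $i$ lies in a known subspace, $x_i[n] = b_{i,n}^H h_i$, and the received sample is $y[n] = \sum_i \sum_k c_{ik} e^{-j2\pi n\tau_{ik}} b_{i,n}^H h_i$. Defining $X_i = \sum_k c_{ik} h_i a(\tau_{ik})^H$ with $a(\tau)_n = e^{j2\pi n\tau}$ gives $y[n] = \sum_i b_{i,n}^H X_i e_n$, which is linear in $(X_1, \ldots, X_I)$. The code only checks numerically that the two forms agree; it does not implement the SDP recovery, and all function names are hypothetical.

```python
import cmath
import random


def direct_model(b, h, c, tau, num_samples):
    """y[n] = sum_i sum_k c[i][k] exp(-j 2 pi n tau[i][k]) * (b[i][n]^H h[i])."""
    y = []
    for n in range(num_samples):
        total = 0j
        for i in range(len(h)):
            # Message sample of user i, assumed to lie in a known subspace.
            msg = sum(b_r.conjugate() * h_r for b_r, h_r in zip(b[i][n], h[i]))
            for c_ik, tau_ik in zip(c[i], tau[i]):
                total += c_ik * cmath.exp(-2j * cmath.pi * n * tau_ik) * msg
        y.append(total)
    return y


def lift(h_i, c_i, tau_i, num_samples):
    """X_i = sum_k c_ik h_i a(tau_ik)^H with a(tau)_n = exp(j 2 pi n tau); a dim x N nested list."""
    X = [[0j] * num_samples for _ in h_i]
    for c_ik, tau_ik in zip(c_i, tau_i):
        for row, h_r in enumerate(h_i):
            for n in range(num_samples):
                a_n = cmath.exp(2j * cmath.pi * n * tau_ik)
                X[row][n] += c_ik * h_r * a_n.conjugate()
    return X


def lifted_model(b, X_list, num_samples):
    """y[n] = sum_i b[i][n]^H X_i e_n, which is linear in the tuple (X_1, ..., X_I)."""
    return [
        sum(
            sum(b_r.conjugate() * X[row][n] for row, b_r in enumerate(b[i][n]))
            for i, X in enumerate(X_list)
        )
        for n in range(num_samples)
    ]


# Usage with random toy data: 2 users, subspace dimension 3, 16 samples, 2 paths per user.
rng = random.Random(1)


def crand():
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


num_users, dim, num_samples = 2, 3, 16
b = [[[crand() for _ in range(dim)] for _ in range(num_samples)] for _ in range(num_users)]
h = [[crand() for _ in range(dim)] for _ in range(num_users)]
c = [[crand() for _ in range(2)] for _ in range(num_users)]
tau = [[rng.random() for _ in range(2)] for _ in range(num_users)]

y_direct = direct_model(b, h, c, tau, num_samples)
X_list = [lift(h[i], c[i], tau[i], num_samples) for i in range(num_users)]
y_lifted = lifted_model(b, X_list, num_samples)
print(max(abs(u - v) for u, v in zip(y_direct, y_lifted)))  # round-off level only
```

Under this assumed model each $X_i$ is rank one, namely $h_i$ times a spectrally sparse row vector, which is the kind of low-dimensional structure a semidefinite or atomic-norm relaxation can exploit.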

arXiv:2306.12228 [pdf, ps, other]

Blind Asynchronous Goal-Oriented Detection for Massive Connectivity

Authors: Sajad Daei, Saeed Razavikia, Marios Kountouris, Mikael Skoglund, Gabor Fodor, Carlo Fischione

Abstract: Resource allocation and multiple access schemes are instrumental to the success of communication networks, as they enable seamless wireless connectivity among a growing population of uncoordinated and non-synchronized users. In this paper, we present a novel random access scheme that addresses one of the most severe barriers that current strategies face in achieving massive connectivity and ultra-reliable low-latency communications for 6G. The proposed scheme exploits the angular continuous group-sparsity of wireless channels to provide low latency, high reliability, and massive access despite limited time-bandwidth resources, asynchronous transmissions, and preamble errors. Specifically, a reconstruction-free goal-oriented optimization problem is proposed that preserves the angular information of active devices; it is complemented by a clustering algorithm that assigns active users to specific groups. This allows us to identify active stationary devices by their line-of-sight angles. Additionally, for mobile devices, an alternating minimization algorithm is proposed to recover their preambles, data, and channel gains simultaneously, enabling the identification of active mobile users. Simulation results show that the proposed algorithm provides excellent performance and supports a massive number of devices. Moreover, unlike other random access schemes, the performance of the proposed scheme is independent of the total number of devices.
The proposed method provides a unified solution to the requirements of machine-type communications and ultra-reliable low-latency communications, making it an important contribution to emerging 6G networks.

Submitted 21 June, 2023; originally announced June 2023.

arXiv:2303.08490 [pdf, other]

Strong Baseline and Bag of Tricks for COVID-19 Detection of CT Scans

Authors: Chih-Chung Hsu, Chih-Yu Jian, Chia-Ming Lee, Chi-Han Tsai, Sheng-Chieh Dai

Abstract: This paper investigates the application of deep learning models for lung Computed Tomography (CT) image analysis. Traditional deep learning frameworks encounter compatibility issues due to variations in slice numbers and resolutions in CT images, which stem from the use of different machines. Commonly, individual slices are predicted and subsequently merged to obtain the final result; however, this approach lacks slice-wise feature learning and consequently results in decreased performance. We propose a novel slice selection method for each CT dataset to address this limitation, effectively filtering out uncertain slices and enhancing the model's performance. Furthermore, we introduce a spatial-slice feature learning (SSFL) technique [hsu2022] that employs a conventional and efficient backbone model for slice feature training, followed by extracting one-dimensional data from the trained model for COVID and non-COVID classification using a dedicated classification model. Leveraging these experimental steps, we integrate one-dimensional features with multiple slices for channel merging and employ a 2D convolutional neural network (CNN) model for classification. In addition to the aforementioned methods, we explore various high-performance classification models, ultimately achieving promising results.

Submitted 15 March, 2023; originally announced March 2023.

Comments: technical report. Keywords: Spatial-Slice correlation, COVID-19 classification, convolutional neural networks, computed tomography

arXiv:2301.03448 [pdf, other]

Multi-User Distributed Computing Via Compressed Sensing

Authors: Ali Khalesi, Sajad Daei, Marios Kountouris, Petros Elia

Abstract: We consider the multi-user linearly separable distributed computing problem, in which $N$ servers help compute the real-valued functions requested by $K$ users, where each function can be written as a linear combination of up to $L$ (generally non-linear) subfunctions. Each server computes a fraction $\gamma$ of the subfunctions and communicates a function of its computed outputs to some of the users, and each user then uses the data it receives to recover its desired function. Our goal is to bound the ratio of the computation workload performed by all servers to the number of datasets. To this end, we reformulate the real-valued distributed computing problem as a matrix factorization problem and then as a basic sparse recovery problem, where sparsity implies computational savings. Building on this, we first give a simple probabilistic scheme for subfunction assignment, which allows us to upper-bound by $\gamma \leq \frac{K}{N}$ the optimal normalized computation cost that a generally intractable $\ell_0$-minimization would give. To bypass the intractability of such optimal schemes, we show that if they satisfy $\gamma \leq -r\frac{K}{N}W_{-1}\left(-\frac{2K}{eNr}\right)$, where $W_{-1}(\cdot)$ is the lower real branch of the Lambert W function and $r$ calibrates the communication between servers and users, then they can actually be derived using a tractable Basis Pursuit $\ell_1$-minimization.
This newly revealed connection between distributed computation and compressed sensing opens up the possibility of designing practical distributed computing algorithms with tools and methods from compressed sensing.

Submitted 9 January, 2023; originally announced January 2023.

Comments: Submitted to ITW2023. arXiv admin note: text overlap with arXiv:2206.11119
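
The second condition in this abstract involves $W_{-1}$, the lower real branch of the Lambert W function, which is real-valued only for arguments in $[-1/e, 0)$; here that requires $Nr \geq 2K$. Because $W_{-1}(x) \leq -1$ on that interval, the threshold is never below $rK/N$. The sketch below evaluates the stated threshold $-r\frac{K}{N}W_{-1}\left(-\frac{2K}{eNr}\right)$ numerically with Halley's iteration so that it needs only the standard library (`scipy.special.lambertw(x, k=-1)` would serve the same purpose). The function names are hypothetical, and `r` is treated simply as the positive calibration parameter the abstract mentions.

```python
import math


def lambert_w_minus1(x, tol=1e-14, max_iter=100):
    """Lower real branch W_{-1}(x), defined for -1/e <= x < 0, via Halley's method."""
    branch_point = -1.0 / math.e
    if abs(x - branch_point) <= 1e-15:
        return -1.0  # at (or within round-off of) the branch point
    if not branch_point < x < 0.0:
        raise ValueError("W_{-1}(x) is real only for -1/e <= x < 0")
    if x < -0.25:
        # Series expansion around the branch point x = -1/e.
        p = -math.sqrt(2.0 * (1.0 + math.e * x))
        w = -1.0 + p - p * p / 3.0
    else:
        # Asymptotic form as x -> 0^-: W_{-1}(x) ~ ln(-x) - ln(-ln(-x)).
        l1 = math.log(-x)
        w = l1 - math.log(-l1)
    for _ in range(max_iter):
        ew = math.exp(w)
        f = w * ew - x
        # Halley step for f(w) = w e^w - x.
        step = f / (ew * (w + 1.0) - (w + 2.0) * f / (2.0 * w + 2.0))
        w -= step
        if abs(step) <= tol * abs(w):
            break
    return w


def l1_recoverable_threshold(K, N, r):
    """Right-hand side -r (K/N) W_{-1}(-2K / (e N r)) of the abstract's condition on gamma."""
    return -r * (K / N) * lambert_w_minus1(-2.0 * K / (math.e * N * r))


# Usage: K = 10 users, N = 100 servers, r = 1 (illustrative values only).
print(l1_recoverable_threshold(10, 100, 1.0))
```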

arXiv:2212.13582 [pdf, other]

Distribution-aware $\ell_1$ Analysis Minimization

Authors: Raziyeh Takbiri, Sajad Daei

Abstract: This work concerns recovering an analysis-sparse vector, i.e., a vector that is sparse in some transform domain, from under-sampled measurements. In real-world applications, there often exist random analysis-sparse vectors whose distribution in the analysis domain is known. To exploit this information, a weighted $\ell_1$ analysis minimization is often considered. Choosing the weights in this case, however, is challenging and non-trivial. In this work, we provide an analytical method to choose suitable weights. Specifically, we first obtain a tight upper-bound expression for the expected number of required measurements. This bound depends on two critical parameters, the support distribution and the expected sign in the analysis domain, both of which are accessible in advance. Then, we calculate near-optimal weights by minimizing this expression with respect to the weights. Our strategy works in both noiseless and noisy settings. Numerical results demonstrate the superiority of the proposed method: the weighted $\ell_1$ analysis minimization with our near-optimal weights needs considerably fewer measurements than its regular $\ell_1$ analysis counterpart.

Submitted 27 December, 2022; originally announced December 2022.

arXiv:2210.16587 [pdf, other]

Relating Human Perception of Musicality to Prediction in a Predictive Coding Model

Authors: Nikolas McNeal, Jennifer Huang, Aniekan Umoren, Shuqi Dai, Roger Dannenberg, Richard Randall, Tai Sing Lee

Abstract: We explore the use of a neural network inspired by predictive coding for modeling human music perception. This network was developed based on the computational neuroscience theory of recurrent interactions in the hierarchical visual cortex. When trained with video data using self-supervised learning, the model manifests behaviors consistent with human visual illusions. Here, we adapt this network to model the hierarchical auditory system and investigate whether it will make similar choices to humans regarding the musicality of a set of random pitch sequences. When the model is trained with a large corpus of instrumental classical music and popular melodies rendered as mel spectrograms, it exhibits greater prediction errors for random pitch sequences that are rated less musical by human subjects. We found that the prediction error depends on the amount of information regarding the subsequent note, the pitch interval, and the temporal context. Our findings suggest that predictability is correlated with human perception of musicality and that a predictive coding neural network trained on music can be used to characterize the features and motifs contributing to human perception of music.

Submitted 29 October, 2022; originally announced October 2022.

Comments: 5 pages, 5 figures, currently in peer review

arXiv:2209.00182 [pdf, other]

What is missing in deep music generation? A study of repetition and structure in popular music

Authors: Shuqi Dai, Huiran Yu, Roger B. Dannenberg

Abstract: Structure is one of the most essential aspects of music, and music structure is commonly indicated through repetition. However, the nature of repetition and structure in music is still not well understood, especially in the context of music generation, and much remains to be explored with Music Information Retrieval (MIR) techniques. Analyses of two popular music datasets (Chinese and American) illustrate important music construction principles: (1) structure exists at multiple hierarchical levels, (2) songs use repetition and limited vocabulary so that individual songs do not follow general statistics of song collections, (3) structure interacts with rhythm, melody, harmony, and predictability, and (4) over the course of a song, repetition is not random, but follows a general trend as revealed by cross-entropy. These and other findings offer challenges as well as opportunities for deep-learning music generation and suggest new formal music criteria and evaluation methods. Music from recent music generation systems is analyzed and compared to human-composed music in our datasets, often revealing striking differences from a structural perspective.

Submitted 31 August, 2022; originally announced September 2022.

Comments: In Proceedings of the 23rd Int. Society for Music Information Retrieval (ISMIR) 2022

arXiv:2205.07092 [pdf, other]

doi 10.1109/TSP.2023.3254140

Blind Goal-Oriented Massive Access for Future Wireless Networks

Authors: Sajad Daei, Marios Kountouris

Abstract: Emerging communication networks are envisioned to support massive wireless connectivity of heterogeneous devices with sporadic traffic and diverse requirements in terms of latency, reliability, and bandwidth. In this context, providing multiple access to an increasing number of uncoordinated users and sharing the limited resources become essential. In this work, we revisit the random access (RA) problem and exploit the continuous angular group sparsity of wireless channels to propose a novel RA strategy that provides low latency, high reliability, and massive access with limited bandwidth resources in an all-in-one package. To this end, we first design a reconstruction-free goal-oriented optimization problem that preserves only the angular information required to identify the active devices. To solve it, we propose an alternating direction method of multipliers (ADMM) algorithm and derive closed-form expressions for each ADMM step. Then, we design a clustering algorithm that assigns the users to specific groups, from which we can identify active stationary devices by their angles. For mobile devices, we propose an alternating minimization algorithm to recover their data and channel gains simultaneously, which allows us to identify active mobile users. Simulation results show significant performance gains in terms of active user detection and false alarm probabilities compared to state-of-the-art RA schemes, even with a limited number of preambles.
Moreover, unlike prior work, the performance of the proposed blind goal-oriented massive access does not depend on the number of devices.

Submitted 14 May, 2022; originally announced May 2022.

arXiv:2109.00663 [pdf, other]

Controllable deep melody generation via hierarchical music structure representation

Authors: Shuqi Dai, Zeyu Jin, Celso Gomes, Roger B. Dannenberg

Abstract: Recent advances in deep learning have expanded possibilities to generate music, but generating a customizable full piece of music with consistent long-term structure remains a challenge. This paper introduces MusicFrameworks, a hierarchical music structure representation and a multi-step generative process to create a full-length melody guided by long-term repetitive structure, chord, melodic contour, and rhythm constraints. We first organize the full melody with section and phrase-level structure. To generate melody in each phrase, we generate rhythm and basic melody using two separate transformer-based networks, and then generate the melody conditioned on the basic melody, rhythm and chords in an auto-regressive manner. By factoring music generation into sub-problems, our approach allows simpler models and requires less data. To customize or add variety, one can alter chords, basic melody, and rhythm structure in the music frameworks, letting our networks generate the melody accordingly. Additionally, we introduce new features to encode musical positional information, rhythm patterns, and melodic contours based on musical domain knowledge. A listening test reveals that melodies generated by our method are rated as good as or better than human-composed music in the POP909 dataset about half the time.

Submitted 1 September, 2021; originally announced September 2021.

Comments: 6 pages, 9 figures, in Proc. of the 22nd Int. Society for Music Information Retrieval Conf., Online, 2021

arXiv:2108.09498 [pdf, other]

doi 10.1109/LSP.2022.3165759

Active User Detection and Channel Estimation for Spatial-based Random Access in Crowded Massive MIMO Systems via Blind Super-resolution

Authors: Abolghasem Afshar, Vahid Tabataba Vakili, Sajad Daei

Abstract: This work presents a novel framework for random access in crowded multiple-input multiple-output (MIMO) systems with a multi-antenna base station (BS) and multiple single-antenna users. In such systems, a large portion of the system resources is dedicated to orthogonal pilots for accurate channel estimation, which imposes a large training overhead. This overhead can be greatly mitigated by exploiting the intrinsic angular-domain sparsity of massive MIMO channels and the sporadic traffic of users, i.e., only a few users are active to send or receive data in each coherence interval. The angles of arrival (AoAs) of active users are continuous parameters and can take arbitrary values. Moreover, the AoAs corresponding to each active user lie next to each other, forming a distinct cluster. This work revolves around exploiting these features. Specifically, a blind clustering-based algorithm is proposed that not only recovers the data transmitted by users in grant-free random access and the primary pilots in the random access blocks of coherent transmission, but also provides accurate channel estimation. Our approach is based on transforming the unknown variables into a higher-dimensional space with matrix variables. An off-grid atomic norm minimization is then proposed to obtain the unknown matrix from only a few observed arrays at the BS. Then, a clustering-based approach is employed to identify which AoAs correspond to which active users.
After identifying the active users and their AoAs, an alternating approach is used to obtain the channels and the data or primary pilots of the active users. Simulation results demonstrate the effectiveness of our approach in AoA detection as well as data recovery.

Submitted 26 April, 2022; v1 submitted 21 August, 2021; originally announced August 2021.

Journal ref: IEEE Signal Processing Letters, April 2022
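
This abstract relies on the AoAs of each active user lying close together. As a hypothetical illustration of that grouping idea only (the paper's actual clustering algorithm is not described in the abstract), the sketch below sorts estimated AoAs and starts a new cluster wherever the gap to the previous angle exceeds a threshold; in one dimension this is the same as single-linkage clustering cut at that distance. It assumes angles in degrees within [-90, 90], so no wrap-around handling is needed, and `max_gap_deg` is an assumed tuning parameter.

```python
def cluster_aoas(aoas_deg, max_gap_deg=5.0):
    """Group 1-D angle estimates into clusters separated by gaps larger than max_gap_deg."""
    clusters = []
    for angle in sorted(aoas_deg):
        if clusters and angle - clusters[-1][-1] <= max_gap_deg:
            clusters[-1].append(angle)  # close to the previous angle: same user
        else:
            clusters.append([angle])    # large gap: start a new user cluster
    return clusters


# Usage: AoA estimates from three hypothetical users.
print(cluster_aoas([-40.2, 12.0, -38.9, 13.5, 60.1, -41.0, 11.2]))
# -> [[-41.0, -40.2, -38.9], [11.2, 12.0, 13.5], [60.1]]
```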

arXiv:2108.03591 [pdf, other]

FederatedNILM: A Distributed and Privacy-preserving Framework for Non-intrusive Load Monitoring based on Federated Deep Learning

Authors: Shuang Dai, Fanlin Meng, Qian Wang, Xizhong Chen

Abstract: Non-intrusive load monitoring (NILM), which usually relies on machine learning methods and is effective in disaggregating smart meter readings from the household level into appliance-level consumption, can help analyze users' electricity consumption behaviour and enable practical smart energy and smart grid applications. However, smart meters are privately owned and distributed, which makes real-world applications of NILM challenging. To this end, this paper develops a distributed and privacy-preserving federated deep learning framework for NILM (FederatedNILM), which combines federated learning with a state-of-the-art deep learning architecture to perform NILM for classifying the typical states of household appliances. Extensive comparative experiments demonstrate the effectiveness of the proposed FederatedNILM framework.

Submitted 8 August, 2021; originally announced August 2021.
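
The abstract does not state which aggregation rule FederatedNILM uses. Assuming the widely used FedAvg rule (an assumption, not a detail confirmed by the paper), in each round the server replaces the global model with the average of the clients' locally trained parameters, weighted by each client's number of training samples, so raw smart-meter data never leaves the household. A minimal sketch of that aggregation step, with model parameters flattened into plain lists and a hypothetical function name:

```python
def fedavg(client_params, client_sizes):
    """Sample-size-weighted average of client parameter vectors (FedAvg aggregation step)."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = float(sum(client_sizes))
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, n_k in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must share the same model shape")
        weight = n_k / total  # clients with more local data get more influence
        for j, p in enumerate(params):
            global_params[j] += weight * p
    return global_params


# Usage: three households with different amounts of local data.
print(fedavg([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [100, 100, 200]))
# -> [3.5, 4.5]
```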

arXiv:2108.01393 [pdf, other]

Electrical peak demand forecasting- A review

Authors: Shuang Dai, Fanlin Meng, Hongsheng Dai, Qian Wang, Xizhong Chen

Abstract: The power system is undergoing rapid evolution with the roll-out of advanced metering infrastructure and local energy applications (e.g., electric vehicles), as well as the increasing penetration of intermittent renewable energy at both the transmission and distribution levels. These changes make peak load demand more random and less predictable and therefore pose a threat to power grid security. Since storing large quantities of electricity to satisfy load demand is neither economical nor environmentally friendly, effective peak demand management strategies and reliable peak load forecasting methods become essential for optimizing power system operations. To this end, this paper provides a timely and comprehensive overview of peak load demand forecasting methods in the literature. To the best of our knowledge, this is the first comprehensive review on this topic. We first give a precise and unified problem definition of peak load demand forecasting. Second, 139 papers on peak load forecasting methods are systematically reviewed, and the methods are classified into different stages based on the timeline. Third, a comparative analysis of peak load forecasting methods is summarized, and different optimization methods for improving forecast performance are discussed. The paper ends with a comprehensive summary of the reviewed papers and a discussion of potential future research directions.

Submitted 3 August, 2021; originally announced August 2021.

arXiv:2105.04709 [pdf, other]

Personalized Popular Music Generation Using Imitation and Structure

Authors: Shuqi Dai, Xichu Ma, Ye Wang, Roger B. Dannenberg

Abstract: Many approaches to music generation have been presented recently. While stylistic music generation using deep learning techniques has become mainstream, these models still struggle to generate music with high musicality, multiple levels of music structure, and controllability. In addition, some application scenarios, such as music therapy, require imitating more specific musical styles from a few given music examples rather than capturing the overall genre style of a large data corpus. To address requirements that challenge current deep learning methods, we propose a statistical machine learning model that is able to capture and imitate the structure, melody, chord, and bass style of a given example seed song. An evaluation using 10 pop songs shows that our new representations and methods are able to create high-quality stylistic music that is similar to a given input song. We also discuss potential uses of our approach in music evaluation and music therapy.

Submitted 10 May, 2021; originally announced May 2021.

Comments: 26 pages, 12 figures

arXiv:2010.12732 [pdf]

doi 10.1109/TUFFC.2020.3000055

Octave-Tunable Magnetostatic Wave YIG Resonators on a Chip

Authors: Sen Dai, Sunil A. Bhave, Renyuan Wang

Abstract: We have designed, fabricated, and characterized magnetostatic wave (MSW) resonators on a chip. The resonators are fabricated by patterning a single-crystal yttrium iron garnet (YIG) film on a gadolinium gallium garnet (GGG) substrate and are excited by loop-inductor transducers. We achieved this technology breakthrough by developing a YIG film etching process and fabricating a thick aluminum coplanar waveguide (CPW) inductor loop around each resonator to individually address and excite MSWs. At 4.77 GHz, the 0.68 mm² resonator achieves a quality factor Q > 5000 with a bias field of 987 Oe. We also demonstrate tuning of the YIG resonator by more than one octave, from 3.63 to 7.63 GHz, by applying an in-plane external magnetic field. The measured quality factor of the resonator is consistently above 3000 at frequencies above 4 GHz. The micromachining technology enables the fabrication of multiple single- and two-port YIG resonators on the same chip, all of which demonstrate octave tunability and high Q.

Submitted 23 October, 2020; originally announced October 2020.

arXiv:2010.07518 [pdf, other]

Automatic Analysis and Influence of Hierarchical Structure on Melody, Rhythm and Harmony in Popular Music

Authors: Shuqi Dai, Huan Zhang, Roger B. Dannenberg

Abstract: Repetition is a basic indicator of musical structure. This study introduces new algorithms for identifying musical phrases based on repetition. Phrases combine to form sections yielding a two-level hierarchical structure. Automatically detected hierarchical repetition structures reveal significant interactions between structure and chord progressions, melody and rhythm. Different levels of hierarchy interact differently, providing evidence that structural hierarchy plays an important role in music beyond simple notions of repetition or similarity. Our work suggests new applications for music generation and music evaluation.

Submitted 15 October, 2020; originally announced October 2020.

Comments: In Proceedings of the 2020 Joint Conference on AI Music Creativity (CSMC-MuMe 2020), Stockholm, Sweden, October 21-24, 2020

arXiv:2008.07142 [pdf, other]

POP909: A Pop-song Dataset for Music Arrangement Generation

Authors: Ziyu Wang, Ke Chen, Junyan Jiang, Yiyi Zhang, Maoran Xu, Shuqi Dai, Xianbin Gu, Gus Xia

Abstract: Music arrangement generation is a subtask of automatic music generation that involves reconstructing and re-conceptualizing a piece with new compositional techniques. Such a generation process inevitably requires reference to the original melody, chord progression, or other structural information. Although some promising models for arrangement exist, they lack more refined data to achieve better evaluations and more practical results. In this paper, we propose POP909, a dataset that contains multiple versions of the piano arrangements of 909 popular songs created by professional musicians. The main body of the dataset contains the vocal melody, the lead instrument melody, and the piano accompaniment for each song in MIDI format, aligned to the original audio files. Furthermore, we provide annotations of tempo, beat, key, and chords; the tempo curves are hand-labeled, and the other annotations are produced by MIR algorithms. Finally, we conduct several baseline experiments with this dataset using standard deep music generation algorithms.

Submitted 17 August, 2020; originally announced August 2020.

Journal ref: In Proceedings of 21st International Conference on Music Information Retrieval (ISMIR), Montreal, Canada (virtual conference), 2020

arXiv:2004.00259 [pdf, ps, other]

doi 10.1016/j.sigpro.2022.108786

Demixing Sines and Spikes Using Multiple Measurement Vectors

Authors: Hoomaan Maskan, Sajad Daei, Mohammad Hossein Kahaei

Abstract: In this paper, we address the line spectral estimation problem with multiple corrupted measurement vectors. Such scenarios appear in many practical applications, such as radar, optics, and seismic imaging, in which the signal of interest can be modeled as the sum of a spectrally sparse signal and a block-sparse signal known as the outlier. Our aim is to demix the two components, and for that we design a convex problem whose objective function promotes both structures. Using positive trigonometric polynomial (PTP) theory, we reformulate the dual problem as a semidefinite program (SDP). Our theoretical results state that for a fixed number of measurements N and a constant number of outliers, up to O(N) spectral lines can be recovered using our SDP problem as long as a minimum frequency separation condition is satisfied. Our simulation results also show that increasing the number of samples per measurement vector reduces the minimum frequency separation required for successful recovery.

Submitted 23 September, 2022; v1 submitted 1 April, 2020; originally announced April 2020.

Comments: 33 pages, 8 figures. Signal Processing (2022)

arXiv:1911.11825 [pdf, other]

Autonomous WiFi Fingerprinting for Indoor Localization

Authors: Shilong Dai, Liang He, Xuebo Zhang

Abstract: WiFi-based indoor localization has received extensive attention from both academia and industry. However, the overhead of constructing and maintaining the WiFi fingerprint map remains a bottleneck for the wide deployment of WiFi-based indoor localization systems. Recently, robots have been adopted as professional surveyors to fingerprint the environment autonomously, but time and energy costs still limit the coverage of the robot surveyor and thus reduce its scalability. To address this, we design an Autonomous WiFi Fingerprinting system, called AuF, which autonomously constructs the fingerprint database in a time- and energy-efficient manner. AuF first conducts an automatic initialization process in the target indoor environment, then constructs the WiFi fingerprint database in two steps: (i) surveying the site without sojourn, and (ii) recovering unreliable signals in the database with two methods. We have implemented and evaluated AuF using a Pioneer 3-DX robot on two sites of our 70 × 90 m² department building with different structures and deployments of access points (APs). The results show that AuF finishes the fingerprint database construction in 43/51 minutes and consumes 60/82 Wh on the two floors, respectively, which is a 64%/71% and 61%/64% reduction compared to traditional site survey methods, without degrading localization accuracy.

Submitted 26 November, 2019; originally announced November 2019.

Comments: mobile computing, 10 pages

arXiv:1903.10459 [pdf, other]

Spatial Consistency Evaluation Based on Massive SIMO Measurements

Authors: Sida Dai, Martin Kurras

Abstract: In this paper, the spatial consistency of wireless massive single-input multiple-output (SIMO) channels in a cellular small-cell scenario is evaluated based on measurements taken in Berlin. The evaluation is done by computing the similarity of covariance matrices over distance, using the correlation matrix distance as the similarity measure. The measurement tracks are classified into four categories based on the shape of the resulting curves. The results indicate that spatial consistency is a highly deterministic property, in the sense that it depends strongly on the individual environment and much less on large-scale parameters. Therefore, we conclude that spatial consistency is not sufficiently modelled by the current 3rd Generation Partnership Project (3GPP) feature.
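For readers unfamiliar with the similarity measure named above, the sketch below shows how a correlation matrix distance (CMD) between two channel covariance matrices can be computed. The abstract does not spell out the formula, so the definition used here, d = 1 − tr(R1 R2) / (‖R1‖_F ‖R2‖_F), is an assumption based on the commonly used CMD; the helper names `sample_covariance`, `frobenius_norm`, and `correlation_matrix_distance` are hypothetical and not taken from the paper. The sketch is pure Python so it runs without extra packages.

```python
import math
from typing import List, Sequence

Matrix = List[List[complex]]


def sample_covariance(snapshots: Sequence[Sequence[complex]]) -> Matrix:
    """Estimate R = (1/N) * sum_n h_n h_n^H from N channel snapshots h_n (one list per snapshot)."""
    n = len(snapshots)
    m = len(snapshots[0])
    r = [[0j] * m for _ in range(m)]
    for h in snapshots:
        for i in range(m):
            for j in range(m):
                # Outer product h h^H, averaged over snapshots.
                r[i][j] += h[i] * h[j].conjugate() / n
    return r


def frobenius_norm(a: Matrix) -> float:
    """Frobenius norm: square root of the sum of squared magnitudes of all entries."""
    return math.sqrt(sum(abs(x) ** 2 for row in a for x in row))


def correlation_matrix_distance(r1: Matrix, r2: Matrix) -> float:
    """Assumed CMD definition: 1 - tr(R1 R2) / (||R1||_F ||R2||_F).

    For Hermitian positive semidefinite inputs the result lies in [0, 1]:
    0 when the matrices are equal up to a positive scale factor,
    1 when their spatial structures are orthogonal.
    """
    m = len(r1)
    # tr(R1 R2) = sum_{i,j} R1[i][j] * R2[j][i]; real-valued for Hermitian inputs.
    trace = sum(r1[i][j] * r2[j][i] for i in range(m) for j in range(m))
    return 1.0 - trace.real / (frobenius_norm(r1) * frobenius_norm(r2))
```

In a spatial-consistency evaluation of this kind, one would presumably fix a reference covariance at one position on a measurement track and compute the CMD to covariances estimated at increasing distances, giving a similarity-versus-distance curve. The pure-Python loops are chosen for self-containment; a NumPy version would be shorter for large arrays.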

Submitted 25 March, 2019; originally announced March 2019.

arXiv:1808.03549 [pdf, other]

Evaluation of the Spatial Consistency Feature in the 3GPP GSCM Channel Model

Authors: Martin Kurras, Sida Dai, Stephan Jaeckel, Lars Thiele

Abstract: Since the development of 4G networks, Multiple-Input Multiple-Output (MIMO) and later multiple-user MIMO have become a mature means of increasing the spectral efficiency of mobile communication networks. An essential part of simultaneous multiple-user communication is the grouping of users with complementary channel properties. With the introduction of Base Stations (BSs) with a large number of antenna ports, i.e., transceiver units, the focus of spatial precoding has moved from uniform to heterogeneous cell coverage, with changing traffic demands throughout the cell, and to 3D beamforming. To deal with the increasing feedback requirements of Frequency-Division Duplex (FDD) systems, concepts for user clustering based on second-order statistics have been suggested in both the scientific and standardization literature. Earlier 3rd Generation Partnership Project (3GPP) Geometry-based Stochastic Channel Models (GSCMs) lack the required spatial correlation of small-scale fading. The latest release of the 3GPP GSCM is claimed to solve this issue, and hence our contribution is an evaluation of this spatial consistency feature.

Submitted 10 August, 2018; originally announced August 2018.

arXiv:1803.06841 [pdf, other]

Music Style Transfer: A Position Paper

Authors: Shuqi Dai, Zheng Zhang, Gus G. Xia

Abstract: Led by the success of neural style transfer in the visual arts, there has recently been a rising trend of efforts in music style transfer. However, "music style" is not yet a well-defined concept from a scientific point of view. The difficulty lies in the intrinsic multi-level and multi-modal character of music representation (which is very different from image representation). As a result, depending on their interpretation of "music style", current studies under the category of "music style transfer" are actually solving completely different problems that belong to a variety of sub-fields of Computer Music. Also, a vanilla end-to-end approach, which aims to deal with all levels of music representation at once by directly adopting the method of image style transfer, leads to poor results. Thus, we propose a more scientifically viable definition of music style transfer by breaking it down into precise concepts of timbre style transfer, performance style transfer, and composition style transfer, and we connect different aspects of music style transfer with existing well-established sub-fields of computer music studies. In addition, we discuss the current limitations of music style modeling and its future directions, drawing inspiration from some deep generative models, especially those using unsupervised learning and disentanglement techniques.

Submitted 19 July, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

Comments: In Proceedings of the International Workshop on Musical Metacreation (MUME), 2018, Salamanca, Spain

Showing 1–36 of 36 results for author: Dai, S