-
Aliasing Reduction in Neural Amp Modeling by Smoothing Activations
Authors:
Ryota Sato,
Julius O. Smith III
Abstract:
The increasing demand for high-quality digital emulations of analog audio hardware such as vintage guitar amplifiers has led to numerous works in neural-network-based black-box modeling, with deep learning architectures like WaveNet showing promising results. However, a key limitation in all of these models is the aliasing artifacts that arise from the use of nonlinear activation functions in neural networks. In this paper, we investigate novel and modified activation functions aimed at mitigating aliasing within neural amplifier models. Supporting this, we introduce a novel metric, the Aliasing-to-Signal Ratio (ASR), which quantitatively assesses the level of aliasing with high accuracy. Measuring also the conventional Error-to-Signal Ratio (ESR), we conducted studies on a range of preexisting and modern activation functions with varying stretch factors. Our findings confirmed that activation functions with smoother curves tend to achieve lower ASR values, indicating a noticeable reduction in aliasing. Notably, this improvement in aliasing reduction was achievable without a substantial increase in ESR, demonstrating the potential for high modeling accuracy with reduced aliasing in neural amp models.
Submitted 6 May, 2025;
originally announced May 2025.
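The abstract above reports the conventional Error-to-Signal Ratio (ESR) but does not define it. A minimal sketch follows, assuming the commonly used form in which the squared error energy is normalized by the target signal energy. The function name and the plain-list signal representation are illustrative assumptions, not taken from the paper, and the new ASR metric is not sketched because the abstract gives no formula for it.

```python
def error_to_signal_ratio(target, predicted):
    """Assumed ESR definition: sum((target - predicted)^2) / sum(target^2)."""
    if len(target) != len(predicted):
        raise ValueError("signals must have the same length")
    # Energy of the reference (target) signal, used as the normalizer.
    signal_energy = sum(t * t for t in target)
    if signal_energy == 0:
        raise ValueError("target signal has zero energy")
    # Energy of the sample-wise modeling error.
    error_energy = sum((t - p) ** 2 for t, p in zip(target, predicted))
    return error_energy / signal_energy
```

Under this definition a perfect model gives an ESR of 0, and a model that outputs silence gives an ESR of 1.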
-
TetraGrip: Sensor-Driven Multi-Suction Reactive Object Manipulation in Cluttered Scenes
Authors:
Paolo Torrado,
Joshua Levin,
Markus Grotz,
Joshua Smith
Abstract:
Warehouse robotic systems equipped with vacuum grippers must reliably grasp a diverse range of objects from densely packed shelves. However, these environments present significant challenges, including occlusions, diverse object orientations, stacked and obstructed items, and surfaces that are difficult to suction. We introduce TetraGrip, a novel vacuum-based grasping strategy featuring four suction cups mounted on linear actuators. Each actuator is equipped with an optical time-of-flight (ToF) proximity sensor, enabling reactive grasping.
We evaluate TetraGrip in a warehouse-style setting, demonstrating its ability to manipulate objects in stacked and obstructed configurations. Our results show that our RL-based policy improves picking success in stacked-object scenarios by 22.86% compared to a single-suction gripper. Additionally, we demonstrate that TetraGrip can successfully grasp objects in scenarios where a single-suction gripper fails due to physical limitations, specifically in two cases: (1) picking an object occluded by another object and (2) retrieving an object in a complex scenario. These findings highlight the advantages of multi-actuated, suction-based grasping in unstructured warehouse environments. The project website is available at: https://tetragrip.github.io/.
Submitted 11 March, 2025;
originally announced March 2025.
-
Epitaxial high-K AlBN barrier GaN HEMTs
Authors:
Chandrashekhar Savant,
Thai-Son Nguyen,
Kazuki Nomoto,
Saurabh Vishwakarma,
Siyuan Ma,
Akshey Dhar,
Yu-Hsin Chen,
Joseph Casamento,
David J. Smith,
Huili Grace Xing,
Debdeep Jena
Abstract:
We report a polarization-induced 2D electron gas (2DEG) at an epitaxial AlBN/GaN heterojunction grown on a SiC substrate. Using this 2DEG in a long conducting channel, we realize ultra-thin barrier AlBN/GaN high electron mobility transistors that exhibit current densities of more than 0.25 A/mm, clean current saturation, a low pinch-off voltage of -0.43 V, and a peak transconductance of 0.14 S/mm. Transistor performance in this preliminary realization is limited by the contact resistance. Capacitance-voltage measurements reveal that introducing 7 % B in the epitaxial AlBN barrier on GaN boosts the relative dielectric constant of AlBN to 16, higher than the AlN dielectric constant of 9. Epitaxial high-K barrier AlBN/GaN HEMTs can thus extend performance beyond the capabilities of current GaN transistors.
Submitted 26 February, 2025;
originally announced February 2025.
-
A Hodge-FAST Framework for High-Resolution Dynamic Functional Connectivity Analysis of Higher Order Interactions in EEG Signals
Authors:
Om Roy,
Yashar Moshfeghi,
Jason Smith,
Agustin Ibanez,
Mario A. Parra,
Keith M. Smith
Abstract:
We introduce a novel framework that integrates Hodge decomposition with Filtered Average Short-Term (FAST) functional connectivity to analyze dynamic functional connectivity (DFC) in EEG signals. This method leverages graph-based topology and simplicial analysis to explore transient connectivity patterns at multiple scales, addressing noise, sparsity, and computational efficiency. The temporal EEG data are first sparsified by keeping only the most globally important connections; instantaneous connectivity at these connections is then filtered by global long-term stable correlations. This tensor is then decomposed into three orthogonal components to study signal flows over higher-order structures such as triangle and loop structures. Our analysis of Alzheimer-related MCI patients shows significant temporal differences related to higher-order interactions that a pairwise analysis on its own does not implicate. This allows us, for the first time, to capture higher-dimensional interactions at high temporal resolution in noisy EEG signal recordings.
Submitted 7 February, 2025; v1 submitted 31 January, 2025;
originally announced February 2025.
-
Electrostatic Clutches Enable Simultaneous Mechanical Multiplexing
Authors:
Timothy E. Amish,
Jeffrey T. Auletta,
Chad C. Kessens,
Joshua R. Smith,
Jeffrey I. Lipton
Abstract:
Actuating robotic systems with multiple degrees of freedom (DoF) traditionally requires numerous motors, leading to increased size, weight, cost, and power consumption. Mechanical multiplexing offers a solution by enabling a single actuator to control multiple DoF. However, existing multiplexers have either been limited to electrically controlled time-based multiplexing that controls one DoF at a time or have relied on mechanical switching to control multiple DoF simultaneously. There is a strong need for a system that can perform electrically controlled multiplexing for both time-based and simultaneous control of multiple DoF. This study introduces a novel electrostatic capstan clutch-based mechanical multiplexer that enables high-force, single-motor control of multiple DoF. Here, we show that our system achieves both single-input-single-output (SISO) and single-input-multiple-output (SIMO) actuation, allowing bidirectional control and position holding with minimal power consumption. Each output can actuate a 22.24 N load, limited by clutch performance, up to 5 cm. The number of outputs and the actuation length are currently limited by the length of the drive shaft. We demonstrate the integration of our system into a 4-DoF commercial robotic hand using a single motor. These findings show that electrostatic clutch-based multiplexing provides a scalable and energy-efficient design solution for high-DoF robotic platforms, opening new possibilities for lightweight and power-efficient actuation in robotics.
Submitted 21 March, 2025; v1 submitted 14 January, 2025;
originally announced January 2025.
-
A Digital twin for Diesel Engines: Operator-infused PINNs with Transfer Learning for Engine Health Monitoring
Authors:
Kamaljyoti Nath,
Varun Kumar,
Daniel J. Smith,
George Em Karniadakis
Abstract:
Improving diesel engine efficiency and emission reduction have been critical research topics. Recent government regulations have shifted this focus to another important area related to engine health and performance monitoring. Although the advancements in the use of deep learning methods for system monitoring have shown promising results in this direction, designing efficient methods suitable for field systems remains an open research challenge. The objective of this study is to develop a computationally efficient neural network-based approach for identifying unknown parameters of a mean value diesel engine model to facilitate physics-based health monitoring and maintenance forecasting. We propose a hybrid method combining physics-informed neural networks (PINNs) and a deep neural operator (DeepONet) to predict unknown parameters and gas flow dynamics in a diesel engine. The operator network predicts independent actuator dynamics learnt through offline training, thereby reducing the PINNs' online computational cost. To address the PINNs' need for retraining with changing input scenarios, we propose two transfer learning (TL) strategies. The first strategy involves multi-stage transfer learning for parameter identification. While this method is computationally efficient compared to online PINN training, improvements are required to meet field requirements. The second TL strategy focuses solely on training the output weights and biases of a subset of multi-head networks pretrained on a larger dataset, substantially reducing computation time during online prediction. We also evaluate our model for epistemic and aleatoric uncertainty by incorporating dropout in pretrained networks and Gaussian noise in the training dataset. This strategy offers a tailored, computationally inexpensive, and physics-based approach for parameter identification in diesel engine subsystems.
Submitted 16 December, 2024;
originally announced December 2024.
-
Phase Selection and Analysis for Multi-frequency Multi-user RIS Systems Employing Subsurfaces in Correlated Ricean and Rayleigh Environments
Authors:
Amy S. Inwood,
Peter J. Smith,
Philippa A. Martin,
Graeme K. Woodward
Abstract:
We analyse the performance of a reconfigurable intelligent surface (RIS) aided system where the RIS is divided into subsurfaces. Each subsurface is designed specifically for one user, who is served on their own frequency band. The other subsurfaces (those not designed for this user) provide additional uncontrolled scattering. We derive the exact closed-form expression for the mean signal-to-noise ratio (SNR) of the subsurface design (SD) when all channels experience correlated Ricean fading. We simplify this to find the mean SNR for line-of-sight (LoS) channels and channels experiencing correlated Rayleigh fading. An iterative SD (ISD) process is proposed, where subsurfaces are designed sequentially, and the phases that are already set are used to enhance the design of the remaining subsurfaces. This is extended to a converged ISD (CISD), where the ISD process is repeated multiple times until the SNR increases by less than a specified tolerance. The ISD and CISD both provide a performance improvement over SD, which increases as the number of RIS elements (N) increases. The SD is significantly simpler than the lowest complexity multi-user (MU) method we know of, and despite each user having less bandwidth, the SD outperforms the existing method in some key scenarios. The SD is more robust to strongly LoS channels and clustered users, as it does not rely on spatial multiplexing like other MU methods. Combined with the complexity reduction, this makes the SD an attractive phase selection method.
Submitted 24 November, 2024;
originally announced November 2024.
-
Rician Channel Modelling for Super Wideband MIMO Communications
Authors:
Sachitha C. Bandara,
Peter J. Smith,
Erfan Khordad,
Robin Evans,
Rajitha Senanayake
Abstract:
Recent developments in Multiple-Input-Multiple-Output (MIMO) technology include packing a large number of antenna elements in a compact array to access the bandwidth benefits provided by higher mutual coupling (MC). The resulting super-wideband (SW) systems require a circuit-theoretic framework to handle the MC and channel models which span extremely large bands. Hence, in this paper, we make two key contributions. First, we develop a physically-consistent Rician channel model for use with SW systems. Secondly, we express the circuit-theoretic models in terms of a standard MIMO model, so that insights into the effects of antenna layouts, MC, and bandwidth can be made using standard communication theory. For example, we show the bandwidth widening resulting from the new channel model. In addition, we show that MC distorts line-of-sight paths which has beamforming implications. We also highlight the interaction between spatial correlation and MC and show that tight coupling reduces spatial correlations at low frequencies.
Submitted 26 May, 2025; v1 submitted 4 November, 2024;
originally announced November 2024.
-
SymPAC: Scalable Symbolic Music Generation With Prompts And Constraints
Authors:
Haonan Chen,
Jordan B. L. Smith,
Janne Spijkervet,
Ju-Chiang Wang,
Pei Zou,
Bochen Li,
Qiuqiang Kong,
Xingjian Du
Abstract:
Progress in the task of symbolic music generation may be lagging behind other tasks like audio and text generation, in part because of the scarcity of symbolic training data. In this paper, we leverage the greater scale of audio music data by applying pre-trained MIR models (for transcription, beat tracking, structure analysis, etc.) to extract symbolic events and encode them into token sequences. To the best of our knowledge, this work is the first to demonstrate the feasibility of training symbolic generation models solely from auto-transcribed audio data. Furthermore, to enhance the controllability of the trained model, we introduce SymPAC (Symbolic Music Language Model with Prompting And Constrained Generation), which is distinguished by using (a) prompt bars in encoding and (b) a technique called Constrained Generation via Finite State Machines (FSMs) during inference time. We show the flexibility and controllability of this approach, which may be critical in making music AI useful to creators and users.
Submitted 9 September, 2024; v1 submitted 4 September, 2024;
originally announced September 2024.
-
SGP-RI: A Real-Time-Trainable and Decentralized IoT Indoor Localization Model Based on Sparse Gaussian Process with Reduced-Dimensional Inputs
Authors:
Zhe Tang,
Sihao Li,
Zichen Huang,
Guandong Yang,
Kyeong Soo Kim,
Jeremy S. Smith
Abstract:
As Internet of Things (IoT) devices are deployed in the field, there is an enormous amount of untapped potential in local computing on those IoT devices. Harnessing this potential for indoor localization, therefore, becomes an exciting research area. Conventionally, the training and deployment of indoor localization models are based on centralized servers with substantial computational resources. This centralized approach faces several challenges, including the database's inability to accommodate the dynamic and unpredictable nature of the indoor electromagnetic environment, the model retraining costs, and the susceptibility of centralized servers to security breaches. To mitigate these challenges, we aim to amalgamate the offline and online phases of traditional indoor localization methods using a real-time-trainable and decentralized IoT indoor localization model based on Sparse Gaussian Process with Reduced-dimensional Inputs (SGP-RI), where the number and dimension of the input data are reduced through reference point and wireless access point filtering, respectively. The experimental results, based on a multi-building and multi-floor static database as well as a single-building and single-floor dynamic database, demonstrate that the proposed SGP-RI model with less than half the training samples as inducing inputs can produce localization performance comparable to that of the standard Gaussian Process model with the whole set of training samples. The SGP-RI model enables the decentralization of indoor localization, facilitating its deployment to resource-constrained IoT devices, and thereby could provide enhanced security and privacy as well as reduced costs and network dependency. Also, the model's capability of real-time training makes it possible to quickly adapt to the time-varying indoor electromagnetic environment.
Submitted 24 August, 2024;
originally announced September 2024.
-
Turbulence Strength $C_n^2$ Estimation from Video using Physics-based Deep Learning
Authors:
Ripon Kumar Saha,
Esen Salcin,
Jihoo Kim,
Joseph Smith,
Suren Jayasuriya
Abstract:
Images captured from a long distance suffer from dynamic image distortion due to turbulent flow of air cells with random temperatures, and thus refractive indices. This phenomenon, known as image dancing, is commonly characterized by its refractive-index structure constant $C_n^2$ as a measure of the turbulence strength. For many applications such as atmospheric forecast models, long-range/astronomy imaging, aviation safety, and optical communication technology, $C_n^2$ estimation is critical for accurately sensing the turbulent environment. Previous methods for $C_n^2$ estimation include estimation from meteorological data (temperature, relative humidity, wind shear, etc.) for single-point measurements, two-ended pathlength measurements from an optical scintillometer for path-averaged $C_n^2$, and more recently estimating $C_n^2$ from passive video cameras for low cost and low hardware complexity. In this paper, we present a comparative analysis of classical image gradient methods for $C_n^2$ estimation and modern deep learning-based methods leveraging convolutional neural networks. To enable this, we collect a dataset of video capture along with reference scintillometer measurements for ground truth, and we release this unique dataset to the scientific community. We observe that deep learning methods can achieve higher accuracy when trained on similar data, but suffer from generalization errors to other, unseen imagery as compared to classical methods. To overcome this trade-off, we present a novel physics-based network architecture that combines learned convolutional layers with a differentiable image gradient method that maintains high accuracy while being generalizable across image datasets.
Submitted 29 August, 2024;
originally announced August 2024.
-
State-Free Inference of State-Space Models: The Transfer Function Approach
Authors:
Rom N. Parnichkun,
Stefano Massaroli,
Alessandro Moro,
Jimmy T. H. Smith,
Ramin Hasani,
Mathias Lechner,
Qi An,
Christopher Ré,
Hajime Asama,
Stefano Ermon,
Taiji Suzuki,
Atsushi Yamashita,
Michael Poli
Abstract:
We approach designing a state-space model for deep learning applications through its dual representation, the transfer function, and uncover a highly efficient sequence parallel inference algorithm that is state-free: unlike other proposed algorithms, state-free inference does not incur any significant memory or computational cost with an increase in state size. We achieve this using properties of the proposed frequency domain transfer function parametrization, which enables direct computation of its corresponding convolutional kernel's spectrum via a single Fast Fourier Transform. Our experimental results across multiple sequence lengths and state sizes illustrate, on average, a 35% training speed improvement over S4 layers -- parametrized in time-domain -- on the Long Range Arena benchmark, while delivering state-of-the-art downstream performances over other attention-free approaches. Moreover, we report improved perplexity in language modeling over a long convolutional Hyena baseline, by simply introducing our transfer function parametrization. Our code is available at https://github.com/ruke1ire/RTF.
Submitted 1 June, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
-
Swin transformers are robust to distribution and concept drift in endoscopy-based longitudinal rectal cancer assessment
Authors:
Jorge Tapias Gomez,
Aneesh Rangnekar,
Hannah Williams,
Hannah Thompson,
Julio Garcia-Aguilar,
Joshua Jesse Smith,
Harini Veeraraghavan
Abstract:
Endoscopic images are used at various stages of rectal cancer treatment starting from cancer screening, diagnosis, during treatment to assess response and toxicity from treatments such as colitis, and at follow up to detect new tumor or local regrowth (LR). However, subjective assessment is highly variable and can underestimate the degree of response in some patients, subjecting them to unnecessary surgery, or overestimate response, which places patients at risk of disease spread. Advances in deep learning have shown the ability to produce consistent and objective response assessment for endoscopic images. However, methods for detecting cancers, regrowth, and monitoring response during the entire course of patient treatment and follow-up are lacking. This is because automated diagnosis and rectal cancer response assessment require methods that are robust to inherent imaging illumination variations and confounding conditions (blood, scope, blurring) present in endoscopy images as well as changes to the normal lumen and tumor during treatment. Hence, a hierarchical shifted window (Swin) transformer was trained to distinguish rectal cancer from normal lumen using endoscopy images. Swin as well as two convolutional (ResNet-50, WideResNet-50), and vision transformer (ViT) models were trained and evaluated on follow-up longitudinal images to detect LR on a private dataset as well as on out-of-distribution (OOD) public colonoscopy datasets to detect pre/non-cancerous polyps. Color shifts were applied using optimal transport to simulate distribution shifts. Swin and ResNet models were similarly accurate in the in-distribution dataset. Swin was more accurate than other methods (follow-up: 0.84, OOD: 0.83) even when subject to color shifts (follow-up: 0.83, OOD: 0.87), indicating capability to provide robust performance for longitudinal cancer assessment.
Submitted 30 January, 2025; v1 submitted 6 May, 2024;
originally announced May 2024.
-
Can FSK Be Optimised for Integrated Sensing and Communications?
Authors:
Tian Han,
Peter J Smith,
Urbashi Mitra,
Jamie S Evans,
Rajitha Senanayake
Abstract:
Motivated by the ideal peak-to-average-power ratio and radar sensing capability of traditional frequency-coded radar waveforms, this paper considers the frequency shift keying (FSK) based waveform for joint communications and radar (JCR). An analysis of the probability distributions of its ambiguity function (AF) sidelobe levels (SLs) and peak sidelobe level (PSL) is conducted to study the radar sensing capability of random FSK. Numerical results show that the independent frequency modulation introduces uncontrollable AF PSLs. In order to address this problem, the initial phases of waveform sub-pulses are designed by solving a min-max optimisation problem. Numerical results indicate that the optimisation-based phase design can effectively reduce the AF PSL to a level close to well-designed radar waveforms while having no impact on the data rate and the receiver complexity. For large numbers of waveform sub-pulses and modulation orders, the impact on the error probability is also insignificant.
Submitted 1 May, 2024;
originally announced May 2024.
-
Anatomical basis of human sex differences in ECG identified by automated torso-cardiac three-dimensional reconstruction
Authors:
Hannah J. Smith,
Blanca Rodriguez,
Yuling Sang,
Marcel Beetz,
Robin Choudhury,
Vicente Grau,
Abhirup Banerjee
Abstract:
Background and Aims: The electrocardiogram (ECG) is routinely used for diagnosis and risk stratification following myocardial infarction (MI), though its interpretation is confounded by anatomical variability and sex differences. Women have a higher incidence of missed MI diagnosis and poorer outcomes following infarction. Sex differences in ECG biomarkers and torso-ventricular anatomy have not been well characterised, largely due to the absence of high-throughput torso reconstruction methods.
Methods: This work presents quantification of sex differences in ECG versus anatomical biomarkers in healthy and post-MI subjects, enabled by a novel, end-to-end automated pipeline for torso-ventricular anatomical reconstruction from clinically standard cardiac magnetic resonance imaging. Personalised 3D torso-ventricular reconstructions were generated for 425 post-MI subjects and 1051 healthy controls from the UK Biobank. Regression models were created relating the extracted torso-ventricular and ECG parameters.
Results: Half the sex difference in QRS durations is explained by smaller ventricles in women both in healthy ($3.4 \pm 1.3$ms of $6.0 \pm 1.5$ms) and post-MI ($4.5 \pm 1.4$ms of $8.3 \pm 2.5$ms) subjects. Lower baseline STj amplitude in women is also associated with smaller ventricles, and more superior and posterior cardiac position. Post-MI T wave amplitude and R axis deviations are more strongly associated with a more posterior and horizontal cardiac position in women rather than electrophysiology as in men.
Conclusion: A novel computational pipeline enables the three-dimensional reconstruction of 1476 torso-cardiac geometries of healthy and post-myocardial infarction subjects, quantification of sex and BMI-related differences and association with ECG biomarkers. Any ECG-based tool should be reviewed considering anatomical sex differences to avoid sex-biased outcomes.
Submitted 17 July, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
Johnsen-Rahbek Capstan Clutch: A High Torque Electrostatic Clutch
Authors:
Timothy E. Amish,
Jeffrey T. Auletta,
Chad C. Kessens,
Joshua R. Smith,
Jeffrey I. Lipton
Abstract:
In many robotic systems, the holding state consumes power, limits operating time, and increases operating costs. Electrostatic clutches have the potential to improve robotic performance by generating holding torques with low power consumption. A key limitation of electrostatic clutches has been their low specific shear stresses which restrict generated holding torque, limiting many applications. H…
▽ More
In many robotic systems, the holding state consumes power, limits operating time, and increases operating costs. Electrostatic clutches have the potential to improve robotic performance by generating holding torques with low power consumption. A key limitation of electrostatic clutches has been their low specific shear stresses which restrict generated holding torque, limiting many applications. Here we show how combining the Johnsen-Rahbek (JR) effect with the exponential tension scaling capstan effect can produce clutches with the highest specific shear stress in the literature. Our system generated 31.3 N/cm^2 sheer stress and a total holding torque of 7.1 Nm while consuming only 2.5 mW/cm^2 at 500 V. We demonstrate a theoretical model of an electrostatic adhesive capstan clutch and demonstrate how large angle (theta > 2pi) designs increase efficiency over planar or small angle (theta < pi) clutch designs. We also report the first unfilled polymeric material, polybenzimidazole (PBI), to exhibit the JR-effect.
Submitted 27 March, 2024; v1 submitted 19 December, 2023;
originally announced December 2023.
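The "exponential tension scaling capstan effect" named in the clutch abstract above is conventionally described by the capstan (Euler-Eytelwein) equation, T_hold = T_load * exp(mu * theta). The following is a minimal sketch of that textbook relation only. It does not model the electrostatic adhesion pressure, the paper's own model may differ, and the friction coefficient used here is a hypothetical value chosen for illustration.

```python
import math


def capstan_gain(mu: float, wrap_angle_rad: float) -> float:
    """Ratio T_hold / T_load from the ideal capstan equation exp(mu * theta)."""
    return math.exp(mu * wrap_angle_rad)


# Hypothetical friction coefficient, NOT taken from the paper.
MU = 0.3

# Compare small-angle wraps (theta < pi) with large-angle wraps (theta > 2*pi).
for label, theta in [
    ("pi/2", math.pi / 2),
    ("pi", math.pi),
    ("2*pi", 2 * math.pi),
    ("4*pi", 4 * math.pi),
]:
    print(f"theta = {label:>5}: tension gain = {capstan_gain(MU, theta):.2f}")
```

Because the gain is exponential in the wrap angle, each additional turn multiplies the holding tension by the same factor, which is consistent with the abstract's claim that large-angle designs outperform planar or small-angle ones.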
-
Protecting Massive MIMO-Radar Coexistence: Precoding Design and Power Control
Authors:
Mohamed Elfiatoure,
Mohammadali Mohammadi,
Hien Quoc Ngo,
Peter J. Smith,
Michail Matthaiou
Abstract:
This paper studies the coexistence between a downlink multiuser massive multi-input-multi-output (MIMO) communication system and MIMO radar. The performance of the massive MIMO system with maximum ratio (MR), zero-forcing (ZF), and protective ZF (PZF) precoding designs is characterized in terms of spectral efficiency (SE) and by taking the channel estimation errors and power control into account. The idea of PZF precoding relies on the projection of the information-bearing signal onto the null space of the radar channel to protect the radar against communication signals. We further derive closed-form expressions for the detection probability of the radar system for the considered precoding designs. By leveraging the closed-form expressions for the SE and detection probability, we formulate a power control problem at the radar and base station (BS) to maximize the detection probability while satisfying the per-user SE requirements. This optimization problem can be efficiently tackled via the bisection method by solving a linear feasibility problem. Our analysis and simulations show that the PZF design has the highest detection probability performance among all designs, with intermediate SE performance compared to the other two designs. Moreover, by optimally selecting the power control coefficients at the BS and radar, the detection probability improves significantly.
Submitted 19 December, 2023;
originally announced December 2023.
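The abstract above mentions solving a power control problem "via the bisection method by solving a linear feasibility problem". The general pattern can be sketched as follows, assuming feasibility is monotone in the target value (feasible below some threshold, infeasible above it). The oracle `toy_oracle` and its threshold are hypothetical placeholders, not the paper's linear program.

```python
from typing import Callable


def bisect_max_feasible(
    is_feasible: Callable[[float], bool],
    lo: float,
    hi: float,
    tol: float = 1e-6,
) -> float:
    """Largest target in [lo, hi] (to within tol) for which is_feasible holds.

    Assumes is_feasible(lo) is True and feasibility is monotone in the target.
    """
    if not is_feasible(lo):
        raise ValueError("lower bound must be feasible")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_feasible(mid):
            lo = mid  # target is achievable, so try a larger one
        else:
            hi = mid  # target is too ambitious, so shrink the interval
    return lo


def toy_oracle(target: float) -> bool:
    """Hypothetical stand-in for a feasibility check; threshold is illustrative."""
    return target <= 0.73


best = bisect_max_feasible(toy_oracle, lo=0.0, hi=1.0)
print(f"largest feasible target found: {best:.6f}")
```

In a real setting, each call to the oracle would solve one feasibility problem for a fixed target, so the total cost grows only logarithmically with the required precision.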
-
Continuous Fluid Antenna Systems: Modeling and Analysis
Authors:
Constantinos Psomas,
Peter J. Smith,
Himal A. Suraweera,
Ioannis Krikidis
Abstract:
Fluid antennas (FAs) are a promising technology for introducing flexibility and reconfigurability in wireless networks. Recent research efforts have highlighted the potential gains that can be achieved in comparison to conventional antennas. These works assume that the FA has a discrete number of positions that the liquid can take. However, from a practical standpoint, the liquid moves in a continuous fashion to any point inside the FA. In this paper, we focus on a continuous FA system (CFAS) and present a general framework for its design and analytical evaluation. In particular, we derive closed-form analytical expressions for the level crossing rate (LCR) and the average fade duration of the continuous signal-to-interference ratio (SIR) process over the FA's length. Then, by leveraging the LCR expression, we characterize the system's outage performance with a bound on the cumulative distribution function of the SIR's supremum. Our results confirm that the CFAS outperforms its discrete counterpart and thus provides the performance limits of FA-based systems.
Submitted 2 November, 2023;
originally announced November 2023.
-
Frequency Estimation Using Complex-Valued Shifted Window Transformer
Authors:
Josiah W. Smith,
Murat Torlak
Abstract:
Estimating closely spaced frequency components of a signal is a fundamental problem in statistical signal processing. In this letter, we introduce 1-D real-valued and complex-valued shifted window (Swin) transformers, referred to as SwinFreq and CVSwinFreq, respectively, for line-spectra frequency estimation on 1-D complex-valued signals. Whereas 2-D Swin transformer-based models have gained traction for optical image super-resolution, we introduce for the first time a complex-valued Swin module designed to leverage the complex-valued nature of signals for a wide array of applications. The proposed approach overcomes the limitations of classical algorithms such as the periodogram, MUSIC, and OMP, in addition to the state-of-the-art deep learning approach cResFreq. SwinFreq and CVSwinFreq boast superior performance at low signal-to-noise ratio (SNR) and improved resolution capability while requiring fewer model parameters than cResFreq, making them more suitable for edge and mobile applications. We find that the real-valued SwinFreq outperforms its complex-valued counterpart CVSwinFreq for several tasks while touting a smaller model size. Finally, we apply the proposed techniques for radar range profile super-resolution using real data. The results from both synthetic and real experimentation validate the numerical and empirical superiority of SwinFreq and CVSwinFreq to the state-of-the-art deep learning-based techniques and traditional frequency estimation algorithms. The code and models are publicly available at https://github.com/josiahwsmith10/spectral-super-resolution-swin.
Submitted 17 September, 2023;
originally announced September 2023.
-
Emerging Approaches for THz Array Imaging: A Tutorial Review and Software Tool
Authors:
Josiah W. Smith,
Murat Torlak
Abstract:
Accelerated by the increasing attention drawn by 5G, 6G, and Internet of Things applications, communication and sensing technologies have rapidly evolved from millimeter-wave (mmWave) to terahertz (THz) in recent years. Enabled by significant advancements in electromagnetic (EM) hardware, mmWave and THz frequency regimes spanning 30 GHz to 300 GHz and 300 GHz to 3000 GHz, respectively, can be employed for a host of applications. The main feature of THz systems is high-bandwidth transmission, enabling ultra-high-resolution imaging and high-throughput communications; however, challenges in both the hardware and algorithmic arenas remain for the ubiquitous adoption of THz technology. Spectra comprising mmWave and THz frequencies are well-suited for synthetic aperture radar (SAR) imaging at sub-millimeter resolutions for a wide spectrum of tasks like material characterization and nondestructive testing (NDT). This article provides a tutorial review of systems and algorithms for THz SAR in the near-field with an emphasis on emerging algorithms that combine signal processing and machine learning techniques. As part of this study, an overview of classical and data-driven THz SAR algorithms is provided, focusing on object detection for security applications and SAR image super-resolution. We also discuss relevant issues, challenges, and future research directions for emerging algorithms and THz SAR, including standardization of system and algorithm benchmarking, adoption of state-of-the-art deep learning techniques, signal processing-optimized machine learning, and hybrid data-driven signal processing algorithms...
Submitted 15 September, 2023;
originally announced September 2023.
-
Complex-Valued Neural Networks for Data-Driven Signal Processing and Signal Understanding
Authors:
Josiah W. Smith
Abstract:
Complex-valued neural networks have emerged boasting superior modeling performance for many tasks across the signal processing, sensing, and communications arenas. However, developing complex-valued models currently demands development of basic deep learning operations, such as linear or convolution layers, as modern deep learning frameworks like PyTorch and TensorFlow do not adequately support complex-valued neural networks. This paper overviews a package built on PyTorch with the intention of implementing light-weight interfaces for common complex-valued neural network operations and architectures. Similar to natural language understanding (NLU), which has recently made tremendous leaps towards text-based intelligence, RF Signal Understanding (RFSU) is a promising field extending conventional signal processing algorithms using a hybrid approach of signal mechanics-based insight with data-driven modeling power. Notably, we include efficient implementations for linear, convolution, and attention modules in addition to activation functions and normalization layers such as batchnorm and layernorm. Additionally, we include efficient implementations of manifold-based complex-valued neural network layers that have shown tremendous promise but remain relatively unexplored in many research contexts. Although there is an emphasis on 1-D data tensors, due to a focus on signal processing, communications, and radar data, many of the routines are implemented for 2-D and 3-D data as well. Specifically, the proposed approach offers a useful set of tools and documentation for data-driven signal processing research and practical implementation.
Submitted 14 September, 2023;
originally announced September 2023.
-
Dual Radar SAR Controller
Authors:
Josiah Smith
Abstract:
The following is a user guide for the Dual Radar SAR Controller graphical user interface (GUI) to operate the dual radar synthetic aperture radar (SAR) scanner. The scanner was designed in the Spring semester of 2022 by Josiah Smith (RA), Yusef Alimam (UG), and Geetika Vedula (UG) with multiple axes of motion for the radar and target under test. The system is operated by a personal computer (PC) running MATLAB. An AMC4030 motion controller is employed to control the mechanical system. An ESP32 microcontroller synchronizes the mechanical motion and radar frame firing to achieve precise positioning at high movement speeds; the software was designed by Josiah Smith (RA) and Benjamin Roy (UG). A second system is designed that employs three axes of motion (X-Y + rotation) for fine control over the location of the target under test. The entire system is capable of efficiently collecting data from colocated and non-colocated radars for multiband fusion imaging in addition to simple single radar imaging.
Submitted 27 June, 2023;
originally announced September 2023.
-
Radar-STDA: A High-Performance Spatial-Temporal Denoising Autoencoder for Interference Mitigation of FMCW Radars
Authors:
Lulu Liu,
Runwei Guan,
Fei Ma,
Jeremy Smith,
Yutao Yue
Abstract:
With its small size, low cost and all-weather operation, millimeter-wave radar can accurately measure the distance, azimuth and radial velocity of a target compared to other traffic sensors. However, in practice, millimeter-wave radars are plagued by various interferences, leading to a drop in target detection accuracy or even failure to detect targets. This is undesirable in autonomous vehicles and traffic surveillance, as it is likely to threaten human life and cause property damage. Therefore, interference mitigation is of great significance for millimeter-wave radar-based target detection. Currently, the development of deep learning is rapid, but existing deep learning-based interference mitigation models still have great limitations in terms of model size and inference speed. For these reasons, we propose Radar-STDA, a Radar-Spatial Temporal Denoising Autoencoder. Radar-STDA is an efficient nano-level denoising autoencoder that takes into account both spatial and temporal information of range-Doppler maps. Among other methods, it achieves a maximum SINR of 17.08 dB with only 140,000 parameters. It obtains 207.6 FPS on an RTX A4000 GPU and 56.8 FPS on an NVIDIA Jetson AGX Xavier when denoising range-Doppler maps for three consecutive frames. Moreover, we release a synthetic data set called Ra-inf for the task, which involves 384,769 range-Doppler maps with various clutters from objects of no interest and receiver noise in realistic scenarios. To the best of our knowledge, Ra-inf is the first synthetic dataset of radar interference. To support the community, our research is open-source via the link https://github.com/GuanRunwei/rd_map_temporal_spatial_denoising_autoencoder.
Submitted 18 July, 2023;
originally announced July 2023.
-
Novel Hybrid-Learning Algorithms for Improved Millimeter-Wave Imaging Systems
Authors:
Josiah Smith
Abstract:
Increasing attention is being paid to millimeter-wave (mmWave), 30 GHz to 300 GHz, and terahertz (THz), 300 GHz to 10 THz, sensing applications including security sensing, industrial packaging, medical imaging, and non-destructive testing. Traditional methods for perception and imaging are challenged by novel data-driven algorithms that offer improved resolution, localization, and detection rates. Over the past decade, deep learning technology has garnered substantial popularity, particularly in perception and computer vision applications. Whereas conventional signal processing techniques are more easily generalized to various applications, hybrid approaches where signal processing and learning-based algorithms are interleaved pose a promising compromise between performance and generalizability. Furthermore, such hybrid algorithms improve model training by leveraging the known characteristics of radio frequency (RF) waveforms, thus yielding more efficiently trained deep learning algorithms and offering higher performance than conventional methods. This dissertation introduces novel hybrid-learning algorithms for improved mmWave imaging systems applicable to a host of problems in perception and sensing. Various problem spaces are explored, including static and dynamic gesture classification; precise hand localization for human computer interaction; high-resolution near-field mmWave imaging using forward synthetic aperture radar (SAR); SAR under irregular scanning geometries; mmWave image super-resolution using deep neural network (DNN) and Vision Transformer (ViT) architectures; and data-level multiband radar fusion using a novel hybrid-learning architecture. Furthermore, we introduce several novel approaches for deep learning model training and dataset synthesis.
Submitted 27 June, 2023;
originally announced June 2023.
-
Cutting-Edge Techniques for Depth Map Super-Resolution
Authors:
Ryan Peterson,
Josiah Smith
Abstract:
To overcome hardware limitations in commercially available depth sensors which result in low-resolution depth maps, depth map super-resolution (DMSR) is a practical and valuable computer vision task. DMSR requires upscaling a low-resolution (LR) depth map into a high-resolution (HR) space. Joint image filtering for DMSR has been applied using spatially-invariant and spatially-variant convolutional neural network (CNN) approaches. In this project, we propose a novel joint image filtering DMSR algorithm using a Swin transformer architecture. Furthermore, we introduce a Nonlinear Activation Free (NAF) network based on a conventional CNN model used in cutting-edge image restoration applications and compare the performance of the techniques. The proposed algorithms are validated through numerical studies and visual examples demonstrating improvements to state-of-the-art performance while maintaining competitive computation time for noisy depth map super-resolution.
Submitted 27 June, 2023;
originally announced June 2023.
-
Efficient CNN-based Super Resolution Algorithms for mmWave Mobile Radar Imaging
Authors:
Christos Vasileiou,
Josiah W. Smith,
Shiva Thiagarajan,
Matthew Nigh,
Yiorgos Makris,
Murat Torlak
Abstract:
In this paper, we introduce an innovative super resolution approach to emerging modes of near-field synthetic aperture radar (SAR) imaging. Recent research extends convolutional neural network (CNN) architectures from the optical to the electromagnetic domain to achieve super resolution on images generated from radar signaling. Specifically, near-field synthetic aperture radar (SAR) imaging, a method for generating high-resolution images by scanning a radar across space to create a synthetic aperture, is of interest due to its high-fidelity spatial sensing capability, low cost devices, and large application space. Since SAR imaging requires large aperture sizes to achieve high resolution, super-resolution algorithms are valuable for many applications. Freehand smartphone SAR, an emerging sensing modality, requires irregular SAR apertures in the near-field and computation on mobile devices. Achieving efficient high-resolution SAR images from irregularly sampled data collected by freehand motion of a smartphone is a challenging task. In this paper, we propose a novel CNN architecture to achieve SAR image super-resolution for mobile applications by employing state-of-the-art SAR processing and deep learning techniques. The proposed algorithm is verified via simulation and an empirical study. Our algorithm demonstrates high-efficiency and high-resolution radar imaging for near-field scenarios with irregular scanning geometries.
Submitted 3 May, 2023;
originally announced May 2023.
-
A Vision Transformer Approach for Efficient Near-Field Irregular SAR Super-Resolution
Authors:
Josiah Smith,
Yusef Alimam,
Geetika Vedula,
Murat Torlak
Abstract:
In this paper, we develop a novel super-resolution algorithm for near-field synthetic-aperture radar (SAR) under irregular scanning geometries. As fifth-generation (5G) millimeter-wave (mmWave) devices are becoming increasingly affordable and available, high-resolution SAR imaging is feasible for end-user applications and non-laboratory environments. Emerging applications such as freehand imaging, wherein a handheld radar is scanned throughout space by a user, unmanned aerial vehicle (UAV) imaging, and automotive SAR face several unique challenges for high-resolution imaging. First, recovering a SAR image requires knowledge of the array positions throughout the scan. While recent work has introduced camera-based positioning systems capable of adequately estimating the position, recovering the image efficiently is a requirement to enable edge and Internet of Things (IoT) technologies. Efficient algorithms for non-cooperative near-field SAR sampling have been explored in recent work, but suffer image defocusing under position estimation error and can only produce medium-fidelity images. In this paper, we introduce a mobile-friendly vision transformer (ViT) architecture to address position estimation error and perform SAR image super-resolution (SR) under irregular sampling geometries. The proposed algorithm, Mobile-SRViT, is the first to employ a ViT approach for SAR image enhancement and is validated in simulation and via empirical studies.
Submitted 27 June, 2023; v1 submitted 3 May, 2023;
originally announced May 2023.
-
Efficient 3-D Near-Field MIMO-SAR Imaging for Irregular Scanning Geometries
Authors:
Josiah Smith,
Murat Torlak
Abstract:
In this article, we introduce a novel algorithm for efficient near-field synthetic aperture radar (SAR) imaging for irregular scanning geometries. With the emergence of fifth-generation (5G) millimeter-wave (mmWave) devices, near-field SAR imaging is no longer confined to laboratory environments. Recent advances in positioning technology have attracted significant interest for a diverse set of new applications in mmWave imaging. However, many use cases, such as automotive-mounted SAR imaging, unmanned aerial vehicle (UAV) imaging, and freehand imaging with smartphones, are constrained to irregular scanning geometries. Whereas traditional near-field SAR imaging systems and quick personnel security (QPS) scanners employ highly precise motion controllers to create ideal synthetic arrays, emerging applications, mentioned previously, inherently cannot achieve such ideal positioning. In addition, many Internet of Things (IoT) and 5G applications impose strict size and computational complexity limitations that must be considered for edge mmWave imaging technology. In this study, we propose a novel algorithm to leverage the advantages of non-cooperative SAR scanning patterns, small form-factor multiple-input multiple-output (MIMO) radars, and efficient monostatic planar image reconstruction algorithms. We propose a framework to mathematically decompose arbitrary and irregular sampling geometries and a joint solution to mitigate multistatic array imaging artifacts. The proposed algorithm is validated through simulations and an empirical study of arbitrary scanning scenarios. Our algorithm achieves high-resolution and high-efficiency near-field MIMO-SAR imaging, and is an elegant solution to computationally constrained irregularly sampled imaging problems.
Submitted 3 May, 2023;
originally announced May 2023.
-
Improved Static Hand Gesture Classification on Deep Convolutional Neural Networks using Novel Sterile Training Technique
Authors:
Josiah Smith,
Shiva Thiagarajan,
Richard Willis,
Yiorgos Makris,
Murat Torlak
Abstract:
In this paper, we investigate novel data collection and training techniques towards improving classification accuracy of non-moving (static) hand gestures using a convolutional neural network (CNN) and frequency-modulated-continuous-wave (FMCW) millimeter-wave (mmWave) radars. Recently, non-contact hand pose and static gesture recognition have received considerable attention in many applications ranging from human-computer interaction (HCI) and augmented/virtual reality (AR/VR) to therapeutic range of motion for medical applications. While most current solutions rely on optical or depth cameras, these methods require ideal lighting and temperature conditions. mmWave radar devices have recently emerged as a promising alternative offering low-cost system-on-chip sensors whose output signals contain precise spatial information even in non-ideal imaging conditions. Additionally, deep convolutional neural networks have been employed extensively in image recognition by learning both feature extraction and classification simultaneously. However, little work has been done towards static gesture recognition using mmWave radars and CNNs due to the difficulty involved in extracting meaningful features from the radar return signal, and the results are inferior compared with dynamic gesture classification. This article presents an efficient data collection approach and a novel technique for deep CNN training by introducing "sterile" images which aid in distinguishing distinct features among the static gestures and subsequently improve the classification accuracy. Applying the proposed data collection and training methods yields an increase in classification rate of static hand gestures from 85% to 93% and 90% to 95% for range and range-angle profiles, respectively.
Submitted 3 May, 2023;
originally announced May 2023.
-
Near-Field MIMO-ISAR Millimeter-Wave Imaging
Authors:
Josiah W. Smith,
Muhammet Emin Yanik,
Murat Torlak
Abstract:
Multiple-input-multiple-output (MIMO) millimeter-wave (mmWave) sensors for synthetic aperture radar (SAR) and inverse SAR (ISAR) address the fundamental challenges of cost-effectiveness and scalability inherent to near-field imaging. In this paper, near-field MIMO-ISAR mmWave imaging systems are discussed and developed. The rotational ISAR (R-ISAR) regime investigated in this paper requires rotating the target at a constant radial distance from the transceiver and scanning the transceiver along a vertical track. Using a 77 GHz mmWave radar, a high resolution three-dimensional (3-D) image can be reconstructed from this two-dimensional scanning taking into account the spherical near-field wavefront. While prior work in the literature consists of single-input-single-output circular synthetic aperture radar (SISO-CSAR) algorithms or computationally sluggish MIMO-CSAR image reconstruction algorithms, this paper proposes a novel algorithm for efficient MIMO 3-D holographic imaging and details the design of a MIMO R-ISAR imaging system. The proposed algorithm applies a multistatic-to-monostatic phase compensation to the R-ISAR regime allowing for use of highly efficient monostatic algorithms. We demonstrate the algorithm's performance in real-world imaging scenarios on a prototyped MIMO R-ISAR platform. Our fully integrated system, consisting of a mechanical scanner and efficient imaging algorithm, is capable of pairing the scanning efficiency of the MIMO regime with the computational efficiency of single pixel image reconstruction algorithms.
Submitted 3 May, 2023;
originally announced May 2023.
-
Deep Learning-Based Multiband Signal Fusion for 3-D SAR Super-Resolution
Authors:
Josiah Smith,
Murat Torlak
Abstract:
Three-dimensional (3-D) synthetic aperture radar (SAR) is widely used in many security and industrial applications requiring high-resolution imaging of concealed or occluded objects. The ability to resolve intricate 3-D targets is essential to the performance of such applications and depends directly on system bandwidth. However, because high-bandwidth systems face several prohibitive hurdles, an alternative solution is to operate multiple radars at distinct frequency bands and fuse the multiband signals. Current multiband signal fusion methods assume a simple target model and a small number of point reflectors, which is invalid for realistic security screening and industrial imaging scenarios wherein the target model effectively consists of a large number of reflectors. To the best of our knowledge, this study presents the first use of deep learning for multiband signal fusion. The proposed network, called kR-Net, employs a hybrid, dual-domain complex-valued convolutional neural network (CV-CNN) to fuse multiband signals and impute the missing samples in the frequency gaps between subbands. By exploiting the relationships in both the wavenumber domain and wavenumber spectral domain, the proposed framework overcomes the drawbacks of existing multiband imaging techniques for realistic scenarios at a fraction of the computation time of existing multiband fusion algorithms. Our method achieves high-resolution imaging of intricate targets previously impossible using conventional techniques and enables finer resolution capacity for concealed weapon detection and occluded object classification using multiband signaling without requiring more advanced hardware. Furthermore, a fully integrated multiband imaging system is developed using commercially available millimeter-wave (mmWave) radars for efficient multiband imaging.
Submitted 3 May, 2023;
originally announced May 2023.
-
An FCNN-Based Super-Resolution mmWave Radar Framework for Contactless Musical Instrument Interface
Authors:
Josiah W. Smith,
Orges Furxhi,
Murat Torlak
Abstract:
In this article, we propose a framework for contactless human-computer interaction (HCI) using novel tracking techniques based on deep learning-based super-resolution and tracking algorithms. Our system offers unprecedented high-resolution tracking of hand position and motion characteristics by leveraging spatial and temporal features embedded in the reflected radar waveform. Rather than classifying samples from a predefined set of hand gestures, as common in existing work on deep learning with mmWave radar, our proposed imager employs a regressive full convolutional neural network (FCNN) approach to improve localization accuracy by spatial super-resolution. While the proposed techniques are suitable for a host of tracking applications, this article focuses on their application as a musical interface to demonstrate the robustness of the gesture sensing pipeline and deep learning signal processing chain. The user can control the instrument by varying the position and velocity of their hand above the vertically-facing sensor. By employing a commercially available multiple-input-multiple-output (MIMO) radar rather than a traditional optical sensor, our framework demonstrates the efficacy of the mmWave sensing modality for fine motion tracking and offers an elegant solution to a host of HCI tasks. Additionally, we provide a freely available software package and user interface for controlling the device, streaming the data to MATLAB in real-time, and increasing accessibility to the signal processing and device interface functionality utilized in this article.
Submitted 3 May, 2023;
originally announced May 2023.
-
Real-Time Prediction of Gas Flow Dynamics in Diesel Engines using a Deep Neural Operator Framework
Authors:
Varun Kumar,
Somdatta Goswami,
Daniel J. Smith,
George Em Karniadakis
Abstract:
We develop a data-driven deep neural operator framework to approximate multiple output states for a diesel engine and generate real-time predictions with reasonable accuracy. As emission norms become more stringent, the need for fast and accurate models that enable analysis of system behavior has become an essential requirement for system development. The fast transient processes involved in the operation of a combustion engine make it difficult to develop accurate physics-based models for such systems. As an alternative to physics-based models, we develop an operator-based regression model (DeepONet) to learn the relevant output states for a mean-value gas flow engine model using the engine operating conditions as input variables. We have adopted a mean-value model as a benchmark for comparison, simulated using Simulink. The developed approach necessitates using the initial conditions of the output states to predict the accurate sequence over the temporal domain. To this end, a sequence-to-sequence approach is embedded into the proposed framework. The accuracy of the model is evaluated by comparing the prediction output to ground truth generated from the Simulink model. The maximum $\mathcal{L}_2$ relative error observed was approximately $6.5\%$. The sensitivity of the DeepONet model is evaluated under simulated noise conditions and the model shows relatively low sensitivity to noise. The uncertainty in model prediction is further assessed by using a mean ensemble approach. The worst-case error at the $(\mu + 2\sigma)$ boundary was found to be $12\%$. The proposed framework provides the ability to predict output states in real-time and enables data-driven learning of complex input-output operator mapping. As a result, this model can be applied during initial development stages, where accurate models may not be available.
Submitted 6 July, 2023; v1 submitted 2 April, 2023;
originally announced April 2023.
-
Roadmap on Deep Learning for Microscopy
Authors:
Giovanni Volpe,
Carolina Wählby,
Lei Tian,
Michael Hecht,
Artur Yakimovich,
Kristina Monakhova,
Laura Waller,
Ivo F. Sbalzarini,
Christopher A. Metzler,
Mingyang Xie,
Kevin Zhang,
Isaac C. D. Lenton,
Halina Rubinsztein-Dunlop,
Daniel Brunner,
Bijie Bai,
Aydogan Ozcan,
Daniel Midtvedt,
Hao Wang,
Nataša Sladoje,
Joakim Lindblad,
Jason T. Smith,
Marien Ochoa,
Margarida Barroso,
Xavier Intes,
Tong Qiu
, et al. (50 additional authors not shown)
Abstract:
Through digital imaging, microscopy has evolved from primarily being a means for visual observation of life at the micro- and nano-scale, to a quantitative tool with ever-increasing resolution and throughput. Artificial intelligence, deep neural networks, and machine learning are all niche terms describing computational methods that have gained a pivotal role in microscopy-based research over the past decade. This Roadmap is written collectively by prominent researchers and encompasses selected aspects of how machine learning is applied to microscopy image data, with the aim of gaining scientific knowledge by improved image quality, automated detection, segmentation, classification and tracking of objects, and efficient merging of information from multiple imaging modalities. We aim to give the reader an overview of the key developments and an understanding of possibilities and limitations of machine learning for microscopy. It will be of interest to a wide cross-disciplinary audience in the physical sciences and life sciences.
Submitted 7 March, 2023;
originally announced March 2023.
-
Modeling the Rhythm from Lyrics for Melody Generation of Pop Song
Authors:
Daiyu Zhang,
Ju-Chiang Wang,
Katerina Kosta,
Jordan B. L. Smith,
Shicen Zhou
Abstract:
Creating a pop song melody according to pre-written lyrics is a typical practice for composers. A computational model of how lyrics are set as melodies is important for automatic composition systems, but an end-to-end lyric-to-melody model would require enormous amounts of paired training data. To mitigate the data constraints, we adopt a two-stage approach, dividing the task into lyric-to-rhythm and rhythm-to-melody modules. However, the lyric-to-rhythm task is still challenging due to its multimodality. In this paper, we propose a novel lyric-to-rhythm framework that includes part-of-speech tags to achieve better text setting, and a Transformer architecture designed to model long-term syllable-to-note associations. For the rhythm-to-melody task, we adapt a proven chord-conditioned melody Transformer, which has achieved state-of-the-art results. Experiments for Chinese lyric-to-melody generation show that the proposed framework is able to model key characteristics of rhythm and pitch distributions in the dataset, and in a subjective evaluation, the melodies generated by our system were rated as similar to or better than those of a state-of-the-art alternative.
Submitted 3 January, 2023;
originally announced January 2023.
-
MuSFA: Improving Music Structural Function Analysis with Partially Labeled Data
Authors:
Ju-Chiang Wang,
Jordan B. L. Smith,
Yun-Ning Hung
Abstract:
Music structure analysis (MSA) systems aim to segment a song recording into non-overlapping sections with useful labels. Previous MSA systems typically predict abstract labels in a post-processing step and require the full context of the song. By contrast, we recently proposed a supervised framework, called "Music Structural Function Analysis" (MuSFA), that models and predicts meaningful labels like 'verse' and 'chorus' directly from audio, without requiring the full context of a song. However, the performance of this system depends on the amount and quality of training data. In this paper, we propose to repurpose a public dataset, HookTheory Lead Sheet Dataset (HLSD), to improve the performance. HLSD contains over 18K excerpts of music sections originally collected for studying automatic melody harmonization. We treat each excerpt as a partially labeled song and provide a label mapping, so that HLSD can be used together with other public datasets, such as SALAMI, RWC, and Isophonics. In cross-dataset evaluations, we find that including HLSD in training can improve state-of-the-art boundary detection and section labeling scores by ~3% and ~1% respectively.
Submitted 28 November, 2022;
originally announced November 2022.
-
Hierarchical Control Strategy for Moving A Robot Manipulator Between Small Containers
Authors:
Paolo Torrado,
Boling Yang,
Joshua Smith
Abstract:
In this paper, we study the implementation of a model predictive controller (MPC) for the task of object manipulation in a highly uncertain environment (e.g., picking objects from a semi-flexible array of densely packed bins). As a real-time perception-driven feedback controller, MPC is robust to the uncertainties in this environment. However, our experiment shows MPC cannot control a robot to complete a sequence of motions in a heavily occluded environment due to its myopic nature. It will benefit from adding a high-level policy that adaptively adjusts the optimization problem for MPC.
Submitted 28 November, 2022;
originally announced November 2022.
-
An Analytical Model for Stepwise Adiabatic Driver Energy Consumption
Authors:
Eric J. Carlson,
Joshua R. Smith
Abstract:
This paper presents a complete closed-form analytical model for determining the per-cycle energy consumption of stepwise adiabatic drivers used for driving a capacitive load such as a power FET gate. The model takes into account the number of steps used, the stepwise driver tank capacitance, the load capacitance, and the stepwise driver switch resistance and on-time. Model accuracy is compared to that of simulation and models from previous work.
Submitted 29 October, 2022;
originally announced October 2022.
-
Optimal Phase Design for RIS Channel Estimation
Authors:
Chelsea L. Miller,
Peter J. Smith,
Pawel A. Dmochowski
Abstract:
We develop an optimal version of a prior two-stage channel estimation protocol for RIS-assisted channels. The new design uses a modified DFT matrix (MDFT) for the training phases at the RIS and is shown to minimize the total channel estimation error variance. In conjunction with interpolation (estimating fewer RIS channels), the MDFT approach accelerates channel estimation even when the channel from base station to RIS is line-of-sight. In contrast, prior two-stage techniques required a full-rank channel for efficient estimation. We investigate the resulting channel estimation errors by comparing different training phase designs for a variety of propagation conditions using a ray-based channel model. To examine the overall performance, we simulate the spectral efficiency with MRC processing for a single-user RIS-assisted system using an existing optimal design for the RIS transmission phases. Results verify the optimality of MDFT while simulations and analysis show that the performance is more dependent on the user-to-RIS channel correlation and the coarseness of the interpolation used, rather than the training phase design. For example, under a scenario with more highly correlated channels, the procedure accelerates channel estimation by a factor of 16, while the improvement is a factor of 5 in a less correlated case. The overall procedure is extremely robust, with a maximum performance loss of 1.5 bits/sec/Hz compared to that with perfect channel state information for the considered channel conditions.
Submitted 18 October, 2022;
originally announced October 2022.
-
Robustness of an Artificial Intelligence Solution for Diagnosis of Normal Chest X-Rays
Authors:
Tom Dyer,
Jordan Smith,
Gaetan Dissez,
Nicole Tay,
Qaiser Malik,
Tom Naunton Morgan,
Paul Williams,
Liliana Garcia-Mondragon,
George Pearse,
Simon Rasalingham
Abstract:
Purpose: Artificial intelligence (AI) solutions for medical diagnosis require thorough evaluation to demonstrate that performance is maintained for all patient sub-groups and to ensure that proposed improvements in care will be delivered equitably. This study evaluates the robustness of an AI solution for the diagnosis of normal chest X-rays (CXRs) by comparing performance across multiple patient and environmental subgroups, as well as comparing AI errors with those made by human experts.
Methods: A total of 4,060 CXRs were sampled to represent a diverse dataset of NHS patients and care settings. Ground-truth labels were assigned by a 3-radiologist panel. AI performance was evaluated against assigned labels and sub-groups analysis was conducted against patient age and sex, as well as CXR view, modality, device manufacturer and hospital site.
Results: The AI solution was able to remove 18.5% of the dataset by classification as High Confidence Normal (HCN). This was associated with a negative predictive value (NPV) of 96.0%, compared to 89.1% for diagnosis of normal scans by radiologists. In all AI false negative (FN) cases, a radiologist was found to have also made the same error when compared to final ground-truth labels. Subgroup analysis showed no statistically significant variations in AI performance, whilst reduced normal classification was observed in data from some hospital sites.
Conclusion: We show the AI solution could provide meaningful workload savings by diagnosis of 18.5% of scans as HCN with a superior NPV to human readers. The AI solution is shown to perform well across patient subgroups and error cases were shown to be subjective or subtle in nature.
Submitted 31 August, 2022;
originally announced September 2022.
-
Enhancing Early Lung Cancer Detection on Chest Radiographs with AI-assistance: A Multi-Reader Study
Authors:
Gaetan Dissez,
Nicole Tay,
Tom Dyer,
Matthew Tam,
Richard Dittrich,
David Doyne,
James Hoare,
Jackson J. Pat,
Stephanie Patterson,
Amanda Stockham,
Qaiser Malik,
Tom Naunton Morgan,
Paul Williams,
Liliana Garcia-Mondragon,
Jordan Smith,
George Pearse,
Simon Rasalingham
Abstract:
Objectives: The present study evaluated the impact of a commercially available explainable AI algorithm in augmenting the ability of clinicians to identify lung cancer on chest X-rays (CXR).
Design: This retrospective study evaluated the performance of 11 clinicians for detecting lung cancer from chest radiographs, with and without assistance from a commercially available AI algorithm (red dot, Behold.ai) that predicts suspected lung cancer from CXRs. Clinician performance was evaluated against clinically confirmed diagnoses.
Setting: The study analysed anonymised patient data from an NHS hospital; the dataset consisted of 400 chest radiographs from adult patients (18 years and above) who had a CXR performed in 2020, with corresponding clinical text reports.
Participants: A panel of readers consisting of 11 clinicians (consultant radiologists, radiologist trainees and reporting radiographers) participated in this study.
Main outcome measures: Overall accuracy, sensitivity, specificity and precision for detecting lung cancer on CXRs by clinicians, with and without AI input. Agreement rates between clinicians and performance standard deviation were also evaluated, with and without AI input.
Results: The use of the AI algorithm by clinicians led to an improved overall performance for lung tumour detection, achieving an overall increase of 17.4% of lung cancers being identified on CXRs which would have otherwise been missed, an overall increase in detection of smaller tumours, a 24% and 13% increased detection of stage 1 and stage 2 lung cancers respectively, and standardisation of clinician performance.
Conclusions: This study showed great promise in the clinical utility of AI algorithms in improving early lung cancer diagnosis and promoting health equity through overall improvement in reader performances, without impacting downstream imaging resources.
Submitted 31 August, 2022;
originally announced August 2022.
-
To catch a chorus, verse, intro, or anything else: Analyzing a song with structural functions
Authors:
Ju-Chiang Wang,
Yun-Ning Hung,
Jordan B. L. Smith
Abstract:
Conventional music structure analysis algorithms aim to divide a song into segments and to group them with abstract labels (e.g., 'A', 'B', and 'C'). However, explicitly identifying the function of each segment (e.g., 'verse' or 'chorus') is rarely attempted, but has many applications. We introduce a multi-task deep learning framework to model these structural semantic labels directly from audio by estimating "verseness," "chorusness," and so forth, as a function of time. We propose a 7-class taxonomy (i.e., intro, verse, chorus, bridge, outro, instrumental, and silence) and provide rules to consolidate annotations from four disparate datasets. We also propose to use a spectral-temporal Transformer-based model, called SpecTNT, which can be trained with an additional connectionist temporal localization (CTL) loss. In cross-dataset evaluations using four public datasets, we demonstrate the effectiveness of the SpecTNT model and CTL loss, and obtain strong results overall: the proposed system outperforms state-of-the-art chorus-detection and boundary-detection methods at detecting choruses and boundaries, respectively.
Submitted 29 May, 2022;
originally announced May 2022.
-
Multi-Output Gaussian Process-Based Data Augmentation for Multi-Building and Multi-Floor Indoor Localization
Authors:
Zhe Tang,
Sihao Li,
Kyeong Soo Kim,
Jeremy Smith
Abstract:
Location fingerprinting based on RSSI has become a mainstream indoor localization technique due to its advantage of not requiring the installation of new infrastructure or the modification of existing devices, especially given the prevalence of Wi-Fi-enabled devices and the ubiquitous Wi-Fi access in modern buildings. The use of AI/ML technologies like DNNs makes location fingerprinting more accurate and reliable, especially for large-scale multi-building and multi-floor indoor localization. The application of DNNs for indoor localization, however, depends on a large amount of preprocessed and deliberately-labeled data for their training. Considering the difficulty of the data collection in an indoor environment, especially under the current epidemic situation of COVID-19, we investigate three different methods of RSSI data augmentation based on Multi-Output Gaussian Process (MOGP), i.e., by a single floor, by neighboring floors, and by a single building; unlike Single-Output Gaussian Process (SOGP), MOGP can take into account the correlation among RSSI observations from multiple Access Points (APs) deployed close to each other (e.g., APs on the same floor of a building) by collectively handling them. The feasibility of the MOGP-based RSSI data augmentation is demonstrated through experiments based on the state-of-the-art RNN indoor localization model and UJIIndoorLoc, i.e., the most popular publicly-available multi-building and multi-floor indoor localization database, where the RNN model trained with the UJIIndoorLoc database augmented by using the whole RSSI data of a building in fitting an MOGP model (i.e., by a single building) outperforms the other two augmentation methods as well as the RNN model trained with the original UJIIndoorLoc database, resulting in a mean three-dimensional positioning error of 8.42 m.
Submitted 31 July, 2023; v1 submitted 4 February, 2022;
originally announced February 2022.
-
Low dosage 3D volume fluorescence microscopy imaging using compressive sensing
Authors:
Varun Mannam,
Jacob Brandt,
Cody J. Smith,
Scott Howard
Abstract:
Fluorescence microscopy has been a significant tool for long-term in vivo imaging of embryo growth over time. However, cumulative exposure is phototoxic to such sensitive live samples. While techniques like light-sheet fluorescence microscopy (LSFM) allow for reduced exposure, they are not well suited for deep imaging models. Other computational techniques are computationally expensive and often lack restoration quality. To address this challenge, one can use various low-dosage imaging techniques that are developed to achieve the 3D volume reconstruction using a few slices in the axial direction (z-axis); however, they often lack restoration quality. Also, acquiring dense images (with small steps) in the axial direction is computationally expensive. To address this challenge, we present a compressive sensing (CS) based approach to fully reconstruct 3D volumes with the same signal-to-noise ratio (SNR) with less than half of the excitation dosage. We present the theory and experimentally validate the approach. To demonstrate our technique, we capture a 3D volume of the RFP-labeled neurons in the zebrafish embryo spinal cord (30 µm thickness) with an axial sampling of 0.1 µm using a confocal microscope. From the results, we observe that the CS-based approach achieves accurate 3D volume reconstruction from fewer than 20% of the optical sections in the entire stack. The CS-based methodology developed in this work can be easily applied to other deep imaging modalities such as two-photon and light-sheet microscopy, where reducing sample phototoxicity is a critical challenge.
Submitted 3 January, 2022;
originally announced January 2022.
-
Communication by means of Modulated Johnson Noise
Authors:
Zerina Kapetanovic,
Miguel Morales,
Joshua R. Smith
Abstract:
We present the design of a new passive wireless communication system that does not rely on ambient or generated RF sources. Instead, we exploit the Johnson (thermal) noise generated by a resistor to transmit information bits wirelessly. By switching the load connected to an antenna between a resistor and an open circuit, we can achieve data rates of up to 26 bps and distances of up to 7.3 meters. This communication method consumes orders of magnitude less power than conventional communication schemes and presents the opportunity to enable wireless communication in areas with a complete lack of connectivity.
Submitted 6 August, 2022; v1 submitted 16 November, 2021;
originally announced November 2021.
-
Supervised Metric Learning for Music Structure Features
Authors:
Ju-Chiang Wang,
Jordan B. L. Smith,
Wei-Tsung Lu,
Xuchen Song
Abstract:
Music structure analysis (MSA) methods traditionally search for musically meaningful patterns in audio: homogeneity, repetition, novelty, and segment-length regularity. Hand-crafted audio features such as MFCCs or chromagrams are often used to elicit these patterns. However, with more annotations of section labels (e.g., verse, chorus, and bridge) becoming available, one can use supervised feature learning to make these patterns even clearer and improve MSA performance. To this end, we take a supervised metric learning approach: we train a deep neural network to output embeddings that are near each other for two spectrogram inputs if both have the same section type (according to an annotation), and otherwise far apart. We propose a batch sampling scheme to ensure the labels in a training pair are interpreted meaningfully. The trained model extracts features that can be used in existing MSA algorithms. In evaluations with three datasets (HarmonixSet, SALAMI, and RWC), we demonstrate that using the proposed features can improve a traditional MSA algorithm significantly in both intra- and cross-dataset scenarios.
Submitted 29 April, 2022; v1 submitted 17 October, 2021;
originally announced October 2021.
-
Supervised Chorus Detection for Popular Music Using Convolutional Neural Network and Multi-task Learning
Authors:
Ju-Chiang Wang,
Jordan B. L. Smith,
Jitong Chen,
Xuchen Song,
Yuxuan Wang
Abstract:
This paper presents a novel supervised approach to detecting the chorus segments in popular music. Traditional approaches to this task are mostly unsupervised, with pipelines designed to target some quality that is assumed to define "chorusness," which usually means seeking the loudest or most frequently repeated sections. We propose to use a convolutional neural network with a multi-task learning objective, which simultaneously fits two temporal activation curves: one indicating "chorusness" as a function of time, and the other the location of the boundaries. We also propose a post-processing method that jointly takes into account the chorus and boundary predictions to produce binary output. In experiments using three datasets, we compare our system to a set of public implementations of other segmentation and chorus-detection algorithms, and find our approach performs significantly better.
Submitted 21 April, 2021; v1 submitted 26 March, 2021;
originally announced March 2021.
-
Modeling the Compatibility of Stem Tracks to Generate Music Mashups
Authors:
Jiawen Huang,
Ju-Chiang Wang,
Jordan B. L. Smith,
Xuchen Song,
Yuxuan Wang
Abstract:
A music mashup combines audio elements from two or more songs to create a new work. To reduce the time and effort required to make them, researchers have developed algorithms that predict the compatibility of audio elements. Prior work has focused on mixing unaltered excerpts, but advances in source separation enable the creation of mashups from isolated stems (e.g., vocals, drums, bass, etc.). In this work, we take advantage of separated stems not just for creating mashups, but for training a model that predicts the mutual compatibility of groups of excerpts, using self-supervised and semi-supervised methods. Specifically, we first produce a random mashup creation pipeline that combines stem tracks obtained via source separation, with key and tempo automatically adjusted to match, since these are prerequisites for high-quality mashups. To train a model to predict compatibility, we use stem tracks obtained from the same song as positive examples, and random combinations of stems with key and/or tempo unadjusted as negative examples. To improve the model and use more data, we also train on "average" examples: random combinations with matching key and tempo, which we treat as unlabeled data because their true compatibility is unknown. To determine whether the combined signal or the set of stem signals is more indicative of the quality of the result, we experiment with two model architectures and train them using a semi-supervised learning technique. Finally, we conduct objective and subjective evaluations of the system, comparing it to a standard rule-based system.
Submitted 25 March, 2021;
originally announced March 2021.
-
Convolutional Neural Network Denoising in Fluorescence Lifetime Imaging Microscopy (FLIM)
Authors:
Varun Mannam,
Yide Zhang,
Xiaotong Yuan,
Takashi Hato,
Pierre C. Dagher,
Evan L. Nichols,
Cody J. Smith,
Kenneth W. Dunn,
Scott Howard
Abstract:
Fluorescence lifetime imaging microscopy (FLIM) systems are limited by their slow processing speed, low signal-to-noise ratio (SNR), and expensive and challenging hardware setups. In this work, we demonstrate applying a denoising convolutional network to improve FLIM SNR. The network will be integrated with an instant FLIM system with fast data acquisition based on analog signal processing, high SNR using high-efficiency pulse-modulation, and cost-effective implementation utilizing off-the-shelf radio-frequency components. Our instant FLIM system simultaneously provides the intensity, lifetime, and phasor plots in vivo and ex vivo. By integrating image denoising using the trained deep learning model on the FLIM data, accurate FLIM phasor measurements are obtained. The enhanced phasor is then passed through the K-means clustering segmentation method, an unbiased and unsupervised machine learning technique, to separate different fluorophores accurately. Our experimental in vivo mouse kidney results indicate that introducing the deep learning image denoising model before the segmentation effectively removes the noise in the phasor compared to existing methods and provides clearer segments. Hence, the proposed deep learning-based workflow provides fast and accurate automatic segmentation of fluorescence images using instant FLIM. The denoising operation is effective for the segmentation if the FLIM measurements are noisy. The clustering can effectively enhance the detection of biological structures of interest in biomedical imaging applications.
Submitted 6 March, 2021;
originally announced March 2021.
-
The Optimal Location and Size of an Intermediate Coil in a Magnetic Resonant Coupling Wireless Power Transfer System
Authors:
Kedi Yan,
Gregory E. Moore,
Joshua R. Smith
Abstract:
To increase the transmission distance of Wireless Power Transfer (WPT) systems, we provide guidelines on choosing the optimal location of an Intermediate Coil with respect to size within a standard five-coil axially aligned experimental setup. From our results, for maximum magnitude of S21 at the resonant frequency, we found the optimal location to exist where the coupling coefficient between the Transmitter and the Intermediate Coil and the coupling coefficient between the Receiver and the Intermediate Coil are identical. Additionally, the optimal outer diameters for the maximum magnitude of S21 at the resonant frequency of the Intermediate Coil in the given symmetric and asymmetric setups are found to be larger than those of both TX and RX.
Submitted 30 November, 2020;
originally announced December 2020.