
doi 10.1007/978-3-031-86920-4_10

Image Quality Transfer of Diffusion MRI Guided By High-Resolution Structural MRI

Authors: Alp G. Cicimen, Henry F. J. Tregidgo, Matteo Figini, Eirini Messaritaki, Carolyn B. McNabb, Marco Palombo, C. John Evans, Mara Cercignani, Derek K. Jones, Daniel C. Alexander

Abstract: Prior work on Image Quality Transfer for diffusion MRI (dMRI) has shown significant improvement over traditional interpolation methods. However, the difficulty of obtaining ultra-high-resolution diffusion MRI scans makes it hard to train neural networks to produce high-resolution dMRI scans. Here we hypothesise that structural MRI images, which can be acquired at much higher resolutions, can be used as a guide to obtain a more accurate high-resolution dMRI output. To test our hypothesis, we have constructed a novel framework that incorporates structural MRI scans together with dMRI to obtain high-resolution dMRI scans. We set up tests which evaluate the validity of our claim through various configurations and compare the performance of our approach against a unimodal approach. Our results show that the inclusion of structural MRI scans does lead to an improvement in high-resolution image prediction when T1w data is incorporated into the model input.

Submitted 6 August, 2024; originally announced August 2024.

arXiv:2402.01796

Speech foundation models in healthcare: Effect of layer selection on pathological speech feature prediction

Authors: Daniela A. Wiepert, Rene L. Utianski, Joseph R. Duffy, John L. Stricker, Leland R. Barnard, David T. Jones, Hugo Botha

Abstract: Accurately extracting clinical information from speech is critical to the diagnosis and treatment of many neurological conditions. As such, there is interest in leveraging AI for automatic, objective assessments of clinical speech to facilitate diagnosis and treatment of speech disorders. We explore transfer learning using foundation models, focusing on the impact of layer selection for the downstream task of predicting pathological speech features. We find that selecting an optimal layer can greatly improve performance (~15.8% increase in balanced accuracy per feature compared to the worst layer, ~13.6% increase compared to the final layer), though the best layer varies by predicted feature and does not always generalize well to unseen data. A learned weighted sum offers comparable performance to the average best layer in-distribution (only ~1.2% lower) and has strong generalization for out-of-distribution data (only 1.5% lower than the average best layer).
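The abstract does not say how the learned weighted sum over layers is parameterized. A common choice, assumed here, is one learnable scalar per layer normalized with a softmax, so the combination interpolates between averaging all layers and picking a single one. `layer_weighted_sum` and the toy vectors are hypothetical and only illustrate the pooling step, not the authors' training setup.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scalars."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def layer_weighted_sum(layers: List[List[float]], logits: List[float]) -> List[float]:
    """Combine per-layer feature vectors with softmax-normalized layer weights.

    layers[i] is a pooled feature vector from encoder layer i and logits[i] is
    its learnable scalar (hypothetical parameterization; during training these
    logits would be updated by the downstream classifier's loss).
    """
    if len(layers) != len(logits):
        raise ValueError("need exactly one logit per layer")
    weights = softmax(logits)
    out = [0.0] * len(layers[0])
    for w, feats in zip(weights, layers):
        for j, value in enumerate(feats):
            out[j] += w * value
    return out


# Three toy "layers" with 2-D features: equal logits reduce to a plain average,
# while a large logit on one layer approaches selecting that single layer.
layers = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
print(layer_weighted_sum(layers, [0.0, 0.0, 0.0]))
print(layer_weighted_sum(layers, [10.0, 0.0, 0.0]))
```

Because the weights are normalized, a "best single layer" is a special case the model can learn, which is one plausible reason such a sum tracks the best layer closely.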

Submitted 21 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: Accepted to INTERSPEECH 2024

arXiv:2310.13010

Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model

Authors: Hagen Soltau, Izhak Shafran, Alex Ottenwess, Joseph R. Duffy, Rene L. Utianski, Leland R. Barnard, John L. Stricker, Daniela Wiepert, David T. Jones, Hugo Botha

Abstract: We propose a Perceiver-based sequence classifier to detect abnormalities in speech reflective of several neurological disorders. We combine this classifier with a Universal Speech Model (USM) that is trained (unsupervised) on 12 million hours of diverse audio recordings. Our model compresses long sequences into a small set of class-specific latent representations and a factorized projection is used to predict different attributes of the disordered input speech. The benefit of our approach is that it allows us to model different regions of the input for different classes and is at the same time data efficient. We evaluated the proposed model extensively on a curated corpus from the Mayo Clinic. Our model outperforms standard transformer (80.9%) and perceiver (81.8%) models and achieves an average accuracy of 83.1%. With limited task-specific data, we find that pretraining is important and surprisingly pretraining with the unrelated automatic speech recognition (ASR) task is also beneficial. Encodings from the middle layers provide a mix of both acoustic and phonetic information and achieve best prediction results compared to just using the final layer encodings (83.1% vs. 79.6%). The results are promising and with further refinements may help clinicians detect speech abnormalities without needing access to highly specialized speech-language pathologists.

Submitted 16 October, 2023; originally announced October 2023.

Journal ref: Proc. ASRU, 2023

arXiv:2210.09975

Risk of re-identification for shared clinical speech recordings

Authors: Daniela A. Wiepert, Bradley A. Malin, Joseph R. Duffy, Rene L. Utianski, John L. Stricker, David T. Jones, Hugo Botha

Abstract: Large, curated datasets are required to leverage speech-based tools in healthcare. These are costly to produce, resulting in increased interest in data sharing. As speech can potentially identify speakers (i.e., voiceprints), sharing recordings raises privacy concerns. We examine the re-identification risk for speech recordings, without reference to demographic or metadata, using a state-of-the-art speaker recognition system. We demonstrate that the risk is inversely related to the number of comparisons an adversary must consider, i.e., the search space. Risk is high for a small search space but drops as the search space grows (precision $>0.85$ for $<1\times10^{6}$ comparisons, precision $<0.5$ for $>3\times10^{6}$ comparisons). Next, we show that the nature of a speech recording influences re-identification risk, with non-connected speech (e.g., vowel prolongation) being harder to identify. Our findings suggest that speaker recognition systems can be used to re-identify participants in specific circumstances, but in practice, the re-identification risk appears low.

Submitted 21 August, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

Comments: 24 pages, 6 figures

arXiv:2208.01602

Lossy compression of multidimensional medical images using sinusoidal activation networks: an evaluation study

Authors: Matteo Mancini, Derek K. Jones, Marco Palombo

Abstract: In this work, we evaluate how neural networks with periodic activation functions can be leveraged to reliably compress large multidimensional medical image datasets, with proof-of-concept application to 4D diffusion-weighted MRI (dMRI). In the medical imaging landscape, multidimensional MRI is a key area of research for developing biomarkers that are both sensitive and specific to the underlying tissue microstructure. However, the high-dimensional nature of these data poses a challenge in terms of both storage and sharing capabilities and associated costs, requiring appropriate algorithms able to represent the information in a low-dimensional space. Recent theoretical developments in deep learning have shown how periodic activation functions are a powerful tool for implicit neural representation of images and can be used for compression of 2D images. Here we extend this approach to 4D images and show how any given 4D dMRI dataset can be accurately represented through the parameters of a sinusoidal activation network, achieving a data compression rate about 10 times higher than the standard DEFLATE algorithm. Our results show that the proposed approach outperforms benchmark ReLU and Tanh activation perceptron architectures in terms of mean squared error, peak signal-to-noise ratio and structural similarity index.
Subsequent analyses using the tensor and spherical harmonics representations demonstrate that the proposed lossy compression accurately reproduces the characteristics of the original data, leading to relative errors about 5 to 10 times lower than the benchmark JPEG2000 lossy compression and similar to standard pre-processing steps such as MP-PCA denoising, suggesting a loss of information within the currently accepted levels for clinical application.
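The periodic-activation networks this abstract builds on are usually the SIREN architecture (Sitzmann et al., 2020), in which each hidden layer computes sin(ω₀(Wx + b)). The sketch below is a pure-Python forward pass mapping a 4D coordinate to one intensity, so the "compressed file" is simply the weight list. The layer sizes, ω₀ = 30 and the initialization bounds follow the SIREN paper's defaults and are assumptions, not this study's configuration; fitting the weights to a dMRI volume (the actual compression step) is omitted.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def init_layer(n_in: int, n_out: int, first: bool, omega0: float, rng: random.Random) -> Layer:
    """SIREN-style uniform init: U(-1/n_in, 1/n_in) for the first layer and
    U(-sqrt(6/n_in)/omega0, +sqrt(6/n_in)/omega0) afterwards.
    Biases reuse the same bound here for simplicity (an assumption)."""
    bound = 1.0 / n_in if first else math.sqrt(6.0 / n_in) / omega0
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-bound, bound) for _ in range(n_out)]
    return weights, biases


def affine(weights, biases, x):
    """Compute W x + b for one layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


class Siren:
    """Coordinate network: sin(omega0 * (W x + b)) hidden layers, linear output."""

    def __init__(self, sizes: List[int], omega0: float = 30.0, seed: int = 0):
        rng = random.Random(seed)
        self.omega0 = omega0
        self.layers = [
            init_layer(sizes[i], sizes[i + 1], i == 0, omega0, rng)
            for i in range(len(sizes) - 1)
        ]

    def __call__(self, coord: List[float]) -> List[float]:
        h = coord
        last = len(self.layers) - 1
        for k, (weights, biases) in enumerate(self.layers):
            z = affine(weights, biases, h)
            # Hidden layers are sinusoidal; the output layer stays linear.
            h = z if k == last else [math.sin(self.omega0 * v) for v in z]
        return h

    def num_params(self) -> int:
        """Number of stored floats, i.e. the size of the compressed representation."""
        return sum(len(w) * len(w[0]) + len(b) for w, b in self.layers)


# 4-D input (normalized x, y, z, diffusion-volume index) -> one signal intensity.
net = Siren([4, 32, 32, 1])
print(net([0.1, -0.2, 0.3, 0.5]))
print(net.num_params())  # 4*32+32 + 32*32+32 + 32*1+1 = 1249
```

Under this view, the compression ratio is roughly the number of voxels times volumes divided by `num_params()`, before any further quantization of the weights.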

Submitted 3 August, 2022; v1 submitted 2 August, 2022; originally announced August 2022.

arXiv:2202.11579

Identifying Oscillations Injected by Inverter-Based Solar Energy Sources

Authors: Chen Wang, Luigi Vanfretti, Chetan Mishra, Kevin D. Jones, R. Matthew Gardner

Abstract: Inverter-based solar energy sources are becoming widely integrated into modern power systems. However, their impacts on the system in the frequency domain are rarely investigated at a higher frequency range than conventional electromechanical oscillations. This paper presents evidence of the emergence of an oscillation mode injected by inverter-based solar energy sources in Dominion Energy's service territory. This new mode was recognized from the analysis of real-world ambient synchrophasor and point-of-wave data. The analysis was performed by developing customized synchrophasor analysis tools deployed on the PredictiveGrid™ platform implemented at Dominion Energy. Herein, we describe and illustrate the preliminary analysis results acquired from spectrogram observations, power spectral density plots, and mode shape estimation. The emergence and propagation of this new mode in Dominion Energy's footprint is illustrated using a heatmap based on a proposed frequency component energy metric, which helps to assess this oscillation's spread and impact.

Submitted 23 February, 2022; originally announced February 2022.

Comments: 5 pages, 14 figures. This paper is accepted and will be published in the Proceedings of the 2022 IEEE PES General Meeting, July 17-21 2022, Denver, CO, USA

arXiv:2111.08192

doi 10.1109/ICASSP43922.2022.9746132

SALSA-Lite: A Fast and Effective Feature for Polyphonic Sound Event Localization and Detection with Microphone Arrays

Authors: Thi Ngoc Tho Nguyen, Douglas L. Jones, Karn N. Watcharasupat, Huy Phan, Woon-Seng Gan

Abstract: Polyphonic sound event localization and detection (SELD) has many practical applications in acoustic sensing and monitoring. However, the development of real-time SELD has been limited by the demanding computational requirements of most recent SELD systems. In this work, we introduce SALSA-Lite, a fast and effective feature for polyphonic SELD using microphone array inputs. SALSA-Lite is a lightweight variation of a previously proposed SALSA feature for polyphonic SELD. SALSA, which stands for Spatial Cue-Augmented Log-Spectrogram, consists of multichannel log-spectrograms stacked channelwise with the normalized principal eigenvectors of the spectrotemporally corresponding spatial covariance matrices. In contrast to SALSA, which uses eigenvector-based spatial features, SALSA-Lite uses normalized inter-channel phase differences as spatial features, allowing a 30-fold speedup compared to the original SALSA feature. Experimental results on the TAU-NIGENS Spatial Sound Events 2021 dataset showed that the SALSA-Lite feature achieved competitive performance compared to the full SALSA feature, and significantly outperformed the traditional feature set of multichannel log-mel spectrograms with generalized cross-correlation spectra. Specifically, using SALSA-Lite features increased localization-dependent F1 score and class-dependent localization recall by 15% and 5%, respectively, compared to using multichannel log-mel spectrograms with generalized cross-correlation spectra.
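SALSA-Lite's spatial feature is described only as a normalized inter-channel phase difference (NIPD). A minimal sketch for a single time-frequency bin follows, assuming the normalization divides the phase of conj(X_ref)·X_m by -2πf/c so that it becomes a frequency-independent path-length difference in metres. The sign convention, the function name `nipd`, and c = 343 m/s are assumptions, not taken from the paper. The point it illustrates is why this is cheap: one complex multiply and one angle per bin, with no eigendecomposition.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def nipd(x_ref: complex, x_m: complex, freq_hz: float, c: float = SPEED_OF_SOUND) -> float:
    """Normalized inter-channel phase difference for one time-frequency bin.

    The phase of conj(X_ref) * X_m is scaled by -c / (2*pi*f), giving a path-length
    difference in metres. This is only unambiguous while the true phase difference
    stays within (-pi, pi], i.e. below the array's spatial-aliasing frequency.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    phase = cmath.phase(x_ref.conjugate() * x_m)
    return -c * phase / (2.0 * math.pi * freq_hz)


# A plane wave reaching mic m tau seconds after the reference mic:
# X_m = X_ref * exp(-j*2*pi*f*tau), so the NIPD should recover c * tau.
f, tau = 1000.0, 1e-4
x_ref = 0.8 + 0.3j
x_m = x_ref * cmath.exp(-2j * math.pi * f * tau)
print(nipd(x_ref, x_m, f))  # approximately 0.0343 (= 343 * 1e-4)
```

Because the result no longer depends on f, the same delay produces the same feature value in every frequency bin, which keeps the spatial channels aligned with the log-spectrogram bins.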

Submitted 4 May, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

Comments: arXiv admin note: text overlap with arXiv:2110.00275

Journal ref: Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 716-720

arXiv:2110.00275

doi 10.1109/TASLP.2022.3173054

SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection

Authors: Thi Ngoc Tho Nguyen, Karn N. Watcharasupat, Ngoc Khanh Nguyen, Douglas L. Jones, Woon-Seng Gan

Abstract: Sound event localization and detection (SELD) consists of two subtasks, which are sound event detection and direction-of-arrival estimation. While sound event detection mainly relies on time-frequency patterns to distinguish different sound classes, direction-of-arrival estimation uses amplitude and/or phase differences between microphones to estimate source directions. As a result, it is often difficult to jointly optimize these two subtasks. We propose a novel feature called Spatial cue-Augmented Log-SpectrogrAm (SALSA) with exact time-frequency mapping between the signal power and the source directional cues, which is crucial for resolving overlapping sound sources. The SALSA feature consists of multichannel log-spectrograms stacked along with the normalized principal eigenvector of the spatial covariance matrix at each corresponding time-frequency bin. Depending on the microphone array format, the principal eigenvector can be normalized differently to extract amplitude and/or phase differences between the microphones. As a result, SALSA features are applicable for different microphone array formats such as first-order ambisonics (FOA) and multichannel microphone array (MIC). Experimental results on the TAU-NIGENS Spatial Sound Events 2021 dataset with directional interferences showed that SALSA features outperformed other state-of-the-art features. Specifically, the use of SALSA features in the FOA format increased the F1 score and localization recall by 6% each, compared to the multichannel log-mel spectrograms with intensity vectors.
For the MIC format, using SALSA features increased F1 score and localization recall by 16% and 7%, respectively, compared to using multichannel log-mel spectrograms with generalized cross-correlation spectra.

Submitted 6 June, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

Comments: (c) 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1749-1762, 2022

arXiv:2107.10471

Improving Polyphonic Sound Event Detection on Multichannel Recordings with the Sørensen-Dice Coefficient Loss and Transfer Learning

Authors: Karn N. Watcharasupat, Thi Ngoc Tho Nguyen, Ngoc Khanh Nguyen, Zhen Jian Lee, Douglas L. Jones, Woon Seng Gan

Abstract: The Sørensen-Dice Coefficient has recently seen rising popularity as a loss function (also known as Dice loss) due to its robustness in tasks where the number of negative samples significantly exceeds that of positive samples, such as semantic segmentation, natural language processing, and sound event detection. Conventional training of polyphonic sound event detection systems with binary cross-entropy loss often results in suboptimal detection performance as the training is often overwhelmed by updates from negative samples. In this paper, we investigated the effect of the Dice loss, intra- and inter-modal transfer learning, data augmentation, and recording formats on the performance of polyphonic sound event detection systems with multichannel inputs. Our analysis showed that polyphonic sound event detection systems trained with Dice loss consistently outperformed those trained with cross-entropy loss across different training settings and recording formats in terms of F1 score and error rate. We achieved further performance gains via the use of transfer learning and an appropriate combination of different data augmentation techniques.
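The abstract does not write out the loss. One common soft form of the Dice loss, assumed here, is 1 - (2·Σp·y + ε)/(Σp + Σy + ε); the paper may use a per-class or batch-level variant. The toy example contrasts it with binary cross-entropy in the imbalance the abstract describes: 2 active frames out of 100, and a model that predicts 0.01 everywhere and therefore misses both events.

```python
import math
from typing import Sequence


def soft_dice_loss(probs: Sequence[float], targets: Sequence[float], eps: float = 1e-7) -> float:
    """1 - (2 * sum(p * y) + eps) / (sum(p) + sum(y) + eps) over flattened frames/classes."""
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    intersection = sum(p * y for p, y in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def bce_loss(probs: Sequence[float], targets: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(probs)


# 2 active frames out of 100; the model predicts 0.01 everywhere.
targets = [1.0, 1.0] + [0.0] * 98
probs = [0.01] * 100
print(round(bce_loss(probs, targets), 3))        # 0.102: looks small
print(round(soft_dice_loss(probs, targets), 3))  # 0.987: close to the worst case of 1
```

The 98 easy negatives dominate the cross-entropy average, whereas the Dice ratio only rewards overlap with the positives, which is the robustness argument made above.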

Submitted 2 October, 2021; v1 submitted 22 July, 2021; originally announced July 2021.

Comments: Submitted to the 6th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2021

arXiv:2107.10469

What Makes Sound Event Localization and Detection Difficult? Insights from Error Analysis

Authors: Thi Ngoc Tho Nguyen, Karn N. Watcharasupat, Zhen Jian Lee, Ngoc Khanh Nguyen, Douglas L. Jones, Woon Seng Gan

Abstract: Sound event localization and detection (SELD) is an emerging research topic that aims to unify the tasks of sound event detection and direction-of-arrival estimation. As a result, SELD inherits the challenges of both tasks, such as noise, reverberation, interference, polyphony, and non-stationarity of sound sources. Furthermore, SELD often faces an additional challenge of assigning correct correspondences between the detected sound classes and directions of arrival to multiple overlapping sound events. Previous studies have shown that unknown interferences in reverberant environments often cause major degradation in the performance of SELD systems. To further understand the challenges of the SELD task, we performed a detailed error analysis on two of our SELD systems, which both ranked second in the team category of the DCASE SELD Challenge, one in 2020 and one in 2021. Experimental results indicate that polyphony is the main challenge in SELD, due to the difficulty in detecting all sound events of interest. In addition, the SELD systems tend to make fewer errors for the polyphonic scenario that is dominant in the training set.

Submitted 2 October, 2021; v1 submitted 22 July, 2021; originally announced July 2021.

Comments: Accepted for the 6th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2021

Journal ref: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2021 Workshop, pp. 120-124

arXiv:2106.15190

doi 10.5281/zenodo.5031836

DCASE 2021 Task 3: Spectrotemporally-aligned Features for Polyphonic Sound Event Localization and Detection

Authors: Thi Ngoc Tho Nguyen, Karn Watcharasupat, Ngoc Khanh Nguyen, Douglas L. Jones, Woon Seng Gan

Abstract: Sound event localization and detection consists of two subtasks, which are sound event detection and direction-of-arrival estimation. While sound event detection mainly relies on time-frequency patterns to distinguish different sound classes, direction-of-arrival estimation uses magnitude or phase differences between microphones to estimate source directions. Therefore, it is often difficult to train these two subtasks jointly. We propose a novel feature called spatial cue-augmented log-spectrogram (SALSA) with exact time-frequency mapping between the signal power and the source direction-of-arrival. The feature includes multichannel log-spectrograms stacked along with the estimated direct-to-reverberant ratio and a normalized version of the principal eigenvector of the spatial covariance matrix at each time-frequency bin on the spectrograms. Experimental results on the DCASE 2021 dataset for sound event localization and detection with directional interference showed that the deep learning-based models trained on this new feature outperformed the DCASE challenge baseline by a large margin. We combined several models with slightly different architectures that were trained on the new feature to further improve the system performance for the DCASE sound event localization and detection challenge.

Submitted 29 June, 2021; originally announced June 2021.

Comments: 5 pages, Technical Report for DCASE 2021 Challenge Task 3. arXiv admin note: text overlap with arXiv:2110.00275

arXiv:2105.01793

Intensity Harmonization for Airborne LiDAR

Authors: David Jones, Nathan Jacobs

Abstract: Constructing a point cloud for a large geographic region, such as a state or country, can require multiple years of effort. Often several vendors will be used to acquire LiDAR data, and a single region may be captured by multiple LiDAR scans. A key challenge is maintaining consistency between these scans, which includes point density, number of returns, and intensity. Intensity in particular can be very different between scans, even in areas that are overlapping. Harmonizing the intensity between scans to remove these discrepancies is expensive and time-consuming. In this paper, we propose a novel method for point cloud harmonization based on deep neural networks. We evaluate our method quantitatively and qualitatively using a high-quality real-world LiDAR dataset. We compare our method to several baselines, including standard interpolation methods as well as histogram matching. We show that our method performs as well as the best baseline in areas with similar intensity distributions, and outperforms all baselines in areas with different intensity distributions. Source code is available at https://github.com/mvrl/lidar-harmonization.
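Histogram matching is one of the baselines the abstract compares against; it is not the proposed network. A minimal quantile-mapping sketch follows, assuming intensities of the points in a source scan are mapped onto the empirical distribution of a reference scan covering the same area. `histogram_match` and the toy intensity values are hypothetical.

```python
import bisect
from typing import List, Sequence


def histogram_match(source: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Map each source intensity to the reference intensity at the same quantile."""
    if not source or not reference:
        raise ValueError("inputs must be non-empty")
    src_sorted = sorted(source)
    ref_sorted = sorted(reference)
    n_src, n_ref = len(src_sorted), len(ref_sorted)
    out = []
    for v in source:
        # Empirical CDF of v within the source scan (midpoint rank for ties).
        lo = bisect.bisect_left(src_sorted, v)
        hi = bisect.bisect_right(src_sorted, v)
        q = (lo + hi) / 2.0 / n_src
        # Nearest-rank lookup of that quantile in the reference scan.
        idx = min(int(q * n_ref), n_ref - 1)
        out.append(ref_sorted[idx])
    return out


# Toy overlap region: the source scan is systematically darker than the reference.
source_scan = [12, 40, 25, 33]
reference_scan = [100, 180, 140, 160]
print(histogram_match(source_scan, reference_scan))  # [100, 180, 140, 160]
```

Because the mapping only depends on each value's rank, it aligns global intensity distributions well but cannot correct discrepancies that vary spatially or by surface type, which is consistent with the baseline doing worse where the two scans' distributions differ.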

Submitted 4 May, 2021; originally announced May 2021.

Comments: IGARSS 2021. Project website with video and code at https://davidthomasjones.me/publications/airborne-lidar-intensity-harmonization

arXiv:2103.14485

aDWI-BIDS: an extension to the brain imaging data structure for advanced diffusion weighted imaging

Authors: James Gholam, Filip Szczepankiewicz, Chantal M. W. Tax, Lars Mueller, Emre Kopanoglu, Markus Nilsson, Santiago Aja-Fernandez, Matt Griffin, Derek K. Jones, Leandro Beltrachini

Abstract: Diffusion-weighted imaging techniques permit us to infer microstructural detail in biological tissue in vivo and noninvasively. Modern sequences are based on advanced diffusion encoding schemes, allowing probing of more revealing measures of tissue microstructure than the standard apparent diffusion coefficient or fractional anisotropy. Though these methods may result in faster or more revealing acquisitions, they generally demand prior knowledge of sequence-specific parameters for which there is no accepted sharing standard. Here, we present a metadata labelling scheme suitable for the needs of developers and users within the diffusion neuroimaging community alike: a lightweight, unambiguous parametric map relaying acquisition parameters. This extensible scheme supports a wide spectrum of diffusion encoding methods, from single diffusion encoding to highly complex sequences involving arbitrary gradient waveforms. Built under the brain imaging data structure (BIDS), it allows storage of advanced diffusion MRI data comprehensively alongside any other neuroimaging information, facilitating processing pipelines and multimodal analyses. We illustrate the usefulness of this BIDS extension with a range of example data, and discuss the extension's impact on pre- and post-processing software.

Submitted 12 April, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

arXiv:2011.07859

A General Network Architecture for Sound Event Localization and Detection Using Transfer Learning and Recurrent Neural Network

Authors: Thi Ngoc Tho Nguyen, Ngoc Khanh Nguyen, Huy Phan, Lam Pham, Kenneth Ooi, Douglas L. Jones, Woon-Seng Gan

Abstract: The polyphonic sound event localization and detection (SELD) task is challenging because it is difficult to jointly optimize sound event detection (SED) and direction-of-arrival (DOA) estimation in the same network. We propose a general network architecture for SELD in which the SELD network comprises sub-networks that are pretrained to solve SED and DOA estimation independently, and a recurrent layer that combines the SED and DOA estimation outputs into SELD outputs. The recurrent layer does the alignment between the sound classes and DOAs of sound events while being unaware of how these outputs are produced by the upstream SED and DOA estimation algorithms. This simple network architecture is compatible with different existing SED and DOA estimation algorithms. It is highly practical since the sub-networks can be improved independently. The experimental results using the DCASE 2020 SELD dataset show that the performance of our proposed network architecture using different SED and DOA estimation algorithms and different audio formats is competitive with other state-of-the-art SELD algorithms. The source code for the proposed SELD network architecture is available on GitHub.

Submitted 16 November, 2020; originally announced November 2020.

arXiv:2009.07376

Q-space quantitative diffusion MRI measures using a stretched-exponential representation

Authors: Tomasz Pieciak, Maryam Afzali, Fabian Bogusz, Santiago Aja-Fernández, Derek K. Jones

Abstract: Diffusion magnetic resonance imaging (dMRI) is a relatively modern technique used to study tissue microstructure in a non-invasive way. Non-Gaussian diffusion representations relate to restricted diffusion and can provide information about the underlying tissue properties. In this paper, we analytically derive $n$-th order statistics of the signal considering a stretched-exponential representation of the diffusion. Then, we retrieve Q-space quantitative measures such as the Return-To-the-Origin Probability (RTOP), the Q-space mean square displacement (QMSD), and the Q-space mean fourth-order displacement (QMFD). The stretched-exponential representation enables the handling of the diffusion contributions from a higher $b$-value regime under a non-Gaussian assumption, which can be useful for the diagnosis or prognosis of neurodegenerative diseases in the early stages. A numerical implementation of the method is freely available at https://github.com/TPieciak/Stretched.
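The abstract does not spell out its formulas, so the following is only an illustrative derivation under stated assumptions, not the paper's expressions (the authors' own implementation is at the repository above). Assume an isotropic signal E(q) = exp(-(bD)^α) with 0 < α ≤ 1 and the narrow-pulse relation b = 4π²q²τ. Writing c = 4π²τD, RTOP = ∫E(q) d³q = 4π∫q²exp(-(cq²)^α) dq = 2πΓ(3/(2α)) / (α·c^{3/2}), which reduces to the Gaussian value (4πτD)^{-3/2} at α = 1. The sketch checks the closed form against midpoint-rule quadrature; function names and parameter values are hypothetical.

```python
import math


def stretched_signal(b: float, D: float, alpha: float, s0: float = 1.0) -> float:
    """S(b) = S0 * exp(-(b*D)**alpha); b in s/mm^2, D in mm^2/s, 0 < alpha <= 1."""
    return s0 * math.exp(-((b * D) ** alpha))


def rtop_isotropic(D: float, tau: float, alpha: float) -> float:
    """Closed-form RTOP (mm^-3) under the isotropic, b = 4*pi^2*q^2*tau assumption."""
    c = 4.0 * math.pi ** 2 * tau * D
    return 2.0 * math.pi * math.gamma(1.5 / alpha) / (alpha * c ** 1.5)


def rtop_numeric(D: float, tau: float, alpha: float, q_max: float, n: int = 100_000) -> float:
    """Midpoint-rule check of the radial integral of 4*pi*q^2*E(q) from 0 to q_max."""
    c = 4.0 * math.pi ** 2 * tau * D
    h = q_max / n
    total = 0.0
    for i in range(n):
        q = (i + 0.5) * h
        total += 4.0 * math.pi * q * q * math.exp(-((c * q * q) ** alpha))
    return total * h


D, tau = 0.7e-3, 0.03  # illustrative diffusivity (mm^2/s) and diffusion time (s)
for alpha in (1.0, 0.7):
    print(alpha, rtop_isotropic(D, tau, alpha), rtop_numeric(D, tau, alpha, q_max=800.0))
# For alpha < 1 the high-b signal decays more slowly than a mono-exponential.
print(stretched_signal(3000.0, D, 0.7) > stretched_signal(3000.0, D, 1.0))
```

The slower high-b decay for α < 1 is what makes the representation attractive for the higher b-value regime mentioned in the abstract.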

Submitted 15 September, 2020; originally announced September 2020.

arXiv:2008.00893

doi 10.1109/EEM49802.2020.9221928

Tracing carbon dioxide emissions in the European electricity markets

Authors: Mirko Schäfer, Bo Tranberg, Dave Jones, Anke Weidlich

Abstract: Consumption-based carbon emission measures aim to account for emissions associated with power transmission from distant regions, as opposed to measures which only consider local power generation. Outlining key differences between two methodological variants of this approach, we report results on consumption-based emission intensities of power generation for European countries from 2016 to 2019. We find that, in particular for well-connected smaller countries, the consideration of imports has a significant impact on the attributed emissions. For these countries, implicit methodological choices in the input-output model are reflected in both hourly and average yearly emission measures.
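The abstract contrasts production-based and consumption-based intensities without giving formulas. As a hedged illustration only, one simple import-adjusted model assumes each region's consumed electricity is a proportional blend of its own generation and its imports, with imports carrying the exporter's consumption intensity: c_n·(g_n + Σ_m F_mn) = g_n·e_n + Σ_m F_mn·c_m. This does not reproduce either of the paper's two methodological variants; `consumption_intensity` and the two-country numbers are hypothetical.

```python
from typing import Dict, Tuple


def consumption_intensity(
    generation: Dict[str, float],
    gen_intensity: Dict[str, float],
    flows: Dict[Tuple[str, str], float],
    tol: float = 1e-12,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Import-adjusted emission intensity per region (e.g. tCO2/MWh).

    generation[n]: local generation g_n; gen_intensity[n]: production intensity e_n;
    flows[(m, n)]: energy exported from m to n. Solves
    c_n * (g_n + sum_m F[m, n]) = g_n * e_n + sum_m F[m, n] * c_m
    by fixed-point iteration, starting from the production-based intensities.
    """
    c = dict(gen_intensity)
    for _ in range(max_iter):
        new_c = {}
        for n, g in generation.items():
            inflow = [(m, f) for (m, dst), f in flows.items() if dst == n and f > 0]
            total_in = g + sum(f for _, f in inflow)
            emitted = g * gen_intensity[n] + sum(f * c[m] for m, f in inflow)
            new_c[n] = emitted / total_in if total_in > 0 else gen_intensity[n]
        if max(abs(new_c[n] - c[n]) for n in c) < tol:
            return new_c
        c = new_c
    raise RuntimeError("fixed-point iteration did not converge")


# Region A: 100 MWh at 0.8 t/MWh, exports 20 MWh to region B.
# Region B: 50 MWh at 0.1 t/MWh of its own generation.
result = consumption_intensity({"A": 100.0, "B": 50.0}, {"A": 0.8, "B": 0.1}, {("A", "B"): 20.0})
print(result)  # B: (50*0.1 + 20*0.8) / 70 = 0.3, triple its production-based 0.1
```

Even in this toy case the smaller, importing region's attributed intensity changes substantially, which mirrors the abstract's observation about well-connected smaller countries.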

Submitted 3 August, 2020; originally announced August 2020.

Comments: Accepted conference proceedings paper for the EEM 2020

arXiv:2005.02020 [pdf, other]

Deep learning-based parameter mapping for joint relaxation and diffusion tensor MR Fingerprinting

Authors: Carolin M. Pirkl, Pedro A. Gómez, Ilona Lipp, Guido Buonincontri, Miguel Molina-Romero, Anjany Sekuboyina, Diana Waldmannstetter, Jonathan Dannenberg, Sebastian Endt, Alberto Merola, Joseph R. Whittaker, Valentina Tomassini, Michela Tosetti, Derek K. Jones, Bjoern H. Menze, Marion I. Menzel

Abstract: Magnetic Resonance Fingerprinting (MRF) enables the simultaneous quantification of multiple properties of biological tissues. It relies on a pseudo-random acquisition and the matching of acquired signal evolutions to a precomputed dictionary. However, the dictionary is not scalable to higher-parametric spaces, limiting MRF to the simultaneous mapping of only a small number of parameters (proton density, T1 and T2 in general). Inspired by diffusion-weighted SSFP imaging, we present a proof-of-concept of a novel MRF sequence with embedded diffusion-encoding gradients along all three axes to efficiently encode orientational diffusion and T1 and T2 relaxation. We take advantage of a convolutional neural network (CNN) to reconstruct multiple quantitative maps from this single, highly undersampled acquisition. We bypass expensive dictionary matching by learning the implicit physical relationships between the spatiotemporal MRF data and the T1, T2 and diffusion tensor parameters. The predicted parameter maps and the derived scalar diffusion metrics agree well with state-of-the-art reference protocols. Orientational diffusion information is captured as seen from the estimated primary diffusion directions. In addition to this, the joint acquisition and reconstruction framework proves capable of preserving tissue abnormalities in multiple sclerosis lesions.
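For context, the conventional dictionary matching that the paper bypasses is commonly done by picking the dictionary entry with the largest normalized inner product with the measured signal evolution. The sketch below shows that step on made-up toy signals. The dictionary, its T1-only parameterization, and the function name are hypothetical. A real MRF dictionary is simulated from the actual sequence and is typically complex-valued and far larger.

```python
import math


def match_fingerprint(signal, dictionary):
    """Return (best index, scale) for the atom with maximal |normalized inner product|.

    Normalizing makes the match insensitive to an overall scale factor
    (e.g. proton density), which is then recovered by a least-squares fit.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    s_norm = math.sqrt(dot(signal, signal))
    best_idx, best_score = -1, -1.0
    for idx, atom in enumerate(dictionary):
        score = abs(dot(signal, atom)) / (s_norm * math.sqrt(dot(atom, atom)))
        if score > best_score:
            best_idx, best_score = idx, score
    atom = dictionary[best_idx]
    scale = dot(atom, signal) / dot(atom, atom)  # least-squares scale factor
    return best_idx, scale


# Toy dictionary of inversion-recovery-like curves for a few hypothetical T1 values.
times = range(1, 21)
t1_values = [2.0, 4.0, 8.0, 16.0]
dictionary = [[1.0 - 2.0 * math.exp(-t / t1) for t in times] for t1 in t1_values]
measured = [3.0 * x for x in dictionary[2]]  # scaled copy of the T1 = 8 atom
print(match_fingerprint(measured, dictionary))
```

The loop scans every atom once. That cost grows with the number of simulated parameter combinations, which is the scalability limit the abstract describes.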

Submitted 5 May, 2020; originally announced May 2020.

arXiv:2002.05865 [pdf, other]

doi 10.1109/ICASSP40776.2020.9053045

A Sequence Matching Network for Polyphonic Sound Event Localization and Detection

Authors: Thi Ngoc Tho Nguyen, Douglas L. Jones, Woon-Seng Gan

Abstract: Polyphonic sound event detection and direction-of-arrival estimation require different input features from audio signals. While sound event detection mainly relies on time-frequency patterns, direction-of-arrival estimation relies on magnitude or phase differences between microphones. Previous approaches use the same input features for sound event detection and direction-of-arrival estimation, and train the two tasks jointly or in a two-stage transfer-learning manner. We propose a two-step approach that decouples the learning of the sound event detection and direction-of-arrival estimation systems. In the first step, we detect the sound events and estimate the directions-of-arrival separately to optimize the performance of each system. In the second step, we train a deep neural network to match the two output sequences of the event detector and the direction-of-arrival estimator. This modular and hierarchical approach allows flexibility in the system design and increases the performance of the whole sound event localization and detection system. The experimental results using the DCASE 2019 sound event localization and detection dataset show improved performance compared to the previous state-of-the-art solutions.
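The inter-microphone differences that direction-of-arrival estimation relies on can be illustrated with the textbook far-field model for two microphones spaced $d$ apart. A plane wave arriving at angle $\theta$ from broadside reaches one microphone $\tau = d \sin\theta / c$ later than the other, so $\theta = \arcsin(c\tau/d)$. At a single frequency $f$, that delay appears as a phase difference $\Delta\phi = 2\pi f \tau$. The sketch assumes free-field propagation and a speed of sound of 343 m/s. It is a generic illustration, not the learned estimator used in the paper, and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)


def doa_from_delay(tau: float, mic_spacing: float, c: float = SPEED_OF_SOUND) -> float:
    """Far-field DOA in degrees from broadside, given the inter-mic delay tau in seconds."""
    ratio = c * tau / mic_spacing
    # Clamp to [-1, 1]: noisy delay estimates can slightly exceed the physical range.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


def doa_from_phase(delta_phi: float, freq: float, mic_spacing: float,
                   c: float = SPEED_OF_SOUND) -> float:
    """Same estimate from a phase difference at one frequency (tau = delta_phi / (2*pi*f)).

    Only unambiguous while mic_spacing < c / (2 * freq), i.e. without spatial aliasing.
    """
    return doa_from_delay(delta_phi / (2.0 * math.pi * freq), mic_spacing, c)


# 10 cm spacing: a delay of d / (2c) corresponds to 30 degrees from broadside.
d = 0.10
print(doa_from_delay(d / (2 * SPEED_OF_SOUND), d))
```

Time-frequency patterns identify the event class but carry none of this spatial information. This is why the abstract treats detection and localization features separately.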

Submitted 13 February, 2020; originally announced February 2020.

Comments: to be published in 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv:1911.11373 [pdf, other]

A two-step system for sound event localization and detection

Authors: T. N. T. Nguyen, D. L. Jones, R. Ranjan, S. Jayabalan, W. S. Gan

Abstract: Sound event detection and sound event localization require different features from audio input signals. While sound event detection mainly relies on time-frequency patterns to distinguish different event classes, sound event localization uses magnitude or phase differences between microphones to estimate source directions. Therefore, we propose a two-step system for sound event localization and detection. In the first step, we detect the sound events and estimate the directions-of-arrival separately. In the second step, we combine the results of the event detector and the direction-of-arrival estimator. The obtained results show a significant improvement over the baseline solution for sound event localization and detection in the DCASE 2019 task 3 challenge. On the evaluation dataset, the proposed system achieved an F1 score of 93.4% for sound event detection and an error of 5.4 degrees for direction-of-arrival estimation, while the winning solution achieved an F1 score of 94.7% and an angle error of 3.7 degrees.

Submitted 26 November, 2019; originally announced November 2019.

Comments: 5 pages

arXiv:1911.01500 [pdf]

doi 10.1109/ACCESS.2019.2944818

Transmission Lines Positive Sequence Parameters Estimation and Instrument Transformers Calibration Based on PMU Measurement Error Model

Authors: Chen Wang, Virgilio A. Centeno, Kevin D. Jones, Duotong Yang

Abstract: Phasor Measurement Unit (PMU) measurement data have been widely used in today's power system applications, in both steady-state and dynamic analysis. The performance of these applications running in utilities' energy management systems depends heavily on an accurate positive sequence power system model. However, it is impractical to obtain this accurate model with transmission line parameters calculated directly from the PMU measurements, due to ratio errors introduced by instrument transformers and communication errors introduced by PMUs. Therefore, this paper proposes a methodology to estimate the actual transmission line parameters throughout the whole system and, at the same time, calibrate the corresponding instrument transformers. A PMU positive sequence measurement error model targeting the aforementioned errors is proposed, which is applicable to both transposed and un-transposed transmission lines. A single-line parameter estimation method is designed based on Least Squares Estimation and this error model. This method requires only one set of reference measurements, and the accuracy can be propagated throughout the whole network along with the topology acquired by the introduced Edge-based Breadth-first Search algorithm. The IEEE 118-bus system and the Texas 2000-bus system are used to demonstrate the effectiveness and efficiency of the proposed method. The potential for deployment in practice is also discussed.
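The least-squares idea can be shown in a much simpler setting than the paper's, using the short-line model, which ignores shunt admittance. Here the sending- and receiving-end positive-sequence phasors satisfy $V_s - V_r = Z I$. Given $K$ phasor snapshots, the least-squares estimate of the series impedance is $\hat Z = \sum_k \overline{I_k}\,(V_{s,k} - V_{r,k}) / \sum_k |I_k|^2$. This sketch omits the paper's PMU measurement error model, instrument-transformer ratio errors, and the full line model. The function name and synthetic per-unit values are hypothetical.

```python
import cmath


def estimate_series_impedance(v_send, v_recv, current):
    """Least-squares Z for V_s - V_r = Z * I over K phasor snapshots (short-line model).

    Minimizes sum_k |(V_s,k - V_r,k) - Z * I_k|^2 over complex Z.
    """
    num = sum(i.conjugate() * (vs - vr) for vs, vr, i in zip(v_send, v_recv, current))
    den = sum(abs(i) ** 2 for i in current)
    return num / den


# Synthetic, noise-free per-unit snapshots for an assumed true impedance.
z_true = complex(0.01, 0.08)
currents = [cmath.rect(1.0 + 0.1 * k, -0.2 + 0.05 * k) for k in range(5)]
v_send = [cmath.rect(1.02, 0.05 * k) for k in range(5)]
v_recv = [vs - z_true * i for vs, i in zip(v_send, currents)]
print(estimate_series_impedance(v_send, v_recv, currents))
```

With noise-free data the estimate reproduces `z_true`. A systematic ratio error on one end's voltage or current would bias $\hat Z$ instead of averaging out, which is the motivation the abstract gives for modeling instrument-transformer errors explicitly.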

Submitted 4 November, 2019; originally announced November 2019.

Journal ref: IEEE Access (Volume: 7), Oct 2019, 145104 - 145117

arXiv:1705.00615 [pdf, other]

Guided-Processing Outperforms Duty-Cycling for Energy-Efficient Systems

Authors: Long N. Le, Douglas L. Jones

Abstract: Energy efficiency is highly desirable for sensing systems in the Internet of Things (IoT). A common approach to achieving low-power systems is duty-cycling, where components in a system are turned off periodically to meet an energy budget. However, this work shows that such an approach is not necessarily optimal in energy efficiency, and proposes guided-processing as a fundamentally better alternative. The proposed approach offers 1) explicit modeling of performance uncertainties in system internals, 2) a realistic resource consumption model, and 3) a key insight into the superiority of guided-processing over duty-cycling. A generalization from the cascade structure to the more general graph-based one is also presented. When the proposed approach is applied to optimize a large-scale audio sensing system with a practical detection application, empirical results show that it significantly improves detection performance (up to $1.7\times$ and $4\times$ reduction in false-alarm and miss rate, respectively) for the same energy consumption, compared to the duty-cycling approach.

Submitted 1 May, 2017; originally announced May 2017.

Comments: preprint, the published version is in IEEE Transactions on Circuits and Systems I, Special Issue on Circuits and Systems for the Internet of Things - From Sensing to Sensemaking, 2017. arXiv admin note: substantial text overlap with arXiv:1705.00596

arXiv:1705.00596 [pdf, other]

doi 10.1109/JSTSP.2017.2679539

Feature-Sharing in Cascade Detection Systems with Multiple Applications

Authors: Long N. Le, Douglas L. Jones

Abstract: Traditional distributed detection systems are often designed for a single target application. However, with the emergence of the Internet of Things (IoT) paradigm, next-generation systems are expected to be a shared infrastructure for multiple applications. To this end, we propose a modular, cascade design for resource-efficient, multi-task detection systems. Two (classes of) applications are considered in the system, a primary and a secondary one. The primary application has universal features that can be shared with other applications, to reduce the overall feature extraction cost, while the secondary application does not. In this setting, the two applications can collaborate via feature sharing. We provide a method to optimize the operation of the multi-application cascade system based on an accurate resource consumption model. In addition, the inherent uncertainties in feature models are articulated and taken into account. For evaluation, the twin-comparison argument is invoked, and it is shown that, with the optimal feature sharing strategy, a system can achieve 9$\times$ resource saving and 1.43$\times$ improvement in detection performance.

Submitted 1 May, 2017; originally announced May 2017.

Comments: preprint, the published version is in IEEE Journal of Selected Topics in Signal Processing, Special Issue on Cooperative Signal Processing for Heterogeneous and Multi-Task Wireless Sensor Networks, 2017

Showing 1–22 of 22 results for author: Jones, D