Private kNN-VC: Interpretable Anonymization of Converted Speech

Authors: Carlos Franzreb, Arnab Das, Tim Polzehl, Sebastian Möller

Abstract: Speaker anonymization seeks to conceal a speaker's identity while preserving the utility of their speech. The achieved privacy is commonly evaluated with a speaker recognition model trained on anonymized speech. Although this represents a strong attack, it is unclear which aspects of speech are exploited to identify the speakers. Our research sets out to unveil these aspects. It starts with kNN-VC, a powerful voice conversion model that performs poorly as an anonymization system, presumably because of prosody leakage. To test this hypothesis, we extend kNN-VC with two interpretable components that anonymize the duration and variation of phones. These components increase privacy significantly, proving that the studied prosodic factors encode speaker identity and are exploited by the privacy attack. Additionally, we show that changes in the target selection algorithm considerably influence the outcome of the privacy attack.

Submitted 23 May, 2025; originally announced May 2025.

Comments: Accepted by Interspeech 2025

arXiv:2505.13930 [pdf, ps, other]

BiCrossMamba-ST: Speech Deepfake Detection with Bidirectional Mamba Spectro-Temporal Cross-Attention

Authors: Yassine El Kheir, Tim Polzehl, Sebastian Möller

Abstract: We propose BiCrossMamba-ST, a robust framework for speech deepfake detection that leverages a dual-branch spectro-temporal architecture powered by bidirectional Mamba blocks and mutual cross-attention. By processing spectral sub-bands and temporal intervals separately and then integrating their representations, BiCrossMamba-ST effectively captures the subtle cues of synthetic speech. In addition, our proposed framework leverages a convolution-based 2D attention map to focus on specific spectro-temporal regions, enabling robust deepfake detection. Operating directly on raw features, BiCrossMamba-ST achieves significant performance improvements, a 67.74% and 26.3% relative gain over state-of-the-art AASIST on ASVSpoof LA21 and ASVSpoof DF21 benchmarks, respectively, and a 6.80% improvement over RawBMamba on ASVSpoof DF21. Code and models will be made publicly available.

Submitted 20 May, 2025; originally announced May 2025.

Comments: Accepted Interspeech 2025

arXiv:2502.03559 [pdf, other]

Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection

Authors: Yassine El Kheir, Youness Samih, Suraj Maharjan, Tim Polzehl, Sebastian Möller

Abstract: This paper conducts a comprehensive layer-wise analysis of self-supervised learning (SSL) models for audio deepfake detection across diverse contexts, including multilingual datasets (English, Chinese, Spanish), partial, song, and scene-based deepfake scenarios. By systematically evaluating the contributions of different transformer layers, we uncover critical insights into model behavior and performance. Our findings reveal that lower layers consistently provide the most discriminative features, while higher layers capture less relevant information. Notably, all models achieve competitive equal error rate (EER) scores even when employing a reduced number of layers. This indicates that we can reduce computational costs and increase the inference speed of detecting deepfakes by utilizing only a few lower layers. This work enhances our understanding of SSL models in deepfake detection, offering valuable insights applicable across varied linguistic and contextual settings. Our trained models and code are publicly available: https://github.com/Yaselley/SSL_Layerwise_Deepfake.

Submitted 7 February, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

Comments: Accepted to NAACL Findings 2025

arXiv:2204.07923 [pdf, other]

Accelerated MRI With Deep Linear Convolutional Transform Learning

Authors: Hongyi Gu, Burhaneddin Yaman, Steen Moeller, Il Yong Chun, Mehmet Akçakaya

Abstract: Recent studies show that deep learning (DL) based MRI reconstruction outperforms conventional methods, such as parallel imaging and compressed sensing (CS), in multiple applications. Unlike CS that is typically implemented with pre-determined linear representations for regularization, DL inherently uses a non-linear representation learned from a large database. Another line of work uses transform learning (TL) to bridge the gap between these two approaches by learning linear representations from data. In this work, we combine ideas from CS, TL and DL reconstructions to learn deep linear convolutional transforms as part of an algorithm unrolling approach. Using end-to-end training, our results show that the proposed technique can reconstruct MR images to a level comparable to DL methods, while supporting uniform undersampling patterns unlike conventional CS methods. Our proposed method relies on convex sparse image reconstruction with linear representation at inference time, which may be beneficial for characterizing robustness, stability and generalizability.

Submitted 19 August, 2022; v1 submitted 17 April, 2022; originally announced April 2022.

arXiv:2204.01115 [pdf, other]

On incorporating social speaker characteristics in synthetic speech

Authors: Sai Sirisha Rallabandi, Sebastian Möller

Abstract: In our previous work, we derived the acoustic features that contribute to the perception of warmth and competence in synthetic speech. As an extension, in our current work, we investigate the impact of the derived vocal features in the generation of the desired characteristics. The acoustic features spectral flux, F1 mean, and F2 mean, and their convex combinations, were explored for the generation of higher warmth in female speech. The voiced slope, spectral flux, and their convex combinations were investigated for the generation of higher competence in female speech. We have employed a feature quantization approach in the traditional end-to-end Tacotron-based speech synthesis model. The listening tests have shown that the convex combination of acoustic features displays higher Mean Opinion Scores of warmth and competence when compared to that of individual features.

Submitted 3 April, 2022; originally announced April 2022.

Comments: Submitted to Interspeech 2022

arXiv:2203.16032 [pdf, other]

ConferencingSpeech 2022 Challenge: Non-intrusive Objective Speech Quality Assessment (NISQA) Challenge for Online Conferencing Applications

Authors: Gaoxiong Yi, Wei Xiao, Yiming Xiao, Babak Naderi, Sebastian Möller, Wafaa Wardah, Gabriel Mittag, Ross Cutler, Zhuohuang Zhang, Donald S. Williamson, Fei Chen, Fuzheng Yang, Shidong Shang

Abstract: With the advances in speech communication systems such as online conferencing applications, we can seamlessly work with people regardless of where they are. However, during online meetings, speech quality can be significantly affected by background noise, reverberation, packet loss, network jitter, etc. Because of its nature, speech quality is traditionally assessed in subjective tests in laboratories and lately also in crowdsourcing following the international standards from ITU-T Rec. P.800 series. However, those approaches are costly and cannot be applied to customer data. Therefore, an effective objective assessment approach is needed to evaluate or monitor the speech quality of the ongoing conversation. The ConferencingSpeech 2022 challenge targets the non-intrusive deep neural network models for the speech quality assessment task. We open-sourced a training corpus with more than 86K speech clips in different languages, with a wide range of synthesized and live degradations and their corresponding subjective quality scores through crowdsourcing. 18 teams submitted their models for evaluation in this challenge. The blind test sets included about 4300 clips from wide ranges of degradations. This paper describes the challenge, the datasets, and the evaluation methods and reports the final results.

Submitted 31 March, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

arXiv:2112.06219 [pdf, other]

Visualising and Explaining Deep Learning Models for Speech Quality Prediction

Authors: H. Tilkorn, G. Mittag, S. Möller

Abstract: Estimating quality of transmitted speech is known to be a non-trivial task. While traditionally test participants are asked to rate the quality of samples, nowadays automated methods are available. These methods can be divided into: 1) intrusive models, which use both the original and the degraded signals, and 2) non-intrusive models, which only require the degraded signal. Recently, non-intrusive models based on neural networks were shown to outperform signal processing based models. However, the advantages of deep learning based models come with the cost of being more challenging to interpret. To get more insight into the prediction models, the non-intrusive speech quality prediction model NISQA is analyzed in this paper. NISQA is composed of a convolutional neural network (CNN) and a recurrent neural network (RNN). The task of the CNN is to compute relevant features for the speech quality prediction on a frame level, while the RNN models time-dependencies between the individual speech frames. Different explanation algorithms are used to understand the automatically learned features of the CNN. In this way, several interpretable features could be identified, such as the sensitivity to noise or strong interruptions. On the other hand, it was found that multiple features carry redundant information.

Submitted 12 December, 2021; originally announced December 2021.

Comments: 4 pages, 6 figures, In Proceedings of the DAGA 2021 (the annual conference of the German Acoustical Society, DEGA)

ACM Class: I.2.7

arXiv:2105.05827 [pdf, other]

20-fold Accelerated 7T fMRI Using Referenceless Self-Supervised Deep Learning Reconstruction

Authors: Omer Burak Demirel, Burhaneddin Yaman, Logan Dowdle, Steen Moeller, Luca Vizioli, Essa Yacoub, John Strupp, Cheryl A. Olman, Kâmil Uğurbil, Mehmet Akçakaya

Abstract: High spatial and temporal resolution across the whole brain is essential to accurately resolve neural activities in fMRI. Therefore, accelerated imaging techniques target improved coverage with high spatio-temporal resolution. Simultaneous multi-slice (SMS) imaging combined with in-plane acceleration is used in large studies that involve ultrahigh field fMRI, such as the Human Connectome Project. However, for even higher acceleration rates, these methods cannot be reliably utilized due to aliasing and noise artifacts. Deep learning (DL) reconstruction techniques have recently gained substantial interest for improving highly-accelerated MRI. Supervised learning of DL reconstructions generally requires fully-sampled training datasets, which are not available for high-resolution fMRI studies. To tackle this challenge, self-supervised learning has been proposed for training of DL reconstruction with only undersampled datasets, showing similar performance to supervised learning. In this study, we utilize a self-supervised physics-guided DL reconstruction on 5-fold SMS and 4-fold in-plane accelerated 7T fMRI data. Our results show that our self-supervised DL reconstruction produces high-quality images at this 20-fold acceleration, substantially improving on existing methods, while showing similar functional precision and temporal effects in the subsequent analysis compared to a standard 10-fold accelerated acquisition.

Submitted 12 May, 2021; originally announced May 2021.

arXiv:2105.04532 [pdf, other]

Improved Simultaneous Multi-Slice Functional MRI Using Self-supervised Deep Learning

Authors: Omer Burak Demirel, Burhaneddin Yaman, Logan Dowdle, Steen Moeller, Luca Vizioli, Essa Yacoub, John Strupp, Cheryl A. Olman, Kâmil Uğurbil, Mehmet Akçakaya

Abstract: Functional MRI (fMRI) is commonly used for interpreting neural activities across the brain. Numerous accelerated fMRI techniques aim to provide improved spatiotemporal resolutions. Among these, simultaneous multi-slice (SMS) imaging has emerged as a powerful strategy, becoming a part of large-scale studies, such as the Human Connectome Project. However, when SMS imaging is combined with in-plane acceleration for higher acceleration rates, conventional SMS reconstruction methods may suffer from noise amplification and other artifacts. Recently, deep learning (DL) techniques have gained interest for improving MRI reconstruction. However, these methods are typically trained in a supervised manner that necessitates fully-sampled reference data, which is not feasible in highly-accelerated fMRI acquisitions. Self-supervised learning that does not require fully-sampled data has recently been proposed and has shown similar performance to supervised learning. However, it has only been applied for in-plane acceleration. Furthermore, the effect of DL reconstruction on subsequent fMRI analysis remains unclear. In this work, we extend self-supervised DL reconstruction to SMS imaging. Our results on prospectively 10-fold accelerated 7T fMRI data show that self-supervised DL reduces reconstruction noise and suppresses residual artifacts. Subsequent fMRI analysis remains unaltered by DL processing, while the improved temporal signal-to-noise ratio produces higher coherence estimates between task runs.

Submitted 10 May, 2021; originally announced May 2021.

arXiv:2105.00783 [pdf, other]

doi 10.1109/ICASSP40776.2020.9053951

Full-Reference Speech Quality Estimation with Attentional Siamese Neural Networks

Authors: Gabriel Mittag, Sebastian Möller

Abstract: In this paper, we present a full-reference speech quality prediction model with a deep learning approach. The model determines a feature representation of the reference and the degraded signal through a siamese recurrent convolutional network that shares the weights for both signals as input. The resulting features are then used to align the signals with an attention mechanism and are finally combined to estimate the overall speech quality. The proposed network architecture represents a simple solution for the time-alignment problem that occurs for speech signals transmitted through Voice-Over-IP networks and shows how the clean reference signal can be incorporated into speech quality models that are based on end-to-end trained neural networks.

Submitted 3 May, 2021; originally announced May 2021.

Comments: Late upload, presented at ICASSP 2020

arXiv:2104.11673 [pdf, other]

doi 10.21437/Interspeech.2020-2382

Deep Learning Based Assessment of Synthetic Speech Naturalness

Authors: Gabriel Mittag, Sebastian Möller

Abstract: In this paper, we present a new objective prediction model for synthetic speech naturalness. It can be used to evaluate Text-To-Speech or Voice Conversion systems and works language independently. The model is trained end-to-end and based on a CNN-LSTM network that was previously shown to give good results for speech quality estimation. We trained and tested the model on 16 different datasets, such as from the Blizzard Challenge and the Voice Conversion Challenge. Further, we show that the reliability of deep learning-based naturalness prediction can be improved by transfer learning from speech quality prediction models that are trained on objective POLQA scores. The proposed model is made publicly available and can, for example, be used to evaluate different TTS system configurations.

Submitted 23 April, 2021; originally announced April 2021.

Comments: Late upload, presented at Interspeech 2020

arXiv:2104.10217 [pdf, other]

doi 10.1109/QoMEX51781.2021.9465384

Bias-Aware Loss for Training Image and Speech Quality Prediction Models from Multiple Datasets

Authors: Gabriel Mittag, Saman Zadtootaghaj, Thilo Michael, Babak Naderi, Sebastian Möller

Abstract: The ground truth used for training image, video, or speech quality prediction models is based on the Mean Opinion Scores (MOS) obtained from subjective experiments. Usually, it is necessary to conduct multiple experiments, mostly with different test participants, to obtain enough data to train quality models based on machine learning. Each of these experiments is subject to an experiment-specific bias, where the rating of the same file may be substantially different in two experiments (e.g. depending on the overall quality distribution). These different ratings for the same distortion levels confuse neural networks during training and lead to lower performance. To overcome this problem, we propose a bias-aware loss function that estimates each dataset's biases during training with a linear function and considers it while optimising the network weights. We prove the efficiency of the proposed method by training and validating quality prediction models on synthetic and subjective image and speech quality datasets.

Submitted 20 April, 2021; originally announced April 2021.

Comments: Accepted at QoMEX 2021

arXiv:2104.09494 [pdf, other]

doi 10.21437/Interspeech.2021-299

NISQA: A Deep CNN-Self-Attention Model for Multidimensional Speech Quality Prediction with Crowdsourced Datasets

Authors: Gabriel Mittag, Babak Naderi, Assmaa Chehadi, Sebastian Möller

Abstract: In this paper, we present an update to the NISQA speech quality prediction model that is focused on distortions that occur in communication networks. In contrast to the previous version, the model is trained end-to-end and the time-dependency modelling and time-pooling are achieved through a Self-Attention mechanism. Besides overall speech quality, the model also predicts the four speech quality dimensions Noisiness, Coloration, Discontinuity, and Loudness, and in this way gives more insight into the cause of a quality degradation. Furthermore, new datasets with over 13,000 speech files were created for training and validation of the model. The model was finally tested on a new, live-talking test dataset that contains recordings of real telephone calls. Overall, NISQA was trained and evaluated on 81 datasets from different sources and was shown to provide reliable predictions also for unknown speech samples. The code, model weights, and datasets are open-sourced.

Submitted 19 April, 2021; originally announced April 2021.

Comments: Submitted to Interspeech 2021

arXiv:2104.04371 [pdf, other]

Speech Quality Assessment in Crowdsourcing: Comparison Category Rating Method

Authors: Babak Naderi, Sebastian Möller, Ross Cutler

Abstract: Traditionally, Quality of Experience (QoE) for a communication system is evaluated through a subjective test. The most common test method for speech QoE is the Absolute Category Rating (ACR), in which participants listen to a set of stimuli, processed by the underlying test conditions, and rate their perceived quality for each stimulus on a specific scale. The Comparison Category Rating (CCR) is another standard approach in which participants listen to both reference and processed stimuli and rate their quality compared to the other one. The CCR method is particularly suitable for systems that improve the quality of input speech. This paper evaluates an adaptation of the CCR test procedure for assessing speech quality in the crowdsourcing set-up. The CCR method was introduced in the ITU-T Rec. P.800 for laboratory-based experiments. We adapted the test for the crowdsourcing approach following the guidelines from ITU-T Rec. P.800 and P.808. We show that the results of the CCR procedure via crowdsourcing are highly reproducible. We also compared the CCR test results with widely used ACR test procedures obtained in the laboratory and crowdsourcing. Our results show that the CCR procedure in crowdsourcing is a reliable and valid test method.

Submitted 9 April, 2021; originally announced April 2021.

Comments: Accepted for QoMEX2021

arXiv:2103.03970 [pdf, other]

doi 10.1109/TASLP.2021.3057955

Incorporating Wireless Communication Parameters into the E-Model Algorithm

Authors: Demóstenes Z. Rodríguez, Dick Carrillo Melgarejo, Miguel A. Ramírez, Pedro H. J. Nardelli, Sebastian Möller

Abstract: Telecommunication service providers have to guarantee acceptable speech quality during a phone call to avoid a negative impact on the users' quality of experience. Currently, there are different speech quality assessment methods. ITU-T Recommendation G.107 describes the E-model algorithm, which is a computational model developed for network planning purposes focused on narrowband (NB) networks. Later, ITU-T Recommendations G.107.1 and G.107.2 were developed for wideband (WB) and fullband (FB) networks. These algorithms use different impairment factors, each one related to different speech communication steps. However, the NB, WB, and FB E-model algorithms do not consider wireless techniques used in these networks, such as Multiple-Input-Multiple-Output (MIMO) systems, which are used to improve the communication system robustness in the presence of different types of wireless channel degradation. In this context, the main objective of this study is to propose a general methodology to incorporate wireless network parameters into the NB and WB E-model algorithms. To accomplish this goal, MIMO and wireless channel parameters are incorporated into the E-model algorithms, specifically into the $I_{e,eff}$ and $I_{e,eff,WB}$ impairment factors. For performance validation, subjective tests were carried out, and the proposed methodology reached a Pearson correlation coefficient (PCC) and a root mean square error (RMSE) of $0.9732$ and $0.2351$, respectively. It is noteworthy that our proposed methodology does not affect the rest of the E-model input parameters, and it intends to be useful for wireless network planning in speech communication services.

Submitted 5 March, 2021; originally announced March 2021.

Comments: 18 pages

Journal ref: IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 29, 2021

arXiv:2102.13066 [pdf]

On Instabilities of Conventional Multi-Coil MRI Reconstruction to Small Adverserial Perturbations

Authors: Chi Zhang, Jinghan Jia, Burhaneddin Yaman, Steen Moeller, Sijia Liu, Mingyi Hong, Mehmet Akçakaya

Abstract: Although deep learning (DL) has received much attention in accelerated MRI, recent studies suggest small perturbations may lead to instabilities in DL-based reconstructions, leading to concern for their clinical application. However, these works focus on single-coil acquisitions, which is not practical. We investigate instabilities caused by small adversarial attacks for multi-coil acquisitions. Our results suggest that parallel imaging and multi-coil CS exhibit considerable instabilities against small adversarial perturbations.

Submitted 25 February, 2021; originally announced February 2021.

Comments: To appear in Proceedings of the 29th Annual Meeting of ISMRM, 2021

arXiv:2011.09414 [pdf, other]

doi 10.1109/ISBI48211.2021.9434054

Self-Supervised Physics-Guided Deep Learning Reconstruction For High-Resolution 3D LGE CMR

Authors: Burhaneddin Yaman, Chetan Shenoy, Zilin Deng, Steen Moeller, Hossam El-Rewaidy, Reza Nezafat, Mehmet Akçakaya

Abstract: Late gadolinium enhancement (LGE) cardiac MRI (CMR) is the clinical standard for diagnosis of myocardial scar. 3D isotropic LGE CMR provides improved coverage and resolution compared to 2D imaging. However, image acceleration is required due to long scan times and contrast washout. Physics-guided deep learning (PG-DL) approaches have recently emerged as an improved accelerated MRI strategy. Training of PG-DL methods is typically performed in a supervised manner requiring fully-sampled data as reference, which is challenging in 3D LGE CMR. Recently, a self-supervised learning approach was proposed to enable training PG-DL techniques without fully-sampled data. In this work, we extend this self-supervised learning approach to 3D imaging, while tackling challenges related to small training database sizes of 3D volumes. Results and a reader study on prospectively accelerated 3D LGE show that the proposed approach at 6-fold acceleration outperforms the clinically utilized compressed sensing approach at 3-fold acceleration.

Submitted 18 November, 2020; originally announced November 2020.

Journal ref: Proceedings of IEEE ISBI, 2021

arXiv:2010.13868 [pdf, other]

doi 10.1109/ICASSP39728.2021.9413495

Improved Supervised Training of Physics-Guided Deep Learning Image Reconstruction with Multi-Masking

Authors: Burhaneddin Yaman, Seyed Amir Hossein Hosseini, Steen Moeller, Mehmet Akçakaya

Abstract: Physics-guided deep learning (PG-DL) via algorithm unrolling has received significant interest for improved image reconstruction, including MRI applications. These methods unroll an iterative optimization algorithm into a series of regularizer and data consistency units. The unrolled networks are typically trained end-to-end using a supervised approach. Current supervised PG-DL approaches use all of the available sub-sampled measurements in their data consistency units. Thus, the network learns to fit the rest of the measurements. In this study, we propose to improve the performance and robustness of supervised training by utilizing randomness by retrospectively selecting only a subset of all the available measurements for data consistency units. The process is repeated multiple times using different random masks during training for further enhancement. Results on knee MRI show that the proposed multi-mask supervised PG-DL enhances reconstruction performance compared to conventional supervised PG-DL approaches.

Submitted 26 October, 2020; originally announced October 2020.

Journal ref: Proceedings of IEEE ICASSP, 2021

arXiv:2010.13260 [pdf, ps, other]

Effect of Language Proficiency on Subjective Evaluation of Noise Suppression Algorithms

Authors: Babak Naderi, Gabriel Mittag, Rafael Zequeira Jiménez, Sebastian Möller

Abstract: Speech communication systems based on Voice-over-IP technology are frequently used by native as well as non-native speakers of a target language, e.g. in international phone calls or telemeetings. Frequently, such calls also occur in a noisy environment, making noise suppression modules necessary to increase perceived quality of experience. Whereas standard tests for assessing perceived quality make use of native listeners, we assume that noise-reduced speech and residual noise may affect native and non-native listeners of a target language in different ways. To test this assumption, we report results of two subjective tests conducted with English and German native listeners who judge the quality of speech samples recorded by native English, German, and Mandarin speakers, which are degraded with different background noise levels and noise suppression effects. The experiments were conducted following the standardized ITU-T Rec. P.835 approach, however implemented in a crowdsourcing setting according to ITU-T Rec. P.808. Our results show a significant influence of language on speech signal ratings and, consequently, on the overall perceived quality in specific conditions.

Submitted 25 October, 2020; originally announced October 2020.

arXiv:2009.07163 [pdf]

doi 10.1002/mrm.28788

A Self-Decoupled 32 Channel Receive Array for Human Brain Magnetic Resonance Imaging at 10.5T

Authors: Nader Tavaf, Russell L. Lagore, Steve Jungst, Shajan Gunamony, Jerahmie Radder, Andrea Grant, Steen Moeller, Edward Auerbach, Kamil Ugurbil, Gregor Adriany, Pierre-Francois Van de Moortele

Abstract: Purpose: Receive array layout, noise mitigation and B0 field strength are crucial contributors to signal-to-noise ratio (SNR) and parallel imaging performance. Here, we investigate SNR and parallel imaging gains at 10.5 Tesla (T) compared to 7T using 32-channel receive arrays at both fields. Methods: A self-decoupled 32-channel receive array for human brain imaging at 10.5T (10.5T-32Rx), consisting of 31 loops and one cloverleaf element, was co-designed and built in tandem with a 16-channel dual-row loop transmitter. Novel receive array design and self-decoupling techniques were implemented. Parallel imaging performance, in terms of SNR and noise amplification (g-factor), of the 10.5T-32Rx was compared to the performance of an industry-standard 32-channel receiver at 7T (7T-32Rx) via experimental phantom measurements. Results: Compared to the 7T-32Rx, the 10.5T-32Rx provided 1.46 times the central SNR and 2.08 times the peripheral SNR. Minimum inverse g-factor value of the 10.5T-32Rx (min(1/g) = 0.56) was 51% higher than that of the 7T-32Rx (min(1/g) = 0.37) with R=4x4 2D acceleration, resulting in significantly enhanced parallel imaging performance at 10.5T compared to 7T. The g-factor values of 10.5T-32Rx were on par with those of a 64-channel receiver at 7T, e.g. 1.8 versus 1.9, respectively, with R=4x4 axial acceleration. Conclusion: Experimental measurements demonstrated effective self-decoupling of the receive array as well as substantial gains in SNR and parallel imaging performance at 10.5T compared to 7T.

Submitted 9 November, 2020; v1 submitted 15 September, 2020; originally announced September 2020.

Comments: to be published in Magnetic Resonance in Medicine

Journal ref: Magn Reson Med. 2021 pp 1-14

arXiv:2008.06029 [pdf]

doi 10.1002/nbm.4798

Multi-Mask Self-Supervised Learning for Physics-Guided Neural Networks in Highly Accelerated MRI

Authors: Burhaneddin Yaman, Hongyi Gu, Seyed Amir Hossein Hosseini, Omer Burak Demirel, Steen Moeller, Jutta Ellermann, Kâmil Uğurbil, Mehmet Akçakaya

Abstract: Self-supervised learning has shown great promise due to its capability to train deep learning MRI reconstruction methods without fully-sampled data. Current self-supervised learning methods for physics-guided reconstruction networks split acquired undersampled data into two disjoint sets, where one is used for data consistency (DC) in the unrolled network and the other to define the training loss. In this study, we propose an improved self-supervised learning strategy that more efficiently uses the acquired data to train a physics-guided reconstruction network without a database of fully-sampled data. The proposed multi-mask self-supervised learning via data undersampling (SSDU) applies a hold-out masking operation on acquired measurements to split them into multiple pairs of disjoint sets for each training sample, while using one of these pairs for DC units and the other for defining loss, thereby more efficiently using the undersampled data. Multi-mask SSDU is applied on fully-sampled 3D knee and prospectively undersampled 3D brain MRI datasets, for various acceleration rates and patterns, and compared to CG-SENSE and single-mask SSDU DL-MRI, as well as supervised DL-MRI when fully-sampled data is available. Results on knee MRI show that the proposed multi-mask SSDU outperforms SSDU and performs closely with supervised DL-MRI. A clinical reader study further ranks the multi-mask SSDU higher than supervised DL-MRI in terms of SNR and aliasing artifacts. Results on brain MRI show that multi-mask SSDU achieves better reconstruction quality compared to SSDU. A reader study demonstrates that multi-mask SSDU at R=8 significantly improves reconstruction compared to single-mask SSDU at R=8, as well as CG-SENSE at R=2.

Submitted 8 June, 2022; v1 submitted 13 August, 2020; originally announced August 2020.

Journal ref: NMR in Biomedicine, 2022

arXiv:2005.05550 [pdf, other]

doi 10.1109/EMBC44109.2020.9176241

High-Fidelity Accelerated MRI Reconstruction by Scan-Specific Fine-Tuning of Physics-Based Neural Networks

Authors: Seyed Amir Hossein Hosseini, Burhaneddin Yaman, Steen Moeller, Mehmet Akçakaya

Abstract: Long scan duration remains a challenge for high-resolution MRI. Deep learning has emerged as a powerful means for accelerated MRI reconstruction by providing data-driven regularizers that are directly learned from data. These data-driven priors typically remain unchanged for future data in the testing phase once they are learned during training. In this study, we propose to use a transfer learning approach to fine-tune these regularizers for new subjects using a self-supervision approach. While the proposed approach can compromise the extremely fast reconstruction time of deep learning MRI methods, our results on knee MRI indicate that such adaptation can substantially reduce the remaining artifacts in reconstructed images. In addition, the proposed approach has the potential to reduce the risks of generalization to rare pathological conditions, which may be unavailable in the training data.

Submitted 12 May, 2020; originally announced May 2020.

Journal ref: Proceedings of IEEE EMBC, 2020

arXiv:2005.00836 [pdf, ps, other]

Towards Deep Learning Methods for Quality Assessment of Computer-Generated Imagery

Authors: Markus Utke, Saman Zadtootaghaj, Steven Schmidt, Sebastian Möller

Abstract: Video gaming streaming services are growing rapidly due to new services such as passive video streaming, e.g. Twitch.tv, and cloud gaming, e.g. Nvidia Geforce Now. In contrast to traditional video content, gaming content has special characteristics such as extremely high motion for some games, special motion patterns, synthetic content and repetitive content, which makes the state-of-the-art video and image quality metrics perform weaker for this special computer generated content. In this paper, we outline our plan to build a deep learning-based quality metric for video gaming quality assessment. In addition, we present initial results by training the network based on VMAF values as a ground truth to give some insights on how to build a metric in future. The paper describes the method that is used to choose an appropriate Convolutional Neural Network architecture. Furthermore, we estimate the size of the required subjective quality dataset which achieves a sufficiently high performance. The results show that by taking around 5k images for training of the last six modules of Xception, we can obtain a relatively high performance metric to assess the quality of distorted video games.

Submitted 2 May, 2020; originally announced May 2020.

Comments: 4 pages

arXiv:2005.00400 [pdf, other]

Multi-episodic Perceived Quality of an Audio-on-Demand Service

Authors: Dennis Guse, Oliver Hohlfeld, Anna Wunderlich, Benjamin Weiss, Sebastian Möller

Abstract: QoE is traditionally evaluated by using short stimuli usually representing parts or single usage episodes. This opens the question of how the overall service perception involving multiple usage episodes can be evaluated, a question of high practical relevance to service operators. Despite initial research on this challenging aspect of multi-episodic perceived quality, the underlying quality formation processes and their factors are still to be discovered. We present a multi-episodic experiment of an Audio on Demand service over a usage period of 6 days with 93 participants. Our work directly extends prior work investigating the impact of time between usage episodes. The results show similar effects; also, the recency effect is not statistically significant. In addition, we extend prediction of multi-episodic judgments by accounting for the observed saturation.

Submitted 1 May, 2020; originally announced May 2020.

Comments: To appear at IEEE QoMEX 2020

ACM Class: H.5.1; H.5.5; C.2.m

arXiv:1912.07669 [pdf]

doi 10.1002/mrm.28378

Self-Supervised Learning of Physics-Guided Reconstruction Neural Networks without Fully-Sampled Reference Data

Authors: Burhaneddin Yaman, Seyed Amir Hossein Hosseini, Steen Moeller, Jutta Ellermann, Kâmil Uğurbil, Mehmet Akçakaya

Abstract: Purpose: To develop a strategy for training a physics-guided MRI reconstruction neural network without a database of fully-sampled datasets. Theory and Methods: Self-supervised learning via data under-sampling (SSDU) for physics-guided deep learning (DL) reconstruction partitions available measurements into two disjoint sets, one of which is used in the data consistency units in the unrolled network and the other is used to define the loss for training. The proposed training without fully-sampled data is compared to fully-supervised training with ground-truth data, as well as conventional compressed sensing and parallel imaging methods using the publicly available fastMRI knee database. The same physics-guided neural network is used for both proposed SSDU and supervised training. The SSDU training is also applied to prospectively 2-fold accelerated high-resolution brain datasets at different acceleration rates, and compared to parallel imaging. Results: Results on five different knee sequences at an acceleration rate of 4 show that the proposed self-supervised approach performs closely with supervised learning, while significantly outperforming conventional compressed sensing and parallel imaging, as characterized by quantitative metrics and a clinical reader study. The results on prospectively sub-sampled brain datasets, where supervised learning cannot be employed due to lack of ground-truth reference, show that the proposed self-supervised approach successfully performs reconstruction at high acceleration rates (4, 6 and 8). Image readings indicate improved visual reconstruction quality with the proposed approach compared to parallel imaging at acquisition acceleration. Conclusion: The proposed SSDU approach allows training of physics-guided DL-MRI reconstruction without fully-sampled data, while achieving comparable results with supervised DL-MRI trained on fully-sampled data.

Submitted 14 April, 2020; v1 submitted 16 December, 2019; originally announced December 2019.

Comments: This work is an extension of our previous work arXiv:1910.09116

Journal ref: Magnetic Resonance in Medicine, 2020

arXiv:1912.07197 [pdf, other]

doi 10.1109/JSTSP.2020.3003170

Dense Recurrent Neural Networks for Accelerated MRI: History-Cognizant Unrolling of Optimization Algorithms

Authors: Seyed Amir Hossein Hosseini, Burhaneddin Yaman, Steen Moeller, Mingyi Hong, Mehmet Akçakaya

Abstract: Inverse problems for accelerated MRI typically incorporate domain-specific knowledge about the forward encoding operator in a regularized reconstruction framework. Recently, physics-driven deep learning (DL) methods have been proposed to use neural networks for data-driven regularization. These methods unroll iterative optimization algorithms to solve the inverse problem objective function, by alternating between domain-specific data consistency and data-driven regularization via neural networks. The whole unrolled network is then trained end-to-end to learn the parameters of the network. Due to simplicity of data consistency updates with gradient descent steps, proximal gradient descent (PGD) is a common approach to unroll physics-driven DL reconstruction methods. However, PGD methods have slow convergence rates, necessitating a higher number of unrolled iterations, leading to memory issues in training and slower reconstruction times in testing. Inspired by efficient variants of PGD methods that use a history of the previous iterates, we propose a history-cognizant unrolling of the optimization algorithm with dense connections across iterations for improved performance. In our approach, the gradient descent steps are calculated at a trainable combination of the outputs of all the previous regularization units. We also apply this idea to unrolling variable splitting methods with quadratic relaxation. Our results in reconstruction of the fastMRI knee dataset show that the proposed history-cognizant approach reduces residual aliasing artifacts compared to its conventional unrolled counterpart without requiring extra computational power or increasing reconstruction time.

Submitted 8 July, 2020; v1 submitted 16 December, 2019; originally announced December 2019.

Journal ref: IEEE Journal of Selected Topics in Signal Processing, 2020

arXiv:1910.09116 [pdf, other]

doi 10.1109/ISBI45749.2020.9098514

Self-Supervised Physics-Based Deep Learning MRI Reconstruction Without Fully-Sampled Data

Authors: Burhaneddin Yaman, Seyed Amir Hossein Hosseini, Steen Moeller, Jutta Ellermann, Kâmil Uğurbil, Mehmet Akçakaya

Abstract: Deep learning (DL) has emerged as a tool for improving accelerated MRI reconstruction. A common strategy among DL methods is the physics-based approach, where a regularized iterative algorithm alternating between data consistency and a regularizer is unrolled for a finite number of iterations. This unrolled network is then trained end-to-end in a supervised manner, using fully-sampled data as ground truth for the network output. However, in a number of scenarios, it is difficult to obtain fully-sampled datasets, due to physiological constraints such as organ motion or physical constraints such as signal decay. In this work, we tackle this issue and propose a self-supervised learning strategy that enables physics-based DL reconstruction without fully-sampled data. Our approach is to divide the acquired sub-sampled points for each scan into training and validation subsets. During training, data consistency is enforced over the training subset, while the validation subset is used to define the loss function. Results show that the proposed self-supervised learning method successfully reconstructs images without fully-sampled data, performing similarly to the supervised approach that is trained with fully-sampled references. This has implications for physics-based inverse problem approaches for other settings, where fully-sampled data is not available or possible to acquire.

Submitted 20 October, 2019; originally announced October 2019.

Comments: 5 Pages, 5 Figures

Journal ref: Proceedings of IEEE ISBI, 2020

arXiv:1907.08137 [pdf]

doi 10.1371/journal.pone.0229418

Accelerated Coronary MRI with sRAKI: A Database-Free Self-Consistent Neural Network k-space Reconstruction for Arbitrary Undersampling

Authors: Seyed Amir Hossein Hosseini, Chi Zhang, Sebastian Weingärtner, Steen Moeller, Matthias Stuber, Kâmil Uğurbil, Mehmet Akçakaya

Abstract: This study aims to accelerate coronary MRI using a novel reconstruction algorithm, called self-consistent robust artificial-neural-networks for k-space interpolation (sRAKI). sRAKI performs iterative parallel imaging reconstruction by enforcing coil self-consistency using subject-specific neural networks. This approach extends the linear convolutions in SPIRiT to nonlinear interpolation using convolutional neural networks (CNNs). These CNNs are trained individually for each scan using the scan-specific autocalibrating signal (ACS) data. Reconstruction is performed by imposing the learned self-consistency and data-consistency enabling sRAKI to support random undersampling patterns. Fully-sampled targeted right coronary artery MRI was acquired in six healthy subjects for evaluation. The data were retrospectively undersampled, and reconstructed using SPIRiT, $\ell_1$-SPIRiT and sRAKI for acceleration rates of 2 to 5. Additionally, prospectively undersampled whole-heart coronary MRI was acquired to further evaluate performance. The results indicate that sRAKI reduces noise amplification and blurring artifacts compared with SPIRiT and $\ell_1$-SPIRiT, especially at high acceleration rates in targeted data. Quantitative analysis shows that sRAKI improves normalized mean-squared-error (~44% and ~21% over SPIRiT and $\ell_1$-SPIRiT at rate 5) and vessel sharpness (~10% and ~20% over SPIRiT and $\ell_1$-SPIRiT at rate 5). In addition, whole-heart data shows the sharpest coronary arteries when resolved using sRAKI, with 11% and 15% improvement in vessel sharpness over SPIRiT and $\ell_1$-SPIRiT, respectively. Thus, sRAKI is a database-free neural network-based reconstruction technique that may further accelerate coronary MRI with arbitrary undersampling patterns, while improving noise resilience over linear parallel imaging and image sharpness over $\ell_1$ regularization techniques.

Submitted 18 July, 2019; originally announced July 2019.

Comments: This work has been partially presented at ISMRM Workshop on Machine Learning Part 2 (October 2018), SCMR/ISMRM Co-Provided Workshop (February 2019), IEEE International Symposium on Biomedical Imaging (April 2019) and ISMRM 27th Annual Meeting and Exhibition (May 2019)

arXiv:1904.01112 [pdf, other]

Deep Learning Methods for Parallel Magnetic Resonance Image Reconstruction

Authors: Florian Knoll, Kerstin Hammernik, Chi Zhang, Steen Moeller, Thomas Pock, Daniel K. Sodickson, Mehmet Akcakaya

Abstract: Following the success of deep learning in a wide range of applications, neural network-based machine learning techniques have received interest as a means of accelerating magnetic resonance imaging (MRI). A number of ideas inspired by deep learning techniques from computer vision and image processing have been successfully applied to non-linear image reconstruction in the spirit of compressed sensing for both low dose computed tomography and accelerated MRI. The additional integration of multi-coil information to recover missing k-space lines in the MRI reconstruction process is still studied less frequently, even though it is the de-facto standard for currently used accelerated MR acquisitions. This manuscript provides an overview of the recent machine learning approaches that have been proposed specifically for improving parallel imaging. A general background introduction to parallel MRI is given that is structured around the classical view of image space and k-space based methods. Both linear and non-linear methods are covered, followed by a discussion of recent efforts to further improve parallel imaging using machine learning, and specifically using artificial neural networks. Image-domain based techniques that introduce improved regularizers are covered as well as k-space based methods, where the focus is on better interpolation strategies using neural networks. Issues and open problems are discussed as well as recent efforts for producing open datasets and benchmarks for the community.

Submitted 1 April, 2019; originally announced April 2019.

Comments: 14 pages, 7 figures

arXiv:1008.4895 [pdf, other]

LIFO-Backpressure Achieves Near Optimal Utility-Delay Tradeoff

Authors: Longbo Huang, Scott Moeller, Michael J. Neely, Bhaskar Krishnamachari

Abstract: There has been considerable recent work developing a new stochastic network utility maximization framework using Backpressure algorithms, also known as MaxWeight. A key open problem has been the development of utility-optimal algorithms that are also delay efficient. In this paper, we show that the Backpressure algorithm, when combined with the LIFO queueing discipline (called LIFO-Backpressure), is able to achieve a utility that is within $O(1/V)$ of the optimal value, while maintaining an average delay of $O([\log(V)]^2)$ for all but a tiny fraction of the network traffic. This result holds for general stochastic network optimization problems and general Markovian dynamics. Remarkably, the performance of LIFO-Backpressure can be achieved by simply changing the queueing discipline; it requires no other modifications of the original Backpressure algorithm. We validate the results through empirical measurements from a sensor network testbed, which show good match between theory and practice.

Submitted 3 April, 2011; v1 submitted 28 August, 2010; originally announced August 2010.

Showing 1–30 of 30 results for author: Moeller, S