-
Riemannian Time Warping: Multiple Sequence Alignment in Curved Spaces
Authors:
Julian Richter,
Christopher Erdös,
Christian Scheurer,
Jochen J. Steil,
Niels Dehio
Abstract:
Temporal alignment of multiple signals through time warping is crucial in many fields, such as classification within speech recognition or robot motion learning. Almost all related works are limited to data in Euclidean space. Although an attempt was made in 2011 to adapt this concept to unit quaternions, a general extension to Riemannian manifolds remains absent. Given its importance for numerous applications in robotics and beyond, we introduce Riemannian Time Warping (RTW). This novel approach efficiently aligns multiple signals by considering the geometric structure of the Riemannian manifold in which the data is embedded. Extensive experiments on synthetic and real-world data, including tests with an LBR iiwa robot, demonstrate that RTW consistently outperforms state-of-the-art baselines in both averaging and classification tasks.
Submitted 2 June, 2025;
originally announced June 2025.
-
ReverbFX: A Dataset of Room Impulse Responses Derived from Reverb Effect Plugins for Singing Voice Dereverberation
Authors:
Julius Richter,
Till Svajda,
Timo Gerkmann
Abstract:
We present ReverbFX, a new room impulse response (RIR) dataset designed for singing voice dereverberation research. Unlike existing datasets based on real recorded RIRs, ReverbFX features a diverse collection of RIRs captured from various reverb audio effect plugins commonly used in music production. We conduct comprehensive experiments using the proposed dataset to benchmark the challenge of dereverberation of singing voice recordings affected by artificial reverbs. We train two state-of-the-art generative models using ReverbFX and demonstrate that models trained with plugin-derived RIRs outperform those trained on realistic RIRs in artificial reverb scenarios.
Submitted 26 May, 2025;
originally announced May 2025.
-
LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models
Authors:
Danilo de Oliveira,
Julius Richter,
Tal Peer,
Timo Gerkmann
Abstract:
We present LipDiffuser, a conditional diffusion model for lip-to-speech generation synthesizing natural and intelligible speech directly from silent video recordings. Our approach leverages the magnitude-preserving ablated diffusion model (MP-ADM) architecture as a denoiser model. To effectively condition the model, we incorporate visual features using magnitude-preserving feature-wise linear modulation (MP-FiLM) alongside speaker embeddings. A neural vocoder then reconstructs the speech waveform from the generated mel-spectrograms. Evaluations on LRS3 and TCD-TIMIT demonstrate that LipDiffuser outperforms existing lip-to-speech baselines in perceptual speech quality and speaker similarity, while remaining competitive in downstream automatic speech recognition (ASR). These findings are also supported by a formal listening experiment. Extensive ablation studies and cross-dataset evaluation confirm the effectiveness and generalization capabilities of our approach.
Submitted 26 May, 2025; v1 submitted 16 May, 2025;
originally announced May 2025.
-
Normalize Everything: A Preconditioned Magnitude-Preserving Architecture for Diffusion-Based Speech Enhancement
Authors:
Julius Richter,
Danilo de Oliveira,
Timo Gerkmann
Abstract:
This paper presents a new framework for diffusion-based speech enhancement. Our method employs a Schrödinger bridge to transform the noisy speech distribution into the clean speech distribution. To stabilize and improve training, we employ time-dependent scalings of the inputs and outputs of the network, known as preconditioning. We consider two skip connection configurations, which either include or omit the current process state in the denoiser's output, enabling the network to predict either environmental noise or clean speech. Each approach leads to improved performance on different speech enhancement metrics. To maintain stable magnitude levels and balance during training, we use a magnitude-preserving network architecture that normalizes all activations and network weights to unit length. Additionally, we propose learning the contribution of the noisy input within each network block for effective input conditioning. After training, we apply a method to approximate different exponential moving average (EMA) profiles and investigate their effects on the speech enhancement performance. In contrast to image generation tasks, where longer EMA lengths often enhance mode coverage, we observe that shorter EMA lengths consistently lead to better performance on standard speech enhancement metrics. Code, audio examples, and checkpoints are available online.
Submitted 8 May, 2025;
originally announced May 2025.
-
Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech
Authors:
Danilo de Oliveira,
Julius Richter,
Jean-Marie Lemercier,
Simon Welker,
Timo Gerkmann
Abstract:
Diffusion models have found great success in generating high quality, natural samples of speech, but their potential for density estimation for speech has so far remained largely unexplored. In this work, we leverage an unconditional diffusion model trained only on clean speech for the assessment of speech quality. We show that the quality of a speech utterance can be assessed by estimating the likelihood of a corresponding sample in the terminating Gaussian distribution, obtained via a deterministic noising process. The resulting method is purely unsupervised, trained only on clean speech, and therefore does not rely on annotations. Our diffusion-based approach leverages clean speech priors to assess quality based on how the input relates to the learned distribution of clean data. Our proposed log-likelihoods show promising results, correlating well with intrusive speech quality metrics and showing the best correlation with human scores in a listening experiment.
Submitted 13 June, 2025; v1 submitted 23 October, 2024;
originally announced October 2024.
-
Investigating Training Objectives for Generative Speech Enhancement
Authors:
Julius Richter,
Danilo de Oliveira,
Timo Gerkmann
Abstract:
Generative speech enhancement has recently shown promising advancements in improving speech quality in noisy environments. Multiple diffusion-based frameworks exist, each employing distinct training objectives and learning techniques. This paper aims to explain the differences between these frameworks by focusing our investigation on score-based generative models and the Schrödinger bridge. We conduct a series of comprehensive experiments to compare their performance and highlight differing training behaviors. Furthermore, we propose a novel perceptual loss function tailored for the Schrödinger bridge framework, demonstrating enhanced performance and improved perceptual quality of the enhanced speech signals. All experimental code and pre-trained models are publicly available to facilitate further research and development in this domain.
Submitted 18 January, 2025; v1 submitted 16 September, 2024;
originally announced September 2024.
-
Multi-Objective Global Path Planning for Lunar Exploration With a Quadruped Robot
Authors:
Julia Richter,
Hendrik Kolvenbach,
Giorgio Valsecchi,
Marco Hutter
Abstract:
In unstructured environments the best path is not always the shortest, but needs to consider various objectives like energy efficiency, risk of failure or scientific outcome. This paper proposes a global planner, based on the A* algorithm, capable of individually considering multiple layers of map data for different cost objectives. We introduce weights between the objectives, which can be adapted to achieve a variety of optimal paths. In order to find the best of these paths, a tool for statistical path analysis is presented. Our planner was tested on exemplary lunar topographies to propose two trajectories for exploring the Aristarchus Plateau. The optimized paths significantly reduce the risk of failure while yielding more scientific value compared to manually planned paths in the same area. The planner and analysis tool are made open-source in order to simplify mission planning for planetary scientists.
Submitted 24 June, 2024;
originally announced June 2024.
-
EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation
Authors:
Julius Richter,
Yi-Chiao Wu,
Steven Krenn,
Simon Welker,
Bunlong Lay,
Shinji Watanabe,
Alexander Richard,
Timo Gerkmann
Abstract:
We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds, totaling 100 hours of clean, anechoic speech data. The dataset covers a large range of different speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech. We benchmark various methods for speech enhancement and dereverberation on the dataset and evaluate their performance through a set of instrumental metrics. In addition, we conduct a listening test with 20 participants for the speech enhancement task, where a generative method is preferred. We introduce a blind test set that allows for automatic online evaluation of uploaded data. Dataset download links and automatic evaluation server can be found online.
Submitted 11 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
The PESQetarian: On the Relevance of Goodhart's Law for Speech Enhancement
Authors:
Danilo de Oliveira,
Simon Welker,
Julius Richter,
Timo Gerkmann
Abstract:
To obtain improved speech enhancement models, researchers often focus on increasing performance according to specific instrumental metrics. However, when the same metric is used in a loss function to optimize models, it may be detrimental to aspects that the given metric does not see. The goal of this paper is to illustrate the risk of overfitting a speech enhancement model to the metric used for evaluation. For this, we introduce enhancement models that exploit the widely used PESQ measure. Our "PESQetarian" model achieves 3.82 PESQ on VB-DMD while scoring very poorly in a listening experiment. While the obtained PESQ value of 3.82 would imply "state-of-the-art" PESQ-performance on the VB-DMD benchmark, our examples show that when optimizing w.r.t. a metric, an isolated evaluation on the same metric may be misleading. Instead, other metrics should be included in the evaluation and the resulting performance predictions should be confirmed by listening.
Submitted 5 June, 2024;
originally announced June 2024.
-
Bayesian Windkessel calibration using optimized 0D surrogate models
Authors:
Jakob Richter,
Jonas Nitzler,
Luca Pegolotti,
Karthik Menon,
Jonas Biehler,
Wolfgang A. Wall,
Daniele E. Schiavazzi,
Alison L. Marsden,
Martin R. Pfaller
Abstract:
Boundary condition (BC) calibration to assimilate clinical measurements is an essential step in any subject-specific simulation of cardiovascular fluid dynamics. Bayesian calibration approaches have successfully quantified the uncertainties inherent in identified parameters. Yet, routinely estimating the posterior distribution for all BC parameters in 3D simulations has been unattainable due to the infeasible computational demand. We propose an efficient method to identify Windkessel parameter posteriors using results from a single high-fidelity three-dimensional (3D) model evaluation. We only evaluate the 3D model once for an initial choice of BCs and use the result to create a highly accurate zero-dimensional (0D) surrogate. We then perform Sequential Monte Carlo (SMC) using the optimized 0D model to derive the high-dimensional Windkessel BC posterior distribution. We validate this approach in a publicly available dataset of N=72 subject-specific vascular models. We found that optimizing 0D models to match 3D data a priori lowered their median approximation error by nearly one order of magnitude. In a subset of models, we confirm that the optimized 0D models still generalize to a wide range of BCs. Finally, we present the high-dimensional Windkessel parameter posterior for different measured signal-to-noise ratios in a vascular model using SMC. We further validate that the 0D-derived posterior is a good approximation of the 3D posterior. The minimal computational demand of our method using a single 3D simulation, combined with the open-source nature of all software and data used in this work, will increase access and efficiency of Bayesian Windkessel calibration in cardiovascular fluid dynamics simulations.
Submitted 29 July, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
Diffusion Models for Audio Restoration
Authors:
Jean-Marie Lemercier,
Julius Richter,
Simon Welker,
Eloi Moliner,
Vesa Välimäki,
Timo Gerkmann
Abstract:
With the development of audio playback devices and fast data transmission, the demand for high sound quality is rising for both entertainment and communications. In this quest for better sound quality, challenges emerge from distortions and interferences originating at the recording side or caused by an imperfect transmission pipeline. To address this problem, audio restoration methods aim to recover clean sound signals from the corrupted input data. We present here audio restoration algorithms based on diffusion models, with a focus on speech enhancement and music restoration tasks. Traditional approaches, often grounded in handcrafted rules and statistical heuristics, have shaped our understanding of audio signals. In the past decades, there has been a notable shift towards data-driven methods that exploit the modeling capabilities of DNNs. Deep generative models, and among them diffusion models, have emerged as powerful techniques for learning complex data distributions. However, relying solely on DNN-based learning approaches carries the risk of reducing interpretability, particularly when employing end-to-end models. Nonetheless, data-driven approaches allow more flexibility in comparison to statistical model-based frameworks, whose performance depends on distributional and statistical assumptions that can be difficult to guarantee. Here, we aim to show that diffusion models can combine the best of both worlds and offer the opportunity to design audio restoration algorithms with a good degree of interpretability and a remarkable performance in terms of sound quality. We explain the diffusion formalism and its application to the conditional generation of clean audio signals. We believe that diffusion models open an exciting field of research with the potential to spawn new audio restoration algorithms that are natural-sounding and remain robust in difficult acoustic situations.
Submitted 11 November, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
Task-Parameterized Imitation Learning with Time-Sensitive Constraints
Authors:
Julian Richter,
João Oliveira,
Christian Scheurer,
Jochen Steil,
Niels Dehio
Abstract:
Programming a robot manipulator should be as intuitive as possible. To achieve that, the paradigm of teaching motion skills by providing few demonstrations has become widely popular in recent years. Probabilistic versions thereof take into account the uncertainty given by the distribution of the training data. However, precise execution of start-, via-, and end-poses at given times cannot always be guaranteed. This limits the technology transfer to industrial application. To address this problem, we propose a novel constrained formulation of the Expectation Maximization algorithm for learning Gaussian Mixture Models (GMM) on Riemannian Manifolds. Our approach applies to probabilistic imitation learning and extends also to the well-established TP-GMM framework with Task-Parameterization. It allows prescribing end-effector poses at defined execution times, for instance for precise pick & place scenarios. The probabilistic approach is compared with state-of-the-art learning-from-demonstration methods using the KUKA LBR iiwa robot. The reader is encouraged to watch the accompanying video available at https://youtu.be/JMI1YxtN9C0
Submitted 6 December, 2023;
originally announced December 2023.
-
A Probabilistic Neural Twin for Treatment Planning in Peripheral Pulmonary Artery Stenosis
Authors:
John D. Lee,
Jakob Richter,
Martin R. Pfaller,
Jason M. Szafron,
Karthik Menon,
Andrea Zanoni,
Michael R. Ma,
Jeffrey A. Feinstein,
Jacqueline Kreutzer,
Alison L. Marsden,
Daniele E. Schiavazzi
Abstract:
The substantial computational cost of high-fidelity models in numerical hemodynamics has, so far, relegated their use mainly to offline treatment planning. New breakthroughs in data-driven architectures and optimization techniques for fast surrogate modeling provide an exciting opportunity to overcome these limitations, enabling the use of such technology for time-critical decisions. We discuss an application to the repair of multiple stenosis in peripheral pulmonary artery disease through either transcatheter pulmonary artery rehabilitation or surgery, where it is of interest to achieve desired pressures and flows at specific locations in the pulmonary artery tree, while minimizing the risk for the patient. Since different degrees of success can be achieved in practice during treatment, we formulate the problem in probability, and solve it through a sample-based approach. We propose a new offline-online pipeline for probabilistic real-time treatment planning which combines offline assimilation of boundary conditions, model reduction, and training dataset generation with online estimation of marginal probabilities, possibly conditioned on the degree of augmentation observed in already repaired lesions. Moreover, we propose a new approach for the parametrization of arbitrarily shaped vascular repairs through iterative corrections of a zero-dimensional approximant. We demonstrate this pipeline for a diseased model of the pulmonary artery tree available through the Vascular Model Repository.
Submitted 1 December, 2023;
originally announced December 2023.
-
Single and Few-step Diffusion for Generative Speech Enhancement
Authors:
Bunlong Lay,
Jean-Marie Lemercier,
Julius Richter,
Timo Gerkmann
Abstract:
Diffusion models have shown promising results in speech enhancement, using a task-adapted diffusion process for the conditional generation of clean speech given a noisy mixture. However, at test time, the neural network used for score estimation is called multiple times to solve the iterative reverse process. This results in a slow inference process and causes discretization errors that accumulate over the sampling trajectory. In this paper, we address these limitations through a two-stage training approach. In the first stage, we train the diffusion model the usual way using the generative denoising score matching loss. In the second stage, we compute the enhanced signal by solving the reverse process and compare the resulting estimate to the clean speech target using a predictive loss. We show that using this second training stage enables achieving the same performance as the baseline model using only 5 function evaluations instead of 60 function evaluations. While the performance of usual generative diffusion algorithms drops dramatically when lowering the number of function evaluations (NFEs) to obtain single-step diffusion, we show that our proposed method keeps a steady performance and therefore largely outperforms the diffusion baseline in this setting and also generalizes better than its predictive counterpart.
Submitted 15 January, 2024; v1 submitted 18 September, 2023;
originally announced September 2023.
-
In silico high-resolution whole lung model to predict the locally delivered dose of inhaled drugs
Authors:
Maximilian J. Grill,
Jonas Biehler,
Karl-Robert Wichmann,
David Rudlstorfer,
Maximilian Rixner,
Marie Brei,
Jakob Richter,
Joshua Bügel,
Nina Pischke,
Wolfgang A. Wall,
Kei W. Müller
Abstract:
The big crux with drug delivery to human lungs is that the delivered dose at the local site of action is unpredictable and very difficult to measure, even a posteriori. It is highly subject-specific as it depends on lung morphology, disease, breathing, and aerosol characteristics. Given these challenges, computational approaches have shown potential, but have so far failed due to fundamental methodical limitations. We present and validate a novel in silico model that enables the subject-specific prediction of local aerosol deposition throughout the entire lung. Its unprecedented spatiotemporal resolution allows tracking each aerosol particle at any time during the breathing cycle, anywhere in the complete system of conducting airways and the alveolar region. Predictions are shown to be in excellent agreement with in vivo SPECT/CT data for a healthy human cohort. We further showcase the model's capabilities to represent strong heterogeneities in diseased lungs by studying an IPF patient. Finally, high computational efficiency and automated model generation and calibration ensure readiness to be applied at scale. We envision our method not only to improve inhalation therapies by informing and accelerating all stages of (pre-)clinical drug and device development, but also as a more-than-equivalent alternative to nuclear imaging of the lungs.
Submitted 11 July, 2023; v1 submitted 7 July, 2023;
originally announced July 2023.
-
On the Behavior of Intrusive and Non-intrusive Speech Enhancement Metrics in Predictive and Generative Settings
Authors:
Danilo de Oliveira,
Julius Richter,
Jean-Marie Lemercier,
Tal Peer,
Timo Gerkmann
Abstract:
Since its inception, the field of deep speech enhancement has been dominated by predictive (discriminative) approaches, such as spectral mapping or masking. Recently, however, novel generative approaches have been applied to speech enhancement, attaining good denoising performance with high subjective quality scores. At the same time, advances in deep learning also allowed for the creation of neural network-based metrics, which have desirable traits such as being able to work without a reference (non-intrusively). Since generatively enhanced speech tends to exhibit radically different residual distortions, its evaluation using instrumental speech metrics may behave differently compared to predictively enhanced speech. In this paper, we evaluate the performance of the same speech enhancement backbone trained under predictive and generative paradigms on a variety of metrics and show that intrusive and non-intrusive measures correlate differently for each paradigm. This analysis motivates the search for metrics that can together paint a complete and unbiased picture of speech enhancement performance, irrespective of the model's training process.
Submitted 5 June, 2023;
originally announced June 2023.
-
Audio-Visual Speech Enhancement with Score-Based Generative Models
Authors:
Julius Richter,
Simone Frintrop,
Timo Gerkmann
Abstract:
This paper introduces an audio-visual speech enhancement system that leverages score-based generative models, also known as diffusion models, conditioned on visual information. In particular, we exploit audio-visual embeddings obtained from a self-supervised learning model that has been fine-tuned on lipreading. The layer-wise features of its transformer-based encoder are aggregated, time-aligned, and incorporated into the noise conditional score network. Experimental evaluations show that the proposed audio-visual speech enhancement system yields improved speech quality and reduces generative artifacts such as phonetic confusions with respect to the audio-only equivalent. The latter is supported by the word error rate of a downstream automatic speech recognition model, which decreases noticeably, especially at low input signal-to-noise ratios.
Submitted 2 June, 2023;
originally announced June 2023.
-
Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model
Authors:
Héctor Martel,
Julius Richter,
Kai Li,
Xiaolin Hu,
Timo Gerkmann
Abstract:
We propose Audio-Visual Lightweight ITerative model (AVLIT), an effective and lightweight neural network that uses Progressive Learning (PL) to perform audio-visual speech separation in noisy environments. To this end, we adopt the Asynchronous Fully Recurrent Convolutional Neural Network (A-FRCNN), which has shown successful results in audio-only speech separation. Our architecture consists of an audio branch and a video branch, with iterative A-FRCNN blocks sharing weights for each modality. We evaluated our model in a controlled environment using the NTCD-TIMIT dataset and in-the-wild using a synthetic dataset that combines LRS3 and WHAM!. The experiments demonstrate the superiority of our model in both settings with respect to various audio-only and audio-visual baselines. Furthermore, the reduced footprint of our model makes it suitable for low resource applications.
Submitted 31 May, 2023;
originally announced June 2023.
-
Speech Signal Improvement Using Causal Generative Diffusion Models
Authors:
Julius Richter,
Simon Welker,
Jean-Marie Lemercier,
Bunlong Lay,
Tal Peer,
Timo Gerkmann
Abstract:
In this paper, we present a causal speech signal improvement system that is designed to handle different types of distortions. The method is based on a generative diffusion model which has been shown to work well in scenarios with missing data and non-linear corruptions. To guarantee causal processing, we modify the network architecture of our previous work and replace global normalization with causal adaptive gain control. We generate diverse training data containing a broad range of distortions. This work was performed in the context of an "ICASSP Signal Processing Grand Challenge" and submitted to the non-real-time track of the "Speech Signal Improvement Challenge 2023", where it was ranked fifth.
Submitted 15 March, 2023;
originally announced March 2023.
-
Reducing the Prior Mismatch of Stochastic Differential Equations for Diffusion-based Speech Enhancement
Authors:
Bunlong Lay,
Simon Welker,
Julius Richter,
Timo Gerkmann
Abstract:
Recently, score-based generative models have been successfully employed for the task of speech enhancement. A stochastic differential equation is used to model the iterative forward process, where at each step environmental noise and white Gaussian noise are added to the clean speech signal. While in the limit the mean of the forward process ends at the noisy mixture, in practice it stops earlier and thus only at an approximation of the noisy mixture. This results in a discrepancy between the terminating distribution of the forward process and the prior used for solving the reverse process at inference. In this paper, we address this discrepancy and propose a forward process based on a Brownian bridge. We show that such a process leads to a reduction of the mismatch compared to previous diffusion processes. More importantly, we show that our approach improves in objective metrics over the baseline process with only half of the iteration steps and one fewer hyperparameter to tune.
Submitted 30 May, 2023; v1 submitted 28 February, 2023;
originally announced February 2023.
-
StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
Authors:
Jean-Marie Lemercier,
Julius Richter,
Simon Welker,
Timo Gerkmann
Abstract:
Diffusion models have shown a great ability to bridge the performance gap between predictive and generative approaches for speech enhancement. We have shown that they may even outperform their predictive counterparts for non-additive corruption types or when they are evaluated on mismatched conditions. However, diffusion models suffer from a high computational burden, mainly because they require running a neural network for each reverse diffusion step, whereas predictive approaches only require one pass. As diffusion models are generative approaches, they may also produce vocalizing and breathing artifacts in adverse conditions. In comparison, in such difficult scenarios, predictive models typically do not produce such artifacts but tend to distort the target speech instead, thereby degrading the speech quality. In this work, we present a stochastic regeneration approach where an estimate given by a predictive model is provided as a guide for further diffusion. We show that the proposed approach uses the predictive model to remove the vocalizing and breathing artifacts while producing very high quality samples thanks to the diffusion model, even in adverse conditions. We further show that this approach enables the use of lighter sampling schemes with fewer diffusion steps without sacrificing quality, thus lifting the computational burden by an order of magnitude. Source code and audio examples are available online (https://uhh.de/inf-sp-storm).
Submitted 12 March, 2024; v1 submitted 22 December, 2022;
originally announced December 2022.
-
Analysing Diffusion-based Generative Approaches versus Discriminative Approaches for Speech Restoration
Authors:
Jean-Marie Lemercier,
Julius Richter,
Simon Welker,
Timo Gerkmann
Abstract:
Diffusion-based generative models have had a high impact on the computer vision and speech processing communities these past years. Besides data generation tasks, they have also been employed for data restoration tasks like speech enhancement and dereverberation. While discriminative models have traditionally been argued to be more powerful, e.g., for speech enhancement, generative diffusion approaches have recently been shown to narrow this performance gap considerably. In this paper, we systematically compare the performance of generative diffusion models and discriminative approaches on different speech restoration tasks. For this, we extend our prior contributions on diffusion-based speech enhancement in the complex time-frequency domain to the task of bandwidth extension. We then compare it to a discriminatively trained neural network with the same network architecture on three restoration tasks, namely speech denoising, dereverberation and bandwidth extension. We observe that the generative approach performs globally better than its discriminative counterpart on all tasks, with the strongest benefit for non-additive distortion models, like in dereverberation and bandwidth extension. Code and audio examples can be found online at https://uhh.de/inf-sp-sgmsemultitask
Submitted 16 March, 2023; v1 submitted 4 November, 2022;
originally announced November 2022.
-
Speech Enhancement and Dereverberation with Diffusion-based Generative Models
Authors:
Julius Richter,
Simon Welker,
Jean-Marie Lemercier,
Bunlong Lay,
Timo Gerkmann
Abstract:
In this work, we build upon our previous publication and use diffusion-based generative models for speech enhancement. We present a detailed overview of the diffusion process that is based on a stochastic differential equation and delve into an extensive theoretical examination of its implications. In contrast to usual conditional generation tasks, we do not start the reverse process from pure Gaussian noise but from a mixture of noisy speech and Gaussian noise. This matches our forward process which moves from clean speech to noisy speech by including a drift term. We show that this procedure enables using only 30 diffusion steps to generate high-quality clean speech estimates. By adapting the network architecture, we are able to significantly improve the speech enhancement performance, indicating that the network, rather than the formalism, was the main limitation of our original approach. In an extensive cross-dataset evaluation, we show that the improved method can compete with recent discriminative models and achieves better generalization when evaluating on a different corpus than the one used for training. We complement the results with an instrumental evaluation using real-world noisy recordings and a listening experiment, in which our proposed method is rated best. Examining different sampler configurations for solving the reverse process allows us to balance the performance and computational speed of the proposed method. Moreover, we show that the proposed method is also suitable for dereverberation and thus not limited to additive background noise removal. Code and audio examples are available online, see https://github.com/sp-uhh/sgmse
Submitted 13 June, 2023; v1 submitted 11 August, 2022;
originally announced August 2022.
-
Multi-Objective Hyperparameter Optimization in Machine Learning -- An Overview
Authors:
Florian Karl,
Tobias Pielok,
Julia Moosbauer,
Florian Pfisterer,
Stefan Coors,
Martin Binder,
Lennart Schneider,
Janek Thomas,
Jakob Richter,
Michel Lang,
Eduardo C. Garrido-Merchán,
Juergen Branke,
Bernd Bischl
Abstract:
Hyperparameter optimization constitutes a large part of typical modern machine learning workflows. This arises from the fact that machine learning methods and corresponding preprocessing steps often only yield optimal performance when hyperparameters are properly tuned. But in many applications, we are not only interested in optimizing ML pipelines solely for predictive accuracy; additional metrics or constraints must be considered when determining an optimal configuration, resulting in a multi-objective optimization problem. This is often neglected in practice, due to a lack of knowledge and readily available software implementations for multi-objective hyperparameter optimization. In this work, we introduce the reader to the basics of multi-objective hyperparameter optimization and motivate its usefulness in applied ML. Furthermore, we provide an extensive survey of existing optimization strategies, both from the domain of evolutionary algorithms and Bayesian optimization. We illustrate the utility of MOO in several specific ML applications, considering objectives such as operating conditions, prediction time, sparseness, fairness, interpretability and robustness.
Submitted 6 June, 2024; v1 submitted 15 June, 2022;
originally announced June 2022.
-
Understanding the Domain Gap in LiDAR Object Detection Networks
Authors:
Jasmine Richter,
Florian Faion,
Di Feng,
Paul Benedikt Becker,
Piotr Sielecki,
Claudius Glaeser
Abstract:
In order to make autonomous driving a reality, artificial neural networks have to work reliably in the open-world. However, the open-world is vast and continuously changing, so it is not technically feasible to collect and annotate training datasets which accurately represent this domain. Therefore, there are always domain gaps between training datasets and the open-world which must be understood. In this work, we investigate the domain gaps between high-resolution and low-resolution LiDAR sensors in object detection networks. Using a unique dataset, which enables us to study sensor resolution domain gaps independent of other effects, we show two distinct domain gaps - an inference domain gap and a training domain gap. The inference domain gap is characterised by a strong dependence on the number of LiDAR points per object, while the training gap shows no such dependence. These findings show that different approaches are required to close these inference and training domain gaps.
Submitted 21 April, 2022;
originally announced April 2022.
-
Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain
Authors:
Simon Welker,
Julius Richter,
Timo Gerkmann
Abstract:
Score-based generative models (SGMs) have recently shown impressive results for difficult generative tasks such as the unconditional and conditional generation of natural images and audio signals. In this work, we extend these models to the complex short-time Fourier transform (STFT) domain, proposing a novel training task for speech enhancement using a complex-valued deep neural network. We derive this training task within the formalism of stochastic differential equations (SDEs), thereby enabling the use of predictor-corrector samplers. We provide alternative formulations inspired by previous publications on using generative diffusion models for speech enhancement, avoiding the need for any prior assumptions on the noise distribution and making the training task purely generative which, as we show, results in improved enhancement performance.
Submitted 7 July, 2022; v1 submitted 31 March, 2022;
originally announced March 2022.
-
Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges
Authors:
Bernd Bischl,
Martin Binder,
Michel Lang,
Tobias Pielok,
Jakob Richter,
Stefan Coors,
Janek Thomas,
Theresa Ullmann,
Marc Becker,
Anne-Laure Boulesteix,
Difan Deng,
Marius Lindauer
Abstract:
Most machine learning algorithms are configured by one or several hyperparameters that must be carefully chosen and often considerably impact performance. To avoid a time consuming and unreproducible manual trial-and-error process to find well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods, e.g., based on resampling error estimation for supervised machine learning, can be employed. After introducing HPO from a general perspective, this paper reviews important HPO methods such as grid or random search, evolutionary algorithms, Bayesian optimization, Hyperband and racing. It gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with ML pipelines, runtime improvements, and parallelization. This work is accompanied by an appendix that contains information on specific software packages in R and Python, as well as information and recommended hyperparameter search spaces for specific learning algorithms. We also provide notebooks that demonstrate concepts from this work as supplementary files.
Submitted 24 November, 2021; v1 submitted 13 July, 2021;
originally announced July 2021.
-
Online and Real-Time Tracking in a Surveillance Scenario
Authors:
Oliver Urbann,
Oliver Bredtmann,
Maximilian Otten,
Jan-Philip Richter,
Thilo Bauer,
David Zibriczky
Abstract:
This paper presents an approach for tracking in a surveillance scenario. Typical aspects for this scenario are a 24/7 operation with a static camera mounted above the height of a human with many objects or people. The Multiple Object Tracking Benchmark 20 (MOT20) reflects this scenario best. We can show that our approach is real-time capable on this benchmark and outperforms all other real-time capable approaches in HOTA, MOTA, and IDF1. We achieve this by contributing a fast Siamese network reformulated for linear runtime (instead of quadratic) to generate fingerprints from detections. Thus, it is possible to associate the detections to Kalman filters based on multiple tracking specific ratings: Cosine similarity of fingerprints, Intersection over Union, and pixel distance ratio in the image.
Submitted 2 June, 2021;
originally announced June 2021.
-
Disentanglement Learning for Variational Autoencoders Applied to Audio-Visual Speech Enhancement
Authors:
Guillaume Carbajal,
Julius Richter,
Timo Gerkmann
Abstract:
Recently, the standard variational autoencoder has been successfully used to learn a probabilistic prior over speech signals, which is then used to perform speech enhancement. Variational autoencoders have then been conditioned on a label describing a high-level speech attribute (e.g. speech activity) that allows for a more explicit control of speech generation. However, the label is not guaranteed to be disentangled from the other latent variables, which results in limited performance improvements compared to the standard variational autoencoder. In this work, we propose to use an adversarial training scheme for variational autoencoders to disentangle the label from the other latent variables. At training, we use a discriminator that competes with the encoder of the variational autoencoder. Simultaneously, we also use an additional encoder that estimates the label for the decoder of the variational autoencoder, which proves to be crucial to learn disentanglement. We show the benefit of the proposed disentanglement learning when a voice activity label, estimated from visual data, is used for speech enhancement.
Submitted 3 August, 2021; v1 submitted 19 May, 2021;
originally announced May 2021.
-
Guided Variational Autoencoder for Speech Enhancement With a Supervised Classifier
Authors:
Guillaume Carbajal,
Julius Richter,
Timo Gerkmann
Abstract:
Recently, variational autoencoders have been successfully used to learn a probabilistic prior over speech signals, which is then used to perform speech enhancement. However, variational autoencoders are trained on clean speech only, which results in a limited ability of extracting the speech signal from noisy speech compared to supervised approaches. In this paper, we propose to guide the variational autoencoder with a supervised classifier separately trained on noisy speech. The estimated label is a high-level categorical variable describing the speech signal (e.g. speech activity) allowing for a more informed latent distribution compared to the standard variational autoencoder. We evaluate our method with different types of labels on real recordings of different noisy environments. Provided that the label better informs the latent distribution and that the classifier achieves good performance, the proposed approach outperforms the standard variational autoencoder and a conventional neural network-based supervised approach.
Submitted 12 February, 2021;
originally announced February 2021.
-
Deep semantic gaze embedding and scanpath comparison for expertise classification during OPT viewing
Authors:
Nora Castner,
Thomas Kübler,
Katharina Scheiter,
Juilane Richter,
Thérése Eder,
Fabian Hüttig,
Constanze Keutel,
Enkelejda Kasneci
Abstract:
Modeling eye movement indicative of expertise behavior is decisive in user evaluation. However, it is indisputable that task semantics affect gaze behavior. We present a novel approach to gaze scanpath comparison that incorporates convolutional neural networks (CNN) to process scene information at the fixation level. Image patches linked to respective fixations are used as input for a CNN and the resulting feature vectors provide the temporal and spatial gaze information necessary for scanpath similarity comparison. We evaluated our proposed approach on gaze data from expert and novice dentists interpreting dental radiographs using a local alignment similarity score. Our approach was capable of distinguishing experts from novices with 93% accuracy while incorporating the image semantics. Moreover, our scanpath comparison using image patch features has the potential to incorporate task semantics from a variety of tasks.
Submitted 31 March, 2020;
originally announced March 2020.
-
Avocado: Open-Source Flexible Constrained Interaction Testing for Practical Application
Authors:
Jan Richter,
Bestoun S. Ahmed,
Miroslav Bures,
Cleber R. Rosa Junior
Abstract:
This paper presents the outcome of a research collaboration between academia and industry to implement and utilize the capabilities of constrained interaction testing for an open-source tool for industrial-scale application. The project helps promote flexibility in generating constrained interaction test suites, executing them, and setting up a test oracle to report them--all within the same tool called Avocado. Avocado employs a constraint solver with computational algorithms to generate constrained interaction test suites. The environment of the application under test can be set up to execute the generated test suite with minimum effort. A test oracle can be set up by the tool to report the status and the results of the executed test cases. Avocado represents a comprehensive and flexible solution for conducting combinatorial interaction testing (CIT) and constrained CIT on an industrial application. In this paper, we present the structure of the tool and our method of implementing the algorithms in detail.
Submitted 2 February, 2020;
originally announced February 2020.
-
Performance evaluation and hyperparameter tuning of statistical and machine-learning models using spatial data
Authors:
Patrick Schratz,
Jannes Muenchow,
Eugenia Iturritxa,
Jakob Richter,
Alexander Brenning
Abstract:
Machine-learning algorithms have gained popularity in recent years in the field of ecological modeling due to their promising results in predictive performance of classification problems. While the application of such algorithms has been highly simplified in the last years due to their well-documented integration in commonly used statistical programming languages such as R, there are several practical challenges in the field of ecological modeling related to unbiased performance estimation, optimization of algorithms using hyperparameter tuning and spatial autocorrelation. We address these issues in the comparison of several widely used machine-learning algorithms such as Boosted Regression Trees (BRT), k-Nearest Neighbor (WKNN), Random Forest (RF) and Support Vector Machine (SVM) to traditional parametric algorithms such as logistic regression (GLM) and semi-parametric ones like generalized additive models (GAM). Different nested cross-validation methods including hyperparameter tuning methods are used to evaluate model performances with the aim to receive bias-reduced performance estimates. As a case study the spatial distribution of forest disease Diplodia sapinea in the Basque Country in Spain is investigated using common environmental variables such as temperature, precipitation, soil or lithology as predictors. Results show that GAM and RF (mean AUROC estimates 0.708 and 0.699) outperform all other methods in predictive accuracy. The effect of hyperparameter tuning saturates at around 50 iterations for this data set. The AUROC differences between the bias-reduced (spatial cross-validation) and overoptimistic (non-spatial cross-validation) performance estimates of the GAM and RF are 0.167 (24%) and 0.213 (30%), respectively. It is recommended to also use spatial partitioning for cross-validation hyperparameter tuning of spatial data.
Submitted 29 March, 2018;
originally announced March 2018.
-
mlr Tutorial
Authors:
Julia Schiffner,
Bernd Bischl,
Michel Lang,
Jakob Richter,
Zachary M. Jones,
Philipp Probst,
Florian Pfisterer,
Mason Gallo,
Dominik Kirchhoff,
Tobias Kühn,
Janek Thomas,
Lars Kotthoff
Abstract:
This document provides an in-depth introduction to the mlr framework for machine learning experiments in R.
Submitted 17 September, 2016;
originally announced September 2016.
-
Weak Secrecy in the Multi-Way Untrusted Relay Channel with Compute-and-Forward
Authors:
Johannes Richter,
Christian Scheunert,
Sabrina Engelmann,
Eduard A. Jorswieck
Abstract:
We investigate the problem of secure communications in a Gaussian multi-way relay channel applying the compute-and-forward scheme using nested lattice codes. All nodes employ half-duplex operation and can exchange confidential messages only via an untrusted relay. The relay is assumed to be honest but curious, i.e., an eavesdropper that conforms to the system rules and applies the intended relaying scheme. We start with the general case of the single-input multiple-output (SIMO) L-user multi-way relay channel and provide an achievable secrecy rate region under a weak secrecy criterion. We show that the securely achievable sum rate is equivalent to the difference between the computation rate and the multiple access channel (MAC) capacity. Particularly, we show that all nodes must encode their messages such that the common computation rate tuple falls outside the MAC capacity region of the relay. We provide results for the single-input single-output (SISO) and the multiple-input single-output (MISO) L-user multi-way relay channel as well as the two-way relay channel. We discuss these results and show the dependency between channel realization and achievable secrecy rate. We further compare our result to available results in the literature for different schemes and show that the proposed scheme operates close to the compute-and-forward rate without secrecy.
Submitted 23 June, 2014;
originally announced June 2014.
-
TerraService.NET: An Introduction to Web Services
Authors:
Tom Barclay,
Jim Gray,
Eric Strand,
Steve Ekblad,
Jeffrey Richter
Abstract:
This article explores the design and construction of a geo-spatial Internet web service application from the host web site perspective and from the perspective of an application using the web service. The TerraService.NET web service was added to the popular TerraServer database and web site with no major structural changes to the database. The article discusses web service design, implementation, and deployment concepts and design guidelines. Web services enable applications that aggregate and interact with information and resources from Internet-scale distributed servers. The article presents the design of two USDA applications that interoperate with database and web service resources in Fort Collins, Colorado, and the TerraService web service located in Tukwila, Washington.
Submitted 7 August, 2002;
originally announced August 2002.