Showing 1–11 of 11 results for author: Koehler, T

  1. arXiv:2501.10757  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Deformable Image Registration of Dark-Field Chest Radiographs for Local Lung Signal Change Assessment

    Authors: Fabian Drexel, Vasiliki Sideri-Lampretsa, Henriette Bast, Alexander W. Marka, Thomas Koehler, Florian T. Gassert, Daniela Pfeiffer, Daniel Rueckert, Franz Pfeiffer

    Abstract: Dark-field radiography of the human chest has been demonstrated to have promising potential for the analysis of the lung microstructure and the diagnosis of respiratory diseases. However, previous studies of dark-field chest radiographs evaluated the lung signal only in the inspiratory breathing state. Our work aims to add a new perspective to these previous assessments by locally comparing dark-f…

    Submitted 18 January, 2025; originally announced January 2025.

    Comments: 10 pages, 6 figures

  2. arXiv:2401.10460  [pdf, other]

    cs.SD cs.LG eess.AS

    Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis

    Authors: Prabhav Agrawal, Thilo Koehler, Zhiping Xiu, Prashant Serai, Qing He

    Abstract: Neural vocoders model the raw audio waveform and synthesize high-quality audio, but even the highly efficient ones, like MB-MelGAN and LPCNet, fail to run in real time on a low-end device like a smartglass. A pure digital signal processing (DSP) based vocoder can be implemented via lightweight fast Fourier transforms (FFT), and therefore, is an order of magnitude faster than any neural vocoder. A DSP vocoder o…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted for ICASSP 2024

  3. arXiv:2210.16045  [pdf, other]

    cs.SD cs.CL eess.AS

    Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders

    Authors: Jason Fong, Yun Wang, Prabhav Agrawal, Vimal Manohar, Jilong Wu, Thilo Köhler, Qing He

    Abstract: Text-based voice editing (TBVE) uses synthetic output from text-to-speech (TTS) systems to replace words in an original recording. Recent work has used neural models to produce edited speech that is similar to the original speech in terms of clarity, speaker identity, and prosody. However, one limitation of prior work is the usage of finetuning to optimise performance: this requires further model…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  4. arXiv:2205.11952  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    3D helical CT Reconstruction with a Memory Efficient Learned Primal-Dual Architecture

    Authors: Jevgenija Rudzusika, Buda Bajić, Thomas Koehler, Ozan Öktem

    Abstract: Deep learning based computed tomography (CT) reconstruction has demonstrated outstanding performance on simulated 2D low-dose CT data. This applies in particular to domain adapted neural networks, which incorporate a handcrafted physics model for CT imaging. Empirical evidence shows that employing such architectures reduces the demand for training data and improves upon generalisation. However, th…

    Submitted 28 November, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

  5. arXiv:2108.11730  [pdf, other]

    stat.ML cs.CV cs.LG cs.NE eess.IV math.OC

    Deep learning based dictionary learning and tomographic image reconstruction

    Authors: Jevgenija Rudzusika, Thomas Koehler, Ozan Öktem

    Abstract: This work presents an approach for image reconstruction in clinical low-dose tomography that combines principles from sparse signal processing with ideas from deep learning. First, we describe sparse signal representation in terms of dictionaries from a statistical perspective and interpret dictionary learning as a process of aligning distribution that arises from a generative model with empirical…

    Submitted 26 August, 2021; originally announced August 2021.

    Comments: 34 pages, 5 figures

    Journal ref: SIAM Journal on Imaging Sciences, Vol 15, Iss 4. (2022)

  6. arXiv:2104.00705  [pdf, other]

    cs.SD cs.AI eess.AS

    Multi-rate attention architecture for fast streamable Text-to-speech spectrum modeling

    Authors: Qing He, Zhiping Xiu, Thilo Koehler, Jilong Wu

    Abstract: Typical high quality text-to-speech (TTS) systems today use a two-stage architecture, with a spectrum model stage that generates spectral frames and a vocoder stage that generates the actual audio. High-quality spectrum models usually incorporate the encoder-decoder architecture with self-attention or bi-directional long short-term memory (BLSTM) units. While these models can produce high quality speech,…

    Submitted 1 April, 2021; originally announced April 2021.

  7. arXiv:2011.12985  [pdf, other]

    cs.SD cs.LG eess.AS

    FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge

    Authors: Bichen Wu, Qing He, Peizhao Zhang, Thilo Koehler, Kurt Keutzer, Peter Vajda

    Abstract: Nowadays more and more applications can benefit from edge-based text-to-speech (TTS). However, most existing TTS models are too computationally expensive and are not flexible enough to be deployed on the diverse variety of edge devices with their equally diverse computational capacities. To address this, we propose FBWave, a family of efficient and scalable neural vocoders that can achieve optimal…

    Submitted 25 November, 2020; originally announced November 2020.

  8. arXiv:2011.03542  [pdf, other]

    physics.med-ph eess.IV physics.optics

    X-ray dark-field signal reduction due to hardening of the visibility spectrum

    Authors: Fabio De Marco, Jana Andrejewski, Theresa Urban, Konstantin Willer, Lukas Gromann, Thomas Koehler, Hanns-Ingo Maack, Julia Herzen, Franz Pfeiffer

    Abstract: X-ray dark-field imaging enables a spatially-resolved visualization of ultra-small-angle X-ray scattering. Using phantom measurements, we demonstrate that a material's effective dark-field signal may be reduced by modification of the visibility spectrum by other dark-field-active objects in the beam. This is the dark-field equivalent of conventional beam-hardening, and is distinct from related, kn…

    Submitted 3 November, 2023; v1 submitted 6 November, 2020; originally announced November 2020.

    Comments: 13 pages, 6 figures. Submitted to IEEE Transactions on Medical Imaging

  9. arXiv:2002.06758  [pdf, other]

    cs.SD eess.AS

    Interactive Text-to-Speech System via Joint Style Analysis

    Authors: Yang Gao, Weiyi Zheng, Zhaojun Yang, Thilo Kohler, Christian Fuegen, Qing He

    Abstract: While modern TTS technologies have made significant advancements in audio quality, there is still a lack of behavior naturalness compared to conversing with people. We propose a style-embedded TTS system that generates styled responses based on the speech query style. To achieve this, the system includes a style extraction model that extracts a style embedding from the speech query, which is then…

    Submitted 21 September, 2020; v1 submitted 16 February, 2020; originally announced February 2020.

    Comments: Accepted by Interspeech 2020

  10. arXiv:1911.04762  [pdf, other]

    eess.IV cs.CV

    Merging-ISP: Multi-Exposure High Dynamic Range Image Signal Processing

    Authors: Prashant Chaudhari, Franziska Schirrmacher, Andreas Maier, Christian Riess, Thomas Köhler

    Abstract: High dynamic range (HDR) imaging combines multiple images with different exposure times into a single high-quality image. The image signal processing pipeline (ISP) is a core component in digital cameras to perform these operations. It includes demosaicing of raw color filter array (CFA) data at different exposure times, alignment of the exposures, conversion to HDR domain, and exposure merging in…

    Submitted 4 October, 2021; v1 submitted 12 November, 2019; originally announced November 2019.

    Comments: Computational Photography, DAGM GCPR 2021

  11. arXiv:1910.12612  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR

    Authors: Duc Le, Thilo Koehler, Christian Fuegen, Michael L. Seltzer

    Abstract: Grapheme-based acoustic modeling has recently been shown to outperform phoneme-based approaches in both hybrid and end-to-end automatic speech recognition (ASR), even on non-phonemic languages like English. However, graphemic ASR still has problems with rare long-tail words that do not follow the standard spelling conventions seen in training, such as entity names. In this work, we present a novel…

    Submitted 13 February, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: To appear at ICASSP 2020