Showing 1–8 of 8 results for author: El-Khamy, M

Searching in archive eess.
  1. arXiv:2005.01996  [pdf, other]

    eess.IV cs.CV

    NTIRE 2020 Challenge on Real-World Image Super-Resolution: Methods and Results

    Authors: Andreas Lugmayr, Martin Danelljan, Radu Timofte, Namhyuk Ahn, Dongwoon Bai, Jie Cai, Yun Cao, Junyang Chen, Kaihua Cheng, SeYoung Chun, Wei Deng, Mostafa El-Khamy, Chiu Man Ho, Xiaozhong Ji, Amin Kheradmand, Gwantae Kim, Hanseok Ko, Kanghyu Lee, Jungwon Lee, Hao Li, Ziluan Liu, Zhi-Song Liu, Shuai Liu, Yunhua Lu, Zibo Meng, et al. (21 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2020 challenge on real world super-resolution. It focuses on the participating methods and final results. The challenge addresses the real world setting, where paired true high and low-resolution images are unavailable. For training, only one set of source input images is therefore provided along with a set of unpaired high-quality target images. In Track 1: Image Proc…

    Submitted 5 May, 2020; originally announced May 2020.

  2. arXiv:2003.00830  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    GSANet: Semantic Segmentation with Global and Selective Attention

    Authors: Qingfeng Liu, Mostafa El-Khamy, Dongwoon Bai, Jungwon Lee

    Abstract: This paper proposes a novel deep learning architecture for semantic segmentation. The proposed Global and Selective Attention Network (GSANet) features Atrous Spatial Pyramid Pooling (ASPP) with a novel sparsemax global attention and a novel selective attention that deploys a condensation and diffusion mechanism to aggregate the multi-scale contextual information from the extracted deep features.…

    Submitted 13 February, 2020; originally announced March 2020.

  3. arXiv:1910.10707  [pdf, other]

    cs.SD eess.AS

    End-to-End Multi-Task Denoising for the Joint Optimization of Perceptual Speech Metrics

    Authors: Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee

    Abstract: Although supervised learning based on a deep neural network has recently achieved substantial improvement on speech enhancement, the existing schemes have either of two critical issues: spectrum or metric mismatches. The spectrum mismatch is a well known issue that any spectrum modification after short-time Fourier transform (STFT), in general, cannot be fully recovered after inverse short-time Fo…

    Submitted 5 May, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

    Comments: 5 pages, submitted to Interspeech 2020. arXiv admin note: substantial text overlap with arXiv:1901.09146

  4. arXiv:1910.06762  [pdf, other]

    eess.AS cs.SD

    T-GSA: Transformer with Gaussian-weighted self-attention for speech enhancement

    Authors: Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee

    Abstract: Transformer neural networks (TNN) demonstrated state-of-art performance on many natural language processing (NLP) tasks, replacing recurrent neural networks (RNNs), such as LSTMs or GRUs. However, TNNs did not perform well in speech enhancement, whose contextual nature is different than NLP tasks, like machine translation. Self-attention is a core building block of the Transformer, which not only…

    Submitted 11 February, 2020; v1 submitted 13 October, 2019; originally announced October 2019.

    Comments: 5 pages, submitted to ICASSP 2020

  5. arXiv:1909.04802  [pdf, other]

    eess.IV cs.CV

    Variable Rate Deep Image Compression With a Conditional Autoencoder

    Authors: Yoojin Choi, Mostafa El-Khamy, Jungwon Lee

    Abstract: In this paper, we propose a novel variable-rate learned image compression framework with a conditional autoencoder. Previous learning-based image compression methods mostly require training separate networks for different compression rates so they can yield compressed images of varying quality. In contrast, we train and deploy only one variable-rate image compression network implemented with a con…

    Submitted 10 September, 2019; originally announced September 2019.

    Comments: ICCV 2019

  6. arXiv:1901.09146  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    End-to-End Multi-Task Denoising for joint SDR and PESQ Optimization

    Authors: Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee

    Abstract: Supervised learning based on a deep neural network recently has achieved substantial improvement on speech enhancement. Denoising networks learn mapping from noisy speech to clean one directly, or to a spectrum mask which is the ratio between clean and noisy spectra. In either case, the network is optimized by minimizing mean square error (MSE) between ground-truth labels and time-domain or spectr…

    Submitted 8 March, 2023; v1 submitted 25 January, 2019; originally announced January 2019.

  7. arXiv:1810.06766  [pdf, other]

    eess.IV cs.CV

    DN-ResNet: Efficient Deep Residual Network for Image Denoising

    Authors: Haoyu Ren, Mostafa El-Khamy, Jungwon Lee

    Abstract: A deep learning approach to blind denoising of images without complete knowledge of the noise statistics is considered. We propose DN-ResNet, which is a deep convolutional neural network (CNN) consisting of several residual blocks (ResBlocks). With cascade training, DN-ResNet is more accurate and more computationally efficient than the state of art denoising networks. An edge-aware loss function i…

    Submitted 15 October, 2018; originally announced October 2018.

    Journal ref: Asian Conference of Computer Vision 2018

  8. arXiv:1710.10224  [pdf, other]

    cs.CL cs.SD eess.AS

    BridgeNets: Student-Teacher Transfer Learning Based on Recursive Neural Networks and its Application to Distant Speech Recognition

    Authors: Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee

    Abstract: Despite the remarkable progress achieved on automatic speech recognition, recognizing far-field speeches mixed with various noise sources is still a challenging task. In this paper, we introduce novel student-teacher transfer learning, BridgeNet which can provide a solution to improve distant speech recognition. There are two key features in BridgeNet. First, BridgeNet extends traditional student-…

    Submitted 21 February, 2018; v1 submitted 27 October, 2017; originally announced October 2017.

    Comments: Accepted to 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018)