-
High Quality Diffusion Distillation on a Single GPU with Relative and Absolute Position Matching
Authors:
Guoqiang Zhang,
Kenta Niwa,
J. P. Lewis,
Cedric Mesnage,
W. Bastiaan Kleijn
Abstract:
We introduce relative and absolute position matching (RAPM), a diffusion distillation method resulting in high quality generation that can be trained efficiently on a single GPU. Recent diffusion distillation research has achieved excellent results for high-resolution text-to-image generation with methods such as phased consistency models (PCM) and improved distribution matching distillation (DMD2). However, these methods generally require many GPUs (e.g., 8-64) and significant batch sizes (e.g., 128-2048) during training, resulting in memory and compute requirements that are beyond the resources of some researchers. RAPM provides effective single-GPU diffusion distillation training with a batch size of 1. The new method attempts to mimic the sampling trajectories of the teacher model by matching the relative and absolute positions. The design of relative positions is inspired by PCM. Two discriminators are introduced accordingly in RAPM, one for matching relative positions and the other for absolute positions. Experimental results on StableDiffusion (SD) V1.5 and SDXL indicate that RAPM with 4 timesteps produces FID scores comparable to those of the best method with 1 timestep under very limited computational resources.
Submitted 26 March, 2025;
originally announced March 2025.
-
NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA
Authors:
Marlon Tobaben,
Mohamed Ali Souibgui,
Rubèn Tito,
Khanh Nguyen,
Raouf Kerkouche,
Kangsoo Jung,
Joonas Jälkö,
Lei Kang,
Andrey Barsky,
Vincent Poulain d'Andecy,
Aurélie Joseph,
Aashiq Muhamed,
Kevin Kuo,
Virginia Smith,
Yusuke Yamasaki,
Takumi Fukami,
Kenta Niwa,
Iifan Tyou,
Hiro Ishii,
Rio Yokota,
Ragul N,
Rintu Kutum,
Josep Llados,
Ernest Valveny,
Antti Honkela
, et al. (2 additional authors not shown)
Abstract:
The Privacy Preserving Federated Learning Document VQA (PFL-DocVQA) competition challenged the community to develop provably private and communication-efficient solutions in a federated setting for a real-life use case: invoice processing. The competition introduced a dataset of real invoice documents, along with associated questions and answers requiring information extraction and reasoning over the document images. Thereby, it brings together researchers and expertise from the document analysis, privacy, and federated learning communities. Participants fine-tuned a pre-trained, state-of-the-art Document Visual Question Answering model provided by the organizers for this new domain, mimicking a typical federated invoice processing setup. The base model is a multi-modal generative language model, and sensitive information could be exposed through either the visual or textual input modality. Participants proposed elegant solutions to reduce communication costs while maintaining a minimum utility threshold in track 1 and to protect all information from each document provider using differential privacy in track 2. The competition served as a new testbed for developing and testing private federated learning methods, simultaneously raising awareness about privacy within the document image analysis and recognition community. Ultimately, the competition analysis provides best practices and recommendations for successfully running privacy-focused federated learning challenges in the future.
Submitted 3 June, 2025; v1 submitted 6 November, 2024;
originally announced November 2024.
-
Parameter-free Clipped Gradient Descent Meets Polyak
Authors:
Yuki Takezawa,
Han Bao,
Ryoma Sato,
Kenta Niwa,
Makoto Yamada
Abstract:
Gradient descent and its variants are de facto standard algorithms for training machine learning models. As gradient descent is sensitive to its hyperparameters, we need to tune the hyperparameters carefully using a grid search. However, the method is time-consuming, particularly when multiple hyperparameters exist. Therefore, recent studies have analyzed parameter-free methods that adjust the hyperparameters on the fly. However, the existing work is limited to investigations of parameter-free methods for the stepsize, and parameter-free methods for other hyperparameters have not been explored. For instance, although the gradient clipping threshold is a crucial hyperparameter in addition to the stepsize for preventing gradient explosion issues, none of the existing studies have investigated parameter-free methods for clipped gradient descent. Therefore, in this study, we investigate parameter-free methods for clipped gradient descent. Specifically, we propose Inexact Polyak Stepsize, which converges to the optimal solution without any hyperparameter tuning, and its convergence rate is asymptotically independent of $L$ under $L$-smooth and $(L_0, L_1)$-smooth assumptions of the loss function, similar to that of clipped gradient descent with well-tuned hyperparameters. We numerically validated our convergence results using a synthetic function and demonstrated the effectiveness of our proposed methods using LSTM, Nano-GPT, and T5.
Submitted 31 October, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
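For readers unfamiliar with the ingredients named in the abstract above, the following is a minimal sketch of the classical Polyak stepsize combined with standard gradient-norm clipping on a toy quadratic. It is not the Inexact Polyak Stepsize proposed in the paper; the function name `clipped_polyak_gd`, the threshold `clip`, and the assumption that the optimal value `f_star` is known are all illustrative.

```python
import math


def clipped_polyak_gd(f, grad_f, x0, f_star, clip=1.0, steps=50):
    """Gradient descent with the classical Polyak stepsize and norm clipping.

    Illustrative only: assumes the optimal value f_star is known exactly,
    which the parameter-free setting in the abstract does not assume.
    """
    x = list(x0)
    for _ in range(steps):
        g = grad_f(x)
        g_norm = math.sqrt(sum(gi * gi for gi in g))
        if g_norm == 0.0:
            break  # stationary point reached
        # Classical Polyak stepsize: (f(x) - f*) / ||grad f(x)||^2.
        eta = (f(x) - f_star) / (g_norm ** 2)
        # Standard clipping: rescale the gradient so its norm is at most `clip`.
        scale = min(1.0, clip / g_norm)
        x = [xi - eta * scale * gi for xi, gi in zip(x, g)]
    return x


def quadratic(x):
    """Toy objective f(x) = 0.5 * ||x||^2 with minimum value 0 at the origin."""
    return 0.5 * sum(xi * xi for xi in x)


def quadratic_grad(x):
    """Gradient of the toy objective, which is x itself."""
    return list(x)


x_final = clipped_polyak_gd(quadratic, quadratic_grad, [3.0, 4.0], f_star=0.0)
```

On this toy problem the iterate first moves toward the origin in steps of bounded length while the gradient is clipped, then halves its distance to the origin at every step once clipping is inactive.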
-
Optimal Transport with Cyclic Symmetry
Authors:
Shoichiro Takeda,
Yasunori Akagi,
Naoki Marumo,
Kenta Niwa
Abstract:
We propose novel fast algorithms for optimal transport (OT) utilizing a cyclic symmetry structure of input data. Such OT with cyclic symmetry appears universally in various real-world examples: image processing, urban planning, and graph processing. Our main idea is to reduce OT to a small optimization problem that has significantly fewer variables by utilizing cyclic symmetry and various optimization techniques. On the basis of this reduction, our algorithms solve the small optimization problem instead of the original OT. As a result, our algorithms obtain the optimal solution and the objective function value of the original OT faster than solving the original OT directly. In this paper, our focus is on two crucial OT formulations: the linear programming OT (LOT) and the strongly convex-regularized OT, which includes the well-known entropy-regularized OT (EROT). Experiments show the effectiveness of our algorithms for LOT and EROT in synthetic/real-world data that has a strict/approximate cyclic symmetry structure. Through theoretical and experimental results, this paper successfully introduces the concept of symmetry into the OT research field for the first time.
Submitted 21 November, 2023;
originally announced November 2023.
-
Embarrassingly Simple Text Watermarks
Authors:
Ryoma Sato,
Yuki Takezawa,
Han Bao,
Kenta Niwa,
Makoto Yamada
Abstract:
We propose Easymark, a family of embarrassingly simple yet effective watermarks. Text watermarking is becoming increasingly important with the advent of Large Language Models (LLMs). LLMs can generate texts that cannot be distinguished from human-written texts. This is a serious problem for the credibility of the text. Easymark is a simple yet effective solution to this problem. Easymark can inject a watermark without changing the meaning of the text at all, while a validator can detect with high credibility whether a text was generated from a system that adopted Easymark. Easymark is extremely easy to implement, requiring only a few lines of code. Easymark does not require access to LLMs, so it can be implemented on the user side when the LLM providers do not offer watermarked LLMs. In spite of its simplicity, it achieves higher detection accuracy and BLEU scores than the state-of-the-art text watermarking methods. We also prove the impossibility theorem of perfect watermarking, which is valuable in its own right. This theorem shows that no matter how sophisticated a watermark is, a malicious user could remove it from the text, which motivates us to use a simple watermark such as Easymark. We carry out experiments with LLM-generated texts and confirm that Easymark can be detected reliably without any degradation of BLEU and perplexity, and that it outperforms state-of-the-art watermarks in terms of both quality and reliability.
Submitted 13 October, 2023;
originally announced October 2023.
-
Necessary and Sufficient Watermark for Large Language Models
Authors:
Yuki Takezawa,
Ryoma Sato,
Han Bao,
Kenta Niwa,
Makoto Yamada
Abstract:
In recent years, large language models (LLMs) have achieved remarkable performance in various NLP tasks. They can generate texts that are indistinguishable from those written by humans. Such remarkable performance of LLMs increases their risk of being used for malicious purposes, such as generating fake news articles. Therefore, it is necessary to develop methods for distinguishing texts written by LLMs from those written by humans. Watermarking is one of the most powerful methods for achieving this. Although existing watermarking methods have successfully detected texts generated by LLMs, they significantly degrade the quality of the generated texts. In this study, we propose the Necessary and Sufficient Watermark (NS-Watermark) for inserting watermarks into generated texts without degrading the text quality. More specifically, we derive minimum constraints required to be imposed on the generated texts to distinguish whether LLMs or humans write the texts. Then, we formulate the NS-Watermark as a constrained optimization problem and propose an efficient algorithm to solve it. Through the experiments, we demonstrate that the NS-Watermark can generate more natural texts than existing watermarking methods and distinguish more accurately between texts written by LLMs and those written by humans. Especially in machine translation tasks, the NS-Watermark can outperform the existing watermarking method by up to 30 BLEU points.
Submitted 15 February, 2025; v1 submitted 1 October, 2023;
originally announced October 2023.
-
Beyond Exponential Graph: Communication-Efficient Topologies for Decentralized Learning via Finite-time Convergence
Authors:
Yuki Takezawa,
Ryoma Sato,
Han Bao,
Kenta Niwa,
Makoto Yamada
Abstract:
Decentralized learning has recently been attracting increasing attention for its applications in parallel computation and privacy preservation. Many recent studies stated that the underlying network topology with a faster consensus rate (a.k.a. spectral gap) leads to a better convergence rate and accuracy for decentralized learning. However, a topology with a fast consensus rate, e.g., the exponential graph, generally has a large maximum degree, which incurs significant communication costs. Thus, seeking topologies with both a fast consensus rate and small maximum degree is important. In this study, we propose a novel topology combining both a fast consensus rate and small maximum degree called the Base-$(k + 1)$ Graph. Unlike the existing topologies, the Base-$(k + 1)$ Graph enables all nodes to reach the exact consensus after a finite number of iterations for any number of nodes and maximum degree k. Thanks to this favorable property, the Base-$(k + 1)$ Graph endows Decentralized SGD (DSGD) with both a faster convergence rate and more communication efficiency than the exponential graph. We conducted experiments with various topologies, demonstrating that the Base-$(k + 1)$ Graph enables various decentralized learning methods to achieve higher accuracy with better communication efficiency than the existing topologies.
Submitted 15 October, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
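The finite-time exact consensus property mentioned in the abstract above can be illustrated with a well-known simpler case. The sketch below uses a hypercube-style pairing schedule, not the Base-$(k + 1)$ Graph itself, and it only works when the number of nodes is a power of two, whereas the abstract claims the property for any number of nodes. The function names `pairwise_average` and `hypercube_consensus` are illustrative.

```python
def pairwise_average(values, pairs):
    """One communication round: each listed pair of nodes averages its values."""
    out = list(values)
    for i, j in pairs:
        mean = (values[i] + values[j]) / 2.0
        out[i] = mean
        out[j] = mean
    return out


def hypercube_consensus(values):
    """Reach the exact global mean in log2(n) rounds, each node using one peer.

    Illustrative stand-in: requires len(values) to be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("number of nodes must be a power of two")
    x = list(values)
    bit = 1
    while bit < n:
        # In this round, node i talks to the node whose index differs in one bit.
        pairs = [(i, i ^ bit) for i in range(n) if i < (i ^ bit)]
        x = pairwise_average(x, pairs)
        bit <<= 1
    return x


consensus = hypercube_consensus([1.0, 2.0, 3.0, 6.0])
```

With four nodes, the first round averages nodes (0, 1) and (2, 3), and the second averages (0, 2) and (1, 3), after which every node holds the mean of the initial values. Each node communicates with a single peer per round, which is the kind of small maximum degree the abstract is concerned with.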
-
Momentum Tracking: Momentum Acceleration for Decentralized Deep Learning on Heterogeneous Data
Authors:
Yuki Takezawa,
Han Bao,
Kenta Niwa,
Ryoma Sato,
Makoto Yamada
Abstract:
SGD with momentum is one of the key components for improving the performance of neural networks. For decentralized learning, a straightforward approach using momentum is Distributed SGD (DSGD) with momentum (DSGDm). However, DSGDm performs worse than DSGD when the data distributions are statistically heterogeneous. Recently, several studies have addressed this issue and proposed methods with momentum that are more robust to data heterogeneity than DSGDm, although their convergence rates remain dependent on data heterogeneity and deteriorate when the data distributions are heterogeneous. In this study, we propose Momentum Tracking, which is a method with momentum whose convergence rate is proven to be independent of data heterogeneity. More specifically, we analyze the convergence rate of Momentum Tracking in the setting where the objective function is non-convex and the stochastic gradient is used. Then, we identify that it is independent of data heterogeneity for any momentum coefficient $\beta \in [0, 1)$. Through experiments, we demonstrate that Momentum Tracking is more robust to data heterogeneity than the existing decentralized learning methods with momentum and can consistently outperform these existing methods when the data distributions are heterogeneous.
Submitted 24 September, 2023; v1 submitted 30 September, 2022;
originally announced September 2022.
-
Theoretical Analysis of Primal-Dual Algorithm for Non-Convex Stochastic Decentralized Optimization
Authors:
Yuki Takezawa,
Kenta Niwa,
Makoto Yamada
Abstract:
In recent years, decentralized learning has emerged as a powerful tool not only for large-scale machine learning, but also for preserving privacy. One of the key challenges in decentralized learning is that the data distribution held by each node is statistically heterogeneous. To address this challenge, the primal-dual algorithm called the Edge-Consensus Learning (ECL) was proposed and was experimentally shown to be robust to the heterogeneity of data distributions. However, the convergence rate of the ECL is provided only when the objective function is convex, and has not been shown in a standard machine learning setting where the objective function is non-convex. Furthermore, the intuitive reason why the ECL is robust to the heterogeneity of data distributions has not been investigated. In this work, we first investigate the relationship between the ECL and Gossip algorithm and show that the update formulas of the ECL can be regarded as correcting the local stochastic gradient in the Gossip algorithm. Then, we propose the Generalized ECL (G-ECL), which contains the ECL as a special case, and provide the convergence rates of the G-ECL in both (strongly) convex and non-convex settings, which do not depend on the heterogeneity of data distributions. Through synthetic experiments, we demonstrate that the numerical results of both the G-ECL and ECL coincide with the convergence rate of the G-ECL.
Submitted 22 September, 2022; v1 submitted 23 May, 2022;
originally announced May 2022.
-
Communication Compression for Decentralized Learning with Operator Splitting Methods
Authors:
Yuki Takezawa,
Kenta Niwa,
Makoto Yamada
Abstract:
In decentralized learning, operator splitting methods using a primal-dual formulation (e.g., the Edge-Consensus Learning (ECL)) have been shown to be robust to heterogeneous data and have attracted significant attention in recent years. However, in the ECL, a node needs to exchange dual variables with its neighbors. These exchanges incur significant communication costs. For the Gossip-based algorithms, many compression methods have been proposed, but these Gossip-based algorithms do not perform well when the data distribution held by each node is statistically heterogeneous. In this work, we propose a novel framework of compression methods for the ECL, called the Communication Compressed ECL (C-ECL). Specifically, we reformulate the update formulas of the ECL, and propose to compress the update values of the dual variables. We demonstrate experimentally that the C-ECL can achieve a nearly equivalent performance with fewer parameter exchanges than the ECL. Moreover, we demonstrate that the C-ECL is more robust to heterogeneous data than the Gossip-based algorithms.
Submitted 8 May, 2022;
originally announced May 2022.
-
A DNN Optimizer that Improves over AdaBelief by Suppression of the Adaptive Stepsize Range
Authors:
Guoqiang Zhang,
Kenta Niwa,
W. Bastiaan Kleijn
Abstract:
We make contributions towards improving adaptive-optimizer performance. Our improvements are based on suppression of the range of adaptive stepsizes in the AdaBelief optimizer. Firstly, we show that the particular placement of the parameter epsilon within the update expressions of AdaBelief reduces the range of the adaptive stepsizes, making AdaBelief closer to SGD with momentum. Secondly, we extend AdaBelief by further suppressing the range of the adaptive stepsizes. To achieve the above goal, we perform mutual layerwise vector projections between the gradient g_t and its first momentum m_t before using them to estimate the second momentum. The new optimization method is referred to as Aida. Thirdly, extensive experimental results show that Aida outperforms nine optimizers when training transformers and LSTMs for NLP, and VGG and ResNet for image classification over CIFAR10 and CIFAR100, while matching the best performance of the nine methods when training WGAN-GP models for image generation tasks. Furthermore, Aida produces higher validation accuracies than AdaBelief for training ResNet18 over ImageNet. Code is available at https://github.com/guoqiang-x-zhang/AidaOptimizer
Submitted 24 January, 2023; v1 submitted 24 March, 2022;
originally announced March 2022.
-
Revisiting the Primal-Dual Method of Multipliers for Optimisation over Centralised Networks
Authors:
Guoqiang Zhang,
Kenta Niwa,
W. Bastiaan Kleijn
Abstract:
The primal-dual method of multipliers (PDMM) was originally designed for solving a decomposable optimisation problem over a general network. In this paper, we revisit PDMM for optimisation over a centralized network. We first note that the recently proposed method FedSplit [1] implements PDMM for a centralized network. In [1], Inexact FedSplit (i.e., gradient based FedSplit) was also studied both empirically and theoretically. We identify the cause for the poor reported performance of Inexact FedSplit, which is due to the improper initialisation in the gradient operations at the client side. To fix the issue of Inexact FedSplit, we propose two versions of Inexact PDMM, which are referred to as gradient-based PDMM (GPDMM) and accelerated GPDMM (AGPDMM), respectively. AGPDMM accelerates GPDMM at the cost of transmitting two times the number of parameters from the server to each client per iteration compared to GPDMM. We provide a new convergence bound for GPDMM for a class of convex optimisation problems. Our new bounds are tighter than those derived for Inexact FedSplit. We also investigate the update expressions of AGPDMM and SCAFFOLD to find their similarities. It is found that when the number K of gradient steps at the client side per iteration is K=1, both AGPDMM and SCAFFOLD reduce to vanilla gradient descent with proper parameter setup. Experimental results indicate that AGPDMM converges faster than SCAFFOLD when K>1 while GPDMM converges slightly worse than SCAFFOLD.
Submitted 19 July, 2021;
originally announced July 2021.
-
SSFG: Stochastically Scaling Features and Gradients for Regularizing Graph Convolutional Networks
Authors:
Haimin Zhang,
Min Xu,
Guoqiang Zhang,
Kenta Niwa
Abstract:
Graph convolutional networks have been successfully applied in various graph-based tasks. In a typical graph convolutional layer, node features are updated by aggregating neighborhood information. Repeatedly applying graph convolutions can cause the oversmoothing issue, i.e., node features at deep layers converge to similar values. Previous studies have suggested that oversmoothing is one of the major issues that restrict the performance of graph convolutional networks. In this paper, we propose a stochastic regularization method to tackle the oversmoothing problem. In the proposed method, we stochastically scale features and gradients (SSFG) by a factor sampled from a probability distribution in the training procedure. By explicitly applying a scaling factor to break feature convergence, the oversmoothing issue is alleviated. We show that applying stochastic scaling at the gradient level is complementary to that applied at the feature level to improve the overall performance. Our method does not increase the number of trainable parameters. When used together with ReLU, our SSFG can be seen as a stochastic ReLU activation function. We experimentally validate our SSFG regularization method on three commonly used types of graph networks. Extensive experimental results on seven benchmark datasets for four graph-based tasks demonstrate that our SSFG regularization is effective in improving the overall performance of the baseline graph networks.
Submitted 30 March, 2021; v1 submitted 20 February, 2021;
originally announced February 2021.
-
Approximated Orthonormal Normalisation in Training Neural Networks
Authors:
Guoqiang Zhang,
Kenta Niwa,
W. B. Kleijn
Abstract:
Generalisation of a deep neural network (DNN) is one major concern when employing the deep learning approach for solving practical problems. In this paper we propose a new technique, named approximated orthonormal normalisation (AON), to improve the generalisation capacity of a DNN model. Considering a weight matrix W from a particular neural layer in the model, our objective is to design a function h(W) such that its row vectors are approximately orthogonal to each other while allowing the DNN model to fit the training data sufficiently accurately. Doing so avoids co-adaptation among neurons of the same layer and can thereby improve network-generalisation capacity. Specifically, at each iteration, we first approximate (WW^T)^(-1/2) using its Taylor expansion before multiplying the matrix W. After that, the matrix product is normalised by applying the spectral normalisation (SN) technique to obtain h(W). Conceptually speaking, AON is designed to turn orthonormal regularisation into orthonormal normalisation to avoid manually balancing the original and penalty functions. Experimental results show that AON yields promising validation performance compared to orthonormal regularisation.
Submitted 14 January, 2020; v1 submitted 21 November, 2019;
originally announced November 2019.
-
Rapidly Adapting Moment Estimation
Authors:
Guoqiang Zhang,
Kenta Niwa,
W. Bastiaan Kleijn
Abstract:
Adaptive gradient methods such as Adam have been shown to be very effective for training deep neural networks (DNNs) by tracking the second moment of gradients to compute the individual learning rates. Differently from existing methods, we make use of the most recent first moment of gradients to compute the individual learning rates per iteration. The motivation behind it is that the dynamic variation of the first moment of gradients may provide useful information to obtain the learning rates. We refer to the new method as the rapidly adapting moment estimation (RAME). The theoretical convergence of deterministic RAME is studied by using an analysis similar to the one used in [1] for Adam. Experimental results for training a number of DNNs show promising performance of RAME w.r.t. the convergence speed and generalization performance compared to the stochastic heavy-ball (SHB) method, Adam, and RMSprop.
Submitted 24 February, 2019;
originally announced February 2019.
-
DNN-based Source Enhancement to Increase Objective Sound Quality Assessment Score
Authors:
Yuma Koizumi,
Kenta Niwa,
Yusuke Hioka,
Kazunori Kobayashi,
Yoichi Haneda
Abstract:
We propose a training method for deep neural network (DNN)-based source enhancement to increase objective sound quality assessment (OSQA) scores such as the perceptual evaluation of speech quality (PESQ). In many conventional studies, DNNs have been used as a mapping function to estimate time-frequency masks and trained to minimize an analytically tractable objective function such as the mean squared error (MSE). Since OSQA scores have been used widely for sound-quality evaluation, constructing DNNs to increase OSQA scores would be better than using the minimum-MSE to create high-quality output signals. However, since most OSQA scores are not analytically tractable, i.e., they are black boxes, the gradient of the objective function cannot be calculated by simply applying back-propagation. To calculate the gradient of the OSQA-based objective function, we formulated a DNN optimization scheme on the basis of black-box optimization, which is used for training a computer that plays a game. For a black-box-optimization scheme, we adopt the policy gradient method for calculating the gradient on the basis of a sampling algorithm. To simulate output signals using the sampling algorithm, DNNs are used to estimate the probability density function of the output signals that maximize OSQA scores. The OSQA scores are calculated from the simulated output signals, and the DNNs are trained to increase the probability of generating the simulated output signals that achieve high OSQA scores. Through several experiments, we found that OSQA scores significantly increased by applying the proposed method, even though the MSE was not minimized.
Submitted 22 October, 2018;
originally announced October 2018.
-
Software Defined Media: Virtualization of Audio-Visual Services
Authors:
Manabu Tsukada,
Keiko Ogawa,
Masahiro Ikeda,
Takuro Sone,
Kenta Niwa,
Shoichiro Saito,
Takashi Kasuya,
Hideki Sunahara,
Hiroshi Esaki
Abstract:
Internet-native audio-visual services are witnessing rapid development. Among these services, object-based audio-visual services are gaining importance. In 2014, we established the Software Defined Media (SDM) consortium to target new research areas and markets involving object-based digital media and Internet-by-design audio-visual environments. In this paper, we introduce the SDM architecture that virtualizes networked audio-visual services along with the development of smart buildings and smart cities using Internet of Things (IoT) devices and smart building facilities. Moreover, we design the SDM architecture as a layered architecture to promote the development of innovative applications on the basis of rapid advancements in software-defined networking (SDN). Then, we implement a prototype system based on the architecture, present the system at an exhibition, and provide it as an SDM API to application developers at hackathons. Various types of applications are developed using the API at these events. An evaluation of SDM API access shows that the prototype SDM platform effectively provides 3D audio reproducibility and interactiveness for SDM applications.
Submitted 23 February, 2017;
originally announced February 2017.
-
PSD estimation in Beamspace for Estimating Direct-to-Reverberant Ratio from A Reverberant Speech Signal
Authors:
Yusuke Hioka,
Kenta Niwa
Abstract:
A method for estimation of direct-to-reverberant ratio (DRR) using a microphone array is proposed. The proposed method estimates the power spectral density (PSD) of the direct sound and the reverberation using the algorithm PSD estimation in beamspace with a microphone array and calculates the DRR of the observed signal. The speech corpus of the ACE (Acoustic Characterisation of Environments) Challenge was utilised for evaluating the practical feasibility of the proposed method. The experimental results revealed that the proposed method was able to effectively estimate the DRR from a recording of a reverberant speech signal which included various types of environmental noise.
Submitted 29 October, 2015;
originally announced October 2015.