Neural Network based End-to-End Query by Example Spoken Term Detection

Ram, Dhananjay; Miculicich, Lesly; Bourlard, Hervé

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:1911.08332 (eess)

[Submitted on 19 Nov 2019]


Abstract: This paper addresses query by example spoken term detection (QbE-STD) in a zero-resource scenario. State-of-the-art approaches rely primarily on dynamic time warping (DTW) based template matching, using phone posterior or bottleneck features extracted from a deep neural network (DNN). We use both monolingual and multilingual bottleneck features, and show that multilingual features perform better as the number of training languages increases. It has previously been shown that DTW based matching can be replaced with CNN based matching when using posterior features. Here, we show that CNN based matching also outperforms DTW based matching when using bottleneck features. In this case, the feature extraction and pattern matching stages of our QbE-STD system are optimized independently of each other. We propose to integrate these two stages in a fully neural network based end-to-end learning framework that enables their joint optimization. The proposed approaches are evaluated on two challenging multilingual datasets, Spoken Web Search 2013 and Query by Example Search on Speech Task 2014, and demonstrate significant improvements in each case.
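To make the DTW based template matching mentioned in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes each query and utterance is a list of frame-level feature vectors (for example bottleneck features), uses cosine distance as the local cost, and normalizes the accumulated cost by the query length. The function names `cosine_distance` and `subsequence_dtw_score` are hypothetical, chosen only for this illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: treat an all-zero frame as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def subsequence_dtw_score(query, utterance):
    """Lowest normalized cost of aligning the whole query to any region
    of the utterance. Lower scores suggest the query is more likely present."""
    if not query or not utterance:
        raise ValueError("query and utterance must be non-empty")
    n, m = len(query), len(utterance)
    # Free start: the first query frame may align to any utterance frame.
    prev = [cosine_distance(query[0], utterance[j]) for j in range(m)]
    for i in range(1, n):
        curr = [0.0] * m
        curr[0] = prev[0] + cosine_distance(query[i], utterance[0])
        for j in range(1, m):
            local = cosine_distance(query[i], utterance[j])
            # Allowed steps: vertical, diagonal, horizontal.
            curr[j] = local + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    # Free end: the alignment may finish at any utterance frame.
    # Dividing by the query length is a simplifying normalization.
    return min(prev) / n


# Toy example: 2-dimensional "features" for a two-frame query.
query = [[1.0, 0.0], [0.0, 1.0]]
matching = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
non_matching = [[1.0, 1.0], [1.0, 1.0]]
score_match = subsequence_dtw_score(query, matching)
score_non_match = subsequence_dtw_score(query, non_matching)
```

In this toy example the utterance that contains the query frames in order receives a lower score than the one that does not. A detection system would compare such scores against a threshold, which is the stage that the abstract describes replacing with CNN based matching.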

Comments:	Submitted to IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
Cite as:	arXiv:1911.08332 [eess.AS]
	(or arXiv:1911.08332v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.1911.08332

Submission history

From: Dhananjay Ram
[v1] Tue, 19 Nov 2019 15:07:07 UTC (3,806 KB)
