Audio segmentation based on melodic style with hand-crafted features and with convolutional neural networks

Vidwans, Amruta; Deo, Nachiket; Rao, Preeti

arXiv:1807.11138 (cs)

[Submitted on 30 Jul 2018]

Abstract: We investigate methods for the automatic labeling of the taan section, a prominent structural component of the Hindustani Khayal vocal concert. The taan contains improvised raga-based melody rendered in a highly distinctive style marked by rapid pitch and energy modulations of the voice. We propose computational features that capture these specific high-level characteristics of the singing voice in the polyphonic context. The extracted local features are used to achieve frame-level classification via a trained multilayer perceptron (MLP) network, followed by grouping and segmentation based on novelty detection. We report high accuracies with reference to musician-annotated taan sections across artists and concerts. We also compare the performance obtained with the compact specialized features against frame-level classification via a convolutional neural network (CNN) operating directly on audio spectrogram patches for the same task. While the relatively simple architecture we experiment with does not quite attain the classification accuracy of the hand-crafted features, it performs well above chance and offers interesting insights into the ability of the network to learn discriminative features effectively from labeled data.
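
The step from frame-level decisions to labeled sections can be illustrated with a minimal sketch. The abstract does not say which novelty function or grouping rule the authors use, so everything below is an assumption made for illustration: the novelty curve is a simple difference of mean posteriors in adjacent windows, the function names (`novelty_curve`, `pick_boundaries`, `label_segments`), the window size, and the thresholds are hypothetical, and the per-frame taan probabilities are synthetic values standing in for classifier output, not data from the paper.

```python
from statistics import mean


def novelty_curve(probs, half_width):
    """Absolute difference between the mean posterior after and before each frame.

    Assumed novelty measure for illustration; not taken from the paper.
    """
    n = len(probs)
    novelty = [0.0] * n
    for i in range(half_width, n - half_width + 1):
        before = mean(probs[i - half_width:i])
        after = mean(probs[i:i + half_width])
        novelty[i] = abs(after - before)
    return novelty


def pick_boundaries(novelty, threshold):
    """Indices of local maxima of the novelty curve that reach the threshold."""
    peaks = []
    for i in range(1, len(novelty) - 1):
        is_peak = novelty[i] > novelty[i - 1] and novelty[i] >= novelty[i + 1]
        if is_peak and novelty[i] >= threshold:
            peaks.append(i)
    return peaks


def label_segments(probs, boundaries):
    """Split the frames at the boundaries and label each segment by its mean posterior."""
    edges = [0] + list(boundaries) + [len(probs)]
    segments = []
    for start, end in zip(edges[:-1], edges[1:]):
        if end > start:
            label = "taan" if mean(probs[start:end]) >= 0.5 else "non-taan"
            segments.append((start, end, label))
    return segments


# Synthetic per-frame taan probabilities (hypothetical classifier output):
# 20 non-taan frames, 20 taan frames, 20 non-taan frames, with two isolated frame errors.
probs = [0.1] * 20 + [0.9] * 20 + [0.1] * 20
probs[5] = 0.8
probs[30] = 0.2

novelty = novelty_curve(probs, half_width=8)
boundaries = pick_boundaries(novelty, threshold=0.5)
segments = label_segments(probs, boundaries)
print(segments)
# → [(0, 20, 'non-taan'), (20, 40, 'taan'), (40, 60, 'non-taan')]
```

In this toy input the two isolated misclassified frames do not create spurious boundaries, because the windowed means absorb them. That is the general motivation for grouping frame decisions before segmenting, though the actual behavior depends on the window size and threshold chosen.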

Comments:	This work was done in 2015 at Indian Institute of Technology, Bombay, as a part of the ERC grant agreement 267583 (CompMusic) project
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1807.11138 [cs.SD]
	(or arXiv:1807.11138v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.1807.11138

Submission history

From: Amruta Vidwans
[v1] Mon, 30 Jul 2018 01:34:20 UTC (527 KB)
