AQUALLM: Audio Question Answering Data Generation Using Large Language Models

Behera, Swarup Ranjan; Injeti, Krishna Mohan; Patibandla, Jaya Sai Kiran; Pokala, Praveen Kumar; Pailla, Balakrishna Reddy

arXiv:2312.17343 (cs)

[Submitted on 28 Dec 2023]

Abstract: Audio Question Answering (AQA) is a task in which machines analyze both audio signals and natural language questions to produce precise natural language answers. High-quality, diverse, and extensive AQA datasets are essential to the precision of an AQA system. While there has been notable focus on developing accurate and efficient AQA models, the creation of such datasets for the task has received comparatively little attention. To address this challenge, this work makes several contributions. We introduce a scalable AQA data generation pipeline, the AQUALLM framework, which relies on Large Language Models (LLMs). This framework uses existing audio-caption annotations and state-of-the-art LLMs to generate large, high-quality AQA datasets. Additionally, we present three extensive and high-quality benchmark datasets for AQA, contributing to the progression of AQA research. AQA models trained on the proposed datasets outperform the existing state of the art. Moreover, models trained on our datasets generalize better than models trained on human-annotated AQA data. Code and datasets will be accessible on GitHub.
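The abstract describes the pipeline only at a high level: existing audio-caption annotations are passed to an LLM, which produces question-answer pairs. The following is a minimal sketch of that general idea, not the authors' implementation. The prompt wording, the JSON response format, and the names `AQAExample`, `build_prompt`, `generate_aqa`, and `fake_llm` are all assumptions made for illustration; the LLM is a stub so the example runs offline.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AQAExample:
    """One generated item: an audio clip id with a question and its answer."""
    audio_id: str
    question: str
    answer: str


def build_prompt(caption: str, num_pairs: int) -> str:
    # Hypothetical prompt; the paper's actual prompts are not given in the abstract.
    return (
        f"Caption of an audio clip: {caption!r}\n"
        f"Write {num_pairs} question-answer pairs answerable from the caption alone. "
        'Respond with a JSON list of objects with keys "question" and "answer".'
    )


def generate_aqa(
    captions: Dict[str, str],
    llm: Callable[[str], str],
    num_pairs: int = 2,
) -> List[AQAExample]:
    """Turn {audio_id: caption} annotations into AQA examples via an LLM callable."""
    examples: List[AQAExample] = []
    for audio_id, caption in captions.items():
        raw = llm(build_prompt(caption, num_pairs))
        try:
            pairs = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip responses that are not valid JSON
        if not isinstance(pairs, list):
            continue
        for pair in pairs:
            if not isinstance(pair, dict):
                continue
            question = str(pair.get("question", "")).strip()
            answer = str(pair.get("answer", "")).strip()
            if question and answer:  # keep only complete pairs
                examples.append(AQAExample(audio_id, question, answer))
    return examples


def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call (assumption); always returns one fixed pair.
    return json.dumps([{"question": "Is a dog barking?", "answer": "yes"}])


if __name__ == "__main__":
    data = generate_aqa({"clip_001": "A dog barks while cars pass by."}, fake_llm)
    for example in data:
        print(example)
```

Passing the LLM as a plain callable keeps the sketch independent of any particular model or API; a real pipeline would also need the quality filtering and dataset-specific handling that the abstract does not detail.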

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
ACM classes:	I.2.7
Cite as:	arXiv:2312.17343 [cs.CL]
	(or arXiv:2312.17343v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2312.17343

Submission history

From: Swarup Ranjan Behera
[v1] Thu, 28 Dec 2023 20:01:27 UTC (258 KB)
