
Showing 1–4 of 4 results for author: Lipping, S

  1. arXiv:2209.09967   

    eess.AS cs.SD

    Language-based Audio Retrieval Task in DCASE 2022 Challenge

    Authors: Huang Xie, Samuel Lipping, Tuomas Virtanen

    Abstract: Language-based audio retrieval is a task in which natural language textual captions are used as queries to retrieve audio signals from a dataset. It was first introduced in the DCASE 2022 Challenge as Subtask 6B of Task 6, which aims at developing computational systems to model relationships between audio signals and free-form textual descriptions. Compared with audio captioning (Subtask 6A), whi…

    Submitted 4 October, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

    Comments: Update for arXiv:2206.06108 mistakenly submitted as a new article

  2. arXiv:2204.09634

    cs.SD cs.LG eess.AS

    Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering

    Authors: Samuel Lipping, Parthasaarathy Sudarsanam, Konstantinos Drossos, Tuomas Virtanen

    Abstract: Audio question answering (AQA) is a multimodal translation task in which a system analyzes an audio signal and a natural language question to generate a suitable natural language answer. In this paper, we introduce Clotho-AQA, a dataset for audio question answering consisting of 1991 audio files, each between 15 and 30 seconds in duration, selected from the Clotho dataset. For each audio file, we coll…

    Submitted 17 June, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

  3. arXiv:1910.09387

    cs.SD cs.CL cs.LG eess.AS

    Clotho: An Audio Captioning Dataset

    Authors: Konstantinos Drossos, Samuel Lipping, Tuomas Virtanen

    Abstract: Audio captioning is the novel task of describing general audio content using free text. It is an intermodal translation task (not speech-to-text), in which a system accepts an audio signal as input and outputs a textual description (i.e., the caption) of that signal. In this paper, we present Clotho, a dataset for audio captioning consisting of 4981 audio samples of 15 to 30 seconds in duration and…

    Submitted 21 October, 2019; originally announced October 2019.

  4. arXiv:1907.09238

    cs.SD eess.AS

    Crowdsourcing a Dataset of Audio Captions

    Authors: Samuel Lipping, Konstantinos Drossos, Tuomas Virtanen

    Abstract: Audio captioning is a novel field of multimodal translation: the task of creating a textual description of the content of an audio signal (e.g., "people talking in a big room"). Creating a dataset for this task requires a considerable amount of work, which makes crowdsourcing a very attractive option. In this paper, we present a three-step framework for crowdsourcing an aud…

    Submitted 22 July, 2019; originally announced July 2019.