Showing 1–1 of 1 results for author: Pothuganti, R

  1. arXiv:2401.17129  [pdf, other]

    cs.SD cs.AI eess.AS

    Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes

    Authors: Adrian S. Roman, Baladithya Balamurugan, Rithik Pothuganti

    Abstract: This technical report details our work towards building an enhanced audio-visual sound event localization and detection (SELD) network. We build on top of the audio-only SELDnet23 model and adapt it to be audio-visual by merging both audio and video information prior to the gated recurrent unit (GRU) of the audio-only network. Our model leverages YOLO and DETIC object detectors. We also build a fr…

    Submitted 29 January, 2024; originally announced January 2024.