
Showing 1–5 of 5 results for author: Abramovski, I

  1. arXiv:2501.17304  [pdf, other]

    cs.SD cs.LG eess.AS

    Summary of the NOTSOFAR-1 Challenge: Highlights and Learnings

    Authors: Igor Abramovski, Alon Vinnikov, Shalev Shaer, Naoyuki Kanda, Xiaofei Wang, Amir Ivry, Eyal Krupka

    Abstract: The first Natural Office Talkers in Settings of Far-field Audio Recordings (NOTSOFAR-1) Challenge is a pivotal initiative that sets new benchmarks by offering datasets more representative of the needs of real-world business applications than those previously available. The challenge provides a unique combination of 280 recorded meetings across 30 diverse environments, capturing real-world acoustic…

    Submitted 9 March, 2025; v1 submitted 28 January, 2025; originally announced January 2025.

  2. arXiv:2401.08887  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

    Authors: Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe'er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka

    Abstract: We introduce the first Natural Office Talkers in Settings of Far-field Audio Recordings ("NOTSOFAR-1") Challenge alongside datasets and baseline system. The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios, with single-channel and known-geometry multi-channel tracks, and serves as a launch platform for two new datasets: First…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: preprint

  3. arXiv:2309.08295  [pdf, other]

    eess.AS cs.CV cs.LG cs.SD

    A Real-Time Active Speaker Detection System Integrating an Audio-Visual Signal with a Spatial Querying Mechanism

    Authors: Ilya Gurvich, Ido Leichter, Dharmendar Reddy Palle, Yossi Asher, Alon Vinnikov, Igor Abramovski, Vishak Gopal, Ross Cutler, Eyal Krupka

    Abstract: We introduce a distinctive real-time, causal, neural network-based active speaker detection system optimized for low-power edge computing. This system drives a virtual cinematography module and is deployed on a commercial device. The system uses data originating from a microphone array and a 360-degree camera. Our network requires only 127 MFLOPs per participant, for a meeting with 14 participants…

    Submitted 15 September, 2023; originally announced September 2023.

  4. arXiv:2109.10598  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS

    Diarisation using location tracking with agglomerative clustering

    Authors: Jeremy H. M. Wong, Igor Abramovski, Xiong Xiao, Yifan Gong

    Abstract: Previous works have shown that spatial location information can be complementary to speaker embeddings for a speaker diarisation task. However, the models used often assume that speakers are fairly stationary throughout a meeting. This paper proposes to relax this assumption, by explicitly modelling the movements of speakers within an Agglomerative Hierarchical Clustering (AHC) diarisation framewo…

    Submitted 23 September, 2021; v1 submitted 22 September, 2021; originally announced September 2021.

  5. arXiv:1912.04979  [pdf, other]

    eess.AS cs.CL cs.CV cs.SD eess.IV

    Advances in Online Audio-Visual Meeting Transcription

    Authors: Takuya Yoshioka, Igor Abramovski, Cem Aksoylar, Zhuo Chen, Moshe David, Dimitrios Dimitriadis, Yifan Gong, Ilya Gurvich, Xuedong Huang, Yan Huang, Aviv Hurvitz, Li Jiang, Sharon Koubi, Eyal Krupka, Ido Leichter, Changliang Liu, Partha Parthasarathy, Alon Vinnikov, Lingfeng Wu, Xiong Xiao, Wayne Xiong, Huaming Wang, Zhenghao Wang, Jun Zhang, Yong Zhao, et al. (1 additional author not shown)

    Abstract: This paper describes a system that generates speaker-annotated transcripts of meetings by using a microphone array and a 360-degree camera. The hallmark of the system is its ability to handle overlapped speech, which has been an unsolved problem in realistic settings for over a decade. We show that this problem can be addressed by using a continuous speech separation approach. In addition, we desc…

    Submitted 10 December, 2019; originally announced December 2019.

    Comments: To appear in Proc. IEEE ASRU Workshop 2019