-
Goggin's corrected Kalman Filter: Guarantees and Filtering Regimes
Authors:
Imon Banerjee,
Itai Gurvich
Abstract:
In this paper we revisit a non-linear filter for {\em non-Gaussian} noises that was introduced in [1]. Goggin proved that transforming the observations by the score function and then applying the Kalman Filter (KF) to the transformed observations results in an asymptotically optimal filter. In the current paper, we study the convergence rate of Goggin's filter in a pre-limit setting that allows us to study a range of signal-to-noise regimes which includes, as a special case, Goggin's setting. Our guarantees are explicit in the level of observation noise, and unlike most other works in filtering, we do not assume Gaussianity of the noises.
Our proofs build on combining simple tools from two separate literature streams. One is a general posterior Cramér-Rao lower bound for filtering. The other is convergence-rate bounds in the Fisher information central limit theorem.
Along the way, we also study filtering regimes for linear state-space models, characterizing clearly degenerate regimes -- where trivial filters are nearly optimal -- and a {\em balanced} regime, which is where Goggin's filter has the most value. \footnote{This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.}
Submitted 19 February, 2025;
originally announced February 2025.
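The abstract above describes the filter as two steps: transform each observation by the score function of the noise, then run a Kalman Filter on the transformed observations. A minimal scalar sketch of that idea follows. It is not taken from the paper: the Student-t noise, the model `x[t+1] = a*x[t] + w[t]`, `y[t] = c*x[t] + v[t]`, the function names, and the first-order linearization (effective gain and noise variance both set from the Fisher information of the noise) are all assumptions made for illustration, and the linearization is only reasonable when the signal is small relative to the noise.

```python
import numpy as np

def student_t_score(v, nu):
    """Score -d/dv log p(v) of a standard Student-t density with nu degrees of freedom."""
    return (nu + 1.0) * v / (nu + v * v)

def score_kalman_filter(y, a, c, q, nu, x0=0.0, p0=1.0):
    """Scalar Kalman filter applied to score-transformed observations (illustrative sketch).

    Assumed model: x[t+1] = a*x[t] + w[t], Var(w) = q; y[t] = c*x[t] + v[t], v ~ Student-t(nu).
    First-order assumption: score(y) ~ fisher*c*x + noise with variance fisher.
    """
    fisher = (nu + 1.0) / (nu + 3.0)  # Fisher information of the Student-t location family
    h = fisher * c                    # effective observation gain after the transform
    r = fisher                        # variance of the score evaluated at the noise
    x, p = x0, p0
    estimates = np.empty(len(y))
    for t, obs in enumerate(y):
        # Predict step of the Kalman recursion.
        x, p = a * x, a * a * p + q
        # Transform the raw observation by the score, then do a standard update.
        z = student_t_score(obs, nu)
        gain = p * h / (h * h * p + r)
        x = x + gain * (z - h * x)
        p = (1.0 - gain * h) * p
        estimates[t] = x
    return estimates
```

Because the Student-t score is bounded, a single large outlier in `y` moves the estimate far less than it would in a Kalman Filter run on the raw observations, which is the intuition behind using the transform for heavy-tailed noise.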
-
NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription
Authors:
Alon Vinnikov,
Amir Ivry,
Aviv Hurvitz,
Igor Abramovski,
Sharon Koubi,
Ilya Gurvich,
Shai Pe`er,
Xiong Xiao,
Benjamin Martinez Elizalde,
Naoyuki Kanda,
Xiaofei Wang,
Shalev Shaer,
Stav Yagev,
Yossi Asher,
Sunit Sivasankaran,
Yifan Gong,
Min Tang,
Huaming Wang,
Eyal Krupka
Abstract:
We introduce the first Natural Office Talkers in Settings of Far-field Audio Recordings (``NOTSOFAR-1'') Challenge alongside datasets and a baseline system. The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios, with single-channel and known-geometry multi-channel tracks, and serves as a launch platform for two new datasets. The first is a benchmarking dataset of 315 meetings, averaging 6 minutes each, capturing a broad spectrum of real-world acoustic conditions and conversational dynamics. It is recorded across 30 conference rooms, featuring 4-8 attendees and a total of 35 unique speakers. The second is a 1000-hour simulated training dataset, synthesized with enhanced authenticity for real-world generalization, incorporating 15,000 real acoustic transfer functions. The tasks focus on single-device DASR, where multi-channel devices always share the same known geometry. This is aligned with common setups in actual conference rooms, and avoids technical complexities associated with multi-device tasks. It also allows for the development of geometry-specific solutions. The NOTSOFAR-1 Challenge aims to advance research in the field of distant conversational speech recognition, providing key resources to unlock the potential of data-driven methods, which we believe are currently constrained by the absence of comprehensive high-quality training and benchmarking datasets.
Submitted 16 January, 2024;
originally announced January 2024.
-
A Real-Time Active Speaker Detection System Integrating an Audio-Visual Signal with a Spatial Querying Mechanism
Authors:
Ilya Gurvich,
Ido Leichter,
Dharmendar Reddy Palle,
Yossi Asher,
Alon Vinnikov,
Igor Abramovski,
Vishak Gopal,
Ross Cutler,
Eyal Krupka
Abstract:
We introduce a distinctive real-time, causal, neural network-based active speaker detection system optimized for low-power edge computing. This system drives a virtual cinematography module and is deployed on a commercial device. The system uses data originating from a microphone array and a 360-degree camera. Our network requires only 127 MFLOPs per participant, for a meeting with 14 participants. Unlike previous work, we examine the error rate of our network when the computational budget is exhausted, and find that it exhibits graceful degradation, allowing the system to operate reasonably well even in this case. Departing from conventional DOA estimation approaches, our network learns to query the available acoustic data, considering the detected head locations. We train and evaluate our algorithm on a realistic meetings dataset featuring up to 14 participants in the same meeting, overlapped speech, and other challenging scenarios.
Submitted 15 September, 2023;
originally announced September 2023.
-
A Low-rank Approximation for MDPs via Moment Coupling
Authors:
Amy B. Z. Zhang,
Itai Gurvich
Abstract:
We introduce a framework to approximate a Markov Decision Process that stands on two pillars: state aggregation -- as the algorithmic infrastructure; and central-limit-theorem-type approximations -- as the mathematical underpinning of optimality guarantees. The theory is grounded in the recent work of Braverman et al. (2020), which relates the solution of the Bellman equation to that of a PDE where, in the spirit of the central limit theorem, the transition matrix is reduced to its local first and second moments. Solving the PDE is $\textit{not}$ required by our method. Instead, we construct a "sister" (controlled) Markov chain whose two local transition moments are approximately identical with those of the focal chain. Because of this $\textit{moment matching}$, the original chain and its "sister" are coupled through the PDE, a coupling that facilitates optimality guarantees. Embedded into standard soft aggregation algorithms, moment matching provides a disciplined mechanism to tune the aggregation and disaggregation probabilities. The computational gains arise from the reduction of the effective state space from $N$ to $N^{\frac{1}{2}+ε}$, as one might intuitively expect from approximations grounded in the central limit theorem.
Submitted 9 April, 2021; v1 submitted 18 September, 2020;
originally announced September 2020.
-
Advances in Online Audio-Visual Meeting Transcription
Authors:
Takuya Yoshioka,
Igor Abramovski,
Cem Aksoylar,
Zhuo Chen,
Moshe David,
Dimitrios Dimitriadis,
Yifan Gong,
Ilya Gurvich,
Xuedong Huang,
Yan Huang,
Aviv Hurvitz,
Li Jiang,
Sharon Koubi,
Eyal Krupka,
Ido Leichter,
Changliang Liu,
Partha Parthasarathy,
Alon Vinnikov,
Lingfeng Wu,
Xiong Xiao,
Wayne Xiong,
Huaming Wang,
Zhenghao Wang,
Jun Zhang,
Yong Zhao
, et al. (1 additional author not shown)
Abstract:
This paper describes a system that generates speaker-annotated transcripts of meetings by using a microphone array and a 360-degree camera. The hallmark of the system is its ability to handle overlapped speech, which has been an unsolved problem in realistic settings for over a decade. We show that this problem can be addressed by using a continuous speech separation approach. In addition, we describe an online audio-visual speaker diarization method that leverages face tracking and identification, sound source localization, speaker identification, and, if available, prior speaker information for robustness to various real-world challenges. All components are integrated in a meeting transcription framework called SRD, which stands for "separate, recognize, and diarize". Experimental results using recordings of natural meetings involving up to 11 attendees are reported. The continuous speech separation improves the word error rate (WER) by 16.1% compared with a highly tuned beamformer. When a complete list of meeting attendees is available, the discrepancy between WER and speaker-attributed WER is only 1.0%, indicating accurate word-to-speaker association. This increases marginally to 1.6% when 50% of the attendees are unknown to the system.
Submitted 10 December, 2019;
originally announced December 2019.
-
Online Allocation and Pricing: Constant Regret via Bellman Inequalities
Authors:
Alberto Vera,
Siddhartha Banerjee,
Itai Gurvich
Abstract:
We develop a framework for designing simple and efficient policies for a family of online allocation and pricing problems that includes online packing, budget-constrained probing, dynamic pricing, and online contextual bandits with knapsacks. In each case, we evaluate the performance of our policies in terms of their regret (i.e., additive gap) relative to an offline controller that is endowed with more information than the online controller. Our framework is based on Bellman Inequalities, which decompose the loss of an algorithm into two distinct sources of error: (1) arising from computational tractability issues, and (2) arising from estimation/prediction of random trajectories. Balancing these errors guides the choice of benchmarks, and leads to policies that are both tractable and have strong performance guarantees. In particular, in all our examples, we demonstrate constant-regret policies that only require re-solving an LP in each period, followed by a simple greedy action-selection rule; thus, our policies are practical as well as provably near optimal.
Submitted 30 July, 2020; v1 submitted 14 June, 2019;
originally announced June 2019.
-
Uniformly bounded regret in the multi-secretary problem
Authors:
Alessandro Arlotto,
Itai Gurvich
Abstract:
In the secretary problem of Cayley (1875) and Moser (1956), $n$ non-negative, independent, random variables with common distribution are sequentially presented to a decision maker who decides when to stop and collect the most recent realization. The goal is to maximize the expected value of the collected element. In the $k$-choice variant, the decision maker is allowed to make $k \leq n$ selections to maximize the expected total value of the selected elements. Assuming that the values are drawn from a known distribution with finite support, we prove that the best regret---the expected gap between the optimal online policy and its offline counterpart in which all $n$ values are made visible at time $0$---is uniformly bounded in the number of candidates $n$ and the budget $k$. Our proof is constructive: we develop an adaptive Budget-Ratio policy that achieves this performance. The policy selects or skips values depending on where the ratio of the residual budget to the remaining time stands relative to multiple thresholds that correspond to middle points of the distribution. We also prove that being adaptive is crucial: in general, the minimal regret among non-adaptive policies grows like the square root of $n$. The difference is the value of adaptiveness.
Submitted 1 June, 2018; v1 submitted 20 October, 2017;
originally announced October 2017.
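The policy described in the abstract above compares the ratio of residual budget to remaining time against thresholds tied to the value distribution. The sketch below is one simplified reading of that description, not the paper's exact policy: the threshold for each support value is assumed to be the midpoint between the probability of drawing a strictly larger value and the probability of drawing a value at least as large. The function names and the offline benchmark helper are hypothetical.

```python
import numpy as np

def budget_ratio_policy(values, support, probs, k):
    """Select up to k values online using a ratio-versus-midpoint threshold rule (assumed form)."""
    order = np.argsort(support)[::-1]                # sort support from largest to smallest
    support = np.asarray(support, dtype=float)[order]
    probs = np.asarray(probs, dtype=float)[order]
    upper = np.cumsum(probs)                         # P(X >= support[j])
    lower = upper - probs                            # P(X >  support[j])
    midpoints = (lower + upper) / 2.0                # assumed threshold for each support value
    threshold = dict(zip(support.tolist(), midpoints.tolist()))
    budget, total, n = k, 0.0, len(values)
    for t, v in enumerate(values):
        if budget == 0:
            break
        ratio = budget / (n - t)                     # residual budget over remaining time
        if ratio >= threshold[float(v)]:
            total += float(v)
            budget -= 1
    return total

def offline_value(values, k):
    """Offline benchmark: sum of the k largest values, with all values visible up front."""
    return float(np.sort(values)[::-1][:k].sum())
```

The gap `offline_value(values, k) - budget_ratio_policy(values, support, probs, k)` on a sampled sequence is one realization of the regret that the abstract bounds in expectation; averaging it over many sampled sequences gives a simple empirical check of how the gap behaves as $n$ and $k$ grow.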