
Showing 1–17 of 17 results for author: Kuznetsova, A

Searching in archive cs.
  1. arXiv:2507.03578  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications

    Authors: Yana Hasson, Pauline Luc, Liliane Momeni, Maks Ovsjanikov, Guillaume Le Moing, Alina Kuznetsova, Ira Ktena, Jennifer J. Sun, Skanda Koppula, Dilara Gokay, Joseph Heyward, Etienne Pot, Andrew Zisserman

    Abstract: In recent years, there has been a proliferation of spatiotemporal foundation models in different scientific disciplines. While promising, these models are often domain-specific and are only assessed within the particular applications for which they are designed. Given that many tasks can be represented as video modeling problems, video foundation models (ViFMs) hold considerable promise as general…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: ICCV 2025, GitHub repo: https://github.com/google-deepmind/scivid

  2. arXiv:2506.10274  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    Discrete Audio Tokens: More Than a Survey!

    Authors: Pooneh Mousavi, Gallil Maimon, Adel Moumen, Darius Petermann, Jiatong Shi, Haibin Wu, Haici Yang, Anastasia Kuznetsova, Artem Ploujnikov, Ricard Marxer, Bhuvana Ramabhadran, Benjamin Elizalde, Loren Lugosch, Jinyu Li, Cem Subakan, Phil Woodland, Minje Kim, Hung-yi Lee, Shinji Watanabe, Yossi Adi, Mirco Ravanelli

    Abstract: Discrete audio tokens are compact representations that aim to preserve perceptual quality, phonetic content, and speaker characteristics while enabling efficient storage and inference, as well as competitive performance across diverse downstream tasks. They provide a practical alternative to continuous features, enabling the integration of speech and audio into modern large language models (LLMs).…

    Submitted 16 June, 2025; v1 submitted 11 June, 2025; originally announced June 2025.

  3. arXiv:2501.13372  [pdf, other]

    eess.AS cs.AI

    Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement

    Authors: Jae-Sung Bae, Anastasia Kuznetsova, Dinesh Manocha, John Hershey, Trausti Kristjansson, Minje Kim

    Abstract: This paper presents a new challenge that calls for zero-shot text-to-speech (TTS) systems to augment speech data for the downstream task, personalized speech enhancement (PSE), as part of the Generative Data Augmentation workshop at ICASSP 2025. Collecting high-quality personalized data is challenging due to privacy concerns and technical difficulties in recording audio from the test scene. To add…

    Submitted 22 January, 2025; originally announced January 2025.

    Comments: Accepted to ICASSP 2025 Satellite Workshop: Generative Data Augmentation for Real-World Signal Processing Applications

  4. arXiv:2403.10367  [pdf, other]

    cs.CV

    Testing MediaPipe Holistic for Linguistic Analysis of Nonmanual Markers in Sign Languages

    Authors: Anna Kuznetsova, Vadim Kimmelman

    Abstract: Advances in Deep Learning have made possible reliable landmark tracking of human bodies and faces that can be used for a variety of tasks. We test a recent Computer Vision solution, MediaPipe Holistic (MPH), to find out if its tracking of the facial features is reliable enough for a linguistic analysis of data from sign languages, and compare it to an older solution (OpenFace, OF). We use an exist…

    Submitted 25 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  5. arXiv:2312.08255  [pdf, other]

    eess.IV cs.CV cs.LG

    OCTDL: Optical Coherence Tomography Dataset for Image-Based Deep Learning Methods

    Authors: Mikhail Kulyabin, Aleksei Zhdanov, Anastasia Nikiforova, Andrey Stepichev, Anna Kuznetsova, Mikhail Ronkin, Vasilii Borisov, Alexander Bogachev, Sergey Korotkich, Paul A Constable, Andreas Maier

    Abstract: Optical coherence tomography (OCT) is a non-invasive imaging technique with extensive clinical applications in ophthalmology. OCT enables the visualization of the retinal layers, playing a vital role in the early detection and monitoring of retinal diseases. OCT uses the principle of light wave interference to create detailed images of the retinal microstructures, making it a valuable tool for dia…

    Submitted 1 October, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Journal ref: Scientific Data 11.1 (2024): 365

  6. arXiv:2212.11920  [pdf, other]

    cs.CV

    Beyond SOT: Tracking Multiple Generic Objects at Once

    Authors: Christoph Mayer, Martin Danelljan, Ming-Hsuan Yang, Vittorio Ferrari, Luc Van Gool, Alina Kuznetsova

    Abstract: Generic Object Tracking (GOT) is the problem of tracking target objects, specified by bounding boxes in the first frame of a video. While the task has received much attention in the last decades, researchers have almost exclusively focused on the single object setting. Multi-object GOT benefits from a wider applicability, rendering it more attractive in real-world applications. We attribute the la…

    Submitted 25 February, 2024; v1 submitted 22 December, 2022; originally announced December 2022.

    Comments: accepted by WACV'24

  7. arXiv:2211.07493  [pdf, ps, other]

    eess.AS cs.SD

    The Potential of Neural Speech Synthesis-based Data Augmentation for Personalized Speech Enhancement

    Authors: Anastasia Kuznetsova, Aswin Sivaraman, Minje Kim

    Abstract: With the advances in deep learning, speech enhancement systems benefited from large neural network architectures and achieved state-of-the-art quality. However, speaker-agnostic methods are not always desirable, both in terms of quality and their complexity, when they are to be used in a resource-constrained environment. One promising way is personalized speech enhancement (PSE), which is a smalle…

    Submitted 14 November, 2022; originally announced November 2022.

  8. arXiv:2210.09611  [pdf, other]

    stat.AP cs.DL cs.IR

    Relationships between patenting trends and research activity for green energy technologies

    Authors: Regina Tuganova, Anna Permyakova, Anna Kuznetsova, Karina Rakhmanova, Natalia Monzul, Roman Uvarov, Elizaveta Kovtun, Semen Budennyy

    Abstract: Green technology is viewed as a means of creating a sustainable society and a catalyst for sustainable development by the global community. It is responsible for both the potential reduction of production waste and the reduction of carbon footprint and CO2 emissions. However, alongside with the growing popularity of green technologies, there is an emerging skepticism about their contribution to so…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: 11 pages, 3 figures

  9. arXiv:2202.08883  [pdf, other]

    eess.AS cs.LG cs.SD

    Curriculum optimization for low-resource speech recognition

    Authors: Anastasia Kuznetsova, Anurag Kumar, Jennifer Drexler Fox, Francis Tyers

    Abstract: Modern end-to-end speech recognition models show astonishing results in transcribing audio signals into written text. However, conventional data feeding pipelines may be sub-optimal for low-resource speech recognition, which still remains a challenging task. We propose an automated curriculum learning approach to optimize the sequence of training examples based on both the progress of the model wh…

    Submitted 17 February, 2022; originally announced February 2022.

  10. arXiv:2201.04020  [pdf, other]

    cs.HC stat.CO

    ConsumerCheck: A Software for Analysis of Sensory and Consumer Data

    Authors: Oliver Tomic, Alexandra Kuznetsova, Per Bruun Brockhoff, Thomas Graff, Tormod Næs

    Abstract: ConsumerCheck is an open source data analysis software tailored for analysis of sensory and consumer data. Since some of the implemented methods are generic, such as PCA, PLSR and PCR, other data from other domains may also be analysed with ConsumerCheck. The software comes with a graphical user interface and as such provides non-statisticians and users without programming skills free access to a…

    Submitted 11 January, 2022; originally announced January 2022.

    Comments: 37 pages including references, 41 figures

  11. arXiv:2103.13318  [pdf, other]

    cs.CV

    Factors of Influence for Transfer Learning across Diverse Appearance Domains and Task Types

    Authors: Thomas Mensink, Jasper Uijlings, Alina Kuznetsova, Michael Gygli, Vittorio Ferrari

    Abstract: Transfer learning enables to re-use knowledge learned on a source task to help learning a target task. A simple form of transfer learning is common in current state-of-the-art computer vision models, i.e. pre-training a model for image classification on the ILSVRC dataset, and then fine-tune on any target task. However, previous systematic studies of transfer learning have been limited and the cir…

    Submitted 20 November, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: Accepted for future publication in TPAMI

  12. arXiv:2102.03662  [pdf, other]

    cs.CL cs.SD eess.AS

    A bandit approach to curriculum generation for automatic speech recognition

    Authors: Anastasia Kuznetsova, Anurag Kumar, Francis M. Tyers

    Abstract: The Automated Speech Recognition (ASR) task has been a challenging domain especially for low data scenarios with few audio examples. This is the main problem in training ASR systems on the data from low-resource or marginalized languages. In this paper we present an approach to mitigate the lack of training data by employing Automated Curriculum Learning in combination with an adversarial bandit a…

    Submitted 6 February, 2021; originally announced February 2021.

  13. arXiv:2012.12554  [pdf, other]

    cs.CV cs.HC

    Efficient video annotation with visual interpolation and frame selection guidance

    Authors: A. Kuznetsova, A. Talati, Y. Luo, K. Simmons, V. Ferrari

    Abstract: We introduce a unified framework for generic video annotation with bounding boxes. Video annotation is a longstanding problem, as it is a tedious and time-consuming process. We tackle two important challenges of video annotation: (1) automatic temporal interpolation and extrapolation of bounding boxes provided by a human annotator on a subset of all frames, and (2) automatic selection of frames to…

    Submitted 23 December, 2020; originally announced December 2020.

    Comments: accepted to WACV 2021

  14. arXiv:1901.04017  [pdf, other]

    cs.CV cs.CR cs.NI

    A Machine-Synesthetic Approach To DDoS Network Attack Detection

    Authors: Yuri Monakhov, Oleg Nikitin, Anna Kuznetsova, Alexey Kharlamov, Alexandr Amochkin

    Abstract: In the authors' opinion, anomaly detection systems, or ADS, seem to be the most perspective direction in the subject of attack detection, because these systems can detect, among others, the unknown (zero-day) attacks. To detect anomalies, the authors propose to use machine synesthesia. In this case, machine synesthesia is understood as an interface that allows using image classification algorithms…

    Submitted 22 March, 2019; v1 submitted 13 January, 2019; originally announced January 2019.

    Comments: 12 pages, 2 figures, 5 tables. Accepted to the Intelligent Systems Conference (IntelliSys) 2019

    MSC Class: 62H30; 62H35; 94C99; 15A04 ACM Class: C.2.m; I.5.0; I.2.6

  15. The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale

    Authors: Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Alexander Kolesnikov, Tom Duerig, Vittorio Ferrari

    Abstract: We present Open Images V4, a dataset of 9.2M images with unified annotations for image classification, object detection and visual relationship detection. The images have a Creative Commons Attribution license that allows to share and adapt the material, and they have been collected from Flickr without a predefined list of class names or tags, leading to natural class statistics and avoiding an in…

    Submitted 21 February, 2020; v1 submitted 2 November, 2018; originally announced November 2018.

    Comments: Accepted to International Journal of Computer Vision, 2020

  16. arXiv:1810.04470  [pdf, other]

    cs.NI eess.SY

    Analysis Of Congestion Control In Data Channels With Frequent Frame Loss

    Authors: Yuri Monakhov, Anna Kuznetsova

    Abstract: Development of optimal control procedures for congested networks is a key factor in maintaining efficient network utilization. The absence of congestion control mechanism or its failure can lead to the lack of availability for certain network segments, and in severe cases -- for the entire network. The paper presents an analytical model describing the operation of the TCP Reno congestion control a…

    Submitted 10 October, 2018; originally announced October 2018.

    Comments: 5 pages, 4 figures, 2nd European Conference on Electrical Engineering & Computer Science EECS 2018: Bern, Switzerland, December 20-22, 2018

    MSC Class: 94C99; 68M12; 68M20; 90B25 ACM Class: C.2.2; C.2.3; C.4

  17. arXiv:1807.02136  [pdf, other]

    cs.CV

    Detecting Visual Relationships Using Box Attention

    Authors: Alexander Kolesnikov, Alina Kuznetsova, Christoph H. Lampert, Vittorio Ferrari

    Abstract: We propose a new model for detecting visual relationships, such as "person riding motorcycle" or "bottle on table". This task is an important step towards comprehensive structured image understanding, going beyond detecting individual objects. Our main novelty is a Box Attention mechanism that allows to model pairwise interactions between objects using standard object detection pipelines. The resu…

    Submitted 2 May, 2019; v1 submitted 5 July, 2018; originally announced July 2018.