Showing 1–7 of 7 results for author: Flemotomos, N

  1. arXiv:2411.00664

    eess.AS cs.CL

    Optimizing Contextual Speech Recognition Using Vector Quantization for Efficient Retrieval

    Authors: Nikolaos Flemotomos, Roger Hsiao, Pawel Swietojanski, Takaaki Hori, Dogan Can, Xiaodan Zhuang

    Abstract: Neural contextual biasing allows speech recognition models to leverage contextually relevant information, leading to improved transcription accuracy. However, the biasing mechanism is typically based on a cross-attention module between the audio and a catalogue of biasing entries, which means computational complexity can pose severe practical limitations on the size of the biasing catalogue and co…

    Submitted 4 November, 2024; v1 submitted 1 November, 2024; originally announced November 2024.

    Comments: 13 pages, 7 figures, submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  2. arXiv:2204.00657

    eess.AS cs.SD

    Multimodal Clustering with Role Induced Constraints for Speaker Diarization

    Authors: Nikolaos Flemotomos, Shrikanth Narayanan

    Abstract: Speaker clustering is an essential step in conventional speaker diarization systems and is typically addressed as an audio-only speech processing task. The language used by the participants in a conversation, however, carries additional information that can help improve the clustering performance. This is especially true in conversational interactions, such as business meetings, interviews, and le…

    Submitted 11 July, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

    Comments: To appear at Interspeech 2022

  3. arXiv:2106.07922

    cs.CL eess.AS

    An Automated Quality Evaluation Framework of Psychotherapy Conversations with Local Quality Estimates

    Authors: Zhuohao Chen, Nikolaos Flemotomos, Karan Singla, Torrey A. Creed, David C. Atkins, Shrikanth Narayanan

    Abstract: Text-based computational approaches for assessing the quality of psychotherapy are being developed to support quality assurance and clinical training. However, due to the long durations of typical conversation based therapy sessions, and due to limited annotated modeling resources, computational methods largely rely on frequency-based lexical features or dialogue acts to assess the overall session…

    Submitted 20 March, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: Accepted by Computer Speech & Language

  4. arXiv:2102.11265

    eess.AS cs.CL cs.SD

    Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies

    Authors: Nikolaos Flemotomos, Victor R. Martinez, Zhuohao Chen, Karan Singla, Victor Ardulov, Raghuveer Peri, Derek D. Caperton, James Gibson, Michael J. Tanana, Panayiotis Georgiou, Jake Van Epps, Sarah P. Lord, Tad Hirsch, Zac E. Imel, David C. Atkins, Shrikanth Narayanan

    Abstract: With the growing prevalence of psychological interventions, it is vital to have measures which rate the effectiveness of psychological care to assist in training, supervision, and quality assurance of services. Traditionally, quality assessment is addressed by human raters who evaluate recorded sessions along specific dimensions, often codified through constructs relevant to the approach and domai…

    Submitted 27 March, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

    Comments: new version has an updated title

  5. arXiv:2005.07809

    eess.AS cs.CL cs.SD

    Feature Fusion Strategies for End-to-End Evaluation of Cognitive Behavior Therapy Sessions

    Authors: Zhuohao Chen, Nikolaos Flemotomos, Victor Ardulov, Torrey A. Creed, Zac E. Imel, David C. Atkins, Shrikanth Narayanan

    Abstract: Cognitive Behavioral Therapy (CBT) is a goal-oriented psychotherapy for mental health concerns implemented in a conversational setting with broad empirical support for its effectiveness across a range of presenting problems and client populations. The quality of a CBT session is typically assessed by trained human raters who manually assign pre-defined session-level behavioral codes. In this paper…

    Submitted 14 October, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

  6. A Memory Augmented Architecture for Continuous Speaker Identification in Meetings

    Authors: Nikolaos Flemotomos, Dimitrios Dimitriadis

    Abstract: We introduce and analyze a novel approach to the problem of speaker identification in multi-party recorded meetings. Given a speech segment and a set of available candidate profiles, we propose a novel data-driven way to model the distance relations between them, aiming at identifying the speaker label corresponding to that segment. To achieve this we employ a recurrent, memory-based architecture,…

    Submitted 14 January, 2020; originally announced January 2020.

    Comments: Submitted to ICASSP 2020

  7. Linguistically Aided Speaker Diarization Using Speaker Role Information

    Authors: Nikolaos Flemotomos, Panayiotis Georgiou, Shrikanth Narayanan

    Abstract: Speaker diarization relies on the assumption that speech segments corresponding to a particular speaker are concentrated in a specific region of the speaker space; a region which represents that speaker's identity. These identities are not known a priori, so a clustering algorithm is typically employed, which is traditionally based solely on audio. Under noisy conditions, however, such an approach…

    Submitted 5 February, 2020; v1 submitted 18 November, 2019; originally announced November 2019.

    Comments: from v1: restructured Introduction and Background, added experimental results with ASR text and language-only baseline