-
Towards Automatic Evaluation of Task-Oriented Dialogue Flows
Authors:
Mehrnoosh Mirtaheri,
Nikhil Varghese,
Chandra Khatri,
Amol Kelkar
Abstract:
Task-oriented dialogue systems rely on predefined conversation schemes (dialogue flows) often represented as directed acyclic graphs. These flows can be manually designed or automatically generated from previously recorded conversations. Due to variations in domain expertise or reliance on different sets of prior conversations, these dialogue flows can manifest in significantly different graph structures. Despite their importance, there is no standard method for evaluating the quality of dialogue flows. We introduce FuDGE (Fuzzy Dialogue-Graph Edit Distance), a novel metric that evaluates dialogue flows by assessing their structural complexity and representational coverage of the conversation data. FuDGE measures how well individual conversations align with a flow and, consequently, how well a set of conversations is represented by the flow overall. Through extensive experiments on manually configured flows and flows generated by automated techniques, we demonstrate the effectiveness of FuDGE and its evaluation framework. By standardizing and optimizing dialogue flows, FuDGE enables conversational designers and automated techniques to achieve higher levels of efficiency and automation.
Submitted 15 November, 2024;
originally announced November 2024.
-
KULCQ: An Unsupervised Keyword-based Utterance Level Clustering Quality Metric
Authors:
Pranav Guruprasad,
Negar Mokhberian,
Nikhil Varghese,
Chandra Khatri,
Amol Kelkar
Abstract:
Intent discovery is crucial for both building new conversational agents and improving existing ones. While several approaches have been proposed for intent discovery, most rely on clustering to group similar utterances together. Traditional evaluation of these utterance clusters requires intent labels for each utterance, limiting scalability. Although some clustering quality metrics exist that do not require labeled data, they focus solely on cluster geometry while ignoring the linguistic nuances present in conversational transcripts. In this paper, we introduce Keyword-based Utterance Level Clustering Quality (KULCQ), an unsupervised metric that leverages keyword analysis to evaluate clustering quality. We demonstrate KULCQ's effectiveness by comparing it with existing unsupervised clustering metrics and validate its performance through comprehensive ablation studies. Our results show that KULCQ better captures semantic relationships in conversational data while maintaining consistency with geometric clustering principles.
Submitted 14 November, 2024;
originally announced November 2024.
-
Constrained RS coding for Low Peak to Average Power Ratio in FBMC -- OQAM Systems
Authors:
Job Chunkath,
V. S. Sheeba,
Nisha Varghese
Abstract:
Multi-carrier modulation techniques have now become a standard in many communication protocols. Filter bank based multi-carrier (FBMC) generation techniques have been discussed in the literature as a means for overcoming the shortcomings of IFFT/FFT based OFDM systems. The Peak to Average Power Ratio (PAPR) is a problem faced by all multi-carrier techniques. This paper discusses the methods for reducing PAPR in an FBMC system while maintaining acceptable Bit Error Rate (BER). A new PAPR minimizing scheme called Constrained Reed Solomon (CRS) coding is proposed. Hybrid techniques using coding and companding are tested for different channel models and are found to yield promising results.
Submitted 30 June, 2023; v1 submitted 29 June, 2023;
originally announced June 2023.
-
IR Motion Deblurring
Authors:
Nisha Varghese,
Mahesh Mohan M. R.,
A. N. Rajagopalan
Abstract:
Camera gimbal systems are important in various air- or water-borne systems for applications such as navigation, target tracking, security and surveillance. A higher steering rate (rotation angle per second) of the gimbal is preferable for real-time applications since a given field-of-view (FOV) can be revisited within a short period of time. However, due to relative motion between the gimbal and scene during the exposure time, the captured video frames can suffer from motion blur. Since most post-capture applications require blur-free images, motion deblurring in real-time is an important need. Even though there exist blind deblurring methods which aim to retrieve latent images from blurry inputs, they are constrained by very high-dimensional optimization, thus incurring large execution times. On the other hand, deep learning methods for motion deblurring, though fast, do not generalize satisfactorily to different domains (e.g., air, water). In this work, we address the problem of real-time motion deblurring in infrared (IR) images captured by a gimbal-based system. We reveal how a priori knowledge of the blur-kernel can be used in conjunction with non-blind deblurring methods to achieve real-time performance. Importantly, our mathematical model can be leveraged to create large-scale datasets with realistic gimbal motion blur. Such datasets, which are a rarity, can be a valuable asset for contemporary deep learning methods. We show that, in comparison to the state-of-the-art techniques in deblurring, our method is better suited for practical gimbal-based imaging systems.
Submitted 23 November, 2021;
originally announced November 2021.
-
Can Commercial Testing Automation Tools Work for IoT? A Case Study of Selenium and Node-Red
Authors:
Neenu Varghese,
Roopak Sinha
Abstract:
Background: Testing IoT software is challenging due to large scale, volume of data and heterogeneity. Testing automation is a much-needed feature in the domain. Aims: The first goal of this research is to explore the requirements and challenges of IoT testing automation. The second goal is to integrate testing automation tools used in commercial software into the IoT context. Method: A systematic literature review is carried out to elicit requirements for testing automation in IoT. A design science approach is followed to build a testing automation tool for IoT applications written in the Node-Red platform, using the commercial testing automation tool Selenium. The resulting framework uses the Selenium Web Driver for browser-based testing automation for IoT applications. Results: The proposed framework has been functionally tested on multiple browsers with preliminary evaluation on maintainability, browser capability and comprehensiveness. Conclusions: The use of commercial tools for testing automation in IoT is feasible. However, major challenges like high data volumes and parallel transmission and processing of data need to be addressed comprehensively for complete integration.
Submitted 9 July, 2021;
originally announced July 2021.
-
Athena: Constructing Dialogues Dynamically with Discourse Constraints
Authors:
Vrindavan Harrison,
Juraj Juraska,
Wen Cui,
Lena Reed,
Kevin K. Bowden,
Jiaqi Wu,
Brian Schwarzmann,
Abteen Ebrahimi,
Rishi Rajasekaran,
Nikhil Varghese,
Max Wechsler-Azen,
Steve Whittaker,
Jeffrey Flanigan,
Marilyn Walker
Abstract:
This report describes Athena, a dialogue system for spoken conversation on popular topics and current events. We develop a flexible topic-agnostic approach to dialogue management that dynamically configures dialogue based on general principles of entity and topic coherence. Athena's dialogue manager uses a contract-based method where discourse constraints are dispatched to clusters of response generators. This allows Athena to procure responses from dynamic sources, such as knowledge graph traversals and feature-based on-the-fly response retrieval methods. After describing the dialogue system architecture, we perform an analysis of conversations that Athena participated in during the 2019 Alexa Prize Competition. We conclude with a report on several user studies we carried out to better understand how individual user characteristics affect system ratings.
Submitted 20 November, 2020;
originally announced November 2020.