doi 10.3765/plsa.v6i1.4971

Human-AI Interactions Through A Gricean Lens

Authors: Laura Panfili, Steve Duman, Andrew Nave, Katherine Phelps Ridgeway, Nathan Eversole, Ruhi Sarikaya

Abstract: Grice's Cooperative Principle (1975) describes the implicit maxims that guide conversation between humans. As humans begin to interact with non-human dialogue systems more frequently and in a broader scope, an important question emerges: what principles govern those interactions? The present study addresses this question by evaluating human-AI interactions using Grice's four maxims; we demonstrate that humans do, indeed, apply these maxims to interactions with AI, even making explicit references to the AI's performance through a Gricean lens. Twenty-three participants interacted with an American English-speaking Alexa and rated and discussed their experience with an in-lab researcher. Researchers then reviewed each exchange, identifying those that might relate to Grice's maxims: Quantity, Quality, Manner, and Relevance. Many instances of explicit user frustration stemmed from violations of Grice's maxims. Quantity violations were noted for too little but not too much information, while Quality violations were rare, indicating trust in Alexa's responses. Manner violations focused on speed and humanness. Relevance violations were the most frequent, and they appear to be the most frustrating. While the maxims help describe many of the issues participants encountered, other issues do not fit neatly into Grice's framework. Participants were particularly averse to Alexa initiating exchanges or making unsolicited suggestions. To address this gap, we propose the addition of human Priority to describe human-AI interaction. Humans and AIs are not conversational equals, and human initiative takes priority. We suggest that the application of Grice's Cooperative Principles to human-AI interactions is beneficial both from an AI development perspective and as a tool for describing an emerging form of interaction.

Submitted 16 June, 2021; originally announced June 2021.

Journal ref: Proceedings of the Linguistic Society of America 6 (2021) 288-302

arXiv:2106.02363

Learning Slice-Aware Representations with Mixture of Attentions

Authors: Cheng Wang, Sungjin Lee, Sunghyun Park, Han Li, Young-Bum Kim, Ruhi Sarikaya

Abstract: Real-world machine learning systems are achieving remarkable performance in terms of coarse-grained metrics like overall accuracy and F-1 score. However, model improvement and development often require fine-grained modeling on individual data subsets or slices, for instance, the data slices where the models have unsatisfactory results. In practice, there is tangible value in developing models that can pay extra attention to critical slices or slices of interest while retaining the original overall performance. This work extends the recent slice-based learning (SBL) (Chen et al., 2019) with a mixture of attentions (MoA) to learn slice-aware dual attentive representations. We empirically show that the MoA approach outperforms the baseline method as well as the original SBL approach on monitored slices with two natural language understanding (NLU) tasks.

Submitted 4 June, 2021; originally announced June 2021.

Comments: Findings of the ACL: ACL-IJCNLP 2021

arXiv:2104.13216

Handling Long-Tail Queries with Slice-Aware Conversational Systems

Authors: Cheng Wang, Sun Kim, Taiwoo Park, Sajal Choudhary, Sunghyun Park, Young-Bum Kim, Ruhi Sarikaya, Sungjin Lee

Abstract: We have been witnessing the usefulness of conversational AI systems such as Siri and Alexa, directly impacting our daily lives. These systems normally rely on machine learning models evolving over time to provide quality user experience. However, the development and improvement of the models are challenging because they need to support both high (head) and low (tail) usage scenarios, requiring fine-grained modeling strategies for specific data subsets or slices. In this paper, we explore the recent concept of slice-based learning (SBL) (Chen et al., 2019) to improve our baseline conversational skill routing system on the tail yet critical query traffic. We first define a set of labeling functions to generate weak supervision data for the tail intents. We then extend the baseline model towards a slice-aware architecture, which monitors and improves the model performance on the selected tail intents. Applied to de-identified live traffic from a commercial conversational AI system, our experiments show that the slice-aware model is beneficial in improving model performance for the tail intents while maintaining the overall performance.

Submitted 26 April, 2021; originally announced April 2021.

Comments: Published at ICLR 2021 Workshop on Weakly Supervised Learning

arXiv:2103.03373

Neural model robustness for skill routing in large-scale conversational AI systems: A design choice exploration

Authors: Han Li, Sunghyun Park, Aswarth Dara, Jinseok Nam, Sungjin Lee, Young-Bum Kim, Spyros Matsoukas, Ruhi Sarikaya

Abstract: Current state-of-the-art large-scale conversational AI or intelligent digital assistant systems in industry comprise a set of components such as Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU). For some of these systems that leverage a shared NLU ontology (e.g., a centralized intent/slot schema), there exists a separate skill routing component to correctly route a request to an appropriate skill, which is either a first-party or third-party application that actually executes on a user request. The skill routing component is needed as there are thousands of skills that can either subscribe to the same intent and/or subscribe to an intent under specific contextual conditions (e.g., device has a screen). Ensuring model robustness or resilience in the skill routing component is an important problem since skills may dynamically change their subscription in the ontology after the skill routing model has been deployed to production. We show how different modeling design choices impact the model robustness in the context of skill routing on a state-of-the-art commercial conversational AI system, specifically on the choices around data augmentation, model architecture, and optimization method. We show that applying data augmentation can be a very effective and practical way to drastically improve model robustness.

Submitted 4 March, 2021; originally announced March 2021.

arXiv:2010.12251

A scalable framework for learning from implicit user feedback to improve natural language understanding in large-scale conversational AI systems

Authors: Sunghyun Park, Han Li, Ameen Patel, Sidharth Mudgal, Sungjin Lee, Young-Bum Kim, Spyros Matsoukas, Ruhi Sarikaya

Abstract: Natural Language Understanding (NLU) is an established component within a conversational AI or digital assistant system, and it is responsible for producing semantic understanding of a user request. We propose a scalable and automatic approach for improving NLU in a large-scale conversational AI system by leveraging implicit user feedback, with an insight that user interaction data and dialog context have rich information embedded from which user satisfaction and intention can be inferred. In particular, we propose a general domain-agnostic framework for curating new supervision data for improving NLU from live production traffic. With an extensive set of experiments, we show the results of applying the framework and improving NLU for a large-scale production system and show its impact across 10 domains.

Submitted 10 September, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

Comments: EMNLP 2021

ACM Class: I.2.7; I.2.1

arXiv:2006.07113

Large-scale Hybrid Approach for Predicting User Satisfaction with Conversational Agents

Authors: Dookun Park, Hao Yuan, Dongmin Kim, Yinglei Zhang, Spyros Matsoukas, Young-Bum Kim, Ruhi Sarikaya, Edward Guo, Yuan Ling, Kevin Quinn, Pham Hung, Benjamin Yao, Sungjin Lee

Abstract: Measuring user satisfaction level is a challenging task, and a critical component in developing large-scale conversational agent systems serving the needs of real users. A widely used approach to tackle this is to collect human annotation data and use them for evaluation or modeling. Human annotation based approaches are easier to control, but hard to scale. A novel alternative approach is to collect user's direct feedback via a feedback elicitation system embedded in the conversational agent system, and use the collected user feedback to train a machine-learned model for generalization. User feedback is the best proxy for user satisfaction, but is not available for some ineligible intents and certain situations. Thus, these two types of approaches are complementary to each other. In this work, we tackle the user satisfaction assessment problem with a hybrid approach that fuses explicit user feedback with user satisfaction predictions inferred by two machine-learned models, one trained on user feedback data and the other on human annotation data. The hybrid approach is based on a waterfall policy, and the experimental results with Amazon Alexa's large-scale datasets show significant improvements in inferring user satisfaction. A detailed hybrid architecture, an in-depth analysis on user feedback data, and an algorithm that generates data sets to properly simulate the live traffic are presented in this paper.

Submitted 29 May, 2020; originally announced June 2020.

arXiv:1911.02557

Feedback-Based Self-Learning in Large-Scale Conversational AI Agents

Authors: Pragaash Ponnusamy, Alireza Roshan Ghias, Chenlei Guo, Ruhi Sarikaya

Abstract: Today, most large-scale conversational AI agents (e.g. Alexa, Siri, or Google Assistant) are built using manually annotated data to train the different components of the system. Typically, the accuracy of the ML models in these components is improved by manually transcribing and annotating data. As the scope of these systems increases to cover more scenarios and domains, manual annotation to improve the accuracy of these components becomes prohibitively costly and time consuming. In this paper, we propose a system that leverages user-system interaction feedback signals to automate learning without any manual annotation. Users here tend to modify a previous query in hopes of fixing an error in the previous turn to get the right results. These reformulations are often preceded by defective experiences caused by errors in ASR, NLU, ER or the application. In some cases, users may not properly formulate their requests (e.g. providing partial title of a song), but gleaning across a wider pool of users and sessions reveals the underlying recurrent patterns. Our proposed self-learning system automatically detects the errors, generates reformulations and deploys fixes to the runtime system to correct different types of errors occurring in different components of the system. In particular, we propose leveraging an absorbing Markov Chain model as a collaborative filtering mechanism in a novel attempt to mine these patterns. We show that our approach is highly scalable, and able to learn reformulations that reduce Alexa-user errors by pooling anonymized data across millions of customers. The proposed self-learning system achieves a win/loss ratio of 11.8 and effectively reduces the defect rate by more than 30% on utterance level reformulations in our production A/B tests. To the best of our knowledge, this is the first self-learning large-scale conversational AI system in production.

Submitted 6 November, 2019; originally announced November 2019.

Comments: 8 pages, 2 figures

arXiv:1905.00924

Locale-agnostic Universal Domain Classification Model in Spoken Language Understanding

Authors: Jihwan Lee, Ruhi Sarikaya, Young-Bum Kim

Abstract: In this paper, we introduce an approach for leveraging available data across multiple locales sharing the same language to 1) improve domain classification model accuracy in Spoken Language Understanding and user experience even if new locales do not have sufficient data and 2) reduce the cost of scaling the domain classifier to a large number of locales. We propose a locale-agnostic universal domain classification model based on selective multi-task learning that learns a joint representation of an utterance over locales with different sets of domains and allows locales to share knowledge selectively depending on the domains. The experimental results demonstrate the effectiveness of our approach on the domain classification task in the scenario of multiple locales with imbalanced data and disparate domain sets. The proposed approach outperforms other baseline models, especially when classifying locale-specific domains and low-resourced domains.

Submitted 2 May, 2019; originally announced May 2019.

Comments: NAACL-HLT 2019

arXiv:1905.00921

Continuous Learning for Large-scale Personalized Domain Classification

Authors: Han Li, Jihwan Lee, Sidharth Mudgal, Ruhi Sarikaya, Young-Bum Kim

Abstract: Domain classification is the task of mapping spoken language utterances to one of the natural language understanding domains in intelligent personal digital assistants (IPDAs). This is a major component in mainstream IPDAs in industry. Apart from official domains, thousands of third-party domains are also created by external developers to enhance the capability of IPDAs. As more domains are developed rapidly, the question of how to continuously accommodate the new domains still remains challenging. Moreover, existing continual learning approaches do not address the problem of incorporating personalized information dynamically for better domain classification. In this paper, we propose CoNDA, a neural network based approach for domain classification that supports incremental learning of new classes. Empirical evaluation shows that CoNDA achieves high accuracy and outperforms baselines by a large margin on both incrementally added new domains and existing domains.

Submitted 2 May, 2019; originally announced May 2019.

Comments: NAACL-HLT 2019

arXiv:1812.06083

Coupled Representation Learning for Domains, Intents and Slots in Spoken Language Understanding

Authors: Jihwan Lee, Dongchan Kim, Ruhi Sarikaya, Young-Bum Kim

Abstract: Representation learning is an essential problem in a wide range of applications and it is important for performing downstream tasks successfully. In this paper, we propose a new model that learns coupled representations of domains, intents, and slots by taking advantage of their hierarchical dependency in a Spoken Language Understanding system. Our proposed model learns the vector representation of intents based on the slots tied to these intents by aggregating the representations of the slots. Similarly, the vector representation of a domain is learned by aggregating the representations of the intents tied to a specific domain. To the best of our knowledge, it is the first approach to jointly learning the representations of domains, intents, and slots using their hierarchical relationships. The experimental results demonstrate the effectiveness of the representations learned by our model, as evidenced by improved performance on the contextual cross-domain reranking task.

Submitted 13 December, 2018; originally announced December 2018.

Comments: IEEE SLT 2018

arXiv:1810.12464

Differentiable Greedy Networks

Authors: Thomas Powers, Rasool Fakoor, Siamak Shakeri, Abhinav Sethy, Amanjit Kainth, Abdel-rahman Mohamed, Ruhi Sarikaya

Abstract: Optimal selection of a subset of items from a given set is a hard problem that requires combinatorial optimization. In this paper, we propose a subset selection algorithm that is trainable with gradient-based methods yet achieves near-optimal performance via submodular optimization. We focus on the task of identifying a relevant set of sentences for claim verification in the context of the FEVER task. Conventional methods for this task look at sentences on their individual merit and thus do not optimize the informativeness of sentences as a set. We show that our proposed method, which builds on the idea of unfolding a greedy algorithm into a computational graph, allows both interpretability and gradient-based training. The proposed differentiable greedy network (DGN) outperforms discrete optimization algorithms as well as other baseline methods in terms of precision and recall.

Submitted 29 October, 2018; originally announced October 2018.

Comments: Work in progress and under review

arXiv:1810.00679

Direct optimization of F-measure for retrieval-based personal question answering

Authors: Rasool Fakoor, Amanjit Kainth, Siamak Shakeri, Christopher Winestock, Abdel-rahman Mohamed, Ruhi Sarikaya

Abstract: Recent advances in spoken language technologies and the introduction of many customer-facing products have given rise to a wide customer reliance on smart personal assistants for many of their daily tasks. In this paper, we present a system to reduce users' cognitive load by extending personal assistants with long-term personal memory where users can store and retrieve, by voice, arbitrary pieces of information. The problem is framed as a neural retrieval based question answering system where answers are selected from previously stored user memories. We propose to directly optimize the end-to-end retrieval performance, measured by the F1-score, using reinforcement learning, leading to better performance on our experimental test set(s).

Submitted 27 September, 2018; originally announced October 2018.

Comments: accepted at SLT2018

arXiv:1806.01773

doi 10.21437/Interspeech.2018-1035

Contextual Slot Carryover for Disparate Schemas

Authors: Chetan Naik, Arpit Gupta, Hancheng Ge, Lambert Mathias, Ruhi Sarikaya

Abstract: In the slot-filling paradigm, where a user can refer back to slots in the context during a conversation, the goal of the contextual understanding system is to resolve the referring expressions to the appropriate slots in the context. In large-scale multi-domain systems, this presents two challenges - scaling to a very large and potentially unbounded set of slot values, and dealing with diverse schemas. We present a neural network architecture that addresses the slot value scalability challenge by reformulating the contextual interpretation as a decision to carry over a slot from a set of possible candidates. To deal with heterogeneous schemas, we introduce a simple data-driven method for transforming the candidate slots. Our experiments show that our approach can scale to multiple domains and provides competitive results over a strong baseline.

Submitted 5 June, 2018; originally announced June 2018.

Comments: Accepted at Interspeech 2018

arXiv:1804.08065

Efficient Large-Scale Domain Classification with Personalized Attention

Authors: Young-Bum Kim, Dongchan Kim, Anjishnu Kumar, Ruhi Sarikaya

Abstract: In this paper, we explore the task of mapping spoken language utterances to one of thousands of natural language understanding domains in intelligent personal digital assistants (IPDAs). This scenario is observed for many mainstream IPDAs in industry that allow third parties to develop thousands of new domains to augment built-in ones to rapidly increase domain coverage and overall IPDA capabilities. We propose a scalable neural model architecture with a shared encoder, a novel attention mechanism that incorporates personalization information and domain-specific classifiers that solves the problem efficiently. Our architecture is designed to efficiently accommodate new domains that appear in-between full model retraining cycles with a rapid bootstrapping mechanism two orders of magnitude faster than retraining. We account for practical constraints in real-time production systems, and design to minimize memory footprint and runtime latency. We demonstrate that incorporating personalization results in significantly more accurate domain classification in the setting with thousands of overlapping domains.

Submitted 22 April, 2018; originally announced April 2018.

Comments: Accepted to ACL 2018

arXiv:1804.08064

A Scalable Neural Shortlisting-Reranking Approach for Large-Scale Domain Classification in Natural Language Understanding

Authors: Young-Bum Kim, Dongchan Kim, Joo-Kyung Kim, Ruhi Sarikaya

Abstract: Intelligent personal digital assistants (IPDAs), a popular real-life application with spoken language understanding capabilities, can cover potentially thousands of overlapping domains for natural language understanding, and the task of finding the best domain to handle an utterance becomes a challenging problem on a large scale. In this paper, we propose a set of efficient and scalable neural shortlisting-reranking models for large-scale domain classification in IPDAs. The shortlisting stage focuses on efficiently trimming all domains down to a list of k-best candidate domains, and the reranking stage performs a list-wise reranking of the initial k-best domains with additional contextual information. We show the effectiveness of our approach with extensive experiments on 1,500 IPDA domains.

Submitted 21 April, 2018; originally announced April 2018.

Comments: Accepted to NAACL 2018

arXiv:1711.10705

Speaker-Sensitive Dual Memory Networks for Multi-Turn Slot Tagging

Authors: Young-Bum Kim, Sungjin Lee, Ruhi Sarikaya

Abstract: In multi-turn dialogs, natural language understanding models can introduce obvious errors by being blind to contextual information. To incorporate dialog history, we present a neural architecture with Speaker-Sensitive Dual Memory Networks which encode utterances differently depending on the speaker. This addresses the different extents of information available to the system - the system knows only the surface form of user utterances while it has the exact semantics of system output. We performed experiments on real user data from Microsoft Cortana, a commercial personal assistant. The result showed a significant performance improvement over the state-of-the-art slot tagging models using contextual information.

Submitted 29 November, 2017; originally announced November 2017.

Comments: 5 pages conference paper accepted to IEEE ASRU 2017. Will be published in December 2017

Showing 1–16 of 16 results for author: Sarikaya, R