-
Sentence-level Aggregation of Lexical Metrics Correlates Stronger with Human Judgements than Corpus-level Aggregation
Authors:
Paulo Cavalin,
Pedro Henrique Domingues,
Claudio Pinhanez
Abstract:
In this paper we show that corpus-level aggregation considerably hinders the capability of lexical metrics to accurately evaluate machine translation (MT) systems. With empirical experiments we demonstrate that averaging individual segment-level scores can make metrics such as BLEU and chrF correlate much more strongly with human judgements and behave considerably more like neural metrics such as COMET and BLEURT. We show that this difference exists because corpus- and segment-level aggregation differ considerably, owing to the classical mathematical problem of the average of ratios versus the ratio of averages. Moreover, as we also show, this difference considerably affects the statistical robustness of corpus-level aggregation. Considering that neural metrics currently cover only a small set of sufficiently resourced languages, the results in this paper can help make the evaluation of MT systems for low-resource languages more trustworthy.
Submitted 23 January, 2025; v1 submitted 3 July, 2024;
originally announced July 2024.
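The average-of-ratios versus ratio-of-averages issue named in the abstract above can be illustrated with a toy example. The sketch below is not BLEU or chrF and is not the authors' code; it assumes a hypothetical precision-like score (matched tokens divided by hypothesis length) and made-up segment counts, only to show that the two aggregation orders give different numbers.

```python
from typing import List, Tuple

# Each segment is (matched_tokens, hypothesis_length).
# Both the score definition and the counts are hypothetical.
Segment = Tuple[int, int]


def average_of_ratios(segments: List[Segment]) -> float:
    """Segment-level aggregation: score each segment, then average the scores."""
    return sum(matched / length for matched, length in segments) / len(segments)


def ratio_of_averages(segments: List[Segment]) -> float:
    """Corpus-level aggregation: pool the counts first, then take one ratio."""
    total_matched = sum(matched for matched, _ in segments)
    total_length = sum(length for _, length in segments)
    return total_matched / total_length


# A short, poorly translated segment and a long, well translated one.
segments = [(1, 2), (9, 10)]

print(f"{average_of_ratios(segments):.4f}")  # 0.7000
print(f"{ratio_of_averages(segments):.4f}")  # 0.8333
```

In this toy case the pooled ratio is dominated by the longer segment, while the segment-level average weights both segments equally.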
-
Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences
Authors:
Claudio Pinhanez,
Paulo Cavalin,
Luciana Storto,
Thomas Finbow,
Alexander Cobbinah,
Julio Nogima,
Marisa Vasconcelos,
Pedro Domingues,
Priscila de Souza Mizukami,
Nicole Grell,
Majoí Gongora,
Isabel Gonçalves
Abstract:
Since 2022 we have been exploring application areas and technologies in which Artificial Intelligence (AI) and modern Natural Language Processing (NLP), such as Large Language Models (LLMs), can be employed to foster the usage and facilitate the documentation of Indigenous languages that are in danger of disappearing. We start by discussing the decreasing diversity of languages in the world and how working with Indigenous languages poses unique ethical challenges for AI and NLP. To address those challenges, we propose an alternative AI development cycle based on community engagement and usage. Then, we report encouraging results in the development of high-quality machine learning translators for Indigenous languages by fine-tuning state-of-the-art (SOTA) translators with tiny amounts of data, and discuss how to avoid some common pitfalls in the process. We also present prototypes we have built in projects done in 2023 and 2024 with Indigenous communities in Brazil, aimed at facilitating writing, and discuss the development of Indigenous Language Models (ILMs) as a replicable and scalable way to create spell-checkers, next-word predictors, and similar tools. Finally, we discuss how we envision a future for language documentation where dying languages are preserved as interactive language models.
Submitted 29 July, 2024; v1 submitted 17 July, 2024;
originally announced July 2024.
-
Creating an African American-Sounding TTS: Guidelines, Technical Challenges, and Surprising Evaluations
Authors:
Claudio Pinhanez,
Raul Fernandez,
Marcelo Grave,
Julio Nogima,
Ron Hoory
Abstract:
Representations of AI agents in user interfaces and robotics are predominantly White, not only in terms of facial and skin features, but also in the synthetic voices they use. In this paper we explore some unexpected challenges in the representation of race that we found in the process of developing a U.S. English Text-to-Speech (TTS) system aimed to sound like an educated, professional, regional accent-free African American woman. The paper starts by presenting the results of focus groups with African American IT professionals where guidelines and challenges for the creation of a representative and appropriate TTS system were discussed and gathered, followed by a discussion of some of the technical difficulties faced by the TTS system developers. We then describe two studies with U.S. English speakers where the participants were not able to attribute the correct race to the African American TTS voice while overwhelmingly correctly recognizing the race of a White TTS system of similar quality. A focus group with African American IT workers not only confirmed the representativeness of the African American voice we built, but also suggested that the surprising recognition results may have been caused by the inability, or the latent prejudice, of non-African Americans to associate educated, non-vernacular, professional-sounding voices with African American people.
Submitted 17 March, 2024;
originally announced March 2024.
-
Exploring the Advantages of Dense-Vector to One-Hot Encoding of Intent Classes in Out-of-Scope Detection Tasks
Authors:
Claudio Pinhanez,
Paulo Cavalin
Abstract:
This work explores the intrinsic limitations of the popular one-hot encoding method in the classification of intents when detection of out-of-scope (OOS) inputs is required. Although recent work has shown that there can be significant improvements in OOS detection when the intent classes are represented as dense vectors based on domain-specific knowledge, we argue in this paper that such gains are more likely due to advantages of dense-vector over one-hot encoding methods in representing the complexity of the OOS space. We start by showing how dense-vector encodings can create OOS spaces with much richer topologies than one-hot encoding methods. We then demonstrate empirically, using four standard intent classification datasets, that knowledge-free, randomly generated dense-vector encodings of intent classes can yield massive gains of over 20% over one-hot encodings, and also outperform the previous, domain knowledge-based SOTA on one of the datasets. We finish by describing a novel algorithm to search for good dense-vector encodings and present initial but promising experimental results of its use.
Submitted 18 May, 2022;
originally announced May 2022.
-
Towards a New Science of Disinformation
Authors:
Claudio S. Pinhanez,
German H. Flores,
Marisa A. Vasconcelos,
Mu Qiao,
Nick Linck,
Rogério de Paula,
Yuya J. Ong
Abstract:
How can we best address the dangerous impact that deep learning-generated fake audio, photographs, and videos (a.k.a. deepfakes) may have on personal and societal life? We foresee that the availability of cheap deepfake technology will create a second wave of disinformation where people will receive specific, personalized disinformation through different channels, making the current approaches to fighting disinformation obsolete. We argue that fake media has to be seen as an upcoming cybersecurity problem, and that we have to shift from combating its spread to a prevention and cure framework where users have ways available to verify, challenge, and argue against the veracity of each piece of media they are exposed to. To create the technologies behind this framework, we propose that a new Science of Disinformation is needed, one which creates a theoretical framework for the processes of both communication and consumption of false content. Key scientific and technological challenges facing this research agenda are listed and discussed in the light of state-of-the-art technologies for fake media generation and detection, argument finding and construction, and how to effectively engage users in the prevention and cure processes.
Submitted 17 March, 2022;
originally announced April 2022.
-
Expose Uncertainty, Instill Distrust, Avoid Explanations: Towards Ethical Guidelines for AI
Authors:
Claudio S. Pinhanez
Abstract:
In this position paper, I argue that the best way to help and protect humans using AI technology is to make them aware of the intrinsic limitations and problems of AI algorithms. To accomplish this, I suggest three ethical guidelines to be used in the presentation of results, mandating AI systems to expose uncertainty, to instill distrust, and, contrary to traditional views, to avoid explanations. The paper presents a preliminary discussion of the guidelines and provides some arguments for their adoption, aiming to start a debate in the community about AI ethics in practice.
Submitted 29 November, 2021;
originally announced December 2021.
-
Using Meta-Knowledge Mined from Identifiers to Improve Intent Recognition in Neuro-Symbolic Algorithms
Authors:
Claudio Pinhanez,
Paulo Cavalin,
Victor Ribeiro,
Heloisa Candello,
Julio Nogima,
Ana Appel,
Mauro Pichiliani,
Maira Gatti de Bayser,
Melina Guerra,
Henrique Ferreira,
Gabriel Malfatti
Abstract:
In this paper we explore the use of meta-knowledge embedded in intent identifiers to improve intent recognition in conversational systems. As evidenced by the analysis of thousands of real-world chatbots and by interviews with professional chatbot curators, developers and domain experts tend to organize the set of chatbot intents by identifying them using proto-taxonomies, i.e., meta-knowledge connecting high-level, symbolic concepts shared across different intents. By using neuro-symbolic algorithms able to incorporate such proto-taxonomies to expand intent representation, we show that such mined meta-knowledge can improve accuracy in intent recognition. In a dataset with intents and example utterances from hundreds of professional chatbots, we saw improvements of more than 10% in the equal error rate (EER) in almost a third of the chatbots when we applied those algorithms, in comparison to a baseline of the same algorithms without the meta-knowledge. The meta-knowledge proved to be even more relevant in detecting out-of-scope utterances, decreasing the false acceptance rate (FAR) by more than 20% in about half of the chatbots. The experiments demonstrate that such symbolic meta-knowledge structures can be effectively mined and used by neuro-symbolic algorithms, apparently by incorporating higher-level structures of the problem being solved into the learning process. Based on these results, we also discuss how the use of mined meta-knowledge can be an answer to the challenge of knowledge acquisition in neuro-symbolic algorithms.
Submitted 16 December, 2020;
originally announced December 2020.
-
A Hybrid Solution to Learn Turn-Taking in Multi-Party Service-based Chat Groups
Authors:
Maira Gatti de Bayser,
Melina Alberio Guerra,
Paulo Cavalin,
Claudio Pinhanez
Abstract:
Predicting the next most likely participant to interact in a multi-party conversation is a difficult problem. In a text-based chat group, the only information available is the sender, the content of the text, and the dialogue history. In this paper we present our study on how this information can be used in the prediction task through a corpus and an architecture that integrates turn-taking classifiers based on Maximum Likelihood Estimation (MLE), Convolutional Neural Networks (CNN), and Finite State Automata (FSA). The corpus is a synthetic adaptation of the Multi-Domain Wizard-of-Oz dataset (MultiWOZ) to a scenario of multiple travel service-based bots with dialogue errors, and was created to simulate users' interactions and evaluate the architecture. We present experimental results which show that the CNN approach achieves better performance than the baseline, with an accuracy of 92.34%, but the integrated solution with MLE, CNN, and FSA achieves even better performance, with 95.65%.
Submitted 14 January, 2020;
originally announced January 2020.
-
Machine Teaching by Domain Experts: Towards More Humane, Inclusive, and Intelligent Machine Learning Systems
Authors:
Claudio Pinhanez
Abstract:
This paper argues that a possible way to escape from the limitations of current machine learning (ML) systems is to allow their development directly by domain experts without the mediation of ML experts. This could be accomplished by making ML systems interactively teachable using concepts, definitions, and similar high-level knowledge constructs. Pointing to the recent advances in machine teaching technology, we list key technical challenges specific to such expert-centric ML systems, and suggest that they are more humane and possibly more intelligent than traditional ML systems in many domains. We then argue that ML systems could also benefit greatly from being built by a community of experts, much as open-source software did, creating more inclusive systems in terms of enabling different points of view about the same corpus of knowledge. Advantages of the community approach over current ways to build ML systems, as well as specific challenges this approach raises, are also discussed in the paper.
Submitted 19 August, 2019;
originally announced August 2019.
-
Learning Multi-Party Turn-Taking Models from Dialogue Logs
Authors:
Maira Gatti de Bayser,
Paulo Cavalin,
Claudio Pinhanez,
Bianca Zadrozny
Abstract:
This paper investigates the application of machine learning (ML) techniques to enable intelligent systems to learn multi-party turn-taking models from dialogue logs. The specific ML task consists of determining who speaks next, after each utterance of a dialogue, given who has spoken and what was said in the previous utterances. With this goal, this paper presents comparisons of the accuracy of different ML techniques such as Maximum Likelihood Estimation (MLE), Support Vector Machines (SVM), and Convolutional Neural Network (CNN) architectures, with and without utterance data. We present three corpora: the first with dialogues from an American TV situation comedy (chit-chat), the second with logs from a financial advice multi-bot system, and the third with a corpus created from the Multi-Domain Wizard-of-Oz dataset (the latter two are topic-oriented). The results show that: (i) the size of the corpus has a very positive impact on the accuracy of the content-based deep learning approaches, and those models perform best on the larger datasets; and (ii) if the dialogue dataset is small and topic-oriented (but with few topics), it is sufficient to use agent-only MLE or SVM models, although slightly higher accuracies can be achieved by using the content of the utterances with a CNN model.
Submitted 3 July, 2019;
originally announced July 2019.
-
Different but Equal: Comparing User Collaboration with Digital Personal Assistants vs. Teams of Expert Agents
Authors:
Claudio S. Pinhanez,
Heloisa Candello,
Mauro C. Pichiliani,
Marisa Vasconcelos,
Melina Guerra,
Maíra G. de Bayser,
Paulo Cavalin
Abstract:
This work compares user collaboration with conversational personal assistants vs. teams of expert chatbots. Two studies were performed to investigate whether each approach affects accomplishment of tasks and collaboration costs. Participants interacted with two equivalent financial advice chatbot systems, one composed of a single conversational adviser and the other based on a team of four expert chatbots. Results indicated that users had different forms of experiences but were equally able to achieve their goals. Contrary to expectations, there was evidence that in the teamwork situation users were better able to predict agent behavior and did not have an overhead to maintain common ground, indicating similar collaboration costs. The results point towards the feasibility of either of the two approaches for user collaboration with conversational agents.
Submitted 24 August, 2018;
originally announced August 2018.
-
Combining Textual Content and Structure to Improve Dialog Similarity
Authors:
Ana Paula Appel,
Paulo Rodrigo Cavalin,
Marisa Affonso Vasconcelos,
Claudio Santos Pinhanez
Abstract:
Chatbots, taking advantage of the success of messaging apps and recent advances in Artificial Intelligence, have become very popular, from helping businesses improve customer service to chatting with users for the sake of conversation and engagement (celebrity or personal bots). However, developing and improving a chatbot requires understanding the data generated by its users. Dialog data differs in nature from a simple question-and-answer interaction, in that context and temporal properties (turn order) create a different understanding of such data. In this paper, we propose a novel metric to compute the similarity of dialogs based not only on the text content but also on information related to the dialog structure. Our experimental results on the Switchboard dataset show that using evidence from both textual content and dialog structure leads to more accurate results than using each measure in isolation.
Submitted 20 February, 2018;
originally announced February 2018.
-
A Social Network Analysis Framework for Modeling Health Insurance Claims Data
Authors:
Ana Paula Appel,
Vagner F. de Santana,
Luis G. Moyano,
Marcia Ito,
Claudio Santos Pinhanez
Abstract:
Health insurance companies in Brazil have their claims data organized only from the point of view of providers. As a result, they lose the physician view and how physicians share patients. Partnership between physicians can be viewed as fruitful work in most cases, but sometimes it can be a problem for health insurance companies and patients, for example when a recommendation to visit another physician is made only because both work in the same clinic. The focus of this work is to better understand physicians' activities and how these activities are represented in the data. Our approach considers three aspects: the relationships among physicians, the relationships between physicians and patients, and the relationships between physicians and health providers. We present the results of an analysis of a claims database (detailing 18 months of activity) from a large health insurance company in Brazil. The main contribution presented in this paper is a set of models to represent mutual referral between physicians, patient retention, and physician centrality in the health insurance network. Our results show that the proposed models, based on social network frameworks, extracted surprising insights about physicians from real health insurance claims data.
Submitted 20 February, 2018;
originally announced February 2018.
-
Computer Interfaces to Organizations: Perspectives on Borg-Human Interaction Design
Authors:
Claudio Pinhanez
Abstract:
We use the term borg to refer to the complex organizations composed of people, machines, and processes with which users frequently interact using computer interfaces and websites. Unlike interfaces to pure machines, we contend that borg-human interaction (BHI) happens in a context combining the anthropomorphization of the interface, conflict with users, and dramatization of the interaction process. We believe this context requires designers to construct the human facet of the borg, a structure encompassing the borg's personality, social behavior, and embodied actions; and the strategies to co-create dramatic narratives with the user. To design the human facet of a borg, different concepts and models are explored and discussed, borrowing ideas from psychology, sociology, and arts. Based on those foundations, we propose six design methodologies to complement traditional computer-human interface design techniques, including play-and-freeze enactment of conflicts and the use of giant puppets as interface prototypes.
Submitted 8 December, 2017;
originally announced December 2017.
-
A Hybrid Architecture for Multi-Party Conversational Systems
Authors:
Maira Gatti de Bayser,
Paulo Cavalin,
Renan Souza,
Alan Braz,
Heloisa Candello,
Claudio Pinhanez,
Jean-Pierre Briot
Abstract:
Multi-party Conversational Systems are systems with natural language interaction between one or more people or systems. From the moment that an utterance is sent to a group to the moment that it is replied to in the group by a member, several activities must be carried out by the system: utterance understanding, information search, and reasoning, among others. In this paper we present the challenges of designing and building multi-party conversational systems, the state of the art, our proposed hybrid architecture using both rules and machine learning, and some insights gained after implementing and evaluating one in the finance domain.
Submitted 4 May, 2017; v1 submitted 2 May, 2017;
originally announced May 2017.