-
LEMONADE: A Large Multilingual Expert-Annotated Abstractive Event Dataset for the Real World
Authors:
Sina J. Semnani,
Pingyue Zhang,
Wanyue Zhai,
Haozhuo Li,
Ryan Beauchamp,
Trey Billing,
Katayoun Kishi,
Manling Li,
Monica S. Lam
Abstract:
This paper presents LEMONADE, a large-scale conflict event dataset comprising 39,786 events across 20 languages and 171 countries, with extensive coverage of region-specific entities. LEMONADE is based on a partially reannotated subset of the Armed Conflict Location & Event Data (ACLED), which has documented global conflict events for over a decade.
To address the challenge of aggregating multilingual sources for global event analysis, we introduce abstractive event extraction (AEE) and its subtask, abstractive entity linking (AEL). Unlike conventional span-based event extraction, our approach detects event arguments and entities through holistic document understanding and normalizes them across the multilingual dataset. We evaluate various large language models (LLMs) on these tasks, adapt existing zero-shot event extraction systems, and benchmark supervised models. Additionally, we introduce ZEST, a novel zero-shot retrieval-based system for AEL.
Our best zero-shot system achieves an end-to-end F1 score of 58.3%, with LLMs outperforming specialized event extraction models such as GoLLIE. For entity linking, ZEST achieves an F1 score of 45.7%, significantly surpassing OneNet, a state-of-the-art zero-shot baseline that achieves only 23.7%. However, these zero-shot results lag behind the best supervised systems by 20.1% and 37.0% in the end-to-end and AEL tasks, respectively, highlighting the need for further research.
Submitted 1 June, 2025;
originally announced June 2025.
-
Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations
Authors:
Yucheng Jiang,
Yijia Shao,
Dekun Ma,
Sina J. Semnani,
Monica S. Lam
Abstract:
While language model (LM)-powered chatbots and generative search engines excel at answering concrete queries, discovering information in the terrain of unknown unknowns remains challenging for users. To emulate the common educational scenario where children/students learn by listening to and participating in conversations of their parents/teachers, we create Collaborative STORM (Co-STORM). Unlike QA systems that require users to ask all the questions, Co-STORM lets users observe and occasionally steer the discourse among several LM agents. The agents ask questions on the user's behalf, allowing the user to discover unknown unknowns serendipitously. To facilitate user interaction, Co-STORM assists users in tracking the discourse by organizing the uncovered information into a dynamic mind map, ultimately generating a comprehensive report as takeaways. For automatic evaluation, we construct the WildSeek dataset by collecting real information-seeking records with user goals. Co-STORM outperforms baseline methods on both discourse trace and report quality. In a further human evaluation, 70% of participants prefer Co-STORM over a search engine, and 78% favor it over a RAG chatbot.
Submitted 17 October, 2024; v1 submitted 27 August, 2024;
originally announced August 2024.
-
SPINACH: SPARQL-Based Information Navigation for Challenging Real-World Questions
Authors:
Shicheng Liu,
Sina J. Semnani,
Harold Triedman,
Jialiang Xu,
Isaac Dan Zhao,
Monica S. Lam
Abstract:
Large Language Models (LLMs) have led to significant improvements in the Knowledge Base Question Answering (KBQA) task. However, datasets used in KBQA studies do not capture the true complexity of KBQA tasks. They either have simple questions, use synthetically generated logical forms, or are based on small knowledge base (KB) schemas.
We introduce the SPINACH dataset, an expert-annotated KBQA dataset collected from discussions on Wikidata's "Request a Query" forum with 320 decontextualized question-SPARQL pairs. The complexity of these in-the-wild queries calls for a KBQA system that can dynamically explore large and often incomplete schemas and reason about them, as it is infeasible to create a comprehensive training dataset.
We also introduce an in-context learning KBQA agent, also called SPINACH, that mimics how a human expert would write SPARQL queries to handle challenging questions. SPINACH achieves a new state of the art on the QALD-7, QALD-9 Plus and QALD-10 datasets by 31.0%, 27.0%, and 10.0% in $F_1$, respectively, and comes within 1.6% of the fine-tuned LLaMA SOTA model on WikiWebQuestions. On our new SPINACH dataset, the SPINACH agent outperforms all baselines, including the best GPT-4-based KBQA agent, by at least 38.1% in $F_1$.
Submitted 21 October, 2024; v1 submitted 16 July, 2024;
originally announced July 2024.
-
Zero-shot Persuasive Chatbots with LLM-Generated Strategies and Information Retrieval
Authors:
Kazuaki Furumai,
Roberto Legaspi,
Julio Vizcarra,
Yudai Yamazaki,
Yasutaka Nishimura,
Sina J. Semnani,
Kazushi Ikeda,
Weiyan Shi,
Monica S. Lam
Abstract:
Persuasion plays a pivotal role in a wide range of applications from health intervention to the promotion of social good. Persuasive chatbots employed responsibly for social good can be an enabler of positive individual and social change. Existing methods rely on fine-tuning persuasive chatbots with task-specific training data, which is costly, if not infeasible, to collect. Furthermore, they employ only a handful of pre-defined persuasion strategies. We propose PersuaBot, a zero-shot chatbot based on Large Language Models (LLMs) that is factual and more persuasive by leveraging many more nuanced strategies. PersuaBot uses an LLM to first generate natural responses, from which the strategies used are extracted. To combat hallucination of LLMs, PersuaBot replaces any unsubstantiated claims in the response with retrieved facts supporting the extracted strategies. We applied our chatbot, PersuaBot, to three significantly different domains needing persuasion skills: donation solicitation, recommendations, and health intervention. Our experiments on simulated and human conversations show that our zero-shot approach is more persuasive than prior work, while achieving factual accuracy surpassing state-of-the-art knowledge-oriented chatbots.
Submitted 23 October, 2024; v1 submitted 3 July, 2024;
originally announced July 2024.
-
LINSCAN -- A Linearity Based Clustering Algorithm
Authors:
Andrew Dennehy,
Xiaoyu Zou,
Shabnam J. Semnani,
Yuri Fialko,
Alexander Cloninger
Abstract:
DBSCAN and OPTICS are powerful algorithms for identifying clusters of points in domains where few assumptions can be made about the structure of the data. In this paper, we leverage these strengths and introduce a new algorithm, LINSCAN, designed to seek lineated clusters that are difficult to find and isolate with existing methods. In particular, by embedding points as normal distributions approximating their local neighborhoods and leveraging a distance function derived from the Kullback-Leibler divergence, LINSCAN can detect and distinguish lineated clusters that are spatially close but have orthogonal covariances. We demonstrate how LINSCAN can be applied to seismic data to identify active faults, including intersecting faults, and determine their orientation. Finally, we discuss the properties a generalization of DBSCAN and OPTICS must have in order to retain the stability benefits of these algorithms.
Submitted 25 June, 2024;
originally announced June 2024.
-
SPAGHETTI: Open-Domain Question Answering from Heterogeneous Data Sources with Retrieval and Semantic Parsing
Authors:
Heidi C. Zhang,
Sina J. Semnani,
Farhad Ghassemi,
Jialiang Xu,
Shicheng Liu,
Monica S. Lam
Abstract:
We introduce SPAGHETTI: Semantic Parsing Augmented Generation for Hybrid English information from Text Tables and Infoboxes, a hybrid question-answering (QA) pipeline that utilizes information from heterogeneous knowledge sources, including knowledge base, text, tables, and infoboxes. Our LLM-augmented approach achieves state-of-the-art performance on the Compmix dataset, the most comprehensive heterogeneous open-domain QA dataset, with 56.5% exact match (EM) rate. More importantly, manual analysis on a sample of the dataset suggests that SPAGHETTI is more than 90% accurate, indicating that EM is no longer suitable for assessing the capabilities of QA systems today.
Submitted 1 June, 2024;
originally announced June 2024.
-
Benchmarks Underestimate the Readiness of Multi-lingual Dialogue Agents
Authors:
Andrew H. Lee,
Sina J. Semnani,
Galo Castillo-López,
Gaël de Chalendar,
Monojit Choudhury,
Ashna Dua,
Kapil Rajesh Kavitha,
Sungkyun Kim,
Prashant Kodali,
Ponnurangam Kumaraguru,
Alexis Lombard,
Mehrad Moradshahi,
Gihyun Park,
Nasredine Semmar,
Jiwon Seo,
Tianhao Shen,
Manish Shrivastava,
Deyi Xiong,
Monica S. Lam
Abstract:
Creating multilingual task-oriented dialogue (TOD) agents is challenging due to the high cost of training data acquisition. Following the research trend of improving training data efficiency, we show, for the first time, that in-context learning is sufficient to tackle multilingual TOD.
To handle the challenging dialogue state tracking (DST) subtask, we break it down to simpler steps that are more compatible with in-context learning where only a handful of few-shot examples are used. We test our approach on the multilingual TOD dataset X-RiSAWOZ, which has 12 domains in Chinese, English, French, Korean, Hindi, and code-mixed Hindi-English. Our turn-by-turn DST accuracy on the 6 languages ranges from 55.6% to 80.3%, seemingly worse than the SOTA results from fine-tuned models that achieve from 60.7% to 82.8%; our BLEU scores in the response generation (RG) subtask are also significantly lower than SOTA.
However, after manual evaluation of the validation set, we find that by correcting gold label errors and improving dataset annotation schema, GPT-4 with our prompts can achieve (1) 89.6%-96.8% accuracy in DST, and (2) more than 99% correct response generation across different languages. This leads us to conclude that current automatic metrics heavily underestimate the effectiveness of in-context learning.
Submitted 16 June, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Incremental Neural Controlled Differential Equations for Modeling of Path-dependent Material Behavior
Authors:
Yangzi He,
Shabnam J. Semnani
Abstract:
Data-driven surrogate modeling has emerged as a promising approach for reducing computational expenses of multiscale simulations. Recurrent Neural Networks (RNNs) are a common choice for modeling path-dependent behavior. However, previous studies have shown that RNNs fail to make predictions that are consistent with perturbation in the input strain, leading to potential oscillations and lack of convergence when implemented within finite element simulations. In this work, we leverage neural differential equations, which have recently emerged to model time series in a continuous manner, and show their robustness in modeling elasto-plastic path-dependent material behavior. We develop a new sequential model called Incremental Neural Controlled Differential Equation (INCDE) for general time-variant dynamical systems, including path-dependent constitutive models. INCDE is formulated and analyzed in terms of stability and convergence. Surrogate models based on INCDE are subsequently trained and tested for J2 and Drucker-Prager plasticity. The surrogate models are implemented for material point simulations and boundary value problems solved using the finite element method with various cyclic and monotonic loading protocols to demonstrate the robustness, consistency and accuracy of the proposed approach.
Submitted 28 December, 2023; v1 submitted 28 November, 2023;
originally announced November 2023.
-
SUQL: Conversational Search over Structured and Unstructured Data with Large Language Models
Authors:
Shicheng Liu,
Jialiang Xu,
Wesley Tjangnaka,
Sina J. Semnani,
Chen Jie Yu,
Monica S. Lam
Abstract:
While most conversational agents are grounded on either free-text or structured knowledge, many knowledge corpora consist of hybrid sources. This paper presents the first conversational agent that supports the full generality of hybrid data access for large knowledge corpora, through a language we developed called SUQL (Structured and Unstructured Query Language). Specifically, SUQL extends SQL with free-text primitives (summary and answer), so information retrieval can be composed with structured data accesses arbitrarily in a formal, succinct, precise, and interpretable notation. With SUQL, we propose the first semantic parser, an LLM with in-context learning, that can handle hybrid data sources.
Our in-context learning-based approach, when applied to the HybridQA dataset, comes within 8.9% exact match and 7.1% F1 of the SOTA, which was trained on 62K data samples. More significantly, unlike previous approaches, our technique is applicable to large databases and free-text corpora. We introduce a dataset consisting of crowdsourced questions and conversations on Yelp, a large, real restaurant knowledge base with structured and unstructured data. We show that our few-shot conversational agent based on SUQL finds an entity satisfying all user requirements 90.3% of the time, compared to 63.4% for a baseline based on linearization.
Submitted 13 March, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents
Authors:
Mehrad Moradshahi,
Tianhao Shen,
Kalika Bali,
Monojit Choudhury,
Gaël de Chalendar,
Anmol Goel,
Sungkyun Kim,
Prashant Kodali,
Ponnurangam Kumaraguru,
Nasredine Semmar,
Sina J. Semnani,
Jiwon Seo,
Vivek Seshadri,
Manish Shrivastava,
Michael Sun,
Aditya Yadavalli,
Chaobin You,
Deyi Xiong,
Monica S. Lam
Abstract:
Task-oriented dialogue research has mainly focused on a few popular languages like English and Chinese, due to the high dataset creation cost for a new language. To reduce the cost, we apply manual editing to automatically translated data. We create a new multilingual benchmark, X-RiSAWOZ, by translating the Chinese RiSAWOZ to 4 languages: English, French, Hindi, Korean; and a code-mixed English-Hindi language. X-RiSAWOZ has more than 18,000 human-verified dialogue utterances for each language, and unlike most multilingual prior work, is an end-to-end dataset for building fully-functioning agents.
The many difficulties we encountered in creating X-RiSAWOZ led us to develop a toolset to accelerate the post-editing of a new language dataset after translation. This toolset improves machine translation with a hybrid entity alignment technique that combines neural with dictionary-based methods, along with many automated and semi-automated validation checks.
We establish strong baselines for X-RiSAWOZ by training dialogue agents in the zero- and few-shot settings where limited gold data is available in the target language. Our results suggest that our translation and post-editing methodology and toolset can be used to create new high-quality multilingual dialogue agents cost-effectively. Our dataset, code, and toolkit are released open-source.
Submitted 30 June, 2023;
originally announced June 2023.
-
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
Authors:
Sina J. Semnani,
Violet Z. Yao,
Heidi C. Zhang,
Monica S. Lam
Abstract:
This paper presents the first few-shot LLM-based chatbot that almost never hallucinates and has high conversationality and low latency. WikiChat is grounded on the English Wikipedia, the largest curated free-text corpus.
WikiChat generates a response from an LLM, retains only the grounded facts, and combines them with additional information it retrieves from the corpus to form factual and engaging responses. We distill WikiChat based on GPT-4 into a 7B-parameter LLaMA model with minimal loss of quality, to significantly improve its latency, cost and privacy, and facilitate research and deployment.
Using a novel hybrid human-and-LLM evaluation methodology, we show that our best system achieves 97.3% factual accuracy in simulated conversations. It significantly outperforms all retrieval-based and LLM-based baselines, and outperforms GPT-4 by 3.9%, 38.6% and 51.0% on head, tail and recent knowledge, respectively. Compared to previous state-of-the-art retrieval-based chatbots, WikiChat is also significantly more informative and engaging, just like an LLM.
WikiChat achieves 97.9% factual accuracy in conversations with human users about recent topics, 55.0% better than GPT-4, while receiving significantly higher user ratings and more favorable comments.
Submitted 27 October, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata
Authors:
Silei Xu,
Shicheng Liu,
Theo Culhane,
Elizaveta Pertseva,
Meng-Hsi Wu,
Sina J. Semnani,
Monica S. Lam
Abstract:
While large language models (LLMs) can answer many questions correctly, they can also hallucinate and give wrong answers. Wikidata, with its over 12 billion facts, can be used to ground LLMs to improve their factuality. This paper presents WikiWebQuestions, a high-quality question answering benchmark for Wikidata. Ported over from WebQuestions for Freebase, it consists of real-world data with SPARQL annotation. This paper presents a few-shot sequence-to-sequence semantic parser for Wikidata. We modify SPARQL to use the unique domain and property names instead of their IDs. We train the parser to use either the results from an entity linker or mentions in the query. We fine-tune LLaMA by adding the few-shot training data to that used to fine-tune Alpaca. Our experimental results demonstrate the effectiveness of this methodology, establishing a strong baseline of 76% and 65% answer accuracy in the dev and test sets of WikiWebQuestions, respectively. By pairing our semantic parser with GPT-3, we combine verifiable results with qualified GPT-3 guesses to provide useful answers to 96% of the questions in dev. We also show that our method outperforms the state-of-the-art for the QALD-7 Wikidata dataset by 3.6% in F1 score.
Submitted 5 November, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Zero and Few-Shot Localization of Task-Oriented Dialogue Agents with a Distilled Representation
Authors:
Mehrad Moradshahi,
Sina J. Semnani,
Monica S. Lam
Abstract:
Task-oriented Dialogue (ToD) agents are mostly limited to a few widely-spoken languages, mainly due to the high cost of acquiring training data for each language. Existing low-cost approaches that rely on cross-lingual embeddings or naive machine translation sacrifice a lot of accuracy for data efficiency, and largely fail in creating a usable dialogue agent. We propose automatic methods that use ToD training data in a source language to build a high-quality functioning dialogue agent in another target language that has no training data (i.e. zero-shot) or a small training set (i.e. few-shot). Unlike most prior work in cross-lingual ToD that only focuses on Dialogue State Tracking (DST), we build an end-to-end agent.
We show that our approach closes the accuracy gap between few-shot and existing full-shot methods for ToD agents. We achieve this by (1) improving the dialogue data representation, (2) improving entity-aware machine translation, and (3) automatic filtering of noisy translations.
We evaluate our approach on the recent bilingual dialogue dataset BiToD. In Chinese to English transfer, in the zero-shot setting, our method achieves 46.7% and 22.0% in Task Success Rate (TSR) and Dialogue Success Rate (DSR) respectively. In the few-shot setting where 10% of the data in the target language is used, we improve the state-of-the-art by 15.2% and 14.0%, coming within 5% of full-shot training.
Submitted 18 February, 2023;
originally announced February 2023.
-
ThingTalk: An Extensible, Executable Representation Language for Task-Oriented Dialogues
Authors:
Monica S. Lam,
Giovanni Campagna,
Mehrad Moradshahi,
Sina J. Semnani,
Silei Xu
Abstract:
Task-oriented conversational agents rely on semantic parsers to translate natural language to formal representations. In this paper, we propose the design and rationale of the ThingTalk formal representation, and how the design improves the development of transactional task-oriented agents.
ThingTalk is built on four core principles: (1) representing user requests directly as executable statements, covering all the functionality of the agent, (2) representing dialogues formally and succinctly to support accurate contextual semantic parsing, (3) standardizing types and interfaces to maximize reuse between agents, and (4) allowing multiple, independently-developed agents to be composed in a single virtual assistant. ThingTalk is developed as part of the Genie Framework that allows developers to quickly build transactional agents given a database and APIs.
We compare ThingTalk to existing representations: SMCalFlow, SGD, TreeDST. Compared to the others, the ThingTalk design is both more general and more cost-effective. Evaluated on the MultiWOZ benchmark, using ThingTalk and associated tools yields a new state-of-the-art accuracy of 79% turn-by-turn.
Submitted 23 March, 2022;
originally announced March 2022.
-
House Price Prediction using Satellite Imagery
Authors:
Sina Jandaghi Semnani,
Hoormazd Rezaei
Abstract:
In this paper we show how using satellite images can improve the accuracy of housing price estimation models. Using Los Angeles County's property assessment dataset, by transferring learning from an Inception-v3 model pretrained on ImageNet, we could achieve an improvement of ~10% in R-squared score compared to two baseline models that only use non-image features of the house.
Submitted 12 May, 2021;
originally announced May 2021.
-
Localizing Open-Ontology QA Semantic Parsers in a Day Using Machine Translation
Authors:
Mehrad Moradshahi,
Giovanni Campagna,
Sina J. Semnani,
Silei Xu,
Monica S. Lam
Abstract:
We propose Semantic Parser Localizer (SPL), a toolkit that leverages Neural Machine Translation (NMT) systems to localize a semantic parser for a new language. Our methodology is to (1) generate training data automatically in the target language by augmenting machine-translated datasets with local entities scraped from public websites, (2) add a few-shot boost of human-translated sentences and train a novel XLMR-LSTM semantic parser, and (3) test the model on natural utterances curated using human translators.
We assess the effectiveness of our approach by extending the current capabilities of Schema2QA, a system for English Question Answering (QA) on the open web, to 10 new languages for the restaurants and hotels domains. Our models achieve an overall test accuracy ranging between 61% and 69% for the hotels domain and between 64% and 78% for the restaurants domain, which compares favorably to the 69% and 80% obtained for an English parser trained on gold English data and a few examples from the validation set. We show our approach outperforms the previous state-of-the-art methodology by more than 30% for hotels and 40% for restaurants with localized ontologies for the subset of languages tested.
Our methodology enables any software developer to add a new language capability to a QA system for a new domain, leveraging machine translation, in less than 24 hours.
Submitted 10 October, 2020;
originally announced October 2020.
-
AutoQA: From Databases To QA Semantic Parsers With Only Synthetic Training Data
Authors:
Silei Xu,
Sina J. Semnani,
Giovanni Campagna,
Monica S. Lam
Abstract:
We propose AutoQA, a methodology and toolkit to generate semantic parsers that answer questions on databases, with no manual effort. Given a database schema and its data, AutoQA automatically generates a large set of high-quality questions for training that covers different database operations. It uses automatic paraphrasing combined with template-based parsing to find alternative expressions of an attribute in different parts of speech. It also uses a novel filtered auto-paraphraser to generate correct paraphrases of entire sentences. We apply AutoQA to the Schema2QA dataset and obtain an average logical form accuracy of 62.9% when tested on natural questions, which is only 6.4% lower than a model trained with expert natural language annotations and paraphrase data collected from crowdworkers. To demonstrate the generality of AutoQA, we also apply it to the Overnight dataset. AutoQA achieves 69.8% answer accuracy, 16.4% higher than the state-of-the-art zero-shot models and only 5.2% lower than the same model trained with human data.
Submitted 7 June, 2021; v1 submitted 9 October, 2020;
originally announced October 2020.
-
A Few-Shot Semantic Parser for Wizard-of-Oz Dialogues with the Precise ThingTalk Representation
Authors:
Giovanni Campagna,
Sina J. Semnani,
Ryan Kearns,
Lucas Jun Koba Sato,
Silei Xu,
Monica S. Lam
Abstract:
Previous attempts to build effective semantic parsers for Wizard-of-Oz (WOZ) conversations suffer from the difficulty of acquiring a high-quality, manually annotated training set. Approaches based only on dialogue synthesis are insufficient, as dialogues generated from state-machine-based models are poor approximations of real-life conversations. Furthermore, previously proposed dialogue state representations are ambiguous and lack the precision necessary for building an effective agent. This paper proposes a new dialogue representation and a sample-efficient methodology that can predict precise dialogue states in WOZ conversations. We extend the ThingTalk representation to capture all information an agent needs to respond properly. Our training strategy is sample-efficient: we combine (1) few-shot data sparsely sampling the full dialogue space and (2) synthesized data covering a subset of the dialogue space, generated by a succinct state-based dialogue model. The completeness of the extended ThingTalk language is demonstrated with a fully operational agent, which is also used in training data synthesis. We demonstrate the effectiveness of our methodology on MultiWOZ 3.0, a reannotation of the MultiWOZ 2.1 dataset in ThingTalk. ThingTalk can represent 98% of the test turns, while the simulator can emulate 85% of the validation set. We train a contextual semantic parser using our strategy and obtain 79% turn-by-turn exact match accuracy on the reannotated test set.
Submitted 7 April, 2022; v1 submitted 16 September, 2020;
originally announced September 2020.
-
Revisiting the Open-Domain Question Answering Pipeline
Authors:
Sina J. Semnani,
Manish Pandey
Abstract:
Open-domain question answering (QA) is the task of identifying answers to natural questions from a large corpus of documents. The typical open-domain QA system starts with information retrieval to select a subset of documents from the corpus, which are then processed by a machine reader to select the answer spans. This paper describes Mindstone, an open-domain QA system that consists of a new multi-stage pipeline that employs a traditional BM25-based information retriever, RM3-based neural relevance feedback, a neural ranker, and a machine reading comprehension stage. This paper establishes a new baseline for end-to-end question answering performance on the Wikipedia/SQuAD dataset (EM=58.1, F1=65.8), with substantial gains over the previous state of the art (Yang et al., 2019b). We also show how the new pipeline enables the use of low-resolution labels and can be easily tuned to meet various timing requirements.
Submitted 2 September, 2020;
originally announced September 2020.
-
An anisotropic viscoplasticity model for shale based on layered microstructure homogenization
Authors:
Jinhyun Choo,
Shabnam J. Semnani,
Joshua A. White
Abstract:
Viscoplastic deformation of shale is frequently observed in many subsurface applications. Many studies have suggested that this viscoplastic behavior is anisotropic (specifically, transversely isotropic) and closely linked to the layered composite structure at the microscale. In this work, we develop a two-scale constitutive model for shale in which anisotropic viscoplastic behavior naturally emerges from semi-analytical homogenization of a bi-layer microstructure. The microstructure is modeled as a composite of soft layers, representing a ductile matrix formed by clay and organics, and hard layers, corresponding to a brittle matrix composed of stiff minerals. This layered microstructure renders the macroscopic behavior anisotropic, even when the individual layers are modeled with isotropic constitutive laws. Using a common correlation between clay and organic content and magnitude of creep, we apply a viscoplastic Modified Cam-Clay plasticity model to the soft layers, while treating the hard layers as a linear elastic material to minimize the number of calibration parameters. We then describe the implementation of the proposed model in a standard material update subroutine. The model is validated with laboratory creep data on samples from three gas shale formations. We also demonstrate the computational behavior of the proposed model through simulation of time-dependent borehole closure in a shale formation with different bedding plane directions.
Submitted 26 October, 2020; v1 submitted 25 August, 2020;
originally announced August 2020.
-
An Inelastic Homogenization Framework for Layered Materials with Planes of Weakness
Authors:
Shabnam J. Semnani,
Joshua A. White
Abstract:
Many geologic materials have a composite structure, in which macroscopic mechanical behavior is determined by the properties, shape, and heterogeneous distribution of individual constituents. In particular, sedimentary rocks commonly exhibit a layered microstructure, with distinct bedding planes that can also form planes of weakness. In this work, we present a homogenization framework for modeling inelastic layered media. The proposed constitutive model allows for distinct micro-constitutive laws for each layer, explicit representation of layer distributions, as well as incorporation of imperfect bonding at the interface between adjacent layers. No a priori assumptions are needed regarding the specific constitutive models used for the layers and interfaces, providing significant modeling flexibility. The overall framework provides a simple and physically motivated way of defining anisotropic material behavior as an emergent property of the layered microstructure. The model is calibrated using triaxial and true-triaxial experimental data to demonstrate its ability to describe anisotropic deformation and multiple modes of failure.
Submitted 24 March, 2020;
originally announced March 2020.