-
SUper Team at SemEval-2016 Task 3: Building a feature-rich system for community question answering
Authors:
Tsvetomila Mihaylova,
Pepa Gencheva,
Martin Boyanov,
Ivana Yovcheva,
Todor Mihaylov,
Momchil Hardalov,
Yasen Kiprov,
Daniel Balchev,
Ivan Koychev,
Preslav Nakov,
Ivelina Nikolova,
Galia Angelova
Abstract:
We present the system we built for participating in SemEval-2016 Task 3 on Community Question Answering. We achieved the best results on subtask C, and strong results on subtasks A and B, by combining a rich set of various types of features: semantic, lexical, metadata, and user-related. The most important group turned out to be the metadata for the question and for the comment, semantic vectors trained on QatarLiving data and similarities between the question and the comment for subtasks A and C, and between the original and the related question for Subtask B.
Submitted 26 September, 2021;
originally announced September 2021.
-
Evaluating Variable-Length Multiple-Option Lists in Chatbots and Mobile Search
Authors:
Pepa Atanasova,
Georgi Karadzhov,
Yasen Kiprov,
Preslav Nakov,
Fabrizio Sebastiani
Abstract:
In recent years, the proliferation of smart mobile devices has led to the gradual integration of search functionality within mobile platforms. This has created an incentive to move away from the "ten blue links" metaphor, as mobile users are less likely to click on them, expecting to get the answer directly from the snippets. In turn, this has revived the interest in Question Answering. Then, along came chatbots, conversational systems, and messaging platforms, where the user needs could be better served with the system asking follow-up questions in order to better understand the user's intent. While typically a user would expect a single response at any utterance, a system could also return multiple options for the user to select from, based on different system understandings of the user's intent. However, this possibility should not be overused, as this practice could confuse and/or annoy the user. How to produce good variable-length lists, given the conflicting objectives of staying short while maximizing the likelihood of having a correct answer included in the list, is an underexplored problem. It is also unclear how to evaluate a system that tries to do that. Here we aim to bridge this gap. In particular, we define some necessary and some optional properties that an evaluation measure fit for this purpose should have. We further show that existing evaluation measures from the IR tradition are not entirely suitable for this setup, and we propose novel evaluation measures that address it satisfactorily.
Submitted 25 May, 2019;
originally announced May 2019.
-
Finding People's Professions and Nationalities Using Distant Supervision - The FMI@SU "goosefoot" team at the WSDM Cup 2017 Triple Scoring Task
Authors:
Valentin Zmiycharov,
Dimitar Alexandrov,
Preslav Nakov,
Ivan Koychev,
Yasen Kiprov
Abstract:
We describe the system that our FMI@SU student team built for participating in the Triple Scoring task at the WSDM Cup 2017. Given a triple from a "type-like" relation, profession or nationality, the goal is to produce a score, on a scale from 0 to 7, that measures the relevance of the statement expressed by the triple: e.g., how well does the profession of Actor fit Quentin Tarantino? We propose a distant supervision approach using information crawled from Wikipedia, DeletionPedia, and DBpedia, together with task-specific word embeddings, TF-IDF weights, and role occurrence order, which we combine in a linear regression model. The official evaluation ranked our submission 1st on Kendall's Tau, 7th on Average score difference, and 9th on Accuracy, out of 21 participating teams.
Submitted 22 December, 2017;
originally announced December 2017.
-
Large-Scale Goodness Polarity Lexicons for Community Question Answering
Authors:
Todor Mihaylov,
Daniel Belchev,
Yasen Kiprov,
Ivan Koychev,
Preslav Nakov
Abstract:
We transfer a key idea from the field of sentiment analysis to a new domain: community question answering (cQA). The cQA task we are interested in is the following: given a question and a thread of comments, we want to re-rank the comments so that the ones that are good answers to the question would be ranked higher than the bad ones. We notice that good vs. bad comments use specific vocabulary and that one can often predict the goodness/badness of a comment even ignoring the question, based on the comment contents only. This leads us to the idea of building a good/bad polarity lexicon, by analogy with the positive/negative sentiment polarity lexicons commonly used in sentiment analysis. In particular, we use pointwise mutual information in order to build large-scale goodness polarity lexicons in a semi-supervised manner starting with a small number of initial seeds. The evaluation results show an improvement of 0.7 MAP points absolute over a very strong baseline and state-of-the-art performance on SemEval-2016 Task 3.
Submitted 20 July, 2017;
originally announced July 2017.
-
The Case for Being Average: A Mediocrity Approach to Style Masking and Author Obfuscation
Authors:
Georgi Karadjov,
Tsvetomila Mihaylova,
Yasen Kiprov,
Georgi Georgiev,
Ivan Koychev,
Preslav Nakov
Abstract:
Users posting online expect to remain anonymous unless they have logged in, which is often needed for them to be able to discuss freely on various topics. Preserving the anonymity of a text's writer can also be important in some other contexts, e.g., in the case of witness protection or anonymity programs. However, each person has his/her own style of writing, which can be analyzed using stylometry, and as a result, the true identity of the author of a piece of text can be revealed even if s/he has tried to hide it. Thus, it could be helpful to design automatic tools that can help a person obfuscate his/her identity when writing text. In particular, here we propose an approach that changes the text, so that it is pushed towards average values for some general stylometric characteristics, thus making the use of these characteristics less discriminative. The approach consists of three main steps: first, we calculate the values for some popular stylometric metrics that can indicate authorship; then we apply various transformations to the text, so that these metrics are adjusted towards the average level, while preserving the semantics and the soundness of the text; and finally, we add random noise. This approach turned out to be very efficient, and yielded the best performance on the Author Obfuscation task at the PAN-2016 competition.
Submitted 28 July, 2017; v1 submitted 12 July, 2017;
originally announced July 2017.