Search | arXiv e-print repository

arXiv:1908.01868

Local versus Global Strategies in Social Query Expansion

Authors: Omar Alonso, Vasileios Kandylas, Serge-Eric Tremblay

Abstract: Link sharing in social media can be seen as a collaboratively retrieved set of documents for a query or topic expressed by a hashtag. Temporal information plays an important role for identifying the correct context for which such annotations are valid for retrieval purposes. We investigate how social data as temporal context can be used for query expansion and compare global versus local strategies for computing such contextual information for a set of hashtags.

Submitted 5 August, 2019; originally announced August 2019.

arXiv:1906.05986

Scalable Knowledge Graph Construction from Twitter

Authors: Omar Alonso, Vasileios Kandylas, Serge-Eric Tremblay

Abstract: We describe a knowledge graph derived from Twitter data with the goal of discovering relationships between people, links, and topics. The goal is to filter out noise from Twitter and surface an inside-out view that relies on high quality content. The generated graph contains many relationships where the user can query and traverse the structure from different angles allowing the development of new applications.

Submitted 13 June, 2019; originally announced June 2019.

arXiv:1411.0149

How Many Workers to Ask? Adaptive Exploration for Collecting High Quality Labels

Authors: Ittai Abraham, Omar Alonso, Vasilis Kandylas, Rajesh Patel, Steven Shelford, Aleksandrs Slivkins

Abstract: Crowdsourcing has been part of the IR toolbox as a cheap and fast mechanism to obtain labels for system development and evaluation. Successful deployment of crowdsourcing at scale involves adjusting many variables, a very important one being the number of workers needed per human intelligence task (HIT). We consider the crowdsourcing task of learning the answer to simple multiple-choice HITs, which are representative of many relevance experiments. In order to provide statistically significant results, one often needs to ask multiple workers to answer the same HIT. A stopping rule is an algorithm that, given a HIT, decides for any given set of worker answers if the system should stop and output an answer or iterate and ask one more worker. Knowing the historic performance of a worker in the form of a quality score can be beneficial in such a scenario. In this paper we investigate how to devise better stopping rules given such quality scores. We also suggest adaptive exploration as a promising approach for scalable and automatic creation of ground truth. We conduct a data analysis on an industrial crowdsourcing platform, and use the observations from this analysis to design new stopping rules that use the workers' quality scores in a non-trivial manner. We then perform a simulation based on a real-world workload, showing that our algorithm performs better than the more naive approaches.

Submitted 19 May, 2016; v1 submitted 1 November, 2014; originally announced November 2014.

Comments: SIGIR 2016

arXiv:1410.2828

A Study on Placement of Social Buttons in Web Pages

Authors: Omar Alonso, Vasilis Kandylas

Abstract: With the explosion of social media in the last few years, web pages nowadays include different social network buttons where users can express if they support or recommend content. Those social buttons are very visual and their presentations, along with the counters, mark the importance of the social network and the interest on the content. In this paper, we analyze the presence of four types of social buttons (Facebook, Twitter, Google+1, and LinkedIn) in a large collection of web pages that we tracked over a period of time. We report on the distribution and counts along with some characteristics per domain. Finally, we outline some research directions.

Submitted 10 October, 2014; originally announced October 2014.

arXiv:1407.6714

CrowdSTAR: A Social Task Routing Framework for Online Communities

Authors: Besmira Nushi, Omar Alonso, Martin Hentschel, Vasileios Kandylas

Abstract: The online communities available on the Web have shown to be significantly interactive and capable of collectively solving difficult tasks. Nevertheless, it is still a challenge to decide how a task should be dispatched through the network due to the high diversity of the communities and the dynamically changing expertise and social availability of their members. We introduce CrowdSTAR, a framework designed to route tasks across and within online crowds. CrowdSTAR indexes the topic-specific expertise and social features of the crowd contributors and then uses a routing algorithm, which suggests the best sources to ask based on the knowledge vs. availability trade-offs. We experimented with the proposed framework for question and answering scenarios by using two popular social networks as crowd candidates: Twitter and Quora.

Submitted 24 July, 2014; originally announced July 2014.

ACM Class: H.4.m; H.5.3

arXiv:1307.3673

A Data Management Approach for Dataset Selection Using Human Computation

Authors: Alexandros Ntoulas, Omar Alonso, Vasilis Kandylas

Abstract: As the number of applications that use machine learning algorithms increases, the need for labeled data useful for training such algorithms intensifies. Getting labels typically involves employing humans to do the annotation, which directly translates to training and working costs. Crowdsourcing platforms have made labeling cheaper and faster, but they still involve significant costs, especially for the cases where the potential set of candidate data to be labeled is large. In this paper we describe a methodology and a prototype system aiming at addressing this challenge for Web-scale problems in an industrial setting. We discuss ideas on how to efficiently select the data to use for training of machine learning algorithms in an attempt to reduce cost. We show results achieving good performance with reduced cost by carefully selecting which instances to label. Our proposed algorithm is presented as part of a framework for managing and generating training datasets, which includes, among other components, a human computation element.

Submitted 13 July, 2013; originally announced July 2013.

arXiv:1302.3268

Adaptive Crowdsourcing Algorithms for the Bandit Survey Problem

Authors: Ittai Abraham, Omar Alonso, Vasilis Kandylas, Aleksandrs Slivkins

Abstract: Very recently crowdsourcing has become the de facto platform for distributing and collecting human computation for a wide range of tasks and applications such as information retrieval, natural language processing and machine learning. Current crowdsourcing platforms have some limitations in the area of quality control. Most of the effort to ensure good quality has to be done by the experimenter who has to manage the number of workers needed to reach good results. We propose a simple model for adaptive quality control in crowdsourced multiple-choice tasks which we call the "bandit survey problem". This model is related to, but technically different from the well-known multi-armed bandit problem. We present several algorithms for this problem, and support them with analysis and simulations. Our approach is based on our experience conducting relevance evaluation for a large commercial search engine.

Submitted 20 May, 2013; v1 submitted 13 February, 2013; originally announced February 2013.

Comments: Full version of a paper in COLT 2013

Showing 1–7 of 7 results for author: Kandylas, V