Search | arXiv e-print repository

doi 10.1007/s10791-020-09386-w

Evaluation Metrics for Measuring Bias in Search Engine Results

Authors: Gizem Gezici, Aldo Lipani, Yucel Saygin, Emine Yilmaz

Abstract: Search engines decide what we see for a given search query. Since many people are exposed to information through search engines, it is fair to expect that search engines are neutral. However, search engine results do not necessarily cover all the viewpoints of a search query topic, and they can be biased towards a specific view since search engine results are returned based on relevance, which is… ▽ More Search engines decide what we see for a given search query. Since many people are exposed to information through search engines, it is fair to expect that search engines are neutral. However, search engine results do not necessarily cover all the viewpoints of a search query topic, and they can be biased towards a specific view since search engine results are returned based on relevance, which is calculated using many features and sophisticated algorithms where search neutrality is not necessarily the focal point. Therefore, it is important to evaluate the search engine results with respect to bias. In this work we propose novel web search bias evaluation measures which take into account the rank and relevance. We also propose a framework to evaluate web search bias using the proposed measures and test our framework on two popular search engines based on 57 controversial query topics such as abortion, medical marijuana, and gay marriage. We measure the stance bias (in support or against), as well as the ideological bias (conservative or liberal). We observe that the stance does not necessarily correlate with the ideological leaning, e.g. a positive stance on abortion indicates a liberal leaning but a positive stance on Cuba embargo indicates a conservative leaning. Our experiments show that neither of the search engines suffers from stance bias. However, both search engines suffer from ideological bias, both favouring one ideological leaning to the other, which is more significant from the perspective of polarisation in our society. △ Less

Submitted 3 February, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

Journal ref: 24, 2021, 85-113

arXiv:2206.09987 [pdf, other]

Measuring Gender Bias in Educational Videos: A Case Study on YouTube

Authors: Gizem Gezici, Yucel Saygin

Abstract: Students are increasingly using online materials to learn new subjects or to supplement their learning process in educational institutions. Issues regarding gender bias have been raised in the context of formal education and some measures have been proposed to mitigate them. However, online educational materials in terms of possible gender bias and stereotypes which may appear in different forms a… ▽ More Students are increasingly using online materials to learn new subjects or to supplement their learning process in educational institutions. Issues regarding gender bias have been raised in the context of formal education and some measures have been proposed to mitigate them. However, online educational materials in terms of possible gender bias and stereotypes which may appear in different forms are yet to be investigated in the context of search bias in a widely-used search platform. As a first step towards measuring possible gender bias in online platforms, we have investigated YouTube educational videos in terms of the perceived gender of their narrators. We adopted bias measures for ranked search results to evaluate educational videos returned by YouTube in response to queries related to STEM (Science, Technology, Engineering, and Mathematics) and NON-STEM fields of education. Gender is a research area by itself in social sciences which is beyond the scope of this work. In this respect, for annotating the perceived gender of the narrator of an instructional video we used only a crude classification of gender into Male, and Female. Then, for analysing perceived gender bias we utilised bias measures that have been inspired by search platforms and further incorporated rank information into our analysis. Our preliminary results demonstrate that there is a significant bias towards the male gender on the returned YouTube educational videos, and the degree of bias varies when we compare STEM and NON-STEM queries. Finally, there is a strong evidence that rank information might affect the results. △ Less

Submitted 20 June, 2022; originally announced June 2022.

arXiv:1808.00388 [pdf, other]

How Does Tweet Difficulty Affect Labeling Performance of Annotators?

Authors: Stefan Räbiger, Yücel Saygın, Myra Spiliopoulou

Abstract: Crowdsourcing is a popular means to obtain labeled data at moderate costs, for example for tweets, which can then be used in text mining tasks. To alleviate the problem of low-quality labels in this context, multiple human factors have been analyzed to identify and deal with workers who provide such labels. However, one aspect that has been rarely considered is the inherent difficulty of tweets to… ▽ More Crowdsourcing is a popular means to obtain labeled data at moderate costs, for example for tweets, which can then be used in text mining tasks. To alleviate the problem of low-quality labels in this context, multiple human factors have been analyzed to identify and deal with workers who provide such labels. However, one aspect that has been rarely considered is the inherent difficulty of tweets to be labeled and how this affects the reliability of the labels that annotators assign to such tweets. Therefore, we investigate in this preliminary study this connection using a hierarchical sentiment labeling task on Twitter. We find that there is indeed a relationship between both factors, assuming that annotators have labeled some tweets before: labels assigned to easy tweets are more reliable than those assigned to difficult tweets. Therefore, training predictors on easy tweets enhances the performance by up to 6% in our experiment. This implies potential improvements for active learning techniques and crowdsourcing. △ Less

Submitted 1 August, 2018; originally announced August 2018.

Comments: 13 pages, 4 figures, "for dataset, see https://www.researchgate.net/publication/325180810_Infsci2017_dataset"

arXiv:1706.05171 [pdf, ps, other]

doi 10.1016/j.eswa.2017.06.013

Improving Scalability of Inductive Logic Programming via Pruning and Best-Effort Optimisation

Authors: Mishal Kazmi, Peter Schüller, Yücel Saygın

Abstract: Inductive Logic Programming (ILP) combines rule-based and statistical artificial intelligence methods, by learning a hypothesis comprising a set of rules given background knowledge and constraints for the search space. We focus on extending the XHAIL algorithm for ILP which is based on Answer Set Programming and we evaluate our extensions using the Natural Language Processing application of senten… ▽ More Inductive Logic Programming (ILP) combines rule-based and statistical artificial intelligence methods, by learning a hypothesis comprising a set of rules given background knowledge and constraints for the search space. We focus on extending the XHAIL algorithm for ILP which is based on Answer Set Programming and we evaluate our extensions using the Natural Language Processing application of sentence chunking. With respect to processing natural language, ILP can cater for the constant change in how we use language on a daily basis. At the same time, ILP does not require huge amounts of training examples such as other statistical methods and produces interpretable results, that means a set of rules, which can be analysed and tweaked if necessary. As contributions we extend XHAIL with (i) a pruning mechanism within the hypothesis generalisation algorithm which enables learning from larger datasets, (ii) a better usage of modern solver technology using recently developed optimisation methods, and (iii) a time budget that permits the usage of suboptimal results. We evaluate these improvements on the task of sentence chunking using three datasets from a recent SemEval competition. Results show that our improvements allow for learning on bigger datasets with results that are of similar quality to state-of-the-art systems on the same task. Moreover, we compare the hypotheses obtained on datasets to gain insights on the structure of each dataset. △ Less

Submitted 16 June, 2017; originally announced June 2017.

Comments: 24 pages, preprint of article accepted at Expert Systems With Applications

Journal ref: Expert Systems With Applications 87, pages 291-303, 2017

Showing 1–4 of 4 results for author: Saygin, Y