Search | arXiv e-print repository

Report on the Workshop on Simulations for Information Access (Sim4IA 2024) at SIGIR 2024

Authors: Timo Breuer, Christin Katharina Kreutz, Norbert Fuhr, Krisztian Balog, Philipp Schaer, Nolwenn Bernard, Ingo Frommholz, Marcel Gohsen, Kaixin Ji, Gareth J. F. Jones, Jüri Keller, Jiqun Liu, Martin Mladenov, Gabriella Pasi, Johanne Trippas, Xi Wang, Saber Zerhoudi, ChengXiang Zhai

Abstract: This paper is a report of the Workshop on Simulations for Information Access (Sim4IA) at SIGIR 2024. The workshop had two keynotes, a panel discussion, nine lightning talks, and two breakout sessions. Key takeaways were the importance of user simulation in academia and industry, the possible bridging of online and offline evaluation, and the issues of organizing a companion shared task around user simulations for information access. We report on how we organized the workshop, provide a brief overview of what happened at the workshop, and summarize its main topics, findings, and future work.

Submitted 26 September, 2024; originally announced September 2024.

Comments: Preprint of a SIGIR Forum submission for Vol. 58 No. 2 - December 2024

arXiv:2303.10497

doi 10.5220/0011798700003417

Examining the Potential for Conversational Exploratory Search using a Smart Speaker Digital Assistant

Authors: Abhishek Kaushik, Gareth J. F. Jones

Abstract: Online digital assistants, such as Amazon Alexa, Google Assistant and Apple Siri, are very popular and provide a range of services to their users; a key function is their ability to satisfy user information needs from the sources available to them. Users may often regard these applications as providing search services similar to Google-type search engines. However, while it is clear that they are in general able to answer factoid questions effectively, it is much less obvious how well they support less specific or exploratory search tasks. We describe an investigation examining the behaviour of the standard Amazon Alexa for exploratory search tasks. The results of our study show that it is not effective in addressing these types of information needs. We propose extensions to Alexa designed to overcome these shortcomings. Our Custom Alexa application extends Alexa's conversational functionality for exploratory search. A user study shows that our extended Alexa application both enables users to complete exploratory search tasks more successfully and is well accepted by our test users.

Submitted 18 March, 2023; originally announced March 2023.

Journal ref: Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - HUCAPP, ISBN 978-989-758-634-7; ISSN 2184-4321, SciTePress, pages 305-317, 2023

arXiv:2303.09258

doi 10.5220/0011798500003417

Comparing Conventional and Conversational Search Interaction using Implicit Evaluation Methods

Authors: Abhishek Kaushik, Gareth J. F. Jones

Abstract: Conversational search applications offer the prospect of improved user experience in information seeking via agent support. However, it is not clear how searchers will respond to this mode of engagement, in comparison to a conventional user-driven search interface, such as those found in a standard web search engine. We describe a laboratory-based study directly comparing user behaviour for a conventional search interface (CSI) with that of an agent-mediated multiview conversational search interface (MCSI) which extends the CSI. User reaction and search outcomes of the two interfaces are compared using implicit evaluation using five analysis methods: claiming to have a better search experience in contrast to a corresponding standard search interface.

Submitted 18 March, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

Journal ref: Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (Volume 2), February 19-21, 2023, Lisbon, Portugal

arXiv:2301.06056

Improving Noise Robustness for Spoken Content Retrieval using Semi-supervised ASR and N-best Transcripts for BERT-based Ranking Models

Authors: Yasufumi Moriya, Gareth J. F. Jones

Abstract: BERT-based re-ranking and dense retrieval (DR) systems have been shown to improve search effectiveness for spoken content retrieval (SCR). However, both methods can still show a reduction in effectiveness when using ASR transcripts in comparison to accurate manual transcripts. We find that a known-item search task on the How2 dataset of spoken instruction videos shows a reduction in mean reciprocal rank (MRR) scores of 10-14%. As a potential method to reduce this disparity, we investigate the use of semi-supervised ASR transcripts and N-best ASR transcripts to mitigate ASR errors for spoken search using BERT-based ranking. Semi-supervised ASR transcripts brought 2-5.5% MRR improvements over standard ASR transcripts and our N-best early fusion methods for BERT DR systems improved MRR by 3-4%. Combining semi-supervised transcripts with N-best early fusion for BERT DR reduced the MRR gap in search effectiveness between manual and ASR transcripts by more than 50% from 14.32% to 6.58%.

Submitted 15 January, 2023; originally announced January 2023.

Comments: accepted by SLT 2022
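The MRR figures in this abstract can be made concrete with a minimal sketch for known-item search. It assumes each query has exactly one target item and expresses the manual-vs-ASR gap as a relative drop from the manual-transcript MRR; the paper may define its percentages differently. The function names and toy rankings are hypothetical.

```python
from typing import Sequence


def reciprocal_rank(ranking: Sequence[str], target: str) -> float:
    """Return 1/rank of the known item, or 0.0 if it was not retrieved."""
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id == target:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]], targets: Sequence[str]) -> float:
    """Average reciprocal rank over queries, one known item per query."""
    if not rankings or len(rankings) != len(targets):
        raise ValueError("need a non-empty list of rankings and one target per ranking")
    scores = [reciprocal_rank(r, t) for r, t in zip(rankings, targets)]
    return sum(scores) / len(scores)


# Hypothetical rankings for two queries, indexed from manual vs ASR transcripts
manual_runs = [["v1", "v7", "v3"], ["v9", "v2"]]
asr_runs = [["v7", "v1", "v3"], ["v2", "v5", "v9"]]
targets = ["v1", "v9"]

mrr_manual = mean_reciprocal_rank(manual_runs, targets)
mrr_asr = mean_reciprocal_rank(asr_runs, targets)
relative_drop = (mrr_manual - mrr_asr) / mrr_manual  # assumed definition of the "gap"
print(f"manual={mrr_manual:.3f} asr={mrr_asr:.3f} drop={relative_drop:.1%}")
```

In this toy case the known items slip from rank 1 to ranks 2 and 3, so MRR falls from 1.0 to 5/12; real gaps such as the 10-14% reported above come from much larger query sets.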

arXiv:2203.05899

Achieving Reliable Human Assessment of Open-Domain Dialogue Systems

Authors: Tianbo Ji, Yvette Graham, Gareth J. F. Jones, Chenyang Lyu, Qun Liu

Abstract: Evaluation of open-domain dialogue systems is highly challenging, and the development of better techniques is highlighted time and again as desperately needed. Despite substantial efforts to carry out reliable live evaluation of systems in recent competitions, annotations have been abandoned and reported as too unreliable to yield sensible results. This is a serious problem since automatic metrics are not known to provide a good indication of what may or may not be a high-quality conversation. Answering the distress call of competitions that have emphasized the urgent need for better evaluation techniques in dialogue, we present the successful development of human evaluation that is highly reliable while still remaining feasible and low cost. Self-replication experiments reveal almost perfectly repeatable results with a correlation of $r=0.969$. Furthermore, due to the lack of appropriate methods of statistical significance testing, the likelihood of potential improvements to systems occurring due to chance is rarely taken into account in dialogue evaluation, and the evaluation we propose facilitates application of standard tests. Since we have developed a highly reliable evaluation method, new insights into system performance can be revealed. We therefore include a comparison of state-of-the-art models (i) with and without personas, to measure the contribution of personas to conversation quality, as well as (ii) prescribed versus freely chosen topics. Interestingly, with respect to personas, results indicate that personas do not contribute positively to conversation quality as expected.

Submitted 11 March, 2022; originally announced March 2022.

Comments: to appear at ACL 2022 main conference
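The self-replication figure of $r=0.969$ is a correlation between two independent runs of the same evaluation. A minimal sketch, assuming it is Pearson's r computed over per-system mean human scores from the two runs; the scores below are invented for illustration, and the paper's exact aggregation (e.g. any score standardization before averaging) is not reproduced here.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical mean ratings for four dialogue systems, from two independent runs
run_1 = [62.1, 55.4, 70.3, 48.9]
run_2 = [61.5, 56.0, 69.8, 50.2]
print(round(pearson_r(run_1, run_2), 3))
```

A value close to 1 means the two runs rank and space the systems almost identically, which is what "almost perfectly repeatable" refers to.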

arXiv:2105.03311

Translation Quality Assessment: A Brief Survey on Manual and Automatic Methods

Authors: Lifeng Han, Gareth J. F. Jones, Alan F. Smeaton

Abstract: To facilitate effective translation modeling and translation studies, one of the crucial questions to address is how to assess translation quality. From the perspectives of accuracy, reliability, repeatability and cost, translation quality assessment (TQA) itself is a rich and challenging task. In this work, we present a high-level and concise survey of TQA methods, including both manual judgement criteria and automated evaluation metrics, which we classify into further detailed sub-categories. We hope that this work will be an asset for both translation model researchers and quality assessment researchers. In addition, we hope that it will enable practitioners to quickly develop a better understanding of the conventional TQA field, and to find closely relevant evaluation solutions for their own needs. This work may also serve to inspire further development of quality assessment and evaluation methodologies for other natural language processing (NLP) tasks in addition to machine translation (MT), such as automatic text summarization (ATS), natural language understanding (NLU) and natural language generation (NLG).

Submitted 5 May, 2021; originally announced May 2021.

Comments: Accepted to 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021): Workshop on Modelling Translation: Translatology in the Digital Age (MoTra21). arXiv admin note: substantial text overlap with arXiv:1605.04515

arXiv:2104.13473

TRECVID 2020: A comprehensive campaign for evaluating video retrieval tasks across multiple application domains

Authors: George Awad, Asad A. Butt, Keith Curtis, Jonathan Fiscus, Afzal Godil, Yooyoung Lee, Andrew Delgado, Jesse Zhang, Eliot Godard, Baptiste Chocot, Lukas Diduch, Jeffrey Liu, Alan F. Smeaton, Yvette Graham, Gareth J. F. Jones, Wessel Kraaij, Georges Quenot

Abstract: The TREC Video Retrieval Evaluation (TRECVID) is a TREC-style video analysis and retrieval evaluation with the goal of promoting progress in research and development of content-based exploitation and retrieval of information from digital video via open, metrics-based evaluation. Over the last twenty years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID has been funded by NIST (National Institute of Standards and Technology) and other US government agencies. In addition, many organizations and individuals worldwide contribute significant time and effort. TRECVID 2020 represented a continuation of four tasks and the addition of two new tasks. In total, 29 teams from various research organizations worldwide completed one or more of the following six tasks: 1. Ad-hoc Video Search (AVS), 2. Instance Search (INS), 3. Disaster Scene Description and Indexing (DSDI), 4. Video to Text Description (VTT), 5. Activities in Extended Video (ActEV), 6. Video Summarization (VSUM). This paper is an introduction to the evaluation framework, tasks, data, and measures used in the evaluation campaign.

Submitted 27 April, 2021; originally announced April 2021.

Comments: TRECVID 2020 Workshop Overview Paper. arXiv admin note: substantial text overlap with arXiv:2009.09984

arXiv:2104.04501

Exploring Current User Web Search Behaviours in Analysis Tasks to be Supported in Conversational Search

Authors: Abhishek Kaushik, Gareth J. F. Jones

Abstract: Conversational search presents opportunities to support users in their search activities to improve the effectiveness and efficiency of search while reducing their cognitive load. Limitations of the potential competency of conversational agents restrict the situations for which conversational search agents can replace human intermediaries. It is thus more interesting, initially at least, to investigate opportunities for conversational interaction to support less complex information retrieval tasks, such as typical web search, which do not require human-level intelligence in the conversational agent. In order to move towards the development of a system to enable conversational search of this type, we need to understand the capabilities such a system requires. To progress our understanding of these, we report a study examining the behaviour of users when using a standard web search engine, designed to enable us to identify opportunities to support their search activities using a conversational agent.

Submitted 9 April, 2021; originally announced April 2021.

Comments: Accepted in SIGIR 2018 Second International Workshop on Conversational Approaches to Information Retrieval (CAIR 18), July 12, 2018, Ann Arbor, Michigan, USA

arXiv:2104.04497

Chinese Character Decomposition for Neural MT with Multi-Word Expressions

Authors: Lifeng Han, Gareth J. F. Jones, Alan F. Smeaton, Paolo Bolzoni

Abstract: Chinese character decomposition has been used as a feature to enhance Machine Translation (MT) models, combining radicals into character- and word-level models. Recent work has investigated ideograph- or stroke-level embedding. However, questions remain about which decomposition levels of Chinese character representations, radicals and strokes, are best suited for MT. To investigate the impact of Chinese decomposition embedding in detail, i.e., at radical, stroke, and intermediate levels, and how well these decompositions represent the meaning of the original character sequences, we carry out analysis with both automated and human evaluation of MT. Furthermore, we investigate whether the combination of decomposed Multiword Expressions (MWEs) can enhance model learning. MWE integration into MT has seen more than a decade of exploration. However, decomposed MWEs have not previously been explored.

Submitted 9 April, 2021; originally announced April 2021.

Comments: Accepted for publication at NoDaLiDa 2021

arXiv:2104.03940

A Conceptual Framework for Implicit Evaluation of Conversational Search Interfaces

Authors: Abhishek Kaushik, Gareth J. F. Jones

Abstract: Conversational search (CS) has recently become a significant focus of the information retrieval (IR) research community. Multiple studies have been conducted which explore the concept of conversational search. Understanding and advancing research in CS requires careful and detailed evaluation. Existing CS studies have been limited to evaluation based on simple user feedback on task completion. We propose a CS evaluation framework which includes multiple dimensions: search experience, knowledge gain, software usability, cognitive load and user experience, based on studies of conversational systems and IR. We introduce these evaluation criteria and propose their use in a framework for the evaluation of CS systems.

Submitted 8 April, 2021; originally announced April 2021.

Comments: Accepted in MICROS (Mixed-Initiative ConveRsatiOnal Systems) Workshop at 43rd European Conference on Information Retrieval

arXiv:2103.15953

TREC 2020 Podcasts Track Overview

Authors: Rosie Jones, Ben Carterette, Ann Clifton, Maria Eskevich, Gareth J. F. Jones, Jussi Karlgren, Aasish Pappu, Sravana Reddy, Yongze Yu

Abstract: The Podcast Track is new at the Text Retrieval Conference (TREC) in 2020. The podcast track was designed to encourage research into podcasts in the information retrieval and NLP research communities. The track consisted of two shared tasks, segment retrieval and summarization, both based on a dataset of over 100,000 podcast episodes (metadata, audio, and automatic transcripts) which was released concurrently with the track. The track generated considerable interest and attracted hundreds of new registrations to TREC; fifteen teams, mostly disjoint between search and summarization, made final submissions for assessment. Deep learning was the dominant experimental approach for both search and summarization experiments. This paper gives an overview of the tasks and the results of the participants' experiments. The track will return to TREC 2021 with the same two tasks, incorporating slight modifications in response to participant feedback.

Submitted 29 March, 2021; originally announced March 2021.

Journal ref: The Proceedings of the Twenty-Ninth Text REtrieval Conference Proceedings (TREC 2020)

arXiv:2006.15679

Kernel Density Estimation based Factored Relevance Model for Multi-Contextual Point-of-Interest Recommendation

Authors: Anirban Chakraborty, Debasis Ganguly, Annalina Caputo, Gareth J. F. Jones

Abstract: An automated contextual suggestion algorithm is likely to recommend contextually appropriate and personalized 'points-of-interest' (POIs) to a user if it can extract information from the user's preference history (exploitation) and effectively blend it with the user's current contextual information (exploration) to predict a POI's 'appropriateness' in the current context. To balance this trade-off between exploitation and exploration, we propose an unsupervised, generic framework involving a factored relevance model (FRLM), consisting of two distinct components, one pertaining to historical contexts and the other corresponding to the current context. We further generalize the proposed FRLM by incorporating the semantic relationships between terms in POI descriptors using kernel density estimation (KDE) on embedded word vectors. Additionally, we show that trip-qualifiers (e.g., 'trip-type', 'accompanied-by') are potentially useful information sources that could be used to improve recommendation effectiveness. Using such information is not straightforward, since users' texts/reviews of visited POIs typically do not explicitly contain such annotations. We undertake a weakly supervised approach to predict the associations between the review texts in a user profile and the likely trip contexts. Our experiments, conducted on the TREC Contextual Suggestion 2016 dataset, demonstrate that factorization, KDE-based generalizations, and trip-qualifier enriched contexts of the relevance model improve POI recommendation.

Submitted 25 November, 2021; v1 submitted 28 June, 2020; originally announced June 2020.

Comments: To appear in Information Retrieval Journal

arXiv:2006.03022

Response to LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts

Authors: Hao Wu, Gareth J. F. Jones, Francois Pitie

Abstract: Live video commenting systems are an emerging feature of online video sites. Recently the Chinese video sharing platform Bilibili has popularised a novel captioning system where user comments are displayed as streams of moving subtitles overlaid on the video playback screen and broadcast to all viewers in real time. LiveBot was recently introduced as a novel Automatic Live Video Commenting (ALVC) application. This enables the automatic generation of live video comments from both the existing video stream and existing viewers' comments. In seeking to reproduce the baseline results reported in the original LiveBot paper, we found differences between the results reproduced using the project codebase and the numbers reported in the paper. Further examination suggests that these may be caused by a number of small issues in the project code, including a non-obvious overlap between the training and test sets. In this paper, we study these discrepancies in detail and propose an alternative baseline implementation as a reference for other researchers in this field.

Submitted 4 June, 2020; originally announced June 2020.

Comments: 4 pages, 2 figures

Report number: 06-04

arXiv:2005.10583

MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora

Authors: Lifeng Han, Gareth J. F. Jones, Alan F. Smeaton

Abstract: Multi-word expressions (MWEs) are a hot topic in natural language processing (NLP) research, including MWE detection, MWE decomposition, and the exploitation of MWEs in other NLP fields such as Machine Translation. However, the availability of bilingual or multi-lingual MWE corpora is very limited. The only bilingual MWE corpus that we are aware of is from the PARSEME (PARSing and Multi-word Expressions) EU Project. This is a small collection of only 871 pairs of English-German MWEs. In this paper, we present multi-lingual and bilingual MWE corpora that we have extracted from root parallel corpora. After filtering, our collections contain 3,159,226 and 143,042 bilingual MWE pairs for German-English and Chinese-English respectively. We examine the quality of these extracted bilingual MWEs in MT experiments. Our initial experiments applying MWEs in MT show improved translation performance on MWE terms in qualitative analysis and better general evaluation scores in quantitative analysis, on both German-English and Chinese-English language pairs. We follow a standard experimental pipeline to create our MultiMWE corpora, which are available online. Researchers can use this free corpus for their own models or use it in a knowledge base as model features.

Submitted 21 May, 2020; originally announced May 2020.

Comments: Accepted to LREC2020

arXiv:1906.06147

Grounding Object Detections With Transcriptions

Authors: Yasufumi Moriya, Ramon Sanabria, Florian Metze, Gareth J. F. Jones

Abstract: A vast amount of audio-visual data is available on the Internet thanks to video streaming services, to which users upload their content. However, exploiting this data for supervised statistical models is difficult due to the lack of labels. Unfortunately, generating labels for such amounts of data through human annotation can be expensive, time-consuming and prone to annotation errors. In this paper, we propose a method to automatically extract entity-video frame pairs from a collection of instruction videos by using speech transcriptions and videos. We conduct experiments on image recognition and visual grounding tasks on the automatically constructed entity-video frame dataset of How2. The models will be evaluated on a new manually annotated portion of the How2 dev5 and val sets and on the Flickr30k dataset. This work constitutes a first step towards meta-algorithms capable of automatically constructing task-specific training sets.

Submitted 28 July, 2019; v1 submitted 12 June, 2019; originally announced June 2019.

arXiv:1606.07869

Representing Documents and Queries as Sets of Word Embedded Vectors for Information Retrieval

Authors: Dwaipayan Roy, Debasis Ganguly, Mandar Mitra, Gareth J. F. Jones

Abstract: A major difficulty in applying word vector embeddings in IR lies in devising an effective and efficient strategy for obtaining representations of compound units of text, such as whole documents (in comparison to atomic words), for the purpose of indexing and scoring documents. Rather than seeking a suitable method for obtaining a single vector representation of a large document of text, we aim to develop a similarity metric that makes use of the similarities between the individual embedded word vectors in a document and a query. More specifically, we represent a document and a query as sets of word vectors, and use a standard notion of similarity measure between these sets, computed as a function of the similarities between each constituent word pair from these sets. We then use this similarity measure in combination with standard IR-based similarities for document ranking. The results of our initial experimental investigations show that our proposed method improves MAP by up to $5.77\%$, in comparison to standard text-based language model similarity, on the TREC ad-hoc dataset.

Submitted 25 June, 2016; originally announced June 2016.

Comments: Neu-IR '16 SIGIR Workshop on Neural Information Retrieval July 21, 2016, Pisa, Italy
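The abstract describes a set-to-set similarity built from word-pair similarities, but does not spell out the aggregation function here. A minimal sketch, assuming the set similarity is the mean cosine similarity over all (query word, document word) pairs and that it is linearly interpolated with a text-based score through a hypothetical weight `alpha`. The paper's actual aggregation and combination may differ, and the 2-d embeddings are invented.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def set_similarity(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Assumed aggregation: mean cosine over every query-word/document-word pair."""
    if not query_vecs or not doc_vecs:
        return 0.0
    total = sum(cosine(q, d) for q in query_vecs for d in doc_vecs)
    return total / (len(query_vecs) * len(doc_vecs))


def combined_score(text_score: float, embed_score: float, alpha: float = 0.5) -> float:
    """Hypothetical linear interpolation of a text-based score and the set similarity."""
    return alpha * text_score + (1 - alpha) * embed_score


# Toy 2-d "embeddings", made up for illustration only
emb: Dict[str, Vector] = {
    "river": [1.0, 0.0],
    "bank": [0.8, 0.6],
    "money": [0.0, 1.0],
    "water": [0.9, 0.1],
}
query = [emb["river"], emb["bank"]]
doc_a = [emb["water"], emb["bank"]]
doc_b = [emb["money"]]
print(set_similarity(query, doc_a), set_similarity(query, doc_b))
```

Averaging over all pairs is the simplest choice but dilutes strong matches in long documents; a max-based or clustered aggregation is a common alternative, which is one reason to treat this as a sketch rather than the paper's method.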

arXiv:1312.1913

Adapting Binary Information Retrieval Evaluation Metrics for Segment-based Retrieval Tasks

Authors: Robin Aly, Maria Eskevich, Roeland Ordelman, Gareth J. F. Jones

Abstract: This report describes metrics for evaluating the effectiveness of segment-based retrieval, based on existing binary information retrieval metrics. These metrics are described in the context of a task for the hyperlinking of video segments. This evaluation approach re-uses existing evaluation measures from the standard Cranfield evaluation paradigm. Our adaptation approach can in principle be used with any kind of effectiveness measure that uses binary relevance, and for other segment-based retrieval tasks. In our video hyperlinking setting, we use precision at a cut-off rank n and mean average precision.

Submitted 6 December, 2013; originally announced December 2013.

Comments: Explanation of evaluation measures for the linking task of the MediaEval Workshop 2013
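The adaptation step, turning retrieved video segments into binary relevance labels so that standard precision at n and (mean) average precision apply unchanged, can be sketched as follows. The matching rule here is an assumption for illustration, not the report's definition: a retrieved segment counts as relevant if it comes from the same video as a not-yet-matched relevant segment and starts within a hypothetical `tolerance` of 60 seconds of it. The data are made up.

```python
from typing import List, Sequence, Tuple

# A segment is identified by (video id, start time in seconds).
Segment = Tuple[str, float]


def binarize(ranking: Sequence[Segment], relevant: Sequence[Segment],
             tolerance: float = 60.0) -> List[bool]:
    """Label each retrieved segment relevant (True) or not (False).

    Hypothetical rule: same video and start time within `tolerance` seconds of a
    relevant segment. Each relevant segment is matched at most once, so
    near-duplicate results are not rewarded twice.
    """
    unmatched = list(relevant)
    labels: List[bool] = []
    for video, start in ranking:
        hit = next((seg for seg in unmatched
                    if seg[0] == video and abs(seg[1] - start) <= tolerance), None)
        if hit is not None:
            unmatched.remove(hit)
        labels.append(hit is not None)
    return labels


def precision_at(labels: Sequence[bool], n: int) -> float:
    """Binary precision at cut-off rank n (missing ranks count as non-relevant)."""
    return sum(labels[:n]) / n


def average_precision(labels: Sequence[bool], num_relevant: int) -> float:
    """Binary average precision, normalised by the number of relevant segments."""
    hits, total = 0, 0.0
    for rank, is_relevant in enumerate(labels, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / num_relevant if num_relevant else 0.0


def mean_average_precision(ap_scores: Sequence[float]) -> float:
    """Mean of per-query average precision values."""
    return sum(ap_scores) / len(ap_scores) if ap_scores else 0.0


# One toy query with two relevant segments
relevant = [("v1", 120.0), ("v2", 30.0)]
ranking = [("v1", 150.0), ("v3", 0.0), ("v1", 130.0), ("v2", 200.0), ("v2", 45.0)]
labels = binarize(ranking, relevant)
print(labels, precision_at(labels, 5), average_precision(labels, len(relevant)))
```

For this toy query the labels are [True, False, False, False, True], giving P@5 = 0.4 and AP = (1/1 + 2/5) / 2 = 0.7. The third result (v1 at 130 s) is marked non-relevant because its only nearby relevant segment was already matched; once segments are binarized this way, any binary-relevance measure can be reused as the abstract describes.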

Showing 1–17 of 17 results for author: Jones, G J F