On Correlating Factors for Domain Adaptation Performance

Abstract: Dense retrievers have demonstrated significant potential for neural information retrieval; however, they lack robustness to domain shifts, limiting their efficacy in zero-shot settings across diverse domains. In this paper, we set out to analyze the possible factors that lead to successful domain adaptation of dense retrievers. We include domain similarity proxies between the generated queries and the test and source domains. Furthermore, we conduct a case study comparing two powerful domain adaptation techniques. We find that the distribution of generated query types is an important factor, and that generating queries from a domain similar to that of the test documents improves the performance of domain adaptation methods. This study further emphasizes the importance of domain-tailored generated queries.

Submitted 24 January, 2025; originally announced January 2025.

arXiv:2501.14459

Interpretability Analysis of Domain Adapted Dense Retrievers

Authors: Goksenin Yuksel, Jaap Kamps

Abstract: Dense retrievers have demonstrated significant potential for neural information retrieval; however, they exhibit a lack of robustness to domain shifts, thereby limiting their efficacy in zero-shot settings across diverse domains. Previous research has investigated unsupervised domain adaptation techniques to adapt dense retrievers to target domains. However, these studies have not focused on explainability analysis to understand how such adaptations alter the model's behavior. In this paper, we propose utilizing the integrated gradients framework to develop an interpretability method that provides both instance-based and ranking-based explanations for dense retrievers. To generate these explanations, we introduce a novel baseline that reveals both query and document attributions. This method is used to analyze the effects of domain adaptation on input attributions for query and document tokens across two datasets: the financial question answering dataset (FIQA) and the biomedical information retrieval dataset (TREC-COVID). Our visualizations reveal that domain-adapted models focus more on in-domain terminology compared to non-adapted models, exemplified by terms such as "hedge," "gold," "corona," and "disease." This research addresses how unsupervised domain adaptation techniques influence the behavior of dense retrievers when adapted to new domains. Additionally, we demonstrate that integrated gradients are a viable choice for explaining and analyzing the internal mechanisms of these opaque neural models.

Submitted 24 January, 2025; originally announced January 2025.
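The integrated-gradients attribution this work builds on can be illustrated on a toy model. This is a minimal sketch under stated assumptions, not the paper's method: the score is a hypothetical bi-encoder-style dot product between a query vector and a document vector, gradients are written out analytically, the path integral is approximated with a midpoint Riemann sum, and an all-zeros baseline stands in for the paper's novel baseline, which is not reproduced here.

```python
from typing import Callable, List

Vector = List[float]


def integrated_gradients(
    grad_fn: Callable[[Vector], Vector],
    x: Vector,
    baseline: Vector,
    steps: int = 50,
) -> Vector:
    """Approximate integrated gradients with a midpoint Riemann sum.

    IG_i = (x_i - b_i) * integral over a in [0, 1] of dF/dx_i(b + a * (x - b))
    """
    diff = [xi - bi for xi, bi in zip(x, baseline)]
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [bi + alpha * di for bi, di in zip(baseline, diff)]
        for i, g in enumerate(grad_fn(point)):
            total[i] += g
    return [di * t / steps for di, t in zip(diff, total)]


# Hypothetical bi-encoder-style score: dot product of a query vector and a
# document vector, packed into one input so IG attributes the score to
# query and document dimensions at the same time.
DIM = 3


def score(z: Vector) -> float:
    q, d = z[:DIM], z[DIM:]
    return sum(qi * di for qi, di in zip(q, d))


def score_grad(z: Vector) -> Vector:
    q, d = z[:DIM], z[DIM:]
    # d(score)/dq = d and d(score)/dd = q
    return list(d) + list(q)


query = [1.0, 0.5, -2.0]
doc = [0.2, 4.0, 1.0]
x = query + doc
baseline = [0.0] * (2 * DIM)  # placeholder baseline, not the paper's
attributions = integrated_gradients(score_grad, x, baseline)
query_attr, doc_attr = attributions[:DIM], attributions[DIM:]
```

Because query and document dimensions are both part of the input, the attributions split into `query_attr` and `doc_attr`. Their sum should match `score(x) - score(baseline)` (the completeness property of integrated gradients), which is a useful sanity check when a real encoder is swapped in.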

arXiv:2501.14434

Remining Hard Negatives for Generative Pseudo Labeled Domain Adaptation

Authors: Goksenin Yuksel, David Rau, Jaap Kamps

Abstract: Dense retrievers have demonstrated significant potential for neural information retrieval; however, they exhibit a lack of robustness to domain shifts, thereby limiting their efficacy in zero-shot settings across diverse domains. A state-of-the-art domain adaptation technique is Generative Pseudo Labeling (GPL). GPL uses synthetic query generation and initially mined hard negatives to distill knowledge from a cross-encoder to dense retrievers in the target domain. In this paper, we analyze the documents retrieved by the domain-adapted model and discover that these are more relevant to the target queries than those of the non-domain-adapted model. We then propose refreshing the hard-negative index during the knowledge distillation phase to mine better hard negatives. Our re-mining R-GPL approach boosts ranking performance in 13/14 BEIR datasets and 9/12 LoTTE datasets. Our contributions are (i) analyzing hard negatives returned by domain-adapted and non-domain-adapted models and (ii) applying GPL training with and without hard-negative re-mining on the LoTTE and BEIR datasets.

Submitted 24 January, 2025; originally announced January 2025.
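The re-mining idea can be sketched as periodically rebuilding the negative-mining index with the encoder that is being trained. This is a minimal illustration under stated assumptions: a hypothetical weighted bag-of-words `make_encoder` replaces a dense retriever, brute-force dot-product search replaces an approximate nearest-neighbor index, the cross-encoder distillation step between re-mining rounds is omitted, and the change in model weights is simply posited.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Encoder = Callable[[str], Vector]

VOCAB = ["gold", "hedge", "fund", "rain"]  # toy vocabulary (hypothetical)


def make_encoder(weights: Sequence[float]) -> Encoder:
    """Hypothetical stand-in for a dense retriever: weighted term counts."""
    def encode(text: str) -> Vector:
        tokens = text.lower().split()
        return [w * tokens.count(term) for w, term in zip(weights, VOCAB)]
    return encode


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_index(encode: Encoder, corpus: Dict[str, str]) -> Dict[str, Vector]:
    """(Re-)encode every document with the current model."""
    return {doc_id: encode(text) for doc_id, text in corpus.items()}


def mine_hard_negatives(encode: Encoder, index: Dict[str, Vector],
                        query: str, positive_id: str, k: int) -> List[str]:
    """Return the k top-scoring documents other than the positive."""
    q = encode(query)
    ranked = sorted(index, key=lambda d: dot(q, index[d]), reverse=True)
    return [d for d in ranked if d != positive_id][:k]


corpus = {
    "d1": "gold hedge fund",
    "d2": "gold price",
    "d3": "rain forecast",
    "d4": "hedge fund fees",
}

# GPL-style: negatives mined once with the initial model, then kept fixed.
initial = make_encoder([1.0, 1.0, 1.0, 1.0])
stale = mine_hard_negatives(initial, build_index(initial, corpus), "gold hedge", "d1", k=1)

# R-GPL-style: partway through distillation the model has changed (here we
# simply pretend it now weights "hedge" more), so the index is rebuilt with
# the updated encoder before mining again.
updated = make_encoder([1.0, 3.0, 1.0, 1.0])
fresh = mine_hard_negatives(updated, build_index(updated, corpus), "gold hedge", "d1", k=1)
```

With the stale index, the top negative for "gold hedge" is "gold price" (`d2`); after re-encoding, the harder "hedge fund fees" (`d4`) surfaces instead. How often to rebuild is a cost trade-off, since every refresh re-encodes the whole corpus.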

arXiv:2207.02522

doi 10.1145/3539813.3545144

The Role of Complex NLP in Transformers for Text Ranking?

Authors: David Rau, Jaap Kamps

Abstract: Even though term-based methods such as BM25 provide strong baselines in ranking, under certain conditions they are dominated by large pre-trained masked language models (MLMs) such as BERT. To date, the source of their effectiveness remains unclear. Is it their ability to truly understand meaning by modeling syntactic aspects? We answer this by manipulating the input order and position information in a way that destroys the natural sequence order of query and passage, and show that the model still achieves comparable performance. Overall, our results highlight that syntactic aspects do not play a critical role in the effectiveness of re-ranking with BERT. We point to other mechanisms, such as query-passage cross-attention and richer embeddings that capture word meanings based on aggregated context regardless of word order, as the main sources of its superior performance.

Submitted 6 July, 2022; originally announced July 2022.

Comments: Proceedings of the 2022 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR '22)
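One way to destroy the natural order of query and passage, as described above, is to shuffle the tokens of each segment independently before they reach the re-ranker. This is a minimal sketch assuming whitespace tokenization; the hypothetical `shuffle_pair` helper does not reproduce the paper's exact manipulations of input order and position information.

```python
import random
from typing import List, Tuple


def shuffle_pair(query: str, passage: str, seed: int = 0) -> Tuple[str, str]:
    """Destroy word order within the query and within the passage.

    Tokens are shuffled independently per segment, so each segment keeps
    exactly the same bag of words; only the sequence order changes.
    """
    rng = random.Random(seed)  # seeded for reproducible perturbations
    q_tokens: List[str] = query.split()
    p_tokens: List[str] = passage.split()
    rng.shuffle(q_tokens)
    rng.shuffle(p_tokens)
    return " ".join(q_tokens), " ".join(p_tokens)


original_q = "what causes tides"
original_p = "tides are caused by the moon's gravity"
shuffled_q, shuffled_p = shuffle_pair(original_q, original_p, seed=13)
```

Scoring both the original and the shuffled pairs with the same cross-encoder and comparing the resulting rankings gives the kind of comparison the abstract describes: since each segment keeps its bag of words, any effectiveness that survives cannot come from word order within the query or passage.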

arXiv:2204.07233

How Different are Pre-trained Transformers for Text Ranking?

Authors: David Rau, Jaap Kamps

Abstract: In recent years, large pre-trained transformers have led to substantial gains in performance over traditional retrieval models and feedback approaches. However, these results are primarily based on the MS Marco/TREC Deep Learning Track setup, which is very particular, and our understanding of why and how these models work better is fragmented at best. We analyze effective BERT-based cross-encoders versus traditional BM25 ranking for the passage retrieval task, where the largest gains have been observed, and investigate two main questions. On the one hand, what is similar? To what extent does the neural ranker already encompass the capacity of traditional rankers? Is the gain in performance due to a better ranking of the same documents (prioritizing precision)? On the other hand, what is different? Can it effectively retrieve documents missed by traditional systems (prioritizing recall)? We discover substantial differences in the notion of relevance, identifying strengths and weaknesses of BERT that may inspire future research on improvements. Our results contribute to our understanding of (black-box) neural rankers relative to (well-understood) traditional rankers, and help understand the particular experimental setting of MS-Marco-based test collections.

Submitted 5 April, 2022; originally announced April 2022.

Comments: ECIR 2022

arXiv:2109.06707

A pragmatic approach to estimating average treatment effects from EHR data: the effect of prone positioning on mechanically ventilated COVID-19 patients

Authors: Adam Izdebski, Patrick J. Thoral, Robbert C. A. Lalisang, Dean M. McHugh, Diederik Gommers, Olaf L. Cremer, Rob J. Bosman, Sander Rigter, Evert-Jan Wils, Tim Frenzel, Dave A. Dongelmans, Remko de Jong, Marco A. A. Peters, Marlijn J. A Kamps, Dharmanand Ramnarain, Ralph Nowitzky, Fleur G. C. A. Nooteboom, Wouter de Ruijter, Louise C. Urlings-Strop, Ellen G. M. Smit, D. Jannet Mehagnoul-Schipper, Tom Dormans, Cornelis P. C. de Jager, Stefaan H. A. Hendriks, Sefanja Achterberg, et al. (21 additional authors not shown)

Abstract: Despite recent progress in the field of causal inference, there is to date no agreed-upon methodology for estimating treatment effects from observational data. The consequence for clinical practice is that, in the absence of results from a randomized trial, medical personnel are left without guidance on what is likely to be effective in a real-world scenario. This article proposes a pragmatic methodology to obtain preliminary but robust estimates of treatment effects from observational studies, to provide front-line clinicians with a degree of confidence in their treatment strategy. Our study design is applied to an open problem: estimating the treatment effect of the proning maneuver on COVID-19 intensive care patients.

Submitted 3 December, 2021; v1 submitted 14 September, 2021; originally announced September 2021.

arXiv:1810.05436

HiTR: Hierarchical Topic Model Re-estimation for Measuring Topical Diversity of Documents

Authors: Hosein Azarbonyad, Mostafa Dehghani, Tom Kenter, Maarten Marx, Jaap Kamps, Maarten de Rijke

Abstract: A high degree of topical diversity is often considered to be an important characteristic of interesting text documents. A recent proposal for measuring topical diversity identifies three distributions for assessing the diversity of documents: distributions of words within documents, words within topics, and topics within documents. Topic models play a central role in this approach and, hence, their quality is crucial to the efficacy of measuring topical diversity. The quality of topic models is affected by two factors: generality and impurity of topics. General topics only include common information from a background corpus and are assigned to most of the documents. Impure topics contain words that are not related to the topic. Impurity lowers the interpretability of topic models, and impure topics are likely to be assigned to documents erroneously. We propose a hierarchical re-estimation process aimed at removing generality and impurity. Our approach has three re-estimation components: (1) document re-estimation, which removes general words from the documents; (2) topic re-estimation, which re-estimates the distribution over words of each topic; and (3) topic assignment re-estimation, which re-estimates for each document its distributions over topics. For measuring the topical diversity of text documents, our HiTR approach improves over the state of the art on the PubMed dataset.

Submitted 12 October, 2018; originally announced October 2018.

Comments: IEEE Transactions on Knowledge and Data Engineering
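A common formulation of document diversity over topic-model output is Rao's quadratic diversity, which we assume here for illustration: div(d) = sum over topic pairs (i, j) of p(i|d) * p(j|d) * delta(i, j), where delta measures how different topics i and j are. The sketch uses Jensen-Shannon divergence between the topics' word distributions as delta (also an assumption). HiTR's three re-estimation steps would change the `doc_topics` and `topic_words` passed in; they are not implemented here.

```python
import math
from typing import List, Sequence

Dist = Sequence[float]


def kl(p: Dist, q: Dist) -> float:
    """Kullback-Leibler divergence, skipping zero-probability terms of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Dist, q: Dist) -> float:
    """Symmetric Jensen-Shannon divergence between two word distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rao_diversity(doc_topics: Dist, topic_words: List[Dist]) -> float:
    """Sum over topic pairs of p(i|d) * p(j|d) * distance(topic_i, topic_j)."""
    total = 0.0
    for i, pi in enumerate(doc_topics):
        for j, pj in enumerate(doc_topics):
            total += pi * pj * js_divergence(topic_words[i], topic_words[j])
    return total


# Two hypothetical topics over a three-word vocabulary.
topic_words = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9]]
focused = rao_diversity([1.0, 0.0], topic_words)  # one topic only
mixed = rao_diversity([0.5, 0.5], topic_words)    # split across both
```

A document spread over two dissimilar topics scores higher than one concentrated on a single topic, which is why the quality of the underlying topic and topic-assignment distributions matters so much for this kind of measure.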

arXiv:1806.08694

Learning to Rank from Samples of Variable Quality

Authors: Mostafa Dehghani, Jaap Kamps

Abstract: Training deep neural networks requires many training samples, but in practice, training labels are expensive to obtain and may be of varying quality, as some may come from trusted expert labelers while others might come from heuristics or other sources of weak supervision such as crowd-sourcing. This creates a fundamental quality-versus-quantity trade-off in the learning process. Do we learn from the small amount of high-quality data or from the potentially large amount of weakly-labeled data? We argue that if the learner could somehow know and take the label quality into account when learning the data representation, we could get the best of both worlds. To this end, we introduce "fidelity-weighted learning" (FWL), a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data. FWL modulates the parameter updates to a student network (trained on the task we care about) on a per-sample basis according to the posterior confidence of its label quality estimated by a teacher (who has access to the high-quality labels). Both student and teacher are learned from the data. We evaluate FWL on document ranking, where we outperform state-of-the-art alternative semi-supervised methods.

Submitted 21 June, 2018; originally announced June 2018.

Comments: Presented at The First International SIGIR2016 Workshop on Learning From Limited Or Noisy Data For Information Retrieval. arXiv admin note: substantial text overlap with arXiv:1711.02799

arXiv:1711.11383

Learning to Learn from Weak Supervision by Full Supervision

Authors: Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps

Abstract: In this paper, we propose a method for training neural networks when we have a large set of data with weak labels and a small amount of data with true labels. In our proposed model, we train two neural networks: a target network (the learner) and a confidence network (the meta-learner). The target network is optimized to perform a given task and is trained using a large set of unlabeled data that are weakly annotated. We propose to control the magnitude of the gradient updates to the target network using the scores provided by the confidence network, which is trained on a small amount of supervised data. This prevents weight updates computed from noisy labels from harming the quality of the target network model.

Submitted 30 November, 2017; originally announced November 2017.

Comments: Accepted at NIPS Workshop on Meta-Learning (MetaLearn 2017), Long Beach, CA, USA

arXiv:1711.05603

Words are Malleable: Computing Semantic Shifts in Political and Media Discourse

Authors: Hosein Azarbonyad, Mostafa Dehghani, Kaspar Beelen, Alexandra Arkut, Maarten Marx, Jaap Kamps

Abstract: Recently, researchers have started to pay attention to the detection of temporal shifts in the meaning of words. However, most (if not all) of these approaches have restricted their efforts to uncovering change over time, thus neglecting other valuable dimensions such as social or political variability. We propose an approach for detecting semantic shifts between different viewpoints, broadly defined as sets of texts that share a specific metadata feature, which can be a time period but also a social entity such as a political party. For each viewpoint, we learn a semantic space in which each word is represented as a low-dimensional neural embedding vector. The challenge is to compare the meaning of a word in one space to its meaning in another space and to measure the size of the semantic shift. We compare the effectiveness of a measure based on optimal transformations between the two spaces with a measure based on the similarity of the word's neighbors in the respective spaces. Our experiments demonstrate that the combination of these two performs best. We show that semantic shifts occur not only over time but also across viewpoints within a short period of time. For evaluation, we demonstrate how this approach captures meaningful semantic shifts and can help improve other tasks, such as contrastive viewpoint summarization and ideology detection (measured as classification accuracy) in political texts.
We also show that the two laws of semantic change that were empirically shown to hold for temporal shifts also hold for shifts across viewpoints. These laws state that frequent words are less likely to shift meaning, while words with many senses are more likely to do so.

Submitted 15 November, 2017; originally announced November 2017.

Comments: In Proceedings of the 26th ACM International Conference on Information and Knowledge Management (CIKM2017)
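The neighbor-based measure mentioned above has the advantage that the two viewpoint spaces never need to be aligned, because only the sets of nearest neighbors are compared. This is a minimal sketch under stated assumptions: cosine similarity, a Jaccard overlap of top-k neighbor sets, and tiny hand-made 2-D embeddings for two hypothetical viewpoints. The transformation-based measure, which learns a mapping between the spaces, is not shown.

```python
import math
from typing import Dict, List, Sequence

Space = Dict[str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def neighbors(space: Space, word: str, k: int) -> List[str]:
    """Top-k most similar other words within a single viewpoint's space."""
    others = [w for w in space if w != word]
    others.sort(key=lambda w: cosine(space[word], space[w]), reverse=True)
    return others[:k]


def neighbor_shift(space_a: Space, space_b: Space, word: str, k: int = 2) -> float:
    """1 - Jaccard overlap of the word's top-k neighbors across viewpoints."""
    near_a = set(neighbors(space_a, word, k))
    near_b = set(neighbors(space_b, word, k))
    return 1.0 - len(near_a & near_b) / len(near_a | near_b)


# Hypothetical toy embeddings for two viewpoints (e.g. two parties).
party_a = {"tax": [1.0, 0.1], "burden": [0.9, 0.2], "cut": [0.8, 0.0],
           "investment": [0.0, 1.0], "public": [0.1, 0.9]}
party_b = dict(party_a, tax=[0.1, 1.0])  # only "tax" moves

shift_tax = neighbor_shift(party_a, party_b, "tax")
shift_same = neighbor_shift(party_a, party_a, "tax")
```

In `party_b`, "tax" sits next to "public" and "investment" instead of "cut" and "burden", so its neighbor sets are disjoint and the shift is maximal (1.0); comparing a space with itself yields 0.0.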

arXiv:1711.02799

Fidelity-Weighted Learning

Authors: Mostafa Dehghani, Arash Mehrjou, Stephan Gouws, Jaap Kamps, Bernhard Schölkopf

Abstract: Training deep neural networks requires many training samples, but in practice training labels are expensive to obtain and may be of varying quality, as some may come from trusted expert labelers while others might come from heuristics or other sources of weak supervision such as crowd-sourcing. This creates a fundamental quality-versus-quantity trade-off in the learning process. Do we learn from the small amount of high-quality data or from the potentially large amount of weakly-labeled data? We argue that if the learner could somehow know and take the label quality into account when learning the data representation, we could get the best of both worlds. To this end, we propose "fidelity-weighted learning" (FWL), a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data. FWL modulates the parameter updates to a student network (trained on the task we care about) on a per-sample basis according to the posterior confidence of its label quality estimated by a teacher (who has access to the high-quality labels). Both student and teacher are learned from the data. We evaluate FWL on two tasks in information retrieval and natural language processing, where we outperform state-of-the-art alternative semi-supervised methods, indicating that our approach makes better use of strong and weak labels and leads to better task-dependent data representations.

Submitted 23 May, 2018; v1 submitted 7 November, 2017; originally announced November 2017.

Comments: Published as a conference paper at ICLR 2018
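FWL's per-sample modulation of the student's updates can be illustrated with a one-parameter student. This is a minimal sketch under loud assumptions: the teacher's posterior variance for each weak label is simply given (in FWL it is estimated by a teacher with access to the high-quality labels), and the hypothetical `fidelity_lr` uses an exponential down-weighting that may differ from the paper's exact rule.

```python
import math
from typing import List, Tuple

# (input x, weak label y, teacher posterior variance for that label)
Sample = Tuple[float, float, float]


def fidelity_lr(base_lr: float, teacher_variance: float, beta: float) -> float:
    """Shrink the step size for samples whose label the teacher distrusts.

    Assumed form: base_lr * exp(-beta * variance); beta = 0 disables it.
    """
    return base_lr * math.exp(-beta * teacher_variance)


def train_student(samples: List[Sample], base_lr: float, beta: float,
                  epochs: int = 50) -> float:
    """Per-sample SGD on squared error for a 1-D linear student y = w * x."""
    w = 0.0
    for _ in range(epochs):
        for x, y, variance in samples:
            grad = 2.0 * (w * x - y) * x  # d/dw of (w * x - y) ** 2
            w -= fidelity_lr(base_lr, variance, beta) * grad
    return w


# Hypothetical data following y = 2x, plus one badly wrong weak label that
# the (here simply assumed) teacher is very uncertain about.
samples = [(1.0, 2.0, 0.01), (2.0, 4.0, 0.01), (1.5, 3.0, 0.01), (1.0, -10.0, 2.0)]

w_plain = train_student(samples, base_lr=0.05, beta=0.0)  # every label trusted
w_fwl = train_student(samples, base_lr=0.05, beta=5.0)    # fidelity-weighted
```

With `beta=0.0` the noisy label drags the student far from the true slope; with `beta=5.0` that sample's step size becomes negligible and the student recovers a slope close to 2.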

arXiv:1711.00313

Avoiding Your Teacher's Mistakes: Training Neural Networks with Controlled Weak Supervision

Authors: Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps

Abstract: Training deep neural networks requires massive amounts of training data, but for many tasks only limited labeled data is available. This makes weak supervision attractive, using weak or noisy signals like the output of heuristic methods or user click-through data for training. In a semi-supervised setting, we can use a large set of data with weak labels to pretrain a neural network and then fine-tune the parameters with a small amount of data with true labels. This feels intuitively sub-optimal, as these two independent stages leave the model unaware of the varying label quality. What if we could somehow inform the model about the label quality? In this paper, we propose a semi-supervised learning method where we train two neural networks in a multi-task fashion: a "target network" and a "confidence network". The target network is optimized to perform a given task and is trained using a large set of unlabeled data that are weakly annotated. We propose to weight the gradient updates to the target network using the scores provided by the confidence network, which is trained on a small amount of supervised data. This prevents weight updates computed from noisy labels from harming the quality of the target network model. We evaluate our learning strategy on two different tasks: document ranking and sentiment classification. The results demonstrate that our approach not only enhances performance compared to the baselines but also speeds up learning from weak labels.

Submitted 7 December, 2017; v1 submitted 1 November, 2017; originally announced November 2017.

arXiv:1711.00310

doi 10.1145/3121050.3121105

On Search Powered Navigation

Authors: Mostafa Dehghani, Glorianna Jagfeld, Hosein Azarbonyad, Alex Olieman, Jaap Kamps, Maarten Marx

Abstract: Query-based searching and browsing-based navigation are the two main components of exploratory search. Search lets users dig in deep by controlling their actions to focus on and find just the information they need, whereas navigation helps them get an overview to decide which content is most important. In this paper, we introduce the concept of "search powered navigation" and investigate how empowering navigation with search functionality affects users' information-seeking behavior and experience, through a user study on exploratory search tasks that differ in the type of information need. Our main findings are as follows. First, we observe radically different search tactics. Using search, users are able to control and augment their search focus, hence they explore the data in a depth-first, bottom-up manner. Conversely, using pure navigation, they tend to check different options to be able to decide on their path into the data, which corresponds to a breadth-first, top-down exploration. Second, we observe a general natural tendency to combine aspects of search and navigation; however, our experiments show that search functionality is essential for solving exploratory search tasks that require finding documents related to a narrow domain. Third, we observe a natural need for search powered navigation: users of a system without search functionality find creative ways to mimic searching using navigation.

Submitted 1 November, 2017; originally announced November 2017.

Comments: Accepted for publication in ACM SIGIR International Conference on the Theory of Information Retrieval

arXiv:1710.01127

Finding Talk About the Past in the Discourse of Non-Historians

Authors: Alex Olieman, Kaspar Beelen, Jaap Kamps

Abstract: A heightened interest in the presence of the past has given rise to the new field of memory studies, but there is a lack of search and research tools to support studying how and why the past is evoked in diachronic discourses. Searching for temporal references is not straightforward. It entails bridging the gap between conceptually-based information needs on one side and term-based inverted indexes on the other. Our approach enables the search for references to (intersubjective) historical periods in diachronic corpora. It consists of a semantically-enhanced search engine that is able to find references to many entities at a time, combined with a novel interface that invites its user to actively sculpt the search result set. Until now we have been concerned mostly with user-friendly retrieval and selection of sources, but our tool can also contribute to existing efforts to create reusable linked data from and for research in the humanities.

Submitted 3 October, 2017; originally announced October 2017.

Comments: Presented at Drift-a-LOD 2017

arXiv:1708.01162

Good Applications for Crummy Entity Linkers? The Case of Corpus Selection in Digital Humanities

Authors: Alex Olieman, Kaspar Beelen, Milan van Lange, Jaap Kamps, Maarten Marx

Abstract: Over the last decade we have made great progress in entity linking (EL) systems, but performance may vary depending on the context and, arguably, there are even principled limitations preventing a "perfect" EL system. This also suggests that there may be applications for which current "imperfect" EL is already very useful, and makes finding the "right" application as important as building the "right" EL system. We investigate the Digital Humanities use case, where scholars spend a considerable amount of time selecting relevant source texts. We developed WideNet, a semantically-enhanced search tool that leverages the strengths of (imperfect) EL without getting in the way of its expert users. We evaluate this tool in two historical case studies aiming to collect a set of references to historical periods in parliamentary debates from the last two decades; the first targeted the Dutch Golden Age, and the second World War II. The case studies conclude with a critical reflection on the utility of WideNet for this kind of research, after which we outline how such a real-world application can help to improve EL technology in general.

Submitted 3 August, 2017; originally announced August 2017.

Comments: Accepted for presentation at SEMANTiCS '17

arXiv:1707.07605

Share your Model instead of your Data: Privacy Preserving Mimic Learning for Ranking

Authors: Mostafa Dehghani, Hosein Azarbonyad, Jaap Kamps, Maarten de Rijke

Abstract: Deep neural networks have become a primary tool for solving problems in many fields. They are also used for addressing information retrieval problems and show strong performance in several tasks. Training these models requires large, representative datasets, and for most IR tasks such data contain sensitive information from users. Privacy and confidentiality concerns prevent many data owners from sharing the data, so the research community currently benefits from large-scale datasets only to a limited extent. In this paper, we discuss privacy preserving mimic learning, i.e., using predictions from a privacy preserving trained model instead of labels from the original sensitive training data as a supervision signal. We present the results of preliminary experiments in which we apply mimic learning and privacy preserving mimic learning to document re-ranking, one of the core IR tasks. This research is a step toward enabling researchers from data-rich environments to share knowledge learned from actual users' data, which should facilitate research collaborations.

Submitted 24 July, 2017; originally announced July 2017.

Comments: SIGIR 2017 Workshop on Neural Information Retrieval (Neu-IR'17), August 7-11, 2017, Shinjuku, Tokyo, Japan

arXiv:1704.08803

Neural Ranking Models with Weak Supervision

Authors: Mostafa Dehghani, Hamed Zamani, Aliaksei Severyn, Jaap Kamps, W. Bruce Croft

Abstract: Despite the impressive improvements achieved by unsupervised deep neural networks in computer vision and NLP tasks, such improvements have not yet been observed in ranking for information retrieval. The reason may be the complexity of the ranking problem, as it is not obvious how to learn from queries and documents when no supervised signal is available. Hence, in this paper, we propose to train a neural ranking model using weak supervision, where labels are obtained automatically without human annotators or any external resources (e.g., click data). To this aim, we use the output of an unsupervised ranking model, such as BM25, as a weak supervision signal. We further train a set of simple yet effective ranking models based on feed-forward neural networks. We study their effectiveness under various learning scenarios (point-wise and pair-wise models) and using different input representations (i.e., from encoding query-document pairs into dense/sparse vectors to using word embedding representation). We train our networks using tens of millions of training instances and evaluate them on two standard collections: a homogeneous news collection (Robust) and a heterogeneous large-scale web collection (ClueWeb). Our experiments indicate that employing proper objective functions and letting the networks learn the input representation based on weakly supervised data leads to impressive performance, with MAP improvements of over 13% and 35% over the BM25 model on the Robust and ClueWeb collections, respectively.
Our findings also suggest that supervised neural ranking models can greatly benefit from pre-training on large amounts of weakly labeled data that can be easily obtained from unsupervised IR models.
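The weak-labeling idea above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes whitespace-tokenized text, Okapi BM25 with the common defaults k1 = 1.2 and b = 0.75, and hypothetical helper names (`bm25_scores`, `weak_pairwise_labels`, `pairwise_hinge`). BM25 scores are turned into pairwise preferences, the kind of signal a pair-wise ranker could be trained on.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        norm = k1 * (1 - b + b * len(d) / avgdl)  # document-length normalization
        score = 0.0
        for t in query:
            if tf[t]:
                idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
                score += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores.append(score)
    return scores


def weak_pairwise_labels(query, docs):
    """Hypothetical helper: (i, j, +1) means BM25 prefers doc i over doc j, -1 the reverse."""
    s = bm25_scores(query, docs)
    labels = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if s[i] != s[j]:  # ties carry no preference and are skipped
                labels.append((i, j, 1 if s[i] > s[j] else -1))
    return labels


def pairwise_hinge(score_i, score_j, label, margin=1.0):
    """Hinge loss a pair-wise ranker could minimize for one weakly labeled pair."""
    return max(0.0, margin - label * (score_i - score_j))


docs = [
    "neural ranking model".split(),
    "weak supervision for neural ranking".split(),
    "cooking recipes".split(),
]
labels = weak_pairwise_labels("neural ranking".split(), docs)
print(labels)  # → [(0, 1, 1), (0, 2, 1), (1, 2, 1)]
```

No human judgments are involved: the shorter matching document outranks the longer one, and both outrank the non-matching one, purely because BM25 says so.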

Submitted 29 May, 2017; v1 submitted 28 April, 2017; originally announced April 2017.

Comments: In proceedings of The 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR2017)

arXiv:1701.04273 [pdf, other]

Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity

Authors: Hosein Azarbonyad, Mostafa Dehghani, Tom Kenter, Maarten Marx, Jaap Kamps, Maarten de Rijke

Abstract: A high degree of topical diversity is often considered to be an important characteristic of interesting text documents. A recent proposal for measuring topical diversity identifies three elements for assessing diversity: words, topics, and documents as collections of words. Topic models play a central role in this approach. Using standard topic models for measuring diversity of documents is suboptimal due to generality and impurity. General topics only include common information from a background corpus and are assigned to most of the documents in the collection. Impure topics contain words that are not related to the topic; impurity lowers the interpretability of topic models, and impure topics are likely to be assigned to documents erroneously. We propose a hierarchical re-estimation approach for topic models to combat generality and impurity; the proposed approach operates at three levels: words, topics, and documents. Our re-estimation approach for measuring documents' topical diversity outperforms the state of the art on the PubMed dataset, which is commonly used for diversity experiments.

Submitted 16 January, 2017; originally announced January 2017.

Comments: Proceedings of the 39th European Conference on Information Retrieval (ECIR2017)

arXiv:1609.00514 [pdf, other]

doi 10.1145/2970398.2970408

On Horizontal and Vertical Separation in Hierarchical Text Classification

Authors: Mostafa Dehghani, Hosein Azarbonyad, Jaap Kamps, Maarten Marx

Abstract: Hierarchy is a common and effective way of organizing data and representing their relationships at different levels of abstraction. However, hierarchical data dependencies cause difficulties in the estimation of "separable" models that can distinguish between the entities in the hierarchy. Extracting separable models of hierarchical entities requires us to take their relative position into account and to consider the different types of dependencies in the hierarchy. In this paper, we present an investigation of the effect of separability in text-based entity classification and argue that in hierarchical classification, a separation property should be established between entities not only in the same layer, but also in different layers. Our main findings are the following. First, we analyse the importance of separability on the data representation in the task of classification and, based on that, we introduce a "Strong Separation Principle" for optimizing the expected effectiveness of classifiers' decisions based on the separation property. Second, we present Hierarchical Significant Words Language Models (HSWLM), which capture all, and only, the essential features of hierarchical entities according to their relative position in the hierarchy, resulting in horizontally and vertically separable models. Third, we validate our claims on real-world data and demonstrate how HSWLM improves the accuracy of classification and how it provides models that remain transferable over time.
Although discussions in this paper focus on the classification problem, the models are applicable to any information access task on data that has, or can be mapped to, a hierarchical structure.

Submitted 2 September, 2016; originally announced September 2016.

Comments: Full paper (10 pages) accepted for publication in proceedings of ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR'16)

MSC Class: 68P20

arXiv:1609.00511 [pdf, ps, other]

doi 10.1145/2854946.2855003

Generalized Group Profiling for Content Customization

Authors: Mostafa Dehghani, Hosein Azarbonyad, Jaap Kamps, Maarten Marx

Abstract: There is an ongoing debate on personalization, adapting results to the unique user by exploiting the user's personal history, versus customization, adapting results to a group profile sharing one or more characteristics with the user at hand. Personal profiles are often sparse, due to cold-start problems and the fact that users typically search for new items or information, necessitating a back-off to customization, but group profiles often suffer from accidental features brought in by the unique individuals contributing to the group. In this paper we propose a generalized group profiling approach that teases apart the exact contribution of the individual user level and the "abstract" group level by extracting a latent model that captures all, and only, the essential features of the whole group. Our main findings are the following. First, we propose an efficient way of group profiling which implicitly eliminates the general and specific features from users' models in a group and extracts the abstract model representing the whole group. Second, we employ the resulting models in the task of contextual suggestion. We analyse different grouping criteria and find that group-based suggestions improve customization. Third, we see that the granularity of groups affects the quality of group profiling. We observe that the grouping approach has to trade off the level of customization against group size.
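The notion of removing "general" and "specific" features from a group model can be illustrated with a deliberately simple heuristic. This is an assumption-laden stand-in, not the estimation procedure proposed in the paper: a term counts as specific if fewer than `min_users` members use it, and as general if its relative frequency in the group is below `min_ratio` times its add-one smoothed background probability. The function name, thresholds, and toy data are hypothetical.

```python
from collections import Counter


def group_profile(user_profiles, background, min_users=2, min_ratio=2.0):
    """Hypothetical heuristic (not the paper's method) for an 'abstract' group model.

    - 'specific' terms, used by fewer than min_users members, are dropped;
    - 'general' terms, whose group probability is below min_ratio times the
      smoothed background probability, are dropped.
    """
    group_counts = Counter()
    support = Counter()  # number of members whose profile contains each term
    for profile in user_profiles:
        group_counts.update(profile)
        support.update(set(profile))
    group_total = sum(group_counts.values())
    bg_total = sum(background.values()) + len(background)  # add-one smoothing denominator
    kept = {}
    for term, count in group_counts.items():
        if support[term] < min_users:
            continue  # specific: contributed by too few individuals
        p_group = count / group_total
        p_background = (background.get(term, 0) + 1) / bg_total
        if p_group >= min_ratio * p_background:
            kept[term] = p_group  # neither specific nor general
    return kept


# Toy term counts for three group members and a background corpus (made up).
members = [
    {"museum": 3, "the": 5, "jazz": 1},
    {"museum": 2, "the": 4, "canal": 2},
    {"museum": 1, "the": 6, "canal": 1},
]
background = {"the": 1000, "museum": 10, "canal": 5, "jazz": 5}
print(group_profile(members, background))
```

Here "the" is discarded as general, "jazz" as specific to one member, and "museum" and "canal" survive as group-level interests.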

Submitted 2 September, 2016; originally announced September 2016.

Comments: Short paper (4 pages) published in proceedings of ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR'16)

MSC Class: 68P20

arXiv:1608.07952 [pdf, other]

Topical Generalization for Presentation of User Profiles

Authors: Alex Olieman, Jaap Kamps, Gleb Satyukov, Emil de Valk

Abstract: Fine-grained user profile generation approaches have made it increasingly feasible to display on a profile page the topics in which a user has expertise or interest. Earlier work on topical user profiling has been directed at enhancing search and personalization functionality, but making such profiles useful for human consumption presents new challenges. With this work, we have taken a first step toward a semantic layout mode for topical user profiles. We have developed a topical generalization approach which finds coherent groups of topics and adds labels to them, based on their association with broader topics in the Wikipedia category graph. In our user study, a nested layout mode employing topical generalization is compared with a simpler flat layout mode. The results indicate that users favor the nested structure over flat profiles, but tend to overlook the specific topics on the lower level. We propose a third layout mode to address this issue.
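One way to picture grouping profile topics under broader, labeled categories is a greedy pass over a topic-to-category mapping. The sketch is hypothetical: it assumes each topic already comes with a set of broader categories (for example, looked up in the Wikipedia category graph) and repeatedly picks the category that covers the most still-ungrouped topics. That selection rule is an illustrative choice, not necessarily the authors' criterion.

```python
from collections import defaultdict


def generalize_topics(topic_parents, min_group=2):
    """Hypothetical greedy grouping of topics under broader categories.

    topic_parents maps each profile topic to a set of broader categories.
    Returns (groups, leftovers): groups maps a category label to its sorted topics;
    leftovers are topics not placed in any group of at least min_group topics.
    """
    remaining = set(topic_parents)
    groups = {}
    while remaining:
        coverage = defaultdict(set)
        for topic in remaining:
            for category in topic_parents[topic]:
                coverage[category].add(topic)
        if not coverage:
            break
        # Largest coverage first; ties broken alphabetically so the result is deterministic.
        category, members = min(coverage.items(), key=lambda kv: (-len(kv[1]), kv[0]))
        if len(members) < min_group:
            break
        groups[category] = sorted(members)
        remaining -= members
    return groups, sorted(remaining)


# Toy profile: topic -> assumed broader categories.
profile = {
    "Dutch cuisine": {"Cuisine", "Netherlands"},
    "Stroopwafel": {"Cuisine", "Netherlands"},
    "Amsterdam": {"Netherlands", "Cities"},
    "Information retrieval": {"Computer science"},
}
groups, leftovers = generalize_topics(profile)
print(groups, leftovers)
```

The groups correspond to the nested level of a profile page; leftovers would be shown flat.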

Submitted 19 November, 2016; v1 submitted 29 August, 2016; originally announced August 2016.

Comments: (to be) presented at DIR'16, November 25, 2016, Delft, The Netherlands

arXiv:1607.07904 [pdf, other]

doi 10.13140/RG.2.1.2488.7288

Beyond Movie Recommendations: Solving the Continuous Cold Start Problem in E-commerce Recommendations

Authors: Julia Kiseleva, Alexander Tuzhilin, Jaap Kamps, Melanie J. I. Mueller, Lucas Bernardi, Chad Davis, Ivan Kovacek, Mats Stafseng Einarsen, Djoerd Hiemstra

Abstract: Many e-commerce websites use recommender systems or personalized rankers to personalize search results based on users' previous interactions. However, a large fraction of users has no prior interactions, making it impossible to use collaborative filtering or rely on user history for personalization. Even the most active users may visit only a few times a year and may have volatile needs or different personas, making their personal history a sparse and noisy signal at best. This paper investigates how, when we cannot rely on the user history, the large-scale availability of other user interactions still allows us to build meaningful profiles from the contextual data, and whether such contextual profiles are useful to customize the ranking, exemplified by data from a major online travel agent, Booking.com. Our main findings are threefold: First, we characterize the Continuous Cold Start Problem (CoCoS) from the viewpoint of typical e-commerce applications. Second, as explicit situational context is not available in typical real-world applications, implicit cues from transaction logs used at scale can capture essential features of situational context. Third, contextual user profiles can be created offline, resulting in a set of smaller models compared to a single huge non-contextual model, making contextual ranking available with negligible CPU and memory footprint.
Finally, we conclude that, in an online A/B test on live users, our contextual ranker increased user engagement substantially over a non-contextual baseline, with click-through rate (CTR) increased by 20%. This clearly demonstrates the value of contextual user profiles in a real-world application.
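Building many small per-context models offline, instead of one large model, can be sketched as follows. This is a minimal illustration under stated assumptions, not the ranker evaluated in the paper: logs are reduced to (context, item) pairs, the context key is a hypothetical (country, device) tuple standing in for implicit situational cues, and each context model is a simple popularity count with a global fallback.

```python
from collections import Counter, defaultdict


def build_context_profiles(log):
    """Offline step: aggregate (context, item) interactions into one small model per context."""
    profiles = defaultdict(Counter)
    for context, item in log:
        profiles[context][item] += 1
    return dict(profiles)


def contextual_rank(items, context, profiles):
    """Rank candidates by popularity within the context, then by global popularity."""
    global_counts = Counter()
    for counts in profiles.values():
        global_counts.update(counts)
    local = profiles.get(context, Counter())  # unseen context: only global counts matter
    return sorted(items, key=lambda item: (-local[item], -global_counts[item], item))


# Hypothetical log: context = (country, device), item = destination clicked.
log = [
    (("NL", "mobile"), "Paris"), (("NL", "mobile"), "Paris"), (("NL", "mobile"), "Berlin"),
    (("US", "desktop"), "Berlin"), (("US", "desktop"), "Berlin"), (("US", "desktop"), "Rome"),
]
profiles = build_context_profiles(log)
candidates = ["Berlin", "Paris", "Rome"]
print(contextual_rank(candidates, ("NL", "mobile"), profiles))
print(contextual_rank(candidates, ("DE", "tablet"), profiles))
```

Serving only requires a dictionary lookup per request, which is one reason per-context models can be cheap in CPU and memory.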

Submitted 26 July, 2016; originally announced July 2016.

arXiv:1512.07051 [pdf, other]

The Impact of Technical Domain Expertise on Search Behavior and Task Outcome

Authors: Julia Kiseleva, Alejandro Montes García, Jaap Kamps, Nikita Spirin

Abstract: Domain expertise is regarded as one of the key factors impacting search success: experts are known to write more effective queries, to select the right results on the result page, and to find answers satisfying their information needs. Search transaction logs play a crucial role in result ranking. Yet despite the variety in expertise levels of users, all prior interactions are treated alike, suggesting that taking expertise into account can improve the ranking for informational tasks. The main aim of this paper is to investigate the impact of high levels of technical domain expertise on both search behavior and task outcome. We conduct an online user study with searchers proficient in programming languages. We focus on Java and JavaScript, yet we believe that our study and results are applicable to other expertise-sensitive search tasks. The main findings are threefold: First, we constructed expertise tests that effectively measure technical domain expertise and correlate well with self-reported expertise. Second, we showed that there is a clear position bias, but technical domain experts were less affected by it. Third, we found that general expertise helped in finding the correct answers, but the domain experts were more successful as they managed to detect better answers. Our work uses explicit tests to determine user expertise levels, which is an important step toward fully automatic detection of expertise levels based on interaction behavior.
A deeper understanding of the impact of expertise on search behavior and task outcome can enable more effective use of expert behavior in search logs, essentially making everyone search like an expert.

Submitted 22 December, 2015; originally announced December 2015.

arXiv:1509.02010 [pdf, other]

LocLinkVis: A Geographic Information Retrieval-Based System for Large-Scale Exploratory Search

Authors: Alex Olieman, Jaap Kamps, Rosa Merino Claros

Abstract: In this paper we present LocLinkVis (Locate-Link-Visualize), a system which supports exploratory information access to a document collection based on geo-referencing and visualization. It uses a gazetteer which contains representations of places ranging from countries to buildings and which is used to recognize toponyms, disambiguate them into places, and visualize the resulting spatial footprints.
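The locate and disambiguate steps can be illustrated with a toy gazetteer. Everything here is an assumption for illustration: the gazetteer entries and their rounded values, matching of single-word names only, and the "most populous candidate wins" disambiguation rule, which is a common baseline rather than necessarily what LocLinkVis does.

```python
import re

# Toy gazetteer with illustrative, rounded values: lowercase name -> candidate places.
GAZETTEER = {
    "paris": [
        {"id": "Paris, France", "population": 2_100_000, "coords": (48.86, 2.35)},
        {"id": "Paris, Texas", "population": 25_000, "coords": (33.66, -95.56)},
    ],
    "netherlands": [
        {"id": "Netherlands", "population": 17_000_000, "coords": (52.13, 5.29)},
    ],
}


def locate(text, gazetteer=GAZETTEER):
    """Recognize toponyms by gazetteer lookup and resolve each to one place footprint."""
    footprints = []
    for token in re.findall(r"[A-Za-z]+", text):
        candidates = gazetteer.get(token.lower())
        if candidates:
            # Assumed heuristic: prefer the most populous candidate place.
            best = max(candidates, key=lambda place: place["population"])
            footprints.append((token, best["id"], best["coords"]))
    return footprints


print(locate("From Paris to the Netherlands."))
```

The resulting (toponym, place, coordinates) triples are what a map view would plot as spatial footprints.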

Submitted 26 September, 2015; v1 submitted 7 September, 2015; originally announced September 2015.

Comments: SEM'15

Journal ref: Proc. Posters and Demos Track of 11th Int. Conf. on Semantic Systems (2015) 30-33

arXiv:1509.01865 [pdf]

A Hybrid Approach to Domain-Specific Entity Linking

Authors: Alex Olieman, Jaap Kamps, Maarten Marx, Arjan Nusselder

Abstract: The current state-of-the-art Entity Linking (EL) systems are geared towards corpora that are as heterogeneous as the Web, and therefore perform sub-optimally on domain-specific corpora. A key open problem is how to construct effective EL systems for specific domains, as knowledge of the local context should in principle increase, rather than decrease, effectiveness. In this paper we propose the hybrid use of simple specialist linkers in combination with an existing generalist system to address this problem. Our main findings are the following. First, we construct a new reusable benchmark for EL on a corpus of domain-specific conversations. Second, we test the performance of a range of approaches under the same conditions, and show that specialist linkers obtain high precision in isolation, and high recall when combined with generalist linkers. Hence, we can effectively exploit local context and get the best of both worlds.
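The precision and recall trade-off between specialist and generalist linkers can be made concrete with a small sketch. It assumes a simple combination rule (the specialist link wins when it exists, otherwise the generalist link is used) and toy dictionary-based linkers with made-up entity identifiers; it is not the benchmark or the systems from the paper.

```python
def hybrid_link(mentions, specialist, generalist):
    """Link each mention with the specialist linker, falling back to the generalist one."""
    links = {}
    for mention in mentions:
        target = specialist(mention)
        if target is None:
            target = generalist(mention)
        if target is not None:
            links[mention] = target
    return links


def precision_recall(predicted, gold):
    """Precision and recall of predicted (mention -> entity) links against gold links."""
    pred, true = set(predicted.items()), set(gold.items())
    hits = len(pred & true)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(true) if true else 0.0
    return precision, recall


def no_linker(mention):
    """Placeholder linker that never returns a link."""
    return None


# Hypothetical toy linkers: dict.get returns None for mentions a linker cannot resolve.
specialist = {"the minister": "domain:Minister", "the chamber": "domain:Chamber"}.get
generalist = {"Amsterdam": "Amsterdam", "the chamber": "Chamber_music"}.get

mentions = ["the minister", "the chamber", "Amsterdam", "the budget"]
gold = {"the minister": "domain:Minister", "the chamber": "domain:Chamber",
        "Amsterdam": "Amsterdam", "the budget": "domain:Budget"}

for name, spec, gen in [("specialist", specialist, no_linker),
                        ("generalist", no_linker, generalist),
                        ("hybrid", specialist, generalist)]:
    print(name, precision_recall(hybrid_link(mentions, spec, gen), gold))
```

In this toy setting the specialist alone is precise but covers few mentions, the generalist misreads a domain term, and the hybrid keeps the specialist's precision while recovering recall.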

Submitted 6 September, 2015; originally announced September 2015.

Comments: SEM'15

ACM Class: H.3.1

Journal ref: Proc. Posters and Demos track of 11th Int. Conf. on Semantic Systems (2015) 55-58

arXiv:1508.01177 [pdf, other]

The Continuous Cold Start Problem in e-Commerce Recommender Systems

Authors: Lucas Bernardi, Jaap Kamps, Julia Kiseleva, Melanie JI Müller

Abstract: Many e-commerce websites use recommender systems to recommend items to users. When a user or item is new, the system may fail because not enough information is available on this user or item. Various solutions to this "cold-start problem" have been proposed in the literature. However, many real-life e-commerce applications suffer from an aggravated, recurring version of cold-start even for known users or items, since many users visit the website rarely, change their interests over time, or exhibit different personas. This paper exposes the "Continuous Cold Start" (CoCoS) problem and its consequences for content- and context-based recommendation from the viewpoint of typical e-commerce applications, illustrated with examples from a major travel recommendation website, Booking.com.

Submitted 5 August, 2015; originally announced August 2015.

Comments: 6 pages, 3 figures. 2nd Workshop on New Trends in Content-Based Recommender Systems, RecSys 2015

arXiv:1506.00904 [pdf, other]

doi 10.1145/2766462.2776777

Where to Go on Your Next Trip? Optimizing Travel Destinations Based on User Preferences

Authors: Julia Kiseleva, Melanie J. I. Müller, Lucas Bernardi, Chad Davis, Ivan Kovacek, Mats Stafseng Einarsen, Jaap Kamps, Alexander Tuzhilin, Djoerd Hiemstra

Abstract: Recommendation based on user preferences is a common task for e-commerce websites. New recommendation algorithms are often evaluated by offline comparison to baseline algorithms such as recommending random or the most popular items. Here, we investigate how these algorithms themselves perform and compare to the operational production system in large-scale online experiments in a real-world application. Specifically, we focus on recommending travel destinations at Booking.com, a major online travel site, to users searching for their preferred vacation activities. To build ranking models we use multi-criteria rating data provided by previous users after their stay at a destination. We implement three methods and compare them to the current baseline at Booking.com: random, most popular, and Naive Bayes. Our general conclusion is that, in an online A/B test with live users, our Naive Bayes-based ranker increased user engagement significantly over the current online system.
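A Naive Bayes ranker over destination ratings can be sketched as below. The data layout is an assumption: for each destination we take the number of reviewers and, per activity, how many of them endorsed that activity in their multi-criteria ratings. Each destination is scored by log P(d) plus the sum of log P(activity | d) over the activities the user selected, with add-alpha smoothed Bernoulli estimates. Names and numbers are hypothetical, and the model in the paper may differ in detail.

```python
import math


def naive_bayes_rank(endorsements, visits, activities, alpha=1.0):
    """Rank destinations by log P(d) + sum over selected activities of log P(a | d).

    endorsements: destination -> {activity: number of reviewers endorsing it} (assumed layout)
    visits:       destination -> number of reviewers of that destination
    """
    total = sum(visits.values())
    scores = {}
    for dest, n in visits.items():
        score = math.log(n / total)  # prior: share of all reviews
        for activity in activities:
            k = endorsements.get(dest, {}).get(activity, 0)
            # Smoothed Bernoulli estimate of P(activity endorsed | destination).
            score += math.log((k + alpha) / (n + 2 * alpha))
        scores[dest] = score
    return sorted(scores, key=scores.get, reverse=True)


# Made-up review statistics for illustration only.
visits = {"Amsterdam": 100, "Chamonix": 40, "Barcelona": 120}
endorsements = {
    "Amsterdam": {"museums": 60},
    "Chamonix": {"skiing": 35, "museums": 2},
    "Barcelona": {"beach": 90, "museums": 50},
}
print(naive_bayes_rank(endorsements, visits, ["museums"]))
print(naive_bayes_rank(endorsements, visits, ["skiing"]))
```

Smoothing keeps destinations with no endorsements for an activity in the ranking, while the prior favors destinations with more reviews when evidence is weak.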

Submitted 2 June, 2015; originally announced June 2015.

Comments: 6 pages, 2 figures in SIGIR 2015, SIRIP Symposium on IR in Practice

Showing 1–27 of 27 results for author: Kamps, J