
Showing 1–27 of 27 results for author: Kamps, J

Searching in archive cs.
  1. arXiv:2501.14466  [pdf, other]

    cs.IR stat.AP

    On Correlating Factors for Domain Adaptation Performance

    Authors: Goksenin Yuksel, Jaap Kamps

    Abstract: Dense retrievers have demonstrated significant potential for neural information retrieval; however, they lack robustness to domain shifts, limiting their efficacy in zero-shot settings across diverse domains. In this paper, we set out to analyze the possible factors that lead to successful domain adaptation of dense retrievers. We include domain similarity proxies between generated queries to test…

    Submitted 24 January, 2025; originally announced January 2025.

  2. arXiv:2501.14459  [pdf, other]

    cs.IR cs.AI

    Interpretability Analysis of Domain Adapted Dense Retrievers

    Authors: Goksenin Yuksel, Jaap Kamps

    Abstract: Dense retrievers have demonstrated significant potential for neural information retrieval; however, they exhibit a lack of robustness to domain shifts, thereby limiting their efficacy in zero-shot settings across diverse domains. Previous research has investigated unsupervised domain adaptation techniques to adapt dense retrievers to target domains. However, these studies have not focused on expla…

    Submitted 24 January, 2025; originally announced January 2025.

  3. arXiv:2501.14434  [pdf, other]

    cs.IR cs.LG

    Remining Hard Negatives for Generative Pseudo Labeled Domain Adaptation

    Authors: Goksenin Yuksel, David Rau, Jaap Kamps

    Abstract: Dense retrievers have demonstrated significant potential for neural information retrieval; however, they exhibit a lack of robustness to domain shifts, thereby limiting their efficacy in zero-shot settings across diverse domains. A state-of-the-art domain adaptation technique is Generative Pseudo Labeling (GPL). GPL uses synthetic query generation and initially mined hard negatives to distill know…

    Submitted 24 January, 2025; originally announced January 2025.

  4. The Role of Complex NLP in Transformers for Text Ranking?

    Authors: David Rau, Jaap Kamps

    Abstract: Even though term-based methods such as BM25 provide strong baselines in ranking, under certain conditions they are dominated by large pre-trained masked language models (MLMs) such as BERT. To date, the source of their effectiveness remains unclear. Is it their ability to truly understand the meaning through modeling syntactic aspects? We answer this by manipulating the input order and position in…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: Proceedings of the 2022 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR '22)

  5. arXiv:2204.07233  [pdf, other]

    cs.IR cs.AI cs.CL

    How Different are Pre-trained Transformers for Text Ranking?

    Authors: David Rau, Jaap Kamps

    Abstract: In recent years, large pre-trained transformers have led to substantial gains in performance over traditional retrieval models and feedback approaches. However, these results are primarily based on the MS Marco/TREC Deep Learning Track setup, with its very particular setup, and our understanding of why and how these models work better is fragmented at best. We analyze effective BERT-based cross-en…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: ECIR 2022

  6. arXiv:2109.06707  [pdf, other]

    cs.LG cs.AI

    A pragmatic approach to estimating average treatment effects from EHR data: the effect of prone positioning on mechanically ventilated COVID-19 patients

    Authors: Adam Izdebski, Patrick J. Thoral, Robbert C. A. Lalisang, Dean M. McHugh, Diederik Gommers, Olaf L. Cremer, Rob J. Bosman, Sander Rigter, Evert-Jan Wils, Tim Frenzel, Dave A. Dongelmans, Remko de Jong, Marco A. A. Peters, Marlijn J. A. Kamps, Dharmanand Ramnarain, Ralph Nowitzky, Fleur G. C. A. Nooteboom, Wouter de Ruijter, Louise C. Urlings-Strop, Ellen G. M. Smit, D. Jannet Mehagnoul-Schipper, Tom Dormans, Cornelis P. C. de Jager, Stefaan H. A. Hendriks, Sefanja Achterberg, et al. (21 additional authors not shown)

    Abstract: Despite the recent progress in the field of causal inference, to date there is no agreed upon methodology to glean treatment effect estimation from observational data. The consequence on clinical practice is that, when lacking results from a randomized trial, medical personnel is left without guidance on what seems to be effective in a real-world scenario. This article proposes a pragmatic methodo…

    Submitted 3 December, 2021; v1 submitted 14 September, 2021; originally announced September 2021.

  7. arXiv:1810.05436  [pdf, other]

    cs.CL cs.IR cs.LG

    HiTR: Hierarchical Topic Model Re-estimation for Measuring Topical Diversity of Documents

    Authors: Hosein Azarbonyad, Mostafa Dehghani, Tom Kenter, Maarten Marx, Jaap Kamps, Maarten de Rijke

    Abstract: A high degree of topical diversity is often considered to be an important characteristic of interesting text documents. A recent proposal for measuring topical diversity identifies three distributions for assessing the diversity of documents: distributions of words within documents, words within topics, and topics within documents. Topic models play a central role in this approach and, hence, thei…

    Submitted 12 October, 2018; originally announced October 2018.

    Comments: IEEE Transactions on Knowledge and Data Engineering

  8. arXiv:1806.08694  [pdf, other]

    cs.IR cs.AI

    Learning to Rank from Samples of Variable Quality

    Authors: Mostafa Dehghani, Jaap Kamps

    Abstract: Training deep neural networks requires many training samples, but in practice, training labels are expensive to obtain and may be of varying quality, as some may be from trusted expert labelers while others might be from heuristics or other sources of weak supervision such as crowd-sourcing. This creates a fundamental quality-versus-quantity trade-off in the learning process. Do we learn from the…

    Submitted 21 June, 2018; originally announced June 2018.

    Comments: Presented at The First International SIGIR2016 Workshop on Learning From Limited Or Noisy Data For Information Retrieval. arXiv admin note: substantial text overlap with arXiv:1711.02799

  9. arXiv:1711.11383  [pdf, other]

    stat.ML cs.AI cs.CL cs.LG

    Learning to Learn from Weak Supervision by Full Supervision

    Authors: Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps

    Abstract: In this paper, we propose a method for training neural networks when we have a large set of data with weak labels and a small amount of data with true labels. In our proposed model, we train two neural networks: a target network, the learner and a confidence network, the meta-learner. The target network is optimized to perform a given task and is trained using a large set of unlabeled data that ar…

    Submitted 30 November, 2017; originally announced November 2017.

    Comments: Accepted at NIPS Workshop on Meta-Learning (MetaLearn 2017), Long Beach, CA, USA

  10. arXiv:1711.05603  [pdf, other]

    cs.CL

    Words are Malleable: Computing Semantic Shifts in Political and Media Discourse

    Authors: Hosein Azarbonyad, Mostafa Dehghani, Kaspar Beelen, Alexandra Arkut, Maarten Marx, Jaap Kamps

    Abstract: Recently, researchers started to pay attention to the detection of temporal shifts in the meaning of words. However, most (if not all) of these approaches restricted their efforts to uncovering change over time, thus neglecting other valuable dimensions such as social or political variability. We propose an approach for detecting semantic shifts between different viewpoints--broadly defined as a s…

    Submitted 15 November, 2017; originally announced November 2017.

    Comments: In Proceedings of the 26th ACM International Conference on Information and Knowledge Management (CIKM 2017)

  11. arXiv:1711.02799  [pdf, other]

    cs.LG cs.CL cs.NE

    Fidelity-Weighted Learning

    Authors: Mostafa Dehghani, Arash Mehrjou, Stephan Gouws, Jaap Kamps, Bernhard Schölkopf

    Abstract: Training deep neural networks requires many training samples, but in practice training labels are expensive to obtain and may be of varying quality, as some may be from trusted expert labelers while others might be from heuristics or other sources of weak supervision such as crowd-sourcing. This creates a fundamental quality-versus-quantity trade-off in the learning process. Do we learn from the s…

    Submitted 23 May, 2018; v1 submitted 7 November, 2017; originally announced November 2017.

    Comments: Published as a conference paper at ICLR 2018

  12. arXiv:1711.00313  [pdf, other]

    cs.LG cs.CL cs.NE stat.ML

    Avoiding Your Teacher's Mistakes: Training Neural Networks with Controlled Weak Supervision

    Authors: Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps

    Abstract: Training deep neural networks requires massive amounts of training data, but for many tasks only limited labeled data is available. This makes weak supervision attractive, using weak or noisy signals like the output of heuristic methods or user click-through data for training. In a semi-supervised setting, we can use a large set of data with weak labels to pretrain a neural network and then fine-t…

    Submitted 7 December, 2017; v1 submitted 1 November, 2017; originally announced November 2017.

  13. On Search Powered Navigation

    Authors: Mostafa Dehghani, Glorianna Jagfeld, Hosein Azarbonyad, Alex Olieman, Jaap Kamps, Maarten Marx

    Abstract: Query-based searching and browsing-based navigation are the two main components of exploratory search. Search lets users dig in deep by controlling their actions to focus on and find just the information they need, whereas navigation helps them to get an overview to decide which content is most important. In this paper, we introduce the concept of "search powered navigation" and investigate the ef…

    Submitted 1 November, 2017; originally announced November 2017.

    Comments: Accepted for publication in ACM SIGIR International Conference on the Theory of Information Retrieval

  14. arXiv:1710.01127  [pdf, other]

    cs.IR cs.DL

    Finding Talk About the Past in the Discourse of Non-Historians

    Authors: Alex Olieman, Kaspar Beelen, Jaap Kamps

    Abstract: A heightened interest in the presence of the past has given rise to the new field of memory studies, but there is a lack of search and research tools to support studying how and why the past is evoked in diachronic discourses. Searching for temporal references is not straightforward. It entails bridging the gap between conceptually-based information needs on one side, and term-based inverted index…

    Submitted 3 October, 2017; originally announced October 2017.

    Comments: Presented at Drift-a-LOD 2017

  15. arXiv:1708.01162  [pdf, other]

    cs.IR

    Good Applications for Crummy Entity Linkers? The Case of Corpus Selection in Digital Humanities

    Authors: Alex Olieman, Kaspar Beelen, Milan van Lange, Jaap Kamps, Maarten Marx

    Abstract: Over the last decade we have made great progress in entity linking (EL) systems, but performance may vary depending on the context and, arguably, there are even principled limitations preventing a "perfect" EL system. This also suggests that there may be applications for which current "imperfect" EL is already very useful, and makes finding the "right" application as important as building the "rig…

    Submitted 3 August, 2017; originally announced August 2017.

    Comments: Accepted for presentation at SEMANTiCS '17

  16. arXiv:1707.07605  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    Share your Model instead of your Data: Privacy Preserving Mimic Learning for Ranking

    Authors: Mostafa Dehghani, Hosein Azarbonyad, Jaap Kamps, Maarten de Rijke

    Abstract: Deep neural networks have become a primary tool for solving problems in many fields. They are also used for addressing information retrieval problems and show strong performance in several tasks. Training these models requires large, representative datasets and for most IR tasks, such data contains sensitive information from users. Privacy and confidentiality concerns prevent many data owners from…

    Submitted 24 July, 2017; originally announced July 2017.

    Comments: SIGIR 2017 Workshop on Neural Information Retrieval (Neu-IR'17), August 7-11, 2017, Shinjuku, Tokyo, Japan

  17. arXiv:1704.08803  [pdf, other]

    cs.IR cs.CL cs.LG

    Neural Ranking Models with Weak Supervision

    Authors: Mostafa Dehghani, Hamed Zamani, Aliaksei Severyn, Jaap Kamps, W. Bruce Croft

    Abstract: Despite the impressive improvements achieved by unsupervised deep neural networks in computer vision and NLP tasks, such improvements have not yet been observed in ranking for information retrieval. The reason may be the complexity of the ranking problem, as it is not obvious how to learn from queries and documents when no supervised signal is available. Hence, in this paper, we propose to train a…

    Submitted 29 May, 2017; v1 submitted 28 April, 2017; originally announced April 2017.

    Comments: In proceedings of The 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR2017)

  18. arXiv:1701.04273  [pdf, other]

    cs.IR

    Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity

    Authors: Hosein Azarbonyad, Mostafa Dehghani, Tom Kenter, Maarten Marx, Jaap Kamps, Maarten de Rijke

    Abstract: A high degree of topical diversity is often considered to be an important characteristic of interesting text documents. A recent proposal for measuring topical diversity identifies three elements for assessing diversity: words, topics, and documents as collections of words. Topic models play a central role in this approach. Using standard topic models for measuring diversity of documents is subopt…

    Submitted 16 January, 2017; originally announced January 2017.

    Comments: Proceedings of the 39th European Conference on Information Retrieval (ECIR2017)

  19. arXiv:1609.00514  [pdf, other]

    cs.IR cs.CL cs.IT

    On Horizontal and Vertical Separation in Hierarchical Text Classification

    Authors: Mostafa Dehghani, Hosein Azarbonyad, Jaap Kamps, Maarten Marx

    Abstract: Hierarchy is a common and effective way of organizing data and representing their relationships at different levels of abstraction. However, hierarchical data dependencies cause difficulties in the estimation of "separable" models that can distinguish between the entities in the hierarchy. Extracting separable models of hierarchical entities requires us to take their relative position into account…

    Submitted 2 September, 2016; originally announced September 2016.

    Comments: Full paper (10 pages) accepted for publication in proceedings of ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR'16)

    MSC Class: 68P20

  20. Generalized Group Profiling for Content Customization

    Authors: Mostafa Dehghani, Hosein Azarbonyad, Jaap Kamps, Maarten Marx

    Abstract: There is an ongoing debate on personalization, adapting results to the unique user exploiting a user's personal history, versus customization, adapting results to a group profile sharing one or more characteristics with the user at hand. Personal profiles are often sparse, due to cold start problems and the fact that users typically search for new items or information, necessitating to back-off to…

    Submitted 2 September, 2016; originally announced September 2016.

    Comments: Short paper (4 pages) published in proceedings of ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR'16)

    MSC Class: 68P20

  21. arXiv:1608.07952  [pdf, other]

    cs.IR

    Topical Generalization for Presentation of User Profiles

    Authors: Alex Olieman, Jaap Kamps, Gleb Satyukov, Emil de Valk

    Abstract: Fine-grained user profile generation approaches have made it increasingly feasible to display on a profile page in which topics a user has expertise or interest. Earlier work on topical user profiling has been directed at enhancing search and personalization functionality, but making such profiles useful for human consumption presents new challenges. With this work, we have taken a first step towa…

    Submitted 19 November, 2016; v1 submitted 29 August, 2016; originally announced August 2016.

    Comments: (to be) presented at DIR'16, November 25, 2016, Delft, The Netherlands

  22. Beyond Movie Recommendations: Solving the Continuous Cold Start Problem in E-commerce Recommendations

    Authors: Julia Kiseleva, Alexander Tuzhilin, Jaap Kamps, Melanie J. I. Mueller, Lucas Bernardi, Chad Davis, Ivan Kovacek, Mats Stafseng Einarsen, Djoerd Hiemstra

    Abstract: Many e-commerce websites use recommender systems or personalized rankers to personalize search results based on users' previous interactions. However, a large fraction of users has no prior interactions, making it impossible to use collaborative filtering or rely on user history for personalization. Even the most active users may visit only a few times a year and may have volatile needs or differen…

    Submitted 26 July, 2016; originally announced July 2016.

  23. arXiv:1512.07051  [pdf, other]

    cs.IR

    The Impact of Technical Domain Expertise on Search Behavior and Task Outcome

    Authors: Julia Kiseleva, Alejandro Montes García, Jaap Kamps, Nikita Spirin

    Abstract: Domain expertise is regarded as one of the key factors impacting search success: experts are known to write more effective queries, to select the right results on the result page, and to find answers satisfying their information needs. Search transaction logs play a crucial role in result ranking. Yet despite the variety in expertise levels of users, all prior interactions are treated alike,…

    Submitted 22 December, 2015; originally announced December 2015.

  24. arXiv:1509.02010  [pdf, other]

    cs.IR

    LocLinkVis: A Geographic Information Retrieval-Based System for Large-Scale Exploratory Search

    Authors: Alex Olieman, Jaap Kamps, Rosa Merino Claros

    Abstract: In this paper we present LocLinkVis (Locate-Link-Visualize), a system that supports exploratory information access to a document collection based on geo-referencing and visualization. It uses a gazetteer which contains representations of places ranging from countries to buildings, and that is used to recognize toponyms, disambiguate them into places, and to visualize the resulting spatial footpri…

    Submitted 26 September, 2015; v1 submitted 7 September, 2015; originally announced September 2015.

    Comments: SEM'15

    Journal ref: Proc. Posters and Demos Track of 11th Int. Conf. on Semantic Systems (2015) 30-33

  25. arXiv:1509.01865  [pdf]

    cs.IR cs.CL

    A Hybrid Approach to Domain-Specific Entity Linking

    Authors: Alex Olieman, Jaap Kamps, Maarten Marx, Arjan Nusselder

    Abstract: The current state-of-the-art Entity Linking (EL) systems are geared towards corpora that are as heterogeneous as the Web, and therefore perform sub-optimally on domain-specific corpora. A key open problem is how to construct effective EL systems for specific domains, as knowledge of the local context should in principle increase, rather than decrease, effectiveness. In this paper we propose the hy…

    Submitted 6 September, 2015; originally announced September 2015.

    Comments: SEM'15

    ACM Class: H.3.1

    Journal ref: Proc. Posters and Demos track of 11th Int. Conf. on Semantic Systems (2015) 55-58

  26. arXiv:1508.01177  [pdf, other]

    cs.IR

    The Continuous Cold Start Problem in e-Commerce Recommender Systems

    Authors: Lucas Bernardi, Jaap Kamps, Julia Kiseleva, Melanie JI Müller

    Abstract: Many e-commerce websites use recommender systems to recommend items to users. When a user or item is new, the system may fail because not enough information is available on this user or item. Various solutions to this 'cold-start problem' have been proposed in the literature. However, many real-life e-commerce applications suffer from an aggravated, recurring version of cold-start even for known u…

    Submitted 5 August, 2015; originally announced August 2015.

    Comments: 6 pages, 3 figures. 2nd Workshop on New Trends in Content-Based Recommender Systems, RecSys 2015

  27. Where to Go on Your Next Trip? Optimizing Travel Destinations Based on User Preferences

    Authors: Julia Kiseleva, Melanie J. I. Müller, Lucas Bernardi, Chad Davis, Ivan Kovacek, Mats Stafseng Einarsen, Jaap Kamps, Alexander Tuzhilin, Djoerd Hiemstra

    Abstract: Recommendation based on user preferences is a common task for e-commerce websites. New recommendation algorithms are often evaluated by offline comparison to baseline algorithms such as recommending random or the most popular items. Here, we investigate how these algorithms themselves perform and compare to the operational production system in large scale online experiments in a real-world applica…

    Submitted 2 June, 2015; originally announced June 2015.

    Comments: 6 pages, 2 figures in SIGIR 2015, SIRIP Symposium on IR in Practice