Showing 1–13 of 13 results for author: Hiemstra, D

  1. arXiv:2109.06306  [pdf, other]

    cs.IR

    BERT for Target Apps Selection: Analyzing the Diversity and Performance of BERT in Unified Mobile Search

    Authors: Negin Ghasemi, Mohammad Aliannejadi, Djoerd Hiemstra

    Abstract: A unified mobile search framework aims to identify the mobile apps that can satisfy a user's information need and route the user's query to them. Previous work has shown that resource descriptions for mobile apps are sparse as they rely on the app's previous queries. This problem puts certain apps in dominance and leaves out the resource-scarce apps from the top ranks. In this case, we need a rank…

    Submitted 13 September, 2021; originally announced September 2021.

  2. arXiv:2010.12674  [pdf, other]

    cs.IR

    Exploring task-based query expansion at the TREC-COVID track

    Authors: Thomas Schoegje, Chris Kamphuis, Koen Dercksen, Djoerd Hiemstra, Toine Pieters, Arjen de Vries

    Abstract: We explore how to generate effective queries based on search tasks. Our approach has three main steps: 1) identify search tasks based on research goals, 2) manually classify search queries according to those tasks, and 3) compare three methods to improve search rankings based on the task context. The most promising approach is based on expanding the user's query terms using task terms, which sligh…

    Submitted 16 November, 2020; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Update version 2: improved title. Update version 3: corrected terminology (hyponym -> hypernym) in two instances. Documents our participation in the TREC-COVID track. Contains 16 pages, 0 figures.

  3. arXiv:2007.02620  [pdf, other]

    cs.IR

    Reducing Misinformation in Query Autocompletions

    Authors: Djoerd Hiemstra

    Abstract: Query autocompletions help users of search engines to speed up their searches by recommending completions of partially typed queries in a drop down box. These recommended query autocompletions are usually based on large logs of queries that were previously entered by the search engine's users. Therefore, misinformation entered -- either accidentally or purposely to manipulate the search engine --…

    Submitted 11 September, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

    Comments: Published at the 2nd International Symposium on Open Search Technology, 12-14 October 2020, CERN, Geneva, Switzerland

    MSC Class: IR

  4. arXiv:2001.05714  [pdf, other]

    cs.CL

    Comparing Rule-based, Feature-based and Deep Neural Methods for De-identification of Dutch Medical Records

    Authors: Jan Trienes, Dolf Trieschnigg, Christin Seifert, Djoerd Hiemstra

    Abstract: Unstructured information in electronic health records provides an invaluable resource for medical research. To protect the confidentiality of patients and to conform to privacy regulations, de-identification methods automatically remove personally identifying information from these medical records. However, due to the unavailability of labeled data, most existing research is constrained to English…

    Submitted 16 January, 2020; originally announced January 2020.

    Comments: Proceedings of the 1st ACM WSDM Health Search and Data Mining Workshop (HSDM2020), 2020

  5. arXiv:1811.09292  [pdf, other]

    cs.IR cs.SI

    Recommending Users: Whom to Follow on Federated Social Networks

    Authors: Jan Trienes, Andrés Torres Cano, Djoerd Hiemstra

    Abstract: To foster an active and engaged community, social networks employ recommendation algorithms that filter large amounts of contents and provide a user with personalized views of the network. Popular social networks such as Facebook and Twitter generate follow recommendations by listing profiles a user may be interested to connect with. Federated social networks aim to resolve issues associated with…

    Submitted 22 November, 2018; originally announced November 2018.

    Comments: 4 pages, 1 figure

    Journal ref: In Proceedings of the 17th Dutch-Belgian Information Retrieval Workshop (DIR2018). Nov. 2018, Leiden, The Netherlands, 13-16

  6. arXiv:1609.04556  [pdf, ps, other]

    cs.IR

    Resource Selection for Federated Search on the Web

    Authors: Dong Nguyen, Thomas Demeester, Dolf Trieschnigg, Djoerd Hiemstra

    Abstract: A publicly available dataset for federated search reflecting a real web environment has long been absent, making it difficult for researchers to test the validity of their federated search algorithms for the web setting. We present several experiments and analyses on resource selection on the web using a recently released test collection containing the results from more than a hundred real search…

    Submitted 15 September, 2016; originally announced September 2016.

    Comments: CTIT Technical Report, University of Twente

    ACM Class: H.3.3

  7. Beyond Movie Recommendations: Solving the Continuous Cold Start Problem in E-commerce Recommendations

    Authors: Julia Kiseleva, Alexander Tuzhilin, Jaap Kamps, Melanie J. I. Mueller, Lucas Bernardi, Chad Davis, Ivan Kovacek, Mats Stafseng Einarsen, Djoerd Hiemstra

    Abstract: Many e-commerce websites use recommender systems or personalized rankers to personalize search results based on their previous interactions. However, a large fraction of users has no prior interactions, making it impossible to use collaborative filtering or rely on user history for personalization. Even the most active users may visit only a few times a year and may have volatile needs or differen…

    Submitted 26 July, 2016; originally announced July 2016.

  8. Predicting Relevance based on Assessor Disagreement: Analysis and Practical Applications for Search Evaluation

    Authors: Thomas Demeester, Robin Aly, Djoerd Hiemstra, Dong Nguyen, Chris Develder

    Abstract: Evaluation of search engines relies on assessments of search results for selected test queries, from which we would ideally like to draw conclusions in terms of relevance of the results for general (e.g., future, unknown) users. In practice however, most evaluation scenarios only allow us to conclusively determine the relevance towards the particular assessor that provided the judgments. A factor…

    Submitted 23 November, 2015; originally announced November 2015.

    Comments: Accepted for publication in Springer Information Retrieval Journal, special issue on Information Retrieval Evaluation using Test Collections

  9. arXiv:1508.02483  [pdf, ps, other]

    cs.SI

    Determine the User Country of a Tweet

    Authors: Han van der Veen, Djoerd Hiemstra, Tijs van den Broek, Michel Ehrenhard, Ariana Need

    Abstract: In the widely used message platform Twitter, about 2% of the tweets contain the geographical location through exact GPS coordinates (latitude and longitude). Knowing the location of a tweet is useful for many data analytics questions. This research is looking at the determination of a location for tweets that do not contain GPS coordinates. An accuracy of 82% was achieved using a Naive Bayes mode…

    Submitted 11 August, 2015; originally announced August 2015.

    Comments: CTIT Technical Report, University of Twente

    Report number: tr-ctit-15-05

  10. Where to Go on Your Next Trip? Optimizing Travel Destinations Based on User Preferences

    Authors: Julia Kiseleva, Melanie J. I. Müller, Lucas Bernardi, Chad Davis, Ivan Kovacek, Mats Stafseng Einarsen, Jaap Kamps, Alexander Tuzhilin, Djoerd Hiemstra

    Abstract: Recommendation based on user preferences is a common task for e-commerce websites. New recommendation algorithms are often evaluated by offline comparison to baseline algorithms such as recommending random or the most popular items. Here, we investigate how these algorithms themselves perform and compare to the operational production system in large scale online experiments in a real-world applica…

    Submitted 2 June, 2015; originally announced June 2015.

    Comments: 6 pages, 2 figures in SIGIR 2015, SIRIP Symposium on IR in Practice

  11. arXiv:1008.1566  [pdf, other]

    cs.LG cs.AI

    Separate Training for Conditional Random Fields Using Co-occurrence Rate Factorization

    Authors: Zhemin Zhu, Djoerd Hiemstra, Peter Apers, Andreas Wombacher

    Abstract: The standard training method of Conditional Random Fields (CRFs) is very slow for large-scale applications. As an alternative, piecewise training divides the full graph into pieces, trains them independently, and combines the learned weights at test time. In this paper, we present separate training for undirected models based on the novel Co-occurrence Rate Factorization (CR-F). Separate tr…

    Submitted 4 December, 2012; v1 submitted 9 August, 2010; originally announced August 2010.

    Comments: 10 pages

    Report number: TR-CTIT-12-29

  12. arXiv:1005.4752  [pdf, ps, other]

    cs.IR cs.DB

    A database approach to information retrieval: The remarkable relationship between language models and region models

    Authors: Djoerd Hiemstra, Vojkan Mihajlovic

    Abstract: In this report, we unify two quite distinct approaches to information retrieval: region models and language models. Region models were developed for structured document retrieval. They provide a well-defined behaviour as well as a simple query language that allows application developers to rapidly develop applications. Language models are particularly useful to reason about the ranking of search r…

    Submitted 26 May, 2010; originally announced May 2010.

    Comments: Published as CTIT Technical Report 05-35

    Report number: TR-CTIT-10-15

    ACM Class: H.3.3

  13. arXiv:1004.4489  [pdf, other]

    cs.IR

    MIREX: MapReduce Information Retrieval Experiments

    Authors: Djoerd Hiemstra, Claudia Hauff

    Abstract: We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low cost machines to search a web crawl of 0.5 billion pages showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code i…

    Submitted 26 April, 2010; originally announced April 2010.

    Report number: TR-CTIT-10-15

    ACM Class: H.3.3