
Showing 1–50 of 66 results for author: Lioma, C

  1. arXiv:2503.21714

    cs.CL

    As easy as PIE: understanding when pruning causes language models to disagree

    Authors: Pietro Tropeano, Maria Maistro, Tuukka Ruotsalo, Christina Lioma

    Abstract: Language Model (LM) pruning compresses the model by removing weights, nodes, or other parts of its architecture. Typically, pruning focuses on the resulting efficiency gains at the cost of effectiveness. However, when looking at how individual data points are affected by pruning, it turns out that a particular subset of data points always bears most of the brunt (in terms of reduced accuracy) when…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: Accepted to NAACL 2025 (Findings)

  2. Joint Evaluation of Fairness and Relevance in Recommender Systems with Pareto Frontier

    Authors: Theresia Veronika Rampisela, Tuukka Ruotsalo, Maria Maistro, Christina Lioma

    Abstract: Fairness and relevance are two important aspects of recommender systems (RSs). Typically, they are evaluated either (i) separately by individual measures of fairness and relevance, or (ii) jointly using a single measure that accounts for fairness with respect to relevance. However, approach (i) often does not provide a reliable joint estimate of the goodness of the models, as it has two different…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: Accepted to TheWebConf/WWW 2025 (Oral)

  3. arXiv:2501.18805

    cs.IR

    Are Representation Disentanglement and Interpretability Linked in Recommendation Models? A Critical Review and Reproducibility Study

    Authors: Ervin Dervishaj, Tuukka Ruotsalo, Maria Maistro, Christina Lioma

    Abstract: Unsupervised learning of disentangled representations has been closely tied to enhancing the representation interpretability of Recommender Systems (RSs). This has been achieved by making the representation of individual features more distinctly separated, so that it is easier to attribute the contribution of features to the model's predictions. However, such advantages in interpretability and feat…

    Submitted 30 January, 2025; originally announced January 2025.

    Comments: Accepted at the 47th European Conference on Information Retrieval (ECIR 2025)

  4. arXiv:2412.17031

    cs.CL cs.AI

    A Reality Check on Context Utilisation for Retrieval-Augmented Generation

    Authors: Lovisa Hagström, Sara Vera Marjanović, Haeun Yu, Arnav Arora, Christina Lioma, Maria Maistro, Pepa Atanasova, Isabelle Augenstein

    Abstract: Retrieval-augmented generation (RAG) helps address the limitations of parametric knowledge embedded within a language model (LM). In real-world settings, retrieved information can vary in complexity, yet most investigations of LM utilisation of context have been limited to synthetic text. We introduce DRUID (Dataset of Retrieved Unreliable, Insufficient and Difficult-to-understand contexts) with re…

    Submitted 29 May, 2025; v1 submitted 22 December, 2024; originally announced December 2024.

    Comments: Accepted at ACL 2025

  5. Joint Extraction and Classification of Danish Competences for Job Matching

    Authors: Qiuchi Li, Christina Lioma

    Abstract: The matching of competences, such as skills, occupations or knowledge, is a key desideratum for candidates to be fit for jobs. Automatic extraction of competences from CVs and jobs can greatly promote recruiters' productivity in locating relevant candidates for job vacancies. This work presents the first model that jointly extracts and classifies competences from Danish job postings. Different from…

    Submitted 29 October, 2024; originally announced October 2024.

    Journal ref: Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13981. Springer, Cham

  6. arXiv:2407.17023

    cs.CL cs.AI

    DYNAMICQA: Tracing Internal Knowledge Conflicts in Language Models

    Authors: Sara Vera Marjanović, Haeun Yu, Pepa Atanasova, Maria Maistro, Christina Lioma, Isabelle Augenstein

    Abstract: Knowledge-intensive language understanding tasks require Language Models (LMs) to integrate relevant context, mitigating their inherent weaknesses, such as incomplete or outdated knowledge. However, conflicting knowledge can be present in the LM's parameters, termed intra-memory conflict, which can affect a model's propensity to accept contextual knowledge. To study the effect of intra-memory conf…

    Submitted 7 October, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: 15 pages, 6 figures, Accepted to Findings of EMNLP 2024

    MSC Class: 68T50 ACM Class: I.2.7

  7. Can We Trust Recommender System Fairness Evaluation? The Role of Fairness and Relevance

    Authors: Theresia Veronika Rampisela, Tuukka Ruotsalo, Maria Maistro, Christina Lioma

    Abstract: Relevance and fairness are two major objectives of recommender systems (RSs). Recent work proposes measures of RS fairness that are either independent from relevance (fairness-only) or conditioned on relevance (joint measures). While fairness-only measures have been studied extensively, we look into whether joint measures can be trusted. We collect all joint evaluation measures of RS relevance and…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted to SIGIR 2024 as full paper

  8. Recommending Target Actions Outside Sessions in the Data-poor Insurance Domain

    Authors: Simone Borg Bruun, Christina Lioma, Maria Maistro

    Abstract: Providing personalized recommendations for insurance products is particularly challenging due to the intrinsic and distinctive features of the insurance domain. First, unlike more traditional domains such as retail or movies, a large amount of user feedback is not available and the item catalog is smaller. Second, due to the higher complexity of products, the majority of users still prefer to compl…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2211.15360

    Journal ref: ACM Transactions on Recommender Systems 2023

  9. arXiv:2402.15708

    cs.CL cs.AI cs.IR

    Query Augmentation by Decoding Semantics from Brain Signals

    Authors: Ziyi Ye, Jingtao Zhan, Qingyao Ai, Yiqun Liu, Maarten de Rijke, Christina Lioma, Tuukka Ruotsalo

    Abstract: Query augmentation is a crucial technique for refining semantically imprecise queries. Traditionally, query augmentation relies on extracting information from initially retrieved, potentially relevant documents. If the quality of the initially retrieved documents is low, then the effectiveness of query augmentation would be limited as well. We propose Brain-Aug, which enhances a query by incorpora…

    Submitted 3 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  10. arXiv:2402.13006

    cs.LG cs.CL

    Investigating the Impact of Model Instability on Explanations and Uncertainty

    Authors: Sara Vera Marjanović, Isabelle Augenstein, Christina Lioma

    Abstract: Explainable AI methods facilitate the understanding of model behaviour, yet small, imperceptible perturbations to inputs can vastly distort explanations. As these explanations are typically evaluated holistically, before model deployment, it is difficult to assess when a particular explanation is trustworthy. Some studies have tried to create confidence estimators for explanations, but none have…

    Submitted 4 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  11. arXiv:2401.15061

    cs.NE cs.ET physics.optics

    Digital-analog hybrid matrix multiplication processor for optical neural networks

    Authors: Xiansong Meng, Deming Kong, Kwangwoong Kim, Qiuchi Li, Po Dong, Ingemar J. Cox, Christina Lioma, Hao Hu

    Abstract: The computational demands of modern AI have spurred interest in optical neural networks (ONNs) which offer the potential benefits of increased speed and lower power consumption. However, current ONNs face various challenges, most significantly a limited calculation precision (typically around 4 bits) and the requirement for high-resolution signal format converters (digital-to-analogue conversions (…

    Submitted 26 January, 2024; originally announced January 2024.

  12. arXiv:2311.09889

    cs.CL

    Language Generation from Brain Recordings

    Authors: Ziyi Ye, Qingyao Ai, Yiqun Liu, Maarten de Rijke, Min Zhang, Christina Lioma, Tuukka Ruotsalo

    Abstract: Generating human language through non-invasive brain-computer interfaces (BCIs) has the potential to unlock many applications, such as serving disabled patients and improving communication. However, language generation via BCIs has so far been successful only within a classification setup for selecting pre-generated sentence continuation candidates with the most likely cortical sema…

    Submitted 11 March, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Preprint. Under Submission

  13. Evaluation Measures of Individual Item Fairness for Recommender Systems: A Critical Study

    Authors: Theresia Veronika Rampisela, Maria Maistro, Tuukka Ruotsalo, Christina Lioma

    Abstract: Fairness is an emerging and challenging topic in recommender systems. In recent years, various ways of evaluating and therefore improving fairness have emerged. In this study, we examine existing evaluation measures of fairness in recommender systems. Specifically, we focus solely on exposure-based fairness measures of individual items that aim to quantify the disparity in how individual items are…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted to ACM Transactions on Recommender Systems (TORS)

  14. arXiv:2305.18029

    cs.CL cs.AI

    Faithfulness Tests for Natural Language Explanations

    Authors: Pepa Atanasova, Oana-Maria Camburu, Christina Lioma, Thomas Lukasiewicz, Jakob Grue Simonsen, Isabelle Augenstein

    Abstract: Explanations of neural models aim to reveal a model's decision-making process for its predictions. However, recent work shows that current methods giving explanations such as saliency maps or counterfactuals can be misleading, as they are prone to present reasons that are unfaithful to the model's inner workings. This work explores the challenging question of evaluating the faithfulness of natural…

    Submitted 30 June, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: Short paper, ACL 2023

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)

  15. arXiv:2302.13812

    quant-ph cs.CL

    Adapting Pre-trained Language Models for Quantum Natural Language Processing

    Authors: Qiuchi Li, Benyou Wang, Yudong Zhu, Christina Lioma, Qun Liu

    Abstract: The emerging classical-quantum transfer learning paradigm has brought decent performance to quantum computational models in many tasks, such as computer vision, by enabling a combination of quantum models and classical pre-trained neural networks. However, using quantum computing with pre-trained models has yet to be explored in natural language processing (NLP). Due to the high linearity constr…

    Submitted 24 February, 2023; originally announced February 2023.

  16. Graph-based Recommendation for Sparse and Heterogeneous User Interactions

    Authors: Simone Borg Bruun, Kacper Kenji Lesniak, Mirko Biasini, Vittorio Carmignani, Panagiotis Filianos, Christina Lioma, Maria Maistro

    Abstract: Recommender system research has oftentimes focused on approaches that operate on large-scale datasets containing millions of user interactions. However, many small businesses struggle to apply state-of-the-art models due to their very limited availability of data. We propose a graph-based recommender model which utilizes heterogeneous interactions between users and content of different types and i…

    Submitted 26 January, 2023; originally announced January 2023.

  17. arXiv:2212.02885

    cs.CL

    Template-based Recruitment Email Generation For Job Recommendation

    Authors: Qiuchi Li, Christina Lioma

    Abstract: Text generation has long been a popular research topic in NLP. However, the task of generating recruitment emails from recruiters to candidates in the job recommendation scenario has received little attention from the research community. This work aims at defining the topic of automatic email generation for job recommendation, identifying the challenges, and providing a baseline template-based solut…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: Accepted by GEM2022 workshop

  18. Principled Multi-Aspect Evaluation Measures of Rankings

    Authors: Maria Maistro, Lucas Chaves Lima, Jakob Grue Simonsen, Christina Lioma

    Abstract: Information Retrieval evaluation has traditionally focused on defining principled ways of assessing the relevance of a ranked list of documents with respect to a query. Several methods extend this type of evaluation beyond relevance, making it possible to evaluate different aspects of a document ranking (e.g., relevance, usefulness, or credibility) using a single measure (multi-aspect evaluation).…

    Submitted 1 December, 2022; originally announced December 2022.

  19. Learning Recommendations from User Actions in the Item-poor Insurance Domain

    Authors: Simone Borg Bruun, Maria Maistro, Christina Lioma

    Abstract: While personalised recommendations are successful in domains like retail, where large volumes of user feedback on items are available, the generation of automatic recommendations in data-sparse domains, like insurance purchasing, is an open problem. The insurance domain is notoriously data-sparse because the number of products is typically low (compared to retail) and they are usually purchased to…

    Submitted 28 November, 2022; originally announced November 2022.

  20. arXiv:2204.02007

    cs.CL cs.LG

    Fact Checking with Insufficient Evidence

    Authors: Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein

    Abstract: Automating the fact checking (FC) process relies on information obtained from external sources. In this work, we posit that it is crucial for FC models to make veracity predictions only when there is sufficient evidence and otherwise indicate when it is not enough. To this end, we are the first to study what information FC models consider sufficient by introducing a novel task and advancing it wit…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: 14 pages

    MSC Class: cs.CL

  21. arXiv:2109.03756

    cs.LG

    Diagnostics-Guided Explanation Generation

    Authors: Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein

    Abstract: Explanations shed light on a machine learning model's rationales and can aid in identifying deficiencies in its reasoning process. Explanation generation models are typically trained in a supervised way given human explanations. When such annotations are not available, explanations are often selected as those portions of the input that maximise a downstream task's performance, which corresponds to…

    Submitted 8 September, 2021; originally announced September 2021.

    ACM Class: I.2.7

  22. Unsupervised Multi-Index Semantic Hashing

    Authors: Christian Hansen, Casper Hansen, Jakob Grue Simonsen, Stephen Alstrup, Christina Lioma

    Abstract: Semantic hashing represents documents as compact binary vectors (hash codes) and allows both efficient and effective similarity search in large-scale information retrieval. The state of the art has primarily focused on learning hash codes that improve similarity search effectiveness, while assuming a brute-force linear scan strategy for searching over all the hash codes, even though much faster al…

    Submitted 26 March, 2021; originally announced March 2021.

    Comments: Proceedings of the 2021 World Wide Web Conference, published under Creative Commons CC-BY 4.0 License

  23. Projected Hamming Dissimilarity for Bit-Level Importance Coding in Collaborative Filtering

    Authors: Christian Hansen, Casper Hansen, Jakob Grue Simonsen, Christina Lioma

    Abstract: When reasoning about tasks that involve large amounts of data, a common approach is to represent data items as objects in the Hamming space where operations can be done efficiently and effectively. Object similarity can then be computed by learning binary representations (hash codes) of the objects and computing their Hamming distance. While this is highly efficient, each bit dimension is equally…

    Submitted 26 March, 2021; originally announced March 2021.

    Comments: Proceedings of the 2021 World Wide Web Conference, published under Creative Commons CC-BY 4.0 License

  24. arXiv:2103.10572

    cs.MM

    Quantum-inspired Multimodal Fusion for Video Sentiment Analysis

    Authors: Qiuchi Li, Dimitris Gkoumas, Christina Lioma, Massimo Melucci

    Abstract: We tackle the crucial challenge of fusing different modalities of features for multimodal sentiment analysis. Mainly based on neural networks, existing approaches largely model multimodal interactions in an implicit and hard-to-understand manner. We address this limitation with inspirations from quantum theory, which contains principled methods for modeling complicated interactions and correlation…

    Submitted 22 March, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

    Comments: Post-print accepted by Information Fusion

  25. arXiv:2012.12366

    cs.CL

    Multi-Head Self-Attention with Role-Guided Masks

    Authors: Dongsheng Wang, Casper Hansen, Lucas Chaves Lima, Christian Hansen, Maria Maistro, Jakob Grue Simonsen, Christina Lioma

    Abstract: The state of the art in learning meaningful semantic representations of words is the Transformer model and its attention mechanisms. Simply put, the attention mechanisms learn to attend to specific parts of the input, dispensing with recurrence and convolutions. While some of the learned attention heads have been found to play linguistically interpretable roles, they can be redundant or prone to errors.…

    Submitted 22 December, 2020; originally announced December 2020.

    Comments: Accepted at ECIR 2021

  26. arXiv:2011.12684

    cs.IR cs.LG

    Denmark's Participation in the Search Engine TREC COVID-19 Challenge: Lessons Learned about Searching for Precise Biomedical Scientific Information on COVID-19

    Authors: Lucas Chaves Lima, Casper Hansen, Christian Hansen, Dongsheng Wang, Maria Maistro, Birger Larsen, Jakob Grue Simonsen, Christina Lioma

    Abstract: This report describes the participation of two Danish universities, University of Copenhagen and Aalborg University, in the international search engine competition on COVID-19 (the 2020 TREC-COVID Challenge) organised by the U.S. National Institute of Standards and Technology (NIST) and its Text Retrieval Conference (TREC) division. The aim of the competition was to find the best search engine str…

    Submitted 26 November, 2020; v1 submitted 25 November, 2020; originally announced November 2020.

  27. arXiv:2009.13295

    cs.CL cs.LG

    A Diagnostic Study of Explainability Techniques for Text Classification

    Authors: Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein

    Abstract: Recent developments in machine learning have introduced models that approach human performance at the cost of increased architectural complexity. Efforts to make the rationales behind the models' predictions transparent have inspired an abundance of new explainability techniques. Provided with an already trained model, they compute saliency scores for the words of an input instance. However, there…

    Submitted 25 September, 2020; originally announced September 2020.

    MSC Class: cs.CL; cs.AI ACM Class: I.2.7

  28. Unsupervised Semantic Hashing with Pairwise Reconstruction

    Authors: Casper Hansen, Christian Hansen, Jakob Grue Simonsen, Stephen Alstrup, Christina Lioma

    Abstract: Semantic Hashing is a popular family of methods for efficient similarity search in large-scale datasets. In Semantic Hashing, documents are encoded as short binary vectors (i.e., hash codes), such that semantic similarity can be efficiently computed using the Hamming distance. Recent state-of-the-art approaches have utilized weak supervision to train better performing hashing models. Inspired by t…

    Submitted 1 July, 2020; originally announced July 2020.

    Comments: Accepted at SIGIR'20

  29. Factuality Checking in News Headlines with Eye Tracking

    Authors: Christian Hansen, Casper Hansen, Jakob Grue Simonsen, Birger Larsen, Stephen Alstrup, Christina Lioma

    Abstract: We study whether it is possible to infer if a news headline is true or false using only the movement of the human eyes when reading news headlines. Our study with 55 participants who are eye-tracked when reading 108 news headlines (72 true, 36 false) shows that false headlines receive statistically significantly less visual attention than true headlines. We further build an ensemble learner that p…

    Submitted 17 June, 2020; originally announced June 2020.

    Comments: Accepted to SIGIR 2020

  30. Content-aware Neural Hashing for Cold-start Recommendation

    Authors: Casper Hansen, Christian Hansen, Jakob Grue Simonsen, Stephen Alstrup, Christina Lioma

    Abstract: Content-aware recommendation approaches are essential for providing meaningful recommendations for new (i.e., cold-start) items in a recommender system. We present a content-aware neural hashing-based collaborative filtering approach (NeuHash-CF), which generates binary hash codes for users and items, such that the highly efficient Hamming distance can be used for estimating user…

    Submitted 31 May, 2020; originally announced June 2020.

    Comments: Accepted to SIGIR 2020

  31. arXiv:2004.05773

    cs.CL cs.AI cs.LG

    Generating Fact Checking Explanations

    Authors: Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein

    Abstract: Most existing work on automated fact checking is concerned with predicting the veracity of claims based on metadata, social network spread, language used in claims, and, more recently, evidence supporting or denying claims. A crucial piece of the puzzle that is still missing is to understand how to automate the most elaborate part of the process -- generating justifications for verdicts on claims.…

    Submitted 13 April, 2020; originally announced April 2020.

    Comments: In Proceedings of the 2020 Annual Conference of the Association for Computational Linguistics (ACL 2020)

  32. arXiv:1912.12333

    cs.CL cs.IR

    Encoding word order in complex embeddings

    Authors: Benyou Wang, Donghao Zhao, Christina Lioma, Qiuchi Li, Peng Zhang, Jakob Grue Simonsen

    Abstract: Sequential word order is important when processing text. Currently, neural networks (NNs) address this by modeling word position using position embeddings. The problem is that position embeddings capture the position of individual words, but not the ordered relationship (e.g., adjacency or precedence) between individual word positions. We present a novel and principled solution for modeling both t…

    Submitted 28 June, 2020; v1 submitted 27 December, 2019; originally announced December 2019.

    Comments: 15 pages, 3 figures, ICLR 2020 spotlight paper. A typo on Ablation Table was revised thanks to Jingquan Zeng from SCUT

  33. arXiv:1909.06856

    cs.CY

    Modelling End-of-Session Actions in Educational Systems

    Authors: Christian Hansen, Casper Hansen, Stephen Alstrup, Christina Lioma

    Abstract: In this paper we consider the problem of modelling when students end their session in an online mathematics educational system. Being able to model this accurately will help us optimize the way content is presented and consumed. This is done by modelling the probability of an action being the last in a session, which we denote as the End-of-Session probability. We use log data from a system where…

    Submitted 15 September, 2019; originally announced September 2019.

    Comments: In proceedings of EDM 2019

  34. arXiv:1909.03242

    cs.CL cs.IR cs.LG stat.ML

    MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims

    Authors: Isabelle Augenstein, Christina Lioma, Dongsheng Wang, Lucas Chaves Lima, Casper Hansen, Christian Hansen, Jakob Grue Simonsen

    Abstract: We contribute the largest publicly available dataset of naturally occurring factual claims for the purpose of automatic claim verification. It is collected from 26 fact checking websites in English, paired with textual sources and rich metadata, and labelled for veracity by human expert journalists. We present an in-depth analysis of the dataset, highlighting characteristics and challenges. Furthe…

    Submitted 21 October, 2019; v1 submitted 7 September, 2019; originally announced September 2019.

    Comments: Proceedings of EMNLP 2019, to appear

  35. arXiv:1906.00674

    cs.IR cs.CL

    Contextually Propagated Term Weights for Document Representation

    Authors: Casper Hansen, Christian Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma

    Abstract: Word embeddings predict a word from its neighbours by learning small, dense embedding vectors. In practice, this prediction corresponds to a semantic score given to the predicted word (or term weight). We present a novel model that, given a target word, redistributes part of that word's weight (that has been computed with word embeddings) across words occurring in similar contexts as the target wo…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: SIGIR 2019

  36. arXiv:1906.00671

    cs.IR cs.CL cs.LG

    Unsupervised Neural Generative Semantic Hashing

    Authors: Casper Hansen, Christian Hansen, Jakob Grue Simonsen, Stephen Alstrup, Christina Lioma

    Abstract: Fast similarity search is a key component in large-scale information retrieval, where semantic hashing has become a popular strategy for representing documents as binary hash codes. Recent advances in this area have been obtained through neural network based models: generative models trained by learning to reconstruct the original documents. We present a novel unsupervised generative semantic hash…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: SIGIR 2019

  37. arXiv:1904.00761

    cs.CL cs.LG stat.ML

    Neural Speed Reading with Structural-Jump-LSTM

    Authors: Christian Hansen, Casper Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma

    Abstract: Recurrent neural networks (RNNs) can model natural language by sequentially 'reading' input tokens and outputting a distributed representation of each token. Due to the sequential nature of RNNs, inference time is linearly dependent on the input length, and all inputs are read regardless of their importance. Efforts to speed up this inference, known as 'neural speed reading', either ignore or skim…

    Submitted 2 April, 2019; v1 submitted 20 March, 2019; originally announced April 2019.

    Comments: 10 pages

    Journal ref: 7th International Conference on Learning Representations (ICLR) 2019

  38. arXiv:1903.08408

    cs.IR cs.LG

    Modelling Sequential Music Track Skips using a Multi-RNN Approach

    Authors: Christian Hansen, Casper Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma

    Abstract: Modelling sequential music skips provides streaming companies the ability to better understand the needs of the user base, resulting in a better user experience by reducing the need to manually skip certain music tracks. This paper describes the solution of the University of Copenhagen DIKU-IR team in the 'Spotify Sequential Skip Prediction Challenge', where the task was to predict the skip behavi…

    Submitted 20 March, 2019; originally announced March 2019.

    Comments: 4 pages

    Journal ref: 12th ACM International Conference on Web Search and Data Mining (WSDM) 2019, WSDM Cup

  39. arXiv:1903.08404

    cs.IR cs.CL cs.LG

    Neural Check-Worthiness Ranking with Weak Supervision: Finding Sentences for Fact-Checking

    Authors: Casper Hansen, Christian Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma

    Abstract: Automatic fact-checking systems detect misinformation, such as fake news, by (i) selecting check-worthy sentences for fact-checking, (ii) gathering related information to the sentences, and (iii) inferring the factuality of the sentences. Most prior research on (i) uses hand-crafted features to select check-worthy sentences, and does not explicitly account for the recent finding that the top weigh…

    Submitted 20 March, 2019; originally announced March 2019.

    Comments: 6 pages

    Journal ref: In Companion Proceedings of the 2019 World Wide Web Conference

  40. arXiv:1903.08389

    cs.CL cs.AI

    Contextual Compositionality Detection with External Knowledge Bases and Word Embeddings

    Authors: Dongsheng Wang, Qiuchi Li, Lucas Chaves Lima, Jakob Grue Simonsen, Christina Lioma

    Abstract: When the meaning of a phrase cannot be inferred from the individual meanings of its words (e.g., hot dog), that phrase is said to be non-compositional. Automatic compositionality detection in multi-word phrases is critical in any application of semantic processing, such as search engines; failing to detect non-compositional phrases can hurt system effectiveness notably. Existing research treats ph…

    Submitted 20 March, 2019; originally announced March 2019.

    Comments: WWW '19 Companion, May 13-17, 2019, San Francisco, CA, USA

  41. Predicting antimicrobial drug consumption using web search data

    Authors: Niels Dalum Hansen, Kåre Mølbak, Ingemar Cox, Christina Lioma

    Abstract: Consumption of antimicrobial drugs, such as antibiotics, is linked with antimicrobial resistance. Surveillance of antimicrobial drug consumption is therefore an important element in dealing with antimicrobial resistance. Many countries lack sufficient surveillance systems. Usage of web-mined data therefore has the potential to improve current surveillance methods. To this end, we study how well an…

    Submitted 9 March, 2018; originally announced March 2018.

  42. arXiv:1802.06833

    cs.IR

    Seasonal Web Search Query Selection for Influenza-Like Illness (ILI) Estimation

    Authors: Niels Dalum Hansen, Kåre Mølbak, Ingemar J. Cox, Christina Lioma

    Abstract: Influenza-like illness (ILI) estimation from web search data is an important web analytics task. The basic idea is to use the frequencies of queries in web search logs that are correlated with past ILI activity as features when estimating current ILI activity. It has been noted that since influenza is seasonal, this approach can lead to spurious correlations with features/queries that also exhibit…

    Submitted 19 February, 2018; originally announced February 2018.
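    The abstract's "basic idea" (rank queries by how well their search frequencies track past ILI activity, then keep the best as features) can be sketched as below. This is a minimal illustration of that correlation-based baseline only, not the seasonal selection method the paper proposes. It assumes query frequencies and ILI rates are equal-length series aligned on the same weeks; the names `pearson`, `select_queries`, and `top_n` are hypothetical.

    ```python
    import math


    def pearson(x, y):
        """Pearson correlation of two equal-length series; 0.0 if either is constant."""
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
        sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
        if sd_x == 0 or sd_y == 0:
            return 0.0
        return cov / (sd_x * sd_y)


    def select_queries(query_freqs, ili_history, top_n):
        """Baseline selection (assumed weekly-aligned series): rank queries by the
        correlation of their frequency with past ILI activity, keep the top_n."""
        ranked = sorted(
            query_freqs,
            key=lambda q: pearson(query_freqs[q], ili_history),
            reverse=True,
        )
        return ranked[:top_n]
    ```

    Nothing in this baseline distinguishes a query that tracks influenza from one whose volume simply peaks every winter; both correlate with past ILI activity, which is the spurious-correlation risk the abstract raises for seasonal features.
    
    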

  43. arXiv:1802.02603 [pdf]

    cs.IR

    To Phrase or Not to Phrase - Impact of User versus System Term Dependence Upon Retrieval

    Authors: Christina Lioma, Birger Larsen, Peter Ingwersen

    Abstract: When submitting queries to information retrieval (IR) systems, users often have the option of specifying which, if any, of the query terms are heavily dependent on each other and should be treated as a fixed phrase, for instance by placing them between quotes. In addition to such cases where users specify term dependence, automatic ways also exist for IR systems to detect dependent terms in querie…

    Submitted 5 March, 2018; v1 submitted 7 February, 2018; originally announced February 2018.

  44. arXiv:1709.03742 [pdf, other]

    cs.IR cs.CL

    Dependencies: Formalising Semantic Catenae for Information Retrieval

    Authors: Christina Lioma

    Abstract: Building machines that can understand text like humans is an AI-complete problem. A great deal of research has already gone into this, with astounding results, allowing everyday people to discuss with their telephones, or have their reading materials analysed and classified by computers. A prerequisite for processing text semantics, common to the above examples, is having some computational repres…

    Submitted 12 September, 2017; originally announced September 2017.

    Comments: This document is a doktordisputats, a dissertation within the Danish academic system required to obtain the degree of Doctor Scientiarum, in form and function equivalent to the French and German Habilitation and the Higher Doctorate of the Commonwealth

  45. arXiv:1708.07157 [pdf, ps, other]

    cs.IR

    Evaluation Measures for Relevance and Credibility in Ranked Lists

    Authors: Christina Lioma, Jakob Grue Simonsen, Birger Larsen

    Abstract: Recent discussions on alternative facts, fake news, and post-truth politics have motivated research on creating technologies that allow people not only to access information, but also to assess the credibility of the information presented to them by information retrieval systems. Whereas technology is in place for filtering information according to relevance and/or credibility, no single measure c…

    Submitted 23 August, 2017; originally announced August 2017.

  46. arXiv:1708.06403 [pdf, other]

    cs.CY

    Smart City Analytics: Ensemble-Learned Prediction of Citizen Home Care

    Authors: Casper Hansen, Christian Hansen, Stephen Alstrup, Christina Lioma

    Abstract: We present an ensemble learning method that predicts large increases in the hours of home care received by citizens. The method is supervised, and uses different ensembles of either linear (logistic regression) or non-linear (random forests) classifiers. Experiments with data available from 2013 to 2017 for every citizen in Copenhagen receiving home care (27,775 citizens) show that prediction can…

    Submitted 21 August, 2017; originally announced August 2017.

  47. arXiv:1708.04164 [pdf, other]

    cs.CY cs.HC

    Sequence Modelling For Analysing Student Interaction with Educational Systems

    Authors: Christian Hansen, Casper Hansen, Niklas Hjuler, Stephen Alstrup, Christina Lioma

    Abstract: The analysis of log data generated by online educational systems is an important task for improving the systems, and furthering our knowledge of how students learn. This paper uses previously unseen log data from Edulab, the largest provider of digital learning for mathematics in Denmark, to analyse the sessions of its users, where 1.08 million student sessions are extracted from a subset of their…

    Submitted 14 August, 2017; originally announced August 2017.

    Comments: The 10th International Conference on Educational Data Mining 2017

  48. arXiv:1704.01851 [pdf, ps, other]

    cs.IR

    Fixed versus Dynamic Co-Occurrence Windows in TextRank Term Weights for Information Retrieval

    Authors: Wei Lu, Qikai Cheng, Christina Lioma

    Abstract: TextRank is a variant of PageRank typically used in graphs that represent documents, and where vertices denote terms and edges denote relations between terms. Quite often the relation between terms is simple term co-occurrence within a fixed window of k terms. The output of TextRank when applied iteratively is a score for each vertex, i.e. a term weight, that can be used for information retrieval…

    Submitted 6 April, 2017; originally announced April 2017.
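    The fixed-window setup the abstract describes (terms as vertices, edges for co-occurrence within a window of k terms, iterated scores as term weights) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is assumed undirected and unweighted, the damping factor d = 0.85 is the usual PageRank convention rather than a value taken from the paper, and the names `cooccurrence_graph`, `textrank`, `max_iter`, and `tol` are hypothetical.

    ```python
    def cooccurrence_graph(tokens, k):
        """Build an undirected, unweighted term graph: an edge links two distinct
        terms that co-occur within a fixed window of k consecutive terms."""
        graph = {term: set() for term in tokens}
        for i, term in enumerate(tokens):
            # The window starting at position i covers tokens[i : i + k].
            for other in tokens[i + 1 : i + k]:
                if other != term:
                    graph[term].add(other)
                    graph[other].add(term)
        return graph


    def textrank(graph, d=0.85, max_iter=100, tol=1e-8):
        """Iterate S(v) = (1 - d) + d * sum_{u in N(v)} S(u) / |N(u)| until no
        score changes by more than tol; returns one weight per term."""
        scores = {v: 1.0 for v in graph}
        for _ in range(max_iter):
            new_scores = {
                v: (1 - d) + d * sum(scores[u] / len(graph[u]) for u in graph[v])
                for v in graph
            }
            delta = max((abs(new_scores[v] - scores[v]) for v in graph), default=0.0)
            scores = new_scores
            if delta < tol:
                break
        return scores
    ```

    Changing `k` changes which edges exist and hence every weight; the sketch covers only the fixed-window case that the paper's title contrasts with dynamic windows.
    
    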

  49. arXiv:1704.01845 [pdf, ps, other]

    cs.IR

    Report on TBAS 2012: Workshop on Task-Based and Aggregated Search

    Authors: Birger Larsen, Christina Lioma, Arjen de Vries

    Abstract: The ECIR half-day workshop on Task-Based and Aggregated Search (TBAS) was held in Barcelona, Spain on 1 April 2012. The program included a keynote talk by Professor Järvelin, six full paper presentations, two poster presentations, and an interactive discussion among the approximately 25 participants. This report overviews the aims and contents of the workshop and outlines the major outcomes.

    Submitted 6 April, 2017; originally announced April 2017.

  50. arXiv:1704.01617 [pdf, ps, other]

    cs.IR

    Part of Speech Based Term Weighting for Information Retrieval

    Authors: Christina Lioma, Roi Blanco

    Abstract: Automatic language processing tools typically assign to terms so-called weights corresponding to the contribution of terms to information content. Traditionally, term weights are computed from lexical statistics, e.g., term frequencies. We propose a new type of term weight that is computed from part of speech (POS) n-gram statistics. The proposed POS-based term weight represents how informative a…

    Submitted 5 April, 2017; originally announced April 2017.