Showing 1–24 of 24 results for author: Maistro, M

  1. arXiv:2505.17630  [pdf, other]

    cs.CL cs.LG

    GIM: Improved Interpretability for Large Language Models

    Authors: Joakim Edin, Róbert Csordás, Tuukka Ruotsalo, Zhengxuan Wu, Maria Maistro, Jing Huang, Lars Maaløe

    Abstract: Ensuring faithful interpretability in large language models is imperative for trustworthy and reliable AI. A key obstacle is self-repair, a phenomenon where networks compensate for reduced signal in one component by amplifying others, masking the true importance of the ablated component. While prior work attributes self-repair to layer normalization and back-up components that compensate for ablat…

    Submitted 23 May, 2025; originally announced May 2025.

    MSC Class: 68T07 ACM Class: I.2.0; I.2.7

  2. arXiv:2503.21714  [pdf, other]

    cs.CL

    As easy as PIE: understanding when pruning causes language models to disagree

    Authors: Pietro Tropeano, Maria Maistro, Tuukka Ruotsalo, Christina Lioma

    Abstract: Language Model (LM) pruning compresses the model by removing weights, nodes, or other parts of its architecture. Typically, pruning focuses on the resulting efficiency gains at the cost of effectiveness. However, when looking at how individual data points are affected by pruning, it turns out that a particular subset of data points always bears most of the brunt (in terms of reduced accuracy) when…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: Accepted to NAACL 2025 (Findings)

  3. Joint Evaluation of Fairness and Relevance in Recommender Systems with Pareto Frontier

    Authors: Theresia Veronika Rampisela, Tuukka Ruotsalo, Maria Maistro, Christina Lioma

    Abstract: Fairness and relevance are two important aspects of recommender systems (RSs). Typically, they are evaluated either (i) separately by individual measures of fairness and relevance, or (ii) jointly using a single measure that accounts for fairness with respect to relevance. However, approach (i) often does not provide a reliable joint estimate of the goodness of the models, as it has two different…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: Accepted to TheWebConf/WWW 2025 (Oral)

  4. arXiv:2501.18805  [pdf, ps, other]

    cs.IR

    Are Representation Disentanglement and Interpretability Linked in Recommendation Models? A Critical Review and Reproducibility Study

    Authors: Ervin Dervishaj, Tuukka Ruotsalo, Maria Maistro, Christina Lioma

    Abstract: Unsupervised learning of disentangled representations has been closely tied to enhancing the representation interpretability of Recommender Systems (RSs). This has been achieved by making the representation of individual features more distinctly separated, so that it is easier to attribute the contribution of features to the model's predictions. However, such advantages in interpretability and feat…

    Submitted 30 January, 2025; originally announced January 2025.

    Comments: Accepted at the 47th European Conference on Information Retrieval (ECIR 2025)

  5. arXiv:2412.17031  [pdf, ps, other]

    cs.CL cs.AI

    A Reality Check on Context Utilisation for Retrieval-Augmented Generation

    Authors: Lovisa Hagström, Sara Vera Marjanović, Haeun Yu, Arnav Arora, Christina Lioma, Maria Maistro, Pepa Atanasova, Isabelle Augenstein

    Abstract: Retrieval-augmented generation (RAG) helps address the limitations of parametric knowledge embedded within a language model (LM). In real-world settings, retrieved information can vary in complexity, yet most investigations of LM utilisation of context have been limited to synthetic text. We introduce DRUID (Dataset of Retrieved Unreliable, Insufficient and Difficult-to-understand contexts) with re…

    Submitted 29 May, 2025; v1 submitted 22 December, 2024; originally announced December 2024.

    Comments: Accepted at ACL 2025

  6. arXiv:2408.08137  [pdf, other]

    cs.LG

    Normalized AOPC: Fixing Misleading Faithfulness Metrics for Feature Attribution Explainability

    Authors: Joakim Edin, Andreas Geert Motzfeldt, Casper L. Christensen, Tuukka Ruotsalo, Lars Maaløe, Maria Maistro

    Abstract: Deep neural network predictions are notoriously difficult to interpret. Feature attribution methods aim to explain these predictions by identifying the contribution of each input feature. Faithfulness, often evaluated using the area over the perturbation curve (AOPC), reflects feature attributions' accuracy in describing the internal mechanisms of deep neural networks. However, many studies rely o…

    Submitted 23 May, 2025; v1 submitted 15 August, 2024; originally announced August 2024.

    Comments: Accepted to ACL 2025 Main

    ACM Class: I.2.0

  7. arXiv:2407.17023  [pdf, other]

    cs.CL cs.AI

    DYNAMICQA: Tracing Internal Knowledge Conflicts in Language Models

    Authors: Sara Vera Marjanović, Haeun Yu, Pepa Atanasova, Maria Maistro, Christina Lioma, Isabelle Augenstein

    Abstract: Knowledge-intensive language understanding tasks require Language Models (LMs) to integrate relevant context, mitigating their inherent weaknesses, such as incomplete or outdated knowledge. However, conflicting knowledge can be present in the LM's parameters, termed intra-memory conflict, which can affect a model's propensity to accept contextual knowledge. To study the effect of intra-memory conf…

    Submitted 7 October, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: 15 pages, 6 figures, Accepted to Findings of EMNLP 2024

    MSC Class: 68T50 ACM Class: I.2.7

  8. arXiv:2406.08958  [pdf, other]

    cs.LG

    An Unsupervised Approach to Achieve Supervised-Level Explainability in Healthcare Records

    Authors: Joakim Edin, Maria Maistro, Lars Maaløe, Lasse Borgholt, Jakob D. Havtorn, Tuukka Ruotsalo

    Abstract: Electronic healthcare records are vital for patient safety as they document conditions, plans, and procedures in both free text and medical codes. Language models have significantly enhanced the processing of such records, streamlining workflows and reducing manual data entry, thereby saving healthcare providers significant resources. However, the black-box nature of these models often leaves heal…

    Submitted 28 September, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to EMNLP 2024 Main

  9. Can We Trust Recommender System Fairness Evaluation? The Role of Fairness and Relevance

    Authors: Theresia Veronika Rampisela, Tuukka Ruotsalo, Maria Maistro, Christina Lioma

    Abstract: Relevance and fairness are two major objectives of recommender systems (RSs). Recent work proposes measures of RS fairness that are either independent from relevance (fairness-only) or conditioned on relevance (joint measures). While fairness-only measures have been studied extensively, we look into whether joint measures can be trusted. We collect all joint evaluation measures of RS relevance and…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted to SIGIR 2024 as full paper

  10. Dataset and Models for Item Recommendation Using Multi-Modal User Interactions

    Authors: Simone Borg Bruun, Krisztian Balog, Maria Maistro

    Abstract: While recommender systems with multi-modal item representations (image, audio, and text) have been widely explored, learning recommendations from multi-modal user interactions (e.g., clicks and speech) remains an open problem. We study the case of multi-modal user interactions in a setting where users engage with a service provider through multiple channels (website and call center). In such case…

    Submitted 7 May, 2024; originally announced May 2024.

  11. Recommending Target Actions Outside Sessions in the Data-poor Insurance Domain

    Authors: Simone Borg Bruun, Christina Lioma, Maria Maistro

    Abstract: Providing personalized recommendations for insurance products is particularly challenging due to the intrinsic and distinctive features of the insurance domain. First, unlike more traditional domains like retail, movie etc., a large amount of user feedback is not available and the item catalog is smaller. Second, due to the higher complexity of products, the majority of users still prefer to compl…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2211.15360

    Journal ref: ACM Transactions on Recommender Systems 2023

  12. Evaluation Measures of Individual Item Fairness for Recommender Systems: A Critical Study

    Authors: Theresia Veronika Rampisela, Maria Maistro, Tuukka Ruotsalo, Christina Lioma

    Abstract: Fairness is an emerging and challenging topic in recommender systems. In recent years, various ways of evaluating and therefore improving fairness have emerged. In this study, we examine existing evaluation measures of fairness in recommender systems. Specifically, we focus solely on exposure-based fairness measures of individual items that aim to quantify the disparity in how individual items are…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted to ACM Transactions on Recommender Systems (TORS)

  13. Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study

    Authors: Joakim Edin, Alexander Junge, Jakob D. Havtorn, Lasse Borgholt, Maria Maistro, Tuukka Ruotsalo, Lars Maaløe

    Abstract: Medical coding is the task of assigning medical codes to clinical free-text documentation. Healthcare professionals manually assign such codes to track patient diagnoses and treatments. Automated medical coding can considerably alleviate this administrative burden. In this paper, we reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models. We show that seve…

    Submitted 21 April, 2023; originally announced April 2023.

    Comments: 11 pages, 6 figures, to be published in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23), July 23–27, 2023, Taipei, Taiwan

    ACM Class: H.3.0

  14. Graph-based Recommendation for Sparse and Heterogeneous User Interactions

    Authors: Simone Borg Bruun, Kacper Kenji Lesniak, Mirko Biasini, Vittorio Carmignani, Panagiotis Filianos, Christina Lioma, Maria Maistro

    Abstract: Recommender system research has oftentimes focused on approaches that operate on large-scale datasets containing millions of user interactions. However, many small businesses struggle to apply state-of-the-art models due to their very limited availability of data. We propose a graph-based recommender model which utilizes heterogeneous interactions between users and content of different types and i…

    Submitted 26 January, 2023; originally announced January 2023.

  15. Crowdsourcing Controller -- Utilizing Reliable Agents in a Multiplayer Game

    Authors: Kacper Kenji Lesniak, Maria Maistro

    Abstract: This paper presents a new use case for continuous crowdsourcing, where multiple players collectively control a single character in a video game. Similar approaches have already been proposed, but they suffer from certain limitations: (1) they simply consider static time frames to group real-time inputs from multiple players; (2) then they aggregate inputs with simple majority vote, i.e., each play…

    Submitted 1 December, 2022; originally announced December 2022.

  16. arXiv:2212.01387  [pdf, other]

    cs.HC

    FullBrain: a Social E-learning Platform

    Authors: Mirko Biasini, Vittorio Carmignani, Nicola Ferro, Panagiotis Filianos, Maria Maistro, Giorgio Maria di Nunzio

    Abstract: We present FullBrain, a social e-learning platform where students share and track their knowledge. FullBrain users can post notes, ask questions and share learning resources in dedicated course and concept spaces. We detail two components of FullBrain: a SIR system equipped with query autocomplete and query autosuggestion, and a Leaderboard module to improve user experience. We analyzed the day-to…

    Submitted 1 December, 2022; originally announced December 2022.

  17. Principled Multi-Aspect Evaluation Measures of Rankings

    Authors: Maria Maistro, Lucas Chaves Lima, Jakob Grue Simonsen, Christina Lioma

    Abstract: Information Retrieval evaluation has traditionally focused on defining principled ways of assessing the relevance of a ranked list of documents with respect to a query. Several methods extend this type of evaluation beyond relevance, making it possible to evaluate different aspects of a document ranking (e.g., relevance, usefulness, or credibility) using a single measure (multi-aspect evaluation).…

    Submitted 1 December, 2022; originally announced December 2022.

  18. Learning Recommendations from User Actions in the Item-poor Insurance Domain

    Authors: Simone Borg Bruun, Maria Maistro, Christina Lioma

    Abstract: While personalised recommendations are successful in domains like retail, where large volumes of user feedback on items are available, the generation of automatic recommendations in data-sparse domains, like insurance purchasing, is an open problem. The insurance domain is notoriously data-sparse because the number of products is typically low (compared to retail) and they are usually purchased to…

    Submitted 28 November, 2022; originally announced November 2022.

  19. arXiv:2210.10266  [pdf, ps, other]

    cs.IR

    Corrected Evaluation Results of the NTCIR WWW-2, WWW-3, and WWW-4 English Subtasks

    Authors: Tetsuya Sakai, Sijie Tao, Maria Maistro, Zhumin Chu, Yujing Li, Nuo Chen, Nicola Ferro, Junjie Wang, Ian Soboroff, Yiqun Liu

    Abstract: Unfortunately, the official English (sub)task results reported in the NTCIR-14 WWW-2, NTCIR-15 WWW-3, and NTCIR-16 WWW-4 overview papers are incorrect due to noise in the official qrels files; this paper reports results based on the corrected qrels files. The noise is due to a fatal bug in the backend of our relevance assessment interface. More specifically, at WWW-2, WWW-3, and WWW-4, two version…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: 24 pages

  20. repro_eval: A Python Interface to Reproducibility Measures of System-oriented IR Experiments

    Authors: Timo Breuer, Nicola Ferro, Maria Maistro, Philipp Schaer

    Abstract: In this work we introduce repro_eval - a tool for reactive reproducibility studies of system-oriented information retrieval (IR) experiments. The corresponding Python package provides IR researchers with measures for different levels of reproduction when evaluating their systems' outputs. By offering an easily extensible interface, we hope to stimulate common practices when conducting a reproducib…

    Submitted 19 January, 2022; originally announced January 2022.

    Comments: Accepted at ECIR21. The final authenticated version is available online at https://doi.org/10.1007/978-3-030-72240-1_51

  21. arXiv:2103.02462  [pdf, other]

    cs.IR

    University of Copenhagen Participation in TREC Health Misinformation Track 2020

    Authors: Lucas Chaves Lima, Dustin Brandon Wright, Isabelle Augenstein, Maria Maistro

    Abstract: In this paper, we describe our participation in the TREC Health Misinformation Track 2020. We submitted 11 runs to the Total Recall Task and 13 runs to the Ad Hoc task. Our approach consists of 3 steps: (1) we create an initial run with BM25 and RM3; (2) we estimate credibility and misinformation scores for the documents in the initial run; (3) we merge the relevance, credibility and misinformat…

    Submitted 3 March, 2021; originally announced March 2021.

  22. arXiv:2012.12366  [pdf, other]

    cs.CL

    Multi-Head Self-Attention with Role-Guided Masks

    Authors: Dongsheng Wang, Casper Hansen, Lucas Chaves Lima, Christian Hansen, Maria Maistro, Jakob Grue Simonsen, Christina Lioma

    Abstract: The state of the art in learning meaningful semantic representations of words is the Transformer model and its attention mechanisms. Simply put, the attention mechanisms learn to attend to specific parts of the input, dispensing with recurrence and convolutions. While some of the learned attention heads have been found to play linguistically interpretable roles, they can be redundant or prone to errors.…

    Submitted 22 December, 2020; originally announced December 2020.

    Comments: Accepted at ECIR@2021

  23. arXiv:2011.12684  [pdf, other]

    cs.IR cs.LG

    Denmark's Participation in the Search Engine TREC COVID-19 Challenge: Lessons Learned about Searching for Precise Biomedical Scientific Information on COVID-19

    Authors: Lucas Chaves Lima, Casper Hansen, Christian Hansen, Dongsheng Wang, Maria Maistro, Birger Larsen, Jakob Grue Simonsen, Christina Lioma

    Abstract: This report describes the participation of two Danish universities, University of Copenhagen and Aalborg University, in the international search engine competition on COVID-19 (the 2020 TREC-COVID Challenge) organised by the U.S. National Institute of Standards and Technology (NIST) and its Text Retrieval Conference (TREC) division. The aim of the competition was to find the best search engine str…

    Submitted 26 November, 2020; v1 submitted 25 November, 2020; originally announced November 2020.

  24. How to Measure the Reproducibility of System-oriented IR Experiments

    Authors: Timo Breuer, Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, Philipp Schaer, Ian Soboroff

    Abstract: Replicability and reproducibility of experimental results are primary concerns in all the areas of science and IR is not an exception. Besides the problem of moving the field towards more reproducible experimental practices and protocols, we also face a severe methodological issue: we do not have any means to assess when reproduced is reproduced. Moreover, we lack any reproducibility-oriented data…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: SIGIR2020 Full Conference Paper