Showing 1–33 of 33 results for author: Perego, R

Searching in archive cs.
  1. arXiv:2505.18088  [pdf, ps, other]

    cs.LG

    Early-Exit Graph Neural Networks

    Authors: Andrea Giuseppe Di Francesco, Maria Sofia Bucarelli, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Fabrizio Silvestri

    Abstract: Early-exit mechanisms allow deep neural networks to halt inference as soon as classification confidence is high enough, adaptively trading depth for confidence, and thereby cutting latency and energy on easy inputs while retaining full-depth accuracy for harder ones. Similarly, adding early exit mechanisms to Graph Neural Networks (GNNs), the go-to models for graph-structured data, allows for dyna…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: 37 pages, 14 figures

  2. Efficient Conversational Search via Topical Locality in Dense Retrieval

    Authors: Cristina Ioana Muntean, Franco Maria Nardini, Raffaele Perego, Guido Rocchietti, Cosimo Rulli

    Abstract: Pre-trained language models have been widely exploited to learn dense representations of documents and queries for information retrieval. While previous efforts have primarily focused on improving effectiveness and user satisfaction, response time remains a critical bottleneck of conversational search systems. To address this, we exploit the topical locality inherent in conversational queries, i.e…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: 5 pages, 2 figures, SIGIR 2025

    ACM Class: H.3

  3. Towards Robust Expert Finding in Community Question Answering Platforms

    Authors: Maddalena Amendola, Andrea Passarella, Raffaele Perego

    Abstract: This paper introduces TUEF, a topic-oriented user-interaction model for fair Expert Finding in Community Question Answering (CQA) platforms. The Expert Finding task in CQA platforms involves identifying proficient users capable of providing accurate answers to questions from the community. To this aim, TUEF improves the robustness and credibility of the CQA platform through a more precise Expert F…

    Submitted 4 March, 2025; originally announced March 2025.

    Journal ref: Advances in Information Retrieval, Springer Nature Switzerland, 2024, 152--168

  4. arXiv:2412.17484  [pdf, other]

    cs.DC cs.AI

    Power- and Fragmentation-aware Online Scheduling for GPU Datacenters

    Authors: Francesco Lettich, Emanuele Carlini, Franco Maria Nardini, Raffaele Perego, Salvatore Trani

    Abstract: The rise of Artificial Intelligence and Large Language Models is driving increased GPU usage in data centers for complex training and inference tasks, impacting operational costs, energy demands, and the environmental footprint of large-scale computing infrastructures. This work addresses the online scheduling problem in GPU datacenters, which involves scheduling tasks without knowledge of their f…

    Submitted 23 December, 2024; originally announced December 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  5. arXiv:2410.07797  [pdf, other]

    cs.CL cs.AI cs.HC cs.IR

    Rewriting Conversational Utterances with Instructed Large Language Models

    Authors: Elnara Galimzhanova, Cristina Ioana Muntean, Franco Maria Nardini, Raffaele Perego, Guido Rocchietti

    Abstract: Many recent studies have shown the ability of large language models (LLMs) to achieve state-of-the-art performance on many NLP tasks, such as question answering, text summarization, coding, and translation. In some cases, the results provided by LLMs are on par with those of human experts. These models' most disruptive innovation is their ability to perform tasks via zero-shot or few-shot promptin…

    Submitted 10 October, 2024; originally announced October 2024.

    Journal ref: 2023 IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)

  6. Early Exit Strategies for Approximate k-NN Search in Dense Retrieval

    Authors: Francesco Busolin, Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Salvatore Trani

    Abstract: Learned dense representations are a popular family of techniques for encoding queries and documents using high-dimensional embeddings, which enable retrieval by performing approximate k nearest-neighbors search (A-kNN). A popular technique for making A-kNN search efficient is based on a two-level index, where the embeddings of documents are clustered offline and, at query processing, a fixed numbe…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 6 pages, published at CIKM 2024

  7. arXiv:2407.05335  [pdf, other]

    cs.IR

    Understanding and Addressing Gender Bias in Expert Finding Task

    Authors: Maddalena Amendola, Carlos Castillo, Andrea Passarella, Raffaele Perego

    Abstract: The Expert Finding (EF) task is critical in community Question&Answer (CQ&A) platforms, significantly enhancing user engagement by improving answer quality and reducing response times. However, biases, especially gender biases, have been identified in these platforms. This study investigates gender bias in state-of-the-art EF models and explores methods to mitigate it. Utilizing a comprehensive da…

    Submitted 7 July, 2024; originally announced July 2024.

  8. arXiv:2407.04018  [pdf, other]

    cs.IR

    Leveraging Topic Specificity and Social Relationships for Expert Finding in Community Question Answering Platforms

    Authors: Maddalena Amendola, Andrea Passarella, Raffaele Perego

    Abstract: Online Community Question Answering (CQA) platforms have become indispensable tools for users seeking expert solutions to their technical queries. The effectiveness of these platforms relies on their ability to identify and direct questions to the most knowledgeable users within the community, a process known as Expert Finding (EF). EF accuracy is crucial for increasing user engagement and the rel…

    Submitted 4 July, 2024; originally announced July 2024.

  9. DESIRE-ME: Domain-Enhanced Supervised Information REtrieval using Mixture-of-Experts

    Authors: Pranav Kasela, Gabriella Pasi, Raffaele Perego, Nicola Tonellotto

    Abstract: Open-domain question answering requires retrieval systems able to cope with the diverse and varied nature of questions, providing accurate answers across a broad spectrum of query types and topics. To deal with such topic heterogeneity through a unique model, we propose DESIRE-ME, a neural information retrieval model that leverages the Mixture-of-Experts framework to combine multiple specialized n…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted at the 46th European Conference on Information Retrieval (ECIR 2024)

  10. SE-PEF: a Resource for Personalized Expert Finding

    Authors: Pranav Kasela, Gabriella Pasi, Raffaele Perego

    Abstract: The problem of personalization in Information Retrieval has been under study for a long time. A well-known issue related to this task is the lack of publicly available datasets that can support a comparative evaluation of personalized search systems. To contribute in this respect, this paper introduces SE-PEF (StackExchange - Personalized Expert Finding), a resource useful for designing and evalua…

    Submitted 5 October, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: SIGIR-AP '23 Conference paper

  11. SE-PQA: Personalized Community Question Answering

    Authors: Pranav Kasela, Marco Braga, Gabriella Pasi, Raffaele Perego

    Abstract: Personalization in Information Retrieval is a topic studied for a long time. Nevertheless, there is still a lack of high-quality, real-world datasets to conduct large-scale experiments and evaluate models for personalized search. This paper contributes to filling this gap by introducing SE-PQA (StackExchange - Personalized Question Answering), a new curated resource to design and evaluate personal…

    Submitted 19 February, 2024; v1 submitted 28 June, 2023; originally announced June 2023.

  12. arXiv:2306.12165  [pdf, other]

    cs.IR cs.LG

    Post-hoc Selection of Pareto-Optimal Solutions in Search and Recommendation

    Authors: Vincenzo Paparella, Vito Walter Anelli, Franco Maria Nardini, Raffaele Perego, Tommaso Di Noia

    Abstract: Information Retrieval (IR) and Recommender Systems (RS) tasks are moving from computing a ranking of final results based on a single metric to multi-objective problems. Solving these problems leads to a set of Pareto-optimal solutions, known as Pareto frontier, in which no objective can be further improved without hurting the others. In principle, all the points on the Pareto frontier are potentia…

    Submitted 21 June, 2023; originally announced June 2023.

  13. arXiv:2211.14155  [pdf, other]

    cs.IR

    Caching Historical Embeddings in Conversational Search

    Authors: Ophir Frieder, Ida Mele, Cristina Ioana Muntean, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto

    Abstract: Rapid response, namely low latency, is fundamental in search applications; it is particularly so in interactive search sessions, such as those encountered in conversational settings. An observation with a potential to reduce latency asserts that conversational queries exhibit a temporal locality in the lists of documents retrieved. Motivated by this observation, we propose and evaluate a client-si…

    Submitted 25 November, 2022; originally announced November 2022.

  14. arXiv:2209.14369  [pdf, other]

    cs.SI

    Social Search: retrieving information in Online Social Platforms -- A Survey

    Authors: Maddalena Amendola, Andrea Passarella, Raffaele Perego

    Abstract: Social Search research deals with studying methodologies exploiting social information to better satisfy user information needs in Online Social Media while simplifying the search effort and consequently reducing the time spent and the computational resources utilized. Starting from previous studies, in this work, we analyze the current state of the art of the Social Search area, proposing a new t…

    Submitted 13 September, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

  15. ILMART: Interpretable Ranking with Constrained LambdaMART

    Authors: Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Alberto Veneri

    Abstract: Interpretable Learning to Rank (LtR) is an emerging field within the research area of explainable AI, aiming at developing intelligible and accurate predictive models. While most of the previous research efforts focus on creating post-hoc explanations, in this paper we investigate how to train effective and intrinsically-interpretable ranking models. Developing these models is particularly challen…

    Submitted 1 June, 2022; originally announced June 2022.

    Comments: 5 pages, 3 figures, to be published in SIGIR 2022 proceedings

  16. Learning Early Exit Strategies for Additive Ranking Ensembles

    Authors: Francesco Busolin, Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Salvatore Trani

    Abstract: Modern search engine ranking pipelines are commonly based on large machine-learned ensembles of regression trees. We propose LEAR, a novel - learned - technique aimed to reduce the average number of trees traversed by documents to accumulate the scores, thus reducing the overall query response time. LEAR exploits a classifier that predicts whether a document can early exit the ensemble because it…

    Submitted 6 May, 2021; originally announced May 2021.

    Comments: 5 pages, 3 figures, ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 21)

    ACM Class: H.3.3

    Journal ref: 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Association for Computing Machinery, 2021, 2217-2221

  17. Dynamic Hard Pruning of Neural Networks at the Edge of the Internet

    Authors: Lorenzo Valerio, Franco Maria Nardini, Andrea Passarella, Raffaele Perego

    Abstract: Neural Networks (NN), although successfully applied to several Artificial Intelligence tasks, are often unnecessarily over-parametrised. In edge/fog computing, this might make their training prohibitive on resource-constrained devices, contrasting with the current trend of decentralising intelligence from remote data centres to local constrained devices. Therefore, we investigate the problem of tr…

    Submitted 22 October, 2021; v1 submitted 17 November, 2020; originally announced November 2020.

  18. arXiv:2006.14233  [pdf]

    cs.LG stat.ML

    Green Machine Learning via Augmented Gaussian Processes and Multi-Information Source Optimization

    Authors: Antonio Candelieri, Riccardo Perego, Francesco Archetti

    Abstract: Searching for accurate Machine and Deep Learning models is a computationally expensive and awfully energivorous process. A strategy which has been gaining recently importance to drastically reduce computational time and energy consumed is to exploit the availability of different information sources, with different computational costs and different "fidelity", typically smaller portions of a large…

    Submitted 25 June, 2020; originally announced June 2020.

    Comments: 22 pages, 4 figures, submitted to Soft computing - Special Issue on "Optimization methods for decision making: advances and applications"

  19. arXiv:2004.14641  [pdf, other]

    cs.IR cs.LG

    Query-level Early Exit for Additive Learning-to-Rank Ensembles

    Authors: Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Salvatore Trani

    Abstract: Search engine ranking pipelines are commonly based on large ensembles of machine-learned decision trees. The tight constraints on query response time recently motivated researchers to investigate algorithms to make faster the traversal of the additive ensemble or to early terminate the evaluation of documents that are unlikely to be ranked among the top-k. In this paper, we investigate the novel p…

    Submitted 30 April, 2020; originally announced April 2020.

    Comments: Accepted at SIGIR 2020 (short paper)

    MSC Class: 68P20

  20. Training Curricula for Open Domain Answer Re-Ranking

    Authors: Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder

    Abstract: In precision-oriented tasks like answer ranking, it is more important to rank many relevant answers highly than to retrieve all relevant answers. It follows that a good ranking strategy would be to learn how to identify the easiest correct answers first (i.e., assign a high ranking score to answers that have characteristics that usually indicate relevance, and a low ranking score to those with cha…

    Submitted 21 May, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: Accepted at SIGIR 2020 (long)

  21. Efficient Document Re-Ranking for Transformers by Precomputing Term Representations

    Authors: Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder

    Abstract: Deep pretrained transformer networks are effective at various ranking tasks, such as question answering and ad-hoc document ranking. However, their computational expenses deem them cost-prohibitive in practice. Our proposed approach, called PreTTR (Precomputing Transformer Term Representations), considerably reduces the query-time latency of deep transformer networks (up to a 42x speedup on web do…

    Submitted 26 May, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: Accepted at SIGIR 2020 (long)

  22. Expansion via Prediction of Importance with Contextualization

    Authors: Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder

    Abstract: The identification of relevance with little textual context is a primary challenge in passage retrieval. We address this problem with a representation-based ranking approach that: (1) explicitly models the importance of each term using a contextualized language model; (2) performs passage expansion by propagating the importance to similar terms; and (3) grounds the representations in the lexicon,…

    Submitted 20 May, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: Accepted at SIGIR 2020 (short)

  23. arXiv:2004.14054  [pdf, other]

    cs.IR cs.CL

    Topic Propagation in Conversational Search

    Authors: I. Mele, C. I. Muntean, F. M. Nardini, R. Perego, N. Tonellotto, O. Frieder

    Abstract: In a conversational context, a user expresses her multi-faceted information need as a sequence of natural-language questions, i.e., utterances. Starting from a given topic, the conversation evolves through user utterances and system replies. The retrieval of documents relevant to a given utterance in a conversation is challenging due to ambiguity of natural language and to the difficulty of detect…

    Submitted 29 April, 2020; originally announced April 2020.

    Comments: 5 pages

  24. arXiv:2003.04275  [pdf]

    cs.CY cs.LG math.OC

    Modelling Human Active Search in Optimizing Black-box Functions

    Authors: Antonio Candelieri, Riccardo Perego, Ilaria Giordani, Andrea Ponti, Francesco Archetti

    Abstract: Modelling human function learning has been the subject of intense research in cognitive sciences. The topic is relevant in black-box optimization where information about the objective and/or constraints is not available and must be learned through function evaluations. In this paper we focus on the relation between the behaviour of humans searching for the maximum and the probabilistic model used…

    Submitted 9 March, 2020; originally announced March 2020.

  25. arXiv:2003.04207  [pdf]

    stat.ML cs.LG math.OC

    Composition of kernel and acquisition functions for High Dimensional Bayesian Optimization

    Authors: Antonio Candelieri, Ilaria Giordani, Riccardo Perego, Francesco Archetti

    Abstract: Bayesian Optimization has become the reference method for the global optimization of black box, expensive and possibly noisy functions. Bayesian Optimization learns a probabilistic model about the objective function, usually a Gaussian Process, and builds, depending on its mean and variance, an acquisition function whose optimizer yields the new evaluation point, leading to update the probabilist…

    Submitted 9 March, 2020; originally announced March 2020.

  26. arXiv:2001.03010  [pdf, other]

    cs.IR cs.DB

    Topical Result Caching in Web Search Engines

    Authors: Ida Mele, Nicola Tonellotto, Ophir Frieder, Raffaele Perego

    Abstract: Caching search results is employed in information retrieval systems to expedite query processing and reduce back-end server workload. Motivated by the observation that queries belonging to different topics have different temporal-locality patterns, we investigate a novel caching model called STD (Static-Topic-Dynamic cache). It improves traditional SDC (Static-Dynamic Cache) that stores in a stati…

    Submitted 9 January, 2020; originally announced January 2020.

  27. arXiv:1908.06010  [pdf, other]

    math.OC cs.LG math.NA stat.ML

    Safe global optimization of expensive noisy black-box functions in the $δ$-Lipschitz framework

    Authors: Yaroslav D. Sergeyev, Antonio Candelieri, Dmitri E. Kvasov, Riccardo Perego

    Abstract: In this paper, the problem of safe global maximization (it should not be confused with robust optimization) of expensive noisy black-box functions satisfying the Lipschitz condition is considered. The notion "safe" means that the objective function $f(x)$ during optimization should not violate a "safety" threshold, for instance, a certain a priori given value $h$ in a maximization problem. Thus, a…

    Submitted 15 August, 2020; v1 submitted 15 August, 2019; originally announced August 2019.

    Comments: Published paper (37 pages, 44 figures, 4 tables): Yaroslav D. Sergeyev - corresponding author. Soft Computing (2020)

    MSC Class: 90C26; 65K05; 68T05; 68Q32

  28. Compressed Indexes for Fast Search of Semantic Data

    Authors: Raffaele Perego, Giulio Ermanno Pibiri, Rossano Venturini

    Abstract: The sheer increase in volume of RDF data demands efficient solutions for the triple indexing problem, that is devising a compressed data structure to compactly represent RDF triples by guaranteeing, at the same time, fast pattern matching operations. This problem lies at the heart of delivering good practical performance for the resolution of complex SPARQL queries on large RDF datasets. In this w…

    Submitted 27 February, 2020; v1 submitted 16 April, 2019; originally announced April 2019.

    Comments: Published in IEEE Transactions on Knowledge and Data Engineering (TKDE), 14 January 2020

    Journal ref: IEEE Trans. Knowl. Data Eng. 33(9): 3187-3198 (2021)

  29. arXiv:1610.08686  [pdf, ps, other]

    cs.SI

    Polarized User and Topic Tracking in Twitter

    Authors: Mauro Coletto, Claudio Lucchese, Salvatore Orlando, Raffaele Perego

    Abstract: Digital traces of conversations in micro-blogging platforms and OSNs provide information about user opinion with a high degree of resolution. These information sources can be exploited to understand and monitor collective behaviors. In this work, we focus on polarization classes, i.e., those topics that require the user to side exclusively with one position. The proposed method provides an itera…

    Submitted 27 October, 2016; originally announced October 2016.

    Comments: SIGIR 16

  30. arXiv:1605.01895  [pdf, other]

    cs.SI

    Sentiment-enhanced Multidimensional Analysis of Online Social Networks: Perception of the Mediterranean Refugees Crisis

    Authors: Mauro Coletto, Claudio Lucchese, Cristina Ioana Muntean, Franco Maria Nardini, Andrea Esuli, Chiara Renso, Raffaele Perego

    Abstract: We propose an analytical framework able to investigate discussions about polarized topics in online social networks from many different angles. The framework supports the analysis of social networks along several dimensions: time, space and sentiment. We show that the proposed analytical framework and the methodology can be used to mine knowledge about the perception of complex social phenomena. W…

    Submitted 6 May, 2016; originally announced May 2016.

  31. arXiv:1105.4255  [pdf]

    cs.IR

    Efficient Diversification of Web Search Results

    Authors: Gabriele Capannini, Franco Maria Nardini, Raffaele Perego, Fabrizio Silvestri

    Abstract: In this paper we analyze the efficiency of various search results diversification methods. While efficacy of diversification approaches has been deeply investigated in the past, response time and scalability issues have been rarely addressed. A unified framework for studying performance and feasibility of result diversification solutions is thus proposed. First we define a new methodology for dete…

    Submitted 21 May, 2011; originally announced May 2011.

    Comments: VLDB2011

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 4, No. 7, pp. 451-459 (2011)

  32. arXiv:0905.4627  [pdf, other]

    cs.MM cs.IR

    CoPhIR: a Test Collection for Content-Based Image Retrieval

    Authors: Paolo Bolettieri, Andrea Esuli, Fabrizio Falchi, Claudio Lucchese, Raffaele Perego, Tommaso Piccioli, Fausto Rabitti

    Abstract: The scalability, as well as the effectiveness, of the different Content-based Image Retrieval (CBIR) approaches proposed in literature, is today an important research issue. Given the wealth of images on the Web, CBIR systems must in fact leap towards Web-scale datasets. In this paper, we report on our experience in building a test collection of 100 million images, with the corresponding descrip…

    Submitted 1 June, 2009; v1 submitted 28 May, 2009; originally announced May 2009.

    Comments: 15 pages

  33. arXiv:cs/0407053  [pdf, ps, other]

    cs.IR cs.DC

    Design of a Parallel and Distributed Web Search Engine

    Authors: Salvatore Orlando, Raffaele Perego, Fabrizio Silvestri

    Abstract: This paper describes the architecture of MOSE (My Own Search Engine), a scalable parallel and distributed engine for searching the web. MOSE was specifically designed to efficiently exploit affordable parallel architectures, such as clusters of workstations. Its modular and scalable architecture can easily be tuned to fulfill the bandwidth requirements of the application at hand. Both task-paral…

    Submitted 21 July, 2004; originally announced July 2004.

    Comments: 8 pages. In Proceedings of the 2001 Parallel Computing Conference (ParCo 2001), 4-7 September 2001, Naples, Italy, Imperial College Press, pp. 197-204