Showing 1–14 of 14 results for author: Penha, G

  1. arXiv:2504.13572  [pdf, other]

    cs.IR

    Contextualizing Spotify's Audiobook List Recommendations with Descriptive Shelves

    Authors: Gustavo Penha, Alice Wang, Martin Achenbach, Kristen Sheets, Sahitya Mantravadi, Remi Galvez, Nico Guetta-Jeanrenaud, Divya Narayanan, Ofeliya Kalaydzhyan, Hugues Bouchard

    Abstract: In this paper, we propose a pipeline to generate contextualized list recommendations with descriptive shelves in the domain of audiobooks. By creating several shelves for topics the user has an affinity to, e.g. Uplifting Women's Fiction, we can help them explore their recommendations according to their interests and at the same time recommend a diverse set of items. To do so, we use Large Languag…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: Accepted for publication in the 47th European Conference on Information Retrieval (ECIR'25)

  2. arXiv:2503.24193  [pdf, other]

    cs.IR

    Text2Tracks: Prompt-based Music Recommendation via Generative Retrieval

    Authors: Enrico Palumbo, Gustavo Penha, Andreas Damianou, José Luis Redondo García, Timothy Christopher Heath, Alice Wang, Hugues Bouchard, Mounia Lalmas

    Abstract: In recent years, Large Language Models (LLMs) have enabled users to provide highly specific music recommendation requests using natural language prompts (e.g. "Can you recommend some old classics for slow dancing?"). In this setup, the recommended tracks are predicted by the LLM in an autoregressive way, i.e. the LLM generates the track titles one token at a time. While intuitive, this approach ha…

    Submitted 2 April, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

  3. arXiv:2410.16823  [pdf, other]

    cs.IR

    Bridging Search and Recommendation in Generative Retrieval: Does One Task Help the Other?

    Authors: Gustavo Penha, Ali Vardasbi, Enrico Palumbo, Marco de Nadai, Hugues Bouchard

    Abstract: Generative retrieval for search and recommendation is a promising paradigm for retrieving items, offering an alternative to traditional methods that depend on external indexes and nearest-neighbor searches. Instead, generative models directly associate inputs with item IDs. Given the breakthroughs of Large Language Models (LLMs), these generative systems can play a crucial role in centralizing a v…

    Submitted 22 October, 2024; originally announced October 2024.

    Comments: Accepted for publication in the 18th ACM Conference on Recommender Systems (RecSys'24)

  4. PODTILE: Facilitating Podcast Episode Browsing with Auto-generated Chapters

    Authors: Azin Ghazimatin, Ekaterina Garmash, Gustavo Penha, Kristen Sheets, Martin Achenbach, Oguz Semerci, Remi Galvez, Marcus Tannenberg, Sahitya Mantravadi, Divya Narayanan, Ofeliya Kalaydzhyan, Douglas Cole, Ben Carterette, Ann Clifton, Paul N. Bennett, Claudia Hauff, Mounia Lalmas

    Abstract: Listeners of long-form talk-audio content, such as podcast episodes, often find it challenging to understand the overall structure and locate relevant sections. A practical solution is to divide episodes into chapters: semantically coherent segments labeled with titles and timestamps. Since most episodes on our platform at Spotify currently lack creator-provided chapters, automating the creation o…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: 9 pages, 4 figures, CIKM industry track 2024

    MSC Class: 68P20; ACM Class: H.3.3

  5. arXiv:2303.11648  [pdf, other]

    cs.IR cs.CL cs.LG

    Improving Content Retrievability in Search with Controllable Query Generation

    Authors: Gustavo Penha, Enrico Palumbo, Maryam Aziz, Alice Wang, Hugues Bouchard

    Abstract: An important goal of online platforms is to enable content discovery, i.e. allow users to find a catalog entity they were not familiar with. A pre-requisite to discover an entity, e.g. a book, with a search engine is that the entity is retrievable, i.e. there are queries for which the system will surface such entity in the top results. However, machine-learned search engines have a high retrievabi…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: Accepted for publication in the International World Wide Web Conference 2023

  6. arXiv:2301.05508  [pdf, ps, other]

    cs.IR

    Do the Findings of Document and Passage Retrieval Generalize to the Retrieval of Responses for Dialogues?

    Authors: Gustavo Penha, Claudia Hauff

    Abstract: A number of learned sparse and dense retrieval approaches have recently been proposed and proven effective in tasks such as passage retrieval and document retrieval. In this paper we analyze with a replicability study if the lessons learned generalize to the retrieval of responses for dialogues, an important task for the increasingly popular field of conversational search. Unlike passage and docum…

    Submitted 13 January, 2023; originally announced January 2023.

    Comments: Accepted for publication in the European Conference on Information Retrieval (ECIR'23). arXiv admin note: substantial text overlap with arXiv:2204.10558

  7. arXiv:2204.10558  [pdf, other]

    cs.IR cs.CL cs.LG

    Sparse and Dense Approaches for the Full-rank Retrieval of Responses for Dialogues

    Authors: Gustavo Penha, Claudia Hauff

    Abstract: Ranking responses for a given dialogue context is a popular benchmark in which the setup is to re-rank the ground-truth response over a limited set of $n$ responses, where $n$ is typically 10. The predominance of this setup in conversation response ranking has led to a great deal of attention to building neural re-rankers, while the first-stage retrieval step has been overlooked. Since the correc…

    Submitted 22 April, 2022; originally announced April 2022.

  8. arXiv:2111.13057  [pdf, other]

    cs.IR

    Evaluating the Robustness of Retrieval Pipelines with Query Variation Generators

    Authors: Gustavo Penha, Arthur Câmara, Claudia Hauff

    Abstract: Heavily pre-trained transformers for language modelling, such as BERT, have been shown to be remarkably effective for Information Retrieval (IR) tasks, typically applied to re-rank the results of a first-stage retrieval model. IR benchmarks evaluate the effectiveness of retrieval pipelines based on the premise that a single query is used to instantiate the underlying information need. However, previous…

    Submitted 15 February, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

    Comments: Accepted for publication in the 44th European Conference on Information Retrieval (ECIR'22). V3: Fixed Table 2

  9. arXiv:2101.04356  [pdf, other]

    cs.IR cs.CL cs.LG

    On the Calibration and Uncertainty of Neural Learning to Rank Models

    Authors: Gustavo Penha, Claudia Hauff

    Abstract: According to the Probability Ranking Principle (PRP), ranking documents in decreasing order of their probability of relevance leads to an optimal document ranking for ad-hoc retrieval. The PRP holds when two conditions are met: [C1] the models are well calibrated, and, [C2] the probabilities of relevance are reported with certainty. We know however that deep neural networks (DNNs) are often not we…

    Submitted 12 January, 2021; originally announced January 2021.

    Comments: Accepted for publication in the 16th conference of the European Chapter of the Association for Computational Linguistics (EACL'21)

  10. arXiv:2012.08575  [pdf, other]

    cs.IR cs.LG

    Weakly Supervised Label Smoothing

    Authors: Gustavo Penha, Claudia Hauff

    Abstract: We study Label Smoothing (LS), a widely used regularization technique, in the context of neural learning to rank (L2R) models. LS combines the ground-truth labels with a uniform distribution, encouraging the model to be less confident in its predictions. We analyze the relationship between the non-relevant documents (specifically how they are sampled) and the effectiveness of LS, discussing how LS c…

    Submitted 15 December, 2020; originally announced December 2020.

    Comments: Accepted for publication in the 43rd European Conference on Information Retrieval (ECIR'21)

  11. arXiv:2010.03343  [pdf, other]

    cs.IR cs.CL

    Slice-Aware Neural Ranking

    Authors: Gustavo Penha, Claudia Hauff

    Abstract: Understanding when and why neural ranking models fail for an IR task via error analysis is an important part of the research cycle. Here we focus on the challenges of (i) identifying categories of difficult instances (a pair of question and response candidates) for which a neural ranker is ineffective and (ii) improving neural ranking for such instances. To address both challenges we resort to sli…

    Submitted 7 October, 2020; originally announced October 2020.

    Comments: Paper accepted to EMNLP workshop SCAI 2020

  12. What does BERT know about books, movies and music? Probing BERT for Conversational Recommendation

    Authors: Gustavo Penha, Claudia Hauff

    Abstract: Heavily pre-trained transformer models such as BERT have recently been shown to be remarkably powerful at language modelling by achieving impressive results on numerous downstream tasks. It has also been shown that they are able to implicitly store factual knowledge in their parameters after pre-training. Understanding what the pre-training procedure of LMs actually learns is a crucial step for using a…

    Submitted 4 March, 2021; v1 submitted 30 July, 2020; originally announced July 2020.

    Comments: Accepted for publication at RecSys'20

  13. arXiv:1912.08555  [pdf, other]

    cs.IR cs.CL cs.LG

    Curriculum Learning Strategies for IR: An Empirical Study on Conversation Response Ranking

    Authors: Gustavo Penha, Claudia Hauff

    Abstract: Neural ranking models are traditionally trained on a series of random batches, sampled uniformly from the entire training set. Curriculum learning has recently been shown to improve neural models' effectiveness by sampling batches non-uniformly, going from easy to difficult instances during training. In the context of neural Information Retrieval (IR) curriculum learning has not been explored yet,…

    Submitted 18 December, 2019; originally announced December 2019.

    Comments: Accepted for publication in the 42nd European Conference on Information Retrieval (ECIR'20)

  14. arXiv:1912.04639  [pdf, other]

    cs.CL cs.IR cs.LG

    Introducing MANtIS: a novel Multi-Domain Information Seeking Dialogues Dataset

    Authors: Gustavo Penha, Alexandru Balan, Claudia Hauff

    Abstract: Conversational search is an approach to information retrieval (IR), where users engage in a dialogue with an agent in order to satisfy their information needs. Previous conceptual work described properties and actions a good agent should exhibit. Unlike them, we present a novel conceptual model defined in terms of conversational goals, which enables us to reason about current research practices in…

    Submitted 10 December, 2019; originally announced December 2019.