-
GRIT: Graph-based Recall Improvement for Task-oriented E-commerce Queries
Authors:
Hrishikesh Kulkarni,
Surya Kallumadi,
Sean MacAvaney,
Nazli Goharian,
Ophir Frieder
Abstract:
Many e-commerce search pipelines have four stages, namely: retrieval, filtering, ranking, and personalized-reranking. The retrieval stage must be efficient and yield high recall because relevant products missed in the first stage cannot be considered in later stages. This is challenging for task-oriented queries (queries with actionable intent) where user requirements are contextually intensive and difficult to understand. To foster research in the domain of e-commerce, we created a novel benchmark for Task-oriented Queries (TQE) using an LLM, which operates over the existing ESCI product search dataset. Furthermore, we propose a novel method 'Graph-based Recall Improvement for Task-oriented queries' (GRIT) to address the most crucial first-stage recall improvement needs. GRIT leads to robust and statistically significant improvements over state-of-the-art lexical, dense, and learned-sparse baselines. Our system supports both traditional and task-oriented e-commerce queries, yielding up to 6.3% recall improvement. In the indexing stage, GRIT first builds a product-product similarity graph using user clicks or manual annotation data. During retrieval, it locates neighbors with higher contextual and action relevance and prioritizes them over the less relevant candidates from the initial retrieval. This leads to a more comprehensive and relevant first-stage result set that improves overall system recall. Overall, GRIT leverages the locality relationships and contextual insights provided by the graph using neighboring nodes to enrich the first-stage retrieval results. We show that the method is not only robust across all introduced parameters, but also works effectively on top of a variety of first-stage retrieval methods.
Submitted 16 February, 2025;
originally announced April 2025.
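The neighbor-expansion idea described in the GRIT abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `expand_with_neighbors`, the `n_seeds` parameter, and the plain score-based merge are all assumptions made for illustration. It only shows how graph neighbors of the top initial results might displace weaker candidates within a fixed result budget.

```python
def expand_with_neighbors(initial, graph, score, k, n_seeds=2):
    """Hypothetical sketch of graph-based first-stage expansion.

    initial: doc ids ranked best-first by the first-stage retriever
    graph:   dict mapping a doc id to a list of similar doc ids
    score:   callable mapping a doc id to a query-relevance score (assumed given)
    k:       size of the result set to return
    n_seeds: how many top results contribute neighbors (an assumed knob)
    """
    seen = set(initial)
    neighbors = []
    # Collect unseen neighbors of the top-ranked seed documents.
    for seed in initial[:n_seeds]:
        for candidate in graph.get(seed, []):
            if candidate not in seen:
                seen.add(candidate)
                neighbors.append(candidate)
    # Merge and keep the k best; the sort is stable, so ties keep their
    # original relative order.
    pool = list(initial) + neighbors
    pool.sort(key=score, reverse=True)
    return pool[:k]


# Toy data (invented for this example only).
toy_scores = {"p1": 0.9, "p2": 0.8, "p3": 0.2, "p8": 0.1, "p9": 0.7}
toy_graph = {"p1": ["p9"], "p2": ["p3", "p8"]}
result = expand_with_neighbors(["p1", "p2", "p3"], toy_graph, toy_scores.get, k=3)
```

In the toy data, the neighbor `p9` scores higher than the weakest initial candidate `p3` and takes its place in the final set of three.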
-
Bridging Personalization and Control in Scientific Personalized Search
Authors:
Sheshera Mysore,
Garima Dhanania,
Kishor Patil,
Surya Kallumadi,
Andrew McCallum,
Hamed Zamani
Abstract:
Personalized search is a problem where models benefit from learning user preferences from per-user historical interaction data. The inferred preferences enable personalized ranking models to improve the relevance of documents for users. However, personalization is also seen as opaque in its use of historical interactions and is not amenable to users' control. Further, personalization limits the diversity of information users are exposed to. While search results may be automatically diversified, this does little to address the lack of control over personalization. In response, we introduce a model for personalized search that enables users to control personalized rankings proactively. Our model, CtrlCE, is a novel cross-encoder model augmented with an editable memory built from users' historical interactions. The editable memory allows cross-encoders to be personalized efficiently and enables users to control personalized ranking. Next, because not all queries require personalization, we introduce a calibrated mixing model which determines when personalization is necessary. This enables users to control personalization via their editable memory only when necessary. To thoroughly evaluate CtrlCE, we demonstrate its empirical performance in four domains of science, its ability to selectively request user control in a calibration evaluation of the mixing model, and the control provided by its editable memory in a user study.
Submitted 30 April, 2025; v1 submitted 4 November, 2024;
originally announced November 2024.
-
Forecasting Live Chat Intent from Browsing History
Authors:
Se-eun Yoon,
Ahmad Bin Rabiah,
Zaid Alibadi,
Surya Kallumadi,
Julian McAuley
Abstract:
Customers reach out to online live chat agents with various intents, such as asking about product details or requesting a return. In this paper, we propose the problem of predicting user intent from browsing history and address it through a two-stage approach. The first stage classifies a user's browsing history into high-level intent categories. Here, we represent each browsing history as a text sequence of page attributes and use the ground-truth class labels to fine-tune pretrained Transformers. The second stage provides a large language model (LLM) with the browsing history and predicted intent class to generate fine-grained intents. For automatic evaluation, we use a separate LLM to judge the similarity between generated and ground-truth intents, which closely aligns with human judgments. Our two-stage approach yields significant performance gains compared to generating intents without the classification stage.
Submitted 1 September, 2024; v1 submitted 6 August, 2024;
originally announced August 2024.
-
Optimization Methods for Personalizing Large Language Models through Retrieval Augmentation
Authors:
Alireza Salemi,
Surya Kallumadi,
Hamed Zamani
Abstract:
This paper studies retrieval-augmented approaches for personalizing large language models (LLMs), which potentially have a substantial impact on various applications and domains. We propose the first attempt to optimize the retrieval models that deliver a limited number of personal documents to large language models for the purpose of personalized generation. We develop two optimization algorithms that solicit feedback from the downstream personalized generation tasks for retrieval optimization -- one based on reinforcement learning whose reward function is defined using any arbitrary metric for personalized generation and another based on knowledge distillation from the downstream LLM to the retrieval model. This paper also introduces a pre- and post-generation retriever selection model that decides what retriever to choose for each LLM input. Extensive experiments on diverse tasks from the language model personalization (LaMP) benchmark reveal statistically significant improvements in six out of seven datasets.
Submitted 8 April, 2024;
originally announced April 2024.
-
Overview of the TREC 2023 Product Search Track
Authors:
Daniel Campos,
Surya Kallumadi,
Corby Rosset,
Cheng Xiang Zhai,
Alessandro Magnani
Abstract:
This is the first year of the TREC Product search track. The focus this year was the creation of a reusable collection and evaluation of the impact of the use of metadata and multi-modal data on retrieval accuracy. This year we leverage the new product search corpus, which includes contextual metadata. Our analysis shows that in the product search domain, traditional retrieval systems are highly effective and commonly outperform general-purpose pretrained embedding models. Our analysis also evaluates the impact of using simplified and metadata-enhanced collections, finding no clear trend in the impact of the expanded collection. We also see some surprising outcomes; despite their widespread adoption and competitive performance on other tasks, we find single-stage dense retrieval runs can commonly be noncompetitive or generate low-quality results both in the zero-shot and fine-tuned domain.
Submitted 15 November, 2023; v1 submitted 13 November, 2023;
originally announced November 2023.
-
A Personalized Dense Retrieval Framework for Unified Information Access
Authors:
Hansi Zeng,
Surya Kallumadi,
Zaid Alibadi,
Rodrigo Nogueira,
Hamed Zamani
Abstract:
Developing a universal model that can efficiently and effectively respond to a wide range of information access requests -- from retrieval to recommendation to question answering -- has been a long-lasting goal in the information retrieval community. This paper argues that the flexibility, efficiency, and effectiveness brought by the recent development in dense retrieval and approximate nearest neighbor search have smoothed the path towards achieving this goal. We develop a generic and extensible dense retrieval framework, called \framework, that can handle a wide range of (personalized) information access requests, such as keyword search, query by example, and complementary item recommendation. Our proposed approach extends the capabilities of dense retrieval models for ad-hoc retrieval tasks by incorporating user-specific preferences through the development of a personalized attentive network. This allows for a more tailored and accurate personalized information access experience. Our experiments on real-world e-commerce data suggest the feasibility of developing universal information access models by demonstrating significant improvements even compared to competitive baselines specifically developed for each of these individual information access tasks. This work opens up a number of fundamental research directions for future exploration.
Submitted 26 April, 2023;
originally announced April 2023.
-
A Boring-yet-effective Approach for the Product Ranking Task of the Amazon KDD Cup 2022
Authors:
Vitor Jeronymo,
Guilherme Rosa,
Surya Kallumadi,
Roberto Lotufo,
Rodrigo Nogueira
Abstract:
In this work we describe our submission to the product ranking task of the Amazon KDD Cup 2022. We rely on a recipe that proved effective in previous competitions: we focus our efforts on efficiently training and deploying large language models, such as mT5, while reducing to a minimum the number of task-specific adaptations. Despite the simplicity of our approach, our best model was less than 0.004 nDCG@20 below the top submission. As the top 20 teams achieved an nDCG@20 close to 0.90, we argue that we need more difficult e-Commerce evaluation datasets to discriminate retrieval methods.
Submitted 9 August, 2022;
originally announced August 2022.
-
Diversifying Multi-aspect Search Results Using Simpson's Diversity Index
Authors:
Jianghong Zhou,
Eugene Agichtein,
Surya Kallumadi
Abstract:
In search and recommendation, diversifying multi-aspect search results can help reduce redundancy and promote results that might not be shown otherwise. Many methods have been proposed for this task. However, previous methods do not explicitly consider the uniformity of the number of the items' classes, or evenness, which could degrade the search and recommendation quality. To address this problem, we introduce a novel method by adapting Simpson's Diversity Index from biology, which enables a more effective and efficient quadratic search result diversification algorithm. We also extend the method to balance the diversity between multiple aspects through weighted factors and further improve computational complexity by developing a fast approximation algorithm. We demonstrate the feasibility of the proposed method using the openly available Kaggle shoes competition dataset. Our experimental results show that our approach outperforms previous state-of-the-art diversification methods, while reducing computational complexity.
Submitted 20 May, 2021;
originally announced May 2021.
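For readers unfamiliar with the evenness measure named in the abstract above, the following is a small sketch of Simpson's index in its common "1 minus the sum of squared class proportions" form (often called the Gini-Simpson index). It is not the diversification algorithm from the paper; the function name `simpson_diversity` and the toy labels are assumptions for illustration only.

```python
from collections import Counter


def simpson_diversity(labels):
    """Return 1 - sum(p_i ** 2) over the class proportions p_i.

    0.0 means every item shares one class; values closer to 1.0 mean
    the items are spread more evenly across many classes.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # convention chosen here for an empty result list
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Toy result lists labeled by one aspect (invented example data).
even_mix = simpson_diversity(["boot", "boot", "sandal", "sandal"])
single_class = simpson_diversity(["boot", "boot", "boot", "boot"])
```

A result list split evenly between two classes scores higher than one drawn entirely from a single class, which is the evenness property the abstract refers to.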
-
De-Biased Modelling of Search Click Behavior with Reinforcement Learning
Authors:
Jianghong Zhou,
Sayyed M. Zahiri,
Simon Hughes,
Khalifeh Al Jadda,
Surya Kallumadi,
Eugene Agichtein
Abstract:
Users' clicks on Web search results are one of the key signals for evaluating and improving web search quality and have been widely used as part of current state-of-the-art Learning-To-Rank (LTR) models. With a large volume of search logs available for major search engines, effective models of searcher click behavior have emerged to evaluate and train LTR models. However, when modeling the users' click behavior, considering the bias of the behavior is imperative. In particular, when a search result is not clicked, it has not necessarily been judged irrelevant by the user, but instead could have been simply missed, especially for lower-ranked results. These kinds of biases in the click log data can be incorporated into the click models, propagating the errors to the resulting LTR ranking models or evaluation metrics. In this paper, we propose the De-biased Reinforcement Learning Click model (DRLC). The DRLC model relaxes previously made assumptions about the users' examination behavior and resulting latent states. To implement the DRLC model, convolutional neural networks are used as the value networks for reinforcement learning, trained to learn a policy to reduce bias in the click logs. To demonstrate the effectiveness of the DRLC model, we first compare performance with the previous state-of-the-art approaches using established click prediction metrics, including log-likelihood and perplexity. We further show that DRLC also leads to improvements in ranking performance. Our experiments demonstrate the effectiveness of the DRLC model in learning to reduce bias in click logs, leading to improved modeling performance and showing the potential for using DRLC for improving Web search quality.
Submitted 20 May, 2021;
originally announced May 2021.
-
DeepCAT: Deep Category Representation for Query Understanding in E-commerce Search
Authors:
Ali Ahmadvand,
Surya Kallumadi,
Faizan Javed,
Eugene Agichtein
Abstract:
Mapping a search query to a set of relevant categories in the product taxonomy is a significant challenge in e-commerce search for two reasons: 1) training data exhibits a severe class imbalance problem due to biased click behavior, and 2) queries with little customer feedback (e.g., tail queries) are not well-represented in the training set, and cause difficulties for query understanding. To address these problems, we propose a deep learning model, DeepCAT, which learns joint word-category representations to enhance the query understanding process. We believe learning category interactions helps to improve the performance of category mapping on minority classes, tail and torso queries. DeepCAT contains a novel word-category representation model that trains the category representations based on word-category co-occurrences in the training set. The category representation is then leveraged to introduce a new loss function to estimate the category-category co-occurrences for refining joint word-category embeddings. To demonstrate our model's effectiveness on minority categories and tail queries, we conduct two sets of experiments. The results show that DeepCAT reaches a 10% improvement on minority classes and a 7.1% improvement on tail queries over a state-of-the-art label embedding model. Our findings suggest a promising direction for improving e-commerce search by semantic modeling of taxonomy hierarchies.
Submitted 10 May, 2021; v1 submitted 23 April, 2021;
originally announced April 2021.
-
APRF-Net: Attentive Pseudo-Relevance Feedback Network for Query Categorization
Authors:
Ali Ahmadvand,
Sayyed M. Zahiri,
Simon Hughes,
Khalifa Al Jadda,
Surya Kallumadi,
Eugene Agichtein
Abstract:
Query categorization is an essential part of query intent understanding in e-commerce search. A common query categorization task is to select the relevant fine-grained product categories in a product taxonomy. For frequent queries, rich customer behavior (e.g., click-through data) can be used to infer the relevant product categories. However, for more rare queries, which cover a large volume of search traffic, relying solely on customer behavior may not suffice due to the lack of this signal. To improve categorization of rare queries, we adapt the Pseudo-Relevance Feedback (PRF) approach to utilize the latent knowledge embedded in semantically or lexically similar product documents to enrich the representation of the more rare queries. To this end, we propose a novel deep neural model named Attentive Pseudo Relevance Feedback Network (APRF-Net) to enhance the representation of rare queries for query categorization. To demonstrate the effectiveness of our approach, we collect search queries from a large commercial search engine, and compare APRF-Net to state-of-the-art deep learning models for text classification. Our results show that the APRF-Net significantly improves query categorization by 5.9% on F1@1 score over the baselines, which increases to 8.2% improvement for the rare (tail) queries. The findings of this paper can be leveraged for further improvements in search query representation and understanding.
Submitted 10 May, 2021; v1 submitted 22 April, 2021;
originally announced April 2021.
-
Semantic Product Search for Matching Structured Product Catalogs in E-Commerce
Authors:
Jason Ingyu Choi,
Surya Kallumadi,
Bhaskar Mitra,
Eugene Agichtein,
Faizan Javed
Abstract:
Retrieving all semantically relevant products from the product catalog is an important problem in E-commerce. Compared to web documents, product catalogs are more structured and sparse due to multi-instance fields that encode heterogeneous aspects of products (e.g., brand name and product dimensions). In this paper, we propose a new semantic product search algorithm that learns to represent and aggregate multi-instance fields into a document representation using state-of-the-art transformers as encoders. Our experiments investigate two aspects of the proposed approach: (1) effectiveness of field representations and structured matching; (2) effectiveness of adding lexical features to semantic search. After training our models using user click logs from a well-known E-commerce platform, we show that our results provide useful insights for improving product search. Lastly, we present a detailed error analysis to show which types of queries benefited the most from fielded representations and structured matching.
Submitted 18 August, 2020;
originally announced August 2020.
-
JointMap: Joint Query Intent Understanding For Modeling Intent Hierarchies in E-commerce Search
Authors:
Ali Ahmadvand,
Surya Kallumadi,
Faizan Javed,
Eugene Agichtein
Abstract:
An accurate understanding of a user's query intent can help improve the performance of downstream tasks such as query scoping and ranking. In the e-commerce domain, recent work in query understanding focuses on the query to product-category mapping. However, a small yet significant percentage of queries (on our website, 1.5% or 33M queries in 2019) have non-commercial intent associated with them. These intents are usually associated with non-commercial information-seeking needs such as discounts, store hours, installation guides, etc. In this paper, we introduce Joint Query Intent Understanding (JointMap), a deep learning model to simultaneously learn two different high-level user intent tasks: 1) identifying a query's commercial vs. non-commercial intent, and 2) associating a set of relevant product categories in taxonomy to a product query. The JointMap model works by leveraging the transfer bias that exists between these two related tasks through a joint-learning process. As curating a labeled data set for these tasks can be expensive and time-consuming, we propose a distant supervision approach in conjunction with an active learning model to generate high-quality training data sets. To demonstrate the effectiveness of JointMap, we use search queries collected from a large commercial website. Our results show that JointMap significantly improves both "commercial vs. non-commercial" intent prediction and product category mapping by 2.3% and 10% on average, respectively, over state-of-the-art deep learning methods. Our findings suggest a promising direction to model the intent hierarchies in an e-commerce search engine.
Submitted 29 May, 2020; v1 submitted 28 May, 2020;
originally announced May 2020.
-
Report on the SIGIR 2019 Workshop on eCommerce (ECOM19)
Authors:
Jon Degenhardt,
Surya Kallumadi,
Utkarsh Porwal,
Andrew Trotman
Abstract:
The SIGIR 2019 Workshop on eCommerce (ECOM19) was a full-day workshop that took place on Thursday, July 25, 2019 in Paris, France. The purpose of the workshop was to serve as a platform for publication and discussion of Information Retrieval and NLP research and their applications in the domain of eCommerce. The workshop program was designed to bring together practitioners and researchers from academia and industry to discuss the challenges and approaches to product search and recommendation in the eCommerce domain. A second goal was to run a data challenge on real-world eCommerce data. The workshop drew contributions from both industry and academia; in total, it received 38 submissions and accepted 24 (63%). There were two keynotes by invited speakers, a poster session where all the accepted submissions were presented, a panel discussion, and three short talks by invited speakers.
Submitted 27 December, 2019;
originally announced December 2019.
-
A Line in the Sand: Recommendation or Ad-hoc Retrieval?
Authors:
Surya Kallumadi,
Bhaskar Mitra,
Tereza Iofciu
Abstract:
The popular approaches to recommendation and ad-hoc retrieval tasks are largely distinct in the literature. In this work, we argue that many recommendation problems can also be cast as ad-hoc retrieval tasks. To demonstrate this, we build a solution for the RecSys 2018 Spotify challenge by combining standard ad-hoc retrieval models and using popular retrieval toolsets. We draw a parallel between the playlist continuation task and the task of finding good expansion terms for queries in ad-hoc retrieval, and show that standard pseudo-relevance feedback can be effective as a collaborative filtering approach. We also use ad-hoc retrieval for content-based recommendation by treating the input playlist title as a query and associating all candidate tracks with meta-descriptions extracted from the background data. The recommendations from these two approaches are further supplemented by a nearest neighbor search based on track embeddings learned by a popular neural model. Our final ranked list of recommendations is produced by a learning-to-rank model. Our proposed solution using ad-hoc retrieval models achieved competitive performance on the music recommendation task at the RecSys 2018 challenge, finishing at rank 7 out of 112 participating teams and at rank 5 out of 31 teams for the main and the creative tracks, respectively.
Submitted 20 July, 2018;
originally announced July 2018.