-
LLM-based Weak Supervision Framework for Query Intent Classification in Video Search
Authors:
Farnoosh Javadi,
Phanideep Gampa,
Alyssa Woo,
Xingxing Geng,
Hang Zhang,
Jose Sepulveda,
Belhassen Bayar,
Fei Wang
Abstract:
Streaming services have reshaped how we discover and engage with digital entertainment. Despite these advancements, effectively understanding the wide spectrum of user search queries continues to pose a significant challenge. An accurate query understanding system that can handle a variety of entities that represent different user intents is essential for delivering an enhanced user experience. We…
▽ More
Streaming services have reshaped how we discover and engage with digital entertainment. Despite these advancements, effectively understanding the wide spectrum of user search queries continues to pose a significant challenge. An accurate query understanding system that can handle a variety of entities that represent different user intents is essential for delivering an enhanced user experience. We can build such a system by training a natural language understanding (NLU) model; however, obtaining high-quality labeled training data in this specialized domain is a substantial obstacle. Manual annotation is costly and impractical for capturing users' vast vocabulary variations. To address this, we introduce a novel approach that leverages large language models (LLMs) through weak supervision to automatically annotate a vast collection of user search queries. Using prompt engineering and a diverse set of LLM personas, we generate training data that matches human annotator expectations. By incorporating domain knowledge via Chain of Thought and In-Context Learning, our approach leverages the labeled data to train low-latency models optimized for real-time inference. Extensive evaluations demonstrated that our approach outperformed the baseline with an average relative gain of 113% in recall. Furthermore, our novel prompt engineering framework yields higher quality LLM-generated data to be used for weak supervision; we observed 47.60% improvement over baseline in agreement rate between LLM predictions and human annotations with respect to F1 score, weighted according to the distribution of occurrences of the search queries. Our persona selection routing mechanism further adds an additional 3.67% increase in weighted F1 score on top of our novel prompt engineering framework.
△ Less
Submitted 13 September, 2024;
originally announced September 2024.
-
Multi-Task Learning For Reduced Popularity Bias In Multi-Territory Video Recommendations
Authors:
Phanideep Gampa,
Farnoosh Javadi,
Belhassen Bayar,
Ainur Yessenalina
Abstract:
Various data imbalances that naturally arise in a multi-territory personalized recommender system can lead to a significant item bias for globally prevalent items. A locally popular item can be overshadowed by a globally prevalent item. Moreover, users' viewership patterns/statistics can drastically change from one geographic location to another which may suggest to learn specific user embeddings.…
▽ More
Various data imbalances that naturally arise in a multi-territory personalized recommender system can lead to a significant item bias for globally prevalent items. A locally popular item can be overshadowed by a globally prevalent item. Moreover, users' viewership patterns/statistics can drastically change from one geographic location to another which may suggest to learn specific user embeddings. In this paper, we propose a multi-task learning (MTL) technique, along with an adaptive upsampling method to reduce popularity bias in multi-territory recommendations. Our proposed framework is designed to enrich training examples with active users representation through upsampling, and capable of learning geographic-based user embeddings by leveraging MTL. Through experiments, we demonstrate the effectiveness of our framework in multiple territories compared to a baseline not incorporating our proposed techniques.~Noticeably, we show improved relative gain of up to $65.27\%$ in PR-AUC metric. A case study is presented to demonstrate the advantages of our methods in attenuating the popularity bias of global items.
△ Less
Submitted 24 September, 2023;
originally announced October 2023.
-
Design Principles of Robust Multi-Armed Bandit Framework in Video Recommendations
Authors:
Belhassen Bayar,
Phanideep Gampa,
Ainur Yessenalina,
Zhen Wen
Abstract:
Current multi-armed bandit approaches in recommender systems (RS) have focused more on devising effective exploration techniques, while not adequately addressing common exploitation challenges related to distributional changes and item cannibalization. Little work exists to guide the design of robust bandit frameworks that can address these frequent challenges in RS. In this paper, we propose a ne…
▽ More
Current multi-armed bandit approaches in recommender systems (RS) have focused more on devising effective exploration techniques, while not adequately addressing common exploitation challenges related to distributional changes and item cannibalization. Little work exists to guide the design of robust bandit frameworks that can address these frequent challenges in RS. In this paper, we propose a new design principles to (i) make bandit models robust to time-variant metadata signals, (ii) less prone to item cannibalization, and (iii) prevent their weights fluctuating due to data sparsity. Through a series of experiments, we systematically examine the influence of several important bandit design choices. We demonstrate the advantage of our proposed design principles at making bandit models robust to dynamic behavioral changes through in-depth analyses. Noticeably, we show improved relative gain compared to a baseline bandit model not incorporating our design choices of up to $11.88\%$ and $44.85\%$, respectively in ROC-AUC and PR-AUC. Case studies about fairness in recommending specific popular and unpopular titles are presented, to demonstrate the robustness of our proposed design at addressing popularity biases.
△ Less
Submitted 24 September, 2023;
originally announced October 2023.
-
Dynamics of Local Elasticity During Training of Neural Nets
Authors:
Soham Dan,
Anirbit Mukherjee,
Avirup Das,
Phanideep Gampa
Abstract:
In the recent past, a property of neural training trajectories in weight-space had been isolated, that of "local elasticity" (denoted as $S_{\rm rel}$). Local elasticity attempts to quantify the propagation of the influence of a sampled data point on the prediction at another data. In this work, we embark on a comprehensive study of the existing notion of $S_{\rm rel}$ and also propose a new defin…
▽ More
In the recent past, a property of neural training trajectories in weight-space had been isolated, that of "local elasticity" (denoted as $S_{\rm rel}$). Local elasticity attempts to quantify the propagation of the influence of a sampled data point on the prediction at another data. In this work, we embark on a comprehensive study of the existing notion of $S_{\rm rel}$ and also propose a new definition that addresses the limitations that we point out for the original definition in the classification setting. On various state-of-the-art neural network training on SVHN, CIFAR-10 and CIFAR-100 we demonstrate how our new proposal of $S_{\rm rel}$, as opposed to the original definition, much more sharply detects the property of the weight updates preferring to make prediction changes within the same class as the sampled data.
In neural regression experiments we demonstrate that the original $S_{\rm rel}$ reveals a $2-$phase behavior -- that the training proceeds via an initial elastic phase when $S_{\rm rel}$ changes rapidly and an eventual inelastic phase when $S_{\rm rel}$ remains large. We show that some of these properties can be analytically reproduced in various instances of doing regression via gradient flows on model predictor classes.
△ Less
Submitted 24 August, 2023; v1 submitted 1 November, 2021;
originally announced November 2021.
-
Object Files and Schemata: Factorizing Declarative and Procedural Knowledge in Dynamical Systems
Authors:
Anirudh Goyal,
Alex Lamb,
Phanideep Gampa,
Philippe Beaudoin,
Sergey Levine,
Charles Blundell,
Yoshua Bengio,
Michael Mozer
Abstract:
Modeling a structured, dynamic environment like a video game requires keeping track of the objects and their states declarative knowledge) as well as predicting how objects behave (procedural knowledge). Black-box models with a monolithic hidden state often fail to apply procedural knowledge consistently and uniformly, i.e., they lack systematicity. For example, in a video game, correct prediction…
▽ More
Modeling a structured, dynamic environment like a video game requires keeping track of the objects and their states declarative knowledge) as well as predicting how objects behave (procedural knowledge). Black-box models with a monolithic hidden state often fail to apply procedural knowledge consistently and uniformly, i.e., they lack systematicity. For example, in a video game, correct prediction of one enemy's trajectory does not ensure correct prediction of another's. We address this issue via an architecture that factorizes declarative and procedural knowledge and that imposes modularity within each form of knowledge. The architecture consists of active modules called object files that maintain the state of a single object and invoke passive external knowledge sources called schemata that prescribe state updates. To use a video game as an illustration, two enemies of the same type will share schemata but will have separate object files to encode their distinct state (e.g., health, position). We propose to use attention to determine which object files to update, the selection of schemata, and the propagation of information between object files. The resulting architecture is a drop-in replacement conforming to the same input-output interface as normal recurrent networks (e.g., LSTM, GRU) yet achieves substantially better generalization on environments that have multiple object tokens of the same type, including a challenging intuitive physics benchmark.
△ Less
Submitted 12 November, 2020; v1 submitted 29 June, 2020;
originally announced June 2020.
-
BanditRank: Learning to Rank Using Contextual Bandits
Authors:
Phanideep Gampa,
Sumio Fujita
Abstract:
We propose an extensible deep learning method that uses reinforcement learning to train neural networks for offline ranking in information retrieval (IR). We call our method BanditRank as it treats ranking as a contextual bandit problem. In the domain of learning to rank for IR, current deep learning models are trained on objective functions different from the measures they are evaluated on. Since…
▽ More
We propose an extensible deep learning method that uses reinforcement learning to train neural networks for offline ranking in information retrieval (IR). We call our method BanditRank as it treats ranking as a contextual bandit problem. In the domain of learning to rank for IR, current deep learning models are trained on objective functions different from the measures they are evaluated on. Since most evaluation measures are discrete quantities, they cannot be leveraged by directly using gradient descent algorithms without an approximation. BanditRank bridges this gap by directly optimizing a task-specific measure, such as mean average precision (MAP), using gradient descent. Specifically, a contextual bandit whose action is to rank input documents is trained using a policy gradient algorithm to directly maximize the reward. The reward can be a single measure, such as MAP, or a combination of several measures. The notion of ranking is also inherent in BanditRank, similar to the current \textit{listwise} approaches. To evaluate the effectiveness of BanditRank, we conducted a series of experiments on datasets related to three different tasks, i.e., web search, community, and factoid question answering. We found that it performs better than state-of-the-art methods when applied on the question answering datasets. On the web search dataset, we found that BanditRank performed better than four strong listwise baselines including LambdaMART, AdaRank, ListNet and Coordinate Ascent.
△ Less
Submitted 23 October, 2019;
originally announced October 2019.
-
A Tractable Algorithm For Finite-Horizon Continuous Reinforcement Learning
Authors:
Phanideep Gampa,
Sairam Satwik Kondamudi,
Lakshmanan Kailasam
Abstract:
We consider the finite horizon continuous reinforcement learning problem. Our contribution is three-fold. First,we give a tractable algorithm based on optimistic value iteration for the problem. Next,we give a lower bound on regret of order $Ω(T^{2/3})$ for any algorithm discretizes the state space, improving the previous regret bound of $Ω(T^{1/2})$ of Ortner and Ryabko \cite{contrl} for the same…
▽ More
We consider the finite horizon continuous reinforcement learning problem. Our contribution is three-fold. First,we give a tractable algorithm based on optimistic value iteration for the problem. Next,we give a lower bound on regret of order $Ω(T^{2/3})$ for any algorithm discretizes the state space, improving the previous regret bound of $Ω(T^{1/2})$ of Ortner and Ryabko \cite{contrl} for the same problem. Next,under the assumption that the rewards and transitions are Hölder Continuous we show that the upper bound on the discretization error is $const.Ln^{-α}T$. Finally,we give some simple experiments to validate our propositions.
△ Less
Submitted 26 June, 2019;
originally announced June 2019.