-
LiRank: Industrial Large Scale Ranking Models at LinkedIn
Authors:
Fedor Borisyuk,
Mingzhou Zhou,
Qingquan Song,
Siyu Zhu,
Birjodh Tiwana,
Ganesh Parameswaran,
Siddharth Dangi,
Lars Hertel,
Qiang Xiao,
Xiaochen Hou,
Yunbo Ouyang,
Aman Gupta,
Sheallika Singh,
Dan Liu,
Hailing Cheng,
Lei Le,
Jonathan Hung,
Sathiya Keerthi,
Ruoyan Wang,
Fengyu Zhang,
Mohit Kothari,
Chen Zhu,
Daqi Sun,
Yun Dai,
Xun Luan,
et al. (9 additional authors not shown)
Abstract:
We present LiRank, a large-scale ranking framework at LinkedIn that brings to production state-of-the-art modeling architectures and optimization methods. We unveil several modeling improvements, including Residual DCN, which adds attention and residual connections to the famous DCNv2 architecture. We share insights into combining and tuning SOTA architectures to create a unified model, including Dense Gating, Transformers and Residual DCN. We also propose novel techniques for calibration and describe how we productionalized deep learning based explore/exploit methods. To enable effective, production-grade serving of large ranking models, we detail how to train and compress models using quantization and vocabulary compression. We provide details about the deployment setup for large-scale use cases of Feed ranking, Jobs Recommendations, and Ads click-through rate (CTR) prediction. We summarize our learnings from various A/B tests by elucidating the most effective technical approaches. These ideas have contributed to relative metrics improvements across the board at LinkedIn: +0.5% member sessions in the Feed, +1.76% qualified job applications for Jobs search and recommendations, and +4.3% for Ads CTR. We hope this work can provide practical insights and solutions for practitioners interested in leveraging large-scale deep ranking systems.
Submitted 7 August, 2024; v1 submitted 9 February, 2024;
originally announced February 2024.
-
Learning Speaker-specific Lip-to-Speech Generation
Authors:
Munender Varshney,
Ravindra Yadav,
Vinay P. Namboodiri,
Rajesh M Hegde
Abstract:
Understanding lip movement and inferring speech from it is notoriously difficult for the average person. Accurate lip-reading is aided by various cues from the speaker and their contextual or environmental setting. Every speaker has a different accent and speaking style, which can be inferred from their visual and speech features. This work aims to understand the correlation/mapping between speech and the sequence of lip movements of individual speakers in an unconstrained, large-vocabulary setting. We model the frame sequence as a prior to the transformer in an auto-encoder setting and learn a joint embedding that exploits temporal properties of both audio and video. We learn temporal synchronization using deep metric learning, which guides the decoder to generate speech in sync with input lip movements. The predictive posterior thus gives us the generated speech in the speaker's speaking style. We train our model on the GRID and Lip2Wav Chemistry lecture datasets to evaluate single-speaker natural speech generation from lip movement in an unconstrained natural setting. Extensive evaluation using various qualitative and quantitative metrics, together with human evaluation, shows that our method outperforms the state of the art by a good margin across almost all evaluation metrics on the Lip2Wav Chemistry dataset (large vocabulary in an unconstrained setting) and marginally on the GRID dataset.
Submitted 20 August, 2022; v1 submitted 4 June, 2022;
originally announced June 2022.
-
Minimizing Supervision in Multi-label Categorization
Authors:
Rajat,
Munender Varshney,
Pravendra Singh,
Vinay P. Namboodiri
Abstract:
Multiple categories of objects are present in most images. Treating this as a multi-class classification problem is not justified, so we treat it as a multi-label classification problem. In this paper, we further aim to minimize the supervision required for multi-label classification. Specifically, we investigate an effective class of approaches that associate a weak localization with each category, in terms of either a bounding box or a segmentation mask. Doing so improves the accuracy of multi-label categorization. The approach we adopt is one of active learning, i.e., incrementally selecting a set of samples that need supervision based on the current model, obtaining supervision for these samples, retraining the model with the additional set of supervised samples, and proceeding again to select the next set of samples. A crucial concern is the choice of the set of samples. Here we provide a novel insight: no specific measure succeeds in obtaining a consistently improved selection criterion. We therefore provide a selection criterion that consistently improves on the overall baseline criterion by choosing the top k set of samples for a varied set of criteria. Using this criterion, we show that we can retain more than 98% of the fully supervised performance with just 20% of the samples (and more than 96% with 10%) of the dataset on PASCAL VOC 2007 and 2012. Also, our proposed approach consistently outperforms all other baseline metrics for all benchmark datasets and model combinations.
Submitted 26 May, 2020;
originally announced May 2020.
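The selection step described in the abstract above (take the top k samples under a varied set of criteria) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the two scoring functions, the `select_batch` helper, and the choice to combine the per-criterion top-k sets by union are all assumptions, since the abstract does not name the criteria or say how they are combined.

```python
import math
from typing import Callable, Dict, List, Sequence

Probs = Sequence[float]  # per-label probabilities predicted for one image


def entropy_score(p: Probs) -> float:
    """Hypothetical criterion: sum of per-label binary entropies (higher = more uncertain)."""
    eps = 1e-12
    return -sum(q * math.log(q + eps) + (1 - q) * math.log(1 - q + eps) for q in p)


def margin_score(p: Probs) -> float:
    """Hypothetical criterion: how close the most ambiguous label is to the 0.5 boundary."""
    return -min(abs(q - 0.5) for q in p)


def select_batch(
    pool: Dict[str, Probs],
    criteria: List[Callable[[Probs], float]],
    k: int,
) -> List[str]:
    """Take the top-k samples under each criterion and return their union (assumed rule)."""
    chosen: List[str] = []
    for score in criteria:
        ranked = sorted(pool, key=lambda name: score(pool[name]), reverse=True)
        for name in ranked[:k]:
            if name not in chosen:
                chosen.append(name)
    return chosen


# Unlabeled pool: image id -> current model's per-label probabilities (made-up values).
pool = {"a": [0.6, 0.6], "b": [0.99, 0.01], "c": [0.5, 0.99]}
to_label = select_batch(pool, [entropy_score, margin_score], k=1)
```

In a full active-learning loop, the samples in `to_label` would be sent for annotation, the model retrained, and the selection repeated on the remaining pool.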
-
Cooperative Initialization based Deep Neural Network Training
Authors:
Pravendra Singh,
Munender Varshney,
Vinay P. Namboodiri
Abstract:
Researchers have proposed various activation functions. These activation functions help a deep network learn non-linear behavior and have a significant effect on training dynamics and task performance. The performance of these activations also depends on the initial state of the weight parameters, i.e., different initial states lead to differences in the performance of a network. In this paper, we propose a cooperative initialization for training deep networks that use the ReLU activation function, in order to improve network performance. Our approach uses multiple activation functions in the initial few epochs for the update of all sets of weight parameters while training the network. These activation functions cooperate to overcome their drawbacks in the update of weight parameters, which in effect leads to learning a better "feature representation" and boosts the network performance later. Cooperative initialization based training also helps reduce overfitting and improves performance without increasing the number of parameters or the inference (test) time of the final model. Experiments show that our approach outperforms various baselines and, at the same time, performs well across various tasks such as classification and detection. The Top-1 classification accuracy of the model trained using our approach improves by 2.8% for VGG-16 and 2.1% for ResNet-56 on the CIFAR-100 dataset.
Submitted 5 January, 2020;
originally announced January 2020.
-
Using Ego-Clusters to Measure Network Effects at LinkedIn
Authors:
Guillaume Saint-Jacques,
Maneesh Varshney,
Jeremy Simpson,
Ya Xu
Abstract:
A network effect is said to take place when a new feature not only impacts the people who receive it, but also other users of the platform, like their connections or the people who follow them. This very common phenomenon violates the fundamental assumption underpinning nearly all enterprise experimentation systems, the stable unit treatment value assumption (SUTVA). When this assumption is broken, a typical experimentation platform, which relies on Bernoulli randomization for assignment and two-sample t-test for assessment of significance, will not only fail to account for the network effect, but potentially give highly biased results.
This paper outlines a simple and scalable solution to measuring network effects, using ego-network randomization, where a cluster is comprised of an "ego" (a focal individual) and her "alters" (the individuals she is immediately connected to). Our approach aims at maintaining representativity of clusters, avoiding strong modeling assumptions, and significantly increasing power compared to traditional cluster-based randomization. In particular, it does not require product-specific experiment design or high levels of investment from engineering teams, and it does not require any changes to experimentation and analysis platforms, as it only requires assigning treatment at an individual level. Each user either has the feature or does not, and no complex manipulation of interactions between users is needed. It focuses on measuring the one-out network effect (i.e., the effect of my immediate connection's treatment on me), and gives reasonable estimates at a very low setup cost, allowing us to run such experiments dozens of times a year.
Submitted 20 March, 2019;
originally announced March 2019.
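The ego-network randomization described in the abstract above can be illustrated with a small sketch. It is written under stated assumptions and is not the paper's procedure: the greedy, non-overlapping choice of egos, the one-coin-flip-per-cluster rule, and the helper names `build_ego_clusters` and `assign_treatment` are all hypothetical. What it keeps from the abstract is that a cluster is an ego plus her alters, and that treatment is ultimately a plain per-user flag.

```python
import random
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]  # user id -> ids of immediate connections
Cluster = Tuple[str, Set[str]]  # (ego, alters)


def build_ego_clusters(graph: Graph, rng: random.Random) -> List[Cluster]:
    """Greedily pick egos whose ego networks do not overlap earlier clusters (assumed rule)."""
    used: Set[str] = set()
    clusters: List[Cluster] = []
    members = sorted(graph)
    rng.shuffle(members)
    for ego in members:
        alters = graph[ego]
        if not alters or ego in used or alters & used:
            continue
        clusters.append((ego, set(alters)))
        used.add(ego)
        used |= alters
    return clusters


def assign_treatment(clusters: List[Cluster], rng: random.Random) -> Dict[str, bool]:
    """Flip one coin per cluster and give every alter in that cluster the same arm."""
    assignment: Dict[str, bool] = {}
    for _ego, alters in clusters:
        treated = rng.random() < 0.5
        for alter in alters:
            assignment[alter] = treated  # still an individual-level flag per user
    return assignment


# Toy connection graph (made-up data).
graph = {
    "a": {"b", "c"}, "b": {"a"}, "c": {"a"},
    "d": {"e"}, "e": {"d", "f"}, "f": {"e"},
}
rng = random.Random(0)
clusters = build_ego_clusters(graph, rng)
assignment = assign_treatment(clusters, rng)
```

Under this sketch, the one-out network effect would be estimated by comparing outcomes of egos whose alters were treated against egos whose alters were not, using the platform's usual two-sample comparison.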