-
CASPR: Customer Activity Sequence-based Prediction and Representation
Authors:
Pin-Jung Chen,
Sahil Bhatnagar,
Sagar Goyal,
Damian Konrad Kowalczyk,
Mayank Shrivastava
Abstract:
Tasks critical to enterprise profitability, such as customer churn prediction, fraudulent account detection or customer lifetime value estimation, are often tackled by models trained on features engineered from customer data in tabular format. Application-specific feature engineering adds development, operationalization and maintenance costs over time. Recent advances in representation learning pr…
▽ More
Tasks critical to enterprise profitability, such as customer churn prediction, fraudulent account detection or customer lifetime value estimation, are often tackled by models trained on features engineered from customer data in tabular format. Application-specific feature engineering adds development, operationalization and maintenance costs over time. Recent advances in representation learning present an opportunity to simplify and generalize feature engineering across applications. When applying these advancements to tabular data researchers deal with data heterogeneity, variations in customer engagement history or the sheer volume of enterprise datasets. In this paper, we propose a novel approach to encode tabular data containing customer transactions, purchase history and other interactions into a generic representation of a customer's association with the business. We then evaluate these embeddings as features to train multiple models spanning a variety of applications. CASPR, Customer Activity Sequence-based Prediction and Representation, applies Transformer architecture to encode activity sequences to improve model performance and avoid bespoke feature engineering across applications. Our experiments at scale validate CASPR for both small and large enterprise applications.
△ Less
Submitted 28 November, 2022; v1 submitted 16 November, 2022;
originally announced November 2022.
-
On the Limits to Multi-Modal Popularity Prediction on Instagram -- A New Robust, Efficient and Explainable Baseline
Authors:
Christoffer Riis,
Damian Konrad Kowalczyk,
Lars Kai Hansen
Abstract:
Our global population contributes visual content on platforms like Instagram, attempting to express themselves and engage their audiences, at an unprecedented and increasing rate. In this paper, we revisit the popularity prediction on Instagram. We present a robust, efficient, and explainable baseline for population-based popularity prediction, achieving strong ranking performance. We employ the l…
▽ More
Our global population contributes visual content on platforms like Instagram, attempting to express themselves and engage their audiences, at an unprecedented and increasing rate. In this paper, we revisit the popularity prediction on Instagram. We present a robust, efficient, and explainable baseline for population-based popularity prediction, achieving strong ranking performance. We employ the latest methods in computer vision to maximize the information extracted from the visual modality. We use transfer learning to extract visual semantics such as concepts, scenes, and objects, allowing a new level of scrutiny in an extensive, explainable ablation study. We inform feature selection towards a robust and scalable model, but also illustrate feature interactions, offering new directions for further inquiry in computational social science. Our strongest models inform a lower limit to population-based predictability of popularity on Instagram. The models are immediately applicable to social media monitoring and influencer identification.
△ Less
Submitted 20 February, 2021; v1 submitted 26 April, 2020;
originally announced April 2020.
-
The Complexity of Social Media Response: Statistical Evidence For One-Dimensional Engagement Signal in Twitter
Authors:
Damian Konrad Kowalczyk,
Lars Kai Hansen
Abstract:
Many years after online social networks exceeded our collective attention, social influence is still built on attention capital. Quality is not a prerequisite for viral spreading, yet large diffusion cascades remain the hallmark of a social influencer. Consequently, our exposure to low-quality content and questionable influence is expected to increase. Since the conception of influence maximizatio…
▽ More
Many years after online social networks exceeded our collective attention, social influence is still built on attention capital. Quality is not a prerequisite for viral spreading, yet large diffusion cascades remain the hallmark of a social influencer. Consequently, our exposure to low-quality content and questionable influence is expected to increase. Since the conception of influence maximization frameworks, multiple content performance metrics became available, albeit raising the complexity of influence analysis. In this paper, we examine and consolidate a diverse set of content engagement metrics. The correlations discovered lead us to propose a new, more holistic, one-dimensional engagement signal. We then show it is more predictable than any individual influence predictors previously investigated. Our proposed model achieves strong engagement ranking performance and is the first to explain half of the variance with features available early. We share the detailed numerical workflow to compute the new compound engagement signal. The model is immediately applicable to social media monitoring, influencer identification, campaign engagement forecasting, and curating user feeds.
△ Less
Submitted 15 February, 2020; v1 submitted 7 October, 2019;
originally announced October 2019.
-
Scalable Privacy-Compliant Virality Prediction on Twitter
Authors:
Damian Konrad Kowalczyk,
Jan Larsen
Abstract:
The digital town hall of Twitter becomes a preferred medium of communication for individuals and organizations across the globe. Some of them reach audiences of millions, while others struggle to get noticed. Given the impact of social media, the question remains more relevant than ever: how to model the dynamics of attention in Twitter. Researchers around the world turn to machine learning to pre…
▽ More
The digital town hall of Twitter becomes a preferred medium of communication for individuals and organizations across the globe. Some of them reach audiences of millions, while others struggle to get noticed. Given the impact of social media, the question remains more relevant than ever: how to model the dynamics of attention in Twitter. Researchers around the world turn to machine learning to predict the most influential tweets and authors, navigating the volume, velocity, and variety of social big data, with many compromises. In this paper, we revisit content popularity prediction on Twitter. We argue that strict alignment of data acquisition, storage and analysis algorithms is necessary to avoid the common trade-offs between scalability, accuracy and privacy compliance. We propose a new framework for the rapid acquisition of large-scale datasets, high accuracy supervisory signal and multilanguage sentiment prediction while respecting every privacy request applicable. We then apply a novel gradient boosting framework to achieve state-of-the-art results in virality ranking, already before including tweet's visual or propagation features. Our Gradient Boosted Regression Tree is the first to offer explainable, strong ranking performance on benchmark datasets. Since the analysis focused on features available early, the model is immediately applicable to incoming tweets in 18 languages.
△ Less
Submitted 27 February, 2019; v1 submitted 14 December, 2018;
originally announced December 2018.