Showing 1–2 of 2 results for author: De Vries, K J

Search v0.5.6 released 2020-02-24

arXiv:2405.13692 [pdf, ps, other]

cs.LG

Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com

Authors: Sergei Krutikov, Bulat Khaertdinov, Rodion Kiriukhin, Shubham Agrawal, Mozhdeh Ariannezhad, Kees Jan De Vries

Abstract: Transformer-based neural networks, empowered by Self-Supervised Learning (SSL), have demonstrated unprecedented performance across various domains. However, related literature suggests that tabular Transformers may struggle to outperform classical Machine Learning algorithms, such as Gradient Boosted Decision Trees (GBDT). In this paper, we aim to challenge GBDTs with tabular Transformers on a typ… ▽ More Transformer-based neural networks, empowered by Self-Supervised Learning (SSL), have demonstrated unprecedented performance across various domains. However, related literature suggests that tabular Transformers may struggle to outperform classical Machine Learning algorithms, such as Gradient Boosted Decision Trees (GBDT). In this paper, we aim to challenge GBDTs with tabular Transformers on a typical task faced in e-commerce, namely fraud detection. Our study is additionally motivated by the problem of selection bias, often occurring in real-life fraud detection systems. It is caused by the production system affecting which subset of traffic becomes labeled. This issue is typically addressed by sampling randomly a small part of the whole production data, referred to as a Control Group. This subset follows a target distribution of production data and therefore is usually preferred for training classification models with standard ML algorithms. Our methodology leverages the capabilities of Transformers to learn transferable representations using all available data by means of SSL, giving it an advantage over classical methods. Furthermore, we conduct large-scale experiments, pre-training tabular Transformers on vast amounts of data instances and fine-tuning them on smaller target datasets. The proposed approach outperforms heavily tuned GBDTs by a considerable margin of the Average Precision (AP) score in offline evaluations. Finally, we report the results of an online A/B experiment. Experimental results confirm the superiority of tabular Transformers compared to GBDTs in production, demonstrated by a statistically significant improvement in our business metric. △ Less

Submitted 30 June, 2025; v1 submitted 22 May, 2024; originally announced May 2024.

Comments: Submitted to CIKM'25, Applied Research track
arXiv:2107.01979 [pdf, other]

cs.LG cs.CR stat.AP

Machine Learning for Fraud Detection in E-Commerce: A Research Agenda

Authors: Niek Tax, Kees Jan de Vries, Mathijs de Jong, Nikoleta Dosoula, Bram van den Akker, Jon Smith, Olivier Thuong, Lucas Bernardi

Abstract: Fraud detection and prevention play an important part in ensuring the sustained operation of any e-commerce business. Machine learning (ML) often plays an important role in these anti-fraud operations, but the organizational context in which these ML models operate cannot be ignored. In this paper, we take an organization-centric view on the topic of fraud detection by formulating an operational m… ▽ More Fraud detection and prevention play an important part in ensuring the sustained operation of any e-commerce business. Machine learning (ML) often plays an important role in these anti-fraud operations, but the organizational context in which these ML models operate cannot be ignored. In this paper, we take an organization-centric view on the topic of fraud detection by formulating an operational model of the anti-fraud departments in e-commerce organizations. We derive 6 research topics and 12 practical challenges for fraud detection from this operational model. We summarize the state of the literature for each research topic, discuss potential solutions to the practical challenges, and identify 22 open research challenges. △ Less

Submitted 5 July, 2021; originally announced July 2021.

Comments: Accepted and to appear in the proceedings of the KDD 2021 co-located workshop: the 2nd International Workshop on Deployable Machine Learning for Security Defense (MLHat)

Search v0.5.6 released 2020-02-24