-
SoK: Advances and Open Problems in Web Tracking
Authors:
Yash Vekaria,
Yohan Beugin,
Shaoor Munir,
Gunes Acar,
Nataliia Bielova,
Steven Englehardt,
Umar Iqbal,
Alexandros Kapravelos,
Pierre Laperdrix,
Nick Nikiforakis,
Jason Polakis,
Franziska Roesner,
Zubair Shafiq,
Sebastian Zimmeck
Abstract:
Web tracking is a pervasive and opaque practice that enables personalized advertising, retargeting, and conversion tracking. Over time, it has evolved into a sophisticated and invasive ecosystem, employing increasingly complex techniques to monitor and profile users across the web. The research community has a long track record of analyzing new web tracking techniques, designing and evaluating the…
▽ More
Web tracking is a pervasive and opaque practice that enables personalized advertising, retargeting, and conversion tracking. Over time, it has evolved into a sophisticated and invasive ecosystem, employing increasingly complex techniques to monitor and profile users across the web. The research community has a long track record of analyzing new web tracking techniques, designing and evaluating the effectiveness of countermeasures, and assessing compliance with privacy regulations. Despite a substantial body of work on web tracking, the literature remains fragmented across distinctly scoped studies, making it difficult to identify overarching trends, connect new but related techniques, and identify research gaps in the field. Today, web tracking is undergoing a once-in-a-generation transformation, driven by fundamental shifts in the advertising industry, the adoption of anti-tracking countermeasures by browsers, and the growing enforcement of emerging privacy regulations. This Systematization of Knowledge (SoK) aims to consolidate and synthesize this wide-ranging research, offering a comprehensive overview of the technical mechanisms, countermeasures, and regulations that shape the modern and rapidly evolving web tracking landscape. This SoK also highlights open challenges and outlines directions for future research, aiming to serve as a unified reference and introductory material for researchers, practitioners, and policymakers alike.
△ Less
Submitted 16 June, 2025;
originally announced June 2025.
-
Big Help or Big Brother? Auditing Tracking, Profiling, and Personalization in Generative AI Assistants
Authors:
Yash Vekaria,
Aurelio Loris Canino,
Jonathan Levitsky,
Alex Ciechonski,
Patricia Callejo,
Anna Maria Mandalari,
Zubair Shafiq
Abstract:
Generative AI (GenAI) browser assistants integrate powerful capabilities of GenAI in web browsers to provide rich experiences such as question answering, content summarization, and agentic navigation. These assistants, available today as browser extensions, can not only track detailed browsing activity such as search and click data, but can also autonomously perform tasks such as filling forms, ra…
▽ More
Generative AI (GenAI) browser assistants integrate powerful capabilities of GenAI in web browsers to provide rich experiences such as question answering, content summarization, and agentic navigation. These assistants, available today as browser extensions, can not only track detailed browsing activity such as search and click data, but can also autonomously perform tasks such as filling forms, raising significant privacy concerns. It is crucial to understand the design and operation of GenAI browser extensions, including how they collect, store, process, and share user data. To this end, we study their ability to profile users and personalize their responses based on explicit or inferred demographic attributes and interests of users. We perform network traffic analysis and use a novel prompting framework to audit tracking, profiling, and personalization by the ten most popular GenAI browser assistant extensions. We find that instead of relying on local in-browser models, these assistants largely depend on server-side APIs, which can be auto-invoked without explicit user interaction. When invoked, they collect and share webpage content, often the full HTML DOM and sometimes even the user's form inputs, with their first-party servers. Some assistants also share identifiers and user prompts with third-party trackers such as Google Analytics. The collection and sharing continues even if a webpage contains sensitive information such as health or personal information such as name or SSN entered in a web form. We find that several GenAI browser assistants infer demographic attributes such as age, gender, income, and interests and use this profile--which carries across browsing contexts--to personalize responses. In summary, our work shows that GenAI browser assistants can and do collect personal and sensitive information for profiling and personalization with little to no safeguards.
△ Less
Submitted 10 June, 2025; v1 submitted 20 March, 2025;
originally announced March 2025.
-
Watching TV with the Second-Party: A First Look at Automatic Content Recognition Tracking in Smart TVs
Authors:
Gianluca Anselmi,
Yash Vekaria,
Alexander D'Souza,
Patricia Callejo,
Anna Maria Mandalari,
Zubair Shafiq
Abstract:
Smart TVs implement a unique tracking approach called Automatic Content Recognition (ACR) to profile viewing activity of their users. ACR is a Shazam-like technology that works by periodically capturing the content displayed on a TV's screen and matching it against a content library to detect what content is being displayed at any given point in time. While prior research has investigated third-pa…
▽ More
Smart TVs implement a unique tracking approach called Automatic Content Recognition (ACR) to profile viewing activity of their users. ACR is a Shazam-like technology that works by periodically capturing the content displayed on a TV's screen and matching it against a content library to detect what content is being displayed at any given point in time. While prior research has investigated third-party tracking in the smart TV ecosystem, it has not looked into second-party ACR tracking that is directly conducted by the smart TV platform. In this work, we conduct a black-box audit of ACR network traffic between ACR clients on the smart TV and ACR servers. We use our auditing approach to systematically investigate whether (1) ACR tracking is agnostic to how a user watches TV (e.g., linear vs. streaming vs. HDMI), (2) privacy controls offered by smart TVs have an impact on ACR tracking, and (3) there are any differences in ACR tracking between the UK and the US. We perform a series of experiments on two major smart TV platforms: Samsung and LG. Our results show that ACR works even when the smart TV is used as a "dumb" external display, opting-out stops network traffic to ACR servers, and there are differences in how ACR works across the UK and the US.
△ Less
Submitted 10 September, 2024;
originally announced September 2024.
-
Turning the Tide on Dark Pools? Towards Multi-Stakeholder Vulnerability Notifications in the Ad-Tech Supply Chain
Authors:
Yash Vekaria,
Rishab Nithyanand,
Zubair Shafiq
Abstract:
Online advertising relies on a complex and opaque supply chain that involves multiple stakeholders, including advertisers, publishers, and ad-networks, each with distinct and sometimes conflicting incentives. Recent research has demonstrated the existence of ad-tech supply chain vulnerabilities such as dark pooling, where low-quality publishers bundle their ad inventory with higher-quality ones to…
▽ More
Online advertising relies on a complex and opaque supply chain that involves multiple stakeholders, including advertisers, publishers, and ad-networks, each with distinct and sometimes conflicting incentives. Recent research has demonstrated the existence of ad-tech supply chain vulnerabilities such as dark pooling, where low-quality publishers bundle their ad inventory with higher-quality ones to mislead advertisers. We investigate the effectiveness of vulnerability notification campaigns aimed at mitigating dark pooling. Prior research on vulnerability notifications has primarily focused on single-stakeholder scenarios, and it is unclear whether vulnerability notifications can be effective in the multi-stakeholder ad-tech supply chain. We implement an automated vulnerability notification pipeline to systematically evaluate the responsiveness of various stakeholders, including publishers, ad-networks, and advertisers to vulnerability notifications by academics and activists. Our nine-month long multi-stakeholder notification study shows that notifications are an effective method for reducing dark pooling vulnerabilities in the online advertising ecosystem, especially when targeted towards ad-networks. Further, the sender reputation does not impact responses to notifications from activists and academics in a statistically different way. In addition to being the first notification study targeting the online advertising ecosystem, we are also the first to study multi-stakeholder context in vulnerability notifications.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
Auditing the Compliance and Enforcement of Twitter's Advertising Policy
Authors:
Yash Vekaria,
Zubair Shafiq,
Savvas Zannettou
Abstract:
Online platforms have enacted various policies to maintain a safe and trustworthy advertising environment. However, the extent to which these policies are adhered to and enforced remains a subject of interest and concern. In this work, we present a large-scale audit of adult advertising on Twitter (now X), specifically focusing on compliance with its adult (sexual) content advertising policy. Twit…
▽ More
Online platforms have enacted various policies to maintain a safe and trustworthy advertising environment. However, the extent to which these policies are adhered to and enforced remains a subject of interest and concern. In this work, we present a large-scale audit of adult advertising on Twitter (now X), specifically focusing on compliance with its adult (sexual) content advertising policy. Twitter is an interesting case study in that it -- uniquely from other social media platforms -- allows posting of adult content but prohibits adult content in advertising. We analyze approximately 35 thousand ads on Twitter with respect to their compliance to the adult content ad policy through Perspective API and manual annotations. Among other things, we find that nearly 38% of ads violate Twitter's adult content advertising policy, although the platform eventually removed only about 63% of these non-compliant adult ads. We also find inconsistencies in the moderation of such ads across languages, highlighting the need for more reliable and consistent moderation practices across various languages. Overall, our findings highlight blind spots in Twitter's adult ad policy enforcement for certain languages and countries. Our work underscores the importance of external audits to monitor compliance and improve transparency in online advertising.
△ Less
Submitted 20 February, 2025; v1 submitted 21 September, 2023;
originally announced September 2023.
-
The Inventory is Dark and Full of Misinformation: Understanding the Abuse of Ad Inventory Pooling in the Ad-Tech Supply Chain
Authors:
Yash Vekaria,
Rishab Nithyanand,
Zubair Shafiq
Abstract:
Ad-tech enables publishers to programmatically sell their ad inventory to millions of demand partners through a complex supply chain. Bogus or low quality publishers can exploit the opaque nature of the ad-tech to deceptively monetize their ad inventory. In this paper, we investigate for the first time how misinformation sites subvert the ad-tech transparency standards and pool their ad inventory…
▽ More
Ad-tech enables publishers to programmatically sell their ad inventory to millions of demand partners through a complex supply chain. Bogus or low quality publishers can exploit the opaque nature of the ad-tech to deceptively monetize their ad inventory. In this paper, we investigate for the first time how misinformation sites subvert the ad-tech transparency standards and pool their ad inventory with unrelated sites to circumvent brand safety protections. We find that a few major ad exchanges are disproportionately responsible for the dark pools that are exploited by misinformation websites. We further find evidence that dark pooling allows misinformation sites to deceptively sell their ad inventory to reputable brands. We conclude with a discussion of potential countermeasures such as better vetting of ad exchange partners, adoption of new ad-tech transparency standards that enable end-to-end validation of the ad-tech supply chain, as well as widespread deployment of independent audits like ours.
△ Less
Submitted 14 October, 2023; v1 submitted 12 October, 2022;
originally announced October 2022.
-
Differential Tracking Across Topical Webpages of Indian News Media
Authors:
Yash Vekaria,
Vibhor Agarwal,
Pushkal Agarwal,
Sangeeta Mahapatra,
Sakthi Balan Muthiah,
Nishanth Sastry,
Nicolas Kourtellis
Abstract:
Online user privacy and tracking have been extensively studied in recent years, especially due to privacy and personal data-related legislations in the EU and the USA, such as the General Data Protection Regulation, ePrivacy Regulation, and California Consumer Privacy Act. Research has revealed novel tracking and personal identifiable information leakage methods that first- and third-parties emplo…
▽ More
Online user privacy and tracking have been extensively studied in recent years, especially due to privacy and personal data-related legislations in the EU and the USA, such as the General Data Protection Regulation, ePrivacy Regulation, and California Consumer Privacy Act. Research has revealed novel tracking and personal identifiable information leakage methods that first- and third-parties employ on websites around the world, as well as the intensity of tracking performed on such websites. However, for the sake of scaling to cover a large portion of the Web, most past studies focused on homepages of websites, and did not look deeper into the tracking practices on their topical subpages. The majority of studies focused on the Global North markets such as the EU and the USA. Large markets such as India, which covers 20% of the world population and has no explicit privacy laws, have not been studied in this regard.
We aim to address these gaps and focus on the following research questions: Is tracking on topical subpages of Indian news websites different from their homepage? Do third-party trackers prefer to track specific topics? How does this preference compare to the similarity of content shown on these topical subpages? To answer these questions, we propose a novel method for automatic extraction and categorization of Indian news topical subpages based on the details in their URLs. We study the identified topical subpages and compare them with their homepages with respect to the intensity of cookie injection and third-party embeddedness and type. We find differential user tracking among subpages, and between subpages and homepages. We also find a preferential attachment of third-party trackers to specific topics. Also, embedded third-parties tend to track specific subpages simultaneously, revealing possible user profiling in action.
△ Less
Submitted 7 March, 2021;
originally announced March 2021.
-
Under the Spotlight: Web Tracking in Indian Partisan News Websites
Authors:
Vibhor Agarwal,
Yash Vekaria,
Pushkal Agarwal,
Sangeeta Mahapatra,
Shounak Set,
Sakthi Balan Muthiah,
Nishanth Sastry,
Nicolas Kourtellis
Abstract:
India is experiencing intense political partisanship and sectarian divisions. The paper performs, to the best of our knowledge, the first comprehensive analysis on the Indian online news media with respect to tracking and partisanship. We build a dataset of 103 online, mostly mainstream news websites. With the help of two experts, alongside data from the Media Ownership Monitor of the Reporters wi…
▽ More
India is experiencing intense political partisanship and sectarian divisions. The paper performs, to the best of our knowledge, the first comprehensive analysis on the Indian online news media with respect to tracking and partisanship. We build a dataset of 103 online, mostly mainstream news websites. With the help of two experts, alongside data from the Media Ownership Monitor of the Reporters without Borders, we label these websites according to their partisanship (Left, Right, or Centre). We study and compare user tracking on these sites with different metrics: numbers of cookies, cookie synchronizations, device fingerprinting, and invisible pixel-based tracking. We find that Left and Centre websites serve more cookies than Right-leaning websites. However, through cookie synchronization, more user IDs are synchronized in Left websites than Right or Centre. Canvas fingerprinting is used similarly by Left and Right, and less by Centre. Invisible pixel-based tracking is 50% more intense in Centre-leaning websites than Right, and 25% more than Left. Desktop versions of news websites deliver more cookies than their mobile counterparts. A handful of third-parties are tracking users in most websites in this study. This paper, by demonstrating intense web tracking, has implications for research on overall privacy of users visiting partisan news websites in India.
△ Less
Submitted 8 March, 2021; v1 submitted 6 February, 2021;
originally announced February 2021.