Synopsis: Secure and private trend inference from encrypted semantic embeddings
Authors:
Madelyne Xiao,
Palak Jain,
Micha Gorelick,
Sarah Scheffler
Abstract:
WhatsApp and many other commonly used communication platforms guarantee end-to-end encryption (E2EE), which requires that service providers lack the cryptographic keys to read communications on their own platforms. WhatsApp's privacy-preserving design makes it difficult to study important phenomena like the spread of misinformation or political messaging, as users have a clear expectation and desi…
▽ More
WhatsApp and many other commonly used communication platforms guarantee end-to-end encryption (E2EE), which requires that service providers lack the cryptographic keys to read communications on their own platforms. WhatsApp's privacy-preserving design makes it difficult to study important phenomena like the spread of misinformation or political messaging, as users have a clear expectation and desire for privacy and little incentive to forfeit that privacy in the process of handing over raw data to researchers, journalists, or other parties.
We introduce Synopsis, a secure architecture for analyzing messaging trends in consensually-donated E2EE messages using message embeddings. Since the goal of this system is investigative journalism workflows, Synopsis must facilitate both exploratory and targeted analyses -- a challenge for systems using differential privacy (DP), and, for different reasons, a challenge for private computation approaches based on cryptography. To meet these challenges, we combine techniques from the local and central DP models and wrap the system in malicious-secure multi-party computation to ensure the DP query architecture is the only way to access messages, preventing any party from directly viewing stored message embeddings.
Evaluations on a dataset of Hindi-language WhatsApp messages (34,024 messages represented as 500-dimensional embeddings) demonstrate the efficiency and accuracy of our approach. Queries on this data run in about 30 seconds, and the accuracy of the fine-grained interface exceeds 94% on benchmark tasks.
△ Less
Submitted 29 May, 2025;
originally announced May 2025.
The Authority of "Fair" in Machine Learning
Authors:
Michael Skirpan,
Micha Gorelick
Abstract:
In this paper, we argue for the adoption of a normative definition of fairness within the machine learning community. After characterizing this definition, we review the current literature of Fair ML in light of its implications. We end by suggesting ways to incorporate a broader community and generate further debate around how to decide what is fair in ML.
In this paper, we argue for the adoption of a normative definition of fairness within the machine learning community. After characterizing this definition, we review the current literature of Fair ML in light of its implications. We end by suggesting ways to incorporate a broader community and generate further debate around how to decide what is fair in ML.
△ Less
Submitted 7 July, 2017; v1 submitted 29 June, 2017;
originally announced June 2017.