Showing 1–34 of 34 results for author: de Montjoye, Y

  1. arXiv:2506.20481

    cs.LG cs.AI cs.CL cs.CR

    Counterfactual Influence as a Distributional Quantity

    Authors: Matthieu Meeus, Igor Shilov, Georgios Kaissis, Yves-Alexandre de Montjoye

    Abstract: Machine learning models are known to memorize samples from their training data, raising concerns around privacy and generalization. Counterfactual self-influence is a popular metric to study memorization, quantifying how the model's prediction for a sample changes depending on the sample's inclusion in the training dataset. However, recent work has shown memorization to be affected by factors beyo…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Workshop on The Impact of Memorization on Trustworthy Foundation Models (MemFM) @ ICML 2025

  2. arXiv:2505.18773

    cs.CR cs.AI cs.LG

    Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models

    Authors: Jamie Hayes, Ilia Shumailov, Christopher A. Choquette-Choo, Matthew Jagielski, George Kaissis, Katherine Lee, Milad Nasr, Sahra Ghalebikesabi, Niloofar Mireshghallah, Meenatchi Sundaram Mutu Selva Annamalai, Igor Shilov, Matthieu Meeus, Yves-Alexandre de Montjoye, Franziska Boenisch, Adam Dziedzic, A. Feder Cooper

    Abstract: State-of-the-art membership inference attacks (MIAs) typically require training many reference models, making it difficult to scale these attacks to large pre-trained language models (LLMs). As a result, prior research has either relied on weaker attacks that avoid training reference models (e.g., fine-tuning attacks), or on stronger attacks applied to small-scale models and datasets. However, wea…

    Submitted 24 May, 2025; originally announced May 2025.

  3. arXiv:2505.15738

    cs.CR cs.AI cs.CL cs.LG

    Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses

    Authors: Xiaoxue Yang, Bozhidar Stevanoski, Matthieu Meeus, Yves-Alexandre de Montjoye

    Abstract: Large language models (LLMs) are rapidly deployed in real-world applications ranging from chatbots to agentic systems. Alignment is one of the main approaches used to defend against attacks such as prompt injection and jailbreaks. Recent defenses report near-zero Attack Success Rates (ASR) even against Greedy Coordinate Gradient (GCG), a white-box attack that generates adversarial suffixes to indu…

    Submitted 21 May, 2025; originally announced May 2025.

  4. arXiv:2505.01524

    cs.CR cs.AI cs.LG

    The DCR Delusion: Measuring the Privacy Risk of Synthetic Data

    Authors: Zexi Yao, Nataša Krčo, Georgi Ganev, Yves-Alexandre de Montjoye

    Abstract: Synthetic data has become an increasingly popular way to share data without revealing sensitive information. Though Membership Inference Attacks (MIAs) are widely considered the gold standard for empirically assessing the privacy of a synthetic dataset, practitioners and researchers often rely on simpler proxy metrics such as Distance to Closest Record (DCR). These metrics estimate privacy by meas…

    Submitted 2 May, 2025; originally announced May 2025.

  5. arXiv:2504.18497

    cs.CR cs.AI

    DeSIA: Attribute Inference Attacks Against Limited Fixed Aggregate Statistics

    Authors: Yifeng Mao, Bozhidar Stevanoski, Yves-Alexandre de Montjoye

    Abstract: Empirical inference attacks are a popular approach for evaluating the privacy risk of data release mechanisms in practice. While an active attack literature exists to evaluate machine learning models or synthetic data release, we currently lack comparable methods for fixed aggregate statistics, in particular when only a limited number of statistics are released. We here propose an inference attack…

    Submitted 25 April, 2025; originally announced April 2025.

  6. arXiv:2412.20456

    cs.CR

    Sub-optimal Learning in Meta-Classifier Attacks: A Study of Membership Inference on Differentially Private Location Aggregates

    Authors: Yuhan Liu, Florent Guepin, Igor Shilov, Yves-Alexandre De Montjoye

    Abstract: The widespread collection and sharing of location data, even in aggregated form, raises major privacy concerns. Previous studies used meta-classifier-based membership inference attacks (MIAs) with multi-layer perceptrons (MLPs) to estimate privacy risks in location data, including when protected by differential privacy (DP). In this work, however, we show that a significant gap exists between the…

    Submitted 29 December, 2024; originally announced December 2024.

  7. arXiv:2412.08549

    cs.LG cs.SD eess.AS

    Watermarking Training Data of Music Generation Models

    Authors: Pascal Epple, Igor Shilov, Bozhidar Stevanoski, Yves-Alexandre de Montjoye

    Abstract: Generative Artificial Intelligence (Gen-AI) models are increasingly used to produce content across domains, including text, images, and audio. While these models represent a major technical breakthrough, they gain their generative capabilities from being trained on enormous amounts of human-generated content, which often includes copyrighted material. In this work, we investigate whether audio wat…

    Submitted 12 December, 2024; v1 submitted 11 December, 2024; originally announced December 2024.

  8. arXiv:2411.05743

    cs.LG cs.CR

    Free Record-Level Privacy Risk Evaluation Through Artifact-Based Methods

    Authors: Joseph Pollock, Igor Shilov, Euodia Dodd, Yves-Alexandre de Montjoye

    Abstract: Membership inference attacks (MIAs) are widely used to empirically assess privacy risks in machine learning models, both providing model-level vulnerability metrics and identifying the most vulnerable training samples. State-of-the-art methods, however, require training hundreds of shadow models with the same architecture as the target model. This makes the computational cost of assessing the priv…

    Submitted 12 June, 2025; v1 submitted 8 November, 2024; originally announced November 2024.

  9. QueryCheetah: Fast Automated Discovery of Attribute Inference Attacks Against Query-Based Systems

    Authors: Bozhidar Stevanoski, Ana-Maria Cretu, Yves-Alexandre de Montjoye

    Abstract: Query-based systems (QBSs) are one of the key approaches for sharing data. QBSs allow analysts to request aggregate information from a private protected dataset. Attacks are a crucial part of ensuring QBSs are truly privacy-preserving. The development and testing of attacks is however very labor-intensive and unable to cope with the increasing complexity of systems. Automated approaches have been…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: This is an extended version of the ACM CCS paper which includes appendices

  10. arXiv:2406.18671

    cs.CR cs.LG

    A Zero Auxiliary Knowledge Membership Inference Attack on Aggregate Location Data

    Authors: Vincent Guan, Florent Guépin, Ana-Maria Cretu, Yves-Alexandre de Montjoye

    Abstract: Location data is frequently collected from populations and shared in aggregate form to guide policy and decision making. However, the prevalence of aggregated data also raises the privacy concern of membership inference attacks (MIAs). MIAs infer whether an individual's data contributed to the aggregate release. Although effective MIAs have been developed for aggregate location data, these require…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: To be published in PETS 2024

  11. arXiv:2406.17975

    cs.CL cs.CR cs.LG

    SoK: Membership Inference Attacks on LLMs are Rushing Nowhere (and How to Fix It)

    Authors: Matthieu Meeus, Igor Shilov, Shubham Jain, Manuel Faysse, Marek Rei, Yves-Alexandre de Montjoye

    Abstract: Whether LLMs memorize their training data and what this means, from measuring privacy leakage to detecting copyright violations, has become a rapidly growing area of research. In the last few months, more than 10 new methods have been proposed to perform Membership Inference Attacks (MIAs) against LLMs. Contrary to traditional MIAs which rely on fixed (but randomized) records or models, these method…

    Submitted 7 March, 2025; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: IEEE Conference on Secure and Trustworthy Machine Learning (SaTML 2025)

  12. arXiv:2406.13433

    cs.LG cs.AI

    Certification for Differentially Private Prediction in Gradient-Based Training

    Authors: Matthew Wicker, Philip Sosnin, Igor Shilov, Adrianna Janik, Mark N. Müller, Yves-Alexandre de Montjoye, Adrian Weller, Calvin Tsay

    Abstract: We study private prediction where differential privacy is achieved by adding noise to the outputs of a non-private model. Existing methods rely on noise proportional to the global sensitivity of the model, often resulting in sub-optimal privacy-utility trade-offs compared to private training. We introduce a novel approach for computing dataset-specific upper bounds on prediction sensitivity by lev…

    Submitted 6 June, 2025; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: ICML 2025. 20 pages, 9 figures

  13. arXiv:2405.15523

    cs.CL cs.LG

    The Mosaic Memory of Large Language Models

    Authors: Igor Shilov, Matthieu Meeus, Yves-Alexandre de Montjoye

    Abstract: As Large Language Models (LLMs) become widely adopted, understanding how they learn from, and memorize, training data becomes crucial. Memorization in LLMs is widely assumed to only occur as a result of sequences being repeated in the training data. Instead, we show that LLMs memorize by assembling information from similar sequences, a phenomenon we call mosaic memory. We show major LLMs to exhibit…

    Submitted 15 May, 2025; v1 submitted 24 May, 2024; originally announced May 2024.

  14. arXiv:2405.15423

    cs.LG cs.CR

    Lost in the Averages: A New Specific Setup to Evaluate Membership Inference Attacks Against Machine Learning Models

    Authors: Florent Guépin, Nataša Krčo, Matthieu Meeus, Yves-Alexandre de Montjoye

    Abstract: Membership Inference Attacks (MIAs) are widely used to evaluate the propensity of a machine learning (ML) model to memorize an individual record and the privacy risk releasing the model poses. MIAs are commonly evaluated similarly to ML models: the MIA is performed on a test set of models trained on datasets unseen during training, which are sampled from a larger pool, $D_{eval}$. The MIA is evalu…

    Submitted 24 May, 2024; originally announced May 2024.

  15. Re-pseudonymization Strategies for Smart Meter Data Are Not Robust to Deep Learning Profiling Attacks

    Authors: Ana-Maria Cretu, Miruna Rusu, Yves-Alexandre de Montjoye

    Abstract: Smart meters, devices measuring the electricity and gas consumption of a household, are currently being deployed at a fast rate throughout the world. The data they collect are extremely useful, including in the fight against climate change. However, these data and the information that can be inferred from them are highly sensitive. Re-pseudonymization, i.e., the frequent replacement of random iden…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: Extended version, including the Appendix, of a paper with the same title which will appear in the Proceedings of the Fourteenth ACM Conference on Data and Application Security and Privacy (CODASPY '24). The first two authors contributed equally

  16. arXiv:2402.09363

    cs.CL cs.CR

    Copyright Traps for Large Language Models

    Authors: Matthieu Meeus, Igor Shilov, Manuel Faysse, Yves-Alexandre de Montjoye

    Abstract: Questions of fair use of copyright-protected content to train Large Language Models (LLMs) are being actively debated. Document-level inference has been proposed as a new task: inferring from black-box access to the trained model whether a piece of content has been seen during training. SOTA methods however rely on naturally occurring memorization of (part of) the content. While very effective aga…

    Submitted 4 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: 41st International Conference on Machine Learning (ICML 2024)

  17. arXiv:2310.15007

    cs.CL cs.CR cs.LG

    Did the Neurons Read your Book? Document-level Membership Inference for Large Language Models

    Authors: Matthieu Meeus, Shubham Jain, Marek Rei, Yves-Alexandre de Montjoye

    Abstract: With large language models (LLMs) poised to become embedded in our daily lives, questions are starting to be raised about the data they learned from. These questions range from potential bias or misinformation LLMs could retain from their training data to questions of copyright and fair use of human-generated text. However, while these questions emerge, developers of the recent state-of-the-art LL…

    Submitted 15 July, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted at 33rd USENIX Security Symposium (USENIX Security 2024)

  18. arXiv:2307.01701

    cs.CR cs.AI

    Synthetic is all you need: removing the auxiliary data assumption for membership inference attacks against synthetic data

    Authors: Florent Guépin, Matthieu Meeus, Ana-Maria Cretu, Yves-Alexandre de Montjoye

    Abstract: Synthetic data is emerging as one of the most promising solutions to share individual-level data while safeguarding privacy. While membership inference attacks (MIAs), based on shadow modeling, have become the standard to evaluate the privacy of synthetic data, they currently assume the attacker to have access to an auxiliary dataset sampled from a similar distribution as the training dataset. Thi…

    Submitted 21 September, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

    Journal ref: ESORICS 2023 workshop Data Privacy Management (DPM) 2023

  19. Deep perceptual hashing algorithms with hidden dual purpose: when client-side scanning does facial recognition

    Authors: Shubham Jain, Ana-Maria Cretu, Antoine Cully, Yves-Alexandre de Montjoye

    Abstract: End-to-end encryption (E2EE) provides strong technical protections to individuals from interferences. Governments and law enforcement agencies around the world have however raised concerns that E2EE also allows illegal content to be shared undetected. Client-side scanning (CSS), using perceptual hashing (PH) to detect known illegal content before it is shared, is seen as a promising solution to pr…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: Published at IEEE S&P 2023

    Journal ref: 2023 IEEE Symposium on Security and Privacy (SP), 234-252

  20. Achilles' Heels: Vulnerable Record Identification in Synthetic Data Publishing

    Authors: Matthieu Meeus, Florent Guépin, Ana-Maria Cretu, Yves-Alexandre de Montjoye

    Abstract: Synthetic data is seen as the most promising solution to share individual-level data while preserving privacy. Shadow modeling-based Membership Inference Attacks (MIAs) have become the standard approach to evaluate the privacy risk of synthetic data. While very effective, they require a large number of datasets to be created and models trained to evaluate the risk posed by a single record. The pri…

    Submitted 21 September, 2023; v1 submitted 17 June, 2023; originally announced June 2023.

    Journal ref: Computer Security ESORICS 2023

  21. arXiv:2306.05093

    cs.CR cs.LG

    Investigating the Effect of Misalignment on Membership Privacy in the White-box Setting

    Authors: Ana-Maria Cretu, Daniel Jones, Yves-Alexandre de Montjoye, Shruti Tople

    Abstract: Machine learning models have been shown to leak sensitive information about their training datasets. Models are increasingly deployed on devices, raising concerns that white-box access to the model parameters increases the attack surface compared to black-box access which only provides query access. Directly extending the shadow modelling technique from the black-box to the white-box setting has b…

    Submitted 12 March, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: To appear in the Proceedings on Privacy Enhancing Technologies (PoPETs 2024)

  22. arXiv:2304.07134

    cs.CR

    Pool Inference Attacks on Local Differential Privacy: Quantifying the Privacy Guarantees of Apple's Count Mean Sketch in Practice

    Authors: Andrea Gadotti, Florimond Houssiau, Meenatchi Sundaram Muthu Selva Annamalai, Yves-Alexandre de Montjoye

    Abstract: Behavioral data generated by users' devices, ranging from emoji use to pages visited, are collected at scale to improve apps and services. These data, however, contain fine-grained records and can reveal sensitive information about individual users. Local differential privacy has been used by companies as a solution to collect data from users while preserving privacy. We here first introduce pool…

    Submitted 14 April, 2023; originally announced April 2023.

    Comments: Published at USENIX Security 2022. This is the full version, please cite the USENIX version (see journal reference field)

    Journal ref: USENIX Security 22 (2022)

  23. arXiv:2211.14062

    cs.CR cs.LG

    M$^2$M: A general method to perform various data analysis tasks from a differentially private sketch

    Authors: Florimond Houssiau, Vincent Schellekens, Antoine Chatalic, Shreyas Kumar Annamraju, Yves-Alexandre de Montjoye

    Abstract: Differential privacy is the standard privacy definition for performing analyses over sensitive data. Yet, its privacy budget bounds the number of tasks an analyst can perform with reasonable accuracy, which makes it challenging to deploy in practice. This can be alleviated by private sketching, where the dataset is compressed into a single noisy sketch vector which can be shared with the analysts…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: Published at the 18th International Workshop on Security and Trust Management (STM 2022)

  24. QuerySnout: Automating the Discovery of Attribute Inference Attacks against Query-Based Systems

    Authors: Ana-Maria Cretu, Florimond Houssiau, Antoine Cully, Yves-Alexandre de Montjoye

    Abstract: Although query-based systems (QBS) have become one of the main solutions to share data anonymously, building QBSes that robustly protect the privacy of individuals contributing to the dataset is a hard problem. Theoretical solutions relying on differential privacy guarantees are difficult to implement correctly with reasonable accuracy, while ad-hoc solutions might contain unknown vulnerabilities.…

    Submitted 9 November, 2022; originally announced November 2022.

    Comments: Published at the ACM CCS 2022 conference. This is an extended version that includes the Appendix

  25. Correlation inference attacks against machine learning models

    Authors: Ana-Maria Creţu, Florent Guépin, Yves-Alexandre de Montjoye

    Abstract: Despite machine learning models being widely used today, the relationship between a model and its training dataset is not well understood. We explore correlation inference attacks, whether and when a model leaks information about the correlations between the input variables of its training dataset. We first propose a model-less attack, where an adversary exploits the spherical parametrization of c…

    Submitted 18 July, 2024; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Published in Science Advances. This version contains both the main paper and supplementary material. There are minor editorial differences between this version and the published version. The first two authors contributed equally

    Journal ref: Science Advances, Volume 10, Issue 28, 2024

  26. arXiv:2106.09820

    cs.CR

    Adversarial Detection Avoidance Attacks: Evaluating the robustness of perceptual hashing-based client-side scanning

    Authors: Shubham Jain, Ana-Maria Cretu, Yves-Alexandre de Montjoye

    Abstract: End-to-end encryption (E2EE) by messaging platforms enables people to securely and privately communicate with one another. Its widespread adoption however raised concerns that illegal content might now be shared undetected. Following the global pushback against key escrow systems, client-side scanning based on perceptual hashing has been recently proposed by tech companies, governments and research…

    Submitted 2 August, 2022; v1 submitted 17 June, 2021; originally announced June 2021.

    Comments: This is a revised version of the paper published at USENIX Security 2022. We now use a semi-automated procedure to remove duplicates from the ImageNet dataset

    Journal ref: 31st USENIX Security Symposium (USENIX Security 22), 2022

  27. arXiv:1808.00160

    cs.CY cs.CR econ.GN

    Mapping the Privacy-Utility Tradeoff in Mobile Phone Data for Development

    Authors: Alejandro Noriega-Campero, Alex Rutherford, Oren Lederman, Yves A. de Montjoye, Alex Pentland

    Abstract: Today's age of data holds high potential to enhance the way we pursue and monitor progress in the fields of development and humanitarian action. We study the relation between data utility and privacy risk in large-scale behavioral data, focusing on mobile phone metadata as a paradigmatic domain. To measure utility, we survey experts about the value of mobile phone metadata at various spatial and tem…

    Submitted 1 August, 2018; originally announced August 2018.

  28. arXiv:1807.00523

    cs.CY

    Data for Refugees: The D4R Challenge on Mobility of Syrian Refugees in Turkey

    Authors: Albert Ali Salah, Alex Pentland, Bruno Lepri, Emmanuel Letouze, Patrick Vinck, Yves-Alexandre de Montjoye, Xiaowen Dong, Ozge Dagdelen

    Abstract: The Data for Refugees (D4R) Challenge is a non-profit challenge initiated to improve the conditions of the Syrian refugees in Turkey by providing a special database to the scientific community for enabling research on urgent problems concerning refugees, including health, education, unemployment, safety, and social integration. The collected database is based on anonymised mobile Call Detail Record (C…

    Submitted 14 October, 2018; v1 submitted 2 July, 2018; originally announced July 2018.

    Comments: See http://d4r.turktelekom.com.tr/ for more information on the D4R Challenge

  29. arXiv:1804.06752

    cs.CR

    When the signal is in the noise: Exploiting Diffix's Sticky Noise

    Authors: Andrea Gadotti, Florimond Houssiau, Luc Rocher, Benjamin Livshits, Yves-Alexandre de Montjoye

    Abstract: Anonymized data is highly valuable to both businesses and researchers. A large body of research has however shown the strong limits of the de-identification release-and-forget model, where data is anonymized and shared. This has led to the development of privacy-preserving query-based systems. Based on the idea of "sticky noise", Diffix has been recently proposed as a novel query-based mechanism s…

    Submitted 29 October, 2019; v1 submitted 18 April, 2018; originally announced April 2018.

    Journal ref: USENIX Security Conference Proceedings. 2019

  30. arXiv:1803.09007

    cs.CY cs.CR

    Detrimental Network Effects in Privacy: A Graph-theoretic Model for Node-based Intrusions

    Authors: Florimond Houssiau, Piotr Sapiezynski, Laura Radaelli, Erez Shmueli, Yves-Alexandre de Montjoye

    Abstract: Despite proportionality being one of the tenets of data protection laws, we currently lack a robust analytical framework to evaluate the reach of modern data collections and the network effects at play. We here propose a graph-theoretic model and notions of node- and edge-observability to quantify the reach of networked data collections. We first prove closed-form expressions for our metrics and q…

    Submitted 15 March, 2023; v1 submitted 23 March, 2018; originally announced March 2018.

    Comments: Published in Cell Patterns 4.1 (2023): 100662 at https://www.sciencedirect.com/science/article/pii/S2666389922003026

    Journal ref: Patterns 4.1 (2023): 100662

  31. Towards matching user mobility traces in large-scale datasets

    Authors: Dániel Kondor, Behrooz Hashemian, Yves-Alexandre de Montjoye, Carlo Ratti

    Abstract: The problem of unicity and reidentifiability of records in large-scale databases has been studied in different contexts and approaches, with focus on preserving privacy or matching records from different data sources. With an increasing number of service providers nowadays routinely collecting location traces of their users on unprecedented scales, there is a pronounced interest in the possibility…

    Submitted 13 August, 2018; v1 submitted 18 September, 2017; originally announced September 2017.

    Journal ref: IEEE Transactions on Big Data, 2018

  32. Privacy by design in big data: An overview of privacy enhancing technologies in the era of big data analytics

    Authors: Giuseppe D'Acquisto, Josep Domingo-Ferrer, Panayiotis Kikiras, Vicenç Torra, Yves-Alexandre de Montjoye, Athena Bourka

    Abstract: The extensive collection and processing of personal information in big data analytics has given rise to serious privacy concerns, related to wide scale electronic surveillance, profiling, and disclosure of private data. To reap the benefits of analytics without invading the individuals' private sphere, it is essential to draw the limits of big data processing and integrate data protection safeguar…

    Submitted 18 December, 2015; originally announced December 2015.

    Comments: 80 pages. European Union Agency for Network and Information Security (ENISA) report, December 2015, ISBN 978-92-9204-160-1. https://www.enisa.europa.eu/activities/identity-and-trust/library/deliverables/big-data-protection/

    MSC Class: 94A60 ACM Class: K.4.1; D.4.6; H.2.0

  33. arXiv:1511.06660

    cs.LG

    Modeling the Temporal Nature of Human Behavior for Demographics Prediction

    Authors: Bjarke Felbo, Pål Sundsøy, Alex 'Sandy' Pentland, Sune Lehmann, Yves-Alexandre de Montjoye

    Abstract: Mobile phone metadata is increasingly used for humanitarian purposes in developing countries as traditional data is scarce. Basic demographic information is however often absent from mobile phone datasets, limiting the operational impact of the datasets. For these reasons, there has been a growing interest in predicting demographic information from mobile phone metadata. Previous work focused on c…

    Submitted 15 November, 2017; v1 submitted 20 November, 2015; originally announced November 2015.

    Comments: Accepted at ECML 2017. A previous version of this paper was titled 'Using Deep Learning to Predict Demographics from Mobile Phone Metadata' and was accepted at the ICLR 2016 workshop

  34. arXiv:1407.4885

    cs.CY cs.SI physics.soc-ph

    D4D-Senegal: The Second Mobile Phone Data for Development Challenge

    Authors: Yves-Alexandre de Montjoye, Zbigniew Smoreda, Romain Trinquart, Cezary Ziemlicki, Vincent D. Blondel

    Abstract: The D4D-Senegal challenge is an open innovation data challenge on anonymous call patterns of Orange's mobile phone users in Senegal. The goal of the challenge is to help address society development questions in novel ways by contributing to the socio-economic development and well-being of the Senegalese population. Participants to the challenge are given access to three mobile phone datasets. This…

    Submitted 30 July, 2014; v1 submitted 18 July, 2014; originally announced July 2014.