$(\varepsilon, \delta)$ Considered Harmful: Best Practices for Reporting Differential Privacy Guarantees

Authors: Juan Felipe Gomez, Bogdan Kulynych, Georgios Kaissis, Jamie Hayes, Borja Balle, Antti Honkela

Abstract: Current practices for reporting the level of differential privacy (DP) guarantees for machine learning (ML) algorithms provide an incomplete and potentially misleading picture of the guarantees and make it difficult to compare privacy levels across different settings. We argue for using Gaussian differential privacy (GDP) as the primary means of communicating DP guarantees in ML, with the full privacy profile as a secondary option in case GDP is too inaccurate. Unlike other widely used alternatives, GDP has only one parameter, which ensures easy comparability of guarantees, and it can accurately capture the full privacy profile of many important ML applications. To support our claims, we investigate the privacy profiles of state-of-the-art DP large-scale image classification, and the TopDown algorithm for the U.S. Decennial Census, observing that GDP fits the profiles remarkably well in all three cases. Although GDP is ideal for reporting the final guarantees, other formalisms (e.g., privacy loss random variables) are needed for accurate privacy accounting. We show that such intermediate representations can be efficiently converted to GDP with minimal loss in tightness.

Submitted 13 March, 2025; originally announced March 2025.
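
The abstract above argues that a single GDP parameter can stand in for a full privacy profile. A minimal sketch of that idea, assuming the standard closed-form conversion from $\mu$-GDP to the curve $\delta(\varepsilon)$ known from the GDP literature, is given below. The function names are hypothetical and are not taken from the paper or its code.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, written with math.erf to avoid extra dependencies."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gdp_delta(mu: float, eps: float) -> float:
    """delta(eps) implied by a mu-GDP guarantee (illustrative helper).

    Uses delta(eps) = Phi(-eps/mu + mu/2) - exp(eps) * Phi(-eps/mu - mu/2).
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    return normal_cdf(-eps / mu + mu / 2.0) - math.exp(eps) * normal_cdf(
        -eps / mu - mu / 2.0
    )


if __name__ == "__main__":
    # One value of mu yields the whole (eps, delta) curve, not a single point.
    for eps in (0.0, 1.0, 2.0, 4.0):
        print(f"mu=1.0, eps={eps:.1f} -> delta={gdp_delta(1.0, eps):.6f}")
```

Reporting only one $(\varepsilon, \delta)$ pair corresponds to reading off a single point of this curve, whereas $\mu$ determines every point at once.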

arXiv:2501.10366 [pdf, other]

Participatory Assessment of Large Language Model Applications in an Academic Medical Center

Authors: Giorgia Carra, Bogdan Kulynych, François Bastardot, Daniel E. Kaufmann, Noémie Boillat-Blanco, Jean Louis Raisaro

Abstract: Although Large Language Models (LLMs) have shown promising performance in healthcare-related applications, their deployment in the medical domain poses unique challenges of ethical, regulatory, and technical nature. In this study, we employ a systematic participatory approach to investigate the needs and expectations regarding clinical applications of LLMs at Lausanne University Hospital, an academic medical center in Switzerland. Having identified potential LLM use-cases in collaboration with thirty stakeholders, including clinical staff across 11 departments as well as nursing and patient representatives, we assess the current feasibility of these use-cases taking into account the regulatory frameworks, data protection regulation, bias, hallucinations, and deployment constraints. This study provides a framework for a participatory approach to identifying institutional needs with respect to introducing advanced technologies into healthcare practice, and a realistic analysis of the technology readiness level of LLMs for medical applications, highlighting the issues that would need to be overcome for LLMs in healthcare to be ethical and regulatory compliant.

Submitted 9 December, 2024; originally announced January 2025.

Comments: NeurIPS GenAI for Health Workshop

arXiv:2407.02191 [pdf, other]

Attack-Aware Noise Calibration for Differential Privacy

Authors: Bogdan Kulynych, Juan Felipe Gomez, Georgios Kaissis, Flavio du Pin Calmon, Carmela Troncoso

Abstract: Differential privacy (DP) is a widely used approach for mitigating privacy risks when training machine learning models on sensitive data. DP mechanisms add noise during training to limit the risk of information leakage. The scale of the added noise is critical, as it determines the trade-off between privacy and utility. The standard practice is to select the noise scale to satisfy a given privacy budget $\varepsilon$. This privacy budget is in turn interpreted in terms of operational attack risks, such as accuracy, sensitivity, and specificity of inference attacks aimed to recover information about the training data records. We show that first calibrating the noise scale to a privacy budget $\varepsilon$, and then translating $\varepsilon$ to attack risk leads to overly conservative risk assessments and unnecessarily low utility. Instead, we propose methods to directly calibrate the noise scale to a desired attack risk level, bypassing the step of choosing $\varepsilon$. For a given notion of attack risk, our approach significantly decreases noise scale, leading to increased utility at the same level of privacy. We empirically demonstrate that calibrating noise to attack sensitivity/specificity, rather than $\varepsilon$, when training privacy-preserving ML models substantially improves model accuracy for the same risk level. Our work provides a principled and practical way to improve the utility of privacy-preserving ML without compromising on privacy. The code is available at https://github.com/Felipe-Gomez/riskcal

Submitted 7 November, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

Comments: Appears in NeurIPS 2024
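
To make the idea of calibrating noise directly to attack risk concrete, here is a minimal sketch for the simplest possible case: a single release through the Gaussian mechanism, which with sensitivity $\Delta$ and noise standard deviation $\sigma$ satisfies $\mu$-GDP with $\mu = \Delta / \sigma$. This is an assumption made for illustration only; it does not reproduce the paper's method for DP-SGD and it does not use the API of the riskcal library. All function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def max_attack_tpr(mu: float, fpr: float) -> float:
    """Highest true-positive rate any attack can reach at a given
    false-positive rate against a mu-GDP mechanism."""
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(fpr) + mu)


def sigma_for_attack_risk(
    target_tpr: float, fpr: float, sensitivity: float = 1.0
) -> float:
    """Smallest Gaussian noise scale that keeps attack TPR <= target_tpr
    at the given FPR (single Gaussian release, illustrative only)."""
    if not 0.0 < fpr < target_tpr < 1.0:
        raise ValueError("need 0 < fpr < target_tpr < 1")
    # Invert max_attack_tpr for mu, then use mu = sensitivity / sigma.
    mu = _STD_NORMAL.inv_cdf(target_tpr) - _STD_NORMAL.inv_cdf(fpr)
    return sensitivity / mu


if __name__ == "__main__":
    sigma = sigma_for_attack_risk(target_tpr=0.5, fpr=0.1)
    print(f"sigma = {sigma:.4f}")
    print(f"check: max TPR at FPR 0.1 = {max_attack_tpr(1.0 / sigma, 0.1):.4f}")
```

The noise scale here is chosen from the target sensitivity/specificity alone, without first fixing an $\varepsilon$, which is the step the abstract proposes to bypass.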

arXiv:2402.12235 [pdf, other]

The Fundamental Limits of Least-Privilege Learning

Authors: Theresa Stadler, Bogdan Kulynych, Michael C. Gastpar, Nicolas Papernot, Carmela Troncoso

Abstract: The promise of least-privilege learning -- to find feature representations that are useful for a learning task but prevent inference of any sensitive information unrelated to this task -- is highly appealing. However, so far this concept has only been stated informally. It thus remains an open question whether and how we can achieve this goal. In this work, we provide the first formalisation of the least-privilege principle for machine learning and characterise its feasibility. We prove that there is a fundamental trade-off between a representation's utility for a given task and its leakage beyond the intended task: it is not possible to learn representations that have high utility for the intended task but, at the same time, prevent inference of any attribute other than the task label itself. This trade-off holds under realistic assumptions on the data distribution and regardless of the technique used to learn the feature mappings that produce these representations. We empirically validate this result for a wide range of learning techniques, model architectures, and datasets.

Submitted 26 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2308.12820 [pdf, other]

Prediction without Preclusion: Recourse Verification with Reachable Sets

Authors: Avni Kothari, Bogdan Kulynych, Tsui-Wei Weng, Berk Ustun

Abstract: Machine learning models are often used to decide who receives a loan, a job interview, or a public benefit. Models in such settings use features without considering their actionability. As a result, they can assign predictions that are fixed, meaning that individuals who are denied loans and interviews are, in fact, precluded from access to credit and employment. In this work, we introduce a procedure called recourse verification to test if a model assigns fixed predictions to its decision subjects. We propose a model-agnostic approach for recourse verification with reachable sets, i.e., the set of all points that a person can reach through their actions in feature space. We develop methods to construct reachable sets for discrete feature spaces, which can certify the responsiveness of any model by simply querying its predictions. We conduct a comprehensive empirical study on the infeasibility of recourse on datasets from consumer finance. Our results highlight how models can inadvertently preclude access by assigning fixed predictions and underscore the need to account for actionability in model development.

Submitted 1 May, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: ICLR 2024 Spotlight. The first two authors contributed equally

arXiv:2302.14517 [pdf, other]

doi 10.1145/3593013.3594103

Arbitrary Decisions are a Hidden Cost of Differentially Private Training

Authors: Bogdan Kulynych, Hsiang Hsu, Carmela Troncoso, Flavio P. Calmon

Abstract: Mechanisms used in privacy-preserving machine learning often aim to guarantee differential privacy (DP) during model training. Practical DP-ensuring training methods use randomization when fitting model parameters to privacy-sensitive data (e.g., adding Gaussian noise to clipped gradients). We demonstrate that such randomization incurs predictive multiplicity: for a given input example, the output predicted by equally-private models depends on the randomness used in training. Thus, for a given input, the predicted output can vary drastically if a model is re-trained, even if the same training dataset is used. The predictive-multiplicity cost of DP training has not been studied, and is currently neither audited for nor communicated to model designers and stakeholders. We derive a bound on the number of re-trainings required to estimate predictive multiplicity reliably. We analyze--both theoretically and through extensive experiments--the predictive-multiplicity cost of three DP-ensuring algorithms: output perturbation, objective perturbation, and DP-SGD. We demonstrate that the degree of predictive multiplicity rises as the level of privacy increases, and is unevenly distributed across individuals and demographic groups in the data. Because randomness used to ensure DP during training explains predictions for some examples, our results highlight a fundamental challenge to the justifiability of decisions supported by differentially private models in high-stakes settings. We conclude that practitioners should audit the predictive multiplicity of their DP-ensuring algorithms before deploying them in applications of individual-level consequence.

Submitted 15 May, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

Comments: To appear in ACM FAccT 2023

arXiv:2208.13058 [pdf, other]

Adversarial Robustness for Tabular Data through Cost and Utility Awareness

Authors: Klim Kireev, Bogdan Kulynych, Carmela Troncoso

Abstract: Many safety-critical applications of machine learning, such as fraud or abuse detection, use data in tabular domains. Adversarial examples can be particularly damaging for these applications. Yet, existing works on adversarial robustness primarily focus on machine-learning models in image and text domains. We argue that, due to the differences between tabular data and images or text, existing threat models are not suitable for tabular domains. These models do not capture that the costs of an attack could be more significant than imperceptibility, or that the adversary could assign different values to the utility obtained from deploying different adversarial examples. We demonstrate that, due to these differences, the attack and defense methods used for images and text cannot be directly applied to tabular settings. We address these issues by proposing new cost and utility-aware threat models that are tailored to the adversarial capabilities and constraints of attackers targeting tabular domains. We introduce a framework that enables us to design attack and defense mechanisms that result in models protected against cost and utility-aware adversaries, for example, adversaries constrained by a certain financial budget. We show that our approach is effective on three datasets corresponding to applications for which adversarial examples can have economic and social implications.

Submitted 24 February, 2023; v1 submitted 27 August, 2022; originally announced August 2022.

Comments: The first two authors contributed equally. To appear in the proceedings of NDSS 2023

arXiv:2204.03230 [pdf, other]

What You See is What You Get: Principled Deep Learning via Distributional Generalization

Authors: Bogdan Kulynych, Yao-Yuan Yang, Yaodong Yu, Jarosław Błasiok, Preetum Nakkiran

Abstract: Having similar behavior at training time and test time -- what we call a "What You See Is What You Get" (WYSIWYG) property -- is desirable in machine learning. Models trained with standard stochastic gradient descent (SGD), however, do not necessarily have this property, as their complex behaviors such as robustness or subgroup performance can differ drastically between training and test time. In contrast, we show that Differentially-Private (DP) training provably ensures the high-level WYSIWYG property, which we quantify using a notion of distributional generalization. Applying this connection, we introduce new conceptual tools for designing deep-learning methods by reducing generalization concerns to optimization ones: to mitigate unwanted behavior at test time, it is provably sufficient to mitigate this behavior on the training data. By applying this novel design principle, which bypasses "pathologies" of SGD, we construct simple algorithms that are competitive with SOTA in several distributional-robustness applications, significantly improve the privacy vs. disparate impact trade-off of DP-SGD, and mitigate robust overfitting in adversarial training. Finally, we also improve on theoretical bounds relating DP, stability, and distributional generalization.

Submitted 17 October, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

Comments: First two authors contributed equally. To appear in NeurIPS 2022

arXiv:2107.10302 [pdf, other]

Adversarial for Good? How the Adversarial ML Community's Values Impede Socially Beneficial Uses of Attacks

Authors: Kendra Albert, Maggie Delano, Bogdan Kulynych, Ram Shankar Siva Kumar

Abstract: Attacks from adversarial machine learning (ML) have the potential to be used "for good": they can be used to run counter to the existing power structures within ML, creating breathing space for those who would otherwise be the targets of surveillance and control. But most research on adversarial ML has not engaged in developing tools for resistance against ML systems. Why? In this paper, we review the broader impact statements that adversarial ML researchers wrote as part of their NeurIPS 2020 papers and assess the assumptions that authors have about the goals of their work. We also collect information about how authors view their work's impact more generally. We find that most adversarial ML researchers at NeurIPS hold two fundamental assumptions that will make it difficult for them to consider socially beneficial uses of attacks: (1) it is desirable to make systems robust, independent of context, and (2) attackers of systems are normatively bad and defenders of systems are normatively good. That is, despite their expressed and supposed neutrality, most adversarial ML researchers believe that the goal of their work is to secure systems, making it difficult to conceptualize and build tools for disrupting the status quo.

Submitted 15 September, 2021; v1 submitted 11 July, 2021; originally announced July 2021.

Comments: Author list is ordered alphabetically as there is equal contribution. 4 pages. Accepted by the ICML 2021 workshop on "A Blessing in Disguise: The Prospects and Perils of Adversarial Machine Learning"

arXiv:2107.01824 [pdf, other]

Exploring Data Pipelines through the Process Lens: a Reference Model for Computer Vision

Authors: Agathe Balayn, Bogdan Kulynych, Seda Guerses

Abstract: Researchers have identified datasets used for training computer vision (CV) models as an important source of hazardous outcomes, and continue to examine popular CV datasets to expose their harms. These works tend to treat datasets as objects, or focus on particular steps in data production pipelines. We argue here that we could further systematize our analysis of harms by examining CV data pipelines through a process-oriented lens that captures the creation, the evolution and use of these datasets. As a step towards cultivating a process-oriented lens, we embarked on an empirical study of CV data pipelines informed by the field of method engineering. We present here a preliminary result: a reference model of CV data pipelines. Besides exploring the questions that this endeavor raises, we discuss how the process lens could support researchers in discovering understudied issues, and could help practitioners in making their processes more transparent.

Submitted 5 July, 2021; originally announced July 2021.

Comments: Presented at the CVPR workshop 2021 Beyond Fair Computer Vision

arXiv:1911.02459 [pdf, other]

doi 10.1145/3338498.3358653

zksk: A Library for Composable Zero-Knowledge Proofs

Authors: Wouter Lueks, Bogdan Kulynych, Jules Fasquelle, Simon Le Bail-Collet, Carmela Troncoso

Abstract: Zero-knowledge proofs are an essential building block in many privacy-preserving systems. However, implementing these proofs is tedious and error-prone. In this paper, we present zksk, a well-documented Python library for defining and computing sigma protocols: the most popular class of zero-knowledge proofs. In zksk, proofs compose: programmers can convert smaller proofs into building blocks that then can be combined into bigger proofs. zksk features a modern Python-based domain-specific language. This makes it possible to define proofs without learning a new custom language, and to benefit from the rich Python syntax and ecosystem. The library is available at https://github.com/spring-epfl/zksk

Submitted 10 November, 2019; v1 submitted 6 November, 2019; originally announced November 2019.

Comments: Appears in 2019 Workshop on Privacy in the Electronic Society (WPES'19)

arXiv:1906.00389 [pdf, other]

Disparate Vulnerability to Membership Inference Attacks

Authors: Bogdan Kulynych, Mohammad Yaghini, Giovanni Cherubin, Michael Veale, Carmela Troncoso

Abstract: A membership inference attack (MIA) against a machine-learning model enables an attacker to determine whether a given data record was part of the model's training data or not. In this paper, we provide an in-depth study of the phenomenon of disparate vulnerability against MIAs: unequal success rate of MIAs against different population subgroups. We first establish necessary and sufficient conditions for MIAs to be prevented, both on average and for population subgroups, using a notion of distributional generalization. Second, we derive connections of disparate vulnerability to algorithmic fairness and to differential privacy. We show that fairness can only prevent disparate vulnerability against limited classes of adversaries. Differential privacy bounds disparate vulnerability but can significantly reduce the accuracy of the model. We show that estimating disparate vulnerability to MIAs by naïvely applying existing attacks can lead to overestimation. We then establish which attacks are suitable for estimating disparate vulnerability, and provide a statistical framework for doing so reliably. We conduct experiments on synthetic and real-world data, finding statistically significant evidence of disparate vulnerability in realistic settings. The code is available at https://github.com/spring-epfl/disparate-vulnerability

Submitted 16 September, 2021; v1 submitted 2 June, 2019; originally announced June 2019.

Comments: To appear in Privacy-Enhancing Technologies Symposium (PETS) 2022. This version has an updated authors list

arXiv:1811.11293 [pdf, other]

Questioning the assumptions behind fairness solutions

Authors: Rebekah Overdorf, Bogdan Kulynych, Ero Balsa, Carmela Troncoso, Seda Gürses

Abstract: In addition to their benefits, optimization systems can have negative economic, moral, social, and political effects on populations as well as their environments. Frameworks like fairness have been proposed to aid service providers in addressing subsequent bias and discrimination during data collection and algorithm design. However, recent reports of neglect, unresponsiveness, and malevolence cast doubt on whether service providers can effectively implement fairness solutions. These reports invite us to revisit assumptions made about the service providers in fairness solutions. Namely, that service providers have (i) the incentives or (ii) the means to mitigate optimization externalities. Moreover, the environmental impact of these systems suggests that we need (iii) novel frameworks that consider systems other than algorithmic decision-making and recommender systems, and (iv) solutions that go beyond removing related algorithmic biases. Going forward, we propose Protective Optimization Technologies that enable optimization subjects to defend against negative consequences of optimization systems.

Submitted 27 November, 2018; originally announced November 2018.

Comments: Presented at Critiquing and Correcting Trends in Machine Learning (NeurIPS 2018 Workshop), Montreal, Canada. This is a short version of arXiv:1806.02711

arXiv:1810.10939 [pdf, other]

Evading classifiers in discrete domains with provable optimality guarantees

Authors: Bogdan Kulynych, Jamie Hayes, Nikita Samarin, Carmela Troncoso

Abstract: Machine-learning models for security-critical applications such as bot, malware, or spam detection, operate in constrained discrete domains. These applications would benefit from having provable guarantees against adversarial examples. The existing literature on provable adversarial robustness of models, however, exclusively focuses on robustness to gradient-based attacks in domains such as images. These attacks model the adversarial cost, e.g., amount of distortion applied to an image, as a $p$-norm. We argue that this approach is not well-suited to model adversarial costs in constrained domains where not all examples are feasible. We introduce a graphical framework that (1) generalizes existing attacks in discrete domains, (2) can accommodate complex cost functions beyond $p$-norms, including financial cost incurred when attacking a classifier, and (3) efficiently produces valid adversarial examples with guarantees of minimal adversarial cost. These guarantees directly translate into a notion of adversarial robustness that takes into account domain constraints and the adversary's capabilities. We show how our framework can be used to evaluate security by crafting adversarial examples that evade a Twitter-bot detection classifier with provably minimal number of changes; and to build privacy defenses by crafting adversarial examples that evade a privacy-invasive website-fingerprinting classifier.

Submitted 1 July, 2019; v1 submitted 25 October, 2018; originally announced October 2018.

Comments: NeurIPS 2018 Workshop on Security in Machine Learning

arXiv:1806.02711 [pdf, other]

doi 10.1145/3351095.3372853

POTs: Protective Optimization Technologies

Authors: Bogdan Kulynych, Rebekah Overdorf, Carmela Troncoso, Seda Gürses

Abstract: Algorithmic fairness aims to address the economic, moral, social, and political impact that digital systems have on populations through solutions that can be applied by service providers. Fairness frameworks do so, in part, by mapping these problems to a narrow definition and assuming the service providers can be trusted to deploy countermeasures. Not surprisingly, these decisions limit fairness frameworks' ability to capture a variety of harms caused by systems. We characterize fairness limitations using concepts from requirements engineering and from social sciences. We show that the focus on algorithms' inputs and outputs misses harms that arise from systems interacting with the world; that the focus on bias and discrimination omits broader harms on populations and their environments; and that relying on service providers excludes scenarios where they are not cooperative or intentionally adversarial. We propose Protective Optimization Technologies (POTs). POTs provide means for affected parties to address the negative impacts of systems in the environment, expanding avenues for political contestation. POTs intervene from outside the system, do not require service providers to cooperate, and can serve to correct, shift, or expose harms that systems impose on populations and their environments. We illustrate the potential and limitations of POTs in two case studies: countering road congestion caused by traffic-beating applications, and recalibrating credit scoring for loan applicants.

Submitted 26 January, 2020; v1 submitted 7 June, 2018; originally announced June 2018.

Comments: Appears in Conference on Fairness, Accountability, and Transparency (FAT* 2020). Bogdan Kulynych and Rebekah Overdorf contributed equally to this work. Version v1/v2 by Seda Gürses, Rebekah Overdorf, and Ero Balsa was presented at HotPETS 2018 and at PiMLAI 2018

arXiv:1711.04992 [pdf, other]

Feature importance scores and lossless feature pruning using Banzhaf power indices

Authors: Bogdan Kulynych, Carmela Troncoso

Abstract: Understanding the influence of features in machine learning is crucial to interpreting models and selecting the best features for classification. In this work we propose the use of principles from coalitional game theory to reason about importance of features. In particular, we propose the use of the Banzhaf power index as a measure of influence of features on the outcome of a classifier. We show that features having Banzhaf power index of zero can be losslessly pruned without damage to classifier accuracy. Computing the power indices does not require having access to data samples. However, if samples are available, the indices can be empirically estimated. We compute Banzhaf power indices for a neural network classifier on real-life data, and compare the results with gradient-based feature saliency, and coefficients of a logistic regression model with $L_1$ regularization.

Submitted 3 December, 2017; v1 submitted 14 November, 2017; originally announced November 2017.

Comments: Presented at NIPS 2017 Symposium on Interpretable Machine Learning
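
The claim that zero-index features can be pruned losslessly can be illustrated with a brute-force sketch. It assumes binary features and counts a "swing" whenever toggling one feature changes the classifier's output; this is one common reading of the Banzhaf index and may differ in detail from the paper's definition. The helper names and the toy classifier are hypothetical, and the enumeration is exponential in the number of features, so it only suits tiny examples.

```python
from itertools import product
from typing import Callable, List, Sequence


def banzhaf_indices(
    classifier: Callable[[Sequence[int]], int], n_features: int
) -> List[float]:
    """Fraction of settings of the other features in which toggling
    feature i changes the classifier output (no data samples needed)."""
    indices = []
    for i in range(n_features):
        swings = 0
        for rest in product((0, 1), repeat=n_features - 1):
            x_off = list(rest[:i]) + [0] + list(rest[i:])
            x_on = list(rest[:i]) + [1] + list(rest[i:])
            if classifier(x_on) != classifier(x_off):
                swings += 1
        indices.append(swings / 2 ** (n_features - 1))
    return indices


def toy_classifier(x: Sequence[int]) -> int:
    """Toy model that uses features 0 and 1 and ignores feature 2."""
    return int(x[0] and x[1])


if __name__ == "__main__":
    print(banzhaf_indices(toy_classifier, 3))
```

In the toy model, feature 2 never changes the output, so its index is zero and removing it cannot affect any prediction.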

arXiv:1707.06279 [pdf, other]

doi 10.1145/3267323.3268947

ClaimChain: Improving the Security and Privacy of In-band Key Distribution for Messaging

Authors: Bogdan Kulynych, Wouter Lueks, Marios Isaakidis, George Danezis, Carmela Troncoso

Abstract: The social demand for email end-to-end encryption is barely supported by mainstream service providers. Autocrypt is a new community-driven open specification for email encryption that attempts to respond to this demand. In Autocrypt the encryption keys are attached directly to messages, and thus the encryption can be implemented by email clients without any collaboration of the providers. The decentralized nature of this in-band key distribution, however, makes it prone to man-in-the-middle attacks and can leak the social graph of users. To address this problem we introduce ClaimChain, a cryptographic construction for privacy-preserving authentication of public keys. Users store claims about their identities and keys, as well as their beliefs about others, in ClaimChains. These chains form authenticated decentralized repositories that enable users to prove the authenticity of both their keys and the keys of their contacts. ClaimChains are encrypted, and therefore protect the stored information, such as keys and contact identities, from prying eyes. At the same time, ClaimChain implements mechanisms to provide strong non-equivocation properties, discouraging malicious actors from distributing conflicting or inauthentic claims. We implemented ClaimChain and we show that it offers reasonable performance, low overhead, and authenticity guarantees.

Submitted 12 October, 2018; v1 submitted 19 July, 2017; originally announced July 2017.

Comments: Appears in 2018 Workshop on Privacy in the Electronic Society (WPES'18)

Showing 1–17 of 17 results for author: Kulynych, B