-
COVID-19 Datathon Based on Deidentified Governmental Data as an Approach for Solving Policy Challenges, Increasing Trust, and Building a Community: Case Study
Authors:
Mor Peleg,
Amnon Reichman,
Sivan Shachar,
Tamir Gadot,
Meytal Avgil Tsadok,
Maya Azaria,
Orr Dunkelman,
Shiri Hassid,
Daniella Partem,
Maya Shmailov,
Elad Yom-Tov,
Roy Cohen
Abstract:
Triggered by the COVID-19 crisis, Israel's Ministry of Health (MoH) held a virtual Datathon based on deidentified governmental data. A multidisciplinary committee organized the event and invited Israel's research community to offer insights into COVID-19 policy challenges. The Datathon was designed to (1) develop operationalizable data-driven models to address COVID-19 health-policy challenges and (2) build a community of researchers from academia, industry, and government and rebuild their trust in the government. Three specific challenges were defined based on their relevance (significance, data availability, and potential to anonymize the data): immunization policies, special needs of the young population, and populations whose rate of compliance with COVID-19 testing is low. The MoH team extracted diverse, reliable, up-to-date, and deidentified governmental datasets for each challenge. Secure remote-access research environments with relevant data science tools were set up on Amazon Web Services. The MoH screened the applicants and accepted around 80 participants, teaming them to balance areas of expertise as well as to represent all sectors of the community. One week following the event, anonymous surveys for participants and mentors were distributed to assess overall usefulness and points for improvement. The 48-hour Datathon and pre-event sessions included 18 multidisciplinary teams, mentored by 20 data scientists, 6 epidemiologists, 5 presentation mentors, and 12 judges. The insights developed by the 3 winning teams are currently being considered by the MoH as potential data science methods relevant for national policies. The most positive results were increased trust in the MoH and greater readiness to work with the government on these or future projects. Detailed feedback offered concrete lessons for improving the structure and organization of future government-led datathons.
Submitted 30 August, 2021;
originally announced August 2021.
-
Fuzzy Commitments Offer Insufficient Protection to Biometric Templates Produced by Deep Learning
Authors:
Danny Keller,
Margarita Osadchy,
Orr Dunkelman
Abstract:
In this work, we study the protection that fuzzy commitments offer when they are applied to facial images processed by state-of-the-art deep learning facial recognition systems. We show that while these systems are capable of producing great accuracy, they produce templates of too little entropy. As a result, we present a reconstruction attack that takes a protected template and reconstructs a facial image. The reconstructed facial images greatly resemble the original ones. In the simplest attack scenario, more than 78% of these reconstructed templates succeed in unlocking an account (when the system is configured to 0.1% FAR). Even in the "hardest" settings (in which we take a reconstructed image from one system and use it in a different system, with a different feature extraction process) the reconstructed image offers 50 to 120 times higher success rates than the system's FAR.
Submitted 24 December, 2020;
originally announced December 2020.
-
Consistent High Dimensional Rounding with Side Information
Authors:
Orr Dunkelman,
Zeev Geyzel,
Chaya Keller,
Nathan Keller,
Eyal Ronen,
Adi Shamir,
Ran J. Tessler
Abstract:
In standard rounding, we want to map each value $X$ in a large continuous space (e.g., $R$) to a nearby point $P$ from a discrete subset (e.g., $Z$). This process seems to be inherently discontinuous in the sense that two consecutive noisy measurements $X_1$ and $X_2$ of the same value may be extremely close to each other and yet they can be rounded to different points $P_1\ne P_2$, which is undesirable in many applications. In this paper we show how to make the rounding process perfectly continuous in the sense that it maps any pair of sufficiently close measurements to the same point. We call such a process consistent rounding, and make it possible by allowing a small amount of information about the first measurement $X_1$ to be unidirectionally communicated to and used by the rounding process of $X_2$.
The fault tolerance of a consistent rounding scheme is defined by the maximum distance between pairs of measurements which guarantees that they are always rounded to the same point, and our goal is to study the possible tradeoffs between the amount of information provided and the achievable fault tolerance for various types of spaces. When the measurements $X_i$ are arbitrary vectors in $R^d$, we show that communicating $\log_2(d+1)$ bits of information is both sufficient and necessary (in the worst case) in order to achieve consistent rounding for some positive fault tolerance, and when $d=3$ we obtain a tight upper and lower asymptotic bound of $(0.561+o(1))k^{1/3}$ on the achievable fault tolerance when we reveal $\log_2(k)$ bits of information about how $X_1$ was rounded. We analyze the problem by considering the possible colored tilings of the space with $k$ available colors, and obtain our upper and lower bounds with a variety of mathematical techniques including isoperimetric inequalities, the Brunn-Minkowski theorem, sphere packing bounds, and Čech cohomology.
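To make the idea concrete, here is a minimal one-dimensional sketch in which $\log_2(d+1)$ is a single bit. It is an illustration under stated assumptions, not the construction from the paper: the two tilings of the line (cells centred on integers, and cells centred on half-integers) and the function names `round_first` and `round_with_bit` are chosen here for exposition. The first measurement picks the tiling whose cell boundary is farther away, which is always at least 1/4 away, and the bit naming that tiling is all the second measurement needs.

```python
import math

def round_with_bit(x, bit):
    """Round x using the tiling selected by the side-information bit."""
    if bit == 0:
        return math.floor(x + 0.5)   # cells [n - 0.5, n + 0.5), rounded to n
    return math.floor(x) + 0.5       # cells [n, n + 1), rounded to n + 0.5

def round_first(x1):
    """Round the first measurement and return (point, side-information bit)."""
    f = x1 - math.floor(x1)          # fractional part in [0, 1)
    d0 = abs(f - 0.5)                # distance to a boundary of tiling 0
    d1 = min(f, 1.0 - f)             # distance to a boundary of tiling 1
    bit = 0 if d0 >= d1 else 1       # d0 + d1 == 0.5, so the larger is >= 0.25
    return round_with_bit(x1, bit), bit

# Ordinary rounding separates these two close measurements (0 versus 1),
# while the one-bit scheme maps both to the same point.
x1, x2 = 0.49, 0.51
p1, bit = round_first(x1)
p2 = round_with_bit(x2, bit)
print(round(x1), round(x2), p1, p2)
```

In this toy scheme any second measurement within distance 1/4 of the first lands in the same cell, so the fault tolerance is positive with one bit of side information.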
Submitted 9 August, 2020;
originally announced August 2020.
-
DNS-Morph: UDP-Based Bootstrapping Protocol For Tor
Authors:
Rami Ailabouni,
Orr Dunkelman,
Sara Bitan
Abstract:
Tor is one of the most popular systems for anonymous communication and censorship circumvention on the web, currently used by millions of users every day. This makes Tor a target for attacks by organizations and governmental bodies whose goal is to hinder users' ability to connect to it. These attacks include deep packet inspection (DPI) to classify Tor traffic as well as legitimate Tor client impersonation (active probing) to expose Tor bridges. As a response to Tor-blocking attempts, the Tor community has developed Pluggable Transports (PTs), tools that transform the appearance of Tor's traffic flow. In this paper we introduce a new approach aiming to enhance the PT's resistance against active probing attacks, as well as white-listing censorship, by partitioning the handshake of the PT from its encrypted communication. This separation allows mixing different PTs, e.g., ScrambleSuit for the handshake and FTE for the traffic itself. We claim that this separation reduces the possibility of marking Tor-related communications. To illustrate our claim, we introduce DNS-Morph: a new method of transforming the handshake data of a PT by imitating a sequence of DNS queries and responses. Using DNS-Morph, the Tor client acts as a DNS client which sends DNS queries to the Tor bridge, and receives DNS responses from it. We implemented and successfully tested DNS-Morph using one of the PTs (ScrambleSuit), and verified its capabilities.
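The general idea of carrying handshake bytes inside DNS query names can be sketched as follows. This is not DNS-Morph's actual encoding, which the abstract does not specify: the base32 labels, the sequence-number prefix, the `example.com` zone, the 30-byte chunk size, and both function names are assumptions made purely for illustration. The only hard constraint used is the 63-character limit on a DNS label.

```python
import base64

MAX_LABEL = 63  # maximum length of a single DNS label

def encode_as_qnames(payload, zone="example.com", chunk=30):
    """Split payload into chunks and wrap each one in a query name (hypothetical format)."""
    names = []
    for seq, start in enumerate(range(0, len(payload), chunk)):
        piece = payload[start:start + chunk]
        # Base32 keeps the label within the letters and digits allowed in hostnames.
        label = base64.b32encode(piece).decode("ascii").rstrip("=").lower()
        assert len(label) <= MAX_LABEL
        names.append(f"{seq}.{label}.{zone}")
    return names

def decode_from_qnames(names):
    """Recover the payload on the receiving side, tolerating reordered queries."""
    parts = {}
    for name in names:
        seq, label, _zone = name.split(".", 2)
        padded = label.upper() + "=" * (-len(label) % 8)  # restore base32 padding
        parts[int(seq)] = base64.b32decode(padded)
    return b"".join(parts[i] for i in sorted(parts))

handshake = bytes(range(100))  # stand-in for real handshake data
queries = encode_as_qnames(handshake)
assert decode_from_qnames(list(reversed(queries))) == handshake
```

A real transport would also need the query names and their timing to look like ordinary DNS traffic, which this sketch makes no attempt to do.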
Submitted 2 April, 2019;
originally announced April 2019.
-
A Simple Explanation for the Existence of Adversarial Examples with Small Hamming Distance
Authors:
Adi Shamir,
Itay Safran,
Eyal Ronen,
Orr Dunkelman
Abstract:
The existence of adversarial examples in which an imperceptible change in the input can fool well-trained neural networks was experimentally discovered by Szegedy et al. in 2013, who called them "Intriguing properties of neural networks". Since then, this topic has become one of the hottest research areas within machine learning, but the ease with which we can switch between any two decisions in targeted attacks is still far from being understood, and in particular it is not clear which parameters determine the number of input coordinates we have to change in order to mislead the network. In this paper we develop a simple mathematical framework which enables us to think about this baffling phenomenon from a fresh perspective, turning it into a natural consequence of the geometry of $\mathbb{R}^n$ with the $L_0$ (Hamming) metric, which can be quantitatively analyzed. In particular, we explain why we should expect to find targeted adversarial examples with Hamming distance of roughly $m$ in arbitrarily deep neural networks which are designed to distinguish between $m$ input classes.
Submitted 30 January, 2019;
originally announced January 2019.
-
Tight Bounds on Online Checkpointing Algorithms
Authors:
Achiya Bar-On,
Itai Dinur,
Orr Dunkelman,
Rani Hod,
Nathan Keller,
Eyal Ronen,
Adi Shamir
Abstract:
The problem of online checkpointing is a classical problem with numerous applications which has been studied in various forms for almost 50 years. In the simplest version of this problem, a user has to maintain $k$ memorized checkpoints during a long computation, where the only allowed operation is to move one of the checkpoints from its old time to the current time, and the goal is to keep the checkpoints as evenly spread out as possible at all times.
Bringmann et al. studied this problem as a special case of an online/offline optimization problem in which the deviation from uniformity is measured by the natural discrepancy metric of the worst case ratio between real and ideal segment lengths. They showed this discrepancy is smaller than $1.59-o(1)$ for all $k$, and smaller than $\ln4-o(1)\approx1.39$ for the sparse subset of $k$'s which are powers of 2. In addition, they obtained upper bounds on the achievable discrepancy for some small values of $k$.
In this paper we solve the main problems left open in the above-mentioned paper by proving that $\ln4$ is a tight upper and lower bound on the asymptotic discrepancy for all large $k$, and by providing tight upper and lower bounds (in the form of provably optimal checkpointing algorithms, some of which are in fact better than those of Bringmann et al.) for all the small values of $k \leq 10$.
In the last part of the paper we describe some new applications of this online checkpointing problem.
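The discrepancy measure described above can be sketched for a single snapshot in time. The convention used here is an assumption for illustration: the $k$ checkpoints, together with time 0 and the current time, split the elapsed time into $k+1$ segments, the ideal segment length is the elapsed time divided by $k+1$, and the snapshot discrepancy is the longest real segment divided by that ideal. The function name `discrepancy` is hypothetical, and the worst case over all times, which is what the bounds refer to, is not computed here.

```python
def discrepancy(checkpoints, now):
    """Ratio of the longest gap to the ideal gap at time `now` (assumed convention)."""
    k = len(checkpoints)
    points = [0.0] + sorted(checkpoints) + [float(now)]
    longest = max(b - a for a, b in zip(points, points[1:]))
    ideal = now / (k + 1)            # perfectly even spacing of k checkpoints
    return longest / ideal

even = discrepancy([1, 2, 3], 4)     # evenly spread checkpoints give ratio 1
uneven = discrepancy([1, 2], 4)      # gaps 1, 1, 2 against an ideal gap of 4/3
print(even, uneven)
```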
Submitted 19 June, 2019; v1 submitted 9 April, 2017;
originally announced April 2017.
-
HoneyFaces: Increasing the Security and Privacy of Authentication Using Synthetic Facial Images
Authors:
Mor Ohana,
Orr Dunkelman,
Stuart Gibson,
Margarita Osadchy
Abstract:
One of the main challenges faced by Biometric-based authentication systems is the need to offer secure authentication while maintaining the privacy of the biometric data. Previous solutions, such as Secure Sketch and Fuzzy Extractors, rely on assumptions that cannot be guaranteed in practice, and often affect the authentication accuracy.
In this paper, we introduce HoneyFaces: the concept of adding a large set of synthetic faces (indistinguishable from real) into the biometric "password file". This password inflation protects the privacy of users and increases the security of the system without affecting the accuracy of the authentication. In particular, privacy for the real users is provided by "hiding" them among a large number of fake users (as the distributions of synthetic and real faces are equal). In addition to maintaining the authentication accuracy, and thus not affecting the security of the authentication process, HoneyFaces offers several security improvements: increased exfiltration hardness, improved leakage detection, and the ability to use a two-server setting as in HoneyWords. Finally, HoneyFaces can be combined with other security and privacy mechanisms for biometric data.
We implemented the HoneyFaces system and tested it with a password file composed of 270 real users. The "password file" was then inflated to accommodate up to $2^{36.5}$ users (resulting in a 56.6 TB "password file"). At the same time, the inclusion of additional faces does not affect the true acceptance rate or false acceptance rate, which were 93.33% and 0.01%, respectively.
Submitted 11 November, 2016;
originally announced November 2016.
-
Breaching the Privacy of Israel's Paper Ballot Voting System
Authors:
Tomer Ashur,
Orr Dunkelman,
Nimrod Talmon
Abstract:
An election is a process through which citizens in liberal democracies select their governing bodies, usually through voting. For elections to be truly honest, people must be able to vote freely without being subject to coercion; that is why voting is usually done in a private manner. In this paper we analyze the security offered by a paper-ballot voting system that is used in Israel, as well as in several other countries around the world. We provide an algorithm which, based on publicly available information, breaks the privacy of the voters participating in such elections. Simulations based on real data collected in Israel show that our algorithm performs well, and can correctly recover the vote of up to 96% of the voters.
Submitted 29 August, 2016;
originally announced August 2016.