-
Teaching an Old Dog New Tricks: Verifiable FHE Using Commodity Hardware
Authors:
Jules Drean,
Fisher Jepsen,
Edward Suh,
Srini Devadas,
Aamer Jaleel,
Gururaj Saileshwar
Abstract:
We present Argos, a simple approach for adding verifiability to fully homomorphic encryption (FHE) schemes using trusted hardware. Traditional approaches to verifiable FHE require expensive cryptographic proofs, which incur an overhead of up to seven orders of magnitude on top of FHE, making them impractical.
With Argos, we show that trusted hardware can be securely used to provide verifiability…
▽ More
We present Argos, a simple approach for adding verifiability to fully homomorphic encryption (FHE) schemes using trusted hardware. Traditional approaches to verifiable FHE require expensive cryptographic proofs, which incur an overhead of up to seven orders of magnitude on top of FHE, making them impractical.
With Argos, we show that trusted hardware can be securely used to provide verifiability for FHE computations, with minimal overhead relative to the baseline FHE computation. An important contribution of Argos is showing that the major security pitfall associated with trusted hardware, microarchitectural side channels, can be completely mitigated by excluding any secrets from the CPU and the memory hierarchy. This is made possible by focusing on building a platform that only enforces program and data integrity and not confidentiality. All secrets related to the attestation mechanism are kept in a separate coprocessor (e.g., a TPM)-inaccessible to any software-based attacker. Relying on a discrete TPM typically incurs significant performance overhead, which is why (insecure) software-based TPMs are used in practice. As a second contribution, we show that for FHE applications, the attestation protocol can be adapted to only incur a fixed cost.
Argos requires no dedicated hardware extensions and is supported on commodity processors from 2008 onward. Our prototype implementation introduces 3% overhead for FHE evaluation, and 8% for more complex protocols. In particular, we show that Argos can be used for real-world applications of FHE, such as private information retrieval (PIR) and private set intersection (PSI), where providing verifiability is imperative. By demonstrating how to combine cryptography with trusted hardware, Argos paves the way for widespread deployment of FHE-based protocols beyond the semi-honest setting, without the overhead of cryptographic proofs.
△ Less
Submitted 24 March, 2025; v1 submitted 4 December, 2024;
originally announced December 2024.
-
Higher-weight Jacobians
Authors:
Sheela Devadas,
Max Lieblich
Abstract:
We define and study Jacobians of Hodge structures with weight greater than 1. Jacobians of weight 2 naturally come up in the context of the Brauer group and the Tate conjecture. They were previously studied in a special case by Beauville in his work on surfaces of maximal Picard number, and are related to the work of Totaro on Hodge structures with no middle pieces. Higher-weight Jacobians are com…
▽ More
We define and study Jacobians of Hodge structures with weight greater than 1. Jacobians of weight 2 naturally come up in the context of the Brauer group and the Tate conjecture. They were previously studied in a special case by Beauville in his work on surfaces of maximal Picard number, and are related to the work of Totaro on Hodge structures with no middle pieces. Higher-weight Jacobians are complex tori, and it is generally quite difficult to tell if they are algebraic. After discussing some general theory, we compute numerous examples of Jacobians of various weights for special classes of varieties: abelian varieties of maximal Picard number, Kummer varieties, and singular K3 surfaces. It turns out that all of these Jacobians are algebraic. We compute their fields of definition.
△ Less
Submitted 9 February, 2025; v1 submitted 22 August, 2024;
originally announced August 2024.
-
Geometry of Sensitivity: Twice Sampling and Hybrid Clipping in Differential Privacy with Optimal Gaussian Noise and Application to Deep Learning
Authors:
Hanshen Xiao,
Jun Wan,
Srinivas Devadas
Abstract:
We study the fundamental problem of the construction of optimal randomization in Differential Privacy. Depending on the clipping strategy or additional properties of the processing function, the corresponding sensitivity set theoretically determines the necessary randomization to produce the required security parameters. Towards the optimal utility-privacy tradeoff, finding the minimal perturbatio…
▽ More
We study the fundamental problem of the construction of optimal randomization in Differential Privacy. Depending on the clipping strategy or additional properties of the processing function, the corresponding sensitivity set theoretically determines the necessary randomization to produce the required security parameters. Towards the optimal utility-privacy tradeoff, finding the minimal perturbation for properly-selected sensitivity sets stands as a central problem in DP research. In practice, l_2/l_1-norm clippings with Gaussian/Laplace noise mechanisms are among the most common setups. However, they also suffer from the curse of dimensionality. For more generic clipping strategies, the understanding of the optimal noise for a high-dimensional sensitivity set remains limited.
In this paper, we revisit the geometry of high-dimensional sensitivity sets and present a series of results to characterize the non-asymptotically optimal Gaussian noise for Rényi DP (RDP). Our results are both negative and positive: on one hand, we show the curse of dimensionality is tight for a broad class of sensitivity sets satisfying certain symmetry properties; but if, fortunately, the representation of the sensitivity set is asymmetric on some group of orthogonal bases, we show the optimal noise bounds need not be explicitly dependent on either dimension or rank. We also revisit sampling in the high-dimensional scenario, which is the key for both privacy amplification and computation efficiency in large-scale data processing. We propose a novel method, termed twice sampling, which implements both sample-wise and coordinate-wise sampling, to enable Gaussian noises to fit the sensitivity geometry more closely. With closed-form RDP analysis, we prove twice sampling produces asymptotic improvement of the privacy amplification given an additional infinity-norm restriction, especially for small sampling rate.
△ Less
Submitted 28 September, 2023; v1 submitted 5 September, 2023;
originally announced September 2023.
-
Citadel: Simple Spectre-Safe Isolation For Real-World Programs That Share Memory
Authors:
Jules Drean,
Miguel Gomez-Garcia,
Fisher Jepsen,
Thomas Bourgeat,
Srinivas Devadas
Abstract:
Transient execution side-channel attacks, such as Spectre, have been shown to break almost all isolation primitives. We introduce a new security property we call relaxed microarchitectural isolation (RMI) that allows sensitive programs that are not-constant-time to share memory with an attacker while restricting the information leakage to that of non-speculative execution. Although this type of sp…
▽ More
Transient execution side-channel attacks, such as Spectre, have been shown to break almost all isolation primitives. We introduce a new security property we call relaxed microarchitectural isolation (RMI) that allows sensitive programs that are not-constant-time to share memory with an attacker while restricting the information leakage to that of non-speculative execution. Although this type of speculative security property is typically challenging to enforce, we show that we can leverage the enclave setup to achieve it. In particular, we use microarchitectural isolation to restrict attacker's observations in conjunction with straightforward hardware mechanisms to limit speculation. This new design point presents a compelling trade-off between security, usability, and performance, making it possible to efficiently enforce RMI for any program. We demonstrate our approach by implementing and evaluating two simple defense mechanisms that satisfy RMI: (1) Safe mode, which disables speculative accesses to shared memory, and (2) Burst mode, a localized performance optimization that requires simple program analysis on small code snippets. Our end-to-end prototype, Citadel, consists of an FPGA-based multicore processor that boots Linux and runs secure applications, including cryptographic libraries and private inference, with less than 5% performance overhead.
△ Less
Submitted 6 February, 2025; v1 submitted 26 June, 2023;
originally announced June 2023.
-
GAGA for Henselian schemes
Authors:
Sheela Devadas
Abstract:
The global analogue of a Henselian local ring is a Henselian pair-a ring R and an ideal I which satisfy a condition resembling Hensel's lemma regarding lifting coprime factorizations of monic polynomials over R/I to factorizations over R. The geometric counterpart is the notion of a Henselian scheme, which can serve as a substitute for formal schemes in applications such as deformation theory. In…
▽ More
The global analogue of a Henselian local ring is a Henselian pair-a ring R and an ideal I which satisfy a condition resembling Hensel's lemma regarding lifting coprime factorizations of monic polynomials over R/I to factorizations over R. The geometric counterpart is the notion of a Henselian scheme, which can serve as a substitute for formal schemes in applications such as deformation theory. In this paper we prove a GAGA-style cohomology comparison result for Henselian schemes in positive characteristic, making use of a "Henselian étale" topology defined in previous work in order to leverage exactness of finite pushforward for abelian sheaves in the étale topology of schemes. We will also discuss algebraizability of coherent sheaves on the Henselization of a proper scheme, proving (without a positive characteristic restriction) algebraizability for coherent subsheaves. We can then deduce a Henselian version of Chow's theorem on algebraization and the algebraizability of maps between Henselizations of proper schemes.
△ Less
Submitted 28 January, 2025; v1 submitted 2 June, 2023;
originally announced June 2023.
-
Henselian schemes in positive characteristic
Authors:
Sheela Devadas
Abstract:
The global analogue of a Henselian local ring is a Henselian pair: a ring A and an ideal I which satisfy a condition resembling Hensel's lemma regarding lifting coprime factorizations of polynomials over A/I to factorizations over A. The geometric counterpart is the notion of a Henselian scheme, which is an analogue of a tubular neighborhood in algebraic geometry.
In this paper we revisit the fo…
▽ More
The global analogue of a Henselian local ring is a Henselian pair: a ring A and an ideal I which satisfy a condition resembling Hensel's lemma regarding lifting coprime factorizations of polynomials over A/I to factorizations over A. The geometric counterpart is the notion of a Henselian scheme, which is an analogue of a tubular neighborhood in algebraic geometry.
In this paper we revisit the foundations of the theory of Henselian schemes. The pathological behavior of quasi-coherent sheaves on Henselian schemes in characteristic 0 makes them poor models for an "algebraic tube" in characteristic 0. We show that such problems do not arise in positive characteristic, and establish good properties for analogues of smooth and étale maps in the general Henselian setting.
△ Less
Submitted 23 June, 2024; v1 submitted 16 December, 2022;
originally announced December 2022.
-
Microscopic study of effective 2+1 dimensional gravity in ferrofluid-based hyperbolic metamaterials
Authors:
Vera N. Smolyaninova,
Jonathon Cartelli,
Nathaniel Christopher,
Benjamin Kist,
Jonathan Perry,
Stephanie Spickard,
Mary Sajini Devadas,
Igor I. Smolyaninov
Abstract:
Recent theoretical and experimental work demonstrated that nonlinear optics of ferrofluid-based hyperbolic metamaterials exhibits very unusual 2+2-dimensional spatiotemporal dynamics. Here we report a detailed microscopic study of mutual interactions of individual self-focused optical filaments inside this metamaterial. In agreement with theoretical expectations, the observed mutual interactions o…
▽ More
Recent theoretical and experimental work demonstrated that nonlinear optics of ferrofluid-based hyperbolic metamaterials exhibits very unusual 2+2-dimensional spatiotemporal dynamics. Here we report a detailed microscopic study of mutual interactions of individual self-focused optical filaments inside this metamaterial. In agreement with theoretical expectations, the observed mutual interactions of individual filaments exhibit strong similarities with general relativity in 2+1 dimensions. This observation is very important since 2+1-dimensional gravity is an exactly solvable theory even in the quantum gravity limit.
△ Less
Submitted 2 December, 2022;
originally announced December 2022.
-
Differentially Private Deep Learning with ModelMix
Authors:
Hanshen Xiao,
Jun Wan,
Srinivas Devadas
Abstract:
Training large neural networks with meaningful/usable differential privacy security guarantees is a demanding challenge. In this paper, we tackle this problem by revisiting the two key operations in Differentially Private Stochastic Gradient Descent (DP-SGD): 1) iterative perturbation and 2) gradient clipping. We propose a generic optimization framework, called {\em ModelMix}, which performs rando…
▽ More
Training large neural networks with meaningful/usable differential privacy security guarantees is a demanding challenge. In this paper, we tackle this problem by revisiting the two key operations in Differentially Private Stochastic Gradient Descent (DP-SGD): 1) iterative perturbation and 2) gradient clipping. We propose a generic optimization framework, called {\em ModelMix}, which performs random aggregation of intermediate model states. It strengthens the composite privacy analysis utilizing the entropy of the training trajectory and improves the $(ε, δ)$ DP security parameters by an order of magnitude.
We provide rigorous analyses for both the utility guarantees and privacy amplification of ModelMix. In particular, we present a formal study on the effect of gradient clipping in DP-SGD, which provides theoretical instruction on how hyper-parameters should be selected. We also introduce a refined gradient clipping method, which can further sharpen the privacy loss in private learning when combined with ModelMix.
Thorough experiments with significant privacy/utility improvement are presented to support our theory. We train a Resnet-20 network on CIFAR10 with $70.4\%$ accuracy via ModelMix given $(ε=8, δ=10^{-5})$ DP-budget, compared to the same performance but with $(ε=145.8,δ=10^{-5})$ using regular DP-SGD; assisted with additional public low-dimensional gradient embedding, one can further improve the accuracy to $79.1\%$ with $(ε=6.1, δ=10^{-5})$ DP-budget, compared to the same performance but with $(ε=111.2, δ=10^{-5})$ without ModelMix.
△ Less
Submitted 7 October, 2022;
originally announced October 2022.
-
PAC Privacy: Automatic Privacy Measurement and Control of Data Processing
Authors:
Hanshen Xiao,
Srinivas Devadas
Abstract:
We propose and study a new privacy definition, termed Probably Approximately Correct (PAC) Privacy. PAC Privacy characterizes the information-theoretic hardness to recover sensitive data given arbitrary information disclosure/leakage during/after any processing. Unlike the classic cryptographic definition and Differential Privacy (DP), which consider the adversarial (input-independent) worst case,…
▽ More
We propose and study a new privacy definition, termed Probably Approximately Correct (PAC) Privacy. PAC Privacy characterizes the information-theoretic hardness to recover sensitive data given arbitrary information disclosure/leakage during/after any processing. Unlike the classic cryptographic definition and Differential Privacy (DP), which consider the adversarial (input-independent) worst case, PAC Privacy is a simulatable metric that quantifies the instance-based impossibility of inference. A fully automatic analysis and proof generation framework is proposed: security parameters can be produced with arbitrarily high confidence via Monte-Carlo simulation for any black-box data processing oracle. This appealing automation property enables analysis of complicated data processing, where the worst-case proof in the classic privacy regime could be loose or even intractable. Moreover, we show that the produced PAC Privacy guarantees enjoy simple composition bounds and the automatic analysis framework can be implemented in an online fashion to analyze the composite PAC Privacy loss even under correlated randomness. On the utility side, the magnitude of (necessary) perturbation required in PAC Privacy is not lower bounded by Theta(\sqrt{d}) for a d-dimensional release but could be O(1) for many practical data processing tasks, which is in contrast to the input-independent worst-case information-theoretic lower bound. Example applications of PAC Privacy are included with comparisons to existing works.
△ Less
Submitted 19 June, 2023; v1 submitted 7 October, 2022;
originally announced October 2022.
-
ShorTor: Improving Tor Network Latency via Multi-hop Overlay Routing
Authors:
Kyle Hogan,
Sacha Servan-Schreiber,
Zachary Newman,
Ben Weintraub,
Cristina Nita-Rotaru,
Srinivas Devadas
Abstract:
We present ShorTor, a protocol for reducing latency on the Tor network. ShorTor uses multi-hop overlay routing, a technique typically employed by content delivery networks, to influence the route Tor traffic takes across the internet. ShorTor functions as an overlay on top of onion routing-Tor's existing routing protocol and is run by Tor relays, making it independent of the path selection perform…
▽ More
We present ShorTor, a protocol for reducing latency on the Tor network. ShorTor uses multi-hop overlay routing, a technique typically employed by content delivery networks, to influence the route Tor traffic takes across the internet. ShorTor functions as an overlay on top of onion routing-Tor's existing routing protocol and is run by Tor relays, making it independent of the path selection performed by Tor clients. As such, ShorTor reduces latency while preserving Tor's existing security properties. Specifically, the routes taken in ShorTor are in no way correlated to either the Tor user or their destination, including the geographic location of either party. We analyze the security of ShorTor using the AnoA framework, showing that ShorTor maintains all of Tor's anonymity guarantees. We augment our theoretical claims with an empirical analysis. To evaluate ShorTor's performance, we collect a real-world dataset of over 400,000 latency measurements between the 1,000 most popular Tor relays, which collectively see the vast majority of Tor traffic. With this data, we identify pairs of relays that could benefit from ShorTor: that is, two relays where introducing an additional intermediate network hop results in lower latency than the direct route between them. We use our measurement dataset to simulate the impact on end users by applying ShorTor to two million Tor circuits chosen according to Tor's specification. ShorTor reduces the latency for the 99th percentile of relay pairs in Tor by 148 ms. Similarly, ShorTor reduces the latency of Tor circuits by 122 ms at the 99th percentile. In practice, this translates to ShorTor truncating tail latencies for Tor which has a direct impact on page load times and, consequently, user experience on the Tor browser.
△ Less
Submitted 9 April, 2022;
originally announced April 2022.
-
F1: A Fast and Programmable Accelerator for Fully Homomorphic Encryption (Extended Version)
Authors:
Axel Feldmann,
Nikola Samardzic,
Aleksandar Krastev,
Srini Devadas,
Ron Dreslinski,
Karim Eldefrawy,
Nicholas Genise,
Chris Peikert,
Daniel Sanchez
Abstract:
Fully Homomorphic Encryption (FHE) allows computing on encrypted data, enabling secure offloading of computation to untrusted serves. Though it provides ideal security, FHE is expensive when executed in software, 4 to 5 orders of magnitude slower than computing on unencrypted data. These overheads are a major barrier to FHE's widespread adoption. We present F1, the first FHE accelerator that is pr…
▽ More
Fully Homomorphic Encryption (FHE) allows computing on encrypted data, enabling secure offloading of computation to untrusted serves. Though it provides ideal security, FHE is expensive when executed in software, 4 to 5 orders of magnitude slower than computing on unencrypted data. These overheads are a major barrier to FHE's widespread adoption. We present F1, the first FHE accelerator that is programmable, i.e., capable of executing full FHE programs. F1 builds on an in-depth architectural analysis of the characteristics of FHE computations that reveals acceleration opportunities. F1 is a wide-vector processor with novel functional units deeply specialized to FHE primitives, such as modular arithmetic, number-theoretic transforms, and structured permutations. This organization provides so much compute throughput that data movement becomes the bottleneck. Thus, F1 is primarily designed to minimize data movement. The F1 hardware provides an explicitly managed memory hierarchy and mechanisms to decouple data movement from execution. A novel compiler leverages these mechanisms to maximize reuse and schedule off-chip and on-chip data movement. We evaluate F1 using cycle-accurate simulations and RTL synthesis. F1 is the first system to accelerate complete FHE programs and outperforms state-of-the-art software implementations by gmean 5400x and by up to 17000x. These speedups counter most of FHE's overheads and enable new applications, like real-time private deep learning in the cloud.
△ Less
Submitted 25 September, 2021; v1 submitted 11 September, 2021;
originally announced September 2021.
-
On Differentially Private Stochastic Convex Optimization with Heavy-tailed Data
Authors:
Di Wang,
Hanshen Xiao,
Srini Devadas,
Jinhui Xu
Abstract:
In this paper, we consider the problem of designing Differentially Private (DP) algorithms for Stochastic Convex Optimization (SCO) on heavy-tailed data. The irregularity of such data violates some key assumptions used in almost all existing DP-SCO and DP-ERM methods, resulting in failure to provide the DP guarantees. To better understand this type of challenges, we provide in this paper a compreh…
▽ More
In this paper, we consider the problem of designing Differentially Private (DP) algorithms for Stochastic Convex Optimization (SCO) on heavy-tailed data. The irregularity of such data violates some key assumptions used in almost all existing DP-SCO and DP-ERM methods, resulting in failure to provide the DP guarantees. To better understand this type of challenges, we provide in this paper a comprehensive study of DP-SCO under various settings. First, we consider the case where the loss function is strongly convex and smooth. For this case, we propose a method based on the sample-and-aggregate framework, which has an excess population risk of $\tilde{O}(\frac{d^3}{nε^4})$ (after omitting other factors), where $n$ is the sample size and $d$ is the dimensionality of the data. Then, we show that with some additional assumptions on the loss functions, it is possible to reduce the \textit{expected} excess population risk to $\tilde{O}(\frac{ d^2}{ nε^2 })$. To lift these additional conditions, we also provide a gradient smoothing and trimming based scheme to achieve excess population risks of $\tilde{O}(\frac{ d^2}{nε^2})$ and $\tilde{O}(\frac{d^\frac{2}{3}}{(nε^2)^\frac{1}{3}})$ for strongly convex and general convex loss functions, respectively, \textit{with high probability}. Experiments suggest that our algorithms can effectively deal with the challenges caused by data irregularity.
△ Less
Submitted 21 October, 2020;
originally announced October 2020.
-
Taurus: Lightweight Parallel Logging for In-Memory Database Management Systems (Extended Version)
Authors:
Yu Xia,
Xiangyao Yu,
Andrew Pavlo,
Srinivas Devadas
Abstract:
Existing single-stream logging schemes are unsuitable for in-memory database management systems (DBMSs) as the single log is often a performance bottleneck. To overcome this problem, we present Taurus, an efficient parallel logging scheme that uses multiple log streams, and is compatible with both data and command logging. Taurus tracks and encodes transaction dependencies using a vector of log se…
▽ More
Existing single-stream logging schemes are unsuitable for in-memory database management systems (DBMSs) as the single log is often a performance bottleneck. To overcome this problem, we present Taurus, an efficient parallel logging scheme that uses multiple log streams, and is compatible with both data and command logging. Taurus tracks and encodes transaction dependencies using a vector of log sequence numbers (LSNs). These vectors ensure that the dependencies are fully captured in logging and correctly enforced in recovery. Our experimental evaluation with an in-memory DBMS shows that Taurus's parallel logging achieves up to 9.9x and 2.9x speedups over single-streamed data logging and command logging, respectively. It also enables the DBMS to recover up to 22.9x and 75.6x faster than these baselines for data and command logging, respectively. We also compare Taurus with two state-of-the-art parallel logging schemes and show that the DBMS achieves up to 2.8x better performance on NVMe drives and 9.2x on HDDs.
△ Less
Submitted 13 October, 2020;
originally announced October 2020.
-
Experimental observation of effective gravity and two times physics in ferrofluid-based hyperbolic metamaterials
Authors:
V. N. Smolyaninova,
J. Cartelli,
B. Augstein,
S. Spickard,
M. S. Devadas,
I. I. Smolyaninov
Abstract:
Recently it was proposed that extraordinary light waves in hyperbolic metamaterials exhibit two times physics behavior (Phys. Rev. Lett. 105, 067402, 2010). We report experimental observation of this effect via investigation of gravity-like nonlinear optics of iron/cobalt-based ferrofluid hyperbolic metamaterials. In addition to conventional temporal coordinate, the spatial coordinate oriented alo…
▽ More
Recently it was proposed that extraordinary light waves in hyperbolic metamaterials exhibit two times physics behavior (Phys. Rev. Lett. 105, 067402, 2010). We report experimental observation of this effect via investigation of gravity-like nonlinear optics of iron/cobalt-based ferrofluid hyperbolic metamaterials. In addition to conventional temporal coordinate, the spatial coordinate oriented along the optical axis of the metamaterial also exhibits timelike character, which leads to very unusual two times physics behavior in these systems.
△ Less
Submitted 20 August, 2020; v1 submitted 2 March, 2020;
originally announced March 2020.
-
Local Differential Privacy in Decentralized Optimization
Authors:
Hanshen Xiao,
Yu Ye,
Srinivas Devadas
Abstract:
Privacy concerns with sensitive data are receiving increasing attention. In this paper, we study local differential privacy (LDP) in interactive decentralized optimization. By constructing random local aggregators, we propose a framework to amplify LDP by a constant. We take Alternating Direction Method of Multipliers (ADMM), and decentralized gradient descent as two concrete examples, where exper…
▽ More
Privacy concerns with sensitive data are receiving increasing attention. In this paper, we study local differential privacy (LDP) in interactive decentralized optimization. By constructing random local aggregators, we propose a framework to amplify LDP by a constant. We take Alternating Direction Method of Multipliers (ADMM), and decentralized gradient descent as two concrete examples, where experiments support our theory. In an asymptotic view, we address the following question: Under LDP, is it possible to design a distributed private minimizer for arbitrary closed convex constraints with utility loss not explicitly dependent on dimensionality? As an affiliated result, we also show that with merely linear secret sharing, information theoretic privacy is achievable for bounded colluding agents.
△ Less
Submitted 1 June, 2019; v1 submitted 16 February, 2019;
originally announced February 2019.
-
XRD: Scalable Messaging System with Cryptographic Privacy
Authors:
Albert Kwon,
David Lu,
Srinivas Devadas
Abstract:
Even as end-to-end encrypted communication becomes more popular, private messaging remains a challenging problem due to metadata leakages, such as who is communicating with whom. Most existing systems that hide communication metadata either (1) do not scale easily, (2) incur significant overheads, or (3) provide weaker guarantees than cryptographic privacy, such as differential privacy or heuristi…
▽ More
Even as end-to-end encrypted communication becomes more popular, private messaging remains a challenging problem due to metadata leakages, such as who is communicating with whom. Most existing systems that hide communication metadata either (1) do not scale easily, (2) incur significant overheads, or (3) provide weaker guarantees than cryptographic privacy, such as differential privacy or heuristic privacy. This paper presents XRD (short for Crossroads), a metadata private messaging system that provides cryptographic privacy, while scaling easily to support more users by adding more servers. At a high level, XRD uses multiple mix networks in parallel with several techniques, including a novel technique we call aggregate hybrid shuffle. As a result, XRD can support 2 million users with 251 seconds of latency with 100 servers. This is 12x and 3.7x faster than Atom and Pung, respectively, which are prior scalable messaging systems with cryptographic privacy.
△ Less
Submitted 14 January, 2019;
originally announced January 2019.
-
Sanctorum: A lightweight security monitor for secure enclaves
Authors:
Ilia Lebedev,
Kyle Hogan,
Jules Drean,
David Kohlbrenner,
Dayeol Lee,
Krste Asanović,
Dawn Song,
Srinivas Devadas
Abstract:
Enclaves have emerged as a particularly compelling primitive to implement trusted execution environments: strongly isolated sensitive user-mode processes in a largely untrusted software environment. While the threat models employed by various enclave systems differ, the high-level guarantees they offer are essentially the same: attestation of an enclave's initial state, as well as a guarantee of e…
▽ More
Enclaves have emerged as a particularly compelling primitive to implement trusted execution environments: strongly isolated sensitive user-mode processes in a largely untrusted software environment. While the threat models employed by various enclave systems differ, the high-level guarantees they offer are essentially the same: attestation of an enclave's initial state, as well as a guarantee of enclave integrity and privacy in the presence of an adversary.
This work describes Sanctorum, a small trusted code base (TCB), consisting of a generic enclave-capable system, which is sufficient to implement secure enclaves akin to the primitive offered by Intel's SGX. While enclaves may be implemented via unconditionally trusted hardware and microcode, as it is the case in SGX, we employ a smaller TCB principally consisting of authenticated, privileged software, which may be replaced or patched as needed. Sanctorum implements a formally verified specification for generic enclaves on an in-order multiprocessor system meeting baseline security requirements, e.g., the MIT Sanctum processor and the Keystone enclave framework. Sanctorum requires trustworthy hardware including a random number generator, a private cryptographic key pair derived via a secure bootstrapping protocol, and a robust isolation primitive to safeguard sensitive information. Sanctorum's threat model is informed by the threat model of the isolation primitive, and is suitable for adding enclaves to a variety of processor systems.
△ Less
Submitted 26 December, 2018;
originally announced December 2018.
-
MI6: Secure Enclaves in a Speculative Out-of-Order Processor
Authors:
Thomas Bourgeat,
Ilia Lebedev,
Andrew Wright,
Sizhuo Zhang,
Arvind,
Srinivas Devadas
Abstract:
Recent attacks have broken process isolation by exploiting microarchitectural side channels that allow indirect access to shared microarchitectural state. Enclaves strengthen the process abstraction to restore isolation guarantees.
We propose MI6, an aggressive, speculative out-of-order processor capable of providing secure enclaves under a threat model that includes an untrusted OS and an attac…
▽ More
Recent attacks have broken process isolation by exploiting microarchitectural side channels that allow indirect access to shared microarchitectural state. Enclaves strengthen the process abstraction to restore isolation guarantees.
We propose MI6, an aggressive, speculative out-of-order processor capable of providing secure enclaves under a threat model that includes an untrusted OS and an attacker capable of mounting any software attack currently considered practical, including control flow speculation attacks. MI6 is inspired by Sanctum [16] and extends its isolation guarantee to more realistic memory hierarchies. It also introduces a purge instruction, which is used only when a secure process is scheduled, and implements it for a complex processor microarchitecture. We model the performance impact of enclaves in MI6 through FPGA emulation on AWS F1 FPGAs by running SPEC CINT2006 benchmarks on top of an untrusted Linux OS. Security comes at the cost of approximately 16.4% average slowdown for protected programs.
△ Less
Submitted 29 August, 2019; v1 submitted 23 December, 2018;
originally announced December 2018.
-
Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning
Authors:
Sanjit Bhat,
David Lu,
Albert Kwon,
Srinivas Devadas
Abstract:
In recent years, there have been several works that use website fingerprinting techniques to enable a local adversary to determine which website a Tor user visits. While the current state-of-the-art attack, which uses deep learning, outperforms prior art with medium to large amounts of data, it attains marginal to no accuracy improvements when both use small amounts of training data. In this work,…
▽ More
In recent years, there have been several works that use website fingerprinting techniques to enable a local adversary to determine which website a Tor user visits. While the current state-of-the-art attack, which uses deep learning, outperforms prior art with medium to large amounts of data, it attains marginal to no accuracy improvements when both use small amounts of training data. In this work, we propose Var-CNN, a website fingerprinting attack that leverages deep learning techniques along with novel insights specific to packet sequence classification. In open-world settings with large amounts of data, Var-CNN attains over $1\%$ higher true positive rate (TPR) than state-of-the-art attacks while achieving $4\times$ lower false positive rate (FPR). Var-CNN's improvements are especially notable in low-data scenarios, where it reduces the FPR of prior art by $3.12\%$ while increasing the TPR by $13\%$. Overall, insights used to develop Var-CNN can be applied to future deep learning based attacks, and substantially reduce the amount of training data needed to perform a successful website fingerprinting attack. This shortens the time needed for data collection and lowers the likelihood of having data staleness issues.
△ Less
Submitted 23 July, 2019; v1 submitted 27 February, 2018;
originally announced February 2018.
-
Banshee: Bandwidth-Efficient DRAM Caching Via Software/Hardware Cooperation
Authors:
Xiangyao Yu,
Christopher J. Hughes,
Nadathur Satish,
Onur Mutlu,
Srinivas Devadas
Abstract:
Putting the DRAM on the same package with a processor enables several times higher memory bandwidth than conventional off-package DRAM. Yet, the latency of in-package DRAM is not appreciably lower than that of off-package DRAM. A promising use of in-package DRAM is as a large cache. Unfortunately, most previous DRAM cache designs mainly optimize for hit latency and do not consider off-chip bandwid…
▽ More
Putting the DRAM on the same package with a processor enables several times higher memory bandwidth than conventional off-package DRAM. Yet, the latency of in-package DRAM is not appreciably lower than that of off-package DRAM. A promising use of in-package DRAM is as a large cache. Unfortunately, most previous DRAM cache designs mainly optimize for hit latency and do not consider off-chip bandwidth efficiency as a first-class design constraint. Hence, as we show in this paper, these designs are suboptimal for use with in-package DRAM.
We propose a new DRAM cache design, Banshee, that optimizes for both in- and off-package DRAM bandwidth efficiency without degrading access latency. The key ideas are to eliminate the in-package DRAM bandwidth overheads due to costly tag accesses through virtual memory mechanism and to incorporate a bandwidth-aware frequency-based replacement policy that is biased to reduce unnecessary traffic to off-package DRAM. Our extensive evaluation shows that Banshee provides significant performance improvement and traffic reduction over state-of-the-art latency-optimized DRAM cache designs.
△ Less
Submitted 9 April, 2017;
originally announced April 2017.
-
Efficient Synchronous Byzantine Consensus
Authors:
Ittai Abraham,
Srinivas Devadas,
Danny Dolev,
Kartik Nayak,
Ling Ren
Abstract:
We present new protocols for Byzantine state machine replication and Byzantine agreement in the synchronous and authenticated setting. The celebrated PBFT state machine replication protocol tolerates $f$ Byzantine faults in an asynchronous setting using $3f+1$ replicas, and has since been studied or deployed by numerous works. In this work, we improve the Byzantine fault tolerance threshold to…
▽ More
We present new protocols for Byzantine state machine replication and Byzantine agreement in the synchronous and authenticated setting. The celebrated PBFT state machine replication protocol tolerates $f$ Byzantine faults in an asynchronous setting using $3f+1$ replicas, and has since been studied or deployed by numerous works. In this work, we improve the Byzantine fault tolerance threshold to $n=2f+1$ by utilizing a relaxed synchrony assumption. We present a synchronous state machine replication protocol that commits a decision every 3 rounds in the common case. The key challenge is to ensure quorum intersection at one honest replica. Our solution is to rely on the synchrony assumption to form a post-commit quorum of size $2f+1$, which intersects at $f+1$ replicas with any pre-commit quorums of size $f+1$. Our protocol also solves synchronous authenticated Byzantine agreement in expected 8 rounds. The best previous solution (Katz and Koo, 2006) requires expected 24 rounds. Our protocols may be applied to build Byzantine fault tolerant systems or improve cryptographic protocols such as cryptocurrencies when synchrony can be assumed.
△ Less
Submitted 12 September, 2017; v1 submitted 7 April, 2017;
originally announced April 2017.
-
Atom: Horizontally Scaling Strong Anonymity
Authors:
Albert Kwon,
Henry Corrigan-Gibbs,
Srinivas Devadas,
Bryan Ford
Abstract:
Atom is an anonymous messaging system that protects against traffic-analysis attacks. Unlike many prior systems, each Atom server touches only a small fraction of the total messages routed through the network. As a result, the system's capacity scales near-linearly with the number of servers. At the same time, each Atom user benefits from "best possible" anonymity: a user is anonymous among all ho…
▽ More
Atom is an anonymous messaging system that protects against traffic-analysis attacks. Unlike many prior systems, each Atom server touches only a small fraction of the total messages routed through the network. As a result, the system's capacity scales near-linearly with the number of servers. At the same time, each Atom user benefits from "best possible" anonymity: a user is anonymous among all honest users of the system, against an active adversary who controls the entire network, a portion of the system's servers, and any number of malicious users. The architectural ideas behind Atom have been known in theory, but putting them into practice requires new techniques for (1) avoiding the reliance on heavy general-purpose multi-party computation protocols, (2) defeating active attacks by malicious servers at minimal performance cost, and (3) handling server failure and churn.
Atom is most suitable for sending a large number of short messages, as in a microblogging application or a high-security communication bootstrapping ("dialing") for private messaging systems. We show that, on a heterogeneous network of 1,024 servers, Atom can transit a million Tweet-length messages in 28 minutes. This is over 23x faster than prior systems with similar privacy guarantees.
△ Less
Submitted 2 October, 2017; v1 submitted 22 December, 2016;
originally announced December 2016.
-
Tardis 2.0: Optimized Time Traveling Coherence for Relaxed Consistency Models
Authors:
Xiangyao Yu,
Hongzhe Liu,
Ethan Zou,
Srinivas Devadas
Abstract:
Cache coherence scalability is a big challenge in shared memory systems. Traditional protocols do not scale due to the storage and traffic overhead of cache invalidation. Tardis, a recently proposed coherence protocol, removes cache invalidation using logical timestamps and achieves excellent scalability. The original Tardis protocol, however, only supports the Sequential Consistency (SC) memory m…
▽ More
Cache coherence scalability is a big challenge in shared memory systems. Traditional protocols do not scale due to the storage and traffic overhead of cache invalidation. Tardis, a recently proposed coherence protocol, removes cache invalidation using logical timestamps and achieves excellent scalability. The original Tardis protocol, however, only supports the Sequential Consistency (SC) memory model, limiting its applicability. Tardis also incurs extra network traffic on some benchmarks due to renew messages, and has suboptimal performance when the program uses spinning to communicate between threads.
In this paper, we address these downsides of Tardis protocol and make it significantly more practical. Specifically, we discuss the architectural, memory system and protocol changes required in order to implement the TSO consistency model on Tardis, and prove that the modified protocol satisfies TSO. We also describe modifications for Partial Store Order (PSO) and Release Consistency (RC). Finally, we propose optimizations for better leasing policies and to handle program spinning. On a set of benchmarks, optimized Tardis improves on a full-map directory protocol in the metrics of performance, storage and network traffic, while being simpler to implement.
△ Less
Submitted 27 July, 2016; v1 submitted 27 November, 2015;
originally announced November 2015.
-
The polynomial representation of the type $A_{n - 1}$ rational Cherednik algebra in characteristic $p \mid n$
Authors:
Sheela Devadas,
Yi Sun
Abstract:
We study the polynomial representation of the rational Cherednik algebra of type $A_{n-1}$ with generic parameter in characteristic $p$ for $p \mid n$. We give explicit formulas for generators for the maximal proper graded submodule, show that they cut out a complete intersection, and thus compute the Hilbert series of the irreducible quotient. Our methods are motivated by taking characteristic…
▽ More
We study the polynomial representation of the rational Cherednik algebra of type $A_{n-1}$ with generic parameter in characteristic $p$ for $p \mid n$. We give explicit formulas for generators for the maximal proper graded submodule, show that they cut out a complete intersection, and thus compute the Hilbert series of the irreducible quotient. Our methods are motivated by taking characteristic $p$ analogues of existing characteristic $0$ results.
△ Less
Submitted 12 May, 2016; v1 submitted 28 May, 2015;
originally announced May 2015.
-
A Proof of Correctness for the Tardis Cache Coherence Protocol
Authors:
Xiangyao Yu,
Muralidaran Vijayaraghavan,
Srinivas Devadas
Abstract:
We prove the correctness of a recently-proposed cache coherence protocol, Tardis, which is simple, yet scalable to high processor counts, because it only requires O(logN) storage per cacheline for an N-processor system. We prove that Tardis follows the sequential consistency model and is both deadlock- and livelock-free. Our proof is based on simple and intuitive invariants of the system and thus…
▽ More
We prove the correctness of a recently-proposed cache coherence protocol, Tardis, which is simple, yet scalable to high processor counts, because it only requires O(logN) storage per cacheline for an N-processor system. We prove that Tardis follows the sequential consistency model and is both deadlock- and livelock-free. Our proof is based on simple and intuitive invariants of the system and thus applies to any system scale and many variants of Tardis.
△ Less
Submitted 24 May, 2015;
originally announced May 2015.
-
TARDIS: Timestamp based Coherence Algorithm for Distributed Shared Memory
Authors:
Xiangyao Yu,
Srinivas Devadas
Abstract:
A new memory coherence protocol, Tardis, is proposed. Tardis uses timestamp counters representing logical time as well as physical time to order memory operations and enforce sequential consistency in any type of shared memory system. Tardis is unique in that as compared to the widely-adopted directory coherence protocol, and its variants, it completely avoids multicasting and only requires O(log…
▽ More
A new memory coherence protocol, Tardis, is proposed. Tardis uses timestamp counters representing logical time as well as physical time to order memory operations and enforce sequential consistency in any type of shared memory system. Tardis is unique in that as compared to the widely-adopted directory coherence protocol, and its variants, it completely avoids multicasting and only requires O(log N) storage per cache block for an N-core system rather than O(N) sharer information. Tardis is simpler and easier to reason about, yet achieves similar performance to directory protocols on a wide range of benchmarks run on 16, 64 and 256 cores.
△ Less
Submitted 23 September, 2015; v1 submitted 19 January, 2015;
originally announced January 2015.
-
A Self-Tester for Linear Functions over the Integers with an Elementary Proof of Correctness
Authors:
Sheela Devadas,
Ronitt Rubinfeld
Abstract:
We present simple, self-contained proofs of correctness for algorithms for linearity testing and program checking of linear functions on finite subsets of integers represented as n-bit numbers. In addition we explore a generalization of self-testing to homomorphisms on a multidimensional vector space. We show that our self-testing algorithm for the univariate case can be directly generalized to ve…
▽ More
We present simple, self-contained proofs of correctness for algorithms for linearity testing and program checking of linear functions on finite subsets of integers represented as n-bit numbers. In addition we explore a generalization of self-testing to homomorphisms on a multidimensional vector space. We show that our self-testing algorithm for the univariate case can be directly generalized to vector space domains. The number of queries made by our algorithms is independent of domain size.
△ Less
Submitted 22 June, 2015; v1 submitted 17 December, 2014;
originally announced December 2014.
-
Representations of rational Cherednik algebras of G(m,r,n) in positive characteristic
Authors:
Sheela Devadas,
Steven V Sam
Abstract:
We study lowest-weight irreducible representations of rational Cherednik algebras attached to the complex reflection groups G(m,r,n) in characteristic p. Our approach is mostly from the perspective of commutative algebra. By studying the kernel of the contravariant bilinear form on Verma modules, we obtain formulas for Hilbert series of irreducible representations in a number of cases, and present…
▽ More
We study lowest-weight irreducible representations of rational Cherednik algebras attached to the complex reflection groups G(m,r,n) in characteristic p. Our approach is mostly from the perspective of commutative algebra. By studying the kernel of the contravariant bilinear form on Verma modules, we obtain formulas for Hilbert series of irreducible representations in a number of cases, and present conjectures in other cases. We observe that the form of the Hilbert series of the irreducible representations and the generators of the kernel tend to be determined by the value of n modulo p, and are related to special classes of subspace arrangements. Perhaps the most novel (conjectural) discovery from the commutative algebra perspective is that the generators of the kernel can be given the structure of a "matrix regular sequence" in some instances, which we prove in some small cases.
△ Less
Submitted 13 October, 2013; v1 submitted 3 April, 2013;
originally announced April 2013.
-
Path ORAM: An Extremely Simple Oblivious RAM Protocol
Authors:
Emil Stefanov,
Marten van Dijk,
Elaine Shi,
T-H. Hubert Chan,
Christopher Fletcher,
Ling Ren,
Xiangyao Yu,
Srinivas Devadas
Abstract:
We present Path ORAM, an extremely simple Oblivious RAM protocol with a small amount of client storage. Partly due to its simplicity, Path ORAM is the most practical ORAM scheme known to date with small client storage. We formally prove that Path ORAM has a O(log N) bandwidth cost for blocks of size B = Omega(log^2 N) bits. For such block sizes, Path ORAM is asymptotically better than the best kno…
▽ More
We present Path ORAM, an extremely simple Oblivious RAM protocol with a small amount of client storage. Partly due to its simplicity, Path ORAM is the most practical ORAM scheme known to date with small client storage. We formally prove that Path ORAM has a O(log N) bandwidth cost for blocks of size B = Omega(log^2 N) bits. For such block sizes, Path ORAM is asymptotically better than the best known ORAM schemes with small client storage. Due to its practicality, Path ORAM has been adopted in the design of secure processors since its proposal.
△ Less
Submitted 13 January, 2014; v1 submitted 23 February, 2012;
originally announced February 2012.
-
Knowledge Flow Analysis for Security Protocols
Authors:
Emina Torlak,
Marten van Dijk,
Blaise Gassend,
Daniel Jackson,
Srinivas Devadas
Abstract:
Knowledge flow analysis offers a simple and flexible way to find flaws in security protocols. A protocol is described by a collection of rules constraining the propagation of knowledge amongst principals. Because this characterization corresponds closely to informal descriptions of protocols, it allows a succinct and natural formalization; because it abstracts away message ordering, and handles…
▽ More
Knowledge flow analysis offers a simple and flexible way to find flaws in security protocols. A protocol is described by a collection of rules constraining the propagation of knowledge amongst principals. Because this characterization corresponds closely to informal descriptions of protocols, it allows a succinct and natural formalization; because it abstracts away message ordering, and handles communications between principals and applications of cryptographic primitives uniformly, it is readily represented in a standard logic. A generic framework in the Alloy modelling language is presented, and instantiated for two standard protocols, and a new key management scheme.
△ Less
Submitted 24 May, 2006;
originally announced May 2006.
-
A Generalized Two-Phase Analysis of Knowledge Flows in Security Protocols
Authors:
Marten van Dijk,
Emina Torlak,
Blaise Gassend,
Srinivas Devadas
Abstract:
We introduce knowledge flow analysis, a simple and flexible formalism for checking cryptographic protocols. Knowledge flows provide a uniform language for expressing the actions of principals, assump- tions about intruders, and the properties of cryptographic primitives. Our approach enables a generalized two-phase analysis: we extend the two-phase theory by identifying the necessary and suffici…
▽ More
We introduce knowledge flow analysis, a simple and flexible formalism for checking cryptographic protocols. Knowledge flows provide a uniform language for expressing the actions of principals, assump- tions about intruders, and the properties of cryptographic primitives. Our approach enables a generalized two-phase analysis: we extend the two-phase theory by identifying the necessary and sufficient proper- ties of a broad class of cryptographic primitives for which the theory holds. We also contribute a library of standard primitives and show that they satisfy our criteria.
△ Less
Submitted 22 May, 2006;
originally announced May 2006.