Search | arXiv e-print repository

Teaching an Old Dog New Tricks: Verifiable FHE Using Commodity Hardware

Authors: Jules Drean, Fisher Jepsen, Edward Suh, Srini Devadas, Aamer Jaleel, Gururaj Saileshwar

Abstract: We present Argos, a simple approach for adding verifiability to fully homomorphic encryption (FHE) schemes using trusted hardware. Traditional approaches to verifiable FHE require expensive cryptographic proofs, which incur an overhead of up to seven orders of magnitude on top of FHE, making them impractical. With Argos, we show that trusted hardware can be securely used to provide verifiability… ▽ More We present Argos, a simple approach for adding verifiability to fully homomorphic encryption (FHE) schemes using trusted hardware. Traditional approaches to verifiable FHE require expensive cryptographic proofs, which incur an overhead of up to seven orders of magnitude on top of FHE, making them impractical. With Argos, we show that trusted hardware can be securely used to provide verifiability for FHE computations, with minimal overhead relative to the baseline FHE computation. An important contribution of Argos is showing that the major security pitfall associated with trusted hardware, microarchitectural side channels, can be completely mitigated by excluding any secrets from the CPU and the memory hierarchy. This is made possible by focusing on building a platform that only enforces program and data integrity and not confidentiality. All secrets related to the attestation mechanism are kept in a separate coprocessor (e.g., a TPM)-inaccessible to any software-based attacker. Relying on a discrete TPM typically incurs significant performance overhead, which is why (insecure) software-based TPMs are used in practice. As a second contribution, we show that for FHE applications, the attestation protocol can be adapted to only incur a fixed cost. Argos requires no dedicated hardware extensions and is supported on commodity processors from 2008 onward. Our prototype implementation introduces 3% overhead for FHE evaluation, and 8% for more complex protocols. In particular, we show that Argos can be used for real-world applications of FHE, such as private information retrieval (PIR) and private set intersection (PSI), where providing verifiability is imperative. By demonstrating how to combine cryptography with trusted hardware, Argos paves the way for widespread deployment of FHE-based protocols beyond the semi-honest setting, without the overhead of cryptographic proofs. △ Less

Submitted 24 March, 2025; v1 submitted 4 December, 2024; originally announced December 2024.

arXiv:2408.12576 [pdf, ps, other]

Higher-weight Jacobians

Authors: Sheela Devadas, Max Lieblich

Abstract: We define and study Jacobians of Hodge structures with weight greater than 1. Jacobians of weight 2 naturally come up in the context of the Brauer group and the Tate conjecture. They were previously studied in a special case by Beauville in his work on surfaces of maximal Picard number, and are related to the work of Totaro on Hodge structures with no middle pieces. Higher-weight Jacobians are com… ▽ More We define and study Jacobians of Hodge structures with weight greater than 1. Jacobians of weight 2 naturally come up in the context of the Brauer group and the Tate conjecture. They were previously studied in a special case by Beauville in his work on surfaces of maximal Picard number, and are related to the work of Totaro on Hodge structures with no middle pieces. Higher-weight Jacobians are complex tori, and it is generally quite difficult to tell if they are algebraic. After discussing some general theory, we compute numerous examples of Jacobians of various weights for special classes of varieties: abelian varieties of maximal Picard number, Kummer varieties, and singular K3 surfaces. It turns out that all of these Jacobians are algebraic. We compute their fields of definition. △ Less

Submitted 9 February, 2025; v1 submitted 22 August, 2024; originally announced August 2024.

Comments: v4: change Corollary 2.7 and Proposition 2.8. 28 pages including appendix

arXiv:2309.02672 [pdf, other]

Geometry of Sensitivity: Twice Sampling and Hybrid Clipping in Differential Privacy with Optimal Gaussian Noise and Application to Deep Learning

Authors: Hanshen Xiao, Jun Wan, Srinivas Devadas

Abstract: We study the fundamental problem of the construction of optimal randomization in Differential Privacy. Depending on the clipping strategy or additional properties of the processing function, the corresponding sensitivity set theoretically determines the necessary randomization to produce the required security parameters. Towards the optimal utility-privacy tradeoff, finding the minimal perturbatio… ▽ More We study the fundamental problem of the construction of optimal randomization in Differential Privacy. Depending on the clipping strategy or additional properties of the processing function, the corresponding sensitivity set theoretically determines the necessary randomization to produce the required security parameters. Towards the optimal utility-privacy tradeoff, finding the minimal perturbation for properly-selected sensitivity sets stands as a central problem in DP research. In practice, l_2/l_1-norm clippings with Gaussian/Laplace noise mechanisms are among the most common setups. However, they also suffer from the curse of dimensionality. For more generic clipping strategies, the understanding of the optimal noise for a high-dimensional sensitivity set remains limited. In this paper, we revisit the geometry of high-dimensional sensitivity sets and present a series of results to characterize the non-asymptotically optimal Gaussian noise for Rényi DP (RDP). Our results are both negative and positive: on one hand, we show the curse of dimensionality is tight for a broad class of sensitivity sets satisfying certain symmetry properties; but if, fortunately, the representation of the sensitivity set is asymmetric on some group of orthogonal bases, we show the optimal noise bounds need not be explicitly dependent on either dimension or rank. We also revisit sampling in the high-dimensional scenario, which is the key for both privacy amplification and computation efficiency in large-scale data processing. We propose a novel method, termed twice sampling, which implements both sample-wise and coordinate-wise sampling, to enable Gaussian noises to fit the sensitivity geometry more closely. With closed-form RDP analysis, we prove twice sampling produces asymptotic improvement of the privacy amplification given an additional infinity-norm restriction, especially for small sampling rate. △ Less

Submitted 28 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

Journal ref: ACM CCS 2023

arXiv:2306.14882 [pdf, other]

Citadel: Simple Spectre-Safe Isolation For Real-World Programs That Share Memory

Authors: Jules Drean, Miguel Gomez-Garcia, Fisher Jepsen, Thomas Bourgeat, Srinivas Devadas

Abstract: Transient execution side-channel attacks, such as Spectre, have been shown to break almost all isolation primitives. We introduce a new security property we call relaxed microarchitectural isolation (RMI) that allows sensitive programs that are not-constant-time to share memory with an attacker while restricting the information leakage to that of non-speculative execution. Although this type of sp… ▽ More Transient execution side-channel attacks, such as Spectre, have been shown to break almost all isolation primitives. We introduce a new security property we call relaxed microarchitectural isolation (RMI) that allows sensitive programs that are not-constant-time to share memory with an attacker while restricting the information leakage to that of non-speculative execution. Although this type of speculative security property is typically challenging to enforce, we show that we can leverage the enclave setup to achieve it. In particular, we use microarchitectural isolation to restrict attacker's observations in conjunction with straightforward hardware mechanisms to limit speculation. This new design point presents a compelling trade-off between security, usability, and performance, making it possible to efficiently enforce RMI for any program. We demonstrate our approach by implementing and evaluating two simple defense mechanisms that satisfy RMI: (1) Safe mode, which disables speculative accesses to shared memory, and (2) Burst mode, a localized performance optimization that requires simple program analysis on small code snippets. Our end-to-end prototype, Citadel, consists of an FPGA-based multicore processor that boots Linux and runs secure applications, including cryptographic libraries and private inference, with less than 5% performance overhead. △ Less

Submitted 6 February, 2025; v1 submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.01722 [pdf, ps, other]

doi 10.1017/S147474802500009X

GAGA for Henselian schemes

Authors: Sheela Devadas

Abstract: The global analogue of a Henselian local ring is a Henselian pair-a ring R and an ideal I which satisfy a condition resembling Hensel's lemma regarding lifting coprime factorizations of monic polynomials over R/I to factorizations over R. The geometric counterpart is the notion of a Henselian scheme, which can serve as a substitute for formal schemes in applications such as deformation theory. In… ▽ More The global analogue of a Henselian local ring is a Henselian pair-a ring R and an ideal I which satisfy a condition resembling Hensel's lemma regarding lifting coprime factorizations of monic polynomials over R/I to factorizations over R. The geometric counterpart is the notion of a Henselian scheme, which can serve as a substitute for formal schemes in applications such as deformation theory. In this paper we prove a GAGA-style cohomology comparison result for Henselian schemes in positive characteristic, making use of a "Henselian étale" topology defined in previous work in order to leverage exactness of finite pushforward for abelian sheaves in the étale topology of schemes. We will also discuss algebraizability of coherent sheaves on the Henselization of a proper scheme, proving (without a positive characteristic restriction) algebraizability for coherent subsheaves. We can then deduce a Henselian version of Chow's theorem on algebraization and the algebraizability of maps between Henselizations of proper schemes. △ Less

Submitted 28 January, 2025; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: To appear in Journal of the Institute of Mathematics of Jussieu. Minor change to Theorem 3.6.2. 48 pages (including appendix)

MSC Class: 14A20; 13J15

Journal ref: J. Inst. Math. Jussieu 24 (2025) 1547-1589

arXiv:2212.08644 [pdf, ps, other]

Henselian schemes in positive characteristic

Authors: Sheela Devadas

Abstract: The global analogue of a Henselian local ring is a Henselian pair: a ring A and an ideal I which satisfy a condition resembling Hensel's lemma regarding lifting coprime factorizations of polynomials over A/I to factorizations over A. The geometric counterpart is the notion of a Henselian scheme, which is an analogue of a tubular neighborhood in algebraic geometry. In this paper we revisit the fo… ▽ More The global analogue of a Henselian local ring is a Henselian pair: a ring A and an ideal I which satisfy a condition resembling Hensel's lemma regarding lifting coprime factorizations of polynomials over A/I to factorizations over A. The geometric counterpart is the notion of a Henselian scheme, which is an analogue of a tubular neighborhood in algebraic geometry. In this paper we revisit the foundations of the theory of Henselian schemes. The pathological behavior of quasi-coherent sheaves on Henselian schemes in characteristic 0 makes them poor models for an "algebraic tube" in characteristic 0. We show that such problems do not arise in positive characteristic, and establish good properties for analogues of smooth and étale maps in the general Henselian setting. △ Less

Submitted 23 June, 2024; v1 submitted 16 December, 2022; originally announced December 2022.

Comments: v3: 55 pages (including appendix). edited: proof of Corollary 6.11, some bibliography entries

MSC Class: 14A20; 13J15

arXiv:2212.02312 [pdf]

Microscopic study of effective 2+1 dimensional gravity in ferrofluid-based hyperbolic metamaterials

Authors: Vera N. Smolyaninova, Jonathon Cartelli, Nathaniel Christopher, Benjamin Kist, Jonathan Perry, Stephanie Spickard, Mary Sajini Devadas, Igor I. Smolyaninov

Abstract: Recent theoretical and experimental work demonstrated that nonlinear optics of ferrofluid-based hyperbolic metamaterials exhibits very unusual 2+2-dimensional spatiotemporal dynamics. Here we report a detailed microscopic study of mutual interactions of individual self-focused optical filaments inside this metamaterial. In agreement with theoretical expectations, the observed mutual interactions o… ▽ More Recent theoretical and experimental work demonstrated that nonlinear optics of ferrofluid-based hyperbolic metamaterials exhibits very unusual 2+2-dimensional spatiotemporal dynamics. Here we report a detailed microscopic study of mutual interactions of individual self-focused optical filaments inside this metamaterial. In agreement with theoretical expectations, the observed mutual interactions of individual filaments exhibit strong similarities with general relativity in 2+1 dimensions. This observation is very important since 2+1-dimensional gravity is an exactly solvable theory even in the quantum gravity limit. △ Less

Submitted 2 December, 2022; originally announced December 2022.

Comments: 17 pages, 4 figures

arXiv:2210.03843 [pdf, other]

Differentially Private Deep Learning with ModelMix

Authors: Hanshen Xiao, Jun Wan, Srinivas Devadas

Abstract: Training large neural networks with meaningful/usable differential privacy security guarantees is a demanding challenge. In this paper, we tackle this problem by revisiting the two key operations in Differentially Private Stochastic Gradient Descent (DP-SGD): 1) iterative perturbation and 2) gradient clipping. We propose a generic optimization framework, called {\em ModelMix}, which performs rando… ▽ More Training large neural networks with meaningful/usable differential privacy security guarantees is a demanding challenge. In this paper, we tackle this problem by revisiting the two key operations in Differentially Private Stochastic Gradient Descent (DP-SGD): 1) iterative perturbation and 2) gradient clipping. We propose a generic optimization framework, called {\em ModelMix}, which performs random aggregation of intermediate model states. It strengthens the composite privacy analysis utilizing the entropy of the training trajectory and improves the $(ε, δ)$ DP security parameters by an order of magnitude. We provide rigorous analyses for both the utility guarantees and privacy amplification of ModelMix. In particular, we present a formal study on the effect of gradient clipping in DP-SGD, which provides theoretical instruction on how hyper-parameters should be selected. We also introduce a refined gradient clipping method, which can further sharpen the privacy loss in private learning when combined with ModelMix. Thorough experiments with significant privacy/utility improvement are presented to support our theory. We train a Resnet-20 network on CIFAR10 with $70.4\%$ accuracy via ModelMix given $(ε=8, δ=10^{-5})$ DP-budget, compared to the same performance but with $(ε=145.8,δ=10^{-5})$ using regular DP-SGD; assisted with additional public low-dimensional gradient embedding, one can further improve the accuracy to $79.1\%$ with $(ε=6.1, δ=10^{-5})$ DP-budget, compared to the same performance but with $(ε=111.2, δ=10^{-5})$ without ModelMix. △ Less

Submitted 7 October, 2022; originally announced October 2022.

arXiv:2210.03458 [pdf, ps, other]

PAC Privacy: Automatic Privacy Measurement and Control of Data Processing

Authors: Hanshen Xiao, Srinivas Devadas

Abstract: We propose and study a new privacy definition, termed Probably Approximately Correct (PAC) Privacy. PAC Privacy characterizes the information-theoretic hardness to recover sensitive data given arbitrary information disclosure/leakage during/after any processing. Unlike the classic cryptographic definition and Differential Privacy (DP), which consider the adversarial (input-independent) worst case,… ▽ More We propose and study a new privacy definition, termed Probably Approximately Correct (PAC) Privacy. PAC Privacy characterizes the information-theoretic hardness to recover sensitive data given arbitrary information disclosure/leakage during/after any processing. Unlike the classic cryptographic definition and Differential Privacy (DP), which consider the adversarial (input-independent) worst case, PAC Privacy is a simulatable metric that quantifies the instance-based impossibility of inference. A fully automatic analysis and proof generation framework is proposed: security parameters can be produced with arbitrarily high confidence via Monte-Carlo simulation for any black-box data processing oracle. This appealing automation property enables analysis of complicated data processing, where the worst-case proof in the classic privacy regime could be loose or even intractable. Moreover, we show that the produced PAC Privacy guarantees enjoy simple composition bounds and the automatic analysis framework can be implemented in an online fashion to analyze the composite PAC Privacy loss even under correlated randomness. On the utility side, the magnitude of (necessary) perturbation required in PAC Privacy is not lower bounded by Theta(\sqrt{d}) for a d-dimensional release but could be O(1) for many practical data processing tasks, which is in contrast to the input-independent worst-case information-theoretic lower bound. Example applications of PAC Privacy are included with comparisons to existing works. △ Less

Submitted 19 June, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

arXiv:2204.04489 [pdf, other]

ShorTor: Improving Tor Network Latency via Multi-hop Overlay Routing

Authors: Kyle Hogan, Sacha Servan-Schreiber, Zachary Newman, Ben Weintraub, Cristina Nita-Rotaru, Srinivas Devadas

Abstract: We present ShorTor, a protocol for reducing latency on the Tor network. ShorTor uses multi-hop overlay routing, a technique typically employed by content delivery networks, to influence the route Tor traffic takes across the internet. ShorTor functions as an overlay on top of onion routing-Tor's existing routing protocol and is run by Tor relays, making it independent of the path selection perform… ▽ More We present ShorTor, a protocol for reducing latency on the Tor network. ShorTor uses multi-hop overlay routing, a technique typically employed by content delivery networks, to influence the route Tor traffic takes across the internet. ShorTor functions as an overlay on top of onion routing-Tor's existing routing protocol and is run by Tor relays, making it independent of the path selection performed by Tor clients. As such, ShorTor reduces latency while preserving Tor's existing security properties. Specifically, the routes taken in ShorTor are in no way correlated to either the Tor user or their destination, including the geographic location of either party. We analyze the security of ShorTor using the AnoA framework, showing that ShorTor maintains all of Tor's anonymity guarantees. We augment our theoretical claims with an empirical analysis. To evaluate ShorTor's performance, we collect a real-world dataset of over 400,000 latency measurements between the 1,000 most popular Tor relays, which collectively see the vast majority of Tor traffic. With this data, we identify pairs of relays that could benefit from ShorTor: that is, two relays where introducing an additional intermediate network hop results in lower latency than the direct route between them. We use our measurement dataset to simulate the impact on end users by applying ShorTor to two million Tor circuits chosen according to Tor's specification. ShorTor reduces the latency for the 99th percentile of relay pairs in Tor by 148 ms. Similarly, ShorTor reduces the latency of Tor circuits by 122 ms at the 99th percentile. In practice, this translates to ShorTor truncating tail latencies for Tor which has a direct impact on page load times and, consequently, user experience on the Tor browser. △ Less

Submitted 9 April, 2022; originally announced April 2022.

arXiv:2109.05371 [pdf, other]

F1: A Fast and Programmable Accelerator for Fully Homomorphic Encryption (Extended Version)

Authors: Axel Feldmann, Nikola Samardzic, Aleksandar Krastev, Srini Devadas, Ron Dreslinski, Karim Eldefrawy, Nicholas Genise, Chris Peikert, Daniel Sanchez

Abstract: Fully Homomorphic Encryption (FHE) allows computing on encrypted data, enabling secure offloading of computation to untrusted serves. Though it provides ideal security, FHE is expensive when executed in software, 4 to 5 orders of magnitude slower than computing on unencrypted data. These overheads are a major barrier to FHE's widespread adoption. We present F1, the first FHE accelerator that is pr… ▽ More Fully Homomorphic Encryption (FHE) allows computing on encrypted data, enabling secure offloading of computation to untrusted serves. Though it provides ideal security, FHE is expensive when executed in software, 4 to 5 orders of magnitude slower than computing on unencrypted data. These overheads are a major barrier to FHE's widespread adoption. We present F1, the first FHE accelerator that is programmable, i.e., capable of executing full FHE programs. F1 builds on an in-depth architectural analysis of the characteristics of FHE computations that reveals acceleration opportunities. F1 is a wide-vector processor with novel functional units deeply specialized to FHE primitives, such as modular arithmetic, number-theoretic transforms, and structured permutations. This organization provides so much compute throughput that data movement becomes the bottleneck. Thus, F1 is primarily designed to minimize data movement. The F1 hardware provides an explicitly managed memory hierarchy and mechanisms to decouple data movement from execution. A novel compiler leverages these mechanisms to maximize reuse and schedule off-chip and on-chip data movement. We evaluate F1 using cycle-accurate simulations and RTL synthesis. F1 is the first system to accelerate complete FHE programs and outperforms state-of-the-art software implementations by gmean 5400x and by up to 17000x. These speedups counter most of FHE's overheads and enable new applications, like real-time private deep learning in the cloud. △ Less

Submitted 25 September, 2021; v1 submitted 11 September, 2021; originally announced September 2021.

arXiv:2010.11082 [pdf, ps, other]

On Differentially Private Stochastic Convex Optimization with Heavy-tailed Data

Authors: Di Wang, Hanshen Xiao, Srini Devadas, Jinhui Xu

Abstract: In this paper, we consider the problem of designing Differentially Private (DP) algorithms for Stochastic Convex Optimization (SCO) on heavy-tailed data. The irregularity of such data violates some key assumptions used in almost all existing DP-SCO and DP-ERM methods, resulting in failure to provide the DP guarantees. To better understand this type of challenges, we provide in this paper a compreh… ▽ More In this paper, we consider the problem of designing Differentially Private (DP) algorithms for Stochastic Convex Optimization (SCO) on heavy-tailed data. The irregularity of such data violates some key assumptions used in almost all existing DP-SCO and DP-ERM methods, resulting in failure to provide the DP guarantees. To better understand this type of challenges, we provide in this paper a comprehensive study of DP-SCO under various settings. First, we consider the case where the loss function is strongly convex and smooth. For this case, we propose a method based on the sample-and-aggregate framework, which has an excess population risk of $\tilde{O}(\frac{d^3}{nε^4})$ (after omitting other factors), where $n$ is the sample size and $d$ is the dimensionality of the data. Then, we show that with some additional assumptions on the loss functions, it is possible to reduce the \textit{expected} excess population risk to $\tilde{O}(\frac{ d^2}{ nε^2 })$. To lift these additional conditions, we also provide a gradient smoothing and trimming based scheme to achieve excess population risks of $\tilde{O}(\frac{ d^2}{nε^2})$ and $\tilde{O}(\frac{d^\frac{2}{3}}{(nε^2)^\frac{1}{3}})$ for strongly convex and general convex loss functions, respectively, \textit{with high probability}. Experiments suggest that our algorithms can effectively deal with the challenges caused by data irregularity. △ Less

Submitted 21 October, 2020; originally announced October 2020.

Comments: Published in ICML 2020

arXiv:2010.06760 [pdf, other]

Taurus: Lightweight Parallel Logging for In-Memory Database Management Systems (Extended Version)

Authors: Yu Xia, Xiangyao Yu, Andrew Pavlo, Srinivas Devadas

Abstract: Existing single-stream logging schemes are unsuitable for in-memory database management systems (DBMSs) as the single log is often a performance bottleneck. To overcome this problem, we present Taurus, an efficient parallel logging scheme that uses multiple log streams, and is compatible with both data and command logging. Taurus tracks and encodes transaction dependencies using a vector of log se… ▽ More Existing single-stream logging schemes are unsuitable for in-memory database management systems (DBMSs) as the single log is often a performance bottleneck. To overcome this problem, we present Taurus, an efficient parallel logging scheme that uses multiple log streams, and is compatible with both data and command logging. Taurus tracks and encodes transaction dependencies using a vector of log sequence numbers (LSNs). These vectors ensure that the dependencies are fully captured in logging and correctly enforced in recovery. Our experimental evaluation with an in-memory DBMS shows that Taurus's parallel logging achieves up to 9.9x and 2.9x speedups over single-streamed data logging and command logging, respectively. It also enables the DBMS to recover up to 22.9x and 75.6x faster than these baselines for data and command logging, respectively. We also compare Taurus with two state-of-the-art parallel logging schemes and show that the DBMS achieves up to 2.8x better performance on NVMe drives and 9.2x on HDDs. △ Less

Submitted 13 October, 2020; originally announced October 2020.

arXiv:2003.01564 [pdf]

doi 10.1117/1.AP.2.5.056001

Experimental observation of effective gravity and two times physics in ferrofluid-based hyperbolic metamaterials

Authors: V. N. Smolyaninova, J. Cartelli, B. Augstein, S. Spickard, M. S. Devadas, I. I. Smolyaninov

Abstract: Recently it was proposed that extraordinary light waves in hyperbolic metamaterials exhibit two times physics behavior (Phys. Rev. Lett. 105, 067402, 2010). We report experimental observation of this effect via investigation of gravity-like nonlinear optics of iron/cobalt-based ferrofluid hyperbolic metamaterials. In addition to conventional temporal coordinate, the spatial coordinate oriented alo… ▽ More Recently it was proposed that extraordinary light waves in hyperbolic metamaterials exhibit two times physics behavior (Phys. Rev. Lett. 105, 067402, 2010). We report experimental observation of this effect via investigation of gravity-like nonlinear optics of iron/cobalt-based ferrofluid hyperbolic metamaterials. In addition to conventional temporal coordinate, the spatial coordinate oriented along the optical axis of the metamaterial also exhibits timelike character, which leads to very unusual two times physics behavior in these systems. △ Less

Submitted 20 August, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

Comments: 23 pages, 4 figures, this version is accepted for publication in Advanced Photonics

Journal ref: Advanced Photonics 2, 056001 (2020)

arXiv:1902.06101 [pdf, other]

Local Differential Privacy in Decentralized Optimization

Authors: Hanshen Xiao, Yu Ye, Srinivas Devadas

Abstract: Privacy concerns with sensitive data are receiving increasing attention. In this paper, we study local differential privacy (LDP) in interactive decentralized optimization. By constructing random local aggregators, we propose a framework to amplify LDP by a constant. We take Alternating Direction Method of Multipliers (ADMM), and decentralized gradient descent as two concrete examples, where exper… ▽ More Privacy concerns with sensitive data are receiving increasing attention. In this paper, we study local differential privacy (LDP) in interactive decentralized optimization. By constructing random local aggregators, we propose a framework to amplify LDP by a constant. We take Alternating Direction Method of Multipliers (ADMM), and decentralized gradient descent as two concrete examples, where experiments support our theory. In an asymptotic view, we address the following question: Under LDP, is it possible to design a distributed private minimizer for arbitrary closed convex constraints with utility loss not explicitly dependent on dimensionality? As an affiliated result, we also show that with merely linear secret sharing, information theoretic privacy is achievable for bounded colluding agents. △ Less

Submitted 1 June, 2019; v1 submitted 16 February, 2019; originally announced February 2019.

arXiv:1901.04368 [pdf, other]

XRD: Scalable Messaging System with Cryptographic Privacy

Authors: Albert Kwon, David Lu, Srinivas Devadas

Abstract: Even as end-to-end encrypted communication becomes more popular, private messaging remains a challenging problem due to metadata leakages, such as who is communicating with whom. Most existing systems that hide communication metadata either (1) do not scale easily, (2) incur significant overheads, or (3) provide weaker guarantees than cryptographic privacy, such as differential privacy or heuristi… ▽ More Even as end-to-end encrypted communication becomes more popular, private messaging remains a challenging problem due to metadata leakages, such as who is communicating with whom. Most existing systems that hide communication metadata either (1) do not scale easily, (2) incur significant overheads, or (3) provide weaker guarantees than cryptographic privacy, such as differential privacy or heuristic privacy. This paper presents XRD (short for Crossroads), a metadata private messaging system that provides cryptographic privacy, while scaling easily to support more users by adding more servers. At a high level, XRD uses multiple mix networks in parallel with several techniques, including a novel technique we call aggregate hybrid shuffle. As a result, XRD can support 2 million users with 251 seconds of latency with 100 servers. This is 12x and 3.7x faster than Atom and Pung, respectively, which are prior scalable messaging systems with cryptographic privacy. △ Less

Submitted 14 January, 2019; originally announced January 2019.

arXiv:1812.10605 [pdf, other]

Sanctorum: A lightweight security monitor for secure enclaves

Authors: Ilia Lebedev, Kyle Hogan, Jules Drean, David Kohlbrenner, Dayeol Lee, Krste Asanović, Dawn Song, Srinivas Devadas

Abstract: Enclaves have emerged as a particularly compelling primitive to implement trusted execution environments: strongly isolated sensitive user-mode processes in a largely untrusted software environment. While the threat models employed by various enclave systems differ, the high-level guarantees they offer are essentially the same: attestation of an enclave's initial state, as well as a guarantee of e… ▽ More Enclaves have emerged as a particularly compelling primitive to implement trusted execution environments: strongly isolated sensitive user-mode processes in a largely untrusted software environment. While the threat models employed by various enclave systems differ, the high-level guarantees they offer are essentially the same: attestation of an enclave's initial state, as well as a guarantee of enclave integrity and privacy in the presence of an adversary. This work describes Sanctorum, a small trusted code base (TCB), consisting of a generic enclave-capable system, which is sufficient to implement secure enclaves akin to the primitive offered by Intel's SGX. While enclaves may be implemented via unconditionally trusted hardware and microcode, as it is the case in SGX, we employ a smaller TCB principally consisting of authenticated, privileged software, which may be replaced or patched as needed. Sanctorum implements a formally verified specification for generic enclaves on an in-order multiprocessor system meeting baseline security requirements, e.g., the MIT Sanctum processor and the Keystone enclave framework. Sanctorum requires trustworthy hardware including a random number generator, a private cryptographic key pair derived via a secure bootstrapping protocol, and a robust isolation primitive to safeguard sensitive information. Sanctorum's threat model is informed by the threat model of the isolation primitive, and is suitable for adding enclaves to a variety of processor systems. △ Less

Submitted 26 December, 2018; originally announced December 2018.

Comments: 6 pages

arXiv:1812.09822 [pdf, other]

MI6: Secure Enclaves in a Speculative Out-of-Order Processor

Authors: Thomas Bourgeat, Ilia Lebedev, Andrew Wright, Sizhuo Zhang, Arvind, Srinivas Devadas

Abstract: Recent attacks have broken process isolation by exploiting microarchitectural side channels that allow indirect access to shared microarchitectural state. Enclaves strengthen the process abstraction to restore isolation guarantees. We propose MI6, an aggressive, speculative out-of-order processor capable of providing secure enclaves under a threat model that includes an untrusted OS and an attac… ▽ More Recent attacks have broken process isolation by exploiting microarchitectural side channels that allow indirect access to shared microarchitectural state. Enclaves strengthen the process abstraction to restore isolation guarantees. We propose MI6, an aggressive, speculative out-of-order processor capable of providing secure enclaves under a threat model that includes an untrusted OS and an attacker capable of mounting any software attack currently considered practical, including control flow speculation attacks. MI6 is inspired by Sanctum [16] and extends its isolation guarantee to more realistic memory hierarchies. It also introduces a purge instruction, which is used only when a secure process is scheduled, and implements it for a complex processor microarchitecture. We model the performance impact of enclaves in MI6 through FPGA emulation on AWS F1 FPGAs by running SPEC CINT2006 benchmarks on top of an untrusted Linux OS. Security comes at the cost of approximately 16.4% average slowdown for protected programs. △ Less

Submitted 29 August, 2019; v1 submitted 23 December, 2018; originally announced December 2018.

Comments: 15 pages

arXiv:1802.10215 [pdf, other]

doi 10.2478/popets-2019-0070

Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning

Authors: Sanjit Bhat, David Lu, Albert Kwon, Srinivas Devadas

Abstract: In recent years, there have been several works that use website fingerprinting techniques to enable a local adversary to determine which website a Tor user visits. While the current state-of-the-art attack, which uses deep learning, outperforms prior art with medium to large amounts of data, it attains marginal to no accuracy improvements when both use small amounts of training data. In this work,… ▽ More In recent years, there have been several works that use website fingerprinting techniques to enable a local adversary to determine which website a Tor user visits. While the current state-of-the-art attack, which uses deep learning, outperforms prior art with medium to large amounts of data, it attains marginal to no accuracy improvements when both use small amounts of training data. In this work, we propose Var-CNN, a website fingerprinting attack that leverages deep learning techniques along with novel insights specific to packet sequence classification. In open-world settings with large amounts of data, Var-CNN attains over $1\%$ higher true positive rate (TPR) than state-of-the-art attacks while achieving $4\times$ lower false positive rate (FPR). Var-CNN's improvements are especially notable in low-data scenarios, where it reduces the FPR of prior art by $3.12\%$ while increasing the TPR by $13\%$. Overall, insights used to develop Var-CNN can be applied to future deep learning based attacks, and substantially reduce the amount of training data needed to perform a successful website fingerprinting attack. This shortens the time needed for data collection and lowers the likelihood of having data staleness issues. △ Less

Submitted 23 July, 2019; v1 submitted 27 February, 2018; originally announced February 2018.

Comments: Original paper split into "Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning" (PETS 2019) and "DynaFlow: An Efficient Website Fingerprinting Defense Based on Dynamically-Adjusting Flows" (WPES 2018)

Journal ref: Proceedings on Privacy Enhancing Technologies. 2019 (4):292-310

arXiv:1704.02677 [pdf, other]

Banshee: Bandwidth-Efficient DRAM Caching Via Software/Hardware Cooperation

Authors: Xiangyao Yu, Christopher J. Hughes, Nadathur Satish, Onur Mutlu, Srinivas Devadas

Abstract: Putting the DRAM on the same package with a processor enables several times higher memory bandwidth than conventional off-package DRAM. Yet, the latency of in-package DRAM is not appreciably lower than that of off-package DRAM. A promising use of in-package DRAM is as a large cache. Unfortunately, most previous DRAM cache designs mainly optimize for hit latency and do not consider off-chip bandwid… ▽ More Putting the DRAM on the same package with a processor enables several times higher memory bandwidth than conventional off-package DRAM. Yet, the latency of in-package DRAM is not appreciably lower than that of off-package DRAM. A promising use of in-package DRAM is as a large cache. Unfortunately, most previous DRAM cache designs mainly optimize for hit latency and do not consider off-chip bandwidth efficiency as a first-class design constraint. Hence, as we show in this paper, these designs are suboptimal for use with in-package DRAM. We propose a new DRAM cache design, Banshee, that optimizes for both in- and off-package DRAM bandwidth efficiency without degrading access latency. The key ideas are to eliminate the in-package DRAM bandwidth overheads due to costly tag accesses through virtual memory mechanism and to incorporate a bandwidth-aware frequency-based replacement policy that is biased to reduce unnecessary traffic to off-package DRAM. Our extensive evaluation shows that Banshee provides significant performance improvement and traffic reduction over state-of-the-art latency-optimized DRAM cache designs. △ Less

Submitted 9 April, 2017; originally announced April 2017.

Comments: 12 pages

arXiv:1704.02397 [pdf, other]

Efficient Synchronous Byzantine Consensus

Authors: Ittai Abraham, Srinivas Devadas, Danny Dolev, Kartik Nayak, Ling Ren

Abstract: We present new protocols for Byzantine state machine replication and Byzantine agreement in the synchronous and authenticated setting. The celebrated PBFT state machine replication protocol tolerates $f$ Byzantine faults in an asynchronous setting using $3f+1$ replicas, and has since been studied or deployed by numerous works. In this work, we improve the Byzantine fault tolerance threshold to… ▽ More We present new protocols for Byzantine state machine replication and Byzantine agreement in the synchronous and authenticated setting. The celebrated PBFT state machine replication protocol tolerates $f$ Byzantine faults in an asynchronous setting using $3f+1$ replicas, and has since been studied or deployed by numerous works. In this work, we improve the Byzantine fault tolerance threshold to $n=2f+1$ by utilizing a relaxed synchrony assumption. We present a synchronous state machine replication protocol that commits a decision every 3 rounds in the common case. The key challenge is to ensure quorum intersection at one honest replica. Our solution is to rely on the synchrony assumption to form a post-commit quorum of size $2f+1$, which intersects at $f+1$ replicas with any pre-commit quorums of size $f+1$. Our protocol also solves synchronous authenticated Byzantine agreement in expected 8 rounds. The best previous solution (Katz and Koo, 2006) requires expected 24 rounds. Our protocols may be applied to build Byzantine fault tolerant systems or improve cryptographic protocols such as cryptocurrencies when synchrony can be assumed. △ Less

Submitted 12 September, 2017; v1 submitted 7 April, 2017; originally announced April 2017.

arXiv:1612.07841 [pdf, other]

Atom: Horizontally Scaling Strong Anonymity

Authors: Albert Kwon, Henry Corrigan-Gibbs, Srinivas Devadas, Bryan Ford

Abstract: Atom is an anonymous messaging system that protects against traffic-analysis attacks. Unlike many prior systems, each Atom server touches only a small fraction of the total messages routed through the network. As a result, the system's capacity scales near-linearly with the number of servers. At the same time, each Atom user benefits from "best possible" anonymity: a user is anonymous among all ho… ▽ More Atom is an anonymous messaging system that protects against traffic-analysis attacks. Unlike many prior systems, each Atom server touches only a small fraction of the total messages routed through the network. As a result, the system's capacity scales near-linearly with the number of servers. At the same time, each Atom user benefits from "best possible" anonymity: a user is anonymous among all honest users of the system, against an active adversary who controls the entire network, a portion of the system's servers, and any number of malicious users. The architectural ideas behind Atom have been known in theory, but putting them into practice requires new techniques for (1) avoiding the reliance on heavy general-purpose multi-party computation protocols, (2) defeating active attacks by malicious servers at minimal performance cost, and (3) handling server failure and churn. Atom is most suitable for sending a large number of short messages, as in a microblogging application or a high-security communication bootstrapping ("dialing") for private messaging systems. We show that, on a heterogeneous network of 1,024 servers, Atom can transit a million Tweet-length messages in 28 minutes. This is over 23x faster than prior systems with similar privacy guarantees. △ Less

Submitted 2 October, 2017; v1 submitted 22 December, 2016; originally announced December 2016.

arXiv:1511.08774 [pdf, other]

Tardis 2.0: Optimized Time Traveling Coherence for Relaxed Consistency Models

Authors: Xiangyao Yu, Hongzhe Liu, Ethan Zou, Srinivas Devadas

Abstract: Cache coherence scalability is a big challenge in shared memory systems. Traditional protocols do not scale due to the storage and traffic overhead of cache invalidation. Tardis, a recently proposed coherence protocol, removes cache invalidation using logical timestamps and achieves excellent scalability. The original Tardis protocol, however, only supports the Sequential Consistency (SC) memory m… ▽ More Cache coherence scalability is a big challenge in shared memory systems. Traditional protocols do not scale due to the storage and traffic overhead of cache invalidation. Tardis, a recently proposed coherence protocol, removes cache invalidation using logical timestamps and achieves excellent scalability. The original Tardis protocol, however, only supports the Sequential Consistency (SC) memory model, limiting its applicability. Tardis also incurs extra network traffic on some benchmarks due to renew messages, and has suboptimal performance when the program uses spinning to communicate between threads. In this paper, we address these downsides of Tardis protocol and make it significantly more practical. Specifically, we discuss the architectural, memory system and protocol changes required in order to implement the TSO consistency model on Tardis, and prove that the modified protocol satisfies TSO. We also describe modifications for Partial Store Order (PSO) and Release Consistency (RC). Finally, we propose optimizations for better leasing policies and to handle program spinning. On a set of benchmarks, optimized Tardis improves on a full-map directory protocol in the metrics of performance, storage and network traffic, while being simpler to implement. △ Less

Submitted 27 July, 2016; v1 submitted 27 November, 2015; originally announced November 2015.

Comments: 14 pages

arXiv:1505.07891 [pdf, ps, other]

doi 10.1080/00927872.2016.1226866

The polynomial representation of the type $A_{n - 1}$ rational Cherednik algebra in characteristic $p \mid n$

Authors: Sheela Devadas, Yi Sun

Abstract: We study the polynomial representation of the rational Cherednik algebra of type $A_{n-1}$ with generic parameter in characteristic $p$ for $p \mid n$. We give explicit formulas for generators for the maximal proper graded submodule, show that they cut out a complete intersection, and thus compute the Hilbert series of the irreducible quotient. Our methods are motivated by taking characteristic… ▽ More We study the polynomial representation of the rational Cherednik algebra of type $A_{n-1}$ with generic parameter in characteristic $p$ for $p \mid n$. We give explicit formulas for generators for the maximal proper graded submodule, show that they cut out a complete intersection, and thus compute the Hilbert series of the irreducible quotient. Our methods are motivated by taking characteristic $p$ analogues of existing characteristic $0$ results. △ Less

Submitted 12 May, 2016; v1 submitted 28 May, 2015; originally announced May 2015.

Comments: 8 pages. v3: Streamlined proof of complete intersection property in Section 3; main results are unchanged

Journal ref: Commun. Algebra 45 (2016), 1926-1934

arXiv:1505.06459 [pdf, other]

A Proof of Correctness for the Tardis Cache Coherence Protocol

Authors: Xiangyao Yu, Muralidaran Vijayaraghavan, Srinivas Devadas

Abstract: We prove the correctness of a recently-proposed cache coherence protocol, Tardis, which is simple, yet scalable to high processor counts, because it only requires O(logN) storage per cacheline for an N-processor system. We prove that Tardis follows the sequential consistency model and is both deadlock- and livelock-free. Our proof is based on simple and intuitive invariants of the system and thus… ▽ More We prove the correctness of a recently-proposed cache coherence protocol, Tardis, which is simple, yet scalable to high processor counts, because it only requires O(logN) storage per cacheline for an N-processor system. We prove that Tardis follows the sequential consistency model and is both deadlock- and livelock-free. Our proof is based on simple and intuitive invariants of the system and thus applies to any system scale and many variants of Tardis. △ Less

Submitted 24 May, 2015; originally announced May 2015.

Comments: 16 pages, 2 figures

arXiv:1501.04504 [pdf, other]

TARDIS: Timestamp based Coherence Algorithm for Distributed Shared Memory

Authors: Xiangyao Yu, Srinivas Devadas

Abstract: A new memory coherence protocol, Tardis, is proposed. Tardis uses timestamp counters representing logical time as well as physical time to order memory operations and enforce sequential consistency in any type of shared memory system. Tardis is unique in that as compared to the widely-adopted directory coherence protocol, and its variants, it completely avoids multicasting and only requires O(log… ▽ More A new memory coherence protocol, Tardis, is proposed. Tardis uses timestamp counters representing logical time as well as physical time to order memory operations and enforce sequential consistency in any type of shared memory system. Tardis is unique in that as compared to the widely-adopted directory coherence protocol, and its variants, it completely avoids multicasting and only requires O(log N) storage per cache block for an N-core system rather than O(N) sharer information. Tardis is simpler and easier to reason about, yet achieves similar performance to directory protocols on a wide range of benchmarks run on 16, 64 and 256 cores. △ Less

Submitted 23 September, 2015; v1 submitted 19 January, 2015; originally announced January 2015.

Comments: 16 pages

arXiv:1412.5484 [pdf, ps, other]

doi 10.1007/s00224-015-9639-z

A Self-Tester for Linear Functions over the Integers with an Elementary Proof of Correctness

Authors: Sheela Devadas, Ronitt Rubinfeld

Abstract: We present simple, self-contained proofs of correctness for algorithms for linearity testing and program checking of linear functions on finite subsets of integers represented as n-bit numbers. In addition we explore a generalization of self-testing to homomorphisms on a multidimensional vector space. We show that our self-testing algorithm for the univariate case can be directly generalized to ve… ▽ More We present simple, self-contained proofs of correctness for algorithms for linearity testing and program checking of linear functions on finite subsets of integers represented as n-bit numbers. In addition we explore a generalization of self-testing to homomorphisms on a multidimensional vector space. We show that our self-testing algorithm for the univariate case can be directly generalized to vector space domains. The number of queries made by our algorithms is independent of domain size. △ Less

Submitted 22 June, 2015; v1 submitted 17 December, 2014; originally announced December 2014.

arXiv:1304.0856 [pdf, ps, other]

doi 10.1216/JCA-2014-6-4-525

Representations of rational Cherednik algebras of G(m,r,n) in positive characteristic

Authors: Sheela Devadas, Steven V Sam

Abstract: We study lowest-weight irreducible representations of rational Cherednik algebras attached to the complex reflection groups G(m,r,n) in characteristic p. Our approach is mostly from the perspective of commutative algebra. By studying the kernel of the contravariant bilinear form on Verma modules, we obtain formulas for Hilbert series of irreducible representations in a number of cases, and present… ▽ More We study lowest-weight irreducible representations of rational Cherednik algebras attached to the complex reflection groups G(m,r,n) in characteristic p. Our approach is mostly from the perspective of commutative algebra. By studying the kernel of the contravariant bilinear form on Verma modules, we obtain formulas for Hilbert series of irreducible representations in a number of cases, and present conjectures in other cases. We observe that the form of the Hilbert series of the irreducible representations and the generators of the kernel tend to be determined by the value of n modulo p, and are related to special classes of subspace arrangements. Perhaps the most novel (conjectural) discovery from the commutative algebra perspective is that the generators of the kernel can be given the structure of a "matrix regular sequence" in some instances, which we prove in some small cases. △ Less

Submitted 13 October, 2013; v1 submitted 3 April, 2013; originally announced April 2013.

Comments: 23 pages; v2: clarified statement of Lemma 3.2

MSC Class: 13A50; 14N20; 16S99

Journal ref: J. Commut. Algebra 6 (2014), no. 4, 525-559

arXiv:1202.5150 [pdf, other]

Path ORAM: An Extremely Simple Oblivious RAM Protocol

Authors: Emil Stefanov, Marten van Dijk, Elaine Shi, T-H. Hubert Chan, Christopher Fletcher, Ling Ren, Xiangyao Yu, Srinivas Devadas

Abstract: We present Path ORAM, an extremely simple Oblivious RAM protocol with a small amount of client storage. Partly due to its simplicity, Path ORAM is the most practical ORAM scheme known to date with small client storage. We formally prove that Path ORAM has a O(log N) bandwidth cost for blocks of size B = Omega(log^2 N) bits. For such block sizes, Path ORAM is asymptotically better than the best kno… ▽ More We present Path ORAM, an extremely simple Oblivious RAM protocol with a small amount of client storage. Partly due to its simplicity, Path ORAM is the most practical ORAM scheme known to date with small client storage. We formally prove that Path ORAM has a O(log N) bandwidth cost for blocks of size B = Omega(log^2 N) bits. For such block sizes, Path ORAM is asymptotically better than the best known ORAM schemes with small client storage. Due to its practicality, Path ORAM has been adopted in the design of secure processors since its proposal. △ Less

Submitted 13 January, 2014; v1 submitted 23 February, 2012; originally announced February 2012.

arXiv:cs/0605109 [pdf, ps, other]

Knowledge Flow Analysis for Security Protocols

Authors: Emina Torlak, Marten van Dijk, Blaise Gassend, Daniel Jackson, Srinivas Devadas

Abstract: Knowledge flow analysis offers a simple and flexible way to find flaws in security protocols. A protocol is described by a collection of rules constraining the propagation of knowledge amongst principals. Because this characterization corresponds closely to informal descriptions of protocols, it allows a succinct and natural formalization; because it abstracts away message ordering, and handles… ▽ More Knowledge flow analysis offers a simple and flexible way to find flaws in security protocols. A protocol is described by a collection of rules constraining the propagation of knowledge amongst principals. Because this characterization corresponds closely to informal descriptions of protocols, it allows a succinct and natural formalization; because it abstracts away message ordering, and handles communications between principals and applications of cryptographic primitives uniformly, it is readily represented in a standard logic. A generic framework in the Alloy modelling language is presented, and instantiated for two standard protocols, and a new key management scheme. △ Less

Submitted 24 May, 2006; originally announced May 2006.

Comments: 20 pages

Report number: MIT-CSAIL-TR-2005-066

arXiv:cs/0605097 [pdf, ps, other]

A Generalized Two-Phase Analysis of Knowledge Flows in Security Protocols

Authors: Marten van Dijk, Emina Torlak, Blaise Gassend, Srinivas Devadas

Abstract: We introduce knowledge flow analysis, a simple and flexible formalism for checking cryptographic protocols. Knowledge flows provide a uniform language for expressing the actions of principals, assump- tions about intruders, and the properties of cryptographic primitives. Our approach enables a generalized two-phase analysis: we extend the two-phase theory by identifying the necessary and suffici… ▽ More We introduce knowledge flow analysis, a simple and flexible formalism for checking cryptographic protocols. Knowledge flows provide a uniform language for expressing the actions of principals, assump- tions about intruders, and the properties of cryptographic primitives. Our approach enables a generalized two-phase analysis: we extend the two-phase theory by identifying the necessary and sufficient proper- ties of a broad class of cryptographic primitives for which the theory holds. We also contribute a library of standard primitives and show that they satisfy our criteria. △ Less

Submitted 22 May, 2006; originally announced May 2006.

Comments: 16 pages

Showing 1–31 of 31 results for author: Devadas, S