-
You Don't Need All Attentions: Distributed Dynamic Fine-Tuning for Foundation Models
Authors:
Shiwei Ding,
Lan Zhang,
Zhenlin Wang,
Giuseppe Ateniese,
Xiaoyong Yuan
Abstract:
Fine-tuning plays a crucial role in adapting models to downstream tasks with minimal training efforts. However, the rapidly increasing size of foundation models poses a daunting challenge for accommodating foundation model fine-tuning in most commercial devices, which often have limited memory bandwidth. Techniques like model sharding and tensor parallelism address this issue by distributing compu…
▽ More
Fine-tuning plays a crucial role in adapting models to downstream tasks with minimal training efforts. However, the rapidly increasing size of foundation models poses a daunting challenge for accommodating foundation model fine-tuning in most commercial devices, which often have limited memory bandwidth. Techniques like model sharding and tensor parallelism address this issue by distributing computation across multiple devices to meet memory requirements. Nevertheless, these methods do not fully leverage their foundation nature in facilitating the fine-tuning process, resulting in high computational costs and imbalanced workloads. We introduce a novel Distributed Dynamic Fine-Tuning (D2FT) framework that strategically orchestrates operations across attention modules based on our observation that not all attention modules are necessary for forward and backward propagation in fine-tuning foundation models. Through three innovative selection strategies, D2FT significantly reduces the computational workload required for fine-tuning foundation models. Furthermore, D2FT addresses workload imbalances in distributed computing environments by optimizing these selection strategies via multiple knapsack optimization. Our experimental results demonstrate that the proposed D2FT framework reduces the training computational costs by 40% and training communication costs by 50% with only 1% to 2% accuracy drops on the CIFAR-10, CIFAR-100, and Stanford Cars datasets. Moreover, the results show that D2FT can be effectively extended to recent LoRA, a state-of-the-art parameter-efficient fine-tuning technique. By reducing 40% computational cost or 50% communication cost, D2FT LoRA top-1 accuracy only drops 4% to 6% on Stanford Cars dataset.
△ Less
Submitted 16 April, 2025;
originally announced April 2025.
-
Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks
Authors:
Dario Pasquini,
Evgenios M. Kornaropoulos,
Giuseppe Ateniese
Abstract:
Large language models (LLMs) are increasingly being harnessed to automate cyberattacks, making sophisticated exploits more accessible and scalable. In response, we propose a new defense strategy tailored to counter LLM-driven cyberattacks. We introduce Mantis, a defensive framework that exploits LLMs' susceptibility to adversarial inputs to undermine malicious operations. Upon detecting an automat…
▽ More
Large language models (LLMs) are increasingly being harnessed to automate cyberattacks, making sophisticated exploits more accessible and scalable. In response, we propose a new defense strategy tailored to counter LLM-driven cyberattacks. We introduce Mantis, a defensive framework that exploits LLMs' susceptibility to adversarial inputs to undermine malicious operations. Upon detecting an automated cyberattack, Mantis plants carefully crafted inputs into system responses, leading the attacker's LLM to disrupt their own operations (passive defense) or even compromise the attacker's machine (active defense). By deploying purposefully vulnerable decoy services to attract the attacker and using dynamic prompt injections for the attacker's LLM, Mantis can autonomously hack back the attacker. In our experiments, Mantis consistently achieved over 95% effectiveness against automated LLM-driven attacks. To foster further research and collaboration, Mantis is available as an open-source tool: https://github.com/pasquini-dario/project_mantis
△ Less
Submitted 18 November, 2024; v1 submitted 28 October, 2024;
originally announced October 2024.
-
LLMmap: Fingerprinting For Large Language Models
Authors:
Dario Pasquini,
Evgenios M. Kornaropoulos,
Giuseppe Ateniese
Abstract:
We introduce LLMmap, a first-generation fingerprinting technique targeted at LLM-integrated applications. LLMmap employs an active fingerprinting approach, sending carefully crafted queries to the application and analyzing the responses to identify the specific LLM version in use. Our query selection is informed by domain expertise on how LLMs generate uniquely identifiable responses to thematical…
▽ More
We introduce LLMmap, a first-generation fingerprinting technique targeted at LLM-integrated applications. LLMmap employs an active fingerprinting approach, sending carefully crafted queries to the application and analyzing the responses to identify the specific LLM version in use. Our query selection is informed by domain expertise on how LLMs generate uniquely identifiable responses to thematically varied prompts. With as few as 8 interactions, LLMmap can accurately identify 42 different LLM versions with over 95% accuracy. More importantly, LLMmap is designed to be robust across different application layers, allowing it to identify LLM versions--whether open-source or proprietary--from various vendors, operating under various unknown system prompts, stochastic sampling hyperparameters, and even complex generation frameworks such as RAG or Chain-of-Thought. We discuss potential mitigations and demonstrate that, against resourceful adversaries, effective countermeasures may be challenging or even unrealizable.
△ Less
Submitted 10 February, 2025; v1 submitted 22 July, 2024;
originally announced July 2024.
-
Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models
Authors:
Hanlin Zhang,
Benjamin L. Edelman,
Danilo Francati,
Daniele Venturi,
Giuseppe Ateniese,
Boaz Barak
Abstract:
Watermarking generative models consists of planting a statistical signal (watermark) in a model's output so that it can be later verified that the output was generated by the given model. A strong watermarking scheme satisfies the property that a computationally bounded attacker cannot erase the watermark without causing significant quality degradation. In this paper, we study the (im)possibility…
▽ More
Watermarking generative models consists of planting a statistical signal (watermark) in a model's output so that it can be later verified that the output was generated by the given model. A strong watermarking scheme satisfies the property that a computationally bounded attacker cannot erase the watermark without causing significant quality degradation. In this paper, we study the (im)possibility of strong watermarking schemes. We prove that, under well-specified and natural assumptions, strong watermarking is impossible to achieve. This holds even in the private detection algorithm setting, where the watermark insertion and detection algorithms share a secret key, unknown to the attacker. To prove this result, we introduce a generic efficient watermark attack; the attacker is not required to know the private key of the scheme or even which scheme is used. Our attack is based on two assumptions: (1) The attacker has access to a "quality oracle" that can evaluate whether a candidate output is a high-quality response to a prompt, and (2) The attacker has access to a "perturbation oracle" which can modify an output with a nontrivial probability of maintaining quality, and which induces an efficiently mixing random walk on high-quality outputs. We argue that both assumptions can be satisfied in practice by an attacker with weaker computational capabilities than the watermarked model itself, to which the attacker has only black-box access. Furthermore, our assumptions will likely only be easier to satisfy over time as models grow in capabilities and modalities. We demonstrate the feasibility of our attack by instantiating it to attack three existing watermarking schemes for large language models: Kirchenbauer et al. (2023), Kuditipudi et al. (2023), and Zhao et al. (2023). The same attack successfully removes the watermarks planted by all three schemes, with only minor quality degradation.
△ Less
Submitted 27 May, 2025; v1 submitted 7 November, 2023;
originally announced November 2023.
-
Universal Neural-Cracking-Machines: Self-Configurable Password Models from Auxiliary Data
Authors:
Dario Pasquini,
Giuseppe Ateniese,
Carmela Troncoso
Abstract:
We introduce the concept of "universal password model" -- a password model that, once pre-trained, can automatically adapt its guessing strategy based on the target system. To achieve this, the model does not need to access any plaintext passwords from the target credentials. Instead, it exploits users' auxiliary information, such as email addresses, as a proxy signal to predict the underlying pas…
▽ More
We introduce the concept of "universal password model" -- a password model that, once pre-trained, can automatically adapt its guessing strategy based on the target system. To achieve this, the model does not need to access any plaintext passwords from the target credentials. Instead, it exploits users' auxiliary information, such as email addresses, as a proxy signal to predict the underlying password distribution. Specifically, the model uses deep learning to capture the correlation between the auxiliary data of a group of users (e.g., users of a web application) and their passwords. It then exploits those patterns to create a tailored password model for the target system at inference time. No further training steps, targeted data collection, or prior knowledge of the community's password distribution is required. Besides improving over current password strength estimation techniques and attacks, the model enables any end-user (e.g., system administrators) to autonomously generate tailored password models for their systems without the often unworkable requirements of collecting suitable training data and fitting the underlying machine learning model. Ultimately, our framework enables the democratization of well-calibrated password models to the community, addressing a major challenge in the deployment of password security solutions at scale.
△ Less
Submitted 13 March, 2024; v1 submitted 18 January, 2023;
originally announced January 2023.
-
Eluding Secure Aggregation in Federated Learning via Model Inconsistency
Authors:
Dario Pasquini,
Danilo Francati,
Giuseppe Ateniese
Abstract:
Secure aggregation is a cryptographic protocol that securely computes the aggregation of its inputs. It is pivotal in keeping model updates private in federated learning. Indeed, the use of secure aggregation prevents the server from learning the value and the source of the individual model updates provided by the users, hampering inference and data attribution attacks. In this work, we show that…
▽ More
Secure aggregation is a cryptographic protocol that securely computes the aggregation of its inputs. It is pivotal in keeping model updates private in federated learning. Indeed, the use of secure aggregation prevents the server from learning the value and the source of the individual model updates provided by the users, hampering inference and data attribution attacks. In this work, we show that a malicious server can easily elude secure aggregation as if the latter were not in place. We devise two different attacks capable of inferring information on individual private training datasets, independently of the number of users participating in the secure aggregation. This makes them concrete threats in large-scale, real-world federated learning applications. The attacks are generic and equally effective regardless of the secure aggregation protocol used. They exploit a vulnerability of the federated learning protocol caused by incorrect usage of secure aggregation and lack of parameter validation. Our work demonstrates that current implementations of federated learning with secure aggregation offer only a "false sense of security".
△ Less
Submitted 6 September, 2022; v1 submitted 14 November, 2021;
originally announced November 2021.
-
Unleashing the Tiger: Inference Attacks on Split Learning
Authors:
Dario Pasquini,
Giuseppe Ateniese,
Massimo Bernaschi
Abstract:
We investigate the security of Split Learning -- a novel collaborative machine learning framework that enables peak performance by requiring minimal resources consumption. In the present paper, we expose vulnerabilities of the protocol and demonstrate its inherent insecurity by introducing general attack strategies targeting the reconstruction of clients' private training sets. More prominently, w…
▽ More
We investigate the security of Split Learning -- a novel collaborative machine learning framework that enables peak performance by requiring minimal resources consumption. In the present paper, we expose vulnerabilities of the protocol and demonstrate its inherent insecurity by introducing general attack strategies targeting the reconstruction of clients' private training sets. More prominently, we show that a malicious server can actively hijack the learning process of the distributed model and bring it into an insecure state that enables inference attacks on clients' data. We implement different adaptations of the attack and test them on various datasets as well as within realistic threat scenarios. We demonstrate that our attack is able to overcome recently proposed defensive techniques aimed at enhancing the security of the split learning protocol. Finally, we also illustrate the protocol's insecurity against malicious clients by extending previously devised attacks for Federated Learning. To make our results reproducible, we made our code available at https://github.com/pasquini-dario/SplitNN_FSHA.
△ Less
Submitted 4 November, 2021; v1 submitted 4 December, 2020;
originally announced December 2020.
-
Reducing Bias in Modeling Real-world Password Strength via Deep Learning and Dynamic Dictionaries
Authors:
Dario Pasquini,
Marco Cianfriglia,
Giuseppe Ateniese,
Massimo Bernaschi
Abstract:
Password security hinges on an in-depth understanding of the techniques adopted by attackers. Unfortunately, real-world adversaries resort to pragmatic guessing strategies such as dictionary attacks that are inherently difficult to model in password security studies. In order to be representative of the actual threat, dictionary attacks must be thoughtfully configured and tuned. However, this proc…
▽ More
Password security hinges on an in-depth understanding of the techniques adopted by attackers. Unfortunately, real-world adversaries resort to pragmatic guessing strategies such as dictionary attacks that are inherently difficult to model in password security studies. In order to be representative of the actual threat, dictionary attacks must be thoughtfully configured and tuned. However, this process requires a domain-knowledge and expertise that cannot be easily replicated. The consequence of inaccurately calibrating dictionary attacks is the unreliability of password security analyses, impaired by a severe measurement bias.
In the present work, we introduce a new generation of dictionary attacks that is consistently more resilient to inadequate configurations. Requiring no supervision or domain-knowledge, this technique automatically approximates the advanced guessing strategies adopted by real-world attackers. To achieve this: (1) We use deep neural networks to model the proficiency of adversaries in building attack configurations. (2) Then, we introduce dynamic guessing strategies within dictionary attacks. These mimic experts' ability to adapt their guessing strategies on the fly by incorporating knowledge on their targets.
Our techniques enable more robust and sound password strength estimates within dictionary attacks, eventually reducing overestimation in modeling real-world threats in password security. Code available: https://github.com/TheAdamProject/adams
△ Less
Submitted 26 February, 2021; v1 submitted 23 October, 2020;
originally announced October 2020.
-
Interpretable Probabilistic Password Strength Meters via Deep Learning
Authors:
Dario Pasquini,
Giuseppe Ateniese,
Massimo Bernaschi
Abstract:
Probabilistic password strength meters have been proved to be the most accurate tools to measure password strength. Unfortunately, by construction, they are limited to solely produce an opaque security estimation that fails to fully support the user during the password composition. In the present work, we move the first steps towards cracking the intelligibility barrier of this compelling class of…
▽ More
Probabilistic password strength meters have been proved to be the most accurate tools to measure password strength. Unfortunately, by construction, they are limited to solely produce an opaque security estimation that fails to fully support the user during the password composition. In the present work, we move the first steps towards cracking the intelligibility barrier of this compelling class of meters. We show that probabilistic password meters inherently own the capability of describing the latent relation occurring between password strength and password structure. In our approach, the security contribution of each character composing a password is disentangled and used to provide explicit fine-grained feedback for the user. Furthermore, unlike existing heuristic constructions, our method is free from any human bias, and, more importantly, its feedback has a probabilistic interpretation. In our contribution: (1) we formulate interpretable probabilistic password strength meters; (2) we describe how they can be implemented via an efficient and lightweight deep learning framework suitable for client-side operability.
△ Less
Submitted 11 May, 2021; v1 submitted 15 April, 2020;
originally announced April 2020.
-
Audita: A Blockchain-based Auditing Framework for Off-chain Storage
Authors:
Danilo Francati,
Giuseppe Ateniese,
Abdoulaye Faye,
Andrea Maria Milazzo,
Angelo Massimo Perillo,
Luca Schiatti,
Giuseppe Giordano
Abstract:
The cloud changed the way we manage and store data. Today, cloud storage services offer clients an infrastructure that allows them a convenient source to store, replicate, and secure data online. However, with these new capabilities also come limitations, such as lack of transparency, limited decentralization, and challenges with privacy and security. And, as the need for more agile, private and s…
▽ More
The cloud changed the way we manage and store data. Today, cloud storage services offer clients an infrastructure that allows them a convenient source to store, replicate, and secure data online. However, with these new capabilities also come limitations, such as lack of transparency, limited decentralization, and challenges with privacy and security. And, as the need for more agile, private and secure data solutions continues to grow exponentially, rethinking the current structure of cloud storage is mission-critical for enterprises. By leveraging and building upon blockchain's unique attributes, including immutability, security to the data element level, distributed (no single point of failure), we have developed a solution prototype that allows data to be reliably stored while simultaneously being secured, with tamper-evident auditability, via blockchain. The result, Audita, is a flexible solution that assures data protection and solves challenges such as scalability and privacy. Audita works via an augmented blockchain network of participants that include storage-nodes and block-creators. In addition, it provides an automatic and fair challenge system to assure that data is distributed and reliably and provably stored. While the prototype is built on Quorum, the solution framework can be used with any blockchain platform. The benefit is a system that is built to grow along with the data needs of enterprises, while continuing to build the network via incentives and solving for issues such as auditing and outsourcing.
△ Less
Submitted 3 July, 2020; v1 submitted 19 November, 2019;
originally announced November 2019.
-
Improving Password Guessing via Representation Learning
Authors:
Dario Pasquini,
Ankit Gangwal,
Giuseppe Ateniese,
Massimo Bernaschi,
Mauro Conti
Abstract:
Learning useful representations from unstructured data is one of the core challenges, as well as a driving force, of modern data-driven approaches. Deep learning has demonstrated the broad advantages of learning and harnessing such representations. In this paper, we introduce a deep generative model representation learning approach for password guessing. We show that an abstract password represent…
▽ More
Learning useful representations from unstructured data is one of the core challenges, as well as a driving force, of modern data-driven approaches. Deep learning has demonstrated the broad advantages of learning and harnessing such representations. In this paper, we introduce a deep generative model representation learning approach for password guessing. We show that an abstract password representation naturally offers compelling and versatile properties that can be used to open new directions in the extensively studied, and yet presently active, password guessing field. These properties can establish novel password generation techniques that are neither feasible nor practical with the existing probabilistic and non-probabilistic approaches. Based on these properties, we introduce:(1) A general framework for conditional password guessing that can generate passwords with arbitrary biases; and (2) an Expectation Maximization-inspired framework that can dynamically adapt the estimated password distribution to match the distribution of the attacked password set.
△ Less
Submitted 26 July, 2020; v1 submitted 9 October, 2019;
originally announced October 2019.
-
Arcula: A Secure Hierarchical Deterministic Wallet for Multi-asset Blockchains
Authors:
Adriano Di Luzio,
Danilo Francati,
Giuseppe Ateniese
Abstract:
This work presents Arcula, a new design for hierarchical deterministic wallets that brings identity-based addresses to the blockchain. Arcula is built on top of provably secure cryptographic primitives. It generates all its cryptographic secrets from a user-provided seed and enables the derivation of new public keys based on the identities of users, without requiring any secret information. Unlike…
▽ More
This work presents Arcula, a new design for hierarchical deterministic wallets that brings identity-based addresses to the blockchain. Arcula is built on top of provably secure cryptographic primitives. It generates all its cryptographic secrets from a user-provided seed and enables the derivation of new public keys based on the identities of users, without requiring any secret information. Unlike other wallets, it achieves all these properties while being secure against privilege escalation. We formalize the security model of hierarchical deterministic wallets and prove that an attacker compromising an arbitrary number of users within an Arcula wallet cannot escalate his privileges and compromise users higher in the access hierarchy. Our design works out-of-the-box with any blockchain that enables the verification of signatures on arbitrary messages. We evaluate its usage in a real-world scenario on the Bitcoin Cash network.
△ Less
Submitted 10 December, 2019; v1 submitted 13 June, 2019;
originally announced June 2019.
-
PassGAN: A Deep Learning Approach for Password Guessing
Authors:
Briland Hitaj,
Paolo Gasti,
Giuseppe Ateniese,
Fernando Perez-Cruz
Abstract:
State-of-the-art password guessing tools, such as HashCat and John the Ripper, enable users to check billions of passwords per second against password hashes. In addition to performing straightforward dictionary attacks, these tools can expand password dictionaries using password generation rules, such as concatenation of words (e.g., "password123456") and leet speak (e.g., "password" becomes "p4s…
▽ More
State-of-the-art password guessing tools, such as HashCat and John the Ripper, enable users to check billions of passwords per second against password hashes. In addition to performing straightforward dictionary attacks, these tools can expand password dictionaries using password generation rules, such as concatenation of words (e.g., "password123456") and leet speak (e.g., "password" becomes "p4s5w0rd"). Although these rules work well in practice, expanding them to model further passwords is a laborious task that requires specialized expertise. To address this issue, in this paper we introduce PassGAN, a novel approach that replaces human-generated password rules with theory-grounded machine learning algorithms. Instead of relying on manual password analysis, PassGAN uses a Generative Adversarial Network (GAN) to autonomously learn the distribution of real passwords from actual password leaks, and to generate high-quality password guesses. Our experiments show that this approach is very promising. When we evaluated PassGAN on two large password datasets, we were able to surpass rule-based and state-of-the-art machine learning password guessing tools. However, in contrast with the other tools, PassGAN achieved this result without any a-priori knowledge on passwords or common password structures. Additionally, when we combined the output of PassGAN with the output of HashCat, we were able to match 51%-73% more passwords than with HashCat alone. This is remarkable, because it shows that PassGAN can autonomously extract a considerable number of password properties that current state-of-the art rules do not encode.
△ Less
Submitted 14 February, 2019; v1 submitted 1 September, 2017;
originally announced September 2017.
-
Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning
Authors:
Briland Hitaj,
Giuseppe Ateniese,
Fernando Perez-Cruz
Abstract:
Deep Learning has recently become hugely popular in machine learning, providing significant improvements in classification accuracy in the presence of highly-structured and large databases.
Researchers have also considered privacy implications of deep learning. Models are typically trained in a centralized manner with all the data being processed by the same training algorithm. If the data is a…
▽ More
Deep Learning has recently become hugely popular in machine learning, providing significant improvements in classification accuracy in the presence of highly-structured and large databases.
Researchers have also considered privacy implications of deep learning. Models are typically trained in a centralized manner with all the data being processed by the same training algorithm. If the data is a collection of users' private data, including habits, personal pictures, geographical positions, interests, and more, the centralized server will have access to sensitive information that could potentially be mishandled. To tackle this problem, collaborative deep learning models have recently been proposed where parties locally train their deep learning structures and only share a subset of the parameters in the attempt to keep their respective training sets private. Parameters can also be obfuscated via differential privacy (DP) to make information extraction even more challenging, as proposed by Shokri and Shmatikov at CCS'15.
Unfortunately, we show that any privacy-preserving collaborative deep learning is susceptible to a powerful attack that we devise in this paper. In particular, we show that a distributed, federated, or decentralized deep learning approach is fundamentally broken and does not protect the training sets of honest participants. The attack we developed exploits the real-time nature of the learning process that allows the adversary to train a Generative Adversarial Network (GAN) that generates prototypical samples of the targeted training set that was meant to be private (the samples generated by the GAN are intended to come from the same distribution as the training data). Interestingly, we show that record-level DP applied to the shared parameters of the model, as suggested in previous work, is ineffective (i.e., record-level DP is not designed to address our attack).
△ Less
Submitted 14 September, 2017; v1 submitted 24 February, 2017;
originally announced February 2017.
-
From Pretty Good To Great: Enhancing PGP using Bitcoin and the Blockchain
Authors:
Duane Wilson,
Giuseppe Ateniese
Abstract:
PGP is built upon a Distributed Web of Trust in which the trustworthiness of a user is established by others who can vouch through a digital signature for that particular identity. Preventing its wholesale adoption are a number of inherent weaknesses to include (but not limited to) the following: 1) Trust Relationships are built on a subjective honor system, 2) Only first degree relationships can…
▽ More
PGP is built upon a Distributed Web of Trust in which the trustworthiness of a user is established by others who can vouch through a digital signature for that particular identity. Preventing its wholesale adoption are a number of inherent weaknesses to include (but not limited to) the following: 1) Trust Relationships are built on a subjective honor system, 2) Only first degree relationships can be fully trusted, 3) Levels of trust are difficult to quantify with actual values, and 4) Issues with the Web of Trust itself (Certification and Endorsement). Although the security that PGP provides is proven to be reliable, it has largely failed to garner large scale adoption. In this paper, we propose several novel contributions to address the aforementioned issues with PGP and associated Web of Trust. To address the subjectivity of the Web of Trust, we provide a new certificate format based on Bitcoin which allows a user to verify a PGP certificate using Bitcoin identity-verification transactions - forming first degree trust relationships that are tied to actual values (i.e., number of Bitcoins transferred during transaction). Secondly, we present the design of a novel Distributed PGP key server that leverages the Bitcoin transaction blockchain to store and retrieve Bitcoin-Based PGP certificates. Lastly, we provide a web prototype application that demonstrates several of these capabilities in an actual environment.
△ Less
Submitted 20 August, 2015; v1 submitted 19 August, 2015;
originally announced August 2015.
-
No Place to Hide that Bytes won't Reveal: Sniffing Location-Based Encrypted Traffic to Track a User's Position
Authors:
Giuseppe Ateniese,
Briland Hitaj,
Luigi V. Mancini,
Nino V. Verde,
Antonio Villani
Abstract:
News reports of the last few years indicated that several intelligence agencies are able to monitor large networks or entire portions of the Internet backbone. Such a powerful adversary has only recently been considered by the academic literature. In this paper, we propose a new adversary model for Location Based Services (LBSs). The model takes into account an unauthorized third party, different…
▽ More
News reports of the last few years indicated that several intelligence agencies are able to monitor large networks or entire portions of the Internet backbone. Such a powerful adversary has only recently been considered by the academic literature. In this paper, we propose a new adversary model for Location Based Services (LBSs). The model takes into account an unauthorized third party, different from the LBS provider itself, that wants to infer the location and monitor the movements of a LBS user. We show that such an adversary can extrapolate the position of a target user by just analyzing the size and the timing of the encrypted traffic exchanged between that user and the LBS provider. We performed a thorough analysis of a widely deployed location based app that comes pre-installed with many Android devices: GoogleNow. The results are encouraging and highlight the importance of devising more effective countermeasures against powerful adversaries to preserve the privacy of LBS users.
△ Less
Submitted 4 September, 2015; v1 submitted 28 May, 2015;
originally announced May 2015.
-
To Share or Not to Share in Client-Side Encrypted Clouds
Authors:
Duane Wilson,
Giuseppe Ateniese
Abstract:
With the advent of cloud computing, a number of cloud providers have arisen to provide Storage-as-a-Service (SaaS) offerings to both regular consumers and business organizations. SaaS (different than Software-as-a-Service in this context) refers to an architectural model in which a cloud provider provides digital storage on their own infrastructure. Three models exist amongst SaaS providers for pr…
▽ More
With the advent of cloud computing, a number of cloud providers have arisen to provide Storage-as-a-Service (SaaS) offerings to both regular consumers and business organizations. SaaS (different than Software-as-a-Service in this context) refers to an architectural model in which a cloud provider provides digital storage on their own infrastructure. Three models exist amongst SaaS providers for protecting the confidentiality data stored in the cloud: 1) no encryption (data is stored in plain text), 2) server-side encryption (data is encrypted once uploaded), and 3) client-side encryption (data is encrypted prior to upload). This paper seeks to identify weaknesses in the third model, as it claims to offer 100% user data confidentiality throughout all data transactions (e.g., upload, download, sharing) through a combination of Network Traffic Analysis, Source Code Decompilation, and Source Code Disassembly. The weaknesses we uncovered primarily center around the fact that the cloud providers we evaluated were each operating in a Certificate Authority capacity to facilitate data sharing. In this capacity, they assume the role of both certificate issuer and certificate authorizer as denoted in a Public-Key Infrastructure (PKI) scheme - which gives them the ability to view user data contradicting their claims of 100% data confidentiality. We have collated our analysis and findings in this paper and explore some potential solutions to address these weaknesses in these sharing methods. The solutions proposed are a combination of best practices associated with the use of PKI and other cryptographic primitives generally accepted for protecting the confidentiality of shared information.
△ Less
Submitted 19 November, 2014; v1 submitted 10 April, 2014;
originally announced April 2014.
-
No NAT'd User left Behind: Fingerprinting Users behind NAT from NetFlow Records alone
Authors:
Nino Vincenzo Verde,
Giuseppe Ateniese,
Emanuele Gabrielli,
Luigi Vincenzo Mancini,
Angelo Spognardi
Abstract:
It is generally recognized that the traffic generated by an individual connected to a network acts as his biometric signature. Several tools exploit this fact to fingerprint and monitor users. Often, though, these tools assume to access the entire traffic, including IP addresses and payloads. This is not feasible on the grounds that both performance and privacy would be negatively affected. In rea…
▽ More
It is generally recognized that the traffic generated by an individual connected to a network acts as his biometric signature. Several tools exploit this fact to fingerprint and monitor users. Often, though, these tools assume to access the entire traffic, including IP addresses and payloads. This is not feasible on the grounds that both performance and privacy would be negatively affected. In reality, most ISPs convert user traffic into NetFlow records for a concise representation that does not include, for instance, any payloads. More importantly, large and distributed networks are usually NAT'd, thus a few IP addresses may be associated to thousands of users. We devised a new fingerprinting framework that overcomes these hurdles. Our system is able to analyze a huge amount of network traffic represented as NetFlows, with the intent to track people. It does so by accurately inferring when users are connected to the network and which IP addresses they are using, even though thousands of users are hidden behind NAT. Our prototype implementation was deployed and tested within an existing large metropolitan WiFi network serving about 200,000 users, with an average load of more than 1,000 users simultaneously connected behind 2 NAT'd IP addresses only. Our solution turned out to be very effective, with an accuracy greater than 90%. We also devised new tools and refined existing ones that may be applied to other contexts related to NetFlow analysis.
△ Less
Submitted 9 February, 2014;
originally announced February 2014.
-
Hacking Smart Machines with Smarter Ones: How to Extract Meaningful Data from Machine Learning Classifiers
Authors:
Giuseppe Ateniese,
Giovanni Felici,
Luigi V. Mancini,
Angelo Spognardi,
Antonio Villani,
Domenico Vitali
Abstract:
Machine Learning (ML) algorithms are used to train computers to perform a variety of complex tasks and improve with experience. Computers learn how to recognize patterns, make unintended decisions, or react to a dynamic environment. Certain trained machines may be more effective than others because they are based on more suitable ML algorithms or because they were trained through superior training…
▽ More
Machine Learning (ML) algorithms are used to train computers to perform a variety of complex tasks and improve with experience. Computers learn how to recognize patterns, make unintended decisions, or react to a dynamic environment. Certain trained machines may be more effective than others because they are based on more suitable ML algorithms or because they were trained through superior training sets. Although ML algorithms are known and publicly released, training sets may not be reasonably ascertainable and, indeed, may be guarded as trade secrets. While much research has been performed about the privacy of the elements of training sets, in this paper we focus our attention on ML classifiers and on the statistical information that can be unconsciously or maliciously revealed from them. We show that it is possible to infer unexpected but useful information from ML classifiers. In particular, we build a novel meta-classifier and train it to hack other classifiers, obtaining meaningful information about their training sets. This kind of information leakage can be exploited, for example, by a vendor to build more effective classifiers or to simply acquire trade secrets from a competitor's apparatus, potentially violating its intellectual property rights.
△ Less
Submitted 19 June, 2013;
originally announced June 2013.