-
UnifyFL: Enabling Decentralized Cross-Silo Federated Learning
Authors:
Sarang S,
Druva Dhakshinamoorthy,
Aditya Shiva Sharma,
Yuvraj Singh Bhadauria,
Siddharth Chaitra Vivek,
Arihant Bansal,
Arnab K. Paul
Abstract:
Federated Learning (FL) is a decentralized machine learning (ML) paradigm in which models are trained on private data across several devices called clients and combined at a single node called an aggregator rather than aggregating the data itself. Many organizations employ FL to have better privacy-aware ML-driven decision-making capabilities. However, organizations often operate independently rat…
▽ More
Federated Learning (FL) is a decentralized machine learning (ML) paradigm in which models are trained on private data across several devices called clients and combined at a single node called an aggregator rather than aggregating the data itself. Many organizations employ FL to have better privacy-aware ML-driven decision-making capabilities. However, organizations often operate independently rather than collaborate to enhance their FL capabilities due to the lack of an effective mechanism for collaboration. The challenge lies in balancing trust and resource efficiency. One approach relies on trusting a third-party aggregator to consolidate models from all organizations (multilevel FL), but this requires trusting an entity that may be biased or unreliable. Alternatively, organizations can bypass a third party by sharing their local models directly, which requires significant computational resources for validation. Both approaches reflect a fundamental trade-off between trust and resource constraints, with neither offering an ideal solution. In this work, we develop a trust-based cross-silo FL framework called UnifyFL, which uses decentralized orchestration and distributed storage. UnifyFL provides flexibility to the participating organizations and presents synchronous and asynchronous modes to handle stragglers. Our evaluation on a diverse testbed shows that UnifyFL achieves a performance comparable to the ideal multilevel centralized FL while allowing trust and optimal use of resources.
△ Less
Submitted 5 May, 2025; v1 submitted 26 April, 2025;
originally announced April 2025.
-
When Less is More: Achieving Faster Convergence in Distributed Edge Machine Learning
Authors:
Advik Raj Basani,
Siddharth Chaitra Vivek,
Advaith Krishna,
Arnab K. Paul
Abstract:
Distributed Machine Learning (DML) on resource-constrained edge devices holds immense potential for real-world applications. However, achieving fast convergence in DML in these heterogeneous environments remains a significant challenge. Traditional frameworks like Bulk Synchronous Parallel and Asynchronous Stochastic Parallel rely on frequent, small updates that incur substantial communication ove…
▽ More
Distributed Machine Learning (DML) on resource-constrained edge devices holds immense potential for real-world applications. However, achieving fast convergence in DML in these heterogeneous environments remains a significant challenge. Traditional frameworks like Bulk Synchronous Parallel and Asynchronous Stochastic Parallel rely on frequent, small updates that incur substantial communication overhead and hinder convergence speed. Furthermore, these frameworks often employ static dataset sizes, neglecting the heterogeneity of edge devices and potentially leading to straggler nodes that slow down the entire training process. The straggler nodes, i.e., edge devices that take significantly longer to process their assigned data chunk, hinder the overall training speed. To address these limitations, this paper proposes Hermes, a novel probabilistic framework for efficient DML on edge devices. This framework leverages a dynamic threshold based on recent test loss behavior to identify statistically significant improvements in the model's generalization capability, hence transmitting updates only when major improvements are detected, thereby significantly reducing communication overhead. Additionally, Hermes employs dynamic dataset allocation to optimize resource utilization and prevents performance degradation caused by straggler nodes. Our evaluations on a real-world heterogeneous resource-constrained environment demonstrate that Hermes achieves faster convergence compared to state-of-the-art methods, resulting in a remarkable $13.22$x reduction in training time and a $62.1\%$ decrease in communication overhead.
△ Less
Submitted 27 October, 2024;
originally announced October 2024.
-
Practical Privacy-Preserving Identity Verification using Third-Party Cloud Services and FHE (Role of Data Encoding in Circuit Depth Management)
Authors:
Deep Inder Mohan,
Srinivas Vivek
Abstract:
National digital identity verification systems have played a critical role in the effective distribution of goods and services, particularly, in developing countries. Due to the cost involved in deploying and maintaining such systems, combined with a lack of in-house technical expertise, governments seek to outsource this service to third-party cloud service providers to the extent possible. This…
▽ More
National digital identity verification systems have played a critical role in the effective distribution of goods and services, particularly, in developing countries. Due to the cost involved in deploying and maintaining such systems, combined with a lack of in-house technical expertise, governments seek to outsource this service to third-party cloud service providers to the extent possible. This leads to increased concerns regarding the privacy of users' personal data. In this work, we propose a practical privacy-preserving digital identity (ID) verification protocol where the third-party cloud services process the identity data encrypted using a (single-key) Fully Homomorphic Encryption (FHE) scheme such as BFV. Though the role of a trusted entity such as government is not completely eliminated, our protocol does significantly reduces the computation load on such parties.
A challenge in implementing a privacy-preserving ID verification protocol using FHE is to support various types of queries such as exact and/or fuzzy demographic and biometric matches including secure age comparisons. From a cryptographic engineering perspective, our main technical contribution is a user data encoding scheme that encodes demographic and biometric user data in only two BFV ciphertexts and yet facilitates us to outsource various types of ID verification queries to a third-party cloud. Our encoding scheme also ensures that the only computation done by the trusted entity is a query-agnostic "extended" decryption. This is in stark contrast with recent works that outsource all the non-arithmetic operations to a trusted server. We implement our protocol using the Microsoft SEAL FHE library and demonstrate its practicality.
△ Less
Submitted 27 September, 2024; v1 submitted 15 August, 2024;
originally announced August 2024.
-
Ciphertext-Only Attack on a Secure $k$-NN Computation on Cloud
Authors:
Shyam Murthy,
Santosh Kumar Upadhyaya,
Srinivas Vivek
Abstract:
The rise of cloud computing has spurred a trend of transferring data storage and computational tasks to the cloud. To protect confidential information such as customer data and business details, it is essential to encrypt this sensitive data before cloud storage. Implementing encryption can prevent unauthorized access, data breaches, and the resultant financial loss, reputation damage, and legal i…
▽ More
The rise of cloud computing has spurred a trend of transferring data storage and computational tasks to the cloud. To protect confidential information such as customer data and business details, it is essential to encrypt this sensitive data before cloud storage. Implementing encryption can prevent unauthorized access, data breaches, and the resultant financial loss, reputation damage, and legal issues. Moreover, to facilitate the execution of data mining algorithms on the cloud-stored data, the encryption needs to be compatible with domain computation. The $k$-nearest neighbor ($k$-NN) computation for a specific query vector is widely used in fields like location-based services. Sanyashi et al. (ICISS 2023) proposed an encryption scheme to facilitate privacy-preserving $k$-NN computation on the cloud by utilizing Asymmetric Scalar-Product-Preserving Encryption (ASPE).
In this work, we identify a significant vulnerability in the aforementioned encryption scheme of Sanyashi et al. Specifically, we give an efficient algorithm and also empirically demonstrate that their encryption scheme is vulnerable to the ciphertext-only attack (COA).
△ Less
Submitted 17 April, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
Revisiting a Privacy-Preserving Location-based Service Protocol using Edge Computing
Authors:
Santosh Kumar Upadhyaya,
Srinivas Vivek
Abstract:
Location-based services are getting more popular day by day. Finding nearby stores, proximity-based marketing, on-road service assistance, etc., are some of the services that use location-based services. In location-based services, user information like user identity, user query, and location must be protected. Ma et al. (INFOCOM-BigSecurity 2019) proposed a privacy-preserving location-based servi…
▽ More
Location-based services are getting more popular day by day. Finding nearby stores, proximity-based marketing, on-road service assistance, etc., are some of the services that use location-based services. In location-based services, user information like user identity, user query, and location must be protected. Ma et al. (INFOCOM-BigSecurity 2019) proposed a privacy-preserving location-based service using Somewhat Homomorphic Encryption (SHE). Their protocol uses edge nodes that compute on SHE encrypted location data and determines the $k$-nearest points of interest contained in the Location-based Server (LBS) without revealing the original user coordinates to LBS, hence, ensuring privacy of users locations. In this work, we show that the above protocol by Ma et al. has a critical flaw. In particular, we show that their secure comparison protocol has a correctness issue in that it will not lead to correct comparison. A major consequence of this flaw is that straightforward approaches to fix this issue will make their protocol insecure. Namely, the LBS will be able to recover the actual locations of the users in each and every query.
△ Less
Submitted 21 November, 2022;
originally announced November 2022.
-
Driver Locations Harvesting Attack on pRide
Authors:
Shyam Murthy,
Srinivas Vivek
Abstract:
Privacy preservation in Ride-Hailing Services (RHS) is intended to protect privacy of drivers and riders. pRide, published in IEEE Trans. Vehicular Technology 2021, is a prediction based privacy-preserving RHS protocol to match riders with an optimum driver. In the protocol, the Service Provider (SP) homomorphically computes Euclidean distances between encrypted locations of drivers and rider. Rid…
▽ More
Privacy preservation in Ride-Hailing Services (RHS) is intended to protect privacy of drivers and riders. pRide, published in IEEE Trans. Vehicular Technology 2021, is a prediction based privacy-preserving RHS protocol to match riders with an optimum driver. In the protocol, the Service Provider (SP) homomorphically computes Euclidean distances between encrypted locations of drivers and rider. Rider selects an optimum driver using decrypted distances augmented by a new-ride-emergence prediction. To improve the effectiveness of driver selection, the paper proposes an enhanced version where each driver gives encrypted distances to each corner of her grid. To thwart a rider from using these distances to launch an inference attack, the SP blinds these distances before sharing them with the rider. In this work, we propose a passive attack where an honest-but-curious adversary rider who makes a single ride request and receives the blinded distances from SP can recover the constants used to blind the distances. Using the unblinded distances, rider to driver distance and Google Nearest Road API, the adversary can obtain the precise locations of responding drivers. We conduct experiments with random on-road driver locations for four different cities. Our experiments show that we can determine the precise locations of at least 80% of the drivers participating in the enhanced pRide protocol.
△ Less
Submitted 4 January, 2023; v1 submitted 24 October, 2022;
originally announced October 2022.
-
Passive Triangulation Attack on ORide
Authors:
Shyam Murthy,
Srinivas Vivek
Abstract:
Privacy preservation in Ride Hailing Services is intended to protect privacy of drivers and riders. ORide is one of the early RHS proposals published at USENIX Security Symposium 2017. In the ORide protocol, riders and drivers, operating in a zone, encrypt their locations using a Somewhat Homomorphic Encryption scheme (SHE) and forward them to the Service Provider (SP). SP homomorphically computes…
▽ More
Privacy preservation in Ride Hailing Services is intended to protect privacy of drivers and riders. ORide is one of the early RHS proposals published at USENIX Security Symposium 2017. In the ORide protocol, riders and drivers, operating in a zone, encrypt their locations using a Somewhat Homomorphic Encryption scheme (SHE) and forward them to the Service Provider (SP). SP homomorphically computes the squared Euclidean distance between riders and available drivers. Rider receives the encrypted distances and selects the optimal rider after decryption. In order to prevent a triangulation attack, SP randomly permutes the distances before sending them to the rider. In this work, we use propose a passive attack that uses triangulation to determine coordinates of all participating drivers whose permuted distances are available from the points of view of multiple honest-but-curious adversary riders. An attack on ORide was published at SAC 2021. The same paper proposes a countermeasure using noisy Euclidean distances to thwart their attack. We extend our attack to determine locations of drivers when given their permuted and noisy Euclidean distances from multiple points of reference, where the noise perturbation comes from a uniform distribution. We conduct experiments with different number of drivers and for different perturbation values. Our experiments show that we can determine locations of all drivers participating in the ORide protocol. For the perturbed distance version of the ORide protocol, our algorithm reveals locations of about 25% to 50% of participating drivers. Our algorithm runs in time polynomial in number of drivers.
△ Less
Submitted 4 January, 2023; v1 submitted 25 August, 2022;
originally announced August 2022.
-
Comments on "A Privacy-Preserving Online Ride-Hailing System Without Involving a Third Trusted Server"
Authors:
Srinivas Vivek
Abstract:
Recently, Xie et al. (IEEE Transactions on Information Forensics and Security, vol. 16, pp. 3068-3081, 2021) proposed a privacy-preserving Online Ride-Hailing (ORH) protocol that does not make use of a trusted third-party server. The primary goal of such privacy-preserving ORH protocols is to ensure the privacy of riders' and drivers' location data w.r.t. the ORH Service Provider (SP). In this not…
▽ More
Recently, Xie et al. (IEEE Transactions on Information Forensics and Security, vol. 16, pp. 3068-3081, 2021) proposed a privacy-preserving Online Ride-Hailing (ORH) protocol that does not make use of a trusted third-party server. The primary goal of such privacy-preserving ORH protocols is to ensure the privacy of riders' and drivers' location data w.r.t. the ORH Service Provider (SP). In this note, we demonstrate a passive attack by the SP in the protocol of Xie et al. that enables it to completely recover the location of the rider as well as that of the responding drivers in each and every ride request query.
△ Less
Submitted 13 December, 2021;
originally announced December 2021.
-
Cryptanalysis of the Privacy-Preserving Ride-Hailing Service TRACE
Authors:
Deepak Kumaraswamy,
Srinivas Vivek
Abstract:
In a typical ride-hailing service, the service provider (RS) matches a customer (RC) with the closest vehicle (RV) registered to this service. TRACE is an efficient privacy-preserving ride-hailing service proposed by Wang et al. in 2018. TRACE uses masking along with other cryptographic techniques to ensure efficient and accurate ride-matching. The RS uses masked location information to match RCs…
▽ More
In a typical ride-hailing service, the service provider (RS) matches a customer (RC) with the closest vehicle (RV) registered to this service. TRACE is an efficient privacy-preserving ride-hailing service proposed by Wang et al. in 2018. TRACE uses masking along with other cryptographic techniques to ensure efficient and accurate ride-matching. The RS uses masked location information to match RCs and RVs within a quadrant without obtaining their exact locations, thus ensuring privacy. In this work, we disprove the privacy claims in TRACE by showing the following: a) RCs and RVs can identify the secret spatial division maintained by RS (this reveals information about the density of RVs in the region and other potential trade secrets), and b) the RS can identify exact locations of RCs and RVs (this violates location privacy). Prior to exchanging encrypted messages in the TRACE protocol, each entity masks the plaintext message with a secret unknown to others. Our attack allows other entities to recover this plaintext from the masked value by exploiting shared randomness used across different messages, that eventually leads to a system of linear equations in the unknown plaintexts. This holds even when all the participating entities are honest-but-curious. We implement our attack and demonstrate its efficiency and high success rate. For the security parameters recommended for TRACE, an RV can recover the spatial division in less than a minute, and the RS can recover the location of an RV in less than a second on a commodity laptop.
△ Less
Submitted 24 December, 2021; v1 submitted 9 November, 2021;
originally announced November 2021.
-
Attacks on a Privacy-Preserving Publish-Subscribe System and a Ride-Hailing Service
Authors:
Srinivas Vivek
Abstract:
A privacy-preserving Context-Aware Publish-Subscribe System (CA-PSS) enables an intermediary (broker) to match the content from a publisher and the subscription by a subscriber based on the current context while preserving confidentiality of the subscriptions and notifications. While a privacy-preserving Ride-Hailing Service (RHS) enables an intermediary (service provider) to match a ride request…
▽ More
A privacy-preserving Context-Aware Publish-Subscribe System (CA-PSS) enables an intermediary (broker) to match the content from a publisher and the subscription by a subscriber based on the current context while preserving confidentiality of the subscriptions and notifications. While a privacy-preserving Ride-Hailing Service (RHS) enables an intermediary (service provider) to match a ride request with a taxi driver in a privacy-friendly manner. In this work, we attack a privacy-preserving CA-PSS proposed by Nabeel et al. (2013), where we show that any entity in the system including the broker can learn the confidential subscriptions of the subscribers. We also attack a privacy-preserving RHS called lpRide proposed by Yu et al. (2019), where we show that any rider/driver can efficiently recover the secret keys of all other riders and drivers. Also, we show that any rider/driver will be able to learn the location of any rider. The attacks are based on our cryptanalysis of the modified Paillier cryptosystem proposed by Nabeel et al. that forms a building block for both the above protocols.
△ Less
Submitted 10 May, 2021;
originally announced May 2021.
-
Revisiting Driver Anonymity in ORide
Authors:
Deepak Kumaraswamy,
Shyam Murthy,
Srinivas Vivek
Abstract:
Ride Hailing Services (RHS) have become a popular means of transportation, and with its popularity comes the concerns of privacy of riders and drivers. ORide is a privacy-preserving RHS proposed at the USENIX Security Symposium 2017 and uses Somewhat Homomorphic Encryption (SHE). In their protocol, a rider and all drivers in a zone send their encrypted coordinates to the RHS Service Provider (SP)…
▽ More
Ride Hailing Services (RHS) have become a popular means of transportation, and with its popularity comes the concerns of privacy of riders and drivers. ORide is a privacy-preserving RHS proposed at the USENIX Security Symposium 2017 and uses Somewhat Homomorphic Encryption (SHE). In their protocol, a rider and all drivers in a zone send their encrypted coordinates to the RHS Service Provider (SP) who computes the squared Euclidean distances between them and forwards them to the rider. The rider decrypts these and selects the optimal driver with least Euclidean distance. In this work, we demonstrate a location-harvesting attack where an honest-but-curious rider, making only a single ride request, can determine the exact coordinates of about half the number of responding drivers even when only the distance between the rider and drivers are given. The significance of our attack lies in inferring locations of other drivers in the zone, which are not (supposed to be) revealed to the rider as per the protocol. We validate our attack by running experiments on zones of varying sizes in arbitrarily selected big cities. Our attack is based on enumerating lattice points on a circle of sufficiently small radius and eliminating solutions based on conditions imposed by the application scenario. Finally, we propose a modification to ORide aimed at thwarting our attack and show that this modification provides sufficient driver anonymity while preserving ride matching accuracy.
△ Less
Submitted 24 December, 2021; v1 submitted 16 January, 2021;
originally announced January 2021.
-
Integrating Network Embedding and Community Outlier Detection via Multiclass Graph Description
Authors:
Sambaran Bandyopadhyay,
Saley Vishal Vivek,
M. N. Murty
Abstract:
Network (or graph) embedding is the task to map the nodes of a graph to a lower dimensional vector space, such that it preserves the graph properties and facilitates the downstream network mining tasks. Real world networks often come with (community) outlier nodes, which behave differently from the regular nodes of the community. These outlier nodes can affect the embedding of the regular nodes, i…
▽ More
Network (or graph) embedding is the task to map the nodes of a graph to a lower dimensional vector space, such that it preserves the graph properties and facilitates the downstream network mining tasks. Real world networks often come with (community) outlier nodes, which behave differently from the regular nodes of the community. These outlier nodes can affect the embedding of the regular nodes, if not handled carefully. In this paper, we propose a novel unsupervised graph embedding approach (called DMGD) which integrates outlier and community detection with node embedding. We extend the idea of deep support vector data description to the framework of graph embedding when there are multiple communities present in the given network, and an outlier is characterized relative to its community. We also show the theoretical bounds on the number of outliers detected by DMGD. Our formulation boils down to an interesting minimax game between the outliers, community assignments and the node embedding function. We also propose an efficient algorithm to solve this optimization framework. Experimental results on both synthetic and real world networks show the merit of our approach compared to state-of-the-arts.
△ Less
Submitted 20 July, 2020;
originally announced July 2020.
-
Regularizers for Single-step Adversarial Training
Authors:
B. S. Vivek,
R. Venkatesh Babu
Abstract:
The progress in the last decade has enabled machine learning models to achieve impressive performance across a wide range of tasks in Computer Vision. However, a plethora of works have demonstrated the susceptibility of these models to adversarial samples. Adversarial training procedure has been proposed to defend against such adversarial attacks. Adversarial training methods augment mini-batches…
▽ More
The progress in the last decade has enabled machine learning models to achieve impressive performance across a wide range of tasks in Computer Vision. However, a plethora of works have demonstrated the susceptibility of these models to adversarial samples. Adversarial training procedure has been proposed to defend against such adversarial attacks. Adversarial training methods augment mini-batches with adversarial samples, and typically single-step (non-iterative) methods are used for generating these adversarial samples. However, models trained using single-step adversarial training converge to degenerative minima where the model merely appears to be robust. The pseudo robustness of these models is due to the gradient masking effect. Although multi-step adversarial training helps to learn robust models, they are hard to scale due to the use of iterative methods for generating adversarial samples. To address these issues, we propose three different types of regularizers that help to learn robust models using single-step adversarial training methods. The proposed regularizers mitigate the effect of gradient masking by harnessing on properties that differentiate a robust model from that of a pseudo robust model. Performance of models trained using the proposed regularizers is on par with models trained using computationally expensive multi-step adversarial training methods.
△ Less
Submitted 3 February, 2020;
originally announced February 2020.
-
FDA: Feature Disruptive Attack
Authors:
Aditya Ganeshan,
B. S. Vivek,
R. Venkatesh Babu
Abstract:
Though Deep Neural Networks (DNN) show excellent performance across various computer vision tasks, several works show their vulnerability to adversarial samples, i.e., image samples with imperceptible noise engineered to manipulate the network's prediction. Adversarial sample generation methods range from simple to complex optimization techniques. Majority of these methods generate adversaries thr…
▽ More
Though Deep Neural Networks (DNN) show excellent performance across various computer vision tasks, several works show their vulnerability to adversarial samples, i.e., image samples with imperceptible noise engineered to manipulate the network's prediction. Adversarial sample generation methods range from simple to complex optimization techniques. Majority of these methods generate adversaries through optimization objectives that are tied to the pre-softmax or softmax output of the network. In this work we, (i) show the drawbacks of such attacks, (ii) propose two new evaluation metrics: Old Label New Rank (OLNR) and New Label Old Rank (NLOR) in order to quantify the extent of damage made by an attack, and (iii) propose a new adversarial attack FDA: Feature Disruptive Attack, to address the drawbacks of existing attacks. FDA works by generating image perturbation that disrupt features at each layer of the network and causes deep-features to be highly corrupt. This allows FDA adversaries to severely reduce the performance of deep networks. We experimentally validate that FDA generates stronger adversaries than other state-of-the-art methods for image classification, even in the presence of various defense measures. More importantly, we show that FDA disrupts feature-representation based tasks even without access to the task-specific network or methodology. Code available at: https://github.com/BardOfCodes/fda
△ Less
Submitted 10 September, 2019;
originally announced September 2019.
-
Cyber-physical risks of hacked Internet-connected vehicles
Authors:
Skanda Vivek,
David Yanni,
Peter J. Yunker,
Jesse L. Silverberg
Abstract:
The integration of automotive technology with Internet-connectivity promises to both dramatically improve transportation, while simultaneously introducing the potential for new unknown risks. Internet-connected vehicles are like digital data because they can be targeted for malicious hacking. Unlike digital data, however, Internet-connected vehicles are cyber-physical systems that physically inter…
▽ More
The integration of automotive technology with Internet-connectivity promises to both dramatically improve transportation, while simultaneously introducing the potential for new unknown risks. Internet-connected vehicles are like digital data because they can be targeted for malicious hacking. Unlike digital data, however, Internet-connected vehicles are cyber-physical systems that physically interact with each other and their environment. As such, the extension of cybersecurity concerns into the cyber-physical domain introduces new possibilities for self-organized phenomena in traffic flow. Here, we study a scenario envisioned by cybersecurity experts leading to a large number of Internet-connected vehicles being suddenly and simultaneously disabled. We investigate post-hack traffic using agent-based simulations, and discover the critical relevance of percolation for probabilistically predicting the outcomes on a multi-lane road in the immediate aftermath of a vehicle-targeted cyber attack. We develop an analytic percolation-based model to rapidly assess road conditions given the density of disabled vehicles and apply it to study the street network of Manhattan (NY, USA) revealing the city's vulnerability to this particular cyber-physical attack.
△ Less
Submitted 28 February, 2019;
originally announced March 2019.