-
Confidential Computing across Edge-to-Cloud for Machine Learning: A Survey Study
Authors:
SM Zobaed,
Mohsen Amini Salehi
Abstract:
Confidential computing has gained prominence due to the escalating volume of data-driven applications (e.g., machine learning and big data) and the acute need for secure processing of sensitive data, particularly across distributed environments such as the edge-to-cloud continuum. Because the work accomplished in this emerging area is scattered across various research fields, this paper surveys the fundamental concepts and the cutting-edge software and hardware solutions developed for confidential computing using trusted execution environments, homomorphic encryption, and secure enclaves. We underscore the significance of building trust at both the hardware and software levels and delve into their applications, particularly for machine learning (ML). While substantial progress has been made, some barely explored areas need extra attention from researchers and practitioners in the community to improve confidentiality, develop more robust attestation mechanisms, and address vulnerabilities of existing trusted execution environments. By providing a comprehensive taxonomy of the confidential computing landscape, this survey enables researchers to advance the field and, ultimately, to ensure the secure processing of users' sensitive data across a multitude of applications and computing tiers.
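Homomorphic encryption, one of the techniques this survey covers, lets an untrusted party compute directly on ciphertexts. As a minimal illustration that is not drawn from the survey itself, the sketch below implements a toy version of the Paillier cryptosystem, a well-known additively homomorphic scheme. Its tiny hard-coded primes provide no security; the example only shows that multiplying two ciphertexts yields an encryption of the sum of their plaintexts, so a party could add values it cannot read.

```python
import math
import random

# Toy Paillier parameters: tiny primes, NOT secure, for illustration only.
P, Q = 293, 433
N = P * Q
N_SQ = N * N
G = N + 1                        # common simple choice of generator
LAM = math.lcm(P - 1, Q - 1)     # Carmichael's function of N (Python 3.9+)

def _l(u: int) -> int:
    """Paillier's L function: L(u) = (u - 1) / N."""
    return (u - 1) // N

MU = pow(_l(pow(G, LAM, N_SQ)), -1, N)   # modular inverse (Python 3.8+)

def encrypt(m: int, rng: random.Random) -> int:
    """Encrypt 0 <= m < N using fresh randomness r coprime to N."""
    r = rng.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = rng.randrange(1, N)
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ

def decrypt(c: int) -> int:
    return (_l(pow(c, LAM, N_SQ)) * MU) % N

def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N_SQ

rng = random.Random(0)
c_a, c_b = encrypt(1200, rng), encrypt(345, rng)   # e.g., two data owners' values
total = decrypt(add_encrypted(c_a, c_b))            # the adder works only on c_a, c_b
print(total)  # 1545
```

Real deployments would use a vetted library and keys thousands of bits long; the arithmetic is the same.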
Submitted 31 July, 2023;
originally announced July 2023.
-
AI-Driven Confidential Computing across Edge-to-Cloud Continuum
Authors:
SM Zobaed
Abstract:
With the meteoric growth of technology, individuals and organizations are widely adopting cloud services to reduce the burden of maintenance. Despite the cloud's scalability and ease of use, many users who own sensitive data refrain from fully utilizing cloud services due to confidentiality concerns. Maintaining the confidentiality of data at rest and in transit has been widely explored, but data remains vulnerable in the cloud while it is in use. This vulnerability is further elevated once computing spans the edge-to-cloud continuum. Accordingly, the goal of this dissertation is to enable data confidentiality by adopting confidential computing across the continuum. Toward this goal, one approach we explore is to separate the intelligence aspect of data processing from its pattern-matching aspect. We present our approach to confidential data clustering on the cloud, and then develop a confidential search service across the edge-to-cloud continuum for unstructured text data. Our proposed clustering solution, named ClusPr, performs topic-based clustering for static and dynamic datasets and improves cluster coherency by 30% to 60% when compared with other encryption-based clustering techniques. Our trusted enterprise search service, named SAED, provides context-aware and personalized semantic search over confidential data across the continuum. We realized that enabling confidential computing across the edge-to-cloud continuum requires a major contribution from the edge tier, particularly to run multiple Deep Learning (DL) services concurrently. This raises memory contention on the edge tier. To resolve this, we develop the Edge-MultiAI framework, which manages the Neural Network (NN) models of DL applications such that it can meet their latency constraints without compromising inference accuracy.
Submitted 2 January, 2023;
originally announced January 2023.
-
Edge-MultiAI: Multi-Tenancy of Latency-Sensitive Deep Learning Applications on Edge
Authors:
SM Zobaed,
Ali Mokhtari,
Jaya Prakash Champati,
Mathieu Kourouma,
Mohsen Amini Salehi
Abstract:
Smart IoT-based systems often require continuous execution of multiple latency-sensitive Deep Learning (DL) applications. Edge servers serve as the cornerstone of such IoT-based systems; however, their resource limitations hamper the continuous execution of multiple (multi-tenant) DL applications. The challenge is that DL applications function based on bulky "neural network (NN) models" that cannot be simultaneously maintained in the limited memory space of the edge. Accordingly, the main contribution of this research is to overcome the memory contention challenge, thereby meeting the latency constraints of the DL applications without compromising their inference accuracy. We propose an efficient NN model management framework, called Edge-MultiAI, that ushers the NN models of the DL applications into the edge memory such that the degree of multi-tenancy and the number of warm-starts are maximized. Edge-MultiAI leverages NN model compression techniques, such as model quantization, and dynamically loads NN models for DL applications to increase multi-tenancy on the edge server. We also devise a model management heuristic for Edge-MultiAI, called iWS-BFE, that functions based on Bayesian theory to predict the inference requests of multi-tenant applications and uses these predictions to choose the appropriate NN models for loading, thereby increasing the number of warm-start inferences. We evaluate the efficacy and robustness of Edge-MultiAI under various configurations. The results reveal that Edge-MultiAI can increase the degree of multi-tenancy on the edge by at least 2X and increase the number of warm-starts by around 60% without any major loss in the inference accuracy of the applications.
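The abstract does not detail iWS-BFE, so the following is only a hypothetical sketch of the general idea it describes: estimate, with a Bayesian update, how likely each application is to receive an inference request, then use those estimates to decide which NN model variant (e.g., full-precision or quantized) of each application to keep within a fixed edge-memory budget. The Beta-Bernoulli posterior, the two-pass admit-then-upgrade rule, and every application name, memory size, and accuracy below are assumptions for illustration, not the paper's algorithm or results.

```python
from dataclasses import dataclass

@dataclass
class ModelVariant:
    name: str
    memory_mb: int
    accuracy: float

@dataclass
class Application:
    name: str
    variants: list            # hypothetical variants, e.g., FP32 and INT8-quantized
    alpha: float = 1.0        # Beta prior pseudo-count: "request arrived"
    beta: float = 1.0         # Beta prior pseudo-count: "no request"

    def observe(self, requested: bool) -> None:
        """Beta-Bernoulli update from one observed time slot."""
        if requested:
            self.alpha += 1
        else:
            self.beta += 1

    def request_probability(self) -> float:
        """Posterior mean probability of a request in the next time slot."""
        return self.alpha / (self.alpha + self.beta)

def plan_loading(apps, budget_mb):
    """Return {app name: variant name} to keep resident in edge memory."""
    ranked = sorted(apps, key=lambda a: a.request_probability(), reverse=True)
    plan, remaining = {}, budget_mb
    # Pass 1: admit as many tenants as possible with their smallest variant.
    for app in ranked:
        smallest = min(app.variants, key=lambda v: v.memory_mb)
        if smallest.memory_mb <= remaining:
            plan[app.name] = smallest
            remaining -= smallest.memory_mb
    # Pass 2: most likely apps first, upgrade to the most accurate variant
    # that fits once the currently planned variant's memory is freed.
    for app in ranked:
        if app.name not in plan:
            continue
        room = remaining + plan[app.name].memory_mb
        best = max((v for v in app.variants if v.memory_mb <= room),
                   key=lambda v: v.accuracy)
        plan[app.name] = best
        remaining = room - best.memory_mb
    return {name: variant.name for name, variant in plan.items()}

apps = [
    Application("detector", [ModelVariant("det-fp32", 900, 0.91), ModelVariant("det-int8", 250, 0.89)]),
    Application("ocr", [ModelVariant("ocr-fp32", 500, 0.95), ModelVariant("ocr-int8", 150, 0.93)]),
    Application("speech", [ModelVariant("asr-fp32", 600, 0.88), ModelVariant("asr-int8", 200, 0.85)]),
]
history = {"detector": [1, 1, 1, 1, 0], "ocr": [1, 1, 1, 0, 0], "speech": [1, 0, 0, 0, 0]}
for app in apps:
    for requested in history[app.name]:
        app.observe(bool(requested))

plan = plan_loading(apps, budget_mb=1300)
print(plan)  # {'detector': 'det-fp32', 'ocr': 'ocr-int8', 'speech': 'asr-int8'}
```

Admitting every application with its smallest variant first favors multi-tenancy; the upgrade pass then spends leftover memory on accuracy for the applications most likely to be requested. With this hypothetical history, all three applications stay warm and only the most likely one gets its full-precision model.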
Submitted 14 November, 2022;
originally announced November 2022.
-
DeepFakes: Detecting Forged and Synthetic Media Content Using Machine Learning
Authors:
Sm Zobaed,
Md Fazle Rabby,
Md Istiaq Hossain,
Ekram Hossain,
Sazib Hasan,
Asif Karim,
Khan Md. Hasib
Abstract:
The rapid advancement in deep learning makes differentiating authentic from manipulated facial images and video clips unprecedentedly hard. The underlying technology of manipulating facial appearances through deep generative approaches, known as DeepFake, has emerged recently and given rise to a vast number of malicious face-manipulation applications. Consequently, techniques that can assess the integrity of digital visual content are indisputably needed to reduce the impact of DeepFake creations. The large body of research on DeepFake creation and detection creates an opportunity for each side to push the other beyond its current state. This study presents challenges, research trends, and directions related to DeepFake creation and detection techniques by reviewing the notable research in the DeepFake domain, in order to facilitate the development of more robust approaches that can deal with more advanced DeepFakes in the future.
Submitted 7 September, 2021;
originally announced September 2021.
-
SAED: Edge-Based Intelligence for Privacy-Preserving Enterprise Search on the Cloud
Authors:
Sakib M Zobaed,
Mohsen Amini Salehi,
Rajkumar Buyya
Abstract:
Cloud-based enterprise search services (e.g., AWS Kendra) have been attracting big data owners by offering them convenient, real-time search solutions. However, individuals and organizations possessing confidential big data are hesitant to embrace such services due to valid data privacy concerns. In addition, to offer an intelligent search, these services access the user's search history, which further jeopardizes the user's privacy. To overcome the privacy problem, the main idea of this research is to separate the intelligence aspect of the search from its pattern-matching aspect. Following this idea, the search intelligence is provided by an on-premises edge tier, and the shared cloud tier only serves as an exhaustive pattern-matching search utility. We propose the Smartness At Edge (SAED) mechanism, which offers intelligence in the form of semantic and personalized search at the edge tier while maintaining the privacy of the search on the cloud tier. At the edge tier, SAED uses a knowledge-based lexical database to expand the query and cover its semantics. SAED personalizes the search via an RNN model that can learn the user's interests. A word embedding model is used to retrieve documents based on their semantic relevance to the search query. SAED is generic and can be plugged into existing enterprise search systems to enable them to offer intelligent and privacy-preserving search without enforcing any change on them. Evaluation results on two enterprise search systems under real settings, verified by human users, demonstrate that SAED can improve the relevancy of the retrieved results by, on average, 24% for plain-text and 75% for encrypted generic datasets.
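SAED's edge-side pipeline, as the abstract describes it, expands a query with a lexical database and then ranks documents by semantic relevance using word embeddings. The sketch below illustrates only those two steps under stated assumptions: a tiny hypothetical synonym table stands in for the knowledge-based lexical database (a resource such as WordNet would play this role), and hand-made 3-dimensional vectors stand in for a trained word-embedding model. It omits SAED's RNN-based personalization and its interaction with the cloud search utility.

```python
import math

# Hypothetical stand-in for a knowledge-based lexical database.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "price": ["cost"],
}

# Hypothetical toy embeddings; a real system would load a trained model.
EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0], "automobile": [0.85, 0.15, 0.0],
    "vehicle": [0.8, 0.2, 0.1], "price": [0.1, 0.9, 0.0],
    "cost": [0.15, 0.85, 0.05], "insurance": [0.3, 0.6, 0.2],
    "recipe": [0.0, 0.1, 0.95], "pasta": [0.05, 0.0, 0.9],
}

def expand_query(terms):
    """Append lexical-database synonyms so the query covers its semantics."""
    expanded = list(terms)
    for term in terms:
        for syn in SYNONYMS.get(term, []):
            if syn not in expanded:
                expanded.append(syn)
    return expanded

def mean_vector(words):
    """Average the embeddings of known words (None if none are known)."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank(query_terms, documents):
    """Return document ids ordered by similarity to the expanded query."""
    q = mean_vector(expand_query(query_terms))
    scored = []
    for doc_id, words in documents.items():
        d = mean_vector(words)
        if q is not None and d is not None:
            scored.append((cosine(q, d), doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

docs = {
    "d1": ["automobile", "insurance", "cost"],
    "d2": ["pasta", "recipe"],
    "d3": ["vehicle", "price"],
}
print(expand_query(["car", "price"]))  # ['car', 'price', 'automobile', 'vehicle', 'cost']
print(rank(["car", "price"], docs))    # ['d3', 'd1', 'd2']
```

Document `d1` shares no literal term with the query yet still outranks the unrelated `d2`, which is the kind of behavior the abstract attributes to query expansion combined with embedding-based retrieval.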
Submitted 11 March, 2021; v1 submitted 26 February, 2021;
originally announced February 2021.
-
SensPick: Sense Picking for Word Sense Disambiguation
Authors:
Sm Zobaed,
Md Enamul Haque,
Md Fazle Rabby,
Mohsen Amini Salehi
Abstract:
Word sense disambiguation (WSD) methods identify the most suitable meaning of a word with respect to its usage in a specific context. Neural network-based WSD approaches rely on a sense-annotated corpus because they do not utilize lexical resources. In this study, we utilize both the context and the related gloss information of a target word to model the semantic relationship between the word and its set of glosses. We propose SensPick, a type of stacked bidirectional Long Short-Term Memory (LSTM) network, to perform the WSD task. The experimental evaluation demonstrates that SensPick outperforms traditional and state-of-the-art models on most of the benchmark datasets, with a relative improvement of 3.5% in F1 score. While the improvement is not significant, incorporating semantic relationships places SensPick in the leading position compared to the others.
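The reported gain is a relative improvement, which is easy to confuse with an absolute gain in F1 points. Using a purely hypothetical baseline F1 of 70.0 (not a figure from the paper), a 3.5% relative improvement works out as follows:

```latex
\[
\Delta_{\text{rel}} = \frac{F_1^{\text{SensPick}} - F_1^{\text{baseline}}}{F_1^{\text{baseline}}} = 0.035
\quad\Longrightarrow\quad
F_1^{\text{SensPick}} = 70.0 \times (1 + 0.035) = 72.45,
\]
\[
F_1^{\text{SensPick}} - F_1^{\text{baseline}} = 72.45 - 70.0 = 2.45 \ \text{points}.
\]
```

In general the absolute gain is $0.035 \times F_1^{\text{baseline}}$, so it is smaller than 3.5 points whenever the baseline F1 is below 100.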
Submitted 9 February, 2021;
originally announced February 2021.
-
Privacy-Preserving Clustering of Unstructured Big Data for Cloud-Based Enterprise Search Solutions
Authors:
SM Zobaed,
Mohsen Amini Salehi
Abstract:
Cloud-based enterprise search services (e.g., Amazon Kendra) are attractive to big data owners because they provide convenient search solutions over enterprise big datasets. However, individuals and businesses that deal with confidential big data (e.g., credential documents) are reluctant to fully embrace such services due to valid concerns about data privacy. Solutions based on client-side encryption have been explored to mitigate privacy concerns. Nonetheless, such solutions hinder data processing, specifically clustering, which is pivotal in dealing with different forms of big data. For instance, clustering is critical to limit the search space and perform real-time search operations on big datasets. To overcome the hindrance in clustering encrypted big data, we propose privacy-preserving clustering schemes for three forms of unstructured encrypted big datasets, namely static, semi-dynamic, and dynamic datasets. To preserve data privacy, the proposed clustering schemes function based on statistical characteristics of the data and determine (A) the suitable number of clusters and (B) the appropriate content for each cluster. Experimental results obtained from evaluating the clustering schemes on three different datasets demonstrate a 30% to 60% improvement in cluster coherency compared to other clustering schemes for encrypted data. Employing the clustering schemes in a privacy-preserving enterprise search system decreases its search time by up to 78%, while increasing the search accuracy by up to 35%.
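The proposed schemes operate on statistical characteristics of client-side encrypted data. One common way such statistics can survive encryption, assumed here for illustration and not confirmed by the abstract, is to transform each term deterministically so that equal terms always map to equal tokens. The sketch uses keyed tokenization with HMAC-SHA256 as a stand-in for deterministic encryption and shows that term-frequency statistics computed by the cloud over tokens match those over plaintext, even though the cloud never sees the terms. The key and documents are hypothetical.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical client-held key; the cloud never sees it.
CLIENT_KEY = b"hypothetical-client-secret"

def tokenize_term(term: str) -> str:
    """Deterministic keyed token (HMAC-SHA256): equal terms give equal tokens."""
    return hmac.new(CLIENT_KEY, term.lower().encode(), hashlib.sha256).hexdigest()

def tokenize_document(text: str) -> list:
    return [tokenize_term(t) for t in text.split()]

# Client side: tokenize before upload.
documents = {
    "doc1": "tax audit tax return",
    "doc2": "patient record patient diagnosis",
}
uploaded = {doc_id: tokenize_document(text) for doc_id, text in documents.items()}

# Cloud side: only opaque tokens are visible, yet term-frequency statistics survive.
cloud_stats = {doc_id: Counter(tokens) for doc_id, tokens in uploaded.items()}
plain_stats = {doc_id: Counter(text.split()) for doc_id, text in documents.items()}
for doc_id in documents:
    assert sorted(cloud_stats[doc_id].values()) == sorted(plain_stats[doc_id].values())

_, top_count = cloud_stats["doc1"].most_common(1)[0]
print(top_count)  # 2 ("tax" occurs twice, but the cloud sees only its token)
```

This design deliberately exposes term-frequency patterns to the cloud; that leakage is the trade-off that makes statistics-based processing of the uploaded data possible.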
Submitted 8 June, 2022; v1 submitted 22 May, 2020;
originally announced May 2020.
-
ClustCrypt: Privacy-Preserving Clustering of Unstructured Big Data in the Cloud
Authors:
SM Zobaed,
Sahan Ahmad,
Raju Gottumukkala,
Mohsen Amini Salehi
Abstract:
Security and confidentiality of big data stored in the cloud are important concerns that deter many organizations from adopting cloud services. One common approach to address these concerns is client-side encryption, where data is encrypted on the client machine before being stored in the cloud. Having encrypted data in the cloud, however, limits the ability to cluster data, which is a crucial part of many data analytics applications, such as search systems. To overcome this limitation, in this paper we present an approach named ClustCrypt for efficient topic-based clustering of encrypted unstructured big data in the cloud. ClustCrypt dynamically estimates the optimal number of clusters based on the statistical characteristics of the encrypted data. It also provides a clustering approach for encrypted data. We deploy ClustCrypt within the context of a secure cloud-based semantic search system (S3BD). Experimental results obtained from evaluating ClustCrypt on three datasets demonstrate, on average, a 60% improvement in cluster coherency. ClustCrypt also decreases the search-time overhead by up to 78% and increases the accuracy of search results by up to 35%.
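ClustCrypt estimates the number of clusters from statistical characteristics of the encrypted data and then clusters documents by topic. The abstract does not give the estimator, so the following is a hypothetical heuristic, not ClustCrypt's: treat tokens that appear in several documents and more often than the average token as topic seeds, take the number of seeds as the estimated number of clusters k, and assign each document to the seed it contains most often. Because it touches only opaque token strings, such a heuristic could run over deterministically encrypted tokens without decrypting them. The token names and documents are illustrative.

```python
from collections import Counter, defaultdict

def find_topic_seeds(docs, min_df=2):
    """Hypothetical heuristic (not ClustCrypt's estimator): tokens that occur in
    at least `min_df` documents and more often than the average token act as
    topic seeds; the estimated number of clusters k is the number of seeds."""
    total, doc_freq = Counter(), Counter()
    for tokens in docs.values():
        counts = Counter(tokens)
        total.update(counts)            # add per-document counts
        doc_freq.update(counts.keys())  # +1 per document containing the token
    mean_freq = sum(total.values()) / len(total)
    return sorted(t for t in total if doc_freq[t] >= min_df and total[t] > mean_freq)

def cluster(docs, seeds):
    """Assign each document to the seed token it contains most often."""
    clusters, unassigned = defaultdict(list), []
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        present = [s for s in seeds if counts[s] > 0]
        if not present:
            unassigned.append(doc_id)   # no topic evidence in this document
            continue
        best = max(present, key=lambda s: counts[s])  # ties: earliest seed wins
        clusters[best].append(doc_id)
    return dict(clusters), unassigned

# Opaque strings stand in for encrypted tokens.
docs = {
    "d1": ["t1", "t1", "t2", "t9"],
    "d2": ["t1", "t3", "t1"],
    "d3": ["t4", "t4", "t5"],
    "d4": ["t4", "t6", "t4", "t4"],
    "d5": ["t7"],
}
seeds = find_topic_seeds(docs)
clusters, unassigned = cluster(docs, seeds)
print(len(seeds), clusters, unassigned)
```

Here the estimate is k = 2 (seeds `t1` and `t4`); `d1` and `d2` fall under `t1`, `d3` and `d4` under `t4`, and `d5` carries no topic evidence.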
Submitted 14 August, 2019;
originally announced August 2019.
-
Edge Computing for User-Centric Secure Search on Cloud-Based Encrypted Big Data
Authors:
Sahan Ahmad,
SM Zobaed,
Raju Gottumukkala,
Mohsen Amini Salehi
Abstract:
Cloud service providers offer a low-cost and convenient solution to host unstructured data. However, cloud services act as third-party solutions and do not give users control over their data. This has raised security and privacy concerns that deter many organizations (users) with sensitive data from utilizing cloud-based solutions. User-side encryption can potentially address these concerns by establishing user-centric cloud services and granting data control to the user. Nonetheless, user-side encryption limits the ability to process (e.g., search) encrypted data on the cloud. Accordingly, in this research, we provide a framework that enables processing (in particular, searching) of encrypted multi-organizational (i.e., multi-source) big data without revealing the data to the cloud provider. Our framework leverages the locality feature of edge computing to offer a user-centric search ability in real time. In particular, the edge system intelligently predicts the user's search pattern and prunes the multi-source big data search space to reduce the search time. The pruning system is based on efficient sampling from the clustered big dataset on the cloud. For each cluster, the pruning system dynamically samples an appropriate number of terms based on the user's search tendency, so that the cluster is optimally represented. We developed a prototype of a user-centric search system and evaluated it against multiple datasets. Experimental results demonstrate a 27% improvement in pruning quality and search accuracy.
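For each cluster, the pruning system samples a number of terms that depends on the user's search tendency. A minimal sketch of that idea follows, under hypothetical assumptions that do not come from the paper: the tendency is taken to be how many of the user's past queries hit each cluster, a fixed term budget is split proportionally with largest-remainder rounding (every cluster keeps at least one term), and each cluster's sample is simply its most frequent terms. The edge can then skip clusters whose sample does not overlap a query. All cluster names and terms are illustrative.

```python
from collections import Counter

def allocate_samples(tendency, budget):
    """Split a budget of sampled terms across clusters in proportion to the
    user's search tendency (largest-remainder rounding). Every cluster keeps
    at least one term; assumes budget >= number of clusters and a nonzero total."""
    clusters = list(tendency)
    spare = budget - len(clusters)              # terms left after 1 per cluster
    total = sum(tendency.values())
    shares = {c: spare * tendency[c] / total for c in clusters}
    alloc = {c: 1 + int(shares[c]) for c in clusters}
    leftover = budget - sum(alloc.values())
    by_remainder = sorted(clusters, key=lambda c: shares[c] - int(shares[c]), reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    return alloc

def build_pruning_index(cluster_terms, alloc):
    """Represent each cluster by its alloc[c] most frequent terms."""
    return {c: {t for t, _ in Counter(terms).most_common(alloc[c])}
            for c, terms in cluster_terms.items()}

def prune(query_terms, index):
    """Keep only clusters whose sampled terms overlap the query."""
    wanted = set(query_terms)
    return sorted(c for c, sample in index.items() if sample & wanted)

# Hypothetical history: which cluster each of the user's past queries hit.
tendency = Counter(["finance"] * 6 + ["health"] * 3 + ["legal"])
cluster_terms = {
    "finance": ["tax"] * 4 + ["audit"] * 3 + ["invoice"] * 2 + ["loan"],
    "health": ["patient"] * 3 + ["diagnosis"] * 2 + ["dosage"],
    "legal": ["contract"] * 3 + ["lawsuit"] * 2 + ["clause"],
}
alloc = allocate_samples(tendency, budget=10)
index = build_pruning_index(cluster_terms, alloc)
print(alloc)                                 # {'finance': 5, 'health': 3, 'legal': 2}
print(prune(["contract", "dosage"], index))  # ['health', 'legal']
```

Because the rarely searched `legal` cluster keeps only two terms, a query for `clause` matches no cluster at all; a smaller sample speeds up pruning at the risk of missing rare terms, which is the balance a tendency-driven sampling budget has to strike.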
Submitted 9 August, 2019;
originally announced August 2019.