-
PABED A Tool for Big Education Data Analysis
Authors:
Samiya Khan,
Kashish Ara Shakil,
Mansaf Alam
Abstract:
Cloud computing and big data have become two of the most popular technologies of the modern world, largely because of the wide range of areas to which they can be applied. Education and research are among the most obvious and fitting application areas. This paper introduces PABED (Project Analyzing Big Education Data), a big data analytics tool for the education sector that makes use of cloud-based technologies. The tool is implemented using Google BigQuery and the R programming language and allows comparison of undergraduate enrollment data across academic years. Although many applications of big data in education have been proposed, there is a lack of tools that put the concept into practice. PABED is an effort in this direction. The paper describes the implementation and testing of the project. The tool validates the use of cloud computing and big data technologies in education and is intended to give a head start to the development of more sophisticated educational intelligence tools.
Submitted 31 July, 2018;
originally announced August 2018.
-
Big Data Computing Using Cloud-Based Technologies, Challenges and Future Perspectives
Authors:
Samiya Khan,
Kashish Ara Shakil,
Mansaf Alam
Abstract:
The large amounts of data generated by devices and Internet-based sources on a regular basis constitute big data. This data can be processed and analyzed to develop useful applications for specific domains. Several mathematical and data analytics techniques have found use in this sphere, which has given rise to the development of computing models and tools for big data computing. However, the storage and processing requirements are overwhelming for traditional systems and technologies. Therefore, there is a need for infrastructures that can adjust their storage and processing capability in accordance with the changing data dimensions. Cloud computing serves as a potential solution to this problem. However, big data computing in the cloud has its own set of challenges and research issues. This chapter surveys the big data concept, discusses the mathematical and data analytics techniques that can be used for big data, and gives a taxonomy of the existing tools, frameworks and platforms available for different big data computing models. It also evaluates the viability of cloud-based big data computing, examines existing challenges and opportunities, and provides future research directions in this field.
Submitted 24 November, 2017;
originally announced December 2017.
-
Workflow-Based Big Data Analytics in The Cloud Environment Present Research Status and Future Prospects
Authors:
Samiya Khan,
Kashish Ara Shakil,
Mansaf Alam
Abstract:
Workflow is a common term used to describe a systematic breakdown of the tasks that need to be performed to solve a problem. The concept has found its best use in scientific and business applications, where it streamlines and improves the performance of the underlying processes targeted towards achieving an outcome. The growing complexity of big data analytical problems has invited the use of scientific workflows for performing complex tasks in domain-specific applications. This research investigates the efficacy of workflow-based big data analytics in the cloud environment, giving insights into the research already performed in the area and possible future research directions in the field.
Submitted 4 November, 2017;
originally announced November 2017.
-
BAMHealthCloud: A Biometric Authentication and Data Management System for Healthcare Data in Cloud
Authors:
Kashish A. Shakil,
Farhana J. Zareen,
Mansaf Alam,
Suraiya Jabin
Abstract:
Advancements in the healthcare industry, together with new technology and population growth, have given rise to security threats to our most personal data. A healthcare data management system holds records in different formats such as text, numeric data, pictures and videos, leading to data that is big and unstructured. In addition, hospitals have several branches at different locations throughout a country and overseas. In view of these requirements, a cloud-based healthcare management system can be an effective solution for efficient healthcare data management. One of the major concerns of a cloud-based healthcare system is security. The threats include identity theft, tax fraud, insurance fraud, medical fraud and defamation of high-profile patients. Hence, secure data access and retrieval is needed in order to protect critical medical records in a healthcare management system. A biometric authentication mechanism is suitable in this scenario since it overcomes the limitations of token theft and forgotten passwords in the conventional token ID-password mechanism used for providing security. It also has a high accuracy rate for secure data access and retrieval. In this paper we propose BAMHealthCloud, a cloud-based system for the management of healthcare data that ensures security of data through biometric authentication. It has been developed after performing a detailed case study on the healthcare sector in a developing country. Training of the signature samples for authentication has been performed in parallel on the Hadoop MapReduce framework using a Resilient Backpropagation neural network. From rigorous experiments it can be concluded that it achieves a speedup of 9x, an equal error rate (EER) of 0.12, a sensitivity of 0.98 and a specificity of 0.95 as compared to other approaches existing in the literature.
Submitted 19 May, 2017;
originally announced May 2017.
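The sensitivity and specificity figures quoted in the BAMHealthCloud abstract are standard confusion-matrix ratios. The following is a minimal sketch of how such figures are computed; the function name and the counts are hypothetical values chosen for illustration and are not data from the paper.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: genuine users correctly accepted (true positives)
    fn: genuine users wrongly rejected (false negatives)
    tn: impostors correctly rejected (true negatives)
    fp: impostors wrongly accepted (false positives)
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


# Hypothetical counts, assumed purely for illustration.
sens, spec = sensitivity_specificity(tp=98, fn=2, tn=95, fp=5)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these assumed counts the ratios come out at 0.98 and 0.95; the counts behind the paper's reported figures are not given in the abstract.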
-
Exploiting Data Reduction Principles in Cloud-Based Data Management for Cryo-Image Data
Authors:
Kashish Ara Shakil,
Ari Ora,
Mansaf Alam,
Shabih Shakeel
Abstract:
Cloud computing is a cost-effective way for start-up life sciences laboratories to store and manage their data. However, in many instances the data stored in the cloud could be redundant, which makes cloud-based data management inefficient and costly because one has to pay for every byte of data stored. Here, we tested efficient management of data generated by an electron cryo-microscopy (cryoEM) lab in a cloud-based environment. The test data was obtained from the cryoEM repository EMPIAR. All the images were subjected to an in-house parallelized version of principal component analysis. An efficient cloud-based MapReduce modality was used for parallelization. We showed that large data on the order of terabytes could be efficiently reduced to its minimal essential form in a cost-effective, scalable manner. Furthermore, an on-spot instance on Amazon EC2 was shown to reduce costs by a margin of about 27 percent. This approach could be scaled to data of any large volume and type.
Submitted 28 March, 2017;
originally announced March 2017.
-
Big Data Analytics in Cloud environment using Hadoop
Authors:
Mansaf Alam,
Kashish Ara Shakil
Abstract:
Managing Big Data is a pressing problem. Big Data is growing very rapidly and is difficult to manage because of its various characteristics. This manuscript focuses on Big Data analytics in a cloud environment using Hadoop. We classify Big Data according to its characteristics, namely Volume, Value, Variety and Velocity, and set up separate nodes to process the data on the basis of these characteristics. In this work, the input data is classified and routed to the appropriate processing node. Finally, after each node has finished processing, the outputs of all nodes are combined to obtain the final result. We use Hadoop both to partition the data and to process it.
Submitted 15 September, 2016;
originally announced October 2016.
-
Cloud-Based Big Data Management and Analytics for Scholarly Resources: Current Trends, Challenges and Scope for Future Research
Authors:
Samiya Khan,
Kashish A. Shakil,
Mansaf Alam
Abstract:
With the shifting focus of organizations and governments towards digitization of academic and technical documents, there has been an increasing need to use this reserve of scholarly documents for developing applications that can aid in better management of research. In addition, the evolving nature of research problems has made them essentially interdisciplinary. As a result, there is a growing need for scholarly applications such as collaborator discovery, expert finding and research recommendation systems. This research paper reviews the current trends and identifies the challenges existing in the architecture, services and applications of big scholarly data platforms, with a specific focus on directions for future research.
Submitted 6 June, 2016;
originally announced June 2016.
-
BAMCloud: A Cloud Based Mobile Biometric Authentication Framework
Authors:
Farhana Javed Zareen,
Kashish Ara Shakil,
Mansaf Alam,
Suraiya Jabin
Abstract:
With an exponential increase in the number of users switching to mobile banking, various countries are adopting biometric solutions as security measures. The main reason biometric technologies are becoming more common in the everyday lives of consumers is the ease with which biometric data can be captured in real time using their mobile phones. Biometric technologies provide a potential security framework to make banking more convenient and secure than it has ever been. At the same time, the exponential growth of enrollment in biometric systems produces massive amounts of high-dimensional data, which degrades the performance of mobile banking systems. To overcome the performance issues arising from this data deluge, this paper proposes a distributed mobile biometric system based on a high-performance cluster Cloud. High availability, better time efficiency and scalability are some of the added advantages of the proposed system. The proposed Cloud-based mobile biometric authentication framework (BAMCloud) uses dynamic signatures to perform authentication. It covers data capture using any handheld mobile device, followed by storage, preprocessing and training of the system in a distributed manner over the Cloud. The framework is implemented using MapReduce on the Hadoop platform, and a Levenberg-Marquardt backpropagation neural network is used for training. The methodology adopted is novel and achieves a speedup of 8.5x and a performance of 96.23%. Furthermore, the cost-benefit analysis of the implemented system shows that the cost of implementation and execution is lower than that of existing systems. The experiments demonstrate that the proposed framework achieves better performance than the other methods used in the recent literature.
Submitted 19 May, 2017; v1 submitted 12 January, 2016;
originally announced January 2016.
-
Cloud based Big Data Analytics: A Survey of Current Research and Future Directions
Authors:
Samiya Khan,
Kashish Ara Shakil,
Mansaf Alam
Abstract:
The advent of the digital age has led to a rise in different types of data with every passing day. In fact, it is expected that half of the total data will be on the cloud by 2016. This data is complex and needs to be stored, processed and analyzed for information that can be used by organizations. Cloud computing provides an apt platform for big data analytics in view of the latter's storage and computing requirements. This makes cloud-based analytics a viable research field. However, several issues need to be addressed and risks mitigated before practical applications of this synergistic model can come into popular use. This paper explores the existing research, challenges, open issues and future research directions for this field of study.
Submitted 18 August, 2015;
originally announced August 2015.
-
Exploring Non-Homogeneity and Dynamicity of High Scale Cloud through Hive and Pig
Authors:
Kashish Ara Shakil,
Mansaf Alam,
Shuchi Sethi
Abstract:
Cloud computing deals with heterogeneity and dynamicity at all levels, and therefore there is a need to manage resources in such an environment and allocate them properly. Resource planning and scheduling require a proper understanding of arrival patterns and of the scheduling of resources. The study of workloads can aid in understanding their associated environment. Google released the latest version of its cluster trace, trace version 2.1, in November 2014. The trace consists of cell information for about 29 days spanning 700k jobs. This paper deals with statistical analysis of this cluster trace. Since the trace is very large, Hive, a Hadoop Distributed File System (HDFS) based platform for querying and analysis of Big Data, has been used. Hive was accessed through its Beeswax interface. The data was imported into HDFS through HCatalog. Apart from Hive, Pig, a scripting language that provides an abstraction on top of Hadoop, was used. To the best of our knowledge, the analytical method adopted by us is novel and has helped in gaining several useful insights. Jobs and arrival times were clustered using K-means++ clustering, followed by analysis of the distribution of job arrival times, which revealed a Weibull distribution, while resource usage was close to a Zipf-like distribution and process runtimes revealed a heavy-tailed distribution.
Submitted 23 March, 2015;
originally announced March 2015.
-
Dengue disease prediction using weka data mining tool
Authors:
Kashish Ara Shakil,
Shadma Anis,
Mansaf Alam
Abstract:
Dengue is a life-threatening disease prevalent in several developed as well as developing countries like India. In this paper we discuss various data mining algorithms that have been utilized for dengue disease prediction. Data mining is a well-known technique used by health organizations for classification of diseases such as dengue, diabetes and cancer in bioinformatics research. In the proposed approach we have used WEKA with 10-fold cross-validation to evaluate data and compare results. WEKA has an extensive collection of machine learning and data mining algorithms. We first classified the dengue dataset and then compared the different data mining techniques in WEKA through its Explorer, Knowledge Flow and Experimenter interfaces. Furthermore, in order to validate our approach, we used a dengue dataset with 108 instances, of which WEKA used 99 rows and 18 attributes, to predict the disease and to determine the accuracy of different classification algorithms and find the best performer. The main objective of this paper is to classify data, assist users in extracting useful information from it, and easily identify a suitable algorithm for an accurate predictive model. From the findings of this paper it can be concluded that Naïve Bayes and J48 are the best-performing algorithms in terms of classification accuracy because they achieved maximum accuracy = 100% with 99 correctly classified instances, maximum ROC = 1, and the least mean absolute error, and took the minimum time to build the model, according to the Explorer and Knowledge Flow results.
Submitted 18 February, 2015;
originally announced February 2015.
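The cross-validation procedure mentioned in the dengue abstract can be illustrated with a plain index split. This is a minimal sketch of generic k-fold partitioning, not WEKA's implementation: the function name is hypothetical, and real tools typically shuffle (and often stratify) the data first, which this sketch does not do.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds.

    Fold sizes differ by at most one when n_samples is not
    divisible by k.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds each take one additional sample.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# 99 rows, matching the number of instances the abstract says were used.
folds = list(k_fold_indices(99, k=10))
for i, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {i}: train={len(train_idx)} test={len(test_idx)}")
```

Each row is held out exactly once, so every instance contributes to the reported accuracy while never being scored by a model that was trained on it.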
-
An Effective Framework for Managing University Data using a Cloud based Environment
Authors:
Kashish Ara Shakil,
Shuchi Sethi,
Mansaf Alam
Abstract:
Management of data in the education sector, particularly for big universities with many employees, departments and students, is a very challenging task. Universities also face problems such as a lack of funds and manpower for managing such data. The education sector can easily and effectively take advantage of cloud computing for data management. It can enhance the learning experience as a whole and add entirely new dimensions to the way in which education is imbibed. Several benefits of cloud computing, such as monetary benefits, environmental benefits and remote data access, can be put to use in the education sector for managing data such as a university database. Therefore, in this paper we propose an effective framework for managing university data using a cloud-based environment. We also propose a cloud data management simulator, a new simulation framework that demonstrates the applicability of the cloud in the current education sector. The framework consists of a cloud developed for processing a university's database of staff and students. It has the following features: (i) support for modeling cloud computing infrastructure, which includes data centers containing the university database; (ii) a user-friendly interface; (iii) flexibility to switch between different types of users; and (iv) virtualized access to cloud data.
Submitted 28 January, 2015;
originally announced January 2015.
-
Seeking Black Lining In Cloud
Authors:
Shuchi Sethi,
Kashish Ara Shakil,
Mansaf Alam
Abstract:
This work focuses on attacks on confidentiality that require time synchronization. The manuscript proposes a detection framework from a covert channel perspective in cloud security. The problem is treated as a binary classification problem, and the proposed algorithm is based on features that emerged from data analysis of the Google cluster trace, which forms the basis for analyzing attack-free data. This approach can be generalized to study the flow of other systems and to fault detection. The proposed detection framework makes no assumptions about the data distribution as a whole, making it suitable for cloud dynamism.
Submitted 19 January, 2015;
originally announced January 2015.
-
Analysis and Clustering of Workload in Google Cluster Trace based on Resource Usage
Authors:
Mansaf Alam,
Kashish Ara Shakil,
Shuchi Sethi
Abstract:
Cloud computing has gained interest amongst commercial organizations, research communities, developers and other individuals during the past few years. In order to move ahead with research in the field of data management and processing of such data, we need benchmark datasets and freely available data that are publicly accessible. In May 2011, Google released a trace of a cluster of 11k machines, referred to as the Google Cluster Trace. This trace contains cell information for about 29 days. This paper provides an analysis of resource usage and requirements in this trace and attempts to give an insight into production traces of the kind found in cloud environments. The major contributions of this paper include a statistical profile of jobs based on resource usage, clustering of workload patterns, and classification of jobs into different types based on k-means clustering. Although this trace has been analyzed in earlier works, our analysis provides several new findings, such as that jobs in a production trace are trimodal and that there is symmetry in the tasks within a long job type.
Submitted 7 January, 2015;
originally announced January 2015.
-
Recent Developments in Cloud Based Systems: State of Art
Authors:
Mansaf Alam,
Kashish Ara Shakil
Abstract:
Cloud computing is the new buzzword on the minds of technology enthusiasts these days. The importance and the range of applications of cloud computing are overwhelming, making it a topic of huge significance. It provides several notable features such as multitenancy, on-demand service and pay-per-use. This manuscript presents an exhaustive survey of cloud computing technology and of potential research issues in cloud computing that need to be addressed.
Submitted 5 January, 2015;
originally announced January 2015.
-
A Decision Matrix and Monitoring based Framework for Infrastructure Performance Enhancement in A Cloud based Environment
Authors:
Mansaf Alam,
Kashish Ara Shakil
Abstract:
A cloud environment is very different from a traditional computing environment, and therefore tracking the performance of a cloud imposes additional requirements. Data moves very fast in the cloud, so the resources and infrastructure at its disposal must be equally capable. Infrastructure-level performance in the cloud involves the performance of servers, network and storage, which act as the heart and soul driving the entire cloud business. Constant improvement of infrastructure-level performance is therefore an important task. This paper proposes a framework for infrastructure performance enhancement in a cloud-based environment. The framework is broadly divided into four steps: (a) infrastructure-level monitoring of the usage pattern and behaviour of the cloud end users; (b) reporting of the monitoring activities to the cloud service provider; (c) assignment of priority by the cloud service provider according to our decision matrix based max-min algorithm (DMMM); and (d) provision of services to cloud users, leading to infrastructure performance enhancement. The framework is based on a decision matrix and on monitoring in the cloud using the proposed DMMM algorithm, which draws its inspiration from the original min-min algorithm. The algorithm uses the decision matrix to make decisions regarding the distribution of resources among the cloud users.
Submitted 27 December, 2014;
originally announced December 2014.
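The DMMM abstract names the min-min algorithm as its inspiration. As background, the following is a minimal sketch of the classic min-min and max-min scheduling heuristics only; it is not the authors' DMMM algorithm, whose decision matrix is not specified in the abstract. The function name, the `pick_max` flag and the execution-time table are assumptions made for illustration.

```python
def schedule(exec_time, pick_max=False):
    """Classic min-min (default) or max-min (pick_max=True) heuristic.

    exec_time[t][m] is the run time of task t on machine m.
    Returns (assignment, makespan), where assignment maps task -> machine.
    """
    n_machines = len(exec_time[0])
    ready = [0.0] * n_machines          # time at which each machine is free
    unassigned = set(range(len(exec_time)))
    assignment = {}
    while unassigned:
        # For every unassigned task, find its earliest completion time
        # and the machine that achieves it.
        best = {}
        for t in unassigned:
            m = min(range(n_machines), key=lambda j: ready[j] + exec_time[t][j])
            best[t] = (ready[m] + exec_time[t][m], m)
        # min-min picks the task with the smallest such time,
        # max-min the task with the largest.
        chooser = max if pick_max else min
        t = chooser(sorted(best), key=lambda k: best[k][0])
        finish, m = best[t]
        assignment[t] = m
        ready[m] = finish
        unassigned.remove(t)
    return assignment, max(ready)


# Hypothetical run times: 3 tasks on 2 machines.
times = [[4, 6], [3, 5], [10, 8]]
print(schedule(times))                 # min-min
print(schedule(times, pick_max=True))  # max-min
```

On this assumed table min-min yields a makespan of 13 and max-min a makespan of 8, which illustrates why scheduling long tasks first can balance load better when a few tasks dominate.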
-
An NBDMMM Algorithm Based Framework for Allocation of Resources in Cloud
Authors:
Mansaf Alam,
Kashish Ara Shakil
Abstract:
Cloud computing is a technological advancement in the arena of computing that has taken the utility vision of computing a step further by providing computing resources such as network, storage, compute capacity and servers as a service via an internet connection. These services are provided to users on a pay-per-use basis, according to the amount of resources they use. Since these resources are used in an elastic manner, on-demand provisioning of resources is the driving force behind the entire cloud computing infrastructure, and the maintenance of these resources is therefore a decisive task. Infrastructure-level performance monitoring and enhancement are also important. This paper proposes a framework for allocation of resources in a cloud-based environment, leading to an infrastructure-level enhancement of performance in a cloud environment. The framework is divided into four stages. Stage 1: The cloud service provider monitors the infrastructure-level pattern of resource usage and the behavior of the cloud users. Stage 2: The monitoring activities about the usage are reported to cloud service providers. Stage 3: The proposed Network Bandwidth Dependent DMMM (NBDMMM) algorithm is applied. Stage 4: Resources are allocated or services are provided to cloud users, leading to infrastructure-level performance enhancement and efficient management of resources. Analysis of the resource usage pattern is considered an important factor for proper allocation of resources by service providers; in this paper the Google cluster trace has been used for accessing the resource usage pattern in the cloud. Experiments have been conducted on the CloudSim simulation framework, and the results reveal that the NBDMMM algorithm improves allocation of resources in a virtualized cloud.
Submitted 27 December, 2014;
originally announced December 2014.