
arXiv:1208.1942

doi: 10.5121/ijdps.2012.3411

Scheduling Data Intensive Workloads through Virtualization on MapReduce based Clouds

Authors: B. Thirumala Rao, L. S. S. Reddy

Abstract: MapReduce has become a popular programming model for running data-intensive applications on the cloud. Completion time goals, or deadlines, set by users for MapReduce jobs are becoming crucial in existing cloud-based data processing environments like Hadoop. There is a conflict between scheduling MapReduce jobs to meet deadlines and "data locality" (assigning tasks to nodes that contain their input data). To meet a deadline, a task may be scheduled on a node without local input data for that task, causing expensive data transfer from a remote node. In this paper, a novel scheduler is proposed to address this problem, based primarily on a dynamic resource reconfiguration approach. It has two components: 1) a Resource Predictor, which dynamically determines the number of Map/Reduce slots every job requires to meet its completion time guarantee; 2) a Resource Reconfigurator, which adjusts CPU resources without violating users' completion time goals by dynamically increasing or decreasing individual VMs, to maximize data locality and the use of resources within the system among the active jobs. The proposed scheduler has been evaluated against the Fair Scheduler on a virtual cluster built on a physical cluster of 20 machines. The results demonstrate a gain of about 12% in job throughput.

Submitted 9 August, 2012; originally announced August 2012.

Journal ref: International Journal of Distributed and Parallel Systems (IJDPS), Vol. 3, No. 4, pp. 99-110, July 2012

arXiv:1207.0894

Performance Issues of Heterogeneous Hadoop Clusters in Cloud Computing

Authors: B. Thirumala Rao, N. V. Sridevi, V. Krishna Reddy, L. S. S. Reddy

Abstract: Nowadays most cloud applications process large amounts of data to provide the desired results. Data volumes to be processed by cloud applications are growing much faster than computing power. This growth demands new strategies for processing and analyzing information. Dealing with large data volumes requires two things: 1) inexpensive, reliable storage; 2) new tools for analyzing unstructured and structured data. Hadoop is a powerful open source software platform that addresses both of these problems. The current Hadoop implementation assumes that the computing nodes in a cluster are homogeneous. Hadoop performs poorly in heterogeneous clusters, where the nodes have different computing capacities. In this paper we address the issues that affect the performance of Hadoop in heterogeneous clusters and provide some guidelines on how to overcome these bottlenecks.

Submitted 4 July, 2012; originally announced July 2012.

Comments: 6 Pages

Journal ref: Global Journal of Computer Science and Technology, Volume XI Issue VIII May 2011

arXiv:1207.0780

Survey on Improved Scheduling in Hadoop MapReduce in Cloud Environments

Authors: B. Thirumala Rao, L. S. S. Reddy

Abstract: Cloud computing is emerging as a new computational paradigm. Hadoop MapReduce has become a powerful computation model for processing large data on distributed commodity hardware clusters such as clouds. In all Hadoop implementations, the default FIFO scheduler is available, where jobs are scheduled in FIFO order, with support for other priority-based schedulers as well. In this paper we study various scheduler improvements possible with Hadoop and provide some guidelines on how to improve scheduling in Hadoop in cloud environments.

Submitted 3 July, 2012; originally announced July 2012.

Comments: 5 Pages, 2 figures; International Journal of Computer Applications, November 2011

arXiv:1002.1156

Dimensionality Reduction: An Empirical Study on the Usability of IFE-CF (Independent Feature Elimination- by C-Correlation and F-Correlation) Measures

Authors: M. Babu Reddy, L. S. S. Reddy

Abstract: The recent increase in the dimensionality of data poses a great challenge to existing dimensionality reduction methods in terms of their effectiveness. Dimensionality reduction has emerged as one of the significant preprocessing steps in machine learning applications and has been effective in removing inappropriate data, increasing learning accuracy, and improving comprehensibility. Feature redundancy exerts great influence on the performance of the classification process. Toward better classification performance, this paper addresses the usefulness of truncating highly correlated and redundant attributes. Here, an effort has been made to verify the utility of dimensionality reduction by applying the LVQ (Learning Vector Quantization) method to two benchmark datasets, 'Pima Indian Diabetic patients' and 'Lung cancer patients'.

Submitted 5 February, 2010; originally announced February 2010.

Comments: International Journal of Computer Science Issues, IJCSI, Vol. 7, Issue 1, No. 1, January 2010, http://ijcsi.org

Journal ref: International Journal of Computer Science Issues, IJCSI, Vol. 7, Issue 1, No. 1, January 2010, http://ijcsi.org/articles/Dimensionality-Reduction-An-Empirical-Study-on-the-Usability-of-IFE-CF-(Independent-Feature-Elimination-by-C-Correlation-and-F-Correlation)-Measures.php

Showing 1–4 of 4 results for author: Reddy, L S S