Please use this identifier to cite or link to this item: http://13.232.72.61:8080/jspui/handle/123456789/370
Title: A Survey on Hadoop MapReduce Framework and the Data Skew Issues
Authors: Wajid, Nawab; Satish, S.; Manjunath, T. N.
Keywords: Information Science; Computer simulation
Issue Date: Apr-2015
Publisher: IJSRET
Citation: Wajid, Nawab, Satish, S., & Manjunath, T. N. (2015). A Survey on Hadoop MapReduce Framework and the Data Skew Issues. International Journal of Scientific Research Engineering & Technology, 4(4), 398-404.
Abstract: Hadoop is an open-source implementation of Google's MapReduce framework, and MapReduce is the heart of Apache Hadoop. The file system Hadoop uses to store files is the Hadoop Distributed File System (HDFS), an open-source implementation of the Google File System (GFS). Hadoop enables parallel processing of large data sets by splitting each data set into smaller partitions; the JobTracker assigns each partition to a separate task on a DataNode, the node where the data actually resides. The TaskTracker, which runs on each DataNode, executes the tasks and reports their status to the JobTracker. In a MapReduce job, the slowest-running task determines the job completion time, because a slow task delays the progress of the entire job. Such a slow task is known as a straggler. Stragglers can arise for many reasons, one of which is data skew. This paper reviews the different types of data skew, where in MapReduce data skew can occur, and the measures taken to overcome these problems.
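The link between data skew and stragglers described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' method: it assumes that a reduce task's running time is proportional to the number of records it receives, and it uses a CRC32-based hash partitioner as a stand-in for Hadoop's default hash partitioning (same key, same reducer). The function names and the key distribution are hypothetical.

```python
import zlib

def hash_partition(key: str, num_reducers: int) -> int:
    """Hypothetical stand-in for hash partitioning: every record with the
    same key is sent to the same reducer. CRC32 is used because Python's
    built-in hash() of strings is randomized between runs."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def reducer_loads(keys, num_reducers):
    """Count how many intermediate records each reducer receives."""
    loads = [0] * num_reducers
    for key in keys:
        loads[hash_partition(key, num_reducers)] += 1
    return loads

def job_completion_time(loads, cost_per_record=1.0):
    """Assumption: a task's running time is proportional to its input size.
    The job finishes only when the slowest task (the straggler) finishes."""
    return max(loads) * cost_per_record

def skew_ratio(loads):
    """Largest load divided by the mean load (1.0 means perfectly balanced)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean

# Hypothetical skewed input: one "hot" key dominates the intermediate data.
records = ["hot"] * 900 + [f"k{i}" for i in range(100)]
loads = reducer_loads(records, num_reducers=4)
print(loads, job_completion_time(loads), round(skew_ratio(loads), 2))
```

With one hot key carrying 900 of the 1,000 records, whichever reducer receives that key processes at least 900 records while the mean load is 250, so the completion time of the whole job is set by that single overloaded task regardless of how evenly the remaining keys are spread.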
URI: http://13.232.72.61:8080/jspui/handle/123456789/370
ISSN: 2278-0882
Appears in Collections: Articles

Files in This Item:
File: A Survey on Hadoop Map Reduce Framework and the Data Skew Issues.pdf
Size: 414.67 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.