Thursday, 14 May 2015

Hadoop MapReduce basics

Below are the basic points to remember about MapReduce.

            map: (K1, V1) → list(K2, V2)
            reduce: (K2, list(V2)) → list(K3, V3)
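The signatures above can be sketched in plain Python (outside Hadoop) using word count, the classic MapReduce example; here (K1, V1) is assumed to be (line offset, line text) and (K2, V2) is (word, 1):

```python
from itertools import groupby
from operator import itemgetter

def map_fn(k1, v1):
    # map: (K1, V1) -> list(K2, V2)  -- emit (word, 1) per word
    return [(word, 1) for word in v1.split()]

def reduce_fn(k2, v2_list):
    # reduce: (K2, list(V2)) -> list(K3, V3)  -- sum counts per word
    return [(k2, sum(v2_list))]

def run_job(records):
    # Map phase
    intermediate = []
    for k1, v1 in records:
        intermediate.extend(map_fn(k1, v1))
    # Shuffle and sort: group all values by key
    intermediate.sort(key=itemgetter(0))
    output = []
    for k2, group in groupby(intermediate, key=itemgetter(0)):
        output.extend(reduce_fn(k2, [v for _, v in group]))
    return output

result = run_job([(0, "the cat sat"), (12, "the hat")])
# result: [('cat', 1), ('hat', 1), ('sat', 1), ('the', 2)]
```

This is only a single-process simulation of the data flow; in Hadoop the map, shuffle/sort, and reduce steps run distributed across the cluster.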
  1. MapReduce works in two phases: the Map phase and the Reduce phase.
  2. The Map phase is a data-preparation phase; it also filters out unwanted records.
  3. The reducer receives an iterator over all the values associated with a key.
  4. Writing output to a directory that already exists raises an exception.
  5. One reducer produces one output file.
  6. The number of reducers can be set in the MapReduce program (for example, with Job.setNumReduceTasks()).
  7. Input is divided into splits; by default a split's size equals the HDFS block size.
  8. If no reducer is defined, the map output is written directly to HDFS.
  9. Data locality: Hadoop tries to run each map task on the node where its input split resides.
  10. Map output is written to local disk, not HDFS; in the shuffle and sort phase it is sent to the reducers.
  11. A combiner works correctly only when the operation is associative and commutative.
  12. Combiners cut down the data shuffled across the network between the map and reduce phases.
  13. Hadoop Streaming lets you write MapReduce jobs in other languages (for example, Python).
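A small sketch of why combiners need associative and commutative operations: a combiner pre-aggregates map output on each node, so the operation must give the same answer when applied in stages. Summing qualifies; taking a mean does not (the numbers below are made up for illustration):

```python
# Five values split across two hypothetical map tasks.
values = [10, 20, 30, 40, 50]
node_a, node_b = values[:2], values[2:]

# Sum: combining the partial sums gives the same answer as summing everything.
combined_sum = sum([sum(node_a), sum(node_b)])
assert combined_sum == sum(values)  # 150 == 150

# Mean: a mean of partial means is NOT the overall mean,
# so a combiner that averages would silently corrupt the result.
def mean(xs):
    return sum(xs) / len(xs)

partial = mean([mean(node_a), mean(node_b)])  # mean(15, 40) = 27.5
overall = mean(values)                        # 150 / 5 = 30.0
assert partial != overall
```

The standard workaround for averages is to have the combiner emit (sum, count) pairs and let the reducer do the final division.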

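Hadoop Streaming passes records over stdin/stdout as tab-separated lines, which is why any language works. A minimal word-count mapper and reducer in that style might look like this (written here as generator functions for clarity; on a real cluster they would be two stdin-reading scripts passed to the streaming jar via -mapper and -reducer):

```python
def mapper(lines):
    # Streaming mapper: emit "word\t1" for each word, one record per line.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    # Streaming delivers reducer input sorted by key, so equal keys
    # arrive consecutively and can be summed with one running total.
    current, total = None, 0
    for line in sorted_lines:
        word, count = line.split("\t")
        if word != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield f"{current}\t{total}"

mapped = list(mapper(["the cat sat", "the hat"]))
reduced = list(reducer(sorted(mapped)))
# reduced: ['cat\t1', 'hat\t1', 'sat\t1', 'the\t2']
```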