Friday, 19 June 2015

Tips to clear Cloudera Apache Hadoop Certification CCD-410

Last weekend I cleared the Cloudera Apache Hadoop CCD-410 certification. The exam is neither very hard nor very easy. Anyone who understands MapReduce and HDFS well and has had good hands-on practice should clear it.

I started taking an interest in Big Data and Hadoop at the end of March. Here is my advice on how to prepare for CCD-410:

  • Hadoop: The Definitive Guide is the bible and one of the best books for building a good understanding. Try making your own notes as you read it. It also helps to have the Cloudera QuickStart VM or the Hortonworks Sandbox installed on your laptop; http://datatwist.blogspot.in/2015/05/getting-started.html will help you install them.
  • Get yourself clear on every topic in detail: SerDe, serialization, SequenceFile, MapFile, compression, setJarByClass(), hidden input files (names starting with "_" or ".") not being processed, splits, RecordReader, etc.
  • Read Part 2 of Hadoop: The Definitive Guide in detail; most of the questions come from this section.
  • You will get 20-25 practical questions that ask about the output of a MapReduce flow. Expect conditions such as: what happens if you comment out the file output path, and so on.
  • Try commenting out lines in the driver class and see what output you get, e.g. comment out job.setJarByClass().
  • Start practising on Hive and Sqoop. (Sqoop import/export questions are very common but a little confusing at times.)
  • A strong Java foundation is required before taking the exam. Since Hadoop and many of its most useful features are written in Java, extending its abstract classes and implementing its base interfaces is critical. Hadoop Streaming (along with Avro) lends many Hadoop features to other languages, but unlocking the full potential of Hadoop requires Java (for now). In addition, many of Java’s Collections (ArrayList, TreeSet, HashMap classes), string manipulation (String, Pattern, and Matcher classes), and architectural/design capabilities (base serialization, primitive types, etc.) are considered fair game for the exam, and you will find many of these concepts on the exam in typical use cases.
  • Read up on the difference between MapReduce 1 and YARN.
  • Don't overlook Flume, Sqoop, Hive and Crunch. If you get bored of reading, you can watch YouTube videos.
  • There is a YouTube channel, IT Versity by Durga Gadiraju, which explains every aspect of this certification in depth, especially Sqoop, Hive and performance tuning.
  • Finally, all the best. Don't panic during the exam if you aren't able to work out the answer to a practical question: eliminate the wrong answers and you will arrive at the right one.
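
To practise predicting the output of a MapReduce flow, it can help to trace the phases by hand. The following is only a plain-Java sketch of a word count, not real Hadoop code: the class name WordCountSketch and its methods (map, shuffle, reduce, run) are made up for illustration, and it uses nothing but the Collections and regex classes mentioned in the list above.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class: imitates map -> shuffle/sort -> reduce in plain Java.
// It does not use or reproduce any Hadoop API.
class WordCountSketch {
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // "Map" step: emit one (word, 1) pair for every word in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            String word = m.group().toLowerCase(Locale.ROOT);
            pairs.add(new SimpleEntry<String, Integer>(word, 1));
        }
        return pairs;
    }

    // "Shuffle/sort" step: group values by key; TreeMap keeps keys sorted.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // "Reduce" step: sum the list of values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three steps over all input lines and returns sorted word counts.
    static TreeMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {cat=2, dog=1, on=1, sat=2, the=3}
        System.out.println(run(Arrays.asList("the cat sat", "the dog sat on the cat")));
    }
}
```

Working through small inputs like this on paper first, then checking against the program, is a cheap way to rehearse the "what will the output be" style of question before trying the same thing on a real cluster or sandbox.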

Some useful links