Configuration has to be done in the following layers.
- HDFS Layer
  - NameNode: master
  - DataNode: stores the data (actual storage)
- MapReduce Layer
  - JobTracker
  - TaskTracker
- Secondary NameNode: keeps a checkpoint copy of the NameNode metadata. It does not act as an alternate (standby) NameNode; it only stores the NameNode metadata.
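
The layering above can be sketched as a small lookup table. This is only an illustration: the role labels ("master", "worker", "checkpoint") and the helper name `daemons_for` are assumptions made for the example, not Hadoop terminology taken from these notes.

```python
# Hadoop 1.x daemons grouped by layer, as listed above.
# The role labels are illustrative assumptions, not Hadoop settings.
DAEMONS = {
    "NameNode": {"layer": "HDFS", "role": "master"},
    "DataNode": {"layer": "HDFS", "role": "worker"},
    "SecondaryNameNode": {"layer": "HDFS", "role": "checkpoint"},
    "JobTracker": {"layer": "MapReduce", "role": "master"},
    "TaskTracker": {"layer": "MapReduce", "role": "worker"},
}


def daemons_for(role):
    """Return the names of all daemons with the given role, sorted by name."""
    return sorted(name for name, info in DAEMONS.items() if info["role"] == role)


print(daemons_for("master"))  # ['JobTracker', 'NameNode']
print(daemons_for("worker"))  # ['DataNode', 'TaskTracker']
```

In a typical Hadoop 1.x layout the "master" daemons run on the master machine and the "worker" daemons run together on every slave machine.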
Types of Hadoop Configurations
- Standalone Mode
  - All daemons run in a single process
  - Preferred in development
- Pseudo-Distributed (Pseudo Cluster) Mode
  - Each daemon runs in its own process, but all on a single machine
  - Simulates a cluster
- Fully Distributed (Cluster) Mode
  - Daemons run on different machines
  - Preferred in production
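
As a rough sketch of how the three modes differ in practice, the table below pairs each mode with a typical default filesystem setting. The values are assumptions for illustration: `hdfs://localhost:9000` is the address commonly used in single-node setups, `namenode.example.com` is a hypothetical host name, and the property is called `fs.defaultFS` in Hadoop 2.x and later (`fs.default.name` in Hadoop 1.x).

```python
from urllib.parse import urlparse

# Illustrative default-filesystem value per mode (assumed, not from a real cluster).
MODE_DEFAULT_FS = {
    "standalone": "file:///",                    # local filesystem, no HDFS daemons
    "pseudo": "hdfs://localhost:9000",           # all daemons on one machine
    "full": "hdfs://namenode.example.com:9000",  # hypothetical NameNode host
}


def uses_hdfs(mode):
    """Return True if the mode's default filesystem URI uses the hdfs scheme."""
    return urlparse(MODE_DEFAULT_FS[mode]).scheme == "hdfs"


for mode in MODE_DEFAULT_FS:
    print(mode, uses_hdfs(mode))
```

Standalone mode reads and writes the local filesystem, which is why it needs no NameNode or DataNode processes.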
Important files to configure
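- hadoop-env.sh (sets the Java environment, e.g. JAVA_HOME, and the logging location)
- core-site.xml (core settings, including the NameNode address)
- hdfs-site.xml (HDFS settings for the DataNodes, such as storage directories and replication)
- mapred-site.xml (MapReduce settings; in Hadoop 1.x this is where the JobTracker and TaskTracker are configured)
- yarn-site.xml (YARN settings for the ResourceManager and NodeManagers, Hadoop 2.x onward)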
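- masters (in Hadoop 1.x, lists the hosts on which the start scripts launch the Secondary NameNode; DataNodes find their NameNode through core-site.xml, not through this file)
- slaves (on the master node, lists the slave hosts, one per line, that run the DataNode and TaskTracker daemons)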
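
The `*-site.xml` files listed above all share one layout: a `<configuration>` root holding `<property>` elements, each with a `<name>` and a `<value>`. The following is a minimal sketch that generates such a file with Python's standard library. The helper name `build_site_xml` and the sample address `hdfs://localhost:9000` are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Build a Hadoop-style site file (as a string) from a name -> value dict."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Assumed sample values for a single-machine setup.
core_site = build_site_xml({"fs.defaultFS": "hdfs://localhost:9000"})
hdfs_site = build_site_xml({"dfs.replication": "1"})
print(core_site)
print(hdfs_site)
```

The printed text could be saved as core-site.xml and hdfs-site.xml in the Hadoop configuration directory; the exact properties needed depend on the Hadoop version and the chosen mode.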