Overview
Hadoop Administration training introduces you to the fundamental concepts of Apache Hadoop and Hadoop cluster. Through hands on exercises and practice sessions you will learn to configure, deploy and maintain a Hadoop cluster, and to confidently navigate the Hadoop ecosystem.
Objectives
At the end of Hadoop Administration training course, participants will
Prerequisites
- Basic knowledge of Linux
Course Outline
- Hadoop cluster architecture
- Data loading into HDFS
- Roles and Responsibilities of a Hadoop Cluster Administrator
- Hadoop server roles and their usage
- Rack awareness
- Write and Read
- Replication Pipeline
- Data Processing
- Hadoop Installation and Initial Configuration
- Deploying Hadoop in pseudo-distributed mode
- Deploying a multi-node Hadoop cluster
- Installing Hadoop Clients
- Selecting the appropriate hardware
- Designing a scalable cluster
- Building the cluster
- Installing the Hadoop daemons
- Optimizing the network architecture
- Managing and scheduling jobs
- Types of schedulers in Hadoop
- Configuring the schedulers and run MapReduce jobs
- Cluster monitoring and troubleshooting
- How to manage hardware failures
- Securing Hadoop clusters
- Configuring Hadoop backup
- Distcp to copy data
- Cluster maintenance
- Configuring HDFS Federation
- Basics of Hadoop Platform Security
- Securing the Platform
- Configuring Kerberos
- Isolating single points of failure
- Maintaining High Availability
- Triggering manual failover
- Automating failover with Zookeeper
- Extending HDFS resources
- Managing the namespace volumes
- Critiquing the YARN architecture
- Identifying the new daemons
- Starting and stopping Hadoop daemonso Monitoring HDFS status
- Adding and removing data nodes
- Managing MapReduce jobs
- Tracking progress with monitoring tools
- Commissioning and decommissioning compute nodes
- Oozie
- Hcatalog/Hive Administration
- HBase Architecture
- HBase setup
- HBase and Hive Integration
- HBase performance optimization