Overview
This training course helps participants gain the skills needed to store, manage, process, and analyze massive volumes of structured and unstructured data to extract meaningful insights.
Objectives
At the end of the Intro to Big Data & Hadoop training course, participants will be able to store, manage, process, and analyze massive volumes of structured and unstructured data with the Hadoop ecosystem.
Prerequisites
Before undertaking a Big Data and Hadoop course, participants are recommended to have basic knowledge of a programming language such as Python, Scala, or Java, and a good understanding of SQL and RDBMS.
Course Outline
- Understanding Big Data
- Types of Big Data
- Difference between Traditional Data and Big Data
- Introduction to Hadoop
- Distributed data storage in Hadoop: HDFS and HBase
- Hadoop data processing and analysis services: MapReduce, Spark, Hive, Pig, and Storm
- Data Integration Tools in Hadoop
- Resource management and cluster management services
- The need for Hadoop in Big Data
- Understanding Hadoop and Its Architecture
- The MapReduce Framework
- What is YARN?
- Understanding Big Data Components
- Monitoring, Management and Orchestration Components of Hadoop Ecosystem
- Different Distributions of Hadoop
- Installing Hadoop 3
- Hortonworks Sandbox installation & configuration
- Hadoop Configuration files
- Working with Hadoop services using Ambari
- Hadoop Daemons
- Browsing Hadoop UI consoles
- Basic Hadoop shell commands (see the examples below)
- Eclipse & WinSCP installation & configuration on the VM
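A few of the basic HDFS shell commands from this module, with illustrative paths:

```bash
# Create a directory in HDFS and copy a local file into it
hdfs dfs -mkdir -p /user/train/input
hdfs dfs -put localfile.txt /user/train/input/

# List the directory and print a file's contents
hdfs dfs -ls /user/train/input
hdfs dfs -cat /user/train/input/localfile.txt

# Copy a file from HDFS back to the local filesystem
hdfs dfs -get /user/train/input/localfile.txt ./copy.txt
```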
- Running a MapReduce application in MR2
- MapReduce Framework on YARN
- Fault tolerance in YARN
- Map, Reduce & Shuffle phases
- Understanding Mapper, Reducer & Driver classes
- Writing a MapReduce WordCount program (see the sketch below)
- Executing & monitoring a Map Reduce job
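The course builds WordCount with Java Mapper, Reducer & Driver classes. Purely as an illustration of the same map, shuffle, and reduce flow, here is a minimal sketch using Hadoop Streaming with Python; the file names are hypothetical:

```python
#!/usr/bin/env python3
# mapper.py - emits a (word, 1) pair for every word read from stdin
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py - lines arrive sorted by key after the shuffle phase,
# so the counts for each word can be summed in a single pass
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t")
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, 0
    count += int(value)
if current_word is not None:
    print(f"{current_word}\t{count}")
```

Such a job is submitted through the hadoop-streaming JAR (passing the scripts with `-files` and the data locations with `-input`/`-output`) and can be monitored from the YARN ResourceManager UI.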
- SparkSQL and DataFrames
- DataFrames and the SQL API
- DataFrame schema
- Datasets and encoders
- Loading and saving data
- Aggregations
- Joins (see the PySpark sketch below)
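A short PySpark sketch of these DataFrame operations; the file paths and column names are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("DataFrameBasics").getOrCreate()

# Loading data; printSchema shows the inferred DataFrame schema
orders = spark.read.csv("orders.csv", header=True, inferSchema=True)
customers = spark.read.json("customers.json")
orders.printSchema()

# Aggregation: total order amount per customer
totals = orders.groupBy("customer_id").agg(F.sum("amount").alias("total"))

# Join with the customer data, then query through the SQL API
result = totals.join(customers, on="customer_id", how="inner")
result.createOrReplaceTempView("customer_totals")
spark.sql("SELECT name, total FROM customer_totals ORDER BY total DESC").show(5)

# Saving data
result.write.mode("overwrite").parquet("customer_totals.parquet")
```

Note that Datasets and encoders belong to the Scala/Java API; in Python only the untyped DataFrame API shown above is available.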
- A short introduction to streaming
- Spark Streaming
- Discretized Streams
- Stateful and stateless transformations
- Checkpointing
- Operating with other streaming platforms (such as Apache Kafka)
- Structured Streaming (see the sketch below)
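A minimal Structured Streaming sketch, assuming a local socket source for demonstration (start one with `nc -lk 9999`):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("StreamingWordCount").getOrCreate()

# Read a text stream from a local socket
lines = (spark.readStream.format("socket")
         .option("host", "localhost").option("port", 9999).load())

# Stateful aggregation: a running word count across micro-batches
words = lines.select(F.explode(F.split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Checkpointing lets the query recover its state after a failure
query = (counts.writeStream.outputMode("complete").format("console")
         .option("checkpointLocation", "/tmp/wc-checkpoint").start())
query.awaitTermination()
```

Swapping `format("socket")` for `format("kafka")` (with the Kafka bootstrap-server and topic options) connects the same query to Apache Kafka.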
- Background of Pig
- Pig architecture
- Pig Latin basics
- Pig execution modes
- Pig processing – loading and transforming data
- Pig built-in functions
- Filtering, grouping, sorting data
- Relational join operators
- Pig Scripting
- Pig UDFs (see the sample script below)
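A small Pig Latin script covering loading, filtering, grouping, a built-in function, a join, and storing results; the file names and fields are hypothetical:

```
-- Load tab-separated data with declared schemas
orders = LOAD 'orders.tsv' USING PigStorage('\t')
         AS (order_id:int, customer_id:int, amount:double);
customers = LOAD 'customers.tsv' USING PigStorage('\t')
            AS (customer_id:int, name:chararray);

-- Filter, group, and aggregate with a built-in function
big_orders = FILTER orders BY amount > 100.0;
by_customer = GROUP big_orders BY customer_id;
totals = FOREACH by_customer GENERATE group AS customer_id,
         SUM(big_orders.amount) AS total;

-- Relational join and ordering
joined = JOIN totals BY customer_id, customers BY customer_id;
sorted_totals = ORDER joined BY total DESC;
STORE sorted_totals INTO 'customer_totals' USING PigStorage('\t');
```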
- Background of Hive
- Hive architecture
- Hive Query Language
- Moving the Hive metastore from Derby to MySQL
- Managed & external tables
- Data processing – loading data into tables
- Using Hive built-in functions
- Partitioning data using Hive
- Bucketing data
- Hive Scripting
- Using Hive UDFs (see the sample queries below)
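A few HiveQL statements matching these topics; the table and path names are hypothetical:

```sql
-- Managed table, partitioned by country
CREATE TABLE sales (order_id INT, customer_id INT, amount DOUBLE)
PARTITIONED BY (country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- External table over files that already live in HDFS
CREATE EXTERNAL TABLE raw_sales (line STRING)
LOCATION '/user/train/raw_sales';

-- Load a file into one partition of the managed table
LOAD DATA INPATH '/user/train/sales_us.csv'
INTO TABLE sales PARTITION (country = 'US');

-- Bucketed table, populated with INSERT so bucketing is applied
CREATE TABLE sales_bucketed (order_id INT, customer_id INT,
  amount DOUBLE, country STRING)
CLUSTERED BY (customer_id) INTO 8 BUCKETS;
INSERT INTO TABLE sales_bucketed
SELECT order_id, customer_id, amount, country FROM sales;

-- Built-in functions in a query
SELECT country, ROUND(SUM(amount), 2) AS total
FROM sales GROUP BY country;
```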
- HBase overview
- Data model
- HBase architecture
- HBase shell
- ZooKeeper & its role in the HBase environment
- HBase Shell environment
- Creating table
- Creating column families
- CLI commands – get, put, delete & scan
- Scan filter operations (see the shell examples below)
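The same operations in the HBase shell, with a hypothetical `users` table:

```
# Create a table with one column family, then write and read cells
create 'users', 'info'
put 'users', 'row1', 'info:name', 'Alice'
put 'users', 'row1', 'info:city', 'Delhi'
get 'users', 'row1'

# Full scan, then a scan with a value filter
scan 'users'
scan 'users', {FILTER => "ValueFilter(=, 'binary:Alice')"}

# Delete a single cell
delete 'users', 'row1', 'info:city'
```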
- Importing data from RDBMS to HDFS
- Exporting data from HDFS to RDBMS
- Importing & exporting data between RDBMS & Hive tables (see the examples below)
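These transfers are typically done with Apache Sqoop; a sketch assuming a MySQL source database (connection details are hypothetical):

```bash
# Import an RDBMS table into HDFS
sqoop import --connect jdbc:mysql://dbhost/shop \
  --username train --password-file /user/train/.pw \
  --table orders --target-dir /user/train/orders -m 1

# Export HDFS data back into an RDBMS table
sqoop export --connect jdbc:mysql://dbhost/shop \
  --username train --password-file /user/train/.pw \
  --table orders_out --export-dir /user/train/orders

# Import straight into a Hive table
sqoop import --connect jdbc:mysql://dbhost/shop \
  --username train --password-file /user/train/.pw \
  --table orders --hive-import --hive-table shop.orders
```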
- Overview of Oozie
- Oozie Workflow Architecture
- Creating workflows with Oozie (see the sample workflow below)
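A skeleton of an Oozie workflow definition showing the control-flow elements; the action body is left as a placeholder since its properties depend on the job:

```xml
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="wordcount"/>
  <action name="wordcount">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <!-- job-specific properties (mapper, reducer, input, output) -->
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```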
- Introduction to Flume
- Flume Architecture
- Flume Demo (see the sample agent configuration below)
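A minimal Flume agent configuration of the kind used in such a demo: a netcat source feeding an HDFS sink through a memory channel (names and paths are illustrative):

```
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /user/train/flume/events
a1.sinks.k1.hdfs.fileType = DataStream

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

The agent is started with `flume-ng agent --name a1 --conf-file <file>`, and events sent to port 44444 land in HDFS.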
- Introduction to data visualization
- Tableau
- Chart types
- Data visualization tools