Overview
This training course helps participants gain the skills needed to store, manage, process, and analyze massive volumes of structured and unstructured data to extract meaningful insights.
Objectives
At the end of the Intro to Big Data & Hadoop training course, participants will be able to store, manage, process, and analyze massive volumes of structured and unstructured data with the Hadoop ecosystem.
Prerequisites
Before undertaking a Big Data and Hadoop course, participants are recommended to have basic knowledge of a programming language such as Python, Scala, or Java, and a good understanding of SQL and RDBMS.
Course Outline
- Understanding Big Data
- Types of Big Data
- Difference between Traditional Data and Big Data
- Introduction to Hadoop
- Distributed data storage in Hadoop: HDFS and HBase
- Hadoop data processing and analysis services: MapReduce, Spark, Hive, Pig, and Storm
- Data Integration Tools in Hadoop
- Resource management and cluster management services
- The need for Hadoop in Big Data
- Understanding Hadoop and Its Architecture
- The MapReduce Framework
- What is YARN?
- Understanding Big Data Components
- Monitoring, Management and Orchestration Components of Hadoop Ecosystem
- Different Distributions of Hadoop
- Installing Hadoop 3
- Hortonworks Sandbox installation & configuration
- Hadoop Configuration files
- Working with Hadoop services using Ambari
- Hadoop Daemons
- Browsing Hadoop UI consoles
- Basic Hadoop shell commands (see the examples below)
- Eclipse & WinSCP installation & configuration on the VM
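A few of the basic HDFS shell commands from this module, with illustrative paths:

```bash
# Create a directory in HDFS and copy a local file into it
hdfs dfs -mkdir -p /user/train/input
hdfs dfs -put localfile.txt /user/train/input/

# List the directory and print a file's contents
hdfs dfs -ls /user/train/input
hdfs dfs -cat /user/train/input/localfile.txt

# Copy a file from HDFS back to the local filesystem
hdfs dfs -get /user/train/input/localfile.txt ./copy.txt
```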
- Running a MapReduce application in MR2
- MapReduce Framework on YARN
- Fault tolerance in YARN
- Map, Reduce & Shuffle phases
- Understanding Mapper, Reducer & Driver classes
- Writing a MapReduce WordCount program (see the sketch below)
- Executing & monitoring a Map Reduce job
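The course builds WordCount with Java Mapper, Reducer & Driver classes. Purely as an illustration of the same map, shuffle, and reduce flow, here is a minimal sketch using Hadoop Streaming with Python; the file names are hypothetical:

```python
#!/usr/bin/env python3
# mapper.py - emits a (word, 1) pair for every word read from stdin
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py - lines arrive sorted by key after the shuffle phase,
# so the counts for each word can be summed in a single pass
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t")
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, 0
    count += int(value)
if current_word is not None:
    print(f"{current_word}\t{count}")
```

Such a job is submitted through the hadoop-streaming JAR (passing the scripts with `-files` and the data locations with `-input`/`-output`) and can be monitored from the YARN ResourceManager UI.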
- SparkSQL and DataFrames
- DataFrames and the SQL API
- DataFrame schema
- Datasets and encoders
- Loading and saving data
- Aggregations
- Joins (see the PySpark sketch below)
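A short PySpark sketch of these DataFrame operations; the file paths and column names are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("DataFrameBasics").getOrCreate()

# Loading data; printSchema shows the inferred DataFrame schema
orders = spark.read.csv("orders.csv", header=True, inferSchema=True)
customers = spark.read.json("customers.json")
orders.printSchema()

# Aggregation: total order amount per customer
totals = orders.groupBy("customer_id").agg(F.sum("amount").alias("total"))

# Join with the customer data, then query through the SQL API
result = totals.join(customers, on="customer_id", how="inner")
result.createOrReplaceTempView("customer_totals")
spark.sql("SELECT name, total FROM customer_totals ORDER BY total DESC").show(5)

# Saving data
result.write.mode("overwrite").parquet("customer_totals.parquet")
```

Note that Datasets and encoders belong to the Scala/Java API; in Python only the untyped DataFrame API shown above is available.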
- A short introduction to streaming
- Spark Streaming
- Discretized Streams
- Stateful and stateless transformations
- Checkpointing
- Operating with other streaming platforms (such as Apache Kafka)
- Structured Streaming (see the sketch below)
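A minimal Structured Streaming sketch, assuming a local socket source for demonstration (start one with `nc -lk 9999`):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("StreamingWordCount").getOrCreate()

# Read a text stream from a local socket
lines = (spark.readStream.format("socket")
         .option("host", "localhost").option("port", 9999).load())

# Stateful aggregation: a running word count across micro-batches
words = lines.select(F.explode(F.split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Checkpointing lets the query recover its state after a failure
query = (counts.writeStream.outputMode("complete").format("console")
         .option("checkpointLocation", "/tmp/wc-checkpoint").start())
query.awaitTermination()
```

Swapping `format("socket")` for `format("kafka")` (with the Kafka bootstrap-server and topic options) connects the same query to Apache Kafka.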
- Background of Pig
- Pig architecture
- Pig Latin basics
- Pig execution modes
- Pig processing – loading and transforming data
- Pig built-in functions
- Filtering, grouping, sorting data
- Relational join operators
- Pig Scripting
- Pig UDFs (see the sample script below)
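A small Pig Latin script covering loading, filtering, grouping, a built-in function, a join, and storing results; the file names and fields are hypothetical:

```
-- Load tab-separated data with declared schemas
orders = LOAD 'orders.tsv' USING PigStorage('\t')
         AS (order_id:int, customer_id:int, amount:double);
customers = LOAD 'customers.tsv' USING PigStorage('\t')
            AS (customer_id:int, name:chararray);

-- Filter, group, and aggregate with a built-in function
big_orders = FILTER orders BY amount > 100.0;
by_customer = GROUP big_orders BY customer_id;
totals = FOREACH by_customer GENERATE group AS customer_id,
         SUM(big_orders.amount) AS total;

-- Relational join and ordering
joined = JOIN totals BY customer_id, customers BY customer_id;
sorted_totals = ORDER joined BY total DESC;
STORE sorted_totals INTO 'customer_totals' USING PigStorage('\t');
```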
- Background of Hive
- Hive architecture
- Hive Query Language
- Moving the Hive metastore from Derby to MySQL
- Managed & external tables
- Data processing – loading data into tables
- Using Hive built-in functions
- Partitioning data using Hive
- Bucketing data
- Hive Scripting
- Using Hive UDFs (see the sample queries below)
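A few HiveQL statements matching these topics; the table and path names are hypothetical:

```sql
-- Managed table, partitioned by country
CREATE TABLE sales (order_id INT, customer_id INT, amount DOUBLE)
PARTITIONED BY (country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- External table over files that already live in HDFS
CREATE EXTERNAL TABLE raw_sales (line STRING)
LOCATION '/user/train/raw_sales';

-- Load a file into one partition of the managed table
LOAD DATA INPATH '/user/train/sales_us.csv'
INTO TABLE sales PARTITION (country = 'US');

-- Bucketed table, populated with INSERT so bucketing is applied
CREATE TABLE sales_bucketed (order_id INT, customer_id INT,
  amount DOUBLE, country STRING)
CLUSTERED BY (customer_id) INTO 8 BUCKETS;
INSERT INTO TABLE sales_bucketed
SELECT order_id, customer_id, amount, country FROM sales;

-- Built-in functions in a query
SELECT country, ROUND(SUM(amount), 2) AS total
FROM sales GROUP BY country;
```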
- HBase overview
- Data model
- HBase architecture
- HBase shell
- ZooKeeper & its role in the HBase environment
- HBase Shell environment
- Creating table
- Creating column families
- CLI commands – get, put, delete & scan
- Scan filter operations (see the shell examples below)
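The same operations in the HBase shell, with a hypothetical `users` table:

```
# Create a table with one column family, then write and read cells
create 'users', 'info'
put 'users', 'row1', 'info:name', 'Alice'
put 'users', 'row1', 'info:city', 'Delhi'
get 'users', 'row1'

# Full scan, then a scan with a value filter
scan 'users'
scan 'users', {FILTER => "ValueFilter(=, 'binary:Alice')"}

# Delete a single cell
delete 'users', 'row1', 'info:city'
```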
- Importing data from RDBMS to HDFS
- Exporting data from HDFS to RDBMS
- Importing & exporting data between RDBMS & Hive tables (see the examples below)
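These transfers are typically done with Apache Sqoop; a sketch assuming a MySQL source database (connection details are hypothetical):

```bash
# Import an RDBMS table into HDFS
sqoop import --connect jdbc:mysql://dbhost/shop \
  --username train --password-file /user/train/.pw \
  --table orders --target-dir /user/train/orders -m 1

# Export HDFS data back into an RDBMS table
sqoop export --connect jdbc:mysql://dbhost/shop \
  --username train --password-file /user/train/.pw \
  --table orders_out --export-dir /user/train/orders

# Import straight into a Hive table
sqoop import --connect jdbc:mysql://dbhost/shop \
  --username train --password-file /user/train/.pw \
  --table orders --hive-import --hive-table shop.orders
```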
- Overview of Oozie
- Oozie Workflow Architecture
- Creating workflows with Oozie (see the sample workflow below)
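A skeleton of an Oozie workflow definition showing the control-flow elements; the action body is left as a placeholder since its properties depend on the job:

```xml
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="wordcount"/>
  <action name="wordcount">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <!-- job-specific properties (mapper, reducer, input, output) -->
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```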
- Introduction to Flume
- Flume Architecture
- Flume Demo (see the sample agent configuration below)
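A minimal Flume agent configuration of the kind used in such a demo: a netcat source feeding an HDFS sink through a memory channel (names and paths are illustrative):

```
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /user/train/flume/events
a1.sinks.k1.hdfs.fileType = DataStream

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

The agent is started with `flume-ng agent --name a1 --conf-file <file>`, and events sent to port 44444 land in HDFS.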
- Introduction to data visualization
- Tableau
- Chart types
- Data visualization tools