Overview
This Big Data Hadoop course teaches the big data framework using Hadoop and Spark, including HDFS, YARN, and MapReduce. It also covers Pig, Hive, and Impala for processing and analysing large datasets stored in HDFS, and Sqoop and Flume for data ingestion.
Objectives
At the end of the BDHS training, participants will be able to understand and work with the Hadoop and Spark ecosystem topics listed in the course outline below.
Prerequisites
- There are no prerequisites for this course. However, it’s beneficial to have some knowledge of Core Java and SQL.
Course Outline
- Apache Hadoop Overview
- Data Processing
- Introduction to the Hands-On Exercises
- Apache Hadoop Cluster Components
- HDFS Architecture
- Using HDFS
- YARN Architecture
- Working with YARN
- What is Apache Spark?
- Starting the Spark Shell
- Using the Spark Shell
- Getting Started with Datasets and DataFrames
- DataFrame Operations (see the DataFrame sketch after this outline)
- Creating DataFrames from Data Sources
- Saving DataFrames to Data Sources
- DataFrame Schemas
- Eager and Lazy Execution
- Querying DataFrames Using Column Expressions
- Grouping and Aggregation Queries
- Joining DataFrames
- RDD Overview
- RDD Data Sources
- Creating and Saving RDDs
- RDD Operations
- Writing and Passing Transformation Functions
- Transformation Execution
- Converting Between RDDs and DataFrames
- Key-Value Pair RDDs
- Map-Reduce (see the pair RDD sketch after this outline)
- Other Pair RDD Operations
- Datasets and DataFrames
- Creating Datasets
- Loading and Saving Datasets
- Dataset Operations
- Writing a Spark Application (see the application skeleton after this outline)
- Building and Running an Application
- Application Deployment Mode
- The Spark Application Web UI
- Configuring Application Properties
- Review: Apache Spark on a Cluster
- RDD Partitions
- Example: Partitioning in Queries
- Stages and Tasks
- Job Execution Planning
- Example: Catalyst Execution Plan
- Example: RDD Execution Plan
- Apache Spark Streaming Overview
- Creating Streaming DataFrames
- Transforming DataFrames
- Executing Streaming Queries
- Receiving Kafka Messages (see the streaming sketch after this outline)
- Sending Kafka Messages
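The following sketches illustrate a handful of the topics above. First, DataFrame operations from the Spark shell: creating a DataFrame from a data source, querying with column expressions, a grouping and aggregation query, and a join. The file path and the name/dept/salary columns are hypothetical placeholders, not course materials.

```scala
// In spark-shell, `spark` (a SparkSession) and spark.implicits._ (which
// enables the $"..." column syntax) are predefined; this import makes the
// aggregation function available.
import org.apache.spark.sql.functions.avg

// Create a DataFrame from a data source (hypothetical CSV path and schema)
val people = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("hdfs:///data/people.csv")

// Query with a column expression (lazy: nothing executes yet)
val engineers = people.where($"dept" === "engineering")

// Grouping and aggregation query
val avgSalary = people.groupBy($"dept").agg(avg($"salary").alias("avg_salary"))

// Join two DataFrames on a shared column
val enriched = engineers.join(avgSalary, "dept")

// An action triggers eager execution of the lazy plan built above
enriched.show()
```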
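Next, the classic map-reduce pattern with a key-value pair RDD: a word count. The input and output paths are assumptions, and `sc` is the SparkContext the Spark shell provides.

```scala
// Create an RDD from a text file (hypothetical path)
val lines = sc.textFile("hdfs:///data/sample.txt")

val counts = lines
  .flatMap(line => line.split("\\s+"))  // transformation: one element per word
  .map(word => (word, 1))               // key-value pair RDD
  .reduceByKey(_ + _)                   // reduce phase: sum the counts per key

// Transformations are lazy; this action runs the whole chain
counts.saveAsTextFile("hdfs:///data/wordcounts")
```

The shuffle that `reduceByKey` introduces is where the RDD partitions, stages, and tasks listed above become visible in the Spark Application Web UI.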
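For writing, building, and running a Spark application, here is a minimal skeleton of a standalone application; the package, object name, and input path are assumed names. Once packaged into a jar, it would be launched with something like `spark-submit --class example.AverageSalary --deploy-mode cluster app.jar hdfs:///data/people.parquet`.

```scala
package example

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

object AverageSalary {
  def main(args: Array[String]): Unit = {
    // A standalone application creates its own SparkSession;
    // the appName appears in the Spark Application Web UI
    val spark = SparkSession.builder
      .appName("Average Salary")
      .getOrCreate()

    // Read the input path passed on the spark-submit command line
    val people = spark.read.parquet(args(0))
    people.groupBy("dept").agg(avg("salary")).show()

    spark.stop()
  }
}
```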
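Finally, a sketch of Spark Structured Streaming with Kafka: creating a streaming DataFrame from one topic, transforming it, and sending the results to another. The broker address, topic names, and checkpoint path are all assumptions, and the spark-sql-kafka connector package must be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("KafkaEcho").getOrCreate()

// Receive Kafka messages as a streaming DataFrame
val input = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")  // assumed broker
  .option("subscribe", "events-in")                   // assumed source topic
  .load()

// Kafka rows carry binary key/value columns; cast the value to a string
val messages = input.selectExpr("CAST(value AS STRING) AS value")

// Send the messages to another Kafka topic; start() begins executing
// the streaming query
val query = messages.writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("topic", "events-out")                      // assumed sink topic
  .option("checkpointLocation", "hdfs:///checkpoints/kafka-echo")
  .start()

query.awaitTermination()
```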