Big Data Hadoop Spark Developer (BDHS)

Live Online (VILT) & Classroom Corporate Training Course

The Big Data Hadoop training course teaches you the concepts of the Hadoop framework and its deployment in a cluster environment, and prepares you for Cloudera's Big Data certification.

How can we help you?


  • CloudLabs

  • Projects

  • Assignments

  • 24x7 Support

  • Lifetime Access


Overview

With this Big Data Hadoop course, you will learn the big data framework using Hadoop and Spark, including HDFS, YARN, and MapReduce. The course also covers Pig, Hive, and Impala for processing and analyzing large datasets stored in HDFS, and Sqoop and Flume for data ingestion.

Objectives

By the end of the BDHS training, participants will be able to understand:

  • The different components of the Hadoop ecosystem, such as Hadoop 2.7, YARN, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
  • Hadoop Distributed File System (HDFS) and YARN architecture
  • MapReduce, its characteristics, and advanced MapReduce concepts
  • Different types of file formats, Avro schemas, using Avro with Hive and Sqoop, and schema evolution
  • Flume, its architecture, sources, sinks, channels, and configurations
  • HBase, its architecture and data storage model, and the differences between HBase and an RDBMS
  • The common use cases of Spark and various interactive algorithms

Prerequisites

  • There are no prerequisites for this course. However, it’s beneficial to have some knowledge of Core Java and SQL.

Course Outline

Introduction to Apache Hadoop and the Hadoop Ecosystem
  • Apache Hadoop Overview
  • Data Processing
  • Introduction to the Hands-On Exercises
Apache Hadoop File Storage
  • Apache Hadoop Cluster Components
  • HDFS Architecture
  • Using HDFS
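
To make the HDFS topics concrete, here is a minimal Scala sketch that copies a local file into HDFS and lists a directory through the Hadoop FileSystem API; the NameNode address and the /user/training path are placeholders, and the hadoop-client library is assumed to be on the classpath.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object HdfsExample {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        // Placeholder NameNode address; use your cluster's fs.defaultFS.
        conf.set("fs.defaultFS", "hdfs://namenode:8020")
        val fs = FileSystem.get(conf)

        // Copy a local file into HDFS, then list the target directory.
        fs.copyFromLocalFile(new Path("data.csv"), new Path("/user/training/data.csv"))
        fs.listStatus(new Path("/user/training")).foreach(s => println(s.getPath))

        fs.close()
      }
    }
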
Distributed Processing on an Apache Hadoop Cluster
  • YARN Architecture
  • Working With YARN
Apache Spark Basics
  • What is Apache Spark?
  • Starting the Spark Shell
  • Using the Spark Shell
  • Getting Started with Datasets and DataFrames
  • DataFrame Operations
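
As a first taste of the Spark shell, the sketch below loads a JSON file into a DataFrame and runs a few basic operations; people.json is a placeholder dataset, and spark is the SparkSession the shell creates for you.

    // In spark-shell, spark (a SparkSession) is predefined.
    val df = spark.read.json("people.json")   // load a DataFrame from JSON

    df.printSchema()                          // inspect the inferred schema
    df.filter(df("age") > 21)                 // row filter
      .select("name", "age")                  // column projection
      .show()                                 // print the first rows
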
Working with DataFrames and Schemas
  • Creating DataFrames from Data Sources
  • Saving DataFrames to Data Sources
  • DataFrame Schemas
  • Eager and Lazy Execution
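
A short sketch of the schema topics, reusing the placeholder people.json: declaring an explicit schema instead of relying on inference, reading lazily, and triggering execution with a write.

    import org.apache.spark.sql.types._

    // An explicit schema avoids the cost (and surprises) of schema inference.
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age",  LongType,   nullable = true)))

    val people = spark.read.schema(schema).json("people.json") // lazy: nothing runs yet
    people.write.mode("overwrite").parquet("people.parquet")   // action: triggers the job
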
Analyzing Data with DataFrame Queries
  • Querying DataFrames Using Column Expressions
  • Grouping and Aggregation Queries
  • Joining DataFrames
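
The query topics in one minimal sketch, using hypothetical orders and customers Parquet files with customer_id and amount columns.

    import org.apache.spark.sql.functions._
    import spark.implicits._   // enables the $"column" syntax

    val orders    = spark.read.parquet("orders.parquet")
    val customers = spark.read.parquet("customers.parquet")

    val totals = orders
      .groupBy($"customer_id")                    // grouping
      .agg(sum($"amount").alias("total_spent"))   // aggregation

    totals.join(customers, Seq("customer_id"))    // join on the shared key column
      .orderBy(desc("total_spent"))
      .show(10)
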
RDD Overview
  • RDD Overview
  • RDD Data Sources
  • Creating and Saving RDDs
  • RDD Operations
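
A minimal RDD sketch, assuming a placeholder access.log file and the sc SparkContext that spark-shell provides.

    val lines  = sc.textFile("access.log")            // RDD from a text file
    val errors = lines.filter(_.contains("ERROR"))    // lazy transformation

    println(errors.count())                           // action: runs the job
    errors.saveAsTextFile("errors-out")               // save the RDD back to storage
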
Transforming and Aggregating Data with RDDs
  • Writing and Passing Transformation Functions
  • Transformation Execution
  • Converting Between RDDs and DataFrames
  • Key-Value Pair RDDs
  • Map-Reduce
  • Other Pair RDD Operations
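
The pair-RDD and conversion topics fit in one short word-count sketch; books.txt is a placeholder input.

    import spark.implicits._   // enables rdd.toDF

    val pairs = sc.textFile("books.txt")
      .flatMap(_.split("\\s+"))            // transformation function passed as a closure
      .map(word => (word, 1))              // key-value pair RDD

    val counts = pairs.reduceByKey(_ + _)  // classic map-reduce aggregation

    // Converting between RDDs and DataFrames:
    val countsDF = counts.toDF("word", "count")
    countsDF.orderBy($"count".desc).show(10)
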
Working with Datasets in Scala
  • Datasets and DataFrames
  • Creating Datasets
  • Loading and Saving Datasets
  • Dataset Operations
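
A minimal typed-Dataset sketch, reusing the placeholder people.json; the case class gives the Dataset a compile-time schema.

    case class Person(name: String, age: Long)

    import spark.implicits._
    val people = spark.read.json("people.json").as[Person] // DataFrame -> Dataset[Person]

    // Dataset operations are typed: p below is a Person, not a generic Row.
    people.filter(p => p.age >= 18)
          .map(p => p.name.toUpperCase)
          .show()
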
Writing, Configuring, and Running Spark Applications
  • Writing a Spark Application
  • Building and Running an Application
  • Application Deployment Mode
  • The Spark Application Web UI
  • Configuring Application Properties
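
A minimal self-contained application sketch showing the session builder, an example application property, and a clean shutdown; the input and output paths are taken from the command line.

    import org.apache.spark.sql.SparkSession

    object WordCountApp {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("WordCountApp")                     // the name shown in the Web UI
          .config("spark.sql.shuffle.partitions", "8") // an example property
          .getOrCreate()

        import spark.implicits._
        val counts = spark.read.textFile(args(0))      // input path
          .flatMap(_.split("\\s+"))
          .groupBy($"value").count()

        counts.write.mode("overwrite").csv(args(1))    // output path
        spark.stop()
      }
    }

Such an application would typically be packaged with sbt or Maven and launched with spark-submit, whose --master and --deploy-mode options choose the cluster and the deployment mode covered in this module.
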
Spark Distributed Processing
  • Review: Apache Spark on a Cluster
  • RDD Partitions
  • Example: Partitioning in Queries
  • Stages and Tasks
  • Job Execution Planning
  • Example: Catalyst Execution Plan
  • Example: RDD Execution Plan
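
A short sketch of how partitions and execution plans can be inspected, reusing the hypothetical orders.parquet.

    import spark.implicits._

    val df = spark.read.parquet("orders.parquet")
    println(df.rdd.getNumPartitions)       // how the data is split across the cluster

    // Repartitioning changes the parallelism of downstream stages.
    val byCustomer = df.repartition(8, $"customer_id")

    // explain(true) prints the Catalyst plan: parsed, analyzed, optimized, physical.
    byCustomer.groupBy($"customer_id").count().explain(true)
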
Structured Streaming
  • Apache Spark Streaming Overview
  • Creating Streaming DataFrames
  • Transforming DataFrames
  • Executing Streaming Queries
  • Receiving Kafka Messages
  • Sending Kafka Messages
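
A minimal Structured Streaming sketch that reads from one Kafka topic and writes to another; the broker address, topic names, and checkpoint path are placeholders.

    val input = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()

    // Kafka records arrive as binary key/value columns; cast them to strings.
    val messages = input.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    val query = messages.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("topic", "events-out")
      .option("checkpointLocation", "/tmp/checkpoints/events") // required by the Kafka sink
      .start()

    query.awaitTermination()
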