Intro to Big Data and Hadoop

Live Online (VILT) & Classroom Corporate Training Course

Hadoop makes it easier to make sense of huge volumes of data and to use frameworks that turn that data into actionable insights, so Hadoop & Big Data training is in great demand.

How can we help you?

  • CloudLabs

  • Projects

  • Assignments

  • 24x7 Support

  • Lifetime Access



This training course helps participants gain the skills they need to store, manage, process, and analyze massive amounts of structured and unstructured data and to extract meaningful insights from it.


At the end of the Intro to Big Data & Hadoop training course, participants will:

  • Understand what Big Data is and gain in-depth knowledge of Big Data Analytics concepts and tools.

  • Learn to process large data sets with Big Data tools to extract information from disparate sources.

  • Learn about MapReduce, Hadoop Distributed File System (HDFS), YARN, and how to write MapReduce code.

  • Learn best practices and considerations for Hadoop development as well as debugging techniques.

  • Learn how to use Hadoop frameworks like Apache Pig™, Apache Hive™, Sqoop, and Flume, among other projects.

  • Perform real-world analytics by learning advanced Hadoop API topics with e-courseware.


Before undertaking a Big Data and Hadoop course, participants are advised to have basic knowledge of a programming language such as Python, Scala, or Java, and a good understanding of SQL and RDBMS.

Course Outline

  • Understanding Big Data
  • Types of Big Data
  • Difference between Traditional Data and Big Data
  • Introduction to Hadoop
  • Distributed Data Storage in Hadoop: HDFS and HBase
  • Hadoop Data Processing and Analysis Services: MapReduce, Spark, Hive, Pig and Storm
  • Data Integration Tools in Hadoop
  • Resource Management and Cluster Management Services
Big Data Ecosystem
  • The Need for Hadoop in Big Data
  • Understanding Hadoop And Its Architecture
  • The MapReduce Framework
  • What is YARN?
  • Understanding Big Data Components
  • Monitoring, Management and Orchestration Components of Hadoop Ecosystem
  • Different Distributions of Hadoop
  • Installing Hadoop 3
Hadoop Cluster Configuration
  • Hortonworks sandbox installation & configuration
  • Hadoop Configuration files
  • Working with Hadoop services using Ambari
  • Hadoop Daemons
  • Browsing Hadoop UI consoles
  • Basic Hadoop Shell commands
  • Eclipse & WinSCP installation & configuration on the VM
Big Data Processing with MapReduce
  • Running a MapReduce application in MR2
  • MapReduce Framework on YARN
  • Fault tolerance in YARN
  • Map, Reduce & Shuffle phases
  • Understanding Mapper, Reducer & Driver classes
  • Writing a MapReduce WordCount program
  • Executing & monitoring a MapReduce job
Batch Analytics with Apache Spark
  • SparkSQL and DataFrames
  • DataFrames and the SQL API
  • DataFrame schema
  • Datasets and encoders
  • Loading and saving data
  • Aggregations
  • Joins
Real Time Analytics with Apache Spark
  • A short introduction to streaming
  • Spark Streaming
  • Discretized Streams
  • Stateful and stateless transformations
  • Checkpointing
  • Operating with other streaming platforms (such as Apache Kafka)
  • Structured Streaming
Analysis using Pig
  • Background of Pig
  • Pig architecture
  • Pig Latin basics
  • Pig execution modes
  • Pig processing – loading and transforming data
  • Pig built-in functions
  • Filtering, grouping, sorting data
  • Relational join operators
  • Pig Scripting
  • Pig UDFs
Analysis using Hive Data Warehousing Infrastructure
  • Background of Hive
  • Hive architecture
  • Hive Query Language
  • Moving the metastore database from Derby to MySQL
  • Managed & external tables
  • Data processing – loading data into tables
  • Using Hive built-in functions
  • Partitioning data using Hive
  • Bucketing data
  • Hive Scripting
  • Using Hive UDFs
Working with HBase
  • HBase overview
  • Data model
  • HBase architecture
  • HBase shell
  • ZooKeeper & its role in the HBase environment
  • HBase Shell environment
  • Creating tables
  • Creating column families
  • CLI commands – get, put, delete & scan
  • Scan Filter operations
Importing and Exporting Data using Sqoop
  • Importing data from RDBMS to HDFS
  • Exporting data from HDFS to RDBMS
  • Importing & exporting data between RDBMS & Hive tables
Oozie Workflow Management and Using Flume for Analyzing Streaming Data
  • Overview of Oozie
  • Oozie Workflow Architecture
  • Creating workflows with Oozie
  • Introduction to Flume
  • Flume Architecture
  • Flume Demo
Visualizing Big Data
  • Introduction
  • Tableau
  • Chart types
  • Data visualization tools



