Apache Pig & Hive

Overview

This training introduces you to the world of Hadoop and MapReduce. Through a series of practical, hands-on exercises, you will learn to write complex MapReduce transformations, work with HDFS, and write scripts that use the advanced features of Pig. You will also get to know the Hive environment and the Hive Query Language (HiveQL), and learn how to perform data analysis with Hive.

Objectives

By the end of the Apache Pig & Hive training course, participants will learn:

How Big data can change the way businesses operate
The Hadoop ecosystem and its architecture
To analyse large data sets using Pig Latin scripts and to process them in parallel using MapReduce
About Hive and its use in Big Data
The benefits of HiveQL
To use Hive on complex data sets and derive insights that help the business

Prerequisites

Understanding of Linux commands and SQL queries
Basic knowledge of core Java

Course Outline

The Hadoop Ecosystem

Hadoop overview
Surveying the Hadoop components
Defining the Hadoop architecture

Exploring HDFS and MapReduce

Storing data in HDFS

Achieving reliable and secure storage
Monitoring storage metrics
Controlling HDFS from the Command Line
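
The storage arithmetic behind HDFS reliability can be sketched in a few lines of Python. This assumes the Hadoop 2.x+ defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication); a given cluster may be configured differently. Because every block is copied to several DataNodes, losing one node does not lose data, at the cost of multiplying raw disk usage.

```python
import math

# Assumed defaults; real clusters may override dfs.blocksize and dfs.replication
BLOCK_SIZE_MB = 128
REPLICATION = 3


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (block count, raw cluster storage in MB) for a single file."""
    # A file is cut into fixed-size blocks; only the last block may be smaller
    blocks = math.ceil(file_size_mb / block_size_mb)
    # The last block is not padded, so raw usage is file size times replication
    raw_storage_mb = file_size_mb * replication
    return blocks, raw_storage_mb


print(hdfs_footprint(1000))  # (8, 3000)
```

A 1,000 MB file therefore occupies 8 blocks and about 3,000 MB of raw disk across the cluster under these assumed settings.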

Parallel processing with MapReduce

Detailing the MapReduce approach
Transferring algorithms, not data
Dissecting the key stages of a MapReduce job
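
The key stages of a MapReduce job (map, shuffle and sort, reduce) can be imitated in plain Python with the classic word-count example. This is a single-process sketch of the data flow only; it does not use the Hadoop API, and on a real cluster the framework runs many mappers and reducers in parallel, scheduling mappers close to the HDFS blocks they read.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in one input line
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    # Shuffle and sort: gather all values emitted for the same key, ordered by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    # Reducer: aggregate the grouped values for a single key
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    shuffled = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in shuffled.items())


print(word_count(["the quick fox", "the lazy dog"]))
# {'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```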

Automating data transfer

Facilitating data ingress and egress
Aggregating data with Flume
Configuring data fan-in and fan-out
Moving relational data with Sqoop
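
In Flume, fan-out means one source delivering events to several channels, chosen by a channel selector: the replicating selector copies every event to all channels, while the multiplexing selector picks channels from the value of an event header. Fan-in is the reverse pattern, with several agents feeding a single collecting agent. The Python sketch below only simulates the fan-out routing logic; it is not Flume's configuration format or API, and the header name state and the channel names c1 to c3 are hypothetical.

```python
from collections import defaultdict


def route(events, select):
    """Deliver each event body to every channel returned by `select` (fan-out)."""
    channels = defaultdict(list)
    for event in events:
        for channel in select(event):
            channels[channel].append(event["body"])
    return dict(channels)


def replicating(channel_names):
    # Replicating selector: every channel receives a copy of every event
    return lambda event: channel_names


def multiplexing(header, mapping, default):
    # Multiplexing selector: one header's value decides the target channel(s)
    return lambda event: mapping.get(event["headers"].get(header), default)


events = [
    {"headers": {"state": "CA"}, "body": "e1"},
    {"headers": {"state": "NY"}, "body": "e2"},
    {"headers": {}, "body": "e3"},
]
print(route(events, replicating(["c1", "c2"])))
# {'c1': ['e1', 'e2', 'e3'], 'c2': ['e1', 'e2', 'e3']}
print(route(events, multiplexing("state", {"CA": ["c1"], "NY": ["c2"]}, ["c3"])))
# {'c1': ['e1'], 'c2': ['e2'], 'c3': ['e3']}
```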

Executing Data Flows with Pig

Contrasting Pig with MapReduce
Identifying Pig use cases
Pinpointing key Pig configurations

Advanced Pig

Pig Latin: Relational Operators
File Loaders
Group Operator
COGROUP Operator
Joins and COGROUP
Union and Diagnostic Operators
Pig UDF
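
Pig's GROUP, COGROUP and JOIN semantics can be modelled with Python lists of tuples. In this sketch GROUP collects a bag of tuples per key, COGROUP builds one bag per input for every key seen in either input, and an inner JOIN is derived from COGROUP by crossing the two bags and skipping keys where one bag is empty. The relations orders and users are made-up sample data, and real Pig does not guarantee output order; keys are sorted here only to make the result predictable.

```python
from collections import defaultdict


def group(relation, key):
    # GROUP: one entry per key, holding the bag (list) of all tuples with that key
    bags = defaultdict(list)
    for t in relation:
        bags[t[key]].append(t)
    return dict(bags)


def cogroup(left, right, key):
    # COGROUP: one entry per key found in either input, with one bag per input;
    # a key absent from one input gets an empty bag on that side
    lg, rg = group(left, key), group(right, key)
    return {k: (lg.get(k, []), rg.get(k, [])) for k in sorted(set(lg) | set(rg))}


def join(left, right, key):
    # Inner JOIN as COGROUP + FLATTEN: cross the two bags; an empty bag yields nothing
    return [a + b for abag, bbag in cogroup(left, right, key).values() for a in abag for b in bbag]


orders = [("u1", 10), ("u2", 5), ("u1", 7)]
users = [("u1", "Asha"), ("u3", "Ravi")]
print(cogroup(orders, users, 0))
print(join(orders, users, 0))  # [('u1', 10, 'u1', 'Asha'), ('u1', 7, 'u1', 'Asha')]
```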

Structuring unstructured data

Representing data in Pig’s data model
Running Pig Latin commands at the Grunt Shell
Expressing transformations in Pig Latin Syntax
Invoking Load and Store functions
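
A LOAD with a schema, such as LOAD 'people.csv' USING PigStorage(',') AS (name:chararray, age:int), turns lines of text into typed tuples. The Python sketch below imitates that behaviour under simplifying assumptions: fields are split on a single delimiter (tab by default, as with PigStorage), a field missing from a short line becomes null (None), and a value that cannot be cast to its declared type also becomes null, as Pig does when a cast fails. The file name and schema are illustrative, and pig_storage_load is a hypothetical helper, not a Pig function.

```python
def pig_storage_load(lines, schema, delimiter="\t"):
    """Parse delimited text lines into tuples, casting each field by the schema."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        row = []
        for i, (_name, cast) in enumerate(schema):
            raw = fields[i] if i < len(fields) else None
            if raw is None:
                row.append(None)  # field missing from a short line -> null
                continue
            try:
                row.append(cast(raw))
            except ValueError:
                row.append(None)  # failed cast -> null
        rows.append(tuple(row))
    return rows


schema = [("name", str), ("age", int)]
print(pig_storage_load(["alice,30", "bob,n/a", "carol"], schema, delimiter=","))
# [('alice', 30), ('bob', None), ('carol', None)]
```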

Performing ETL with Pig

Transforming data with Relational Operators

Creating new relations with joins
Reducing data size by sampling
Extending Pig with user-defined functions
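
Pig's SAMPLE operator (for example, SAMPLE logs 0.1) keeps each tuple with the given probability, so the output size is only approximately that fraction of the input. A Python sketch of this Bernoulli sampling follows; the seed parameter is an addition for illustration so that runs are repeatable, not a Pig option.

```python
import random


def sample(relation, fraction, seed=None):
    """Keep each tuple independently with probability `fraction` (Bernoulli sampling)."""
    rng = random.Random(seed)
    return [t for t in relation if rng.random() < fraction]


kept = sample(range(10_000), 0.1, seed=42)
print(len(kept))  # close to 1,000, but not guaranteed to be exactly 1,000
```

Sampling this way needs only one pass and no knowledge of the input size, which is why it suits large, distributed data sets.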

Filtering data with Pig

Consolidating data sets with unions
Partitioning data sets with splits
Injecting parameters into Pig scripts
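
The semantics of UNION and SPLIT, plus parameter substitution, can be sketched in Python. Pig's UNION keeps duplicate tuples (unlike SQL UNION), and SPLIT evaluates every condition against every tuple, so one tuple can land in several outputs or in none. Parameters are written as $name in a Pig script and supplied at launch, for example pig -param input_dir=/data/logs -param date=2021-06-29 script.pig; here Python's string.Template, which uses the same $name syntax, stands in for Pig's preprocessor. The paths and parameter names are hypothetical.

```python
from string import Template


def union(*relations):
    # UNION: concatenate the inputs; duplicate tuples are kept
    return [t for rel in relations for t in rel]


def split(relation, **conditions):
    # SPLIT: test every condition on every tuple; a tuple may match several or none
    return {name: [t for t in relation if cond(t)] for name, cond in conditions.items()}


# $input_dir and $date are resolved before the script runs, like Pig's -param values
script = Template("raw = LOAD '$input_dir/$date' AS (line:chararray);")
print(script.substitute(input_dir="/data/logs", date="2021-06-29"))
# raw = LOAD '/data/logs/2021-06-29' AS (line:chararray);

both = union([1, 2, 3], [3, 4])
print(split(both, small=lambda x: x < 3, odd=lambda x: x % 2 == 1))
# {'small': [1, 2], 'odd': [1, 3, 3]}
```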

Hive

Hive Background
Hive Use Case
About Hive
Hive vs Pig
Hive Architecture and Components
Metastore in Hive
Limitations of Hive
Comparison with Traditional Databases
Hive Data Types and Data Models
Partitions and Buckets
Hive Tables (Managed Tables and External Tables)
Importing Data
Querying Data
Managing Outputs
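
Partitions and buckets decide where a row is stored. A partition is a directory per partition-column value (for example .../sales/dt=2021-06-29/country=IN), so queries that filter on those columns can skip whole directories; a bucket is one of a fixed number of files chosen by hashing a column. The sketch assumes the classic bucketing scheme, in which the bucket is (hash & Integer.MAX_VALUE) mod the number of buckets and an INT column hashes to its own value; newer Hive releases can use a different hash function, and partition values are not escaped here, so treat it as an illustration. The table path and columns are hypothetical.

```python
JAVA_INT_MAX = 0x7FFFFFFF  # Integer.MAX_VALUE


def partition_path(table_dir, row, partition_cols):
    # Each partition column contributes one key=value directory level, in order
    return "/".join([table_dir] + [f"{col}={row[col]}" for col in partition_cols])


def bucket_for_int(value, num_buckets):
    # Assumed classic scheme: (hash & Integer.MAX_VALUE) % buckets, with hash(int) == value
    return (value & JAVA_INT_MAX) % num_buckets


row = {"user_id": 42, "amount": 99.5, "dt": "2021-06-29", "country": "IN"}
print(partition_path("/user/hive/warehouse/sales", row, ["dt", "country"]))
# /user/hive/warehouse/sales/dt=2021-06-29/country=IN
print(bucket_for_int(row["user_id"], 4))  # 2
```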

Advanced Hive

Hive Script
Hive UDF and Hive Demo on a Healthcare Data Set
Hive QL: Joining Tables
Dynamic Partitioning
Custom MapReduce Scripts
Thrift Server
User-Defined Functions
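
Custom MapReduce scripts plug into HiveQL through TRANSFORM: Hive streams the selected columns to the script's standard input as tab-separated lines (with NULL written as \N by default) and reads tab-separated lines back from its standard output. A minimal Python script for such a step, assuming a hypothetical patients table with patient_id and age columns, could group ages into decades. It would be registered and called with something like ADD FILE age_band.py; followed by SELECT TRANSFORM (patient_id, age) USING 'python3 age_band.py' AS (patient_id, age_band) FROM patients;

```python
import sys


def transform(lines):
    """Turn tab-separated (patient_id, age) rows into (patient_id, age_band) rows."""
    for line in lines:
        patient_id, age = line.rstrip("\n").split("\t")
        if age == "\\N":  # Hive's default text encoding of NULL
            band = "\\N"
        else:
            band = f"{int(age) // 10 * 10}s"  # e.g. 34 -> "30s"
        yield f"{patient_id}\t{band}"


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Entry point when Hive runs the file: read rows from stdin, write rows to stdout
    for out_line in transform(stdin):
        stdout.write(out_line + "\n")
```

Saved as age_band.py, the file would end with a call to main() so that the rows Hive streams in are actually processed; keeping the logic in transform() makes it easy to test without Hive.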
