Apache Pig & Hive

Live Online (VILT) & Classroom Corporate Training Course

Apache Pig is known for its simple syntax and its ability to shorten development time, and hence is widely used by organizations that analyse Big Data. Hive, another tool in the Hadoop ecosystem, is much sought after because it is scalable and provides tools for easy data analysis and extraction.


  • CloudLabs

  • Projects

  • Assignments

  • 24x7 Support

  • Lifetime Access


Overview

This training will introduce you to the world of Hadoop and MapReduce. Through a series of practical, hands-on exercises, you will learn to write complex MapReduce transformations, work with HDFS, and write scripts using the advanced features of Pig. You will also come to understand the Hive environment, the Hive query language (HiveQL), and how to perform data analysis with Hive.

Objectives

At the end of the Apache Pig & Hive training course, participants will have learned:

  • How Big Data can change the way businesses operate
  • The Hadoop ecosystem and its architecture
  • How to analyse large data sets using Pig Latin scripts and parallel processing with MapReduce
  • What Hive is and how it is used in Big Data
  • The benefits of HiveQL
  • How to use Hive on complex data sets and derive insights that help the business

Prerequisites

  • Understanding of Linux commands and SQL queries
  • Basic knowledge of core Java

Course Outline

The Hadoop Ecosystem
  • Hadoop overview
  • Surveying the Hadoop components
  • Defining the Hadoop architecture
Exploring HDFS and MapReduce

Storing data in HDFS

  • Achieving reliable and secure storage
  • Monitoring storage metrics
  • Controlling HDFS from the Command Line
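As a taste of the command-line topic above, a few common `hdfs dfs` commands — these assume a running Hadoop installation, and the paths shown are illustrative:

```shell
hdfs dfs -mkdir -p /user/train/input      # create a directory in HDFS
hdfs dfs -put logs.txt /user/train/input  # copy a local file into HDFS
hdfs dfs -ls /user/train/input            # list directory contents
hdfs dfs -cat /user/train/input/logs.txt  # print a file's contents
hdfs dfsadmin -report                     # show cluster storage metrics
```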

Parallel processing with MapReduce

  • Detailing the MapReduce approach
  • Transferring algorithms, not data
  • Dissecting the key stages of a MapReduce job
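The key stages dissected above — map, shuffle/sort, reduce — can be mimicked locally with Unix pipes. This sketch runs the canonical word-count example without a cluster, purely as an analogy for how a MapReduce job flows:

```shell
#!/bin/sh
# Word count as a local MapReduce analogy:
#   map          -> emit one key (word) per line
#   shuffle/sort -> bring identical keys together
#   reduce       -> aggregate a value per key
printf 'hive pig hive\nhadoop pig\n' |
  tr ' ' '\n' |             # map: split records into one word per line
  sort |                    # shuffle/sort: group equal keys
  uniq -c |                 # reduce: count occurrences per key
  awk '{print $2 "\t" $1}'  # format as key<TAB>count
```

The same separation of concerns is what lets real MapReduce scale: the map and reduce steps are independent programs, and the framework supplies the distributed shuffle between them.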

Automating data transfer

  • Facilitating data ingress and egress
  • Aggregating data with Flume
  • Configuring data fan in and fan out
  • Moving relational data with Sqoop
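Moving relational data with Sqoop, as listed above, is driven by command-line arguments. A typical import sketch — the host, database, table, and paths are all placeholders:

```shell
# Import the "orders" table from MySQL into HDFS with 4 parallel map tasks.
# (Connection details and paths are illustrative placeholders.)
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username train \
  --password-file /user/train/.pw \
  --table orders \
  --target-dir /user/train/orders \
  --num-mappers 4
```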
Executing Data Flows with Pig
  • Contrasting Pig with MapReduce
  • Identifying Pig use cases
  • Pinpointing key Pig configurations
Advanced Pig
  • Pig Latin: Relational Operators
  • File Loaders
  • Group Operator
  • COGROUP Operator
  • Joins and COGROUP
  • Union, Diagnostic Operators
  • Pig UDF

Structuring unstructured data

  • Representing data in Pig’s data model
  • Running Pig Latin commands at the Grunt Shell
  • Expressing transformations in Pig Latin Syntax
  • Invoking Load and Store functions
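The topics above — Pig's data model, the Grunt shell, and Load/Store functions — come together in a minimal Pig Latin flow like this sketch (the path, delimiter, and schema are illustrative):

```pig
-- Load raw logs with an explicit schema (path and fields are placeholders)
logs = LOAD '/user/train/logs.txt'
       USING PigStorage('\t') AS (user:chararray, bytes:int);

-- Keep only large transfers, then group and aggregate per user
big    = FILTER logs BY bytes > 1024;
byuser = GROUP big BY user;
totals = FOREACH byuser GENERATE group AS user, SUM(big.bytes) AS total;

STORE totals INTO '/user/train/totals' USING PigStorage('\t');
```

Each statement defines a relation; nothing executes until a `STORE` (or `DUMP`) triggers the plan — a point the Grunt shell makes very visible.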
Performing ETL with Pig

Transforming data with Relational Operators

  • Creating new relations with joins
  • Reducing data size by sampling
  • Extending Pig with user-defined functions

Filtering data with Pig

  • Consolidating data sets with unions
  • Partitioning data sets with splits
  • Injecting parameters into Pig scripts
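The join, sampling, and parameter-injection topics above combine naturally in one script sketch. The relations and the `$DAY` parameter are illustrative; the parameter would be supplied at launch with `pig -param DAY=...`:

```pig
-- Join two relations on a shared key, then sample the result
users  = LOAD '/data/users'  USING PigStorage(',') AS (id:int, name:chararray);
orders = LOAD '/data/orders' USING PigStorage(',') AS (uid:int, amount:double, day:chararray);

today   = FILTER orders BY day == '$DAY';  -- parameter injected at launch
joined  = JOIN users BY id, today BY uid;  -- inner join on the key
sampled = SAMPLE joined 0.1;               -- keep ~10% for quick inspection

STORE sampled INTO '/data/joined_sample';
```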
Hive
  • Hive Background
  • Hive Use Case
  • About Hive
  • Hive vs Pig
  • Hive Architecture and Components
  • Meta-store in Hive
  • Limitations of Hive
  • Comparison with Traditional Database
  • Hive Data Types and Data Models
  • Partitions and Buckets
  • Hive Tables (Managed Tables and External Tables)
  • Importing Data
  • Querying Data
  • Managing Outputs
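Several of the topics above — external tables, partitions, importing, and querying — look like this in HiveQL (the table, columns, and HDFS paths are illustrative):

```sql
-- External table: Hive tracks only metadata; data stays at the HDFS location
CREATE EXTERNAL TABLE visits (user_id INT, url STRING)
PARTITIONED BY (visit_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/train/visits';

-- Import data into one partition, then query it
LOAD DATA INPATH '/user/train/staging/2021-06-29.tsv'
  INTO TABLE visits PARTITION (visit_date = '2021-06-29');

SELECT url, COUNT(*) AS hits
FROM visits
WHERE visit_date = '2021-06-29'
GROUP BY url;
```

Because `visits` is external, dropping the table removes only its metadata from the metastore — the files under `/user/train/visits` survive, unlike with a managed table.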
Advanced Hive
  • Hive Script
  • Hive UDF and Hive Demo on Healthcare Data set
  • HiveQL: Joining Tables
  • Dynamic Partitioning
  • Custom MapReduce Scripts
  • Thrift Server
  • User Defined Functions
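A sketch combining the join and dynamic-partitioning topics above — the table names are illustrative, and dynamic partitioning must be switched on before the insert:

```sql
-- Enable dynamic partition inserts
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Join fact and dimension tables, writing each day into its own partition
INSERT OVERWRITE TABLE daily_totals PARTITION (visit_date)
SELECT u.region, SUM(v.amount) AS total, v.visit_date
FROM visits v
JOIN users u ON (v.user_id = u.id)
GROUP BY u.region, v.visit_date;
```

With dynamic partitioning, Hive derives the `visit_date` partition value from the last column of the `SELECT`, so one statement populates every day's partition at once.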
