Apache Spark and Scala

Overview

The Apache Spark and Scala course is designed to help you become proficient in Apache Spark development. You will learn about the motivation for Apache Spark, Spark Core, Spark internals, RDDs, Spark SQL, Spark Streaming, MLlib, and GraphX, which together form the key constituents of the course.

Objectives

At the end of the Apache Spark & Scala training course, participants will:

Master the concepts of the Apache Spark framework
Understand Spark internals and RDDs, and use Spark’s API and Scala functions to create and transform RDDs
Master RDD combiners, Spark SQL, SparkContext, Spark Streaming, MLlib, and GraphX

Prerequisites

Hadoop Basics

Course Outline

Introduction

Overview of Hadoop
Architecture of HDFS & YARN
Overview of Spark version 2.2.0
Spark Architecture
Spark Components
Comparison of Spark & Hadoop
Installation of Spark v2.2.0 on 64-bit Linux

Spark Core

Exploring the Spark shell
Creating a SparkContext
Operations on Resilient Distributed Datasets (RDDs)
Transformations & Actions
Loading Data and Saving Data

Spark SQL & Hive SQL

Introduction to SQL Operations
SQLContext
DataFrame
Working with Hive
Loading Partitioned Tables
Processing CSV, JSON, and Parquet files

Scala Programming

Introduction to Scala
Features of Scala
Scala vs Java Comparison
Data types
Data Structures
Arrays
Literals
Logical Operators
Mutable & Immutable variables
Type inference

Scala Functions

OOP vs Functions
Anonymous functions
Recursive functions
Call-by-name
Currying
Conditional statements

Scala Collections

List
Map
Sets
Options
Tuples
Mutable collections
Immutable collections
Iterating
Filtering and counting
Group By
Flat Map
Word count
File Access

Scala Object Oriented Programming

Classes, Objects & Properties
Inheritance

Spark Submit

Maven build tool implementation
Build Libraries
Create Jar files
Spark-Submit

Spark Streaming

Overview of Spark Streaming
Architecture of Spark Streaming
File streaming
Twitter Streaming

Kafka Streaming

Overview of Kafka Streaming
Architecture of Kafka Streaming
Kafka Installation
Topic
Producer
Consumer
File streaming
Twitter Streaming

Spark MLlib

Overview of Machine Learning Algorithms
Linear Regression
Logistic Regression

Spark GraphX

GraphX overview
Vertices
Edges
Triplets
PageRank
Pregel

Performance Tuning

On-heap and off-heap memory tuning
Kryo Serialization
Broadcast Variable
Accumulator Variable
DAG Scheduler
Data Locality
Checkpointing
Speculative Execution
Garbage Collection

Project Planning, Monitoring & Troubleshooting

Master – Driver Node capacity
Slave – Worker Node capacity
Executor capacity
Executor core capacity
Project scenario and execution
Out-of-memory error handling
Master logs, Worker logs, Driver logs
Monitoring Web UI
Heap memory dump
