Apache Spark and Scala

Live Online (VILT) & Classroom Corporate Training Course

Apache Spark is a big data processing framework whose popularity comes from being fast, easy to use, and capable of sophisticated data analysis. Its built-in modules for streaming, machine learning, SQL, and graph processing make it useful across diverse industries.

  • CloudLabs
  • Projects
  • Assignments
  • 24x7 Support
  • Lifetime Access

Overview

The Apache Spark and Scala course is designed to help you become proficient in Apache Spark development. You will learn topics such as Spark Core, the motivation for Apache Spark, Spark internals, RDDs, Spark SQL, Spark Streaming, MLlib, and GraphX, which are the key components of the course.

Objectives

By the end of the Apache Spark & Scala training course, participants will:

  • Master the concepts of the Apache Spark framework
  • Understand Spark internals and RDDs, and use Spark’s API and Scala functions to create and transform RDDs
  • Master RDD combiners, Spark SQL, SparkContext, Spark Streaming, MLlib, and GraphX

Prerequisites

Hadoop Basics

Course Outline

Introduction
  • Overview of Hadoop
  • Architecture of HDFS & YARN
  • Overview of Spark version 2.2.0
  • Spark Architecture
  • Spark Components
  • Comparison of Spark & Hadoop
  • Installation of Spark 2.2.0 on 64-bit Linux
Spark Core
  • Exploring the Spark shell
  • Creating a SparkContext
  • Operations on Resilient Distributed Datasets (RDDs)
  • Transformations & Actions
  • Loading and Saving Data
Spark SQL & Hive SQL
  • Introduction to SQL Operations
  • SQLContext
  • DataFrames
  • Working with Hive
  • Loading Partitioned Tables
  • Processing CSV, JSON, and Parquet files
Scala Programming
  • Introduction to Scala
  • Features of Scala
  • Scala vs Java Comparison
  • Data types
  • Data structures
  • Arrays
  • Literals
  • Logical Operators
  • Mutable & Immutable variables
  • Type inference
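The basics listed above (immutable `val` vs mutable `var`, type inference, arrays, literals, and logical operators) can be illustrated with a short standalone sketch. The object name `ScalaBasics`, the helper `isValidScore`, and the sample values are hypothetical, chosen only for illustration; the code uses only the Scala standard library and should compile with Scala 2.12+ or Scala 3.

```scala
object ScalaBasics {
  // Type inference: the compiler infers Double here, no annotation needed
  val ratio = 3.0 / 4

  // An explicit type annotation is optional but allowed
  val greeting: String = "Hello, Spark"

  // Logical operators && (and) and ! (not) on Boolean values
  def isValidScore(score: Int): Boolean = score >= 0 && !(score > 100)

  def main(args: Array[String]): Unit = {
    // `val` is an immutable binding: `greeting = "Hi"` would not compile
    // `var` is a mutable variable that may be reassigned
    var counter = 0
    counter += 1

    // Arrays have a fixed size, but their elements are mutable
    val scores = Array(90, 75, 60)
    scores(2) = 65

    // Literals: Long (100L); adding an Int to a Long yields a Long
    val total: Long = 100L + scores.sum
    println(s"$greeting counter=$counter ratio=$ratio total=$total")
    println(scores.map(isValidScore).mkString(","))
  }
}
```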
Scala Functions
  • OOP vs Functional Programming
  • Anonymous functions
  • Recursive functions
  • Call-by-name parameters
  • Currying
  • Conditional statements
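The function topics above can be sketched in a few lines of plain Scala: an anonymous function, a tail-recursive function, a call-by-name parameter, a curried function, and `if` used as an expression. All names (`ScalaFunctions`, `square`, `factorial`, `evalTwice`, `add`, `addTen`, `sign`) are hypothetical examples, not part of any library.

```scala
import scala.annotation.tailrec

object ScalaFunctions {
  // Anonymous function (lambda) assigned to a value
  val square: Int => Int = x => x * x

  // Recursive function; @tailrec makes the compiler verify tail recursion
  @tailrec
  def factorial(n: Int, acc: BigInt = BigInt(1)): BigInt =
    if (n <= 1) acc else factorial(n - 1, acc * BigInt(n))

  // Call-by-name parameter: `=> Int` is re-evaluated each time it is used,
  // so the argument expression is evaluated twice here
  def evalTwice(expr: => Int): Int = expr + expr

  // Curried function: parameters are supplied in separate lists
  def add(a: Int)(b: Int): Int = a + b
  val addTen: Int => Int = add(10)

  // Conditional: `if` is an expression that returns a value
  def sign(n: Int): String =
    if (n > 0) "positive" else if (n < 0) "negative" else "zero"

  def main(args: Array[String]): Unit = {
    var calls = 0
    val result = evalTwice { calls += 1; calls }
    println(s"square(4)=${square(4)} factorial(5)=${factorial(5)}")
    println(s"evalTwice result=$result, argument evaluated $calls times")
    println(s"addTen(5)=${addTen(5)} sign(-3)=${sign(-3)}")
  }
}
```

Because the argument to `evalTwice` is passed by name, the block `{ calls += 1; calls }` runs twice, so `main` reports a result of 3 with the argument evaluated 2 times; a by-value parameter would evaluate it only once.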
Scala Collections
  • List
  • Map
  • Sets
  • Options
  • Tuples
  • Mutable collection
  • Immutable collection
  • Iterating
  • Filtering and counting
  • Group By
  • Flat Map
  • Word count
  • File Access
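Several of the collection topics above come together in the classic word-count exercise. The following is a minimal sketch using only the Scala standard library: `flatMap` splits lines into words, `filter` drops empty tokens, `groupBy` groups identical words, and each group's size gives the count. The object name `WordCount`, the sample text, and the optional input-file argument are hypothetical; the course's own exercise may differ.

```scala
import scala.collection.mutable
import scala.io.Source

object WordCount {
  // flatMap: split each line into lowercase words; filter: drop empty tokens;
  // groupBy: collect identical words; map: turn each group into its size
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\W+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    // File access: read lines from the (hypothetical) path given as the first argument
    val lines: Seq[String] =
      if (args.nonEmpty) {
        val source = Source.fromFile(args(0))
        try source.getLines().toList finally source.close()
      } else List("to be or not to be", "that is the question")

    val counts = countWords(lines)

    // Option: Map.get returns Some(value) or None instead of null
    val beCount: Option[Int] = counts.get("be")
    println(s"'be' appears ${beCount.getOrElse(0)} times")

    // Tuples: sort (word, count) pairs by descending count, then by word
    val top = counts.toList.sortBy { case (w, c) => (-c, w) }.take(3)
    top.foreach { case (w, c) => println(s"$w\t$c") }

    // Mutable collection: a ListBuffer is appended to in place
    val longWords = mutable.ListBuffer[String]()
    for (w <- counts.keys if w.length > 3) longWords += w
    println(longWords.sorted.mkString(", "))
  }
}
```

Run it with a file path argument to count the words in that file; without one, it falls back to the two built-in sample lines.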
Scala Object-Oriented Programming
  • Classes, Objects & Properties
  • Inheritance
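A minimal sketch of classes, objects, properties, and inheritance in Scala follows. The `Node`, `DriverNode`, `WorkerNode`, and `Cluster` types are hypothetical, loosely named after the driver and worker nodes covered later in the course; they are not part of the Spark API.

```scala
object OopDemo {
  // Abstract base class: `host` and `cores` are constructor properties (public vals)
  abstract class Node(val host: String, val cores: Int) {
    def role: String // abstract member, implemented by subclasses
    def describe: String = s"$role on $host with $cores cores"
  }

  // Subclasses inherit the properties and implement `role`
  class DriverNode(host: String, cores: Int) extends Node(host, cores) {
    def role: String = "driver"
  }

  class WorkerNode(host: String, cores: Int, val executors: Int) extends Node(host, cores) {
    def role: String = "worker"
    // Override the inherited method to include the extra property
    override def describe: String = s"${super.describe}, $executors executors"
  }

  // Singleton object holding a utility method
  object Cluster {
    def totalCores(nodes: Seq[Node]): Int = nodes.map(_.cores).sum
  }

  def main(args: Array[String]): Unit = {
    val nodes = List(new DriverNode("master-1", 4), new WorkerNode("worker-1", 8, 2))
    nodes.foreach(n => println(n.describe))
    println(s"Total cores: ${Cluster.totalCores(nodes)}")
  }
}
```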
Spark Submit
  • Maven build tool implementation
  • Build Libraries
  • Create JAR files
  • spark-submit
Spark Streaming
  • Overview of Spark Streaming
  • Architecture of Spark Streaming
  • File streaming
  • Twitter Streaming
Kafka Streaming
  • Overview of Kafka Streaming
  • Architecture of Kafka Streaming
  • Kafka Installation
  • Topic
  • Producer
  • Consumer
  • File streaming
  • Twitter Streaming
Spark MLlib
  • Overview of Machine Learning Algorithms
  • Linear Regression
  • Logistic Regression
Spark GraphX
  • GraphX overview
  • Vertices
  • Edges
  • Triplets
  • PageRank
  • Pregel
Performance Tuning
  • On-heap and off-heap memory tuning
  • Kryo Serialization
  • Broadcast Variables
  • Accumulators
  • DAG Scheduler
  • Data Locality
  • Checkpointing
  • Speculative Execution
  • Garbage Collection
Project Planning, Monitoring & Troubleshooting
  • Master (driver) node capacity
  • Slave (worker) node capacity
  • Executor capacity
  • Executor core capacity
  • Project scenario and execution
  • Out-of-memory error handling
  • Master logs, worker logs, driver logs
  • Monitoring Web UI
  • Heap memory dump