Apache Spark and Scala

Live Online (VILT) & Classroom Corporate Training Course

Apache Spark is a big data processing framework. It is popular because it is fast, easy to use, and offers sophisticated tools for data analysis. Its built-in modules for streaming, machine learning, SQL, and graph processing make it useful across diverse industries.

  • CloudLabs

  • Projects

  • Assignments

  • 24x7 Support

  • Lifetime Access

The Apache Spark and Scala course is designed to help you become proficient in Apache Spark development. It covers the key constituents of Apache Spark: the motivation for Apache Spark, Spark Core, Spark internals, RDDs, Spark SQL, Spark Streaming, MLlib, and GraphX.


At the end of the Apache Spark & Scala training course, participants will:

  • Master the concepts of the Apache Spark framework
  • Understand Spark internals and RDDs, and use Spark’s API and Scala functions to create and transform RDDs
  • Master RDD combiners, Spark SQL, SparkContext, Spark Streaming, MLlib, and GraphX


Hadoop Basics

Course Outline

  • Overview of Hadoop
  • Architecture of HDFS & YARN
  • Overview of Spark version 2.2.0
  • Spark Architecture
  • Spark Components
  • Comparison of Spark & Hadoop
  • Installation of Spark v2.2.0 on 64-bit Linux
Spark Core
  • Exploring the Spark shell
  • Creating Spark Context
  • Operations on Resilient Distributed Dataset – RDD
  • Transformations & Actions
  • Loading Data and Saving Data
Spark SQL & Hive SQL
  • Introduction to SQL Operations
  • SQL Context
  • Data Frame
  • Working with Hive
  • Loading Partitioned Tables
  • Processing CSV, JSON, and Parquet files
Scala Programming
  • Introduction to Scala
  • Features of Scala
  • Scala vs Java Comparison
  • Data types
  • Data Structure
  • Arrays
  • Literals
  • Logical Operators
  • Mutable & Immutable variables
  • Type inference
Scala Functions
  • OOP vs Functions
  • Anonymous
  • Recursive
  • Call-by-name
  • Currying
  • Conditional statement
Scala Collections
  • List
  • Map
  • Sets
  • Options
  • Tuples
  • Mutable collection
  • Immutable collection
  • Iterating
  • Filtering and counting
  • Group By
  • Flat Map
  • Word count
  • File Access
Scala Object Oriented Programming
  • Classes, Objects & Properties
  • Inheritance
Spark Submit
  • Maven build tool implementation
  • Build Libraries
  • Create JAR files
  • Spark-Submit
Spark Streaming
  • Overview of Spark Streaming
  • Architecture of Spark Streaming
  • File streaming
  • Twitter Streaming
Kafka Streaming
  • Overview of Kafka Streaming
  • Architecture of Kafka Streaming
  • Kafka Installation
  • Topic
  • Producer
  • Consumer
  • File streaming
  • Twitter Streaming
Spark MLlib
  • Overview of Machine Learning Algorithms
  • Linear Regression
  • Logistic Regression
Spark GraphX
  • GraphX overview
  • Vertices
  • Edges
  • Triplets
  • PageRank
  • Pregel
Performance Tuning
  • On-heap and off-heap memory tuning
  • Kryo Serialization
  • Broadcast Variable
  • Accumulator Variable
  • DAG Scheduler
  • Data Locality
  • Checkpointing
  • Speculative Execution
  • Garbage Collection
Project Planning, Monitoring & Troubleshooting
  • Master – Driver Node capacity
  • Slave – Worker Node capacity
  • Executor capacity
  • Executor core capacity
  • Project scenario and execution
  • Out-of-memory error handling
  • Master logs, Worker logs, Driver logs
  • Monitoring Web UI
  • Heap memory dump