Overview
This Data Engineering on Google Cloud Platform training course teaches attendees how to design data processing systems, build end-to-end data pipelines, analyze data, and apply machine learning.
Objectives
At the end of this Google Data Engineer training course, participants will be able to:
- Design and build data processing systems on Google Cloud
- Process batch and streaming data by implementing autoscaling data pipelines on Cloud Dataflow
- Derive business insights from extremely large datasets using Google BigQuery
- Train, evaluate, and make predictions with machine learning models using TensorFlow and Cloud ML
- Leverage unstructured data using Spark and ML APIs on Cloud Dataproc
- Enable instant insights from streaming data
Prerequisites
- Basic proficiency with a common query language such as SQL
- Experience with data modeling and extract, transform, load (ETL) activities
- Experience developing applications using a common programming language such as Python
- Familiarity with Machine Learning and/or statistics
Course Outline
Google Cloud Dataproc Overview
- Creating and managing clusters
- Leveraging custom machine types and preemptible worker nodes
- Scaling and deleting clusters
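The cluster topics above map directly onto the Dataproc API. Below is a minimal sketch using the google-cloud-dataproc Python client; the project ID, region, cluster name, machine types, and preemptible secondary-worker count are placeholder values, not part of the course material.

```python
from google.cloud import dataproc_v1

# Hypothetical values for illustration only.
project_id, region = "my-project", "us-central1"

cluster_client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": project_id,
    "cluster_name": "training-cluster",
    "config": {
        # Custom machine types are chosen per instance group.
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
        # Preemptible (secondary) workers add cheap, interruptible capacity.
        "secondary_worker_config": {"num_instances": 2, "is_preemptible": True},
    },
}

operation = cluster_client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
print("Cluster created:", operation.result().cluster_name)
```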
Running Dataproc Jobs
- Running Pig and Hive jobs
- Separation of storage and compute
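As a companion to the job-running topics above, here is a hedged sketch of submitting a Hive job through the same Python client; the cluster and project names are again placeholders.

```python
from google.cloud import dataproc_v1

project_id, region = "my-project", "us-central1"   # hypothetical values

job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# Because storage is separated from compute, the query can read data kept in
# Cloud Storage or BigQuery rather than only the cluster's local HDFS.
job = {
    "placement": {"cluster_name": "training-cluster"},
    "hive_job": {"query_list": {"queries": ["SHOW DATABASES;"]}},
}

operation = job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
)
print("Job state:", operation.result().status.state.name)
```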
Integrating Dataproc with Google Cloud Platform
- Customizing clusters with initialization actions
- BigQuery support
Making Sense of Unstructured Data with Google’s Machine Learning APIs
- Google’s Machine Learning APIs
- Common ML Use Cases
- Invoking ML APIs
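To make "Invoking ML APIs" concrete, here is a small sketch that calls the Cloud Vision API's label detection from Python; the bucket path is a placeholder, and the snippet assumes the google-cloud-vision client library and application-default credentials.

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Hypothetical image stored in Cloud Storage.
image = vision.Image(source=vision.ImageSource(image_uri="gs://my-bucket/photo.jpg"))

# Pre-trained model: no training required, just an API call.
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(f"{label.description}: {label.score:.2f}")
```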
Serverless Data Analysis with Google BigQuery and Cloud Dataflow
Serverless Data Analysis with BigQuery
- What is BigQuery?
- Queries and Functions
- Loading data into BigQuery
- Exporting data from BigQuery
- Nested and repeated fields
- Querying multiple tables
- Performance and pricing
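The BigQuery topics above can be tried end to end with the Python client. The sketch below runs a standard SQL query against a public dataset; only a default project and application-default credentials are assumed.

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials and project

# Aggregate a public dataset; BigQuery bills by the bytes the query scans.
query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""

for row in client.query(query).result():
    print(row.name, row.total)
```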
Serverless, Autoscaling Data Pipelines with Dataflow
- The Beam programming model
- Data pipelines in Beam Python
- Data pipelines in Beam Java
- Scalable Big Data processing using Beam
- Incorporating additional data
- Handling stream data
- GCP Reference architecture
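As a minimal illustration of the Beam programming model listed above, the sketch below is a Python word-count pipeline; the Cloud Storage paths are placeholders, and switching the runner option to DataflowRunner (with project, region, and temp_location set) would run the same code as an autoscaling Dataflow job.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Defaults to the local DirectRunner; pass --runner=DataflowRunner plus
# --project, --region and --temp_location to execute on Cloud Dataflow.
options = PipelineOptions()

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.txt")    # placeholder path
        | "Split" >> beam.FlatMap(lambda line: line.split())
        | "Count" >> beam.combiners.Count.PerElement()
        | "Format" >> beam.MapTuple(lambda word, count: f"{word},{count}")
        | "Write" >> beam.io.WriteToText("gs://my-bucket/output/counts")  # placeholder path
    )
```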
Serverless Machine Learning with TensorFlow on Google Cloud Platform
Getting Started with Machine Learning
- What is machine learning (ML)?
- Effective ML: concepts, types
- ML datasets: generalization
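A one-screen sketch of the generalization idea above, using only NumPy: hold out data the model never trains on and report error there, since training error alone overstates quality. The data here is synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((1000, 1))
y = 3 * x[:, 0] + 1 + 0.1 * rng.standard_normal(1000)

split = int(0.8 * len(x))                 # 80% train, 20% held-out evaluation
x_train, y_train = x[:split], y[:split]
x_eval, y_eval = x[split:], y[split:]

# Fit a simple linear model with least squares (no ML framework needed).
X = np.hstack([x_train, np.ones((len(x_train), 1))])
w, *_ = np.linalg.lstsq(X, y_train, rcond=None)

def rmse(x_part, y_part):
    pred = x_part[:, 0] * w[0] + w[1]
    return float(np.sqrt(np.mean((pred - y_part) ** 2)))

print("train RMSE:", rmse(x_train, y_train))
print("eval  RMSE:", rmse(x_eval, y_eval))   # this number indicates generalization
```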
Building ML Models with TensorFlow
- Getting started with TensorFlow
- TensorFlow graphs and loops + lab
- Monitoring ML training
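The course works in TensorFlow's graph and Estimator style; as a hedged, framework-level illustration of building and monitoring a model, here is the same idea in a few lines of Keras on synthetic data.

```python
import numpy as np
import tensorflow as tf

# Synthetic regression data: learn y = 3x + 1 (illustrative only).
x = np.random.rand(1024, 1).astype("float32")
y = 3 * x + 1 + 0.05 * np.random.randn(1024, 1).astype("float32")

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
model.compile(optimizer="sgd", loss="mse")

# validation_split provides a held-out loss to monitor during training.
model.fit(x, y, epochs=5, validation_split=0.2, verbose=2)
print(model.predict(np.array([[2.0]], dtype="float32")))
```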
Scaling ML Models with Cloud ML
- Why Cloud ML?
- Packaging up a TensorFlow model
- End-to-end training
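Packaging a model for Cloud ML means wrapping the training code in a small Python package with a command-line entry point. The layout and flag names below (a trainer/task.py with a --job-dir argument) are a hypothetical sketch of that structure, not the course's exact lab code.

```python
# trainer/task.py -- hypothetical entry point; the package would also contain
# setup.py and trainer/__init__.py so the training service can install and run it.
import argparse

import numpy as np
import tensorflow as tf


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--job-dir", required=True)  # output location passed by the service
    parser.add_argument("--epochs", type=int, default=5)
    args = parser.parse_args()

    # Toy data stands in for reading training files from Cloud Storage.
    x = np.random.rand(256, 1).astype("float32")
    y = 2 * x + 0.1 * np.random.randn(256, 1).astype("float32")

    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
    model.compile(optimizer="sgd", loss="mse")
    model.fit(x, y, epochs=args.epochs, verbose=2)

    # Export under --job-dir so the trained model can be deployed for prediction.
    model.save(f"{args.job_dir}/model")


if __name__ == "__main__":
    main()
```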
Feature Engineering
- Creating good features
- Transforming inputs
- Synthetic features
- Preprocessing with Cloud ML
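The feature-engineering topics above boil down to transforming raw inputs into more predictive columns. Below is a tiny pandas sketch of two such transforms, a synthetic ratio feature and a bucketized feature; the data is made up for illustration.

```python
import pandas as pd

# Hypothetical housing-style rows.
df = pd.DataFrame({
    "total_rooms": [5612, 7650, 720],
    "population": [1015, 1129, 333],
    "latitude": [34.19, 34.40, 33.69],
})

# Synthetic feature: a ratio is often more predictive than either raw column.
df["rooms_per_person"] = df["total_rooms"] / df["population"]

# Bucketized feature: turn a continuous value into coarse categories.
df["lat_bucket"] = pd.cut(df["latitude"], bins=[32, 33, 34, 35],
                          labels=["32-33", "33-34", "34-35"])
print(df)
```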
Building Resilient Streaming Systems on Google Cloud Platform
Architecture of Streaming Analytics Pipelines
- Stream data processing: Challenges
- Handling variable data volumes
- Dealing with unordered/late data
Ingesting Variable Volumes
- What is Cloud Pub/Sub?
- How it works: Topics and Subscriptions
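To ground the topic/subscription model above, here is a short sketch that publishes a message with the google-cloud-pubsub client; the project, topic, and attribute names are placeholders.

```python
from google.cloud import pubsub_v1

project_id, topic_id = "my-project", "sensor-readings"  # hypothetical names

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, topic_id)

# The message body is bytes; extra keyword arguments become string attributes.
future = publisher.publish(topic_path, b"temp=21.5", sensor="s-42")
print("Published message ID:", future.result())
```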
Implementing Streaming Pipelines
- Challenges in stream processing
- Handling late data: watermarks, triggers, accumulation
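The watermark/trigger/accumulation vocabulary above maps onto Beam's windowing API. The sketch below is a hedged example on bounded toy data: 60-second fixed windows, an on-time pane fired at the watermark, late firings within a bounded lateness, and accumulating panes.

```python
import apache_beam as beam
from apache_beam import window
from apache_beam.transforms import trigger

with beam.Pipeline() as p:
    (
        p
        | beam.Create([("click", 5.0), ("click", 63.0), ("view", 64.0)])  # (event, event-time seconds)
        | beam.MapTuple(lambda event, ts: window.TimestampedValue(event, ts))
        | beam.WindowInto(
            window.FixedWindows(60),                        # 60-second event-time windows
            trigger=trigger.AfterWatermark(                 # fire when the watermark passes...
                late=trigger.AfterProcessingTime(30)),      # ...and again for late data
            accumulation_mode=trigger.AccumulationMode.ACCUMULATING,
            allowed_lateness=300,                           # seconds of lateness to tolerate
        )
        | beam.combiners.Count.PerElement()
        | beam.Map(print)
    )
```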
Streaming Analytics and Dashboards
- Streaming analytics: from data to decisions
- Querying streaming data with BigQuery
- What is Google Data Studio?
High Throughput and Low Latency with Bigtable
- What is Cloud Bigtable?
- Designing Bigtable schema
- Ingesting into Bigtable
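Finally, a brief sketch of ingesting a row into Cloud Bigtable with the Python client, tying together the schema and ingestion topics above; the instance, table, column family, and row-key layout are hypothetical.

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project")            # hypothetical project
table = client.instance("my-instance").table("device-events")

# Row keys are designed around the read pattern, e.g. device id plus timestamp.
row = table.direct_row(b"device#42#20240101T120000")
row.set_cell("stats", "temperature", b"21.5")             # assumes a "stats" column family
row.commit()
```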