Overview
This Data Engineering on Google Cloud Platform training course teaches attendees how to design data processing systems, build end-to-end data pipelines, analyze data, and apply machine learning.
Objectives
At the end of this Google Data Engineer training course, participants will be able to:
- Design and build data processing systems on Google Cloud
- Process batch and streaming data by implementing autoscaling data pipelines on Cloud Dataflow
- Derive business insights from extremely large datasets using Google BigQuery
- Train, evaluate, and make predictions with machine learning models using TensorFlow and Cloud ML
- Leverage unstructured data using Spark and ML APIs on Cloud Dataproc
- Enable instant insights from streaming data
Prerequisites
- Basic proficiency with a common query language such as SQL
- Experience with data modeling and extract, transform, load (ETL) activities
- Experience developing applications using a common programming language such as Python
- Familiarity with Machine Learning and/or statistics
Course Outline
Google Cloud Dataproc Overview
- Creating and managing clusters
- Leveraging custom machine types and preemptible worker nodes
- Scaling and deleting clusters
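The cluster topics above map directly onto the Dataproc API. Below is a minimal sketch using the google-cloud-dataproc Python client; the project ID, region, cluster name, machine types, and preemptible secondary-worker count are placeholder values, not part of the course material.

```python
from google.cloud import dataproc_v1

# Hypothetical values for illustration only.
project_id, region = "my-project", "us-central1"

cluster_client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": project_id,
    "cluster_name": "training-cluster",
    "config": {
        # Custom machine types are chosen per instance group.
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
        # Preemptible (secondary) workers add cheap, interruptible capacity.
        "secondary_worker_config": {"num_instances": 2, "is_preemptible": True},
    },
}

operation = cluster_client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
print("Cluster created:", operation.result().cluster_name)
```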
Running Dataproc Jobs
- Running Pig and Hive jobs
- Separation of storage and compute
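As a companion to the job-running topics above, here is a hedged sketch of submitting a Hive job through the same Python client; the cluster and project names are again placeholders.

```python
from google.cloud import dataproc_v1

project_id, region = "my-project", "us-central1"   # hypothetical values

job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# Because storage is separated from compute, the query can read data kept in
# Cloud Storage or BigQuery rather than only the cluster's local HDFS.
job = {
    "placement": {"cluster_name": "training-cluster"},
    "hive_job": {"query_list": {"queries": ["SHOW DATABASES;"]}},
}

operation = job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
)
print("Job state:", operation.result().status.state.name)
```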
Integrating Dataproc with Google Cloud Platform
- Customizing clusters with initialization actions
- BigQuery support
Making Sense of Unstructured Data with Google’s Machine Learning APIs
- Google’s Machine Learning APIs
- Common ML Use Cases
- Invoking ML APIs
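To make "Invoking ML APIs" concrete, here is a small sketch that calls the Cloud Vision API's label detection from Python; the bucket path is a placeholder, and the snippet assumes the google-cloud-vision client library and application-default credentials.

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Hypothetical image stored in Cloud Storage.
image = vision.Image(source=vision.ImageSource(image_uri="gs://my-bucket/photo.jpg"))

# Pre-trained model: no training required, just an API call.
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(f"{label.description}: {label.score:.2f}")
```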
Serverless Data Analysis with Google BigQuery and Cloud Dataflow
Serverless Data Analysis with BigQuery
- What is BigQuery?
- Queries and Functions
- Loading data into BigQuery
- Exporting data from BigQuery
- Nested and repeated fields
- Querying multiple tables
- Performance and pricing
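The BigQuery topics above can be tried end to end with the Python client. The sketch below runs a standard SQL query against a public dataset; only a default project and application-default credentials are assumed.

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials and project

# Aggregate a public dataset; BigQuery bills by the bytes the query scans.
query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""

for row in client.query(query).result():
    print(row.name, row.total)
```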
Serverless, Autoscaling Data Pipelines with Dataflow
- The Beam programming model
- Data pipelines in Beam Python
- Data pipelines in Beam Java
- Scalable Big Data processing using Beam
- Incorporating additional data
- Handling stream data
- GCP Reference architecture
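As a minimal illustration of the Beam programming model listed above, the sketch below is a Python word-count pipeline; the Cloud Storage paths are placeholders, and switching the runner option to DataflowRunner (with project, region, and temp_location set) would run the same code as an autoscaling Dataflow job.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Defaults to the local DirectRunner; pass --runner=DataflowRunner plus
# --project, --region and --temp_location to execute on Cloud Dataflow.
options = PipelineOptions()

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.txt")    # placeholder path
        | "Split" >> beam.FlatMap(lambda line: line.split())
        | "Count" >> beam.combiners.Count.PerElement()
        | "Format" >> beam.MapTuple(lambda word, count: f"{word},{count}")
        | "Write" >> beam.io.WriteToText("gs://my-bucket/output/counts")  # placeholder path
    )
```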
Serverless Machine Learning with TensorFlow on Google Cloud Platform
Getting Started with Machine Learning
- What is machine learning (ML)?
- Effective ML: concepts, types
- ML datasets: generalization
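A one-screen sketch of the generalization idea above, using only NumPy: hold out data the model never trains on and report error there, since training error alone overstates quality. The data here is synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((1000, 1))
y = 3 * x[:, 0] + 1 + 0.1 * rng.standard_normal(1000)

split = int(0.8 * len(x))                 # 80% train, 20% held-out evaluation
x_train, y_train = x[:split], y[:split]
x_eval, y_eval = x[split:], y[split:]

# Fit a simple linear model with least squares (no ML framework needed).
X = np.hstack([x_train, np.ones((len(x_train), 1))])
w, *_ = np.linalg.lstsq(X, y_train, rcond=None)

def rmse(x_part, y_part):
    pred = x_part[:, 0] * w[0] + w[1]
    return float(np.sqrt(np.mean((pred - y_part) ** 2)))

print("train RMSE:", rmse(x_train, y_train))
print("eval  RMSE:", rmse(x_eval, y_eval))   # this number indicates generalization
```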
Building ML Models with TensorFlow
- Getting started with TensorFlow
- TensorFlow graphs and loops + lab
- Monitoring ML training
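The course works in TensorFlow's graph and Estimator style; as a hedged, framework-level illustration of building and monitoring a model, here is the same idea in a few lines of Keras on synthetic data.

```python
import numpy as np
import tensorflow as tf

# Synthetic regression data: learn y = 3x + 1 (illustrative only).
x = np.random.rand(1024, 1).astype("float32")
y = 3 * x + 1 + 0.05 * np.random.randn(1024, 1).astype("float32")

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
model.compile(optimizer="sgd", loss="mse")

# validation_split provides a held-out loss to monitor during training.
model.fit(x, y, epochs=5, validation_split=0.2, verbose=2)
print(model.predict(np.array([[2.0]], dtype="float32")))
```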
Scaling ML Models with Cloud ML
- Why Cloud ML?
- Packaging up a TensorFlow model
- End-to-end training
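Packaging a model for Cloud ML means wrapping the training code in a small Python package with a command-line entry point. The layout and flag names below (a trainer/task.py with a --job-dir argument) are a hypothetical sketch of that structure, not the course's exact lab code.

```python
# trainer/task.py -- hypothetical entry point; the package would also contain
# setup.py and trainer/__init__.py so the training service can install and run it.
import argparse

import numpy as np
import tensorflow as tf


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--job-dir", required=True)  # output location passed by the service
    parser.add_argument("--epochs", type=int, default=5)
    args = parser.parse_args()

    # Toy data stands in for reading training files from Cloud Storage.
    x = np.random.rand(256, 1).astype("float32")
    y = 2 * x + 0.1 * np.random.randn(256, 1).astype("float32")

    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
    model.compile(optimizer="sgd", loss="mse")
    model.fit(x, y, epochs=args.epochs, verbose=2)

    # Export under --job-dir so the trained model can be deployed for prediction.
    model.save(f"{args.job_dir}/model")


if __name__ == "__main__":
    main()
```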
Feature Engineering
- Creating good features
- Transforming inputs
- Synthetic features
- Preprocessing with Cloud ML
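The feature-engineering topics above boil down to transforming raw inputs into more predictive columns. Below is a tiny pandas sketch of two such transforms, a synthetic ratio feature and a bucketized feature; the data is made up for illustration.

```python
import pandas as pd

# Hypothetical housing-style rows.
df = pd.DataFrame({
    "total_rooms": [5612, 7650, 720],
    "population": [1015, 1129, 333],
    "latitude": [34.19, 34.40, 33.69],
})

# Synthetic feature: a ratio is often more predictive than either raw column.
df["rooms_per_person"] = df["total_rooms"] / df["population"]

# Bucketized feature: turn a continuous value into coarse categories.
df["lat_bucket"] = pd.cut(df["latitude"], bins=[32, 33, 34, 35],
                          labels=["32-33", "33-34", "34-35"])
print(df)
```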
Building Resilient Streaming Systems on Google Cloud Platform
Architecture of Streaming Analytics Pipelines
- Stream data processing: Challenges
- Handling variable data volumes
- Dealing with unordered/late data
Ingesting Variable Volumes
- What is Cloud Pub/Sub?
- How it works: Topics and Subscriptions
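To ground the topic/subscription model above, here is a short sketch that publishes a message with the google-cloud-pubsub client; the project, topic, and attribute names are placeholders.

```python
from google.cloud import pubsub_v1

project_id, topic_id = "my-project", "sensor-readings"  # hypothetical names

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, topic_id)

# The message body is bytes; extra keyword arguments become string attributes.
future = publisher.publish(topic_path, b"temp=21.5", sensor="s-42")
print("Published message ID:", future.result())
```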
Implementing Streaming Pipelines
- Challenges in stream processing
- Handling late data: watermarks, triggers, accumulation
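The watermark/trigger/accumulation vocabulary above maps onto Beam's windowing API. The sketch below is a hedged example on bounded toy data: 60-second fixed windows, an on-time pane fired at the watermark, late firings within a bounded lateness, and accumulating panes.

```python
import apache_beam as beam
from apache_beam import window
from apache_beam.transforms import trigger

with beam.Pipeline() as p:
    (
        p
        | beam.Create([("click", 5.0), ("click", 63.0), ("view", 64.0)])  # (event, event-time seconds)
        | beam.MapTuple(lambda event, ts: window.TimestampedValue(event, ts))
        | beam.WindowInto(
            window.FixedWindows(60),                        # 60-second event-time windows
            trigger=trigger.AfterWatermark(                 # fire when the watermark passes...
                late=trigger.AfterProcessingTime(30)),      # ...and again for late data
            accumulation_mode=trigger.AccumulationMode.ACCUMULATING,
            allowed_lateness=300,                           # seconds of lateness to tolerate
        )
        | beam.combiners.Count.PerElement()
        | beam.Map(print)
    )
```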
Streaming Analytics and Dashboards
- Streaming analytics: from data to decisions
- Querying streaming data with BigQuery
- What is Google Data Studio?
High Throughput and Low Latency with Bigtable
- What is Cloud Bigtable?
- Designing Bigtable schema
- Ingesting into Bigtable
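Finally, a brief sketch of ingesting a row into Cloud Bigtable with the Python client, tying together the schema and ingestion topics above; the instance, table, column family, and row-key layout are hypothetical.

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project")            # hypothetical project
table = client.instance("my-instance").table("device-events")

# Row keys are designed around the read pattern, e.g. device id plus timestamp.
row = table.direct_row(b"device#42#20240101T120000")
row.set_cell("stats", "temperature", b"21.5")             # assumes a "stats" column family
row.commit()
```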