Data Science with Python

Overview

This Data Science with Python training course teaches engineers, data scientists, statisticians, and other quantitative professionals the Python programming skills they need to analyze and chart data.

Objectives

At the end of the Data Science with Python training course, participants will be able to:

Understand the differences between Python's basic data types
Know when to use different Python collections
Implement Python functions
Understand control flow constructs in Python
Handle errors via exception handling constructs
Quantitatively define an answerable, actionable question
Import both structured and unstructured data into Python
Parse unstructured data into structured formats
Understand the differences between NumPy arrays and pandas dataframes
Understand where Python fits in the Python/Hadoop/Spark ecosystem
Simulate data through random number generation
Understand mechanisms for missing data and analytic implications
Explore and clean data
Create compelling graphics to reveal analytic results
Reshape and merge data to prepare for advanced analytics
Test for group differences using inferential statistics
Implement linear regression from a frequentist perspective
Understand non-linear terms, confounding, and interaction in linear regression
Extend to logistic regression to model binary outcomes
Understand the difference between machine learning and frequentist approaches to statistics
Implement classification and regression models using machine learning
Score new datasets, evaluate model fit, and quantify variable importance

Prerequisites

All attendees should have prior programming experience and an understanding of basic statistics.

Course Outline

Base Python Introduction

History and current use
- Installing the Software
- Python Distributions
String Literals and numeric objects
Collections (lists, tuples, dicts)
Datetime classes in Python
Memory Management in Python
Control Flow
Functions
Exception Handling

Defining Actionable, Analytic Questions

Defining the quantitative construct used to make inferences about the question
Identifying the data needed to support the constructs
Identifying limitations to the data and analytic approach
Constructing sensitivity analyses

Bringing Data In

Structured Data
- Structured Text Files
- Excel workbooks
- SQL databases
Working with Unstructured Text Data
- Reading Unstructured Text
- Introduction to Natural Language Processing with Python

NumPy: Matrix Language

Introduction to the ndarray
NumPy operations
Broadcasting
Missing data in NumPy (masked array)
NumPy Structured arrays
Random number generation

Data Preparation with Pandas

Filtering
Creating and deleting variables
Discretization of Continuous Data
Scaling and standardizing data
Identifying Duplicates
Dummy Coding
Combining Datasets
Transposing Data
Long to wide and back

Exploratory Data Analysis with Pandas

Univariate Statistical Summaries and Detecting Outliers
Multivariate Statistical Summaries and Outlier Detection
Group-wise calculations using Pandas
Pivot Tables

Exploring Data Graphically

Histogram
Box-and-whiskers plot
Scatter plots
Forest Plots
Group-by plotting

Advanced Graphing with Matplotlib, Pandas, and Seaborn

Python, Hadoop and Spark

Introduction to the differences between Python, Hadoop, and Spark
Importing data from Spark and Hadoop to Python
Parallel execution leveraging Spark or Hadoop

Missing Data

Exploring and understanding patterns in missing data
Missing at Random
Missing Not at Random
Missing Completely at Random
Data imputation methods

Traditional Inferential Statistics

Comparing Groups
- P-Values, summary statistics, sufficient statistics, inferential targets
- T-Tests (equal and unequal variances)
- ANOVA
- Chi-Square Tests
Correlation

Frequentist Approaches to Multivariate Statistics

Linear Regression
- Multivariate linear regression
- Capturing Non-linear Relationships
- Comparing Model Fits
- Scoring new data
- Poisson Regression Extension
Logistic regression
- Logistic Regression Example
- Classification Metrics

Machine Learning Approaches to Multivariate Statistics

Machine Learning Theory
Data pre-processing
- Missing Data
- Dummy Coding
- Standardization
- Training/Test data
Supervised Versus Unsupervised Learning
Unsupervised Learning: Clustering
- Clustering Algorithms
- Evaluating Cluster Performance
Dimensionality Reduction
- A-priori
- Principal Components Analysis
- Penalized Regression

Supervised Learning: Regression

Linear Regression
Penalized Linear Regression
Stochastic Gradient Descent
Scoring New Data Sets
Cross Validation
Bias-Variance Tradeoff
Feature Importance

Supervised Learning: Classification

Logistic Regression
LASSO
Random Forest
Ensemble Methods
Feature Importance
Scoring New Data Sets
Cross Validation
