Overview
Statistics is an essential course for anyone, whether in a technical, managerial or administrative role, who is interested in using data to inform their decision-making.
Objectives
At the end of this Statistics training course, participants will learn how to
Prerequisites
There are no formal prerequisites for attending this course.
Course Outline
- Definition
- Types of statisticians
- Variability
- Probability
- Die roll outcomes
- Why is knowledge of statistics important?
- Descriptive vs inferential statistics
- Inferring population parameters
- Quantitative data
- Qualitative data
- R statistical software
- RStudio
- Interactive exercise manual demo
- What is exploratory data analysis (EDA)?
- Histograms and bar charts
- Bar chart vs histogram
- Central tendency and spread
- Bin width is crucial
- Right-skewed data
- Outliers
- Left-skewed data
- Bimodal data
- Separate subpopulations for analysis
- Individual value plot
- Subpopulation individual value plots
- Benefits of boxplots
- Boxplot
- Boxplot vs histogram
- Left-skewed boxplot
- Compare subpopulations using boxplots
- Swedish salaries by level of education
- Measures of central tendency
- Mean vs median
- Mean vs median for skewed data
- Mode
- Measures of spread
- Range and IQR
- Standard deviation
- Six figure summary
- Central tendency and spread equations
- Quantiles
- Benefits of scatterplots
- Scatterplot
- Highlighting subgroups on scatterplot
- What is correlation?
- Correlation examples
- Random data correlation
- Literacy rate correlation
- Number of children per woman correlation
- Interpreting correlation coefficients
- Correlation doesn’t imply causation
- Causation doesn’t imply (linear) correlation
- Numbers are mostly reckless estimates
- Random variables
- Male life expectancy in UK distribution
- What’s the probability that a US man is 6’ or more?
- What is a probability distribution?
- Populations vs samples
- Sampling the heights of 10 random American men
- Sampling the heights of 100,000 random American men
- Discrete probability distributions
- Roll two dice and histogram the results
- Poisson distribution
- Binary probability distributions
- Probability distribution for cars/household in the UK
- Binomial distribution
- Geometric distribution
- Negative Binomial distribution
- Continuous probability distributions
- Uniform distribution
- Triangular distribution
- Normal distribution
- Properties of the normal distribution
- Distribution of IQ scores
- Different means (same standard deviation)
- Different standard deviations (same mean)
- z-distribution
- 68–95–99.7 (empirical) rule
- Quantile-Quantile (Q-Q) plot
- Q-Q plot of non-normal data
- Common probability distributions “family tree”
- Samples are proxies for the population of interest
- Unfortunately, samples vary
- Larger samples exhibit less variation
- Statistics vs parameters
- Distributions involved in statistical inference
- Sampling distribution of mean IQ
- Collecting more IQ samples
- Sampling distribution of mean die roll
- Sampling distribution of mean project duration
- Create a sampling distribution
- Central limit theorem
- Implications of the central limit theorem
- Standard error of the mean (SEM)
- Impact of sample size on SEM
- What is a confidence interval?
- 95% confidence interval
- Bigger samples give greater precision
- Smaller confidence levels result in tighter intervals
- How should we interpret the confidence interval?
- Random sampling
- Simple random sampling
- Stratified sampling
- Cluster sampling
- What is bootstrapping?
- Estimating median life expectancy
- What is statistical inference?
- Why must we use samples?
- Why do we need to conduct hypothesis tests?
- What is hypothesis testing?
- Null hypothesis
- Alternative hypothesis
- Rejecting the null hypothesis
- One- vs two-tailed hypothesis tests
- Choosing between one- and two-tailed tests
- What are p-values?
- Significance level (α)
- Types of errors
- Confidence levels vs significance levels
- Performing hypothesis tests
- p-value controversy
- When to use a t-test
- t-value
- t-distribution
- t-distributions
- Slot machine observed “Return to Player”
- Are slot machine payouts within tolerance?
- Perform a t-test on RTP data using R
- Two-sample t-test
- When to use a z-test
- Conducting hypothesis tests using z-scores
- When to use a χ² test
- Education and Brexit vote
- Brexit vote breakdown
- χ² value
- χ² distributions
- Are education and Brexit vote related?
- When to use an F-test
- Conducting hypothesis tests using F-values
- F-distributions
- Height distribution by sex
- Does height variation differ by sex?
- When to use analysis of variance (ANOVA)
- Determining the F-value
- Are all diets the same?
- Apparently, not all diets are the same
- Normality hypothesis tests
- Statistically significant treatments?
- What is statistical power?
- Calculating statistical power
- Statistical power curve
- Improving statistical power of hypothesis tests
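Several outline topics (standard error of the mean, the 95% confidence interval, bigger samples giving greater precision, and lower confidence levels giving tighter intervals) come down to one short calculation. The following is a minimal, hypothetical Python sketch of that calculation, not material taken from the course, which itself uses R. The function name `mean_confidence_interval` and the sample data are invented for illustration. The sketch assumes a normal (z) critical value, which is an approximation; for small samples a t critical value, as in the t-test topics, would normally be used instead.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Return (mean, sem, lower, upper) for the mean of `sample`.

    Hypothetical helper for illustration. It assumes a normal (z)
    critical value, a large-sample approximation; small samples would
    normally use a t critical value instead.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    # Sample standard deviation (n - 1 in the denominator)
    sd = statistics.stdev(sample)
    # Standard error of the mean: SEM = s / sqrt(n)
    sem = sd / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sem
    return mean, sem, mean - margin, mean + margin


# Hypothetical data, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean, sem, lower, upper = mean_confidence_interval(data)
print(f"mean={mean:.2f} SEM={sem:.3f} 95% CI=({lower:.2f}, {upper:.2f})")

# A lower confidence level gives a narrower interval
_, _, lo90, hi90 = mean_confidence_interval(data, confidence=0.90)
print(f"90% CI=({lo90:.2f}, {hi90:.2f})")

# A larger sample (here the same values repeated four times) gives a smaller SEM
_, sem_big, lo_big, hi_big = mean_confidence_interval(data * 4)
print(f"n={len(data) * 4}: SEM={sem_big:.3f} 95% CI=({lo_big:.2f}, {hi_big:.2f})")
```

Because the interval width is the critical value multiplied by s/√n, the sketch shows both effects directly. Lowering the confidence level shrinks the critical value, and increasing n shrinks the SEM.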