Table of Contents
1. Data and Sampling
1.1 What is data?
1.2 What is statistics?
1.3 Observational studies and experiments
1.4 Surveys and sampling methods
2. Data Visualization
2.1 What is data visualization?
2.2 Python for data visualization
2.3 Data frames
2.4 Bar charts
2.5 Pie charts
2.6 Scatter plots
2.7 Line charts
2.8 Data visualization example
3. Descriptive Statistics
3.1 Measures of center
3.2 Measures of variability
3.3 Box plots
3.4 Histograms
3.5 Violin plots
4. Probability and Counting
4.1 Introduction to probability
4.2 Addition rule and complements
4.3 Multiplication rule and independence
4.4 Conditional probability
4.5 Bayes’ Theorem
4.6 Combinations and permutations
5. Probability Distributions
5.1 Introduction to random variables
5.2 Properties of discrete probability distributions
5.3 Binomial distribution
5.4 Hypergeometric distribution
5.5 Poisson distribution
5.6 Properties of continuous probability distributions
5.7 Normal distribution
5.8 Student’s t-Distribution
5.9 F-distribution
5.10 Chi-square distribution
6. Inferential Statistics
6.1 Confidence intervals
6.2 Confidence intervals for population means
6.3 Confidence intervals for population proportions
6.4 Hypothesis testing
6.5 Hypothesis test for a population mean
6.6 Hypothesis test for a population proportion
6.7 Hypothesis test for the difference between two population means
6.8 Hypothesis test for the difference between two population proportions
6.9 One-way analysis of variance (one-way ANOVA)
7. Chi-square Tests for Categorical Data
7.1 Categorical data
7.2 Fisher’s exact test
7.3 Introduction to chi-square tests
7.4 Chi-square test for homogeneity and independence
7.5 Relative risk and odds ratios
8. Linear Regression
8.1 Introduction to simple linear regression (SLR)
8.2 SLR assumptions
8.3 Correlation and coefficient of determination
8.4 Interpreting SLR models
8.5 Confidence and prediction intervals for SLR models
8.6 Testing SLR parameters
8.7 Linear regression example
9. Multiple Linear Regression
9.1 Introduction to multiple regression
9.2 Multiple regression assumptions and diagnostics
9.3 Coefficient of multiple determination
9.4 Multicollinearity
9.5 Interpreting multiple regression models
9.6 Confidence and prediction intervals for MLR models
9.7 Testing multiple regression parameters
9.8 Multiple regression example
10. Higher Order Regression
10.1 Categorical predictor variables
10.2 Interaction terms
10.3 Quadratic models
10.4 Complete second order models
10.5 Comparing nested models: F-test
10.6 Higher order models
11. Logistic Regression
11.1 Introduction to logistic regression (LR)
11.2 Estimating LR parameters
11.3 LR models with multiple predictors
11.4 LR assumptions and diagnostics
11.5 Testing LR parameters
11.6 Interpreting LR models
11.7 Comparing nested models: Likelihood ratio tests and AIC
11.8 Classification using LR models
12. Transformations
12.1 Logarithmic transformations
12.2 Ladder of powers
12.3 Box-Cox transformation
13. Stepwise Regression
13.1 Introduction to stepwise regression
13.2 Forward selection
13.3 Backward selection
13.4 Stepwise selection
14. Non-parametric Analysis
14.1 Parametric vs. nonparametric statistics
14.2 Resampling: Randomization and bootstrapping
14.3 Wilcoxon rank-sum test
14.4 Kruskal-Wallis test
14.5 Multiple tests
15. Introduction to Data Mining
15.1 What is data mining?
15.2 Data formats
15.3 Machine learning methods
15.4 scikit-learn
16. Data Cleansing and Preparation
16.1 What is data cleansing?
16.2 Handling missing values
16.3 Outliers
16.4 Standardization and normalization
16.5 Dimensionality reduction
16.6 Training, validation, and test sets
17. Supervised Learning
17.1 k nearest neighbors
17.2 Logistic regression
17.3 Evaluating classification models
17.4 Supervised learning examples
18. Unsupervised Learning
18.1 Clustering methods
18.2 Association rules
18.3 Evaluating clustering models
18.4 Unsupervised learning examples
19. Decision Tree Learning
19.1 Introduction to decision trees
19.2 Classification and regression trees (CART)
19.3 ID3 and C4.5 algorithms
19.4 Classification tree example
19.5 Regression tree example
19.6 Random forests
20. Principal Component Analysis
20.1 Introduction to principal component analysis (PCA)
20.2 Calculating principal components for two variables
20.3 Extending PCA to more variables
20.4 Determining the number of components
20.5 Interpreting principal components
21. Time Series
21.1 What is a time series?
21.2 Time series patterns and stationarity
21.3 Moving average and exponential smoothing forecasting
21.4 Forecasting using regression
22. Monte Carlo Methods
22.1 What is a Monte Carlo simulation?
22.2 Building simulations
22.3 Optimization and forecasting
22.4 What-if analysis
22.5 Advanced simulations
23. Ethics
23.1 Misleading statistics
23.2 Abuse of the p-value
23.3 Data privacy
23.4 Ethical guidelines
24. Appendix A: Distribution tables
24.1 t-distribution table
24.2 z-distribution table
24.3 Chi-squared distribution table
25. Appendix B: CSV Files
25.1 Data sets
Teach applied statistics through a powerful interactive approach that includes programming using Jupyter Notebooks
Applied Statistics with Data Analytics (Python) focuses on statistical concepts and techniques used in data analysis. Important Python libraries are introduced to visualize data, perform statistical inference, and make predictions.
- Packed with interactive animations, questions, and learning activities to help students master the material
- Covers elementary statistical concepts, modeling relationships between two or more variables, and advanced topics such as time series and Monte Carlo methods
- Data analytics and data mining techniques such as logistic regression, clustering, and decision trees are also covered
- Built-in Python environment and Jupyter Notebooks allow students to experiment with real-world data sets
- Adopters have access to a test bank with over 400 questions
- zyLabs users can add their own Jupyter Notebooks via custom content
What is a zyBook?
Applied Statistics with Data Analytics (Python) is a web-native, interactive zyBook that helps students visualize concepts to learn faster and more effectively than with a traditional textbook.
Since 2012, over 1,700 academic institutions have adopted web-native zyBooks to transform their STEM education.
zyBooks benefit students and instructors:
- Instructor benefits
  - Customize your course by reorganizing existing content or adding your own
  - Continuous publication model automatically updates your course with the latest content and technologies
  - Robust reporting gives you insight into students’ progress, reading, and participation
  - Save time with auto-graded labs and challenge activities that seamlessly integrate with your LMS gradebook
- Student benefits
  - Learning questions and other content serve as an interactive form of reading
  - Instant feedback on labs and homework
  - Concepts come to life through extensive animations embedded into the interactive content
  - Save chapters as PDFs to reference the material at any time
Give students real-life practice with a data set using embedded Jupyter Notebooks.
Senior Contributors
Heather Berrier
Content Developer, Mathematics / PhD Physics and Astronomy, Univ. of California, Irvine
Joel Berrier
Assistant Professor, Dept. of Physics and Astronomy, Univ. of Nebraska, Kearney / PhD Physics and Astronomy, UC Irvine
Chris Chan
MA Mathematics, San Francisco State Univ.
Scott Nestler
Associate Teaching Professor, Mendoza College of Business, Univ. of Notre Dame / PhD Management Science, Univ. of Maryland, College Park
Iain Pardoe
Mathematics and Statistics Instructor, Thompson Rivers Univ., Pennsylvania State Univ., and Statistics.com / PhD Statistics, Univ. of Minnesota
Rodney X. Sturdivant
Professor, Dept. of Mathematics and Physics, Azusa Pacific Univ. / PhD Biostatistics, Univ. of Massachusetts, Amherst
Krista Watts
Assistant Professor, Director—Center for Data Analysis and Statistics, United States Military Academy, West Point / PhD Biostatistics, Harvard