Python or R for Data Science?
A question we often hear from instructors is: how do you choose the right language for your data science courses?
The two primary languages used in data science are Python and R. In fact, zyBooks publishes two programming versions of our foundational introduction to data science, one for each language, plus a third version that requires no coding.
To give you some perspective, here’s a quick chart comparing Python to R:
| PYTHON | R |
| --- | --- |
| General-purpose language widely used in industry and in introductory CS courses | Used by statisticians, data analysts, and research scientists |
| Very flexible; suited to many kinds of applications | Written specifically for data and statistical analysis, graphics, and statistical modeling |
| Open source; supports object-oriented programming | Open source; supports object-oriented programming |
| Commonly used data science packages include pandas, seaborn, and scikit-learn | Commonly used data science packages include tidyr, ggplot2, and dplyr |
| Cleaner syntax; many students find Python easier to learn | More difficult syntax, but the commonly used data science packages in the tidyverse ecosystem are designed to work together |
In this short video, data science professor and zyBooks co-author Dr. Aimee Schwab-McCoy walks you through how to pick the right language for your students: