Computing Competencies for Data Science
This is the first post in a three-part series on data science knowledge areas, program structures, and the intro course.
What computing skills are necessary for data science graduates? The Association for Computing Machinery (ACM) recommends 11 knowledge areas for data science undergraduate programs. Not all knowledge areas can be covered in a single class! Instead, knowledge areas may be introduced and reinforced throughout the data science curriculum.
Each competency is described in more detail below. The statements under each heading are goals for the knowledge and skills data scientists should have. Data science skills are often honed over time and through experience.
When incorporating learning objectives, choosing materials, and selecting professors, don’t let the competencies slip by. Be sure to include these knowledge areas in your program courses!
1. Analysis and presentation
Presentation is an essential professional skill for data scientists, who need to share their findings with colleagues and stakeholders.
Data scientists should…
- Be able to effectively present data, models, and inferences to stakeholders in verbal, written, and graphical forms.
- Use data visualization techniques to explore data, make inferences, and present information.
- Apply effective visualizations for different types of data.
- Select appropriate tools for the size and type of data.
Data visualization and presentation skills can be developed through in-class presentations, group work, and semester projects.
2. Artificial intelligence
Artificial intelligence (AI) has exploded in popularity over the last few years. Data scientists should be familiar with popular AI tools like ChatGPT, and understand the uses and limitations of AI.
Data scientists should…
- Describe major areas of AI.
- Identify contexts in which AI is or is not appropriate.
- Be aware of ethical considerations in AI, and techniques for avoiding harm with AI systems.
Artificial intelligence is constantly evolving, and data scientists are expected to keep up with the latest tools and trends.
3. Big data systems
Many professional data scientists routinely work with datasets too large to process on a single machine.
Data scientists should…
- Understand problems of scale and implications of big data on computation.
- Use suitable tools and algorithms to improve computational efficiency.
- Recognize when to use big data tools like cloud computing, parallel programming, and distributed storage.
Big data systems may be taught in conjunction with a query language like SQL, or in a stand-alone course.
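As a small illustration of the "problems of scale" idea, here is a minimal Python sketch that computes a mean over a stream of values without holding the whole dataset in memory. The function name `streaming_mean` and the sample data are assumptions made for this example; real big data tools apply the same one-pass idea across many machines.

```python
def streaming_mean(values):
    """Compute the mean of an iterable in one pass, using constant memory."""
    count = 0
    mean = 0.0
    for value in values:
        count += 1
        # Incremental update: nudge the running mean toward the new value.
        mean += (value - mean) / count
    return mean


# A generator stands in for a file or database cursor that is too large to load at once.
readings = (x for x in range(1, 101))
average = streaming_mean(readings)
```

Because the function only ever looks at one value at a time, the same code works whether the source holds a hundred rows or a hundred million.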
4. Computing and computer fundamentals
Data scientists use computers to analyze data, so knowledge of computing fundamentals is important for performing their jobs efficiently.
Data scientists should…
- Understand the tradeoffs between different processors, operating systems, and data management systems.
- Describe how data networks are organized.
- Use the Internet to gather data and share information.
Intro computer science courses are almost always required in a data science degree, and introduce students to fundamental programming and computing concepts.
5. Data acquisition, management, and governance
Data scientists should understand how their organization collects, manages, and protects data.
Data scientists should…
- Describe how to extract, transform, and load data for analysis.
- Explain principles of data governance, privacy, and security.
Students are often exposed to these concepts in database courses or through professional experiences like internships.
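The extract, transform, and load steps mentioned above can be sketched in a few lines of Python using only the standard library. Everything here is hypothetical: the sample CSV, the `customers` table, and the cleaning rules are assumptions chosen for illustration, not a recommended pipeline design.

```python
import csv
import io
import sqlite3

# Hypothetical raw data; a real pipeline would read from a file, API, or database.
RAW_CSV = """name,signup_date,spend
 Ada ,2023-01-05,120.50
Grace,2023-02-11,
Linus,2023-03-02,75
"""


def extract(text):
    """Extract: parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim names, convert spend to float, skip rows with missing spend."""
    cleaned = []
    for row in rows:
        spend = row["spend"].strip()
        if not spend:
            continue  # assumption: rows without a spend value are dropped
        cleaned.append((row["name"].strip(), row["signup_date"], float(spend)))
    return cleaned


def load(records, connection):
    """Load: write the cleaned records into a SQLite table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS customers (name TEXT, signup_date TEXT, spend REAL)"
    )
    connection.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)
    connection.commit()


def run_pipeline():
    """Run all three steps against an in-memory database and return the loaded rows."""
    connection = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), connection)
    return connection.execute(
        "SELECT name, spend FROM customers ORDER BY name"
    ).fetchall()
```

Keeping each step in its own function makes it easier for students to test the cleaning rules separately from the storage step.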
6. Data mining
Data mining is the process of identifying patterns or unusual features in datasets.
Data scientists should…
- Understand a range of data mining techniques.
- Identify and apply suitable data mining techniques for a given problem.
Data mining is often covered in a Machine Learning course, or as a separate course.
7. Data privacy, security, integrity, and analysis for security
Some industries and countries have very specific regulations around data privacy and security.
Data scientists should…
- Be aware of relevant privacy rights and regulations.
- Implement data practices that reduce the risk of data breaches or other violations.
Students often learn industry-specific practices through internships or after graduation. Data science programs should introduce common regulations, like the EU’s GDPR.
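One example of a data practice that can reduce exposure in a breach is pseudonymization, where direct identifiers are replaced with keyed hashes before analysis. The sketch below is a minimal illustration using Python's standard `hmac` and `hashlib` modules; the key, the field names, and the sample records are all hypothetical. Note that pseudonymized data is generally still treated as personal data under regulations such as GDPR, so this is one safeguard rather than a complete solution.

```python
import hashlib
import hmac

# Hypothetical key for illustration; in practice it would come from a secrets manager.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymize(identifier, key=SECRET_KEY):
    """Replace an identifier with a keyed SHA-256 hash (HMAC) of its normalized form."""
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


# Hypothetical records containing a direct identifier.
records = [
    {"email": "Ada@example.com", "spend": 120.5},
    {"email": "linus@example.com", "spend": 75.0},
]

# The analysis copy keeps a stable token instead of the raw email address.
safe_records = [
    {"user_token": pseudonymize(record["email"]), "spend": record["spend"]}
    for record in records
]
```

The same email always maps to the same token, so records can still be joined and counted without the raw address appearing in the analysis dataset.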
8. Machine learning
Machine learning is used by data scientists to make predictions and uncover patterns or trends in data.
Data scientists should…
- Select appropriate machine learning models and algorithms for a given task.
- Use suitable training and testing methods to fit and deploy machine learning algorithms.
- Be aware of machine learning pitfalls like the bias-variance tradeoff and curse of dimensionality.
Machine Learning is usually a separate course in a data science program. Machine Learning is also the newest zyBook in data science, coming March 2024!
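To make the "training and testing methods" objective concrete, here is a minimal Python sketch of a hold-out split: the model is fit on one part of the data and evaluated on rows it has never seen. The helper names and the noise-free sample data are assumptions for illustration; a course would typically use a library such as scikit-learn rather than hand-written functions.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly, then hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(rows):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(rows, slope, intercept):
    """Average squared difference between predictions and observed values."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in rows) / len(rows)


# Hypothetical noise-free data following y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in range(10)]
train, test = train_test_split(data)
slope, intercept = fit_line(train)           # fit on training rows only
test_error = mean_squared_error(test, slope, intercept)  # evaluate on held-out rows
```

Evaluating on held-out rows is what lets students see overfitting: a model that memorizes its training data will show a low training error but a high test error.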
9. Professionalism
Communication, presentation, and documentation are essential professional skills in data science.
Data scientists should…
- Keep up to date with changes in technology and methods.
- Be prepared to communicate and work with stakeholders from different business units.
- Prepare documentation, write technical reports, and give presentations describing their work.
These skills should be introduced early in a data science curriculum, and reinforced throughout the program.
10. Programming, data structures, and algorithms
Data scientists use algorithms to process and model datasets. Data scientists should not only be aware of existing algorithms, but also have the programming skills to modify them and create their own.
Data scientists should…
- Be able to design algorithms to solve problems.
- Write clear and well-documented code using custom functions or standard libraries.
- Describe different data structures and use appropriate data structures for a given task.
Data science programs often include an algorithms course, and good coding practices should be demonstrated and assessed throughout the program.
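As one illustration of choosing an appropriate data structure, the Python sketch below uses a `Counter` (a dictionary subclass) for frequency counts and a `set` for fast membership checks. The function names and sample values are hypothetical, chosen only to show the idea.

```python
from collections import Counter


def count_categories(labels):
    """Count how often each label appears; a Counter maps label -> frequency."""
    return Counter(labels)


def find_unseen(new_ids, known_ids):
    """Return the IDs in new_ids that are not in known_ids, preserving order.

    Converting known_ids to a set makes each membership check take roughly
    constant time on average, instead of scanning a list every time.
    """
    known = set(known_ids)
    return [item for item in new_ids if item not in known]


# Hypothetical sample data.
label_counts = count_categories(["cat", "dog", "cat", "bird"])
unseen = find_unseen([1, 2, 3, 4], [2, 4])
```

With a list in place of the set, `find_unseen` would still return the same answer, but each lookup would scan the whole list, which becomes slow as the data grows.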
11. Software development and maintenance
Models created by data scientists are used by businesses to make decisions, identify customers, and recommend new products. Data scientists must be able to convert their models into production software.
Data scientists should…
- Be aware of basic software engineering principles and potential vulnerabilities in models and data software.
Tools like Kubernetes, Docker, and Git should be introduced in data science programs to help students build professional software development skills.
Computing skills are essential for data scientists, but that’s not all! Data scientists should also have a strong background in calculus, probability theory, linear algebra, and statistics.
The next blog post in this series will cover specific courses for a data science program. In the meantime, take a look at the Computing Competencies for Undergraduate Data Science Curricula (2021), and request an evaluation copy of our Data Science Foundations zyBook!