Ultra-Lightweight Early Prediction of At-Risk Students in CS1
Published March 2023
Authors
Chelsea Gordon
zyBooks, Campbell, CA, USA
Stanley Zhao
University of California, Riverside, Riverside, CA, USA
Frank Vahid
University of California, Riverside & zyBooks, Riverside & Campbell, CA, USA
Abstract
Early prediction of students at risk of doing poorly in CS1 can enable early interventions or class adjustments. Preferably, prediction methods would be lightweight, not requiring much extra activity or data-collection work from instructors beyond what they already do. Previous methods included giving surveys, collecting (potentially sensitive) demographic data, introducing clicker questions into lectures, or using locally developed systems that analyze programming behavior, each requiring some effort by instructors. Today, a widely used textbook/learning system in CS1 classes is zyBooks, used by several hundred thousand students annually. The system automatically collects data related to reading, homework, and programming assignments. For a 300+ student CS1 class, we found that three data metrics, auto-collected by that system in early weeks (1-4), were good at predicting performance on the week-6 midterm exam: non-earnest completion of the assigned readings, struggle on the coding homework, and low scores on the programming assignments, with correlation magnitudes of 0.44, 0.58, and 0.72, respectively. We combined those metrics in a decision tree model to predict students at risk of failing the midterm exam (<70%, meaning D or F) and achieved 85% prediction accuracy with 82% sensitivity and 89% specificity, which is higher than previously published early-prediction approaches. The approach may mean that thousands of instructors already using zyBooks or a similar system can get a more accurate early prediction of at-risk students, without requiring extra effort or activities, and while avoiding collection of sensitive demographic data.
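For readers less familiar with the three reported measures, the following minimal Python sketch shows how accuracy, sensitivity, and specificity are typically computed from at-risk predictions. It is an illustration only, not the authors' code: the function name `evaluate_predictions` and the eight-student label lists are hypothetical, and the toy labels are made up rather than drawn from the study. Here `True` means "at risk" (midterm score below 70%).

```python
def evaluate_predictions(actual, predicted):
    """Return accuracy, sensitivity, and specificity for boolean labels.

    actual[i] is True if student i really was at risk;
    predicted[i] is True if the model flagged student i as at risk.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("label lists must be non-empty and the same length")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # at risk, flagged
    tn = sum(1 for a, p in pairs if not a and not p)  # not at risk, not flagged
    fp = sum(1 for a, p in pairs if not a and p)      # not at risk, flagged
    fn = sum(1 for a, p in pairs if a and not p)      # at risk, missed

    return {
        # Fraction of all students classified correctly.
        "accuracy": (tp + tn) / len(pairs),
        # Fraction of truly at-risk students that were flagged.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Fraction of not-at-risk students that were correctly left unflagged.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical toy labels for eight students (NOT data from the study).
toy_actual = [True, True, True, False, False, False, False, True]
toy_predicted = [True, True, False, False, False, True, False, True]

toy_metrics = evaluate_predictions(toy_actual, toy_predicted)
print(toy_metrics)
```

Sensitivity matters most when the goal is to miss as few at-risk students as possible, while specificity limits how many students are flagged unnecessarily; reporting both alongside accuracy, as the abstract does, guards against a model that looks accurate only because one class dominates.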