The rise of the zyLab program auto-grader in introductory CS courses

Chelsea Gordon, Roman Lysecky, Frank Vahid

zyBooks (www.zybooks.com)

Whitepaper

V1 Nov 1, 2020: Initial version.

V2 Jan 12, 2021: Updated data for full 2020 year.

V3 Jan 26, 2021: Added figure 8.

V4 Mar 3, 2022: Updated with data for 2021, and a section on exam usage.

V5 May 4, 2023: Updated with data for 2022.

Abstract

In recent years, hundreds of college courses have switched how they grade programming assignments, from grading manually and/or using batch scripts to using commercial cloud-based auto-graders that give students immediate score feedback. This white paper provides data on the rise in usage of one of the most widely used program auto-graders, zyLabs, as one indicator of the strong shift in college course grading to the auto-grading paradigm. The number of courses, instructors, and students using zyLabs has increased dramatically since it was first introduced: from 2016 to 2021, the number of courses per year grew from 284 to 3,935, the number of students per year from 24,216 to 220,453, and the number of instructors per year from 364 to 3,724. The result is a substantial shift in the classroom dynamic that enables instructors and students to spend more time on quality teaching and learning.

Introduction

Nearly all college-level introductory computer science (CS) courses require students to write programming assignments, often writing one or more programs every week.

In the past, most courses graded student programs by hand. A student would develop the program on their own, and then submit that program on paper or as online files. A teacher (instructor or teaching assistant) would grade each program's runtime correctness and often the code quality, providing written feedback. The main benefit of human grading is high-quality feedback, especially regarding code quality (style, problem-solving approach). Drawbacks include extensive use of human resources, which is expensive and detracts from other high-value contributions that teachers could make, and delays of days or weeks before students get feedback, which can hinder learning.

Today, many courses use a cloud-based auto-grader. Students submit their programs to a webpage, which in seconds gives feedback on the program's runtime correctness along with a score. Students can then resubmit to improve their score. The benefits include reduced human resources and immediate feedback to aid learning. Drawbacks include little or no feedback on coding style, potential student overreliance on the auto-grader to test programs, and potential cheating of the auto-grader. Some instructors combine manual and auto-grading, letting the auto-grader provide an initial score based on runtime correctness, and then later manually adding a score based on code quality.
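The submit, score, and resubmit loop described above can be illustrated with a minimal sketch. This is a generic illustration and not a description of how zyLabs is implemented, since the paper does not describe its internals. The `grade` function, the test-case format, and the equal weighting of test cases are all assumptions made for the example.

```python
from typing import Callable

# Hypothetical test cases: (input text, expected output text).
TEST_CASES = [
    ("3 4", "7"),
    ("10 -2", "8"),
    ("0 0", "0"),
]


def grade(solution: Callable[[str], str], cases=TEST_CASES) -> float:
    """Return the percentage of test cases whose output matches exactly,
    ignoring leading/trailing whitespace. Equal weighting is an assumption."""
    passed = 0
    for given_input, expected in cases:
        try:
            actual = solution(given_input)
        except Exception:
            continue  # a crash counts as a failed test case
        if actual.strip() == expected.strip():
            passed += 1
    return 100.0 * passed / len(cases)


def first_attempt(text: str) -> str:
    a, b = text.split()
    return str(int(a) - int(b))  # bug: subtracts instead of adding


def second_attempt(text: str) -> str:
    a, b = text.split()
    return str(int(a) + int(b))  # corrected after seeing the first score


print(grade(first_attempt))   # partial credit: only the "0 0" case passes
print(grade(second_attempt))  # full credit after resubmitting
```

The example also hints at one drawback named above: the score reflects only runtime correctness, so a poorly styled solution that produces the right output would still earn full credit.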

This white paper provides data detailing the growth and usage of the zyLabs auto-grader.

zyLabs since 2015

zyBooks was founded in 2012 to improve learning content in introductory CS courses. Its initial product in 2013 included a web-based textbook replacement created natively for the web, thus using less text and instead using 100+ animations and 1000+ interactive learning questions. An integrated homework system was added in 2014, consisting of short auto-graded coding challenges, where students complete a program by writing about 3-10 lines of code, or determining the output of a given program.

In late 2015, zyBooks released "zyLabs" to auto-grade the main remaining component of introductory CS courses, namely the weekly programming assignments.

Adoptions

Figure 1 shows the number of course offerings that adopted a zyBook with zyLabs enabled, per year since 2016. A course offering is a delivery of a course in a given term, such as "CS1 at Univ. of Springfield in Fall 2016". zyLabs adds an extra cost beyond a base zyBook's cost, and thus courses with zyLabs enabled almost always make use of the zyLab auto-grader. For all figures in this document, courses are included if at least 9 students were subscribed to that course.

Figure 1: Course offerings adopting zyLabs per year.

Figure 2 shows the number of students subscribed to those zyBook course offerings each year, thus representing the number of students to whom the zyLab auto-grader was available and likely used.

Figure 2: Student zyBook subscriptions with zyLabs enabled per year.

Figure 3 shows the number of instructors teaching classes using a zyBook with zyLabs enabled, per year. As multiple instructors might teach the same course, this number can be larger than the number of courses shown above.

Figure 3: Instructors teaching courses using a zyBook with zyLabs enabled, per year.

Figure 4 shows the proportion of zyBooks in C, C++, Java, and Python courses that have enabled zyLabs, per year.

Figure 4: Instructors teaching C, C++, Java, Python courses using a zyBook with zyLabs not enabled and enabled, per year.

Usage

Figure 5 below shows the average number of zyLab programming assignments used in a course offering. We consider a zyLab as being "used" if at least 5 students submitted programs for grading. We also calculated the average number of zyLabs programming assignments used per student, and the numbers were the same as those used in a course offering.

Figure 5: Average number of zyLab programming assignments used in a course offering. “Used” means that at least 5 students submitted programs for grading.
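The "used" threshold and the per-offering average can be sketched as follows. The course names and submission counts are invented for illustration and are not zyBooks data. Whether offerings with zero used labs are included in the average is not stated in the paper, so including them here is an assumption.

```python
from statistics import mean

USED_THRESHOLD = 5  # a lab counts as "used" if at least 5 students submitted

# Hypothetical data: course offering -> {lab name: number of submitting students}
submitters = {
    "CS1 Fall": {"Lab 1": 40, "Lab 2": 38, "Lab 3": 2},
    "CS1 Spring": {"Lab 1": 25, "Lab 2": 5},
    "CS2 Fall": {"Lab 1": 4},
}


def used_lab_count(labs: dict) -> int:
    """Count the labs in one offering that meet the usage threshold."""
    return sum(1 for students in labs.values() if students >= USED_THRESHOLD)


# Number of used labs in each offering, then the average across offerings.
counts = {course: used_lab_count(labs) for course, labs in submitters.items()}
average_used = mean(counts.values())

print(counts)
print(average_used)
```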

Figure 6 shows the distribution of the number of zyLab programming assignments that were used by instructors. For instance, in 2018 over half of instructors used between 20 and 29 labs, whereas in 2020 only about a fifth of instructors used between 20 and 29 labs, with about half of instructors using more. We observe that not only is the total number of instructors using labs increasing each year as shown above, but the number of labs that each instructor uses is also rising substantially. In 2019, zyBooks introduced zyBooks Maintained Labs (ZMLs), which allowed instructors to assign available labs without needing to create all labs independently. Starting that same year, the number of instructors using 20+ labs in a course offering, and even those using 100+ labs, increased substantially.

Figure 6: Distribution of approximate number of zyLabs used by instructors each year.

Ex: In 2018, over half of instructors used between 20 and 29 labs, while in 2021, the distribution is much more spread out.

Figure 7 shows the total number of submissions made to the auto-grader per year. The figure shows that not only is the number of zyLab programming assignments per course increasing, but the usage of those assignments is increasing as well. 2020 saw a dramatic 2.4x increase in usage.

Figure 7: Total number of auto-graded zyLab submissions per year.

Figure 8 shows the distribution of the number of non-comment lines of code in the instructor solution to auto-graded assignments in one term, Fall 2020. The auto-grader has been used for assignments that range from 1 line of code to 780 lines of code in the instructor's solution, with a median of 24 lines of code. (Note: Some solutions might include template code provided to the student.) One can see that the auto-grader can be used for small to large programs (in the context of CS classes), with over 1,000 programming assignments having solutions of 100+ lines. Figure 9 zooms in on the larger programming assignments, some of which have more than 400 lines of code.

Figure 8: Distribution of solution lengths for auto-graded assignments in Fall 2020.

Figure 9: Distribution of solution lengths for large auto-graded assignments in Fall 2020.
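A count of non-comment lines of code, the metric plotted in Figures 8 and 9, might be computed along the following lines. The paper does not state how the count was made, so this sketch makes several assumptions: it handles only single-line comments with a given prefix (`#` by default), it skips blank lines, and it treats a line with code followed by a trailing comment as code.

```python
def count_non_comment_lines(source: str, comment_prefix: str = "#") -> int:
    """Count lines that are neither blank nor whole-line comments.

    Assumption: only single-line comments starting with comment_prefix are
    recognized; block comments (such as /* ... */) are not handled.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


# Hypothetical instructor solution used only to exercise the function.
sample = '''# Read two integers and print their sum
a = int(input())
b = int(input())

print(a + b)  # a trailing comment still counts as code
'''

print(count_non_comment_lines(sample))
```

For C, C++, or Java solutions, the same function could be called with `comment_prefix="//"`, although block comments would need additional handling.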

Develop runs (coding in the book)

Instructors have the option to enable a develop mode for zyLab programming assignments. When enabled, develop mode allows students to write and run their code within the zyBook as often as they choose, before (and after) submitting the assignment for grading. Instructors who do not enable develop mode have students use other tools to develop code, such as integrated development environments (IDEs). Figure 10 shows that develop mode is enabled for over 80% of auto-graded zyLabs. The percentages were similar for each programming language in labs used in 2020: 83% for C, C++, and Python, and 80% for Java.

Figure 10: Percentage of zyLab programming assignments with development mode enabled.

zyLabs used for exams

When zyLabs was introduced in late 2015, instructors were discouraged from using these assignments for course exams. This discouragement was due to zyLabs being a new feature that hadn't yet been vetted for use in high-stakes exams. Nevertheless, we began to observe instructors using zyLabs for exams anyway, with such usage increasing each year. We detected some of this usage through instructors notifying us. We also analyzed zyBook sections whose titles included the keywords "midterm", "exam", "final", or "test", and we further examined those sections manually to verify that they were in fact exam sections. Based on that analysis, Figure 11 shows the number of courses that included such sections per year, and Figure 12 shows the total number of students in those courses. Note that this is a lower bound on the actual courses and users that completed zyLabs for exams, as these keywords are not exhaustive.

Figure 11: Number of zyBook courses identified using zyLabs in course exams.

Figure 12: Number of students in zyBook courses identified using zyLabs in course exams.
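The keyword screening step described above can be sketched as a simple filter. The four keywords come from the text, but the matching rule (case-insensitive substring matching) and the section titles are assumptions for illustration.

```python
# Keywords listed in the text above.
EXAM_KEYWORDS = ("midterm", "exam", "final", "test")


def is_exam_candidate(title: str) -> bool:
    """Return True if the title contains any keyword, ignoring case.

    Assumption: plain substring matching, which can over-match
    (e.g. "testing"), hence the need for manual verification afterwards.
    """
    lowered = title.lower()
    return any(keyword in lowered for keyword in EXAM_KEYWORDS)


# Hypothetical section titles.
titles = [
    "Midterm 1 Programming",
    "Lab 4: Unit testing basics",
    "Final Exam",
    "Lab 7: Loops",
    "Quiz 3",
]

candidates = [title for title in titles if is_exam_candidate(title)]
print(candidates)
```

In this invented data, "Lab 4: Unit testing basics" is flagged even though it is not an exam, which is the kind of case the manual check would remove. "Quiz 3" is not flagged even if it were an exam-like assessment, which illustrates why the resulting counts are a lower bound.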

Survey data

To understand how the zyLabs auto-grader is changing the classroom, zyBooks surveyed instructors who used zyLabs during the Fall 2020 semester. A self-selecting sample of 116 instructors using zyLabs responded.

Figure 13 shows the number of instructors who used various methods of grading assignments prior to switching to the zyLabs auto-grader. The large majority (79%) reported grading assignments by hand, which can be very time-consuming.

Figure 13: Number of instructors using various methods of grading prior to switching to zyLabs

Figure 14 shows the reported number of minutes saved per student each week on grading after switching to zyLabs. Nearly all instructors reported spending less time grading; only three instructors (not shown) reported spending more time on grading, ranging from 1 minute to 6 minutes more per student per week.

Figure 14: Reported number of minutes that instructors saved per student each week on grading after switching to zyLabs.

Figure 15 shows the total number of hours that instructors saved per week on grading. Nearly half (48.7%) of reporting instructors said that they saved at least 5 hours per week using the zyLabs auto-grader, and over a quarter (26.3%) said that they saved 9 or more hours per week. The median reported grading time saved per week was 4.3 hours, and the mean was 9 hours.

Figure 15: Reported number of hours saved each week on grading after switching to zyLabs.

Figure 16 shows the change in hours per week that students spent on programming assignments after their course switched to using zyLabs. Approximately half (53%) of instructors reported that students spent about the same amount of time working on programming assignments. Over a third (35%) of instructors reported that students spent more time on programming assignments, with 16.2% indicating that their students spent 3+ additional hours per week on such assignments.

Figure 16: Reported change in time that students spent on programming assignments each week after switching to zyLabs

Discussion

Steep rise

The data above shows a steep rise in courses using the zyLabs auto-grader, growing from zero to over 2,000 course offerings in just a few years, used by over 130,000 students in 2020. In most cases, courses switched to zyLabs not from another auto-grader but rather from manual grading. The auto-grader is used for small and large programs. Thus, this growth suggests substantial changes in the nature of programming courses:

Previously, students would not get feedback for days or even weeks. But now students get immediate feedback, knowing what score they have earned, and can correct their code to earn a better score. This tight feedback loop can yield better learning, and also reduce disappointment due to a lower-than-expected grade with no opportunity for correction.
Instructors are saving large amounts of time grading, averaging 9 hours per week, with some reporting 20+ hours saved. These reductions are large, with the average representing roughly 22% of a 40-hour work week. The large reduction frees instructors to spend time doing higher-value teaching, including holding help sessions, answering questions, creating new activities, analyzing student code for common errors for class discussion, detecting and meeting with struggling or at-risk students, implementing class research experiments, or doing higher-value grading based on style or problem-solving approach. A survey respondent put it this way: "[zyLabs] has freed up a lot of my time so I can spend more time working with students." Instructors can also handle more students or teach more classes, and/or departments can assign fewer teaching assistants for the same number of students.
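The share of a work week represented by the reported mean savings can be checked with a short calculation. The 9-hour mean and the 40-hour week come from the text, while the 15-week term length is an assumption added only to show how the savings might accumulate.

```python
MEAN_HOURS_SAVED_PER_WEEK = 9  # mean reported in the survey
WORK_WEEK_HOURS = 40           # work week length used in the text
TERM_WEEKS = 15                # assumption: length of a typical term

# Fraction of a work week freed up by the mean reported savings.
share_of_work_week = MEAN_HOURS_SAVED_PER_WEEK / WORK_WEEK_HOURS

# Total hours saved over one term under the assumed term length.
hours_saved_per_term = MEAN_HOURS_SAVED_PER_WEEK * TERM_WEEKS

print(f"{share_of_work_week:.1%} of a work week")
print(f"{hours_saved_per_term} hours over a {TERM_WEEKS}-week term")
```

Because the mean (9 hours) is well above the median (4.3 hours), a typical instructor would likely see a smaller share than this calculation suggests.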

We note that part of the steep rise in 2020 in particular was due to the COVID-19 pandemic, where most courses were suddenly being taught online. Many instructors quickly adopted zyBooks that year, stating the increased need for better quality learning outside of class.

These changes point to a future with a much different classroom dynamic. The role of instructors shifts away from grading and more towards educating. This shift may also shape the way that students see instructors, viewing them less as testers/assessors and more as aides/educators.

Potential issues

Immediate feedback focuses on correctness rather than style or approach. As such, auto-graders might promote poor coding style/approach. This issue can be addressed by instructors complementing auto-grading with manual grading of at least some programs, which requires far less time than full manual grading.

Thinking of how to test one's own code is an important part of creating good programs. Auto-grading may reduce students' focus on testing their own code, with some students over-relying on the auto-grader. This issue can be addressed by future techniques that require students to create their own test cases before submitting for auto-grading, and even auto-grading the quality of those test cases.

Future directions

Commercial program auto-graders are relatively young. Looking forward, some improvements may include:

Providing automated hints, especially for common mistakes. When surveyed, students’ most common request in programming classes is often for "more help when stuck". In fact, some "tutorial" labs could be designed in a tutoring style, allowing students to try to develop alone, but then incrementally providing parts of a solution upon request.
Logging a student's develop and submission runs, such that instructors could give credit not just for the final program but also for the effort along the way. zyLabs already provides "effort signatures" and lets instructors see the code for all develop and submission runs of each student. But more logging and more compact representations may be possible ahead.
Detecting similar submissions, not only across a class, but across terms, or with solutions on the internet. zyLabs already provides a built-in similarity checker across a class.
Auto-generating problems so that each student gets a unique problem, and/or so that students can get more practice.
Using auto-graders not just for weekly programs but also for exams. zyLabs is already used in hundreds of courses for exams, and in the future may provide even more support.
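As one illustration of the similarity-detection direction listed above, pairwise comparison within a class can be sketched with Python's standard `difflib` module. This is not the method used by the zyLabs similarity checker, which the paper does not describe. The `flag_similar_pairs` function, the 0.9 threshold, and the sample submissions are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(code_a: str, code_b: str) -> float:
    """Return a character-level similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, code_a, code_b).ratio()


def flag_similar_pairs(submissions: dict, threshold: float = 0.9) -> list:
    """Return (student, student, score) for each pair at or above threshold.

    Assumption: the threshold is arbitrary, and flagged pairs would still
    need human review before any conclusion is drawn.
    """
    flagged = []
    for (name_a, code_a), (name_b, code_b) in combinations(
        sorted(submissions.items()), 2
    ):
        score = similarity(code_a, code_b)
        if score >= threshold:
            flagged.append((name_a, name_b, score))
    return flagged


# Hypothetical submissions keyed by student identifier.
submissions = {
    "student_a": "x = int(input())\nprint(x * 2)\n",
    "student_b": "x = int(input())\nprint(x * 2)\n",
    "student_c": "for i in range(10):\n    total = total + i\n",
}

for name_a, name_b, score in flag_similar_pairs(submissions):
    print(name_a, name_b, round(score, 2))
```

Comparing across terms or against solutions on the internet, as suggested above, would require a much larger comparison set and likely a more scalable technique than all-pairs comparison.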

Summary

In summary, the data suggests that auto-grading is a major trend in programming courses, with continued rapid growth and increasing usage, which can positively transform class pedagogy. The maturing of auto-graders means that new higher-level tools and techniques may begin to grow in coming years.