Human vs. Automated Coding Style Grading in Computing Education

Conference

2019 ASEE Annual Conference & Exposition

Location

Tampa, Florida

Publication Date

June 15, 2019

Start Date

June 15, 2019

End Date

June 19, 2019

Conference Session

Technical Session 11: Topics related to Computer Science

Tagged Division

Computers in Education

Page Count

13

DOI

10.18260/1-2--32906

Permanent URL

https://strategy.asee.org/32906

Download Count

430

Paper Authors

James Perretta University of Michigan

James Perretta is currently pursuing a master's degree in Computer Science at the University of Michigan, where he also develops automated grading systems. His research interests and prior work focus on using automated grading systems and feedback policies to enhance student learning.

Westley Weimer University of Michigan

Andrew DeOrio University of Michigan ORCID: orcid.org/0000-0001-5653-5109

Andrew DeOrio is a teaching faculty member at the University of Michigan and a consultant for web and machine learning projects. His research interests are in ensuring the correctness of computer systems, including medical and IoT devices and digital hardware, as well as engineering education. In addition to teaching software and hardware courses, he teaches Creative Process and works with students on technology-driven creative projects. His teaching has been recognized with the Provost's Teaching Innovation Prize, and he has twice been named Professor of the Year by the students in his department.

Abstract

Computer programming courses often evaluate student coding style manually. Static analysis tools provide an opportunity to automate this process. In this paper, we explore the tradeoffs between human style graders and general-purpose static analysis tools for evaluating student code. We investigate the following research questions:

- Are human coding style evaluation scores consistent with the output of static analysis tools?
- Which style grading criteria are best evaluated with existing static analysis tools, and which are more effectively evaluated by human graders?

We analyze data from a second-semester programming course with 943 enrolled students at a large research institution. Hired student graders evaluated student code against rubric criteria such as “Lines are not too long” or “Code is not too deeply nested.” We also ran several static analysis tools on the same student code to evaluate the same criteria. We then analyzed the correlation between the number of static analysis warnings and the human style grading score for each criterion.
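To make the comparison concrete, the following is a minimal sketch, not the authors' actual pipeline, of how one objective criterion such as line length could be turned into a per-submission warning count. The 90-character limit, the one-directory-per-student layout, and the function names are assumptions for illustration.

    from pathlib import Path

    MAX_LINE_LENGTH = 90  # hypothetical limit; the course's actual threshold is not stated

    def count_long_lines(source_file: Path) -> int:
        # Count the lines in one source file that exceed the length limit.
        text = source_file.read_text(errors="replace")
        return sum(1 for line in text.splitlines() if len(line) > MAX_LINE_LENGTH)

    def warnings_per_submission(submissions_dir: Path) -> dict:
        # Map each student directory name to its total line-length warning count.
        return {
            student_dir.name: sum(count_long_lines(f) for f in student_dir.glob("*.cpp"))
            for student_dir in submissions_dir.iterdir()
            if student_dir.is_dir()
        }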

In our preliminary results, we see that static analysis tools tend to be more effective at evaluating objective code style criteria. We found a weak negative or no correlation between human style grading scores and the number of static analysis warnings; note that we expect student code with more static analysis warnings to receive fewer human style grading points. When comparing the “Lines are not too long” human style grading criterion to a related line-length static analysis inspection, we see a Pearson correlation coefficient of r = -0.21. We also see trends in the distributions of human style grading scores that suggest human graders perform inconsistently. For example, 50% of students who received full human style grading points for the line-length criterion had 3 or more static analysis warnings from a related line-length inspection. Additionally, 23% of students who received no points on the same criterion had no static analysis warnings for the line-length inspection.
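As a sketch of the correlation step described above, the snippet below computes a Pearson coefficient between rubric scores and warning counts with scipy; the two lists are hypothetical placeholder data, not the study's dataset.

    from scipy.stats import pearsonr

    # Hypothetical placeholder data: one entry per student submission.
    human_scores = [2, 2, 1, 0, 2, 1, 0, 2]      # rubric points for the line-length criterion
    warning_counts = [0, 3, 4, 7, 1, 5, 2, 0]    # line-length warnings from a static analysis tool

    r, p_value = pearsonr(human_scores, warning_counts)
    print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")
    # A negative r means that more warnings tend to accompany lower human scores.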

We also found that some code style criteria are not well suited to the general-purpose static analysis tools we investigated. For example, none of the static analysis tools we investigated provide a robust way of evaluating the quality of variable and function names in a program. Some tools provide an inspection for detecting variable names that are shorter than a user-specified length threshold; however, this inspection fails to identify low-quality variable names that happen to be longer than the minimum allowed length. Furthermore, there are some common scenarios where a short variable name is acceptable by convention.
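The sketch below imitates the kind of length-threshold naming inspection described above; it is a generic stand-in rather than any particular tool's implementation, and it illustrates both failure modes: vague names that are long enough pass, and conventional short names must be allowlisted.

    MIN_NAME_LENGTH = 3                              # hypothetical threshold
    ALLOWED_SHORT_NAMES = {"i", "j", "k", "x", "y"}  # loop counters and coordinates, by convention

    def flag_short_names(names):
        # Warn only about names below the threshold that are not conventional short names.
        return [n for n in names
                if len(n) < MIN_NAME_LENGTH and n not in ALLOWED_SHORT_NAMES]

    # 'n' is flagged, but the vague names 'temp1' and 'thing2' pass unnoticed.
    print(flag_short_names(["i", "n", "temp1", "thing2", "row_count"]))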

Static analysis tools have the benefit of integration with an automated grading system, facilitating faster and more frequent feedback than human grading. The literature suggests that frequent feedback encourages students to actively improve on their work (Spacco & Pugh, 2006). There is also evidence to suggest that increased engagement is most beneficial to students with less experience (Carini et al., 2006). Our results suggest that automated code quality evaluation could be one tool that benefits student learning in introductory CS courses, helping most those students with the least access to pre-college CS training.
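As one illustration of that integration, the following minimal sketch returns an immediate feedback message from a line-length check; the message wording, the five-warning allowance, and the assumption of a directory of C++ files are ours, not the authors' autograder.

    from pathlib import Path

    MAX_LINE_LENGTH = 90   # hypothetical limit, matching the earlier sketch
    MAX_WARNINGS = 5       # hypothetical allowance before the check is marked as failing

    def style_feedback(student_dir: Path) -> str:
        # Count long lines across all C++ files and return a short feedback message.
        total = sum(
            1
            for src in student_dir.glob("*.cpp")
            for line in src.read_text(errors="replace").splitlines()
            if len(line) > MAX_LINE_LENGTH
        )
        verdict = "passed" if total <= MAX_WARNINGS else "needs revision"
        return f"Line-length check {verdict}: {total} line(s) exceed {MAX_LINE_LENGTH} characters."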

References

- Carini, R. M., Kuh, G. D., & Klein, S. P. (2006). Student engagement and student learning: Testing the linkages. Research in Higher Education, 47(1), 1-32.
- Spacco, J., & Pugh, W. (2006). Helping students appreciate test-driven development (TDD). In Proceedings of OOPSLA (pp. 907-913).

Perretta, J., & Weimer, W., & Deorio, A. (2019, June), Human vs. Automated Coding Style Grading in Computing Education. Paper presented at 2019 ASEE Annual Conference & Exposition, Tampa, Florida. 10.18260/1-2--32906

ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2019 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference.