in CS

Automated Program Grading
Andy Wildenberg, Jackie Baldwin, David Baur
Cornell College, Mount Vernon, IA
Christelle Scharff, Olly Gotel
Pace University, New York, NY

Outline
• Systems for Automated Assessment of Programming Assignments
• WeBWorK
• JUnit-based program fragment grader
• Conclusions and Future Work

Systems for Automated Assessment of Programming Assignments
• Web-based systems
• Programming as the first skill a computer science undergraduate is expected to master
• To reinforce and improve students’ understanding of programming
• Types of problems
  – True / false, matching, multiple-choice, program writing
• Grading
  – Correctness + authenticity + quality

Existing Systems
• Boss: www.dcs.warwick.ac.uk/boss
• CodeLab: www.turingscraft.com
• CourseMarker: www.cs.nott.ac.uk/CourseMarker
• Gradiance: www.gradiance.com
• MyCodeMate: www.mycodemate.com
• OWL: owl.course.com
• Viope: www.viope.com

WeBWorK
• webwork.rochester.edu
• Web-based, automated problem generation, delivery and grading system
• Free, open-source project funded by NSF
• Initial development and applications in the fields of mathematics and physics
• Currently in use at more than 50 colleges and universities

WeBWorK
• Problems are written in the Problem Generating macro language (PG)
  – Text, HTML, LaTeX, Perl
• Underlying engine dedicated to dealing with mathematical formulae
  – x+1 = (x^2-1)/(x-1) = x+sin(x)^2+cos(x)^2
• Individualized and parameterized versions of problems

WeBWorK for Programming Fundamentals
• Programming fundamentals [CC2001]
  – Fundamental programming constructs, algorithms and problem solving, elementary data structures, recursion, event-driven programming
• Extension of WeBWorK for use in the core courses of the Computer Science Curriculum
• Interface WeBWorK with other tools to facilitate grading of new problem types
• Demo site:
  – webwork.cornellcollege.edu/webwork2/csc213Apr07
  – atlantis.seidenberg.pace.edu/webwork2/demo
• Work funded by NSF grant

Types of WeBWorK Programming Problems
• True / false, matching and multiple-choice problems for Java, Python and SML
• Sample problems designed from textbook (with permission)
  – Java Software Solutions: Foundations of Program Design (4th Edition), John Lewis and William Loftus, 2004
• Evaluation of Java programs / fragments by interfacing WeBWorK with JUnit [www.junit.org]

Evaluation of Java Fragments
• Want to provide a system that can automatically grade program fragments in real time
  – Individual lines of a program
  – Single or multiple methods
  – Full .java file

Goals of System
• Real-time, intelligent grading
  – More gentle than ACM contest standards
• Relative ease of authoring new problems
• Using standard tools and techniques

A Sample Session
[Screenshot slides: Top Level; Blank Screen; Entered but not Submitted; First Response; Corrected but not Submitted; Acknowledgement of Correct Answer; Syntax Error]

Components of a Problem
• PG file to specify a problem
  – All problems in WeBWorK are specified in PG
    • Code to typeset the question and compute an answer
    • Answer evaluator determines if answer matches
  – We provide a new evaluator that calls JUnit
• Template file
  – When the correct answer is inserted, forms a valid .java file
• JUnit test file
  – Provides a series of JUnit tests to assess the response

PG Problem
    DOCUMENT(); # This should be the first executable line in the problem.
    loadMacros("PG.pl",
               "PGbasicmacros.pl",
               "PGchoicemacros.pl",
               "PGanswermacros.pl",
               "PGauxiliaryFunctions.pl",
               "javaAnswerEvaluators.pl");
    TEXT("Boolean Operator");
    BEGIN_TEXT
    $PAR
    Write a static method named 'flip' of return type 'boolean' which will take a single boolean parameter and simply return its opposite.
    $BR
    \{ANS_BOX(1,5,60);\}
    END_TEXT
    ANS(java_cmp("JavaSampleSet/BoolOp/","BoolOp"));
    ENDDOCUMENT(); # This should be the last executable line in the problem.
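For orientation, a response that should satisfy this problem statement could look like the sketch below. It is not the authors' reference answer: the parameter name `value` is an assumption, and the enclosing class is included (without `public`) only so the fragment compiles on its own under any file name. A student would submit just the method.

```java
// Illustrative only: a student would type just the flip method into the answer box.
class BoolOp {
    // Returns the logical opposite of its single boolean parameter.
    public static boolean flip(boolean value) {
        return !value;
    }
}
```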
Template File
    public class BoolOp {
        replaceme
    }
• Note that the student response will replace replaceme.

JUnit Test File
    import java.lang.reflect.InvocationTargetException;
    import java.lang.reflect.Method;
    import java.lang.reflect.Modifier;
    import java.util.Random;
    import junit.framework.*;

    public class BoolOpJUnitTest extends TestCase {
        // Results of introspection on the student's method signature
        boolean exists, returntype, paramtype, isStatic;
        Method flip;

        public static Test suite() {
            return new TestSuite(BoolOpJUnitTest.class);
        }

        public static void main(String[] args) {
            BoolOpJUnitTest bojunit = new BoolOpJUnitTest();
        }

        // (class body continues on the following slides)

Introspection to Avoid Unnecessary Errors
• In the setup method, set instance variables to show the result of introspection on the method signature
  – exists = there is a method named flip
  – isStatic = it is static
  – isPublic = it is public
  – isPrivate = it is private
  – returntype = has correct return type (boolean)
  – paramtype = has correct parameters (boolean)

JUnit Tests for Method Signature
        public void testExists() {
            Assert.assertTrue("Creating the method", exists);
        }

        public void testStatic() {
            Assert.assertTrue("Making the method static", isStatic);
        }

        public void testReturnType() {
            Assert.assertTrue("Making the method return type 'boolean'", returntype);
        }

        public void testParamType() {
            Assert.assertTrue("Making the method take one 'boolean' parameter", paramtype);
        }

JUnit Tests for Correct Functionality
        public void testWorks() {
            boolean works = false;
            // Only invoke flip if the signature checks passed.
            if (exists && returntype && paramtype && isStatic) {
                try {
                    // flip(false) must be true
                    Boolean testBool = Boolean.valueOf(false);
                    Object[] args = {testBool};
                    Boolean result = (Boolean) flip.invoke(BoolOp.class, args);
                    works = result.booleanValue();
                    // flip(true) must be false
                    Object[] args2 = {result};
                    Boolean result2 = (Boolean) flip.invoke(BoolOp.class, args2);
                    works = works && !result2.booleanValue();
                } catch (Exception e) {
                    Assert.assertTrue("Exception: <BR>" + e.getCause(), false);
                }
            }
            Assert.assertTrue("Making the method return the opposite of its parameter", works);
        }
    } // end of BoolOpJUnitTest

General Execution Flow
• User’s question is displayed by WeBWorK
• User enters answer and submits
• Temporary directory is created, containing:
  – Template file with user response inserted
  – JUnit test file
• Both .java files are compiled (syntax errors reported)
• JUnit tests are run
• User score is the percentage of tests that pass

User Sandbox
• User code is run in a very tight sandbox:
  – Permissions set in .policy file
  – File permissions on a per-directory level
  – Programs run in a separate thread and are killed on timeout (aka CPU_LIMIT)
  – Java is executed with low, hard stack and heap limits

Early Results
• Pilot at Pace University
  – CS1/CS2
  – Higher-level course actually designing new problems to help teach JUnit
• Pilot at Cornell College
  – Used very briefly in CS1.5
  – Will use more in CS2.5

(Positive) Feedback on JUnit Extension
• Students liked being able to test interactively
• Students missed IDE features
  – Syntax coloring: found silly syntax errors distracting
  – Some used IDEs to preview their answer
• Preferred WeBWorK/JUnit to CodeLab
• Became more helpful the longer students used it

(Negative) Feedback on JUnit Extension
• HCI issues
• Found question language rough/confusing
• Want even more detail/feedback/guidance on errors
• Tendency to fight the system
  – One student spent 60+ minutes submitting Flip

Future Directions (1)
• Need to further refine feedback
• Need to develop a full set of problems
  – Problems often text-specific
• Check style as well as correctness
• Quality control/service for AP

Future Directions (2)
• Unit testing is not just for Java
  – Same architecture works for most languages
  – Edit the “system” call in Java.pm
  – Provide appropriate sandbox
  – Write XML output for xUnit
• Interface with other CMSes

Summary
• Implemented system to test Java program fragments in real time via the web
• Part of a larger project to provide auto-grading support for CS1/2
  – Rest of project ready for prime time (Java, Python)
• Already in use at Pace, Cornell (a bit)

Acknowledgement
• NSF CCLI AI Grant #0511385
• Collaborative Research: Adapting and Extending WeBWorK for Use in the Computer Science Curriculum

Demo Site
• http://webwork.cornellcollege.edu/webwork2
• Login as student0/student0 … student9/student9
• Choose csc213Apr07 class
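
Note on the introspection step: the slides describe the setup method under "Introspection to Avoid Unnecessary Errors" but do not show its code. The sketch below is one way such checks could be written with java.lang.reflect; it is an assumption, not the authors' implementation. The helper class `FlipSignature` is hypothetical (in the slides the flags are instance fields of the JUnit test class), and the `BoolOp` class here is a stand-in for the compiled template with a student answer inserted.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Stand-in for the template file after a student answer has been inserted (assumed).
class BoolOp {
    public static boolean flip(boolean value) {
        return !value;
    }
}

// Hypothetical helper: records what reflection finds about a method named "flip".
class FlipSignature {
    boolean exists, isStatic, isPublic, returntype, paramtype;
    Method flip;

    FlipSignature(Class<?> target) {
        // Search by name only, so a wrong signature is reported rather than thrown.
        for (Method m : target.getDeclaredMethods()) {
            if (m.getName().equals("flip")) {
                flip = m;
                break;
            }
        }
        exists = flip != null;
        if (exists) {
            int mods = flip.getModifiers();
            isStatic = Modifier.isStatic(mods);
            isPublic = Modifier.isPublic(mods);
            returntype = flip.getReturnType() == boolean.class;
            Class<?>[] params = flip.getParameterTypes();
            paramtype = params.length == 1 && params[0] == boolean.class;
        }
    }

    public static void main(String[] args) {
        FlipSignature sig = new FlipSignature(BoolOp.class);
        System.out.println("exists=" + sig.exists + " isStatic=" + sig.isStatic
                + " returntype=" + sig.returntype + " paramtype=" + sig.paramtype);
    }
}
```

Looking the method up by name instead of calling getMethod with an exact signature lets each flag fail independently, which matches the one-assertion-per-property tests shown on the method signature slide.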