A B-TREE ASSIGNMENT THAT IS REALISTIC ENOUGH THAT STUDENTS CAN LIST IT ON THEIR RESUMES

James S. Plank
EECS Department
University of Tennessee
Knoxville, TN 37996
865-974-4397
plank@cs.utk.edu

Citation information for this paper: The Journal of Computing Sciences in Colleges. Volume 31, #2, December, 2015, pages 278-282.

MOTIVATION

High-level languages like Python and Java, and even C++, abstract away so many details of what goes on “beneath the hood” when a computer program is running that it becomes a challenge to open students' eyes to the fascinating interplay between algorithms and computer systems. In particular, the C++ Standard Template Library does such a good job with vectors, sets and maps that it is hard to impart to students the subtle impact of issues such as memory management, which can profoundly affect the performance of a program. Most Computer Science curricula focus on the theory and abstract mechanics of algorithms, and then move on to the design principles of operating systems. In my experience, students do not get enough hands-on experience with the bits and bytes of their computer systems, and with how the algorithms that they learn in the abstract interact with these bits and bytes.

One data structure that we teach our sophomores/juniors, almost as an aside, is the B-Tree [1]. This is a balanced tree data structure, which like other balanced trees (for example AVL trees, red-black trees, splay trees) has logarithmic time queries, insertions and deletions. However, the B-Tree is designed to work in an out-of-core environment, where the relevant data is stored on disk, and therefore the most expensive operations are reads and writes of disk sectors, rather than CPU cycles.

From my experience both learning B-Trees as an undergraduate in the 1980s and teaching them for the past 20 years, I believe that when we teach B-Trees to students, they do not intuitively understand how brilliantly B-Trees achieve their goals. The reason is that when we teach them, we limit our examples to toy scenarios applicable to pencil and paper.
OVERVIEW

To address this motivation, I have developed a programming assignment that revolves around B-Trees. In this assignment, I give the students a library that presents them with a simulated raw disk interface. This is implemented on top of standard Unix files. The interface is realistic, allowing the students only to read and write 1K disk sectors in their entirety.

On top of this interface, the students write a second library that implements B-Trees. The B-Trees are stored on the disks in a specific and very realistic format:

• The first block of the disk is a “boot block” which contains information about the keys that the B-Tree stores, and how many disk blocks the B-Tree currently consumes.
• The B-Trees store keys, which are a fixed number of bytes, and data, which are whole disk sectors.
• The nodes of the B-Trees hold keys, and disk block addresses, which are pointers either to other B-Tree nodes (if the node is an internal node), or to data (if the node is an external node).
• The number of keys that a node may hold is fixed, and is determined by the size of the keys. In the assignment, keys may range from four bytes to 254 bytes.

I provide multiple programs that use the procedures of the students' libraries to build and test a variety of B-Trees. Some of these are designed to help the students debug, and others are designed to test their implementation. Because the B-Trees adhere to a specific format, the trees themselves must be interoperable between the students' implementations and mine.

EXAMPLE

To help illustrate, I draw an example B-Tree in Figure 1. In this example, B-Tree nodes hold up to four keys each. The root of the tree is held in disk block 8 (identified in the boot block of the disk), and holds one key. Therefore, it is an internal node with two pointers to other nodes, which are in disk blocks 1 and 7 respectively. Those two nodes are external nodes, which also hold up to four keys, and they store pointers to the data. For example, the external node that holds keys 1 through 4 points to data in disk blocks 4, 11, 9, 2, and 6.

Figure 1: Example B-Tree structure: Internal nodes, external nodes and data each fit into disk sectors, which must be read and written in their entirety.
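
The dependence of node capacity on key size can be made concrete with a little arithmetic. The Python sketch below is only an illustration under stated assumptions: the two-byte node header and the four-byte block addresses are hypothetical choices, not the assignment's documented format, and `max_keys` is a made-up helper name. The only facts taken from the text are the 1K sector size, the key-size range of four to 254 bytes, and the pattern (seen in the example) of a node with n keys carrying n + 1 block addresses.

```python
SECTOR_SIZE = 1024   # from the paper: 1K sectors, read and written whole
HEADER_BYTES = 2     # assumption: one flag byte plus one key-count byte
PTR_BYTES = 4        # assumption: 4-byte disk block addresses


def max_keys(key_size):
    """Keys per node when a node holds n keys and n + 1 block addresses."""
    if not 4 <= key_size <= 254:   # key-size range given in the paper
        raise ValueError("key size must be between 4 and 254 bytes")
    # One extra address is reserved up front; each key then costs its own
    # bytes plus one address.
    return (SECTOR_SIZE - HEADER_BYTES - PTR_BYTES) // (key_size + PTR_BYTES)


for size in (4, 64, 200, 254):
    print(size, max_keys(size))
```

Under these assumed sizes, 4-byte keys give 127 keys per node while 254-byte keys give only 3, which shows how strongly the key size drives the fan-out, and therefore the height, of the tree.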
The B-Tree structure should be apparent from the figure; however, all of the components are stored on disk sectors that must be read and written in their entirety. It is the job of the students to implement B-Tree operations that work within this very realistic structure.

ALGORITHMS AND SYSTEMS

A strength of this assignment is that it stresses both algorithms and systems. Consider the steps that the student's library must perform to find a key and read its data:

• It must read the boot block and interpret it to figure out the key size and the location of the root node.
• It must, for every internal node from the root down to the appropriate leaf node, read the node's block from disk, convert it to an internal format that represents the B-Tree node, and then use that node to find the next appropriate node.
• When it gets to a leaf node, it must read the disk block corresponding to the data.

Of course, the write path is more challenging, as it can involve “splitting” both external and internal nodes when they accumulate too many keys.

Therefore, the students must not only understand the theoretical mechanics of B-Trees, but also address the challenge of converting 1K blocks of bytes into B-Tree structure, perhaps modifying that structure, and then converting it back to 1K blocks.

EXPERIENCE

I developed this assignment for a junior/senior elective course entitled “Advanced Programming and Algorithms.” The prerequisite course is our third semester programming course, “Data Structures/Algorithms II.” The students programmed this assignment in C, although C++ (with no STL) would work, too. No linguistic data structure support is required, because the main data structure (the B-Tree) is held on disk.

In the spring of 2015, I allotted the students three weeks to do this assignment. Were I teaching it in an Operating Systems or Systems Programming course, I would probably give them two weeks, and perhaps partition it into two week-long assignments to help them budget their time.
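
To make the lookup steps listed under ALGORITHMS AND SYSTEMS concrete, here is a minimal Python sketch of the read path, including the conversion from 1K blocks to in-memory nodes. It is not the assignment's code (the students worked in C), and nearly everything in it is an assumption made for illustration: the `SimDisk` class stands in for the raw disk library and keeps its sectors in memory rather than in a Unix file, the boot block and node byte layouts are invented, the last address in an external node is simply ignored, and the three-node tree at the bottom is hypothetical.

```python
import struct
from bisect import bisect_left, bisect_right

SECTOR = 1024  # from the paper: sectors are 1K and are read or written whole


class SimDisk:
    """In-memory stand-in for the raw disk library (hypothetical interface)."""

    def __init__(self, nsectors):
        self.sectors = [bytes(SECTOR)] * nsectors
        self.reads = 0                      # count reads, the expensive operation

    def read(self, lba):
        self.reads += 1
        return self.sectors[lba]

    def write(self, lba, data):
        assert len(data) == SECTOR, "only whole sectors may be written"
        self.sectors[lba] = bytes(data)


def pad(raw):
    return raw.ljust(SECTOR, b"\0")         # fill the rest of the sector with zeros


def pack_node(internal, keys, ptrs):
    # Assumed layout: flag byte, key count, keys, then len(keys) + 1 addresses.
    head = bytes([int(internal), len(keys)]) + b"".join(keys)
    return pad(head + struct.pack("<%dI" % len(ptrs), *ptrs))


def unpack_node(sector, key_size):
    internal, n = bool(sector[0]), sector[1]
    keys = [sector[2 + i * key_size: 2 + (i + 1) * key_size] for i in range(n)]
    ptrs = struct.unpack_from("<%dI" % (n + 1), sector, 2 + n * key_size)
    return internal, keys, list(ptrs)


def find(disk, key):
    """Return the data sector stored under key, or None if key is absent."""
    # Assumed boot block: key size, root block address, blocks in use.
    key_size, root, _used = struct.unpack_from("<III", disk.read(0))
    lba = root
    while True:
        internal, keys, ptrs = unpack_node(disk.read(lba), key_size)
        if internal:
            lba = ptrs[bisect_right(keys, key)]   # in this sketch, equal keys go right
            continue
        i = bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            return disk.read(ptrs[i])             # final read: the data itself
        return None


def k(n):
    return struct.pack(">I", n)  # big-endian so byte order matches numeric order


# Hypothetical tree: boot block in 0, root in block 1, leaves in blocks 2 and 3.
disk = SimDisk(16)
disk.write(0, pad(struct.pack("<III", 4, 1, 8)))
disk.write(1, pack_node(True, [k(30)], [2, 3]))
disk.write(2, pack_node(False, [k(10), k(20)], [4, 5, 0]))
disk.write(3, pack_node(False, [k(30), k(40)], [6, 7, 0]))
for lba, text in [(4, b"ten"), (5, b"twenty"), (6, b"thirty"), (7, b"forty")]:
    disk.write(lba, pad(text))
```

With this tree, `find(disk, k(30))` returns the sector whose leading bytes are `thirty`, and `find(disk, k(25))` returns `None`. A successful lookup in this two-level tree costs four sector reads (boot block, root, leaf, data), which is the cost that matters in an out-of-core setting.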
The assignment was a challenge to the students. The major stumbling blocks were the conversion from 1024-byte chunks on disk to in-memory data structures and back again, and doing the actual splitting of external and internal nodes. These issues were challenges for our students because they have limited experience with the bits and bytes of their computers, having cut their teeth on general-purpose data structures like those in the STL.

The students reported a great deal of satisfaction on completing this assignment. Several of them put the assignment as a line item on their resumes, “Implemented B-Trees on a Raw Disk Interface,” which communicates that they have some of the low-level experience that many of our local employers (for example Cisco and Cadre5) crave in their hires.

AVAILABILITY AND APPLICABILITY

The assignment, plus all supporting software, is available at http://web.eecs.utk.edu/~plank/plank/classes/cs494/494/labs/Lab-1-Btree. As with all of my programming courses, I developed automatic grading software that applies 100 test cases to the students' libraries (both alone and in conjunction with my programs). This helps them develop, debug and test their code. All of the code and scripts may be ported to other systems with minimal effort.

This assignment would be appropriate in Computer Science and Computer Engineering curricula in the following places:

• Systems Programming: At the University of Tennessee, we require our CS majors to take a Systems Programming course whose focus is on low-level programming above the Unix operating system. This course dovetails beautifully with standard operating systems courses, and would be an appropriate place for this assignment.
• Operating Systems: One of the strengths of this assignment is that it delivers the “flavor” of an Operating Systems assignment without all of the software (and sometimes hardware) scaffolding that is required for a real OS assignment.
• Group Project for an Algorithms Class: Although the “systems” flavor of this assignment makes it difficult for an Algorithms class, it would make a good group project.

REFERENCES

[1] Weiss, M. A., Data Structures and Algorithm Analysis in C++, 3rd Edition, Boston, MA: Addison Wesley, 2007.