A B-TREE ASSIGNMENT THAT IS REALISTIC ENOUGH THAT STUDENTS CAN LIST IT ON THEIR RESUMES James S. Plank

EECS Department
University of Tennessee
Knoxville, TN 37996
865-974-4397
plank@cs.utk.edu
Citation information for this paper:
The Journal of Computing Sciences in Colleges.
Volume 31, #2, December, 2015, pages 278-282.
MOTIVATION
High level languages like Python and Java, and even C++, abstract away so
many details of what goes on “beneath the hood” when a computer program is
running, that it becomes a challenge to open students' eyes to the fascinating
interplay between algorithms and computer systems. In particular, the C++
Standard Template Library does such a good job with vectors, sets and maps, that
it is hard to impart to students the subtle impact of issues such as memory
management, that can profoundly affect the performance of a program. Most
Computer Science curricula focus on the theory and abstract mechanics of
algorithms, and then move on to the design principles of operating systems. In my
experience, students do not get enough hands-on experience with the bits and
bytes of their computer systems, and how the algorithms that they learn in the
abstract interact with these bits and bytes.
One data structure that we teach our sophomores/juniors, almost as an aside,
is the B-Tree [1]. This is a balanced tree data structure, which like other balanced
trees (for example AVL trees, red-black trees, splay trees) has logarithmic time
queries, insertions and deletions. However, the B-Tree is designed to work in an
out-of-core environment, where the relevant data is stored on disk, and therefore
the most expensive operations are reads and writes of disk sectors, rather than CPU
cycles.
From my experience both learning B-Trees as an undergraduate in the
1980s and teaching them for the past 20 years, I believe that when we teach
B-Trees to students, they do not intuitively understand how brilliantly B-Trees
achieve their goals. The reason is that when we teach them, we limit our examples
to toy scenarios applicable to pencil and paper.
OVERVIEW
To address this motivation, I have developed a programming assignment that
revolves around B-Trees. In this assignment, I give the students a library that
presents them with a simulated raw disk interface. This is implemented on top of
standard Unix files. The interface is realistic, allowing the students only to read
and write 1K disk sectors in their entirety.
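To make the idea concrete, here is a minimal sketch of what a sector-only interface layered on an ordinary file might look like in C. The names (sim_disk, disk_create, disk_read, disk_write, disk_close) are hypothetical and chosen only for illustration; they are not the API of the library used in the assignment.

```c
#include <assert.h>
#include <stdio.h>

#define SECTOR_SIZE 1024  /* the 1K sector size described in the text */

/* Hypothetical handle for a simulated disk backed by an ordinary file. */
typedef struct {
    FILE *fp;
    unsigned long num_sectors;
} sim_disk;

/* Create a zero-filled disk file holding num_sectors sectors.
   Returns 0 on success, -1 on failure. */
int disk_create(sim_disk *d, const char *path, unsigned long num_sectors)
{
    static const unsigned char zeros[SECTOR_SIZE];
    unsigned long i;

    d->fp = fopen(path, "w+b");
    if (d->fp == NULL) return -1;
    d->num_sectors = num_sectors;
    for (i = 0; i < num_sectors; i++) {
        if (fwrite(zeros, 1, SECTOR_SIZE, d->fp) != SECTOR_SIZE) return -1;
    }
    return fflush(d->fp) == 0 ? 0 : -1;
}

/* Read exactly one whole sector. The interface offers no way to
   read part of a sector. Returns 0 on success, -1 on failure. */
int disk_read(sim_disk *d, unsigned long lba, unsigned char buf[SECTOR_SIZE])
{
    assert(d->fp != NULL);
    if (lba >= d->num_sectors) return -1;
    if (fseek(d->fp, (long)(lba * SECTOR_SIZE), SEEK_SET) != 0) return -1;
    return fread(buf, 1, SECTOR_SIZE, d->fp) == SECTOR_SIZE ? 0 : -1;
}

/* Write exactly one whole sector. Returns 0 on success, -1 on failure. */
int disk_write(sim_disk *d, unsigned long lba,
               const unsigned char buf[SECTOR_SIZE])
{
    assert(d->fp != NULL);
    if (lba >= d->num_sectors) return -1;
    if (fseek(d->fp, (long)(lba * SECTOR_SIZE), SEEK_SET) != 0) return -1;
    if (fwrite(buf, 1, SECTOR_SIZE, d->fp) != SECTOR_SIZE) return -1;
    return fflush(d->fp) == 0 ? 0 : -1;
}

/* Close the backing file. */
void disk_close(sim_disk *d)
{
    if (d->fp != NULL) fclose(d->fp);
    d->fp = NULL;
}
```

Because every call moves a full sector, any code built on top of such an interface has to do its own packing and unpacking of bytes within a 1024-byte buffer.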
On top of this interface, the students write a second library that implements
B-Trees. The B-Trees are stored on the disks in a specific, and very realistic
format:
• The first block of the disk is a “boot block” which contains
information about the keys that the B-Tree stores, and how many disk
blocks the B-Tree currently consumes.
• The B-Trees store keys, which are a fixed number of bytes, and data,
which are whole disk sectors.
• The nodes of the B-Trees hold keys, and disk block addresses, which
are pointers either to other B-Tree nodes (if the node is an internal
node), or to data (if the node is an external node).
• The number of keys that a node may hold is fixed, and is determined
by the size of the keys. In the assignment, keys may range from four
bytes to 254 bytes.
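The dependence of node capacity on key size can be sketched with a little arithmetic. The layout constants below are assumptions made for illustration only (a 2-byte node header and 4-byte disk block addresses, with one more address than keys); the assignment's actual on-disk format may differ.

```c
#include <assert.h>

#define SECTOR_SIZE 1024
#define NODE_HEADER_BYTES 2  /* assumed: 1 byte node type + 1 byte key count */
#define BLOCK_ADDR_BYTES 4   /* assumed: 32-bit disk block addresses */

/* Largest k such that the header, k keys, and k + 1 block addresses
   all fit within a single sector. */
int max_keys_per_node(int key_size)
{
    assert(key_size >= 4 && key_size <= 254);
    return (SECTOR_SIZE - NODE_HEADER_BYTES - BLOCK_ADDR_BYTES)
           / (key_size + BLOCK_ADDR_BYTES);
}
```

Under these assumptions, 4-byte keys allow 127 keys per node, while 254-byte keys allow only 3, so the same code must handle both very wide and very narrow nodes.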
I provide multiple programs that use the procedures of the students' libraries
to build and test a variety of B-Trees. Some of these are designed to help the
students debug, and others are designed to test their implementation. Because the
B-Trees adhere to a specific format, the trees themselves must be interoperable
between the students' implementations, and mine.
EXAMPLE
To help illustrate, I draw an example B-Tree in Figure 1. In this example,
B-Tree nodes hold up to four keys each. The root of the tree is held in disk block 8
(identified in the boot block of the disk), and holds one key. Therefore, it is an
internal node with two pointers to other nodes, which are in disk blocks 1 and 7
respectively. Those two nodes are external nodes, which also hold up to four keys,
and they store pointers to the data. For example, the external node that holds keys
1 through 4 points to data in disk blocks 4, 11, 9, 2, and 6.
Figure 1: Example B-Tree structure: Internal nodes, external nodes and data
each fit into disk sectors, which must be read and written in their entirety.
The B-Tree structure should be apparent from the figure; however all of the
components are stored on disk sectors that must be read and written in their
entirety. It is the job of the students to implement B-Tree operations that work
within this very realistic structure.
ALGORITHMS AND SYSTEMS
A strength of this assignment is that it stresses both algorithms and systems.
Consider the steps that the student's library must perform to find a key and read
its data:
• It must read the boot block and interpret it to figure out the key size
and the location of the root node.
• It must, for every internal node from the root down to the appropriate
leaf node, read the node's block from disk, and convert it to an
internal format that represents the B-Tree node, and then use that
node to find the next appropriate node.
• When it gets to a leaf node, it must read the disk block corresponding
to the data.
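The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the assignment's format: the boot block is assumed to hold the key size and root block address as two 32-bit values in host byte order; each node is assumed to have a 1-byte internal flag, a 1-byte key count, the keys, and then the block addresses; and internal keys are treated as separators (child i holds keys less than or equal to key i). An in-memory array of sectors (mem_disk, a hypothetical name) stands in for the disk so that the sketch is self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 1024

/* Hypothetical in-memory stand-in for a disk of whole sectors. */
typedef struct {
    unsigned char (*sectors)[SECTOR_SIZE];
    uint32_t count;
} mem_disk;

/* Copy one whole sector into buf; returns 0 on success, -1 on failure. */
static int read_sector(const mem_disk *d, uint32_t lba, unsigned char *buf)
{
    if (lba >= d->count) return -1;
    memcpy(buf, d->sectors[lba], SECTOR_SIZE);
    return 0;
}

/* Fetch a 32-bit value (host byte order) from an unaligned position. */
static uint32_t get_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Return the disk block holding the data for key, or 0 if the key is
   absent (block 0 is the boot block, so it never holds data). */
uint32_t btree_find(const mem_disk *d, const unsigned char *key)
{
    unsigned char blk[SECTOR_SIZE];
    uint32_t key_size, lba, max_keys;
    int depth;

    assert(d != NULL && key != NULL);

    /* Step 1: interpret the boot block (assumed layout). */
    if (read_sector(d, 0, blk) != 0) return 0;
    key_size = get_u32(blk);
    lba = get_u32(blk + 4);
    if (key_size < 4 || key_size > 254) return 0;
    max_keys = (SECTOR_SIZE - 6) / (key_size + 4);

    /* Step 2: walk from the root toward an external node. The depth
       bound guards against cycles in a corrupted tree. */
    for (depth = 0; depth < 64; depth++) {
        const unsigned char *keys, *ptrs;
        uint32_t i = 0, nkeys;
        int internal, cmp = 1;

        if (read_sector(d, lba, blk) != 0) return 0;
        internal = blk[0];
        nkeys = blk[1];
        if (nkeys > max_keys) return 0;
        keys = blk + 2;
        ptrs = blk + 2 + max_keys * key_size;

        /* Find the first key that is >= the search key. */
        while (i < nkeys &&
               (cmp = memcmp(key, keys + (size_t)i * key_size, key_size)) > 0)
            i++;

        if (internal) {
            lba = get_u32(ptrs + 4 * i);   /* descend into child i */
        } else if (i < nkeys && cmp == 0) {
            return get_u32(ptrs + 4 * i);  /* Step 3: data block address */
        } else {
            return 0;                      /* key not present */
        }
    }
    return 0;
}
```

A caller would then read the returned block to obtain the data. Note that a lookup touches one sector per level of the tree plus the boot block and the data block, which is exactly the cost that the B-Tree is designed to minimize.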
Of course, the write path is more challenging, as it can involve “splitting”
both external and internal nodes when they accumulate too many keys.
Therefore, the students need to understand not only the theoretical
mechanics of B-Trees, but they must also address the challenge of converting 1K
blocks of bytes into a B-Tree structure, perhaps modifying that structure, and then
converting it back to 1K blocks.
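As an illustration of this conversion, the following sketch unpacks a 1K block into an in-memory node and packs it back. The struct and function names (btree_node, node_unpack, node_pack) and the byte layout (a 1-byte internal flag, a 1-byte key count, the keys, then 4-byte block addresses in host byte order) are assumptions made for this example, not the assignment's specified format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 1024

/* Hypothetical in-memory form of a node. The arrays are sized for the
   worst case so that no dynamic allocation is needed. */
typedef struct {
    int internal;                     /* nonzero for an internal node */
    int nkeys;                        /* number of keys currently held */
    unsigned char keys[SECTOR_SIZE];  /* nkeys keys, key_size bytes each */
    uint32_t ptrs[SECTOR_SIZE / 4];   /* nkeys + 1 disk block addresses */
} btree_node;

/* Capacity under the assumed layout: 2 header bytes, k keys, and
   k + 1 four-byte addresses must fit in one sector. */
static int max_keys(int key_size)
{
    return (SECTOR_SIZE - 6) / (key_size + 4);
}

/* Convert a raw sector into the in-memory node. */
void node_unpack(btree_node *n, const unsigned char blk[SECTOR_SIZE],
                 int key_size)
{
    int mk = max_keys(key_size);

    n->internal = blk[0];
    n->nkeys = blk[1];
    assert(n->nkeys <= mk);
    memcpy(n->keys, blk + 2, (size_t)n->nkeys * (size_t)key_size);
    memcpy(n->ptrs, blk + 2 + mk * key_size,
           (size_t)(n->nkeys + 1) * sizeof(uint32_t));
}

/* Convert the in-memory node back into a raw sector. Unused bytes are
   zeroed so that the block contents are deterministic. */
void node_pack(const btree_node *n, unsigned char blk[SECTOR_SIZE],
               int key_size)
{
    int mk = max_keys(key_size);

    assert(n->nkeys >= 0 && n->nkeys <= mk);
    memset(blk, 0, SECTOR_SIZE);
    blk[0] = n->internal ? 1 : 0;
    blk[1] = (unsigned char)n->nkeys;
    memcpy(blk + 2, n->keys, (size_t)n->nkeys * (size_t)key_size);
    memcpy(blk + 2 + mk * key_size, n->ptrs,
           (size_t)(n->nkeys + 1) * sizeof(uint32_t));
}
```

Keeping the addresses at a fixed offset that depends only on the key size means that an insertion shifts keys and addresses within their own regions, and the node can be written back with a single sector write.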
EXPERIENCE
I developed this assignment for a junior/senior elective course entitled
“Advanced Programming and Algorithms.” The prerequisite course is our third
semester programming course, “Data Structures/Algorithms II.” The students
programmed this assignment in C, although C++ (with no STL) would work, too. No
linguistic data structure support is required, because the main data structure (the
B-Tree) is held on disk.
In the spring of 2015, I allotted the students three weeks to do this
assignment. Were I teaching it in an Operating Systems or Systems Programming
course, I would probably give them two weeks, and perhaps partition it into two
week-long assignments to help them budget their time.
The assignment was a challenge to the students. The major stumbling blocks
were the conversion from 1024-byte chunks on disk to in-memory data structures
and back again, and doing the actual splitting of external and internal nodes. These
issues were challenges for our students because they have limited experience with
the bits and bytes of their computers, having cut their teeth on general-purpose
data structures like the STL.
The students reported a great deal of satisfaction on completing this
assignment. Several of them put the assignment as a line item on their resumes
(“Implemented B-Trees on a Raw Disk Interface”), which communicates that they
have some of the low-level experience that many of our local employers (for
example Cisco, and Cadre5) crave in their hires.
AVAILABILITY AND APPLICABILITY
The assignment, plus all supporting software, is available at
http://web.eecs.utk.edu/~plank/plank/classes/cs494/494/labs/Lab-1-Btree. As
with all of my programming courses, I developed automatic grading software that
applies 100 test cases to the students' libraries (both alone and in conjunction with
my programs). This helps them develop their code, debug and test. All of the code
and scripts may be ported to other systems with minimal effort.
This assignment would be appropriate in Computer Science and Computer
Engineering curricula in the following places:
• Systems Programming: At the University of Tennessee, we require
our CS majors to take a Systems Programming course whose focus is
on low-level programming above the Unix operating system. This
course dovetails beautifully with standard operating systems courses,
and would be an appropriate place for this assignment.
• Operating Systems: One of the strengths of this assignment is that it
delivers the “flavor” of an Operating Systems assignment without all
of the software (and sometimes hardware) scaffolding that is required
for a real OS assignment.
• Group Project for an Algorithms Class: Although the “systems”
flavor of this assignment makes it difficult for an Algorithms class, it
would make a good group project.
REFERENCES
[1] Weiss, M. A., Data Structures and Algorithm Analysis in C++, 3rd Edition,
Boston, MA: Addison Wesley, 2007.