List of lectures
1 Introduction and Functional Programming
2 Imperative Programming and Data Structures
3 Environment
4 Evaluation
5 Object Oriented Programming
6 Macros and decorators
7 Virtual Machines and Bytecode
8 Garbage Collection and Native Code
9 Parallel and Distributed Computing
10 Logic Programming
11 Summary

TDDA69 Data and Program Structure
Parallel and Distributed Computing
Cyrille Berger

Lecture goal
- Learn about the concepts and challenges of distributed computing
- The impact of distributed programming on programming languages and their implementations

Lecture content
Parallel Programming
- Multithreaded Programming
- The States Problems and Solutions
- Atomic actions
- Language and Interpreter Design Considerations
- Single Instruction, Multiple Threads Programming
Distributed programming
- Message Passing
- MapReduce

Concurrent computing
In concurrent computing, several computations are executed at the same time.
- In parallel computing, all computation units have access to shared memory (for instance within a single process).
- In distributed computing, computation units communicate through message passing.

Benefits of concurrent computing
- Faster
- Responsiveness: interactive applications can perform two tasks at the same time: rendering, spell checking...
- Availability of services: load balancing between servers
- Controllability: tasks requiring certain preconditions can suspend and wait until the preconditions hold, then resume execution transparently

Disadvantages of concurrent computing
- Concurrency is hard to implement properly
- Safety: shared state is easy to corrupt
- Deadlock: tasks can wait indefinitely for each other
- Not always faster! Memory bandwidth and CPU cache are shared between computation units

Concurrent computing programming
Four basic approaches to computing:
- Sequential programming: no concurrency
- Declarative concurrency: streams in a functional language
- Message passing: with active objects, used in distributed computing
- Atomic actions: on a shared memory, used in parallel computing

Stream Programming in Functional Programming
No global state: functions only act on their input, so they are reentrant. Functions can then be executed in parallel, as long as they do not depend on the output of another function.

Parallel Programming

In parallel computing, several computations are executed at the same time and have access to shared memory.
[Diagram: several computation units connected to one shared memory]

SIMD, SIMT, SMT (1/2)
- SIMD: Single Instruction, Multiple Data. Elements of a short vector (4 to 8 elements) are processed in parallel.
- SIMT: Single Instruction, Multiple Threads. The same instruction is executed by multiple threads (from 128 to 3048 or more in the future).
- SMT: Simultaneous Multithreading. General purpose; different instructions are executed by different threads.

SIMD, SIMT, SMT (2/2)
Why the need for the different models?
- Flexibility: SMT > SIMT > SIMD
- Performance: SIMD > SIMT > SMT
Less flexibility gives higher performance, unless the lack of flexibility prevents accomplishing the task.

SIMD:
    PUSH [1,2,3,4]
    PUSH [4,5,6,7]
    VEC_ADD_4

SIMT:
    execute([1,2,3,4], [4,5,6,7],
            lambda a, b, ti: a[ti] = a[ti] + max(b[ti], 5))

SMT:
    a = [1,2,3,4]
    b = [4,5,6,7]
    ...
    Thread.new(lambda: a = a + b)
    Thread.new(lambda: c = c * b)

Single threaded vs Multithreaded
[Diagram comparing a single-threaded execution with a multithreaded one, where main forks sub0 ... subn and later joins them]

Multithreaded Programming

Multithreaded Programming Model
- Start with a single root thread
- Fork: to create concurrently executing threads
- Join: to synchronize threads
- Threads communicate through shared memory
- Threads execute asynchronously; they may or may not execute on different processors

A multithreaded example
    thread1 = new Thread(function() { /* do some computation */ });
    thread2 = new Thread(function() { /* do some computation */ });
    thread1.start();
    thread2.start();
    thread1.join();
    thread2.join();
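The example above is pseudo-code; a minimal runnable equivalent using Python's standard threading module might look like this (the worker body is a placeholder computation, not from the slides):

    import threading

    def computation(name):
        # placeholder for "do some computation"
        total = sum(range(1_000_000))
        print(name, "done:", total)

    # fork: create concurrently executing threads
    thread1 = threading.Thread(target=computation, args=("thread1",))
    thread2 = threading.Thread(target=computation, args=("thread2",))
    thread1.start()
    thread2.start()
    # join: synchronize with the root thread
    thread1.join()
    thread2.join()

Note that in CPython such threads will not speed up pure-Python computation because of the Global Interpreter Lock (discussed later); the example only illustrates the fork/join structure.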
The States Problems and Solutions

Global States and multi-threading
Example:
    var a = 0;
    thread1 = new Thread(function() { a = a + 1; });
    thread2 = new Thread(function() { a = a + 1; });
    thread1.start();
    thread2.start();
What is the value of a? This is called a race condition.

Atomic actions

Mutex
Mutex is short for Mutual exclusion. It is a technique to prevent two threads from accessing a shared resource at the same time.
Example:
    var a = 0;
    var m = new Mutex();
    thread1 = new Thread(function() { m.lock(); a = a + 1; m.unlock(); });
    thread2 = new Thread(function() { m.lock(); a = a + 1; m.unlock(); });
    thread1.start();
    thread2.start();
Now a is guaranteed to be 2.

Dependency
Example:
    var a = 1;
    var m = new Mutex();
    thread1 = new Thread(function() { m.lock(); a = a + 1; m.unlock(); });
    thread2 = new Thread(function() { m.lock(); a = a * 3; m.unlock(); });
    thread1.start();
    thread2.start();
What is the value of a? 4 or 6? The mutex guarantees exclusive access, but not the order of execution.

Condition variable
A condition variable is a set of threads waiting for a certain condition.
Example:
    var a = 1;
    var m = new Mutex();
    var cv = new ConditionVariable();
    thread1 = new Thread(function() {
        m.lock(); a = a + 1; cv.notify(); m.unlock(); });
    thread2 = new Thread(function() {
        cv.wait(); m.lock(); a = a * 3; m.unlock(); });
    thread1.start();
    thread2.start();
Now a = 6.

Deadlock
What might happen:
    var a = 0;
    var b = 2;
    var ma = new Mutex();
    var mb = new Mutex();
    thread1 = new Thread(function() {
        ma.lock(); mb.lock();
        b = b - 1; a = a - 1;
        ma.unlock(); mb.unlock(); });
    thread2 = new Thread(function() {
        mb.lock(); ma.lock();
        b = b - 1; a = a + b;
        mb.unlock(); ma.unlock(); });
    thread1.start();
    thread2.start();
thread1 waits for mb, thread2 waits for ma: neither can proceed.

Advantages of atomic actions
- Very efficient: less overhead, faster than message passing

Disadvantages of atomic actions
- Blocking: some threads have to wait
- Small overhead
- Deadlock
- A low-priority thread can block a high-priority thread
- A common source of programming errors

Language and Interpreter Design Considerations

Common mistakes
- Forget to unlock a mutex
- Race conditions
- Deadlocks
- Granularity issues: too much locking will kill the performance

Most programming languages have either:
- A guard object that will unlock a mutex upon destruction
- A synchronization statement:
    some_rlock = threading.RLock()
    with some_rlock:
        print("some_rlock is locked while this executes")

Race condition
Can we detect potential race conditions during compilation? In the Rust programming language:
    let mut data = vec![1u32, 2, 3];
    for j in 0..2 {
        thread::spawn(move || {
            for i in 0..2 {
                data[i] += 1;
            }
        });
    }
gives an error: "capture of moved value: `data`"

Safe Shared Mutable State in rust (1/3)
- Objects are owned by a specific thread
- Types can be marked with the Send trait, to indicate that the object can be moved between threads
- Types can be marked with the Sync trait, to indicate that the object can be accessed by multiple threads safely

Safe Shared Mutable State in rust (2/3)
    let mut data = Mutex::new(vec![1u32, 2, 3]);
    for j in 0..2 {
        let data = data.lock().unwrap();
        thread::spawn(move || {
            for i in 0..2 {
                data[i] += 1;
            }
        });
    }
gives an error: MutexGuard does not have the Send trait.

Safe Shared Mutable State in rust (3/3)
    let data = Arc::new(Mutex::new(vec![1u32, 2, 3]));
    for j in 0..2 {
        let data = data.clone();
        thread::spawn(move || {
            let mut data = data.lock().unwrap();
            for i in 0..2 {
                data[i] += 1;
            }
        });
    }
Arc has the Sync trait, meaning we can now move a handle to the data into each thread.
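Python cannot enforce this discipline at compile time the way Rust's Send/Sync traits do; as a rough analogue, here is a minimal sketch guarding the same shared vector with threading.Lock (structure and names are illustrative only, not from the slides):

    import threading

    data = [1, 2, 3]
    lock = threading.Lock()  # plays the role of the Rust Mutex

    def increment():
        # the with-statement acts as a guard object:
        # the lock is released even if an exception occurs
        with lock:
            for i in range(2):
                data[i] += 1

    threads = [threading.Thread(target=increment) for _ in range(2)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(data)  # [3, 4, 3]

Here nothing stops a programmer from touching data without taking the lock; the safety is by convention, not checked by the compiler.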
Single Instruction, Multiple Threads Programming

With SIMT, the same instruction is executed by multiple threads on different registers.

Single instruction, multiple flow paths (1/2)
Using a masking system, it is possible to support if/else blocks: the threads always execute the instructions of both parts of the if/else block.
    data = [-2, 0, 1, -1, 2], data2 = [...]

    function f(thread_id, data, data2) {
        if (data[thread_id] < 0) {
            data[thread_id] = data[thread_id] - data2[thread_id];
        } else if (data[thread_id] > 0) {
            data[thread_id] = data[thread_id] + data2[thread_id];
        }
    }

Single instruction, multiple flow paths (2/2)
Benefits:
- Multiple flows are needed in many algorithms
Drawbacks:
- Only one flow path is executed at a time; non-running threads must wait
- Randomized memory access: elements of a vector are not accessed sequentially

Programming Language Design for SIMT
- OpenCL and CUDA are the most common
- Very low level, C/C++-derivative
- General purpose programming languages are not suitable
Some work has been done to allow writing CUDA kernels in Python, with limitations on the standard functions that can be called:
    @jit(argtypes=[float32[:], float32[:], float32[:]], target='gpu')
    def add_matrix(A, B, C):
        A[cuda.threadIdx.x] = B[cuda.threadIdx.x] + C[cuda.threadIdx.x]

Distributed programming

Distributed Programming (1/4)
In distributed computing, several computations are executed at the same time and communicate through message passing.
[Diagram: several computation units, each with its own memory, connected by a network]

Distributed programming (2/4)
A distributed computing application consists of multiple programs running on multiple computers that together coordinate to perform some task.
- Computation is performed in parallel by many computers
- Information can be restricted to certain computers
- Redundancy and geographic diversity improve reliability

Distributed programming (3/4)
Characteristics of distributed computing:
- Individual programs have differentiating roles
- Computers are independent: they do not share memory
- Coordination is enabled by messages passed across a network

Distributed programming (4/4)
Distributed computing for large-scale data processing:
- Databases respond to queries over a network
- Datasets can be partitioned across multiple machines

Message Passing

Message Passing
- Messages are (usually) passed through sockets
- Messages are exchanged synchronously or asynchronously
- Communication can be centralized or peer-to-peer

Python's Global Interpreter Lock
CPython can only interpret one single thread at a given time. The lock is released:
- when the current thread is blocking for I/O
- every 100 interpreter ticks
True multithreading is not possible with CPython.

Python's Multiprocessing module
The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. It implements transparent message passing, allowing Python objects to be exchanged between processes.

Python's Message Passing (1/2)
Example of message passing:
    from multiprocessing import Process

    def f(name):
        print('hello', name)

    if __name__ == '__main__':
        p = Process(target=f, args=('bob',))
        p.start()
        p.join()
Output:
    hello bob

Python's Message Passing (2/2)
Example of message passing with pipes:
    from multiprocessing import Process, Pipe

    def f(conn):
        conn.send([42, None, 'hello'])
        conn.close()

    if __name__ == '__main__':
        parent_conn, child_conn = Pipe()
        p = Process(target=f, args=(child_conn,))
        p.start()
        print(parent_conn.recv())
        p.join()
Output:
    [42, None, 'hello']
Transparent message passing is possible thanks to serialization.

Serialization
A serialized object is an object represented as a sequence of bytes that includes the object's data, its type and the types of data stored in the object.

pickle
In Python, serialization is done with the pickle module.
- It can serialize user-defined classes: the class definition must be available before deserialization
- Works with different versions of Python
- By default, uses an ASCII representation
- It can serialize:
  - Basic types: booleans, numbers, strings
  - Containers: tuples, lists, sets and dictionaries (of picklable objects)
  - Top-level functions and classes (only the name)
  - Objects where __dict__ or __getstate__() are defined
Example:
    pickle.loads(pickle.dumps(10))
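To illustrate the user-defined-class case, a minimal sketch (the Point class is invented for this example; its definition must be importable wherever the bytes are deserialized):

    import pickle

    class Point:
        # a user-defined class: pickle stores the instance's data and
        # the class name, not the class definition itself
        def __init__(self, x, y):
            self.x, self.y = x, y

    data = pickle.dumps(Point(1, 2))  # serialize to a sequence of bytes
    p = pickle.loads(data)            # rebuild the object from the bytes
    print(p.x, p.y)                   # 1 2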
Shared memory
Memory can be shared between Python processes with a Value or an Array:
    from multiprocessing import Process, Value, Array

    def f(n, a):
        n.value = 3.1415927
        for i in range(len(a)):
            a[i] = -a[i]

    if __name__ == '__main__':
        num = Value('d', 0.0)
        arr = Array('i', range(10))
        p = Process(target=f, args=(num, arr))
        p.start()
        p.join()
        print(num.value)
        print(arr[:])
And of course, you would need to use a mutex to avoid race conditions.

MapReduce

Big Data Processing (1/2)
MapReduce is a framework for batch processing of big data.
- Framework: a system used by programmers to build applications
- Batch processing: all the data is available at the outset, and results are not used until processing completes
- Big data: used to describe data sets so large and comprehensive that they can reveal facts about a whole population, usually from statistical analysis

Big Data Processing (2/2)
- Datasets are too big to be analyzed by one machine
- Using multiple machines has the same complications, regardless of the application/analysis
- Pure functions enable an abstraction barrier between data processing logic and the coordination of a distributed application

MapReduce Evaluation Model (1/2)
Map phase: apply a mapper function to all inputs, emitting intermediate key-value pairs.
- The mapper takes an iterable value containing inputs, such as lines of text
- The mapper yields zero or more key-value pairs for each input

MapReduce Evaluation Model (2/2)
Reduce phase: for each intermediate key, apply a reducer function to accumulate all values associated with that key.
- The reducer takes an iterable value containing intermediate key-value pairs
- All pairs with the same key appear consecutively
- The reducer yields zero or more values, each associated with that intermediate key

MapReduce Execution Model (1/2)
[Diagram]

MapReduce Execution Model (2/2)
[Diagram]

MapReduce example
From a 1.1 billion people database (facebook?), we want to know the average number of friends per age.
In SQL:
    SELECT age, AVG(friends) FROM users GROUP BY age
In MapReduce, the total set of users is split into different users_sets:
    function map(users_set) {
        for (user in users_set) {
            send(user.age, user.friends.size);
        }
    }
The keys are shuffled and assigned to reducers:
    function reduce(age, friends) {
        var r = 0;
        for (friend in friends) {
            r += friend;
        }
        send(age, r / friends.size);
    }

MapReduce Assumptions
Constraints on the mapper and reducer:
- The mapper must be equivalent to applying a deterministic pure function to each input independently
- The reducer must be equivalent to applying a deterministic pure function to the sequence of values for each key
Benefits of functional programming:
- When a program contains only pure functions, call expressions can be evaluated in any order, lazily, and in parallel
- Referential transparency: a call expression can be replaced by its value (or vice versa) without changing the program
In MapReduce, these functional programming ideas allow:
- Consistent results, however the computation is partitioned
- Re-computation and caching of results, as needed

MapReduce Benefits
- Fault tolerance: a machine or hard drive might crash. The MapReduce framework automatically re-runs failed tasks.
- Speed: some machine might be slow because it's overloaded. The framework can run multiple copies of a task and keep the result of the one that finishes first.
- Network locality: data transfer is expensive. The framework tries to schedule map tasks on the machines that hold the data to be processed.
- Monitoring: will my job finish before dinner?!? The framework provides a web-based interface describing jobs.

Summary
- Parallel programming
- Multi-threading and how to help reduce programmer errors
- Distributed programming and MapReduce
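As a closing sketch, the friends-per-age example above can be simulated in plain Python; this is a toy, single-process model of the map/shuffle/reduce phases (the dataset and helper names are invented for illustration):

    from collections import defaultdict

    # toy input: (age, number_of_friends) per user
    users = [(20, 150), (20, 250), (30, 100), (30, 350)]

    def mapper(user):
        age, friend_count = user
        yield (age, friend_count)  # emit an intermediate key-value pair

    def reducer(age, counts):
        yield (age, sum(counts) / len(counts))  # average for this age

    # shuffle: group intermediate values by key
    groups = defaultdict(list)
    for user in users:
        for key, value in mapper(user):
            groups[key].append(value)

    # reduce phase: one reducer call per intermediate key
    for key in sorted(groups):
        for age, avg in reducer(key, groups[key]):
            print(age, avg)  # 20 200.0, then 30 225.0

In a real framework the shuffle happens over the network and mappers/reducers run on many machines; because both functions are deterministic and pure, the result is the same however the work is partitioned.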