TDDA69 Data and Program Structure
Concurrent Computing
Cyrille Berger

List of lectures
1 Introduction and Functional Programming
2 Imperative Programming and Data Structures
3 Parsing
4 Evaluation
5 Object Oriented Programming
6 Macros and decorators
7 Virtual Machines and Bytecode
8 Garbage Collection and Native Code
9 Concurrent Computing
10 Declarative Programming
11 Logic Programming
12 Summary

Lecture content
- Concurrent computing
- Parallel Programming
- Multithreaded Programming
  - The States Problems and Solutions
  - Atomic actions
  - Language and Interpreter Design Considerations
- Single Instruction, Multiple Threads Programming
- Distributed programming
  - Message Passing
  - MapReduce

Concurrent computing
In concurrent computing, several computations are executed at the same time.
- In parallel computing, all computation units have access to shared memory (for instance, within a single process).
- In distributed computing, computation units communicate through message passing.

Benefits of concurrent computing
- Faster computation
- Responsiveness: interactive applications can perform two tasks at the same time (rendering, spell checking...)
- Availability of services: load balancing between servers
- Controllability: tasks can be suspended, resumed and stopped

Disadvantages of concurrent computing
- Concurrency is hard to implement properly
- Safety: it is easy to corrupt memory
- Deadlock: tasks can wait indefinitely for each other
- Non-deterministic behaviour
- Not always faster: memory bandwidth and CPU cache are limited

Concurrent computing programming
Four basic approaches to computing:
- Sequential programming: no concurrency
- Declarative concurrency: streams in a functional language
- Message passing: with active objects, used in distributed computing
- Atomic actions: on a shared memory, used in parallel computing

Stream Programming in Functional Programming
- No global state
- Functions only act on their input; they are reentrant
- Functions can therefore be executed in parallel, as long as they do not depend on the output of another function

Parallel Programming
In parallel computing, several computations are executed at the same time and have access to shared memory.
[diagram: several computation units connected to a single shared memory]

SIMD, SIMT, SMT (1/2)
- SIMD: Single Instruction, Multiple Data. Elements of a short vector (4 to 8 elements) are processed in parallel.
- SIMT: Single Instruction, Multiple Threads. The same instruction is executed by multiple threads (from 128 to 3048, or more in the future).
- SMT: Simultaneous Multithreading. General purpose; different instructions are executed by different threads.

SIMD, SIMT, SMT (2/2)
SIMD (pseudo-code):
    PUSH [1, 2, 3, 4]
    PUSH [4, 5, 6, 7]
    VEC_ADD_4
SIMT (pseudo-code):
    execute([1, 2, 3, 4], [4, 5, 6, 7],
            lambda a, b, ti: a[ti] = a[ti] + max(b[ti], 5))
SMT (pseudo-code):
    a = [1, 2, 3, 4]
    b = [4, 5, 6, 7]
    ...
    Thread.new(lambda: a = a + b)
    Thread.new(lambda: c = c * b)

Why the need for the different models?
- Flexibility: SMT > SIMT > SIMD
- Performance: SIMD > SIMT > SMT
Less flexibility gives higher performance, unless the lack of flexibility prevents accomplishing the task.

Multithreaded Programming

Multithreaded Programming Model
- Start with a single root thread
- Fork: create concurrently executing threads
- Join: synchronize threads
- Threads communicate through shared memory
- Threads execute asynchronously; they may or may not execute on different processors

Single threaded vs Multithreaded
[diagram: a single main thread, versus a main thread that forks sub0..subn, joins, and forks again]

A multithreaded example
    thread1 = new Thread(function() { /* do some computation */ });
    thread2 = new Thread(function() { /* do some computation */ });
    thread1.start();
    thread2.start();
    thread1.join();
    thread2.join();

The States Problems and Solutions

Global States and multi-threading
Example:
    var a = 0;
    thread1 = new Thread(function() { a = a + 1; });
    thread2 = new Thread(function() { a = a + 1; });
    thread1.start();
    thread2.start();
What is the value of a?
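Each thread reads a, adds one, and writes the result back, so the two updates can interleave and one increment can be lost. Here is a minimal runnable Python sketch of the same situation (the function name and loop count are our own, added to make the effect observable):

    import threading

    a = 0

    def increment():
        global a
        for _ in range(100_000):
            a = a + 1  # read, add, write back: three steps, not atomic

    t1 = threading.Thread(target=increment)
    t2 = threading.Thread(target=increment)
    t1.start(); t2.start()
    t1.join(); t2.join()
    print(a)  # often less than 200000: some updates are overwritten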
This is called a (data) race condition.

Atomic actions

Atomic operations
- An operation is said to be atomic if it appears to happen instantaneously.
- Examples: read/write, swap, fetch-and-add...
- test-and-set: set a value and return the old one.
- To implement a lock:
    while(test-and-set(lock, 1) == 1) {}
- To unlock:
    lock = 0

Mutex
Mutex is short for Mutual exclusion: a technique to prevent two threads from accessing a shared resource at the same time.
Example:
    var a = 0;
    var m = new Mutex();
    thread1 = new Thread(function() { m.lock(); a = a + 1; m.unlock(); });
    thread2 = new Thread(function() { m.lock(); a = a + 1; m.unlock(); });
    thread1.start();
    thread2.start();
Now a = 2.

Dependency
Example:
    var a = 1;
    var m = new Mutex();
    thread1 = new Thread(function() { m.lock(); a = a + 1; m.unlock(); });
    thread2 = new Thread(function() { m.lock(); a = a * 3; m.unlock(); });
    thread1.start();
    thread2.start();
What is the value of a? 4 or 6?

Condition variable
A condition variable is a set of threads waiting for a certain condition.
Example:
    var a = 1;
    var m = new Mutex();
    var cv = new ConditionVariable();
    thread1 = new Thread(function() {
      m.lock();
      a = a + 1;
      cv.notify();
      m.unlock();
    });
    thread2 = new Thread(function() {
      cv.wait();
      m.lock();
      a = a * 3;
      m.unlock();
    });
    thread1.start();
    thread2.start();
Now a = 6.

Deadlock
What might happen:
    var a = 0;
    var b = 2;
    var ma = new Mutex();
    var mb = new Mutex();
    thread1 = new Thread(function() {
      ma.lock(); mb.lock();
      b = b - 1; a = a - 1;
      ma.unlock(); mb.unlock();
    });
    thread2 = new Thread(function() {
      mb.lock(); ma.lock();
      b = b - 1; a = a + b;
      mb.unlock(); ma.unlock();
    });
    thread1.start();
    thread2.start();
thread1 waits for mb while thread2 waits for ma: neither can proceed.

Advantages of atomic actions
- Very efficient: less overhead, faster than message passing

Disadvantages of atomic actions
- Blocking: some threads have to wait
- Small overhead
- Deadlock
- A low-priority thread can block a high-priority thread
- A common source of programming errors

Common mistakes
- Forgetting to unlock a mutex
- Race conditions
- Deadlocks
- Granularity issues: too much locking will kill the performance

Language and Interpreter Design Considerations
Most programming languages have either:
- A guard object that unlocks a mutex upon destruction, or
- A synchronization statement:
    some_rlock = threading.RLock()
    with some_rlock:
        print("some_rlock is locked while this executes")
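Returning to the deadlock example above: a standard remedy (a general technique, not something prescribed by these slides) is to always acquire mutexes in a fixed global order, so that no cycle of threads waiting for each other can form. A minimal Python sketch:

    import threading

    ma = threading.Lock()
    mb = threading.Lock()

    def in_global_order(*locks):
        # A fixed, arbitrary but consistent order (here: by object id).
        return sorted(locks, key=id)

    def task1():
        l1, l2 = in_global_order(ma, mb)
        with l1, l2:
            pass  # update the shared state

    def task2():
        # Same global order, even though this caller names mb first.
        l1, l2 = in_global_order(mb, ma)
        with l1, l2:
            pass

    t1 = threading.Thread(target=task1)
    t2 = threading.Thread(target=task2)
    t1.start(); t2.start()
    t1.join(); t2.join()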
Safe Shared Mutable State in Rust (1/3)
Can we detect potential race conditions during compilation? In the Rust programming language:
- Objects are owned by a specific thread.
- Types can be marked with the Send trait, indicating that the object can be moved between threads.
- Types can be marked with the Sync trait, indicating that the object can be accessed by multiple threads safely.
Race condition:
    let mut data = vec![1, 2, 3];
    for i in 0..3 {
        thread::spawn(move || {
            data[i] += 1;
        });
    }
Gives an error: "capture of moved value: `data`".

Safe Shared Mutable State in Rust (2/3)
    let mut data = Arc::new(vec![1, 2, 3]);
    for i in 0..3 {
        let data = data.clone();
        thread::spawn(move || {
            data[i] += 1;
        });
    }
Arc adds reference counting and is movable and syncable. Still gives an error: "cannot borrow immutable borrowed content as mutable".

Safe Shared Mutable State in Rust (3/3)
    let data = Arc::new(Mutex::new(vec![1, 2, 3]));
    for i in 0..3 {
        let data = data.clone();
        thread::spawn(move || {
            let mut data = data.lock().unwrap();
            data[i] += 1;
        });
    }
It now compiles, and it works.

Single Instruction, Multiple Threads Programming
With SIMT, the same instruction is executed by multiple threads on different registers.

Single instruction, multiple flow paths (1/2)
Using a masking system, it is possible to support if/else blocks:
    data = [-2, 0, 1, -1, 2], data2 = [...]

    function f(thread_id, data, data2)
    {
      if(data[thread_id] < 0)
      {
        data[thread_id] = data[thread_id] - data2[thread_id];
      } else if(data[thread_id] > 0)
      {
        data[thread_id] = data[thread_id] + data2[thread_id];
      }
    }

Single instruction, multiple flow paths (2/2)
Benefits:
- Multiple flows are needed in many algorithms
Drawbacks:
- Threads always execute the instructions of both parts of the if/else block; only one flow path is executed at a time, and non-running threads must wait
- Randomized memory access: elements of a vector are not accessed sequentially

Programming Language Design for SIMT
- OpenCL and CUDA are the most common: very low level, C/C++ derivatives
- General purpose programming languages are not suitable
- Some work has been done to make it possible to write in Python and run on a GPU with CUDA, with limitations on the standard functions that can be called:
    @jit(argtypes=[float32[:], float32[:], float32[:]], target='gpu')
    def add_matrix(A, B, C):
        A[cuda.threadIdx.x] = B[cuda.threadIdx.x] + C[cuda.threadIdx.x]

Distributed programming

Distributed Programming (1/4)
In distributed computing, several computations are executed at the same time and communicate through message passing.
[diagram: several computation units, each with its own memory, exchanging messages]

Distributed programming (2/4)
A distributed computing application consists of multiple programs running on multiple computers that together coordinate to perform some task.
- Computation is performed in parallel by many computers.
- Information can be restricted to certain computers.
- Redundancy and geographic diversity improve reliability.

Distributed programming (3/4)
Characteristics of distributed computing:
- Computers are independent: they do not share memory.
- Individual programs have differentiating roles.
- Coordination is enabled by messages passed across a network.

Distributed programming (4/4)
Distributed computing for large-scale data processing:
- Databases respond to queries over a network.
- Datasets can be partitioned across multiple machines.

Message Passing
- Messages are (usually) passed through sockets
- Messages are exchanged synchronously or asynchronously
- Communication can be centralized or peer-to-peer

Python's Global Interpreter Lock (1/2)
- CPython can only interpret one single thread at a given time.
- The lock is released when:
  - the current thread is blocking for I/O
  - every 100 interpreter ticks
- True multithreading is therefore not possible with CPython.

Python's Global Interpreter Lock (2/2)
Why a GIL?
- Single-threaded programs are faster (no need to lock in memory management).
- Many C libraries used as extensions are not thread-safe.
To eliminate the GIL, Python developers have the following requirements:
- Simplicity
- It must actually improve performance
- Backward compatibility
- Prompt and ordered destruction

Python's Multiprocessing module
The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. It implements transparent message passing, allowing Python objects to be exchanged between processes.

Python's Message Passing (1/2)
Example of message passing:
    from multiprocessing import Process

    def f(name):
        print('hello', name)

    if __name__ == '__main__':
        p = Process(target=f, args=('bob',))
        p.start()
        p.join()
Output:
    hello bob

Python's Message Passing (2/2)
Example of message passing with pipes:
    from multiprocessing import Process, Pipe

    def f(conn):
        conn.send([42, None, 'hello'])
        conn.close()

    if __name__ == '__main__':
        parent_conn, child_conn = Pipe()
        p = Process(target=f, args=(child_conn,))
        p.start()
        print(parent_conn.recv())
        p.join()
Output:
    [42, None, 'hello']

Serialization
A serialized object is an object represented as a sequence of bytes that includes the object's data, its type and the types of the data stored in the object. Transparent message passing is possible thanks to serialization.

pickle
In Python, serialization is done with the pickle module.
- It can serialize user-defined classes; the class definition must be available before deserialization.
- It works with different versions of Python.
- By default, it uses an ASCII format.
- It can serialize:
  - basic types: booleans, numbers, strings
  - containers: tuples, lists, sets and dictionaries (of picklable objects)
  - top-level functions and classes (only the name)
  - objects whose __dict__ or __getstate__() are picklable
Example:
    pickle.loads(pickle.dumps(10))
Shared memory
Memory can be shared between Python processes with a Value or an Array:
    from multiprocessing import Process, Value, Array

    def f(n, a):
        n.value = 3.1415927
        for i in range(len(a)):
            a[i] = -a[i]

    if __name__ == '__main__':
        num = Value('d', 0.0)
        arr = Array('i', range(10))
        p = Process(target=f, args=(num, arr))
        p.start()
        p.join()
        print(num.value)
        print(arr[:])
And of course, you would need to use a mutex to avoid race conditions.

MapReduce

Big Data Processing (1/2)
MapReduce is a framework for batch processing of big data.
- Framework: a system used by programmers to build applications
- Batch processing: all the data is available at the outset, and results are not used until processing completes
- Big data: used to describe data sets so large and comprehensive that they can reveal facts about a whole population, usually from statistical analysis

Big Data Processing (2/2)
The MapReduce idea:
- Data sets are too big to be analyzed by one machine.
- Using multiple machines has the same complications, regardless of the application/analysis.
- Pure functions enable an abstraction barrier between the data processing logic and the coordination of a distributed application.

MapReduce Evaluation Model (1/2)
Map phase: apply a mapper function to all inputs, emitting intermediate key-value pairs.
- The mapper takes an iterable value containing inputs, such as lines of text.
- The mapper yields zero or more key-value pairs for each input.

MapReduce Evaluation Model (2/2)
Reduce phase: for each intermediate key, apply a reducer function to accumulate all values associated with that key.
- The reducer takes an iterable value containing intermediate key-value pairs.
- All pairs with the same key appear consecutively.
- The reducer yields zero or more values, each associated with that intermediate key.

MapReduce Execution Model (1/2)
[figure]

MapReduce Execution Model (2/2)
The keys are shuffled and assigned to reducers.

MapReduce example
From a 1.1-billion-people database (Facebook?), we want to know the average number of friends per age.
In SQL:
    SELECT age, AVG(friends) FROM users GROUP BY age
In MapReduce, the total set of users is split into different users_set:
    function map(users_set)
    {
      for(user in users_set)
      {
        send(user.age, user.friends.size);
      }
    }

    function reduce(age, friends)
    {
      var r = 0;
      for(friend in friends)
      {
        r += friend;
      }
      send(age, r / friends.size);
    }
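The same pipeline can be sketched in plain Python in a single process. The tiny data set and the explicit shuffle step below are our own illustration of the three phases, not part of the slides:

    from collections import defaultdict

    users = [
        {'age': 20, 'friends': 150},
        {'age': 20, 'friends': 250},
        {'age': 30, 'friends': 100},
    ]

    # Map phase: emit (age, friend_count) pairs.
    pairs = [(u['age'], u['friends']) for u in users]

    # Shuffle phase: group the values by key.
    groups = defaultdict(list)
    for age, count in pairs:
        groups[age].append(count)

    # Reduce phase: average the values for each key.
    for age, counts in sorted(groups.items()):
        print(age, sum(counts) / len(counts))  # 20 200.0, then 30 100.0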
MapReduce Assumptions
Constraints on the mapper and reducer:
- The mapper must be equivalent to applying a deterministic pure function to each input independently.
- The reducer must be equivalent to applying a deterministic pure function to the sequence of values for each key.
Benefits of functional programming:
- When a program contains only pure functions, call expressions can be evaluated in any order, lazily, and in parallel.
- Referential transparency: a call expression can be replaced by its value (or vice versa) without changing the program.
In MapReduce, these functional programming ideas allow:
- Consistent results, however the computation is partitioned.
- Re-computation and caching of results, as needed.

MapReduce Benefits
- Fault tolerance: a machine or hard drive might crash. The MapReduce framework automatically re-runs failed tasks.
- Speed: some machine might be slow because it is overloaded. The framework can run multiple copies of a task and keep the result of the one that finishes first.
- Network locality: data transfer is expensive. The framework tries to schedule map tasks on the machines that hold the data to be processed.
- Monitoring: will my job finish before dinner?!? The framework provides a web-based interface describing jobs.

Summary
- Parallel programming
- Multi-threading, and how to help reduce programmer error
- Distributed programming and MapReduce