List of lectures TDDA69 Data and Program Structure

Concurrent Computing
Cyrille Berger
1 Introduction and Functional Programming
2 Imperative Programming and Data Structures
3 Parsing
4 Evaluation
5 Object Oriented Programming
6 Macros and decorators
7 Virtual Machines and Bytecode
8 Garbage Collection and Native Code
9 Concurrent Computing
10 Declarative Programming
11 Logic Programming
12 Summary
Lecture content

Concurrent computing
- Parallel Programming
  - Multithreaded Programming
  - The State Problems and Solutions
  - Atomic actions
  - Language and Interpreter Design Considerations
  - Single Instruction, Multiple Threads Programming
- Distributed programming
  - Message Passing
  - MapReduce
In concurrent computing, several computations are executed at the same time.

In parallel computing, all computation units have access to a shared memory (for instance, within a single process).

In distributed computing, computation units communicate through message passing.
Benefits of concurrent computing:
- Faster computation
- Responsiveness: interactive applications can perform two tasks at the same time (rendering, spell checking...)
- Availability of services: load balancing between servers
- Controllability: tasks can be suspended, resumed and stopped

Disadvantages of concurrent computing:
- Concurrency is hard to implement properly
- Safety: it is easy to corrupt memory
- Deadlock: tasks can wait indefinitely for each other
- Non-deterministic behaviour
- Not always faster! The memory bandwidth and CPU cache are limited
Concurrent computing programming

Four basic approaches to concurrent computing:
- Sequential programming: no concurrency
- Declarative concurrency: streams in a functional language
- Message passing: with active objects, used in distributed computing
- Atomic actions: on a shared memory, used in parallel computing

Stream Programming in Functional Programming
- No global state
- Functions only act on their input; they are reentrant
- Functions can then be executed in parallel, as long as they do not depend on the output of another function (see the Python sketch below)
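As a minimal illustration of this idea (my sketch, not from the slides, using Python's multiprocessing.Pool): a pure function can be mapped over inputs in parallel, precisely because no call depends on another call's output.

from multiprocessing import Pool

def square(x):
    # A pure function: the result depends only on the input
    return x * x

if __name__ == '__main__':
    with Pool(4) as pool:
        # The calls are independent, so the pool may run them
        # in any order and on any worker process
        print(pool.map(square, range(10)))  # [0, 1, 4, ..., 81]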
Parallel Programming

In parallel computing, several computations are executed at the same time and have access to a shared memory.

[Diagram: several computation units, all connected to one shared memory]
SIMD, SIMT, SMT

SIMD: Single Instruction, Multiple Data
Elements of a short vector (4 to 8 elements) are processed in parallel:

  PUSH [1, 2, 3, 4]
  PUSH [4, 5, 6, 7]
  VEC_ADD_4

SIMT: Single Instruction, Multiple Threads
The same instruction is executed by multiple threads (from 128 to 3048, or more in the future):

  execute([1,2,3,4], [4,5,6,7], lambda a, b, ti: a[ti] = a[ti] + max(b[ti], 5))

SMT: Simultaneous Multithreading
General purpose: different instructions are executed by different threads:

  a = [1, 2, 3, 4]
  b = [4, 5, 6, 7]
  ...
  Thread.new(lambda: a = a + b)
  Thread.new(lambda: c = c * b)
Why the need for the different models?

Flexibility: SMT > SIMT > SIMD
Performance: SIMD > SIMT > SMT

Less flexibility gives higher performance, unless the lack of flexibility prevents accomplishing the task.

Multithreaded Programming
Multithreaded Programming Model

- Start with a single root thread
- Fork: to create concurrently executing threads
- Join: to synchronize threads
- Threads communicate through shared memory
- Threads execute asynchronously; they may or may not execute on different processors

Single threaded vs Multithreaded

[Diagram: a single-threaded flow of control, versus a main thread that forks into sub0 ... subn and joins back into main]
A multithreaded example

thread1 = new Thread(
  function()
  {
    /* do some computation */
  });
thread2 = new Thread(
  function()
  {
    /* do some computation */
  });
thread1.start();
thread2.start();
thread1.join();
thread2.join();
The State Problems and Solutions

Global States and multi-threading

Example:

var a = 0;
thread1 = new Thread(
  function()
  {
    a = a + 1;
  });
thread2 = new Thread(
  function()
  {
    a = a + 1;
  });
thread1.start();
thread2.start();

What is the value of a? This is called a (data) race condition.
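To see the race concretely, here is a small Python sketch (mine, not from the slides). The statement a = a + 1 is a read-modify-write, so two threads can interleave between the read and the write:

import threading

a = 0

def increment():
    global a
    for _ in range(100000):
        a = a + 1  # read, add, write: another thread may run in between

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start(); t2.start()
t1.join(); t2.join()
print(a)  # may print less than 200000: some updates were lost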
Atomic actions

Atomic operations

An operation is said to be atomic if it appears to happen instantaneously:
- read/write, swap, fetch-and-add...
- test-and-set: set a value and return the old one

To implement a lock:
  while(test-and-set(lock, 1) == 1) {}
To unlock:
  lock = 0

Mutex

Mutex is short for Mutual Exclusion. It is a technique to prevent two threads from accessing a shared resource at the same time.

Example:

var a = 0;
var m = new Mutex();
thread1 = new Thread(
  function()
  {
    m.lock();
    a = a + 1;
    m.unlock();
  });
thread2 = new Thread(
  function()
  {
    m.lock();
    a = a + 1;
    m.unlock();
  });
thread1.start();
thread2.start();

Now a = 2.
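The same fix, sketched in Python with the standard threading.Lock (my example, not from the slides):

import threading

a = 0
m = threading.Lock()

def increment():
    global a
    for _ in range(100000):
        with m:        # acquires the lock on entry, releases it on exit
            a = a + 1

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start(); t2.start()
t1.join(); t2.join()
print(a)  # always 200000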
Dependency

Example:

var a = 1;
var m = new Mutex();
thread1 = new Thread(
  function()
  {
    m.lock();
    a = a + 1;
    m.unlock();
  });
thread2 = new Thread(
  function()
  {
    m.lock();
    a = a * 3;
    m.unlock();
  });
thread1.start();
thread2.start();

What is the value of a? 4 or 6?

Condition variable

A condition variable is a set of threads waiting for a certain condition.

Example:

var a = 1;
var m = new Mutex();
var cv = new ConditionVariable();
thread1 = new Thread(
  function()
  {
    m.lock();
    a = a + 1;
    cv.notify();
    m.unlock();
  });
thread2 = new Thread(
  function()
  {
    cv.wait();
    m.lock();
    a = a * 3;
    m.unlock();
  });
thread1.start();
thread2.start();

Now a = 6.
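The same pattern in Python uses threading.Condition (my sketch; note that wait() should be called with the lock held, inside a loop that re-checks a flag, which also avoids missing a notify that happens before the wait):

import threading

a = 1
cv = threading.Condition()
incremented = False

def first():
    global a, incremented
    with cv:                    # a Condition bundles a lock with waiting
        a = a + 1
        incremented = True
        cv.notify()             # wake the waiting thread

def second():
    global a
    with cv:
        while not incremented:  # re-check: guards against spurious wakeups
            cv.wait()           # releases the lock while waiting
        a = a * 3

t2 = threading.Thread(target=second)
t1 = threading.Thread(target=first)
t2.start(); t1.start()
t1.join(); t2.join()
print(a)  # always 6: the increment happens before the multiply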
Advantages of atomic actions

Very efficient: less overhead, faster than message passing.

Deadlock

What might happen:

var a = 0;
var b = 2;
var ma = new Mutex();
var mb = new Mutex();
thread1 = new Thread(
  function()
  {
    ma.lock();
    mb.lock();
    b = b - 1;
    a = a - 1;
    ma.unlock();
    mb.unlock();
  });
thread2 = new Thread(
  function()
  {
    mb.lock();
    ma.lock();
    b = b - 1;
    a = a + b;
    mb.unlock();
    ma.unlock();
  });
thread1.start();
thread2.start();

thread1 waits for mb, thread2 waits for ma: neither can proceed.
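A common cure, not shown on the slide: make every thread acquire the locks in the same global order, so no circular wait can form. A Python sketch:

import threading

ma = threading.Lock()
mb = threading.Lock()

def worker1():
    with ma:        # both threads take ma first...
        with mb:    # ...then mb: no circular wait is possible
            pass    # update the shared state here

def worker2():
    with ma:        # same order as worker1, unlike the slide's thread2
        with mb:
            pass

t1 = threading.Thread(target=worker1)
t2 = threading.Thread(target=worker2)
t1.start(); t2.start()
t1.join(); t2.join()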
Disadvantages of atomic actions

- Blocking: some threads have to wait
- Small overhead
- Deadlock
- A low-priority thread can block a high-priority thread
- A common source of programming errors

Language and Interpreter Design Considerations
Common mistakes

- Forgetting to unlock a mutex
- Race conditions
- Deadlocks
- Granularity issues: too much locking will kill the performance

To guard against forgetting to unlock a mutex, most programming languages have either:
- A guard object that will unlock a mutex upon destruction
- A synchronization statement:

some_rlock = threading.RLock()
with some_rlock:
    print("some_rlock is locked while this executes")
Race condition

Can we detect potential race conditions during compilation? In the Rust programming language:
- Objects are owned by a specific thread
- Types can be marked with the Send trait, to indicate that the object can be moved between threads
- Types can be marked with the Sync trait, to indicate that the object can be accessed by multiple threads safely

Safe Shared Mutable State in Rust (1/3)

let mut data = vec![1, 2, 3];
for i in 0..3 {
    thread::spawn(move || {
        data[i] += 1;
    });
}

Gives an error: "capture of moved value: `data`"
Safe Shared Mutable State in Rust (2/3)

let mut data = Arc::new(vec![1, 2, 3]);
for i in 0..3 {
    let data = data.clone();
    thread::spawn(move || {
        data[i] += 1;
    });
}

Arc adds reference counting and is movable and syncable. This still gives an error: cannot borrow immutable borrowed content as mutable.

Safe Shared Mutable State in Rust (3/3)

let data = Arc::new(Mutex::new(vec![1, 2, 3]));
for i in 0..3 {
    let data = data.clone();
    thread::spawn(move || {
        let mut data = data.lock().unwrap();
        data[i] += 1;
    });
}

It now compiles, and it works.
Single Instruction, Multiple Threads Programming

With SIMT, the same instruction is executed by multiple threads on different registers.
Single instruction, multiple flow paths (1/2)

Using a masking system, it is possible to support if/else blocks:

data = [-2, 0, 1, -1, 2], data2 = [...]
function f(thread_id, data, data2)
{
  if(data[thread_id] < 0)
  {
    data[thread_id] = data[thread_id] - data2[thread_id];
  } else if(data[thread_id] > 0)
  {
    data[thread_id] = data[thread_id] + data2[thread_id];
  }
}

Single instruction, multiple flow paths (2/2)

Benefits:
- Multiple flow paths are needed in many algorithms
- Randomized memory access: elements of a vector are not accessed sequentially

Drawbacks:
- Threads always execute the instructions of both parts of the if/else blocks
- Only one flow path is executed at a time; non-running threads must wait (see the masking sketch below)
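One way to picture the masking (my illustration using numpy; this simulates the idea, it is not how a GPU is programmed): each branch is applied to the whole vector under a mask, and lanes outside the mask are left untouched:

import numpy as np

data = np.array([-2, 0, 1, -1, 2])
data2 = np.array([10, 10, 10, 10, 10])

# "if (data < 0)" branch: applied to all lanes, masked to the negatives
mask_neg = data < 0
data[mask_neg] = data[mask_neg] - data2[mask_neg]

# "else if (data > 0)" branch: masked to the remaining positive lanes
mask_pos = ~mask_neg & (data > 0)
data[mask_pos] = data[mask_pos] + data2[mask_pos]

print(data)  # [-12   0  11 -11  12]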
Programming Language Design for SIMT

OpenCL and CUDA are the most common: very low-level, C/C++ derivatives. General-purpose programming languages are not suitable. Some work has been done to be able to write Python and run it on a GPU with CUDA, with limitations on the standard functions that can be called:

@jit(argtypes=[float32[:], float32[:], float32[:]], target='gpu')
def add_matrix(A, B, C):
    A[cuda.threadIdx.x] = B[cuda.threadIdx.x] + C[cuda.threadIdx.x]

Distributed programming
Distributed Programming (1/4)

In distributed computing, several computations are executed at the same time and communicate through message passing.

[Diagram: several computation units, each with its own memory, connected by a network]

Distributed Programming (2/4)

A distributed computing application consists of multiple programs running on multiple computers that together coordinate to perform some task.
- Computation is performed in parallel by many computers.
- Information can be restricted to certain computers.
- Redundancy and geographic diversity improve reliability.
Distributed Programming (3/4)

Characteristics of distributed computing:
- Individual programs have differentiating roles.
- Computers are independent; they do not share memory.
- Coordination is enabled by messages passed across a network.

Distributed Programming (4/4)

Distributed computing for large-scale data processing:
- Databases respond to queries over a network.
- Datasets can be partitioned across multiple machines.
Message Passing

- Messages are (usually) passed through sockets (see the sketch below)
- Messages are exchanged synchronously or asynchronously
- Communication can be centralized or peer-to-peer
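A minimal sketch of socket-based message passing (my example, Python standard library only; the address 127.0.0.1:50007 is arbitrary):

import socket
import threading

HOST, PORT = '127.0.0.1', 50007   # arbitrary local endpoint
ready = threading.Event()

def server():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((HOST, PORT))
        s.listen(1)
        ready.set()                # tell the client it may connect
        conn, _ = s.accept()
        with conn:
            msg = conn.recv(1024)  # blocking (synchronous) receive
            conn.sendall(b'got: ' + msg)

t = threading.Thread(target=server)
t.start()
ready.wait()

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as c:
    c.connect((HOST, PORT))
    c.sendall(b'hello')            # pass a message to the server
    print(c.recv(1024))            # b'got: hello'
t.join()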
Python's Global Interpreter Lock (1/2)

CPython can only interpret one single thread at a given time, because:
- Single-threaded programs are faster this way (no need to lock in memory management)
- Many C libraries used as extensions are not thread safe

True multithreading is therefore not possible with CPython (a short demonstration follows below). The lock is released when:
- The current thread is blocking for I/O
- Every 100 interpreter ticks

Python's Global Interpreter Lock (2/2)

To eliminate the GIL, Python developers have the following requirements:
- Simplicity
- It must actually improve performance
- Backward compatibility
- Prompt and ordered destruction
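A sketch of the consequence (mine, not from the slides): a CPU-bound task never releases the GIL for I/O, so running it on two threads is typically no faster than running it twice sequentially:

import threading
import time

def count(n):
    # CPU-bound: never blocks for I/O, so it keeps holding the GIL
    while n > 0:
        n -= 1

N = 10_000_000

start = time.time()
count(N)
count(N)
print('sequential:', time.time() - start)

start = time.time()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print('threaded:  ', time.time() - start)  # typically no faster under the GIL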
Python's Multiprocessing module

The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. It implements transparent message passing, allowing Python objects to be exchanged between processes.

Python's Message Passing (1/2)

Example of message passing:

from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()

Output:
hello bob
Python's Message Passing (2/2)

Example of message passing with pipes:

from multiprocessing import Process, Pipe

def f(conn):
    conn.send([42, None, 'hello'])
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())
    p.join()

Output:
[42, None, 'hello']

Serialization

A serialized object is an object represented as a sequence of bytes that includes the object's data, its type, and the types of the data stored in the object. Transparent message passing is possible thanks to serialization.
pickle

In Python, serialization is done with the pickle module. It can serialize:
- Basic types: booleans, numbers, strings
- Containers: tuples, lists, sets and dictionaries (of picklable objects)
- Top-level functions and classes (only the name)
- Objects whose __dict__ or __getstate__() are picklable

It can serialize user-defined classes (see the round-trip sketch after the shared-memory example below):
- The class definition must be available before deserialization
- It works across different versions of Python
- By default, it uses an ASCII format (in Python 2; Python 3 defaults to a binary protocol)

Example:
pickle.loads(pickle.dumps(10))

Shared memory

Memory can be shared between Python processes with a Value or an Array:

from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))
    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()
    print(num.value)
    print(arr[:])

And of course, you would need to use a mutex to avoid race conditions.
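Returning to pickle: a round-trip with a user-defined class (my example). The bytes carry the object's state and the class name, not the class code, so the definition must be importable when loading:

import pickle

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

data = pickle.dumps(Point(1, 2))  # state plus a reference to the class name
q = pickle.loads(data)            # requires the Point definition to exist here
print(q.x, q.y)                   # 1 2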
MapReduce

Big Data Processing (1/2)

MapReduce is a framework for batch processing of big data.
- Framework: a system used by programmers to build applications
- Batch processing: all the data is available at the outset, and results are not used until processing completes
- Big data: used to describe data sets so large and comprehensive that they can reveal facts about a whole population, usually from statistical analysis
Big Data Processing (2/2)

The MapReduce idea:
- Data sets are too big to be analyzed by one machine
- Using multiple machines has the same complications, regardless of the application/analysis
- Pure functions enable an abstraction barrier between data processing logic and coordinating a distributed application

MapReduce Evaluation Model (1/2)

Map phase: apply a mapper function to all inputs, emitting intermediate key-value pairs
- The mapper takes an iterable value containing inputs, such as lines of text
- The mapper yields zero or more key-value pairs for each input
MapReduce Evaluation Model (2/2)

Reduce phase: for each intermediate key, apply a reducer function to accumulate all values associated with that key
- The reducer takes an iterable value containing intermediate key-value pairs
- All pairs with the same key appear consecutively
- The reducer yields zero or more values, each associated with that intermediate key

MapReduce Execution Model (1/2)

[Diagram]

MapReduce Execution Model (2/2)

The keys are shuffled and assigned to reducers.
MapReduce example

From a database of 1.1 billion people (Facebook?), we want to know the average number of friends per age.

In SQL:
SELECT age, AVG(friends) FROM users GROUP BY age

In MapReduce, the total set of users is split into different users_set chunks (a runnable Python simulation follows below):

function map(users_set)
{
  for(user in users_set)
  {
    send(user.age, user.friends.size);
  }
}

function reduce(age, friends)
{
  var r = 0;
  for(friend in friends)
  {
    r += friend;
  }
  send(age, r / friends.size);
}
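The same job can be simulated in a few lines of Python (my sketch of the evaluation model, with the shuffle done by sorting and grouping on the key):

from itertools import groupby
from operator import itemgetter

users = [{'age': 20, 'friends': 100}, {'age': 20, 'friends': 200},
         {'age': 30, 'friends': 50}]

def mapper(user):
    yield (user['age'], user['friends'])    # emit (key, value)

def reducer(age, friend_counts):
    counts = list(friend_counts)
    yield (age, sum(counts) / len(counts))  # average per key

# Map phase
pairs = [pair for user in users for pair in mapper(user)]
# Shuffle: all pairs with the same key end up consecutive
pairs.sort(key=itemgetter(0))
# Reduce phase
for age, group in groupby(pairs, key=itemgetter(0)):
    for result in reducer(age, (v for _, v in group)):
        print(result)   # (20, 150.0) then (30, 50.0)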
MapReduce Assumptions

Constraints on the mapper and reducer:
- The mapper must be equivalent to applying a deterministic pure function to each input independently
- The reducer must be equivalent to applying a deterministic pure function to the sequence of values for each key

Benefits of functional programming:
- When a program contains only pure functions, call expressions can be evaluated in any order, lazily, and in parallel
- Referential transparency: a call expression can be replaced by its value (or vice versa) without changing the program

In MapReduce, these functional programming ideas allow:
- Consistent results, however the computation is partitioned
- Re-computation and caching of results, as needed

MapReduce Benefits

- Fault tolerance: a machine or hard drive might crash
  - The MapReduce framework automatically re-runs failed tasks
- Speed: some machine might be slow because it's overloaded
  - The framework can run multiple copies of a task and keep the result of the one that finishes first
- Network locality: data transfer is expensive
  - The framework tries to schedule map tasks on the machines that hold the data to be processed
- Monitoring: will my job finish before dinner?!?
  - The framework provides a web-based interface describing jobs
Summary

- Parallel programming
- Multi-threading and how to help reduce programmer errors
- Distributed programming and MapReduce