List of lectures TDDA69 Data and Program Structure

List of lectures
1 Introduction and Functional Programming
2 Imperative Programming and Data Structures
3 Environment
4 Evaluation
5 Object Oriented Programming
6 Macros and decorators
7 Virtual Machines and Bytecode
8 Garbage Collection and Native Code
9 Parallel and Distributed Computing
10 Logic Programming
11 Summary
TDDA69 Data and Program Structure
Parallel and Distributed Computing
Cyrille Berger
Lecture goal
- Learn about the concepts and the challenges of distributed computing
- The impact of distributed programming on programming languages and implementations

Lecture content
- Parallel Programming
  - Multithreaded Programming
  - The States Problems and Solutions
  - Atomic actions
  - Language and Interpreter Design Considerations
  - Single Instruction, Multiple Threads Programming
- Distributed programming
  - Message Passing
  - MapReduce
Concurrent computing
In concurrent computing, several computations are executed at the same time.
- In parallel computing, all computation units have access to a shared memory (for instance in a single process)
- In distributed computing, computation units communicate through message passing
Benefits of concurrent computing
- Faster
- Responsiveness: interactive applications can be performing two tasks at the same time: rendering, spell checking...
- Availability of services: load balancing between servers
- Controllability: tasks requiring certain preconditions can suspend and wait until the preconditions hold, then resume execution transparently

Disadvantages of concurrent computing
- Concurrency is hard to implement properly
- Safety: shared state is easy to corrupt
- Deadlock: tasks can wait indefinitely for each other
- Not always faster! The memory bandwidth and the CPU cache are shared between the computation units

Concurrent computing programming
Four basic approaches to concurrent computing:
- Sequential programming: no concurrency
- Declarative concurrency: streams in a functional language
- Message passing: with active objects, used in distributed computing
- Atomic actions: on a shared memory, used in parallel computing
Stream Programming in Functional Programming
- No global state
- Functions only act on their input; they are reentrant
- Functions can then be executed in parallel, as long as they do not depend on the output of another function
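The point above can be made concrete with Python's multiprocessing module (used later in this lecture). This is a minimal sketch, and `square` is a made-up pure function:

```python
from multiprocessing import Pool

def square(x):
    # A pure function: the result depends only on the input,
    # so independent calls can safely run in parallel.
    return x * x

if __name__ == '__main__':
    with Pool(4) as pool:
        # map() distributes the independent calls across worker processes
        results = pool.map(square, range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Because `square` touches no shared state, the order in which the workers run cannot change the result.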
Parallel Programming
In parallel computing, several computations are executed at the same time and have access to shared memory.
[diagram: several computation units connected to a single shared memory]

SIMD, SIMT, SMT (1/2)
- SIMD: Single Instruction, Multiple Data
  Elements of a short vector (4 to 8 elements) are processed in parallel
- SIMT: Single Instruction, Multiple Threads
  The same instruction is executed by multiple threads (from 128 to 3048 or more in the future)
- SMT: Simultaneous Multithreading
  General purpose; different instructions are executed by different threads
SIMD, SIMT, SMT (2/2)
Why the need for the different models?
- Flexibility: SMT > SIMT > SIMD
- Performance: SIMD > SIMT > SMT
- Less flexibility gives higher performance, unless the lack of flexibility prevents accomplishing the task

SIMD:
    PUSH [1, 2, 3, 4]
    PUSH [4, 5, 6, 7]
    VEC_ADD_4
SIMT:
    execute([1,2,3,4], [4,5,6,7], lambda a, b, ti: a[ti] = a[ti] + max(b[ti], 5))
SMT:
    a = [1, 2, 3, 4]
    b = [4, 5, 6, 7]
    ...
    Thread.new(lambda: a = a + b)
    Thread.new(lambda: c = c * b)
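The SMT pseudocode above can be turned into runnable Python, taking `a = a + b` to mean an elementwise operation. This is an illustrative sketch; the two threads execute different instruction streams on different variables:

```python
import threading

a = [1, 2, 3, 4]
b = [4, 5, 6, 7]
c = [1, 1, 1, 1]

def add_task():
    # One thread executes an elementwise addition...
    global a
    a = [x + y for x, y in zip(a, b)]

def mul_task():
    # ...while another thread executes a different instruction
    # stream (an elementwise multiplication) at the same time.
    global c
    c = [x * y for x, y in zip(c, b)]

t1 = threading.Thread(target=add_task)
t2 = threading.Thread(target=mul_task)
t1.start(); t2.start()
t1.join(); t2.join()
print(a)  # [5, 7, 9, 11]
print(c)  # [4, 5, 6, 7]
```

The result is deterministic here only because the two threads write to different variables.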
Single threaded vs Multithreaded

Multithreaded Programming

Multithreaded Programming Model
- Start with a single root thread
- Fork: to create concurrently executing threads
- Join: to synchronize threads
- Threads communicate through shared memory
- Threads execute asynchronously; they may or may not execute on different processors

A multithreaded example
    thread1 = new Thread(
        function()
        {
            /* do some computation */
        });
    thread2 = new Thread(
        function()
        {
            /* do some computation */
        });
    thread1.start();
    thread2.start();
    thread1.join();
    thread2.join();
[diagram: the main thread forks into sub0 ... subn, which join back into main]
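The fork/join example above maps directly onto Python's threading module. A minimal sketch, where `computation` is a stand-in for real work:

```python
import threading

results = []

def computation(name):
    # stand-in for some computation; list.append is atomic in CPython
    results.append(name)

# Fork: create concurrently executing threads from the root thread
thread1 = threading.Thread(target=computation, args=('sub0',))
thread2 = threading.Thread(target=computation, args=('sub1',))
thread1.start()
thread2.start()
# Join: the main thread blocks until both threads have finished
thread1.join()
thread2.join()
print(sorted(results))  # ['sub0', 'sub1']
```

Note that the two threads may finish in either order; only the join guarantees both are done.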
The States Problems and Solutions

Global States and multi-threading
Example:
    var a = 0;
    thread1 = new Thread(
        function()
        {
            a = a + 1;
        });
    thread2 = new Thread(
        function()
        {
            a = a + 1;
        });
    thread1.start();
    thread2.start();
What is the value of a? This is called a race condition.
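The race can be reproduced in Python. This is a sketch: `a = a + 1` compiles to a read, an add and a write, and another thread can interleave between the read and the write, so some increments may be lost and the final value varies from run to run:

```python
import threading

counter = 0

def increment(n):
    global counter
    for _ in range(n):
        # read-modify-write: NOT atomic, another thread can run
        # between the read of counter and the write back
        counter = counter + 1

threads = [threading.Thread(target=increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Expected 400000, but lost updates can make it smaller.
print(counter)
```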
Atomic actions

Mutex
Mutex is short for Mutual exclusion. It is a technique that prevents two threads from accessing a shared resource at the same time.
Example:
    var a = 0;
    var m = new Mutex();
    thread1 = new Thread(
        function()
        {
            m.lock();
            a = a + 1;
            m.unlock();
        });
    thread2 = new Thread(
        function()
        {
            m.lock();
            a = a + 1;
            m.unlock();
        });
    thread1.start();
    thread2.start();
Now the value of a is always 2.

Dependency
Example:
    var a = 1;
    var m = new Mutex();
    thread1 = new Thread(
        function()
        {
            m.lock();
            a = a + 1;
            m.unlock();
        });
    thread2 = new Thread(
        function()
        {
            m.lock();
            a = a * 3;
            m.unlock();
        });
    thread1.start();
    thread2.start();
What is the value of a? 4 or 6?
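In Python, the mutex of the examples above is `threading.Lock`. A minimal sketch fixing the lost-update race from earlier:

```python
import threading

counter = 0
lock = threading.Lock()   # plays the role of the Mutex

def increment(n):
    global counter
    for _ in range(n):
        with lock:                 # lock() ... unlock() as a context manager
            counter = counter + 1  # now only one thread at a time runs this

threads = [threading.Thread(target=increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # always 400000
```

Mutual exclusion makes the result deterministic for the commutative `a + 1` case; as the Dependency slide shows, it does not fix ordering between non-commutative operations.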
Condition variable
A Condition variable is a set of threads waiting for a certain condition.
Example:
    var a = 1;
    var m = new Mutex();
    var cv = new ConditionVariable();
    thread1 = new Thread(
        function()
        {
            m.lock();
            a = a + 1;
            cv.notify();
            m.unlock();
        });
    thread2 = new Thread(
        function()
        {
            cv.wait();
            m.lock();
            a = a * 3;
            m.unlock();
        });
    thread1.start();
    thread2.start();
Now a = 6.
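A Python version of the example above, using `threading.Condition`. Note that in Python the condition must be held while waiting, and `wait_for` re-checks a predicate under the lock, which avoids missing a notification that arrives before the waiter starts (the flag `incremented` is our addition):

```python
import threading

a = 1
cond = threading.Condition()
incremented = False

def adder():
    global a, incremented
    with cond:
        a = a + 1
        incremented = True
        cond.notify()  # wake up threads waiting on the condition

def multiplier():
    global a
    with cond:
        # wait_for re-checks the predicate under the lock, so it is
        # safe even if adder() already ran and notified
        cond.wait_for(lambda: incremented)
        a = a * 3

t2 = threading.Thread(target=multiplier)
t1 = threading.Thread(target=adder)
t2.start()
t1.start()
t1.join(); t2.join()
print(a)  # 6: the multiplication always happens after the increment
```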
Deadlock
What might happen:
    var a = 0;
    var b = 2;
    var ma = new Mutex();
    var mb = new Mutex();
    thread1 = new Thread(
        function()
        {
            ma.lock();
            mb.lock();
            b = b - 1;
            a = a - 1;
            ma.unlock();
            mb.unlock();
        });
    thread2 = new Thread(
        function()
        {
            mb.lock();
            ma.lock();
            b = b - 1;
            a = a + b;
            mb.unlock();
            ma.unlock();
        });
    thread1.start();
    thread2.start();
thread1 waits for mb, thread2 waits for ma: neither can proceed.

Advantages of atomic actions
- Very efficient: less overhead, faster than message passing
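The classic fix is to make every thread acquire the locks in the same global order, so a circular wait cannot form. A Python sketch of the same example with the ordering repaired:

```python
import threading

a, b = 0, 2
ma = threading.Lock()
mb = threading.Lock()

def task1():
    global a, b
    with ma:           # both tasks take ma first, then mb:
        with mb:       # no cycle of "holds one, waits for the other"
            b = b - 1
            a = a - 1

def task2():
    global a, b
    with ma:           # same order as task1 (unlike the deadlocking thread2)
        with mb:
            b = b - 1
            a = a + b

t1 = threading.Thread(target=task1)
t2 = threading.Thread(target=task2)
t1.start(); t2.start()
t1.join(); t2.join()
print(b)  # 0
```

The final value of `a` still depends on which task runs first; lock ordering only guarantees the program terminates.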
Disadvantages of atomic actions
- Blocking: some threads have to wait
- Small overhead
- Deadlock
- A low-priority thread can block a high-priority thread (priority inversion)
- A common source of programming errors

Language and Interpreter Design Considerations
Common mistakes
- Forget to unlock a mutex
- Race conditions
- Deadlocks
- Granularity issues: too much locking will kill the performance

Most programming languages have either:
- A guard object that will unlock a mutex upon destruction
- A synchronization statement

    some_rlock = threading.RLock()
    with some_rlock:
        print("some_rlock is locked while this executes")
Race condition
Can we detect potential race conditions during compilation?

Safe Shared Mutable State in Rust (1/3)
In the Rust programming language:
- Objects are owned by a specific thread
- Types can be marked with the Send trait, to indicate that the object can be moved between threads
- Types can be marked with the Sync trait, to indicate that the object can be accessed by multiple threads safely

    let mut data = vec![1u32, 2, 3];
    for j in 0..2 {
        thread::spawn(move || {
            for i in 0..2 {
                data[i] += 1;
            }
        });
    }

Gives an error: "capture of moved value: `data`"
Safe Shared Mutable State in Rust (2/3)
    let data = Mutex::new(vec![1u32, 2, 3]);
    for j in 0..2 {
        let data = data.lock().unwrap();
        thread::spawn(move || {
            for i in 0..2 {
                data[i] += 1;
            }
        });
    }

Gives an error: MutexGuard does not have the Send trait, meaning we cannot move data into the thread.

Safe Shared Mutable State in Rust (3/3)
    let data = Arc::new(Mutex::new(vec![1u32, 2, 3]));
    for j in 0..2 {
        let data = data.clone();
        thread::spawn(move || {
            let mut data = data.lock().unwrap();
            for i in 0..2 {
                data[i] += 1;
            }
        });
    }

Arc has the Sync trait, so the reference-counted clone can be moved into each thread.
Single Instruction, Multiple Threads Programming

With SIMT, the same instruction is executed by multiple threads on different registers.

Single instruction, multiple flow paths (1/2)
Using a masking system, it is possible to support if/else blocks. Threads are always executing the instructions of both parts of the if/else blocks:
    data = [-2, 0, 1, -1, 2], data2 = [...]
    function f(thread_id, data, data2)
    {
        if(data[thread_id] < 0)
        {
            data[thread_id] = data[thread_id] - data2[thread_id];
        } else if(data[thread_id] > 0)
        {
            data[thread_id] = data[thread_id] + data2[thread_id];
        }
    }

Single instruction, multiple flow paths (2/2)
Benefits:
- Multiple flows are needed in many algorithms
Drawbacks:
- Only one flow path is executed at a time; non-running threads must wait
- Randomized memory access: elements of a vector are not accessed sequentially
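The masking scheme can be simulated in plain Python. This is an illustrative sketch (the `data2` values are made up): every "lane" computes both branches, and a boolean mask decides which lanes commit each result.

```python
# Simulate SIMT execution of the if/else above: all lanes step through
# BOTH branches; masks select which lanes keep the branch's result.
data = [-2, 0, 1, -1, 2]
data2 = [10, 10, 10, 10, 10]
n = len(data)

# Branch 1: lanes where data[ti] < 0
mask = [data[ti] < 0 for ti in range(n)]
result = [data[ti] - data2[ti] for ti in range(n)]          # every lane computes
data = [result[ti] if mask[ti] else data[ti] for ti in range(n)]  # masked commit

# Branch 2: lanes where the original value was > 0
# (lanes already handled by branch 1 are excluded from the mask)
mask2 = [not mask[ti] and data[ti] > 0 for ti in range(n)]
result = [data[ti] + data2[ti] for ti in range(n)]
data = [result[ti] if mask2[ti] else data[ti] for ti in range(n)]

print(data)  # [-12, 0, 11, -11, 12]
```

Both `result` lists are computed for all five lanes even though each lane keeps at most one of them, which is exactly the drawback listed above.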
Programming Language Design for SIMT
- OpenCL and CUDA are the most common: very low level, C/C++ derivatives
- General purpose programming languages are not suitable
- Some work has been done to write Python for CUDA:
    @jit(argtypes=[float32[:], float32[:], float32[:]], target='gpu')
    def add_matrix(A, B, C):
        A[cuda.threadIdx.x] = B[cuda.threadIdx.x] + C[cuda.threadIdx.x]
  with limitations on the standard functions that can be called

Distributed programming
Distributed Programming (1/4)
In distributed computing, several computations are executed at the same time and communicate through message passing.
[diagram: several computation units, each with its own memory, connected by a network]

Distributed programming (2/4)
A distributed computing application consists of multiple programs running on multiple computers that together coordinate to perform some task.
- Computation is performed in parallel by many computers
- Information can be restricted to certain computers
- Redundancy and geographic diversity improve reliability

Distributed programming (3/4)
Characteristics of distributed computing:
- Individual programs have differentiating roles
- Computers are independent: they do not share memory
- Coordination is enabled by messages passed across a network

Distributed programming (4/4)
Distributed computing for large scale data processing:
- Databases respond to queries over a network
- Datasets can be partitioned across multiple machines
Message Passing
- Messages are (usually) passed through sockets
- Messages are exchanged synchronously or asynchronously
- Communication can be centralized or peer-to-peer
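A minimal sketch of socket-based message passing. For a self-contained example, a connected local socket pair stands in for the two endpoints that would normally live in different processes or machines:

```python
import socket

# A pair of connected sockets stands in for two communicating peers.
left, right = socket.socketpair()

# Synchronous exchange: recv() blocks until a message arrives.
left.sendall(b'ping')
message = right.recv(1024)
right.sendall(b'pong:' + message)
reply = left.recv(1024)
print(reply.decode())  # pong:ping

left.close()
right.close()
```

Over a network the only change is creating the sockets with `socket.socket()` plus `connect()`/`accept()`; the send/receive pattern is the same.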
Python's Global Interpreter Lock
- CPython can only interpret one single thread at a given time
- The lock is released:
  - when the current thread is blocking for I/O
  - every 100 interpreter ticks
- True multithreading is not possible with CPython

Python's Multiprocessing module
- The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads
- It implements transparent message passing, allowing Python objects to be exchanged between processes
Python's Message Passing (1/2)
Example of message passing:
    from multiprocessing import Process

    def f(name):
        print('hello', name)

    if __name__ == '__main__':
        p = Process(target=f, args=('bob',))
        p.start()
        p.join()
Output:
    hello bob

Python's Message Passing (2/2)
Example of message passing with pipes:
    from multiprocessing import Process, Pipe

    def f(conn):
        conn.send([42, None, 'hello'])
        conn.close()

    if __name__ == '__main__':
        parent_conn, child_conn = Pipe()
        p = Process(target=f, args=(child_conn,))
        p.start()
        print(parent_conn.recv())
        p.join()
Output:
    [42, None, 'hello']
Transparent message passing is possible thanks to serialization.
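Besides `Pipe`, the multiprocessing module also provides `Queue`, a many-producer, many-consumer channel built on the same pickled-message transport. A minimal sketch:

```python
from multiprocessing import Process, Queue

def worker(q):
    # the object is pickled and sent to the parent over an internal pipe
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # [42, None, 'hello']
    p.join()
```

`Queue` is usually preferred over a raw `Pipe` when several processes feed work items to several consumers.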
Serialization
A serialized object is an object represented as a sequence of bytes that includes the object's data, its type and the types of data stored in the object.

pickle
- In Python, serialization is done with the pickle module
- It can serialize user-defined classes: the class definition must be available before deserialization
- Works with different versions of Python
- Protocol 0, the historical default, uses an ASCII representation (newer protocols are binary)
- It can serialize:
  - Basic types: booleans, numbers, strings
  - Containers: tuples, lists, sets and dictionaries (of picklable objects)
  - Top-level functions and classes (only the name is serialized)
  - Objects where __dict__ or __getstate__() are picklable
Example:
    pickle.loads(pickle.dumps(10))
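A round-trip with a user-defined class illustrates the rule above: the bytes carry the instance's data and a reference to the class, so `Point` (a made-up example class) must be importable wherever the bytes are deserialized.

```python
import pickle

class Point:
    # A user-defined class; its definition must be available
    # wherever pickle.loads() is called.
    def __init__(self, x, y):
        self.x = x
        self.y = y

buf = pickle.dumps(Point(1, 2))   # serialize to a bytes object
restored = pickle.loads(buf)      # deserialize: a new, equal instance
print(restored.x, restored.y)  # 1 2
```

This is also exactly what multiprocessing does under the hood when a `Point` is sent through a `Pipe` or `Queue`.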
Shared memory
Memory can be shared between Python processes with a Value or an Array:
    from multiprocessing import Process, Value, Array

    def f(n, a):
        n.value = 3.1415927
        for i in range(len(a)):
            a[i] = -a[i]

    if __name__ == '__main__':
        num = Value('d', 0.0)
        arr = Array('i', range(10))
        p = Process(target=f, args=(num, arr))
        p.start()
        p.join()
        print(num.value)
        print(arr[:])
And of course, you would need to use a mutex to avoid race conditions.
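A synchronized `Value` already carries such a mutex: `get_lock()` returns it. A minimal sketch of several processes incrementing one shared counter safely:

```python
from multiprocessing import Process, Value

def add_100(n):
    for _ in range(100):
        # get_lock() returns the lock that guards this shared Value;
        # holding it makes the read-modify-write atomic across processes
        with n.get_lock():
            n.value += 1

if __name__ == '__main__':
    num = Value('i', 0)
    workers = [Process(target=add_100, args=(num,)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(num.value)  # 400
```

Without `get_lock()`, `n.value += 1` is the same non-atomic read-modify-write that caused the threading race earlier, only now between processes.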
MapReduce
Big Data Processing (1/2)
MapReduce is a framework for batch processing of big data.
- Framework: a system used by programmers to build applications
- Batch processing: all the data is available at the outset, and results are not used until processing completes
- Big data: used to describe data sets so large and comprehensive that they can reveal facts about a whole population, usually from statistical analysis

Big Data Processing (2/2)
- Datasets are too big to be analyzed by one machine
- Using multiple machines has the same complications, regardless of the application/analysis
- Pure functions enable an abstraction barrier between data processing logic and coordinating a distributed application
MapReduce Evaluation Model (1/2)
Map phase: apply a mapper function to all inputs, emitting intermediate key-value pairs.
- The mapper takes an iterable value containing inputs, such as lines of text
- The mapper yields zero or more key-value pairs for each input

MapReduce Evaluation Model (2/2)
Reduce phase: for each intermediate key, apply a reducer function to accumulate all values associated with that key.
- The reducer takes an iterable value containing intermediate key-value pairs
- All pairs with the same key appear consecutively
- The reducer yields zero or more values, each associated with that intermediate key
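The evaluation model can be sketched in a few lines of single-process Python. This is an illustrative toy (the classic word-count job, not from the slides): `mapreduce` performs the map phase, the shuffle (grouping by key) and the reduce phase in turn.

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit one (word, 1) pair per word in the input line
    for word in line.split():
        yield (word, 1)

def reducer(key, values):
    # Reduce phase: accumulate all values associated with one key
    yield (key, sum(values))

def mapreduce(inputs, mapper, reducer):
    # Shuffle: group intermediate pairs by key, so all pairs with the
    # same key are handed to the same reducer call
    groups = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)
    output = {}
    for key, values in sorted(groups.items()):
        for k, v in reducer(key, values):
            output[k] = v
    return output

lines = ['the cat sat', 'the cat ran']
print(mapreduce(lines, mapper, reducer))
# {'cat': 2, 'ran': 1, 'sat': 1, 'the': 2}
```

In a real framework the mapper calls and the reducer calls each run in parallel on many machines; only the shuffle moves data between them.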
MapReduce Execution Model (1/2)
[diagram]

MapReduce Execution Model (2/2)
[diagram]
MapReduce example
From a 1.1 billion people database (facebook?), we want to know the average number of friends per age.
In SQL:
    SELECT age, AVG(friends) FROM users GROUP BY age
In MapReduce, the total set of users is split into different users_set:
    function map(users_set)
    {
        for(user in users_set)
        {
            send(user.age, user.friends.size);
        }
    }
The keys are shuffled and assigned to reducers:
    function reduce(age, friends)
    {
        var r = 0;
        for(friend in friends)
        {
            r += friend;
        }
        send(age, r / friends.size);
    }

MapReduce Assumptions
Constraints on the mapper and reducer:
- The mapper must be equivalent to applying a deterministic pure function to each input independently
- The reducer must be equivalent to applying a deterministic pure function to the sequence of values for each key
Benefits of functional programming:
- When a program contains only pure functions, call expressions can be evaluated in any order, lazily, and in parallel
- Referential transparency: a call expression can be replaced by its value (or vice versa) without changing the program
In MapReduce, these functional programming ideas allow:
- Consistent results, however the computation is partitioned
- Re-computation and caching of results, as needed
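The average-friends job above can be simulated in a few lines of Python. This is a sketch with a made-up three-user dataset standing in for the partitioned users_set:

```python
from collections import defaultdict

# Made-up user records standing in for the users_set partitions
users = [
    {'age': 20, 'friends': 100},
    {'age': 20, 'friends': 300},
    {'age': 30, 'friends': 200},
]

def map_users(users_set):
    # mapper: emit (age, number_of_friends) for each user
    for user in users_set:
        yield (user['age'], user['friends'])

def reduce_avg(age, friend_counts):
    # reducer: average all friend counts collected for one age
    yield (age, sum(friend_counts) / len(friend_counts))

# shuffle: group the emitted pairs by key (the age)
groups = defaultdict(list)
for age, friends in map_users(users):
    groups[age].append(friends)

averages = {}
for age, counts in groups.items():
    for k, v in reduce_avg(age, counts):
        averages[k] = v
print(averages)  # {20: 200.0, 30: 200.0}
```

Because both functions are deterministic and pure, the result is the same however the users are partitioned across mappers, which is exactly the constraint stated above.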
MapReduce Benefits
- Fault tolerance: a machine or hard drive might crash
  - The MapReduce framework automatically re-runs failed tasks
- Speed: some machine might be slow because it's overloaded
  - The framework can run multiple copies of a task and keep the result of the one that finishes first
- Network locality: data transfer is expensive
  - The framework tries to schedule map tasks on the machines that hold the data to be processed
- Monitoring: will my job finish before dinner?!?
  - The framework provides a web-based interface describing jobs

Summary
- Parallel programming
- Multi-threading and how to help reduce programmer error
- Distributed programming and MapReduce