The Influence of Parallel Programming on Emerging Internet-based Distributed Application Models

Prof Mark Baker
School of Systems Engineering
University of Reading
Tel: +44 118 378 8615
E-mail: Mark.Baker@computer.org
Web: http://acet.rdg.ac.uk/~mab
May 21, 07
Outline
• Programming Models
• Parallel Programming:
  – Considerations,
  – Characteristics:
    • Applications,
  – Patterns.
• Distributed Programming:
  – Considerations,
  – Characteristics:
    • Applications,
  – Patterns.
• Observations.
• Architectures.
• Random musing.
• Summary.
Programming Models
• We are interested in programming models because they can potentially help us understand application development productivity and efficiency.
  – We want easy-to-write programs which can then be executed efficiently on a multi-processor/multi-core system.
• Typical questions when thinking about programming models:
  – How well do different programming models perform on different hardware/software platforms?
  – How do hardware and system software features of a computer affect the model's performance?
  – How easy is it to use each programming model?
  – Can the computer platforms support multiple programming models?
Parallel Programming
Considerations
System characteristics
• Machines:
  – Moved from specialised to commodity clusters.
• Homogeneous:
  – Hardware:
    • Processors, memory and cache.
  – Network:
    • Low-latency/high-bandwidth,
  – Topologies:
    • Specialised (Hypercubes/Torus) moving to commodity (Fat-Trees).
  – OS:
    • Often specialised to support hardware.
  – Runtime:
    • Optimised for underlying hardware and software, typically message passing.
• Nodes - moved from single processor, to SMPs, to multicore SMPs.
• Security - none.
Application characteristics
• Typically fine grain and fairly regular.
• Message passing (MPI) - parallelising compilers failed.
• Written in C/Fortran - or MPJ!!
• SPMD.
• Want to:
  – Attempt to overlap communication and computation,
  – Avoid race conditions and deadlock,
  – Consider Amdahl's law, speedup of code overall,
  – Minimise communications,
  – Attempt to retain data locality,
  – Potentially pin processes to processors.
• Other matters:
  – Take advantage of processor architecture:
    • Cache lines/hits,
    • Pipelining, Vectorisation!
  – Compiler/Library optimisations…
  – Execute - batch or interactive access to resources.
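A small sketch of the Amdahl's law point above, for illustration only. It assumes the usual textbook form, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the run time that can be parallelised and n is the number of processors. The function name and the 95% figure are made up for the example and do not come from the slides.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Upper bound on overall speedup when only part of the code runs in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unaffected; only the parallel part shrinks with more processors.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Hypothetical code that is 95% parallelisable (an assumed figure).
    for n in (1, 8, 64, 1024):
        print(f"{n:5d} processors -> speedup {amdahl_speedup(0.95, n):.2f}")
```

However many processors are added, the speedup stays below 1 / (1 - p), which is 20 for the assumed 95% case. This is why the overall speedup of the code, not just of its parallel kernel, is the figure to watch.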
Parallel Programming
Patterns
Parallel Programming Patterns
• Driving Forces of Parallel Patterns
  – Decomposition:
    • (D) Data decomposition,
    • (F) Functional decomposition.
  – Ordering:
    • (P) Ordering must be preserved,
    • (NP) Ordering does not matter and only the effect counts.
  – Communication structure:
    • (L) Local: neighbour-to-neighbour,
    • (R) Recursive: Tree structure,
    • (I) Irregular: Complex geometric shapes.
  – Dependencies:
    • (C) Clearly separable,
    • (D) Deterministically (or functionally) dependent,
    • (I) Inseparable.
• Marc Snir - http://wing.cs.uiuc.edu/group/patterns/
Some Parallel Patterns
Just the more familiar patterns!
Examples of Parallel Patterns
• Pipes:
  – Image processing, streams of pixel data are passed through several filters.
• Layers:
  – Gaussian elimination.
• Repositories:
  – Tuple space - sources generate asynchronous requests to read, remove, and add tuples.
• Master/Worker:
  – Image processing.
• Replicable:
  – N-Body problem, calculation of forces needs data from other N-1 bodies.
• Divide and Conquer:
  – Mergesort.
• Geometric:
  – Simulation of dynamic systems - Atmospheric model.
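To make the Pipes example concrete, here is a minimal sketch of a stream of pixel values passing through two filters. The filter names (`invert`, `threshold`) and the pixel values are invented for illustration. Python generators only model the data flow: the stages here run in one thread, whereas in a parallel setting each filter would typically sit on its own processor.

```python
from typing import Iterable, Iterator


def invert(pixels: Iterable[int]) -> Iterator[int]:
    """Filter 1: invert each 8-bit grey-scale pixel."""
    for p in pixels:
        yield 255 - p


def threshold(pixels: Iterable[int], cutoff: int = 128) -> Iterator[int]:
    """Filter 2: map each pixel to white (255) if at or above cutoff, else black (0)."""
    for p in pixels:
        yield 255 if p >= cutoff else 0


def pipeline(pixels: Iterable[int]) -> Iterator[int]:
    """Connect the filters; each pixel flows through every stage in order."""
    return threshold(invert(pixels))


if __name__ == "__main__":
    stream = [0, 100, 127, 128, 200, 255]  # made-up pixel values
    print(list(pipeline(stream)))
```

Because each filter only depends on the output of the one before it, this is a functional decomposition in which ordering is preserved, matching the (F) and (P) driving forces listed earlier.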
Distributed Programming
Considerations
System characteristics
• Machines:
  – Anything…
• Heterogeneous:
  – Hardware:
    • Processors, memory and cache.
  – Network:
    • Relatively high-latency/low-bandwidth,
  – Topologies:
    • Network of networks, head-node access to cluster.
  – OS:
    • Anything - Windows, Linux, Mac, …
  – Runtime:
    • Anything…
• Nodes:
  – From a single processor, to SMPs, to multi-core SMPs.
• Security:
  – Lots… firewalls, credentials (X.509, proxy),…
Application characteristics
• Typically coarse grain and regular,
• Typically sequential, some messaging via Internet protocols.
• Latency tolerant!
• Not too worried about race conditions and deadlock!
• C/C++/Fortran…
• Other matters:
  – Data locality… caching!
  – Cannot pin processes to processors,
  – Need to cope with firewalls/security,
  – Normally via batch processing…
Distributed Application Patterns
• Client/Server - majority!
• P2P - BitTorrent/Limewire!
• Master/Slave:
  – Parameter sweeps - Nimrod/Condor-G/…
• Workflow:
  – Taverna, BPEL…
• Distributed Databases:
  – OGSA-DAI…
• Data Distribution:
  – LHC.
• Coupled Simulations!
  – Environmental…
• Access to shared resources:
  – Earthquake Engineering Simulation Grid (NEESgrid).
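As an illustration of the Master/Slave parameter-sweep pattern listed above, the sketch below runs one independent job per parameter value and gathers the results. It is only a local stand-in: a thread pool plays the role of the remote resources, `simulate` is a hypothetical placeholder for a real long-running job, and none of this reflects how Nimrod or Condor-G are actually driven.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(parameter: float) -> float:
    """Hypothetical stand-in for one independent, long-running job."""
    return parameter ** 2


def parameter_sweep(parameters, workers: int = 4):
    """Master: hand one job per parameter to a pool of workers and gather the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so they line up with the parameters.
        results = list(pool.map(simulate, parameters))
    return dict(zip(parameters, results))


if __name__ == "__main__":
    print(parameter_sweep([0.5, 1.0, 1.5, 2.0]))
```

The jobs never talk to each other, only to the master, which is what makes this pattern tolerant of the high latency of distributed platforms.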
Observations
• Some distributed applications have models/patterns similar to those of parallel applications, but these are, as expected, coarse grain with little or no inter-processor communication:
  – Pipes/Filters,
  – Master/Worker,
  – Repositories.
• However, the nature of parallel and distributed systems is very different, and consequently the applications that suit each kind of system differ.
Observations
• Typically:
  – Parallel applications are written to exploit the underlying hardware/software platform - be that a Beowulf or proprietary platform (e.g. BG/L),
  – Distributed/Grid applications have to utilise the underlying middleware to execute on whatever resources are available - so are much more general in nature and not optimised for the underlying platform:
    • Note - there appear to be no tools to auto-optimise/tune these applications… also there are not many tools to help the programmer debug them either.
Hardware Architecture
• Global layer of peer-related gateways:
  – Which in turn have a local layer that interacts with the local data sources, and/or a hierarchy of child gateways.
Software Architecture
• Service Oriented Architecture:
  – Publish, subscribe and bind paradigm,
  – Ocean of services:
    • Need to find them…
    • Interact via messages.
  – Written in any language,
  – Services are dynamic, adaptable, and composable.
  – Stateful/stateless services?
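A rough sketch of the ideas on this slide, under stated assumptions: a provider publishes a service, a consumer finds it in the "ocean of services", binds to it, and interacts purely via messages. The `Registry` class, the service name and the dictionary message format are all hypothetical, and an in-memory lookup stands in for real discovery middleware.

```python
from typing import Callable, Dict

Message = Dict[str, float]
Service = Callable[[Message], Message]


class Registry:
    """Hypothetical in-memory service registry (a stand-in for real discovery)."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def publish(self, name: str, service: Service) -> None:
        """A provider advertises a service under a name."""
        self._services[name] = service

    def find(self, name: str) -> Service:
        """A consumer looks a service up; raises KeyError if it is unknown."""
        return self._services[name]


def temperature_service(request: Message) -> Message:
    """Stateless example service: converts Celsius to Fahrenheit."""
    return {"fahrenheit": request["celsius"] * 9.0 / 5.0 + 32.0}


if __name__ == "__main__":
    registry = Registry()
    registry.publish("convert-temperature", temperature_service)
    service = registry.find("convert-temperature")  # find the service ...
    print(service({"celsius": 100.0}))              # ... then bind and send a message
```

The example service is stateless, so any replica could answer any request. A stateful service would need its state kept somewhere between messages, which is the question the last bullet raises.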
Random Musing
• Tangential ones:
  – What are the effects of virtualisation of system resources on application models?
  – How do we exploit multi-core systems?
    • Message-passing, plus threads…
  – How do we provide fault-tolerant services?
  – What are the effects of security on the application models?
  – Following standards means that we have to manipulate our applications to fit into a certain “model” of operation/execution! - is this a good thing?
  – Waiting for standards to pop out means that many developers are in limbo…
    • We have used XOH to avoid the “standards” trap!
  – Integration of start-up functionality into applications, rather than a batching/scripting approach.
  – Current generation of Grid middleware software is far too complicated…
  – Higher likelihood that functionality needed for distributed systems will be put onto chips.
Summary
• Parallel platforms are typically homogeneous, tightly coupled, with low-latency/high-bandwidth networks.
• Distributed platforms are heterogeneous, loosely coupled, with relatively high-latency/low-bandwidth networks.
• The application styles that work on both platforms are:
  – Pipes/Filters - functional decomposition,
  – Master/Worker,
  – Repositories.
• The applications that best exploit distributed platforms are coarse grain and fairly long lived, to ameliorate the overheads of the underlying infrastructure.
• It seems that many of the lessons learnt from parallel programming have been lost/forgotten for distributed programming!
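A back-of-the-envelope sketch of why coarse-grain, long-lived tasks suit distributed platforms. It assumes a simple linear cost model (message time = latency + size / bandwidth) and defines efficiency as useful compute time over total time. Both function names and every number below are illustrative assumptions, not measurements from the talk.

```python
def message_time(latency_s: float, bandwidth_bytes_per_s: float, size_bytes: float) -> float:
    """Simple linear cost model: start-up latency plus transfer time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def efficiency(compute_s: float, overhead_s: float) -> float:
    """Fraction of the total time for one task that is spent on useful work."""
    if compute_s <= 0 or overhead_s < 0:
        raise ValueError("compute_s must be positive and overhead_s non-negative")
    return compute_s / (compute_s + overhead_s)


if __name__ == "__main__":
    # Assumed overhead per job on a distributed platform: scheduling, staging, security.
    overhead = 60.0
    for task_seconds in (1.0, 60.0, 3600.0):
        print(f"task of {task_seconds:7.0f} s -> efficiency {efficiency(task_seconds, overhead):.3f}")
    # Assumed wide-area link: 50 ms latency, 1 MB/s, sending a 1 MB message.
    print(f"wide-area message: {message_time(0.05, 1e6, 1e6):.2f} s")
```

With a fixed per-job overhead, a task as long as the overhead itself is only 50% efficient, and efficiency approaches 1 only as tasks become much longer than the overhead. This is the granularity argument made in the summary above.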
Summary
• Potential ways ahead:
  – Look at current generation of distributed/grid applications,
  – Identify key common features,
  – Use these to help categorise the applications,
  – Features to be identified:
    • For Applications perhaps:
      – Architecture (layout/framework),
      – Components,
      – Interactions - user/IO/data/messages,
      – Algorithms.