IPDPS Looking Back Panel Uzi Vishkin,

IPDPS Looking Back Panel
Uzi Vishkin, University of Maryland
Moderator: What has gone well?
IPDPS a big success. Has become top facilitator for:
•Specific technical contributions, and
•Open debate of challenges – e.g., this panel
Warm congrats to Viktor and all others!
What also has gone well
•Parallel PRAM algorithmic theory, second in
magnitude only to the serial algorithmic theory
•Won the “battle of ideas” in the 1980s. Repeatedly:
•Challenged without success → no real alternative!
Dream opportunity
Limited interest in parallel computing evolved into
quest for general-purpose parallel computing in
mainstream machines
Example: many-core desktops
So much for the good news
Are we doing everything we can to ensure that manycores are not rejected by programmers?
Recall Einstein’s observation:
A perfection of means, and confusion of aims, seems
to be our main problem
“If you find yourself in a hole, stop digging”
Moderator: What has gone wrong?
We found ourselves in a hole: most programmers can’t handle today’s
(multicore) desktops
Moderator: What was surprising?
We keep digging
Why are we in trouble?
1940s: Stored-program & program-counter → For serial comp:
knowledge of algorithms was low priority for arch (& knowledge of
arch was low priority for alg people)
No such agreed “bridge” for many-cores. Still:
- Industry realizes the need to reinvent computing for parallelism but is
stuck with short-term pressures/culture. Academia hold on.
- Education: Architects pay too limited attention to (this time parallel)
algorithms. How will they know to build machines that are easy to
program? Instead, funding guides using problematic designs
What follows
- Are many-core architectures doomed to mismatch
parallel algorithms and ease-of-programming (EoP)?
- What difference can a matching arch make?
- What is feasible?
How come that most programmers can’t handle
today’s (multicore) desktops?
Hypothesis: Flawed architecture foundation
Origin: ‘build-first figure-out-how-to-program-later’
Parallel languages: fitted flawed architectures then standardized
Who can save the field and promote the aim of ease-of-programming (EoP)?
Industry (perfecting means)
- Follow-up architectures fit language standards → remain flawed
- Insufficient competition
Academia (perfecting means)
- Consider a vendor-backed flawed system. Wonderful opportunity for our
originality-seeking publications culture:
* The simplest problem requires creativity → More papers
* Cite one another if on similar systems → high # citations coupled
with ‘industry impact’
- Ultimate job security – By the time the ink dries on these papers, next
flawed ‘modern’ ‘state-of-the-art’ system. Culture of short-term impact
Anecdotal Validation (?)
• Breadth-first-search (BFS) example: 42 students, joint UIUC/UMD course
- <1X speedups using OpenMP on 8-processor SMP
- 7x-25x speedups on 64-processor XMT FPGA prototype [Built at UMD]
What’s the big deal of 64 processors beating 8?
Silicon area of 64 XMT processors ~= 1-2 SMP processors
• Questionnaire: Rank approaches for achieving (hard) speedups. All students
but one: XMTC ahead of OpenMP
• Order-of-magnitude advantage on teachability (MS, HS & up, SIGCSE’10)
• SPAA’11: >100X speedup on max-flow relative to 2.5X on GPU (IPDPS’10)
• Fleck/Kuhn: research too esoteric to be reliable → exoteric validation!
What has gone wrong: Only heroic programmers can exploit the vast
parallelism in current machines – The Future of Computing Performance:
Game Over or Next Level?, Report by NAE 2011. Conclusion: Fund power..
Reward alert: Try to publish a paper boasting easy-to-obtain results
→ EoP: 1. Badly needed. Yet, 2. A lose-lose proposition.
Parallel Random-Access Machine/Model
PRAM:
n synchronous processors all having unit time access to a shared memory.
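To make the definition concrete, here is a minimal sketch that simulates PRAM-style synchronous steps on an ordinary serial machine. The function name `pram_sum` and the choice of balanced-tree summation are illustrative assumptions, not taken from the slides. In each round every active "processor" reads shared memory first, and only then do all of them write, which mimics the lock-step, unit-time access of the model.

```python
def pram_sum(values):
    """Simulate balanced-tree summation in PRAM-style synchronous rounds.

    Returns (total, rounds), where rounds is the number of synchronous steps.
    This is a serial simulation for illustration, not a parallel implementation.
    """
    shared = list(values)      # stands in for the shared memory
    n = len(shared)
    if n == 0:
        return 0, 0
    rounds = 0
    stride = 1
    while stride < n:
        # Processors sit at indices 0, 2*stride, 4*stride, ... and are active
        # only when they have a partner at index i + stride.
        active = range(0, n - stride, 2 * stride)
        # Read phase: all active processors read before anyone writes.
        pending = [(i, shared[i] + shared[i + stride]) for i in active]
        # Write phase: all active processors write their result.
        for i, v in pending:
            shared[i] = v
        stride *= 2
        rounds += 1
    return shared[0], rounds


total, rounds = pram_sum(range(1, 9))   # 8 values, so 3 rounds
```

With n inputs the loop runs about log2(n) rounds, while a serial loop would take n - 1 additions one after another.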
Reactions
You got to be kidding, this is way:
- Too easy
- Too difficult:
Why even mention processors? What to do with n processors?
How to allocate processors to instructions?
Immediate Concurrent Execution
‘Work-Depth framework’ [SV82], adopted in Par Alg texts [J92, KKT01].
ICE basis for architecture specs:
V, Using simple abstraction to reinvent computing for parallelism, CACM 1/2011
Similar to role of stored-program & program-counter in arch specs for serial comp
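A small sketch of what a work-depth description looks like, using the BFS example mentioned earlier. The function name `bfs_work_depth`, the adjacency-dictionary input, and the way work and depth are counted are assumptions made for illustration; this is a serial simulation, not XMTC code. Each pass of the `while` loop stands for one round in which all edges leaving the current frontier could be examined concurrently, with no mention of how many processors exist.

```python
def bfs_work_depth(adj, source):
    """Level-synchronous BFS described round by round.

    adj maps each vertex to a list of its neighbours.
    Returns (level, work, depth):
      work  = total number of edge examinations over all rounds
      depth = number of rounds (each round is conceptually concurrent)
    """
    level = {source: 0}
    frontier = [source]
    work = 0
    depth = 0
    while frontier:
        # Everything in this round could execute concurrently.
        work += sum(len(adj[v]) for v in frontier)
        next_frontier = []
        for v in frontier:
            for w in adj[v]:
                if w not in level:          # first visit wins
                    level[w] = level[v] + 1
                    next_frontier.append(w)
        frontier = next_frontier
        depth += 1
    return level, work, depth


# A 4-cycle: 0-1, 0-2, 1-3, 2-3
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
level, work, depth = bfs_work_depth(graph, 0)
```

The algorithm designer states only what can happen in each round; allocating those operations to processors is left to the compiler and hardware.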
Algorithms-aware many-core is feasible
[Figure: programmer’s workflow, from Algorithms to Programming]
PRAM-On-Chip HW Prototypes
• 64-core, 75MHz FPGA of XMT [SPAA98..CF08]
• Toolchain: Compiler + simulator, HIPS’11
• 128-core interconnection network, IBM 90nm: 9mmX5mm, 400 MHz [HotI07]
• FPGA design → ASIC, IBM 90nm: 10mmX10mm, 150 MHz
• Rudimentary yet stable compiler
Architecture scales to 1000+ cores on-chip
XMT homepage: www.umiacs.umd.edu/users/vishkin/XMT/index.shtml or search: ‘XMT’
Where are your specs?
What is your par alg abstraction?
• ‘First-specs then-build’ is “not uncommon”.. for engineering
• I see only 2 options for architects:
A. 1. Go through parallel algorithms immersion 2. Develop abstraction
that meets EoP 3. Develop specs 4. Build
B. Start from abstraction with proven EoP
Sociologists of science
• Debates between adherents of different thought
styles consist almost entirely of misunderstandings.
Members of both parties are talking of different
things (though they are usually under an illusion
that they are talking about the same thing). They
are applying different methods and criteria of
correctness (although they are usually under an
illusion that their arguments are universally valid
and if their opponents do not want to accept them,
then they are either stupid or malicious)