Parallel Computing (平行計算)
賴寶蓮

Reference
https://computing.llnl.gov/tutorials/parallel_comp/
Parallel Programming in C with MPI and OpenMP

Textbook
Introduction to Parallel Computing
George Karypis, Vipin Kumar
2003, Pearson (distributed by 全華)
2015/4/13
Reference Materials
Parallel Programming in C with MPI and OpenMP, McGraw-Hill, 2004
Patterns for Parallel Programming, Addison-Wesley, 2004
Parallel Programming with MPI, Peter S. Pacheco, Morgan Kaufmann Publishers, 1997
OpenMP Specification, www.openmp.org
www.top500.org
Parallel Computing (journal)
International Journal of Parallel Programming
von Neumann Architecture
Comprised of four main components:
Memory
Control Unit
Arithmetic Logic Unit
Input/Output
Serial computing
Traditionally, software has been written for serial computation:
To be run on a single computer having a single Central Processing Unit (CPU);
A problem is broken into a discrete series of instructions.
Instructions are executed one after another.
Only one instruction may execute at any moment in time.
Parallel computing
In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
To be run using multiple CPUs
A problem is broken into discrete parts that can be solved concurrently
Each part is further broken down to a series of instructions
Instructions from each part execute simultaneously on different CPUs
Parallel computing
The compute resources can include:
A single computer with multiple processors;
An arbitrary number of computers connected by a network;
A combination of both.
Parallel computing
The computational problem usually demonstrates characteristics such as the ability to be:
Broken apart into discrete pieces of work that can be solved simultaneously;
Executed as multiple program instructions at any moment in time;
Solved in less time with multiple compute resources than with a single compute resource.
The Universe is Parallel:
Parallel computing is an evolution of serial computing that attempts to emulate what has always been the state of affairs in the natural world: many complex, interrelated events happening at the same time, yet within a sequence.
For example:
Galaxy formation
Planetary movement
Weather and ocean patterns
Tectonic plate drift
Rush hour traffic
Automobile assembly lines
Building a space shuttle
Ordering a hamburger at the drive-through
Uses for Parallel Computing:
Historically, parallel computing has been considered to be "the high end of computing", and has been used to model difficult scientific and engineering problems found in the real world.
Some examples:
Atmosphere, Earth, Environment
Physics - applied, nuclear, particle, condensed matter, high pressure, fusion, photonics
Bioscience, Biotechnology, Genetics
Chemistry, Molecular Sciences
Geology, Seismology
Mechanical Engineering - from prosthetics to spacecraft
Electrical Engineering, Circuit Design, Microelectronics
Computer Science, Mathematics
Uses for Parallel Computing:
Today, commercial applications provide an equal or greater driving force in the development of faster computers. These applications require the processing of large amounts of data in sophisticated ways.
For example:
Databases, data mining
Oil exploration
Web search engines, web based business services
Medical imaging and diagnosis
Pharmaceutical design
Management of national and multi-national corporations
Financial and economic modeling
Advanced graphics and virtual reality, particularly in the entertainment industry
Networked video and multi-media technologies
Collaborative work environments
Why Use Parallel Computing?
Main Reasons:
Save time and/or money
Solve larger problems
Provide concurrency
Use of non-local resources
Limits to serial computing
Why Use Parallel Computing?
Save time and/or money:
In theory, throwing more resources at a task will shorten its time to completion, with potential cost savings.
Parallel clusters can be built from cheap, commodity components.
Why Use Parallel Computing?
Solve larger problems:
Many problems are so large and/or complex that it is impractical or impossible to solve them on a single computer, especially given limited computer memory.
For example:
"Grand Challenge" problems (en.wikipedia.org/wiki/Grand_Challenge) requiring PetaFLOPS and PetaBytes of computing resources.
Web search engines/databases processing millions of transactions per second.
Why Use Parallel Computing?
Provide concurrency:
A single compute resource can only do one thing at a time.
Multiple computing resources can be doing many things simultaneously.
For example, the Access Grid (www.accessgrid.org) provides a global collaboration network where people from around the world can meet and conduct work "virtually".
Why Use Parallel Computing?
Use of non-local resources:
Using compute resources on a wide area network, or even the Internet, when local compute resources are scarce.
For example:
SETI@home (setiathome.berkeley.edu) uses over 330,000 computers for a compute power of over 528 TeraFLOPS (as of August 04, 2008)
Folding@home (folding.stanford.edu) uses over 340,000 computers for a compute power of 4.2 PetaFLOPS (as of November 4, 2008)
One petaflops is equal to 1,000 teraflops, or 1,000,000,000,000,000 FLOPS.
Why Use Parallel Computing?
Limits to serial computing:
Both physical and practical reasons pose significant constraints to simply building ever faster serial computers:
Transmission speeds
Limits to miniaturization
Economic limitations
Why Use Parallel Computing?
Current computer architectures are increasingly relying upon hardware-level parallelism to improve performance:
Multiple execution units
Pipelined instructions
Multi-core
Who and What?
Top500.org provides statistics on parallel computing users; the charts below are just a sample.
Some things to note:
Sectors may overlap: for example, research may be classified research, and respondents have to choose between the two.
"Not Specified" is by far the largest application, which probably means multiple applications.
The Future:
During the past 20 years, the trends indicated by ever faster networks, distributed systems, and multiprocessor computer architectures (even at the desktop level) clearly show that parallelism is the future of computing.
Modern Parallel Computers
Caltech's Cosmic Cube (Seitz and Fox)
Commercial copy-cats:
nCUBE Corporation
Intel's Supercomputer Systems Division
Lots more
Thinking Machines Corporation
Modern Parallel Computers
(2009/11 TOP 5)
Cray Jaguar (224162 cores, 1.75 PFlops)
IBM Roadrunner (122400 cores, 1.04 PFlops)
Cray Kraken XT5 (98928 cores, 831 TFlops)
IBM JUGENE (294912 cores, 825 TFlops)
NUDT Tianhe-1 (71680 cores, 563 TFlops)
IBM 1350 (台灣國網中心) (2048 cores, 23 TFlops)
Seeking Concurrency
Data dependence graphs
Data parallelism
Functional parallelism
Pipelining
Data Dependence Graph
Directed graph
Vertices = tasks
Edges = dependences
Data Parallelism
Independent tasks apply the same operation to different elements of a data set:
for i ← 0 to 99 do
    a[i] ← b[i] + c[i]
endfor
Okay to perform operations concurrently
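The loop above maps directly onto OpenMP, the shared-memory API used later in this course. A minimal sketch in C (the function name `vector_add` is an illustration, not from the slides):

```c
#include <stddef.h>

/* Data parallelism: every iteration applies the same operation to a
   different element, with no dependences between iterations, so the
   iterations may execute concurrently. */
void vector_add(double *a, const double *b, const double *c, size_t n)
{
    /* With OpenMP enabled (-fopenmp), the iterations are divided among
       threads; without it the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        a[i] = b[i] + c[i];
}
```

Because no iteration reads a value written by another, the result is the same whether the loop runs serially or in parallel.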
Functional Parallelism
Independent tasks apply different operations to different data elements:
1. a ← 2
2. b ← 3
3. m ← (a + b) / 2
4. s ← (a² + b²) / 2
5. v ← s − m²
The first and second statements are independent of each other, as are the third and fourth.
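The five statements can be written out in C, with comments marking which pairs could run concurrently (the helper name `stats` is invented here for illustration):

```c
/* Functional parallelism: statements 1 and 2 are independent of each
   other, as are statements 3 and 4; statement 5 must wait for both. */
void stats(double *m, double *s, double *v)
{
    double a = 2.0;              /* 1: independent of 2          */
    double b = 3.0;              /* 2: independent of 1          */
    *m = (a + b) / 2.0;          /* 3: independent of 4          */
    *s = (a * a + b * b) / 2.0;  /* 4: independent of 3          */
    *v = *s - *m * *m;           /* 5: depends on both 3 and 4   */
}
```

A data dependence graph for these statements would have edges from 1 and 2 into 3 and 4, and from 3 and 4 into 5.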
Pipelining
Divide a process into stages
Produce several items simultaneously
Partial Sums Pipeline
28
p[0]
p[1]
p[0]
p[2]
p[1]
p[3]
p[2]
=
+
+
+
a[0]
a[1]
a[2]
a[3]
2015/4/13
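The pipeline stages above compute an ordinary prefix sum: stage i adds a[i] to the running sum produced by stage i−1. A serial sketch of what each stage outputs (function name is illustrative):

```c
#include <stddef.h>

/* Partial sums pipeline: stage i adds a[i] to the sum from stage i-1.
   When many inputs stream through the pipeline, all stages can be busy
   at once; here the stages are simply shown in sequence. */
void partial_sums(const double *a, double *p, size_t n)
{
    if (n == 0)
        return;
    p[0] = a[0];                 /* first stage: p[0] = a[0]       */
    for (size_t i = 1; i < n; i++)
        p[i] = p[i - 1] + a[i];  /* stage i: p[i] = p[i-1] + a[i]  */
}
```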
Data Clustering
Data mining: looking for meaningful patterns in large data sets
Data clustering: organizing a data set into clusters of "similar" items
Data clustering can speed retrieval of related items
A graphical representation of Amdahl's law
The speedup of a program using multiple processors in parallel computing is limited by the sequential fraction of the program.
For example, if 95% of the program can be parallelized, the theoretical maximum speedup using parallel computing would be 20×, no matter how many processors are used.
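The 20× ceiling follows from Amdahl's formula S(n) = 1 / ((1 − f) + f/n), where f is the parallel fraction and n the number of processors. A small sketch (function name is illustrative):

```c
/* Amdahl's law: speedup on n processors when a fraction f of the
   program can be parallelized.  As n grows, S(n) approaches 1/(1-f);
   for f = 0.95 the ceiling is 1/0.05 = 20x, no matter how large n is. */
double amdahl_speedup(double f, double n)
{
    return 1.0 / ((1.0 - f) + f / n);
}
```

For instance, amdahl_speedup(0.95, 1000) is already about 19.6, and adding more processors only inches it toward 20.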
Programming Parallel Computers
Extend compilers: translate sequential programs into parallel programs
Extend languages: add parallel operations
Add a parallel language layer on top of a sequential language
Define a totally new parallel language and compiler system
Strategy 1: Extend Compilers
Parallelizing compiler:
Detects parallelism in a sequential program
Produces a parallel executable program
Focus on making Fortran programs parallel
Extend Compilers (cont.)
Advantages:
Can leverage millions of lines of existing serial programs
Saves time and labor
Requires no retraining of programmers
Sequential programming is easier than parallel programming
Extend Compilers (cont.)
Disadvantages:
Parallelism may be irretrievably lost when programs are written in sequential languages
Performance of parallelizing compilers on a broad range of applications is still up in the air
Extend Language
Add functions to a sequential language:
Create and terminate processes
Synchronize processes
Allow processes to communicate
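On POSIX systems these three capabilities can be illustrated with existing system calls: fork() creates a process, a pipe lets processes communicate, and wait() synchronizes on termination. A minimal sketch (the function name is invented for illustration):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a child process, let it compute a sum, send the result back
   through a pipe, and wait for the child to terminate. */
int parent_child_sum(int x, int y)
{
    int fd[2], sum = 0;
    if (pipe(fd) != 0)
        return -1;

    pid_t pid = fork();                      /* create a process       */
    if (pid == 0) {                          /* child                  */
        int s = x + y;
        if (write(fd[1], &s, sizeof s) < 0)  /* communicate via pipe   */
            _exit(1);
        _exit(0);                            /* terminate the process  */
    }
    if (read(fd[0], &sum, sizeof sum) < 0)   /* parent receives result */
        sum = -1;
    waitpid(pid, NULL, 0);                   /* synchronize on exit    */
    close(fd[0]);
    close(fd[1]);
    return sum;
}
```

Message-passing libraries like MPI wrap exactly this kind of create/communicate/synchronize functionality in portable function calls.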
Extend Language (cont.)
Advantages:
Easiest, quickest, and least expensive
Allows existing compiler technology to be leveraged
New libraries can be ready soon after new parallel computers are available
Extend Language (cont.)
Disadvantages:
Lack of compiler support to catch errors
Easy to write programs that are difficult to debug
Current Status
The low-level approach is most popular:
Augment an existing language with low-level parallel constructs (by function call)
MPI and OpenMP are examples
Advantages of the low-level approach:
Efficiency
Portability
Disadvantage: more difficult to program and debug
OpenMP
Uses a shared-memory architecture
https://computing.llnl.gov/tutorials/parallel_comp/
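A minimal OpenMP sketch of the shared-memory model (the function name is illustrative): all threads see the same array, and the reduction clause combines the per-thread partial sums. Compile with -fopenmp; without it the pragma is ignored and the code still runs serially:

```c
#include <stddef.h>

/* Shared-memory parallelism with OpenMP: the array x is shared by all
   threads, each thread sums a chunk of it, and reduction(+:total)
   combines the per-thread totals into a single result. */
double array_sum(const double *x, size_t n)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; i++)
        total += x[i];
    return total;
}
```

This directive style is what distinguishes OpenMP from MPI: the data stays in one shared address space, so no explicit messages are needed.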
MPI
MPI is a standard
Uses distributed memory
The more popular implementations on the market are LAM and MPICH.
https://computing.llnl.gov/tutorials/parallel_comp/
Organization and Contents of this Course
Fundamentals: This part of the class covers
basic parallel platforms,
principles of algorithm design,
group communication primitives, and
analytical modeling techniques.
Organization and Contents of this Course
Parallel Programming: This part of the class deals with programming using
message passing libraries and
threads.
Parallel Algorithms: This part of the class covers basic algorithms for
matrix computations,
graphs, sorting,
discrete optimization, and
dynamic programming.
Course Outline
Introduction to Parallel Computing
Parallel Programming Platforms
Principles of Parallel Algorithm Design
Basic Communication Operations
Analytical Modeling of Parallel Programs
Programming Using the Message-Passing Paradigm (MPI)
Midterm report (or midterm exam)
Course Outline
Programming Shared Address Space Platforms (OpenMP)
Dense Matrix Algorithms
Sorting Algorithms
Graph Algorithms
Introduction to Hadoop, installation, and setup
Overview of the MapReduce programming model
MapReduce example programs and hands-on implementation
Final project (or final report)
Grading
Coursework (class participation, attendance, labs, homework) 40%
Midterm report (or midterm exam) 30%
Final project (or final report) 30%
P2P System
Peer-to-peer system:
Each client also acts as a server
Share data
e.g., BitTorrent (BT)
http://www.fidis.net/typo3temp/tx_rlmpofficelib_0c97e8a6cd.png
Grid Computing
Distributed computing
Ethernet / Internet
Volunteer computing networks
Software-as-a-Service (SaaS):
Software that is owned, delivered and managed remotely by one or more providers.
http://www.csa.com/discoveryguides/grid/images/gridcomp.gif
Cloud Computing
Cloud computing is not a technology; it is a concept
Distributed computing
Web services
Cloud Computing
http://lonewolflibrarian.files.wordpress.com/2009/02/cloud-computing-kitchen-sink.jpg
Web 2.0
Web 2.0 is a platform built on the Internet: content produced by people interacting and sharing with one another is published, managed, and consumed on this platform through programs in a service-oriented architecture. The services themselves are invoked through an Application Programming Interface (API).
Software as a Service (SaaS)
SaaS is a model for delivering and using application software.
In this model, the application software is controlled and managed by service providers and delivered as a network service, giving users and customers on-demand access to software applications.
On the user and client side, there is no need to install, maintain, or update application software or hardware.
Platform as a Service (PaaS)
PaaS is an application deployment model that emerged in response to SaaS.
The PaaS model provides the facilities needed to build and publish web applications and services, covering the full software life cycle. It is used over the Internet with no software download or installation, and developers, IT administrators, and end users can all benefit from the platform. This is what is commonly called "cloudware".
Infrastructure as a Service (IaaS)
IaaS (originally Hardware as a Service, HaaS) provides computing infrastructure, usually a platform-virtualization environment, as a service.
Resources include servers, network equipment, memory (RAM), storage (disk), CPU, data-center facilities, and so on.
Resources are allocated dynamically, growing or shrinking according to the computational needs of the application.
Cloud Computing
The concept of cloud computing combines HaaS, PaaS, SaaS, Web 2.0, and other related technologies (such as MapReduce, Ajax, and virtualization) on top of the Internet architecture to satisfy users' demands for computing resources.
The Internet Bubble Again?!
Causes of the 2000 dot-com bust
2011 viewpoints:
The Nasdaq 100 has passed its 2007 high: is the Internet bubble back?
"Winter comes: is the Internet's free lunch still available?" (Web Only, 2009/03)
"Google's CEO sees an Internet bubble" (ZDNet reporter 曠文溱, Taipei, 2010/02/08; source: The Economist)
"Analysts on the cloud: beware a replay of the dot-com bust" (2011/01/11, 雷佳宜)
"The second Internet bubble keeps growing" (Yahoo News, updated 2011/02/11 13:36; Udn News, updated 2011/02/21 09:46)
Winter Comes: Is the Internet's Free Lunch Still Available?
The last Internet bubble burst precisely because the free services offered by websites could not be converted into profit through advertising, even after attracting large numbers of users.
Google's success then inflated the "Web 2.0" bubble all over again.
Winter Comes: Is the Internet's Free Lunch Still Available?
Most observers believe the most important cause of the last Internet bust was insufficient bandwidth.
Now the free lunch is popular again: MySpace, YouTube, Facebook, and Twitter all offer free services, hoping that one day their large user bases will bring in large advertising profits.
Winter Comes: Is the Internet's Free Lunch Still Available?
Reality has once again shown that far fewer companies than most people imagine can sustain themselves on online advertising revenue.
Silicon Valley seems to have entered another winter: Internet companies are again laying off staff, downsizing, shutting down, trying to sell themselves, or starting to charge their users.
Winter Comes: Is the Internet's Free Lunch Still Available?
MySpace and YouTube found buyers before the bubble burst, handing the problem of "finding a profitable business model" to someone else.
Facebook has tried many approaches; the latest is Facebook Connect, a service that lets external websites use Facebook's login mechanism, in effect turning Facebook members directly into members of those external sites. (See the article 站在巨人的肩膀上:超上手之Facebook Connect)
Twitter's founders planned not to consider profit before 2010, but are now planning to insert advertisements into their pages.
Winter Comes: Is the Internet's Free Lunch Still Available?
Offering services for free in the hope of future advertising revenue is naturally popular with users, and it also fits business logic.
Thanks to Internet technology, the barrier to entry for new companies has fallen, and network effects let companies attract and retain users at very low cost.
Winter Comes: Is the Internet's Free Lunch Still Available?
Social networks, search engines, and other sites all rush to offer free services, because a company that worries too much about future profit may already be falling behind.
In the end, however, every company still needs profit, and advertising cannot provide enough of it.
Free services are an attractive idea, but the lesson of two Internet bubbles is that someone always has to pay for the lunch.
Analysts on the Cloud: Beware a Replay of the Dot-com Bust
From Gartner's point of view, this wave of cloud enthusiasm may leave many vendors dead on the battlefield.
Gartner is an American information-technology research and advisory company headquartered in Stamford, Connecticut. Founded in 1979, it was known as The Gartner Group before 2001.
Analysts on the Cloud: Beware a Replay of the Dot-com Bust
Phillip Sargeant, Gartner's managing vice president for the global storage market, said in Taiwan last week that because vendors such as IBM, HP, and Microsoft offer Infrastructure-as-a-Service (IaaS) or Platform-as-a-Service (PaaS), the barrier to offering cloud services has fallen, and companies are rushing into the cloud one after another.
In the foreseeable future, several companies will be competing in each type of service; some will inevitably be eliminated, replaying the dot-com bust.
He argued that vendors intending to offer cloud services must ensure their products are unique, or they will be forced out of the market.
分析師看雲端:
小心重演網路泡沫化
76
另外一點造就成敗與否的關鍵在於成本。
 Sargeant表示,業者除了必須提出差異化的
產品,也要能說服用戶

 採用雲端服務確實比自行建置來得划算——從
總持有成本的角度來看。
2015/4/13
分析師看雲端:
小心重演網路泡沫化
77

雲端議題當道。根據Gartner的調查報告,
今年十大IT技術中,前三名排名分別為
 虛擬化
 雲端運算
 Web

2.0
其中,雲端運算和Web 2.0晉級的速度飛快
 去年這兩項技術的排名分別為16名和15名。
2015/4/13
Analysts on the Cloud: Beware a Replay of the Dot-com Bust
The outlook for the cloud is positive: Gartner expects that by 2012, 20% of enterprises will no longer own their own IT equipment.
Yet Gartner also keeps applying the brakes, saying the cloud is really happening but enterprises should not rush in.
Gartner vice president Stephen Prentice pointed out late last month that for enterprise users, the current risk of adopting cloud services is that the choices are still too few, with only providers such as Google, Amazon, and Microsoft. Cloud infrastructure located far away in the United States also raises doubts about security and responsiveness.
Google's CEO Sees an Internet Bubble
The Swiss magazine Bilanz today published an interview with Google CEO Eric Schmidt, in which he said the signs of a bubble in Internet-company valuations are obvious.
Asked about the high valuations of social-networking companies such as Facebook and game developers such as Zynga, Schmidt said: "There are clear signs of a bubble, but that is what the valuations are. People believe these companies can earn enormous revenues in the future."
谷歌執行長看見網路泡沫
80


「華爾街日報」(The Wall Street Journal)今天
報導,谷歌、臉書與其他公司都與推特(Twitter
)進行低階的併購談判。市場估計推特市值高
達100億美元。
詢及他是否將繼續在谷歌待4年,以獲取價值1
億美元的股票,施密特說:「是,我的計畫是
這樣。」
2015/4/13
The Second Internet Bubble Keeps Growing
The Guardian reports that the potential valuations of new-generation Internet companies have soared recently. Facebook, the current darling, is valued even higher than Ford Motor Company, but for some investors the alarm bells of a second Internet bubble are already ringing.
The Second Internet Bubble Keeps Growing
Exciting news about new-generation Internet companies arrives almost every other week:
Social-gaming company Zynga, riding FarmVille, is valued at US$9 billion
Microblogging service Twitter, though still unprofitable, is valued at US$10 billion
Group-buying site Groupon, which just rejected a US$6 billion acquisition offer from Google, has a potential valuation of US$15 billion
The Second Internet Bubble Keeps Growing
Some technology observers say the real boom will arrive when Facebook goes public, possibly next year.
Based on the per-share price of a recent plan by Facebook employees to sell stock, the company is valued at up to US$60 billion, far above the US$10 billion estimate of last January.
Although that is still less than one third of Google's market value, it already exceeds Ford's US$55 billion and is slightly below Visa's US$63 billion.
二次網路泡沫 愈吹愈大了
84


對照過往,1995年網景公司(Netscape)上市
,股價一飛沖天,象徵前一波達康(dotcom)
榮景的起點,Facebook上市成為觀察家密切關
注的分水嶺。
科技顧問公司Broadsight共同創辦人派翠克(
Alan Patrick )則警告,世界正處於另一波泡沫
的起點。他指出十個泡沫生成的跡象,而前八
個跡象已經浮現:
2015/4/13
二次網路泡沫 愈吹愈大了
85

一,出現無法以舊方法衡量價值的「新事物」
(意指這類科技新秀或新創公司)。
 有人看好「新事物」的機會,砸大筆笨錢收購。


二,聰明的人發現泡沫出現;「新事物」的鼓
吹者編織著更遠大的夢想。
三,系出名門的新創公司創辦人(如出身於某
個新事物公司)毫無理由獲得多得令人咋舌的
資金。
2015/4/13
二次網路泡沫 愈吹愈大了
86





四,一批批新投資基金湧入新創公司。
五,公司無須擁有產品,只要透過簡報就能取
得資金。
六,商學碩士(MBA)離開銀行,自行創立公
司。
七,新公司大量上市。
八,銀行為「新事物」製造興旺假象,退休金
投入。
2015/4/13
二次網路泡沫 愈吹愈大了
87


九,計程車司機開始建議你買某支股票。
十,「新事物」用笨錢買下舊世界的公司,泡
沫尾聲近了。
2015/4/13