Robert van Engelen - FSU Computer Science Department

advertisement
Automa'c Programming Robert van Engelen Research Seminar 2011 1 Programming •  Is fun •  Is crea've •  Is sa'sfying to solve real-­‐world problems Research Seminar 2011 2 Programming •  Is fun •  Is crea've •  Is sa'sfying to solve real-­‐world problems •  The old saying goes that programming is 80% transpira'on and 20% inspira'on •  “Mythical man month” suggests that effort does not scale with project size •  Parallel programming methodologies s'll in infancy stage Research Seminar 2011 3 Lots of Tools for Code Checking To name just a few: Sta/c code analysis: Lint and splint GNU tool for bug-­‐sniffing C PC-­‐lint bug-­‐sniffing C++ clang C/C++ compiler and sta'c checker Model checking steps through every possible execu'on path: Klocwork C/C++, C#, and Java analysis Coverity C/C+ analysis Dynamic analysis Valgrind run'me memory, threading, and code analysis Dmalloc malloc debugger Insure++ run'me memory error detec'on TotalView profiling and debugging OpenMP/MPI apps Alinea DDT profiling and debugging OpenMP/MPI/Cuda apps Research Seminar 2011 4 Abstract Interpreta'on Program Analysis Axioma'c Seman'cs Denota'onal Seman'cs Opera'onal Seman'cs Logical Inference Theorem Proving Research Seminar 2011 Formal Verifica2on Model Checking Program Verifiers Data Flow Analysis Compilers Sta'c Analysis vs. Model Checking vs. Formal Verifica'on Sta'c Seman'c Analysis 5 Accurate and Fast Sta'c Analysis is Essen'al for Compiler Op'miza'ons •  Given a program P, produce a program P' that produces the same output values as P for a given input, but has a lower cost [Aho88] •  Op'miza'on is an economic ac'vity [Novak05] –  Cost: a larger and some'mes slower compiler –  Benefit: amount saved by the code improvement •  It is not possible to op'mize everything –  NP completeness and undecidability of many program analysis and op'miza'on problems •  The goal is to find leverage: cases where there is a large expected payoff for a small cost Research Seminar 2011 6 Compilers are Powerful Op'mizers, but they’re Brain-­‐Dead Many inefficiencies can be op'mized away by compilers if the sta2c analysis and data flow analysis are sufficiently powerful Warning: compilers op'mize code without regard to parallel execu'on! x = 1; // compiler removes this as dead code … // because there is no use of x here x = 0; Research Seminar 2011 7 Compilers are Powerful Op'mizers, but they’re Brain-­‐Dead Many inefficiencies can be op'mized away by compilers if the sta2c analysis and data flow analysis are sufficiently powerful Warning: compilers op'mize code without regard to parallel execu'on! Process 0 Process 1 x = 1; // removed if (x == 1) … // no use of x here exit(0); x = 0; Note: use volatile in C Research Seminar 2011 8 A Sta'c Checker is your Friend! What is the problem with this example that a compiler can warn about? 1 if( x != 0 ) 2 if( p ) *p = *p/x; 3 else 4 if( p ) *p = 0; Research Seminar 2011 9 A Sta'c Checker is your Friend! What is the problem that a compiler will spot here? 3 void out( int n ) 4 { 5 cout << n << "\n"; 6 } 7 8 void show( int a, int b, int c ) 9 { 10 out( a ); out( b ); out( c ); 11 } 12 13 int main() 14 { 15 int i = 1; 16 show( i++, i++, i++ ); 17 return 0; 18 } Research Seminar 2011 10 A Sta'c Checker is your Friend! What is the problem that a compiler will spot here? 4 unsigned a[100] = {0}; 5 6 int main() 7 { 8 char buf[200]; 9 unsigned n = 0; 10 11 while( fgets( buf, 200, stdin ) ) 12 if( n < 100 ) a[n++] = strlen(buf); 13 14 while( -­‐-­‐n >= 0 ) 15 priny( "%d\n", a[n] ); 16 17 return 0; 18 } Research Seminar 2011 11 A Sta'c Checker is your Friend! What is the problem that a compiler will spot here? 3 void print_mod( int i, int n ) 4 { 5 if( n == 0 && i == 0 ) return; 6 priny( "%d mod %d == %d\n", i, n, i % n ); 7 } 8 9 int main() 10 { 11 for( int i = 0; i < 10; i++ ) 12 for( int j = 0; j < 10; j++ ) 13 print_mod( i, j ); 14 return 0; 15 } Research Seminar 2011 12 A Sta'c Checker is your Friend! What is the problem that a compiler will spot here? 1 int shamrock_count( int leaves, double leavesPerShamrock ) 2 { 3 double shamrocks = leaves; 4  shamrocks /= leavesPerShamrock; 5 return leaves; 6 } 7 8  int main() 9  { 10 priny( "%d\n", shamrock_count( 314159, 3.14159 ) ); 11 return 0; 12 } Research Seminar 2011 13 A Sta'c Checker is your Friend! What is the problem that a compiler can spot here? 1 class BankAccount 2 3 def accountName 4 @accountName = "John Smith" 5 end 6 7 def deposit 8 @deposit 9 end 10 11 def deposit=(dollars) 12 @deposit = dollars 13 end 14 15 def ini'alize () 16 @deposet = 100.00 17 end 18 19 def test_method 20 puts "The class is working" 21 puts accountName 22 end 23 end Research Seminar 2011 14 Let’s Take a Look at Sta'c Analysis A novice programmer once wrote (true story): for( int i = 0; i < 5; i++ ) p++; He probably meant to increase the pointer p by 5 Research Seminar 2011 15 Let’s Take a Look at Sta'c Analysis for( int i = 0; i < 5; i++ ) p++; Fortunately, a compiler with an excellent sta'c analyzer will op'mize this loop to p += 5 Ques2on: can we do the same for the following loop with n? for( int i = 0; i < n; i++ ) p++; Research Seminar 2011 16 Let’s Take a Look at Sta'c Analysis for( int i = 0; i < 5; i++ ) p++; Fortunately, a compiler with an excellent sta'c analyzer will op'mize this loop to p += 5 Ques2on: can we do the same for the following loop with n? for( int i = 0; i < n; i++ ) p++; Answer: if(n > 0) p += n; Research Seminar 2011 17 Formal Verifica'on How do we know for sure that the transformed code if(n > 0) p += n; is correct? We use formal verifica2on: first we rewrite for( int i = 0; i < n; i++ ) p++; into the de-­‐sugared equivalent form: int i = 0; while( i < n ) { p = p + 1; i = i + 1; } Research Seminar 2011 18 Formal Verifica'on with Axioma'c Seman'cs { 0 < n ∧ p = q } { 0 < n ∧ p – q = 0 } i = 0; { i < n ∧ p – q = i } while( i < n ) { { i < n ∧ p – q = i } { i+1 < n ∧ p+1 – q = i+1 } p = p + 1; { i+1 < n ∧ p – q = i+1 } i = i + 1; { i < n ∧ p – q = i } } { i > n ∧ i < n ∧ p – q = i } { p = q + n } weakest precondi/on apply assignment rule loop invariant i < n ∧ loop invariant apply assignment rule apply assignment rule loop invariant i > n ∧ loop invariant postcondi/on Research Seminar 2011 19 Formal Verifica'on •  Axioma'c seman'cs assumes –  a rela'vely simple machine model (state) –  a rela'vely simple data type system (no pointers) –  requires the user to determine invariants •  Not suitable for large projects Research Seminar 2011 20 A Sta'c Checker is your Friend! What is the problem that the compiler can spot here? 1 int a[10000]; 2 3 void init() 4 { 5 int i; 6 7 for( i = 0; i =< 10000; i++ ); 8 a[i] = i; 9 } Research Seminar 2011 21 How to Analyze: Abstract Interpreta'on int a[10000]; i = 0; p0: while( i =< 10000 ) { p1: a[i] = i; i = i + 1; p2: } p3: Define a la€ce: [a,b] ⊔ [a’,b’] = [min(a,a’), max(b,b’)] [a,b] ⊓ [a’,b’] = [max(a,a’), min(b,b’)] i = [0,0] i = [0,0] ⊓ [-­‐∞,10000] = [0,0] i = [0,0] ⊔ [1,1] ⊓ [-­‐∞,10000] = [0,1] … use accelera/on to determine finite convergence i = [0,10000] i = [1,1] ⊓ [-­‐∞,10001] = [1,1] i = [1,1] ⊔ [2,2] ⊓ [-­‐∞,10001] = [1,2] … use accelera/on to determine finite convergence i = [1,10001] i = [1,10001] ⊓ [10001,+∞] = [10001,10001] Research Seminar 2011 22 Limita'ons of Sta'c Analysis What about this example? 1 2 3 void init(int *a) 4 { 5 int i; 6 7 for( i = 0; i =< 10000; i++ ); 8 a[i] = i; 9 } Research Seminar 2011 23 Model Checking Verifies every execu2on path, along upda'ng the program state 1 int n = 9999; 2 int *b = malloc(sizeof(int) * n); init(b+1); 3 void init(int *a) 4 { 5 int i; 6 7 for( i = 0; i =< 10000; i++ ); 8 a[i] = i; 9 } Research Seminar 2011 24 Model Checking S'll limited when program depends on run'me data 1 int n; n << std::in; 2 int *b = new int[n]; init(b); 3 void init(int *a) 4 { 5 int i; 6 7 for( i = 0; i =< 10000; i++ ); 8 a[i] = i; 9 } Research Seminar 2011 25 Model Checking of Parallel Code Process 1 Process 2 Model p0: while( x > 0 ) { q0: x = 0; p1: use resource q1: use resource forever (using abstract traces where x is posi?ve or 0) x = x + 1; p0,q0,x>0 } p2: sleep; p1,q0,x>0 p0,q1,x=0 Q: star'ng with x > 1, will p1 ever concurrently execute p1,q1,x>0 p2,q1,x=0 with q1? p0,q1,x>0 Q: will execu'on reach a state where x stays 0? p1,q1,x>0 Research Seminar 2011 26 What did we Learn so far? •  Compilers use sta'c analysis –  for code op'miza'on –  for compiler warnings •  Sta'c analysis has limita'ons –  complexity of the code structure –  the absence of parallel threads that change shared state –  needs “knowns” (constants) •  Model checking is more comprehensive, applies to parallel code •  Formal verifica'on is not automa'c and costly Research Seminar 2011 27 Can we do Be•er? • 
• 
• 
• 
Programming hasn’t changed since the 80s Cost of developing and maintaining code is high An industrial revolu'on is needed to scale it up Source code genera'on by automa'c programming: the genera/on of source code by computer, usually based on specifica/ons that are higher-­‐
level and easier for humans to specify than ordinary programming languages Research Seminar 2011 28 Automa'c Programming by Code Genera'on •  Specializa'on, aka lowering of code –  From high-­‐level abstract specifica'ons –  ... to intermediate high-­‐level coding forms –  … to low-­‐level Fortran, C/C++, … whatever •  How? –  By pa•ern matching and subs'tu'on, or some form of computer algebra –  By type systems (a form of theorem proving) to propagate informa'on and intelligently convert code –  By high-­‐level compiler sta'c analysis and op'miza'on Research Seminar 2011 29 Automa'c Programming by Specializa'on and Op'miza'on •  Specializa'on by pa•ern matching and code rewri'ng lowers the abstract code to more concrete forms x[j] = sum(b*a[i,j]+c, i=1..n) forall j=1..m
doall(j, 1, m, x[j] = b*reduce(a[j], 1, n)+c)
$OMP DO PRIVATE(i,t)
DO j=1,m
t = c
DO i=1,n
t = t+a[i,j]
ENDDO
x[j] = b*t
ENDDO
$OMP END DO
DO j=1,n
x[j] = c
ENDDO
DO i=1,n
DO j=1,m
x[j] = x[j]+a[i,j]
ENDDO
ENDDO
DO j=1,n
x[j] = b*x[j]
ENDDO
•  Each lowering phase requires an op'miza'on phase applicable to that code level Research Seminar 2011 30 Publica'ons Related to Automa'c Programming • 
Code generators and numerical applica'ons –  Robert van Engelen, Automa2cally Genera2ng High-­‐Performance Parallel Code for Atmospheric Simula2on Models: Challenges and Solu2ons for Auto-­‐Programming Tools, in the proceedings of the joint ASCM/MACIS conference (Math-­‐for-­‐Industry Lecture Note Series Vol. 22.), 2009, pages 395-­‐398. –  Yixin Shou and Robert A. van Engelen, Automa2c SIMD Vectoriza2on of Chains of Recurrences, in the proceedings of the ACM Interna'onal Conference on Supercompu'ng (ICS), 2008, pages 245-­‐255. –  P. van der Mark, R. van Engelen, K. Gallivan, and W. Dewar, A Case Study for Automa2c Code Genera2on on a Coupled Ocean-­‐Atmosphere Model, in the proceedings of the 2002 Interna'onal Conference on Computa'onal Science, April 21-­‐24, 2002, Amsterdam, the Netherlands. –  Robert A. van Engelen, ATMOL: A Domain-­‐Specific Language for Atmospheric Modeling, in the Journal of Compu'ng and Informa'on Technology, 2002, pages 289-­‐304. –  Robert van Engelen, Lex Wolters, and Gerard Cats, Tomorrow's Weather Forecast: Automa2c Code Genera2on for Atmospheric Modeling, IEEE Journal of Computa'onal Science and Engineering, Vol. 4, No. 3, July/September 1997, pages 22-­‐31. –  Robert van Engelen, Lex Wolters, and Gerard Cats, The Ctadel Applica2on Driver for Numerical Weather Forecast Systems, proceedings of the 15th IMACS World Congress (session: Problem Solving Environments for Scien'fic Compu'ng), (ed. A. Sydow), vol. 4, Wissenshaƒ & Technik Verlag, Berlin, Germany, 1997, pages 571-­‐576. Research Seminar 2011 31 Publica'ons Related to Automa'c Programming • 
Code generators and numerical applica'ons (cont’d) – 
– 
– 
– 
– 
– 
– 
Robert van Engelen, Lex Wolters, and Gerard Cats, PDE-­‐Oriented Language Compila2on and Op2miza2on with Ctadel for Parallel Compu2ng, proceedings of the HIPS'97 Second Interna'onal Workshop, April 1-­‐5, 1997, Geneva, Switzerland, IEEE Computer Society Press, pages 105-­‐109. Robert A. van Engelen, Ilja Heitlager, Lex Wolters, and Gerard Cats, Incorpora2ng Applica2on Dependent Informa2on in an Automa2c Code Genera2ng Environment, proceedings of the 11th ACM Interna'onal Conference on Supercompu'ng, July 7-­‐11, 1997, Vienna, Austria, pages 180-­‐187. Lex Wolters, Robert A. van Engelen, and Gerard Cats, Automa'c Code Genera'on: Ctadel, proceedings of the 17th EWGLAM and SRNWP mee'ng, LAM Newsle•er, Number 25, May 1996, pp. 137-­‐143. Robert A. van Engelen, Lex Wolters, and Gerard Cats, Mul2-­‐Pla]orm Code Genera2on for the HIRLAM Dyn Rou2ne with Ctadel, proceedings of the HIRLAM workshop on Varia'onal Data Assimila'on, February 1996, De Bilt, the Netherlands, pages 115-­‐122. Robert A. van Engelen, Lex Wolters, and Gerard Cats, Automa2c Code Genera2on for High Performance Compu2ng in Environmental Modeling, proceedings of the 1996 EUROSIM Interna'onal Conference on HPCN Challenges in Telecomp and Telecom: Parallel Simula'on of Complex Systems and Large-­‐Scale Applica'ons, June 10-­‐12, 1996, Delƒ, the Netherlands, pages 421-­‐428. Robert A. van Engelen, Lex Wolters, and Gerard Cats, The Ctadel Code Genera2on Tool for PDE-­‐based Scien2fic Applica2ons, proceedings of the Second Annual Conference of the Advanced School for Compu'ng and Imaging, June 5-­‐7 1996, Lommel, Belgium, pages 120-­‐125. Robert A. van Engelen, Lex Wolters, and Gerard Cats, Ctadel: A Generator of Mul2-­‐Pla]orm High Performance Codes for PDE-­‐based Scien2fic Applica2ons, proceedings of the 10th ACM Interna'onal Conference on Supercompu'ng, May 25-­‐28 1996, Philadelphia, USA, ACM Press, pages 86-­‐93. Research Seminar 2011 32 Publica'ons Related to Automa'c Programming • 
Code generators for XML Web services protocol implementa'ons, such as SOAP, WSDL, WS-­‐* – 
– 
– 
– 
– 
– 
– 
– 
– 
Robert A. van Engelen, A Framework for Service-­‐Oriented Compu2ng with C and C++ Web Service Components, ACM Transac'ons on Internet Technologies, Volume 8, Issue 3, Ar'cle 12, May 2008. Robert van Engelen and Wei Zhang, An Overview and Evalua2on of Web Services Security Performance Op2miza2ons, in the proceedings of the IEEE Interna'onal Conference on Web Services (ICWS), 2008, pages 137-­‐144. Robert A. van Engelen and Wei Zhang, Iden2fying Opportuni2es for Web Services Security Performance Op2miza2ons, in the proceedings of the IEEE Congress on Services (SERVICES), 2008. Wei Zhang and Robert van Engelen, A Table-­‐Driven XML Streaming Methodology for High-­‐Performance Web Services, in the proceedings of IEEE Interna'onal Conference on Web Services (ICWS), 2006, pages 197-­‐206. Robert van Engelen, Construc2ng Finite State Automata for High Performance XML Web Services, in the proceedings of the Interna'onal Symposium on Web Services (ISWS), 2004, pages 975-­‐981. Robert van Engelen, Code Genera2on Techniques for Developing Web Services for Embedded Devices, in the proceedings of the 9th ACM Symposium on Applied Compu'ng SAC, Nicosia, Cyprus, 2004, pages 854-­‐861. Robert A. van Engelen, Pushing the SOAP Envelope with Web Services for Scien2fic Compu2ng, in the proceedings of the Interna'onal Conference on Web Services (ICWS), 2003, pages 346-­‐354. Robert van Engelen, Gunjan Gupta, and Saurabh Pant, Developing Web Services for C and C++, in IEEE Internet Compu'ng Journal, March, 2003, pages 53-­‐61. Robert A. van Engelen and Kyle Gallivan, The gSOAP Toolkit for Web Services and Peer-­‐To-­‐Peer Compu2ng Networks, in the proceedings of the 2nd IEEE Interna'onal Symposium on Cluster Compu'ng and the Grid (CCGrid), 2002, pages 128-­‐135. Research Seminar 2011 33 Publica'ons Related to Sta'c Analysis • 
Array dependence tes'ng, induc'on variables and loop scalar evolu'ons (used in GCC 4.x !) – 
– 
– 
– 
– 
– 
– 
Y. Shou, R. van Engelen, and J. Birch, Flow-­‐Sensi2ve Loop-­‐Variant Variable Classifica2on in Linear Time, in the proceedings of the Interna'onal Workshop on Languages and Compilers for Parallel Compu'ng (LCPC), 2007. J. Birch, R. van Engelen, K. A. Gallivan, and Y. Shou, An Empirical Evalua2on of Chains of Recurrences for Array Dependence Tes2ng, in the proceedings of to the IEEE/ACM Interna'onal Conference on Parallel Architectures and Compila'on Techniques (PACT), 2006, pages 295-­‐304. Robert van Engelen, Johnnie Birch, Yixin Shou, Burt Walsh, and Kyle Gallivan, A Unified Framework for Nonlinear Dependence Tes2ng and Symbolic Analysis, in the proceedings of the ACM Interna'onal Conference on Supercompu'ng (ICS), 2004, pages 106-­‐115. Johnnie Birch, Robert van Engelen, and Kyle Gallivan, Value Range Analysis of Condi2onally Updated Variables and Pointers, in the proceedings of Compilers for Parallel Compu'ng (CPC), 2004, pages 265-­‐276. Robert van Engelen, Johnnie Birch, and Kyle Gallivan, Array Data Dependence Tes2ng with the Chains of Recurrences Algebra, in the proceedings of the IEEE Interna'onal Workshop on Innova've Architectures for Future Genera'on High-­‐Performance Processors and Systems (IWIA), January 2004, pages 70-­‐81. Robert A. van Engelen and Kyle A. Gallivan, An Efficient Algorithm for Pointer-­‐to-­‐Array Access Conversion for Compiling and Op2mizing DSP Applica2ons, in the proceedings of the 2001 Interna'onal Workshop on Innova've Architectures for Future Genera'on High-­‐Performance Processors and Systems (IWIA 2001), 18-­‐19 January 2001, Maui, Hawaii, pages 80-­‐89. Robert A. van Engelen, Efficient Symbolic Analysis for Op2mizing Compilers, in the proceedings of the Interna'onal Conference on Compiler Construc'on, ETAPS 2001, LNCS 2027, pages 118-­‐132 Research Seminar 2011 34 Publica'ons Related to Formal Verifica'on •  Automa'c verifica'on of code-­‐improving transforma'ons –  Robert A. van Engelen, David Whalley, and Xin Yuan, Automa2c Valida2on of Code-­‐Improving Transforma2ons on Low-­‐Level Program Representa2ons, in the Science of Computer Programming Journal, 2004, Volume 52, pages 257-­‐280. –  Robert A. van Engelen, David Whalley, and Xin Yuan, Valida2on of Code-­‐Improving Transforma2ons for Embedded Systems, in the proceedings of the 8th ACM Symposium on Applied Compu'ng SAC 2003, March 9-­‐12, 2003, Melbourne Florida, pages 684-­‐691. –  Robert A. van Engelen, D. Whalley, and X. Yuan, Automa2c Valida2on of Code-­‐Improving Transforma2ons, in ACM SIGPLAN LCTES'00 Workshop, 2000. Research Seminar 2011 35 Example: Automa'c Programming for Atmospheric Models Compute for each grid point in 3D space: Pressure p Temperature T Humidity q Wind velocity vector (u,v) PS3 cluster Research Seminar 2011 36 FORALL(i=1:n+1,j=1:m) t(i,j)=t(i,j)/h
p(i,j)=t(i+1,j)-t(i,j)
efficient serialFORALL(i=1:n,j=1:m)
and parallel computing
is an example of a
combined use of symbolic algebra techniques for commuPROGRAM P2 The optimization techniques are ilnication optimization.
REAL u(0:n+1,m,l),g(0:n+1),h,p(0:n+1,m),q(0:n+1)
REAL s(0:n+1),t(0:n+1,m),T1(1:n+1,m)
lustrated by means
of a problem example. The examples
...
CMIC$ PARALLEL ...
CMIC$ CASE
are simplified
problems derived from the documentation of
DO 2270 j = 1,m
FORALL(i=0:n+1) s(i)=s(i)+u(i,j,l)
the HIRLAM
2270 weather
CONTINUE forecast model [5] and are illustrative
CMIC$ CASE
2280 k = 1,l
for the type ofDOproblems
solved by CTADEL. Consider
FORALL(i=1:n+1) T1(i,m)=0
Numerical Code Genera'on DO 2290 j = m,2,-1
High-­‐level equa'on FORALL(i=1:n+1) T1(i,j-1)=T1(i,j)+u(i,j,k)
2290 #CONTINUE
1# 1
FORALL(i=1:n+1,j=1:m)
t(i,j)=t(i,j)+T1(i,j)
∂u
CONTINUE
dy
dz
∀(x, y) ∈ Ωx,y
p2280
=
CMIC$ END CASE
CMIC$ END
0 PARALLEL
y ∂x
CMIC$ PARALLEL ...
CMIC$ CASE
FORALL(i=1:n) q(i)=g(i) *(s(i)-s(i-1))/h
CMIC$ CASE
FORALL(i=1:n+1,j=1:m) t(i,j)=t(i,j)/h
CMIC$ END CASE
CMIC$ END PARALLEL
FORALL(i=1:n,j=1:m) p(i,j)=t(i+1,j)-t(i,j)
(1)
Tab
"
step 1:
the MasPar cod
communication
sor mesh resul
↑ are not paralle
k j→
poor, especiall
duction optimi
or not app
Figure (P1)
1. Reducing
mance compar
two processors
P0
investigated;
P
the order
of the sum
where p(x, y) and u(x, y, z) are dependent variables, x, y, z
3
p2mize are the independent variables on theTransform, domain Ω = [0,o1]
.
Discretization of Eq. (1) using finite differences yields
4. Conclusio
The interchange of the
PROGRAM P3
formed by GPAS while in
m
!
REAL u(0:n+1,m,l),g(0:n+1),h,p(0:n+1,m),q(0:n+1)
$
$
In this pape
1
REAL s(0:n+1),t(0:n+1,m)
complished by writing ad
...
(u
pi,j = CMIC$ PARALLEL
−
u
)
∀(i,
j)
∈
D(p)
i+1,j,k
i,j,k
...
erating efficien
CMIC$
CASE h
GPAS
uses
the abstract no
j=j+1
k=1
DO 2300 j = 1,m
sented
techniqu
FORALL(i=0:n+1) s(i)=s(i)+u(i,j,l)
(2)
the class of commuting
o
2300 CONTINUE
CMIC$ CASE
cient
codes
for
2310discretized
j = m,2,-1
where D(p) isDOFORALL(i=1:n+1)
the
domain or grid of p and the
Section 2.1. More specifi
t(i,j-1)=t(i,j)
a prototype
ver
DO 2320
k = 1,l
u and p fields have
effectively
become functions of the disthis class commute
if and
FORALL(i=1:n+1)
t(i,j-1)=t(i,j-1)+u(i,j,k)
CONTINUE
optimization
oo
Parallel code crete (i, j,2320
k)-grid
with domain [1, n]×[1, m]×[1, "] ⊆ ZZ 3 .
2310
CONTINUE
relationship
between
the
CMIC$ END CASE
a new
approac
END PARALLEL
Here, the CMIC$
double
integration is replaced with a double sumor (re)defined
by the
user.
CMIC$ PARALLEL ...
CMIC$ CASE
quire
expensiv
mation using the
midpoint
quadrature
formula, and the parinduce a default functiona
FORALL(i=1:n)
q(i)=g(i)
*(s(i)-s(i-1))/h
CMIC$ CASE
approach
FORALL(i=1:n+1,j=1:m)
t(i,j)=t(i,j)/h
tial derivative
is replaced by a difference
quotient assuming
ator instances
of the may
clas
CMIC$ END CASE
CMIC$ END PARALLEL
code. This
is
grid-point distance
h in the x-direction.
way, the application
order
FORALL(i=1:n,j=1:m)
p(i,j)=t(i+1,j)-t(i,j)
descriptio
Eq. (2) can be simplified
usingSeminar an algebraic
simplifier.
lowing the lem
functional
com
Research 2011 37 Step 1 (Temperature T) Research Seminar 2011 38 Step 2 (Temperature T) Research Seminar 2011 39 Step 3 (Temperature T) Research Seminar 2011 40 Step 4 CMIC$ DO ALL
DO 1120 k = 1,nlev
DO 1130 j = 1,nlat
DO 1140 i = 1,nlon
IF (k.LE.1) THEN
PDTDT(i,j,k) = (0.5*t144(i,j,2)+(1.55424620615724E-8*
. PTZ(i,j,k)*(1/(0.860825768434464*PQZ(i,j,1)+1)+0.60782469342251
. 9*(PQZ(i,j,1)/(0.860825768434464*PQZ(i,j,1)+1)))*(rdlam*(hyu(i,
. j)*t47(i,j,1)*PUZ(i,j,1)-hyu(i-1,j)*t47(i-1,j,1)*PUZ(i-1,j,1))+
. rdth*(hxv(i,j)*t49(i,j,1)*PVZ(i,j,1)-hxv(i,j-1)*t49(i,j-1,1)*PV
. Z(i,j-1,1)))+7.84806152880239E-8*(rdlam*((-t116(i,j,1))-t116(i+
. 1,j,1))+rdth*((-t120(i,j,1))-t120(i,j+1,1)))-0.285714285714286*
. ((3.92403076440119E-8*(rdlam*((-t90(i,j,1))-t90(i+1,j,1))+rdth*
. ((-t95(i,j,1))-t95(i,j+1,1)))+6.30915356075388E-8*(rdlam*((-t10
. 3(i,j,1))-t103(i+1,j,1))+rdth*((-t108(i,j,1))-t108(i,j+1,1))))/
. (0.860825768434464*PQZ(i,j,1)+1)))/(hxt(i,j)*hyt(i,j)))/(p(i,j,
. k+1)-p(i,j,k))
ELSE IF (nlev.LE.k) THEN
PDTDT(i,j,k) = (0.285714285714286*PTZ(i,j,k)*(7.84806
. 152880239E-8*((t18(i,j,nlev)+t26(i,j,nlev))*(rdlam*(hyu(i,j)*t4
. 7(i,j,nlev)*PUZ(i,j,nlev)-hyu(i-1,j)*t47(i-1,j,nlev)*PUZ(i-1,j,
. nlev))+rdth*(hxv(i,j)*t49(i,j,nlev)*PVZ(i,j,nlev)-hxv(i,j-1)*t4
. 9(i,j-1,nlev)*PVZ(i,j-1,nlev)))/(hxt(i,j)*hyt(i,j)))-t18(i,j,nl
. ev)*(PDPSDT(i,j)+t70(i,j,nlev+1)))*(1/(0.860825768434464*PQZ(i,
. j,nlev)+1)+0.607824693422519*(PQZ(i,j,nlev)/(0.860825768434464*
. PQZ(i,j,nlev)+1)))+0.5*t144(i,j,nlev)+(7.84806152880239E-8*(rdl
. am*((-t116(i,j,nlev))-t116(i+1,j,nlev))+rdth*((-t120(i,j,nlev))
. -t120(i,j+1,nlev)))-0.285714285714286*((3.92403076440119E-8*(rd
. lam*((-t90(i,j,nlev))-t90(i+1,j,nlev))+rdth*((-t95(i,j,nlev))-t
. 95(i,j+1,nlev)))+6.30915356075388E-8*(rdlam*((-t103(i,j,nlev)). t103(i+1,j,nlev))+rdth*((-t108(i,j,nlev))-t108(i,j+1,nlev))))/(
. 0.860825768434464*PQZ(i,j,nlev)+1)))/(hxt(i,j)*hyt(i,j)))/(p(i,
. j,k+1)-p(i,j,k))
ELSE
PDTDT(i,j,k) = (0.285714285714286*PTZ(i,j,k)*(7.84806
. 152880239E-8*((t18(i,j,k)+t26(i,j,k))*(rdlam*(hyu(i,j)*t47(i,j,
. k)*PUZ(i,j,k)-hyu(i-1,j)*t47(i-1,j,k)*PUZ(i-1,j,k))+rdth*(hxv(i
. ,j)*t49(i,j,k)*PVZ(i,j,k)-hxv(i,j-1)*t49(i,j-1,k)*PVZ(i,j-1,k))
. )/(hxt(i,j)*hyt(i,j)))-t18(i,j,k)*(PDPSDT(i,j)+t70(i,j,k+1)))*(
. 1/(0.860825768434464*PQZ(i,j,k)+1)+0.607824693422519*(PQZ(i,j,k
. )/(0.860825768434464*PQZ(i,j,k)+1)))+0.5*(t144(i,j,k)+t144(i,j,
. k+1))+(7.84806152880239E-8*(rdlam*((-t116(i,j,k))-t116(i+1,j,k)
. )+rdth*((-t120(i,j,k))-t120(i,j+1,k)))-0.285714285714286*((3.92
. 403076440119E-8*(rdlam*((-t90(i,j,k))-t90(i+1,j,k))+rdth*((-t95
. (i,j,k))-t95(i,j+1,k)))+6.30915356075388E-8*(rdlam*((-t103(i,j,
. k))-t103(i+1,j,k))+rdth*((-t108(i,j,k))-t108(i,j+1,k))))/(0.860
. 825768434464*PQZ(i,j,k)+1)))/(hxt(i,j)*hyt(i,j)))/(p(i,j,k+1)-p
. (i,j,k))
ENDIF
1140
CONTINUE
1130
CONTINUE
1120 CONTINUE
Research Seminar 2011 41 DYN Specifica'on Size (LOC) Compared to Fortran 77, 95, and HPF Research Seminar 2011 42 Performance Results '$!"
'$!"
'#!"
'#!"
&!"
*+,-./-0-"1/232+,"44"
5678089"
%!"
:0,02+30-"1/232+,"44"
5678089"
$!"
:0,02+30-"1/232+,"44"
;//<"/<3"5678089"
#!"
'!!"
!"#$%&#'(%
!"#$%&#'(%
'!!"
*+,-./-0-"1/232+,"44"
&!"
%!"
50,02+30-"1/232+,"44"
67(89"
$!"
50,02+30-"1/232+,"44"
://;"/;3"67(89"
#!"
!"
!"
$"()"
'%"()"
%$"()"
$"()"
Cray T3D using shared memory Research Seminar 2011 '%"()"
%$"()"
Cray T3D using Cray MPI 43 Related Work: Genera'ng Numerical Algebra Libraries with FLAME FLAME Research Seminar 2011 library code 44 Example: Automa'c Programming for XML Web Service Protocols Started out in 1999 with the idea to enhance data interoperability with XML using compilers for sta'c analysis and the use of code generators Now in 2011 a $7.3M valued product “gSOAP” by Genivia Inc soapcpp2
compiler
soapClient.cpp
soapServer.cpp
soapC.cpp
C/C++ application
Na've C/C++ data client/server API
XML serializers
schema-opt
serializer
schema-opt
deserializer
gSOAP engine
HTTP/S + TCP/IP + UDP
Plugins stats
logging
WSSE
XML data Network fabric
Research Seminar 2011 45 Requirements and Benefits •  Requirements –  High-­‐level specifica'on syntax and seman'cs, easy to write, easy to learn –  Pure code at high levels: code that is side-­‐effect free –  Implicit parallelism, expressed at a high level •  Benefits –  Saves 'me, assuming |spec| < |target code| –  Increased soƒware reliability –  Simplified maintenance (re-­‐generate code) –  Enhanced portability, using playorm-­‐specific transforma'ons –  No low-­‐level library API to learn Research Seminar 2011 46 Automa'c Programming in Industry •  Acceleo is an open source code generator for Eclipse, based on UML-­‐like concepts –  Model-­‐Driven Architecture (MDA) approach –  Meta-­‐Object Facility (MOF) originates from UML •  Maple, Mathema'ca, Matlab support code genera'on –  Design of programs by symbolic algebra from algorithm specs –  Program codes are objects to manipulate –  Matlab Simulink FPGA development •  Program “wizards” that allow programmers to design graphical user interfaces interac'vely –  Mi'gates the cost of GUI bloat by code template instan'a'ons •  OOP, C++ templates, and generic programming –  Code reuse, no algorithmic restructuring and thus inflexible Research Seminar 2011 47 Ques'ons? Research Seminar 2011 48 
Download