Geometry of Arrays: Mathematics of Arrays and ψ-calculus
Lenore R. Mullin
Computer Science Department, College of Computing and Information
University at Albany, State University of New York
October 14-15, 2005, Conformal Computing 2005

Overview
• Mathematics of Arrays (MoA): an algebraic system for dealing with arbitrary multidimensional arrays
• ψ-calculus: reducing a complicated set of array operations to direct indexing of the original array
• Elimination of temporary arrays: significant enhancement in speed and reduction of memory requirements
• Example: block decomposition “lifts” (raises) the dimension of the array

“Reshape” Operator
The 2-dimensional array

A =
   0  1  2  3
   4  5  6  7
   8  9 10 11
  12 13 14 15

becomes the 4-dimensional array $\hat{A} = \langle 2\ 2\ 2\ 2 \rangle\, \hat{\rho}\, A$:

   0  1 |  8  9
   2  3 | 10 11
  ------+------
   4  5 | 12 13
   6  7 | 14 15

Block Decomposition
The same 2-dimensional array $A$, viewed as the 4-dimensional array $A' = \langle 0\ 2\ 1\ 3 \rangle \mathbin{⍉} \hat{A}$:

   0  1 |  8  9
   4  5 | 12 13
  ------+------
   2  3 | 10 11
   6  7 | 14 15

Array “shapes”
• Shape operator $\rho$: returns a vector containing the lengths of each dimension
• First vector (two-dimensional): $\rho A = \langle 4\ 4 \rangle$
• Second vector (four-dimensional): $\rho \hat{A} = \langle 2\ 2\ 2\ 2 \rangle$

“Transpose” operator
• Transpose operator permutes the dimensions
• Applied to $\hat{A}$, the permutation $\langle 0\ 2\ 1\ 3 \rangle$ yields the block-decomposed array: $A' = \langle 0\ 2\ 1\ 3 \rangle \mathbin{⍉} \hat{A}$

“ψ-operator”
• The ψ operator extracts a component of the array
• Full index extracts an element: $\langle 1\ 1 \rangle\, \psi\, A = \langle 0\ 1\ 0\ 1 \rangle\, \psi\, \hat{A} = \langle 0\ 0\ 1\ 1 \rangle\, \psi\, A' = 5$
• Partial index extracts a subarray: $\langle 0\ 0 \rangle\, \psi\, A' = \begin{bmatrix} 0 & 1 \\ 4 & 5 \end{bmatrix}$

“Reshape” operator
• The process of “lifting” the dimension is carried out with the “reshape” operator $\hat{\rho}$
• We write: $\hat{A} = \langle 2\ 2\ 2\ 2 \rangle\, \hat{\rho}\, A$
• And: $\langle 2\ 2\ 2\ 2 \rangle = (4)\, \hat{\rho}\, 2$

“Hypercube” Representation
• The arrays $\hat{A}$ and $A'$ are examples of “hypercubes”
• Often array operations simplify in the hypercube representation
• In a hypercube: all dimensions have length 2
• For an arbitrary array $B$ we write: $\hat{B} = \langle 2\ 2\ 2\ \cdots\ 2 \rangle\, \hat{\rho}\, B$

“Product” operator
• The product operator $\pi$ returns an integer equal to the product of the elements of a vector
• The total number of elements in $A$ is: $\pi(\rho A) = \pi \langle 4\ 4 \rangle = 16$
• Number of hypercube dimensions: $\log_2(\pi(\rho A)) = 4$

Composing operations
• The hypercube representation of $A$ is thus written: $\hat{A} = \big(\log_2(\pi(\rho A))\, \hat{\rho}\, 2\big)\, \hat{\rho}\, A$
• Likewise: $A' = \langle 0\ 2\ 1\ 3 \rangle \mathbin{⍉} \Big(\big(\log_2(\pi(\rho A))\, \hat{\rho}\, 2\big)\, \hat{\rho}\, A\Big)$
• Lastly: $\langle 0\ 0 \rangle\, \psi\, \Big(\langle 0\ 2\ 1\ 3 \rangle \mathbin{⍉} \Big(\big(\log_2(\pi(\rho A))\, \hat{\rho}\, 2\big)\, \hat{\rho}\, A\Big)\Big) = \begin{bmatrix} 0 & 1 \\ 4 & 5 \end{bmatrix}$

Big Picture: ψ-calculus
• Multiple operations acting on original array (previous slide)
• Each operation defined in terms of its action on the indices of the array
• By applying all operator definitions we “ψ-reduce” the composite operations
• Final result: prescription (operational normal form (ONF)) for indexing operations on original array

ψ-reduction
• By selecting the $\langle i\ j \rangle$'th component of the two-dimensional array: $\langle 0\ 0 \rangle\, \psi\, \Big(\langle 0\ 2\ 1\ 3 \rangle \mathbin{⍉} \Big(\big(\log_2(\pi(\rho A))\, \hat{\rho}\, 2\big)\, \hat{\rho}\, A\Big)\Big)$
• We obtain the ONF: for all $0 \le i < 2$, $0 \le j < 2$: $a[\gamma(\langle 0\ i\ 0\ j \rangle; \langle 2\ 2\ 2\ 2 \rangle)]$
  $i=0, j=0$: $(0 \cdot 2^3) + (0 \cdot 2^2) + (0 \cdot 2^1) + (0 \cdot 2^0) = 0$
  $i=0, j=1$: $(0 \cdot 2^3) + (0 \cdot 2^2) + (0 \cdot 2^1) + (1 \cdot 2^0) = 1$
  $i=1, j=0$: $(0 \cdot 2^3) + (1 \cdot 2^2) + (0 \cdot 2^1) + (0 \cdot 2^0) = 4$
  $i=1, j=1$: $(0 \cdot 2^3) + (1 \cdot 2^2) + (0 \cdot 2^1) + (1 \cdot 2^0) = 5$
NOTE: $a$ is the row-major layout of $A$.
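The reduction on this slide can be checked with a small Python sketch. It is only an illustration under stated assumptions: the names `gamma`, `A_hat`, `A_prime`, `block_stepwise`, and `block_onf` are hypothetical and do not come from the slides, and `gamma` is written here as the usual row-major offset. The step-by-step version builds the temporaries explicitly; the ONF version indexes the flat array `a` directly.

```python
from itertools import product
from math import prod


def gamma(index, shape):
    """Row-major offset of a full index in an array of the given shape."""
    offset = 0
    for i, extent in zip(index, shape):
        offset = offset * extent + i
    return offset


shape_A = (4, 4)
a = list(range(prod(shape_A)))  # row-major layout of A: 0, 1, ..., 15

# Hypercube shape: log2(pi(rho A)) copies of 2, here (2, 2, 2, 2).
cube = (2,) * (prod(shape_A).bit_length() - 1)

# The permutation <0 2 1 3> is its own inverse, so the direction of the
# axis mapping does not matter in this particular example.
perm = (0, 2, 1, 3)

# Step-by-step evaluation with explicit temporaries (stored as dicts).
A_hat = {idx: a[gamma(idx, cube)] for idx in product(range(2), repeat=len(cube))}
A_prime = {idx: A_hat[tuple(idx[p] for p in perm)] for idx in A_hat}
block_stepwise = [[A_prime[(0, 0, i, j)] for j in range(2)] for i in range(2)]

# ONF: no temporaries, only direct indexing of the flat array a.
block_onf = [[a[gamma((0, i, 0, j), cube)] for j in range(2)] for i in range(2)]

print(block_stepwise)
print(block_onf)
```

Both evaluations visit the same four offsets of `a`, which is the point of the reduction: the reshape and transpose never have to be materialized.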
Temporal Index Composition
• Usual methods: pipeline multiple temporal operations
• LU followed by QR, for example:
  • Decompose LU
  • Map to processors
  • Execute
  • Start the pipeline
  • Set up for QR
  • Decompose
  • Map to processors
  • Execute
  • Feed pipeline

Temporal Index Composition: LU Decomposition
[Figure: LU decomposition, panels labeled T=0, T=1, T=2]

LU to QR (or anything else)
• Think of it as indexing
• Temporal index = $\langle 0\ 1\ 2 \rangle$
• Then again with QR: temporal index = $\langle 0\ 1\ 2 \rangle$
• Then again with something else: index = $\langle 0\ 1\ 2 \rangle$
• Formulate the temporal index as an array index, like processors or anything else
• Consequently temporal indices can be transposed to produce $\langle 0\ 0\ 0 \rangle$, $\langle 1\ 1\ 1 \rangle$, $\langle 2\ 2\ 2 \rangle$

[Figure: panels labeled T=0, T=1, T=2]

Temporal Index Composition
• Without transpose:
  • Data moves around connecting networks
  • Multiple decompositions and communication
  • Filling and emptying the pipeline causes delays
• With transpose:
  • One decomposition
  • Minimal data flow
  • Highest performance

Impact: efficiency and programming ease
• Efficiency: through direct indexing access we eliminate temporary arrays, allowing BIGGER PROBLEMS
• Minimizing operations: significant speed enhancement
• Programming ease: ONF directly translated into code (no thinking or hackery)
• Recent FFT cache optimization: with design (ONF) in hand, code built in ONE DAY and it WORKED!!

Mathematical Reasoning
• Same mathematics used to describe the problem as is used to describe the organization of the machine
• Machine: decomposition over levels of memory, processors, networks, FPGAs in terms of array dimensions
• Provability: two designs leading to the same ONF are equivalent!
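The provability claim can be illustrated with a short Python sketch. It assumes two hypothetical designs for reading the upper-left 2×2 block of a 4×4 array: one goes through the hypercube representation, the other indexes the 2-dimensional array directly. The function names `design_hypercube` and `design_direct` are illustrative only, and `gamma` is written as the usual row-major offset. Each design is reduced to the offsets it reads from the flat layout, and the two offset maps are compared.

```python
from itertools import product


def gamma(index, shape):
    """Row-major offset of a full index in an array of the given shape."""
    offset = 0
    for i, extent in zip(index, shape):
        offset = offset * extent + i
    return offset


def design_hypercube(i, j):
    # Hypothetical design 1: lift to shape <2 2 2 2>, transpose with
    # <0 2 1 3>, select block <0 0>; reduced form reads index <0 i 0 j>.
    return gamma((0, i, 0, j), (2, 2, 2, 2))


def design_direct(i, j):
    # Hypothetical design 2: read element <i j> of the 4x4 array directly.
    return gamma((i, j), (4, 4))


indices = list(product(range(2), repeat=2))
onf_1 = {idx: design_hypercube(*idx) for idx in indices}
onf_2 = {idx: design_direct(*idx) for idx in indices}

# Two designs are equivalent when they reduce to the same offset map.
equivalent = onf_1 == onf_2
print(equivalent)
```

Comparing offset maps over a finite index range is only a numerical check for this one case; the slide's claim is about the symbolic normal forms themselves.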
Success to date
• This approach has been applied to many operations ubiquitous across science and engineering disciplines:
  – FFT, QR decomposition, LU decomposition
  – Digital signal processing: RADAR (MIT Lincoln Labs)
• Most recent: FFT cache optimization, a factor of FOUR speedup over previous records (next talk)

Ongoing Research
• General-radix, multiple-dimension FFT optimized over cache and processors
• Application-specific hardware: programming hybrid software/hardware (FPGA) calculations
• Grid computing
• Large-scale materials simulations with density-functional theory
• Quantum computing and quantum algorithms (density matrix naturally expressed as a hypercube, with qubits as indices; talk follows)