Geometry of Arrays:
Mathematics of Arrays and
ψ-calculus
Lenore R. Mullin
Computer Science Department
College of Computing and Information
University at Albany, State University of
New York
October 14-15, 2005
Conformal Computing 2005
Overview
• Mathematics of Arrays (MoA): an algebraic
system for dealing with arbitrary multidimensional arrays
• ψ-calculus: reducing a complicated set of
array operations to direct indexing of the
original array
• Elimination of temporary arrays: significant
enhancement in speed and reduction of
memory requirements
• Example: block decomposition “lifts” (raises)
the dimension of the array
“Reshape” Operator
The 2-dimensional array
$$A = \begin{pmatrix} 0 & 1 & 2 & 3 \\ 4 & 5 & 6 & 7 \\ 8 & 9 & 10 & 11 \\ 12 & 13 & 14 & 15 \end{pmatrix}$$
becomes the 4-dimensional array
$$\hat{A} = \langle 2\,2\,2\,2\rangle\,\hat{\rho}\,A = \begin{pmatrix} \begin{pmatrix} 0 & 1 \\ 2 & 3 \end{pmatrix} & \begin{pmatrix} 4 & 5 \\ 6 & 7 \end{pmatrix} \\[6pt] \begin{pmatrix} 8 & 9 \\ 10 & 11 \end{pmatrix} & \begin{pmatrix} 12 & 13 \\ 14 & 15 \end{pmatrix} \end{pmatrix}$$
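To make the reshape concrete, here is a minimal sketch in Python, assuming row-major order and nested lists as the array representation; `flatten` and `reshape` are hypothetical helpers, not MoA notation. It shows that $\langle 2\,2\,2\,2\rangle\,\hat{\rho}\,A$ keeps the same 16 elements in the same order and only changes how they are indexed.

```python
from math import prod


def flatten(x):
    """Row-major list of the scalar elements of a nested list."""
    if isinstance(x, list):
        return [e for sub in x for e in flatten(sub)]
    return [x]


def reshape(shape, x):
    """Reshape nested list x to `shape`, reading elements in row-major order.

    Assumes prod(shape) equals the number of elements in x (no cycling).
    """
    flat = flatten(x)
    if prod(shape) != len(flat):
        raise ValueError("shape does not match element count")

    def build(dims, offset):
        if not dims:
            return flat[offset]
        stride = prod(dims[1:])  # elements spanned by one step along this axis
        return [build(dims[1:], offset + k * stride) for k in range(dims[0])]

    return build(list(shape), 0)


A = [[4 * r + c for c in range(4)] for r in range(4)]  # 0..15, shape <4 4>
A_hat = reshape([2, 2, 2, 2], A)
print(A_hat[0][1])  # [[4, 5], [6, 7]]
```

With row-major order, element $\hat{A}[i][j][k][l]$ sits at offset $8i + 4j + 2k + l$ of the underlying data, so reshaping requires no data movement.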
Block Decomposition
The 2-dimensional array
$$A = \begin{pmatrix} 0 & 1 & 2 & 3 \\ 4 & 5 & 6 & 7 \\ 8 & 9 & 10 & 11 \\ 12 & 13 & 14 & 15 \end{pmatrix}$$
viewed as a 4-dimensional array of 2×2 blocks:
$$\bar{A} = \langle 0\,2\,1\,3\rangle \mathbin{\bigcirc\!\!\!\!\backslash}\,\hat{A} = \begin{pmatrix} \begin{pmatrix} 0 & 1 \\ 4 & 5 \end{pmatrix} & \begin{pmatrix} 2 & 3 \\ 6 & 7 \end{pmatrix} \\[6pt] \begin{pmatrix} 8 & 9 \\ 12 & 13 \end{pmatrix} & \begin{pmatrix} 10 & 11 \\ 14 & 15 \end{pmatrix} \end{pmatrix}$$
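The blocked view can also be built directly, which is a useful check on what $\bar{A}$ should contain. This sketch (hypothetical helper `blocks`, for a square matrix whose size is divisible by the block size) slices A into 2×2 blocks with explicit loops rather than with the reshape and transpose operators used in the talk:

```python
def blocks(A, b):
    """Split a square n x n nested list into b x b blocks.

    Returns a 4-d nested list B with B[bi][bj][r][c] == A[b*bi + r][b*bj + c].
    Assumes n is divisible by b.
    """
    nb = len(A) // b  # number of blocks along each dimension
    return [[[[A[b * bi + r][b * bj + c] for c in range(b)]
              for r in range(b)]
             for bj in range(nb)]
            for bi in range(nb)]


A = [[4 * r + c for c in range(4)] for r in range(4)]  # the 4x4 array 0..15
A_bar = blocks(A, 2)
for bi in range(2):
    for bj in range(2):
        print(bi, bj, A_bar[bi][bj])
```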
Array “shapes”
• Shape operator: returns a vector
containing the lengths of each dimension
• First vector (two-dimensional):
A  4 4
• Second vector (four-dimensional):
A 2 2 2 2
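A small sketch of the shape operator ρ for arrays stored as nested Python lists (the function name `shape` is our own, and the code assumes the lists are rectangular):

```python
def shape(x):
    """Return the dimension lengths of a rectangular nested list.

    A scalar has the empty shape [].  Only the first element at each depth
    is inspected, so the lists are assumed to be rectangular.
    """
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0] if x else None
    return dims


A = [[4 * r + c for c in range(4)] for r in range(4)]
A_hat = [[[[8 * i + 4 * j + 2 * k + l for l in range(2)]
           for k in range(2)]
          for j in range(2)]
         for i in range(2)]
print(shape(A), shape(A_hat))  # [4, 4] [2, 2, 2, 2]
```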
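“Transpose” operator
• The transpose operator permutes the dimensions; $\langle 0\,2\,1\,3\rangle$ swaps axes 1 and 2, so $\bar{A}[i][j][k][l] = \hat{A}[i][k][j][l]$:
$$\hat{A} = \begin{pmatrix} \begin{pmatrix} 0 & 1 \\ 2 & 3 \end{pmatrix} & \begin{pmatrix} 4 & 5 \\ 6 & 7 \end{pmatrix} \\[6pt] \begin{pmatrix} 8 & 9 \\ 10 & 11 \end{pmatrix} & \begin{pmatrix} 12 & 13 \\ 14 & 15 \end{pmatrix} \end{pmatrix} \qquad \bar{A} = \langle 0\,2\,1\,3\rangle \mathbin{\bigcirc\!\!\!\!\backslash}\,\hat{A} = \begin{pmatrix} \begin{pmatrix} 0 & 1 \\ 4 & 5 \end{pmatrix} & \begin{pmatrix} 2 & 3 \\ 6 & 7 \end{pmatrix} \\[6pt] \begin{pmatrix} 8 & 9 \\ 12 & 13 \end{pmatrix} & \begin{pmatrix} 10 & 11 \\ 14 & 15 \end{pmatrix} \end{pmatrix}$$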
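The transpose can be computed purely by index arithmetic. This sketch stores arrays as flat row-major lists plus a shape, and assumes the APL-style convention that axis d of the argument becomes axis perm[d] of the result (for $\langle 0\,2\,1\,3\rangle$, which is its own inverse, the common conventions coincide). The names `offset` and `transpose` are hypothetical.

```python
from itertools import product


def offset(index, shape):
    """Row-major offset of a full index (Horner's rule)."""
    off = 0
    for i, n in zip(index, shape):
        off = off * n + i
    return off


def transpose(perm, data, shape):
    """Permute the axes of a flat row-major array.

    Axis d of the argument becomes axis perm[d] of the result.
    Returns (new_data, new_shape).
    """
    new_shape = [0] * len(shape)
    for d, p in enumerate(perm):
        new_shape[p] = shape[d]
    new_data = []
    # itertools.product enumerates result indices in row-major order.
    for idx in product(*(range(n) for n in new_shape)):
        src = [idx[p] for p in perm]  # source coordinate d is result coordinate perm[d]
        new_data.append(data[offset(src, shape)])
    return new_data, new_shape


A_hat = list(range(16))  # row-major data of A_hat, shape <2 2 2 2>
A_bar, s = transpose([0, 2, 1, 3], A_hat, [2, 2, 2, 2])
print(A_bar)  # [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]
```

Read four at a time, the result is exactly the 2×2 blocks of A in row-major block order.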
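“ψ-operator”
• The ψ operator extracts a component of an array
• A full index extracts an element:
$$\langle 1\,1\rangle\,\psi\,A = \langle 0\,1\,0\,1\rangle\,\psi\,\hat{A} = \langle 0\,0\,1\,1\rangle\,\psi\,\bar{A} = 5$$
• A partial index extracts a subarray:
$$\langle 0\,0\rangle\,\psi\,\bar{A} = \begin{pmatrix} 0 & 1 \\ 4 & 5 \end{pmatrix}$$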
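A sketch of ψ for flat row-major arrays, assuming indices are given as lists and the shape is passed explicitly (the signature is our own choice). A full index yields a scalar; a partial (prefix) index yields the contiguous subarray it selects.

```python
from math import prod


def psi(index, data, shape):
    """Select from a flat row-major array with a full or partial index.

    A full index returns a scalar; a partial (prefix) index returns
    (subarray_data, subarray_shape).
    """
    if len(index) > len(shape) or any(not 0 <= i < n for i, n in zip(index, shape)):
        raise IndexError("index out of bounds")
    rest = shape[len(index):]
    size = prod(rest)          # elements in the selected subarray (1 for a scalar)
    start = 0
    for i, n in zip(index, shape):
        start = start * n + i  # row-major offset of the index prefix
    start *= size
    if not rest:
        return data[start]
    return data[start:start + size], rest


a = list(range(16))                                             # A and A_hat share this data
a_bar = [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]  # A_bar, row-major
print(psi([1, 1], a, [4, 4]),
      psi([0, 1, 0, 1], a, [2, 2, 2, 2]),
      psi([0, 0, 1, 1], a_bar, [2, 2, 2, 2]))  # 5 5 5
print(psi([0, 0], a_bar, [2, 2, 2, 2]))        # ([0, 1, 4, 5], [2, 2])
```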
“Reshape” operator
• The process of “lifting” the dimension is carried out with the “reshape” operator $\hat{\rho}$
• We write:
$$\hat{A} = \langle 2\,2\,2\,2\rangle\,\hat{\rho}\,A$$
• And:
$$\langle 2\,2\,2\,2\rangle = \langle 4\rangle\,\hat{\rho}\,2$$
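The scalar case $\langle 4\rangle\,\hat{\rho}\,2$ suggests that reshape fills the requested shape from its argument. The sketch below assumes the APL convention of reusing elements cyclically when the argument has too few; the slides only show the scalar case, and `rho_hat` is a hypothetical name. Data are returned as a flat row-major list.

```python
from itertools import cycle, islice
from math import prod


def rho_hat(shape, x):
    """Fill an array of the given shape with the elements of x (scalar or flat list).

    Elements are taken in row-major order and reused cyclically if needed
    (an APL-style assumption).  Returns the flat row-major data.
    """
    items = x if isinstance(x, list) else [x]
    if not items:
        raise ValueError("cannot reshape an empty array")
    return list(islice(cycle(items), prod(shape)))


hyper = rho_hat([4], 2)    # <4> rho_hat 2
print(hyper)               # [2, 2, 2, 2]

a = list(range(16))        # row-major data of A, shape <4 4>
a_hat = rho_hat(hyper, a)  # <2 2 2 2> rho_hat A
print(a_hat == a)          # True: only the shape changes, not the data
```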
“Hypercube” Representation
• The arrays A and A are examples of
“hypercubes”
• Often array operations simplify in the
hypercube representation

• 
In a hypercube:
all dimensions have
length 2
• For an arbitrary array B we write:
B 222
October 14-15, 2005
ˆB
2 
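A sketch of how the hypercube shape of an arbitrary array B could be computed, under the same power-of-two assumption; `hypercube_shape` is a hypothetical helper. Because reshape leaves the row-major data untouched, each index of B simply becomes the binary digits of its offset.

```python
from math import prod


def hypercube_shape(shape):
    """Return <2 2 ... 2> with as many elements as an array of `shape`.

    Raises ValueError unless the element count is a power of two.
    """
    n = prod(shape)
    if n < 1 or n & (n - 1):
        raise ValueError(f"{n} elements is not a power of two")
    return [2] * (n.bit_length() - 1)  # bit_length - 1 == log2(n) for powers of two


B_shape = [2, 4]                                       # an arbitrary 2x4 array B
B = [[10 * r + c for c in range(4)] for r in range(2)]
b = [e for row in B for e in row]                      # row-major layout of B

print(hypercube_shape(B_shape))                        # [2, 2, 2]

# Offset 4r + c has binary digits (r, c // 2, c % 2): that is the hypercube index.
for r in range(2):
    for c in range(4):
        bits = (r, c // 2, c % 2)
        assert b[4 * bits[0] + 2 * bits[1] + bits[2]] == B[r][c]
```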
“Product” operator
• The product operator preturns an integer
equal to the product of the elements of a
vector
• The total number of elements in A is:
p ( A)  p 4 4  16
• Number of hypercube dimensions:
log 2 (p (A))  4
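The arithmetic on this slide in a few lines of Python; `pi` is a hypothetical name for the product operator, built on the standard `functools.reduce`. Comparing with $\rho\hat{A}$ shows the invariant that makes the reshape legal: both shapes hold the same number of elements.

```python
from functools import reduce
from operator import mul


def pi(v):
    """Product of the elements of a vector; the empty product is 1."""
    return reduce(mul, v, 1)


rho_A = [4, 4]                 # shape of A
rho_A_hat = [2, 2, 2, 2]       # shape of A_hat
n = pi(rho_A)
dims = n.bit_length() - 1      # exact log2 for a power of two (here 16 -> 4)
print(n, dims, pi(rho_A_hat))  # 16 4 16
```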
Composing operations
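• The hypercube representation of A is thus written:
$$\hat{A} = (\langle \log_2 \pi\rho A\rangle\,\hat{\rho}\,2)\,\hat{\rho}\,A$$
• Likewise:
$$\bar{A} = \langle 0\,2\,1\,3\rangle \mathbin{\bigcirc\!\!\!\!\backslash}\,\big((\langle \log_2 \pi\rho A\rangle\,\hat{\rho}\,2)\,\hat{\rho}\,A\big)$$
• Lastly:
$$\langle 0\,0\rangle\,\psi\,\Big(\langle 0\,2\,1\,3\rangle \mathbin{\bigcirc\!\!\!\!\backslash}\,\big((\langle \log_2 \pi\rho A\rangle\,\hat{\rho}\,2)\,\hat{\rho}\,A\big)\Big) = \begin{pmatrix} 0 & 1 \\ 4 & 5 \end{pmatrix}$$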
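Evaluated naively, one operator at a time, the composition needs a temporary for the transposed array. This sketch (flat row-major lists; hypothetical helper names `offset`, `transpose`, `psi_prefix`; APL-style transpose convention assumed) does exactly that:

```python
from itertools import product
from math import prod


def offset(index, shape):
    """Row-major offset of an index (Horner's rule)."""
    off = 0
    for i, n in zip(index, shape):
        off = off * n + i
    return off


def transpose(perm, data, shape):
    """Axis d of the argument becomes axis perm[d] of the result."""
    new_shape = [0] * len(shape)
    for d, p in enumerate(perm):
        new_shape[p] = shape[d]
    new_data = [data[offset([idx[p] for p in perm], shape)]
                for idx in product(*(range(n) for n in new_shape))]
    return new_data, new_shape


def psi_prefix(index, data, shape):
    """Subarray selected by a partial (prefix) index."""
    rest = shape[len(index):]
    size = prod(rest)
    start = offset(index, shape[:len(index)]) * size
    return data[start:start + size], rest


a = list(range(16))                  # row-major data of A
rho_A = [4, 4]
n = prod(rho_A)                      # pi rho A = 16
hyper = [2] * (n.bit_length() - 1)   # <log2 pi rho A> rho_hat 2 = <2 2 2 2>
# Reshape: A_hat reuses the data of A; only the shape changes.
A_bar_data, A_bar_shape = transpose([0, 2, 1, 3], a, hyper)  # temporary array
sub, sub_shape = psi_prefix([0, 0], A_bar_data, A_bar_shape)
print(sub, sub_shape)  # [0, 1, 4, 5] [2, 2]
```

The whole 16-element temporary is built just to read four elements, which is the cost that ψ-reduction removes.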
Big Picture: ψ-calculus
• Multiple operations act on the original array (previous slide)
• Each operation is defined in terms of its action on the indices of the array
• By applying all the operator definitions we “ψ-reduce” the composite operations
• Final result: a prescription, the operational normal form (ONF), for indexing operations on the original array
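ψ-reduction
• By selecting the $\langle i\,j\rangle$'th component of the two-dimensional array
$$\langle 0\,0\rangle\,\psi\,\Big(\langle 0\,2\,1\,3\rangle \mathbin{\bigcirc\!\!\!\!\backslash}\,\big((\langle \log_2 \pi\rho A\rangle\,\hat{\rho}\,2)\,\hat{\rho}\,A\big)\Big)$$
• we obtain the ONF: for all $0 \le i < 2$, $0 \le j < 2$,
$$a[\gamma(\langle 0\,i\,0\,j\rangle;\langle 2\,2\,2\,2\rangle)] = a[0\cdot 2^3 + i\cdot 2^2 + 0\cdot 2^1 + j\cdot 2^0] = a[4i + j]$$
so $(i,j) = (0,0), (0,1), (1,0), (1,1)$ read offsets 0, 1, 4, 5.
NOTE: a is the row-major layout of A.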
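Translated directly into code, the ONF becomes a double loop over i and j that reads the row-major data a with no intermediate arrays. A minimal Python sketch, with `gamma` as a hypothetical helper for the row-major offset; `block_onf` extends the slide's $\langle 0\,0\rangle$ case to any block $\langle b_i\,b_j\rangle$ by the same reasoning, which is our own extrapolation:

```python
def gamma(index, shape):
    """Row-major offset gamma(index; shape), computed with Horner's rule."""
    off = 0
    for i, n in zip(index, shape):
        off = off * n + i
    return off


a = list(range(16))  # row-major layout of A

# ONF for <0 0> psi (<0 2 1 3> transpose A_hat): index a directly, no temporaries.
block = [[a[gamma((0, i, 0, j), (2, 2, 2, 2))] for j in range(2)] for i in range(2)]
print(block)  # [[0, 1], [4, 5]]

# The offsets collapse to 4*i + j, as in the ONF above.
assert all(gamma((0, i, 0, j), (2, 2, 2, 2)) == 4 * i + j
           for i in range(2) for j in range(2))


def block_onf(a, bi, bj):
    """Same ONF with the selected block <bi bj> as a parameter (our generalization)."""
    return [[a[gamma((bi, i, bj, j), (2, 2, 2, 2))] for j in range(2)] for i in range(2)]
```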
Temporal Index Composition
• Usual method: pipeline multiple temporal operations
• For example, LU followed by QR:
• Decompose LU
• Map to processors
• Execute
• Start the pipeline
• Set up for QR
• Decompose
• Map to processors
• Execute
• Feed the pipeline
Temporal Index Composition
[Figure: LU decomposition at time steps T=0, T=1, T=2]
LU to QR (or anything else)
• Think of it as indexing
• Temporal index = $\langle 0\,1\,2\rangle$
• Then again with QR: temporal index = $\langle 0\,1\,2\rangle$
• Then again with something else: temporal index = $\langle 0\,1\,2\rangle$
• Formulate the temporal index as an array index, like processors or anything else
• Consequently, the temporal indices can be transposed to produce
$$\begin{pmatrix} 0 & 0 & 0 \\ 1 & 1 & 1 \\ 2 & 2 & 2 \end{pmatrix}$$
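A toy illustration of the transpose on temporal indices. The slide does not say which axis is which, so this sketch assumes rows correspond to operations (LU, QR, and a placeholder third one) and columns to the time steps $\langle 0\,1\,2\rangle$; swapping the axes groups the same time step of every operation together.

```python
ops = ["LU", "QR", "other"]          # "other" is a placeholder operation
temporal = [[0, 1, 2] for _ in ops]  # row = operation, column = temporal index
print(temporal)                      # [[0, 1, 2], [0, 1, 2], [0, 1, 2]]

transposed = [list(col) for col in zip(*temporal)]  # swap the two axes
print(transposed)                    # [[0, 0, 0], [1, 1, 1], [2, 2, 2]]

for step, row in enumerate(transposed):
    print(f"T={step}:", list(zip(ops, row)))
```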
[Figure: time steps T=0, T=1, T=2]
Temporal Index Composition
• Without transpose: data moves across the interconnection networks
• Multiple decompositions and communication
• Filling and emptying the pipeline causes delays
• With transpose:
• One decomposition
• Minimal data flow
• Highest performance
Impact: efficiency and
programming ease
• Efficiency: direct index access eliminates temporary arrays, allowing BIGGER PROBLEMS
• Minimizing operations: significant speed enhancement
• Programming ease: the ONF translates directly into code (no extra thinking or hackery)
• Recent FFT cache optimization: with the design (ONF) in hand, the code was built in ONE DAY, and it WORKED!
Mathematical Reasoning
• Same mathematics used to describe the
problem as is used to describe the
organization of the machine
• Machine: decomposition over levels of memory, processors, networks, and FPGAs in terms of array dimensions
• Provability: two designs leading to the
same ONF are equivalent!
Success to date
• This approach applied to many operations
ubiquitous across science and engineering
disciplines
– FFT, QR decomposition, LU decomposition
– Digital signal processing: RADAR (MIT Lincoln
Labs)
• Most recent: FFT cache optimization with a speedup by a factor of FOUR over previous records (next talk)
Ongoing Research
• General-radix, multiple-dimension FFT
optimized over cache and processors
• Application-specific hardware: programming
hybrid software/hardware (FPGA) calculations
• Grid computing
• Large-scale materials simulations with density-functional theory
• Quantum computing and quantum algorithms (the density matrix is naturally expressed as a hypercube, with qubits as indices; talk follows)