Document 14078309

advertisement
An Update on the s
Coconut Project
Christopher Kumar Anand
Michal Dobrogost
Maryam Moghadas
Jessica Pavlin
Yuriy Toporovskyy
Wolfram Kahl
COCONUT
COde CONstructing User Tool
• DSLs embedded in Haskell
!
strong types (detect errors in model and code)
y
a
•
d
To
• high level
•! start with objective function
y
differentiate symbolically
a
•
d
o
T ay •! simplify expressions
Tod • recognize recurrence relations
• generate code
• verification
•! SIMD & distributed parallelization (private and shared-memory)
linear-time verification
y
a
•
d
o
T
2
PP12 - Anand
a node
in a hash
each entry
is either
a basic
with respect to the real
variable
Xtable,
usingwhere
a simple
recursion
through
di↵erential, or constant) or an operation combining mult
as
The hash table is implemented using a Haskell IntMap,
Models
wrapped in classes which hide the implementation from th
suc
rea
(mp ["X"]) (norm2 (ft (x +: y)))
can even work interactively to build up an expression,
FT((d(X[16][16][16])+:0.0[16,16,16])))).(Re(FT((X[16][16][16]+:Y[16][16][16])))))
FT((d(X[16][16][16])+:0.0[16,16,16])))).(Re(FT((X[16][16][16]+:Y[16][16][16]))))))
transform, ft(x + iy) of a complex vector composed of
(FT((d(X[16][16][16])+:0.0[16,16,16])))).(Im(FT((X[16][16][16]+:Y[16][16][16])))))
FT((d(X[16][16][16])+:0.0[16,16,16])))).(Im(FT((X[16][16][16]+:Y[16][16][16])))))))
define variables
> let x = var3d (16,16,16) "x"
> let y = var3d (16,16,16) "y"
> ft (x +: y)
representation
is
deceptively
long
as
it
hides
(FT((x(16,16,16)+:y(16,16,16))))
his flat
the redundan
ed by CEL internally. Di↵erentiating with respect to both real variab
define anThis
objective
minimize
DSL hasfunction
classes fortoreal
and complex vector spac
of these corresponding to regular hexahedral (cubic) discre
d real
ft(x
+
iy) 2 .X usingspaces.
tive with respecttwo-,
to the
variable
a simple
recursion
threeand
four-dimensional
For example,
th
3 , which we approximate usin
tissue
velocity
is
a
field
in
R
DAG asthe above length, but not with sharing:
d double
and
take
it’s derivative
var3d
with
specified
resolution. Another example is the
conservation-of-mass
regularizer is
>> diff (mp ["X"]) (norm2
(ft (x +: y)))
((((Re(FT((d(X[16][16][16])+:0.0[16,16,16])))).(Re(FT((X[16][16][16]+:Y[16][16][16])))))

3 t<
sum
3
dx
;
+
m
a
s
s
C
o
n
s
e
r
v
a
i
o
n
(
ThreeD
vx
,
ThreeD
vy
,
ThreeD
v
+ ((Re(FT((d(X[16][16][16])+:0.0[16,16,16])))).(Re(FT((X[16][16][16]+:Y[16][16][16]))))))
/
f
t
+i
dot
+
+<
;
;
+ (((Im(FT((d(X[16][16][16])+:0.0[16,16,16])))).(Im(FT((X[16][16][16]+:Y[16][16][16])))))
= norm2 (dyconv3Zip1ZM com
+ ((Im(FT((d(X[16][16][16])+:0.0[16,16,16])))).(Im(FT((X[16][16][16]+:Y[16][16][16])))))))
sum
where#
3x
#
3=
comdot
(vX , vY , vZ )3 f=t (vX/ +i
[1 ,0 ,0]
vX[ 1 , 0 , 0 ] )
#
+=
+y
J
sum
+ (vY [ 0 , 1 , 0 ] PP12 - Anand
vY[ 0 , 1 , 0 ] )
3
transform, ft(x + iy) of a complex vector composed of real vectors x and y:
more complicated
models
> let x = var3d (16,16,16) "x"
> let y = var3d (16,16,16) "y"
> ft (x +: y)
(FT((x(16,16,16)+:y(16,16,16))))
This DSL has classes for real and complex vector spaces, and instances
of these corresponding to regular hexahedral (cubic) discretizations of one-,
two-, threeandconservation
four-dimensional spaces.
For example,
the x component of
e.g.,
of
mass
in
material
tissue velocity is a field in R3 , which we approximate using a discretization
transport
models Another
is a function
var3d with
specified resolution.
example is the problem-specific
conservation-of-mass regularizer is
•
m a s s C o n s e r v a t i o n ( ThreeD vx , ThreeD vy , ThreeD vz )
= norm2 ( conv3Zip1ZM com vx vy vz )
where
com (vX , vY , vZ ) = (vX [ 1 , 0 , 0 ]
vX[ 1 , 0 , 0 ] )
+ (vY [ 0 , 1 , 0 ]
vY[ 0 , 1 , 0 ] )
+ ( vZ [ 0 , 0 , 1 ]
vZ [ 0 , 0 , 1 ] )
4
where conv3ZipZM constructs a stencil computation with assumed zero margins for out-of-bounds references, norm2 is the norm squared, and we use
familiar array subscript notation for relative indexing of neighbouring values for the three velocity components.
PP12 - Anand
what was that d(X),
and where’s ∇f?
• d(X) is a vector of differential forms
• comes from implicit derivative
• we can extract ∇f by simplifying
5
PP12 - Anand
product is only defined for real values, some linear operations do not have
simple adjoints and must be replaced by sums of real and imaginary parts,
etc. In the following example, we look at the pop/push operations after the
other simplifications have finished, and use the operator (+i0) to mean the
embedding of a real value into a complex value by adding a zero imaginary
part. This is a simplification of the actual procedure, which interleaves the
application of di↵erent rules, and also recombines the real and imaginary
parts after the pop/push operations to avoid multiple applications of the
Fourier Transform.
For the real/x part of the term graph (5), the pop/push proceeds as
simplify rules
• commute d(x) to LHS of dot
• move linear operators to RHS via adjoint
Left "
<
ft
(+i0)
dx
Right
#
<
ft
(+i0)
x
·
Left "
(+i0)
dx
·
6
Right
#
ft 1
(+i0)
<
ft
(+i0)
x
7!
Left "
ft
(+i0)
dx
Right
#
(+i0)
<
ft
(+i0)
x
·
Left "
dx
7!
·
Right
#
<
ft 1
(+i0)
<
ft
(+i0)
x
7!
PP12 - Anand
simplification DSL
up 4500 lines of literate Haskell, kept manageable (and readable) by the de
velopment of a second DSL for term graph rewriting, with two components
pattern matching and construction. We do not have room to adequately
00
lines
of
literate
Haskell,
kept
manageable
(and
reada
describe this language, but we can convey its flavour with two examples:
ment of a second DSL for term graph rewriting, with tw
| Just ( x ) < (M. in vF t ‘ o ‘ M. f t ) e x p r s node
= x exprs
n | matching
construction.
not have
Just ( x , ) and
< (M.
xRe ‘ o ‘ M. reIm )We
e x pdo
r s node
= x e xroom
prs
be this language, but we
can convey its flavour with tw
1
which perform rewrites ft ft (x) 7! x and <(x + iy) 7! x. The matchMatch
inverse
FT
composed
with
FT
of
x
ng
module
is
imported
qualified
(i.e.,
M.*
)
with
each
function
matching
ust ( x ) < (M. in vF t ‘ o ‘ M. f t ) e x p r s node
an operation replace
and returning
constructor(s)
for source nodes (i.e., x). Pat
with
x
ust ( x , ) < (M. xRe ‘ o ‘ M. reIm ) e x p r s node
•
•
ern matchers can be chained together using composition syntax (i.e., ‘o‘)
and alternative matches are chained
1 together using Haskell pattern guards
perform
rewrites
ft
ft
(x)
!
7
x
and
<(x
+
iy)
!
7
i.e., |). Composition is designed to have syntax very close to that of the
odule isDSL,
imported
qualified
(i.e.,
M.*
) withCurrently,
each fun
modelling
although the
semantics are
subtly
di↵erent.
com
plicated
and
conditional
term
rewriting
must
access
the
underlying
Haskel
eration and returning constructor(s) for source nodes
data structures, but we are working on improving the expressivity of the
matchers
can be chained together using composition
syn
PP12 - Anand
7
matching
calculus.
Type Checking
(|| work)
• with stronger typing, compilers find
more mistakes
• in C, all dynamic arrays (float*) look
alike
• others check size and dimension
• we can do better
8
PP12 - Anand
Arrays of Samples
2.5
2
1.5
1
0.5
0
9
0
2
4
6
8
10
12
PP12 - Anand
Arrays of Samples
• number of samples
• dimension
• frame of reference
• resolution (including units)
• units of measurement
10
PP12 - Anand
Formalization
Haskell types define
• inphysical
units
Figure 3. Sampling of height of water in
•
•r array
Disc
e t i z asize
t i oand
n 1 Ddimension
[1 ,1.2 ,1.3 ,1.12 ,1.23 ,1.12 ,1.15 ,1.
• frame of reference
canalSample2
::
D i s c r e t i z a t i o n 1 D (F ” CanalFrame ” )
(NAT 1 2 )
[ f l o a t | 0.02 | ]
Meter
[ Double ]
canalSample2 =
Discretization1D
11
[1 ,1.21 ,1.2 ,1.42 ,1.3 ,1.32 ,1.12 ,1.
Trying to add those two samplings to each other
PP12 - Anand
2.5
Discretization1D
2
canalSample2
1.5
1
::
D i s c r e t i z a t i o n 1 D (F ” CanalFrame ” )
(NAT 1 2 )
[ f l o a t | 0.02 | ]
Meter
[ Double ]
canalSample2 =
Discretization1D
0.5
[1 ,1.2 ,1.3 ,1.12 ,1.23 ,1.12 ,1.15 ,1.25 ,1.18 ,1.20 ,1
• array sizes match, so
try to add them …
[1 ,1.21 ,1.2 ,1.42 ,1.3 ,1.32 ,1.12 ,1.25 ,1.23 ,1.20 ,1
Trying to add those two samplings to each other
> c a n a l S a m p l e 1 U.+ c a n a l S a m p l e 2
0
0
2
4
will
time
error:
6 cause
8 a compile
10
12
No instance f o r
( Add
( Discretization1D
(F ” CanalFrame ” )
(NAT 1 2 )
(FLOAT ’ Pos 1 ( ’ E
( S I U n i t ( ’M 1 ) ( ’ S
[ Double ] )
( Discretization1D
(F ” CanalFrame ” )
(NAT 1 2 )
(FLOAT ’ Pos 2 ( ’ E
( S I U n i t ( ’M 1 ) ( ’ S
[ Double ] )
12
2) )
0 ) ( ’ Kg 0 ) ( ’A 0 ) ( ’ Mol 0 ) ( ’K 0 ) )
2) )
0 ) ( ’ Kg 0 ) ( ’A 0 ) ( ’ Mol 0 ) ( ’K 0 ) )
The error message indicates that the discretizations have di↵erent types a
- Anand
them is not a valid operation. Ideally, we would PP12
prefer
the error messa
Type Inference Across
Linear Operations
• Fourier Transforms are very picky
• Nyquist Sampling Theorem
• a theorem about how you can
• frame of reference
• resolution (including units)
• units of measurement
13
PP12 - Anand
–
[e1 ⇤ f2 ]h [e1 ⇤ f2 ]l
–
[ Double ] )
–
–
[e2 ⇤ f0 ]h [e2 ⇤ f0 ]l
The error message
indicates
the
–
[e2 ⇤ f1 that
]h [e
f1 ] l
– have
2 ⇤discretizations
them is not
[e2 a⇤ fvalid
[e2 ⇤ f2 ]l Ideally,
– we would
– prefer
2 ]h operation.
the sample spacing needs to be identical but it gwould
no
2
expert
to
understand
this
error.
For the first digit of result ‘g0 ’, there is just one term
As
we
mentioned,
another
powerful
feature
in
our
ty
low Properties
digit of ‘e0 ⇥
f0 ’,Classes
and this can be implemented using
are
which
is
defined
by:
digit of result ‘g1 ’ is obtained by adding the high digit
c l a s s FT a b | a ! b , b ! a where
‘e0 ⇥ f1 ’, and the low digit of ‘e1 ⇥ f0 ’ which can be im
ft :: a ! b
class.
third
i n v F tThe
:: b
! adigit is the result of adding five di↵e
implemented using Add5 class. Continuing in this way,
To implement an instance, we assert the preconditions
c l a s s MultD3
f2 f1 f0
e2 e1 e0
g2 g1 g0 |
constructor:
f 2 f 1 f 0 e2 e1 e0 ! g2 ,
14,
f 2 f 1 f 0 e2 e1 e0 ! g1
Classy Proofs
•
instance (
14
f 2 f 1 f 0 e2 e1 e0 ! g0 where
Times f 0 e0 p00h p 0 0 l , Times f 1 e0 p10
Times f 2 e0 D0 p 2 0 l , Times f 0 e1 p01h
Times f 1 e1 D0 p 1 1 l , Times f 2 e1 D0 D0
PP12 - Anand
Times f 0 e2 D0 p 0 2 l , Times
f 1 e2 D0 D0
0
1 h
0
1 l
–
–
[e0 ⇤ f2 ]h [e0 ⇤ f2 ]l
–
–
–
–
[e1 ⇤ f0 ]h [e1 ⇤ f0 ]l
–
–
–
[e1 ⇤ f1 ]h [e1 ⇤ f1 ]l
–
–
–
[e1 ⇤ f2 ]h [e1 ⇤ f2 ]l
–
–
–
–
–
[e2 ⇤ f0 ]h [e2 ⇤ f0 ]l
–
–
–
[e2 ⇤ fProblems
[e2 ⇤ f1Moghadas,
]l
–Toporovskyy–& Anand –
1 ]h
Type-Safety for Inverse
Imaging
[e2 ⇤ f2 ]h [e2 ⇤ f2 ]l
–
–
–
–
instance (
g
g1
g0
A s s e r t D u a l F r a m e s f r a m e 1 frame2 , Frame frame1 , Frame2 frame2 ,
l o a tfirst
s t e p digit
S i z e 2 , of
IsF
l o a t s t‘g
e p S’,i z there
e 1 , ToFloat
⇠ numSampF
ForI s Fthe
result
is justnumSamp
one term
to be, counted, which is th
0
anddigit
encode
criterion:
low
ofthe
‘e0Nyquist
⇥ f0 ’, and
this can be implemented using the Times class. The secon
MultNZ
s t e p S i z‘g
e 1 ’numSampF
t0 ,
digit
of
result
is
obtained
by adding the high digit of ‘e0 ⇥ f0 ’, the low digit o
1
MultNZ t 0 s t e p S i z e 2 t1 ,
1 ⇠
1 (E
‘e0t ⇥
f1FLOAT
’, andPosthe
low0 ) digit of ‘e1 ⇥ f0 ’ which can be implemented using the ’Add3
class.
) ) The third digit is the result of adding five di↵erent known terms which
FT ( D i s c r e t i z using
a t i o n 1 DAdd5
f r a m class.
e 1 numSamp
s t e p S i z e 1 in
rangeU
Complex
implemented
Continuing
this [way,
weDouble
arrive] ) at the definition
Proofs are Instances
( D i s c r e t i z a t i o n 1 D f r a m e 2 numSamp s t e p S i z e 2 rangeU [ Complex Double ] )
c l a s s where
MultD3
f2 f1 f0
e2 e1 e0
g2 g1 g0 |
f t ( D i sfc2r e tfi1
z a tfi 0
o n 1e2
D xe1
) = e0
( D i!
s c r eg2
t i z,a t i o n 1 D $ FFT . f f t x )
i n v F t ( D i s c r e t i z a t i o n 1 D x ) = ( D i s c r e t i z a t i o n 1 D $ FFT . i f f t x )
f 2 f 1 f 0 e2 e1 e0 ! g1 ,
f 2 from
f 1 the
f 0 pure
e2 e1
e0 ! g0
where ↵t and i↵t come
↵t package.
Twowhere
parameters of that type class
instance
(
Times
f 0 e0and
p00h
p00
l , Times
f 1 e0 p10h type
p 1 0 classes
l,
represent
a discretized
function
its FT.
Using
the multi-parameter
Times dependency,
f 2 e0 D0 it
p 2is0 lpossible
, Times
f 0 e1
1 l ,FT of a
together with functional
to infer
thep01h
type ofp 0the
Times
f 1 e1the
D0type
p 1 1ofl ,theTimes
f 2 e1
D0 D0itself
,
discretized function
if it knows
discretized
function
and vice
Times
f 0 e2 there
D0 p are
0 2 l some
, Times
f 1 e2constraints
D0 D0 , on the two
versa. In the instance
definition,
important
f 2 e2 D0 D0 ,
parameters of the Times
FT class which are related to FT properties. First, the number of
Add3 p00h p 1 0 l p 0 1 l c1h c 1 l ,
samples in those two parameters should be the same, which is encoded by clarifying
Add5 p 2 0 l p01h p 1 1 l p 0 2 l c1h D0 c 2 l )
the same numSamp
for those discretized parameters. Second, the discretized functions
) MultD3 f 2 f 1 f 0 e2 e1 e0 c 2 l c 1 l p 0 0 l where
15
(class parameters) are related to two dual frames, and this duality must be asserted
byWe
theimplemented
user. The most arithmetic
subtle constraint
is encoded
two instances
of thein
MultNZ
for threeandvia
ten-digit
numbers
this way and define
class.
This class
constraint
that size-specific
the product of multiply
the first two
parameters
a MultD
suchspecifies
that each
would
be anshould
instance of this clas
be equal to the third parameter. To allow for type inferencing, we must also
PP12 assert
- Anand
In addition, we defined an special case:
The data type T is not exported, so a user can never pass it to any
type functions and the constraint can never be satisfied. This is neces
closed type families must have at least one clause.
The type Assert is a useful synonym: if the condition is true, Asser
Otherwise it is equal to err , a constraint which is never satisfied.
Readable Errors
type A s s e r t b o o l e r r = I f
b o o l Always e r r
Now we redefine our original example:
• type inference fails on modelling errors
• ghc loves to throw up thousand-line errors
• we tamed it
type A s s e r t D u a l U n i t s a b = A s s e r t ( a : ⇤ : b ⌘ ? U n i t l e s s ) ( D u a l U
c l a s s ( Frame a , Frame b , A s s e r t D u a l U n i t s ( B ase Un i t a ) ( Ba seU n it
) ) AssertDualFrames a b | a ! b , b ! a
instance A s s e r t D u a l F r a m e s (F ”BadFrame1” ) (F ”BadFrame0” ) where
which now produces the error:
No instance f o r ( D u a l U n i t s
( S I U n i t ( ’M 1 ) ( ’ S 0 ) ( ’ Kg 0 ) ( ’A 0 ) ( ’ Mol 0 ) ( ’K 0 ) )
( S I U n i t ( ’M 0 ) ( ’ S 0 ) ( ’ Kg 0 ) ( ’A 0 ) ( ’ Mol 0 ) ( ’K 0 ) ) )
16
The type error now indicates what types have caused the error. It
user that conceptually they have declared dual frames which physical
be dual, since their units are not dual.
PP12 - Anand
Multi-Core =
ILP Reinvented
Instruction Level
Parallelism
CPU
Execution Unit
Load/Store Instruction
Arithmetic Instruction
Register
Multi-Core
Parallelism
Chip
Core
DMA
Computational
Kernel
Buffer / Signal
PP12 - Anand
The Catch: Soundness
• on CPUs hardware maintains OOE
• instructions execute out of order
• hardware hides this from software
• ensures order independence
• in our Multi-Core virtual CPU
• compiler inserts synchronization
• soundness up to software
• uses asynchronous communication
18
PP12 - Anand
Asynchronous
• no locks
• locking is a multi-way operation
• a lock is only local to one core
• incurs long, unpredictable delays
• use asynchronous messages
• matches efficient hardware
19
PP12 - Anand
Reorder
Window
No writes to
buffer until
DMA
completion
is confirmed
.
.
.
other
operations
.
.
.
x
1
.
.
.
other
operations
.
.
.
No reads or
writes to buffer
until past
barrier WaitData
1
Reorder
Window
WaitSignal
SendData
Hazard
Async Signals
SendSignal
WaitDMA
WaitData
PP12 - Anand
Multi-Core Language
AtomicVerifiableOperations
AVOps
do a computation with local
Computation operation bufferList
data
start DMA to send local data
SendData localBuffer remoteBuffer tags
off core
wait for arrival of DMAed
WaitData localBuffer tag
data
wait for locally controlled
WaitDMA tag
DMA to complete
SendSignal core signal
send a signal to distant core
WaitSignal signal
wait for signal to arrive
Loop n π body
body; π(body); π(π(body))…
PP12 - Anand
locally Sequential
Program
Nested Code Graphs for Multi-Core Parallelism
index
1
2
3
4
5
6
•
core 1
SendSignal s
c2
core 2
long computation
WaitSignal s
computation
WaitSignal s
23
core 3
SendSignal s
c2
for instructions
emember thattotal
each order
core executes
independently of the other cores, except
here explicit wait
instructions
block
easier
to think
inexecution
order until some kind of commucation (signal, change in data tag, DMA) is confirmed to have completed.
send
wait(s)
herefore, in this
caseprecedes
the most likely
instruction completion order has core 3
xecuting
the SendSignal as soon as it is queued, allowing the PP12
signal
to be sent
- Anand
22
•
•
Remember that each core executes independently of the other cores, except
where explicit wait instructions block execution until some kind of communication (signal, change in data tag, DMA) is confirmed to have completed.
Therefore, in this case the most likely instruction completion order has core 3
executing the SendSignal as soon as it is queued, allowing the signal to be sent
before core 2 has received the core 1’s signal and cleared the signal hardware:
NOT sequential
index
2
5
1
3
4
6
core 1
SendSignal s
c2
core 2
core 3
SendSignal s
second signal overlaps the first, only one registered
long computation
WaitSignal s
computation
no signal is sent, so the next WaitSignal blocks
WaitSignal s
c2
To be precise, completion of the SendSignal means that the signal has been
can
execute
out of
order
initiated by the
sender,
and reception
may
be delayed, so the signal from core
3 could even arrive before the signal from core 1. In either case, neither signal
will arrive after the first WaitSignal, so the second WaitSignal will wait forever,
PP12 - Anand
23and this program execution will not terminate.
•
initiated by the sender, and reception may be delayed, so the signal from core
3 could even arrive before the signal from core 1. In either case, neither signal
will arrive after the first WaitSignal, so the second WaitSignal will wait forever,
and this program execution will not terminate.
The problem is caused because there are no signals or data transmissions
enforcing completion of instruction 5 to follow completion of instruction 3.
This example, when considered as part of a longer program, also demonstrates a possible safety violation with the valid completion order:
does NOT imply
order independent
index
1
5
3
core 1
core 2
long computation
WaitSignal s
computation
using
wrong assumptions
4
2
6
24
SendSignal s
c2
core 3
SendSignal s
WaitSignal s
PP12 - Anand
c2
Linear-Time Verification
• must show
• results are independent of execution order
• no deadlocks
• need to keep track of all possible states
• linear in time = one-pass verifier
• constant space
• = max possible states at each instruction
25
PP12 - Anand
Proof State
• state of buffers (valid, waiting for DMA, ...)
• active signals
• Φollows map Φ
©n(c1,c2)
n
c1
c2
c
last instruction on core 1 known to
• records
complete before the last instruction on core 2
completing before instruction n
26
PP12 - Anand
Algorithm
• maintain the state one instruction at a time
• flag indeterminate states as errors
Proof
that any indeterminacy and/or deadlock
• show
would have been flagged
27
PP12 - Anand
Impact
• no parallel debugging !!
optimization trick used for ILP can be
• every
adapted
28
PP12 - Anand
Loops (Simple Repetition)
• locally sequential after 2 iterations
locally sequential
• no hazards after 2 iterations
no hazards
• proof: Φ... are periodic after 1st iteration
29
PP12 - Anand
Loops (With Rewriting)
• rewriting = permutation per iteration
• if rewritten Φ... repeats a state
& locally sequential
locally sequential
• if rewritten Φ... repeats a state,
& hazard free
hazard free
• proof: Φ... deja vu (all over again)
30
PP12 - Anand
But…
• assumes you have signals
• what about shared memory?
• still lock-free synchronization?
• let’s try it on x86
31
PP12 - Anand
Single-Reader, SingleWriter AVOp Ring Buffers
(SRSWARB)
%&' ( !" #"$
CDPW14 - Anand
Limit Hazards
• AVOps can only conflict if they can fit in
the Ring Buffers at the same time
• on each core, AVOps are sequential,
therefore safe
• reads on different cores are safe
• while write AVOp is on a core, check
that no other core reads or writes
33
PP12 - Anand
Nest Step
• signals across nodes
• SRSWARB system for multicore
• new system for GPU
34
PP12 - Anand
i) Peer
Reviewed
after
low-frequency corrections, k-space data is usually conjugate-symmetric. The innovation
a) Books
is in novel segmentations of k-space.
Details ...
Christopher
Anand, Paul Baird, Eric Loubeau, John Wood, editors, Harmonic morphisms of
ii)[1]
Not
Peer Reviewed
metric graphs, in Harmonic morphisms, harmonic maps and related topics, Pitman (2000).
e) Proceedings articles
b) Contributions to Books
[52] Christopher Anand, Jacques Carette, Alexandre Korobkine, Target Detection using Maple Code
[2] Generation,
Michal Dobrogost,
Christopher
Anand,
Wolfram
Kahl, Verified
Multicore
Parallelism
using
Proceedings
of the Maple
Summer
Workshop
July 11–14,
Waterloo,
Ontario (2004)
Atomic
13
pages.Verifiable Operations, accepted for publication in Multicore Technology: Architecture, Reconfiguration and Modeling, Muhammad Yasir Qadri and Stephen J. Sangwine (eds), CRC Press.,
2013, 107–151.
vi) Unpublished Documents
[3] Christopher Kumar Anand, Wolfram Kahl, Synthesising and Verifying Multi-Core Parallelism
a) Technical
Reports
(18 publications)
in Categories
of Nested
Code Graphs; “Process Algebra for Parallel and Distributed Processing
(AlgebraicSharma,
Languages
in Specification-Based
Software
Development)”,
eds. Michael
Alexander
[53] Anuroop
Christopher
Kumar Anand,
A Domain-Specific
Architecture
for Elementary
and William
Gardner,CAS-14-06-CA.
Chapman and Hall/CRC, 2008, 3–45.
Function
Evaluation,
[4] Jessica
Christopher
Anand,and
Harmonic
morphisms
metricSymbolic
graphs, in
Anand, Baird,
Loubeau,
Wood
[54]
L M Pavlin
Christopher
Kumar of
Anand,
Generation
of Parallel
Solvers
for
(eds.), Harmonic
morphisms,
harmonic maps and related topics, (2000) 109–112.
Inverse
Imaging Problems,
CAS-14-05-CA.
[55]
Maryam
Moghadas,
Yuriyy Toporovskyy, Christopher Kumar Anand, Type-Safety for Inverse
c) Journal
Articles
(23 publications)
Imaging Problems, CAS-14-04-CA.
[5] Kevin Browne and Christopher Anand and Elizabeth Gosse, Gamification and serious game
[56] Christopher
Kumar
Anuroop
Sharma:
Unified Computing,
Tables for Exponential
and Logaapproaches for
adultAnand
literacyand
tablet
software,
Entertainment
5 (2014) 135–146.
rithm
Families, AdvOL2009/02. (Same as [11].)
dx.doi.org/10.1016/j.entcom.2014.04.003
35
[57]
Kumar
Anand,
Wolfram
Kahl, “Synthesising
and
VerifyingNie
Multi-Core
Parallelism
[6] Christopher K
Anand,
Alex D
Bain, Andrew
Thomas Curtis,
Zhenghua
Designing
Optimal
in
Categories
of Nested
Graphs”, Large-Scale,
SQRL ReportNonlinear
50, 2008. Earlier
version J.
ofMagn.
chapterReson.
[3].
Universal
Pulses
Using Code
Second-Order,
Optimization,
219 (2012) 61–74. http://dx.doi.org/10.1016/j.jmr.2012.04.004
[58] Jacques Carette, Spencer Smith, John McCutchan, Christopher Anand, Alexandre Korobkine,
Manipulation
as Part of
a Better
Process for
Scientific
Computing
Code,
[7] Model
Kevin Browne,
Christopher
Anand,
AnDevelopment
empirical evaluation
of user
interfaces
for a mobile
PP12 - Anand
It Works !
• used to generate, from the objective
function, a multi-core (shared memory)
image reconstruction software for
parallel Magnetic Resonance Imaging
for AllTech Medical Systems America
36
PP12 - Anand
Thanks
Stephen Adams
Curtis d’Alves
Kevin Browne
Shiqi Cao
Nathan Cumpson
Saeed Jahed
Damith Karunaratne
Clayton Goes
Gabriel Grant
William Hua
Fletcher Johnson
Wei Li
Nick Mansfield
Maryam Moghadas
Mehrdad Mozafari
Adam Schulz
Anuroop Sharma
Sanvesh Srivastava
Wolfgang Thaller
Gordon Uszkay
Christopher Venantius
Paul Vrbik
Fei Zhao
Robert Enenkel
IBM Centre for Advanced Studies, CFI, OIT, MITACS,
NSERC, OCA Inc. and Apple Canada for research support.
∞
37
PP12 - Anand
Download