An Update on the s Coconut Project Christopher Kumar Anand Michal Dobrogost Maryam Moghadas Jessica Pavlin Yuriy Toporovskyy Wolfram Kahl COCONUT COde CONstructing User Tool • DSLs embedded in Haskell ! strong types (detect errors in model and code) y a • d To • high level •! start with objective function y differentiate symbolically a • d o T ay •! simplify expressions Tod • recognize recurrence relations • generate code • verification •! SIMD & distributed parallelization (private and shared-memory) linear-time verification y a • d o T 2 PP12 - Anand a node in a hash each entry is either a basic with respect to the real variable Xtable, usingwhere a simple recursion through di↵erential, or constant) or an operation combining mult as The hash table is implemented using a Haskell IntMap, Models wrapped in classes which hide the implementation from th suc rea (mp ["X"]) (norm2 (ft (x +: y))) can even work interactively to build up an expression, FT((d(X[16][16][16])+:0.0[16,16,16])))).(Re(FT((X[16][16][16]+:Y[16][16][16]))))) FT((d(X[16][16][16])+:0.0[16,16,16])))).(Re(FT((X[16][16][16]+:Y[16][16][16])))))) transform, ft(x + iy) of a complex vector composed of (FT((d(X[16][16][16])+:0.0[16,16,16])))).(Im(FT((X[16][16][16]+:Y[16][16][16]))))) FT((d(X[16][16][16])+:0.0[16,16,16])))).(Im(FT((X[16][16][16]+:Y[16][16][16]))))))) define variables > let x = var3d (16,16,16) "x" > let y = var3d (16,16,16) "y" > ft (x +: y) representation is deceptively long as it hides (FT((x(16,16,16)+:y(16,16,16)))) his flat the redundan ed by CEL internally. Di↵erentiating with respect to both real variab define anThis objective minimize DSL hasfunction classes fortoreal and complex vector spac of these corresponding to regular hexahedral (cubic) discre d real ft(x + iy) 2 .X usingspaces. tive with respecttwo-, to the variable a simple recursion threeand four-dimensional For example, th 3 , which we approximate usin tissue velocity is a field in R DAG asthe above length, but not with sharing: d double and take it’s derivative var3d with specified resolution. Another example is the conservation-of-mass regularizer is >> diff (mp ["X"]) (norm2 (ft (x +: y))) ((((Re(FT((d(X[16][16][16])+:0.0[16,16,16])))).(Re(FT((X[16][16][16]+:Y[16][16][16]))))) 3 t< sum 3 dx ; + m a s s C o n s e r v a i o n ( ThreeD vx , ThreeD vy , ThreeD v + ((Re(FT((d(X[16][16][16])+:0.0[16,16,16])))).(Re(FT((X[16][16][16]+:Y[16][16][16])))))) / f t +i dot + +< ; ; + (((Im(FT((d(X[16][16][16])+:0.0[16,16,16])))).(Im(FT((X[16][16][16]+:Y[16][16][16]))))) = norm2 (dyconv3Zip1ZM com + ((Im(FT((d(X[16][16][16])+:0.0[16,16,16])))).(Im(FT((X[16][16][16]+:Y[16][16][16]))))))) sum where# 3x # 3= comdot (vX , vY , vZ )3 f=t (vX/ +i [1 ,0 ,0] vX[ 1 , 0 , 0 ] ) # += +y J sum + (vY [ 0 , 1 , 0 ] PP12 - Anand vY[ 0 , 1 , 0 ] ) 3 transform, ft(x + iy) of a complex vector composed of real vectors x and y: more complicated models > let x = var3d (16,16,16) "x" > let y = var3d (16,16,16) "y" > ft (x +: y) (FT((x(16,16,16)+:y(16,16,16)))) This DSL has classes for real and complex vector spaces, and instances of these corresponding to regular hexahedral (cubic) discretizations of one-, two-, threeandconservation four-dimensional spaces. For example, the x component of e.g., of mass in material tissue velocity is a field in R3 , which we approximate using a discretization transport models Another is a function var3d with specified resolution. example is the problem-specific conservation-of-mass regularizer is • m a s s C o n s e r v a t i o n ( ThreeD vx , ThreeD vy , ThreeD vz ) = norm2 ( conv3Zip1ZM com vx vy vz ) where com (vX , vY , vZ ) = (vX [ 1 , 0 , 0 ] vX[ 1 , 0 , 0 ] ) + (vY [ 0 , 1 , 0 ] vY[ 0 , 1 , 0 ] ) + ( vZ [ 0 , 0 , 1 ] vZ [ 0 , 0 , 1 ] ) 4 where conv3ZipZM constructs a stencil computation with assumed zero margins for out-of-bounds references, norm2 is the norm squared, and we use familiar array subscript notation for relative indexing of neighbouring values for the three velocity components. PP12 - Anand what was that d(X), and where’s ∇f? • d(X) is a vector of differential forms • comes from implicit derivative • we can extract ∇f by simplifying 5 PP12 - Anand product is only defined for real values, some linear operations do not have simple adjoints and must be replaced by sums of real and imaginary parts, etc. In the following example, we look at the pop/push operations after the other simplifications have finished, and use the operator (+i0) to mean the embedding of a real value into a complex value by adding a zero imaginary part. This is a simplification of the actual procedure, which interleaves the application of di↵erent rules, and also recombines the real and imaginary parts after the pop/push operations to avoid multiple applications of the Fourier Transform. For the real/x part of the term graph (5), the pop/push proceeds as simplify rules • commute d(x) to LHS of dot • move linear operators to RHS via adjoint Left " < ft (+i0) dx Right # < ft (+i0) x · Left " (+i0) dx · 6 Right # ft 1 (+i0) < ft (+i0) x 7! Left " ft (+i0) dx Right # (+i0) < ft (+i0) x · Left " dx 7! · Right # < ft 1 (+i0) < ft (+i0) x 7! PP12 - Anand simplification DSL up 4500 lines of literate Haskell, kept manageable (and readable) by the de velopment of a second DSL for term graph rewriting, with two components pattern matching and construction. We do not have room to adequately 00 lines of literate Haskell, kept manageable (and reada describe this language, but we can convey its flavour with two examples: ment of a second DSL for term graph rewriting, with tw | Just ( x ) < (M. in vF t ‘ o ‘ M. f t ) e x p r s node = x exprs n | matching construction. not have Just ( x , ) and < (M. xRe ‘ o ‘ M. reIm )We e x pdo r s node = x e xroom prs be this language, but we can convey its flavour with tw 1 which perform rewrites ft ft (x) 7! x and <(x + iy) 7! x. The matchMatch inverse FT composed with FT of x ng module is imported qualified (i.e., M.* ) with each function matching ust ( x ) < (M. in vF t ‘ o ‘ M. f t ) e x p r s node an operation replace and returning constructor(s) for source nodes (i.e., x). Pat with x ust ( x , ) < (M. xRe ‘ o ‘ M. reIm ) e x p r s node • • ern matchers can be chained together using composition syntax (i.e., ‘o‘) and alternative matches are chained 1 together using Haskell pattern guards perform rewrites ft ft (x) ! 7 x and <(x + iy) ! 7 i.e., |). Composition is designed to have syntax very close to that of the odule isDSL, imported qualified (i.e., M.* ) withCurrently, each fun modelling although the semantics are subtly di↵erent. com plicated and conditional term rewriting must access the underlying Haskel eration and returning constructor(s) for source nodes data structures, but we are working on improving the expressivity of the matchers can be chained together using composition syn PP12 - Anand 7 matching calculus. Type Checking (|| work) • with stronger typing, compilers find more mistakes • in C, all dynamic arrays (float*) look alike • others check size and dimension • we can do better 8 PP12 - Anand Arrays of Samples 2.5 2 1.5 1 0.5 0 9 0 2 4 6 8 10 12 PP12 - Anand Arrays of Samples • number of samples • dimension • frame of reference • resolution (including units) • units of measurement 10 PP12 - Anand Formalization Haskell types define • inphysical units Figure 3. Sampling of height of water in • •r array Disc e t i z asize t i oand n 1 Ddimension [1 ,1.2 ,1.3 ,1.12 ,1.23 ,1.12 ,1.15 ,1. • frame of reference canalSample2 :: D i s c r e t i z a t i o n 1 D (F ” CanalFrame ” ) (NAT 1 2 ) [ f l o a t | 0.02 | ] Meter [ Double ] canalSample2 = Discretization1D 11 [1 ,1.21 ,1.2 ,1.42 ,1.3 ,1.32 ,1.12 ,1. Trying to add those two samplings to each other PP12 - Anand 2.5 Discretization1D 2 canalSample2 1.5 1 :: D i s c r e t i z a t i o n 1 D (F ” CanalFrame ” ) (NAT 1 2 ) [ f l o a t | 0.02 | ] Meter [ Double ] canalSample2 = Discretization1D 0.5 [1 ,1.2 ,1.3 ,1.12 ,1.23 ,1.12 ,1.15 ,1.25 ,1.18 ,1.20 ,1 • array sizes match, so try to add them … [1 ,1.21 ,1.2 ,1.42 ,1.3 ,1.32 ,1.12 ,1.25 ,1.23 ,1.20 ,1 Trying to add those two samplings to each other > c a n a l S a m p l e 1 U.+ c a n a l S a m p l e 2 0 0 2 4 will time error: 6 cause 8 a compile 10 12 No instance f o r ( Add ( Discretization1D (F ” CanalFrame ” ) (NAT 1 2 ) (FLOAT ’ Pos 1 ( ’ E ( S I U n i t ( ’M 1 ) ( ’ S [ Double ] ) ( Discretization1D (F ” CanalFrame ” ) (NAT 1 2 ) (FLOAT ’ Pos 2 ( ’ E ( S I U n i t ( ’M 1 ) ( ’ S [ Double ] ) 12 2) ) 0 ) ( ’ Kg 0 ) ( ’A 0 ) ( ’ Mol 0 ) ( ’K 0 ) ) 2) ) 0 ) ( ’ Kg 0 ) ( ’A 0 ) ( ’ Mol 0 ) ( ’K 0 ) ) The error message indicates that the discretizations have di↵erent types a - Anand them is not a valid operation. Ideally, we would PP12 prefer the error messa Type Inference Across Linear Operations • Fourier Transforms are very picky • Nyquist Sampling Theorem • a theorem about how you can • frame of reference • resolution (including units) • units of measurement 13 PP12 - Anand – [e1 ⇤ f2 ]h [e1 ⇤ f2 ]l – [ Double ] ) – – [e2 ⇤ f0 ]h [e2 ⇤ f0 ]l The error message indicates the – [e2 ⇤ f1 that ]h [e f1 ] l – have 2 ⇤discretizations them is not [e2 a⇤ fvalid [e2 ⇤ f2 ]l Ideally, – we would – prefer 2 ]h operation. the sample spacing needs to be identical but it gwould no 2 expert to understand this error. For the first digit of result ‘g0 ’, there is just one term As we mentioned, another powerful feature in our ty low Properties digit of ‘e0 ⇥ f0 ’,Classes and this can be implemented using are which is defined by: digit of result ‘g1 ’ is obtained by adding the high digit c l a s s FT a b | a ! b , b ! a where ‘e0 ⇥ f1 ’, and the low digit of ‘e1 ⇥ f0 ’ which can be im ft :: a ! b class. third i n v F tThe :: b ! adigit is the result of adding five di↵e implemented using Add5 class. Continuing in this way, To implement an instance, we assert the preconditions c l a s s MultD3 f2 f1 f0 e2 e1 e0 g2 g1 g0 | constructor: f 2 f 1 f 0 e2 e1 e0 ! g2 , 14, f 2 f 1 f 0 e2 e1 e0 ! g1 Classy Proofs • instance ( 14 f 2 f 1 f 0 e2 e1 e0 ! g0 where Times f 0 e0 p00h p 0 0 l , Times f 1 e0 p10 Times f 2 e0 D0 p 2 0 l , Times f 0 e1 p01h Times f 1 e1 D0 p 1 1 l , Times f 2 e1 D0 D0 PP12 - Anand Times f 0 e2 D0 p 0 2 l , Times f 1 e2 D0 D0 0 1 h 0 1 l – – [e0 ⇤ f2 ]h [e0 ⇤ f2 ]l – – – – [e1 ⇤ f0 ]h [e1 ⇤ f0 ]l – – – [e1 ⇤ f1 ]h [e1 ⇤ f1 ]l – – – [e1 ⇤ f2 ]h [e1 ⇤ f2 ]l – – – – – [e2 ⇤ f0 ]h [e2 ⇤ f0 ]l – – – [e2 ⇤ fProblems [e2 ⇤ f1Moghadas, ]l –Toporovskyy–& Anand – 1 ]h Type-Safety for Inverse Imaging [e2 ⇤ f2 ]h [e2 ⇤ f2 ]l – – – – instance ( g g1 g0 A s s e r t D u a l F r a m e s f r a m e 1 frame2 , Frame frame1 , Frame2 frame2 , l o a tfirst s t e p digit S i z e 2 , of IsF l o a t s t‘g e p S’,i z there e 1 , ToFloat ⇠ numSampF ForI s Fthe result is justnumSamp one term to be, counted, which is th 0 anddigit encode criterion: low ofthe ‘e0Nyquist ⇥ f0 ’, and this can be implemented using the Times class. The secon MultNZ s t e p S i z‘g e 1 ’numSampF t0 , digit of result is obtained by adding the high digit of ‘e0 ⇥ f0 ’, the low digit o 1 MultNZ t 0 s t e p S i z e 2 t1 , 1 ⇠ 1 (E ‘e0t ⇥ f1FLOAT ’, andPosthe low0 ) digit of ‘e1 ⇥ f0 ’ which can be implemented using the ’Add3 class. ) ) The third digit is the result of adding five di↵erent known terms which FT ( D i s c r e t i z using a t i o n 1 DAdd5 f r a m class. e 1 numSamp s t e p S i z e 1 in rangeU Complex implemented Continuing this [way, weDouble arrive] ) at the definition Proofs are Instances ( D i s c r e t i z a t i o n 1 D f r a m e 2 numSamp s t e p S i z e 2 rangeU [ Complex Double ] ) c l a s s where MultD3 f2 f1 f0 e2 e1 e0 g2 g1 g0 | f t ( D i sfc2r e tfi1 z a tfi 0 o n 1e2 D xe1 ) = e0 ( D i! s c r eg2 t i z,a t i o n 1 D $ FFT . f f t x ) i n v F t ( D i s c r e t i z a t i o n 1 D x ) = ( D i s c r e t i z a t i o n 1 D $ FFT . i f f t x ) f 2 f 1 f 0 e2 e1 e0 ! g1 , f 2 from f 1 the f 0 pure e2 e1 e0 ! g0 where ↵t and i↵t come ↵t package. Twowhere parameters of that type class instance ( Times f 0 e0and p00h p00 l , Times f 1 e0 p10h type p 1 0 classes l, represent a discretized function its FT. Using the multi-parameter Times dependency, f 2 e0 D0 it p 2is0 lpossible , Times f 0 e1 1 l ,FT of a together with functional to infer thep01h type ofp 0the Times f 1 e1the D0type p 1 1ofl ,theTimes f 2 e1 D0 D0itself , discretized function if it knows discretized function and vice Times f 0 e2 there D0 p are 0 2 l some , Times f 1 e2constraints D0 D0 , on the two versa. In the instance definition, important f 2 e2 D0 D0 , parameters of the Times FT class which are related to FT properties. First, the number of Add3 p00h p 1 0 l p 0 1 l c1h c 1 l , samples in those two parameters should be the same, which is encoded by clarifying Add5 p 2 0 l p01h p 1 1 l p 0 2 l c1h D0 c 2 l ) the same numSamp for those discretized parameters. Second, the discretized functions ) MultD3 f 2 f 1 f 0 e2 e1 e0 c 2 l c 1 l p 0 0 l where 15 (class parameters) are related to two dual frames, and this duality must be asserted byWe theimplemented user. The most arithmetic subtle constraint is encoded two instances of thein MultNZ for threeandvia ten-digit numbers this way and define class. This class constraint that size-specific the product of multiply the first two parameters a MultD suchspecifies that each would be anshould instance of this clas be equal to the third parameter. To allow for type inferencing, we must also PP12 assert - Anand In addition, we defined an special case: The data type T is not exported, so a user can never pass it to any type functions and the constraint can never be satisfied. This is neces closed type families must have at least one clause. The type Assert is a useful synonym: if the condition is true, Asser Otherwise it is equal to err , a constraint which is never satisfied. Readable Errors type A s s e r t b o o l e r r = I f b o o l Always e r r Now we redefine our original example: • type inference fails on modelling errors • ghc loves to throw up thousand-line errors • we tamed it type A s s e r t D u a l U n i t s a b = A s s e r t ( a : ⇤ : b ⌘ ? U n i t l e s s ) ( D u a l U c l a s s ( Frame a , Frame b , A s s e r t D u a l U n i t s ( B ase Un i t a ) ( Ba seU n it ) ) AssertDualFrames a b | a ! b , b ! a instance A s s e r t D u a l F r a m e s (F ”BadFrame1” ) (F ”BadFrame0” ) where which now produces the error: No instance f o r ( D u a l U n i t s ( S I U n i t ( ’M 1 ) ( ’ S 0 ) ( ’ Kg 0 ) ( ’A 0 ) ( ’ Mol 0 ) ( ’K 0 ) ) ( S I U n i t ( ’M 0 ) ( ’ S 0 ) ( ’ Kg 0 ) ( ’A 0 ) ( ’ Mol 0 ) ( ’K 0 ) ) ) 16 The type error now indicates what types have caused the error. It user that conceptually they have declared dual frames which physical be dual, since their units are not dual. PP12 - Anand Multi-Core = ILP Reinvented Instruction Level Parallelism CPU Execution Unit Load/Store Instruction Arithmetic Instruction Register Multi-Core Parallelism Chip Core DMA Computational Kernel Buffer / Signal PP12 - Anand The Catch: Soundness • on CPUs hardware maintains OOE • instructions execute out of order • hardware hides this from software • ensures order independence • in our Multi-Core virtual CPU • compiler inserts synchronization • soundness up to software • uses asynchronous communication 18 PP12 - Anand Asynchronous • no locks • locking is a multi-way operation • a lock is only local to one core • incurs long, unpredictable delays • use asynchronous messages • matches efficient hardware 19 PP12 - Anand Reorder Window No writes to buffer until DMA completion is confirmed . . . other operations . . . x 1 . . . other operations . . . No reads or writes to buffer until past barrier WaitData 1 Reorder Window WaitSignal SendData Hazard Async Signals SendSignal WaitDMA WaitData PP12 - Anand Multi-Core Language AtomicVerifiableOperations AVOps do a computation with local Computation operation bufferList data start DMA to send local data SendData localBuffer remoteBuffer tags off core wait for arrival of DMAed WaitData localBuffer tag data wait for locally controlled WaitDMA tag DMA to complete SendSignal core signal send a signal to distant core WaitSignal signal wait for signal to arrive Loop n π body body; π(body); π(π(body))… PP12 - Anand locally Sequential Program Nested Code Graphs for Multi-Core Parallelism index 1 2 3 4 5 6 • core 1 SendSignal s c2 core 2 long computation WaitSignal s computation WaitSignal s 23 core 3 SendSignal s c2 for instructions emember thattotal each order core executes independently of the other cores, except here explicit wait instructions block easier to think inexecution order until some kind of commucation (signal, change in data tag, DMA) is confirmed to have completed. send wait(s) herefore, in this caseprecedes the most likely instruction completion order has core 3 xecuting the SendSignal as soon as it is queued, allowing the PP12 signal to be sent - Anand 22 • • Remember that each core executes independently of the other cores, except where explicit wait instructions block execution until some kind of communication (signal, change in data tag, DMA) is confirmed to have completed. Therefore, in this case the most likely instruction completion order has core 3 executing the SendSignal as soon as it is queued, allowing the signal to be sent before core 2 has received the core 1’s signal and cleared the signal hardware: NOT sequential index 2 5 1 3 4 6 core 1 SendSignal s c2 core 2 core 3 SendSignal s second signal overlaps the first, only one registered long computation WaitSignal s computation no signal is sent, so the next WaitSignal blocks WaitSignal s c2 To be precise, completion of the SendSignal means that the signal has been can execute out of order initiated by the sender, and reception may be delayed, so the signal from core 3 could even arrive before the signal from core 1. In either case, neither signal will arrive after the first WaitSignal, so the second WaitSignal will wait forever, PP12 - Anand 23and this program execution will not terminate. • initiated by the sender, and reception may be delayed, so the signal from core 3 could even arrive before the signal from core 1. In either case, neither signal will arrive after the first WaitSignal, so the second WaitSignal will wait forever, and this program execution will not terminate. The problem is caused because there are no signals or data transmissions enforcing completion of instruction 5 to follow completion of instruction 3. This example, when considered as part of a longer program, also demonstrates a possible safety violation with the valid completion order: does NOT imply order independent index 1 5 3 core 1 core 2 long computation WaitSignal s computation using wrong assumptions 4 2 6 24 SendSignal s c2 core 3 SendSignal s WaitSignal s PP12 - Anand c2 Linear-Time Verification • must show • results are independent of execution order • no deadlocks • need to keep track of all possible states • linear in time = one-pass verifier • constant space • = max possible states at each instruction 25 PP12 - Anand Proof State • state of buffers (valid, waiting for DMA, ...) • active signals • Φollows map Φ ©n(c1,c2) n c1 c2 c last instruction on core 1 known to • records complete before the last instruction on core 2 completing before instruction n 26 PP12 - Anand Algorithm • maintain the state one instruction at a time • flag indeterminate states as errors Proof that any indeterminacy and/or deadlock • show would have been flagged 27 PP12 - Anand Impact • no parallel debugging !! optimization trick used for ILP can be • every adapted 28 PP12 - Anand Loops (Simple Repetition) • locally sequential after 2 iterations locally sequential • no hazards after 2 iterations no hazards • proof: Φ... are periodic after 1st iteration 29 PP12 - Anand Loops (With Rewriting) • rewriting = permutation per iteration • if rewritten Φ... repeats a state & locally sequential locally sequential • if rewritten Φ... repeats a state, & hazard free hazard free • proof: Φ... deja vu (all over again) 30 PP12 - Anand But… • assumes you have signals • what about shared memory? • still lock-free synchronization? • let’s try it on x86 31 PP12 - Anand Single-Reader, SingleWriter AVOp Ring Buffers (SRSWARB) %&' ( !" #"$ CDPW14 - Anand Limit Hazards • AVOps can only conflict if they can fit in the Ring Buffers at the same time • on each core, AVOps are sequential, therefore safe • reads on different cores are safe • while write AVOp is on a core, check that no other core reads or writes 33 PP12 - Anand Nest Step • signals across nodes • SRSWARB system for multicore • new system for GPU 34 PP12 - Anand i) Peer Reviewed after low-frequency corrections, k-space data is usually conjugate-symmetric. The innovation a) Books is in novel segmentations of k-space. Details ... Christopher Anand, Paul Baird, Eric Loubeau, John Wood, editors, Harmonic morphisms of ii)[1] Not Peer Reviewed metric graphs, in Harmonic morphisms, harmonic maps and related topics, Pitman (2000). e) Proceedings articles b) Contributions to Books [52] Christopher Anand, Jacques Carette, Alexandre Korobkine, Target Detection using Maple Code [2] Generation, Michal Dobrogost, Christopher Anand, Wolfram Kahl, Verified Multicore Parallelism using Proceedings of the Maple Summer Workshop July 11–14, Waterloo, Ontario (2004) Atomic 13 pages.Verifiable Operations, accepted for publication in Multicore Technology: Architecture, Reconfiguration and Modeling, Muhammad Yasir Qadri and Stephen J. Sangwine (eds), CRC Press., 2013, 107–151. vi) Unpublished Documents [3] Christopher Kumar Anand, Wolfram Kahl, Synthesising and Verifying Multi-Core Parallelism a) Technical Reports (18 publications) in Categories of Nested Code Graphs; “Process Algebra for Parallel and Distributed Processing (AlgebraicSharma, Languages in Specification-Based Software Development)”, eds. Michael Alexander [53] Anuroop Christopher Kumar Anand, A Domain-Specific Architecture for Elementary and William Gardner,CAS-14-06-CA. Chapman and Hall/CRC, 2008, 3–45. Function Evaluation, [4] Jessica Christopher Anand,and Harmonic morphisms metricSymbolic graphs, in Anand, Baird, Loubeau, Wood [54] L M Pavlin Christopher Kumar of Anand, Generation of Parallel Solvers for (eds.), Harmonic morphisms, harmonic maps and related topics, (2000) 109–112. Inverse Imaging Problems, CAS-14-05-CA. [55] Maryam Moghadas, Yuriyy Toporovskyy, Christopher Kumar Anand, Type-Safety for Inverse c) Journal Articles (23 publications) Imaging Problems, CAS-14-04-CA. [5] Kevin Browne and Christopher Anand and Elizabeth Gosse, Gamification and serious game [56] Christopher Kumar Anuroop Sharma: Unified Computing, Tables for Exponential and Logaapproaches for adultAnand literacyand tablet software, Entertainment 5 (2014) 135–146. rithm Families, AdvOL2009/02. (Same as [11].) dx.doi.org/10.1016/j.entcom.2014.04.003 35 [57] Kumar Anand, Wolfram Kahl, “Synthesising and VerifyingNie Multi-Core Parallelism [6] Christopher K Anand, Alex D Bain, Andrew Thomas Curtis, Zhenghua Designing Optimal in Categories of Nested Graphs”, Large-Scale, SQRL ReportNonlinear 50, 2008. Earlier version J. ofMagn. chapterReson. [3]. Universal Pulses Using Code Second-Order, Optimization, 219 (2012) 61–74. http://dx.doi.org/10.1016/j.jmr.2012.04.004 [58] Jacques Carette, Spencer Smith, John McCutchan, Christopher Anand, Alexandre Korobkine, Manipulation as Part of a Better Process for Scientific Computing Code, [7] Model Kevin Browne, Christopher Anand, AnDevelopment empirical evaluation of user interfaces for a mobile PP12 - Anand It Works ! • used to generate, from the objective function, a multi-core (shared memory) image reconstruction software for parallel Magnetic Resonance Imaging for AllTech Medical Systems America 36 PP12 - Anand Thanks Stephen Adams Curtis d’Alves Kevin Browne Shiqi Cao Nathan Cumpson Saeed Jahed Damith Karunaratne Clayton Goes Gabriel Grant William Hua Fletcher Johnson Wei Li Nick Mansfield Maryam Moghadas Mehrdad Mozafari Adam Schulz Anuroop Sharma Sanvesh Srivastava Wolfgang Thaller Gordon Uszkay Christopher Venantius Paul Vrbik Fei Zhao Robert Enenkel IBM Centre for Advanced Studies, CFI, OIT, MITACS, NSERC, OCA Inc. and Apple Canada for research support. ∞ 37 PP12 - Anand