Pre-conference Session

David Callahan, Distinguished Engineer, Microsoft Corporation
Joe Duffy, Lead Software Engineer, Microsoft Corporation
Stephen Toub, Lead Program Manager, Microsoft Corporation

Overview and Architecture

The Shift to Manycore
• Parallel computing matters
Foundations
• Parallel computing concepts
Techniques
• Concerns, top-down

“That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000. I believe that such a large circuit can be built on a single wafer.”
-- Intel co-founder Gordon Moore in 1965

Quad-core Nehalem announced at IDF in 2007: 731 million transistors (more than 13 doublings later…)

[Chart, thanks to Jim Larus of Microsoft Research: Processor (SPECInt), Memory (MB), and Disk (MB) plotted against Moore's Law on a log scale (1.0 to 1000.0), labelled with Windows releases: Windows 3.1, NT 3.51, Windows 95, Windows 98, Windows 2000, Windows XP, Windows XP SP2, Vista Premium. Callouts: “52% CAGR in Spec Performance!”, “Attack of the Killer Micros!”, “Software is a Gas!”]

[Chart: Power Density (W/cm2), log scale from 1 to 10,000, for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, and Pentium® processors, ‘70 to ‘10. Reference levels: Hot Plate, Nuclear Reactor, Rocket Nozzle, Sun’s Surface.]
Dr. Pat Gelsinger, Sr.
VP, Intel Corporation and GM, Digital Enterprise Group, February 19, 2004, Intel Developer Forum, Spring 2004

The Memory Wall
The ILP Wall
Single-thread software performance will not be improving (much)

Intel Larrabee

Latent parallelism for future scaling
Focus on data – the scalable dimension
Tasks instead of threads
No silver bullet – many “right” approaches

Identify connected components and map every node to its containing component
All code in this talk is pseudo-code

    foreach node do
        node.component = null
    foreach node do
        if(node.component == null) then
            node.component = new Component;
            roots.add(node);
            dfsearch(node)
        fi

    function dfsearch(n)
        foreach m in adjacent(n) do
            if(m.component == null) do
                m.component = n.component
                dfsearch(m)
            fi

Candidates & connections form a reduced graph
Recursively find components on reduced graph
Update nodes to refer to final components

Concurrent processing: independent requests (most server applications)
Parallel processing: decompose one task to enable concurrent execution
“Arbitrate “ownership” of the nodes”
“Start concurrent searches …”

Simulating isolation of threads
Scheduling tasks
Multi-threading, Asynchronous, …
Fairness
Preemption
Responsiveness
Throughput
For parallelism, not a goal but a context
Existing architectural concern
Drive overheads down

    parallel foreach node do
        node.component = NULL

Classically data parallel: same operation applied to a homogeneous collection

    parallel foreach node do
        …start a parallel search …

Data-focused but built on an underlying “task” model for generality
• Emphasize recursive decomposition
• Preserves function interfaces
• “fork-join”
• Structured control constructs
• Parallel loops, co-begin

    function dfsearch(n)
        parallel foreach m in adjacent(n) do
            if(… first to visit m …)
                dfsearch(m)
            fi

Each iteration is a task
All tasks finish before function returns

• Emphasizes processors
• “fork-join” threads + barrier
• Structured control constructs
• “shared loops”
• Improving support for
recursion

    Parallel
        -- acquire workers
        shared foreach node do
            node.component = NULL
        -- implied barrier, workers wait
        shared foreach node do
            …start a parallel search …
        -- release workers

OpenMP is the common binding of this model

Resource Management is too hard
[Figure: data flow graph for subtasks of Strassen Multiplication (nodes m1 to m7 and c11, c12, c21, c22)]

Identify where searches “collide”
Arbitrate “ownership” of the nodes

    function dfsearch(n)
        foreach m in adjacent(n) do
            if(m.component == NULL) do
                m.component = n.component
                dfsearch(m)
            fi

    Search 1                        Search 2
    if(m.component == null)?        if(m.component == null)?
    m.component = n1.component      m.component = n2.component
    dfsearch(m)                     dfsearch(m)
    !

One action at a time for any specific node

    function dfsearch(n)
        foreach m in adjacent(n) do
            m.lock();
            var old = m.component;
            if(old == NULL) m.component = n.component
            m.unlock();
            if(old == NULL) then
                dfsearch(m)
            else if (old != n.component) then
                -- record the “edge” between searches
            endif

Locks provide exclusion, but the algorithm's correctness depends on careful reasoning that order does not matter

Common hardware primitive

    word compare_and_swap(word * loc, word oldv, word newv) {
        word current = *loc;
        if(current == oldv) *loc = newv;
        return current;
    }

    function dfsearch(n)
        foreach m in adjacent(n) do
            var old = compare_and_swap(&m.component, NULL, n.component)
            if(old == NULL) then

• Short duration
• Preemption friendly
• Limited scenarios

    function dfsearch(n,edges)
        foreach m in adjacent(n) do
            m.lock(); -- Arbitrate “ownership” of the nodes
            var old = m.component;
            if(old == NULL) m.component = n.component
            m.unlock();
            if(old == NULL) then
                dfsearch(m,edges)
            else if (old != n.component) then
                edges.insert(old, n.component)
            endif

Concurrency Safe
High-Bandwidth

    parallel foreach node do
        node.component = NULL
    parallel foreach node do
        node.lock()
        var old = node.component
        if(old == NULL) node.component = new Component
        node.unlock()
        if(old == NULL) then
            roots.add(node)
            dfsearch(node, edges)
        fi
    -- (roots, edges) form
    a derived problem

[Chart: Time and Efficiency versus Processors (1 to 14); series: Time 95%, Efficiency 95%, Efficiency 99%]
A program that is 95% (99%) parallel, with 3% overhead to parallelize

Contention
Load Balance
Cache Effects
Latencies
Preemption

Microsoft Visual Studio: Bringing out the Best in Multicore Systems
Parallel Programming for C++ Developers in the Next Version of Microsoft Visual Studio
The Concurrency and Coordination Runtime and Decentralized Software Services Toolkit
Research: Concurrency Analysis Platform and Tools for Finding Concurrency Bugs
Parallel Programming for Managed Developers with the Next Version of Microsoft Visual Studio
Concurrency Runtime Deep Dive: How to Harvest Multicore Computing Resources
Parallel Computing Application Architectures and Opportunities
Addressing the Hard Problems of Concurrency
Future of Parallel Computing (Panel)

Mechanisms for Asynchrony

For coarse-grained work and agents

    Thread t = new Thread(delegate {
        // concurrent work
    });
    t.Start();

For fine-grained work

    ThreadPool.QueueUserWorkItem(delegate {
        // concurrent work
    });

[Diagram: the program thread adds work items to the ThreadPool queue; Worker Thread 1 … Worker Thread p take items from it]

Advanced capabilities

Common Async API Pattern in the Framework

    int Foo(object o, string s);

    IAsyncResult BeginFoo(object o, string s, AsyncCallback callback, object state);
    int EndFoo(IAsyncResult result);

Efficient async I/O on Windows

    public static unsafe bool UnsafeQueueNativeOverlapped(NativeOverlapped* overlapped)

UI Marshaling

    // on background thread
    Control c = …;
    c.BeginInvoke((Action)delegate {
        // runs on UI thread
    });

    // on background thread
    Control c = …;
    c.Dispatcher.BeginInvoke((Action)delegate {
        // runs on UI thread
    });

Synchronization Context
BackgroundWorker
ExecutionContext

Lunch (12pm-1:15pm)

Topics in Synchronization

The Pitfalls of Shared Memory

    class C {
        static int s_f;
        int m_f;
    public:
        void f(int * py) {
            int x; x++; // local variable
            s_f++;      // static class member
            m_f++;      // class member
            (*py)++;    // pointer to something
        }
    };

Isolation, Immutability, and Synchronization

+: no overhead, easy to reason about
-: sharing is often needed, leading to message passing

+: no overhead, easy to reason about
-: C# and VB encourage mutability … [lineage]
-: copying means efficiency can be a challenge
+: see F# for promising advances!

+: flexible, programming techniques remain similar
-: perf overhead, deadlocks, races, …

R/W

    static int x = 0;
    void t1() {
        int y = x;
        …
        int z = x; // y != z
    }
    void t2() {
        x = 42;
    }

W/R

    static int x = 0;
    void t1() {
        try {
            x = 42;
            … … throw e; …
        } catch {
            // whoops;
            // rollback!
            x = 0;
            throw;
        }
    }
    void t2() {
        int y = x;
        f(y);
    }

W/W

    static int x = 0;
    void t1() {
        x = 42;
        int y = x;
    }
    void t2() {
        x = 99;
        int z = x;
    }

Ensuring A happens-before B

Example of a Serializability Problem

    T   Thread step             Value
    0   t2(0): MOV EAX,[a]      #0
    1   t0(0): MOV EAX,[a]      #0
    2   t0(1): INC EAX          #1
    3   t0(2): MOV [a],EAX      #1
    4   t1(0): MOV EAX,[a]      #1
    5   t1(1): INC EAX          #2
    6   t1(2): MOV [a],EAX      #2
    7   t2(1): INC EAX          #1
    8   t2(2): MOV [a],EAX      #1

    Aspect     | Sequential | Concurrent
    Behavior   | Deterministic | Nondeterministic
    Memory     | Stable | In flux (unless private, read-only, or protected by a lock)
    Locks      | Unnecessary | Essential
    Invariants | Must hold only on method entry/exit or calls to external code | Anytime the protecting lock is not held
    Deadlock   | Impossible | Possible, but can be mitigated
    Testing    | Code coverage finds most bugs | Code coverage insufficient; races, timing, and environments probabilistically change
    Debugging  | Trace execution leading to failure; finding a fix is generally assured | Postulate a race and inspect code; root causes easily remain unidentified

Hardware Synchronization

    int Add(ref int l, int v);
    int CompareExchange(ref int l, int v, int cmp);
    int Decrement(ref int l);
    int Increment(ref int l);
    int Exchange(ref int l, int v);

The Foundation on top of Which All Else Exists

    public class
    WaitHandle : IDisposable {
        public void Close();
        public void WaitOne();
        // timeout-variants, and plenty of others…
        public static void WaitAll(WaitHandle[] hs);
        public static int WaitAny(WaitHandle[] hs);
    }

In .NET

    public class Mutex : WaitHandle {
        public Mutex(string name, MutexSecurity acl, …);
        public void ReleaseMutex();
    }

    public class Semaphore : WaitHandle {
        public Semaphore(
            int initialCount, int maximumCount,
            string name, SemaphoreSecurity acl, …);
        public void Release(int count);
    }

In .NET

    public class EventWaitHandle : WaitHandle {
        public EventWaitHandle(
            bool initialState, EventResetMode mode,
            string name, EventWaitHandleSecurity acl, …);
        public void Reset();
        public void Set();
    }
    public enum EventResetMode { AutoReset, ManualReset }
    public class AutoResetEvent : EventWaitHandle { … }
    public class ManualResetEvent : EventWaitHandle { … }

Locking

    [C#]
    lock (obj) { … }

    [VB]
    SyncLock obj
        …
    End SyncLock

    Monitor.Enter(obj);
    try {
        …
    } finally {
        Monitor.Exit(obj);
    }

Condition Variables

    bool P = false;
    …
    lock (obj) {
        while (!P) Monitor.Wait(obj);
        …
    }

    … elsewhere …
    lock (obj) {
        P = true;
        Monitor.Pulse[All](obj);
    }

When Mutual Exclusion is Unnecessary
Convoy Avoidance
Confined State Within Threads

An ImmutableStack<T> Type

    public class ImmutableStack<T> {
        private readonly T m_value;
        private readonly ImmutableStack<T> m_next;
        private readonly bool m_empty;

        public ImmutableStack() { m_empty = true; }
        internal ImmutableStack(T value, ImmutableStack<T> next) {
            m_value = value; m_next = next; m_empty = false;
        }

        // Push and Pop return a new stack; the receiver is never modified.
        public ImmutableStack<T> Push(T value) {
            return new ImmutableStack<T>(value, this);
        }

        public ImmutableStack<T> Pop(out T value) {
            if (m_empty) throw new Exception("Empty.");
            value = m_value;
            return m_next;
        }
    }

Architecture and Platform Guarantees

Examples

    X = Y = 0;
    ~~~
    Thread 1: X = 1; Y = 1;
    Thread 2: A = Y; B = X;
    ~~~
    A == 1 && B == 0?
    No, except on IA64. (No StoreStore, No LoadLoad)

    X = Y = 0;
    ~~~
    Thread 1: X = 1; A = Y;
    Thread 2: Y = 1; B = X;
    ~~~
    A == 0 && B == 0?
    Yes!
    (StoreLoad is permitted)

    X = Y = 0;
    ~~~
    Thread 1: X = 1;
    Thread 2: A = X; Y = 1;
    Thread 3: B = Y; C = X;
    ~~~
    A == 1 && B == 1 && C == 0?
    No. (Transitivity)

Accessing Nonatomic Locations w/out Proper Synchronization

    internal static long s_x;
    void t1() {
        int i = 0;
        while (true) {
            s_x = (i & 1) == 0 ? 0x0L : 0xaaaabbbbccccddddL;
            i++;
        }
    }
    void t2() {
        while (true) {
            long x = s_x;
            Debug.Assert(x == 0x0L || x == 0xaaaabbbbccccddddL);
        }
    }

Double Edged Sword

    class Stack<T> {
        Node<T> head;
        void Push(T obj) {
            Node<T> n = new Node<T>(obj);
            Node<T> h;
            do {
                h = head;
                n.Next = h; // link field name assumed; it was lost from the slide text
            } while (Interlocked.CompareExchange(ref head, n, h) != h);
        }
        T Pop() {
            Node<T> n;
            do {
                n = head;
            } while (Interlocked.CompareExchange(ref head, n.Next, n) != n);
            return n.Value;
        }
        …
    }

Efficient Lazy Initialization (Variant 1: Never Create >1)

    class Foo {
        private static volatile Foo s_inst;
        private static object s_mutex = new object();
        internal Foo {
            get {
                if (s_inst == null)
                    lock (s_mutex)
                        if (s_inst == null)
                            s_inst = new Foo(…);
                return s_inst;
            }
        }
    }

Efficient Lazy Initialization (Variant 2: >1 OK)

    class Foo {
        private static volatile Foo s_inst;
        internal Foo {
            get {
                if (s_inst == null) {
                    Foo candidate = new Foo();
                    Interlocked.CompareExchange(
                        ref s_inst, candidate, null);
                }
                return s_inst;
            }
        }
    }

Trickier Than You Think!
    class SpinLock {
        private int m_state = 0;
        public void Enter() {
            while (Interlocked.CompareExchange(
                ref m_state, 1, 0) != 0) ;
        }
        public void Exit() {
            m_state = 0;
        }
    }

Brain Melting Details …

Try Numero Dos – Still Imperfect

    class SpinLock {
        private volatile int m_state = 0;
        public void Enter() {
            int tid = Thread.CurrentThread.ManagedThreadId;
            while (true) {
                if (Interlocked.CompareExchange(ref m_state, tid, 0) != 0) {
                    int iters = 1;
                    while (m_state != 0) {
                        if (Environment.ProcessorCount == 1) {
                            if (iters % 5 == 0) Thread.Sleep(1);
                            else Thread.Sleep(0);
                            iters++;
                        } else {
                            Thread.SpinWait(iters);
                            if (iters >= 4096) Thread.Sleep(1);
                            else {
                                if (iters >= 2048) Thread.Sleep(0);
                                iters *= 2;
                            }
                        }
                    }
                }
            }
        }
        public void Exit() {
            m_state = 0;
        }
    }

Synchronization Best Practices

Lock consistently
Do / Do / Do

    class MyList<T> {
        T[] items; // lock: items
        int n;     // lock: items
        void Add(T item) {
            lock (items) {
                items[n] = item;
                n++;
            }
        }
        …
    }

Lock for the right duration
Do / Don’t

    class MyList<T> {
        T[] items; // lock: items
        int n;     // lock: items
        // invariant: n is count of valid
        // items in list and items[n] == null
        void Add(T item) {
            lock (items) {
                items[n] = item;
                n++;
            }
        }
        …
    }

Make critical regions short and sweet
Do / Don’t / Don’t

    class MyList<T> {
        ...
        void Add(T t) {
            lock(items) {
                items[n] = t;
                n++;
            }
            Listener.Notify(this);
        }
        …
    }

Encapsulate your locks
Don’t / Don’t

    class MyList<T> {
        T[] items;
        int n;
        static object slk = new object();
        …
        static void ResetStats() {
            lock(slk){ … }
        }
        …
    }

Avoiding deadlocks
Do: Acquire locks in a consistent order

    class MyService {
        A a;
        B b;
        …
        void DoAB() {
            lock(a) lock(b) {
                a.Do(); b.Do();
            }
        }
        void DoBA() {
            lock(b) lock(a) {
                b.Do(); a.Do();
            }
        }
    }

Locking Miscellany
Do: Document your locking policy, especially for public APIs
Do: Use a reader/writer lock if readers are common
Do: Prefer lock-based code to lock-free code
Do: Prefer Monitors over kernel synchronization
Avoid: Lock recursion in your designs
Don’t: Build your own lock
Avoid: Writing your own thread pools

Break (3:15pm-3:45pm)

Designs and Algorithms

The Impact of Multi-core on Apps
Code and Data

A Taxonomy of Concurrency
Agents/CSPs
* Message Passing
* Loose Coupling
Task Parallelism
* Statements
* Structured
* Futures
* ~O(1) Parallelism
Data Parallelism
* Data Operations
* O(N) Parallelism
Messaging
…

Metrics Worth Measuring

Parallel For Loops

+: simple, predictable, efficient
-: can’t tolerate iteration imbalance, blocking

+: tolerates imbalance, blocking
-: more difficult, communication overhead

Parallel For Loops – Static Decomposition

    void ParallelForS(int lo, int hi, Action<int> body, int p) {
        int chunk = ((hi - lo) + p - 1) / p; // Iterations/thread
        ManualResetEvent mre = new ManualResetEvent(false);
        int remaining = p;

        // Schedule the threads to run in parallel
        for (int i = 0; i < p; i++) {
            ThreadPool.QueueUserWorkItem(delegate(object procId) {
                int start = lo + (int)procId * chunk;
                for (int j=start; j<start + chunk && j < hi; j++) {
                    body(j);
                }
                if (Interlocked.Decrement(ref remaining) == 0)
                    mre.Set();
            }, i);
        }
        mre.WaitOne(); // Wait for them to finish
    }

Parallel For Loops – Dynamic Decomposition

    void ParallelForD(int lo, int hi, Action<int> body, int p) {
        const int chunk = 16; // Chunk size (constant)
        ManualResetEvent
            mre = new ManualResetEvent(false);
        int remaining = p;
        int current = lo;

        // Schedule the threads to run in parallel
        for (int i = 0; i < p; i++) {
            ThreadPool.QueueUserWorkItem(delegate(object procId) {
                int j;
                while ((j = (Interlocked.Add(
                        ref current, chunk) - chunk)) < hi) {
                    for (int k = 0; k < chunk && j + k < hi; k++) {
                        body(j + k);
                    }
                }
                if (Interlocked.Decrement(ref remaining) == 0)
                    mre.Set();
            }, i);
        }
        mre.WaitOne(); // Wait for them to finish
    }

Parallel Foreach Loops

    void ParallelForEach<T>(IEnumerable<T> e, Action<T> body, int p) {
        const int chunk = 16; // Chunk size (constant)
        ManualResetEvent mre = new ManualResetEvent(false);
        int remaining = p;
        using (IEnumerator<T> en = e.GetEnumerator()) { // shared
            // Schedule the threads to run in parallel
            for (int i = 0; i < p; i++) {
                ThreadPool.QueueUserWorkItem(delegate(object procId) {
                    T[] buffer = new T[chunk];
                    int j;
                    do {
                        lock (en) {
                            for (j = 0; j < chunk && en.MoveNext(); j++)
                                buffer[j] = en.Current;
                        }
                        for (int k = 0; k < j; k++)
                            body(buffer[k]);
                    } while (j == chunk);
                    if (Interlocked.Decrement(ref remaining) == 0)
                        mre.Set();
                }, i);
            }
            mre.WaitOne(); // Wait for them to finish
        }
    }

Divide and Conquer - Recursion

    Mirror(node.Right);

Reductions

    int ParallelSum(int[] array, int p) {
        int chunk = (array.Length + p - 1) / p; // Iterations/thread
        ManualResetEvent mre = new ManualResetEvent(false);
        int sum = 0, remaining = p;

        // Schedule the threads to run in parallel
        for (int i = 0; i < p; i++) {
            ThreadPool.QueueUserWorkItem(delegate(object procId) {
                int mySum = 0;
                int start = (int)procId * chunk;
                for (int j=start; j<start + chunk && j < array.Length; j++)
                    mySum += array[j];
                Interlocked.Add(ref sum, mySum);
                if (Interlocked.Decrement(ref remaining) == 0)
                    mre.Set();
            }, i);
        }
        mre.WaitOne(); // Wait for them to finish
        return sum;
    }

When to “Go Parallel”?
There is a cost; only worthwhile when
• Work per task/element is large, and/or
• Number of tasks/elements is large
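
One way to act on the cost/benefit rule above is a sequential cutoff: stay on the calling thread when the input is small and fan out to the ThreadPool only when there is enough work to repay the scheduling overhead. The following is a minimal sketch under stated assumptions, not code from the session. The class name CutoffSum and the threshold of 4096 elements are hypothetical, and a real threshold would have to be measured for the workload.

```csharp
using System;
using System.Threading;

static class CutoffSum
{
    // Hypothetical threshold (an assumption): below this many elements,
    // the scheduling cost is assumed to outweigh any parallel speedup.
    const int SequentialCutoff = 4096;

    public static long Sum(int[] array, int p)
    {
        // Small input or a single worker: plain sequential loop.
        if (array.Length < SequentialCutoff || p <= 1)
        {
            long s = 0;
            foreach (int v in array) s += v;
            return s;
        }

        // Large input: static decomposition into p chunks on the ThreadPool.
        long total = 0;
        int chunk = (array.Length + p - 1) / p;
        int remaining = p;
        using (var done = new ManualResetEvent(false))
        {
            for (int i = 0; i < p; i++)
            {
                int start = i * chunk; // fresh copy captured per iteration
                ThreadPool.QueueUserWorkItem(delegate
                {
                    long mySum = 0;
                    int end = Math.Min(start + chunk, array.Length);
                    for (int j = start; j < end; j++) mySum += array[j];
                    Interlocked.Add(ref total, mySum);
                    if (Interlocked.Decrement(ref remaining) == 0) done.Set();
                });
            }
            done.WaitOne(); // wait for all chunks to finish
        }
        return total;
    }

    static void Main()
    {
        int[] data = new int[10000];
        for (int i = 0; i < data.Length; i++) data[i] = i + 1;
        Console.WriteLine(Sum(data, 4)); // 1 + 2 + ... + 10000 = 50005000
    }
}
```

The accumulator is a long so that large inputs do not overflow, which differs from the int accumulator in the ParallelSum slide.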
[Chart: Speedup versus Work Per Task and # of Tasks, from 1 task (Sequential) to ? tasks, marking the break-even point and the point of diminishing returns]

Synchronous I/O
[Diagram: Thread 1 and Thread 2 timelines, 6 work items in 4 time units; legend: Running, Waiting]

Overlapped IO
[Diagram: Thread 1 and Thread 2 timelines, 6 work items in 3 time units; legend: Running, Waiting]

Synchronization
[Diagram: timelines for Threads 1 to 4 contending for a lock; legend: Running, Running w/ lock, Waiting]

Load Imbalance
[Diagram: Sequential versus Parallel timelines for Threads 1 to 4; legend: Your API]
More than 2 threads is just wasted resource: S = 50%, 1/S == 2
No matter how many processors, 2x is it

Other Miscellaneous Algorithms

Producer/Consumer: Blocking & Bounded Queue

    public class BlockingBoundedQueue<T> {
        private Queue<T> m_queue = new Queue<T>();
        // Semaphore takes (initialCount, maximumCount); capacity is 128 items.
        private Semaphore m_fullSemaphore = new Semaphore(128, 128);
        private Semaphore m_emptySemaphore = new Semaphore(0, 128);

        public void Enqueue(T item) {
            m_fullSemaphore.WaitOne();
            lock (m_queue) {
                m_queue.Enqueue(item);
            }
            m_emptySemaphore.Release();
        }

        public T Dequeue() {
            T e;
            m_emptySemaphore.WaitOne();
            lock (m_queue) {
                e = m_queue.Dequeue();
            }
            m_fullSemaphore.Release();
            return e;
        }
    }

.NET Framework 4.0

    IEnumerable<BabyInfo> babies = ...;
    var results = new List<BabyInfo>();
    foreach(var baby in babies)
    {
        if (baby.Name == queryName &&
            baby.State == queryState &&
            baby.Year >= yearStart &&
            baby.Year <= yearEnd)
        {
            results.Add(baby);
        }
    }
    results.Sort((b1, b2) => b1.Year.CompareTo(b2.Year));

    IEnumerable<BabyInfo> babies = …;
    var results = new List<BabyInfo>();
    int partitionsCount = Environment.ProcessorCount;
    int remainingCount = partitionsCount;
    var enumerator = babies.GetEnumerator();
    try {
        using (var done = new ManualResetEvent(false)) {
            for(int i = 0; i < partitionsCount; i++) {
                ThreadPool.QueueUserWorkItem(delegate {
                    var partialResults = new List<BabyInfo>();
                    while(true) {
                        BabyInfo baby;
                        lock (enumerator) {
                            if (!enumerator.MoveNext()) break;
                            baby = enumerator.Current;
                        }
                        if (baby.Name == queryName &&
                            baby.State == queryState &&
                            baby.Year >= yearStart &&
                            baby.Year <= yearEnd) {
                            partialResults.Add(baby);
                        }
                    }
                    lock (results) results.AddRange(partialResults);
                    if (Interlocked.Decrement(ref remainingCount) == 0) done.Set();
                });
            }
            done.WaitOne();
            results.Sort((b1, b2) => b1.Year.CompareTo(b2.Year));
        }
    }
    finally {
        if (enumerator is IDisposable) ((IDisposable)enumerator).Dispose();
    }

    var results = from baby in babies.AsParallel()
                  where baby.Name == queryName &&
                        baby.State == queryState &&
                        baby.Year >= yearStart &&
                        baby.Year <= yearEnd
                  orderby baby.Year ascending
                  select baby;

[Diagram: the parallel computing stack. Tools: Profiler, Concurrency Analysis, Parallel Debugger. Programming Models: PLINQ, Task Parallel Library, Parallel Pattern Library, Data Structures (managed and native). Concurrency Runtime: ThreadPool, Task Scheduler, Resource Manager (managed and native). Operating System: Windows, Threads. Key: Managed Library, Agents Library, Native Library, Tools]

What is it?
Why is it good?

[Diagram: a .NET Program (Declarative Queries, Parallel Algorithms) passes through the C#, VB, C++, F#, or other .NET compiler to MSIL and runs on PLINQ, TPL, or CDS across threads on Proc 1 … Proc p.
PLINQ Execution Engine: Query Analysis; Data Partitioning (Chunk, Range, Hash, Striped, Repartitioning); Operator Types (Map, Filter, Sort, Search, Reduce, …); Merging (Buffering options, Order preservation, Inverted).
Task Parallel Library: Loop replacements, Imperative Task Parallelism, Scheduling.
Coordination Data Structures: Concurrent Collections, Synchronization Types, Coordination Types]

Work-Stealing Scheduler
[Diagram: the program thread places tasks on a Global Queue; Worker Thread 1 … Worker Thread p each have a Local Queue of tasks]

Thread-safe collections
Locks
Work exchange
Initialization
Phased Operation

Wrap-up

Talk Recap

What the Future Holds
Programming Models

Safety
Current offerings minimal impact (sharp knives)
Three key themes
Functional: immutable & pure
Safe imperative: isolated
Safe side-effects: transactions
Verification tools

Patterns
Agents (CSPs) + tasks + data
1st class isolated agents
Raise level
of abstraction: what, not how

What the Future Holds
Efficiency and Heterogeneity

Efficiency
“Do no harm”
O(P) >= O(1)
More static decision-making vs. dynamic
Profile guided optimizations

The future is heterogeneous
Chip multiprocessors are “easy”
Out-of-order vs. in-order
GPGPU (fusion of X86 with GPU)
Vector ISAs
Possibly different memory systems

All Programmers Will Not Be Parallel

Implicit Parallelism
Use APIs that internally use parallelism
Structured in terms of agents
Apps, LINQ queries, etc.

Explicit Parallelism, Safe
Frameworks, DSLs, XSLT, sorting, searching

Explicit Parallelism, Unsafe
(Parallel Extensions, etc)

In Conclusion
Opportunity and crisis
Architects & senior developers pay heed
Time to start thinking and experimenting
Not yet for ubiquitous consumption [5 year horizon] but…
Can make a real difference today in select places: embarrassingly parallel
Begin experimenting today
Competitive advantage for those who grok it
Less incentive for the client platform without Windows Vista + .NET 3.5
Play with Parallel Extensions (.NET 4.0 and C++)
Exciting times!

Thank-you.

Just released! Available at the PDC bookstore
Concurrent Programming on Windows (Addison-Wesley)
Covers Win32 & .NET Framework
Book Signing
Where: PDC bookstore
Date/Time: Wednesday, Oct. 29, 2:30PM – 3:00PM

And download Parallel Extensions to the .NET Framework!
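
As a starting point for experimenting, the hand-written ParallelSum reduction from the Reductions slide can be sketched with the Task Parallel Library. This is a sketch that assumes the Parallel.For overload with thread-local state as it shipped in .NET Framework 4 (the preview builds available at the time of the talk may differ). The class name TplSum is hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class TplSum
{
    // Sums an int array with Parallel.For. Each worker keeps a private
    // partial sum and publishes it once, so the shared total is touched
    // once per worker rather than once per element.
    public static long Sum(int[] array)
    {
        long total = 0;
        Parallel.For<long>(
            0, array.Length,
            () => 0L,                                     // per-worker initial partial sum
            (i, loopState, mySum) => mySum + array[i],    // accumulate privately
            mySum => Interlocked.Add(ref total, mySum));  // publish once per worker
        return total;
    }

    static void Main()
    {
        int[] data = new int[100000];
        for (int i = 0; i < data.Length; i++) data[i] = i + 1;
        Console.WriteLine(Sum(data)); // 1 + 2 + ... + 100000 = 5000050000
    }
}
```

Compared with the ThreadPool version, the library picks the partitioning and the number of workers, which is the "what, not how" shift the wrap-up describes.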
© 2008 Microsoft Corporation. All rights reserved.