Graphical Asymmetric Processing
Prototype Presentation
December 13, 2004

Team Organization
• Gene Hill Price, General Manager
• John Zareno, Project Manager
• Thomas James, Team Lead
• Roberta Serbenescu, Software Analyst
• Tiffany Williams, Research Analyst
• Mohammed Iraqi, Web Developer
• Sunny Nanda, Budget Analyst
• Joseph Williams, Technical Writer

Image source: http://www.apple.com/education/science/profiles/vatech/

Problem Statement
• Computationally intensive environments underutilize graphics processing units (GPUs).

Background Information
• Using the GPU for general-purpose computing has been discussed since 1996, but never implemented.
• GPU performance
  – Has multiplied at a rate of 2.8 times per year since 1993
  – Is expected to increase at this rate for another 5 years
• The more GPU performance increases, the more helpful our product becomes
http://www.computer.org/computer/homepage/1003/entertainment

Solution: G.A.P.
Create a usable, extensible, and maintainable API that leverages the unused computing power of graphics processors to increase the performance of scientific, database, and other processor-intensive applications.

Solution: G.A.P.
Utilizing existing hardware:
  – Improve computing power
  – Improve computing time
  – Improve computing responsiveness

Solution Implementation
  – Create an SDK to utilize the GPU
  – Sell that SDK to NVIDIA

What is an SDK?
• S.D.K. – Software Development Kit
• A set of programs that allows software developers to create products to run on a particular platform or to work with an API.
• Includes: manual, examples, libraries
• Other examples, both free and commercial:
  – Java, OS/2, AW, Windows, DirectX

Phase 1 Product Goals
• Demonstrate the amount of power in current GPUs
  – Also: the ability to utilize that power
• Secure funding to continue development
• Secure interested parties
  – Universities and research labs
• Take first steps towards an NVIDIA partnership

Phase 1 Product Objectives
• Leverage the GPU for additional power
• Improve throughput on workstation machines
• Ease the programming difficulty of utilizing the GPU
• Maintain current program compatibility
• Preserve system stability

Product Risks & Mitigations
• Vendor support
  – NVIDIA sets aside $1 billion to use on:
    • Acquisitions
    • R&D
• Writing the software
  – Time-intensive product
    • "Build first and optimize later"

Product Functional Diagram
[Diagram: USER → GAP SOFTWARE → OUTPUT]

Product Dataflow Diagram
[Diagram labels: USER CONTEXT BUILD CONTEXT REQUEST VERSION CHECK CAPABILITIES TABLE GAP BEGIN CONTEXT GENERATOR GAP COMMAND GAP FLUSH GAP PROCESSING CONTEXT CPU QUEUE GAP END GPU RESULTS]

Prototype

Navier-Stokes Equations
• Used to refer to the incompressible form of the momentum equation
• A full and general set of differential equations governing the motion of a fluid
http://www.navier-stokes.net/nsdef.htm

Navier-Stokes Equations
• Simulation of fluid-like behavior
  – An example of the applications used within computationally intensive environments
  – The thesis topics of multiple Old Dominion PhD candidates focus on Navier-Stokes
• Will serve as the basis application to prove the efficiency of the GPU over the CPU
  – Shows an average 60% gain in efficiency

Prototype Functional Diagram
[Diagram: USER → FLUID SIMULATION GPU VERSION → OUTPUT]
[Diagram: USER → FLUID SIMULATION CPU VERSION → OUTPUT]

Dataflow Diagram for Prototype
GAP VERSION FUNCTIONAL PROTOTYPE
[Diagram labels: CONTEXT BUILD CONTEXT REQUEST VERSION CHECK CAPABILITIES TABLE CONTEXT GENERATOR STREAM OPERATION CONTEXT GPU PROCESSING RESULTS]

Dataflow Diagram for Prototype
CPU VERSION FUNCTIONAL PROTOTYPE
USE OF CPU ONLY WITH NO GPU PROCESSING
[Diagram labels: CONTEXT BUILD CONTEXT REQUEST VERSION CHECK CAPABILITIES TABLE CONTEXT GENERATOR STREAM OPERATION CONTEXT CPU RESULTS]

Demonstration
• Two versions of an executable
  – CPU vs. GPU
• Navier-Stokes on a vector field with four jets
  – The demonstration will consist of firing the jets for different lengths of time and observing performance
  – Observe the CPU alone
  – Observe the GPU alone
  – Observe both simultaneously

On the CPU

With GAP on the GPU

Risks
• Main research issues include the quality of floating point
  – The numbers are single precision, not double.
• Works best when "batched," which requires a relatively "parallel" system
  – This is already an issue in multithreading. Solutions exist in both programmer practice and compiler design.
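The floating-point risk named on the Risks slide can be illustrated on any machine. The following is a minimal sketch, not part of the GAP prototype: it emulates single-precision accumulation in plain Python by rounding the running total to 32 bits after every addition, and compares the drift against ordinary double-precision accumulation. The function names, the step size of 0.1, and the count of 100,000 additions are assumptions chosen only for illustration.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, count: int, single: bool) -> float:
    """Add `step` to a running total `count` times.

    When `single` is True the total is rounded to single precision after
    every addition, which emulates a 32-bit accumulator.
    """
    total = 0.0
    for _ in range(count):
        total += step
        if single:
            total = to_float32(total)
    return total


STEP = 0.1          # illustrative increment (assumption)
COUNT = 100_000     # illustrative number of additions (assumption)
exact = 10_000.0    # mathematically exact value of 0.1 * 100,000

sum_single = accumulate(to_float32(STEP), COUNT, single=True)
sum_double = accumulate(STEP, COUNT, single=False)

err_single = abs(sum_single - exact)
err_double = abs(sum_double - exact)

print(f"single precision: {sum_single!r} (error {err_single:.3e})")
print(f"double precision: {sum_double!r} (error {err_double:.3e})")
```

The single-precision total drifts visibly from 10,000 while the double-precision total stays within a tiny fraction of a unit, which is why long-running accumulations need care on single-precision hardware.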

Risks Mitigated (Prototype)
• Floating-point quality:
  – Distributed the field thickly enough that floating point was accurate.
• Batching:
  – Used a "Stream" operator that ensured the command size was sufficient before it flushed the results.

Risk Mitigation (Product)
• Floating point
  – NVIDIA says cards will include double precision upon demand
  – An NVIDIA partnership will expedite this.
• Batching
  – The Context system has an internal, self-optimizing queue, with a "flush" instruction for programmer flexibility.

Testing and Evaluation
• 20 frames to 1 "real-world second"
  – Translates to:
    • 0.75 to 1.75 speed on the GPU
      – Faster than a "real-world second"!
    • 0.025 to 0.25 speed on the CPU

Suitability
• What does this prove?
  – Gives the magnitude of the performance increase
  – Efficiency gain with no new hardware
  – "Real-world" problem solved
  – Standard interface any program could use

Degree of Completeness: Similarities
Prototype and Release both offer:
• General access functions
• "Context"-based input
• Demonstrated performance gain
• Utilizes the GPU for as much work as possible

Degree of Completeness: Differences
Prototype                      Release
Specific to GF5 platform       General platform
Limited GAP commands           Wide array of GAP commands
"All or nothing" GPU use       Dynamic GPU use based on capabilities

Budget Reports

Phase I Funding
• Phase I SBIR
  – Completed at the end of Phase 0

Phase I Budget: Staff
Resource Name          Initials   Standard Rate   Hours *     Total
Project Manager        PM         $40.00/hr       88x8=704    $28,160
Technical Documenter   TD         $15.00/hr       24x8=192    $2,880
Web Developer          WD         $15.00/hr       10x8=80     $1,200
Programmer-1           P1         $25.00/hr       39x8=312    $7,800
Programmer-2           P2         $25.00/hr       39x8=312    $7,800
Programmer-3           P3         $25.00/hr       39x8=312    $7,800
Programmer-4           P4         $25.00/hr       39x8=312    $7,800
TOTAL                                                         ~$63,500

Phase I Budget
Length          88 days
Staffing        $63,500
40% Overhead    $25,400
Non-Staff       $3,600
Total           $92,500

Major Milestones Phase I
• Organize Project Group
• Produce Project Descriptive Paper
• Develop Contracts
• Produce Budget White Paper
• Produce Project User Manual
• Develop Prototype
• Produce SBIR Phase II Proposal
• Produce Project Website

Phase I Schedule

Phase II Funding
• Phase II SBIR
  – Completed at the end of Phase I

Phase II Budget: Staff
Resource Name                  Initials   Standard Rate   Hours *     Total
Project Manager                PM         $50.00/hr       90x8=720    $36,000
Marketing Business Expert      BE         $45.00/hr       90x8=720    $32,400
Communication Specialist       CS         $45.00/hr       90x8=720    $32,400
Programmer*                    P1         $35.00/hr       30x8=240    $8,400
Web Developer                  WD         $25.00/hr       15x8=120    $3,000
Lawyer 1                       L1         $40.00/hr       60x8=480    $19,200
Lawyer 2                       L2         $40.00/hr       60x8=480    $19,200
Technical Document Writer      TD         $25.00/hr       15x8=120    $3,000
Software Quality Assurance 1   SQA 1      $30.00/hr       30x8=240    $7,200
Software Quality Assurance 2   SQA 2      $30.00/hr       30x8=240    $7,200
TOTAL                                                                 $165,600
*4 programmers needed

Patent Acquisition
Length                                      90 days
Preliminary Patent Search                   $1,500
Preparing and Filing Patent Application     $1,000
Patent Abstract                             $7,000
Filing Fee                                  $500
Patent Prosecution Phase                    $6,000
Patent Issue Phase                          $2,000
Patent Maintenance Fee                      $6,000
TOTAL                                       $38,000

Phase II Budget
Length               90 days
Staffing             $165,600
40% Overhead         $66,200
Non-Staff            $0 (*)
Patent Acquisition   $38,000
Travel Expenses      $15,000
TOTAL                $314,800
* Purchased in Phase 1

Major Milestones Phase II
• Production
• Marketing
• Legal Negotiation
• Final Preproduction Alterations

Phase II Schedule

Phase III
• We plan to sell the product to NVIDIA at the end of Phase II
• Doing so would relieve us of the responsibilities and risk factors that may arise on the market
  – While increasing the company's profit by over $6.5 million

Profit Margin/Break Even
• Immediate profit
• $70 million average profit for acquisitions
  – If we obtain 1/10 of the average ($7 million)
  – We would still make a $6.5 million gain
http://nvidia.com/object/IO_20010612_6602.html
http://nvidia.com/object/IO_8086.html

Profit Margin/Break Even
Phase 1 Budget               <$93,000>
Phase 2 Budget               <$315,000>
Total                        <$408,000>
GAP Acquisition by NVIDIA    $7,000,000
NET PROFIT                   $6,592,000

Conclusion
• Through our prototype we have achieved "proof of concept"
• The overall efficiency gain obtained within computationally intensive environments proves a need for GAP

Graphical Asymmetric Processing
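The break-even figures on the Profit Margin/Break Even slides can be reproduced with a few lines of arithmetic. This is a minimal sketch that uses only the rounded figures stated on those slides; the variable names are ours, and the sale price is the slides' own assumption of one tenth of the $70 million average.

```python
# Rounded budget figures as stated on the Profit Margin/Break Even slide.
PHASE_1_BUDGET = 93_000
PHASE_2_BUDGET = 315_000

# The slides assume a sale at one tenth of the $70 million average.
AVERAGE_ACQUISITION = 70_000_000
sale_price = AVERAGE_ACQUISITION // 10

total_cost = PHASE_1_BUDGET + PHASE_2_BUDGET
net_profit = sale_price - total_cost

print(f"Total cost: ${total_cost:,}")
print(f"Sale price: ${sale_price:,}")
print(f"Net profit: ${net_profit:,}")
```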