Multi-user Extensible Virtual Worlds
Increasing complexity of objects and interactions with increasing world size, numbers of users, numbers of objects, and types of interactions.

Sheldon Brown, Site Director, CHMPR, UCSD
Daniel Tracy, Programmer, Experimental Game Lab
Erik Hill, Programmer, Experimental Game Lab
Todd Margolis, Technical Director, CRCA
Kristen Kho, Programmer, Experimental Game Lab

Current schemes using compute clusters break virtual worlds into small "shards," each with a few dozen interacting objects. Compute systems with large amounts of coherent addressable memory alleviate cluster node jumping and can create worlds with several orders of magnitude higher data complexity: tens of thousands of entities vs. dozens per shard. This takes advantage of hybrid compute techniques for richer object dynamics.

• A central server manages world state changes.
• The number of clients and the amount of activity determine world size and shape.
• City road schemes are computed for each player when they enter a new city, using hybrid multicore compute accelerators.
• Each player has several views of the world:
  – Partial view of one city
  – Total view of one city
  – Partial view of two cities
  – View of the entire globe
• Within a city are several thousand objects. The dynamics of these objects are computed on the best available resource, balancing computability and coherency and alleviating world sharding.

Many classes of computing devices are used:
• z10 mainframe – transaction processing and state management
• Server-side compute accelerators: NVIDIA Tesla, Cell processor, and x86
• Multi-core portable devices (e.g., Snapdragon-based cell phones)
• Varied desktop computation, including hybrid multicore
• Computing cloud data storage

• Multiple 10 Gb interfaces to compute accelerators, storage clusters, and the compute cloud.
• Cell, x86, and GPU compute accelerators for asset transformation, physics, and behaviors.
• Server services are distributed across cloud clusters, and redistributed across clients as performance or local work necessitates.
• Coherency with the overall system is pursued, managed by the centralized server.
• Virtual world components have dynamic tolerance levels for discoherency and latency (a hypothetical resource-selection sketch follows the Goals slide below).

Development Server Framework, 5/2010
• Z10 mainframe computer at the San Diego Supercomputer Center
  – 2 IFLs with 128 MB RAM, zVM virtual OS manager with Linux guests
  – 6 TB of fast local storage – 15K disks
  – 4 SR and 2 LR 10 Gb Ethernet interfaces
• 2 QS22 blades – 4 Cell processors
• 4 QS20 blades
• 2 HS22 blades – 4 Xeons
• NVIDIA Tesla accelerator – 4 GPUs on a Linux host, external dual PCI connection
• 3 10 Gb interfaces to compute accelerators
• 1 10 Gb interface to the Internet
• Many clients
SDSC view

Multi-user Extensible Virtual Worlds
Producing a multi-user networked virtual world from a single-player environment

Goals
• Feasibility
  – Transformation from a single-player program to client/server multi-player networking is non-trivial.
  – A structured methodology for the transformation is required.
• Scalability
  – Support large environments, massively multi-player.
  – After a working version, iteratively tackle bottlenecks.
• Multi-platform server
  – Explore z10, x86, Cell BE, and Tesla accelerators.
  – Cross-platform communication required.
• Evaluate "drop in" solutions
  – Benefits and liabilities of client/server-side schemes such as OpenSim and Darkstar.
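The architecture slides above state that object dynamics are computed "on the best available resource, balancing computability and coherency," and that "virtual world components have dynamic tolerance levels for discoherency and latency." The following is a minimal, hypothetical C++ sketch of that idea; the names (ComputeResource, Tolerance, chooseResource) and all numbers are illustrative assumptions, not the project's actual scheduler.

```cpp
#include <iostream>
#include <limits>
#include <string>
#include <vector>

// Hypothetical description of a compute resource the server can schedule onto.
struct ComputeResource {
    std::string name;        // e.g. "Cell blade", "Tesla", "client multicore"
    double      latencyMs;   // expected delay relative to the authoritative world state
    double      divergence;  // expected state drift before the next sync, 0..1
    double      throughput;  // relative dynamics throughput (bigger is better)
};

// Hypothetical per-component tolerance: how much latency and discoherency
// a virtual world component can absorb before its behavior is unacceptable.
struct Tolerance {
    double maxLatencyMs;
    double maxDivergence;
};

// Pick the highest-throughput resource that respects the component's
// tolerances; fall back to the first (assumed authoritative) resource.
const ComputeResource* chooseResource(const Tolerance& t,
                                      const std::vector<ComputeResource>& pool) {
    const ComputeResource* best = pool.empty() ? nullptr : &pool.front();
    double bestThroughput = -std::numeric_limits<double>::infinity();
    for (const auto& r : pool) {
        if (r.latencyMs <= t.maxLatencyMs && r.divergence <= t.maxDivergence &&
            r.throughput > bestThroughput) {
            best = &r;
            bestThroughput = r.throughput;
        }
    }
    return best;
}

int main() {
    std::vector<ComputeResource> pool = {
        {"z10 / central server", 1.0,  0.00, 1.0},
        {"Cell blade",           5.0,  0.05, 8.0},
        {"client multicore",    60.0,  0.20, 3.0},
    };
    Tolerance cosmeticDebris{100.0, 0.25};   // decorative physics: very tolerant
    Tolerance playerAvatar{10.0, 0.01};      // gameplay-critical: strict
    std::cout << chooseResource(cosmeticDebris, pool)->name << "\n";  // "Cell blade"
    std::cout << chooseResource(playerAvatar, pool)->name << "\n";    // "z10 / central server"
    return 0;
}
```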
The (Original) Scalable City Technology Infrastructure
• ERSATZ – custom virtual reality engine / real-time 3D rendering engine (OpenGL, Direct3D, Ogre3D)
• ODE, Newton – open source physics libraries
• fmod – sound library
• Intel OpenCV – real-time computer vision
• CGAL – Computational Geometry Library
• Autodesk Maya, 3ds Max – procedural asset creation through our own plug-ins
• Loki, Xerces, Boost – utility libraries
• Chromium, DMX, SAGE – distributed rendering libraries
• NVIDIA FX Composer, ATI RenderMonkey – IDEs for HLSL and GLSL GPU programming
Serial pipeline: increase performance by increasing CPU speed.

Moore's-law computational gains have not been achievable via faster clock speeds for the past 8 years. Multicore computing is the tactic:
• New computing architectures
• New algorithmic methods
• New software engineering
• New systems designs

• NVIDIA Fermi GPGPU – 16 units with 32 cores each
• IBM System z processor – 4 cores, 1 service processor
• Sony/Toshiba/IBM Cell BE processor – 1 PPU, 8 SPUs per chip
• Intel Larrabee processor – 32 x86 cores per chip

The Scalable City Next Stage Technology Infrastructure
• Cell processors compute dynamic assets.
• Within the ERSATZ engine: Intel OpenCV (real-time computer vision), Computational Geometry Library, fmod (sound library), Ogre3D scene graph.
• Dynamics pipeline: input data → data parallel (n threads + SIMD) → thread barrier → output data.
• Abstract physics to use multiple physics libraries (ODE, Bullet, etc.); replace computational bottlenecks in these libraries with data parallel operations.
• Convert assets to data parallel meshes after the physics transformation; boosts rendering ~33% (see the threading sketch below).
• Open source libraries need work for adding data-level parallelism.

The Scalable City Next Stage Technology Infrastructure – evaluating "drop in" servers
• DarkStar server with the ERSATZ engine maxes out at about 12 clients for a world as complex as Scalable City.
• OpenSim server with a realXtend or Linden client, or with the ERSATZ engine.
• These systems are not designed for interaction among tens of thousands of dynamic objects; even a handful of complex objects overloads the dynamics computation.
• Extensive re-engineering is required to provide this capability and to use the hybrid multicore infrastructure – defeating their general-purpose platforms.

Challenges & Approach
• Software engineering challenges:
  – Scalable City is large and complex, with many behaviors.
  – Code consisted of tightly coupled systems not conducive to separation into client and server.
  – Multi-user support takes time, and features will be expanded by others simultaneously!
• Basic approach – agile methodology:
  – Incrementally evolve single-user code into a system that can be trivially made multi-user in the final step.
  – Always have a running and testable program.
  – Test for unwanted behavioral changes at each step.
  – Allow others to expand features simultaneously.

Step by Step Conversion
1. Data-structure focused: is it client or server?
  – Some data structures may have to be split.
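The "Next Stage Technology Infrastructure" slide above describes converting assets to data-parallel meshes after the physics transformation, using n threads plus SIMD with a thread barrier before the output data is consumed. Below is a minimal sketch of the threaded part under assumed data layouts; MeshData, Transform, and applyPhysicsTransform are illustrative names, not ERSATZ's actual interfaces, and the per-chunk inner loop is where SIMD would additionally be applied.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical flattened vertex buffer: x,y,z triples for one asset batch.
struct MeshData {
    std::vector<float> positions;   // 3 floats per vertex
};

// Toy 3x4 rigid transform (rotation + translation) produced by the physics step.
struct Transform {
    float m[12];
};

static void transformRange(MeshData& mesh, const Transform& t,
                           std::size_t begin, std::size_t end) {
    for (std::size_t v = begin; v < end; ++v) {
        float* p = &mesh.positions[3 * v];
        float x = p[0], y = p[1], z = p[2];
        p[0] = t.m[0] * x + t.m[1] * y + t.m[2]  * z + t.m[3];
        p[1] = t.m[4] * x + t.m[5] * y + t.m[6]  * z + t.m[7];
        p[2] = t.m[8] * x + t.m[9] * y + t.m[10] * z + t.m[11];
    }
}

// Split the vertex range across n worker threads; joining the threads acts as
// the "thread barrier" before the output data is handed to the renderer.
void applyPhysicsTransform(MeshData& mesh, const Transform& t, unsigned n) {
    const std::size_t count = mesh.positions.size() / 3;
    const std::size_t chunk = (count + n - 1) / n;
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n; ++i) {
        std::size_t begin = std::min<std::size_t>(i * chunk, count);
        std::size_t end   = std::min<std::size_t>(begin + chunk, count);
        if (begin < end)
            workers.emplace_back(transformRange, std::ref(mesh), std::cref(t), begin, end);
    }
    for (auto& w : workers) w.join();   // barrier: all chunks finished
}
```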
Data Structures
Systems coupled through the BlackBoard (singleton): Landscape Manager, Rendering, Clouds, Physics, Player, Inverse Kinematics, Camera, House Piece, Audio, Road, Animation, House Lots, User Input, Visual Component, MeshHandler.

Abstracting Client & Server Object Representations
• Server: Visual Component
  – Visual asset representation on the server side.
  – Consolidates the task of updating clients.
  – Used for house pieces, cyclones, landscape, roads, fences, trees, and signs (animated, static, dynamic).
  – Dynamic, run-time properties control update behavior.
• Client: Mesh
  – Mesh properties communicated from the Visual Component.
  – Used to select the rendering algorithm.
  – Groups assets per city for quick de-allocation.

Step by Step Conversion
1. Data-structure focused: is it client or server?
  – Some data structures may have to be split.
2. All data access paths must be segmented into client/server.
  – Cross-boundary calls recast as buffered communication.

Data Access Paths
• Systems access world state via the Blackboard (singleton pattern).
• After separating into Client & Server Blackboards, server systems must be weaned off the Client Blackboard and vice versa.
• Cross-boundary calls recast as buffered communication.

Step by Step Conversion
1. Data-structure focused: is it client or server?
  – Some data structures may have to be split.
2. All data access paths must be segmented into client/server.
  – Cross-boundary calls recast as buffered communication.
3. Initialization & run loop separation.
  – Dependencies on order must be resolved.

Initialization & Run-loop
• Before: Initialize Graphics, Initialize Physics, Init Loading Screen, Load Landscape Data, Initialize Clouds, Create Roads, Place Lots, Place House Pieces, Place Player, Get Camera Position.
• After separation: Initialize Graphics, Init Loading Screen, Initialize Clouds, Get Camera Position (client side); Initialize Physics, Load Landscape Data, Create Roads, Place Lots, Place House Pieces, Place Player (server side).

Step by Step Conversion
1. Data-structure focused: is it client or server?
  – Some data structures may have to be split.
2. All data access paths must be segmented into client/server.
  – Cross-boundary calls recast as buffered communication.
3. Initialization & run loop separation.
  – Dependencies on order must be resolved.
4. Unify cross-boundary communication into one subsystem.
  – This will interface with the network code in the end.

Unify Communication
Run-loop stages (client and server): ReadClient, ReadServer, MovePlayer, Transforms, Animations, Render, Physics/IK, UserInput, WriteClient, WriteServer.
• Single buffer, common format, ordered messages (see the buffered-message sketch after the Separate slide).
• Communicate in one stage: cures the code's addiction to immediate answers.

Step by Step Conversion
1. Data-structure focused: is it client or server?
  – Some data structures may have to be split.
2. All data access paths must be segmented into client/server.
  – Cross-boundary calls recast as buffered communication.
3. Initialization & run loop separation.
  – Dependencies on order must be resolved.
4. Unify cross-boundary communication into one subsystem.
  – This will interface with the network code in the end.
5. Final separation of client & server into two programs.
  – Basic networking code allows communication.

Separate
• Two programs, plus basic synchronous networking code.
• Loops truly asynchronous (previously one called the other).
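The "Unify Communication" slide above calls for a single buffer with a common format and ordered messages, with all cross-boundary traffic handled in one communication stage per cycle. The sketch below illustrates that shape under assumptions of my own: Message, CommBuffer, and MsgType are hypothetical names, and the project's actual code uses generated marshalling rather than copying structs byte-for-byte.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical cross-boundary message: a type tag plus an opaque payload.
// All client<->server traffic flows through this one common format.
enum class MsgType : std::uint16_t { MovePlayer, Transform, Animation, UserInput };

struct Message {
    MsgType           type;
    std::vector<char> payload;
};

// Single ordered buffer. During the frame, systems enqueue instead of calling
// across the client/server boundary and expecting an immediate answer; the
// communication stage drains the buffer once per game cycle.
class CommBuffer {
public:
    template <typename T>
    void enqueue(MsgType type, const T& value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "real code would use generated marshalling instead");
        Message m{type, std::vector<char>(sizeof(T))};
        std::memcpy(m.payload.data(), &value, sizeof(T));
        queue_.push_back(std::move(m));
    }

    // Called once per cycle by the communication subsystem; 'send' hands each
    // message to the network layer in the order it was enqueued.
    template <typename SendFn>
    void flush(SendFn send) {
        for (const Message& m : queue_) send(m);
        queue_.clear();
    }

private:
    std::vector<Message> queue_;   // preserves message order
};

// Usage sketch:
//   struct PlayerMove { float x, y, z; };
//   CommBuffer toServer;
//   toServer.enqueue(MsgType::MovePlayer, PlayerMove{1, 0, 2});
//   toServer.flush([](const Message& m) { /* write to socket */ });
```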
Step by Step Conversion
1. Data-structure focused: is it client or server?
  – Some data structures may have to be split.
2. All data access paths must be segmented into client/server.
  – Cross-boundary calls recast as buffered communication.
3. Initialization & run loop separation.
  – Dependencies on order must be resolved.
4. Unify cross-boundary communication into one subsystem.
  – This will interface with the network code in the end.
5. Final separation of client & server into two programs.
  – Basic networking code allows communication.
6. Optimize!
  – The new configuration changes behavior even for a single player.

Experience
• Positives
  – Smooth transition to multi-user possible.
  – All features/behaviors retained or explicitly disabled.
  – Feature development continued successfully during the transition (performance, feature, and behavioral enhancements on both client and server side, CAVE support, improved visuals, machinima engine, etc.).
• Negatives
  – Resulting code structure not ideal for a client/server application (no MVC framework, some legacy structure).
  – Feature development and client/server work sometimes clash, requiring re-working in client/server fashion.

Initial Optimizations
Basic issues addressed in converting to a massively multi-user networked model.

Multi-User Load Challenges
• Communications
• Graphics rendering
  – Geometry processing
  – Shaders
  – Rendering techniques
• Dynamics computation
  – Physics
  – AI or other application-specific behaviors
  – Animation

Communication
• In a unified system, subsystems can share data and communicate quickly.
• In a client/server model, subsystems on different machines have to rely on messages sent over the network:
  – Data marshalling overhead
  – Data unmarshalling overhead
  – Bandwidth/latency limitations

New Client Knowledge Model
• The stand-alone version had all cities in memory.
  – All clients received updates for activity in all cities.
  – Increased memory & bandwidth use as the environment scales.
• Now: clients are only given the cities they can see.
  – City assets are dynamically loaded onto the client as needed.
  – Reduces the updates the clients need.
• Further challenge: dynamically loading cities without server or client hiccups.

Communication Challenges
• More clients leads to:
  – More activity: physics object movements, road/land animations, house construction.
  – More communication: more per client due to the increase in activity, and more clients for the server to keep up to date.
  – Server communication = activity x clients!
• Dynamically loading large data sets (cities in this case) without server or client hiccups.

Communication Subsystem
• Code generation for data marshalling
  – Fast data structure serialization.
  – Binary transforms for cross-platform use – token- or text-based is too slow.
  – Endian issues resolved during serialization – tested on z10 and Intel (see the serialization sketch below).
• Asynchronous reading and writing
  – Dedicated threads perform communication.
  – Catch up on all messages each game cycle.

Reducing Data Marshalling Time
• Reduce the use of per-player queues:
  – Common messages are sent to a queue associated with the event's city.
  – Players receive the buffers of each city they see, in addition to their player-specific queue (see the per-city queue sketch below).
  – Perform buffer allocation, data marshalling, & copy once for many players.
  – Significantly reduces communication overhead for the server.

Preventing Stutters
• Send smaller chunks of data.
  – Break up large messages.
• Incrementally load cities as a player approaches them.
  – Space out sending assets over many cycles.
  – Large geometry (landscape) is subdivided.
  – If the player arrives, finish all transfers.
• Prevent disk access on the client.
  – Pre-load resources.
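The "Communication Subsystem" slide above notes binary serialization with endian issues resolved during serialization, tested on the big-endian z10 and little-endian Intel hosts. A minimal sketch of one common way to do this, assuming a byte-vector output buffer: always emit integers in a fixed byte order using shifts, so the same bytes decode correctly on either architecture. The function names are illustrative, not the project's generated marshalling code.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Append a 32-bit value in a fixed (big-endian / network) byte order.
// Using shifts instead of memcpy makes the on-the-wire bytes independent of
// whether the host is a big-endian z10 or a little-endian Intel machine.
inline void writeU32(std::vector<unsigned char>& out, std::uint32_t v) {
    out.push_back(static_cast<unsigned char>(v >> 24));
    out.push_back(static_cast<unsigned char>(v >> 16));
    out.push_back(static_cast<unsigned char>(v >> 8));
    out.push_back(static_cast<unsigned char>(v));
}

inline std::uint32_t readU32(const unsigned char* in) {
    return (std::uint32_t(in[0]) << 24) | (std::uint32_t(in[1]) << 16) |
           (std::uint32_t(in[2]) << 8)  |  std::uint32_t(in[3]);
}

// Floats travel as their IEEE-754 bit pattern inside a u32.
inline void writeF32(std::vector<unsigned char>& out, float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    writeU32(out, bits);
}

inline float readF32(const unsigned char* in) {
    std::uint32_t bits = readU32(in);
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```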
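The "Reducing Data Marshalling Time" slide describes marshalling common messages once into a queue keyed by the event's city, then giving each player the buffers of the cities they can see plus their player-specific queue. The sketch below shows that fan-out shape under my own assumptions; Outgoing, postCityEvent, and gatherPerPlayer are hypothetical names, and visibility bookkeeping is reduced to a simple map.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

using CityId   = std::uint32_t;
using PlayerId = std::uint32_t;
using Buffer   = std::vector<unsigned char>;

// Per-city queue idea: marshal each common event once into its city's buffer,
// then fan the finished buffers out to every player who can see that city,
// in addition to each player's private queue.
struct Outgoing {
    std::map<CityId, Buffer>             cityBuffers;    // shared, marshalled once
    std::map<PlayerId, Buffer>           playerBuffers;  // player-specific messages
    std::map<PlayerId, std::set<CityId>> visibleCities;  // who sees which city
};

// Append an already-marshalled event to the queue of the city where it happened.
void postCityEvent(Outgoing& out, CityId city, const Buffer& msg) {
    Buffer& q = out.cityBuffers[city];
    q.insert(q.end(), msg.begin(), msg.end());
}

// Once per cycle: build each player's outgoing stream from the shared city
// buffers they can see plus their own queue. No per-player re-marshalling.
std::map<PlayerId, Buffer> gatherPerPlayer(const Outgoing& out) {
    std::map<PlayerId, Buffer> result;
    for (const auto& [player, cities] : out.visibleCities) {
        Buffer& stream = result[player];
        for (CityId c : cities) {
            auto it = out.cityBuffers.find(c);
            if (it != out.cityBuffers.end())
                stream.insert(stream.end(), it->second.begin(), it->second.end());
        }
        auto pit = out.playerBuffers.find(player);
        if (pit != out.playerBuffers.end())
            stream.insert(stream.end(), pit->second.begin(), pit->second.end());
    }
    return result;
}
```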