The Next Generation Application Server – How Event-Based Processing Yields Scalability
Guy Korland, R&D Team Leader

Agenda
• Preview
• The Problem
• The Solution
• Event Containers
• Example
• Benchmarks
• Customer Use Cases
• Challenges
• Summary, Q&A

About me…
• Core Team Leader at GigaSpaces since 2005
• MSc, Technion (Prof. Roy Friedman)
• PhD candidate, Tel Aviv University (Prof. Nir Shavit)
• Lead of Deuce STM (www.deucestm.org), a Java Software Transactional Memory

GigaSpaces XAP – Designed For: Performance, Scalability, Latency

About GigaSpaces eXtreme Application Platform (XAP)
• A middleware platform enabling applications to run on a distributed cluster as if it were a single machine
• 2,000+ deployments, 100+ direct customers, among the top 50 cloud vendors
• "GigaSpaces has saved us significant time and cost" – Phil Ruhlman, CIO, Gallup
• "GigaSpaces exceeded our performance requirements and enabled us to build a flexible, cost-effective infrastructure" – Julian Browne, Virgin Mobile
• "GigaSpaces has allowed us to greatly improve the scalability and performance of our trading platform" – Geoff Buhn, Options Trading Technology Manager, SIG

GigaSpaces Evolution (2000–2009)
From a single space (2000), through partitioning and replication, SLA containers, event containers and load balancing, to a next-generation application server and PaaS/cloud platform (2009).

Not going to talk about…
• Jini (Java SOA)
• Data Grid implementation
• Map-Reduce
• JDBC/JMS/JPA
• Cloud computing
• Batch processing
• Mule ESB
• WAN vs. LAN
• Interoperability between different languages
• TupleSpace model extensions
• JDK improvements (RMI, Reflection, Serialization, Classloading…)

The Problem

Today's Reality – Tier-Based Architecture
• Each tier is a separate technology implementation
• Bottlenecks appear in every tier where state is stored – the architecture can't scale linearly!

Traditional Architecture – the path to complexity… (marktplaats.nl)
Services: Auction (A), Bid (B), Trade (T), Info (I), Timer (T).
Flow: the bidder places a bid → the Bid service validates it → the bid is processed and accepted → the Trade service processes the trade → the bid result is produced → the bidder and the auction owner get the bid result; a Timer service drives the scheduled processing.

Traditional Architecture – the path to complexity…
• A separate failover strategy and implementation for each tier
• Redundancy doubles network traffic
• Bottlenecks are created
• Latency is increased

Do you see the problem?
• Scalability is not linear
• Scalability management becomes a nightmare: every tier of the business stack needs its own backups

There is a huge gap between peak and average loads
[Chart: monthly load, January 2004 – September 2007, on a scale of 0 to 1.3 billion, showing the gap between peak and average loads.]

Bottlenecks, performance, scalability and high-availability headaches lead to:
• Bad publicity
• Revenue loss
• Customer dissatisfaction
• Regulatory penalties

Tier-Based Architecture – Summary
• Historically, the following has been done:
– Tune, tune and tune configuration and code
• Once one bottleneck has been resolved, the next one looms
– Hardware over-provisioning
• To make sure that response times were still acceptable at peak times
– Hardware upgrades
• To get rid of bottlenecks whose origin was impossible to track down
– Alternative patterns
• Avoiding 2-phase commit, using patterns like 'compensating transactions'
• Using active/passive failover to make response times faster, risking – and in fact accepting – potential data loss
• Partitioning the database, but not for size reasons

The Solution

Event Containers

Based on JavaSpaces

The Master-Worker Pattern

GigaSpaces – based on Shared Transactional Memory
• Write – writes a data object
• Notify – generates an event on data updates
• Read – reads a copy of a data object
• Take – reads a data object and deletes it
These combine into the basic interaction styles (a minimal master-worker sketch appears at the end of this section):
• Write + Read → data caching
• Write + Take → master/worker
• Write + Notify → messaging (pub/sub)

Step 1 – Create a Processing Unit
• Collapse the tiers and collocate the services (Auction, Bid, Trade, Info, Timer) into a single Processing Unit
• Manage data in memory
• A single model for design, deployment and management
• No integration effort

Step 2 – Async Persistency
• Collocation of data, messaging and services in memory: minimum latency (no network hops), maximum throughput
• Persist asynchronously for compliance and reporting purposes: storing state, registering orders, etc.

Step 3 – Resiliency
• Primary and backup Processing Units run in SLA-driven containers
• A single, built-in failover/redundancy strategy – fewer integration points mean fewer chances for failure
• Automated, SLA-driven failover/redundancy mechanism
• Continuous high availability and self-healing capability

Step 4 – Scale
• Write once, scale anywhere: linear scalability
• A single monitoring and management engine
• Automated, SLA-driven deployment and management: scaling policy, system requirements, space cluster topology

Step 5 – Auto Scale-Out

Processing Unit – the Unit of Scalability
• A single Processing Unit is scaled out into multiple partitions
• Scaling involves a configuration change only – no code changes!
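To make the Write + Take master-worker combination above concrete, here is a minimal sketch. It assumes the OpenSpaces GigaSpace API and the Data POJO introduced later in the example section (including a convenience constructor); the Master and Worker classes are illustrative, not part of the deck:

    import org.openspaces.core.GigaSpace;

    // Master: feeds unprocessed work items into the space.
    public class Master {
        private final GigaSpace gigaSpace;

        public Master(GigaSpace gigaSpace) { this.gigaSpace = gigaSpace; }

        public void submit(long id) {
            Data data = new Data(id);   // assumes the Data POJO from the example section
            data.setProcessed(false);
            gigaSpace.write(data);
        }
    }

    // Worker: takes unprocessed items (removing them from the space),
    // processes them, and writes the results back for readers.
    public class Worker implements Runnable {
        private final GigaSpace gigaSpace;

        public Worker(GigaSpace gigaSpace) { this.gigaSpace = gigaSpace; }

        public void run() {
            Data template = new Data();
            template.setProcessed(false);
            while (!Thread.currentThread().isInterrupted()) {
                Data data = gigaSpace.take(template, 10000); // blocking take, 10s timeout
                if (data != null) {
                    data.setProcessed(true);
                    gigaSpace.write(data);
                }
            }
        }
    }

Because take() is atomic, many workers can poll the same space concurrently without processing the same item twice – this is exactly what the polling container automates in the next section.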
Processing Unit – the Unit of High Availability
• Primary Processing Unit: business logic in active mode
• Backup Processing Unit: business logic in standby mode
• Synchronous replication from primary to backup

Database Integration – Async Persistency
• The primary replicates synchronously to the backup
• Both replicate asynchronously to a mirror process, which persists to the database through O/R mapping
• On startup, the initial load is read back from the database

XAP = Enterprise-Grade Middleware
• A scale-out application server: end-to-end scale-out middleware for web, data, messaging and business logic
• Space-Based Architecture – designed for scaling stateful applications in memory
• Proven performance, scalability, low latency and reliability
• SLA-driven
• A unique database scaling solution that fits cloud environments: an In-Memory Data Grid with O/R mapping support
• Supports the major enterprise languages: Java, .NET, C++

Event Containers

Built-in Event Containers
• Polling Container – takes events from the space
• Notify Container – is notified by the space on writes

Polling Container
• Used for point-to-point messaging
• The container polls the space for events (take)
• Comparable with the way Ajax works

Notify Container
• Used for publish-subscribe messaging
• The space notifies the container (notify)

Typical Application

Service Grid Summary
A powerful universal container: Java/.NET/C++, distributed, fault tolerant, object based, transactional, publish/subscribe.

Example

The POJO-Based Data Domain Model
@SpaceClass marks the class as a space entry and carries class-level attributes such as FIFO and Persistent; @SpaceId defines the key for the entry:

    @SpaceClass(fifo = true)
    public class Data {
        private String id;
        private boolean processed;

        @SpaceId(autoGenerate = true)
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }

        public boolean isProcessed() { return processed; }
        public void setProcessed(boolean processed) { this.processed = processed; }
    }

Data Processor Service Bean
@SpaceDataEvent marks the method to be called when an event is triggered; returning the modified object updates the data in the space:

    public class DataProcessor {
        @SpaceDataEvent
        public Data processData(Data data) {
            data.setProcessed(true);
            return data; // updates the object in the space
        }
    }
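The Spring XML wiring on the next slide is one option; OpenSpaces also supports annotation-driven wiring of the same polling container. A minimal sketch, assuming the org.openspaces.events annotations (the template method name is illustrative):

    import org.openspaces.events.EventDriven;
    import org.openspaces.events.EventTemplate;
    import org.openspaces.events.adapter.SpaceDataEvent;
    import org.openspaces.events.polling.Polling;

    @EventDriven
    @Polling
    public class DataProcessor {

        // Event template: match only Data objects that are not processed yet.
        @EventTemplate
        public Data unprocessedData() {
            Data template = new Data();
            template.setProcessed(false);
            return template;
        }

        @SpaceDataEvent
        public Data processData(Data data) {
            data.setProcessed(true);
            return data; // written back to the space
        }
    }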
Wiring the Data Processor Service Bean through Spring
The polling container is wired with the event template (match Data objects whose processed flag is false) and the event listener (delegate to the dataProcessor bean):

    <bean id="dataProcessor"
          class="com.gigaspaces.pu.example1.processor.DataProcessor" />

    <os-events:polling-container id="dataProcessorPollingEventContainer"
                                 giga-space="gigaSpace">
        <os-events:tx-support tx-manager="transactionManager"/>
        <!-- the event template -->
        <os-core:template>
            <bean class="org.openspaces.example.data.common.Data">
                <property name="processed" value="false"/>
            </bean>
        </os-core:template>
        <!-- the event listener -->
        <os-events:listener>
            <os-events:annotation-adapter>
                <os-events:delegate ref="dataProcessor"/>
            </os-events:annotation-adapter>
        </os-events:listener>
    </os-events:polling-container>

Data Feeder

    public class DataFeeder {
        public void feed() {
            Data data = new Data(counter++);
            data.setProcessed(false);
            gigaSpace.write(data); // feed data into the space
        }
    }

Remoting – Taking One Step Forward (Event)

Remoting – Taking One Step Forward (Reduce)

Remoting – the IDataProcessor Service API

    public interface IDataProcessor {
        // Process a given Data object
        Data processData(Data data);
    }

Remoting – the DataProcessor Service

    @RemotingService
    public class DataProcessor implements IDataProcessor {
        public Data processData(Data data) {
            data.setProcessed(true);
            return data;
        }
    }

Remoting – the Data Feeder

    public class DataFeeder {
        private IDataProcessor dataProcessor;

        public void setDataProcessor(IDataProcessor dataProcessor) {
            this.dataProcessor = dataProcessor;
        }

        public Data feed() {
            Data data = new Data(counter++);
            return dataProcessor.processData(data); // remoting call
        }
    }

Benchmarks

Scale-Up Throughput Benchmark – Physical Deployment Topology
• Embedded (one machine, one process): an X4450 running the client and GigaSpaces with 8 spaces
• Remote (multiple machines, multiple processes): a white-box client connected over a switched Ethernet LAN to two X4450s running GigaSpaces with 4 spaces each, one per GSC

Scale-Up Throughput Benchmark – Embedded Mode
[Chart: X4450, embedded space, throughput vs. number of client threads (1–30), 8 partitions.]
• 1.8 million reads/sec!
• 1.1 million writes/takes per second!

Scale-Up Throughput Benchmark – Remote Mode
[Chart: X4450, remote space, throughput vs. number of client threads (1–58), 4 partitions.]
• ~90,000 reads/sec
• ~45,000 writes/takes per second
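For readers who want to reproduce numbers of this kind, here is a minimal sketch of a multi-threaded write-throughput loop. It is illustrative only – not the harness GigaSpaces used – and assumes the GigaSpace API plus the Data convenience constructor from the example section:

    import org.openspaces.core.GigaSpace;

    public class WriteThroughput {
        // Measures write throughput from N client threads, as in the charts above.
        public static long run(final GigaSpace gigaSpace, int threads, final int opsPerThread)
                throws InterruptedException {
            Thread[] workers = new Thread[threads];
            long start = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                workers[t] = new Thread(new Runnable() {
                    public void run() {
                        for (int i = 0; i < opsPerThread; i++) {
                            Data data = new Data(i); // assumed Data constructor
                            data.setProcessed(false);
                            gigaSpace.write(data);
                        }
                    }
                });
                workers[t].start();
            }
            for (Thread w : workers) {
                w.join();
            }
            long elapsedNanos = System.nanoTime() - start;
            // total operations divided by elapsed time, in operations per second
            return (long) threads * opsPerThread * 1000000000L / elapsedNanos;
        }
    }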
Customer Use Cases

Web Container Grid

Web Application – Pet Clinic

Classic Architecture – Step 1 – Request Submission
1. A user click submits the request
2. The Apache load balancer routes the request to a web Processing Unit
3. The web PU invokes the service: a task is sent through a proxy into the data grid (primary and backup partitions, with synchronous replication between them and asynchronous mirroring to the database), where a collocated service bean gets the request and executes it

Classic Architecture – Step 2 – Retrieve Results
1. Results are returned from the primaries to the web Processing Unit
2. A reducer in the web PU aggregates the results and the page is generated
3. The user gets the page back through the load balancer

Web Application Benchmark Results – Capacity
[Charts: Pet Clinic web benchmark, latency (ms) vs. number of concurrent users for 1, 2 and 3 servers; one chart covers light loads of tens of users, the second covers loads of 50 up to 5,000 users.]

Game Server

Space-Based Architecture – Game Server
• Scaling out game servers
• A table feeder loads the GameTable objects into the partitioned spaces
• Publishers randomly update the game tables; the game servers intercept the update events (notify) and query the spaces
• A game table directory provides game-table search, player search and pub/sub messaging

Space-Based Architecture – Game Server (Deployment)
• Game servers and publisher servers run on the GigaSpaces Service Grid (Java runtime)
• Each partitioned space holds GameTable objects, with a physical backup per partition
• A continuous query runs per user (notify/query)
• The benchmark uploads 30,000 players for 6,000 tables while publishers randomly update the game tables

Dynamic Repartitioning and Load Sharing I
Indexed notify/query templates over the partitioned spaces, running in an SLA-driven container.

Dynamic Repartitioning and Load Sharing II
Partitions are spread across additional SLA-driven containers as load grows.

Scaling
• 2,000 tables / 10,000 players: throughput ~6K/sec
• 4,000 tables / 20,000 players: throughput ~12K/sec
• 6,000 tables / 30,000 players: throughput ~18K/sec
Each partitioned space has a backup space; partitions are spread across SLA-driven containers.

Challenges
• Distributed queries (joins, subqueries…), e.g.:
– select * from Person p where p.name in (select * from Managers)
– select * from Person p, Address a where p.addressId = a.addressId and a.street = 'MyStreet'
• Dynamic partitioning – consistent hashing, buckets; updating the routing tables (proxy); live queries
• Distributed transactions
• Cluster-of-clusters data integration over the WAN
• Integration with external data sources, e.g. a database (a bottleneck)

Challenges (continued)
• Indexing for complex event queries / blocking queries (notify)
• Cluster status consensus (who is alive?)
• Even distribution of data (see the routing sketch below)
• Technical: how do you maintain 100K TCP connections?
• Cloud computing?
• Data sets too big for memory – LRU/LFU caching?
• A scalable distributed lookup service
• Network split-brain
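Several of these challenges – even data distribution and dynamic partitioning in particular – come down to how objects are routed to partitions. A minimal sketch of content-based routing, assuming the GigaSpaces @SpaceRouting annotation and hash-based routing (target partition = hash of the routing value modulo the number of partitions); the Trade class is illustrative:

    import com.gigaspaces.annotation.pojo.SpaceClass;
    import com.gigaspaces.annotation.pojo.SpaceId;
    import com.gigaspaces.annotation.pojo.SpaceRouting;

    @SpaceClass
    public class Trade {
        private String id;
        private String symbol;

        @SpaceId(autoGenerate = true)
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }

        // All trades on the same symbol land in the same partition,
        // so queries by symbol never have to fan out across the cluster.
        @SpaceRouting
        public String getSymbol() { return symbol; }
        public void setSymbol(String symbol) { this.symbol = symbol; }
    }

A skewed routing value (a few very hot symbols) defeats even distribution – which is exactly why consistent hashing and bucket-based schemes appear in the challenges list above.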
Thank You! Q&A

Appendix

SLA-Driven Deployment
The Processing Unit bundles the service bean definitions; its SLA specifies:
• Failover policy
• Scaling policy
• System requirements
• Space cluster topology

Continuous High Availability
On failure, processing fails over to the backup.

Dynamic Partitioning = Dynamic Capacity Growth
Example: three partitions (each with a primary and a backup) start on 2G VMs; capacity grows from 2G per VM toward a maximum of 6G. At some point VM 1's free memory drops below 20% – it is about time to move: Partition 1 is moved to another GSC on a 4G VM, and after the move its data is recovered from the running backup. Later, when Partition 2 needs to increase its capacity, it is moved the same way.

Executors

Task Executors – Task Execution
Executing a task is done using the execute method (a sketch of a possible MyTask appears later in this appendix):

    AsyncFuture<Integer> future = gigaSpace.execute(new MyTask(2));
    int result = future.get();

Task Executors – Task Routing
Routing a task can be done in three ways:
1. Using the task itself
2. Passing a POJO to the execute method
3. Specifying a routing parameter in the execute method

Task Executors – DistributedTask Execution
Executing a distributed task is done using the same execute method; the task runs on all partitions and a reducer aggregates the partial results:

    AsyncFuture<Integer> future = gigaSpace.execute(new MyDistTask());
    int result = future.get();

Task Executors – DistributedTask Routing
Routing a distributed task can be done:
1. In the same ways as with the plain Task interface
2. By broadcasting
3. By specifying a number of routing parameters in the execute method

Service Executors

IMDG Operations

IMDG Basic Operations
Write, WriteMultiple, Read, ReadMultiple, Take, TakeMultiple, Notify, Execute

IMDG Access – Space Operations – Write
The write operation writes a new object to a space: instantiate an object, set fields as necessary, write the object to the space.

    Auction auction = new Auction();
    auction.setType("Bicycle");
    gigaSpace.write(auction);

IMDG Access – Space Operations – Read
The read operation reads an object from a space. A copy of the object is returned; the original remains in the space. Build a template/query (more on this later), then read a matching object:

    Auction template = new Auction();
    Auction returnedAuction = gigaSpace.read(template);

Object SQL Query Support
Supported operations and queries:
• =, <>, <, >, >=, <=, [NOT] LIKE, IS [NOT] NULL, IN
• GROUP BY – performs DISTINCT on the POJO properties
• ORDER BY (ASC | DESC)

    SQLQuery rquery = new SQLQuery(MyPojo.class,
            "firstName rlike '(a|c).*' or age > 0 and lastName rlike '(d|k).*'");
    Object[] result = space.readMultiple(rquery);

Dynamic query support:

    SQLQuery query = new SQLQuery(MyClass.class,
            "firstName = ? or lastName = ? and age > ?");
    query.setParameters("david", "lee", 50);

Supported options via the JDBC API:
• COUNT, MAX, MIN, SUM, AVG, DISTINCT
• Blob and Clob, rownum, sysdate, table aliases
• Join with 2 tables

Not supported:
• HAVING, VIEW, TRIGGERS, EXISTS, BETWEEN, NOT, CREATE USER, GRANT, REVOKE, SET PASSWORD, CONNECT USER, ON
• NOT NULL, IDENTITY, UNIQUE, PRIMARY KEY, FOREIGN KEY/REFERENCES, NO ACTION, CASCADE, SET NULL, SET DEFAULT, CHECK
• UNION, MINUS, UNION ALL
• STDEV, STDEVP, VAR, VARP, FIRST, LAST
• LEFT or RIGHT [INNER]/[OUTER] JOIN
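The MyTask used in the executor snippets above is not shown in the deck. Here is a minimal sketch of what it might look like, assuming the OpenSpaces Task interface, @TaskGigaSpace injection and @SpaceRouting-based routing; the multiplier logic is illustrative:

    import org.openspaces.core.GigaSpace;
    import org.openspaces.core.executor.Task;
    import org.openspaces.core.executor.TaskGigaSpace;
    import com.gigaspaces.annotation.pojo.SpaceRouting;

    public class MyTask implements Task<Integer> {

        @TaskGigaSpace                  // injected with the collocated space
        private transient GigaSpace gigaSpace;

        private final int multiplier;

        public MyTask(int multiplier) {
            this.multiplier = multiplier;
        }

        @SpaceRouting                   // decides which partition runs the task
        public Integer routing() {
            return multiplier;
        }

        public Integer execute() throws Exception {
            // Runs collocated with the data in the target partition.
            int matching = gigaSpace.count(new Data());
            return matching * multiplier;
        }
    }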
IMDG Access – Space Operations – Take
The take operation takes an object from a space; the matched object is removed from the space. Build a template/query, then take a matching object:

    Auction template = new Auction();
    Auction removedAuction = gigaSpace.take(template);

IMDG Access – Space Operations – Update
Update is equivalent to performing take and write, executed in a single atomic call:

    AuctionItem item = new AuctionItem();
    item.setType("Bicycle");
    gigaSpace.write(item);

    item = gigaSpace.read(item);
    item.setType("Motorbike");
    Object returnedObject = space.update(item, null, Lease.FOREVER, 2000L,
            UpdateModifiers.UPDATE_OR_WRITE);

IMDG Access – Space Operations – Batch API
Apart from the single-object methods, GigaSpaces also provides batch methods:
• writeMultiple – writes multiple objects
• readMultiple – reads multiple objects
• updateMultiple – updates multiple objects
• takeMultiple – reads multiple objects and deletes them
Notes:
• Performance of the batch operations is generally higher – they require only one call to the space
• They can be used with template matching or SQLQuery

writeMultiple writes the specified objects to the space:

    Auction[] auctions = new Auction[] { new Auction(10), new Auction(20) };
    gigaSpace.writeMultiple(auctions, 100);

readMultiple reads all the objects matching the specified template from the space:

    Auction template = new Auction();
    Auction[] auctions = gigaSpace.readMultiple(template, 100);

takeMultiple takes all the objects matching the specified template from the space:

    Auction template = new Auction();
    Auction[] auctions = gigaSpace.takeMultiple(template, 100);

updateMultiple updates a group of specified objects:

    Auction[] auctions = new Auction[] { new Auction(10), new Auction(20) };
    auctions = gigaSpace.updateMultiple(auctions, new long[] { 100, 100 });

IMDG Summary
A powerful shared-memory service: distributed, fault tolerant, object based, transactional, with single and batch operations.
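Since the batch operations accept a SQLQuery in place of a template, the two appendix threads combine naturally. A minimal sketch, assuming the Auction class from the slides above and a gigaSpace reference in scope:

    import com.j_spaces.core.client.SQLQuery;

    // Read up to 100 bicycle auctions in a single space call.
    SQLQuery<Auction> query = new SQLQuery<Auction>(Auction.class, "type = ?");
    query.setParameters("Bicycle");
    Auction[] bicycles = gigaSpace.readMultiple(query, 100);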