InfoSphere Streams for Real Time Analytics in Financial Services Industry

Krishna Mamidipaka, krishnag@us.ibm.com
Roger Rea, rrea@us.ibm.com

Housekeeping
• We value your feedback - don't forget to complete your evaluation for each session you attend and hand it to the room monitors at the end of each session
• Overall Conference Evaluation will be provided at the General Session on Friday
• Visit the Expo Solutions Centre
• Please remember this is a 'non-smoking' venue!
• Please switch off your mobile phones
• Please remember to wear your badge at all times

Disclaimer
The information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Agenda
• Financial Markets Business Challenges
• Industry Technical Challenges
• InfoSphere Streams
• Trend Calculator
• Financial Toolkit
• Data Mining in Real Time
• InfoSphere Streams Directions

Firms Must Capitalize on Drivers of Change
Drivers | Implications | Actions
Markets becoming electronic | Speed as source of Alpha | Accelerate the end-to-end marketplace connectivity and execution
Real-time data pressures | Volume is a barrier | Increase capacity to handle current and forecasted volumes
Information availability | Transparency is required | Store, retrieve and distribute comprehensive time series data in a timely manner
Transaction costs pressures | Detailed analysis of trading process | Access to broader markets by accessing multiple markets

Real time data pressures
• We are in a technology arms race
• Latency reductions with a clear business value or cost associated
• Exponential increases in volumes
• For US equity electronic trading brokerage, 1 millisecond = $4M in annual revenue (Source: Tabb Group)

The Volume, Complexity & Semantic Depth of data to be analysed will increase significantly
[Figure: data sources feeding Analytics & Insight]
• Structured data: Historical Trade Data, Market Data, Risk Analytics Data
• Tomorrow? Structured & Unstructured data: Market Data, Historical Trade Data, Risk Analytics Data, Real World Sensors, Internal Message Bus, Blogs & Commentary, Corporate Press Reports, Government Statistics, Weather Data, Video News Feeds, Web Pages, RSS Feeds + Other Feeds
• Information overload

The Transaction Life Cycle or latency loop – end to end latency is the key to success and there are no prizes for coming second
• Latency measurement is a competitive advantage to deliver Alpha
• End to end latency knowledge and a continuous performance road map is required
• Current approaches reaching limits, based on x86 and networking technologies
[Figure: latency loop. Labels: Investment / trading goals, Transaction Cost Analysis, Market Data, WAN Connectivity, Trading Decision (What to Buy/Sell), Middleware, Execution Algorithm (VWAP, etc.), CEP Engines, Order Routing Decision, OMS/EMS, Matching, Exchanges. Each stage is marked "Speed".]

The Manycore programming challenge
Programmers cannot cope with thousands of threads and complex data flows using existing programming models
• Yesterday: Single Core, Single Thread, 100% Serial Programming
• Today: Multicore (2-16), Multithread (10s), 80/20 Serial/Parallel Programming
• Tomorrow: Manycore (32-100s), 20/80 Serial/Parallel Programming. Threading model breaks as complexity exceeds programmer capability

Options for exposing parallelism in a programming model
• Parallelism Fully Exposed
  – Full exposure of machine details
  – Only usable by experts
  – High performance
  – Low productivity
• Partial Exposure
  – Limits exposure to machine details
  – Expands programmer community
  – High performance
  – Higher productivity for C/C++ class programmers: bounds checks, pointer checks, strong typing, etc.
• Parallelism Implicit
  – No exposure of machine details, e.g., Hadoop/map reduce, IBM Streams Processing Language
  – Usable by larger number of programmers
  – High performance
  – High productivity

Time is ripe for a new era of computing
• Emerging trends create need for new languages
  – Scientific programming: Fortran
  – Business programming: Cobol
  – Systems programming at higher level: C
  – Increased productivity: C++
  – Web programming: Java
• Streaming data sources and multicore architectures
  – Streams Processing Language

Delivering ‘Continuous Intelligence’ with Powerful Analytics
• Real time delivery
• Powerful Analytics
• Millions of events per second
• Microsecond Latency
• Traditional / Non-traditional data sources
• Automated Options Market Making:
  – Peak throughput of 10 million messages per second
  – Mean latency under 100 microseconds across 28 dual quad core x86 blades

IBM InfoSphere Streams v1.2
• Development Environment: Eclipse IDE, StreamSight, Stream Debugger
• Runtime Environment: RHEL v5.3 or v5.4, x86 multicore hardware, InfiniBand support, up to 125 servers
• Toolkits & Adapters: Front Office 3.0, Connectors to data sources, Operator Library, Financial Toolkit, Mining Toolkit

Scalable stream processing
• InfoSphere Streams provides
  – A programming model and IDE for defining data sources and software analytic modules called operators that are fused into process execution units (PEs)
  – Infrastructure to support the composition of scalable stream processing applications from these components
  – Deployment and operation of these applications across distributed x86 processing nodes, when scaled processing is required
  – Stream connectivity between data sources and PEs of a stream processing application

Trend Calculator Example
[Figure: application flow. Inputs: Symbols to be output; Trend File 1 playback, Trend File 2 playback, Trend File 3 playback; Algo Parameters Per Symbol. Output: Up/down trend for requested symbols.]

Streams offers tremendous deployment flexibility
With only a simple re-compile of the application:
• All on one machine, fused into one multi-threaded process
• All on one machine; each operator in its own process
• Each operator in its own process, each process on its own machine

[Figure: Trend Calculator Example]

Financial Services Toolkit
Speeds development of Streams financial domain applications
• Adapters layer used by top two layers and user-written apps
• Functions layer used by top layer and user-written apps
• Solution Frameworks are “starter” applications that target a particular use case

Adapters, Functions, Utilities
• Financial Information Exchange (FIX) Adapters
  – fixInitiator Operator, fixAcceptor Operator, FixMessageToStream Operator, StreamToFixMessage Operator
• WebSphere Front Office for Financial Markets (WFO) Adapters
  – WFOSource Operator, WFOSink Operator
• WebSphere MQ Low-Latency Messaging (LLM) Adapters
  – MQRmmSink Operator
• Functions:
  – Coefficient of Correlation
  – “The Greeks” (Put/Call values, Delta, Theta, Rho, Charm, DualDelta, etc.)
• Operators:
  – Wrap the QuantLib open source financial analytics package
  – Provide operators to compute the theoretical value of an option:
    • EuropeanOptionValue Operator: 11 different analytic pricing engines, e.g. Black Scholes, Integral, Finite Differences, Binomial, Monte Carlo, etc.
    • AmericanOptionValue Operator: 11 different analytic pricing engines, e.g. Barone Adesi Whaley, Bjerksund Stensland, Additive Equiprobabilities, etc.
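
To make the "theoretical value of an option" and "The Greeks" items above more concrete, here is a minimal Python sketch of the closed-form Black Scholes value of a European option, together with Delta. It is not the toolkit's EuropeanOptionValue operator and does not use QuantLib: the function names `norm_cdf` and `black_scholes` and all parameter names are hypothetical, and the sketch assumes a non-dividend-paying stock with constant volatility and a constant risk-free rate.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(spot: float, strike: float, rate: float,
                  vol: float, years: float):
    """Return (call, put, call_delta, put_delta) for a European option.

    Hypothetical helper for illustration only. Assumes no dividends,
    a continuously compounded risk-free rate, and annualised volatility.
    """
    if min(spot, strike, vol, years) <= 0:
        raise ValueError("spot, strike, vol and years must be positive")

    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    discount = math.exp(-rate * years)  # present value of 1 paid at expiry

    call = spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    put = strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)

    call_delta = norm_cdf(d1)        # sensitivity of the call value to spot
    put_delta = call_delta - 1.0     # sensitivity of the put value to spot
    return call, put, call_delta, put_delta


if __name__ == "__main__":
    call, put, call_delta, put_delta = black_scholes(
        spot=100.0, strike=100.0, rate=0.05, vol=0.2, years=1.0)
    print(f"call={call:.4f} put={put:.4f} "
          f"call_delta={call_delta:.4f} put_delta={put_delta:.4f}")
```

A quick sanity check on any such pricer is put-call parity: the call value minus the put value should equal the spot price minus the discounted strike. In a streaming setting, a function like this would be evaluated per incoming tick, which is why the closed form is attractive where it applies; the other engines named above (Binomial, Monte Carlo, and so on) trade speed for generality.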

Equities Trading “Starter Application”
• Modular design
• Components are plug-replaceable: extend these or substitute your own
• Demonstrates how trading strategies may be swapped out at runtime, without stopping the rest of the application
• TradingStrategy module looks for opportunities that have specific quality values and trends
• OpportunityFinder module looks for opportunities and computes quality metrics
• SimpleVWAPCalculator module computes a running volume-weighted average price metric

Options Trading “Starter Application”
• DataSources module consumes incoming data; formats and maps for later use
• Pricing module computes theoretical put and call values
• Decision module matches theoretical values against incoming market values to identify buying opportunities
[Figure: data flow through DataSources (Data Filtering and Preparation), Pricing (Theoretical Price Computation), Decision (Identification of Buying Opportunities) and Data Sinks. Stream labels: Option Price, Stock Price, Stock Information, Stock, RiskFreeRate, Risk Free Rate, OptionsValue, OptionsPriceFeedData.]

Multinational Mutual Funds Manager and Broker
• High speed market trend calculation system that can provide instant insights into the market behavior
• Improved development time from days to hours to add new features to the trend calculation system using the Streams programming model
• Customizable to run on one server or distributed across many servers to garner more compute power
• Visualization tools for effective live trade monitoring and risk assessment

Notional Information Supply Chain for Decision Making (typical information supply chain)
Transforming the Information Supply Chain to reduce the time to action!
[Figure: Elapsed Time to Action across the supply chain. Stages: SOURCES, DATA INTEGRATION, OPERATIONAL DATA STORES, WAREHOUSE, DATAMARTS. Analytical Modeling & Information outputs: Operational Reports, Dashboards, Planning, Scorecarding, Bus Process & Event Mgmt, Reports, Ad-hoc Queries.]

Stream Computing:
• Reduces Time to Action
• Widens the aperture
• Reduces costs
• More context
[Figure: Time to Action across the same supply chain. Stages: SOURCES, DATA INTEGRATION, OPERATIONAL DATA STORES, WAREHOUSE, DATAMARTS. Analytical Modeling & Information outputs: Operational Reports, Dashboards, Planning, Scorecarding, Bus Process & Event Mgmt, Reports, Ad-hoc Queries.]

Market Surveillance & Fraud applications
[Figure: solution architecture. Labels: Market Feeds and Trade Data, Historical Enrichment, Real time analysis processing, Existing business rules, Rule Parameters, Additional sophisticated analytics, PMML Model Scoring, Collected results, Alerts, Solution User Interface.]

What are key advantages of Streams?
Language built for Streaming applications:
• Reusable operators
• Rapid application development
• Continuous “pipeline” processing
Compiling groups of operators into single processes enables:
• Efficient use of cores
• Distributed execution
• Very fast data exchange
• Can be automatic or tuned
• Can be scaled with the push of a button
Use the data that gives you a competitive advantage:
• Can handle virtually any data type
• Use data that is too expensive and time sensitive for other approaches
Easy to extend:
• Built in adaptors
• Extend with C++ and Java
• Extend running applications
Extremely flexible and high performance transport:
• Very low latency
• High data rates

IBM InfoSphere Streams directions
• Tools: Streams Studio enhancements, Video/audio analytics, Text/unstructured analytics, Streams Processing Language improvements, Native XML support
• Runtime: High Availability, Expanded platform support, Performance improvements
• Adapters: WebSphere MQ, RSS feeds, Mashup Hub, WebSphere Business Events, Oracle, SQL Server, MySQL, IBM Mashup Hub
[Figure labels: Cognos 8 BI, WebSphere Business Events, InfoSphere Warehouse, Existing business information, Data in motion, Front Office, Millions of events per second, Millisecond Latency.]
All statements regarding IBM's plans, directions, and intent are subject to change or withdrawal without notice. Any reliance on these statements is at the relying party's sole risk and will not create any liability or obligation for IBM.

InfoSphere Streams sessions
Time | Session | Title | Location
Thursday May 20, 10:45 AM - 11:35 AM | 3666A | InfoSphere Streams for Real Time Analytics in Financial Services Industry | Marriott Park Hotel, Room 14
Friday May 21, 09:00 AM - 09:50 AM | 3661A | InfoSphere Streams helps Stockholm build Ver 2.0 Traffic Control System | Marriott Park Hotel, Room 13
Friday May 21, 11:30 AM - 12:30 PM | 3692A | InfoSphere Streams at Marine Institute of Ireland: Deep Dive | Marriott Park Hotel, IOD Mini Theatre 3
Wednesday 10AM - 6PM; Thursday 10AM - 5PM; Friday 9AM - 2PM | Demo Room | InfoSphere Streams Demonstrations | Marriott Park Hotel, IOD Demo Room Station 19
Wednesday 10:30 - 11:30; Thursday 12:30 - 13:00; Thursday 16:30 - 17:00 | Mini Theater on Expo Floor | InfoSphere Streams in Telco; InfoSphere Streams Business Insight; Leverage Warehouse, SPSS with Streams | Marriott Park Hotel, InfoSphere Mini Theater Expo Floor