InfoSphere Streams for Real Time
Analytics in Financial Services Industry
Krishna Mamidipaka, krishnag@us.ibm.com
Roger Rea, rrea@us.ibm.com
Housekeeping
• We value your feedback: don't forget to complete your evaluation for each session you attend and hand it to the room monitors at the end of each session
• Overall Conference Evaluation will be provided at the General Session on Friday
• Visit the Expo Solutions Centre
• Please remember this is a 'non-smoking' venue!
• Please switch off your mobile phones
• Please remember to wear your badge at all times
Disclaimer
The information regarding potential future products is intended to outline our general product direction, and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remain at our sole discretion.
Agenda
• Financial Markets Business Challenges
• Industry Technical Challenges
• InfoSphere Streams
• Trend Calculator
• Financial Toolkit
• Data Mining in Real Time
• InfoSphere Streams Directions
Firms Must Capitalize on Drivers of Change
Drivers | Implications | Actions
Markets becoming electronic | Speed as a source of Alpha | Accelerate end-to-end marketplace connectivity and execution
Real-time data pressures | Volume is a barrier | Increase capacity to handle current and forecasted volumes
Information availability | Transparency is required | Store, retrieve and distribute comprehensive time series data in a timely manner
Transaction cost pressures | Detailed analysis of the trading process | Reach broader markets by accessing multiple markets
Real-time data pressures
• We are in a technology arms race
• Latency reductions have a clear business value or cost associated with them
• Volumes are increasing exponentially
• For a US equity electronic-trading brokerage, 1 millisecond = $4M in annual revenue (Source: Tabb Group)
The Volume, Complexity & Semantic Depth of data that must be analysed will increase significantly
[Figure: structured data (historical trade data, market data, risk analytics data) flows over an internal message bus to analytics & insight. Tomorrow?: structured & unstructured data (market data, historical trade data, risk analytics data, real-world sensors, blogs & commentary, corporate press reports, government statistics, weather data, video, news feeds, web pages, RSS feeds and other feeds) feeds analytics & insight, leading to information overload]
The Transaction Life Cycle, or latency loop: end-to-end latency is the key to success, and there are no prizes for coming second
[Figure: the latency loop, with stages including investment/trading goals, market data, WAN connectivity, middleware, CEP engines, the trading decision (what to buy/sell), the execution algorithm (VWAP, etc.), the order routing decision (OMS/EMS), matching at the exchanges, and transaction cost analysis; "Speed" is marked between stages]
• Latency measurement is a competitive advantage for delivering Alpha
• End-to-end latency knowledge and a continuous performance road map are required
• Current approaches, based on x86 and networking technologies, are reaching their limits
The Manycore programming challenge
Programmers cannot cope with thousands of threads and complex data flows using existing programming models
• Yesterday: single core, single thread; 100% serial programming
• Today: multicore (2-16 cores), multithreaded (10s of threads); 80/20 serial/parallel programming
• Tomorrow: manycore (32-100s of cores); 20/80 serial/parallel programming; the threading model breaks as complexity exceeds programmer capability
[Figure: block diagrams of single-core, multicore and manycore machines built from cores, RAM, disk (DSK), network (NET) and I/O]
Options for exposing parallelism in a programming model
• Parallelism fully exposed
– Full exposure of machine details
– Only usable by experts
– High performance
– Low productivity
• Partial exposure
– Limits exposure to machine details
– Expands the programmer community
– High performance
– Higher productivity for C/C++-class programmers (bounds checks, pointer checks, strong typing, etc.)
• Parallelism implicit
– No exposure of machine details, e.g., Hadoop/MapReduce, IBM Streams Processing Language
– Usable by a larger number of programmers
– High performance
– High productivity
Time is ripe for a new era of computing
• Emerging trends create the need for new languages
– Scientific programming → Fortran
– Business programming → Cobol
– Systems programming at a higher level → C
– Increased productivity → C++
– Web programming → Java
• Streaming data sources and multicore architectures
– Streams Processing Language
Delivering ‘Continuous Intelligence’ with Powerful Analytics
• Real-time delivery
• Powerful analytics
• Millions of events per second
• Microsecond latency
• Traditional and non-traditional data sources
• Automated options market making:
– Peak throughput of 10 million messages per second
– Mean latency under 100 microseconds across 28 dual quad-core x86 blades
IBM InfoSphere Streams v1.2
Development environment:
• Eclipse IDE
• StreamSight
• Stream Debugger
Runtime environment:
• RHEL v5.3 or v5.4
• x86 multicore hardware
• InfiniBand support
• Up to 125 servers
Toolkits & adapters:
• Connectors to data sources
• Operator Library
• Financial Toolkit
• Mining Toolkit
Front Office 3.0
Scalable stream processing
• InfoSphere Streams provides:
– A programming model and IDE for defining data sources and software analytic modules, called operators, that are fused into processing elements (PEs)
– Infrastructure to support the composition of scalable stream processing applications from these components
– Deployment and operation of these applications across distributed x86 processing nodes when scaled processing is required
– Stream connectivity between the data sources and PEs of a stream processing application
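As a rough analogy only (Python, not Streams Processing Language), the sketch below models three hypothetical operators as generators and "fuses" them by direct composition: tuples pass from operator to operator as plain function calls inside one process, which is the benefit of fusing operators into a single PE instead of connecting them over a transport. The operator names and the (symbol, price, volume) tuple schema are assumptions for illustration; in Streams, fusion is decided at compile time and can be automatic or tuned.

```python
from typing import Iterable, Iterator, Set, Tuple

# Hypothetical tuple schema: (symbol, price, volume)
Tick = Tuple[str, float, int]


def source(ticks: Iterable[Tick]) -> Iterator[Tick]:
    """Source operator: emits each input tuple (stands in for a feed adapter)."""
    for tick in ticks:
        yield tick


def filter_symbols(stream: Iterator[Tick], wanted: Set[str]) -> Iterator[Tick]:
    """Filter operator: passes through only the requested symbols."""
    for symbol, price, volume in stream:
        if symbol in wanted:
            yield symbol, price, volume


def notional(stream: Iterator[Tick]) -> Iterator[Tuple[str, float]]:
    """Functor-style operator: derives price * volume for each tuple."""
    for symbol, price, volume in stream:
        yield symbol, price * volume


def fused_pipeline(ticks: Iterable[Tick], wanted: Set[str]) -> Iterator[Tuple[str, float]]:
    """All three operators run lazily in one process and one thread;
    each tuple is handed downstream by a plain function call."""
    return notional(filter_symbols(source(ticks), wanted))
```

For example, `list(fused_pipeline([("IBM", 130.0, 100), ("XYZ", 10.0, 5), ("IBM", 131.0, 200)], {"IBM"}))` yields `[("IBM", 13000.0), ("IBM", 26200.0)]`. Splitting the same operators across processes or machines would change only how tuples travel between them, not the operator logic.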
Trend Calculator Example
[Figure: Trend File 1, 2 and 3 playback sources, the symbols to be output, and per-symbol algorithm parameters feed the application, which outputs an up/down trend for the requested symbols]
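The slide does not say which trend algorithm the example uses. As one plausible sketch, assuming each symbol's "algo parameters" are a short and a long moving-average window, a symbol is trending up when its short average is above its long average and down when it is below. The function name `trend_calculator` and the parameter layout are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, Tuple


def trend_calculator(
    ticks: Iterable[Tuple[str, float]],
    params: Dict[str, Tuple[int, int]],
) -> Iterator[Tuple[str, str]]:
    """Emit (symbol, "up" | "down" | "flat") for each tick of a requested symbol.

    params maps symbol -> (short_window, long_window), with short_window <= long_window.
    Symbols absent from params are dropped, playing the role of the
    "symbols to be output" input.
    """
    history: Dict[str, Deque[float]] = {}
    for symbol, price in ticks:
        if symbol not in params:
            continue
        short_n, long_n = params[symbol]
        # Keep only the most recent long_n prices for this symbol
        window = history.setdefault(symbol, deque(maxlen=long_n))
        window.append(price)
        if len(window) < long_n:
            continue  # not enough history for the long average yet
        prices = list(window)
        short_avg = sum(prices[-short_n:]) / short_n
        long_avg = sum(prices) / long_n
        if short_avg > long_avg:
            yield symbol, "up"
        elif short_avg < long_avg:
            yield symbol, "down"
        else:
            yield symbol, "flat"
```

The three trend-file playbacks would first need to be merged into one time-ordered stream (for example with `heapq.merge` on a timestamp key); that step is omitted here.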
Streams offers tremendous deployment flexibility
With only a simple recompile of the application, it can run:
• All on one machine, fused into one multithreaded process
• All on one machine, with each operator in its own process
• Each operator in its own process, with each process on its own machine
Financial Services Toolkit
Speeds development of Streams financial domain applications
• Adapters layer: used by the two layers above it and by user-written applications
• Functions layer: used by the top layer and by user-written applications
• Solution Frameworks: “starter” applications that target a particular use case
Adapters, Functions, Utilities
• Financial Information Exchange (FIX) Adapters
– fixInitiator Operator, fixAcceptor Operator, FixMessageToStream Operator, StreamToFixMessage Operator
• WebSphere Front Office for Financial Markets (WFO) Adapters
– WFOSource Operator, WFOSink Operator
• WebSphere MQ Low-Latency Messaging (LLM) Adapters
– MQRmmSink Operator
• Functions:
– Coefficient of Correlation
– “The Greeks” (Put/Call values, Delta, Theta, Rho, Charm, DualDelta, etc.)
• Operators:
– Wrap the open-source QuantLib financial analytics package
– Compute the theoretical value of an option:
• EuropeanOptionValue Operator: 11 different pricing engines
– e.g. Black-Scholes, Integral, Finite Differences, Binomial, Monte Carlo, etc.
• AmericanOptionValue Operator: 11 different pricing engines
– e.g. Barone-Adesi-Whaley, Bjerksund-Stensland, Additive Equiprobabilities, etc.
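The toolkit's Coefficient of Correlation function is only named here, not documented. Assuming it computes the usual Pearson coefficient, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), a minimal Python sketch looks like this. The name `correlation` is hypothetical, not the toolkit's API.

```python
import math
from typing import Sequence


def correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson coefficient of correlation between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all left unnormalized because the 1/n factors cancel
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

In a streaming application this would typically be applied over a sliding window of recent prices for two instruments, re-evaluated as each new tuple arrives.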
Equities Trading “Starter Application”
• Modular design: components are plug-replaceable; extend these or substitute your own
• Demonstrates how trading strategies can be swapped out at runtime without stopping the rest of the application
• TradingStrategy module: looks for opportunities that have specific quality values and trends
• OpportunityFinder module: looks for opportunities and computes quality metrics
• SimpleVWAPCalculator module: computes a running volume-weighted average price (VWAP) metric
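A running VWAP is Σ(price × volume) / Σ(volume) over the trades seen so far. The Python sketch below assumes "running" means cumulative since the start of the stream, kept separately per symbol; the actual SimpleVWAPCalculator module may use a window or a daily reset instead, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def running_vwap(trades: Iterable[Tuple[str, float, float]]) -> Iterator[Tuple[str, float]]:
    """For each (symbol, price, volume) trade, emit that symbol's VWAP so far."""
    pv_sum: Dict[str, float] = {}   # cumulative price * volume per symbol
    vol_sum: Dict[str, float] = {}  # cumulative volume per symbol
    for symbol, price, volume in trades:
        pv_sum[symbol] = pv_sum.get(symbol, 0.0) + price * volume
        vol_sum[symbol] = vol_sum.get(symbol, 0.0) + volume
        if vol_sum[symbol] > 0:  # skip until the symbol has traded nonzero volume
            yield symbol, pv_sum[symbol] / vol_sum[symbol]
```

For example, trades of 10 shares at 100.0 followed by 30 shares at 110.0 give VWAPs of 100.0 and then 4300 / 40 = 107.5. Keeping only two running sums per symbol makes each update constant-time, which suits a continuous pipeline.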
Options Trading “Starter Application”
• DataSources module: consumes incoming data, then formats and maps it for later use
• Pricing module: computes theoretical put and call values
• Decision module: matches theoretical values against incoming market values to identify buying opportunities
[Figure: module flow DataSources (data filtering and preparation of stock price, option price (OptionsPriceFeedData), stock information and risk-free rate (RiskFreeRate)) → Pricing (OptionsValue: theoretical price computation) → Decision (identification of buying opportunities) → Data Sinks]
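To make the Pricing and Decision steps concrete, here is a minimal Python sketch using the closed-form Black-Scholes formula for a European option on a non-dividend-paying stock, one of the engines listed for the EuropeanOptionValue operator. This is not the toolkit's code (which wraps QuantLib). The function names and the `min_edge` threshold in the decision rule are hypothetical.

```python
import math
from typing import Tuple


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(spot: float, strike: float, rate: float, vol: float, t: float) -> Tuple[float, float]:
    """Theoretical (call, put) values of a European option on a
    non-dividend-paying stock; rate and vol are annualized, t is in years."""
    sqrt_t = math.sqrt(t)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    discounted_strike = strike * math.exp(-rate * t)
    call = spot * norm_cdf(d1) - discounted_strike * norm_cdf(d2)
    put = discounted_strike * norm_cdf(-d2) - spot * norm_cdf(-d1)
    return call, put


def is_buying_opportunity(market_price: float, theoretical: float, min_edge: float = 0.05) -> bool:
    """Decision step: flag the option when the market price is below the
    theoretical value by at least min_edge (a hypothetical threshold)."""
    return theoretical - market_price >= min_edge
```

With spot = strike = 100, a 5% risk-free rate, 20% volatility and one year to expiry, the call is worth about 10.45 and the put about 5.57. A call offered at 10.00 would therefore be flagged, while one offered at 10.45 would not.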
Multinational Mutual Funds Manager and Broker
• High-speed market trend calculation system that provides instant insights into market behavior
• Development time for adding new features to the trend calculation system reduced from days to hours using the Streams programming model
• Customizable to run on one server or distributed across many servers to garner more compute power
• Visualization tools for effective live trade monitoring and risk assessment
Transforming the Information Supply Chain to reduce the time to action!
[Figure: a typical (notional) information supply chain for decision making, running from sources through data integration, operational data stores, the warehouse and datamarts to analytical modeling & information (operational reports, dashboards, planning, scorecarding, business process & event management, reports, ad-hoc queries), plotted against elapsed time to action]
Stream Computing:
• Reduces time to action
• Widens the aperture: more context
• Reduces costs
[Figure: the information supply chain (sources, data integration, operational data stores, warehouse, datamarts, analytical modeling & information) annotated with reduced time to action and more context]
Market Surveillance & Fraud applications
[Figure: market feeds and trade data flow into real-time analysis processing, which applies existing business rules (with rule parameters set from the solution user interface), historical enrichment, additional sophisticated analytics and PMML model scoring; alerts and collected results are returned to the solution user interface]
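The slide names existing business rules driven by user-set rule parameters but does not list the rules themselves. As a minimal sketch of one such rule, assuming a hypothetical volume-spike check, the Python below raises an alert when a trade's volume exceeds `multiplier` times the trailing average of the previous `window` trades for the same symbol. Here `window` and `multiplier` stand in for rule parameters. PMML model scoring and historical enrichment are not modeled.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, Tuple


def volume_spike_alerts(
    trades: Iterable[Tuple[str, float]],
    window: int = 20,
    multiplier: float = 5.0,
) -> Iterator[Tuple[str, float, float]]:
    """Emit (symbol, volume, trailing_avg) for trades whose volume exceeds
    multiplier * the average volume of that symbol's previous `window` trades."""
    history: Dict[str, Deque[float]] = {}
    for symbol, volume in trades:
        past = history.setdefault(symbol, deque(maxlen=window))
        if len(past) == window:  # only judge once a full trailing window exists
            avg = sum(past) / window
            if volume > multiplier * avg:
                yield symbol, volume, avg
        past.append(volume)  # the current trade joins the history after the check
```

Checking before appending keeps the suspicious trade out of its own baseline. A deployed rule would also send each alert to the user interface rather than just yielding it.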
What are the key advantages of Streams?
Language built for streaming applications:
• Reusable operators
• Rapid application development
• Continuous “pipeline” processing
Compiling groups of operators into single processes enables:
• Efficient use of cores
• Distributed execution
• Very fast data exchange
• Can be automatic or tuned
• Can be scaled with the push of a button
Use the data that gives you a competitive advantage:
• Can handle virtually any data type
• Use data that is too expensive and time-sensitive for other approaches
Easy to extend:
• Built-in adapters
• Extend with C++ and Java
• Extend running applications
Extremely flexible, high-performance transport:
• Very low latency
• High data rates
IBM InfoSphere Streams directions
Tools:
• Streams Studio enhancements
• Video/audio analytics
• Text/unstructured analytics
• Streams Processing Language improvements
• Native XML support
Runtime:
• High availability
• Expanded platform support
• Performance improvements
Adapters:
• WebSphere MQ, RSS feeds, IBM Mashup Hub, WebSphere Business Events, Front Office
• Oracle, SQL Server, MySQL
[Figure: data in motion (millions of events per second, millisecond latency) combined with existing business information, connecting to Cognos 8 BI, WebSphere Business Events and InfoSphere Warehouse]
All statements regarding IBM's plans, directions, and intent are subject to change or withdrawal without notice. Any reliance on these statements is at the relying party's sole risk and will not create any liability or obligation for IBM.
InfoSphere Streams sessions
Time | Session Title | Location
Thursday May 20, 10:45 AM - 11:35 AM | 3666A: InfoSphere Streams for Real Time Analytics in Financial Services Industry | Marriott Park Hotel, Room 14
Friday May 21, 09:00 AM - 09:50 AM | 3661A: InfoSphere Streams helps Stockholm build Ver 2.0 Traffic Control System | Marriott Park Hotel, Room 13
Friday May 21, 11:30 AM - 12:30 PM | 3692A: InfoSphere Streams at Marine Institute of Ireland: Deep Dive | Marriott Park Hotel, IOD Mini Theatre 3
Wednesday 10AM - 6PM; Thursday 10AM - 5PM; Friday 9AM - 2PM | Demo Room: InfoSphere Streams Demonstrations | Marriott Park Hotel, IOD Demo Room, Station 19
Wednesday 10:30 - 11:30; Thursday 12:30 - 13:00; Thursday 16:30 - 17:00 | Mini Theater on Expo Floor: InfoSphere Streams in Telco; InfoSphere Streams Business Insight; Leverage Warehouse, SPSS with Streams | Marriott Park Hotel, InfoSphere Mini Theater, Expo Floor