Apex Talk

Next Gen Decision Making in <2ms
[Figures: introductory comparison slides. X (predictor): spend amount; Y (response): likelihood of millionaire; simple model vs. advanced model; velocity.]
Hard Metrics
• Latency: < 40 ms; ideally < 16 ms
• Throughput: goal of 2,000 events / second
• Durability: no loss; every message gets exactly one response
• Availability: 99.5% uptime (downtime of 1.83 days / year); ideally 99.999% uptime (downtime of 5.26 minutes / year)
• Scalability: can add resources and still meet latency requirements
• Integration: transparently connected to existing systems – Hardware, Messaging, HDFS

Soft Metrics
• Open Source: all components licensed as open source
• Extensibility: rules can be updated; the model is regularly refreshed
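As a quick check on the availability goals above (worked arithmetic): 99.5% uptime permits 0.005 × 365.25 days ≈ 1.83 days of downtime per year, while 99.999% permits 0.00001 × 365.25 × 24 × 60 ≈ 5.26 minutes per year.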
Performance
Roadmap
Enterprise Readiness
Community
Performance
• Avg. 0.25 ms @ 70k records/sec, w/ 600GB RAM

Thread Local on ~54M events (percentiles in ms):

Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9's   5 9's   6 9's
70k/sec      54,126,122   0.19       1     1     1     2       2       5       6
Durability
• Two physically independent pipelines on the same cluster processing identical data
• For each tuple, we take the best-case (minimum) latency across the two pipelines
– 39 records out of 5.2M exceeded 16 ms in both pipelines
– 173 out of 5.2M exceeded 16 ms in one pipeline but succeeded in the other
• 99.99925% success rate – “Five Nines”
• Average latency of 0.0981 ms
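A minimal sketch of the best-case computation, assuming per-tuple latencies (in ms) have been collected from each pipeline into maps keyed by tuple ID; the class, variable, and sample values are illustrative:

    import java.util.HashMap;
    import java.util.Map;

    public class BestCaseLatency {
        // Counts tuples whose best-case latency across two redundant
        // pipelines still exceeds the latency target.
        public static long countExceeding(Map<String, Double> pipelineA,
                                          Map<String, Double> pipelineB,
                                          double thresholdMs) {
            long exceeded = 0;
            for (Map.Entry<String, Double> e : pipelineA.entrySet()) {
                // Best case: whichever pipeline answered faster for this tuple.
                double best = Math.min(e.getValue(),
                        pipelineB.getOrDefault(e.getKey(), Double.MAX_VALUE));
                if (best > thresholdMs) {
                    exceeded++;
                }
            }
            return exceeded;
        }

        public static void main(String[] args) {
            Map<String, Double> a = new HashMap<>();
            Map<String, Double> b = new HashMap<>();
            a.put("tuple-1", 0.4);  b.put("tuple-1", 22.0); // best case 0.4 ms: success
            a.put("tuple-2", 18.0); b.put("tuple-2", 17.1); // exceeds 16 ms in both
            System.out.println(countExceeding(a, b, 16.0)); // prints 1
        }
    }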
Appendix
Streaming Technologies Evaluated
• Spark Streaming
• Samza
• Storm
• Feedzai
• Infosphere Streams
• Flink
• Ignite
• VoltDB
• Cassandra
• Apex
• Focus on open source
– Drive roadmap
– Competitive advantage for C1
• Of all evaluated technologies, Apache Apex is the only one ready to bring the decision-making solution to production, based on:
– Maturity
– Fault-tolerance
– Enterprise-readiness
– Performance
Stream Processing – Apache Storm
• An open-source, distributed, real-time computation system
– Logical operators (spouts and bolts) form statically parallelizable topologies
– Very high message throughput with very low latency
– Can provide <10 ms end-to-end latency under normal operation
• Basic abstractions provide an at-least-once processing guarantee

Limitations
• Nimbus is a single point of failure
– Rectified by Hortonworks, but not yet available to the public (no timeline for release)
• An upstream bolt/spout failure triggers re-computation of the entire tree
– Parallel independent streams are only possible via separate redundant topologies
• Bolts/spouts share a JVM → hard to debug
• Failed tuples cannot be replayed faster than 1 s
• No dynamic topologies
• Cannot add or remove applications without service interruption
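For concreteness, a minimal sketch of how spouts and bolts compose a Storm topology, using Storm's TopologyBuilder API (package names follow Storm 1.x; TransactionSpout and ScoringBolt are hypothetical classes):

    import org.apache.storm.Config;
    import org.apache.storm.LocalCluster;
    import org.apache.storm.topology.TopologyBuilder;

    public class ExampleTopology {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            // Hypothetical spout emitting transaction events, 2 parallel instances.
            builder.setSpout("events", new TransactionSpout(), 2);
            // Hypothetical scoring bolt, 4 parallel instances, shuffled input.
            builder.setBolt("scorer", new ScoringBolt(), 4)
                   .shuffleGrouping("events");

            // The topology is static once submitted; rebalancing or changing it
            // requires tear-down and resubmission.
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("decisioning", new Config(), builder.createTopology());
        }
    }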
Stream Processing – Apache Flink
• An open-source, distributed, real-time computation system
– Logical operators are compiled into a DAG of tasks executed by Task Managers
– Supports streaming, micro-batch, and batch compute
– Supports aggregate operations on streams (reduce, join, groupBy)
– Capable of <10 ms end-to-end latency with streaming under normal operation
• Can provide exactly-once processing guarantees

Limitations
• Failures trigger a reset of ALL operators to the last checkpoint
– Depends on an upstream message broker to track state
• Operators share a JVM
– A failure in one brings down all tasks sharing that JVM
– Hard to debug
• No dynamic topologies
• Young community, young product
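For concreteness, a minimal sketch of a Flink streaming job using the DataStream API (the socket source and the account/spend record layout are hypothetical):

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ExampleJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Hypothetical input: "accountId,spend" lines from a local socket.
            env.socketTextStream("localhost", 9999)
               .map(new MapFunction<String, Tuple2<String, Double>>() {
                   @Override
                   public Tuple2<String, Double> map(String line) {
                       String[] parts = line.split(",");
                       return Tuple2.of(parts[0], Double.parseDouble(parts[1]));
                   }
               })
               .keyBy(0)   // group by account id (tuple field 0)
               .sum(1)     // running spend total per account (tuple field 1)
               .print();

            env.execute("flink-streaming-sketch");
        }
    }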
Stream Processing – Apache Apex
• An open-source, distributed, real-time computation system on YARN
• Apex is the core system powering DataTorrent, released under the ASF
• Demonstrated high throughput with low latency running a next-generation C1 model (avg. 0.25 ms, max 2 ms, @ 70k records/sec) w/ 600GB RAM
• A true YARN application, developed on the principles behind Hadoop and YARN at Yahoo!
• Mature product
– Core principles of Apex are derived from a proven solution in Yahoo! Finance and Yahoo! Hadoop
– Operability is a first-class citizen in Apex, with a focus on enterprise capabilities
• DataTorrent (Apex) is running on production clusters at Fortune 100 companies
Stream Processing – Apache Apex
Maturity
• Designed to process and manage global data for Yahoo! Finance
– Primary focus is on stability, fault-tolerance, and data management
– The only OSS streaming technology considered that was designed explicitly for the financial world
• Data or computation could never be lost or duplicated
• The architecture could never go down
• The goal was to make it rock-solid and enterprise-ready before worrying about performance
• Data flows across countries – perfect for a use case that requires cross-cluster interaction

Enterprise Readiness
• Advanced support for:
– Encryption, authentication, compression, administration, and monitoring
– Deployment at scale in the cloud and on-prem – AWS, Google Cloud, Azure
• Integrates with a huge set of existing tools:
– HDFS, Kafka, Cassandra, MongoDB, Redis, ElasticSearch, CouchDB, Splunk, etc.
Apex Platform – Summary
• Apex Architecture
– Networks of physically independent, parallelizable operators that scale dynamically
– Dynamic topology modification and deployment
– Self-healing, fault-tolerant, and recoverable
• Durable messaging queues between operators, checkpointed in memory and on disk
• The resource manager is a replicated YARN process that monitors and restarts downed operators
– No single point of failure; highly modular design
– Can specify locality of execution (avoids network and inter-process latency)
• Guarantees at-least-once, at-most-once, or exactly-once processing
[Figure: a directed acyclic graph (DAG) of operators connected by tuple streams, ending in an output stream]
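A minimal sketch of an Apex application using the StreamingApplication/DAG API summarized above; EventInput, Scorer, and ResponseWriter are hypothetical operator classes with hypothetical ports:

    import org.apache.hadoop.conf.Configuration;
    import com.datatorrent.api.DAG;
    import com.datatorrent.api.StreamingApplication;
    import com.datatorrent.api.annotation.ApplicationAnnotation;

    @ApplicationAnnotation(name = "DecisionPipeline")
    public class DecisionPipeline implements StreamingApplication {
        @Override
        public void populateDAG(DAG dag, Configuration conf) {
            // Hypothetical operators: ingest events, score them, emit responses.
            EventInput input = dag.addOperator("input", new EventInput());
            Scorer scorer = dag.addOperator("scorer", new Scorer());
            ResponseWriter output = dag.addOperator("output", new ResponseWriter());

            // Streams carry tuples between operator ports; pinning locality
            // avoids network and inter-process hops (per the locality bullet above).
            dag.addStream("events", input.out, scorer.in)
               .setLocality(DAG.Locality.THREAD_LOCAL);
            dag.addStream("responses", scorer.out, output.in);
        }
    }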
Apex Platform – Overview
Apex Platform – Malhar
Apex Platform – Cluster View
[Figure: cluster view. A Hadoop edge node runs the DT RTS Management Server (REST API, CLI); one Hadoop node runs the RTS App Master in a YARN container; the remaining Hadoop nodes run streaming containers inside YARN containers, each hosting operator partitions (Op1, Op2, Op3) across threads (Thread1 … Thread-N). The diagram labels which components are part of the Community Edition.]
Apex Platform – Operators
• Operators can be dynamically scaled
• Flexible stream configuration
• Parallel Redis / HDHT DAGs
• Separate visualization DAG
• Parallel partitioning
– Durability of data
– Scalability
– Organization for the in-memory store
• Unifiers (see the sketch below)
– Combine statistics from physical partitions
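A minimal sketch of a unifier, assuming Apex's Operator.Unifier interface: it merges the partial sums emitted by the physical partitions of an upstream operator into a single per-window result:

    import com.datatorrent.api.Context.OperatorContext;
    import com.datatorrent.api.DefaultOutputPort;
    import com.datatorrent.api.Operator;

    public class SumUnifier implements Operator.Unifier<Long> {
        public final transient DefaultOutputPort<Long> out = new DefaultOutputPort<>();
        private long sum;

        @Override
        public void process(Long partial) {
            sum += partial; // merge one partition's partial result
        }

        @Override
        public void beginWindow(long windowId) {
            sum = 0; // reset at the start of each streaming window
        }

        @Override
        public void endWindow() {
            out.emit(sum); // emit the combined statistic once per window
        }

        @Override
        public void setup(OperatorContext context) { }

        @Override
        public void teardown() { }
    }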
Dynamic Topology Modification
• Can redeploy new operators and models at run-time!
• Can reconfigure settings on the fly
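As an illustration, such changes can be made from the Apex/DataTorrent CLI at run-time; connect and set-operator-property are documented CLI commands, while the application ID, operator name, and property here are hypothetical:

    dt> connect application_1448033276000_0001
    dt> set-operator-property scorer modelPath /models/credit-v2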
Apex Platform – Failure Recovery
• Physical independence of partitions is critical
• Redundant STRAMs (Streaming Application Masters)
• Configurable window size and heartbeat for low-latency recovery (see the sketch below)
• Downstream failures do not affect upstream components
– Snapshotting only depends on the previous operator, not all previous operators
– Can deploy parallel DAGs with the same point of origin (simpler from a hardware and deployment perspective)
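A sketch of the relevant knobs, assuming Apex's attribute API (these attribute names exist in com.datatorrent.api.Context; the values and the scorer operator are illustrative, and the calls belong inside a populateDAG method such as the earlier sketch):

    // Smaller windows and more frequent checkpoints bound the amount of
    // data replayed after an operator failure.
    dag.setAttribute(Context.DAGContext.STREAMING_WINDOW_SIZE_MILLIS, 100);
    dag.setAttribute(scorer, Context.OperatorContext.CHECKPOINT_WINDOW_COUNT, 10);
    // A shorter heartbeat interval speeds up failure detection.
    dag.setAttribute(Context.DAGContext.HEARTBEAT_INTERVAL_MILLIS, 500);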
Apex Platform – Windowing
• Sliding windows and tumbling windows (see the sketch below)
• Windows based on checkpoints
• No artificial latency
• Used for stats measurement
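A minimal sketch of a tumbling-window count, assuming Apex's BaseOperator and default port classes; state resets at each streaming-window boundary and the aggregate is emitted in endWindow:

    import com.datatorrent.api.DefaultInputPort;
    import com.datatorrent.api.DefaultOutputPort;
    import com.datatorrent.common.util.BaseOperator;

    public class WindowedCounter extends BaseOperator {
        public final transient DefaultOutputPort<Long> counts = new DefaultOutputPort<>();
        public final transient DefaultInputPort<Object> in = new DefaultInputPort<Object>() {
            @Override
            public void process(Object tuple) {
                count++; // one tuple observed in the current window
            }
        };
        private long count;

        @Override
        public void beginWindow(long windowId) {
            count = 0; // tumbling: state resets at each window boundary
        }

        @Override
        public void endWindow() {
            counts.emit(count); // emit the per-window statistic
        }
    }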
Enterprise Readiness
• Apex
– Great UI to monitor, debug, and control system performance
– Fault tolerance and recovery out of the box; no additional setup or improvement needed
• YARN is still a single point of failure; a NameNode failure can still impact the system
– Built-in support for dynamic and automatic scaling to handle larger throughputs
– Native integration with Hadoop, YARN, and Kafka – the next-gen standard at C1
– Mature product
• Principles derived from years at Yahoo! Finance and Yahoo! Hadoop
• Built and planned by deep Hadoop and streaming experts
– Proven performance in production at Fortune 100 companies
Enterprise Readiness
• Storm
– Widely used, but abandoned by its creators at Twitter in favor of Heron in production
• Poor debuggability: topology components are bundled in one process
• Resource demands
– Needs dedicated hardware
– Can’t scale on demand or share usage
• Topology creation/tear-down is expensive; topologies can’t share cluster resources
– Machines must be manually isolated and decommissioned
– Performance in failure scenarios is insufficient for this use case
• Flink
– Operational performance has not been proven
• Only one company (ResearchGate) officially uses Flink in production
– Architecture shares Storm’s fundamental limitations with regard to dynamically scaling operators and topologies, and debuggability
– Performance in failure scenarios is insufficient for this use case
Performance
• Storm
– Meets latency and throughput requirements only when no failures occur
– Resilience to failures is only possible by running fully independent clusters
– Difficult to debug and operationalize complex systems (due to the shared JVM and poor resource management)
• Flink
– Broader toolset than Storm or Apex – ML, batch processing, and SQL-like queries
– Meets latency and throughput requirements only when no failures occur
– Failures reset ALL operators back to the source – resilience is only possible across clusters
– Difficult to debug and operationalize complex systems (due to the shared JVM)
• Apex
– Supports redundant parallel pipelines within the same cluster
– Outstanding latency and throughput even in failure scenarios
– Self-healing independent operators (simple to isolate failures)
– The only framework to provide fine-grained control over data and compute locality
Roadmap – Storm
• Commercial support from Hortonworks, but limited code contributions
• Twitter, Storm’s largest user, has completely abandoned Storm for Heron
• Business Continuity
– Enhance Storm’s enterprise readiness with high availability (HA) and failover to standby clusters
– Eliminate Nimbus as a single point of failure
• Operations
– Apache Ambari support for Nimbus HA node setup
– Elastic topologies via YARN and Apache Slider
– Incremental improvements to the Storm UI to easily deploy, manage, and monitor topologies
• Enterprise readiness
– Declarative composition of spouts, bolts, and data sources into topologies
Roadmap – Flink
• Fine-grained fault tolerance (avoid rollback to the data source) – Q2 2015
• SQL on Flink – Q3/Q4 2015
• Integration with distributed memory storage – no estimated completion date (ECD)
• Use of off-heap memory – Q1 2015
• Integration with Samoa, Tez, Mahout DSL – no ECD
Roadmap – Apex
Roadmap for the next 6 months:
• Support creation of reusable, pluggable modules (topologies)
• Add additional operators to connect to existing technology
– Databases
– Messaging
– Modeling systems
• Add additional SQL-like operations
– Join
– Filter
– GroupBy
– Caching
• Add the ability to create cycles in the graph
– Allows re-use of data for ML algorithms (similar to Spark’s caching)
Roadmap Comparison
• Storm
– The roadmap is intended to bring Storm to enterprise readiness → Storm is not enterprise-ready today, according to Hortonworks
• Flink
– The roadmap brings Flink up to par with Spark and Apex; it does not create new capabilities relative to either
– Spark is more mature for batch processing and micro-batch; Apex is more mature from a streaming standpoint
• Apex
– No need to improve the core architecture; the focus is instead on adding functionality
• Better support for ML
• Better support for a wide variety of business use cases
• Better integration with existing tools
– Stated commitment to letting the community dictate direction. From the incubator proposal:
• “DataTorrent plans to develop new functionality in an open, community-driven way”
Community
• Vendor and community involvement drive roadmap and project growth
• Storm
– Limited improvements to core components of Storm in recent months
– Few focused and active committers
– Actively promoted and supported in public by Hortonworks
• Flink
– Some adoption in Europe, growing response in the U.S.
– 11 active committers, 10 of whom are from Data Artisans (the company behind Flink)
– The community is very young, but there is substantial interest
• Apex
– Wide support network around Apex due to its evolution alongside Hadoop and YARN
– Young but actively growing community: http://incubator.apache.org/projects/apex.html
– Opportunity for C1 to drive growth and define the direction of this product
Streaming Solutions Comparison
• Apex
– Ideal for this use case; meets all performance requirements and is ready for out-of-the-box enterprise deployment
– Committer status for C1 allows us to collaboratively drive the roadmap and product evolution to fit our business need
• Storm
– Great for many streaming use cases, but not the right fit for this effort
– Performance in failure scenarios does not meet our requirements
– Community involvement is waning and there is a limited roadmap for substantial product growth
• Flink
– Poised to compete with Spark in the future, based on community activity and roadmap
– Not ready for enterprise deployment:
• Technical limitations around fault tolerance and failure recovery
• Lack of broad community involvement
• The roadmap only brings it up to par with existing frameworks
New Capabilities Provided by Proposed Architecture
• Millisecond Level Streaming Solution
• Fault Tolerant & Highly Available
• Parallel Model Scoring for Arbitrary Number of Models
• Quick Model Generation & Execution
• Dynamic Scalability based on Latency or Throughput
• Live Model Refresh
• A/B Testing of Models in Production
• System is Self Healing upon failure of components (**)
Decisioning System Architecture – Strengths
• Internal
– Capital One software, running on Capital One hardware, designed by Capital One
• Open source
– Internally maintainable code
• Living Model
– Can be re-trained on current data and updated in minutes, not years
– Offline models can be expanded, re-developed, and deployed to production at will
• Extensible
– Modular architecture with swappable components
• A/B Model Testing in Production
• Dynamic Deployment / Refresh of Models
Hardware

MDC Hardware Specifications
• Server Quantity – 15
• Server Model – Supermicro
• CPU – Intel Xeon E5-2695v2, 2.4 GHz, 12 cores
• Memory – 256GB
• HDD – 5 × 4TB Seagate SATA
• Network Switch – Cisco Nexus 6001, 10GbE
• NIC – 2-port SFP+ 10GbE

MDC Software Specifications
• Hadoop – v2.6.0
• YARN – v2.6.0
• Apache Apex – v3.0
• Linux OS – RHEL v6.7
• Linux OS Kernel – 2.6.32-573.7.1.el6.x86_64
Performance Comparison – Redis vs. Apex-HDHT

Apex-HDHT, Thread Local, ~2M events (percentiles in ms):

Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9's   5 9's   6 9's
70k/sec      1,807,283    0.253      1     1     1     2       2       2       2

Apex-HDHT, Thread Local, ~54M events (percentiles in ms):

Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9's   5 9's   6 9's
70k/sec      54,126,122   0.19       1     1     1     2       2       5       6

Redis, Thread Local, ~2M events (percentiles in ms):

Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9's   5 9's   6 9's
40k/sec      2,214,777    51.651     98    126   381   489     494     495     495

Apex-HDHT, No Locality, ~2M events (percentiles in ms):

Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9's   5 9's   6 9's
8.5k/sec     2,018,057    13.654     16    18    20    21      22      22      22