Streaming Event Processing

advertisement
Orleans Streaming
Gabriel Kliot
Virtual Meetup - 05/22/2015
1
Real-life Scenarios from Developer’s Perspective
• Make an Actor call and get back a sequence of events (e.g., progress
report for background task)
• Orleans Streams let send or return a sequence of events that will be
generated over time – ”remote IEnumerable”
• IoT – millions of device generates events that need to be processes in
a reliable and scalable way, in the context of each device
• Orleans Streams handle storing, retrieval, delivery and scale processing
• Chat Room – publish chat events without knowing who is in the room
• Orleans Streams decouple producers from consumers
2
What’s Out There?
• Durably stream data stores: Kafka, EventHubs, ServiceBus, Azure Queues
• Stream Compute systems: Azure Stream Analytics, Storm, Spark Streaming
• Unified data-flow graph of operations that are applied in the same way to all stream items
• Targets uniform data and similar set of transformations, filtering or aggregation operations
• Optimized for large volume of similar items with similar, and usually limited in terms of expressiveness, processing
• We need to support Interactive Workloads
•
•
•
•
With diverse data
And diverse processing
Potentially stateful processing
Support side effects and external calls
4
Dynamic Scenarios
• Universe of Users
• Different processing logic for each user, within their context:
•
•
•
•
•
Some are interested in weather (different locations)
Some in sport events (different teams)
Some in particular flight
Some processing logic may depend on external conditions, not part of the data stream
Some processing may result in external HTTP call with side effects
• Processing topology changes
• Users come and go
• Interests (subscriptions) come and go
• Processing logic evolves and changes dynamically
• Cheating update example
5
Requirements
1.
Flexible stream processing logic
• No limitations on how to express the processing logic: data-flow, functional, declarative, general imperative
2.
Support for highly dynamic topologies
• Steam processing topology can evolve at runtime, without redeploy
• Stream.GroupBy(x=> x.key).Select(x=> x+2).AverageWindow(x, 5 sec).Where(x=> x > 0.8)
• Want to change the Where threshold and add new Select
• New streams come and go, new processing elements come and go
3.
Fine-grained stream granularity
• Each node and link in the topology is uniquely addressable and can be have different implementations
4.
Distribution
•
•
•
•
•
Scalability - supports large number of streams and compute elements
Elasticity - allows to add/remove resources to grow/shrink based on load
Reliability - be resilient to failures
Efficiency - use the underlying resources efficiently
Responsiveness - enable near real time scenarios
6
Streaming Event Processing
Event source
Processing elements
• Imperative code
• LINQ query
• F#
• StreamInsight
• …
Output
Virtual stream
• ServiceBus
• Azure Queue
• Best effort
• In-memory replicated
• …
7
“Virtual Streams”
• Virtual stream programming abstraction
•
•
•
•
Stream is logical abstraction
Stream always exits, it does not need to be created and cannot fail
Identified by GUID and optional string (stream namespace)
Stream subscriptions are durable
• Unified abstraction over many behaviors and semantics
• Different Delivery guarantees
• Durable queues, pub/sub, best-effort one-way channels, etc.
• Different Ordering guarantees
• Behavior is determined by configuration
• Provider model allows applications to add new stream types seamlessly
8
“Distributed RX” – Decoupled in Time and Space
Producer
IObserver
Virtual Stream
IObservable
Consumer
(IObserver)
Consumer observes the stream, so consumer implements IObserver
The stream looks like a consumer (IObserver) to the producer,
and like a producer (IObservable) to the consumer
9
Virtual Stream Usage API
IStreamProvider streamProvider = base.GetStreamProvider("SimpleStreamProvider");
IAsyncStream<T> stream = streamProvider.GetStream<T>(Guid, "MyStreamNamespace");
IAsyncStream<T> is a handle, like GrainReference, implements IAsyncObserver<T> and
IAsyncObservable<T>
• Producer can produce by calling any of the IAsyncObserver<T> methods
Task OnNextAsync(T item)
Task OnCompleteAsync()
Task OnErrorAsync(Exception ex)
• Consumer can subscribe to the stream
StreamSubscriptionHandle<T> handle = await stream.SubscribeAsync(IAsyncObserver<T>)
await handle.UnsubsribeAsync()
• Subscription is for a grain, not for an activation
10
More Capabilities
• Multiplicity
• Any* number of consumers
• Any* number of producers
• Can subscribe multiple times, manage each subscription separately
• Explicit Subscription based on some trigger/message/external event
• Implicit Subscription
• Stream data will automatically activate a new grain which will be automatically
subscribed
• Mark a grain class with [ImplicitStreamSubscription("MyStreamNamespace")]
• Will automatically subscribe this grain to a matching stream
• Grain with GUID XXX will be subscribed to stream with GUID XXX and namespace
"MyStreamNamespace"
* Practically our current implementation is limited to 100s of consumers/producers per stream. It can be extended by using a more sophisticated persistence scheme.
11
Stream Providers
• Extensibility point to streams behavior and semantics
• Different transports
• TCP, Azure Queues, Event Hubs, …
• Different message delivery semantics
• best effort, queued, at most once, at least once, …
• Different batching
• Different backpressure
• We currently have
• Simple Message Stream Provider
• Azure Queue Provider
• In progress: Event Hub Stream Provider
12
Semantics
• Subscription
• Sequential Consistency between Subscribe and future production
• Message Delivery
• Depends on Stream Provider
• Message Ordering
• Depends on Stream Provider
• Application defined order
• Pass StreamSequenceToken together with the event when producing it
• Will arrive to consumer
• Use app logic to reason and reconstruct order
13
Rewindable Streams
• Some queuing technologies allow “going back in time”
• Expose that capability via a notion of “Rewindable Stream”
• Subscribe from a certain point in time
• stream.SubscribeAsync(IAsyncObserver<T>, StreamSequenceToken)
• Useful for recovery scenarios
• Periodically checkpoint your processing grain state
• Upon recovery, re-subscribe from the past
• StreamSubscriptionHandle<T>.ResumeAsync(IAsyncObserver<T>, StreamSequenceToken)
• Jay Kreps “Questioning the Lambda Architecture”
• http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html
• No implementation yet, coming…
14
Implementation
• Events are delivered to grains or clients via regular Orleans messaging
• Delivered to internal grain interface, called grain extension, that invokes the IAsyncObserver methods
• Reliable Pub-Sub service matches producers with subscribers and stores their
identities in persistent storage
• Persistent Stream Provider – common base class for implementing queue-based
stream providers
• Pulling agents - distributed "micro-service“* - partitioned, highly available, and elastic
distributed component
• StreamQueueMapper - list of queues, mapping streams to queues
• StreamQueueBalancer - balancing queues across silos and agents
• IQueueCache - decouple delivery to different streams and different consumers
• Backpressure
• Delivery from cache to consumers – via regular Orleans messaging, one at a time
• From queue into the cache – built-in backpressure mechanism in the IQueueCache
* Calling Pulling agents a “Micro-Serving” was a joke of course
15
Extensibility
• Via Provider Configuration
• # queues, queue names, cache size, queue balancer type, queue mapper, …
• Ask for more, please!
• Queue Adapter
• Plug in model for new queueing technologies
• Abstracts the actual physical queue
• Plugs into base PersistentStreamProvider
• Allows to re-use pulling agents, cache, backpressure
16
What Is Next
• Simple Event Hub Adapter – work in progress
• Rewindable Event Hub Adapter – work in progress
• Scalability/performance improvements
• Support 1000s of consumers/producers per stream
• Pub-Sub now limited to 100s consumers/producers per stream
• Multicast on the messaging layer
• Providers for more queueing technologies
• Kafka, SQS, ServiceBus, RabbitMQ
• Content-based implicit subscription
• Custom data formatters
• Higher level programming model on top of Virtual Streams
• Declarative data-flow
• Your help/input
17
The END
Gabi Kliot
gkliot@microsoft.com
https://github.com/gabikliot
https://www.linkedin.com/in/gabrielkliot
18
Download