Orleans Streaming Gabriel Kliot Virtual Meetup - 05/22/2015 1 Real-life Scenarios from Developer’s Perspective • Make an Actor call and get back a sequence of events (e.g., progress report for background task) • Orleans Streams let send or return a sequence of events that will be generated over time – ”remote IEnumerable” • IoT – millions of device generates events that need to be processes in a reliable and scalable way, in the context of each device • Orleans Streams handle storing, retrieval, delivery and scale processing • Chat Room – publish chat events without knowing who is in the room • Orleans Streams decouple producers from consumers 2 What’s Out There? • Durably stream data stores: Kafka, EventHubs, ServiceBus, Azure Queues • Stream Compute systems: Azure Stream Analytics, Storm, Spark Streaming • Unified data-flow graph of operations that are applied in the same way to all stream items • Targets uniform data and similar set of transformations, filtering or aggregation operations • Optimized for large volume of similar items with similar, and usually limited in terms of expressiveness, processing • We need to support Interactive Workloads • • • • With diverse data And diverse processing Potentially stateful processing Support side effects and external calls 4 Dynamic Scenarios • Universe of Users • Different processing logic for each user, within their context: • • • • • Some are interested in weather (different locations) Some in sport events (different teams) Some in particular flight Some processing logic may depend on external conditions, not part of the data stream Some processing may result in external HTTP call with side effects • Processing topology changes • Users come and go • Interests (subscriptions) come and go • Processing logic evolves and changes dynamically • Cheating update example 5 Requirements 1. Flexible stream processing logic • No limitations on how to express the processing logic: data-flow, functional, declarative, general imperative 2. Support for highly dynamic topologies • Steam processing topology can evolve at runtime, without redeploy • Stream.GroupBy(x=> x.key).Select(x=> x+2).AverageWindow(x, 5 sec).Where(x=> x > 0.8) • Want to change the Where threshold and add new Select • New streams come and go, new processing elements come and go 3. Fine-grained stream granularity • Each node and link in the topology is uniquely addressable and can be have different implementations 4. Distribution • • • • • Scalability - supports large number of streams and compute elements Elasticity - allows to add/remove resources to grow/shrink based on load Reliability - be resilient to failures Efficiency - use the underlying resources efficiently Responsiveness - enable near real time scenarios 6 Streaming Event Processing Event source Processing elements • Imperative code • LINQ query • F# • StreamInsight • … Output Virtual stream • ServiceBus • Azure Queue • Best effort • In-memory replicated • … 7 “Virtual Streams” • Virtual stream programming abstraction • • • • Stream is logical abstraction Stream always exits, it does not need to be created and cannot fail Identified by GUID and optional string (stream namespace) Stream subscriptions are durable • Unified abstraction over many behaviors and semantics • Different Delivery guarantees • Durable queues, pub/sub, best-effort one-way channels, etc. • Different Ordering guarantees • Behavior is determined by configuration • Provider model allows applications to add new stream types seamlessly 8 “Distributed RX” – Decoupled in Time and Space Producer IObserver Virtual Stream IObservable Consumer (IObserver) Consumer observes the stream, so consumer implements IObserver The stream looks like a consumer (IObserver) to the producer, and like a producer (IObservable) to the consumer 9 Virtual Stream Usage API IStreamProvider streamProvider = base.GetStreamProvider("SimpleStreamProvider"); IAsyncStream<T> stream = streamProvider.GetStream<T>(Guid, "MyStreamNamespace"); IAsyncStream<T> is a handle, like GrainReference, implements IAsyncObserver<T> and IAsyncObservable<T> • Producer can produce by calling any of the IAsyncObserver<T> methods Task OnNextAsync(T item) Task OnCompleteAsync() Task OnErrorAsync(Exception ex) • Consumer can subscribe to the stream StreamSubscriptionHandle<T> handle = await stream.SubscribeAsync(IAsyncObserver<T>) await handle.UnsubsribeAsync() • Subscription is for a grain, not for an activation 10 More Capabilities • Multiplicity • Any* number of consumers • Any* number of producers • Can subscribe multiple times, manage each subscription separately • Explicit Subscription based on some trigger/message/external event • Implicit Subscription • Stream data will automatically activate a new grain which will be automatically subscribed • Mark a grain class with [ImplicitStreamSubscription("MyStreamNamespace")] • Will automatically subscribe this grain to a matching stream • Grain with GUID XXX will be subscribed to stream with GUID XXX and namespace "MyStreamNamespace" * Practically our current implementation is limited to 100s of consumers/producers per stream. It can be extended by using a more sophisticated persistence scheme. 11 Stream Providers • Extensibility point to streams behavior and semantics • Different transports • TCP, Azure Queues, Event Hubs, … • Different message delivery semantics • best effort, queued, at most once, at least once, … • Different batching • Different backpressure • We currently have • Simple Message Stream Provider • Azure Queue Provider • In progress: Event Hub Stream Provider 12 Semantics • Subscription • Sequential Consistency between Subscribe and future production • Message Delivery • Depends on Stream Provider • Message Ordering • Depends on Stream Provider • Application defined order • Pass StreamSequenceToken together with the event when producing it • Will arrive to consumer • Use app logic to reason and reconstruct order 13 Rewindable Streams • Some queuing technologies allow “going back in time” • Expose that capability via a notion of “Rewindable Stream” • Subscribe from a certain point in time • stream.SubscribeAsync(IAsyncObserver<T>, StreamSequenceToken) • Useful for recovery scenarios • Periodically checkpoint your processing grain state • Upon recovery, re-subscribe from the past • StreamSubscriptionHandle<T>.ResumeAsync(IAsyncObserver<T>, StreamSequenceToken) • Jay Kreps “Questioning the Lambda Architecture” • http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html • No implementation yet, coming… 14 Implementation • Events are delivered to grains or clients via regular Orleans messaging • Delivered to internal grain interface, called grain extension, that invokes the IAsyncObserver methods • Reliable Pub-Sub service matches producers with subscribers and stores their identities in persistent storage • Persistent Stream Provider – common base class for implementing queue-based stream providers • Pulling agents - distributed "micro-service“* - partitioned, highly available, and elastic distributed component • StreamQueueMapper - list of queues, mapping streams to queues • StreamQueueBalancer - balancing queues across silos and agents • IQueueCache - decouple delivery to different streams and different consumers • Backpressure • Delivery from cache to consumers – via regular Orleans messaging, one at a time • From queue into the cache – built-in backpressure mechanism in the IQueueCache * Calling Pulling agents a “Micro-Serving” was a joke of course 15 Extensibility • Via Provider Configuration • # queues, queue names, cache size, queue balancer type, queue mapper, … • Ask for more, please! • Queue Adapter • Plug in model for new queueing technologies • Abstracts the actual physical queue • Plugs into base PersistentStreamProvider • Allows to re-use pulling agents, cache, backpressure 16 What Is Next • Simple Event Hub Adapter – work in progress • Rewindable Event Hub Adapter – work in progress • Scalability/performance improvements • Support 1000s of consumers/producers per stream • Pub-Sub now limited to 100s consumers/producers per stream • Multicast on the messaging layer • Providers for more queueing technologies • Kafka, SQS, ServiceBus, RabbitMQ • Content-based implicit subscription • Custom data formatters • Higher level programming model on top of Virtual Streams • Declarative data-flow • Your help/input 17 The END Gabi Kliot gkliot@microsoft.com https://github.com/gabikliot https://www.linkedin.com/in/gabrielkliot 18