• Pileus
This research project in MSR SVC aims to answer the following question: Can we allow programmers to write cloud applications as though they were accessing centralized, strongly consistent data, while at the same time allowing them to specify their consistency/availability/performance (CAP) requirements in terms of service-level agreements (SLAs) that are enforced by the cloud storage system at runtime?

• CORFU
CORFU (Clusters of Raw Flash Units) is a cluster of network-attached flash exposed as a global shared log. CORFU has two primary goals. As a shared log, it exploits flash storage to alter the trade-off between performance and consistency, supporting applications such as databases at wire speed. As a distributed SSD, it slashes power consumption and infrastructure cost by eliminating storage servers.

• Dandelion
The goal of the Dandelion project is to provide simple programming abstractions and runtime support for programming heterogeneous systems. Dandelion supports a uniform sequential programming model across a diverse array of execution contexts, including CPU, GPU, FPGA, and the cloud.

• TimeStream: Large-Scale Real-Time Stream Processing in the Cloud
TimeStream is a distributed system designed specifically for low-latency continuous processing of big streaming data on a large cluster of commodity machines. The unique characteristics of this emerging application domain have led to a design that differs significantly from popular MapReduce-style batch data processing. In particular, we advocate a powerful new abstraction called resilient substitution that caters to the specific needs of this new computation model.

• MadLINQ: Large-Scale Distributed Matrix Computation for the Cloud
The computation core of many data-intensive applications is best expressed as matrix computation. The MadLINQ project addresses two important research problems: the need for a highly scalable, efficient, and fault-tolerant matrix computation system that is also easy to program, and the seamless integration of such specialized execution engines into a general-purpose data-parallel computing system.

• Naiad
The Naiad project is an investigation of data-parallel dataflow computation, like Dryad and DryadLINQ, but with a focus on low-latency streaming and cyclic computations. Naiad introduces a new computational model, timely dataflow, which combines low-latency asynchronous message flow with lightweight coordination when required. These primitives allow the efficient implementation of many dataflow patterns, from bulk and streaming computation to iterative graph processing and machine learning.
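Naiad itself is implemented in C#; purely as an illustration of the timely dataflow model just described, the following Java sketch shows the two callbacks at the heart of a vertex: asynchronous message delivery and completion notification. The interface and class names are hypothetical renderings, not Naiad's actual API.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical Java rendering of a timely dataflow vertex (Naiad is a
    // .NET system; these names are illustrative, not Naiad's real API).
    interface TimelyVertex<T> {
        // Asynchronous path: called whenever a message stamped with logical
        // time 'epoch' arrives; no ordering guarantees across epochs.
        void onRecv(T message, long epoch);

        // Coordination path: called once the system guarantees that no
        // further messages with timestamp <= epoch can be delivered.
        void onNotify(long epoch);
    }

    // Example: count words per epoch, emitting totals only when an epoch
    // is complete -- bulk coordination layered on streaming delivery.
    class WordCountVertex implements TimelyVertex<String> {
        private final Map<Long, Map<String, Integer>> pending = new HashMap<>();

        @Override
        public void onRecv(String word, long epoch) {
            pending.computeIfAbsent(epoch, e -> new HashMap<>())
                   .merge(word, 1, Integer::sum);
        }

        @Override
        public void onNotify(long epoch) {
            Map<String, Integer> counts = pending.remove(epoch);
            if (counts != null) {
                counts.forEach((word, n) ->
                    System.out.printf("epoch %d: %s -> %d%n", epoch, word, n));
            }
        }
    }

The point of the split is that onRecv can run as fast as messages arrive, while the heavier coordination cost is paid only at epoch boundaries, and only by vertices that request notification.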
• Optimus
Optimus is a framework for dynamically rewriting an execution plan graph in distributed data-parallel computing at runtime. It enables optimizations that require knowledge of the semantics of the computation, such as language customizations for domain-specific computations including matrix algebra. We address several problems arising in distributed execution, including data skew, dynamic data re-partitioning, unbounded iterative computations, and fault tolerance.

• Pasture
Mobile user experiences are enriched by applications that support disconnected operation to provide better mobility, availability, and response time. However, offline data access is at odds with security when the user is not trusted, especially in the case of mobile devices, which must be assumed to be under the full control of the user. Pasture leverages commodity trusted hardware to provide secure offline data access by untrusted users.

• Plexus: Interactive Multiplayer Games on Windows Phones and Tablets
As smartphones and tablets become popular gaming devices, there is a need to support real-time, interactive games more effectively. Plexus is a platform to support interactive multiplayer games on Windows phones. It leverages direct phone-to-phone connections and latency-masking techniques to provide smooth, power-efficient interactive game play.

• High-Performance Transactional Storage
Transactional Application Protocol for Inconsistent Replication, or TAPIR, is a new protocol for linearizable distributed transactions built atop a new replication protocol that provides no consistency guarantees. TAPIR eliminates expensive coordination from the replication layer, yet provides the same transaction model and consistency semantics as existing transactional storage systems (e.g., Google's Spanner). It can commit transactions in a single round trip, greatly improving both latency and throughput relative to existing systems. (An illustrative client-side sketch appears at the end of this section.)

• Co-Designing Data Center Networks and Distributed Systems
Distributed systems are traditionally designed independently of the underlying network, making worst-case assumptions about its behavior. Such an approach is well suited to the Internet, where one cannot predict what paths messages might take or what might happen to them along the way. However, many distributed applications are today deployed in data centers, where the network is more reliable, predictable, and extensible. We argue that in these environments it is possible to co-design distributed systems with their network layer, and that doing so can offer substantial benefits.

• Sapphire: Designing new operating system abstractions for mobile/cloud applications
Mobile/cloud applications are distributed over users' mobile devices and across back-end cloud servers around the world. As a consequence, application programmers now face deployment decisions that in the past were visible only to designers of large-scale distributed systems. These decisions include where data and computation should be located, what data should be replicated or cached, and what level of data consistency is needed. In the Sapphire project, we are working on separating deployment from application logic while still giving application programmers control over performance trade-offs.

• Distributed storage systems
We are pushing the limits of today's distributed storage systems on several fronts. Scatter, a scalable peer-to-peer key-value storage system, preserves serializable consistency even under adverse conditions. Comet is a distributed key-value store that lets clients inject snippets of code into storage elements, creating an active key-value store that greatly increases the power and range of applications that can use distributed storage.
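To make the "active key-value store" idea concrete: Comet's real handlers are small Lua snippets that run inside the DHT when an object is accessed. The sketch below re-renders the idea in Java with entirely hypothetical names, showing a store that invokes a client-installed handler on each get.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.BiFunction;

    // Illustrative sketch of an active key-value store in the spirit of
    // Comet: clients attach small handlers that run inside the store on
    // each access. (Comet's actual handlers are Lua snippets executed in
    // the DHT; every name below is hypothetical.)
    class ActiveStore {
        private final Map<String, String> data = new HashMap<>();
        private final Map<String, BiFunction<String, String, String>> onGet =
            new HashMap<>();

        void put(String key, String value) { data.put(key, value); }

        // Register a per-key handler invoked on every get; it sees the key
        // and stored value and may rewrite the value that is returned.
        void installGetHandler(String key, BiFunction<String, String, String> h) {
            onGet.put(key, h);
        }

        String get(String key) {
            String value = data.get(key);
            BiFunction<String, String, String> h = onGet.get(key);
            return (h != null && value != null) ? h.apply(key, value) : value;
        }
    }

A handler in this style could, for example, log accesses next to the object or return nothing after an expiry time, which is the kind of per-object behavior (such as self-destructing data) that Comet-style active storage enables without changing the store itself.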
• Hadoop MapReduce
Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. A MapReduce job usually splits the input data set into independent chunks, which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file system. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks. (A condensed word-count example appears at the end of this section.)

• MOCA: A Lightweight Mobile Cloud Offloading Architecture

• DEIDtect: Towards Distributed Elastic Intrusion Detection
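The word-count example promised in the Hadoop MapReduce entry above, condensed from the standard Hadoop tutorial and using the real org.apache.hadoop.mapreduce API (the Job driver boilerplate is omitted). The map phase emits a (word, 1) pair per token, the framework sorts and groups the pairs by word, and the reduce phase sums the counts.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);   // one (word, 1) per occurrence
                }
            }
        }

        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values,
                               Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();  // sum grouped counts
                result.set(sum);
                context.write(key, result);     // emit (word, total)
            }
        }
    }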
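And the client-side sketch promised in the High-Performance Transactional Storage entry: TAPIR exposes an interactive transaction interface in which reads and writes are buffered at the client and commit runs a single coordinated round. The Java below is an illustrative rendering of that description only; all names and the network stubs are hypothetical, and the actual protocol details live behind them.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative client-side view of a TAPIR-style transaction: reads
    // and writes accumulate locally, and commit performs one coordinated
    // round in which replicas validate the read set and apply the write
    // set, with no separate ordering step in the replication layer.
    class TapirStyleClient {
        private final Map<String, String> readSet = new HashMap<>();   // key -> value read
        private final Map<String, String> writeSet = new HashMap<>();  // key -> buffered write

        String read(String key) {
            if (writeSet.containsKey(key)) return writeSet.get(key);   // read-your-writes
            String value = fetchFromAnyReplica(key);  // unordered read is acceptable here
            readSet.put(key, value);
            return value;
        }

        void write(String key, String value) { writeSet.put(key, value); }

        boolean commit() {
            // Single round trip: validation and replication are fused into
            // one step; returns false if the read set is no longer current.
            return consensusCommit(readSet, writeSet);
        }

        private String fetchFromAnyReplica(String key) { /* network stub */ return null; }
        private boolean consensusCommit(Map<String, String> r, Map<String, String> w) {
            /* network stub */ return true;
        }
    }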