15-740 Project Proposal: A Micro-Architecture Level Study of FAWN Architecture

Bin Fan (binfan@cs.cmu.edu), Lin Xiao (lxiao@cs.cmu.edu),
Prashant Kashinkunti(pkashink@andrew.cmu.edu)
September 27, 2010
1 Project Description
FAWN (Fast Array of Wimpy Nodes) [6, 1] is a scalable and energy-efficient cluster architecture for data-intensive computing. A FAWN cluster consists of a large number of "wimpy" nodes with energy-efficient processors and small amounts of flash memory, serving workloads such as key-value lookups or MapReduce jobs [3].
Research on FAWN to date has focused mainly on system-level optimizations. In this project we therefore plan to study FAWN at the micro-architecture level. The goal is to better understand how to design FAWN-like architectures: which mechanisms matter most for delivering performance-power efficiency, and what the potential tradeoffs are.
Specifically, we target the key-value lookup workload on the FAWN architecture. This type of workload is traditionally I/O-intensive and bounded by the storage (flash, in FAWN) I/O speed. However, the latest FAWN implementation uses techniques such as a Huffman-encoded prefix tree and cuckoo hashing on the query path to reduce the memory consumption of the index structure, and these techniques demand more CPU performance. With an optimized micro-architecture, it should therefore be possible to observe differences in both performance and power consumption.
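To make the CPU cost on the query path concrete, the sketch below shows a minimal cuckoo-hash lookup in C. The table size, hash functions, and slot layout are illustrative assumptions rather than the actual FAWN-KV index; the point is that each lookup is bounded by two bucket probes plus hashing work, all of which lands on the CPU rather than on flash.

#include <stdint.h>
#include <stdio.h>

/* Minimal sketch of a cuckoo-hash lookup. The table size, hash functions,
 * and slot layout are illustrative, not the actual FAWN-KV index. Each key
 * can live in one of two candidate buckets, so a lookup needs at most two
 * probes. */

#define NUM_BUCKETS 1024                    /* assumed power-of-two table size */

typedef struct {
    uint64_t key;                           /* 0 marks an empty slot here */
    uint32_t value_offset;                  /* e.g., offset of the value on flash */
} slot_t;

static slot_t table[NUM_BUCKETS];

/* Two simple multiplicative hashes; a real index would use stronger ones. */
static uint32_t hash1(uint64_t key) { return (uint32_t)(key * 2654435761u) & (NUM_BUCKETS - 1); }
static uint32_t hash2(uint64_t key) { return (uint32_t)(key * 40503u + 0x9e3779b9u) & (NUM_BUCKETS - 1); }

/* Returns 1 and fills *offset if the key is present, 0 otherwise. */
static int cuckoo_lookup(uint64_t key, uint32_t *offset)
{
    uint32_t b1 = hash1(key);
    if (table[b1].key == key) { *offset = table[b1].value_offset; return 1; }
    uint32_t b2 = hash2(key);
    if (table[b2].key == key) { *offset = table[b2].value_offset; return 1; }
    return 0;                               /* at most two cache-line probes */
}

int main(void)
{
    table[hash1(42)] = (slot_t){ .key = 42, .value_offset = 128 };
    uint32_t off;
    if (cuckoo_lookup(42, &off))
        printf("key 42 -> flash offset %u\n", off);
    return 0;
}

Even though each lookup touches only a couple of cache lines, once flash reads are fast the hashing and probing (plus Huffman decoding in the prefix tree) become a significant share of the per-request work, which is one reason we expect core and cache design choices to matter for this workload.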
2 Related Work
The Gordon project [5] describes a flash-based system architecture for massively parallel, data-centric computing. It analyzes the Gordon design space and its trade-offs mainly by varying the CPU and replacing disk with flash memory, but says little about architecture-level optimization. Similarly, the FAWN papers [6, 1] focus on how to combine existing hardware and system architectures to reduce power consumption.
In contrast to Gordon and FAWN, there is prior work that studies a specific workload from a micro-architectural point of view. The performance and power efficiency of a web-search workload have been studied on both server-class (Xeon) and mobile-class (Atom) micro-architectures [4]; the authors conclude that the small-core design is roughly five times more power-efficient.
3 Design
In this project, we aim at the following specific problems:
• Memory is critical for sustaining high query throughput, but it also consumes significant power. We plan to study performance and power consumption across a range of memory configurations to find a point that balances this tradeoff well.
• Cache also has a large impact on throughput. We plan to study cache hit ratios and the different types of L2 cache accesses to see how well the caches are utilized. For the key-value workload in particular, we may be able to optimize the cache replacement policy (a small sketch of such a comparison follows this list).
• Micro-architectural events during key-value lookups, such as the mix of instruction types and their performance, and the fraction of execution time spent stalled for various reasons. The purpose is to determine whether the data path is fully utilized and to find the bottleneck. We will also try to improve the micro-architecture, or redesign the system so that it benefits more from the current architecture.
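As a concrete starting point for the cache-replacement question above, the following sketch simulates one 8-way cache set over a skewed synthetic key trace and compares the hit ratio of LRU against random replacement. The associativity, trace length, and skew are our assumptions, not FAWN-KV measurements; the real study would rely on hardware counters and the simulator.

#include <stdio.h>
#include <stdlib.h>

/* Illustrative sketch: hit ratio of one 8-way cache set under LRU vs.
 * random replacement, driven by a skewed synthetic key trace. The
 * parameters below are assumptions, not FAWN-KV measurements. */

#define WAYS     8
#define ACCESSES 100000
#define KEYS     64            /* distinct hot keys competing for this set */

/* Simple skewed key generator: low-numbered keys are accessed more often. */
static int next_key(void)
{
    int r = rand() % (KEYS * (KEYS + 1) / 2);
    int k = 0, acc = KEYS;
    while (r >= acc) { k++; acc += KEYS - k; }
    return k;
}

/* use_lru = 1: evict the least-recently-used way; otherwise evict a random way. */
static double hit_ratio(int use_lru)
{
    int tag[WAYS], age[WAYS];
    long hits = 0;
    for (int w = 0; w < WAYS; w++) { tag[w] = -1; age[w] = 0; }

    for (long i = 0; i < ACCESSES; i++) {
        int key = next_key(), hit_way = -1;
        for (int w = 0; w < WAYS; w++) {
            age[w]++;                       /* age grows since last touch */
            if (tag[w] == key) hit_way = w;
        }
        if (hit_way >= 0) {
            hits++;
            age[hit_way] = 0;               /* refresh recency on a hit */
        } else {
            int victim = 0;
            if (use_lru) {                  /* oldest way is the LRU victim */
                for (int w = 1; w < WAYS; w++)
                    if (age[w] > age[victim]) victim = w;
            } else {
                victim = rand() % WAYS;
            }
            tag[victim] = key;
            age[victim] = 0;
        }
    }
    return (double)hits / ACCESSES;
}

int main(void)
{
    srand(1);
    printf("LRU    hit ratio: %.3f\n", hit_ratio(1));
    printf("Random hit ratio: %.3f\n", hit_ratio(0));
    return 0;
}

On a skewed trace like this, LRU typically wins; whether the same holds for the actual index and value accesses of the key-value workload is exactly what we plan to measure.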
4 Experimental Methodology
In order to study performance and power efficiency under different micro-architecture configurations, we plan to run simulations on Wattch [2]. Wattch is a simulator for analyzing and optimizing power dissipation at the architecture level. It integrates power models for common micro-architectural structures, such as caches and register files, with a performance simulator, so that power estimates are produced alongside performance estimates.
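As a simple illustration of how we would combine Wattch's performance estimate with its power estimate into a single figure of merit, the sketch below computes energy and energy-delay product (EDP) from a cycle count, clock frequency, and average power. The numbers are placeholders chosen for the example, not real Wattch output.

#include <stdio.h>

/* Combine a performance estimate (cycles at a given clock) with a power
 * estimate (average watts) into energy and energy-delay product (EDP).
 * The example numbers are placeholders, not real Wattch results. */

int main(void)
{
    double cycles      = 2.0e9;     /* simulated cycles for the workload   */
    double freq_hz     = 1.6e9;     /* assumed wimpy-node clock frequency  */
    double avg_power_w = 4.5;       /* average power reported by the model */

    double exec_time_s = cycles / freq_hz;          /* performance estimate */
    double energy_j    = avg_power_w * exec_time_s; /* energy = P * t       */
    double edp         = energy_j * exec_time_s;    /* penalizes slow, low-power points */

    printf("time   = %.3f s\n", exec_time_s);
    printf("energy = %.3f J\n", energy_j);
    printf("EDP    = %.3f J*s\n", edp);
    return 0;
}

EDP penalizes configurations that save power only by running much slower, which matches the balance between throughput and energy saving we are looking for in the goals below.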
5 Research Plan
Goals
• 100% goal: test different micro-architectures on the simulator with our target workloads, understand the design tradeoffs, and look for a good balance between throughput and energy savings.
• 75% goal: test the current FAWN micro-architecture on the simulator with our target workloads, identify bottlenecks in the current FAWN design, and make suggestions.
• 125% goal: test different micro-architectures on the simulator and on a real deployment, with multiple typical data-intensive computing workloads, and understand the implications of the workloads for the design.
Milestones
• Milestone 1: get the simulator running; generate different micro-architecture configurations.
• Milestone 2: test the target workload with one or more micro-architectures; finish preliminary measurements.
References
[1] David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, and Vijay Vasudevan. FAWN: A fast array of wimpy nodes. In SOSP, 2009.
[2] David Brooks, Vivek Tiwari, and Margaret Martonosi. Wattch: A framework for architectural-level power analysis and optimizations. In ISCA, 2000.
[3] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, 2004.
[4] Vijay Janapa Reddi, Benjamin Lee, Trishul Chilimbi, and Kushagra Vaid. Web search using small cores: Quantifying the price of efficiency. Technical report, 2009.
[5] Adrian M. Caulfield, Laura M. Grupp, and Steven Swanson. Gordon: Using flash memory to build fast, power-efficient clusters for data-intensive applications. In ASPLOS, 2009.
[6] Vijay Vasudevan, David Andersen, Michael Kaminsky, Lawrence Tan, Jason Franklin, and Iulian Moraru. Energy-efficient cluster computing with FAWN: Workloads and implications. In e-Energy, 2010.