Contention Awareness in the Cloud
Mary Lou Soffa
Computer Science Department, University of Virginia
151 Engineer’s Way, Charlottesville, VA 22904
Email: soffa@cs.virginia.edu
Tele: (434) 982 2277
URL: http://www.cs.virginia.edu/ soffa
1. Motivation and Introduction
1.1. Motivation. Web companies, such as Google, use
off-the-shelf commodity components to build data-centers as
they are cheap, abundant and easily replaceable. The current state of the art commodity processors are primarily composed of
multicore chips. However, as multiple processes execute simultaneously, contention for shared on-chip resources can often cause
significant performance degradation and low machine utilization. When the performance of an application is negatively
affected due to contention with a co-running application on a neighboring core, we call this cross-core performance interference. Cross-core
interference is particularly undesirable for latency sensitive applications such as Search and Maps. For instance, in the latency
sensitive domain of web search this kind of cross core interference can cause unexpected slowdowns, negatively impacting the
QoS on a search query. A commonly used solution is to simply disallow the co-location of latency sensitive applications with other
applications on a single machine, resulting in low machine utilization and higher energy costs. To date, there has been very little
research attention paid to investigate deployable solutions to these challenges in the current commodity multicore environment.
Effectively addressing these challenges can significantly improve machine utilization, lower power consumption, improve energy
efficiency, decrease latency, and ultimately lower cost in running the data-center, potentially translating to many millions of dollars
saved.
Figure 1 illustrates the potential cross-core interference that can occur when multiple co-running applications are executing on
current multicore architectures. We perform this experiment using two representative examples of the state of the art multicore
chip designs that can be found in a datacenter, the Intel Core i7 Quad Core chip and AMD’s Phenom X4 Quad Core. The figure
shows the performance slowdown when co-locating each of the SPEC2006 benchmarks with the lbm application (known to be
cache intensive). As this graph shows, there are severe performance degradations (of up to 35%) due to cross-core interference
on most of the applications.
1.2. Research. The vision for our Adaptive Cloud Computing Systems Lab (ACCS Lab) is to develop infrastructure and
techniques to maximize utilization in a datacenter without sacrificing latency. We focus on innovation in application software,
system software, and micro-architectural design. As we form our ACCS Lab, one of our first initiatives is a holistic contention
aware computing environment to address the challenges of on-chip resource contention in the memory subsystem and the
resulting cross core performance interference. This computing environment includes a contention aware compilation framework,
programming model, operating system and micro-architectural design. As shown in Figure 2, this proposal focuses on the
investigation, design and evaluation of the contention aware (CA) Compilation Framework. The state of the art compilation
techniques on the current multi-core architectures are oblivious to contention. We believe that CA compilation may prove critical to
fully utilize the parallelization power of these architectures. Our CA compilation framework includes three inter-related research
directions as shown in Figure 2: 1) the formation of a profiling and characterization framework that will be able to quantify the
inherent cross-core interference sensitivity of an application and specific code regions, 2) novel static compilation techniques that
will allow us to specialize application code layout based on the code regions’ cross-core interference characteristics provided by
our profiler, and 3) sophisticated online adaptive and managed runtime techniques that will allow dynamic responses to contention
as it occurs.
These three components of our CA Compilation framework are highly inter-related and necessary for realizing the vision of
contention-aware compilation. Because contention only occurs during runtime, static analysis alone is not sufficient to characterize
cross-core interference, which necessitates the design of novel characterization and profiling techniques. Our profiling analysis will
be able to reveal an application’s dynamic phases of contention and identify contentious code regions. Our compilation techniques
can then perform code transformations on these contentious regions. The compilation framework will also incorporate other
knowledge gained during profiling through feedback directed optimizations. Novel compilation techniques can also help enable
the online adaptive detect-and-respond system. In addition, the phase-level knowledge gained during profiling can assist the
online system.
Figure 1. Slowdown when co-located with lbm (series: Intel Core i7 Quad, AMD Phenom X4; y-axis: slowdown, 1x to 1.4x; x-axis: SPEC2006 benchmarks and their mean).
Figure 2. Contention Aware Computing Environment
1.3. Preliminary Work. We have investigated the challenges posed by contention on multicore architectures and gained many
insights during our preliminary work and our collaboration experience with Google. We have also demonstrated the potential of
contention aware and adaptive approaches through our work on Scenario Based Optimization (SBO) [1], our Contention Aware
Execution Runtime (CAER) [3] and our preliminary work on characterization and profiling methodology [2]. The SBO and CAER
works [1, 3] were done in collaboration with Robert Hundt at Google while my PhD student, Jason Mars, interned for the last two
summers. In summary, using SBO we were able to apply aggressive loop unrolling and software cache prefetching optimizations
dynamically and adaptively, gaining 12% performance improvement on average over traditional static approaches. Meanwhile, our
CAER engine provides improved performance isolation for latency sensitive applications on current commodity hardware. Using
hardware performance counter information the runtime detects contention in the shared on-chip cache and responds by staggering
or halting the execution of the latency insensitive application. With CAER, we bring the overhead due to contention from 17% down
to 5% on average, while gaining close to 60% more utilization of the processor over running the latency-sensitive application
alone. These results demonstrate the great promise of contention-aware and adaptive approaches. We have only scratched the
surface with these prior technologies, and we believe by addressing the problem of contention appropriately, performance and
utilization can be significantly improved. Please refer to the full publications for more in-depth details of our preliminary work.
Also note we have continued our collaboration with the compiler team at Google and provided our prototype implementations to
their team via Google Code.
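The detect-and-respond behavior described above for CAER can be illustrated with a minimal sketch. This is not the CAER implementation from [3]: the simulated miss counts, the threshold rule, and the names `Sample`, `plan_batch_execution`, `miss_threshold`, and `pause_intervals` are all assumptions made for illustration. A real runtime would read hardware performance counters instead of a list of samples.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One sampling interval of (simulated) last-level cache miss counts."""
    sensitive_misses: int  # latency-sensitive application
    batch_misses: int      # latency-insensitive (batch) application


def plan_batch_execution(samples: List[Sample],
                         miss_threshold: int = 1000,
                         pause_intervals: int = 2) -> List[bool]:
    """Return, per interval, whether the batch application may run.

    Hypothetical rule: if both applications miss heavily in the same
    interval, assume they are contending for the shared cache and halt
    the batch application for the next `pause_intervals` intervals.
    """
    decisions = []
    paused_for = 0
    for s in samples:
        if paused_for > 0:
            decisions.append(False)   # batch application stays halted
            paused_for -= 1
            continue
        decisions.append(True)        # batch application runs this interval
        contended = (s.sensitive_misses > miss_threshold
                     and s.batch_misses > miss_threshold)
        if contended:
            paused_for = pause_intervals
    return decisions
```

In this sketch the batch application is only ever halted, never slowed; a staggering response would resume it at a reduced duty cycle instead of stopping it outright.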
2. Proposed Research
To date, there is no compilation framework that is contention aware. As the first initiative of our Adaptive Cloud Computing
Systems Lab, we propose a holistic contention aware compilation framework. To achieve this we propose a number of research
projects.
2.1. Characterizing and Profiling Cross Core Interference Sensitivity. An application’s cross-core interference sensitivity is
determined by its intrinsic reliance on the shared memory resources and the underlying abundance and management of those
resources in the micro-architecture. Also note that, as our preliminary experiments show, on average, cross-core interference
sensitivity of an application also indicates its aggressiveness, as the sensitivity of an application hinges on its demand on, and
usage of, the shared resource. We seek to characterize this sensitivity and aggressiveness as it relates to the entire application,
its phases, and source-level code regions. To date, there has been no methodology for identifying and extracting this information
on current real-world multicore architectures.
To address these challenges, we propose an online empirical characterization approach. To assess an application’s sensitivity
to cross-core interference due to contention, we plan to synthesize contention. As an application executes on our profiling
framework, a carefully designed contention synthesis engine will be spawned on a neighboring core to run alongside the
application. This contention synthesis engine is continually controlled by our profiler and run in a bursty fashion. The resulting
performance impact on the host application is monitored, analyzed, and profiled by the profiling framework. Our profiler will also
monitor specific application code regions as they relate to this performance impact using the ubiquitous on-chip hardware
performance monitors. This performance impact will be used to generate a quantitative metric for cache contentiousness that we
will be able to associate with an application, its individual phases, and source-level code regions. Our preliminary contention
synthesis mechanisms are presented in [2]. In addition to what is presented in this preliminary work, we propose application
phase detection and characterization, and source level code region characterization.
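One way to turn bursty contention synthesis into a per-region number is sketched below. The proposal does not define the metric, so the formula used here (relative loss in instructions per cycle while a burst is active) is an assumption, as are the names `Interval` and `sensitivity_by_region` and the idea that each sampling interval is already attributed to a single code region.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (region name, contention burst active?, measured instructions per cycle)
Interval = Tuple[str, bool, float]


def sensitivity_by_region(intervals: Iterable[Interval]) -> Dict[str, float]:
    """Estimate per-region sensitivity as the relative IPC loss under contention.

    Assumed metric: 1 - mean(IPC with burst) / mean(IPC without burst).
    0.0 means no measurable impact; larger values mean higher sensitivity.
    Regions lacking samples in either state are skipped.
    """
    # Per region: [sum of IPC with burst, count, sum of IPC without burst, count]
    sums = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for region, burst_on, ipc in intervals:
        acc = sums[region]
        if burst_on:
            acc[0] += ipc
            acc[1] += 1
        else:
            acc[2] += ipc
            acc[3] += 1

    result = {}
    for region, (on_sum, on_n, off_sum, off_n) in sums.items():
        if on_n == 0 or off_n == 0 or off_sum == 0.0:
            continue  # cannot compare the two states for this region
        result[region] = 1.0 - (on_sum / on_n) / (off_sum / off_n)
    return result
```

Alternating burst and quiet intervals lets each region serve as its own baseline, so the comparison does not need a separate solo run of the application.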
2.2. Contention Aware Compilation Techniques. For co-running applications, our prior work [3] indicates that a small amount of
throttling down for one application could greatly reduce contention, resulting in less performance degradation. Based on this
observation, we propose a new way of thinking about compiler optimization. Instead of optimizing an application while only
considering its standalone performance, we propose optimizing an application considering its performance when co-running with
others. We will first investigate how existing optimizations affect a program’s behaviors in the presence of contention. Various
existing optimizations, including software cache prefetching and other optimizations that modify the rate or order of memory
access such as loop transformations, affect how an application interacts with the shared
memory system. Therefore, they may have either positive or negative effects on a program’s sensitivity to cross core interference.
We will investigate these optimizations and then design heuristics and provide the option to apply them based on the new
optimization objectives: optimize for overall performance and/or to accommodate latency sensitive applications’ performance. An
example could be heuristics to restrict the application of software cache prefetching to code regions that are identified as not
aggressive. We will also design novel code transformation techniques. For example, one technique we wish to investigate is to
slow down the data access rate of particular loops that are identified as contentious using a loop padding technique. These
code-regions and particular loops will be identified using our profiling framework.
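The two examples above (restricting prefetching to non-aggressive regions, and padding contentious loops) could be driven by a simple per-region decision rule. The sketch below is hypothetical: the single threshold, the score scale, and the names `RegionPlan` and `plan_regions` are assumptions, and it only selects which transformation to request without performing any code transformation.

```python
from typing import Dict, NamedTuple


class RegionPlan(NamedTuple):
    prefetch: bool   # insert software cache prefetches in this region
    pad_loop: bool   # pad the loop body to lower its data access rate


def plan_regions(contentiousness: Dict[str, float],
                 threshold: float = 0.3) -> Dict[str, RegionPlan]:
    """Choose a transformation per code region from its profiled score.

    Assumed rule: regions at or above `threshold` are treated as
    aggressive, so they get loop padding and no prefetching; all other
    regions keep prefetching and are left unpadded.
    """
    plans = {}
    for region, score in contentiousness.items():
        aggressive = score >= threshold
        plans[region] = RegionPlan(prefetch=not aggressive,
                                   pad_loop=aggressive)
    return plans
```

A single threshold keeps the rule easy to evaluate; separate thresholds for prefetching and padding would leave a middle band of regions untouched.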
2.3. Contention Aware Online Adaptive Techniques and Managed Runtime Systems. To enable online adaptive approaches, our
previous research [1] proposed a Scenario Based Optimization framework (Google patent pending) that can apply
application code changes online only when contention is detected. We believe we have just scratched the surface and would like
to investigate a novel framework that includes a continuous GCC versioning server that can continuously evaluate and provide
code re-layout alternatives for dynamic selection.
In addition to these compilation techniques we believe there is groundbreaking work to be done in the domain of managed
runtime environments such as the Java VM and Microsoft’s Common Language Runtime (CLR). One of our long term goals is the
design of a Contention Aware VM. The VM provides a broader design space and thus more opportunities to address the
challenges of contention in a datacenter. In this proposed research, we will focus our investigation on restructuring and applying
the novel techniques we have proposed at a finer granularity in the managed dynamic compilation domain. We also plan to
investigate how to use the virtual execution and garbage collection capabilities of managed runtimes to provide a harness for novel
dynamic memory re-layout techniques to address contention.
3. Expected Outcomes and Results
Our prior work has shown that contention-aware approaches in the datacenter are promising for improving utilization and
reducing energy consumption while guaranteeing the QoS of latency sensitive applications. Our proposed research is to further
realize this promise and to develop a more advanced and comprehensive contention-aware compiler infrastructure. The research
result will be an end-to-end compilation framework that provides capabilities including profiling and characterization of an
application’s contention sensitivity, compiler optimization and code transformation techniques to address the performance impact
of contention, and online approaches and managed runtime to adapt and respond to contention dynamically. Our detailed
expected outcomes and results are as follows:
• A profiling and characterization framework that effectively captures an application’s inherent sensitivity to cross core
interference caused by contention. The system will also pin-point hot code regions that are contentious.
• New compiler heuristics and code transformations that are designed, implemented, and evaluated. We will investigate
how existing optimization techniques should be used in the contention-conscious environment as well as designing new
code transformation techniques to reduce contention and improve overall performance.
• Online adaptive and managed runtime techniques that enable applications to dynamically detect occurrences of
contention and respond.
• We will provide our contention aware compilation framework to Google as open source profiling systems, and open
source GCC and JVM extensions. All of our deliverables will work on real programs and real systems translating into real
performance boosts.
4. Budget Plan and Google Contact
Considering the scope of this project, and the promise demonstrated in our publications [1, 2, 3], we are requesting one
year of funding for two students, one of whom is Jason Mars, who interned at Google and, under the supervision of Robert Hundt,
designed and implemented the prior work mentioned. The total amount of funds being requested is $128,000. These funds will be used
to support two PhD students for a period of one year ($54K per student, $108,000 in total) and to provide one month of summer
salary for the PI, Mary Lou Soffa ($20,000).
Robert Hundt will serve as the sponsor of this project. The results and infrastructure of this research work will be regularly
shared with Robert Hundt and his compiler team.
References
[1] J. Mars and R. Hundt. Scenario based optimization: A framework for statically enabling online optimizations. In CGO ’09: Proceedings of the 2009 International Symposium on Code Generation and Optimization, pages 169–179, Washington, DC, USA, 2009. IEEE Computer Society.
[2] J. Mars and M. L. Soffa. Synthesizing contention. In 2009 Workshop on Binary Instrumentation and Applications (WBIA), New York, NY, USA, December 2009.
[3] J. Mars, N. Vachharajani, M. L. Soffa, and R. Hundt. Contention aware execution: Online contention detection and response. In CGO ’10: Proceedings of the 2010 International Symposium on Code Generation and Optimization, Toronto, Canada, April 2010.
Biography: Mary Lou Soffa
Department of Computer Science
University of Virginia
Charlottesville, VA 22904
(434) 298-2277
soffa@virginia.edu
Professional Preparation
University of Pittsburgh, B.S., Mathematics
Ohio State University, M.S., Mathematics
University of Pittsburgh, Ph.D., Computer Science
Appointments
University of Virginia, Owen R. Cheatham Professor and Chair, 2004-
University of Pittsburgh, Professor, 1990–2004
University of Pittsburgh, Graduate Dean in Arts & Sciences, 1991–1996
University of California, Berkeley, Visiting Associate Professor, 1987–1988
University of Pittsburgh, Assistant, Associate Professor, 1977–1989
Honors and Awards (selected)
• Invited speaker, International Conference on Software Testing, April 2009
• Computing Research Association’s Nico Habermann Award, 2006
• Keynote Speaker, Fifth International Conference on Quality Software, Melbourne, Australia, September 2005
• Keynote Speaker, International Compiler Construction Conference, Barcelona, March 2004
• ACM Fellow, 1999
• Presidential Award for Excellence in Mentoring in Science, Mathematics, and Engineering, 1999: given by the White House
for excellence in mentoring under-represented students and encouraging their significant achievement in science,
mathematics and engineering.
• Girl Scout Woman of Distinction for 2003
• Invited Speaker, Grace Hopper Celebrating Women Conference, October 2002
• Invited Speaker, National Symposium on the Advancement of Women in Science, Harvard, April 2003
• Most Influential papers of 20 years in ACM/SIGPLAN Programming Languages Design and Implementation (PLDI),
“Complete Removal of Redundant Expressions” (co-authored with R. Bodik and R. Gupta), 40 out of 550 papers selected
and appeared in a PLDI Anniversary issue, 2003.
Five Publications Most Closely Related to the Proposed Project
1. Jason Mars, Mary Lou Soffa, Neil Vachharajani, and Robert Hundt, “Contention Aware Execution: Online Contention
Detection and Response,” to appear in Proceedings of the ACM/IEEE International Symposium on Code Generation and
Optimization (CGO), 2010
2. Jason Mars and Mary Lou Soffa, “Mats: MultiCore Adaptive Trace Selection,” Third workshop on Software
Tools for Multicore Systems, collocated with CGO, April, 2009
3. Min Zhao, Bruce R. Childers, Mary Lou Soffa, “A Framework for Exploring Optimization Properties,”
Compiler Conference, March, 2009
4. Jing Yang, Shukang Zhou, and Mary Lou Soffa, “Dimension: An Instrumentation Tool for
Virtual Execution Environments,” Second International Conference on Virtual
Execution Environments (VEE '06), Ottawa, Canada, June 14, 2006
5. N. Kumar, B. R. Childers, D. Williams, J. W. Davidson and M.L. Soffa, “Compile-time Planning for
Overhead Reduction in Software Dynamic Translators,” International Journal on Parallel Programming,
December 2004.
Five Other Significant Publications
1. M. Zhao, B. Childers and M.L. Soffa, “Predicting the Impact of Optimizations for Embedded
Systems,” 2003 ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded
Systems, San Diego, CA., pp. 1-11, 2003.
2. J. Misurda, J. Clause, J. Reed, P. Gandra, B.R. Childers and M.L. Soffa “Demand-Driven Structural
Testing with Dynamic Instrumentation,” International Conference on Software Engineering, St. Louis,
May, 2005.
3. Min Zhao, Bruce R. Childers and Mary Lou Soffa, “A Model-based Framework: An Approach for
Profit-driven Optimization,” ACM SIGMICRO Int'l. Conference on Code Generation and Optimization
(CGO'05), San Jose, California, March 2005.
4. Bruce Childers, Jack W. Davidson and Mary Lou Soffa, “Continuous Compilation: A New Approach to
Aggressive and Adaptive Code Transformation,” Proceedings of the International Parallel and
Distributed Processing Symposium (IPDPS'03) Nice, 2003
5. J. Misurda, J. Clause, J.L. Reed, P. Gandra, B.R. Childers and M.L. Soffa, “Jazz: A Tool for
Demand-Driven Structural Testing,” 14th ETAPS International Conference on Compiler
Construction (CC'05), Edinburgh, Scotland, April 2005.
Five Synergistic Activities
1. Chair/Vice Chair of Professional Organizations: Vice Chair of Computing Research Association
(CRA) (1998-2001); Co-Chair of CRA-W, Committee on Status of Women in Computing
(1990-2002); Chair of ACM/SIGPLAN (1997-1999).
2. Conference Chair: Architectural Support for Programming Languages and Operating Systems
(ASPLOS) 2009; Conference on Code Generation and Optimization (CGO), March, 2008; SIGSOFT
Eight International Symposium on the Foundations of Software Engineering, 2002; ACM SIGPLAN
Programming Languages Design and Implementation, 1995.
3. Program Chair: ACM/IEEE International Conference on Software Engineering, 2006; ACM
SIGPLAN Programming Languages Design and Implementation, June 2001; Parallel Architectures and
Compiler Techniques, October, 2000.
4. Member on Editorial Board: ACM Transactions on Software Engineering and Methodology (2003 –
present); Journal of Parallel Programming (1995-present); IEEE Transactions on Software Engineering
(1994-2000); South African Journal of Computing (1996 - present); ACM Transactions on Programming
Languages and Systems (1993-2000).
5. Diversity/Student Activities: CRA Career Workshops, Snowbird Conference panels,
OOPSLA Doctoral Symposium, ICSE Doctoral Symposium Chair, Presidential Award for
Mentoring Underrepresented groups.
Advisor/Co-advisor: advisor to 55 Master Students, over half were women;
PhD students: (Graduated) Naveen Kumar (VMware), 2008, Greg Kapfhammer (Allegheny College),
2007, Min Zhao (HP Research Labs), Atif Memon (University of Maryland), Clara Jaramillo
(Chatham College), Rastislav Bodik (University of California at Berkeley), Neelam Gupta (University of
Arizona), Evelyn Duesterwald (IBM Research Labs), Jodi Tims (St. Francis College), Tia Watts (Indiana
University of Pennsylvania), David Berson (Motorola, Inc.), Tarun Nakra (IBM), Chy Ren Dow
(Feng-Chia University), Pat Pineo (Edinboro University), Deborah Whitfield (Slippery Rock University),
Brian Malloy (Clemson University), Ravi Sharma (Bell Labs), Mary Jean Harrold (Georgia Tech), Mary
Bivens (Allegheny College), Lori Pollock (University of Delaware), Rajiv Gupta (U. of Arizona), George
Logothetis (AT&T), Ching-Chy Wang (Leverage Design Acceleration Corp.), Fernando Lafora-Garcia
(DEC Corp); CURRENT: Apala Guha, Jing Yang, Wei Le, Kristen Walcott, Jason Mars, Lingjia
Tang, Wei Wang, Tanima Dey (University of Virginia)
Budget 2010-2011 (Soffa)
Salary support for PI Soffa (1 month)                 $20,000
2 graduate students for 1 year ($54K per student)     $108,000
Total                                                 $128,000
Breakdown of charges for one student
Salary                  $17,615.00
Tuition                 $13,670.00
Insurance               $2,092.00
Overhead                $10,641.78
Student fees            $64.00
Out of state fees       $10,000
Foreign student fees    $100.00
Total                   $54,182.78