Microsoft Researchers Boost Task Productivity Fiftyfold with Cluster Server Software

Microsoft Server Product Portfolio
Customer Solution Case Study
Overview
Country or Region: United States
Industry: Life sciences
Customer Profile
Microsoft Research, founded in 1991,
employs more than 700 researchers
worldwide, working in more than 55
research areas that are independent of
software development.
Business Situation
Microsoft researchers working on immune
system interactions with the human
immunodeficiency virus struggled to get
meaningful results using their networked
individual computers.
Solution
With the deployment of Windows® Compute
Cluster Server 2003 for high-performance
computing using 25 IBM eServer 326
server computers in a 64-bit environment,
the researchers gained the necessary
computing power.
Benefits
- Increased task productivity fiftyfold
- Achieved more confidence in research results
- Streamlined deployment, management, and use
- Provided extensible solution
“With Windows Compute Cluster Server, we can run
50 jobs—of 200,000 work items each—in the same
amount of time that it used to take to run 1 job.”
Carl Kadie, Research Software Development Engineer, Microsoft Research, Microsoft Corporation
Since 2003, scientists at Microsoft Research have been performing
research on the design of a vaccine for the human
immunodeficiency virus (HIV). However, with only six personal
computers and 10 processors, the research team struggled to
perform statistical analysis, which required a year of computer
processing, or CPU time. With the help of the High Performance
Computing (HPC) group at Microsoft, the team deployed Windows®
Compute Cluster Server 2003 on 25 IBM server computers. Now,
the research team can run 50 jobs—of 200,000 work items each—
in the time it once took to run 1 job. As a result, the research team
has gained enough insights to publish in the top scientific journals.
The team, which finds Windows Compute Cluster Server simple to
deploy, use, manage, and extend, is using it to unravel the puzzles
that may one day lead to an HIV vaccine.
Situation
Since 2003, Microsoft Research has been
helping in the quest to develop a vaccine for
the human immunodeficiency virus (HIV),
which causes acquired immunodeficiency
syndrome (AIDS). The HIV research team
members, who all have impressive
credentials, include David Heckerman, MD,
PhD, Senior Researcher; Carl Kadie, PhD,
Research Software Development Engineer;
and Jonathan Carlson, MS, Research Intern
and PhD candidate. Their research supports
the search for an immunogen, the part of the
vaccine that triggers an immune response. Researchers
elsewhere are working on the other central
component of vaccine design, the vector, or
the part of the vaccine that delivers the
immunogen.
In 1991, Microsoft Corporation created its
own research organization, Microsoft
Research. To build a foundation for future
technology breakthroughs, Microsoft
Research focuses on long-term projects,
independent of day-to-day product
development. Today, Microsoft Research
employs more than 700 researchers, working
in more than 55 research areas.
Complications of Research on HIV
One of the complications that the researchers
face is that HIV mutates rapidly, unlike some
viruses. “At one extreme is the measles virus,
which virtually never changes. On the other
extreme are viruses like HIV that evolve in
each patient’s body,” says Heckerman. “In between, there are viruses such as influenza,
which may mutate only a few times a year.”
To understand the significance of the HIV
mutation rate, it helps to know how the
human immune system works. Conceptually,
the immune system memorizes a virus and,
once trained, the system recognizes the virus
and mounts an assault to kill it. When the
mutation rate is low—as it is in the measles
virus—the virus is essentially the same in
everyone who contracts it. That similarity
makes it easier for researchers to develop an
immunogen that will kill the virus in all
populations. HIV, however, which mutates
much more frequently, presents more of a
challenge. “The immune system trains itself
to attack a particular form of HIV, but then
the HIV mutates to escape the attack,” says
Heckerman.
In addition, the genetic variation in human
immune systems is so great that there are
hundreds of different types of immune
responses. So, finding an immunogen for a
rapidly mutating virus is complicated.
The Microsoft researchers run simulations of
how HIV responds to attacks by the immune
system, using the genetic information about
the virus—a description of its ribonucleic acid
(RNA). The researchers are looking for
correlations between the viral RNA and the
human immune type. To determine which
correlations are significant, they must
perform extensive randomized testing.
Knowledge of these correlations can thus
contribute to the development of an effective
immunogen that works for the wide variety of
human populations.
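The randomized testing described above can be illustrated with a permutation test. The following is a minimal sketch, not the team's actual method: it assumes a hypothetical binary encoding (whether a patient has a given immune type, and whether that patient's virus carries a given mutation), the toy data is invented, and the function names are illustrative.

```python
import random

def co_occurrence(immune_type, mutation):
    """Count patients who have both the immune type and the mutation."""
    return sum(1 for a, b in zip(immune_type, mutation) if a and b)

def permutation_p_value(immune_type, mutation, n_permutations=10_000, seed=0):
    """Estimate how often shuffled labels give a co-occurrence count at
    least as large as the observed one (one-sided permutation test)."""
    rng = random.Random(seed)
    observed = co_occurrence(immune_type, mutation)
    shuffled = list(mutation)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between the two columns
        if co_occurrence(immune_type, shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Toy data, invented for illustration: 1 = present, 0 = absent.
immune_type = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
mutation = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
p = permutation_p_value(immune_type, mutation)
```

A small value of `p` suggests the pairing is unlikely to arise by chance. Each such test is cheap on its own; the cost comes from repeating it for every candidate pairing, which is why a job can contain so many work items.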
Challenges for Microsoft Research
Microsoft Research works with many
prestigious universities and research facilities
throughout the world, including Harvard
University, Massachusetts General Hospital,
the Fred Hutchinson Cancer Research Center,
and the Los Alamos National Laboratory in
the United States; Oxford University in the
United Kingdom; Murdoch University in
Australia; and the University of British
Columbia in Canada, among others.
These institutions provide the scientists with
genetic sequencing information from the viral
samples. The number of calculations involved
is staggering. “A typical job will have a million
work items,” says Kadie. “Each one of those
is a statistical test. To arrive at a result that
we can depend on with confidence, we need
about one year of CPU time.”
Early in the research, however, the scientists
had only their own and borrowed personal
computers available. “If we were lucky, we
would get maybe six computers and—
because some had dual processors—that
would give us 10 processors,” says Kadie.
“We had to start each of the jobs separately,
and then copy the files over and set
everything up. Then, one of the biggest
problems was remembering which computer
we had used for each job, so we could
tabulate the results.”
Heckerman adds, “We were not able to do
anything very interesting at that point,
because we couldn’t do the tests that we
needed to do to provide meaningful results.”
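To put the figures quoted above in perspective, this back-of-the-envelope sketch converts one year of CPU time into wall-clock time on 10 processors. It assumes perfect parallelism with no setup, copying, or scheduling overhead, which is an optimistic simplification, and the function name is illustrative.

```python
def ideal_wall_clock_days(cpu_days, processors):
    """Wall-clock days if the work divides evenly with zero overhead."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return cpu_days / processors

CPU_DAYS_PER_JOB = 365  # "about one year of CPU time" per dependable result
days_on_borrowed_pcs = ideal_wall_clock_days(CPU_DAYS_PER_JOB, 10)
print(f"{days_on_borrowed_pcs:.1f} days")  # 36.5 days
```

Even under these ideal assumptions, a single dependable result would tie up every borrowed processor for more than a month.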
The research team needed more computing
power. Because the primary applications that
the team uses were written in the Microsoft®
Visual C#® development tool and the Visual
C++® development system, any
improvements in the computing
infrastructure needed to support those
applications.
Solution
To help the Microsoft researchers address
their computing challenges, the Microsoft
High Performance Computing (HPC) Group,
which works on product development,
deployed Windows® Compute Cluster Server
2003, which runs on the Windows Server
2003 operating system in a 64-bit computing
environment.
Windows Compute Cluster Server is a high-performance computing solution that
includes setup procedures, a suite of
management tools, and the Compute Cluster
Job Manager, an integrated job scheduler.
The compute cluster software also works with
the Active Directory® service to provide role-based security.
The deployment took place quickly; it was
completed in little more than an hour.
“Implementation consisted of setting up the
head node using the built-in setup wizard,
and then using a scripted install to add nodes
to the cluster,” says Doug Lindsey, Program
Manager, High Performance Computing
Group, Microsoft. “All the administrator has to
do is approve the setup on the administrative
console. It takes between one and one-and-a-half minutes to install each node, and they
install in parallel.”
The HPC cluster for the HIV researchers is
based on 25 IBM eServer 326 server
computers, with two AMD Opteron processors
per machine running at 2.6 gigahertz.
Heckerman, Kadie, and Carlson share HPC
clusters with other groups at Microsoft, and
occasionally more resources become
available, depending on demand, but they
always have a minimum of 36 processors
available.
For the database, the researchers use the
Microsoft SQL Server™ Desktop Engine.
Lindsey administers the cluster remotely
using the Compute Cluster Administrator,
which ships with Windows Compute Cluster
Server, and makes use of the Microsoft
Management Console. He uses Microsoft
Operations Manager 2005 with Service Pack
1 for monitoring the system.
The researchers use the Compute Cluster Job
Manager, the integrated job scheduler that
comes with Windows Compute Cluster Server,
to run and monitor jobs. For some projects,
Kadie automated submissions with scripting
using the Microsoft Visual Studio® 2005
Team Suite development system. “Sitting at
our desktop, we can run an application
created in C# that talks directly to the cluster
and accomplishes all the steps to create the
jobs that we want to run.”
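The bookkeeping that a job scheduler takes over can be sketched in a few lines. This is a generic illustration in Python, not the Compute Cluster Job Manager API or the team's C# tool: the function `split_work_items` and the task count are assumptions made for the example. It divides a job's work items into contiguous ranges, one per task, so that results can be tabulated by task index rather than by remembering which machine ran what.

```python
def split_work_items(total_items, n_tasks):
    """Return (start, end) index pairs, end exclusive, covering every item once."""
    if total_items < 0 or n_tasks <= 0:
        raise ValueError("total_items must be >= 0 and n_tasks > 0")
    base, extra = divmod(total_items, n_tasks)
    ranges = []
    start = 0
    for task in range(n_tasks):
        # The first `extra` tasks take one additional item each.
        size = base + (1 if task < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

# 200,000 work items per job comes from the case study; 36 tasks is an
# assumption matching the minimum processor count mentioned above.
tasks = split_work_items(200_000, 36)
```

Because the ranges are contiguous and non-overlapping, merging the outputs in task order reproduces the full result set.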
For the HIV scientists at Microsoft Research,
the use of Windows Compute Cluster Server
has dramatically improved their ability to do
research. “We can run larger jobs now, and
so do better science,” says Heckerman. “We
can try more difficult things and learn more
quickly, which can affect the paths we
pursue. The solution has also proven to be
simple to deploy, manage, and use, and it
can be easily expanded.”
The applications and HPC cluster use the
Microsoft .NET Framework. The researchers
never considered using Linux—but not
because the work was done at Microsoft. “We
get a lot of productivity from writing our code
in C# and .NET, which Linux does not
support,” says Kadie.
Benefits
Increased Task Productivity Fiftyfold
The Microsoft researchers have seen a huge
increase in productivity since the deployment
of the HPC solution. By taking advantage of
64-bit computing running in a clustered
environment, the HPC solution has provided a
much higher level of performance for the
enormous number of calculations that must
be run. “With Windows Compute Cluster
Server, we can run 50 jobs—of 200,000 work
items each—in the same amount of time that
it used to take to run 1 job,” says Kadie.
Heckerman also notes the comparison
between the previous environment and the
HPC solution. “By running on 64-bit
computers, our jobs are no longer limited to
two gigabytes of memory,” he says. “The
additional computational power and memory
space has made all the difference in our
ability to perform the necessary research. We
can investigate more possibilities and get the
results faster.”
Achieved More Confidence in Research
Results
Windows Compute Cluster Server has
contributed another benefit, too. “Because
we can process so much data efficiently, we
are much more confident of our results,” says
Heckerman. “We can test more data sets
from more sources. As a result, we have a
much better understanding of the immune
system response, which we could not have
obtained without the cluster.” The team has
also published several scientific papers.
Streamlined Deployment, Management,
and Use
Deploying Windows Compute Cluster Server
is so easy that the HPC group—whose main
mission is engineering and product
development—could quickly set up and
manage the solution for the researchers. Not
only did the deployment take little time, but
the scripted installation also simplifies
changing the configuration and updating the
system. “We can do upgrades and rollouts of
new versions of code, deploy them to all the
nodes, reboot them, and have them running
in less than an hour,” says Lindsey.
“Sometimes upgrades take as little as 15
minutes.”
Lindsey finds administration of the HPC
cluster to be simple, too. “Windows Compute
Cluster Server runs jobs without a lot of
administrative overhead,” he says. “The fact
that it integrates with our existing Microsoft
infrastructure is a huge advantage. For
example, because it integrates with Microsoft
Operations Manager, I don’t have to learn
how to navigate a different set of tools or
figure out how everything can fit together. It
just works.”
The integration with Active Directory also
provides flexibility. “If we get to borrow a
cluster, it’s simple to set up, because I can
just use the same security groups for that
cluster as I do for this one,” says Lindsey.
The researchers had no problem adjusting to
the cluster, either. “Our original C# and C++
code ran on the cluster unchanged,” says
Kadie. “Later, the addition of a small amount
of cluster-aware code allowed us to automate
steps that we had been doing manually when
running our PCs.”
Provided Extensible Solution
Because it can take more than 10 years to
test a vaccine even after research has
supplied the critical insights, the research
team is making no promises about how soon
a vaccine will be available for HIV. However,
the information gained from the study of HIV
may also help with the development of
vaccines for malaria or hepatitis C, or even
with the development of personalized
medicine based on an individual’s genetic
makeup. With Windows Compute Cluster
Server, the researchers believe they have a
solution that they can rely on for some time
as they pursue their research. “Because our
HPC solution is based on a Microsoft
infrastructure, we can continue to build on
our solution,” says Heckerman.
The ability to easily add new components and
new or updated applications provides one
more reason that the researchers
recommend the solution to others with
intensive computational needs. Kadie
explains that the team has several
applications that have evolved through
hundreds of versions, as they add new
features and test them. “With the .NET
Framework and Windows Compute Cluster
Server, running a new version of an
application is as easy as copying files,” he
says. And when another cluster becomes
available, the team can immediately use it
with no code changes.
“Not only is Windows Compute Cluster Server
a high-performance solution that is simple to
use and organize, it’s also simple to add
more computing resources,” says Kadie. “And
anyone who needs to process large amounts
of statistical data will welcome the
productivity that comes with the increased
computational power.”
For More Information
For more information about Microsoft
products and services, call the Microsoft
Sales Information Center at (800) 426-9400. In Canada, call the Microsoft
Canada Information Centre at (877) 568-2495. Customers who are deaf or hard-of-hearing can reach Microsoft text telephone
(TTY/TDD) services at (800) 892-5234 in
the United States or (905) 568-9641 in
Canada. Outside the 50 United States and
Canada, please contact your local
Microsoft subsidiary. To access information
using the World Wide Web, go to:
www.microsoft.com
For more information about the Microsoft
server product portfolio, go to:
www.microsoft.com/servers/default.mspx
For more information about Microsoft high-performance computing solutions, go to:
www.microsoft.com/hpc
For more information about Microsoft
Research visit the Web site at:
http://research.microsoft.com
Software and Services
Microsoft Server Product Portfolio
− Windows Server 2003 Standard x64 Edition
− Microsoft Operations Manager 2005
− Microsoft SQL Server Desktop Engine
− Windows Compute Cluster Server 2003
Microsoft Visual Studio
− Microsoft Visual Studio 2005 Team Suite
Technologies
− Active Directory
− Microsoft .NET Framework

Hardware
− IBM eServer 326 server computers

This case study is for informational purposes only. MICROSOFT
MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS
SUMMARY.
Document published February 2007