Microsoft Server Product Portfolio
Customer Solution Case Study

Microsoft Researchers Boost Task Productivity Fiftyfold with Cluster Server Software

Overview
Country or Region: United States
Industry: Life sciences

Customer Profile
Microsoft Research, founded in 1991, employs more than 700 researchers worldwide, working in more than 55 research areas that are independent of software development.

Business Situation
Microsoft researchers working on immune system interactions with the human immunodeficiency virus struggled to get meaningful results using their networked individual computers.

Solution
With the deployment of Windows® Compute Cluster Server 2003 for high-performance computing using 25 IBM eServer 326 server computers in a 64-bit environment, the researchers gained the necessary computing power.

Benefits
− Increased task productivity fiftyfold
− Achieved more confidence in research results
− Streamlined deployment, management, and use
− Provided extensible solution

“With Windows Compute Cluster Server, we can run 50 jobs—of 200,000 work items each—in the same amount of time that it used to take to run 1 job.”
Carl Kadie, Research Software Development Engineer, Microsoft Research, Microsoft Corporation

Since 2003, scientists at Microsoft Research have been performing research on the design of a vaccine for the human immunodeficiency virus (HIV). However, with only six personal computers and 10 processors, the research team struggled to perform statistical analysis, which required a year of computer processing, or CPU time. With the help of the High Performance Computing (HPC) group at Microsoft, the team deployed Windows® Compute Cluster Server 2003 on 25 IBM server computers. Now, the research team can run 50 jobs—of 200,000 work items each—in the time it once took to run 1 job. As a result, the research team has gained enough insights to publish in the top scientific journals.
The team, which finds Windows Compute Cluster Server simple to deploy, use, manage, and extend, is using it to unravel the puzzles that may one day lead to an HIV vaccine.

“With the .NET Framework and Windows Compute Cluster Server, running a new version of an application is as easy as copying files.”
Carl Kadie, Research Software Development Engineer, Microsoft Research, Microsoft Corporation

Situation
Since 2003, Microsoft Research has been helping in the quest to develop a vaccine for the human immunodeficiency virus (HIV), which causes acquired immunodeficiency syndrome (AIDS). The HIV research team members, who all have impressive credentials, include David Heckerman, MD, PhD, Senior Researcher; Carl Kadie, PhD, Research Software Development Engineer; and Jonathan Carlson, MS, Research Intern and PhD candidate.

The researchers are pursuing work that may lead to the development of a vaccine for HIV. Their research supports the search for an immunogen—the part of the vaccine that triggers an immune response. Researchers elsewhere are working on the other central component of vaccine design, the vector, or the part of the vaccine that delivers the immunogen.

In 1991, Microsoft Corporation created its own research organization, Microsoft Research. To build a foundation for future technology breakthroughs, Microsoft Research focuses on long-term projects, independent of day-to-day product development. Today, Microsoft Research employs more than 700 researchers, working in more than 55 research areas.

Complications of Research on HIV
One of the complications that the researchers face is that HIV mutates rapidly, unlike some viruses. “At one extreme is the measles virus, which virtually never changes. On the other extreme are viruses like HIV that evolve in each patient’s body,” says Heckerman.
“In between, there are viruses such as influenza, which may mutate only a few times a year.”

To understand the significance of the HIV mutation rate, it helps to know how the human immune system works. Conceptually, the immune system memorizes a virus and, once trained, the system recognizes the virus and mounts an assault to kill it. When the mutation rate is low—as it is in the measles virus—the virus is essentially the same in everyone who contracts it. That similarity makes it easier for researchers to develop an immunogen that will kill the virus in all populations.

HIV, however, which mutates much more frequently, presents more of a challenge. “The immune system trains itself to attack a particular form of HIV, but then the HIV mutates to escape the attack,” says Heckerman. In addition, the genetic variation in human immune systems is so great that there are hundreds of different types of immune responses. So, finding an immunogen for a rapidly mutating virus is complicated.

The Microsoft researchers run simulations of how HIV responds to attacks by the immune system, using the genetic information about the virus—a description of its ribonucleic acid (RNA). The researchers are looking for correlations between the viral RNA and the human immune type. To determine which correlations are significant, they must perform extensive randomized testing. Knowledge of these correlations can thus contribute to the development of an effective immunogen that works for the wide variety of human populations.

Challenges for Microsoft Research
Microsoft Research works with many prestigious universities and research facilities throughout the world, including Harvard University, Massachusetts General Hospital, the Fred Hutchinson Cancer Research Center, and the Los Alamos National Laboratory in the United States; Oxford University in the United Kingdom; Murdoch University in Australia; and the University of British Columbia in Canada, among others.
These institutions provide the scientists with genetic sequencing information from the viral samples.

The number of calculations involved is staggering. “A typical job will have a million work items,” says Kadie. “Each one of those is a statistical test. To arrive at a result that we can depend on with confidence, we need about one year of CPU time.”

Early in the research, however, the scientists had only their own and borrowed personal computers available. “If we were lucky, we would get maybe six computers and—because some had dual processors—that would give us 10 processors,” says Kadie. “We had to start each of the jobs separately, and then copy the files over and set everything up. Then, one of the biggest problems was remembering which computer we had used for each job, so we could tabulate the results.”

Heckerman adds, “We were not able to do anything very interesting at that point, because we couldn’t do the tests that we needed to do to provide meaningful results.”

The research team needed more computing power. Because the primary applications that the team uses were written in the Microsoft® Visual C#® development tool and the Visual C++® development system, any improvements in the computing infrastructure needed to support those applications.

Solution
To help the Microsoft researchers address their computing challenges, the Microsoft High Performance Computing (HPC) Group, which works on product development, deployed Windows® Compute Cluster Server 2003, which runs on the Windows Server 2003 operating system in a 64-bit computing environment.
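As an illustration of the kind of work item described in the Situation section, where each item is one statistical test and significance is judged by randomized testing, the following is a minimal Python sketch of a permutation test. It is not the researchers' code (their applications were written in C# and C++). The function name, the boolean encoding of the data, and the choice of a co-occurrence count as the test statistic are all assumptions made for illustration.

```python
import random


def permutation_p_value(has_mutation, has_immune_type,
                        n_permutations=10_000, seed=0):
    """Estimate a one-sided p-value for the association between two
    boolean sequences (one entry per patient) by shuffling one of them.

    Hypothetical encoding, not taken from the case study:
      has_mutation[i]    -- patient i's viral sequence carries a given change
      has_immune_type[i] -- patient i has a given immune type
    """
    if len(has_mutation) != len(has_immune_type):
        raise ValueError("both sequences need one entry per patient")

    def co_occurrence(a, b):
        # Test statistic: number of patients with both traits.
        return sum(1 for x, y in zip(a, b) if x and y)

    rng = random.Random(seed)
    observed = co_occurrence(has_mutation, has_immune_type)

    shuffled = list(has_immune_type)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffling breaks any real link between the two traits while
        # keeping how often each trait occurs.
        rng.shuffle(shuffled)
        if co_occurrence(has_mutation, shuffled) >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Made-up data: the two traits always appear together.
    linked = [True] * 10 + [False] * 10
    print(permutation_p_value(linked, linked, n_permutations=2000))
```

Each call is independent of every other call, which is why work of this shape divides cleanly across processors. Taking the figures quoted above at face value, about one year of CPU time spread over a million work items averages roughly 31.5 seconds per item, and 10 fully busy processors would need about 36.5 days per job. That is an idealized estimate that ignores the manual setup and file copying Kadie describes.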
Windows Compute Cluster Server is a high-performance computing solution that includes setup procedures, a suite of management tools, and the Compute Cluster Job Manager, an integrated job scheduler. The compute cluster software also works with the Active Directory® service to provide role-based security.

The deployment took place quickly; it was completed in little more than an hour. “Implementation consisted of setting up the head node using the built-in setup wizard, and then using a scripted install to add nodes to the cluster,” says Doug Lindsey, Program Manager, High Performance Computing Group, Microsoft. “All the administrator has to do is approve the setup on the administrative console. It takes between one and one-and-a-half minutes to install each node, and they install in parallel.”

The HPC cluster for the HIV researchers is based on 25 IBM eServer 326 server computers, with two AMD Opteron processors per machine running at 2.6 gigahertz. Heckerman, Kadie, and Carlson share HPC clusters with other groups at Microsoft, and occasionally more resources become available, depending on demand, but they always have a minimum of 36 processors available. For the database, the researchers use the Microsoft SQL Server™ Desktop Engine.

Lindsey administers the cluster remotely using the Compute Cluster Administrator, which ships with Windows Compute Cluster Server, and makes use of the Microsoft Management Console. He uses Microsoft Operations Manager 2005 with Service Pack 1 for monitoring the system.

The researchers use the Compute Cluster Job Manager, the integrated job scheduler that comes with Windows Compute Cluster Server, to run and monitor jobs. For some projects, Kadie automated submissions with scripting using the Microsoft Visual Studio® 2005 Team Suite development system. “Sitting at our desktop, we can run an application created in C# that talks directly to the cluster and accomplishes all the steps to create the jobs that we want to run.”

The applications and HPC cluster use the Microsoft .NET Framework. The researchers never considered using Linux—but not because the work was done at Microsoft. “We get a lot of productivity from writing our code in C# and .NET, which Linux does not support,” says Kadie.

Benefits
For the HIV scientists at Microsoft Research, the use of Windows Compute Cluster Server has dramatically improved their ability to do research. “We can run larger jobs now, and so do better science,” says Heckerman. “We can try more difficult things and learn more quickly, which can affect the paths we pursue. The solution has also proven to be simple to deploy, manage, and use, and it can be easily expanded.”

Increased Task Productivity Fiftyfold
The Microsoft researchers have seen a huge increase in productivity since the deployment of the HPC solution. By taking advantage of 64-bit computing running in a clustered environment, the HPC solution has provided a much higher level of performance for the enormous number of calculations that must be run. “With Windows Compute Cluster Server, we can run 50 jobs—of 200,000 work items each—in the same amount of time that it used to take to run 1 job,” says Kadie.

Heckerman also notes the comparison between the previous environment and the HPC solution. “By running on 64-bit computers, our jobs are no longer limited to two gigabytes of memory,” he says. “The additional computational power and memory space has made all the difference in our ability to perform the necessary research.
We can investigate more possibilities and get the results faster.”

Achieved More Confidence in Research Results
Windows Compute Cluster Server has contributed another benefit, too. “Because we can process so much data efficiently, we are much more confident of our results,” says Heckerman. “We can test more data sets from more sources. As a result, we have a much better understanding of the immune system response, which we could not have obtained without the cluster.” The team has also published several scientific papers.

Streamlined Deployment, Management, and Use
Deploying Windows Compute Cluster Server is so easy that the HPC group—whose main mission is engineering and product development—could quickly set up and manage the solution for the researchers. Not only did the deployment take little time, but the scripted installation also simplifies changing the configuration and updating the system. “We can do upgrades and rollouts of new versions of code, deploy them to all the nodes, reboot them, and have them running in less than an hour,” says Lindsey. “Sometimes upgrades take as little as 15 minutes.”

Lindsey finds administration of the HPC cluster to be simple, too. “Windows Compute Cluster Server runs jobs without a lot of administrative overhead,” he says. “The fact that it integrates with our existing Microsoft infrastructure is a huge advantage. For example, because it integrates with Microsoft Operations Manager, I don’t have to learn how to navigate a different set of tools or figure out how everything can fit together. It just works.”

The integration with Active Directory also provides flexibility. “If we get to borrow a cluster, it’s simple to set up, because I can just use the same security groups for that cluster as I do for this one,” says Lindsey.

The researchers had no problem adjusting to the cluster, either. “Our original C# and C++ code ran on the cluster unchanged,” says Kadie. “Later, the addition of a small amount of cluster-aware code allowed us to automate steps that we had been doing manually when running our PCs.”

Provided Extensible Solution
Because it can take more than 10 years to test a vaccine even after research has supplied the critical insights, the research team is making no promises about how soon a vaccine will be available for HIV. However, the information gained from the study of HIV may also help with the development of vaccines for malaria or hepatitis C, or even with the development of personalized medicine based on an individual’s genetic makeup.

With Windows Compute Cluster Server, the researchers believe they have a solution that they can rely on for some time as they pursue their research. “Because our HPC solution is based on a Microsoft infrastructure, we can continue to build on our solution,” says Heckerman. The ability to easily add new components and new or updated applications provides one more reason that the researchers recommend the solution to others with intensive computational needs.

Kadie explains that the team has several applications that have evolved through hundreds of versions, as they add new features and test them. “With the .NET Framework and Windows Compute Cluster Server, running a new version of an application is as easy as copying files,” he says. “And when another cluster becomes available, the team can immediately use it with no code changes.
“Not only is Windows Compute Cluster Server a high-performance solution that is simple to use and organize, it’s also simple to add more computing resources,” says Kadie. “And anyone who needs to process large amounts of statistical data will welcome the productivity that comes with the increased computational power.”

For More Information
For more information about Microsoft products and services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information Centre at (877) 568-2495. Customers who are deaf or hard-of-hearing can reach Microsoft text telephone (TTY/TDD) services at (800) 892-5234 in the United States or (905) 568-9641 in Canada. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information using the World Wide Web, go to: www.microsoft.com

For more information about the Microsoft server product portfolio, go to: www.microsoft.com/servers/default.mspx

For more information about Microsoft high-performance computing solutions, go to: www.microsoft.com/hpc

For more information about Microsoft Research, visit the Web site at: http://research.microsoft.com

Software and Services
Microsoft Server Product Portfolio
− Windows Server 2003 Standard x64 Edition
− Microsoft Operations Manager 2005
− Microsoft SQL Server Desktop Engine
− Windows Compute Cluster Server 2003
Microsoft Visual Studio
− Microsoft Visual Studio 2005 Team Suite
Technologies
− Active Directory
− Microsoft .NET Framework

Hardware
− IBM eServer 326 server computers

This case study is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.

Document published February 2007