Research Business Technology

Pfizer Enterprise Elastic HPC
Mike Miller
Pfizer Research Business Technology
May 18th, Prism Meeting, Stockholm, Sweden

How do we define HPC?
• Simply summarized as the computational laboratory
• Consists of:
  • Desktop/Services, integrated with
  • Global high performance cached file system
  • Centralized large capacity/capability compute resources
• Used by:
  • Direct
    • 300-400 expert computational scientists in chemistry, biology, DMPK, stats, pharm sci & clin pharm
  • Indirect
    • >2000 lab scientists using desktop apps that utilize HPC compute

The Evolution of HPC at Pfizer
[Figure: CPU hours by quarter, 1999.1 to 2011.4. Axis label: 105 CPU Hours. Series: Chemistry, Biology, Pharm Sci, Stats, Clin Pharm, Biologics, NGS/HCS/Image.]
• 2000: SGI Origins (128 cores)
• 2004: 150 blades (300 cores)
• 2009: 6 x3950 (520 cores)
• 2010: on-demand Amazon VPC

Intersection of “The Cloud” and HPC

Pfizer VPC Overview
• The Pfizer Virtual Private Cloud (pilot effort) has been implemented as an extension of our physical data center.
• Infrastructure as a service affords rapid provisioning without compromising on:
  – Security
  – Compatibility
  – Accessibility
  – Agility
  – Utility
• Implementation
[Diagram labels: Pfizer’s isolated VPC resources; Subnets; AWS Virginia DC; Amazon Web Services Cloud; VPN Gateway; Secure VPN Connection over the Internet; Router; Groton DMZ]

Computing Requirements Come in Many Forms
[Table comparing AWS with Internal Data Center VMs. Labels: Feature; Required to be joined to the Pfizer network; Security Monitoring; Confidentiality; Complexity; Provisioning Costs; Avail. Config.; Provisioning SLA; Capacity/Wk; Runtime/Depreciation; Support Model; Support SLAs; OS Configurations; Environment; Min. Billable Period; Controls.]
[Cell values, in extraction order: Public, Moderate, Low; 1000’s $, mid-10’s $, $0; AMI/Xen; 1 mo.; Request; 100-1000s, 10-100s, 1-10s; low-10’s $, high-10’s $, Low-100’s $; Self / incident; None; Linux RHEL 5.x; POC; 6 mo.; 7x24, 8x5; Immediate, 24 hr; Windows Server 2003/2008; Dev / Test; HPC Black Box; 2-8 wks, 4 hrs, 1 hr.; Bare Metal, VMWare, Xen; 1 hr; High, Simple, Stand Alone; high, med, low; Prod; HPC; System root level access; Qualified / Validated; Solaris, AS 400]

Security
• Amazon practices & security measures successfully met audit criteria for Research level use
• Pfizer employed the same security systems used internally
  – IPsec tunnels into AWS
  – Pfizer Global Active Directory
    • Joining machines and managing permissions
  – Linux & Windows

Compatibility
• To get the most benefit from the cloud it was necessary to align AWS resource offerings with existing internal systems:
  – AMIs (VM): Pfizer Qualified RHEL 5 image
    • Centrify/AD provides identification/authorization
    • Kerberos credentials via AD
  – File cache (storage): OpenAFS volumes accessible
  – IP mappings: Pfizer DNS
    • AMIs have Pfizer network identities & are discoverable
  – Allows AMIs to be part of our LSF cluster
  – Users can do development work accessing the full range of Pfizer resources
    • e.g. software licenses utilize the Pfizer FLEXlm server

Availability
• AD & DNS give us full range of access to internal systems
  – LSF for job scheduling
  – Oracle / MySQL instances for accessing structured data
  – AFS for secure access to unstructured data
    • High performance via local caching
  – Access to licensed and internally developed software

Agility
• The $50M decision
  – Required completion of a time-sensitive chemoinformatics task
    • Other workload was diverted from internal resources so they could be dedicated to the task.
    • Within 30 min, 64 cores were spun up and joined to LSF
    • For 4 days, >50,000 jobs were executed
    • Total cost <$1,500
  – Results were obtained on time and the decision taken

Utility
• Internal Application Development
  – Tomcat web applications
  – Nightly builds & regression testing
• HPC capacity
  – Over 250 apps are accessible
  – LSF uses resource specifications to determine suitability and schedules jobs accordingly
• Over 100,000 jobs run
  – QM, ab initio
  – Virtual screening
  – Systems biology

Implementation
• From PoC to Production
  – Provisioning, exploring commercial solutions that enable:
    • One-time actions
      – Integrate with our procurement system
        • Move to a debit (pre-allocated funding) model
      – Standard configurations
    • Repeatable actions
      – Start/stop instances via a user-centric dashboard
        • Users manage / are accountable for the resources they use
  – LSF
    • Custom code
      – Detect workload
      – Start / stop AMIs
      – Leverage accounting
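
The "detect workload, start / stop AMIs" idea in the last bullet could be sketched roughly as below. This is a minimal illustration and not Pfizer's actual custom code: the function name `plan_instances`, its parameters, and the 8-cores-per-instance figure are all hypothetical. In a real setup the pending-job count would come from the scheduler (for LSF, for example by parsing `bjobs` output) and the result would be passed to the cloud provider's start/stop calls, neither of which is shown here.

```python
import math


def plan_instances(pending_jobs, running_instances, idle_instances,
                   cores_per_instance=8, max_instances=8):
    """Return how many cloud instances to start (>0) or stop (<0).

    Hypothetical policy, assuming one job per core:
    - If jobs are pending beyond what idle instances can absorb, start enough
      instances to cover them, capped by max_instances.
    - If nothing is pending, stop every idle instance to avoid paying for it.
    """
    if pending_jobs > 0:
        # Jobs that idle capacity cannot absorb still need new cores.
        uncovered = max(pending_jobs - idle_instances * cores_per_instance, 0)
        needed = math.ceil(uncovered / cores_per_instance)
        headroom = max(max_instances - running_instances, 0)
        return min(needed, headroom)
    # No queued work: release idle instances.
    return -idle_instances


if __name__ == "__main__":
    # 64 queued single-core jobs, nothing running yet: start 8 instances.
    print(plan_instances(pending_jobs=64, running_instances=0, idle_instances=0))
    # Queue drained, 3 of 8 instances idle: stop those 3.
    print(plan_instances(pending_jobs=0, running_instances=8, idle_instances=3))
```

Keeping the decision as a pure function, separate from the scheduler query and the cloud calls, makes the policy easy to test and to tune (for instance by changing the instance cap to match a pre-allocated budget).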