Research Business Technology

Pfizer Enterprise Elastic HPC
Mike Miller
Pfizer Research Business Technology
May 18th Prism Meeting
Stockholm Sweden
How do we define HPC?
• Simply summarized as the computational
laboratory
• Consists of:
• Desktop/Services, integrated with
• Global high performance cached file system
• Centralized large capacity/capability compute resources
• Used by:
• Direct
• 300-400 expert computational scientists in chemistry, biology, DMPK,
stats, pharm sci & clin pharm
• Indirect
• >2000 lab scientists using desktop apps that utilize HPC compute
The Evolution of HPC at Pfizer
[Figure: HPC usage over time. Y-axis label: "105 CPU Hours". X-axis: 1999.1 to 2011.4. Series: Chemistry, Biology, Pharm Sci, Stats, Clin Pharm, Biologics, NGS/HCS/Image. Annotated milestones: 2000 SGI Origins (128 cores); 2004 150 blades (300 cores); 2009 6 x3950 (520 cores); 2010 on-demand Amazon VPC.]
Intersection of “The Cloud” and HPC
Pfizer VPC Overview
• The Pfizer Virtual Private Cloud (pilot effort) has been
implemented as an extension of our physical data center.
• Infrastructure as a service affords rapid provisioning
without compromising on:
– Security
– Compatibility
– Accessibility
– Agility
– Utility
• Implementation
[Diagram: Pfizer’s isolated VPC resources (subnets) in the AWS Virginia DC, inside the Amazon Web Services cloud, connect through a VPN gateway and a secure VPN connection over the Internet to a router in the Groton DMZ.]
Computing Requirements Come in Many Forms
[Table comparing AWS with the internal data center. The row/column layout is not recoverable, so apparent row labels and cell values are listed separately.]
• Columns: Feature, AWS, Internal Data Center
• Features: VM’s; Security Monitoring; Confidentiality; Complexity; Provisioning Costs; Avail. Config.; Request Capacity/Wk; Runtime/Depreciation; Support SLAs; OS Configurations; Environment; Support Model; Min. Billable Period; Provisioning SLA; Controls; System root level access; Qualified / Validated
• Values: Required to be joined to the Pfizer network; Stand Alone; Public; Moderate; High; Simple; Black Box; high; med; low; $0; low-10’s $; mid-10’s $; high-10’s $; Low-100’s $; Low 1000’s $; 1-10s; 10-100s; 100-1000s; Immediate; 1 hr (twice); 4 hrs; 24 hr; 2-8 wks; 1 mo.; 6 mo.; None; 7x24; 8x5; Self / incident; AMI/Xen; Bare Metal; VMWare; Xen; Linux RHEL 5.x; Windows Server 2003/2008; Solaris, AS 400; POC; Dev / Test; Prod; HPC (twice)
Security
• Amazon practices & security
measures successfully met audit
criteria for Research level use
• Pfizer employed the same security
systems used internally
– IPsec tunnels into AWS
– Pfizer Global Active Directory
• Joining machines and managing permissions
– Linux & Windows
Compatibility
• To get the most benefit from the cloud it was
necessary to align AWS resource offerings with
existing internal systems:
– AMI’s (VM) → Pfizer Qualified RHEL 5 image
• Centrify/AD provides identification/authorization
• Kerberos credentials via AD
– File cache (storage) → OpenAFS volumes accessible
– IP mappings → Pfizer DNS
• AMI’s have Pfizer network identities & are discoverable
– Allows AMI’s to be part of our LSF cluster
– Users can do development work accessing the full range of Pfizer resources
• e.g. software licenses utilize the Pfizer FlexLM server
Availability
• AD & DNS give us full range of access to
internal systems
– LSF for job scheduling
– Oracle / MySQL instances for accessing
structured data
– AFS for secure access to unstructured data
• High performance via local caching
– Access to licensed and internally developed software
Agility
• The $50M decision
– Required completion of a time sensitive
chemoinformatics task
• Workload was diverted from internal resources so they
could be dedicated to the task.
• Within 30 min 64 cores were spun up and joined to LSF
• For 4 days >50,000 jobs were executed
• Total cost <$1,500
– Results were obtained on-time and the decision
taken
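The figures above imply rough unit costs. The following is a back-of-the-envelope sketch only: it assumes all 64 cores ran around the clock for the full 4 days, which the slide does not state, and it treats the quoted cost and job count as bounds rather than exact values.

```python
# Figures quoted on the slide (bounds, not exact values).
CORES = 64
DAYS = 4
MAX_COST_USD = 1500   # "Total cost <$1,500"
MIN_JOBS = 50_000     # ">50,000 jobs"

# Assumption (not stated on the slide): every core ran 24 h/day for all 4 days.
core_hours = CORES * DAYS * 24

# Upper bounds implied by the quoted limits.
max_cost_per_core_hour = MAX_COST_USD / core_hours
max_cost_per_job = MAX_COST_USD / MIN_JOBS
max_avg_core_minutes_per_job = core_hours * 60 / MIN_JOBS

print(f"core-hours available:      {core_hours}")
print(f"cost per core-hour:      < ${max_cost_per_core_hour:.3f}")
print(f"cost per job:            < ${max_cost_per_job:.2f}")
print(f"avg core-minutes per job < {max_avg_core_minutes_per_job:.1f}")
```

Under that assumption the run had 6,144 core-hours available, which puts the cost below roughly $0.25 per core-hour and $0.03 per job.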
Utility
• Internal Application Development
– Tomcat web applications
– Nightly builds & regression testing
• HPC capacity
– Over 250 apps are accessible
– LSF uses resource specifications to determine
suitability and schedules jobs accordingly
• Over 100,000 jobs run
– QM, ab initio
– Virtual screening
– Systems biology
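The idea of using resource specifications to decide where a job may run can be illustrated with a toy matcher. This is a minimal sketch and not LSF's actual scheduling algorithm; the `Host` and `JobRequest` types, the host names, and the tags are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    cores: int
    mem_gb: int
    tags: frozenset = frozenset()


@dataclass(frozen=True)
class JobRequest:
    cores: int
    mem_gb: int
    required_tags: frozenset = frozenset()


def suitable_hosts(job, hosts):
    """Return the hosts that satisfy every requirement of the job."""
    return [
        h for h in hosts
        if h.cores >= job.cores
        and h.mem_gb >= job.mem_gb
        and job.required_tags <= h.tags  # all required tags are present
    ]


# Hypothetical mix of internal and cloud hosts.
hosts = [
    Host("internal-01", cores=8, mem_gb=32, tags=frozenset({"internal"})),
    Host("cloud-01", cores=4, mem_gb=16, tags=frozenset({"cloud"})),
    Host("cloud-02", cores=8, mem_gb=64, tags=frozenset({"cloud", "bigmem"})),
]

job = JobRequest(cores=8, mem_gb=48, required_tags=frozenset({"cloud"}))
matches = [h.name for h in suitable_hosts(job, hosts)]
print(matches)  # ['cloud-02']
```

A real scheduler would also weigh load, queue priority, and licensing; the sketch only shows the suitability filter.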
Implementation
• From PoC → Production
– Provisioning, exploring commercial solutions that
enable:
• One-time actions
– Integrate with our procurement system
• Move to a debit (pre-allocated funding) model
– Standard configurations
• Repeatable actions
– Start/ Stop instances via a user centric dashboard
• Users manage / are accountable for the resources they use
• LSF
– Custom code
• detect workload
• Start / Stop AMI’s
• Leverage accounting
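The "detect workload, start / stop" loop could be driven by a decision function along the following lines. This is a hypothetical sketch, not Pfizer's custom code: the function name, the one-job-per-core assumption, and the default limits are all illustrative, and the calls that would query LSF or start and stop instances are deliberately left out.

```python
import math


def instance_delta(demand_jobs, running_instances,
                   cores_per_instance=8, max_instances=16):
    """Return how many instances to start (positive) or stop (negative).

    demand_jobs: pending plus running single-core jobs aimed at cloud hosts
                 (assumption: one job occupies one core).
    running_instances: cloud instances currently up.
    """
    needed = math.ceil(demand_jobs / cores_per_instance)
    target = min(needed, max_instances)  # cap spend with a hard ceiling
    return target - running_instances


# Hypothetical snapshots of the queue.
print(instance_delta(demand_jobs=100, running_instances=4))    # 9  (start 9)
print(instance_delta(demand_jobs=0, running_instances=4))      # -4 (stop all)
print(instance_delta(demand_jobs=1000, running_instances=16))  # 0  (at the cap)
```

Keeping the decision separate from the side effects makes it easy to test and to log alongside accounting data.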