Southeastern Universities Research Association (SURA)

SURA IT Program Update
• See Board materials book Tab 17 for IT highlights
– ITSG activities & summary of interactions with I2 & NLR – Pages 1-2
– SURA letters to I2 leadership – Pages 13-16
– SURAgrid status & activities – Pages 3-5
– SURAgrid corporate partnership program – Pages 5-6
– AtlanticWave update – Pages 8-10
– AT&T GridFiber update – Page 12
– SERON background and committee structure – Pages 17-27
– SURAgrid Governance Proposal – Pages 28-33
– SURAgrid application summaries – Pages 34-43
• Multiple Genome Alignment – GSU
• Urban Water System Threat Management Simulation – NC State
• Bio-electric Simulator for Whole Body Tissues – ODU
• Storm Surge Modeling with ADCIRC – UNC-CH/RENCI
SURAgrid Update
– A regional HPC infrastructure supporting research applications, SCOOP and other SURA programs (BioEnergy…)
– 29 participating institutions
– In excess of 10 TF capacity (quadrupled capacity in the past year)
– SURAgrid governance structure approved by IT Steering Group
– Recently hired Linda Akli as Grid Applications / Outreach Specialist
– Interviewing candidates for Grid Infrastructure Specialist
– Potential new participants: Stephen F. Austin, U Delaware, NC A&T, Mercer
• SURAgrid Corporate Partnership Activity
- 7 IBM AIX systems being added to SURAgrid (GSU, TAMU, LSU)
- New IBM partnership (addition of 3 and 6 TF Linux systems)
- see 4/18 Press Release
- New Dell Partnership (2 TF Linux system)
SURAgrid Corporate Partnerships
• Existing IBM p575 partnership
• New IBM e1350 Linux partnership
• New Dell PowerEdge 1950 partnership
• Significant product discounts
• Owned and operated by SURAgrid participants
• Integrated into SURAgrid, with 20% of each system's capacity available to the grid (see the sketch below)
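To make the 20% sharing rule concrete, here is a minimal Python sketch that tallies pledged grid capacity across partner systems. The site roster and names are hypothetical; only the 20% rule and the 3/6/2 TF system sizes come from these slides.

```python
# Sketch: aggregate capacity pledged to SURAgrid under the 20% sharing rule.
# Site names and the roster are illustrative assumptions; the 20% rule and
# the 3/6/2 TF system sizes are from the slides.

PARTNER_SHARE = 0.20  # fraction of each partner system made available to the grid

partner_systems = [          # (site, system peak TFlops) -- hypothetical entries
    ("Site A (IBM e1350, 3 TF)", 3.0),
    ("Site B (IBM e1350, 6 TF)", 6.0),
    ("Site C (Dell PE1950, 2 TF)", 2.0),
]

for name, tf in partner_systems:
    print(f"{name}: {tf * PARTNER_SHARE:.1f} TF pledged to SURAgrid")

grid_tf = sum(tf * PARTNER_SHARE for _, tf in partner_systems)
print(f"Aggregate pledged capacity: {grid_tf:.1f} TF")
```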
Existing p5 575 System Solution
• Robust Hardware with High Reliability Components
• 16 CPU scalability within a node
• Low Latency High Performance Switch Technology
• Industrial Strength OS and HPC Software Subsystems
• High Compute Density Packaging
• Ability to scale to very large configurations
Two Options

0.97 TFlop Solution for SURA
• 8 × 16-way nodes at 1.9 GHz (128 processors)
• Federation Switch interconnect
• 128 GB or 256 GB system memory
• Storage capacity: 2.35 TBytes

1.7 TFlop Solution for SURA
• 14 × 16-way nodes at 1.9 GHz (224 processors)
• Federation Switch interconnect
• 224 GB or 448 GB system memory
• Storage capacity: 4.11 TBytes

[Rack elevation diagrams of the two configurations: BPA power units above 2U Federation switch drawers and 4U 16-way nodes. The arithmetic behind the TFlop ratings is sketched below.]
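For readers wondering where the TFlop ratings come from: peak rate is processors × clock × floating-point operations per cycle. A short sketch follows, assuming the POWER5 retires 4 flops/cycle (two fused multiply-add units); that per-cycle factor is our assumption, not a figure from the slides.

```python
# Peak-performance arithmetic behind the two p5 575 options above.
# Assumption: POWER5 retires 4 flops/cycle (2 FMA units x 2 flops each).
FLOPS_PER_CYCLE = 4
CLOCK_GHZ = 1.9

def peak_tflops(processors: int) -> float:
    return processors * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000.0

for label, nodes in [("0.97 TFlop option", 8), ("1.7 TFlop option", 14)]:
    procs = nodes * 16  # 16-way nodes
    print(f"{label}: {procs} processors -> {peak_tflops(procs):.2f} TF peak; "
          f"{nodes * 16} GB or {nodes * 32} GB memory (16 or 32 GB/node)")
```

Running this reproduces the slide's figures: 128 processors give 0.97 TF with 128/256 GB of memory, and 224 processors give 1.70 TF with 224/448 GB.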
p5 575 Software
• AIX 5.3
• General Parallel File System (GPFS) with WAN Support
• LoadLeveler
• Cluster Systems Management (CSM)
• Compilers (XL/FORTRAN, XLC)
• Engineering and Scientific Subroutine Library (ESSL)
• IBM’s Parallel Environment (PE)
• Simultaneous Multi-Threading (SMT) Support
• Virtualization, Micro-Partitioning, DLPAR
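Since LoadLeveler is the batch scheduler in this stack, a minimal job submission may help orient new users. The `# @` keyword syntax and the `llsubmit` command are standard LoadLeveler; the node counts, executable name, and file names below are hypothetical, and a site would normally also set a job class.

```python
# Sketch: generate and submit a minimal LoadLeveler parallel job for the
# p5 575 cluster. Keyword syntax and llsubmit are standard LoadLeveler;
# node counts and the executable name are hypothetical.
import subprocess
import tempfile

job = """\
# @ job_type = parallel
# @ node = 2
# @ tasks_per_node = 16
# @ output = job.$(jobid).out
# @ error  = job.$(jobid).err
# @ queue
./my_mpi_app
"""

with tempfile.NamedTemporaryFile("w", suffix=".cmd", delete=False) as f:
    f.write(job)
    cmd_file = f.name

subprocess.run(["llsubmit", cmd_file], check=True)  # LoadLeveler submit command
```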
SURA Pricing for p5 575 Solutions
• 0.97 TFlop Solution
– 8 Nodes $380,000.00 to SURA (16GB/Node)*
– 8 Nodes $410,000.00 to SURA (32GB/Node)*
• 1.70 TFlop Solution
– 14 Nodes $610,000.00 to SURA (16GB/Node)*
– 14 Nodes $660,000.00 to SURA (32GB/Node)*
• Price Includes 3 Year Warranty
– Hardware M-F, 8-5, Next Day Service
• Pricing Available Through the End of Calendar Year 2007
Net Price to Add a Node with 32 GB Memory : $56,752
Net Price to Add a Node with 16 GB Memory : $53,000
New SURA e1350 Linux Cluster
• New IBM BladeCenter-H, new HS21XM Blades and Intel Quad-Core Processors
• 3 TFLOP Configuration
– One Rack solution with GigE interconnect
– 1GB/core
– Combination Management/User node with storage
• 6 TFLOP – Performance Focused Solution for HPC
– Two Rack solution utilizing DDR InfiniBand
– 2GB/core
– Combination Management/User node with storage
– Optional SAN supporting 4 Gbps storage at 4.6 TBytes
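To make the two configurations concrete, here is the core and memory arithmetic implied by the per-core figures above, using the blade counts given on the detail slides that follow (34 and 70 HS21XM blades). The totals are derived, not quoted on the slides.

```python
# Core/memory totals implied by the e1350 configurations in these slides.
CORES_PER_BLADE = 8  # dual-socket, quad-core HS21XM blade

for label, blades, gb_per_core in [("3 TFLOP (GigE)", 34, 1),
                                   ("6 TFLOP (DDR InfiniBand)", 70, 2)]:
    cores = blades * CORES_PER_BLADE
    print(f"{label}: {blades} blades, {cores} cores, "
          f"{cores * gb_per_core} GB total memory")
```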
New IBM e1350 Linux Cluster
• BladeCenter-H Based Chassis
– Redundant power supplies and Fan Units
– Advanced Management Module
– Dual 10 Gbps Backplanes
• Fully integrated, tested and installed e1350 Cluster
• Onsite configuration, setup and skills transfer
• Quad-Core Intel Processors (8 Cores/Node)
• Single Point of Support for the Cluster
• Terminal Server connection to every Node
• IBM 42U Enterprise Racks
• Pull-out Console monitor, keyboard, mouse
• Redundant power and fans on all nodes
• 3 Year Onsite Warranty
– 9x5, Next Day on-site service on Compute Nodes
– 24x7, 4-Hour on-site service on Management Node, switches, racks (optional Storage)
3 TFLOP e1350 Cluster - $217,285
• 34 HS21XM Blade Servers in 3 BladeCenter H Chassis
– Dual Quad-Core 2.67 GHz Clovertown Processors
– 1 GB Memory per core
– 73 GB SAS Disk per blade
– GigE Ethernet to Blade with 10Gbit Uplink
– Serial Terminal Server connection to every blade
– Redundant power/fans
• x3650 2U Management/User Node
– Dual Quad-Core 2.67 GHz Clovertown Processors
– 1 GB Memory per core
– Myricom 10Gb NIC Card
– RAID Controller with (6) 300GB 10K Hot-swap SAS Drives
– Redundant power/fans
• Force10 48-port GigE Switch with 2 10Gb Uplinks
• SMC 8-port 10Gb Ethernet Switch
• (2) 32-port Cyclades Terminal Servers
• Console Manager, Pull-out console, keyboard, mouse
• One 42U Enterprise Rack, all cables, PDUs
• RedHat ES 4 License and Media Kit (3 years update support)
• Shipping and Installation
• 5 Days onsite Consulting for configuration, skills transfer
• 3 Year Onsite Warranty
[Rack elevation diagram: one 42U rack holding the three BladeCenter H chassis, the x3650 management node, Force10 and SMC switches, terminal servers, and pull-out console. A check of the nominal TFLOP rating follows.]
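A quick check of the nominal ratings, assuming 4 double-precision flops per cycle per core for the 2.67 GHz Clovertown (an assumption about the Core microarchitecture's SSE units, not a figure from the slides); blade and core counts are from these slides.

```python
# Nominal-peak check for the e1350 builds (blade counts from these slides).
FLOPS_PER_CYCLE = 4   # assumed for Clovertown: 128-bit SSE add + multiply
CLOCK_GHZ = 2.67

def peak_tf(blades: int, cores_per_blade: int = 8) -> float:
    return blades * cores_per_blade * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000.0

print(f"34 blades: {peak_tf(34):.2f} TF peak  (the '3 TFLOP' build)")
print(f"70 blades: {peak_tf(70):.2f} TF peak  (the '6 TFLOP' build, next slide)")
```

This yields 2.90 TF and 5.98 TF, consistent with the rounded 3 and 6 TFLOP marketing figures.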
6 TFLOP e1350 Cluster - $694,309
• 70 HS21XM Blade Servers in 5 BladeCenter H Chassis
– Dual Quad-Core 2.67 GHz Clovertown Processors
– 2 GB Memory per core
– 73 GB SAS Disk per blade
– GigE Ethernet to Blade
– DDR Non-Blocking Voltaire InfiniBand Low Latency Network
– Serial Terminal Server connection to every blade
– Redundant power/fans
• x3650 2U Management/User Node
– Dual Quad-Core 2.67 GHz Clovertown Processors
– 1 GB Memory per core
– Myricom 10Gb NIC Card
– RAID Controller with (6) 300GB 10K Hot-swap SAS Drives
– Redundant power/fans
• DDR Non-Blocking InfiniBand Network
• Force10 48-port GigE Switch
• (3) 32-port Cyclades Terminal Servers
• Console Manager, Pull-out console, keyboard, mouse
• One 42U Enterprise Rack, all cables, PDUs
• RedHat ES 4 License and Media Kit (3 years update support)
• Shipping and Installation
• 10 Days onsite Consulting for configuration, skills transfer
• 3 Year Onsite Warranty
[Two-rack elevation diagram: five BladeCenter H chassis, InfiniBand leaf and core switches, x3650 management and storage nodes, GigE switch, terminal servers, and pull-out console]

6 TFLOP e1350 Cluster Storage Option - $41,037
• DS4700 Storage Subsystem
– 4 Gbps Performance (Fibre Channel)
– EXP810 Expansion System
– (32) 4 Gbps FC, 146.8 GB/15K Enhanced Disk Drive Modules (E-DDM)
– Total 4.6 TB Storage Capacity
• x3650 Storage Node
– Dual Quad-Core 2.67 GHz Clovertown Processors
– 1 GB Memory per core
– Myricom 10Gb NIC Card
– (2) 3.5" 73GB 10K Hot-Swap SAS Drives
– (2) IBM 4-Gbps FC Dual-Port PCI-E HBA
– Redundant power/fans
– 3 Year, 24x7, 4-Hour On-site Warranty
[Rack detail: DS4700 controller with EXP810 expansion drawer in the second rack]
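A note on the 4.6 TB figure in the storage option: it is the raw drive total (32 × 146.8 GB), before any RAID parity or hot spares, which the slides do not specify. A one-line check:

```python
# Raw capacity of the DS4700/EXP810 option: 32 drives x 146.8 GB each.
raw_gb = 32 * 146.8
print(f"{raw_gb:.1f} GB raw = {raw_gb / 1024:.2f} TiB (~ the quoted 4.6 TB)")
```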
SURA – Dell Partnership
• Complete Dell PowerEdge 1950 2 TFlop High Performance Computing Cluster
• SURAgrid Special Offer - $112,500
– Master Node – Dell PowerEdge 1950 (Qty = 1)
– Compute Nodes – Dell PowerEdge 1950 (Qty = 27)
– Gigabit Ethernet Interconnect
• Dell PowerConnect 6248 (Qty = 2)
– PowerEdge 4210, 42U Frame
– Platform Rocks – Cluster Management Software
• 1 year support agreement
– Complete Rack & Stack,
• including cabling, prior to delivery
– Complete Software Installation –
• Operating system, Cluster Management Software
– 2 Day on-site systems engineer
Compute Nodes – Dell PowerEdge 1950 (Qty = 27)
• Dual 2.33GHz/2x4MB Cache, Quad Core Intel® Xeon E5345, 1333MHz FSB Processors
• 12GB FBD 667MHz Memory
• 80GB 7.2K RPM SATA Hard Drive
• Red Hat Enterprise Linux WS v4 1Yr RHN Subscription, EM64T
• 24X CD-ROM
• 3 Years HPCC Next Business Day Parts and Labor On-Site Service
Master Node – Dell PowerEdge 1950 (Qty = 1)
• Dual 2.33GHz/2x4MB Cache, Quad Core Intel® Xeon E5345,
1333MHz FSB Processors
• 12GB FBD 667MHz Memory
• Embedded RAID Controller – PERC5
• (2) 146GB, SAS Hard Drive (RAID1)
• Dual On-Board 10/100/1000 NICs
• 24X CDRW/DVD-ROM
• Dell Remote Assistance Card
• Redundant Power Supply
• Red Hat Enterprise Linux AS v4 1Yr Red Hat Network Subscription,
EM64T
• 3 Years Premier HPCC Support with Same Day 4 Hour Parts and
Labor On-Site Service
Gigabit Ethernet Interconnect – Dell PowerConnect 6248 (Qty = 2)
• PowerConnect 6248 Managed Switch, 48 Port 10/100/1000 Mbps
• Four 10 Gigabit Ethernet uplinks
• 3 Years Support with Next Business Day Parts Service
Other Components
• PowerEdge 4210, Frame, Doors, Side Panel, Ground, 42U
• (3) 24 Amp, Hi-Density PDU, 208V, with IEC to IEC Cords
• 1U Rack Console with 15" LCD Display, Mini-Keyboard/Mouse Combo
• Platform Rocks – Cluster Management Software with 1 year support agreement
• Complete Rack & Stack, including cabling, prior to delivery
• Complete Software Installation - Operating system, Cluster Management Software, etc.
• 2 Day on-site systems engineer
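With all three partnerships priced, a rough price-per-nominal-TFlop comparison falls out of the figures quoted in these slides (p5 575 at the 16 GB/node price points). This uses nominal peak only; delivered application performance, memory, and interconnect differ substantially across these systems.

```python
# Rough $/TFlop across the discounted offerings, using the nominal ratings
# and base prices quoted in these slides.
offerings = [
    ("IBM p5 575, 0.97 TF", 380_000, 0.97),
    ("IBM p5 575, 1.7 TF",  610_000, 1.70),
    ("IBM e1350, 3 TF",     217_285, 3.0),
    ("IBM e1350, 6 TF",     694_309, 6.0),
    ("Dell PE1950, 2 TF",   112_500, 2.0),
]
for name, price, tf in offerings:
    print(f"{name}: ${price / tf:,.0f} per TFlop")
```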
For More Information Regarding
IBM and Dell Discount Packages
Contact Gary Crane
315-597-1459
or
gcrane@sura.org
SURAgrid Governance and Decision-Making
Structure Overview
• See Tab 17 (page 28) of Board materials book for a copy
of the SURAgrid Governance and Decision Making
Structure Proposal
• SURAgrid Project Planning Working Group established at
Sep 2006 In-Person Meeting to develop governance
options for SURAgrid. Participants included:
– Linda Akli, SURA
– Gary Crane, SURA
– Steve Johnson, Texas A&M University
– Sandi Redman, University of Alabama in Huntsville
– Don Riley, University of Maryland & SURA IT Fellow
– Mike Sachon, Old Dominion University
– Srikanth Sastry, Texas A&M University
– Mary Fran Yafchak, SURA
– Art Vandenberg, Georgia State University
SURAgrid Governance Overview
• To date, SURAgrid has used consensus-based decision-making, with SURA facilitating the process
• Its current state of maturity & investment means formal governance is needed
• Expected purposes of formal governance
– Ensure those investing have an appropriate role in governance
– Support sustainable growth of active participation to enhance the SURAgrid infrastructure
• 3 Classes of Membership Defined
– 1. Contributing Member
• Higher education or related org contributing significant resources
to advance SURAgrid regional infrastructure
• SURA is contributing member by definition
– 2. Participating Member
• Higher education or related org, other than a Contributing Member, participating in SURAgrid activities
– 3. Partnership Member
• Entity (org, commercial, non-HE…) with a strategic relationship to SURAgrid
SURAgrid Governance Overview
• SURAgrid Contributing Members form primary
governing body
• Each SURAgrid Contributing Member will
designate one SURAgrid voting member
• SURAgrid Governance Committee elected by
SURAgrid Contributing Members
– SURAgrid Governance Committee acts on behalf of Contributing Members, providing guidance, facilitation, and reporting
• Initial SURAgrid Governance Committee will
have 9 members:
– 8 elected by contributing members
– 1 appointed by SURA
Transitioning to the New SURAgrid Governance
and Decision-Making Structure
• SURAgrid Participating Organizations designate a SURAgrid LEAD – Done
• New governance structure approved by SURA IT
Steering Group – Done
• New Governance structure approved by vote of
SURAgrid participating leads – Done
• Call for nominations for SURAgrid Governance
Committee candidates – Done
• Nominations will be accepted through midnight
April 28
• Election of SURAgrid Governance Committee
members is expected to be completed by May 12