Data Movement & Storage Using the Data Capacitor Filesystem Justin Miller

jupmille@indiana.edu
http://pti.iu.edu/dc
Big Data for Science Workshop
July 2010
Challenges for DISC
•  Keynote by Alex Szalay identified the
challenges that researchers face
– “Scientific data doubles every year”
•  Amount of data is a barrier to extracting knowledge
– Problem of today: data access
– How can we minimize data movement?
Workflow Example – Single Compute
[Diagram labels: Data Source; Compute Resource; Researcher's Computer]
Workflow Example – Multiple Compute
[Diagram labels: Data Source; Compute Resource #1; Compute Resource #2; Researcher's Computer]
Workflow Example – Visualization
[Diagram labels: Data Source; Compute Resource #1; Compute Resource #2; Researcher's Computer; Visualization Resource]
Workflow Example – Archive
[Diagram labels: Data Source; Compute Resource #1; Compute Resource #2; Researcher's Computer; Visualization Resource; Tape Archive]
Data Movement & Storage
•  This is an unsustainable workflow
–  Works for GB, maybe single TB, but not more
•  Every resource is another series of transfers
–  Data movement is in the way of doing work
–  Good reasons to add resources to workflow
•  And we haven’t addressed other drawbacks
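The scaling problem in the bullets above can be made concrete with a back-of-the-envelope estimate. The sketch below computes how long one copy of a dataset takes, and how that multiplies with every extra resource in the workflow. The 1 Gb/s link, the 70% efficiency factor, and the three-hop workflow are illustrative assumptions, not figures from the talk.

```python
GB = 10**9
TB = 10**12


def transfer_hours(size_bytes: float, link_bits_per_s: float, efficiency: float = 0.7) -> float:
    """Hours needed to copy size_bytes over a link running at a fraction of line rate."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    effective_bytes_per_s = link_bits_per_s * efficiency / 8
    return size_bytes / effective_bytes_per_s / 3600


LINK_BITS_PER_S = 1e9  # assumed 1 Gb/s path between resources
HOPS = 3               # assumed: source -> compute -> visualization -> archive

for label, size in [("10 GB", 10 * GB), ("1 TB", 1 * TB), ("50 TB", 50 * TB)]:
    per_hop = transfer_hours(size, LINK_BITS_PER_S)
    print(f"{label}: {per_hop:.1f} h per hop, {per_hop * HOPS:.1f} h for {HOPS} hops")
```

Under these assumptions a single terabyte costs roughly three hours per hop, which is why every added resource makes the copy-based workflow harder to sustain.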
IU Central Filesystem Workflow
[Diagram labels: Data Source; Researcher's Computer; Compute Resource #1; Compute Resource #2; Visualization Resource; Tape Archive; Data Capacitor]
IU’s Data Capacitor Filesystem
•  Funded by the National Science Foundation in 2005
•  Funds purchased 535TB of Lustre storage
–  339TB available as production service
•  Data Capacitor name comes from electronics
–  a capacitor provides transient storage of
electrons
–  absorbs and evens out peaks in flow
–  provides consistent output
Idea of Data Capacitor
•  Centralized short-term storage for IU resources
–  Store your data to compute against, and use
for “scratch space” during your run
–  Possibility exists for mid-term storage
Data Capacitor Centralized Storage
•  Compute using IU’s supercomputer Big Red
•  Compute using IU’s Quarry cluster
•  Archive to IU’s massive HPSS tape archive
–  hierarchical storage
–  archive your data to tape
Central to IU Cyberinfrastructure
Physics Research
•  Dr. Chuck Horowitz, IU physicist
–  Interested in the behavior of neutron stars
–  Studying the behavior of nuclear matter near
saturation density
•  can form an interesting phase called “nuclear pasta”
–  Using MDGRAPE-2 hardware for increased
performance
Physics Research
•  Particle interaction is simulated via molecular
dynamics using specialized MDGRAPE-2
hardware
–  configurations are saved
•  Post processing
–  creates VTK frames
•  Visualization system
–  ingests frame data
–  displays as movie
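The post-processing step above (saved configurations in, one VTK frame per configuration out) might look roughly like the following. This is a minimal sketch, not the group's actual code: it assumes each configuration is a list of (x, y, z) particle positions and that frames are written as legacy ASCII VTK polydata files. The function names and the `frame_NNNN.vtk` naming scheme are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float, float]


def vtk_frame(points: Sequence[Point], title: str = "particle configuration") -> str:
    """Render one configuration as a legacy ASCII VTK polydata file (points as vertices)."""
    n = len(points)
    lines = [
        "# vtk DataFile Version 3.0",
        title,
        "ASCII",
        "DATASET POLYDATA",
        f"POINTS {n} float",
    ]
    lines += [f"{x:g} {y:g} {z:g}" for x, y, z in points]
    # Each vertex cell is "1 <point index>", so the cell list holds 2*n integers.
    lines.append(f"VERTICES {n} {2 * n}")
    lines += [f"1 {i}" for i in range(n)]
    return "\n".join(lines) + "\n"


def write_frames(configs: Iterable[Sequence[Point]], out_dir: str) -> List[Path]:
    """Write one numbered .vtk file per saved configuration and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for step, points in enumerate(configs):
        path = out / f"frame_{step:04d}.vtk"
        path.write_text(vtk_frame(points))
        paths.append(path)
    return paths
```

Because the frames are ordinary files, writing them to a shared filesystem lets the visualization system read them in place instead of receiving a copy.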
Physics Research
[Diagram labels: Compute Resource; Data Capacitor; Visualization Resource; Tape Archive]
Earth Science Research
•  Linked Environments for Atmospheric Discovery
(LEAD)
•  WxChallenge
–  The WxChallenge is a meteorological forecast
competition.
–  Compete to forecast maximum and minimum
temperatures, precipitation, and maximum
wind speeds for select U.S. cities over a ten-week period each semester
LEAD Workflow
[Diagram labels: Weather Data; Compute Resource; Data Capacitor; Computer Science Cluster; Transfer Resources]
Extend the Centralized FS Model
•  The natural progression is to be central to
more resources
– Make data available to more resources
•  IU did this by extending the filesystem
across the wide-area network (WAN)
– Data Capacitor WAN (DC-WAN)
– New FS separate from the original DC
Data Capacitor WAN
Data Capacitor WAN Tradeoffs
•  The benefit of a centralized WAN
filesystem is the illusion of locality
•  Your data is transferred behind the scenes
across the network
– At worst, your data will be transferred
more slowly than you would like
– At best, it is as fast as, or faster than, local
storage; typically it is comparable across
research networks
DC-WAN Namespace Mapping
•  WAN FS challenge is heterogeneous user
identification across sites
•  The numeric user identification (UID) for a
particular user is not the same across sites
•  You don’t have to worry about this
because DC-WAN does the conversion
Site      Username   UID
Indiana   jupmille   648424
TACC      tg803934   803934
PSC       jupmille   43415
NCSA      jupmille   40436
SDSC      jupmille   502639
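Conceptually, the conversion is a lookup from (site, local UID) to a single identity on the filesystem. The sketch below illustrates the idea using the example account above; it is not IU's mapping code, which lives inside the Lustre servers. Treating the Indiana UID as the canonical one, pairing sites with UIDs in the order listed on the slide, and falling back to an unprivileged UID are all assumptions made for illustration.

```python
from typing import Dict, Tuple

CANONICAL_UID = 648424  # assumption: the Indiana UID is the identity stored on disk
NOBODY_UID = 65534      # assumption: unmapped users fall back to an unprivileged UID

# (site, UID as seen on that site's client) -> UID used on the filesystem
UID_MAP: Dict[Tuple[str, int], int] = {
    ("Indiana", 648424): CANONICAL_UID,
    ("TACC", 803934): CANONICAL_UID,
    ("PSC", 43415): CANONICAL_UID,
    ("NCSA", 40436): CANONICAL_UID,
    ("SDSC", 502639): CANONICAL_UID,
}


def canonical_uid(site: str, local_uid: int) -> int:
    """Translate a site-local UID into the UID that owns the files on the shared filesystem."""
    return UID_MAP.get((site, local_uid), NOBODY_UID)


print(canonical_uid("TACC", 803934))  # same owner as the Indiana account
print(canonical_uid("PSC", 12345))    # unknown user, mapped to the fallback
```

The key has to include the site because the same number can belong to different people at different institutions.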
Physics Research with DC-WAN
[Diagram labels: Simulation on Compute Resource (Austin, TX; mileage not recoverable); Visualization Resource; Analysis on Compute Resource; Data Capacitor WAN; Tape Archive]
Astronomy with DC-WAN
[Diagram labels: WIYN Telescope One Degree Imager (ODI), Tucson, AZ (mileage not recoverable); Analysis on Compute Resource; Data Capacitor WAN; Tape Archive]
Image NOAO/AURA/NSF
Center for the Remote Sensing of Ice
Sheets (CReSIS) Workflow
[Diagram labels: Greenland; Antarctica; Compute Resource, Lawrence, KS (mileage not recoverable); Data Capacitor WAN; Tape Archive]
Gas Giant Planet Research
[Diagram labels: Visualization Resource; PSC, Pittsburgh, PA; NCSA, Urbana, IL; MSU, Starkville, MS (mileage for each site not recoverable); Data Capacitor WAN; Tape Archive]
Demo
•  Small sample of Gas Giant Planet Research
•  Data is on DC-WAN, which is mounted on two
different resources
–  Compute on PSC’s Pople (SGI Altix 4700)
–  Post-process and visualize results on IU
machine that has proprietary software (IDL
v7.0); view over network
IU’s Data Capacitor WAN Filesystem
•  Funded by Indiana University in 2008
•  339TB of storage available as production service
•  Centralized short-term storage for nationwide
resources, including TeraGrid
–  Use your data on the best resource for your
needs
–  Short-term storage like DC, possibility exists
for mid-term storage
Based on Lustre Filesystem
•  Lustre is a parallel distributed file system
•  Available under the GNU GPL
•  Used by U.S. government, movie studios,
financial institutions, oil and gas industry
•  Runs on 7 of the top 10 HPC systems on the June 2009
“Top 500” list
–  52 of the top 100 run Lustre in 2010
Based on Lustre Filesystem
•  “Lustre filesystems can support up to tens of
thousands of client systems, petabytes (PBs) of
storage and hundreds of gigabytes per second
(GB/s) of I/O throughput.”
•  Scalable filesystem
–  uses separate servers to aggregate for
performance
–  storage backend is hidden from the client
Lustre Filesystem Architecture
•  Lustre presents all clients with standard POSIX
filesystem interface
–  Filesystem mount
•  My scratch directory for example:
–  IU: /N/dcwan/scratch/jupmille/
–  PSC: /N/dcwan/scratch/jupmille/
–  TACC: /N/dcwan/scratch/jupmille/
–  NCSA: /N/dcwan/scratch/jupmille/
–  Standard commands
•  ls, cp, cat, etc. from the command line
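Because the mount is POSIX, application code needs nothing Lustre-specific: ordinary file calls work unchanged. The sketch below writes, lists, and reads a file through the standard library. It runs against a temporary directory so it works anywhere; on a machine with DC-WAN mounted, the scratch path from the slide could be passed in instead. The function name and file name are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def roundtrip(scratch_dir: str, name: str = "hello.txt") -> str:
    """Write a file, confirm it appears in a directory listing, and read it back."""
    path = Path(scratch_dir) / name
    path.write_text("written through the ordinary POSIX file API\n")  # like cp or a redirect
    if name not in os.listdir(scratch_dir):                           # like ls
        raise FileNotFoundError(name)
    return path.read_text()                                           # like cat


# Stand-in for a mounted scratch directory such as /N/dcwan/scratch/<username>/
with tempfile.TemporaryDirectory() as scratch:
    print(roundtrip(scratch), end="")
```

The same script would run unmodified at any site where the filesystem is mounted, since the path is identical everywhere.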
Lustre Filesystem Architecture
•  Metadata Server (MDS)
–  stores the filesystem metadata such as
filenames, directories, and permissions.
–  file operations such as open/close
•  Object Storage Server (OSS)
–  bulk I/O servers
•  Object Storage Targets (OST)
–  back-end storage devices
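The division of labor above can be sketched with a toy model: a metadata map plays the MDS role (which OSTs hold a file), and a list of object stores plays the OST role (the bytes themselves). This is a simplified illustration, not Lustre's implementation. The OSS layer is folded into the OSTs, placement always starts at OST 0, and the class and method names are hypothetical.

```python
from typing import Dict, List


class ToyLustre:
    """Toy model: one metadata map (MDS role) plus a list of object stores (OST role)."""

    def __init__(self, n_osts: int, stripe_size: int) -> None:
        self.stripe_size = stripe_size
        self.mds: Dict[str, List[int]] = {}  # path -> indices of the OSTs holding the file
        self.osts: List[Dict[str, bytes]] = [{} for _ in range(n_osts)]

    def write(self, path: str, data: bytes, stripe_count: int) -> None:
        if not 1 <= stripe_count <= len(self.osts):
            raise ValueError("stripe_count must be between 1 and the number of OSTs")
        ost_ids = list(range(stripe_count))  # toy placement: the first stripe_count OSTs
        self.mds[path] = ost_ids             # metadata operation (create/open)
        for ost_id in ost_ids:
            self.osts[ost_id][path] = b""
        for k, start in enumerate(range(0, len(data), self.stripe_size)):
            chunk = data[start:start + self.stripe_size]
            self.osts[ost_ids[k % stripe_count]][path] += chunk  # bulk I/O, round-robin

    def read(self, path: str) -> bytes:
        ost_ids = self.mds[path]             # metadata lookup; no file data is touched here
        size = self.stripe_size
        pieces = [
            [obj[i:i + size] for i in range(0, len(obj), size)]
            for obj in (self.osts[ost_id][path] for ost_id in ost_ids)
        ]
        total = sum(len(p) for p in pieces)
        count = len(ost_ids)
        # Chunk k was written to OST k % count as that OST's (k // count)-th piece.
        return b"".join(pieces[k % count][k // count] for k in range(total))


fs = ToyLustre(n_osts=4, stripe_size=4)
fs.write("/scratch/demo.dat", b"abcdefghijklmnopqrstuvwxyz", stripe_count=3)
print(fs.mds["/scratch/demo.dat"])
print(fs.read("/scratch/demo.dat"))
```

The point of the split is that the metadata service is consulted once per open, while the bulk data path fans out across several storage servers in parallel.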
Lustre Filesystem Architecture
[Diagram labels: MDS; OSS (x4); OST (x11); Client]
Data Capacitor Hardware
•  8 pairs Dell PowerEdge 2950
–  2 x 3.0 GHz Dual Core Xeon
–  Myrinet 10G Ethernet
–  Dual port Qlogic 2432 HBA (4 x FC)
–  2.6 Kernel (RHEL 5), Lustre 1.8
•  4 DDN S2A9550 Controllers
–  Over 2.4 GB/sec measured throughput each
–  339 TB of spinning SATA disk
Data Capacitor WAN Hardware
•  2 pairs Dell PowerEdge 2950
–  2 x 3.0 GHz Dual Core Xeon
–  Myrinet 10G Ethernet
–  Dual port Qlogic 2432 HBA (4 x FC)
–  2.6 Kernel (RHEL 5), Lustre 1.8
•  1 DDN S2A9550 Controller
–  Over 2.4 GB/sec measured throughput
–  339 TB of spinning SATA disk
Getting the Most out of Lustre
•  Lustre is optimized for large files (where large is
>1 MB); it performs less well with small files
•  Lustre has aggressive client side caching
–  if you plan to read the same files more than
once, this is a big win
•  Lustre allows you to control how your data is
striped across the OSTs, so optimization based
on your I/O patterns can reap benefits in
throughput
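To see why file size and stripe settings interact, it helps to compute where a file's bytes land. The sketch below assumes Lustre's round-robin striping, in which consecutive stripe-sized chunks go to consecutive OSTs, and reports how many bytes each OST receives. The 1 MiB stripe size and the stripe count of 4 are illustrative assumptions. On a real system the layout is set per file or directory with the `lfs setstripe` command.

```python
from typing import List

MiB = 1024 * 1024


def bytes_per_ost(file_size: int, stripe_size: int, stripe_count: int) -> List[int]:
    """Bytes stored on each OST when a file is striped round-robin across stripe_count OSTs."""
    if file_size < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("sizes and counts must be positive")
    full_stripes, tail = divmod(file_size, stripe_size)
    base, extra = divmod(full_stripes, stripe_count)
    # Every OST gets `base` full stripes; the first `extra` OSTs get one more.
    per_ost = [(base + (1 if i < extra else 0)) * stripe_size for i in range(stripe_count)]
    if tail:
        per_ost[full_stripes % stripe_count] += tail  # the partial last stripe
    return per_ost


for label, size in [("512 KiB", MiB // 2), ("10 MiB", 10 * MiB), ("1 GiB", 1024 * MiB)]:
    layout = bytes_per_ost(size, stripe_size=MiB, stripe_count=4)
    print(label, [round(b / MiB, 2) for b in layout])
```

A file smaller than one stripe sits entirely on a single OST, so it gains nothing from striping, while a large file spreads evenly and can be read from several OSTs at once.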
Lustre WAN Future
•  DC-WAN will be mounted on the India and
Sierra FutureGrid clusters
–  In the testing phase right now
•  IU’s Lustre UID mapping code will be used in a
new TeraGrid Lustre-WAN project in
development now
Thank you for listening.
•  Questions are welcome.
–  Please use moderators for Q&A
Justin Miller
jupmille@indiana.edu
Data Capacitor Team
dc-team-l@indiana.edu
http://pti.iu.edu/dc