GLAST LAT Project: Gamma-ray Large Area Space Telescope
ISOC Peer Review - March 2, 2004

GLAST Large Area Telescope ISOC Peer Review
Section 4.3: Pipeline, Data Storage and Networking Issues for the ISOC
Richard Dubois, SAS System Manager
Document: LAT-PR-03213-01
Outline
• SAS Summary Requirements
• Pipeline
  – Processing database
  – Prototype status
• Networking
  – Requirements
  – Proposed Network Topology
  – Network Monitoring
  – File exchange
  – Security
• Data Storage and Archive
Level III Requirements Summary
Ref: LAT-SS-00020

Function: Flight Ground Processing

Requirement | Expected Performance (if applicable) | Verification
perform prompt processing from Level 0 through Level 1 | keep pace with up to 10 GB Level 0 per day and deliver to SSC within 24 hrs | demonstration
provide near-real-time monitoring to IOC | within 6 hrs | demonstration
maintain state and performance tracking | (n/a) | demonstration
facilitate monitoring and updating of instrument calibrations | (n/a) | demonstration
archive all data passing through | > 50 TB on disk and tape backup | demonstration
• Basic requirements are to routinely handle ~10 GB/day arriving in multiple passes from the MOC
  – At 150 Mb/s, a 2 GB pass should take < 2 min
• Outgoing volume to SSC is << 1 GB/day
• NASA 2810 security regs mandate normal security levels for IOCs, as already practiced by computing centers
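As a quick check of the transfer-time claim, the figures above (a 2 GB pass over a 150 Mb/s link) can be plugged into a short calculation; this is only arithmetic on the quoted numbers:

```python
# Sanity-check the quoted transfer time: a 2 GB downlink pass at 150 Mb/s.
# Note the unit mix: link rate in megabits/s, file size in gigabytes.

def transfer_minutes(size_gb: float, rate_mbps: float) -> float:
    """Time in minutes to move size_gb gigabytes over a rate_mbps link."""
    bits = size_gb * 8e9              # gigabytes -> bits
    seconds = bits / (rate_mbps * 1e6)
    return seconds / 60

print(round(transfer_minutes(2, 150), 1))  # ~1.8 min, i.e. < 2 min as claimed
```

This ignores protocol overhead, so the real transfer is somewhat slower, but it stays comfortably under the 2-minute figure.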
Pipeline Spec
• Function
  – The Pipeline facility has five major functions:
    • automatically process Level 0 data through reconstruction (Level 1)
    • provide near real-time feedback to IOC
    • facilitate the verification and generation of new calibration constants
    • produce bulk Monte Carlo simulations
    • back up all data that passes through
• Must be able to perform these functions in parallel
• Fully configurable, parallel task chains allow great flexibility for use online as well as offline
  – Will test the online capabilities during Flight Integration
• The pipeline database and server, and the diagnostics database, have been specified (will need revision after prototype experience!)
  – database: LAT-TD-00553
  – server: LAT-TD-00773
  – diagnostics: LAT-TD-00876
Expected Capacity
• We routinely made use of 100-300 processors on the SLAC farm for repeated Monte Carlo simulations lasting weeks
  – Expanding the farm net to France and Italy
  – Our MC needs are not yet known
  – We are very small compared to our SLAC neighbour BABAR, for which the computing center is sized
    • 2000-3000 CPUs; 300 TB of disk; 6 robotic silos holding ~30000 200 GB tapes in total
  – The SLAC computing center has guaranteed our needs for CPU and disk, including maintenance, for the life of the mission
  – Our data rate is small compared to already-demonstrated MC capability
    • ~75 of today's CPUs to handle 5 hrs of data in 1 hour @ 0.15 sec/event
    • Onboard compression may make it 75 of tomorrow's CPUs too
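The ~75 CPU figure can be reproduced with a one-line budget calculation. The slide does not state the event rate; ~100 events/s is the value implied by the quoted numbers and is used here purely as an assumption:

```python
# Reproduce the CPU estimate above: process 5 hours of downlinked data in
# 1 hour of wall clock at 0.15 s of CPU per event.
# EVENT_RATE_HZ is NOT given on the slide; 100 events/s is the rate implied
# by the quoted numbers, assumed here for illustration.

EVENT_RATE_HZ = 100          # assumed downlink event rate
DATA_HOURS = 5               # hours of data per processing batch
WALL_HOURS = 1               # wall-clock budget to process the batch
CPU_SEC_PER_EVENT = 0.15     # reconstruction cost per event

events = EVENT_RATE_HZ * DATA_HOURS * 3600
cpu_seconds = events * CPU_SEC_PER_EVENT
cpus_needed = cpu_seconds / (WALL_HOURS * 3600)
print(cpus_needed)  # 75.0
```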
Pipeline in Pictures
[Diagram: a state machine plus a complete processing record, showing a configurable linked list of applications to run and an expandable, configurable set of processing nodes]
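The diagram's core idea, a configurable chain of applications whose states are tracked as a complete processing record, can be sketched in a few lines. All names here (Task, run_chain, the example stages) are illustrative, not the actual LAT pipeline interface:

```python
# Minimal sketch of the pipeline idea: a configurable chain of applications,
# with a state machine recording each processing step.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Task:
    name: str
    run: Callable[[dict], dict]   # each application transforms a context dict
    state: str = "PENDING"        # PENDING -> RUNNING -> DONE / FAILED

def run_chain(tasks: List[Task], context: dict) -> List[tuple]:
    """Run tasks in order, keeping a complete processing record."""
    record = []
    for task in tasks:
        task.state = "RUNNING"
        try:
            context = task.run(context)
            task.state = "DONE"
        except Exception:
            task.state = "FAILED"     # stop the chain; record the failure
            record.append((task.name, task.state))
            break
        record.append((task.name, task.state))
    return record

# Example chain mirroring the slide: Level 0 ingest -> reconstruction -> archive
chain = [
    Task("ingest",      lambda ctx: {**ctx, "level0": True}),
    Task("reconstruct", lambda ctx: {**ctx, "level1": True}),
    Task("archive",     lambda ctx: {**ctx, "archived": True}),
]
print(run_chain(chain, {}))
# [('ingest', 'DONE'), ('reconstruct', 'DONE'), ('archive', 'DONE')]
```

Because the chain is just data, swapping in a different task list reconfigures the pipeline without code changes, which is the flexibility the slide emphasizes.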
Processing Dataset Catalogue
[Screenshot: processing records, with datasets grouped by task; dataset info is kept here]
First Prototype - OPUS
• Open source project from STScI
• In use by several missions
• Now outfitted to run the DC1 dataset
[Screenshot: OPUS Java managers for pipelines]
Disk and Archives
• We expect ~10 GB of raw data per day and assume a comparable volume of events for MC
  – Leads to ~50 TB over 5 years for all data types
• No longer frightening: keep it all on disk
  – Use SLAC's mstore archiving system to keep a copy in the silo
    • Already practicing with it and will hook it up to OPUS
  – Archive all data we touch; track it in the dataset catalogue
  – Not an issue
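A back-of-envelope check of the ~50 TB figure, using only the numbers above (~10 GB/day raw plus a comparable MC volume over a 5-year mission); the remainder is derived data types:

```python
# Rough check of the ~50 TB total: raw downlink plus a comparable MC volume
# over 5 years. Derived data types (Level 1 etc.) make up the difference.

raw_tb = 10 * 365 * 5 / 1000   # ~10 GB/day for 5 years -> ~18 TB
mc_tb = raw_tb                 # "comparable volume" of MC, per the slide
print(round(raw_tb + mc_tb, 1))  # ~36.5 TB; with derived data, order 50 TB
```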
Network Path: SLAC-Goddard
[Map: network route from SLAC via Stanford, Oakland (CENIC), LA, Houston, Atlanta, and Washington (UC-AID/Abilene backbone) to GSFC (77 ms ping)]
ISOC Stanford/SLAC Network
• SLAC Computing Center
– OC48 connection to outside world
– provides data connections to MOC and SSC
– hosts the data and processing pipeline
– Transfers MUCH larger datasets around the world for
BABAR
– World renowned for network monitoring expertise
• Will leverage this to understand our open internet model
– Sadly, a great deal of expertise with enterprise security as
well
• Part of ISOC expected to be in new Kavli Institute building on
campus
– Connected by fiber (~2 ms ping)
– Mostly monitoring and communicating with processes/data
at SLAC
Network Monitoring
Need to understand failover reliability, capacity and latency
LAT Monitoring
• Keep track of connections to collaboration sites
  – Alerts if they go down
  – Fodder for complaints if connectivity is poor
• Monitoring nodes at most LAT collaborating institutions
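The monitoring logic amounts to tracking ping samples per site and alerting when a link is down or degraded. A toy sketch of that idea, with placeholder site names and thresholds (not the actual LAT monitoring setup):

```python
# Toy sketch of link monitoring: record round-trip samples per site and flag
# sites with too much packet loss or high latency. Site names and thresholds
# are placeholders for illustration only.

from collections import defaultdict
from statistics import mean

samples = defaultdict(list)  # site -> round-trip times in ms (None = no reply)

def record(site: str, rtt_ms) -> None:
    samples[site].append(rtt_ms)

def alerts(loss_threshold=0.5, rtt_threshold_ms=200):
    """Return sites with too many lost pings or high average latency."""
    bad = []
    for site, rtts in samples.items():
        lost = rtts.count(None) / len(rtts)
        answered = [r for r in rtts if r is not None]
        if lost > loss_threshold or (answered and mean(answered) > rtt_threshold_ms):
            bad.append(site)
    return bad

record("slac-gsfc", 77)   # healthy link, ~77 ms as on the route map
record("slac-gsfc", 80)
record("site-x", None)    # unresponsive site -> should trigger an alert
record("site-x", None)
print(alerts())  # ['site-x']
```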
File Exchange
• Secure
  – No passwords in plain text, etc.
• Reliable
  – Has to work > 99% of the time (say)
• Handle the (small) data volume
  – Order 10 GB/day from Goddard (MOC); 0.3 GB/day back to Goddard (SSC)
• Keep records of transfers
  – Database records of files sent and received
  – Handshakes: both ends agree on what happened
  – Some kind of clean error recovery; notifications sent out on failures
  – Web interface to track performance
• To be captured in an ICD with the SSC prior to DC2
• Are investigating DTS now
  – Not super happy with how it's built as far as reliability and security go
  – Installing it now
  – Starting to work with the author on issues
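The record-and-handshake requirement above can be sketched simply: the sender logs each file with a checksum, and the receiver confirms by recomputing it, so both ends agree on what happened. The names here are hypothetical and are not DTS's actual interface:

```python
# Illustrative sketch of transfer records with a checksum handshake.
# send() logs a file and its SHA-256; acknowledge() is the receiver-side
# confirmation. A FAILED status would trigger the failure notification.

import hashlib
import pathlib

transfer_log = []  # stand-in for the database of files sent and received

def checksum(path: pathlib.Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def send(path: pathlib.Path) -> dict:
    """Sender side: record the file and its checksum before transfer."""
    rec = {"file": path.name, "sha256": checksum(path), "status": "SENT"}
    transfer_log.append(rec)
    return rec

def acknowledge(rec: dict, received: pathlib.Path) -> None:
    """Receiver side: confirm the transfer by recomputing the checksum."""
    if checksum(received) == rec["sha256"]:
        rec["status"] = "CONFIRMED"
    else:
        rec["status"] = "FAILED"  # both ends now agree something went wrong
```

A real system would also time-stamp records and retry or alert on FAILED entries, per the error-recovery and notification bullets above.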
Security
• Network security – application vs network
– ssh/vpn among all sites – MOC, SSC and internal ISOC
– A possible avenue is to make all applications secure (i.e. encrypted), using SSL
• File and Database security
– Controlled membership in disk ACLs
– Controlled access to databases
– Depend on SLAC security otherwise
Summary
• We are testing out the OPUS pipeline as our first prototype
  – Already outfitted to run the DC1 dataset
  – Interfaces to the processing database and SLAC batch are done
  – Will practice with the DCs and Flight Integration
• We expect to need O(50 TB) of disk and ~2-3x that in tape archive
  – Not an issue
• We expect to use Internet2 connectivity for reliable and fast transfer of data between SLAC and Goddard
  – Transfer rates of > 150 Mb/s already demonstrated
  – < 2 min transfer for a standard downlink; more than adequate
  – Starting a program of routine network monitoring to practice
• Network security is an ongoing, but largely solved, problem
  – There are well-known mechanisms to protect sites
  – We will leverage considerable expertise from the SLAC and Stanford networking/security folks