SAMGrid: JIM and CDF Development
Rick St. Denis, University of Glasgow
• CDF Accepts the Need for the Grid
– Requirements
• How to Meet the Need
– Status of SAMGrid for CDF
4 March 2004
GridPP 9th Collaboration Meeting
Spokespersons’ Requirements for CDF
Maximize physics output @ low Lumi
– L3 output rate: 80 -> 360 Hz by 06
CDF needs the Grid
Director’s review, International Finance Committee: 50% computing outside FNAL
CDFGrid supported by FNAL PAC
Scale of CDF Requirements
        THz     %offsite CPU    Speed   #duals
FY04    3.7     25%             3GHz    150
FY05    9.0     50%             5GHz    +360
FY06    16.5    50%             8GHz    +220
6-7 sites, 100 duals each, by 2006
+ 700 @FNAL
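The "#duals" figures can be roughly reproduced from the other columns. The sketch below is a back-of-the-envelope check, not the calculation used for the slide. It assumes (none of this is stated in the source) that a "dual" is two CPUs at the listed clock speed, that offsite capacity is the listed percentage of the total THz, and that each year's new duals cover only the growth in offsite capacity.

```python
# Back-of-the-envelope check of the "#duals" column.
# Assumptions (not stated on the slide): a dual = 2 CPUs at the listed
# clock speed; offsite capacity = offsite fraction x total THz; each
# year's purchase covers only the growth in offsite capacity.

YEARS = [
    # (fiscal year, total THz, offsite fraction, CPU clock in GHz)
    ("FY04", 3.7, 0.25, 3.0),
    ("FY05", 9.0, 0.50, 5.0),
    ("FY06", 16.5, 0.50, 8.0),
]

def offsite_duals_added(years):
    """Return {fiscal year: estimated number of new offsite duals}."""
    added = {}
    previous_offsite_ghz = 0.0
    for name, total_thz, fraction, clock_ghz in years:
        offsite_ghz = total_thz * 1000.0 * fraction
        growth_ghz = offsite_ghz - previous_offsite_ghz
        added[name] = growth_ghz / (2 * clock_ghz)
        previous_offsite_ghz = offsite_ghz
    return added

estimates = offsite_duals_added(YEARS)
for name, duals in estimates.items():
    print(f"{name}: about {duals:.0f} new offsite duals")
```

Under these assumptions the estimates land close to 150, 360, and 230 duals, compared with 150, +360, and +220 on the slide.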
CDF Computing Model
• Develop Analysis on desktop
– Access to all CDF data from anywhere
• Large scale processing on batch clusters
– Submission from anywhere
– Interactive tools: ls, top, head/tail/cat
– Output to scratch space or desktop
Implemented now with CAF
Use Cases for Summer 2004
• User Level MC Production
– All CDF Users have access
– No data on site -> SAM write
• User Level Data Access
– All users have access
– Selected samples on site: Full SAM Support
SAM Essential for Summer 2004
Medium Term Vision
• Many Sites
• Fully transparent submission to all of
CDF resources: 75% FNAL, 25%
outside
• Fully transparent input and output of
data
Summer 04 Functionality
• User selects submission site, saying what
dataset they will use
• System checks they can do this (privileges)
• User access with SAM/dCache
• User registers output with SAM
October 04
• To extend beyond 25% outside computing, JIM is essential: JIM test for CDF in June 04, production in October 04
• HOWEVER: It already seems that the 25% resources are not sufficient for the production passes: will want JIM earlier.
CDFGrid from a User Perspective
[Figure: diagram with labels CAF Gui/CLI, Uses SAM, AC++, Grid, Italy, Toronto, Korea, Taiwan, UK, Only Outside Lab, Grid, Fermilab, FermiCAF.]
CDF Grid Strategy
• 25% of CDF Computing from external
resources. All CDF computing on
CDF Grid by April 15: Utilize
resources fully controlled by CDF:
Kerberos/fbsng: dCAF + SAM
• October 15, 2004: JIM to capture
shared resources
• June 2005: 50% of Computing
resources external
Simple JIM
[Figure: architecture diagram with location labels Anywhere, @ each site, @regional centers, @FNAL and component labels Desktop, Private LAN, Globus GK, CAF Submitter, SAM Station, Condor Submitter, WN, Private LAN, dCache, SAM DB, Condor Matchmaker.]
Detailed JIM
June 2004: testing; June 2005: required
[Figure: architecture diagram showing the flow of job, data, and meta-data. Component labels: User Interface, Submission, Global Job Queue, Resource Selector, Grid Client, Match Making, Global DH Services, Info Gatherer, SAM Naming Server, Info Collector, SAM Log Server, Resource Optimizer, MSS, Cluster, Data Handling, Local Job Handling, SAM Station (+other servs), Grid Gateway, SAM Stager(s), Local Job Handler (CAF, D0MC, BS, ...), AAA, Worker Nodes, SAM DB Server, Site, RC, MetaData Catalog, Bookkeeping Service, Info Manager, JIM Advertise, Dist.FS, Cache, MDS, Web Serv, Info Providers, Grid Monitoring, XML DB server, Site Conf., Glob/Loc JID map, ..., User Tools.]
Meeting the Needs
• Progress in SAM
• JIM Status
• RunJob
• CDFGridWorkshop: “Nerd’s Paradise”
• Strict Project Management and process to respond to operational issues
Progress in SAM
• Dbserver, the database server between
applications and Oracle, was upgraded to
use a common schema for CDF and D0.
• All CDF data files are in SAM
• SAM is in beta testing on the CDF CAF
(1200 cpus): passed 20TB/Day delivery
• Minos uses SAM for its Data Handling
• Steve Mrenna (Phenomenology) depositing
ALPGEN files in SAM for common
CDF/D0 use.
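To put the 20TB/Day delivery figure in perspective, the sketch below converts it to an average sustained rate. It assumes decimal units (1 TB = 10^12 bytes) and an even spread over 24 hours and over the 1200 CPUs, neither of which the slide states; real delivery would be burstier.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def tb_per_day_to_mb_per_s(tb_per_day):
    """Average rate in MB/s for a daily volume in TB (decimal units assumed)."""
    return tb_per_day * 1e12 / SECONDS_PER_DAY / 1e6

caf_rate = tb_per_day_to_mb_per_s(20)   # whole CAF
per_cpu_rate = caf_rate / 1200          # even share per CPU (an assumption)
print(f"CAF average: {caf_rate:.0f} MB/s, per CPU: {per_cpu_rate:.2f} MB/s")
```

This works out to about 231 MB/s across the CAF, or roughly 0.19 MB/s per CPU.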
JIM Deployment Issues
Focus:
• 200 jobs each getting 200 files generated 120000 requests simultaneously to the DBServer!
– Sensible sam: reliability went to 60%. Now add retries.
Communication with the expert!
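One common way to add retries without re-creating the original burst of simultaneous requests is exponential backoff with jitter. The sketch below is illustrative only: `with_retries`, its parameters, and the flaky `fake_dbserver_request` are hypothetical names and are not part of SAM or the DBServer interface.

```python
import random
import time

def with_retries(request, max_attempts=5, base_delay=0.5, max_delay=30.0,
                 sleep=time.sleep, rng=random.random):
    """Call request() until it succeeds or max_attempts is reached.

    Waits base_delay * 2**attempt (capped at max_delay), scaled by a random
    factor in [0.5, 1.0), so that many clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * (0.5 + 0.5 * rng()))

# Hypothetical stand-in for a server call that fails twice, then succeeds.
calls = {"count": 0}

def fake_dbserver_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("server busy")
    return "file-location"

result = with_retries(fake_dbserver_request, sleep=lambda seconds: None)
print(result, "after", calls["count"], "attempts")
```

The random factor matters here: with 200 jobs retrying on the same schedule, a fixed delay would simply replay the same spike a moment later.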
Training Users
• D0 has D0Tools: Big script; determines where
user is and copies files: harder to get into a
sandbox;
• CAF conditions users!
Distribution and compatibility:
• This has made great strides with SAM, now time
for JIM
RunJob
• Dedicated farms at FNAL will go away and
RunJob will be used for production
processing of data
• CDF will use RunJob for MC production
• Dave Evans worked for CDF for 2 mo.: has
made CDFRunJob based on
RunJob(Shakar), a tool common to CMS.
Morag will work on this.
Florida workshop:
• 11 installations in about 2 hours. Integrated with dCAF in 2 cases in 2 days.
Now 20!
• 3 in Asia, 4 in Europe
• 6 sites committed to summer 2004 usage of their
facilities for all of CDF (mostly MC)
• Sam installation now: initsam cdf <stationname>
• Follow-up on April 1.
• Each site has a local user support person to reduce
load on core development team.
• Generally: Security ate 80% of the effort!
Florida Workshop: After 2 Days
Participating Institutes installation and testing progress
[Table: per-institute installation status, with entries Yes, Problems, In progress, or ?. Column headers include krb5, CDF Software, Sam Station, Sam File Store, sam_par_ret, AC++Dump, Caf Head Node Works, DCAF, and AC++Dump on CAF. Institutes: MIT, Korea, Pisa, Japan, Karlsruhe, Liverpool, Toronto, Taiwan, TTU, Glasgow, UCSD, CNAF. Station labels: knu, pisa, japan, fzzka, liverpool, toronto, taiwan, ttu, glasgow, ucsd, cnaf. Most entries are Yes.]
2TB/Day: Karlsruhe
CDF Dcache on CAF
ALL CDF on CAF reads 20TB/Day
Dcache and SAM
• Dcache shapes traffic into disk: If a SAM
cache is large, need to use Dcache instead
of nfs mounts
• Dcache gives the user what is requested.
1TB gets same priority as 1GB: CDF users
must send email requesting data to be
staged.
• SAM examines consumption rate before
staging next files – No EMAIL needed.
• SAM uses Dcache for its Caching at FNAL.
• This needs further work with SRM
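The idea of checking the consumption rate before staging more files can be illustrated with a small sketch. This is not SAM's algorithm; `files_to_stage` and all of its parameters are hypothetical. It only shows how a stager could size its next request from how fast a consumer is actually reading, rather than from how much was asked for.

```python
import math

def files_to_stage(files_consumed, elapsed_seconds, stage_latency_seconds,
                   files_already_staged, max_prefetch=50):
    """Decide how many more files to stage for one consumer.

    Stage just enough files to keep the consumer busy for one staging
    latency at its observed rate, minus what is already waiting on disk.
    """
    if elapsed_seconds <= 0 or files_consumed <= 0:
        # No observed rate yet: prime the consumer with a single file.
        return 1 if files_already_staged == 0 else 0
    # Files the consumer is expected to use while the next batch is staged.
    needed = math.ceil(files_consumed * stage_latency_seconds / elapsed_seconds)
    needed = min(max(needed, 1), max_prefetch)
    return max(0, needed - files_already_staged)

# A fast consumer (10 files in 100 s) versus a slow one (1 file in 1000 s),
# both with a 300 s staging latency and nothing staged yet.
fast = files_to_stage(10, 100, 300, 0)
slow = files_to_stage(1, 1000, 300, 0)
print(fast, slow)
```

In this toy policy a slow consumer never occupies more cache than it can use, which is the behaviour the slide contrasts with giving a 1TB request the same priority as a 1GB one.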
SAMGrid Management
Sam Management Team
Sam Project Leaders
Sam Technical Leaders
Sam Operations and Projects
Sam Design
SamGrid Development Process
[Figure: process flow diagram with labels Issue Raised, SAMGrid Operations/Projects, SAMGrid Design, SAMGrid Management Team, Grid Deliverables, Subproject, Chaired by Project Leaders, Chaired by Technical Managers.]
Subproject Organization
• Each Subproject has a subproject leader
(SPL) responsible for making a plan and
reporting progress.
• Each Subproject has one of the Technical
leaders evaluating against an assessment
template.
• No deliverable requires more than 3mo
work to deliver.
SubProject Assessment Template
1. Background Documents
2. Project Definition/Mission Statement
3. Deliverables and timetable
4. Inter-project deliverables
5. Project status
6. Challenges and Critical Path Items
7. Lessons Learned
8. Project specific comments, alternate views
SAMGrid Assigned SubProjects
MC / Reconstruction
Housekeeping
Work Flow Package
MCRequest
Housekeeping
H Stream for CDF
JIM:MCD0
Test Harness
User analysis Apps
JIM:D0Tools
Infrastructure
Common API
Retire CDF Replica Catalog
Database Server Rewrite
Database Servers to Linux
Configuration Management
Caching
Metadata Query with configurable Params
Status of Assessments
• Subprojects defined
• Interviews conducted on about ½
• Assessment reports being written
Conclusions
• CDF has embraced the need for the Grid to
achieve its physics mission
• Progress in deployment and robustness testing
has put SAM into CDF
• JIM is rapidly solving its problems
• … with the help of a review and
management process