DQ2 James Walder Lancaster University

DQ2
• The data management system (DQ2) is responsible for the movement of data around the grid.
• For users, it provides high-level commands to access the data you need.
  • The underlying machinery should hopefully remain hidden from you.
• Many commands exist; only a few will be useful to you:
  • Some you won’t have privileges to use.
dq2-check-replica-consistency
dq2-close-dataset
dq2-delete-datasets
dq2-delete-files
dq2-delete-replicas
dq2-delete-subscription
dq2-delete-subscription-container
dq2-destinations
dq2-erase
dq2-freeze-dataset
dq2-get
dq2-get-metadata
dq2-get-number-files
dq2-get-replica-metadata
dq2-list-dataset
dq2-list-dataset-by-creationdate
dq2-list-dataset-replicas
dq2-list-dataset-replicas-container
dq2-list-datasets-container
dq2-list-dataset-site
dq2-list-erased-datasets
dq2-list-file-replicas
dq2-list-files
dq2-list-replica-history
dq2-list-subscription
dq2-list-subscription-info
dq2-list-subscription-site
dq2-ls
dq2-metadata
dq2-ping
dq2-put
dq2-register-container
dq2-register-dataset
dq2-register-datasets-container
dq2-register-files
dq2-register-location
dq2-register-subscription
dq2-register-subscription-container
dq2-register-version
dq2-reset-subscription
dq2-reset-subscription-site
dq2-sample
dq2-set-metadata
dq2-set-replica-metadata
dq2-sources
dq2-usage
• The two most useful commands are:
  • dq2-ls: to list datasets/containers and files
  • dq2-get: to retrieve dataset files
The Basics
• Using DQ2:
  • DQ2 will be available from User Interface (UI) enabled machines.
  • As this is a Grid tool, you will need to be correctly authenticated and authorised, i.e. you need a valid grid certificate installed on the UI.
• On lxplus you will need to source the correct environment settings:
  • source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh (or .zsh)
  • Choose sh or zsh, depending on your shell type.
• To create a grid proxy type:
  • voms-proxy-init -voms atlas
  • and enter your grid password.
• You should see output similar to:
> voms-proxy-init -voms atlas
Cannot find file or dir: /home/atlas/jww/.glite/vomses
Enter GRID pass phrase:
Your identity: /C=UK/O=eScience/OU=Lancaster/L=Physics/CN=james walder
Creating temporary proxy ............................................................................................................ Done
Contacting voms.cern.ch:15001 [/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch] "atlas" Done
Creating proxy .............................................................................................................................................. Done
Your proxy is valid until Fri Jan 15 22:35:56 2010
• You can check the status of the proxy certificate with:
  • voms-proxy-info
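The proxy check can be folded into a small helper so that a new proxy is only requested when it is needed. The following is a minimal sketch, not part of DQ2 itself: the function names `proxy_has_time` and `ensure_proxy` are hypothetical, and it assumes that `voms-proxy-info` accepts `--timeleft` and prints the remaining lifetime in seconds.

```shell
# Sketch only: proxy_has_time and ensure_proxy are hypothetical helper names.

# Succeeds if the seconds left ($1) are at least the minimum ($2, default 3600).
proxy_has_time() {
    left="${1:-0}"
    min="${2:-3600}"
    [ "$left" -ge "$min" ] 2>/dev/null
}

# Creates a new proxy only when the current one is missing or about to expire.
# Assumes voms-proxy-info accepts --timeleft and prints the seconds remaining.
ensure_proxy() {
    left=$(voms-proxy-info --timeleft 2>/dev/null)
    if proxy_has_time "${left:-0}"; then
        echo "proxy still valid for ${left}s"
    else
        voms-proxy-init -voms atlas
    fi
}
```

Calling `ensure_proxy` before a long download session should avoid a proxy expiring halfway through, provided the assumption about `--timeleft` holds on your UI.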
Files, Datasets and Containers
• Files that are stored on the grid will be placed within ‘Datasets’.
  • The output of your Grid jobs will end up in a dataset.
• Datasets and filenames must be uniquely named.
• Conventions are followed for official data (e.g. data09_) and for user datasets.
  • User data must follow:
    • user<YY>.<DN>.xxx, e.g. user10.jameswalder.ganga.xxx
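A quick sanity check of a user dataset name before submitting a job can catch typos early. The sketch below only tests the loose shape given above (`user`, two digits, a dot, a name, a dot, and something after it); the function name `is_user_dataset_name` is hypothetical and the real naming rules may well be stricter.

```shell
# Sketch only: a loose check of the user<YY>.<DN>.xxx shape.
# is_user_dataset_name is a hypothetical helper, not a DQ2 command.
is_user_dataset_name() {
    case "$1" in
        user[0-9][0-9].?*.?*) return 0 ;;   # userYY.<something>.<something>
        *)                    return 1 ;;
    esac
}

if is_user_dataset_name "user10.jameswalder.ganga.xxx"; then
    echo "name looks like a user dataset"
fi
```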
• For official production an additional layer of hierarchy has been introduced:
  • The Container. It has a similar name structure to a dataset but carries the suffix “/”.
  • Containers are a set of datasets.
• Example: List of AOD containers for run 142193:
  • dq2-ls "data09*142193*AOD*r988_p62/"
    data09_900GeV.00142193.physics_BPTX.merge.AOD.r988_p62/
    data09_900GeV.00142193.physics_MuonswBeam.merge.AOD.r988_p62/
    data09_900GeV.00142193.physics_MinBias.merge.AOD.r988_p62/
    data09_900GeV.00142193.express_express.merge.AOD.r988_p62/
    data09_900GeV.00142193.physics_L1Calo.merge.AOD.r988_p62/
    data09_900GeV.00142193.physics_RNDM.merge.AOD.r988_p62/
    data09_900GeV.00142193.physics_L1CaloEM.merge.AOD.r988_p62/
• The min-bias stream container has the following dataset:
  • dq2-list-datasets-container data09_900GeV.00142193.physics_MinBias.merge.AOD.r988_p62/
    data09_900GeV.00142193.physics_MinBias.merge.AOD.r988_p62_tid102194_00
• Which contains the files:
  • dq2-ls -f data09_900GeV.00142193.physics_MinBias.merge.AOD.r988_p62_tid102194_00
    [ ]  AOD.102194._000011.pool.root.1  B84E7973-A9F0-DE11-A53B-00A0D1E49F91  ad:0fa0eedb  115781764
    ....
    total files: 51
    local files: 0
    total size: 5680849886
    date: 2009-12-24 23:19:36
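The total size in the listing is in bytes, which is awkward to read for large datasets. As a rough sketch, the summary lines can be piped through `awk` to get a size in GB and an average file size. The function name `summarise_listing` is hypothetical, and it assumes the `total files:` and `total size:` lines appear exactly as in the listing above (leading whitespace is tolerated).

```shell
# Sketch only: summarise_listing is a hypothetical helper that reads
# "dq2-ls -f" style output on stdin and summarises the totals.
summarise_listing() {
    awk -F': *' '
        /^[ \t]*total files:/ { n = $2 }   # number of files in the dataset
        /^[ \t]*total size:/  { s = $2 }   # total size in bytes
        END {
            if (n > 0)
                printf "%d files, %.2f GB total, %.1f MB per file on average\n",
                       n, s / 1e9, s / n / 1e6
        }
    '
}

# Usage with the numbers from the listing above:
printf 'total files: 51\ntotal size: 5680849886\n' | summarise_listing
```

In real use the input would come from `dq2-ls -f <dataset> | summarise_listing`.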
Datasets (cont.)
• On the previous page we saw the hierarchy:
  data09_900GeV.00142193.physics_MinBias.merge.AOD.r988_p62/
    data09_900GeV.00142193.physics_MinBias.merge.AOD.r988_p62_tid102194_00
      AOD.102194._000011.pool.root.1, ...
• You will find datasets on the grid with _tid and _sub (or _shadow) in their names.
  • These are produced as part of the production process.
  • They should not be used directly; instead use the container, as this will correctly consolidate all of the files.
• dq2-ls -f will also work on the container to display the list of files in the set of datasets.
• Use dq2-XXX --help for more information on the usage of an individual tool.
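To avoid picking up a production-internal dataset by accident, a script can flag such names and suggest the container instead. This is only a sketch based on the single example above, where the container name is the dataset name with everything from `_tid` onwards replaced by `/`; whether that holds for every dataset is an assumption, and both function names are hypothetical.

```shell
# Sketch only: hypothetical helpers, based on the one example in the slides.

# Succeeds if the name looks like a production-internal dataset.
is_internal_dataset() {
    case "$1" in
        */)                                return 1 ;;  # containers end in "/"
        *_tid[0-9]*|*_sub[0-9]*|*_shadow*) return 0 ;;
        *)                                 return 1 ;;
    esac
}

# Prints a guess for the container of a _tid dataset; other names pass through.
guess_container() {
    case "$1" in
        *_tid[0-9]*) printf '%s/\n' "${1%%_tid[0-9]*}" ;;
        *)           printf '%s\n' "$1" ;;
    esac
}

guess_container "data09_900GeV.00142193.physics_MinBias.merge.AOD.r988_p62_tid102194_00"
```

The guessed name should always be confirmed with `dq2-ls` before it is used in a job.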
Retrieving Data
• To retrieve your data from the grid to your UI machine, use:
  • dq2-get <dataset>
  • A directory named <dataset> will be created and the files will be downloaded into it.
• For big datasets (e.g. real data / MC samples), only download a few files for testing, then use the grid for the bulk submission.
  • dq2-get -n 1 <dataset> will retrieve one random file from the dataset or container, or
  • dq2-get --files <file1>,<file2> <dataset> for specific files.
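For repeated test downloads it can help to wrap the `-n` form in a small function that always works in a scratch directory. The sketch below is illustrative: `dq2_fetch_sample` and the `DRY_RUN` variable are hypothetical names, and only the `dq2-get -n` usage shown above is relied upon.

```shell
# Sketch only: dq2_fetch_sample and DRY_RUN are hypothetical names.
# Usage: dq2_fetch_sample <dataset> [nfiles] [target_dir]
dq2_fetch_sample() {
    dataset="$1"
    nfiles="${2:-1}"      # default: a single random file
    target="${3:-/tmp}"   # default scratch area
    if [ -z "$dataset" ]; then
        echo "usage: dq2_fetch_sample <dataset> [nfiles] [target_dir]" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only show what would be run.
        echo "cd $target && dq2-get -n $nfiles $dataset"
        return 0
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$target" && dq2-get -n "$nfiles" "$dataset" )
}
```

Setting `DRY_RUN=1` prints the command instead of running it, which is a cheap way to check the arguments before starting a transfer.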
• You can locate the site replicas of your datasets with:
  • dq2-list-dataset-replicas data09_900GeV.00142193.physics_MinBias.merge.AOD.r988_p62_tid102194_00
    INCOMPLETE: NIKHEF-ELPROD_DATADISK
    COMPLETE: AGLT2_DATADISK,ANL_LOCALGROUPDISK,BNL-OSG2_DATADISK,CERN-PROD_DATADISK,DESY-ZN_DATADISK,DUKE_LOCALGROUPDISK,FZK-LCG2_DATADISK,GRIF-SACLAY_DATADISK,IN2P3-CC_DATADISK,INFN-ROMA1_DATADISK,INFN-T1_DATADISK,MWT2_DATADISK,NDGF-T1_DATADISK,NET2_DATADISK,PIC_DATADISK,RAL-LCG2_DATADISK,SWT2_CPB_DATADISK,TAIWAN-LCG2_DATADISK,TOKYO-LCG2_DATADISK,TRIUMF-LCG2_DATADISK,UAM-LCG2_DATADISK,UKI-SCOTGRID-GLASGOW_DATADISK,UKI-SOUTHGRID-RALPP_DATADISK,UNI-FREIBURG_DATADISK,WISC_DATADISK
  • Avoid sites with incomplete datasets. Normally you don’t need this level of information, but it can be useful in problem diagnosis.
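The comma-separated list of complete replicas is hard to scan by eye. As a sketch, the output can be filtered so that each complete site appears on its own line. The function name `complete_sites` is hypothetical, and the filter assumes the `COMPLETE:` line has the layout shown above.

```shell
# Sketch only: complete_sites is a hypothetical helper that reads
# dq2-list-dataset-replicas style output on stdin.
complete_sites() {
    # Keep only the line starting with "COMPLETE:" (not "INCOMPLETE:"),
    # strip the label, then put one site per line and drop empty lines.
    sed -n 's/^[[:space:]]*COMPLETE:[[:space:]]*//p' | tr ',' '\n' | sed '/^$/d'
}

printf 'INCOMPLETE: NIKHEF-ELPROD_DATADISK\nCOMPLETE: AGLT2_DATADISK,CERN-PROD_DATADISK\n' | complete_sites
```

In real use the input would come from `dq2-list-dataset-replicas <dataset> | complete_sites`.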
• For containers you need a slightly different command:
  • dq2-list-dataset-replicas-container data09_900GeV.00142193.physics_MinBias.merge.AOD.r988_p62/
• Sites are given names like: UKI-SCOTGRID-GLASGOW_DATADISK
  • The name is separated into Sitename _ Spacetoken.
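The Sitename _ Spacetoken split can be done with plain shell parameter expansion. The sketch below splits at the last underscore, on the assumption that the space token itself contains no underscore, since some site names in the replica listing (e.g. SWT2_CPB_DATADISK) contain one. The function name `split_endpoint` is hypothetical.

```shell
# Sketch only: split_endpoint is a hypothetical helper.
# Prints "<sitename> <spacetoken>", splitting at the LAST underscore.
split_endpoint() {
    printf '%s %s\n' "${1%_*}" "${1##*_}"
}

split_endpoint "UKI-SCOTGRID-GLASGOW_DATADISK"
split_endpoint "SWT2_CPB_DATADISK"
```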
Tasks
• Use dq2 to find the dataset located in AMI.
• Find out which sites the replicas physically reside on:
  • Do you know which clouds (~countries) these belong to?
  • What is the total size of the container (dq2-ls -f <dataset>)?
• Follow the TWiki for instructions on obtaining some data.
  • Download a couple of files into the /tmp directory from your chosen dataset.
  • Make a note of which lxplus node you are on, in case you need to log out or use a new terminal.