DQ2
James Walder
Lancaster University

DQ2
• The data management system (DQ2) is responsible for the movement of data around the grid.
• For users, it provides high-level commands to access the data you need.
  • The underlying machinery should, ideally, remain hidden from you.
• Many commands exist, but only a few will be useful to you:
  • Some you won't have the privileges to use.

  dq2-check-replica-consistency   dq2-close-dataset   dq2-delete-datasets
  dq2-delete-files   dq2-delete-replicas   dq2-delete-subscription
  dq2-delete-subscription-container   dq2-destinations   dq2-erase
  dq2-freeze-dataset   dq2-get   dq2-get-metadata
  dq2-get-number-files   dq2-get-replica-metadata   dq2-list-dataset
  dq2-list-dataset-by-creationdate   dq2-list-dataset-replicas   dq2-list-dataset-replicas-container
  dq2-list-datasets-container   dq2-list-dataset-site   dq2-list-erased-datasets
  dq2-list-file-replicas   dq2-list-files   dq2-list-replica-history
  dq2-list-subscription   dq2-list-subscription-info   dq2-list-subscription-site
  dq2-ls   dq2-metadata   dq2-ping
  dq2-put   dq2-register-container   dq2-register-dataset
  dq2-register-datasets-container   dq2-register-files   dq2-register-location
  dq2-register-subscription   dq2-register-subscription-container   dq2-register-version
  dq2-reset-subscription   dq2-reset-subscription-site   dq2-sample
  dq2-set-metadata   dq2-set-replica-metadata   dq2-sources
  dq2-usage

• The two most useful commands are:
  • dq2-ls : to list datasets/containers and files
  • dq2-get : to retrieve dataset files

The Basics
• Using DQ2:
  • DQ2 is available on User Interface (UI) enabled machines.
  • As this is a Grid tool, you need to be correctly authenticated and authorised, i.e. you need a valid grid certificate installed on the UI.
  • On lxplus you need to source the correct environment settings:
    • source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh (or .zsh)
    • Choose sh or zsh, depending on your shell type.
• To create a grid proxy, type:
  • voms-proxy-init -voms atlas
  • and enter your grid pass phrase.
• You should see output similar to:

  > voms-proxy-init -voms atlas
  Cannot find file or dir: /home/atlas/jww/.glite/vomses
  Enter GRID pass phrase:
  Your identity: /C=UK/O=eScience/OU=Lancaster/L=Physics/CN=james walder
  Creating temporary proxy ............................................................................................................ Done
  Contacting voms.cern.ch:15001 [/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch] "atlas" Done
  Creating proxy .............................................................................................................................................. Done
  Your proxy is valid until Fri Jan 15 22:35:56 2010

• You can check the status of your proxy with:
  • voms-proxy-info

Files, Datasets and Containers
• Files stored on the grid are placed within 'Datasets'.
  • The output of your Grid jobs will end up in a dataset.
• Dataset names and filenames must be unique.
• Naming conventions are followed for official data (e.g. data09_ ) and for user datasets.
  • User data must follow:
    • user<YY>.<DN>.xxx, e.g. user10.jameswalder.ganga.xxx
• For official production an additional layer of hierarchy has been introduced:
  • The Container.
  • Containers have a name structure similar to that of a dataset, but with a "/" suffix.
  • A container is a set of datasets.
• Example: list of AOD containers for run 142193:

  dq2-ls "data09*142193*AOD*r988_p62/"
  data09_900GeV.00142193.physics_BPTX.merge.AOD.r988_p62/
  data09_900GeV.00142193.physics_MuonswBeam.merge.AOD.r988_p62/
  data09_900GeV.00142193.physics_MinBias.merge.AOD.r988_p62/
  data09_900GeV.00142193.express_express.merge.AOD.r988_p62/
  data09_900GeV.00142193.physics_L1Calo.merge.AOD.r988_p62/
  data09_900GeV.00142193.physics_RNDM.merge.AOD.r988_p62/
  data09_900GeV.00142193.physics_L1CaloEM.merge.AOD.r988_p62/

• The min-bias stream container has the following dataset:

  dq2-list-datasets-container data09_900GeV.00142193.physics_MinBias.merge.AOD.r988_p62/
  data09_900GeV.00142193.physics_MinBias.merge.AOD.r988_p62_tid102194_00

• Which contains the files:

  dq2-ls -f data09_900GeV.00142193.physics_MinBias.merge.AOD.r988_p62_tid102194_00
  [ ] AOD.102194._000011.pool.root.1 B84E7973-A9F0-DE11-A53B-00A0D1E49F91 ad:0fa0eedb 115781764
  ....
  total files: 51
  local files: 0
  total size: 5680849886
  date: 2009-12-24 23:19:36

Datasets (cont.)
• On the previous slide we saw the hierarchy:

  data09_900GeV.00142193.physics_MinBias.merge.AOD.r988_p62/
    data09_900GeV.00142193.physics_MinBias.merge.AOD.r988_p62_tid102194_00
      AOD.102194._000011.pool.root.1, ...

• On the grid you will find datasets with _tid and _sub (or _shadow) in their names.
  • These are produced as part of the production process.
  • They should not be used directly; use the container instead, as it correctly consolidates all of the files.
• dq2-ls -f also works on a container, displaying the lists of files in its set of datasets.
• Use dq2-XXX --help for more information on the usage of an individual tool.

Retrieving Data
• To retrieve your data from the grid to your UI machine, just use:
  • dq2-get <dataset>
• A directory named <dataset> will be created and the files will be downloaded into it.
• For big datasets (e.g.
real data / MC samples), download only a few files for testing, then use the grid for the bulk submission.
  • dq2-get -n 1 <dataset>
    This will retrieve one random file from the dataset or container, or
  • dq2-get --files <file1>,<file2> <dataset>
    for specific files.
• You can locate the site replicas of your datasets with:

  dq2-list-dataset-replicas data09_900GeV.00142193.physics_MinBias.merge.AOD.r988_p62_tid102194_00
  INCOMPLETE: NIKHEF-ELPROD_DATADISK
  COMPLETE: AGLT2_DATADISK,ANL_LOCALGROUPDISK,BNL-OSG2_DATADISK,CERN-PROD_DATADISK,DESY-ZN_DATADISK,DUKE_LOCALGROUPDISK,FZK-LCG2_DATADISK,GRIF-SACLAY_DATADISK,IN2P3-CC_DATADISK,INFN-ROMA1_DATADISK,INFN-T1_DATADISK,MWT2_DATADISK,NDGF-T1_DATADISK,NET2_DATADISK,PIC_DATADISK,RAL-LCG2_DATADISK,SWT2_CPB_DATADISK,TAIWAN-LCG2_DATADISK,TOKYO-LCG2_DATADISK,TRIUMF-LCG2_DATADISK,UAM-LCG2_DATADISK,UKI-SCOTGRID-GLASGOW_DATADISK,UKI-SOUTHGRID-RALPP_DATADISK,UNI-FREIBURG_DATADISK,WISC_DATADISK

  • Avoid sites with incomplete datasets.
  • Normally you don't need this level of information, but it can be useful in problem diagnosis.
• For containers you need a slightly different command:
  • dq2-list-dataset-replicas-container data09_900GeV.00142193.physics_MinBias.merge.AOD.r988_p62/
• Sites are given names like: UKI-SCOTGRID-GLASGOW_DATADISK
  • The name is split into Sitename _ Spacetoken

Tasks
• Use dq2 to find the dataset located in AMI.
• Find out which sites the replicas physically reside on:
  • Do you know which clouds (~countries) these belong to?
  • What is the total size of the container (dq2-ls -f <dataset>)?
• Follow the TWiki for instructions on obtaining some data.
  • Download a couple of files from your chosen dataset into the /tmp directory.
  • Make a note of which lxplus node you are on, in case you need to log out or use a new terminal.
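Two of the steps above (reading a site name as Sitename _ Spacetoken, and finding the total size from dq2-ls -f) can be sketched with a pair of small shell helpers. This is a minimal sketch, not part of the DQ2 client tools: the function names split_endpoint and total_size_bytes are hypothetical, and the size helper assumes each summary item of dq2-ls -f (total files, local files, total size, date) is printed on its own line.

```shell
# Hypothetical helpers; they are NOT part of the DQ2 client tools.

# Split a name such as UKI-SCOTGRID-GLASGOW_DATADISK into "Sitename Spacetoken".
# The split is made at the LAST underscore, because some site names
# (e.g. SWT2_CPB) contain an underscore themselves.
split_endpoint() {
    endpoint="$1"
    site="${endpoint%_*}"     # everything before the last "_"
    token="${endpoint##*_}"   # everything after the last "_"
    printf '%s %s\n' "$site" "$token"
}

# Read "dq2-ls -f" output on stdin and print the summed size in bytes.
# Assumption: the summary contains lines of the form "total size: <bytes>".
# Summing covers the case where several such lines appear.
total_size_bytes() {
    awk '/^total size:/ { sum += $3 } END { printf "%.0f\n", sum }'
}

# Example usage (needs a valid proxy and the DQ2 environment):
#   dq2-ls -f "$dataset" | total_size_bytes
#   split_endpoint UKI-SCOTGRID-GLASGOW_DATADISK
```

Splitting at the last underscore rather than the first is a design choice that keeps names like SWT2_CPB_DATADISK intact on the site side.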
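The download task can also be wrapped in a small function, so that the command is checked before anything is transferred. This is a rough sketch under stated assumptions: fetch_sample and the DRY_RUN variable are hypothetical names, dq2-get is assumed to download into the current working directory, and -n is assumed to accept a count larger than 1 (confirm with dq2-get --help).

```shell
# Hypothetical wrapper; it is NOT a DQ2 command.
# Usage: fetch_sample <dataset> [nfiles] [destination]
# Assumptions: dq2-get writes into the current directory, and "-n N"
# retrieves N random files (the slides only show "-n 1").
# With DRY_RUN=1 the command is printed instead of being executed.
fetch_sample() {
    dataset="$1"
    nfiles="${2:-1}"     # default: a single test file
    dest="${3:-/tmp}"    # default: the /tmp scratch area
    if [ -z "$dataset" ]; then
        echo "usage: fetch_sample <dataset> [nfiles] [destination]" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "cd $dest && dq2-get -n $nfiles $dataset"
        return 0
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$dest" && dq2-get -n "$nfiles" "$dataset" )
}

# Example usage:
#   DRY_RUN=1 fetch_sample "$dataset" 2    # print the command only
#   fetch_sample "$dataset" 2              # download two files into /tmp
```

Remember that /tmp is local to one lxplus node, which is why noting the node name matters.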