ESG-Walkthrough-Atif

An ESG Walkthrough
- ESG Federation website
- DCC file system for ESG
Muhammad Atif

ESG-NCI Gateway
[Screenshot of the gateway home page, with callouts: Search; Latest news; "We highly recommend subscribing to our tweets"]

Search and Access Data
• You can search without logging in, but you cannot download.
• Quick Links → Create Account
  - Follow the on-screen instructions.
  - You will receive an email to confirm your registration; the administrators are notified at the same time.
  - An administrator has to validate your account before you can download data.
• After the confirmation email from the administrator, log in to the NCI node.
• Account → Apply for Group Membership
  - Mk-3.6
  - CMIP-5 research (not in our control; the request goes to PCMDI)
  - Requests to NCI are usually quick; for other groups, this may take time (one to two days).
• The same ID can be used on all the gateways.
• Your OpenID: https://esg.nci.org.au/esgcet/myopenid/<username>

Searching for data
• We recommend that you browse the website and familiarize yourself with it.
• If asked for authentication, you may use a temporary OpenID:
  https://esg.nci.org.au/esgcet/myopenid/dcc000
  Password: abc123
• Please note that this OpenID will be removed after the workshop; we highly recommend that you create your own OpenID.

Download data from the Gateway
Three Download Methods
• Using the web browser
• A set of wget scripts
• Via GridFTP (Data Mover Lite, DML)

Download Method-1 (Web browser based)
• Intuitive but slow
• Works like a normal download from the browser of your choice, i.e. click to download.
• Follow the on-screen instructions and nothing can go wrong.
• Internet Explorer, Firefox, Chrome and Safari are supported.
• Works well if you are after a couple of files.

Download-2 (Wget Scripts)
• Ability to download multiple files
  - Select the files (variables) you are interested in.
  - The gateway presents a wget-download.sh script, which you need to save and run.
  - Command-line based; no GUI.
• Two methods of authentication
  - Authorization token (deprecated)
    • PCMDI gateway only.
    • No login required; however, the token expires in 24 hours and the script is of no use after that.
  - MyProxy logon (official)
    • Needs a separate step for authentication.
    • You need to run the Java applet or the MyProxyClient software.
    • If the authentication expires, just run the MyProxyClient software again.
• Note: If you are interested in doing lots of downloads, we can provide a custom script to speed up the process on the DCC (example to follow later).

Process for MyProxyLogon
Download on the DCC
• Log in to the DCC with X11 forwarding (-Y):
  ssh abc123@dcc.nci.org.au -Y
• Download the MyProxyLogon-ESG.jar file:
  wget http://esg.nci.org.au/esgcet/webstart/myProxyLogon/MyProxyLogonESG.jar
• Run the jar file (instructions are also provided in the wget download script that you downloaded via the web browser):
  module load java
  java -jar MyProxyLogon-ESG.jar
• It writes the certificates to your $HOME/.esg folder.
• Run the wget-download.sh script.

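
Before running the wget script, it can help to confirm that the logon step actually left something in $HOME/.esg. The following Python sketch does only that. The function name `esg_credentials_present` is hypothetical, and the sketch makes no assumption about the names of the certificate files; it simply looks for any file under `.esg`.

```python
from pathlib import Path


def esg_credentials_present(home: Path) -> bool:
    """Return True if <home>/.esg exists and contains at least one file."""
    esg_dir = home / ".esg"
    if not esg_dir.is_dir():
        return False
    # Any regular file below .esg counts; exact file names are not assumed.
    return any(p.is_file() for p in esg_dir.rglob("*"))


# Check the current user's home directory.
if esg_credentials_present(Path.home()):
    print("Files found in ~/.esg; wget-download.sh can be run.")
else:
    print("Nothing in ~/.esg; run MyProxyLogon first.")
```

This check cannot tell whether the certificates have expired; if the wget script reports an authentication failure, rerun MyProxyLogon.
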
Download-3 (DML)
• Parallel downloads
  - Faster than wget
  - DML → Preferences → Concurrency
• Uses GridFTP
• Caveat: not available on all ESG nodes
  - NCI is one of the few that has the facility

Interacting with ESG data on DCC

ESG Data on the DCC
• IPCC AR5/CMIP5
  - CSIRO-QCCCE Mk-3.6
  - CAWCR ACCESS
  - Replicated data from the other ESG nodes
• Other data
  - CMIP3
  - Observational data
  - Processed data

DCC File system organization
• All ESG data is in /projects/ESG:
  - Authoritative
  - Unofficial-ESG-replica
  - CAWCR_CVC_processed
• /projects/ESG/Authoritative
  - Serves data using the policies of the ESG Federation.
  - This is the directory that our ESG software serves data from.
  - All data is the current official copy.
• User example: log in to the DCC and have a look.

Unofficial Replica
/projects/ESG/Unofficial_Replica
• IPCC
  - The IPCC directory is where you can reference data that we have downloaded from other nodes (though it is not an official replica).
  - The subdirectories may hold partial or complete datasets.
• IPCC_tmp_flat
  - Direct symlinks to files, in a flat directory structure
• tmp
  - You can download your data here, in your $USER folder.
  - We can provide you with scripts to help download data here.
• GlobalObs_and_Reanalysis
  - Data sourced from various places, maintained by Lawrie Rikus and Ben Hu.
  - Also served through a THREDDS service for remote access.

Unofficial Replica
/projects/ESG/Unofficial-ESG-replica/IPCC
• Populated by user downloads made with wget scripts or DML.
• Partial data: not all of the data is downloaded.
• Does not necessarily contain the most up-to-date version.
  - Data may have been changed by the remote node since the last download.
  - ESG (and the official replica directory) always has the latest version.
• Organised according to the Data Reference Syntax (DRS)

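
Because the unofficial replica may lag behind the remote node, it is useful to see which versions of a dataset are present locally before comparing with the gateway. The Python sketch below lists them. It assumes that version directories are named "v" followed by an eight-digit date (as in the `v20110901` component of the example path in these slides); the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import List, Optional

# Assumed naming convention for version directories: "v" + 8 digits (YYYYMMDD).
VERSION_RE = re.compile(r"^v(\d{8})$")


def local_versions(ensemble_dir: Path) -> List[str]:
    """Return the version directory names under ensemble_dir, oldest first."""
    versions = [
        child.name
        for child in ensemble_dir.iterdir()
        if child.is_dir() and VERSION_RE.match(child.name)
    ]
    # Zero-padded dates sort chronologically as plain strings.
    return sorted(versions)


def latest_local_version(ensemble_dir: Path) -> Optional[str]:
    """Return the newest local version name, or None if there is none."""
    versions = local_versions(ensemble_dir)
    return versions[-1] if versions else None
```

The result only describes what is on the local file system; whether a newer version exists on the remote node still has to be checked on the gateway.
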
Data Reference Syntax (How files are organized @ DCC)
• This is how the directory tree compares with the DRS.
• DRS
  cmip5.<product>.<institute>.<model>.<experiment>.<time_frequency>.<realm>.<cmor_table>.<ensemble>
• File system
  /projects/ESG/unofficial-ESG-replica/IPCC/CMIP5/output1/NCC/NorESM1M/historical/mon/seaIce/OImon/r1i1p1/v20110901/sic/<FILE>

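
The mapping from a DRS identifier to a directory can be sketched in a few lines of Python. This is a minimal illustration based only on the one example path above, not a tool provided on the DCC: the function name `drs_to_path`, the label "activity" for the first DRS component, and the rule that the first component is upper-cased are assumptions read off that example. The version and variable are passed separately because they appear in the path but not in the nine-part DRS identifier.

```python
from pathlib import PurePosixPath

# Root of the unofficial replica, copied from the example path above.
REPLICA_ROOT = PurePosixPath("/projects/ESG/unofficial-ESG-replica/IPCC")

# DRS component names in the order given on the slide
# ("activity" is an assumed label for the leading "cmip5").
DRS_FIELDS = ("activity", "product", "institute", "model", "experiment",
              "time_frequency", "realm", "cmor_table", "ensemble")


def drs_to_path(dataset_id: str, version: str, variable: str) -> PurePosixPath:
    """Map a DRS dataset id plus version and variable to a directory path."""
    parts = dataset_id.split(".")
    if len(parts) != len(DRS_FIELDS):
        raise ValueError(
            f"expected {len(DRS_FIELDS)} DRS components, got {len(parts)}"
        )
    # The first component appears upper-cased in the example path.
    activity = parts[0].upper()
    return REPLICA_ROOT.joinpath(activity, *parts[1:], version, variable)


path = drs_to_path(
    "cmip5.output1.NCC.NorESM1M.historical.mon.seaIce.OImon.r1i1p1",
    version="v20110901",
    variable="sic",
)
print(path)
```

For the identifier shown, this prints the directory part of the example path, `/projects/ESG/unofficial-ESG-replica/IPCC/CMIP5/output1/NCC/NorESM1M/historical/mon/seaIce/OImon/r1i1p1/v20110901/sic`.
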
Downloading data to DCC File System
• If you would like a significant amount of data that we don't have, please contact us.
• Reasons:
  • It may already be downloaded but not linked
  • Downloading data is still tricky
  • Space management
• That said, we would like to facilitate downloads of priority data.
• How? Let's do it.

Demo
• Download the wget file from esg-gateway
• ssh -Y user@dcc.nci.org.au
• java -jar MyProxyLogon-ESG.jar
• Copy the wget file to the DCC (scp, or copy and paste) into a new folder
• ./esg-download.py wget-download.sh
• View the directory; it should contain a number of wget-split-* files
• ./esg-qsub-download.py -i wget-split (press "y" when prompted)
• Check the files after some time
• We will be streamlining this further

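
The DCC helper scripts themselves are not shown in these slides. As a rough illustration of the idea behind the demo, splitting one large download list into several smaller pieces that can run as separate jobs, here is a hypothetical Python sketch. It is not the code of `esg-download.py`: it works on a plain list of URLs rather than on the wget script, and the function names and the numbering of the `wget-split-<n>` output files are assumptions.

```python
from pathlib import Path
from typing import List


def split_round_robin(items: List[str], n_chunks: int) -> List[List[str]]:
    """Distribute items over n_chunks lists in round-robin order."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[str]] = [[] for _ in range(n_chunks)]
    for index, item in enumerate(items):
        chunks[index % n_chunks].append(item)
    # Drop empty chunks when there are fewer items than chunks.
    return [chunk for chunk in chunks if chunk]


def write_chunks(urls: List[str], n_chunks: int, out_dir: Path) -> List[Path]:
    """Write one URL list per chunk as out_dir/wget-split-<n> (assumed names)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for number, chunk in enumerate(split_round_robin(urls, n_chunks)):
        target = out_dir / f"wget-split-{number}"
        target.write_text("\n".join(chunk) + "\n")
        written.append(target)
    return written
```

Each chunk file could then be fetched independently, for example with `wget -i`, which reads URLs from a file; the certificate options needed for ESG downloads are not shown here.
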
Help
• The DCC and ESG are both evolving continuously
• Comments and suggestions are always welcome
• Help desks
  - Anything related to the ESG Federation website or to other models (those not native to NCI)
    • Cmip5-helpdesk@stfc.ac.uk
  - Anything related to the DCC compute cluster
    • help@nf.nci.org.au

• Downloads managed by us
  - GridFTP
  - Fast
• Downloads managed by you as a user

Controlled vocabulary:
http://esg-pcmdi.llnl.gov/internal/esg-data-nodedocumentation/cmip5_controlled_vocab.txt