GEO4012 Data Storage and Research Methodology

Some aspects of USE AND STORAGE of DATA AT THE DEPARTMENT

Data storage during Master thesis project
• Every Master-student at the institute will obtain a user-specific directory to store temporary data, results, documents etc.
• Your home-directory (M:\ on Windows or ~<username> on Linux) should not be used to store Master project related data.
• Reason:
  • More disk space
  • Easier to share data with your supervisor and/or other students working in the same project
  • Jurisdiction

Data storage during Master thesis project
• Master-project directory
  • K:\section- or projectdisk\username (Windows)
  • /felles/section- or projectdisk/username (Linux)
• Linux example / Windows example: [screenshots not reproduced]
• You will receive an e-mail when the directory has been created.

Data storage at IG@UIO
• Data and results should be stored in open formats (e.g. ASCII) or Global Standards (e.g. SEGY, bit maps)
• Proprietary formats (e.g. Excel) should not be used to store final results
• Raw data, if not ‘confidential’, should preferably be stored at K:\data\<data-type>. The IT-group can help to copy data to that site.
• IT-related questions should always be directed to drift@geo.uio.no with a cc to your supervisor. It is recommended to talk with your supervisor first.

System overview
[Diagram not reproduced. Labels: DATA HANDLING, HOME DIRECTORIES; machines: vann, jern, ice, ekman, abel, rossby, sverdrup, kant]

Some aspects of RESEARCH METHODOLOGY

Research Documentation
• Why document your research?
  • To allow other researchers to understand the methods you used;
  • To be able to replicate your results;
  • To determine if your findings are reliable;
  • To make it easier for those who come after you;
  • To avoid suspicion of fraud or plagiarism;
  • To receive credit for the research you’ve done on a project and eventually write scientific papers;

Research Documentation
• How to document your research?
  • Keep track of all the methods / models used to conduct your research
  • Keep information which describes all aspects of your data (Meta-data)
  • Keep a list of all the scientific papers you read/consult
  • Draft your research report (don’t wait for the very end)

Models/Methods
• Simulation is an important tool in engineering and research.
• But be careful with its use:
  • How well does the simulation model reflect the reality?
  • You might be inferring conclusions based on “artificial worlds” ...
• So:
  • Always keep track of the model version you used and all the changes you may have done

Meta-data
• Metadata (metacontent) are defined as the data providing information about one or more aspects of the data, such as:
  • Means of creation of the data
  • Purpose of the data
  • Time and date of creation
  • Creator or author of the data
  • Location on a computer network where the data were created
  • Standards used

Metadata - Examples

Bad documentation

    #!/bin/bash
    #SBATCH --job-name=test
    #SBATCH --account=geofag
    #SBATCH --time=10:10:00
    #SBATCH --mem-per-cpu=2000M
    #SBATCH --nodes=1 --ntasks-per-node=8
    source /cluster/bin/jobsetup
    module load matlab
    matlab -nodisplay -nodesktop -nosplash < cryo.m

Good documentation

    #!/bin/bash
    #
    ### Script for matlab code CryoGrid2 on Abel
    ### on 8 tasks
    #
    ### Mandatory parameters to run a job via SLURM
    #SBATCH --job-name=PFNNorway
    #SBATCH --account=geofag
    #SBATCH --time=10:10:00
    #SBATCH --mem-per-cpu=2000M
    #SBATCH --nodes=1 --ntasks-per-node=8
    ## Set up job environment on abel
    source /cluster/bin/jobsetup
    module load matlab
    ## Start matlab
    matlab -nodisplay -nodesktop -nosplash < cryo.m

Metadata - Examples
Petrel [screenshot not reproduced]

Metadata - Examples
[figure not reproduced]

Smart data storage by humans
[Diagram not reproduced. Labels: Input data (TB); Article/Master source; Program/code; Libraries; Compilers; Hw/time; Script(s); Changes?; Machine; Meta-data; Internet visualization data; Output data (TB); To Save]

Version control
• Why do we need version control?
• What are the basic operations for version control?
• Example with SVN

Why do we need version control?
• A version control system keeps track of all work and all changes in a set of files, and allows several developers (potentially widely separated in space and time) to collaborate.
• It helps to keep track of a larger programming or text project, including file locking, versions and conflicts.

Other tools for managing projects
• rcs - UNIX command: rcs creates new RCS files or changes attributes of existing ones. An RCS file contains multiple revisions of text, an access list, a change log, descriptive text, and some control attributes.
• CVS - Concurrent Versions System, http://en.wikipedia.org/wiki/CVS_(software)
• Git/GitHub - GitHub, a hosting service for Git repositories, offers both paid plans for private repositories, and free accounts for open source projects. http://en.wikipedia.org/wiki/GitHub

Basic operations for version control
• Checkout
• Update
• Commit
• Tag
• Branch
• Merge
http://en.wikipedia.org/wiki/Revision_control

SVN - Initial copy of the repository
• Finding the repository
  Ask a team member where to find it or check the local repository!

    $ ssh svn.uio.no
    $ cd /svnroot
    /usit/vcs-uio/svnroot
    $ ls
    osloctm3 ...

• Getting the repository (in your master project directory):

    $ mkdir svn
    $ cd svn
    $ mkdir osloctm3
    $ svn checkout svn+ssh://svn.uio.no/svnroot/osloctm3

SVN - Normal usage of existing repo
• Going there

    $ cd osloctm3
    $ svn update [FILE]
    U fc/fc-switches.html
    ..

• Editing the file(s)

    $ emacs -nw fc-switches.html

• Checking the updates (optional)

    $ svn diff fc-switches.html

• Sending the change upstream

    $ svn commit fc-switches.html

Note: Why or why not send the changes upstream?
• Yes: you did something for the project
  • Found an error/bug
  • Added new functionality
• No: Think!
  • Changes are for your interest only
  • It may break the idea of the project
  • NB! *When* you find your changes missing in the original it is way too late and you must drop them.
  • The cost on either side may be quite big
  • Choose to make a branch/new repo (who will help you?)

References
• USIT (Norwegian): http://www.uio.no/tjenester/it/maskin/filer/versjonskontroll/svn.html
• Internet, e.g. http://www.abbeyworkshop.com/howto/misc/svn01/
• Subversion's own project web site: http://subversion.tigris.org/
• Wikipedia: http://en.wikipedia.org/wiki/Subversion_%28software%29
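
Supplement to the Meta-data slide: the items listed there (means of creation, purpose, time, creator, location, standards used) can be recorded in a small ASCII file stored next to each data file. The following is only a minimal sketch under assumptions: the function name write_metadata, the field names and the ".meta.txt" suffix are hypothetical and not a department convention, and the timestamp is the time the record is written, which may differ from when the data were created.

```shell
#!/bin/bash
# Sketch: write a plain-text (ASCII) metadata file next to a data file.
# write_metadata, the field names and the ".meta.txt" suffix are
# hypothetical choices for illustration, not a department convention.
write_metadata() {
    local datafile="$1"   # path to the data file being described
    local purpose="$2"    # purpose of the data
    local method="$3"     # means of creation (program, script, instrument)
    local standard="$4"   # format or standard used (e.g. ASCII, SEGY)
    local metafile="${datafile}.meta.txt"

    {
        echo "file:     $(basename "$datafile")"
        echo "creator:  ${USER:-unknown}"
        # Time the metadata record is written (UTC), not the data's own timestamp
        echo "recorded: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
        echo "host:     $(uname -n)"
        # Absolute path of the directory holding the data file
        echo "location: $(cd "$(dirname "$datafile")" && pwd)"
        echo "purpose:  $purpose"
        echo "method:   $method"
        echo "standard: $standard"
    } > "$metafile"

    # Print the name of the metadata file so callers can log it
    echo "$metafile"
}
```

As a usage example with a made-up file name, calling write_metadata results/run1.txt "Master thesis test run" "cryo.m via SLURM" "ASCII" would create results/run1.txt.meta.txt. A plain-text file is used here because the slides recommend open formats for anything that is stored long term.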