Structure

Structure is a standard software tool in population genetics. It assigns individuals proportionally to different clusters based on their genotypes, using a Markov chain Monte Carlo (MCMC) search algorithm. It is useful for identifying population differentiation when individuals are sampled from distinct geographic locations. It can be downloaded here: http://pritchardlab.stanford.edu/structure.html

This practical assumes that you are familiar with the software, for example from running the programme previously on a Windows machine. Structure is already installed on Iceberg in the Genomics folder. Running Structure on the Iceberg cluster has the advantage that you can obtain many replicates very efficiently. In this example we will create just two replicates each for K = 1 to 3 using an array job.

Create a folder for the Structure results in your home directory.

$mkdir structure

Copy the data file and parameter files to the structure folder in your home directory. The data file contains genotypes from 13 microsatellites and 3 Kentish plover populations. The file ‘mainparams’ contains the parameter settings for this exercise (‘extraparams’ is empty).

$cp /usr/local/extras/Genomics/HPC_course/structure/* /home/bo1ckx/structure/

(replace ‘bo1ckx’ with your Iceberg username)

Then change into this folder.

$cd structure

Note: For your own project, the two parameter files can be created using the Windows version of Structure. Once you have transferred the files to Iceberg, run the commands:

dos2unix mainparams
dos2unix extraparams

to convert the line endings so that the files can be read in the Linux environment.

Open the shell script, e.g.

$vi KPjobK1.sh

Press ‘i’ to be able to insert/delete text. Replace ‘bo1ckx’ with your username. Check in the last line of the script that the working directory is spelled correctly and is indeed the directory into which you copied the script and parameter files. If you want to increase the number of replicates (please not now!)
change the option

#$ -t 1-2

to

#$ -t 1-<Number of replicates>

Press ESC followed by :wq to save the changes and quit the editor.

The shell script is written for K = 1 and will run Structure twice to create two replicates. Run the shell script.

$qsub KPjobK1.sh

This will take about 1 min. Once the job has finished, it should have created two files, K1_1_f and K1_2_f.

Copy the script and modify it for K = 2.

$cp KPjobK1.sh KPjobK2.sh
$vi KPjobK2.sh

Replace all occurrences of K1 with K2 in this file and change the option -K 1 to -K 2, i.e.

#$ -N K2
/usr/local/extras/Genomics/apps/structure/current/structure -K 2 -L 13 -N 86 -i project_data_KP.txt -o /home/bo1ckx/structure/K2_$SGE_TASK_ID > /home/bo1ckx/structure/runseqK2_$SGE_TASK_ID

(the structure command must be all on one line)

Submit the job, then modify the script for K = 3 in the same way and run this job too. Once this has finished you should find these 6 files in your directory:

K1_1_f
K1_2_f
K2_1_f
K2_2_f
K3_1_f
K3_2_f

To extract the likelihood values of each run, enter the command:

grep -a 'Estimated Ln Prob of Data' K* > loglikelihood.txt

This writes the log-likelihood value of each run into the file ‘loglikelihood.txt’. Check the content of this file to find any outliers that need excluding.

Migrate-n

Migrate-n is a popular program that lets you estimate past gene flow between populations and theta (a measure of effective population size). It can be run in Maximum Likelihood or Bayesian mode. Migrate-n estimates gene flow in both directions and can therefore be used to evaluate asymmetric gene flow. It should be run in several replicates. In practice, a Migrate-n analysis needs a lot of optimisation; don’t expect to finish the analysis in a few days. More information about Migrate-n can be found here: http://popgen.sc.fsu.edu/Migrate/Migrate-n.html

Migrate-n can be run in parallel, which speeds up the analysis greatly. When it is run in parallel, the number of cores requested should be less than or equal to (number of loci x number of replicates) + 1.
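The core limit can be written as a small shell calculation. This is only a sketch: the function name max_cores is hypothetical and is not part of Migrate-n or the cluster software.

```shell
# Hypothetical helper (not part of Migrate-n): upper limit on the number of
# cores to request, following the rule (loci x replicates) + 1.
max_cores() {
    loci="$1"
    replicates="$2"
    echo $(( loci * replicates + 1 ))
}

max_cores 10 5   # prints 51
```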
For example, if the data set has 10 loci and 5 replicates are needed, you should not request more than 51 processors. You can always request fewer processors; this will make your job start sooner.

Again, here we only demonstrate how the software is installed and run on the cluster. Please study the manual to understand what the software does and how it works.

At the beginning, make sure that you are in your home folder.

pwd
/home/bo1ckx/

On your screen ‘bo1ckx’ will be replaced with your username. (If you are somewhere else, type: cd <ENTER>)

Download the program from the web.

wget http://popgen.sc.fsu.edu/oldversions/3.x/3.5/migrate-3.5.4.src.tar.gz

If for some reason the download fails, copy the zipped source file from the directory /usr/local/extras/Genomics/HPC_course/migrate/Download into your home directory and proceed with unzipping the file.

Installation and Configuration

First unzip the file in your home directory.

tar xvfz migrate-3.5.4.src.tar.gz

This will create the folder migrate-3.5.4 containing the program. Move to the subfolder src and build the program.

cd migrate-3.5.4/src
./configure
make

To enable the software to run in parallel we need to load the appropriate MPI module.

module add mpi/gcc/openmpi/1.4.4

You can now create an executable that is configured for parallel computing.

make mpis

The new executable migrate-n will be created. Test whether the program works:

cd ../example
../src/migrate-n parmfile.testbayes

This will start the program and you will see a start menu on your screen. Press ‘q’ to quit. When we later run the program on the cluster, we suppress this start menu in the parameter file (parmfile). If you forget this, your analysis will stall. The software is now ready to use.

Copy data files to working directory

Before we can start with the analysis we need to copy the data, parameter and qsub files into the program directory. The data are the same as used previously in the Structure exercise.
cd /usr/local/extras/Genomics/HPC_course/migrate
cp migrat3.sh /home/bo1ckx/migrate-3.5.4/
cp parmfile /home/bo1ckx/migrate-3.5.4/
cp infile-KP3_13.txt /home/bo1ckx/migrate-3.5.4/
cd /home/bo1ckx/migrate-3.5.4

Run the script

Enter

qsub migrat3.sh

to submit the job to Iceberg. Use qstat to check whether it has started. The job will take about 3-5 min to finish. Note that the menu is now turned off. To follow the progress of the programme, open the file MigKP.o<JOBID>:

less MigKP.o<JOBID>

Once the program has finished, a .pdf document with the results will appear in the program directory.
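The check for the results file can also be scripted. The sketch below is only an illustration: the function name find_results is hypothetical, and it assumes that the results are written as a file ending in .pdf in the directory you pass to it.

```shell
# Hypothetical helper: list PDF result files in a directory (default: the
# current directory). Prints one path per line and returns non-zero if no
# PDF has been written yet.
find_results() {
    dir="${1:-.}"
    found=1
    for f in "$dir"/*.pdf; do
        # If the pattern matches nothing it is left unexpanded,
        # so test that the file really exists.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
            found=0
        fi
    done
    return "$found"
}
```

For example, find_results /home/bo1ckx/migrate-3.5.4 prints the path of the results file once the job has finished. While the job is still running, it prints nothing and returns a non-zero status.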