Structure - Víctor Soria

Structure
Structure is a standard software tool in population genetics. It assigns
individuals proportionally to different clusters according to their genotype
information, using Markov Chain Monte Carlo search algorithms. It is useful for
identifying population differentiation when individuals are sampled from distinct
geographic locations.
It can be downloaded here:
http://pritchardlab.stanford.edu/structure.html
This practical assumes that you are familiar with the software, for example
from having run the programme previously on a Windows machine.
Structure is already installed on Iceberg in the Genomics folder. Running
Structure on the Iceberg cluster has the advantage that you can obtain many
replicates very efficiently. In this example we will create just two replicates for
each value of K from 1 to 3 using an array job.
Create a folder for structure results in your home directory.
$mkdir structure
Copy the data file and parameter files to the structure folder in your home
directory. The data file contains genotypes from 13 microsatellites and 3 Kentish
plover populations. The file ‘mainparams’ contains the parameter settings for
this exercise (‘extraparams’ is empty).
$cp /usr/local/extras/Genomics/HPC_course/structure/* /home/bo1ckx/structure/
(replace ‘bo1ckx’ with your Iceberg - Username)
Note: The two parameter files can be created for your own project using the Windows version of
Structure. Once you have imported the files to Iceberg, run the commands:
dos2unix mainparams
dos2unix extraparams
to convert the Windows line endings so that the files can be read in the Linux environment.
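To check whether a file still carries Windows line endings, you can count the lines that contain a carriage return. The sketch below works on a made-up two-line stand-in for mainparams in a scratch directory, and it uses tr instead of dos2unix purely so that it runs on machines where dos2unix is not installed; the file contents and the name mainparams.unix are assumptions for illustration only.

```shell
# Scratch directory so that no real parameter file is touched.
workdir=$(mktemp -d)

# Made-up stand-in for a mainparams file saved on Windows (CRLF line endings).
printf '#define MAXPOPS 2\r\n#define BURNIN 10000\r\n' > "$workdir/mainparams"

cr=$(printf '\r')                                   # a literal carriage return
before=$(grep -c "$cr" "$workdir/mainparams" || true)

# Strip the carriage returns (same effect on line endings as dos2unix).
tr -d '\r' < "$workdir/mainparams" > "$workdir/mainparams.unix"
after=$(grep -c "$cr" "$workdir/mainparams.unix" || true)

echo "lines with carriage return: before=$before after=$after"
```

A count of 0 means the file already has Unix line endings and needs no conversion.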
Open the shell script, e.g.
$vi KPjobK1.sh
You may also try any other editor such as nano or gedit.
For vi press ‘i’ to be able to insert/delete text.
Replace ‘bo1ckx’ with your username. Check that the working directory in the
last line of the script is spelled correctly and is indeed the directory into which
you copied the script and parameter files.
If you want to increase the number of replicates (please not now!), change
the option
#$ -t 1-2 to #$ -t 1-<Number of replicates>
Press ESC followed by :wq
This will save the changes to the file and then let you quit the file.
The shell script is written for K = 1 and will run structure twice to create two
replicates. Run the shell script.
$qsub KPjobK1.sh
This will take about 1 min. It should have created two files K1_1_f and K1_2_f
once it’s finished.
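As a rough illustration of how the array job arrives at these file names, the loop below mimics the two tasks requested by -t 1-2. It only prints names and does not run Structure. Setting SGE_TASK_ID by hand is an assumption made for this demonstration; on the cluster the scheduler sets it for each task.

```shell
# Mimic the two array tasks; on the cluster the scheduler sets SGE_TASK_ID.
names=""
for SGE_TASK_ID in 1 2; do
    out="K1_${SGE_TASK_ID}"      # value passed to -o in the job script
    names="$names ${out}_f"      # Structure appends _f to the results file
done
echo "expected result files:$names"
```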
Copy the script and modify for K = 2.
$cp KPjobK1.sh KPjobK2.sh
$vi KPjobK2.sh
Replace all occurrences of K1 with K2 in this file:
#$ -N K2
/usr/local/extras/Genomics/apps/structure/current/structure -K 2 -L
13 -N 86 -i project_data_KP.txt -o
/home/bo1ckx/structure/K2_$SGE_TASK_ID >
/home/bo1ckx/structure/runseqK2_$SGE_TASK_ID
(all in one line and replace bo1ckx with your username)
Submit the job and modify the script for K = 3. Run this job too.
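Instead of editing each copy by hand, the scripts for K = 2 and K = 3 could be derived from the K = 1 script with sed. The sketch below runs in a scratch directory and first writes a stand-in KPjobK1.sh assembled from the fragments shown in this practical; the exact layout of the real script is an assumption, so inspect the generated files before submitting them.

```shell
# Work in a scratch directory so nothing in your real folder is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for KPjobK1.sh (assumed layout; your copy may differ).
cat > KPjobK1.sh <<'EOF'
#!/bin/bash
#$ -N K1
#$ -t 1-2
/usr/local/extras/Genomics/apps/structure/current/structure -K 1 -L 13 -N 86 -i project_data_KP.txt -o /home/bo1ckx/structure/K1_$SGE_TASK_ID > /home/bo1ckx/structure/runseqK1_$SGE_TASK_ID
EOF

# Derive the K = 2 and K = 3 scripts: rename every K1 label and
# change the value given to the -K option.
for k in 2 3; do
    sed -e "s/K1/K${k}/g" -e "s/-K 1 /-K ${k} /" KPjobK1.sh > "KPjobK${k}.sh"
done

# Show the job name and command line of each generated script.
grep -e '-N K' KPjobK2.sh KPjobK3.sh
```

Each generated script can then be submitted with qsub in the same way as KPjobK1.sh.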
Once this is finished you should find these 6 files in your directory:
K1_1_f
K1_2_f
K2_1_f
K2_2_f
K3_1_f
K3_2_f
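A quick way to confirm that all six files are present is to loop over the expected names. This is a minimal sketch; the function name check_outputs is made up for this example, and it is called on the current directory, so run it from your structure folder.

```shell
# Report any expected result file that is missing from a directory.
check_outputs() {
    dir=$1
    for k in 1 2 3; do
        for rep in 1 2; do
            f="K${k}_${rep}_f"
            [ -f "$dir/$f" ] || echo "missing: $f"
        done
    done
}

# Check the current directory; no output means all six files exist.
check_outputs .
```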
To extract the likelihood values of each run, enter the command:
grep -a 'Estimated Ln Prob of Data' K* > loglikelihood.txt
This will write the log-likelihood value of each run into the file
‘loglikelihood.txt’. Check the content of this file for any outliers that need to be
excluded.
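To make outliers easier to spot, the values can be summarised per K. The sketch below assumes that each line of loglikelihood.txt has the form produced by grep on several files (file name, colon, label, equals sign, value); the numbers are invented for illustration and are not results from the Kentish plover data.

```shell
# Scratch copy with invented values in the assumed grep output format.
workdir=$(mktemp -d)
cat > "$workdir/loglikelihood.txt" <<'EOF'
K1_1_f:Estimated Ln Prob of Data   = -3000.0
K1_2_f:Estimated Ln Prob of Data   = -3002.0
K2_1_f:Estimated Ln Prob of Data   = -2900.0
K2_2_f:Estimated Ln Prob of Data   = -2904.0
EOF

# Split each line on ':' and '=', group by the K label and average the values.
awk -F'[:=]' '
{
    split($1, parts, "_")      # parts[1] is K1, K2, ...
    k = parts[1]
    sum[k] += $3               # $3 is the log-likelihood value
    n[k]++
}
END {
    for (k in sum) printf "%s\t%d\t%.1f\n", k, n[k], sum[k] / n[k]
}' "$workdir/loglikelihood.txt" | sort > "$workdir/summary.txt"

# Columns: K label, number of runs, mean log-likelihood.
cat "$workdir/summary.txt"
```

A run whose value lies far from the mean of its K is a candidate for closer inspection.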
Migrate-n
Migrate-n is a popular software package that lets you quantify past gene flow
between populations and theta (a measure of effective population size). It can be run in
Maximum Likelihood or Bayesian mode. Migrate-n estimates gene flow in
both directions and can therefore be used to evaluate asymmetric gene flow. It
should be run in several replicates. In practice, a Migrate-n analysis needs a lot
of parameter optimisation and much longer chains than used here, so don’t expect to
finish the analysis in a few days.
More information about migrate-n is found here:
http://popgen.sc.fsu.edu/Migrate/Migrate-n.html
Migrate-n can be run in parallel, which speeds up the analysis greatly. When run
in parallel, the number of cores should be less than or equal to the number
of loci x number of replicates plus one. For example, if the data set has 10 loci and
5 replicates are needed, you should not request more than 51 processors. You
can always request fewer processors; this will make your job start sooner.
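The rule above can be written as a one-line calculation. The variable names are chosen for this example only.

```shell
loci=10          # number of loci in the data set
replicates=5     # number of replicates

# Upper limit on cores: loci x replicates, plus one.
max_cores=$(( loci * replicates + 1 ))
echo "request at most $max_cores cores"
```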
Again, here we only demonstrate how the software is installed and run on the
cluster. Please study the manual to understand what the software does and how
it works.
At the beginning make sure that you are in your home folder.
pwd
/home/bo1ckx/
On your screen ‘bo1ckx’ will be replaced with your username.
(If you are somewhere else type: cd <ENTER>)
Download the program from the web.
wget http://popgen.sc.fsu.edu/oldversions/3.x/3.5/migrate3.5.4.src.tar.gz
If for some reason the download fails copy the zipped source file from the directory
/usr/local/extras/Genomics/HPC_course/migrate/Download into your home
directory and proceed with unzipping the file.
Installation and Configuration
First unzip the file in your home directory.
tar xvfz migrate-3.5.4.src.tar.gz
This will create the program folder migrate-3.5.4 containing the program. Move
to the sub folder src and install the program.
cd migrate-3.5.4/src
./configure
make
To enable the software to run in parallel we need to load the appropriate parallel
module.
module add mpi/gcc/openmpi/1.4.4
You can now create an executable file that is configured for parallel computing.
make mpis
The new executable migrate-n will be created.
Test whether the program works:
cd ../example
../src/migrate-n parmfile.testbayes
This will start the program and you will see a Start menu on your screen. Press
‘q’ to quit. When we later run the program on the cluster, we disable this Start
menu in the parameter file (parmfile). If you forget this, your analyses will stall.
The software is now ready to use.
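As a hedged sketch of how the Start menu can be switched off, the commands below assume that the parmfile controls it with a line reading menu=YES or menu=NO, as described in the Migrate-n manual. The one-line parmfile and the name parmfile.batch are stand-ins created in a scratch directory; check the corresponding line in your own parmfile rather than relying on this example.

```shell
# Scratch directory with a one-line stand-in for a parmfile (assumed syntax).
workdir=$(mktemp -d)
printf 'menu=YES\n' > "$workdir/parmfile"

# Write a copy with the menu switched off for batch use on the cluster.
sed 's/^menu=YES/menu=NO/' "$workdir/parmfile" > "$workdir/parmfile.batch"

# Show the resulting setting.
grep '^menu=' "$workdir/parmfile.batch"
```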
Copy data files to working directory
Before we can start the analyses we need to copy the data, parameter and
qsub files into the program directory. The data are the same as those used
previously in the STRUCTURE exercise.
cd /usr/local/extras/Genomics/HPC_course/migrate
cp migrat3.sh /home/bo1ckx/migrate-3.5.4/
cp parmfile /home/bo1ckx/migrate-3.5.4/
cp infile-KP3_13.txt /home/bo1ckx/migrate-3.5.4/
cd /home/bo1ckx/migrate-3.5.4
As before, bo1ckx needs to be replaced with your username in all instances.
Run the script
Enter qsub migrat3.sh to submit the job to Iceberg. It is important that you
are in the right directory (/home/bo1ckx/migrate-3.5.4/) when you run this,
because the script calls the executable using a path relative to your current
directory. Use qstat to check whether the job has started. It will take about
3-5 min to finish. If the program doesn’t run, first check where you are using pwd.
Note that the menu is now turned off. To follow the progress of the programme, open
the output file MigKP.o<jobID>:
less MigKP.o<jobID>
Once the program has finished, a PDF document with the results will appear in the
directory of the program.
If you have any questions please get in touch: c.kuepper@sheffield.ac.uk