P2.1_DNA_variation

GENETIC VARIATION BASED ON DNA SEQUENCES
1. Open the file Chiocytb.fas in BIOEDIT. This alignment includes 124 sequences of 700
bp of the mitochondrial DNA (mtDNA) gene cytochrome b, obtained from populations
of Chioglossa lusitanica. The name of each sequence consists of the code of the
respective population (see Figure 1) followed by the code of the specimen. Because
mtDNA is haploid, there is only one sequence per specimen.
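The naming convention described above can be used to count how many sequences belong to each population. The following is a minimal sketch, not part of the BIOEDIT workflow. It assumes that the population code is the leading run of letters in each sequence name; the names in the example (SAL01, MU07) are hypothetical, and the actual separator used in Chiocytb.fas may differ.

```python
from collections import Counter
import re


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def population_code(name):
    """Return the population code of a sequence name.

    ASSUMPTION: the code is the leading run of letters (e.g. 'SAL' in 'SAL01').
    Adjust this function if the real names use another layout.
    """
    match = re.match(r"[A-Za-z]+", name)
    return match.group(0) if match else name


# Hypothetical example data, not taken from Chiocytb.fas
example = """>SAL01
ACGT
>SAL02
ACGA
>MU07
ACGT
"""

counts = Counter(population_code(name) for name, _ in read_fasta(example))
for pop, n in sorted(counts.items()):
    print(pop, n)
```

To apply it to the real file, read the text of Chiocytb.fas and pass it to read_fasta instead of the example string.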
Figure 1. The 17 sampled localities. [Map not reproduced. Labels on the map: SAL, FS, N, PO, BA, G, B, Vsi, M, T, C, SA, BU, VA, AC, Lvi, Lcp, MU; rivers: Douro, Mondego, Tejo; scale bar: 0 to 100.]
1.1. Verify whether the sequence alignment needs corrections. If not, you can close
BIOEDIT.
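Two of the checks made by eye in BIOEDIT can also be scripted: that all sequences have the same length and that they contain only expected characters. This is a rough sketch under stated assumptions. The set of allowed characters (A, C, G, T, N, gap and "?") is an assumption, and the example records are hypothetical. It does not replace visual inspection of the alignment.

```python
def check_alignment(records, allowed="ACGTN-?"):
    """Return a list of human-readable problems found in an alignment.

    records: list of (name, sequence) pairs.
    allowed: characters accepted in a sequence (an assumption; extend as needed).
    """
    if not records:
        return ["no sequences found"]
    problems = []
    expected = len(records[0][1])  # the first sequence sets the reference length
    for name, seq in records:
        if len(seq) != expected:
            problems.append(f"{name}: length {len(seq)}, expected {expected}")
        bad = sorted(set(seq.upper()) - set(allowed))
        if bad:
            problems.append(f"{name}: unexpected characters {bad}")
    return problems


# Hypothetical records: one sequence is too short, another has a stray character
records = [("SAL01", "ACGTAC"), ("SAL02", "ACGTA"), ("MU07", "ACGXAC")]
problems = check_alignment(records)
for p in problems:
    print(p)
```

An empty list means that neither of these two problems was detected.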
2. Next, we will summarize the diversity contained in these sequences. Open DnaSP and
load the file Chiocytb.fas (File>Open Data File>). A small window appears with
information about the file (number of sequences, length, etc.). Always verify that
this information is correct.
2.1. First we need to tell the software that we are analyzing mtDNA sequences. Go to
Data>Format> and select the appropriate option.
2.2. Now we will calculate some diversity indices. Choose Analysis>DNA
polymorphism. A text file is generated with some of the basic diversity indices.
Save this file with an appropriate name (File>Save Current Output). With this
information, answer the following questions:
- How many different haplotypes can be found among the 124 sequences?
- What is the haplotype diversity (Hd)?
- How many positions are variable in the alignment (S)?
- What is the nucleotide diversity (π)?
This gives only a very general idea of the diversity in the dataset. Other options can
be selected for a more detailed analysis.
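To make the four indices concrete, they can be computed by hand on a small example. The sketch below uses the standard definitions: Hd = n/(n-1) * (1 - sum of squared haplotype frequencies), S = number of alignment columns with more than one state, and π = mean number of pairwise differences divided by the sequence length. It assumes an alignment without gaps or missing data and uses made-up sequences. DnaSP treats such sites in its own way, so values for real data may differ slightly from this simplified calculation.

```python
from collections import Counter
from itertools import combinations


def diversity(seqs):
    """Return (number of haplotypes, Hd, S, pi) for aligned sequences.

    ASSUMPTION: all sequences have equal length and contain no gaps or
    missing data. Requires at least two sequences.
    """
    n = len(seqs)
    length = len(seqs[0])
    hap_counts = Counter(seqs)  # identical sequences are the same haplotype
    h = len(hap_counts)
    # Haplotype diversity with the n/(n-1) sample-size correction
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in hap_counts.values()))
    # Segregating sites: columns showing more than one nucleotide
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # Nucleotide diversity: mean pairwise differences per site
    total_diffs = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2
    pi = total_diffs / n_pairs / length
    return h, hd, s, pi


# Made-up example: 4 sequences of 10 bp
seqs = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT"]
h, hd, s, pi = diversity(seqs)
print(f"haplotypes={h} Hd={hd:.4f} S={s} pi={pi:.4f}")
```

In this example there are 3 haplotypes and 2 variable positions, and the 6 pairwise comparisons add up to 7 differences.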
2.3. Now, the objective is to study how diversity varies with latitude, as was done
for the nuclear DNA markers. To do so, each population must be analyzed
separately, so 17 subsets (one per sampled locality) must be defined from
the data. Go to Data>Define sequence sets. From the list, select all sequences
corresponding to the “Sal” population and press “>>” to include them in the first
group. Press “Add new sequence set” and enter the name of the sample location
(e.g. “Sal”). Do the same for the remaining 16 populations. At the end, press
“Update all entries”.
2.4. Estimate the haplotype diversity (Hd) in all populations with a sample size greater
than 5. To do so, select each population in the intermediate menu that appears after
choosing Analysis>DNA polymorphism.
- How does the diversity vary with latitude?
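The per-population calculation in 2.4 can be sketched outside DnaSP as follows. This is an illustration only: the records are hypothetical (population code, sequence) pairs, and the threshold of at least 6 sequences is our reading of "sample size greater than 5". The resulting values can then be compared with the position of each locality in Figure 1.

```python
from collections import Counter, defaultdict


def haplotype_diversity(seqs):
    """Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


def hd_by_population(records, min_size=6):
    """Group sequences by population and return Hd for each large enough group.

    records: list of (population_code, sequence) pairs.
    min_size: smallest sample size analysed (6 means 'greater than 5').
    """
    groups = defaultdict(list)
    for pop, seq in records:
        groups[pop].append(seq)
    return {
        pop: haplotype_diversity(seqs)
        for pop, seqs in groups.items()
        if len(seqs) >= min_size
    }


# Hypothetical data: SAL has 3 haplotypes, MU has 1, FS is too small to analyse
records = (
    [("SAL", "AAAA")] * 3 + [("SAL", "AAAT")] * 2 + [("SAL", "AATT")]
    + [("MU", "AAAA")] * 6
    + [("FS", "AAAA"), ("FS", "AAAT"), ("FS", "AATT")]
)
result = hd_by_population(records)
for pop, hd in sorted(result.items()):
    print(f"{pop}: Hd = {hd:.3f}")
```

A population in which all specimens share one haplotype has Hd = 0, as for the hypothetical MU sample here.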
2.5. DnaSP can convert files to different formats. NETWORK, which we will use
afterwards, reads several file formats, including “phylip” (.phy). Arlequin uses a
specific file format (.arp), which DnaSP can also generate. The partition into
populations defined in 2.3 can be saved and re-read afterwards if the data are saved
in nexus format (.nex).
2.5.1. To generate these files, select File>Save/Export Data as…> and choose the
appropriate options.
2.5.1.1. For the nexus and phylip formats, you only need to indicate
the name of the file (suggestion: chiocytb_phy.phy for the phylip file).
2.5.1.2. For the “arlequin” file format, you first need to choose among
a set of options (here, we will keep the default selection and press OK)
before saving. Two filenames are then needed: one for the file containing
the list of haplotypes (suggestion: chiocytb_hap.hap) and another
for the Arlequin input file (suggestion: chiocytb_arp.arp).
When you open the latter with TextPad, you will notice that the previously
defined structure of the dataset (split into sample localities) is retained.
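To show what the exported phylip file looks like, the sketch below writes an alignment in the strict sequential PHYLIP layout: a header line with the number of sequences and the alignment length, followed by one line per sequence with the name padded to 10 characters. This is an illustration with hypothetical records, not the DnaSP export routine. The file written by DnaSP may differ in details, and names longer than 10 characters are truncated here, which could create duplicate names.

```python
def to_phylip(records):
    """Format (name, sequence) pairs as strict sequential PHYLIP text."""
    if not records:
        raise ValueError("no sequences to write")
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("all sequences must have the same length")
    # Header: number of sequences and number of sites
    lines = [f" {len(records)} {length}"]
    for name, seq in records:
        # Strict PHYLIP reserves exactly 10 characters for the name
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"


# Hypothetical records, not taken from Chiocytb.fas
records = [("SAL01", "ACGT"), ("MU07", "ACGA")]
phylip_text = to_phylip(records)
print(phylip_text, end="")

# To save the result under the suggested filename:
# with open("chiocytb_phy.phy", "w") as handle:
#     handle.write(phylip_text)
```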