GENETIC VARIATION BASED ON DNA SEQUENCES

1. Open the file Chiocytb.fas in BIOEDIT. This alignment includes 124 sequences, each 700 bp long, of the mitochondrial DNA (mtDNA) gene cytochrome b, obtained from populations of Chioglossa lusitanica. The name of each sequence consists of the code of the respective population (see Figure 1) followed by the code of the specimen. Because this is mtDNA, there is only one sequence per specimen.

Figure 1. The 17 sampled localities. [Map not reproduced. Labels recovered from the map: SAL, FS, N, PO, BA, G, B, Douro, Vsi, M, T, C, SA, BU, VA, Mondego, AC, Lvi, Lcp, MU, Tejo; scale bar 0 to 100.]

1.1. Verify whether the sequence alignment needs corrections. If not, you can close BIOEDIT.

2. We will now evaluate the overall diversity contained in these sequences. Open DnaSP. Open the file Chiocytb.fas (File>Open Data File>). A small window appears with information about the file (number of sequences, length, etc.). Always verify whether this information is correct.

2.1. First we need to tell the software that we are analyzing mtDNA sequences. Go to Data>Format> and select the appropriate option.

2.2. Now we will calculate some diversity indices. Choose Analysis>DNA polymorphism. A text file is generated with some of the basic diversity indices. Save this file with an appropriate name (File>Save Current Output). With this information, answer the following questions:
- How many different haplotypes can be found among the 124 sequences?
- What is the haplotype diversity (Hd)?
- How many positions are variable in the alignment (S)?
- What is the nucleotide diversity (π)?
This provides a very general idea of the diversity present in the dataset. For a more detailed analysis, other options can be selected.

2.3. The objective now is to study how diversity varies with latitude, as was done for the nuclear DNA markers. To do so, each population must be analyzed separately. Thus, 17 subsets (one per sampled locality) must be defined from the data. Follow Data>Define sequence sets.
From the list, select all sequences corresponding to the “Sal” population. Press “>>” to include them in the first group. Press “Add new sequence set” and type the name of the sample location (e.g. “Sal”). Do the same for the remaining 16 “populations”. At the end, press “Update all entries”.

2.4. Estimate the haplotype diversity (Hd) in all populations with a sample size over 5. To do so, select each population in the intermediate menu that appears after choosing Analysis>DNA polymorphism.
- How does the diversity vary with latitude?

2.5. DnaSP can convert files to different formats. NETWORK, which we will use afterwards, reads several file formats including “phylip” (.phy). Arlequin uses a specific file format (.arp), which can also be generated by DnaSP. The partition into populations defined in 2.3 can be kept and re-read afterwards if the data are saved in nexus format (.nex).

2.5.1. To generate these files, select File>Save/Export Data as…> and choose the appropriate options.

2.5.1.1. For the nexus and phylip formats, you only need to indicate the name of the file (suggestion: chiocytb_phy.phy for the phylip file).

2.5.1.2. For the “arlequin” file format, you first need to choose among a set of options before saving (here, we will keep the default selection and press OK). Two filenames are then needed: one for the file containing the list of haplotypes (suggestion: chiocytb_hap.hap) and another for the Arlequin input file (suggestion: chiocytb_arp.arp). When you open the .arp file with TextPad, you will notice that the previously defined structure of the dataset (split into sample localities) is retained.
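
As an optional cross-check of the values DnaSP reports in 2.2 and 2.4, the same indices can be computed directly from the alignment. The Python sketch below is not part of DnaSP, and all function names (read_fasta, diversity, population_of, by_population) are hypothetical. It assumes an alignment in which all sequences have the same length, and it counts every character (including gaps or ambiguity codes, if any) as a distinct state, so its results may differ from DnaSP where such sites exist. It uses Hd = n/(n-1) × (1 − Σ p_i²), where p_i is the frequency of haplotype i, and π as the mean number of pairwise differences per site. The rule used to extract the population code (the leading run of letters in the sequence name) is only an assumption about how names are written in Chiocytb.fas and may need adjusting.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks).upper()))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks).upper()))
    return records


def diversity(seqs):
    """Return n, number of haplotypes (h), Hd, S and pi for aligned sequences."""
    n = len(seqs)
    if n == 0:
        raise ValueError("no sequences given")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences are not aligned (unequal lengths)")

    counts = Counter(seqs)  # haplotype -> number of copies
    # S: alignment columns showing more than one state
    s_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if n < 2:
        return {"n": n, "h": len(counts), "Hd": 0.0, "S": s_sites, "pi": 0.0}

    # Haplotype diversity with the n/(n-1) sample-size correction
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    # Nucleotide diversity: mean pairwise differences, divided by length
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    )
    pi = total_diffs / (n * (n - 1) / 2) / length
    return {"n": n, "h": len(counts), "Hd": hd, "S": s_sites, "pi": pi}


def population_of(name):
    """ASSUMPTION: the population code is the leading run of letters."""
    match = re.match(r"[A-Za-z]+", name)
    return match.group(0) if match else name


def by_population(records, min_size=5):
    """Diversity per population, keeping only samples larger than min_size."""
    groups = defaultdict(list)
    for name, seq in records:
        groups[population_of(name)].append(seq)
    return {
        pop: diversity(seqs)
        for pop, seqs in groups.items()
        if len(seqs) > min_size
    }
```

To use it on the practical's data, one would read the file with something like records = read_fasta(open("Chiocytb.fas").read()), call diversity([seq for _, seq in records]) for the whole dataset (step 2.2), and by_population(records) for the per-locality values (step 2.4). Check the population names it returns against Figure 1 before comparing the numbers with DnaSP.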