PBG/MCB 621 – Class Exercise 2, Friday Oct 31st 2014



Today’s lab will use data from the same experiment as last week’s lab. The spreadsheet ‘DxB BOPA1 Genotypes.xlsx’ contains marker data for 156 barley DH lines produced from the F1 of a cross between Derkado and B83-12/21/5, together with the parents. All the individuals were scored for two naked-eye polymorphisms, a major gene disease resistance locus, a biochemical marker and 506 SNP markers. The SNP markers are scored according to the DNA base that they carry at each marker locus, and the remaining four loci are scored with reference to the parental alleles that they possess, so that Derkado alleles are coded ‘a’ and B83-12/21/5 alleles are coded ‘b’.

The objective of the exercise is to convert the data into JoinMap’s format for co-dominant markers and import the data into JoinMap. You will then use JoinMap to identify any errors in the data input and check for any other irregularities in the data that might hamper map construction. Note that the data have been pre-arranged with markers in rows and line names in columns to facilitate data import into JoinMap.

Task 1. Convert all scores into ‘ab-’ format for JoinMap

1. Copy the line names, excluding the parents, and paste them into the empty area at the right of the data, with the leftmost line name in cell FF1.

2. Copy the column of marker names (column A) and paste it into column FE.

3. In cell FF2, enter the formula =IF(D2=$B2,"a",IF(D2=$C2,"b",D2)). This checks whether the score of the first individual for the first marker is the same as Derkado’s and, if so, codes it ‘a’. If not, the formula checks whether the score is the same as B83-12/21/5’s and, if so, codes it ‘b’. If it is not the same as either parent, the cell is filled with the original score for that marker.

4. Extend this formula to the rest of the markers and lines, so that all the marker data are coded as parental allele ‘a’ or ‘b’, i.e., in ‘ab-’ format.
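To check the logic of the nested IF outside Excel, it can be mirrored in a few lines of Python. This is an illustrative sketch, not part of the lab workflow: the function names are hypothetical, and the case-insensitive comparison is an assumption intended to match how Excel’s = operator compares text.

```python
def convert_score(score: str, derkado: str, b83: str) -> str:
    """Mirror of =IF(D2=$B2,"a",IF(D2=$C2,"b",D2)) for a single cell."""
    # Excel's "=" compares text case-insensitively, so casefold() both sides.
    if score.casefold() == derkado.casefold():
        return "a"
    if score.casefold() == b83.casefold():
        return "b"
    # Anything that matches neither parent (e.g. 'C/G' or '-') passes through.
    return score


def convert_row(derkado: str, b83: str, scores: list[str]) -> list[str]:
    """Apply the conversion across one marker row (one cell per DH line)."""
    return [convert_score(s, derkado, b83) for s in scores]


if __name__ == "__main__":
    # Hypothetical SNP row: Derkado carries 'A', B83-12/21/5 carries 'G'.
    print(convert_row("A", "G", ["A", "G", "C/G", "-", "A"]))
    # If both parents are scored with the same base, the other progeny base
    # is left unconverted (the situation dealt with in Task 3.5).
    print(convert_row("C", "C", ["C", "G", "C"]))
```

The second call shows why a marker whose parents carry the same base produces leftover base calls in the converted block.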

Task 2. Import the data into JoinMap 4.1

1. In the Windows Start menu, go to ‘All Programs’ -> ‘bioinformatics’ -> ‘JoinMap4’ and double-click JoinMap4 to open JoinMap.

2. Once JoinMap opens, choose ‘File’ -> ‘New Project’ from the dropdown menus at the top of the screen (or use the Ctrl+N shortcut) to create a new project.

3. You will get a pop-up window (‘Save a new project file’). Choose a name for your JoinMap file and save the file. YOU WILL NEED YOUR JOINMAP FILE FOR NEXT WEEK’S LAB. MAKE SURE TO SAVE IT IN A LOCATION AND WITH A NAME THAT WILL HELP YOU READILY FIND IT LATER.

4. From the dropdown menus at the top of the screen, select ‘Dataset’ -> ‘Create New Dataset’.

5. This should open a Dataset node under the Project node in the sidebar on the left-hand side of the screen. It should also create a 2x2 matrix in the main window (the cells in this matrix are supposed to be blank, with the rows and columns labeled).

6. At the bottom left of the main screen:
a. in the ‘Pop. name:’ box, enter a text name (without spaces) for the population;
b. below, in the ‘Nr. of loci:’ box, enter the number of loci (510) and, in the ‘Nr. of indiv.:’ box, enter the number of individuals (i.e., the number of lines, which is 156);
c. in the ‘Pop. type:’ box, choose the population type (‘DH1’) from the dropdown menu.

The matrix should now be expanded to the dimensions specified.

7. Copy the section of the Excel file with the genotypes coded in the ‘ab-’ format (the reformatted dataset you created in Task 1), including the line names and marker names.

8. Go back to JoinMap, click in the topmost left-hand cell of the matrix (the empty cell, not the row/column name cell labeled ‘Nr’) and paste in the ‘ab-’ coded scores.

9. From the dropdown menus at the top of the screen, select ‘Edit’ -> ‘Reset Tabsheet’. This should widen the spacing of the cells.

10. Directly below the dropdown menus at the top of the screen, there is a row of icons. Click on the 4th icon from the right, the ‘Add/remove colors conditional on the genotype’ icon (it looks a bit like a game of Tetris). This should produce a graphical display of the genotypic data.

QUESTION 1. Are all the columns equally spaced? If not, what do you notice about any that are different?

No, some columns are wider due to the presence of heterozygous calls.

QUESTION 2. Scroll up and down and across the coloured data sheet. Do you notice any irregularities? If so, report them.

Yes, two rows are coloured magenta because they contain SNP base calls instead of what should probably be ‘b’ codes.

Task 3. Correct any errors in the data. Note that this checks that the codes you have entered are consistent; in this case, it checks that all genotypic codes are either ‘a’, ‘b’, or ‘-’.
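The consistency check can be pictured as a scan for any score outside the three allowed codes. The Python sketch below is a hypothetical stand-in for that check, assuming the data are held as a dictionary mapping each locus name to its list of scores; it is not how JoinMap itself implements ‘Highlight Errors’.

```python
VALID_CODES = {"a", "b", "-"}  # codes allowed in the 'ab-' DH1 format used here


def find_errors(matrix: dict[str, list[str]]) -> list[tuple[str, int, str]]:
    """Return (locus, individual index, value) for every score not in VALID_CODES.

    `matrix` maps each locus name to its scores, one per DH line, in the same
    marker-per-row layout as the JoinMap tabsheet.
    """
    errors = []
    for locus, scores in matrix.items():
        for i, value in enumerate(scores):
            if value not in VALID_CODES:
                errors.append((locus, i, value))
    return errors


if __name__ == "__main__":
    # Hypothetical example with one heterozygote call and one base call.
    demo = {
        "locus_1": ["a", "b", "-"],
        "locus_2": ["a", "C/G", "G"],
    }
    for locus, i, value in find_errors(demo):
        print(f"{locus}: individual {i} has invalid score {value!r}")
```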

1. From the dropdown menus at the top of the screen, select ‘Dataset’ -> ‘Highlight Errors’. This will colour any errors purple/red and place the cursor on the first one (an improperly formatted genotype listed as ‘C/G’), which is highlighted in blue.

2. Press the F2 key to edit the cell and then, if it is a heterozygote, press the ‘-’ key to replace the value in the cell with a missing value.

3. Scroll through the data tabsheet, click on any heterozygotes and replace them in the same way as in Task 3.2. [Hint: The data have been edited so that you have only 4 heterozygote calls. It may be faster, after you change each one to a missing value, to use ‘Dataset’ -> ‘Highlight Errors’ to find the next heterozygote call.]

4. Loci numbers 457 and 497 (11_11054 and 11_11445 respectively; the loci numbers are the row names in this JoinMap matrix) should have a large number of coloured cells with base calls instead of parental alleles. Although you could edit these by hand, there are many of them, and it is simpler to use the step below.

5. Go back to the base-call part of the Excel datasheet and find cells B458 and C458, which should be the parental scores for 11_10454 (row numbers are increased by one relative to those of the JoinMap matrix, as row 1 contains the line names in Excel). Note that the parents are monomorphic but the progeny are polymorphic. Replace the value in cell C458 with G, the value that is not being recognized by the conversion formula, and colour the cell yellow. The conversion cells in row 458 should now all be ‘a’ or ‘b’.

6. Repeat for row 498 (except this time, you will enter ‘T’ in cell C498).

7. To change the heterozygote calls to missing data calls in Excel (so you do not have to repeat Tasks 3.1-3.3 later), select all of the original genotype calls in your Excel spreadsheet except for the parents, i.e., select cells D2:FC511. Note that you are not selecting the line or marker names, nor the two columns of marker data for the parents. Use Ctrl+F to search for ‘?/?’ (‘?’ is a wildcard in Excel that matches any single character) and replace any instances with ‘-’, the missing data indicator that JoinMap uses. Your ‘ab-’ coded data in the section to the right should now incorporate your changes from Tasks 3.4-3.6.

8. Repeat Tasks 2.7-2.10.

9. This will overwrite your original data sheet with the corrections you made in Excel. Use ‘Dataset’ -> ‘Highlight Errors’ to verify that your JoinMap matrix now has no errors.

10. From the dropdown menus at the top, select ‘Dataset’ -> ‘Create Population Node’. This should generate a lot of tabs in the main window of JoinMap.
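The Find and Replace in step 7 can be reproduced outside Excel with a regular expression. The sketch below assumes, as the ‘C/G’ example in step 1 suggests, that every heterozygote is written as two single-character bases separated by a slash; the function name is hypothetical. Unlike this sketch, Excel’s Replace also replaces matching text inside longer cell contents unless ‘Match entire cell contents’ is ticked.

```python
import re

# Excel's '?' wildcard matches exactly one character, so '?/?' matches
# three-character calls such as 'C/G'. fullmatch() applies the pattern to
# the whole cell, as if 'Match entire cell contents' were ticked.
HET_PATTERN = re.compile(r"./.")


def mask_heterozygotes(scores: list[str]) -> list[str]:
    """Replace every 'X/Y' style call with JoinMap's missing-value code '-'."""
    return ["-" if HET_PATTERN.fullmatch(s) else s for s in scores]


if __name__ == "__main__":
    row = ["A", "C/G", "-", "G", "A/T"]  # hypothetical raw SNP calls
    print(mask_heterozygotes(row))
```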

QUESTION 3. Why might you find polymorphic scores for the progeny of a cross but monomorphic scores for the parents at a marker?

Most likely because different seed stocks of the parents were used for the crossing and for the genotyping. Seed of a cultivar derived from, say, a single F5 plant can still be heterogeneous at some loci, so different samples of the same parent may carry different alleles.
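Loci like the two fixed by hand in Tasks 3.5 and 3.6 can be found systematically: both parents carry the same base while the progeny carry two. A minimal, hypothetical Python helper for that screen is sketched below; it assumes raw base calls (before ‘ab-’ conversion) and skips missing values and any call containing ‘/’.

```python
def monomorphic_parent_loci(rows):
    """Flag loci where both parents carry the same base but the progeny do not.

    `rows` is an iterable of (locus, derkado, b83, progeny_scores) tuples.
    Missing values ('-') and heterozygote calls containing '/' are ignored.
    """
    flagged = []
    for locus, derkado, b83, progeny in rows:
        observed = {s for s in progeny if s != "-" and "/" not in s}
        if derkado == b83 and len(observed) > 1:
            flagged.append((locus, sorted(observed)))
    return flagged


if __name__ == "__main__":
    demo = [  # hypothetical loci, not taken from the lab spreadsheet
        ("locus_1", "C", "C", ["C", "G", "C", "-", "G"]),
        ("locus_2", "A", "G", ["A", "G", "A"]),
        ("locus_3", "T", "T", ["T", "T", "T"]),
    ]
    print(monomorphic_parent_loci(demo))
```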

Task 4. Check that the segregation of alleles at each marker locus is random

1. The Population node should now be highlighted (on the left, under the ‘Dataset 1’ node, with an icon that is a blue ‘P’ in a yellow box with a red border). Click on the ‘Locus Genot. Freq.’ tab and then press the Calculator icon (in the row of icons below the dropdown menus at the top of the screen, the 8th icon from the right, which looks like a calculator); alternatively, use the F9 shortcut.

2. This should produce, for each locus, the observed allele counts plus a chi-square test of the deviation from the expected ratio (1:1 in this case). To obtain a key for this table, click on the ‘Show further information’ icon in the row of icons underneath the dropdown menus at the top of the screen (2nd icon from the right, a large blue ‘i’).

3. Use Conditional Formatting in Excel to highlight loci where the probability is less than 0.5.

4. Click on the ‘X2’ header twice to sort the list in descending order of chi-square values. You should now see a downward-pointing triangle in the ‘X2’ column header.
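For a DH population without segregation distortion the expected ratio is 1:1, so with n_a and n_b lines carrying each allele the chi-square statistic reduces to (n_a − n_b)²/(n_a + n_b) on 1 degree of freedom. The sketch below implements that standard goodness-of-fit test without a continuity correction; that JoinMap uses exactly this form is an assumption, and the function names are hypothetical. For reference, a chi-square of 8.0 on 1 df corresponds to p ≈ 0.0047, which is why Question 4 calls such deviations highly significant.

```python
import math


def chi_square_1to1(n_a: int, n_b: int) -> float:
    """Chi-square goodness of fit to a 1:1 ratio (no continuity correction)."""
    expected = (n_a + n_b) / 2
    return ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected


def p_value_1df(chi2: float) -> float:
    """Upper-tail probability for 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


def count_highly_significant(counts, threshold: float = 8.0) -> int:
    """Count loci whose (n_a, n_b) counts give a chi-square above `threshold`."""
    return sum(1 for n_a, n_b in counts if chi_square_1to1(n_a, n_b) > threshold)


if __name__ == "__main__":
    demo = [(78, 78), (100, 56), (60, 96)]  # hypothetical allele counts
    for n_a, n_b in demo:
        x2 = chi_square_1to1(n_a, n_b)
        print(n_a, n_b, round(x2, 2), round(p_value_1df(x2), 4))
    print("highly significant:", count_highly_significant(demo))
```

Loci with no non-missing scores would divide by zero here; they are assumed not to occur.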

QUESTION 4. How many markers show highly significant deviations (chi-square > 8.0) from a 1:1 ratio?

Hint: select all the rows containing markers with a chi-square above 8, copy and paste them into a new sheet in Excel, and count them.

176

QUESTION 5. Report the marker (or markers, if there is a tie) with the fewest scores for either parental allele (the number of scores per locus is in columns a and b), together with the number of lines carrying that allele.

SNPs 11_10422 and 11_20929 each identify 17 lines with b alleles.

(11_10863 and 11_20577 have the fewest a alleles at 43 each)

Task 5. Check the frequencies of alleles for each genotype

1. Click on the ‘Individual Genot. Freq.’ tab and press the Calculator icon (or use the F9 shortcut).

2. This should produce a list of the sums of the parental alleles and missing value scores carried by each genotype.

3. Each column is clickable for sorting.
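Questions 6 and 7 use different denominators: Question 6 asks for a percentage of the non-missing scores, Question 7 for a percentage of all markers. The hypothetical helper below shows both calculations for one line’s column of ‘ab-’ scores; it is a sketch assuming every score is ‘a’, ‘b’ or ‘-’, not a reproduction of JoinMap’s table.

```python
def individual_summary(scores: list[str]) -> dict[str, float]:
    """Tally 'a', 'b' and '-' for one DH line and express them as percentages."""
    n_a, n_b, n_missing = scores.count("a"), scores.count("b"), scores.count("-")
    scored = n_a + n_b
    return {
        "a": n_a,
        "b": n_b,
        "missing": n_missing,
        # Percentages of the non-missing scores (denominator excludes '-').
        "pct_a_of_scored": 100 * n_a / scored if scored else float("nan"),
        "pct_b_of_scored": 100 * n_b / scored if scored else float("nan"),
        # Percentage of all markers that are missing.
        "pct_missing_of_total": 100 * n_missing / len(scores),
    }


if __name__ == "__main__":
    # Hypothetical line scored at 510 markers.
    line = ["a"] * 300 + ["b"] * 57 + ["-"] * 153
    print(individual_summary(line))
```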

QUESTION 6. Which line (Individual column) has the highest number of Derkado alleles? Express the number as a percentage of the non-missing values. [Hint: Check Task 1.3 for how Derkado alleles are coded (a or b).]

DxB83_136 has the highest number of Derkado alleles (367), at 71.96% of the total.

QUESTION 7. Which line has the highest number of missing values? Express the number as a percentage of the total number of markers.

DxB83_99 has the most missing values (153) at exactly 30% of the total

Task 6. Check the similarities of markers and individuals

1. Choose the ‘Similarity of Loci’ tab and click on the Calculator icon (or use the F9 shortcut).

2. You should get a table of similarities between loci (only values above 95% are reported). Sort this table in descending order of similarity.

3. Choose the ‘Similarity of Individuals’ tab and click on the Calculator icon (or use the F9 shortcut).

4. Sort this table in descending order of similarity (only values above 95% are reported).
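A simple way to think about these tables is as the proportion of identical scores between two rows (loci) or two columns (individuals). JoinMap may compute similarity differently, for example in how it treats missing values, so the sketch below is an assumption-laden illustration: it ignores any position where either score is ‘-’ and keeps pairs at or above a 95% cutoff. The function names are hypothetical; pass a dictionary keyed by line name instead of locus name to compare individuals.

```python
from itertools import combinations
from typing import Optional


def similarity(x: list[str], y: list[str]) -> Optional[float]:
    """Proportion of identical scores among positions scored in both lists."""
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not pairs:
        return None  # nothing to compare
    return sum(p == q for p, q in pairs) / len(pairs)


def similar_pairs(matrix: dict[str, list[str]], cutoff: float = 0.95):
    """All (name1, name2, similarity) with similarity >= cutoff, highest first."""
    hits = []
    for u, v in combinations(matrix, 2):
        s = similarity(matrix[u], matrix[v])
        if s is not None and s >= cutoff:
            hits.append((u, v, s))
    return sorted(hits, key=lambda h: h[2], reverse=True)


if __name__ == "__main__":
    demo = {  # hypothetical loci scored on four lines
        "m1": ["a", "b", "a", "-"],
        "m2": ["a", "b", "a", "b"],
        "m3": ["b", "a", "b", "a"],
    }
    print(similar_pairs(demo))
```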

QUESTION 8. How many pairs of loci have 100% similarity and what percentage is this of all possible pairwise combinations of the total number of markers? Check your calculation of the number of possible pairwise combinations, using the COMBIN(n,r) function in Excel.

351 marker pairs are 100% identical. Either ½n(n-1) or COMBIN(n,r) with n=510 and r=2 gives 129,795 possible pairwise combinations, so the 351 identical pairs represent 0.27% of the possible total.
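The arithmetic in this answer can be checked in a few lines of Python, where math.comb(n, k) plays the role of Excel’s COMBIN(n, r).

```python
from math import comb

n_markers = 510
identical_pairs = 351  # count of 100%-similar pairs from the answer above

total_pairs = comb(n_markers, 2)  # same as Excel's COMBIN(510, 2)
# The closed form n(n-1)/2 gives the same number.
assert total_pairs == n_markers * (n_markers - 1) // 2

percentage = 100 * identical_pairs / total_pairs
print(total_pairs, round(percentage, 2))
```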

Task 7. Organise the markers into linkage groups

1. Click on the ‘Groupings (tree)’ tab and press the Calculator icon.

2. You will get a tree of collapsible nodes. The node names are formatted LOD/#Groups (#Loci). For example, the topmost node is named 2.0/1/(510), which indicates a) a grouping threshold of LOD 2.0, b) 1 linkage group and c) 510 loci.

3. You are trying to find at least 7 major linkage groups to cover the 7 chromosomes of barley.

4. Right-click on each of the nodes that you think corresponds to the 7 or more linkage groups. [Hint: You should have 5 groups.]

5. From the dropdown menus at the top, select ‘Population’ -> ‘Create Group Using the Groupings Tree’.

6. This should create a Grouping node (on the left edge of the whole JoinMap window) from the Population node, where each Group node (Group 1, Group 2, Group 3, Group 4) has 7 or more groups derived from it.

7. YOU WILL NEED YOUR JOINMAP FILE FOR NEXT WEEK. Before logging out of your computer, make sure your progress is saved by selecting ‘File’ -> ‘Close Project’ from the dropdown menus at the top of the screen.
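The idea behind the groupings tree can be sketched as: compute a pairwise LOD score for linkage between every pair of loci, then place loci in the same group whenever a chain of pairs reaches the LOD threshold (single-linkage grouping, done here with a union-find structure). This is a simplified, hypothetical illustration: for DH lines coded ‘a’/‘b’ the recombination fraction is estimated as the proportion of lines whose scores differ, and the classic linkage LOD is used. JoinMap offers other grouping statistics and repeats the grouping over a series of LOD thresholds, so its groups need not match this sketch exactly.

```python
import math
from itertools import combinations


def linkage_lod(x: list[str], y: list[str]) -> float:
    """LOD for linkage of two DH loci: log10[L(r_hat) / L(0.5)]."""
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    n = len(pairs)
    rec = sum(p != q for p, q in pairs)  # recombinant lines
    if n == 0 or rec / n >= 0.5:
        return 0.0  # no evidence of linkage (r_hat capped at 0.5)
    r = rec / n
    lod = n * math.log10(2)  # subtracting log10(0.5 ** n)
    if rec:
        lod += rec * math.log10(r)
    lod += (n - rec) * math.log10(1 - r)
    return lod


def group_loci(matrix: dict[str, list[str]], threshold: float = 2.0) -> list[list[str]]:
    """Single-linkage groups: loci joined by any chain of LOD >= threshold."""
    parent = {name: name for name in matrix}

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for u, v in combinations(matrix, 2):
        if linkage_lod(matrix[u], matrix[v]) >= threshold:
            parent[find(u)] = find(v)

    groups: dict[str, list[str]] = {}
    for name in matrix:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values(), key=len, reverse=True)


if __name__ == "__main__":
    demo = {  # hypothetical scores for ten DH lines
        "m1": list("aabbababab"),
        "m2": list("aabbababab"),
        "m3": list("ababababab"),
    }
    print(group_loci(demo))
```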

QUESTION 9. Return to the Population node (click the P icon on the leftmost side of the whole JoinMap window). Click on the group that is not linked with the rest at LOD 2.0. What do you notice about the chromosomal location of these markers? (When you left-click on that node, you will get a list of the loci in that node on the right edge of the tab.) Find the previously reported map positions for them by copying them and pasting them into the ‘Mapped Markers’ worksheet in the Excel file, in columns E and F. Then use the VLOOKUP function to find their map positions (the first row of VLOOKUP formulas has already been entered). Where are the markers located?

My apologies, there were two problems that might have affected you in this task. The first is that the JoinMap installation on your computer might have been modified from the default so that you did not get any separation at LOD 2.0. You might still have seen the separation at a higher LOD, and the group should have been 5 markers, all with a 2H suffix. These markers are all located at the telomere of the short arm of chromosome 2H, with positions ranging from 0 to 10.94 cM. The second is that the formula given in the spreadsheet should have been =VLOOKUP($F2,$A$2:$C$1120,2,FALSE). Adding ‘,FALSE’ means that only exact matches are reported, whereas omitting it makes VLOOKUP perform an approximate match (returning the row with the largest value less than or equal to the lookup value, which is only reliable if the first column is sorted); that is why some of you might not have found the results below:

Marker      Chromosome   Position (cM)
11_10326    2H           6.45
11_11059    2H           7.14
11_20562    2H           10.94
11_20609    2H           0
11_21377    2H           8.57
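The difference between the two VLOOKUP modes can be illustrated with a small Python analogue built from the five 2H markers listed above. The functions are hypothetical stand-ins for Excel’s behaviour, not Excel itself: the approximate version uses a binary search for the largest first-column value less than or equal to the key, as VLOOKUP does when range_lookup is omitted, and ‘11_10999’ in the usage is a made-up marker name that is not in the table.

```python
from bisect import bisect_right
from typing import Optional, Sequence, Tuple

Row = Tuple[str, str, float]  # (marker, chromosome, position in cM)


def vlookup_exact(key: str, table: Sequence[Row]) -> Optional[Row]:
    """Like VLOOKUP(key, table, ..., FALSE): first row whose first column equals key."""
    for row in table:
        if row[0] == key:
            return row
    return None  # Excel would show #N/A


def vlookup_approx(key: str, table: Sequence[Row]) -> Optional[Row]:
    """Like VLOOKUP(key, table, ...) with range_lookup omitted (TRUE).

    Returns the row with the largest first-column value <= key, which is only
    meaningful if the table is sorted by its first column.
    """
    keys = [row[0] for row in table]
    i = bisect_right(keys, key)
    return table[i - 1] if i else None


if __name__ == "__main__":
    mapped = [  # the five markers from the table above, sorted by name
        ("11_10326", "2H", 6.45),
        ("11_11059", "2H", 7.14),
        ("11_20562", "2H", 10.94),
        ("11_20609", "2H", 0.0),
        ("11_21377", "2H", 8.57),
    ]
    print(vlookup_exact("11_20609", mapped))   # exact hit
    print(vlookup_exact("11_10999", mapped))   # absent marker -> None
    print(vlookup_approx("11_10999", mapped))  # absent marker -> a neighbour's row
```

The last call shows the failure mode: an approximate lookup silently returns another marker’s position instead of reporting that the marker is missing.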

Congratulations! You have now established your linkage groups and are ready to order the loci within them, but that is next week’s task.
