Matrix Comparison Bootstrap Program

First off, I am no R programmer, so experienced R programmers will be pretty unimpressed. That said, the program does appear to work for appropriate data sets, but I present it as is, with no guarantees.

This program compares the genetic covariance matrices derived from two data sets. Our original interest was to compare the covariance matrix for a stock population with the covariance matrix for a population that had diverged due to two generations of brother-sister mating. Thus, the null hypothesis is that the two covariance matrices are identical. (Yes, this is a frequentist approach!) I have implemented three separate sets of tests:

1) Modified Mantel/Bartlett/Rank tests

Goodnight, C. J. and J. M. Schwartz. 1997. A bootstrap comparison of genetic covariance matrices. Biometrics 53:1026-1039.

This set of tests recognizes three ways in which two matrices can differ:

Rank: Basically the number of traits with a non-zero genetic variance. There is the slight complication that occasionally one trait is a linear combination of two or more other traits; such a linearly dependent trait is not counted in the rank. This is measured by D, the difference in rank. It is not calculated by this program, but it is easily calculated from the compiled data.

Size: The hypervolume enclosed by the matrix, as measured by the determinant. In the univariate case this is simply the variance; in the multivariate case it is the total variance corrected for the covariances among traits. This is measured using a signed Bartlett's test. The signed Bartlett's test compares the sizes of two matrices A and B. If they have the same size it returns zero; if A is larger than B it returns a positive number, and if B is larger than A it returns a negative number. Depending on the nature of your hypothesis you can use the signed value (for a one-tailed test) or the absolute value (for a two-tailed test).

Shape: Two matrices have the same shape if the relative magnitudes of the variances and covariances are the same for all traits and pairs of traits. This is measured by the modified Mantel's test. The test is modified to (1) make it appropriate for covariance matrices, and (2) correct the null hypothesis to be that the two matrices have identical shapes.

2) Random Skewers

Cheverud, J. M. 1996. Quantitative genetic analysis of cranial morphology in the cotton-top (Saguinus oedipus) and saddle-back (S. fuscicollis) tamarins. J. Evol. Biol. 9:5-42.

Revell, L. J. 2007. The G matrix under fluctuating correlational mutation and selection. Evolution 61:1857-1872.

This uses the method of Cheverud (1996) as modified by Revell (2007), but corrected for the null hypothesis as outlined in Calsbeek and Goodnight (2009). This test compares two G matrices by generating a large number of random unit vectors (the number is programmable). Each vector is separately multiplied by the two G matrices, and the correlation between the two resulting response vectors is calculated. If the two matrices are identical the vector correlation will be 1. Matrices that are not identical will have correlations that are less than one, and may be negative. The average correlation over the large number of random vectors is reported.
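For readers who want to see the idea in code, here is a minimal sketch of the random skewers comparison just described. It is my own illustration, not the program's internal code: the function name rskewCorr is invented, and I take the "correlation" to be the cosine of the angle between the two response vectors (the usual choice in the random skewers literature); the program may compute it somewhat differently.

# Minimal sketch of the random skewers idea (illustration only, not the
# program's internal function). G1 and G2 are genetic covariance matrices
# for the same set of traits.
rskewCorr <- function(G1, G2, nSkewers = 1000) {
  nTraits <- nrow(G1)
  cors <- numeric(nSkewers)
  for (i in 1:nSkewers) {
    beta <- rnorm(nTraits)
    beta <- beta / sqrt(sum(beta^2))   # random selection vector of unit length
    r1 <- G1 %*% beta                  # predicted response under matrix 1
    r2 <- G2 %*% beta                  # predicted response under matrix 2
    cors[i] <- sum(r1 * r2) / sqrt(sum(r1^2) * sum(r2^2))  # vector correlation
  }
  mean(cors)  # identical matrices give 1; dissimilar matrices give less
}

The standardized variants described below amount to rescaling each matrix before making the same comparison; dividing each trait by its genetic standard deviation turns G into the corresponding genetic correlation matrix (cov2cor(G) in R).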
We perform the random skewers test in several different ways:

rawRSkewers are random skewers on the original G matrices, using all of the variables, whether or not they are present in both matrices (if a trait has zero genetic variance it will not be present in the matrix). In many cases this will be an appropriate statistic; however, it has two shortcomings to be aware of: (1) if a trait is not present in one matrix this will drastically lower the vector correlations, and (2) because there is no standardization of the variables, the results of this test will change if the scale of measurement is changed. Also, the correlations will be dominated by the numerically largest variables.

rawStdRSkewers are random skewers done on the original G matrices, but with the variables divided by the genetic standard deviation for each trait. This removes the effects of the scale of measurement.

subsetRSkewers are random skewers done on the original G matrices, but using only the variables that are present (have a non-zero variance) in both matrices. This removes the first potential difficulty, since no variables are considered unless they are present in both matrices. However, this statistic is sensitive to changes in scale, and will be dominated by the numerically largest variables.

subsetStdRSkewers are random skewers done using only the variables that are present (have a non-zero variance) in both matrices, with the variables divided by the genetic standard deviation for each trait. This is the most modified of the statistics, but it is free of both the issue of comparing matrices with different variables and the problems of scale and domination by the numerically largest variables.

My general recommendations: I recommend using the standardized skewers. Unless your traits are all similar, measured on the same scale, and of approximately the same magnitude, the untransformed skewers will give you results that are very difficult to interpret and that will be dominated by one or two variables. The choice between the "raw" and "subset" skewers depends on the biological question you are interested in. For questions about whether two populations evolve differently in a way that allows for the complete loss of variance in some traits, the "raw" skewers are probably more appropriate. If you want to know whether two matrices are the same where they are comparable, the "subset" skewers might be more appropriate.

3) Selection Skewers

Calsbeek, B. and C. J. Goodnight. 2009. Empirical comparison of G matrix test statistics: finding biologically relevant change. Evolution 63:2627-2635.

This method compares two matrices by comparing the multivariate response to specific selection pressures. It is particularly appropriate when there is an a priori selection pressure that is of interest. Selection "skewers" are specified in the program. Each selection skewer is a set of weightings for the strength of selection on each trait. These weightings are used to create an index, and the data set is ranked by this index. The proportion of the population corresponding to "SelSkewSelectIntensity" is used to calculate the selection vector S. This is multiplied by GP^-1 (that is, R = GP^-1 S) for both matrices. The resulting R vectors are standardized by the phenotypic standard deviations, and the correlation between the two R vectors is calculated. Note that this method takes into account not only differences in the G matrix, but also differences in the P matrix, when calculating the resulting correlation. These results are in a separate variable labeled "SelectionSkewers".
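To make the selection skewers calculation more concrete, here is a hedged sketch of a single selection skewer. The function name selectionSkewer, its arguments, and details such as which phenotypic matrix is used for the standardization are my own assumptions for illustration; they are not the program's internal code.

# Rough sketch of one selection skewer (illustration only). 'phen' is a data
# frame of trait values, 'w' is the vector of selection weightings for this
# skewer, G1/P1 and G2/P2 are the genetic and phenotypic covariance matrices
# for the two populations, and 'intensity' is the proportion selected
# (SelSkewSelectIntensity).
selectionSkewer <- function(phen, w, G1, P1, G2, P2, intensity = 0.5) {
  # build the selection index and keep the top fraction of the population
  index <- as.vector(as.matrix(phen) %*% w)
  keep  <- index >= quantile(index, 1 - intensity)
  # selection differential S: mean of the selected individuals minus the overall mean
  S <- colMeans(phen[keep, , drop = FALSE]) - colMeans(phen)
  # predicted responses R = G P^-1 S under each matrix pair
  R1 <- G1 %*% solve(P1) %*% S
  R2 <- G2 %*% solve(P2) %*% S
  # standardize by the phenotypic standard deviations and correlate
  R1 <- R1 / sqrt(diag(P1))
  R2 <- R2 / sqrt(diag(P2))
  cor(as.vector(R1), as.vector(R2))
}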
In the statistics output I have, for completeness, included statistics for both the complete bootstrap data set and for the subset in which the two bootstrap covariance matrices have at least two variables in common. In general I would recommend using the complete data rather than the subset. The subset is identified by "NoO" attached to the statistic name.

Some General Comments on the Program

This program generates bootstrap data sets for which the null hypothesis (that there is no true difference between the two data sets) is true. It does this by using a single specified data set to generate bootstrap data sets that have the same structure as the original data sets, but with the actual numbers coming from the specified data set. In other words, the program produces a set of bootstrap data set statistics for which the null hypothesis is true. To use this for statistical testing you need to calculate the statistics on the actual data (I leave that to you, but most of the formulae are already in this program) and compare the result from the actual data to the results from the bootstrap data sets. If the actual data set is more extreme than 95% (or whatever cutoff you choose) of the bootstrap samples, then the result should be considered significant. The important point is that this program generates the bootstrap data sets for which the null hypothesis is true. It DOES NOT analyze the original data set. I hope to have that program up and running one day, but for the moment I leave that to you.

Some notes to familiarize you with these statistics: if the "JointRank" is 1, then the subset skewers will necessarily have a correlation of 1, and the modified Mantel will have a correlation of 0. You should take that into account in your comparisons!

Running the program

The program can be run by cutting and pasting the entire file into R (as I said, I am no R programmer!). You will need to change some of the lines to reflect the data sets you want to analyze and where they are stored on your computer. The commands that need to be changed are near the bottom of the file and are separated out by comments:

########################################################
#                                                      #
# Load the correct data sets and function names.       #
# This will change from run to run depending on needs  #
#                                                      #
########################################################
#
#### load the right data sets
stockbal <- read.table("/Research/R Trib Bootstrap/balanced stock females.txt")
stockbal <- data.frame(stockbal, row.names = 1:length(stockbal[,1]))
stock <- read.table("/Research/R Trib Bootstrap/stock data female.txt")
stock <- data.frame(stock, row.names = 1:length(stock[,1]))
Pop3 <- read.table("/Research/R Trib Bootstrap/population 3 females.txt")
Pop3 <- data.frame(Pop3, row.names = 1:length(Pop3[,1]))

#### various lists that will be used by the functions.
FactorNames <- c("Sire", "Dam")
StockFactorNames <- c("Sire", "Dam")
Pop3FactorNames <- c("Sire")
Traits <- c("Pupal_Mass", "Dev_Time", "Dry_Mass", "Rel_Fit")

### in the following the list must be of length (number of traits X Number of skewers)
### any other result will give an error.
Number_of_Skewers <- 5 # This is the number of selection skewers to be examined
Skewers <- c( 1, 0, 0, 0,
              0, 1, 0, 0,
              0, 0, 1, 0,
              0, 0, 0, 1,
              0, 0, 1, 1) # These are the selection skewers
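# Optional sanity check (my addition, not part of the original program):
# view the skewer weightings as a matrix, assuming that each row is one
# skewer and each column is the weighting for one trait, as described below.
matrix(Skewers, nrow = Number_of_Skewers, ncol = length(Traits), byrow = TRUE,
       dimnames = list(NULL, Traits))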
#Parameters to adjust
# This is the strength of selection on the selection skewers. It should be the proportion selected:
# a number between 0 and 1, with the smaller the number the stronger the selection.
# Adjust the following for different mating systems. For sires in a half-sib design VA = 4*var(half sibs);
# for full sibs adjust to VA = 2*var(full sibs). Adjust as appropriate for your system.
VA_multiplier <- 4
# Adjust this as appropriate. Typically it will be 4, since var(full sibs) - var(half sibs) = 1/4 VD.
VD_multiplier <- 4
SelSkewSelectIntensity <- 0.5

# These parameters change the number of bootstraps and the number of skewers in the random skewers.
NumberOfBootStraps <- 10 # the number of bootstrap samples generated
SkewerNumber <- 1000     # the number of random skewers generated

You will need to change the file names to the correct names, and also change the factor names, trait names, and selection skewers as appropriate. Spelling and capitalization count here! Factor names and Traits must be the exact names that head the data columns. For the selection skewers, this example has five skewers, each with an entry for each of the four traits. Thus each column is a weighting for one trait, and each row is a separate skewer.

Pay particular attention to the VA and VD multipliers. These will change depending on the breeding design. For a standard half-sib design the VA multiplier is 4 (VA = 4 * variance among half sibs), and the VD multiplier is 4 (VD = 4 * (var full sibs - var half sibs)), but this can change for different breeding designs. You should also change NumberOfBootStraps to a small number (I use 3) until you are satisfied that the program is working. An actual bootstrap analysis should be done with at least 100, and more likely 1000, bootstraps.

After everything is adjusted and ready to go, the program can be run with the command:

RunMatrixAnalysis(stockbal, stock, Pop3, FactorNames, StockFactorNames, Pop3FactorNames, Traits)

You will probably have changed the parameters enclosed in parentheses to reflect your data. The program will eventually print out the statistical analyses, which will look something like this:

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] "General Statistics"
[1] "below are the general statistics and the general statistics using the subset with joint matrices of at least size 2"
[1] "Subset stats have NoO attached to them"
[1] "Number of bootstraps equals"
[1] 10
[1] "Number of bootstraps with joint matrix size of at least 2 is"
[1] 4
[1] "the dataframe name for this is StatOut"
[1] " "
           StatName     Statistic LessThanOrEqual GreaterThanOrEqual MoreExtreme LessThanOrEqualNoO GreaterThanOrEqualNoO MoreExtremeNoO
1                 D  1.000000e+00             1.0                0.3         0.5               1.00                  1.00           1.00
2          Bartlett -1.102106e+02             0.3                0.7         0.3               0.50                  0.50           0.50
3            Mantel -6.633548e-01             0.0                1.0         0.2               0.00                  1.00           0.50
4       rawRSkewers  1.195078e-03             0.5                0.5         0.5               0.25                  0.75           0.75
5    rawStdRSkewers  8.868861e-02             0.7                0.3         0.5               0.75                  0.25           0.75
6    subsetRSkewers  8.796094e-01             0.0                1.0         1.0               0.00                  1.00           1.00
7 subsetStdRSkewers  6.387489e-01             0.4                0.6         0.6               1.00                  0.00           0.00
[1] "below are the selection skewers and the selection skewers using the subset with joint matrices of at least size 2"
[1] "Subset skewer stats have NoO attached to them"
[1] "Number of bootstraps equals"
[1] 10
[1] "Number of bootstraps with joint matrix size of at least 2 is"
[1] 4
[1] "the dataframe name for this output is SelSkewOut"
[1] " "
  LessThanOrEqual GreaterThanOrEqual MoreExtreme LessThanOrEqualNoO GreaterThanOrEqualNoO MoreExtremeNoO
1             0.9                0.1         0.1               1.00                  0.00           0.00
2             0.4                0.6         0.6               0.50                  0.50           0.50
3             0.8                0.2         0.2               0.75                  0.25           0.25
4             0.1                0.9         0.6               0.25                  0.75           0.75
5             0.8                0.2         0.2               0.75                  0.25           0.25
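If you prefer to do the comparison yourself, as described in the general comments above, the tail proportions can also be computed by hand once you have the statistic for your actual data. A minimal sketch, assuming the bootstrap values of a statistic can be pulled into a numeric vector (for example, a Mantel column from the BootResults data frame described below); the column name and object structure are assumptions on my part:

# Hand-computed bootstrap tail proportions (illustration only).
observedMantel <- NA                 # replace with the Mantel statistic from your actual data
bootMantel <- BootResults$Mantel     # assumes BootResults has a column named Mantel
mean(bootMantel <= observedMantel)   # proportion of bootstraps less than or equal
mean(bootMantel >= observedMantel)   # proportion of bootstraps greater than or equal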
These results can be used directly, or you can access the original data as follows:

AnalysisResults: The analysis of the original data for D, Bartlett, Mantel, and the random skewers.

AnalysisSelectionSkewers: The analysis of the original data for the selection skewers.

BootResults: The analysis of the bootstrap data sets for D, Bartlett, Mantel, and the random skewers.

SelectionSkewers: The analysis of the bootstrap data for the selection skewers.

StatOut: The statistical analysis. For easier-to-read output try format(StatOut, justify=c("left"), scientific=FALSE).

SelSkewOut: The statistical analysis of the selection skewers.

One hint: if you want to do a lot of bootstrap samples, it might be a good idea to do a subset of the bootstraps (say 1000 at a time), save them as text files, merge them, and then do the analysis on the merged data set. Otherwise you could tie up your computer for an extended time and be very vulnerable to computer glitches.
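As a hedged sketch of that save-and-merge workflow (the file names are my own placeholders, and it assumes BootResults is a data frame produced by each run of the program):

# After each batch of bootstraps, save the results to a text file
# (file names here are placeholders; use your own).
write.table(BootResults, "bootstrap_batch_1.txt")

# ... run the program again, then save the next batch ...
write.table(BootResults, "bootstrap_batch_2.txt")

# Later, read the batches back in and stack them into one data set for the
# final comparison against your observed statistics.
batch1 <- read.table("bootstrap_batch_1.txt")
batch2 <- read.table("bootstrap_batch_2.txt")
allBootResults <- rbind(batch1, batch2)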