README.

This is a help file on how to use the matlab functions for decomposing the variation in a set of curves.
Outline:
- Short overview.
- Example.
- List of all the matlab functions in this folder and what they do.
#############################
Short Overview
#############################
The functions in this folder were created to decompose variation in a set of reaction norm curves into
three directions: vertical shift, horizontal shift and generalist-specialist.
Here are the two steps that you will need to perform for the analysis.
Step 1: Formatting the data and choosing sensible starting values.
The function "FormatData.m" helps with formatting the data and choosing initial values.
This function also returns exploratory graphics in the file alldata.ps
Step 2: Decomposition of the variation along the three modes of interest: vertical shift, horizontal shift
and generalist-specialist.
The function "fitpolywmh.m" will do the fitting and output the graphical and numerical results (graphics
are in the file fig3dir.ps).
You will mainly need to use these two matlab functions (FormatData.m and fitpolywmh.m).
All other matlab functions in this folder will be called by one or the other function, so you will need to
download them all for the two main functions to work properly, but you will not need to use them.
###############################
Example
###############################
1. Toy data
-----------
The Excel file "toydata.xls" is a toy example data set. It is a subset of data collected on caterpillars,
courtesy of Professor Joel Kingsolver at the University of North Carolina at Chapel Hill.
This data set has the growth rate (column 3 of the Excel file) for 11 families (family labels are in the first
column: 13-23) at six different environments (here the environments are temperatures in Celsius: 11,
17, 23, 29, 35, 40; they are in column 2). The data are organized so that values within each family are
listed individual by individual. Here are the first 12 rows of the file:
mom    temp    grorate
13     11       0.161068702
13     17       0.379470199
13     23       0.63253012
13     29       0.973228346
13     35       1.086792453
13     40       1.120930233
13     11      -0.049618321
13     17       0.925827815
13     23       0.590963855
13     29       1.951181102
13     35       0.277358491
13     40       0.502325581
Rows 1 to 6 have the growth rate of individual 1 in family 13 at temperatures 11, 17, 23, 29, 35, and 40
respectively. Rows 7-12 have the growth rate of individual 2 in family 13
at temperatures 11, 17, 23, 29, 35, and 40 respectively.
Note that the temperatures are in the same order for all individuals (they do not need to be in
ascending order). Note also that some values are missing (denoted by NaN).
Since we have 6 different environments, we decided to conduct the analysis with a common shape
polynomial of degree 4.
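The "individual by individual" layout described above can be illustrated with a short sketch. It only uses
the 12 rows listed in this section and built-in matlab functions. The variable names byIndividual and
sameOrder are made up for this illustration, and this is not the code of FormatData.m.

```matlab
% Growth rates from the first 12 rows of toydata.xls (two individuals of family 13).
grorate = [0.161068702; 0.379470199; 0.63253012; 0.973228346; 1.086792453; 1.120930233; ...
           -0.049618321; 0.925827815; 0.590963855; 1.951181102; 0.277358491; 0.502325581];
temp = repmat([11; 17; 23; 29; 35; 40], 2, 1);   % environments, same order for each individual
d = 6;                                           % number of environments

% One column per individual, one row per environment (reshape fills column by column).
byIndividual = reshape(grorate, d, []);
env = temp(1:d);                                 % environment values of the first individual

% Check that every individual lists the environments in the same order.
tempByIndividual = reshape(temp, d, []);
sameOrder = isequal(tempByIndividual, repmat(env, 1, size(tempByIndividual, 2)));
```

Here byIndividual is a 6*2 matrix whose second column holds the values of individual 2. A missing
measurement would simply appear as NaN in the corresponding cell.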
2. Formatting the data
----------------------
- Start a matlab session, and make sure to set your working matlab folder to the folder where you
stored the matlab functions for this analysis and the toy example Excel file.
- Write the following command in the matlab command line
[data,env,pinit,yinit,FamilyIndex]=FormatData('toydata.xls',6,4)
- The previous command formats the Excel file.
Note that there are three inputs to the function FormatData, the first is the name of the file where our
data is, the second is the number of different environments (here it is 6), and the last is the degree of
the polynomial to fit (here we chose to fit a polynomial of degree 4).
There are several outputs to this function. The outputs data,env, pinit, yinit, and FamilyIndex are the
inputs to the following command (see further for details). You will find in your folder a postscript file
"alldata.ps" with exploratory graphics of this data.
3. Analyzing the data
---------------------
- Write the following command in the matlab command line
[wmhval,poptim,SSE,Orig,Fmean,RSSm,RSSw,RSSh,RSSE,k]=fitpolywmh(data,env,pinit,yinit,200,FamilyIndex)
- Note that almost all the inputs of this function were outputs of the previous command. The input 200
is the maximum number of iterations for the optimization function (the optimization is usually done
before this maximum is reached). This command might take some time to finish.
- Check the postscript file fig3dir.ps for figures of the results.
- To check the value of each output, type the name of the variable on the command line or check the
workspace window in matlab.
- The decomposition of variation is given by the four outputs: RSSm (for the horizontal shift), RSSw (for
the generalist-specialist), RSSh (for the vertical shift), and RSSE (error).
For our toy data, those percentages are:
RSSm = 35.07 %
RSSh = 26.92 %
RSSw = 13.66 %
RSSE = 24.35 %
- The output wmhval has the fitted values of the three parameters w, m, and h in the model
(parameterizing the generalist-specialist, the horizontal shift and the vertical shift respectively).
For our toy data, here are the values:
wmhval =
    1.2006   33.1665   -0.1962
    1.4306   33.3350   -0.2212
    1.4137   35.4189    0.1462
    1.2392   33.0777   -0.2636
    1.0887   33.8656   -0.1923
    0.8779   35.2494   -0.1068
    0.9979   33.4257   -0.1326
    0.7948   38.8916   -0.0633
    0.7667   38.8168   -0.0616
    1.0197   48.0592    0.5238
    1.0791   45.8589    0.5677
- The output poptim has the fitted values of the optimized common shape polynomial.
For our toy data, here is the polynomial of degree four:
-0.00000846424275*x^4 - 0.00056711092039*x^3 - 0.01072828181921*x^2 + 1.51403316100273
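As a rough sketch of how these outputs can be inspected, the lines below copy some of the toy-data values
printed in this section and apply only built-in matlab functions (polyval, polyder). The names total,
explained, y0 and slope0 are made up for this illustration. How w, m and h enter the model is not spelled out
in this help file, so the sketch only evaluates the common shape polynomial itself.

```matlab
% Percentages reported for the toy data.
RSSm = 35.07; RSSh = 26.92; RSSw = 13.66; RSSE = 24.35;
total = RSSm + RSSh + RSSw + RSSE;     % the four shares should add up to about 100
explained = RSSm + RSSh + RSSw;        % share attributed to the three directions

% First three rows of wmhval: one row per family, columns are [w m h].
wmhval = [1.2006 33.1665 -0.1962;
          1.4306 33.3350 -0.2212;
          1.4137 35.4189  0.1462];
w = wmhval(:, 1);   % generalist-specialist (width) parameters
m = wmhval(:, 2);   % horizontal shift (location of max) parameters
h = wmhval(:, 3);   % vertical shift (height) parameters

% Common shape polynomial, highest degree first; the coefficient of x is 0.
poptim = [-0.00000846424275 -0.00056711092039 -0.01072828181921 0 1.51403316100273];
y0 = polyval(poptim, 0);               % value of the polynomial at x = 0 (the constant term)
slope0 = polyval(polyder(poptim), 0);  % derivative at x = 0, zero because the x term is absent

fprintf('total = %.2f %%, explained by the three directions = %.2f %%\n', total, explained);
```

Because the coefficient of x is zero, the polynomial has a flat tangent at x = 0, which is what slope0 checks.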
##############################################
List of all the matlab functions in this folder
##############################################
1. function fitpolywmh
%Rima Izem's function
%Finds the common template shape polynomial coefficients of curves of common shape as well as the
%width w, location of max m, height h of the least squares fit for each curve.
%----------
%Input:
%----------
%data (d*n matrix) are values of traits (d = # of environments, n = # of families, individuals or
%genotypes.)
%env (d column vector) are values of the different environments.
%pinit (d' row vector) are the coefficients to optimize of the d'-degree-polynomial shape
%(This excludes the coefficient of the monomial 'x' b/c it is set to 0).
%It initiates the search for the common shape polynomial.
%Note that pinit is ordered such that pinit(1) is the coefficient of the highest
%degree and pinit(end) is the constant of the polynomial.
%yinit (n*3 matrix) are initial wmh coefficients.
%Each row i has the initial values for the parameters [w m h] (1*3 vector) of family i.
%iterate is an integer, it is the max number of iterations allowed by the user.
%FamilyIndex (n*2 matrix), the first column has the family number and the
%second column has family sample sizes.
%--------
%Output:
%--------
%wmhval (n*3 matrix) are fitted wmh coefficients.
%poptim ((d'+1) row vector) are coefficients of the optimized common polynomial
%SSE (scalar) is the sum of squared errors of the fit.
%Orig (2 row vector) is the optimal origin in the 2-dim manifold (w,m)
%Fmean (2 row vector) is the optimal center of var. in the 2-dim manifold (w,m)
%RSSh (scalar) ratio of the sum of squares along the vertical shift to the total sum of
%squares (i.e. RSSh is the contribution of the vertical shift)
%RSSm (scalar) ratio of the sum of squares along the horizontal shift to the total sum of
%squares (i.e. RSSm is the contribution of the horizontal shift)
%RSSw (scalar) ratio of the sum of squares along the generalist-specialist to the total sum
%of squares (i.e. RSSw is the contribution of the generalist-specialist).
%RSSE (scalar) ratio of sum of squared errors in the model
%k (integer) is the total number of iterations before convergence.
%USES functions: errorwmh, errorp, SS12, fitplot, distw, distm, dist1, dist2, sqdist12
%Copyright Rima Izem (2004)
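The relation between pinit (d' coefficients, without the x term) and a full coefficient vector of length
d'+1 such as poptim can be sketched as follows. This is an assumption drawn from the input and output
descriptions above, not a copy of the code inside fitpolywmh. The numbers in pinit and the name pfull are
made up for this illustration.

```matlab
% Hypothetical starting coefficients for a degree-4 shape: x^4, x^3, x^2 and the constant.
pinit = [-1e-5 -5e-4 -1e-2 1.5];

% Insert the fixed zero coefficient of x to obtain a vector usable with polyval.
pfull = [pinit(1:end-1) 0 pinit(end)];

% Evaluate the starting shape on a small grid around 0.
x = -2:1:2;
y = polyval(pfull, x);
```

With this ordering, pfull(1) is the coefficient of the highest degree and pfull(end) is the constant, which
is the convention polyval expects.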
2. function FormatData
%Rima Izem's function
%Formats data to be used by the function fitpolywmh
%Inputs:
%d is the number of environments (In Caterpillar example, d=6).
%'nameofdatafile.xls' is the name of the Excel data set (in Caterpillar example,
%'nameofdatafile.xls'='caterpillar.xls')
%individual data:
%first column has family labels, it has to be integers (no missing values).
%second column has environments values (it has to be numbers)
%third column has trait values (only one trait is analyzed by this method)
%Note that the data in each column are given individual by individual, i.e. in the growth rate column, we
%find first the values of the trait of individual 1 at all environments, then the values of the 2nd
%individual at all environments, etc.
%Note also that the order of the environments has to be the same and the dimension should be the same
%missing values replaced by NaN
%Polyd (integer) degree of the polynomial
%Outputs:
%graphics in alldata.ps
%(data,env,pinit,yinit,iterate,FamilyIndex)
%copyright rima izem 2004
3. function fitplot
function mindata=fitplot(env,data,Familyindex,wmh,p,graph)
%Rima izem's function
%this function plots the outcome of the optimization
%Input:
%env (d column vector), are the values of different environment variable.
%data (d*n matrix), are the traits of n families at d environments.
%Familyindex (n*2 matrix), the first column has the family number and the second column has family
%sizes
%wmh (n*3 matrix) are the optimized [w,m,h] for n families.
%p (d' row vector) are the optimized polynomial coefficients (from highest degree coeff to lowest
%degree coeff)
%graph (integer) is an integer greater than or equal to n/16
%graphics in fig3dir.ps
%copyright rima izem 2004
4. function dist1
dist = dist1(Ow,Om,w1,w2,m1,m2,env,p,pas)
%Rima izem's function
%Finds the distance btw two points given a fixed origin along the curves which cross the points.
%Input:
%Om, Ow (scalars) are location of max and width parameters of the fixed origin.
%m1, m2 (scalars) are location of max parameters of point 1 and 2 respectively.
%w1, w2 (scalars) are positive width parameters of point 1 and 2 respectively.
%env (d*1 column vector) are environments values
%p (1*d' row vector) is the vector with polynomial coeff (left to right: coeff of higher degree to coeff of
%lower degree)
%pas (integer) is the precision of the linear approximation
%Output:
%dist (scalar): distance btw two points along the curves of variations crossing these points.
%uses:
%dw=distw(w1,w2,m,env,p,pas);
%dm=distm(m1,m2,w,env,p,pas);
%copyright rima izem 2004
5. function errorp
wee = errorp(polycoeff,wmh,data,env,samplesize)
%Rima izem's function
%This function finds the weighted errors of fitting the data to a common shape polynomial and
%individual parameters wmh. It is the same function as errorwmh with reordered input arguments.
%Parameters:
%wmh (k*3 matrix) are values [w,m,h], w is the width parameter, m is the location parameter and h is
%the height parameter.
%data (d*k matrix) are the traits of k families at d environments.
%env (d column vector) are the different environment values.
%polycoeff (d' row vector) are the coefficients to optimize of the polynomial shape (This excludes the
%coefficient of the monomial 'x' b/c it is set to 0).
%samplesize (k column vector) are relative sample sizes of k families (relative sample size of family i =
%sample size of family i divided by the sum of all sample sizes).
%Output:
%wee ((d*k) row vector) are weighted errors, error = difference between data and polynomial with
%three parameters w,m,h
%copyright rima izem 2004
6. function errorwmh
wee = errorwmh(wmh,data,env,p,samplesize);
%Rima izem's function
%This function finds the weighted errors of fitting the data to a common shape polynomial and
%individual parameters wmh.
%Parameters:
%wmh (k*3 matrix) are values [w,m,h], w is the width parameter, m is the location parameter and h is
%the height parameter.
%data (d*k matrix) are the traits of k families at d environments.
%env (d column vector) are the different environment values.
%p (d' row vector) are the coefficients to optimize of the polynomial shape (This excludes the
%coefficient of the monomial 'x' b/c it is set to 0).
%samplesize (k column vector) are relative sample sizes of k families (relative sample size of family i =
%sample size of family i divided by the sum of all sample sizes).
%Output:
%wee ((d*k) row vector) are weighted errors, error = difference between data and polynomial with
%three parameters w,m,h
%copyright rima izem 2004
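The relative sample sizes described for errorp and errorwmh can be computed from the second column of a
FamilyIndex matrix. The sketch below uses made-up family sizes and is not taken from the functions
themselves. How the weights are then applied to the errors is not described in this help file, so that step
is left out.

```matlab
% Hypothetical FamilyIndex: column 1 = family label, column 2 = family sample size.
FamilyIndex = [13 4;
               14 6;
               15 10];

% Relative sample size of family i = sample size of family i / sum of all sample sizes.
samplesize = FamilyIndex(:, 2) / sum(FamilyIndex(:, 2));
```

By construction, samplesize is a column vector with one entry per family and its entries sum to 1.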
7. function SS12
function [Orig,Fmean,SSm,SSw,SSh] = SS12(wmhval,samplesize,env,p,pas)
%Rima Izem's function
%This function decomposes the total variation in the data
%Input:
%wmhval (n*3 matrix)
%samplesize (n column vector) are relative sample sizes of n families (relative sample size of family i =
%sample size of family i divided by the sum of all sample sizes).
%env (d column vector)
%p (d' row vector) polynomial coefficients (from highest coefficient to lowest coeff)
%pas (integer) precision of the linear approximation of the arc distance
%Output:
%Orig (2 row vector)
%Fmean (2 row vector)
%SSh sums of squares along the vertical shift
%SSm sums of squares along the horizontal shift
%SSw sums of squares along the generalist-specialist.
%Copyright rima izem 2004
%Uses functions sqdist12, distw, distm
8. function dist2
function dist = dist2(Ow,Om,w1,w2,m1,m2,env,p,pas)
%Rima izem's function
%Finds the distance btw two points given a fixed origin along the curves which cross the origin.
%Input:
%Om, Ow (scalars) are location of max and width parameters of the fixed origin.
%m1, m2 (scalars) are location of max parameters of point 1 and 2 respectively.
%w1, w2 (scalars) are positive width parameters of point 1 and 2 respectively.
%env (d*1 column vector) are environments values
%p (1*d' row vector) is the vector with polynomial coeff (left to right: coeff of higher degree to coeff of
%lower degree)
%pas (integer) is the precision of the linear approximation
%Output:
%dist (scalar): distance btw two points along the curves of variations crossing the origin.
%uses:
%dw=distw(w1,w2,m,env,p,pas);
%dm=distm(m1,m2,w,env,p,pas);
%copyright rima izem 2004
9. function distm
%Rima izem's function
%This function finds the distance between two points (w,m1) and (w,m2) along the "m" manifold.
%Inputs:
%m1, m2 (scalars): are location of max parameters for points 1 and 2.
%w (scalar): width parameters for both points.
%env (d*1 column vector): environment values.
%p (1*d' row vector): vector of polynomial coeff
%pas (integer): precision of the linear approximation
%Output:
%dm: distance between two points along the horizontal shift curve.
%copyright rima izem 2004
10. function distw
function dw=distw(w1,w2,m,env,p,pas);
%Rima izem's function
%Finds the distance along the change in width curve, i.e. distance between two points (w1,m) and
%(w2,m) along the "w" manifold.
%Inputs:
%w1,w2(scalars) are positive width parameters
%m (scalar) is location of max parameters
%env (d*1 column vector) values of the environment
%p (1*d' row vector) polynomial coeff (left to right: coeff of highest degree to coeff of lowest degree)
%pas (integer): precision of the linear approximation
%Output:
%dw: distance btw two points along the w manifold.
%copyright rima izem 2004
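The input pas is described as the precision of a linear approximation of an arc distance. The general idea
of such an approximation can be sketched by cutting a curve into pas straight segments and summing their
lengths. The curve below is a quarter circle chosen only because its exact length (pi/2) is known. It is a
stand-in, not the curve traced by distw or distm, whose formula is not given in this help file. The names t,
curve, seg and arcLength are made up for this illustration.

```matlab
pas = 1000;                              % number of straight segments
t = linspace(0, pi/2, pas + 1);          % parameter values at the segment end points
curve = [cos(t); sin(t)];                % points on the stand-in curve, one column per point

seg = diff(curve, 1, 2);                 % vectors joining consecutive points
arcLength = sum(sqrt(sum(seg.^2, 1)));   % sum of the segment lengths
```

Increasing pas brings arcLength closer to the exact length, at the cost of more computation.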
11. function dpolyn
function dp = dpolyn(p)
%This function finds the coefficients of the derivative polynomial
%Input:
%p (row vector) are coefficients of a polynomial
%Output:
%dp (row vector) are coefficients of the derivative of the polynomial p
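A minimal sketch of such a derivative computation is given below, assuming coefficients ordered from the
highest degree to the constant as elsewhere in this folder. It is not the code of dpolyn. The built-in
matlab function polyder computes the same thing and is used here as a check.

```matlab
p = [3 0 -2 5];                  % 3*x^3 - 2*x + 5, highest degree first
n = numel(p) - 1;                % degree of p
dp = p(1:end-1) .* (n:-1:1);     % multiply each coefficient by its power, drop the constant

sameAsBuiltin = isequal(dp, polyder(p));   % compare with the built-in derivative
```

For this example dp holds the coefficients of 9*x^2 - 2.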