
First, note that PLASQ works only on Affymetrix 100K Set .cel files in ASCII
text format. Binary-format files may be converted (version 4 to version 3) using
Affymetrix's CEL File Conversion Tool, available at
http://www.affymetrix.com/support/developer/tools/devnettools.affx
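Before starting a long run, it may be worth confirming that each .cel file really is in the text format. The sketch below is not part of PLASQ: the helper name is_ascii_cel is made up for illustration, and it rests on the assumption that text-format (version 3) CEL files begin with the literal header "[CEL]". The directory name "HND" is the example name used in the steps that follow.

```r
# Hypothetical helper (not part of PLASQ): guess whether a .cel file is ASCII text.
# Assumption: text-format (version 3) CEL files start with the header "[CEL]".
is_ascii_cel <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  first_bytes <- readBin(con, what = "raw", n = 5)
  identical(first_bytes, charToRaw("[CEL]"))
}

# List the .cel files in a directory that do NOT look like ASCII CEL files.
cel_files <- list.files("HND", pattern = "\\.cel$", ignore.case = TRUE,
                        full.names = TRUE)
not_ascii <- cel_files[!vapply(cel_files, is_ascii_cel, logical(1))]
print(basename(not_ascii))
```

Any file reported here is a candidate for conversion with the tool mentioned above; the check only looks at the first five bytes and does not validate the rest of the file.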
The simplest way to obtain allele- and parent-specific copy number (as defined in
the paper) is to proceed as follows. I am assuming that you have roughly
8-15 normal diploid samples with which to "calibrate" the model (fewer than this MAY be
too few to obtain accurate parameter inferences; more than this MAY cause
memory problems in R), in addition to the test samples you want to analyze. I
am also assuming that you are familiar and comfortable with R.
1) Put the .cel files from the normal samples into two separate directories, say
HND and XND, one for the Hind files and the other for the Xba files. If two
different files are from the same sample (i.e. one for Hind and one for Xba), they
should have EXACTLY the same name.
2) Put the .cel files from the test sample(s) into two separate directories, say HTD
and XTD, one for the Hind files and the other for the Xba files. If two different files
are from the same sample (i.e. one for Hind and one for Xba), they should have
EXACTLY the same name.
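Because steps 1 and 2 depend on exact file-name matches between the Hind and Xba directories, a quick check in R may catch typos before the long run starts. This is a minimal sketch using base R only; the helper name unmatched_names is hypothetical and not part of PLASQ, and the directory names are the examples used above.

```r
# Hypothetical helper (not part of PLASQ): report file names that are present
# in only one of the two directories.
unmatched_names <- function(hind_dir, xba_dir) {
  hind <- list.files(hind_dir)
  xba <- list.files(xba_dir)
  list(hind_only = setdiff(hind, xba), xba_only = setdiff(xba, hind))
}

# Directory names follow the examples in steps 1 and 2.
str(unmatched_names("HND", "XND"))  # normal samples
str(unmatched_names("HTD", "XTD"))  # test samples
```

A name that shows up in only one directory may indicate a spelling or capitalization mismatch between the two files of a sample.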
3) In R, after calling library(PLASQ), enter the following command:
PSCN<-pscn("HTD", "XTD", "HND", "XND",
computeBetas=TRUE,
normFile="tmp", betasFile="betas.Rdata", rawCNfile="rawCN.Rdata")
Note that here "tmp", "betas.Rdata", and "rawCN.Rdata" may be replaced by
whatever file names you choose, as may the object name PSCN.
This will take a VERY long time to run, but the output will keep you updated with
regard to how it's progressing. When it's done, PSCN will be a matrix whose
rows are the SNP sites and whose columns are: SNPID, followed by major and
minor chromosome copy numbers for the test sample(s). The SNP IDs may be
mapped to their genomic locations using the SNPinfo matrix found by entering
data(SNPinfo)
Alternatively, the PSCNs may be plotted using the command
PSCNplot(mat)
where mat is a 3-column matrix of PSCNs (as obtained with the command pscn)
whose columns are: SNPID, and minor and major chromosome copy numbers of
the test sample you wish to plot.
Note that, if you run more test samples later, you needn't recompute the
parameters, and can thereafter use the command
pscn("HTD", "XTD", "HND", "XND",
betasInfile="betas.Rdata",
normFile="tmp", rawCNfile="rawCN.Rdata")
where "betas.Rdata" is whatever you called the betas file above.
This should take less time to run.
4) If you desire (SNP) allele-specific copy numbers, enter the commands:
load("rawCN.Rdata")
ASCN<-ascn(rawCN, PSCN)
where "rawCN.Rdata" is whatever you used for the rawCNfile argument above and
PSCN is the object obtained from the pscn command.
Now ASCN will be a matrix whose rows are the SNP sites and whose columns
are: SNPID followed by allele A and allele B copy numbers for the test sample(s).
The SNP IDs may be mapped to their genomic locations using the SNPinfo
matrix found by entering
data(SNPinfo)
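Step 4 relies on load() creating an object called rawCN in the workspace. If the ascn call complains that rawCN is missing, it may help to see what load() actually restored: in base R, load() invisibly returns the names of the objects it loaded. The sketch below demonstrates this with a temporary file and placeholder data standing in for "rawCN.Rdata"; it does not use real PLASQ output.

```r
# Temporary file standing in for "rawCN.Rdata".
demo_file <- tempfile(fileext = ".Rdata")
rawCN <- matrix(1:6, nrow = 3)   # placeholder data, NOT real PLASQ output
save(rawCN, file = demo_file)
rm(rawCN)

# load() restores the saved objects and (invisibly) returns their names.
loaded_names <- load(demo_file)
print(loaded_names)
stopifnot("rawCN" %in% loaded_names)
```

With the real file, the same pattern would be loaded_names <- load("rawCN.Rdata"), using whatever file name was given as the rawCNfile argument.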