Applying PCA-LDA

IRootLab Tutorials
PCA-LDA
Julio Trevisan
30/Nov/2012
This document is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Contents
Loading data
Mean-centering the dataset
Applying PCA-LDA
Visualization 1: Scores plot
Visualization 2: Cluster vectors
References
Loading data
This tutorial uses Ketan’s Brain data[1], which is shipped with IRootLab.
1. At the MATLAB command line, enter browse_demos
2. Click on “LOAD_DATA_KETAN_BRAIN_ATR”
3. Click on “objtool” to launch objtool
Mean-centering the dataset
This step is needed so that the origin (zero coordinate) appears at the centre of the scores plot.
4. Click on Apply new blocks/more actions
5. Click on Mean-centering
6. Click on Create, train & use
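To illustrate what the mean-centering block does numerically, here is a minimal sketch in plain Python. It is not IRootLab code; the function name mean_center and the toy data are hypothetical. It assumes the dataset is a table with one row per spectrum and one column per variable, and subtracts each column's mean so that every variable averages to zero.

```python
def mean_center(rows):
    """Subtract the column-wise mean from every row of a 2D list."""
    n = len(rows)
    # zip(*rows) iterates over columns (variables)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    return centered, means


# Hypothetical toy "spectra": 3 samples x 2 variables
spectra = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
centered, means = mean_center(spectra)
print(means)     # [2.0, 20.0]
print(centered)  # [[-1.0, -10.0], [0.0, 0.0], [1.0, 10.0]]
```

Because every column of the centered data sums to zero, the average sample maps to the origin of any linear projection applied afterwards, which is why the zero coordinate ends up in the centre of the scores plot.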
4
5
6
Applying PCA-LDA
7. Click on ds01_meanc01
8. Click on Cascade
9. Click on PCA-LDA
10. Click on Create, train & use
13. The PCA parameter window opens. Change the number of factors if needed; otherwise, leave the default number.
14. Click on OK
15. The LDA parameter window opens. There is no need to change anything here; just click on OK
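The cascade above chains two linear transformations, which can be written compactly as follows. The symbols are introduced here for illustration and do not come from IRootLab: X is the mean-centered dataset (n spectra by p variables), P holds the k retained PCA loadings, A holds the LDA loadings computed on the PCA scores, and c is the number of classes.

```latex
\begin{aligned}
T &= X P,      & P &\in \mathbb{R}^{p \times k} && \text{(PCA scores)} \\
Z &= T A,      & A &\in \mathbb{R}^{k \times d},\ d \le \min(k,\, c - 1) && \text{(LD scores)} \\
Z &= X W,      & W &= P A \in \mathbb{R}^{p \times d} && \text{(combined PCA-LDA loadings)}
\end{aligned}
```

Under this notation, the number of factors chosen in the PCA window is k, and each column of W expresses one linear discriminant directly in terms of the original variables.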
Visualization 1: Scores plot
16. Click on ds01_meanc01_pcalda01
17. Click on vis
18. Click on 2D Scatterplot
19. Click on Create, train & use
20. In the parameters window that opens, you may specify which factors to plot (i.e., LD1, LD2, LD3, etc.)*.
You can also enter a list of more than two factors. In this case, one plot is generated for each
combination of two factors. Examples: [1, 2]; [1, 2, 3]; [2, 3]; [2, 4]
21. You may also request that confidence ellipses be drawn; for example, 0.9 means 90%. You can specify more than
one ellipse, e.g., [0.7, 0.8, 0.9]
22. Click on OK.
* For this dataset, only LD1 and LD2 are available because the dataset has only 3 classes.
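The scatterplot parameters can be illustrated with a short sketch in plain Python. It is not IRootLab code, and all three function names are hypothetical. It shows why 3 classes give at most 2 discriminants, how a list of factors expands into pairwise plots, and how a confidence level could translate into an ellipse size. The last part assumes the ellipse is derived from a bivariate Gaussian model of the scores, which the tutorial does not state.

```python
import math
from itertools import combinations


def max_discriminants(n_classes, n_factors):
    """LDA yields at most (classes - 1) discriminants, and no more than its input dimension."""
    return min(n_classes - 1, n_factors)


def factor_pairs(factors):
    """One scatterplot per unordered pair of the requested factors."""
    return list(combinations(factors, 2))


def ellipse_scale(confidence):
    """Mahalanobis radius enclosing `confidence` of a 2D Gaussian (assumed model)."""
    return math.sqrt(-2.0 * math.log(1.0 - confidence))


# 3 classes; 10 PCA factors is a hypothetical value, not necessarily the default
print(max_discriminants(3, 10))       # 2
print(factor_pairs([1, 2, 3]))        # [(1, 2), (1, 3), (2, 3)]
print(round(ellipse_scale(0.9), 3))   # 2.146
```

Under the Gaussian assumption, the 90% ellipse has semi-axes about 2.15 standard deviations long along each principal direction of the class scatter.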
The following figure should appear:
Visualization 2: Cluster vectors
23. Click on Block
24. Click on cascade_pcalda01 (this is the block that transformed the dataset and internally contains the loadings
matrix that will be used in the cluster vectors calculation)
25. Click on Cluster Vectors
26. Click on Create, train & use
27. The input dataset must be the same one that was passed to PCA-LDA before (i.e., ds01_meanc01).
28. The index of the class to be used as the origin may be left as 0 for the classic cluster vectors[2], or set to the
index of a class (e.g., the index of the Control class) for the alternative version[3]. In the latter case, that class's
mean is taken as the origin of the space, and its corresponding cluster vector will be a flat line.
29. Click on OK
30. Dataset for hint (optional): if specified, a dashed black spectrum is drawn in the background of the cluster
vectors plot. Its purpose is to help with the biochemical interpretation of the cluster vectors.
31. Click on OK
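The role of the origin class can be illustrated with a minimal sketch in plain Python. It is not IRootLab code; the function names, the toy loadings and the toy scores are hypothetical. It assumes that a cluster vector is the loadings matrix multiplied by the class-mean score vector measured from the chosen origin. This is one reading of the description above, not a confirmed account of the toolbox internals. The sketch uses origin_class=None for the classic version, whereas the GUI uses index 0.

```python
def class_means(scores, labels):
    """Mean score vector of each class, keyed by class label."""
    sums, counts = {}, {}
    for row, lab in zip(scores, labels):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def cluster_vectors(loadings, scores, labels, origin_class=None):
    """loadings: p x d (variables x discriminants); scores: n x d.

    origin_class=None keeps the origin at zero (classic version);
    otherwise the mean of that class becomes the origin (alternative version).
    """
    means = class_means(scores, labels)
    d = len(loadings[0])
    origin = means[origin_class] if origin_class is not None else [0.0] * d
    result = {}
    for lab, mean in means.items():
        shifted = [m - o for m, o in zip(mean, origin)]
        # Map the (shifted) class mean back to the original variable space
        result[lab] = [sum(w * s for w, s in zip(row, shifted)) for row in loadings]
    return result


# Hypothetical toy data: 3 variables, 2 discriminants, 2 classes
loadings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
labels = ["control", "control", "treated", "treated"]

classic = cluster_vectors(loadings, scores, labels)
alternative = cluster_vectors(loadings, scores, labels, origin_class="control")
print(classic["control"])      # [2.0, 0.0, 2.0]
print(alternative["control"])  # [0.0, 0.0, 0.0]
print(alternative["treated"])  # [-2.0, 3.0, 1.0]
```

With the control class as the origin, its cluster vector is identically zero, which is the flat line mentioned in step 28. The remaining cluster vectors then describe each class relative to the control.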
The following figure should appear:
References
[1] K. Gajjar, L. Heppenstall, W. Pang, K. M. Ashton, J. Trevisan, I. I. Patel, V. Llabjani, H. F. Stringfellow, P. L. Martin-Hirsch, T. Dawson, and F. L. Martin, “Diagnostic segregation of human brain tumours using Fourier-transform infrared and/or Raman spectroscopy coupled with discriminant analysis,” Analytical Methods, vol. 44, no. 0, pp. 2–41, 2012.
[2] F. L. Martin, M. J. German, E. Wit, T. Fearn, N. Ragavan, and H. M. Pollock, “Identifying variables responsible for clustering in discriminant analysis of data from infrared microspectroscopy of a biological sample,” J. Comput. Biol., vol. 14, no. 9, pp. 1176–84, Nov. 2007.
[3] V. Llabjani, J. Trevisan, K. C. Jones, R. F. Shore, and F. L. Martin, “Binary mixture effects by PBDE congeners (47, 153, 183, or 209) and PCB congeners (126 or 153) in MCF-7 cells: biochemical alterations assessed by IR spectroscopy and multivariate analysis,” Environ. Sci. Technol., vol. 44, no. 10, pp. 3992–3998, May 2010.