IRootLab Tutorials
PCA-LDA
Julio Trevisan
30/Nov/2012

This document is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Contents
Loading data
Mean-centering the dataset
Applying PCA-LDA
Visualization 1: Scores plot
Visualization 2: Cluster vectors
References

Loading data

This tutorial uses Ketan’s Brain data[1], which is shipped with IRootLab.

1. At the MATLAB command line, enter browse_demos
2. Click on “LOAD_DATA_KETAN_BRAIN_ATR”
3. Click on “objtool” to launch objtool

Mean-centering the dataset

This step is necessary to make the zero coordinate appear in the centre of the scores plot.

4. Click on Apply new blocks/more actions
5. Click on Mean-centering
6. Click on Create, train & use

Applying PCA-LDA

7. Click on ds01_meanc01
8. Click on Cascade
9. Click on PCA-LDA
10. Click on Create, train & use
13. The PCA parameter window opens. Change the number of factors if needed; otherwise, leave the default number.
14. Click on OK
15. The LDA parameter window opens. There is no need to change anything here; just click on OK.

Visualization 1: Scores plot

16. Click on ds01_meanc01_pcalda01
17. Click on vis
18. Click on 2D Scatterplot
19. Click on Create, train & use
20. In the parameters window that opens, you may specify the factors that you want to see (i.e., LD1, LD2, LD3, etc.)*. You can also enter a list of more than two factors. In this case, more plots will be generated (one for each combination of two factors). Examples: [1, 2]; [1, 2, 3], [2, 3], [2, 4]
21. You may also request that confidence ellipses be drawn. A value of 0.9 means 90%. You can specify more than one ellipse, e.g., [0.7, 0.8, 0.9]
22. Click on OK

* For this dataset, only LD1 and LD2 are available because the dataset has only 3 classes.

The following figure should appear:
[Figure: scores plot]

Visualization 2: Cluster vectors

23. Click on Block
24. Click on cascade_pcalda01 (this is the block that transformed the dataset; it internally contains the loadings matrix that will be used in the cluster vectors calculation)
25. Click on Cluster Vectors
26. Click on Create, train & use
27. The input dataset needs to be the same one that was fed into the PCA-LDA before (i.e., ds01_meanc01).
28. The index of the class to be the origin may be left as 0 for the classic cluster vectors[2], or the index of a class may be specified (e.g., the index of the Control class) for the alternative version[3]. In the latter case, that class mean is taken as the origin of the space and its corresponding cluster vector will be a flat line.
29. Click on OK
30. Dataset for hint (optional): if specified, a dashed black spectrum is drawn in the background of the cluster vectors plot. The objective is to help with the biochemical interpretation of the cluster vectors.
31. Click on OK

The following figure should appear:
[Figure: cluster vectors plot]

References

[1] K. Gajjar, L. Heppenstall, W. Pang, K. M. Ashton, J. Trevisan, I. I. Patel, V. Llabjani, H. F. Stringfellow, P. L. Martin-Hirsch, T. Dawson, and F. L. Martin, “Diagnostic segregation of human brain tumours using Fourier-transform infrared and/or Raman spectroscopy coupled with discriminant analysis,” Analytical Methods, vol. 44, no. 0, pp. 2–41, 2012.
[2] F. L. Martin, M. J. German, E. Wit, T. Fearn, N. Ragavan, and H. M. Pollock, “Identifying variables responsible for clustering in discriminant analysis of data from infrared microspectroscopy of a biological sample,” J. Comput. Biol., vol. 14, no. 9, pp. 1176–84, Nov. 2007.
[3] V. Llabjani, J. Trevisan, K. C. Jones, R. F. Shore, and F. L. Martin, “Binary mixture effects by PBDE congeners (47, 153, 183, or 209) and PCB congeners (126 or 153) in MCF-7 cells: biochemical alterations assessed by IR spectroscopy and multivariate analysis,” Environ. Sci. Technol., vol. 44, no. 10, pp. 3992–3998, May 2010.
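
Appendix: a numerical sketch of mean-centering and cluster vectors

The following is a minimal plain-Python sketch, not IRootLab code, of two ideas used in this tutorial: mean-centering, and the effect of choosing a class as the origin for the cluster vectors. It assumes that a cluster vector is the class mean in the scores space (relative to the chosen origin) mapped back to the original variables through the loadings matrix; IRootLab's actual implementation may differ in detail. All function names, the toy scores, and the toy loadings are hypothetical, and `origin_class=None` stands in for the "0" option of the GUI.

```python
# Illustrative sketch only: hypothetical names and toy data, not the IRootLab API.

def column_means(rows):
    """Mean of each column of a list-of-rows matrix."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def mean_center(rows):
    """Subtract the column means so every variable has zero mean."""
    means = column_means(rows)
    return [[x - m for x, m in zip(row, means)] for row in rows]

def class_means(scores, labels):
    """Mean score vector of each class."""
    groups = {}
    for row, label in zip(scores, labels):
        groups.setdefault(label, []).append(row)
    return {label: column_means(rows) for label, rows in groups.items()}

def cluster_vectors(scores, labels, loadings, origin_class=None):
    """Map each class mean (relative to the origin) back to variable space.

    loadings has one row per original variable and one column per factor.
    origin_class=None uses the zero of the scores space (assumed "classic"
    version); a class label uses that class mean as the origin instead.
    """
    means = class_means(scores, labels)
    n_factors = len(loadings[0])
    origin = means[origin_class] if origin_class is not None else [0.0] * n_factors
    vectors = {}
    for label, mean in means.items():
        shifted = [m - o for m, o in zip(mean, origin)]
        vectors[label] = [sum(l * s for l, s in zip(row, shifted)) for row in loadings]
    return vectors

# Mean-centering: after this step the column means are zero.
raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
centered = mean_center(raw)

# Toy scores on two factors (think LD1, LD2) for three classes.
scores = [[1.0, 0.0], [3.0, 0.0], [-2.0, 1.0], [-2.0, 3.0], [0.0, -2.0], [0.0, -2.0]]
labels = ["Control", "Control", "A", "A", "B", "B"]
# Toy loadings: 4 original variables x 2 factors.
loadings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

classic = cluster_vectors(scores, labels, loadings)
alternative = cluster_vectors(scores, labels, loadings, origin_class="Control")

print(classic["Control"])      # non-flat vector for the Control class
print(alternative["Control"])  # all zeros: the origin class gives a flat line
```

With the Control class as the origin, its own cluster vector is identically zero, which corresponds to the flat line mentioned in step 28, while the other classes are expressed as differences from Control.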