Classification of Protein Localization Patterns in 3-D
Meel Velliste
Carnegie Mellon University

Introduction
• Need a systematics for protein localization
• Need microscope automation
• Feature-based classification of localization patterns
• Pioneering work done with 2-D images
• Now exploring classification of 3-D images

Ten Major Classes of Protein Localization

Features
• Derive numeric features based on:
  – Morphology
  – Texture
  – Moments

          feature1  feature2  ...  featureN
Image1    0.3489    0.1294    ...  1.9012
Image2    0.4985    0.4823    ...  1.8390
...       ...
ImageM    1.8245    0.8290    ...  0.9018

Classification
• Tried:
  – Classification trees
  – k-nearest neighbors (kNN)
  – Back-propagation neural network (BPNN)
• BPNN was the most successful, with an 84% correct classification rate
(Figure text: "This is a cyto-skeletal protein")

Results of 2-D Classification

              Output of Classifier (%)
True Class    DN   ER   Gia  GP   LA   Mit  Nuc  Act  TfR  Tub
DNA           98    1    0    0    0    0    0    0    1    0
ER             0   87    2    0    1    5    0    0    1    3
Giantin        0    0   84   12    1    1    1    0    1    0
GPP130         0    0   20   72    1    2    3    0    2    0
LAMP2          0    0    5    1   74    0    3    0   15    2
Mitoch.        0    8    1    0    0   81    0    0    5    5
Nucleolin      0    0    0    1    1    0   98    0    0    0
Actin          0    0    0    0    0    1    0   96    1    3
TfR            0    2    2    0   18    4    0    2   65    7
Tubulin        0    2    1    0    2    7    0    1    5   84

Overall accuracy = 84%

Motivation for 3-D Classification
• Cells are 3-dimensional objects
• 2-D images take a single slice through the cell
• The resulting images depend largely on the z-position of the slice
• Much of the 3-D structural information is lost

The Approach
• Acquire a set of 3-D images for the same 10 classes as used in the 2-D work (have 5 now)
• Calculate features equivalent to those used with the 2-D images
• Compare performance

3-D Classification
• Used a subset of the same morphological features as used with 2-D patterns:
  – Number of objects
  – Euler number
  – Average object size
  – Standard deviation of object sizes
  – Ratio of the largest to the smallest object size
  – Average distance of objects from the center of fluorescence (COF)
  – Standard deviation of object distances from the COF
  – Ratio of the largest to the smallest object distance

3-D Classification Results
True Class / Output of Classifier labels on the slide: DNA, ER, Giantin, GPP130, LAMP2, Mitoch., Nucleolin, Actin, TfR, Tubulin
Confusion matrix values as extracted (row and column alignment not recoverable):
0 0 0 0 99 0 0 1 97 54 0 2 45 0 0 0 82 0 0 16 2 0 0 4 95
Overall accuracy = 84% (95% when GPP130 and Giantin are treated as one class)

2-D Results: Same 8 Features
True Class / Output of Classifier labels on the slide: DNA, ER, Giantin, GPP130, LAMP2, Mitoch., Nucleolin, Actin, TfR, Tubulin
Confusion matrix values as extracted (row and column alignment not recoverable):
0 0 1 0 99 0 1 1 47 41 7 47 57 1 5 2 89 1 0 3 0 0 0 4 95
Overall accuracy = 84% (95% when GPP130 and Giantin are treated as one class)

Conclusion
• Further work is needed to determine whether 3-D images offer any advantage over 2-D images
• Need to design new features that take advantage of the extra information in 3-D images

Acknowledgements
• Elizabeth Wu - acquired the 3-D image set
• Michael V. Boland & Robert F. Murphy - pioneering work on 2-D images
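
Sketch: computing the 3-D morphological features
The object-based features listed under "3-D Classification" can be illustrated with a short pure-Python sketch. It is not the code used for the results above. The function names, the intensity threshold, the use of 26-connectivity to define an object, the use of the population standard deviation, and measuring each object's distance from its unweighted centroid to the COF are all assumptions made for illustration. The Euler number is omitted, and the volume is assumed to contain at least one object.

```python
from collections import deque
from math import sqrt
from statistics import mean, pstdev

# Assumption: voxels touching by face, edge, or corner belong to one object.
NEIGHBORS = [(dz, dy, dx)
             for dz in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
             if (dz, dy, dx) != (0, 0, 0)]


def find_objects(volume, threshold=0):
    """Return the objects in volume[z][y][x], each a list of (z, y, x) voxels."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    seen = set()
    objects = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if volume[z][y][x] <= threshold or (z, y, x) in seen:
                    continue
                # Breadth-first flood fill collects one connected object.
                queue = deque([(z, y, x)])
                seen.add((z, y, x))
                voxels = []
                while queue:
                    cz, cy, cx = queue.popleft()
                    voxels.append((cz, cy, cx))
                    for dz, dy, dx in NEIGHBORS:
                        n = (cz + dz, cy + dy, cx + dx)
                        if (0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
                                and n not in seen
                                and volume[n[0]][n[1]][n[2]] > threshold):
                            seen.add(n)
                            queue.append(n)
                objects.append(voxels)
    return objects


def center_of_fluorescence(volume):
    """Intensity-weighted centroid of the whole volume, as (z, y, x)."""
    total = sz = sy = sx = 0.0
    for z, plane in enumerate(volume):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                total += value
                sz += value * z
                sy += value * y
                sx += value * x
    return (sz / total, sy / total, sx / total)


def morphological_features(volume, threshold=0):
    """Seven of the eight listed features (Euler number not included)."""
    objects = find_objects(volume, threshold)
    cof = center_of_fluorescence(volume)
    sizes = [len(obj) for obj in objects]
    dists = []
    for obj in objects:
        # Assumption: object position is its unweighted voxel centroid.
        centroid = [mean(v[i] for v in obj) for i in range(3)]
        dists.append(sqrt(sum((c - f) ** 2 for c, f in zip(centroid, cof))))
    return {
        "num_objects": len(objects),
        "avg_size": mean(sizes),
        "std_size": pstdev(sizes),
        "size_ratio": max(sizes) / min(sizes),
        "avg_dist": mean(dists),
        "std_dist": pstdev(dists),
        "dist_ratio": max(dists) / min(dists) if min(dists) > 0 else float("inf"),
    }
```

For a toy volume such as [[[1, 1, 0, 0, 0, 0, 1]]] (one z-plane, one row), the sketch finds two objects of sizes 2 and 1, so the size ratio is 2.0. Each image yields one such feature vector, which forms one row of the feature table shown under "Features".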