Automated Segmentation and Classification of Zebrafish Histology Images for High-Throughput Phenotyping

Brian Canada
Academic Computing Fellow and PhD Candidate in Integrative Biosciences
Jake Gittlen Cancer Research Institute | Penn State College of Medicine
October 22, 2007

The zebrafish (Danio rerio): A powerful functional genomics tool
• Vertebrate
• Develop tumors
• Hundreds of eggs per clutch
• Rapid, ex vivo development
  – Most organ systems differentiated before 7 days post-fertilization
• Transparent embryos
• Reverse genetics
  – Morpholinos for gene “knock-down”

Zebrafish histology
Adult zebrafish (sagittal plane view) with papilloma
Zebrafish larval array
hht mutant, 7 dpf (days post-fertilization)

“High-Throughput” Zebrafish Histology
1. Fixation
2. Embedding in agarose
3. Processing into paraffin
4. Sectioning, staining, mounting onto slides
5. Digitization (scanning)
6. Scoring and annotation (the “rate-limiting step”)
• What can be done to improve the speed and reliability of scoring images?
• Can we score abnormalities quantitatively?

Current efforts in automated zebrafish image analysis
• Stephen T.C. Wong and colleagues at Harvard developed methods for quantitative assessment of neuron loss and automated detection of somites
• In principle, such automated methods should be scalable to allow high-throughput phenotyping
Retinal cell detection for studying neurogenesis
Detection of Rohon-Beard sensory neurons
Liu T.L., “A quantitative zebrafish phenotyping tool for developmental biology and disease modeling,” IEEE Signal Processing Magazine, Jan 2007.

Building on interdisciplinary expertise
• Keith Cheng, MD, PhD: Zebrafish Functional Genomics
• James Z. Wang, PhD: Content-Based Image Retrieval, Automatic Image Annotation

SHIRAZ: System for Histological Image Retrieval and Annotation for Zoopathology
1. Creation of virtual slides
2. Repeat for all images in database:
  – Image pre-processing
  – Image segmentation
  – Extract feature vector for each image, e.g.:
      IPL_Compactness = 9.8137
      IPL_Eccentricity = 0.9019
      IPL_Solidity = 0.3086
      IPL_Contrast = 0.9375
      IPL_Homogeneity = 0.0093
      LENS_COMPACTNESS = 1.1262
      LENS_eccentricity = 0.3530
      …
3. Use feature database to train model for image classification (K-means clustering, classification trees, Support Vector Machine, etc.)
4. Automatically classify and annotate previously uncharacterized images

SHIRAZ: System for Histological Image Retrieval and Annotation for Zoopathology
• Prototype implemented in MATLAB for segmentation and classification of eye and gut images
  – Eye and gut tissues have a polar or directional organization that is deformed or disrupted on mutation
• To our knowledge, we are the first group to publish material on automated zebrafish histology image analysis
  – Canada, B.A., Thomas, G.K., Cheng, K.C., Wang, J.Z., “Automated Segmentation and Classification of Zebrafish Histology Images For High-Throughput Phenotyping,” Proc IEEE-NIH Life Science Systems And Applications (LISSA) Workshop 2007

Image pre-processing
• Aperio T2 Scanner for creation of virtual slides (120 slide capacity)
• Take snapshot of selected H&E-stained specimens in ImageScope
• Manually crop eye and gut images from selected larvae
• To reduce computational costs, convert to grayscale 512 x 512 matrix (pad with white pixels if needed)

Example of wild-type eye segmentation
Labels: Lens; Ganglion Cell Layer (GCL); Inner Plexiform Layer (IPL); Inner Nuclear Layer (INL); Photoreceptor Layer (PRL); Retinal Pigmented Epithelium (RPE)

Example of mutant eye segmentation

Eye feature extraction
• Filled area
• Perimeter
• Compactness
• Eccentricity
• Extent
• Solidity
• Fractal dimension
• Seven moment invariants
• Four gray-level co-occurrence features:
  – Contrast
  – Correlation
  – Energy
  – Homogeneity
Yields a vector of 92 features per eye image

Gut segmentation and feature extraction
30 features extracted per gut image, e.g.:
• Thickness and shape of the epithelial lining
• Polarity of the epithelial cells (position of nuclei relative to basement membrane)
• Number of distinct villi (folds) of the lumen
• Amount and “granularity” of cellular debris and mucus in lumen
Labels: Epithelial lining; Lumen; Cell nuclei

Classification algorithm: CART (Classification And Regression Trees)
• Advantages:
  – “White-box” model
  – Helps provide a sense of objectivity and direction to histological assessment
• Disadvantages:
  – May not be as accurate as other classification methods (e.g. SVM, GMM, ANN)
  – “Splits” can only be performed on one dimension at a time (not really a problem in this case)

Preliminary Results
                 Eye Images (n=79)             Gut Images (n=87)
# of classes     10-fold CV   Leave-one-out    10-fold CV   Leave-one-out
Binary           90%          87%              86%          86%
Three classes    85%          85%              72%          71%
Five classes     72%          70%              56%          55%

Discussion and Conclusions
• Preliminary results are encouraging
• Potential opportunities for improvement:
  – Analyze different larval ages separately
  – Improve segmentation accuracy
  – Use color images instead of grayscale
  – Experiment with different classifiers (SVM, for example)
  – Minimize manual preprocessing
  – Increase overall size of datasets
• Future:
  – Direct integration into laboratory pipeline
  – Parallel image processing for higher throughput
  – Automatic image annotation and retrieval

Current collaborators
• Georgia Thomas, Graduate Student
• Keith Cheng, co-PI (Functional Genomics)
• James Z. Wang, co-PI (Info Science & Tech)
• Prof. Yanxi Liu (PSU Computer Science dept.)
• Prof. Nancy Hopkins (MIT)
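
Supplementary sketch: image pre-processing
The pre-processing slide describes converting each cropped image to a grayscale 512 x 512 matrix, padded with white pixels if needed. The prototype was written in MATLAB and its code is not shown, so the following is only a minimal Python sketch of that step, not the authors' implementation. The luminance weights (0.299, 0.587, 0.114), the centered placement of the crop, and the function names `to_grayscale` and `pad_to_square` are all assumptions made for illustration.

```python
WHITE = 255  # 8-bit white, used as the padding value


def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 gray levels.

    Assumption: standard luminance weighting; the slides do not say
    which conversion was used.
    """
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def pad_to_square(gray, size=512, fill=WHITE):
    """Place a rectangular gray image on a size x size white canvas.

    Assumption: the crop is centered; the slides only say "pad with
    white pixels if needed". All rows must have the same length.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    if height > size or width > size:
        raise ValueError("image is larger than the target size")
    top = (size - height) // 2
    left = (size - width) // 2
    canvas = [[fill] * size for _ in range(size)]
    for y, row in enumerate(gray):
        # Overwrite the matching stretch of the canvas row with image data.
        canvas[top + y][left:left + width] = row
    return canvas


if __name__ == "__main__":
    crop = [[(120, 60, 90)] * 300 for _ in range(200)]  # made-up 200 x 300 crop
    matrix = pad_to_square(to_grayscale(crop))
    print(len(matrix), len(matrix[0]))
```

Images larger than the target are rejected rather than rescaled, because the slides do not say how oversized crops were handled.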
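
Supplementary sketch: shape features
The eye feature extraction slide lists region descriptors such as perimeter, compactness, eccentricity and extent, without giving formulas. The sketch below computes a few of them from a binary mask of one segmented region. The definitions are assumptions: compactness is taken as P^2 / (4 * pi * A), which equals 1 for an ideal circle and is at least consistent with the example values above 1 on the SHIRAZ workflow slide; extent is region area over bounding-box area; eccentricity is that of the ellipse with the same second central moments. Perimeter is a simple count of exposed pixel edges, which will differ from MATLAB's estimate. Solidity and fractal dimension are omitted for brevity, and the function name `shape_features` is hypothetical.

```python
import math


def shape_features(mask):
    """Return shape descriptors for the foreground (nonzero) pixels of a mask.

    mask: list of rows of 0/1 values describing a single region.
    """
    pixels = [(y, x) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not pixels:
        raise ValueError("mask contains no foreground pixels")
    area = len(pixels)
    filled = set(pixels)

    # Perimeter: number of pixel edges facing background or the image border.
    perimeter = sum((y + dy, x + dx) not in filled
                    for (y, x) in pixels
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)))

    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    bbox_area = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)

    # Second central moments of the pixel coordinates.
    cy, cx = sum(ys) / area, sum(xs) / area
    mu20 = sum((x - cx) ** 2 for x in xs) / area
    mu02 = sum((y - cy) ** 2 for y in ys) / area
    mu11 = sum((x - cx) * (y - cy) for y, x in pixels) / area

    # Eigenvalues of the 2 x 2 moment matrix give the squared axis ratio.
    spread = math.sqrt((mu20 - mu02) ** 2 + 4 * mu11 ** 2)
    lam_max = (mu20 + mu02 + spread) / 2
    lam_min = (mu20 + mu02 - spread) / 2
    if lam_max > 0:
        ratio = min(1.0, max(0.0, lam_min / lam_max))
        eccentricity = math.sqrt(1.0 - ratio)
    else:
        eccentricity = 0.0  # a single pixel has no elongation

    return {
        "area": area,
        "perimeter": perimeter,
        "compactness": perimeter ** 2 / (4 * math.pi * area),
        "extent": area / bbox_area,
        "eccentricity": eccentricity,
    }


if __name__ == "__main__":
    square = [[1 if 1 <= y <= 4 and 1 <= x <= 4 else 0 for x in range(6)]
              for y in range(6)]
    print(shape_features(square))
```

Computing one such dictionary per segmented layer and prefixing the keys with the layer name would give named entries in the style of `IPL_Compactness`, though how the 92-element vector was actually assembled is not stated in the slides.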
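
Supplementary sketch: gray-level co-occurrence features
The eye feature list includes four gray-level co-occurrence features. As a hedged illustration, the sketch below builds a normalized co-occurrence matrix for one pixel offset and derives contrast, energy and homogeneity using their commonly used textbook definitions. The offset (one pixel to the right), the non-symmetric matrix, and the function names `glcm` and `glcm_features` are assumptions; the slides do not state which offsets or quantization the prototype used. Correlation is left out because it is undefined for constant images.

```python
def glcm(image, levels, dy=0, dx=1):
    """Normalized gray-level co-occurrence matrix for one offset.

    image: list of rows of integer gray levels in range(levels).
    Entry [i][j] is the fraction of pixel pairs where a pixel of level i
    has a neighbor of level j at offset (dy, dx).
    """
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for y, row in enumerate(image):
        for x, level in enumerate(row):
            ny, nx = y + dy, x + dx
            if 0 <= ny < len(image) and 0 <= nx < len(image[ny]):
                counts[level][image[ny][nx]] += 1
                total += 1
    if total == 0:
        raise ValueError("no pixel pairs exist for this offset")
    return [[c / total for c in row] for row in counts]


def glcm_features(p):
    """Contrast, energy and homogeneity of a normalized matrix p."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        # Large when neighboring pixels differ strongly in gray level.
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        # Sum of squared entries; 1.0 when only one pair type occurs.
        "energy": sum(v * v for _, _, v in cells),
        # Weights entries by closeness to the matrix diagonal.
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }


if __name__ == "__main__":
    stripes = [[0, 1, 0, 1], [0, 1, 0, 1]]  # made-up two-level test pattern
    print(glcm_features(glcm(stripes, levels=2)))
```

An 8-bit image would normally be quantized to a small number of levels before this step to keep the matrix compact.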
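
Supplementary sketch: classification tree and cross-validation
The classification slide names CART and notes that splits are made on one dimension at a time, and the results slide reports 10-fold and leave-one-out cross-validation accuracy. The following is a small Python sketch of those two ideas: a greedy tree that picks single-feature thresholds by Gini impurity, scored by k-fold cross-validation (setting k to the number of samples gives leave-one-out). It is a simplified stand-in, not the authors' MATLAB code: there is no pruning, the depth limit of 3 and the interleaved fold assignment are arbitrary assumptions, and the toy data in the usage section is made up.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) that most reduces Gini, or None."""
    best, best_score, n = None, gini(y), len(y)
    for f in range(len(X[0])):           # one dimension at a time
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2            # midpoint between observed values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def fit_tree(X, y, max_depth=3):
    """Grow a tree; leaves are labels, inner nodes are 4-tuples.

    Labels must not be tuples, since tuples mark inner nodes.
    """
    split = best_split(X, y) if max_depth > 0 else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # majority-class leaf
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            fit_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1),
            fit_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1))


def predict(tree, row):
    """Follow threshold tests from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def cross_val_accuracy(X, y, k):
    """Fraction of held-out samples classified correctly over k folds."""
    n = len(y)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")
    correct = 0
    for fold in range(k):
        held_out = set(range(fold, n, k))  # interleaved, unshuffled folds
        train = [i for i in range(n) if i not in held_out]
        tree = fit_tree([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(tree, X[i]) == y[i] for i in held_out)
    return correct / n


if __name__ == "__main__":
    # Made-up one-feature toy data, not values from the study.
    X = [[0.1], [0.2], [0.3], [0.4], [0.5], [1.1], [1.2], [1.3], [1.4], [1.5]]
    y = ["wild-type"] * 5 + ["mutant"] * 5
    print("5-fold:", cross_val_accuracy(X, y, k=5))
    print("leave-one-out:", cross_val_accuracy(X, y, k=len(y)))
```

Because each inner node stores a single feature index and threshold, the fitted tree can be read off directly, which is the “white-box” property the slide highlights.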