DCellIQ Software Guide
First Edition

I. Introduction

Welcome to the DCellIQ Software Guide. DCellIQ is an open-source software system for cellular image segmentation, multiple-object tracking, feature extraction, and classification (identification). It is designed to be easy to use. Updated information is available from the website: http://www.cbi-tmhs.org/Dcelliq/downloading.html

Currently, DCellIQ focuses on the quantitative analysis of time-lapse cellular image sequences. The system consists of four major components: image segmentation, multiple-object tracking, feature extraction, and classification. In the segmentation component, the cells/nuclei are detected and segmented; the segmented objects are then tracked over time. Each single cell/nucleus is represented by a 211-dimensional feature vector. Finally, the cell phases can be identified by either the online or the offline Support Vector Machine (SVM) classifier.

II. Installation

2.1 Download. The DCellIQ software is written in Matlab because of its ease of use.

2.2 Organization. The DCellIQ software consists of two independent interfaces: DCellIQ and DCellIQui (user interaction). The DCellIQ interface is used to automatically process the input time-lapse image sequences. The DCellIQui interface is used to manually correct classification/identification errors, collect training data sets, and generate the new problem-specific classifiers that are needed in the DCellIQ interface.

2.3 Installation. Decompress the .rar file into a directory, e.g. c:\Program Files\Matlab7.1\Toolbox\Dcelliq\. When running the DCellIQ system for the first time, users need to:
(1) Change the ‘current directory’ of Matlab to the directory where the system is stored, e.g. c:\Program Files\Matlab7.1\Toolbox\Dcelliq\, or add that directory to the Matlab path: addpath('c:\Program Files\Matlab7.1\Toolbox\Dcelliq\').
(2) a. Input ‘DCellIQ’ in the command window of Matlab.
    b. Input ‘DCellIQui’ in the command window of Matlab.
For future use, users just need to input ‘DCellIQ’ or ‘DCellIQui’ directly in the command window of Matlab.

III. User’s Guide

DCellIQ software can detect and count cells and cell nuclei, segment images, track cell movement and changes, quantify cell features/changes, and classify cells and their specific features and changes. The system performs all of these tasks, processes the results, and automatically stores them in one directory where they can be easily accessed or viewed. All of the detection, segmentation, tracking, feature extraction, and classification steps are integrated together as a pipeline. The user simply specifies the location of the raw data directory, which enables batch processing (Note: however many data sets the directory contains, they are processed sequentially and their results are stored in the same way). In addition, users must generate a problem-specific classifier for their specific study in order to use the classification function.

3.1 Usage of DCellIQ

3.1.a Interface of DCellIQ. Input ‘DCellIQ’ in the command window and press Enter; the DCellIQ interface will appear (Fig. 1).

Figure 1: Interface of DCellIQ (version Beta V1.0).

3.2 Processing data using DCellIQ

3.2.a Obtain the directory of the data. To process data, the user must specify the location of the raw data directory using the menu ‘File->GetDirectory’. (Users can download the ‘test data’ from: http://www.cbi-tmhs.org/Dcelliq/downloading.html . The test data contain two data sets: “…\testdata\test1” and “…\testdata\test2”.) Since the software supports batch processing, users can put all data sets under one directory, e.g. …\testdata\, then use the menu ‘File->GetDirectory’ to select that directory.

3.2.b Processing the data. To begin processing data, use the menu: ‘Image Processing -> Processing’. A sub-window (Fig. 2.1) will then pop up in which the user can choose the tasks to perform after the segmentation: tracking, feature extraction and classification. To select an operation, input ‘Y’; ‘Yes’, ‘yes’, or ‘y’ will also be recognized. To skip an operation, enter ‘N’, ‘No’, ‘no’ or ‘n’ instead. Detection and segmentation, followed by the selected tasks (tracking, feature extraction and classification), are then performed automatically.

Figure 2.1: The users can choose or skip some operations by inputting ‘Y’ or ‘N’.

3.2.c Parameter Setting. The user must set parameters appropriate to the images being processed (Fig. 2.2). For details, please refer to References [2,3].

Figure 2.2: Parameter Setting interface.

3.3 Checking the processing results using DCellIQ. The user can inspect the processing results using the DCellIQ interface. Note that all the processing results are stored automatically in the same directory as the original data sets, in a directory named ‘…_Res’, e.g. ‘…\TestData\data1_Res’.

3.3.a Load the processed data. To check the processed results, the user must first load them using the menu: ‘Check Processing Results -> Load Data’. Find the processed results in the same directory as the original data sets, and choose the results directory of the data set of interest, e.g. ‘…\TestData\data1_Res’. The data set will be displayed as below (Fig. 3).

Figure 3: Snapshot of the DCellIQ interface after loading the processed data. Objects are labeled with green numbers, and their boundaries are outlined in red.

3.3.b Check and correct the segmentation results.

1) Check the segmentation results. After loading the processed results, the segmentation results can be viewed by entering a specific frame number in the Frame ID box and pressing Enter (Fig. 4). The blue arrow buttons can also be used to view the previous and next frame (Fig. 4).

Figure 4: Validation of segmentation results.
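The naming rule from Section 3.3 (results are stored next to each data set in a directory whose name ends in ‘_Res’) can be useful when scripting around the graphical interface. The following is a minimal sketch in Python, not part of DCellIQ: the function names `result_dir_for` and `find_result_dirs` are hypothetical, and the sketch assumes that each data set is a sub-directory of the directory chosen with ‘File->GetDirectory’.

```python
from pathlib import Path

RESULT_SUFFIX = "_Res"  # suffix named in Section 3.3 of this guide


def result_dir_for(dataset_dir):
    """Return the expected results directory for one data set directory.

    Example: .../testdata/test1 -> .../testdata/test1_Res
    """
    dataset_dir = Path(dataset_dir)
    return dataset_dir.with_name(dataset_dir.name + RESULT_SUFFIX)


def find_result_dirs(root_dir):
    """Map each data set name under root_dir to its results directory.

    Only data sets whose results directory already exists are returned.
    Assumption of this sketch: every data set is a sub-directory of root_dir.
    """
    found = {}
    for entry in sorted(Path(root_dir).iterdir()):
        if not entry.is_dir() or entry.name.endswith(RESULT_SUFFIX):
            continue  # skip plain files and the results directories themselves
        candidate = result_dir_for(entry)
        if candidate.is_dir():
            found[entry.name] = candidate
    return found
```

Calling `find_result_dirs` on the batch directory gives a quick way to see which data sets have already been processed before loading one of them with ‘Check Processing Results -> Load Data’.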
2) Correct the segmentation results: merge over-segmented cells (Fig. 5).
Step1: Click the ‘Merge’ radio button.
Step2: Input the cell IDs to be merged, e.g., 15, 16; 33, 35;, in the pop-up window. Note: the different merging pairs are separated by a semicolon ‘;’.
Step3: Click ‘OK’.
Fig. 5 shows an example of the pop-up window.

Figure 5: Example of the interface for merging cells.

3) Split under-segmented cells (Fig. 6).
Step1: Activate the ‘Selecting Seeds’ radio button.
Step2: Move the mouse to the under-segmented cell. To separate one under-segmented cell into N parts, select N seeds, one for each part. To do this, click the rough center of a part and then click the ‘add seeds’ button, one seed at a time. A blue star ‘*’ will appear at each selected seed point. Note: users can add as many centers as needed.
Step3: Click the ‘Split’ button to finish the splitting process.

Figure 6: Example of the interface for splitting cells.

4) Check the general attributes of a single object. By entering the cell ID in the Cell ID box, the user can obtain some basic information about the single object, as seen in the blue boxes of Fig. 7.

Figure 7: View the basic information of the selected single object.
Avg Int: average intensity;
Max Int: maximum intensity;
Dev Int: standard deviation of the intensity;
Area: size of the object (number of pixels);
PerCov: perimeter of the convex image;
Perimeter: perimeter of the object;
Compact: Compact = (4*pi*Area)/(Perimeter^2);
AxRatio: AxRatio = LongAx/ShortAx;
LongAx: the longer axis;
ShortAx: the shorter axis.

5) Check the tracking results (Fig. 8).
(1) All trace numbers are listed in the TraceID box. In each entry, the number outside the parentheses is the cell ID in the last frame, and the number inside the parentheses is the cell ID in the first frame. The user can follow one trace by entering its number into the Imaging Interval box at the bottom and pressing Enter. The object is labeled by a red star in each frame. The display speed of the frames can be controlled by entering a number in the Show Interval box (e.g. 0.02 (seconds)). Also, a particular frame, or a particular object within the frame, can be viewed by entering its number in the FrameID or Cell ID box, respectively. In addition, to check the migration of one object in a specific frame, use the red arrow to track the object manually after entering the trace number and pressing Enter.

Figure 8: Validation of tracking.

(2) View the tree structure of the cell cycle by entering the cell number in the first frame (the trace ID inside the parentheses) in the Draw trace tree box. Some representative cell cycle tree structures are provided in Fig. 9.

Figure 9: Tree structures of cell cycles. The red numbers in the nodes are the cell IDs in the corresponding frames. The black numbers are the cell division time points. The red numbers are the corresponding trace IDs.

IV. DCellIQui

4.1 Usage of DCellIQui. The DCellIQui interface is used to validate and correct the classification results. In addition, it is used to generate the users’ own SVM classifiers or update the existing SVM classifiers.

4.2 Interface of DCellIQui. After entering ‘DCellIQui’ in the command window of Matlab, the user will see the DCellIQui interface as in Fig. 10.

Figure 10: Interface of DCellIQui.

4.3 Validate or correct the classification (phase identification) results.

A. Load processed data. To validate the classification results, the user must load the processed data first. Use the menu ‘File -> LoadData’ to load the processed data in the same way as in the DCellIQ interface, e.g. ‘…\TestData\data_1_Res\’.

B. Validate and correct the classification results. Click a frame number in the list box on the right to view the classification results for that frame. Different classes (phases) are denoted by different boundary colors, as seen in Fig. 11.
To correct the class of a cell, click the cell with the mouse; a ‘star’ will appear on top of the cell. Then use the buttons at the right of the interface to change the phase of the selected cell. To save the corrections, click the blue ‘Save Phase Correction’ button.

Figure 11: Interface of DCellIQui after loading the processed data. The phases (or classes) of cells are represented by different colors. In the example image, cells with red boundaries are in Interphase; green denotes Prophase; cyan denotes Metaphase; blue denotes Anaphase; and purple denotes cells classified as bad cells, which will be discarded. The other five classes have not been defined.

4.4 Generate your own SVM Classifier (Fig. 12)

A. Build the training data set. To build your own SVM classifier, a training data set is needed. DCellIQui provides a convenient way to build a training data set in four steps:
(1) Choose one frame and correct the cells’ classes as introduced in Section 4.3.
(2) Click the ‘Select Cells’ radio button at the bottom. The cells will be labeled automatically with green numbers.
(3) Input the numbers of the cells you want to choose in the bottom edit box.
(4) Push the Save Selected Cells button to save the selected cells.
Note: Cells can be selected in any frame of any data set by using the above four steps.

Figure 12: Related controls for building a training data set.

B. Build your own SVM Classifier. After building a training data set, a specific SVM classifier can be built as follows:
(1) Click the Select Features radio button at the bottom.
(2) Input the indexes of the selected features in the edit box at the bottom. (Remember there are 211 features in total.)

    Features   Gabor(W)   CDF (w)   Geometry   Moments   Texture   Shape
    Number     70         15        11         48        13        54
    Index      1-70       71-85     86-96      97-144    145-157   158-211

For the details of these features, please refer to Reference [1].
(3) Find the training data set and build the SVM classifier via the menu: ‘Offline SVM -> Offline SVM Training’ (Fig. 13).

Figure 13: Related controls for building the SVM classifier.

C. Reclassification via the newly generated SVM classifier. Users can reclassify the cells via the menu: ‘Offline SVM -> Phase Identification (offline)’.

4.5 Update the SVM classifier online. Users can update the existing SVM classifiers, but they cannot change the features used in the existing SVM classifiers. To update the SVM classifier, users need to:
(1) Save the information of the cells whose phases have been corrected via the menu: ‘Online SVM -> Save Corrected Cells’.
(2) Find the saved corrected cells’ information and update the existing online SVM classifier via the menu: ‘Online SVM -> Online SVM training’.
Users can then reclassify the cells with the updated online SVM classifier via the menu: ‘Online SVM -> Phase Identification (online)’.

4.6 Replace the SVM Classifier Used in the DCellIQ Interface. After building or updating their own SVM classifiers, users can manually replace the SVM classifier used in the DCellIQ interface. Please note:
(1) The SVM classifier used in DCellIQ is stored as: ‘CodesDir\SVM_Info\DCSVM.model, restore.txt and SelFeaInd.mat’, where ‘CodesDir’ is the directory in which the user stores the program code of DCellIQ.
(2) The user’s newly generated SVM classifier information is stored as: ‘CodesDir\SVM_Info\OfflineSVM\OfflineSVM.model, restore.txt and SelFeaInd.mat’.
(3) The user’s updated SVM classifier information is stored as: ‘CodesDir\SVM_Info\OnlineSVM\OnlineSVM.model, restore.txt and SelFeaInd.mat’.
To replace the current SVM classifier of DCellIQ, i.e. DCSVM.model, restore.txt and SelFeaInd.mat, the user must copy the newly generated or updated SVM classifier information to the directory ‘…\CodesDir\SVM_Info\’ to replace the existing files (NOTE: rename OfflineSVM.model or OnlineSVM.model to DCSVM.model).
To update the SVM classifier online, there must be an existing SVM classifier in the directory: ‘CodesDir\SVM_Info\OnlineSVM\’. The user can create the first OnlineSVM.model by copying OfflineSVM.model or DCSVM.model to that directory. (Note: rename the SVM classifier to OnlineSVM.model.)

V. Introduction of the Processed Data

Suppose that ‘RawImageDataDir’ denotes the directory where the raw image data are stored. The processed data are automatically stored in the same directory, and they can be distinguished by their names, which contain ‘_Res’. Each ‘_Res’ directory corresponds to one data set (image sequence) and contains nine or ten sub-directories, as seen in Fig. 14.

Figure 14: Organization of the processed data.

5.1 ResIntNor. In this directory, the intensity-normalized images are stored. They are stored in ‘.mat’ files. The users can read them using the Matlab ‘load’ function.

5.2 ResSeg. In this directory, the segmentation results are stored. They are stored in ‘.mat’ files. Each segmentation result image is a matrix in which each object is represented by a number, e.g. 10. The number ‘0’ denotes the background. The maximum number therefore equals the number of objects in that frame. The users can read them using the Matlab ‘load’ function.

5.3 ResCelMat. In this directory, the association matrix of the objects in two consecutive frames is stored. If the value of element (i,j) is true, the i-th object in frame t is associated with the j-th object in frame (t+1); false means there is no association between them. They are stored in ‘.mat’ files. The users can read them using the Matlab ‘load’ function.

5.4 ResCen. In this directory, the coordinates of the objects’ centers of mass in each frame are stored.
They are stored in ‘.mat’ files. The users can read them using the Matlab ‘load’ function.

5.5 ResFea. In this directory, the 211 features of each object in each frame are stored. They are stored in ‘.mat’ files. The users can read them using the Matlab ‘load’ function.

5.6 ResPha. In this directory, the phase (class) information of each object in each frame is stored in a matrix entitled ‘Phase.mat’. In this matrix, each column represents a frame. The element (i,j) denotes the phase of the i-th object in the j-th frame. It is stored as a ‘.mat’ file. The user can read it using the Matlab ‘load’ function. If the users have corrected the phase identification results and saved them, there will be another file entitled ‘Phase_Manual.mat’, which records the corrected phase information.

5.7 ResTra. In this directory, the tracking results are stored in a file entitled ‘TraceResult.mat’. There are several files inside it. The ‘TraceAll.mat’ file is a matrix in which each row is a trace, and its (i,j) element denotes the object (cell) ID of the i-th trace in the j-th frame. The ‘LabelAll.mat’ file contains the phase information corresponding to the objects in ‘TraceAll.mat’. The ‘TraceIdAll’ file contains the IDs of the traces.

5.8 ResPro. In this directory, there are four files: Celltrace_1.csv, Celltrace_2.txt, Celltrace_3.csv and Celltrace_4.csv. ‘Celltrace_1.csv’ records the time spent in each phase by the object in every trace; ‘Celltrace_2.txt’ records the phase information of the object in each trace; ‘Celltrace_3.csv’ records the feature information of the object in each trace; and ‘Celltrace_4.csv’ records the traces that go through at least one entire cell cycle, e.g. from the beginning of anaphase to the end of metaphase. ‘Celltrace_4.csv’ also records the time spent in each phase by the object.

5.9 ResMan. If the users have used the DCellIQui interface and saved the corrected results using the menu: ‘Write Result -> Save as Excel’, there will be another sub-directory: ‘ResMan’. Its content is the same as that of the sub-directory ‘5.8 ResPro’, except that the information has been corrected manually.

If there is any problem with the software, please look for solutions at: http://www.cbi-tmhs.org/Dcelliq/downloading.html .

Mar, 2008

References:
1. Li F, Zhou X, Ma J, Wong STC, “Optimal Multiple Nuclei Tracking Using Integer Programming for Quantitative Cancer Cell Cycle Analysis”, IEEE Transactions on Medical Imaging, 29(1):96-105, 2010.
2. Meng Wang, Xiaobo Zhou, Fuhai Li, Jeremy Huckins, Randy W. King, Stephen T.C. Wong, “Novel cell segmentation and online SVM for cell cycle phase identification in automated microscopy”, Bioinformatics, 24(1):94-101, 2008.
3. Fuhai Li, Xiaobo Zhou, Stephen T.C. Wong, “Novel Nuclei Segmentation and Cell Phase Identification Using Markov Model”, International Symposium on Computational Models for Life Sciences (CMLS), December 17-19, 2007, Gold Coast, Queensland, Australia.
4. Fuhai Li, Xiaobo Zhou, Jinmin Zhu, Jinwen Ma, Xudong Huang, Stephen T.C. Wong, “High content image analysis for H4 human neuroglioma cells exposed to CuO nanoparticles”, BMC Biotechnology, 7:66, 2007.
5. Meng Wang, Xiaobo Zhou, Randy W. King, Stephen T.C. Wong, “Context based mixture model for cell phase identification in automated fluorescence microscopy”, BMC Bioinformatics, 8:32, 2007.
6. Xiaowei Chen, Xiaobo Zhou, Stephen T.C. Wong, “Automated segmentation, classification, and tracking of cancer cell nuclei in time-lapse microscopy”, IEEE Transactions on Biomedical Engineering, 53:762-766, 2006.
7. Zhou XB, Wong STC, “High content cellular imaging for drug development”, IEEE Signal Processing Magazine, 23(2):170-174, 2006.
8. Jun Yan, Xiaobo Zhou, Qiong Yang, Ning Liu, QianSheng Cheng, Stephen T.C. Wong, “An Effective System for Optical Microscopy Cell Image Segmentation, Tracking and Cell Phase Identification”, ICIP 2006: 1917-1920.
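Example: reading the matrices described in Section V. The conventions of Sections 5.2 (label matrix), 5.3 (association matrix), 5.6 (phase matrix) and 5.7 (trace matrix) can be made concrete with a small sketch. The Python code below is not part of DCellIQ and does not read ‘.mat’ files; it uses tiny hand-made lists in place of the real data. The function names, the toy values, the numeric phase codes, and the treatment of ID 0 in a trace as "object absent" are all assumptions made for illustration only.

```python
def count_objects(label_image):
    """Number of objects in one segmented frame (Section 5.2).

    Objects are labeled 1..N and 0 is the background, so the largest
    label equals the number of objects.
    """
    return max((value for row in label_image for value in row), default=0)


def successors(association, i):
    """1-based IDs in frame t+1 associated with object i of frame t (Section 5.3)."""
    row = association[i - 1]  # the guide counts from 1, Python lists from 0
    return [j + 1 for j, linked in enumerate(row) if linked]


def phases_along_trace(trace_row, phase):
    """Phase of the traced object in every frame (Sections 5.6 and 5.7).

    trace_row[j] is the object ID of this trace in frame j, and
    phase[i][j] is the phase of object i+1 in frame j.
    Assumption of this sketch: an ID of 0 means the object is absent.
    """
    result = []
    for frame, object_id in enumerate(trace_row):
        if object_id == 0:
            result.append(None)
        else:
            result.append(phase[object_id - 1][frame])
    return result


if __name__ == "__main__":
    labels = [[0, 1, 1],
              [0, 0, 2],
              [3, 3, 0]]
    print(count_objects(labels))          # 3

    # Object 2 in frame t is linked to objects 2 and 3 in frame t+1.
    association = [[True, False, False],
                   [False, True, True]]
    print(successors(association, 2))     # [2, 3]

    trace_row = [2, 1, 0]                 # toy trace over three frames
    phase = [[1, 2, 0],                   # toy phase codes, rows are objects
             [1, 1, 0]]
    print(phases_along_trace(trace_row, phase))  # [1, 2, None]
```

In Matlab the same lookups are direct index operations on the loaded matrices; the `- 1` offsets appear here only because Python lists start at 0.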