UNM SOM EXPERIMENTAL BIOTECHNOLOGY LABORATORY: MICROARRAYS
Standard Operating Procedure
Title: Acuity 3: importing, normalizing, filtering, querying data
SOP#: 7.3
Author: B. Griffith
Revision level: 2
Effective Date: 11/07/03
Primary Reviewers: Val Bain

1. PURPOSE
Acuity is a microarray database and analysis software program. Microarray data are imported, normalized, filtered, queried, and clustered in Acuity.

2. SCOPE
This procedure provides a brief introduction to Acuity. Full detailed instructions can be viewed in Acuity under the “help” function or in the PDF manual. Acuity is intuitive, so explore its functions as you go.

3. MATERIALS AND RESOURCES
Axon’s Acuity version 3.1004
Access to Acuity passwords
Axon Technical Support: 1-510-675-6200 (M. DeFreitas, PhD, S. Carriedo, PhD, etc.)
Axon’s Director for Genomics: Damian Verdnik, PhD

4. OVERVIEW OF DATA PROCESSING AND ANALYSIS IN ACUITY
a. Importing data: GPR (GenePix Results) files are imported into Acuity.
b. Normalizing data: per chip, by Lowess, or per set of control genes, so that data can be compared between microarrays.
c. Filtering data: removing flagged data, values below a minimum, or spots with uneven intensities, so that only good-quality data are analyzed.
d. Querying data: creating datasets to display genes with differential expression.
e. Clustering data: PCA, SOM, Hierarchical, K-means, K-medians, Gap Statistic, Gene Shaving; presents data in a visually interpretable format.
f. Various notes on Acuity.

5. OVERVIEW OF SOFTWARE: DISPLAY ORGANIZATION
1. Acuity is written in C++, uses MS SQL Server database software, and has a Windows interface. All standard Windows functions are available in Acuity. The Acuity “Data Window” has the following substructure: Data Table Pane, Substance Properties Pane, and Views Pane.
2. In general, you will pick a task in the “Common Tasks” window. See the list below.
3. You will apply this task to data that you organize in the “Project Tree”. See the example of microarrays organized in the “Project Tree” below. The three tabs in the “Project Tree” are Microarrays, Datasets, and Quick Lists. You import microarrays and organize them under the “Microarrays” tab. You create datasets from your microarrays and organize them under the “Datasets” tab. You create quicklists of interesting genes and organize them under the “Quick Lists” tab.
[Figure: Project Tree]
4. When you open a microarray or a dataset, the data will be visible in the “Data Window”, which has a “Data Table Pane”, a “Substance Properties Pane”, and a “Views Pane”. See the example below. In the “Data Table Pane”, numeric data such as ratio of medians, F532-bkg, or any other data from the GPR file can be viewed for numerous arrays in a dataset. In the “Substance Properties Pane”, gene descriptions, links to websites, chromosomal locations, statistics, etc. can be viewed for each substance or gene. In the “Views Pane”, jpg images of features, experimental parameters, clustering visualizations, replicate values, reports, and more can be viewed for a specific microarray or dataset of arrays. Much of the work is performed in this Data Window.
[Figure: Data Window, with the Substance Properties Pane, Data Table Pane, and Views Pane labeled]

6. IMPORTING MICROARRAYS
1. In the Microarrays tab of the Project Tree, create a new folder in which to organize your files, then highlight the folder in the Project Tree.
2. In the “Common Tasks” window, click on “import microarrays” and a window pops up (example below).
3. Browse to and select the GPR file(s) that you wish to import. You can highlight multiple GPR files and import them as a group. Click “open”.
4. A “select destination” window pops up. Select your destination folder in the Project Tree and click “ok”.
5. A “Select Wavelengths” window pops up. Leave the defaults of 635 and 532 and click “ok”.
6. A progress window for importing will pop up. Importing takes approximately 1 minute per microarray.
7. When the importing is complete, the newly imported microarray will appear in the Project Tree under your folder. See the figure below.
8. The newly imported microarray will have a purple icon with a green dot at the lower left. The green dot indicates that the jpg image of the microarray was successfully imported along with the numeric data from the GPR file. The purple color of the icon indicates that the numeric data have NOT been normalized. The “Normalizations” column also shows “none”.
9. For new genomes, you can also import “Substance Properties” from tab-delimited text files, such as those saved from Excel worksheets. EBL has imported the Substance Properties for Mouse1 and Mouse2, and will do so for Rat and Mtb when we are ready to analyze Rat and Mtb data. Substance properties include all information associated with the genes, including chromosomal location, gene description, GB accession, gene symbol, Tm, plate position, and Unigene ID. Substance properties also include the weblinks that EBL has established and the statistics that Acuity calculates if replicate spots are included in the microarray.
10. For new microarrays, you can also import “Microarray Parameters” from tab-delimited text files, such as those saved from Excel worksheets. Microarray Parameters could include treatment times, gender, treatment condition, animal number, replicate number, or other experimental conditions defining the microarray’s relevance to your experiment. You can also enter Microarray Parameters manually if you don’t have the data in a tab-delimited text file.
11. To enter Microarray Parameters manually, right click on your microarray file in the Microarrays tab of the Project Tree. Then left click on “properties” and then on “parameters”. The “Microarray Parameters” window has a drop-down box of existing parameters, for which you can insert a value below. Alternatively, click on “configure” and then “add” to define a new parameter. Type in a name for your new parameter and then select “string”, “int”, or “float” as the data type: “string” is for text, “int” is for whole numbers, and “float” is for numbers with decimals. Click on “ok”, then “close”, then “ok” twice more. Be perfectly consistent in your parameter values and data entry if you want to query on microarrays by their “Microarray Parameters” later.
Example: if you use a number for day of infection (2), don’t later use the word “two”. Stick to the same format for the values!

7. NORMALIZING MICROARRAYS
1. There is an excellent description of the rationale behind the various types of normalization in Acuity under “Help”, “Acuity Manual (PDF)”, “Normalization”, so the rationales won’t be discussed here. Select a normalization method carefully, based on the anticipated variation in your microarrays and experiments.
2. In the “Common Tasks” window, click on “Normalization Wizard” and the “Microarray Normalization Wizard” window pops open. Acuity looks at your data
3. The Normalization Wizard window (see example below) allows you to select “Ratio Based”, “Wavelength Based”, or “Lowess Based” normalization options. Be sure to normalize all microarrays in your experiment with the same normalization method.
4. The “Ratio Based” or “per chip” method has been quite common, but it is not the only method available. For “Ratio Based” normalization, we usually select “Mean of the ratio of medians of all features is equal to 1”. Leave the defaults “exclude ratios less than 0.1 & greater than 10” and “Exclude Bad, absent & not found features”. These defaults exclude those values only from the normalization calculation; they do not remove the data from the microarray in the database.
5. Lowess normalization can be used if you detect variation from print tip to print tip (i.e., from block to block), and it may be useful for normalizing over different intensity ranges.
6. If you leave the box checked for “Open normalization viewer upon completion”, you will be able to view a scatter plot of your unnormalized and normalized data after the normalization is complete.
7. When the normalization “Summary Window” pops up, click “next” and then “finish”.
8. When the normalization is completed, you can view the normalized and unnormalized data scatter plots, and the icon for your microarray folder will have turned orange with a green dot in the lower left corner. See the example below.
9. It is easy to remove and change normalizations, but this must be done before you include the microarray file in a dataset in Acuity. If you wish to remove the normalization on a microarray file that is already in a dataset, you must first remove the microarray from the dataset. Alternatively, you can reimport the GPR file for the microarray into a new folder and either leave it unnormalized or normalize it with a new method.

8. FILTERING DATA
1. Data can be filtered to remove flagged or poor-quality data so that later analysis is performed only on good-quality data.
2. Click on “Analysis”, “Create DataSet from”, “Query Wizard”.
3. The window entitled “Query Wizard: Filter by Microarray” pops up (see below).
4. With “Type = microarray” and “Parameter = name”, click on the “select from folder” button and then select the desired microarray folder or folders.
5. Highlight the folder name in the “combine conditions” window and then click on the “add to query” button.
6. Click on the “next” button.
7. The next window, “Query Wizard: Filter by Data Value”, pops up. You can load a previously saved condition or create new data value criteria (see below).
8. Frequently, we load the condition saved as “Cy3&Cy5 min, flags removed, RgnR2>0.6” to remove low Cy3 and Cy5 values, to remove “bad, absent, or not found” features, and to remove features with poor pixel regression ratios over the area of the feature.
9. Highlight the title of the condition.
10. Click the “load” button, then click on the “add to query” button.
11. Click on the “next” button.
12. The next window, “Query Wizard: Filter by Substance Property”, pops up.
13. You can create criteria for specific substances if you wish to narrow the filtering further, or leave these criteria unused if you don’t wish to filter by substance property.
14. Click on the “next” button.
15. The “Query Wizard Evaluate” window pops up (see below). Review your filtering criteria and then click on the “evaluate” button.
16. Review the outcome of the query. Note the number of features and the number of substances that passed the filtering criteria. If these meet your expectations, click on the “finish” button to save the results of the filtering as a new dataset.
17. The “Create Dataset” window pops up. Give the dataset a name, a description, and a logical location among your existing datasets. You can click on the button if you wish to create a new folder in which to locate your dataset.
18. You can open the newly created dataset by double left clicking on its title in the Project Tree.
19. The dataset will open in the “Data Table Pane” of the “Data Window”.
20. You can double click on the column entitled “substance” to sort ascending or descending by substance name, or double click on the column headed with your microarray title to sort ascending or descending by numeric value. If you have multiple microarrays in your dataset, whichever one is highlighted in the “Data Table Pane” will have its features displayed and listed in the “Features” tab of the “Views Pane”.
21. In the “status bar” at the bottom of the Acuity screen, the default data type is listed as “ratio of medians”, and the number of substances, the number of substances selected, and the number of features selected are shown.
22. If you wish to edit your dataset further, click on “Edit” in the program toolbar at the top of the screen and then on “find in the window (specified values)” (see figure below).
23. Check the appropriate boxes for your purpose. This window allows you to select substances “present” in a specified percentage of arrays, to select substances with a minimum value in at least “x” arrays, and so on, or even to invert the selection with “select substances that do NOT match the criteria”.
24. Next, click on the “ok” button. A window will pop up displaying the number of substances that meet the criteria. These substances will be highlighted in the “Data Table Pane”. To create a new dataset from the selected substances, click on “Analysis”, “Create Dataset from”, “Selected substances”, and then title, describe, and locate the new dataset as before.
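The quality filter loaded in step 8 and the "present in at least x arrays" selection in step 23 can be pictured outside Acuity as a per-feature test followed by a per-substance count. The following is a minimal Python sketch, not Acuity's implementation. The `Feature` fields, the `min_signal` cutoff of 100, the sample values, and the convention that negative flag values mark bad, absent, or not-found features are all assumptions made for illustration; only the RgnR2 > 0.6 threshold comes from this SOP.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """One spot on one array (field names are hypothetical, not Acuity's)."""
    array: str
    substance: str
    cy5_minus_bkg: float
    cy3_minus_bkg: float
    rgn_r2: float
    flag: int  # assumed convention: negative = bad, absent, or not found


def passes_quality(f, min_signal=100.0, min_rgn_r2=0.6):
    """Mimic the saved condition: flags removed, Cy3 and Cy5 minimum, RgnR2 > 0.6."""
    return (
        f.flag >= 0
        and f.cy5_minus_bkg >= min_signal
        and f.cy3_minus_bkg >= min_signal
        and f.rgn_r2 > min_rgn_r2
    )


def substances_present_in(features, min_arrays):
    """Return substances with a passing feature on at least `min_arrays` arrays."""
    arrays_by_substance = {}
    for f in features:
        if passes_quality(f):
            arrays_by_substance.setdefault(f.substance, set()).add(f.array)
    return sorted(
        s for s, arrays in arrays_by_substance.items() if len(arrays) >= min_arrays
    )


# Invented example data: two arrays, three substances.
features = [
    Feature("day1", "actin", 5000.0, 4800.0, 0.95, 0),
    Feature("day2", "actin", 5200.0, 5100.0, 0.91, 0),
    Feature("day1", "geneX", 900.0, 300.0, 0.40, 0),     # fails RgnR2
    Feature("day2", "geneX", 950.0, 310.0, 0.88, 0),
    Feature("day1", "geneY", 700.0, 650.0, 0.90, -100),  # flagged
]

selected = substances_present_in(features, min_arrays=2)
print(selected)
```

Keeping the per-feature test separate from the per-substance count mirrors the two stages described above: the Query Wizard criteria act on individual features, while the later selection counts arrays per substance.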
9. FILTERING & QUERYING DATA
1. Filtering and querying data is just like filtering alone, except that you add criteria to explore the data further. Use the same essential steps as described under “Filtering Data” above, but add criteria under “data type” to extract, for example, substances with 2.5-fold higher or lower expression. You could add the criterion “Ratio of Medians” > 2.5 for genes overexpressed by 2.5 fold, or “Ratio of Medians” < 0.4 for genes underexpressed by 2.5 fold.
2. In Acuity, you can filter or query based on any of the data types imported with the GPR file, any of the microarray parameters, or any of the substance properties. It is very flexible and intuitive.

10. CLUSTERING DATA
1. Clustering is generally performed on a group of arrays within a dataset. You can cluster by array, by gene, or by gene and array.
2. Multiple clustering algorithms are available in Acuity, including PCA, SOM, Hierarchical, K-means, K-medians, Gap Statistic, and Gene Shaving.
3. The purpose of clustering is to present data in a visually interpretable format and, ideally, to reveal patterns in the microarray data. I don’t think there is a “gold standard” for clustering. Axon provides more details on the pros and cons of each algorithm, and many research papers address this issue, so the topic will not be discussed further in this protocol.
4. In the Datasets tab of the Project Tree, click to open the dataset that you wish to cluster.
5. From the Common Tasks window, click on the clustering algorithm that you wish to use on your dataset. For this example, we will use SOM and then hierarchical clustering.
The number above each of the patterns is the number of genes following that SOM pattern.
6. Above is an example window for SOM clustering. This will find 16 (4x4) clusters or patterns in the data. Definitely log transform before analysis. The settings shown above are the defaults for centering, scaling, and similarity metric. An example SOM is on the next page.
7. SOM patterns are shown under “visualizations” in the Data Window.
8. You can use an SOM cluster to proceed to a hierarchical cluster. See the screen shot on the next page for the defaults, using the SOM cluster to set the order before starting the hierarchical cluster.
9. This is an example of a hierarchical cluster by substances or genes, rather than by arrays.
10. The hierarchical cluster will be shown under “visualizations” in the Data Window. See the screen shots on the next 2 pages.
11. Green represents underexpression. Red represents overexpression. Black represents ratios of approximately 1. Gray represents no data for that gene/substance in that array. Arrays are in columns and genes/substances are in rows.
12. The example SOM/hierarchical cluster is a time series, in which the first two arrays are visibly different from the next 8 arrays (columns).
For Self Organizing Map (SOM) non-hierarchical clustering, you can get an idea of the number of nodes to request by first looking at the number of clumps in a hierarchical cluster.
Once you determine the number of nodes for the SOM, it is worthwhile to run an SOM prior to a final hierarchical cluster. Hierarchical clustering sets the order between nodes, but not the order within nodes. SOM sets the order within nodes and then the hierarchical clustering sets the order between nodes. Sean Carriedo of Axon suggested an SOM of 1xN or Nx1 rather than 3x3, 4x4, etc. Hierarchical clustering finds the two most related substances by value and then builds on that. SOM looks for patterns of genes across all the microarrays.

11. QUICKLISTS
1. Quicklists can be used to rapidly view a subset of genes, based on common values (e.g., overexpressed or underexpressed), specific gene names (e.g., all transcription factors), or any other substance (gene) parameter or annotation.
2. Quicklists can be associated with a specific color, saved, and applied to later datasets.
3. Please refer to the “help” function in Acuity for more details. In brief, quicklists can be created by highlighting a specific set of genes/substances in the “Data Window” and then going to “Analysis / Quicklist and Coloring Operations / Create Dataset Quicklist”. You give the quicklist a name and an associated color for that dataset.
4. You can also create a global quicklist, not restricted to a specific dataset.
5. Go to “Data / Clear all colors” to remove the applied quicklist colors.
6. You can also import a list of genes as a quicklist.

12. VARIOUS NOTES ON ACUITY
Short Cuts:
1. “Control E” expands the view of the window that has the red bar across the top. “View / remove check from expand” returns the window to its original proportions.
2. To deselect a quicklist and its color, click on any other field to remove the quicklist highlighting, and use “Data / Clear colors” to remove the quicklist color.
3. Control + mouse click = zoom.
Naming Conventions:
4. Microarray names: shorter names are better, so that in graph view the names take up a minimal part of the view and the graphed features take up the maximal part.
5. Microarray order: within a folder, the microarrays are ordered alphabetically by name, so microarray names that match the time order will keep the microarrays in time order in the dataset table view and in the analysis views, including clustering. Set the order as you desire. Within the table pane, you can rearrange the order of the arrays within the dataset, but when you perform analysis, the order of the arrays in the cluster view will be determined by the microarray name. Before performing analysis or clustering, you can rearrange the columns. See the next note.
6. Sean said that you can use the “dataset operations” functions only before you do any analysis on the dataset. Once you perform any analysis on the dataset, you can no longer use “dataset operations”. So you can rearrange the columns in the dataset with “Analysis / Dataset operations / Arrange columns” prior to analysis. I think this rearrangement is permanent. You can also divide all columns by one other column in dataset operations, if you want to normalize against a “zero point” in a time series.
A “scriplet” or scroll will become associated with a dataset after the dataset operation is performed. The scriplet shows up underneath the dataset in the Datasets tab.
7. You can change the names of the arrays in the Microarrays project view, and the new names carry through to the datasets without problems.
Information available in various panes
8. Pay attention to the bar of text below the view pane tabs. The "# of substances (# selected) # microarrays" entry relates to the dataset as presented in the upper table pane, the "features: # (# selected)" entry relates to the microarray being viewed in the lower view pane, and the "ratio of medians" entry relates to the type of data being viewed in both panes. Thus, for a single microarray being viewed in the lower view pane, the bottom bar shows the number of replicates that passed the dataset query criteria for the substance selected in the table pane.
9. The table pane shows substances in rows and microarrays in columns. If a substance has more than one replicate within an array, the value shown is the average within that microarray (e.g., actin has 256 values on the array, but only 1 average value is present in the table pane). The average value in the table pane depends on the number of features pulled into the dataset by the query criteria. For example, if only 6 of 256 actin features meet the query criteria, then the value shown in the table pane will be the average of those 6 values. Thus the average actin value may go up or down with the dataset query criteria.
10. The view pane shows all the individual feature values within the selected microarray. Here you can see the individual values of the replicates for a substance selected in the table pane.
11. In the Graph tab of the view pane, you are graphing the features from one microarray whose substances are highlighted in the table pane.
12. To ensure that the microarray data in the table pane are synchronized with the image in the features window of the view pane, right click on the image in the features window and then select the “zoom to selection” option. This makes sure that the features selected in the numeric presentation of the data in the view pane are synchronized with the white outlining of the feature in the image.
Viewing Multiple Arrays simultaneously
13. To view multiple microarrays simultaneously, open multiple windows containing the same dataset and then highlight a distinct microarray in each of the tiled windows.
a. Open the dataset.
b. Highlight the dataset title in the Datasets tab.
c. Right click on the dataset title in the Datasets tab.
d. Choose “open selected in new window”.
e. A new window containing the dataset pops open.
f. Repeat this process for as many microarrays as you wish to view separately at the next step.
g. Go to “Window” in the top menu bar and click “tile vertically” to see the multiple windows on the screen simultaneously.
h. Click on the “Features” tab in each of the open windows.
i. Click on a different microarray data column in the table pane of each tiled window.
j. Click on the numeric data column in the view pane of a tiled window, then right click on that column and click on “refresh data”. This synchronizes the data between the table pane and view pane for that tiled window. Repeat this procedure for each of the tiled windows.
k. Under Features, right click on the image and click on linear regression and the appropriate standard deviations to see a scatter plot of the highlighted substances.
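Note 9 above says that the table-pane value for a substance is the average of only those replicate features that the query pulled into the dataset. The sketch below illustrates that arithmetic in Python under stated assumptions: the function name `table_value`, the four replicate values, and the Rm > 2.5 criterion are invented for the example and are not taken from Acuity.

```python
from statistics import mean


def table_value(replicates, criterion=None):
    """Average the replicate feature values that satisfy `criterion`.

    With no criterion, every replicate is averaged. Returns None when no
    replicate passes, i.e. the substance would be absent from the dataset.
    """
    kept = [v for v in replicates if criterion is None or criterion(v)]
    return mean(kept) if kept else None


# Hypothetical ratio-of-medians values for four replicate spots of one gene.
replicates = [3.0, 2.8, 2.0, 1.4]

all_features = table_value(replicates)
passing_only = table_value(replicates, lambda rm: rm > 2.5)

print(f"average of all replicates: {all_features:.2f}")
print(f"average of replicates with Rm > 2.5: {passing_only:.2f}")
```

With these invented values the same gene reads 2.30 when every replicate is averaged and 2.90 when only the replicates passing the criterion are averaged, which is why a table-pane value can move up or down as the query criteria change.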
Queries: substances (or genes) and features (or spots)
14. When you run a query, the software looks at the individual feature values if you have replicates, and only the features meeting the query criteria are averaged and appear in the table pane resulting from the query. When you run “Edit / Find specified values” on the dataset, the software looks at the individual feature values and filters data based on them. Substance rows are highlighted (selected) as a result of “Edit / Find specified values”, and while they are still highlighted you can go to “Analysis / Create dataset from selected substances” to create a new dataset from the substances that were selected by criteria placed on features. Values in this new dataset can differ from those in the original dataset that resulted from the query. The reason is that the query pulls individual feature values, whereas “Create dataset from selected substances” uses the average value of the replicate features. For example, if a query criterion is Rm>2.5 and the replicate features have values of 2.6 and 2.0, only the one feature with an Rm of 2.6 is pulled into the initial dataset by the query. If you then use “Edit / Find specified values” and create a dataset from the selected substances, both feature values are pulled into the new dataset and their average drops to 2.3, which appears no longer to meet the original Rm>2.5 criterion. In the end, however, you have a final dataset of substances with an Rm>2.5 in at least one of the two replicate features on the array.
15. Clarification regarding the suggestion to use a query with new criteria to pull a subset of data out of a dataset: I understand that I can open a dataset and use Edit / Find (in this window) to apply new criteria to the dataset, and that the substances matching the Edit / Find criteria will be highlighted in the dataset table view.
Now I go to Analysis / Create dataset from "selected substances" to capture the substances that fit the criteria set in Edit / Find. I think this essentially creates a sub-dataset of the original dataset, based on the criteria used in Edit / Find. It seems to work when I try it. See #12 above also. Per Damian, there are a number of reasons why this is important functionality:
- You can't use the Query Wizard for everything; e.g., you can't query on analysis results (clusters, etc.).
- The Query Wizard is spot-specific, but by selecting rows in the main table, either manually, by selecting clusters or graph traces, or by using Edit / Find / Specified Values, you are creating a dataset from substance values, i.e., based on the combined values of replicates.
- Further, because the Query Wizard is spot-specific, you cannot use it to make queries such as "Find all the substances that have ratios > 3 or < 0.3 on at least 2 microarrays". You can do that with Edit / Find / Specified Values.
16. Per Damian Verdnik: The Query Wizard looks not at the "averaged spot value" as you say, but at the spot value, i.e., the value that you get from a row in GenePix: the value from a single spot, unaveraged. Once all your replicates are displayed in the main Acuity table they have been averaged, so an Edit / Find / Specified Values search finds values that have already been averaged over replicates.
17. Differences in the apparent number of substances detected by a query at the “evaluate” stage and at the “finalize” stage: when you run a query against a set of multiple microarrays, at the “evaluate” stage the software shows you the number of substances within each array that meet the criteria.
So for 5 microarrays, each may have, for example, 50 substances that meet the criteria, so that the total number of substances listed becomes 5 x 50 = 250. Many of these 250 substances may be duplicates of each other. When you proceed to the “finalize” stage of the query, the software removes the duplicate substances and shows the total number of distinct substances. So the total number of substances at the “finalize” stage will generally be less than at the “evaluate” stage, unless no substance meets the query criteria on more than one microarray.
Filtering
18. Reasonable filtering: remove flagged features; remove features with regression R² values (RgnR2) of <0.6, i.e., features with less even pixel intensities across the feature; filter for ratio of medians >2 for 2x higher expressers and for ratio of medians <0.5 for underexpressers. Requiring an absolute minimum value can be somewhat restrictive, but some labs do this, in which case you pick your minimum value based on the observed background for the array, from either blanks or salt spots.
19. After you have made a dataset with the Query Wizard, you can apply further filtering. Copy the dataset by highlighting its name in the Datasets tab of the Project Tree, then right click and copy. Then point the mouse arrow to a blank space in the Project Tree and paste the dataset in; this makes a copy of the dataset with an extension of _0, which you can later rename. Open the copy of the dataset and then use “Analysis, find specified values, require that at least 5 arrays have the Rm<0.4”, for example. The found substances will be highlighted. Then use Edit to invert the selection, and then use Analysis and remove selected rows to get rid of the substances that do not have Rm<0.4 in at least 5 arrays.
Now the copied and further filtered dataset can be renamed to reflect the additional filtering. All of the values filtered out in the original dataset query will remain filtered out, and all the substances in the copied dataset will be those with consistent Rm<0.4 in at least 5 arrays.
Clustering
20. PCA = principal component analysis. It is for more than looking at outliers. It allows viewing of the biggest changes in the data and gives the same result each time the algorithm is run. It looks for the biggest variance and plots that on the X axis; the second biggest variance is plotted on the Y axis and the third on the Z axis. The points most distant from the big clusters are the features/substances representing the greatest variance, i.e. the most interesting features. You can use the zoom function and shift key to select multiple branches and create a dataset from selected features. Cool.
Exporting Data from Acuity
21. The easiest way to copy a graph from the Report tab is to right-click on the graph and select Copy from the popup menu. In PowerPoint, select Edit / Paste Special. From the list of objects, select "Device Independent Bitmap".
22. There are two ways to move data out of Acuity and into txt or xls format. You can use the export function, which results in a txt file, but this is the slower method. Damian suggested a faster method, which is to simply select the data, then copy and paste it directly into Excel. I found this essentially instantaneous even for 6 arrays and 3500 rows of data.
To export the normalized data from 78 arrays, Mike D suggested that I highlight all 78 microarray files in the microarrays tab and click to open all 78 in the data table pane. Then move the cursor into the table pane and get the red bar across the top of the table pane. Then I can use “edit/select all” and “edit/copy”, then open Excel and paste. This moves only the data type currently being viewed in the table pane into Excel.
Data Interpretation Notes
23. Data review: I looked in GenePix at the Rp, Rm, mR, and rR values for selected bright red or bright green features on slide 91/day 2 of the BioSig BA 9/02 series. I noted that excellent RgnR2 values approaching the theoretical “1” could be associated with features with low or high absolute values. I convinced myself that I could pull out excellent Rm>2 and Rm<0.5 features after filtering for features with RgnR2>0.6 for quality control. Absolute Cy5 and Cy3 values minus backgrounds could be low (300 vs 50) and the ratio of medians would be significant (Rm=6), or the absolute values could be high (60,000 and 30,000) and the Rm would still be significant (Rm=2).
24. In Acuity, the ratio of medians already has the Cy5 and Cy3 local backgrounds subtracted out.
Choosing Acuity Database Across Network
25. Pointing the Acuity client on one PC to a database on another PC:
a. Start/Settings/Control Panel/Administrative Tools
b. ODBC Data Sources
c. System DSN/Add
d. SQL Server
e. Choose a server/Next>
f. With SQL server authentication
Importing Quick Lists
26. If you have a list of genes of interest that you wish to view in Acuity, put your list of genes into a tab-delimited text file (or into Excel and then save as tab-delimited text).
Next, go to “file” and “import quicklist” in Acuity. Give the list a color and save the quicklist with a name. Then open your dataset in Acuity and apply the color and selection with the quicklist. Then create a dataset from the selected substances. This will be a new dataset containing the full rows of data associated with your specific list of genes.
Dye Flipping
27. You may have some arrays to which a “dye flip” has been applied, meaning that Cy3 is the experimental and Cy5 is the control. The standard configuration is for Cy5 to be the experimental and Cy3 to be the control. If a subset of your arrays is dye flipped, go ahead and collect all the data from all the arrays, even the dye-flipped ones, as if Cy5 is the experimental and Cy3 is the control. After you import the arrays into Acuity and create a dataset from the mixed set of arrays, use “analysis/dye swap columns” to select the dye-flipped arrays, whose ratios need to be inverted so that their values can be compared with those of the standard Cy5/Cy3 arrays. Dye swap columns will then calculate 1/(Cy5/Cy3) to get Cy3/Cy5, i.e. experimental over control.
Statistics
28. If you wish to view the statistics on data from multiple arrays, click to highlight the multiple arrays in the datapane under “mean data”. While they are highlighted, click on the “statistics” tab, then right-click below the statistics tab, inside the statistics window. Select “calculate statistics for selected microarrays”. Acuity then calculates the statistics across the multiple arrays, and the “count” will equal the number of arrays highlighted and used in the calculation.
Comparing arrays by analysis of variance
29. Rather than comparing differences in gene expression by ratio of medians or fold change, one can use Acuity to look at differences in gene expression by significance.
You pull all the arrays to be compared into a dataset, then in the datapane you highlight the “A set” of arrays (ex: 6 arrays from 3hr retinoic acid treatment in P19 cells) and go to “analysis”, “advanced statistics”, and “calculate significance”. The software will then ask you to highlight the “B set” and will calculate the significance. You can use the default of “Student’s t test with equal variance” with a p value of 0.05, and apply a Bonferroni correction (very conservative) or a Benjamini-Hochberg correction (less conservative; it controls the false discovery rate) to the p values to find the most differentially expressed genes.
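To see how the two corrections in item 29 differ in strictness, here is a minimal Python sketch of the standard Bonferroni and Benjamini-Hochberg adjustments. It is not Acuity's own code; the function names, gene names, and p values are all hypothetical and chosen only for illustration.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjustment (controls the false discovery rate)."""
    m = len(pvalues)
    # Walk the p values from largest to smallest so a running minimum
    # keeps the adjusted values monotonic.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # rank 1 = smallest p value
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p values from an "A set" vs "B set" comparison.
p_by_gene = {"geneA": 0.001, "geneB": 0.008, "geneC": 0.020,
             "geneD": 0.030, "geneE": 0.60}
genes = list(p_by_gene)
raw = [p_by_gene[g] for g in genes]

bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)

alpha = 0.05
sig_bonf = [g for g, p in zip(genes, bonf) if p < alpha]
sig_bh = [g for g, p in zip(genes, bh) if p < alpha]

print("Bonferroni keeps:", sig_bonf)
print("Benjamini-Hochberg keeps:", sig_bh)
```

With these made-up values, Bonferroni keeps two genes at the 0.05 level and Benjamini-Hochberg keeps four, which is why the choice of correction matters when a dataset holds thousands of substances.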