Acuity 3: importing, normalizing, filtering, querying data

UNM SOM EXPERIMENTAL BIOTECHNOLOGY LABORATORY: MICROARRAYS
Standard Operating Procedure
Title: Acuity 3: importing, normalizing, filtering, querying data
SOP#:7.3
Author: B. Griffith
Revision level: 2
Effective Date: 11/07/03
Primary Reviewers: Val Bain
1. PURPOSE
Acuity is a microarray database and analysis software program. Microarray data is
imported, normalized, filtered, queried, and clustered in Acuity.
2. SCOPE
This procedure provides a brief introduction to Acuity. Full detailed instructions can be
viewed in Acuity under the “help” function or through the PDF Manual. Acuity is intuitive
so explore its functions as you go.
3. MATERIALS AND RESOURCES
Axon’s Acuity version 3.1004
Access to Acuity passwords
Axon Technical Support: 1-510-675-6200 (M.DeFreitas, PhD, S. Carriedo, PhD etc)
Axon’s Director for Genomics : Damian Verdnik, PhD
4. OVERVIEW OF DATA PROCESSING AND ANALYSIS IN ACUITY
a. Importing data: GPR (GenePix Results) files are imported into Acuity
b. Normalizing data: per chip, by Lowess, or per set of control genes in order
to be able to compare data between microarrays.
c. Filtering data: removing flagged data, values below a minimum, or spots with uneven
intensities so that only good-quality data are analyzed.
d. Querying data: creating datasets to display genes with differential expression.
e. Clustering data: PCA, SOM, Hierarchical, K-means, K-medians, Gap Statistic,
Gene Shaving; presents data in a visually interpretable format.
f. Various Notes on Acuity
5. OVERVIEW OF SOFTWARE: DISPLAY ORGANIZATION
1. Acuity is written in C++, uses MSQL database software, and has a Windows
interface. All standard Windows functions are available in Acuity.
The Acuity “Data Window” has the following substructure: Data Table Pane, Substance Properties
Pane, and Views Pane.
2. In general, you will pick a task in the “Common Tasks” window. See list below:
3. You will apply this task to data that you organize in the “Project Tree”. See the example of
microarrays organized in the “Project Tree” below. The three tabs in the “Project Tree” are
Microarrays, Datasets, and Quicklists. You import microarrays and organize them under the
“Microarrays” tab. You create datasets from your microarrays and organize the datasets
under the “Datasets” tab. You create quicklists of interesting genes and organize them under
the “Quicklists” tab.
Project Tree
4. When you open a microarray or a dataset, the data will be visible in the “Data Window”,
which has a “Data Table Pane”, a “Substance Properties Pane”, and a “Views Pane”. See the example
below. In the “Data Table Pane”, numeric data such as ratio of medians, F532-bkg, or any
data from the GPR file can be viewed for numerous arrays in a dataset. In the “Substance
Properties Pane”, gene descriptions, links to websites, chromosomal locations, statistics, etc.
can be viewed for each substance or gene. In the “Views Pane”, jpg images of features,
experimental parameters, clustering visualizations, replicate values, reports and more can be
viewed for a specific microarray or dataset of arrays. Much of the work is performed in this
Data Window.
[Figure: Data Window, with the Substance Properties Pane, Data Table Pane, and Views Pane labeled]
6. IMPORTING MICROARRAYS
1. In the Microarrays tab of the Project Tree, create a new folder in which you will organize
your files and then highlight the folder in the Project Tree.
2. In the “Common Tasks Window”, click on “import microarrays” and a window pops up
(example below).
3. Use the file browser to select the GPR file(s) that you wish to import. You can
highlight multiple GPR files and import them as a group. Click “open”.
4. A “select destination” window pops up; select your destination folder in the Project
Window and click “ok”.
5. The “Select Wavelengths” window pops up; leave the defaults of 635 and 532. Click “ok”.
6. A progress window for importing will pop up. Importing takes approximately 1 minute per
microarray.
7. When the importing is complete, the newly imported microarray will appear in the Project
Window under your folder. See figure below:
8. The newly imported microarray will have a purple icon with a green dot to the lower left
side. The green dot indicates that the jpg image of the microarray was successfully imported
along with the numeric data from the GPR file. The purple color of the icon indicates that the
numeric data has NOT been normalized. Also under the column “Normalizations”, “none” is
indicated.
9. For new genomes, you can also import “Substance Properties” from tab-delimited text files,
such as those saved from Excel worksheets. EBL has imported the Substance Properties for Mouse1 and
Mouse2, and will do so for Rat and Mtb when we are ready to analyze Rat and Mtb data.
Substance properties include all information associated with the genes, including
chromosomal location, gene description, GB accession, gene symbol, Tm, Plate position, and
Unigene ID. Substance properties also include the weblinks which EBL has established and
the statistics which Acuity calculates if replicate spots are included in the microarray.
10. For new microarrays, you can also import “Microarray Parameters” from tab delimited text
files such as Excel worksheets. Microarray Parameters could include treatment times,
gender, treatment condition, animal number, replicate number, or other experimental
conditions defining the microarray’s relevance to your experiment. You can also manually
enter Microarray parameters, if you don’t have the data in a tab delimited text file.
11. To manually enter Microarray Parameters, right click on your microarray file in the
Microarrays tab within the Project Tree window. Then left click on “properties” and then left
click on “parameters”. Within the “Microarray Parameters” window, there is a drop-down box
of existing parameters, for which you can insert a value below. Alternatively, click on
“configure” and add new parameters that you define by clicking on “add”. Type in a name for
your new property or parameter and then select “string”, “int”, or “float” as the data type.
“String” is for text, “int” is for whole numbers, and “float” is for numbers with decimals. Click
on “ok”, click on “close”, and then click on “ok” twice. Be sure to be perfectly consistent
in your parameter values or data entry if you want to later query microarrays by their
“Microarray Parameters”. Example: if you use a number value for day of infection (2), then
don’t later use the word “two”; stick to the same format for the values!
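The consistency advice in step 11 can be checked outside Acuity before the values are typed in or imported. The following is only a minimal sketch in Python, not an Acuity feature; the function name `check_parameter_values` and the sample values are hypothetical.

```python
def check_parameter_values(values, data_type):
    """Return the values that do not parse as the chosen data type.

    data_type is "string", "int", or "float", mirroring the three data
    types offered when a new microarray parameter is configured.
    """
    parsers = {"string": str, "int": int, "float": float}
    parse = parsers[data_type]
    bad = []
    for value in values:
        try:
            parse(value)
        except ValueError:
            bad.append(value)
    return bad


# Hypothetical "day of infection" entries for four microarrays.
day_of_infection = ["2", "4", "two", "7"]
inconsistent = check_parameter_values(day_of_infection, "int")
print("Entries to correct before querying:", inconsistent)
```

Any entry reported here would fail to match a later numeric query on that parameter.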
7. NORMALIZING MICROARRAYS
1. There is an excellent description of the rationale behind the various types of
normalization located in Acuity under “Help”, “Acuity Manual (PDF)”, “Normalization”,
so the rationales won’t be discussed here. You should carefully select a
normalization method based on the anticipated variation in your microarrays and
experiments.
2. In the “Common Tasks” window, click on “Normalization Wizard” and the “Microarray
Normalization Wizard Window” pops open. Acuity looks at your data
3. The Normalization Wizard Window (see example below) allows you to select “Ratio
Based”, “Wavelength Based” or “Lowess Based” Normalization options. Be sure that
you normalize all microarrays in your experiment with the same normalization
method.
4. The “Ratio Based” or “per chip” method has been quite common, but is not the only
method available. For “Ratio Based” normalization, we usually select “Mean of the
ratio of medians of all features is equal to 1”. Leave the defaults “exclude ratios less
than 0.1 & greater than 10” and “Exclude Bad, absent & not found features”. These
defaults only exclude these values from the normalization calculation; they do not remove the
data from the microarray in the database.
5. Lowess Normalization can be used if you detect variation from print tip to print tip
(i.e., from block to block), or it may be useful for normalizing over different intensity
ranges.
6. If you leave the box checked for “Open normalization viewer upon completion”, you
will be able to view a scatter plot of your unnormalized and normalized data after the
normalization is complete.
7. When the normalization “Summary Window” pops up, click “next” and click “finish”.
8. When the normalization is completed, you can view the normalized and
unnormalized data scatter plots and the icon for your microarray folder has turned
orange with a green dot in the lower left corner. See example below.
9. It is easy to remove and change normalizations, but this must be done before you
include the microarray file in a microarray dataset in Acuity. If you wish to remove
normalization on a microarray file that is already in a microarray dataset, then you
must first remove the microarray from the dataset. Alternatively, you can reimport the GPR file
for the microarray, put it in a new folder, and either leave it unnormalized or normalize it
with a new normalization method.
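To make the “Ratio Based” option in step 4 concrete, the sketch below shows one plausible reading of “Mean of the ratio of medians of all features is equal to 1” in Python. It is an illustration under stated assumptions, not Acuity's actual implementation: the function names are hypothetical, flagged features are represented by a simple True/False list, and the scale factor is assumed to be a single multiplier per microarray.

```python
def ratio_normalization_factor(ratios, flagged, low=0.1, high=10.0):
    """Compute a per-array factor that sets the mean included ratio to 1.

    Ratios outside [low, high] and flagged features are left out of the
    calculation only; they are NOT removed from the array's data.
    """
    included = [r for r, bad in zip(ratios, flagged)
                if not bad and low <= r <= high]
    if not included:
        raise ValueError("no features left to normalize on")
    return len(included) / sum(included)


def normalize(ratios, factor):
    """Apply the factor to every feature, including the excluded ones."""
    return [r * factor for r in ratios]


# Hypothetical ratio of medians for six features on one array.
ratios = [0.5, 1.0, 2.5, 4.0, 20.0, 0.05]
flagged = [False, False, False, True, False, False]

factor = ratio_normalization_factor(ratios, flagged)
normalized = normalize(ratios, factor)
print("factor:", factor)
print("normalized ratios:", normalized)
```

Note that all six features are rescaled, even though only three of them contributed to the factor; this mirrors the point in step 4 that the exclusions affect the normalization calculation and not the stored data.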
8. FILTERING DATA
1. Data can be filtered to remove flagged or poor quality data so that later analysis is only
performed on excellent quality data.
2. Click on “Analysis”, “Create DataSet from”, “Query Wizard”.
3. The window entitled “Query Wizard: Filter by Microarray” pops up (see below).
4. With “Type= microarray” and “Parameter=name”, click on the “select from folder” button and
then select the microarray folder or folders desired.
5. Highlight the folder name in the “combine conditions” window and then click on the “add to query”
button.
6. Click on the “next” button.
7. The next window, “Query Wizard: Filter by Data Value”, pops up. You can load a previously
saved condition or create a new data value criterion (see below).
8. Frequently, we load the condition saved as “Cy3&Cy5 min, flags removed, RgnR2>0.6” to
remove low Cy3 and Cy5 values, to remove “bad, absent, or not found features”, and to
remove features with poor pixel regression ratios over the area of the feature.
9. Highlight the title of the condition.
10. Click the “load” button. Click on the “add to query” button.
11. Click on the “next” button.
12. The next window, “Query Wizard: Filter by Substance Property”, pops up.
13. You can create criteria for specific substances if you wish to narrow the filtering down
further, or you can leave these criteria unused if you don’t wish to filter by substance property.
14. Click on the “next” button.
15. The “Query Wizard Evaluate” window pops up. (see below) Review your filtering criteria
and then click on the “evaluate” button.
16. Review the outcome of the query. Note the number of features that passed the filtering
criteria and the number of substances. If these meet your expectations, click on
the “finish” button to save the results of the filtering as a new dataset.
17. The “Create Dataset” window pops up. Give the dataset a name, a description, and a
logical location among your previously existing datasets. You can click on the new-folder
button if you wish to create a new folder in which to locate your dataset.
18. You can open the newly created dataset by double left clicking on the title in the Project
Tree window.
19. The dataset will open up in the “Data Table Pane” of the “Data Window”.
20. You can double click on the column entitled “substance” to sort ascending or descending
by substance name, or you can double click on the column titled with your microarray’s name to
sort ascending or descending by numeric value. If you have multiple microarrays in your
dataset, whichever one is highlighted in the “data table pane” will have its features displayed
and listed in the “Features” tab of the “Views Pane”.
21. In the “status bar” at the bottom of the Acuity screen, the default datatype is listed as
“ratio of medians”, the number of substances and number selected are noted, and the
number of features selected is noted.
22. If you wish to further edit your dataset, click on “Edit” in the “program toolbar” at the top of
the screen and then “find in the window (specified values)” (see figure below).
23. Click on the appropriate boxes for your purpose. This window allows you to select
substances “present” in a specified percentage of arrays, to select substances with a
minimum value in at least “x” number of arrays, or even to invert the selection with “select
substances that do NOT match the criteria”.
24. Next, click on the “ok” button. A window will pop up displaying the number of substances
that meet the criteria. These substances will be highlighted in the “data table pane”. If you
wish to create a new dataset from these selected substances, click on “Analysis, Create
Dataset from, Selected substances”, and then title, describe, and locate the
newly created dataset as before.
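The saved condition “Cy3&Cy5 min, flags removed, RgnR2>0.6” from step 8 combines three tests on each feature. A minimal Python sketch of that logic follows, for illustration only: the field names (`cy5`, `cy3`, `flagged`, `rgn_r2`) and the minimum intensity of 200 are assumptions, not values taken from the saved condition or from the GPR format.

```python
def passes_quality_filter(feature, min_intensity=200.0, min_rgn_r2=0.6):
    """Return True if a feature survives the three quality tests.

    min_intensity is a hypothetical threshold; choose one from the
    background you actually observe on your arrays.
    """
    return (
        not feature["flagged"]                 # bad, absent, or not found
        and feature["cy5"] >= min_intensity    # minimum Cy5 signal
        and feature["cy3"] >= min_intensity    # minimum Cy3 signal
        and feature["rgn_r2"] > min_rgn_r2     # even pixel intensities
    )


# Hypothetical features from one microarray.
features = [
    {"substance": "geneA", "cy5": 1500.0, "cy3": 900.0, "flagged": False, "rgn_r2": 0.85},
    {"substance": "geneB", "cy5": 120.0, "cy3": 950.0, "flagged": False, "rgn_r2": 0.90},
    {"substance": "geneC", "cy5": 2000.0, "cy3": 1800.0, "flagged": True, "rgn_r2": 0.95},
    {"substance": "geneD", "cy5": 800.0, "cy3": 700.0, "flagged": False, "rgn_r2": 0.40},
]

kept = [f["substance"] for f in features if passes_quality_filter(f)]
print("features passing the filter:", kept)
```

Comparing the number of features kept with the number you started from is the same check you make at the “evaluate” step of the Query Wizard.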
9. FILTERING & QUERYING DATA
1. Filtering and querying data is just like filtering alone, except that you add criteria to
further explore the data. You use the same essential steps as described under “Filtering
Data” above, but you may wish to add criteria under “data type” to extract substances with
2.5-fold higher or lower expression. For example, you could add the criterion “Ratio
of Medians” > 2.5 for genes overexpressed by 2.5-fold, or the criterion
“Ratio of Medians” < 0.4 (that is, 1/2.5) for genes underexpressed by 2.5-fold.
2. In Acuity, you can filter or query based on any of the data types imported with the GPR file,
any of the microarray parameters, or any of the substance properties. It is very flexible and
intuitive.
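The two thresholds in step 1 are reciprocals of each other: a 2.5-fold decrease corresponds to a ratio below 1/2.5 = 0.4. The short Python sketch below illustrates the arithmetic; the function name `classify_fold_change` is hypothetical and is not an Acuity function.

```python
def classify_fold_change(ratio, fold=2.5):
    """Label a ratio of medians as over, under, or unchanged."""
    if ratio > fold:
        return "over"
    if ratio < 1.0 / fold:
        return "under"
    return "unchanged"


# Hypothetical ratio-of-medians values for four substances.
ratios = {"geneA": 3.0, "geneB": 0.3, "geneC": 1.0, "geneD": 2.0}
labels = {name: classify_fold_change(value) for name, value in ratios.items()}
print(labels)
```

The same pattern gives the 2-fold thresholds of 2 and 0.5.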
10. CLUSTERING DATA
1. Clustering generally is performed on a group of arrays within a dataset. You can cluster by
array or by gene or by gene and array.
2. Multiple clustering algorithms are available in Acuity, including PCA, SOM, Hierarchical,
K-means, K-medians, Gap Statistic, and Gene Shaving.
3. The purpose of clustering is to present data in a visually interpretable format and to
reveal patterns in the microarray data. I don’t think there is a “gold standard” for
clustering. Axon provides more details on the pros/cons of each algorithm and many
research papers address this issue, so the topic will not be discussed further in this protocol.
4. In the Datasets tab of the Project Tree, click to open the dataset that you wish to cluster.
5. From the Common Tasks window, click on the clustering algorithm that you wish to use on
your dataset. For this example, we will use SOM and then hierarchical clustering. The
number above each of the patterns is the number of genes following the SOM pattern.
6. Above is an example window for the SOM clustering. This will find 16 (4x4) clusters or
patterns in the data. Be sure to log transform the data before analysis. Above are the defaults for
centering, scaling, and similarity metric. On the next page is an example SOM.
7. SOM patterns are shown under “visualizations” in the Data Window.
8. You can use an SOM cluster to then proceed to a hierarchical cluster. See screen shot on
next page for defaults, using the SOM cluster to order before starting the hierarchical cluster.
9. This is an example hierarchical cluster by substances or genes, rather than by arrays.
10. The hierarchical cluster will be viewed under visualizations in the Data Window. See
screen shot in next 2 pages.
11. Green represents underexpression. Red represents overexpression. Black represents
ratios of approximately 1. Gray represents no data for that gene/substance in that array. Arrays are
in columns and genes/substances are in rows.
12. The example SOM/hierarchical cluster is a time series, where the first two arrays are
visibly different from the next 8 arrays or columns.
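The reason for the log transform recommended in step 6 is that raw ratios are asymmetric: 2.5-fold overexpression is 2.5, but 2.5-fold underexpression is 0.4. After a log transform the two are equal in size and opposite in sign, so a distance-based clustering treats them evenly. The Python sketch below illustrates this; base 2 is an assumption made here for illustration, and the function name `log2_ratios` is hypothetical.

```python
import math


def log2_ratios(ratios):
    """Log2-transform ratios; missing or non-positive values become None."""
    return [math.log2(r) if r is not None and r > 0 else None for r in ratios]


# 2.5-fold up, 2.5-fold down, no change, and a missing value (gray in the
# cluster view).
raw = [2.5, 0.4, 1.0, None]
transformed = log2_ratios(raw)
print(transformed)
```

A ratio of 1 maps to 0, which corresponds to the black cells in the hierarchical cluster display.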
For Self Organizing Maps (SOM) non-hierarchical clustering, you can gain an idea of the
number of nodes to suggest by first looking at the number of clumps in a hierarchical
cluster. Once you determine the number of nodes for the SOM, it is worthwhile running
an SOM prior to a final hierarchical cluster. Hierarchical clustering sets the order
between nodes, but not the order within nodes. SOM sets the order within nodes and
then the hierarchical clustering sets the order between nodes. Sean Carriedo of Axon
suggested an SOM of 1xN or Nx1 rather than 3x3 or 4x4, etc. Hierarchical clustering
finds the two most related substances by values and then builds on that. SOM looks for
patterns of genes across all the microarrays.
11. QUICKLISTS
1. Quicklists can be used to rapidly view a subset of genes, based on common values
(e.g., overexpressed or underexpressed), specific gene names (e.g., all transcription factors),
or any other substance (gene) parameter or annotation.
2. Quicklists can be associated with a specific color, saved, and applied to later datasets.
3. Please refer to the “help” function in Acuity for more details. In brief, quicklists can be
created by highlighting a specific set of genes/substances in the “Data Window” and then
going to “Analysis/Quicklist and Coloring Operations/Create Dataset Quicklist”. You give
the quicklist a name and an associated color for that dataset.
4. You can also create a global quicklist, not restricted to a specific dataset.
5. Go to “Data/Clear all colors” to remove the applied colorized quicklists.
6. You can also import a list of genes as a quicklist.
12. VARIOUS NOTES ON ACUITY
Short Cuts:
1. “Control E”: expands the view of the window with the red bar across the top.
“View/remove check from expand” returns the window to its original proportions.
2. To deselect a quicklist and its color, click on any other field to remove the
quicklist highlighting, and use “data/clear colors” to remove the quicklist color
application.
3. Control/mouse click = zoom
Naming Conventions:
4. Microarray names: shorter names are better, so that under graph view the names
are a minimal part of the view and the graphed features are the maximal part of the
view.
5. Microarray order: within a folder, the order of the microarrays is alphabetic by name,
so having microarray names that match the time order keeps the microarrays in
time order in the dataset table view and in the analysis views, including
clustering. Set the order as you desire. Within the table pane, you can
rearrange the order of the arrays within the dataset, but when you perform analysis,
the order of the arrays in the cluster view will be determined by the microarray name.
Before performing analysis/clustering, you can rearrange the columns. See next
note.
6. Sean said that you can use the “dataset operations” functions only before you do
any analysis on the dataset. Once you perform any analysis on the dataset, you
can’t use the “dataset operations”. So you can rearrange the columns in the dataset
with “analysis/dataset operations/arrange columns” prior to analysis. I think this
rearrangement will be permanent. You can also divide all columns by one other
column in dataset operations if you want to normalize against a “zero point” in a time
series. A “scriplet” or scroll will become associated with a dataset after the dataset
operation is performed. The scriplet shows up underneath the dataset in the Datasets
tab.
7. You can change the names of the arrays in the Microarray project view, and the new
names carry down to the datasets.
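Two of the notes above lend themselves to a small worked example: alphabetic ordering of microarray names (note 5) and dividing all columns by a “zero point” column (note 6). The Python sketch below is an illustration only. It assumes a plain character-by-character alphabetic sort, which may differ in detail from how Acuity orders names, and the function names are hypothetical.

```python
def zero_padded_name(prefix, hour, width=2):
    """Build a name such as 't02h' so alphabetic order equals time order."""
    return f"{prefix}{hour:0{width}d}h"


def divide_by_reference(rows, ref_index):
    """Divide every value in each row by that row's reference column."""
    result = []
    for row in rows:
        ref = row[ref_index]
        if ref is None or ref == 0:
            # No usable zero point for this substance.
            result.append([None] * len(row))
        else:
            result.append([None if v is None else v / ref for v in row])
    return result


hours = [0, 2, 10, 24]
plain = sorted(f"t{h}h" for h in hours)
padded = sorted(zero_padded_name("t", h) for h in hours)
print("unpadded order:", plain)
print("padded order:  ", padded)

# One substance measured at 0, 2, and 10 hours (hypothetical ratios).
relative = divide_by_reference([[2.0, 4.0, 1.0]], ref_index=0)
print("relative to zero point:", relative)
```

With unpadded names, the 10-hour and 24-hour arrays sort ahead of the 2-hour array, which is the situation note 5 warns about.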
Information available in various panes
8. Pay attention to the bar of text below the view pane tabs. The "# of substances (#
selected) # microarrays" relates to the dataset as presented in the upper table pane,
then the "features: # (#selected)" relates to the microarray being viewed in the lower
view pane, and the "ratio of medians " relates to the type of data being viewed in the
upper table pane and the lower view pane. Thus, for a single microarray being
viewed in the lower View pane, I can see on the bottom screen bar, the number of
replicates that passed the dataset query criteria for the substance selected in the
table pane.
9. The Table pane shows substances in rows and the microarrays in columns. The
feature values are the averages within a microarray if there is more than one
replicate within the array for that substance (e.g., actin has 256 values on the array,
but only 1 average value is present in the table pane). The average value in the
table pane is influenced by the number of features pulled into the dataset by the
query criteria (e.g., if only 6 of 256 actin features meet the query criteria, then the
average value shown in the table pane will be the average of the 6 values for actin).
Thus the average actin value may go up or down with the dataset query criteria.
10. The View pane shows all the individual feature values within the selected microarray.
Here you can see the individual values of the replicates for a substance selected in
the Table pane.
11. In the Graph tab of the View pane, you are graphing the features from one microarray
for which the substances are highlighted in the Table pane.
12. To ensure that the microarray data in the table pane is synchronized with the image
in the features window of the view pane, right click on the image in the features
window and then select the “zoom to selection” option. This makes sure that the
features selected in the numeric presentation of the data in the view pane are
synchronized with the white outlining of the feature in the image.
Viewing Multiple Arrays simultaneously
13. To be able to view multiple microarrays simultaneously, one can open multiple
windows containing the same dataset and then highlight a distinct microarray for
each of the tiled windows.
a. Open dataset
b. Highlight the dataset title in the dataset tab
c. Right click on the dataset title in the dataset tab
d. Choose “open selected in new window”
e. A new window containing the dataset pops open.
f. Repeat this process for as many microarrays as you wish to view separately
at the next step
g. Go to the top menu bar under “window” and click “tile vertically” to be able to
see the multiple windows on the screen simultaneously.
h. Click on the “features” tab for each of the open windows
i. Click on a unique microarray data column in the table pane, for each of the
tiled windows.
j. Click on the numeric data column in the view pane, for each of the tiled
windows. Then right click on the numeric data column in the view pane and
click on “refresh data”. This will synchronize the data between the table
pane and view pane for that tiled window. Repeat this procedure for
each of the tiled windows.
k. Under features, right click on the image and click on linear regression and
appropriate standard deviations to see a scatter plot of the highlighted
substances.
Queries: substances (or genes) and features (or spots)
14. When you run a query, the software looks at the individual feature values if you have
replicates and only the features meeting the query criteria will be averaged and
appear in the table pane resulting from the query. When you run the edit/find specific
values on the dataset, the software looks at the individual feature values and filters
data based on the individual feature values. Substance rows are highlighted or
selected, as a result of the “edit/find specific values” and while they are still
highlighted, you can go to “analysis; create dataset from selected substances” to
create a new dataset based on the substances which were selected based on criteria
placed on features. Thus values in this dataset can change relative to the
original dataset resulting from the query criteria. This is because the query pulls based
on individual feature values, whereas “edit/find” followed by “analysis; create dataset from selected
substances” looks at the average value for the replicate features. For
example, if a query criterion is Rm>2.5 and the replicate features have values of 2.6
and 2.0, only the one feature with the Rm of 2.6 is pulled into the initial dataset based
on the query. If you then use “edit/find specific values” and create a dataset based on
the selected substances, both of the feature values will be pulled into the new
dataset and their average will drop to 2.3, which appears to no longer meet the
original Rm>2.5 criterion. However, in the end, you achieve a final dataset of
substances with an Rm>2.5 in at least one of the two replicate features from the
array.
15. Clarification regarding the suggestion to use a query with new criteria to pull out a
subset of data from a dataset: I understand that I can open a dataset and use edit/find
(in this window) to apply new criteria to the dataset, and that the substances matching the
edit/find criteria will be highlighted in the dataset table view. I then go to
analysis/create dataset from "selected substances" to capture the substances that fit
the criteria set in the edit/find. I think this essentially creates a sub-dataset from the
original dataset, based on the criteria used in the edit/find. It seems to work when I
try it. See #12 above also.
Per Damian: There are a number of reasons why this is important functionality:
- you can't use the Query Wizard for everything; e.g., you can't query on analysis
results (clusters, etc)
- the Query Wizard is spot-specific, but by selecting rows in the main table, either
manually, or by selecting clusters or graph traces, or by using Edit / Find / Specified
Values, you are creating a dataset from substance values, i.e. based on the
combined values of replicates
- further, because the Query Wizard is spot-specific, you cannot use it to make
queries such as "Find all the substances that have ratios > 3 or < 0.3 on at least 2
microarrays". You can do that with Edit / Find / Specified Values.
16. Per Damian Verdnik: The Query Wizard looks not at the "averaged spot value" as
you say, but the spot value, i.e. the value that you get from a row in GenePix. The
value from a single spot, unaveraged. Once all your replicates are displayed in the
main Acuity table they have been averaged, so doing an Edit / Find / Specified
Values search finds values that have already been averaged for replicates.
17. Differences in apparent numbers of substances detected in a query at the “evaluate”
stage and at the “Finalize” stage: When you run a query against a set of multiple
microarrays at the “evaluate” stage, the software shows you the number of
substances within each array that meet the criteria. So for 5 microarrays, each may
have for example 50 substances that meet the criteria so that the total number of
substances listed becomes 5 x 50 or 250 substances. Many of these 250 substances
may be duplicates with each other. When you proceed then to the “finalize” query
stage, then the software removes the duplicate substances and shows a total
number of substances with duplicate substances removed. So the total number of
substances at the “finalize” stage will generally be less than at the evaluate stage,
unless there are no duplicate substances across the set of microarrays, that meet the
query criteria.
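Notes 14 through 17 turn on two pieces of arithmetic: a query tests individual features while the table shows per-substance averages, and the “evaluate” count adds up substances per array while the final count removes duplicates. The Python sketch below reproduces both using the Rm values 2.6 and 2.0 from note 14. It follows the behaviour as described in this SOP, not Acuity's internals, and all function names and gene names are hypothetical.

```python
from collections import defaultdict


def average_by_substance(features):
    """Average Rm per substance over the (substance, rm) pairs given."""
    grouped = defaultdict(list)
    for substance, rm in features:
        grouped[substance].append(rm)
    return {name: sum(vals) / len(vals) for name, vals in grouped.items()}


def query_features(features, threshold=2.5):
    """Feature-level query: keep only individual spots with Rm > threshold."""
    return [(s, rm) for s, rm in features if rm > threshold]


def evaluate_and_final_counts(hits_per_array):
    """Return (sum of per-array substance counts, number of distinct substances)."""
    per_array_total = sum(len(set(hits)) for hits in hits_per_array)
    distinct = len(set().union(*hits_per_array))
    return per_array_total, distinct


# Two replicate features for geneA on one array, as in note 14.
features = [("geneA", 2.6), ("geneA", 2.0), ("geneB", 3.1), ("geneB", 2.9)]

query_avg = average_by_substance(query_features(features))
all_avg = average_by_substance(features)
print("average over features passing the query:", round(query_avg["geneA"], 2))
print("average over all replicate features:    ", round(all_avg["geneA"], 2))

# Substances passing the query on each of three arrays.
hits = [["geneA", "geneB"], ["geneA", "geneC"], ["geneA", "geneB"]]
print("evaluate total, final total:", evaluate_and_final_counts(hits))
```

The first pair of numbers shows why a substance can appear in a dataset with an average below the original query threshold.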
Filtering
18. Reasonable filtering: remove flagged features; remove features with linear
regression ratios (RgnR2) of <0.6 to remove features with less even pixel intensities
across the feature; filter for ratio of medians with values of >2 for 2x higher
expressers and for ratio of medians with values <0.5 for underexpressers. Requiring
an absolute minimum value can be somewhat restrictive, but some labs do this; if so,
pick your minimum value based on your observed background for the array, either for
blanks or salt spots.
19. After you have “made a data set from query wizard”, you can apply further filtering.
Copy the dataset by highlighting the dataset name in the dataset tab of the project
tree and then right click and copy. Then point the mouse arrow to a blank space in
the project tree and paste the dataset in; this makes a copy of the dataset with an
extension of _0, which you can later rename. Open the copy of the dataset and then
use “Analysis, find specified values” and require, for example, that at least 5 arrays have Rm<0.4.
The found substances will be highlighted. Then use Edit to invert the
selection and use Analysis, remove selected rows, to get rid of the
substances that aren’t Rm<0.4 in at least 5 arrays. Now the copied and further
filtered dataset can be renamed to reflect the additional filtering. All of the values
filtered out in the original dataset query remain filtered out, and all the substances
in the copied dataset are those with consistent Rm<0.4 in at least 5 arrays.
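The "at least 5 arrays have Rm<0.4" rule from note 19 can be sketched as follows. This is an illustration of the selection logic only; the function name and data are hypothetical, and representing an already filtered-out value as None is an assumption.

```python
def consistent_low(rm_values, threshold=0.4, min_arrays=5):
    """True if at least `min_arrays` arrays have Rm below `threshold`.

    Values already filtered out by the original query are represented
    as None (an assumption) and never count toward the minimum.
    """
    count = sum(1 for rm in rm_values if rm is not None and rm < threshold)
    return count >= min_arrays

# Made-up dataset: substance -> Rm on each of 6 arrays.
dataset = {
    "geneA": [0.30, 0.35, 0.20, 0.39, 0.10, 0.90],  # 5 arrays below 0.4
    "geneB": [0.30, 0.50, 0.20, 0.60, 0.10, 0.90],  # only 3 arrays below 0.4
    "geneC": [0.10, 0.10, None, 0.20, 0.30, 0.35],  # 5 arrays below 0.4
}

# Equivalent of "find, invert selection, remove selected rows".
filtered = {name: rms for name, rms in dataset.items() if consistent_low(rms)}
print(sorted(filtered))
```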
Clustering
20. PCA=principal component analysis. It is for more than looking at outliers. It allows
viewing of the biggest changes in the data and gives the same result each time the
algorithm is run. It finds the direction of greatest variance and plots that on the
X axis, the direction of second greatest variance on the Y axis, and the third on
the Z axis. The points most distant from the big clusters are the
features/substances representing the greatest variance, i.e. the most interesting
features. You can use the zoom function and shift key to select multiple branches
and create a dataset from selected features. Cool.
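The idea in note 20 can be illustrated with a small, dependency-free sketch. How Acuity computes PCA internally is not described in this SOP; the version below uses power iteration with deflation on the covariance matrix, started from a fixed vector so that repeated runs give identical results. The data are made up, and the fixed start vector is assumed not to be orthogonal to any principal axis.

```python
import math

def covariance(rows):
    """Center the columns and return (centered rows, covariance matrix)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return centered, cov

def leading_component(cov, iterations=500):
    """Power iteration from a fixed start vector (same answer on every run)."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance

def pca(rows, n_components=3):
    """Return (variance per axis, per-row scores on the X, Y, Z axes)."""
    centered, cov = covariance(rows)
    d = len(cov)
    axes, variances = [], []
    for _ in range(n_components):
        v, var = leading_component(cov)
        axes.append(v)
        variances.append(var)
        # Deflation: remove the variance already explained by this axis.
        cov = [[cov[i][j] - var * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(c[j] * axis[j] for j in range(d)) for axis in axes]
              for c in centered]
    return variances, scores

# Made-up features measured on three arrays; the last row is an outlier.
features = [
    [2.0, 0.1, 0.0], [-2.0, -0.1, 0.1], [1.0, 0.5, -0.1],
    [-1.0, -0.5, 0.0], [0.0, 0.0, 0.0], [6.0, 0.2, 0.1],
]
variances, scores = pca(features)
# The feature farthest out along the first axis.
most_distant = max(range(len(features)), key=lambda i: abs(scores[i][0]))
print(variances, most_distant)
```

The first axis carries the largest variance, the second the next largest, and so on, which is why the points far from the main cluster stand out on the plot.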
Exporting Data from Acuity
21. The easiest way to copy a graph from the Report tab is to right-click on the graph and
select Copy from the popup menu. In PowerPoint, select Edit / Paste Special. From
the list of objects, select "Device Independent Bitmap".
22. There are two ways to move data out of Acuity and into txt or xls format. You can
use the export function, which results in a txt file, but this is the slower method.
Damian suggested a faster method, which is to simply select the data, copy it and
paste it directly into Excel. I found this essentially instantaneous even for 6 arrays
and 3500 rows of data. To export the normalized data from 78 arrays, Mike D suggested
that I highlight all 78 microarray files in the microarrays tab and click to open all
78 in the data table pane. Then move the cursor into the table pane so that the red
bar appears across the top of the table pane. Then use “edit/select all” and
“edit/copy”, open Excel and paste. This moves only the data type being viewed in the
table pane into Excel.
Data Interpretation Notes
23. Data review: I looked in GenePix at the Rp, Rm, mR, and rR values for selected
bright red or bright green features on slide 91/day 2 of BioSig BA 9/02 series. I noted
that excellent RgnR2 values approaching the theoretical “1” could be associated with
features with low or high absolute values. I convinced myself that I could pull out
excellent Rm>2 and Rm<0.5 features after filtering for features with RgnR2>0.6 for
quality control. Absolute Cy5 and Cy3 values minus backgrounds could be low (300
vs 50) and the ratio of medians would still be significant (Rm=6), or the absolute
values could be high (60,000 and 30,000) and the Rm would be significant (Rm=2).
24. In Acuity, the ratio of medians already has the Cy5 and Cy3 local backgrounds
subtracted out.
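The arithmetic behind notes 23 and 24 can be checked with a short sketch. The function name and the explicit background parameters are hypothetical; the example numbers are the background-subtracted values quoted in note 23, so the backgrounds default to zero here.

```python
def ratio_of_medians(cy5_median, cy3_median,
                     cy5_background=0.0, cy3_background=0.0):
    """Background-subtracted Cy5 median divided by background-subtracted Cy3 median."""
    numerator = cy5_median - cy5_background
    denominator = cy3_median - cy3_background
    if denominator <= 0:
        raise ValueError("Cy3 signal does not exceed its background")
    return numerator / denominator

# Low absolute intensities can still give a large ratio...
low_intensity_rm = ratio_of_medians(300, 50)
# ...and high absolute intensities can give a smaller but still significant one.
high_intensity_rm = ratio_of_medians(60000, 30000)
print(low_intensity_rm, high_intensity_rm)
```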
Choosing Acuity Database Across Network
25. Pointing Acuity client on one PC to a database on another PC
a. Start/Settings/Control Panel/Administrative Tools
b. ODBC Data Sources
c. System DSN/Add
d. SQL Server
e. Choose a server/Next>
f. With SQL server authentication
Importing Quick Lists
26. If you have a list of genes of interest that you wish to view in Acuity, put your list of
genes into a tab-delimited text file (or into Excel and then save as tab-delimited text).
Next, go to “file” and “import quicklist” in Acuity. Give the list a color and save the
quicklist with a name. Then open your dataset in Acuity and apply the color and
selection with the quicklist. Then create a dataset from the selected substances.
This will be a new dataset containing the full rows of data associated with your
specific list of genes.
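If the gene list comes from a script rather than from Excel, a tab-delimited text version can be produced as sketched below. The exact layout Acuity expects for a quicklist file is not stated in this SOP; one gene name per row is an assumption, and the gene names and dataset are made up. The second half mimics what the quicklist selection does: keep only the full rows for the listed genes.

```python
import csv
import io

genes_of_interest = ["geneB", "geneD"]

# Write the list as tab-delimited text. An in-memory buffer is used here;
# for a real file use open(path, "w", newline="") instead.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
for gene in genes_of_interest:
    writer.writerow([gene])  # assumption: one gene name per row
quicklist_text = buffer.getvalue()

# Read the list back and select the matching rows of a dataset.
quicklist = {row[0] for row in csv.reader(io.StringIO(quicklist_text),
                                          delimiter="\t") if row}
dataset = {
    "geneA": [1.1, 0.9],
    "geneB": [2.4, 2.9],
    "geneC": [1.0, 1.2],
    "geneD": [0.3, 0.4],
}
selected = {name: values for name, values in dataset.items()
            if name in quicklist}
print(sorted(selected))
```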
Dye Flipping
27. You may have some arrays to which a “dye flip” has been applied, meaning that
Cy3 is the experimental and Cy5 is the control. The standard configuration is for Cy5
to be the experimental and Cy3 to be the control. If a subset of your arrays is dye
flipped, go ahead and collect all the data from all the arrays, even the dye flipped
ones, as if Cy5 is the experimental and Cy3 is the control. After you import the
arrays into Acuity and create a dataset from the mixed set of arrays, use
“analysis/dye swap columns” to select the dye flipped arrays, whose ratios must be
inverted before they can be compared with the standard Cy5/Cy3 arrays. Dye swap
columns will then calculate 1/(Cy5/Cy3) to get Cy3/Cy5, which for those arrays is
experimental over control.
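The reciprocal that dye swap columns applies can be sketched as below. This only illustrates the 1/(Cy5/Cy3) calculation described in note 27; the function name, array names and ratio values are hypothetical.

```python
def dye_swap(ratios, flipped_arrays):
    """Invert the Cy5/Cy3 ratio on dye flipped arrays only.

    After inversion every ratio reads as experimental over control.
    """
    return {array: (1.0 / ratio if array in flipped_arrays else ratio)
            for array, ratio in ratios.items()}

# One gene measured on three arrays, all collected as Cy5/Cy3.
# array_2 was dye flipped, so its raw ratio is control over experimental.
raw_ratios = {"array_1": 4.0, "array_2": 0.25, "array_3": 2.0}
corrected = dye_swap(raw_ratios, flipped_arrays={"array_2"})
print(corrected)
```

After the swap, the gene reads as overexpressed on all three arrays, which is what makes the ratios comparable.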
Statistics
28. If you wish to view the statistics on data from multiple arrays, click to highlight the
multiple arrays in the datapane under “mean data”. While they are highlighted, click
on the “statistics” tab, then right-click below the statistics tab, inside the
statistics window. Select “calculate statistics for selected microarrays”. Acuity then
calculates the statistics across the multiple arrays, and the “count” will equal the
number of arrays highlighted and used in the calculation.
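As a rough cross-check on exported data, statistics across the selected arrays can be computed as sketched below. Which statistics Acuity reports is not listed in this SOP, so the mean and standard deviation here are assumptions, as are the array names and values; the "count" is simply the number of arrays selected.

```python
import statistics

# One substance's value on each highlighted array (made-up numbers).
selected_arrays = {"array_1": 2.0, "array_2": 4.0, "array_3": 6.0}

values = list(selected_arrays.values())
summary = {
    "count": len(values),               # number of arrays used
    "mean": statistics.mean(values),
    "stdev": statistics.stdev(values),  # sample standard deviation
}
print(summary)
```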
Comparing arrays by analysis of variance
29. Rather than comparing differences in gene expression by ratio of medians or fold
change, one can use Acuity to look at differences in gene expression by significance.
Pull all the arrays to be compared into a dataset, then in the datapane highlight
the “A set” of arrays (ex: 6 arrays from 3hr retinoic acid treatment in P19
cells) and go to “analysis”, “advanced statistics”, and “calculate significance”.
The software will then ask you to highlight the “B set” and will calculate the
significance. You can use the defaults of “student’s T test with equal variance”, use a
p value of 0.05 and apply a Bonferroni or Benjamini-Hochberg correction to the p value.
The Bonferroni correction in particular is very conservative and will find only the
most differentially expressed genes.
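The two corrections named in note 29 can be sketched in plain Python to show how they differ. These are the standard textbook procedures, not Acuity's code, and the p values below are made up; the t-test that would produce them is not shown.

```python
def bonferroni(p_values, alpha=0.05):
    """Significant if p <= alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p value satisfies p <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            cutoff_rank = rank
    # Everything ranked at or below the cutoff is called significant.
    significant = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[index] = True
    return significant

# Hypothetical p values for five genes.
p_values = [0.001, 0.012, 0.025, 0.039, 0.6]
bonferroni_calls = bonferroni(p_values)
bh_calls = benjamini_hochberg(p_values)
print(bonferroni_calls)
print(bh_calls)
```

On these example values Bonferroni keeps only the single smallest p value, while Benjamini-Hochberg keeps four of the five, which shows how much stricter the Bonferroni correction is.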