BMC Bioinformatics

Minimum Information Specification For In Situ
Hybridization and Immunohistochemistry Experiments
(MISFISHIE)
Eric W Deutsch1, Lawrence D True2, Martin Korb1, David Campbell1, Sean
Grimmond3, Michael H Johnson1, Christian J Stoeckert, Jr. 6, Yi Zhou4, Alvin Y Liu5,
and certainly many more
1 Institute for Systems Biology, 1441 N 34th Street, Seattle, WA 98103, USA
2 Department of Pathology, University of Washington, Seattle, WA 98195-7705, USA
3 University of Queensland, St. Lucia, Qld, Australia
4 Children's Hospital Boston and Harvard Medical School, Boston, MA, USA
5 Department of Urology, University of Washington, Seattle, WA 98195, USA
6 Center for Bioinformatics and Department of Genetics, University of Pennsylvania,
Philadelphia, PA 19104, USA
Likely submission target: Genome Biology
Abstract
Background
One goal of the biomedical literature is to report results in a manner sufficiently
detailed so that the methods of data collection and analysis can be independently
replicated and verified. In order to assure this level of detail in published works, it is
useful to define a minimum information specification for each experimental data type
and to insist on adherence to this standard for published material. Such a
specification has been widely accepted in and been a benefit to the microarray
community, and efforts are well underway for proteomics data types. However, no
such specification yet exists for visual interpretation-based tissue gene expression
localization experiments, e.g. in situ hybridization and immunohistochemistry
experiments.
Results
We present such a specification, termed “Minimum Information Specification For In
Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE)”. It is
modelled after the MIAME (Minimum Information About a Microarray Experiment)
specification for microarray experiments. These data standards detail the content of
the information that is deemed necessary to satisfy the requirements of the standards;
these standards do not dictate a specific format for encoding the information. The
MISFISHIE standard describes the information that must be provided in six sections:
Experimental Design, Specimens, Probe or Antibody Information, Staining Protocols
and Parameters, Imaging Data and Parameters, and Image Characterizations. A
general checklist is provided to quickly and efficiently establish adherence to the
standard. We surveyed articles in the most recent 12 issues of 3 pathology journals for
potential adherence to this proposed specification and found that approximately NN%
lack sufficient information for independent investigators to readily repeat the study.
[These numbers are preliminary. Comments withheld until survey completed and
validated.]
Conclusion
This specification was jointly developed by members of the NIH/NIDDK Stem Cell
Genome Anatomy Projects consortium in order to facilitate data sharing within the
consortium. Use of the standard has benefited the consortium and would likely benefit
the whole research community. We urge journals to encourage and eventually
require compliance with MISFISHIE for all studies that include gene expression
localization data, so that all published data and resulting conclusions may be properly
interpreted and that independent investigators have the information that would enable
them to replicate the findings. More information and examples may be obtained at
http://scgap.systemsbiology.net/standards/misfishie/.
Background
Gene expression localization experiments, such as in situ hybridization and
immunohistochemistry, are frequently used to determine the exact source of
expression observed in more high-throughput (e.g., microarray, proteomics, etc.)
assays. For example, most samples derived from tissue contain many different cell
types, and the expression profile of each cell type is likely to be different.
Consequently, the component of a tissue-based sample that is responsible for the
expression signal of a particular gene is ambiguous. Confirming the location and
degree of gene expression within a tissue and cell type is often achieved by staining a
section of the tissue with a suitable reporter to the gene or gene product of interest.
However, it is often the case that in situ hybridization and immunohistochemistry
stains and/or images are presented with minimal interpretation and methodology.
Furthermore, neither the reagents and methods used in the experiments nor the results
are easily searchable. Above all, the interpretation of in situ hybridization and
immunohistochemistry stains varies between observers, between different image
analysis platforms and programs, and even between different sessions using the same
image analysis platform and program [1].
Data standards have emerged for microarray data ([2, 3]) to the great benefit of the
entire biological community. [Explain how.] Additional standards are under
development for other high-throughput technologies ([4, 5]).
In order to ensure that results from new gene expression localization experiments
are of maximum benefit to the biomedical community, we propose a minimum
information standard, termed “Minimum Information Specification For In Situ
Hybridization and Immunohistochemistry Experiments (MISFISHIE)”. This standard
describes the minimum information that should be provided when publishing, making
public, or exchanging results from visual interpretation-based tissue gene expression
localization experiments such as in situ hybridization, immunohistochemistry,
reporter construct genetic experiments (GFP/green fluorescent protein,
β-galactosidase), etc. Compliance with this standard is expected to provide researchers at
other labs enough information to reproduce the experiment and/or to fully evaluate the
data upon which results are based.
Modeled after the well-received MIAME (Minimum Information About a Microarray
Experiment) specification for microarray experiments [2], this specification only
describes the kind of information that should be provided. MISFISHIE does not list
every parameter that could be specified about any experiment, but rather lists broader
categories of detail that must be addressed, relying on the data producers and
reviewers to ensure that each section contains enough information for readers to be
able to fully assess the validity of, and accurately reproduce, the experiment being
described. As MIAME has been a yardstick for whether databases and publications
have done their job in adequately describing a microarray experiment such that its
conclusion could be verified or refuted [6], we hope MISFISHIE will provide a
similar function for gene expression localization experiments.
Furthermore, this specification does not dictate a specific format for reporting the
information. We expect to develop a data model based on the concepts of MAGE-OM
(MicroArray Gene Expression Object Model [3]) and on the MAGEstk
(MicroArray Gene Expression software tool kit [3]) in the near future. It is this model
and the associated XML-based mark-up language that will provide the recommended
data format for archiving or transferring such data. Since MAGE-OM version 2
(perhaps FuGE-OM) [ref to Angel’s paper] is being designed to be able to include
other functional genomics experiments in a modular fashion, it is likely that the
MISFISHIE-derived object model would be integrated within FuGE-OM rather than
separate from it.
Separation of the minimum information specification and the data format is important
because the data format should both allow for unlimited additional information
beyond the minimum as well as allow one to encode incomplete information for
optimum flexibility. In addition, broader acceptance of the minimum information
required will greatly aid the design of a data model.
It has long been recognized that improved standards for immunohistochemistry are
needed ([7, 8]). However, these standardization discussions have largely focused
either on developing standardized technical protocols which might be used to produce
more uniform stains [7], or on reducing the subjectivity of interpreting histologic
sections [8]. We do not attempt here to promote standardized methodologies for
production or interpretation of assays (although that is certainly encouraged). Rather,
we wish to promote complete disclosure of the methods actually used so that any
result may be replicated by others using the identical procedure that the initial
investigators used.
While no official minimum standard for such data yet exists, there have been several
efforts at organizing gene expression localization data in databases and such database
designs provide a useful framework from which to build a standard. Two notable
databases specialized for the mouse community, the Mouse Gene Expression
Database (GXD) [9] and the Edinburgh Mouse Atlas Gene Expression (EMAGE)
database [10], have influenced the design of MISFISHIE. Mouse-specific fields in
these databases have been removed in favour of more organism-neutral requirements.
And further, several fields in these databases were deemed useful but not a minimum
requirement and were thus not included here. Many experiments in these databases
extracted by curators from journal articles do have many empty fields because they
were not sufficiently described in the original articles. Requiring MISFISHIE
compliance for new articles will result in more complete and reproducible
experiments in these and other databases in the future.
Results and Discussion
This specification describes the type of information that must be provided for
published gene expression localization experiments in six sections (Figure 1):
1. Experiment Design
2. Specimens
3. Probe or Antibody Information
4. Staining Protocols and Parameters
5. Imaging Data and Parameters
6. Image Characterization
The following checklist provides a guideline for ensuring that data are compliant with the
specification. It will be useful for researchers preparing to publish data as well as
manuscript reviewers attempting to verify MISFISHIE compliance. Wherever
possible descriptions should use terms from the Microarray Gene Expression Data
Society (MGED) Ontology (MO) [11]. For terms outside the scope of MO, such as
anatomy terms, another appropriate ontology should be used. Use of MO and other
ontologies is especially important as MISFISHIE-supporting applications and
databases are developed. Many of the terms used in this specification are defined in
MO.
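As an illustration only, adherence to the six sections could be checked mechanically. The following Python sketch assumes a hypothetical dictionary-based record keyed by the six section names; the key names, the function `missing_sections`, and the sample record are illustrative assumptions and are not part of the specification, which deliberately does not dictate a format.

```python
# Hypothetical section keys; MISFISHIE prescribes content, not a data format.
MISFISHIE_SECTIONS = (
    "Experiment Design",
    "Specimens",
    "Probe or Antibody Information",
    "Staining Protocols and Parameters",
    "Imaging Data and Parameters",
    "Image Characterization",
)


def missing_sections(record):
    """Return the section names that are absent or empty in `record`."""
    return [name for name in MISFISHIE_SECTIONS if not record.get(name)]


# Illustrative draft submission that so far describes only two sections.
draft = {
    "Experiment Design": {"description": "example description"},
    "Specimens": {"species": "example species"},
}
print(missing_sections(draft))
```

A check of this kind can only confirm that each section is present; whether a section is detailed enough remains a judgment for data producers and reviewers.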
Experiment Design:
This section should contain information about the gene expression localization
experiment as a whole. This includes a brief description of the project, the
ExperimentalFactors (for example, the variables between the assays in the
experiment), the methods, and how to get more information about the experiment
(web sites and persons to contact).

Experiment Description: Short summary of the goals of the experiment.

Assay Type(s): e.g., immunohistochemistry, in situ hybridization, GFP, etc.

Experiment Design Type: For example, is it a comparison of normal vs.
diseased tissue, of multiple tissue specimens of similar type, of multiple
probes/antibodies applied to the same tissue, etc.? See MGED Ontology
ExperimentDesignType as a basis for categorizing the design type.

ExperimentalFactors: the parameters or conditions that are tested, such as
probe/antibody, disease state, genetic variation, structural unit, age, etc.

The total number of hybridizations/stains performed in the experiment: a
hybridization/stain is defined as an assay of a single tissue. Thus, an
immunostain of a section of a tissue microarray consisting of a 10 x 10 array
of different tissues counts as 100 immunostains. If replicates or reruns are a
component of the experimental design, provide details that should include
number of replicates per tissue, per antibody, per probe, etc., as relevant.

URL of any websites or database accession numbers that are related to the
experiment (if available).

Contact information for communicating with the experimenter(s).
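
The counting rule for hybridizations/stains given above can be sketched as a small calculation. This Python fragment is a minimal illustration: the function name `count_stains` and the (rows, columns) representation of a stained section are assumptions made for the example, not part of the specification.

```python
def count_stains(sections):
    """Total hybridizations/stains, counting each tissue on each stained section once.

    `sections` is an iterable of (rows, columns) tissue layouts, one per stained
    section; an ordinary single-tissue section is written as (1, 1).
    """
    return sum(rows * cols for rows, cols in sections)


# One stained section of a 10 x 10 tissue microarray counts as 100 immunostains.
print(count_stains([(10, 10)]))  # 100

# Two antibodies applied to that array plus three single-tissue sections.
print(count_stains([(10, 10), (10, 10), (1, 1), (1, 1), (1, 1)]))  # 203
```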
BioMaterials (specimens) used, and Treatments (section or mount preparation):
Describing specimens comprehensively is difficult, as they may have nearly an
infinite number of characteristics, especially if clinical information is available. The
guiding principle in sample descriptions is whether the supplied information allows
another researcher to use a similar specimen and produce the same results. The
characteristics that are variable between the specimens should be provided with each
specimen, and attributes that characterize all the specimens as a group may be
provided once.

The origin of the biological specimens. Information required includes details of
the organism (species, strain, genotype, sex, age, developmental stage), the
physiologic state of the organism, e.g. normal versus disease, relevant
exogenous factors, e.g. treatment, special diet, and the provider of the
specimen. All information critical for other researchers to be able to reproduce
the biomaterials as closely as possible must be provided. This information is
not limited to the above examples. Referencing an established ontology for the
terms you use is highly encouraged. (The rationale: The location of a
population of cells within a specimen is important to know since location may
correlate with gene expression. Differential gene expression may be
consequent to either tissue handling or to tumor biology. For example, p27
immunostaining is less frequent and less intense in prostate cancer cells that
are farthest from the cut surface and, thus, are least rapidly fixed [12]. And
expression of cell cycle regulatory genes is highest at the periphery of lung
cancers [13].)

The manner of preparation of the specimens for the study. Information
required includes the nature of the samples, i.e. whole tissue, tissue sections,
thickness of sections, whole cells, or sections of cells, manner in which the
specimens were prepared for the experiments, i.e. fixation with type of fixative
and duration of fixation versus fresh, non-fixed, non-frozen specimens, versus
frozen specimens, sections mounted on slides versus floating in reagents,
nature of the slides on which sections were mounted, and the protocols used.
Details of how the specimens were stored until use should be provided. For
example, if frozen samples are used, provide information regarding storage
temperature and duration of storage. Referencing previously published
protocols by PubMed ID is permissible if the protocols were appropriately
detailed and were followed exactly. If the specimens are on glass slides,
details should include section thickness and special characteristics of the
slides, e.g. coating and/or whether the slides are charged.
Reporter (probe or antibody) information:
It is critical to provide generous information about the actual reporters (i.e., probes or
antibodies) used since they can greatly vary in their reactivity from lot to lot and from
manufacturer to manufacturer. A manufacturer’s literature usually provides most of
the needed information; key pieces of information should be listed in addition to a
reference to the manufacturer literature since such references may not be permanent.
For privately produced reporters, enough information needs to be provided so that
another lab could produce an identical reagent.



Unambiguous genomic identification of the reporter:
o At minimum, the gene identifier and the reference database containing
the identifier.
o If available, the full sequence of the probe, or clone identifiers of the
antibody.
o Since such genomic information may not be available for all
antibodies, as much detail as possible that potentially identifies the gene
product(s) being studied should be provided.

Protocol for how the reporters were designed and produced or the source from
which they were obtained. For GFP-like experiments, the promoter sequence
should be specified as the reporter.
o For reporters purchased from a company, the company name and
catalogue number must be provided, as well as the web site that
provides details of the specifications, if available. In addition, key
aspects in the specifications should be repeated since catalogue
numbers and company literature may become unavailable in the future.
o For a custom made antibody, the putative antigen and references to
studies that characterize the sensitivity and specificity of the antibody
in tissue immunostains should be provided.

Additional attributes of the reporter:
o For antibodies, include the type of primary antibody (monoclonal vs.
polyclonal), the immunoglobulin isotype, and the organism in which
the antibody was generated.
o For RNA probes, provide: vector name, cloning site and direction of a
cDNA clone, type of labeling NTPs, in vitro transcription (IVT)
templates (plasmid template linearized with a restriction enzyme or
PCR template with a pair of primers), promoter for IVT labeling
reaction.
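
The minimum reporter items above can be sketched as a small record with a gap check. This is a hedged illustration only: the class `Reporter`, its field names, the function `reporter_gaps`, and the placeholder values are all hypothetical, and the sketch covers just two checklist items (gene identifier with reference database, and company name with catalogue number for purchased reporters). For antibodies without genomic identifiers, a flagged gap would simply prompt the free-text detail that the checklist asks for instead.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reporter:
    # All field names are hypothetical; the specification prescribes content, not format.
    gene_id: Optional[str] = None           # gene identifier
    reference_db: Optional[str] = None      # database containing the identifier
    purchased: bool = False                 # True if obtained from a company
    company: Optional[str] = None
    catalogue_number: Optional[str] = None


def reporter_gaps(r: Reporter) -> List[str]:
    """List checklist items that this reporter description leaves unanswered."""
    gaps = []
    if not (r.gene_id and r.reference_db):
        gaps.append("gene identifier and reference database")
    if r.purchased and not (r.company and r.catalogue_number):
        gaps.append("company name and catalogue number")
    return gaps


# Placeholder values, not real identifiers.
antibody = Reporter(gene_id="EXAMPLE-GENE-ID", reference_db="ExampleDB", purchased=True)
print(reporter_gaps(antibody))  # ['company name and catalogue number']
```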
Staining protocols and parameters:
The protocols used for staining vary considerably between experimenters. The merits
of standardizing these protocols have been discussed extensively in the literature. This
specification merely insists that the protocol used is sufficiently detailed that another
researcher may follow it exactly and arrive at a similar result.

Number of detectable reporters (e.g., more than one for multiple-dye confocal
fluorescence microscopy) on the hybridization or stain plus specific details
about the detection method:
o Detection reagent used (e.g., fluorescent, enzyme-substrate, gold
particles, etc.)
o Source of the detection system plus sufficient details to obtain or
reproduce the reaction product.

The protocol and conditions used to produce the hybridization or
immunostain. This should include the mounting onto the slide/substrate and
subsequent treatments of the section (e.g. the immunohistochemical stain
protocol, including parameters such as buffer, temperature, postwash
conditions, etc.). Also include:
o What steps, if any, were taken to decrease non-specific reaction
product. Such steps for immunoperoxidase experiments might include
preincubation of the specimen preparation with (a) an albumin solution
to block non-specific binding of protein and (b) a peroxide solution to
decrease or abolish reaction product catalyzed by endogenous
peroxidase.
o Details of any antigen or gene product retrieval method that
was used. Such steps for immunohistochemical experiments
could include incubation of sections for a specified time in a specified
buffer subjected to microwave heating for a specified time.

Protocols for the assay controls. Information should include the nature of
negative tissue controls and negative Reporter controls. Optional specificity
Reporter controls, such as competitive inhibition of reaction with either
purified protein or peptide for immunohistochemical studies, should be
described if they were performed.
Imaging data and parameters:
Although the MIAME specification stops short of requiring image data, the present
specification requires that representative images be provided since the interpretation
of in situ or immunohistochemistry images is subject to significant observer
variability. The images should well represent the range of gene expression at different
magnifications. While the images are not needed to ensure reproducibility of an
experiment, they will aid in the full interpretation and analysis of an experiment as
well as aid in determining why an attempt to reproduce an experiment yielded
differing results. Furthermore, many specimens are unique; consequently, exact
reproduction can be problematic or impossible. Both positive and negative results
should be included and reported, since this information is potentially useful for other
work outside the original scope of the reported experiment.
Although not specifically addressed by this specification, it would be of tremendous
value to the community to have an archive of tissue stain images. Such a future effort
would provide examples of tissue localization studies using reagents that are widely
available. This proposed repository could be a reference site for investigators who
want to verify the tissue localizations of Reporter reagents that they might consider
using. Additionally, a general-purpose repository to which researchers could submit
their images for permanent storage with accession numbers for publications would be
very valuable for facilitating MISFISHIE compliance and realizing the full value of
these data to future research. BioImage [14] is such a repository already under
construction at http://www.bioimage.org/.

The images. Any popular file format is acceptable; TIFF or JPEG is preferred.

Image acquisition parameters:
o Detection method by which hybridization or staining is observed (for
each channel, e.g. fluorescent wavelength, etc., if multiple probes or
antibodies are used).
o Image scale or total instrument magnification.
o Image acquisition protocol.
o Imaging hardware and software used.
o Image analysis and/or editing software used (if relevant).
Image Characterizations:
The results as interpreted by the original researchers should be reported in a careful
and consistent manner. This allows reviewers to ensure that the
characterizations are consistent with and representative of the data and, thus, that the
conclusions are reasonable. The characterizations should also be stored in
such a way that they can be easily queried and compared with other expression data.
The type of characterization that is recorded for such data can vary significantly
depending on the experimental design. The following guidelines specify a minimum
set of characterization features. Additional characterization of the images as required
by the experimental design should also be provided.

List ontology entries (including reference to ontology, terms, accession
numbers) (or provide term and definition if sufficient detail cannot be found in
an existing ontology) for each structural unit used for classification. Structural
units will be a type of: organ, tissue, cell, subcellular component, etc.

Choose from the MGED Ontology the staining intensity scale. For example, a
three-point scale of absent, equivocal, or present might be appropriate for
evaluating immunohistochemistry stains. However, any scale that the
investigators feel is appropriate may be used as long as each gradation of
intensity in the scale is defined in a manner that an independent investigator
can apply the same set of characterization criteria.

For each structural unit in each slide (or, in each image), provide quantitative
measurements or estimates of:
o Staining intensity level or optionally, the fraction of the structural unit
population exhibiting each intensity level.
o Other optional annotations/characterizations of the structural unit, e.g.,
feature density, qualitative characteristics or spatial distribution of the
structural unit or staining. Use of referenced ontology terms is
encouraged.
Both positive and negative measurements of staining relevant to the
experiment should be reported.
For example:
Luminal epithelial cell: present
Basal epithelial cell: absent
etc.
or:
Luminal epithelial cell: 90% present, 10% equivocal, 0% absent
Basal epithelial cell: 0% present, 20% equivocal, 80% absent
etc.

Protocol for the characterization. Information about the basic technique for
characterizing assays should be included, e.g. how many observers performed
the characterizations, assessment of inter-observer variability, whether the
characterizations were performed from the images themselves or visually
through the instrument, any exceptions or assumptions made while characterizing
the data, etc.
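
The fractional characterization shown in the example above lends itself to a simple consistency check. The following Python sketch is one possible way to record it, assuming the three-point scale used in the example; the names `INTENSITY_SCALE` and `validate_characterization` are hypothetical and any other defined scale could be substituted.

```python
# Example three-point scale from the text; investigators may define another scale.
INTENSITY_SCALE = ("present", "equivocal", "absent")


def validate_characterization(fractions, scale=INTENSITY_SCALE):
    """Check that percentages use only defined intensity levels and sum to 100."""
    unknown = set(fractions) - set(scale)
    if unknown:
        raise ValueError(f"undefined intensity level(s): {sorted(unknown)}")
    total = sum(fractions.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"fractions sum to {total}, expected 100")
    return True


# The two structural units from the example above, as percentages.
characterizations = {
    "Luminal epithelial cell": {"present": 90, "equivocal": 10, "absent": 0},
    "Basal epithelial cell": {"present": 0, "equivocal": 20, "absent": 80},
}
for unit, fractions in characterizations.items():
    validate_characterization(fractions)
```

Recording both positive and negative levels for every structural unit, as here, keeps absent staining explicit rather than implied by omission.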
An example of a simple, small immunohistochemistry experiment annotated using
this checklist is provided as Additional File 1. Other examples are posted at the
MISFISHIE web site, which should always be available as a link from the MGED
workgroup web page http://www.mged.org/Workgroups/.
[Note to contributing readers and reviewers: the current working version of the
MISFISHIE specification and examples may be accessed directly at:
http://scgap.systemsbiology.net/standards/misfishie/
or indirectly via the stable URL shown above.]
Survey of the recent literature
[The data for the following section are not yet available but an assessment is in
progress.]
In order to assess how the MISFISHIE standard compares with what appears to be
standard practice for publication at this time, a selection of articles in the most recent
12 issues of 3 pathology journals was reviewed for adherence to the 6 categories of
the MISFISHIE specification.
[This section to be written upon completion of the survey.]
[Other topics?]
Conclusions
This specification was jointly developed by members of the NIH/NIDDK Stem Cell
Genome Anatomy Projects consortium in order to facilitate data sharing within the
consortium. After use and refinement within the consortium, we offer the
specification published here as MISFISHIE version 1.0 as a proposal to the whole
research community. Additional suggestions from the community will be collected
and folded into a second release, published at the MISFISHIE web
site: http://www.mged.org/Workgroups/MISFISHIE/.
After a suitable period of discussion and revision, we encourage the biomedical
journals to require compliance with MISFISHIE for all published experiments that
include gene expression localization data, so that investigators independent of the
original authors have sufficient knowledge of the methods to conduct independent
replicative experiments.
Our survey of recent articles indicates that approximately NN% of published works
are compliant with this specification, and …
The latest information about MISFISHIE as well as additional examples may be
obtained via the [MISFISHIE Working Group] link at the MGED web site
http://www.mged.org/.
Methods
[Describe the methodology of surveying the journal articles for compliance here.]
List of Abbreviations
MISFISHIE: Minimum Information Specification For In Situ Hybridization and
Immunohistochemistry Experiments
MIAME: Minimum Information About a Microarray Experiment
GFP: green fluorescent protein
MIAPE: Minimum Information About a Proteomics Experiment
PEDRo: Proteomics Experiment Data Repository
MAGE-OM/ML: MicroArray Gene Expression Object Model/ Markup Language
MGED: Microarray Gene Expression Data Society
XML: Extensible Markup Language
Authors' contributions
[Insert Authors Contributions here]
Acknowledgements
This work has been funded in part with federal funds from the National Institute of
Diabetes & Digestive & Kidney Diseases, National Institutes of Health, under
contract U01 DK63630.
References
1. True LD: Quantitative immunohistochemistry: a new tool for surgical
pathology? Am J Clin Pathol 1988, 90(3):324-325.
2. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C,
Aach J, Ansorge W, Ball CA, Causton HC et al: Minimum information
about a microarray experiment (MIAME)-toward standards for
microarray data. Nat Genet 2001, 29(4):365-371.
3. Spellman PT, Miller M, Stewart J, Troup C, Sarkans U, Chervitz S, Bernhart
D, Sherlock G, Ball C, Lepage M et al: Design and Implementation of
Microarray Gene Expression Markup Language (MAGE-ML). Genome
Biology 2002, 3(9):RESEARCH0046.
4. Taylor CF, Paton NW, Garwood KL, Kirby PD, Stead DA, Yin Z, Deutsch
EW, Selway L, Walker J, Riba-Garcia I et al: A Systematic Approach to
Modeling Capturing and Disseminating Proteomics Experimental Data.
Nature Biotechnology 2003, 21(3):247.
5. Garwood K, McLaughlin T, Garwood C, Joens S, Morrison N, Taylor CF,
Carroll K, Evans C, Whetton AD, Hart S et al: PEDRo: A database for
storing, searching and disseminating experimental proteomics data. BMC
Genomics 2004, 5(1):68.
6. Stoeckert C et al: Drug Discovery Today 2004, 3:159-164.
7. Swanson PE: Methodologic Standardization in Immunohistochemistry: A
Doorway Opens. Applied Immunohistochemistry 1993, 1(4):229-231.
8. Taylor CR: An exaltation of experts: concerted efforts in the
standardization of immunohistochemistry. Hum Pathol 1994, 25(1):2-11.
9. Hill DP, Begley DA, Finger JH, Hayamizu TF, McCright IJ, Smith CM, Beal
JS, Corbani LE, Blake JA, Eppig JT et al: The mouse Gene Expression
Database (GXD): updates and enhancements. Nucleic Acids Res 2004, 32
Database issue:D568-571.
10. Baldock RA, Bard JB, Burger A, Burton N, Christiansen J, Feng G, Hill B,
Houghton D, Kaufman M, Rao J et al: EMAP and EMAGE: a framework
for understanding spatially organized data. Neuroinformatics 2003,
1(4):309-325.
11. Stoeckert C, Parkinson H, Whetzel T, Spellman P, Ball CA, White J, Matese J,
Fan L, Fragoso G, Heiskanen M et al: The MGED Ontology:
http://mged.sourceforge.net/ontologies/MGEDontology.php. 2004.
12. De Marzo AM, Fedor HH, Gage WR, Rubin MA: Inadequate formalin
fixation decreases reliability of p27 immunohistochemical staining:
probing optimal fixation time using high-density tissue microarrays. Hum
Pathol 2002, 33(7):756-760.
13. Dobashi Y, Shoji M, Jiang SX, Kobayashi M, Kawakubo Y, Kameya T:
Active cyclin A-CDK2 complex, a possible critical factor for cell
proliferation in human primary lung carcinomas. Am J Pathol 1998,
153(3):963-972.
14. Carazo JM, Stelzer EH: The BioImage Database Project: organizing
multidimensional biological images in an object-relational database. J
Struct Biol 1999, 125(2-3):97-102.
Figures
[Figure 1 diagram labels: Experimental Design; Specimens; Probes and Antibodies;
Staining Protocols; Imaging Data; Image Characterization]
Figure 1: The six sections of the MISFISHIE specification.
Tables
[Insert Tables here]
Additional files
Additional file 1: The HTML seen in MISFISHIE Example 1 currently at:
http://scgap.systemsbiology.net/standards/misfishie/example1.php