Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE)

Eric W Deutsch1, Lawrence D True2, Martin Korb1, David Campbell1, Sean Grimmond3, Michael H Johnson1, Christian J Stoeckert, Jr.6, Yi Zhou4, Alvin Y Liu5, and certainly many more

1 Institute for Systems Biology, 1441 N 34th Street, Seattle, WA 98103, USA
2 Department of Pathology, University of Washington, Seattle, WA 98195-7705, USA
3 University of Queensland, St. Lucia, Qld, Australia
4 Children's Hospital Boston and Harvard Medical School, Boston, MA, USA
5 Department of Urology, University of Washington, Seattle, WA 98195, USA
6 Center for Bioinformatics and Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA

Likely submission target: Genome Biology

Abstract

Background
One goal of the biomedical literature is to report results in a manner sufficiently detailed so that the methods of data collection and analysis can be independently replicated and verified. In order to assure this level of detail in published works, it is useful to define a minimum information specification for each experimental data type and to insist on adherence to this standard for published material. Such a specification has been widely accepted by, and has benefited, the microarray community, and efforts are well underway for proteomics data types. However, no such specification yet exists for visual interpretation-based tissue gene expression localization experiments, e.g. in situ hybridization and immunohistochemistry experiments.

Results
We present such a specification, termed “Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE)”. It is modelled after the MIAME (Minimum Information About a Microarray Experiment) specification for microarray experiments.
These data standards detail the content of the information that is deemed necessary to satisfy the requirements of the standards; these standards do not dictate a specific format for encoding the information. The MISFISHIE standard describes the information that must be provided in six sections: Experimental Design, Specimens, Probe or Antibody Information, Staining Protocols and Parameters, Imaging Data and Parameters, and Image Characterizations. A general checklist is provided to quickly and efficiently establish adherence to the standard. We surveyed articles in the most recent 12 issues of 3 pathology journals for potential adherence to this proposed specification and found that approximately NN% lack sufficient information for independent investigators to readily repeat the study. [These numbers are preliminary. Comments withheld until survey completed and validated.]

Conclusion
This specification was jointly developed by members of the NIH/NIDDK Stem Cell Genome Anatomy Projects consortium in order to facilitate data sharing within the consortium. Use of the standard has benefited the consortium and would likely benefit the whole research community. We urge journals to encourage and eventually require compliance with MISFISHIE for all studies that include gene expression localization data, so that all published data and resulting conclusions may be properly interpreted and so that independent investigators have the information that would enable them to replicate the findings. More information and examples may be obtained at http://scgap.systemsbiology.net/standards/misfishie/.

Background

Gene expression localization experiments, such as in situ hybridization and immunohistochemistry, are frequently used to determine the exact source of expression observed in higher-throughput assays (e.g., microarray, proteomics, etc.).
For example, most samples derived from tissue contain many different cell types, and the expression profile of each cell type is likely to be different. Consequently, the component of a tissue-based sample that is responsible for the expression signal of a particular gene is ambiguous. Confirming the location and degree of gene expression within a tissue and cell type is often achieved by staining a section of the tissue with a suitable reporter to the gene or gene product of interest.

However, in situ hybridization and immunohistochemistry stains and/or images are often presented with minimal interpretation and methodology. Furthermore, neither the reagents and methods used in the experiments nor the results are easily searchable. Above all, the interpretation of in situ hybridization and immunohistochemistry stains varies between observers, between different image analysis platforms and programs, and even between different sessions using the same image analysis platform and program [1].

Data standards have emerged for microarray data [2, 3] to the great benefit of the entire biological community. [Explain how.] Additional standards are under development for other high-throughput technologies [4, 5]. In order to ensure that results from new gene expression localization experiments are of maximum benefit to the biomedical community, we propose a minimum information standard, termed “Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE)”. This standard describes the minimum information that should be provided when publishing, making public, or exchanging results from visual interpretation-based tissue gene expression localization experiments such as in situ hybridization, immunohistochemistry, reporter construct genetic experiments (GFP/green fluorescent protein, β-galactosidase), etc.
Compliance with this standard is expected to provide researchers at other labs enough information to reproduce the experiment and/or to fully evaluate the data upon which results are based.

Modeled after the well-received MIAME (Minimum Information About a Microarray Experiment) specification for microarray experiments [2], this specification only describes the kind of information that should be provided. MISFISHIE does not list every parameter that could be specified about any experiment, but rather lists broader categories of detail that must be addressed, relying on the data producers and reviewers to ensure that each section contains enough information for readers to be able to fully assess the validity of, and accurately reproduce, the experiment being described. As MIAME has been a yardstick for whether databases and publications have done their job in adequately describing a microarray experiment such that its conclusion could be verified or refuted [6], we hope MISFISHIE will provide a similar function for gene expression localization experiments.

Furthermore, this specification does not dictate a specific format for reporting the information. We expect to develop a data model based on the concepts of MAGE-OM (MicroArray Gene Expression Object Model [3]) and on the MAGEstk (MicroArray Gene Expression software tool kit [3]) in the near future. It is this model and the associated XML-based mark-up language that will provide the recommended data format for archiving or transferring such data. Since MAGE-OM version 2 (perhaps FuGE-OM) [ref to Angel’s paper] is being designed to be able to include other functional genomics experiments in a modular fashion, it is likely that the MISFISHIE-derived object model would be integrated within FuGE-OM rather than separate from it.
Separation of the minimum information specification and the data format is important because the data format should allow for unlimited additional information beyond the minimum and should also allow incomplete information to be encoded, for optimum flexibility. In addition, broader acceptance of the minimum information required will greatly aid the design of a data model.

It has long been recognized that improved standards for immunohistochemistry are needed [7, 8]. However, these standardization discussions have largely focused either on developing standardized technical protocols which might be used to produce more uniform stains [7], or on reducing the subjectivity of interpreting histologic sections [8]. We do not attempt here to promote standardized methodologies for production or interpretation of assays (although that is certainly encouraged). Rather, we wish to promote complete disclosure of the methods actually used so that any result may be replicated by others using the identical procedure that the initial investigators used.

While no official minimum standard for such data yet exists, there have been several efforts at organizing gene expression localization data in databases, and such database designs provide a useful framework from which to build a standard. Two notable databases specialized for the mouse community, the Mouse Gene Expression Database (GXD) [9] and the Edinburgh Mouse Atlas Gene Expression (EMAGE) database [10], have influenced the design of MISFISHIE. Mouse-specific fields in these databases have been removed in favour of more organism-neutral requirements. Furthermore, several fields in these databases were deemed useful but not a minimum requirement and were thus not included here. Many experiments that curators extracted from journal articles into these databases have many empty fields because the experiments were not sufficiently described in the original articles.
Requiring MISFISHIE compliance for new articles will result in more complete and reproducible experiment records in these and other databases in the future.

Results and Discussion

This specification describes the type of information that must be provided for published gene expression localization experiments in six sections (Figure 1):
1. Experiment Design
2. Specimens
3. Probe or Antibody Information
4. Staining Protocols and Parameters
5. Imaging Data and Parameters
6. Image Characterization

The following checklist provides a guideline for ensuring that data are compliant with the specification. It will be useful for researchers preparing to publish data as well as manuscript reviewers attempting to verify MISFISHIE compliance. Wherever possible, descriptions should use terms from the Microarray Gene Expression Data Society (MGED) Ontology (MO) [11]. For terms outside the scope of MO, such as anatomy terms, another appropriate ontology should be used. Use of MO and other ontologies is especially important as MISFISHIE-supporting applications and databases are developed. Many of the terms used in this specification are defined in MO.

Experiment Design: This section should contain information about the gene expression localization experiment as a whole. This includes a brief description of the project, the ExperimentalFactors (for example, the variables between the assays in the experiment), the methods, and how to get more information about the experiment (web sites and persons to contact).
Experiment Description: Short summary of the goals of the experiment.
Assay Type(s): e.g., immunohistochemistry, in situ hybridization, GFP, etc.
Experiment Design Type: For example, is it a comparison of normal vs. diseased tissue, of multiple tissue specimens of similar type, of multiple probes/antibodies applied to the same tissue, etc.? See MGED Ontology ExperimentDesignType as a basis for categorizing the design type.
ExperimentalFactors: the parameters or conditions that are tested, such as probe/antibody, disease state, genetic variation, structural unit, age, etc.
The total number of hybridizations/stains performed in the experiment: a hybridization/stain is defined as an assay of a single tissue. Thus, an immunostain of a section of a tissue microarray consisting of a 10 x 10 array of different tissues counts as 100 immunostains. If replicates or reruns are a component of the experimental design, provide details that should include the number of replicates per tissue, per antibody, per probe, etc., as relevant.
URL of any websites or database accession numbers that are related to the experiment (if available).
Contact information for communicating with the experimenter(s).

BioMaterials (specimens) used, and Treatments (section or mount preparation): Describing specimens comprehensively is difficult, as they may have a nearly infinite number of characteristics, especially if clinical information is available. The guiding principle in sample descriptions is whether the supplied information allows another researcher to use a similar specimen and produce the same results. The characteristics that vary between the specimens should be provided with each specimen, and attributes that characterize all the specimens as a group may be provided once.
The origin of the biological specimens. Information required includes details of the organism (species, strain, genotype, sex, age, developmental stage), the physiologic state of the organism (e.g. normal versus disease), relevant exogenous factors (e.g. treatment, special diet), and the provider of the specimen. All information critical for other researchers to be able to reproduce the biomaterials as closely as possible must be provided. This information is not limited to the above examples. Referencing an established ontology for the terms you use is highly encouraged.
(The rationale: The location of a population of cells within a specimen is important to know since location may correlate with gene expression. Differential gene expression may be consequent to either tissue handling or to tumor biology. For example, p27 immunostaining is less frequent and less intense in prostate cancer cells that are farthest from the cut surface and, thus, are least rapidly fixed [12]. Also, expression of cell cycle regulatory genes is highest at the periphery of lung cancers [13].)
The manner of preparation of the specimens for the study. Information required includes the nature of the samples (e.g. whole tissue, tissue sections and their thickness, whole cells, or sections of cells); the manner in which the specimens were prepared for the experiments (e.g. fixation, with type of fixative and duration of fixation, versus fresh, non-fixed, non-frozen specimens, versus frozen specimens); whether sections were mounted on slides or floating in reagents; the nature of the slides on which sections were mounted; and the protocols used. Details of how the specimens were stored until use should be provided. For example, if frozen samples are used, provide information regarding storage temperature and duration of storage. Referencing previously published protocols by PubMed ID is permissible if the protocols were appropriately detailed and were followed exactly. If the specimens are on glass slides, details should include section thickness and special characteristics of the slides, e.g. coating and/or whether the slides are charged.

Reporter (probe or antibody) information: It is critical to provide generous information about the actual reporters (i.e., probes or antibodies) used since they can vary greatly in their reactivity from lot to lot and from manufacturer to manufacturer.
A manufacturer’s literature usually provides most of the needed information; key pieces of information should be listed in addition to a reference to the manufacturer literature since such references may not be permanent. For privately produced reporters, enough information needs to be provided so that another lab could produce the identical compound.
Unambiguous genomic identification of the reporter:
o At minimum, the gene identifier and the reference database containing the identifier.
o If available, the full sequence of the probe, or clone identifiers of the antibody.
o Since such genomic information may not be available for all antibodies, as much detail as possible that identifies the gene product(s) being studied should be provided.
Protocol for how the reporters were designed and produced or the source from which they were obtained. For GFP-like experiments, the promoter sequence should be specified as the reporter.
o For reporters purchased from a company, the company name and catalogue number must be provided, as well as the web site that provides details of the specifications, if available. In addition, key aspects in the specifications should be repeated since catalogue numbers and company literature may become unavailable in the future.
o For a custom-made antibody, the putative antigen and references to studies that characterize the sensitivity and specificity of the antibody in tissue immunostains should be provided.
Additional attributes of the reporter:
o For antibodies, include the type of primary antibody (monoclonal vs. polyclonal), the immunoglobulin isotype, and the organism in which the antibody was generated.
o For RNA probes, provide: vector name, cloning site and direction of a cDNA clone, type of labeling NTPs, in vitro transcription (IVT) templates (plasmid template linearized with a restriction enzyme or PCR template with a pair of primers), and promoter for the IVT labeling reaction.
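The reporter checklist above could be captured in a small record with a completeness check. The following Python sketch is only an illustration: MISFISHIE does not prescribe any data format, so the class name `Reporter`, its field names, and the function `missing_reporter_fields` are all hypothetical. It also simplifies the checklist by treating the gene identifier as always required, although the text allows for antibodies that lack genomic identification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reporter:
    """One probe or antibody. Field names are illustrative, not part of MISFISHIE."""
    kind: str                               # "antibody" or "rna_probe"
    gene_id: Optional[str] = None           # gene identifier
    gene_db: Optional[str] = None           # reference database holding the identifier
    vendor: Optional[str] = None            # company name, if purchased
    catalogue_number: Optional[str] = None  # catalogue number, if purchased
    clonality: Optional[str] = None         # "monoclonal" or "polyclonal"
    isotype: Optional[str] = None           # immunoglobulin isotype
    host_organism: Optional[str] = None     # organism in which the antibody was raised


def missing_reporter_fields(r: Reporter) -> List[str]:
    """Return the names of checklist fields that are still empty."""
    missing = []
    if not r.gene_id:
        missing.append("gene_id")
    if not r.gene_db:
        missing.append("gene_db")
    # A purchased reporter needs both the company name and the catalogue number.
    if bool(r.vendor) != bool(r.catalogue_number):
        missing.append("catalogue_number" if r.vendor else "vendor")
    # Antibody-specific attributes from the checklist.
    if r.kind == "antibody":
        for name in ("clonality", "isotype", "host_organism"):
            if not getattr(r, name):
                missing.append(name)
    return missing


# Placeholder values only; they do not refer to a real gene, database, or company.
draft = Reporter(kind="antibody", gene_id="GENE-0001", gene_db="ExampleDB",
                 vendor="Example Co.")
print(missing_reporter_fields(draft))
```

A reviewer or a submission tool could run such a check before the free-text parts of the record (production protocol, repeated key specifications) are read by a person, since those parts cannot be validated mechanically.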

Staining protocols and parameters: The protocols used for staining vary considerably between experimenters. The merits of standardizing these protocols have been discussed extensively in the literature. This specification merely insists that the protocol used is sufficiently detailed that another researcher may follow it exactly and arrive at a similar result.
Number of detectable reporters (e.g., more than one for multiple-dye confocal fluorescence microscopy) on the hybridization or stain plus specific details about the detection method:
o Detection reagent used (e.g., fluorescent, enzyme-substrate, gold particles, etc.)
o Source of the detection system plus sufficient details to obtain or reproduce the reaction product.
The protocol and conditions used to produce the hybridization or immunostain. This should include the mounting onto the slide/substrate and subsequent treatments of the section (e.g., the immunohistochemical stain protocol, including parameters such as buffer, temperature, post-wash conditions, etc.). Also include:
o What steps, if any, were taken to decrease non-specific reaction product. Such steps for immunoperoxidase experiments might include preincubation of the specimen preparation with (a) an albumin solution to block non-specific binding of protein and (b) a peroxide solution to decrease or abolish reaction product catalyzed by endogenous peroxidase.
o Details of any antigen or gene product retrieval method that was used. Such steps for immunohistochemical experiments could include incubation of sections for a specified time in a specified buffer subjected to microwave heating for a specified time.
Protocols for the assay controls. Information should include the nature of negative tissue controls and negative Reporter controls. Optional specificity Reporter controls, such as competitive inhibition of reaction with either purified protein or peptide for immunohistochemical studies, should be described if they were performed.

Imaging data and parameters: Although the MIAME specification stops short of requiring image data, the present specification requires that representative images be provided since the interpretation of in situ or immunohistochemistry images is subject to significant observer variability. The images should adequately represent the range of gene expression at different magnifications. While the images are not needed to ensure reproducibility of an experiment, they will aid in the full interpretation and analysis of an experiment as well as in determining why an attempt to reproduce an experiment yielded differing results. Furthermore, many specimens are unique; consequently, exact reproduction can be problematic or impossible. Both positive and negative results should be included and reported, since this information is potentially useful for other work outside the original scope of the reported experiment.

Although not specifically addressed by this specification, it would be of tremendous value to the community to have an archive of tissue stain images. Such a future effort would provide examples of tissue localization studies using reagents that are widely available. This proposed repository could be a reference site for investigators who want to verify the tissue localizations of Reporter reagents that they might consider using. Additionally, a general-purpose repository to which researchers could submit their images for permanent storage with accession numbers for publications would be very valuable for facilitating MISFISHIE compliance and realizing the full value of these data to future research. BioImage [14] is such a repository already under construction at http://www.bioimage.org/.
The images. Any popular file format is acceptable; TIFF or JPEG is preferred.
Image acquisition parameters:
o Detection method by which hybridization or staining is observed (for each channel, e.g. fluorescent wavelength, etc., if multiple probes or antibodies are used).
o Image scale or total instrument magnification.
o Image acquisition protocol.
o Imaging hardware and software used.
o Image analysis and/or editing software used (if relevant).

Image Characterizations: The results as interpreted by the original researchers should be reported in a careful and consistent manner. This allows reviewers to ensure that the characterizations are consistent with and representative of the data and, thus, that the conclusions are reasonable. It also allows the characterizations to be stored in such a way that they can be easily queried and compared with other expression data. The type of characterization that is recorded for such data can vary significantly depending on the experimental design. The following guidelines specify a minimum set of characterization features. Additional characterization of the images as required by the experimental design should also be provided.
List ontology entries (including reference to ontology, terms, accession numbers), or provide a term and definition if sufficient detail cannot be found in an existing ontology, for each structural unit used for classification. Structural units will be a type of: organ, tissue, cell, subcellular component, etc.
Choose the staining intensity scale from the MGED Ontology. For example, a three-point scale of absent, equivocal, or present might be appropriate for evaluating immunohistochemistry stains. However, any scale that the investigators feel is appropriate may be used as long as each gradation of intensity in the scale is defined in such a manner that an independent investigator can apply the same set of characterization criteria.
For each structural unit in each slide (or in each image), provide quantitative measurements or estimates of:
o Staining intensity level or, optionally, the fraction of the structural unit population exhibiting each intensity level.
o Other optional annotations/characterizations of the structural unit, e.g., feature density, qualitative characteristics or spatial distribution of the structural unit or staining. Use of referenced ontology terms is encouraged.
Both positive and negative measurements of staining relevant to the experiment should be reported. For example:
Luminal epithelial cell: present
Basal epithelial cell: absent
etc.
or:
Luminal epithelial cell: 90% present, 10% equivocal, 0% absent
Basal epithelial cell: 0% present, 20% equivocal, 80% absent
etc.
Protocol for the characterization. Information about the basic technique for characterizing assays should be included, e.g. how many observers performed the characterizations, assessment of inter-observer variability, whether the characterizations were performed from the images themselves or visually through the instrument, any exceptions or assumptions made while characterizing the data, etc.

An example of a simple, small immunohistochemistry experiment annotated using this checklist is provided as Additional File 1. Other examples are posted at the MISFISHIE web site, which should always be available as a link from the MGED workgroup web page http://www.mged.org/Workgroups/. [Note to contributing readers and reviewers: the current working version of the MISFISHIE specification and examples may be accessed directly at: http://scgap.systemsbiology.net/standards/misfishie/ or indirectly via the stable URL shown above.]

Survey of the recent literature

[The data for the following section are not yet available but an assessment is in progress.] In order to assess how the MISFISHIE standard compares with what appears to be standard practice for publication at this time, a selection of articles in the most recent 12 issues of 3 pathology journals was reviewed for adherence to the 6 categories of the MISFISHIE specification. [This section to be written upon completion of the survey.] [Other topics?]
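
Returning to the Image Characterizations checklist above, the per-structural-unit intensity fractions lend themselves to a simple consistency check. The Python sketch below is hypothetical: the function name `validate_characterization`, the dictionary layout, and the tolerance are assumptions made for illustration, and only the three-point scale and the example values come from the text. It verifies that every reported level belongs to the declared scale and that the fractions for each structural unit sum to 100%.

```python
from typing import Dict, List, Sequence

# Three-point scale used as an example in the text.
SCALE = ("present", "equivocal", "absent")


def validate_characterization(
    fractions: Dict[str, Dict[str, float]],
    scale: Sequence[str] = SCALE,
    tol: float = 1e-6,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    for unit, levels in fractions.items():
        # Every reported level must be defined in the chosen intensity scale.
        unknown = [lvl for lvl in levels if lvl not in scale]
        if unknown:
            problems.append(f"{unit}: unknown intensity level(s) {unknown}")
        # Percentages across levels should account for the whole population.
        total = sum(levels.values())
        if abs(total - 100.0) > tol:
            problems.append(f"{unit}: fractions sum to {total}, expected 100")
    return problems


# The second worked example from the checklist, as percentages.
example = {
    "Luminal epithelial cell": {"present": 90, "equivocal": 10, "absent": 0},
    "Basal epithelial cell": {"present": 0, "equivocal": 20, "absent": 80},
}
print(validate_characterization(example))
```

Storing characterizations in a keyed structure of this kind, with structural units ideally identified by ontology accession rather than free text, is what would make them queryable and comparable across experiments.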

Conclusions

This specification was jointly developed by members of the NIH/NIDDK Stem Cell Genome Anatomy Projects consortium in order to facilitate data sharing within the consortium. After use and refinement within the consortium, we offer the specification published here as MISFISHIE version 1.0 as a proposal to the whole research community. Additional suggestions from the community will be collected and folded into a second release, published at the MISFISHIE web site: http://www.mged.org/Workgroups/MISFISHIE/. After a suitable period of discussion and revision, we encourage the biomedical journals to require compliance with MISFISHIE for all published experiments that include gene expression localization data, so that investigators independent of the original authors have sufficient knowledge of the methods to conduct independent replicative experiments. Our survey of recent articles indicates that approximately NN% of published works are compliant with this specification, and … The latest information about MISFISHIE as well as additional examples may be obtained via the [MISFISHIE Working Group] link at the MGED web site http://www.mged.org/.

Methods

[Describe the methodology of surveying the journal articles for compliance here.]

List of Abbreviations

MISFISHIE: Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments
MIAME: Minimum Information About a Microarray Experiment
GFP: green fluorescent protein
MIAPE: Minimum Information About a Proteomics Experiment
PEDRo: Proteomics Experiment Data Repository
MAGE-OM/ML: MicroArray Gene Expression Object Model/Markup Language
MGED: Microarray Gene Expression Data Society
XML: Extensible Markup Language

Authors' contributions

[Insert Authors Contributions here]

Acknowledgements

This work has been funded in part with federal funds from the National Institute of Diabetes & Digestive & Kidney Diseases, National Institutes of Health, under contract U01 DK63630.

References

1. True LD: Quantitative immunohistochemistry: a new tool for surgical pathology? Am J Clin Pathol 1988, 90(3):324-325.
2. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC et al: Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 2001, 29(4):365-371.
3. Spellman PT, Miller M, Stewart J, Troup C, Sarkans U, Chervitz S, Bernhart D, Sherlock G, Ball C, Lepage M et al: Design and Implementation of Microarray Gene Expression Markup Language (MAGE-ML). Genome Biology 2002, 3(9):RESEARCH0046.
4. Taylor CF, Paton NW, Garwood KL, Kirby PD, Stead DA, Yin Z, Deutsch EW, Selway L, Walker J, Riba-Garcia I et al: A Systematic Approach to Modeling Capturing and Disseminating Proteomics Experimental Data. Nature Biotechnology 2003, 21(3):247.
5. Garwood K, McLaughlin T, Garwood C, Joens S, Morrison N, Taylor CF, Carroll K, Evans C, Whetton AD, Hart S et al: PEDRo: A database for storing, searching and disseminating experimental proteomics data. BMC Genomics 2004, 5(1):68.
6. Stoeckert C, et al: Drug Discovery Today 2004, 3:159-164.
7. Swanson PE: Methodologic Standardization in Immunohistochemistry: A Doorway Opens. Applied Immunohistochemistry 1993, 1(4):229-231.
8. Taylor CR: An exaltation of experts: concerted efforts in the standardization of immunohistochemistry. Hum Pathol 1994, 25(1):2-11.
9. Hill DP, Begley DA, Finger JH, Hayamizu TF, McCright IJ, Smith CM, Beal JS, Corbani LE, Blake JA, Eppig JT et al: The mouse Gene Expression Database (GXD): updates and enhancements. Nucleic Acids Res 2004, 32 Database issue:D568-571.
10. Baldock RA, Bard JB, Burger A, Burton N, Christiansen J, Feng G, Hill B, Houghton D, Kaufman M, Rao J et al: EMAP and EMAGE: a framework for understanding spatially organized data. Neuroinformatics 2003, 1(4):309-325.
11. Stoeckert C, Parkinson H, Whetzel T, Spellman P, Ball CA, White J, Matese J, Fan L, Fragoso G, Heiskanen M et al: The MGED Ontology: http://mged.sourceforge.net/ontologies/MGEDontology.php. 2004.
12. De Marzo AM, Fedor HH, Gage WR, Rubin MA: Inadequate formalin fixation decreases reliability of p27 immunohistochemical staining: probing optimal fixation time using high-density tissue microarrays. Hum Pathol 2002, 33(7):756-760.
13. Dobashi Y, Shoji M, Jiang SX, Kobayashi M, Kawakubo Y, Kameya T: Active cyclin A-CDK2 complex, a possible critical factor for cell proliferation in human primary lung carcinomas. Am J Pathol 1998, 153(3):963-972.
14. Carazo JM, Stelzer EH: The BioImage Database Project: organizing multidimensional biological images in an object-relational database. J Struct Biol 1999, 125(2-3):97-102.

Figures

Figure 1: The six sections of the MISFISHIE specification. [Figure panel labels: Experimental Design, Specimens, Probes and Antibodies, Staining Protocols, Imaging Data, Image Characterization.]

Tables

[Insert Tables here]

Additional files

Additional file 1: The HTML seen in MISFISHIE Example 1 currently at: http://scgap.systemsbiology.net/standards/misfishie/example1.php