Guidelines for Data Presentation

Objective
• Provide a framework that can be used as a tool for advancing standardized data presentation

Data Guidelines
• Experimental Design: sample procurement; sample preparation; Fix/Perm; which fluorophore
• Controls: isotype?; single color; FMO
• Instrumentation: appropriate lasers; appropriate filters
• Instrument Settings: Lin vs Log; Time; A, W, H
• Interpretation: mean, median; %+; CV; SD; signal/noise
• Gating
• Analysis
• Presentation: histogram; dot plot; density plot; overlay; bar graph

First, let's address the problem
• Data analysis incorporates many disciplines, including instrumentation, statistics, biology, and photonics. Often, knowledge in one of these areas is missing.
• Many different instruments and software packages are available.
• Historical precedent
  o Unfortunately, there is a large body of published work with poor data and no clear guidelines.

Some Examples of Poor Data Presentation
(2011 Nature article)
- Arbitrary and difficult-to-replicate gate
- On-axis data are difficult to visualize, interpret, and review

Some Examples of Poor Data Presentation
- eBioscience product literature
- Normal peripheral blood stained with the listed reagents
- That’s some bright CD19 and dim CD3
- The ratio between B and T cells seems off for normal blood

Some Examples of Poor Data Presentation
From Nature Medicine, 1998. Human stem cells were injected into NOD/SCID mice and were reported to reconstitute multiple lineages.
Panels: Myeloid; B Cells; T Cells…; CD4 & CD8

Some Examples of Poor Data Presentation
“Medium-to-high FS”? Did they backgate to ensure this was the correct gate?

Some Examples of Poor Data Presentation
An isotype control for two channels? Which one? (CD45 was on yet a third channel; no control for that?)
How was the gate actually defined on this control?
Impossible to estimate the amount of background staining in this histogram: a gate is needed to express it!
Other graphs are shown as bivariate displays, which makes it difficult to translate between them. % Pos?
Some Examples of Poor Data Presentation
Why are cells expressing both markers?
If these are of myeloid origin, then why is a lymphocyte gate (“R1”) applied?
The cells on the diagonal look like nonspecific staining, and in fact were probably present in the isotype control.

Some Examples of Poor Data Presentation
Nearly 100% of cells are expressing CD19. If so, then there is no “room” left over for other lineages…
The data appear self-contradictory. But without percentages, we cannot tell.

Some Examples of Poor Data Presentation
Same problem as for the “myeloid” cells: the CD2+CD3+ cells appear to be nonspecifically stained.
The CD4 and CD8 distributions don’t look like typical mature T cells… and what about the CD4+CD8+ cells?

Some Examples of Poor Data Presentation
Why do graphs “e” and “h” have so many events compared to graphs “d”, “f”, and “g”?
R1 + R2 (2.5%) represents very few events…

Some Examples of Poor Data Presentation
FITC and PE appear to be over-compensated.

Example 1
An Example of Poor Data Presentation: Summary
Critical analysis of this figure shows that it does not support the contentions of the authors. This does not mean that the authors were wrong.
Reviewers should have demanded a more rigorous example dataset… but perhaps the reviewers were not FACS experts.
Guidelines can educate.
Unfortunately, this example is neither unique… nor even uncommon.

Research Misconduct Inquiry
The Division of Investigative Oversight, Office of Research Integrity, is currently swamped with requests for flow cytometry-related Research Misconduct Inquiries.
Currently, a majority of these cases display blatant intentional fraud. However, there is a significant trend pushing for flow-related guidelines, and the onus on investigators for proper representation of data is growing.

Research Misconduct
Research Misconduct is defined by law: 42 CFR Parts 50 and 93.
Sections 93.103 & 104: Research misconduct is defined as fabrication, falsification, or plagiarism … in reporting research results.
Falsification is manipulating … changing or omitting data or results such that the research is not accurately represented in the research record.
Misconduct can be committed intentionally, knowingly, or recklessly.

There is no wrong way to analyze your data
Meaning- Investigators are free to choose:
• Which plot types to display
• Placement of gates for analysis
• Which statistics to use
• # of events to display or collect
• Which software package to use
• How many times to reanalyze

There is definitely a wrong way to analyze your data
Meaning- Investigators' decisions can lead to incorrect data generation or interpretation:
• Inappropriate gates for analysis (lymphocyte gate for CD15 staining, or inconsistent gates)
• Misleading or inconsistent plots for display
• Inappropriate controls (e.g., using isotype for gating)
• Inappropriate number of events collected (too few events for meaningful and accurate statistical comparison)

Implementation of Guidelines by J. Exp. Med.
A set of guidelines for publication of flow cytometry data has been implemented by the Journal of Experimental Medicine.
All papers submitted for review will be required to comply with the guidelines, with submission of supplementary information, in order to be reviewed.
Papers with sophisticated flow cytometric analysis may undergo an independent review to ensure the appropriateness of the analysis and presentation.

MIFlowCyt
Minimum Information about a Flow Cytometry Experiment
ISAC Recommendation
The fundamental tenet of scientific research is that the published results of any study have to be open to independent validation or refutation.
The MIFlowCyt establishes criteria for recording and reporting information about the flow cytometry experiment overview, samples, instrumentation, and data analysis.
It promotes consistent annotation of clinical, biological, and technical issues surrounding a flow cytometry experiment by specifying the requirements for data content and by providing a structured framework for capturing information.

Guidelines: Why do we need them?
• A consistent presentation style ensures better communication of data to readers and listeners
• Speaking a common language
• Faster interpretation; understanding nuances
• Provides a level of confidence that the data have been appropriately generated and analyzed
• Allows reviewers and readers to focus on the point of the presentation, avoiding distractions from inappropriate or inconsistent presentations

Guidelines: What they are NOT
They will not define how to do science or how to analyze and interpret the data.
In most cases, they are not requirements; they simply codify the “between the lines” information.
They will neither prevent nor reduce purposeful fraud.
They can reduce reckless science.
They can reduce confusion and ambiguity within published data.

Introduction
Principles and Guidelines
A few examples of the principles and guidelines for data presentation follow.

Hardware/Software
Principles and Guidelines
Information about the instrument configuration should be provided.
Why: Different configurations (lasers, filters, etc.) can result in very different sensitivities, compensation requirements, etc. Some experiments (for example, fluorescence intensity comparisons across different days) require that the instrument be carefully calibrated. Interpretation of the significance of the results may require knowledge of these procedures.

Instrument
• Manufacturer
• Identify the FACS instrument and software used to collect, compensate, and analyze the data.
• Include Model and Version where more than one exists.
• Light source
  o Type
  o Wavelength
  o Power
• Optics: band pass, long pass, 530/30

Hardware/Software
Instrument Configuration
Providing instrument configuration is a delicate balance between giving enough information to be useful and giving so much that it is no longer helpful.
Instrument configuration can be summarized in three sections:
• Optical
• QA/QC
• Compensation
There is no “right” procedure (but there are “wrong” procedures for some kinds of experiments). Knowing the instrument configuration is necessary to fully interpret data.

Hardware/Software
Instrument Configuration: Optical
The optical configuration determines what fluorescence measurements were made by the instrument. There are two tables: one for lasers, the other for detectors.

Lasers
Number   Wavelength   Power and Type
1        488 nm       15 mW Argon Ion
2        532 nm       200 mW Pulsed Diode
3        408 nm       25 mW Diode
4        635 nm       35 mW HeNe

Detectors
Name   Laser   Wavelength range (nm)   Dyes
B510   1       505-515                 FITC
B710   1       680-730                 PerCP-Cy5.5
G565   2       565-585                 PE
G605   2       600-620                 TR-PE
G660   2       650-680                 Cy5PE, PI
V450   3       420-480                 Pacific Blue, Cascade Blue
V655   3       650-680                 QD650
R660   4       650-680                 APC

FACS core facilities can create these tables and supply them to users.

Hardware/Software
Instrument Configuration: QA/QC
Knowledge of the QA/QC procedures is necessary to understand how data analysis was performed.
Do the gates move from experiment to experiment?
Are MFI calculations compared between experiments?
Is sensitivity equivalent across experiments?
Relevant QA/QC procedures can likely be summarized by a limited set of options that authors select from:
o No daily QC (i.e., fire up the instrument and hope that yesterday's settings are close enough)
o Alignment using beads: Set the instrument so that the same output fluorescence is observed on each channel every day
o Set the instrument up to the same voltages and settings each day (record beads for QA)
o Set the instrument up so that unstained cells are in the first decade of fluorescence

Hardware/Software
Instrument Configuration: Compensation
A very brief description of how compensation was accomplished is all that is needed.
• What were the controls? (Beads, cells, combinations)
• Was compensation manual or automatic?
• What software was used to compensate?
• Was manual adjustment of compensation necessary?
This helps reviewers interpret distributions that they may think are improper compensation.

Graphs-General
Principles and Guidelines
Graph axis labels should include (at a minimum) the reagent being measured.
Why: Interpretation of the graph is much faster; the reader does not have to translate each label.
In the case of fluorescent antibodies, both the specificity and the fluorochrome should be indicated. Do not use “FL1” or “P1” as a label.

Fluorescent Reagent Description
• What is the binding target
• Reporter (fluorochrome)
• Clone name or number
• Reagent manufacturer
• Reagent catalogue number

Graphs-General
Principles and Guidelines
The number of events displayed in any graph should be indicated.
Why:
• The number of events making up a display can affect how the display looks
• The number of events should be considered when interpreting the precision of the analysis

Graphs-General
Annotating Graphs
Indicate with a simple number within or near each graphic, or list in the Figure Legend.
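One way to see why the event count matters for precision is a rough counting-statistics estimate. The sketch below assumes that the number of events falling in a gate follows Poisson statistics, so the coefficient of variation (CV) of that count is about 100/sqrt(n) percent. That assumption, and the function names `counting_cv_percent` and `events_needed`, are illustrative only and are not part of the guidelines themselves.

```python
import math


def counting_cv_percent(n_gated_events):
    """Approximate CV (%) of a gated event count, assuming Poisson counting."""
    if n_gated_events <= 0:
        raise ValueError("need at least one gated event")
    return 100.0 / math.sqrt(n_gated_events)


def events_needed(gate_fraction, target_cv_percent):
    """Total events to collect so the gated count reaches the target CV.

    gate_fraction is the expected fraction of all events inside the gate
    (for example 0.25 for a 25% population).
    """
    gated_needed = (100.0 / target_cv_percent) ** 2
    return math.ceil(gated_needed / gate_fraction)


# A gate holding 100 events is far less precise than one holding 10,000.
for n in (100, 10_000):
    print(f"{n} gated events -> CV of about {counting_cv_percent(n):.1f}%")

# Total events required for a 25% population at a 5% counting CV.
print(events_needed(gate_fraction=0.25, target_cv_percent=5.0))
```

Under this assumption, the same percentage reported from a few dozen gated events is much less reliable than one reported from thousands, which is why the event number belongs next to the graph.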
[Figure 001.01: two example plots, “Total PBMC” and “Lymphocytes”. Axis labels: ForSc; Cy5PE: CD45; PhyEry: CD16; Fluor: CD14. Annotations: 63.0%; 6296; 10000 events. Callouts: “Consistent use of color helps minimize extraneous text”; “Axis labels show both the measurement and the fluorochrome”.]

Scaling or Axis labels
• Show all parts of the plot axis that indicate the scaling that was used (Lin, Log, Bi-exponential)
• Numerical values for axis “ticks” can be eliminated except when necessary to clarify the scaling.

Graphs-General
Principles and Guidelines
To convey quantitative representation of subsets from graphical displays, a calculated frequency of gated events must be displayed. The graph itself cannot convey such information.
Why: Depending on how many events are displayed, the appearance of a subset may be quite different. The only way to assess the frequency with accuracy is to provide a numerical value.
Histograms can provide notoriously misleading information about frequencies.

Graphs-General
Graphs Cannot Convey Frequencies
[Figure 001.04: two histograms of ForSc (0 to 200) with a marked gate, one scaled as # Cells (0 to 250) and one as % of Max (0 to 100).]
Two datasets. What is the representation of “large” (high forward-scatter) cells? Does the “red” distribution have more?

Graphs-General
Graphs Cannot Convey Frequencies
[Figure 001.04: histogram of ForSc (0 to 200) against # Cells (0 to 250), blue and red distributions. Events: 4,922 and 4,922.]
Which distribution has more cells?

Intensity measurement
Explicitly define the statistic applied (mean, median, geometric mean).

Graphs-General
Principles and Guidelines
The choice of smoothing and specific display type is up to the author. Choose whichever graph and display options most readily convey the information needed to interpret the experiments, but be consistent across all graphs within an analysis.
Why: There is no single “best” way to display data. Each display type has advantages and disadvantages. However, using different displays in different graphs may mislead readers because of the nuances of emphasis by each graph type.
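The earlier points, that a graph alone cannot convey frequencies and that the intensity statistic must be named, can be illustrated with a minimal Python sketch. The forward-scatter values, the gate boundary of 100, the bin width, and the helper names (`gate_frequency`, `binned_counts`, `percent_of_max`, `intensity_summary`) are all made up for illustration; they are not taken from any figure or software package mentioned here.

```python
import statistics


def gate_frequency(values, lower, upper=float("inf")):
    """Return (count, percent) of events with lower <= value < upper."""
    in_gate = sum(1 for v in values if lower <= v < upper)
    return in_gate, 100.0 * in_gate / len(values)


def binned_counts(values, bin_width, n_bins):
    """Count events per bin; values past the last bin pile up in it."""
    counts = [0] * n_bins
    for v in values:
        index = min(int(v // bin_width), n_bins - 1)
        counts[index] += 1
    return counts


def percent_of_max(counts):
    """Rescale bin counts so the tallest bin is 100 ("% of Max" display)."""
    peak = max(counts)
    return [100.0 * c / peak for c in counts]


def intensity_summary(values):
    """Report several intensity statistics, each under an explicit name."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "geometric mean": statistics.geometric_mean(values),
    }


# Hypothetical forward-scatter values (arbitrary units), not real data.
blue = [40] * 50 + [60] * 30 + [120] * 15 + [160] * 5
red = blue * 3  # same shape, three times as many events

for name, events in (("blue", blue), ("red", red)):
    count, percent = gate_frequency(events, lower=100)
    shape = percent_of_max(binned_counts(events, bin_width=50, n_bins=4))
    print(f"{name}: {len(events)} events, gate = {count} ({percent:.1f}%)")
    print(f"  % of max per bin: {shape}")

for label, value in intensity_summary(blue).items():
    print(f"{label}: {value:.1f}")
```

Under these assumptions, both datasets give the same % of max histogram even though one holds three times as many events, so only the reported count and percentage distinguish them. The three intensity statistics also differ for the same data, which is why the one used has to be stated.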
Gating
Principles and Guidelines
Whenever gated analyses are performed, an illustration of the gating process should be shown.
Why: The way in which cells are gated can dramatically impact the analysis and interpretation, particularly when rare populations are involved.
Backgating demonstrates how each gate has impacted the analysis, and can demonstrate that the gating process has not artefactually selected for the subsets being analyzed.
The gating “tree” teaches readers how to analyze data when they do similar experiments.

Gating
Principles and Guidelines
Unless otherwise explicitly stated, gating is assumed to have been performed subjectively.
Why: By convention.

Gating
Principles and Guidelines
The use of control samples to set gates should be shown; the algorithm to place gates should be explicitly defined if it was not subjective.
Why: In many cases, subjective placement of gates is a reasonable way to analyze the data; interpretation will not be affected by minor relocations of the gate. However, some types of analysis require rigorous placement of gates to provide the most significant data.
If gate placement was algorithmic, then it must be described and shown.

Gating
Gate Placement Algorithms
• Purely subjective
  Illustration is always useful. Unlikely to be acceptable for quantitative fluorescence measurements, identification of dimly expressing subsets, or discrimination between overlapping subsets.
• Based on control stains (unstained, FMO, etc.)
  The control sample must be shown, along with a description of how it was used to place the gate. If the gates move for different types of samples (e.g., treated vs. untreated), then at least one example of each should be given.
• Objective algorithm
  Detail the algorithm (e.g., “Top 2% of events”; “Autogate defined by software”).

Experimental and Sample Information
• How were cell suspensions prepared
  o Specific proteases
  o Filtration
  o Lysing agents
  o Fix/Perm reagents

Implementation of Guidelines by J. Exp. Med.
In addition to ensuring that primary data presentation conforms with the guidelines, authors will also be expected to submit a single additional supplementary section devoted to the flow cytometry. This section will include:
• Table of instrument information (template provided online)
• Gating tree example(s)
• Gating control(s)
• Additional analyses pertinent to the interpretation of the flow cytometric data

References
• Perfetto et al. 2006, JIM
• Keeney et al. 1998, Cytometry
• Cytometry 30(5), 1997
• MIFlowCyt 1.0
• http://ucflow.blogspot.com/2011/04/displaytransformation-and-flowjo.html (bi-exponential display)
• Cytometry A 783A:384-385
• Stephen P. Perfetto, Pratip K. Chattopadhyay & Mario Roederer. Seventeen-colour flow cytometry: unravelling the immune system. Nature Reviews Immunology 4, 648-655 (August 2004)