Slides - CFAR-CIG

Flow Cytometry and
Reproducible Analysis
Cliburn Chan
Department of Biostatistics and
Bioinformatics, DUMC
Reproducible Analysis
• Can someone in a different lab replicate your
results?
• Can someone else in your lab replicate your
results?
• Can you replicate your own results
– 6 months later?
– When FlowJo goes from version 10.0 to 11.0?
– When your lab catches fire and all your computers
melt into toxic waste?
Complexity of flow analysis
• Experimental design
• Running the experiment
• Raw data (FCS files)
• Compensation
• Transformation
• Gating strategy
• Gates → MFI and relative frequencies
• Statistical analysis – e.g. outcome correlation
Experimental design
• Is randomization done correctly?
• Is the sample size sufficient?
• Is there an SOP for annotating the
experiment?
– MIATA
– MiFlowcyt
• What is the informatics strategy to ensure that
data is recorded accurately and backed up
safely?
Running the experiment
• Stuff I know little about …
• Janet and Jennifer will teach in this workshop
– Instrument calibration
– Bridging studies
– Reagent qualification
– Use of appropriate biological controls
– Use of appropriate technical controls
Raw data (FCS files)
• Is there a file naming SOP that is followed?
• Is there an SOP for recording FCS metadata?
– Channel labels – fluorochrome, antibody, FMO
Inconsistent annotation example
Compensation, transformation and
gating strategy
• Compensation is Real = Spillover⁻¹ × Observed
• Transformation is complicated – can think of as
linear (low values) and log (high values)
• Gating strategy is hard to replicate, but can be
stored as a template and “re-used” with tweaking
• Compensation, transformation and gating should
be done on a per-batch and not per-file basis
• Would recommend storing workspace containing
this data in both .jo and .xml formats
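As a rough illustration of the first two points, the sketch below applies the compensation formula to a two-channel example and uses arcsinh as a stand-in for a transformation that is roughly linear at low values and logarithmic at high values. The 2 × 2 spillover matrix, the cofactor of 150 and all function names are assumptions made for illustration. Analysis software may use a different matrix convention and a different transform (for example biexponential).

```python
import math


def invert_2x2(matrix):
    """Invert a 2 x 2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("spillover matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def compensate(spillover, observed):
    """Return Real = Spillover^-1 x Observed for one two-channel event.

    Assumed convention: spillover[i][j] is the fraction of fluorochrome j's
    signal that is seen in detector i, so Observed = Spillover x Real.
    """
    inverse = invert_2x2(spillover)
    return [sum(inverse[i][j] * observed[j] for j in range(2)) for i in range(2)]


def arcsinh_transform(value, cofactor=150.0):
    """Roughly linear near zero, roughly logarithmic for large values.

    The cofactor of 150 is an illustrative choice, not a recommendation.
    """
    return math.asinh(value / cofactor)


# Hypothetical spillover matrix and one observed event.
spillover = [[1.0, 0.1], [0.2, 1.0]]
observed = [105.0, 70.0]
real = compensate(spillover, observed)
print("compensated:", real)
print("transformed:", [arcsinh_transform(v) for v in real])
```

Because the same spillover matrix and the same transform parameters are passed to every event, it is natural to define them once per batch rather than once per file.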
Working with statisticians
• At some point, a statistician is likely to be
asked to analyze your data. This can lead to
much unhappiness.
• Statisticians do not like Excel
– The first thing they will try to do is export to a CSV
or delimited file, for import into SAS or R
– If this is difficult to do, they will not like you
Excel rules for happy statisticians
• 1 worksheet = 1 table
• 1 cell = 1 value
• Data/metadata = comprehensive & consistent
• Formatting = None
• Validation = Yes
1 worksheet = 1 table
• A table has column headers and a number of
rows and nothing else – it is RECTANGULAR
• Do not put more than 1 table in a worksheet
• Do not use non-rectangular tables
• Example of good worksheet
1 worksheet = 1 table
1 cell = 1 value
• Easy to filter by tube, sample or subject
• Easy to write validation rules or lookup table
1 cell = 1 value
• ID column has 3 different values
• Need to do text parsing to recover information
– very error prone
Data: column names
• Consistent column names across worksheets
– Singlets/Lymphocytes
– Singlet/Lymphs
– Singlets / Lymphocytes
– Singlets/Lymphoctyes
• Use full gating path for column name
– Singlets/Lymphocytes/Viable/CD4+/CM/IFN+
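One way to catch inconsistent column names before they reach the statistician is to compare every header against an agreed list. The sketch below is a minimal example using Python's standard `difflib` module. The canonical list, the function name `check_headers` and the similarity cutoff of 0.6 are assumptions chosen for illustration.

```python
import difflib


def check_headers(headers, canonical):
    """Map each header that is not in the canonical list to the closest
    canonical name, or to None if nothing is similar enough."""
    problems = {}
    for header in headers:
        if header in canonical:
            continue
        matches = difflib.get_close_matches(header, canonical, n=1, cutoff=0.6)
        problems[header] = matches[0] if matches else None
    return problems


# Hypothetical agreed column names (full gating paths).
canonical = [
    "Singlets/Lymphocytes",
    "Singlets/Lymphocytes/Viable/CD4+/CM/IFN+",
]

# Headers as they might appear across several worksheets.
worksheet_headers = [
    "Singlets/Lymphocytes",
    "Singlet/Lymphs",
    "Singlets / Lymphocytes",
    "Singlets/Lymphoctyes",
]

problems = check_headers(worksheet_headers, canonical)
for header, suggestion in problems.items():
    print(f"{header!r} is not a canonical name; closest match: {suggestion!r}")
```

A check like this only flags candidates. Deciding which spelling is canonical remains a lab decision that belongs in the SOP.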
Data: What to record
• Better to have more data than less data
– Sample type (PBMC, whole blood)
– Recovery
– Viability
• Better to have basic than derived data
– Counts better than relative frequencies
• Keep link to raw data for reproducibility
– Path to FCS and workspace files on server
• Use special indicator for missing data (e.g. NAN), not zero
• Use as many columns as you need and name them sensibly
and consistently
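The sketch below illustrates two of these points: relative frequencies can always be derived from counts, and a missing value should stay missing rather than become zero. The table contents, column names and helper names are made up for illustration, and treating blank, "NA" or "NaN" cells as missing is an assumed convention.

```python
import csv
import io
import math


def to_number(text):
    """Parse one cell; blank, 'NA' or 'NaN' becomes NaN, never zero."""
    text = text.strip()
    if text == "" or text.lower() in {"nan", "na"}:
        return math.nan
    return float(text)


def relative_frequency(count, parent_count):
    """Derive a relative frequency from counts, propagating missing values."""
    if math.isnan(count) or math.isnan(parent_count) or parent_count == 0:
        return math.nan
    return count / parent_count


# Hypothetical exported table: counts, with one missing measurement.
table = """Subject,Singlets/Lymphocytes,Singlets/Lymphocytes/CD4+
S01,1000,250
S02,2000,NaN
S03,800,0
"""

rows = list(csv.DictReader(io.StringIO(table)))
freqs = {
    row["Subject"]: relative_frequency(
        to_number(row["Singlets/Lymphocytes/CD4+"]),
        to_number(row["Singlets/Lymphocytes"]),
    )
    for row in rows
}
for subject, freq in freqs.items():
    print(subject, freq)
```

Here S02 (not measured) stays NaN while S03 (measured, no CD4+ events) is a true zero. Recording both as 0 would make them indistinguishable.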
Data: Versioning
• Do not change the data in the worksheet once
it has been handed to the statistician.
• If there are errors that must be corrected,
make a new copy, label the filename with date
and version, and send that to the statistician
– ArcticRatExperiment_07May2013_Version01.xlsx
– ArcticRatExperiment_17May2013_Version02.xlsx
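File names in this pattern can be generated rather than typed, which avoids inconsistent dates and version numbers. The following is a minimal sketch. The function name `versioned_filename` and its parameters are hypothetical, and the month abbreviations are hard-coded so the result does not depend on the computer's locale.

```python
from datetime import date

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def versioned_filename(stem, when, version, extension=".xlsx"):
    """Build a name such as ArcticRatExperiment_07May2013_Version01.xlsx."""
    stamp = f"{when.day:02d}{MONTHS[when.month - 1]}{when.year}"
    return f"{stem}_{stamp}_Version{version:02d}{extension}"


first = versioned_filename("ArcticRatExperiment", date(2013, 5, 7), 1)
second = versioned_filename("ArcticRatExperiment", date(2013, 5, 17), 2)
print(first)
print(second)
```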
Formatting
• Don’t do it.
• Avoid encoding information via:
– Highlighting
– Fancy spacing
– Different fonts and font effects
– Merging cells
– Comments
• Will it survive a round-trip from Excel to CSV
and back again?
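The round-trip question can be tried out on the cell values themselves. The sketch below writes a table to CSV text and reads it back using Python's standard `csv` module. It is only an approximation of an Excel export, and the helper names and example rows are assumptions. Highlighting, fonts and comments cannot even be represented in this form, which is the point: only plain cell values survive.

```python
import csv
import io


def csv_round_trip(rows):
    """Write rows to CSV text and read them back as lists of strings."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    buffer.seek(0)
    return [row for row in csv.reader(buffer)]


def rows_with_blanks(rows):
    """Return the indices of rows that contain at least one empty cell."""
    return [i for i, row in enumerate(rows) if any(cell == "" for cell in row)]


# One value per cell: nothing is lost.
plain = [["Subject", "Tube", "Count"],
         ["S01", "1", "250"],
         ["S01", "2", "300"]]

# A merged "Subject" cell exports as a value in the first row and blanks below.
merged = [["Subject", "Tube", "Count"],
          ["S01", "1", "250"],
          ["", "2", "300"]]

print(csv_round_trip(plain) == plain)
print(rows_with_blanks(csv_round_trip(merged)))
```

The second table comes back with a blank subject in its last row, so that row can no longer be attributed to a subject without guessing.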
Formatting - Before
Formatting - After
Comments are lost
Highlighting is lost
Bad cell formatting is lost
Merged cells become missing information
Summary of Reproducible Analysis
• Know what you are doing from PBMC to Excel
• SOPs are important
• Annotation is important
• Excel is OK if you use NONE of its features
• Keep all necessary data in the same place
• Keep a remote backup
• Talk with your statistician
Biologist talks to Statistician
http://www.youtube.com/watch?v=Hz1fyhVOjr4