Detection Call Algorithms for Genomic Microarray Data Damien Bruno, PhD



VCGS Pathology, Cytogenetics

Genetic algorithms

Hidden Markov Models

Circular Binary Segmentation

Wavelets

Maximum Likelihood

Demystifying the Black Box

• Intensity to Copy Number

• LogR and Allelic Data

• Detection Algorithms

– CNV

– LCSH/LOH

– Allelic Imbalance

– Breakpoint Mapping

• Calling Accuracy / Resolution

The ‘Black Box’

1. Image acquisition

2. Image analysis

3. Signal normalization/ smoothing

– has a significant effect on final data quality

4. Duplicate treatment

LogR (signal ratio) and BAF values are calculated for each probe

5. Apply segmentation and/or Calling Algorithms

6. Visual Inspection/ Analysis
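As a rough illustration of the smoothing in step 3, the sketch below applies a centred sliding-window median to a LogR series. The window size, the function name `sliding_median` and the sample values are assumptions chosen for illustration; commercial software may use quite different normalization and smoothing schemes.

```python
from statistics import median

def sliding_median(values, window=5):
    """Smooth a LogR series with a centred sliding-window median.

    The window is truncated at the ends of the series, so the output
    has the same length as the input.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed

# Illustrative (made-up) LogR values: one noisy outlier, then a deleted region.
logr = [0.02, -0.05, 0.90, 0.01, -0.03, -0.98, -1.05, -0.95, -1.02, 0.04]
print(sliding_median(logr, window=3))
```

A median suppresses isolated outlier probes while preserving sharp steps better than a moving average does, which is one reason it is a common first choice for this kind of data.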

Inferring CN from Intensity

LogR = log_2(I_A + I_B)

Expected values (copy ratio, LogR): ‘normal’ (2/2, 0), het loss (1/2, -1), het dup (3/2, 0.58), etc.

BAF = I_B / (I_A + I_B)

AA, BB, AB, A, B, AAB, ABB, etc
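The two quantities can be sketched in a few lines of Python. One assumption is made explicit here: the total intensity is divided by an expected (reference) total before taking the log, which is what the ratio values 2/2, 1/2 and 3/2 imply. The names `log_r`, `baf` and `EXPECTED_TOTAL`, and the idealised noise-free intensities, are illustrative only.

```python
import math

def log_r(i_a, i_b, expected_total):
    """Log2 ratio of observed total intensity to the expected (reference) total."""
    return math.log2((i_a + i_b) / expected_total)

def baf(i_a, i_b):
    """B-allele frequency: share of the total intensity coming from the B allele."""
    return i_b / (i_a + i_b)

# Assumption: one unit of signal per allele copy, no noise.
EXPECTED_TOTAL = 2.0
examples = {
    "normal (AB)": (1.0, 1.0),
    "het loss (A)": (1.0, 0.0),
    "het dup (AAB)": (2.0, 1.0),
}
for label, (i_a, i_b) in examples.items():
    print(f"{label}: LogR={log_r(i_a, i_b, EXPECTED_TOTAL):.2f}, BAF={baf(i_a, i_b):.2f}")
```

Real probe intensities are noisy, so observed LogR and BAF values scatter around these ideal points rather than landing on them.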

So Why Do We Need Calling Algorithms?

Various sources of variation that create ‘noise’

• hybridization differences/efficiencies, intensity-dependent effects, small local effects (e.g. GC content) etc.

• Errors during scanning, human error during set-up

Pre-processing and Calling Algorithms

• Increase detection of aberrations, precision and confidence

• Remove some of the subjectivity and simplify analysis (faster, genomic coordinates)

Detection Algorithms

Algorithm | Computational Approach | Platform / Software
PennCNV | HMM (combined LogR and BAF) | Illumina / Affymetrix SNP array
cnvPartition | Segmentation of CN estimates, smoothing (sliding window) | Illumina SNP array
CBS | Circular binary segmentation | Array-CGH
BirdSuite | HMM | Affymetrix
GLAD | Segmentation, adaptive weights smoothing | Array-CGH
FACADE | Segmentation, edge detection and non-parametric statistics | Array-CGH
Nexus Rank and SNPRank | Segmentation, including segment significance calculation | multi

Smoothing

Detection Algorithms

Eilers et al . Bioinformatics, 2005

Detection Algorithms

Calling approaches model data at the probe level (gain, loss, neutral)

[Figure: LogR profile, calling using 3-state HMM (states: Gain, Normal, Loss)]

Extend probe states to detect contiguous aberrations (i.e. breakpoint calling)
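A minimal sketch of probe-level calling with a 3-state HMM follows, using the Viterbi algorithm to find the most likely state path. Every number in it is an assumption for illustration: the state means reuse the ideal LogR values (-1, 0, 0.58), while the noise SD, the probability of staying in a state and the sample data are invented. It uses LogR only and is not the model of any particular package.

```python
import math

STATES = ("loss", "normal", "gain")
MEANS = {"loss": -1.0, "normal": 0.0, "gain": 0.58}  # ideal LogR per state
SD = 0.25    # assumed probe noise (standard deviation)
STAY = 0.98  # assumed probability of remaining in the same state

def log_emission(x, state):
    """Log density of a Gaussian centred on the state's expected LogR."""
    z = (x - MEANS[state]) / SD
    return -0.5 * z * z - math.log(SD * math.sqrt(2 * math.pi))

def log_transition(prev, cur):
    """Log probability of moving from prev to cur between adjacent probes."""
    return math.log(STAY) if prev == cur else math.log((1 - STAY) / 2)

def viterbi(logr):
    """Return the most likely state path for a sequence of LogR values."""
    score = {s: math.log(1 / 3) + log_emission(logr[0], s) for s in STATES}
    back = []
    for x in logr[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_transition(p, cur))
            new_score[cur] = (score[best_prev] + log_transition(best_prev, cur)
                              + log_emission(x, cur))
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

logr = [0.05, -0.1, 0.0, -0.9, -1.1, -1.0, -0.95, 0.1, 0.0]
print(viterbi(logr))
```

Breakpoints fall wherever the state path changes, so the transition probability controls how readily the model opens a new aberration: a high `STAY` value makes short, single-probe calls less likely.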

Detection Algorithms

Segmentation methods seek to identify contiguous regions of common means (LogR, SD, SE, t-statistic)

[Figure labels: 2 copy (neutral); 1 copy (del)]

Once the means are calculated and breakpoints defined, a separate procedure (model) is used to assign copy number states (i.e. calling)
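The two-stage idea (segment first, call second) can be sketched as below. The segmentation shown is a plain recursive binary segmentation that splits wherever a breakpoint most reduces the squared error around the segment means; it is a simplified stand-in, not CBS, GLAD or any other published method. The parameters `min_gain`, `min_probes`, `loss_below` and `gain_above` are hypothetical and would need tuning on real data.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the segment mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def segment(values, start=0, min_gain=0.5, min_probes=2):
    """Recursively split where a breakpoint most reduces the squared error.

    Returns a list of (first_index, end_index_exclusive, segment_mean).
    """
    best_i, best_gain = None, min_gain
    total = sse(values)
    for i in range(min_probes, len(values) - min_probes + 1):
        gain = total - sse(values[:i]) - sse(values[i:])
        if gain > best_gain:
            best_i, best_gain = i, gain
    if best_i is None:  # no split is worthwhile: this is one segment
        return [(start, start + len(values), mean(values))]
    return (segment(values[:best_i], start, min_gain, min_probes)
            + segment(values[best_i:], start + best_i, min_gain, min_probes))

def call(segment_mean, loss_below=-0.3, gain_above=0.3):
    """Separate calling step: map a segment mean to a copy number state."""
    if segment_mean < loss_below:
        return "loss"
    if segment_mean > gain_above:
        return "gain"
    return "neutral"

# Illustrative (made-up) LogR values with a deletion in the middle.
logr = [0.02, -0.04, 0.01, 0.03, -1.01, -0.97, -1.03, -0.99, 0.0, 0.04, -0.03, 0.01]
for first, end, seg_mean in segment(logr):
    print(f"probes {first}-{end - 1}: mean LogR {seg_mean:.2f} -> {call(seg_mean)}")
```

Keeping `call` separate from `segment` mirrors the point above: the breakpoints depend only on the means, and the thresholds that turn a mean into a state can be changed without re-segmenting.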

Empirical Comparisons

• Sensitivity-specificity tradeoff

• Effect of algorithm on number of CNVs detected

• Effect of algorithm on accuracy of assigning breakpoints and copy number/ allelic states
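A minimal sketch of how such a comparison might be scored at the probe level is shown below, assuming a known truth set (for example simulated data). The function name, the state labels and the toy data are assumptions; published comparisons may score at the segment or CNV level instead.

```python
def sensitivity_specificity(truth, calls):
    """Probe-level comparison: any non-neutral state counts as aberrant.

    Assumes the truth set contains at least one aberrant and one neutral probe.
    """
    pairs = list(zip(truth, calls))
    tp = sum(t != "neutral" and c != "neutral" for t, c in pairs)
    fn = sum(t != "neutral" and c == "neutral" for t, c in pairs)
    tn = sum(t == "neutral" and c == "neutral" for t, c in pairs)
    fp = sum(t == "neutral" and c != "neutral" for t, c in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Toy example: the called deletion is shifted one probe to the left.
truth = ["neutral"] * 6 + ["loss"] * 4
calls = ["neutral"] * 5 + ["loss"] * 4 + ["neutral"]
sens, spec = sensitivity_specificity(truth, calls)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

The toy data also shows how a breakpoint placed one probe off costs both sensitivity and specificity, even though the aberration itself was detected.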

Lai et al. Bioinformatics, 2005


Factors Influencing Call Performance

• Array specification (probe target, count, distribution)

• Hybridisation characteristics

– Link calling performance to sample QC

• Segmentation/Calling Algorithm

– Modelling of the data may be too restrictive, e.g. an HMM assumes a set number of expected states

• Suboptimal parameter settings e.g. SD and LogR thresholds, ‘confidence’, CNV length, probe number
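To make the effect of parameter settings concrete, the sketch below filters a list of candidate calls by probe number, length and LogR magnitude. The `CnvCall` record, the threshold defaults and the example calls are all hypothetical; they do not correspond to the defaults of any particular software.

```python
from dataclasses import dataclass

@dataclass
class CnvCall:
    chrom: str
    start: int        # genomic start coordinate (bp)
    end: int          # genomic end coordinate (bp)
    n_probes: int     # number of probes supporting the call
    mean_logr: float  # mean LogR across the call

def passes_filters(cnv, min_probes=5, min_length_bp=50_000, min_abs_logr=0.3):
    """Keep a call only if it clears every (assumed) threshold."""
    return (cnv.n_probes >= min_probes
            and cnv.end - cnv.start >= min_length_bp
            and abs(cnv.mean_logr) >= min_abs_logr)

# Made-up candidate calls for illustration.
calls = [
    CnvCall("chr1", 1_000_000, 1_200_000, 40, -0.95),
    CnvCall("chr2", 5_000_000, 5_010_000, 3, -0.80),   # short, few probes
    CnvCall("chr7", 9_000_000, 9_300_000, 60, 0.12),   # weak LogR shift
]
kept = [c for c in calls if passes_filters(c)]
print([c.chrom for c in kept])
```

Loosening the thresholds (for example `min_probes=3, min_length_bp=10_000`) would retain the chr2 call, which illustrates how the same data can yield different CNV counts depending on settings.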

Zhang et al. PLOS one, 2011

Dellinger et al. Nucleic Acids Research, 2010

Conclusions

confidence/ significance estimates

Model complex genomes
