Notes 1 - Department of Mathematics and Statistics


Advanced Statistical Methods: Beyond Linear Regression

John R. Stevens

Utah State University

Notes 1. Case Study Data Sets

Mathematics Educators Workshop

28 March 2009 http://www.stat.usu.edu/~jrstevens/pcmi

Why this workshop?

• Me …
  – Outreach mission of USU
  – Recruitment (undergraduate & graduate)
  – Too much fun
• You …

Outline

Notes 1: Case Study Data Sets
  1. Challenger Explosion
  2. Beetle Fumigation
  3. T-cell Cancer
Notes 2: Statistical Methods I
  • Logistic Regression (incl. Separation of Points)
  • EM Algorithm
Notes 3: Statistical Methods II
  • Tests for Differential Expression
  • Multiple Hypothesis Testing
  • Visualization
  • Machine Learning
Notes 4: Computer Implementation
(Notes 5): Bonus Material


Case Study 1: Challenger

The January 28, 1986 explosion prompted the Presidential Commission on the Space Shuttle Challenger Accident.

The Commission's 1986 report attributed the explosion to a burn-through of an O-ring seal at a field joint in one of the solid-fuel rocket boosters.

After each of the previous 24 launches, the solid rocket boosters were inspected, and the presence or absence of damage to the field joint was noted.


Challenger Data

Motivating question:

What was so different on the 25th launch?

Obs  Flight   Temp (°F)  Damage
  1  STS1     66         NO
  2  STS2     70         YES
  3  STS3     69         NO
  4  STS4     80
  5  STS5     68         NO
  6  STS6     67         NO
  7  STS7     72         NO
  8  STS8     73         NO
  9  STS9     70         NO
 10  STS41B   57         YES
 11  STS41C   63         YES
 12  STS41D   70         YES
 13  STS41G   78         NO
 14  STS51A   67         NO
 15  STS51C   53         YES
 16  STS51D   67         NO
 17  STS51B   75         NO
 18  STS51G   70         NO
 19  STS51F   81         NO
 20  STS51I   76         NO
 21  STS51J   79         NO
 22  STS61A   75         YES
 23  STS61B   76         NO
 24  STS61C   58         YES
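The table can be typed in directly to take a first look at the motivating question. The following is a minimal, standard-library-only Python sketch; the names `CHALLENGER`, `complete_cases`, `damage_rate` and `fit_logistic` are hypothetical and not from the notes. It drops STS4 (which has no damage entry in the table), compares damage rates on either side of an arbitrary 65°F cutoff, and fits a logistic regression of damage on temperature by plain Newton-Raphson. Logistic regression is treated properly in Notes 2; this is only an illustration, not the implementation used there.

```python
import math

# Challenger table: (flight, launch temperature in °F, damage).
# STS4 has no damage entry in the table, so it is coded as None.
CHALLENGER = [
    ("STS1", 66, "NO"), ("STS2", 70, "YES"), ("STS3", 69, "NO"),
    ("STS4", 80, None), ("STS5", 68, "NO"), ("STS6", 67, "NO"),
    ("STS7", 72, "NO"), ("STS8", 73, "NO"), ("STS9", 70, "NO"),
    ("STS41B", 57, "YES"), ("STS41C", 63, "YES"), ("STS41D", 70, "YES"),
    ("STS41G", 78, "NO"), ("STS51A", 67, "NO"), ("STS51C", 53, "YES"),
    ("STS51D", 67, "NO"), ("STS51B", 75, "NO"), ("STS51G", 70, "NO"),
    ("STS51F", 81, "NO"), ("STS51I", 76, "NO"), ("STS51J", 79, "NO"),
    ("STS61A", 75, "YES"), ("STS61B", 76, "NO"), ("STS61C", 58, "YES"),
]


def complete_cases(data):
    """Return (temps, y) with y = 1 for damage, 0 for none; skip missing rows."""
    temps, y = [], []
    for _flight, temp, damage in data:
        if damage is None:
            continue
        temps.append(temp)
        y.append(1 if damage == "YES" else 0)
    return temps, y


def damage_rate(temps, y, cutoff):
    """Proportion of damaged flights below, and at or above, a temperature cutoff."""
    below = [yi for t, yi in zip(temps, y) if t < cutoff]
    above = [yi for t, yi in zip(temps, y) if t >= cutoff]
    return sum(below) / len(below), sum(above) / len(above)


def fit_logistic(x, y, iters=25):
    """Maximum likelihood fit of logit P(damage) = a + b*x by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        # Score vector (ga, gb) and information matrix [[haa, hab], [hab, hbb]].
        ga = gb = haa = hab = hbb = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            ga += yi - p
            gb += (yi - p) * xi
            haa += w
            hab += w * xi
            hbb += w * xi * xi
        # Newton step: solve the 2x2 system (information) * step = score.
        det = haa * hbb - hab * hab
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b


if __name__ == "__main__":
    temps, y = complete_cases(CHALLENGER)
    print(damage_rate(temps, y, cutoff=65))
    print(fit_logistic(temps, y))
```

With these data, all four flights launched below 65°F show damage, compared with 3 of the 19 complete flights at or above 65°F, and the fitted temperature coefficient is negative (about -0.23 per °F). The 65°F cutoff is arbitrary; the logistic fit avoids having to choose one.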

Case Study 2: Beetle Fumigation – Rhyzopertha dominica

(Image courtesy Clemson University – USDA Cooperative Extension Slide Series, www.insectimages.org)


Motivation

• Beetle: lesser grain borer
  – A primary pest of stored grain
  – A year-round problem in moderate climates
• Australian grain industry:
  – $6–8 billion
  – Zero tolerance for insect-infested grain
  – Phosphine fumigant for control
  – Some beetles have developed resistance levels more than 235 times greater than normal (UQ News Online, 18 Oct. 1999)


Experimental Background

• Two DNA markers linked to resistance:
  – rp6.79: two genotypes (-, +)
  – rp5.11: three genotypes (B, H, A)
• Motivating question: What contributes to the degree of resistance?
• A mixture of the six resulting beetle genotypes was exposed to various concentrations of fumigant for 48 hours.
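Crossing the two markers gives the six genotype labels used as column headings in the Experimental Data table. A tiny sketch, assuming the labels are written rp6.79/rp5.11 as in that table (`joint_genotypes` and the two list names are hypothetical):

```python
from itertools import product

# Genotype codes as listed in the notes (hyphen used for "-", as in the data table).
RP6_79 = ["-", "+"]        # two genotypes at marker rp6.79
RP5_11 = ["B", "H", "A"]   # three genotypes at marker rp5.11


def joint_genotypes(marker1, marker2):
    """All two-marker genotypes, written 'marker1/marker2' as in the data table."""
    return [f"{g1}/{g2}" for g1, g2 in product(marker1, marker2)]


if __name__ == "__main__":
    print(joint_genotypes(RP6_79, RP5_11))
```

This prints the 2 × 3 = 6 labels in the table's column order: -/B, -/H, -/A, +/B, +/H, +/A.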


Experimental Data

Phosphine   Total       Total     Total       Survivors observed at genotype
dosage      receiving   deaths    survivors   -/B   -/H   -/A   +/B   +/H   +/A
(mg/L)      dosage
0               98          0         98       31    27    10     6    20     4
0.003          100         16         84       18    26    10     6    20     4
0.004          100         68         32       10     4     4     3     7     4
0.005          100         78         22        1     5     2     7     6     2
0.01           100         77         23        0     1     9     8     5     0
0.05           300        270         30        0     0     0     5    20     5
0.1            400        383         17        0     0     0     0    10     7
0.2            750        740         10        0     0     0     0     0    10
0.3            500        490         10        0     0     0     0     0    10
0.4            500        492          8        0     0     0     0     0     8
1.0           7850       7806         44        0     0     0     0     0    44
Total       10,798     10,420        378
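The aggregate columns of the Experimental Data table are enough to compute observed mortality at each dose. A minimal Python sketch (the dictionary and function names are hypothetical); it uses only dose, number receiving, and number of deaths, and reproduces the Total row as a consistency check:

```python
# Aggregate columns of the Experimental Data table:
# phosphine dose (mg/L) -> (number receiving the dose, number of deaths).
PHOSPHINE = {
    0.0: (98, 0), 0.003: (100, 16), 0.004: (100, 68), 0.005: (100, 78),
    0.01: (100, 77), 0.05: (300, 270), 0.1: (400, 383), 0.2: (750, 740),
    0.3: (500, 490), 0.4: (500, 492), 1.0: (7850, 7806),
}


def mortality_by_dose(table):
    """Observed proportion killed at each dose, in increasing dose order."""
    return {dose: deaths / n for dose, (n, deaths) in sorted(table.items())}


def totals(table):
    """(total receiving, total deaths, total survivors) over all doses."""
    n = sum(received for received, _ in table.values())
    d = sum(deaths for _, deaths in table.values())
    return n, d, n - d


if __name__ == "__main__":
    print(totals(PHOSPHINE))  # matches the Total row: 10,798 / 10,420 / 378
    for dose, rate in mortality_by_dose(PHOSPHINE).items():
        print(f"{dose:>6} mg/L: {rate:.3f}")
```

Observed mortality rises from 0.16 at 0.003 mg/L to above 0.99 at 1.0 mg/L, though not strictly (0.78 at 0.005 mg/L versus 0.77 at 0.01 mg/L).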


Practical Considerations in Choosing Dosage

• A high enough dosage would clearly kill all beetles, regardless of genotype.
• Exposure time is more important than concentration:
  – Expense: more time with a lower dose
  – Technical limitations: maintaining concentration in silos
  – Safety: spontaneous combustion at high concentrations


Case Study 3: T-cell Cancer

Acute lymphoblastic leukemia (ALL)
• Leukemia: cancer of the white blood cells
• ALL: excess of lymphoblasts (immature cells that become white blood cells)

Two types of interest here:
• T-cell: T cells manage the cell-mediated immune response (activation of cells, release of cytokines)
• B-cell: B cells manage the humoral immune response (secretion of antibodies)

Researchers used gene expression technology to compare the two types.


Central Dogma of Molecular Biology

General assumption of microarray technology
• The abundance of an mRNA transcript is used as a measure of the level of “expression” of the corresponding gene.
• That is, mRNA abundance is assumed to be proportional to the degree of gene expression.


How to measure mRNA abundance?

Several different approaches with similar themes:
• Affymetrix GeneChip
• Nimblegen array
• Two-color cDNA array
• and more

Representation of genes on slide:
• Oligonucleotide arrays: small portion of gene (Affymetrix probes: 25 bp)
• cDNA arrays: larger sequence of gene

(Images courtesy Affymetrix, www.affymetrix.com)

Affymetrix Technology – GeneChip

• Each spot on the array represents a single probe sequence (with millions of copies).
• Probes come in pairs: perfect match and mismatch.
• Each gene is represented by a unique set of probe pairs (usually 12–20 probe pairs per probe set).
• These probes are fixed to the array.

(Image courtesy Affymetrix, www.affymetrix.com)
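To make the probe-pair idea concrete, the sketch below collapses one probe set to a single number by averaging perfect-match (PM) minus mismatch (MM) intensities over its probe pairs. This is a simplified form of the "average difference" idea from early Affymetrix software, shown only as an illustration; the notes do not say how probe-level intensities were summarized for the data used here, so it should not be read as that preprocessing. `average_difference` is a hypothetical helper and the intensities are made up.

```python
from statistics import mean


def average_difference(pm, mm):
    """Simplified probe-set summary: mean of (PM - MM) over the probe pairs.

    pm[i] and mm[i] are the perfect-match and mismatch intensities of probe pair i.
    Illustration only; real preprocessing (background correction, normalization,
    robust summaries) is more involved.
    """
    if len(pm) != len(mm):
        raise ValueError("need one PM and one MM intensity per probe pair")
    return mean(p - m for p, m in zip(pm, mm))


if __name__ == "__main__":
    # Made-up intensities for a probe set with four pairs (real sets have 12-20).
    print(average_difference([520, 610, 480, 700], [120, 150, 100, 160]))
```

For the made-up values the pairwise differences are 400, 460, 380 and 540, so the summary is 445.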

Affymetrix Technology – Expression


A tissue sample is prepared so that its mRNA carries fluorescent tags; the tagged sample is then applied to the array and allowed to hybridize with the probes.

(Images courtesy Affymetrix, www.affymetrix.com)

Affymetrix GeneChip

(Image courtesy Affymetrix, www.affymetrix.com)


Cartoon Representations

• Animation 1: GeneChip structure (1 min.)
• Animation 2: Measuring gene expression (2.5 min.)

Data: Spot Intensities

Figure panels: full array image; close-up of array image

(Images courtesy Affymetrix, www.affymetrix.com)


Basic goal of microarray technology

• “Observe” gene expression in different conditions (e.g., healthy vs. diseased)
• Decide which genes’ expression levels are changing significantly between conditions
• Target those genes (e.g., to halt disease)
• Study those genes to better understand differences at the genetic level
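One simple way to quantify how much a gene's expression changes between two conditions is the log2 fold change of the group means. A minimal sketch with made-up intensities for a single gene (`log2_fold_change` is a hypothetical helper): a value of 2 means a four-fold increase, and -1 a halving. Fold change ignores variability within each condition, which is why formal tests are used to decide which changes are significant.

```python
from math import log2
from statistics import mean


def log2_fold_change(condition1, condition2):
    """log2 of mean expression in condition2 relative to condition1 (positive = higher in condition2)."""
    return log2(mean(condition2) / mean(condition1))


if __name__ == "__main__":
    healthy = [100, 120, 80]     # made-up intensities for one gene
    diseased = [400, 380, 420]
    print(log2_fold_change(healthy, diseased))  # means 100 vs 400: four-fold, so 2.0
```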


ALL Data

• “Preprocessed” gene expression data
• 12625 genes (hgu95av2 Affymetrix GeneChip)
• 128 samples (arrays)
• A matrix of “expression values”: 128 columns (samples), 12625 rows (genes)
• Phenotypic data on all 128 patients, including:
  – 95 B-cell cancer
  – 33 T-cell cancer

Motivating question: Which genes are changing expression values systematically between the B-cell and T-cell groups?
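For the motivating question, one gene-by-gene approach is a two-sample t statistic comparing the B-cell and T-cell columns of the expression matrix. The sketch below uses Welch's unequal-variance statistic on a tiny made-up matrix laid out like the ALL data (rows are genes, columns are samples); the helper names and the "B"/"T" labels are hypothetical. With 12625 genes, some large statistics will arise by chance alone, so multiple-testing adjustments are needed in practice.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch (unequal-variance) two-sample t statistic; each group needs >= 2 values."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


def t_by_gene(expr, groups, a="B", b="T"):
    """One statistic per gene (row), comparing sample columns labeled a vs. b."""
    cols_a = [j for j, g in enumerate(groups) if g == a]
    cols_b = [j for j, g in enumerate(groups) if g == b]
    return [welch_t([row[j] for j in cols_a], [row[j] for j in cols_b]) for row in expr]


if __name__ == "__main__":
    # Made-up matrix: 2 genes (rows) x 6 samples (columns), 3 B-cell and 3 T-cell.
    expr = [
        [5.0, 5.2, 4.8, 8.0, 8.2, 7.8],  # gene that differs between groups
        [6.0, 6.1, 5.9, 6.0, 6.1, 5.9],  # gene that does not
    ]
    groups = ["B", "B", "B", "T", "T", "T"]
    print(t_by_gene(expr, groups))
```

The first made-up gene gets a large negative statistic (higher in the T-cell columns) and the second gets zero, since its two groups have identical values.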


Next …

• Analysis for these case studies
• Build on known statistical methods
• Notice huge potential for additional methods
