Lies, Damned Lies, and Health Physics

Some Random Comments About
Statistics in Health Physics
Tom LaBone
Savannah River Chapter of
the Health Physics Society
Aiken, SC
April 15, 2011
1
“There are three kinds of lies: lies, damned lies,
and statistics.”
Mark Twain
“It is easy to lie with statistics.”
“It is hard to tell the truth without statistics."
Andrejs Dunkels
2
Today
- Informal, mostly apocryphal discussion of
  - what statistics really is,
  - who practices statistics and how they do it, and
  - why all of this is important to you as a health physicist
- Main message of talk
  - A good working knowledge of statistics is essential in any endeavor where data are collected and analyzed (e.g., health physics)
  - Everyone in the room should become a statistician (of sorts)
- No math is used in this presentation and no health physicists were harmed during its preparation
3
Health Physics and Statistics
- Some HP “stat” books I used in school
  - G. F. Knoll, Radiation Detection and Measurement, 1st Edition, 1979
  - J. Shapiro, Radiation Protection, 1st Edition, 1972
  - H. Cember, Introduction to Health Physics, 1st Edition, 1969
  - R. D. Evans, The Atomic Nucleus, 1955
  - P. R. Bevington, Data Reduction and Error Analysis for the Physical Sciences, 1st Edition, 1969
- Statistics was a tool, a “wrench to turn a nut”
  - Is that all it is?
4
What is Statistics?
“Humans are good, she knew, at discerning subtle
patterns that are really there, but equally so at
imagining them when they are altogether absent.”
Carl Sagan in Contact
5
Signals and Noise
- Useful information comes to us in the form of signals that form distinct patterns
- The signals are contaminated with varying degrees of noise, which can make it difficult to see the signal

6
Seeing Patterns
- In our evolutionary history, seeing patterns where none existed may have been less harmful than missing patterns that did exist
  - That noise in the grass: is it just the wind or is it a lion?
- So, we as a species got very good at seeing patterns, even in the absence of a signal
7
Apophenia
- Apophenia is the experience of seeing meaningful patterns or connections in random or meaningless data
- What do you see below?

8
Face on Mars
Viking 1 Orbiter
Mars Global Surveyor
9
Face in Food, et cetera
10
Face in Data
11
Statistics is …
- … a science that helps us to differentiate signal from noise and make decisions with a known probability of being wrong
- … a very practical, decision-oriented methodology developed to tame our natural tendency to be Apopheniacs
- … based on the idea that variability and noise are natural and unavoidable
- … a relatively modern science that is actively evolving
  - especially since cheap, powerful computers became available
12
Really, What is Statistics?
“Statistics is concerned with collecting,
analyzing, and interpreting data in the best
possible way, where the meaning of “best”
depends on the particular circumstances of
the practical situation”
Chris Chatfield
Problem Solving: A Statistician’s Guide
13
Exploratory Data Analysis
- Look at data (usually with graphics) and use our ability to see patterns in the data to
  - Suggest hypotheses to test
  - Assess validity of assumptions on which statistical inference will be based
  - Support the selection of appropriate inferential tests
  - Suggest ideas for further data collection
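
One way to see why looking at the data matters is the sketch below. It is not from the talk: it uses two of the four data sets from Anscombe's well-known quartet, and the helper names `summarize` and `text_scatter` are made up for this illustration. The two data sets have nearly identical means, slopes, and correlations, yet even a crude character plot shows they have very different shapes.

```python
import statistics

# Two of the four data sets from Anscombe's quartet (Anscombe, 1973).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def summarize(xs, ys):
    """Return (mean of y, least-squares slope, correlation) for one data set."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return my, sxy / sxx, sxy / (sxx * syy) ** 0.5


def text_scatter(xs, ys, rows=8):
    """Crude character scatter plot: one column per integer x, y binned into rows."""
    lo, hi = min(ys), max(ys)
    x0 = min(xs)
    grid = [[" "] * (max(xs) - x0 + 1) for _ in range(rows)]
    for a, b in zip(xs, ys):
        r = min(rows - 1, int((b - lo) / (hi - lo) * rows))
        grid[rows - 1 - r][a - x0] = "*"  # row 0 is the top of the plot
    return "\n".join("".join(line) for line in grid)


for name, ys in (("set 1", y1), ("set 2", y2)):
    mean_y, slope, corr = summarize(x, ys)
    print(f"{name}: mean y = {mean_y:.2f}, slope = {slope:.3f}, r = {corr:.3f}")
    print(text_scatter(x, ys))
    print()
```

The summary numbers alone would suggest fitting the same straight line to both data sets; the plots suggest otherwise for the second one.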
14
[Figure: scatter plots titled "Air Filters" and "Fecal Samples" (subtitles "Kinectrics Filters All" and "Fecals as of 3/5/2011"). Axes: Pu239, Am241, and Cm244. Points are labeled with index numbers. Slopes shown on the panels: 0.316, 0.236, 2.02, 1.38, 4.56, and 6.09.]
15
Confirmatory Data Analysis
- Use statistical tests to answer questions about the data along with the risks of reaching the wrong conclusion
  - Is the material on the filters the same material that is in the fecal samples?
  - Are the Pu-239 to Am-241 ratios in the fecal samples and air samples the same once we account for random noise?
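
A question like "are the two ratios the same?" can be sketched as a permutation test. The example below is an illustration only, not the analysis used in the talk: the data are synthetic, the ratio is estimated as a least-squares slope through the origin, and the function names (`ratio_through_origin`, `permutation_p_value`, `make_group`) and all numeric settings are assumptions made for this sketch. A permutation test also assumes the two groups are exchangeable when the ratios are equal.

```python
import random


def ratio_through_origin(pairs):
    """Least-squares slope of y on x for a line forced through the origin."""
    return sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)


def permutation_p_value(group_a, group_b, n_perm=999, seed=1):
    """Two-sided permutation p-value for a difference in ratios."""
    rng = random.Random(seed)
    observed = abs(ratio_through_origin(group_a) - ratio_through_origin(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(
            ratio_through_origin(pooled[:n_a]) - ratio_through_origin(pooled[n_a:])
        )
        if diff >= observed:
            hits += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)


def make_group(true_ratio, n, rng, rel_noise=0.05):
    """Synthetic (x, y) pairs with y = true_ratio * x plus proportional noise."""
    pairs = []
    for _ in range(n):
        x = rng.uniform(1.0, 10.0)
        y = true_ratio * x * (1.0 + rng.gauss(0.0, rel_noise))
        pairs.append((x, y))
    return pairs


rng = random.Random(42)
p_same = permutation_p_value(make_group(1.4, 30, rng), make_group(1.4, 30, rng))
p_different = permutation_p_value(make_group(1.4, 30, rng), make_group(3.0, 30, rng))
print(f"same true ratio:       p = {p_same:.3f}")
print(f"different true ratios: p = {p_different:.3f}")
```

The p-value is the "risk of reaching the wrong conclusion" part of the answer: it says how often random relabeling alone would produce a difference at least as large as the one observed.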
16
Fecal Samples
[Figure: plot of Am-241 (mBq) and Pu-239 (mBq) results for the fecal samples, annotated with 95% CI = (1.33, 1.46).]
17
Data Dredging
- Are the two Pu-239 to Am-241 ratios the same?
- If this question was asked before we saw the data, we can proceed with the test to answer it
- If this question was inspired by the data, then we should not test the same data to get the answer
  - Referred to as data snooping, data dredging, etc.
  - Cancer clusters
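
A small simulation can show why testing a question that the data suggested is risky. This is a sketch under stated assumptions, not material from the talk: every "study" below contains 20 variables of pure noise, each tested with a two-sided z-test at roughly the 5% level, and the names and settings (`z_test_rejects`, `simulate`, 20 variables, 10 observations) are hypothetical. The planned analysis tests one variable chosen in advance; the dredged analysis looks at everything and reports a finding if anything is significant.

```python
import random
import statistics


def z_test_rejects(sample, z_crit=1.96):
    """Two-sided z-test of mean 0 with known sigma = 1, at roughly the 5% level."""
    z = statistics.fmean(sample) * len(sample) ** 0.5
    return abs(z) > z_crit


def simulate(n_studies=2000, n_variables=20, n_obs=10, seed=7):
    """Return (planned, dredged) false-positive rates when there is no real signal."""
    rng = random.Random(seed)
    planned_hits = 0  # test only the variable chosen before seeing the data
    dredged_hits = 0  # test everything, report if anything looks significant
    for _ in range(n_studies):
        rejections = [
            z_test_rejects([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
            for _ in range(n_variables)
        ]
        planned_hits += rejections[0]
        dredged_hits += any(rejections)
    return planned_hits / n_studies, dredged_hits / n_studies


planned_rate, dredged_rate = simulate()
print(f"planned test:  {planned_rate:.2f} false-positive rate")
print(f"dredged tests: {dredged_rate:.2f} false-positive rate")
# In theory: 0.05 for the planned test and 1 - 0.95**20 (about 0.64) when dredging.
```

The same logic applies to cancer clusters: if enough maps are examined, some neighborhood will look unusual by chance alone.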
18
Statistical Method
- Define the problem
  - Formulate your questions in such a way that unambiguous answers are possible
- Collect data
  - Collect data capable of answering your question
- Analyze the data
- Present the results
  - in terms your audience can understand
19
Define the Problem
“An approximate answer to the right problem
is worth a good deal more than an exact
answer to an approximate problem.”
John Tukey
"It is better to solve the right problem the
wrong way than to solve the wrong problem
the right way".
Richard Hamming
20
Data Collection
- Collect data that are capable of answering the question asked (Data Quality Objectives)
  - Designed experiments
  - Observational studies
- Sampling
  - You select samples from a population in order to make inferences about the population
21
GIGO
- The collection of data is often the most time-consuming and expensive part of a study
- Reverend Bayes and all of his horses can’t fix a bum dataset
22
Analyze the Data
- All statistical procedures have assumptions
- In practice, the assumptions of any given statistical procedure are violated to some degree
  - Can the validity of the assumptions be verified?
  - Can the validity of the answer be verified?
- How robust is your statistical procedure to violations of its assumptions?
- Simple approximate solutions you can understand may be better than complex exact solutions that you can’t
- Augment standard statistical analyses with simulations
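
As an illustration of augmenting a standard analysis with a simulation, the sketch below checks how often a nominal 95% t-interval for the mean actually covers the true mean, first when the normality assumption holds and then when the data are strongly skewed. The setup is an assumption made for this example (10 observations per sample, a lognormal distribution with sigma = 1.5 as the skewed case, and the helper names `t_interval_covers` and `coverage`); it is not taken from the talk.

```python
import math
import random
import statistics

N_OBS = 10
T_CRIT_9DF = 2.262  # 97.5th percentile of Student's t with N_OBS - 1 = 9 degrees of freedom


def t_interval_covers(sample, true_mean):
    """True if the nominal 95% t-interval for the mean contains true_mean."""
    center = statistics.fmean(sample)
    half_width = T_CRIT_9DF * statistics.stdev(sample) / math.sqrt(len(sample))
    return center - half_width <= true_mean <= center + half_width


def coverage(draw, true_mean, n_trials=4000, seed=3):
    """Fraction of simulated samples whose interval covers the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        sample = [draw(rng) for _ in range(N_OBS)]
        hits += t_interval_covers(sample, true_mean)
    return hits / n_trials


SIGMA = 1.5  # shape parameter of the skewed (lognormal) case
normal_cov = coverage(lambda rng: rng.gauss(5.0, 2.0), true_mean=5.0)
skewed_cov = coverage(
    lambda rng: rng.lognormvariate(0.0, SIGMA),
    true_mean=math.exp(SIGMA ** 2 / 2),  # mean of a lognormal with mu = 0
)
print(f"normal data: {normal_cov:.3f} coverage (nominal 0.95)")
print(f"skewed data: {skewed_cov:.3f} coverage (nominal 0.95)")
```

A few lines of simulation like this answer the robustness question directly for the sample size and kind of data at hand, rather than relying on a general rule of thumb.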
23
Present Results
- Technical answer versus the functional answer
  - “the null hypothesis is not rejected”
  - technically “not rejected” ≠ “accepted”
  - functionally “not rejected” = “accepted”
- Statistical significance and practical significance
  - Apply the “so what” test to your answers
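
The gap between statistical and practical significance can be sketched with a large synthetic data set. Everything below is an assumption made for illustration (two groups of 50,000 readings, a true shift of 0.05 units against a standard deviation of 1, and the helper names `mean_and_variance` and `two_sample_z`); none of it comes from the talk.

```python
import math
import random


def mean_and_variance(values):
    """Sample mean and sample variance (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return mean, sum((v - mean) ** 2 for v in values) / (n - 1)


def two_sample_z(a, b):
    """Large-sample z statistic and two-sided p-value for a difference in means."""
    mean_a, var_a = mean_and_variance(a)
    mean_b, var_b = mean_and_variance(b)
    diff = mean_b - mean_a
    z = diff / math.sqrt(var_a / len(a) + var_b / len(b))
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return diff, z, p


rng = random.Random(11)
n = 50_000
group_a = [rng.gauss(100.0, 1.0) for _ in range(n)]
group_b = [rng.gauss(100.05, 1.0) for _ in range(n)]  # true shift: 0.05 units

diff, z, p_value = two_sample_z(group_a, group_b)
print(f"estimated difference = {diff:.3f} units, z = {z:.1f}, p = {p_value:.2g}")
```

With this much data the test flags the difference as highly significant, yet the difference itself is a small fraction of the spread of the readings. Whether that matters is the "so what" question, and the p-value cannot answer it.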
24
What is a Statistician?
“Powerful spirits should only be
called by the master himself”
Goethe
The Sorcerer's Apprentice
25
What is a Statistician?
- Based on Chatfield’s definition of statistics, anyone who makes decisions based on the analysis of data might be called a statistician
- However, the title statistician is usually reserved for a professional who has specialized training in the concepts, theoretical bases, and methodologies of statistics
- Key difference between the sorcerer and his apprentice
  - Contrary to what you might think, there is a lot of subjectivity and professional judgment in the practice of statistics
  - Statistics is vast in scope and detail, and the apprentice does not know what he does not know
“It ain't what you don't know that gets you into trouble. It's what you
know for sure that just ain't so.”
Mark Twain
26
The Sorcerer’s Apprentice
- We may not be statisticians, but we are clearly doing statistics, often without adult supervision
- Doing our own statistics is a good thing, but we need to become better students of the black arts and consult the master before the brooms get out of control
“Should I refuse a good dinner simply because I do not understand
the processes of digestion?”
Oliver Heaviside
[On being criticized for using formal mathematical manipulations without understanding
how they worked]
27
How We Can be Better Statisticians
- Master the basics
- Learn the language
- Play with your data
- Use better software
- Perform reproducible work
- Consult with a real statistician

28
Master the Basics
Khan Academy
http://www.khanacademy.org/
29
Statistics MS/Certificate Distance Programs
- University of South Carolina
- Colorado State University
- Texas A&M University
- Penn State University

30
Concepts and Terminology
- Specialized Concepts
  - Population versus sample, for example
- Statistics has a very precise language all its own
  - “the null hypothesis is not rejected”
  - “not rejected” ≠ “accepted”
  - Questions and answers are not right unless you use the proper language to convey the proper concept
  - Some statisticians can be intolerant of laymen who misuse the language of statistics
- Learn to phrase questions and interpret answers properly
31
Exploratory Statistics
- Learn to play with your data and see if it is trying to tell you something new
- Study graphs of your data

“There is no data that can be
displayed in a pie chart, that cannot be
displayed BETTER in some other type
of chart.”
John Tukey
32
Software used for Statistics
- I use the following software for statistical calculations (in order of usage)
  - R
  - Minitab
  - SAS
  - Spreadsheet (e.g., MS Excel, Gnumeric)
- There are many others
33
Spreadsheets (Excel)
- What some people can do in Excel is nothing short of amazing (but should they be doing it?)
  - Amarillo Slim beat tennis champ Bobby Riggs at Ping-Pong, using a frying pan instead of a paddle
- Spreadsheet Addiction by Patrick Burns
  - http://lib.stat.cmu.edu/S/Spoetry/Tutor/spreadsheet_addiction.html
- Problems with spreadsheet implementation
  - Excel has a long history of doing bad stats
- Problems with spreadsheet paradigm
  - Reproducible science
34
http://www.msnbc.msn.com/id/21033161/from/RS.1/
9/28/2007
M. G. Almiron et al. On the Numerical
Accuracy of Spreadsheets, Journal of
Statistical Software (34) 4, 2010
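
Numerical accuracy problems are not specific to any one program. The sketch below is a generic illustration, not a reproduction of any result from the cited paper or of any spreadsheet's internals. It compares the one-pass "calculator" formula for the sample variance with the two-pass formula. The data values and function names are made up for this example. Both formulas are algebraically identical, but the one-pass version subtracts two huge, nearly equal numbers and loses the answer to rounding when the data have a large offset.

```python
def naive_variance(values):
    """One-pass 'calculator' formula: (sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    return (total_sq - total * total / n) / (n - 1)


def two_pass_variance(values):
    """Two-pass formula: subtract the mean first, then sum the squared deviations."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


small = [4.0, 7.0, 13.0, 16.0]        # sample variance is exactly 30
shifted = [1e9 + v for v in small]    # same spread, large offset

for label, data in (("small values", small), ("shifted by 1e9", shifted)):
    print(
        f"{label:>15}: naive = {naive_variance(data)!r}, "
        f"two-pass = {two_pass_variance(data)!r}"
    )
```

Adding a constant to every value should not change the variance. The two-pass result stays at 30 for both data sets, while the one-pass result for the shifted data does not.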
35
Reproducible Research
- Reproducible research refers to the idea that the ultimate product of research is the paper along with the full computational environment used to produce the results in the paper such as the code, data, etc. necessary for reproduction of the results
[Diagram: Raw Data → Data Massaging → Calculations → Plots and Tables → Final Paper]
36
The R Project for Statistical Computing
- R is a language and environment for statistical computing and graphics
- R is available as Free Software under the terms of the GNU General Public License in source code form
- It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS
- Download from http://www.r-project.org/
37
Advantages of R
- Command line interface rather than a GUI
  - Promotes reproducible statistics
- Open source
  - Flexible licensing
  - Availability of source code for peer review
  - Bugs are public knowledge and are fixed quickly
  - New tests and methods tend to appear first in R
- Many dozens of recently published books devoted to R
- Free (and very good) community support available
38
Consult with a Statistician
- If you are going to involve a statistician, do it at the study design and data collection phases
  - If not, at least estimate how much it will cost to collect the data all over again
- Anybody can analyze compelling data
“To call in the statistician after the experiment is done
may be no more than asking him to perform a post-mortem examination: he may be able to say what the
experiment died of.”
Sir Ronald Fisher
39
Twisted Answers to Crooked Questions
- As health physicists there are times when a decision will be made, with or without good data and a proper statistical analysis
- In such situations we base our decisions on professional judgment, often augmented with “statistics”
  - We must not fool ourselves about what we are doing
    - … of all the wrong answers we have to choose from, this one is the best
  - We have no right to expect a statistician to endorse such mischief
40
The Apprentice Should Beware of …
- The Management Prior
- Being bamboozled by other people’s statistics
- “The only right way to do this is X [insert statistical method here]”
- Being seduced by complexity

41
Statistics in the Workplace:
Musings of a Sorcerer's Apprentice
Presentation to USC Stat Club
March 26, 2009

- Main message
  - A degree in statistics is a “Swiss Army Knife” that is very useful in any endeavor where data are collected and analyzed
  - Everyone in the room should become a health physicist (I had no takers)
42