E BOOK Applied Medical Statistics 1st Edition by Jingmei Jiang

Get all Chapters For Ebook Instant Download by email at
etutorsource@gmail.com
Contents
1 What is Biostatistics 1
1.1 Overview 1
1.2 Some Statistical Terminology 2
1.2.1 Population and Sample 2
1.2.2 Homogeneity and Variation 3
1.2.3 Parameter and Statistic 4
1.2.4 Types of Data 4
1.2.5 Error 5
1.3 Workflow of Applied Statistics 6
1.4 Statistics and Its Related Disciplines 6
1.5 Statistical Thinking 7
1.6 Summary 7
1.7 Exercises 8
2 Descriptive Statistics 11
2.1 Frequency Tables and Graphs 12
2.1.1 Frequency Distribution of Numerical Data 12
2.1.2 Frequency Distribution of Categorical Data 16
2.2 Descriptive Statistics of Numerical Data 17
2.2.1 Measures of Central Tendency 17
2.2.2 Measures of Dispersion 26
2.3 Descriptive Statistics of Categorical Data 31
2.3.1 Relative Numbers 31
2.3.2 Standardization of Rates 34
2.4 Constructing Statistical Tables and Graphs 38
2.4.1 Statistical Tables 38
2.4.2 Statistical Graphs 40
2.5 Summary 47
2.6 Exercises 48
3 Fundamentals of Probability 53
3.1 Sample Space and Random Events 54
3.1.1 Definitions of Sample Space and Random Events 54
3.1.2 Operation of Events 55
3.2 Relative Frequency and Probability 58
3.2.1 Definition of Probability 59
3.2.2 Basic Properties of Probability 59
3.3 Conditional Probability and Independence of Events 60
3.3.1 Conditional Probability 60
3.3.2 Independence of Events 60
3.4 Multiplication Law of Probability 61
3.5 Addition Law of Probability 62
3.5.1 General Addition Law 62
3.5.2 Addition Law of Mutually Exclusive Events 62
3.6 Total Probability Formula and Bayes’ Rule 63
3.6.1 Total Probability Formula 63
3.6.2 Bayes’ Rule 64
3.7 Summary 65
3.8 Exercises 65
4 Discrete Random Variable 69
4.1 Concept of the Random Variable 69
4.2 Probability Distribution of the Discrete Random Variable 70
4.2.1 Probability Mass Function 70
4.2.2 Cumulative Distribution Function 71
4.2.3 Association Between the Probability Distribution and Relative Frequency Distribution 72
4.3 Numerical Characteristics 73
4.3.1 Expected Value 73
4.3.2 Variance and Standard Deviation 74
4.4 Commonly Used Discrete Probability Distributions 75
4.4.1 Binomial Distribution 75
4.4.2 Multinomial Distribution 80
4.4.3 Poisson Distribution 82
4.5 Summary 87
4.6 Exercises 87
5 Continuous Random Variable 91
5.1 Concept of Continuous Random Variable 92
5.2 Numerical Characteristics 93
5.3 Normal Distribution 94
5.3.1 Concept of the Normal Distribution 94
5.3.2 Standard Normal Distribution 96
5.3.3 Descriptive Methods for Assessing Normality 99
5.4 Application of the Normal Distribution 102
5.4.1 Normal Approximation to the Binomial Distribution 102
5.4.2 Normal Approximation to the Poisson Distribution 105
5.4.3 Determining the Medical Reference Interval 108
5.5 Summary 109
5.6 Exercises 110
6 Sampling Distribution and Parameter Estimation 113
6.1 Samples and Statistics 114
6.2 Sampling Distribution of a Statistic 114
6.2.1 Sampling Distribution of the Mean 115
6.2.2 Sampling Distribution of the Variance 120
6.2.3 Sampling Distribution of the Rate (Normal Approximation) 122
6.3 Estimation of One Population Parameter 124
6.3.1 Point Estimation and Its Quality Evaluation 124
6.3.2 Interval Estimation for the Mean 126
6.3.3 Interval Estimation for the Variance 130
6.3.4 Interval Estimation for the Rate (Normal Approximation Method) 131
6.4 Estimation of Two Population Parameters 132
6.4.1 Estimation of the Difference in Means 132
6.4.2 Estimation of the Ratio of Variances 136
6.4.3 Estimation of the Difference Between Rates (Normal Approximation Method) 139
6.5 Summary 141
6.6 Exercises 141
7 Hypothesis Testing for One Parameter 145
7.1 Overview 145
7.1.1 Concepts and Procedures 146
7.1.2 Type I and Type II Errors 150
7.1.3 One-sided and Two-sided Hypothesis 152
7.1.4 Association Between Hypothesis Testing and Interval Estimation 153
7.2 Hypothesis Testing for One Parameter 155
7.2.1 Hypothesis Tests for the Mean 155
7.2.1.1 Power of the Test 156
7.2.1.2 Sample Size Determination 160
7.2.2 Hypothesis Tests for the Rate (Normal Approximation Methods) 162
7.2.2.1 Power of the Test 163
7.2.2.2 Sample Size Determination 164
7.3 Further Considerations on Hypothesis Testing 164
7.3.1 About the Significance Level 164
7.3.2 Statistical Significance and Clinical Significance 165
7.4 Summary 165
7.5 Exercises 166
8 Hypothesis Testing for Two Population Parameters 169
8.1 Testing the Difference Between Two Population Means: Paired Samples 170
8.2 Testing the Difference Between Two Population Means: Independent Samples 173
8.2.1 t-Test for Means with Equal Variances 173
8.2.2 F-Test for the Equality of Two Variances 176
8.2.3 Approximation t-Test for Means with Unequal Variances 178
8.2.4 Z-Test for Means with Large-Sample Sizes 181
8.2.5 Power for Comparing Two Means 182
8.2.6 Sample Size Determination 183
8.3 Testing the Difference Between Two Population Rates (Normal Approximation Method) 185
8.3.1 Power for Comparing Two Rates 186
8.3.2 Sample Size Determination 187
8.4 Summary 188
8.5 Exercises 189
9 One-way Analysis of Variance 193
9.1 Overview 193
9.1.1 Concept of ANOVA 194
9.1.2 Data Layout and Modeling Assumption 195
9.2 Procedures of ANOVA 196
9.3 Multiple Comparisons of Means 204
9.3.1 Tukey’s Test 204
9.3.2 Dunnett’s Test 206
9.3.3 Least Significant Difference (LSD) Test 209
9.4 Checking ANOVA Assumptions 211
9.4.1 Check for Normality 211
9.4.2 Test for Homogeneity of Variances 213
9.4.2.1 Bartlett’s Test 213
9.4.2.2 Levene’s Test 215
9.5 Data Transformations 217
9.6 Summary 218
9.7 Exercises 218
10 Analysis of Variance in Different Experimental Designs 221
10.1 ANOVA for Randomized Block Design 221
10.1.1 Data Layout and Model Assumptions 223
10.1.2 Procedure of ANOVA 224
10.2 ANOVA for Two-factor Factorial Design 229
10.2.1 Concept of Factorial Design 230
10.2.2 Data Layout and Model Assumptions 233
10.2.3 Procedure of ANOVA 234
10.3 ANOVA for Repeated Measures Design 240
10.3.1 Characteristics of Repeated Measures Data 240
10.3.2 Data Layout and Model Assumptions 242
10.3.3 Procedure of ANOVA 243
10.3.4 Sphericity Test of Covariance Matrix 245
10.3.5 Multiple Comparisons of Means 248
10.4 ANOVA for 2 × 2 Crossover Design 251
10.4.1 Concept of a 2 × 2 Crossover Design 251
10.4.2 Data Layout and Model Assumptions 252
10.4.3 Procedure of ANOVA 254
10.5 Summary 256
10.6 Exercises 257
11 χ² Test 261
11.1 Contingency Table 262
11.1.1 General Form of Contingency Table 263
11.1.2 Independence of Two Categorical Variables 264
11.1.3 Significance Testing Using the Contingency Table 265
11.2 χ² Test for a 2 × 2 Contingency Table 266
11.2.1 Test of Independence 266
11.2.2 Yates’ Corrected χ² Test for a 2 × 2 Contingency Table 269
11.2.3 Paired Samples Design χ² Test 269
11.2.4 Fisher’s Exact Tests for Completely Randomized Design 272
11.2.5 Exact McNemar’s Test for Paired Samples Design 275
11.3 χ² Test for R × C Contingency Tables 276
11.3.1 Comparison of Multiple Independent Proportions 276
11.3.2 Multiple Comparisons of Proportions 278
11.4 χ² Goodness-of-Fit Test 280
11.4.1 Normal Distribution Goodness-of-Fit Test 281
11.4.2 Poisson Distribution Goodness-of-Fit Test 283
11.5 Summary 284
11.6 Exercises 285
12 Nonparametric Tests Based on Rank 289
12.1 Concept of Order Statistics 289
12.2 Wilcoxon’s Signed-Rank Test for Paired Samples 290
12.3 Wilcoxon’s Rank-Sum Test for Two Independent Samples 295
12.4 Kruskal-Wallis Test for Multiple Independent Samples 299
12.4.1 Kruskal-Wallis Test 299
12.4.2 Multiple Comparisons 301
12.5 Friedman’s Test for Randomized Block Design 303
12.6 Further Considerations About Nonparametric Tests 306
12.7 Summary 306
12.8 Exercises 306
13 Simple Linear Regression 311
13.1 Concept of Simple Linear Regression 311
13.2 Establishment of Regression Model 314
13.2.1 Least Squares Estimation of a Regression Coefficient 314
13.2.2 Basic Properties of the Regression Model 316
13.2.3 Hypothesis Testing of Regression Model 317
13.3 Application of Regression Model 321
13.3.1 Confidence Interval Estimation of a Regression Coefficient 321
13.3.2 Confidence Band Estimation of Regression Model 322
13.3.3 Prediction Band Estimation of Individual Response Values 323
13.4 Evaluation of Model Fitting 325
13.4.1 Coefficient of Determination 325
13.4.2 Residual Analysis 326
13.5 Summary 327
13.6 Exercises 328
14 Simple Linear Correlation 331
14.1 Concept of Simple Linear Correlation 331
14.1.1 Definition of Correlation Coefficient 331
14.1.2 Interpretation of Correlation Coefficient 334
14.2 Hypothesis Testing of Correlation Coefficient 336
14.3 Confidence Interval Estimation for Correlation Coefficient 338
14.4 Spearman’s Rank Correlation 340
14.4.1 Concept of Spearman’s Rank Correlation Coefficient 340
14.4.2 Hypothesis Testing of Spearman’s Rank Correlation Coefficient 342
14.5 Summary 342
14.6 Exercises 343
15 Multiple Linear Regression 345
15.1 Multiple Linear Regression Model 346
15.1.1 Concept of the Multiple Linear Regression 346
15.1.2 Least Squares Estimation of Regression Coefficient 349
15.1.3 Properties of the Least Squares Estimators 351
15.1.4 Standardized Partial-Regression Coefficient 351
15.2 Hypothesis Testing 352
15.2.1 F-Test for Overall Regression Model 352
15.2.2 t-Test for Partial-Regression Coefficients 354
15.3 Evaluation of Model Fitting 356
15.3.1 Coefficient of Determination and Adjusted Coefficient of Determination 356
15.3.2 Residual Analysis and Outliers 357
15.4 Other Aspects of Regression 359
15.4.1 Multicollinearity 359
15.4.2 Selection of Independent Variables 361
15.4.3 Sample Size 364
15.5 Summary 364
15.6 Exercises 364
16 Logistic Regression 369
16.1 Logistic Regression Model 370
16.1.1 Linear Probability Model 371
16.1.2 Probability, Odds, and Logit Transformation 371
16.1.3 Definition of Logistic Regression 373
16.1.4 Inference for Logistic Regression 375
16.1.4.1 Estimation of Model Coefficient 375
16.1.4.2 Interpretation of Model Coefficient 378
16.1.4.3 Hypothesis Testing of Model Coefficient 380
16.1.4.4 Interval Estimation of Model Coefficient 382
16.1.5 Evaluation of Model Fitting 385
16.2 Conditional Logistic Regression Model 388
16.2.1 Characteristics of Conditional Logistic Regression Model 390
16.2.2 Estimation of Regression Coefficient 390
16.2.3 Hypothesis Testing of Regression Coefficient 393
16.3 Additional Remarks 394
16.3.1 Sample Size 394
16.3.2 Types of Independent Variables 394
16.3.3 Selection of Independent Variables 395
16.3.4 Missing Data 395
16.4 Summary 395
16.5 Exercises 396
17 Survival Analysis 399
17.1 Overview 400
17.1.1 Concept of Survival Analysis 400
17.1.2 Basic Functions of Survival Time 402
17.2 Description of the Survival Process 405
17.2.1 Product Limit Method 405
17.2.2 Life Table Method 408
17.3 Comparison of Survival Processes 410
17.3.1 Log-Rank Test 410
17.3.2 Other Methods for Comparing Survival Processes 413
17.4 Cox’s Proportional Hazards Model 414
17.4.1 Concept and Model Assumptions 415
17.4.2 Estimation of Model Coefficient 417
17.4.3 Hypothesis Testing of Model Coefficient 419
17.4.4 Evaluation of Model Fitting 420
17.5 Other Aspects of Cox’s Proportional Hazard Model 421
17.5.1 Hazard Index 421
17.5.2 Sample Size 421
17.6 Summary 422
17.7 Exercises 423
18 Evaluation of Diagnostic Tests 431
18.1 Basic Characteristics of Diagnostic Tests 431
18.1.1 Sensitivity and Specificity 433
18.1.2 Composite Measures of Sensitivity and Specificity 435
18.1.3 Predictive Values 438
18.1.4 Sensitivity and Specificity Comparison of Two Diagnostic Tests 440
18.2 Agreement Between Diagnostic Tests 443
18.2.1 Agreement of Categorical Data 444
18.2.2 Agreement of Numerical Data 447
18.3 Receiver Operating Characteristic Curve Analysis 448
18.3.1 Concept of an ROC Curve 449
18.3.2 Area Under the ROC Curve 450
18.3.3 Comparison of Areas Under ROC Curves 453
18.4 Summary 456
18.5 Exercises 457
19 Observational Study Design 461
19.1 Cross-Sectional Studies 462
19.1.1 Types of Cross-Sectional Studies 462
19.1.2 Probability Sampling Methods 462
19.1.3 Sample Size for Surveys 466
19.1.4 Cross-Sectional Studies for Clues of Etiology 468
19.2 Cohort Studies 469
19.2.1 Measures of Association in Cohort Studies 469
19.2.2 Sample Size for Cohort Studies 470
19.3 Case-Control Studies 472
19.3.1 Measures of Association in Case-Control Studies 472
19.3.2 Sample Size for Case-Control Studies 473
19.4 Summary 474
19.5 Exercises 475
20 Experimental Study Design 477
20.1 Overview 478
20.1.1 Basic Components of an Experimental Study 478
20.1.2 Principles of Experimental Study Design 480
20.1.3 Blinding Procedures in Clinical Trials 482
20.2 Completely Randomized Design 483
20.2.1 Concept of Completely Randomized Design 483
20.2.2 Sample Size for Completely Randomized Design 485
20.3 Randomized Block Design 486
20.3.1 Concepts of Randomized Block Design 486
20.3.2 Sample Size for Randomized Block Design 488
20.4 Factorial Design 489
20.5 Crossover Design 491
20.5.1 Concepts of Crossover Design 491
20.5.2 Sample Size for 2 × 2 Crossover Design 492
20.6 Summary 493
20.7 Exercises 493
Appendix 495
References 549
Index 557
1 What is Biostatistics?
CONTENTS
1.1 Overview 1
1.2 Some Statistical Terminology 2
1.2.1 Population and Sample 2
1.2.2 Homogeneity and Variation 3
1.2.3 Parameter and Statistic 4
1.2.4 Types of Data 4
1.2.5 Error 5
1.3 Workflow of Applied Statistics 6
1.4 Statistics and Its Related Disciplines 6
1.5 Statistical Thinking 7
1.6 Summary 7
1.7 Exercises 8
1.1 Overview
Data are present everywhere in our lives, and almost all types of scientific research
have to deal with the collection, description, or analysis of data. This makes statistics
one of the most powerful methodologies across all disciplines for exploring the
unknown world. Statistics is a discipline in its own right and has a wide spectrum of theories, methods, and applications. A prerequisite for discussing the theory and application of statistics is the definition and statement of its objectives. According to
Merriam–Webster’s Collegiate Dictionary, statistics is “a branch of mathematics
dealing with the collection, analysis, interpretation, and presentation of masses of
numerical data.” According to the Random House College Dictionary, it is “the
science that deals with the collection, classification, analysis, and interpretation of
information or data.”
According to The New Oxford English–Chinese Dictionary, it is “the practice or
science of collecting and analyzing numerical data in large quantities, especially for the
purpose of inferring proportions in a whole from those in a representative sample.”
Although there are some differences among these definitions, each definition
implies that statistics is a science of data and uses the theory of mathematical
statistics to make inferences.
The application of statistical theories and methods to medical research fields is termed
“medical statistics,” or more broadly, biostatistics when applied to life sciences.
There are two branches of biostatistics based on its functions: (i) statistical description is concerned with the organization, summarization, and description of data; and
(ii) statistical inference is concerned with the use of sample data to make inferences
about the characteristics of a larger set of data. This division of descriptive and inferential statistics helps us to establish a progressive learning framework for statistics.
However, this division is not always necessary in scientific activities where the two
branches complement each other in deepening our knowledge of the real world.
We briefly review the development of biostatistics. In London in 1603, the Bills of
Mortality began to be published weekly, which is generally considered to mark the
beginning of biostatistics. Since then, related theories have continued to emerge, and the
early twentieth century ushered in the peak of development of biostatistics. Several pioneers played a crucial role in the development of the theoretical framework and applications of biostatistics. G.J. Mendel (1822–1884), the father of modern genetics, used
probability rules to discover the basic laws of biogenetics in the 1860s. He is considered to
be one of the first to apply mathematical methods to biology. K. Pearson (1857–1936), the
founding father of modern statistics, established the world’s first department of statistics
at University College London in 1911, and developed several key statistical theories (e.g.,
measures of correlation and the χ² distribution). W.S. Gosset (1876–1937) proposed the t distribution and t-test in 1908, which laid the foundation for the sampling distribution of
the sample mean and signified the establishment of small-sample theory and methodology. R.A. Fisher (1890–1962) developed statistical significance tests and various
sampling distributions, and established the experimental design method and related
statistical analysis techniques. These were collected in The Design of Experiments, which was
first published in 1935. With the efforts of these pioneers and other statisticians, after
hundreds of years, a complete theoretical system of biostatistics had formed.
At the present time, the development of biostatistics is being driven by the unprecedented and still growing range of life science applications using advances in computing power and computer technology, and new formats of data that continue to
emerge. Despite this, the ideas of basic statistics have not changed: to make an inference about a population based on information contained in a sample from that
population and to provide an associated measure of goodness for the inference.
1.2 Some Statistical Terminology
In this text, we aim to explain basic statistical methods commonly applied in biomedical research. Before this, we provide an overview of several statistical terms, which are
the premise for further learning.
1.2.1 Population and Sample
A population (statistical population or target population) is the collection of values of one or more characteristics of the study subjects that are the target of interest. A population is usually denoted by
X (also called a random variable) and can be viewed as a dataset. The basic unit that
constitutes the population is called an individual.
The dataset that defines a population is typically large or conceptual. The former
suggests a finite population because it has a finite number of individuals regardless of
how large it is. For example, the dataset of the heights of all the college freshman boys
in Beijing in 2020 is a finite population (though very large). When the dataset only
exists conceptually, we call it an infinite population, for example, the weights of infants
and the antihypertensive treatment effects of a certain drug. The sampling theory and
statistical inference principle introduced in this text are based on an infinite population.
A sample, denoted by $X_1, X_2, \ldots, X_n$ (where $n$ is the sample size), is a subset of data selected
from a population. The purpose of obtaining a sample is to infer the characteristics of its underlying unknown population.
The process of drawing a sample from a population is termed sampling. In practice,
depending on the research objectives and feasibility, samples can be obtained using
random or non-random sampling. A random sample is obtained through probability sampling. In this text, we generally assume the use of a simple random sample in which each
individual in the population has an equal chance of being sampled. Non-random sampling
relies on the subjective judgment of the researcher and is beyond the scope of this text.
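Simple random sampling from a finite population can be sketched in a few lines of Python. This is only an illustration: the population of heights is simulated, and the function name `simple_random_sample`, the seeds, and the sizes are hypothetical choices, not part of the text.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n individuals without replacement.

    random.Random.sample gives every individual the same chance of selection.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical finite population: simulated heights (cm) of 1,000 individuals.
gen = random.Random(2020)
population = [round(gen.gauss(175, 6), 1) for _ in range(1000)]

# Draw a simple random sample of size n = 50.
sample = simple_random_sample(population, n=50, seed=1)
print(len(sample), sample[:5])
```

Fixing the seed makes the draw reproducible, which is convenient for teaching but is not required by the sampling scheme itself.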
Note the following: (i) The concept of population is different in biomedical research
and statistical terminology. In biomedical research, the term “study population” (or
study subject) typically refers to a group of humans or other species of organism, whereas
in statistics the population of interest consists of the values of a characteristic of those study subjects.
For example, in a study of blood glucose concentrations among 3-year-old children, all
children of that age are regarded as the study population. However, from a statistical
point of view, all blood glucose concentrations in children of that age constitute the
population of interest. (ii) Although the dataset of a population is typically large, the
essential difference between the population and the sample is not the amount of data we
have, but the objective of the research. If the objective is to provide a description only,
then the data we have can be regarded as a population, regardless of how small it is,
whereas if the objective is to draw an inference, then we need to clarify what population
we are interested in, and consider how to obtain a representative sample, or how good
the sample at hand is. The representativeness of the sample of the population is a very
important basis for a reasonable inference.
1.2.2 Homogeneity and Variation
In statistics, homogeneity means the similarity among individuals within a population.
In fact, without homogeneity, we can rarely define a population. The individual differences in a homogeneous population are termed variation.
Example 1.1 Survey of the height of college freshman boys in Beijing in 2020.
Homogeneity: College freshman boys in Beijing in 2020.
Variation: Individual differences in height.
Example 1.2 Study of the antihypertensive treatment effects of a drug.
Homogeneity: Hypertensive patients taking this drug.
Variation: Individual differences in the treatment effects.
From Examples 1.1 and 1.2, we can see that homogeneity refers to similarities in the
nature, condition, or background of individuals in a population. The mission of statistics
can be interpreted as describing the features of a homogeneous population and identifying the heterogeneity of different populations. Variation is an inherent attribute of life
sciences, and biomedical researchers should learn to use statistical methods to reveal
the laws of biological phenomena in the context of variation.
1.2.3 Parameter and Statistic
A descriptive measure of a characteristic calculated from a population is called a
population parameter, or simply, a parameter, generally denoted by the Greek letter θ.
For example, in the survey of the height of freshman boys, the population mean (average
height, typically denoted by µ) is a parameter. However, it is difficult to have data for
the entire population most of the time, so a sample is used instead. Correspondingly, a
descriptive measure based on a sample is called a sample statistic, or simply, a statistic.
For example, if we draw a sample (typically a random sample) from the population and
calculate the average height, the sample mean is a statistic and is typically denoted by
$\bar{x}$. The mathematical definition and roles of statistics are elaborated on in Chapter 6.
Because most populations are theoretical, the parameters are constants that are usually
unknown, whereas the statistics are calculated from samples, which are indeterminate,
and the values of statistics could be different for different samples.
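The contrast between a fixed parameter and a statistic that varies from sample to sample can be illustrated with a small simulation. The sketch below assumes a simulated finite population of heights. The mean of 175 cm, the standard deviation of 6 cm, and the sample size of 100 are arbitrary values chosen for illustration.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 10,000 heights (cm); simulated, not real data.
population = [rng.gauss(175, 6) for _ in range(10_000)]

# Parameter: the population mean, a constant once the population is fixed.
mu = statistics.fmean(population)

# Statistic: the sample mean, recomputed for five different random samples.
sample_means = [
    statistics.fmean(rng.sample(population, 100)) for _ in range(5)
]

print(f"population mean (parameter): {mu:.2f}")
print("sample means (statistics):", [round(m, 2) for m in sample_means])
```

Each run of the inner expression draws a different sample, so the five sample means differ from one another while the population mean stays the same.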
1.2.4 Types of Data
Data are the recorded observations of a characteristic of the population. Data can
be classified as numerical and categorical, depending on their properties:
(1) Numerical data, also known as quantitative data, are the data expressed in numbers and are obtained by measuring each research subject’s indices, that is, the
quantity or number of things. Numerical data differentiate themselves from other
number-form data types as a result of the ability to perform arithmetic operations
using these numbers. We can subdivide numerical data into two types:
Continuous data occur when data can be measured on a continuum or scale, i.e.,
there is a possible value between any other two values.
Most numerical data in biomedical research are continuous or can be viewed as continuous. For instance, if we conduct a survey on the health and nutritional status of
7-year-old boys in a less developed region in 2020, the measurement results of their
heights (cm), weights (kg), and hemoglobin (g/L) can be viewed as continuous data
because their values can assume, in theory, any value in a certain range.
Discrete data occur when the data can only take certain values. The possible values
of discrete data are generally integers. For instance, if we also collect data on the
number of cases of cold (0, 1, 2, …) in 2020 for the 7-year-old boys, then they are discrete
data.
(2) Categorical data, also known as qualitative data, include two subtypes:
Unordered categorical data are obtained by dividing research subjects into two or
more unordered groups. For instance, we can denote a man and woman as 1 and 2 for
sex and denote A, B, O, and AB as 1, 2, 3, and 4 for blood type. Unlike numerical data,
the numbers representing different categories do not have mathematical meanings.
Individual values do not have a quantitative difference if they belong to the same category and have qualitative differences if they belong to different categories.
Ordinal categorical data are obtained by dividing research subjects into orderings of an
attribute. They are not measured; nonetheless, they have a potential ordering. For instance, the treatment effect of a disease can be ordered as cured, effective, improved, ineffective, and deteriorated. The laboratory test results of urine protein determination can be
ordered as −, ±, +, ++, and +++. We can also use numerical values such as 1, 2, 3, … to
represent the potential grades, although the numbers do not have numerical meanings.
Numerical data and categorical data are not set in stone; under certain conditions,
they can be exchanged according to the research objectives and statistical methods
used. For example, in a large survey on hypertension, the blood pressure values collected are numerical data. If we want to estimate the prevalence of hypertension, we
could group survey participants according to whether they are hypertensive (1 for
hypertensive and 0 for not hypertensive), and the data become unordered categorical
data (binary data). If we want to know the degrees of hypertension, the blood pressure
measurements can be reclassified into ordinal categorical data. Conversely, categorical
data can also be changed to numerical data. For example, if we want to compare the
epidemic of hypertension in different regions, we could use binary data to calculate
the hypertension prevalence p, which ranges from 0 to 1 and belongs to the scope of
numerical data. In the study design, we should collect as much raw data (original
data) as possible in numerical form to minimize the loss of information and allow for
flexible transformation.
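The conversions described above can be sketched as follows. The blood pressure readings are made up, and the cutoff of 140 mmHg and the grade boundaries of 140, 160, and 180 mmHg are assumptions used only for illustration. The text does not specify any thresholds.

```python
from bisect import bisect_right

# Hypothetical systolic blood pressure readings (mmHg): numerical data.
systolic = [118, 152, 135, 141, 167, 128, 183, 139, 145, 122]

# Numerical -> binary (unordered categorical): 1 = hypertensive, 0 = not.
CUTOFF = 140  # assumed threshold, for illustration only
binary = [1 if value >= CUTOFF else 0 for value in systolic]

# Numerical -> ordinal categorical: assumed grade boundaries.
GRADE_BOUNDS = [140, 160, 180]
GRADE_LABELS = ["normal", "grade 1", "grade 2", "grade 3"]
ordinal = [GRADE_LABELS[bisect_right(GRADE_BOUNDS, value)] for value in systolic]

# Binary -> numerical: the prevalence p lies between 0 and 1.
prevalence = sum(binary) / len(binary)

print(binary)
print(ordinal)
print(prevalence)
```

Note that the grouping cannot be undone: the original readings cannot be recovered from `binary` or `ordinal`, which is why collecting raw numerical data is recommended.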
1.2.5 Error
Error refers to the difference between the observed value and real value (parameter).
The following formula defines the relation between them:
x = θ + ε, (1.1)
where x denotes the observed value; θ denotes the real value, theoretically; and ε
denotes the error, which can represent a random error or systematic error.
(1) A random error, as the name suggests, is completely random, that is, the magnitude
and sign of ε cannot be predetermined, and its range is ε ∈ (−∞, +∞). A random
error is caused by the influence of many uncertain factors in the actual observation or
measurement process.
As shown in Formula 1.1, a random error can be interpreted in many ways. For
example, if x is the measured value in an experiment, then ε = x − θ reflects the
measurement error in the results of each measurement. Additionally, the sampling
error is the most typical type of random error. If x is a sample statistic, then ε = x − θ
reflects the difference between statistic x and the parameter θ resulting from the sampling process, which is fundamental to the study of statistical inference introduced in
Chapter 6.
(2) A systematic error, also known as bias in epidemiology, is another type of error: a
systematic deviation from the real value with a fixed magnitude and direction, that
is, ε = a (a ≠ 0), where a is a constant. A systematic error is caused by the influence
of certain factors, for example, an uncalibrated instrument, a perceptual bias
of the measurer, or standards for evaluating a treatment effect that are set too high or too low.
Random errors are unavoidable but can exhibit regular patterns under certain
conditions. The study and application of these patterns is one of
the most important elements of statistics. In practice, random and systematic
errors often coexist, and both must be considered in the study design and data
analysis.
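The difference between the two types of error in Formula 1.1 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the real value θ, the constant a, and the spread of the random error are hypothetical numbers, and the random error is modeled as a normal variable with mean zero, which is an assumption made here and not part of the definition above.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

THETA = 120.0   # hypothetical real value theta (e.g. a blood pressure in mmHg)
BIAS = 5.0      # hypothetical constant systematic error a
NOISE_SD = 3.0  # hypothetical spread of the random error
N = 10_000      # number of repeated measurements


def measure(theta, bias, noise_sd):
    """One observation x = theta + epsilon, where epsilon = bias + random part."""
    return theta + bias + random.gauss(0.0, noise_sd)


# Measurements affected by random error only, and by both kinds of error.
random_only = [measure(THETA, 0.0, NOISE_SD) for _ in range(N)]
with_bias = [measure(THETA, BIAS, NOISE_SD) for _ in range(N)]

mean_random_only = statistics.fmean(random_only)
mean_with_bias = statistics.fmean(with_bias)

print(f"random error only:     mean error = {mean_random_only - THETA:+.2f}")
print(f"with systematic error: mean error = {mean_with_bias - THETA:+.2f}")
```

Under these assumptions, averaging many measurements brings the mean error close to zero when only random error is present, whereas the mean error stays close to a when a systematic error is present. Repetition alone therefore does not remove a systematic error; it has to be addressed in the design, for example by calibrating the instrument.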
1.3 Workflow of Applied Statistics
The following four steps in applied statistical workflow are indispensable in practice:
Statistical design: This marks the beginning of scientific research, and is directly
responsible for the accuracy and reliability of the research results. Statistical design
should be conducted with specific research objectives and domain knowledge. This
means that good research design is inevitably based on interactions between domain
experts and statisticians. Two categories of research design exist in general, observational design and experimental design, which we discuss in Chapters 19 and 20,
respectively.
Data collection: Data collection is used to obtain the raw data required by research
through a reasonable and reliable approach. The collection of representative data is important for obtaining reliable conclusions. Regardless of which method is used, the accuracy
and integrity of the data should be given high priority.
Statistical analysis: The next step is the management and analysis of the raw data
according to the research objectives and types of data. This step typically includes
statistical description, statistical inference, and/or statistical modeling to extract the
information hidden in the data.
Statistical reporting: After the preceding steps are completed, the analysis results are presented. Appropriate statistical tables and graphs can be used to enhance the presentation of results. Final conclusions and suggestions are drawn, guided by domain
knowledge. A key feature of statistical reporting is that all conclusions are probabilistic.
1.4 Statistics and Its Related Disciplines
The discipline of statistics does not stand alone. Instead, it is closely related to the
development of other disciplines.
Statistics and medicine: Statistics not only helps to solve practical problems, but
also promotes its own development during the process. Its application to the biomedical sciences is a typical demonstration of this. With the further understanding of data
in the twenty-first century, evidence-based medicine, precision medicine, and other
quantitative methods will provide a broader space for applying statistics.
Statistics and mathematics: Statistics is a branch of mathematics. The
mathematical basis of statistics is the theory of probability and calculus. However, this
does not mean that learning statistics must be based on knowledge of advanced mathematics. In fact, the objective of learning statistics is not to master complicated
mathematical proofs but to apply statistical thinking and methods to solve
problems that arise in scientific research.
Statistics and computer science: Modern statistics cannot be separated from
developments in computer science. The field of statistics has benefited greatly from
advances in computing power. In the digital era, computer science and information
technology are as important to statistics as the theory of probability. Computer software
has become an important auxiliary tool for statistical analysis. The conclusions are
largely the same using different statistical software, even if the numerical results have
minor differences. To avoid any distraction caused by these technical issues in learning
statistical ideas and methods, in this text, we present results mainly using SPSS, among
other alternatives.
1.5 Statistical Thinking
Statistical thinking includes applying rational thinking and statistical science to critically
evaluate data and the inferences drawn from them, whether correct or false. How does statistical thinking
play its role in scientific research practice? To answer this question, we must note that
inferences based on sample data are almost always subject to error because a sample does
not provide an exact image of the population.
The population is typically a theoretical and conceptual truth of interest. The science of statistics helps us to establish a methodological framework or workflow to
draw inferences about the unknown characteristics of the population using the sample
of limited data at hand, based on one or a few assumptions. The statistical inference
process is an important part of the scientific method. Inference based on experimental
or observational data is first used to develop a theory about some phenomenon. Then
the theory is tested against additional sample data.
Errors may occur in the inference process based on a sample. What matters is how we
quantify and evaluate the error. Statistics connects the quantification of errors with the
measurement of the reliability of inference using probability. This connection provides a
solid theoretical basis for reasonable statistical inference.
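A minimal simulation sketch can make this connection concrete. It assumes a hypothetical normally distributed population with known mean and standard deviation, draws many samples of the same size, and looks at how far the sample means fall from the population mean. All numbers are illustrative assumptions, and the quantity σ/√n used below is the usual standard error of the mean, which is introduced here ahead of its formal treatment.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

MU, SIGMA = 80.0, 10.0  # hypothetical population mean and standard deviation
SAMPLE_SIZE = 25        # number of observations in each sample
REPEATS = 5_000         # number of samples drawn from the population

# Draw many samples and record the mean of each one.
sample_means = []
for _ in range(REPEATS):
    sample = [random.gauss(MU, SIGMA) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.fmean(sample))

# Spread of the sample means around the population mean.
theoretical_se = SIGMA / math.sqrt(SAMPLE_SIZE)
observed_sd = statistics.stdev(sample_means)

# Proportion of samples whose mean lies within two standard errors of MU.
within_two_se = sum(
    abs(m - MU) <= 2 * theoretical_se for m in sample_means
) / REPEATS

print(f"theoretical standard error: {theoretical_se:.2f}")
print(f"observed SD of sample means: {observed_sd:.2f}")
print(f"share of means within 2 SE:  {within_two_se:.3f}")
```

Any single sample mean is almost never exactly equal to the population mean, but under these assumptions the size of the error follows a predictable pattern, and a statement such as "the sample mean lies within two standard errors of the population mean" can be given an approximate probability.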
Statistics builds a bridge between abstract theoretical concepts and the solution of
specific problems. It enables researchers to make inferences (estimates and decisions
about the target population) with a known measurement of reliability. With this
ability, a researcher can make intelligent decisions and inferences from data; that is,
statistics helps researchers to think critically about their results.
We end this chapter with remarks from the famous statistician, C.R. Rao.
All knowledge is, in the final analysis, history.
All sciences are, in the abstract, mathematics.
All judgments are, in their rationale, statistics.
1.6 Summary
The learning objective of this chapter is to understand some basic concepts in
statistics and the role of statistics in biomedical research, which are the basis for
future learning.
Statistics is a science about data, and its basic characteristic is that it is a quantitative
science.
Two branches, statistical description and statistical inference, constitute the main
content of statistics.
The application of statistics to biomedical research generally includes the following
four steps: statistical design, data collection, statistical analysis, and statistical
reporting.
Statistical thinking includes the application of rational thinking and statistical science to critically evaluate data and make inferences from them.
1.7 Exercises
1. Suppose you were so interested in the waist circumference of your schoolmates
that you prepared a tape measure in a statistics class and measured the waist circumference of all your classmates who were present. Answer the following
questions:
(a) Decide whether the data you obtained constitute a sample or a population. For what
research objectives should it be considered a sample or population?
(b) If it is considered a sample, what is the population you are drawing an inference about? How representative of the population is it?
(c) How do you determine the homogeneity of your population? Is there heterogeneity? If yes, how can you improve the homogeneity? Is there variation? What
may lead to this variation?
(d) Are there errors in the obtained data? What are the random errors and
systematic errors? Can you tell the difference between them? Can you, and how
do you, minimize the errors?
(e) What steps do you need to follow to complete a report on your survey?
2. Choose a quantitative research article in clinical medicine, basic medicine, public
health, or any biomedical research topic you are interested in and answer the following questions:
(a) What is the population and how is it defined from the perspectives of the
research and statistics, respectively? What are the differences between the concepts of population using different perspectives?
(b) Is the sample presented in the research a random sample? What are the advantages of a random sample and non-random sample?
(c) Illustrate the relationship between the population and sample, and between
homogeneity and variation using your selected paper.
(d) Is there any factor that may lead to random or systematic errors in the research?
How do you distinguish them? How have they been minimized? Can you think
of ways to further minimize the errors?
(e) What data are collected? What are the types of data? How do you determine the
type of data? Which type of data contains more information? Do these types of
data allow for further transformation?
(f) How many steps are involved in the statistical plan? What are the specific roles
of these steps and what is the relation between these steps?
(g) Are the conclusions obtained from the research correct? How does the
knowledge of statistics learned from this chapter help you with critical
thinking?
(h) Can you follow the conceptual path as laid out by the research and use statistical
critical thinking to solve a problem that interests you in your daily life? Try to
create a statistical design as you deepen your knowledge and skills through
further learning.