
Basic Biostatistics Using SPSS 2019

Basic Biostatistics Workshop
Using SPSS 2019
Lesson 1
Descriptive Statistics
Dr. Mohamad Rodi Isa
MBBS (Malaya), DAP&E (SEAMEO-TROPMED, M’sia), MPH (Malaya), DrPH (Malaya)
Public Health Medicine, UiTM
MdRodiSPSS
Contents
Recap: Introduction to statistics.
Lesson 1: Descriptive Statistics.
Lesson 2: Inferential Statistics: Estimation & Hypothesis Testing.
Lesson 3: Analyzing Quantitative Data: T-test.
Lesson 4: Analyzing Quantitative Data: Analysis of Variance (ANOVA).
Lesson 5: Analyzing Categorical Data.
Lesson 6: Correlation Analyses.
Lesson 7: Simple Linear Regression.
Lesson 8: Simple Logistic Regression.
Lesson 9: Non-Parametric Method Data Analyses.
Descriptive Statistics
• Descriptive statistics are numerical values obtained from the sample that give meaning to the data collected.
• They consist of methods for organizing, displaying and describing data using tables, graphs and summary measures.
• Descriptive statistics consist of:
i. Frequency and proportion.
ii. Measures of central tendency.
iii. Measures of dispersion.
• Two important concepts for understanding descriptive statistics are:
i. Variables.
ii. Distribution.
Descriptive Statistics
• Involves
• Collecting Data
• Presenting Data
• Characterizing Data
• Purpose:
• Describe Data.
• Descriptive statistics:
• Simplifying, summarizing, describing data.
Descriptive Statistics
Variables:
• Categorical:
  - Nominal
  - Ordinal
• Numerical:
  - Discrete
  - Continuous: Interval, Ratio
Categorical Data Analysis
Presentation:
• Percentage.
• Frequency
• Relative frequency
• Cumulative relative frequency
• Proportion.
Graphical:
• Bar chart.
• Pie chart.
• Pareto diagram.
i. Frequency
Question: How do we get the frequency for "race" in SPSS?
Analyze >> Descriptive Statistics >> Frequencies
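For readers who want to see what the frequency output contains, the sketch below builds the same kind of table (frequency, percent, cumulative percent) outside SPSS. The `race` list is hypothetical illustration data, not the workshop dataset, and `frequency_table` is a helper name introduced here.

```python
from collections import Counter

# Hypothetical "race" responses; the workshop dataset is not reproduced here.
race = ["Malay", "Chinese", "Malay", "Indian", "Chinese", "Malay", "Indian", "Malay"]

def frequency_table(values):
    """Return rows of (category, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    total = len(values)
    rows = []
    cumulative = 0.0
    # Sort by descending frequency, then alphabetically to break ties.
    for category, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        percent = 100.0 * freq / total
        cumulative += percent
        rows.append((category, freq, percent, cumulative))
    return rows

table = frequency_table(race)
for category, freq, percent, cumulative in table:
    print(f"{category:<8} {freq:>3} {percent:6.1f}% {cumulative:6.1f}%")
```

The percent column is the relative frequency multiplied by 100, and the last cumulative value should always reach 100%.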
Hands-on: Percentage
Describe data for race of respondents
Output: Percentage
ii. Cross-tabulation
• Cross-tabulation is a method to quantitatively analyze the relationship between two or more variables.
• The resulting table is also known as a contingency table; it groups variables to show the association between them.
• It is usually performed on categorical data – data that can be divided into mutually exclusive groups.
Example:
• How to cross-tabulate gender and ethnicity:
Analyze >> Descriptive Statistics >> Crosstabs
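The difference between row and column percentages in a cross-tabulation can be sketched as follows. The `records` list is hypothetical illustration data, and the function names are introduced here only for the example.

```python
from collections import Counter

# Hypothetical (gender, race) pairs; not the workshop dataset.
records = [
    ("Male", "Malay"), ("Male", "Chinese"), ("Female", "Malay"),
    ("Female", "Malay"), ("Female", "Indian"), ("Male", "Malay"),
]

def crosstab(pairs):
    """Count each (row, column) combination and list the category labels."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return rows, cols, counts

def row_percentages(pairs):
    """Percentages that sum to 100 within each row (e.g. within each gender)."""
    rows, cols, counts = crosstab(pairs)
    result = {}
    for r in rows:
        row_total = sum(counts[(r, c)] for c in cols)
        result[r] = {c: 100.0 * counts[(r, c)] / row_total for c in cols}
    return result

def column_percentages(pairs):
    """Percentages that sum to 100 within each column (e.g. within each race)."""
    rows, cols, counts = crosstab(pairs)
    result = {}
    for c in cols:
        col_total = sum(counts[(r, c)] for r in rows)
        result[c] = {r: 100.0 * counts[(r, c)] / col_total for r in rows}
    return result

row_pct = row_percentages(records)
col_pct = column_percentages(records)
print(row_pct["Male"])    # distribution of race among males
print(col_pct["Malay"])   # distribution of gender among Malay respondents
```

Row percentages answer "what share of each gender belongs to each race", while column percentages answer "what share of each race belongs to each gender".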
Hands-on: Cross-tabulation
Describe data for race of respondents stratified by gender
Hands-on: Percentage
Describe data for race of respondents stratified by gender
You can change to "ROW" percentages.
Output: Percentage
1) Percentage by ROW
2) Percentage by COLUMN
Hands-on: Simple Bar Chart
Describe data for race of respondents
Hands-on: Simple Bar Chart
Describe data for race of respondents
Output: Simple Bar chart
Hands-on: Clustered Bar Chart
Describe data for race of respondents stratified by gender
Hands-on: Clustered Bar Chart
Describe data for race of respondents stratified by gender
Output: Clustered Bar chart
Hands-on: Pie Chart
Describe data for race of respondents
Hands-on: Pie Chart
Describe data for race of respondents
Output: Pie chart
Exercise:
• Describe data for:
i. Gender
ii. Educational level
iii. Medication type
iv. Heart disease status
Numerical Data Analysis
1) Presentations:
• Measures of central tendency.
• Measures of dispersion (variability).
2) Graphical:
• Histogram.
• Stem & Leaf plot.
• Box and whisker plot.
• Many more …
Explore
• It is the first step in the analytic process:
i. to explore the characteristics of the data.
ii. to screen for errors and correct them.
iii. to look for distribution patterns – normal distribution or not.
• The data may require transformation before further analysis using parametric methods, or may need analysis using non-parametric techniques.
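The summary measures reported when exploring a numerical variable can be sketched with Python's standard library. The `weights` list is hypothetical, and the quartile rule used by `statistics.quantiles` is not necessarily the one SPSS applies, so the IQR may differ slightly from SPSS output.

```python
import statistics

# Hypothetical weights in kg; not the workshop dataset.
weights = [66.0, 68.5, 69.0, 70.0, 70.5, 71.0, 72.0, 74.0]

def summarize(values):
    """Central tendency and dispersion measures for one numerical variable."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),      # sample SD (n - 1 denominator)
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }

summary = summarize(weights)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Mean (SD) is the usual summary when the distribution looks normal; median (IQR) is preferred when the data are skewed.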
Hands-on: Explore
Describe data for weight of respondents
Output: Explore
Hands-on: Explore (stratify)
Describe data for weight of respondents stratified by gender
Output: Explore (stratify)
• Mean weight for male:
69.55 (SD: 2.67)
• Mean weight for female:
70.07 (SD: 2.66)
Hands-on : Histogram
Describe data for weight of respondents
Output: Histogram
Exercise:
Explore:
i. Weight
ii. Age
iii. Height
iv. Body mass index
Answer: Summary
Variables | Mean (SD) | Median (IQR)
Weight | 69.81 (2.66) | -
Age | - | 72.50 (14)*
Height | - | 1.47 (0.12)**
BMI | 31.20 (2.85) | -
* Data skewed to the left
** Data skewed to the right
Scales of measurement
Variables:
Categorical (Qualitative)
• Nominal – the assignment of numbers for classification purposes.
  E.g. gender, blood group.
• Ordinal – values providing a classification according to order or magnitude.
  E.g. educational status.
Numerical (Quantitative)
• Discrete – countable values that take only certain values with no intermediate values (whole numbers only).
  E.g. number of children (1, 2, 3, 4, ...).
• Continuous:
  - Interval – classification along a continuum with equal intervals and sensible subdivision.
    E.g. temperature.
  - Ratio – interval data with an absolute zero.
    E.g. height, weight.
Thank you for your attention
rodi@salam.uitm.edu.my
Basic Biostatistics Workshop
Using SPSS 2019
Lesson 2
Inferential Statistics:
Estimation & Hypothesis Testing
Inferential Statistics
• The application of analytic procedures to a sample of the population in order to infer (generalize) the results obtained to the target population.
• It is a set of techniques where:
i. inferences are drawn about the population parameter from the sample statistics; OR
ii. observed sample statistics are inferred to the corresponding population parameters.
• The analysis infers properties of a population. This includes:
i. Estimation / confidence intervals
- Point estimation (the single most likely value for the parameter).
- Interval estimation (also called a confidence interval for the parameter).
ii. Hypothesis testing (tests of significance).
Inferential Statistics
• The application of analytic procedures to a sample of the population in order to infer (generalize) the results obtained to the target population.
• Random sampling: every member of the population has the same chance of being selected in the sample.
[Diagram: target population (parameter) → random sampling → observed sample (statistics) → estimation (inference about the population)]
Statistics
[Diagram: population → sampling → sample; descriptive statistics apply to the sample, inferential statistics point from the sample back to the population]
• Descriptive: describe the characteristics of the sample, e.g.:
i. Sex: % male, % female
ii. Age: mean (SD)
iii. Race: % Malay, % Chinese, % Indian
GOLDEN RULE
It is never about the sample
It is ALWAYS about the population
1. Estimation & Confidence Intervals
• Estimation refers to the process by which one makes inferences about a population based on information obtained from a sample.
• It uses sample statistics to estimate the population parameter.
• Example:
i. sample means are used to estimate population means.
ii. sample proportions are used to estimate population proportions.
• An estimate of a population parameter can be presented in two ways:
i. Point estimate: a single value of a statistic.
ii. Interval estimate: an interval defined by two numbers, between which the population parameter is said to lie.
How to calculate Confidence Intervals
• The calculation of confidence intervals is based on:
i. The standard deviation of the population - known or unknown.
ii. The sample size - more or less than 30.
iii. The level of confidence - set by the researcher.
• Knowing (i) and (ii) determines whether to use:
i. t-score; or
ii. z-score.
t-score versus z-score
How do we calculate the range of values likely to contain the mean BMI and the prevalence of obesity for the whole Shah Alam population?
• Do you know the POPULATION STANDARD DEVIATION, σ?
  - No → use the t-score.
  - Yes → is the sample size above 30?
      - Yes → use the z-score.
      - No → use the t-score.
t-score versus z-score
1) When σ is UNKNOWN, OR σ is KNOWN but the sample size is < 30 (using the t-score):
• The critical value tα/2 is determined from the t-table based on:
i. Level of significance.
ii. Degrees of freedom.
iii. One or two-sided.
2) When σ is KNOWN and the sample size is > 30 (using the z-score):
• The confidence coefficient (z) is determined from the standard normal distribution table based on the degree of confidence.
• When the degree of confidence is:
i. 90%, z is 1.645.
ii. 95%, z is 1.96.
iii. 99%, z is 2.58.
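The z values listed above can be reproduced from the standard normal distribution instead of a printed table. This is a small sketch using Python's standard library; `z_critical` is a helper name introduced here.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical z value for a given confidence level (e.g. 0.95)."""
    alpha = 1.0 - confidence
    # Put alpha/2 in each tail and look up the upper cut-off.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```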
t-score versus z-score
Interval | σ UNKNOWN, OR σ KNOWN but sample < 30 | σ KNOWN and sample > 30
General formula | Point estimate ± (critical value × SE) | Point estimate ± (confidence coefficient × SE)
Mean interval | Point estimate ± [ tα/2 × (SD / √n) ] | Point estimate ± [ z × (SD / √n) ]
Proportion interval | Point estimate ± [ tα/2 × √( p(1 – p) / n ) ] | Point estimate ± [ z × √( p(1 – p) / n ) ]
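As a rough check on the interval formulas, the sketch below applies the z-based versions to the summary figures quoted in this workshop (BMI mean 31.20, SD 2.85, n = 106; tuberculosis prevalence 43.4%). It assumes the z-score is acceptable for n = 106; SPSS itself reports t-based intervals, so its limits are slightly wider than the ones computed here. The function names are introduced for illustration only.

```python
import math
from statistics import NormalDist

def z_value(confidence=0.95):
    """Two-sided confidence coefficient from the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)

def mean_ci(mean, sd, n, confidence=0.95):
    """Point estimate ± z × (SD / √n)."""
    margin = z_value(confidence) * sd / math.sqrt(n)
    return mean - margin, mean + margin

def proportion_ci(p, n, confidence=0.95):
    """Point estimate ± z × √(p(1 − p) / n)."""
    margin = z_value(confidence) * math.sqrt(p * (1.0 - p) / n)
    return p - margin, p + margin

bmi_low, bmi_high = mean_ci(31.20, 2.85, 106)
tb_low, tb_high = proportion_ci(0.434, 106)
print(f"BMI 95% CI: {bmi_low:.2f} to {bmi_high:.2f}")
print(f"TB prevalence 95% CI: {100 * tb_low:.1f}% to {100 * tb_high:.1f}%")
```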
Estimation & Confidence Intervals
Research question:
• What is the mean (with 95% confidence interval) of Body Mass Index (BMI) and the prevalence of Tuberculosis in Shah Alam?
i. Body Mass Index – to estimate the mean and 95% confidence interval.
ii. Tuberculosis – to estimate the proportion and 95% confidence interval.
Hands-on: Estimation (Mean)
Calculate the mean of Body Mass Index and 95%CI
Output: Estimation (Mean)
• The mean of BMI : 31.20 (95%CI: 30.64, 31.75)
Hands-on: Estimation (Mean)
Calculate the mean of Body Mass Index and 95%CI stratified by Gender
Output: Estimation (Mean)
• The mean BMI for Male: 30.61 (95%CI: 29.79, 31.43)
• The mean BMI for Female: 31.80 (95%CI: 31.09, 32.52)
Hands-on: Estimation (Proportion)
Calculate the prevalence of Tuberculosis
Tips: the coding must be:
• 0 for No
• 1 for Yes
Output: Estimation (Proportion)
The prevalence of Tuberculosis : 43.4% (95%CI: 33.8, 53.0)
Hands-on: Estimation (Proportion)
Calculate the prevalence of Tuberculosis stratified by Gender
Tips: the coding must be:
• 0 for No
• 1 for Yes
Output: Estimation (Proportion)
• The prevalence of Tuberculosis for Male: 31.5% (95%CI: 18.7, 44.3)
• The prevalence of Tuberculosis for Female: 55.8% (95%CI: 41.8, 69.7)
2. Hypothesis testing
• Hypothesis testing:
- It expresses how well the sample results represent the true situation in the population.
• Purpose:
- To aid the researcher in reaching a conclusion about the population by examining a sample from it.
• This is how we decide whether:
- An effect actually occurred.
- A treatment has an effect.
- Groups differ from each other.
- One variable predicts another.
Types of Hypothesis
1) Null hypothesis (Ho)
• Hypothesis of no difference.
• No difference or relationship between the variables of interest.
2) Alternate hypothesis (Ha)
• Hypothesis that contradicts the null hypothesis.
• Can indicate the direction of the difference or relationship.
• Also called the research hypothesis.
Steps in Hypothesis testing
• Step 1 : Specify the null hypothesis and alternate hypothesis.
• Step 2 : Choose the significance level α, one or two-sided (tailed).
• Step 3 : Check assumptions.
• Step 4 : Choose the test statistic.
• Step 5 : Find the p-value, compare with α:
- p > α → do not reject Ho → no significant difference.
- p < α → reject Ho → significant difference.
• Step 6 : Conclusions (statistical & problem).
• Step 7 : Interpret and give overall conclusion.
Basic Biostatistics Workshop
Using SPSS 2019
Lesson 3
Analyzing Quantitative Data:
T-test
Student t-test
• A t-test is any statistical hypothesis test in which the test statistic follows a Student's t-distribution.
• The t-distribution is one of the probability distributions used in statistics when dealing with continuous random variables.
• It is used for hypothesis testing involving numerical data (comparing means).
• It is similar in shape to the Standard Normal Distribution (z-distribution), but has heavier tails when the sample size is small.
Types of t-test
Group | T-test
Comparing mean of one group | One-sample t-test
Comparing means of paired groups | Dependent (paired) t-test
Comparing means of two independent groups | Independent t-test
3.1 One sample t-test
• It helps to determine whether μ (the population mean) is equal to
a hypothesized value (the test mean).
• The test uses the standard deviation of the sample (s) to estimate the standard deviation of the population (σ).
• The hypotheses are specified about a single distribution.
Assumptions
i. The outcome must be a continuous variable (numerical variable, i.e. interval or ratio).
ii. The sample is selected by random sampling - each individual in the population has an equal probability of being selected in the sample.
iii. The tested data are normally distributed (i.e. no outliers) OR the sample size is large (≥ 30).
iv. Scores on the test variable are independent (i.e. independence of observations).
One sample t-test
Research question:
• You want to determine whether the mean weight of the respondents is statistically different from 70 kg.
Step 1: Specify the Ho and Ha.
a) Null hypothesis:
• Ho: μ is equal to 70 kg.
b) Alternate hypothesis:
• Ha: μ is NOT equal to 70 kg.
Step 2: Choose the significance level.
• α = 0.05 (two-sided).
One sample t-test
• Step 3: Checking assumptions.
i. The data is a continuous variable (numerical variable i.e. interval
or ratio).
ii. The sample is selected by random sampling - each individual in
the population has an equal probability of being selected in the
sample.
iii. The tested data is normally distributed (i.e. no outliers) OR
sample size is big (≥30).
iv. Scores on the test variables are independent (i.e. independent of
observations).
• Step 4: Choose the test statistic.
- One sample t-test with (n – 1) df.
One sample t-test
• Step 5: Find the p-value.
- Calculate the test statistic (t-calc).
• Formula:
$$ t = \frac{\bar{X} - \mu}{s / \sqrt{n}} $$
• X̄ : mean from our sample
• μ : our hypothesized mean
• s : standard deviation from our sample
• n : number of observations in our sample
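The t-calc for the weight example can be approximated from the summary figures quoted in this workshop (mean 69.81, SD 2.66, n = 106, test value 70). This is a sketch from rounded summary statistics, so it will not match the SPSS value of -0.750 exactly; `one_sample_t` is a name introduced here, and the p-value still has to come from a t-table or from SPSS.

```python
import math

def one_sample_t(sample_mean, hypothesized_mean, sd, n):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    standard_error = sd / math.sqrt(n)
    t = (sample_mean - hypothesized_mean) / standard_error
    return t, n - 1

t_stat, df = one_sample_t(69.81, 70.0, 2.66, 106)
print(f"t = {t_stat:.3f} with {df} df")
```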
Hands-on: One sample t-test
Output: One sample t-test
• The p-value is 0.455 which is more than alpha (0.05).
Conclusion
Step 6: Conclusion
a) Statistical conclusion:
• Since the p-value (p=0.455) is more than α (0.05), we DO NOT reject Ho.
• Therefore, we conclude that the mean weight of the respondents is not significantly different from 70 kg.
• Since the result is not statistically significant, the difference observed could be due to chance.
b) Problem conclusion:
• There is no evidence that the mean weight of the respondents differs from 70 kg.
Output: Summary
Table: The comparison between weight and the test value of the weight (N=106)
Variable | N | Mean (SE) | Test value | t^a (df) | Mean difference (95% CI) | p-value
Weight | 106 | 69.81 (0.26) | 70.00 | -0.750 (105) | -0.19 (-0.71, 0.32) | 0.455
a Statistical test: One sample t-test
3.2 Independent t-test
• It is the most commonly used method to evaluate the difference in means between two groups.
• The independent t-test compares the means of two unrelated groups (independent variable) on the same continuous dependent variable.
[Diagram: independent variable (Group 1, Group 2) → compare means of the dependent variable (continuous data)]
Assumptions
i. Dependent variable is either interval or ratio.
ii. Random samples.
iii. Independent variable consists of two independent groups – each subject should appear in only one group and the groups are unrelated.
iv. Dependent variable is approximately normally distributed in each population OR sample size is big (n ≥ 30).
v. Variances of the two groups must be equal (homogeneity of variances) – can be checked by looking at Levene's test.
Levene's test:
i. When the test is significant (p<0.05) → the variances are unequal → the standard (equal-variance) t-test is NOT valid; read the "Equal variances not assumed" row of the SPSS output instead.
ii. When the test is not significant (p>0.05) → the variances are equal → the t-test is valid.
Independent t-test
Research question:
• Is there any difference in body mass index (BMI) between males and females?
Step 1: Specify the Ho and Ha
a) Null hypothesis:
• Ho: There is no difference in the mean BMI between males and females (Ho: μmale = μfemale).
b) Alternate hypothesis:
• Ha: There is a difference in the mean BMI between males and females (Ha: μmale ≠ μfemale).
Step 2: Choose the significance level
• α = 0.05 (two-sided).
Independent t-test
Step 3: Checking assumptions
• Dependent variable is either interval or ratio.
• Random samples.
• Independent variable consists of two independent groups –
sample should appear in only one group and these groups are unrelated.
• Dependent variable is approximately normally distributed in each
population OR sample size is big (n ≥ 30).
• Equal variances between the two groups (homogeneity of
variances) – can be checked by looking at the Levene’s test.
Step 4: Choose the test statistic
• Independent t-test with (n1 + n2 - 2) df.
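Steps in hypothesis testing
Step 5 : Find the p-value
- Calculate the test statistic (t-calc).
Formula (pooled-variance form, which matches the equal-variance assumption and the (n1 + n2 - 2) df):
$$ t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $$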
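A pooled-variance t statistic for the BMI example can be sketched from the group summaries quoted in this workshop (male: n = 54, mean 30.61, SE 0.41; female: n = 52, mean 31.80, SE 0.36). The SDs are back-calculated from the rounded SEs, which is an assumption of this sketch, so the result is close to but not identical to the SPSS value of -2.196. `pooled_t` is a name introduced here.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance independent t-test: return (t statistic, df)."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_difference = math.sqrt(pooled_variance * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se_difference, df

# Assumption: recover each group's SD from its reported SE (SD = SE × √n).
sd_male = 0.41 * math.sqrt(54)
sd_female = 0.36 * math.sqrt(52)

t_stat, df = pooled_t(30.61, sd_male, 54, 31.80, sd_female, 52)
print(f"t = {t_stat:.3f} with {df} df")
```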
Hands-on: Independent t-test
Output: Independent t-test
[Output table: descriptive statistics]
• Levene's test → test for equality of variances (assumption).
• Since the p-value is not significant (p=0.118), we can conclude that the variances of both groups are equal.
• Therefore, the t-test is valid.
Output: Independent t-test
• The p-value is 0.031, which is less than alpha (0.05).
Step 6: Conclusion
a) Statistical conclusion:
• Since the p-value (p=0.031) is less than α (0.05), we reject Ho and accept Ha.
• Therefore, we can conclude that there is a significant difference in the mean BMI between males and females (p=0.031).
Output: Independent t-test
b) Problem conclusion:
• There is a difference in BMI between males and females [mean difference: -1.19 (95%CI: -2.27, -0.11)].
• BMI for males is lower than BMI for females [30.61 (SE: 0.41) versus 31.80 (SE: 0.36)].
• We are 95% confident that the difference in mean BMI between males and females lies between 0.11 and 2.27 kg/m² (on average 1.19 kg/m²).
Output: Summary
Table: The comparison of body mass index between male and female (n=106)
Gender | N | BMI, Mean (SE) | t^a (df) | Mean difference (95%CI) | p-value
Male | 54 | 30.61 (0.41) | -2.196 (104) | -1.19 (-2.27, -0.11) | 0.031*
Female | 52 | 31.80 (0.36) | | |
* statistically significant at α=0.05
a Statistical test: Independent t-test
3.3 Paired t-test
• It compares two paired observations from the same individual or from matched individuals.
• It is also known as a t-test for repeated measures or a t-test for matched samples.
• Often used with a pre- and post-test design and data in pairs.
Examples:
i. Does the new drug decrease the patient's blood pressure?
ii. Can the new medication reduce weight?
iii. Is there any difference in the intraocular pressure (IOP) between the right and left eye?
Assumptions
i. The dependent variable must be numerical data.
ii. Random samples.
iii. The observations are paired or dependent.
iv. The difference between pairs (before and after) is normally distributed, unless the sample size is big enough (n ≥ 30).
- we need to calculate the difference between pairs.
- then determine the distribution of the differences:
  a. if the differences are normally distributed → t-test is valid.
  b. if the differences are not normally distributed → t-test is NOT valid.
Paired t-test
Research question:
• Is there any difference in the systolic blood pressure before and
after treatment.
Step 1: Specify the Ho and Ha.
a) Null hypothesis:
• The mean of pre-treatment and post-treatment of systolic blood
pressure are the same (Ho: μdifference = 0).
b) Alternate hypothesis:
• The mean of pre-treatment and post-treatment of systolic blood
pressure are NOT the same (Ha: μdifference ≠ 0).
Step 2: Choose the significance level.
• α = 0.05 (two-sided).
Paired t-test
Step 3: Checking assumptions.
• The dependent variables must be numerical data.
• Random samples.
• The observations are paired or dependent.
• The difference between pair (before and after) is normally
distributed, unless the sample size is big enough (n ≥ 30).
Step 4: Test statistics.
• Paired t-test, with (n – 1) df.
Paired t-test
Step 5: Find the p-value
- Calculate the test statistic (t-calc).
Formula:
$$ t = \frac{\bar{d}}{s / \sqrt{n}} $$
d̄ : mean of the paired differences,
s : sample standard deviation of the differences,
n : number of pairs and
t : a Student t with n-1 degrees of freedom
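The paired calculation works on the differences only, which the sketch below makes explicit. The `before` and `after` readings are hypothetical values, not the workshop's SBP data, and `paired_t` is a name introduced here.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, df, mean difference) for paired observations."""
    differences = [b - a for b, a in zip(before, after)]
    n = len(differences)
    mean_difference = statistics.mean(differences)
    sd_difference = statistics.stdev(differences)  # sample SD of the differences
    t = mean_difference / (sd_difference / math.sqrt(n))
    return t, n - 1, mean_difference

# Hypothetical systolic blood pressure readings (mmHg) for five patients.
before = [150, 142, 160, 138, 155]
after = [140, 139, 150, 135, 148]

t_stat, df, mean_diff = paired_t(before, after)
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.3f}, df = {df}")
```

This mirrors the workflow in SPSS: first compute the difference variable, check its distribution, then test whether its mean is zero.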
Hands on: Checking the normality of the difference
• A new variable, "Dif_SBP", will be created at the right end of the data.
Hands on: Checking the normality of the difference (creating histogram)
Output: Paired t-test
• The data for "Dif_SBP" are normally distributed.
• Assumption fulfilled.
• Paired t-test is valid.
Hands on: Paired t-test
Output: Paired t-test
[Output table: descriptive statistics]
Step 6: Conclusion.
a) Statistical conclusion:
• Since the p-value (p<0.001) is less than α (0.05), we reject Ho and accept Ha.
• Therefore, we can conclude that the means of pre-treatment and post-treatment systolic blood pressure are NOT the same (Ha: μdifference ≠ 0).
Output: Paired t-test
b) Problem conclusion:
• The mean difference in SBP before and after treatment is not 0 [mean paired difference: 8.89 (95%CI: 6.71, 11.06)].
• The difference is 8.89 mmHg (a true difference).
• The SBP before treatment is higher than the SBP after treatment [146 (SE: 1.24) versus 137 (SE: 0.72)].
• We are 95% confident that the difference in SBP before and after treatment lies in the range of 6.71 to 11.06 mmHg (on average 8.89 mmHg).
Output: Summary
Table: The paired difference between SBP (before treatment) and SBP (after treatment) (N=106)
Variable | N | Before treatment, Mean (SE) | After treatment, Mean (SE) | Mean paired difference (95%CI) | t^a (df) | p-value
SBP | 106 | 146.67 (1.23) | 137.98 (0.72) | 8.89 (6.71, 11.06) | 8.095 (105) | <0.001*
* statistically significant at α=0.05
a Statistical test: Paired t-test
Basic Biostatistics Workshop
Using SPSS 2019
Lesson 4
Analyzing Quantitative Data:
One-way Analysis of Variance (ANOVA)
ANOVA
• It is a technique for comparing means and is an extension of the t-test.
• It is useful in comparing (testing) two or more means (groups or variables) for statistical significance.
• It determines whether the means of ≥ 2 independent groups are significantly different from one another.
• Only one independent variable (factor / grouping) with ≥ 2 groups.
i. Grouping variable → nominal.
ii. Outcome variable → interval or ratio.
One-way ANOVA
[Diagram: mean age compared across three groups: uncontrolled DM, controlled DM and normal]
• Comparing the three groups pair by pair with three independent t-tests increases the Type I (alpha) error, i.e. WRONGly rejecting H0 when it is true.
• Instead, run the overall ANOVA test first. If it is SIGNIFICANT (p-value < 0.05), follow it with a POST HOC TEST to find which pairs have significantly different means.
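The inflation of the Type I error from repeated pairwise t-tests can be illustrated with a short calculation. The sketch assumes the comparisons are independent, which is a simplification (pairwise tests on the same groups are correlated), and it also shows the Bonferroni-adjusted α per comparison. The function names are introduced here for illustration.

```python
from math import comb

def familywise_error(alpha, comparisons):
    """Chance of at least one false rejection across independent tests."""
    return 1.0 - (1.0 - alpha) ** comparisons

def bonferroni_alpha(alpha, comparisons):
    """Per-comparison significance level under Bonferroni's procedure."""
    return alpha / comparisons

groups = 3                      # e.g. uncontrolled DM, controlled DM, normal
pairs = comb(groups, 2)         # number of pairwise comparisons
fwe = familywise_error(0.05, pairs)
adjusted = bonferroni_alpha(0.05, pairs)
print(f"{pairs} comparisons: familywise error ≈ {fwe:.3f}, "
      f"Bonferroni α per test = {adjusted:.4f}")
```

With three groups the overall error rate rises from 5% to roughly 14%, which is why a single overall ANOVA is run before any pairwise comparison.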
ANOVA
Sum of Squares (SS) & Mean Squares (MS):
• 2 possible sources of variation:
i. between the groups - groups have different means that vary about the overall mean.
ii. within the groups - not all the subjects within a group have exactly the same value.
• Types:
i. One-way ANOVA : only 1 independent variable.
ii. Two-way ANOVA : two independent variables.
Hypotheses:
• Ho: Population group means are equal to one another.
• Ha: At least one pair of group means differs.
Assumptions
i. Random samples measured on interval or ratio scales.
ii. Observations are independent of each other.
iii. In each independent group, the dependent variable is normally distributed.
iv. The population variances of the groups are equal (homogeneous).
- This is tested using Levene's test (homogeneity of variances).
Note: The assumptions of the ANOVA test are similar to those of the independent t-test.
ANOVA
Research question:
• Is there any difference in weight between races (Malay, Chinese and Indian)?
Step 1: Specify the hypotheses.
a) Null hypothesis:
• There is no difference in mean weight between races (Ho: μmalay = μchinese = μindian).
b) Alternate hypothesis:
• At least one pair of races differs in mean weight.
Step 2: Choose the significance level.
• α = 0.05 (one-sided).
Steps of hypothesis testing
Step 3 : Choose the test statistic.
- ANOVA with (k – 1, n – k) df.
Step 4: Checking assumptions.
• Random samples measured on interval or ratio scales.
• Observations are independent of each other.
• In each independent group, the dependent variable is normally distributed.
• The population variances of the groups are equal (homogeneous).
Step 5 : Find the p-value.
- Calculate the F-ratio (F-calc).
Steps on calculating F-Ratio
1. Calculate the Grand Mean or the overall mean.
2. Calculate Sum of Squares:
2a - Sum of squares between groups (SSB).
2b - Sum of squares within groups (SSW).
2c - Total sum of squares (SST).
3. Calculate degrees of freedom (df):
3a - Total.
3b - Within group.
3c - Between group.
4. Calculate Mean Squares:
4a - Mean Square Between group (MSB).
4b - Mean Square Within group (MSW).
5. Calculate F-Ratio.
Calculate F-Ratio (test statistic)
• F-Ratio is a ratio of two sample variances.
• The F-test statistic is found by dividing the between group
variance (MSB) by the within group variance (MSW).
F-ratio = MSB / MSW.
• The larger the differences in the means, the larger the treatment variance component, the larger the F.
• The F-ratio follows the F-distribution, which is a positively skewed distribution with only positive values.
Source Table
Source of variation | SS | Degrees of freedom | MS | F-ratio
Between | SSB | k – 1 | MSB = SSB / (k – 1) | MSB / MSW
Within | SSW | n – k | MSW = SSW / (n – k) |
Total | SST | n – 1 | |
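The entries of the source table can be computed step by step as in the sketch below. The three small groups are hypothetical numbers chosen so the arithmetic is easy to follow by hand; they are not the workshop's weight data, and `one_way_anova` is a name introduced here.

```python
def one_way_anova(groups):
    """Compute the source-table quantities for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    n = len(all_values)          # total number of observations
    k = len(groups)              # number of groups
    grand_mean = sum(all_values) / n

    # Between-group sum of squares: group means around the grand mean.
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    msb = ssb / df_between
    msw = ssw / df_within
    return {
        "SSB": ssb, "SSW": ssw, "SST": ssb + ssw,
        "df_between": df_between, "df_within": df_within,
        "MSB": msb, "MSW": msw, "F": msb / msw,
    }

# Hypothetical data: three groups of three observations each.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for name, value in result.items():
    print(f"{name}: {value}")
```

The p-value is then read from the F-distribution with (k – 1, n – k) degrees of freedom, which SPSS reports directly.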
Hands-on: One-way ANOVA
Hands-on: One-way ANOVA
Output: One-way ANOVA
• The test of homogeneity of variances tests whether the variances are the same (one of the assumptions).
• Since the p-value is more than 0.05 (p=0.073), we assume that the variances are homogeneous.
Output: One-way ANOVA
Step 6: Conclusion
a) Statistical conclusion:
• Since the p-value (p<0.001) is less than α (0.05), we reject Ho and
accept the Ha.
• Therefore, we can conclude that at least one pair of races differs in mean weight (p<0.001).
Output: One-way ANOVA
b) Problem conclusion
• At least one pair of races differs in weight.
Which pair?
? between Malay and Chinese
? between Malay and Indian
? between Chinese and Indian
→ Post-hoc test
Hands on: One-way ANOVA (Post-hoc)
Output: One-way Post-hoc ANOVA
Interpretations:
• There is a significant difference in the mean weight between Malay and Chinese. The mean weight of Malay is significantly lower than the mean weight of Chinese (mean diff.: -3.05; 95% CI: -4.14, -1.96; p<0.001).
Output: Post-hoc ANOVA
Interpretations:
• There is a significant difference in the mean weight between Malay
and Indian. The mean weight of Malay is significantly lower than
mean weight of Indian (mean diff.: -4.66; 95% CI: -5.90, -3.42;
p<0.001).
• There is a significant difference in the mean weight between Chinese and Indian. The mean weight of Chinese is significantly lower than the mean weight of Indian (mean diff.: -1.61; 95% CI: -2.77, -0.45; p=0.003).
Output: Summary
Table: Mean weight between race (N=106)
Race | N | Mean weight (SD) | F-statistic (df)a | P-value
Malay | 33 | 67.29 (1.82) | 85.071 (2,27) | <0.001*b
Chinese | 46 | 70.35 (2.24) | |
Indian | 27 | 71.96 (1.59) | |
* Statistically significant at α=0.05
a One-way ANOVA test
b By post-hoc test with Bonferroni's procedure, mean weight differed significantly between "Malay" and "Chinese" (p<0.001), "Malay" and "Indian" (p<0.001), and "Chinese" and "Indian" (p=0.003).
Thank you for your attention
rodi@salam.uitm.edu.my
Basic Biostatistics Workshop
using SPSS 2019
Lesson 5
Analyzing Categorical Data
Categorical Data Analyses
• Categorical Data Analysis (CDA) involves the analysis of data with a categorical response variable; it is one of the non-parametric methods for analyzing data.
• The categorical data can be nominal or ordinal.
• The data in such a study represent counts or frequencies of observations in each category.
• It can involve:
i. estimating a single proportion, and
ii. comparing two or more proportions.
Types of Categorical Data
Analysis of categorical data:
• One proportion:
- Binomial test
- Chi-square goodness-of-fit
• Two or more proportions:
- Pearson Chi-square (Chi-square test for independence)
- Chi-square test for homogeneity
- Yates' correction
- Fisher's Exact test
• Dependent group:
- McNemar's test
• Stratified sampling to control confounder effect:
- Mantel-Haenszel test
5.1.1 Binomial Test
• It is a test of one proportion.
• It is an exact test of the statistical significance of deviations from the theoretically expected distribution of observations into two categories.
• Purpose: to compare the proportion observed in a sample with a standard or specified value.
Assumptions:
i. The outcomes can be categorized as binary data (Yes / No).
ii. The observations should be independent of each other.
iii. The expected count in each category is more than 10: the hypothesized proportion for category A multiplied by the total number of observations > 10, and the hypothesized proportion for category B multiplied by the total number of observations > 10.
Steps in Hypothesis testing
• Step 1 : Specify the null hypothesis and alternate hypothesis.
• Step 2 : Choose the significance level α, one or two-sided (tailed).
• Step 3 : Check assumptions.
• Step 4 : Choose the test statistic.
• Step 5 : Find the p-value, compare with α:
- p > α → do not reject Ho → no sig. difference.
- p < α → reject Ho → sig. difference.
• Step 6 : Conclusions (statistical & problem).
• Step 7 : Interpret and give overall conclusion.
Binomial Test
Research question:
• Is it true that the proportion of Tuberculosis in Shah Alam is 0.5?
• Our hypothesis: The proportion is 0.5.
Step 1: Specify the Ho and Ha.
a) Null hypothesis:
• Ho: The proportion of Tuberculosis in Shah Alam is 0.5 (Ho: p = 0.5).
b) Alternate hypothesis:
• Ha: The proportion of Tuberculosis in Shah Alam is NOT 0.5 (Ha: p ≠ 0.5).
Step 2: Choose the significance level.
• α = 0.05 (two-sided).
Steps in hypothesis testing
Step 3: Checking assumptions.
• The outcomes can be categorized as binary data (Yes / No).
• The observations should be independent of each other.
• The expected count in each category is more than 10: the hypothesized proportion for category A multiplied by the total number of observations > 10, and the hypothesized proportion for category B multiplied by the total number of observations > 10.
Step 4: Choose the test statistic.
- Binomial test (testing of a single proportion).
Step 5: Find the p-value.
- Calculate the z statistic for a single proportion.
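As a rough check on Step 5, the z statistic can be computed by hand. The sketch below assumes the normal approximation with a continuity correction and assumes that 46 of the 106 respondents have Tuberculosis (0.43 × 106, rounded); the function name is hypothetical. Under these assumptions the result agrees with the p=0.207 reported in the output.

```python
import math

def one_proportion_z_test(successes, n, p0, continuity=True):
    """Two-sided normal-approximation test of Ho: p = p0."""
    diff = abs(successes - n * p0)           # distance from the expected count
    if continuity:
        diff = max(diff - 0.5, 0.0)          # continuity correction
    z = diff / math.sqrt(n * p0 * (1 - p0))
    p_value = math.erfc(z / math.sqrt(2))    # two-sided tail area of the normal curve
    return z, p_value

# Assumption: 46 of 106 respondents have Tuberculosis
z, p_value = one_proportion_z_test(46, 106, 0.5)
print(round(z, 2), round(p_value, 3))
```

Since p is more than α (0.05), Ho is not rejected.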
Hands-on: Explore
First step: find the proportion of Tuberculosis.
[SPSS screenshots: steps 1 to 5]
Output: Binomial Test
• The proportion of Tuberculosis is: 0.43 (95%CI: 0.34, 0.53)
• Our hypothesis is: 0.5
Hands-on: Binomial Test (Method 1)
[SPSS screenshots: steps 1 to 7]
Output: Binomial Test
Step 6: Conclusion
a) Statistical conclusion:
• Since the p-value (p=0.207) is more than α (0.05), we DO NOT reject Ho.
• Therefore, we can conclude that the proportion of Tuberculosis in Shah Alam is not significantly different from 0.5.
b) Problem conclusion:
• The proportion of Tuberculosis in Shah Alam is 0.5 (or not far from 0.5).
Hands-on: Binomial Test (Method 2)
[SPSS screenshots: steps 1 to 15]
Output: Binomial Test
• p=0.207
Step 6: Conclusion.
a) Statistical conclusion:
• Since the p-value (p=0.207) is more than α (0.05), we DO NOT reject Ho.
• Therefore, we can conclude that the proportion of Tuberculosis in Shah Alam is not significantly different from 0.5.
b) Problem conclusion:
• The proportion of Tuberculosis in Shah Alam is 0.5 (or not far from 0.5).
Output: Summary
Table: The proportion of Tuberculosis in Shah Alam (N=106) (Hypothesis: 0.5)
Variable | n | Proportion | Test statistic a | 95% CI | p-value
Tuberculosis | 106 | 0.43 | 60.00 | 0.34, 0.53 | 0.207
a Statistical test: Binomial test
5.1.2 Chi-square Goodness-of-fit
• It is also referred to as the one-sample chi-square test.
• It explores the proportion of cases that fall into the various categories of a single categorical variable, and compares these with hypothesized values.
• Two values are involved:
i. Observed number - the frequency of a category in the sample.
ii. Expected number - calculated from the claimed distribution.
The properties of Goodness-of-fit
• The data are the observed frequencies.
• This means that there is only one data value for each category.
• The degrees of freedom are one less than the number of categories (df = k – 1).
• It has a chi-square distribution with a right-tailed test.
• The value of the test statistic doesn't change if the order of the categories is switched.
Chi-square Goodness-of-fit
Assumptions:
• Random sample.
• The observations must be independent and mutually exclusive.
• The data are counts of categorical observations.
• Sufficiently large sample size, so that the expected count is at least five (5) in each category (or not more than 20% of cells have an expected count less than 5).
• Data are categorical at nominal or ordinal levels.
Chi-square Goodness-of-fit
Research question:
• You want to determine whether the number of smokers in the sample corresponds to that reported in the literature from a nationwide study (20% smokers and 80% non-smokers).
Step 1: Specify the Ho and Ha.
a) Null hypothesis:
• Ho: The proportion of smokers and non-smokers in the sample fits the given distribution (i.e. 20% smokers; 80% non-smokers).
b) Alternative hypothesis:
• Ha: The sample has a different distribution.
Step 2: Choose the significance level.
• α = 0.05 (one-sided).
Chi-square Goodness-of-fit
Step 3: Checking assumptions:
• Random sample.
• The observations must be independent and mutually exclusive.
• The data are counts of categorical observations.
• Sufficiently large sample size, so that the expected count is at least five (5) in each category (or not more than 20% of cells have an expected count less than 5).
• Data are categorical at nominal or ordinal levels.
Step 4: Choose the test statistic.
- Chi-square test for goodness-of-fit with df = 1.
Step 5: Find the p-value.
- Calculate χ² using the formula: $\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
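The goodness-of-fit formula can be applied by hand to the smoking example. This is a minimal sketch, assuming 46 smokers and 60 non-smokers (43.4% of N=106, as reported for the explored data); the function name is hypothetical, and the closed-form p-value used here only holds for df = 1.

```python
import math

def chi_square_gof(observed, expected_props):
    """Goodness-of-fit chi-square for observed counts vs claimed proportions."""
    n = sum(observed)
    expected = [n * p for p in expected_props]   # expected count per category
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected

# Assumption: 46 smokers and 60 non-smokers in the sample of 106
chi2, expected = chi_square_gof([46, 60], [0.20, 0.80])
# For df = 1 the upper-tail probability has a closed form
p_value = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 3), expected, p_value < 0.001)
```

Both expected counts (about 21.2 and 84.8) are above 5, so the sample size assumption is met.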
Hands-on: Chi-square Goodness-of-fit (Method 1)
[SPSS screenshots: steps 1 to 10]
Output: Chi-square Goodness-of-fit
• Descriptive statistics: the observed number, expected number and residual for each category.
• p<0.001.
Output: Chi-square Goodness-of-fit
Step 6: Conclusion
a) Statistical conclusion:
• Since the p-value (p<0.001) is less than α (0.05), we reject Ho and accept Ha.
• Therefore, we can conclude that the sample has a significantly different distribution.
b) Problem conclusion:
• The proportion of smokers and non-smokers in the sample is different from the given distribution (i.e. 20% smokers; 80% non-smokers).
* If we explore the data for smoking status, the prevalence of smoking is 43.4% (95% CI: 33.8, 53.0).
Hands-on: Chi-square Goodness-of-fit (Method 2)
[SPSS screenshots: steps 1 to 16]
Output: Chi-square Goodness-of-fit
• p<0.001.
• The conclusions are the same as in the previous example.
5.2.1 Pearson Chi-square
(Chi-square test for independence)
• It is a statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true.
• It determines the relationship or association between two categorical variables (independent and dependent).
• It compares the frequency of cases found in the various categories of one variable across the different categories of another variable.
5.2.1 Pearson Chi-square
(Chi-square test for independence)
Assumptions:
i. The data must be in the form of frequencies in both variables.
ii. The observations recorded are collected on a random basis.
iii. Independent observations - each person or case can only be counted once. They cannot appear in more than one category or group, and the data from one subject cannot influence the data from another.
iv. The lowest expected frequency in any cell should be at least 5 (or not more than 20% of cells have expected counts less than 5).
v. Both variables are categorical.
Pearson Chi-square
Research question:
• Is there any association between smoking and heart disease?
Step 1: Specify Ho and Ha
a) Null hypothesis:
• There is no association between smoking and heart disease.
b) Alternate hypothesis:
• There is an association between smoking and heart disease.
Step 2: Choose the significance level.
• α = 0.05 (two-sided).
Pearson Chi-square
Step 3: Checking assumptions:
• Random sample.
• The observations must be independent and mutually exclusive.
• The data are counts of categorical observations.
• Sufficiently large sample size, so that the expected count is at least five (5) in each category (or not more than 20% of cells have an expected count less than 5).
• Data are categorical at nominal or ordinal levels.
Step 4: Choose the test statistic.
- Pearson chi-square test with (c – 1)(r – 1) df.
Step 5: Find the p-value.
- Calculate χ² using the formula: $\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
Hands-on: Pearson Chi-square
[SPSS screenshots: steps 1 to 14]
Output: Pearson Chi-square
• Percentage of heart disease among smokers = 65.2%
• Percentage of heart disease among non-smokers = 20.0%
• Since 0 cells (0.0%) have an expected count less than 5, the Chi-square test for independence (Pearson Chi-square) is valid.
Output: Pearson Chi-square
Step 6: Conclusion.
a) Statistical conclusion:
• Since p-value (p<0.001) is less than α (0.05), we reject Ho and
accept Ha.
• Therefore, we can conclude that there is a significant association between smoking and heart disease.
Output: Pearson Chi-square
b) Problem conclusion:
• There is an association between smoking and heart disease.
What is the association?
Explanation:
• To answer this question, we need to know the study design and the risk measurement.
i. For a cross-sectional study and a case-control study, the risk measurement is the Odds Ratio (OR).
ii. For a cohort study, the risk measurement is the Relative Risk (RR).
Output: Pearson Chi-square
• Based on the Odds Ratio (OR): the odds of heart disease among the smokers are 7.5 times the odds among the non-smokers [OR: 7.5 (95% CI: 3.1, 18.0)].
• Based on the Relative Risk (RR): the smokers are 2.3 times more likely to develop heart disease compared to the non-smokers [RR: 2.3 (95% CI: 1.52, 3.49)].
Output: Summary
Table: The cross-tabulation between heart disease and smoking statuses (N=106)
Smoking status | Heart disease: Yes, Frequency (%) | Heart disease: No, Frequency (%) | Total, Frequency (%) | Chi-square (df) | p-value | OR (95% CI)
Yes | 30 (65.2) | 16 (34.8) | 46 (100.0) | 22.253 (1) | <0.001* | 7.50 (3.12, 18.0)
No | 12 (20.0) | 48 (80.0) | 60 (100.0) | | |
* Statistically significant at α=0.05
Table: The cross-tabulation between heart disease and smoking statuses (N=106)
Smoking status | Heart disease: Yes, Frequency (%) | Heart disease: No, Frequency (%) | Total, Frequency (%) | Chi-square (df) | p-value | RR (95% CI)
Yes | 30 (65.2) | 16 (34.8) | 46 (100.0) | 22.253 (1) | <0.001* | 2.30 (1.52, 3.49)
No | 12 (20.0) | 48 (80.0) | 60 (100.0) | | |
* Statistically significant at α=0.05
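The chi-square value and the odds ratio in the summary table can be reproduced from the four cell counts. The sketch below is an illustration only: the function name is hypothetical, the shortcut chi-square formula applies to 2 x 2 tables, and the confidence interval assumes the usual log-odds (Wald) method, which may differ slightly from what SPSS prints.

```python
import math

def two_by_two(a, b, c, d):
    """Pearson chi-square, p-value (df = 1) and odds ratio with 95% CI."""
    n = a + b + c + d
    # Shortcut form of sum((O - E)^2 / E) for a 2 x 2 table
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))     # closed form, valid for df = 1
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return chi2, p_value, odds_ratio, lower, upper

# Cell counts from the table: smokers 30 / 16, non-smokers 12 / 48
chi2, p_value, odds_ratio, lower, upper = two_by_two(30, 16, 12, 48)
print(round(chi2, 3), p_value < 0.001, odds_ratio, round(lower, 2), round(upper, 2))
```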
5.2.2 Chi-square Test-for-Homogeneity
• It determines whether the distribution of a particular characteristic is similar for various groups (i.e. whether the populations are homogeneous).
• It is used with a single categorical variable from two (or more) independent samples.
Hypotheses:
• Ho: the proportions for the two (or more) distributions are the same.
• Ha: at least one pair of proportions of the distributions is different.
Chi-square Test-for-Homogeneity
Assumptions:
i. The data must be in the form of frequencies in both variables.
ii. The observations recorded are collected on a random basis.
iii. Independent observations - each person or case can only be counted once. They cannot appear in more than one category or group, and the data from one subject cannot influence the data from another.
iv. The lowest expected frequency in any cell should be at least 5 (or not more than 20% of cells have expected counts less than 5).
v. Both variables are categorical.
Chi-square Test-for-Homogeneity
Research Question:
• Is there any difference in the proportion of anxiety status (mild, moderate and severe) between those who have Tuberculosis and those who do not?
Step 1: Specify Ho and Ha.
a) Null hypothesis:
• Ho: There is no significant difference in the proportion of anxiety status (mild, moderate and severe) between those who have Tuberculosis and those who do not.
b) Alternative hypothesis:
• Ha: There is at least one significant difference in the proportion of anxiety status (mild, moderate and severe) between those who have Tuberculosis and those who do not.
Chi-square Test-for-Homogeneity
Step 2: Choose the significance level.
• α = 0.05 (two-sided).
Step 3: Checking assumptions.
i. The data must be in the form of frequencies in both variables.
ii. The observations recorded are collected on a random basis.
iii. Independent observations - each person or case can only be counted once. They cannot appear in more than one category or group, and the data from one subject cannot influence the data from another.
iv. The lowest expected frequency in any cell should be at least 5 (or not more than 20% of cells have expected counts less than 5).
v. Both variables are categorical.
Chi-square Test-for-Homogeneity
Step 4: Choose the test statistic.
- Pearson chi-square test with (c – 1)(r – 1) df.
Step 5: Find the p-value.
- Calculate χ² using the formula: $\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
Hands-on: Test-for-Homogeneity
[SPSS screenshots: steps 1 to 13]
Output: Test-for-Homogeneity
[SPSS output: descriptive statistics and chi-square test, p>0.05]
Conclusion
Step 6: Conclusion
a) Statistical conclusion:
• Since the p-value (p=0.053) is more than α (0.05), we DO NOT reject Ho.
• We can conclude that there is no significant difference in the proportion of anxiety between tuberculosis statuses.
b) Problem conclusion:
• There is no significant difference in the proportion of anxiety status (mild, moderate and severe) between those who have Tuberculosis and those who do not.
Output: Summary
Table: The cross-tabulation between Tuberculosis status and anxiety status (N=106)
Tuberculosis status | Anxiety: Mild, n (%) | Anxiety: Moderate, n (%) | Anxiety: Severe, n (%) | Total, n (%) | Chi-square (df)a | p-value
Yes | 15 (31.9) | 12 (44.4) | 19 (59.4) | 46 (43.4) | 5.860 (2) | 0.053
No | 32 (68.1) | 15 (55.6) | 13 (40.6) | 60 (56.6) | |
a Statistical test: Chi-square Test-for-Homogeneity
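For a table larger than 2 x 2, the expected count of each cell is (row total × column total) / N. A minimal sketch using the counts from the table above; the function name is hypothetical, and the closed-form p-value used here is only valid when df = 2.

```python
import math

def chi_square_table(table):
    """Pearson chi-square and df for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: Tuberculosis yes / no; columns: mild, moderate, severe anxiety
table = [[15, 12, 19], [32, 15, 13]]
chi2, df = chi_square_table(table)
p_value = math.exp(-chi2 / 2)    # upper-tail probability, valid only for df = 2
print(round(chi2, 3), df, round(p_value, 3))
```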
5.2.3 Fisher's Exact Test
• It is a statistical significance test used in the analysis of contingency tables.
• Although in practice it is employed when sample sizes are small, it is valid for all sample sizes.
• It is an analysis for independence in a 2 x 2 table when the assumptions for the chi-square test are not met.
Criteria:
• Both variables are dichotomous qualitative variables (2 x 2 table).
• One of the expected values in the 2 x 2 table is less than 5.
• The binary data are independent.
• Sample size of less than 20.
• Sample size of 20 to less than 50 but one or more of the cells have an expected value of less than 5.
Fisher's Exact Test
Formula (exact probability of a 2 x 2 table with cells a, b, c, d):
$p = \frac{(a+b)!\,(a+c)!\,(b+d)!\,(c+d)!}{(a+b+c+d)!\,a!\,b!\,c!\,d!}$
Assumptions:
• The assumptions for Fisher's exact test are almost the same as for Pearson's chi-square test.
• However, it is used when:
i. One of the expected values (note: not the observed values) in a 2 x 2 table is less than 5, and especially when it is less than 1.
ii. The sample size is less than 50.
Fisher's Exact Test
Research question:
• You want to determine whether there is an association between sun exposure and skin cancer.
• Among 17 people who were exposed to the sun, 12 have skin cancer and 5 do not.
• Among 10 people who were not exposed to the sun, 3 have skin cancer and 7 do not.
Step 1: Specify the Ho and Ha.
a) Null hypothesis:
• There is no association between sun exposure and skin cancer status.
b) Alternate hypothesis:
• There is an association between sun exposure and skin cancer status.
Fisher's Exact Test
Step 2: Choose the significance level
• α = 0.05 (two-sided).
Step 3: Checking assumptions
• The data must be in the form of frequencies in both variables.
• The observations recorded are collected on a random basis.
• Independent observations - each person or case can only be counted once. They cannot appear in more than one category or group, and the data from one subject cannot influence the data from another.
• Sample size less than 50.
• One of the expected values in the 2 x 2 table is less than 5.
• Both variables are qualitative variables.
Fisher's Exact Test
Step 4: Choose the test statistic
• Fisher's exact test (no df).
Step 5: Find the p-value
• Calculate the exact probability p.
• Formula: $p = \frac{(a+b)!\,(a+c)!\,(b+d)!\,(c+d)!}{(a+b+c+d)!\,a!\,b!\,c!\,d!}$
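The exact probability can be summed by hand for the sun exposure example. This sketch assumes the common two-sided rule of adding up every table (with the same margins) that is no more probable than the observed one; the function name is hypothetical. Under that assumption it gives the p=0.057 reported for this example.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(x):
        # Number of ways to obtain a table whose top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    total = comb(row1 + row2, col1)
    # Integer weights are compared, so there is no floating-point tie problem
    extreme = sum(weight(x) for x in range(lowest, highest + 1)
                  if weight(x) <= observed)
    return extreme / total

# Exposed: 12 with skin cancer, 5 without; not exposed: 3 with, 7 without
p_value = fisher_exact_two_sided(12, 5, 3, 7)
print(round(p_value, 3))
```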
Hands-on: Fisher's Exact Test
[SPSS screenshots: steps 1 to 14]
Output: Fisher's Exact Test
• The percentage of those exposed to the sun who developed skin cancer is 70.6%.
• The percentage of those not exposed to the sun who developed skin cancer is 30.0%.
• Since 1 cell (25.0%) has an expected count less than 5, the assumption for the Pearson chi-square test is NOT met.
• We can select Fisher's exact test as an option to solve the problem.
Conclusion
Step 6: Conclusion.
a) Statistical conclusion:
• Since the p-value is more than 0.05, we DO NOT reject Ho.
• Therefore, we can conclude that there is no significant association between sun exposure and skin cancer status (p=0.057).
b) Problem conclusion:
• There is no association between sun exposure and skin cancer status.
Output: Summary
Table: The cross-tabulation between sun exposure and skin cancer (N=27)
Sun exposure | Skin cancer: Yes, Frequency (%) | Skin cancer: No, Frequency (%) | Total, Frequency (%) | Chi-square (df) | p-value
Yes | 12 (70.6) | 5 (29.4) | 17 (100.0) | 4.201 (1) | 0.057a
No | 3 (30.0) | 7 (70.0) | 10 (100.0) | |
a Statistical test: Fisher's exact test
5.2.4 Yates' Correction
• The hands-on for Yates' Correction is the same as for Fisher's Exact Test.
• However, we choose Yates' correction when:
i. Sample size is more than 50.
ii. At least 1 cell (25.0%) has an expected count less than 5.
• In the output, we read the p-value from the Continuity Correction row.
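The continuity-corrected chi-square subtracts N/2 from |ad − bc| before squaring. The sketch below is for arithmetic illustration only: the function name is hypothetical, and it reuses the smoking and heart disease counts (30, 16, 12, 48), a table that already meets the Pearson chi-square assumptions.

```python
def yates_chi_square(a, b, c, d):
    """Continuity-corrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2, 0)   # Yates' correction
    return n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Illustration only: smoking and heart disease counts
chi2_yates = yates_chi_square(30, 16, 12, 48)
print(round(chi2_yates, 3))
```

The corrected value is always smaller than or equal to the uncorrected Pearson chi-square, so the test is more conservative.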
Summary
No. of categories | Sample size ≥ 50 | At least 80% of cells have expected count ≥ 5 | Appropriate test / solution
R x C | Yes | Yes | Pearson Chi-square
2 x 2 | Yes | No | Yates' Correction
2 x 2 | No | Yes / No | Fisher's Exact test
R x C | Yes / No | No | Collapse the categories
5.3 McNemar's test
• It is a non-parametric chi-square procedure to compare the proportions obtained from a 2 x 2 contingency table with a dichotomous trait and matched pairs of subjects, to determine whether the row and column marginal frequencies are equal (i.e. whether there is "marginal homogeneity").
• Mostly, it is used to compare before and after findings in the same individual with a dichotomous outcome.
• Example: a researcher wants to compare customer satisfaction (satisfied or not satisfied) before and after a campaign.
• It is also used in the analysis of cross-over designs and matched case-control studies.
2 x 2 table in McNemar's test
• The test is often used where one tests for the presence (1) or absence (0) of something; variable A is the state at the first observation (i.e. pre-test) and variable B is the state at the second observation (i.e. post-test).
Before campaign | After campaign: Satisfy | After campaign: Not satisfy | Total
Satisfy | a | b | a+b
Not satisfy | c | d | c+d
Total | a+c | b+d | a+b+c+d
• a & d : Concordant pairs.
• b & c : Discordant pairs - pairs with different outcomes, used to test the difference in outcome.
• df: (r – 1)(c – 1)
Hypotheses
• The Ho of marginal homogeneity states that the two marginal probabilities for each outcome are the same, where Pa, Pb, Pc and Pd are the probabilities of cells a, b, c and d of the 2 x 2 table.
• i.e. Pa + Pb = Pa + Pc; and Pc + Pd = Pb + Pd
• Thus, the hypotheses:
- Ho: Pb = Pc
- Ha: Pb ≠ Pc
McNemar's test assumptions
i. The cases are a random sample from the population.
ii. Related or dependent samples: one categorical dependent variable with two categories (i.e. a dichotomous variable) and one categorical independent variable with two related groups.
iii. The two categories of the dependent variable must be mutually exclusive, meaning that they cannot overlap.
McNemar's test statistics
• There are two types of McNemar test statistic:
i. Marginal Homogeneity test:
• Large number of discordant pairs (b + c > 25).
• χ² has a chi-squared distribution with 1 df.
• Formula: $\chi^2 = \frac{(b - c)^2}{b + c}$
• Odds ratio = c / b
ii. Exact binomial test (continuity correction):
• When b + c < 25.
• χ² is not well approximated by the chi-squared distribution.
• Formula: $\chi^2 = \frac{(|b - c| - 1)^2}{b + c}$
Research Question
Research question:
• You want to determine if the short course has an effect on the result of the test (pass or fail).
• The students' results before the short course are in the rows and the results after the short course are in the columns.
• The result is displayed below:
Pre-course | Post-course: Pass | Post-course: Fail | Total
Pass | 32 | 10 | 42
Fail | 40 | 24 | 64
Total | 72 | 34 | 106
McNemar’s test
Step 1: Specify the Ho and Ha.
a) Null hypothesis:
• Ho: There is no difference in the examination result before and
after short course (Ho: Resultpre – Resultpost = 0).
b) Alternative hypothesis:
• Ha: There is a difference in the examination result before and
after short course (Ha: Resultpre – Resultpost ≠ 0).
Step 2: Choose the significance level.
• α = 0.05 (two-sided).
McNemar's test
Step 3: Checking assumptions.
• The cases are a random sample from the population.
• Related or dependent samples: one categorical dependent variable with two categories (i.e. a dichotomous variable) and one categorical independent variable with two related groups.
• The two categories of the dependent variable must be mutually exclusive.
Step 4: Choose the test statistic.
• McNemar's (Marginal Homogeneity) test with (r – 1)(c – 1) = 1 df, since b + c > 25.
Step 5: Find the p-value.
• Calculate χ² (McNemar's test). Formula: $\chi^2 = \frac{(b - c)^2}{b + c}$
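Only the discordant pairs enter the calculation. The sketch below applies both McNemar formulas to the short course example (b = 10, c = 40); the function name is hypothetical. The continuity-corrected value agrees with the chi-square of 16.820 shown in the summary table.

```python
import math

def mcnemar_chi_square(b, c, continuity=False):
    """McNemar chi-square (1 df) from the two discordant cell counts."""
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)                  # continuity correction
    chi2 = diff ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))     # closed form, valid for df = 1
    return chi2, p_value

# Discordant pairs: pass then fail = 10, fail then pass = 40
chi2_plain, p_plain = mcnemar_chi_square(10, 40)
chi2_corrected, p_corrected = mcnemar_chi_square(10, 40, continuity=True)
print(chi2_plain, round(chi2_corrected, 2), p_corrected < 0.001)
```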
Hands-on: McNemar's Test (Method 1)
[SPSS screenshots: steps 1 to 9]
Output: McNemar's Test
• Since the p-value (p<0.001) is less than α (0.05), we reject Ho and accept Ha.
• Therefore, we can conclude that there is a difference in the examination result before and after the short course.
Hands-on: McNemar's Test (Method 2)
[SPSS screenshots: steps 1 to 12]
Output: McNemar's Test
• Since the p-value (p<0.001) is less than α (0.05), we reject Ho and accept Ha.
• Therefore, we can conclude that there is a difference in the examination result before and after the short course.
Hands-on: McNemar's Test (Method 3)
[SPSS screenshots: steps 1 to 7]
Output: McNemar's Test
• Since the p-value (p<0.001) is less than α (0.05), we reject Ho and accept Ha.
• Therefore, we can conclude that there is a difference in the examination result before and after the short course.
Output: Summary
Table: The cross-tabulation between examination result before and after short course (N=106)
Pre-course result | Post-course result: Pass | Post-course result: Fail | Total, Frequency | Chi-square (df) | p-value
Pass | 32 | 10 | 42 | 16.820 (1) | <0.001a
Fail | 40 | 24 | 64 | |
a Statistical test: McNemar's test
5.4 Mantel-Haenszel test
• We are often interested only in investigating the relationship between two binary variables, for example a disease and an exposure. However, we have to control for confounders.
• A confounding variable is a third variable that is associated with both the exposure and the disease and can distort the relationship between them.
• Mantel-Haenszel (MH) is a test used in stratified analysis of categorical variables when a confounder needs to be controlled.
• It allows an investigator to test the association between a binary predictor or treatment and a binary outcome while taking the stratification into account.
• This is another way to test for conditional independence, by exploring associations in partial tables for 2 × 2 × K tables.
Mantel-Haenszel
• Assumptions:
i. Observations are independent of each other - each observation comes from a different subject, and the subjects were randomly selected from the population of interest.
ii. All observations are identically distributed - the samples were obtained in the same way.
Mantel-Haenszel Test
• The Mantel-Haenszel test is based on a z statistic in which the summation (Σ) is across the levels of the confounder.
• When the test is statistically significant, there is an association between the disease and the exposure after controlling for the confounder.
• Because we assume that the confounder is not an effect modifier, the odds ratio is assumed to be constant across its levels.
OR for Mantel-Haenszel
• The OR at each level is estimated by ad/bc.
• The Mantel-Haenszel procedure pools data across levels of the confounder to obtain a combined estimate.
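A common textbook form of the pooled estimate is OR_MH = Σ(a·d / n) / Σ(b·c / n), summed over the strata. The sketch below assumes that form; the function name and the counts are hypothetical and are not the workshop's headache dataset.

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio over 2 x 2 strata given as (a, b, c, d) tuples."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d                # stratum size
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

# Hypothetical counts for two strata (e.g. two genders); not the workshop data
strata = [(10, 5, 4, 12), (8, 6, 5, 9)]
or_mh = mantel_haenszel_or(strata)
stratum_ors = [a * d / (b * c) for a, b, c, d in strata]
print(round(or_mh, 2), [round(x, 2) for x in stratum_ors])
```

The pooled value lies between the stratum-specific odds ratios, weighted towards the larger strata.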
Mantel-Haenszel test
Research question:
• A researcher wants to determine the impact of a new medication on the treatment of headache (adjusted by gender).
Step 1: Specify the Ho and Ha.
a) Null hypothesis:
• Ho: There is no association between medication status and outcome of headache (adjusted by gender).
b) Alternate hypothesis:
• Ha: There is an association between medication status and outcome of headache (adjusted by gender).
Step 2: Choose the significance level.
• α = 0.05 (two-sided).
Mantel-Haenszel test
Step 3: Checking assumptions:
• Observations are independent of each other - each observation comes from a different subject, and the subjects were randomly selected from the population of interest.
• All observations are identically distributed - the samples were obtained in the same way.
Step 4: Choose the test statistic
• Mantel-Haenszel test with (r - 1)(c - 1) df.
Step 5: Find the p-value
• Calculate the z statistic.
Hands-on: Mantel-Haenszel test
[SPSS screenshots: steps 1 to 17]
Output: Mantel-Haenszel test (descriptive statistics)
Output: Mantel-Haenszel test
Step 6: Conclusion
a) Statistical conclusion:
• Since the p-value (p=0.004) is less than α (0.05), we reject Ho and accept Ha.
b) Problem conclusion:
• Therefore, we can conclude that there is a significant association between type of medication and headache status after adjusting for gender (p=0.004).
Output: Mantel-Haenszel test
• However, only female patients showed a significant association between type of medication and headache status (p=0.004).
• There is no significant association for male patients (p=0.221).
Output: Mantel-Haenszel test
Association:
• After stratifying by gender, only female patients showed a significant association between type of medication and headache status (p=0.004).
• Among female patients, the odds of a better outcome with the new medication were 5.82 times (95%CI: 1.68, 20.20) the odds for those treated with placebo.
Output: Mantel-Haenszel test
• Homogeneous association implies that the conditional relationship between any pair of variables given the third variable is the same across the strata in the population.
• The homogeneity of the odds ratio is tested using the Breslow-Day and Tarone's tests.
• The output shows that both tests are not statistically significant (p>0.05).
• Therefore, we can conclude that the odds ratio is homogeneous across the strata.
Output: Mantel-Haenszel test
• Conditional independence means that the odds ratios are the same and equal to 1 across the strata in the population.
• Conditional independence is tested using Cochran's and Mantel-Haenszel tests.
• The output shows that both tests are statistically significant (p<0.05).
• Therefore, we reject conditional independence: the odds ratios are NOT equal to 1 across the strata.
Output: Mantel-Haenszel test
Interpretation:
• There is a statistically significant association between type of medication and headache status after adjusting for gender using the MH test [MH OR estimate: 3.31 (95%CI: 1.45, 7.59), p=0.005].
Output: Summary
Table: Cross-tabulation between medication type and response to treatment, adjusting for gender (N=106)

Gender   Medication   Better,      Same,        Total,        Chi-square a   p-value   MH OR
         type         n (%)        n (%)        n (%)         (df)                     (95%CI)
Male     New          12 (42.9)    16 (57.1)    28 (100.0)    8.443 (1)      0.005*    3.31 (1.45, 7.59)
         Placebo      7 (26.9)     19 (73.1)    26 (100.0)
Female   New          16 (59.3)    11 (40.7)    27 (100.0)
         Placebo      5 (20.0)     20 (80.0)    25 (100.0)

* Statistically significant at α=0.05
a Mantel-Haenszel test
Thank you for your attention
rodi@salam.uitm.edu.my
Basic Biostatistics Workshop
Using SPSS 2019
Lesson 6
Correlation Analyses
Dr. Mohamad Rodi Isa
MBBS (Malaya), DAP&E (SEAMEO-TROPMED, M’sia), MPH (Malaya), DrPH (Malaya)
Public Health Medicine, UiTM
Re-cap: Exploring relationships among variables
• Relationship only (2 categorical variables)
- Statistical technique: χ2 test of independence or homogeneity
- Remarks: has been covered in categorical analyses
• Relationship with strength & direction (quantitative)
- Statistical technique: correlation (Pearson's, Spearman rank, Kendall's tau, Phi, Cramer's V, etc.)
- Remarks: covered in this lesson and in the non-parametric methods lesson
• Prediction between dependent variable and independent variable(s)
- Statistical technique: regression (simple, multiple, logistic, etc.)
- Remarks: will be covered in regression analyses
• Identify the structure underlying a group of related variables
- Statistical technique: reliability test & factor analysis
Simple Correlation
6.1 Simple Correlation
▪ Correlation is defined as the quantification of the degree to which two continuous variables are related, provided the relationship is linear.
▪ It measures the strength of the linear relationship between two variables, without taking into consideration the fact that both these variables may be influenced by a third variable.
Correlation coefficient, r:
• It measures the direction and the strength of association in correlation.
• The value ranges from –1 to +1:
i. +1 : perfect positive correlation
ii. 0 : no correlation at all.
iii. -1 : perfect negative correlation.
Magnitude of association, r
• Qualitative description of the strength of the linear relationship for each value of r:

Value of r        Qualitative description of the strength *
±1                Perfect correlation
±0.75 to ±1       Strong (positive / negative) correlation
±0.50 to ±0.75    Moderate (positive / negative) correlation
±0.25 to ±0.50    Weak (positive / negative) correlation
0 to ±0.25        No linear correlation

* Note: different books will give different classifications.
Different types of correlation analyses
Classification of correlation analyses (types of correlation):
• On the basis of degree of correlation: positive correlation; negative correlation.
• On the basis of number of variables: simple correlation; partial correlation; multiple correlation.
• On the basis of linearity: linear correlation; non-linear correlation.
Assumptions
i. The samples must be paired.
ii. The sample must be randomly selected.
iii. The observations must be independent.
iv. The scale of measurement should be interval or ratio.
v. Both variables (x1 and x2) must be normally distributed.
vi. There is a straight-line relationship between the variables in the analysis (linearity).
vii. The data are normally distributed about the regression line with constant spread (homoscedasticity).
Steps in Hypothesis testing
• Step 1 : Specify the null hypothesis and alternate hypothesis.
• Step 2 : Choose the significance level α, one or two-sided (tailed).
• Step 3 : Check assumptions.
• Step 4 : Choose the test statistic.
• Step 5 : Calculate r.
• Step 6 : Find the p-value, compare with α:
- p > α → do not reject Ho → no sig. difference.
- p < α → reject Ho → sig. difference.
• Step 7 : Conclusions (statistical & problems)
• Step 8 : Interpret and give overall conclusion.
Research Question
Research question:
• What is the correlation between cholesterol and calorie intake ?
Step 1: Specify the Ho and Ha.
a) Null hypothesis:
• The linear correlation between cholesterol and calories intake is 0
(r = 0).
b) Alternate hypothesis:
• The linear correlation between cholesterol and calories intake is
not 0 (r ≠ 0).
Step 2: Choose the significance level.
• α = 0.05 (two-sided).
Simple Correlation
Step 3: Checking assumptions.
• The samples must be paired.
• The sample must be randomly selected.
• The observations must be independent.
• The scale of measurement should be interval or ratio.
• Both variables (x1 and x2) must be normally distributed.
• There is a straight-line relationship between the variables in the analysis (linearity).
• The data are normally distributed about the regression line with constant spread (homoscedasticity).
Step 4: Choose the test statistic
• Pearson correlation with (n - 2) df.
Simple Correlation
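Step 5: Calculate r
• Formula:
$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}$$
Step 6: Find the p-value
• Calculate the t statistic (t-calc).
• Formula:
$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$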
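Steps 5 and 6 can be tried by hand on a small dataset. The sketch below assumes the usual Pearson formula and the t statistic with (n - 2) df; the five data pairs are invented for illustration and are not the workshop dataset.

```python
import math


def pearson_r(x, y):
    """Pearson correlation from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t-calc for testing r = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical paired observations (not from the SPSS file)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
t = t_statistic(r, len(x))
print(f"r = {r:.3f}, t = {t:.3f}, df = {len(x) - 2}")
```

The t value is then compared with the t distribution on n - 2 df to obtain the p-value, which is what SPSS reports next to r.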
Hands-on: distribution of cholesterol & calorie intake
Output: distribution of cholesterol &
calorie intake
Skewness:
• Cholesterol: -0.168
• Calorie intake: 0.151
• Both values lie within ±1.
• It can be concluded that both variables are approximately normally distributed.
KS test:
• Neither variable is statistically significant (p>0.05).
• It can be concluded that both variables are normally distributed.
Output: distribution of cholesterol & calorie intake (histograms of cholesterol and of calorie intake)
Hands-on: Scatter plot
Output: Scatter plot
• From the scatter plot, we get a rough idea that there is a positive linear correlation between cholesterol and calorie intake.
Hands-on: Simple Correlation
Output: Simple Correlation
r = 0.940, p<0.001
Step 7: Conclusion
a) Statistical conclusion
• Since the p-value (p<0.001) is less than α (0.05), we reject Ho and accept Ha.
• Therefore, we can conclude that there is a significant linear correlation between cholesterol and calorie intake (p<0.001).
• The linear correlation between cholesterol and calorie intake is not 0.
b) Problem conclusion:
▪ There is a strong positive linear correlation between cholesterol and calorie intake (r=0.940).
Summary: Simple Correlation
Table: The correlation between calorie intake and cholesterol (N=106)

Variables        Mean (SD)          r a      p-value
Calorie intake   768.31 (373.97)    0.940    <0.001*
Cholesterol      10.23 (2.85)

* Statistically significant at α=0.05
a Statistical test: Pearson correlation
Partial Correlation
6.2 Partial Correlation
• The partial correlation is the correlation between two continuous variables, adjusting for a third variable.
• When many factors influence the outcome, it is possible to control for the other variables so that the effect of each variable can be studied separately.
[Diagram: factor1, Outcome, and other factors (confounding)]
Partial correlation analysis
• Consider a correlation matrix for variables A, B and C:

      A      B        C
A     *      r(AB)    r(AC)
B     -      *        r(BC)
C     -      -        *

▪ The partial correlation of A and B controlling (adjusting) for C is:
$$r_{AB \cdot C} = \frac{r_{AB} - r_{AC}\, r_{BC}}{\sqrt{\left(1 - r_{AC}^2\right)\left(1 - r_{BC}^2\right)}}$$
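The first-order partial correlation can be computed directly from the three pairwise correlations in the matrix. The sketch below assumes the standard formula (r(AB) minus the product r(AC)·r(BC), divided by the square root of (1 - r(AC)²)(1 - r(BC)²)); the input values are hypothetical.

```python
import math


def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation of A and B, controlling for C."""
    numerator = r_ab - r_ac * r_bc
    denominator = math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))
    return numerator / denominator


# Hypothetical pairwise correlations, chosen only for illustration
r_ab, r_ac, r_bc = 0.6, 0.5, 0.4
print(round(partial_corr(r_ab, r_ac, r_bc), 3))

# If C is unrelated to both A and B, adjusting for it changes nothing
print(partial_corr(r_ab, 0.0, 0.0))
```

When the control variable is only weakly related to the two variables of interest, the partial correlation stays close to the simple correlation, which is the pattern seen in this lesson (0.940 unadjusted, 0.939 adjusted for age).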
Partial Correlation
Assumptions:
i. The samples must be paired.
ii. The sample must be randomly selected.
iii. The observations must be independent.
iv. The scale of measurement should be interval or ratio.
v. All variables (x1, x2 and x3) must be normally distributed.
vi. There is a straight-line relationship between the variables in the analysis (linearity).
vii. The data are normally distributed about the regression line with constant spread (homoscedasticity).
Steps in Hypothesis testing
• Step 1 : Specify the null hypothesis and alternate hypothesis.
• Step 2 : Choose the significance level α, one or two-sided (tailed).
• Step 3 : Check assumptions.
• Step 4 : Choose the test statistic.
• Step 5 : Calculate r.
• Step 6 : Find the p-value, compare with α:
- p > α → do not reject Ho → no sig. difference.
- p < α → reject Ho → sig. difference.
• Step 7 : Conclusions (statistical & problems)
• Step 8 : Interpret and give overall conclusion.
Research Question
Research question:
• What is the correlation between cholesterol and calories intake
(adjusted by age) ?
Step 1: Specify the Ho and Ha
a) Null hypothesis:
• The linear correlation between cholesterol and calories intake
(adjusted by age) is 0 (r = 0).
b) Alternate hypothesis:
• The linear correlation between cholesterol and calories intake
(adjusted by age) is not 0 (r ≠ 0).
Step 2: Choose the significance level
• α = 0.05 (two-sided)
Partial Correlation
Step 3: Checking assumptions
• The samples must be paired.
• The sample must be randomly selected.
• The observations must be independent.
• The scale of measurement should be interval or ratio.
• All variables (x1, x2 and x3) must be normally distributed.
• There is a straight-line relationship between the variables in the analysis (linearity).
• The data are normally distributed about the regression line with constant spread (homoscedasticity).
Step 4: Choose the test statistic
• Partial correlation with (n - 3) df, because one control variable (age) is adjusted for.
Partial Correlation
Step 5: calculate r
• Formula:
Step 6: Find the p-value
• to calculate the t-calc
• Formula:
Hands-on: Partial Correlation
Output: Partial Correlation
r = 0.939, p<0.001
Step 7: Conclusion
a) Statistical conclusion
• Since the p-value (p<0.001) is less than α (0.05), we reject Ho and accept Ha.
• Therefore, we can conclude that there is a significant correlation between cholesterol and calorie intake adjusted by age (p<0.001).
• The linear correlation between cholesterol and calorie intake (adjusted by age) is not 0 (r ≠ 0).
b) Problem conclusion
• There is a strong positive linear partial correlation between cholesterol and calorie intake adjusted by age (r=0.939).
Summary
Table: The partial correlation between calorie intake and cholesterol (adjusted by age) (N=103)

Control variable   Variable         Mean (SD)          r a      p-value
Age                Calorie intake   768.31 (373.97)    0.939    <0.001*
                   Cholesterol      10.23 (2.85)

* Statistically significant at α=0.05
a Statistical test: Partial correlation
Re-cap: Some of the correlation coefficients

Name               First variable             Second variable
Pearson, r         Interval / Ratio           Interval / Ratio
Spearman rho, ρ    Ordinal                    Ordinal
Kendall's Tau      Ordinal                    Ordinal
Phi                Dichotomous (Nominal)      Dichotomous (Nominal)
Intraclass, R      Interval / Ratio (Test)    Interval / Ratio (Re-test)
Basic Biostatistics Workshop
Using SPSS 2019
Lesson 7
Simple Linear Regression
Dr. Mohamad Rodi Isa
MBBS (Malaya), DAP&E (SEAMEO-TROPMED, M’sia), MPH (Malaya), DrPH (Malaya)
Public Health Medicine, UiTM
Simple Linear Regression
• Regression analysis is a statistical methodology to estimate the relationship (using a theoretical or an empirical model) of a response variable to a set of predictor variables.
• It helps to understand how the typical value of one dependent continuous variable (usually called y) changes when any one of the independent variables (continuous or categorical) (usually called x) is varied.
• The dependent variable is the variable for which we want to make a prediction.
• Unlike correlation analysis, regression analysis requires us to identify the dependent and independent variables.
Simple Linear Regression
• General regression model:
Y = β0 + β1X + ε
Where:
• β0 and β1 are parameters: where β0 is the intercept and β1 is the
regression coefficient.
• X is a known constant.
• The deviations ε are independent N(0, σ²).
Meaning:
• The values of the regression parameters β0 and β1 are not known; we estimate them from data.
• β1 indicates the change in the mean response per unit increase in X.
Regression Line
• If the scatter plot of our sample data suggests a linear relationship between two variables, i.e.
y = β0 + β1x
• We can summarize the relationship by drawing a straight line on the plot.
• Least squares methods give us the “best” estimated line for our set of sample data.
• We will write an estimated regression line based on sample data as:
ŷ = b0 + b1x
Regression Line
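• The method of least squares chooses the values of b0 and b1 to minimize the sum of squared errors.
• Using calculus, we obtain the estimating formulas:
$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$
or,
$$b_1 = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}$$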
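A short sketch of the least squares estimates follows. It assumes the standard closed-form solution (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x) and checks it against the equivalent computational form based on raw sums; the data are hypothetical.

```python
def least_squares(x, y):
    """Return (b0, b1) for the fitted line y-hat = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def slope_from_raw_sums(x, y):
    """Same slope, written with raw sums instead of deviations."""
    n = len(x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    return (sum_xy - sum(x) * sum(y) / n) / (sum_x2 - sum(x) ** 2 / n)


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares(x, y)
print(f"y-hat = {b0:.2f} + {b1:.2f} * x")
print("slopes agree:", abs(b1 - slope_from_raw_sums(x, y)) < 1e-12)
```

The fitted line always passes through the point of means (x̄, ȳ), which is why the intercept is obtained from the slope and the two means.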
Beta Coefficient, bx
• It is the slope of the regression line (β in the population, b in the sample).
• It reflects the amount of change in the dependent variable per one unit increase in the independent variable.
• The coefficient has no restricted range of values.
• It is calculated as the slope of the straight line that best fits the data in the scatter plot.
• The method of fitting the line is called the “least squares” method.
Simple Linear Regression
[Figure: scatter plot with fitted regression line; x-axis: Cholesterol level (4 to 10); y-axis ticks 60 to 180; the slope b (population: β) is the regression coefficient, the rise in y per 1 unit of x; Rsq = 0.9238]
e.g. “b = 15” means that mean blood pressure will increase by 15 points when cholesterol level increases by 1 unit.
Coefficient of Determination, r2
• It can be interpreted as the proportion of the variability among the observed values of y that is explained by the linear regression of Y on X.
• The r2 is the ratio of the explained variation to the total variation.
• It is equal to the square of the Pearson correlation coefficient.
• Range of values: 0 ≤ r2 ≤ 1; it denotes the strength of the linear association between x and y.
• The closer r2 is to 1, the closer the data points lie to the line of best fit.
Coefficient of Determination, r2
• Example: The linear relationship between blood pressure and
cholesterol level.
r = 0.829, then r2 = 0.687.
• This means that 68.7% of the total variation in blood pressure (y) can be explained by its linear relationship with cholesterol level (x) (as described by the regression equation).
• The other 31.3% of the total variation in blood pressure (y) remains unexplained.
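The arithmetic of this example is simply squaring r. A minimal sketch, using the r = 0.829 quoted in the example above:

```python
r = 0.829                      # correlation from the blood pressure example
r_squared = r ** 2             # coefficient of determination

explained = round(100 * r_squared, 1)     # % of variation explained
unexplained = round(100 - explained, 1)   # % left unexplained

print(f"r2 = {r_squared:.3f}")
print(f"explained: {explained}%, unexplained: {unexplained}%")
```

Because r is squared, the sign of the correlation is lost: r = -0.829 would give the same r2.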
Assumptions
• The samples must be paired.
• The sample must be randomly selected.
• The observations must be independent.
• The scale of measurement must be interval or ratio.
• Both variables (x and y) must be normally distributed.
• There is a straight-line relationship between the variables in the analysis.
• The means of the subpopulations of Y all lie on the same straight line (assumption of linearity).
• For each value of x, there is a subpopulation of Y, which must be normally distributed.
• The subpopulations of Y have equal variance about the regression line (homoscedasticity).
Steps in Hypothesis testing
• Step 1 : Specify the null hypothesis and alternate hypothesis.
• Step 2 : Choose the significance level α, one or two-sided (tailed).
• Step 3 : Check assumptions.
• Step 4 : Choose the test statistic.
• Step 5 : Find the regression line.
• Step 6 : Find the p-value, compare with α:
- p > α → do not reject Ho → no sig. difference.
- p < α → reject Ho → sig. difference.
• Step 7 : Conclusions (statistical & problems)
• Step 8 : Interpret and give overall conclusion.
Simple Linear Regression
Research question – 3 scenarios:
i. What is the relationship between cholesterol and calorie intake (continuous independent variable)?
ii. What is the relationship between cholesterol and gender (male and female) (binary independent variable)?
iii. What is the relationship between cholesterol and race (Malay, Chinese and Indian) (categorical independent variable with more than 2 categories)?
Research Question (1)
Research question:
• What is the relationship between cholesterol and calories
intake ?
Step 1: Specify the Ho and Ha
a) Null hypothesis:
• The relationship between cholesterol and calories intake is 0
(β = 0).
b) Alternate hypothesis:
• The relationship between cholesterol and calories intake is
not 0 (β ≠ 0).
Step 2: Choose the significance level
• α = 0.05 (two-sided)
Assumptions
Step 3: Checking assumptions
• The samples must be paired.
• The sample must be randomly selected.
• The observations must be independent.
• The scale of measurement must be interval or ratio.
• Both variables (x and y) must be normally distributed.
• There is a straight-line relationship between the variables in the analysis.
• The means of the subpopulations of Y all lie on the same straight line (assumption of linearity).
• For each value of x, there is a subpopulation of Y, which must be normally distributed.
• The subpopulations of Y have equal variance about the regression line (homoscedasticity).
Assumptions
Step 4: Choose test statistic
• Simple linear regression analysis with (n - 2) df
i. Calorie intake (continuous variable) - dependent
ii. Cholesterol level (continuous variable) - independent
Step 5: Find the regression line
Assumptions
Step 6: Find the p-value
• Calculate the t statistic (t-calc).
• Formula:
t = b1 / SE(b1)
• SE(b1) denotes the standard error of b1.
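The link between the slope, its standard error, the t statistic and the confidence interval can be checked with the values that appear later in this lesson's summary table (b = 123.488, t = 28.166). The sketch below is illustrative only: the critical value 1.983 is an assumed approximation of the two-sided 5% t critical value for 104 df, not a number taken from the SPSS output.

```python
b1 = 123.488          # slope reported in the lesson's output
t_reported = 28.166   # t statistic reported for that slope

# Rearranging t = b1 / SE(b1) recovers the standard error
se_b1 = b1 / t_reported

# Assumed approximate critical value: t(0.975, df = 104) ~ 1.983
t_crit = 1.983
lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1

print(f"SE(b1) = {se_b1:.3f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval obtained this way is close to the (114.79, 132.18) reported by SPSS, which shows that the t statistic and the confidence interval carry the same information about the slope.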
Hands-on: SLR
Output: SLR
i. r = 0.940
• The linear correlation between calorie intake and cholesterol is positive and strong.
ii. r2 = 0.884
• 88.4% of the variation in calorie intake is explained by cholesterol. The other 11.6% is explained by other factors.
Are the correlation and coefficient of determination significant?
Output: SLR
Step 7: Conclusion
a) Statistical conclusion
• Since the p-value (p<0.001) is less than α (0.05), we reject Ho and accept Ha.
• We can conclude that the relationship between cholesterol and calorie intake is not 0 (β ≠ 0).
• The correlation (r = 0.940) and coefficient of determination (r2 = 0.884) are not due to chance.
Output: SLR
Step 7: Conclusion
b) Problem conclusion
• The linear relationship between calorie intake and cholesterol level is not 0.
• The equation relating calorie intake to cholesterol is:
Calorie intake = -494.30 + 123.488*cholesterol
• This means that if the cholesterol level increases by 1 mmol/l, we would expect the calorie intake to increase by 123.49 (95%CI: 114.79, 132.18) units.
• Cholesterol level explains 88.4% of the variation in calorie intake.
• The other 11.6% is explained by other factors.
Output: SLR
• The scatter plot does not show any pattern.
• It can be concluded that the assumption is met.
Summary
Table: The relationship between cholesterol level and calorie intake (N=106)

Variable      b (95%CI)                  t a       p-value
Cholesterol   123.49 (114.79, 132.18)    28.166    <0.001*

* Statistically significant at α=0.05
a Statistical test: Simple Linear Regression
Research Question (2)
Research question:
• What is the relationship between cholesterol and gender (male
and female) ?
• In this example we can see that:
i. The dependent variable (cholesterol) is a continuous variable.
ii. The independent variable (gender: male & female) is a categorical variable.
• In the previous lesson, we compared whether there is any difference in the mean cholesterol between males and females.
• In that scenario, we used the independent t-test to solve the problem.
• However, we can also use simple linear regression to analyse the research question.
Hands-on: SLR
Output: SLR
i. r = 0.261
• Since gender is a categorical variable, we cannot interpret this as a weak linear correlation between cholesterol and gender.
ii. r2 = 0.068
• We can conclude that 6.8% of the variation in cholesterol is explained by gender. The other 93.2% is explained by other factors.
Is the coefficient of determination significant?
Output: SLR
Step 7: Conclusion
a) Statistical conclusion
• Since the p-value (p=0.007) is less than α (0.05), we reject Ho and accept Ha.
• We can conclude that the relationship between cholesterol and gender is not 0 (β ≠ 0).
• The coefficient of determination (r2 = 0.068) is not due to chance.
Output: SLR
Step 7: Conclusion
b) Problem conclusion
• The relationship between gender and cholesterol level is not 0.
• The equation relating cholesterol to gender is:
Cholesterol level = 9.50 + 1.477*gender
• Coding: 0 = male; 1 = female
• This means that if the respondent is male (0), the predicted cholesterol level is 9.50.
• However, if the respondent is female (1), the predicted cholesterol level will be: 9.50 + 1.477(1) = 10.977.
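With a 0/1 coded predictor, the regression equation simply returns the two group means. A minimal sketch using the coefficients from the output above (the function name is made up for the example):

```python
INTERCEPT = 9.50   # predicted cholesterol when gender = 0 (male)
SLOPE = 1.477      # difference in predicted cholesterol, female vs male


def predicted_cholesterol(gender):
    """gender: 0 = male, 1 = female (the coding used in the dataset)."""
    return INTERCEPT + SLOPE * gender


print("Male:  ", predicted_cholesterol(0))
print("Female:", round(predicted_cholesterol(1), 3))
```

The intercept is the mean for the reference group (male) and the slope is the difference between the two group means, which is why this analysis reaches the same conclusion as the independent t-test.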
Output: SLR
Step 7: Conclusion
b) Problem conclusion
Cholesterol level = 9.50 + 1.477*gender
• This means that if the respondent is female, the cholesterol level is higher by 1.477 (95%CI: 0.41, 2.54) mmol/l than for a male respondent.
• Gender explains 6.8% of the variation in cholesterol.
• The other 93.2% is explained by other factors.
Output: SLR
• The scatter plot does not show any pattern.
• It can be concluded that the assumption is met.
Summary
Table: The relationship between cholesterol level and gender (N=106)

Variable                    b (95%CI)             t a      p-value
Gender (female vs. male)    1.477 (0.41, 2.54)    2.752    0.007*

* Statistically significant at α=0.05
a Statistical test: Simple Linear Regression
Research Question (3)
Research question:
• What is the relationship between cholesterol and race (Malay,
Chinese and Indian) ?
• In this example we can see that:
i. The dependent variable (cholesterol) is a continuous variable.
ii. The independent variable (race: Malay, Chinese and Indian) is a categorical variable (more than two categories).
• In the previous lesson, we compared whether there is any difference in the mean cholesterol between races (Malay, Chinese and Indian).
• In that scenario, we used ANOVA to solve the problem.
• However, we can also use simple linear regression to analyse the research question.
• But we must create dummy coding.
Research Question (3)
From dataset: SLR
• Coding: 1 = Malay; 2 = Chinese; and 3 = Indian
• Therefore, we need to transform the code into (n – 1) vectors, i.e. 2 dummy vectors for 3 categories:

Race           X1    X2
Malay (1)      0     0
Chinese (2)    1     0
Indian (3)     0     1
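The same recoding can be written as a small function. This is only a sketch of the idea behind the SPSS recode steps, assuming Malay (code 1) is the reference category as in the table above; the names are made up for the example.

```python
RACE_LABELS = {1: "Malay", 2: "Chinese", 3: "Indian"}


def dummy_code(race_code, reference=1):
    """Return (X1, X2): one 0/1 indicator per non-reference category."""
    levels = [code for code in sorted(RACE_LABELS) if code != reference]
    return tuple(1 if race_code == level else 0 for level in levels)


for code, label in RACE_LABELS.items():
    x1, x2 = dummy_code(code)
    print(f"{label} ({code}): X1 = {x1}, X2 = {x2}")
```

The reference category is the one with zeros in every vector, so each regression coefficient compares one category against it (Chinese vs Malay for X1, Indian vs Malay for X2).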
Hands-on: Creating Dummy Coding
• Then, click OK.
Output: Creating Dummy Coding
• Then do the same thing for the second vector (X2).
• Dummy coding: Malay = 0; Chinese = 0; and Indian = 1
Hands-on: SLR
Output: SLR
i. r = 0.276
• Since race is a categorical variable, we cannot interpret this as a weak linear correlation between cholesterol and race.
ii. r2 = 0.076
• We can conclude that 7.6% of the variation in cholesterol is explained by race. The other 92.4% is explained by other factors.
Is the coefficient of determination significant?
Output: SLR
Step 7: Conclusion
a) Statistical conclusion
• Since the p-value (p=0.017) is less than α (0.05), we reject Ho and accept Ha.
• We can conclude that the relationship between cholesterol and race is not 0 (β ≠ 0).
• The coefficient of determination (r2 = 0.076) is not due to chance.
Output: SLR
Step 7: Conclusion
a) Statistical conclusion
• At the beginning, Malay was chosen as the reference group.
• Therefore, we can conclude that:
i. There is a significant difference in the mean cholesterol level between Indian and Malay (p=0.006).
ii. However, there is no significant difference in the mean cholesterol level between Chinese and Malay (p=0.448).
• We cannot compare the difference between Chinese and Indian from this model.
Output: SLR
Step 7: Conclusion
b) Problem conclusion
• When the respondent is Indian, the cholesterol level is higher by 2.02 (95%CI: 0.59, 3.44) mmol/l compared to Malay.
• However, when the respondent is Chinese, there is no significant difference in the cholesterol level compared to Malay.
• We can conclude that 7.6% of the variation in cholesterol is explained by race. The other 92.4% is explained by other factors.
Output: SLR
• The scatter plot does not show any pattern.
• It can be concluded that the assumption is met.
Basic Biostatistics Workshop
Using SPSS 2019
Lesson 8
Simple Logistic Regression
Dr. Mohamad Rodi Isa
MBBS (Malaya), DAP&E (SEAMEO-TROPMED, M’sia), MPH (Malaya), DrPH (Malaya)
Public Health Medicine, UiTM
Introduction
• Logistic regression is a form of regression analysis in which the outcome (dependent) variable is categorical.
• What is the “Logistic” component?
- Instead of modelling the outcome, Y, directly, the method models the LOG ODDS of Y using the logistic function.
• What is the “Regression” component?
- A method used to quantify the association between an outcome and predictor variables.
- It can be used to build predictive models as a function of the predictors.
Introduction
• Types of logistic regression:
i. Binary (binomial) logistic regression: when the dependent variable is a dichotomy.
- Example: Disease: Yes/No.
ii. Multinomial logistic regression: when there are more than two levels in the dependent variable.
- Example: Severity of disease: mild / moderate / severe.
• The predictors (independent variables) can be continuous and/or categorical variables.
• Simple logistic regression has only one independent variable, whereas multiple logistic regression has more than one independent variable.
Logistic Function
• The logistic function for a single predictor:
Z = β0 + β1X1
• The Z value is then transformed using a link function to obtain the probability of the event occurring.
• For a binary outcome, the link function is:
$$P(Y = 1) = \frac{1}{1 + e^{-Z}}$$
• The plot: [Figure: plot of the logistic function]
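The transformation from Z to a probability can be sketched as below. It assumes the standard logistic function 1 / (1 + e^(-Z)); the coefficients b0 and b1 are hypothetical values chosen only to show the S-shaped behaviour.

```python
import math


def logistic(z):
    """Map any real Z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_probability(x, b0, b1):
    """Probability of the event for predictor value x."""
    return logistic(b0 + b1 * x)


# Hypothetical coefficients, for illustration only
b0, b1 = -3.0, 0.5

for x in (0, 3, 6, 9, 12):
    print(x, round(predicted_probability(x, b0, b1), 3))
```

Z = 0 always maps to a probability of 0.5, and the curve flattens towards 0 and 1 at the extremes, so predicted probabilities never leave the valid range.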
Dichotomous Predictor
• Therefore, for the odds ratio associated with risk presence we have: OR = e^β1
• Taking the natural logarithm we have: Ln(OR) = β1
• Thus the estimated regression coefficient associated with a 0 - 1 coded dichotomous predictor is the natural log of the OR associated with risk presence.
• In this practical, there are three types of simple logistic regression analyses:
i. Continuous independent variable (e.g. radiation time).
ii. Binary independent variable (e.g. smoking status = Yes / No).
iii. Categorical independent variable with more than two categories (e.g. diabetes mellitus: normal, impaired and DM).
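The claim that OR = e^β1 for a 0/1 coded predictor can be verified numerically. In the sketch below the coefficients are hypothetical; the odds are computed from the predicted probabilities in the unexposed (x = 0) and exposed (x = 1) groups and their ratio is compared with exp(b1).

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds corresponding to probability p."""
    return p / (1.0 - p)


# Hypothetical coefficients for a 0/1 coded predictor
b0, b1 = -1.0, 0.7

p_unexposed = logistic(b0)        # x = 0
p_exposed = logistic(b0 + b1)     # x = 1

odds_ratio = odds(p_exposed) / odds(p_unexposed)
print(round(odds_ratio, 4), round(math.exp(b1), 4))
```

The two printed values coincide for any choice of b0, because the intercept cancels when the two odds are divided.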
Assumptions
• Logistic regression does not assume a linear relationship between the dependent and independent variables.
• The dependent variable must be a dichotomy (two categories).
• The independent variables need not be interval, nor normally distributed, nor linearly related, nor of equal variance within each group.
• The categories (groups) must be mutually exclusive and exhaustive.
• Larger samples are needed than for linear regression because maximum likelihood coefficients are large-sample estimates. A minimum of 50 cases per predictor is recommended.
Steps in Hypothesis testing
• Step 1 : Specify the null hypothesis and alternate hypothesis.
• Step 2 : Choose the significance level α, one or two-sided (tailed).
• Step 3 : Check assumptions.
• Step 4 : Choose the test statistic.
• Step 5 : Find the p-value, compare with α:
- p > α → do not reject Ho → no sig. difference.
- p < α → reject Ho → sig. difference.
• Step 6 : Conclusions (statistical & problems)
• Step 7 : Interpret and give overall conclusion.
Research Question (1)
Research question:
• Is radiation time associated with lung cancer?
• Independent variable: radiation time (continuous variable)
• Dependent variable: Lung cancer (binary: Yes / No)
Step 1: Specify Ho and Ha
a) Null hypothesis:
• Ho: There is no association between radiation time and lung
cancer.
b) Alternate hypothesis:
• Ha: There is an association between radiation time and lung
cancer.
Research Question (1)
Step 2: Choose the significance level
• α = 0.05 (one-sided)
Step 3: Checking assumptions
• Logistic regression does not assume a linear relationship between
dependent and independent variables.
• The dependent variable must be a dichotomy (two categories)
• The independent variables need not be interval, nor normally
distributed, nor linearly related nor of equal variance within each
group.
• The categories (groups) must be mutually exclusive and
exhaustive.
• Larger samples are needed than for linear regression because
maximum likelihood coefficients are large-sample estimates. A
minimum of 50 cases per predictor is recommended.
Research Question (1)
Step 4: Choose statistical test
• Simple logistic regression with (r - 1)(c - 1) df
Step 5: Find the p-value
• The p-value is obtained from the Wald Chi-square statistic.
• Formula:
Wald Chi-square = (coefficient estimate / SE)^2
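The Wald formula above can be sketched in Python as follows. The coefficient (0.053) and standard error (0.016) are the rounded values reported in the summary table for radiation time later in this lesson, so the result is close to, but not exactly, the Wald value SPSS reports from unrounded estimates. The p-value uses the fact that a chi-square variable with 1 df is a squared standard normal; the function names are illustrative.

```python
import math

def wald_chi_square(coef: float, se: float) -> float:
    """Wald chi-square = (coefficient estimate / SE) squared."""
    return (coef / se) ** 2

def chi2_upper_tail_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

wald = wald_chi_square(0.053, 0.016)   # rounded B and SE
p_value = chi2_upper_tail_df1(wald)
print(round(wald, 2), round(p_value, 4))
```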
Hands-on: SLogR
[Screenshots of the SPSS dialogs with numbered steps 1 to 11; not reproduced]
Output: SLogR (Beginning Block)
Output: SLogR (Method = Enter)
Omnibus Test of Model Coefficients:
• Here SPSS has added the “radiation time” variable as a predictor.
• The model gives a Chi-square of 12.858 on 1 degree of
freedom (df), significant at p < 0.001.
• This is a test of the Ho that adding the “radiation time” variable to
the model does not significantly improve the prediction of lung cancer.
• Since Ho is rejected, we infer that adding the “radiation time” variable to the
model improves the model.
Output: SLogR (Method = Enter)
-2 Log Likelihood (LL):
• Under model summary, we can see the -2 Log LL statistic is 129.490.
• This statistic measures how poorly the model predicts lung cancer: the
smaller the statistic, the better the model.
• Although SPSS does not give us this statistic for the model that had only
the intercept, you can derive it to be: 129.490 + 12.858 = 142.348.
• Adding the “radiation time” variable reduced the -2LL statistic by 142.348
- 129.490 = 12.858, which is the χ2 statistic seen in the omnibus test.
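The arithmetic linking the -2LL statistic to the omnibus Chi-square can be sketched as below, using the two values quoted on the slide. The p-value helper assumes 1 degree of freedom, as in this example; the variable names are illustrative.

```python
import math

neg2ll_model = 129.490   # -2LL with radiation time in the model
omnibus_chi2 = 12.858    # omnibus test Chi-square (1 df)

# -2LL of the intercept-only model, derived as on the slide
neg2ll_null = neg2ll_model + omnibus_chi2

# Likelihood-ratio Chi-square: the drop in -2LL after adding the predictor
lr_chi2 = neg2ll_null - neg2ll_model

# Upper-tail p-value for a chi-square variable with 1 df
p_value = math.erfc(math.sqrt(lr_chi2 / 2.0))
print(round(neg2ll_null, 3), round(lr_chi2, 3), p_value)
```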
Output: SLogR (Method = Enter)
Cox & Snell R2:
• The Cox & Snell R2 can be interpreted like R2 in linear regression but
cannot reach a maximum value of 1.
• In this study we can conclude that 11.4% of the variation in lung cancer is explained
by radiation time.
Nagelkerke R2:
• It is a modification of Cox & Snell R2.
• The Nagelkerke R2 can reach a maximum of 1.
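The two pseudo R2 measures can be reproduced from the -2LL values with their usual textbook definitions, which the slides do not show; the formulas below are therefore an assumption about how the reported figures were obtained. With the omnibus Chi-square of 12.858, the intercept-only -2LL of 142.348 and N = 106 from the summary table, the Cox & Snell value comes out at about 11.4%.

```python
import math

def cox_snell_r2(lr_chi2: float, n: int) -> float:
    """Cox & Snell R2 = 1 - exp(-LR chi-square / n)."""
    return 1.0 - math.exp(-lr_chi2 / n)

def nagelkerke_r2(lr_chi2: float, neg2ll_null: float, n: int) -> float:
    """Cox & Snell R2 rescaled by its maximum attainable value."""
    max_r2 = 1.0 - math.exp(-neg2ll_null / n)
    return cox_snell_r2(lr_chi2, n) / max_r2

cs = cox_snell_r2(12.858, 106)
nk = nagelkerke_r2(12.858, 142.348, 106)
print(round(cs, 3), round(nk, 3))
```

Dividing by the maximum attainable value is what allows the Nagelkerke R2 to reach 1.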
Output: SLogR (Method = Enter)
Hosmer & Lemeshow test:
• It is a Pearson Chi-square test of the Ho that the observed and
predicted numbers of outcomes are not different.
• If the Chi-square statistic for its degrees of freedom is smaller
than the critical value, giving a p-value > 0.05, Ho
cannot be rejected and the model fits well.
• If it is the other way round, the model does not fit well.
• In this analysis, we found that the p-value was more than 0.05.
• That is, the test was not statistically significant (p>0.05).
• Therefore, we can conclude that the model fits well.
Output: SLogR (Method = Enter)
• This is the contingency table of the sample
Output: SLogR (Method = Enter)
Classification Table:
• It shows the sensitivity and specificity of the model's
predictions of the dependent variable.
• It is not very important for simple logistic regression, but very
important in multiple logistic regression.
Output: SLogR (Method = Enter)
Step 6: Conclusion
a) Statistical conclusion.
• Since the p-value (p=0.001) is less than α (0.05), we can reject Ho and
accept Ha.
• Therefore, we can conclude that there is an association between
radiation time and lung cancer.
Output: SLogR (Method = Enter)
Step 6: Conclusion
b) Problem conclusion.
• The “Variables in the Equation” output shows that the regression equation is:
ln[P/(1 – P)] = -7.837 + 0.053*radiation
• For an increase of 1 unit of radiation time (in seconds), the log odds
of lung cancer increase by 0.053.
• OR = exp 0.053
= 1.055
Output: SLogR (Method = Enter)
Example:
• For an increase of 5 units of radiation time (in seconds), the odds of lung cancer are multiplied by OR = exp(0.053 × 5) = 1.30.
• For an increase of 10 units of radiation time (in seconds), the odds of lung cancer are multiplied by OR = exp(0.053 × 10) = 1.70.
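The odds ratio for an increase of several units follows from exponentiating the coefficient multiplied by the size of the increase. A minimal sketch, using the rounded coefficient 0.053 from the slides (the function name is illustrative):

```python
import math

def odds_ratio_for_increase(b: float, units: float = 1.0) -> float:
    """OR for an increase of `units` in the predictor: exp(b * units)."""
    return math.exp(b * units)

B_RADIATION = 0.053  # rounded coefficient for radiation time

for units in (1, 5, 10):
    print(units, round(odds_ratio_for_increase(B_RADIATION, units), 2))
```

Because the coefficient is rounded, results may differ slightly in the last digit from values computed by SPSS with the unrounded estimate.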
Summary: SLogR
Table: The association between radiation time and lung cancer (N=106)
Variable          B (SE)          Wald a (df)   OR (95% CI)         p-value
Radiation time    0.053 (0.016)   11.013 (1)    1.06 (1.02, 1.09)   0.001*
* statistically significant at α=0.05
a Statistical test: Simple Logistic Regression
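The OR and its 95% confidence interval in the table can be approximated from B and SE, assuming the usual Wald interval exp(B ± 1.96 × SE); the slides do not state the formula, so this is a sketch under that assumption.

```python
import math

def odds_ratio_with_ci(b: float, se: float, z: float = 1.96):
    """Return (OR, lower, upper) using the Wald interval exp(b +/- z * se)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

or_, lower, upper = odds_ratio_with_ci(0.053, 0.016)
print(round(or_, 2), round(lower, 2), round(upper, 2))
```

The interval limits agree with the table to two decimals; the point estimate from the rounded coefficient is slightly below the tabulated 1.06, which SPSS computes from the unrounded coefficient.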
Research Question (2)
Research question:
• Is smoking associated with lung cancer?
• Independent variable: Smoking status (binary: Yes / No)
• Dependent variable: Lung cancer (binary: Yes / No)
Step 1: Specify Ho and Ha
a) Null hypothesis:
• Ho: There is no association between smoking and lung cancer.
b) Alternate hypothesis:
• Ha: There is an association between smoking and lung cancer.
Hands-on: SLogR
[Screenshots of the SPSS dialogs with numbered steps 1 to 11; not reproduced]
Output: SLogR (Beginning Block)
Output: SLogR (Method = Enter)
Omnibus Test of Model Coefficients:
• Here SPSS has added the “smoking” variable as a predictor.
• The model gives a Chi-square of 22.859 on 1 degree of
freedom (df), significant at p < 0.001.
• This is a test of the Ho that adding the “smoking” variable to the
model does not significantly improve the prediction of lung cancer.
• Since Ho is rejected, we infer that adding the “smoking” variable to the model
improves the model.
Output: SLogR (Method = Enter)
-2 Log Likelihood (LL):
• Under model summary, we can see the -2 Log LL statistic is 119.489.
• This statistic measures how poorly the model predicts lung cancer: the
smaller the statistic, the better the model.
• Although SPSS does not give us this statistic for the model that had only
the intercept, you can derive it to be: 119.489 + 22.859 = 142.348.
• Adding the “smoking” variable reduced the -2LL statistic by 142.348
- 119.489 = 22.859, which is the χ2 statistic seen in the omnibus test.
Output: SLogR (Method = Enter)
Cox & Snell R2:
• The Cox & Snell R2 can be interpreted like R2 in linear regression but
cannot reach a maximum value of 1.
• In this study we can conclude that 19.4% of the variation in lung cancer is explained
by smoking status.
Nagelkerke R2:
• It is a modification of Cox & Snell R2.
• The Nagelkerke R2 can reach a maximum of 1.
Output: SLogR (Method = Enter)
Hosmer & Lemeshow test:
• It is a Pearson Chi-square test of the Ho that the observed and
predicted numbers of outcomes are not different.
• If the Chi-square statistic for its degrees of freedom is smaller
than the critical value, giving a p-value > 0.05, Ho
cannot be rejected and the model fits well.
• If it is the other way round, the model does not fit well.
• In this analysis, we found that the p-value was more than 0.05.
• That is, the test was not statistically significant (p>0.05).
• Therefore, we can conclude that the model fits well.
Output: SLogR (Method = Enter)
• This is the contingency table of the sample
Output: SLogR (Method = Enter)
Classification Table:
• It shows the sensitivity and specificity of the model's
predictions of the dependent variable.
• It is not very important for simple logistic regression, but very
important in multiple logistic regression.
Output: SLogR (Method = Enter)
We can now use this model to predict the odds of having lung cancer:
• The odds prediction equation is: Odds = exp (a + bx)
If the subject is a non-smoker (No = 0), then:
• Odds = exp [-1.39 + 2.02(0)] = exp (-1.39)
= 0.25
• The odds of lung cancer for a non-smoker are 0.25, i.e. about 1 to 4.
If the subject is a smoker (Yes = 1), then:
• Odds = exp [-1.39 + 2.02(1)]
= exp 0.63
= 1.88
• The odds of lung cancer for a smoker are 1.88.
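The odds prediction equation can be sketched in Python with the intercept (-1.39) and coefficient (2.02) quoted on the slide. The conversion from odds to probability, odds / (1 + odds), is a standard identity added here for illustration and does not appear on the slide.

```python
import math

A = -1.39   # intercept from the slide
B = 2.02    # coefficient for smoking (Yes = 1, No = 0)

def odds(x: int) -> float:
    """Predicted odds of lung cancer: exp(a + b * x)."""
    return math.exp(A + B * x)

def probability(x: int) -> float:
    """Convert predicted odds to a probability."""
    o = odds(x)
    return o / (1.0 + o)

print(round(odds(0), 2), round(odds(1), 2))
print(round(probability(0), 2), round(probability(1), 2))
print(round(odds(1) / odds(0), 2))  # ratio of the two odds = exp(B)
```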
Output: SLogR (Method = Enter)
What is odds ratio (OR)?
• OR is the ratio between two odds:
OR = (Odds of lung cancer in smokers) / (Odds of lung cancer in non-smokers)
Therefore, OR = 1.88 / 0.25 ≈ 7.50
• which is almost equal to Exp(B) in the “Variables in the Equation” table.
OR is thus equivalent to the exponential of the coefficient B (the log odds ratio) for lung cancer:
= exp 2.02
≈ 7.50
Output: SLogR (Method = Enter)
Interpreting Odds Ratio (OR):
• OR = exp 2.02
= 7.50
• The model predicts that the odds of having lung cancer
are 7.50 times higher for smokers compared to non-smokers.
• If lung cancer is a rare condition, the OR approximates the Relative Risk
(RR).
Summary: SLogR
Table: The association between smoking and lung cancer (N=106)
Variable              B (SE)        Wald a (df)   OR (95% CI)          p-value
Smoking (Yes vs No)   2.02 (0.45)   20.30 (1)     7.50 (3.12, 18.02)   <0.001*
* statistically significant at α=0.05
a Statistical test: Simple Logistic Regression
Alternatively:
Table: The association between smoking and lung cancer (N=106)
Smoking   B (SE)        Wald a (df)   OR (95% CI)          p-value
Yes       2.02 (0.45)   20.30 (1)     7.50 (3.12, 18.02)   <0.001*
No                                    1
* statistically significant at α=0.05
a Statistical test: Simple Logistic Regression
Research Question (3)
Research question:
• Is diabetes status associated with myocardial infarction?
• Independent variable: Diabetes status (three categories: Normal,
Impaired and Diabetes)
• Dependent variable: Myocardial infarction (binary: Yes / No)
Step 1: Specify Ho and Ha
a) Null hypothesis:
• Ho: There is no association between diabetes status and
myocardial infarction.
b) Alternate hypothesis:
• Ha: There is an association between diabetes status and myocardial
infarction.
Hands-on: SLogR
[Screenshots of the SPSS dialogs with numbered steps 1 to 18; not reproduced]
Output: SLogR (Preliminary)
• Here, we set “Normal” as the reference group.
Output: SLogR (Beginning Block)
• This table shows that DM2 (Diabetes) is statistically significant
compared to DM (reference group: Normal) (p=0.005).
Output: SLogR (Method = Enter)
Omnibus Test of Model Coefficients:
• Here, SPSS has added the “DM status” variable as a predictor.
• The model gives a Chi-square of 9.677 on 2 degrees of
freedom (df), significant at p = 0.008.
• This is a test of the Ho that adding the “DM status” variable to the
model does not significantly improve the prediction of myocardial infarction.
• Since Ho is rejected, we infer that adding the “DM status” variable to the model
improves the model.
• Since “DM status” has 3 categories and “Normal” is the
reference group, AT LEAST ONE comparison is expected to be significant:
either “Impaired DM” versus “Normal” or “DM” versus “Normal”.
Output: SLogR (Method = Enter)
-2 Log Likelihood (LL):
• Under model summary, we can see the -2 Log LL statistic is 135.416.
• This statistic measures how poorly the model predicts myocardial
infarction: the smaller the statistic, the better the model.
• Although SPSS does not give us this statistic for the model that had only
the intercept, you can derive it to be: 135.416 + 9.677 = 145.093.
• Adding the “DM status” variable reduced the -2LL statistic by 145.093 - 135.416 = 9.677, which is the χ2 statistic seen in the omnibus test.
Output: SLogR (Method = Enter)
Cox & Snell R2:
• The Cox & Snell R2 can be interpreted like R2 in linear regression but
cannot reach a maximum value of 1.
• In this study, we can conclude that 8.7% of the variation in myocardial infarction is
explained by DM status.
Nagelkerke R2:
• It is a modification of Cox & Snell R2.
• The Nagelkerke R2 can reach a maximum of 1.
Output: SLogR (Method = Enter)
Hosmer & Lemeshow test:
• It is a Pearson Chi-square test of the Ho that the observed and
predicted numbers of outcomes are not different.
• If the Chi-square statistic for its degrees of freedom is smaller
than the critical value, giving a p-value > 0.05, Ho
cannot be rejected and the model fits well.
• If it is the other way round, the model does not fit well.
• In this analysis, we found that the p-value was more than 0.05.
• That is, the test was not statistically significant (p>0.05).
• Therefore, we can conclude that the model fits well.
Output: SLogR (Method = Enter)
• This is the contingency table of the sample
Output: SLogR (Method = Enter)
Classification Table:
• It shows the sensitivity and specificity of the model's
predictions of the dependent variable.
• In this scenario:
i. Sensitivity: 39.1%
ii. Specificity: 85.0%
Output: SLogR (Method = Enter)
The “Variables in the Equation” output shows the regression equation:
• Comparing between “Normal” and “Impaired DM”:
• Log odds for Normal = b0 + b*(Impaired DM = 0)
= -0.981 + 0.629(0)
• Log odds for Impaired DM = b0 + b*(Impaired DM = 1)
= -0.981 + 0.629(1)
Output: SLogR (Method = Enter)
• Log odds Impaired DM versus Normal:
= Log odds Impaired DM - Log odds Normal
= (-0.981 + 0.629) - (-0.981 + 0)
= 0.629
Log odds Impaired DM versus Normal = 0.629
• OR = exp (0.629)
= 1.877
• Those with Impaired DM have about 2 times (OR = 1.88) the odds of myocardial infarction
compared to Normal.
• However, the association was NOT statistically significant (p=0.201).
Output: SLogR (Method = Enter)
The “Variables in the Equation” output shows the regression equation:
• Comparing between “Normal” and “Diabetes”:
• Log odds for Normal = b0 + b*(Diabetes = 0)
= -0.981 + 1.674(0)
• Log odds for Diabetes = b0 + b*(Diabetes = 1)
= -0.981 + 1.674(1)
Output: SLogR (Method = Enter)
• Log odds Diabetes versus Normal:
= Log odds Diabetes - Log odds Normal
= (-0.981 + 1.674) - (-0.981 + 0)
= 1.674
Log odds Diabetes versus Normal = 1.674
• OR = exp (1.674)
= 5.33
• Those with Diabetes have about 5 times (OR = 5.33) the odds of myocardial infarction compared
to Normal.
• The association was statistically significant (p=0.003).
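The comparisons against the reference group can be sketched as below, using the intercept (-0.981) and the two coefficients (0.629 and 1.674) quoted on the slides. Each category's log odds is the intercept plus its own coefficient, and the reference group "Normal" has a coefficient of 0; the dictionary layout and function names are illustrative.

```python
import math

B0 = -0.981  # intercept: log odds for the reference group (Normal)
COEFFICIENTS = {"Normal": 0.0, "Impaired DM": 0.629, "Diabetes": 1.674}

def log_odds(group: str) -> float:
    """Log odds of myocardial infarction for one diabetes category."""
    return B0 + COEFFICIENTS[group]

def odds_ratio_vs_normal(group: str) -> float:
    """OR versus Normal = exp(difference in log odds)."""
    return math.exp(log_odds(group) - log_odds("Normal"))

for group in COEFFICIENTS:
    print(group, round(odds_ratio_vs_normal(group), 2))
```

The intercept cancels in the difference, which is why each OR depends only on the category's own coefficient.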
Summary: SLogR
Table: The association between Diabetes status and Myocardial infarction
(N=106)
Diabetes status   B (SE)        Wald a (df)   OR (95% CI)          p-value
(overall)                       8.945 (2)
Normal                                        1                    ref.
Impaired DM       0.63 (0.49)   1.634 (1)     1.88 (0.72, 4.93)    0.201
Diabetes          1.67 (0.57)   8.772 (1)     5.33 (1.76, 16.15)   0.003*
* Statistically significant at α=0.05
a Statistical test: Simple Logistic Regression
Thank you for your attention
rodi@salam.uitm.edu.my
Basic Biostatistics Workshop
Using SPSS 2019
Lesson 9
Non Parametric Method Data Analyses
Dr. Mohamad Rodi Isa
MBBS (Malaya), DAP&E (SEAMEO-TROPMED, M’sia), MPH (Malaya), DrPH (Malaya)
Public Health Medicine, UiTM
Non-Parametric tests
• They are used when the characteristics of the population from which
a sample is drawn are unknown.
• They make no assumption about the distribution of the variable
→ Distribution free.
• They can deal with small sample sizes.
• They are less powerful than parametric tests: less likely to find a true
difference when it exists, because the values are replaced by
ranks.
• They are more robust when testing skewed data.
Non-Parametric tests
Advantages:
• They allow for the testing of hypotheses that are not statements
about population parameter values.
• The tests may be used when the form of the sampled population
is unknown.
• The procedures tend to be computationally easier and
consequently more quickly applied than parametric procedures.
• The procedures may be applied when the data being analyzed
consist merely of rankings or classifications.
Disadvantages:
• Using these procedures with data that can be handled by a parametric
procedure results in a waste of data.
• The application of some of the nonparametric tests may be
laborious for large samples.
Parametric vs Non-parametric
                         Parametric                  Non-parametric
Assumed distribution     Normal                      Any
Assumed variance         Homogenous                  Any
Typical data             Ratio or interval           Ordinal or nominal
Data set relationships   Independent                 Any
Usual central measure    Mean (SD)                   Median (IQR)
Benefits                 Can draw more conclusions   Simplicity, less affected by outliers
9.0 Non-Parametric Tests
                         Descriptions                          Statistical test
1. Single distribution   -                                     • Kolmogorov-Smirnov
2. Single sample         Compare with hypothesized median      • One-sample Wilcoxon Sign-Rank
3. Independent           Comparing 2 medians                   • Mann-Whitney U
                         Comparing ≥ 3 medians                 • Kruskal Wallis
                         Continuous → categories               • Median
4. Paired                Comparing 2 related median groups     • Sign test
                                                               • Wilcoxon Match-Pair (Wilcoxon Pair-Rank)
                         Comparing ≥ 3 related median groups   • Friedman
5. Correlation           Continuous versus Continuous          • Spearman-rank
                         Ordinal versus Ordinal                • Kendall's Tau
                         Nominal versus Nominal                • Phi-coefficient
Hypothesis testing for Non-Parametric
• Step 1 : Specify the null and alternate hypotheses.
• Step 2 : Choose the significance level α, one or two sided (tailed).
• Step 3 : Check assumptions.
• Step 4 : Choose the test statistic.
• Step 5 : Distribution of test statistic
• Step 6 : Decision rule
• Step 7 : Calculate test statistics to find the p-value,
- p > α → do not reject Ho → no sig. difference.
- p < α → reject Ho → sig. difference.
• Step 8 : Statistical and Problem conclusions.
9.1 Non-Parametric Tests
                         Descriptions   Statistical test
1. Single distribution   -              • Kolmogorov-Smirnov
9.1 One-Sample KS
• It is a non-parametric test of the equality of continuous,
one-dimensional probability distributions that can be used to
compare:
i. a sample with a reference probability distribution (one-sample KS
test); or
ii. two samples (two-sample KS test): this is used with independent
samples to determine whether the distributions of two groups are
equal (similar) or not.
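As a rough illustration of the one-sample case, the sketch below computes the KS statistic D, the largest gap between the sample's empirical distribution and a normal distribution. It assumes the reference normal uses the sample mean and SD, and the troponin values are hypothetical because the workshop data are not reproduced here. It does not compute a p-value; SPSS reports that directly.

```python
import math
from statistics import mean, stdev

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic_normal(sample) -> float:
    """Largest distance between the empirical CDF and the fitted normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # compare against the empirical CDF just after and just before x
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Hypothetical, right-skewed troponin values (assumption)
troponin = [0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.8, 1.1, 1.9, 4.2]
d = ks_statistic_normal(troponin)
print(round(d, 3))
```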
One-Sample Kolmogorov-smirnov
Research Question:
• Is troponin normally distributed?
• Variable: Troponin - a continuous variable.
a) Null Hypothesis:
• Ho: The data for troponin follow a specified distribution
(normally distributed).
b) Alternate hypothesis:
• Ha: The data for troponin DO NOT follow the specified distribution
(not normally distributed).
Hands-on: One-Sample KS (Method 1)
[Screenshots of the SPSS dialogs with numbered steps 1 to 7; not reproduced]
Output & Conclusion
• Since the p-value (<0.001) is less than α (0.05), we reject the Ho and
accept Ha.
• Therefore, we can conclude that troponin was not
normally distributed.
Hands-on: One-Sample Kolmogorov-Smirnov (Method 2)
[Screenshots of the SPSS dialogs with numbered steps 1 to 15; not reproduced]
Output & Conclusion
p<0.001
• Since the p-value is less than 0.05 (p<0.001) at α=0.05, we reject the Ho
and accept Ha.
• Therefore, we can conclude that troponin was not
normally distributed.
9.2 Non-Parametric Tests
                   Descriptions                       Statistical test
2. Single sample   Compare with hypothesized median   • One-sample Wilcoxon Sign-Rank
9.2 One-Sample Wilcoxon Sign-Rank
• 1-Sample Wilcoxon Sign-Rank test is a non-parametric test
alternative to 1-sample t-test when the data cannot be assumed to
be normally distributed.
• It is based on ranks and because of that, the location parameter
is median.
• It is used to determine whether the median of the sample is equal
to a known standard value (i.e. theoretical value).
One-Sample Wilcoxon Sign-Rank
Research Question:
• You want to determine whether the median level of Troponin is
1ng/mL.
• Test variable: Troponin - a continuous variable.
• Hypothesized median: 1ng/mL
a) Null Hypothesis:
• Ho: The median of troponin level is equal to 1ng/mL
b) Alternate hypothesis:
• Ha: The median of troponin level is NOT equal to 1ng/mL.
Hands-on: One-Sample Wilcoxon Sign-Rank
[Screenshots of the SPSS dialogs with numbered steps 1 to 13; not reproduced]
Output & Conclusion
p=0.527
• Since the p-value (0.527) is more than α (0.05), we DO NOT reject the
Ho.
• Therefore, we can conclude that the median of troponin is not significantly
different from (i.e. NOT FAR from) 1ng/mL.
9.3 Non-Parametric Tests
                 Descriptions              Statistical test
3. Independent   Comparing 2 medians       • Mann-Whitney U
                 Comparing ≥ 3 medians     • Kruskal Wallis
                 Continuous → categories   • Median
9.3.1 Mann-Whitney U
• It is a non-parametric alternative to the independent t-test.
• It is also known as the Wilcoxon Rank-sum test.
• It can be used to compare two population medians if the
underlying data are not normally distributed, or if the
measurements are ordinal.
• The rank sum test (Mann-Whitney U test) is performed on ranks
rather than the actual measurements.
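To illustrate what "performed on ranks" means, the sketch below pools two groups, ranks the pooled values (ties receive the average rank) and derives the U statistics from the rank sum of the first group. The cortisol values are hypothetical, and the p-value step is left to SPSS.

```python
def average_ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]

def mann_whitney_u(x, y):
    """Return (U1, U2) computed from the pooled ranks of two groups."""
    ranks = average_ranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    rank_sum_1 = sum(ranks[:n1])
    u1 = rank_sum_1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return u1, u2

# Hypothetical cortisol values (assumption, not the workshop data)
male = [9.8, 10.1, 9.5, 10.4]
female = [8.5, 7.9, 9.0, 8.8]
print(mann_whitney_u(male, female))
```

The smaller of the two U values is usually the one compared against tables or converted to a z score.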
Mann-Whitney U
Research Question:
• You want to determine whether the median of cortisol level is
different between male and female.
a) Null Hypothesis:
• Ho: There is no difference in the median of cortisol level between
male and female (Ho: θ1 = θ2).
b) Alternate hypothesis:
• Ha: There is a difference in the median of cortisol level between
male and female (Ha: θ1 ≠ θ2).
Hands-on: Mann-Whitney U
(Method 1)
[Screenshots of the SPSS dialogs with numbered steps 1 to 12; not reproduced]
Output & Conclusion
• Since the p-value (p=0.037) is less than α (0.05), we reject the Ho and
accept Ha.
• Therefore, we can conclude that there was a significant difference in
the median of cortisol between male and female.
• The median cortisol level in males is significantly higher than in
females [Median = 9.80 (IQR: 1.2) versus Median = 8.50 (IQR: 3.1), p=0.037].
Output & Conclusion
Table: The comparison of Cortisol level between male and female (N=15)
Variable   Gender   n   Median (IQR)   z a      p-value
Cortisol   Male     7   9.80 (1.2)     -2.087   0.037*
           Female   8   8.50 (3.1)
* statistically significant at α=0.05
a Statistical test: Mann-Whitney U
Hands-on: Mann-Whitney U
(Method 2)
[Screenshots of the SPSS dialogs with numbered steps 1 to 13; not reproduced]
Output & Conclusion
p=0.040
• Since the p-value is less than 0.05 (p=0.040) at α=0.05, we reject the Ho
and accept Ha.
• Therefore, we can conclude that there was a significant difference in
the median of cortisol between male and female.
Summary: Mann-Whitney U
Table: The comparison of Cortisol level between male and female (N=15)
Variable   Gender   n   Median (IQR)   Test statistic a   p-value
Cortisol   Male     7   9.80 (1.2)     10.00              0.040*
           Female   8   8.50 (3.1)
* statistically significant at α=0.05
a Statistical test: Mann-Whitney U
9.3.2 Kruskal-Wallis test
• It is a non-parametric alternative to ANOVA.
• It can be used to compare three or more groups if the underlying
data are not normally distributed, or if the measurements are
ordinal.
• The test is performed on ranks rather than actual
measurements.
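A minimal sketch of the Kruskal-Wallis H statistic is shown below: all observations are ranked together and the rank sums of the groups are compared. It omits the correction for ties that statistical packages normally apply, and the packed cell volume values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]

def kruskal_wallis_h(groups):
    """H statistic (no tie correction) for a list of groups."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical PCV values for three groups (assumption)
malay = [31.0, 25.0, 40.0, 44.0, 28.0]
chinese = [8.0, 7.5, 9.0, 8.5]
indian = [3.5, 2.0, 4.5, 3.0]
print(round(kruskal_wallis_h([malay, chinese, indian]), 3))
```

H is then compared with a Chi-square distribution with (number of groups - 1) degrees of freedom.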
Kruskal-Wallis test
Research question:
• Is there any difference in the median of pack cell volume (PCV)
between 3 groups of races (Malay, Chinese and Indian)?
a) Null hypothesis:
• Ho: The medians of pack cell volume (PCV) of the 3 groups of
races (Malay, Chinese and Indian) are the same.
• Ho: θ1 = θ2 = θ3.
b) Alternative hypothesis:
• Ha: There is at least one pair with a different median of pack
cell volume (PCV) among the 3 groups of races (Malay, Chinese
and Indian).
Hands-on: Kruskal Wallis test
(Method 1)
[Screenshots of the SPSS dialogs with numbered steps 1 to 12; not reproduced]
Output: Kruskal Wallis test
• Since the p-value (p=0.005) is less than α (0.05), we reject Ho and
accept Ha.
• Therefore, we conclude that there is at least one pair with a different
median of pack cell volume (PCV) among the 3 groups of races (Malay,
Chinese and Indian).
Which Pair?
• Since there is no post-hoc or pairwise comparison method in this
method, we need to analyse using the Mann-Whitney U test, pair by pair,
with an adjustment.
Hands-on: Mann-Whitney U test
[Malay (1) versus Chinese (2)]
[Screenshots of the SPSS dialogs with numbered steps 1 to 12; not reproduced]
Then:
- change the groups to (1, Malay) and (3, Indian); and
- change the groups to (2, Chinese) and (3, Indian)
Output: Kruskal Wallis test
[Output tables for Malay versus Chinese, Malay versus Indian and Chinese versus Indian; not reproduced]
• It was found that all pairs were
statistically significant (p<0.05)
at α=0.05.
• However, we need to make an adjustment by multiplying each p-value by the
number of pairwise comparisons (here, 3) to get the adjusted p-value (this is called the Bonferroni
correction).
Output: Kruskal Wallis test
• Therefore:
i. between Malay and Chinese: 0.014 x 3 = 0.042
ii. between Malay and Indian: 0.014 x 3 = 0.042
iii. between Chinese and Indian: 0.020 x 3 = 0.060
• The significant pairs were between:
i. Malay and Chinese (p=0.042); and
ii. Malay and Indian (p=0.042)
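The adjustment above can be sketched in a few lines of Python: each unadjusted p-value is multiplied by the number of pairwise comparisons and capped at 1. The three p-values are the ones quoted on the slide; the function name and dictionary layout are illustrative.

```python
def bonferroni(p_values: dict) -> dict:
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}

unadjusted = {
    "Malay vs Chinese": 0.014,
    "Malay vs Indian": 0.014,
    "Chinese vs Indian": 0.020,
}
adjusted = bonferroni(unadjusted)
significant = [pair for pair, p in adjusted.items() if p < 0.05]

for pair, p in adjusted.items():
    print(pair, round(p, 3))
print(significant)
```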
Output: Summary
Table: Median Pack Cell Volume by race (N=13)
Variable           Race      n   Median (IQR)   Test statistic a (df)   p-value
Pack Cell Volume   Malay     5   31.00 (19.0)   10.711 (2)              0.005*b
                   Chinese   4   8.00 (1.5)
                   Indian    4   3.50 (2.5)
* statistically significant at α=0.05
a Kruskal Wallis test
b The significant difference is between Malay and Indian (p=0.042) and Malay and Chinese
(p=0.042) by pairwise comparison (Bonferroni adjustment).
• The median Pack Cell Volume for Malay is higher than Chinese [31.00
(IQR: 19.0) versus 8.00 (IQR: 1.5), p=0.042] and the median Pack Cell
Volume for Malay is higher than Indian [31.00 (IQR: 19.0) versus 3.50
(IQR: 2.5), p=0.042].
Hands-on: Kruskal Wallis test
(Method 2)
[Screenshots of the SPSS dialogs with numbered steps 1 to 13; not reproduced]
Output: Kruskal Wallis Test
p=0.005
• Since the p-value (p=0.005) is less than α (0.05), we reject Ho and
accept Ha.
• Therefore, we can conclude that there is at least one pair with a different
median of pack cell volume (PCV) among the 3 groups of races (Malay,
Chinese and Indian).
Which pair? Malay – Chinese? Malay – Indian? Chinese – Indian?
Output: Kruskal Wallis Test
Drop-up menu:
• Select pairwise comparison
Output: Kruskal Wallis Test
• The only significantly different pair by pairwise comparison is Indian and Malay
(p=0.003).
• The other pairs were not statistically significant (p>0.05).
Output: Summary
Table: Median Pack Cell Volume by race (N=13)
Variable           Race      n   Median (IQR)   Test statistic a (df)   p-value
Pack Cell Volume   Malay     5   31.00 (19.0)   10.711 (2)              0.005*b
                   Chinese   4   8.00 (1.5)
                   Indian    4   3.50 (2.5)
* statistically significant at α=0.05
a Kruskal Wallis test
b The significant difference is between Malay and Indian (p=0.003)
by pairwise comparison
• The median Pack Cell Volume for Malay is higher than Indian [31.00
(IQR: 19.0) versus 3.50 (IQR: 2.5), p=0.003].
9.3.3 Median test
• It is a non-parametric procedure used to test the Ho that two
independent samples have been drawn from populations with
equal medians.
• It is almost the same as the chi-square test for independence;
however, we need to find the median of the dependent variable
first before we can divide the dependent variable into two
groups.
Median test
Research question:
• A researcher wants to determine whether the median endorphin
level differs between those who had heart disease and those with no heart disease.
a) Null hypothesis:
• Ho: There is no difference in the median endorphin level between
those who had heart disease and those with no heart disease.
b) Alternate hypothesis:
• Ha: There is a difference in the median endorphin level between
those who had heart disease and those with no heart disease.
Hands-on: Explore
First, we need to explore the dependent variable (Endorphin) to find the
median.
[Screenshots of the SPSS Explore dialog with numbered steps 1 to 5; not reproduced]
Output: Explore
• The median for Endorphin level is 8.5
• Therefore, we use 8.5 as the cut-off point.
Hands-on: Median Test
[Screenshots of the SPSS dialogs with numbered steps 1 to 14; not reproduced]
Output: Median Test
p=1.000
• Since the p-value (p=1.000) is more than α (0.05), we DO NOT reject
Ho.
• We can conclude that there is no significant difference in the median endorphin level
between those who had heart disease and those with no heart disease (median is at
8.5 µmol/L).
Output: Summary
Cross tabulation
Table: The cross-tabulation between heart disease and endorphin level (N=21)
Endorphin level   Heart disease: Yes   Heart disease: No   Total          Test statistic a (df)   p-value
                  Freq., n (%)         Freq., n (%)        Freq., n (%)
More than 8.5     5 (45.5)             6 (54.5)            11 (100.0)     0.043 (1)               1.000#
Less than 8.5     6 (60.0)             4 (40.0)            10 (100.0)
a Median test (at 8.50)
9.4 Non-Parametric Tests
            Descriptions                          Statistical test
4. Paired   Comparing 2 related median groups     • Sign test
                                                  • Wilcoxon Match-Pair (Wilcoxon Pair-Rank)
            Comparing ≥ 3 related median groups   • Friedman
9.4.1 Paired data - Sign test
• The “paired-sample sign test” is used to determine whether there is a
median difference between paired or matched observation.
• It can be considered as an alternative to the dependent t-test or
Wilcoxon signed-rank test when the distribution of difference between
paired observations is either not normal or asymmetrical.
• The sign test may be employed to test the Ho that the median difference
is 0..
• Assumptions:
i. The dependent variable should be continuous or ordinal.
ii. The independent variable should consist of two categorical “related
groups” or “matched pairs”.
iii. The paired observations for each participant need to be independent.
iv. The difference score (i.e. difference between the paired observations) is
from a continuous distribution.
Sign test – Paired data
Research question:
• Does dopamine level differ before and after surgery?
a) Null hypothesis:
• Ho: Pre-surgery and post-surgery median dopamine levels are the
same.
- Ho: Median difference = 0.
b) Alternate hypothesis:
• Ha: Pre-surgery and post-surgery median dopamine levels are
NOT the same.
- Ha: Median difference ≠ 0..
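The logic of the paired sign test can be sketched as follows: count how many paired differences are positive and how many are negative (ties are dropped), then ask how likely such an imbalance is if positive and negative signs were equally probable. The exact two-sided binomial p-value is used here, and the dopamine values are hypothetical.

```python
from math import comb

def sign_test(before, after):
    """Return (n_positive, n_negative, exact two-sided p-value)."""
    diffs = [b - a for b, a in zip(before, after) if b != a]
    n = len(diffs)
    n_pos = sum(d > 0 for d in diffs)
    n_neg = n - n_pos
    k = min(n_pos, n_neg)
    # probability of k or fewer signs of the rarer kind under Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_pos, n_neg, min(1.0, 2 * tail)

# Hypothetical pre- and post-surgery dopamine values (assumption)
pre = [9.5, 10.0, 8.5, 11.0, 9.0, 10.5, 9.5, 8.0]
post = [8.0, 9.0, 8.0, 9.5, 9.5, 9.0, 8.5, 7.5]
print(sign_test(pre, post))
```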
Hands-on: Sign test – Paired data
[Screenshots of the SPSS dialogs with numbered steps 1 to 12; not reproduced]
Output: Sign test – Paired data
p=0.002
• Since the p-value (p=0.002) is less than α (0.05), we reject Ho
and accept Ha.
• We can conclude that the median dopamine levels pre-surgery and
post-surgery are significantly different.
Output: Summary
• The median dopamine level before surgery was significantly higher
than after surgery [Median: 9.50 (IQR: 3) versus Median: 8.00 (IQR:
2), p=0.002].
Table: The comparison of dopamine level pre-surgery and post-surgery (N=16)

                          N    Median (IQR)   Test statistic (a)   p-value
Dopamine (pre-surgery)    16   9.50 (3.00)    1.000                0.002*
Dopamine (post-surgery)   16   8.00 (3.00)

* statistically significant at α=0.05
a Statistical test: Sign test - Paired data
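The logic of the paired-sample sign test can be sketched in a few lines of Python. This is only an illustrative sketch, not the SPSS procedure itself: the function name `sign_test` and the data values are hypothetical (they are NOT the workshop data set), and the p-value is the exact two-sided binomial probability with tied pairs dropped.

```python
from math import comb

def sign_test(before, after):
    """Two-sided exact sign test for paired data; returns (n_pos, n_neg, p)."""
    diffs = [b - a for b, a in zip(before, after)]
    n_pos = sum(d > 0 for d in diffs)   # pairs where 'before' is larger
    n_neg = sum(d < 0 for d in diffs)   # pairs where 'after' is larger
    n = n_pos + n_neg                   # tied pairs (difference 0) are dropped
    k = min(n_pos, n_neg)
    # Under Ho each non-tied difference is + or - with probability 0.5
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_pos, n_neg, min(1.0, 2 * tail)

# Hypothetical paired values, NOT the workshop data set
pre = [9, 10, 8, 11, 9, 12, 10, 9, 8, 11]
post = [8, 8, 8, 9, 8, 10, 9, 7, 9, 10]
n_pos, n_neg, p_value = sign_test(pre, post)
print(n_pos, n_neg, round(p_value, 4))
```

Only the signs of the differences are used, which is why the test needs no assumption about the shape of the distribution of differences.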
9.4.2 Wilcoxon signed-rank test
• It is a non-parametric alternative to the paired (dependent-samples)
t-test.
• It is also known as the Wilcoxon Matched-Pairs or Wilcoxon Paired-Rank
test.
• Primarily, it is used when the underlying population of
differences cannot be assumed to be normally distributed.
• It can also be used when the data are ordinal rather than continuous.
• Because it relies on ranks, this non-parametric test is less sensitive
to measurement error and to outlying values.
Wilcoxon signed-rank test
Research question:
• Does the Calcium level differ before and after pregnancy?
a) Null hypothesis:
• Ho: Pre-pregnancy and post-pregnancy median Calcium levels
are the same.
- Ho: Median difference = 0.
b) Alternate hypothesis:
• Ha: Pre-pregnancy and post-pregnancy median Calcium levels
are NOT the same.
- Ha: Median difference ≠ 0.
Hands-on: Wilcoxon signed-rank test
(Method 1)
[SPSS screenshots: steps 1 to 5 and steps 6 to 12]
Output: Wilcoxon signed-rank test
p=0.176
• Since the p-value (p=0.176) is more than α (0.05), we DO NOT reject
Ho.
• Therefore, there is insufficient evidence to conclude that the
pre-pregnancy and post-pregnancy median Calcium levels differ.
Output: Summary
• The median calcium level pre-pregnancy was higher than
post-pregnancy [Median: 10.30 (IQR: 3.0) versus Median: 9.80 (IQR: 2.7)].
However, the difference was not statistically significant (p=0.176).
Table: The comparison of calcium level pre-pregnancy and post-pregnancy (N=16)

                           N    Median (IQR)   Test statistic (a)   p-value
Calcium (pre-pregnancy)    16   10.30 (3.00)   31.000               0.176
Calcium (post-pregnancy)   16   9.80 (2.70)

a Statistical test: Wilcoxon Match-Pair test
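How the signed ranks are formed can be sketched as follows. This is a minimal illustration under stated assumptions, not the SPSS algorithm: the function name `wilcoxon_signed_rank` and the data are hypothetical, zero differences are dropped, and the two-sided p-value uses the large-sample normal approximation without a tie or continuity correction.

```python
from math import sqrt, erf

def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, two-sided p) using the normal approximation."""
    d = [a - b for a, b in zip(x, y) if a != b]       # drop zero differences
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                                      # average ranks for tied |d|
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    w_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)         # no tie correction here
    z = (min(w_plus, w_minus) - mean) / sd            # z is never positive
    p = 1 + erf(z / sqrt(2))                          # equals 2 * Phi(z)
    return w_plus, w_minus, p

# Hypothetical paired measurements, NOT the workshop data set
before = [125, 132, 118, 140, 121, 135, 128, 130]
after = [120, 135, 110, 133, 123, 125, 129, 118]
w_plus, w_minus, p_value = wilcoxon_signed_rank(before, after)
print(w_plus, w_minus, round(p_value, 3))
```

Unlike the sign test, the size of each difference matters here through its rank, which is why this test is usually the more powerful of the two when the differences are roughly symmetric.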
Hands-on: Wilcoxon signed-rank test
(Method 2)
[SPSS screenshots: steps 1 to 7]
Output: Wilcoxon signed-rank test
(Method 2)
Table: The comparison of calcium level pre-pregnancy and post-pregnancy (N=16)

                           N    Median (IQR)   Test statistic (a)   p-value
Calcium (pre-pregnancy)    16   10.30 (3.00)   31.000               0.176
Calcium (post-pregnancy)   16   9.80 (2.70)

a Statistical test: Wilcoxon Match-Pair test
9.4.3 Friedman Test
• The Friedman test is a non-parametric alternative to
the parametric one-way repeated measures ANOVA.
• In its use of ranks it is similar to the Kruskal–Wallis one-way
ANOVA by ranks.
• It is used to detect differences in treatments across multiple test
attempts.
• The procedure involves ranking each row (or block) together, then
considering the values of ranks by columns.
Friedman Test
Research question:
• A group of researchers conducts a study to compare three
models of stimulators by grading them. They want to
determine whether there is any difference in grade between the
stimulators (Model A, B and C).
a) Null hypothesis:
• Ho: The median grades of the three models are equal.
b) Alternative hypothesis:
• Ha: The median grades of the three models are not equal. At
least one pair differs in median grade.
Hands-on: Friedman Test (Method 1)
[SPSS screenshots: steps 1 to 7]
Output: Friedman Test
• Since the p-value (p=0.013) is less than α (0.05), we reject the Ho and
accept Ha.
• Therefore, we can conclude that at least one pair of groups differs
in median grade among the 3 models (A, B and C).
Which pair?
• Since there is no post-hoc or pairwise comparison option in this
method, we need to analyze using the Wilcoxon signed-rank test, pair by
pair, with an adjustment.
Hands-on: Wilcoxon signed-rank test
[SPSS screenshots: steps 1 to 7]
Output: Friedman Test
• It was found that two pairs were statistically significant (Model A versus
Model B and Model B versus Model C).
• However, we need to make an adjustment by multiplying each p-value by the
number of pairwise comparisons (3, here equal to the number of groups) to
get the adjusted p-value (this is called the Bonferroni correction).
• Therefore: - between Model A and Model B: 0.020 x 3 = 0.060
- between Model B and Model C: 0.021 x 3 = 0.063
• After Bonferroni adjustment, no pair was found to be significantly
different.
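The Bonferroni arithmetic above can be checked with a short Python sketch. The helper name `bonferroni` is hypothetical, and the second unadjusted p-value is taken as 0.021, which is what the stated product of 0.063 implies.

```python
def bonferroni(p_values, n_comparisons=None):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = n_comparisons or len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}

# Unadjusted p-values quoted on the slide for the two significant pairs
raw = {"A vs B": 0.020, "B vs C": 0.021}

# Three pairwise comparisons are possible among three models: A-B, A-C, B-C
adjusted = bonferroni(raw, n_comparisons=3)

for pair, p in adjusted.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(pair, round(p, 3), verdict)
```

Comparing the adjusted p-value with 0.05 is equivalent to comparing the unadjusted p-value with 0.05 / 3.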
Summary: Friedman Test
Descriptive part
Table: Median Grade between Stimulator model (N=9)

        N   Model A,       Model B,       Model C,       Chi-square (a)   p-value
            Median (IQR)   Median (IQR)   Median (IQR)   (df)
Grade   9   2.00 (1)       3.00 (1)       2.00 (1)       8.706 (2)        0.013*

a Statistical Test: Friedman test
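The ranking procedure described earlier (rank within each row, then sum the ranks by column) can be sketched in Python. This is an illustrative sketch only: the function name `friedman_chi_square` and the grades are hypothetical, the formula assumes no tied values within a row, and the closed-form p-value exp(-x/2) is valid only for 2 degrees of freedom (three groups).

```python
from math import exp

def friedman_chi_square(blocks):
    """Friedman statistic; rows = subjects (blocks), columns = treatments.
    Assumes no tied values within a row."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0] * k
    for row in blocks:
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):    # rank within the row
            rank_sums[j] += rank
    chi2 = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return rank_sums, chi2

# Hypothetical grades from 5 raters for Models A, B, C (NOT the workshop data)
grades = [[2, 5, 3], [1, 4, 2], [3, 5, 4], [2, 4, 1], [1, 5, 3]]
rank_sums, chi2 = friedman_chi_square(grades)
p_value = exp(-chi2 / 2)         # chi-square upper tail, valid only for df = 2
print(rank_sums, round(chi2, 3), round(p_value, 3))

# Same tail formula applied to the statistic in the summary table (8.706, df = 2)
slide_p = exp(-8.706 / 2)
print(round(slide_p, 3))
```

Applied to the chi-square value in the summary table, the tail formula gives about 0.013, in agreement with the reported p-value.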
Hands-on: Friedman Test (Method 2)
[SPSS screenshots: steps 1 to 6 and steps 7 to 14]
Output: Friedman Test
p=0.013
• Since the p-value (p=0.013) is less than α (0.05), we reject Ho and
accept Ha.
• Therefore, we can conclude that at least one pair differs in
median grade among the 3 models (A, B and C).
Which pair? A-B, A-C or B-C
Output: Friedman Test
Drop-up menu:
• Select: Pairwise Comparison
Output: Friedman Test
• The pair that differs significantly is Model A versus
Model C, by pairwise comparison (p=0.029).
Summary: Friedman Test
Descriptive part
• The median grade of Model A is less than the median grade of Model B
[2.00 (IQR: 1) versus 3.00 (IQR: 1)].
Table: Median Grade between the Stimulator Model (N=9)

        n   Model A          Model B          Model C          Test statistic (a)   P-value
            [Median (IQR)]   [Median (IQR)]   [Median (IQR)]   (df)
Grade   9   2.00 (1)         3.00 (1)         2.00 (1)         8.706 (2)            0.013* (b)

* statistically significant at α=0.05
a Friedman test
b The significant difference is between Stimulator Model A and Model C (p=0.029) by
pairwise comparison
9.5 Non-Parametric Tests

Descriptions                       Statistical test
5. Correlation
   Continuous versus Continuous    • Spearman-rank
   Ordinal versus Ordinal          • Kendall's Tau
   Nominal versus Nominal          • Phi-coefficient
9.5.1 Spearman rank Correlation
• It is a bivariate measure of correlation / association that is
employed with rank-order data.
• It represents the degree of relationship between two variables.
• The data for both variables are in a rank-order format.
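Because the coefficient works on rank-order data, it can be computed as a Pearson correlation of the ranks. The sketch below illustrates this under stated assumptions: the function names and the scores are hypothetical (NOT the workshop data set), and tied values receive the average of the ranks they span.

```python
from math import sqrt

def average_ranks(values):
    """Rank from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_rs(x, y):
    """Spearman's rs = Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical scores, NOT the workshop data set
depression = [4, 7, 10, 12, 15, 18, 21, 25]
anxiety = [5, 6, 12, 9, 14, 20, 17, 23]
rs = spearman_rs(depression, anxiety)
print(round(rs, 3))
```

With no ties this matches the shortcut formula rs = 1 - 6Σd² / [n(n² - 1)], where d is the difference between the two ranks of each subject.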
Spearman rank Correlation
Research question:
• You want to determine whether there is any correlation between
depression score and anxiety score.
a) Null hypothesis:
• Ho: There is no significant (linear) rank correlation between
depression score and anxiety score (Ho: rs = 0)
b) Alternative hypothesis:
• Ha: There is a significant (linear) rank correlation between
depression score and anxiety score. (Ha: rs ≠ 0)
Hands-on: checking for normality
(depression and anxiety)
[SPSS screenshots: steps 1 to 7]
Output: Checking for normality
• The p-values of both variables (depression and anxiety) were found to be
statistically significant (p<0.05).
• Therefore, we can conclude that both sets of data (depression and anxiety)
were not normally distributed.
Hands-on: Scatter plot
[SPSS screenshots: steps 1 to 5 and steps 6 to 8]
Output: Scatter plot
• From the scatter plot, we have a rough idea that there could be a
positive rank correlation between depression and anxiety scores.
• Is it significant?
Hands-on: Spearman rank Correlation
[SPSS screenshots: steps 1 to 6]
Output: Spearman rank Correlation
• rs = 0.888
• Since the p-value (<0.001) is less than α (0.05), we reject the Ho and
accept the Ha.
• The correlation is positive and the strength is strong.
• Statistical conclusion: There is a significant strong positive rank
correlation between depression and anxiety scores (p<0.001).
• Problem conclusion: There is a strong positive rank correlation between
depression and anxiety scores (rs=0.888).
• We can conclude that when the depression score increases, the anxiety
score also tends to increase.
9.5.2 Kendall Rank correlation coefficient
• It is a statistic used to measure the strength and direction of
association that exists between two variables measured on at
least an ordinal scale.
• A “tau” test is a non-parametric hypothesis test for statistical
dependence based on the tau-coefficient.
• Three types:
i. Tau-a - tests the strength of association in a cross-tabulation;
both variables have to be ordinal. Tau-a does not make any
adjustment for ties.
ii. Tau-b - unlike Tau-a, it makes an adjustment for ties.
iii. Tau-c - is more suitable than Tau-b for the analysis of data based
on non-square (i.e. rectangular) contingency tables.
Kendall Rank correlation coefficient
Example:
• The correlation between examination grade and time spent
revising (i.e. where there were six examination grades - A, B, C,
D, E and F) and revision time (less than 5 hours; 5 - 9 hours;
10 - 14 hours; 15 - 19 hours; and 20 hours or more).
• The correlation between customer satisfaction (dissatisfied,
satisfied and very satisfied) and delivery time (measured in
days).
Assumptions:
i. The two variables should be measured on an ordinal or
continuous scale.
ii. Kendall's tau-b determines whether there is a monotonic
relationship between two variables.
Kendall Rank correlation coefficient
Research question:
• Is there any correlation between educational status and salary?
• x1: educational status: Primary; Secondary; and Tertiary
• x2: salary - less than RM999; RM1,000 - RM4,999; RM5,000 - RM9,999;
and more than RM10,000.
Null Hypothesis:
• Ho: There is no correlation between educational status and salary
Alternate Hypothesis:
• Ha: There is a correlation between educational status and
salary.
Hands-on: Kendall tau-b
[SPSS screenshots: steps 1 to 6 and steps 7 to 9]
Hands-on: Kendall tau-b
• Since the p-value (p=0.006) is less than α (0.05), we reject Ho and
accept Ha.
• Therefore we can conclude that there is a significant correlation
between educational status and salary.
• There is a weak positive correlation between educational status and
salary (tau-b: 0.219).
Output: Kendall tau-b
Table: The correlation between Educational level and salary

                         Educational level
Salary:                  No Formal   Primary   Secondary   Tertiary   Value (a)   p-value
- Less than RM999        13          16        10          8          0.219       0.006*
- RM1,000 - RM4,999      3           7         7           7
- RM5,000 - RM9,999      2           4         13          4
- More than RM10,000     1           3         4           4

* Statistically significant at α=0.05
a Statistical test: Kendall's Tau-b
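Tau-b can be computed directly from the cell counts of an ordered cross-tabulation by counting concordant and discordant pairs and adjusting for ties. The Python sketch below is illustrative only (the function name `kendall_tau_b` is hypothetical); it uses the sixteen counts from the table above, each inner list being one group of four counts as printed. Which variable is taken as rows is an assumption, but transposing the table does not change tau-b.

```python
from math import sqrt

def kendall_tau_b(table):
    """Kendall's tau-b for an ordered r x c table of counts."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):           # cells in lower rows
                for l in range(n_cols):
                    if l > j:                        # higher on both variables
                        concordant += table[i][j] * table[k][l]
                    elif l < j:                      # higher on one, lower on other
                        discordant += table[i][j] * table[k][l]
    n = sum(map(sum, table))
    pairs = n * (n - 1) / 2
    row_ties = sum(t * (t - 1) / 2 for t in map(sum, table))
    col_ties = sum(t * (t - 1) / 2 for t in map(sum, zip(*table)))
    return (concordant - discordant) / sqrt((pairs - row_ties) * (pairs - col_ties))

# Counts as printed in the table (one printed group of four per inner list)
counts = [
    [13, 3, 2, 1],
    [16, 7, 4, 3],
    [10, 7, 13, 4],
    [8, 7, 4, 4],
]
tau_b = kendall_tau_b(counts)
print(round(tau_b, 3))
```

With these counts the sketch gives a value of about 0.219, in line with the coefficient reported in the table.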
9.5.3 Phi-coefficient
• It is a non-parametric test of relationship that operates on two
dichotomous variables.
• It intersects the variables across a 2 x 2 matrix to estimate whether
there is a non-random pattern across the four cells in the 2 x 2
matrix.
• The signs [positive (+ve) or negative (-ve)] are relevant:
i. A positive phi-coefficient indicates that most of the data are
in the diagonal cells.
ii. A negative phi-coefficient indicates that most of the data are
in the off-diagonal cells.
• The main thing to consider is the strength of the relationship
between the two variables; look at the 2 x 2 matrix to determine
what it means.
Phi-coefficient
Research question:
• Is there any correlation between smoking and heart disease?
a) Null hypothesis:
• Ho: There is no correlation between smoking and heart disease
b) Alternate hypothesis:
• Ha: There is a correlation between smoking and heart disease.
Hands-on: Phi-coefficient
[SPSS screenshots: steps 1 to 6 and steps 7 to 9]
Output: Phi-coefficient
• Since the p-value (p<0.001) is less than α (0.05), we reject Ho and
accept Ha.
• Therefore, we can conclude that there is a significant correlation
between smoking and heart disease.
• There is a weak positive correlation between smoking and heart
disease.
Summary: Phi-coefficient
Table: The correlation between smoking and heart disease (N=106)

                   Heart disease
Smoking status     Yes    No     Phi-coefficient (a)   p-value
Yes                30     16     0.548                 <0.001*
No                 12     48

* Statistically significant at α=0.05
a Statistical test: Phi-coefficient
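For a 2 x 2 table [[a, b], [c, d]], the phi-coefficient is (ad - bc) divided by the square root of the product of the four marginal totals. The Python sketch below applies this standard formula to the four counts printed in the table (the function name `phi_coefficient` is hypothetical). Note that with the counts as printed the formula gives roughly 0.46 rather than the tabulated 0.548, so either a count or the coefficient may have been mistyped on the slide; the original SPSS output would be needed to settle this.

```python
from math import sqrt

def phi_coefficient(a, b, c, d):
    """Phi for the 2 x 2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Counts as printed: smokers 30 / 16, non-smokers 12 / 48 (N = 106)
phi = phi_coefficient(30, 16, 12, 48)

# Pearson chi-square without continuity correction equals N * phi^2
chi2 = 106 * phi ** 2

print(round(phi, 3), round(chi2, 2))
```

The sign is positive because most observations fall in the diagonal cells (30 and 48), as described in the introduction to this section.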
Thank you for your attention
rodi@salam.uitm.edu.my