Dr. Mona Hassan Ahmed Hassan
Prof. Biostatistics
What to do before sitting down at the PC?
Statistical software
How to generate and interpret results?
Data Coding
Transformation of qualitative information into numbers or symbols.
Data Preparation
Either the information is transferred from the original record to a “coding sheet”
[Figure: blank coding sheet with Ser., Column, and Code headings and rows for Age, Sex, MS (marital status), and Educ.]
Coding form

Item                               Answer or options                                     Code
ID                                 1                                                     1
1. Date of interview               10/1/2008                                             10/01/2008
2. What is your date of birth?     25/8/1986                                             25/08/1986
3. What sex are you?               Male (m), Female (f)                                  f
4. What is your marital status?    Single (1), Married (2), Widowed (3), Divorced (4)    1
5. What is your height (cm)?       160                                                   160
6. What is your weight (kg)?       58                                                    58
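The coding step shown on the form can be sketched in Python. This is only an illustration: the dictionary keys, the function names, and the codebook layout are assumptions made for the example, while the codes themselves (m/f, 1 to 4, zero-padded dates) are the ones printed on the form.

```python
from datetime import datetime

# Codebook taken from the form; the dictionary layout itself is an assumption.
SEX_CODES = {"male": "m", "female": "f"}
MARITAL_CODES = {"single": 1, "married": 2, "widowed": 3, "divorced": 4}


def code_date(text):
    """Normalise a day/month/year answer to zero-padded DD/MM/YYYY."""
    return datetime.strptime(text, "%d/%m/%Y").strftime("%d/%m/%Y")


def code_response(raw):
    """Turn one completed questionnaire (raw answers) into a coded record."""
    return {
        "id": int(raw["id"]),
        "interview_date": code_date(raw["interview_date"]),
        "birth_date": code_date(raw["birth_date"]),
        "sex": SEX_CODES[raw["sex"].strip().lower()],
        "marital_status": MARITAL_CODES[raw["marital_status"].strip().lower()],
        "height_cm": int(raw["height_cm"]),
        "weight_kg": int(raw["weight_kg"]),
    }


# The respondent from the form above (hypothetical field names).
raw_answers = {
    "id": "1",
    "interview_date": "10/1/2008",
    "birth_date": "25/8/1986",
    "sex": "Female",
    "marital_status": "Single",
    "height_cm": "160",
    "weight_kg": "58",
}
coded = code_response(raw_answers)
print(coded)
```

An answer that is not in the codebook raises a KeyError, which is one simple way to catch a coding problem before data entry.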
Coding by more than one person
- Precise instructions should be developed for coders
- Coders must be trained
- Check for inter-coder reliability
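One common way to check inter-coder reliability is Cohen's kappa, which corrects the raw agreement between two coders for the agreement expected by chance. The slides do not name a specific statistic, so kappa is an assumption here, and the two lists of codes are made-up illustrative data.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who coded the same items."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of items")
    n = len(codes_a)
    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: based on how often each coder used each code.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)


# Illustrative marital-status codes assigned by two coders to ten forms.
coder_1 = [1, 2, 2, 1, 3, 4, 1, 2, 1, 1]
coder_2 = [1, 2, 1, 1, 3, 4, 1, 2, 2, 1]
kappa = cohen_kappa(coder_1, coder_2)
print(round(kappa, 4))  # 0.6875
```

Here the coders agree on 8 of 10 forms (0.80), chance agreement is 0.36, and kappa is (0.80 - 0.36) / (1 - 0.36) = 0.6875.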
Sorting of the questionnaires
1-100, 101-200
Describing the Sample
- Measures of central tendency and variability.
- The appropriate measure of central tendency and variability will depend on the variable's level of measurement and the shape of the distribution.

Scales of measurement
Nominal, Ordinal, Interval, Ratio
Scales of Measurement (example: three runners)

Scale      What is recorded                         Ali          Samy         Ramy
Nominal    Symbols assigned to runners              (in figure)  (in figure)  (in figure)
Ordinal    Rank order of winners                    3rd place    2nd place    1st place
Interval   Performance rating on a 0 to 10 scale    3            7            9
Ratio      Time to finish, in seconds               15.2         14.1         13.4
Scales of Measurement: characteristics, examples, and permissible statistics

Nominal
  Basic characteristics: Numbers identify and classify objects
  Common examples: Patient number, ICD code, blood group
  Descriptive statistics: Percentages, mode
  Inferential statistics: Chi-square, binomial test

Ordinal
  Basic characteristics: Numbers indicate the relative positions of objects but not the magnitude of differences between them
  Common examples: Preference rankings, social class
  Descriptive statistics: Percentile, median
  Inferential statistics: Rank-order correlation, Friedman ANOVA

Interval
  Basic characteristics: Differences between objects can be compared; zero point is arbitrary
  Common examples: Temperature, attitude, opinion, IQ
  Descriptive statistics: Range, mean, standard deviation
  Inferential statistics: Product-moment correlation, t tests, regression

Ratio
  Basic characteristics: Zero point is fixed; ratios of scale values can be compared
  Common examples: Length, weight, income
  Descriptive statistics: Geometric mean, harmonic mean, coefficient of variation
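The link between scale of measurement and permissible descriptive statistics can be sketched as a small helper. The function name and the exact choice of one or two statistics per scale are assumptions for illustration, not an exhaustive rule.

```python
import statistics


def describe(values, scale):
    """Return descriptive statistics that suit the given scale of measurement."""
    if scale == "nominal":
        return {"mode": statistics.mode(values)}
    if scale == "ordinal":
        return {"median": statistics.median(values)}
    if scale in ("interval", "ratio"):
        summary = {
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values),  # sample standard deviation
        }
        if scale == "ratio":
            # Ratios are meaningful only with a fixed zero point.
            summary["cv_percent"] = 100 * summary["sd"] / summary["mean"]
        return summary
    raise ValueError(f"unknown scale: {scale}")


print(describe(["A", "O", "O", "B", "O"], "nominal"))  # blood groups
print(describe([3, 2, 1], "ordinal"))                  # finishing ranks
print(describe([3, 7, 9], "interval"))                 # ratings on a 0 to 10 scale
print(describe([15.2, 14.1, 13.4], "ratio"))           # finish times in seconds
```

The coefficient of variation is reported only for the ratio scale, because dividing the standard deviation by the mean assumes a true zero.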
Shapes of Distribution
[Figure: symmetric (normal) curve; horizontal axis from 40 to 100, vertical axis from 0 to 6; mean, median, and mode coincide.]
68% within mean ± 1 SD
95% within mean ± 2 SD
99.7% within mean ± 3 SD
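For a normal distribution these coverage figures follow from the error function: the share of values within k standard deviations of the mean is erf(k / sqrt(2)). A minimal check in Python (the function name is an assumption):

```python
import math


def coverage_within(k):
    """Share of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"mean ± {k} SD: {100 * coverage_within(k):.1f}%")
# mean ± 1 SD: 68.3%
# mean ± 2 SD: 95.4%
# mean ± 3 SD: 99.7%
```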
Right-skewed distribution
[Figure: order along the axis, left to right: Mode, Median, Mean]
If Mean > Median → positive or right skewness (long right tail).
It arises when the mean is increased by some unusually high values.
Left-skewed distribution
[Figure: order along the axis, left to right: Mean, Median, Mode]
If Mean < Median → negative or left skewness (long left tail).
Negative skewness occurs when the mean is reduced by some extremely low values.
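The mean-versus-median rule can be applied directly to data. The sketch below uses made-up values; the function name and the wording of the returned labels are assumptions.

```python
import statistics


def skew_direction(values):
    """Compare mean and median to indicate the direction of skewness."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right (positive)"
    if mean < median:
        return "left (negative)"
    return "none indicated"


# One unusually high value pulls the mean above the median.
incomes = [20, 22, 23, 25, 26, 28, 90]
# One extremely low value pulls the mean below the median.
scores = [10, 72, 74, 75, 77, 78, 80]

print(skew_direction(incomes))  # right (positive)
print(skew_direction(scores))   # left (negative)
```

This comparison is a quick screening rule only; a histogram is still worth inspecting before choosing the summary measures.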
Inference
Developing and Testing a Hypothesis
- Differences in frequency distributions of nominal-level variables: chi-square
- Associations or correlations between variables: bivariate correlations
- Differences between groups with respect to the distribution of interval/ratio-level data: t-tests
The most popular statistical packages
1. SAS
2. SPSS
3. STATA
4. Epi Info
5. SUDAAN
6. S-PLUS
7. MedCalc
8. Excel
9. Statistica
10. Minitab
Sample size
Using Epitable (under Epi Info) to calculate sample size
SPSS
Statistical Package for the Social Sciences
Creating a Data File in SPSS
- ID
- Gender: Male, Female
- Date of Birth
- Educational Level (years)
- Employment Category: 1 Clerical, 2 Custodial, 3 Manager
- Current Salary ($)
- Beginning Salary ($)
- Months since Hire
- Previous Experience (months)
- Minority Classification: 0 No, 1 Yes
Data Entry
- Excel
- Access
- Word
- Any statistical software

Data cleaning
General data check:
- Printout
- Quick data check (frequency tables)
  1. Wild codes check (invalid codes)
  2. Completeness check: ensure that all cases collected are represented in the data file without duplication
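Both checks can be automated once the data are in a file. The sketch below assumes records held as Python dictionaries and a codebook of valid codes built from the variable list above; the function names and the sample records are hypothetical.

```python
from collections import Counter

# Valid codes per variable, following the variable definitions above.
VALID_CODES = {"gender": {"m", "f"}, "jobcat": {1, 2, 3}, "minority": {0, 1}}


def find_wild_codes(records, valid_codes):
    """List (id, variable, value) for every value outside the codebook."""
    problems = []
    for rec in records:
        for var, allowed in valid_codes.items():
            if rec.get(var) not in allowed:
                problems.append((rec["id"], var, rec.get(var)))
    return problems


def completeness_check(records, expected_ids):
    """Return IDs missing from the file and IDs entered more than once."""
    counts = Counter(rec["id"] for rec in records)
    missing = sorted(set(expected_ids) - set(counts))
    duplicated = sorted(i for i, c in counts.items() if c > 1)
    return missing, duplicated


# Made-up data file: ID 2 entered twice, IDs 3 and 5 missing, ID 4 miscoded.
records = [
    {"id": 1, "gender": "m", "jobcat": 3, "minority": 0},
    {"id": 2, "gender": "f", "jobcat": 1, "minority": 0},
    {"id": 2, "gender": "f", "jobcat": 1, "minority": 0},
    {"id": 4, "gender": "x", "jobcat": 7, "minority": 1},
]

wild = find_wild_codes(records, VALID_CODES)
missing, duplicated = completeness_check(records, range(1, 6))
print(wild)        # [(4, 'gender', 'x'), (4, 'jobcat', 7)]
print(missing)     # [3, 5]
print(duplicated)  # [2]
```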
Simple frequency
Data check

jobcat Employment Category
                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid   1 Clerical    363         76.6      76.6            76.6
        2 Custodial   27          5.7       5.7             82.3
        3 Manager     84          17.7      17.7            100.0
        Total         474         100.0     100.0
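The percentages in the frequency table can be reproduced from the counts alone. A minimal sketch, with the function name chosen for illustration:

```python
def frequency_table(counts):
    """Return rows of (label, frequency, percent, cumulative percent) and the total."""
    total = sum(counts.values())
    rows = []
    running = 0
    for label, n in counts.items():
        running += n
        rows.append(
            (label, n, round(100 * n / total, 1), round(100 * running / total, 1))
        )
    return rows, total


# Counts from the jobcat frequency table.
jobcat_counts = {"1 Clerical": 363, "2 Custodial": 27, "3 Manager": 84}
rows, total = frequency_table(jobcat_counts)
for row in rows:
    print(row)
print("Total", total)
# ('1 Clerical', 363, 76.6, 76.6)
# ('2 Custodial', 27, 5.7, 82.3)
# ('3 Manager', 84, 17.7, 100.0)
# Total 474
```

A code that should not exist (a wild code) would show up here as an extra row, and a total other than 474 would signal missing or duplicated cases.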
Perform Descriptive Statistics

Descriptive Statistics
                            N     Range   Minimum   Maximum   Mean    Std. Error   Std. Deviation   Variance
Educational Level (years)   474   13      8         21        13.49   .133         2.885            8.322
Months since Hire           474   35      63        98        81.11   .462         10.061           101.223
Valid N (listwise)          474
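The columns of this table are related by simple formulas: range = maximum - minimum, variance = SD squared, and standard error of the mean = SD / sqrt(N). The sketch below rechecks the printed values; small differences in the last decimal arise because the table's SD is itself rounded. The function name is an assumption.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean from the standard deviation and sample size."""
    return sd / math.sqrt(n)


n = 474
# (minimum, maximum, standard deviation) as printed in the table.
educ = (8, 21, 2.885)
months = (63, 98, 10.061)

for name, (lo, hi, sd) in (("educ", educ), ("months since hire", months)):
    print(
        name,
        "range:", hi - lo,
        "SE:", round(standard_error(sd, n), 3),
        "variance:", round(sd ** 2, 3),
    )
```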
Conduct Simple Correlations and Regression

Correlation

Correlations
                                                       educ Educational   salbegin Beginning
                                                       Level (years)      Salary
salbegin Beginning Salary        Pearson Correlation   .633**             1
                                 Sig. (2-tailed)       .000
                                 N                     474                474
educ Educational Level (years)   Pearson Correlation   1                  .633**
                                 Sig. (2-tailed)                          .000
                                 N                     474                474
**. Correlation is significant at the 0.01 level (2-tailed).
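The Pearson coefficient and its significance test can be written out in a few lines. The sketch below is an illustration, not the SPSS routine: the function names are assumptions and the small data set is made up. The test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Made-up illustration: years of education and starting salary (thousands).
years = [8, 12, 12, 15, 16, 19]
salary = [10, 14, 15, 18, 17, 30]
print(round(pearson_r(years, salary), 3))

# Using the values reported above: r = .633, N = 474.
print(round(t_for_r(0.633, 474), 2))
```

With the reported r and N the statistic comes out close to 17.8, far beyond any conventional critical value, which is consistent with the printed significance of .000.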
Regression

Coefficients(a)
                                     Unstandardized Coefficients   Standardized Coefficients                     95% Confidence Interval for B
Model 1                              B           Std. Error        Beta                        t        Sig.    Lower Bound   Upper Bound
(Constant)                           -6290.97    1340.920                                      -4.692   .000    -8925.878     -3656.056
educ Educational Level (years)       1727.528    97.197            .633                        17.773   .000    1536.536      1918.521
a. Dependent Variable: salbegin Beginning Salary
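The coefficients table can be read as the fitted line: beginning salary = -6290.97 + 1727.528 × years of education. The sketch below uses the printed coefficients to predict salaries and to recheck the t value and confidence interval for the slope. The critical value 1.965 is an assumed approximation of the two-sided 5% t value for 472 degrees of freedom, and the function name is hypothetical.

```python
# Values copied from the coefficients table.
INTERCEPT = -6290.97
SLOPE = 1727.528
SLOPE_SE = 97.197
# Assumption: approximate two-sided 5% critical t value for 472 df.
T_CRIT = 1.965


def predict_salbegin(educ_years):
    """Predicted beginning salary for a given number of years of education."""
    return INTERCEPT + SLOPE * educ_years


slope_t = SLOPE / SLOPE_SE
ci_lower = SLOPE - T_CRIT * SLOPE_SE
ci_upper = SLOPE + T_CRIT * SLOPE_SE

print(round(predict_salbegin(12), 2))  # 12 years of education
print(round(predict_salbegin(16), 2))  # 16 years of education
print(round(slope_t, 3))
print(round(ci_lower, 2), round(ci_upper, 2))
```

Each extra year of education is associated with about $1,728 more in beginning salary, and the interval does not include zero.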
Scatter
t-test (Two independent groups)

Group Statistics
                                 Gender     N     Mean    Std. Deviation   Std. Error Mean
educ Educational Level (years)   m Male     258   14.43   2.979            .185
                                 f Female   216   12.37   2.319            .158

Independent Samples Test: educ Educational Level (years)
                              Levene's Test for
                              Equality of Variances   t-test for Equality of Means
                                                                                        Mean         Std. Error   95% Confidence Interval of the Difference
                              F        Sig.           t       df        Sig. (2-tailed) Difference   Difference   Lower     Upper
Equal variances assumed       17.884   .000           8.276   472       .000            2.060        .249         1.571     2.549
Equal variances not assumed                           8.458   469.595   .000            2.060        .244         1.581     2.538
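Both rows of the independent samples test can be approximated from the group statistics alone. The sketch below computes the pooled-variance t (equal variances assumed) and the Welch t with Satterthwaite degrees of freedom (equal variances not assumed). Function names are assumptions, and results differ slightly from the SPSS output because the printed means and SDs are rounded.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """t statistic and df assuming equal variances."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2


def welch_t(m1, s1, n1, m2, s2, n2):
    """t statistic and Satterthwaite df without assuming equal variances."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (m1 - m2) / se, df


# Group statistics for educational level: males, then females.
male = (14.43, 2.979, 258)
female = (12.37, 2.319, 216)

t_equal, df_equal = pooled_t(*male, *female)
t_unequal, df_unequal = welch_t(*male, *female)
print(round(t_equal, 2), df_equal)
print(round(t_unequal, 2), round(df_unequal, 1))
```

Because Levene's test is significant here (Sig. .000), the "equal variances not assumed" row is the one usually read; both rows lead to the same conclusion in this example.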
Paired t-test (Dependent groups)

Paired Samples Statistics
Pair 1                      Mean         N     Std. Deviation   Std. Error Mean
salary Current Salary       $34,419.57   474   $17,075.661      $784.311
salbegin Beginning Salary   $17,016.09   474   $7,870.638       $361.510

Paired Samples Correlations
                                                            N     Correlation   Sig.
Pair 1   salary Current Salary & salbegin Beginning Salary  474   .880          .000

Paired Samples Test
                                                            Paired Differences
                                                            Mean          Std. Deviation   Std. Error Mean   t        df    Sig. (2-tailed)
Pair 1   salary Current Salary - salbegin Beginning Salary  $17,403.481   $10,814.620      $496.7            35.036   473   .000
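The paired t statistic is the mean of the differences divided by its standard error, SD of the differences / sqrt(N), with N - 1 degrees of freedom. A minimal recheck from the printed summary values (the function name is an assumption):

```python
import math


def paired_t(mean_diff, sd_diff, n):
    """Return (t, df, standard error) for a paired-samples t test."""
    se = sd_diff / math.sqrt(n)
    return mean_diff / se, n - 1, se


# Paired differences (current salary minus beginning salary) from the table.
t_paired, df_paired, se_paired = paired_t(17403.481, 10814.620, 474)
print(round(t_paired, 3), df_paired, round(se_paired, 1))
```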
Chi-Square test

jobcat Employment Category * gender Gender Crosstabulation
                                                    gender Gender
jobcat Employment Category                          f Female   m Male    Total
1 Clerical     Count                                206        157       363
               % within gender Gender               95.4%      60.9%     76.6%
2 Custodial    Count                                0          27        27
               % within gender Gender               .0%        10.5%     5.7%
3 Manager      Count                                10         74        84
               % within gender Gender               4.6%       28.7%     17.7%
Total          Count                                216        258       474
               % within gender Gender               100.0%     100.0%    100.0%

Chi-Square Tests
                     Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square   79.277a    2    .000
Likelihood Ratio     95.463     2    .000
N of Valid Cases     474
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 12.30.
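The Pearson chi-square value can be recomputed from the observed counts: each expected count is (row total × column total) / N, and the statistic sums (observed - expected)² / expected over all cells. A minimal sketch, with the function name chosen for illustration:

```python
def chi_square(observed):
    """Return (chi-square, df, minimum expected count) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    min_expected = float("inf")
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            stat += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, min_expected


# Rows: Clerical, Custodial, Manager; columns: Female, Male.
crosstab = [[206, 157], [0, 27], [10, 74]]
chi2, chi2_df, min_exp = chi_square(crosstab)
print(round(chi2, 3), chi2_df, round(min_exp, 2))
```

The minimum expected count is above 5, so the usual condition for the chi-square approximation is met, as footnote a of the output states.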