Dr. Mona Hassan Ahmed Hassan
Prof. Biostatistics

Outline
- What to do before sitting down at the PC?
- Statistical Software
- How to generate and interpret results?

Data Coding
Transformation of qualitative information into numbers or symbols.

Data Preparation
Either the information is transferred from the original record to a "coding sheet"
[Coding sheet layout: columns "Ser." and "Code"; rows Age, Sex, MS, Educ.]

Coding form
ID: 1
1. Date of interview: 10/1/2008
2. What is your date of birth? 25/8/1986
3. What sex are you? Male (m) / Female (f)
4. What is your marital status? Single (1) / Married (2) / Widowed (3) / Divorced (4)
5. What is your height (cm)? 160
6. What is your weight (kg)? 58

Code: 1 | 10/01/2008 | 25/08/1986 | f | 1 | 160 | 58

Coding by more than one person
- Precise instructions should be developed for coders.
- Coders must be trained.
- Check for inter-coder reliability.

Sorting of the questionnaires
1-100, 101-200

Describing the Sample
Measures of central tendency and variability. The appropriate measure of central tendency and variability will depend upon the variable's level of measurement and the shape of the distribution.

Scales of measurement
Nominal, Ordinal, Interval, Ratio

Scales of Measurement (example: three runners)

Scale     What is recorded                           Runner 1   Runner 2   Runner 3
Nominal   Symbols assigned to runners                Ali        Samy       Ramy
Ordinal   Rank order of winners                      3rd place  2nd place  1st place
Interval  Performance rating on a 0 to 10 scale      3          7          9
Ratio     Time to finish, in seconds                 15.2       14.1       13.4

Scales of Measurement (summary)

Nominal
  Basic characteristics: Numbers identify and classify objects
  Common examples: Patient number, ICD code, blood group
  Descriptive statistics: Percentages, mode
  Inferential statistics: Chi-square, binomial test

Ordinal
  Basic characteristics: Numbers indicate the relative positions of objects but not the magnitude of differences between them
  Common examples: Preference rankings, social class
  Descriptive statistics: Percentile, median
  Inferential statistics: Rank-order correlation, Friedman ANOVA

Interval
  Basic characteristics: Differences between objects can be compared; zero point is arbitrary
  Common examples: Temperature, attitude, opinion, IQ
  Descriptive statistics: Range, mean, standard deviation
  Inferential statistics: Product-moment correlation, t tests, regression

Ratio
  Basic characteristics: Zero point is fixed; ratios of scale values can be compared
  Common examples: Length, weight, income
  Descriptive statistics: Geometric mean, harmonic mean, coefficient of variation

Shapes of Distribution
[Figure: symmetric distribution, horizontal axis from 40 to 100; mean, median, and mode coincide]
- 68% within mean ± SD
- 95% within mean ± 2SD
- 99.7% within mean ± 3SD

Right-skewed distribution
[Figure: order along the axis is Mode, Median, Mean]
If Mean > Median: positive or right skewness (long right tail). It arises when the mean is increased by some unusually high values.

Left-skewed distribution
[Figure: order along the axis is Mean, Median, Mode]
If Mean < Median: negative or left skewness (long left tail). Negative skewness occurs when the mean is reduced by some extremely low values.

Inference
Developing and Testing a Hypothesis
- Differences in frequency distributions of nominal level variables: chi-square
- Associations or correlations between variables: bivariate correlations
- Differences between groups with respect to the distribution of interval/ratio level data: t-tests

The most popular statistical packages
1. SAS
2. SPSS
3. STATA
4. Epi Info
5. SUDAAN
6. S-PLUS
7. MedCalc
8. Excel
9. Statistica
10. Minitab

Sample size
Using Epitable (under Epi Info) to calculate sample size

SPSS
Statistical Package for the Social Sciences

Creating a Data File in SPSS
Variables:
- ID
- Gender (Male, Female)
- Date of Birth
- Educational Level (years)
- Employment Category (1 Clerical, 2 Custodial, 3 Manager)
- Current Salary $
- Beginning Salary $
- Months since Hire
- Previous Experience (months)
- Minority Classification (0 No, 1 Yes)

Data Entry
- Excel
- Access
- Word
- Any statistical software

Data cleaning
General data check: printout
Quick data check (frequency tables):
1- Wild codes check (invalid codes)
2- Completeness check: ensure that all cases collected are represented in the data file without replication

Simple frequency
Data check

jobcat Employment Category

Valid         Frequency   Percent   Valid Percent   Cumulative Percent
1 Clerical        363       76.6        76.6              76.6
2 Custodial        27        5.7         5.7              82.3
3 Manager          84       17.7        17.7             100.0
Total             474      100.0       100.0

Perform Descriptive Statistics

Descriptive Statistics

                             N    Range   Minimum   Maximum   Mean    Std. Error   Std. Deviation   Variance
Educational Level (years)   474    13        8        21      13.49     .133          2.885           8.322
Months since Hire           474    35       63        98      81.11     .462         10.061         101.223
Valid N (listwise)          474

Conduct Simple Correlations and Regression

Correlation

Correlations

                                                 educ Educational    salbegin Beginning
                                                 Level (years)       Salary
educ Educational      Pearson Correlation            1                  .633**
Level (years)         Sig. (2-tailed)                                   .000
                      N                              474                474
salbegin Beginning    Pearson Correlation            .633**             1
Salary                Sig. (2-tailed)                .000
                      N                              474                474
**. Correlation is significant at the 0.01 level (2-tailed).

Regression

Coefficients(a)

                               Unstandardized Coefficients   Standardized                      95% Confidence Interval for B
Model 1                        B            Std. Error       Beta           t        Sig.      Lower Bound    Upper Bound
(Constant)                     -6290.97     1340.920                        -4.692   .000      -8925.878      -3656.056
educ Educational Level (years)  1727.528      97.197         .633           17.773   .000       1536.536       1918.521
a. Dependent Variable: salbegin Beginning Salary

Scatter
[Figure: scatter plot, not reproduced]

t-test (Two independent groups)

Group Statistics

educ Educational Level (years)

Gender      N     Mean    Std. Deviation   Std. Error Mean
m Male      258   14.43       2.979             .185
f Female    216   12.37       2.319             .158

Independent Samples Test

educ Educational Level (years)

Levene's Test for Equality of Variances: F = 17.884, Sig. = .000

t-test for Equality of Means
                                                                Mean         Std. Error    95% CI of the Difference
                              t       df        Sig. (2-tailed) Difference   Difference    Lower     Upper
Equal variances assumed       8.276   472       .000            2.060        .249          1.571     2.549
Equal variances not assumed   8.458   469.595   .000            2.060        .244          1.581     2.538

Paired t-test (Dependent groups)

Paired Samples Statistics

Pair 1                      Mean          N     Std. Deviation   Std. Error Mean
salary Current Salary       $34,419.57    474   $17,075.661      $784.311
salbegin Beginning Salary   $17,016.09    474   $7,870.638       $361.510

Paired Samples Correlations

Pair 1                                               N     Correlation   Sig.
salary Current Salary & salbegin Beginning Salary    474   .880          .000

Paired Samples Test

                                                     Paired Differences
Pair 1                                               Mean          Std. Deviation   Std. Error Mean   t        df    Sig. (2-tailed)
salary Current Salary - salbegin Beginning Salary    $17,403.481   $10,814.620      $496.7            35.036   473   .000

Chi-Square test

jobcat Employment Category * gender Gender Crosstabulation

                                              gender Gender
jobcat Employment Category                    f Female   m Male    Total
1 Clerical    Count                           206        157       363
              % within gender Gender          95.4%      60.9%     76.6%
2 Custodial   Count                           0          27        27
              % within gender Gender          .0%        10.5%     5.7%
3 Manager     Count                           10         74        84
              % within gender Gender          4.6%       28.7%     17.7%
Total         Count                           216        258       474
              % within gender Gender          100.0%     100.0%    100.0%

Chi-Square Tests

                      Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square    79.277(a)   2    .000
Likelihood Ratio      95.463      2    .000
N of Valid Cases      474
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 12.30.
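
Checking the reported statistics by hand

The SPSS output above can be cross-checked from the summary tables alone. The following is a minimal sketch in plain Python (standard library only), not SPSS syntax; the function names pearson_chi_square and two_sample_t are illustrative and do not come from the slides. It recomputes the Pearson chi-square from the crosstabulation counts, and the two versions of the independent-samples t statistic from the Group Statistics table. Because the group means are entered as rounded in the output (14.43 and 12.37), the t values may differ from the SPSS figures in the last decimal places.

```python
import math

# Counts copied from the jobcat * gender crosstabulation.
# Rows: Clerical, Custodial, Manager. Columns: Female, Male.
observed = [[206, 157], [0, 27], [10, 74]]


def pearson_chi_square(table):
    """Return (chi-square, degrees of freedom, expected counts) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


def two_sample_t(m1, s1, n1, m2, s2, n2):
    """Return (pooled t, Welch t, Welch df) from group means, SDs, and sizes."""
    # Equal variances assumed: pooled variance with n1 + n2 - 2 degrees of freedom.
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Equal variances not assumed: Welch statistic with Satterthwaite df.
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t_welch = (m1 - m2) / math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_pooled, t_welch, df_welch


chi2, df, expected = pearson_chi_square(observed)
min_expected = min(min(row) for row in expected)
# For df == 2 only, the upper-tail chi-square probability is exactly exp(-x / 2).
p_value = math.exp(-chi2 / 2)

# Male and Female rows of the Group Statistics table (education in years).
t_pooled, t_welch, df_welch = two_sample_t(14.43, 2.979, 258, 12.37, 2.319, 216)

print(f"chi-square = {chi2:.3f}, df = {df}, min expected = {min_expected:.2f}")
print(f"p-value (df = 2) = {p_value:.3g}")
print(f"t (equal variances assumed) = {t_pooled:.3f}")
print(f"t (not assumed) = {t_welch:.3f}, df = {df_welch:.1f}")
```

Because Levene's test is significant in the output above (Sig. = .000), the "equal variances not assumed" row is the one normally read; here both rows lead to the same conclusion.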