Preliminary Data Analysis

Introduction to Data Analysis

Why do we analyze data?
  - To make sense of the data we have collected

Basic steps in preliminary data analysis
  - Editing
  - Coding
  - Tabulating

Editing of data
  - Impose minimal quality standards on the raw data
  - Field edit: a preliminary edit, used to detect glaring omissions and inaccuracies (often involves respondent follow-up). It checks for:
      - Completeness
      - Legibility
      - Comprehensibility
      - Consistency
      - Uniformity
  - Central office edit: a more complete and exacting edit
      - Best performed by a number of editors, each looking at one part of the data
      - Decisions on how to handle item non-response and other omissions need to be made
      - List-wise deletion (drop for all analyses) vs. case-wise deletion (drop only for present analysis)

Coding: transforming raw data into symbols (usually numbers) for tabulating, counting, and analyzing
  - Must determine categories, which must be:
      - Completely exhaustive
      - Mutually exclusive
  - Assign numbers to categories
  - Make sure to code an ID number for each completed instrument

Tabulation: counting the number of cases that fall into each category
  - Initial tabulations should be performed for each item
  - One-way tabulations:
      - Determine the degree of item non-response
      - Locate errors
      - Locate outliers
      - Determine the data distribution

Tabulation: simple counts
  - For example:
      - 75 families in the study own 1 car
      - 2 families own 3
  - Missing data (coded 9):
      - 1 family did not report
      - Not useful for further analysis

    Number of Cars    Number of Families
    1                  75
    2                  23
    3                   2
    9 (missing)         1
    Total             101

Tabulation: percentages
  - Compute percentages
  - Eliminate non-responses
  - Note: report without missing data

    Number of Cars    Percentage of Families
    1                  75%
    2                  23%
    3                   2%
    Total             100%

Cross Tabulation
  - Simultaneous count of two or more items
  - Note that the marginal totals are equal to the frequency totals
  - Allows the researcher to determine whether a relationship exists between two variables
  - Used as a final analysis step in the majority of real-world applications
  - Investigates the relationship between two ordinal-scaled variables

    Number of Cars    Lower Income    Higher Income    Total
    1                 48              27                75
    2 or More          6              19                25
    Total             54              46               100

Cross Tabulation: analyzing the data
  - Calculate percentages in the direction of the "causal variable"
  - Does number of cars "cause" income level? If so, each number-of-cars row should total 100%:

    Number of Cars    Lower Income    Higher Income    Total
    1                 64%             36%              100%
    2 or More         24%             76%              100%
    Total             54%             46%              100%

  - Does income level "cause" number of cars? This seems to be the case.
  - Percentages are then calculated in the direction of income; thus, the income marginal totals should be 100%:

    Number of Cars    Lower Income    Higher Income    Total
    1                  89%             59%              75%
    2 or More          11%             41%              25%
    Total             100%            100%             100%

Cross Tabulation allows the development of hypotheses
  - Develop them by comparing percentages across groups:
      - Lower income families are more likely to have one car (89%) than the higher income group (59%)
      - Higher income families are more likely to have multiple cars (41%) than the lower income group (11%)
  - Are the results statistically significant?
      - To test this, we must employ chi-square analysis

Chi-square analysis
  - Tests the hypothesis that two or more nominally scaled variables are NOT independent
      - The null hypothesis (H0) is that the variables are independent (i.e., no relationship exists)
      - The alternative hypothesis (HA) is that a statistical relationship exists among the variables
  - Present example:
      - H0: Income level will have no effect on the number of cars that a family owns
      - HA: Income level will affect the number of cars that a family owns

Chi-square analysis: general approach
  - Based on the "marginal totals", compute the expected value for each cell
  - Compare expected values to actual values to compute the chi-square value ($\chi^2$)
  - Compare the computed $\chi^2$ to the critical $\chi^2$ (Table 4 on p. 442 in text)

    Number of Cars    Lower Income    Higher Income    Total
    1                                                   75
    2 or More                                           25
    Total             54              46               100

Chi-square analysis: compute expected values
  - E1 = (75 * 54)/100 = 40.5
  - E2 = (75 * 46)/100 = 34.5
  - Note that E1 + E2 = 75
  - E3 = ?
  - E4 = ?

    Number of Cars    Lower Income    Higher Income    Total
    1                 E1              E2                75
    2 or More         E3              E4                25
    Total             54              46               100

Chi-square analysis: compute the $\chi^2$ value
  - $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$

    Cell    O_i    E_i     O_i - E_i    (O_i - E_i)^2    (O_i - E_i)^2 / E_i
    E1      48     40.5     7.5         56.25             1.39
    E2      27     34.5    -7.5         56.25             1.63
    E3       6     13.5    -7.5         56.25             4.17
    E4      19     11.5     7.5         56.25             4.89
    Sum                                                  12.08

  - Computed $\chi^2$ = 12.08
  - df = (rows - 1) x (cols - 1) = 1 x 1 = 1
  - $\alpha$ = .05
  - Critical $\chi^2$ = 3.84
  - 12.08 > 3.84: reject the null hypothesis (reject if computed > critical)

Conclusion
  - Income has an influence on the number of cars in a family
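
The cross-tabulation percentages and the chi-square arithmetic worked through above can be checked with a short script. This is a minimal sketch in plain Python, not part of the original slides: the function names `column_percentages` and `chi_square` are hypothetical, the observed counts are the ones from the income by number-of-cars table, and the critical value 3.84 is simply the figure quoted in the text for df = 1 and alpha = .05.

```python
# Observed counts from the cross tabulation.
# Rows: 1 car, 2 or more cars. Columns: lower income, higher income.
observed = [
    [48, 27],
    [6, 19],
]


def column_percentages(table):
    """Percentages in the direction of the column variable (each column sums to ~100)."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [round(100 * cell / total) for cell, total in zip(row, col_totals)]
        for row in table
    ]


def chi_square(table):
    """Return (expected counts, chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count per cell = row total * column total / grand total.
    expected = [
        [r * c / grand_total for c in col_totals]
        for r in row_totals
    ]

    # Sum of (O - E)^2 / E over every cell of the table.
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df


# Critical value as quoted in the text (df = 1, alpha = .05); an assumption
# taken from the slides rather than computed here.
CRITICAL_VALUE = 3.84

expected, statistic, df = chi_square(observed)

print("Column percentages:", column_percentages(observed))
print("Expected counts:", expected)
print("Chi-square:", round(statistic, 2), "with df =", df)
print("Reject H0:", statistic > CRITICAL_VALUE)
```

Running the script also fills in the two expected values left open on the slide (E3 and E4), since every expected count is derived from the marginal totals in the same way as E1 and E2.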
