Topic 7
DATA COLLECTION & VALIDATION

Copyright 2022 © McGraw Hill LLC. All rights reserved. No reproduction or distribution without the prior written consent of McGraw Hill LLC.

Learning Objectives
Understand . . .
• The institutional approval process.
• The procedures of data collection.
• The tasks involved in data collection.
• The importance of editing raw data to ensure it is complete, accurate, and correctly coded.

Institutional Approval
• Nonhuman animals as research participants: approval from the Institutional Animal Care and Use Committee (IACUC).
• Humans as research participants: approval from the Institutional Review Board (IRB).
• Researchers must prepare a research protocol.
• The IACUC or the IRB must determine whether the research study is ethically acceptable.
• If the study procedures conform to acceptable practices, the IACUC or the IRB will approve the study, and the researcher can then proceed with data collection.

Pilot Study (1 of 2)
Before conducting a study, it is strongly recommended that you conduct a pilot study.
• Pilot study: a preliminary study that is conducted on a few participants prior to the actual research study.
• A pilot study can provide a great deal of information: whether the instructions are clear, whether the independent variable (IV) manipulation produced the intended effect, and whether the dependent variable is sufficiently sensitive.
• A pilot study gives the researcher experience with the procedure.

Pilot Study (2 of 2)
Conducting an internet-based study
• Complete the online study tasks yourself.
• Have a few pilot participants complete the tasks.
• Find out whether the study works properly in your browser.
• Check whether the data are returned to you in a manner that is understandable.
• If a problem is not detected until after the data have been collected, it might have influenced the results of the study.
• If changes are made to the study after receiving IRB approval, the IRB must approve the intended changes.

Collect the Data
1) Train the data collectors
2) Determine the data collection timeline
3) Implement research task(s) process(es)
4) Invite chosen participants
5) Activate the research tasks
6) Remind participants to complete research task(s)
7) Enter the data

Collect the Data: Determine the data collection timeline
The timeline should specify:
• The dates and times for training of data collectors (if applicable).
• The activation dates and times (start and stop times) of each data collection task.
• When data entry starts and is expected to finish, including both automated and manual entry.
• When data editing starts and is expected to finish.
• When the clean data file will be ready for processing.

Collect the Data: Implement research task(s) process(es)
Instrument disposition includes the process(es) by which the measurement instrument for each research task is distributed to the participant and the output of each completed research task is returned to the researcher in charge of the study. These processes sometimes need instructions.

Collect the Data: Activate the research tasks
• Survey activation is the decision that launches the survey; it indicates the researcher has addressed all known measurement instrument problems and the process is as error-free as the researcher can make it.

Collect the Data: Remind participants to complete research task(s)
Reminders are a key task of survey research. Rarely does a participant complete a measurement instrument when it is first presented. Reminders often use email or phone contact. Multiple reminders are possible.

Collect the Data: Enter the data
• Data entry is a set of processes that includes coding and data file creation. It converts the data gathered into a medium for data viewing and analysis.

Missing Data
Missing data are information from a participant or case that is not available for one or more variables of interest. Missing data typically occur in surveys
• when respondents accidentally skip, refuse to answer, or do not know the answer to an item on the questionnaire.
• when researcher error corrupts data files.
There are three basic types of missing data:
• Data missing completely at random (MCAR)
• Data missing at random (MAR)
• Data missing but not missing at random (NMAR)

Missing Data Correction Techniques
There are three basic techniques for dealing with missing data:
1) Listwise deletion: cases with missing data on one variable are excluded from the sample for all analyses of that variable.
2) Pairwise deletion: missing data are estimated using all cases that have data for each variable or pair of variables; the estimation replaces the missing data.
3) Predictive replacement (replacement of missing values with estimated scores): missing data are predicted from observed values on another variable; the observed value is used to replace the missing data.

Data Preparation
Data preparation includes two tasks:
1) Editing data
2) Post-collection coding of data
The first step in data preparation is to edit the raw data. Editing detects errors and omissions, corrects them when possible, and certifies that the maximum data quality standards are achieved. The purpose is to guarantee that data are accurate, complete, and appropriately coded.

Field Editing
• In large field projects with multiple data collectors, editing is the responsibility of the field supervisor. It should be done soon after the data have been collected.
• During the stress of data collection, data collectors often use ad hoc abbreviations and special symbols. If the forms are not completed quickly, the field interviewer may not recall what the participant said. Therefore, reporting forms should be reviewed regularly.
• When entry gaps are present, a callback should be made rather than guessing what the respondent probably said.
• The field supervisor executes data validation by re-interviewing some percentage of the respondents on some questions to verify that they have participated. Ten percent is the typical amount used in data validation.
• Data validation is a process that attempts to verify that the research protocols designed to avoid data errors were followed and that the data are real, by identifying fake or inaccurate data.

Coding Data
Data coding means systematically reorganizing raw data into a format that statistics software can use. The coding procedure is a set of rules stating that you will assign certain numbers to variable attributes. A codebook is a document describing the coding procedure and the computer file location of data for variables in a specific format.
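
To make the idea of a coding procedure and codebook concrete, here is a minimal Python sketch. It is an illustration only, not the textbook's method: the dictionary layout, the `code_response` helper, and the sample case are all hypothetical, and the response codes are borrowed from variables Q5 and C3 in this deck's partial coding scheme.

```python
# Hypothetical mini-codebook: variable ID -> label, type, and response codes.
# The layout is an assumption for illustration; real codebooks vary.
CODEBOOK = {
    "Q5": {
        "label": "Evaluation of Current Policy",
        "type": "ordinal",
        "codes": {"Excellent": 1, "Good": 2, "Fair": 3, "Poor": 4},
    },
    "C3": {
        "label": "Housing Ownership",
        "type": "nominal",
        "codes": {"Own": 1, "Rent": 2, "Other": 3},
        "missing": 9,  # code assigned when no usable answer was recorded
    },
}


def code_response(variable, raw_answer):
    """Return the numeric code for a raw answer, following the codebook rules."""
    entry = CODEBOOK[variable]
    answer = (raw_answer or "").strip().title()
    if answer in entry["codes"]:
        return entry["codes"][answer]
    if "missing" in entry:
        return entry["missing"]
    # No rule covers this answer, so the coding procedure needs to be revised.
    raise ValueError(f"{variable}: no code for answer {raw_answer!r}")


# One hypothetical raw case: Q5 answered, C3 left blank.
raw_case = {"Q5": "good", "C3": ""}
coded_case = {var: code_response(var, ans) for var, ans in raw_case.items()}
print(coded_case)  # {'Q5': 2, 'C3': 9}
```

Raising an error for an answer that no rule covers, instead of guessing a code, mirrors the advice to call back rather than guess when entries are unclear.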
Researchers should begin to think about a coding procedure and codebook before collecting any data. Many researchers precode a questionnaire before collecting any data. If precoding is not done, the first step after collecting data is to create a codebook.

Precoding
Precoding means assigning codebook codes to variables in a study and recording them on the questionnaire. With a precoded instrument, the codes for variable categories are accessible directly from the questionnaire.

Partial Coding Scheme
Variable ID | Location | Variable Label | Response Codes | Variable Type
A1 | 1 | Interviewer Number | Assigned | Nominal
A2 | 2 | Participant ID | Assigned | Nominal
Q5 | 10 | Evaluation of Current Policy | 1=Excellent, 2=Good, 3=Fair, 4=Poor | Ordinal
Q6a | 11 | Reason for Purchase-Bought Home | 1=Yes, 2=No | Nominal
Q6b | 12 | Reason for Purchase-Birth of Child | 1=Yes, 2=No | Nominal
Q6c | 13 | Reason for Purchase-Death of Relative | 1=Yes, 2=No | Nominal
Q6d | 14 | Reason for Purchase-Promoted | 1=Yes, 2=No | Nominal
Q6e | 15 | Reason for Purchase-Changed Job/Career | 1=Yes, 2=No | Nominal
C1 | 30 | Gender | 1=Male, 2=Female, 3=Other, 9=Missing | Nominal
C2 | 31 | Marital Status | 1=Married, 2=Widowed, 3=Divorced, 4=Separated, 5=Never Married | Nominal
C3 | 32 | Housing Ownership | 1=Own, 2=Rent, 3=Other, 9=Missing | Nominal
C4 | 33 | Birth Year | 2 Digits | Nominal
C5 | 34 | 5-Digit Zipcode | 5-Digit Code, 99999=Missing | Nominal

Post-Coding Open-Ended Questions
For open-ended questions, researchers are forced to categorize responses after the data are collected. The question below illustrates the use of an open-ended question. After preliminary evaluation, response categories were created for that item; they appear as variables Q6a through Q6e in the overall coding scheme.
6. What prompted you to purchase your most recent life insurance policy?
_______________________________

Appropriately Coded for Analysis
Categories should be…
• Exhaustive
• Appropriate to the research problem
• Mutually exclusive
• Derived from one classification principle

Handling “Don’t Know” Responses
When the number of “don’t know” (DK) responses is low, it is not a problem. However, if the number is high, it may mean that the question was poorly designed, too sensitive, or too challenging for the respondent. The best way to deal with undesired DK answers is to design better questions at the beginning.
Example: Do you have a productive relationship with your present salesperson?
The word “productive” may cause confusion. Because no operational definition is provided for the word, respondents may choose the DK option as an alternative way to express their confusion. If “productive” were defined for the participant and there were still a high percentage of DK responses, then there might be another underlying problem behind the DK choice. If a DK response is legitimate, it should be kept as a separate reply category.

Recoding
We may find unexpected patterns in our preliminary examination of the data that require us to recode variables to better reflect the responses. Recoding variables involves developing new mapping rules and assigning new codes based on the merging of initial variable categories.
If you recode a 5-point agreement scale to a 3-point scale (agree, neutral, disagree), you will not reduce the data level or change the statistical operations that can be performed. But if you reduced an interval scale to ordinal, or a ratio scale to interval or ordinal, you would reduce the data level of the initial variable and change the statistical operations that could be performed for that variable.
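
The 5-point to 3-point recoding described above can be sketched as a mapping rule in Python. This is a minimal illustration under stated assumptions: the original code values (1=Strongly agree through 5=Strongly disagree), the missing code 9, and the names `RECODE_5_TO_3` and `recode` are hypothetical and do not come from the slides.

```python
# Hypothetical mapping rule: merge the two "agree" codes and the two "disagree" codes.
# Assumed original codes: 1=Strongly agree, 2=Agree, 3=Neutral,
# 4=Disagree, 5=Strongly disagree, 9=Missing.
RECODE_5_TO_3 = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}
NEW_LABELS = {1: "Agree", 2: "Neutral", 3: "Disagree"}


def recode(values, mapping):
    """Apply a mapping rule; codes not in the mapping (such as 9) pass through unchanged."""
    return [mapping.get(value, value) for value in values]


original = [1, 2, 3, 4, 5, 9, 2]
recoded = recode(original, RECODE_5_TO_3)
print(recoded)  # [1, 1, 2, 3, 3, 9, 1]
```

The new codes keep the same rank order as the original ones, so the variable remains ordinal after the merge, which is the point the slide makes about not reducing the data level.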

Cleaning Data
Errors made when coding or entering the data into a computer threaten the validity of the measures and cause misleading results. Cleaning the data is the process of checking the accuracy of the coding of the data. Many researchers code 10-15 percent of the data a second time. If there are no coding errors in the recoded sample, the researcher can proceed. If errors are found, all the coding must be checked.
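
The second-pass check described above can be sketched in Python as a comparison between the first coding and an independent recoding of a subsample. Everything here is a hypothetical illustration: the `compare_codings` helper, the case IDs, and the code values are assumptions, and the subsample is far smaller than the 10-15 percent the slide mentions.

```python
def compare_codings(first_pass, second_pass):
    """Return (case_id, variable, first_code, second_code) for every disagreement."""
    mismatches = []
    for case_id, second_codes in second_pass.items():
        for variable, second_code in second_codes.items():
            first_code = first_pass[case_id][variable]
            if first_code != second_code:
                mismatches.append((case_id, variable, first_code, second_code))
    return mismatches


# Hypothetical first-pass codes for three cases.
first_pass = {
    101: {"Q5": 2, "C3": 1},
    102: {"Q5": 4, "C3": 2},
    103: {"Q5": 1, "C3": 9},
}
# Hypothetical second pass over a subsample, coded again from the raw forms.
second_pass = {
    101: {"Q5": 2, "C3": 1},
    103: {"Q5": 3, "C3": 9},
}

errors = compare_codings(first_pass, second_pass)
if errors:
    print("Discrepancies found; all coding should be checked:", errors)
else:
    print("No discrepancies in the recoded sample; proceed.")
```

With these sample values the check reports one disagreement, for case 103 on Q5, which under the rule above would trigger a check of all the coding.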