Data Preparation and Description

Data Preparation: Introduction
• Once the data begin to flow, a researcher’s attention turns to data analysis.
• Data preparation includes editing, coding, and data entry.
  – It is the activity that ensures the accuracy of the data and their conversion from raw form to reduced and classified forms that are more appropriate for analysis.

Data Preparation: Introduction
• Preparing a descriptive statistical summary is another preliminary step leading to an understanding of the collected data.
  – It is during this step that data entry errors may be revealed and corrected.

Data Preparation: Editing
• The customary first step in analysis is to edit the raw data.
• Editing detects errors and omissions, corrects them when possible, and certifies that maximum data quality standards are achieved.

Data Preparation: Editing
• The editor’s purpose is to guarantee that data are:
  – Accurate;
  – Consistent with the intent of the question and other information in the survey;
  – Uniformly entered;
  – Complete; and
  – Arranged to simplify coding and tabulation.

Data Preparation: Editing
• In the following question asked of adults aged 18 or older, one respondent checked two categories, indicating that he was a retired officer and currently serving on active duty.
  – Please indicate your current military status:
    • Active duty
    • Reserve
    • Retired
    • National Guard
    • Separated
    • Never served in the army

Data Preparation: Editing
• The editor’s responsibility is to decide which of the responses is both
  – consistent with the intent of the question or other information in the survey, and
  – most accurate for this individual participant.

Data Preparation: Editing
Two types of editing are field editing and central editing.
• Field Editing: In large projects, field editing review is the responsibility of the field supervisor.
  – When entry gaps are present from interviews, a callback should be made rather than guessing what the respondent “probably would have said”.
  – Self-interviewing has no place in quality research.
  – Validating the field research is the control function of the supervisor.
    • It means he or she will reinterview some percentage of the respondents to make sure they have participated.
    • Many research firms will recontact about 10 percent of the respondents in this process of data validation.

Data Preparation: Editing
• Central Editing: For a small study, the use of a single editor produces maximum consistency. In large studies, editing tasks should be allocated so that each editor deals with one entire section.
  – When replies are inappropriate or missing, the editor can sometimes detect the proper answer by reviewing the other information in the data set.
    • It may be better to contact the respondent for correct information, if time and budget allow.
    • Another alternative is for the editor to strike out the answer if it is inappropriate. Here an editing entry of “no answer” is called for.
  – Another problem that editing can detect concerns faking an interview that never took place.
    • This “armchair interviewing” is difficult to spot, but the editor is in the best position to do so.
    • One approach is to check responses to open-ended questions. These are most difficult to fake. Distinctive response patterns in other questions will often emerge if data falsification is occurring. To uncover this, the editor must analyze the set of instruments used by each interviewer.

Data Preparation: Coding
• Coding involves assigning numbers or other symbols to answers so that the responses can be grouped into a limited number of categories.
• In coding, categories are the partitions of a data set of a given variable. For example, if the variable is gender, the partitions are male and female.
• Categorization is the process of using rules to partition a body of data.
• Both closed and free-response questions must be coded.
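
As a rough illustration of the coding step described above, the Python sketch below assigns numeric codes to answers to the military-status question. The code values in CODEBOOK, the NO_ANSWER code of 9, the sample answers, and the function name code_response are all assumptions made for this example; they are not part of the original material.

```python
from collections import Counter

# Hypothetical codebook: the category labels come from the military-status
# question in these notes; the numeric codes themselves are assumed.
CODEBOOK = {
    "active duty": 1,
    "reserve": 2,
    "retired": 3,
    "national guard": 4,
    "separated": 5,
    "never served in the army": 6,
}
NO_ANSWER = 9  # assumed code for the editor's "no answer" entry


def code_response(raw):
    """Return the numeric code for one raw answer, or NO_ANSWER."""
    if raw is None:
        return NO_ANSWER
    # Categorization rule: ignore case and surrounding whitespace.
    return CODEBOOK.get(raw.strip().lower(), NO_ANSWER)


# Made-up raw answers, including a missing reply and an unusable one.
raw_answers = ["Retired", "active duty ", None, "Reserve", "retired", "n/a"]
coded = [code_response(r) for r in raw_answers]

# Group the coded responses into their categories.
frequencies = Counter(coded)
print(coded)
print(frequencies)
```

Normalizing case and whitespace before the lookup is one simple categorization rule. Any reply that does not match a category falls back to the "no answer" code instead of being guessed.
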
Data Preparation: Coding
• The categorization of data sacrifices some data detail but is necessary for efficient analysis.
• Most software programs work more efficiently in the numeric mode.
  – Instead of entering the word male or female in response to a question that asks for the identification of one’s gender, we would use numeric codes, e.g., 0 for male and 1 for female.
• Numeric coding simplifies the researcher’s task in converting a nominal variable, like gender, to a “dummy variable”.

Data Preparation: Missing Data
• In survey studies, missing data typically occur when participants accidentally skip, refuse to answer, or do not know the answer to an item on the questionnaire.
• In longitudinal studies, missing data may result from participants dropping out of the study, or being absent for one or more data collection periods.
• Missing data also occur due to researcher error, corrupted data files, and changes in the research or instrument design after data were collected from some participants, such as when variables are dropped or added.

Data Preparation: Missing Data
• The strategy for handling missing data consists of a two-step process:
  – the researcher first explores the pattern of missing data to determine the mechanism for missingness (the probability that a value is missing rather than observed), and
  – then selects a missing-data technique.
• The three basic types of techniques that can be used to salvage data sets with missing values are:
  – Listwise deletion
  – Pairwise deletion
  – Replacement of missing values with estimated scores

Data Preparation: Data Entry
• Data entry converts information gathered by secondary or primary methods to a medium for reviewing and manipulation.
• Keyboarding remains a mainstay for researchers who need to create a data file immediately and store it in a minimal space on a variety of media.
• However, researchers have profited from more efficient ways of speeding up the research process, especially from bar coding and optical character and mark recognition.

Data Preparation: Data Entry
• Keyboarding: A full-screen editor, where an entire data file can be edited or browsed, is a viable means of data entry for statistical packages like SPSS or SAS.
  – SPSS offers several data entry products, including Data Entry Builder, which enables the development of forms and surveys, and Data Entry Station, which gives centralized entry staff, such as telephone interviewers or online participants, access to the survey.
  – Both SAS and SPSS offer software that effortlessly accesses data from databases, spreadsheets, data warehouses, or data marts.

Data Preparation: Data Entry
• Bar-code technology is used to simplify the interviewer’s role as a data recorder. When an interviewer passes a bar-code wand over the appropriate codes, the data are recorded in a small, lightweight unit for translation later.
• Researchers studying magazine readership can scan bar codes to denote a magazine cover that is recognized by an interview participant.

Data Preparation: Data Entry
• Optical Character Recognition (OCR):
  – Users of a PC image scanner are familiar with OCR programs, which transfer printed text into computer files in order to edit and use it without retyping.
• Optical scanning of instruments is efficient for researchers.
  – Optical scanners process the mark-sensed questionnaires and store the answers in a file.
  – This method has been adopted by researchers for data entry and preprocessing due to its faster speed, cost savings on data entry, convenience in charting and reporting data, and improved accuracy.
  – It reduces the number of times data are handled, thereby reducing the number of errors that are introduced.
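
The three missing-data techniques named in the Missing Data slides (listwise deletion, pairwise deletion, and replacement with estimated scores) can be sketched in Python as follows. The records, variable names, and helper functions are hypothetical. Mean substitution is used here as just one simple way to produce an estimated score; the slides do not prescribe a particular estimator.

```python
from statistics import mean

# Made-up survey records; None marks a missing value.
records = [
    {"age": 34, "income": 52.0, "satisfaction": 4},
    {"age": 41, "income": None, "satisfaction": 5},
    {"age": None, "income": 38.0, "satisfaction": 3},
    {"age": 29, "income": 45.0, "satisfaction": None},
    {"age": 50, "income": 61.0, "satisfaction": 2},
]
variables = ["age", "income", "satisfaction"]

# Listwise deletion: drop every case that has any missing value.
complete_cases = [
    r for r in records if all(r[v] is not None for v in variables)
]


def pairwise_cases(a, b):
    """Pairwise deletion: keep cases where both variables a and b are observed."""
    return [
        (r[a], r[b])
        for r in records
        if r[a] is not None and r[b] is not None
    ]


def impute_mean(var):
    """Replacement: fill missing values of var with the mean of observed values."""
    observed = [r[var] for r in records if r[var] is not None]
    fill = mean(observed)
    return [r[var] if r[var] is not None else fill for r in records]


print(len(complete_cases))                    # cases left after listwise deletion
print(len(pairwise_cases("age", "income")))   # cases usable for this one pair
print(impute_mean("age"))                     # age with the gap filled in
```

In this toy data set, listwise deletion keeps fewer cases than pairwise deletion does for any single pair of variables, which is the usual trade-off between the two: pairwise deletion retains more data, but each statistic may then rest on a different subset of cases.
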