

Business Research Methods


Data Preparation


• Data preparation refers to the process of ensuring the accuracy of the data and their conversion from raw form into classified forms appropriate for analysis.


Data Preparation Process

Validation → Editing → Coding → Data Entry → Data Cleaning → Tabulation & Analysis


Questionnaire Checking

A questionnaire returned from the field may be unacceptable for several reasons.

– Parts of the questionnaire may be incomplete.

– The pattern of responses may indicate that the respondent did not understand or follow the instructions.

– The responses show little variance (e.g., the same rating is marked for every item).

– One or more pages are missing.

– The questionnaire is received after the preestablished cutoff date.

– The questionnaire is answered by someone who does not qualify for participation.

• Validation & editing help prepare the data for data entry.
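Most of the reasons listed above can be turned into simple screening rules. The sketch below is a minimal Python illustration, not a standard procedure: the field names (`answers`, `received`, `age`), the cutoff date, the expected number of items and the minimum-age eligibility rule are all hypothetical. Whether a respondent misunderstood the instructions still needs human judgment, so a script like this only flags questionnaires for closer inspection.

```python
from datetime import date


def reasons_unacceptable(q, cutoff, n_items, min_age):
    """Return reasons a returned questionnaire might be rejected.

    Field names and rules are hypothetical illustrations.
    """
    reasons = []
    answers = q["answers"]
    # Missing pages show up as fewer answers than items; skipped items as None.
    if len(answers) < n_items or any(a is None for a in answers):
        reasons.append("incomplete")
    # The same answer to every item suggests little variance.
    given = [a for a in answers if a is not None]
    if len(given) > 1 and len(set(given)) == 1:
        reasons.append("little variance")
    # Received after the preestablished cutoff date.
    if q["received"] > cutoff:
        reasons.append("after cutoff")
    # Respondent does not qualify (hypothetical rule: minimum age).
    if q["age"] < min_age:
        reasons.append("not qualified")
    return reasons


q = {"answers": [3, 3, 3, 3, None], "received": date(2024, 5, 3), "age": 16}
print(reasons_unacceptable(q, cutoff=date(2024, 4, 30), n_items=5, min_age=18))
# → ['incomplete', 'little variance', 'after cutoff', 'not qualified']
```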

Validating

• Validation is the process of ascertaining whether the interviews were conducted in compliance with specified norms.

• It helps detect fraud, or failure by the interviewer to follow specified instructions.

• The questionnaire has a separate place to record the respondent’s name, address, telephone number & other demographic details, and the date of the interview.

• These details are the basis for validation: they allow the researcher to recontact the respondent & confirm that the interview was actually conducted.
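Validation is usually carried out on a subset of completed interviews. As a minimal sketch (the sampling fraction, the `phone` field and the use of simple random sampling are assumptions, not a prescribed method), the code below draws a random sample of questionnaires to recontact and separately lists those that cannot be validated because no telephone number was recorded.

```python
import random


def select_for_validation(questionnaires, fraction, seed=None):
    """Pick a random subset of completed interviews to recontact.

    `fraction` is a hypothetical parameter; no particular rate is implied.
    Questionnaires without a telephone number cannot be called back,
    so they are reported separately instead of being sampled.
    """
    rng = random.Random(seed)
    contactable = [q for q in questionnaires if q.get("phone")]
    not_contactable = [q["id"] for q in questionnaires if not q.get("phone")]
    k = max(1, round(len(contactable) * fraction)) if contactable else 0
    sample = rng.sample(contactable, k)
    return [q["id"] for q in sample], not_contactable


records = [{"id": i, "phone": f"555-01{i:02d}"} for i in range(1, 10)]
records.append({"id": 10, "phone": ""})  # contact details not recorded
to_call, missing_contact = select_for_validation(records, fraction=0.2, seed=1)
print(len(to_call), missing_contact)  # → 2 [10]
```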


Editing


• Editing is the process of checking the questionnaire for mistakes made by the interviewer or the respondent while filling it in.

• It is a manual process that is generally done twice: first by the service firm that conducted the interviews & then by the market research firm.

• The first check is generally done by field supervisors in the field itself.

• Problems checked during editing include:

- whether the interviewer has followed the skip patterns

- whether responses to open-ended questions have been properly obtained
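A skip-pattern check can be stated as a rule linking two questions. The sketch below assumes a hypothetical pair: Q1 "Do you eat chocolate?" (1 = Yes, 2 = No) and Q2 "Which brand do you buy most often?", which should be skipped when Q1 = 2. It flags questionnaires that answered a question that should have been skipped, or skipped one that should have been answered.

```python
def skip_pattern_violations(questionnaires):
    """Return IDs of questionnaires that break the hypothetical rule:
    if Q1 == 2 (No), Q2 must be blank; if Q1 == 1 (Yes), Q2 must be answered."""
    violations = []
    for q in questionnaires:
        if q.get("Q1") == 2 and q.get("Q2") is not None:
            violations.append(q["id"])  # answered a question that should be skipped
        elif q.get("Q1") == 1 and q.get("Q2") is None:
            violations.append(q["id"])  # skipped a question that should be answered
    return violations


returned = [
    {"id": 101, "Q1": 1, "Q2": 2},
    {"id": 102, "Q1": 2, "Q2": None},
    {"id": 103, "Q1": 2, "Q2": 3},
    {"id": 104, "Q1": 1, "Q2": None},
]
print(skip_pattern_violations(returned))  # → [103, 104]
```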

Editing


During editing, some responses are found to be illegible, incomplete, inconsistent or ambiguous; these are called unsatisfactory responses.

• Treatment of Unsatisfactory Results

– Returning to the Field – The questionnaires with unsatisfactory responses may be returned to the field, where the interviewers recontact the respondents.

– Assigning Missing Values – If returning the questionnaires to the field is not feasible, the editor may assign missing values to unsatisfactory responses.

– Discarding Unsatisfactory Respondents – In this approach, the respondents with unsatisfactory responses are simply discarded.
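The last two treatments can be combined in a simple rule. In this minimal sketch (the `max_flagged` threshold, the `flagged` lookup and the use of `None` as the missing value are all hypothetical), a respondent with too many unsatisfactory answers is discarded; otherwise only the unsatisfactory answers are replaced by a missing value. Returning to the field is a manual step and is not modelled.

```python
MISSING = None  # hypothetical marker for an assigned missing value


def treat_unsatisfactory(respondents, flagged, max_flagged):
    """flagged maps respondent id -> set of question keys judged unsatisfactory.

    Respondents with more than max_flagged unsatisfactory answers are discarded;
    for the others, each flagged answer is replaced by MISSING.
    """
    kept, discarded = [], []
    for r in respondents:
        bad = flagged.get(r["id"], set())
        if len(bad) > max_flagged:
            discarded.append(r["id"])
            continue
        kept.append({k: (MISSING if k in bad else v) for k, v in r.items()})
    return kept, discarded


respondents = [
    {"id": 1, "Q1": 4, "Q2": 5, "Q3": 3},
    {"id": 2, "Q1": 2, "Q2": 7, "Q3": 1},
    {"id": 3, "Q1": 9, "Q2": 9, "Q3": 9},
]
flagged = {2: {"Q2"}, 3: {"Q1", "Q2", "Q3"}}
kept, discarded = treat_unsatisfactory(respondents, flagged, max_flagged=1)
print(discarded)  # → [3]
print(kept[1])    # → {'id': 2, 'Q1': 2, 'Q2': None, 'Q3': 1}
```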


Coding

• Coding: the process of assigning a symbol, usually a number, to each possible response to each question.

• Coding is necessary for efficient data analysis.

• The response categories used for coding should be:

- Appropriate: if income is an important variable, broad income classes may not be appropriate; finer categories are needed.

- Exhaustive: they should list all possible alternatives.

- Mutually exclusive: a response should not fit into more than one category.
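One way to make numeric categories both exhaustive and mutually exclusive is to use contiguous half-open intervals. The brackets below are hypothetical; how finely to split them is the "appropriate" judgment described above. The sketch checks that adjacent brackets neither overlap nor leave gaps, and that each income receives exactly one code.

```python
import math

# Hypothetical income brackets: (code, lower bound, upper bound).
# Each bracket is the half-open interval [low, high), so a boundary value such
# as 250_000 falls into exactly one bracket (mutually exclusive), and the
# brackets run from 0 to infinity without gaps (exhaustive for incomes >= 0).
INCOME_CODES = [
    (1, 0, 250_000),
    (2, 250_000, 500_000),
    (3, 500_000, 1_000_000),
    (4, 1_000_000, math.inf),
]


def check_scheme(codes):
    """Raise if adjacent brackets leave a gap or overlap."""
    for (_, _, high), (_, low, _) in zip(codes, codes[1:]):
        if high != low:
            raise ValueError(f"gap or overlap at {high} / {low}")


def code_income(income):
    """Return the single code whose bracket contains `income`."""
    matches = [code for code, low, high in INCOME_CODES if low <= income < high]
    if len(matches) != 1:
        raise ValueError(f"income {income} fits {len(matches)} categories")
    return matches[0]


check_scheme(INCOME_CODES)
print([code_income(x) for x in (0, 249_999, 250_000, 2_500_000)])  # → [1, 1, 2, 4]
```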


Coding

• Coding closed-ended questions is easy, since there is a definite number of predetermined responses.

• Closed-ended questions are generally precoded, & hence the intermediate step of framing codes before data entry can be avoided.

• Coding data from open-ended questions is much more difficult, as the responses are unlimited & varied.


Coding

• Guidelines for coding unstructured (open-ended) questions:

• Category codes should be mutually exclusive and collectively exhaustive.

• Only a few (10% or less) of the responses should fall into the “other” category.

• Category codes should be assigned for critical issues even if no one has mentioned them.

• Data should be coded to retain as much detail as possible.
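Two of these guidelines are easy to check once answers are coded: the share falling into "Other", and codes that exist even with zero mentions. In this minimal sketch the coded answers and category names are hypothetical; "Health concerns" stands in for a critical issue that is given a code although nobody mentioned it.

```python
from collections import Counter

# Hypothetical coded answers to an open-ended question such as
# "Why do you prefer this brand?"
responses = ["Taste"] * 12 + ["Price"] * 5 + ["Packaging"] * 2 + ["Other"] * 1

# The category list is fixed in the codebook, including a critical issue
# ("Health concerns") that nobody happened to mention.
CATEGORIES = ["Taste", "Price", "Packaging", "Health concerns", "Other"]

counts = Counter(responses)  # missing keys count as 0
frequencies = {category: counts[category] for category in CATEGORIES}
other_share = counts["Other"] / len(responses)

print(frequencies)
# → {'Taste': 12, 'Price': 5, 'Packaging': 2, 'Health concerns': 0, 'Other': 1}
print(f"'Other' share: {other_share:.0%}")  # → 'Other' share: 5%
if other_share > 0.10:
    print("More than 10% in 'Other': consider adding categories")
```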


Content Analysis for Open-Ended Questions

• A qualitative technique used to analyze the text of responses to open-ended questions.

• It systematically & objectively derives categories of responses that represent homogeneous thoughts & opinions.

• It identifies responses that are particularly relevant to the survey.

• It requires the researcher to name the categories through a detailed examination of the data (as opposed to precoding).

• It is an iterative process of interpretation: the researcher first reads the responses, then rereads them to establish meaningful categories.

• The number & meaning of the categories are refined further until they are most representative of the respondents’ text.

• Each response is classified into as many categories as necessary to capture the full picture.

• Responses outside the context of the question are not coded.
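Deriving the categories is an interpretive, human task; code can only help apply a finished scheme consistently. The sketch below assumes a hypothetical final category scheme expressed as keyword lists, which is a crude stand-in for the researcher's judgment. It illustrates two rules from the list above: a response may fall into several categories, and a response that fits none (out of context) is left uncoded.

```python
# Hypothetical final category scheme, as if arrived at by reading and
# rereading the responses; keyword matching is only a rough approximation.
CATEGORIES = {
    "Taste": ["taste", "flavour", "flavor", "delicious"],
    "Price": ["price", "cheap", "expensive", "cost"],
    "Availability": ["available", "shop", "store"],
}


def classify(response):
    """Return every category whose keywords appear in the response."""
    text = response.lower()
    return [cat for cat, words in CATEGORIES.items() if any(w in text for w in words)]


answers = [
    "Great taste and not too expensive",
    "Easy to find in every shop",
    "My favourite cricket team won yesterday",
]
for a in answers:
    print(classify(a) or "not coded (out of context)")
# → ['Taste', 'Price']
# → ['Availability']
# → not coded (out of context)
```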

Codebook

A codebook contains the coding rules and the necessary information about each variable in the survey. A codebook generally contains the following information (example values in parentheses):

• question number (3)

• variable number (4)

• variable name (Brand)

• instructions for coding (1 = Amul, 2 = Cadbury, 3 = Nestle)
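A codebook entry maps naturally onto a small lookup structure. This sketch stores the slide's Brand example as a Python dictionary (the key names such as `question_number` and the helper functions `decode`/`encode` are hypothetical) and uses it to translate between codes and labels, reporting codes that the codebook does not define.

```python
# The codebook entry from the example above, stored as a lookup table.
CODEBOOK = {
    "Brand": {
        "question_number": 3,
        "variable_number": 4,
        "codes": {1: "Amul", 2: "Cadbury", 3: "Nestle"},
    },
}


def decode(variable, code):
    """Translate a numeric code into its label, flagging undefined codes."""
    return CODEBOOK[variable]["codes"].get(code, "invalid code")


def encode(variable, label):
    """Translate a label back into its numeric code."""
    reverse = {v: k for k, v in CODEBOOK[variable]["codes"].items()}
    return reverse[label]


print(decode("Brand", 2))         # → Cadbury
print(encode("Brand", "Nestle"))  # → 3
print(decode("Brand", 7))         # → invalid code
```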


Coding Don’t Knows


• “Don’t know” (DK) is often included among the possible answers.

• Respondents choose it either because they genuinely don’t know or because they don’t want to answer.

• A considerable number of DK responses may be generated for some questions.

• The researcher can either ignore them or allocate the DK frequency to the other responses in the ratio in which those responses occur.

• Example: “How many chocolates do you eat in a typical week?”

300 (<20) : 200 (>20) : 50 (DK)

• Allocating the 50 DKs in the ratio 300:200 (i.e., 3:2) adds 30 and 20 respectively, giving 330 (<20) : 220 (>20).
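Both options can be written as small functions over a frequency table. This sketch reproduces the chocolate example: ignoring the DKs simply drops them, while proportional allocation adds each category's share of the DK count to it. The function names are hypothetical.

```python
def ignore_dont_knows(counts, dk_key="DK"):
    """Drop the DK category and keep the other frequencies unchanged."""
    return {k: v for k, v in counts.items() if k != dk_key}


def allocate_dont_knows(counts, dk_key="DK"):
    """Share the DK frequency among the other categories in proportion to their counts."""
    dk = counts[dk_key]
    known = ignore_dont_knows(counts, dk_key)
    total_known = sum(known.values())
    return {k: v + dk * v / total_known for k, v in known.items()}


weekly = {"<20": 300, ">20": 200, "DK": 50}
print(ignore_dont_knows(weekly))    # → {'<20': 300, '>20': 200}
print(allocate_dont_knows(weekly))  # → {'<20': 330.0, '>20': 220.0}
```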


Data Entry or Transcribing

• Data entry involves transferring the coded data from questionnaires or coding sheets into computers by keypunching.

• Data collected through CATI or CAPI (computer-assisted telephone or personal interviewing) are entered directly into the computer.

• Besides keypunching, data can be transferred using optical scanning, mark sense forms or computerised sensory analysis.

• Optical scanners can read responses on questionnaires: they read small darkened circles & process the marked answers. They are used to grade answer sheets in competitive exams & to transcribe UPC data at supermarket checkout counters.

• Mark sense forms require responses to be recorded with a special pencil in a predesignated area coded for that response. The data can then be read by a machine.

• Computerised sensory analysis automates the data collection process. Questions appear on a computerised gridpad & responses are recorded directly into the computer using a sensing device.

Data Cleaning

• Data cleaning is undertaken after data entry & includes:

- consistency checks

- treatment of missing values

• Compared with the preliminary consistency checks made during editing, checks at this stage are more thorough & extensive because they are done by computer.


Data Cleaning

Consistency Checks

Consistency checks identify data that are out of range, logically inconsistent, or have extreme values.

– Computer packages like SPSS, SAS, Excel and Minitab can be programmed to identify out-of-range values for each variable and print out the respondent code, variable code, variable name, record number, column number and out-of-range value.

– Extreme values should be closely examined.
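The same kind of check can be sketched in plain Python rather than in the packages named above. The valid ranges, variable names and the logical rule (a respondent who never buys chocolate should not report positive weekly consumption) are hypothetical. The output lists respondent, variable and offending value, mirroring the kind of report described above. Extreme values are left to closer, manual examination here.

```python
# Hypothetical valid code ranges for each variable.
VALID_RANGES = {"Brand": (1, 3), "Satisfaction": (1, 5), "Age": (18, 99)}


def out_of_range(records):
    """Return (respondent, variable, value) for every value outside its valid range."""
    problems = []
    for rec in records:
        for var, (low, high) in VALID_RANGES.items():
            value = rec.get(var)
            if value is not None and not low <= value <= high:
                problems.append((rec["respondent"], var, value))
    return problems


def logically_inconsistent(records):
    """Hypothetical rule: Buys == 2 (never buys) contradicts PerWeek > 0."""
    return [r["respondent"] for r in records
            if r.get("Buys") == 2 and r.get("PerWeek", 0) > 0]


data = [
    {"respondent": 1, "Brand": 2, "Satisfaction": 4, "Age": 34, "Buys": 1, "PerWeek": 3},
    {"respondent": 2, "Brand": 5, "Satisfaction": 3, "Age": 27, "Buys": 2, "PerWeek": 4},
    {"respondent": 3, "Brand": 1, "Satisfaction": 9, "Age": 41, "Buys": 1, "PerWeek": 2},
]
print(out_of_range(data))            # → [(2, 'Brand', 5), (3, 'Satisfaction', 9)]
print(logically_inconsistent(data))  # → [2]
```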


Data Cleaning

Treatment of Missing Responses

• Substitute a Neutral Value – A neutral value, typically the mean response to the variable, is substituted for the missing responses.

• Substitute an Imputed Response – The respondent’s pattern of responses to other questions is used to impute or calculate a suitable response to the missing questions.

• In casewise deletion, cases, or respondents, with any missing responses are discarded from the analysis.

• In pairwise deletion, instead of discarding all cases with any missing values, the researcher uses, for each calculation, only the cases or respondents with complete responses on the variables involved in that calculation.

• After data cleaning, the computer data file is deemed clean & ready for analysis.
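The difference between these treatments shows up clearly on a tiny data set. In this minimal sketch (hypothetical data, with `None` marking a missing response), mean substitution fills a gap with the variable's observed mean, casewise deletion keeps only fully complete respondents, and pairwise deletion keeps, for each pair of variables, the respondents who answered both. Model-based imputation is not shown.

```python
# Hypothetical responses; None marks a missing answer.
data = [
    {"Q1": 4, "Q2": 5, "Q3": 3},
    {"Q1": 2, "Q2": None, "Q3": 4},
    {"Q1": None, "Q2": 3, "Q3": 5},
    {"Q1": 5, "Q2": 4, "Q3": None},
]


def mean_substitution(rows, var):
    """Replace missing values of `var` with the mean of its observed values."""
    observed = [r[var] for r in rows if r[var] is not None]
    m = sum(observed) / len(observed)
    return [r[var] if r[var] is not None else m for r in rows]


def casewise(rows):
    """Keep only respondents with no missing responses at all."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise_cases(rows, a, b):
    """Keep respondents who answered both variables used in this calculation."""
    return [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]


print(mean_substitution(data, "Q2"))          # → [5, 4.0, 3, 4]
print(len(casewise(data)))                    # → 1
print(len(pairwise_cases(data, "Q1", "Q2")))  # → 2
```

With casewise deletion only one of the four respondents survives, whereas pairwise deletion keeps two cases for each of the pairs (Q1, Q2) and (Q1, Q3), which is why it preserves more of the data.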
