chap015 - courses.psu.edu

Copyright © 2003 by The McGraw-Hill Companies, Inc. All rights reserved.
Chapter 15
Coding, Editing, and Preparing Data for Analysis

The Value of Preparing Data for Analysis
• Validating, editing, and coding information captured from respondents
  is a necessary step along the road to providing decision-makers with
  information they can use to address market opportunities and business
  problems.
• Entering, “combing” or “cleaning,” and tabulating data is a complex
  though fascinating process in which the raw material collected via the
  research endeavor is packaged into a format ready for use by
  decision-makers.

The Essentials of Data Validation
When a research team validates data, it covers the following five
areas of concern:
• Fraud
• Screening
• Procedure
• Completeness
• Courtesy

The Essentials of Data Editing
When a research team edits data, it focuses on the following four
questions:
1) Have the questions been asked properly?
2) Have the answers been recorded precisely?
3) Have only qualified respondents been included in the sample?
4) Have all open-end responses been consolidated?

Editing
• Carefully checking survey data for completeness, legibility,
  consistency, and accuracy.
• The most important purpose is to eliminate, or at least reduce, the
  number of errors in the raw data.
• Two Forms of Error in Raw Survey Data
  - Interviewer Error
  - Respondent Error
• Two Major Types of Editing
  - Field Editing
  - Office Editing

Two Major Types of Editing
• Field Editing: Editing done on personal interviews, mall-intercept,
  and telephone surveys as the data collection takes place. It must
  occur the same day data gathering occurs.
• Office Editing: Editing done at a central location by an office staff
  after all data collection is finished. It occurs after considerable
  time has elapsed.

Response Problems and Solutions
Potential Problems
• Wrong Informant
• Return to Sender
• Illegible Writing
• Incomplete Responses
• Damaged Measuring Instrument
• Apparently Confused Respondent
• Lack of Variance Among Responses
• Lack of Consistency Among Responses
• Late Responses

Response Problems and Solutions – cont’d
Potential Solutions
• Follow-up Interviews
  - Where responses are incomplete or the form was incorrectly filled
    out, the researchers may send the respondent another form or
    reinterview the respondent if time permits.
• Offer a “no valid response” option
• Eliminate all unacceptable surveys
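Two of the problems listed above, incomplete responses and lack of variance among responses, lend themselves to a simple automated screen. The sketch below is only an illustration: the respondent IDs, the five rating items, and the function name `screen_response` are all hypothetical, and a real study would set its own rules for what counts as unacceptable.

```python
# Hypothetical survey records: respondent id -> answers to five 1-5 rating items.
# None marks an item the respondent left blank.
responses = {
    "R001": [4, 2, 5, 3, 4],
    "R002": [3, 3, 3, 3, 3],        # same answer on every item
    "R003": [5, None, 4, None, 2],  # two items left blank
}

def screen_response(answers):
    """Return a list of problem labels for one respondent's answers."""
    problems = []
    given = [a for a in answers if a is not None]
    if len(given) < len(answers):
        problems.append("incomplete")
    if len(given) > 1 and len(set(given)) == 1:
        problems.append("no variance")
    return problems

flags = {rid: screen_response(ans) for rid, ans in responses.items()}
for rid, problems in flags.items():
    print(rid, problems or ["ok"])
```

Flagged questionnaires would then go to follow-up, be coded as having no valid response, or be eliminated, as the research team decides.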

Ways To Perform Editing
• Personal Editing: Editing performed by a person.
• Computer Editing: Editing performed by a computer.

Coding
• The process of systematically and consistently assigning each response
  a numerical score.
• The key to a good coding system is for the coding categories to be
  mutually exclusive and the entire system to be collectively exhaustive.
• To be mutually exclusive, every response must fit into only one
  category.
• To be collectively exhaustive, all possible responses must fit into one
  of the categories.
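The two properties can be checked mechanically for a numeric coding scheme. The following is a minimal sketch under stated assumptions: the scheme (amounts in cents, inclusive ranges) and the helper names `matching_codes` and `check_scheme` are invented for illustration and are not taken from any survey in this chapter.

```python
# Hypothetical coding scheme for an "amount spent" question, in cents.
# Each code maps to an inclusive (low, high) range; None means no upper limit.
scheme = {
    1: (0, 200),
    2: (201, 400),
    3: (401, 600),
    4: (601, None),
}

def matching_codes(amount_cents, scheme):
    """Return every code whose range contains the amount."""
    hits = []
    for code, (low, high) in scheme.items():
        if amount_cents >= low and (high is None or amount_cents <= high):
            hits.append(code)
    return hits

def check_scheme(scheme, amounts):
    """Return (uncovered, overlapping) amounts for the given scheme.

    uncovered   -> scheme is not collectively exhaustive for these amounts
    overlapping -> scheme is not mutually exclusive for these amounts
    """
    uncovered = [a for a in amounts if len(matching_codes(a, scheme)) == 0]
    overlapping = [a for a in amounts if len(matching_codes(a, scheme)) > 1]
    return uncovered, overlapping

uncovered, overlapping = check_scheme(scheme, range(0, 2001))
print(uncovered, overlapping)  # both empty: every amount fits exactly one code

# A flawed scheme: 200 cents fits two codes, and 401 cents fits none.
bad_scheme = {1: (0, 200), 2: (200, 400)}
bad_uncovered, bad_overlapping = check_scheme(bad_scheme, range(0, 402))
print(bad_uncovered, bad_overlapping)
```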

Coding – cont’d
• Coding Missing Numbers: Needed when respondents fail to complete
  portions of the survey. Whatever the reason for incomplete surveys,
  researchers must indicate to the computer that there was no response
  provided by the respondent.
• Coding Open-Ended Questions: When open-ended questions are used,
  researchers must create categories. Once all responses have been
  returned, every response must fit into a category. Furthermore,
  similar responses should fall into the same category.
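One common way to "indicate to the computer" that no response was given is to reserve a numeric code that no real answer can take. The sketch below assumes a reserved code of 9 and a question whose valid answers are 1 through 7; both choices, and the function name `code_answer`, are illustrative assumptions rather than a rule from the text.

```python
MISSING = 9  # assumed "no response" code; it must not collide with a real answer code

def code_answer(raw, valid_codes):
    """Return the numeric code for a raw answer, or MISSING if it is blank."""
    if raw is None or str(raw).strip() == "":
        return MISSING
    code = int(raw)
    if code not in valid_codes:
        raise ValueError(f"infeasible code: {raw!r}")
    return code

# Raw entries as they might arrive from a questionnaire: blanks in several forms.
raw_answers = ["2", "", None, "7", " "]
coded = [code_answer(a, valid_codes=range(1, 8)) for a in raw_answers]
print(coded)  # [2, 9, 9, 7, 9]
```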

Coding – cont’d
• Precoded Questionnaires: Sometimes researchers place codes on the
  actual questionnaire, which simplifies data entry.
• There are two sets of codes:
  - One set codes individual responses.
  - The second set of codes is for individual questions.

Coding – cont’d
• Codebook: Contains the instructions for the people who code survey
  data. It is the blueprint for proper data coding.
• The codebook typically includes:
  - Column Number
  - Variable Number
  - Variable Name
  - Question Number
  - Coding Instructions
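A codebook with these five fields maps naturally onto a small record type. The sketch below is one possible representation, not a standard format: the class name `CodebookEntry`, the column and variable numbers, and the variable names `VISITS` and `GENDER` are all hypothetical, while the question labels and answer codes follow the examples used elsewhere in this chapter.

```python
from dataclasses import dataclass, field

@dataclass
class CodebookEntry:
    column_number: int        # position of the variable in the data record (1-based)
    variable_number: int
    variable_name: str
    question_number: str
    coding_instructions: dict = field(default_factory=dict)  # code -> meaning

codebook = [
    CodebookEntry(1, 1, "VISITS", "Q2",
                  {1: "One", 2: "Two", 3: "Three", 4: "Four",
                   5: "Five", 6: "Six", 7: "Seven or more"}),
    CodebookEntry(2, 2, "GENDER", "Q1520", {1: "Female", 2: "Male"}),
]

def decode(record, codebook):
    """Translate one row of numeric codes into variable-name/label pairs."""
    labels = {}
    for entry in codebook:
        code = record[entry.column_number - 1]  # columns are 1-based
        labels[entry.variable_name] = entry.coding_instructions.get(code, "UNKNOWN CODE")
    return labels

decoded = decode([7, 1], codebook)
print(decoded)  # {'VISITS': 'Seven or more', 'GENDER': 'Female'}
```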

Entering Data
• If data entry is not instantaneous, then data-entry operators (or
  keyboard operators) are needed to input survey data into the computer.
• Problems can occur during data entry tasks, such as transposing numbers
  and inputting an infeasible code. It is a good idea to have someone
  check the data-entry operator’s work.
• Optical-Scanning Devices: Data-processing machines that electronically
  read survey answers that are in a prescribed form, such as numbers,
  codes, or words.
• With rapidly advancing technologies, data entry will become ever more
  streamlined.
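Checking an operator's work is often done by keying the same questionnaires twice and comparing the two passes, alongside a range check for infeasible codes. The sketch below illustrates both checks on made-up data; the respondent IDs, the valid ranges per column, and the function names are assumptions for illustration only.

```python
# Two independent keying passes over the same three questionnaires (hypothetical data).
first_pass = {"R001": [2, 3, 12], "R002": [7, 1, 4], "R003": [5, 3, 9]}
second_pass = {"R001": [2, 3, 21], "R002": [7, 1, 4], "R003": [5, 3, 9]}

# Assumed valid codes for each of the three columns.
valid = [range(1, 8), range(1, 9), range(1, 14)]

def entry_discrepancies(a, b):
    """List (respondent, column index, value in a, value in b) where the passes differ."""
    diffs = []
    for rid in a:
        for col, (x, y) in enumerate(zip(a[rid], b[rid])):
            if x != y:
                diffs.append((rid, col, x, y))
    return diffs

def infeasible_codes(records, valid):
    """List (respondent, column index, value) for values outside the valid range."""
    bad = []
    for rid, row in records.items():
        for col, value in enumerate(row):
            if value not in valid[col]:
                bad.append((rid, col, value))
    return bad

diffs = entry_discrepancies(first_pass, second_pass)
bad = infeasible_codes(second_pass, valid)
print(diffs)  # the transposed 12/21 shows up as a mismatch
print(bad)    # 21 is also outside the valid range for that column
```

Note that a transposition which happens to produce another valid code would pass the range check, which is why the comparison of two passes is useful in addition to it.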

Data Tabulation
• Tabulation: The organized arrangement of data in a table format that is
  easy for the researcher to read and understand.
• Researchers tabulate the data to count the number of responses to each
  question.
• Simple Tabulation: The tabulating of results of only one variable to
  inform the researcher how often each response was given.
• Cross Tabulation: A statistical technique that involves tabulating the
  results of two or more variables simultaneously to inform the
  researcher how often each combination of responses was given.
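The difference between the two kinds of tabulation can be seen in a few lines of code. This is a minimal sketch on six made-up records; the field names `visits` and `gender` and the data values are hypothetical.

```python
from collections import Counter

# Hypothetical coded records: visits per month (1-7) and gender (1 = Female, 2 = Male).
records = [
    {"visits": 2, "gender": 1},
    {"visits": 2, "gender": 2},
    {"visits": 7, "gender": 2},
    {"visits": 2, "gender": 1},
    {"visits": 1, "gender": 1},
    {"visits": 7, "gender": 2},
]

# Simple (one-way) tabulation: count each response to a single variable.
one_way = Counter(r["visits"] for r in records)

# Cross tabulation: count each combination of responses to two variables.
two_way = Counter((r["visits"], r["gender"]) for r in records)

print(one_way[2])       # 3 respondents answered "Two"
print(two_way[(2, 1)])  # 2 of them are coded Female
```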

Reviewing Tabulations
Researchers need to review the study’s tabulations to determine
whether or not the data contains any additional mistakes before they
begin running statistical tests. This may be partially accomplished
by running frequency distributions.

Frequency Distribution: A distribution of data that summarizes the
number of times a certain value of a variable occurs and is expressed
in terms of percentages.
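A frequency distribution makes stray codes easy to spot, because any value outside the coding scheme appears as its own row. The sketch below assumes a gender variable coded 1 or 2 and ten made-up data values, one of which is a keying mistake; the function name `frequency_distribution` is hypothetical.

```python
from collections import Counter

def frequency_distribution(values):
    """Map each observed value to (count, percentage of all observations)."""
    counts = Counter(values)
    total = len(values)
    return {v: (n, round(100 * n / total, 1)) for v, n in sorted(counts.items())}

# Hypothetical gender codes; only 1 and 2 are valid, so the 3 is an entry error.
gender_codes = [1, 2, 2, 1, 1, 2, 3, 2, 1, 2]
dist = frequency_distribution(gender_codes)
for value, (count, pct) in dist.items():
    print(value, count, pct)

suspicious = [v for v in dist if v not in (1, 2)]
print(suspicious)  # [3]
```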

The Essentials of Data Coding
When a research team codes data, it assigns a “value” (normally a
number, e.g. “1” or “2”) to the responses to each question contained
in the survey.

AN EXAMPLE:
The two responses that follow a question such as “What is your
gender?” would have a “1” assigned to the category “Female” and a “2”
assigned to the category “Male”.

How to Handle Open-Ended Questions
There are four stages to coding open-end questions:
1) Brainstorm a list of possible responses and create a list.
   Assign a value to each of the responses.
2) Consolidate the responses into response categories that exhibit
   shared meaning.
3) Assign values to data that has been captured by the survey
   instrument, as well as data that has been omitted by the
   respondent.
4) Assign a coded value to each response.
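The stages above can be sketched in code, with the caveat that consolidating open-end answers is normally a matter of human judgment. The keyword lists, category names, and the codes 8 ("other") and 9 ("no response") below are all hypothetical, and matching on keywords is a crude stand-in for a trained coder.

```python
import re

# Hypothetical consolidated categories: code -> (label, keywords that signal it).
CATEGORIES = {
    1: ("price", ["cheap", "price", "value", "inexpensive"]),
    2: ("convenience", ["close", "fast", "quick", "convenient"]),
    3: ("taste", ["taste", "tasty", "flavor", "delicious"]),
}
OTHER = 8        # catch-all so the scheme stays collectively exhaustive
NO_RESPONSE = 9  # value for data omitted by the respondent

def code_open_ended(text):
    """Assign exactly one code to a free-text answer."""
    if text is None or not text.strip():
        return NO_RESPONSE
    words = re.findall(r"[a-z]+", text.lower())
    # The first matching category wins, which keeps the codes mutually exclusive.
    for code, (_label, keywords) in CATEGORIES.items():
        if any(k in words for k in keywords):
            return code
    return OTHER

answers = [
    "It is cheap and close to work",
    "Really tasty burgers!",
    "",
    "My kids like the playground",
]
codes = [code_open_ended(a) for a in answers]
print(codes)  # [1, 3, 9, 8]
```

The first answer mentions both price and convenience; a rule such as "first match wins" has to be written into the coding instructions so that every coder resolves such cases the same way.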

The Master Code Form: An Example
FAST-FOOD OPINION SURVEY
This questionnaire pertains to a project being conducted by a marketing research class at The
University of Memphis. The purpose of this project is to better understand the attitudes and opinions of
consumers toward fast-food restaurants. The questionnaire will take only 10-15 minutes to complete,
and all responses will remain strictly confidential. Thank you for your help on this project.

1. Below is a listing of various fast-food restaurants. How many of these restaurants would
   you say you visited in the past two months? Check as many as may apply.

   Restaurant                                      Code
   Taco Bell                                       01
   Hardee’s                                        02
   Kentucky Fried Chicken                          03
   Wendy’s                                         04
   Rally’s                                         05
   Popeye’s Chicken                                06
   Krystal’s                                       07
   Church’s Fried Chicken                          08
   McDonald’s                                      09
   Burger King                                     10
   Back Yard Burgers                               11 √
   Arby’s                                          12
   Sonic                                           13
   Other, please specify                           See code sheet
   Have not visited any of these establishments    20

The Master Code Form: An Example – cont’d
2. In a typical month, how many times would you say you visit
   a fast-food restaurant, such as the ones indicated above?
   (X ONE BOX)

   Response          Code
   One               1
   Two               2 √
   Three             3
   Four              4
   Five              5
   Six               6
   Seven or more     7

The Master Code Form: An Example – cont’d
3. On your last visit to a fast-food restaurant, what was the
   dollar amount you spent on food and beverages?

   Response          Code
   Under $2          1
   $2.01-$4.00       2
   $4.01-$6.00       3 √
   $6.01-$8.00       4
   $8.01-$10.00      5
   $10.01-$12.00     6
   More than $12     7
   Don’t Remember    8

The Essentials of Data Entry
When data is entered by a research team, it is normally done in any
one or a combination of the following four ways:
1) Key driven devices like a computer (PC).
2) Touch Screen Technology.
3) Light Pens.
4) Scanning Technology, e.g. Bubble Shop.

Data Tabulation: One-Way Tabulation
When a research team performs a “one-way” tabulation, it focuses on a
single variable operating in the research study.

Frequency Distribution: An Example
5. In the past TWO WEEKS, which fast-food restaurants in your area
   have you had food or beverage from?
   (27) (DO NOT READ-MULTIPLE RESPONSE)

   Response                       Frequency   Percentage
    1. Andy’s                          3          .7
    2. Arby’s                         25         6.2
    3. Back Yard Burgers              26         6.4
    4. Burger King                    48        11.9
    5. Church’s Fried Chicken          3          .7
    6. Hardee’s                       22         5.4
    7. Kentucky Fried Chicken         39         9.7
    8. McDonald’s                    135        33.4
    9. Sonic                          46        11.4
   10. Subway                         14         3.5
   11. Taco Bell                      67        16.6
   12. Wendy’s                        84        20.8
   13. Other                          43        10.6
   14. Refused                         1          .2
   15. Don’t know                      6         1.5
   16. None                           52        12.9
   17. Pizza Hut                      21         5.2
   18. Rally’s                        14         3.5
   19. Captain D’s                     9         2.2
   Total qualified                   404       100
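The percentages in this example appear to be computed against the 404 qualified respondents rather than against the number of mentions, which is what one would expect for a multiple-response question. The short check below recomputes them from the frequencies in the table; the variable names are hypothetical, and the interpretation of the base is an inference from the arithmetic, not something the slide states.

```python
TOTAL_QUALIFIED = 404  # "Total qualified" row of the table

# Frequencies copied from the table above.
frequencies = {
    "Andy's": 3, "Arby's": 25, "Back Yard Burgers": 26, "Burger King": 48,
    "Church's Fried Chicken": 3, "Hardee's": 22, "Kentucky Fried Chicken": 39,
    "McDonald's": 135, "Sonic": 46, "Subway": 14, "Taco Bell": 67,
    "Wendy's": 84, "Other": 43, "Refused": 1, "Don't know": 6, "None": 52,
    "Pizza Hut": 21, "Rally's": 14, "Captain D's": 9,
}

# Percentage of qualified respondents who gave each response.
percentages = {name: round(100 * n / TOTAL_QUALIFIED, 1)
               for name, n in frequencies.items()}

total_mentions = sum(frequencies.values())
print(percentages["McDonald's"])  # 33.4, matching the table
print(total_mentions)             # 658 mentions from 404 respondents
```

Because respondents could name several restaurants, the mentions outnumber the respondents and the individual percentages add up to more than 100.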

Data Tabulation: Cross-Tabulation
When a research team performs a “cross-tabulation,” it focuses on two
or more variables contained in questions in the research study.

Cross-Tabulation: An Example
Q2 Visits per Month by Q1520 Gender

                            Q1520 (Count)
   Q2                    Female (1)   Male (2)   Row Total       %
   0  None                     1           1           2        .5
   1  One                     27          23          50      11.8
   2  Two                     25          33          58      13.6
   3  Three                   16          25          41       9.6
   4  Four                    38          32          70      16.5
   5  Five                    12          14          26       6.1
   6  Six                     19          15          34       8.0
   7  Seven or more           70          74         144      33.9
   Column Total              208         217         425
   Column %                 48.9        51.1       100.0

Number of Missing Observations: 18
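The margins of this cross-tabulation can be rebuilt from the cell counts alone, which is a useful check when reviewing output. The sketch below recomputes the row totals, column totals, and percentages from the counts in the example; the variable names are hypothetical.

```python
# Cell counts from the example: visits per month -> (female count, male count).
counts = {
    "None": (1, 1), "One": (27, 23), "Two": (25, 33), "Three": (16, 25),
    "Four": (38, 32), "Five": (12, 14), "Six": (19, 15),
    "Seven or more": (70, 74),
}

row_totals = {label: f + m for label, (f, m) in counts.items()}
female_total = sum(f for f, _ in counts.values())
male_total = sum(m for _, m in counts.values())
grand_total = female_total + male_total

# Each row total and column total as a percentage of all valid observations.
row_pct = {label: round(100 * t / grand_total, 1) for label, t in row_totals.items()}
col_pct = (round(100 * female_total / grand_total, 1),
           round(100 * male_total / grand_total, 1))

print(female_total, male_total, grand_total)  # 208 217 425
print(col_pct)                                # (48.9, 51.1)
print(row_pct["Seven or more"])               # 33.9
```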

Summary of Learning Objectives
• Illustrate the process of preparing data for preliminary analysis.
• Demonstrate the procedure for assuring data validation.
• Illustrate the process of editing and coding data obtained through
  survey methods.
• Acquaint the user with data entry procedures.
• Illustrate a process for detecting errors in data entry.
• Discuss techniques used for data tabulation and data analysis.