Uploaded by Angeline Tan

Data Collection & Validation in Business Research

Topic 7
DATA COLLECTION & VALIDATION
Copyright 2022 © McGraw Hill LLC. All rights reserved. No reproduction or distribution without the prior written consent of McGraw Hill LLC.
Learning Objectives
Understand:
• The institutional approval process
• The procedures of data collection
• The tasks involved in data collection
• The importance of editing raw data to ensure it is complete, accurate, and correctly coded
Institutional Approval
• Nonhuman animals as research participants
  • Approval from the Institutional Animal Care and Use Committee (IACUC)
• Humans as research participants
  • Approval from the Institutional Review Board (IRB)
• Researchers must prepare a research protocol
• The IACUC or the IRB must determine whether the research study is ethically acceptable
• If the study procedures conform to acceptable practices
  • The IACUC or the IRB will approve the study
  • The researcher can then proceed with data collection

Pilot Study (1 of 2)
• Before conducting a study, it is strongly recommended that you conduct a pilot study
• Pilot study: a preliminary study conducted on a few participants prior to the actual research study
• A pilot study can provide a great deal of information
  • Whether the instructions are clear
  • Whether the independent variable (IV) manipulation produced the intended effect
  • Whether the dependent variable is sufficiently sensitive
• It also gives the researcher experience with the procedure
Pilot Study (2 of 2)
• When conducting an internet-based study
  • Complete the online study tasks yourself
  • Have a few pilot participants complete the tasks
  • Find out whether the study works properly in your browser
  • Check whether the data are returned to you in an understandable form
• If a problem is not detected until after the data have been collected
  • It might have influenced the results of the study
• If changes are made to the study after receiving IRB approval, the IRB must approve the intended changes

Collect the Data
Train the data collectors
Determine the data collection timeline
Implement research task(s) process(es)
Invite chosen participants
Activate the research tasks
Remind participant to complete research task(s)
Enter the data
Collect the Data
Determine the data collection timeline
• The dates and times for training of data collectors (if applicable).
• The activation dates and times (start and stop times) of each
data collection task.
• When data entry starts and is expected to finish, to include both
automated and manual entry.
• When data editing starts and is expected to finish.
• When the clean data file will be ready for processing.
Collect the Data
Implement research task(s) process(es)
• Instrument disposition includes the process(es) by which the measurement instrument for each research task is distributed to the participant and the output of each completed research task is returned to the researcher in charge of the study.
• These processes sometimes need instructions.
Collect the Data
Activate the research tasks
• Survey activation is the decision that launches the survey; it
indicates the researcher has addressed all known
measurement instrument problems and the process is as
error-free as he or she can make it.
Collect the Data
Remind participant to complete research task(s)
• Reminders are a key task of survey research. Rarely does a participant complete a measurement instrument when it is first presented.
• Reminders often use email or phone contact. Multiple reminders are possible.
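One way to decide who receives a reminder is to compare the invitation list with the completions received so far. The sketch below is only an illustration: the participant IDs, the reminder counts, and the cap of three reminders are hypothetical and do not come from the slides.

```python
# Hypothetical participant IDs; a real study would pull these from its survey platform.
invited = ["P01", "P02", "P03", "P04", "P05"]
completed = {"P02", "P05"}
reminders_sent = {"P01": 1, "P03": 3, "P04": 0}

MAX_REMINDERS = 3  # assumed cap, not a rule from the source


def needs_reminder(invited, completed, reminders_sent, max_reminders=MAX_REMINDERS):
    """Return invited participants who have not completed the task
    and have not yet reached the reminder cap."""
    return [
        pid for pid in invited
        if pid not in completed and reminders_sent.get(pid, 0) < max_reminders
    ]


to_remind = needs_reminder(invited, completed, reminders_sent)
print(to_remind)  # ['P01', 'P04']
```

P03 is left out because it has already received the assumed maximum number of reminders.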
Collect the Data
Enter the data
• Data entry is a set of processes that includes coding and data file creation. It converts the gathered data into a medium for data viewing and analysis.
Missing Data
• Missing data are information from a participant or case that is not available for one or more variables of interest.
• Missing data typically occur in surveys
  • when respondents accidentally skip, refuse to answer, or do not know the answer to an item on the questionnaire.
  • when researcher error corrupts data files.
• There are three basic types of missing data:
  • Data missing completely at random (MCAR)
  • Data missing at random (MAR)
  • Data not missing at random (NMAR)
Missing Data Correction Techniques
• There are three basic techniques for dealing with missing data:
1) Listwise deletion: cases with missing data on a variable are excluded from the sample for all analyses that use that variable.
2) Pairwise deletion: each statistic is estimated using all cases that have data for the variable or pair of variables involved, so a case is left out only where its value is missing.
3) Predictive replacement (replacement of missing values with estimated scores): missing data are predicted from observed values on another variable; the predicted value is used to replace the missing data.
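The three techniques can be sketched in a few lines of plain Python. The records below are hypothetical, and the straight-line (least-squares) prediction is just one possible way to carry out predictive replacement; the slides do not prescribe a particular model.

```python
from statistics import mean

# Hypothetical survey records; None marks a missing value.
cases = [
    {"id": 1, "age": 30, "income": 50},
    {"id": 2, "age": 40, "income": None},
    {"id": 3, "age": None, "income": 70},
    {"id": 4, "age": 50, "income": 90},
]


def listwise(cases, variables):
    """Keep only cases that are complete on every listed variable."""
    return [c for c in cases if all(c[v] is not None for v in variables)]


def pairwise_mean(cases, variable):
    """Compute a mean from every case that has a value for this one variable."""
    return mean(c[variable] for c in cases if c[variable] is not None)


def predictive_replacement(cases, target, predictor):
    """Fill missing target values from a least-squares line fitted on complete cases."""
    complete = listwise(cases, [target, predictor])
    xs = [c[predictor] for c in complete]
    ys = [c[target] for c in complete]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    filled = []
    for case in cases:
        case = dict(case)  # copy so the original records stay untouched
        if case[target] is None and case[predictor] is not None:
            case[target] = intercept + slope * case[predictor]
        filled.append(case)
    return filled


print([c["id"] for c in listwise(cases, ["age", "income"])])  # [1, 4]
print(pairwise_mean(cases, "age"))  # 40
print(predictive_replacement(cases, "income", "age")[1]["income"])  # 70.0
```

Listwise deletion keeps two of the four cases, while the pairwise mean of age still uses three, which is the practical difference between the two approaches.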
Data Preparation
Data preparation includes two tasks:
• Editing data: the first step in data preparation is to edit the raw data. Editing detects errors and omissions, corrects them when possible, and certifies that maximum data quality standards are achieved. Its purpose is to guarantee that data are accurate, complete, and appropriately coded.
• Post-collection coding of data
Field Editing
• In large field projects with multiple data collectors, editing is the responsibility of the field supervisor. It should be done soon after the data have been collected.
• During the stress of data collection, data collectors often use ad hoc abbreviations and special symbols. If the forms are not completed quickly, the field interviewer may not recall what the participant said. Therefore, reporting forms should be reviewed regularly. When entry gaps are present, a callback should be made rather than guessing what the respondent probably said.
• The field supervisor carries out data validation by re-interviewing some percentage of the respondents on some questions to verify that they participated. Ten percent is the typical share used in data validation.
• Data validation attempts to verify that the research protocols for avoiding data errors were followed and that the data are real, by identifying fake or inaccurate data.
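Drawing the callback sample for validation might look like the sketch below. The respondent IDs are hypothetical, and choosing the 10 percent at random is an assumption; the slides only say that some percentage of respondents is re-interviewed.

```python
import random


def select_for_validation(respondent_ids, fraction=0.10, seed=None):
    """Randomly pick a share of respondents for a validation re-interview."""
    rng = random.Random(seed)  # a seed makes the draw repeatable for auditing
    k = max(1, round(len(respondent_ids) * fraction))
    return sorted(rng.sample(respondent_ids, k))


respondents = [f"R{n:03d}" for n in range(1, 201)]  # hypothetical IDs R001..R200
callbacks = select_for_validation(respondents, fraction=0.10, seed=42)
print(len(callbacks))  # 20
```

Fixing the seed lets a supervisor reproduce exactly which respondents were selected.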
Coding Data
• Data coding means systematically reorganizing raw data into a format that statistical software can use.
• The coding procedure is a set of rules stating that you will assign certain numbers to variable attributes.
• A codebook is a document describing the coding procedure and the computer file location of data for variables in a specific format.
• Researchers should begin to think about a coding procedure and codebook before collecting any data.
  • Many researchers precode a questionnaire before collecting any data.
  • If precoding is not done, the first step after collecting data is to create a codebook.
Precoding
• Precoding means assigning codebook codes to variables in a study and recording them on the questionnaire.
• With a precoded instrument, the codes for variable categories are accessible directly from the questionnaire.
Partial Coding Scheme
Variable ID  Location  Variable Label                           Response Codes                                                  Variable Type
A1           1         Interviewer Number                       Assigned                                                        Nominal
A2           2         Participant ID                           Assigned                                                        Nominal
Q5           10        Evaluation of Current Policy             1=Excellent, 2=Good, 3=Fair, 4=Poor                             Ordinal
Q6a          11        Reason for Purchase-Bought Home          1=Yes, 2=No                                                     Nominal
Q6b          12        Reason for Purchase-Birth of Child       1=Yes, 2=No                                                     Nominal
Q6c          13        Reason for Purchase-Death of Relative    1=Yes, 2=No                                                     Nominal
Q6d          14        Reason for Purchase-Promoted             1=Yes, 2=No                                                     Nominal
Q6e          15        Reason for Purchase-Changed Job/Carrier  1=Yes, 2=No                                                     Nominal
C1           30        Gender                                   1=Male, 2=Female, 3=Other, 9=Missing                            Nominal
C2           31        Marital Status                           1=Married, 2=Widowed, 3=Divorced, 4=Separated, 5=Never Married  Nominal
C3           32        Housing Ownership                        1=Own, 2=Rent, 3=Other, 9=Missing                               Nominal
C4           33        Birth Year                               2 Digits                                                        Nominal
C5           34        5-Digit Zipcode                          5-Digit Code, 99999=Missing                                     Nominal
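A codebook of this kind maps naturally onto a lookup structure that can also be used to catch invalid entries. The sketch below copies three rows from the coding scheme; the dictionary layout, the function name `invalid_codes`, and the sample record are illustrative assumptions rather than part of the source.

```python
# Three rows of the coding scheme expressed as a dictionary.
CODEBOOK = {
    "Q5": {"label": "Evaluation of Current Policy", "location": 10,
           "codes": {1: "Excellent", 2: "Good", 3: "Fair", 4: "Poor"}},
    "Q6a": {"label": "Reason for Purchase-Bought Home", "location": 11,
            "codes": {1: "Yes", 2: "No"}},
    "C3": {"label": "Housing Ownership", "location": 32,
           "codes": {1: "Own", 2: "Rent", 3: "Other", 9: "Missing"}},
}


def invalid_codes(record, codebook=CODEBOOK):
    """Return the entries in a record whose value is not a code listed in the codebook."""
    return {
        var: value for var, value in record.items()
        if var in codebook and value not in codebook[var]["codes"]
    }


record = {"Q5": 2, "Q6a": 3, "C3": 9}  # hypothetical entered record
print(invalid_codes(record))  # {'Q6a': 3}
```

The value 3 is flagged for Q6a because only 1 (Yes) and 2 (No) are defined for that variable.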
Post-Coding Open-Ended Questions
• For open-ended questions, researchers must categorize responses after the data are collected.
• The question below illustrates an open-ended question. After preliminary evaluation, response categories were created for that item. These categories appear in the overall coding scheme (Q6a through Q6e).
6. What prompted you to purchase your most recent life insurance policy?
_______________________________
_______________________________
_______________________________
_______________________________
_______________________________
_______________________________
_______________________________
Appropriately Coded for Analysis
Categories should be:
• Exhaustive
• Appropriate to the research problem
• Mutually exclusive
• Derived from one classification principle
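Two of these rules, exhaustiveness and mutual exclusivity, can be checked mechanically when categories are numeric ranges. The sketch below is a hypothetical illustration using invented age brackets; the function name and the data are assumptions, not material from the slides.

```python
def check_categories(categories, domain):
    """Check integer-range categories against a domain of possible values.

    categories maps a code to an inclusive (low, high) range.
    Returns (uncovered, overlapping): values that no category covers and
    values that fall into more than one category.
    """
    uncovered, overlapping = [], []
    for value in domain:
        hits = [code for code, (low, high) in categories.items() if low <= value <= high]
        if not hits:
            uncovered.append(value)
        elif len(hits) > 1:
            overlapping.append(value)
    return uncovered, overlapping


# Hypothetical age brackets: 30 falls in two categories, 66 and 67 in none.
flawed = {1: (18, 30), 2: (30, 45), 3: (46, 65)}
print(check_categories(flawed, range(18, 68)))  # ([66, 67], [30])

# A repaired scheme is both exhaustive and mutually exclusive over the same domain.
repaired = {1: (18, 29), 2: (30, 45), 3: (46, 65), 4: (66, 67)}
print(check_categories(repaired, range(18, 68)))  # ([], [])
```

An empty first list means the scheme is exhaustive over the tested domain; an empty second list means it is mutually exclusive.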
Handling “Don’t Know” Responses
• When the number of “don’t know” (DK) responses is low, it is not a problem. However, if the number is high, it may mean that the question was poorly designed, too sensitive, or too challenging for the respondent.
• The best way to deal with undesired DK answers is to design better questions at the beginning.
• Example: Do you have a productive relationship with your present salesperson? The word “productive” may cause confusion. Because no operational definition is provided for the word, respondents may choose the DK option as a way to express their confusion.
• If “productive” were defined for the participant and we still had a high percentage of DK responses, there might be another underlying problem behind the DK choice.
• If a DK response is legitimate, it should be kept as a separate reply category.
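A quick way to see whether DK responses are "low" or "high" is to compute the DK share per item. In the sketch below, the DK code of 8, the 20 percent review threshold, and the response data are all assumptions made for illustration; the slides do not specify any of them.

```python
DK = 8               # assumed code for "don't know"
DK_THRESHOLD = 0.20  # assumed cutoff for flagging an item for review

# Hypothetical coded responses for two questionnaire items.
responses = {
    "Q1": [1, 2, 8, 1, 2, 1, 1, 2, 1, 2],
    "Q2": [8, 8, 1, 8, 2, 8, 1, 2, 8, 1],
}


def dk_rates(responses, dk_code=DK):
    """Return the share of DK responses for each item."""
    return {item: values.count(dk_code) / len(values) for item, values in responses.items()}


rates = dk_rates(responses)
flagged = [item for item, rate in rates.items() if rate > DK_THRESHOLD]
print(rates)    # {'Q1': 0.1, 'Q2': 0.5}
print(flagged)  # ['Q2']
```

A flagged item is a prompt to re-examine the question wording, not proof that the DK answers are illegitimate.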
Recoding
• We may find unexpected patterns in our preliminary examination of the data that require us to recode variables to better reflect the responses.
• Recoding variables involves developing new mapping rules and assigning new codes based on the merging of initial variable categories.
• If you recode a 5-point agreement scale to a 3-point scale (agree, neutral, disagree), you will not reduce the data level or its permissible statistical operations. But if you reduced an interval scale to ordinal, or a ratio scale to interval or ordinal, you would reduce the data level of the initial variable and change the statistical operations that could be performed for that variable.
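The 5-point to 3-point recode amounts to a mapping rule applied to every value. The sketch below assumes the original scale runs from 1 (strongly agree) to 5 (strongly disagree) and that 9 marks a missing value; both are illustrative assumptions.

```python
# Assumed original scale: 1=Strongly agree, 2=Agree, 3=Neutral, 4=Disagree, 5=Strongly disagree.
# New codes: 1=Agree, 2=Neutral, 3=Disagree.
RECODE_5_TO_3 = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}


def recode(values, mapping):
    """Apply a recoding map; values with no mapping rule become None."""
    return [mapping.get(v) for v in values]


original = [1, 2, 3, 4, 5, 2, 9]  # 9 is a hypothetical missing-value code
print(recode(original, RECODE_5_TO_3))  # [1, 1, 2, 3, 3, 1, None]
```

Keeping the mapping in one dictionary documents the new rule in a form that can be copied into the codebook.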
Cleaning Data
• Errors made when coding or entering the data into a computer threaten the validity of the measures and cause misleading results.
• Cleaning the data is the process of checking the accuracy of the coding of the data.
• Many researchers code 10-15 percent of the data a second time. If there are no coding errors in the recoded sample, the researcher can proceed. If there are, all the coding must be checked.
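Comparing the second coding pass against the first can be done with a simple lookup. The case IDs and codes below are hypothetical, and the function name is an assumption introduced for this sketch.

```python
def coding_discrepancies(first_pass, second_pass):
    """Return {case_id: (first_code, second_code)} for recoded cases whose codes differ."""
    return {
        case_id: (first_pass[case_id], code)
        for case_id, code in second_pass.items()
        if first_pass[case_id] != code
    }


first_pass = {"P01": 1, "P02": 3, "P03": 2, "P04": 4}  # original coding of all cases
second_pass = {"P02": 3, "P04": 2}                     # hypothetical recoded subsample

errors = coding_discrepancies(first_pass, second_pass)
print(errors)  # {'P04': (4, 2)}
if errors:
    print("Discrepancies found: all the coding must be checked.")
```

An empty result corresponds to the case where the researcher can proceed with analysis.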