Secondary Data Problems and Tips to Handle Them in Research

Don Patrick, Yahoo Contributor Network




Nov 9, 2010
Secondary data consists not only of numbers but also of information and facts collected by organizations or individuals other than the user. Secondary data can be obtained directly from reports published by government agencies such as the Census Bureau, drawn from their periodic national surveys, or from published and unpublished research reports, articles, interview reports, field notes, and books. There are now both commercial and non-commercial agencies that provide aggregated secondary data sets, directly accessible by the researcher or available for purchase on CD-ROM media. The reasons for using secondary data vary from study to study; common ones are cost and time savings, the wealth of information available on the topic under research, the purpose of the research, and the expected accuracy of findings.

Secondary data is useful at the beginning of research work for seeing the overall picture of a situation and defining the research problem, and it can also be the sole data source for some research endeavors. With secondary data, a researcher is able to narrow down his research question during the initial stages of the research process by filtering through initial hypotheses. Although a researcher should essentially begin by consulting secondary data sources, he should not assume that they are free from errors and flaws. This article points out these issues and ways to minimize secondary data problems. The main issues relating to secondary data are data validity problems, reliability issues, trustworthiness of data and information, and data source bias. A procedural, but much easier, issue in secondary data research is copyright, which is discussed in the last section of this article.
Validity Problems
The validity issue is very prominent in secondary data because it raises questions about the validity of the conclusions drawn from the data. A validity problem in secondary data research arises when the definitions used by the original data collector or organization do not match the theoretical definitions of the secondary data user. Two important validity issues are discussed in this article: the construct validity and the content validity of secondary data and information. These concepts are discussed with examples, and a few tips are given to show how secondary data users can handle them.
Construct validity
The term construct can be defined as a property that explains the facets of a situation, event, or behavior. The way a researcher initially understands a situation and expresses his or her ideas affects the way he or she designs methods to capture the details of that situation. Construct validity seeks agreement between the concepts expressed by the researcher (constructs) and the specific measuring devices or procedures adopted in the research. In other words, construct validity is an assessment of how well the researcher converts his ideas or initial thoughts into actual programs or research measures, and of the extent to which the tests or scales sufficiently assess the theoretical construct as the original researcher assumes. The following examples clarify the construct validity issue; the first two show how the construct definition of the secondary data user can differ from that of the original researcher or agency.
a) A secondary data researcher defines a work injury as minor cuts, bruises, and sprains on the job, while the government or private agency responsible for data collection may define it as a physical injury requiring a visit to a physician. Data generated under the latter definition exclude many injured workers, and may therefore lead to biased conclusions.
b) Census and statistics departments define the unemployed as individuals actively seeking work within a defined period, while a researcher using secondary data may define unemployment to include individuals who have stopped seeking work, or who are waiting, because suitable jobs are not available in the market.
In the following example, the original researcher assumes that his experimental method rightly captures all facets of the constructs in his study.
c) In 1961, Albert Bandura of Stanford University characterized human behavior as learned through social imitation and copying, rather than inherited through genetic factors. He developed a testing method, the Bobo doll experiment, which was performed on children to observe their learned behavior. A researcher who wants to study human behavior using information from Bandura's original experiment will have to question the definition of the construct as well as the testing method, and check whether the test actually measures what Bandura claimed it measured. This is exactly what happened when later researchers challenged his theory and raised construct validity issues.
A validity issue can also occur when a secondary data user has to develop proxy (indirect) variables to capture his construct using data from secondary sources. For example, a researcher who wants to study household violence may have to depend on statistics collected by the police department, which records only reported incidents. It is well known that most household violence incidents go unreported for various reasons. Hence, the secondary data user's proxy variables capture reality only partially.
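To see how much of the reality a reported-incidents proxy can miss, a data user can run a back-of-the-envelope sensitivity check. Everything in the sketch below is hypothetical: the incident count and the assumed reporting rates are illustrative numbers, not real statistics.

```python
# Sensitivity check for a proxy variable built from reported incidents.
# All figures are hypothetical and for illustration only.
reported_incidents = 120  # hypothetical count from police records

# Assumed (hypothetical) reporting rates: the fraction of actual
# incidents that ever reach the police records.
for reporting_rate in (0.25, 0.50, 0.75):
    implied_total = reported_incidents / reporting_rate
    undercount = implied_total - reported_incidents
    print(f"if {reporting_rate:.0%} are reported, implied total = "
          f"{implied_total:.0f} (proxy misses {undercount:.0f})")
```

Running a range of plausible reporting rates makes the proxy's limits explicit, instead of leaving them implicit in the analysis.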
The idea of construct validity extends beyond the arguments raised here. In particular, the validity issue in qualitative research is much more complex than in quantitative approaches. Further discussion of this issue, however, is beyond the scope of this article.
Prior to using the data, the secondary data user should at least check the following.
a) Look into the definition of a particular construct and decide whether the scope of that definition correctly overlaps with the data user's own definition.
b) Read the information with caution, and examine and comment on the research design details. In most articles, design details are available in appendices, including the questionnaire items, for the reader to check.
c) Check the measurements, and decide whether they measure what they claim to measure.
d) Check whether the original authors have compared their findings with similar studies in the area, and show evidence of similar results.
e) Critically evaluate the definitions of events or situations when developing proxy variables.
f) If possible, contact the authors of the research, or the agency responsible for data collection, and seek clarification about their definitions of the constructs and the ways the data were collected.
In quantitative and experimental research, designs are prepared in advance, and ways to control validity threats are built into them. These controls are explicitly expressed in quantitative research, so a user of the information and data can check them and decide how well the validity issues have been handled. The secondary data user has to keep a critical eye on secondary data irrespective of the research philosophy followed by the original authors.
Content validity problems
The idea of content validity differs from construct validity in that it refers to whether the items on a test actually test what the researcher expects to test in the content, and whether the test is a representative sample of the research measures of the content. The following examples illustrate the content validity issue.
a) Let's assume a researcher wants to measure teachers' knowledge of a new curriculum. If he decides to use secondary data on teachers' general knowledge of course curricula, one can raise a content validity issue, as he is measuring general knowledge, not the teachers' knowledge of the new course curriculum.
b) If a researcher wants to measure the mathematical skills of a student group, and he uses only test results on students' skill in adding numbers, he creates a content validity issue, since addition skills alone cannot measure the broader domain of mathematical skills.
c) Content validity has also become a serious issue in tests conducted to select candidates for employment. The question there is whether those tests measure the knowledge, skills, and behaviors required by a certain job domain (Williams 1995).
One who uses secondary data generated through such studies has a difficult task. He may have to redefine the concept first according to his own understanding, and then critically examine the measurements used to generate the data and information. The following is a list of questions for a secondary data user to ask before using the data and information.
a) What aspects constitute the domain content of the research?
b) Have the measurements been correctly identified to measure the domain content?
c) Do the measurements represent a sufficient sample to cover the content of the domain?
To answer these questions, the data user should first list the content of the domain, and then the possible range of measurements. He should then cross-check his definition and measurements against the secondary research data and information he is going to use.
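This cross-check can be made concrete with a small sketch. The domain content and measured items below are hypothetical, echoing the mathematical-skills example above; the point is simply to compare the two lists systematically rather than by impression.

```python
# Hypothetical domain-coverage check for content validity: compare the
# defined content domain against what the secondary test actually measured.
domain = {"addition", "subtraction", "multiplication", "division", "fractions"}
measured = {"addition", "subtraction"}  # skills covered by the secondary test

coverage = len(domain & measured) / len(domain)  # fraction of domain covered
missing = sorted(domain - measured)              # content with no measurement

print(f"domain coverage: {coverage:.0%}")
print("unmeasured content:", ", ".join(missing))
```

A coverage figure this low signals the same content validity problem as example b): addition and subtraction alone do not represent the mathematical-skills domain.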
Reliability issue
The next major issue in secondary data is the reliability of the data and their measurements. The concept of reliability is also viewed differently in quantitative and qualitative research approaches. In quantitative research, reliability is viewed as a matter of measurement error and variance. There can always be unobserved parts of events or situations, due to measurement errors or the inability to observe them through scientific methods. For example, if a test measures the mathematical skill of a student and the student obtains a score of 70, this is the observed measurement of the student's math skills. We can only know the observed score. The question is whether there is an unobserved portion of the student's math skills. If this particular student was unwell on the day of the test and unable to perform well, is there a way to establish that the measurement was reliable? A reliable measurement should produce the same observation when the experiment is repeated under the same conditions; this tendency toward consistency in repeated measurements is referred to as reliability. If the same test were repeated on the same student under the same conditions, he should get the same score if the above measurement is to be reliable. There are statistical methods to measure reliability, and in research articles and other work they are usually reported in footnotes, appendices, or the content itself. A secondary data user can judge from these notes how reliable the data and information are.
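One such statistical check the secondary data user can replicate is the test-retest correlation: administer the same test twice under comparable conditions and correlate the two sets of scores. The sketch below uses hypothetical scores and a hand-rolled Pearson correlation; a value near 1 indicates consistent, hence reliable, measurement.

```python
# Test-retest reliability as a Pearson correlation between two
# administrations of the same test. The scores are hypothetical.
test1 = [70, 85, 60, 90, 75]  # first administration
test2 = [72, 83, 62, 88, 76]  # second administration, same students

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson(test1, test2)
print(f"test-retest reliability r = {r:.3f}")  # near 1 -> consistent scores
```

In published research, the same quantity is usually reported directly as a reliability coefficient, so the data user often only has to locate and interpret it.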
The secondary data user should check the following in detail prior to using data, particularly from quantitative research documents.
a) What methods has the researcher used to guarantee the internal consistency of the data he generated? Has he used the test-retest method, or any other internal consistency method?
b) Has the researcher explained in detail the instruments he used, and can the reader check a sample of the instrument?
c) Has he checked the test stability, that is, whether individuals vary in their responses when the instrument is administered a second time?
d) Has the researcher taken steps to eliminate test administration errors and scoring errors?
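For item a) above, one widely used internal-consistency statistic that the data user can recompute, when item-level responses are published, is Cronbach's alpha. The questionnaire scores below are hypothetical; the formula itself, alpha = k/(k-1) * (1 - sum of item variances / variance of totals), is standard.

```python
from statistics import pvariance

# Cronbach's alpha for internal consistency. scores[respondent][item]
# holds hypothetical answers to a 4-item questionnaire from 5 respondents.
scores = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [4, 4, 5, 4],
]
k = len(scores[0])  # number of items
item_variances = [pvariance(item) for item in zip(*scores)]
total_variance = pvariance([sum(row) for row in scores])
alpha = (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Values above roughly 0.7 are conventionally taken as acceptable internal consistency; a published study reporting a much lower alpha is a warning sign for the secondary data user.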
Dealing with data from quantitative research is much easier than dealing with data from qualitative research. Quantitative research methods are primarily intended to test theory, and the researcher works in a deductive manner. The qualitative researcher, on the other hand, tries to understand the meaning of a situation and lived experiences through the eyes of the participants. According to Stenbacka (2001), qualitative research deals with realities revealed in ever-changing situations, and hence conventional reliability measurements are not relevant. Instead, the dependability and trustworthiness of the researcher's account are much more important in qualitative research.
Trustworthiness of data
This is an issue mainly in qualitative research approaches and the data generated through them. Some scholars reject the concepts of validity and reliability, arguing that there is no reality external to one's perception of a situation or phenomenon, so there is no objective truth against which to compare an account of a situation created through qualitative research. What we need instead is "the possibility of testing a researcher's account against the world, giving the phenomena that we are trying to understand the chance to prove us wrong" (Maxwell 1996). As Guba and Lincoln (1985) pointed out, the secondary data user has to be assured that the data and findings of a study inspire high confidence, that they can be applied in other similar contexts, and that the findings would be repeated if the study were replicated in the same way. The secondary data user also has to verify that the data emanated from the participants, and not solely from the researcher. The following list provides a guideline for checking the trustworthiness of qualitative research information and data.
a) What techniques has the researcher used to record information? Has he used video tapes, recording devices, or transcription to record interviews and what he heard or saw?
b) If the researcher has not used such techniques, has he given valid reasons for not doing so?
c) Is the researcher describing the situation in detail, in a concrete manner, as the events occurred and in chronological order?
d) Is the researcher interpreting the situation based on a pre-set framework, or revealing the participants' understanding and the meaning of the event or situation?
e) Check how the researcher has guarded against false information from participants, since informants may behave the way the researcher wants rather than revealing the reality.
f) Qualitative researchers try to reveal realities in frequently changing situations, so check whether these changes and their effects have been properly described by the researchers.
g) Check whether the researcher has collected data using multiple methods and from a diverse range of settings and individuals, so that he can compare the information and check for patterns surfacing in interpretation. This method is called triangulation.
h) Also check how far the researcher has discussed the situation with people familiar with the situation or event other than the participants.
i) How far does the researcher use simple numeric evidence, and negative evidence, to support the conclusions?
Data source bias
This issue applies to both quantitative and qualitative research approaches. In government agencies in particular, research data can reflect the political views of the agency; government reports in many countries exaggerate data to show voters and the rest of the world a better picture than the real conditions. As discussed earlier, a researcher may also present his own account, rather than the participants', to prove a point he has in mind. Further, when sampling subjects, the researcher or data-collecting organization might follow a biased selection of participants for various reasons. The frequent exclusion of poor neighborhoods from government agency studies is a well-known example.
The secondary data user needs to check the following carefully to identify data source bias.
a) In the case of government reports and data, cross-check with data from independent studies and reports from international organizations such as the United Nations.
b) Check for special events or extreme situations that could affect data-collecting agencies. For example, an election could lead to exaggeration of employment data or data related to growth indicators.
c) Check the sampling methods adopted by the data collectors, and whether the distribution of subjects has been covered sufficiently.
d) Check whether the researcher has triangulated the findings with the participants and experts.
Copyright Issue
Secondary data is owned by others, and the user is supposed to obtain permission from the original owner before copying it. Copyright law protects creative expressions reduced to a tangible form, such as books, articles, recorded music, paintings, photos, or screenplays. Copyright procedure in general follows the Berne Copyright Convention. In the United States, it is governed by Title 17 of the U.S. Code. Other countries have their own copyright laws, and U.S. Circular 38a explains the international copyright relations of the United States. Copying content for commenting and critiquing, teaching and training, and study and research is allowed under the "fair use" of information and data. However, the fair use defense does not apply if one wants to republish a research work written using secondary data.
How to Obtain Permission
When the secondary data user has to obtain copyright permission, he should contact the original author or his authorized agent. Oral permission is legal, but it is better to have the permission documented. The following two agent sites help to speed up the process.
http://www.copyright.com
http://www.icopyright.com
Conclusion
Secondary data and information are an essential part of any study and research, but the user should know that they are not free from flaws. The major issues in secondary data, as discussed in this article, are validity and reliability, the trustworthiness of research accounts, and data source bias. Copyright is a procedural issue, but a very important one when using secondary data. This article has discussed important checkpoints and questions for the secondary data user to ask in order to minimize secondary data problems in research work.