Secondary Data Problems and Tips to Handle Them in Research and Studies
Don Patrick, Yahoo Contributor Network, Nov 9, 2010

Secondary data is not only numbers; it includes any information and facts collected by organizations or individuals other than the user. Secondary data can be obtained directly from reports published by government agencies such as the Census Bureau from their periodic national surveys, or from published or unpublished research reports, articles, interview reports, field notes, and books. Today, both non-commercial and commercial agencies provide aggregated secondary data sets, either directly accessible to the researcher or sold on CD-ROM media.

The reasons for using secondary data vary from research to research. Common reasons are cost and time savings, the wealth of information available on the topic under research, the purpose of the research, and the expected accuracy of findings. Secondary data is useful at the beginning of research work to see the overall picture of a situation and to define the research problems; it can also be the sole data source for some research endeavors. With secondary data, a researcher is able to narrow down his research question during the initial stages of the research process, filtering through initial hypotheses.

Although a researcher should ordinarily begin by consulting secondary data sources, he should not assume that they are free from errors and flaws. This article points to these issues and to ways to minimize secondary data problems. The main issues relating to secondary data are data validity problems, reliability issues, trustworthiness of data and information, and data source bias. A procedural, but much easier, issue in secondary data research is the copyright issue, which is discussed in the last section of this article.
Validity Problems

The validity issue of secondary data is prominent because it raises questions about the validity of the conclusions drawn from the data. A validity problem in secondary data research arises when the original data collector's or organization's definition of a situation does not match the theoretical definition used by the secondary data user. Two important validity issues are discussed in this article: the construct validity and the content validity of secondary data and information. These concepts are discussed with examples, and a few tips are given to show how secondary data users can handle them.

Construct validity

A construct can be defined as a property that explains the facets of a situation, event, or behavior. The way a researcher initially understands a situation and expresses his or her idea affects the way he or she designs methods to capture the details of the situation. Construct validity seeks agreement between the concepts expressed by the researcher (constructs) and the specific measuring devices or procedures adopted in the research. In other words, construct validity is an assessment of how well the researcher converts his ideas or initial thoughts into actual programs or research measures, and of the extent to which the tests or scales sufficiently assess the theoretical construct as the original researcher assumes. The following examples clarify the construct validity issue. The first two show differences between the construct definitions of the secondary data user and those of the original researcher or agency.

a) A secondary data researcher defines a work injury as minor cuts, bruises, and sprains on the job, while the government or private agency responsible for data collection may define it as a physical injury requiring a visit to a physician. Data generated under the latter definition exclude many injured workers from the statistics and may lead to biased conclusions.
b) Census and statistics departments define the unemployed as individuals actively seeking work in a defined period, while a researcher using secondary data may define them to include individuals not seeking work because jobs are not available, or waiting because suitable jobs are not available in the market.

In the following example, the original researcher assumes that his experimental method rightly captures all facets of the constructs in his study.

c) In 1961, Albert Bandura of Stanford University defined all human behavior as something learned through social imitation and copying rather than inherited through genetic factors. He developed a testing method, the Bobo doll experiment, performed on children to observe their learned behavior. A researcher who wants to study human behavior using information from Bandura's original experiment will have to question the definition of the construct as well as the testing methods, and check whether the test method adopted measures exactly what Bandura claimed to measure. This is exactly what happened when later researchers challenged his theory and raised construct validity issues.

A validity issue can also occur when a secondary data user has to develop proxy (indirect) variables to capture his construct using data from secondary sources. For example, a researcher who wants to study household violence may have to depend on statistics collected by the police, which are based only on reported incidents. It is well known that most household violence incidents go unreported for various reasons; hence, the secondary data user only partially captures the reality in his proxy variables.

The idea of construct validity extends beyond the arguments raised here. In particular, the validity issue in qualitative research is much more complex than in quantitative approaches.
Further discussion of this issue, however, is beyond the scope of this article. Prior to using data, the secondary data user should at least check the following.

a) Look into the definition of a particular construct and decide whether its scope overlaps correctly with the data user's definition.
b) Read the information with caution, look at the research design details, and comment on them. In most articles, design details are available in appendices, including the questionnaire items, for the reader to check.
c) Check the measurements, and decide whether they measure what they are claimed to measure.
d) Check whether the original authors have compared their findings with similar studies in the area and show evidence of similar results.
e) Critically evaluate the definitions of events or situations when developing proxy variables.
f) If possible, contact the authors of the research or the agency responsible for data collection, and seek clarification of their definitions of the constructs and of the ways the data were collected.

In quantitative and experimental research, designs are done in advance, and ways to control validity threats are built into the designs. They are explicitly expressed in quantitative research, and one who uses the information and data is able to check them and decide how well the validity issues have been handled. The secondary data user has to keep a critical eye on secondary data irrespective of the research philosophy followed by the original authors.

Content validity problems

Content validity differs from construct validity in that it refers to whether the items on a test actually test what the researcher expects to test in the content, and whether the test is a representative sample of the research measures of the content. The following examples explain the content validity issue.
a) Assume a researcher wants to measure teachers' knowledge of a new curriculum. If he decides to use secondary data relating to teachers' general knowledge of course curricula, one can raise a content validity issue: he measures general knowledge, not the teachers' knowledge of the new course curriculum.
b) If a researcher wants to measure the mathematical skills of a student group and uses only test results on students' skill in adding numbers, he creates a content validity issue, since addition skills alone cannot measure the much wider domain of mathematical skills.
c) Content validity has also become a serious issue in tests conducted to select candidates for employment. The question there is whether those tests measure the knowledge, skills, and behaviors required by a certain job domain (Williams 1995).

One who uses secondary data generated through such studies has a difficult task. He may have to redefine the concept first according to his own understanding, and look critically at the measurements used to generate the data and information. The following is a list of questions for a secondary data user to ask before using the data and information.

a) What aspects constitute the domain content of the research?
b) Have the measurements been correctly identified to measure the domain content?
c) Do the measurements represent a sufficient sample to cover the content of the domain?

To answer these questions, the data user should first list the content of the domain and then the possible range of measurements. He should crosscheck his definition and measurements against the secondary research data and information he is going to use.

Reliability issue

The next major issue in secondary data is the reliability of the data and their measurements. The reliability concept is also viewed differently in quantitative and qualitative research approaches.
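Before looking at the two views in detail, it may help to see how reliability is commonly quantified in practice. The sketch below (in Python, using only textbook formulas and invented scores, not data from this article) illustrates two standard checks: test-retest reliability as the correlation between two administrations of the same test, and internal consistency as Cronbach's alpha.

```python
from statistics import pvariance


def pearson_r(x, y):
    """Test-retest reliability: correlation between the scores from
    two administrations of the same test to the same individuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(items):
    """Internal consistency: `items` is a list of per-item score lists,
    one score per respondent in each list."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]        # each respondent's total
    item_var_sum = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / pvariance(totals))


# Hypothetical scores for five students who sat the same test twice.
first = [70, 65, 80, 55, 90]
second = [72, 63, 78, 58, 88]
print(round(pearson_r(first, second), 2))  # -> 0.99; a value near 1.0 suggests a stable test
```

A secondary data user would not normally run these computations himself, but knowing what they mean makes it easier to read the reliability coefficients reported in footnotes and appendices of quantitative studies.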
In quantitative research, reliability is viewed as a matter of measurement error and variance. There can always be unobserved parts of events or situations due to measurement errors or the inability to observe through scientific methods. For example, if a test measures the mathematical skill of a student and the student obtains a score of 70, this is the observed measurement of the student's math skills. We can only know the observed score. The question is whether there is an unobserved portion of the student's math skills. If this particular student was unwell on the day of the test and was unable to perform well, is there a way to establish that the measurement was reliable? A reliable measurement should produce the same observation when the experiment is repeated under the same conditions; this tendency toward consistency in repeated measurements is what reliability refers to. If the same test were repeated on the same student under the same conditions, he should get the same score if the above measurement is to be reliable. There are statistical methods to measure reliability, and in research articles and other work they are usually reported in footnotes, appendices, or the text itself. One who uses secondary data can learn from these notes how reliable the data and information are. The secondary data user should check the following in detail before using data, particularly data from quantitative research documents.

a) What methods has the researcher used to guarantee the internal consistency of the data he generated? Has he used the test-retest method or another internal-consistency method?
b) Does the researcher explain in detail the instruments he used, and can the reader check a sample of the instrument?
c) Has he checked test stability, that is, whether individuals vary in their responses when the instrument is administered a second time?
d) Has the researcher taken steps to eliminate test administration errors and scoring errors?

Dealing with data from quantitative research is much easier than dealing with data from qualitative research. Quantitative research methods are primarily intended to test theory, and the researcher works in a deductive manner. The qualitative researcher, on the other hand, tries to understand the meaning of a situation and the lived experiences through the eyes of the participants. According to Stenbacka (2001), situations in qualitative research are revealed in ever-changing circumstances, and hence conventional reliability measurements are not relevant; instead, the dependability and trustworthiness of the researcher's account matter far more.

Trustworthiness of data

This is an issue mainly in qualitative research approaches and the data generated through them. Some scholars reject the concepts of validity and reliability, arguing that there is no reality external to one's perception of a situation or phenomenon, so there is no objective truth against which to compare an account of a situation created through qualitative research. What we need is "the possibility of testing a researcher's account against the world, giving the phenomena that we are trying to understand the chance to prove us wrong" (Maxwell 1996). As Guba and Lincoln (1985) pointed out, the secondary data user has to be assured that the data and findings of a research study inspire high confidence, that they can be applied in other similar contexts, and that the findings would be repeated if the study were replicated in the same way. The secondary data user also has to verify that the data emanated from the participants, and not solely from the researcher. The following list provides a guideline for checking the trustworthiness of qualitative research information and data.

a) What techniques has the researcher used to record information?
Has he used video tapes, recording machines, or transcription to record interviews, what he heard, or what he saw?
b) Does the researcher describe the situation without giving a valid reason for not using those techniques?
c) Does the researcher describe the situation in detail, in a concrete manner, as the events occurred and in chronological order?
d) Is the researcher interpreting the situation based on a pre-set framework, or revealing the participants' understanding and the meaning of the event or situation?
e) Check how the researcher has guarded against false information from participants, since informants may behave the way the researcher wants rather than revealing the reality.
f) Qualitative researchers try to reveal realities in frequently changing situations, so check whether these changes and their effects have been properly described by the researcher.
g) Check whether the researcher has collected data using multiple methods and from a diverse range of settings and individuals, so that he can compare the information and check for patterns surfacing in the interpretation. This method is called triangulation.
h) Check how far the researcher has discussed the situation with people other than the participants who are familiar with the situation or event.
i) How far does the researcher use simple numeric evidence and negative evidence to support the conclusions?

Data source bias

This issue applies to both quantitative and qualitative research approaches. In government agencies in particular, research data may represent the political views of the agency. In government agency reports in almost all countries, exaggerated figures are common, intended to show voters and the rest of the world a better picture than the real conditions. As discussed earlier, a researcher may also present his own account, rather than that of the participants of the study, to prove a point he has in mind.
Further, when sampling subjects, the researcher or data-collecting organization might follow a biased selection of participants for various reasons. The frequent exclusion of poor neighborhoods from government agency studies is a well-known example. The secondary data user needs to check the following carefully to identify data source bias.

a) In the case of government reports and data, cross-check with data from independent studies and with reports from international organizations such as the United Nations.
b) Check for special events or extreme situations that could affect data-collecting agencies. For example, an election could lead to exaggeration of employment data or data related to growth indicators.
c) Check the sampling methods adopted by the data collectors, and whether the distribution of subjects has been covered sufficiently.
d) Check whether the researcher has triangulated the data findings with the participants and with experts.

Copyright Issue

Secondary data is owned by others, and the user is supposed to get permission from the original owner before copying it. Copyright law protects creative expressions reduced to a tangible form, such as books, articles, recorded music, paintings, photos, or screenplays. Copyright procedure in general follows the Berne Copyright Convention. In the United States, it is governed by Title 17 of the U.S. Code. Other countries have their own copyright laws, and U.S. Circular 38a explains the international copyright relations of the United States. Copying content for commenting and critiquing, teaching and training, and study and research is allowed under the "fair use" of information and data. However, the fair use defense does not apply if one wants to republish a research work written using secondary data.

How to Obtain Permission

When the secondary data user has to obtain copyright permission, he should contact the original author or his authorized agent.
Oral permission is legal, but it is better to have the permission documented. The following two agent sites help to speed up the process.

http://www.copyright.com
http://www.icopyright.com

Conclusion

Secondary data and information are an essential part of any study and research. However, the user should know that they are not free from flaws. The major issues in secondary data, as discussed in this article, are validity and reliability, trustworthiness of research accounts, and data source bias. Copyright is a procedural issue, but a very important one when using secondary data. This article discussed important checkpoints and questions for the secondary data user to ask in order to minimize secondary data problems in research work.

References: