Reliability and Validity of Research Instruments
Edwin Kubai
UNICAF University, Zambia
Correspondence: kubaiedwin@yahoo.com
Introduction
This paper focuses on two terms, reliability and validity, as used in the field of
educational research. When conducting any educational study, the design and measurement of
the research instruments is essential, especially for novice researchers. The data collection
tools (research instruments) should be designed in such a way that they accurately measure the
intended construct under investigation and ensure the meaningfulness of the study findings.
This greatly enhances the credibility and trustworthiness of the research findings, especially
if the study is repeated by different investigators under the same conditions or with different
research instruments measuring the same construct. Reliability and validity are terms used in
virtually every investigation, yet novice researchers find them difficult to differentiate and
struggle to explain to their audience whether their research instruments meet the minimum
thresholds for reliability and validity. It has been noted with concern that many novice
researchers fail to clarify how reliability and validity were achieved in their studies, either
because of insufficient knowledge of the concepts or because they omit them from their research
methodology altogether. This paper attempts to clarify issues related to the reliability and
validity of research instruments so as to enhance the credibility and transferability of
research findings. The next section defines the concepts of validity and reliability as used in
designing research instruments.
Definition of Reliability and Validity
According to Drost (2011), reliability is “the extent to which measurements are
repeatable when different people perform the measurement on different occasions, under
different conditions, supposedly with alternative instruments which measure the construct or
skill”. It can also be defined as the degree to which the measure of a construct is consistent
or dependable. For instance, when several people guess your weight, the values they give will
not necessarily be correct, since they will be inconsistent with the accurate value; such a
measurement is said to be unreliable. If, instead, a weighing scale is used by different people
to obtain your weight, the same value is likely to be returned every time a measurement is
taken, and this measurement would be said to be reliable. Validity is “the extent to which a
measure adequately represents the underlying construct that it is supposed to measure”
(Drost, 2011). The term construct refers to the skill, knowledge, attribute or attitude that
the researcher is investigating. For instance, if a researcher wanted to measure compassion,
it would be vital to know whether the measure actually captures compassion rather than empathy,
because the two terms are closely related. Some constructs under investigation are unobservable
(they do not exist as directly measurable quantities), so it is important to develop a scale
that consistently and precisely measures the intended unobservable construct. Reliability and
validity are psychometric properties of measurement scales that are very important in
estimating the adequacy and accuracy of scientific research procedures, as noted by Bajpai and
Bajpai (2014). The next section discusses the types of reliability and how to use them in
designing instruments for educational research.
Reliability
From the previous section, reliability has been defined as the stability of measurement over
the variety of conditions under which the results should be obtained (Nunnally, 1978). It is
basically the repeatability or replicability of research findings. When a study is conducted by
a researcher under certain conditions, and the same study done a second time yields the same
results, the data are said to be reliable. According to Drost (2011), the reliability of data
from research instruments is affected by two kinds of error: random error and systematic error.
Random error is attributed to a set of unknown and uncontrollable external factors that
randomly influence some observations but not others. For example, respondents who happen to be
in a good mood might respond more positively to constructs like self-esteem, happiness and
satisfaction than respondents in a bad mood. Random error is seen as noise in measurement and
is therefore usually ignored. Systematic error is introduced by factors that systematically
affect all observations of a construct across the entire sample. Systematic error is considered
a bias in measurement and should be corrected to yield better results for the sample. The best
way to estimate reliability is to measure the associations between tests, items and raters by
calculating a reliability coefficient (Rosenthal and Rosnow, 1991). The following are the types
of reliability:
Test-retest reliability
It is a measure of consistency between measurements of the same construct administered to the
same sample at two different points in time (Drost, 2011). If the correlation between the two
sets of tests is significant, the observations have not changed substantially; the aspect of
time is therefore very critical for this type of reliability.
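
As an illustration, here is a minimal sketch in Python, using hypothetical score vectors, of
estimating test-retest reliability as the Pearson correlation between two administrations of
the same instrument:

```python
import numpy as np

# Hypothetical scores for the same ten respondents on two occasions.
time1 = np.array([12, 15, 9, 20, 18, 11, 14, 16, 10, 13])
time2 = np.array([13, 14, 10, 19, 18, 12, 15, 15, 11, 12])

# Test-retest reliability: Pearson correlation between the two administrations.
r = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest reliability: r = {r:.2f}")
```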
Split-half reliability
Heale and Twycross (2015) defined split-half reliability as a measure of consistency between
two halves of a construct measure. For example, if a researcher uses a ten-item measure for a
construct, the items are divided into two halves, typically the sets of even-numbered and
odd-numbered items. It is assumed that all the items measuring the construct are available and
are administered within the same time period, which minimizes random error. The correlation
between the two halves is then obtained to determine the coefficient of reliability. A
practical advantage of this method is that it is cheaper and more easily obtained than
test-retest reliability, where the researcher has to administer the instrument again on a later
occasion.
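
A minimal sketch, assuming a hypothetical matrix of item responses, of splitting a ten-item
measure into odd- and even-numbered halves and correlating the half scores:

```python
import numpy as np

# Hypothetical responses: 8 respondents x 10 items (e.g., a 1-5 Likert scale).
rng = np.random.default_rng(0)
items = rng.integers(1, 6, size=(8, 10))

# Split into odd- and even-numbered items and sum each half per respondent.
odd_half = items[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7, 9
even_half = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8, 10

# Split-half reliability: correlation between the two half scores.
r_half = np.corrcoef(odd_half, even_half)[0, 1]
print(f"Split-half reliability: r = {r_half:.2f}")
```

In practice, the half-test correlation is often stepped up with the Spearman-Brown formula,
2r/(1 + r), to estimate the reliability of the full-length test.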
Inter-rater reliability
It is also called inter-observer reliability or agreement. It involves the rating of
observations using a specific measure but by different judges. The ratings are independent but
happen at the same time. Reliability is obtained by correlating the scores of two or more
raters on the same construct, or sometimes by the degree of agreement among the judgments of
the same raters. This is typically used when judges are rating or scoring a piece of artistic
work or a music performance on stage. Their scores are correlated or, especially when the
variables are categorical, combined into Cohen's kappa coefficient of inter-rater reliability.
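
For categorical ratings, Cohen's kappa corrects the observed agreement for the agreement
expected by chance; here is a minimal sketch with hypothetical ratings from two judges:

```python
import numpy as np

# Hypothetical categorical ratings of ten performances by two judges.
rater_a = np.array(["good", "fair", "good", "poor", "fair",
                    "good", "good", "poor", "fair", "good"])
rater_b = np.array(["good", "fair", "fair", "poor", "fair",
                    "good", "good", "fair", "fair", "good"])

categories = np.unique(np.concatenate([rater_a, rater_b]))

# Observed agreement: proportion of performances rated identically.
p_o = np.mean(rater_a == rater_b)

# Chance agreement: from each rater's marginal category proportions.
p_e = sum(np.mean(rater_a == c) * np.mean(rater_b == c) for c in categories)

# Cohen's kappa: observed agreement corrected for chance agreement.
kappa = (p_o - p_e) / (1 - p_e)
print(f"Cohen's kappa = {kappa:.2f}")
```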
Internal consistency reliability
It is a measure of consistency between the different items of the same construct. It measures
consistency within the instrument and asks how well a set of items measures a particular
characteristic of the test. Single items within a test are correlated to estimate the
coefficient of reliability. Cronbach's alpha coefficient is used to determine the internal
consistency between items (Cronbach, 1951).
An individual item of a test might have a small correlation with true scores, while a test with
more items might have a higher correlation. For instance, a 5-item test might have a
correlation of 0.40, while a 12-item test might have a correlation of 0.80. According to
Cortina (1993), coefficient alpha is used to estimate reliability for item-specific variance in
a one-dimensional test. If the coefficient alpha is low, it means that the test is too short or
that the items have little in common.
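
Cronbach's alpha can be computed from the individual item variances and the variance of the
total score: alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score),
where k is the number of items. A minimal sketch with hypothetical item data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 8 respondents answering 5 items that all tap one trait.
rng = np.random.default_rng(1)
trait = rng.normal(size=(8, 1))
scores = trait + 0.5 * rng.normal(size=(8, 5))  # five noisy indicators
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")
```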
Validity
As defined earlier, validity is the extent to which an instrument measures what it purports
to measure. Validity seeks to establish the truth of the research findings, as explained by
Zohrabi (2013). For example, does an IQ test measure intelligence? Validity is assessed using
both theoretical and empirical evidence. Theoretical assessment is where the idea of a
construct is translated or represented into an operational measure; this is done by a panel of
experts, such as judges or university lecturers, who rate the suitability of each item and
evaluate its fit to the definition of the construct. Empirical assessment is where validity is
based on quantitative analysis involving statistical techniques. The following are the types of
validity in educational research:
Construct validity
This refers to how well a concept, idea or behavior, that is, a construct, is translated or
transformed into a functioning and operating reality (Trochim, 2006). It is especially
important where a cause-and-effect relationship is proposed, since construct validity justifies
the existence of the relationship. Construct validity is substantiated through the following
types of validity: face validity, content validity, concurrent and predictive validity, and
convergent and discriminant validity.
Face validity
It is where an indicator seems to be a reasonable measure of its underlying construct “on its
face”. It ascertains that the measure appears to be assessing the intended construct under
investigation. For example, the fact that an individual goes to church every Sunday can lead
someone to conclude that the person is religious, which might not really be true. Face validity
is often used by university lecturers when assessing research instruments designed by their
students.
Content validity
This is an assessment of how well a set of scale items matches the relevant content domain of
the construct it is trying to measure. According to Bollen (1989), as cited in Drost (2011),
content validity is a qualitative type of validity where the domain of the concept is made
clear and the analyst judges whether the measures fully represent the domain (p. 185). The
researcher should design a research instrument that adequately addresses the construct or area
under investigation. For instance, if a researcher wants to investigate the implementation of a
new curriculum, then the research instrument or test items designed by the researcher must
adequately cover that domain to yield valid research findings. A group of judges or experts
with content knowledge in the area under investigation can be used to assess this type of
validity.
Convergent and Discriminant validity
They are assessed together, or jointly, for a set of measures. Convergent validity refers to
the closeness with which a measure relates to the construct it purports to measure; simply, it
converges on the construct. Discriminant validity refers to the degree to which a measure does
not measure, or discriminates from, the constructs it is not supposed to measure. Convergent
validity is obtained by comparing the observed values of one indicator of a construct with
those of other indicators of the same construct.
Discriminant validity is obtained by demonstrating that indicators of one construct are
dissimilar from indicators of other constructs. Bivariate correlations among the items, or an
exploratory factor analysis, can be used to assess convergent and discriminant validity.
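
As an illustration, a minimal sketch with hypothetical indicators of two constructs: in the
bivariate correlation matrix, the within-construct correlations should be high (convergent
validity) while the cross-construct correlations should be low (discriminant validity):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100

# Hypothetical latent traits A and B, each measured by three noisy indicators.
trait_a = rng.normal(size=n)
trait_b = rng.normal(size=n)
indicators = np.column_stack([
    trait_a + 0.4 * rng.normal(size=n),  # A1
    trait_a + 0.4 * rng.normal(size=n),  # A2
    trait_a + 0.4 * rng.normal(size=n),  # A3
    trait_b + 0.4 * rng.normal(size=n),  # B1
    trait_b + 0.4 * rng.normal(size=n),  # B2
    trait_b + 0.4 * rng.normal(size=n),  # B3
])

# Bivariate correlation matrix: high within-construct blocks (convergent),
# low cross-construct blocks (discriminant).
corr = np.corrcoef(indicators, rowvar=False)
print(np.round(corr, 2))
```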
Criterion-related validity
It is the degree of correspondence, established by correlation, between a test measure and one
or more external referents (criteria) (Mohajan, 2017). For instance, suppose students sat an
examination and obtained certain scores, and we then asked them to report those scores; a
correlation could be computed between their reported scores and the true scores from the
teachers' records. Criterion-related validity is closely related to the concurrent and
predictive types of validity. Concurrent validity is where one measure relates to another
concrete criterion that is presumed to occur simultaneously; it applies when the criterion
exists at the same time as the measure. An example is students' performance scores in calculus
and linear algebra, since both are mathematics tests. Predictive validity is where a measure
successfully predicts a future outcome that it is theoretically expected to predict. A good
example of predictive validity is the use of students' Continuous Assessment Test (CAT) scores
to predict their performance in the final examination: the CAT scores can be correlated with
the scores obtained in the final examination.
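
Continuing the CAT example, a minimal sketch (with hypothetical marks) of estimating predictive
validity as the correlation between the predictor and the later criterion:

```python
import numpy as np

# Hypothetical marks for twelve students.
cat_scores = np.array([55, 62, 48, 71, 66, 59, 80, 45, 68, 74, 52, 63])
final_exam = np.array([58, 65, 50, 75, 63, 61, 82, 47, 70, 78, 49, 66])

# Predictive validity: correlation between the predictor (CAT scores)
# and the criterion observed later (final examination marks).
r = np.corrcoef(cat_scores, final_exam)[0, 1]
print(f"Predictive validity: r = {r:.2f}")
```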
Conclusion
This paper has examined the definitions of the terms reliability and validity as used in
educational research. It is important for novice researchers to have sufficient knowledge of
the concepts of reliability and validity when designing research instruments, in order to
enhance the trustworthiness and generalizability of research findings. The types of reliability
identified include test-retest reliability, split-half reliability, inter-rater reliability and
internal consistency reliability. The function of reliability in research is to ensure that the
observed score is close to the true score by minimizing errors of measurement.
The following types of validity have been discussed: face validity, content validity,
convergent validity, discriminant validity and criterion-related validity. Validity requires
that the research instrument is reliable, but an instrument might be reliable without being
valid. The interpretation of the results of a test depends entirely on the underlying construct
and the validity of the research findings.
References
Bajpai, S. R., & Bajpai, R. C. (2014). Goodness of Measurement: Reliability and Validity.
International Journal of Medical Science and Public Health, 3(2), 112-115.
Bollen, K. A. (1989). Structural Equations with Latent Variables (pp. 179-225). John Wiley &
Sons.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the
multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Cortina, J. M. (1993). What is Coefficient Alpha? An Examination of Theory and Applications.
Journal of Applied Psychology, 78(1), 98-104.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika,
16(3), 297-334.
Drost, E. A. (2011). Validity and reliability in social science research. Education Research
and Perspectives, 38(1), 105-124.
Fiske, D. W. (1982). Convergent-Discriminant Validation in Measurements and Research
Strategies. In D. Brinberg & L. H. Kidder (Eds.), Forms of Validity in Research,
pp. 77-93.
Heale, R., & Twycross, A. (2015). Validity and Reliability in Quantitative Studies.
Evidence-Based Nursing, 18(4), 66-67.
Mohajan, H. (2017). Two criteria for good measurements in research: validity and
reliability. Annals of Spiru Haret University. Economic Series, 17(4), 59-82.
Nunnally, J. C. (1978). Psychometric Theory. McGraw-Hill Book Company, pp. 86-113, 190-255.
Rosenthal, R., & Rosnow, R. L. (1991). Essentials of Behavioral Research: Methods and Data
Analysis (2nd ed.). McGraw-Hill Publishing Company, pp. 46-65.
Trochim, W. M. K. (2006). Introduction to Validity. Social Research Methods, retrieved from
www.socialresearchmethods.net/kb/introval.php, September 9, 2010.
Zohrabi, M. (2013). Mixed Method Research: Instruments, Validity, Reliability and Reporting
Findings. Theory and Practice in Language Studies, 3(2), 254-262.