Content Validity (Kesahan Kandungan)

OBJECTIVES
1. Be able to explain the meaning of validity and reliability for a research measurement instrument.
2. Be able to describe the types of validity and reliability of measurement instruments used in research.
VALIDITY (KESAHAN)
RELIABILITY (KEBOLEHPERCAYAAN)
Validity refers to the degree to
which our test or other measuring
device is truly measuring what
we intend it to measure.
The extent to which an instrument measures what it is supposed to measure.
Validity means the ability of a test to measure what it is supposed to measure (Youngman & Eggleston, 1982; Sax & Newton, 1997).
The validity of a measurement instrument refers to the extent to which the instrument used measures the data required to achieve the objectives of the study (Mohd Majid Konting, 1990).
Types of validity:
• Based on content: content validity (kesahan kandungan). The items are representative of all possible questions that could be asked. Content validation is usually carried out by experts.
• Based on internal structure: construct validity (kesahan gagasan). Determination of the significance, meaning, purpose, and use of the scores.
• Based on relations to other variables: criterion validity (kesahan kriteria). Scores are a predictor of an outcome or criterion they are expected to predict. Two kinds of evidence: concurrent and predictive.
Content Validity (Kesahan Kandungan)
The extent to which an instrument covers the content of a given field.
• The main goal is to ensure that the items and content being measured represent that field.
• Based on the scope, objectives, and content of the field being studied.
• The opinion of experts or external evaluators is needed to judge the suitability of the items for the chosen domain.
Content validity is concerned with a test’s ability to include or represent
all of the content of a particular construct. The
question “1 + 1 = ___” may be a valid basic addition
question. Would it represent all of the content that
makes up the study of mathematics? It may be
included on a scale of intelligence, but does it
represent all of intelligence? The answer to these
questions is obviously no. To develop a valid test of
intelligence, not only must there be questions on math,
but also questions on verbal reasoning, analytical
ability, and every other aspect of the construct we call
intelligence. There is no easy way to determine
content validity aside from expert opinion.
1. Do the items appear to represent the thing you are trying to measure?
2. Does the set of items underrepresent the construct’s content (i.e., have you excluded any important content areas or topics)?
3. Do any of the items represent something other than what you are trying to measure (i.e., have you included any irrelevant items)?
Before an instrument can be said to have content validity, the following conditions must be met:
1. The content domain must be stated in terms of behavior whose meaning is generally accepted.
2. The domain must be clearly described.
3. The domain must be relevant to the purpose for which the test is used.
4. Qualified judges must agree that the domain has been adequately sampled.
Evidence Based on Internal Structure
A test may be designed to measure several components or dimensions of a construct.
• Use factor analysis, which analyzes correlations among test items and tells you the number of factors present. It tells you whether the test is unidimensional or multidimensional.
• Unidimensional: all the items measure a single construct.
• Multidimensional: different sets of items tap different constructs or different components of a broader construct.
Evidence Based on Internal Structure (continued)
Factor analysis tells you how many dimensions or factors your test items represent.
• You can also obtain a measure of test homogeneity (i.e., the degree to which the different items measure the same construct or trait).
• Use coefficient alpha (Cronbach's alpha) as the measure of homogeneity.
• If alpha is low (e.g., < .70) for the test, then some items might be measuring different constructs or some items might be bad.
• Examine the items that are contributing to your low coefficient alpha and consider eliminating or revising them.
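One common way to find the items that pull alpha down is to recompute alpha with each item left out in turn. The sketch below is illustrative only: the function names and the small score matrix are hypothetical and do not come from the slides, and it assumes every respondent answered every item.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items
    item_variances = [pvariance([r[i] for r in rows]) for i in range(k)]
    total_variance = pvariance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(rows):
    """Alpha recomputed k times, each time leaving one item out."""
    k = len(rows[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(r) if j != i] for r in rows])
        for i in range(k)
    ]


# Hypothetical data: 5 respondents x 4 items; the last item runs against the others.
scores = [
    [4, 5, 4, 1],
    [2, 2, 3, 5],
    [5, 4, 5, 2],
    [1, 2, 1, 4],
    [3, 3, 3, 3],
]

full_alpha = cronbach_alpha(scores)
dropped = alpha_if_item_deleted(scores)
print(f"alpha, all items: {full_alpha:.2f}")
for i, a in enumerate(dropped, start=1):
    print(f"alpha without item {i}: {a:.2f}")
```

An item whose removal raises alpha sharply is a candidate for revision or elimination, subject to the content considerations discussed earlier.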
Criterion Validity (Kesahan Kriteria)
• Obtained by relating your test scores to a relevant criterion.
• A criterion is the standard or benchmark that you want to predict accurately on the basis of scores from your test.
• The extent of the relationship between the instrument and an independent external criterion (whether the items measure the criterion they are intended to measure).
• Determined by correlation analysis between two sets of scores.
• Correlation coefficients calculated in a validity study are called validity coefficients.
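The slides do not say which correlation coefficient is meant. Assuming the usual choice, Pearson's product-moment correlation, a validity coefficient for test scores x and criterion scores y from n participants can be written as follows (the symbols are introduced here for illustration):

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
               \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Here \bar{x} and \bar{y} are the mean test and criterion scores. The value ranges from -1 to +1, and the closer it is to +1, the more strongly the test scores track the criterion.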
Concurrent validity refers to a measurement device’s ability to vary
directly with a measure of the same construct or inversely with a
measure of an opposite construct. It allows you to show that your
test is valid by comparing it with an already valid test. This is done by
administering the focal test and the criterion test at approximately the same
point in time (i.e., concurrently) and then correlating the two sets of scores. If
the two sets of scores are highly correlated, you have concurrent
evidence.
e.g.
A new test of adult intelligence, for example, would have
concurrent validity if it had a high positive correlation with the
Wechsler Adult Intelligence Scale since the Wechsler is an
accepted measure of the construct we call intelligence. An
obvious concern relates to the validity of the test against
which you are comparing your test. Some assumptions must
be made because there are many who argue the Wechsler
scales, for example, are not good measures of intelligence.
• Obtain predictive evidence of validity by measuring your
participants at one point in time on your test and then, at a
future time, measuring them on the criterion measure.
• This takes more time and effort than collecting concurrent evidence, but it
can provide superior evidence that your test does what
you want it to do.
In order for a test to be a valid screening device for some
future behavior, it must have predictive validity. The SAT is
used by college screening committees as one way to predict
college grades. The GMAT is used to predict success in
business school. And the LSAT is used as a means to predict
law school performance. The main concern with these, and
many other predictive measures, is predictive validity,
because without it they would be worthless.
Reliability is synonymous with the consistency of a test, survey,
observation, or other measuring device. Imagine stepping on your
bathroom scale and weighing 140 pounds only to find that your weight on
the same scale changes to 180 pounds an hour later and 100 pounds an
hour after that. Based on the inconsistency of this scale, any research
relying on it would certainly be unreliable. Consider an important study on
a new diet program that relies on your inconsistent or unreliable bathroom
scale as the main way to collect information regarding weight change.
Would you consider their results accurate?
The extent to which an instrument consistently measures what it is intended to measure.
• Scores obtained from measuring variables should be stable and consistent.
Types of reliability:
• Test-retest reliability
• Internal consistency reliability
• Equivalent forms reliability
Test-retest reliability refers to the consistency or stability of test scores when the test is administered at different times.
Example:
A test is given to 100 individuals at one time and repeated at a later time. The two sets of scores are then correlated. If the individuals who obtain the highest scores on test 1 also obtain the highest scores on test 2, and the individuals who obtain the lowest scores on test 1 also obtain the lowest scores on test 2, the correlation is said to be high. The test questions therefore have high reliability.
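The example above can be sketched in a few lines. This is a minimal illustration that assumes Pearson's correlation is the coefficient intended; the six pairs of scores are invented for the sketch (the slide's example uses 100 individuals), and `pearson_r` is a helper written here, not a function named in the source.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for the same six people on two occasions.
time1 = [55, 62, 70, 48, 81, 66]
time2 = [58, 60, 73, 50, 79, 68]

retest_r = pearson_r(time1, time2)
print(f"test-retest reliability coefficient: {retest_r:.2f}")
```

People keep roughly the same rank order across the two occasions in this made-up data, so the coefficient comes out close to 1.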
Equivalent forms reliability refers to the consistency of a group of individuals' scores on two
equivalent forms of a test designed to measure the same
characteristic.
• Uses one newly constructed instrument and another that is standardized.
• Both are administered to the same subjects, either at the same time or at different times.
• Equivalent forms means that two tests are constructed so that they are identical in every way except for the specific items asked on the test.
• This means that they have the same number of items, the items are of the same difficulty level, the items measure the same construct, and the tests are administered, scored, and interpreted in the same way.
• The two sets of scores are then correlated. This reliability coefficient should be very high and positive; that is, individuals who do well on the first form of the test should also do well on the second form, and individuals who perform poorly on the first form should perform poorly on the second form.
Internal consistency refers to how consistently the items on a
test measure a single construct or concept.
• The test-retest and equivalent forms methods of assessing reliability are general methods that can be used with just about any test.
• Internal consistency measures are convenient and very popular with researchers because they require only one group of individuals to take the test only one time.
• Two indexes of internal consistency:
o Split-half reliability
o Coefficient alpha
Split-half reliability
• Involves splitting a test into two equivalent halves and then
assessing the consistency of the scores across the two
halves of the test.
• Divide the test into halves and compute the correlation
between the scores on the two halves.
• Adjust the computed correlation with the Spearman-Brown
formula to estimate the reliability of the full-length test.
• A low correlation indicates that the test is
unreliable; a high correlation indicates that the test is
reliable.
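The split-half steps can be sketched as below. The odd/even split is one common way to form two halves and is an assumption here, as are the function names and the small right/wrong score matrix; the Spearman-Brown step uses the standard two-halves form, 2r / (1 + r).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(rows):
    """Return (half-test correlation, Spearman-Brown estimate for the full test)."""
    first_half = [sum(r[0::2]) for r in rows]   # items 1, 3, 5, ...
    second_half = [sum(r[1::2]) for r in rows]  # items 2, 4, 6, ...
    r_half = pearson_r(first_half, second_half)
    r_full = 2 * r_half / (1 + r_half)          # Spearman-Brown adjustment
    return r_half, r_full


# Hypothetical data: 6 respondents x 6 items scored 1 (right) or 0 (wrong).
answers = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
]

r_half, r_full = split_half_reliability(answers)
print(f"correlation between halves: {r_half:.2f}")
print(f"Spearman-Brown estimate:    {r_full:.2f}")
```

The adjustment is needed because each half has only half as many items as the whole test, and shorter tests tend to be less reliable; for a positive half-test correlation the adjusted value is always the larger of the two.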
Coefficient alpha
• Lee Cronbach (1951) developed coefficient alpha, which is also known as
Cronbach's alpha.
• Coefficient alpha tells you the degree to which the items
are interrelated.
Rule of thumb:
• At a minimum, alpha should be greater than or equal to .70 for research
purposes and somewhat greater than that value (e.g., ≥
.90) for clinical testing purposes.
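For reference, coefficient alpha is usually computed with the standard formula below. The symbols are not from the slides: k is the number of items, \sigma_i^2 is the variance of scores on item i, and \sigma_X^2 is the variance of the total test scores.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^{2}}{\sigma_X^{2}}\right)
```

When the items are strongly interrelated, the total-score variance is large relative to the sum of the item variances, and alpha approaches 1.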
• Item statements must be clear and precise.
• Instructions must be clear and concise.
• Items should all be of the same format.
• The measurement setting and timing should be standardized, uniform, and controlled.
• Avoid disturbing the subjects.
• Avoid making subjects anxious by assuring them of the safety and confidentiality of the information they give.
• A pilot test (ujian rintis) is the final phase of a survey before data collection begins.
• Its aim is to find problems in the questionnaire, including weak questions, incomplete instructions, and items that are difficult to answer.
• The actual focus group of the study must not be used.
• For a new study, carry out the pilot test twice.
• The number of respondents is not precisely fixed; at least 25 people is suggested, and preferably between 50 and 75.
• Train researchers to collect observational data
• Develop standard written procedures for administering an instrument
• Obtain permission to collect and use public documents
• Respect individuals and sites during data gathering (ethics)
• Institutional or organizational (e.g., school district)
• Parents of participants who are not considered adults
• Campus approval (e.g., university or college) and Institutional Review Board (IRB)