
HOMEWORK EIGHT: INSTRUMENTATION
Topic: Instrumentation
Isabel Cabrera
EDCI-6300.61 Foundations of Research in Education
Dr. Alberto Jose Herrera
The University of Texas at Brownsville
March 10, 2012
Exercise on Topic 32: Measures of Reliability
1. Researchers need to use at least how many observers to determine
interobserver reliability?
A researcher needs at least two observers who observe independently in order to check
interobserver reliability.
2. When there are two quantitative scores per participant, researchers can
compute what statistic to describe reliability?
When there are two quantitative scores per participant, the researcher can check the
degree of relationship by computing a correlation coefficient.
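As an illustration, a Pearson correlation coefficient between two sets of scores, such as ratings assigned by two independent observers, could be computed as follows (the observer ratings here are made-up data, not from the topic):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical ratings of five participants by two observers;
# a value near 1.00 indicates close agreement between the observers.
observer_a = [3, 5, 2, 4, 4]
observer_b = [3, 4, 2, 5, 4]
r = pearson_r(observer_a, observer_b)
```

The coefficient ranges from -1.00 to 1.00, with values near 1.00 indicating a strong positive relationship between the two sets of scores.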
3. Do researchers usually measure at two different points in time to estimate
interobserver reliability?
No; researchers measure at two different points in time to estimate test-retest
reliability, not interobserver reliability.
4. Do researchers usually measure at two different points in time to estimate
test-retest reliability?
Yes, when researchers measure at two different points in time, they are estimating
test-retest reliability.
5. According to this topic, most published tests have reliability coefficients that
are about how high?
Most published tests have reliability coefficients of .80 or higher.
6. According to this topic, serviceable reliability coefficients may be how low if
researchers are examining group averages?
The serviceable reliability coefficients may be as low as .50 if researchers are examining
group averages.
Exercise on Topic 33: Internal Consistency & Reliability
1. Which two methods for estimating reliability require two testing sessions?
Test-retest reliability, in which a test is administered twice, and parallel-forms
reliability, in which two alternative forms of a test are administered, both require two
testing sessions.
2. Does the split-half method require “one” or “two” administrations of a test?
The researcher administers one test, but scores the items in the test as though they were
two separate tests by scoring all the odd numbers as one test score and then using all the
even numbers as another test score.
3. What is meant by an “odd-even split”?
The odd-even split is a process that yields two scores per examinee by dividing the test
in half: the odd-numbered items are scored as one half and the even-numbered items as
the other.
4. If a split-half reliability coefficient equals 0.00, what does this indicate?
When the researcher correlates the two sets of scores, the result is known as a
split-half reliability coefficient. A coefficient of 0.00 indicates a complete
absence of reliability.
5. What is the highest possible value for a split-half reliability coefficient?
1.00 is the highest possible value for a split-half reliability coefficient.
6. To obtain alpha, mathematical procedures are used to obtain the equivalent
of what?
In order to obtain alpha, mathematical procedures are used to obtain the
equivalent of the average of all possible split-half reliability coefficients.
7. Does alpha estimate the consistency of scores over time?
No, alpha measures consistency among the items within a test at a single point in time.
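This averaging idea can be sketched with the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), again using hypothetical response data:

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(responses[0])                  # number of items
    items = list(zip(*responses))          # one column of scores per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# When every item ranks examinees identically, alpha reaches its maximum of 1.00:
alpha = cronbach_alpha([[1, 1], [0, 0], [1, 1]])  # 1.0
```

Like the split-half coefficient, alpha describes internal consistency at a single point in time, not consistency over time.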
Exercise on Topic 34: Norm- & Criterion-Referenced Tests
1. A norm-referenced test is designed to facilitate a comparison of an
individual’s performance with what?
A norm-referenced test (NRT) is designed to facilitate a comparison of an individual’s
performance with that of a norm group.
2. Are norm groups always a national sample?
Norm groups are often national samples of examinees, but a norm group may also be an
entire local population.
3. What is the definition of a criterion-referenced test?
A criterion-referenced test (CRT) is a test designed to measure the extent to which
individual examinees have met performance standards, that is, specific criteria.
4. In which type of test are items answered correctly by about 50% of the
participants favored in item selection?
In the NRT, items answered correctly by about 50% of the participants are favored in
item selection.
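The 50% figure refers to item difficulty, the proportion of examinees who answer an item correctly. A small illustration with made-up answer data:

```python
# Hypothetical item analysis: difficulty is the proportion of examinees
# who answered each item correctly (rows = examinees, columns = items).
answers = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
difficulty = [sum(col) / len(answers) for col in zip(*answers)]
# difficulty -> [0.75, 0.5, 0.75]; the middle item, at .50, is the kind
# favored when selecting items for a norm-referenced test.
```

Items near .50 difficulty are favored in NRT construction because they spread examinees out, maximizing the test's ability to rank individuals against the norm group.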
5. In which type of test are items typically selected on the basis of content they
cover without regard to item difficulty?
In the CRT, item difficulty is typically of little concern. Instead, items are selected
on the basis of the content they cover and the desired level of performance.
6. Which type of test should be used in research where the purpose is to
describe specifically what examinees can and cannot do?
The criterion-referenced test should be used, because it describes specifically what
examinees know and can do, that is, their performance level. In contrast,
norm-referenced tests are designed to compare examinees with a norm group.
Exercise on Topic 35: Measures of Optimum Performance
1. Which type of test is designed to predict achievement in general?
An intelligence test is designed to predict achievement in general.
2. An algebra prognosis test is an example of what type of test?
An algebra prognosis test is an example of an aptitude test because it is designed to
predict achievement in algebra by measuring basic math skills that are used in algebra.
3. A test designed to measure how much students learn in a particular course in
school is what type of test?
The achievement test is a measure of how much students learn in a particular course in
school.
4. A test designed to predict success in learning a new set of skills is what type
of test?
An aptitude test is designed to predict success in learning a new set of skills. For example,
the Scholastic Aptitude Test (SAT) is a test to predict success in college.
5. A list of desirable characteristics of a product or performance is known as
what?
A checklist is a list of desirable characteristics of a product or performance, each of
which can be awarded points or rated on a scale.
6. How can researchers increase the reliability of scoring essays, products, and
performances?
Researchers can increase the reliability of scoring by making sure that the scorers know
specifically what characteristics of the essays, performances, or products they are to
consider and how much weight to give each characteristic in arriving at the scores.
7. According to this topic, are intelligence tests a good measure of innate
ability?
No, intelligence tests are not a good measure of innate ability. At best, intelligence tests
measure skills that have been acquired in some specific cultural milieu.
8. According to this topic, how valid are commercially published aptitude tests?
Commercially published aptitude tests have low to modest validity with coefficients of
about .20 to .60.
Exercise on Topic 36: Measures of Typical Performance
1. Do researchers usually want participants to show their best when measuring
personality traits?
No. When researchers measure personality traits, they want to determine participants’
typical levels of performance, not their best.
2. In this topic, what is the main reason for administering personality measures
anonymously?
The main reason for administering personality measures anonymously is to reduce social
desirability in participants’ responses in order to increase the validity of the test.
3. What do researchers reduce by observing behavior unobtrusively?
Researchers observe behavior unobtrusively in order to reduce the influence of social
desirability, which increases the validity of personality measures.
4. Loosely structured stimuli are used in which type of personality measure?
Projective techniques provide loosely structured or ambiguous stimuli, such as inkblots,
to measure personality traits such as aggressiveness.
5. According to this topic, which type of personality measure is seldom used in
personality research?
Projective techniques are seldom used in personality research because they are time-
consuming, their interpretations are usually expressed in words rather than numbers, and
their validity is highly suspect.
6. What is the range of choice in Likert-type scale?
A Likert-type scale, named after Rensis Likert, ranges from “Strongly agree” to
“Strongly disagree.” Points are awarded for each response.
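A minimal sketch of how such points might be totaled; the five response labels, point values, and reverse-scoring scheme here are illustrative, not prescribed by the topic:

```python
# Hypothetical 5-point Likert scoring: each response maps to points, and
# negatively worded items are reverse-scored so that higher totals always
# indicate a more favorable attitude.
POINTS = {"Strongly agree": 5, "Agree": 4, "Undecided": 3,
          "Disagree": 2, "Strongly disagree": 1}

def score_likert(responses, reverse=()):
    """Total attitude score; `reverse` lists indices of negatively worded items."""
    total = 0
    for i, answer in enumerate(responses):
        points = POINTS[answer]
        total += (6 - points) if i in reverse else points
    return total

# Agreeing with a favorable statement and disagreeing with an unfavorable
# (reverse-scored) statement both add to the score in the same direction:
score = score_likert(["Strongly agree", "Disagree"], reverse={1})  # 5 + 4 = 9
```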
7. Is content validity a relevant concern when assessing the validity of a Likert-type
scale for measuring attitudes?
Yes, content validity is a relevant concern when assessing the validity of a Likert-type
scale for measuring attitudes. The researcher must include statements covering the full
range of relevant attitudes; analyzing whether the statements are comprehensive
contributes to the scale’s content validity.