
HOMEWORK EIGHT: INSTRUMENTATION
Topic: Instrumentation
Isabel Cabrera
EDCI-6300.61 Foundations of Research in Education
Dr. Alberto Jose Herrera
The University of Texas at Brownsville
March 10, 2012
Exercise on Topic 32: Measures of Reliability
1. Researchers need to use at least how many observers to determine
interobserver reliability?
A researcher needs at least two observers who observe independently in order to check
interobserver reliability.
2. When there are two quantitative scores per participant, researchers can
compute what statistic to describe reliability?
When there are two quantitative scores per participant, the researcher can check the
degree of relationship by computing a correlation coefficient.
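As an illustration, a Pearson correlation coefficient between two sets of scores, such as ratings assigned by two independent observers, could be computed as follows (the observer ratings here are made-up data, not from the topic):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical ratings of five participants by two observers;
# a value near 1.00 indicates close agreement between the observers.
observer_a = [3, 5, 2, 4, 4]
observer_b = [3, 4, 2, 5, 4]
r = pearson_r(observer_a, observer_b)
```

The coefficient ranges from -1.00 to 1.00, with values near 1.00 indicating a strong positive relationship between the two sets of scores.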
3. Do researchers usually measure at two different points in time to estimate
interobserver reliability?
No; researchers measure at two different points in time to estimate test-retest
reliability, not interobserver reliability.
4. Do researchers usually measure at two different points in time to estimate
test-retest reliability?
Yes, when researchers measure at two different points in time, they are estimating
test-retest reliability.
5. According to this topic, most published tests have reliability coefficients that
are about how high?
Most published tests have reliability coefficients of .80 or higher.
6. According to this topic, serviceable reliability coefficients may be how low if
researchers are examining group averages?
The serviceable reliability coefficients may be as low as .50 if researchers are examining
group averages.
Exercise on Topic 33: Internal Consistency & Reliability
1. Which two methods for estimating reliability require two testing sessions?
Test-retest reliability, in which a test is administered twice, and parallel-forms
reliability, in which two alternative forms of a test are administered, both require two
testing sessions.
2. Does the split-half method require “one” or “two” administrations of a test?
The researcher administers one test, but scores the items in the test as though they were
two separate tests by scoring all the odd numbers as one test score and then using all the
even numbers as another test score.
3. What is meant by an “odd-even split”?
The odd-even split is a process that yields two scores per examinee by dividing the test
in half: the odd-numbered items are scored as one half and the even-numbered items as
the other.
4. If a split-half reliability coefficient equals 0.00, what does this indicate?
When the researcher correlates the two sets of scores, the result is known as a
split-half reliability coefficient. A coefficient of 0.00 indicates a complete
absence of reliability.
5. What is the highest possible value for a split-half reliability coefficient?
1.00 is the highest possible value for a split-half reliability coefficient.
6. To obtain alpha, mathematical procedures are used to obtain the equivalent
of what?
In order to obtain alpha, mathematical procedures are used to obtain the
equivalent of the average of all possible split-half reliability coefficients.
7. Does alpha estimate the consistency of scores over time?
No, alpha measures consistency among the items within a test at a single point in time.
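This averaging idea can be sketched with the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), again using hypothetical response data:

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(responses[0])                  # number of items
    items = list(zip(*responses))          # one column of scores per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# When every item ranks examinees identically, alpha reaches its maximum of 1.00:
alpha = cronbach_alpha([[1, 1], [0, 0], [1, 1]])  # 1.0
```

Like the split-half coefficient, alpha describes internal consistency at a single point in time, not consistency over time.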
Exercise on Topic 34: Norm- & Criterion-Referenced Tests
1. A norm-referenced test is designed to facilitate a comparison of an
individual’s performance with what?
A norm-referenced test (NRT) is designed to facilitate a comparison of an individual’s
performance with that of a norm group.
2. Are norm groups always a national sample?
Norm groups are often national samples of examinees, but a norm group may also be an
entire local population.
3. What is the definition of a criterion-referenced test?
A criterion-referenced test (CRT) is a test designed to measure the extent to which
individual examinees have met performance standards, that is, specific criteria.
4. In which type of test are items answered correctly by about 50% of the
participants favored in item selection?
In the NRT, items answered correctly by about 50% of the participants are favored in
item selection.
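The 50% figure refers to item difficulty, the proportion of examinees who answer an item correctly. A small illustration with made-up answer data:

```python
# Hypothetical item analysis: difficulty is the proportion of examinees
# who answered each item correctly (rows = examinees, columns = items).
answers = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
difficulty = [sum(col) / len(answers) for col in zip(*answers)]
# difficulty -> [0.75, 0.5, 0.75]; the middle item, at .50, is the kind
# favored when selecting items for a norm-referenced test.
```

Items near .50 difficulty are favored in NRT construction because they spread examinees out, maximizing the test's ability to rank individuals against the norm group.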
5. In which type of test are items typically selected on the basis of content they
cover without regard to item difficulty?
In the CRT, item difficulty is typically of little concern. Instead, items are selected
on the basis of the content they cover and the desired level of performance.
6. Which type of test should be used in research where the purpose is to
describe specifically what examinees can and cannot do?
The criterion-referenced test should be used, because it describes specifically what
examinees know and can do, that is, their performance level. In contrast,
norm-referenced tests are designed to compare examinees with a norm group.
Exercise on Topic 35: Measures of Optimum Performance
1. Which type of test is designed to predict achievement in general?
An intelligence test is designed to predict achievement in general.
2. An algebra prognosis test is an example of what type of test?
An algebra prognosis test is an example of an aptitude test because it is designed to
predict achievement in algebra by measuring basic math skills that are used in algebra.
3. A test designed to measure how much students learn in a particular course in
school is what type of test?
The achievement test is a measure of how much students learn in a particular course in
school.
4. A test designed to predict success in learning a new set of skills is what type
of test?
An aptitude test is designed to predict success in learning a new set of skills. For example,
the Scholastic Aptitude Test (SAT) is a test to predict success in college.
5. A list of desirable characteristics of a product or performance is known as
what?
A checklist is a list of desirable characteristics of a product or performance, each of
which can be awarded points or rated on a scale.
6. How can researchers increase the reliability of scoring essays, products, and
performances?
Researchers can increase the reliability of scoring by making sure that the scorers know
specifically what characteristics of the essays, performances, or products they are to
consider and how much weight to give each characteristic in arriving at the scores.
7. According to this topic, are intelligence tests a good measure of innate
ability?
No, intelligence tests are not a good measure of innate ability. At best, intelligence tests
measure skills that have been acquired in some specific cultural milieu.
8. According to this topic, how valid are commercially published aptitude tests?
Commercially published aptitude tests have low to modest validity with coefficients of
about .20 to .60.
Exercise on Topic 36: Measures of Typical Performance
1. Do researchers usually want participants to show their best when measuring
personality traits?
No. When researchers measure personality traits, they want to determine participants’
typical levels of performance, not their best.
2. In this topic, what is the main reason for administering personality measures
anonymously?
The main reason for administering personality measures anonymously is to reduce social
desirability in participants’ responses in order to increase the validity of the test.
3. What do researchers reduce by observing behavior unobtrusively?
Researchers observe behavior unobtrusively in order to reduce the influence of social
desirability, which increases the validity of personality measures.
4. Loosely structured stimuli are used in which type of personality measure?
Projective techniques provide loosely structured or ambiguous stimuli, such as inkblots,
to measure personality traits such as aggressiveness.
5. According to this topic, which type of personality measure is seldom used in
personality research?
Projective techniques are seldom used in personality research because they are time-
consuming, their interpretations are usually expressed in words rather than numbers, and
their validity is highly suspect.
6. What is the range of choice in Likert-type scale?
A Likert-type scale, named after Rensis Likert, ranges from “Strongly agree” to
“Strongly disagree.” Points are awarded for each response.
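A minimal sketch of how such points might be totaled; the five response labels, point values, and reverse-scoring scheme here are illustrative, not prescribed by the topic:

```python
# Hypothetical 5-point Likert scoring: each response maps to points, and
# negatively worded items are reverse-scored so that higher totals always
# indicate a more favorable attitude.
POINTS = {"Strongly agree": 5, "Agree": 4, "Undecided": 3,
          "Disagree": 2, "Strongly disagree": 1}

def score_likert(responses, reverse=()):
    """Total attitude score; `reverse` lists indices of negatively worded items."""
    total = 0
    for i, answer in enumerate(responses):
        points = POINTS[answer]
        total += (6 - points) if i in reverse else points
    return total

# Agreeing with a favorable statement and disagreeing with an unfavorable
# (reverse-scored) statement both add to the score in the same direction:
score = score_likert(["Strongly agree", "Disagree"], reverse={1})  # 5 + 4 = 9
```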
7. Is content validity a relevant concern when assessing the validity of a Likert-type
scale for measuring attitudes?
Yes, content validity is a relevant concern when assessing the validity of a Likert-type
scale for measuring attitudes. The researcher must include statements covering the full
range of relevant attitudes; analyzing whether the statements are comprehensive
contributes to the scale’s content validity.