Confidence interval of the estimate


E. RELIABILITY OF ESTIMATES

The estimates presented in this publication are based on a sample survey, and may contain various errors.

1. Sampling errors and their use

Sampling errors arise from the fact that only a sample is surveyed, and not the entire survey population (“the population,” below). The sample upon which the present survey is based is only one of a large number of possible samples that could have been obtained with the same sampling method and the same size. Estimates based on different samples differ from each other, and almost all differ from the value that would have resulted if the information had been collected from the entire population, and not from a sample.

The estimate $\hat{X}$ is the value estimated according to the specific sample of a survey, instead of the corresponding value $X$ that would have been obtained if the information had been collected from the entire population.

The sampling error of the estimate, $\sigma(\hat{X})$, is the average difference between all the various estimates that could have been obtained from all the possible samples of the same size and method, on the one hand, and the value obtained from the information collected from the entire population, on the other hand, under the same conditions of data collection as in the survey. The sampling error can be estimated from the survey data.

Sometimes it is convenient to assess the accuracy of an estimate by the relative sampling error, defined as the sampling error of the estimate divided by the estimated value: $\hat{\sigma}(\hat{X}) / \hat{X}$. It is customary to multiply this value by 100 in order to express it as a percentage. Care should be taken not to rely on estimates that have relatively large sampling errors (over 25%), and especially not on those with particularly large sampling errors (over 40%).
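As a rough illustration of the rule above, the following Python sketch computes the relative sampling error and labels it against the 25% and 40% thresholds. The function names, the label strings, and the input figures are hypothetical, chosen here for illustration; they are not taken from the publication.

```python
def relative_sampling_error(estimate: float, sampling_error: float) -> float:
    """Return the relative sampling error in percent."""
    if estimate == 0:
        raise ValueError("relative sampling error is undefined for a zero estimate")
    # abs() keeps the ratio positive for negative estimates (an assumption of this sketch).
    return 100.0 * sampling_error / abs(estimate)


def reliability_label(rse_percent: float) -> str:
    """Classify an estimate by the thresholds in the text (labels are illustrative)."""
    if rse_percent > 40.0:
        return "particularly large sampling error"
    if rse_percent > 25.0:
        return "relatively large sampling error"
    return "acceptable"


# Hypothetical figures, not taken from the survey tables.
rse = relative_sampling_error(estimate=12.0, sampling_error=3.6)
print(f"{rse:.1f}% -> {reliability_label(rse)}")
```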

Confidence interval of the estimate

Since it is clear that a specific estimate, $\hat{X}$, deviates (almost certainly) from the value of $X$, it is best to consider not just the value estimated from the sample, but also the range in which the value of $X$ is likely to be found at a given probability (at a specific level of confidence). This can be done using the sampling error.


The confidence interval for an estimate is an interval containing $X$ at a given, predetermined level of confidence. The confidence interval is based on the estimate $\hat{X}$ and the estimate of its sampling error, $\hat{\sigma}(\hat{X})$, both obtained from the sample, and on a predetermined confidence level.

It is customary to present confidence intervals at a confidence level of 95%, and the confidence interval at this confidence level is between $\hat{X} - 2\hat{\sigma}(\hat{X})$ and $\hat{X} + 2\hat{\sigma}(\hat{X})$.

It can be claimed with 95% confidence that the value that would have been obtained from the entire population lies within this interval, assuming there are no other errors.

A higher or lower level of confidence can be chosen, and the confidence interval can be calculated as follows:

    $1-\alpha$      $K_{1-\alpha}$
    67%             1.0
    80%             1.3
    90%             1.7
    95%             2.0
    99.5%           2.8

$K_{1-\alpha}$ is the number of sampling errors that should be added to or subtracted from the estimate in order to obtain a confidence interval at the required level of confidence, which is commonly denoted $1-\alpha$.

Example 1: The estimated percentage of first-degree recipients in the humanities who worked during the survey period, among the 2000/01 graduates, is 84.1 (Table 6), and the sampling error of the estimate is 1.4% (Table 6 – Sampling Errors). It can be claimed with 95% confidence that this percentage is in the range of $84.1\% \pm 2 \times 1.4\%$, i.e., between 81.3% and 86.9%.
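The calculation in Example 1 can be sketched in Python as below. The multipliers are copied from the table of confidence levels; the function names are hypothetical. The text does not say how the multipliers were derived, so the coverage check at the end rests on an assumption made here: that the sampling distribution of the estimate is approximately normal.

```python
from statistics import NormalDist

# Multipliers K for each confidence level, copied from the table in the text.
K_TABLE = {0.67: 1.0, 0.80: 1.3, 0.90: 1.7, 0.95: 2.0, 0.995: 2.8}


def confidence_interval(estimate, sampling_error, level=0.95):
    """Return (lower, upper) = estimate -/+ K * sampling_error."""
    if level not in K_TABLE:
        raise ValueError(f"no multiplier listed for confidence level {level}")
    half_width = K_TABLE[level] * sampling_error
    return estimate - half_width, estimate + half_width


def normal_coverage(k):
    """Probability that a standard normal variable falls within +/- k.

    Assumption: a normal sampling distribution, which the text does not state.
    """
    return 2.0 * NormalDist().cdf(k) - 1.0


# Example 1: estimate 84.1%, sampling error 1.4%, 95% confidence.
low, high = confidence_interval(84.1, 1.4, level=0.95)
print(f"Example 1: {low:.1f}% to {high:.1f}%")  # Example 1: 81.3% to 86.9%

# Compare each listed level with the coverage implied by its multiplier.
for level, k in K_TABLE.items():
    print(f"stated {level:.1%}, K = {k:.1f}, normal coverage = {normal_coverage(k):.1%}")
```

Under the normality assumption, each listed multiplier gives a coverage within about one and a half percentage points of its stated level, which suggests that the table uses rounded normal multipliers.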

Confidence intervals are usually approximately symmetrical around the estimate, but for estimates based on a small number of cases in the sample, the intervals may be asymmetrical. In these cases, both the estimate itself and the estimate of its sampling error are subject to large errors. In addition, confidence intervals of percentage estimates should be avoided, since they may, in special cases, include values outside the range of 0 to 100.

Sampling errors and confidence intervals of discrepancies between exclusive groups (non-overlapping)

Below, instructions are presented for calculating estimates of discrepancies between exclusive groups, constructing confidence intervals for them, and testing the significance of the discrepancies based on the estimates and the sampling errors in the table generator. The instructions are for estimates of discrepancies of two types:

– Discrepancies between two estimates of the same type that relate to exclusive population groups (e.g., different educational institutions or different fields of study). It is impossible to conduct the calculations for groups that are not exclusive based only on estimates and sampling errors.

– Discrepancies between estimates of the same type that relate to different years.

If $\hat{X}_1$ is the estimate of Group 1, and $\hat{X}_2$ is the estimate of Group 2, then the estimate of the discrepancy between the two groups is: $\hat{D} = \hat{X}_1 - \hat{X}_2$.

In order to conclude whether $X_1$ is, in fact, different from $X_2$ in the population, the sampling error of the discrepancy estimate must be calculated:

$\hat{\sigma}(\hat{D}) = \sqrt{\hat{\sigma}^2(\hat{X}_1) + \hat{\sigma}^2(\hat{X}_2)}$.

In this formula, $\hat{\sigma}(\hat{X}_1)$ and $\hat{\sigma}(\hat{X}_2)$ are the sampling errors of the estimates $\hat{X}_1$ and $\hat{X}_2$, respectively.
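The sum of squares in this formula can be motivated as follows, on the assumption (not stated explicitly in the text) that the estimates for two non-overlapping groups are approximately uncorrelated. For any two estimates, the variance of their difference is:

```latex
\operatorname{Var}(\hat{X}_1 - \hat{X}_2)
  = \operatorname{Var}(\hat{X}_1) + \operatorname{Var}(\hat{X}_2)
  - 2\,\operatorname{Cov}(\hat{X}_1, \hat{X}_2)
```

When the covariance term is zero, taking the square root of both sides gives the sampling error of the difference as the square root of the sum of the squared sampling errors. For overlapping groups the covariance is generally not zero and cannot be recovered from published estimates and sampling errors alone, which is consistent with the restriction to exclusive groups.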

If the sampling error is $\hat{\sigma}(\hat{D})$, the confidence interval of the discrepancy estimate $\hat{D}$ at the confidence level of $1-\alpha$ is: $\hat{D} \pm K_{1-\alpha}\,\hat{\sigma}(\hat{D})$.

If the confidence interval includes the value zero, the discrepancy $\hat{D}$ is said not to be statistically significant, i.e., according to the specific sample of the survey, at the confidence level determined, it is impossible to say that $X_1$ is, in fact, different from $X_2$ in the population (although in the sample the two estimates are different from each other).

If the confidence interval does not include the value zero, a significant discrepancy is said to exist; and at a confidence level of $1-\alpha$ the discrepancy will be between $\hat{D} - K_{1-\alpha}\,\hat{\sigma}(\hat{D})$ and $\hat{D} + K_{1-\alpha}\,\hat{\sigma}(\hat{D})$.

Example 2: The estimated percentage of those satisfied with the computer and communications services among college graduates in 1999/2000 is 71.6, and among those who graduated in 2001/02 it is 76.7 (Table 3). The sampling error for each year is 0.7% (Table 3 – Sampling Errors). The difference is estimated at:

$\hat{D} = \hat{X}_2 - \hat{X}_1 = 76.7\% - 71.6\% = 5.1\%$

and the sampling error of the difference is:

$\hat{\sigma}(\hat{D}) = \sqrt{\hat{\sigma}^2(\hat{X}_1) + \hat{\sigma}^2(\hat{X}_2)} = \sqrt{(0.7\%)^2 + (0.7\%)^2} \approx 1.0\%$.

The confidence interval of the difference estimate at a confidence level of 95% is:

$\hat{D} \pm 2\hat{\sigma}(\hat{D}) = 5.1\% \pm 2 \times 1.0\%$, i.e., between 3.1% and 7.1%. The interval does not contain the value 0, and therefore it can be said that the difference is significant.

Example 3: The estimated percentage of male first-degree recipients who worked at the time of the survey is 84.3 among 2000/01 graduates in the humanities, and 88.6 among those in social sciences (Table 6 – Percentages). The sampling errors of the two estimates are 2.6% and 1.5%, respectively (Table 6 – Sampling Errors). The estimated difference is:

$\hat{D} = \hat{X}_2 - \hat{X}_1 = 88.6\% - 84.3\% = 4.3\%$.

The sampling error of the difference is:

$\hat{\sigma}(\hat{D}) = \sqrt{\hat{\sigma}^2(\hat{X}_1) + \hat{\sigma}^2(\hat{X}_2)} = \sqrt{(2.6\%)^2 + (1.5\%)^2} \approx 3.0\%$.

The confidence interval of the estimated difference at a confidence level of 95% is:

$\hat{D} \pm 2\hat{\sigma}(\hat{D}) = 4.3\% \pm 2 \times 3.0\%$, i.e., between -1.7% and 10.3%. The interval contains the value zero; therefore, the difference is not significant.
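The calculations in Examples 2 and 3 can be reproduced with a short Python sketch. The function name and its return layout are hypothetical; the input figures are those quoted in the two examples, and the multiplier 2 is the one the text gives for the 95% level.

```python
import math


def difference_interval(x1, se1, x2, se2, k=2.0):
    """Estimate x2 - x1, its sampling error, and the interval d -/+ k * se."""
    d = x2 - x1
    se_d = math.sqrt(se1 ** 2 + se2 ** 2)
    low, high = d - k * se_d, d + k * se_d
    # The difference is called significant when the interval excludes zero.
    significant = not (low <= 0.0 <= high)
    return d, se_d, low, high, significant


# (x1, se1, x2, se2) in percent, taken from Examples 2 and 3.
examples = {
    "Example 2": (71.6, 0.7, 76.7, 0.7),
    "Example 3": (84.3, 2.6, 88.6, 1.5),
}
for name, args in examples.items():
    d, se_d, low, high, significant = difference_interval(*args)
    print(f"{name}: D = {d:.1f}, SE = {se_d:.2f}, "
          f"interval = ({low:.2f}, {high:.2f}), significant = {significant}")
```

Because the sketch does not round the sampling error to one decimal place before multiplying, its interval limits can differ from those in the examples in the second decimal place. The conclusions about significance are the same.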

Note: General conclusions regarding the differences between two exclusive groups should not be drawn from a series of comparisons between estimates. For instance, if many significant difference estimates were obtained for various elements of graduates’ satisfaction between Institution A and Institution B, it should not be concluded that overall satisfaction with Institution A differs from satisfaction with Institution B. Multiple comparisons require other tools for reaching conclusions, whereas the tools described in the instructions here are for individual comparisons only.
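One way to see why individual comparisons do not add up to a general conclusion is sketched below. It assumes (a simplification made here, not a statement from the publication) that the comparisons are independent and that no real differences exist, and it computes the chance that at least one comparison nevertheless appears significant at the nominal 95% level. The function name is hypothetical.

```python
def chance_of_false_significance(m: int, level: float = 0.95) -> float:
    """Probability of at least one spurious significant result in m independent comparisons."""
    return 1.0 - level ** m


for m in (1, 5, 10, 20):
    print(f"{m:2d} comparisons: {chance_of_false_significance(m):.0%}")
```

Under these assumptions, ten comparisons already carry roughly a 40% chance of at least one spurious significant result, so a handful of significant differences among many comparisons is weak evidence of an overall difference.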


2. Non-sampling errors

Non-sampling errors in the survey may arise from many factors at all stages of data gathering and processing. They can also occur when information is gathered from the entire population and not just from a sample of units.

The main non-sampling errors in the survey are:

(a) Errors resulting from non-response: errors resulting from the fact that graduates do not respond, due to absence, refusal or other reasons. This may cause a certain bias in estimates, because the characteristics of the respondents may be different from those of the graduates who did not respond to the survey. The method of estimation is intended to reduce this bias.

(b) Response errors: errors resulting from misunderstanding the questions, unwillingness or inability to respond correctly, or an incorrect presentation of the question.

(c) Errors in data key entry: Data of those who responded by mail are entered in the office. In addition, data gathered in telephone interviews are entered directly by the surveyor, using the Blaze program. Errors may occur in entering data.

(d) Errors at various processing stages: errors that occur while processing the material; e.g., errors in coding. Some of the errors are corrected using checks of the received data.

While the sampling errors can be estimated through the survey data, it is difficult (and often even impossible) to assess non-sampling errors; and there are no quantitative assessments of these errors in this publication. Nevertheless, it is important to emphasize that in planning and conducting the survey, efforts were made to reduce as much as possible the number of errors of the various types.
