Ch8. Properties of Sample statistics


Chapter 8. Properties of Sample Statistics

Problem PS200

Consider as a (small) population the values 1, 2, 3, 6 in cells A2:A5. We illustrate some properties of sample statistics when randomly sampling with replacement.

1. Report a first possible simple random sample with replacement of 2 observations from this population in cells D2:E2: (1 1). Report all remaining possible samples in cells D3:E17: (1 2), (1 3), (1 6), (2 1), (2 2), …. Notice that there are \(4^2 = 16\) different samples. For the first sample, compute the sample mean, the sample variance and the sample standard deviation in cells F2:H2. Repeat this computation for all remaining samples in cells F3:H17;

2. Compute the population mean in cell J2 (\(\mu = 3\)) and the average of the 16 sample means in cell J3. Their equality illustrates that the sample mean is an unbiased estimator of the population mean when sampling with replacement, i.e. \(E(\bar{X}) = \mu\);

3. Compute the population variance in cell J5: =VAR.P(A2:A5) (\(\sigma^2 = 3.5\)) and the average of the 16 sample variances in cell J6. Their equality illustrates that the sample variance is an unbiased estimator of the population variance when sampling with replacement, i.e. \(E(S^2) = \sigma^2\);

4. Compute the population standard deviation in cell J8: =STDEV.P(A2:A5) (\(\sigma = 1.8708\)) and the average of the 16 sample standard deviations in cell J9. Notice that \(E(S) = 1.4142 \neq \sigma = 1.8708\) (the sample standard deviation is a biased estimator of the population standard deviation);

5. Compute \(\sigma^2/n = 3.5/2 = 1.75\) in cell J11 and the (population) variance of the 16 sample means in cell J12: =VAR.P(F2:F17). Their equality is an illustration of \(\mathrm{var}(\bar{X}) = \sigma^2/n\).
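The five steps can also be checked outside the spreadsheet. The following is a minimal Python sketch, not part of the original problem, that enumerates the 16 ordered samples with the standard library only. All variable names are illustrative assumptions.

```python
from itertools import product
from statistics import mean, pstdev, pvariance, stdev, variance

population = [1, 2, 3, 6]   # the values in cells A2:A5
n = 2                       # sample size

# All ordered samples of size 2 drawn with replacement
samples = list(product(population, repeat=n))

sample_means = [mean(s) for s in samples]
sample_vars = [variance(s) for s in samples]   # divisor n - 1, like VAR.S
sample_sds = [stdev(s) for s in samples]       # divisor n - 1, like STDEV.S

mu = mean(population)
sigma2 = pvariance(population)                 # divisor N, like VAR.P
sigma = pstdev(population)

avg_mean = mean(sample_means)           # compare with mu
avg_var = mean(sample_vars)             # compare with sigma2
avg_sd = mean(sample_sds)               # compare with sigma (biased)
var_of_means = pvariance(sample_means)  # compare with sigma2 / n

print(len(samples), avg_mean, avg_var, avg_sd, var_of_means)
```

Because every possible sample is listed exactly once, the averages are exact expectations rather than simulation estimates.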

Assignment PA200

Repeat Steps 1 to 5 for samples of two observations from the population consisting of three elements: 1, 4, 7.


Problem PS201

Consider again the population in Problem PS200. We illustrate some properties of sample statistics when sampling without replacement.

1. Report a first possible simple random sample without replacement of 2 observations from this population in cells D2:E2: (1 2). Report all remaining possible samples in cells D3:E13: (1 3), (1 6), (2 1), (2 3), …. Notice that there are 12 different samples. For the first sample, compute the sample mean, the sample variance and the sample standard deviation in cells F2:H2. Repeat this computation for all remaining samples in cells F3:H13;

2. Compute the population mean in cell J2 and the average of the 12 sample means in cell J3. Their equality illustrates that the sample mean is an unbiased estimator of the population mean also when sampling without replacement, i.e. \(E(\bar{X}) = \mu\);

3. Compute the population variance in cell J5 and the average of the 12 sample variances in cell J6 and observe that \(E(S^2) \neq \sigma^2\) but that \(E(S^2) = \sigma^2 \cdot \frac{N}{N-1} = 3.5 \cdot \frac{4}{3} = 4.6667\) (illustrating that \(\frac{N-1}{N} \cdot S^2\) is an unbiased estimator of \(\sigma^2\));

4. Compute the population standard deviation in cell J8 and the average of the 12 sample standard deviations in cell J9 and notice that \(E(S) = 1.8856 \neq \sigma \cdot \sqrt{N/(N-1)} = 2.1602\).
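The values quoted in Steps 3 and 4 can be reproduced by listing the 12 ordered samples directly. The sketch below is an illustration in Python (standard library only), not part of the original problem; the variable names are assumptions.

```python
from itertools import permutations
from math import sqrt
from statistics import mean, pvariance, stdev, variance

population = [1, 2, 3, 6]
N, n = len(population), 2

# All ordered samples of size 2 drawn without replacement
samples = list(permutations(population, n))

sigma2 = pvariance(population)                  # population variance (divisor N)
avg_var = mean([variance(s) for s in samples])  # average sample variance, E(S^2)
avg_sd = mean([stdev(s) for s in samples])      # average sample std. deviation, E(S)

corrected = avg_var * (N - 1) / N               # should equal sigma2
scaled_sigma = sqrt(sigma2 * N / (N - 1))       # sigma * sqrt(N / (N - 1))

print(len(samples), avg_var, corrected, avg_sd, scaled_sigma)
```

The rescaled average `corrected` recovers the population variance, while `avg_sd` stays below `scaled_sigma`, which is the inequality noted in Step 4.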

Assignment PA201

Repeat Steps 1 to 4 for samples without replacement of two observations from the population consisting of three elements: 1, 4, 7.


Problem PS202

Consider the small data set in Problem PS187 (in the range A2:A11) and assume the data make up a population. To repeat the analysis in Problem PS200 for a sample of size 5 from this population we would have to list \(10^5 = 100000\) samples. Instead we resort to simulation.

1. Take a simple random sample with replacement of size 5 in the range C2:G2 (see Problem PP3) and drag the range C2:G2 down to row 1001 to generate 1000 random samples with replacement. Compute the 1000 sample means in column H, the sample medians in column I, the sample variances in column J, the sample mean square errors in column K and the sample standard deviations in column L;

2. Compute the population mean in cell N2 (value of 6) and the average of the 1000 sample means in cell O2. Comparing both cells illustrates the property \(E(\bar{X}) = \mu\). Repeat the simulation a number of times (key F9);

3. Compute the population median in cell N3 (value of 4.5) and the average of the 1000 sample medians in cell O3. Notice that on average the sample median does not estimate the population median very well. Repeat the simulation a number of times (key F9);

4. Compute the population variance in cell N4 (value of 11.2), the average of the 1000 sample variances in cell O4 and the average of the sample mean square errors in cell P4. Notice that the sample variance is, on average, a better estimator of the population variance than the sample mean square error. The simulation illustrates the property \(E(S^2) = \sigma^2\). Repeat the simulation a number of times (key F9);

5. Compute the population standard deviation in cell N5 (value of 3.347) and the average of the 1000 sample standard deviations in cell O5. Notice that on average the sample standard deviation does not estimate the population standard deviation very well (at least for small samples). Repeat the simulation a number of times (key F9);

6. Compute in cell N7 the variance of the sample mean \(\sigma^2/n = 11.2/5 = 2.24\) and in cell O7 the variance of the 1000 sample means (illustrating the property \(\mathrm{var}(\bar{X}) = \sigma^2/n\)). Repeat the simulation a number of times (key F9).

Assignment PA202

Repeat Steps 1 to 6 for the data set in Assignment PA187.


Problem PS203

Consider the small data set in Problem PS187 (in the range A2:A11) and assume the data make up a population. Generate a simple random sample without replacement of size 5 in the range D2:D6 using random numbers in the range B2:B11 and the integers 1 to 5 in cells C2:C6 (see Problem PP3).

1. Compute the sample mean in cell F2, the sample median in cell G2, the sample variance in cell H2 and the sample standard deviation in cell I2;

2. Use the TABLE-option to generate the sample statistics of Step 1 1000 times;

3. Compute the population mean in cell K2 (value of 6) and the average of the 1000 sample means in cell L2. Comparing both cells illustrates that \(E(\bar{X}) = \mu\) also holds for sampling without replacement. Repeat the simulation a number of times (key F9);

4. Compute the population median in cell K3 (value of 4.5) and the average of the 1000 sample medians in cell L3. Notice that on average the sample median does not estimate the population median very well. Repeat the simulation a number of times (key F9);

5. Compute the population variance in cell K4 (value of 11.2), the average of the 1000 sample variances in cell L4 and the average of the 1000 sample variances multiplied by 9/10 in cell L5. Notice that \(S^2 \cdot (N-1)/N\) is, on average, a better estimator for the population variance than the sample variance itself. The simulation illustrates the result \(E(S^2) \cdot (N-1)/N = \sigma^2\) for random sampling without replacement from a finite population. Repeat the simulation a number of times (key F9);

6. Compute in cell K7 the variance of the sample mean \(\frac{\sigma^2}{n} \cdot \frac{10-5}{10-1} = 1.244\) and in cell L7 the variance of the 1000 sample means (illustrating the property \(\mathrm{var}(\bar{X}) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}\) for random sampling without replacement). Repeat the simulation a number of times (key F9).
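The finite-population formula in Step 6 can be sanity-checked without the PS187 data. The Python sketch below (an illustration only; the helper name `var_mean_without_replacement` is an assumption) first evaluates the formula for the numbers quoted in Step 6 and then verifies it by exhaustive enumeration on the small population 1, 2, 3, 6 of Problem PS200.

```python
from itertools import permutations
from statistics import mean, pvariance

def var_mean_without_replacement(sigma2, n, N):
    """Variance of the sample mean when sampling n out of N without replacement."""
    return sigma2 / n * (N - n) / (N - 1)

# Numbers quoted in Step 6: sigma^2 = 11.2, n = 5, N = 10
step6 = var_mean_without_replacement(11.2, 5, 10)

# Exhaustive check on the population of Problem PS200 with n = 2
population = [1, 2, 3, 6]
n = 2
means = [mean(s) for s in permutations(population, n)]
enumerated = pvariance(means)   # exact variance of the sample mean
formula = var_mean_without_replacement(pvariance(population), n, len(population))

print(round(step6, 4), enumerated, formula)
```

The enumeration is feasible only because the population is tiny; for the ten-element population of this problem the simulation in Steps 2 to 6 plays the same role.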

Assignment PA203

Repeat Steps 1 to 6 for the data set in Assignment PA187.


Problem PS204

Consider the data set ‘Euroweight’ as a population.

1. Compute the mean (answer: 7.52123 g), the variance (answer: 0.00118 g²), the standard deviation (answer: 0.03437 g) and the skewness (answer: -0.18822) of the variable ‘weight’ in cells L2:L5. Notice that the computation of the population skewness in Excel requires that the function for sample skewness SKEW be multiplied by \((N-1)(N-2)/N^2\) with N the population size;

2. Take a random sample of size 30 (with replacement) from the population of weights in cells D2:D31. Compute the sample mean, the sample variance, the sample standard deviation and the sample skewness in cells F2:I2 and use the TABLE-option to repeat these statistics 1000 times;

3. Compute the averages of the 1000 sample means, variances, standard deviations and skewness coefficients in cells L9:L12. Compare the averages with the corresponding population parameters. Notice that the correspondence is close for the means and the variances, slightly biased for the standard deviation but strongly biased for the skewness parameter (often even with an opposite sign). A major factor to explain the negative sign of the skewness of the population is the extreme outlier 7.201 g. As this observation is not often picked up in the sampling, most sample coefficients of skewness will be positive with a positive average as a result;

4. Alternative measures have been proposed to overcome the influence of outliers on the coefficient of skewness. One such coefficient is Galton’s skewness defined as \((e_3 + e_1 - 2 e_2)/(e_3 - e_1)\) where \(e_k\) is the \(k\)th quartile. Compute Galton’s coefficient of skewness for the variable ‘weight’ in the population (answer: 0.05882 using QUARTILE.EXC to compute quartiles) and the sample variant in column J for all samples. Compute the average of the sample Galton coefficients in cell L13 and compare with the population value.
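Galton's coefficient from Step 4 is easy to express as a small function. The Python sketch below is an illustration only: the data are hypothetical (not the ‘Euroweight’ values), and it assumes that `statistics.quantiles` with `method="exclusive"` interpolates in the same (n + 1)-based way as QUARTILE.EXC.

```python
from statistics import quantiles

def galton_skewness(data):
    """Quartile-based skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    q1, q2, q3 = quantiles(data, n=4, method="exclusive")
    return (q3 + q1 - 2 * q2) / (q3 - q1)

# Hypothetical right-skewed data, chosen only for illustration
example = [1, 1, 2, 3, 5, 8, 13]
g = galton_skewness(example)

# Symmetric data should give a coefficient of zero
g_symmetric = galton_skewness([1, 2, 3, 4, 5, 6, 7])

print(g, g_symmetric)
```

Because only the quartiles enter the formula, a single extreme observation such as the outlier mentioned in Step 3 has little influence on the result.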

Assignment PA204

Consider the data set ‘Sabena’ as a population. Apply Steps 1 to 4 above for the variable DELAY TIME ARR.


Problem PS205

Consider the data set ‘Sabena’ as a population.

1. Compute in cell Y2 the proportion \(\pi_0\) of flights that did not have any delay upon arrival in Brussels. Answer: cell Y2: =COUNTIF(Q2:Q3854; ">=0")/3853; \(\pi_0 = 0.3397\);

2. Use column R to report 1 in a cell when the corresponding flight is on time, 0 if not. Compute in cell Y3 the mean of the values in column R (also \(\pi_0\)) and the population variance \(\sigma^2\) in cell Y4: =VAR.P(R2:R3854), equal to 0.2243. Check that the population variance equals \(\pi_0 (1 - \pi_0)\) in cell Y5;

3. Take a random sample (with replacement) of size 100 (range T2:T101) from the 0-1 values in column R. Compute the proportion \(\hat{p}\) of flights in the sample that did not have any delay in Brussels (cell V2: =AVERAGE(T2:T101)). Compute the sample variance in cell W2: =VAR.S(T2:T101). Use the TABLE-option in columns V and W to repeat both sample statistics 1000 times;

4. Compute in cell Y7 the average of the 1000 sample proportions in column V and compare with \(\pi_0\) in cell Y2 (illustrating the property that the sample proportion is unbiased). Compute in cell Y8 the average of the 1000 sample variances in column W and compare with the population variance in cell Y4;

5. Compute in cell Y10 the variance of the sample proportion in a sample of 100 observations: \(\pi_0 (1 - \pi_0)/100 = 0.00224\). Compute in cell Y11 the variance of the 1000 sample proportions and compare with cell Y10. Compute in cell Y12 the average of the 1000 sample variances in column W divided by 100 and compare with cells Y10 and Y11: the sample variance divided by the sample size is a reasonable estimate of the variance of the sample proportion.

Assignment PA205

Consider the data set ‘Forbes2010’. Apply Steps 1 to 5 above for the proportion of companies with sales above 10 billion dollars.


Problem PS206

Consider the data set ‘Forbes2010’ and consider it to be a population.

1. Compute the population covariance and correlation between the variables ‘market value’ and ‘profit’ in cells Q2:Q3: =COVARIANCE.P(F2:F1997; H2:H1997) (= 48.3708) and =CORREL(F2:F1997; H2:H1997) (= 0.6041);

2. Take a random sample of 30 companies in cells J2:J31. Example for cell J2: =INDEX($B$2:$B$1997; RANDBETWEEN(1; 1996)) and drag to cell J31. Add the profit and market value in columns K and L. Example for cell K2 (profits): =INDEX($F$2:$F$1997; MATCH(J2; $B$2:$B$1997; 0)). Drag to cell K31. Similarly for cells L2:L31 (market value). Compute the sample covariance between ‘profit’ and ‘market value’ in cell N2 and the sample correlation in cell O2;

3. Use the TABLE-option to repeat the sample covariance and the correlation 1000 times in columns N and O. Compute the average of the sample covariances and the correlations in cells Q6:Q7. Notice: the average of the sample covariances is usually quite close to the population covariance (an illustration of the property that the sample covariance is an unbiased estimator of the population covariance); the average of the sample correlation coefficients is quite different from the population correlation (illustrating the result that the sample correlation is a biased estimator of the population correlation); and the variability in both the covariances and the correlations is quite high (a result of the high variability in both variables, e.g. the population standard deviation of ‘market value’ is 27.60 for a population mean of 15.71).

Assignment PA206

Take the data set ‘Sabena’ and consider it to be a population. Compute the covariance and the correlation between the variables ‘DELAY TIME DEP’ and ‘DELAY TIME ARR’.

Take a random sample of 50 flights and compute the sample covariance and sample correlation.

Use the TABLE-option to generate both sample statistics 1000 times.

Compute the average sample covariance and the average sample correlation and compare with the population covariance and correlation.


Problem PS207

Take Problem PP41 and its random variable X. The expected value, variance and standard deviation of X were computed in Problem PP61: \(E(X) = 2.05\), \(\mathrm{var}(X) = 0.7475\), \(\mathrm{stdev}(X) = 0.8646\).

1. Generate 50 random numbers in the range A2:A51 and a random sample of the variable X in the range B2:B51 (see Problem PP41). For the first 20 observations in column B, compute the sample mean in cell D3, the sample variance in cell E3 and the sample standard deviation in cell F3. For all 50 observations compute the sample mean in cell G3, the sample variance in cell H3 and the sample standard deviation in cell I3;

2. Use the TABLE-option to repeat the range D3:I3 1000 times;

3. Report the value of \(E(X)\) in cell K3, compute the average of the 1000 sample means (\(n = 20\)) in cell L3 and the average of the 1000 sample means (\(n = 50\)) in cell M3. Note that both averages are close to \(E(X)\), illustrating that \(E(\bar{X}) = \mu\);

4. Report the value of \(\mathrm{var}(X)\) in cell K4, compute the average of the 1000 sample variances (\(n = 20\)) in cell L4 and the average of the 1000 sample variances (\(n = 50\)) in cell M4. Note that both averages are close to \(\mathrm{var}(X)\), illustrating that \(E(S^2) = \sigma^2\);

5. Report the value of \(\mathrm{stdev}(X)\) in cell K5, compute the average of the 1000 sample standard deviations (\(n = 20\)) in cell L5 and the average of the 1000 sample standard deviations (\(n = 50\)) in cell M5. Note that both averages slightly underestimate \(\mathrm{stdev}(X)\) and the estimation for sample size 50 is, on average, slightly better;

6. Compute in cell K7 the value \(\sigma^2/n = 0.7475/20 = 0.0374\), the variance of the sample mean of a sample of size 20. Compute in cell L7 the (population) variance of the sample means in column D. Compare both values. Repeat this computation for the sample of size 50 in cells K8 and M8 and compare both values.

Assignment PA207

Repeat Steps 1 to 6 for the random variable Y in Assignment PA41. The expected value, variance and standard deviation of Y are computed in Assignment PA61.


Problem PS208

1. Consider a Bernoulli process with a fraction of success \(\pi = 0.4\). Report the fraction 0.4 in cell A1. Consider 50 Bernoulli trials in the range A2:A51 with probability of success in cell A1. Example for cell A2: =IF(RAND() < A$1; 1; 0). Drag cell A2 to cell A51;

2. Compute the number of successes in the sample in cell C2: =SUM(A2:A51) and the fraction of successes in cell D2: =C2/50;

3. Use the TABLE-option of Excel to generate the sample statistics in C2:D2 a thousand times in columns C and D;

4. Report the expected value of the number of successes in a sample of 50 in cell F3: =50*A1. Compute the average number of successes in the 1000 samples in cell G3: =AVERAGE(C2:C1001). Compare both values;

5. Report the variance of the number of successes in a sample of 50 in cell F5: =50*A1*(1-A1). Compute the variance of the number of successes in the 1000 samples in cell G5: =VAR.P(C2:C1001). Compare both values;

6. Repeat Step 5 for the standard deviations of the number of successes in cells F7 and G7;

7. Copy the fraction of successes from cell A1 to cell F11. Compute the average of the fraction of successes in the 1000 samples: =AVERAGE(D2:D1001). Compare both values;

8. Report the variance of the fraction of successes in a sample of 50 in cell F13: =A1*(1-A1)/50. Compute the variance of the fraction of successes in the 1000 samples in cell G13: =VAR.P(D2:D1001). Compare both values;

9. Repeat Step 8 for the standard deviations of the fraction of successes in cells F15 and G15;

10. Change the fraction of success in cell A1 to different values: 0.8, 0.05, 0.95, 0.99.
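The same experiment can be sketched in Python as a cross-check of the spreadsheet results. This is an illustration only, using the standard library; the function name `simulate_counts` and the fixed seed are assumptions made so that the run is reproducible.

```python
import random
from statistics import mean, pvariance

def simulate_counts(pi, n_trials=50, n_samples=1000, seed=0):
    """Number of successes in each of n_samples runs of n_trials Bernoulli trials."""
    rng = random.Random(seed)
    return [sum(1 for _ in range(n_trials) if rng.random() < pi)
            for _ in range(n_samples)]

pi, n = 0.4, 50
counts = simulate_counts(pi, n)
fractions = [c / n for c in counts]

# Simulated value next to the theoretical value, as in Steps 4, 5, 7 and 8
print("mean count     ", mean(counts), n * pi)
print("var count      ", pvariance(counts), n * pi * (1 - pi))
print("mean fraction  ", mean(fractions), pi)
print("var fraction   ", pvariance(fractions), pi * (1 - pi) / n)
```

Changing `pi` to 0.8, 0.05, 0.95 or 0.99 corresponds to Step 10.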

Assignment PA208

A process generates a fraction 0.4 of successes. Use the function CRITBINOM to generate 1000 values of a binomial variable with \(n = 25\) and \(\pi = 0.4\). Report the fraction of successes for each sample. Compute the average number of successes and the average fraction of successes over the 1000 samples. Compare to their expected values. Compute the variance of the number of successes and the variance of the fractions over the 1000 samples and compare to their theoretical values. Use the TABLE-option of Excel to repeat the four sample statistics 50 times. Change the fraction in cell A1 to different values: 0.8, 0.05, 0.95, 0.99.


Problem PS209 (sampling from a normal and uniform process)

Consider a normal random variable X with expected value \(\mu = 30\) and standard deviation \(\sigma = 5\). Report these values in cell A2 (expected value) and cell B2 (standard deviation).

1. Generate 64 values of X in the range A3:A66: =NORM.INV(RAND(); $A$2; $B$2);

2. First consider a sample of size 10 (first 10 observations in column A). Compute the sample mean in cell D3, the sample variance in cell E3 and the sample standard deviation in cell F3. Then consider the full sample of size 64 and again compute the sample mean (cell G3), the sample variance (cell H3) and the sample standard deviation (cell I3);

3. Use the TABLE-option of Excel to generate the sample statistics in cells D3:I3 1000 times;

4. Consider the sample of size 10. Compute the average of the 1000 sample means in cell K4 and compare to its expected value 30. Compute the average of the 1000 sample variances in cell K6 and compare to its expected value 25. Compute the average of the 1000 sample standard deviations in cell K8 and compare to the standard deviation of X. Notice that the average value consistently underestimates \(\sigma\) when repeating the simulation. Compute the variance of the 1000 sample means in cell K10 and compare to its expected value 2.5;

5. Repeat Step 4 for the sample of size 64. Notice that the average value of the standard deviations is usually only slightly smaller than \(\sigma\) (larger sample sizes provide a better estimate of \(\sigma\));

6. Change the values of the parameters \(\mu\) and \(\sigma\) and check the results.
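The downward bias of the sample standard deviation noted in Steps 4 and 5 can also be observed with a short Python simulation. This is a sketch under stated assumptions: it draws normal values with `random.gauss` instead of NORM.INV(RAND(); …), draws separate samples for the two sample sizes, and the function name `simulate` and the seed are illustrative.

```python
import random
from statistics import mean, stdev, variance

def simulate(mu=30.0, sigma=5.0, n=10, reps=1000, seed=1):
    """Average the sample mean, variance and standard deviation over reps samples."""
    rng = random.Random(seed)
    means, variances, sds = [], [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(mean(sample))
        variances.append(variance(sample))   # divisor n - 1
        sds.append(stdev(sample))            # square root of the above
    return mean(means), mean(variances), mean(sds)

for size in (10, 64):
    avg_mean, avg_var, avg_sd = simulate(n=size)
    print(size, round(avg_mean, 3), round(avg_var, 3), round(avg_sd, 3))
```

The average variance should be close to 25 for both sample sizes, while the average standard deviation should fall below 5, more noticeably so for the sample of size 10.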

Assignment PA209

Repeat Steps 1 to 6 for a uniform random variable X with lower bound 5 and upper bound 15.


Problem PS210 (sampling from a skew gamma pdf)

Consider a (right skew) gamma random variable X with parameters \(\alpha = 2\) and \(\beta = 5\). Report these values in cell A2 and cell B2.

1. Generate 64 values of X in the range A3:A66: =GAMMA.INV(RAND(); $A$2; $B$2);

2. First consider a sample of size 10 (first 10 observations in column A). Compute the sample mean in cell D3, the sample variance in cell E3 and the sample standard deviation in cell F3. Then consider the full sample of size 64 and again compute the sample mean (cell G3), the sample variance (cell H3) and the sample standard deviation (cell I3);

3. Use the TABLE-option of Excel to generate the sample statistics in cells D3:I3 1000 times;

4. Consider the sample of size 10. Compute the average of the 1000 sample means in cell K4 and compare to its expected value \(\mu = 2 \cdot 5 = 10\), the expected value of X. Compute the average of the 1000 sample variances in cell K6 and compare to its expected value \(\sigma^2 = 2 \cdot 5 \cdot 5 = 50\), the variance of X. Compute the average of the 1000 sample standard deviations in cell K8 and compare to the standard deviation of X (\(\sqrt{50} = 7.07\)). Notice that the average value consistently underestimates \(\sigma\) when repeating the simulation. Compute the variance of the 1000 sample means in cell K10 and compare to its expected value \(\sigma^2/n = 50/10 = 5\);

5. Repeat Step 4 for the sample of size 64. Notice that the average value of the standard deviations is usually only slightly smaller than \(\sigma\) (larger sample sizes provide a better estimate of \(\sigma\));

6. Change the parameter \(\beta\) in cell B2 to the value 10 and check the results.

Assignment PA210

Consider a gamma random variable Y with parameters \(\alpha = 5\) and \(\beta = 2\) (still right-skewed, but less strongly than X above). Repeat Steps 1 to 6 for the variable Y (in Step 6, change the parameter \(\alpha\) to the value 10).


Problem PS211 (sample covariance and correlation)

Consider the random variables X and Y in Problem PP136. Their covariance and correlation were computed in Problem PP172: \(\mathrm{cov}(X, Y) = 0.9452\) and \(\mathrm{corr}(X, Y) = 0.4795\).

1. Toss the two dice 100 times and report the results in the ranges A2:A101 (die 1) and B2:B101 (die 2). Report the value of X and Y for every pair of tosses in columns C (X) and D (Y);

2. Consider the first twenty tosses as a sample of size 20 of X and Y. Compute the sample covariance in cell F3 and the sample correlation in cell G3. Consider the full sample of 100 observations, compute the sample covariance in cell H3 and the sample correlation in cell I3;

3. Use the TABLE-option of Excel to repeat the sample statistics in F3:I3 1000 times;

4. Report the value of \(\mathrm{cov}(X, Y)\) in cell K3 and the value of \(\mathrm{corr}(X, Y)\) in cell L3. Compute the average values of the four sample statistics in cells M3:N3 and M8:N8. Compare the averages to the exact values in cells K3:L3. Notice that in all cases the sample statistics provide, on average, close estimates of the process parameters (the expected value of the sample covariance equals the process covariance; this is not quite so for the correlation: see the Assignment for an example where the correlation is, on average, not a close estimate of the process correlation). However, notice the large variability in the sample statistics.

Assignment PA211

Consider the Assignments in Problem PP136 and Problem PP172. Generate 100 values of X and

Y in columns A and B. Compute the values 𝑋 π‘Œ and ⁄ in columns C and D. Repeat Steps 1 to

4 for the variables X and Y , then also for the variables 𝑋 π‘Œ and ⁄ . Notice here that the average of the correlations between 𝑋 π‘Œ and ⁄ is not very close to the theoretical correlation for the small sample size of 20 observations.


Problem PS212 (sample covariance and correlation for bivariate normal random variables)

Assume random variables X and Y are bivariate normally distributed with parameters \(\mu_X = 10\), \(\sigma_X^2 = 4\), \(\mu_Y = 10\), \(\sigma_Y^2 = 1\), \(\sigma_{XY} = -1\). It follows that the correlation \(\rho(X, Y) = -0.5\). Report the value of the correlation in cell A2.

1. Generate 100 values of X in cells A4:A103: =NORM.INV(RAND(); 10; 2). Generate 100 values of Y in cells B4:B103. Example for cell B4: =NORM.INV(RAND(); 10 + $A$2*(A4-10)/2; SQRT(1 - $A$2*$A$2)) (see also Problem PP177, Step 3);

2. Consider the first 15 observations as a sample of size 15 and compute the sample covariance in cell D3 and the sample correlation in cell E3. Repeat the computation of sample covariance and correlation for all data (sample size of 100) in cells F3 and G3;

3. Use the TABLE-option to repeat the statistics in D3:G3 1000 times;

4. Report the value of \(\mathrm{cov}(X, Y) = -1\) in cell I3 and the value of \(\mathrm{correl}(X, Y) = -0.5\) in cell J3. Compute the average value of the sample covariances for \(n = 15\) in cell L3 and the average value of the sample correlations in cell M3. Repeat the computations for \(n = 100\) in cells L7:M7. Compare the averages to the exact process values. Notice that in all cases the sample statistics provide, on average, close estimates of the process parameters with slightly better results for the larger sample.
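The conditional construction used for Y in Step 1 (draw X, then draw Y from its normal distribution given X) can be sketched in Python as follows. This is an illustration only: `random.gauss` stands in for NORM.INV(RAND(); …), separate samples are drawn for each sample size, and the helper names and seed are assumptions.

```python
import random
from math import sqrt
from statistics import mean

def sample_cov(xs, ys):
    """Sample covariance with divisor n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def sample_corr(xs, ys):
    return sample_cov(xs, ys) / sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))

def draw_pair(rng, rho=-0.5):
    """One (X, Y) pair: X ~ N(10, 2^2), then Y given X as in Step 1."""
    x = rng.gauss(10, 2)
    y = rng.gauss(10 + rho * (x - 10) / 2, sqrt(1 - rho * rho))
    return x, y

def simulate(n=15, reps=1000, seed=2):
    """Average sample covariance and correlation over reps samples of size n."""
    rng = random.Random(seed)
    covs, corrs = [], []
    for _ in range(reps):
        xs, ys = zip(*(draw_pair(rng) for _ in range(n)))
        covs.append(sample_cov(xs, ys))
        corrs.append(sample_corr(xs, ys))
    return mean(covs), mean(corrs)

for size in (15, 100):
    print(size, simulate(n=size))
```

The averages should be close to the process values -1 and -0.5, with the larger sample size typically giving the closer match.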

Assignment PA212

Consider the random variables \(S = X + Y\) and \(V = X - Y\) with X and Y the random variables above.

Generate 100 values of S and V in columns C and D for values of X and Y in columns A and B.

Repeat Steps 2 to 4 for the random variables S and V. Compare the averages of the sample statistics to the theoretical process covariances and correlations.


Problem PS213 (comparing estimators of the mean of a normal and uniform population)

Assume we need to estimate the mean \(\mu\) of a normally distributed population. To do this we have a sample of 9 observations from that population: \(x_1, x_2, \ldots, x_9\). Consider four estimators for the mean \(\mu\):

- the sample mean \(\bar{x} = \frac{1}{9} \sum x_i\),

- the average of the maximal and the minimal sample value,

- the sample median,

- a linear combination of the 9 observations, \(\frac{1}{45} (x_1 + 2 x_2 + 3 x_3 + \cdots + 9 x_9)\).

To check how useful these estimators are we set up an experiment:

1. Generate 9 observations from a normal population with expected value 10 and standard deviation 3 in the range A3:A11;

2. Compute the sample average in cell C3, the average of the maximal and minimal sample observation in cell D3, the sample median in cell E3 and the linear combination in cell F3;

3. Use the TABLE-option to generate the four estimators 1000 times;

4. Compute the average of the 1000 values of each estimator in cells I3:L3 and the variance in cells I4:L4;

5. Compare the estimators: all four are unbiased but the variability of the sample mean is smallest.
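The experiment in Steps 1 to 5 can be sketched in Python as a cross-check. This is an illustration only; it assumes the weights of the linear combination are 1/45, 2/45, …, 9/45 applied to the observations in the order drawn, and the function names and seed are hypothetical.

```python
import random
from statistics import mean, median, pvariance

WEIGHTS = [i / 45 for i in range(1, 10)]   # 1/45, 2/45, ..., 9/45 (sum to 1)

def estimators(sample):
    """The four estimators of the mean for one sample of 9 observations."""
    return (
        mean(sample),
        (max(sample) + min(sample)) / 2,
        median(sample),
        sum(w * x for w, x in zip(WEIGHTS, sample)),
    )

def simulate(mu=10.0, sigma=3.0, reps=1000, seed=3):
    """Average and variance of each estimator over reps simulated samples."""
    rng = random.Random(seed)
    results = [estimators([rng.gauss(mu, sigma) for _ in range(9)])
               for _ in range(reps)]
    columns = list(zip(*results))
    return [mean(c) for c in columns], [pvariance(c) for c in columns]

averages, variances = simulate()
names = ["mean", "midrange", "median", "linear combination"]
for name, avg, var in zip(names, averages, variances):
    print(f"{name:20s} average {avg:7.3f}  variance {var:6.3f}")
```

All four averages should be close to 10, and the variance of the sample mean should be the smallest of the four, near its theoretical value \(\sigma^2/9 = 1\).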

Assignment PA213

Apply the 5 steps above when X is uniformly distributed between 2 and 12.


Problem PS214 (combining data from two samples to estimate a mean and a proportion)

Assume X is a random variable with unknown expected value \(\mu\) and variance \(\sigma^2\). We take two samples from X of sizes m and n. Let \(\bar{X}_m\) and \(\bar{X}_n\) denote the sample means. Let \(\bar{X}\) denote the sample mean when the observations in both samples are taken together as one sample of size \(m + n\).

1. To estimate \(\mu\), consider the estimator \(\bar{\bar{X}} = \frac{1}{2} (\bar{X}_m + \bar{X}_n)\). Show that this estimator is unbiased and compute its variance. Answer: \(\mathrm{var}(\bar{\bar{X}}) = \frac{\sigma^2}{4} \left(\frac{1}{m} + \frac{1}{n}\right)\);

2. Show that for any m and n it holds that \(\mathrm{var}(\bar{X}) \le \mathrm{var}(\bar{\bar{X}})\). In which case are both equal? Compute both variances for \(\sigma = 3\), \(m = 16\), \(n = 8\). Answer: \(\mathrm{var}(\bar{\bar{X}}) = 0.4219\), \(\mathrm{var}(\bar{X}) = 0.375\);

3. Assume X to be normally distributed with expected value 10 and standard deviation 3. Generate a first sample of size 16 in cells A2:A17 and a second sample of size 8 in cells B2:B9. Compute the sample means of both samples in cells D2 and E2, the value of the estimator \(\bar{\bar{X}}\) in cell F2 and the value of \(\bar{X}\) in cell G2;

4. Use the TABLE-option to generate 1000 values of the statistics in cells D2:G2;

5. Compute the average of the weighted means in column F in cell J2 and the average of the sample means in column G in cell K2. The result shows experimentally that both estimators are unbiased;

6. Compute the variance of the weighted means in column F in cell J3 and the variance of the sample means in column G in cell K3. Compare both values with the results in Step 2.
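One way to obtain the answers to Steps 1 and 2 is sketched below. The derivation assumes that the two samples are independent and that each observation has variance \(\sigma^2\); it is offered as a worked outline, not as the only route.

```latex
\begin{align*}
\operatorname{var}(\bar{\bar{X}})
  &= \tfrac{1}{4}\bigl(\operatorname{var}(\bar{X}_m) + \operatorname{var}(\bar{X}_n)\bigr)
   = \frac{\sigma^2}{4}\left(\frac{1}{m} + \frac{1}{n}\right), \\
\operatorname{var}(\bar{X})
  &= \frac{\sigma^2}{m+n}, \\
\operatorname{var}(\bar{\bar{X}}) - \operatorname{var}(\bar{X})
  &= \sigma^2\left(\frac{m+n}{4mn} - \frac{1}{m+n}\right)
   = \sigma^2\,\frac{(m-n)^2}{4mn(m+n)} \;\ge\; 0 .
\end{align*}
```

The difference vanishes only when \(m = n\). For \(\sigma = 3\), \(m = 16\), \(n = 8\) it equals \(9 \cdot 64 / 12288 = 0.0469\), which matches \(0.4219 - 0.375\).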

Assignment PA214

Consider a Bernoulli process with a fraction of successes \(\pi = 0.25\). Consider a first sample consisting of 40 Bernoulli trials in the range A2:A41 and a second sample of 20 trials in the range B2:B21. Compute the fraction of successes in both samples and the average of both fractions.

Show that the average is an unbiased estimator of the process fraction 0.25. Derive the variance of this estimator. Compute the fraction of successes taking all 60 sample points as one sample.

Show that this fraction is an unbiased estimator and compute its variance. Use the TABLE-option to generate both estimators 1000 times. Compute the average and variance for both estimators over the 1000 observations and compare to the theoretical values above.


Problem PS215 (combining data from two samples to estimate a variance)

Assume X is a random variable with unknown variance \(\sigma^2\). We take two samples from X of sizes m and n. Let \(S_m^2\) and \(S_n^2\) denote the sample variances. Let \(S^2\) denote the sample variance when the observations in both samples are taken together as one sample of size \(m + n\).

1. To estimate \(\sigma^2\), consider the estimator \(S_{ave}^2 = \frac{1}{2} (S_m^2 + S_n^2)\). Show that this estimator is unbiased;

2. Assume X to be normally distributed with expected value 10 and standard deviation 3. Generate a first sample of size 16 in cells A2:A17 and a second sample of size 8 in cells B2:B9. Compute the sample variances of both samples in cells D2 and E2, the value of the estimator \(S_{ave}^2\) in cell F2 and the value of \(S^2\) in cell G2;

3. Use the TABLE-option to generate 1000 values of the statistics in cells D2:G2;

4. Compute the average of the estimates in column F in cell J2 and the average of the estimates in column G in cell K2. The result shows experimentally that both estimators are unbiased;

5. Compute the variance of the estimates in column F in cell J3 and the variance of the estimates in column G in cell K3. Compare both estimated variances. Notice that the variability of the sample variance based on the total sample is smaller than the variability of the average of both sample variances.

Assignment PA215

Repeat Steps 2 to 5 above but assume that X is a gamma random variable with parameters \(\alpha = 2\), \(\beta = 5\).


Problem PS216 (estimation of parameters of Poisson pmf and exponential pdf)

Let X be Poisson distributed with expected value \(\mu\). Take a random sample of size n.

1. Show that the sample mean \(\bar{X}\) is an unbiased estimator of \(\mu\) but that \(\bar{X}^2\) is a biased estimator of \(\mu^2\). However, show that \(\bar{X}^2\) is an asymptotically unbiased estimator of \(\mu^2\). Answer: \(E(\bar{X}^2) = \mu^2 + \mu/n\);

2. Derive an unbiased estimator for \(\mu^2\) from the result in Step 1. Answer: \(\bar{X}^2 - \bar{X}/n\);

3. Assume \(\mu = 4\). Generate a sample of 100 values of X in the range B6:B105 (see Problem PP115 to do this). Compute the mean of the first 12 observations (in the range B6:B17) in cell D6 and the square of cell D6 in cell E6. Compute the mean of all 100 observations in cell F6 and the square of cell F6 in cell G6. Use the TABLE-option to generate the sample statistics in cells D6:G6 1000 times;

4. Compute the average of the 1000 sample averages in column D (column F) in cell K12 (cell L12) and the average of the squares in column E (column G) in cell K15 (L15). Cells K12 and L12 illustrate that \(\bar{X}\) is an unbiased estimator of \(\mu\). Cells K15 and L15 show that \(\bar{X}^2\) is a biased estimator of \(\mu^2\) but the bias decreases with increasing sample size. Compute the value of \(\bar{X}^2 - \bar{X}/n\) in cells K18 and L18 illustrating the result in Step 2.
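The answers to Steps 1 and 2 can be outlined as follows. The sketch assumes independent observations and uses the fact that a Poisson variable has variance equal to its expected value \(\mu\).

```latex
\begin{align*}
E(\bar{X}^2)
  &= \operatorname{var}(\bar{X}) + \bigl(E(\bar{X})\bigr)^2
   = \frac{\mu}{n} + \mu^2, \\
E\!\left(\bar{X}^2 - \frac{\bar{X}}{n}\right)
  &= \mu^2 + \frac{\mu}{n} - \frac{\mu}{n}
   = \mu^2 .
\end{align*}
```

The bias \(\mu/n\) tends to zero as n grows, which is the asymptotic unbiasedness in Step 1. For \(\mu = 4\) the expected value of \(\bar{X}^2\) is \(16 + 4/12 \approx 16.33\) for \(n = 12\) and \(16 + 4/100 = 16.04\) for \(n = 100\), the values that cells K15 and L15 should approximate.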

Assignment PA216

Consider X with exponential pdf with parameter \(\lambda\) (see Problem PP120).

Show that the sample mean \(\bar{X}\) is an unbiased estimator of \(1/\lambda\).

Show that \(\bar{X}^2\) is a biased estimator of \(1/\lambda^2\) and that \(n \bar{X}^2/(n + 1)\) is an unbiased estimator of \(1/\lambda^2\).

Generate 100 observations of X when \(\lambda = 0.5\).

Compute the mean of the first 12 observations, its square and the value of \(12 \bar{X}^2/13\). Repeat similar statistics for all 100 observations.

Use the TABLE-option to repeat the six statistics 1000 times.

Compare the simulated results with the exact results shown above.
