251y0411 2/27/04 Name: ________KEY___________ Student Number and class: _____________________

ECO251 QBA1
FIRST HOUR EXAM
February 18, 2004
Name: ________KEY___________
Student Number and class: _____________________
Part I. (7 points)
Use the five 3-digit numbers that you used in the second problem of the take-home exam. (If you don’t have
them, take the numbers (9, 9, 9, 9, 9, 9, 10, 12, 8, 1300) and replace the nines with your student number,
changing any zeros in your student number to ones; then rewrite the resulting string of digits as five
three-digit numbers. Example: Seymour Butz’s student number is 976500, so he gets 976511101281300,
which, as three-digit numbers, is (976, 511, 101, 281, 300).)
Compute the following:
a) The Median (1)
b) The Standard Deviation (3)
c) The 3rd Quintile (2)
d) The Coefficient of variation (1)
Solution: The numbers in order are 101, 281, 300, 511, 976.
          x        x^2
x1      101      10201
x2      281      78961
x3      300      90000
x4      511     261121
x5      976     952576
Total  2169    1392859
a) The middle number is 300.
b) $n = 5$, $\bar{x} = \frac{\sum x}{n} = \frac{2169}{5} = 433.80$, $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{1392859 - 5(433.80)^2}{5-1} = \frac{451946.8}{4} = 112986.7$. So $s = \sqrt{112986.7} = 336.1349$.
c) $p(n+1) = .6(6) = 3.6$. So $a = 3$ and $.b = .6$. $x_{1-p} = x_a + .b(x_{a+1} - x_a)$, so $x_{1-.6} = x_{.4} = x_3 + 0.6(x_4 - x_3) = 300 + 0.6(511 - 300) = 426.6$.
d) $C = \frac{s}{\bar{x}} = \frac{336.1349}{433.8} = 0.7749$
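The Part I calculations can be checked with a short Python sketch. It assumes the position rule p(n+1) with linear interpolation between neighbouring ordered values, as in the solution; the function names `fractile` and `sample_std` are illustrative, not part of the course materials.

```python
import math

def fractile(values, p):
    """Value at position p*(n+1) in the sorted data, interpolating between neighbours."""
    xs = sorted(values)
    n = len(xs)
    position = p * (n + 1)
    a = int(position)      # whole part: 1-based index of the lower neighbour
    b = position - a       # fractional part used for interpolation
    if a < 1:
        return xs[0]
    if a >= n:
        return xs[-1]
    return xs[a - 1] + b * (xs[a] - xs[a - 1])

def sample_std(values):
    """Sample standard deviation via the computational formula."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt((sum(x * x for x in values) - n * mean ** 2) / (n - 1))

data = [976, 511, 101, 281, 300]
median = fractile(data, 0.5)          # position 3
third_quintile = fractile(data, 0.6)  # position 3.6
s = sample_std(data)
cv = s / (sum(data) / len(data))      # coefficient of variation
print(median, third_quintile, round(s, 4), round(cv, 4))
```

With a = 3, the interpolation starts from the third ordered value (300) and moves 0.6 of the way toward the fourth (511).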
Part II. (At least 35 points – At least 2 points each)
1. Which of the following is a graph of the cumulative distribution?
a) *Ogive
b) Histogram
c) Frequency Polygon
d) Pie Chart
e) None of the above
2. Which of the following is an example of continuous ratio data?
a) The Likert Scale rates consumer impression of a product on a 1 to 5 scale with one best
and 5 worst.
b) The Celsius scale for measuring temperature
c) The number of Brittany Spaniels entered in a dog show
d) *The number of dollars you paid in sales tax last year.
3. Cumulative relative frequency cannot be calculated by
a) *Taking the cumulative frequency for each class and dividing by the sum of cumulative
frequencies for all classes.
b) The relative frequency for each class plus the sum of the relative frequencies of all
previous classes.
c) The relative frequency of each class plus the cumulative relative frequency of the
previous class
d) The cumulative frequency of each class divided by the sum of the frequencies for all
classes.
e) All of the above can be used to calculate cumulative relative frequencies.
4. Consider the following formulas:
(i) $\frac{\sum x^2 - n\bar{x}^2}{n-1}$
(ii) $\frac{k_3}{s^3}$
(iii) $\frac{3(\text{mean} - \text{mode})}{\text{std. deviation}}$
(iv) $\frac{n}{(n-1)(n-2)}\left[\sum x^3 - 3\bar{x}\sum x^2 + 2n\bar{x}^3\right]$
If the sample is skewed to the left, which of these should be positive?
a) *(i)
b) (ii)
c) (iii)
d) (iv)
e) None should be positive
f) All should be positive.
Answer: Any legitimate measure of skewness should be negative if the distribution is skewed to the left. From your formula table, the measures of skewness are: $k_3 = \frac{n}{(n-1)(n-2)}\sum (x-\bar{x})^3 = \frac{n}{(n-1)(n-2)}\left[\sum x^3 - 3\bar{x}\sum x^2 + 2n\bar{x}^3\right]$ (skewness), $g_1 = \frac{k_3}{s^3}$ (relative skewness) and $SK = \frac{3(\text{mean} - \text{mode})}{\text{std. deviation}}$ (Pearson’s measure of skewness). The other one, formula (i), is $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{\sum (x-\bar{x})^2}{n-1}$, the sample variance, which is always positive and measures dispersion.
5. (D-68) From which of the following would it be easiest to calculate the interquartile range?
a) Mean, median and mode
b) Ogive
c) *Box Plot.
d) Stem-and-leaf plot
6. A summary measure that is computed to describe a characteristic of a sample is called
a) a parameter.
b) a census.
c) *a statistic.
d) the scientific method.
7. What is the difference between a field and a cell? Make a diagram of a table and show where one
of these is.
Solution: A cell is a location in a field. The table that was handed out to you is below.

Table Number
Title
Headnote
+-----------+--------------------+
| Stub Head | Master Caption     |  <- Boxhead
|           | Column Labels      |
+-----------+--------------------+
| ROW       | CELLS              |  <- Field
| LABELS    |                    |
+-----------+--------------------+
  (Stub)
Footnotes
Source Note


8. If a distribution is skewed to the right, we would expect
a) mode > mean
b) *mode < median
c) median > mean
d) mode > median
Explanation: If it is skewed to the right the order should be ‘mode, median, mean,’ so the
median is larger than the mode.
9. The estimation of the population average family expenditure on food based on the sample average
expenditure of 1,000 families is an example of
a) *inferential statistics.
b) descriptive statistics.
c) a parameter.
d) a statistic.
10. Which of the following is most likely a parameter as opposed to a statistic?
a) the average score of the first five students completing an assignment
b) *the proportion of females registered to vote in a county
c) the average height of people randomly selected from a database
d) the proportion of trucks stopped yesterday that were cited for bad brakes
Duplicate question – ignore!!!
11. Which of the following is most likely a parameter as opposed to a statistic?
a) the average score of the first five students completing an assignment
b) the proportion of females registered to vote in a county
c) the average height of people randomly selected from a database
d) the proportion of trucks stopped yesterday that were cited for bad brakes
TABLE 2-2
At a meeting of information systems officers for regional offices of a national company, a survey was taken to determine the
number of employees the officers supervise in the operation of their departments, where X is the number of employees overseen
by each information systems officer.
X     f
1     7
2     5
3    11
4     8
5     9
12. Referring to Table 2-2, across all of the regional offices, how many total employees were
supervised by those surveyed?
a) 15
b) 40
c) *127
d) 200
Explanation: $\sum fx = 7(1) + 5(2) + 11(3) + 8(4) + 9(5) = 7 + 10 + 33 + 32 + 45 = 127$.
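As a quick check of the weighted sum for Table 2-2, here is a minimal Python sketch. The dictionary name `table_2_2` is only an illustrative label for the frequency table.

```python
# X (employees supervised) mapped to f (number of officers reporting that X)
table_2_2 = {1: 7, 2: 5, 3: 11, 4: 8, 5: 9}

officers_surveyed = sum(table_2_2.values())                 # sum of f
employees_supervised = sum(x * f for x, f in table_2_2.items())  # sum of f*x

print(officers_surveyed, employees_supervised)
```

The sum of the frequencies alone gives the number of officers surveyed, which is why 40 appears among the wrong choices.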
TABLE 2-4
A survey was conducted to determine how people rated the quality of programming available on television. Respondents were
asked to rate the overall quality from 0 (no quality at all) to 100 (extremely good quality). The stem-and-leaf display of the data
is shown below.
Stem | Leaves
3    | 24
4    | 03478999
5    | 0112345
6    | 12566
7    | 01
8    |
9    | 2
13. Referring to Table 2-4, what fraction of the respondents rated overall television quality with a
rating of 80 or above?
a) 0.00
b) *0.04
c) 0.96
d) 1.00
Explanation: There are 25 numbers. The highest 3 are 70, 71 and 92. Only 92 is above 80. 1
out of 25 is $\frac{1}{25} = .04$.
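The count can be reproduced by expanding the stem-and-leaf display back into raw ratings. This is a small sketch; the variable names are illustrative.

```python
# Each stem is the tens digit; each character in the leaf string is a units digit.
stem_leaf = {3: "24", 4: "03478999", 5: "0112345", 6: "12566", 7: "01", 8: "", 9: "2"}

ratings = [10 * stem + int(leaf) for stem, leaves in stem_leaf.items() for leaf in leaves]
fraction_80_plus = sum(r >= 80 for r in ratings) / len(ratings)

print(len(ratings), sorted(ratings)[-3:], fraction_80_plus)
```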
14. (6 points) On the basis of 100 observations
Stock A has a mean rate of return of 7% with a standard deviation of 1%
Stock B has a mean rate of return of 9% and a standard deviation of 1.5%
For stock A the fraction of observations between 4% and 10% must be at least ___88___%.
If returns on stock A have a symmetrical unimodal distribution, the fraction of observations between
4% and 10% must be approximately __99.7%_______.
According to what you have learned in class, which of these two stocks is riskier? You must show why
for your answer to count.
Explanation: According to the Bienayme-Chebyshev rule (I called it Chebyshef’s Inequality), $\frac{1}{k^2}$ is
the largest possible proportion in the tails, where tails are defined as the points below $\mu - k\sigma$ and the
points above $\mu + k\sigma$. Since, for stock A, $\mu = 7$ and $\sigma = 1$, 4 is $\mu - 3\sigma$ and 10 is $\mu + 3\sigma$, so
$k = 3$. $\frac{1}{k^2} = \frac{1}{9}$ is the maximum proportion in the tails and $1 - \frac{1}{9} = \frac{8}{9} = 88.9\%$ is the minimum
proportion in the center. According to the empirical rule, almost all or approximately 99.7% must be in
the center (‘99%,’ ‘99.7%’ or ‘almost all’ were accepted.) For Stock A, the coefficient of variation
is $\frac{1}{7} = 0.143$ and for Stock B it is $\frac{1.5}{9} = 0.167$. Stock B is riskier.
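The two numerical steps in this explanation, the Chebyshev bound and the comparison of coefficients of variation, can be sketched in Python. The helper names are illustrative assumptions, not course notation.

```python
def chebyshev_min_center(k):
    """Minimum proportion within k standard deviations of the mean (requires k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def coefficient_of_variation(mean, std_dev):
    """Standard deviation relative to the mean."""
    return std_dev / mean

# Stock A: mean 7%, standard deviation 1%; the interval 4% to 10% is mean +/- 3 std devs.
k = (10 - 7) / 1
min_center_a = chebyshev_min_center(k)

cv_a = coefficient_of_variation(7, 1)
cv_b = coefficient_of_variation(9, 1.5)
riskier = "B" if cv_b > cv_a else "A"
print(round(min_center_a, 4), round(cv_a, 3), round(cv_b, 3), riskier)
```

The comparison uses relative dispersion because the two stocks have different mean returns.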
15. (3 points) A survey of 47 cities shows that the number of new AIDS cases reported last year varied
from 135 to 1337. If these data are to be presented in 5 classes, what intervals would you use?
Explain your reasoning using an appropriate formula and use it to fill in the table below.
Class   From   To
A
B
C
D
E

Use $\frac{1337 - 135}{5} = 240.4$. A possible interval above 240.4 might be 250 or 300. We might use one
of the following:

Class   From   To              Class   From   To
A       100    under 350       A       0      under 300
B       350    under 600       B       300    under 600
C       600    under 850       C       600    under 900
D       850    under 1100      D       900    under 1200
E       1100   under 1350      E       1200   under 1500
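The class-width reasoning can be checked with a short Python sketch: compute the minimum width from the range, then confirm that a rounded-up width and starting point cover both extremes. The function names `class_limits` and `covers` are illustrative.

```python
def class_limits(start, width, classes):
    """Return (from, under) pairs for equal-width classes."""
    return [(start + i * width, start + (i + 1) * width) for i in range(classes)]

def covers(limits, low, high):
    """True if the lowest and highest observations both fall inside the classes."""
    return limits[0][0] <= low and high < limits[-1][1]

low, high, classes = 135, 1337, 5
min_width = (high - low) / classes      # any usable width must be at least this

option_1 = class_limits(100, 250, classes)
option_2 = class_limits(0, 300, classes)
print(min_width, option_1, covers(option_1, low, high))
print(option_2, covers(option_2, low, high))
```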
16. Cities are divided into three classes:
Class
A   At least 900 new AIDS cases
B   Less than 900 new AIDS cases
C   More than 30% of new AIDS cases also had a chronic contagious disease.
Which of the following classes are mutually exclusive? (Circle) (1.5)
A and B,   B and C,   A, B, and C
Which of the following classes are collectively exhaustive? (Circle) (1.5)
A and B,   B and C,   A, B, and C
ECO251 QBA1
FIRST EXAM
February 18, 2004
TAKE HOME SECTION
Name: _________________________
Student Number: _________________________
Throughout this exam show your work! Please indicate clearly what sections of the problem you are
answering and what formulas you are using. Turn this in with your in-class exam.
Part III. Do all the Following (11 Points) Show your work!
1. Look at the frequency distribution below. Replace the 9s with your student number. If any digit of your
student number is 0, change it to a 1. For example, Seymour Butz’s student number is 976500 so the
frequencies he uses are (9, 7, 6, 5, 1, 1, 10, 12, 8).
Class            frequency
$500- 599.99     9
$600- 699.99     9
$700- 799.99     9
$800- 899.99     9
$900- 999.99     9
$1000-1099.99    9
$1100-1199.99    10
$1200-1299.99    12
$1300-1399.99    8

a. Calculate the Cumulative Frequency (0.5)
b. Calculate The Mean (0.5)
c. Calculate the Median (1)
d. Calculate the Mode (0.5)
e. Calculate the Variance (1.5)
f. Calculate the Standard Deviation (1)
g. Calculate the Interquartile Range (1.5)
h. Calculate a Statistic showing Skewness and Interpret it (1.5)
i. Make a histogram of the Data showing relative or percentage frequency (Neatness Counts!) (1)
j. Extra credit: Put a (horizontal) box plot below the histogram using the same scale. (1)

Assume that this data represents a sample of rents paid in Chester County.
Solution: x is the midpoint of the class. Our convention is to use the midpoint of 50 to 60, not 59.999.
Note also, that the midpoints have been divided by 10. Most numbers should be multiplied by 10, the
variance should be multiplied by 100 and $k_3$ by 1000. Calculations follow for both the computational and
definitional formulas. (Don’t do both.)

     class          f    F     x      fx      fx^2       fx^3
A    50- 59.999     9    9    55     495     27225    1497375
B    60- 69.999     7   16    65     455     29575    1922375
C    70- 79.999     6   22    75     450     33750    2531250
D    80- 89.999     5   27    85     425     36125    3070625
E    90- 99.999     1   28    95      95      9025     857375
F   100-109.999     1   29   105     105     11025    1157625
G   110-119.999    10   39   115    1150    132250   15208750
H   120-129.999    12   51   125    1500    187500   23437500
I   130-139.999     8   59   135    1080    145800   19683000
Total              59               5755    612275   69365875

     x - x̄       f(x - x̄)   f(x - x̄)^2   f(x - x̄)^3
A   -42.5424    -382.881     16288.7      -692959
B   -32.5424    -227.797      7413.0      -241238
C   -22.5424    -135.254      3049.0       -68731
D   -12.5424     -62.712       786.6        -9865
E    -2.5424      -2.542         6.5          -16
F     7.4576       7.458        55.6          415
G    17.4576     174.576      3047.7        53205
H    27.4576     329.492      9047.1       248411
I    37.4576     299.661     11224.6       420447
Total              0.001     50918.8      -290331

$\sum f = n = 59$, $\sum fx = 5755$, $\sum fx^2 = 612275$, $\sum fx^3 = 69365875$, $\sum f(x-\bar{x}) = 0$ (except for a
rounding error), $\sum f(x-\bar{x})^2 = 50918.8$, and $\sum f(x-\bar{x})^3 = -290331$. Note that, to be reasonable, the
mean, median and quartiles must fall between 50 and 140.
a. Calculate the Cumulative Frequency (1): (See above) The cumulative frequency is the whole F column.
b. Calculate the Mean (1): $\bar{x} = \frac{\sum fx}{n} = \frac{5755}{59} = 97.5424$
c. Calculate the Median (2): $position = p(n+1) = .5(60) = 30$. This is above $F = 29$ and below $F = 39$,
so the interval is G, 110-119.999 in tens of dollars. $x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right]w$ so
$x_{1-.5} = x_{.5} = 110 + \left[\frac{.5(59) - 29}{10}\right]10 = 110 + 0.5 = 110.5$
d. Calculate the Mode (1): The mode is the midpoint of the largest group. Since 12 is the largest frequency,
the modal group is H, 120 to 129.99, and the mode is 125 (in tens of dollars).
e. Calculate the Variance (3): $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n-1} = \frac{612275 - 59(97.5424)^2}{58} = \frac{50918.332}{58} = 877.902$ or
$s^2 = \frac{\sum f(x-\bar{x})^2}{n-1} = \frac{50918.8}{58} = 877.910$. The computer got 877.908.
f. Calculate the Standard Deviation (2): $s = \sqrt{877.9} = 29.629$.
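Parts b, e and f can be reproduced from the class midpoints and Seymour's frequencies with a short Python sketch. It computes the variance by both the computational and the definitional formula; the variable names are illustrative.

```python
import math

midpoints = [55, 65, 75, 85, 95, 105, 115, 125, 135]   # class midpoints, divided by 10
freqs = [9, 7, 6, 5, 1, 1, 10, 12, 8]                  # Seymour's frequencies

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
mean = sum_fx / n

# Computational formula: (sum fx^2 - n * mean^2) / (n - 1)
var_computational = (sum_fx2 - n * mean ** 2) / (n - 1)
# Definitional formula: sum f (x - mean)^2 / (n - 1)
var_definitional = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / (n - 1)

std_dev = math.sqrt(var_computational)
print(n, sum_fx, sum_fx2, round(mean, 4), round(var_computational, 3), round(std_dev, 3))
```

Carrying the mean at full precision gives the same value from both formulas, so the small differences between the hand-computed variances come from rounding the mean to four decimals.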
g. Calculate the Interquartile Range (3): First Quartile: $position = p(n+1) = .25(60) = 15$. This is above
$F = 9$ and below $F = 16$, so the interval is B, 60-69.999. $x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right]w$ gives us, in tens of dollars,
$Q1 = x_{1-.25} = x_{.75} = 60 + \left[\frac{.25(59) - 9}{7}\right]10 = 68.214$.
Third Quartile: $position = p(n+1) = .75(60) = 45$. This is above $F = 39$ and below $F = 51$, so the interval
is H, 120-129.999. $x_{1-.75} = x_{.25} = 120 + \left[\frac{.75(59) - 39}{12}\right]10 = 124.375$.
$IQR = Q3 - Q1 = 124.375 - 68.214 = 56.161$.
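The grouped-data interpolation used for the median and the quartiles can be sketched as one Python function. It follows the convention in the solution: the position p(n+1) locates the class, and the formula L + ((pn - F)/f)w interpolates inside it. The name `grouped_fractile` is an illustrative assumption.

```python
def grouped_fractile(lower_limits, freqs, width, p):
    """Interpolated fractile from a frequency table with equal class widths."""
    n = sum(freqs)
    position = p * (n + 1)          # used only to find the class
    cumulative = 0                  # F: cumulative frequency below the current class
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= position:
            return lower + (p * n - cumulative) / f * width
        cumulative += f
    raise ValueError("p must be between 0 and 1")

lower_limits = [50, 60, 70, 80, 90, 100, 110, 120, 130]
freqs = [9, 7, 6, 5, 1, 1, 10, 12, 8]

median = grouped_fractile(lower_limits, freqs, 10, 0.50)
q1 = grouped_fractile(lower_limits, freqs, 10, 0.25)
q3 = grouped_fractile(lower_limits, freqs, 10, 0.75)
iqr = q3 - q1
print(round(median, 3), round(q1, 3), round(q3, 3), round(iqr, 3))
```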
h. Calculate a Statistic showing Skewness and interpret it (3):
We had $n = 59$, $\sum fx = 5755$, $\sum fx^2 = 612275$, $\sum fx^3 = 69365875$, and $\sum f(x-\bar{x})^3 = -290331$.
$k_3 = \frac{n}{(n-1)(n-2)}\left[\sum fx^3 - 3\bar{x}\sum fx^2 + 2n\bar{x}^3\right] = \frac{59}{(58)(57)}\left[69365875 - 3(97.5424)(612275) + 2(59)(97.5424)^3\right]$
$= 0.017846[69365875 - 179168319 + 109512153] = 0.017846(-290291) = -5181$
or $k_3 = \frac{n}{(n-1)(n-2)}\sum f(x-\bar{x})^3 = \frac{59}{(58)(57)}(-290331) = -5181$. The computer gets -5181.37.
or $g_1 = \frac{k_3}{s^3} = \frac{-5181}{29.629^3} = -0.19912$
or Pearson's Measure of Skewness $SK = \frac{3(\text{mean} - \text{mode})}{\text{std. deviation}} = \frac{3(97.5424 - 125)}{29.629} = -2.780$.
Because of the negative sign, the measures imply skewness to the left.
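The three skewness measures can be recomputed in Python from the midpoints and frequencies. This is a sketch under the same definitions as the solution (k3 with the n/((n-1)(n-2)) factor, g1 = k3/s^3, and Pearson's SK with the mode taken as 125); the variable names are illustrative.

```python
import math

midpoints = [55, 65, 75, 85, 95, 105, 115, 125, 135]
freqs = [9, 7, 6, 5, 1, 1, 10, 12, 8]
mode = 125                      # midpoint of the class with the largest frequency

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
s = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / (n - 1))

# Skewness from the definitional (deviation) form
sum_cubed_dev = sum(f * (x - mean) ** 3 for f, x in zip(freqs, midpoints))
k3 = n / ((n - 1) * (n - 2)) * sum_cubed_dev

g1 = k3 / s ** 3                # relative skewness
sk = 3 * (mean - mode) / s      # Pearson's measure of skewness

print(round(k3, 2), round(g1, 4), round(sk, 3))
```

All three values carry the same negative sign, which is what supports the reading of skewness to the left.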
i. A histogram is a simple bar graph with frequency on the y-axis and the class limits ($500 to $1400) on the x-axis.
The data Seymour showed is:

     class          f     f rel
A    50- 59.999     9    .1525
B    60- 69.999     7    .1186
C    70- 79.999     6    .1017
D    80- 89.999     5    .0847
E    90- 99.999     1    .0169
F   100-109.999     1    .0169
G   110-119.999    10    .1695
H   120-129.999    12    .2034
I   130-139.999     8    .1356
Total              59    .9998

Each number in the f rel column is the corresponding number in the f column divided by $n = 59$. The y axis
could be marked from zero to 0.25.
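The relative-frequency column can be generated with a few lines of Python. This sketch only reproduces the column; drawing the histogram itself is left out.

```python
freqs = [9, 7, 6, 5, 1, 1, 10, 12, 8]
n = sum(freqs)

rel_freqs = [f / n for f in freqs]
rounded = [round(r, 4) for r in rel_freqs]

print(rounded)
print(round(sum(rounded), 4))   # differs slightly from 1 because each entry is rounded
print(max(rel_freqs) < 0.25)    # a y axis running from 0 to 0.25 is tall enough
```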
j. The box plot should show the median and the quartiles and use the same x axis as the histogram.
2. Use the frequencies you used in problem 1 in this problem as values of x . Add 1300 at the end. Write
the result in clumps of 3 digits. Example: In the last problem, Seymour Butz used (9, 7, 6, 5, 1, 1, 10, 12,
8). If we add 1300 at the end, we have 976511101281300. In 3 digit clumps this gives him (976, 511, 101,
281, 300).
For these five numbers, compute the a) Geometric Mean, b) Harmonic mean, c) Root-mean-square (1 point
each). Label each clearly. If you wish, d) Compute the geometric mean using natural or base 10 logarithms.
(1 point extra credit each). While you’re at it, compute the sample mean and bring it to the exam (no credit,
but it won’t hurt).
Solution: Note that Seymour found $\sum x = 2169$. This is not used in any of the following calculations and
there is no reason why you should have computed it!
a) The Geometric Mean.
$x_g = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n} = \sqrt[n]{\prod x} = \sqrt[5]{(976)(511)(101)(281)(300)} = \sqrt[5]{4.2463879 \times 10^{12}} = (4246387900000)^{0.20} = 335.432$.
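b) The Harmonic Mean.
$\frac{1}{x_h} = \frac{1}{n}\sum\frac{1}{x} = \frac{1}{5}\left(\frac{1}{976} + \frac{1}{511} + \frac{1}{101} + \frac{1}{281} + \frac{1}{300}\right)$
$= \frac{1}{5}(0.00102459 + 0.00195695 + 0.00990099 + 0.00355872 + 0.00333333) = \frac{1}{5}(0.019774583) = 0.00395492$.
So $x_h = \frac{1}{\frac{1}{n}\sum\frac{1}{x}} = \frac{1}{0.00395492} = 252.850$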
c) The Root-Mean-Square.
$x_{rms}^2 = \frac{1}{n}\sum x^2 = \frac{1}{5}\left(976^2 + 511^2 + 101^2 + 281^2 + 300^2\right) = \frac{1}{5}(952576 + 261121 + 10201 + 78961 + 90000)$
$= \frac{1}{5}(1392859) = 278571.8$.
So $x_{rms} = \sqrt{\frac{1}{n}\sum x^2} = \sqrt{278571.8} = 527.799$.
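The three means in parts a) to c) can be verified directly in Python. This is a minimal sketch; `math.prod` requires Python 3.8 or later, and the variable names are illustrative.

```python
import math

xs = [976, 511, 101, 281, 300]
n = len(xs)

geometric = math.prod(xs) ** (1 / n)            # n-th root of the product
harmonic = n / sum(1 / x for x in xs)           # reciprocal of the mean reciprocal
rms = math.sqrt(sum(x * x for x in xs) / n)     # square root of the mean square
arithmetic = sum(xs) / n                        # shown only for comparison

print(round(geometric, 3), round(harmonic, 3), round(rms, 3), arithmetic)
```

For these five numbers the harmonic mean is the smallest, followed by the geometric mean, the arithmetic mean, and the root-mean-square.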
d) (i) Geometric mean using natural logarithms
$\ln(x_g) = \frac{1}{n}\sum \ln(x) = \frac{1}{5}\left[\ln(976) + \ln(511) + \ln(101) + \ln(281) + \ln(300)\right]$
$= \frac{1}{5}[6.88346 + 6.23637 + 4.61512 + 5.63835 + 5.70378] = \frac{1}{5}(29.07709) = 5.81542$
So $x_g = e^{5.81542} = 335.432$.
(ii) Geometric mean using logarithms to the base 10
$\log(x_g) = \frac{1}{n}\sum \log(x) = \frac{1}{5}\left[\log(976) + \log(511) + \log(101) + \log(281) + \log(300)\right]$
$= \frac{1}{5}[2.98945 + 2.70842 + 2.00432 + 2.44871 + 2.47712] = \frac{1}{5}(12.62801) = 2.52560$.
So $x_g = 10^{2.52560} = 335.432$.
Notice that the original numbers and all the means are between 101 and 976.
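The logarithm route in part d) can be checked the same way: average the logs, then raise the base to that average. This Python sketch uses both natural and base 10 logarithms; the variable names are illustrative.

```python
import math

xs = [976, 511, 101, 281, 300]
n = len(xs)

mean_ln = sum(math.log(x) for x in xs) / n        # average natural log
mean_log10 = sum(math.log10(x) for x in xs) / n   # average base 10 log

geometric_via_ln = math.exp(mean_ln)
geometric_via_log10 = 10 ** mean_log10

print(round(mean_ln, 5), round(geometric_via_ln, 3))
print(round(mean_log10, 5), round(geometric_via_log10, 3))
```

Both bases lead to the same geometric mean; only the intermediate averages differ.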