S05
The Shape characteristic of the Distribution of
populations arising from mathematical theory, and
random samples from any population.
(See textbook Chapter 3)
Your textbook, in Sections 3.1 through 3.3, deals with “sets of numerical data” without worrying
about their possible source(s). Let us take a more limited view and say that the dataset in question
is either a population or a random sample from a population.
Chapter 3 gives methods for summarizing data to describe important characteristics of the data's
distribution. What do we mean by “distribution of data”? It is a very general term meaning the
pattern of the numbers: Where are they centered? How do they spread out around the center?
What are the quantiles, deciles, and so on?
One important characteristic of the distribution of a dataset is its shape. By shape we mean the
way in which the proportion of the numbers in the dataset changes as we move across subintervals
of the interval of the real line that contains all the numbers in the dataset.
Example: A tiny population.
Consider a dataset which is (say) the population consisting of elements {1, 1, 1, 2, 2, 2, 2, 2, 3, 3,
3, 3, 3, 3, 4, 4, 4, 4, 5, 5}. The proportions of unique values are

A tiny population: distribution
Value        1      2      3      4      5
Proportion   3/20   5/20   6/20   4/20   2/20
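The proportions in the table can be checked by simply counting. Here is a minimal Python sketch using only the standard library; the variable names are illustrative and not from the handout.

```python
from collections import Counter
from fractions import Fraction

# The tiny population of 20 elements from the example above.
population = [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5]

# Count each unique value, then turn each count into an exact proportion.
counts = Counter(population)
proportions = {value: Fraction(count, len(population))
               for value, count in sorted(counts.items())}

for value, prop in proportions.items():
    # Show the count over 20 (Fraction itself would reduce 5/20 to 1/4).
    print(value, f"{counts[value]}/{len(population)}", float(prop))
```

Exact fractions are used so that the proportions add up to exactly 1 with no rounding.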
JMP output for this population (data table "A tiny population" with one column, "Population
elements", holding the 20 values above in rows 1 to 20):

[Figure: JMP Distributions report for "Population elements", axis from 0 to 6]

Moments
Mean             2.85
Std Dev          1.2258187
Std Err Mean     0.2741014
Upper 95% Mean   3.4237008
Lower 95% Mean   2.2762992
N                20
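The Moments figures can be reproduced by hand. JMP's Moments report treats the column as a sample, so its Std Dev uses the divisor n − 1. The sketch below assumes the 0.975 quantile of Student's t with 19 degrees of freedom (about 2.093024) is copied from a t table rather than computed, since the Python standard library has no t distribution.

```python
import math
import statistics

population = [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5]
n = len(population)

mean = statistics.mean(population)        # 57/20 = 2.85
std_dev = statistics.stdev(population)    # divisor n - 1, like JMP's "Std Dev"
std_err = std_dev / math.sqrt(n)          # JMP's "Std Err Mean"

# Assumption: 0.975 quantile of Student's t with n - 1 = 19 degrees of freedom,
# taken from a t table (not computed here).
t_975_19 = 2.093024
upper = mean + t_975_19 * std_err         # "Upper 95% Mean"
lower = mean - t_975_19 * std_err         # "Lower 95% Mean"

print(f"Mean {mean:.2f}  Std Dev {std_dev:.7f}  Std Err Mean {std_err:.7f}")
print(f"Upper 95% Mean {upper:.7f}  Lower 95% Mean {lower:.7f}  N {n}")
```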
A histogram of relative frequencies (proportions) in this case looks like this:

[Figure: horizontal bar histogram with the values 1 to 5 on the vertical axis and proportions 1/20 to 6/20 on the horizontal axis]
The configuration across the tops of the bars shows the shape of this distribution. Note that
every population whose unique elements are 1, 2, 3, 4 and 5, occurring with proportions 3/20,
5/20, 6/20, 4/20 and 2/20, will have this shape regardless of population size. Some other graphs
describing various common shapes of distributions, and the name of each, are given in Figure 3.7
on page 73 of your textbook. Let us next review some of the theoretical families of populations
and look at the shapes of their distributions as examples of what population and random sample
shapes may look like.
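The claim that population size does not matter can be checked directly: a population built by copying the tiny population many times has the same unique values in the same proportions, hence the same histogram. In this sketch the 1000-element population is hypothetical, made up only for the comparison, and the text bars (one `#` per 1/20) stand in for the histogram.

```python
from collections import Counter
from fractions import Fraction

def proportions(data):
    """Return {value: exact proportion} for the unique values in data."""
    counts = Counter(data)
    return {v: Fraction(c, len(data)) for v, c in sorted(counts.items())}

tiny = [1] * 3 + [2] * 5 + [3] * 6 + [4] * 4 + [5] * 2
bigger = tiny * 50   # hypothetical population: the 20 elements copied 50 times

# Same unique values with the same proportions, hence the same shape.
assert proportions(tiny) == proportions(bigger)

# Text version of the relative-frequency histogram: one '#' per 1/20.
for value, p in proportions(tiny).items():
    print(value, "#" * int(p * 20), p)
```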
Example: The Binomial, a family of populations arising in mathematical theory.
Consider a collection of populations indexed by a positive integer n and a real number 0 < p < 1.
(These were also mentioned in a previous handout.) The pair (n, p), for specific n and p,
identifies what we will call a Binomial (n, p) population. The unique elements in such a
population are the integers 0, 1, 2, ..., n. These occur in proportions given by the value
of the density function f(x), defined as
$$
f(x) = \begin{cases} \dfrac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}, & x = 0, 1, 2, \ldots, n \\ 0, & \text{otherwise} \end{cases}
$$
(Recall that $n! = 1 \cdot 2 \cdot 3 \cdot 4 \cdots n$ and $0! \equiv 1$.)
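The density f(x) translates almost word for word into Python. This is a minimal sketch; the function name `binomial_pmf` is our own, and `math.comb(n, x)` computes n!/(x!(n − x)!).

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Proportion of a Binomial(n, p) population equal to x (the density f(x) above)."""
    if x not in range(n + 1):      # f(x) = 0 off the integers 0, 1, ..., n
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

# The proportions over x = 0, ..., n always add up to 1.
for n, p in [(5, 0.2), (5, 0.5), (10, 0.2)]:
    print(n, p, sum(binomial_pmf(x, n, p) for x in range(n + 1)))
```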
The shapes of the distributions of Binomial (5, 0.2), (5, 0.5), (5, 0.8), and (10, 0.2) are shown by
histograms in Figure 5.3 on page 233. (Don't worry about Chapter 5 now.) Note that population
size is not a consideration: the shape depends only on n and p.
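To see these shapes without the textbook figure, one can tabulate f(x) for the same four (n, p) pairs and print crude text bars. In this sketch the bar length (50 `#` characters per unit of proportion) is an arbitrary display choice. Binomial (5, 0.2) piles up at the low end with a tail to the right, Binomial (5, 0.5) is symmetric about 2.5, and Binomial (5, 0.8) is the mirror image of Binomial (5, 0.2).

```python
from math import comb

def binomial_pmf(x, n, p):
    """f(x) for a Binomial(n, p) population, x = 0, 1, ..., n."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# The four populations pictured in the textbook's Figure 5.3.
shapes = {}
for n, p in [(5, 0.2), (5, 0.5), (5, 0.8), (10, 0.2)]:
    shapes[(n, p)] = [binomial_pmf(x, n, p) for x in range(n + 1)]
    print(f"Binomial({n}, {p})")
    for x, fx in enumerate(shapes[(n, p)]):
        # Text histogram: bar length proportional to f(x).
        print(f"  {x:2d} {'#' * round(50 * fx)} {fx:.4f}")
```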
For the case of Binomial (2, 0.66), the proportions are

x    f(x)
0    $\frac{2!}{0!\,2!}(0.66)^0(0.34)^2 = 0.1156$
1    $\frac{2!}{1!\,1!}(0.66)^1(0.34)^1 = 0.4488$
2    $\frac{2!}{2!\,0!}(0.66)^2(0.34)^0 = 0.4356$
Also, the cumulative distribution function value F(1) = f(0) + f(1) = 0.1156 + 0.4488 = 0.5644, for
example, is the proportion of the population in the interval [0, 1]. The Binomial (n, p) is what we
call a family of populations. Each population, of course, has its own distribution. These
populations arise from mathematical theory, so their distributions are called theoretical
distributions. Certainly they are figments of our imagination. These are examples of Discrete Populations.
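The Binomial (2, 0.66) table and the value F(1) can be recomputed as follows. This is a minimal sketch; `f` and `F` mirror the handout's notation, and the CDF is just a running sum of f over the values at or below t.

```python
from math import factorial

n, p = 2, 0.66

# f(x) = n! / (x! (n - x)!) * p^x * (1 - p)^(n - x), for x = 0, 1, 2
f = {x: factorial(n) / (factorial(x) * factorial(n - x)) * p**x * (1 - p)**(n - x)
     for x in range(n + 1)}

def F(t):
    """CDF: proportion of the population at or below t."""
    return sum(fx for x, fx in f.items() if x <= t)

for x, fx in f.items():
    print(x, round(fx, 4))         # 0.1156, 0.4488, 0.4356
print("F(1) =", round(F(1), 4))    # f(0) + f(1) = 0.5644
```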
Example: The standard Exponential population arising in theory.
Populations like this one require some imagination to envision. The unique elements in this
population are the totality of positive real numbers, and each number occurs one or more times in
the population. The proportion of population elements in any given interval (a, b) is found by
integrating the following density function h(x) over (a, b).
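$$
h(x) = \begin{cases} e^{-x}, & x > 0 \\ 0, & \text{otherwise} \end{cases}
$$
The cumulative distribution function is $F(x) = \int_0^x e^{-y}\,dy = 1 - e^{-x}$ for $x > 0$. Thus the
proportion of population elements in the interval [2, 3] is
$F(3) - F(2) = \int_2^3 e^{-y}\,dy = e^{-2} - e^{-3} \approx 0.0855$, and the proportion in $(0, \infty)$ is
$\int_0^\infty e^{-y}\,dy = 1$.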
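The proportion in [2, 3] can be computed both from the closed-form CDF and by integrating h(x) numerically. The midpoint rule used here is our own choice of numerical method, not something the handout prescribes; it is included only to show that the two routes agree.

```python
import math

def h(x):
    """Standard exponential density: e^(-x) for x > 0, else 0."""
    return math.exp(-x) if x > 0 else 0.0

def F(x):
    """CDF: the integral of e^(-y) from 0 to x, i.e. 1 - e^(-x) for x > 0."""
    return 1 - math.exp(-x) if x > 0 else 0.0

def midpoint_integral(func, a, b, steps=10_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(func(a + (i + 0.5) * width) for i in range(steps))

exact = F(3) - F(2)                 # e^-2 - e^-3
approx = midpoint_integral(h, 2, 3)
print(f"{exact:.4f} {approx:.4f}")  # both about 0.0855
```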
Since the unique elements in this population form a continuum over (0, ∞), the shape of the
distribution can be shown as the graph of h(x) over (0, ∞), which is the continuous curve shown
in Figure 5.14 on page 258 with α = 1. Think of this as the limiting histogram as the width of the
histogram rectangles goes to zero. This is a Continuous Population. It is, of course, one
member of the Exponential E(α) family of populations.
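The "limiting histogram" idea can be made concrete. Assume the histogram bars are drawn on a density scale, so a bar's height is the proportion of the population in the bar divided by the bar's width (this scaling is our assumption; a plain proportion histogram would shrink to zero height). Then, as the width w shrinks, the height of the bar starting at x approaches h(x). The choice x = 1 is arbitrary.

```python
import math

def F(x):
    """Standard exponential CDF: 1 - e^(-x) for x > 0, 0 otherwise."""
    return 1 - math.exp(-x) if x > 0 else 0.0

x = 1.0
# Density-scale bar height over [x, x + w): proportion in the bar / bar width.
for w in [1.0, 0.1, 0.01, 0.001]:
    height = (F(x + w) - F(x)) / w
    print(f"w = {w:<6} bar height = {height:.5f}")
print(f"h({x}) = {math.exp(-x):.5f}")   # the limit as w -> 0
```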
Example: The Standard Normal population, a Continuous Population arising in theory.
The unique elements in this theoretical population are all the real numbers. Each occurs one or
more times. The proportion of population elements in any given interval (a, b) is the integral of
the following density function g(x) over (a, b).
$$
g(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}}, \qquad -\infty < x < \infty.
$$
The Cumulative Distribution Function (CDF) is $F(x) = \int_{-\infty}^{x} g(y)\,dy$. The shape of the standard
Normal distribution is the bell shape. The graphs in Figure 5.11 on page 254 show this shape.
These are, of course, graphs of g(x). Note the symmetry of the distribution about zero. Thus half
of the population elements are less than zero and half are greater than zero.
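The symmetry can be checked numerically. This sketch writes g(x) out directly and compares it with `statistics.NormalDist` from the Python standard library (Python 3.8 or later), whose `pdf` and `cdf` methods evaluate the standard normal density and CDF.

```python
import math
from statistics import NormalDist

def g(x):
    """Standard normal density g(x) = e^(-x^2/2) / sqrt(2*pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

Z = NormalDist(mu=0.0, sigma=1.0)   # the standard normal population

print(math.isclose(g(0.7), Z.pdf(0.7)))           # same density: True
print(g(-1.3) == g(1.3))                           # symmetric about zero: True
print(Z.cdf(0.0))                                  # half the population lies below zero: 0.5
print(math.isclose(Z.cdf(-1.0), 1 - Z.cdf(1.0)))  # mirror-image tail proportions: True
```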
The integral of g(x) has no closed form (no formula in terms of elementary functions), so the integral
of g(x) over any finite interval must be approximated numerically. This is difficult, and we won't
attempt it. Table 3.10 on page 89 can be used to find approximate population proportions over some
intervals. Using the table we can find, for example,
Interval          Proportion
(−∞, −1.88)       0.03
(−∞, −0.39)       0.35
(−∞, −0.08)       0.47
(−∞, 1.65)        0.95
(−∞, 1.88)        0.97
(−0.39, −0.08)    0.47 − 0.35 = 0.12
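For the curious, here is what such a numerical approximation can look like. This sketch uses composite Simpson's rule (a technique we chose; the handout does not name one) and assumes that starting the integral at −10 instead of −∞ is harmless, since the density there is negligible. The results are compared with `statistics.NormalDist().cdf` and with the two-decimal table values above.

```python
import math
from statistics import NormalDist

def g(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def simpson(func, a, b, steps=2000):
    """Composite Simpson's rule on [a, b] (steps must be even)."""
    width = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * width)
    return total * width / 3

def normal_cdf(x, lower=-10.0):
    """Approximate F(x) by integrating g from `lower` (a stand-in for -infinity) to x."""
    return simpson(g, lower, x)

table = {-1.88: 0.03, -0.39: 0.35, -0.08: 0.47, 1.65: 0.95, 1.88: 0.97}
for x, tabled in table.items():
    print(f"F({x:5.2f}) = {normal_cdf(x):.4f}  (library {NormalDist().cdf(x):.4f}, table {tabled})")

# The proportion in (-0.39, -0.08) is a difference of two CDF values.
print(round(normal_cdf(-0.08) - normal_cdf(-0.39), 2))   # 0.12
```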
JMP will evaluate g(x) and the CDF. Table B.3 on pages 788 and 789 inverts Table 3.10 by
giving proportion in the body of the table and interval endpoint in the margin.
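Going the other way, from a proportion to an interval endpoint, means inverting the CDF. A minimal sketch with the standard library's `NormalDist.inv_cdf`; the printed endpoints agree with the table entries above to within two-decimal rounding (for example, the exact endpoint for 0.95 is about 1.645).

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal (mu = 0, sigma = 1)

# inv_cdf: proportion in, right endpoint of the interval (-inf, z) out.
for p in [0.03, 0.35, 0.47, 0.95, 0.97]:
    print(p, round(Z.inv_cdf(p), 3))
```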
Why do we care about theoretical populations?
Mathematical theory provides facts and figures about theoretical populations: population mean,
variance, quantiles, deciles, shape of distribution, and so on. Basically, we can assume that
anything we might wish to know about the characteristics of a population is known for each
theoretical population.
Now, in a real-world situation in which we imagine a population of interest (like thrust face
runouts), if evidence points to the distribution of our population being similar to that of
theoretical population Z (say), then the characteristics of our population of interest will be similar
to the known characteristics of population Z. How can we acquire evidence about the
distribution of elements in our population of interest? The answer is to take a random sample of
population elements and look at its distribution (especially the shape). If it approximates that of
population Z, this is evidence that the distribution of our population might be similar to that
of population Z.
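The comparison step can be sketched with simulated data. Purely for illustration, this sketch assumes that the "population of interest" secretly is the standard exponential population and takes population Z to be the same theoretical population. In practice the sample would be real measurements and Z a candidate theoretical population. The sample size, seed, and intervals are arbitrary choices.

```python
import math
import random

random.seed(2024)   # fixed seed so the run is reproducible
n = 10_000
# Hypothetical random sample; here it is simulated from a standard exponential.
sample = [random.expovariate(1.0) for _ in range(n)]

def exp_cdf(x):
    """CDF of theoretical population Z (standard exponential)."""
    return 1 - math.exp(-x) if x > 0 else 0.0

# Compare sample proportions with population-Z proportions over a few intervals.
bins = [(0, 0.5), (0.5, 1), (1, 2), (2, 3), (3, math.inf)]
for a, b in bins:
    sample_prop = sum(a <= x < b for x in sample) / n
    theory_prop = exp_cdf(b) - exp_cdf(a)
    print(f"[{a}, {b}): sample {sample_prop:.3f}  theory {theory_prop:.3f}")
```

Close agreement in every interval is evidence, not proof, that the two distributions have similar shapes.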
Chapter 3 in your textbook discusses ways to get evidence about the distribution of elements in a
random sample (or any dataset we might have).