Daniel C. Tracht
ECON 15A Notes 2
What Do We Study?
We want to know about a parameter in a population, so
we look at a sample of the population and create a statistic
that will answer our question. The statistic can be used to
describe the sample, or for making inferences.
In our sample, we have observations, each of which carries information in variables. The observations can be grouped by
the categories within a variable. Often we want to know
the relationship between variables. The affected variable is
the dependent variable, while the affecting variable is the independent variable.
Every variable can be treated as a representation of categories,
so we can always make nominal groupings. Sometimes there
is a clear order to the categories, which makes the data ordinal. When distance is meaningful, we can find the interval between two observations.
If there is also a meaningful zero, then we can take a ratio of
the values.
Averages
For nominal data, we can only calculate the mode, the
most common category. With ordinal data, we can also
calculate the median, the middle observation. With interval/ratio data, we can also calculate a mean. The most common
mean is the arithmetic mean, but there are also the harmonic, geometric, and quadratic means. For formulae and
relative sizes, see Table 1.
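Harmonic (smallest): $\left(\frac{1}{n}\sum_i x_i^{-1}\right)^{-1}$
Geometric (small): $\left(\prod_i x_i\right)^{\frac{1}{n}}$
Arithmetic (big): $\frac{1}{n}\sum_i x_i$
Quadratic (biggest): $\left(\frac{1}{n}\sum_i x_i^2\right)^{\frac{1}{2}}$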
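As a rough illustration of the four means and their relative sizes, here is a minimal Python sketch. The function names and the data values are made up for this example, and the ordering is only checked for positive data.

```python
import math


def harmonic_mean(xs):
    # Reciprocal of the arithmetic mean of the reciprocals.
    return len(xs) / sum(1 / x for x in xs)


def geometric_mean(xs):
    # n-th root of the product of the values.
    return math.prod(xs) ** (1 / len(xs))


def arithmetic_mean(xs):
    # Sum of the values divided by their count.
    return sum(xs) / len(xs)


def quadratic_mean(xs):
    # Square root of the arithmetic mean of the squares.
    return math.sqrt(sum(x * x for x in xs) / len(xs))


# Hypothetical positive observations, chosen only for illustration.
data = [1.0, 2.0, 4.0]

means = [
    harmonic_mean(data),
    geometric_mean(data),
    arithmetic_mean(data),
    quadratic_mean(data),
]

for name, value in zip(["harmonic", "geometric", "arithmetic", "quadratic"], means):
    print(f"{name:>10}: {value:.4f}")
```

For positive data that are not all equal, the printed values should increase from the harmonic mean to the quadratic mean.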
Moments and Standardization
There are raw and central moments.
The b-th raw moment is defined as $m'_b = \frac{1}{n}\sum_i x_i^b$. The b-th central
moment is defined as $m_b = \frac{1}{n}\sum_i (x_i - \mu)^b$. The second
central moment is the variance; the third and fourth, once standardized, give the skewness and kurtosis. As the order of the central
moment increases, more weight is placed on observations far from the mean.
To give comparable units, the moments are often standardized. The most common of these is the Z-score. The
Z-score of an observation is $z_i = \frac{1}{\sigma}(x_i - \mu)$. This is a unitless number that specifies how many standard deviations
from the mean the particular observation is. To standardize the b-th central moment, the moment is divided by $\sigma^b$ to rid
it of its units. For example, using the standardized skew,
it becomes possible to say that if the standardized skew is
positive, then the distribution is skewed to the right, and the larger the
standardized skew, the more skewed the distribution is.
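The definitions of moments and standardization can be sketched in a few lines of Python. This is only an illustration: the function names and the data are invented here, and the mean and standard deviation are computed from the data as if the data were the whole population.

```python
import math


def raw_moment(xs, b):
    # b-th raw moment: average of x_i to the power b.
    return sum(x ** b for x in xs) / len(xs)


def central_moment(xs, b):
    # b-th central moment: average of (x_i - mu) to the power b.
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** b for x in xs) / len(xs)


def z_scores(xs):
    # Distance of each observation from the mean, in standard deviations.
    mu = sum(xs) / len(xs)
    sigma = math.sqrt(central_moment(xs, 2))
    return [(x - mu) / sigma for x in xs]


def standardized_moment(xs, b):
    # Central moment divided by sigma to the power b, which removes the units.
    sigma = math.sqrt(central_moment(xs, 2))
    return central_moment(xs, b) / sigma ** b


# Hypothetical data with one large value, so the right tail is long.
data = [1.0, 2.0, 3.0, 10.0]

print("variance:", central_moment(data, 2))
print("z-scores:", [round(z, 3) for z in z_scores(data)])
print("standardized skew:", round(standardized_moment(data, 3), 3))
```

Because the large observation sits to the right of the mean, the standardized skew of this made-up sample comes out positive.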
Graphs
To visualize data, we use graphs. Depending on the type
of data, we have different options. The types are summarized in Table 2. The key point is that area is meaningful
in all of these. In box plots, histograms, and density plots, distance
is also meaningful. Histograms have all buckets the same
size, while density plots can have buckets of varying widths. But still,
area is meaningful.
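To see why area stays meaningful when bucket widths vary, here is a small Python sketch that computes bar heights for a density plot. The function name, the data, and the bucket edges are assumptions made for this example; the idea is that each bar's height is its share of the observations divided by its width, so each bar's area equals that share.

```python
def density_heights(values, edges):
    # For each bucket [left, right), height = share of observations / width.
    # The last bucket also includes its right edge.
    n = len(values)
    heights = []
    for i in range(len(edges) - 1):
        left, right = edges[i], edges[i + 1]
        is_last = i == len(edges) - 2
        count = sum(
            1 for v in values
            if left <= v < right or (is_last and v == right)
        )
        heights.append(count / (n * (right - left)))
    return heights


# Hypothetical observations and unequal bucket edges.
values = [1, 2, 2, 3, 4, 6, 9, 12]
edges = [0, 2, 4, 8, 16]

heights = density_heights(values, edges)
widths = [edges[i + 1] - edges[i] for i in range(len(edges) - 1)]
areas = [h * w for h, w in zip(heights, widths)]

print("heights:", heights)
print("areas:  ", areas)
print("total area:", sum(areas))
```

With equal-width buckets the heights are proportional to the counts, which is the histogram case; with unequal widths only the areas can be compared directly.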
Dispersion
In addition to the mean, we also need to know how
spread out the data is. For nominal data, there are many ways of calculating
an index of qualitative variation. If there is order, we can
calculate quartiles, the medians of the data above and below
the median. If we have interval data, we can calculate an
interquartile range using $\mathrm{IQR} = Q_3 - Q_1$.
With cardinal data, the first thing to calculate is the
deviation, which is simply $x_i - \mu$. Since these could be
positive or negative, the standard approaches are to either
take the absolute value or square them. After taking the
absolute value, we take the arithmetic mean to find the
Average Absolute Deviation: $\mathrm{AAD} = \frac{1}{n}\sum_i |x_i - \mu|$.
If we square the deviations instead, their sum
is known as the sum of squared deviations. Taking the
average of these, we get the variance, and from there, we
get the standard deviation by taking the square root. See
Table 3 for information about these.
To estimate these parameters in the population, we need
to use the statistics from the sample. However, since we
have already used our data to estimate the mean, one degree of freedom is lost, so we
use Bessel's correction, dividing by $n - 1$ instead of $n$, to get better estimates of the population parameters.
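The dispersion measures described in this section can be sketched in Python as follows. The function names and the sample values are made up for illustration. The quartile rule used here (medians of the lower and upper halves, leaving out the middle observation when the count is odd) is one common convention and an assumption of this sketch.

```python
import math


def median(xs):
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


def quartiles(xs):
    # Q1 and Q3 as medians of the lower and upper halves of the sorted data.
    s = sorted(xs)
    half = len(s) // 2
    return median(s[:half]), median(s[-half:])


def average_absolute_deviation(xs):
    mean = sum(xs) / len(xs)
    return sum(abs(x - mean) for x in xs) / len(xs)


def sum_squared_deviations(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def variance(xs, corrected=False):
    # corrected=True applies Bessel's correction (divide by n - 1).
    divisor = len(xs) - 1 if corrected else len(xs)
    return sum_squared_deviations(xs) / divisor


# Hypothetical sample, chosen so the arithmetic is easy to follow.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

q1, q3 = quartiles(sample)
print("IQR:", q3 - q1)
print("AAD:", average_absolute_deviation(sample))
print("uncorrected variance, SD:", variance(sample), math.sqrt(variance(sample)))
print("corrected variance, SD:  ", variance(sample, True), math.sqrt(variance(sample, True)))
```

The corrected variance is always a little larger than the uncorrected one, and the gap shrinks as the sample grows.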
Table 1. The Means (in order from smallest to biggest: harmonic, geometric, arithmetic, quadratic)
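Population: variance $\sigma^2 = \frac{1}{n}\sum_i (x_i - \mu)^2$, SD $\sqrt{\sigma^2}$
Uncorrected: variance $s_n^2 = \frac{1}{n}\sum_i (x_i - \bar{x})^2$, SD $\sqrt{s_n^2}$
Corrected: variance $\hat{\sigma}^2 = \frac{1}{n-1}\sum_i (x_i - \bar{x})^2$, SD $\sqrt{\hat{\sigma}^2}$
Table 3. Measures of Dispersion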
Table 2. Graphs (pie, bar, box, histogram, and density plots, arranged by data type: nominal, ordinal, interval/ratio)
Distribution Shapes
If the distribution has more than one peak, it is multimodal. Otherwise, it is unimodal. If there is no skew, the
distribution is symmetric. Otherwise, it is asymmetric.
For bell-shaped (approximately normal) distributions, about 68% of the observations
are within one standard deviation of the mean, about 95%
are within two, and about 99.7% are within three. Chebyshev's
Inequality shows that for any distribution with a finite mean and standard deviation, at least $1 - \frac{1}{k^2}$
of the observations are within $k$ standard deviations of the mean, for any $k > 1$.
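The two rules about standard deviations can be compared with a short simulation. This is a sketch only: the function names, the seed, the sample sizes, and the choice of a normal and an exponential distribution are all assumptions made for illustration.

```python
import math
import random


def share_within(xs, k):
    # Fraction of observations within k standard deviations of the mean.
    mu = sum(xs) / len(xs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return sum(1 for x in xs if abs(x - mu) <= k * sigma) / len(xs)


def chebyshev_bound(k):
    # Minimum share guaranteed by Chebyshev's Inequality.
    return 1 - 1 / k ** 2


random.seed(0)  # fixed seed so the simulated data are reproducible
bell = [random.gauss(0, 1) for _ in range(100_000)]           # bell-shaped
skewed = [random.expovariate(1.0) for _ in range(100_000)]    # right-skewed

for k in (1, 2, 3):
    print(
        f"k={k}: bell {share_within(bell, k):.3f}, "
        f"skewed {share_within(skewed, k):.3f}, "
        f"Chebyshev bound {chebyshev_bound(k):.3f}"
    )
```

The bell-shaped sample should land close to the 68%, 95%, and 99.7% figures, while the skewed sample need not, although both stay above the Chebyshev bound.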