6. Methods 6.8. Methods related to outputs, Introduction

To present the outcomes of statistical data collections in a form that most users can easily
understand, a variety of statistical techniques are used. Techniques generally associated
with the presentation of statistics are cross-tabulation; summary statistics, including
averages (measures of central tendency), growth rates and indexes; and graphical
representation of outcomes, including various forms of graphs and maps.
A cross tabulation (commonly known as a table) displays the joint distribution of two or
more variables. It is usually presented as a contingency table in matrix format.
Whereas a frequency distribution provides the distribution of one variable, a contingency
table describes the distribution of two or more variables simultaneously. Each cell shows the
number of respondents who gave a specific combination of responses, that is, each cell
holds the count for one combination of categories.
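As a rough illustration of how the cells of a contingency table are counted, the sketch below tabulates two variables. The survey responses and the function name cross_tabulate are hypothetical and are not taken from the source.

```python
from collections import Counter

# Hypothetical survey responses: one (sex, employment status) pair per respondent.
responses = [
    ("female", "employed"), ("male", "employed"), ("female", "unemployed"),
    ("male", "employed"), ("female", "employed"), ("male", "unemployed"),
]

def cross_tabulate(pairs):
    """Count how many respondents gave each (row, column) combination."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # One list per row category, one cell per column category.
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

rows, cols, table = cross_tabulate(responses)
for label, cells in zip(rows, table):
    print(label, dict(zip(cols, cells)))
```

The cell counts always add up to the number of respondents, which is a quick consistency check on any cross tabulation.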
For averages, go to Module 5.8. Mathematical methods in the outputs – Methods.
A histogram is a popular form of graphical display of tabulated frequencies, shown as bars. It
shows what proportion of cases falls into each of several categories: it is a form of data
binning. The categories are usually specified as non-overlapping intervals of some variable,
and the categories (bars) must be adjacent. The intervals (or bands, or bins) are generally of
the same size, and are most easily interpreted if they are. Histograms are used to plot the
density of data, and often for density estimation, that is, estimating the probability density
function of the underlying variable.
For further information, go to: http://en.wikipedia.org/wiki/Histogram
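The binning behind a histogram can be sketched as follows, assuming equal-width, adjacent intervals. The data, the function name histogram and the choice to include the upper edge in the last bin are illustrative assumptions.

```python
def histogram(values, low, width, n_bins):
    """Count values in n_bins adjacent, equal-width intervals starting at low.

    Each bin is half-open, [start, start + width), except the last bin,
    which also includes its upper edge.
    """
    counts = [0] * n_bins
    high = low + width * n_bins
    for v in values:
        if v < low or v > high:
            continue  # outside the binned range
        index = min(int((v - low) // width), n_bins - 1)
        counts[index] += 1
    return counts

ages = [3, 7, 12, 18, 21, 25, 25, 33, 38, 40]  # hypothetical data
counts = histogram(ages, low=0, width=10, n_bins=4)
proportions = [c / len(ages) for c in counts]  # bar heights as proportions
```

Here the bars cover the intervals 0-10, 10-20, 20-30 and 30-40, and the proportions sum to one because every observation falls inside the binned range.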
Among the many other forms of graphical display of statistics are:
The pie chart (or a circle graph) is a circular chart divided into sectors, illustrating relative
magnitudes or frequencies. In a pie chart, the arc length of each sector (and consequently its
central angle and area), is proportional to the quantity it represents. Together, the sectors
create a full disk. It is named for its resemblance to a pie which has been sliced. While the
pie chart is perhaps the most ubiquitous statistical chart in the business world and the mass
media, it is rarely used in scientific or technical publications. It is one of the most widely
criticized charts, and many statisticians recommend avoiding it altogether.
For other forms of statistical summaries, go to Module 5 Methods – Tools.
Measures of central tendency are one of the dominant forms of summary statistics.
Central tendency
Measures of central tendency include different types of means, the median and the mode.
They are commonly known as averages. An average is a single value within a range of data,
used to represent all the values in the series.
An average can be either weighted or un-weighted. There are three kinds of averages or
means: arithmetic (A), geometric (G) and harmonic (H).
Un-weighted arithmetic mean
The un-weighted arithmetic mean is possibly the most frequently used measure of an
average, or mean, value. It is often called a ‘simple’ average, since all numbers (x_i) are
represented once and given the same weight. The un-weighted arithmetic mean is derived by
summing the numbers and dividing by the count of numbers (n).
The un-weighted arithmetic average is easily computed, and it is the measure that most
people intuitively understand as an average. However, that fact does not make it
suitable for all purposes.
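A minimal sketch of the calculation, A = (x_1 + ... + x_n) / n, with hypothetical figures and a hypothetical function name:

```python
def arithmetic_mean(values):
    """Sum of the numbers divided by the count of numbers n."""
    if not values:
        raise ValueError("the mean of an empty series is undefined")
    return sum(values) / len(values)

prices = [2.0, 4.0, 9.0]               # hypothetical data
mean_price = arithmetic_mean(prices)   # (2 + 4 + 9) / 3
```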
Weighted arithmetic mean
A weighted arithmetic mean is obtained by multiplying each number (x_i) by its weight (w_i),
adding these products, and dividing the sum of the products by the sum of the weights.
Weighted arithmetic means are commonly used to construct price and volume indices, e.g.,
a Laspeyres price index is a weighted arithmetic average of price relatives.
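The sketch below applies the weighted arithmetic mean to a Laspeyres-type calculation. The price relatives and base-period expenditure shares are invented for illustration, and the function name is an assumption.

```python
def weighted_arithmetic_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical price relatives (current price / base price) for three items,
# weighted by hypothetical base-period expenditure shares.
price_relatives = [1.10, 1.00, 1.20]
base_shares = [0.5, 0.3, 0.2]
laspeyres_index = 100.0 * weighted_arithmetic_mean(price_relatives, base_shares)
```

With equal weights the function reduces to the un-weighted arithmetic mean.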
Un-weighted geometric mean
An un-weighted geometric mean of a series of n numbers is obtained as the n-th root of the
product of these numbers.
An alternative procedure is to compute the logarithm of each number in the series and
derive ln G as the arithmetic mean of the logarithms. G is then derived as the exponential of
ln G.
The geometric mean is mostly used for averaging rates of change or ratios. Taking the
geometric mean of growth rates, or changes, expressed as factors (1 + rate), is equivalent to
deriving a compounding growth rate. Using a geometric mean is the most common way of
estimating growth in economic aggregates over a period of time. Furthermore, in the
compilation of price indices, geometric averages are sometimes used to average prices at the
most elementary level, where no weights are available. A geometric average is typically used
to avoid giving undue importance to a few extreme values.
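Both procedures described above, the n-th root of the product and the exponential of the mean logarithm, can be sketched as follows. The growth figures and function names are hypothetical, and the sketch assumes all values are positive.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs a non-empty, positive series")
    return math.prod(values) ** (1.0 / len(values))

def geometric_mean_via_logs(values):
    """Same quantity: exponential of the arithmetic mean of the logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs a non-empty, positive series")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical annual growth of +10%, +20% and -5%, written as growth factors.
growth_factors = [1.10, 1.20, 0.95]
average_factor = geometric_mean(growth_factors)
average_growth_percent = (average_factor - 1.0) * 100.0
```

Compounding the average factor over the three periods reproduces the total change in the series, which is the sense in which the geometric mean gives a compounding growth rate.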
Weighted geometric mean
A weighted geometric mean of a series of numbers is obtained by raising each number to the
power of its weight, multiplying the results, and raising this product to the power of one
over the sum of the weights.
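A sketch of the weighted geometric mean in its standard form, G = (x_1^w_1 * ... * x_n^w_n)^(1 / sum of w_i), computed through logarithms for numerical stability. The function name is an assumption.

```python
import math

def weighted_geometric_mean(values, weights):
    """exp( sum(w_i * ln x_i) / sum(w_i) ) for positive values x_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    log_sum = sum(w * math.log(x) for x, w in zip(values, weights))
    return math.exp(log_sum / total_weight)
```

For example, weighted_geometric_mean([2.0, 8.0], [3, 1]) is the fourth root of 2 * 2 * 2 * 8, and with equal weights the result is the un-weighted geometric mean.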
Harmonic mean
The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the numbers
in the series.
Harmonic means or weighted harmonic means are commonly used in averaging indices. A
Paasche index is a weighted harmonic mean, and one of the best-known Paasche
indices is the GDP deflator.
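A minimal sketch of the un-weighted harmonic mean, H = n / (1/x_1 + ... + 1/x_n), assuming positive values. The figures and function name are hypothetical.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the harmonic mean needs a non-empty, positive series")
    return len(values) / sum(1.0 / v for v in values)

ratios = [40.0, 60.0]                 # hypothetical data
mean_ratio = harmonic_mean(ratios)    # 2 / (1/40 + 1/60)
```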
Weighted harmonic mean
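In its standard definition, a weighted harmonic mean is the sum of the weights divided by the sum of each weight over its number, H = (sum of w_i) / (sum of w_i / x_i). The sketch below assumes positive values; the price relatives, the current-period value shares and the function name are hypothetical.

```python
def weighted_harmonic_mean(values, weights):
    """Sum of the weights divided by the sum of weight / value."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    denominator = sum(w / x for x, w in zip(values, weights))
    if denominator == 0:
        raise ValueError("the weights must not all be zero")
    return sum(weights) / denominator

# Hypothetical price relatives weighted by current-period value shares,
# the weighting pattern of a Paasche-type price index.
price_relatives = [1.10, 1.25]
current_shares = [0.6, 0.4]
paasche_index = 100.0 * weighted_harmonic_mean(price_relatives, current_shares)
```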
Relationship between the different means
The following relationship between the three types of means always holds if all values of
the series are positive: A >= G >= H (equality holds only if all numbers in the series are
equal).
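The ordering can be checked numerically with the mean, geometric_mean and harmonic_mean functions of Python's standard statistics module. The data and the helper name three_means are illustrative.

```python
import statistics

def three_means(values):
    """Return (A, G, H) for a series of positive numbers."""
    return (
        statistics.mean(values),
        statistics.geometric_mean(values),
        statistics.harmonic_mean(values),
    )

a, g, h = three_means([2.0, 4.0, 8.0])         # hypothetical data: A > G > H
equal_means = three_means([5.0, 5.0, 5.0])     # all three coincide (up to rounding)
```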
Median or fractiles
The median is also a measure of central tendency. The median of a set of n numbers, x_1, x_2,
..., x_n, is the middle value when the numbers are arranged from smallest to largest. If n is
an odd number, there exists a unique middle value, and it is the median. If n is an even
number, there are two middle values, and the median is defined as their average. Roughly
speaking, the median is the value that divides the data series into equal halves: 50 percent of
the observations lie below the median and 50 percent lie above it. Some prefer to call the
median the position average.
The difference between the arithmetic average and the median says something about the
distribution of the series. The arithmetic average is a weighted center of a data series in the
sense that the series balances at the point of the arithmetic mean. The arithmetic
mean is influenced by any extreme values (small or large) in the series. The median, on the
other hand, is not influenced by those extreme values. This point is illustrated in the
example below.
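A small illustration, using invented income figures (not data from the source) and a hypothetical median function that follows the odd/even rule described earlier:

```python
def median(values):
    """Middle value of the ordered series; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty series is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

# Hypothetical incomes (in thousands); one extreme value makes the series asymmetrical.
incomes = [18, 20, 22, 25, 27, 30, 400]
mean_income = sum(incomes) / len(incomes)   # pulled far upwards by the value 400
median_income = median(incomes)             # stays at the middle observation
```

Six of the seven hypothetical incomes lie well below the arithmetic mean, while the median sits in the middle of the bulk of the observations.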
The example above illustrates that the median is likely to be a more sensible measure of
center than the arithmetic mean when the distribution is (extremely) asymmetrical. This is
why reports on income distributions frequently quote the median as a summary measure,
rather than the average.
If the number of observations is quite large, it is sometimes useful to extend the notion of
the median and divide the data into fractiles: quartiles (4), quintiles (5), deciles (10), or
percentiles (100). A fractile is a value at or below which lies a given fraction of a data series.
Data are arranged from the smallest number to the largest, so that the value of the third
quartile will always be higher than (or at least as high as) the first and the second (the
median).
A fractile is found as the item in position (p/q)*n + 1/2 of the ordered series, and is
measured as the value of this item, where p is the fractile number, q is the number of parts
into which the series is to be divided, and n is the number of items in the series.
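The position rule can be sketched as below. The source does not say what to do when the position is not a whole number, so the linear interpolation between the two neighbouring items (and the clamping of the position to the series) is an assumption, as are the data and the function name.

```python
def fractile(values, p, q):
    """Value at 1-based position (p/q)*n + 1/2 in the ordered series.

    Non-integer positions are resolved by linear interpolation between the
    neighbouring items (an assumption, not stated in the text).
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0 or not 0 < p < q:
        raise ValueError("need a non-empty series and 0 < p < q")
    position = (p / q) * n + 0.5
    position = min(max(position, 1.0), float(n))   # keep the position inside the series
    lower = int(position)                          # item at or just below the position
    fraction = position - lower
    if fraction == 0 or lower >= n:
        return ordered[lower - 1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

data = [10, 20, 30, 40, 50, 60, 70, 80]   # hypothetical data, n = 8
first_quartile = fractile(data, 1, 4)     # position 2.5
second_quartile = fractile(data, 2, 4)    # position 4.5, the median
third_quartile = fractile(data, 3, 4)     # position 6.5
```

For p/q = 1/2 the rule reproduces the median: the single middle item when n is odd, and the average of the two middle items when n is even.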
Mode
If the numbers of a data series tend to concentrate around one value, this number is called
the mode. The mode is in that sense the most typical, or frequently observed, value in a
series. The mode is therefore the value that a number selected at random is most likely to
take. To obtain the mode, data should be graphed or presented with their frequencies.
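One way to present the data with their frequencies is a simple count of each value. The sketch below returns every value that shares the highest frequency, since a series can have more than one mode; the function name is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return the most frequently observed value(s), in ascending order."""
    if not values:
        raise ValueError("the mode of an empty series is undefined")
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)
```

For continuous measurements the values would normally be grouped into intervals first, because exact repeats are rare.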
Growth rates
Growth rates are a special form of central tendency presentation.
A growth rate shows the percentage change from one period to the next, or the
average change over a number of periods.
One-period growth rate
For one period, the growth rate is calculated as a simple percentage change: the change in
value from the previous period, divided by the value in the previous period and multiplied by 100.
The one-period growth rate can be used on all kinds of data (except where the
initial value is zero) to measure growth in economic indicators such as GDP or CPI, population
growth, etc., from one period to the next.
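A minimal sketch of the simple percentage change, including the zero-initial-value case noted above. The function name and figures are hypothetical.

```python
def one_period_growth(previous, current):
    """Percentage change from the previous period to the current period."""
    if previous == 0:
        raise ValueError("growth is undefined when the initial value is zero")
    return (current - previous) / previous * 100.0

growth = one_period_growth(200.0, 210.0)   # hypothetical values: a rise of 10 on 200
```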
Compounding or geometric average growth rate
A compounding growth rate gives the average growth over a period. To compute the
average growth, the values at the start (0) and at the end (t) are the only values used. The
compounding growth rate is a generalization of the one-period growth rate, and computing
a geometric average over the one-period growth rates is identical to computing a
compounding growth rate.
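The formula for the one-period growth rate can be re-arranged:
$$x_t = x_{t-1}\,(1 + g)$$
Since, applying the same rate in every period:
$$x_t = x_0\,(1 + g)^t$$
It follows that:
$$g = \left(\frac{x_t}{x_0}\right)^{1/t} - 1$$
where g, multiplied by 100, is the average percentage growth/change per period from period 0 to t.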
The compound growth rate is used when averaging growth, interest and rates of return over
discrete periods. Most economic phenomena are measured only at intervals (month, quarter
or year), for which the compound growth model is appropriate.
Compound growth rates do not take into account intermediate values of the series and are
therefore sometimes called end-point growth rates.
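The sketch below computes a compound growth rate from the two end points only, assuming the standard form g = (x_t / x_0)^(1/t) - 1, and compares it with the geometric mean of the one-period growth factors. The series and function name are hypothetical.

```python
import math

def compound_growth_rate(start, end, periods):
    """Average percentage growth per period, using only the end-point values."""
    if start <= 0 or end <= 0 or periods <= 0:
        raise ValueError("need positive end-point values and at least one period")
    return ((end / start) ** (1.0 / periods) - 1.0) * 100.0

series = [100.0, 110.0, 99.0, 121.0]   # hypothetical values for periods 0..3
g = compound_growth_rate(series[0], series[-1], len(series) - 1)

# Geometric mean of the one-period growth factors x_t / x_(t-1).
factors = [later / earlier for earlier, later in zip(series, series[1:])]
g_from_factors = (math.prod(factors) ** (1.0 / len(factors)) - 1.0) * 100.0
```

The two results agree because the intermediate values cancel out of the product of the factors, which is why the intermediate values have no influence on the rate.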
Annualized, or compounded sub-annual growth rates
When measuring growth in sub-annual time series, e.g., quarterly GDP or monthly CPI, there
is always a choice of how to measure and present the growth. Growth can be presented (i)
as the change from the same period of the previous year (4.98 to 4.99), (ii) as the change
from the previous period (3.99 to 4.99), or (iii) as the change from the previous period
(3.99 to 4.99) at an annualized, or compounded, rate of change.
The purpose of annualizing the rates of change is to present period-to-period rates of
change for different period lengths on the same scale, and thus to make it easier for the
layman to interpret the data, e.g., to realize that growth of 0.8 percent from one month
to the next is equivalent to growth of 2.4 percent from one quarter to the next, or to
annual growth of 10.0 percent.
However, annualizing growth rates also means that the irregular effects are being
compounded. Since short-term statistics are prone to larger erratic movements than annual
data, annualizing may result in an exaggeration of the short term movements in the series,
at the expense of the overall trend.
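The figures quoted above can be reproduced by compounding the monthly rate, as in this sketch. The function name is hypothetical.

```python
def compounded_growth(period_growth_percent, n_periods):
    """Growth over n_periods if the same per-period rate is compounded each period."""
    factor = 1.0 + period_growth_percent / 100.0
    return (factor ** n_periods - 1.0) * 100.0

monthly = 0.8
quarterly_equivalent = compounded_growth(monthly, 3)    # three months compounded
annual_equivalent = compounded_growth(monthly, 12)      # twelve months compounded
```

Rounded to one decimal, the results are 2.4 and 10.0 percent. An erratic monthly movement is compounded in the same way, which is how annualizing can exaggerate short-term movements.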
Exponential growth
If the frequency of compounding is considered to be continuous, the growth is called
exponential. That is, the growth is approximately exponential when the compounding
frequency c in the formula for the compounding growth rate is so high that r/c becomes
‘very small’.
While the compound growth rate is used to measure average growth for discrete
data/periods, the exponential growth rate is used for continuous data.
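The limit can be illustrated numerically. The sketch assumes the compounding formula has the standard form x_t = x_0 * (1 + r/c)^(c*t), with r the rate per period as a decimal and c the number of compounding steps per period; the function names are hypothetical.

```python
import math

def compound_factor(r, c, t=1.0):
    """Growth factor over t periods with c compounding steps per period."""
    return (1.0 + r / c) ** (c * t)

def exponential_factor(r, t=1.0):
    """Growth factor over t periods under continuous compounding."""
    return math.exp(r * t)

r = 0.05                                   # hypothetical rate of 5 percent per period
yearly = compound_factor(r, 1)
monthly = compound_factor(r, 12)
near_continuous = compound_factor(r, 1_000_000)
continuous = exponential_factor(r)
```

As c increases, the compound factor rises towards the exponential factor and the gap between the two becomes negligible.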
Least-squares growth rates – Trends
Least-squares growth rates can be used whenever there is a sufficiently long time series to
permit a reliable calculation. The least-squares growth rate is estimated by fitting a linear
regression trend line to the logarithmic (annual) values of the variable in the relevant period.
The regression equation takes the form:
$$\ln x_t = a + b\,t$$
which is equivalent to the logarithmic transformation of the compound growth equation:
$$x_t = x_0\,(1 + r)^t$$
where x is the variable, t is time, and a = ln x_0 and b = ln(1 + r) are the parameters to be
estimated.
If $\hat{b}$ is the least-squares estimate of b, then the average annual growth rate, r, is
obtained as (multiplied by 100 to express it as a percentage):
$$r = e^{\hat{b}} - 1$$
An average growth rate estimated as a least-squares growth rate represents a trend, and it is
not necessarily equal to any of the actual growth rates between any two periods.
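A sketch of the estimation, assuming the regression ln x_t = a + b*t with t = 0, 1, 2, ... and the growth rate recovered as exp(b) - 1. The ordinary least-squares slope is computed by hand to keep the example free of external libraries; the function name and test series are hypothetical.

```python
import math

def least_squares_growth_rate(values):
    """Average percentage growth per period from a log-linear trend line."""
    n = len(values)
    if n < 2 or any(v <= 0 for v in values):
        raise ValueError("need at least two positive observations")
    times = list(range(n))
    logs = [math.log(v) for v in values]
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    # Ordinary least-squares slope of ln(x) on t.
    slope = (
        sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
        / sum((t - t_mean) ** 2 for t in times)
    )
    return (math.exp(slope) - 1.0) * 100.0

steady = [100.0 * 1.05 ** t for t in range(6)]   # hypothetical series growing 5% a year
trend_rate = least_squares_growth_rate(steady)

uneven = [100.0, 130.0, 105.0, 140.0, 120.0]     # hypothetical irregular series
uneven_trend = least_squares_growth_rate(uneven)
```

For the steady series the trend rate matches the actual growth rate. For the irregular series it uses all observations, so it generally differs from both the end-point rate and any single period-to-period rate.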