251descr1 1/22/07 (Open this document in 'Outline' view!) ECONOMICS 251 COURSE OUTLINE A. Introduction 1. Definitions Define Statistics, Descriptive and Analytic Statistics, Induction and Deduction. 2. Uses of Statistics B. Sources and Types of Data 1. Data Define data sets, observation, unit of observation. Qualitative and quantitative data. Nominal, ordinal, interval and ratio data. Discrete vs. continuous data. Data is discrete if the number of values it can take are countable – most typically a variable that can only be a whole number, like the number of times you can win the lottery is discrete. If two numbers x1 and x 2 are drawn from continuous data, any number between them also could be part of the same data set. Temperature, weight and most other things that we measure are continuous data. a. Qualitative Data (i) Nominal Data: There is no natural number scale - numbers are only used to define categories, so that no operations like addition or multiplication are valid. (ii) Ordinal Data: Numbers are used only to order things (e.g. first, second, first). Differences between ranks do not always have the same meaning. Most mathematical operations are still not valid. b. Quantitative Data (i) Interval Data: Differences between ranks have consistent meaning, but, like Celsius temperature, there is no obvious origin, so that , although addition and subtraction can be used, multiplication and division have no real meaning. (ii) Ratio Data: there is a meaningful origin, so that multiplication and division are valid. 2. Sources Define primary and secondary sources, internal and external data. 3. Cross Section and Time Series Data a. Cross Section Data b. Time Series Data. i. Indices ii. Real Values iii. Rates of change iv Logarithms C. Presentation of Data 1. Classification Define collectively exhaustive and mutually exclusive classes. These are not the same thing. Collectively exhaustive means that every item you are considering has a place in a class. Mutually exclusive means that if an item belongs in any given class, it does not belong in another class as well. 2. Tables Define parts of tables. See 251pttbl . 3. Charts and Graphs 251descr1 11/09/06 (Open this document in 'Outline' view!) Define parts of graphs Line graph example http://www.epinet.org/issueguides/minwage/figure2.pdf Pie chart example - National Priorities Project Where Do Your Tax Dollars Go? posted 2006 This publication shows how the federal government spent the average household's 2004 income taxes in each state and 193 cities, towns and counties. Component part line chart example 251GDP_DPI D. Frequency Distributions and Populations. 1. Definitions Meaning of Population, Frame, Census, Sample, Grouped Data, Frequency, Example of Frequency Distribution, Relative Frequency. Width of a class interval. largest smallest w (Always round this result up!) number of classes Example: Let us assume that we have a sample consisting of numbers between 905 and 8756, and that we 8756 905 1570 .2 . We will at want to present the data in 5 classes. Our class interval will be at least 5 least round this up to 1571. If we want to use 1571, our first class will begin at 905, the next at 905 + 1571 = 2476 etc. In fact we might consider a class interval of 1600 and start our lowest group at 800. The classes would be 800 – under 2400, 2400 – under 4000, 4000 – under 5600, 5600 – under 7200 and 7200-under 8800. Just make sure that the classes cover the data and that there are few empty classes. The most commonly observed rule for deciding on the number of classes is Sturgis’ rule. The formula can be written as number of classes 1 3.3 log 10 n where log 10 n is the log base 10 of the number of observations. This rule should not be taken seriously. For more on this see http://cnx.org/content/m10160/latest/. 2. Graphs of the Frequency Distribution. See http://cnx.org/content/m10927/latest/ a. The Histogram b. The Frequency Polygon c. The Cumulative Frequency Distribution (Ogive). See http://home.ched.coventry.ac.uk/Volume/vol0/ogive.htm d. Relative Frequencies. e. Smoothed Histograms E. Sampling and Descriptive Statistics. 1. Sampling to Learn About a Population. Infinite and finite populations, target and sampled populations, the Stability of Mass Data. 2. The Meaning of Random Sampling. A simple random sample of n items taken from a population of N items must be selected in such a way that all combinations of n items are equally likely. 3. Descriptive Statistics. a. Measures of Central Tendency. (Where's the middle of the data?) b. Measures of Dispersion. (How spread out are the data?) 2 251descr1 11/09/06 (Open this document in 'Outline' view!) c. Measures of Asymmetry etc. (What else can I say about the shape?) 3 251descr1 11/09/06 (Open this document in 'Outline' view!) F. Measures of Central Tendency. 1. The Arithmetic Mean of Ungrouped Data. a. The Population Mean. x N b. The Sample Mean. x x n Example: Consider the following data set. x Row 1 2 3 4 5 10000 17000 23000 30000 80000 160000 It makes no difference whether we call this a sample or a population, so let’s say that this is a sample. We 160000 x 160000 , n 5 so x 32000 . The alert observer will note that the mean has can write 5 been raised by the highest number so that it is actually above all the numbers but the highest one. 2. The Arithmetic Mean of Grouped Data. To make an ungrouped data formula into a grouped data formula, substitute f for . For x substitute the midpoint of the group. This is defined for our purposes as the arithmetic mean of the lower limit of the group in question and the lower limit of the next group. In other words if we have the group 10 to 10.99, followed by 11 to 11.99 the midpoint of the first group is 10.50, not 10.495. The formula for a population mean for grouped data is thus formula are essentially identical. x fx fx . The sample mean formula and the population mean n . n Example: It makes no difference whether we call this a sample or a population, so let’s say that this is a sample. Row x f fx 1 2 3 4 5 10 12 14 16 18 3 30 3 36 5 70 3 48 1 18 15 202 Note that there is no reason to sum x. We can write fx 202 , n f 15 so x 202 13 .467 . 15 Not also that if we use f rel in place of f , we do not have to divide by n . 4 251descr1 11/09/06 (Open this document in 'Outline' view!) 3. The Weighted Arithmetic Mean. w wx w , xw wx . Example: We have three firms with profit rates of 10%, 12%, and 15%, w which would average 12.33%. If we want a rate of return on capital we might want to know that the assets of the firms are respectively $2 billion, $1 billion and $1 billion. It is also common in a situation like this to use relative weights found by dividing the original weights by the sum of the weights, in this case 4. Row x w wx wrel wrel x 1 2 3 10 12 15 2 1 1 20 12 15 4 47 .50 .25 .25 5 3 3.75 11.75 47 w 4, wx 47 and x w So 11 .75 . If we use relative weights, we can read the weighted 4 mean as the sum of the wrel x column. 1.00 4. The Median of Ungrouped Data. Defined simply as the middle point when the data is in order. If there are two middle points, take their arithmetic mean. In continuous data half the points will be above or below the median. Consider the data set that we used for the mean. x Row 1 2 3 4 5 10000 17000 23000 30000 80000 160000 Note that the middle number is the third number and that 3 5 1 . In general the index 2 n 1 . If this is a sample, we can write x50 23000 . If this is a population 2 23000 . The alert observer will note that median is not much affected by the highest number so that it seems more typical that the mean. Now consider a second data set. x Row of the median is location 1 2 3 4 5 6 10000 17000 23000 27000 30000 80000 160000 n 1 6 1 3.5 . This formula seems to be telling us that, since there is no one 2 2 middle number, we have to average the third and fourth number. If this is a sample, we can write 23000 27000 x 50 25000 . If this is a population 25000 . 2 Note that location 5 251descr1 11/09/06 (Open this document in 'Outline' view!) 5. The Median of Grouped Data. This is a special case of the formulas for fractiles of grouped data below, where p .5 . position 1 2 n 1 . pn F x1 p L p w . For the formulas and the example used in class see f p 251median. 6. The Mode The mode is simply the most common point, not very useful in discrete ungrouped data. For grouped data it is defined as the midpoint of the largest group. If we dredge up our example for grouped data below. Profit Rate f F Midpo int 9-10.99% 3 3 10 11-12.99% 3 6 12 13-14.99% 5 11 14 15-16.99% 3 14 16 17-18.99% 1 15 18 Total 15 Since 13-14.99 is the largest group and its midpoint is 14, we can write mo 14 . Note that a distribution can have two modes, which would make it bimodal. If it has only one mode it is unimodal. Of all the measures of central tendency, the mode is the most resistant to a few very high numbers and the mean is least resistant. Populations made up of data like wealth and income almost always have a few outliers to the right of most of the data. They tend to be cut off on the left by the fact that a minimum income is necessary to sustain life. We say that a population of this type is skewed to the right. Typically for such a population mode median mean mo . On the other hand a population that is skewed to the left would have mean median mode mo . So what would you expect if a population is unimodal and symmetrical? 7. Other Means. a. The Geometric Mean. 1 x g x1 x 2 x3 x n n n x or ln x g 1 n ln( x) Example 1: Find the geometric mean of 1, 2 and 3 1 x g 1 2 33 3 6 6 0.3333 1.817 Or, using natural logarithms, ln x g 1 ln 1 ln 2 ln 3 1 0 0.693147 1.098612 0.597253 . So 3 3 that x g e 0.597253 1.817 . Or, using logarithms to the base 10, log x g 1 log 1 log 2 log 3 . So that x g 10 logxg 1.817 . 3 Example 2: A stock’s value grows at 50% in period 1 and 5% in period 2. Find the average growth rate. 1 Add 1 to the growth rates and take a geometric mean. x g 1.50 1.05 2 2 1.575 1.575 0.500 1.255 . So the average growth rate is 25.5%. 6 251descr1 11/09/06 (Open this document in 'Outline' view!) b. The Harmonic Mean. 1 1 1 xh n x Example: Find the harmonic mean of 50 and 30. 1 11 1 1 0.020000 0.03333 0.026667 . x h 2 50 30 2 So x h 1 37 .50 0.026667 c. The Root-Mean-Square. x 1 1 2 x 2 or x rms n n Example: Find the rms of 1, 2 and 3 x rms 2 1 2 1 1 4 9 4.666667 2.160 1 2 2 32 3 3 x rms d. What Formulas for Means Have in Common. f x 1 n f x . 8. Measures of Position. Percentiles, deciles, quintiles, quartiles and fractiles. The two formulas below are two-step formulas. The first step is multiplying n 1 (or N 1 )* by p . p represents the fractile of the data wanted measured from the bottom. For example, if we want the 91st percentile, p is .91. Note that the number you have found is called x1 p x1.91 x.09 (i.e. 9% from the top!). If we want the third quartile, Q3 x.25 , p is 3 4 or 0.75. If we want the first quartile, Q1 x.75 , p is 4 or 0.25. Of course, for the median p .5 . N or n represents the number of items in the population or sample, not the number of groups or classes. 1 a. Finding a Fractile of Grouped Data. To use this formula, we must first compute the cumulative distribution of the group and determine in which group the desired fractile is located with the calculation position pn 1 *. Once we have found the group that this is in, let f p be the frequency of the chosen group, and let F be the cumulative frequency pn F up to but not including the chosen group. The formula here is x1 p L p w . In this formula, w f p is the class interval (the interval between the lower limit of the chosen group and the lower limit of the next group) and L p is the lower limit of the chosen group. 7 251descr1 11/09/06 (Open this document in 'Outline' view!) Example: Suppose that in the example below we must find the first quartile. Since the first quartile is the .25 fractile, p is .25. To locate the group use position pn 1 = 0.25(16)=4. Profit Rate 9-10.99% 11-12.99% 13-14.99% 15-16.99% 17-18.99% f F 3 3 3 6 5 11 3 14 1 15 Total 15 we find that x1.25 x.75 Using the cumulative distribution F column, we find the fourth item in the sample. Since 4 is above 3 and below 6 in the F column, we pick the group 11-12.99%. n is 15, and for the group we have picked, w = 13 - 11 = 2, L p 11 , F = 3, and f p 3 . If we put these numbers into the formula, .25 15 3 11 2 11.5 . 3 pn F Note: Sometimes is negative. In this case choose the group before the one you would ordinarily f p have chosen. Example: If you want the 19th percentile of the data above position pn 1 =.19(16) = pn F .1915 3 3.04, which would normally take us into 11-12.99. But 0.075 , so use the group 3 f p 9-10.99 instead. But see c below. b. Finding a Fractile of Ungrouped Data. This time when we compute position pn 1 , we divide it into an integer part, a , and a fractional part, .b . We then use the formula x1 p xa .bxa1 xa to find the actual value. Example: If our set of numbers is 1,5,7,9,9,11,13,14,17 ,19 n = 10, and we wish to find the first quartile, p = 0.25, so that pn 1 = 0.25 (11) = 2.75. Then a 2 , and .b .75 . Now find xa and xa 1 , in this case x2 and x3 , and use the formula x1 p xa .bxa1 xa . 1,5,7,9,9,11,13,14,17,19, xa x2 5 x1 p x.75 5 0.757 5 6.5 and x a 1 x3 7 , so that c. Experimental formula (Don't read this unless you are really ready to ask questions!) See 251dscr_A . Document continues in 251descr2 . * Experimentation indicates that a better formula is position 1 pn 1 . This is compatible with the formula position .5n 1 for the median and seems to work in more places. 8