Author: PRINCE FODAY

3. BASIC ANALYTICAL TOOLS IN ECONOMICS

Economic analysis in the modern sense starts with the recording and tabulation of numerical data, followed by the process of drawing conclusions from those facts with the help of mathematical theory.

The first stage in any economic analysis is the collection of facts or data. The data can be of any kind as long as they can be counted. The facts collected have attributes, or characteristics. An attribute may be one that is not capable of numerical definition (e.g. colour of eyes) or one that can be expressed in numerical terms (e.g. height in inches, salary in pounds or dollars, marks in an examination).

A variable may be continuous or discrete. A continuous variable can take any value between its observed minimum and maximum (e.g. the heights or weights of school children). A discrete variable takes only exact, whole-number values: we cannot have less or more than a whole number (e.g. the number of students in a class).

Generally, all economic facts are subject to collection, tabulation, presentation, analysis based on mathematical theory, and conclusion.

2.1 Tabulation

Before tabulation takes place, the data collected from individual questionnaires (sets of questions meant to achieve objectives) need to be entered on a separate summary sheet. The totals are then transferred to the relevant columns of a prepared table. The purpose of tabulation is to condense the data into a few figures so that they are easier to compare.

The Construction of Tables

The construction of a table depends on the nature of the data, which can be raw, ungrouped or grouped.

a) Tabulation: Raw Data

Raw data are numerical facts that have been collected from field or desk research. To illustrate the normal procedure for tabulating raw data, Table 1 gives the number of rejects in each successive period of five minutes.
Table 1

 3   3   7   8   9  11  12  13  13  15
16  17  17  18  19  19  20  20  21  21
22  22  22  22  22  22  23  23  23  23
23  24  24  24  24  24  25  25  26  26
26  27  27  28  28  28  29  30  31  33

The data are arranged in order of magnitude, with the smallest number of rejects at the beginning and the largest at the end. Such an arrangement of data is called an ARRAY. The table shows that the minimum and maximum numbers of rejects are 3 and 33. The difference between the highest and lowest values is called the RANGE.

b) Tabulation: Ungrouped Data

Table 2 is an ungrouped frequency distribution: each distinct value is listed once, together with the number of times it occurs. It is called ungrouped because there are still many figures to absorb.

Table 2

Number of rejects    3   7   8   9  11  12  13  15  16  17  18  19
Frequency            2   1   1   1   1   1   2   1   1   2   1   2

Number of rejects   20  21  22  23  24  25  26  27  28  29  30  31  33
Frequency            2   2   6   5   5   2   3   2   3   1   1   1   1

Comparison is possible to some extent with an ungrouped frequency distribution. One can now say that 22 occurs most frequently, because it has the highest frequency (6).

c) Tabulation: Grouped Data

A grouped frequency distribution is used because an ungrouped distribution still leaves many figures to absorb. Instead of the frequency of each single number of rejects being shown separately, the range is subdivided into 'classes'. In Table 3, which follows, each class has a class width, or class interval, of 4, measured as the difference between its upper and lower limits. Thus the first class (i.e. 3-7) covers all values from 3 to 7 inclusive. Conventionally, the number of rejects is called the independent variable, and the corresponding frequencies are referred to as the dependent variable.

Table 3

Number of rejects   3-7   8-12   13-17   18-22   23-27   28-32   33-37
Frequency             3      4       6      13      17       6       1

How to Find the Class Width

In most cases the class width, or class interval, is given. Where it is not given, the following steps can help.

- Find the smallest and largest values in the raw data. In Table 1, for example, they are 3 and 33.
- Find the range (i.e. 33 - 3 = 30).
- Decide on the number of classes needed, say 7 as in Table 3.
- Divide the range by the number of classes (i.e. 30/7 = 4.29, which rounds to 4).
- Conclude that the class width is 4.

2.2 Presentation of Data

Presentation refers to the use of graphs and charts to explain data or information in a clear manner. This type of communication is visual: a diagram can be understood at a glance. Presentation of the collected data may matter little to the novice, but it is very useful to the economic analyst or quantitative technician. For example, a well-designed but simple diagram showing the trend of revenue and expenditure explains more than a mass of detailed monthly figures.

Graphs

a) Histogram

A histogram consists of a series of rectangles touching one another. The independent variable (classes) is plotted along the horizontal axis and the dependent variable (frequencies) along the vertical axis. When the classes are of equal width, the bars are of equal width and the frequency of each class is represented by the height of its bar. If the class width varies, the height of the bar must be adjusted, because the area of each rectangle is proportional to the frequency of its class.

i) Histogram: Equal class width

Consider Table 3, which shows the number of rejects produced by a machine in each successive period of five minutes, and draw the histogram of the data.

The classes in Table 3 leave a gap between one class and the next (e.g. between 7 and 8), so they cannot give joined bars; we need to adjust the table. The adjustment is made by subtracting 0.5 from the lower limit and adding 0.5 to the upper limit of each class, so that the upper boundary of one class becomes the lower boundary of the next, as shown in Table 4. Measured between these boundaries, each class spans 5 units.

Table 4

Number of rejects   2.5-7.5   7.5-12.5   12.5-17.5   17.5-22.5   22.5-27.5   27.5-32.5   32.5-37.5
Frequency                 3          4           6          13          17           6           1

Figure 1 is the histogram for equal class width.
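The tabulation steps described so far (array, range, ungrouped frequencies, class width, grouped frequencies and class boundaries) can be sketched in a few lines of Python. This is only an illustrative sketch: the variable names are made up for the example, the stray "2" in the printed raw data is assumed to be 22, and the class width is rounded to the nearest whole number as in the steps above.

```python
from collections import Counter

# Raw data from Table 1; the stray "2" in the printed table is assumed to be 22.
rejects = [
    3, 3, 7, 8, 9, 11, 12, 13, 13, 15,
    16, 17, 17, 18, 19, 19, 20, 20, 21, 21,
    22, 22, 22, 22, 22, 22, 23, 23, 23, 23,
    23, 24, 24, 24, 24, 24, 25, 25, 26, 26,
    26, 27, 27, 28, 28, 28, 29, 30, 31, 33,
]

array = sorted(rejects)                       # data in order of magnitude (ARRAY)
value_range = array[-1] - array[0]            # RANGE = largest - smallest
ungrouped = Counter(array)                    # value -> frequency

n_classes = 7                                 # chosen by the analyst, as in Table 3
class_width = round(value_range / n_classes)  # 30 / 7 = 4.29, rounded to 4

# Build inclusive classes 3-7, 8-12, ...: each class runs from its lower limit
# to lower limit + class_width, and the next class starts one unit later.
classes = []
lower = array[0]
for _ in range(n_classes):
    classes.append((lower, lower + class_width))
    lower += class_width + 1

# Grouped frequency: add up the frequencies of the values inside each class.
grouped = {
    (lo, hi): sum(freq for value, freq in ungrouped.items() if lo <= value <= hi)
    for lo, hi in classes
}

# Class boundaries for the histogram: subtract 0.5 below, add 0.5 above.
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in classes]

print("range:", value_range)
print("class width:", class_width)
for (lo, hi), freq in grouped.items():
    print(f"{lo}-{hi}: {freq}")
```

Counting from the raw data rather than copying the tables is a deliberate choice here: it gives an independent check on the hand-built frequency tables.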
[Figure 1. Histogram: equal class width. Horizontal axis: number of rejects (class boundaries 2.5 to 37.5; scale 2 cm = 5 rejects). Vertical axis: frequency (0 to 18; scale 2 cm = 2 units).]

ii) Histogram: Unequal class width

Consider Table 5, which relates to the length of life of bad debts, and draw the histogram.

Table 5

Working days          0-5   5-10   10-20   20-30   30-35
Number of bad debts    30     20      32      14       4

The class widths in Table 5 are unequal and must be adjusted. The class 10-20 is twice the width of the 1st and 2nd classes. It can be divided into two classes (i.e. 10-15 and 15-20), which become the 3rd and 4th classes, and its number of bad debts is divided equally between them (i.e. 32/2 = 16). The class 20-30 is dealt with in the same way (i.e. 14/2 = 7), which results in Table 6.

Table 6

Working days          0-5   5-10   10-15   15-20   20-25   25-30   30-35
Number of bad debts    30     20      16      16       7       7       4

Figure 2 is the resulting histogram.

[Figure 2. Histogram: unequal class width. Horizontal axis: working days (0 to 35; scale 2 cm = 5 working days). Vertical axis: number of bad debts (0 to 30; scale 2 cm = 5 bad debts).]

b) Frequency Polygon

A frequency polygon is obtained by joining the midpoints of the tops of the rectangles of a histogram. Figure 3 shows the frequency polygon for the histogram of the number of rejects in each successive period of five minutes (i.e. the histogram for equal class width).

[Figure 3. Frequency polygon. Horizontal axis: number of rejects (class boundaries 2.5 to 37.5; scale 2 cm = 5 rejects). Vertical axis: frequency (0 to 18; scale 2 cm = 2 units).]

Cumulative Frequency Curve

This is sometimes called an OGIVE. The cumulative frequency curve is obtained by plotting the cumulative frequencies (dependent variable, y-axis) against the upper class boundaries (independent variable, x-axis).
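As a quick illustration of this definition, the sketch below builds the points of an ogive from the class boundaries and frequencies of Table 4 (number of rejects). The variable names are illustrative assumptions; only the running totals are computed, and plotting is left to graph paper or any charting tool.

```python
from itertools import accumulate

# Class boundaries and frequencies copied from Table 4 (number of rejects).
boundaries = [(2.5, 7.5), (7.5, 12.5), (12.5, 17.5), (17.5, 22.5),
              (22.5, 27.5), (27.5, 32.5), (32.5, 37.5)]
frequencies = [3, 4, 6, 13, 17, 6, 1]

upper_boundaries = [upper for _, upper in boundaries]   # x-axis values
cumulative = list(accumulate(frequencies))              # y-axis values (running totals)

# Each pair is one point of the ogive: (upper class boundary, cumulative frequency).
ogive_points = list(zip(upper_boundaries, cumulative))
for x, y in ogive_points:
    print(f"less than {x}: {y}")
```

The last cumulative frequency should always equal the total number of observations, which is a useful check on the arithmetic.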
Consider the following marks of economics students in 124 at Gambia Senior Secondary School:

Marks                0-10   11-21   22-32   33-43   44-54   55-65   66-76
Number of students      1       2       5      12       6       3       7

Construct the cumulative frequency curve for the frequency distribution.

Solution

The first step is to adjust the classes to class boundaries. The upper class boundaries and the cumulative frequencies are then used to draw the OGIVE.

Marks    Number of students   Marks (boundaries)   Upper class boundary   Cumulative number of students
0-10              1           -0.5-10.5                    10.5                         1
11-21             2           10.5-21.5                    21.5                         3
22-32             5           21.5-32.5                    32.5                         8
33-43            12           32.5-43.5                    43.5                        20
44-54             6           43.5-54.5                    54.5                        26
55-65             3           54.5-65.5                    65.5                        29
66-76             7           65.5-76.5                    76.5                        36

[Figure: Cumulative frequency curve (ogive). Horizontal axis: marks (upper class boundaries, 10.5 to 76.5). Vertical axis: cumulative number of students (0 to 40).]

c) Logarithmic Scale Graphs

Logarithmic scale graphs are concerned with relative changes, rather than the absolute changes shown in ordinary graphs. They are ideal for a business that is growing in size. For example, turnover and profits may be increasing each year while the rate of increase is falling. Comparison between different sets of data is more soundly made by measuring relative rather than absolute changes. The logarithmic scale graph is sometimes called a 'semi-logarithmic graph' because only one of the two scales on the graph is logarithmic. Unlike other graphs and charts, in which all points are measured from the origin, a logarithmic scale graph has no origin.

To show the use of ordinary graphs and logarithmic scale graphs, consider the hypothetical data in Table 7, which shows the turnover and profits before taxation of Standard Chartered Bank for the period 1980-1988.

Table 7

Year   Turnover (DM)   Profits before taxation (DM)
1980       1,025                  100
1981       1,230                  121
1982       1,472                  164
1983       1,650                  177
1984       1,866                  185
1985       2,210                  230
1986       2,463                  242
1987       2,855                  278
1988       3,220                  313

We can plot these data on ordinary graph paper.
The years should be on the horizontal axis, and turnover and profit on the vertical axis. Figure 4 illustrates an ORDINARY GRAPH for turnover. The logarithmic scale graph can be drawn by finding the logarithms (to base 10) of turnover and profits before taxation, as set out in Table 8.

Table 8

Year   Turnover (DM)   log (turnover)   Profit before taxation (DM)   log (profit)
1980       1,025           3.0107                  100                   2.0000
1981       1,230           3.0899                  121                   2.0828
1982       1,472           3.1679                  164                   2.2148
1983       1,650           3.2175                  177                   2.2480
1984       1,866           3.2709                  185                   2.2672
1985       2,210           3.3444                  230                   2.3617
1986       2,463           3.3915                  242                   2.3838
1987       2,855           3.4556                  278                   2.4440
1988       3,220           3.5079                  313                   2.4955

[Figure 4. Ordinary graph. Horizontal axis: years (1980 to 1988). Vertical axis: DM (1,000 to 4,000).]

d) Lorenz Curve

A Lorenz curve is useful for expressing visually the inequality between two sets of data. Before plotting the Lorenz curve, the cumulative totals for both sets of data are needed, and each cumulative total must be expressed as a percentage of its grand total. To illustrate the Lorenz curve, suppose the following figures come from the Report on the Census of Production 1990: Textile Machinery and Accessories.

Establishments (No.)   Net output (D'000)
        48                   1,406
        42                   2,263
        38                   3,699
        21                   2,836
        26                   3,152
        16                   5,032
        23                  20,385
Total  214                  38,773

The first step is to find the cumulative totals for both sets of data and then to express each cumulative entry as a percentage of its total, as shown in Table 9.

Table 9

Establishments   Cumulative        Cumulative   Net output   Cumulative    Cumulative
(No.)            establishments    percentage   (D'000)      net output    percentage
(i)              (ii)              (iii)        (iv)         (v)           (vi)
   48               48                22          1,406         1,406           4
   42               90                42          2,263         3,669           9
   38              128                60          3,699         7,368          19
   21              149                70          2,836        10,204          26
   26              175                82          3,152        13,356          34
   16              191                89          5,032        18,388          47
   23              214               100         20,385        38,773         100

Columns (iii) and (vi) are plotted in Figure 5. Column (iii) is plotted on the horizontal axis, column (vi) on the vertical axis.
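The two percentage columns of a Lorenz curve can be computed mechanically. The sketch below is one possible way to do it, assuming the establishment and net output figures of Table 9; the function name `cumulative_percentages` is hypothetical, and the grand totals are computed from the entries rather than copied. Percentages rounded by hand in a table may differ from the computed values by a point.

```python
from itertools import accumulate

# Figures as they appear in Table 9; variable names are illustrative only.
establishments = [48, 42, 38, 21, 26, 16, 23]
net_output = [1406, 2263, 3699, 2836, 3152, 5032, 20385]   # D'000


def cumulative_percentages(values):
    """Return each running total as a percentage of the grand total."""
    running = list(accumulate(values))
    total = running[-1]
    return [100 * r / total for r in running]


x = cumulative_percentages(establishments)   # horizontal axis, column (iii)
y = cumulative_percentages(net_output)       # vertical axis, column (vi)

for px, py in zip(x, y):
    print(f"{px:6.1f}% of establishments -> {py:6.1f}% of net output")
```

Every point has a net output percentage no larger than its establishment percentage, which is why the plotted curve lies below the diagonal.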
The diagonal line drawn from the origin is called 'the line of equality'. The plot shows that the actual curve lies away from the line of equality, which indicates inequality. The further the curve is from the diagonal line, the greater the inequality.

[Figure 5. Lorenz curve. Horizontal axis: establishments (cumulative percentage, 0 to 100). Vertical axis: net output, D'000 (cumulative percentage, 0 to 100). The diagonal is the line of equality.]

Charts

a) Simple bar chart

A simple bar chart is used when the data need no comparison. Consider the data in Table 10 on the utilisation of milk for the production of butter, cheese, condensed milk and others.

Table 10

Milk utilisation    1980   1981
Butter               D56    D58
Cheese               126    129
Condensed milk        73     77
Others                56     65
Total                311    329

The simple bar chart for 1980 can be represented diagrammatically.

[Figure: Simple bar chart of milk utilisation in 1980, with one bar each for butter, cheese, condensed milk and others.]

b) Multiple Bar Charts

Multiple bar charts are appropriate for comparisons between years. The performance of a firm over the years can be communicated visually by the use of multiple bar charts. Using the data in Table 10, the multiple bar chart for 1980 and 1981 can be illustrated diagrammatically.

[Figure: Multiple bar chart of milk utilisation, with bars for butter, cheese, condensed milk and others side by side for 1980 and 1981.]

c) Component bar chart

A component bar chart is needed when one wishes to consider how the different milk utilisation products make up the total milk utilisation. We can arrive at the component bar chart by extending Table 10: the cumulative totals of the milk utilisation products are found for 1980 and 1981, as set out in Table 11.

Table 11

Milk utilisation    1980   1980 (cumulative)   1981   1981 (cumulative)
Butter               D56           56           D58           58
Cheese               126          182           129          187
Condensed milk        73          255            77          264
Others                56          311            65          329

The component bar chart can now be illustrated diagrammatically.
[Figure: Component bar chart of milk utilisation for 1980 and 1981, each bar divided into butter, cheese, condensed milk and others.]

d) Percentage component bar chart

A percentage component bar chart expresses each milk utilisation product as a percentage of the total milk utilisation. Table 12 shows what needs to be done before drawing the diagram in Figure 6 (the milk utilisation data are used for illustration).

Table 12

Milk utilisation    1980   1980 (cum.)   1980 (cum. %)   1981   1981 (cum.)   1981 (cum. %)
Butter               D56        56             18         D58        58             18
Cheese               126       182             59         129       187             57
Condensed milk        73       255             82          77       264             80
Others                56       311            100          65       329            100

[Figure 6. Percentage component bar chart of milk utilisation for 1980 and 1981. Vertical axis: 0% to 100%. Components: butter, cheese, condensed milk, others.]

e) Pie Chart

A pie chart serves as an alternative to a percentage component bar chart. When there is a choice between the two, the pie chart is usually preferred. To construct a pie chart, the student needs a ruler, pencil, compass and protractor. Using the data on milk utilisation for 1980, we can construct the pie chart in Figure 7 from Table 13.

Table 13

Milk utilisation    1980 (D'000)   1980 (angle)
Butter                   56        56/311 x 360° = 64.8°
Cheese                  126        126/311 x 360° = 145.9°
Condensed milk           73        73/311 x 360° = 84.5°
Others                   56        56/311 x 360° = 64.8°
Total                   311        360°

[Figure 7. Pie chart of milk utilisation in 1980, with sectors for butter, cheese, condensed milk and others.]

Some Basic Statistical Measures

The concern here is with the arithmetic mean, the median and the mode, and their simple application.

a) Arithmetic Mean

The arithmetic mean is found by summing the individual items and dividing the sum by the number of observations (or total frequency). There are different ways of calculating the arithmetic mean, depending on the nature of the data (i.e. raw, ungrouped or grouped).

i) Arithmetic mean: Raw data

Consider the raw data 1, 2, 3, 4, 5. The arithmetic mean is

$$\bar{x} = \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$$

The formula to use is

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$$

where
\bar{x} = arithmetic mean
\sum (capital sigma) = the sum of the individual items
n = number of observations (total frequency)

ii) Arithmetic mean: Ungrouped data

Consider the ungrouped data in Table 14.

Table 14

Marks                10   20   30   40   50
Number of students    1    2    8    3    1

The arithmetic mean can be calculated as in Table 15.

Table 15

Marks (x)    Number of students (f)    f x
10 (x1)            1 (f1)              1 x 10 = 10
20 (x2)            2 (f2)              2 x 20 = 40
30 (x3)            8 (f3)              8 x 30 = 240
40 (x4)            3 (f4)              3 x 40 = 120
50 (x5)            1 (f5)              1 x 50 = 50
Total          \sum f = 15          \sum f x = 460

The formula to use is

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n} = \frac{10 + 40 + \dots + 50}{1 + 2 + \dots + 1} = \frac{460}{15} = 30.666\ldots$$

Arithmetic mean = 30.67 (2 decimal places)

iii) Arithmetic mean: Grouped data

Table 16 shows the age distribution of the estimated population of country A at 30th June 1980.

Table 16

Age      Number (ten thousand)
0-9             795
10-19           782
20-29           670
30-39           720
40-49           707
50-59           692
60-69           494
70-79           292

The arithmetic mean can be found as in Table 17.

Table 17

Age (class)   Class mark (x)   Number, ten thousand (f)   f x
0-9                4.5               795                  795 x 4.5 = 3577.5
10-19             14.5               782                  782 x 14.5 = 11339
20-29             24.5               670                  670 x 24.5 = 16415
30-39             34.5               720                  720 x 34.5 = 24840
40-49             44.5               707                  707 x 44.5 = 31461.5
50-59             54.5               692                  692 x 54.5 = 37714
60-69             64.5               494                  494 x 64.5 = 31863
70-79             74.5               292                  292 x 74.5 = 21754
Total                          \sum f = 5152              \sum f x = 178964

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n} = \frac{3577.5 + 11339 + \dots + 21754}{795 + 782 + \dots + 292} = \frac{178964}{5152} = 34.7368$$

Arithmetic mean = 34.74 (2 decimal places)

where
x = class mark (e.g. (0 + 9)/2 = 4.5)
f = frequency
\sum = summation or total

b) Median

The median divides the distribution into two equal parts; it is the middle value after the distribution has been arranged in ascending or descending order.
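Before moving on to the median, the three arithmetic mean calculations above can be checked with a short Python sketch. The function names `mean_raw` and `mean_weighted` are illustrative assumptions, not part of any library; the data are those of the raw example, Table 14 and Table 16.

```python
def mean_raw(values):
    """Arithmetic mean of raw data: sum of the items / number of items."""
    return sum(values) / len(values)


def mean_weighted(xs, fs):
    """Arithmetic mean of a frequency distribution: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


# i) Raw data
print(mean_raw([1, 2, 3, 4, 5]))

# ii) Ungrouped data (Table 14): marks and number of students
marks = [10, 20, 30, 40, 50]
students = [1, 2, 8, 3, 1]
print(round(mean_weighted(marks, students), 2))

# iii) Grouped data (Table 16): class limits and numbers (ten thousand)
age_classes = [(0, 9), (10, 19), (20, 29), (30, 39),
               (40, 49), (50, 59), (60, 69), (70, 79)]
numbers = [795, 782, 670, 720, 707, 692, 494, 292]

# The class mark is the midpoint of the class limits, e.g. (0 + 9) / 2 = 4.5.
class_marks = [(lo + hi) / 2 for lo, hi in age_classes]
print(round(mean_weighted(class_marks, numbers), 2))
```

The same weighted formula serves both the ungrouped and the grouped case; the only difference is that grouped data use the class mark in place of the individual value.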
i) Median: Raw data

Consider the raw data 5, 3, 1, 4 and 2. Arranging them in ascending order gives 1, 2, 3, 4 and 5, and the median is 3, because it divides the data into two equal parts. There may be situations where the number of observations is even, as in 1, 2, 3, 4, 5, 6. The median is then the average of the two middle values:

$$\text{Median} = \frac{3 + 4}{2} = 3.5$$

We were able to get exactly 3 as the median for the data 1, 2, 3, 4, 5 because the number of observations is odd. The median is the middle item, and it can be traced at the position n/2, where n is the total number of observations. In some cases (n + 1)/2 is used instead of n/2.

ii) Median: Ungrouped data

Consider the ungrouped data in Table 18.

Table 18

Marks (x)                 10   20   30   40   50   Total
Number of students (f)     1    2    8    3    1   \sum f = N = n = 15

The median is found by first finding the cumulative number of students and then using n/2 or (n + 1)/2 to trace the median, as set out in Table 19.

Table 19

Marks    Number of students    Cumulative
10               1             1
20               2             1 + 2 = 3
30               8             3 + 8 = 11
40               3             11 + 3 = 14
50               1             14 + 1 = 15
Total    N = n = \sum f = 15

Using the formula (n + 1)/2 or n/2 (n/2 preferred), we have 15/2 = 7½, so the median is the 7½th value. The 7½th value falls under the mark 30. Hence, the median mark is 30.

iii) Median: Grouped data

Table 20 relates to a shop that holds 100 units of an item at the start of each week and is concerned that this is too high a level of stock. Weekly sales of the item during the past two years have been as follows:

Table 20

No. of items sold   1-20   21-40   41-60   61-80   81-100
No. of weeks           6      20      40      38        3

The median is found by first finding the cumulative number of weeks and then using n/2 to trace the median class. The upper class boundaries should be used, and it is best to use the 'less than' type of cumulative frequency. If the classes are like 8-10, 10-12, etc., there is no need for adjustment. If the classes are like 1-20, 21-40, etc., they need to be adjusted by subtracting 0.5 from the lower limits and adding 0.5 to the upper limits (i.e. 0.5-20.5, 20.5-40.5, etc.). Table 21 sets out the approach to use before the calculation.

Table 21

No. of items sold   No. of items sold          No. of weeks   No. of weeks
(adjusted)          (upper class boundary)                    (cumulative)
0.5-20.5            Less than 20.5                   6        6
20.5-40.5           Less than 40.5 (L1)             20        6 + 20 = 26 (F1)
40.5-60.5           Less than 60.5 (L2)             40        26 + 40 = 66 (F2) (median class)
60.5-80.5           Less than 80.5                  38        66 + 38 = 104
80.5-100.5          Less than 100.5                  3        104 + 3 = 107
Total                                          n = 107

The formula to use in calculating the median is

$$\text{Median} = L_1 + \frac{n/2 - F_1}{F_2 - F_1} \times (L_2 - L_1)$$

where
L1 = lower class boundary of the median class
L2 = upper class boundary of the median class
n = number of observations (total number of weeks)
F1 = cumulative frequency corresponding to the lower class boundary of the median class
F2 = cumulative frequency corresponding to the upper class boundary of the median class

The divisor F2 - F1 is the frequency of the median class itself. From the table, L1 = 40.5; L2 = 60.5; F1 = 26; F2 = 66; n/2 = 107/2 = 53.5.

$$\text{Median} = 40.5 + \frac{53.5 - 26}{66 - 26} \times (60.5 - 40.5) = 40.5 + \frac{27.5}{40} \times 20 = 40.5 + 0.6875 \times 20 = 40.5 + 13.75 = 54.25$$

Therefore, the median is 54.25, or about 54.3 items.

Median: Graphical Method

The median can be found graphically by first drawing the cumulative frequency curve and then using n/2 to trace the median, as illustrated in Figure 8. The cumulative frequency curve is drawn by plotting the cumulative frequencies against the upper class boundaries of the different classes.

[Figure 8. Median: graphical method. Cumulative frequency curve. Horizontal axis: number of items sold (upper class boundaries 20.5 to 100.5). Vertical axis: cumulative number of weeks (0 to 120). Reading across from n/2 = 53.5 to the curve and down to the horizontal axis gives a median of about 54.]

c) Mode

The mode is defined as the value or class that occurs most frequently. For example, in a set of observations in which one value occurs twice and every other value occurs once, the value that occurs twice is the mode.
It is possible for two or more values to be the mode, provided they have the same (highest) number of occurrences. For example, if two values each occur twice and every other value occurs once, both of those values are modes.

i) Mode: Raw data

The heights of five students in a class are 3.5, 5.5, 6.2, 4.5 and 5.5. The modal height is 5.5, because it is the value that occurs most frequently.

ii) Mode: Ungrouped data

Use the distribution in Table 22 to find the mode.

Table 22

Marks    No. of students
10             1
20             2
30             8   (mode)
40             5
50             1

The modal mark is 30, because it occurs most frequently compared with the other marks.

iii) Mode: Grouped data

Table 23 shows the weekly income of employees in XYZ Ltd.

Table 23

Weekly income (D)   Number of employees
8-10                      34
10-12                     58
12-14                     69   (fa)
14-16                    100   (modal group; L1 = 14, L2 = 16)
16-18                     95   (fb)
18-20                     70
20-22                     35

The mode is calculated by using the formula

$$\text{Mode} = L_1 + \frac{f_a}{f_a + f_b} \times (L_2 - L_1)$$

where
L1 = lower limit of the modal group
L2 = upper limit of the modal group
fa = frequency in the group below the modal group
fb = frequency in the group after the modal group

The values of the symbols are indicated in the table above and are as follows: L1 = 14; L2 = 16; fa = 69; fb = 95. Substituting the values in the formula,

$$\text{Mode} = 14 + \frac{69}{69 + 95} \times (16 - 14) = 14 + \frac{69}{164} \times 2 = 14 + 0.4207 \times 2 = 14 + 0.8415 = 14.8415$$

Mode = 14.8 (1 decimal place)

Therefore, the mode of the grouped data is 14.8.

Mode: Graphical Method

Graphically, the mode can be found by first drawing the histogram and then tracing the mode from the histogram, as shown in Figure 9.

[Figure 9. Mode: graphical method. Histogram of Table 23. Horizontal axis: weekly income (8 to 22). Vertical axis: number of employees (0 to 100). Mode = 14 + 0.8 = 14.8.]
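The three mode calculations can be reproduced with the short Python sketch below. It is a sketch under stated assumptions: the helper names `modes` and `grouped_mode` are made up for the example, and `grouped_mode` applies the formula exactly as it is stated in this section (other textbooks use different interpolation formulas for the grouped mode, which give different results).

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest number of occurrences."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


def grouped_mode(l1, l2, fa, fb):
    """Grouped mode as stated in this section: L1 + fa / (fa + fb) * (L2 - L1)."""
    return l1 + fa / (fa + fb) * (l2 - l1)


# i) Raw data: heights of five students
print(modes([3.5, 5.5, 6.2, 4.5, 5.5]))

# ii) Ungrouped data (Table 22): mark -> number of students
table_22 = {10: 1, 20: 2, 30: 8, 40: 5, 50: 1}
print(max(table_22, key=table_22.get))

# iii) Grouped data (Table 23): (lower limit, upper limit, number of employees)
table_23 = [(8, 10, 34), (10, 12, 58), (12, 14, 69), (14, 16, 100),
            (16, 18, 95), (18, 20, 70), (20, 22, 35)]

# Index of the modal group; this sketch assumes it is neither the first
# nor the last class, so that both neighbouring groups exist.
k = max(range(len(table_23)), key=lambda i: table_23[i][2])
l1, l2, _ = table_23[k]
fa = table_23[k - 1][2]   # frequency of the group below the modal group
fb = table_23[k + 1][2]   # frequency of the group after the modal group
print(round(grouped_mode(l1, l2, fa, fb), 1))
```

Returning a list from `modes` is a deliberate choice, since a distribution may have two or more modes.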