Chapter 8
Locating and Collecting Economic Data

Introduction
• In this chapter we focus on how data are constructed and where they may be found. Collecting and manipulating data are key parts of an empirical research project.
• A research project is nothing without adequate data and an original, testable hypothesis. As researchers, we have to be sure that there are enough data to test our hypothesis adequately. Otherwise, we may experience the frustration(!) of investing a great deal of time and effort only to find that the data needed to test our painstakingly developed hypothesis are not available.

Data Creation
• The vast majority of data are constructed rather than collected. For this reason, statistics consist not only of facts but also of knowledge that is created.

Steps in data construction
Best (2001) identifies 3 steps in the construction of a data series:
• Defining the concept,
• Deciding how the concept will be measured, and
• Determining how to define the sample on which the data will be based.
Every data series is constructed for a specific purpose. However, a given data series may not be defined or measured in a way that best matches your needs. So, sometimes [or probably most of the time(!)] you may need to construct your own data.

Sample data
• Most social science statistics are based on sample data rather than populations. For example, average family income is not the average income of ALL families; rather, it is the average income of the families in the sample.

The Structure of Economic Data
• It is important to distinguish between the organizations that collect or produce data and those that publish them.

Characteristics of data sets
• Data come in 3 forms: time-series, cross-section, and longitudinal (panel) data. Time-series data give observations on the same variable at different points in time (ex.: Turkish GDP per capita over the period 1923-2009).
Cross-section data, by contrast, give observations of a comparable variable for different units at the same point in time (ex.: average disposable personal income across different cities of Turkey in 2009). Longitudinal (panel) data take a cross-section sample and follow it over time (ex.: family income for the same 10 families over 5 years).
• Longitudinal data are an example of a micro data set, since the observations are of individual economic agents such as individuals, households, or firms. Macro data are compiled at the national level.
• The frequency of data also varies. You may find daily, weekly, monthly, quarterly, or annual data.

Organizations That Collect and Publish Data
A number of US governmental, international, and private organizations gather economic and social statistics.
US government agencies:
• Census Bureau (www.census.gov)
• Bureau of Economic Analysis (www.bea.doc.gov)
• Bureau of Labor Statistics (www.bls.gov)
• The Federal Reserve (www.federalreserve.gov)
International agencies:
• International Monetary Fund
• World Bank
• OECD
• Eurostat
• Asian Development Bank
• Inter-American Development Bank
Turkish data sources:
• Central Bank of the Republic of Turkey (www.tcmb.gov.tr)
• Turkish Statistical Institute (www.tuik.gov.tr)
• State Planning Organization of Turkey (www.dpt.gov.tr)

Major Primary Data Collections
• US National Income and Product Accounts (the official national accounts of the US).
• US Flow of Funds Accounts (data on financial flows across the US economy).
• US Balance of Payments Accounts and the International Investment Position of the US.
• US Census of Population and Integrated Public Use Microdata Series.
• Current Population Survey.
• Current Employment Statistics.
• The Economic Census.
• Annual Survey of Manufactures.
• Current Industrial Reports.
• American Housing Survey.
• Consumer Expenditure Survey.
• National Longitudinal Surveys.
• Panel Study of Income Dynamics.
• Surveys of Consumers.
• Survey of Consumer Finances.

Major Secondary Data Collections
These sources are usually more user friendly than the primary sources.
• Economic Report of the President.
• Economagic.
• FRED II (Federal Reserve Economic Data), an excellent source for US macro and financial data.
• Stat-USA/State of the Nation.
• Inter-university Consortium for Political and Social Research.
• International Financial Statistics (the principal data set of the IMF).
• World Economic Outlook database.
• Penn World Tables.
• Joint BIS-IMF-OECD-WB statistics on external debt.
• Eurostat.
• OECD Main Economic Indicators and National Accounts.

Chapter 9
Putting Together Your Data Set

Introduction
• Empirical research can be divided into 2 types: experimental and survey (nonexperimental). In experimental research, the data come from the experiment, and collecting the data is the major part of the study. In nonexperimental research, we use preexisting data, and researchers generally do not put the same care and effort into assembling them. This is undoubtedly a huge mistake!

Developing a Search Strategy for Finding Your Data
It is a good idea to start with a search strategy. We have 2 steps.

Step 1: Before you search
Three issues need attention before you start. First, you need a large sample (large enough to obtain statistically valid empirical test results). Second, you need a random or representative sample, which will be discussed in detail in Chapter 10. Third, you need data that correctly measure the concepts that your theory deems important. Once you have determined your list of desired variables, the next step is to think about where those data are likely to be found.
To summarize step 1 as a set of questions:
• What are the desired variables?
• How should each variable be defined?
• What data frequency, sample period, and level of analysis are needed?
• What are potential sources for data on each variable?
Step 2: As you search
As you begin to investigate each data source, you need to ask several questions:
• What data are in fact available?
• If the data are not ideal, are they good enough?
• If the data are not acceptable, is there an available proxy that is? (A proxy is a variable that should behave roughly the same as your theoretical variable.)
• If there is no adequate proxy, how can the hypothesis be reformulated to make it testable, given the data available?

Data Manipulation
Data for any variable may be found in various forms, some of which are listed below:
• Levels
• Per capita (per person)
• Changes
• Rates of change (growth rates)
• Annualized growth rates
• Proportions
• Nominal
• Real
• Index numbers

Level of variable
• This is the most basic form. It is the actual value or size of the variable being measured (ex.: the level of Turkish GDP per capita in 2009 is TL X). Researchers often use the per capita form of a variable, which is found by dividing the level of the variable by the appropriate population.

Change in variable
• Sometimes it is more useful to examine the change in a variable than its level. ex.: Say that the GDP of Turkey in 2008 and 2009 is X and Y respectively (in TL). Then the change between 2008 and 2009 is (Y - X).
• A more meaningful evaluation can be made by calculating the rate of change (the percentage change, or growth rate). In our example, the rate of change between 2008 and 2009 is calculated as follows:
• G = [(Y - X) / X] * 100
• For periods shorter than a year, the annualized growth rate is used. Let us give an example.
• Assume that the sales of a company grow by 10% each quarter and that the beginning value of sales is TL100. Then:
• 1.10 * 100 = TL110 for the 1st quarter,
• 1.10 * 110 = TL121 for the 2nd quarter,
• 1.10 * 121 = TL133.1 for the 3rd quarter,
• and finally, 1.10 * 133.1 = TL146.41 for the 4th quarter.
So the annualized growth rate is about 46.4%, which is 6.4 percentage points more than the rough approximation (10% * 4 = 40%). The difference comes from compounding.
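The rate-of-change and compounding calculations above can be sketched in Python. This is only an illustrative sketch: the function names `rate_of_change` and `annualized_growth` are our own and do not come from any statistical package, and the figures are the ones used in the sales example.

```python
def rate_of_change(previous, current):
    """Percentage change between two values: [(Y - X) / X] * 100."""
    return (current - previous) / previous * 100


def annualized_growth(initial, next_value, periods_per_year=4):
    """Annualize a one-period growth rate by compounding it over a year.

    periods_per_year is 4 for quarterly data and 12 for monthly data.
    """
    return ((next_value / initial) ** periods_per_year - 1) * 100


# Sales example: TL100 growing to TL110 in one quarter.
quarterly = rate_of_change(100.0, 110.0)      # one-quarter growth, in percent
annual = annualized_growth(100.0, 110.0, 4)   # compounded over 4 quarters
rough = quarterly * 4                         # naive approximation without compounding

print(round(quarterly, 2), round(annual, 2), round(rough, 2))
```

The gap between `annual` and `rough` is the effect of compounding: each quarter's growth is applied to a base that has already grown.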
• The annualized growth rate for quarterly data can be formulated as follows:
• Gq = [(X1/X0)^4 - 1] * 100 (X0: initial value; X1: next period's value)
• In our example: [(110/100)^4 - 1] * 100 = 46.4%.
• If the data are monthly, then we should raise the ratio of monthly values to the 12th power.

Proportions
• A form of data similar to the growth rate is the proportion. It is also called a share, a percentage, or a fraction. Let us give a numerical example.
• Suppose that GDP = TL 10446.2 = (C = 7303.7) + (I = 1593.2) + (G = 1972.9) + (X-M = -423.6).
• The proportion of consumption expenditures in GDP is calculated as follows:
• C/GDP = 7303.7/10446.2 = 0.699 = 69.9%.
• Similarly, the shares of the other components in GDP are:
• I/GDP = 1593.2/10446.2 = 0.153 = 15.3%.
• G/GDP = 1972.9/10446.2 = 0.189 = 18.9%.
• (X-M)/GDP = -423.6/10446.2 = -0.041 = -4.1%.

Real versus Nominal Magnitudes
• Let's remember the simple identity below:
• V = P x Q, where V: nominal (or value); P: price; and Q: real (or volume).
• Nominal data are measured using the actual market prices that existed during the time period in question. Real data, at the micro level, refer to the actual quantities employed by a firm (labor hours), produced by a firm (number of widgets), or sold by a firm (sales volume).

Index numbers
• Real GDP is an example of what economists call an index number or, more specifically, a quantity index. The other type is a price index. Index numbers, unlike most other statistics, have no units. They are designed for comparison purposes. For example, one could use an index number to compare the level of whatever the index is measuring with its level in an earlier time period, known as the base period.
• The formula is as follows: XT = (Xt/X0) * 100
• Xt is the value of the raw variable in a given time period t in the series,
• X0 is the value of the raw variable in the base period, which is the period to be compared against,
• XT is the resulting index number. Note that in the base period Xt = X0, so the index equals 100.
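The share and index-number formulas above can be checked with a short Python sketch. The GDP components are the ones from the numerical example; the `raw` series and the helper name `to_index` are hypothetical and only serve to illustrate the formula XT = (Xt/X0) * 100.

```python
# GDP components from the numerical example (in TL).
components = {"C": 7303.7, "I": 1593.2, "G": 1972.9, "X-M": -423.6}
gdp = sum(components.values())

# Share of each component in GDP; the shares add up to 1.
shares = {name: value / gdp for name, value in components.items()}
for name, share in shares.items():
    print(f"{name}/GDP = {share:.3f} = {share * 100:.1f}%")


def to_index(series, base_position):
    """Convert raw values to index numbers with the base period set to 100."""
    base = series[base_position]
    return [value / base * 100 for value in series]


# Hypothetical raw series; the second observation is chosen as the base period.
raw = [50.0, 40.0, 60.0]
index = to_index(raw, base_position=1)
print(index)
```

Note that the net export share comes out negative, and that the base period of the index is 100 by construction.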
Quantity indices versus real quantities
We have 2 ways to make real measurements:
1. Create a quantity index (a weighted average of the quantities),
2. Create a price index and divide the nominal value by this price index.

Price indices versus implicit price deflators
• We have 2 ways to make price measurements. One is to create a price index. The other is to create a quantity index and divide the nominal value by this quantity index. The result is called the implicit price deflator.

How inflation distorts nominal values
• Because prices tend to increase over time, it would be misleading to compare nominal measurements from different periods. By using base-year prices and actual-year quantities, real GDP excludes the effects of changing prices over time.

Rebasing data series
• The base year is generally changed from time to time to keep it “recent”, because we care more about recent economic changes than historic ones. Let us give a numerical example.
• Suppose that we have the following annual CPI data:

Year            1991   1992   1993   1994   1995   1996   1997   1998
Base 1992       0.95   1.00   1.25   1.30   1.40
Base 1997                                   0.85   0.95   1.00   1.15
Linked series   0.58   0.61   0.76   0.79   0.85   0.95   1.00   1.15

• Now, suppose that we want to complete the series with a 1997 base year. We need to transform the observations that are only available with a base year of 1992, so that they correctly show the change in the CPI between both parts of the data.
• As seen from the table, 1995 is the year of overlap, that is, we have 2 values for this year. The data with the earlier base year need to be reduced to link to the data with the later base year. The amount of the reduction is given by the ratio of the two values for 1995, so the reduction factor is 0.85/1.40 = 0.607.
• To obtain the linked series with a 1997 base year, each value with base year 1992 is multiplied by the reduction factor (0.607). The result is shown in the third row.

Data smoothing
• Data that are volatile are sometimes “smoothed” to better reveal the underlying trends.
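Returning to the rebasing example, the linking procedure can be sketched in Python as follows. The CPI values are the ones from the table; the function name `link_series` is our own, and the sketch assumes that the two series overlap in exactly one known year.

```python
# CPI values from the rebasing example, keyed by year.
base_1992 = {1991: 0.95, 1992: 1.00, 1993: 1.25, 1994: 1.30, 1995: 1.40}
base_1997 = {1995: 0.85, 1996: 0.95, 1997: 1.00, 1998: 1.15}


def link_series(old, new, overlap_year):
    """Rescale the old-base series so that it joins the new-base series.

    The linking factor is the ratio of the two values in the overlap year.
    """
    factor = new[overlap_year] / old[overlap_year]
    linked = {year: value * factor for year, value in old.items()}
    linked.update(new)  # keep the new-base values unchanged
    return linked, factor


linked, factor = link_series(base_1992, base_1997, overlap_year=1995)
print(round(factor, 3))
for year in sorted(linked):
    print(year, round(linked[year], 2))
```

Because every old value is multiplied by the same factor, the year-to-year growth rates of the original series are preserved; only the level changes.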
There are a variety of techniques for smoothing data. We will discuss only 2 of them:
1. Moving averages,
2. Seasonal adjustment.
• A moving average replaces the actual data point in each period with the average of that data point and the (n-1) data points preceding it. The result is that any abnormal observations become less important, since they are averaged with more normal ones.
• Some variables show seasonal patterns, whereby they change predictably in certain months, quarters, or seasons. Seasonally adjusted data are designed to remove these predictable movements so that the underlying changes can be seen.

Constructing a data appendix to your research
• It is good scientific practice to make your data available so that other researchers may replicate your work. In explaining your sources and methods, you should provide clear citations showing exactly which sources provided the raw data, as well as a complete explanation of how you manipulated the raw data to transform them into the form you actually used.
• Thank you for your time.
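As a supplement to the data smoothing discussion, a trailing moving average can be sketched in Python as follows. The series is hypothetical and the function name `moving_average` is our own; the sketch assumes that the first (n-1) periods, which do not have enough preceding observations, are simply dropped.

```python
def moving_average(series, n):
    """Average each data point with the (n-1) data points preceding it.

    The first (n-1) periods are dropped because they lack enough history.
    """
    if n < 1 or n > len(series):
        raise ValueError("n must be between 1 and the length of the series")
    return [
        sum(series[i - n + 1 : i + 1]) / n
        for i in range(n - 1, len(series))
    ]


# Hypothetical series with one abnormal observation (50).
raw = [10, 12, 50, 11, 13]
smoothed = moving_average(raw, 3)
print([round(value, 2) for value in smoothed])
```

The abnormal value still raises the averages around it, but each smoothed value moves far less than the raw spike does.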