Units of Analysis and Data Structure: The Basics

Outline
Definitions
Elements of the unit of analysis
Data structure

Unit of observation
The unit of observation is a basic concept in quantitative research. It is the object that is observed and about which information is systematically collected.

Unit of analysis
The unit of analysis is the object about which generalizations are made on the basis of an analysis.

Determination
The unit of observation is determined by the method by which observations were selected. The unit of analysis is determined by an interest in exploring or explaining a specific phenomenon.

When they are the same
The unit of observation and the unit of analysis are the same when the generalizations made from a statistical analysis are attributed to the unit of observation itself. For example, if individuals are interviewed and the conclusions are about individuals, the person is both units; if the same interviews are combined to describe households, the person remains the unit of observation while the household becomes the unit of analysis.

Identifying units of analysis
The unit of analysis is shaped by three properties:
Social entities
Time
Space

Social entities
Observations of a single social entity, such as a person or an institution.
Observations of multiple entities with a defined relationship, such as a family or an employer-employee pair.

Social phenomena
Transactional observations that result from actions among entities, such as labour strikes or international conflicts, including wars.

Time
Observations made at one point in time: a cross-sectional study.
Observations made at multiple points in time: the data may be organized by time, which is commonly referred to as a time series, or time may structure some form of repeated measures of content or subjects.

Space
Observations made within a specific spatial area.
Observations made within a hierarchy of spatial areas.

Substituting units
There may be requests for which data for the desired unit of analysis cannot be delivered, but for which data are available summarized over one of the other attributes of the unit of analysis.
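To show what "summarized over one of the other attributes" can mean in practice, here is a minimal sketch. It assumes a handful of invented establishment-level records (the unit of observation) with hypothetical fields `estab_id`, `naics`, `region`, `year`, and `employees`; none of these values come from a real source. The helper `summarize` is likewise hypothetical: it totals one industry's employment either over time or over space.

```python
from collections import defaultdict

# Hypothetical establishment-level records (the unit of observation).
# Field names and values are invented for illustration only.
records = [
    {"estab_id": 1, "naics": "312", "region": "ON", "year": 2019, "employees": 120},
    {"estab_id": 2, "naics": "312", "region": "ON", "year": 2020, "employees": 115},
    {"estab_id": 3, "naics": "312", "region": "QC", "year": 2019, "employees": 80},
    {"estab_id": 4, "naics": "312", "region": "QC", "year": 2020, "employees": 95},
    {"estab_id": 5, "naics": "311", "region": "ON", "year": 2019, "employees": 200},
]


def summarize(rows, naics, key):
    """Sum employees for one industry code, grouped by `key` ("year" or "region")."""
    totals = defaultdict(int)
    for row in rows:
        if row["naics"] == naics:
            totals[row[key]] += row["employees"]
    return dict(totals)


# Summary over time: one row per year (a short time series).
by_year = summarize(records, "312", "year")
# Summary over space: one row per region (small-area statistics).
by_region = summarize(records, "312", "region")

print(by_year)    # {2019: 200, 2020: 210}
print(by_region)  # {'ON': 235, 'QC': 175}
```

Each output row now describes a year or a region rather than an establishment, so the summarized table has a different unit of analysis from the records it was built from. That shift is what makes such a table a substitute rather than an equivalent.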
Substituting units: an example
Request: firm-level data for NAICS 312, Beverage and Tobacco Product Manufacturing.
Ideal source: microdata on establishments from the Canadian Census of Manufacturers.
Constraint: no access to enterprise microdata.

Substituting units: alternatives for NAICS 312
Are there aggregate data summarizing the firms within NAICS 312?
Possibilities: summaries over time (time series) or over geography (small-area business statistics).

Statistical data structure
Let's review basic data structure. The unit of analysis defines the underlying structure of a data file. This structure consists of a series of rows, with each row containing the data for one member of the unit of analysis. This simple structure is known as the flat, rectangular data matrix.

[Figure: a data file drawn as rows labelled Case 1, Case 2, Case 3, ..., Case n-1, Case n]

All of the information collected for each member of the unit of analysis is organized in fixed locations in the file, called fields or variables.

[Figure: the same rows, Case 1 to Case n, with the fields (variables) highlighted across them]

This structure looks like the grid of a spreadsheet. However, there is one very important difference between a statistical data structure and a spreadsheet: the spreadsheet is organized around individual cells, while the statistical data structure is organized around rows.

[Figure: a spreadsheet with individually addressed cells (B2, E3, C5, F7) compared with a statistical data structure addressed by rows (Row 1, Row 3, ..., Row k-1)]

The next slide presents the way this simple statistical data structure appears in SPSS.

[Figure: the data file in SPSS, with Rows 1, 8 and 15 and Field 8 highlighted]
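The idea that every variable sits in a fixed location on every row can be sketched with a fixed-column text file, similar in spirit to the raw data files statistical packages can read. This is a minimal sketch under stated assumptions: the record layout (`LAYOUT`), the variable names, the 14-character record length, and the three cases are all hypothetical, and `read_fixed` is an illustrative helper, not part of any package.

```python
# Hypothetical record layout: each variable occupies the same columns on every line.
# Positions are 0-based, end-exclusive slices; names and widths are invented.
LAYOUT = {
    "case_id": (0, 4),
    "age": (4, 7),
    "sex": (7, 8),
    "income": (8, 14),
}
NUMERIC = {"case_id", "age", "income"}
RECORD_LENGTH = 14

# Hypothetical raw file: one line (row) per case.
RAW = "0001 34F052000\n0002 51M061000\n0003 27F038000\n"


def read_fixed(text, layout):
    """Parse a fixed-column file into a flat, rectangular list of rows."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Every record must have the same length, or the fields would not line up.
        if len(line) != RECORD_LENGTH:
            raise ValueError(f"line {line_no}: expected {RECORD_LENGTH} characters")
        row = {}
        for name, (start, end) in layout.items():
            value = line[start:end].strip()
            row[name] = int(value) if name in NUMERIC else value
        rows.append(row)
    return rows


cases = read_fixed(RAW, LAYOUT)

# Row-oriented access: choose a case first, then a variable within it.
print(len(cases))  # 3
print(cases[1])    # {'case_id': 2, 'age': 51, 'sex': 'M', 'income': 61000}
# Every row carries the same variables, which is what makes the matrix rectangular.
print(all(row.keys() == LAYOUT.keys() for row in cases))  # True
```

The contrast with a spreadsheet shows up in the access pattern: a spreadsheet value is addressed as an isolated cell such as B2, whereas here a value is reached through its case (row) and a variable whose position is the same for every row.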