UNC-Wilmington Department of Economics and Finance
ECN 377
Dr. Chris Dumas

Variables, Variable Types, Data Set Structure, and Data File Types

Often, the type of econometric analysis that is appropriate in a given situation depends on the types of variables and data that are available. In this handout, we will discuss the issues and terminology involved in describing variables and variable types, data observations (rows), data fields (columns), and data file types.

Types of Variables

Recall that variables are things that (1) can be observed, (2) can be measured or described, and (3) have the potential to vary. Variables whose values can be measured are called Cardinal/Measurement/Scale Variables. Variables whose values cannot be measured, but instead simply describe different categories of things, are called Ordinal/Nominal/Qualitative/Categorical/Character/String Variables. These two primary variable types can be further divided into sub-categories. A variable's type is important, because some analysis methods can be applied only to some types of variables and not to others!

Cardinal/Measurement/Scale Variables -- numeric (number) variables where the distance between any two values has the same meaning from one data observation to the next.

    Continuous -- cardinal numeric variables that can take any value within a range, so there are infinitely many possible fractional values between any two values.

    Discrete -- cardinal numeric variables that can take only separate, distinct values, with no possible fractional values in between (for example, a variable that takes only integer values is a discrete variable).

Ordinal/Nominal/Qualitative/Categorical/Character/String Variables -- variables whose values are numbers or text characters (e.g., persons' names, city names, colors) that indicate categories rather than measurements.
    Ordinal Numeric Variables -- numeric (number) variables that indicate order or rank rather than measurement; the distance between any two values does not necessarily have the same meaning from one data observation to the next. The numbers indicate ordered/ranked categories rather than measurements. For example, suppose two people rate product desirability on a scale of 1 to 10, and both people rate the first product "5" and the second product "3." Both people rank the first product higher than the second product, but the difference between a "5" and a "3" for the first person might not equal the difference between a "5" and a "3" for the second person. So, the "5" and the "3" simply indicate ordered/ranked categories rather than cardinal measurements.

    Ordinal Character Variables -- the same as ordinal numeric variables, but character (text) values are used to indicate the ordered/ranked categories (e.g., Likert-scale data from questions with the answers "strongly agree, agree, don't care, disagree, strongly disagree").

    Nominal Numeric Variables -- numeric variables that indicate unordered/unranked categories. For example, the values 0, 1, and 2 might be used to represent the colors Red, Blue, and Green. The numbers do not indicate that the colors are ranked, with Red coming before Blue, and so on; they simply label different (but unranked) categories of color. (Green is not "twice" Blue just because a 2 is used to indicate Green and a 1 is used to indicate Blue.)

    Nominal Character Variables -- the same as nominal numeric variables, but character (text) values are used to indicate the unordered/unranked categories (e.g., persons' names, colors, states).

Rules for Naming Variables in SAS

In SAS, variable names must follow these rules:
- Names must be 32 characters or less in length.
- Names must start with a letter or an underscore (a name may NOT begin with a number!).
- After the first character, names may contain only letters, underscores, and numbers.
- Names may not contain blank spaces, dashes, or special characters such as % $ ! * & # @.
- Names may contain any mix of uppercase and lowercase letters. SAS is not case-sensitive; capitalization doesn't matter when it comes to variable names.

Reserved Words are words that SAS uses for special purposes and so cannot be used as variable names. The names _N_, _ERROR_, _FILE_, _INFILE_, _MSG_, _IORC_, and _CMD_ are reserved words in SAS. Note that these reserved words start and end with an underscore; to avoid conflicts with reserved words, it is recommended that you do not use variable names that start and end with an underscore.

Missing Values

A Missing Value is a data value that is missing from a data set because the value was never collected, was collected but not entered into the database, or was lost or deleted from the database. In SAS, a missing numeric value is indicated by a single period ".", which "holds the place" of the missing value in the data set. (A missing character value is stored as a blank.)

Observations/Cases (Rows) and Variables/Fields (Columns)

Data are typically arranged in horizontal rows and vertical columns, like the rows and columns of a table or a spreadsheet. An "Observation" or "Case" is a person, place, thing, or time period on which (or during which) data are collected. Observations/Cases are usually the rows in your data set. Sometimes an observation number is displayed at the left of each row to number the observations for ease of reference. (When SAS displays or prints a data set, it automatically shows a column of observation numbers labeled "Obs".)
In SAS, the maximum number of observations in a data set is limited only by your computer's capacity. A "Variable" or "Field" is a type of data collected for each observation. Variables/Fields are usually the columns in your data set, and the variable/field names are often displayed at the tops of the columns. If a data set has a column of observation numbers used to number the rows, the observation numbers are usually not counted as a variable. In SAS, the maximum number of variables in a data set is 32,767 (for full compatibility with earlier versions of SAS).

Typical Data Set Structure

Obs   Name    Height   Weight   Sign
1     Larry   6.2      150      Aquarius
2     Moe     5.7      180      Virgo
3     Curly   5.4      215      Libra

In this table, the Obs column holds the observation numbers, the top row holds the variable names, each row below it is an observation (case), and each column after Obs is a variable (field).

Data File Types

Data can be stored in several types of computer files. The type of computer file is usually indicated by the "extension" at the end of the file name, which is the part of the file name that follows the dot. For example, the extension on a data file named "DumasData.xls" is the "xls" part. Some common file types and extensions are listed below:

File Type                                  Extension(s)
Space- or tab-delimited text file          .txt .prn .dat
Comma-delimited text file                  .csv
Microsoft Excel spreadsheet file           .xls .xlsx .xlsm
Lotus 1-2-3 spreadsheet file               .wks .wk1 .wk3
Data Interchange Format (DIF) file         .dif
Microsoft Access database file             .accdb .mdb
dBase database file                        .dbf
Stata data file                            .dta
SPSS data file                             .sav
SAS data file                              .sd2 .sd7 .sas7bdat

Some data files are simple text files; common examples include ".txt", ".prn", ".dat", and ".csv" files. Simple text files must use "delimiters" to indicate where one data value ends and the next begins. A "delimiter" is simply a special character that separates one data value from the next.
A "space-delimited file" has each data value separated by a space, a "comma-delimited file" has each data value separated by a comma, and so on.

IMPORTANT: If a data file uses a space as a delimiter, then none of the data values should contain spaces, or SAS will read a value containing a space as two data values instead of one. Also, a blank cannot be used to represent a missing character value if a space is used as the delimiter; either change the delimiter to something else or use something other than a blank to represent the missing character value. Similarly, if a data file uses a comma as a delimiter, then none of the numbers in the data should contain commas.

In Microsoft Windows, file name extensions are sometimes "hidden" to shorten the file names. If this is the case on your computer, you need to "un-hide" the file name extensions so that you can see what type of data file you are working with. You can do this in Windows' Folder Options (View tab) by unchecking "Hide extensions for known file types."

SAS can open and "read" all of the data file types listed above, but you must tell SAS which data file type you want to open. More about this appears in the handout on Proc Import and Proc Export.
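The naming, missing-value, and delimiter rules in this handout can be checked mechanically before a text file is handed to SAS. The following is a minimal sketch written in Python (not SAS, and not how SAS itself validates anything) that applies the rules exactly as this handout states them. The function names `name_problems` and `check_delimited`, the Income variable, and the modified or extra rows (Moe with a missing Sign, "Curly Joe", Shemp) are hypothetical, made up for illustration. It assumes the first line of the file holds the variable names, and with no delimiter given it treats runs of blanks as a single delimiter, which roughly mirrors how SAS list input reads space-delimited data.

```python
import re

# Reserved names listed in this handout. SAS variable names are not
# case-sensitive, so names are compared in uppercase.
RESERVED = {"_N_", "_ERROR_", "_FILE_", "_INFILE_", "_MSG_", "_IORC_", "_CMD_"}

# First character: a letter or underscore; after that: letters, digits, underscores.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def name_problems(name):
    """Return the naming rules (as stated in this handout) that `name` breaks."""
    problems = []
    if len(name) > 32:
        problems.append("longer than 32 characters")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("must start with a letter/underscore; only letters, digits, underscores")
    if name.upper() in RESERVED:
        problems.append("reserved word")
    elif len(name) > 1 and name.startswith("_") and name.endswith("_"):
        problems.append("starts and ends with an underscore (not recommended)")
    return problems


def check_delimited(text, delimiter=None):
    """Check a delimited text file whose first line holds the variable names.

    delimiter=None splits on runs of whitespace, so a blank cannot hold the
    place of a missing value; any other string is used as the delimiter.
    Returns a list of problems; an empty list means nothing was found.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    names = lines[0].split(delimiter)
    problems = []
    for name in names:
        for reason in name_problems(name):
            problems.append(f"variable {name!r}: {reason}")
    # Observation numbers start at 1, like the Obs column in the table above.
    for obs, line in enumerate(lines[1:], start=1):
        values = line.split(delimiter)
        if len(values) != len(names):
            problems.append(f"obs {obs}: expected {len(names)} values, found {len(values)}")
    return problems


# Hypothetical space-delimited file: Moe's missing Sign is a period,
# "Curly Joe" contains a space, and Shemp's Sign was left blank.
space_file = """Name Height Weight Sign
Larry 6.2 150 Aquarius
Moe 5.7 180 .
Curly Joe 5.5 220 Leo
Shemp 5.6 170
"""

# Hypothetical comma-delimited file with a comma inside a number.
comma_file = """Name,Income
Larry,52000
Moe,61,500
"""

for problem in check_delimited(space_file):
    print(problem)
for problem in check_delimited(comma_file, delimiter=","):
    print(problem)
print(name_problems("net-income"), name_problems("_n_"))
```

In this sketch, the space-delimited check flags observation 3, where the space inside "Curly Joe" turns one value into two, and observation 4, where Shemp's blank Sign simply disappears; Moe's row passes because a period holds the place of his missing Sign. In the comma-delimited file, "61,500" is split into two values, so observation 2 is flagged.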