MATH 2441 Probability and Statistics for Biological Sciences

Types of Data

Statistics deals with the organization, summarization, and analysis of the implications of experimental observations or measurements. How these operations are carried out, and what sort of mathematical methods are appropriate, depends on the nature of the observations. Although most of what is mentioned in this document is "common sense", it is important to be aware of the issue.

Whenever we make a measurement or an observation, we are measuring or observing something. In statistics, that "something" is called a variable, because different measurements or observations of its "value" may be different -- they may vary over a set or range of possibilities.

There are at least three important ways to classify different types of data in statistics.

First, there is the distinction between qualitative data and quantitative data:

- the term qualitative comes from the word "quality", indicating a property, characteristic, feature or attribute. Qualitative data is always a list of words or names of a characteristic. Examples of qualitative variables (which have qualitative "values") are the flavor of ice cream, the color of a person's eyes or hair, the species of a selected life form, the brand of potato chip selected by a customer, the presence or absence of a particular genetic feature, etc.

- the term quantitative comes from the word "quantity", indicating amount, measure, number, size, etc. Quantitative data is always a list of numerical values where the numbers are more than just names, but actually represent measured numerical values. Examples of quantitative variables that might be considered in studying the population of BCIT students are the height of a student, the age of a student, and the number of apples the student ate in the past week. However, the student ID number is a qualitative variable rather than a quantitative variable, since it is in some way equivalent to a name for that student.
The numerical digits in a student number are not intended to indicate the measure or size or amount of something that that student has.

Sometimes numerical digits are used to represent qualitative values. Thus, the players on a sports team often have numbers on their shirts, but these numbers are qualitative labels, not quantitative values. Similarly, statisticians sometimes code qualitative values with numerical digits -- for example, letting the numerical digits 0 and 1 stand for the qualities "male" and "female", respectively.

Grey areas can arise. For example, when we use a scale of 1 to 5 to represent the range of responses to questions from "strongly disagree" to "strongly agree" on survey-type questionnaires, one could regard the result as qualitative (i.e., one of the list of "strongly disagree", "disagree", "no opinion", "agree" or "strongly agree") or as quantitative (the values 1, 2, 3, 4, and 5 measuring the degree of agreement with the statement given). Arithmetic operations often make sense with quantitative data, but do not make sense with qualitative data.

Secondly, there is the notion of scale, of which statisticians distinguish four kinds:

- nominal scales: the observation of the variable results in one of a set of characteristics or attributes, rather than a numerical value. The word "nominal" comes from the word "name", meaning that the observations will be names rather than numerical values. Nominal scales result in qualitative data. Examples of nominal scales are:
    - flavor (for example, the choice of ice cream purchased by a randomly selected customer might be chocolate, vanilla, strawberry, etc. There is no numerical or quantitative relationship between these flavors: we can't say that vanilla is twice as much as chocolate or that vanilla is five more than chocolate, etc.)
    - gender (possibilities usually are just "male" or "female")
    - species or variety (if we're talking about, say, lettuce plants, the observed variety might be romaine, buttercrunch, iceberg, leaf, etc.)
    - genetic phenotypes
  Although people sometimes use numerical codes to label such attributes (for example, we might record ice cream flavors as 1 = chocolate, 2 = vanilla, 3 = strawberry, etc.), these numerical codes are still just names, not values. We know this is so because, in this particular example, it makes no sense to talk about flavor 2.5 being halfway between vanilla and strawberry. Nominal scales have no natural ordering from least to greatest, or smallest to biggest, or even some intrinsic notion of first to last.

- ordinal scales: the possible observations of the variable form a set which has a natural order -- the observations can be ranked in some order. Ordinal scales can result in either qualitative or quantitative data. Purely ordinal scales are not as common in technical applications as the other three because usually the natural order is a result of numerical value, and so the data really belongs to the last two types described below. However, a couple of simple examples of an ordinal scale are:
    - alphabetic order
    - sets of levels (for example, a school student is classified as being in grade 1, or grade 2, or grade 3, etc. There's no implication that a student in grade 2 knows twice as much stuff as a student in grade 1 (though it is true that a student completing grade 2 has completed one grade more than a person completing grade 1, and so in this sense, the grade level completed can be regarded as an interval scale). However, there is a notion of increasing knowledge and skill as one progresses from one grade to the next through the system.)
    - numerical categories (for example, rather than record the actual weight gain of mice on a particular diet -- which would result in a ratio scale, see below -- we might simply categorize the observations as weight loss, no change, small gain, moderate gain, and large gain. These five possibilities represent an increasing amount of weight gain, but only indicate relative ranking, not precise relative size.)
    - rating scales (you see these in surveys where you are asked to select responses to a sentence from the set of strongly disagree, disagree, no opinion, agree, strongly agree, etc.)
  Ordinal scales are most often used in biological applications when it is not possible or feasible to work with either an interval or a ratio scale, but the data reflects some sort of ordering or size property.

- interval scales form the first of two distinctly numerical or quantitative scales. In an interval scale, differences between observed values have significance, but their ratio does not. Another way of saying this is that interval scales do not have a true zero. Examples of interval scales are:
    - the Celsius (or Fahrenheit) temperature scales. The temperature difference between 40° and 20° is the same as the temperature difference between 70° and 50° (for example, it would take as much heat to raise the temperature of some water from 20° to 40° as it would to raise the temperature of that water from 50° to 70°). However, it does not make sense to speak of 40°C as being twice as hot as 20°C. Nor does it make sense to talk of a temperature of 0°C indicating the absence of temperature or the absence of heat.
    - time scales are interval scales.

- ratio scales are interval scales that also have a natural zero, so that ratios of values (and not just differences between values) are meaningful. Examples are:
    - concentrations (a 2 M solution is twice as concentrated as a 1 M solution. A 0 M solution indicates a solution containing no solute.)
    - measurements of size relative to some standard (for example, measurements of length in meters. A plant 1.5 m tall is twice as tall as a plant which is 0.75 m tall. A mouse which weighs 36 g is twice as heavy as a mouse which weighs just 18 g.)
  Like interval scales, ratio scales are always numerical.

In brief summary, we can say:

- a nominal scale consists of an unordered set of qualitative "values"
- an ordinal scale looks like a nominal scale, but with the possible "values" having a meaningful or natural ordering from first to last, or least to greatest, etc.
- an interval scale looks like an ordinal scale (has ordering), but with the differences between possible values also being meaningful
- a ratio scale looks like an interval scale, but with the ratios of possible values also being meaningful.

Thirdly, when dealing with quantitative data or variables, it will be necessary to distinguish between discrete and continuous data, and their corresponding variables.

- the possible values of discrete variables form a set of distinct, isolated quantities. Observations that result from counting objects or items give discrete data, since only whole number values can arise. Thus, the number of "heads" observed when a coin is flipped four times is a discrete quantity, because the only possible values that can arise are 0, 1, 2, 3, or 4. The number of mice in a sample of six which have a certain genetic mutation will be a discrete value, since the only values that can arise are 0, 1, 2, 3, 4, 5, or 6.

- the possible values of a continuous variable form an unbroken set of decimal values, with at most a finite number of distinct gaps. Continuous variables usually result from measurements made relative to a standard scale of size: for example, length, mass, time, temperature, etc. Thus, the mass of a mouse selected at random is a value from a continuous scale, since in principle, any value between 0 g (a very light mouse!) and some maximum value could occur.

The distinction between discrete and continuous variables is quite important from a methodological point of view. Methods for solving problems involving continuous variables almost always are based on concepts from calculus, whereas methods for solving problems involving discrete variables often just involve simple arithmetic or algebra. Both discrete and continuous variables arise in biological sciences applications, though continuous variables are quite a bit more common.

David W. Sabo (1999)
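The distinctions described in this document can be illustrated with a short Python sketch using only the standard library. It is not part of the original course material. The flavor coding, the temperatures, the mouse masses and the four coin flips are taken from the examples in the text; the sample lists and all variable names are made up for illustration.

```python
from itertools import product
from statistics import mode

# Nominal scale: numeric codes are only names (coding taken from the text).
flavor_names = {1: "chocolate", 2: "vanilla", 3: "strawberry"}
flavor_codes = [1, 2, 2, 3, 2, 1]          # made-up sample of purchases
# Counting which code occurs most often is meaningful ...
most_common_flavor = flavor_names[mode(flavor_codes)]
# ... but an average of the codes would not name any flavor.

# Ordinal scale: values can be ranked, so a median response makes sense.
rating_order = ["strongly disagree", "disagree", "no opinion",
                "agree", "strongly agree"]
responses = ["agree", "disagree", "strongly agree", "agree", "no opinion"]
ranks = sorted(rating_order.index(r) for r in responses)
median_response = rating_order[ranks[len(ranks) // 2]]  # odd-sized sample

# Interval scale: differences are meaningful, ratios are not.
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

# Equal differences in Celsius stay equal after converting to Fahrenheit.
equal_diff_c = (40 - 20) == (70 - 50)
equal_diff_f = (celsius_to_fahrenheit(40) - celsius_to_fahrenheit(20)
                == celsius_to_fahrenheit(70) - celsius_to_fahrenheit(50))
# The "ratio" of 40 degrees to 20 degrees depends on the unit chosen,
# which is why it carries no physical meaning.
ratio_c = 40 / 20
ratio_f = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)

# Ratio scale: a true zero makes ratios independent of the unit.
ratio_grams = 36 / 18
ratio_kilograms = (36 / 1000) / (18 / 1000)

# Discrete variable: number of heads in four coin flips.
possible_heads = sorted({flips.count("H") for flips in product("HT", repeat=4)})

print(most_common_flavor, median_response)
print(equal_diff_c, equal_diff_f, ratio_c, round(ratio_f, 3))
print(ratio_grams, ratio_kilograms, possible_heads)
```

The Fahrenheit conversion is used only to show that a ratio computed on an interval scale changes with the unit, while the mass ratio is the same in grams and kilograms.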