STATISTICS

What is Statistics?

Statistics consists of a body of methods for collecting and analyzing data (Agresti & Finlay, 1997). It is a method of dealing with data: a tool concerned with the collection, organization, presentation, analysis, and interpretation of numerical information.

Two Branches of Statistics

Descriptive statistics is concerned with the presentation of information in a convenient, usable, and understandable form (Runyon & Haber, 1986). Other writers refer to descriptive statistics as the procedures used in describing the properties of a sample, or of a population where complete population data are available.

Example: If we measure the Intelligence Quotient (IQ) of all the students in the School of Graduate Studies and calculate its mean, that mean is a descriptive statistic because it describes a characteristic of a complete population.

Inferential statistics is concerned with generalizing this information or, more specifically, with making inferences about populations based on samples taken from those populations (Runyon & Haber, 1986). Here a sample is selected with the intent of predicting what the larger population is like.

Example: If we wish to make a statement about the mean IQ of all students in the School of Graduate Studies at the Bukidnon State College, based on the mean computed from a sample of 100 students, and to estimate the error involved, we use procedures from inferential statistics.

Terms and Concepts

Variable and Constant

A variable is a characteristic or phenomenon that may take on different values. In addition, a variable is something that has two or more meaningful and useful divisions, categories, characteristics, or values (Grimm & Wozniak, 1990).

Example:
1. Grade point average
2. Height
3. Weight
4. Tribe
5. Age

These take on different values when different individuals are observed. Other examples of variables include shirt size, with categories of small, medium, large, and extra-large.
Social class, with categories of upper, middle, and lower class, and religion, with categories such as Roman Catholic, Protestant, Seventh-day Adventist, and Mormon, are variables as well.

A variable is contrasted with a constant, the value of which never changes.

Example: Pi is a constant that always takes the value 3.1416….

Population, Sample and Census

A population is the complete set of individuals, objects, or measurements of interest in a study. Sometimes the population is a clearly defined set of subjects.

Example: We may wish to investigate the grades of all the students after this course to find out the relationship between their grade point average and their scores in other foundation subjects.

A sample is a subset, or portion, of a population. It is often impossible to include all the members of the population because of cost, time, and manpower constraints, so a subgroup may be selected to represent the total population.

Example: We may choose only 100 students from the School of Graduate Studies at the Bukidnon State College. The 100 students are then the sample.

A census is the collection of data from every element in the population (Triola, 1998). A census involves what is called complete enumeration.

Closely related to the concepts of population and sample are the concepts of parameter and statistic. The following definitions are easy to remember if we recognize the alliteration in "population parameter" and "sample statistic."

Parameter and Estimates

A parameter is any measurable characteristic of the population. It is a numerical measurement describing some characteristic of a population. Parameters, or population values, are usually unknown, so we estimate them from sample values. In statistical notation, Greek letters (e.g., µ and σ) are used to represent population parameters.

Example: The grade point average and standard deviation of all students in the School of Graduate Studies.
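The relationship between a population, a sample, and a parameter can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 1,000 IQ scores is simulated, and the names population, sample, mu, and sigma are hypothetical, not taken from any real data set.

```python
import random
import statistics

# ASSUMPTION: a simulated population of 1,000 IQ scores (not real data).
rng = random.Random(42)
population = [round(rng.gauss(100, 15)) for _ in range(1000)]

# Census: every element is measured, so these values are parameters.
mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population standard deviation

# Sample: a random subset of 100 students, drawn without replacement.
sample = rng.sample(population, 100)

# A value computed from the sample alone, used to estimate mu.
sample_mean = statistics.mean(sample)

print(f"Population size N = {len(population)}, sample size n = {len(sample)}")
print(f"Parameter (population mean): {mu:.2f}")
print(f"Sample mean:                 {sample_mean:.2f}")
```

In practice the population values are usually not available, which is why the sample value is used in their place; here both are computed only so that they can be compared.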
An estimate, or statistic, is a value calculated from a sample in order to estimate the population parameter. It is a numerical summary of the sample data. We shall employ Roman letters (e.g., X̄ and s) to represent estimates. Different symbols are used for parameters and statistics.

Example: The mean IQ score of a random sample of students in this class is used to estimate the mean IQ score of all the students in the School of Graduate Studies.

Characteristic                     Parameter    Statistic
Mean                               µ (mu)       X̄
Standard deviation                 σ (sigma)    s
Variance                           σ²           s²
Pearson correlation coefficient    ρ (rho)      r
Number of cases                    N            n

The Nature of Data

Some data sets consist of numbers (such as heights or test scores), and others are nonnumerical (such as gender). The terms quantitative data and qualitative data are often used to distinguish between these two types.

Quantitative data consist of numbers representing counts or measurements. Quantitative data can be further described by distinguishing between the discrete and continuous types.

Discrete data result from either a finite number of possible values or a countable number of possible values. That is, the number of possible values is 0, or 1, or 2, and so on.

Continuous data result from infinitely many possible values that can be associated with points on a continuous scale in such a way that there are no gaps or interruptions.

When data represent counts, they are discrete; when they represent measurements, they are continuous. The number of students in this class is discrete data; the amount of money each student has in his or her wallet right now is treated as continuous data because it is a measurement that can assume any value over a continuous span.

Four Levels of Measurement

Another way to classify data is to use four levels of measurement:
1. nominal
2. ordinal
3. interval
4. ratio

The nominal level of measurement is characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme (such as low to high).
The simplest measurement scale is termed nominal or classificatory. The categories of nominal variables do not differ by quantity, degree, or amount, but only by kind.

Example: The two categories of the nominal variable "gender" (male and female) are distinct, do not overlap, include all possible cases, and cannot be ordered or ranked. The same would be true of the nominal variable "region," which might be broken into the categories of NCR, Region I, Region II, Region III, Region IV, Region V, Region VI, Region VII, Region VIII, Region IX, Region X, Region XI, Region XII, ARMM, etc.

Nominal scales represent the lowest level of measurement because they allow you only to count and compare the number of cases in each category. Other examples of nominal scales are given below.

The numbers on baseball players' uniforms are nominal in nature. In social science research, groups in a sample are commonly labeled with numbers (such as 1 = Matigsalog, 2 = Talaandig, 3 = Higaonon, 4 = Manobo). However, when numbers have been attached to categories in this way, averaging the numbers together is usually not advisable. On the scale above for ethnic groups, an average score of 1.87 would have no meaning.

The ordinal level of measurement involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless. Ordinal measurement scales classify people or things into types or kinds, but with one additional feature: the classes or categories can be ranked. Ordinal categories are distinct, mutually exclusive, and exhaustive, but they are also orderable in terms of quantity, magnitude, or some other criterion. In other words, ordinal measurement scales have the property of magnitude but not the property of equal intervals or the property of an absolute zero. They allow us to rank individuals or objects but not to say anything about the meaning of the differences between the ranks.
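The difference between what nominal and ordinal scales allow can be illustrated with a small Python sketch. The group labels and the social class categories come from the text; the coded responses themselves are made up for illustration, and the variable names are hypothetical.

```python
from collections import Counter
import statistics

# Nominal scale: numeric codes are labels only.
labels = {1: "Matigsalog", 2: "Talaandig", 3: "Higaonon", 4: "Manobo"}
codes = [1, 1, 2, 4, 1, 3, 2, 1]  # ASSUMPTION: invented responses

# Counting and comparing the cases in each category is meaningful.
counts = Counter(labels[c] for c in codes)
most_common_group = counts.most_common(1)[0][0]

# The mean of the codes can be computed, but it names no group at all.
mean_code = statistics.mean(codes)

# Ordinal scale: categories can be ranked, but the size of the gap
# between ranks is undefined.
class_rank = {"lower": 1, "middle": 2, "upper": 3}
respondents = ["middle", "upper", "lower", "middle"]  # ASSUMPTION: invented
ordered = sorted(respondents, key=class_rank.get)

print(counts)
print("Most frequent group:", most_common_group)
print("Mean of codes (not interpretable):", mean_code)
print("Respondents ranked by class:", ordered)
```

The mean of the codes is a number that falls between two labels and corresponds to no category, which is why counts and the most frequent category are the usual summaries for nominal data.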
Example: The three categories of the ordinal scale "social class" (upper, middle, and lower) are distinct, do not overlap, include the entire range of social class, and can be ranked: the upper class is higher than the middle class, and the middle class is higher than the lower class. No statement can be made, however, about the amount of difference between categories. The differences between upper and middle and between middle and lower are not calculable.

Another example is ranking students by GPA. If you ranked 1st in a class of 400, the rank indicates that your GPA is greater than the others, but not how much higher it is.

The interval level of measurement is like the ordinal level, with the additional property that we can determine meaningful amounts of difference between data values. However, there is no inherent (natural) zero starting point (where none of the quantity is present). Although the categories of nominal and ordinal scales cannot be further subdivided on a measurement scale, the values of interval scales permit distances and differences between values to be considered or measured.

Some social researchers further distinguish between interval and ratio scales. In both cases the intervals are of equal size. With interval scales, however, there is an arbitrary zero point, whereas with ratio variables there is a true zero point, where zero is equivalent to a total absence of the variable.

Example: Time measured by calendars, temperature on the Fahrenheit scale, and intelligence measured by IQ scores are interval variables because zero values do not mean the total absence of time, temperature, or intelligence, respectively. In contrast, age, income, and urbanization (the percentage of a population living in urban places) are ratio variables because zero values do indicate a total absence of those attributes.
The ratio level of measurement is the interval level modified to include an inherent zero starting point (where zero indicates that none of the quantity is present). For values at this level, differences and ratios are both meaningful.

For most statistical purposes, interval and ratio scales are treated as a similar type of measurement scale. Note, however, that a major difference is that one cannot form meaningful ratios with the values of an interval scale. For example, it is incorrect to say that 60° is twice as hot as 30°, but it is correct to say that PhP 60,000.00 is twice as much as PhP 30,000.00.

Because of the scarcity of interval variables, the ambiguity concerning the differences between interval and ratio scales, and their similar statistical treatment, it makes sense to treat these two types of measurement scales as one type.
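The point about ratios can be checked with a short Python sketch. It assumes that the temperatures in the example are Fahrenheit readings (the text does not state the scale), and the helper fahrenheit_to_celsius and the centavo conversion are introduced here only for illustration.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit reading to Celsius."""
    return (f - 32) * 5 / 9

# Interval scale: the zero point is arbitrary, so the ratio of two
# readings changes when the same temperatures are expressed in
# another unit. ASSUMPTION: 60 and 30 are degrees Fahrenheit.
ratio_f = 60 / 30
ratio_c = fahrenheit_to_celsius(60) / fahrenheit_to_celsius(30)

# Differences remain comparable: a 30-degree gap is three times a
# 10-degree gap in either unit.
diff_ratio_f = (60 - 30) / (40 - 30)
diff_ratio_c = (
    (fahrenheit_to_celsius(60) - fahrenheit_to_celsius(30))
    / (fahrenheit_to_celsius(40) - fahrenheit_to_celsius(30))
)

# Ratio scale: zero pesos means no money, so the ratio is the same
# whether amounts are stated in pesos or in centavos.
ratio_pesos = 60_000 / 30_000
ratio_centavos = (60_000 * 100) / (30_000 * 100)

print("Temperature ratio in Fahrenheit:", ratio_f)
print("Same temperatures in Celsius:   ", round(ratio_c, 2))
print("Difference ratio (F, C):        ", diff_ratio_f, round(diff_ratio_c, 2))
print("Money ratio (pesos, centavos):  ", ratio_pesos, ratio_centavos)
```

The temperature ratio depends on the unit chosen, which is why "twice as hot" is not a valid statement, while the money ratio is 2 in any unit.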