SECTION 7.1: BRANCHES OF STATISTICS, DATA TYPES, AND GRAPHS

STATISTICS DEFINED

Statistics is the branch of mathematics concerned with data collection (obtaining numerical measurements), data tabulation/presentation (organizing data into tables, graphs, or charts), data analysis (extracting relevant information from the given data), and data interpretation (drawing conclusions from the analyzed data) (Pagoso et al., 1992). Based on this definition, data is the major component of statistics. Hence, statistics is succinctly referred to as the science of data.

BRANCHES OF STATISTICS

Statistics is divided into two branches: descriptive statistics and inferential statistics. Descriptive statistics covers procedures for collecting and describing a set of data to obtain relevant information. These procedures apply only to the group (whether sample or population) from which the data were collected. Inferential statistics, on the other hand, covers procedures that analyze data from a sample in order to make predictions or inferences about the population. These procedures include making generalizations from samples to populations, performing estimation and hypothesis tests, finding relationships among variables, and making predictions. Note that population, as used in statistics, refers to the totality of the group under study, while a sample is a subset of this group.

Example 1: Describing the enrollment in a university in terms of the percentages per level (freshman, sophomore, junior, and senior) is a concern of descriptive statistics.

Example 2: Testing the hypothesis that male and female students differ significantly in performance on a mathematics test is a concern of inferential statistics.
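As a rough illustration of the descriptive task in Example 1, the sketch below turns enrollment counts into percentages per level. The counts are hypothetical and chosen only to keep the arithmetic simple; they do not come from any data set in this text.

```python
# Hypothetical enrollment counts per level (illustrative only, not from the text).
enrollment = {"freshman": 120, "sophomore": 90, "junior": 60, "senior": 30}

total = sum(enrollment.values())

# Descriptive statistics: summarize the collected data as percentages per level.
percentages = {level: 100 * count / total for level, count in enrollment.items()}

for level, pct in percentages.items():
    print(f"{level}: {pct:.1f}%")
```

Nothing here goes beyond the group that was actually counted, which is what makes the summary descriptive and not inferential.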
DATA AND THEIR TYPES

Data are pieces of information that function as the basic component of any statistical investigation. They are obtained whenever measurements are made or observations are recorded.

TYPES OF DATA

Data may be classified according to the type of variable being collected. Quantitative data are data obtained on quantitative variables (variables that can be measured numerically). Qualitative data, on the other hand, are those collected on qualitative variables (variables that cannot assume a numerical value but can be divided into two or more nonnumeric categories).

Examples of quantitative data: number of siblings, speed of a car, blood pressure reading, etc.

Examples of qualitative data: eye color, year level, socioeconomic status, etc.

Data may also be classified according to their measurement characteristics. Numbers serve to classify and to compare values, whether by ranking, taking differences, or forming quotients.

Nominal data are data where numbers can be assigned to categories, but the numbers cannot be ranked and no mathematical computation can be done on them.

Ordinal data are data where numbers can be assigned to categories and these numbers can be compared by ranking.

Interval data are data where the numbers can be subtracted and the resulting differences can be compared.

Ratio data are data obtained from measurements with a unique origin (a true zero), so that quotients are also meaningful.

Examples:

Gender is nominal because if the number 1 is assigned to male and 2 to female, the numbers cannot be compared. The statement 1 < 2 has no meaning here, and neither does the subtraction 2 – 1 = 1.

Grade levels in basic education are ordinal. The grade levels can be compared, so the statement 1 < 2 is now meaningful: it simply says that grade level 1 is lower than grade level 2. But, as with nominal data, mathematical computations are not meaningful. There is no meaning in the statement “5 – 3 = 2”.

Temperature readings are interval data.
There is no unique origin for temperature readings on the Celsius scale, but differences can now be compared. If, on Monday, the highest temperature is 36°C and the lowest is 30°C, the difference in temperature is 6°C. If, on Tuesday, the highest temperature is 37°C and the lowest is 29°C, the difference in temperature is 8°C. These differences can be compared: the difference in temperature on Monday is smaller than the difference on Tuesday.

Height of a person is ratio data. The instrument used for measurement has a unique origin. Comparing by ranking, forming differences, and forming quotients are all possible.

TYPES OF GRAPHS

Bar – a graph made of bars, with heights representing the frequencies (or percentages) of the respective categories
Example: The graph shows the frequency of students from different levels in a university with 1300 students.

Pie – a circle divided into slices/portions representing the percentages of a population or a sample that belong to different categories
Example: The graph shows the percentages of students from different levels in a community college with 650 students.

Line – a graph which shows the relationship between two or more sets of quantities using lines
Example of a line graph: The graph below shows the enrolment for schools A and B from 2017 to 2020.

Pictograph – a graph in which picture symbols are used to represent values
Example of a pictograph: The graph below shows the number of months in which four students obtained a score of not lower than 80% on a spelling test.

SECTION 7.2: MEASURES OF CENTRAL LOCATION AND MEASURES OF POSITION

When you want to describe a given set of numerical data, one of the descriptions can be a number that represents its central value.
There are three ways of describing the location of that central value: mean, median, and mode.

Aside from describing the location of the central value of a given data set, you may also be interested in locating the position of a certain value relative to the position of the other values. If that is the case, you have four possible measures: median, quartile, decile, and percentile.

MEASURES OF CENTRAL LOCATION/CENTRAL TENDENCY

COMPARING THE DIFFERENT MEASURES

Table 11 below shows the comparison of the different measures in terms of characteristics, stability of the measure, subsequent manipulations, number of values, type of data, and effect of extreme values.

Table 11: Comparison of the Different Measures

MEASURE OF POSITION/FRACTILES: DEFINITIONS AND INTERPRETATIONS

The definitions and interpretations of the different measures of position are given in the boxes below.

PROCEDURE IN COMPUTING THE FRACTILES

Different authors suggest different ways of computing the fractiles. The procedure adopted here is the one used by Bluman (2013) and Freund & Simon (1997). Let p be a fraction between 0 and 1, and let n be the number of values, arranged in ascending order. Compute pn.
○ If pn is not an integer, round up to the next higher integer and use the value in that position as the pth fractile.
○ If pn is an integer, use the mean of the values in positions pn and (pn + 1) as the pth fractile.

Example: Given the following scores obtained by students in a statistics test:
20 16 18 30 10 12 18 13 25 28
a) Compute the three quartiles.
b) Compute the 7th decile.
c) Compute the 35th percentile.

ARRANGE: 10 12 13 16 18 18 20 25 28 30

Applying the procedure above with n = 10:
a) Q1: pn = (0.25)(10) = 2.5, so use position 3; Q1 = 13. Q2: pn = (0.50)(10) = 5, so Q2 is the mean of the values in positions 5 and 6; Q2 = (18 + 18)/2 = 18. Q3: pn = (0.75)(10) = 7.5, so use position 8; Q3 = 25.
b) D7: pn = (0.70)(10) = 7, so D7 is the mean of the values in positions 7 and 8; D7 = (20 + 25)/2 = 22.5.
c) P35: pn = (0.35)(10) = 3.5, so use position 4; P35 = 16.

SECTION 7.3: MEASURES OF VARIABILITY

Knowing the location of the central value of a data set and the position of a certain value relative to the other values does not give a full description of the data set. Knowing how spread out the scores are from each other is another way of describing the set. This is known as the measure of spread, dispersion, or variability.
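To see why a central value alone does not fully describe a data set, consider the minimal sketch below. Both score sets are hypothetical (they are not taken from this text) and were chosen so that their means and medians coincide while their spreads differ widely. Spread is measured here simply as the largest value minus the smallest value.

```python
from statistics import mean, median

# Two hypothetical score sets (illustrative only, not from the text).
set_a = [48, 49, 50, 51, 52]
set_b = [10, 30, 50, 70, 90]

for name, scores in (("A", set_a), ("B", set_b)):
    spread = max(scores) - min(scores)  # largest value minus smallest value
    print(name, "mean:", mean(scores), "median:", median(scores), "spread:", spread)
```

Both sets have a mean and median of 50, yet the scores in set B are far more scattered than those in set A.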
Another important measure compares the variations of two or more groups, which enables you to determine which of these groups is more variable than the others. The measure used for this is known as the coefficient of variation.

MEASURES OF VARIABILITY

1. Range. This is a measure of spread obtained by subtracting the smallest value from the largest value in a data set.
Example: The data below are the scores of 8 students in a Statistics examination.
43 46 41 39 36 48 41 28
Find the range.
Range = 48 – 28 = 20

2. Quartile Deviation (QD). This is a measure of spread which is one-half the range of the middle 50% of the cases or observations, that is, QD = (Q3 – Q1)/2. (This is also called the semi-interquartile range.)

MEASURE OF RELATIVE VARIATION: COEFFICIENT OF VARIATION (CV)

SECTION 7.4: FUNDAMENTAL PRINCIPLE OF COUNTING AND PROBABILITY

Knowledge of the fundamental principle of counting is needed in understanding probability and inferential statistics. These topics will be discussed in this section.

Fundamental Principle of Counting or Multiplication Principle

If one event can occur in m ways and a second one can occur in n ways, then the number of ways both can occur is (m)(n).
Note: The principle also holds true for more than two events.

Example 1: A room has 4 doors. In how many ways can an individual make a trip into this room and out again if he must enter and leave only by means of the doors?
Solution: There are two events: entrance and exit. Any of the 4 doors may be used for entrance, and any of the same 4 doors may be used for exit.
n = (4)(4) = 16
Answer: There are 16 ways.

PROBABILITY

Probability is a branch of mathematics that deals with measuring or determining quantitatively the likelihood that an event or experiment will have a particular outcome.

DEFINITIONS

An experiment refers to some situation of interest whose outcome is determined by chance. A sample space S is the set of all possible outcomes of an experiment.
Each element in a sample space is called an outcome or sample point. An event is any subset of a sample space.

Example 1: A bowl has 4 orange marbles, 5 green marbles, and 6 yellow marbles. One marble is picked at random. Determine the probability that it is green.
Solution:
Let S = the sample space of the experiment of picking a marble from the bowl
n(S) = 15
E = the event of selecting a green marble
n(E) = 5
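The counts above can be turned into a probability with a short sketch. It assumes the classical rule for equally likely outcomes, P(E) = n(E)/n(S), which has not been stated in the text up to this point. The variable names are illustrative only.

```python
from fractions import Fraction

# Marble counts taken from the example.
marbles = {"orange": 4, "green": 5, "yellow": 6}

n_S = sum(marbles.values())  # number of outcomes in the sample space S
n_E = marbles["green"]       # number of outcomes in the event E (green marble)

# Assumption: every marble is equally likely to be picked, so P(E) = n(E) / n(S).
p_green = Fraction(n_E, n_S)

print(p_green)         # exact probability as a reduced fraction
print(float(p_green))  # the same probability as a decimal
```

Under that assumption the probability of picking a green marble is 5/15, which reduces to 1/3.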