Statistics is a branch of applied mathematics that deals with procedures for collecting, organizing, presenting, analyzing, and interpreting data from observations.

Statistics involves much more than simply drawing graphs and computing averages:
- In education, it is frequently used to describe test results.
- In science, the data resulting from experiments must be collected and analyzed.
- In manufacturing, statistical quality control techniques help provide better products at reasonable costs.
- In government, many kinds of statistical data are collected all the time.
A knowledge of statistics helps you analyze information more critically, so that you are not misled by manufactured polls, graphs, and averages.

Statistical Question → one that can be answered by collecting data that vary.

Examples of Statistical Questions:
1. How many hours do college students spend studying? (summarizing question)
2. Do college students spend more time on social media than on studying? (comparing question)
3. Do students who spend more time studying do better in exams? (relationship question)

Examples of Non-Statistical Questions:
1. How old are you?
2. What is your favorite subject?

DESCRIPTIVE STATISTICS
- It is the analysis of data that helps to describe, show, and summarize the data under study.
- It organizes, analyzes, and presents data in a meaningful way.
- It is used to describe a situation.
- It explains already known data and is limited to a sample or population of small size.
- Types: measures of central tendency and measures of variability
- Results are shown with the help of charts, graphs, tables, etc.
Commonly Used Summarizing Values
- Percentage
- Measures of central tendency and location
- Measures of variability
- Skewness and kurtosis
Examples:
1) Class average in an examination
2) Range of students' scores
3) Average salary

INFERENTIAL STATISTICS
- It is the analysis of a random sample of data taken from a population to describe and make inferences about the population.
- It compares, tests, and predicts data.
- It is used to explain the chance of occurrence of an event.
- It attempts to reach a conclusion about the population.
- Types: estimation of parameters and testing of hypotheses
- Results are shown with the help of probability scores.

Commonly Used Statistical Tools or Techniques
- Estimation of parameters
- Testing of hypotheses (z-test, t-test, ANOVA, chi-square, regression, time series analysis)
Examples:
1) Significant relationship between job satisfaction and performance of CCDC employees
2) The use of modules is significantly more effective than the traditional method of teaching

DATA - a collection of facts or information.
VARIABLE - a numerical characteristic or attribute associated with the population that can assume different values.

Two types of data:
1. QUANTITATIVE DATA - numerical values
2. QUALITATIVE DATA - categorical responses such as colors, labels, gender, attitudes, or answers to questions that are answerable by YES or NO

Quantitative data may be discrete or continuous:
Discrete - finite number of values; values are obtained by counting (number of students, number of passers, etc.)
Continuous - infinite number of values between two specific numbers; values are obtained by measuring (weight, height, temperature)

Levels of Measurement
Nominal - classifies and categorizes data
Ex: blood type, gender, religion, citizenship
Ordinal - ranks or orders data to show relationship
Ex: rank of officers (President), birth order in a family (eldest)
Interval - variables are measured on a scale with equal intervals but no absolute zero point
Ex: temperature (freezing point, boiling point)
Ratio - has a true value of zero, that is, the scale starts at an absolute zero point
Ex: mass, length, time, angle, energy, rating, electric charge, test results

Experiment (a method of collecting data) - this method is used when the objective is to determine the cause-and-effect relationship of a certain phenomenon under controlled conditions.

SAMPLING TECHNIQUES
POPULATION includes all of the elements from a set of data: objects, events, organizations, countries, species, organisms, etc.
SAMPLE is a subset taken from a population, either by random sampling or by non-random sampling.

A. RANDOM SAMPLING
Selection of n elements from a population of N elements, which is the subject of an investigation or experiment, where each element of the population has an equal chance of being selected using the appropriate sampling technique.

Types of Random Sampling Techniques
1. Lottery sampling - each member of the population has an equal chance of being selected.
2. Systematic sampling - members of the population are listed and samples are selected at intervals called sample intervals. In this technique, every kth item in the list is selected from a randomly selected starting point, where k = N / n.
Ex: If you want to draw a sample of 200 from a population of 6,000, select every 30th person in the list (6,000 / 200 = 30).
3. Stratified random sampling - members of the population are grouped on the basis of their homogeneity. This technique is used when there are a number of distinct subgroups in the population within which full representation is required.
The sample is constructed by classifying the population into sub-populations, or strata, on the basis of certain characteristics of the population, such as age, gender, or socioeconomic status.

Example: Select a sample of 400 students from a population grouped according to the cities they came from. The table shows the number of students per city.

City     Population (N)
A        12,000
B        10,000
C         4,000
D         2,000
Total    28,000

Solution: To determine the number of students to be taken as a sample from each city, divide the number of students per city by the total population (N = 28,000), then multiply the result by the total sample size (n = 400).

City     Population (N)    Computation                  Sample (n)
A        12,000            (12,000 / 28,000) x 400      171
B        10,000            (10,000 / 28,000) x 400      143
C         4,000            (4,000 / 28,000) x 400        57
D         2,000            (2,000 / 28,000) x 400        29
Total    28,000                                         400

4. Cluster sampling - applied on a geographical basis. Generally, sampling is first performed at higher levels before going down to lower levels.
Example: Samples are taken randomly from the provinces first, followed by cities, municipalities, or barangays, and then from households.

5. Multi-stage sampling - uses a combination of different sampling techniques.
Example: In selecting respondents for a national election survey, use the lottery method first for regions and cities, then use stratified sampling to determine the number of respondents from the selected areas and clusters.

COLLECTION OF DATA
Observation - focuses on determining the changes in the attitude, characteristics, and behavior of people or other subjects. This technique includes watching and recording actions and behaviors. The person who gathers the data is called the investigator, while the person being observed is called the subject.
Interview - oral or verbal communication where the interviewer asks questions in any mode (face to face, by telephone, or virtually) to an interviewee.
Questionnaire - data are gathered through a set of questions that is mailed or handed to respondents, who are expected to read and understand them.
Survey - the most practical method when the sample is large. At the national level, surveys are usually conducted by the government and by other surveying organizations such as the Philippine Statistics Authority (PSA).

DATA refers to information that is collected and recorded. It can be in the form of numbers, words, measurements, and much more.
Grouped data is data that has been classified into groups after collection.
Ungrouped data, also known as raw data, is data that has not been placed in any group or category after collection.

When data are presented as graphs, they are easily interpreted and compared. Data in an ungrouped frequency distribution can be presented graphically to give a better picture of the distribution. Some forms of graphs for ungrouped frequency distributions are the pie chart, bar graph, and line graph.

PIE CHART (Pie Graph)
Used to show how all the parts of something are related to the whole. It is represented by a circle divided into slices or sectors of various sizes that show each part's relationship to the whole and to the other parts of the circle.

BAR GRAPH
Uses rectangles (or bars) of uniform width to represent data, particularly nominal or categorical data. The height of each rectangle denotes the frequency of the variable.
Two types of bar graph:
1. Vertical bar graph - sometimes called a column chart. Used to show the changes in the numerical value of a variable over a period of time.
2. Horizontal bar graph

LINE GRAPH
Used to represent changes in data over a period of time. Data such as changes in temperature, income, population, and the like can be represented by a line graph. Data are represented by points that are joined by line segments. A line graph may be curved, broken, or straight.

Some forms of graphs for grouped frequency distributions are the histogram and the ogive.

HISTOGRAM
A bar graph that shows the frequency of data that occur within a certain interval.
In a histogram, the bars are always vertical, the width of each bar is based on the size of the interval it represents, and there are no gaps between the bars because their bases cover a continuous range of possible values.

OGIVE
Also called the cumulative frequency graph or cumulative frequency curve, it is a graph plotted from a cumulative frequency table.

Frequency Distribution Table
Frequency - the number of occurrences of a data value
Frequency table - a table that lists items and shows the number of times the items occur

Steps in constructing a frequency table (for ungrouped data)
Step 1: Make 3 columns. Arrange the data in order in the first column.
Step 2: Make a tally.
Step 3: Count the tallies, then write the frequencies.
Step 4: Total all the frequencies.

Steps in constructing a frequency distribution table (i.e., for given ungrouped data to be transformed into grouped data)
1. Determine the range. The range is the difference between the highest value H and the lowest value L in the set of data: $R = H - L$.
2. Determine the desired number of class intervals or categories. The ideal number of class intervals is somewhere between 5 and 15.
3. Determine the class width, or approximate size of the class interval, by dividing the range by the desired number of class intervals (CI):
$W = \frac{R}{CI}$
4. Write the class intervals starting with the lowest value in the data. Then add the class width to the starting point to get the upper limit of the interval; the next interval starts at the next value. Do this until the highest value is contained in the last interval.
5. Tally the corresponding number of scores in each interval. Then summarize the results by summing the tallies under the frequency column.

Example: The following are the test scores of students. Construct a suitable frequency table. Use 6 as the desired number of class intervals.

14 15 30 26 30 10 10 30 34 20 30 10 22 18 14 21 19 11
19 15 40 22 26 16 10 15 20 36 17 29 18 28 40 36 37

Solution:
1. Determine the range.
$R = H - L = 40 - 10 = 30$
2. Number of class intervals = 6
3. Determine the class width.
$W = \frac{R}{CI} = \frac{30}{6} = 5$
4. Write the class intervals starting with the lowest value in the data. Starting with 10 and with W = 5, the class intervals are: 10 - 15, 16 - 21, 22 - 27, 28 - 33, 34 - 39, 40 - 45. Each interval runs from its lower limit to a value 5 higher, and the next interval starts at the next whole number, so each class covers 6 whole-number scores.
5. Tally the corresponding number of scores in each interval. Then summarize the results by summing the tallies under the frequency column.

Scores     Tally          Frequency
10 - 15    IIIII IIIII    10
16 - 21    IIIII IIII      9
22 - 27    IIII            4
28 - 33    IIIII I         6
34 - 39    IIII            4
40 - 45    II              2
TOTAL                     35

Measures of Central Tendency of Ungrouped Data

MEAN or the Arithmetic Mean
The most commonly used measure of central position. It is used to describe a set of data where the measures cluster or concentrate at a point.
$\bar{x} = \frac{\sum x}{N}$
where $\sum x$ = the summation of $x$ (the sum of the measures) and $N$ = the number of values of $x$.

Example: The grades in Probability of 10 students are 87, 84, 85, 85, 86, 90, 79, 82, 78, 76. What is the average grade of the 10 students?
Solution:
$\bar{x} = \frac{\sum x}{N} = \frac{87 + 84 + 85 + 85 + 86 + 90 + 79 + 82 + 78 + 76}{10} = \frac{832}{10} = 83.2$
Hence, the average grade of the 10 students is 83.2.

MEDIAN
The middle value or term in a set of data arranged according to size or magnitude (either increasing or decreasing). For data with an even number of values, there are two middle values; add them and divide the sum by two.

Example 1: The library logbook shows that 58, 60, 54, 35, and 97 books, respectively, were borrowed from Monday to Friday last week. Find the median.
Solution: Arrange the data in increasing order.
35, 54, 58, 60, 97
Since the middle value is the median, the median is 58.
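
The mean and median computations above can be sketched in Python. This is only an illustrative sketch: the function names `mean` and `median` and the variable names `grades` and `books` are assumptions made for this example, and the data are the grades and library counts from the examples above.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data; average of the two middle values if N is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Data taken from the worked examples (variable names are illustrative).
grades = [87, 84, 85, 85, 86, 90, 79, 82, 78, 76]
books = [58, 60, 54, 35, 97]

print(mean(grades))   # 83.2
print(median(books))  # 58
```

Sorting inside `median` means the caller does not need to arrange the data first, which mirrors the "arrange the data in increasing order" step of the manual solution.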
Example 2: Andrea's scores in 10 quizzes during the first quarter are 8, 7, 6, 10, 9, 5, 9, 6, 10, and 7. Find the median.
Solution: Arrange the data in increasing order.
5, 6, 6, 7, 7, 8, 9, 9, 10, 10
Since the number of measures is even, the median is the average of the two middle scores.
$Md = \frac{7 + 8}{2} = 7.5$
Hence, the median of the set of scores is 7.5.

MODE
The measure or value that occurs most frequently in a set of data; the value with the greatest frequency.
To find the mode for a set of data:
1. Select the measure that appears most often in the set.
2. If two or more measures appear the same greatest number of times, then each of these values is a mode.
3. If every measure appears the same number of times, then the set of data has no mode.

MEASURES OF VARIABILITY (UNGROUPED DATA)
Measures of dispersion or variability refer to the spread of the values about the mean. These are important quantities used by statisticians in evaluation. A smaller dispersion of scores often indicates more consistency and more reliability.
The most commonly used measures of dispersion are the range, the average deviation, the standard deviation, and the variance.

1. Range - the difference between the largest value and the smallest value.
$R = H - L$
where R = range, H = highest value, L = lowest value.
Example: Test scores of 10, 8, 9, 7, 5, and 3.
$R = H - L = 10 - 3 = 7$

2. Average deviation - the dispersion of a set of data about the average.
$A.D. = \frac{\sum |x - \bar{x}|}{N}$
where A.D. is the average deviation; $x$ is the individual score; $\bar{x}$ is the mean; $N$ is the number of scores; and $|x - \bar{x}|$ is the absolute value of the deviation from the mean.

Procedure in computing the average deviation:
1) Find the mean for all the cases.
2) Find the absolute difference between each score and the mean.
3) Find the sum of the absolute differences.
4) Divide the sum by $N$, the number of scores.

Example: Find the average deviation of 12, 17, 13, 18, 18, 15, 14, 17, 11.
1) Find the mean $\bar{x}$.
$\bar{x} = \frac{\sum x}{N} = \frac{12 + 17 + 13 + 18 + 18 + 15 + 14 + 17 + 11}{9} = \frac{135}{9} = 15$
2) Find the absolute difference between each score and the mean, $|x - \bar{x}|$.
3) Find the sum of the absolute differences, $\sum |x - \bar{x}|$.

x      x̄      |x − x̄|
12     15     3
17     15     2
13     15     2
18     15     3
18     15     3
15     15     0
14     15     1
17     15     2
11     15     4
              ∑|x − x̄| = 20

4) Solve for the average deviation by dividing the result in step 3 by $N$.
$A.D. = \frac{\sum |x - \bar{x}|}{N} = \frac{20}{9} = 2.22$

3. Standard deviation - differentiates sets of scores with equal averages. The advantage of the standard deviation over the average (mean) deviation is that it has several applications in inferential statistics.
$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$
where SD is the standard deviation; $x$ is the individual score; $\bar{x}$ is the mean; and $N$ is the number of scores.

Example: Compute the standard deviation of the set of test scores: 39, 10, 24, 16, 19, 26, 29, 30, 5.
1) Find the mean.
2) Find the deviation from the mean, $(x - \bar{x})$.
3) Square the deviations, $(x - \bar{x})^2$.
4) Add all the squared deviations, $\sum (x - \bar{x})^2$.
5) Tabulate the results obtained.
6) Compute the standard deviation (SD) using the formula above.

Here $\bar{x} = \frac{198}{9} = 22$, so the table is:

x      x − x̄     (x − x̄)²
5      −17       289
10     −12       144
16     −6        36
19     −3        9
24     2         4
26     4         16
29     7         49
30     8         64
39     17        289
                 ∑(x − x̄)² = 900

$SD = \sqrt{\frac{900}{9}} = \sqrt{100} = 10$

4. Variance - the variance $\sigma^2$ of a set of data is equal to $1/N$ times the sum of the squared deviations from the mean.
$\sigma^2 = \frac{\sum (x - \bar{x})^2}{N}$
where $\sigma^2$ is the variance; $N$ is the total number of observations; $x$ is the raw score; and $\bar{x}$ is the mean of the data.
Variance is not only useful; it can be computed with ease, and it can also be broken into two or more component sums of squares that yield useful information.

Measures of Central Tendency of Grouped Data

Mean: $\bar{x} = \frac{\sum (fx)}{\sum f}$, where $x$ is the class mark (midpoint) and $f$ is the class frequency.
Median: $lb_{mc} + \left[ \frac{\frac{\sum f}{2} - {<}cf}{f_{mc}} \right] i$
Mode: $lb_{mo} + \left[ \frac{D_1}{D_1 + D_2} \right] i$
where
i = interval (class size)
lb = lower boundary → LL − 0.5 (LL is the lower limit of the class)
<cf = cumulative frequency of the class just below the median class
f_mc = frequency of the median class
mc = median class, the class that contains the $\frac{\sum f}{2}$th value
mo = modal class, the class with the highest frequency
D_1 = difference between the frequency of the modal class and the frequency of the class just below it
D_2 = difference between the frequency of the modal class and the frequency of the class just above it

Example (i = 5):

Class      f      x (class mark)    fx      lb      <cf
41-45      1      43                43      40.5    40
36-40      8      38                304     35.5    39
31-35      8      33                264     30.5    31
26-30      14     28                392     25.5    23
21-25      7      23                161     20.5    9
16-20      2      18                36      15.5    2
           ∑f = 40                  ∑fx = 1,200

Mean: $\bar{x} = \frac{1,200}{40} = 30$

Median: the median class contains the $\frac{\sum f}{2} = \frac{40}{2} = 20$th value, which is the class 26-30.
$Median = 25.5 + \left[ \frac{20 - 9}{14} \right] 5 = 25.5 + \left[ \frac{11}{14} \right] 5 = 29.43$

Mode: the modal class is 26-30 (f = 14), with $D_1 = 14 - 7 = 7$ and $D_2 = 14 - 8 = 6$.
$Mode = 25.5 + \left[ \frac{7}{7 + 6} \right] 5 = 25.5 + \left[ \frac{7}{13} \right] 5 = 28.19$
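
The grouped-data computations can be checked with a short Python sketch. It is an illustration under stated assumptions, not a definitive implementation: the names `classes`, `grouped_mean`, `grouped_median`, and `grouped_mode` are made up for this example, the classes are entered from lowest to highest, and the mode follows the usual convention that D1 compares the modal class with the class just below it and D2 with the class just above it.

```python
# Each entry is (lower limit, upper limit, frequency), listed from the lowest class up.
classes = [(16, 20, 2), (21, 25, 7), (26, 30, 14), (31, 35, 8), (36, 40, 8), (41, 45, 1)]


def class_size(classes):
    """Class size i, assuming integer limits and equal-sized classes."""
    lower, upper, _ = classes[0]
    return upper - lower + 1


def grouped_mean(classes):
    """Sum of f * class mark divided by the sum of f."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f


def grouped_median(classes):
    """lb + ((sum f / 2 - <cf) / f_mc) * i, using the first class whose cf reaches sum f / 2."""
    half = sum(f for _, _, f in classes) / 2
    cf_below = 0  # cumulative frequency of the classes below the current one
    for lower, _, f in classes:
        if cf_below + f >= half:
            return (lower - 0.5) + (half - cf_below) / f * class_size(classes)
        cf_below += f


def grouped_mode(classes):
    """lb + (D1 / (D1 + D2)) * i for the class with the highest frequency."""
    k = max(range(len(classes)), key=lambda j: classes[j][2])
    lower, _, f = classes[k]
    f_below = classes[k - 1][2] if k > 0 else 0
    f_above = classes[k + 1][2] if k < len(classes) - 1 else 0
    d1, d2 = f - f_below, f - f_above
    return (lower - 0.5) + d1 / (d1 + d2) * class_size(classes)


print(grouped_mean(classes))              # 30.0
print(round(grouped_median(classes), 2))  # 29.43
print(round(grouped_mode(classes), 2))    # 28.19
```

Listing the classes from lowest to highest keeps the running cumulative frequency simple: at each step `cf_below` is exactly the "<cf" value used in the median formula.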