Probability and Statistics

Course Requirements

1. Quizzes – 25%
2. First Long Exam – 25%
3. Second Long Exam – 25%
4. Third Long Exam – 25%
Total – 100%
Passing – 60%

Statistics

A branch of mathematics that deals with the collection, organization, and analysis of numerical data, and with such problems as experiment design and decision making.

Three important features of statistics:
1. Data gathering
2. Data analysis
3. Decision making

Definition of terms

1. Raw data: data collected in original form.
2. Variable: a characteristic or attribute that can assume different values.
3. Population: all subjects possessing a common characteristic that is being studied.
4. Sample: a subgroup or subset of a population.
5. Parameter: a characteristic or measure obtained from a population.
6. Qualitative variables: variables that assume non-numerical values.
7. Quantitative variables: variables that assume numerical values.
8. Discrete variables: variables that assume a finite or countable number of possible values, usually obtained by counting.
9. Continuous variables: variables that can assume any value within an interval (infinitely many possible values), usually obtained by measurement.

Guidelines for designing experiments

Everyone involved in the experiment must have a clear idea of what is to be studied, how the data are to be collected, and at least a qualitative understanding of how these data are to be analyzed.

1. Statement (recognition) of the problem: develop all the ideas about the objectives of the experiment.
2. Choice of factors and levels: choose the factors to be varied in the experiment, choose the ranges over which these factors will be varied, and identify the specific levels at which runs will be made.
3. Selection of the response variable: the experimenter should be certain that this variable really provides useful information about the process under study.
4. Choice of experimental design: consider the sample size (number of replicates or trials), select a suitable run order for the experimental trials, and determine whether blocking or other randomization restrictions are involved.
5. Performing the experiment: monitor the process carefully to ensure that everything is being done according to plan.
6. Data analysis: analyze the data collected during the experiment by statistical methods.
7. Conclusions: make decisions based on the statistical results.

Methods of Sampling

1. Random sampling: sampling in which the data are collected using chance methods or random numbers.
2. Systematic sampling: sampling in which the data are collected by selecting every kth object.
3. Stratified sampling: sampling in which the population is divided into groups (strata) according to some characteristic. Each stratum is then sampled either randomly or systematically.
4. Cluster sampling: sampling in which the population is divided into groups (usually geographically). Some of these groups are randomly selected, and then all of the elements in those groups are selected.

Methods of Summarizing/Characterizing Data

1. Tabular Methods
   a. Frequency Distribution
   b. Cumulative Frequency
   c. Stem and Leaf Table
2. Graphical Methods
   a. Frequency Histogram
   b. Frequency Polygon
   c. Ogive
   d. Pie chart
3. Numerical Methods
   a. Measures of Central Tendencies: Mean/Average, Median, Mode
   b. Measures of Dispersion: Range, Variance, Standard Deviation
   c. Measures of Shape: Skewness, Kurtosis
   d. Measures of Data Locations: Percentiles, Deciles, Quartiles

Tabular Methods

1. Frequency Distribution
The organization of raw data in tabular form with classes and frequencies.

Steps in Constructing a Frequency Distribution Table:

1. Determine the number of class intervals, k, needed to summarize the data, based on the number of samples.
2. Find the range of observations:
   $\text{Range} = \text{Maximum value} - \text{Minimum value}$
3. Determine the width of the class intervals:
   $\text{Class width} = \dfrac{\text{Range}}{\text{No. of class intervals}}$
4. Form the frequency table with the columns Class Interval, Class Boundaries, Class Mark (xi), Frequency (fi), and Relative Frequency (%).

Class interval: the limits of a class in a grouped frequency distribution. The limits have the same precision as the raw data, so they could actually appear in the data. The first interval begins with the lowest value.

Class boundary: separates one class in a grouped frequency distribution from the next. It has one more decimal place than the raw data and therefore does not appear in the data.
Class mark (midpoint), xi: the number in the middle of the class.

Frequency, fi: the number of times a certain value or class of values occurs.

Relative frequency, %: the frequency divided by the total number of data, expressed as a percentage. This gives the percent of values falling in that class.

Illustration: the nicotine contents, in milligrams, of 40 cigarettes of a certain brand were recorded as follows:

1.09 1.74 1.58 2.11 1.64 1.79 1.37 1.75 1.92 1.47
2.03 1.86 0.72 2.46 1.93 1.63 2.31 1.97 1.70 1.90
1.69 1.88 1.40 2.37 1.79 0.85 2.17 1.68 1.85 2.08
1.64 1.75 2.28 1.24 2.55 1.51 1.82 1.67 2.09 1.69

Class Interval   Class Boundaries   Class Mark, xi   Frequency, fi   Relative Freq'y, %
0.72 – 1.02      0.715 – 1.025      0.87              2               5.00
1.03 – 1.33      1.025 – 1.335      1.18              2               5.00
1.34 – 1.64      1.335 – 1.645      1.49              8              20.00
1.65 – 1.95      1.645 – 1.955      1.80             17              42.50
1.96 – 2.26      1.955 – 2.265      2.11              6              15.00
2.27 – 2.57      2.265 – 2.575      2.42              5              12.50

Cumulative Frequency Distribution Table

Cumulative frequency, cfi: the running total of the frequencies, that is, the number of observations in the sample whose values are less than or equal to the upper boundary of the class interval.

Relative cumulative frequency: (cfi / total number of samples) × 100, the percent of the values that are less than the upper boundary.

Class Interval   Class Boundaries   Class Mark, xi   Freq'y, fi   Cumulative Frequency, cfi   Relative Cum. Freq'y, %
0.72 – 1.02      0.715 – 1.025      0.87              2            2                            5.00
1.03 – 1.33      1.025 – 1.335      1.18              2            4                           10.00
1.34 – 1.64      1.335 – 1.645      1.49              8           12                           30.00
1.65 – 1.95      1.645 – 1.955      1.80             17           29                           72.50
1.96 – 2.26      1.955 – 2.265      2.11              6           35                           87.50
2.27 – 2.57      2.265 – 2.575      2.42              5           40                          100.00

Graphical Methods

Frequency Histogram
A graph that displays the data by using vertical bars of various heights to represent frequencies. The horizontal axis can show class intervals, class boundaries, or class marks.

[Figure: frequency histogram of the nicotine data, frequency against class mark]

Frequency Polygon
A line graph of frequency against class mark.

[Figure: frequency polygon of the nicotine data, frequency against class mark]

Ogive
A frequency polygon of relative cumulative frequency against upper class boundaries.

[Figure: ogive of the nicotine data, relative cumulative frequency (%) against upper class boundary]

Pie chart
The angle of each slice is proportional to the relative frequency of the class.

[Figure: pie chart of the nicotine data with slices of 5, 5, 20, 42.5, 15, and 12.5 percent]

Numerical Methods

Measures of Central Tendencies

1. Mean / Average
The sum of the products of each class mark and its corresponding frequency, divided by the total number of samples, n:
$\bar{x} = \dfrac{\sum_i f_i x_i}{n}$

2. Median
The value that divides the samples into two equal halves when the samples are arranged from lowest to highest. For grouped data it is estimated by interpolating within the median class:
$\text{Median} = L + \left(\dfrac{n/2 - F}{f_m}\right) c$
where L is the lower class boundary of the median class, F is the total frequency of all class intervals before the median class, f_m is the frequency of the median class, and c is the class width.

3. Mode
The most frequent number. For grouped data it is estimated by interpolating within the modal class:
$\text{Mode} = L + \left(\dfrac{d_1}{d_1 + d_2}\right) c$
where L is the lower class boundary of the modal class, d_1 is the frequency difference between the modal class and the preceding class, d_2 is the frequency difference between the modal class and the succeeding class, and c is the class width.

Measures of Variability / Dispersion

1. Range
Measures how closely the samples are clustered. It is the difference between the highest and the lowest values of the raw data:
$\text{Range} = \text{Maximum value} - \text{Minimum value}$

2. Variance
Measures how the samples are dispersed.

3. Standard deviation, s
The positive square root of the variance.

Coefficient of variation, Cv
The standard deviation expressed as a percentage of the mean:
$C_v = \dfrac{s}{\bar{x}} \times 100\%$
If Cv < 10, the data are considered clustered; otherwise the data are dispersed.

Measures of Shape

1. Skewness
A measure of the symmetry of the distribution of the sample.
If Sk < 0: the distribution is skewed to the left (the left tail is longer than the right tail).
If Sk = 0: the distribution is symmetric with respect to the mean, with right and left tails of equal length, as is the case for the normal (Gaussian) distribution.
If Sk > 0: the distribution is skewed to the right (the right tail is longer than the left tail).

2. Kurtosis
A measure of the height (peakedness) of the distribution, taken here relative to the normal distribution.
If kurtosis < 0: the distribution has a low peak or is almost flat.
If kurtosis = 0: the distribution has the same peak height as the normal distribution.
If kurtosis > 0: the distribution has a high peak.

Measures of Data Location

1. Quartiles: Q1, Q2, Q3
The values below which 25%, 50%, and 75% of the data fall, respectively.
2. Deciles: D1, D2, D3, … D9
The values below which 10%, 20%, 30%, … 90% of the data fall, respectively.
3. Percentiles: P1, P2, P3, … P99
The values below which 1%, 2%, 3%, … 99% of the data fall, respectively.

Quiz

The diameters of 36 rivet heads, in hundredths of an inch, are given below:

6.72 6.66 6.66 6.72 6.77 6.64 6.62 6.74 6.82
6.76 6.72 6.81 6.70 6.73 6.76 6.79 6.78 6.80
6.70 6.78 6.70 6.72 6.78 6.66 6.62 6.76 6.76
6.76 6.75 6.76 6.67 6.76 6.66 6.68 6.70 6.72

1. Construct a cumulative frequency table.
2. Determine the mean, median, and mode.
3. Determine the variance, standard deviation, and coefficient of variation.
4. Determine the skewness and kurtosis of the distribution and draw a conclusion about the shape of the distribution.
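
Worked sketch (nicotine illustration, not the quiz data)

The tabular and numerical methods in these notes can be sketched in Python using the nicotine illustration. This is a minimal sketch, not the course's official solution. The values of k, the starting value, and the class width are read off the illustration's table. The grouped median and mode use the usual interpolation within the median or modal class, built from the quantities the notes name (lower class boundary, preceding cumulative frequency, frequency differences). The variance divides by n − 1, which is an assumption, since the notes do not say whether n or n − 1 is intended. All variable names are illustrative.

```python
import math
from itertools import accumulate

# Nicotine contents (mg) of 40 cigarettes, copied from the illustration.
data = [
    1.09, 1.74, 1.58, 2.11, 1.64, 1.79, 1.37, 1.75, 1.92, 1.47,
    2.03, 1.86, 0.72, 2.46, 1.93, 1.63, 2.31, 1.97, 1.70, 1.90,
    1.69, 1.88, 1.40, 2.37, 1.79, 0.85, 2.17, 1.68, 1.85, 2.08,
    1.64, 1.75, 2.28, 1.24, 2.55, 1.51, 1.82, 1.67, 2.09, 1.69,
]
n = len(data)

k = 6              # number of class intervals used in the illustration
width = 0.31       # class width read off the table (0.72, 1.03, ... are 0.31 apart)
half_unit = 0.005  # half the data precision (0.01); shifts limits to boundaries

# Class boundaries: start half a unit below the lowest value.
start = min(data) - half_unit
boundaries = [(start + i * width, start + (i + 1) * width) for i in range(k)]

# Class marks, frequencies, relative and cumulative frequencies.
marks = [(lo + hi) / 2 for lo, hi in boundaries]
freqs = [sum(1 for x in data if lo <= x < hi) for lo, hi in boundaries]
rel_freqs = [100 * f / n for f in freqs]
cum_freqs = list(accumulate(freqs))

# Grouped mean: sum of (class mark * frequency) divided by n.
mean = sum(f * x for f, x in zip(freqs, marks)) / n

# Grouped median: interpolate inside the first class whose cumulative
# frequency reaches n / 2.
m = next(i for i, cf in enumerate(cum_freqs) if cf >= n / 2)
before = cum_freqs[m - 1] if m > 0 else 0
median = boundaries[m][0] + (n / 2 - before) / freqs[m] * width

# Grouped mode: interpolate inside the class with the highest frequency.
j = freqs.index(max(freqs))
d1 = freqs[j] - (freqs[j - 1] if j > 0 else 0)
d2 = freqs[j] - (freqs[j + 1] if j < k - 1 else 0)
mode = boundaries[j][0] + d1 / (d1 + d2) * width

# Grouped variance (ASSUMPTION: n - 1 divisor), standard deviation, and Cv in %.
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / (n - 1)
std_dev = math.sqrt(variance)
cv = 100 * std_dev / mean

for (lo, hi), x, f, rf, cf in zip(boundaries, marks, freqs, rel_freqs, cum_freqs):
    print(f"{lo:.3f}-{hi:.3f}  {x:.2f}  {f:2d}  {rf:6.2f}  {cf:2d}")
print(f"mean={mean:.4f}  median={median:.4f}  mode={mode:.4f}")
print(f"variance={variance:.4f}  s={std_dev:.4f}  Cv={cv:.2f}%")
```

The frequencies and cumulative frequencies printed by this sketch should match the tables in the illustration (2, 2, 8, 17, 6, 5 and 2, 4, 12, 29, 35, 40). To reuse it on another data set, k, width, and half_unit must be chosen for that data set first.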