Statistical Methods

Quiz every class: it takes 10 minutes at the beginning of every class.
Use the ppt folder without the final answers.
Mod is used for checking the answer.

Section 1.1

What is statistics?
- Statistics is the study of how to collect, organize, analyze, and interpret numerical information from data.
- What is the best way to interpret the numbers?

Individuals are the people or objects included in the study.
A variable is a characteristic of the individual to be measured or observed.
- A variable describes the individual: weight, a measurement, color.
- A quantitative variable has a value or numerical measurement for which operations such as addition or averaging make sense (weight, a measurement, any number).
- A qualitative variable describes an individual by placing the individual into a category or group, such as male or female (color, name, anything non-numeric). It is also known as a categorical variable.

In population data, the data are from every individual of interest.
In sample data, the data are from only some of the individuals of interest.
A population parameter is a numerical measure that describes an aspect of a population (mean, variance, standard deviation).
A sample statistic is a numerical measure that describes an aspect of a sample.

Levels of measurement: nominal, ordinal, interval, ratio
- Nominal: names, labels, or categories, such as nationality, eye color, zip code, or major. There is no "best" category and no meaningful order.
- Ordinal: data that can be arranged in order, such as rankings, letter grades (A, B, C), or a rating scale (poor, good, excellent).
- Interval: data that can be arranged in order and that have precise, meaningful differences between values, but no true zero (SAT score, IQ, temperature). The zero is not meaningful because zero does not mean "nothing."
Even an IQ score of zero would not mean zero intelligence; it is just a score.
- Ratio: interval plus a true zero (height, weight, time, salary, age). Zero really means zero, that is, nothing.

Descriptive statistics involves methods of organizing, picturing, and summarizing information from samples or populations.
Inferential statistics involves methods of using information from a sample to draw conclusions regarding the population. It goes one step further than descriptive statistics, because you take the information and draw a conclusion about the population.

Section 1.2 Random Samples

A simple random sample of n measurements from a population is a subset of the population selected so that every sample of size n has an equal chance of being selected. In particular, every individual has an equal chance of being selected for the sample.

- Random sampling: every individual in the entire population gets an equal chance of being selected.
- Stratified sampling: divide the entire population into subgroups called strata, based on characteristics such as age, income, or education level. Members of a stratum share similar characteristics. Then draw a random sample from each stratum.
- Systematic sampling: number all members of the population sequentially. Then, from a random starting point, include every kth member of the population in the sample (for example, every 106th person).
- Cluster sampling: divide the entire population into pre-existing segments or clusters, which are often geographic. Make a random selection of clusters, and include every member of each selected cluster in the sample. You randomly choose whole clusters, not individual people.
- Multistage sampling: use a variety of sampling methods to create successively smaller groups at each stage.
The final sample consists of clusters. (You start with clusters, so you are using more than one method to get the sample.)
- Convenience sampling: create a sample by using data from population members that are readily available (for example, reusing a sample from previous research or from someone else's study).

There will always be some error when taking samples from a population, because individuals differ.
- Sampling error is the difference between a measurement from the sample and the corresponding measurement from the population. It arises because the sample does not perfectly represent the population.
- Nonsampling error results from poor sample design, sloppy data collection, faulty measurement instruments, bias in questionnaires, and so on.

Section 1.3 Introduction to Experimental Design

Basic guidelines for planning a statistical study
1. Identify the individuals or objects of interest.
2. Specify the variables as well as the protocols for taking measurements or making observations.
3. Determine whether you will use an entire population or a representative sample. If using a sample, decide on a viable sampling method.
4. In your data collection plan, address issues of ethics, subject confidentiality, and privacy. If you are collecting data at a business, store, college, or other institution, be sure to be courteous and to obtain permission as necessary.
5. Collect the data.
6. Use appropriate descriptive statistics methods and make decisions using appropriate inferential statistics methods.
7. Finally, note any concerns you might have about your data collection methods and list any recommendations for future studies.

Census: measurements or observations from the entire population are used.
If we use data from only part of the population of interest, we have a sample.
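The sampling methods from Section 1.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population numbered 1 to 100, the two strata, the four clusters, and all function names are made up for the example and do not come from the course material.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely to be chosen."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Pick a random start among the first k members, then take every kth member."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample from each stratum (subgroup)."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical population numbered 1..100

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)

# Hypothetical strata (for example, two age groups)
strata = {"under_30": list(range(1, 41)), "30_and_over": list(range(41, 101))}
stratified = stratified_sample(strata, 5, rng)

# Hypothetical pre-existing geographic clusters
clusters = {"north": [1, 2, 3], "south": [4, 5], "east": [6, 7, 8, 9], "west": [10]}
clustered = cluster_sample(clusters, 2, rng)

print(srs, systematic, stratified, clustered, sep="\n")
```

Note the difference between the last two functions: stratified sampling takes a few individuals from every group, while cluster sampling takes every individual from a few randomly chosen groups.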
Sample: measurements or observations from part of the population are used.

Gathering data for a statistical study
- Observational study: observations and measurements of individuals are conducted in a way that does not change the response or the variable being measured (you do nothing to change the result).
- Experiment: a treatment is deliberately imposed on the individuals in order to observe a possible change in the response or variable being measured (you purposely do something to trigger a response).
- Control group: receives no treatment; everything proceeds as normal.
- Treatment group (experimental group): receives the treatment, for example the medication.

Chapter 2 (9/14/2020) Organizing Data

Section 2.1 Frequency Distributions, Histograms, and Related Topics
4 assignments due

Frequency table
- The first step in building a table is to decide how many classes (groups) to use.
  o Five to 15 classes are usually used.
  o With fewer than 5 classes, you lose too much information and the table is too general.
  o With more than 15 classes, the data are not really summarized; you might as well list all of the data.

How to figure out the class width (how many values go in each class):

Class width = (largest data value - smallest data value) / (desired number of classes)

Increase the result to the next whole number.

The lower class limit is the lowest data value that can fit in a class.
The upper class limit is the highest data value that can fit in a class.
The class midpoint is found by adding the two class limits and dividing by two.
To find the relative frequency of a particular class, divide the class frequency f by the total of all frequencies n (the sample size).
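The class-width rule and the relative-frequency calculation can be tried out with a short Python sketch. It rests on two assumptions that are not spelled out in the notes: the data are whole numbers (so each upper class limit is the next lower limit minus 1), and the computed width is bumped to the next whole number even when the division already comes out whole. The data values and function names are made up for illustration.

```python
import math

def class_width(data, num_classes):
    """(largest - smallest) / number of classes, increased to the next whole number."""
    raw = (max(data) - min(data)) / num_classes
    return math.floor(raw) + 1  # assumption: bump up even if raw is already whole

def frequency_table(data, num_classes):
    """Build class limits, midpoints, frequencies, and relative frequencies."""
    width = class_width(data, num_classes)
    lower = min(data)
    n = len(data)
    rows = []
    for _ in range(num_classes):
        upper = lower + width - 1  # upper class limit (whole-number data assumed)
        f = sum(lower <= x <= upper for x in data)
        rows.append({
            "limits": (lower, upper),
            "midpoint": (lower + upper) / 2,
            "f": f,
            "rel_f": f / n,
        })
        lower += width
    return rows

# Made-up whole-number data, 15 values
data = [1, 3, 5, 8, 12, 13, 15, 20, 22, 25, 30, 31, 33, 40, 47]
table = frequency_table(data, 5)

for row in table:
    print(row["limits"], row["midpoint"], row["f"], round(row["rel_f"], 3))
```

Here the raw width is (47 - 1) / 5 = 9.2, which is increased to 10, giving the classes 1-10, 11-20, 21-30, 31-40, and 41-50. The frequencies add up to 15 and the relative frequencies add up to 1.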
The frequencies must add up to the number of data points.
The relative frequency is just the class frequency divided by the number of data points (the total frequency), for example 14/60 or 21/60.

Histograms and Relative-Frequency Histograms
- For histograms, the height of the bar is the class frequency, whereas for relative-frequency histograms, the height of the bar is the relative frequency of that class.
- Leave a gap at the start of the horizontal axis, because the first bar usually does not start at 0; the bars start at the first class boundary.
- Always use class boundaries for histograms. The boundaries fill the gaps between the class limits, so the bars touch.

Histogram shapes
- Mound-shaped symmetrical: looks like a mountain.
- Uniform or rectangular: literally looks like a rectangle.
- Skewed: looks like a Mario staircase.
  o Skewed left: the tail trails off to the left, so the left side is low and most of the data are on the right.
  o Skewed right: the tail trails off to the right, so the right side is low and most of the data are on the left.
- Bimodal: two humps, like a camel's back.

Cumulative Frequency Tables and Ogives
- The cumulative frequency for a class is the sum of the frequencies for that class and all previous classes. You just add the current frequency to the previous cumulative frequency.
- When building the ogive, plot each cumulative frequency against the upper class boundary of its class.

Section 2.2 Bar Graphs, Circle Graphs, and Time-Series Graphs
- The issue with histograms is that the data must be quantitative.

Stem-and-leaf display

Averages and Variation

3.1 Measures of Central Tendency: Mode, Median, and Mean

Measures of Variation
- To measure the spread of the data we use the range, which is the difference between the largest and smallest values.

Variance and Standard Deviation
- The square root of the variance is the standard deviation.
- Mean = expected value. For a sample we write x-bar; for a population we use μ (mu).

Finding the standard deviation
To calculate the standard deviation of a set of numbers:
1. Work out the mean (the simple average of the numbers).
2. For each number, subtract the mean and square the result.
3. Work out the mean of those squared differences. (This is the variance.)
4. Take the square root of that, and we are done.

The only difference between the sample and population formulas is the denominator: the sample formula divides the sum of squared differences by n - 1, while the population formula divides by N (there is no n - 1). Step 3 as written gives the population version.

Coefficient of Variation (CV)
CV = (standard deviation / mean) * 100, expressed as a percent.
The higher the percent, the more variable the data. The lower the percent, the more concentrated the data are around the mean.

Chebyshev's Theorem
At least 1 - 1/k^2 of the data lie within k standard deviations of the mean, where k is the number of standard deviations from the mean (k > 1).

Chapter 3.3 Percentiles and Box-and-Whisker Plots
For quartiles, the order is: lowest value, Q1, Q2 (the median, or 50th percentile), Q3, highest value.
- Find the median of the data to get Q2.
- Then find the median of the values between the lowest value and Q2 to get Q1, and the median of the values between Q2 and the highest value to get Q3.
- The interquartile range is IQR = Q3 - Q1; it gives the spread of the middle 50 percent of the data.

4.1 (09/28/2020) What is probability?
Probability is all about the likelihood of an event.
- A probability cannot be negative; it is always between 0 and 1.
- P(A), read "P of A," denotes the probability of event A.
- If P(A) = 1, event A is certain to occur.
- If P(A) = 0, event A is certain not to occur.

Ways to assign probabilities
- Relative frequency: probability of event = f/n, where f is the frequency of the event's occurrence in a sample of n observations.
- Equally likely outcomes: probability of event = (number of outcomes favorable to the event) / (total number of outcomes).
- Intuition-based probability.

Law of large numbers: as the number of observations n grows, the relative frequency f/n gets closer and closer to the true probability.

A statistical experiment or statistical observation can be thought of as any random activity that results in a definite outcome.
An event is a collection of one or more outcomes of a statistical experiment or observation.
A simple event is one particular outcome of a statistical experiment.
The set of all simple events constitutes the sample space of an experiment.
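Looking back at the Chapter 3 formulas, the four standard deviation steps, the coefficient of variation, the Chebyshev bound, and the quartile procedure can be sketched in Python as follows. The data set and function names are made up for illustration, and the quartile function assumes the convention described in the notes (median of the lower half and median of the upper half, leaving out the middle value when the count is odd); other textbooks and software use slightly different quartile rules.

```python
import math
import statistics

def std_dev(values, sample=True):
    """Standard deviation following the four steps in the notes."""
    mean = sum(values) / len(values)                       # step 1: the mean
    squared_diffs = [(x - mean) ** 2 for x in values]      # step 2: squared differences
    divisor = len(values) - 1 if sample else len(values)   # n - 1 for a sample, N for a population
    variance = sum(squared_diffs) / divisor                # step 3: the variance
    return math.sqrt(variance)                             # step 4: square root

def coefficient_of_variation(values, sample=True):
    """Standard deviation divided by the mean, as a percent."""
    mean = sum(values) / len(values)
    return std_dev(values, sample) / mean * 100

def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k ** 2

def quartiles(values):
    """Q1, Q2, Q3 using the median-of-halves method (needs at least 2 values)."""
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[-half:]  # middle value is left out when n is odd
    return statistics.median(lower), statistics.median(data), statistics.median(upper)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with mean 5

population_sd = std_dev(data, sample=False)
sample_sd = std_dev(data, sample=True)
cv = coefficient_of_variation(data, sample=False)
q1, q2, q3 = quartiles(data)
iqr = q3 - q1

print(population_sd, sample_sd, cv, chebyshev_bound(2), (q1, q2, q3), iqr)
```

For this data set the squared differences add up to 32, so the population standard deviation is sqrt(32/8) = 2 and the sample standard deviation is sqrt(32/7), which is slightly larger. With k = 2, Chebyshev's theorem guarantees that at least 75 percent of the data lie within two standard deviations of the mean.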
Complement rule (A^c is the complement of A, the event that A does not occur)
P(A) + P(A^c) = 1
P(A^c) = 1 - P(A)

Section 4 Conditional Probability and Multiplication Rules

Independent events
- Two events are independent if the occurrence or nonoccurrence of one event does not change the probability that the other event will occur.
- Multiplication rule for independent events: P(A and B) = P(A) * P(B)

Conditional probability
- Conditional probability is needed when the events are dependent.
- The notation P(A, given B), also written P(A | B), denotes the probability that event A will occur given that event B has occurred.
  o P(A | B): B must happen first.
  o P(B | A): A must happen first.
- Multiplication rule for dependent events:
  P(A and B) = P(A) * P(B | A)
  P(A and B) = P(B) * P(A | B)
- Solving for the conditional probability: P(A | B) = P(A and B) / P(B)

Section 4.3 Trees and Counting Techniques

Factorials
0! = 1
1! = 1
5! = 5 * 4 * 3 * 2 * 1

Counting rule for permutations
- Order matters.
- The number of ways to arrange n objects taken r at a time is given by the permutation formula: Pn,r = n!/(n-r)!
- The permutations rule is really another version of the multiplication rule.

Counting rule for combinations
- Order does not matter.
- The number of combinations of n objects taken r at a time is Cn,r = n!/(r!(n-r)!)

Chapter 5 The Binomial Probability Distribution and Related Topics

Section 1 Introduction to Random Variables and Probability Distributions
- A quantitative variable x is a random variable if the value that x takes on in a given experiment or observation is a chance or random outcome.
- A discrete random variable can take on only a finite number of values or a countable number of values, typically whole numbers such as 25, 26, 27.
- A continuous random variable can take on any of the countless number of values in a line interval.

Probability Distribution of a Discrete Random Variable
- A random variable has a probability distribution whether it is discrete or continuous.
- A probability distribution is an assignment of probabilities to each distinct value of a discrete random variable or to each interval of values of a continuous random variable.

Features of the probability distribution of a discrete random variable
1. The probability distribution has a probability assigned to each distinct value of the random variable.
2. The sum of all assigned probabilities must be 1.

Binomial Probabilities
P(at least one) = 1 - P(none)
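The probability rules from Chapter 4 and the "at least one" rule can be checked with exact fractions in Python. The card, coin, and die scenarios below are made-up examples, not taken from the course; `math.perm` and `math.comb` require Python 3.8 or later.

```python
import math
from fractions import Fraction

# Dependent events (hypothetical example): draw 2 cards without replacement.
p_first_ace = Fraction(4, 52)                # P(A): first card is an ace
p_second_ace_given_first = Fraction(3, 51)   # P(B | A): second is an ace, given the first was
p_both_aces = p_first_ace * p_second_ace_given_first  # P(A and B) = P(A) * P(B | A)

# Conditional probability recovered from the joint probability:
# P(B | A) = P(A and B) / P(A)
p_recovered = p_both_aces / p_first_ace

# Independent events (hypothetical example): two fair coin flips.
p_two_heads = Fraction(1, 2) * Fraction(1, 2)  # P(A and B) = P(A) * P(B)

# Complement rule: P(A^c) = 1 - P(A)
p_not_ace = 1 - p_first_ace

# Counting rules for n = 5 objects taken r = 3 at a time
permutations = math.perm(5, 3)   # order matters: 5!/(5-3)!
combinations = math.comb(5, 3)   # order does not matter: 5!/(3!(5-3)!)

def p_at_least_one(p, n):
    """P(at least one success in n independent trials) = 1 - P(none)."""
    return 1 - (1 - p) ** n

# Hypothetical example: at least one six in three rolls of a fair die
p_six = p_at_least_one(Fraction(1, 6), 3)

print(p_both_aces, p_recovered, p_two_heads, p_not_ace)
print(permutations, combinations, p_six)
```

Using `Fraction` keeps every result exact, which makes it easy to compare against hand calculations: the chance of two aces is 1/221, and the chance of at least one six in three rolls is 1 - (5/6)^3 = 91/216.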