Statistics and Probability: 5 sessions
Anna Bargagliotti
The University of Memphis

• Fundamentals of Statistics (purpose and vocab)
• Probability Problems
• Regression, correlation, causation
• Random Variables II
• Counting Problems

Fundamentals of Statistics (purpose and vocab)

What is statistics?
Definition 1: Statistics is a collection of procedures and principles for gaining and analyzing information in order to help people make decisions when faced with uncertainty. (Seeing Through Statistics, by Jennifer Utts)
Definition 2: Statistics is the science of collecting, organizing, and interpreting data OR Statistics are the data that describe or summarize something. (Our text)
Definition 3: Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data. (Wikipedia)
Definition 4: A set of concepts, rules, and procedures that help us organize numerical information, understand, and make decisions. (http://bobhall.tamu.edu/FiniteMath/Module8/Introduction.html)

Experimental Design
• Observational (Descriptive)
• Experimental
• What do we need to answer before we do a study?
  – What is your population?
  – Once the population for your experiment has been determined, you select a representative sample.

Types of Information
• What types of variables exist?
  – Categorical
    • Ordinal
    • Nominal
  – Quantitative
    • Discrete
    • Continuous

What can be used to represent information graphically?
• Bar graph
• Histogram
• Box plot
• Scatterplot
• Pie chart
• Stem and Leaf
• Any others?

What types of descriptive questions do statisticians ask?
• Where is the center?
• What is the spread?
• What is the shape?

What types of research questions do statisticians ask?
• What is the relationship between or among variables?
• What types of relationships are there between or among variables?
• What is the cause of an outcome?
• Are samples, procedures, groups, trials, or experiments different or the same?

Examples and Problems
– Researchers chose two groups of 100 high school students. One group receives $5 for every week they have perfect attendance; the other group gets nothing.
• What research questions might the researchers be trying to answer?
• Is there a control group?
• What is the treatment?
• Is this observational or experimental?

Categorical? Quantitative? Ordinal? Nominal? Discrete? Continuous?
• A person’s height
• A person’s degree
• A person’s race
• A person’s SAT score
• A person’s shoe size
• Amount of time it takes to assemble a puzzle
• The state in which a person lives
• Rating of a newly elected politician
• Population in the city of Memphis

Represent the information in one graph in the clearest manner
• The polls for the Democratic party in the state of Tennessee are showing: Clinton has 33% support, Edwards 40%, and Obama 34%. In the state of Florida: Clinton has 43%, Edwards 35%, and Obama 34%.

Represent the information in one graph in the clearest manner
• These data represent the percentage of people in each of the 50 states that supports universal health care.
57, 60, 50, 52, 42, 51, 39, 42, 49, 56,
38, 70, 43, 58, 48, 59, 57, 43, 40, 44,
33, 47, 46, 57, 51, 59, 63, 50, 48, 41,
47, 36, 56, 61, 50, 61, 49, 47, 32, 57,
61, 51, 60, 67, 41, 53, 45, 52, 48, 70

Determine if the following statements make sense
• Your bar graph must be wrong because your bars are wider than those shown in the solutions manual.
• Your pie chart must be wrong because when I added the percentages on your wedges, they totaled 124%.
• I was unable to make a bar chart, because the data categories were qualitative rather than quantitative.
• I rearranged the bars on my histogram so that the tallest bar comes first.

Find the descriptive statistics
• For the previous data set, find the measures of center.
• For the previous data set, find the shape.
  – What does finding the “shape” mean?

Finding the spread
Why is each of these a good or poor measure of spread?
When would it be good? When would it be bad?
• Range: the difference between the highest and lowest score in a distribution.
• Interquartile Range: measure of the spread of the middle 50% of the scores. Defined as the difference between the 75th and 25th percentiles.
• Variance: the sum of squared deviations from the mean, divided by N - 1 (where N is the sample size).
• Standard Deviation: square root of the variance. Measure of variability in the same units as the data.

Find the measures of spread
• Consider the following set of 100-meter dash running times (time in seconds):
9.92, 9.97, 9.99, 10.01, 10.06, 10.07, 10.08, 10.10, 10.13, 10.13, 9.89, 9.98

What does this all have to do with random variables?
• Recall: what is a random variable?
• What are the concepts we talked about with random variables?
• How do those concepts relate to what we talked about today?
• How do we use random variables in statistics?
• How do probability concepts come into play?
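
As a check on the "Find the measures of spread" exercise, the four measures defined above can be computed for the 100-meter dash times with a short script. This is only a sketch, not part of the original slides: it assumes Python's standard `statistics` module, and the variable names are illustrative. Textbooks use several conventions for percentiles, so the interquartile range in particular may differ slightly from a hand calculation.

```python
import statistics

# 100-meter dash times in seconds, copied from the slide.
times = [9.92, 9.97, 9.99, 10.01, 10.06, 10.07,
         10.08, 10.10, 10.13, 10.13, 9.89, 9.98]

# Range: highest score minus lowest score.
data_range = max(times) - min(times)

# Interquartile range: 75th percentile minus 25th percentile.
# Assumption: quartiles come from the default ("exclusive") method of
# statistics.quantiles; other conventions give slightly different values.
q1, _median, q3 = statistics.quantiles(times, n=4)
iqr = q3 - q1

# Sample variance: sum of squared deviations from the mean, divided by N - 1.
variance = statistics.variance(times)

# Standard deviation: square root of the variance, in seconds.
std_dev = statistics.stdev(times)

print(f"Range:              {data_range:.2f} s")
print(f"Interquartile range: {iqr:.4f} s")
print(f"Variance:           {variance:.5f} s^2")
print(f"Standard deviation: {std_dev:.4f} s")
```

Under these assumptions the range is 0.24 s, the interquartile range is about 0.12 s, the variance is about 0.0063 s², and the standard deviation is about 0.080 s. Note that only the standard deviation, range, and interquartile range are in seconds; the variance is in squared seconds.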