Stat 225 Lecture Notes
Day 1 (Chapter 1, 7.8)

Stat Applications and Types of Data

Data: facts and figures collected, analyzed, and summarized for presentation and interpretation.
Data Set: all the data collected in a particular study.
Elements: the individual entities of a data set.
Variables: characteristics of interest for the elements.
Observations: the set of measurements collected for a particular element.

                   Student 1    Student 2    Student 3
  Gender           F            M            M
  Class            Fresh        Junior       Senior
  Miles from Home  65           1,005        32
  Credits          18           12           16
  Ice Cream        Chocolate    Vanilla      Rocky Road

The data set is the entire spreadsheet full of data. Each student is an element. For a class with 40 people, there will be 40 elements in the data set. The variables are gender, class, miles from home, credits, and ice cream. The observations for the first element are (F, Fresh, 65, 18, Chocolate). For a class of 40 students, there will also be 40 sets of observations.

Types of Data

Qualitative: labels or names used to identify an attribute of each element
    Nominal: order does NOT matter (gender, race, marital status)
    Ordinal: order DOES matter (class [fresh, soph, jr, sr])
Quantitative: numeric values that indicate how much or how many
    Interval: differences between values have meaning, but ratios of values cannot be compared (temp (C/F), IQ score)
        For example, if it is 4 degrees out today and it was 2 degrees out yesterday, it does NOT mean it is twice as warm today as it was yesterday. If that were the case, then if it is 0 degrees today and today is twice as warm as yesterday, how cold was it yesterday?
    Ratio: ratios of values have meaning (height, weight, age)
        For example, a person who weighs 200 pounds weighs twice as much as someone who weighs 100 pounds.
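The interval-versus-ratio distinction can be checked with a little arithmetic. The sketch below is only an illustration and is not part of the original notes: the function names are made up, and the 4-degree and 2-degree readings are assumed to be in Celsius. It shows that the "twice as warm" ratio changes as soon as the temperature scale changes, while the 200 lb to 100 lb weight ratio stays at 2 after converting to kilograms.

```python
# Illustrative sketch (not from the original notes): a ratio on an interval
# scale depends on the unit chosen; a ratio on a ratio scale does not.

def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def celsius_to_kelvin(c):
    return c + 273.15

def pounds_to_kilograms(lb):
    return lb * 0.45359237

# Assumption: the temperatures in the notes are degrees Celsius.
today_c, yesterday_c = 4.0, 2.0

ratio_c = today_c / yesterday_c
ratio_f = celsius_to_fahrenheit(today_c) / celsius_to_fahrenheit(yesterday_c)
ratio_k = celsius_to_kelvin(today_c) / celsius_to_kelvin(yesterday_c)

# Weights from the notes: 200 pounds versus 100 pounds.
heavy_lb, light_lb = 200.0, 100.0
weight_ratio_lb = heavy_lb / light_lb
weight_ratio_kg = pounds_to_kilograms(heavy_lb) / pounds_to_kilograms(light_lb)

print("temperature ratio in C:", ratio_c)
print("temperature ratio in F:", round(ratio_f, 4))
print("temperature ratio in K:", round(ratio_k, 4))
print("weight ratio in lb:", weight_ratio_lb)
print("weight ratio in kg:", round(weight_ratio_kg, 4))
```

The three temperature ratios disagree because Celsius and Fahrenheit place zero at an arbitrary point; Kelvin, which has a true zero, is the scale on which a temperature ratio is meaningful.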
  Qualitative                                      Quantitative
  Nominal                 Ordinal                  Interval             Ratio
  Gender                  Letter Grade             Level of Tide        Height
  Marital Status          Class                    Longitude            Weight
  Race                    Class Rank               Year-Dates           Temp Kelvin
  Ethnicity               Likert Scale             Temperature (C/F)    Time to run Mile
  Hometown                Degree of Illness        IQ Score             Salary
  Religious Affiliation   Level of Satisfaction    SAT Score            Age
  Smoking Status          Birthday Month                                Length
  Favorite Sport          Clothing Size                                 GPA
  Instrument Play         Prize Medal (G, S, B)                         Ounces in Glass

Compare Qualitative and Quantitative:
    What type of variable is smoking status? qual
    What type of variable is SAT score? quant
    What type of variable is income? quant
    What type of variable is level of satisfaction? qual
    What type of variable is GPA? quant
    What type of variable is clothing size (s, m, l, xl)? qual
    What type of variable is time it takes to run a mile? quant

Cross-Sectional Data: data collected at the same or approximately the same point in time
Time Series Data: data collected over several time periods

Cross-Sectional
    Temperature on Jan 1st in each state
    Profit of each store in Tippecanoe Mall earned on Black Friday
    Resting heart rate of each person in the classroom
    Number of car accidents on Friday afternoon in each county of IN

Time Series
    Temperature on the first of every month
    Profit of a store for each day from Thanksgiving to Christmas
    Heart rate before, during, and immediately after vigorous exercise
    Number of car accidents each weekend of football season in West Lafayette

Sources of Data

Existing Sources: student records, including student ID, GPA, number of credits being taken, and courses
Surveys: teacher evaluations (how well class time was used, best aspects of the course, how helpful the homework was, etc.)
Experiments: When testing a new drug to lower blood pressure, use three groups: one taking a placebo, one taking the generic brand, and one taking the name brand. The data collected will represent the difference in blood pressure from the start.
*the key thing in an experiment, as opposed to an observational study, is that you manipulate and control the groups, such as by assigning a different treatment (drug) to each one*
Observational Studies: Comparing the sales of different grocery stores in West Lafayette (you simply observe their sales records and do not apply a treatment to any group)
*the key thing in an observational study, as opposed to an experiment, is that you simply observe what happens and do not give a specific treatment to anything*

Types of Sampling:

Simple Random Sampling: a sample is selected from a population in such a way that each element has the same probability of being selected.
    Sampling With Replacement: elements are put back in the population after being selected, so an element can be selected more than once for a single sample.
    Sampling Without Replacement: elements are not put back after being selected, so each can be chosen only once for a sample.
    **what is more common, with replacement or without replacement?**
Stratified Random Sampling: the population is first divided into strata (groups), and a simple random sample is then taken from each stratum.
Cluster Sampling: the population is first divided into clusters, and then a simple random sample of the clusters is taken (every element in each selected cluster is surveyed).
Systematic Sampling: first, one of the first k elements is selected, and then every kth element thereafter is selected.
Convenience Sampling: elements are selected for the sample on the basis of convenience.
Judgment Sampling: elements are selected for the sample based on the judgment of the person doing the study.

What are examples of each type of sample and when might they be used?
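The probability-based sampling methods above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes. The population of 40 numbered students echoes the class of 40 mentioned earlier, but the strata (four class years of 10 students) and the clusters (five tables of 8 students) are hypothetical groupings invented for the example, as are all function names.

```python
import random

def simple_random_sample(population, n, rng):
    """Without replacement: each element can appear at most once."""
    return rng.sample(population, n)

def sample_with_replacement(population, n, rng):
    """With replacement: the same element may be drawn more than once."""
    return rng.choices(population, k=n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take a simple random sample from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters, then survey every element in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [element for name in chosen for element in clusters[name]]

def systematic_sample(population, k, rng):
    """Pick one of the first k elements, then every kth element after it."""
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(225)          # fixed seed so the run is repeatable
population = list(range(1, 41))   # students numbered 1..40

# Hypothetical strata: 10 students in each class year.
strata = {
    "fresh": population[0:10],
    "soph": population[10:20],
    "jr": population[20:30],
    "sr": population[30:40],
}

# Hypothetical clusters: five tables of 8 students each.
clusters = {f"table{i + 1}": population[8 * i: 8 * i + 8] for i in range(5)}

srs = simple_random_sample(population, 5, rng)
with_repl = sample_with_replacement(population, 5, rng)
strat = stratified_sample(strata, 2, rng)
clust = cluster_sample(clusters, 2, rng)
syst = systematic_sample(population, 8, rng)

print("simple random:", srs)
print("with replacement:", with_repl)
print("stratified:", strat)
print("cluster:", clust)
print("systematic:", syst)
```

Convenience and judgment sampling are left out on purpose: they depend on the person choosing rather than on a random mechanism, so there is nothing to simulate.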
Population: the set of all elements of interest in a particular study
Sample: a subset of the population
**Who you are sampling**

Census: the process of conducting a survey to collect data for the entire population
Sample Survey: the process of conducting a survey to collect data for a sample
**How you would sample**

Statistical Inference: using data from a sample to estimate a characteristic of a population.

***Activity from Ellen – counting e’s***

1. You will have 2 minutes to do your inspection.
2. You may not alter the product you are inspecting. This means no writing on or folding the paper.

Now, I need to make sure the instructions are clear. Does everyone understand what a letter “e” looks like? We are counting both capital “E” and lowercase “e” as defects.

Elegant, extravagant elephants entertain every evening at seven. They serve escargot and eggs benedict and endive. Eight elderly elegant elephants elevate themselves to the expensive entrance with elevators exceeding expectations. Eating everything edible, elephants expand exponentially. “Excellent!” the entertained elephants express after the entertaining entrees were served. Everything was expedited by the energetic efforts of the executive elephant empress. Everyone was entertained to excess and enjoyed the edible endeavors immensely. The evening ended enchantedly with Echinacea herbal tea.

Why are you all getting different numbers? You said you understood the directions, and you didn’t have any other questions for me. You had plenty of time. What happened? The right answer is 126.

You just did what is called a “census.” You inspected every single letter of the entire paragraph to figure out how many “e”s you had.

Now let’s try this with a sample. There are 7 lines in this paragraph. Each of you should pick one line and count the number of “e”s in just your one line. (Give them about 30 seconds for this.) Now multiply your number by 7. What do you get this way? (Ask around.
You should get between 119 and 133 “e”s with the sampling method. Line 1 has 17, line 2 has 19, line 3 has 17, line 4 has 18, line 5 has 19, line 6 has 19, and line 7 has 17.)

Why sample rather than survey the whole population?
    Hard to sample EVERYONE
    Too expensive to sample everyone
    Too much time/effort to sample everyone
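The arithmetic behind the activity can be reproduced from the per-line counts given in the notes. The sketch below is a minimal illustration, not part of the original activity, and its variable names are made up. It compares the census total with the estimate each student would get by counting one line and multiplying by 7.

```python
# Per-line counts of the letter "e" (lines 1 through 7), as listed in the notes.
line_counts = [17, 19, 17, 18, 19, 19, 17]
num_lines = len(line_counts)

# Census: count every line.
census_total = sum(line_counts)

# Sample: count a single line, then scale up by the number of lines.
estimates = [count * num_lines for count in line_counts]

# Average over all seven possible one-line samples.
mean_estimate = sum(estimates) / len(estimates)

print("census total:", census_total)
print("smallest one-line estimate:", min(estimates))
print("largest one-line estimate:", max(estimates))
print("average of the one-line estimates:", mean_estimate)
```

Any single one-line estimate can miss the census total, but averaged over all seven possible lines the estimates equal it. That is the sense in which a sample can stand in for a census at a fraction of the effort.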