Tuesday August 28
Class 2

Text problems for August 30: Chapter 2 - 2, 6 & 10
Aplia Graded Assignment: “Introduction” due September 4, 9:00 am
Practice Problems for Chapters 1 & 2 are now available
Please note that a tutorial for basic math concepts is available if needed

Slide 1 Introduction
Statistics: the language

Slide 2 Data and Data Sets
Data are the facts and numbers collected, summarized, analyzed, and interpreted.
The data collected in a particular study are referred to as the data set.

Slide 3 Elements, Variables, and Observations
The elements are the entities on which data are collected.
A variable is a characteristic of interest for the elements.
The set of measurements collected for a particular element is called an observation.
The total number of data values in a complete data set is the number of elements multiplied by the number of variables.

Slide 4 Data, Data Sets, Elements, Variables, and Observations
Data set (the Company column holds the element names; the other three columns are the variables):

Company         Stock Exchange   Annual Sales ($M)   Earn/Share ($)
Dataram         NQ                73.10              0.86
EnergySouth     N                 74.00              1.67
Keystone        N                365.70              0.86
LandCare        NQ               111.40              0.33
Psychemedics    N                 17.60              0.13

Slide 5 Scales of Measurement
Scales of measurement include: Nominal, Ordinal, Interval, Ratio
The scale determines the amount of information contained in the data.
The scale indicates how the data can be summarized and which statistical analyses are most appropriate.

Slide 6 Scales of Measurement
Nominal
Data are labels or names used to identify an attribute of the element.
A nonnumeric label or numeric code may be used.

Slide 7 Scales of Measurement
Nominal
Example: Students of a university are classified by the school in which they are enrolled using a nonnumeric label such as Business, Humanities, Education, and so on.
Alternatively, a numeric code could be used for the school variable (e.g. 1 denotes Business, 2 denotes Humanities, 3 denotes Education, and so on).
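As a rough illustration of Slides 3 to 7, the sketch below stores the Slide 4 data set in Python, counts its data values as elements times variables, and applies a numeric code to the nominal Stock Exchange variable. The field names and the code assignment (1 for NQ, 2 for N) are assumptions made for this example only; they do not come from the slides.

```python
from collections import Counter

# Data set from Slide 4: each key is an element (company) and each inner
# dict is that element's observation. Field names are illustrative.
data_set = {
    "Dataram": {"exchange": "NQ", "annual_sales_musd": 73.10, "earn_per_share": 0.86},
    "EnergySouth": {"exchange": "N", "annual_sales_musd": 74.00, "earn_per_share": 1.67},
    "Keystone": {"exchange": "N", "annual_sales_musd": 365.70, "earn_per_share": 0.86},
    "LandCare": {"exchange": "NQ", "annual_sales_musd": 111.40, "earn_per_share": 0.33},
    "Psychemedics": {"exchange": "N", "annual_sales_musd": 17.60, "earn_per_share": 0.13},
}

n_elements = len(data_set)                        # number of companies
n_variables = len(next(iter(data_set.values())))  # variables per observation
n_data_values = n_elements * n_variables          # elements x variables

# Nominal scale: a numeric code is only a label (assumed coding, not from the slides).
exchange_codes = {"NQ": 1, "N": 2}
coded_exchange = [exchange_codes[obs["exchange"]] for obs in data_set.values()]

# Counting how often each label occurs is meaningful for nominal data;
# averaging the codes would not be.
exchange_counts = Counter(obs["exchange"] for obs in data_set.values())

print(n_elements, n_variables, n_data_values)  # 5 3 15
print(coded_exchange)
print(dict(exchange_counts))
```

Because the codes are arbitrary labels, swapping them (1 for N, 2 for NQ) would change nothing about the information in the data, which is what separates the nominal scale from the scales that follow.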
Slide 8 Scales of Measurement
Ordinal
The data have the properties of nominal data and the order or rank of the data is meaningful.
A nonnumeric label or numeric code may be used.

Slide 9 Scales of Measurement
Ordinal
Example: Students of a university are classified by their class standing using a nonnumeric label such as Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for the class standing variable (e.g. 1 denotes Freshman, 2 denotes Sophomore, and so on).

Slide 10 Scales of Measurement
Interval
The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure.
Interval data are always numeric.

Slide 11 Scales of Measurement
Interval
Example: Melissa has an SAT score of 1205, while Kevin has an SAT score of 1090. Melissa scored 115 points more than Kevin.

Slide 12 Scales of Measurement
Ratio
The data have all the properties of interval data and the ratio of two values is meaningful.
Variables such as distance, height, weight, and time use the ratio scale.
This scale must contain a zero value that indicates that nothing exists for the variable at the zero point.

Slide 13 Scales of Measurement
Ratio
Example: Melissa’s college record shows 36 credit hours earned, while Kevin’s record shows 72 credit hours earned. Kevin has twice as many credit hours earned as Melissa.

Slide 14 Qualitative and Quantitative Data
Data can be further classified as being qualitative or quantitative.
The statistical analysis that is appropriate depends on whether the data for the variable are qualitative or quantitative.
In general, there are more alternatives for statistical analysis when the data are quantitative.
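The ordinal, interval, and ratio examples on Slides 9 to 13 can be checked with a few lines of Python. This is only a sketch: the codes 3 and 4 for Junior and Senior are an assumed continuation of the "and so on" on Slide 9, and the variable names are made up for the example.

```python
# Ordinal scale: codes carry order, so comparisons are meaningful.
# Codes 3 and 4 are an assumed continuation of the slide's "and so on".
class_standing_codes = {"Freshman": 1, "Sophomore": 2, "Junior": 3, "Senior": 4}
junior_outranks_sophomore = (
    class_standing_codes["Junior"] > class_standing_codes["Sophomore"]
)

# Interval scale: differences are meaningful (Slide 11).
sat_melissa, sat_kevin = 1205, 1090
sat_difference = sat_melissa - sat_kevin

# Ratio scale: a true zero makes ratios meaningful (Slide 13).
credits_melissa, credits_kevin = 36, 72
credit_ratio = credits_kevin / credits_melissa

print(junior_outranks_sophomore)  # True
print(sat_difference)             # 115
print(credit_ratio)               # 2.0
```

The slides report only the difference for the SAT scores and only the ratio for the credit hours, which matches the scale each example is meant to illustrate.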
Slide 15 Qualitative Data
Labels or names used to identify an attribute of each element
Often referred to as categorical data
Use either the nominal or ordinal scale of measurement
Can be either numeric or nonnumeric

Slide 16 Quantitative Data
Quantitative data indicate how many or how much:
discrete, if measuring how many
continuous, if measuring how much
Quantitative data are always numeric.
Ordinary arithmetic operations are meaningful for quantitative data.

Slide 17 Scales of Measurement
Data
  Qualitative
    Numerical: Nominal, Ordinal
    Non-numerical: Nominal, Ordinal
  Quantitative
    Numerical: Interval, Ratio

Slide 18 Cross-Sectional Data
Cross-sectional data are collected at the same or approximately the same point in time.
Example: data detailing the number of building permits issued in June 2007 in each of the counties of Ohio

Slide 19 Time Series Data
Time series data are collected over several time periods.
Example: data detailing the number of building permits issued in Lucas County, Ohio in each of the last 36 months

Slide 20 Types of Statistical Studies
In experimental studies the variable of interest is first identified. Then one or more other variables are identified and controlled so that data can be obtained about how they influence the variable of interest.
In observational (nonexperimental) studies no attempt is made to control or influence the variables of interest. A survey is a good example.

Slide 21 Descriptive Statistics
Descriptive statistics are the tabular, graphical, and numerical methods used to summarize and present data.

Slide 22 Example: Hudson Auto Repair
The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.
Slide 23 Example: Hudson Auto Repair
Sample of Parts Cost ($) for 50 Tune-ups

 91   71  104   85   62   78   69   74   97   82
 93   72   62   88   98   57   89   68   68  101
 75   66   97   83   79   52   75  105   68  105
 99   79   77   71   79   80   75   65   69   69
 97   72   80   67   62   62   76  109   74   73

Slide 24 Tabular Summary: Frequency and Percent Frequency

Parts Cost ($)   Parts Frequency   Percent Frequency
50-59                   2                  4
60-69                  13                 26
70-79                  16                 32
80-89                   7                 14
90-99                   7                 14
100-109                 5                 10
Total                  50                100

Percent frequency for the 50-59 class: (2/50)100 = 4

Slide 25 Graphical Summary: Histogram
[Histogram: Tune-up Parts Cost. Horizontal axis: Parts Cost ($), classes 50-59 through 100-109. Vertical axis: Frequency.]

Slide 26 Numerical Descriptive Statistics
The most common numerical descriptive statistic is the average (or mean).
Hudson’s average cost of parts, based on the 50 tune-ups studied, is $79 (found by summing the 50 cost values and then dividing by 50).

Slide 27 Statistical Inference
Population - the collection of all the elements of interest
Sample - a subset of the population
Statistical inference - the process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population
Census - collecting data for a population
Sample survey - collecting data for a sample

Slide 28 Process of Statistical Inference
1. Population consists of all tune-ups. Average cost of parts is unknown.
2. A sample of 50 engine tune-ups is examined.
3. The sample data provide a sample average parts cost of $79 per tune-up.
4. The sample average is used to estimate the population average.

Slide 29 Computers and Statistical Analysis
Statistical analysis typically involves working with large amounts of data.
Computer software is typically used to conduct the analysis.
Instructions are provided in chapter appendices for carrying out many of the statistical procedures using Minitab and Excel.
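The Hudson Auto summaries on Slides 24 and 26 can be reproduced from the 50 values on Slide 23. The course itself points to Minitab and Excel; the Python below is just an alternative sketch, and the variable names are chosen for this example.

```python
# Parts cost ($) for the 50 tune-ups listed on Slide 23.
parts_costs = [
    91, 71, 104, 85, 62, 78, 69, 74, 97, 82,
    93, 72, 62, 88, 98, 57, 89, 68, 68, 101,
    75, 66, 97, 83, 79, 52, 75, 105, 68, 105,
    99, 79, 77, 71, 79, 80, 75, 65, 69, 69,
    97, 72, 80, 67, 62, 62, 76, 109, 74, 73,
]

# Non-overlapping classes from Slide 24, as inclusive (low, high) bounds.
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99), (100, 109)]

n = len(parts_costs)

# Frequency: how many costs fall in each class.
frequency = {
    f"{low}-{high}": sum(low <= cost <= high for cost in parts_costs)
    for low, high in classes
}

# Percent frequency: (frequency / n) * 100, e.g. (2/50)100 = 4 for 50-59.
percent_frequency = {label: 100 * count / n for label, count in frequency.items()}

# Average (mean): sum of the values divided by how many there are.
average_cost = sum(parts_costs) / n

for label in frequency:
    print(f"{label:>8} {frequency[label]:>3} {percent_frequency[label]:>5.0f}")
print(f"Average parts cost: ${average_cost:.2f}")
```

The computed mean is 3949/50 = 78.98, which the slides report rounded to $79.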
Slide 30 Tainted Truth
“If someone is misusing numbers and scaring us with those numbers to get us to do something, however good that something is, we have lost the power of numbers.”
WE ALL NEED TO BE CRITICAL.

Slide 31
Reported Information: Eating oat bran is a cheap and easy way to reduce your cholesterol count (Quaker Oats).
Actual Study Information: Diet must consist of nothing but oat bran to achieve a slightly lower cholesterol count.

Slide 32
Reported Information: Only 29% of high school girls are happy with themselves, compared to 66% of elementary school girls (American Association of University Women).
Actual Study Information: Of 3000 high school girls, 29% responded “Always true” to the statement, “I am happy with the way I am.” Most answered “Sort of true” or “Sometimes true.”

Slide 33
Four out of five people in Columbia prefer Wendy's over McDonald's (according to a recent survey).
Credible?

Slide 36 Ethical Guidelines for Statistical Practice
American Statistical Association
www.amstat.org

Slide 37 Association vs Causation

Slide 38 Chapter 2 Descriptive Statistics: Tabular and Graphical Presentations
Summarizing Qualitative Data
Summarizing Quantitative Data

Slide 39 Summarizing Qualitative Data
Frequency Distribution
Relative Frequency Distribution
Percent Frequency Distribution
Bar Graphs
Pie Charts

Slide 40 Frequency Distribution
A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several non-overlapping classes.
The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.

Slide 41 Example: Marada Inn
Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor.
The ratings provided by a sample of 20 guests are:

Below Average   Above Average   Above Average   Average
Above Average   Average         Above Average   Average
Above Average   Below Average   Poor            Excellent
Above Average   Average         Above Average   Above Average
Below Average   Poor            Above Average   Average

Slide 42 Frequency Distribution

Rating           Frequency
Poor                 2
Below Average        3
Average              5
Above Average        9
Excellent            1
Total               20

Slide 43 Relative Frequency Distribution
The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.
A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.

Slide 44 Percent Frequency Distribution
The percent frequency of a class is the relative frequency multiplied by 100.
A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.

Slide 45 Relative Frequency and Percent Frequency Distributions

Rating           Relative Frequency   Percent Frequency
Poor                   .10                  10
Below Average          .15                  15
Average                .25                  25
Above Average          .45                  45
Excellent              .05                   5
Total                 1.00                 100

Relative frequency for Excellent: 1/20 = .05
Percent frequency for Poor: .10(100) = 10

Slide 46 Bar Graph
A bar graph is a graphical device for depicting qualitative data.
On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes.
A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical axis).
Using a bar of fixed width drawn above each class label, we extend the height appropriately.
The bars are separated to emphasize the fact that each class is a separate category.

Slide 47 Bar Graph
[Bar graph: Marada Inn Quality Ratings. Horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent). Vertical axis: Frequency.]

Slide 48 Pie Chart
The pie chart is another commonly used graphical device for presenting relative frequency distributions for qualitative data.
First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.
Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.

Slide 49 Pie Chart
[Pie chart: Marada Inn Quality Ratings. Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%.]

Slide 50 Example: Marada Inn
Insights Gained from the Preceding Pie Chart
• One-half of the customers surveyed gave Marada a quality rating of “above average” or “excellent” (looking at the left side of the pie). This might please the manager.
• For each customer who gave an “excellent” rating, there were two customers who gave a “poor” rating (looking at the top of the pie). This should displease the manager.

Slide 52
See Example 1 in the Class 2 data file
Text problems for August 30: Chapter 2 - 2, 6 & 10
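The Marada Inn summaries on Slides 42 to 49 follow directly from the 20 ratings on Slide 41. The sketch below is one possible way to compute them in Python, including the pie-chart sector angles from the 360-degree rule; the variable names and the tuple layout of `summary` are choices made for this example.

```python
from collections import Counter

# The 20 guest ratings from Slide 41, in the order listed.
ratings = [
    "Below Average", "Above Average", "Above Average", "Average",
    "Above Average", "Average", "Above Average", "Average",
    "Above Average", "Below Average", "Poor", "Excellent",
    "Above Average", "Average", "Above Average", "Above Average",
    "Below Average", "Poor", "Above Average", "Average",
]

# Classes in the order used by the slide tables.
rating_order = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

counts = Counter(ratings)
n = len(ratings)

# For each class: (frequency, relative frequency, percent frequency, pie degrees).
summary = {}
for rating in rating_order:
    freq = counts[rating]
    summary[rating] = (
        freq,
        freq / n,        # relative frequency = frequency / total
        100 * freq / n,  # percent frequency = relative frequency x 100
        360 * freq / n,  # pie-chart sector = relative frequency x 360 degrees
    )

for rating, (freq, rel, pct, deg) in summary.items():
    print(f"{rating:<14} {freq:>2} {rel:>5.2f} {pct:>4.0f}% {deg:>6.1f} deg")
```

The Average class, with relative frequency .25, comes out at 90 degrees, matching the worked figure on the pie chart slide.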