Lecture Unit 1: Stats Starts Here

Objectives: be able to
  Identify the Who, What, Why, When, Where and How associated with data
  Identify different types of data variables

Statistics: An Overview
  Everyday experiences: Gallup polls, newspaper articles, lotteries, CPI, unemployment data, your admittance to NCSU (predicted GPA)
    Basic stock data
    College data
  Increasing in importance; used in more and more ways in many disciplines
    NY Times: Statistics
    Sports Analytics
    NCSU Sports Analytics

Broad Definition
Many disciplines can be summarized in a few words:
  Economics is about … Money (and why it is good)
  Psychology: Why we think what we think
  Biology: Life
  Anthropology: Who?
  History: What, where, and when?
  Philosophy: Why?
  Engineering: How?
  Accounting: How much?
  Statistics is about … Variation
The discipline of Statistics deals with the efficient collection and analysis of data to solve real-world problems in the presence of variability.

More Specifically …
Q. What is Statistics?
A. Statistics is a way of reasoning, along with a set of tools and methods, designed to help us understand the world.
Q. What are statistics?
A. Statistics (plural) are quantities calculated from data.

2 Broad Areas of Applications
1. Descriptive statistics
   Uses numerical and graphical methods to summarize data, look for patterns and trends, and present information.
   Descriptive statistics lack a measure of reliability.

Second Area
2. Inferential statistics
   Uses data to make estimates, decisions, predictions or other generalizations about a larger data set or population.
   Inferential statistics have a measure of reliability.
   Opinion Polling

Common Situations that Require Statistics
1. An opinion poll wants to know what fraction of the public approves of the president’s performance in office.
2. Will a new package design increase sales enough to pay the cost of implementing the new design? (Tropicana Disaster)
3. Government economists release monthly reports about the nation’s economic activity.
Large groups of people or things
Time, cost, inconvenience

Three Simple Steps to Doing Statistics Correctly
1. Plan first. Know where you’re headed and why.
2. Do. The mechanics of calculating statistics and making graphical displays are important, but the computations are usually the least important part of the process.
3. Report what you’ve learned.

SECTION 1.2 Types of Data

Data: numbers with a context
Data: values and their context
  815, 930, 750, 919
What can you do with these? Find the sum? Find the average?
Seems reasonable if these are, for example, SAT scores.
BUT these are telephone area codes! Adding and averaging make no sense.

Know the context of the data
  Who: items included in the data
  What: variable(s) measured on each item
  Why: purpose for collecting the data
  -------------------------------
  Where: location(s) where the data were collected
  When: last week? 1 year ago? last decade?
  How: internet survey? (worthless); data provided by a government agency? (useful)

Data Types
1. Qualitative data
   Data that categorizes
   Ex. Male/female, Democrat/Republican, yes/no, Chevy/Buick/Pontiac/Oldsmobile, Awful/Fair/Good/Very Good/Excellent
   1a) Nominal (categorical): categorizes only
       Ex. Buick, Chevy, Pontiac
   1b) Ordinal: categories can be ranked or ordered
       Ex. taste test; order of finish in a race

Data Types (cont.)
Wendy’s is developing a new hamburger. A panel of taste-testers evaluates the new item.
Categories: Excellent, Very Good, Good, Poor, Gag
Ordinal: there is a natural ranking

Data Types (cont.)
The same taste-test categories, coded numerically:
Excellent = 5, Very Good = 4, Good = 3, Poor = 2, Gag = 1
Ordinal: there is a natural ranking

Data Type Dictates Statistical Procedures
2. Quantitative data
   Data that are measured on a numerical scale
   Ex. height, GPA, income, temperature, SAT
   2a) Interval data: no meaningful zero point; the difference between 2 values is meaningful; cannot meaningfully multiply or divide
       Ex. temperature, SAT

Data Types Ex. (cont.)
       60°F is not twice as warm as 30°F; the difference between 32° and 30° is the same as the difference between 83° and 81°: 2 degrees in each case.
       (No meaningful “zero”; 0 degrees is not the absence of all heat)
   2b) Ratio data: zero point is meaningful; can multiply and divide
       Ex. income, height, GPA, pulse rate; $200 is twice as much as $100; $0 is the absence of all money

Why Do We Care About the Type of Variable We Have?
The type of data we have dictates the statistical procedures (graphics, summaries, inference techniques) that we can use.

Summaries of categorical data: proportions, counts, tables, bar charts
Example: student opinion of quality of NCSU campus food.

  OPINION      PERCENT
  Excellent    10%
  Very Good    12%
  Good         25%
  Fair         35%
  Poor         18%

Summaries of quantitative data: averages, medians, standard deviations, histograms
Example: maximum speed (mph) of 198 roller coasters from around the world.
  average: 57.1 mph, median: 55.9 mph, standard deviation: 18.5 mph

We collect these data from 50 students. Which variable is categorical?
  A. Eye color
  B. Head circumference
  C. Hours of homework last week
  D. Number of TV sets in home

Registration and Records collects data on NCSU students. Which one of the following is quantitative?
  1. Class (freshman, sophomore, etc.)
  2. Grade point average
  3. Whether the student took an AP class
  4. Whether the student has taken the SAT

End of Section 1.2
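
Supplementary sketch: the idea from Section 1.2 that the type of data dictates the summary can be illustrated with a few lines of Python. This is only a minimal sketch using the standard library, not part of the original lecture. The function names and the small `opinions` and `speeds` samples are hypothetical and made up for illustration; only the four area codes and the Fahrenheit temperatures come from the slides.

```python
from collections import Counter
from statistics import mean, median, stdev


def summarize_categorical(values):
    """Return {category: (count, proportion)} for a list of category labels."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, n / total) for category, n in counts.items()}


def summarize_quantitative(values):
    """Return the mean, median, and sample standard deviation of numeric data."""
    return {"mean": mean(values), "median": median(values), "stdev": stdev(values)}


def fahrenheit_to_celsius(degrees_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (degrees_f - 32) * 5 / 9


# Hypothetical survey responses (NOT the NCSU campus food data).
opinions = ["Good", "Fair", "Fair", "Poor", "Excellent", "Good", "Fair", "Very Good"]

# Hypothetical maximum speeds in mph (NOT the 198 roller coasters).
speeds = [45.0, 50.0, 55.0, 60.0, 70.0]

# The four values from the slides. They are telephone area codes, so the
# average below is computable but meaningless: context decides, not the software.
area_codes = [815, 930, 750, 919]
meaningless_average = mean(area_codes)  # 853.5

# Interval data: differences are meaningful, ratios are not.
# 60 F / 30 F is 2.0, but the same two temperatures in Celsius give a
# negative ratio, so "twice as warm" depends entirely on the scale chosen.
ratio_in_fahrenheit = 60 / 30
ratio_in_celsius = fahrenheit_to_celsius(60) / fahrenheit_to_celsius(30)

if __name__ == "__main__":
    print(summarize_categorical(opinions))
    print(summarize_quantitative(speeds))
    print(meaningless_average)
    print(ratio_in_fahrenheit, ratio_in_celsius)
```

Counts and proportions are the natural summary for the categorical responses, while the mean, median, and standard deviation only make sense for the numeric speeds. The area-code average shows that software will compute a number for any numeric-looking input, which is why knowing the context of the data comes first.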