Business Statistics: A Decision-Making Approach, 7th Edition
Chapter 1: The Where, Why, and How of Data Collection

What is Statistics?
Statistics is the development and application of methods to collect, analyze, and interpret data. Modern statistical methods involve the design and analysis of experiments and surveys, the quantification of biological, social, and scientific phenomena, and the application of statistical principles to understand more about the world around us.
Statistics is a discipline concerned with designing experiments and other data collection, summarizing information to aid understanding, drawing conclusions from data, and estimating the present or predicting the future.

Population vs. Sample
[Figure: a population of items labeled a through z, with a smaller sample of those items drawn from it]

Populations and Samples
A population is the set of all items or individuals of interest. Examples: all likely voters in the next election; all parts produced today; all sales receipts for November.
A sample is a subset of the population. Examples: 1,000 voters selected at random for interview; a few parts selected for destructive testing; every 100th receipt selected for audit.

Why Sample?
A sample is less time-consuming to collect than a census.
A sample is less costly to administer than a census.
A sample can still yield statistical results of sufficiently high precision.

Sampling Techniques
Nonstatistical sampling (not covered here): convenience, judgment.
Statistical sampling: simple random, systematic, stratified, cluster.

Statistical Sampling
In statistical (probability) sampling, items are chosen for the sample based on known or calculable probabilities. The four statistical sampling methods are simple random, stratified, systematic, and cluster sampling. (See the textbook for details of each method.)

Example of Random Sampling
Suggest how the statistical sampling techniques can be used to gather data on employees' preferences for scheduling vacation times.
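The four statistical sampling methods can be sketched in Python for the vacation-scheduling example. The roster, employee IDs, department names, and function names below are hypothetical, invented for illustration; this is a minimal sketch using Python's standard `random` module, not the textbook's own procedure.

```python
import random
from collections import defaultdict
from operator import itemgetter

# Hypothetical roster of 200 employees, each assigned to one of four
# invented departments (50 per department).
DEPARTMENTS = ["Sales", "Finance", "Operations", "IT"]
roster = [(emp_id, DEPARTMENTS[emp_id % 4]) for emp_id in range(1, 201)]
dept = itemgetter(1)  # key function: an employee's department

rng = random.Random(42)  # fixed seed so runs are reproducible


def simple_random(population, n):
    """Every subset of size n is equally likely (drawn without replacement)."""
    return rng.sample(population, n)


def systematic(population, n):
    """Random start within the first interval of length k, then every k-th item."""
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]


def stratified(population, n, key):
    """Group items into strata by key, then draw a proportional simple random
    sample from each stratum (rounded shares may not sum exactly to n)."""
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, share))
    return sample


def cluster(population, n_clusters, key):
    """Randomly choose whole clusters and keep every item in each chosen cluster."""
    clusters = defaultdict(list)
    for item in population:
        clusters[key(item)].append(item)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for c in chosen for item in clusters[c]]


print(len(simple_random(roster, 20)))     # 20
print(len(systematic(roster, 20)))        # 20
print(len(stratified(roster, 20, dept)))  # 20 (5 from each department)
print(len(cluster(roster, 2, dept)))      # 100 (two whole departments)
```

Stratifying by department guarantees that every department is represented in proportion to its size, while the cluster sample surveys every employee in a few randomly chosen departments, which can be cheaper when departments are the natural unit of contact.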
Simple random sampling could be used by assigning each employee a number and then using a random number generator to select employees. Tools: a random number table or the Excel Analysis ToolPak.

Simple Random Sampling
Every possible sample of a given size has an equal chance of being selected.
Selection may be with replacement or without replacement.
The sample can be obtained using a table of random numbers or a computer random number generator.

Types of Statistics
Descriptive statistics: mathematical methods (such as the mean, median, and standard deviation) that summarize and interpret some of the properties of a set of data (a sample) but do not infer the properties of the population from which the sample was drawn.
Inferential statistics: mathematical methods (such as hypothesis testing) that employ probability theory to infer the properties of a population from the analysis of a set of data (a sample) drawn from it. Inferential statistics is also concerned with the precision and reliability of the inferences it helps draw.

Descriptive Statistics
Collect data, e.g., surveys, observation, experiments.
Present data, e.g., charts and graphs.
Characterize data, e.g., the sample mean $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$.

Inferential Statistics
Inferential statistics makes statements about a population by examining sample results: sample statistics (known) are used to draw inferences about population parameters (unknown, but can be estimated from sample evidence). It involves drawing conclusions and/or making decisions concerning a population based on sample results.
Estimation, e.g., estimate the population mean weight using the sample mean weight.
Hypothesis testing, e.g., use sample evidence to test the claim that the population mean weight is 120 pounds.

Tools for Collecting Data
Data collection methods: experiments, telephone surveys, written questionnaires, and direct observation and personal interview.

Survey Design Steps
Define the issue: what are the purpose and objectives of the survey?
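To make sampling with and without replacement, descriptive statistics, and estimation concrete, here is a small Python sketch. The population of weights is synthetic (randomly generated around 120 pounds purely for illustration, not real data). The sketch computes descriptive statistics for one sample and then treats the sample mean as an estimate of the population mean; because we generated the population ourselves, we can also compute the true mean, which in practice would be unknown.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so runs are reproducible

# Synthetic population of 10,000 weights in pounds (invented data).
population = [rng.gauss(120, 15) for _ in range(10_000)]

# Population parameter: known here only because we simulated the population.
mu = statistics.fmean(population)

# Simple random sample without replacement: no item can be drawn twice.
sample = rng.sample(population, 100)
# With replacement: the same item may be drawn more than once.
sample_wr = rng.choices(population, k=100)

# Descriptive statistics summarize the sample itself.
n = len(sample)
x_bar = sum(sample) / n             # sample mean: sum of x_i divided by n
median = statistics.median(sample)
s = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)

# Inferential step: use x_bar as an estimate of the unknown population mean.
print(f"sample mean   = {x_bar:.2f}")
print(f"sample median = {median:.2f}")
print(f"sample stdev  = {s:.2f}")
print(f"population mean (normally unknown) = {mu:.2f}")
```

Rerunning with a different seed gives a different sample mean each time; that sample-to-sample variation is what inferential statistics uses to judge the precision of an estimate.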
Define the population of interest.
Develop survey questions: make questions clear and unambiguous, use universally accepted definitions, and limit the number of questions.
Pre-test the survey: pilot test with a small group of participants, and assess clarity and length.
Determine the sample size and sampling method.
Select the sample and administer the survey.

Types of Questions
Closed-end questions: respondents select from a short list of defined choices. Example: Major: __business __liberal arts __science __other
Open-end questions: respondents are free to respond with any value, words, or statement. Example: What did you like best about this course?
Demographic questions: questions about the respondents' personal characteristics. Example: Gender: __Female __Male

Data (Variable) Types
Qualitative (categorical) data: defined categories. Examples: marital status, political party, eye color.
Quantitative (numerical) data, which is either:
  discrete (counted items). Examples: number of children, defects per hour.
  continuous (measured characteristics). Examples: weight, voltage.

Qualitative vs. Quantitative Variables (Data)
Qualitative variables take on values that are names or labels. Examples: the color of a ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, terrier).
Quantitative variables are numerical; they represent a measurable quantity. Examples: the number of students at CSUB or the number of people in Bakersfield.

Discrete vs. Continuous Variables (Data)
Quantitative variables can be further classified as discrete or continuous. If a variable can take on any value between its minimum value and its maximum value, it is called a continuous variable; otherwise, it is called a discrete variable.
Example: The fire department mandates that all firefighters must weigh between 150 and 250 pounds. A firefighter's weight is a continuous variable, since it could take on any value between 150 and 250 pounds.
Example: Suppose we flip a coin repeatedly and count the number of heads. The count could be any whole number from 0 upward, but it cannot take every value in that range; we could not, for example, get 2.3 heads. Therefore, the number of heads is a discrete variable.

Data Measurement Levels
Ratio/interval data (measurements): highest level, complete analysis.
Ordinal data (rankings, ordered categories): higher level, mid-level analysis.
Nominal data (categorical codes, ID numbers, category names): lowest level, basic analysis.

Nominal
Nominal data are categorical data such as the name of your school, the type of car you drive, or the name of a book. This one is easy to remember because "nominal" sounds like "name" (they have the same Latin root).

Ordinal
Ordinal data have a natural ordering. Examples include the ranking of favorite sports, people's places in a line, the order in which runners finish a race, or, most often, the choice on a rating scale from 1 to 5. With ordinal data you cannot state with certainty whether the intervals between values are equal. For example, we often use rating scales (Likert-scale questions); on a 10-point scale, the difference between a 9 and a 10 is not necessarily the same as the difference between a 6 and a 7. This is also easy to remember: "ordinal" sounds like "order."

Interval
Interval data are like ordinal data, except that the intervals between values are equal. The most common example is temperature in degrees Fahrenheit: the difference between 29 and 30 degrees is the same magnitude as the difference between 78 and 79 (although I know I prefer the latter). The attitudinal scales and Likert questions you usually see on a survey are rarely interval, although many points on such a scale likely are of equal intervals.

Ratio
Ratio data is interval data with a natural zero point.
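For example, time is ratio data, since zero time is meaningful. Temperature in kelvins also has a true zero point (absolute zero), and in both of these scales the steps are of equal magnitude. By contrast, 0 °F is an arbitrary point, which is why Fahrenheit temperature is interval rather than ratio data.

Data Types
Time-series data: ordered data values observed over time.
Cross-section data: data values observed at a fixed point in time.

Sales (in $1000's)
City        2003  2004  2005  2006
Atlanta      435   460   475   490
Boston       320   345   375   395
Cleveland    405   390   410   395
Denver       260   270   285   280

Reading down a single year's column (all four cities in one year) gives cross-section data; reading across a single city's row (one city from 2003 to 2006) gives time-series data.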
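One way to see the difference between the two data types is to store the sales table as a Python dictionary keyed by city; the variable and function names here are illustrative choices, not from the text. Slicing one city's row gives a time series, and slicing one year's column gives a cross section.

```python
# Sales in $1000's, copied from the sales table for 2003 to 2006.
years = [2003, 2004, 2005, 2006]
sales = {
    "Atlanta":   [435, 460, 475, 490],
    "Boston":    [320, 345, 375, 395],
    "Cleveland": [405, 390, 410, 395],
    "Denver":    [260, 270, 285, 280],
}


def time_series(city):
    """One city observed over successive years: ordered (year, value) pairs."""
    return list(zip(years, sales[city]))


def cross_section(year):
    """All cities observed at a single point in time."""
    col = years.index(year)
    return {city: values[col] for city, values in sales.items()}


print(time_series("Boston"))
# [(2003, 320), (2004, 345), (2005, 375), (2006, 395)]
print(cross_section(2005))
# {'Atlanta': 475, 'Boston': 375, 'Cleveland': 410, 'Denver': 285}
```

The time series keeps the years in order because order matters for spotting trends, while the cross section is naturally an unordered mapping from city to value.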