Business Statistics Introduction & Descriptive Stats

Statistics for Business (STAT130)
Unit 1: Introduction and Descriptive Statistics

Chapter 1: An Introduction to Business Statistics

Applications in Business and Economics
- Accounting: Public accounting firms use statistical sampling procedures when conducting audits for their clients.
- Production: A variety of statistical quality control charts are used to monitor the output of a production process.
- Marketing: Electronic point-of-sale scanners at retail checkout counters are used to collect data for a variety of marketing research applications.
- Economics: Economists use statistical information in making forecasts about the future of the economy or some aspect of it.
- Finance: Financial advisors use a variety of statistical information, including price-earnings ratios and dividend yields, to guide their investment recommendations.

Key Definitions
- A population is the collection of all items or things under consideration (people or objects).
- A sample is a portion of the population selected for analysis.
- A parameter is a summary measure that describes a characteristic of the population.
- A statistic is a summary measure computed from a sample.
- A survey is the gathering of data about a particular group of people or items.
- A census is a survey of the entire population.

Exercise
A manufacturer of children's toys claims that less than 5% of his products are defective. When 500 toys were drawn from a large production run, 8% were found to be defective.
a) What is the population of interest?
b) What is the sample?
c) What is the parameter?
d) What is the statistic?
e) Does the value 5% refer to the parameter or the statistic?
f) Is the value 8% a parameter or a statistic?
g) Explain briefly how the statistic can be used to make inferences about the parameter to test the claim.

What is Statistics?
- Statistics is a science that deals with collecting and analyzing data, drawing conclusions, and making decisions.
- There are two main areas of Statistics:
  - Descriptive statistics: provides tabular and graphical techniques and numerical measures for describing data.
  - Inferential statistics: provides procedures for analyzing data and making decisions.

Descriptive Statistics
- Collect data, e.g. a survey.
- Present data, e.g. tables and graphs.
- Characterize data, e.g. the sample mean \( \bar{X} = \frac{\sum X_i}{n} \).

Inferential Statistics
- Estimation, e.g. estimate the population mean weight using the sample mean weight.
- Hypothesis testing, e.g. test the claim that the population mean weight is over 120 pounds.
- Inference means drawing conclusions and/or making decisions concerning a population based on sample results, that is, making statements about a population by examining sample results.
[Figure: a sample is drawn from the population; sample statistics (known) are used to make inferences about population parameters (unknown, but can be estimated from sample evidence).]

Sources of data
- The most popular sources of data are published material, observational studies, experimental studies and surveys.
- Published material is found in books, in scientific journals, on tapes, on CDs, on the Internet, etc.
  - Data published by the organization that collected the data are called PRIMARY DATA.
  - Data published by an organization other than the organization that collected the data are called SECONDARY DATA.
- Observational studies are studies in which the sample elements are observed and the information is recorded without controlling any of the factors that might affect the information or measurements.
- Experimental studies are studies in which the measurements are recorded while controlling some factors that might influence the results of the study.
- Surveys are questionnaires designed to solicit information from people by means of face-to-face interview, telephone interview, postal mail, e-mail or fax.

Types of data
- Data are the facts, figures, or records that are collected from the sample elements.
- Data can be classified as:
  - Qualitative data: labels or names used to identify attributes of the sample elements. The labels can be numbers with no real numerical meaning. Examples: gender, marital status, race.
  - Quantitative data: numbers (with real meaning), representing measurements, obtained from the sample elements. Examples: salary, age, number of branches.

Measurement Scales
- Nominal data, if the order is not important. Examples: data representing marital status, gender, work sector (public, private), get promoted (yes, no), etc.
- Ordinal data, if the order is important. Examples: data representing job performance (excellent, good, fair, poor), income level (low, medium, high), educational level (less than high school, high school, college), etc.
- Interval data: all of the characteristics of ordinal plus:
  - Measurements are on a numerical scale with an arbitrary zero point.
    - The "zero" is assigned: it is nonphysical and not meaningful.
    - Zero does not mean the absence of the quantity that we are trying to measure.
  - Values can only be meaningfully compared by the interval between them.
    - Values cannot be compared by taking their ratios.
    - "Interval" is the arithmetic difference between the values.
  - Example: temperature.
    - 0 °F means "cold," not "no heat."
    - 80 °F is not twice as warm as 40 °F.
- Ratio data: all the characteristics of interval plus:
  - Measurements are on a numerical scale with a meaningful zero point.
    - Zero means "none" or "nothing."
  - Values can be compared in terms of their interval and ratio.
    - $30 is $20 more than $10.
    - $0 means no money.
  - In business and finance, most quantitative variables are ratio variables, such as anything to do with money.
  - Examples: earnings, profit, loss, age, distance, height, weight.

Exercise
After the graduation ceremonies at a university, six Business graduates were asked whether they will join an MBA program next year. Some information about these graduates is shown below.

  Graduate   Sex   Age   MBA   Rank
  Huda       F     52    1     1
  Mohamed    M     24    1     2
  Sara       F     33    0     4
  Ali        M     38    0     20
  Fatima     F     25    1     3
  Samer      M     19    0     8

a) How many elements are in the data set?
b) How many variables are in the data set?
c) How many observations are in the data set?
d) Classify the above variables (qualitative/quantitative).

Sampling
- Reasons for drawing a sample:
  - It may cost too much to collect information from each element of the population.
  - The population may be too large and it would take a long time to collect information.
  - It may not be possible to obtain information from some elements of the population.
- Probability samples: simple random, systematic, stratified, cluster.

Simple Random Samples
- Every individual or item from the frame has an equal chance of being selected.
- Selection may be with replacement or without replacement.
- Samples can be obtained from computer random number generators.

Systematic Samples
- Decide on the sample size: n.
- Divide the frame of N individuals into groups of k individuals: k = N/n.
- Randomly select one individual from the 1st group.
- Select every kth individual thereafter.
- Example: N = 64, n = 8, k = 8.
[Figure: frame of 64 individuals with the first group marked.]

Stratified Samples
- The population is divided into two or more subgroups (called strata) according to some common characteristic.
- A simple random sample is selected from each subgroup.
- Samples from the subgroups are combined into one.
[Figure: population divided into 4 strata, with the combined sample.]

Cluster Samples
- The population is divided into "clusters," each representative of the population.
- A simple random sample of clusters is selected.
- All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique.
[Figure: population divided into 16 clusters.]
[Figure: randomly selected clusters for the sample.]

Advantages and Disadvantages
- Simple random sample and systematic sample:
  - Simple to use.
  - May not be a good representation of the population's underlying characteristics that have small probabilities.
- Stratified sample:
  - Ensures representation of individuals across the entire population.
- Cluster sample:
  - More cost effective.
  - Less efficient (a larger sample is needed to acquire the same level of precision).

Chapter 2: Descriptive Statistics: Tabular and Graphical Methods

Organizing and Presenting Data
- Data in raw form are usually not easy to use for decision making.
- Some type of organization is needed:
  - Table
  - Graph
- Techniques reviewed here:
  - Stem-and-leaf display
  - Frequency distributions and histograms
  - Bar charts and pie charts
  - Contingency tables and scatter diagrams

Representing Qualitative Data
- Tabulating data: frequency table.
- Graphing data: bar charts, pie charts.

Frequency Tables
- A frequency table consists of two columns, one of which shows the categories or classes and the other specifies the frequency for each category.
- In a frequency table, all frequencies must add up to the sample size (n).
- A relative frequency table consists of two columns, one of which shows the categories or classes and the other specifies the relative frequency for each category.
- Relative frequency = frequency / sample size.

Example
The following table lists all 251 vehicles sold in 2006 by the greater Cincinnati Jeep dealers.

  Jeep Model       Frequency
  Commander        71
  Grand Cherokee   70
  Liberty          80
  Wrangler         30
  Total            251

Example: Relative Frequency Table

  Jeep Model       Relative Frequency   Percent Frequency
  Commander        0.2829               28.29%
  Grand Cherokee   0.2789               27.89%
  Liberty          0.3187               31.87%
  Wrangler         0.1195               11.95%
  Total            1.0000               100.00%

Bar Charts and Pie Charts
- Bar chart: a vertical or horizontal rectangle represents the frequency for each category.
  - Height can be frequency, relative frequency, or percent frequency.
  - What to look for: frequently and infrequently occurring categories.
- Pie chart: a circle divided into slices where the size of each slice represents its relative frequency or percent frequency.
  - What to look for: categories that form large and small proportions of the data set.

[Figure: Excel bar chart.]
[Figure: Excel pie chart.]

Exercise
A random sample of 25 female shoppers was selected on a given day and each shopper was asked: "What is your favorite shampoo?" The data were as follows:
p, p, s, d, s, d, d, s, p, d, p, d, d, s, d, p, s, s, d, s, p, d, d, s, d
where d = Dove, p = Pantene and s = Sunsilk. Construct a frequency table, a bar chart and a pie chart and comment on the plots.

Representing Quantitative Data
- Ordered array
- Stem-and-leaf display
- Frequency distributions and cumulative distributions:
  - Histogram
  - Polygon
  - Ogive

Frequency Distributions
- A frequency distribution is a list or a table containing:
  - class groupings (categories or ranges within which the data fall), and
  - the corresponding frequencies with which data fall within each grouping or category.

Why Use Frequency Distributions?
- A frequency distribution is a way to summarize data.
- The distribution condenses the raw data into a more useful form.
- It allows for a quick visual interpretation of the data and easy graphical display.

Class Intervals and Class Boundaries
- If each class grouping has the same width, determine the width of each interval by
  \[ \text{width of interval} = \frac{\text{range}}{\text{number of desired class groupings}} \]
- Use at least 5 but no more than 15 groupings.
- Class boundaries never overlap.
- Round up the interval width to get desirable endpoints.

Frequency Distribution Example
A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature.
- Sort the raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
- Find the range: 58 - 12 = 46
- Select the number of classes: 5 (usually 5 to 15)
- Compute the class interval (width): 10 (46/5 = 9.2, then round up)
- Compute the class boundaries (limits): 10, 20, 30, 40, 50, 60
- Compute the class midpoints: 15, 25, 35, 45, 55
- Count the observations and assign them to classes

Frequency Distribution Example
Ordered data: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

  Class                 Frequency   Relative Frequency   Percentage
  10 but less than 20   3           .15                  15
  20 but less than 30   6           .30                  30
  30 but less than 40   5           .25                  25
  40 but less than 50   4           .20                  20
  50 but less than 60   2           .10                  10
  Total                 20          1.00                 100

The Histogram
- A graph of the data in a frequency distribution is called a histogram.
- The class boundaries (or class midpoints) are shown on the horizontal axis.
- Frequency is measured on the vertical axis.
- Bars of the appropriate heights are used to represent the number of observations within each class.
- What to look for: central or typical value, extent of spread or variation, general shape, location and number of peaks, presence of gaps and outliers.
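
The counting step of the temperature example can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_distribution` and its parameters are assumptions made here, not part of the course material, and the class limits (lower boundary 10, width 10, 5 classes) are the ones chosen in the example.

```python
# Daily high temperatures (20 winter days) from the frequency distribution example.
temps = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
         32, 35, 37, 38, 41, 43, 44, 46, 53, 58]


def frequency_distribution(data, lower, width, n_classes):
    """Count the values in each class [lower + k*width, lower + (k+1)*width)."""
    counts = [0] * n_classes
    for x in data:
        k = (x - lower) // width  # index of the class that contains x
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts


lower, width, n_classes = 10, 10, 5  # class boundaries 10, 20, ..., 60
counts = frequency_distribution(temps, lower, width, n_classes)
n = len(temps)
relative = [c / n for c in counts]  # relative frequency = frequency / sample size
midpoints = [lower + width * k + width / 2 for k in range(n_classes)]

for k in range(n_classes):
    lo = lower + width * k
    # The row of asterisks acts as a rough text histogram.
    print(f"{lo} but less than {lo + width}: {counts[k]:2d}  "
          f"{relative[k]:.2f}  {'*' * counts[k]}")
```

Because each class includes its lower boundary and excludes its upper boundary, a value such as 30 is counted in "30 but less than 40", which matches the rule that class boundaries never overlap.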
Histogram Example

  Class                 Class Midpoint   Frequency
  10 but less than 20   15               3
  20 but less than 30   25               6
  30 but less than 40   35               5
  40 but less than 50   45               4
  50 but less than 60   55               2

[Figure: Histogram of daily high temperature; horizontal axis: class midpoints, vertical axis: frequency; no gaps between bars.]

Shapes of Histograms
[Figure: symmetric histograms and skewed histograms.]

Frequency Polygons
- Plot a point above each class midpoint at a height equal to the frequency of the class.
- Useful when comparing two or more distributions.

Cumulative Distributions and Ogive
- Another way to summarize a distribution is to construct a cumulative distribution.
- Rather than the count in each class, we record the number of measurements that are less than the upper boundary of that class.
- Ogive: a graph of a cumulative distribution.
  - Plot a point above each upper class boundary at the height of the cumulative frequency.
  - Connect the points with line segments.
  - Can also be drawn using:
    - cumulative relative frequencies
    - cumulative percent frequencies

Cumulative Frequency
Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

  Class      Frequency   %     Cumulative class   Cumulative Frequency   Cumulative %
  10 - <20   3           15    less than 20       3                      15
  20 - <30   6           30    less than 30       9                      45
  30 - <40   5           25    less than 40       14                     70
  40 - <50   4           20    less than 50       18                     90
  50 - <60   2           10    less than 60       20                     100
  Total      20          100

Graphing Cumulative Frequencies: The Ogive (Cumulative % Polygon)

  Class          Class boundary   Cumulative Percentage
  less than 10   10               0
  less than 20   20               15
  less than 30   30               45
  less than 40   40               70
  less than 50   50               90
  less than 60   60               100

[Figure: Ogive of daily high temperature; horizontal axis: class boundaries 10 to 60, vertical axis: cumulative percentage 0 to 100.]

Exercise
A random sample of 25 stocks was selected from the New York Stock Exchange and the book value (net worth divided by the number of outstanding shares) was recorded for each stock.
The data were as follows:
10 11 8 16 14 4 10 8 12 9 7 14 13 7 10 17 8 11 9 15 8 6 18 9 12
- Construct a frequency table.
- Construct a histogram and describe the distribution.
- Determine the cumulative frequency table.

Stem and Leaf Display
- The purpose is to see the overall pattern of the data, by grouping the data into classes:
  - the variation from class to class
  - the amount of data in each class
  - the distribution of the data within each class
- What to look for: the display conveys information about a representative or typical value in the data set, the extent of spread about such a value, the presence of any gaps in the data, the extent of symmetry in the distribution of values, the number and location of peaks, and the presence of any outliers (unusual points).

Example
Data in ordered array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
- Here, use the 10's digit for the stem unit:
  - 21 is shown as stem 2, leaf 1
  - 38 is shown as stem 3, leaf 8

  Stem   Leaves
  2      1 4 4 6 7 7
  3      0 2 8
  4      1

Car Mileage: Results
- Refer to the Car Mileage Case (Table 2.14).
- Looking at the stem-and-leaf display, the distribution appears almost "symmetrical."
  - The upper portion (29, 30, 31) is almost a mirror image of the lower portion of the display (31, 32, 33).
    - Stems 31, 32*, 32, and 33*
  - But it is not exactly a mirror reflection.

Crosstabulation Tables
- Classifies data on two dimensions:
  - Rows classify according to one dimension.
  - Columns classify according to a second dimension.
- Requires three variables:
  1. The row variable
  2. The column variable
  3. The variable counted in the cells

Example: The Investor Satisfaction Case
- An investment broker sells several kinds of investments (stock fund, bond fund, tax-deferred annuity).
- The broker wishes to study whether satisfaction depends on the type of investment product purchased.

  Fund Type              High   Medium   Low   Total
  Bond Fund              15     12       3     30
  Stock Fund             24     4        2     30
  Tax Deferred Annuity   1      24       15    40
  Total                  40     40       20    100

More on Crosstabulation Tables
- Row totals provide a frequency distribution for the different fund types.
- Column totals provide a frequency distribution for the different satisfaction levels.
- One way to investigate relationships is to compute row and column percentages:
  - Compute row percentages by dividing each cell's frequency by its row total and expressing the result as a percentage.
  - Compute column percentages by dividing by the column total.

Row Percentage for Each Fund Type

  Fund Type              High    Medium   Low     Total
  Bond Fund              50.0%   40.0%    10.0%   100%
  Stock Fund             80.0%   13.3%    6.7%    100%
  Tax Deferred Annuity   2.5%    60.0%    37.5%   100%

Scatter Plots
- Scatter plots are used for bivariate numerical data.
  - Bivariate data consist of paired observations taken from two numerical variables.
- The scatter plot:
  - One variable (dependent) is measured on the vertical axis and the other variable (independent) is measured on the horizontal axis.
- What to look for:
  - Describe the type of the relationship (linear, nonlinear), the direction (positive, negative) and the strength (strong, moderate, weak).
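
The row percentages in the investor satisfaction table can be checked with a short Python sketch. The dictionary layout and the helper name `row_percentages` are illustrative choices, not part of the course material; the cell counts are taken from the crosstabulation table.

```python
# Cell counts from the investor satisfaction crosstabulation
# (rows: fund type; columns: High, Medium, Low satisfaction).
counts = {
    "Bond Fund": [15, 12, 3],
    "Stock Fund": [24, 4, 2],
    "Tax Deferred Annuity": [1, 24, 15],
}


def row_percentages(table):
    """Divide each cell by its row total and express it as a percentage."""
    result = {}
    for name, row in table.items():
        total = sum(row)
        result[name] = [round(100 * cell / total, 1) for cell in row]
    return result


row_pct = row_percentages(counts)
row_totals = {name: sum(row) for name, row in counts.items()}
column_totals = [sum(col) for col in zip(*counts.values())]

for name, pct in row_pct.items():
    print(f"{name:22s}", "  ".join(f"{p:5.1f}%" for p in pct))
```

Reading across a row shows how satisfaction is distributed within one fund type: for example, 80% of stock fund investors report high satisfaction, against 2.5% of tax deferred annuity investors.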
Examples of Scatter Plots
[Figure: example scatter plots describing direction and describing strength.]

Scatter Plot Example

  Volume per day   Cost per day
  23               125
  26               140
  29               146
  33               160
  38               167
  42               170
  50               188
  55               195
  60               200

[Figure: scatter plot of cost per day against volume per day, showing a strong positive linear relationship.]

Chapter 3: Descriptive Statistics: Numerical Methods

Summary Measures
Describing data numerically:
- Center and location: mean, median, mode
- Variation: range, interquartile range, variance, standard deviation, coefficient of variation
- Measures of relative standing: percentiles, quartiles

Measures of Central Tendency
- In addition to describing the shape of a distribution, we want to describe the data set's central tendency.
- A measure of central tendency represents the center or middle of the data.
- Measures of central tendency: mean, median, mode.

Mean (Arithmetic Average)
- The mean is the arithmetic average of the data values.
- Sample mean (n = sample size):
  \[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
- Population mean (N = population size):
  \[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N} \]

Arithmetic Mean
- The most common measure of central tendency.
- Mean = sum of values divided by the number of values.
- Affected by extreme values (outliers):
  - Data 1, 2, 3, 4, 5: mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
  - Data 1, 2, 3, 4, 10: mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4

Median
- Not affected by extreme values.
[Figure: two dot plots on a 0 to 10 axis; Median = 3 in both.]
- In an ordered array, the median is the "middle" number (50% above, 50% below).

Finding the Median
- The location of the median:
  \[ \text{median position} = \frac{n + 1}{2} \text{ position in the ordered array} \]
- If the number of values is odd, the median is the middle number.
- If the number of values is even, the median is the average of the two middle numbers.
- Note that (n + 1)/2 is not the value of the median, only the position of the median in the ranked data.

Mode
- A measure of central tendency.
- The value that occurs most often.
- Not affected by extreme values.
- Mainly used for grouped numerical data or categorical data.
- There may be no mode.
- There may be several modes.
[Figure: two dot plots, one with Mode = 9 and one with no mode.]

Review Example
- Five houses on a hill by the beach.
- House prices: $2,000,000; $500,000; $300,000; $100,000; $100,000

Example: Summary Statistics
House prices: $2,000,000; $500,000; $300,000; $100,000; $100,000 (sum $3,000,000)
- Mean: $3,000,000/5 = $600,000
- Median: middle value of ranked data = $300,000
- Mode: most frequent value = $100,000

Which measure is the "best"?
- The mean is generally used, unless extreme values (outliers) exist.
- Then the median is often used, since the median is not sensitive to extreme values.
- For a relatively small number of extreme observations (either very small or very large, but not both), the median is usually better.
- Choosing:
  - The mode is meaningful on a nominal scale.
  - The median is meaningful on an ordinal scale.
  - The mean is meaningful on an interval/ratio scale.

Shape of a Distribution
- Describes how data are distributed: symmetric or skewed.
- If the distribution is symmetric, then mean = median.
- If the distribution is skewed to the right, then mode < median < mean.
- If the distribution is skewed to the left, then mode > median > mean.

Exercise
The following data represent the ages of 20 randomly selected managers:
43 44 49 37 45 35 46 32 47 42 39 40 41 45 41 43 50 47 41 51
a) Find the mean, median and mode for the above data.
b) Which measure would you choose to describe the data? Why?
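
As a hedged illustration of the three measures, the house price example can be reproduced with Python's standard `statistics` module. The variable names and the helper `median_position` are assumptions introduced here for the sketch.

```python
import statistics

# House prices from the review example.
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(prices)      # sum of values / number of values
median_price = statistics.median(prices)  # middle value of the ranked data
mode_price = statistics.mode(prices)      # most frequent value


def median_position(n):
    """Position (not value) of the median in the ordered array: (n + 1) / 2."""
    return (n + 1) / 2


# One extreme value moves the mean but leaves the median unchanged.
without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

print(mean_price, median_price, mode_price)
print(statistics.mean(without_outlier), statistics.mean(with_outlier))
print(statistics.median(without_outlier), statistics.median(with_outlier))
```

The single $2,000,000 house pulls the mean up to $600,000, twice the median, which is why the median is often preferred when a few extreme values are present.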
Measures of Variability
- Knowing the measures of center is not enough: two distributions can have identical measures of central tendency and still differ in spread.
- Measures of variation: range, variance, standard deviation, coefficient of variation.
[Figure: two distributions with identical measures of central tendency but different variation.]

Range
- The simplest measure of variation.
- Difference between the largest and the smallest observations:
  Range = maximum - minimum
- Example: for data plotted on a 0 to 14 axis with minimum 1 and maximum 14, Range = 14 - 1 = 13.

Disadvantages of the Range
- Ignores the way in which data are distributed.
[Figure: two differently distributed data sets on a 7 to 12 axis, both with Range = 12 - 7 = 5.]
- Sensitive to outliers:
  - 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5: Range = 5 - 1 = 4
  - 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120: Range = 120 - 1 = 119

Variance
- Average of squared deviations of values from the mean.
- Population variance:
  \[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]
  where \( \mu \) = population mean, N = population size, \( X_i \) = ith value of the variable X.
- Sample variance:
  \[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]
  where \( \bar{X} \) = arithmetic mean, n = sample size, \( X_i \) = ith value of the variable X.

Standard Deviation
- The most commonly used measure of variation.
- The square root of the variance.
- Shows variation about the mean.
- Has the same units as the original data.
- Sample standard deviation:
  \[ S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \]

Example: Sample Standard Deviation
Sample data \( (X_i) \): 10 12 14 15 17 18 18 24, with n = 8 and mean \( \bar{X} = 16 \).
\[ S = \sqrt{\frac{(10 - \bar{X})^2 + (12 - \bar{X})^2 + (14 - \bar{X})^2 + \cdots + (24 - \bar{X})^2}{n - 1}} = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095 \]

Comparing Standard Deviations
- Data A: Mean = 15.5, S = 3.338
- Data B: Mean = 15.5, S = 0.9258
- Data C: Mean = 15.5, S = 4.57
[Figure: dot plots of Data A, B and C on a common axis from 11 to 21.]

Coefficient of Variation
- Measures relative variation.
- Always a percentage (%).
- Shows variation relative to the mean.
- Is used to compare two or more sets of data measured in different units.
  \[ CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\% \]

Comparing Coefficients of Variation
- Stock A:
  - Average price last year = $50
  - Standard deviation = $5
  \[ CV_A = \left( \frac{S}{\bar{X}} \right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\% \]
- Stock B:
  - Average price last year = $100
  - Standard deviation = $5
  \[ CV_B = \left( \frac{S}{\bar{X}} \right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\% \]
- Both stocks have the same standard deviation, but stock B is less variable relative to its price.

The Empirical Rule
If the data distribution is bell-shaped, then the interval:
a) \( (\mu - \sigma, \mu + \sigma) \) contains about 68.26% of the values in the population.
b) \( (\mu - 2\sigma, \mu + 2\sigma) \) contains about 95.44% of the values in the population.
c) \( (\mu - 3\sigma, \mu + 3\sigma) \) contains about 99.74% of the values in the population.

Example
IQs measured on the Stanford Revision of the Binet–Simon Intelligence Scale have a mean of 100 points and a standard deviation of 16 points. The interval:
a) (84, 116) contains about 68.26% of the IQ scores.
b) (68, 132) contains about 95.44% of the IQ scores.
c) (52, 148) contains about 99.74% of the IQ scores.
The scores of 25 randomly selected people are shown below.
66 82 86 88 91 95 96 96 97 98 101 102 102 104 105 106 111 112 115 116 118 121 124 127 129
a) 18 scores (72%) fall in the interval (84, 116).
b) 24 scores (96%) fall in the interval (68, 132).
c) 25 scores (100%) fall in the interval (52, 148).

Exercise
The exam scores for the students in an introductory statistics course are as follows.
88 90 67 63 64 89 76 90 86 84 85 81 82 96 39 100 75 70 34 96
a) Compute the descriptive statistics for the given exam scores.
b) Apply the empirical rule and check the consistency with the sample results. Explain your conclusion.
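
The variation measures above can be checked numerically with a minimal Python sketch. The helper names `coefficient_of_variation` and `share_within` are assumptions made for this illustration. For the eight sample values 10, 12, ..., 24 with mean 16, the squared deviations sum to 130, so the sample standard deviation is the square root of 130/7. The interval check treats the endpoints as included, which is what the count of 18 scores in (84, 116) implies, since one score equals 116.

```python
import statistics

# Sample from the standard deviation example (n = 8, mean = 16).
sample = [10, 12, 14, 15, 17, 18, 18, 24]
squared_deviations = sum((x - statistics.mean(sample)) ** 2 for x in sample)
s = statistics.stdev(sample)  # sample standard deviation (divisor n - 1)


def coefficient_of_variation(std_dev, mean):
    """Relative variation: standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(5, 50)   # stock A: S = $5, mean price = $50
cv_b = coefficient_of_variation(5, 100)  # stock B: S = $5, mean price = $100

# IQ scores of the 25 people in the empirical rule example.
iq = [66, 82, 86, 88, 91, 95, 96, 96, 97, 98, 101, 102, 102,
      104, 105, 106, 111, 112, 115, 116, 118, 121, 124, 127, 129]


def share_within(values, mean, std_dev, k):
    """Count and percentage of values in [mean - k*std_dev, mean + k*std_dev]."""
    lo, hi = mean - k * std_dev, mean + k * std_dev
    count = sum(1 for v in values if lo <= v <= hi)
    return count, 100 * count / len(values)


print(squared_deviations, round(s, 4), cv_a, cv_b)
for k in (1, 2, 3):
    # The stated mean (100) and standard deviation (16) are used, not sample estimates.
    print(k, share_within(iq, 100, 16, k))
```

The observed shares (72%, 96%, 100%) are close to the 68.26%, 95.44% and 99.74% the empirical rule predicts for bell-shaped data, which is the kind of consistency check part b) of the exercise asks for.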
Measures of Relative Standing
- Percentiles: for the pth percentile in a data set (where 0 ≤ p ≤ 100):
  - p% of the values are less than or equal to this value
  - (100 - p)% of the values are greater than or equal to this value
- Quartiles:
  - 1st quartile = 25th percentile
  - 2nd quartile = 50th percentile = median
  - 3rd quartile = 75th percentile

Percentiles
- The pth percentile in an ordered array of n values is the value in the ith position, where
  \[ i = \frac{p}{100}(n + 1) \]
- Example: the 60th percentile in an ordered array of 19 values is the value in the 12th position:
  \[ i = \frac{p}{100}(n + 1) = \frac{60}{100}(19 + 1) = 12 \]
- In Excel, write =PERCENTILE(array, k), where array is the range of data and k is the percentile value in the range 0-1.

Quartiles
- Quartiles split the ranked data into 4 equal groups.
[Figure: ranked data split into four groups of 25% each by Q1, Q2 and Q3.]
- Example: find the first quartile.
  Sample data in ordered array: 11 12 13 16 16 17 18 21 22 (n = 9)
  Q1 = 25th percentile, so find the (25/100)(9 + 1) = 2.5 position. Use the value halfway between the 2nd and 3rd values, so Q1 = 12.5.

Interquartile Range and Fences
- The interquartile range is the difference between the third and first quartiles: IQR = Q3 - Q1.
- Inner fences: located 1.5 × IQR away from the quartiles:
  - Q1 - (1.5 × IQR)
  - Q3 + (1.5 × IQR)
- Outer fences: located 3 × IQR away from the quartiles:
  - Q1 - (3 × IQR)
  - Q3 + (3 × IQR)

Outliers
- Outliers are measurements that are very different from other measurements.
  - They are either much larger or much smaller than most of the other measurements.
- Outliers lie beyond the fences of the box-and-whiskers plot.
  - Measurements between the inner and outer fences are mild outliers.
  - Measurements beyond the outer fences are severe outliers.
- The adjacent values are:
  - The smallest data point that falls above the lower fence.
  - The largest data point that falls below the upper fence.
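
The position rule and the fences can be sketched in Python as follows. This is a minimal illustration, not a library routine: the function names `percentile` and `classify` are assumptions, fractional positions are resolved by linear interpolation between the two neighbouring values (the "halfway" step in the Q1 example), and positions are clamped to the range 1 to n.

```python
def percentile(ordered, p):
    """p-th percentile using the position rule i = (p / 100) * (n + 1)."""
    n = len(ordered)
    i = p * (n + 1) / 100
    i = min(max(i, 1), n)   # clamp the position to 1..n
    below = int(i)          # 1-based position at or just below i
    fraction = i - below
    if fraction == 0:
        return ordered[below - 1]
    # Interpolate between the values at positions `below` and `below + 1`.
    return ordered[below - 1] + fraction * (ordered[below] - ordered[below - 1])


def classify(x, q1, q3):
    """Label x as a severe outlier, a mild outlier, or not an outlier."""
    iqr = q3 - q1
    if x < q1 - 3 * iqr or x > q3 + 3 * iqr:      # beyond the outer fences
        return "severe outlier"
    if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:  # between inner and outer fences
        return "mild outlier"
    return "not an outlier"


# Ordered sample from the quartile example (n = 9).
data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1 = percentile(data, 25)   # position 2.5
q3 = percentile(data, 75)   # position 7.5
iqr = q3 - q1
inner_fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outer_fences = (q1 - 3 * iqr, q3 + 3 * iqr)

print(q1, q3, iqr, inner_fences, outer_fences)
```

For this sample the inner fences are 2 and 30, so every observation lies inside them; a hypothetical value of 35 would be labelled a mild outlier and 45 a severe one.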
Box and Whisker Plot (Boxplot)
- A graphical display of data using the 5-number summary:
  Minimum -- Q1 -- Median -- Q3 -- Maximum
- The box plots the:
  - First quartile (Q1), median (Md), third quartile (Q3).
  - Inner fences, outer fences.
- The "whiskers" are dashed lines that plot the range of the data:
  - A dashed line drawn from the box below Q1 down to the minimum.
  - Another dashed line drawn from the box above Q3 up to the maximum.

Distribution shapes and boxplots
[Figure: distribution shapes and the corresponding boxplots.]

How to construct a Boxplot?
1. Determine the quartiles.
2. Determine the potential outliers and the adjacent values.
3. Draw a horizontal axis on which the numbers obtained in Steps 1 and 2 can be located. Above this axis, mark the quartiles and the adjacent values with vertical lines.
4. Connect the quartiles to each other to make a box, and then connect the box to the adjacent values with lines.
5. Plot each potential outlier with an asterisk.

Example: Box-and-Whiskers Plots
[Figure: example box-and-whiskers plots.]

Example
A sample of 20 people yielded the weekly viewing times, in hours:
25 41 27 32 43 66 35 31 15 5 34 26 32 38 16 30 38 30 20 21
- The five-number summary is: Minimum = 5, Q1 = 24, Median = 30.5, Q3 = 35.75, Maximum = 66
- IQR = 35.75 - 24 = 11.75
- 1.5 × IQR = 1.5 × 11.75 = 17.625
- Lower fence = Q1 - 1.5 × IQR = 24 - 17.625 = 6.375
- Upper fence = Q3 + 1.5 × IQR = 35.75 + 17.625 = 53.375
- The observations 5 and 66 lie beyond the inner fences and hence should be classified as outliers.
- The adjacent values are 15 and 43.

Example: Excel output
[Figure: Excel boxplot of the viewing times.]
- The distribution of the viewing times is right skewed with two outliers.

Exercise
IQs measured on the Stanford Revision of the Binet–Simon Intelligence Scale. The scores of 25 randomly selected people are shown below.
66 82 86 88 91 95 96 96 97 98 101 102 102 104 105 106 111 112 115 116 118 121 124 127 129
Identify potential outliers, if any, and construct and interpret a boxplot.
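
The numbers in the viewing times example can be reproduced with the following Python sketch, which uses only the standard `statistics` module (Python 3.8 or later). One assumption is worth stating: the quartiles quoted in the example (24 and 35.75) match the "inclusive" interpolation method, which appears to be what the Excel output reflects; the (n + 1) position rule would instead give Q1 = 22 for these data. Variable names are illustrative.

```python
import statistics

# Weekly viewing times (hours) for the sample of 20 people, sorted.
times = sorted([25, 41, 27, 32, 43, 66, 35, 31, 15, 5,
                34, 26, 32, 38, 16, 30, 38, 30, 20, 21])

# "inclusive" interpolation reproduces the quartiles quoted in the example.
q1, median, q3 = statistics.quantiles(times, n=4, method="inclusive")
five_number = (times[0], q1, median, q3, times[-1])

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Potential outliers lie beyond the inner fences.
outliers = [t for t in times if t < lower_fence or t > upper_fence]
# Adjacent values: the most extreme observations still inside the fences.
inside = [t for t in times if lower_fence <= t <= upper_fence]
adjacent = (inside[0], inside[-1])

print("Five-number summary:", five_number)
print("IQR:", iqr, "fences:", (lower_fence, upper_fence))
print("Outliers:", outliers, "adjacent values:", adjacent)
```

The box would span 24 to 35.75 with the median line at 30.5, the whiskers would end at the adjacent values 15 and 43, and the two outliers 5 and 66 would be plotted individually.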
