Descriptive Statistics
Unit 6

Variable
• Any characteristic (data) recorded for the subjects of a study
  ex. blood pressure, nesting orientation, phytoplankton count
• Can be classified as either:
  - categorical
  - quantitative:
    * discrete
    * continuous

Categorical Variable
• Data belongs to one of a set of categories
  Exs:
  1. Gender (Male or Female)
  2. Pets owned (dog, cat, great white…)
  3. Type of food imported (beef, pork, shellfish …)
  4. Engage in 30 minutes of exercise daily (Yes or No)
• Type of graph(s): bar, pie

Pie Charts
• Summarizes a categorical variable
• Drawn as a circle where each category is a slice
• The size of each slice is proportional to the percentage in that category

Bar Graphs
• Summarizes a categorical variable
• Vertical bars for each category
• Height of each bar represents either counts or percentages
• Easier to compare categories with a bar graph than with a pie chart
• Called Pareto Charts when ordered from tallest to shortest

Quantitative Variable
• Data is given numerical values that represent different magnitudes
  Exs:
  1. Age of test subjects
  2. Number of siblings
  3. Seasonal changes in pH of pond water
• Type of graph(s): scatter-plot, line, stem and leaf

Quantitative vs. Categorical
• For quantitative variables, key features are the center (a representative value) and spread (variability)
• For categorical variables, a key feature is the percentage of data in each of the categories

Discrete Quantitative Variable
• A quantitative variable is discrete if its possible values form a set of separate numbers: 0, 1, 2, 3, …
  Exs:
  1. Number of calico cats sold
  2. Number of nests with down linings
  3. Number of students who fall asleep in Stats class

Continuous Quantitative Variable
• A quantitative variable is continuous if its possible values form an interval
• Typically measurements
  Exs:
  1. Height/Weight
  2. Age
  3. Blood pressure

Most Common Ways to Describe Data
• Central tendency
• Statistical variation

Central Tendency
• Used to represent entire data set
• Highlights distribution of data
• Measured by one of the following: mode, mean, or median

Mode
• Value that occurs most often
• Highest bar in the histogram
• Mode is most often used with categorical data
• Best if not used alone
  12, 12, 13, 14, 14, 15, 15, 15, 15, 37, 38
  2, 3, 3, 4, 5, 5 - bimodal
  65, 68, 69, 71, 72, 73, 75, 77 - mode?

Mean
• The sum of the observations divided by the number of observations
• Measure of centermost point when there is a symmetrical distribution of values in a data set
• Mean = Σx / n
  Σ - sum
  n - total number of values
  Ex. 8 g/cm³, 10 g/cm³, 7 g/cm³, 9 g/cm³
  Mean = (8 g/cm³ + 10 g/cm³ + 7 g/cm³ + 9 g/cm³) / 4 = 34 g/cm³ / 4 = 8.5 g/cm³

Median
• Midpoint of the observations when ordered from least to greatest
• Used when there are extremes in data
  1. Order observations
  2. If the number of observations is:
     a) Odd, the median is the middle observation
     b) Even, the median is the average of the two middle observations

Central Tendency
• If data set has a normal distribution: mean, median, and mode are the same value
• If data set is not distributed normally: the values of central tendency will differ from one another
  * requires inferential statistics: t-test, ANOVA

Comparing the Mean and Median
• Mean and median of a symmetric distribution are close
• In a skewed distribution: the mean is pulled farther toward the long tail than the median

Statistical Variation
• Shows how scores differ from one another
• AKA: variation, dispersion, spread
• Represents average difference from the mean
• Four measures of variation: range, interquartile range, standard deviation, variance

Range
• Most general measure of variation
• Measures difference between highest and lowest values: spread of data
  Ex. pH 6, 6, 6, 7, 7, 7, 7, 5, 3
  range: 7 - 3 = 4 pH units
• Range is strongly affected by outliers
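The central tendency and range calculations above can be sketched in Python using the example values from the Mode, Mean, and Range slides. This is only an illustrative sketch: the variable names and the `modes` helper are made up for this example, and treating a data set in which no value repeats as having "no mode" is an assumed convention, not something the slides state.

```python
from statistics import mean, median, multimode

# Example values copied from the Mode, Mean, and Range slides.
mode_example = [12, 12, 13, 14, 14, 15, 15, 15, 15, 37, 38]
bimodal_example = [2, 3, 3, 4, 5, 5]
no_repeat_example = [65, 68, 69, 71, 72, 73, 75, 77]
densities = [8, 10, 7, 9]                  # g/cm³
ph_values = [6, 6, 6, 7, 7, 7, 7, 5, 3]


def modes(values):
    """Return every most-frequent value (hypothetical helper).

    Assumed convention: if no value repeats, report no mode (empty list).
    """
    if len(set(values)) == len(values):
        return []
    return multimode(values)


def value_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)


print(modes(mode_example))        # [15]
print(modes(bimodal_example))     # [3, 5]
print(modes(no_repeat_example))   # []
print(mean(densities))            # 8.5
print(median(bimodal_example))    # 3.5 (average of the two middle values)
print(value_range(ph_values))     # 4
```

On the first Mode data set the mean is about 18.2 while the median is 15: the two large values 37 and 38 pull the mean upward, which is why the median is preferred when there are extremes in the data.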
Interquartile Range (IQR)
• AKA mid-fifty or midspread
• Organizes data into 4 quartiles, each with 25% of data
• To calculate IQR:
  1. Find median of entire data set
  2. Find median of lower half of set - lower quartile
  3. Find median of upper half of set - upper quartile
  4. Subtract the lower quartile from the upper quartile

Measure of Spread: Quartiles
* 25% of the data at or below Q1 and 75% above
* 50% of the obs are above the median and 50% are below
* 75% of the data at or below Q3 and 25% above
  Ex. Q1 = first quartile = 2.2
      M = median = 3.4
      Q3 = third quartile = 4.35

Calculating Interquartile Range
• Interquartile range: distance between the third and first quartile, giving spread of middle 50% of the data
• IQR = Q3 - Q1
  For the example quartiles: IQR = 4.35 - 2.2 = 2.15

Standard Deviation
• Each data value has an associated deviation from the mean, x - x̄
• A deviation is positive if the value falls above the mean and negative if it falls below the mean
• The sum of the deviations is always zero
• Standard deviation: summarizes the deviations of each observation from the mean and calculates an adjusted average of these deviations:
  1. Find mean
  2. Find each deviation
  3. Square deviations
  4. Sum squared deviations
  5. Divide sum by n - 1
  6. Take square root

Outlier
• An outlier falls far from the rest of the data

Graphs for Quantitative Data
1. Dot Plot: shows a dot for each observation placed above its value on a number line
2. Stem-and-Leaf Plot: displays individual observations
3. Histogram: uses bars to portray the data

Which Graph?
• Dot-plot and stem-and-leaf plot:
  - More useful for small data sets
  - Data values are retained
• Histogram:
  - More useful for large data sets
  - Most compact display
  - More flexibility in defining intervals

Dot Plots
To construct a dot plot:
1. Draw and label horizontal line
2. Mark regular values
3. Place a dot above each value on the number line
(Figure: dot plot, Sodium in Cereals)

Stem-and-leaf plots
• Summarizes quantitative variables
• Separates each observation into a stem (first part of #) and a leaf (last digit)
• Write each leaf to the right of its stem; order leaves if desired
(Figure: stem-and-leaf plot, Sodium in Cereals)
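The quartile, standard deviation, and stem-and-leaf steps above can be sketched in Python as follows. This is a rough sketch under stated assumptions, not a definitive method: the function names are invented for illustration, the data are reused from the earlier Mode, Mean, and Range examples (the sodium values behind the figures are not given), and with an odd number of observations the middle value is left out of both halves, which is one of several conventions for quartiles.

```python
from math import sqrt
from statistics import median, stdev

ph_values = [6, 6, 6, 7, 7, 7, 7, 5, 3]      # from the Range example
densities = [8, 10, 7, 9]                     # g/cm³, from the Mean example
counts = [12, 12, 13, 14, 14, 15, 15, 15, 15, 37, 38]  # from the Mode example


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves steps.

    Assumption: for an odd count, the middle observation is excluded
    from both the lower and the upper half.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return median(lower), median(ordered), median(upper)


def sample_std_dev(values):
    """Follow the six standard deviation steps from the slide."""
    mean_value = sum(values) / len(values)           # 1. find mean
    deviations = [x - mean_value for x in values]    # 2. find each deviation
    squared = [d ** 2 for d in deviations]           # 3. square deviations
    total = sum(squared)                             # 4. sum squared deviations
    variance = total / (len(values) - 1)             # 5. divide sum by n - 1
    return sqrt(variance)                            # 6. take square root


def stem_and_leaf(values):
    """Return one text row per stem.

    Assumption: non-negative integers, stem = tens part, leaf = last digit.
    Stems with no observations are kept so gaps stay visible.
    """
    ordered = sorted(values)
    rows = []
    for stem in range(ordered[0] // 10, ordered[-1] // 10 + 1):
        leaves = [str(v % 10) for v in ordered if v // 10 == stem]
        rows.append(f"{stem} | {' '.join(leaves)}".rstrip())
    return rows


q1, med, q3 = quartiles(ph_values)
print(q1, med, q3, "IQR =", q3 - q1)
print(sample_std_dev(densities))
print("\n".join(stem_and_leaf(counts)))
```

The hand-written `sample_std_dev` should agree with `statistics.stdev`, which also divides by n - 1. In the stem-and-leaf output, the empty stem between the main cluster and 37, 38 shows how far those two values sit from the rest of the data.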