1 Day One Term/concept Description Statistics o Techniques used to summarize data to answer questions. o Techniques were developed because humans are limited info processors. 3 Purposes of Statistics 3 Stats skills learned in this o Identify when to use a statistic class o Conduct basic statistical analysis o Properly interpret data. Type of question MC MC MC Day Two Term Definition (MC) Variable Characteristics and condition of interest that varies within your sample. - Change/have different values depending on person. - Measured by 1+ questions on survey or interview. Values Possible categories that the variable can take on. - Possible survey answers. Identification (SA) 1. Explanatory (independent) variable: Causes/influences or b4 the other. 2. Outcome (dependent) variable: Caused, influenced, or followed by independent. 1. Look at the choices from which the respondent must answer. 2. Could be in numbers, words, etc. Population The larger group of people you 1. What overall group are you want your study to generalize looking at? to. 2. What group are you trying - Group you want to draw to draw conclusions about? conclusions about. Example Research showed that women are more likely to use Facebook for # of reasons. Identify explanatory/outcome: Explanatory = gender’s impact on reasons people use Facebook. Outcome = reasons women use Facebook. Survey asking likelihood of 1st years returning to DPU in the fall. Choices = very unlikely, somewhat likely, not sure, somewhat likely, very likely. Values = choices. Researchers conducted a survey of 735 Chicago residents who rent their housing about their spending habits. They want to use the data to better understand how Chicago renters spend their money. Sampling Error When you survey a small group of people uncertainty creeps into statistics. 1. The number of people that participated in the study. Population = Chicago renters Same example, but sample size. Usually measured by confidence variable. o E.g. you have certain % of confidence. Discrepancies due to random factors between a sample statistic and a population parameter. Sample Size A representation of all the population; must be chosen wisely. - Sample Subgroup of people from the population that were studied. 1. Conduct consensus If small, depends on time constraints/budget. 2. Use sample size from similar study. 3. Use table to find sample size. 4. Sample size calculator 5. Formula. The number of people that participated in the study Statistic Value that summarizes data from a sample. Parameter Value that summarizes a population. X X 2 Sample = 735 Chicago renters who participated. Same example, but sample size. Sample = 735 Chicago renters who participated. X X Measurement Description Nominal o Separates cases into categories. (=,) o Only provide information to distinguish one thing from another. o Values = differences in type or quantity BUT NOT AMOUNT. o When numbers are used they’re just place holders. Ordinal o Values provide enough (>,<) information to order objects. Whether more or less of characteristic is possessed. o You cannot tell the amount by which values differ. Interval o Differences between values = (-,+) meaningful. i.e. unit of measurement exists. o How much of the characteristic a case possesses. o You can tell differences in amount. Examples o Zip codes o Employee ID numbers o Eye color o Gender o Nationality Application o Mode o Frequency distribution o Entropy o Contingency o Correlation o X2 test. o o o o o o o o o o o Median Frequency distribution Percentiles Tank correlation Run tests Sign tests o o o o Mean Standard deviation Frequency distribution Pearson’s correlation Grades Street numbers Place in race Employee rank Class ranking o Calendar dates. o Temperature (C & F) o Dress size 3 o They have property of units – 1 always means the same. Ratio o Differences & ratios are (*,/) meaningful. o Same as interval, however it has absolute 0: 0 truly represents the absence of the characteristic. o Temperature (in Kelvin) o Monetary quantities o Counts o Age o Mass o Length o Income o Weight o Electrical current o o o o Mean Standard Deviation Frequency distribution Standard error of the mean. o Median & percentiles. o Ratio or coefficient of variation Day Three Terms (MC) Definition Ungrouped Frequency A count of how often each variable value Distribution occurs in a data set. Application Used when values a variable can take are limited. Example Survey asking # siblings students have – majority would probably say 1,2,3. Grouped Frequency The frequency counts are 4 adjacent Distribution groupings of values, or intervals, of the variable. Used when variable has large number of values and it’s acceptable to lose info by collapsing values into intervals. Survey asking the # of kids in graduating high school class; could range from 1500. Discrete Variable Answers the question: “How many?” Whole number values ONLY! Always: Nominal/ordinal-level variables. SOMETIMES interval/ratio are discrete. Answer to someone asking how many jeans you have. How many siblings do you have? How many neurons are in a spinal cord? How much aggression a person has. How much intelligence a person has. Person’s weight. Continuous Variable Answers the question: “How much?” They can take on values between whole numbers – they have fractional values. Fractional values = distance. Histogram Graphic display of a frequency distribution. Height = frequency in intervals. Touching bars = representative of Always interval/ratio level of measurement. Continuous variables. X 4 continuous variables Term Definition Frequency Percentage Frequency distributions Type of Graph Give a raw number. Number of individual cases located in a specific category of value of a variable. All other variables are basically based off this. How often specific value occurred in a sample. Proportions of cases in specific category or value of a variable out of all cases divided by 100. f/s Summarize set of data Tally of often the values, rage of values, occur. For everything except nominal-level data, they can display info about cumulative frequency, percentage and cumulative percentage. Display statistical information. Definition Application Discrete Bar Graph Bars don’t touch each other Used to demonstrate the frequency with which the different values of discrete variables occur. Histogram Clearly labeled axis; height = frequency. Continuous Graphic display of frequency distribution for continuous data. Bars touch. Continuous Frequency Polygon Frequency marked with dots on the midpoint of the interval. Dots connected by lines. Frequencies go to zero at the far left and far right of the graph. 5 Day Four Shape of f Definition Distributions Modality How many peaks exist in the curve of the frequency distribution. Peak: high point (or, mode) and it represents the score/interval with the largest frequency. Skewness Kurtosis Application Measure of symmetry. Asymmetric = skewed. Fancy term for how peaked or flat a distribution is. Outlier Normal Curve Extreme score that falls away from others on the data set. Perfectly symmetric. Not too peaked or too flat. Unimodal: only for normal curve; has one peak. Bimodal: two peaks. Multimodal: 3+ peaks. Positive Skew: tail goes to the right. Negative Skew: tail goes to the left. Leptokurtic: pointy peak. Mesokurtosis : Medium peak. Platykurtic: medium shape curve. X Interpretation Unimodal: one peak at the center of the distribution. X 6 Day Five Central Tendency: Value used to summarize a set of scores – also known as the average. Tell the typical or average score in the database. Often, but not always, located in the center of the database. There are three measures of CT. Levels of CT Description Interpretation Application o “The average of X is #.” o Interval & Ratio Mean o Mathematical average of the score. o Don’t over interpret Without skew o Sum of sample/sample size because not all scores & Outlier will fall in the mean. ONLY. Median o The middle score that separates o “The middle or central o Ordinal the bottom half of scores from score is #.” o Interval & Ratio the rest. With skew & o Odd # of scores = order scores outlier ONLY. high to low, middle = median. o Even # = order low to high, median = average of the middle. o “The most common o Only one that can use Mode o Score with the highest frequency or occurs most often. score is #.” nominal/categorical. o NOT what most people have its o “The score occurring o Any level of what I common, but doesn’t most often is #.” measurement. mean majority. Day Six Amount of spread in set of scores. Low Extent to which scores cluster together – less spread out. High Extent to which scores stretch out widely. Measurement of Variability Definition Interpretation Application o “The o Interval & Range o Most simple to calculate. distance Ratio o The distance between between ONLY the highest and lowest the highest variable. and lowest score is ___.” o Interval & Variance o How close the scores o “The in distribution are to average Ratio the middle of the square ONLY distribution distance o Average squared from the difference of the mean is scores from the mean ___.” o Can be calculated for either population or sample – formula changes with each. Equation o Highest – lowest = range Xmax – Xmin= range o Population variance: 𝜎2 = Σ(𝑥 − μ)2 𝑛 o Sample Variance: 𝑠= Σ(𝑥−μ)2 𝑛−1 7 Standard o The square root of variance. Deviation o The average of the average. o Can be calculated for either population or sample – formula changes with each. o “The average distance from the average X was ___.” o Interval & Ratio ONLY. o Population Standard Deviation: 𝜎 = √Σ(𝑥 − μ)2 /n o Sample Standard Deviation: 𝑠 = √Σ(𝑥 − μ)2 /(n − 1) Day Seven Definition The raw score expressed in terms of how many standard deviations is score is away from the mean. Deviation score/SD. Purpose 1. Tell how extreme/typical a raw score is. Between z = (-)1or z = (+)1. 2. Helps us how to compare how extreme a score is across different samples and different characteristics & scales. Depression scale 1-30 with a score of 25; z=1 (Less extreme) Depression scale 1-10 with a score of 5; z=30 (more extreme) 3. How extreme a score is relative to others in the population or sample (NOT overall good/bad). Test to show preparedness for college: Z English = -2. Z Math = .75 More prepared for math. 4. In psychology, there are a variety of tests/assessments that have a known mean/SD for some groups (e.g. adults/kids) You can obtain a client’s raw score using known mean + SD for the scale, calculate their z-score. Allow you to evaluate how extreme they are relative to a reference group. Interpretation You need the direction, the amount, and the unit (SD). “X was # SD above/below the mean.