Statistics Test 1 Study Guide

To bring with you:
1) Something to write with
2) Calculator (separate from your phone or any other device that can log on to the internet. If you don’t have one, don’t go out and buy one just for this class; I’ll bring a few to the test and you can use one of those.)

Topics on Test 1:

1) Math by hand
   a) Mean
      - X-bar (x̄)
      - The average
      - Ask if we need to memorize the equation
      - x̄ = (Σ xᵢ) / n, summing over i = 1 to n
        - n = the number of observations/cases
        - xᵢ = the value for each observation/case, i
        - Σ (sigma) means “add”
      - In other words, the mean is equal to the sum of the values of x from 1 to n, divided by n
      - Add up all the values of a variable & divide by the number of values you added
   b) Variance
      - Essential to making statistical claims
      - Can think of as variability
      - Example:
        - Say you are doing a study of how depression changes over the life course, & the data look like this:

            Depression score   Age
            15                 21
            35                 21
            80                 21
            3                  21
            60                 21

        - There is variance (or variability) in depression score: Different participants have different scores
        - There is no variance in age: Every participant is 21 years old
        - This is a terrible dataset for studying life course change in depression! It only represents one part of the life course
        - In other words: Age cannot possibly be the cause of the differences in depression scores bc everyone in the study is the same age
      - Related to range
      - Measures of central tendency in relation to variance:
        - Mode: no way to discuss variance
        - Median: percentiles in the distribution
        - Mean: standard deviation
      - Variance Calculation (Words)
        - 1st, you calculate the difference between each person’s value & the mean (38.6 for the depression scores above)
          - 15 - 38.6 = -23.6
          - 35 - 38.6 = -3.6
          - 80 - 38.6 = 41.4
          - 3 - 38.6 = -35.6
          - 60 - 38.6 = 21.4
        - 2nd, you square each of those differences
          - (-23.6)² = 556.96
          - (-3.6)² = 12.96
          - (41.4)² = 1713.96
          - (-35.6)² = 1267.36
          - (21.4)² = 457.96
        - 3rd, you add them up
          - 556.96 + 12.96 + 1713.96 + 1267.36 + 457.96 = 4009.2
        - 4th, you divide the sum by one less than the number of cases
          - 4009.2/(5-1) = 1002.3
        ➢ Can’t have negative variance
   c) Standard deviation
      - Standard deviation is a statistic that pairs w/ the mean & helps describe the data by telling us about variance
      - Ex:
        - The mean of depression is (15+35+80+3+60)/5 = 38.6
        - But 38.6 isn’t a great characterization of all 5 of these people
      - The standard deviation = square root of the variance
        - √1002.3 = 31.66

2) Measurement
   a) Identify a response scale as ratio/interval/ordinal/nominal and explain why (explanation is important; we may award partial credit)
      - Continuous: Ratio
        - Has numeric units (in the “real” world outside of stats)
        - Different observers can agree that it means something objective to have zero of that quantity
        - For example, counts, durations, amounts, frequencies, distances, lengths…
          - How many friends do you have? (0,1,2,3,4…)
          - How many years old are you?
          - How many minutes did you wait to speak with a customer service representative?
          - How much money did you earn last month?
          - What percentage of answers did you get correct on the test?
          - How many days last week did you talk to your mother?
          - How many miles do you travel from your home to the nearest grocery store?
        ➢ Ideal in mathematical sense
      - Continuous: Interval
        - Has numeric units (in the “real” world outside of stats)
        - Zero of that thing/idea is not meaningful, or is arbitrary, or is a social construct
        - For example
          - Dates (Lots of calendars don’t recognize the birth of Christ as Year 0 or January 1 as the 1st day of the year)
            ➢ Arbitrary bc based on historic events
          - Times (Greenwich Mean Time is the 0, but it’s entirely arbitrary)
          - Temperatures, historically
            ➢ For Fahrenheit & Celsius, zero is an arbitrary reference point (in Celsius, the freezing point of water)
      - Categorical: Ordinal
        - Categories w/ no fixed distance between them
          ➢ Moved off the number line
          ➢ Don’t know that the distance is strictly equal
          ➢ Varies systematically from person to person based on experiences
        - You can rank or order the categories according to some criterion
        - For example, agreement, frequency, rating, ranking…
          - How much pain are you in? (1-5 where 5 is the worst imaginable)
          - How satisfied are you with your purchase? (Extremely, very, some, a little, not at all)
          - Is your socioeconomic status low, medium, or high?
          - How often do you go jogging? (Never, rarely, sometimes, often)
      - Categorical: Nominal
        - Has categorical units, w/ unknown distance between them
        - There’s no inherent order to the categories
        - For example,
          - Anything w/ a yes/no answer
          - Sex/gender
          - Race/ethnicity
          - Marital status
          - Choice of breakfast cereal
   b) Write a response scale that is ratio/interval/ordinal/nominal
      - Look up examples for inspo.

3) Central tendency (Mode, median, mean/standard deviation)
   a) Where to find them in Stata output
      - Often, you need to do something with the cases that said “don’t know,” or “no answer” before you can calculate a measure of central tendency.
        You have multiple options, including:
        - You can add “if” to your sum or tab command (this is a temporary fix, in that you’d need to add it every time you run the command)
          - sum CIG30AV if CIG30AV<91, detail
        - Replace: Allows you to replace values with other values, for example, when you want to tell Stata which values indicate missing or not applicable data
          - replace varname=newvalue if varname==oldvalue
          - replace COCUS30A=. if COCUS30A>30
        - Here’s another example: Right now, nonsmokers get the code 91, and past smokers get the code 93. But if you’re interested in days smoked in the past month, you don’t have to set those people aside: They’re all people who smoked 0 days in the past month. So you could do this:
          - replace CIG30USE=0 if CIG30USE==91|CIG30USE==93
      - Mode:
        - tabulate, as above
        - Tab (the abbreviated version of “tabulate”): Presents a frequency distribution
          - tab varname
          - tab CIG30USE
        - To find the mode really easily:
          - tab CIG30USE, sort
      - Median:
        - Sum (the abbreviated version of “summarize”): With the “, detail” option, presents the median (50th percentile) and mean
          - sum varname, detail
          - sum COCUS30A, detail
      - Mean:
        - summarize (same as for median)
        - Tip: If you leave off the “, detail” option, you get simpler output that includes only the mean, standard deviation, and range (i.e., no percentiles or information about skewness/kurtosis)
          - sum varname
          - sum COCUS30A
      - Standard deviation:
        - Same as above
   b) Interpret each in a sentence
      - LOOK @ HW 1 ANSWER KEY WHEN UPLOADED FOR SPECIFIC EXAMPLES
      - Mode:
        - The most common value in the dataset
        - Note: It’s possible to have more than 1 mode, when 2+ values are equally most common. But I’m not going to give you any problems where that’s true
        - What’s the most typical or most frequent value for this variable
      - Median:
        - The middle value
        - The 50th percentile
        - 50% of the sample is higher, & 50% of the sample is lower
      - Mean:
        - The average
      - Standard Deviation:
        - Standard deviation is a statistic that pairs w/ the mean & helps describe the data by telling us about variance
   c) Relationship to measurement: which stats apply to which types of measurement
      - Mode: All types of measurement
      - Median: Ordinal, interval, & ratio
      - Mean: Interval & ratio
      - In a dataset, we often want to know the characteristics of a “normal” participant: What is typical of the people in this study

4) Sampling
   a) Census vs. sample
      - A census is any study in which every single member of a group of interest is a participant
      - THE U.S. Census happens every 10 years & is a study of every person living in the U.S.
        ➢ THE census is an example of a census
      - THE Census is a massive, expensive, time-consuming undertaking
      - For these reasons, most studies use a sample of the group of interest; this is even what the Census does in “off” years
        ➢ Polls aren’t a census, but are a sample
      - Population: The larger group of interest
        - In the Census, every person living in the U.S. in 2020
        - Could be all undergraduate students at BC in 2022
        - Could be U.S. registered voters in the lead-up to an election
      - Sample: A subset of participants drawn from a population for the purposes of making inferences about that population
        ➢ When you have a population that is large or hard to reach completely
   b) Types of random sampling
      - Any method where I can calculate the probability that each member of the population gets selected into the sample
      - The laws of probability dictate that if my sample is large enough & I use probability sampling, my sample will be representative of my population
      - We like random sampling because we can estimate sampling error: how much the sample might differ from the population
        ■ If you can’t avoid it, it’s good to establish what it is
      i) Simple
         - Every member of the population has an equal probability of being in the sample, n/N, where n is the number of units to be selected & N is the total number of units in the population
           ➢ n = the number in sample
         - Lots of populations are already enumerated; that is, there’s a list of all the units in the population (a sampling frame)
         - Pull names from a hat
         - Use a random number generator
      ii) Systematic: a fixed interval between draws
         - Sample of the population of people who eat at Hillside
         - We might survey every 3rd person who walks in for some period of time
      iii) Stratified
         - People w/ like characteristics = strata
         - We might stratify the BC population by class year
         - Then, conduct a simple random sample within each stratum (i.e., freshmen, sophomores, juniors, seniors)
           ➢ Can ensure that the sample has students from each class
      iv) Multistage
         - A series of simple random samples
         - Example: Americans in states
           - First, I take a simple random sample of states
           - Second, I take a simple random sample of counties
           - Third, I take a simple random sample of residents within selected counties
           ➢ To make it easier

Overall:
● Review slides and your notes
● Reread book sections

To practice math by hand:
● Write out any small set of numbers. Calculate!

To practice measurement:
● Open the General Social Survey data or the National Survey on Drug Use and Health. Tabulate variables and see if you can classify them. Tip: If you’re not certain, write out your reasoning; if you convince me on the test, I may give partial or full credit.
● You can also look at the text version of the General Social Survey online. Given that the Stata codebook for GSS isn’t very good, this might work better. Note that you need to scroll to page 30 or so before you start seeing questions.
● Click on “Topline Questionnaire” (upper right) for the results of a series of weekly surveys about coronavirus.
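
To check math-by-hand practice: the sketch below redoes the mean, variance, and standard deviation steps for the five depression scores from the worked example (15, 35, 80, 3, 60). It is written in Python rather than Stata, purely as an arithmetic check; the variable names are made up for illustration and are not from the course materials.

```python
import math
import statistics

# Depression scores from the worked example in this guide.
scores = [15, 35, 80, 3, 60]

n = len(scores)                  # number of cases
mean = sum(scores) / n           # add up the values, divide by n

# 1st: difference between each value and the mean
deviations = [x - mean for x in scores]
# 2nd: square each difference
squared = [d ** 2 for d in deviations]
# 3rd: add them up
sum_of_squares = sum(squared)
# 4th: divide by one less than the number of cases
variance = sum_of_squares / (n - 1)
# Standard deviation is the square root of the variance
sd = math.sqrt(variance)

print(f"mean           = {mean:.1f}")
print(f"sum of squares = {sum_of_squares:.1f}")
print(f"variance       = {variance:.1f}")
print(f"std. deviation = {sd:.2f}")

# Cross-check against Python's built-in sample variance (also divides by n - 1).
assert abs(statistics.variance(scores) - variance) < 1e-9
```

Swap in any small set of numbers for `scores` to check your own hand calculation. Because every squared difference is zero or positive, the variance can never come out negative.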
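
To see how the random sampling types differ in practice, here is a minimal Python sketch (again, not Stata). Everything in it is an assumption for illustration: the sampling frame is a made-up list of 400 student IDs with 100 students per class year, and the sample size of 40 is arbitrary. Multistage sampling is left out; it would repeat the simple random draw at each level (states, then counties, then residents).

```python
import random

rng = random.Random(2022)  # fixed seed so the draws can be reproduced

# Hypothetical sampling frame: 400 students, 100 in each class year.
class_years = ["freshman", "sophomore", "junior", "senior"]
frame = [(student_id, class_years[student_id // 100]) for student_id in range(400)]

N = len(frame)  # total number of units in the population
n = 40          # number of units to select

# Simple random sample: every unit has the same chance, n/N, of being selected.
simple = rng.sample(frame, n)

# Systematic sample: pick a random starting point, then take every k-th unit.
k = N // n
start = rng.randrange(k)
systematic = frame[start::k]

# Stratified sample: a simple random sample within each stratum (class year).
stratified = []
for year in class_years:
    stratum = [unit for unit in frame if unit[1] == year]
    stratified.extend(rng.sample(stratum, n // len(class_years)))

print("selection probability n/N =", n / N)
print("simple:", len(simple), "systematic:", len(systematic), "stratified:", len(stratified))
```

The stratified draw guarantees 10 students from each class year, while the simple random draw only gets close to that on average.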
