Conducting Social Research
Statistical Principles and An Overview of Regression Analysis
Univariate, Bivariate, and Multivariate Statistics
Roger B. Hammer
Assistant Professor
Department of Sociology
Oregon State University

Basic Notation
$Y$: A random variable (data vector) that we want to model.
$Y_i$: The ith observation in our data vector.
Example: $Y = (4, 5, 6, 7, 8)$, so $Y_2 = 5$.
Notation: It varies, so be flexible.

Basic Notation
$\bar{Y}$: Y-Bar is the mean of $Y$.
$Y_i$: The observed value of the ith observation.
$\hat{Y}_i$: Y-Hat is the estimated or predicted value of the ith observation.

Random Variable
• A variable whose numerical value is determined by chance, the outcome of a random phenomenon.
• A discrete random variable has a countable number of values.
• A continuous random variable can take on any value in an interval.
Is “statistical anxiety” continuous or discrete?

Probability
• Probability is the likelihood or chance that something (an event) is the case or will happen.*
• The probability of an event is represented by a real number in the range from 0 to 1.*
• An impossible event has a probability of 0, and a certain event has a probability of 1.*
Notation: $P(A)$, $p(A)$ or $Pr(A)$*; $P[X]$**
*Wikipedia
**Studenmund

Probability Distribution
• Assigns probabilities to the possible values of a discrete variable.
$P[X] + P[\text{Not } X] = 1$
$P[\text{Not } X] = 1 - P[X]$
In the Statistical Anxiety Survey data, what is the probability of having taken a previous statistics course? Of not having taken one?

Normal (Gaussian) Distribution
The Bell Curve

Law of Large Numbers
• The first theorem of probability, which describes the long-term stability of a random variable.
• Given a sample of independent and identically distributed (iid) random variables with a finite population mean, the average of these observations will eventually approach and stay close to the population mean.
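
The Law of Large Numbers can be seen in a short simulation. The sketch below is only an illustration under an assumption that is not in the slides: a fair six-sided die stands in for the random variable, so the population mean is 3.5. The function name `running_means` is hypothetical.

```python
import random


def running_means(draws):
    """Return the average of the first n draws, for n = 1, 2, ..., len(draws)."""
    means, total = [], 0.0
    for n, value in enumerate(draws, start=1):
        total += value
        means.append(total / n)
    return means


# Assumption: a fair six-sided die, whose population mean is
# (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5.
POPULATION_MEAN = 3.5

rng = random.Random(42)  # fixed seed so the sketch is repeatable
draws = [rng.randint(1, 6) for _ in range(100_000)]
means = running_means(draws)

# Print the running mean at increasing sample sizes.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7,}: running mean = {means[n - 1]:.4f}")
```

With small n the running mean can wander well away from 3.5; as n grows it should settle close to the population mean and stay there.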

The Central Limit Theorem
• The second theorem of probability, which describes the distribution of a random variable.
• Given a sample of independent and identically distributed (iid) random variables with a finite, nonzero standard deviation, the probability distribution of the sample mean approaches the normal distribution as the sample size increases.

Sampling
• The population is the entire group of items of interest.
• The sample is the observed part of the population.
Is the Statistical Anxiety Survey data sample or population based?

Statistical Inference
• The use of a sample to draw conclusions about the population from which the sample came.
• Inference is necessary because it is often impractical to “scrutinize” the entire population.
Are medical blood tests based on inference? Is the U.S. Census based on inference?

Random Sampling
• In a simple random sample, every member of the population is equally likely to be selected.

Selection Bias
• The exclusion or underrepresentation of certain types of respondents/observations in a sample, resulting in a nonrepresentative sample.
Can you give an example of selection bias highlighted recently in the media? Is the Statistical Anxiety Survey data sample biased? Why or why not?

The Expected Value of a Random Variable
• A weighted average of all the possible values of the random variable (population mean).
$\mu = E[X] = \sum_i X_i P[X_i]$
Notation: The italics don’t exactly conform to Studenmund. Remember to be flexible.

The Variance of a Random Variable
• The extent to which the values may differ from the expected value.
• The expected value of the squared difference.
$\sigma^2 = E[(X - \mu)^2] = \sum_i (X_i - \mu)^2 P[X_i]$

Similarity of Mean and Variance Formulas
• Substitution of the squared difference for the value.
$\mu = \sum_i X_i P[X_i]$
$\sigma^2 = \sum_i (X_i - \mu)^2 P[X_i]$

The Standard Deviation of a Random Variable
• The square root of the variance.
• Absolute value of the difference.
• Residuals.
$\sigma = \sqrt{E[(X - \mu)^2]} = \sqrt{\sum_i (X_i - \mu)^2 P[X_i]}$

Population Parameters and Sample Statistics
Concept              Sample Statistic   Population Parameter
Mean                 $\bar{Y}$          $\mu = E[Y]$
Variance             $s_Y^2$            $\sigma_Y^2 = Var[Y]$
Standard Deviation   $s_Y$              $\sigma_Y = \sqrt{Var[Y]}$

Sample Statistics Example
We have obtained a sample of 40 housing sales that took place somewhere in some year. The data contain two variables: price (in dollars) and size (total above-grade finished area, in square feet).

Price and Size
Do you think that price and size would be related to each other? Would one “cause” the other? Which variable would you consider to be independent (X) and which dependent (Y)? Why?

Independent and Dependent Variables
• X = Size and Y = Price
• For a buyer, the price that they are willing to pay is a function of the size of the house, along with other factors.
• X = Price and Y = Size
• For a builder, the price that they want to receive for a home will determine its size, along with other factors.

Univariate Statistics

The Sample Mean of Price
$\bar{Y} = (Y_1 + Y_2 + Y_3 + \dots + Y_n) / n$
$\bar{Y} = \sum_{i=1}^{n} Y_i / n = \$3,481,200 / 40 = \$87,030$

Population Mean and Sample Means
If we drew a second sample of 40 housing sales, would the mean be exactly the same as the mean of the first sample? Is the sample mean exactly the same as the population mean?

The Expectation of the Sample Means
$E[\bar{X}] = E[X]$
• The Law of Large Numbers.
$\bar{X} \sim N(\mu, \sigma^2 / n)$ (approximately, for large $n$)
• The Central Limit Theorem.
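
The behavior of sample means across repeated samples can be checked by simulation. The sketch below makes assumptions that are not in the slides: the population is exponential with mean 1 and standard deviation 1 (deliberately skewed, so it is clearly not normal), and 5,000 samples of size 40 are drawn. Under the Central Limit Theorem the sample means should center on the population mean with a standard deviation near $\sigma / \sqrt{n}$.

```python
import random
import statistics

# Assumptions (not from the slides): a skewed exponential population with
# mean 1 and standard deviation 1, sampled 5,000 times with n = 40.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
SAMPLE_SIZE = 40
N_SAMPLES = 5_000

# One sample mean per repeated sample.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(SAMPLE_SIZE))
    for _ in range(N_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_sd = 1 / SAMPLE_SIZE ** 0.5  # sigma / sqrt(n) with sigma = 1

print(f"mean of sample means: {mean_of_means:.3f} (population mean: 1.000)")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma / sqrt(n): {theoretical_sd:.3f})")
```

No two sample means are exactly alike, and none need equal the population mean, yet their average sits very close to it.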

The Sample Mean of Size
$\bar{X} = (X_1 + X_2 + X_3 + \dots + X_n) / n$
$\bar{X} = \sum_{i=1}^{n} X_i / n = 177,097 / 40 = 4,427$

The Sum of the Deviations
The Zero-sum Property
$E(X_i - \bar{X}) = \sum (X_i - \bar{X}) = 0$
$E(Y_i - \bar{Y}) = \sum (Y_i - \bar{Y}) = 0$

The Sum of the Squared Deviations
Total Sum of Squares
$\sum (X_i - \bar{X})^2 = 405,415,599$
$\sum (Y_i - \bar{Y})^2 = \$114,245,084,000$

The Sample Variance
$s_X^2 = \sum (X_i - \bar{X})^2 / (n - 1) = 405,415,599 / 39 = 10,395,271$
$s_Y^2 = \sum (Y_i - \bar{Y})^2 / (n - 1) = 114,245,084,000 / 39 = 2,929,361,128$

Sample Standard Deviation
$s_X = \sqrt{s_X^2} = 3,224$
$s_Y = \sqrt{s_Y^2} = \$54,124$

Bivariate Statistics
(Skipping Ahead to Chapter 2)

Covariance of X and Y
$s_{XY} = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / (n - 1) = 6,760,921,922 / 39 = 173,356,972$

Covariance of Y and Y is the Variance of Y
$s_{YY} = \sum (Y_i - \bar{Y})(Y_i - \bar{Y}) / (n - 1) = s_Y^2 = 114,245,084,000 / 39 = 2,929,361,128$

Correlation of X and Y
$r = \dfrac{s_{XY}}{s_X s_Y} = \dfrac{173,356,972}{3,224 \times 54,124} = .9934$

Regression Analysis
• Econometricians use regression analysis to make quantitative estimates of economic relationships that previously have been completely theoretical in nature.
• Sociologists use regression analysis to make quantitative estimates of social relationships that previously have been completely theoretical in nature.
• Political scientists use regression analysis to make quantitative estimates of political relationships that previously have been completely theoretical in nature.

The Basic (Theoretical) Linear Model
$y = f(x)$, e.g. $price = f(size)$
$f(x) = \beta_0 + \beta_1 X$
$Y = \beta_0 + \beta_1 X$
• $\beta_0$ is the Y-intercept, the point at which the regression line crosses the vertical axis.
• $\beta_1$ is the slope of the regression line: a one-unit change in $X_i$ results in a $\beta_1$-unit change in $Y_i$.
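
The univariate and bivariate statistics for the housing sample can be re-derived from the sums reported on the slides. The sketch below uses only those reported totals, since the 40 individual sales are not given; the variable names are illustrative.

```python
import math

# Totals reported on the slides for the n = 40 housing sales.
n = 40
sum_price = 3_481_200          # sum of Y_i, in dollars
sum_size = 177_097             # sum of X_i, in square feet
ss_size = 405_415_599          # sum of (X_i - X-bar)^2
ss_price = 114_245_084_000     # sum of (Y_i - Y-bar)^2
sp_size_price = 6_760_921_922  # sum of (X_i - X-bar)(Y_i - Y-bar)

# Sample means.
mean_price = sum_price / n
mean_size = sum_size / n

# Sample variances and standard deviations use n - 1 in the denominator.
var_size = ss_size / (n - 1)
var_price = ss_price / (n - 1)
sd_size = math.sqrt(var_size)
sd_price = math.sqrt(var_price)

# Sample covariance and correlation.
cov_size_price = sp_size_price / (n - 1)
r = cov_size_price / (sd_size * sd_price)

print(f"mean price = {mean_price:,.0f}, mean size = {mean_size:,.1f}")
print(f"sd size = {sd_size:,.2f}, sd price = {sd_price:,.2f}")
print(f"covariance = {cov_size_price:,.0f}, correlation = {r:.4f}")
```

Carrying the unrounded standard deviations through the calculation gives a price standard deviation of about 54,123.57, which rounds to the 54,124 used in the correlation, and a correlation of about .9934.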

Change in the Expected Value of Y
$E[Y_i] = \beta_0 + \beta_1 X_i$

Other determinants of Y
$\epsilon_i = Y_i - E[Y_i] = Y_i - \beta_0 - \beta_1 X_i$

Change in the Observed Value of Y
$Y_i = E[Y_i] + \epsilon_i = \beta_0 + \beta_1 X_i + \epsilon_i$
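
The split of each observed value into an expected value plus an error term can be traced with a few numbers. Everything in the sketch below is hypothetical and chosen only for illustration: the coefficients `beta0` and `beta1` and the three (size, price) pairs are not estimates or observations from the housing sample.

```python
# Hypothetical coefficients and data, for illustration only; they are not
# estimates or observations from the housing sample.
beta0 = 10_000.0  # intercept, in dollars
beta1 = 20.0      # slope, in dollars per square foot

sizes = [1_500, 2_000, 2_500]            # X_i
prices = [42_000.0, 48_500.0, 61_000.0]  # observed Y_i

# Expected value of Y for each X: E[Y_i] = beta0 + beta1 * X_i.
expected = [beta0 + beta1 * x for x in sizes]

# Error term, the other determinants of Y: epsilon_i = Y_i - E[Y_i].
errors = [y - e for y, e in zip(prices, expected)]

# Observed value rebuilt from its two parts: Y_i = E[Y_i] + epsilon_i.
rebuilt = [e + eps for e, eps in zip(expected, errors)]

for x, y, e, eps in zip(sizes, prices, expected, errors):
    print(f"X = {x:,}: Y = {y:,.0f}, E[Y] = {e:,.0f}, epsilon = {eps:+,.0f}")
```

The error term is positive when an observation lies above the line and negative when it lies below, and adding it back to the expected value recovers the observed price exactly.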