Discrete variable- A variable with a basic unit of measurement that cannot be subdivided (e.g., number of people).
Hypothesis- A statement about the relationship between variables that is derived from a theory. Hypotheses are more specific than theories, and all terms and concepts are fully defined.
Data- In social science research, information that is represented by numbers.
Descriptive statistics- The branch of statistics concerned with (1) summarizing the distribution of a single variable and (2) measuring the relationship between two or more variables.
Measures of association- Statistics that summarize the strength and direction of the relationship between variables.
Theory- A generalized explanation of the relationship between two or more variables.
Population- The total collection of all cases in which the researcher is interested.
Inferential statistics- The branch of statistics concerned with making generalizations from samples to populations.
Independent variable- A variable that is identified as a causal variable. The independent variable is thought to cause the dependent variable.
Dependent variable- A variable that is identified as an effect, result, or outcome of something.
Level of measurement- The mathematical characteristics of a variable as determined by the measurement process. A major criterion for selecting statistical techniques.
Statistics- A set of mathematical techniques for organizing and analyzing data.
Data reduction- Summarizing many scores with a few statistics.
Variable- Any trait that can change values from case to case.
Data value- The value of a variable associated with one element of a population.
Data set- A collection of measurements or observations.
Univariate- Descriptive statistics that summarize one variable (e.g., GPA).
Bivariate- Descriptive statistics that describe the relationship between two variables.
Multivariate- Descriptive statistics that describe the relationship among three or more variables.
Raw score- A single measurement or observation.
Continuous variable- A variable with a unit of measurement that can be subdivided (reported scores are rounded).
Research- The process of gathering information systematically to answer questions or test theories.
Sample- A subset of a population; used in inferential statistics.
Σ (sigma)- The sum of scores.
Basic rule of precedence- Find all squares and square roots first, then multiply and divide, then carry out summations (Σ), then add and subtract.
Validity- Whether a measure actually measures the concept it is intended to measure.
Reliability- A quality of the measuring instrument: whether it yields consistent results.

Nominal Level of Measurement:
- Categories are not numbers
- Examples: gender, area code, provinces
- Cannot be ranked, added, or divided
- Categories must be exhaustive (a category must exist for every score)
- Categories must be homogeneous (comparable cases)
- Categories must be mutually exclusive (no ambiguity exists)
- Discrete

Ordinal Level of Measurement:
- Categories can be ranked from high to low
- Cases are classified as having more or less of the trait
- Limitation: a score only gives a position with respect to other scores
- Discrete

Interval Level:
- Scores are numbers
- Ordered categories
- Distances between scores are exactly the same (equal intervals), but there is no true zero point
- Only interval-ratio level variables can be continuous

Ratio Scale:
- Equal differences on the scale reflect equal differences in magnitude
- Distance is equal, and there is a true zero point

Type | Description | Example
Nominal | Classification of objects; can only be discrete | ethnic groups
Ordinal | Variable can be ranked; can only be discrete | job satisfaction
Ratio | Variable can be ranked, distance is equal, true zero point; can be continuous | productivity, income, age, salary
Interval | Variable can be ranked, distance is equal, no true zero point; can be continuous | temperature
Dichotomous | Variable comprises only two categories | gender

Chapter 2

Percentage: % = (f / N) x 100
- f = frequency; N = number of cases in all categories
Proportion: p = f / N
- with a small number of cases (fewer than 20), report the actual frequencies
- always report proportions and percentages

Ratio: compares parts to parts
- Ratio = f1 / f2
- 23 females / 19 males = 1.21 females for every male

Rates: number of actual occurrences divided by number of possible occurrences
- usually multiplied by a power of 10 to eliminate decimal points
- the crude death rate (CDR) is multiplied by 1000: CDR = (number of deaths / total population) x 1000

Percentage change: measures the increase or decrease in a score at two different times
- Percentage change = ((f2 - f1) / f1) x 100
- subtract the first score from the second, divide by the first, multiply by 100

Frequency distribution:
- organized table (or graph) of the number of individuals in each category on the scale of measurement
- first step in any statistical analysis
- consists of the set of categories that make up the original measurement scale and a record of the number of individuals in each category
- shows how many times something has occurred
- categories must be discrete (non-overlapping), e.g., class intervals 0-99 / 100-199 / 200-299
- number of categories should be between 6 and 20
- lower class limit- smallest number that can belong to a class interval (0 in the interval 0-99)
- upper class limit- largest number that can belong to a class interval (99 in the interval 0-99)
- class midpoint- middle of a class interval; add the lower class limit to the upper class limit and divide by 2

Stated class limits: class intervals that organize variables into discrete, non-overlapping intervals. Used when the variable is treated as a discrete category.
Real class limits: divide the distance between adjacent class intervals in half, add it to the upper stated limit, and subtract it from the lower stated limit. Example: stated limits 18-19, real limits 17.5-19.5. Used when the variable is treated as a continuous category.
Cumulative frequency and cumulative percentage: show at a glance how many cases fall below a given score in the distribution. Useful when the researcher wants to make a point about how cases are spread across the range of scores.
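The Chapter 2 formulas above (proportion, percentage, ratio, rate, percentage change, class midpoints, real class limits) can be sketched in Python. This is only a minimal illustration: the function names are made up for these notes, and the `gap` parameter assumes the stated limits are whole numbers one unit apart unless told otherwise.

```python
def proportion(f, n):
    """Proportion: frequency divided by the number of cases in all categories."""
    return f / n


def percentage(f, n):
    """Percentage: proportion multiplied by 100."""
    return (f / n) * 100


def ratio(f1, f2):
    """Ratio: compares one part (f1) to another part (f2)."""
    return f1 / f2


def rate(actual, possible, per=1000):
    """Rate: actual occurrences / possible occurrences, scaled by a power of 10.

    per=1000 matches the crude death rate convention in the notes.
    """
    return (actual / possible) * per


def percentage_change(f1, f2):
    """Percentage change from the first score (f1) to the second score (f2)."""
    return ((f2 - f1) / f1) * 100


def class_midpoint(lower, upper):
    """Midpoint of a class interval: (lower limit + upper limit) / 2."""
    return (lower + upper) / 2


def real_limits(lower, upper, gap=1):
    """Real class limits: half the gap between intervals is subtracted from
    the lower stated limit and added to the upper stated limit.

    gap is the distance between adjacent stated intervals (assumed 1 here).
    """
    half = gap / 2
    return lower - half, upper + half


# Example from the notes: 23 females and 19 males.
print(round(ratio(23, 19), 2))   # about 1.21 females for every male
# Example from the notes: stated limits 18-19.
print(real_limits(18, 19))       # (17.5, 19.5)
```

Keeping each measure as its own small function mirrors how the notes list them, so each formula can be checked by hand against a single line of code.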
Histograms
- each bar represents a range of values
- use real limits rather than stated limits
- bars are in contact with each other, showing that the variable is continuous
- display the distribution of the data
- used for continuous variables, but commonly used for discrete interval-ratio level variables
- frequency is always on the vertical axis
- tell you if the data are skewed right/left or bell-shaped

Chapter 3

Measures of Central Tendency: idea of the typical case in the distribution (mean, median, or mode).

Mode
- value that occurs most frequently
- quick, easy indicator of central tendency
- used with nominal-level variables
- seldom reported alone

Mean
- the "average": add all values, divide by the number (N) of values
- most commonly used measure
- interval-ratio level, but also used with ordinal-level variables
- affected by highly skewed distributions
- symbol: X bar

Weighted/aggregate mean
- used when values occur more than once
- Weighted mean = Σ(Xi x f) / Σf
- multiply values by frequencies, sum the products, and divide by the total frequency

Value | Frequency | Value x Frequency
97 | 4 | 97 x 4 = 388
94 | 11 | 94 x 11 = 1034
92 | 12 | 92 x 12 = 1104
91 | 21 | 91 x 21 = 1911
90 | 30 | 90 x 30 = 2700
89 | 12 | 89 x 12 = 1068
78 | 9 | 78 x 9 = 702
60 | 1 | 60 x 1 = 60
Total | 100 | 8967

Weighted mean = 8967 / 100 = 89.67

Characteristics of the mean:
A) All scores cancel out around the mean (the deviations from the mean sum to zero).
B) Uses all scores. This is a strength, but also a weakness: the mean is affected by every score, so skew pulls it.
C) Least squares principle- the sum of the squared deviations is smaller around the mean than around any other value. The mean is pulled in the direction of the extremes.

Symmetric- mean and median have the same value.
Positive skew- skewed to the right (a few very high scores); the mean is higher in value than the median.
Negative skew- skewed to the left (a few very low scores); the mean is lower in value than the median.

Median
- represents the center of the distribution of scores
- when N is odd, the value of the median is unambiguous (there is always a middle case)
- when N is even, the median is the score halfway between the two middle scores
- ordinal or interval-ratio level
- measures position or location
- good choice when there are extreme values (outliers), e.g., household income

Percentile
- used for the median: the median is the 50th percentile
- identifies the specific point or case below which a given percentage of cases fall
- to find the 37th percentile of 78 cases: 78 x .37 = 28.86, which identifies the case
Decile- divides the distribution into tenths
Quartile- divides the distribution into quarters: 0, Q1 (25%), Q2 (50%), Q3 (75%), Q4 (100%)

Measure of Dispersion: how much variety there is in the distribution, e.g., how 'often' do graduates receive $40,000 per year.

Chapter 4

Measure of Dispersion
- variety in a distribution
- the taller the curve, the less dispersion; the flatter the curve, the more dispersion
- amount of diversity, heterogeneity

R (range)- the distance from the highest value to the lowest
- quick, easy indication of variability
- ordinal, interval-ratio
- limited: based on only 2 scores
- gives no information about variation between the high and low scores

Q (interquartile range)- only considers the middle 50% of cases in the distribution (Q3 - Q1)

Boxplot- diagram running from the lowest score (L) to the highest score (H):
L--------Q1--------Q2--------Q3--------H
- based on the information in the five-number summary
- Q2 (median) is marked with a vertical line
- useful when two or more data sets are being compared

A good measure of dispersion should:
- use all scores in the distribution
- describe the average deviation
- increase in value as the distribution becomes more diverse

Standard Deviation (S)
- uses all scores in the distribution
- increases in value as the distribution of scores becomes more diverse
- based on the distance between each score and the mean (deviation)
- if scores are clustered around the mean, the deviations are small, and vice versa
- value of S can increase with the inclusion of one or more outliers
- units of the standard deviation are the same as the units of the original data values
- roughly the average distance of each score from the mean
- interval-ratio, but often used with ordinal-level variables
- the higher the SD, the more dispersion; the lower the SD, the less dispersion
- a value of 0 means no dispersion
- N-1 is used in the denominator when working with random samples rather than entire populations
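The weighted mean from the Chapter 3 frequency table and the range and standard deviation from Chapter 4 can be checked with a short Python sketch. The function names and the `sample` flag are assumptions made for this illustration; the values and frequencies are the ones listed in the notes.

```python
import math

# Frequency table from the notes (value, frequency).
values = [97, 94, 92, 91, 90, 89, 78, 60]
freqs = [4, 11, 12, 21, 30, 12, 9, 1]


def weighted_mean(values, freqs):
    """Sum of (value x frequency) divided by the total frequency."""
    total = sum(x * f for x, f in zip(values, freqs))
    n = sum(freqs)
    return total / n


def value_range(scores):
    """Range (R): highest score minus lowest score."""
    return max(scores) - min(scores)


def std_dev(scores, sample=False):
    """Standard deviation based on deviations of every score from the mean.

    sample=False divides by N (entire population);
    sample=True divides by N-1 (random sample).
    """
    n = len(scores)
    mean = sum(scores) / n
    sum_sq = sum((x - mean) ** 2 for x in scores)
    denom = n - 1 if sample else n
    return math.sqrt(sum_sq / denom)


# Expand the frequency table back into 100 raw scores.
scores = [x for x, f in zip(values, freqs) for _ in range(f)]

print(weighted_mean(values, freqs))   # 8967 / 100
print(value_range(scores))            # 97 - 60
print(std_dev(scores), std_dev(scores, sample=True))
```

Dividing by N-1 always gives a slightly larger result than dividing by N, which is why the choice matters most for small samples.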
Index of Qualitative Variation (IQV)
- only measure of dispersion for nominal-level variables (but can be used with any variable)
- varies from 0.00 (no variation) to 1.00 (maximum variation)
- ratio of the amount of variation observed in the distribution to the maximum variation possible

Variance
- measure of variation equal to the square of the standard deviation
- symbol: s^2
- used in inferential statistics

Coefficient of Variation
- presents the standard deviation as a percentage of the mean value: CV = (s / mean) x 100
- allows you to compare the variability of different variables

Range rule of thumb- the principle that, for many data sets, most (about 95%) of the sample values lie within two standard deviations of the mean.

Chapter 5

Normal distribution
- of great importance
- in combination with the mean and SD, the normal curve can be used to construct precise descriptive statements about empirical distributions
- theoretical model: a frequency polygon or line chart that is unimodal (single mode/peak), perfectly smooth, and symmetrical, so the mean, median, and mode have the same value
- the crucial point is distance along the abscissa (horizontal axis)

Empirical rule
- 68% of all values fall within 1 standard deviation of the mean
- 95% of all values fall within 2 standard deviations of the mean
- 99.7% of all values fall within 3 standard deviations of the mean

Z scores
- used to find the percentage of area above, below, or between scores in an empirical distribution
- the distribution of Z scores always has the same mean (0) and standard deviation (1)
- converting the original units of measurement into Z scores "standardizes" the normal curve to a distribution that has a mean of 0
- a Z score tells how many standard deviation units a case is above or below the mean: Z = (X - mean) / s
- acts as a ruler from X to the mean
- when the value is less than the mean, the Z score is negative
- ordinary values: Z score between -2 and 2
- unusual values: Z score less than -2 or greater than 2

Normal curve table (Appendix A)- detailed description of the area between a Z score and the mean.

Probability
- method for measuring and quantifying the likelihood of obtaining a specific sample from a specific population
- defined as a fraction or a proportion
- ratio comparing the frequency of occurrence to the total number of possible events
- in frequency distributions, probability can be defined by proportions of the distribution
- in graphs, probability can be defined as a proportion of the area under the curve

Unit normal table: lists the proportions of the normal distribution corresponding to each z-score location.
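The Z score and normal curve ideas above can be sketched in Python. This is a rough illustration rather than a replacement for the Appendix A table: the function names are made up for these notes, the mean of 100 and standard deviation of 15 are hypothetical numbers, and the areas come from the standard library's `math.erf` instead of a printed table.

```python
import math


def z_score(x, mean, sd):
    """Number of standard deviation units that x lies above or below the mean."""
    return (x - mean) / sd


def area_below(z):
    """Proportion of the area under the normal curve below a given Z score."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def area_between_mean_and(z):
    """Area between the mean (Z = 0) and a Z score, as in a normal curve table."""
    return abs(area_below(z) - 0.5)


def area_within(k):
    """Proportion of cases within k standard deviations of the mean."""
    return area_below(k) - area_below(-k)


def is_unusual(z):
    """Unusual values have Z scores less than -2 or greater than 2."""
    return abs(z) > 2


# Hypothetical example: a score of 85 where the mean is 100 and the SD is 15.
z = z_score(85, 100, 15)
print(z)                          # negative, because 85 is below the mean
print(area_between_mean_and(z))   # area between this score and the mean
print(is_unusual(z))

# Empirical rule check: roughly 68%, 95%, and 99.7%.
for k in (1, 2, 3):
    print(k, round(area_within(k) * 100, 1))
```

Because probability can be defined as a proportion of the area under the curve, `area_below` and `area_within` can also be read as probabilities for a normally distributed variable.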