SUMMARY - RESEARCH METHODS IN SOCIAL RELATIONS One to rule them all FEBRUARY 7, 2015 UCR Edited by Bree 19 1|Page Contents Taxonomy .............................................................................................................................................................................. 2 Chapter 1 – Ways of Knowing................................................................................................................................................ 5 Chapter 2 – Evaluating Social Science Theories and Research .............................................................................................. 9 Chapter 4 - Fundamentals of measurements ...................................................................................................................... 12 Chapter 5 – Modes of Measurement .................................................................................................................................. 14 Chapter 6 - Single-Item Measures in Questionnaires .......................................................................................................... 17 Chapter 7 - Scaling and Multiple-Item Measures ................................................................................................................ 21 Chapter 8 - Fundamentals of sampling................................................................................................................................ 26 Chapter 9 - Probability Sampling Method ........................................................................................................................... 28 Chapter 11 - Randomized Experiments ............................................................................................................................... 31 Chapter 13 - Nonrandomized Designs ................................................................................................................................. 36 2|Page Taxonomy Quality of: Types A Measure Construct Validity Reliability Description Threats How to improve Are we measuring what we think we are measuring? Face Validity Can we do it again and get the same result? Test-Retest Convergent Discriminant Split-Half Inter-Rater A Sample Sampling Does the Sample differ from the corresponding population? Selection Bias, Measurement or Response Bias, Random Sampling Nonresponse Bias A Study (research design) Internal Validity Can we say that A causes B? Threats: Selection, History, Maturation, Selection by Maturation, Random Assignment, Control Groups Other efforts Mortality, Instrumentation Artifacts External Validity Does our sample represent the society? How broadly applicable is the theory? Particularistic Random Sampling, Natural setting Replication Universalistic 3|Page Modes of Measurement Type Subtype Examples Randomization Sampling Probability/ Non-Probability Comments Assignment Between: Randomly assigned to Groups Within: Randomly assigned to order of treatment Observational Qualitative Direct Experimental Indirect Randomized Experiments Narrative Analysis Any None Oral History Any None Inductive Research. Participant Observation Any/None None Greater depth/breath of knowledge. Focus Groups Any None Interviews (face to face, telephone) Any None Questionnaires (mailed, internet) Any None Experience Sampling Any None Physiological Monitoring Any None Observation Any None Collateral Reports Any None Randomized Two Group Any Between Pretest-Posttest Two Group Any Between Solomon FourGroup Any Between BetweenParticipants Factorial Design Any Between Repeated Any Within Potential for strong external validity. For some things the best way to find out is to ask. For some things it’s better not to ask directly. Best Internal Validity, can have low External Validity. 4|Page Measures Design Crosssectional Comparison Non-Random Experiments QuasiExperimental Panel Study Any None Static Group Comparison Any None Pretest-Posttest nonequivalent Any None One Group Pretest-Posttest Any None Interrupted timeseries Design Any None Replicated Interrupted timeseries Design Any None Internal Validity not as strong as true experiments but external validity is improved 5|Page General 1. A construct is an abstract concept that we would like to measure (love, intelligence, aggression, selfesteem, success, taste perception). 2. The operational definition of a construct is the set of procedures we use to measure or manipulate it. 3. A social science hypothesis, naïve or not, is a falsifiable statement of the association between two or more constructs that have to do with human social behaviour. 4. Construct validity: to what extent are the constructs of theoretical interest successfully operationalised in the research? (Do you measure what you want to measure?) 5. To maximise construct validity, we need to measure each construct in more than one way – using multiple operational definitions and then comparing them to see whether they seem to be measuring the same things. (p.33) 6. Internal validity: to what extent does the research design permit us to reach causal conclusions about the effect of the independent variable on the dependent variable? (Can just conclusions be drawn about causality?) 7. External validity: to what extent can we generalise from the research sample and setting to the populations and settings specified in the research hypothesis? (Can we generalise our conclusions to our population?) 8. Correlational fallacy: inappropriately inferring causality from a simple association between two variables (correlation does not imply causality!!!) (p.36) 9. Qualitative research: research with non-numerical data 10. Quantitative research: research with numerical data 11. Reliability: something is reliable if repetition of research leads to the same results 12. Achieve reliability: large sample, test-retest, multiple methods, split-half 13. Results can be reliable but not valid – non-reliable results can never be valid! 14. When researchers create rather than measure levels of an independent variable, we call it a manipulated variable (with manipulated independent variables, the researcher must test the manipulation by subsequently measuring its effects to determine its construct validity) 15. Systematic error reflects influences form other constructs besides the desired one 16. Random error reflects non-systematic, ever-changing influences on the score (p.81) 6|Page 17. The reliability of a measure is defined as the extent to which it is free from random error. 18. The validity is the extent to which a measure reflects only the desired construct without contamination from other systematically varying constructs. 7|Page Chapter 1 – Ways of Knowing Place of Values in social science research - Society’s values may have both positive and negative effects on what should and can be investigated by social scientists. (ex. a study on how to increase the academic success of at-risk youth would be encouraged, whereas one on sexual behavior in teens is not) - Furthermore, the results of some social science research have been publicly misinterpreted and condemned. An example is that of Rind ET. Co, who published an article stating that sexual abuse in childhood does not lead to long-term negative outcomes on the individual. Contestability in social and physical sciences - The results of social science researchers seem to be more contestable than those of physical sciences, in the general population. This is due to: seemingly ordinary quality of most methods of observation and the fact that most topics researched are identifiable with personal convictions or political interpretations. - Even though they sometimes look like casual observations, SSC researches are not. Casual Observation: - The hypotheses derived from casual observation are useful to oneself in the sense that we have ideas about how others will behave and thus we can behave in such a way that will cause the desired response. - Note they are not always correct, as they have a general character - Example: Birds of a feather flock together Similarity results in increased contact. Construct – abstract concept that we would like to measure. Operational definition of a construct – set of procedures we use to measure or manipulate it. SSC Hypothesis – falsifiable statement of the association between 2 or more constructs that have to do with human behavior. Causal associations (construct causes construct) Theory – set of interrelated hypotheses that are used to explain a phenomenon and make predictions about associations among constructs relevant to the phenomenon. Sources of support for naïve theories (derived from casual observation): 1) Logical analysis (being unemployed depression divorce) 2) Authority (a doctor’s opinion for ex. On how to deal with a difficult child) 3) Consensus (getting others’ opinions) 5) Observation 6) Past experience 8|Page Toward a science of social behavior Most important threat – a biased conclusion; also, we can never accept a hypotheses of absolutely true, beyond a doubt. Empirical research – observation that is systematic in an attempt to avoid bias. Operationism – the assumption that all constructs can be measured or observed. Replication – an empirical research that reveals the same conclusion as one previously conducted. 9|Page Chapter 2 – Evaluating Social Science Theories and Research Theory about social behavior has three features: 1. Constructs that are if theoretical interest and that it attempts to explicate or account for in some way. 2. Describes associations among these constructs. 3. Theory incorporates hypothesized links between the theoretical constructs and observable variables that can be used to measure the constructs. Theory must generate testable hypothesis – a theory is comprised of hypotheses, which in turn are comprised of statements about associations among constructs. A variable is any attribute that changes values across people or things being studied. Theory is made up of hypotheses; 2 types: 1. Hypothesized associations among constructs 2. Hypothesized associations between constructs and observable variables. Falsifiable – could conceive of a pattern of findings that would contradict the theory. Necessary and minimum requirement for a theory 1. A productive theory is one that addresses some important phenomenon or social behavior that needs explication. 2. Theory must provide a plausible and empirically defensible explanation for that phenomenon. Measurement Research – conducted by examining whether two or more ways of measuring the same construct give the same results. Four different functions of empirical research: 1. Discovery 2. Demonstration 3. Refutation 4. Replication Discovery – develop/generate a hypothesis; inductive. Demonstration – demonstrating the proof of the hypothesis. Research designed to demonstrate a hypothesis is deductive rather than inductive. The hypothesis generates the research. Refutation – can never conclusively prove a hypothesis, possible to refute it. Replication – the only way to overcome biases one has to replicate the research. The variable used to measure the causal construct, crowdedness of classrooms – independent variable The variable used to assess the affected construct, educational achievement – dependent variable 10 | P a g e The degree to which both the independent and dependent variables accurately reflect or measure the constructs of interest – construct validity Internal validity concerns the extent to which conclusions can be drawn about the causal effects of one variable on another External validity – validity of generalized (causal) inferences in scientific studies, usually based on experiments as experimental validity. Variables measure not only the construct of interest but also what we might call constructs of disinterest – things we would rather not measure. Variables have three components: 1. The constructs of interest 2. Constructs of disinterest (other things that we do not want to measure) 3. Random errors Construct validity – best examined by employing multiple operational definitions, or multiple ways of measuring, and then comparing them to see whether they seem to be measuring the same things. Maximizing Internal Validity: Correlation; inappropriately inferring causality from a simple association between two variables – correlational fallacy – “correlation does not imply causality”. “Hidden third variable” problem, hidden because the researcher might have only measured X and Y but not Z. Reciprocal causation Selective placement is known as the selective threat to internal validity. Selection by maturation threat to internal validity. Research studies carried out with random assignment to the independent variable – randomized experiments. When control is not possible, quasi-experimental research design is used instead of a randomized experimental design. Quasi-experiments are those in which research participants are not randomly assigned to levels of all independent variables implicated in the hypothesis. Maximizing External Validity: The only way we can be confident about generalizing from a sample to a population of interest is to draw a random sample. - Theoretical basis Deductive and inductive refer to two distinct logical processes. Deductive reasoning is a logical process in which a conclusion drawn from a set of premises contains no more information than the premises taken collectively. All dogs are animals; this is a dog; therefore, this is an animal: The truth of the conclusion is dependent only on the method. All 11 | P a g e men are apes; this is a man; therefore, this is an ape: The conclusion is logically true, although the premise is absurd. Inductive reasoning is a logical process in which a conclusion is proposed that contains more information than the observations or experience on which it is based. Every crow ever seen was black; all crows are black: The truth of the conclusion is verifiable only in terms of future experience and certainty is attainable only if all possible instances have been examined. In the example, there is no certainty that a white crow will not be found tomorrow, although past experience would make such an occurrence seem unlikely. 12 | P a g e Chapter 4 - Fundamentals of measurements Constructs: Abstract concepts of discussion in theories (social status, gender roles, intelligence). These can be measured in many different ways. Variables: Concrete representation of constructs which are measurable. Operational definition: specifies how to measure a variable in order to assign a score as high, medium or low social power. Means by which we obtain the numbers/categories of variables. Sequence of steps or procedure to be followed to obtain measurements. Can be easily repeated by anyone. Definitional Operationism: Assumption that the operational definition is the construct, ignoring the facts that measurements can be affected by different internal and external factors. To avoid problems we advocate MULTIPLE operational definitions. Taking into account that each definition has a level of error and imperfection, place is left for improvement. “Observed Score= true score + systematic error + random error.” True score: function of the construct we are attempting to measure. Systematic error: Influences from other constructs besides desired one. Random error: non-systematic, ever changing influences on the score. Reliability: extent to which a measure is free from random error. “A measure can be totally reliable but invalid. Only when the results are reliable AND valid can they be used in a research.” To increase reliability, experiments should provide clear instructions, optimal testing situations, decrease people’s tendencies to make random error and simple mistakes. Types: Test-Retest reliability: Correlation between scores on the same measure administered on two separate occasions. Provides an estimate of measure’s reliability. Two occasions should be far enough apart so that subjects do not remember specific responses.- can be difficult to obtain !!!! People might not want to use their time for it or might remember the previous test. Other options 13 | P a g e Internal consistency: random error varies over time but also from question/test item to another within the same measure Split-half: set of items in the measure is split in half. Correlation between two halves provides an estimate of reliability. However it depends on the way items were split *Coefficient alpha* (preferred): Derives from correlation of each item to each other and does not rest on arbitrary choice of way of dividing. (Usually computed by software). Inter-rater Reliability: when there is possibility of random errors, a strategy could be to use several observers to rate the responses. Reliability can be estimated with the Coefficient alpha. Validity: extent to which a measure reflects only the desired construct without contamination from other systematically varying constructs. It requires reliability as a prerequisite. Face validity: evaluated by experts or judges who read/ look at a measuring technique and decide whether or not it measures what it names suggests. Convergent validity: overlap of alternative measures which tap the same construct but have different source of systematic error. Discriminant validity: measures that are supposed to tap different constructs. Validity is based on an assessment of how much one method of measuring a construct agrees with other measures of the same or similar constructs and disagree with measures of dissimilar constructs. Multitrait-multimethods Matrix (MTMM) is a table of correlation coefficient to evaluate convergent and discriminant validity of a construct. The table is based on principles that the more features 2 measurements have in common, the higher the correlation. Measurements share 2 sets of features: - TRAITS: underlying construct the measurement is supposed to tap. - METHODS: mode of measurement. Same traits+ same method = reliability coefficient Same traits+ diff method= convergent validity coefficient 14 | P a g e Chapter 5 – Modes of Measurement There are many forms or retrieving information. (Operationally define constructs). In this chapter, we review modes of measurement commonly used in research on social behavior. Direct Questioning: Paper-and-pencil questionnaire: Pros: Low costs, no interviewer bias, people can stay anonymous. Less pressure for immediate response. Cons: The sample quality is usually low because people do not respond very often. It is difficult to clarify certain questions (answer may not be reliable). You are very dependent on whether or not the person the person has enough knowledge to answer the questions. Face to face interviews: Pros: You can correct any misunderstandings either in the questions or the answers. You can add visual aids more easily. Quality of information is very high. Often a very high response rate. Cons: Costs, Interviewer effects (they influence the interview). For small area it is possible but not for very large geographic area. Telephone interviews: Pros: Often a high response rate. Do not impose strict limits on interview length. Same as face to face interviews. Motivate the respondent. Lower costs than face to face interviews. Speed (fast pace). Whoever is taking the interview can do it easily from for instance a laptop. Cons: Not everyone has a telephone. Not all numbers are listed in the directory. (This can be avoided by RANDOM DIGIT DIALING). Interviewer effects are possible. No visual aids. People will hate you. Direct Questioning via the Internet: Pros: 15 | P a g e It can be global. You can reach much more people. You can stay more anonymous than you are with email. Low costs. You can reach people via groups that would normally be hard to reach. Cons: Response rate is low. It is not always clear how many people have responded when you post it on a website (how many actually saw the questionnaire). Similar to mail surveys. You do not know if people are telling the truth. Hard to get unbiased selection. Experience Sampling: Basically writing down your experiences, for instance in a diary. Usually an app or something asks input on random occasions during the day, then you fill in a short questionnaire. Pros: Detailed information about the experiences of the respondents. Relatively short lapse in time between the event of interest and participants’ responses to it. Fewer participants are needed in order to meet the sample size. Cons: You are very reliant on the participant to generate the data according to the sometimes strict rules of sampling required to test the hypothesis of interest. Costs are high. Low anonymity. Indirect: Collateral Reports: Common when researching children. A third party (e.g. parents, teacher) responses to questionnaire or interview. Mostly used in combination with information from the actual participant. Pros: Potential to overcome biases inherent in self-reports of constructs of interest to social scientists. Cons: It may not always be reliable. Parents perceive things differently than children. It can double the costs of the study. Not everyone is willing to be such an informant. Observation: Observing social constructs. Pros: Relative objectivity of rating. Occasionally, by good fortune, a researcher can find video footage that is relevant but was taped for other purposes; this keeps the costs down. Can often be accomplished while observant is in their natural surroundings. Cons: Many constructs are not amenable to this type of measurements. Emotions are not always visible on the outside. Hard to test causal propositions. 16 | P a g e Physiological Monitoring: (And psychophysiology: study of the interplay of physiological systems and people’s thoughts, emotions etc.) Pros: People cannot control their own bodies (the outcome). Little concern of biases. Permits the introduction of time into hypothesis and research design. Cons: You need to introduce control factors to make sure the research is not too overwhelming and the participant does not get distracted. Considerable amount of expertise is required. Costs are very high (equipment). 17 | P a g e Chapter 6 - Single-Item Measures in Questionnaires This chapter focuses on developing questions and questionnaires that are reliable and valid, specifically on writing individual items to operationally define constructs. 1. Items – the questions or statements to which participants must provide a response 2. Response o Numeric (On a 1 to 10 scale) o Binary (True/ False) o Verbal reports (“describe in your own words…”) Steps to follow when planning and carrying out questionnaire research 1. Choose a mode of direct questioning. Consider costs and availability of information when deciding. Paper-and-pencil questionnaire Face-to-face/ telephone interview Questionnaire posted on the internet Experience sampling 2. Choose specific content areas to be covered, related topics can be included. 3. Decide which questions require follow up answers, which questions are most important etc. 4. Writing process starts, both open-ended and closed-ended questions can be used. 5. Consider question sequence and transition, and balance of open-ended and closed ended-questions. 6. Circulate draft to experts for suggestions for revision before pre-testing. 7. Questionnaire is pre-tested on same population as the actual study Interviewers should be aware of the purpose of the study and the aim of each question so they can note whether it is effective. Afterwards experimenters must discuss the answers with the participants to see if changes in formulation are necessary Open-ended responses can be changed to closed-ended responses where possible 8. Pre-test results are analyzed, if there are major changes there should be more pre-testing. 9. Final training of personnel (can be done in conjunction with further pre-testing) 10. Actual administration of questionnaire 11. Resultant date is coded and analyzed. All preceding steps are justified by their contribution to the validity of the conclusions. 18 | P a g e Factual Questions Error can arise from: o Overstating or understating income o Rounding off age o Memory failures because events happened to long ago o Memory telescoping – recalling events as more recent than their actual dates. o Specificity in questions is important Attitude and Belief Questions Problems can arise because o It is possible that participants do not have an opinion on a topic as they have not thought about it – can be distinguished through time taken to answer a question o Attitudes are multi-dimensional – can be measured using related questions discussed in chapter 7. o Attitudes have a dimension of intensity Sociometric Questions – a category of questions which measures interactions among members of a group. Sometimes questions are answered more honestly if the participants believe their answers can influence social arrangements, for example seating arrangements. Can provide info on the individual’s position in the group. Behavior Questions Questions should be specific. Shorter interval in answering usually increases accuracy. General Issues Number of questions: Avoid including too many unnecessary questions, but make sure there are enough to permit full understanding of the responses. Different questions for different subsets of respondents (Answer yes, proceed to question number… etc.) Sensitivity related to a question posed needs to be considered Wording Improperly worded questions can result in biased or meaningless responses; terms must be exact and simple Structure Use short questions Questions should simplify the respondent’s task as much as possible 19 | P a g e Alternatives should be made clear (do you think such and such policy should be implemented instead of this and that policy) Forced-choice format – when participants are forced to choose either one answer or the other Double-barreled questions – inappropriately combining two separate ideas while asking for a single answer (e.g. use mother or father instead of ‘parents’) Open-ended versus closed-ended questions Open ended pros: allow more detailed answers, can be used when full range of responses is unknown. Open ended cons: Cost, self-contradictory, incomprehensible, irrelevant answers. Closed ended pros: easily scored for analysis, can help clarify type of response required Often, both forms of questions are used. Responses can be suggested through interval scales “No opinion” should be factored in to avoid unreliable response Filter questions – intended to screen out respondents who do not have any knowledge or opinion on the issue Floaters – people who answer “don’t know” to filtered questions but would give a good answer when there is only an unfiltered question Question Sequence Start with simple questions, follow with main questions Keep topically related questions together Clear transitions Funnel principle – general questions should come first, followed by increasingly specific and detailed questions Split-ballot experiment – two or more versions of a questionnaire used for different subsets – any differences in response can be attributed to wording or sequence variation. Sensitive Content Request income using broad categories Sometimes long questions are more useful in gaining reliable responses as they seem less threatening Terminologies in relation to racial questions are important. Randomized response techniques – the interviewer does not know whether the answer pertains to the sensitive or innocuous question, therefore respondent’s privacy is respected to some measure. Interviewing Questionnaire eliminates biases from social desirability, conformity etc. 20 | P a g e A positive atmosphere should be created to elicit more accurate answers (e.g. introduce yourself) Each question should be asked in the exact way it is worded on paper Explanation of questions causes a change in the frame of reference Every question must be asked Be careful not to suggest a possible reply “Don’t know” may mean a genuine lack of opinion OR the participant avoiding answering – can be aided through evaluative feedback (“thanks, we appreciate your frankness”) Quote participants directly when answering open-ended questions Avoid bias: o Interviewers can influence analysis of response, e.g. if the interviewer is a socialist o Interviewers’ perception of the situation. Can never be overcome completely, but should be avoided as much as possible. 21 | P a g e Chapter 7 - Scaling and Multiple-Item Measures There are many possible ways to scale responses to a particular question: Scaling – Assignment of scores to answers to a question to yield a measure to a construct. Response Scale – The range of possible answers to a given question Multiple Item Measures – When scaling methods obtain multiples observations or ratings and combine them into a single score. e.g.: A congressional representative’s positions on a number of votes can be combined to give a single score measuring the representative’s liberalism/conservatism. Advantages of Multiple Item Measures: - Complexity of a measure of a construct is reduced due the creation of a single score that summarizes several observed variables in a meaningful way. - Allows researchers to test hypotheses about the nature of a construct, whether the construct constitutes a single dimension or several different dimensions. o If a series of variables all measure a single general characteristic of a construct it is unidimensional. o Low associations among variables imply that several dimensions exist meaning the construct is multidimensional. Levels of Measurement: Qualitative scales: • • Nominal (a.k.a. attributes) – Different categories (e.g.., sex, species, phylum, location). – Can have the option “other”. Ordinal (e.g.., social class, attitude scales): – Different categories. – Categories are rankable. – Does not provide information about interval or degree in difference between values. - Likert scale is also ordinal. 22 | P a g e Quantitative scales: • • Interval (temperature on Celsius and Fahrenheit): – Different categories. – Rankable categories. – Constant equal-sized Intervals (can be expressed numerically). – Numbers cannot be multiplied or divided because the scale does not have a true 0. Ratio (e.g., lengths, weights, volumes, capacities, rates): – Different categories. – Rankable categories. – Constant equal-sized Intervals. – Absolute Zero (physical significance, the zero actually signifies something) (e.g., temperature in Kelvin, time). Rating Scales for Quantifying Individual Judgments - Process of a judgment tasks involves forming a subjecting judgment of the position of the stimulus object along the desired dimension and then translating that judgment into a rating using the scale provided. - In order to prevent variety between judges from time to time a number of rating scales can be employed: - Graphic Rating Scale o - Itemized Rating Scales/Specific Category/Numerical Scales o - Judge indicates rating by selecting point on the running from one extreme of attribute to the other. Requite rater to select one of a small number of categories ordered by scale position. e.g.: Liberalism or conservatism is rated by selection of categories that best describe political viewpoint. Categories can be clarified by brief verbal descriptions. Comparative Rating Scales o Require judge to make comparisons. o e.g.: Is the applicant more capable than 10% of the candidates? Rank-Order Scale: Judges are required to rank individuals in relation to one another in regard to a characteristic. 23 | P a g e Self-Rating: - All above ratings can be used to secure individual’s ratings o themselves or someone else’s rating of them. - Assumption is that individuals are in a better position to report their own beliefs and feelings as long as individuals are willing to reveal them. - Concept of what constitutes a moderate or an extreme position can be different. - In order to ensure reliable/valid self-ratings: o Individuals should be told what attribute is to be rating and be given opportunity to recall their behaviors in past situations that are relevant to the judgment. o Must be motivated to give accurate rather than socially desirable ratings. Cautions of Construction/Use Rating Scale: - Halo Bias – The tendency for overall positive/negative evaluations of object being rated to influence ratings on specific dimensions. - Generosity Error – Overestimation of desirable qualities in people that are liked by the rater. - Contrast Error – Tendency for raters to see others as opposites of themselves on a trait. Arises from belief in one self’s personal trait. o e.g.: Someone very orderly rates others as relatively disorderly. Solutions: - Multiple raters and computation of mean rating for each rated object reduces impact of random errors. - Multiple raters and increasing the judge’s familiarity with the object or person being rated reduces halo bias. Development of Multiple-Item Scales: - Domain – The hypothetical population of all items relevant to the construct we wish to measure. o e.g.: Attitude statements - Domain Sampling – When a sample of items is drawn from the domain, and the person’s responses to those items estimate the desired construct as measured by the entire population of items. - Item construction must be performed under certain guidelines: o Items must be empirically related to the construct that is to be measured. o Items must differentiate among people who are at different points along the dimension being measured. o Ambiguous items must be avoided. o Items must be worded in both a positive and negative direction in order to avoid the ‘acquiescent response style’, the general tendency to agree with statements regardless of their content. 24 | P a g e Three Types of Multiple-Item Scales: 1. Differential Scales - Includes items that represent known positions on the attitude scale. Respondents are assumed to agree with only those items who position is close to their own and to disagree with items that represent distant positions. - Require items that have a definite position on the scale: items will elicit agreement from people with positions near the item’s scale value, but disagreement from others whose attitudes are more or less favorable. - Example of a nonmonotone item: The possibility of respondents to provide a greater opinion rather than simply monotone items, which are either clearly favorable or unfavorable to the object. - Advantages: - o Responses offer a check on the scale’s assumption. o Latitude of acceptance (the range of scale values that the subject agrees with) can be calculated. Disadvantages: o The construction procedure is lengthy and cumbersome. o The attitudes of judges influence their assignment of scale values to the items. o Lower reliabilities than other scales. 2. Cumulative Scales - Made up of a series of items with which the respondent indicates agreement/disagreement. - Designed in a way that a respondent who holds a particular attitude will agree with all items on one side of the position and disagree with other items. - Presents scale score, in which the total number of items person agrees with is added. o - Bogardus Social Distance Scale presents this cumulative pattern. (p. 169) Advantages: o A single number (the scale score) carries complete information about the exact pattern of responses to every item. o Provides a test for unidimensionality of the attitude. - Items that reflect more than one dimension will not form a cumulative response pattern. Disadvantages: o A simple random error in responses can distort the perfect cumulative response pattern. o Limited to unidimensional domains. Finding such domains is difficult. 25 | P a g e 3. Summated Scales - Consists of a set of items to which the participant responds with agreement or disagreement. - Uses only monotone items – items are definitely (un-)favorable, no reflection of middle position. - Responds indicate a degree of agreement/disagreement to each item. - Scale score is derived by summing or averaging the numerically coded agree and disagree response to each item. - Interpretation: The probability of agreeing with favorable items and disagreeing with unfavorable ones increases directly with the degree of favorability of the respondent’s attitude. (I would read these last pages of the chapter on the Summated Scale because the examples and diagrams help to understand the text: p. 172 – 176) 26 | P a g e Chapter 8 - Fundamentals of sampling External validity 1. Varies for different research: in most survey research it is quite important (psychology=> internal validity more important) 2. The nature of the desired generalization can take different forms for different types of research. Some research aims specifically at drawing conclusions about a given target population => particularistic research goals. Other research aims at testing theoretically hypothesized associations, no specific population or setting as focus of interest=> universalistic research goals=> no interest in the ability to extend the research findings from the sample to some population. Instead, the applicability of the theory itself outside the research context is the central question, and sampling is of little/no concern SO: research can serve different purposes. The relative priority of the various research validities depends on what the researcher is trying to accomplish!! Some basic definitions and concepts Population: is the aggregate of all of the cases that conform to some designated set of specifications. Stratum: is defined by one or more specifications that divide a population into mutually exclusive segments Population element: a single member of a population Census: is a count of all the elements in a population and/or a determination of the distributions of their characteristics, based on info obtained of each of the elements. Sample: when we select some of the elements with the intention of finding out something about the population from which they are taken, we refer to that group of elements as a sample. Sampling plans: carry the insurance that our estimates do not differ from the corresponding true population Margin of error: helps to estimate how correct the percentages are. Confidence level: the probability of samples that would produce correct results again. Representative sampling plan: sampling plan that carries confidence level insurance. It ensures that the odds are great enough so that the selected sample is sufficiently representative of the population to justify running the risk of taking it as representative. Sampling Error: The difference in the distribution of characteristics between a sample and the population as a whole. SAMPLING The basic distinction in modern sampling theory is between non-probability and probability sampling: Non-probability sampling 27 | P a g e Accidental samples: We simply take the cases that are at hand, continuing until the sample reaches a certain size. (Ex: take the first 100 people that we meet on the street who are willing to participate). Disadvantage: Very low external validity. Quota samples: adds insurance to guarantee the inclusion of diverse elements of the population and to make sure that they are taken account of in the proportions in which they occur in the population. - Almost accidental sampling, but forcing inclusion of i.e. minorities (while trying to keep proportions). Purposive samples: basic assumption is that with good judgment and an appropriate strategy, we can handpick the cases to be included and thus develop samples that are satisfactory in relation to our needs. - Common strategy: Pick cases that are judged to be typical of the population in which we are interested, assuming that errors of judgment in the selection will tend to counterbalance one another. - Snowball Samples: When research question concerns a special population whose members are difficult to locate (Ex: research into gang affiliated people). The snowballing results from members of an initial sample from the target population enlisting other members of the population to participate in the study. If they bring more people in, sample grows. Probability sampling Probability sampling provides the first kind of insurance against misleading results and they provide a guarantee that enough cases are selected from each relevant population stratum to provide an estimate for that stratum of the population. Types of probability sampling: Simple Random Samples: Selected by a process that not only gives each element in the population an equal chance of being included in the sample, but also makes the selection of every possible combination of the desired number of cases equally likely. Requires either a list or some other systematic enumeration of the population elements (sampling frame) Random number generator Careful: don´t use systematic sampling (i.e. every tenth element) Stratified Random Sampling: the population is first divided into two or more strata, then a sample random sample is taken from each stratum, and the subsamples are then joined to form the total sample 28 | P a g e Cluster Sampling: You divide your population into groups (e.g. states of America) and randomly pick some of the groups to (randomly) sample from. - Multistage area sampling: You cluster sample from your cluster sample to create a smaller research area. (E.g. you take a random city from your random state.) This can have as many levels as you want. (Countries, states, cities, streets, houses, etc.). Multilevel Samples: Combines any of the above methods. Chapter 9 - Probability Sampling Method Basic Probability Sampling Methods Simple Random Sampling - Requires list or other systematic enumeration of the population elements => sampling frame. E.g. currently enrolled students in university. Each population element having an equal and independent probability of being sampled + every possible combination of elements of a particular number has an equal probability of being drawn - Make the sample: as an example a list composed of 1 672 elements and your sample size is of 160. Then make a list of random numbers referring to the elements in the list, a high-quality computer random number generator could do that as well. (See later) => Each population element has an equal and independent chance of being included in the sample. Pitfalls Simple Random Sampling: 1) The procedure of systematic sampling (way of choosing elements that is not independent) can create important biases. E.g. noticing that the desired sampling size of 160 is approximately 1 in 10 population elements, you might pick a random number btw 1 and 10 (say, 6) and sample the sixth, sixteenth, twenty-sixth and so.. That is, every tenth element after the random start. Even if every element has an equal chance of being chosen because of the random choice of starting point; this is not a simple random sample because the selection decisions are not independent. 2) The treatment of population elements that are ineligible for sampling. Lists including eligible and ineligible elements. It might seem appropriate simply to select the next element on the list when the sampling procedure comes up with an ineligible name. This method would introduce bias and violate the nature of probability sample. Names that follow ineligible ones would have double the usual probability of selection. The correct procedure is to draw a larger sample in the first place, large enough to compensate for the loss of ineligible elements. E.g. if final sample of 160 is desired, and the estimation of ineligible elements is 20% of the list, then 29 | P a g e the initial sample size would be: 160 x 1.20 or 200. After the expected 20% loss, the remaining sample would approximate the desired size and retain the property of being statistically correct simple random sample. Obtaining and Using Random Numbers The practical question of how best to obtain a list of random numbers. Two categories of strategies: published tables and random numbers generators (software or hardware based). Stratified Random Sampling The population is first divided in two or more strata. The strata can be based on a single criterion (e.g. gender) or a combination of two or more criteria (e.g. age and sex). From each stratum a simple random sample is taken, and the subsamples are then joined to form the total sample. Table 9.2 compares simple random sample and stratified samples (on stratified by sex, the other by age). Observations and conclusions: - There is a marked improvement over SRS (simple random sampling) when the sampling is based on a stratification of the population by sex. The result is marked increase in a number of samples that give means very close to the population mean and a marked reduction in the number of sample means that deviate widely from the population mean. - When stratified by age however there is no improvement in the efficiency of sampling. Stratification contributes to the efficiency of sampling if it succeeds in establishing classes that are internally comparatively homogenous with respect to the characteristics being studied (if the difference between classes are large in comparison with the variation within classes). Sampling the various strata in different proportions It is not necessary for the strata to reflect the composition of the population. Thus, in sampling from a population in which the number of males equals the number of females, it is permissible (and might be desirable) to sample for instance 5 females to every male. However it is then necessary to make an adjustment to find the mean score of the sample that will be the best estimate of the mean score for the total population => “weighting”. Reasons? - Sometimes necessary to increase the proportion sampled from classes having small numbers of cases in order to guarantee that these classes are sampled at all. - We might want to subdivide the cases within each stratum for further analysis. - Mathematical reason. Consider two strata, one of which is more homogeneous with respect to the characteristics being studied than the other. For a given degree of precision, it will take a smaller number of cases to determine the state of affairs in the first stratum than in the second. E.g. if with respect to certain types of opinion questions, men differ among themselves much more than women, we would accordingly plan our sample to include larger proportion of men. If it is the case that women would be expected to be more alike than men in these matters, they do not have to be sampled as thoroughly as do the men for a given degree of precision. In general terms, we can expect the greatest precision if the various strata are sampled proportionately to their relative variabilities with respect to the characteristics under study rather than proportionately to their relative sizes in the population. 30 | P a g e In summary, the reason for using a stratified rather than a simple random sample plan is essentially a practical one: more precise estimates of population values can be obtained with the same sample size under the right conditions. The right conditions involve relative homogeneity of the key attributes within each stratum and easy identification of the stratifying variables. Sampling Error The difference in the distribution of characteristics between a sample and the population as a whole. Because we cannot measure the entire population, we can only estimate the extent of sampling error for a given sample. Calculate the degree of confidence. In SSC, the convention is to accept estimates about which there is a 95 % confidence that the estimate is correct. Make statements such as: “With 95% confidence we can say that there is X % sampling error given the size of our sample”. Sampling error decreases when size of a random sample increases. This property of random samples is easy to see in the formulas used to calculate margin of error. When estimating with 95 % confidence that margin of error associated with an estimate of a proportion, we can use the formula: 1.96 x V P(1 – P)/ N. N = sample size and P = the proportion of the sample that displayed the behavior of interest to us. Because N is the denominator, increasing N always reduces the estimate of sampling error. NB: Sampling error and the oft-cited margin of error associated with public opinion polls are estimates based on a set of assumptions that rarely are met. Such estimates are idealized values that assume a truly random sample from the population and no extraneous influences associated with others aspects of the polling. 31 | P a g e Chapter 11 - Randomized Experiments Randomized experiments are highly specialized tools. They are ideally suited for the task of causal analysis. The main strength of randomized experiments is their internal validity, which is accomplished through the researcher’s assumption of control over the independent variables in the design. To use a randomized experiment, the researcher must be in a position to decide which participants are assigned to which level of the putative cause. The control for a randomized experiment is most easily achieved in a laboratory setting. Controlling and Manipulating Variables Independent variable: the variable that has a causal influence on our outcome variable Dependent variable: is the outcome variable; its values depend on the independent variable Individual difference variables: all variables that people bring with them to a study and are virtually impossible to manipulate, (they are a type of independent variable) Ex: religion, income, education, etc. Experimental variables: properties an experimenter can manipulate or expose people to. Randomized experiment: individuals are randomly assigned to the various levels of the independent variable. Random Assignment (randomization): a procedure we use after we have a sample of participants and before we expose them to a treatment. It is a way of assigning participants to the levels of the independent variable so that the groups do not differ as the study begins. All participants have an equal chance of being assigned to the various experimental conditions. (Maximizes the internal validity of research) Works only on average. Random Sampling: is the procedure we use to select then the first place the participants we will study. Used to make sure the participant group we study is representative of a larger population. Also a ‘fair’ procedure where all participants have an equal chance of being included in the study. Repeated measures design: rather than some participants being in one condition and some in the other, all participants are in both conditions. 32 | P a g e Counterbalancing: the practice of varying the order of experimental conditions across participants in a repeated measures design. It is important because it helps assure internal validity and it helps control for possible contamination or carryover effects between experimental conditions. Threats to internal validity: Selection: refers to any preexisting differences between individuals in the different experimental conditions that can influence the dependent variable. Selection is always a threat to validity any time participants are not randomly assigned to a condition. Maturation: involves any naturally occurring process within persons that could cause a change in their behavior. Ex: fatigue, boredom, growth, intellectual development. History: refers to any event that coincides with the independent variable and could affect the dependent variable. Could be any historical event that occurs in the political, economic or cultural lives of the people we are studying. Instrumentation: is any change that occurs over time in measurement procedures or devices. Ex: experimenter found better way to collect data. Mortality: refers to any attrition of participants from a study. Ex: if some participants do not return for a posttest of if participants in a control group are more difficult to recruit than participants in a treatment group, these differential recruitment and attrition rates could create differences that are confused with effects of the independent variable. Mortality is a problem with longitudinal research. The greater the mortality, the less representative the final participant sample. Differential mortality: when mortality rates are different for the various experimental groups. Also creates a treat to internal validity. Selection by Maturation: occurs when there are differences between individuals in the treatment groups that produce changes in the groups at different times. For examples of these threats on internal validity: p. 249-252 Construct Validity of Independent Variables in a Randomized Experiment Operational Definition: the procedure used by the researcher to manipulate or measure the variables of the study. Construct validity: making sure that our variables capture the construct we wish to measure. 33 | P a g e Manipulated variable: when researchers create rather than measure levels of an independent variable. Manipulated Checks: when researchers demonstrate the validity of their manipulated variables, they generally obtain a measure of the independent variable construct after they have manipulated it. Alternative Experimental Designs: see page 254-264 for graphs and examples. Strengths & Weaknesses of Randomized Experiments: By randomly assigning people to experimental conditions, experimenters can be confident that the subsequent differences on the dependent variable are caused, on average, by the treatments and are not preexisting differences among groups of people. Randomized experiments can rule out many alternative explanations. Experimental Artifacts: Refers to an unintended effect on the dependent variable that is caused by some feature of the experimental setting other than the independent variable. Ex: experimenters can unwittingly influence their participants to behave in ways that confirm the hypothesis, particularly if the participants want to please the experimenter. External Validity Experimental designs and procedures maximize the internal validity of research- they enable the researcher to rule out most alternative explanations or threats to internal validity. Experimenters might maximize internal validity at the expense of the external validity or of the results. Laboratory experiments are criticized because they are poor representations of natural processes. Being artificial is not necessarily a disadvantage. Some lab analogues are more effective than their realistic. Another criticism to experiments is the representativeness of the research participants. A drawback of randomized experiments is that they are rarely yield descriptive data about frequencies or the likelihood of certain behaviors that we can generalize to the rest of the population. An important difference between how probability surveys and experiments are usually conducted is that probability surveys enlist a random sample of respondents who are representative of some larger population. A survey provides descriptive data about the population. An experiment, on the other hand, usually does not make use of a representative or random sample because the purpose of the experiment is not to provide descriptive data about percentages of people in the population. The purpose of an experiment is to provide information about causes and effects. 34 | P a g e Randomized Two-Group Design: Participants are randomly assigned to the experimental treatment group or to a comparison group. No selection threat to internal validity (randomly assigned) No maturation threat (groups mature at the same rate) No instrumentation threat if groups were tested under similar conditions Pretest-Posttest Two-Group Design: Solomon Four-Group Design: Combines the Randomized Two-Group Design with the Pretest-Posttest Two-Group Design. This way the experimenter can see if the pre-test has an influence on the outcome (test the test effect). 35 | P a g e Between-Participants Factorial Design: In a factorial design, two or more independent variables are always presented in combination. The entire design contains every possible combination of the independent variables We can ask whether the effect of one of the independent variables is qualified by the other independent variable. If it does, the two independent variables are said to “interact” in producing o (dependent variable) Repeated Measures Design: Same group of people exposed to multiple (to be tested) treatments, instead of testing all the treatments on separate groups. The repeated measures designs are randomized experiments as long as we randomly assign participants to be exposed to the various conditions in different orders. Needs less participants. 36 | P a g e Chapter 13 - Nonrandomized Designs Science does not begin and end with the randomized experiment. Science is a process of discovery, in which researchers use the best tools available to answer their questions. - In nonrandomized designs the research participants are not randomly assigned to levels of the independent variable. - Instead the comparisons between levels or between treatment or non-treatment conditions must always be made with the presumption that the groups are non- equivalent. - As a result, the internal validity of these designs is threatened by the full range of threats discussed in chapter 11. - These designs are preferred however because - The relative sacrifice in internal validity can well worth the cost. Depending on the aspirations of the researcher and the context in which the research is conducted. This type of experimentation can be done for example: 1. Sociologists gather information from a representative sample of male members of the U.S. labor force to study their training and occupational attainments. 2. Medical researchers survey the nation’s population to determine the incidents of disease related factors. 3. Political sociologists survey a sample of students in large universities to determine whether they support or oppose a military draft in the United States. - In each of these cases the researchers are not interested in establishing cause and effect conclusions. - Rather, of central concern is measuring constructs well and gathering information from a representative sample of individuals. Cross-sectional comparison Form a class of research method that involve observation of some subset of a population of items all at the same time, in which, groups can be compared at different ages with respect of independent variables, such as IQ and memory. - Panel Design: This means that in an experiment the same people were re-interviewed regularly. Quasi-experimental designs One or more independent variables are manipulated but participants are not randomly assigned to levels of the manipulated variables. 37 | P a g e - Interrupted time-series design: “Time series” refers to the strategy of measuring a set of variables on a series of occasions (e.g. monthly) during a specified period of time. “Interrupted” refers to the strategy of introducing the stimulus or event during the period of assessment in order to evaluate its effect on the variables being measured. - Static-group comparison design: This design has as many groups as there are levels of the independent variable. As such, the independent variable varies between participants. - Alternative explanations: Different causes for the outcome of an experiment, due to different influences. - Pretest-Posttest Nonequivalent Control Group Design: Is an extension of the static-group comparison design that includes measures of the dependent variable at multiple points in time. - One group Pretest-Posttest Design: Also known as simple panel design in survey research, is based on withinindividual treatment comparisons. - Regression towards the mean: or statistical regression as it is also called, refers to the phenomenon that extreme scores are not likely to be as extreme on a second testing. - Under matching: Matching on variables known to be associated with the dependent variable always errs in the direction of under matching and, therefore, fails because we can never know when we have matched on enough variables to be sure the two groups represent the same population. 38 | P a g e Chapter 16 - Qualitative research In quantitative research the researcher does not impose structure or questions on the participant, but rather learns from listening to the participant discuss issues in his or her own voice. Narrative Analysis Narrative Analysis is an oral or written recitation of events in the past, so someone would recite their lives, or that of someone else. A Narrative Analysis can even be fictional, if you were to research fairy tales for example, and are asking people for their version. Focus groups A group of 6 – 10 individuals is brought together that discuss a topic of choice by the researcher. The researcher guides the discussion, also called the moderator. The moderator also processes the information found trough the discussion. Oral history Oral history is a method for recording extended life stories of individuals. Participant Observation Observing while participating, like those old-school anthropologist that went to some tribe to live with them and observe them. When they have reached a conclusion, that conclusion should hold true for every observation without exceptions.