Psy 1191 Research Methods Workshop: Operational Definitions

Introduction: The science of psychology tries to develop explanations of human behavior through objective observations. The procedures or operations that we use to objectively measure a variable are known as its operational definition. The operational definition gives the variable meaning within a particular study. Because the meaning of our study rests on how we objectively observe the construct or behavior of interest, developing a reliable and valid set of procedures for measuring our variables is crucial for the validity of any research study. Good operational definitions require that we first specify our constructs (see the Specifying Constructs workshop); developing reliable and valid operations is the last step of specifying constructs when we are designing our own studies. It is always easier to use an existing measure than to develop a new one. Be sure to check the literature for measures that have been successfully used in similar research. A careful reading of the "Procedures" and "Measures" sections of articles will give us information that will help us identify and evaluate the operational definitions used in published research studies. The features of a good operational definition vary depending on the study design. We will examine operational definitions for variables measured in observational, survey, and experimental studies.
Behavioral Observation: Observational research requires careful attention to specifying where and how observations are made, what is observed, and how it is recorded. As a result, operational definitions in this type of research may be quite lengthy, with multiple components. Let's say that you recently read Nancy Henley's theory of status, power, and dominance and want to study whether those in higher-status positions are more likely to initiate informal, friendly touches and those in lower-status positions are more likely to initiate more formal touches. Let's develop each part of our operational definition. You decide to attend a series of professional meetings sponsored by local businesses and observe members during the social hour before the meeting is called to order. What are the advantages and disadvantages of making observations in this setting? Next you need to decide how to do the observations. Will you pick "targets," unobtrusively observe them for the whole social hour, and count how many and what kinds of touches they make (Strategy #1)? Or will you wait until you observe a member touch someone and then record what kind of touch it was (Strategy #2)? List an advantage and a disadvantage of each strategy. Before deciding on a strategy, you will definitely want to review the empirical literature on touch. Most of this literature uses the touch as the unit of analysis (Strategy #2) because it yields a wider variety of touches. Now you need to decide what you will observe. Based on your reading of the literature, you want to identify formal and informal touches. You need to either find an existing measure or develop your own. In either case, you should have some idea of the content that needs to be included to adequately measure your variable. What types of touches should be included in the categories of "informal" and "formal" touches?
If you develop your own coding system for touches, you need to run a pilot test to make sure that your raters can reliably identify the behaviors of interest. The pilot test will tell you if you are missing types of touches, if you need to eliminate types that are never seen, if you need to combine categories, etc. Once you have a complete and usable set of codes/behaviors, you are ready to conduct and record your observations. When you are reading an observational study, look for pilot studies or descriptions of how the coding system was used in previous studies. The final step of developing your observations is to decide how you will record them. In this study, you want the observation process to be as natural and unobtrusive as possible. Let's say you have three choices: paper and pencil tucked in a program or on a clipboard; a handheld organizer; or a small tape recorder. What are the advantages and disadvantages of each strategy? Reliability and validity are issues for all operational definitions. We want accurate and reliable observations, and we want our observations to validly reflect the variable of interest. If choosing an existing coding scheme, look for good inter-rater reliability. Remember that the more complex the behavior being recorded, the more difficult it is to achieve good inter-rater reliability. Coding schemes that have been successfully used in other studies demonstrate good construct validity. If you are developing your own measure, be sure to assess inter-rater reliability. At minimum, you should have good content validity. Let's go on to our next type of study: surveys. From Wadsworth Publishing: http://www.wadsworth.com/psychology_d/templates/student_resources/workshops/res_methd/science/science_07.html

Surveys: Operational definitions for survey studies address the survey method, the type of question, and the question content. We will briefly address each issue.
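Before we take up surveys: the inter-rater reliability check recommended above for coding schemes can be made concrete with a short computation. The sketch below (in Python) computes percent agreement and Cohen's kappa, a standard chance-corrected index of inter-rater agreement, for two observers coding the same touches. The function name and the pilot data are hypothetical, invented only for illustration:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' category labels."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: proportion of touches both raters coded identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal category proportions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical pilot data: two observers each code the same ten touches.
a = ["informal", "formal", "informal", "informal", "formal",
     "informal", "formal", "informal", "informal", "formal"]
b = ["informal", "formal", "informal", "formal", "formal",
     "informal", "formal", "informal", "informal", "informal"]
print(round(cohens_kappa(a, b), 2))  # prints 0.58
```

Here the raters agree on 8 of 10 touches (80%), but because much of that agreement could occur by chance, kappa is a more modest 0.58 — exactly the kind of gap a pilot test is meant to reveal.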
For a comprehensive view of all aspects of surveys, see the Surveys Workshop. In addition, the Survey Design Workshop addresses how to put the survey together. Methods: There are three methods for obtaining survey data that are commonly used in the literature: face-to-face interviews, telephone interviews, and self-administered questionnaires. Each strategy has its advantages and disadvantages. Face-to-face interviews are the best choice when you need to establish rapport with your participant. In a face-to-face interview, you can show respect by attending carefully to the participant's responses, offering encouragement, and answering questions. The disadvantage of the face-to-face interview is that the social situation created might bias the participants' responses. They might not want to disappoint you or might feel hesitant to answer a question on a sensitive topic. Face-to-face interviews are also very expensive to administer. Telephone interviews have the advantage of offering some social distance, since the participant cannot see you. It might be easier to answer a sensitive question when you cannot see the interviewer's reaction to your response. You can also answer participants' questions easily in a telephone interview, and the cost is considerably less than that of a face-to-face interview. In addition, random digit dialing makes it very easy to recruit a random sample from the general population. A major disadvantage of telephone interviews is that it is much easier to deny a telephone request than a face-to-face request. Caller ID also makes it easier to refuse by simply not answering the call. Self-administered surveys have the great advantages of privacy and low cost. Participants can choose when it is convenient to sit down and answer the survey, which gives them time to give more considered responses. This is a great advantage when asking sensitive questions.
Self-administered surveys are also a very low-cost alternative, especially if given over the Internet. A major disadvantage of self-administered surveys is that participants cannot ask questions. The response rates for this method are often very low. Let's say your counseling center decided to do a survey to find out what students know about the services offered and whether students ever used the services personally or recommended them to a peer. What method would you recommend, and why? Types of questions: There are three types of survey questions: open-ended, closed-ended, and partially closed. Use open-ended items when it is important to have complete answers in the participants' own words. Open-ended items are particularly useful when questions are sensitive and you want to let the participant know that you are interested in the response, no matter what it is. For example, the question "What do you think are the best ways to discipline young children?" permits participants to have a wide range of responses. There is no suggestion in the question that there is a preferred method of discipline. Open-ended questions are particularly useful when beginning a new area of research. You need to have a good sense of the entire range of responses in order to create valid closed-ended items. The major disadvantage of open-ended responses is that they require more effort from the participants and take a great deal of time to score. Closed-ended questions limit responses to specific alternatives written by the researcher. Closed-ended items can use multiple choice, rankings, or Likert ratings. Closed-ended items are easy for participants to answer and require the least effort of researchers to code and analyze. Closed-ended items are not appropriate when the expected responses are too complex to fit into a small number of categories. Extensive pilot research may be necessary to develop good items.
Partially closed items are favored when you have a good idea of the range of expected responses but want to give the participant the opportunity to give an answer that is rare or that you did not consider. Participants write in their own responses. For example:

Which of the following forms of discipline do you think are best for young children? (Check as many as apply)
___ Time out
___ Spanking
___ Redirection
___ No!, plus explanation
___ Other, please specify

Content: The content and wording of items are critical for effective survey research. Review the specification of your construct and make sure that all of the dimensions are covered by survey questions. For example, if you were studying post-traumatic stress, you would want to make sure that the instrument you selected included questions about symptoms of intrusion like nightmares and flashbacks, symptoms of avoidance like emotional numbing and going out of your way to avoid settings similar to the one where the trauma occurred, and symptoms of hyperarousal like vigilance and irritability. Make sure that each item addresses only one issue. Sometimes we are tempted to include more than one idea in an item in order to "soften" the statement. This can result in problems when trying to interpret the answers. For example, in a personality test, participants are asked to agree or disagree with the following statement: "I am a warm and friendly person." For most people, these characteristics will go together, and a "yes" or "no" response will reflect their true character. It is possible, though, that people see themselves as warm to others but not particularly friendly. Should they agree because they have the characteristic of warmth? Or should they disagree because they are not friendly?
Two separate items listing warmth and friendliness would be a better alternative. Avoid bias in the wording of survey questions. Biased items limit the range of responses you receive. For example, the question "Should American citizens have the right to protect their families from harm?" often appears on questionnaires written by those opposing gun control legislation. Most adults would answer "yes" to this question. However, if the question were worded "Should American citizens have unrestricted access to guns in order to protect their families from harm?" the same men and women might not answer "yes." Biased items should also be avoided because they may reveal the study hypothesis. Response alternatives should be clear, mutually exclusive, and exhaustive. Pilot testing can help you determine whether your questions and responses meet these criteria. Participants should fit into one and only one category. If you were asking participants to report the number of times they read the newspaper in the past month, the following categories would not be mutually exclusive: [0; 1-5; 5-10; 10-15; 15-20; 20-25; 25-30]. Someone who read the newspaper 15 times could accurately fit in both the 10-15 and the 15-20 categories. For the categories to be exhaustive, every response must fall into at least one alternative. If you assumed that everyone reads the newspaper, you might omit the "0" category. This would be a problem for someone who prefers to obtain their news from the television or Internet rather than the newspaper. You try it: Your school is considering moving alumni weekend to the same weekend as graduation. Most students think this will be chaotic, but the administration believes that it will be a positive experience for alumni and will give graduating seniors a chance to meet those who have successfully launched careers. The administration asks the psychology department to help design a questionnaire.
What type of questions would you recommend for this project? Try writing one open-ended and one closed-ended question. Reliability: The reliability of survey instruments is usually assessed over items and occasions. Internal consistency estimates like Cronbach's alpha tell us how well multiple items assess the same underlying construct or dimension. A high Cronbach's alpha means that if a person scored high on one item, they also tended to score high on the other items. When you have only a single item that measures your content, reliability is usually tested over time. The item is given on two different occasions and the responses are correlated. A high correlation means that the same or similar responses were given both times and the instrument or question is relatively stable. When choosing existing measures for a survey, look at how reliability was established and at the level of reliability achieved. Validity: It is important to establish the construct validity of survey measures. We often do this by correlating instruments with similar measures. Construct validity can also be established by predicting a specific behavior or criterion. Construct validity for existing measures is often established through their successful use in a wide range of studies. Examining the results of those studies can tell us the various types of factors or behaviors associated with the construct assessed by the measure. Experiments: The operational definition of our independent variable is central in experimental research. How we set up the laboratory and the experimental manipulation is critical to the success of our experiment. Many experiments use music to induce a positive or negative mood. Energizing music like waltzes and mazurkas or calming music like sonatas is often used for the happy or positive condition. Complex music with many changes in tempo and tone (e.g., Wagner) or atonal music (e.g., Glass) is often used to induce negative moods. Typically, classical music is used.
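Returning briefly to the internal-consistency estimate described in the Reliability section above: Cronbach's alpha can be computed directly from the item variances and the variance of each respondent's total score. The Python sketch below illustrates the calculation; the function name and the rating data are hypothetical, invented only for illustration:

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, all over the same respondents."""
    k = len(items)
    # Each respondent's total score across the k items.
    totals = [sum(scores) for scores in zip(*items)]
    # Alpha rises as items covary (total variance grows relative to item variances).
    item_var = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# Hypothetical 5-point ratings: 3 items answered by 6 respondents.
items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 4],
    [5, 4, 2, 4, 3, 5],
]
print(round(cronbach_alpha(items), 2))  # prints 0.87
```

In this made-up example the three items rise and fall together across respondents, so alpha is high (0.87) — people who score high on one item tend to score high on the others, just as the text describes.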
You want to use this manipulation for a class experiment but are worried that participants will not like the classical music that has been reported in the literature. If you wanted a genre that is more popular among college students, what would you use? Write down some of the factors you would need to consider to develop a manipulation that used more modern music. To develop the operational definition of positive and negative music, you would need to do a number of pilot studies in which students rate the music on a wide range of characteristics. You would need to have them compare different songs from different genres and use this information to select music that will have the desired effect. Reliability and validity are usually established in experiments through pilot tests and manipulation checks. Test-retest reliability is often calculated. Manipulation checks reveal whether the experimental manipulation produced the desired group differences. This establishes the construct validity of the operational definition.
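A manipulation check like the one described above usually comes down to comparing the two conditions on a mood rating. The Python sketch below computes Welch's t statistic for that comparison; the ratings, scale, and function name are hypothetical, invented only for illustration:

```python
from statistics import mean, variance
from math import sqrt

def welch_t(group1, group2):
    """Welch's t statistic for comparing two independent group means."""
    v1, v2 = variance(group1), variance(group2)
    n1, n2 = len(group1), len(group2)
    return (mean(group1) - mean(group2)) / sqrt(v1 / n1 + v2 / n2)

# Hypothetical manipulation-check ratings (1 = very sad ... 7 = very happy)
# collected after pilot participants heard each type of music.
happy_music = [6, 5, 7, 6, 5, 6, 7, 5]
sad_music = [3, 2, 4, 3, 2, 3, 4, 2]
print(round(welch_t(happy_music, sad_music), 1))  # prints 7.2
```

A large t here would tell us that the music produced the desired group difference in mood, supporting the construct validity of the operational definition; a t near zero would send us back to the pilot-testing stage.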