Experimental Methods in Social Ecological Systems

Juan-Camilo Cárdenas, Universidad de los Andes
Jim Murphy, University of Alaska Anchorage

Agenda – Day 1
  Noon – 12:15    Welcome, introductions
  12:15 – 1:15    Play Game #1 (CPR: 1 species vs. 4 species)
  1:15 – 2:00     Debrief Game #1 and other results from the field
  2:00 – 2:15     Break
  2:15 – 3:15     Game #2 (Beans game)
  3:15 – 4:00     Debrief Game #2
  4:00 – 4:15     Break
  4:15 – 5:00     Basics of experimental design
  Homework for Day 2: Think of an interesting question or problem to be worked on in groups tomorrow

Agenda – Day 2
  8:30 – 9:15     Designing and running experiments in the field
  9:15 – 10:15    Classwork: work in groups solving experimental design problems
  10:15 – 10:30   Break
  10:30 – 11:15   Discussion on group solutions
  11:15 – Noon    Begin designing your own experiment (form groups based on best ideas proposed)
  Noon – 1:00     Lunch
  1:00 – 1:30     Continue designing your own experiment (work in groups)
  1:30 – 2:30     Present designs
  2:30 – 3:00     Feedback: how could we make this workshop better?

Materials online
  We will create a web site with materials from the workshop.
  Please give us your email address (write neatly!!) and we will send you a link when it is ready.

Why run experiments?

Types of experiments
1. “Speaking to Theorists”
  Test a theory or discriminate between theories
    Compare theoretical predictions with experimental observations
    Does non-cooperative game theory accurately predict aggregate behavior in an unregulated CPR?
  Explore the causes of a theory’s failure
    If what you observe in the lab differs from theory, try to figure out why.
    Communication increases cooperation in a CPR even though it is “cheap talk”. Why?
    Is my experiment designed correctly?
    What caused the failure?
  Theory stress tests (boundary experiments)

Types of experiments (cont.)
2. “Searching for Facts”
  Establish empirical regularities as a basis for new theory
    In most sciences, new theories are often preceded by much observation.
    “I keep noticing this.
    What’s going on here?”
  The Double Auction
    Years of experimental data showed its efficiency even though no formal models had been developed to explain why this was the case.
  Behavioral Economics
    Many experiments have identified anomalies, but no theory has yet been developed to explain them.

Types of experiments (cont.)
3. “Whispering in the Ears of Princes”
  Evaluate policy proposals
    Alternative institutions for auctioning emissions permits
    Allocating space shuttle resources
  Test bed for new institutions
    Electric power markets
    Water markets
    Pollution permits
    FCC spectrum licenses

Basics of Experimental Design

Baseline “static” CPR game
  Common pool resource experiment
  Social dilemma
    Individual vs group interests
    Benefits to cooperation
    Incentives to not cooperate
  Field experiments in rural Colombia
    Groups of 5 people
    Decide how much to extract/harvest from a shared natural resource

Payoff table
  Rows: My Level of Extraction (0–8). Columns: Total Level of Extraction by Others (0–32).

Mine    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   32
   0  900  882  864  846  829  811  793  775  757  739  721  703  686  668  650  632  614  596  578  560  543  525  507  489  471  453  435  417  400  382  364  346  328
   1  996  976  955  934  914  893  873  852  831  811  790  769  749  728  708  687  666  646  625  604  584  563  543  522  501  481  460  439  419  398  378  357  336
   2 1087 1064 1040 1017  994  970  947  923  900  877  853  830  807  783  760  736  713  690  666  643  620  596  573  549  526  503  479  456  433  409  386  362  339
   3 1172 1146 1120 1094 1068 1042 1016  989  963  937  911  885  859  833  807  780  754  728  702  676  650  624  598  571  545  519  493  467  441  415  389  362  336
   4 1252 1223 1194 1165 1137 1108 1079 1050 1021  992  963  934  906  877  848  819  790  761  732  703  675  646  617  588  559  530  501  472  444  415  386  357  328
   5 1326 1295 1263 1231 1200 1168 1137 1105 1073 1042 1010  978  947  915  884  852  820  789  757  725  694  662  631  599  567  536  504  472  441  409  378  346  314
   6 1395 1361 1326 1292 1258 1223 1189 1154 1120 1086 1051 1017  983  948  914  879  845  811  776  742  708  673  639  604  570  536  501  467  433  398  364  329  295
   7 1458 1421 1384 1347 1310 1273 1236 1198 1161 1124 1087 1050 1013  976  939  901  864  827  790  753  716  679  642  604  567  530  493  456  419  382  345  307  270
   8 1516 1476 1436 1396 1357 1317 1277 1237 1197 1157 1117 1077 1038  998  958  918  878  838  798  758  719  679  639  599  559  519  479  439  400  360  320  280  240

  Subjects choose a level of extraction 0–8
    Rows near 0: low harvest levels (“conservative”)
    Rows near 8: high harvest levels
  Payoffs also depend on choices of other 4 group members
  Group earnings largest if all choose 1
  Strong incentives to harvest more than 1
  Social optimum: All choose 1
  Nash equilibrium: All choose 6

Comment on payoff tables
  The early CPR experiments typically used payoff tables.
    We don’t live in a world of payoff tables
    Frames how a person should think about the game
    A lot of numbers, hard to read
    Too abstract?
  More recent CPR experiments use richer ecological contexts
    e.g., managing a fishery is different from managing an irrigation system

Objective
  To explore the interaction between:
    Formal regulations imposed on a community to conserve local natural resources
    Informal non-binding verbal agreements to do the same

Possible 2x3 factorial design
                   External Enforcement
  Communication    None         Low          Medium
  No               Baseline     Low          Medium
  Yes              Comm Only    Low + Comm   Medium + Comm

  These 2 treatments have been conducted ad nauseam. Are they necessary?

  • Groups of N=5 participants
  • Play 10 rounds of one of the 6 treatments
  • Enforcement
      • Individual harvest quota = 1 (Social optimum)
      • Exogenous probability of audit
      • Fine (per unit violation) if caught exceeding quota
  • Participants paid based on cumulative earnings in all 10 rounds

Baselines and replication
  Replication
    In any experimental science, it is important for key results to be replicated to test robustness
    Link to previous research. Is your sample unique?
  Baseline or control group
    The baseline treatment also gives us a basis for evaluating the effects of each treatment
    In any experimental study, it is crucial to think carefully about the relevant control!

Alternative design
  Stage 1 – Baseline CPR (5 rounds)
  Stage 2 – one of the 5 remaining treatments (5 rounds)
    Comm Only
    Low
    Low + Comm
    Med
    Med + Comm
  Advantage – Having all groups play Stage 1 baseline facilitates a clean comparison across groups.
  Disadvantage – fewer rounds of the Stage 2 treatments. Enough time to converge?
  Disadvantage(?) – All Stage 2 decisions are conditioned upon having already played a baseline

Optimal sample size
                   External Enforcement
  Communication    None         Low          Medium
  No               Baseline     Low          Medium
  Yes              Comm Only    Low + Comm   Medium + Comm

  • Groups of N=5 participants
  • How many groups per treatment cell?

John List’s notes on sample size
  Also see: John A. List, Sally Sadoff, and Mathis Wagner, “So you want to run an experiment, now what? Some simple rules of thumb for optimal experimental design,” Experimental Economics (2011) 14:439-457

Some Design Insights
  A. 0 (control) / 1 (treatment), equal outcome variances
  B. 0/1 treatment, unequal outcome variances
  C. Treatment intensity (no longer binary)
  D. Clusters

Some Design Rules of Thumb for Differences in between-subject experiments
  Assume that $X_0 \sim N(\mu_0, \sigma_0^2)$ and $X_1 \sim N(\mu_1, \sigma_1^2)$, and that the minimum detectable effect is $\mu_1 - \mu_0 = \delta$.
  $H_0: \mu_0 = \mu_1$ and $H_1: \mu_1 - \mu_0 = \delta$.
  We need the difference in sample means $\bar{X}_1 - \bar{X}_0$ to satisfy:
  1. Significance level (probability of Type I error) = $\alpha$
  2. Power (1 – probability of Type II error) = $1 - \beta$

Standard Case Power
  A. Our usual approach stems from the standard regression model: under a true null, what is the probability of observing the coefficient that we observed?
  B. Power calculations are quite different: if the alternative hypothesis is true, what is the probability that the estimated coefficient lies outside the 95% CI defined under the null?

Sample Sizes for Differences in Means (Equal Variances)
  • Solving equations 1 and 2 assuming equal variances $\sigma_0^2 = \sigma_1^2 = \sigma^2$:

    $$ n_0^* = n_1^* = n^* = 2\,(t_{\alpha/2} + t_\beta)^2 \left(\frac{\sigma}{\delta}\right)^2 $$

  • Note that the necessary sample size
    – Increases rapidly with the desired significance level ($t_{\alpha/2}$) and power ($t_\beta$).
    – Increases proportionally with the variance of outcomes ($\sigma^2$).
    – Decreases inversely proportionally with the square of the minimum detectable effect size ($\delta$).
  • Sample size depends on the ratio of effect size to standard deviation. Hence, effect sizes can just as easily be expressed in standard deviations.
  • Standard is to use α = 0.05 and have power of 0.80 (β = 0.20).
  • So if we want to detect a one-standard-deviation change using the standard approach, we would need:
    $n = 2\,(1.96 + 0.84)^2 (1)^2 = 15.68$ observations in each cell
  • A ½ std. dev. change is detectable with 4 × 15.68 ≈ 63 observations per cell (64 if n is first rounded up to 16)
  • n = 30 seems to be the magic number in many experimental studies: ~0.70 std. dev. change.

Sample Size “Rules of Thumb”
  Assuming α = 0.05 and β = 0.20 requires n subjects:
    α = 0.05 and β = 0.05    1.65 × n
    α = 0.01 and β = 0.20    1.49 × n
    α = 0.01 and β = 0.05    2.27 × n

Example from a recent undergrad research project
  A local homeless shelter was conducting a fundraising campaign.
  They asked us to replicate List’s study about the effects of matching contributions.
  The shelter wanted the same 4 treatments as in List (no match, 1:1, 2:1, and 3:1) to test whether high match ratios would increase contributions.
  A local oil company agreed to donate up to $5000 to provide a match for money donated.

Fundraising example
  The shelter had funds to send out 16,000 letters to high-income women in Anchorage who had never donated before.
  Expected response rate was about 3 to 4% (n ≈ 480–640)
  Question: How many treatments should we run, if we expect about 500 responses?
    They said a “meaningful” treatment effect would be ~$25.
    Standard deviation from previous campaigns was ~$100.

Sample size

    $$ n_0^* = n_1^* = n^* = 2\,(t_{\alpha/2} + t_\beta)^2 \left(\frac{\sigma}{\delta}\right)^2 $$

    $$ n^* = 2\,(1.96 + 0.84)^2 \left(\frac{100}{25}\right)^2 \approx 251 $$

  With only 500 expected responses, we could only conduct 2 treatments.

Sample Sizes for Differences in Means (Unequal Variances)
  Another rule of thumb: if the outcome variances are not equal, then the ratio of the optimal proportions of the total sample in control and treatment groups is equal to the ratio of the standard deviations.
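The equal-variance sample size formula and the unequal-variance allocation rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, normal quantiles stand in for the t critical values, and the standard deviations passed to `allocation_shares` are made up. Results differ slightly from the slide figures, which use the rounded values 1.96 and 0.84.

```python
from statistics import NormalDist


def z(p: float) -> float:
    """Quantile of the standard normal distribution."""
    return NormalDist().inv_cdf(p)


def n_per_cell(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Observations per cell: n* = 2 (z_{alpha/2} + z_beta)^2 (sigma / delta)^2.

    Assumes equal outcome variances, equal cell sizes, and a two-sided test,
    with normal quantiles used in place of t critical values.
    """
    return 2 * (z(1 - alpha / 2) + z(power)) ** 2 * (sigma / delta) ** 2


def allocation_shares(sigma_control: float, sigma_treatment: float) -> tuple[float, float]:
    """Unequal variances: split the sample in proportion to the standard deviations."""
    total = sigma_control + sigma_treatment
    return sigma_control / total, sigma_treatment / total


# One-standard-deviation effect (the slides report 15.68 with rounded critical values).
print(round(n_per_cell(delta=1, sigma=1), 2))

# Fundraising example: delta = $25, sigma = $100 (the slides report about 251 per cell).
print(round(n_per_cell(delta=25, sigma=100)))

# Multiplier relative to the standard case for alpha = 0.01 and power = 0.95.
print(round(n_per_cell(1, 1, alpha=0.01, power=0.95) / n_per_cell(1, 1), 2))

# Hypothetical standard deviations: 100 in control, 50 in treatment.
print(allocation_shares(100, 50))
```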
  Example: Communication tends to reduce the variance, so perhaps fewer groups are needed in this treatment.

Treatment levels
                   External Enforcement
  Communication    None         Low          Medium          High
  No               Baseline     Low          Medium          High
  Yes              Comm Only    Low + Comm   Medium + Comm   High + Comm

  • How many levels of enforcement do we need? Do we need 3 levels of enforcement?

What about Treatment Levels?
  Assume that you are interested in understanding the intensity of treatment:
    Level of enforcement (e.g., audit probability)
  Assume that the outcome variance is equal across the various cells.
  How should you allocate the sample if the audit probability could be anywhere between 0 and 1?
    For simplicity, say X = 25%, 50%, or 75%
  Assume that you have 1000 subjects available.

Reconsider what we are doing:
  Y = XB + e
  One goal in this case is to derive the most precise estimate of B by using exogenous variation in X.
  Recall that the variance of the estimate of B is var(e) / (n · var(X)), so the standard error shrinks as var(X) grows.

Rules of Thumb
  Linear:      ½ of the sample @ X=25%,  0 @ X=50%,  ½ @ X=75%
  Quadratic:   ¼ @ X=25%,  ½ @ X=50%,  ¼ @ X=75%
  Intuition: The test for a quadratic effect compares the mean of the outcomes at the extremes to the mean of the outcome at the midpoint

Intra-cluster Correlation
  What happens when the level of randomization differs from the unit of observation?
  Think of randomization at the village level, or at the store level, while outcomes are observed at the individual level.
  Classic example: comparing two textbooks.
    Randomization over classrooms
    Observations at individual level

Another Example
  To test robustness of results, you may want to conduct the experiments in multiple communities.
  How do you allocate treatments across communities, especially if the number of participants per village is small?
  In our Colombian enforcement study, we replicated the entire design in three regions.
  In a separate CPR experiment in Russia, we visited 3 communities in one region.
    Each treatment was conducted 1x in each community.
    We are assuming that the differences across communities are small.
    Cannot make cross-community comparison

Intracluster Correlation
  Real Sample Size (RSS) = mk/CE
    m = number of subjects in a cluster
    k = number of clusters
    CE = 1 + ρ(m − 1)
    ρ = intracluster correlation coefficient = $s_B^2 / (s_B^2 + s_w^2)$
    $s_B^2$ = variance between clusters
    $s_w^2$ = variance within clusters

Randomized factorial design
  Advantages
    Independence among the factor variables
    Can explore interactions between factors
  Disadvantages
    Number of treatments grows quickly with increase in number of factors or levels within a factor
    Example: Conduct experiment in multiple communities and use community as a treatment variable

Fractional factorial design
                   External Enforcement
  Communication    Low          Medium
  No               Low          Medium
  Yes              Low + Comm   Medium + Comm

  Say we want to add informal sanctions with a 3:1 ratio
    I can pay $3 to reduce your earnings by $1
    1 new “factor” with 2 “levels”
  To run all combinations would require 2x2x2 = 8 treatments
  Assume optimal sample size per cell is 6 groups of 5 people (30 total per cell)
    8 treatments x 30 people/cell = 240 people
  Assume you can only recruit about half that (~120)
    You could run only 3 groups per cell (15 people) – lose power/significance
  Solution: conduct a balanced subset of treatments

Fractional factorial design
  If you are considering this approach, there are a few different design options depending upon the effects you want to capture, number of treatments, etc.
  This is just one example!
  [Figure: example fractional factorial design; axes: Communication, External Enforcement]

Fractional factorial design
  Advantage: dramatically reduces the number of trials
  Disadvantage: achieves balance by systematically confounding some direct effects with some interactions. It may not be serious, but you will lose the ability to analyze all of the different possible interactions.

Nuisance Variables
  Other factors of little or no primary interest that can also affect decisions.
  These nuisance effects could be significant.
  Common examples
    Gender, age, nationality (most socio-economic variables)
    Selection bias
      Recruitment: open to whoever shows up vs random selection
    Experience
      Participated in previous experiments
    Learning
      Concern in multi-round experiments
    Non-experiment interactions
      People talking before an experiment while waiting to start
      In a community, people may hear about the experiment from others

Confounded variables
  Confounding occurs when the effects of two independent variables are intertwined so that you cannot determine which of the variables is responsible for the observed effect.
  Example:
                   External Enforcement
  Communication    None         Low          Medium
  No               Baseline     Low          Medium
  Yes              Comm Only    Low + Comm   Medium + Comm

  What are some potential confounds when comparing the Baseline with Low?

Another design approach
  If trying to identify factors that influence decisions, try adding them one at a time.
  Imposing a fine for non-compliance differs from the baseline CPR in multiple ways. Possible confounds:
    FRAME
      The simple existence of a quota may send a signal about expected behavior, independent of any audits or fines.
    GUILT = FRAME + audit
      Getting audited may generate feelings of guilt because the individual is privately reminded about anti-social choices
    FINE = FRAME + GUILT (audit) + fine for violations
      Are people responding to the expected penalty? Or are they responding to the frame from the quota?

3 Sources of variability
  1. Conditions of interest (wanted)
  2. Measurement error (unwanted)
       People can make mistakes, misunderstand instructions, make typos
  3. Experimental material and process (unwanted)
       No two people are identical, and their responses to the same situation may not be the same, even if your theory predicts otherwise.
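Worked example: Real Sample Size
  Returning to the Real Sample Size formula from the Intracluster Correlation slide, the short Python sketch below shows how quickly clustering shrinks the effective number of observations. The function names are hypothetical, the cell of 6 groups of 5 people is borrowed from the fractional factorial slide, and the value ρ = 0.2 is an assumption chosen only for illustration.

```python
def intracluster_correlation(var_between: float, var_within: float) -> float:
    """rho = s_B^2 / (s_B^2 + s_w^2)."""
    return var_between / (var_between + var_within)


def clustering_effect(m: int, rho: float) -> float:
    """CE = 1 + rho * (m - 1), where m is the number of subjects per cluster."""
    return 1 + rho * (m - 1)


def real_sample_size(m: int, k: int, rho: float) -> float:
    """RSS = m * k / CE, where k is the number of clusters."""
    return m * k / clustering_effect(m, rho)


# One treatment cell: k = 6 groups of m = 5 people (30 individual observations).
for rho in (0.0, 0.2, 1.0):  # 0.2 is a hypothetical intracluster correlation
    print(rho, round(real_sample_size(m=5, k=6, rho=rho), 1))
```

  With no intracluster correlation the cell counts as 30 observations; with perfect correlation it counts as only 6, one per group.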

Design in a nutshell
  Isolate the effects of interest
  Control what you can
  Randomize the rest

Some Practical Advice

Some thoughts in no particular order
  Think carefully about your research question
    Formulate testable hypotheses grounded in theory
    How does your idea contribute to the literature?
  Think carefully about possible results and how they would be interpreted
    What if results are consistent with theory/expectations?
    What if they are not?
    Be prepared for either possibility
  Prepare code for data analysis BEFORE running experiments
    Forces you to think carefully about what your data will look like, and what you want to get out of it.

Some thoughts on data analysis
  Are your data discrete, binary or continuous?
    Multinomial logit, ordered probit, logit, Poisson, linear
  Repeated observations or one-shot decisions
    Random effects, hierarchical mixed models, nonparametrics

More thoughts
  Subject payments and salience
    One distinguishing feature of economic experiments is that subjects are paid based on their decisions and possibly the decisions of others
    Must pay enough for subjects to take the experiment seriously
    Avoid tournaments
      E.g., giving a bonus to the person who earns the most money
    Typically pay in cash; some field experiments may use another medium
  Never use deception!
  Keep earnings and decisions private
  Instructions
    Think carefully about every word in your instructions
    Framing effects
      Is the other player your “partner” in the UG (ultimatum game) or your “opponent”?
      Could frame the UG as an offer to “sell” at a price
    Using examples
      I used the example of a $14/$6 split. Does that suggest proposers should take more than half?
      What if I used a 10/10 split? Or 6/14?
      Could give multiple examples…
  Experiment length
    Be aware that people get tired and bored

Other stuff
  Strategy method
  Hot vs cold decisions
  Paying for just one round in a multi-round game
  AB-BA designs for within-subject comparisons
  Playing multiple games and paying for just one
  Factor levels should allow for “enough distance” between hypotheses
    Social optimum is that people will harvest 10% of the fish; Nash equilibrium predicts 15%.
    Nash equilibrium and social optimum should be “farther apart”
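
Worked example: distance between predictions in the baseline CPR game
  For the baseline CPR payoff table shown earlier, the gap between the two benchmark predictions can be checked directly. The Python sketch below transcribes the table from the slides and searches for the symmetric Nash equilibrium and the symmetric social optimum. The function names are hypothetical, and restricting attention to symmetric outcomes (all 5 players choosing the same level) is a simplifying assumption.

```python
# PAYOFF[mine][others]: my extraction level (0..8) by the total extraction
# of the other 4 group members (0..32), transcribed from the slides.
PAYOFF = [
    [900, 882, 864, 846, 829, 811, 793, 775, 757, 739, 721, 703, 686, 668, 650, 632, 614,
     596, 578, 560, 543, 525, 507, 489, 471, 453, 435, 417, 400, 382, 364, 346, 328],
    [996, 976, 955, 934, 914, 893, 873, 852, 831, 811, 790, 769, 749, 728, 708, 687, 666,
     646, 625, 604, 584, 563, 543, 522, 501, 481, 460, 439, 419, 398, 378, 357, 336],
    [1087, 1064, 1040, 1017, 994, 970, 947, 923, 900, 877, 853, 830, 807, 783, 760, 736, 713,
     690, 666, 643, 620, 596, 573, 549, 526, 503, 479, 456, 433, 409, 386, 362, 339],
    [1172, 1146, 1120, 1094, 1068, 1042, 1016, 989, 963, 937, 911, 885, 859, 833, 807, 780, 754,
     728, 702, 676, 650, 624, 598, 571, 545, 519, 493, 467, 441, 415, 389, 362, 336],
    [1252, 1223, 1194, 1165, 1137, 1108, 1079, 1050, 1021, 992, 963, 934, 906, 877, 848, 819, 790,
     761, 732, 703, 675, 646, 617, 588, 559, 530, 501, 472, 444, 415, 386, 357, 328],
    [1326, 1295, 1263, 1231, 1200, 1168, 1137, 1105, 1073, 1042, 1010, 978, 947, 915, 884, 852, 820,
     789, 757, 725, 694, 662, 631, 599, 567, 536, 504, 472, 441, 409, 378, 346, 314],
    [1395, 1361, 1326, 1292, 1258, 1223, 1189, 1154, 1120, 1086, 1051, 1017, 983, 948, 914, 879, 845,
     811, 776, 742, 708, 673, 639, 604, 570, 536, 501, 467, 433, 398, 364, 329, 295],
    [1458, 1421, 1384, 1347, 1310, 1273, 1236, 1198, 1161, 1124, 1087, 1050, 1013, 976, 939, 901, 864,
     827, 790, 753, 716, 679, 642, 604, 567, 530, 493, 456, 419, 382, 345, 307, 270],
    [1516, 1476, 1436, 1396, 1357, 1317, 1277, 1237, 1197, 1157, 1117, 1077, 1038, 998, 958, 918, 878,
     838, 798, 758, 719, 679, 639, 599, 559, 519, 479, 439, 400, 360, 320, 280, 240],
]

GROUP_SIZE = 5
LEVELS = range(9)  # extraction levels 0..8


def best_response(others_total: int) -> int:
    """Extraction level that maximizes my payoff given the others' total extraction."""
    return max(LEVELS, key=lambda mine: PAYOFF[mine][others_total])


def symmetric_nash() -> list[int]:
    """Levels x that are a best response when each of the other 4 players also chooses x."""
    return [x for x in LEVELS if best_response((GROUP_SIZE - 1) * x) == x]


def symmetric_social_optimum() -> int:
    """Common level x that maximizes group earnings when all 5 players choose x."""
    return max(LEVELS, key=lambda x: GROUP_SIZE * PAYOFF[x][(GROUP_SIZE - 1) * x])


nash = symmetric_nash()
optimum = symmetric_social_optimum()
print("Symmetric Nash equilibrium:", nash)    # [6]
print("Symmetric social optimum:", optimum)   # 1

# Share of the socially optimal earnings that each player keeps at the Nash equilibrium.
x = nash[0]
print(round(PAYOFF[x][4 * x] / PAYOFF[optimum][4 * optimum], 2))
```

  In this design the two predictions sit at 6 and 1 on a 0–8 scale, which leaves plenty of room to tell them apart in the data.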