Experimentation
INFO4990 – Week 6
Monday, August 30, 2004
INFO4990 Information Technology Research Methods (July, 2004)

Agenda
- Experimentation in computer science and information systems research
- Basic experimentation concepts
- Some widely used experimental designs in the CS and IS fields
- Analyzing data from experimental studies

History
- Experiments in natural science
  - Systematic acquisition of new knowledge; testing theories about nature
  - Agriculture
  - Chemistry
  - …
- Experimentation in social, psychological, and economic studies
  - Studying people's behavior
  - E.g., fairness studies

Experiments in computer science research
- Derived from natural science experimentation
- Computer systems performance analysis
  - Hardware
  - Software
  - Algorithms
  - Networks

Experimentation in information systems research
- Derived from social and economic experimentation
- The subject under study is usually human
- Human behavior with regard to information systems
  - Trust transferred through hyperlinks
  - Which subject is most suitable for distance learning

Purpose of experiments
- Discover and confirm causal relationships
- Examine the possible influence that one factor or condition may have on another factor or condition

Basic experimentation concepts
- Independent variable
  - The cause
  - Researchers "measure" (manipulate) the independent variable by creating a condition or situation
  - Manipulating the independent variable creates different treatments
  - Event manipulation
    - Affecting the independent variable by altering the events that subjects experience
    - Presence versus absence
  - Instructional manipulation
    - Varying the independent variable by giving different sets of instructions to the subjects

Basic experimentation concepts (cont.)
- Effect (outcome), i.e., the dependent variable
  - Physical conditions, behaviors, attitudes, feelings, or beliefs of subjects that change in response to a treatment
- How to measure
  - IS research: various data collection methods
    - Questionnaires, interviews, observation, tests
  - CS research: metrics in the field
    - Performance time, rate, error rate, time to failure, and duration

The importance of control
- Internal validity: the extent to which we can accurately state that the independent variable produced the observed effect

Experiment cases
A marketing researcher wants to study how humor in television commercials affects sales. To do so, the researcher studies the effectiveness of two commercials that have been developed for a new soft drink called Zowie. One commercial, in which a well-known but serious television actor describes how Zowie has a zingy and refreshing taste, airs during the months of March, April, and May. The other commercial, a humorous scenario in which several teenagers throw Zowie at one another on a hot summer day, airs during the months of June, July, and August. The researcher finds that in June through August, Zowie sales are almost double what they were in the preceding three months. "Humor boosts sales," the researcher concludes.
- Many alternative explanations are possible: for example, the two commercials aired in different seasons, so a seasonal rise in soft drink sales could account for the difference

Strategies to achieve control
- Keep some things constant
  - What are the variables that need to be held constant in most experiments?
- Include a control group
  - Treatment group (experimental group)
- Randomly assign people to groups
  - Between-subjects design
- Use matched pairs
  - Matched-subjects design

Between and matched-subjects design
[Figure: in the between-subjects design, subjects are randomly assigned to a treatment group and a control group, and the DV is measured in each; in the matched-subjects design, one member of each matched pair is randomly assigned to each group]

Steps in conducting an experiment
- Identify the relevant variables
- State hypotheses
- Decide on an experimental design
- Decide how to manipulate the independent variables
- Develop a valid and reliable measure for the dependent variable
- Pilot test the treatment and the dependent variable measures
- Recruit subjects (or locate cases)
- Assign subjects to groups
- Introduce the treatment to the treatment groups
- Gather data for the measure of the dependent variables
- Test the hypotheses

Experimental design
- One-shot case study
- True experimental design
- Factorial design
- Block design

Classic true experimental design
- Pretest-posttest
- Treatment versus control group
- Randomized experimental design
- Vertical alignment shows that the two pretests are measured at the same time
- http://trochim.human.cornell.edu/kb/desintro.htm

Factorial design
- Two or more independent variables are manipulated in a single experiment
- They are referred to as factors
- The major purpose of the research is to explore their effects jointly
- Factorial designs produce efficient experiments: each observation supplies information about all of the factors

A simple example
- Investigate an education program with a variety of variations to find the best combination
  - Amount of time receiving instruction: 1 hour per week vs. 4 hours per week
  - Setting: in-class vs. pull-out
- 2 x 2 factorial design
  - The number of numbers tells how many factors there are
  - The number values tell how many levels each factor has
  - Multiplying them tells how many treatment groups the factorial design has

Factorial designs in computer system performance analysis
- Personal workstation design
  - Processor: 68000, Z80, 8086
  - Memory size: 512K, 2M, or 8M bytes
  - Number of disks: one, two, or three
  - Workload: secretarial, managerial, or scientific
  - User education: high school, college, or postgraduate level
- Dependent variables
  - Throughput, response time

2^2 factorial design
- Two factors, each at two levels
- Example: workstation design
  - Factor 1: memory size
  - Factor 2: cache size
  - DV: performance in MIPS

Performance in MIPS
Cache size    Memory size 4M bytes    Memory size 8M bytes
1K            15                      45
2K            25                      75

[Figure: performance in MIPS versus memory size (4M, 8M), one line per cache size (1K, 2K)]

2^k factorial design
- k factors, each at two levels
- 2^k experiments
- 2^3 design example
  - In designing a personal workstation, the three factors to be studied are cache size, memory size, and number of processors

Factor                  Level -1     Level 1
Memory size             4 Mbytes     16 Mbytes
Cache size              1 Kbyte      2 Kbytes
Number of processors    1            2

                       4 Mbytes            16 Mbytes
Cache size (Kbytes)    1 proc    2 proc    1 proc    2 proc
1                      14        46        22        58
2                      10        50        34        86

Full and fractional factorial design
- Full factorial design
  - Study all combinations
  - Can find the effects of all factors
- Fractional (incomplete) factorial design
  - Leave some treatment groups empty
  - Less information
  - May not get all interactions
  - No problem if the interactions are negligible

Two-factor full factorial design
- Used when there are two factors that are carefully controlled
- Examples in computer system performance analysis
  - To compare several processors using several workloads
  - To determine two configuration parameters, such as cache and memory size

Two-factor full factorial design (cont.)
- Example: cache comparison

Workload     Two caches    One cache    No cache
ASM          54.0          55.0         106.0
TECO         60.0          60.0         123.0
SIEVE        43.0          43.0         120.0
DHRYSTONE    49.0          52.0         111.0
SORT         49.0          50.0         108.0

Field and controlled laboratory experiments
- Field experiment
  - Conducted in real-life or field settings
  - The researcher has less control over the experimental conditions
  - Greater external validity but lower internal validity
- Controlled laboratory experiment
  - Conducted under the controlled conditions of a laboratory
  - Greater internal validity but lower external validity
- Practical considerations
  - Planning and pilot testing
  - Instructions to subjects
  - Post-experiment interview

Example of field and controlled laboratory experiments
- Field experiment
  - The Zowie commercial case from the "Experiment cases" slide
- A controlled laboratory version
  - Ask two groups of subjects (students) to view tapes of the two different ads (event manipulation)
  - Use a questionnaire to collect their intentions to buy the product
  - Compare the responses from the two groups

Analyzing data from between-subjects design
- Problem
  - You want to measure the acquisition of mathematical skills by distance learning (DL) and traditional classroom learning (CL). The study compares 20 students, ten taught in the classroom and ten taught by a distance learning program. The final test scores were collected as the dependent variable.

        DL      CL
        94      90
        89      91
        76      83
        85      81
        88      74
        65      60
        70      69
        72      63
        68      62
        64      63
Mean    77.1    73.6

Why can't we just compare the means?
[Figure: three pairs of score distributions with the same difference between means but different variability]
- The difference between the means is the same in all three cases, yet they tell very different stories
- When we look at the difference between the scores of two groups, we have to judge the difference between their means relative to the spread or variability of their scores

T-test
- The t-test assesses whether the means of two groups are statistically different from each other
- The sample size is small
- An approximately normal distribution of the measure in the two groups is assumed

Perform t-test

Interpret result
- Set a significance level
- Degrees of freedom: N1 + N2 - 2
- Compare the t-value with the critical value from the t-distribution to see if it is large enough to be significant
- Here t Stat (0.68) is smaller than t Critical two-tail (2.10), so the difference between the means is not significant at the 0.05 level

t-Test: Two-Sample Assuming Equal Variances
                                DL             CL
Mean                            77.1           73.6
Variance                        120.7666667    142.2666667
Observations                    10             10
Pooled Variance                 131.5166667
Hypothesized Mean Difference    0
df                              18
t Stat                          0.682437133
P(T<=t) one-tail                0.251825559
t Critical one-tail             1.734063592
P(T<=t) two-tail                0.503651117
t Critical two-tail             2.100922037

Analyzing data from matched-subjects design
- Problem
  - You want to compare the hit rates of two cache algorithms. The simulated cache algorithms were run on 5 benchmarks and the hit rates were recorded.

        Cache 1    Cache 2
        0.91       0.95
        0.67       0.65
        0.85       0.90
        0.73       0.80
        0.93       0.97
Mean    0.818      0.854

Suitable test: paired t-test
- Calculation of the t-value, where D is the difference between the paired scores and N is the number of pairs

$$ t = \frac{\bar{D}}{\sqrt{\dfrac{\sum D^2 - \frac{(\sum D)^2}{N}}{N(N-1)}}} $$

         Cache 1    Cache 2    Difference D    D^2
B1       0.91       0.95       -0.04           0.0016
B2       0.67       0.65        0.02           0.0004
B3       0.85       0.90       -0.05           0.0025
B4       0.73       0.80       -0.07           0.0049
B5       0.93       0.97       -0.04           0.0016
Total                          -0.18           0.011
Avg                            -0.036

- Degrees of freedom: N - 1

t-Test: Paired Two Sample for Means
                                Cache 1        Cache 2
Mean                            0.818          0.854
Variance                        0.01292        0.01733
Observations                    5              5
Pearson Correlation             0.973040321
Hypothesized Mean Difference    0
df                              4
t Stat                          -2.394684379
P(T<=t) one-tail                0.037393209
t Critical one-tail             2.131846782
P(T<=t) two-tail                0.074786418
t Critical two-tail             2.776445105

Analyzing data from factorial design
- Problem
  - The memory-cache experiments were repeated three times each
  - The results are shown in the table below (cell means in parentheses)

Cache size    Memory size 4M     Memory size 8M
1K            15, 18, 12 (15)    45, 48, 51 (48)
2K            25, 28, 19 (24)    75, 75, 81 (77)

- What we want to find out
  - Which factor contributes most to the performance?
  - What is the joint effect of the two factors?

Suitable test: ANOVA
- Two-way ANOVA (analysis of variance)
- F-value: between-sample variation / within-sample variation

ANOVA
Source of Variation      SS      df    MS       F           P-value     F crit
Sample (cache size)      1083    1     1083     84.94118    1.56E-05    5.317655
Columns (memory size)    5547    1     5547     435.0588    2.93E-08    5.317655
Interaction              300     1     300      23.52941    0.001271    5.317655
Within                   102     8     12.75
Total                    7032    11

Distribution of variance
Total variance    Memory size    Cache size    Interaction    Errors
100%              0.788823       0.15401       0.042662       0.014505

Statistical packages
- Excel
- SPSS
- SAS

References
- Paul D. Leedy and Jeanne Ellis Ormrod, Practical Research: Planning and Design, 7th edition
- Robert B. Burns, Introduction to Research Methods, 4th edition
- Raj Jain, The Art of Computer Systems Performance Analysis
- www.socialresearchmethods.net
- http://www.statsoft.com/textbook/stathome.html
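
Appendix: checking the worked examples by hand
The t-test and ANOVA slides report Excel output only. As a cross-check, the sketch below recomputes the two t statistics and the shares of total variation using only the Python standard library. The function names (pooled_t, paired_t, variation_shares) are illustrative and do not come from the slides. The data are the DL/CL scores, the cache hit rates, and the replicated memory-cache measurements from the slides. It does not compute p-values or critical values; those would still be read from a t or F table or a statistical package.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances; returns (t, df)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df


def paired_t(a, b):
    """Paired t statistic computed from the difference scores; returns (t, df)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(x * x for x in d)
    t = (sum_d / n) / sqrt((sum_d2 - sum_d ** 2 / n) / (n * (n - 1)))
    return t, n - 1


def variation_shares(cells):
    """Fractions of total variation in a replicated two-factor design.

    cells[i][j] holds the replicates for row level i and column level j.
    """
    n_rows, n_cols, reps = len(cells), len(cells[0]), len(cells[0][0])
    all_y = [y for row in cells for cell in row for y in cell]
    grand = mean(all_y)
    row_means = [mean([y for cell in row for y in cell]) for row in cells]
    col_means = [mean([y for row in cells for y in row[j]]) for j in range(n_cols)]
    ss_rows = n_cols * reps * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n_rows * reps * sum((m - grand) ** 2 for m in col_means)
    ss_within = sum((y - mean(cell)) ** 2 for row in cells for cell in row for y in cell)
    ss_total = sum((y - grand) ** 2 for y in all_y)
    # Interaction is what remains after main effects and within-cell error.
    ss_inter = ss_total - ss_rows - ss_cols - ss_within
    parts = {"rows": ss_rows, "cols": ss_cols,
             "interaction": ss_inter, "within": ss_within}
    return {name: ss / ss_total for name, ss in parts.items()}


# Data copied from the slides.
dl = [94, 89, 76, 85, 88, 65, 70, 72, 68, 64]
cl = [90, 91, 83, 81, 74, 60, 69, 63, 62, 63]
cache1 = [0.91, 0.67, 0.85, 0.73, 0.93]
cache2 = [0.95, 0.65, 0.90, 0.80, 0.97]
# Rows: cache size 1K, 2K. Columns: memory size 4M, 8M.
cells = [[[15, 18, 12], [45, 48, 51]],
         [[25, 28, 19], [75, 75, 81]]]

print("between-subjects t, df:", pooled_t(dl, cl))
print("paired t, df:", paired_t(cache1, cache2))
print("shares of variation:", variation_shares(cells))
```

With the row/column layout above, "rows" corresponds to cache size and "cols" to memory size, so the printed shares can be compared against the "Distribution of variance" table, and the t values against the "t Stat" rows of the two Excel outputs.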