Field studies - Andrew L. Kun

FIELD STUDIES

User studies
- Ubicomp: people use technology
- Must conduct user studies
- Also:
  - Focus groups
  - Ethnographic studies
  - Heuristic evaluations
  - Etc.

User studies
- Laboratory studies:
  - Controlled environment
- Field (in-situ) studies:
  - Real world

Field studies
- Appropriate for ubicomp:
  - Abundant data
  - Observe unexpected challenges
  - Understand impact on lives
- Trade-off:
  - Loss of control
  - Significant time and effort

Three common types
- Current behavior
- Proof of concept
- Experience with prototype

How to think about user studies?
- Formulate hypotheses

Research steps
1. State problem(s)
2. State goal(s)
3. Propose hypotheses
4. Propose steps to test hypotheses
5. Explain how problem(s), goal(s) and hypotheses fit into existing knowledge
6. Produce results of testing hypotheses
7. Explain results
8. Evaluate research
9. State new problems

What is a hypothesis?
- Proposing an explanation
- Theory or hypothesis?
- “This is just a theory.”
- Some theories we live by (“just” not justified):
  - Newton’s theory of motion
  - Einstein’s theory of relativity
  - Evolutionary theory

Hypothesis
- Must be tentative
- Must predict

Hypothesis
- Some criteria of scientificity:
  - Self-consistent
  - Grounded (fits bulk of relevant knowledge)
  - Accounts for empirical evidence
  - Empirically testable by objective procedures of science
  - General in some respect and to some extent

On proposing hypotheses
- Anomalous phenomena:
  - Strange and unfamiliar (Bermuda triangle)
  - Familiar yet not fully understood (cognitive load)
- Is there already an explanation?

Types of hypotheses
- Incremental
- Fundamental shift:
  - Ptolemy (c. 90 – c. 168): geocentric cosmology
  - Copernicus (1473 – 1543): heliocentric cosmology
- And then came…
  - Kepler (1571 – 1630): elliptical orbits

Fundamental shift example
- Ulcer:
  - Stress? Spicy food?
  - Bacteria.

Types of proposed explanations
- Causes
- Correlation
- Causal mechanisms
- Underlying processes
- Laws
- Functions

Proposing causal explanations
- Studies show that using a cell phone while driving increases the probability of getting into an accident. Why is that so?
  - Pick up ringing phone
  - Dial number
  - See but don’t perceive

Effects not always there
- Cell phone + driving:
  - Usually no accident
  - Only one of the factors

Remote and proximate causes
- Cell phone + driving:
  - Attention shift → missed signal → accident
  - Remote cause → proximate cause → effect

Correlation
- A and B are correlated if:
  - A → B
  - B → A
  - C → A and C → B
  - A combination of (some of) the above
  - Coincidence
- Correlation vs. causal relation:
  - Correlation doesn’t imply causal relation
  - Cannot determine cause direction (A → B or B → A)

Correlation
- Positive, negative
- None found ≠ none exists
- Causal link → correlation:
  - May provide initial evidence for causal link
- Less explanatory value than facts about causal links

Causal mechanisms
- Mechanisms connecting remote causes and their effects.
- E.g.:
  - Damaged artery in heart → clotting
  - Clotting → blocked artery
  - Blocked artery → heart attack
  - Aspirin inhibits clotting → lower risk of heart attack

Underlying processes
- Photoelectric effect

Photoelectric effect
- Einstein: 1921 Nobel Prize in Physics

Laws
- General regularities in nature
- Universal: F = ma
- Non-universal:
  - Statistical laws

Functions
- What is the purpose of the phenomenon?
- [Image: poster reading “FOR SALE: A prime lot of serfs or slaves, Gypsy (Tzigany). Through an auction at noon at the St. Elias Monastery on 8 May 1852, consisting of 18 men, 10 boys, 7 women & 3 girls in fine condition”]

Functions
- William Harvey (1578 – 1657):
  - Heart pumps blood through circulatory system
  - No modern instruments!
  - Experiments with a number of animals:
    - Various fish,
    - Snail,
    - Pigeon, etc.

Multiple methods together
- Function → causal mechanism → underlying processes
- National Ignition Facility (Dennis O’Brien @ UNH):
  - Ignition with lasers → laser, target chamber → physics of nuclear fusion

Multiple methods together
- Law → underlying processes
- Isaac Newton (1643 – 1727), second law of motion:
  - F = ma → Graviton?

Ockham’s razor
- Crop circles: pranksters or aliens?

Ockham’s razor
- William of Ockham (c. 1288 – c. 1348)
- http://en.wikipedia.org/wiki/File:William_of_Ockham.png

Do I have a hypothesis?
- Yes.
- Do you realize you do?

Current behavior
- Insights and inspiration:
  - State problem(s), goal(s)
  - Propose hypotheses
- Relatively long

Current behavior – example 1
- AJ Brush and Kori Inkpen, “Yours, mine and ours?...” (pdf) (2005 movie inspiring title)
- Home technology: users share, etc.

Current behavior – example 2
- Shwetak Patel et al., “Farther Than You May Think…” (pdf)
- Hypothesis: Mobile phone a proxy to user location.

Proof of concept
- Technological advance:
  - Produce results: prototype
  - Explain results: prototype
- Relatively short

Proof of concept – example 1
- J. Sherwani et al., “Speech vs. Touch-tone: Telephone Interfaces for Information Access by Low Literate Users” (pdf) (video)
- Hypothesis: Speech better telephony interface than touch-tone for low literate users.

Proof of concept – example 2
- John Krumm and Eric Horvitz, “Predestination:…” (pdf)
- Hypothesis: Destinations from partial trajectories.
- Train/test algorithm on GPS tracks from 169 people
- Used pre-existing data:
  - Krumm and Horvitz, “The Microsoft Multiperson Location Survey”
  - Collecting original data a significant contribution
  - Leverage!

Experience with prototype
- Users’ interaction with technology:
  - Produce results: prototype
  - Explain results: prototype
- Relatively long

Prototype an example!
- Others don’t care about:
  - Raw usage information
  - Usability problems
  - Intricate implementation details
  - Etc.
- Generalize!
  - Scientific and good technical work

Experience – example 1
- C. Neustaedter et al., “A Digital Family Calendar in the Home:…” (pdf) (video)
- Hypothesis: At-a-glance awareness, remote access are significant benefits.
- 4 households, 4 weeks each
- (Best Student Paper, Graphics Interface 2007)

Experience – example 2
- Rafael Ballagas et al., “Gaming Tourism:…” (pdf) (video)
- Hypothesis: Learning through a game.
- 18 participants: 2 alone + 8 pairs (8 x 2 = 16)

Study design
- Who is the consumer?
  - Manager(s)
    - Industry, academic lab
  - Professor(s)
    - E.g. thesis committee
  - Funding agency
    - Report on progress, proposal for funding
  - Reviewers
    - For paper, proposal, thesis
  - Researchers
    - E.g. advisor’s collaborators
  - Public
    - Friends, family, alumni, potential students, donors, potential employers

Study design
- How can I explain this to a layperson?
  - What is key? What can be omitted?
- How will I write this up?
  - Paper
  - Thesis
  - Report
  - Blog post
- Start writing paper/thesis/report/blog post at the beginning of the study.

Study design
- Test hypothesis/hypotheses

Testing hypotheses via user studies
- User studies:
  - Laboratory studies
    - Good: Control, easier to evaluate results
    - Bad: Constraints
  - Field studies
    - Good: Fewer constraints
    - Bad: Less control, more difficult to evaluate results

Criteria
- Falsifiability:
  - Prediction fails = explanation isn’t correct
  - Account for other factors!
- Note:
  - Criterion - singular
  - Criteria - plural

Criteria
- Verifiability:
  - Prediction successful = explanation is correct
  - Account for other factors!

The meat of it…
- Battleship Potemkin, 1925 film
- Rotten meat scene

Why larvae in meat?
- Francesco Redi (1626-1697)
- Generation of insects, 1668
- Causal explanation: fly droppings

Redi’s research
- Hypothesis:
  - Worms derived from fly droppings
- Testing hypothesis:
  - Two sets of flasks with meat: sealed and open
  - Prediction: worms only in open flask

Falsifiability criterion
- Can anything cause a failed prediction even if explanation is correct?
- Did the apparatus operate properly?
  - Tight seal?
  - Meat not initially spoiled?
  - Other?

Verifiability criterion
- Can anything result in successful prediction even if explanation is wrong?
- What if “active principle” in the air is responsible for spontaneous generation?
- Modify experiment:
  - Replace seal with veil:
    - Flies cannot reach meat
    - Air in contact with meat
  - Modification helps meet verifiability criterion

Verifiability criterion
- Experimental vs. control group:
  - Only difference in level of one independent variable
- Redi’s experiment:
  - Control: Open flasks
  - Experimental: Veil-covered flasks

Control: laboratory experiment
- Meat in veil-covered flasks?
- Creating control/experimental groups often impossible without careful design/control

Study design
- Test hypothesis/hypotheses
- Formulate in terms of:
  - Independent variables (multiple conditions)
  - Dependent variables
- Design:
  - Within-subjects
  - Between-subjects
  - Mixed design

Within-subjects design: example
- Police radio UI:
  - Hardware
  - Speech
- Blog post, video

Within-subjects design: example
- Results in graphical form: [figures not reproduced]

Example: between-subjects design
- Classical example: testing a drug

Mixed design: example 1
- SUI characteristics study
- Secondary task: speech control of radio
- 2 x 2 x 2 design:
  - SR accuracy: high/low
  - PTT button: yes/no – ambient recognition
  - Dialog repair strategy: mis-/non-understanding

Mixed design: example 2
- Motivation: PTT vs. driving performance
- Secondary task: speech control of radio
- 2 x 3 x 3 design:
  - SR accuracy: high/low
  - PTT activation: push-hold-release/push-release/no push
  - PTT button: ambient/fixed/glove
- [Table: PTT activation (push-hold-release, push-release, no-push) x PTT button (ambient, fixed, glove) x SR accuracy (high, low)]

Control condition
- Baseline: e.g. no technology vs. later introduced technology

Considerations
- What will subjects do?
  - Normal behavior – may take long
  - Scenarios
- Augment existing or brand new?
  - Augment: taking advantage of familiarity
  - New: more control (fewer inherited constraints)
- Simulate or implement?
  - E.g. WoZ (Wizard of Oz)

Data to collect
- Qualitative
  - Insight into what participants did.
  - How do participants compare? Did they do what they thought they did? Use quantitative data.
- Quantitative
  - How did people behave?
  - But why? Use qualitative data.
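
As a rough sketch of how the 2 x 3 x 3 design from "Mixed design: example 2" expands into individual conditions, the factor levels named on that slide can be crossed programmatically. This assumes the three factors are fully crossed; the slides do not say which factors are within-subjects and which are between-subjects, so the code does not model that. The names `FACTORS` and `enumerate_conditions` are hypothetical, chosen for this illustration only.

```python
from itertools import product

# Factor levels copied from the "Mixed design: example 2" slide.
# Assumption: the factors are fully crossed (every combination is a condition).
FACTORS = {
    "sr_accuracy": ["high", "low"],
    "ptt_activation": ["push-hold-release", "push-release", "no-push"],
    "ptt_button": ["ambient", "fixed", "glove"],
}


def enumerate_conditions(factors):
    """Return one dict per condition, covering every combination of levels."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, levels)) for levels in product(*level_lists)]


conditions = enumerate_conditions(FACTORS)
print(len(conditions))  # 2 * 3 * 3 = 18
for condition in conditions[:3]:
    print(condition)
```

Listing the conditions this way makes it easier to check, before recruiting, how many participants or sessions each cell of the design would need.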

Data to collect
- At least three types of data:
  - Demographic
  - Usage
  - Reactions

Data to collect
- Run pilot experiments!

Collecting data
- Logging
- Surveys
- Experience sampling
- Diaries
- Interviews
- Unstructured observation – ethnography

Logging
- Plan ahead, not after the fact!
  - Testing hypotheses
  - Don’t leave important data out
  - Don’t save data you don’t need
- Leverage logging:
  - Everything OK?
  - E.g. Mike Farrar’s MS research: files appearing on server indicate apps OK
  - Explicit communication with server: “I’m OK!”

Surveys
- Open-ended
- Multiple-choice
- Likert-scale

Surveys
- Questions should allow positive and negative feedback.
- Text clear to others?
  - Check!
- One question at a time!
  - “Fun and easy to use?”
- Length?
  - Don’t bore subjects to death.
- Standard questions (e.g. QUIS)? Previously used questions?

Example: Mike Farrar’s study
- Hypotheses:
  - Initialize grammar (video):
    - From previous tags
    - From tags by users with similar interests
  - Voice commands convenient way to tag photos (video)
  - Keyboard users will use voice less
  - Low task completion: give up on voice

Experience sampling (ESM)
- Short questionnaire
- Timing:
  - Random
  - Scheduled
  - Event-based

Experience sampling (ESM)
- How often?
- How many?
- Relate to quantitative data?

Diaries
- Similar to ESM

Interviews
- Semi-structured:
  - List of specific questions + follow-up questions
- Bring data
  - E.g. Nancy A. Van House: “Flickr and Public Image Sharing:…”
  - Interviews + photo elicitation

Interviews
- Neutral questions
- Negative feedback is OK (this is hard):
  - Don’t argue!

Participants
- Follow IRB rules

Participants
- Who to recruit?
  - Representative of intended users
  - Not your friends, family, colleagues – bias!
  - May need different types
  - Recruit sufficient numbers of each type

Participant profile
- Age
  - E.g. age significant for driving
- Gender
- Technology use and experience
- Other
  - Eye tracker studies: no glasses

Number of participants
- Between-subjects usually requires more than within-subjects
- Proof-of-concept: typically fewer and many types
- Longer study: may be able to use fewer
- Time commitment per participant is significant!
  - Recruit (Craigslist), organize, train, run, transfer data, process data
  - Participants will drop out – recruit extra
  - Counterbalancing may not work out

Compensation
- Don’t try to save on this!
- Driving simulator lab study cost example:
  - 1 graduate student year at UNH ≈ $50k
  - Software maintenance fees per year ≈ $20k
  - Trip to conference ≈ $2k
  - PC or laptop ≈ $2k
  - $20 x 24 participants ≈ $0.5k (less than 1%)

Compensation
- Must not affect data
  - E.g. in image tagging study if we paid per picture:
    - More data
    - Unrealistic as interactions are for money not for value of prototype

Compensation
- Leverage if you can:
  - Latest driving simulator lab study in collaboration with Microsoft Research:
    - Use Microsoft software as compensation

Data analysis
- Test hypotheses
- Use multiple data types
- Tell a story

Data analysis
- Statistics:
  - Descriptive
  - Inferential

Descriptive statistics
- Level of measurement:
  - Nominal
  - Ordinal
  - Interval

Level of measurement
- Nominal:
  - Unordered
  - E.g. categories yes/no
  - Valid to report:
    - Frequency

Level of measurement
- Ordinal:
  - Rank order preference without numeric difference
  - E.g. responses on Likert scale
    - Five of the eight participants strongly agreed or agreed with the following statement: “I prefer to have a GPS screen for navigation.”
  - Valid to report:
    - Frequency
    - Median
    - Some people report means but what is the mean of “strongly agree” and “strongly disagree”?

Level of measurement
- Interval:
  - Numerical differences significant
  - E.g. age, number of times an action occurred, etc.
  - Valid to report:
    - Sum
    - Mean
    - Median
    - Standard deviation (outliers?)

Outliers in interval data

Inferential statistics
- Significance tests
  - t-test
  - ANOVA
  - Many others
- Which to use: depends on data

Significance test: example 1
- To assess the effect of different navigation aids on visual attention, we performed a one-way ANOVA using PDT as the dependent variable. As expected, the time spent looking at the outside world was significantly higher when using spoken directions as compared to the standard PND directions, p<.01. Specifically, for spoken directions only, the average PDT was 96.9%, while it was 90.4% for the standard PND.

Significance test: example 2
- … PDT on the PND screen changes with the distance from the previous intersection… significant main effect, p<.01…
- [Figure: PDT on standard PND [%] vs. distance from previous intersection [m], in 20 m bins from 60-80 to 140-160]

Significance test: example 3
- Randomization test
  - Kun et al. (pdf)
  - Idea from Veit et al. (pdf)

Significance test: example 3
- [Figure: Rstw [degrees^2] vs. lag [seconds] for standard and spoken-only directions, with a p = 0.05 line]
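
To make the idea of a randomization test more concrete, here is a minimal, generic sketch: a two-sample randomization (permutation) test on the difference of group means. It is not the procedure from Kun et al. or Veit et al., whose details are not given in these slides. The function name `randomization_test` and the PDT values below are hypothetical, invented only to show how the test is run.

```python
import random
from statistics import mean


def randomization_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Hypothetical PDT values [%], NOT taken from the study.
spoken_only = [97.1, 96.5, 98.0, 96.2, 97.4, 96.0]
standard_pnd = [90.1, 91.5, 89.7, 92.0, 90.8, 88.9]

p_value = randomization_test(spoken_only, standard_pnd)
print(f"estimated p = {p_value:.4f}")
```

A randomization test makes few distributional assumptions, which can be useful when the data do not suit a t-test or ANOVA; the trade-off is that the p-value is an estimate whose precision depends on the number of resamples.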
