Welcome to Statistics 111

Alex Braunstein

The goal of this course is to develop basic tools for data analysis, probability, and statistical methods. Key topics covered in the course include exploratory data analysis, regression, probability, estimation, and hypothesis testing.

Stat 111 - Lecture 1 - Introduction 1

Syllabus notes: website

• All handouts will be available on the website: http://stat.wharton.upenn.edu/~braunsf/stat111.html

• Website also contains my contact information

• Link on website for getting Wharton class account if you are not a Wharton student

• Helpful if you want to use Wharton computer labs

Syllabus notes: Homeworks

• Homeworks will be handed out at the beginning of every week

• ~ 5 homeworks in all

• Homeworks will be submitted at the beginning of class on Mondays

• You are encouraged to work together on homework, but homeworks are to be completed separately and handed in individually.

• Do not copy from another person.

• No late homeworks will be accepted!!

• Late homeworks will get a score of zero, without exception

• Your lowest homework grade is not included in final grade

Syllabus Notes: Midterm Exam

• Midterm will be held on the following date:

Monday, June 15th (in class)

• No makeup midterm examination!

• A missing midterm exam counts as a zero score

• Consider taking this class in the fall or spring if you cannot attend the midterm!

Student Questionnaire

• Fill out a questionnaire and hand it in before the break

• I will try to incorporate some of the subjects that interest you into future lectures

Course Overview

[Diagram: boxes labeled Collecting Data, Exploring Data, Probability Intro., Inference, Comparing Variables, Means, Proportions, Relationships between Variables, Regression, and Contingency Tables]

Out in public: You do statistics?!?

• I hated that class in college!

• That was the most boring class ever!

• Lame.

Big Picture Ideas

• Statistics is all about uncertainty

• Focus as much on what we don’t know (or haven’t observed) as on what we know

• Formulating the question that we want to answer is often the most difficult part

• Statistics is part mathematics, part roll-up-your-sleeves-and-get-thinking.

Science and Skepticism

• We always need to be cautious about conclusions based on data

• Possible sources of bias and confounding?

• How might things have gone wrong?

• A little bit of skepticism is a good thing!

Statistical Modeling

• Inference: using mathematical models of uncertainty to answer questions

• Connect probability concepts to our data

• We cannot make claims without using models and making assumptions

• Are the assumptions reasonable?

After the break

• Collecting Data: Design of Experiments

• Sections 3.1-3.2 in Moore, McCabe and Craig

• First couple of classes will not involve much math at all, but we will get into lots of data analysis after that!

Break!

• Hand in questionnaire

• 5 minutes

Outline for Second Half of Lecture

• Introduction to Experiments

• Sources of Bias in Experiments

• Techniques for Avoiding Bias

• Matching

• Randomization

• Block Designs

• Blinding and Double-Blinding

• Experiments vs. Observational Studies

• Association vs. Causation

Stat 111 - Lecture 2 - Experiments 13

Experiments

• Used to address a specific question

• Often used to examine causal effects

• E.g., medical trials, education interventions

[Diagram: Population -> Experimental Units -> Treatment Group -> Treatment -> Result, and Experimental Units -> Control Group -> No Treatment -> Result, with markers 1 to 4 on the diagram]

• Can we just look at difference in results to get the causal effect of the treatment?

• Depends on whether the experiment was done well

• There are many possible sources of bias in the design of experiments

Sources of Bias

• An experiment or study is biased if it systematically favors a particular outcome

1. Subjects are not representative of the population

2. Treatment and control groups are inherently different on some lurking or confounding variable

3. Subjects are influenced by knowing whether they are in the treatment or control group

4. Evaluator of outcomes is influenced by knowing which subjects are in the treatment or control group

[Diagram: Population -> Experimental Units -> Treatment Group -> Treatment -> Result, and Experimental Units -> Control Group -> No Treatment -> Result, with markers 1 to 4 on the diagram]

Bias 1: Non-representative units

• If your subjects are not representative of the population, you won’t be able to generalize the results even if the experiment is well done

• Example: comparing neonatal intensive care units (NICUs)

• Treatment group: High Level NICUs

• Control Group: Low Level NICUs

• Problem: classification of NICUs differs from state to state, so a hospital that qualifies as a high level NICU in one state might not in another

• Observed differences between the groups cannot be generalized from one state to another

Bias 2: Confounding/Lurking Variables

• Treatment group and control group are different on some variable that also influences the outcome

• A confounding variable means that we can’t attribute difference in outcomes to just the treatment

– Part of the difference may be due to the confounding variable, not the treatment

• Simple example: a breast cancer drug trial where only women receive the treatment and only men receive the control

• Gender becomes a confounding variable

• Are treatment vs control outcomes different due to the treatment or gender differences between groups?

Bias 3: Subject knows treatment assignment

• A subject’s outcome is influenced by knowing that he/she is in a treatment or control group

• E.g., drug trials: patients improve just because they think they are receiving the drug

• Solution: blinded experiment with placebo

• Placebo appears to be the treatment, so all subjects (treatment and control) don’t know their true treatment assignment

• Subjects in the control group may still improve slightly; this is often called “the placebo effect”

Bias 4: Evaluator knows treatment assignment

• Person evaluating the outcome (e.g., a doctor in a drug trial) may also be influenced by knowing who receives treatment

• Not a problem if outcome is something indisputable, such as death!

• This is a problem for more subjective measures like pain reduction or results from social programs

• Solution: double-blinded experiment where neither subjects nor evaluators know treatment assignments

Association vs Causation

• In the presence of a confounding variable, we can only conclude there is an association between treatment and outcome, not causation
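A small simulation can make this concrete. The sketch below is illustrative only: the confounder `z`, the probabilities, and the effect sizes are all made-up assumptions. The treatment has no effect on the outcome at all, yet the treated group still looks better because `z` drives both who gets treated and the outcome.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable

rows = []
for _ in range(10000):
    z = rng.random() < 0.5                      # hypothetical confounder
    t = rng.random() < (0.8 if z else 0.2)      # z makes treatment more likely
    y = (2.0 if z else 0.0) + rng.gauss(0, 1)   # outcome depends on z only, not on t
    rows.append((z, t, y))

# Naive comparison: treated vs. untreated, ignoring the confounder.
naive_diff = (mean(y for z, t, y in rows if t)
              - mean(y for z, t, y in rows if not t))

# Same comparison restricted to subjects who share the confounder value.
within_diff = (mean(y for z, t, y in rows if z and t)
               - mean(y for z, t, y in rows if z and not t))

print(f"naive difference:           {naive_diff:.2f}")
print(f"difference within z = True: {within_diff:.2f}")
```

Under these assumptions the naive difference comes out clearly positive while the within-group difference is close to zero: an association without any causal effect.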

Examples: “Reporters are stupid”

• Children who watch many hours of TV get lower grades in school on average than those who watch less TV

• Does this mean that TV causes poor grades?

• What are potential confounding variables?

• People who use artificial sweeteners in place of sugar tend to be heavier than people who use sugar

• Does this mean that sweeteners cause weight gain?

• What is probably happening here?

One solution: Matching

• Make sure that treatment and control groups are very similar on observed variables like race, gender, age etc.

• Block designs: divide subjects into blocks with similar observed variables before dividing them into treatment vs control

• Special case: Matched Pairs

• Subjects are matched up into pairs, then one member of each pair gets treatment and the other gets control

• Example: Dandruff experiment

• Treatment applied to one side and control to other side of head

• No reason to expect difference in sides except for treatment

Another Solution: Randomization

• Problem with matching is that you usually cannot match on unobserved characteristics (e.g., genetics)

• E.g., a cholesterol drug trial can’t match treatment and control groups on genetic predisposition for high cholesterol

• Randomly assign subjects to treatment or control

• Random assignment should lead to groups that are similar or balanced on both observed and unobserved confounding variables

• Example: student questionnaire earlier in class - each form you filled out was randomly assigned either a 1 or 2
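As a minimal sketch of random assignment, the snippet below shuffles a list of subjects and splits it in half. The function name `randomize` and the subject IDs are hypothetical, and equal group sizes are an assumption made for illustration.

```python
import random

def randomize(subjects, seed=None):
    """Randomly split subjects into two equal-sized groups (treatment, control)."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

subjects = [f"subject_{i}" for i in range(1, 21)]  # hypothetical subject IDs
treatment, control = randomize(subjects, seed=111)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because the split depends only on the shuffle, nothing about a subject (observed or unobserved) can influence which group they land in.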

Randomization of In-Class Survey

• Check to see if groups are balanced:

Variable                      Treatment   Control
Average Height
Average Shoe Size
Average Number of Siblings

• There are differences, but are they “significant”?

• Later on in the course, we will be able to answer questions like this

• Of course, we can’t check the balance for unobserved variables…we just have to trust the randomization process

• This is why good science needs to be replicable
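A balance check of this kind could be sketched as follows. The heights here are simulated stand-ins, not the class survey data, and treating label 1 as "treatment" and label 2 as "control" is an assumption made only for the example.

```python
import random
from statistics import mean

rng = random.Random(0)

# Simulated heights in inches (hypothetical; NOT the in-class questionnaire data).
heights = [round(rng.gauss(67, 4), 1) for _ in range(40)]

# Each form is randomly labeled 1 or 2, with 20 forms per label.
labels = [1] * 20 + [2] * 20
rng.shuffle(labels)

group1 = [h for h, g in zip(heights, labels) if g == 1]  # assumed "treatment"
group2 = [h for h, g in zip(heights, labels) if g == 2]  # assumed "control"
diff = mean(group1) - mean(group2)

print(f"average height, group 1: {mean(group1):.2f}")
print(f"average height, group 2: {mean(group2):.2f}")
print(f"difference:              {diff:.2f}")
```

The same comparison can be repeated for any observed variable, such as shoe size or number of siblings.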

Even Better: Randomization + Matching

• Randomization generally leads to treatment and control groups that are evenly balanced, but you can still get unlucky and end up with unbalanced groups

• Example: randomly placing 20 people (10 males, 10 females) into treatment and control groups.

• How many males will end up in treatment group?

• Ideally, we would have 5 males in treatment group, and 5 males in control group (balanced)

• However, there is a chance to get 9 males in treatment and 1 male in control group (unbalanced)
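The chance of each split can be worked out by counting. The sketch below assumes the 20 people are divided into a treatment group of 10 and a control group of 10 (the group size is an assumption), in which case the number of males in the treatment group follows a hypergeometric distribution. The function name is hypothetical.

```python
from math import comb

def prob_males_in_treatment(k, males=10, females=10, group_size=10):
    """P(exactly k males land in a randomly chosen treatment group)."""
    total = comb(males + females, group_size)          # all possible treatment groups
    favorable = comb(males, k) * comb(females, group_size - k)
    return favorable / total

p_balanced = prob_males_in_treatment(5)
p_nine = prob_males_in_treatment(9)
print(f"P(5 males in treatment) = {p_balanced:.4f}")
print(f"P(9 males in treatment) = {p_nine:.6f}")
```

Under these assumptions a perfectly balanced split happens only about 34% of the time, while the 9-to-1 split has probability of roughly 0.0005: rare, but not impossible.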

Even Better: Randomization + Matching

• Randomized Blocks: randomize within blocks of observed variables

• Example:

• Divide up subjects into males and females first, then randomly assign treatment or control to subjects in each group separately

• Guarantees that an equal number of males end up in the treatment group and the control group (same with females)

• Randomized Matched Pairs: randomly decide which member of each pair gets treatment vs. control

• Example:

• For each head in dandruff experiment, randomly assign which side of head to get dandruff shampoo vs. control
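Both designs can be sketched in a few lines. The function names, the `(gender, id)` subject tuples, and the "left"/"right" side labels below are hypothetical choices for illustration, not part of the lecture.

```python
import random

def randomized_blocks(subjects, block_of, seed=None):
    """Randomize to treatment/control separately within each block."""
    rng = random.Random(seed)
    blocks = {}
    for s in subjects:
        blocks.setdefault(block_of(s), []).append(s)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        half = len(members) // 2
        for i, s in enumerate(members):
            assignment[s] = "treatment" if i < half else "control"
    return assignment

def randomized_matched_pairs(n_pairs, seed=None):
    """For each pair (e.g. each head), randomly pick which side gets treatment."""
    rng = random.Random(seed)
    return [rng.choice(["left", "right"]) for _ in range(n_pairs)]

# Hypothetical subjects: 10 males and 10 females, blocked on gender.
subjects = [("M", i) for i in range(10)] + [("F", i) for i in range(10)]
assignment = randomized_blocks(subjects, block_of=lambda s: s[0], seed=111)
males_treated = sum(1 for s, g in assignment.items()
                    if s[0] == "M" and g == "treatment")
females_treated = sum(1 for s, g in assignment.items()
                      if s[0] == "F" and g == "treatment")
treated_sides = randomized_matched_pairs(8, seed=111)  # 8 hypothetical heads
print(males_treated, females_treated, treated_sides)
```

Because the shuffle happens inside each block, the 5-and-5 split by gender holds on every run rather than only on average.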

Experiments vs. Observational Studies

• Often, we want the causal effect of some treatment, but our data are from an observational study

• Observational studies examine effects of some variable but without the advantages of a controlled experiment

• No treatment is assigned by the researcher in observational studies

• Example: health effects of smoking

• Unethical to randomly impose a treatment

• Could there be some confounding variable that explains health differences between smokers and non-smokers?

• Very risky to make causal statements from observational data, since we cannot avoid bias!

Health Effects of Chocolate

• Report to European Society of Sexual Medicine:

• 153 Italian women filled out sexual function questionnaires

• “intriguing correlation”: sexual function/desire significantly greater among chocolate-eaters

• Observational study: association does not imply causation!

• Confounding: average age is 35 among frequent chocolate-eaters, compared with 40.4 in non-chocolate group

Next Class - Lecture 2

• Collecting Data:

– Surveys and Sampling

– Graphical summaries of a single variable

• Moore, McCabe and Craig: Sections 3.3 and 1.1
