STAT 111 Principles of Statistics

YASAR UNIVERSITY
2010-2011 Fall
Assist. Prof. Dr. R. Serkan Albayrak
serkan.albayrak@yasar.edu.tr
http://yasaruniversity.yahooboard.net/




Syllabus
Course Materials
Grading-Next Slide
Etc.
[Figure: Letter Grades, bar chart of Count by letter grade (A through F)]
[Figure: Letter Grades, chart of Total by letter grade (A through F)]




Statistics can be thought of as a whole subject or
discipline ...
It can be thought of as the methods used to
collect, process and/or interpret data ...
It can be thought of as the collections of data
gathered by those methods ...
It can also be thought of as specially calculated
figures (e.g. averages) that characterize a collection
...


Statistics are like a bikini; What is revealed is
interesting; What is concealed is crucial. - R. Taylor
Statistics is the science and art of making
decisions based on quantitative evidence.
“The most fundamental
principle of all in gambling
is simply equal conditions,
e.g. of opponents, of
bystanders, of money, of
situation, of the dice box,
and of the die itself. To the
extent to which you depart
from that equality, if it is in
your opponent’s favor, you
are a fool, and if in your
own, you are unjust.”
Girolamo Cardano
1501 – 1576
PROPOSITION IV
“Suppose now that I am playing
against someone with the
agreement that the first of us to
win three times will take the
stake. And suppose that I have
already won twice and my
opponent has already won once. I
want to know how much of the
money should fall to me if we do
not wish to continue the game,
but rather to divide equitably the
money we are playing for.”
Christiaan Huygens
1629 – 1695
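Huygens's question can be worked through by following every way the unfinished game could end. The sketch below assumes each remaining round is an even 50/50 contest (in the spirit of Cardano's "equal conditions", but not stated in the proposition itself); the function name `fair_share` is illustrative only.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def fair_share(my_needed: int, opp_needed: int) -> Fraction:
    """Fraction of the stake owed to me, given the wins each side still needs.

    Assumption: each round is won by either player with probability 1/2.
    """
    if my_needed == 0:
        return Fraction(1)  # I have already won the match
    if opp_needed == 0:
        return Fraction(0)  # my opponent has already won the match
    half = Fraction(1, 2)
    return (half * fair_share(my_needed - 1, opp_needed)
            + half * fair_share(my_needed, opp_needed - 1))


# First to three wins; I have won twice, my opponent once.
share = fair_share(3 - 2, 3 - 1)
print(share)  # 3/4
```

Under that assumption, three quarters of the stake should fall to the player who is ahead.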
Descriptive Statistics
Involves organizing,
summarizing, and
displaying data.
Inferential Statistics
Involves using
sample data to draw
conclusions about a
population.


The objective of descriptive statistics methods
is to summarize a set of observations.
The objective of inferential statistics methods is
to make inferences (predictions, decisions) about
a population based on information contained in a
sample, and to quantify the level of uncertainty
in our decisions.
Decide which part of the study represents the
descriptive branch of statistics. What conclusions
might be drawn from the study using inferential
statistics?
A large sample of men, aged 48,
was studied for 18 years. For
unmarried men, approximately
70% were alive at age 65. For
married men, 90% were alive
at age 65.
Descriptive statistics involves statements such as “For unmarried
men, approximately 70% were alive at age 65” and “For married
men, 90% were alive at 65.”
A possible inference drawn from the study is that being married is
associated with a longer life for men.

We will follow




Logic
Statistics (Descriptive)
Probability
Statistics (Inferential)
An ARGUMENT is a sequence of statements, one
of which is called the CONCLUSION. The other
statements are PREMISES (assumptions). The
argument presents the premises—collectively—
as evidence that the conclusion is true.
Example: If A is true then B is true. A is true.
Therefore, B is true.

The CONCLUSION is that B is true. The PREMISES are "If A is true then B is true" and "A is
true." The premises support the conclusion that B is true. The word "therefore"
is not part of the conclusion: it is a signal that the statement after it is the
conclusion.
• The words thus, hence, so, and the phrases it follows that, we see that, and so
on, also flag conclusions. The words suppose, let, given, assume, and so on,
flag premises.
• A concrete argument of the form just given might be:
• If it is sunny, I will wear sandals. It is sunny. Therefore, I will wear sandals.
• Here, A is "it is sunny" and B is "I will wear sandals."
• We usually omit the words "is true." So, for example, the previous argument
would be written
• If A then B. A. Therefore, B.
• The statement not A means A is false.
An argument is VALID if the conclusion must be true
whenever the premises are true.
If an argument is valid and its premises are true,
the argument is SOUND.
Cheese more than a billion years old is stale. The
Moon is made of cheese. The Moon is more
than a billion years old. Therefore, the Moon is
stale cheese.
VALID but NOT SOUND!











A or not A. (LAW OF THE EXCLUDED MIDDLE)
Not (A and not A).
A. Therefore, A or B.
A. B. Therefore, A and B.
A and B. Therefore, A.
Not A. Therefore, not (A and B).
A or B. Not A. Therefore, B. (DENYING THE DISJUNCT)
Not (A and B). Therefore, (not A) or (not B). (DE MORGAN)
Not (A or B). Therefore, (not A) and (not B). (DE MORGAN)
If A then B. A. Therefore, B. (AFFIRMING THE ANTECEDENT, MODUS PONENDO PONENS,
"affirming by affirming")
If A then B. Not B. Therefore, not A. (DENYING THE CONSEQUENT, MODUS TOLLENDO
TOLLENS, "denying by denying")
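Validity of forms like these can be checked mechanically with a truth table: an argument is valid if no assignment of true/false makes every premise true and the conclusion false. The following is a minimal sketch, assuming "if A then B" is read as the material conditional; the helper names `implies` and `is_valid` are illustrative.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material conditional: false only when a is true and b is false."""
    return (not a) or b


def is_valid(premises, conclusion) -> bool:
    """True if the conclusion holds in every row where all premises hold."""
    for a, b in product([True, False], repeat=2):
        if all(p(a, b) for p in premises) and not conclusion(a, b):
            return False  # found a row with true premises and a false conclusion
    return True


# Denying the disjunct: A or B. Not A. Therefore, B.
denying_disjunct = is_valid([lambda a, b: a or b, lambda a, b: not a],
                            lambda a, b: b)
# Modus ponens: If A then B. A. Therefore, B.
modus_ponens = is_valid([lambda a, b: implies(a, b), lambda a, b: a],
                        lambda a, b: b)
# Modus tollens: If A then B. Not B. Therefore, not A.
modus_tollens = is_valid([lambda a, b: implies(a, b), lambda a, b: not b],
                         lambda a, b: not a)

print(denying_disjunct, modus_ponens, modus_tollens)  # True True True
```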









A or B. Therefore, A.
A or B. A. Therefore, not B. (AFFIRMING THE DISJUNCT)
Not both A and B are true. Not A. Therefore, B.
If A then B. B. Therefore, A.
If A then B. Not A. Therefore, not B.
If A then B. C. Therefore, B.
If A then B. Not C. Therefore, not A.
If A then B. A. Therefore, C.
If A then B. Not B. Therefore, not C.
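One way to see that a form is invalid is to exhibit a counterexample: an assignment of true/false to A, B and C under which every premise is true and the conclusion is false. The sketch below searches for such assignments, assuming "if A then B" is read as the material conditional; `counterexamples` and `if_then` are hypothetical helper names.

```python
from itertools import product


def if_then(a: bool, b: bool) -> bool:
    """Material conditional: false only when a is true and b is false."""
    return (not a) or b


def counterexamples(premises, conclusion):
    """Return every (A, B, C) with all premises true and the conclusion false."""
    rows = []
    for values in product([True, False], repeat=3):
        if all(p(*values) for p in premises) and not conclusion(*values):
            rows.append(values)
    return rows


# If A then B. B. Therefore, A. (commonly called affirming the consequent)
affirming_consequent = counterexamples(
    [lambda a, b, c: if_then(a, b), lambda a, b, c: b],
    lambda a, b, c: a,
)
# If A then B. A. Therefore, C. (C never appears in the premises)
unrelated_conclusion = counterexamples(
    [lambda a, b, c: if_then(a, b), lambda a, b, c: a],
    lambda a, b, c: c,
)

print(affirming_consequent)  # [(False, True, True), (False, True, False)]
print(unrelated_conclusion)  # [(True, True, False)]
```

A single counterexample row is enough to show that a form is not valid.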




NANCY CLAIMS THE DEATH PENALTY IS A GOOD THING. BUT NANCY ONCE SET FIRE TO A
VACANT WAREHOUSE. NANCY IS EVIL. THEREFORE, THE DEATH PENALTY IS A BAD THING.
THIS ARGUMENT DOES NOT ADDRESS NANCY'S ARGUMENT, IT JUST SAYS SHE MUST BE
WRONG (ABOUT EVERYTHING) BECAUSE SHE IS EVIL. WHETHER NANCY IS GOOD OR EVIL IS
IRRELEVANT: IT HAS NO BEARING ON WHETHER HER ARGUMENT IS SOUND.
THIS IS A FALLACY OF RELEVANCE: IT ESTABLISHES THAT NANCY IS BAD, THEN EQUATES
BEING BAD AND NEVER BEING RIGHT. IN SYMBOLS, THE ARGUMENT IS IF A THEN B. A.
THEREFORE C. (IF SOMEBODY SETS FIRE TO A VACANT WAREHOUSE, THAT PERSON IS EVIL.
NANCY SET FIRE TO A VACANT WAREHOUSE. THEREFORE, NANCY'S OPINION ABOUT THE
DEATH PENALTY IS WRONG.)
AD HOMINEM IS LATIN FOR "TOWARDS THE PERSON." AN AD HOMINEM ARGUMENT ATTACKS
THE PERSON MAKING THE CLAIM, RATHER THAN THE PERSON'S REASONING. A VARIANT OF
THE AD HOMINEM ARGUMENT IS "GUILT BY ASSOCIATION."



BOB CLAIMS THE DEATH PENALTY IS A GOOD THING. BUT BOB'S FAMILY
BUSINESS MANUFACTURES CASKETS. BOB BENEFITS WHEN PEOPLE DIE, SO
HIS MOTIVES ARE SUSPECT. THEREFORE, THE DEATH PENALTY IS A BAD
THING.
THIS ARGUMENT DOES NOT ADDRESS BOB'S ARGUMENT, IT ADDRESSES
BOB'S MOTIVES. HIS MOTIVES ARE IRRELEVANT: THEY HAVE NOTHING TO DO
WITH WHETHER HIS ARGUMENT FOR THE DEATH PENALTY IS SOUND.
THIS IS RELATED TO AN AD HOMINEM ARGUMENT. IT, TOO, ADDRESSES THE
PERSON, NOT THE PERSON'S ARGUMENT. HOWEVER, RATHER THAN
CONDEMNING BOB AS EVIL, IT IMPUGNS HIS MOTIVES IN ARGUING FOR THIS
PARTICULAR CONCLUSION.


AMY SAYS PEOPLE SHOULDN'T SMOKE CIGARETTES IN PUBLIC
BECAUSE CIGARETTE SMOKE HAS A STRONG ODOR. BUT AMY
WEARS STRONG PERFUME ALL THE TIME. AMY IS CLEARLY A
HYPOCRITE. THEREFORE, SMOKING IN PUBLIC IS FINE.
THIS ARGUMENT DOES NOT ENGAGE AMY'S ARGUMENT: IT
ATTACKS HER FOR THE (IN)CONSISTENCY OF HER OPINIONS IN
THIS MATTER AND IN SOME OTHER MATTER. WHETHER AMY
WEARS STRONG FRAGRANCES HAS NOTHING TO DO WITH
WHETHER HER ARGUMENT AGAINST SMOKING IS SOUND.



YES, I HIT BILLY. BUT SALLY HIT HIM FIRST.
THIS ARGUMENT CLAIMS IT IS FINE TO DO SOMETHING WRONG
BECAUSE SOMEBODY ELSE DID SOMETHING WRONG. THE ARGUMENT
IS OF THE FORM: IF A THEN B. A. THEREFORE C. (IN WORDS: IF SALLY
HIT BILLY, IT'S OK FOR BILLY TO HIT SALLY. SALLY HIT BILLY.
THEREFORE, IT'S OK FOR ME TO HIT BILLY.)
GENERALLY, THE TWO-WRONGS-MAKE-A-RIGHT ARGUMENT SAYS
THAT THE JUSTIFIED WRONG HAPPENED AFTER THE EXCULPATORY
WRONG, OR WAS LESS SEVERE. FOR INSTANCE, SALLY HIT BILLY
FIRST, OR SALLY HIT BILLY HARDER THAN I DID, OR SALLY PULLED A
KNIFE ON BILLY.




IF YOU DON'T GIVE ME YOUR LUNCH MONEY, MY BIG BROTHER WILL BEAT YOU UP.
YOU DON'T WANT TO BE BEATEN UP, DO YOU? THEREFORE, YOU SHOULD GIVE ME
YOUR LUNCH MONEY.
THIS ARGUMENT APPEALS TO FORCE: ACCEPT MY CONCLUSION, OR ELSE. IT IS NOT
A LOGICAL ARGUMENT, BUT IT CAN BE QUITE PERSUASIVE NONETHELESS.
IT IS AN ARGUMENT THAT IF YOU DO NOT ACCEPT THE CONCLUSION (AND GIVE ME
YOUR LUNCH MONEY), SOMETHING BAD WILL HAPPEN (YOU WILL GET BEATEN)—
NOT AN ARGUMENT THAT THE CONCLUSION IS CORRECT. THE FORM OF THE
ARGUMENT IS IF A THEN B. B IS BAD. THEREFORE, NOT A. HERE, A IS "YOU DON'T
GIVE ME YOUR LUNCH MONEY," B IS "YOU WILL BE BEATEN UP."




YES, I DOWNLOADED MUSIC ILLEGALLY—BUT MY GIRLFRIEND LEFT ME AND I LOST MY
JOB SO I WAS BROKE AND I COULDN'T AFFORD TO BUY MUSIC AND I WAS SO SAD THAT I
WAS BROKE AND THAT MY GIRLFRIEND WAS GONE THAT I REALLY HAD TO LISTEN TO
100 VARIATIONS OF SHE CAUGHT THE KATY.
THIS ARGUMENT JUSTIFIES AN ACTION NOT BY CLAIMING THAT IT IS CORRECT, BUT
BY AN APPEAL TO PITY: EXTENUATING CIRCUMSTANCES OF A SORT.
AD MISERICORDIAM IS LATIN FOR "TO PITY." IT IS AN APPEAL TO COMPASSION
RATHER THAN TO REASON. ANOTHER EXAMPLE:
YES, I FAILED THE FINAL. BUT I NEED TO GET AN A IN THE CLASS OR I [WON'T GET
INTO BUSINESS SCHOOL] / [WILL LOSE MY SCHOLARSHIP] / [WILL VIOLATE MY
ACADEMIC PROBATION] / [WILL LOSE MY 4.0 GPA]. YOU HAVE TO GIVE ME AN A!



MILLIONS OF PEOPLE SHARE COPYRIGHTED MP3 FILES AND VIDEOS ONLINE.
THEREFORE, SHARING COPYRIGHTED MUSIC AND VIDEOS IS FINE.
THIS "BANDWAGON" ARGUMENT CLAIMS THAT SOMETHING IS MORAL
BECAUSE IT IS COMMON. COMMON AND CORRECT ARE NOT THE SAME.
WHETHER A PRACTICE IS WIDESPREAD HAS LITTLE BEARING ON WHETHER IT
IS LEGAL OR MORAL. THAT MANY PEOPLE BELIEVE SOMETHING IS TRUE
DOES NOT MAKE IT TRUE.
AD POPULUM IS LATIN FOR "TO THE PEOPLE." IT EQUATES THE POPULARITY
OF AN IDEA WITH THE TRUTH OF THE IDEA: EVERYBODY CAN'T BE WRONG.
FEW TEENAGERS HAVE NOT MADE AD POPULUM ARGUMENTS: "BUT MOM,
EVERYBODY IS DOING IT!"



BOB: SLEEPING A FULL 12 HOURS ONCE IN A WHILE IS A HEALTHY
PLEASURE.
SAMANTHA: IF EVERYBODY SLEPT 12 HOURS ALL THE TIME, NOTHING
WOULD EVER GET DONE; THE REDUCTION IN PRODUCTIVITY WOULD
DRIVE THE COUNTRY INTO BANKRUPTCY. THEREFORE, NOBODY
SHOULD SLEEP FOR 12 HOURS.
SAMANTHA ATTACKED A DIFFERENT CLAIM FROM THE ONE BOB
MADE: SHE ATTACKED THE ASSERTION THAT IT IS GOOD FOR
EVERYBODY TO SLEEP 12 HOURS EVERY DAY. BOB ONLY CLAIMED
THAT IT WAS GOOD ONCE IN A WHILE.



ART: TEACHER SALARIES SHOULD BE INCREASED TO ATTRACT BETTER TEACHERS.
BETTE: LENGTHENING THE SCHOOL DAY WOULD ALSO IMPROVE STUDENT
LEARNING OUTCOMES. THEREFORE, TEACHER SALARIES SHOULD REMAIN THE SAME.
ART ARGUES THAT INCREASING TEACHER SALARIES WOULD ATTRACT BETTER
TEACHERS. BETTE DOES NOT ADDRESS HIS ARGUMENT: SHE SIMPLY ARGUES THAT
THERE ARE OTHER WAYS OF IMPROVING STUDENT LEARNING OUTCOMES. ART DID
NOT EVEN USE STUDENT LEARNING OUTCOMES AS A REASON FOR INCREASING
TEACHER SALARIES. EVEN IF BETTE IS CORRECT THAT LENGTHENING THE SCHOOL
DAY WOULD IMPROVE LEARNING OUTCOMES, HER ARGUMENT IS SIDEWAYS TO
ART'S: IT IS A DISTRACTION, NOT A REFUTATION.
A RED HERRING ARGUMENT DISTRACTS THE LISTENER FROM THE REAL TOPIC.
RED HERRING ARGUMENTS ARE VERY COMMON IN POLITICAL DISCOURSE.




ALL MEN SHOULD HAVE THE RIGHT TO VOTE. SALLY IS NOT A MAN.
THEREFORE, SALLY SHOULD NOT NECESSARILY HAVE THE RIGHT TO VOTE.
THIS IS AN EXAMPLE OF EQUIVOCATION, A FALLACY FACILITATED BY THE
FACT THAT A WORD CAN HAVE MORE THAN ONE MEANING.
THIS ARGUMENT USES THE WORD MAN IN TWO DIFFERENT WAYS. IN THE
FIRST PREMISE, THE WORD MEANS HUMAN WHILE IN THE SECOND, IT MEANS
MALE. GENERALLY, EQUIVOCATION IS CONSIDERED A FALLACY OF
RELEVANCE, BUT THIS EXAMPLE FITS OUR DEFINITION OF A FALLACY OF
EVIDENCE.
THE LOGICAL FORM OF THIS ARGUMENT IS IF A THEN B. NOT C. THEREFORE,
B IS NOT NECESSARILY TRUE.
Trident (4/5)
Trident® sugarless gum used to advertise that "4
out of 5 dentists surveyed recommend Trident®
sugarless gum for their patients who chew gum."
Yale University Graduates



In its broadest sense, Statistics is the science of
drawing conclusions about the world from data.
Data are observations (measurements) of some
quantity or quality of something in the world.
"Data" is a plural noun; the singular form is "datum."
Our lives are filled with data: the weather, weights,
prices, our state of health, exam grades, bank
balances, election results, and so on. Data come in
many forms, most of which are numbers, or can be
translated into numbers for analysis.




There are several important questions to keep in
mind when you evaluate quantitative evidence:
Are the data relevant to the question asked?
Was the data collection fair, or might there have
been some conscious or unconscious BIAS that
influenced the results or made some cases less
likely to be observed?
Do the data make sense?

Qualitative Data: Consists of attributes, labels,
or nonnumerical entries.
• Major
• Place of birth
• Eye color
• Hot/Warm/Cold
• Population density: low/medium/high
• Height: short/medium/tall
• Young/Middle-aged/Old
• Social class: lower/middle/upper
• Family size: fewer than 3, 3–5, more than 5
• Rural/Urban area
• Type of climate
• Gender
• Ethnicity
• Zip code
• Hair color
• Country of origin

Quantitative Data: Numerical measurements or
counts.
Age
Weight of a letter
Temperature








Temperature in °C
Population density: people per square mile
Height in inches
Height in centimeters
Body mass index (BMI)
Age in seconds
Income in dollars
Family size (#people)

The base prices of several vehicles are shown in
the table. Which data are qualitative data and
which are quantitative data? (Source: Ford Motor Company)
Qualitative Data (names of vehicle models are nonnumerical entries)
Quantitative Data (base prices of vehicle models are numerical entries)


The fact that a category is labeled with a
number does not make the variable
quantitative!
The real issue is whether arithmetic with the
values makes sense.
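A short illustration of this point, using zip codes (listed earlier among the qualitative examples). The zip code values below are chosen only for illustration; the code is a sketch, not a recommended analysis.

```python
from collections import Counter
from statistics import mean

# Example zip codes, chosen only for illustration.
zip_codes = [94720, 10027, 94720, 60637]

# Arithmetic runs without error, but the "average zip code" describes nothing.
average_zip = mean(zip_codes)
print(average_zip)  # 65026

# Counting how often each label occurs is a summary that does make sense.
most_common_zip = Counter(zip_codes).most_common(1)
print(most_common_zip)  # [(94720, 2)]
```

The computer will happily average the numbers; deciding whether the average means anything is up to the analyst.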
Nominal level of measurement
• Qualitative data only
• Categorized using names, labels, or qualities
• No mathematical computations can be made
Ordinal level of measurement
• Qualitative or quantitative data
• Data can be arranged in order
• Differences between data entries are not
meaningful

Two data sets are shown. Which data set consists
of data at the nominal level? Which data set
consists of data at the ordinal level? (Source: Nielsen
Media Research)
Ordinal level (lists the
rank of five TV programs.
Data can be ordered.
Difference between ranks
is not meaningful.)
Nominal level (lists the
call letters of each
network affiliate. Call
letters are names of
network affiliates.)
Interval level of measurement
• Quantitative data
• Data can be ordered
• Differences between data entries are meaningful
• Zero represents a position on a scale (not an
inherent zero; zero does not imply “none”)
Ratio level of measurement
• Similar to interval level
• Zero entry is an inherent zero (implies “none”)
• A ratio of two data values can be formed
• One data value can be expressed as a multiple of
another

Two data sets are shown. Which data set consists
of data at the interval level? Which data set
consists of data at the ratio level? (Source: Major League
Baseball)
Interval level
(Quantitative data. Can
find a difference between
two dates, but a ratio
does not make sense.)
Ratio level (Can find
differences and write
ratios.)
Level of       Put data in   Arrange data   Subtract      Determine if one data value
Measurement    categories    in order       data values   is a multiple of another
Nominal        Yes           No             No            No
Ordinal        Yes           Yes            No            No
Interval       Yes           Yes            Yes           No
Ratio          Yes           Yes            Yes           Yes
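The summary of the four levels has a cumulative structure: each level permits every operation of the level before it, plus one more. A minimal sketch of that structure follows; the operation labels and the function names `allowed_operations` and `can` are invented for illustration.

```python
# Levels in increasing order, and the operation each one adds.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]
OPERATIONS = ["categorize", "order", "subtract", "multiple"]


def allowed_operations(level):
    """Operations that are meaningful at the given level of measurement."""
    rank = LEVELS.index(level)
    return OPERATIONS[: rank + 1]


def can(level, operation):
    """True if the operation is meaningful at the given level."""
    return operation in allowed_operations(level)


print(allowed_operations("ordinal"))  # ['categorize', 'order']
print(can("interval", "multiple"))    # False
print(can("ratio", "multiple"))       # True
```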





One of the most problematic relationships:
What really is a variable?
What is a value?
What is data?
How are they related?
Variable and Values: Theoretical
Data: Observed
Variable: New York Yankees’ World Series Victories
Values: 1901, 1902, … (all possible years)
Data: 1923, 1927, 1928, …