Chapter-6-Section-1-Day-2-Continuous-RV-Notes-2

Chapter 6 Section 1 Day 2 Notes.notebook
April 28, 2017
Honors Statistics
Daily Agenda
A Skip 4, 12, 16
Write a program to generate random
numbers. I've decided to give them
free will.
Toss 4 times Suppose you toss a fair coin 4 times.
Let X = the number of heads you get.
First, list the sample space:
HHHH  HHHT  HHTH  HHTT
HTHH  HTHT  HTTH  HTTT
THHH  THHT  THTH  THTT
TTHH  TTHT  TTTH  TTTT
(a) Find the probability distribution of X.
Value x:      0     1     2     3     4
Probability:  1/16  4/16  6/16  4/16  1/16
(b) Make a histogram of the probability distribution. Describe what you see.
[Histogram: frequency (vertical axis, 0.1 to 0.5) versus number of heads (horizontal axis)]
(c) Find P(X ≤ 3) and interpret the result.
P(X ≤ 3) = 1/16 + 4/16 + 6/16 + 4/16 = 15/16 = 0.9375
The probability that 4 tosses of a fair coin result in 3 or fewer heads is 0.9375.
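The coin-toss calculation can be checked by listing the sample space in code. This is a minimal sketch, assuming a fair coin so that all 16 outcomes are equally likely; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^4 = 16 equally likely outcomes of four tosses of a fair coin.
sample_space = ["".join(t) for t in product("HT", repeat=4)]

# X = number of heads in an outcome; count how many outcomes give each value.
counts = Counter(outcome.count("H") for outcome in sample_space)

# Probability distribution of X as exact fractions.
dist = {x: Fraction(counts[x], len(sample_space)) for x in sorted(counts)}

# P(X <= 3): add the probabilities for X = 0, 1, 2, 3.
p_at_most_3 = sum(p for x, p in dist.items() if x <= 3)

for x, p in dist.items():
    print(x, p)
print("P(X <= 3) =", p_at_most_3, "=", float(p_at_most_3))
```

Python reduces the fractions, so 4/16 prints as 1/4 and 6/16 as 3/8.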
Kids and toys In an experiment on the behavior of young
children, each subject is placed in an area with five toys. Past
experiments have shown that the probability distribution of
the number X of toys played with by a randomly selected
subject is as follows:
Number of toys x:  0     1     2     3     4     5
Probability:       0.03  0.16  0.30  0.23  0.17  0.11
(a) Write the event “plays with at most two toys” in terms of X.
What is the probability of this event?
The event is X ≤ 2. P(X ≤ 2) = 0.03 + 0.16 + 0.30 = 0.49
(b) Describe the event X > 3 in words.
The young child plays with more than 3 toys.
What is its probability?
P(X > 3) = 0.17 + 0.11 = 0.28
What is the probability that X ≥ 3? P(X ≥ 3) = 0.28 + 0.23 = 0.51
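The three event probabilities above all come from adding table entries. A short sketch of that idea follows; the helper name `prob` is hypothetical and the probabilities are the ones used in these notes.

```python
# Probabilities for X = number of toys, taken from the calculations in these notes.
toys = {0: 0.03, 1: 0.16, 2: 0.30, 3: 0.23, 4: 0.17, 5: 0.11}

def prob(dist, event):
    """Add the probabilities of every value x for which event(x) is True."""
    return sum(p for x, p in dist.items() if event(x))

p_at_most_2 = prob(toys, lambda x: x <= 2)    # "plays with at most two toys"
p_more_than_3 = prob(toys, lambda x: x > 3)   # "plays with more than 3 toys"
p_at_least_3 = prob(toys, lambda x: x >= 3)

print(round(p_at_most_2, 2), round(p_more_than_3, 2), round(p_at_least_3, 2))
```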
Kids and toys Refer to Exercise 4.
Calculate the mean of the random
variable X and interpret this result in
context.
µx = 0(0.03) + 1(0.16) + 2(0.30) + 3(0.23) + 4(0.17) + 5(0.11) = 2.68
If many, many children participated in this experiment, the
number of toys played with per child would average about
2.68 toys.
(The expected number of toys a randomly selected young child
will play with is 2.68.) This statement is optional.
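The "many, many children" interpretation can be illustrated with a small simulation. This is only a sketch: the sample size of 100,000 and the seed are arbitrary choices, not part of the exercise.

```python
import random

# Distribution of X = number of toys, as used in the mean calculation above.
toys = {0: 0.03, 1: 0.16, 2: 0.30, 3: 0.23, 4: 0.17, 5: 0.11}

# Exact mean: sum of value * probability.
mu_x = sum(x * p for x, p in toys.items())

# Simulate many randomly selected children and average their toy counts.
rng = random.Random(0)  # fixed seed so the run is repeatable
n_children = 100_000
draws = rng.choices(list(toys.keys()), weights=list(toys.values()), k=n_children)
sim_mean = sum(draws) / n_children

print("exact mean:", round(mu_x, 2))
print("simulated average:", round(sim_mean, 2))
```

The simulated average lands close to the exact mean, which is what the long-run interpretation says should happen.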
Kids and toys Refer to Exercise 4. Calculate and interpret the standard deviation
of the random variable X. Show your work.
σx² = (0 - 2.68)²(0.03) + (1 - 2.68)²(0.16)
+ (2 - 2.68)²(0.30) + (3 - 2.68)²(0.23)
+ (4 - 2.68)²(0.17) + (5 - 2.68)²(0.11) = 1.7176
σx = √1.7176 = 1.31057
The standard deviation of X is σx = 1.31057.
The number of toys a randomly selected
young child will play with will typically differ
from the mean (2.68) by about 1.31 toys.
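The variance and standard deviation can be computed the same way in code. A minimal sketch, using the toy probabilities from this exercise:

```python
from math import sqrt

# Distribution of X = number of toys played with.
toys = {0: 0.03, 1: 0.16, 2: 0.30, 3: 0.23, 4: 0.17, 5: 0.11}

# Mean: sum of value * probability.
mu = sum(x * p for x, p in toys.items())

# Variance: probability-weighted average of squared deviations from the mean.
variance = sum((x - mu) ** 2 * p for x, p in toys.items())

# Standard deviation: square root of the variance.
sd = sqrt(variance)

print(round(mu, 2), round(variance, 4), round(sd, 5))
```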
Benford’s law Faked numbers in tax returns, invoices, or
expense account claims often display patterns that aren’t
present in legitimate records. Some patterns, like too many
round numbers, are obvious and easily avoided by a clever
crook. Others are more subtle. It is a striking fact that the
first digits of numbers in legitimate records often follow a
model known as Benford’s law. Call the first digit of a
randomly chosen record X for short. Benford’s law gives this
probability model for X (note that a first digit can’t be 0):
First digit x:  1      2      3      4      5      6      7      8      9
Probability:    0.301  0.176  0.125  0.097  0.079  0.067  0.058  0.051  0.046
(a) Show that this is a legitimate probability distribution.
All individual probabilities are between 0 and 1, and they sum to 1:
0.301 + 0.176 + 0.125 + 0.097 + 0.079 + 0.067 + 0.058 + 0.051 + 0.046 = 1
(b) Make a histogram of the probability distribution. Describe what you see.
See histogram above. The distribution is NOT symmetric. It is skewed
to the right. The data should be analyzed using the 5 number summary.
(c) Describe the event X ≥ 6 in words. What is P(X ≥ 6)?
The first digit of a randomly chosen legitimate record is 6 or greater.
P(X ≥ 6) = 0.067 + 0.058 + 0.051 + 0.046 = 0.222
(d) Express the event “first digit is at most 5” in terms of X. What is the probability
of this event?
The event is X ≤ 5. P(X ≤ 5) = 1 - P(X ≥ 6) = 1 - 0.222 = 0.778
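The legitimacy check and the complement rule used in this exercise can be sketched in a few lines. The probabilities are the Benford values listed in part (a); the variable names are illustrative.

```python
# Benford's law probabilities for first digits 1 through 9, as given in part (a).
benford = {1: 0.301, 2: 0.176, 3: 0.125, 4: 0.097, 5: 0.079,
           6: 0.067, 7: 0.058, 8: 0.051, 9: 0.046}

# A legitimate distribution: every probability in [0, 1] and a total of 1.
all_between_0_and_1 = all(0 <= p <= 1 for p in benford.values())
total = sum(benford.values())

# P(X >= 6) by direct addition, then P(X <= 5) by the complement rule.
p_at_least_6 = sum(p for d, p in benford.items() if d >= 6)
p_at_most_5 = 1 - p_at_least_6

print(all_between_0_and_1, round(total, 3))
print(round(p_at_least_6, 3), round(p_at_most_5, 3))
```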
Benford’s law Refer to Exercise 5. The first digit of a randomly chosen expense
account claim follows Benford’s law. Consider the events A = first digit is 7 or
greater and B = first digit is odd.
(a) What outcomes make up the event A? What is P(A)?
A = {7, 8, 9}. P(A) = P(X ≥ 7) = 0.058 + 0.051 + 0.046 = 0.155
(b) What outcomes make up the event B? What is P(B)?
B = {1, 3, 5, 7, 9}. P(B) = P(X is odd) = 0.301 + 0.125 + 0.079 + 0.058 + 0.046 = 0.609
(c) What outcomes make up the event “A or B”? What is P(A or B)? Why is this
probability not equal to P(A) + P(B)?
A or B = {1, 3, 5, 7, 8, 9}. P(A or B) = P(A) + P(B) - P(A and B) = 0.155 + 0.609 - (0.058 + 0.046) = 0.66
The outcomes 7 and 9 belong to both events, so their probabilities are counted twice in
P(A) + P(B) and must be subtracted once (the general addition rule).
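The general addition rule can be checked with Python sets, where `|` is "or" and `&` is "and". A brief sketch, assuming the Benford probabilities from Exercise 5:

```python
# Benford's law probabilities for first digits 1 through 9.
benford = {1: 0.301, 2: 0.176, 3: 0.125, 4: 0.097, 5: 0.079,
           6: 0.067, 7: 0.058, 8: 0.051, 9: 0.046}

A = {d for d in benford if d >= 7}       # first digit is 7 or greater
B = {d for d in benford if d % 2 == 1}   # first digit is odd

def p(event):
    """Probability of an event, given as a set of first digits."""
    return sum(benford[d] for d in event)

# Direct calculation over the outcomes in "A or B".
p_union_direct = p(A | B)

# General addition rule: subtract the overlap that was counted twice.
p_union_rule = p(A) + p(B) - p(A & B)

print(sorted(A & B), round(p_union_direct, 3), round(p_union_rule, 3))
```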
Benford’s law and fraud A not-so-clever employee decided to fake his monthly
expense report. He believed that the first digits of his expense amounts should be
equally likely to be any of the numbers from 1 to 9. In that case, the first digit Y of
a randomly selected expense amount would have the probability distribution
shown in the histogram.
> (a) Explain why the mean of the random variable Y is located at the solid red line
in the figure.
The mean is the balance point of the distribution, so for a symmetric (here, uniform)
histogram it is located at the center, in this case at 5.
> (b) The first digits of randomly selected expense amounts actually follow
Benford’s law (Exercise 5). According to Benford’s law, what’s the expected value
of the first digit? Explain how this information could be used to detect a fake
expense report.
µx = 1(0.301) + 2(0.176) + 3(0.125) + 4(0.097) + 5(0.079)
+ 6(0.067) + 7(0.058) + 8(0.051) + 9(0.046) = 3.441
To detect a fake expense report, compute the sample mean of the first
digits of the numbers on the report. A value closer to 3.441 suggests a
truthful report, but a value closer to 5 (the uniform distribution)
suggests a false report.
> (c) What’s P(Y > 6) in the above distribution? According to Benford’s law, what
proportion of first digits in the employee’s expense amounts should be greater
than 6? How could this information be used to detect a fake expense report?
According to Benford’s law, P(X > 6) = 0.058 + 0.051 + 0.046 = 0.155
For the uniform distribution, P(Y > 6) = 3/9 ≈ 0.333
To detect a fake expense report, compute the proportion of first digits
that are greater than 6 on the report. A value closer to 0.155 suggests a
truthful report, but a value closer to 0.333 (the uniform distribution)
suggests a false report.
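The two comparisons in this exercise (the mean, and the chance of a first digit above 6) can be put side by side. This sketch assumes the employee's digits are equally likely, so each of the nine digits has probability 1/9; the helper names are hypothetical.

```python
# Employee's assumed model: digits 1 to 9 equally likely (probability 1/9 each).
uniform = {d: 1 / 9 for d in range(1, 10)}

# Benford's law probabilities for first digits 1 through 9.
benford = {1: 0.301, 2: 0.176, 3: 0.125, 4: 0.097, 5: 0.079,
           6: 0.067, 7: 0.058, 8: 0.051, 9: 0.046}

def mean(dist):
    """Expected value: sum of value * probability."""
    return sum(d * p for d, p in dist.items())

def p_greater_than_6(dist):
    """Probability that the first digit is 7, 8, or 9."""
    return sum(p for d, p in dist.items() if d > 6)

for name, dist in [("uniform", uniform), ("Benford", benford)]:
    print(name, round(mean(dist), 3), round(p_greater_than_6(dist), 3))
```

A report whose first digits look more like the uniform row than the Benford row would be a candidate for a closer look.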
Benford’s law and fraud Refer to Exercise 13. It might also be possible to detect an
employee’s fake expense records by looking at the variability in the first digits of
those expense amounts.
> (a) Calculate the standard deviation σY. This gives us an idea of how much
variation we’d expect in the employee’s expense records if he assumed that
first digits from 1 to 9 were equally likely.
σY² = (1 - 5)²(1/9) + (2 - 5)²(1/9) + (3 - 5)²(1/9)
+ (4 - 5)²(1/9) + (5 - 5)²(1/9) + (6 - 5)²(1/9)
+ (7 - 5)²(1/9) + (8 - 5)²(1/9) + (9 - 5)²(1/9) = 60/9 ≈ 6.667
σY = √6.667 ≈ 2.58
> (b) Now calculate the standard deviation of first digits that follow Benford’s
law (Exercise 5). Would using standard deviations be a good way to detect
fraud? Explain.
σx² = (1 - 3.44)²(0.301) + (2 - 3.44)²(0.176) + (3 - 3.44)²(0.125)
+ (4 - 3.44)²(0.097) + (5 - 3.44)²(0.079) + (6 - 3.44)²(0.067)
+ (7 - 3.44)²(0.058) + (8 - 3.44)²(0.051) + (9 - 3.44)²(0.046)
= 6.06052
σx = √6.06052 ≈ 2.46
Because the standard deviations are so close (2.58 and
2.46), it would be difficult to distinguish fake reports from
legitimate reports using the standard deviation.
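Both standard deviations can be recomputed with one helper. A minimal sketch, assuming probability 1/9 for each digit in the equally likely model and the Benford table from Exercise 5:

```python
from math import sqrt

# Equally likely first digits 1 to 9 (probability 1/9 each).
uniform = {d: 1 / 9 for d in range(1, 10)}

# Benford's law probabilities for first digits 1 through 9.
benford = {1: 0.301, 2: 0.176, 3: 0.125, 4: 0.097, 5: 0.079,
           6: 0.067, 7: 0.058, 8: 0.051, 9: 0.046}

def std_dev(dist):
    """Standard deviation of a discrete random variable."""
    mu = sum(d * p for d, p in dist.items())
    variance = sum((d - mu) ** 2 * p for d, p in dist.items())
    return sqrt(variance)

sd_uniform = std_dev(uniform)
sd_benford = std_dev(benford)

print(round(sd_uniform, 2), round(sd_benford, 2))
```

The two values differ only slightly, which is why the standard deviation is a weak tool for separating fake reports from legitimate ones.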
Finish Stuff from Yesterday
1. Write in words what the meaning of P(X ≥ 3) is. What is this probability?
P(X ≥ 3) is the probability that a randomly selected student in online Statistics 101 earned
a grade of B or higher.
P(X ≥ 3) = 0.42 + 0.26 = 0.68
2. Write the event “the student got a grade worse than C” in terms of values of
the random variable X. What is the probability of this event?
P(X < 2) = 0.02 + 0.10 = 0.12
3. Sketch a graph of the probability distribution. Describe what you see.
[Histogram: frequency of letter grade earned (vertical axis, 0.1 to 0.5) versus value of letter grade (horizontal axis)]
This distribution of letter grade probabilities is not symmetric; it is skewed left.
The center is at the median, which appears to be about 3. The data spread from
0 to 4, giving a range of 4. The data should be analyzed using the 5-number
summary, and there are no mentionable deviations.
4. Find the expected value of the distribution. Interpret this value in context.
µx = 0(0.02) + 1(0.10) + 2(0.20) + 3(0.42) + 4(0.26) = 2.8
If many, many STAT 101 students were randomly
selected, their grades would average about 2.8 points.
5. Find the standard deviation of the distribution. Interpret this value in context.
σx² = (0 - 2.8)²(0.02) + (1 - 2.8)²(0.10)
+ (2 - 2.8)²(0.20) + (3 - 2.8)²(0.42)
+ (4 - 2.8)²(0.26) = 1
σx = √1 = 1
The standard deviation of X is σx = 1.
The grade of a randomly selected Stats 101 student
will typically differ from the mean (2.8) by
about 1 point.
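The grade exercise can be checked end to end in one short script. This sketch takes the probabilities from the expected-value calculation above (grades coded 0 through 4); the names are illustrative.

```python
from math import sqrt

# X = grade on a 0 to 4 scale, with the probabilities used in the calculations above.
grades = {0: 0.02, 1: 0.10, 2: 0.20, 3: 0.42, 4: 0.26}

p_b_or_higher = sum(p for x, p in grades.items() if x >= 3)   # P(X >= 3)
p_worse_than_c = sum(p for x, p in grades.items() if x < 2)   # P(X < 2)

mu = sum(x * p for x, p in grades.items())                    # expected value
variance = sum((x - mu) ** 2 * p for x, p in grades.items())
sd = sqrt(variance)                                           # standard deviation

print(round(p_b_or_higher, 2), round(p_worse_than_c, 2))
print(round(mu, 2), round(variance, 2), round(sd, 2))
```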
Area above the x axis
CONTINUE ...
CONTINUOUS RANDOM VARIABLES AND PROBABILITY
Why does P(X = #) = 0 in the continuous "world"?
[Figure: density curve for Y, a rectangle of width 1 and height 1]
P(0 < Y < 1) = (1)(1) = 1.0
The probability distribution for a continuous random variable assigns probabilities to intervals of
outcomes rather than to individual outcomes. In fact, all continuous probability models assign
probability 0 to every individual outcome. Only intervals of values have positive probability. To see that
this is true, consider a specific outcome from the random number generator of the previous example,
such as P(Y = 0.7). The probability of this event is the area under the density curve that’s above the
point 0.70000… on the horizontal axis. But this vertical line segment has no width, so the area is 0. For
that reason,
P(0.3 ≤ Y ≤ 0.7) = P(0.3 ≤ Y < 0.7) = P(0.3 < Y < 0.7) = 0.4
ALL continuous probability models assign
probability 0 to every individual outcome.
P(X = 3) = 0
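The "area under the density curve" idea can be sketched for the random number generator example. This assumes Y is uniform on the interval from 0 to 1, so the density curve has height 1; the function name is hypothetical.

```python
def uniform_interval_prob(a, b):
    """P(a <= Y <= b) for Y uniform on [0, 1]: area = width * height, with height 1."""
    low, high = max(a, 0.0), min(b, 1.0)   # clip the interval to [0, 1]
    width = max(high - low, 0.0)           # an empty interval has width 0
    return width * 1.0

# An interval has positive probability: width 0.4 times height 1.
p_interval = uniform_interval_prob(0.3, 0.7)

# A single point is an "interval" of width 0, so its probability is 0.
p_point = uniform_interval_prob(0.7, 0.7)

print(round(p_interval, 4), p_point)
```

Because a single point contributes no area, including or excluding an endpoint does not change the probability of an interval.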
In many cases, discrete random variables arise from counting something
—for instance, the number of siblings that a randomly selected student has.
Continuous random variables often arise from measuring something
—for instance, the height or time to run a mile for a randomly selected student.
DISCRETE
#22 Random numbers
a) P(Y≤ 0.4) =
b) P(Y < 0.4) =
c) P(0.1 < Y ≤ 0.15 or 0.77 ≤ Y < 0.88) =
d) What important fact about continuous random variables does comparing
your answers to (a) and (b) illustrate?
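One way to check work on this exercise is sketched below. It assumes Y is a random number that is uniform on the interval from 0 to 1 (the notes do not restate the distribution here), so each probability is the length of the interval. The helper `length` is hypothetical.

```python
def length(a, b):
    """Area under a height-1 density curve between a and b, for 0 <= a <= b <= 1."""
    return (b - a) * 1.0

# (a) and (b): the endpoint 0.4 has probability 0, so both use the same area.
p_a = length(0.0, 0.4)   # P(Y <= 0.4)
p_b = length(0.0, 0.4)   # P(Y < 0.4)

# (c): the two intervals do not overlap, so their areas add.
p_c = length(0.1, 0.15) + length(0.77, 0.88)

print(round(p_a, 2), round(p_b, 2), round(p_c, 2))
```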
Oh normdist program
Oh normdist program I will be true
I'm blue
Oh normdist program