CS1512
Foundations of
Computing Science 2
Lecture 24
Probability and statistics (5)
Random number generators
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
© J R W Hunter, 2006
After Easter
Lectures
• Dr Kees van Deemter will take over
• Logic and HCI
• Same times and places
Tutorials
• Logic and HCI
• Same times and places
Practicals
• Java programming
  - simulation – Robocode
  - ‘take-home assessment’
• Same times and places
Continuous Assessment
Week 9 test
• week 9 = week after Easter vacation;
• worth 10% of the marks of the course;
• as for week 5 – test under practical exam conditions;
• will test your knowledge of inheritance.
‘Practical exam’
• completed in your own time;
• worth 30% of the marks of the course;
• handed out in week 10; hand in by the end of week 12.
Both
• conditional on AUT dispute being resolved;
• safest course is to assume that they will go ahead;
• if you are worried about the possible effects of the AUT
action, write to the Principal and express those
concerns.
Remember: Continuous data
• Divide range of observations into non-overlapping intervals (bins)
• Count number of observations in each bin
• Enzyme concentration data:
  121   95   85  119   62   25   81  145   57  104
   83  123  100   64  139  110   67   70  151  201
   60  113   93   48   68  101   78  118   92   95
• Range: 25 to 201
• 10 bins of width 20
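The binning step can be sketched in Java as follows. This is only an illustrative sketch: the class and method names are made up for this example, and the lower boundary of 19.5 is taken from the frequency table on the next slide.

```java
import java.util.Arrays;

class EnzymeBins {
    // The 30 enzyme concentrations listed on this slide.
    static final int[] DATA = {
        121, 95, 85, 119, 62, 25, 81, 145, 57, 104,
        83, 123, 100, 64, 139, 110, 67, 70, 151, 201,
        60, 113, 93, 48, 68, 101, 78, 118, 92, 95
    };

    // Count the observations in each bin [lower + k*width, lower + (k+1)*width).
    // Assumes every value lies inside the range covered by the bins.
    static int[] countBins(int[] data, double lower, double width, int numberOfBins) {
        int[] counts = new int[numberOfBins];
        for (int value : data) {
            int bin = (int) ((value - lower) / width);
            counts[bin]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 10 bins of width 20, starting at 19.5
        int[] counts = countBins(DATA, 19.5, 20.0, 10);
        System.out.println(Arrays.toString(counts));
    }
}
```

Starting the bins at 19.5 rather than 20 means that no whole-number observation can fall exactly on a bin boundary.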
Remember: Enzyme concentrations
Concentration            Freq.   Rel.Freq.
 19.5 ≤ c <  39.5          1      0.033
 39.5 ≤ c <  59.5          2      0.067
 59.5 ≤ c <  79.5          7      0.233
 79.5 ≤ c <  99.5          7      0.233
 99.5 ≤ c < 119.5          7      0.233
119.5 ≤ c < 139.5          3      0.100
139.5 ≤ c < 159.5          2      0.067
159.5 ≤ c < 179.5          0      0.000
179.5 ≤ c < 199.5          0      0.000
199.5 ≤ c < 219.5          1      0.033
Totals                    30      1.000
Relative Frequency Histogram
[Figure: relative frequency histogram; x-axis: concentration, 19.5 to 199.5; y-axis: relative frequency, 0 to 0.25]
The height of the bar gives the relative frequency
Density Histograms
Plot relative frequency / width of the column (bin width) so that
the area of the bar now gives the relative frequency
[Figure: density histogram; x-axis: concentration, 19.5 to 199.5; y-axis: relative frequency density, 0 to 0.014]
relative frequency = relative frequency density × bin width
= 0.01165 × 20
= 0.233
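As a rough check of the density idea, the sketch below divides each relative frequency by the bin width and then adds up the bar areas, which should come to 1.0. The class and method names are invented for this example; the counts are those of the enzyme frequency table.

```java
class DensityHistogram {
    // Relative frequency density = relative frequency / bin width.
    static double[] densities(int[] counts, double binWidth) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        double[] result = new double[counts.length];
        for (int k = 0; k < counts.length; k++) {
            double relativeFrequency = (double) counts[k] / total;
            result[k] = relativeFrequency / binWidth;
        }
        return result;
    }

    // Total area of all bars: the sum of (density * bin width).
    static double totalArea(double[] densities, double binWidth) {
        double area = 0.0;
        for (double d : densities) {
            area += d * binWidth;
        }
        return area;
    }

    public static void main(String[] args) {
        int[] counts = {1, 2, 7, 7, 7, 3, 2, 0, 0, 1}; // enzyme frequency table
        double[] d = densities(counts, 20.0);
        System.out.println("density of the third bar = " + d[2]);
        System.out.println("total area = " + totalArea(d, 20.0));
    }
}
```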
Addition of areas
[Figure: density histogram with a run of adjacent bars shaded]
Relative frequency of values between two bin boundaries = the area of the bars between them
Increase the number of samples
... and decrease the width of the bin ...
Relative frequency as area under
the curve
relative frequency of values between a and b = area
Continuous random variable
Consider a large population of individuals
e.g. all males in the UK over 16
Consider a continuous attribute
e.g. Height: X
Select an individual at random so that any individual
is as likely to be selected as any other
X is said to be a continuous random variable
Probability density function
The probability distribution of X is described by its probability density function,
defined such that:
P(a ≤ X < b) = area under the curve between a and b
NB total area under curve must be 1.0
The ‘normal’ distribution
Very common distribution:
• often called a Gaussian distribution
• variable measured for large number of nominally identical objects;
• variation assumed to be caused by a large number of factors;
• each factor exerts a small random positive or negative influence;
• e.g. height: age, diet, bone structure, genetic influences, etc.
[Figure: bell-shaped normal density curve; x-axis 0 to 18, y-axis 0 to 0.25]
Symmetric about mean
Unimodal
Mean
Mean determines the centre of the curve:
[Figure: two normal density curves of the same shape, one with Mean = 10 and one with Mean = 30; y-axis 0 to 0.25]
Remember: Variance
Measure of spread: variance
[Figure: two bar charts; x-axis 0 to 18, y-axis 0 to 45]
Remember: Variance
sample variance = s²
sample standard deviation = s = √variance
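A minimal Java sketch of these two quantities is given below. The formula itself did not survive on this slide, so the sketch assumes the usual sample variance with an n − 1 divisor; the class name, method names and the small data set are made up for illustration.

```java
class SampleStats {
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    // Assumption: sample variance s^2 = sum of squared deviations / (n - 1).
    static double sampleVariance(double[] x) {
        double m = mean(x);
        double sumOfSquares = 0.0;
        for (double v : x) {
            sumOfSquares += (v - m) * (v - m);
        }
        return sumOfSquares / (x.length - 1);
    }

    // Sample standard deviation s = square root of the variance.
    static double sampleStdDev(double[] x) {
        return Math.sqrt(sampleVariance(x));
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9}; // hypothetical observations
        System.out.println("variance = " + sampleVariance(data));
        System.out.println("std devn = " + sampleStdDev(data));
    }
}
```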
Standard deviation
Standard deviation determines the ‘width’ of the curve:
[Figure: two normal density curves, one with Std. Devn. = 2 (y-axis 0 to 0.25) and one with Std. Devn. = 1 (y-axis 0 to 0.45)]
Remember: Cumulative frequencies
Number of piglets in a litter (discrete data):
Litter size   Frequency   Cum. Freq
     5            1           1
     6            0           1
     7            2           3
     8            3           6
     9            3           9
    10            9          18
    11            8          26
    12            5          31
    13            3          34
    14            2          36
Total            36
c_K = n (the last cumulative frequency equals the total number of observations)
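The cumulative frequencies are just a running total of the frequencies. A small sketch (class and method names are invented for this example), using the piglet frequencies:

```java
import java.util.Arrays;

class CumulativeFrequency {
    // cumulative[k] = frequency[0] + frequency[1] + ... + frequency[k]
    static int[] cumulative(int[] frequency) {
        int[] result = new int[frequency.length];
        int runningTotal = 0;
        for (int k = 0; k < frequency.length; k++) {
            runningTotal += frequency[k];
            result[k] = runningTotal;
        }
        return result;
    }

    public static void main(String[] args) {
        // frequencies for litter sizes 5 to 14
        int[] frequency = {1, 0, 2, 3, 3, 9, 8, 5, 3, 2};
        System.out.println(Arrays.toString(cumulative(frequency)));
    }
}
```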
Remember: Plotting
[Figures: plot of frequency and plot of cumulative frequency]
Cumulative normal distribution
[Figure: cumulative normal distribution curve; x-axis 0 to 18, y-axis 0 to 1.2]
For a good demo, go to:
http://www.vertex42.com/ExcelArticles/mc/NormalDistribution-Excel.html
and download the Excel file
Relationship between the two
distributions
[Figure: normal density curve (y-axis 0 to 0.25) alongside the cumulative normal curve (y-axis 0 to 1.2), both over x from 0 to 18]
area under curve = 0.84
Probability of sample lying within
mean ± 2 standard deviations
Mean (μ)= 10.0
Std. devn (σ)= 2.0
P(X < μ – σ) = 15.866%
P(X < μ – 2σ) = 2.275%
P(μ – 2σ < X < μ + 2σ) =
= (100 – 2 * 2.275)%
= 95.45%
   x     Prob Dist      Cum Prob Dist
  2.0    0.00013383        0.003%
  2.4    0.000291947       0.007%
  2.8    0.000611902       0.016%
  3.2    0.001232219       0.034%
  3.6    0.002384088       0.069%
  4.0    0.004431848       0.135%
  4.4    0.007915452       0.256%
  4.8    0.013582969       0.466%
  5.2    0.02239453        0.820%
  5.6    0.035474593       1.390%
  6.0    0.053990967       2.275%
  6.4    0.078950158       3.593%
  6.8    0.110920835       5.480%
  7.2    0.149727466       8.076%
  7.6    0.194186055      11.507%
  8.0    0.241970725      15.866%
  8.4    0.289691553      21.186%
  8.8    0.333224603      27.425%
  9.2    0.36827014       34.458%
  9.6    0.391042694      42.074%
 10.0    0.39894228       50.000%
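The probabilities above are areas under the normal curve, so they can be checked approximately by adding up thin slices under the density. The sketch below uses the trapezium rule on the standard normal density formula with mean 10 and standard deviation 2; the class name, method names and the choice of starting the lower tail at μ – 10σ are assumptions made for this illustration.

```java
class NormalArea {
    // Normal probability density with the given mean and standard deviation.
    static double pdf(double x, double mean, double sd) {
        double z = (x - mean) / sd;
        return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2.0 * Math.PI));
    }

    // Approximate area under the density between a and b (trapezium rule).
    static double area(double a, double b, double mean, double sd, int steps) {
        double h = (b - a) / steps;
        double sum = 0.5 * (pdf(a, mean, sd) + pdf(b, mean, sd));
        for (int i = 1; i < steps; i++) {
            sum += pdf(a + i * h, mean, sd);
        }
        return sum * h;
    }

    public static void main(String[] args) {
        double mean = 10.0, sd = 2.0;
        // P(mean - 2 sd < X < mean + 2 sd)
        System.out.println(area(6.0, 14.0, mean, sd, 10000));
        // P(X < mean - sd), with the tail started at mean - 10 sd
        System.out.println(area(-10.0, 8.0, mean, sd, 20000));
    }
}
```

The two areas should come out close to 0.9545 and 0.1587 respectively.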
Probability of sample lying within
mean ± 2 standard deviations
[Figure: normal density curve with 2.275% of the area below μ – 2σ, 95.45% between μ – 2σ and μ + 2σ, and 2.275% above μ + 2σ]
Uniform probability distribution
Also called rectangular distribution
[Figure: uniform density P(X) = 1.0 for x between 0.0 and 1.0, with the region to the left of y marked]
P(X < y) = 1.0 × y
= y
Uniform probability distribution
[Figure: uniform density P(X) = 1.0 for x between 0.0 and 1.0, with points a and b marked]
P(X < a) = a
P(X < b) = b
P(a ≤ X < b) = b - a
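The result P(a ≤ X < b) = b - a can be checked by experiment: draw many uniform samples and count the fraction that lands between a and b. This is only a sketch; the class name, method name, seed and the values a = 0.2 and b = 0.5 are arbitrary choices for illustration.

```java
import java.util.Random;

class UniformInterval {
    // Fraction of n uniform samples from [0.0, 1.0) that satisfy a <= x < b.
    static double fractionBetween(double a, double b, int n, long seed) {
        Random random = new Random(seed);
        int hits = 0;
        for (int i = 0; i < n; i++) {
            double x = random.nextDouble();
            if (a <= x && x < b) {
                hits++;
            }
        }
        return (double) hits / n;
    }

    public static void main(String[] args) {
        // should print a value close to b - a = 0.3
        System.out.println(fractionBetween(0.2, 0.5, 100000, 42L));
    }
}
```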
Sampling from a distribution
Suppose we want a stream of numbers sampled from a given distribution.
Previously we had the sample data and wanted the distribution.
Now we have the distribution and want sample data.
Simplest to sample from the uniform distribution between 0.0 and 1.0:
0.6282666143546787
0.0720166704579284
0.6377396244822982
0.3689553414430091
0.0867381632942044
0.7706191731891433
0.4400410508617185
0.1874836450093842
0.5892161544310359
0.6832029056956863
0.6597233555218959
0.4262198006313059
0.7327364126731544
0.7270704022602184
0.1345094277951323
0.9375335692470778
0.8196076287840244
0.9969146442986877
0.3064954363270612
0.6184114671445469
...
Use a random number generator
Random Number Generators
From a given starting number (the seed) there are algorithms
which will generate a series of pseudo-random numbers
which are uniformly distributed:
• linear congruential pseudorandom number generator
• but you didn’t want to know this!
Computers are deterministic:
• from a given starting point they always do the same thing;
• how do we get different series?
• start from different seeds:
  - choose the seed yourself
  - derive it from the computer clock (date and time of day)
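For the curious, a linear congruential generator can be sketched in a few lines: each new state is (A × state + C) mod M, and dividing by M gives a number in the range 0.0 ≤ x < 1.0. The class below is a toy for illustration only. Its name is invented, and the constants are a commonly quoted textbook choice, not the ones used inside Java's own generators.

```java
class SimpleLcg {
    // Illustrative constants (an assumption for this sketch, not Java's own).
    private static final long A = 1664525L;
    private static final long C = 1013904223L;
    private static final long M = 1L << 32; // modulus 2^32

    private long state;

    SimpleLcg(long seed) {
        state = Math.floorMod(seed, M); // keep the state in 0 .. M - 1
    }

    // Advance the state and scale it to 0.0 <= x < 1.0.
    double nextDouble() {
        state = (A * state + C) % M;
        return (double) state / M;
    }

    public static void main(String[] args) {
        SimpleLcg first = new SimpleLcg(2006L);
        SimpleLcg second = new SimpleLcg(2006L);
        // Same seed, same series: the two columns are identical.
        for (int i = 0; i < 5; i++) {
            System.out.println(first.nextDouble() + "  " + second.nextDouble());
        }
    }
}
```

Because the computer is deterministic, the same seed always reproduces the same series; a different seed starts the series somewhere else.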
Java support – class Math
static double random()
• returns a double value with a positive sign, greater than or equal to 0.0
and less than 1.0 (0.0 ≤ x < 1.0);
• when this method is first called, it creates a single new pseudo-random-number
generator (seed derived automatically) which is used thereafter
for all calls to this method and is used nowhere else.
public void randGen() // demo of Math.random()
{
    // print 20 pseudo-random doubles, each with 0.0 <= x < 1.0
    for (int i = 0; i < 20; i++)
        System.out.println(Math.random());
}
Java support – class java.util.Random
Construct a random number generator:
• public Random()
  Creates a new random number generator; this constructor sets the
  seed of the random number generator to a value very likely to be
  distinct from any other invocation of this constructor.
• public Random(long seed)
  Creates a new random number generator using a single long seed
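The effect of the seed can be seen with a short experiment: two generators built with the same seed produce the same series, while different seeds give different series. The class name, method name and seed values below are arbitrary choices for this sketch.

```java
import java.util.Arrays;
import java.util.Random;

class SeedDemo {
    // The first n doubles produced by a generator started from the given seed.
    static double[] firstDoubles(long seed, int n) {
        Random random = new Random(seed);
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            values[i] = random.nextDouble();
        }
        return values;
    }

    public static void main(String[] args) {
        // Same seed twice: identical series.
        System.out.println(Arrays.toString(firstDoubles(1512L, 3)));
        System.out.println(Arrays.toString(firstDoubles(1512L, 3)));
        // Different seed: a different series.
        System.out.println(Arrays.toString(firstDoubles(1513L, 3)));
    }
}
```

Choosing the seed yourself is useful when testing a simulation, because a run can then be repeated exactly.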
Java support – class java.util.Random
Get the (next) random number:
• public double nextDouble()
  - just like Math.random()
• public int nextInt(int n)
  - returns a pseudo-random, uniformly distributed int value between 0
    (inclusive) and the specified value (exclusive)
• public double nextGaussian()
  - returns the next pseudo-random, Gaussian ("normally") distributed
    double value with mean 0.0 and standard deviation 1.0 from this
    random number generator's sequence.
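Since nextGaussian() has mean 0.0 and standard deviation 1.0, a sample from a normal distribution with mean μ and standard deviation σ can be obtained as μ + σ × nextGaussian(). The sketch below uses this to estimate the proportion of samples within two standard deviations of the mean, which should be roughly 95%. The class name, method names and seed are invented for this example.

```java
import java.util.Random;

class GaussianSampling {
    // Scale and shift a standard normal sample to the given mean and std. devn.
    static double sample(Random random, double mean, double sd) {
        return mean + sd * random.nextGaussian();
    }

    // Fraction of n samples that lie strictly within k standard deviations of the mean.
    static double fractionWithin(double mean, double sd, double k, int n, long seed) {
        Random random = new Random(seed);
        int hits = 0;
        for (int i = 0; i < n; i++) {
            double x = sample(random, mean, sd);
            if (Math.abs(x - mean) < k * sd) {
                hits++;
            }
        }
        return (double) hits / n;
    }

    public static void main(String[] args) {
        // mean 10.0, std. devn. 2.0, within 2 standard deviations
        System.out.println(fractionWithin(10.0, 2.0, 2.0, 100000, 7L));
    }
}
```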
Testing the uniformity
public void testUniform(int numberOfSamples, int numberOfBins) {
    // hist[k] counts the samples that fall in bin k
    double[] hist = new double[numberOfBins];
    double sample;
    int binNumber;
    for (int i = 0; i < numberOfSamples; i++) {
        sample = Math.random();                      // 0.0 <= sample < 1.0
        binNumber = (int) (sample * numberOfBins);   // 0 .. numberOfBins - 1
        hist[binNumber]++;
    }
    // for a uniform generator each value should be close to 1.0 / numberOfBins
    double relativeFrequency;
    for (int k = 0; k < numberOfBins; k++) {
        relativeFrequency = hist[k] / numberOfSamples;
        System.out.println(relativeFrequency);
    }
    System.out.println();
}
Simulating coin toss
// needs: import java.util.Random;
public String coinToss(int n) {
    String s = "";
    Random random = new Random();
    for (int i = 0; i < n; i++) {
        int t = random.nextInt(2);   // i.e. t = 0 or 1
        if (t == 0)                  // we want this to happen with probability 0.5
            s = s + "T ";
        else
            s = s + "H ";
    }
    return s;
}
Simulating dice throw
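// needs: import java.util.Random;
public String diceThrow(int n) {
    String s = "";
    Random random = new Random();
    for (int i = 0; i < n; i++)
        // nextInt(6) gives 0..5, so add 1 to get a die face 1..6
        s = s + (random.nextInt(6) + 1) + " ";
    return s;
}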
Simulating picking balls
// needs: import java.util.Random;
public String pickABall(int n) {
    String s = "";
    Random random = new Random();
    for (int i = 0; i < n; i++) {
        double b = random.nextDouble();
        if (b < 0.3)   // we want this to happen with probability 0.3
            s = s + "R ";
        else
            s = s + "W ";
    }
    return s;
}
‘Foxes and rabbits’ simulation
Rabbits and foxes in an enclosed field;
• example of a “predator-prey” simulation;
• see Barnes and Kölling, Objects first with Java, Chapter 10.
The field:
• has a fixed number of square cells arranged in a square grid;
• each cell can be occupied by only one animal;
• animals can’t leave the field.
Animals
All animals have:
• a state (alive or dead!)
• an age
• a location in the field
All animals do:
• get older
• breed
• try to move to a new location
• die of old age
• die of overcrowding
Rabbits:
• die from being eaten by a fox
Foxes:
• die of hunger
Breeding
private int breed() // returns size of litter (if any)
{
    int births = 0;
    if (rand.nextDouble() <= BREEDING_PROBABILITY)
    {
        births = rand.nextInt(MAX_LITTER_SIZE) + 1;
    }
    return births;
}

Rabbits:
    BREEDING_PROBABILITY = 0.15;
    MAX_LITTER_SIZE = 5;
Foxes:
    BREEDING_PROBABILITY = 0.09;
    MAX_LITTER_SIZE = 3;
Breeding probability
[Figure: uniform density P(X) = 1.0 for x between 0.0 and 1.0]
if (rand.nextDouble() <= BREEDING_PROBABILITY) ...
Number of births with
each litter size equally likely
[Figure: each litter size 1, 2, 3, ..., MAX_LITTER_SIZE has the same probability, 1 / MAX_LITTER_SIZE]
births = rand.nextInt(MAX_LITTER_SIZE) + 1
Number of births with different
probabilities of litter sizes
[Figure: litter sizes 1, 2, 3, ..., MAX_LITTER_SIZE with different probabilities p1, p2, p3, ..., pMAX]
Given p1, p2, ... pMAX, how do you use a random
number generator to generate a litter size?
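One common way to approach this question (offered here as a possible answer, not necessarily the one intended in the lecture) is to use cumulative probabilities: draw a uniform number u between 0.0 and 1.0, then add up p1, p2, ... until the running total exceeds u. The class name, method names and the example probabilities 0.5, 0.3 and 0.2 are made up for this sketch.

```java
import java.util.Random;

class LitterSize {
    // Map a uniform value u (0.0 <= u < 1.0) to a litter size 1 .. p.length,
    // where p[k] is the probability of litter size k + 1 (the p[k] sum to 1.0).
    static int litterSizeFor(double u, double[] p) {
        double cumulative = 0.0;
        for (int k = 0; k < p.length; k++) {
            cumulative += p[k];
            if (u < cumulative) {
                return k + 1;
            }
        }
        return p.length; // guard against rounding error in the running total
    }

    static int randomLitterSize(Random rand, double[] p) {
        return litterSizeFor(rand.nextDouble(), p);
    }

    public static void main(String[] args) {
        double[] p = {0.5, 0.3, 0.2}; // hypothetical p1, p2, p3
        Random rand = new Random();
        for (int i = 0; i < 10; i++) {
            System.out.print(randomLitterSize(rand, p) + " ");
        }
        System.out.println();
    }
}
```

With these example values, u below 0.5 gives a litter of 1, u between 0.5 and 0.8 gives 2, and anything larger gives 3, so each size occurs with its own probability.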
Have a good Easter!