CS1512 Foundations of Computing Science 2
Lecture 24
Probability and statistics (5)
Random number generators
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
© J R W Hunter, 2006

After Easter

Lectures
• Dr Kees van Deemter will take over
• Logic and HCI
• Same times and places

Tutorials
• Logic and HCI
• Same times and places

Practicals
• Java programming simulation – Robocode ‘take-home assessment’
• Same times and places

Continuous Assessment

Week 9 test
• week 9 = week after Easter vacation;
• worth 10% of the marks of the course;
• as for week 5 – test under practical exam conditions;
• will test your knowledge of inheritance.

‘Practical exam’
• completed in your own time;
• worth 30% of the marks of the course;
• handed out in week 10; hand in by the end of week 12.

Both
• conditional on AUT dispute being resolved;
• safest course is to assume that they will go ahead;
• if you are worried about the possible effects of the AUT action, write to the Principal and express those concerns.

Remember: Continuous data

• Divide range of observations into non-overlapping intervals (bins)
• Count number of observations in each bin
• Enzyme concentration data:
  121  95  85 119  62  25  81 145  57 104
   83 123 100  64 139 110  67  70 151 201
   60 113  93  48  68 101  78 118  92  95
• Range: 25 to 201
• 10 bins of width 20

Remember: Enzyme concentrations

Concentration            Freq.   Rel. Freq.
 19.5 ≤ c <  39.5          1       0.033
 39.5 ≤ c <  59.5          2       0.067
 59.5 ≤ c <  79.5          7       0.233
 79.5 ≤ c <  99.5          7       0.233
 99.5 ≤ c < 119.5          7       0.233
119.5 ≤ c < 139.5          3       0.100
139.5 ≤ c < 159.5          2       0.067
159.5 ≤ c < 179.5          0       0.000
179.5 ≤ c < 199.5          0       0.000
199.5 ≤ c < 219.5          1       0.033
Totals                    30       1.000

Relative Frequency Histogram

The height of the bar gives the relative frequency.
[Figure: relative frequency histogram; x-axis: bin boundaries from 19.5 to 199.5; y-axis: relative frequency from 0 to 0.25]

Density Histograms

Plot relative frequency / width of the column (bin width), so that the area of the bar now gives the relative frequency:
relative frequency = relative frequency density × bin width = 0.01165 × 20 = 0.233
[Figure: density histogram; x-axis: bin boundaries from 19.5 to 199.5; y-axis: relative frequency density from 0 to 0.014]

Addition of areas

The relative frequency of values between two bin boundaries = the total area of the bars between them.
[Figure: density histogram with the bars between two boundaries shaded]

Increase the number of samples ... and decrease the width of the bin ...
[Figure: density histograms with more samples and progressively narrower bins]

Relative frequency as area under the curve

relative frequency of values between a and b = area under the curve between a and b
[Figure: smooth curve with the area between a and b shaded]

Continuous random variable

Consider a large population of individuals, e.g. all males in the UK over 16.
Consider a continuous attribute, e.g.
Height: X
Select an individual at random, so that any individual is as likely to be selected as any other.
X is said to be a continuous random variable.

Probability density function

The probability distribution of X is described by its probability density function, defined such that:
P(a ≤ X < b) = area under the curve between a and b
NB total area under the curve must be 1.0

The ‘normal’ distribution

Very common distribution:
• often called a Gaussian distribution
• variable measured for large number of nominally identical objects;
• variation assumed to be caused by a large number of factors;
• each factor exerts a small random positive or negative influence;
• e.g. height: age, diet, bone structure, genetic influences, etc.
Symmetric about mean
Unimodal
[Figure: normal density curve; x-axis from 2 to 18; y-axis from 0 to 0.25]

Mean

Mean determines the centre of the curve:
[Figure: two normal density curves of the same shape, one with Mean = 10 and one with Mean = 30]

Remember: Variance

Measure of spread: variance
[Figure: two bar charts of values 1 to 18 (vertical axis 0 to 45) with different spreads]

Remember: Variance

sample variance = s²
sample standard deviation = s = √variance

Standard deviation

Standard deviation determines the ‘width’ of the curve:
[Figure: two normal density curves; Std. Devn. = 2 (y-axis up to 0.25) and Std. Devn. = 1 (y-axis up to 0.45)]

Remember: Cumulative frequencies

Number of piglets in a litter (discrete data):

Litter size   Frequency   Cum. Freq
     5            1           1
     6            0           1
     7            2           3
     8            3           6
     9            3           9
    10            9          18
    11            8          26
    12            5          31
    13            3          34
    14            2          36
Total            36

The last cumulative frequency equals the total number of observations: cK = n (here 36).

Remember: Plotting

[Figure: frequency plot and cumulative frequency plot of the litter sizes]

Cumulative normal distribution

[Figure: cumulative normal distribution curve; x-axis from 2 to 18; y-axis from 0 to 1.2]
For a good demo, go to: http://www.vertex42.com/ExcelArticles/mc/NormalDistribution-Excel.html and download the Excel file.

Relationship between the two distributions

The area under the density curve to the left of a value equals the height of the cumulative curve at that value; in the example shown, area under curve = 0.84.
[Figure: normal density curve (y-axis 0 to 0.25) beside the cumulative curve (y-axis 0 to 1.2); x-axes from 2 to 18]

Probability of sample lying within mean ± 2 standard deviations

Mean (μ) = 10.0
Std. devn (σ) = 2.0
P(X < μ – σ) = 15.866%
P(X < μ – 2σ) = 2.275%
P(μ – 2σ < X < μ + 2σ) = (100 – 2 × 2.275)% = 95.45%

   x     Prob Dist      Cum Prob Dist
  2.0   0.00013383        0.003%
  2.4   0.000291947       0.007%
  2.8   0.000611902       0.016%
  3.2   0.001232219       0.034%
  3.6   0.002384088       0.069%
  4.0   0.004431848       0.135%
  4.4   0.007915452       0.256%
  4.8   0.013582969       0.466%
  5.2   0.02239453        0.820%
  5.6   0.035474593       1.390%
  6.0   0.053990967       2.275%
  6.4   0.078950158       3.593%
  6.8   0.110920835       5.480%
  7.2   0.149727466       8.076%
  7.6   0.194186055      11.507%
  8.0   0.241970725      15.866%
  8.4   0.289691553      21.186%
  8.8   0.333224603      27.425%
  9.2   0.36827014       34.458%
  9.6   0.391042694      42.074%
 10.0   0.39894228       50.000%

NB the Prob Dist column holds the standard normal density of z = (x – μ)/σ; dividing by σ = 2 gives the density of X itself (e.g. 0.39894 / 2 ≈ 0.199 at x = 10, the peak of the curve in the figures).

Probability of sample lying within mean ± 2 standard deviations

[Figure: normal density curve with μ – 2σ, μ and μ + 2σ marked; 2.275% of the area in each tail and 95.45% in between]

Uniform probability distribution

Also called rectangular distribution.
P(X < y) = 1.0 × y = y
[Figure: rectangle of height 1.0 over 0.0 ≤ x ≤ 1.0, with the area to the left of y shaded]

Uniform probability distribution

P(X < a) = a
P(X < b) = b
P(a ≤ X < b) = b – a
[Figure: rectangle of height 1.0 over 0.0 ≤ x ≤ 1.0, with the area between a and b shaded]

Sampling from a distribution

Suppose we want a stream of numbers sampled from a given distribution.
Previously we had the sample data and wanted the distribution.
Now we have the distribution and want sample data.
Simplest to sample from the uniform distribution between 0.0 and 1.0:

0.6282666143546787  0.0720166704579284  0.6377396244822982  0.3689553414430091
0.0867381632942044  0.7706191731891433  0.4400410508617185  0.1874836450093842
0.5892161544310359  0.6832029056956863  0.6597233555218959  0.4262198006313059
0.7327364126731544  0.7270704022602184  0.1345094277951323  0.9375335692470778
0.8196076287840244  0.9969146442986877  0.3064954363270612  0.6184114671445469
...
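The idea that a probability is an area under the density curve can be checked numerically. The sketch below is only an illustration: the class name AreaUnderCurve, the method names and the choice of the trapezium rule with 10000 steps are assumptions, not part of the lecture. It estimates P(μ – 2σ < X < μ + 2σ) for the example with μ = 10.0 and σ = 2.0.

```java
public class AreaUnderCurve {

    /** Density of a normal distribution with the given mean and standard deviation. */
    static double normalPdf(double x, double mean, double sd) {
        double z = (x - mean) / sd;
        return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2.0 * Math.PI));
    }

    /** Trapezium-rule estimate of the area under the normal density between a and b. */
    static double normalProbability(double a, double b, double mean, double sd, int steps) {
        double h = (b - a) / steps;   // width of each strip
        double sum = 0.5 * (normalPdf(a, mean, sd) + normalPdf(b, mean, sd));
        for (int i = 1; i < steps; i++) {
            sum += normalPdf(a + i * h, mean, sd);
        }
        return sum * h;
    }

    public static void main(String[] args) {
        // mean 10.0, standard deviation 2.0, so mean ± 2 sd is the interval 6.0 to 14.0
        double p = normalProbability(6.0, 14.0, 10.0, 2.0, 10000);
        System.out.println("P(6 < X < 14) is approximately " + p);
    }
}
```

The estimate should come out close to 0.9545, which agrees with 100% minus the two tails of 2.275% each.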
Use a random number generator.

Random Number Generators

From a given starting number (the seed) there are algorithms which will generate a series of pseudo-random numbers which are uniformly distributed:
• linear congruential pseudorandom number generator
• but you didn’t want to know this!

Computers are deterministic:
• from a given starting point they always do the same thing;
• how do we get different series?
• start from different seeds:
  – choose the seed yourself
  – derive it from the computer clock (date and time of day)

Java support – class Math

static double random()
• returns a double value with a positive sign, greater than or equal to 0.0 and less than 1.0 (0.0 ≤ x < 1.0);
• when this method is first called, it creates a single new pseudo-random-number generator (seed derived automatically) which is used thereafter for all calls to this method and is used nowhere else.

public void randGen()   // demo of Math.random()
{
    for (int i = 0; i < 20; i++)
        System.out.println(Math.random());
}

Java support – class java.util.Random

Construct a random number generator:
• public Random()
  Creates a new random number generator; this constructor sets the seed of the random number generator to a value very likely to be distinct from any other invocation of this constructor.
• public Random(long seed)
  Creates a new random number generator using a single long seed

Java support – class java.util.Random

Get the (next) random number:
• public double nextDouble()
  just like Math.random()
• public int nextInt(int n)
  returns a pseudo-random, uniformly distributed int value between 0 (inclusive) and the specified value (exclusive)
• public double nextGaussian()
  returns the next pseudo-random, Gaussian ("normally") distributed double value with mean 0.0 and standard deviation 1.0 from this random number generator's sequence.

Testing the uniformity

public void testUniform(int numberOfSamples, int numberOfBins) {
    double[] hist = new double[numberOfBins];
    double sample;
    int binNumber;
    for (int i = 0; i < numberOfSamples; i++) {
        sample = Math.random();
        binNumber = (int) (sample * numberOfBins);   // bin index 0 .. numberOfBins - 1
        hist[binNumber]++;
    }
    double relativeFrequency;
    for (int k = 0; k < numberOfBins; k++) {
        relativeFrequency = hist[k] / numberOfSamples;
        System.out.println(relativeFrequency);
    }
    System.out.println();
}

Simulating coin toss

// this and the following methods need: import java.util.Random;
public String coinToss(int n) {
    String s = "";
    Random random = new Random();
    for (int i = 0; i < n; i++) {
        int t = random.nextInt(2);   // i.e. t = 0 or 1
        if (t == 0)                  // we want this to happen with probability 0.5
            s = s + "T ";
        else
            s = s + "H ";
    }
    return s;
}

Simulating dice throw

public String diceThrow(int n) {
    String s = "";
    Random random = new Random();
    for (int i = 0; i < n; i++)
        s = s + (random.nextInt(6) + 1) + " ";   // nextInt(6) gives 0..5, so add 1 for faces 1..6
    return s;
}

Simulating picking balls

public String pickABall(int n) {
    String s = "";
    Random random = new Random();
    for (int i = 0; i < n; i++) {
        double b = random.nextDouble();
        if (b < 0.3)   // we want this to happen with probability 0.3
            s = s + "R ";
        else
            s = s + "W ";
    }
    return s;
}

‘Foxes and rabbits’ simulation

Rabbits and foxes in an enclosed field:
• example of a “predator-prey” simulation;
• see Barnes and Kölling, Objects first with Java, Chapter 10.

The field:
• has a fixed number of square cells arranged in a square grid;
• each cell can be occupied by only one animal;
• animals can’t leave the field.

Animals

All animals have:
• a state (alive or dead!)
• an age
• a location in the field

Rabbits:
• die from being eaten by a fox

All animals do:
• get older
• breed
• try to move to a new location
• die of old age
• die of overcrowding

Foxes:
• die of hunger

Breeding

private int breed()   // returns size of litter (if any)
{
    int births = 0;
    if (rand.nextDouble() <= BREEDING_PROBABILITY) {
        births = rand.nextInt(MAX_LITTER_SIZE) + 1;
    }
    return births;
}

Rabbits: BREEDING_PROBABILITY = 0.15; MAX_LITTER_SIZE = 5;
Foxes: BREEDING_PROBABILITY = 0.09; MAX_LITTER_SIZE = 3;

Breeding probability

[Figure: uniform distribution, P(X) = 1.0 for x between 0.0 and 1.0]
if (rand.nextDouble() <= BREEDING_PROBABILITY) ...
nextDouble() is uniformly distributed between 0.0 and 1.0, so the condition is true with probability BREEDING_PROBABILITY.
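As a rough check of the breeding logic, the sketch below repeats the breed() test many times and reports the fraction of calls that produce a litter. The class name BreedingCheck, the method breedingFraction, the trial count and the seed 2006 are all illustrative assumptions; only the breed() logic and the rabbit constants come from the slides. Passing an explicit seed also illustrates the earlier point that the same seed gives the same series.

```java
import java.util.Random;

public class BreedingCheck {
    static final double BREEDING_PROBABILITY = 0.15;   // rabbit value from the slide
    static final int MAX_LITTER_SIZE = 5;              // rabbit value from the slide

    /** Same logic as breed() on the slide, with the generator passed in as a parameter. */
    static int breed(Random rand) {
        int births = 0;
        if (rand.nextDouble() <= BREEDING_PROBABILITY) {
            births = rand.nextInt(MAX_LITTER_SIZE) + 1;   // 1 .. MAX_LITTER_SIZE
        }
        return births;
    }

    /** Fraction of calls to breed() that produce at least one birth. */
    static double breedingFraction(int trials, long seed) {
        Random rand = new Random(seed);   // explicit seed, so the run is repeatable
        int litters = 0;
        for (int i = 0; i < trials; i++) {
            if (breed(rand) > 0) {
                litters++;
            }
        }
        return (double) litters / trials;
    }

    public static void main(String[] args) {
        System.out.println(breedingFraction(100000, 2006L));
        System.out.println(breedingFraction(100000, 2006L));   // same seed, same series, same result
    }
}
```

With 100000 trials the fraction should be close to 0.15, and the two printed values should be identical because both runs start from the same seed.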
Number of births with each litter size equally likely

[Figure: P(X) = 1 / MAX_LITTER_SIZE for each x = 1, 2, 3, ... MAX_LITTER_SIZE]
births = rand.nextInt(MAX_LITTER_SIZE) + 1

Number of births with different probabilities of litter sizes

[Figure: P(X) with a different height p1, p2, p3, ... pMAX for each x = 1, 2, 3, ... MAX_LITTER_SIZE]
Given p1, p2, ... pMAX, how do you use a random number generator to generate a litter size?

Have a good Easter!
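One possible approach to the question of generating a litter size from p1, p2, ... pMAX (a sketch only, not necessarily the intended answer) is to draw a uniform number u between 0.0 and 1.0 and walk along the cumulative probabilities until u falls below the running total. Because P(a ≤ X < b) = b – a for the uniform distribution, size k is then chosen with probability pk. The class name WeightedLitter, the method litterSize and the probabilities in main are hypothetical.

```java
import java.util.Random;

public class WeightedLitter {

    /**
     * Picks a litter size from 1 .. p.length, where p[k - 1] is the probability
     * of a litter of size k. The probabilities are assumed to sum to 1.0.
     */
    static int litterSize(double[] p, Random rand) {
        double u = rand.nextDouble();   // uniform, 0.0 <= u < 1.0
        double cumulative = 0.0;
        for (int k = 0; k < p.length; k++) {
            cumulative += p[k];         // running total p1 + ... + p(k+1)
            if (u < cumulative) {
                return k + 1;
            }
        }
        return p.length;   // only reached if rounding leaves the total slightly below 1.0
    }

    public static void main(String[] args) {
        double[] p = {0.1, 0.2, 0.4, 0.2, 0.1};   // hypothetical p1 .. p5
        Random rand = new Random();
        int trials = 100000;
        int[] counts = new int[p.length];
        for (int i = 0; i < trials; i++) {
            counts[litterSize(p, rand) - 1]++;
        }
        // relative frequency of each litter size, to compare with p
        for (int k = 0; k < p.length; k++) {
            System.out.println((k + 1) + ": " + (double) counts[k] / trials);
        }
    }
}
```

The printed relative frequencies should be close to the chosen probabilities, in the same way that testUniform checks the uniform generator.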