Physics 2660: Fundamentals of Scientific Computing
Lecture 12

Notes
• Only 3 weeks left in the semester, 3 lectures left including this one:
  – 19 April
  – 26 April
  – 3 May
• Labs: two labs left
  – 21 April
  – 28 April
• Upcoming homeworks:
  – HW11 due Monday 25 April at midnight
  – HW12 due Saturday 30 April at 6pm
  – HW13 due Wednesday 4 May at midnight
  – HW14 due Wednesday 4 May at midnight
• Solutions to all labs and homeworks are in the process of being posted … sorry for recent delays
• Final exam is coming:
  – Take-home projects
    • 3 or 4 problems
  – Like a more involved, longer multipart homework assignment
  – Assigned in the last week of the semester, on Tuesday 3 May
  – Due Thursday 12 May:
    • electronic copies by 9:00am
    • hard copies must be submitted Thursday 12 May between 08:00 and 10:00 in Room 022-C, our computer lab
• Office hours reminder:
  – My office hours are in Room 022-C (our computer lab) from 3:30 to 5pm on Tuesdays, or by appointment
    • Today they will start a little late!
    • 3:45 or so
  – TA office hours, also in Room 022-C
    • Mondays 5-8pm
    • Tuesdays 5-8pm

Review and Today's Outline
• Last time:
  – Three probability distributions and the Gaussian limit
  – Experimental uncertainties
  – Comparing two models
• Today: some powerful ideas!
  – Monte Carlo methods
  – Comparing two models
  – Tuning a model/theory to best match the data
  – Searching
  – Sorting

Comparing Data to a Prediction

Comparing Data to Some Prediction
• This is science at its best!
  0. Prediction
  1. Observation
  2. Comparison
  3. Conclusion
  4. Refine Prediction
  5. Repeat as necessary
• The comparison step is crucial to how we arrive at a refined picture of how the world works.
  – That's our mission as scientists, no?
• Great news: there are numerical methods one can use to do this quantitatively, which makes them perfect for executing in computer programs!

How good is this theory?
• Suggestions for a simple model?
• Question: How well does this model fit the data?

Which theory is better?
• How to arbitrate between these two?

Calculation of χ²

Comparison of Models: χ² Values

χ² Distribution
(figure annotation: "… and still be right")
• So the probability of having a measurement with χ² > N can be determined from this χ² distribution
• For one data point (one degree of freedom), this distribution carries the same information as the integral of the Gaussian distribution
• Why use this other thing?
  – We can calculate the χ² for more than one data point and easily combine the results into a single figure of merit

More Data Points: More Degrees of Freedom

Degrees of Freedom
• When comparing a theory to some data, as we are doing here, each compared prediction from the model is called a degree of freedom of the comparison
  – comparing 1 data point = 1 degree of freedom
  – comparing 5 data points = 5 degrees of freedom
  – comparing N data points = N degrees of freedom
• The χ² distribution changes as one considers a comparison with more degrees of freedom

Many Degrees of Freedom
• For a large number of degrees of freedom k, the most probable value of χ² is approximately equal to k (the mean of the distribution is exactly k; its peak sits at k − 2).

Reduced χ²

Probabilities for Reduced χ²
• As a rough rule of thumb, a reduced χ² of ~1.0 indicates good agreement between samples, given their uncertainties

Probability of being consistent?

Probabilities for Reduced χ²
If this theory were an accurate representation of our data…
Notes:
• Too LARGE a reduced χ² implies poor agreement between theory and data.
• Too SMALL a reduced χ² implies one could be OVERFITTING the data: the agreement should still be impacted by the uncertainty on each point.
• A reduced χ² ≈ 1.0 indicates that theory and data are in accord within uncertainties, i.e., the collection of measurements is sometimes high (50%) and sometimes low (50%).

Usefulness of χ²

Summary so far…
• Compare some data to a model, accounting for uncertainties
• Calculate the reduced χ²
• If there is good agreement:
  – should see different points sometimes high, sometimes low
  – if k is large, reduced χ² ~ 1.0

Tuning a Model to Best Match Some Data

Tuning a Model
• Can we figure out which model (which values of a and b) the data most favors?

Probability of Some Observation

Probability of Multiple Observations
• The probability of the collection of data (3 observations) is just the product of the three individual probabilities
• In general, the probability of the collection of data (k observations) is just the product of the k individual probabilities

P: The χ² Likelihood Function

Minimizing the χ²

More Powerful Application: An Arbitrary Theory

Fitting with Gnuplot

$$P(x; a, b, c) = \frac{1}{\sqrt{2\pi c^2}} \, e^{-\frac{(ax-b)^2}{2c^2}}$$

Assessing the Quality of a Fit
• Trivial case
• Consequence: if the number of fit parameters is greater than or equal to the number of data points, the reduced χ² is undefined, because the number of degrees of freedom (data points minus fit parameters) is then zero or negative.
• For the three-parameter model P(x; a, b, c) above: there is a 90% probability that, if the data were consistent with the model (here a Gaussian-like thing with 3 parameters), the data would have a higher χ² value.
• Too good to be true?
  – Why are the points so close to the model?
  – Did the fit procedure cheat in some way?
  – Are the uncertainties overestimated?

Fitting is Done EVERYWHERE

Curve Fitting

Deviations from the Model

The Pull Distribution

Bias – Is the Prediction In Accord with the Data?
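
The fitting and fit-quality ideas on the preceding slides can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the lecture's own code: the model is assumed to be a straight line y = a·x + b, the data values are made up, and the function names (`fit_line`, `pulls`, `chi2`) are hypothetical. For a straight line the χ² minimum has a standard closed-form (weighted least squares) solution, which is used here instead of a numerical search. The number of degrees of freedom is taken as data points minus fitted parameters.

```python
def fit_line(x, y, sigma):
    """Return the (a, b) that minimize chi2 for the assumed model y = a*x + b."""
    w = [1.0 / s**2 for s in sigma]            # weight of each point = 1/sigma^2
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx**2
    a = (S * Sxy - Sx * Sy) / delta            # slope
    b = (Sxx * Sy - Sx * Sxy) / delta          # intercept
    return a, b


def pulls(x, y, sigma, a, b):
    """Pull of each point: (data - model) / uncertainty."""
    return [(yi - (a * xi + b)) / si for xi, yi, si in zip(x, y, sigma)]


def chi2(pull_values):
    """chi2 is the sum of the squared pulls."""
    return sum(p * p for p in pull_values)


# Made-up measurements (hypothetical, for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.9, 5.2, 6.8, 9.1, 11.0]
sigma = [0.2, 0.2, 0.2, 0.2, 0.2]

a, b = fit_line(x, y, sigma)
p = pulls(x, y, sigma, a, b)
c2 = chi2(p)
ndof = len(x) - 2                              # 5 points minus 2 fit parameters

print("a =", a, " b =", b)
print("chi2 =", c2, " ndof =", ndof, " reduced chi2 =", c2 / ndof)
print("pulls:", p)
```

A reduced χ² near 1.0, with pulls scattered on both sides of zero, is what the rule of thumb above calls good agreement. Note that in this particular example the pulls sum to zero by construction, because the intercept is fitted and all uncertainties are equal. Checking the mean pull for bias is more informative when the prediction is fixed in advance rather than fitted to the same data.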

Clusters of Data Above/Below

More Testing of Compatibility

Cumulative Distribution Function

Example: PDF and CDF

Empirical Distribution Function
• The empirical cumulative distribution function (ECDF) is made from "unbinned" data
  – not from a binned histogram
  – use the raw measured values
• Do this by:
  1. say you have N values, x_i
  2. sort the N values in order of increasing value
  3. plot each of the N values with x_i on the x-axis and i/N on the y-axis, where i = 1 … N is the position of x_i in the sorted list
• Now, compare the model's CDF and the data's ECDF…

Testing Compatibility
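
The ECDF recipe above can be sketched as follows. This is a minimal illustration under stated assumptions: the data values are made up, the model is assumed to be a unit Gaussian, and the function names are hypothetical. The slides do not say how the two curves are compared. One common choice, used here as an assumption, is the largest vertical distance between the ECDF and the model CDF (the Kolmogorov-Smirnov statistic).

```python
import math


def ecdf_points(values):
    """Sort the raw values and pair each x_i with i/N (i = 1..N)."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a Gaussian model, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def max_cdf_distance(values, model_cdf):
    """Largest vertical gap between the data's ECDF and the model's CDF."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = model_cdf(x)
        # The ECDF jumps from i/n to (i+1)/n at x, so check both sides of the step.
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d


# Made-up unbinned measurements (hypothetical, for illustration only).
data = [-1.2, 0.3, -0.4, 1.1, 0.8, -0.1, 0.5, -0.9]

points = ecdf_points(data)
d = max_cdf_distance(data, gaussian_cdf)

for x_i, frac in points:
    print(x_i, frac)
print("largest ECDF-CDF distance:", d)
```

A small distance suggests that the data follow the model's distribution. Turning the distance into a probability requires the distribution of this statistic, which is not covered in this sketch.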