Mathematical Modelling 2015

Instructor: Prof. Ganser, Armstrong Hall 408K, ganser@math.wvu.edu

PR: A background in differential equations, linear algebra and statistics/probability is necessary for this course. The course in statistics should be at the calculus level, such as Stat 461 here at WVU. Students must have knowledge of a spreadsheet program like JMP or Excel as well as a program for doing math such as Mathematica or Matlab. The first set of assignments should help you decide if you have the proper background.

Grading: 50% (assignments, projects, semester tests), 20% (Midterm), 30% (Final)

Assignments are to be done individually and typed. The project write-up should be clear and concise with page numbers and include:
i) Statement of the problem
ii) Summary of the solution with reference (page numbers) to calculations and data analysis in the back
iii) Computer code and output in the back

Assignments will have a due date. Sometimes the date is relaxed for all students because of unforeseen circumstances. However, once a date is set, the assignments are due on that date. Any project that is turned in late will either not be accepted or will receive a reduced grade.

Goals: This course covers many models. A summary is discussed the first day of class. It is more of a survey of mathematical models than a specialized course in particular models. Topics will not be studied in complete detail, so that more examples can be covered. There is a danger that a student will feel that they have not learned a topic well enough to use it in the future. However, it is hoped that the student will have sufficient understanding of the models discussed to know “where to start” when faced with a new problem. Also, the course should help students better understand the models developed by others.

Outline

Linear Models (such as $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots$,
linear in the betas)
    Basic Theory of Least Squares
    Normally Distributed $\epsilon$'s
    Parameter Table
    Focus on model selection using AIC, SIC and CP statistics
    Prediction
Dimensional Analysis
    How it works
    Examples (simple ones plus drag on a sphere, period of a pendulum, etc.)
Probability Models
    Review of selected distributions
    Maximum Likelihood
    Approximate confidence intervals
    Goodness of fit
Main Effects / Additive Models
    Design of Experiments
    Orthogonal Arrays
Time Series (if there is time)
    Spectral Analysis
    ARMA models

Assignments

1. Enter the data for the tape problem into a spreadsheet. The physical problem will be explained in class. Estimate the value of the angle A for c = 24 cm and w = 15 cm in any way.

Circumference c/cm      10   10   10   10   10   20   20   20   20   20   30   30   30   30   30
Tape width w/cm          3    5    7    9   11    3    5    7    9   11    3    5    7    9   11
Angle of pitch A/deg    17   30   44   64  ---    9   14   20   27   33    6   10   14   17   21

2.
x    1   1   2   4   5   6   6   7   4
y    3   6   4   3   5   9  10   8   6

Use linear regression to fit the line $y = \beta_0 + \beta_1 x$ to the data. Include the parameter table for the fit. At a minimum this should have the estimates and standard errors for the parameters, the t-statistics, the p-values or probabilities, and the standard error of regression. It is not necessary to know what these numbers mean yet. It is important that you know how to use some software to find these numbers.

2a. A baseball player wants to use analytics to improve their hitting, and so it is decided to use a linear model. First it is decided, based on knowledge of the ingredients that go into hitting, to reduce the analysis to four factors with two levels each, as shown in the table.

Factor   Name              Unit     Type          Level 1   Level 2
s        Foot Position     Angle    Categorical   Square    45
c        Choke on bat      Inches   Numeric       .000      2 in.
p        Position in box            Categorical   Forward   Back
t        Speed of pitch    mph      Numeric       60 mph    80 mph

The player tried to hit the ball 100 times at each of the 16 combinations in the data set below. The results show how many hits he was able to get out of the 100 tries.
Run   Stance s   Choke c   Position   Speed    Hits
1     45         2 in.     Back       60 mph   13
2     Square     0 in.     Forward    60 mph   28
3     Square     2 in.     Forward    80 mph   14
4     45         2 in.     Forward    60 mph   38
5     Square     2 in.     Back       80 mph   27
6     45         0 in.     Back       60 mph   11
7     Square     0 in.     Back       60 mph   13
8     Square     2 in.     Forward    60 mph   40
9     Square     0 in.     Back       80 mph   19
10    45         0 in.     Back       80 mph   23
11    45         0 in.     Forward    60 mph   34
12    45         0 in.     Forward    80 mph   5
13    Square     0 in.     Forward    80 mph   2
14    Square     2 in.     Back       60 mph   23
15    45         2 in.     Forward    80 mph   9
16    45         2 in.     Back       80 mph   31

(a) Use the model
$h_i = \mu + s_j + c_k + p_l + t_m + (sc)_{jk} + (sp)_{jl} + (st)_{jm} + (cp)_{kl} + (ct)_{km} + (pt)_{lm} + \epsilon_i$.
As with the Bowling Ball problem, $s_1 + s_2 = 0$, $c_1 + c_2 = 0$, etc. To be uniform, use the table to determine the meaning of the variables:
$s_1$ = 45, $s_2$ = Square
$c_1$ = 2 in., $c_2$ = 0 in.
$p_1$ = Forward, $p_2$ = Back
$t_1$ = 80 mph, $t_2$ = 60 mph
(b) Calculate the Mean Square Error of the model and the common Standard Error for one of the parameters.
(c) Use the rule of thumb 2 S.D. to eliminate parameters. Note that if an interaction term is not eliminated, the individual parameters making up the interaction must also be included, even if they would be eliminated on their own.
(d) Redo the calculation for the reduced model, find the new Mean Square Error, and compare it to the previous value. Does the reduced model seem superior?
(e) Based on this analysis, what advice would you give to the baseball player?

3. The input-output model problem. The program is on e-campus. See if you can find a formula for predicting the output from the input by doing different experiments. Remember that the “system” has the properties discussed in class.

4. Suppose you experiment with two different shoes and two different balls to determine the best combination for scoring the highest in bowling. Below is the data. Can you come to some conclusion as to what may be the best combination? One possible answer is that there is no conclusion. Note: Each combination was done at random and then put in the order below.
Score   Shoes   Ball
176     A1      B1
176     A1      B2
186     A2      B1
184     A2      B2
180     A1      B1
182     A1      B2
182     A2      B1
188     A2      B2

5. The following data set gives the mass of a chunk of cement and the mass of the four components used in the mix. The problem is to use these masses to predict the heat that was produced, given in the y column. What variables would you use in the model? This is very basic, and the question has to do with the initial pencil-and-paper analysis.

Mass(gm)   x1(gm)    x2(gm)    x3(gm)    x4(gm)    y(calories)
1005       70.35     261.3     60.3      603       78892.5
990        9.9       287.1     148.5     514.8     73557
985        108.35    551.6     78.8      197       102735.5
1025       112.75    317.75    82        481.75    89790
900        63        468       54        297       85950
905        99.55     497.75    81.4      199.1     98826
1005       30.15     713.55    170.85    60.3      103213.5
890        8.9       275.9     195.8     391.6     64525
955        19.1      515.7     171.9     210.1     88910.5
1000       210       470       40        260       115900
870        8.7       348       200.1     295.8     72906
980        107.8     646.8     88.2      117.6     111034
960        96        652.8     76.8      115.2     105024

6. Read the Dimensional Analysis Notes that are on the e-campus page for our class. Do problems 3 and 4 on page 9.

7. Crater Ejecta Scaling Laws Article.
(a) Do the dimensional analysis of Eq. (1) (the authors never do this).
(b) In your own words, derive Eq. (6) from Eq. (1). Pay attention to how the authors do it and why they do it.

8. A computer chip maker is testing several possible ways of producing a new chip. Nine wafers, placed at various locations in the chamber where chemicals are deposited on the wafers, are used to measure the value of the particular control factors affecting the depositions. Each wafer is analyzed and the number of defects is counted, making a total of 9 measurements for each particular process, $y_i$, $i = 1, \dots, 9$. In general a wafer is considered acceptable if the number of defects is less than or equal to $n_0$. Using the numbers $y_i$, determine a quantity based on these numbers that will be used to pick the best process for producing wafers. Actually find two possible choices for such a quantity.

9.
A company sells a product for $500 that is guaranteed to be on target with a variation of at most 2. The company knows that the production process yields items on target on average with a variation of .5 with no further testing. An additional test on each individual unit, to determine if it is within the guaranteed specifications, costs $15. Make a case for or against the cost of further testing using a quadratic loss function.

10. A medical company produces a part that has a hole measuring $.50 \pm .050$ cm. The tooling used to make the hole is worn and needs replacing, but management doesn’t feel it is necessary since it still makes “good parts”. All parts pass QC, but several parts have been rejected by assembly. The failure cost per part is $45.00. A new tool costs $10,000. Experience indicates that a worn tool can produce around 3000 holes that are within specs. Using the quadratic loss function, explain why it may benefit the company and the customer to replace the tool more frequently. Typical data giving the precise diameter of the hole follow (in cm):
{.459, .462, .467, .474, .476, .478, .483, .489, .491, .492, .495, .495, .495, .498, .500, .501, .501, .502, .505, .509, .511, .516, .521, .524, .527, .527, .532, .532, .533, .536}

11. Consider the experiment given in the table with 3 factors A, B, and C, each with 2 levels. What factor affects the loss function L[y] the most, and what settings would minimize L[y]? (Use the main effects model $L[y] = \mu + a_i + b_j + c_k$.)

L[y]     1    2    3
3.567    A1   B1   C1
2.145    A1   B2   C2
1.678    A2   B1   C2
2.564    A2   B2   C1

12. The problem is to predict the energy consumption for the winter of 1971 (the first quarter of 1971).
(a) Graph the data and determine a linear model. Model the slight increasing trend with a straight line. Write out the linear model $y = X\beta + \epsilon$.

        Quarter
Year    1     2     3     4
1965    874   679   616   816
1966    866   700   603   814
1967    843   719   594   819
1968    906   703   634   844
1969    952   745   635   871

13. Turkey data.
The following data give the weight of turkeys (y) at age (x) raised in Georgia, Virginia, or Wisconsin.

y        13.3   8.9   15.1   10.4   13.1   12.4   13.2   11.8   11.5   14.2   15.4   13.1   13.8
x        28     20    32     22     29     27     28     26     21     27     29     23     25
Origin   G      G     G      G      V      V      V      V      W      W      W      W      W

(a) Write out the linear model $y = X\beta + \epsilon$ for this problem using a dummy variable to model the state of origin and assuming the weight grows linearly with age (straight line). Use $a_1, a_2, a_3$ (G, V, W) with $a_2 = -a_1 - a_3$ for the coefficients corresponding to the states.

k k 14. Plot y(k) cos 22.4 .5cos 23.1 .3cos 21 k, $k = 1, 2, \dots, 50$. Use this data to calculate the Sample Spectrum.

15. Let $w_t$ be a sequence of independent and normally distributed random variables with mean zero and variance one for every $t$. Use any program to graph a realization of $w_t$ as a function of $t$ for $t = 1, 2, \dots, 100$. Use this data to calculate the Sample Spectrum.

16. A huge multinational company, Gans Dynamics, manufactures spacecraft as well as breakfast cereal. A major problem in the production process of cereal is contamination of the cereal by metal pieces that entered the process before the start or pieces that have broken off of the many metal rollers involved in the processing. Rollers with an imperfection can also cause problems. To monitor the process, sensors that record vibrations are positioned throughout the line to detect abnormal vibrations. The many rollers rotate at various speeds, with the highest being a little less than 5 rounds per second. Data from the sensors is collected at the rate of 10 Hz. Two files with n = 1000 data points each (100 seconds of data) are given on e-campus. One file corresponds to normal operations and the other to contaminated operations. Graph the data and use spectral analysis to determine the sample spectra. Discuss the results, in particular the “contaminated spectrum” and what it implies. Of importance are the frequencies that are highlighted in the contaminated spectrum.
This would help locate where the problem might be.

17. An experiment is done to see if a coin is fair (the probability of heads is .5). In one experiment the coin is tossed 100 times and heads show up 57 times. Use the likelihood ratio test and the result that $-2 \log \lambda$ is approximately $\chi^2$ (when $n$ is large) to test $p_0 = .5$. Do the same calculation for $n = 1000$ with heads appearing 570 times. Why do you think (use common sense) the conclusions are different?

18. Suppose a great basketball player is a terrible free throw shooter. The coach knows that on average “Mounty” makes about 40% of his free throws, but the coach wants to know what percent of free throws Mounty will make given that he either missed the one before or made the one before. Here is data from one game from start to finish: 1-0-0-1-0-1-1-0-0-0-1. Can you help the coach answer her question?

Name:___________________________________
Major/Year:
Statistics Background:
Software for doing Mathematics/spreadsheets:
What do you expect from a Math Modelling course?
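
A note on the arithmetic in Problem 17: the following is a minimal Python sketch of one way the likelihood ratio statistic for a binomial proportion could be computed. It is not the course's prescribed method. The function names `lr_statistic` and `chi2_1_pvalue` are hypothetical, and the sketch assumes one degree of freedom for the chi-square approximation (one parameter is fixed under the null hypothesis), a 5% significance level, and 0 < heads < n.

```python
import math

def lr_statistic(heads, n, p0=0.5):
    """Return -2 log(lambda) for testing p = p0 in a binomial model.

    Assumes 0 < heads < n so that both logarithms are defined.
    """
    p_hat = heads / n          # maximum likelihood estimate of p
    tails = n - heads
    # The binomial coefficient cancels in the likelihood ratio.
    return 2.0 * (heads * math.log(p_hat / p0)
                  + tails * math.log((1.0 - p_hat) / (1.0 - p0)))

def chi2_1_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Uses P(Z^2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(stat / 2.0))

# Assumption: 5% level, 1 degree of freedom (standard table value).
CRITICAL_5PCT = 3.841

for heads, n in [(57, 100), (570, 1000)]:
    stat = lr_statistic(heads, n)
    print(f"n={n}: -2 log(lambda) = {stat:.3f}, "
          f"p-value = {chi2_1_pvalue(stat):.4f}, "
          f"reject at 5%: {stat > CRITICAL_5PCT}")
```

Under these assumptions the statistic is about 1.97 for 57 heads in 100 tosses and about 19.66 for 570 heads in 1000 tosses. Interpreting the difference is left to the problem.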