Math Modelling 2015

Mathematical Modelling 2015
Instructor: Prof. Ganser, Armstrong Hall 408K, ganser@math.wvu.edu
PR: A background in differential equations, linear algebra, and statistics/probability is necessary for this course. The statistics course should be at the calculus level, such as Stat 461 here at WVU. Students must know a spreadsheet program such as JMP or Excel, as well as a program for doing mathematics such as Mathematica or Matlab. The first set of assignments should help you decide whether you have the proper background.
Grading: 50% (assignments, projects, semester tests)
20% (Midterm)
30% (Final)
Assignments are to be done individually and typed. The project write-up should be clear and concise, have page numbers, and include:
i) Statement of the problem
ii) Summary of the solution, with references (page numbers) to the calculations and data analysis in the back.
iii) Computer code and output in the back
Assignments will have a due date. Sometimes the date is relaxed for all students because of unforeseen circumstances. However, once a date is set, the assignments are due on that date. Projects turned in late will either not be accepted or will receive a reduced grade.
Goals: This course covers many models. A summary is given on the first day of class. It is more a survey of mathematical models than a specialized course in particular models. Topics are not studied in complete detail, so that more examples can be covered.
There is a danger that a student will feel that they have not learned the topic well enough
to use it in the future. However, it is hoped that the student will have sufficient
understanding of the models discussed so that he or she may know “where to start” when
faced with a new problem. Also, the course should help students better understand the
models developed by others.
Outline
Linear Models (such as $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots$, linear in the betas)
Basic Theory of Least Squares
Normally Distributed $\varepsilon$'s
Parameter Table
Focus on model selection using AIC, SIC, and $C_p$ statistics
Prediction
Dimensional Analysis
How it works
Examples (simple ones plus drag on a sphere, period of a pendulum, etc.)
Probability Models
Review of selected distributions
Maximum Likelihood
Approximate confidence intervals
Goodness of fit
Main Effects / Additive Models
Design of Experiments
Orthogonal Arrays
Time Series (if there is time)
Spectral Analysis
ARMA models
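As a small illustration of how dimensional analysis works, the sketch below treats the pendulum example from the outline. It assumes the period $T$ depends only on the length $L$, the gravitational acceleration $g$, and the bob mass $m$, writes $T \propto L^a g^b m^c$, matches the powers of mass (M), length (L), and time (T), and solves the resulting 3 × 3 linear system exactly with Cramer's rule. The dictionary `DIMS` and the helper names are illustrative, not taken from the course notes.

```python
from fractions import Fraction

# Dimensions written as exponents of the base dimensions (M, L, T).
DIMS = {
    "T": (0, 0, 1),   # period of the pendulum
    "L": (0, 1, 0),   # length of the pendulum
    "g": (0, 1, -2),  # gravitational acceleration
    "m": (1, 0, 0),   # mass of the bob
}


def det3(a):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def exponents(target, inputs):
    """Exponents e such that prod(inputs[i] ** e[i]) has the dimensions of target.

    Rows of the linear system are the base dimensions (M, L, T); columns are
    the three input variables. The system is solved exactly with Cramer's rule.
    """
    a = [[Fraction(DIMS[v][row]) for v in inputs] for row in range(3)]
    b = [Fraction(d) for d in DIMS[target]]
    d = det3(a)
    if d == 0:
        raise ValueError("inputs are not dimensionally independent")
    solution = []
    for col in range(3):
        # Replace column `col` by the right-hand side b.
        a_col = [[b[r] if c == col else a[r][c] for c in range(3)] for r in range(3)]
        solution.append(det3(a_col) / d)
    return dict(zip(inputs, solution))


print(exponents("T", ["L", "g", "m"]))
```

The result is $a = 1/2$, $b = -1/2$, $c = 0$, so $T = C\sqrt{L/g}$ and the mass drops out. Dimensional analysis alone cannot supply the constant $C$ (it is $2\pi$ for small oscillations).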
Assignments
1. Enter the data for the tape problem into a spreadsheet. The physical problem will be explained in class. Estimate the value of the angle A for c = 24 cm and w = 15 cm, by any method.
Circumference c/cm       10  10  10  10  10   20  20  20  20  20   30  30  30  30  30
Tape width w/cm           3   5   7   9  11    3   5   7   9  11    3   5   7   9  11
Angle of pitch A/deg     17  30  44  64  ---   9  14  20  27  33    6  10  14  17  21
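One way to start, sketched here under an assumption that the problem does not state: A is an angle while c and w are both lengths, so if A depends only on c and w it can depend on them only through the ratio w/c. The code below tests one specific guess, sin A = w/c, against the measured angles and then evaluates it at c = 24 cm, w = 15 cm. The function name `angle_guess` is illustrative.

```python
import math

# Data from the table: circumference c (cm), tape width w (cm), angle of pitch A (deg).
c = [10] * 5 + [20] * 5 + [30] * 5
w = [3, 5, 7, 9, 11] * 3
A = [17, 30, 44, 64, None, 9, 14, 20, 27, 33, 6, 10, 14, 17, 21]  # None marks "---"


def angle_guess(c, w):
    """Hypothesis (an assumption, not given in the problem): sin A = w / c.

    Returns None when w > c, where no angle satisfies the guess.
    """
    ratio = w / c
    if ratio > 1:
        return None
    return math.degrees(math.asin(ratio))


# Compare the guess with every measured angle.
errors = [angle_guess(ci, wi) - ai for ci, wi, ai in zip(c, w, A) if ai is not None]
print("largest |error| in degrees:", max(abs(e) for e in errors))
print("estimate for c = 24 cm, w = 15 cm:", angle_guess(24, 15))
```

Under this guess the missing entry at c = 10 cm, w = 11 cm is also explained, since w > c leaves no admissible angle.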
2.
x   1   1   2   4   5   6   6   7   4
y   3   6   4   3   5   9  10   8   6
Use linear regression to fit the line $y = \beta_0 + \beta_1 x$ to the data. Include the parameter table for the fit. At minimum, it should contain the estimates and standard errors of the parameters, the t-statistics, the p-values (probabilities), and the standard error of the regression. It is not necessary to know what these numbers mean yet. It is important that you know how to use some software to find these numbers.
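A minimal pure-Python sketch of the parameter table for this straight-line fit, offered only as a way to check the numbers from whatever package you use (JMP, Excel, Mathematica, Matlab). It reports the estimates, their standard errors, the t-statistics, and the standard error of regression; the function name `simple_ols` is illustrative.

```python
import math

x = [1, 1, 2, 4, 5, 6, 6, 7, 4]
y = [3, 6, 4, 3, 5, 9, 10, 8, 6]


def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 x with a basic parameter table."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))                   # standard error of regression
    se_b0 = s * math.sqrt(1 / n + xbar ** 2 / sxx)
    se_b1 = s / math.sqrt(sxx)
    return {
        "b0": (b0, se_b0, b0 / se_b0),             # (estimate, std. error, t-statistic)
        "b1": (b1, se_b1, b1 / se_b1),
        "s": s,
        "sse": sse,
        "df": n - 2,
    }


fit = simple_ols(x, y)
for name in ("b0", "b1"):
    est, se, t = fit[name]
    print(f"{name}: estimate={est:.4f}  SE={se:.4f}  t={t:.3f}")
print(f"standard error of regression s={fit['s']:.4f} on {fit['df']} df")
```

If SciPy is available, the two-sided p-value for each t-statistic is `2 * scipy.stats.t.sf(abs(t), df)`.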
2a. A baseball player wants to use analytics to improve their hitting, so a linear model is to be used. First, based on knowledge of what goes into hitting, the analysis is reduced to four factors, each with two levels, as shown in the table.
Factor  Name             Unit    Type         Level 1   Level 2
s       Foot position    Angle   Categorical  Square    45
c       Choke on bat     Inches  Numeric      0 in.     2 in.
p       Position in box  -       Categorical  Forward   Back
t       Speed of pitch   mph     Numeric      60 mph    80 mph
The player tried to hit the ball 100 times at each of the 16 combinations in the data set below. The results show how many hits he was able to get out of the 100 tries.
Run  Stance s  Choke c  Position  Speed   Hits
1    45        2 in.    Back      60 mph  13
2    Square    0 in.    Forward   60 mph  28
3    Square    2 in.    Forward   80 mph  14
4    45        2 in.    Forward   60 mph  38
5    Square    2 in.    Back      80 mph  27
6    45        0 in.    Back      60 mph  11
7    Square    0 in.    Back      60 mph  13
8    Square    2 in.    Forward   60 mph  40
9    Square    0 in.    Back      80 mph  19
10   45        0 in.    Back      80 mph  23
11   45        0 in.    Forward   60 mph  34
12   45        0 in.    Forward   80 mph  5
13   Square    0 in.    Forward   80 mph  2
14   Square    2 in.    Back      60 mph  23
15   45        2 in.    Forward   80 mph  9
16   45        2 in.    Back      80 mph  31
(a) Use the model
$h_i = \mu + s_j + c_k + p_l + t_m + (sc)_{jk} + (sp)_{jl} + (st)_{jm} + (cp)_{kl} + (ct)_{km} + (pt)_{lm} + \varepsilon_i$.
As with the Bowling Ball problem, $s_1 + s_2 = 0$, $c_1 + c_2 = 0$, etc. To be uniform, use the table below to determine the meaning of the variables.
s1 = 45        s2 = Square
c1 = 2 in.     c2 = 0 in.
p1 = Forward   p2 = Back
t1 = 80 mph    t2 = 60 mph
(b) Calculate the Mean Square Error of the model and the common Standard Error of a single parameter.
(c) Use the rule of thumb of 2 × S.D. to eliminate parameters. Note that if an interaction term is not eliminated, the individual parameters making up the interaction must also be included, even if they would be eliminated on their own.
(d) Redo the calculation for the reduced model, find the new Mean Square Error, and compare it to the previous value. Does the reduced model seem superior?
(e) Based on this analysis, what is the advice you would give to the baseball player?
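A sketch of parts (a) to (c), assuming the sum-to-zero constraints reduce each effect to a single parameter on a ±1-coded column (for example $s_2 = -s_1$ and $(sc)_{jk} = \pm (sc)_{11}$), with level 1 coded +1 as in the s1/c1/p1/t1 table. Because the 16 runs form a full $2^4$ factorial, the columns are orthogonal, so each estimate is $\sum_i x_i h_i / 16$ and every parameter shares the standard error $\sqrt{\mathrm{MSE}/16}$. Names such as `fit_factorial` are illustrative.

```python
import math
from itertools import combinations

# (stance, choke in inches, position in box, pitch speed in mph, hits out of 100)
RUNS = [
    ("45", 2, "Back", 60, 13), ("Square", 0, "Forward", 60, 28),
    ("Square", 2, "Forward", 80, 14), ("45", 2, "Forward", 60, 38),
    ("Square", 2, "Back", 80, 27), ("45", 0, "Back", 60, 11),
    ("Square", 0, "Back", 60, 13), ("Square", 2, "Forward", 60, 40),
    ("Square", 0, "Back", 80, 19), ("45", 0, "Back", 80, 23),
    ("45", 0, "Forward", 60, 34), ("45", 0, "Forward", 80, 5),
    ("Square", 0, "Forward", 80, 2), ("Square", 2, "Back", 60, 23),
    ("45", 2, "Forward", 80, 9), ("45", 2, "Back", 80, 31),
]

FACTORS = ["s", "c", "p", "t"]
# Main effects plus all six two-factor interactions (11 parameters including mu).
TERMS = [(f,) for f in FACTORS] + list(combinations(FACTORS, 2))


def coded(run):
    """+1 at level 1 and -1 at level 2 (s1 = 45, c1 = 2 in., p1 = Forward, t1 = 80 mph)."""
    stance, choke, position, speed, _ = run
    return {"s": 1 if stance == "45" else -1,
            "c": 1 if choke == 2 else -1,
            "p": 1 if position == "Forward" else -1,
            "t": 1 if speed == 80 else -1}


def column(run, term):
    """Value of one model column (main effect or two-factor product) for a run."""
    x = coded(run)
    value = 1
    for f in term:
        value *= x[f]
    return value


def fit_factorial(runs, terms):
    n = len(runs)
    y = [r[-1] for r in runs]
    mu = sum(y) / n
    # Orthogonal +/-1 columns: the least-squares estimate is sum(x * y) / n.
    est = {t: sum(column(r, t) * yi for r, yi in zip(runs, y)) / n for t in terms}
    fitted = [mu + sum(est[t] * column(r, t) for t in terms) for r in runs]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mse = sse / (n - 1 - len(terms))   # 16 - 11 = 5 degrees of freedom for the full model
    se = math.sqrt(mse / n)            # common standard error of every parameter
    return mu, est, mse, se


mu, est, mse, se = fit_factorial(RUNS, TERMS)
print(f"mu = {mu:.3f}  MSE = {mse:.3f}  SE = {se:.3f}")
for term, value in est.items():
    verdict = "keep" if abs(value) > 2 * se else "candidate to drop"
    print(f"{''.join(term):>3}  {value:+.3f}  {verdict}")
```

For part (d), calling `fit_factorial(RUNS, terms)` with the chosen subset of `TERMS` leaves the estimates unchanged (the columns are orthogonal) but changes the MSE and its degrees of freedom.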
3. The input–output model problem. The program is on e-campus. See if you can find a formula for predicting the output from the input by doing different experiments. Remember that the “system” has the properties discussed in class.
4. Suppose you experiment with two different shoes and two different balls to determine which combination gives the highest bowling score. The data are below. Can you come to some conclusion about which combination may be best? One possible answer is that there is no conclusion. Note: each combination was run in random order, and the results were then arranged in the order below.

Shoes  Ball  Score
A1     B1    176
A1     B2    176
A2     B1    186
A2     B2    184
A1     B1    180
A1     B2    182
A2     B1    182
A2     B2    188
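A sketch of one standard way to look at these data: a balanced two-way analysis of variance with interaction (two replicates per shoe/ball cell). It assumes the scores pair with the shoe and ball settings row by row, as in the table above. Each effect has one degree of freedom, so each F ratio can be compared with an F(1, 4) critical value. Function names are illustrative.

```python
# (shoes, ball, score), read row by row from the table
DATA = [
    ("A1", "B1", 176), ("A1", "B2", 176), ("A2", "B1", 186), ("A2", "B2", 184),
    ("A1", "B1", 180), ("A1", "B2", 182), ("A2", "B1", 182), ("A2", "B2", 188),
]


def mean(values):
    return sum(values) / len(values)


def two_way_anova(data):
    """Balanced two-way ANOVA with interaction for a 2 x 2 layout."""
    grand = mean([s for _, _, s in data])
    shoes = sorted({a for a, _, _ in data})
    balls = sorted({b for _, b, _ in data})
    a_mean = {a: mean([s for x, _, s in data if x == a]) for a in shoes}
    b_mean = {b: mean([s for _, y, s in data if y == b]) for b in balls}
    cell = {(a, b): mean([s for x, y, s in data if (x, y) == (a, b)])
            for a in shoes for b in balls}
    ss_a = sum((a_mean[x] - grand) ** 2 for x, _, _ in data)
    ss_b = sum((b_mean[y] - grand) ** 2 for _, y, _ in data)
    ss_ab = sum((cell[(x, y)] - a_mean[x] - b_mean[y] + grand) ** 2 for x, y, _ in data)
    sse = sum((s - cell[(x, y)]) ** 2 for x, y, s in data)
    mse = sse / (len(data) - len(cell))   # error df = 8 - 4 = 4
    # Each effect has 1 df in a 2 x 2 layout, so F = SS / MSE.
    return {"cell_means": cell, "MSE": mse,
            "F_shoes": ss_a / mse, "F_ball": ss_b / mse, "F_interaction": ss_ab / mse}


result = two_way_anova(DATA)
for key, value in result.items():
    print(key, value)
```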
5. The following data set gives the mass of a chunk of cement and the masses of the four components used in the mix. The problem is to use these masses to predict the heat that was produced, given in the y column. What variables would you use in the model? This is very basic; the question concerns the initial pencil-and-paper analysis.
Mass(gm)  x1(gm)   x2(gm)   x3(gm)   x4(gm)   y(calories)
1005      70.35    261.3    60.3     603      78892.5
990       9.9      287.1    148.5    514.8    73557
985       108.35   551.6    78.8     197      102735.5
1025      112.75   317.75   82       481.75   89790
900       63       468      54       297      85950
905       99.55    497.75   81.4     199.1    98826
1005      30.15    713.55   170.85   60.3     103213.5
890       8.9      275.9    195.8    391.6    64525
955       19.1     515.7    171.9    210.1    88910.5
1000      210      470      40       260      115900
870       8.7      348      200.1    295.8    72906
980       107.8    646.8    88.2     117.6    111034
960       96       652.8    76.8     115.2    105024
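One possible first step, assuming the heat produced scales with the amount of cement (an extensive quantity), is to divide everything by the mass of the chunk. The x columns then become proportions of the mix, and y becomes heat per gram. The sketch below does this. The printed row sums show that the four proportions come close to one (about 0.95 to 0.99), which matters when deciding which of them to put in a regression model.

```python
MASS = [1005, 990, 985, 1025, 900, 905, 1005, 890, 955, 1000, 870, 980, 960]
X1 = [70.35, 9.9, 108.35, 112.75, 63, 99.55, 30.15, 8.9, 19.1, 210, 8.7, 107.8, 96]
X2 = [261.3, 287.1, 551.6, 317.75, 468, 497.75, 713.55, 275.9, 515.7, 470, 348, 646.8, 652.8]
X3 = [60.3, 148.5, 78.8, 82, 54, 81.4, 170.85, 195.8, 171.9, 40, 200.1, 88.2, 76.8]
X4 = [603, 514.8, 197, 481.75, 297, 199.1, 60.3, 391.6, 210.1, 260, 295.8, 117.6, 115.2]
Y = [78892.5, 73557, 102735.5, 89790, 85950, 98826, 103213.5, 64525,
     88910.5, 115900, 72906, 111034, 105024]

rows = []
for m, *parts, heat in zip(MASS, X1, X2, X3, X4, Y):
    fractions = [p / m for p in parts]   # share of each component in the mix
    rows.append((fractions, heat / m))   # heat per gram of cement

for fractions, heat_per_gram in rows:
    print([round(f, 3) for f in fractions], round(sum(fractions), 3), round(heat_per_gram, 2))
```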
6. Read the Dimensional Analysis Notes that are on the e-campus page for our class. Do
problems 3 and 4 on page 9.
7. Crater Ejecta Scaling Laws article. (a) Do the dimensional analysis of Eq. (1) (the authors never do this). (b) In your own words, derive Eq. (6) from Eq. (1). Pay attention to how the authors do it and why they do it.
8. A computer chip maker is testing several possible ways of producing a new chip. Nine wafers, placed at various locations in the chamber where chemicals are deposited on the wafers, are used to measure the value of the particular control factors affecting the deposition. Each wafer is analyzed and its number of defects is counted, giving a total of 9 measurements $y_i$, $i = 1, \dots, 9$, for each particular process. In general, a wafer is considered acceptable if its number of defects is less than or equal to $n_0$. Using the numbers $y_i$, determine a quantity based on them that will be used to pick the best process for producing wafers. Find two possible choices for such a quantity.
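Two candidate quantities, offered as one possible reading of the problem rather than the intended answer: the fraction of the nine wafers that are acceptable, and Taguchi's smaller-the-better signal-to-noise ratio $-10\log_{10}\big(\tfrac{1}{n}\sum_i y_i^2\big)$, which penalizes both the average number of defects and its spread. For both, larger is better. The function names are illustrative.

```python
import math


def fraction_acceptable(defects, n0):
    """Share of the wafers with at most n0 defects (larger is better)."""
    return sum(1 for y in defects if y <= n0) / len(defects)


def sn_smaller_the_better(defects):
    """Taguchi smaller-the-better S/N ratio, -10*log10(mean(y^2)) (larger is better)."""
    msd = sum(y * y for y in defects) / len(defects)   # mean squared deviation from 0
    return math.inf if msd == 0 else -10 * math.log10(msd)
```

Each process's nine counts would be passed to both functions, and the processes ranked by the results. The two criteria need not agree, since the first ignores how far above $n_0$ the bad wafers are.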
9. A company sells a product for $500 that is guaranteed to be on target with a variation of at most 2. The company knows that, with no further testing, the production process yields items that are on target on average with a variation of $\sigma = 0.5$. An additional test on each individual unit costs $15 to determine whether it is within the guaranteed specifications. Make a case for or against the cost of further testing using a quadratic loss function.
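A minimal sketch of the quadratic (Taguchi) loss set-up, assuming the guarantee means ±2 around the target and that the $500 price is the loss incurred when a unit sits exactly at that limit. Both readings are assumptions. The identity $E[k(y-m)^2] = k(\sigma^2 + (\mu - m)^2)$ holds for any distribution with mean $\mu$ and standard deviation $\sigma$.

```python
def loss_coefficient(cost_at_limit, tolerance):
    """k in L(y) = k (y - m)^2, chosen so the loss equals cost_at_limit at m +/- tolerance."""
    return cost_at_limit / tolerance ** 2


def expected_loss(k, sigma, bias=0.0):
    """Average quadratic loss E[k (y - m)^2] = k (sigma^2 + bias^2)."""
    return k * (sigma ** 2 + bias ** 2)


k = loss_coefficient(500.0, 2.0)        # assumption: $500 lost at the +/-2 limit
untested = expected_loss(k, sigma=0.5)  # per-unit expected loss with no further testing
print(f"k = {k}, expected loss per unit without testing = ${untested:.2f}")
```

With $\sigma = 0.5$, the ±2 limits lie four standard deviations from target, so a natural next step is to ask how much of this expected loss the $15 test could actually remove.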
10. A medical company produces a part that has a hole measuring .50 ± .050 cm. The tooling used to make the hole is worn and needs replacing, but management doesn’t feel it is necessary since it still makes “good parts”. All parts pass QC, but several parts have been rejected by assembly. The failure cost per part is $45.00. A new tool costs $10,000. Experience indicates that a worn tool can produce around 3000 holes that are within specs. Using the quadratic loss function, explain why it may benefit the company and the customer to replace the tool more frequently.
Typical data giving the precise diameter of the hole follow (in cm):
{.459,.462,.467,.474,.476,.478,.483,.489,.491,.492,.495,.495,.495,.498,.500,.501,.501,.502,.505,
.509,.511,.516,.521,.524,.527,.527,.532,.532,.533,.536}
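A sketch of the quadratic loss calculation for this sample, assuming the $45 failure cost is the loss when a hole sits exactly at the ±0.050 cm limit, so that $k = 45/0.050^2$. The average loss per part is then $k$ times the mean squared deviation from target. That figure can be set against the $10,000 tool cost over the roughly 3000 holes a worn tool produces.

```python
import statistics

TARGET = 0.50          # cm
TOLERANCE = 0.050      # cm
FAILURE_COST = 45.00   # $ per failed part, used here as the loss at the tolerance limit

diameters = [
    .459, .462, .467, .474, .476, .478, .483, .489, .491, .492, .495, .495, .495, .498, .500,
    .501, .501, .502, .505, .509, .511, .516, .521, .524, .527, .527, .532, .532, .533, .536,
]

k = FAILURE_COST / TOLERANCE ** 2                                  # $ per cm^2
msd = sum((d - TARGET) ** 2 for d in diameters) / len(diameters)  # mean squared deviation
loss_per_part = k * msd

print(f"mean = {statistics.mean(diameters):.4f} cm, sd = {statistics.stdev(diameters):.4f} cm")
print(f"all within spec: {all(abs(d - TARGET) <= TOLERANCE for d in diameters)}")
print(f"expected loss per part = ${loss_per_part:.2f}")
print(f"over 3000 holes = ${3000 * loss_per_part:,.0f}  (new tool: $10,000)")
```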
11. Consider the experiment given in the table, with three factors A, B, and C, each with two levels. Which factor affects the loss function L[y] the most, and what settings would minimize L[y]? (Use the main effects model $L[y] = \mu + a_i + b_j + c_k + \varepsilon$.)
A (column 1)  B (column 2)  C (column 3)  L[y]
A1            B1            C1            3.567
A1            B2            C2            2.145
A2            B1            C2            1.678
A2            B2            C1            2.564
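Because the four runs form an orthogonal array, each level mean of one factor averages over both levels of the other two. Comparing level means therefore estimates the main effects $a_i$, $b_j$, $c_k$. The sketch assumes the L[y] values belong to the rows in the order listed in the table.

```python
# (settings, observed L[y]) for the four rows of the table
RUNS = [
    ({"A": "A1", "B": "B1", "C": "C1"}, 3.567),
    ({"A": "A1", "B": "B2", "C": "C2"}, 2.145),
    ({"A": "A2", "B": "B1", "C": "C2"}, 1.678),
    ({"A": "A2", "B": "B2", "C": "C1"}, 2.564),
]


def level_means(runs, factor):
    """Average L[y] at each level of one factor (main-effects analysis)."""
    groups = {}
    for settings, loss in runs:
        groups.setdefault(settings[factor], []).append(loss)
    return {level: sum(v) / len(v) for level, v in groups.items()}


means = {f: level_means(RUNS, f) for f in "ABC"}
spread = {f: max(m.values()) - min(m.values()) for f, m in means.items()}
best = {f: min(m, key=m.get) for f, m in means.items()}   # level with the smaller mean loss
print("level means:", means)
print("largest effect:", max(spread, key=spread.get), "  minimizing settings:", best)
```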
12. The problem is to predict the energy consumption for the winter of 1971 (the first quarter of 1971).
(a) Graph the data and determine a linear model. Model the slight increasing trend with a straight line. Write out the linear model $y = X\beta + \varepsilon$.
Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1965   874         679         616         816
1966   866         700         603         814
1967   843         719         594         819
1968   906         703         634         844
1969   952         745         635         871
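One possible design matrix, offered as an assumption rather than the intended answer: a separate level for each quarter (seasonal dummy variables) plus one common straight-line trend in the time index t = 1 to 20, fitted by least squares with NumPy. The first quarter of 1971 is then t = 25.

```python
import numpy as np

# Rows: 1965-1969; columns: quarters 1-4 (values as given in the table)
DATA = np.array([
    [874, 679, 616, 816],
    [866, 700, 603, 814],
    [843, 719, 594, 819],
    [906, 703, 634, 844],
    [952, 745, 635, 871],
], dtype=float)

y = DATA.ravel()                 # chronological order, 1965 Q1 ... 1969 Q4
t = np.arange(1, y.size + 1)     # time index t = 1, ..., 20
quarter = (t - 1) % 4            # 0 = Q1, 1 = Q2, 2 = Q3, 3 = Q4

# Design matrix X: one dummy column per quarter plus a common linear trend in t.
X = np.column_stack([(quarter == q).astype(float) for q in range(4)] + [t.astype(float)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

t_new = 25                       # 1971 Q1 (1970 occupies t = 21, ..., 24)
x_new = np.array([1.0, 0.0, 0.0, 0.0, t_new])
forecast = x_new @ beta
print("quarter levels:", beta[:4].round(2), " trend per quarter:", round(beta[4], 3))
print("forecast for 1971 Q1:", round(forecast, 1))
```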
13. Turkey data. The following data give the weight (y) of turkeys at age (x) raised in Georgia, Virginia, or Wisconsin.
y        13.3  8.9  15.1  10.4  13.1  12.4  13.2  11.8  11.5  14.2  15.4  13.1  13.8
x        28    20   32    22    29    27    28    26    21    27    29    23    25
Origin   G     G    G     G     V     V     V     V     W     W     W     W     W
(a) Write out the linear model $y = X\beta + \varepsilon$ for this problem, using a dummy variable
to model the state of origin and assuming the weight grows linearly with age (
straight line). Use a1 , a2 , a3 (G,V,W) with a2  a1  a3 for the coefficients
corresponding to the states.
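A minimal least-squares sketch, assuming one intercept per state ($a_1, a_2, a_3$ for G, V, W) and a common slope $b$ in age. The constraint on the $a$'s stated in the problem may call for a different but equivalent parameterization of the same fitted lines.

```python
import numpy as np

y = np.array([13.3, 8.9, 15.1, 10.4, 13.1, 12.4, 13.2, 11.8, 11.5, 14.2, 15.4, 13.1, 13.8])
age = np.array([28, 20, 32, 22, 29, 27, 28, 26, 21, 27, 29, 23, 25], dtype=float)
origin = np.array(list("GGGGVVVVWWWWW"))

# Columns of X: dummies for G, V, W (a_1, a_2, a_3) and the common slope b on age.
X = np.column_stack([(origin == s).astype(float) for s in "GVW"] + [age])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

for name, value in zip(["a1 (G)", "a2 (V)", "a3 (W)", "b (age)"], coef):
    print(f"{name}: {value:.4f}")
print("residual sum of squares:", float(residuals @ residuals))
```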
k
k
14. Plot y(k)  cos 22.4
 .5cos 23.1
 .3cos 21 k k  1,2,...50. Use this data to calculate the
Sample Spectrum.
15. Let $w_t$ be a sequence of independent, normally distributed random variables with mean zero and variance one for every $t$. Use any program to graph a realization of $w_t$ as a function of $t$ for $t = 1, 2, \dots, 100$. Use these data to calculate the Sample Spectrum.
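A sketch of the sample spectrum (periodogram) computed with NumPy's FFT. It assumes the common definition $I(f_j) = \big|\sum_t (x_t - \bar{x})\, e^{-2\pi i f_j t}\big|^2 / n$ at the Fourier frequencies $f_j = j/n$. Textbooks differ by constant factors, so compare the shape of the spectrum rather than its scale. The function name `sample_spectrum` and the random seed are illustrative.

```python
import numpy as np


def sample_spectrum(x, fs=1.0):
    """Periodogram |sum_t (x_t - xbar) exp(-2 pi i f t)|^2 / n at the Fourier frequencies.

    fs is the sampling rate, so frequencies are in cycles per unit time
    (cycles per observation when fs = 1).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    dft = np.fft.rfft(x - x.mean())
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, np.abs(dft) ** 2 / n


rng = np.random.default_rng(seed=1)
w = rng.standard_normal(100)       # one realization of w_t, t = 1, ..., 100
freqs, spec = sample_spectrum(w)
print("frequency of largest ordinate:", freqs[np.argmax(spec)], " mean ordinate:", spec.mean())
```

For white noise the ordinates scatter around a flat level with no persistent peak. For the 10 Hz vibration data in problem 16, `sample_spectrum(x, fs=10.0)` reports frequencies in Hz, up to the Nyquist frequency of 5 Hz.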
16. A huge multinational company, Gans Dynamics, manufactures spacecraft as well as breakfast cereal. A major problem in the cereal production process is contamination of the cereal by metal pieces, either pieces that entered the process before the start or pieces that have broken off the many metal rollers involved in the processing. Rollers with an imperfection can also cause problems.
To monitor the process, sensors that record vibrations are positioned throughout the line to detect abnormal vibrations. The many rollers rotate at various speeds, the highest being a little less than 5 revolutions per second. Data from the sensors are collected at a rate of 10 Hz.
Two files with n = 1000 data points each (100 seconds of data) are given on e-campus. One file corresponds to normal operation and the other to contaminated operation. Graph the data and use spectral analysis to determine the sample spectra. Discuss the results, in particular the “contaminated spectrum” and what it implies. Of particular importance are the frequencies that are highlighted in the contaminated spectrum, since they could help locate the source of the problem.
17. An experiment is done to see whether a coin is fair (the probability of heads is 0.5). In one experiment the coin is tossed 100 times and heads show up 57 times. Use the likelihood ratio test and the result that $-2\log\lambda$ is approximately $\chi^2_1$ (when $n$ is large) to test $p_0 = 0.5$. Do the same calculation for $n = 1000$ with 570 heads. Why do you think (use common sense) the conclusions are different?
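A minimal sketch of the binomial likelihood ratio statistic, with $\lambda = L(p_0)/L(\hat{p})$ and $\hat{p} = x/n$, so that $-2\log\lambda = 2\big[x\log\frac{x}{np_0} + (n-x)\log\frac{n-x}{n(1-p_0)}\big]$. The upper-tail probability of $\chi^2_1$ is computed as $\mathrm{erfc}(\sqrt{\text{stat}/2})$, which is exact for one degree of freedom. The function name is illustrative.

```python
import math


def lrt_binomial(heads, n, p0=0.5):
    """Return (-2 log lambda, chi-square(1) p-value) for H0: p = p0."""
    tails = n - heads
    stat = 0.0
    if heads > 0:
        stat += heads * math.log(heads / (n * p0))
    if tails > 0:
        stat += tails * math.log(tails / (n * (1 - p0)))
    stat *= 2
    p_value = math.erfc(math.sqrt(stat / 2))   # P(chi-square with 1 df > stat)
    return stat, p_value


for heads, n in [(57, 100), (570, 1000)]:
    stat, p = lrt_binomial(heads, n)
    print(f"n = {n}: -2 log lambda = {stat:.3f}, p-value = {p:.4g}")
```

With the same proportion of heads, the statistic for $n = 1000$ is exactly ten times the one for $n = 100$.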
18. Suppose a great basketball player is a terrible free-throw shooter. The coach knows that, on average, “Mounty” makes about 40% of his free throws, but she wants to know what percentage of free throws Mounty will make given that he either missed or made the one before. Here are the data from one game, from start to finish: 1-0-0-1-0-1-1-0-0-0-1. Can you help the coach answer her question?
Name:___________________________________
Major/Year:
Statistics Background:
Software for doing Mathematics/spreadsheets:
What do you expect from a Math Modelling course?