Fitting data to a distribution: Arena>tools>input analyzer >file>new

In Arena, choose Tools > Input Analyzer, then File > New. An input window will open with a new
control bar. Go to File > Data File > Use Existing to load your data. Your data will most likely be in
an Excel file; save it as a tab-delimited text (.txt) file first (read the help in Input Analyzer if you get
lost). A histogram and other information will appear. Choose Fit > Fit All and a distribution summary
will appear. The higher the p-value, the better the fit.
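If you would rather script the conversion than use Excel's Save As, the text file can be written directly. This is a minimal sketch, assuming the Input Analyzer reads plain numeric values separated by white space (tabs, spaces, or line breaks). The function names, the file name, and the sample values are hypothetical.

```python
import os
import tempfile

def write_input_analyzer_file(values, path, per_line=1):
    """Write numeric values as tab-delimited text, `per_line` values per row."""
    with open(path, "w", newline="\n") as f:
        for i in range(0, len(values), per_line):
            row = values[i:i + per_line]
            f.write("\t".join(repr(float(v)) for v in row) + "\n")

def read_values(path):
    """Read the file back, splitting on any white space."""
    with open(path) as f:
        return [float(token) for token in f.read().split()]

if __name__ == "__main__":
    sample = [4.2, 1.7, 9.3, 0.8, 3.5, 6.1]  # made-up observations
    path = os.path.join(tempfile.gettempdir(), "service_times.txt")  # hypothetical name
    write_input_analyzer_file(sample, path, per_line=3)
    # Round-trip check: the values read back should equal the values written.
    print(read_values(path) == sample)
```

Reading the file back before loading it into the Input Analyzer is a cheap way to catch stray headers or non-numeric cells.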
After the data file has been loaded and displayed as a histogram in a data fit window, the next
step is to fit a probability distribution function to the data. To do this, first select the Fit menu
item. A drop-down menu displays all of the available distribution functions. Note that the
Poisson distribution will be inactive unless the Input Analyzer detects all integer data.
Next select the desired probability distribution function. The Input Analyzer will then determine
the parameters that will fit the distribution function to the data. As soon as the curve-fitting
calculations are complete, the resulting probability density function is drawn on top of the
histogram. (In the case of the empirical distribution, the cumulative distribution function is
shown instead.) Information characterizing the curve-fit, including an expression that could be
included in an Arena model, is shown in the bottom section of the window.
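To make the fitting step concrete, here is a minimal sketch of estimating parameters for one distribution outside Arena. It assumes a normal distribution fitted from the sample mean and sample standard deviation; the Input Analyzer's own estimation method is not described in these notes and may differ. The function names, the sample data, and the formatting of the NORM(mean, std) expression string are illustrative only.

```python
import math
import statistics

def fit_normal(data):
    """Estimate normal parameters from the sample mean and standard deviation."""
    mean = statistics.fmean(data)
    std = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return mean, std

def normal_pdf(x, mean, std):
    """Density of the fitted normal curve at x (the curve drawn over a histogram)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def arena_style_expression(mean, std):
    """Format the fit as an expression string (formatting is an assumption)."""
    return f"NORM({mean:.3g}, {std:.3g})"

if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
    mean, std = fit_normal(data)
    print(arena_style_expression(mean, std))
    print(normal_pdf(mean, mean, std))  # height of the curve at its peak
```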
More detailed information, including tabulations of the probability densities (histogram and
probability distribution function) and the corresponding cumulative distributions, is provided in a
text file that is written to the default directory with the file name [distribution].out, where
[distribution] is the name of the selected distribution function. For example, if the exponential
distribution was chosen, the information would be written to a file called expon.out. In addition,
a summary of all distributions fitted to your data file (e.g., myfile.dst) will be written to a
summary file of the same name, with the extension .sum (e.g., myfile.sum). These text files may
be viewed within the Input Analyzer by choosing the Window menu option and clicking on
Curve Fit Summary. For more information on these functions, refer to the Viewing Tabular Data
section.
The results of the Fit All calculations should be interpreted as guidelines rather than precise
scientific calculations, since the relative rankings can be affected by the number of intervals
within the histogram or choice of histogram end points. Thus, if two or more distribution
functions show small square errors that are relatively close to each other, it is not clear that the
function with the smallest square error is necessarily "the best." It often happens that multiple
distribution functions offer satisfactory representations for a given data file, and the final choice
might be determined by other factors, such as the results of the goodness-of-fit tests or the
computational efficiency of the functions within Arena. On the other hand, the results of the Fit
All calculations do allow you to distinguish clearly between those functions that fit the data well
and those that do not.
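The dependence on histogram settings can be seen with a small experiment. The sketch below assumes the square error is the sum, over histogram intervals, of the squared difference between the observed relative frequency and the probability the fitted distribution assigns to that interval; the Input Analyzer's exact formula is not given in these notes, so treat that definition as an assumption. The exponential fit by sample mean, the function names, and the generated data are likewise illustrative.

```python
import math
import random

def exponential_cdf(x, mean):
    """CDF of an exponential distribution with the given mean."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0

def square_error(data, cdf, n_intervals, low, high):
    """Sum of squared differences between observed relative frequencies
    and fitted interval probabilities, over equal-width intervals."""
    width = (high - low) / n_intervals
    counts = [0] * n_intervals
    used = 0
    for x in data:
        if low <= x <= high:  # data outside the bounds is ignored
            k = min(int((x - low) / width), n_intervals - 1)
            counts[k] += 1
            used += 1
    if used == 0:
        raise ValueError("no data inside the histogram bounds")
    error = 0.0
    for k, count in enumerate(counts):
        a = low + k * width
        b = a + width
        observed = count / used
        expected = cdf(b) - cdf(a)
        error += (observed - expected) ** 2
    return error

if __name__ == "__main__":
    random.seed(1)
    data = [random.expovariate(1.0 / 5.0) for _ in range(200)]  # synthetic data
    mean = sum(data) / len(data)
    low, high = math.floor(min(data)), math.ceil(max(data))
    for n in (5, 10, 20, 40):
        err = square_error(data, lambda x: exponential_cdf(x, mean), n, low, high)
        print(n, err)
```

Running the loop with different interval counts shows the same fitted curve producing different square errors, which is why close rankings should not be over-interpreted.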
Histogram command (Options, Parameters menu)
In the Options, Parameters menu, choosing the Histogram option will make a dialog appear that
allows you to change the number of intervals, the lower bound (ignoring all data below this
bound), and the upper bound (ignoring all data above this bound). The number of intervals must
be at least 5 and not more than 40. In addition, the histogram lower bound must be greater than
or equal to the largest integer that does not exceed the minimum data value in the file. The
histogram upper bound, then, must be less than or equal to the smallest integer that equals or
exceeds the maximum data value in the file.
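The limits described above can be written as a small check. This is a sketch of the stated rules only, not of the dialog's actual validation code; the function name and the sample data are hypothetical, and the final lower-below-upper check is an added sanity test that the text does not mention.

```python
import math

def check_histogram_parameters(data, n_intervals, lower, upper):
    """Return a list of rule violations (empty if the settings are allowed)."""
    problems = []
    if not 5 <= n_intervals <= 40:
        problems.append("number of intervals must be between 5 and 40")
    min_lower = math.floor(min(data))  # largest integer not exceeding the minimum
    max_upper = math.ceil(max(data))   # smallest integer equal to or above the maximum
    if lower < min_lower:
        problems.append(f"lower bound must be >= {min_lower}")
    if upper > max_upper:
        problems.append(f"upper bound must be <= {max_upper}")
    if lower >= upper:
        # Not stated in the text; added here as a sanity check.
        problems.append("lower bound must be below upper bound")
    return problems

if __name__ == "__main__":
    data = [2.3, 7.8, 4.1]  # made-up observations: allowed bounds are 2 to 8
    print(check_histogram_parameters(data, 10, 2, 8))
    print(check_histogram_parameters(data, 4, 1, 9))
```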
If a histogram parameter is changed after a distribution function has been selected, a new curve fit
will automatically be carried out utilizing the new histogram parameters.
Distribution command (Options, Parameters menu)
The Distribution option only becomes active once a distribution function (other than Empirical)
has been fitted to the data. If you choose this option, a dialog appears allowing you to change the
distribution function parameters. Once a distribution function parameter has been changed, a new
evaluation of the goodness of the distribution’s fit to the data will be performed.
A p-value over .05 means we do not reject the null hypothesis that the fitted distribution matches the
actual data. If the p-value is .05 or less, we reject the null hypothesis and conclude that the modeled
distribution does not fit the actual distribution. If the test statistic is large relative to the degrees of
freedom, the p-value will be small. Think of the Z test: when Z is large, the probability of reaching that
point is small. In hypothesis testing, we are asking how probable it is to see a result at least this far
from what the null hypothesis predicts if the null hypothesis is true; the further the measured value is
from the null value, the smaller that probability.
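The Z test comparison can be checked numerically. The sketch below uses the standard relationship that, for a standard normal statistic, the two-sided p-value is erfc(|z| / sqrt(2)). The function names are illustrative, and the .05 cutoff is the one used in these notes.

```python
import math

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def decision(p, alpha=0.05):
    """Apply the rule from the notes: reject when p is alpha or less."""
    return "do not reject" if p > alpha else "reject"

if __name__ == "__main__":
    for z in (0.5, 1.0, 1.96, 3.0):
        p = two_sided_p_value(z)
        # Larger |z| gives a smaller p-value.
        print(z, round(p, 4), decision(p))
```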
An attractive feature of the Kolmogorov-Smirnov test is that the distribution of the K-S test statistic
itself does not depend on the underlying cumulative distribution function being tested. Another
advantage is that it is an exact test (the chi-square goodness-of-fit test depends on an adequate sample
size for the approximations to be valid). Despite these advantages, the K-S test has several important
limitations:
1. It only applies to continuous distributions.
2. It tends to be more sensitive near the center of the distribution than at the tails.
3. Perhaps the most serious limitation is that the distribution must be fully specified. That is, if
location, scale, and shape parameters are estimated from the data, the critical region of the K-S
test is no longer valid. It typically must be determined by simulation.
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm 2/2010
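Limitation 3 can be illustrated by simulation. The sketch below computes the K-S statistic as the largest gap between the empirical and hypothesized cumulative distribution functions, then estimates the 5% critical value by Monte Carlo twice: once with the exponential mean fully specified and once with the mean estimated from each sample. The exponential case, the sample size, the replication count, and all function names are assumptions chosen for illustration.

```python
import math
import random

def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF and a hypothesized CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare against the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def expon_cdf(mean):
    """Return the CDF of an exponential distribution with the given mean."""
    def cdf(x):
        return 1.0 - math.exp(-x / mean) if x > 0 else 0.0
    return cdf

def simulated_critical_value(n, estimate_mean, reps=2000, alpha=0.05, seed=0):
    """Monte Carlo estimate of the (1 - alpha) quantile of the K-S statistic
    for exponential samples of size n with true mean 1."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        mean = sum(sample) / n if estimate_mean else 1.0
        stats.append(ks_statistic(sample, expon_cdf(mean)))
    stats.sort()
    return stats[int(math.ceil((1.0 - alpha) * reps)) - 1]

if __name__ == "__main__":
    print("mean specified:", simulated_critical_value(20, estimate_mean=False))
    print("mean estimated:", simulated_critical_value(20, estimate_mean=True))
```

When the mean is estimated from the same data, the fitted curve sits closer to the sample, so the simulated critical value comes out smaller than in the fully specified case. Using the standard K-S table in that situation makes the test reject too rarely.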