CSC445: A Case Study
Case-Based Reasoning

Case-based reasoning
– akin to the human intuitive thinking process
– makes use of analogies or cases of previous experiences when solving problems
– useful in a wide variety of software development domains:
    software quality estimation
    software cost estimation
    software design and reuse
Case-Based Reasoning (cont.)

Working hypothesis for CBR
– modules with similar attributes should belong to the same quality-based group

To obtain a CBR model for a given data set, some parameters have to be assigned
– e.g. nN (the number of nearest neighbors) and c (the threshold in the classification rule)

In order to obtain a preferred model, we have to vary the combinations of parameters, build the models and choose the "best one" manually
Case-Based Reasoning (cont.)

A CBR system comprises three major components:
– a case library
– a similarity function
– a solution algorithm

In a CBR system, program modules related to previously developed systems are stored in a case library
Case-Based Reasoning (cont.)



A similarity function measures the distance
between the current case and all the cases in the
case library.
Modules with the smallest distances from the module
under investigation are considered similar and
designated as the nearest neighbors.
Many similarity functions can be used, such as
– city block, Euclidean & Mahalanobis
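
As a rough illustration of the similarity step, the sketch below computes city block and Euclidean distances and picks the nearest neighbors. The function names, the two-metric case vectors and the value of `n_neighbors` are all hypothetical; this is not the tool's own implementation.

```python
import math

def city_block(a, b):
    """City block (Manhattan) distance: sum of absolute differences."""
    return sum(abs(p - q) for p, q in zip(a, b))

def euclidean(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def nearest_neighbors(current, case_library, n_neighbors, distance=euclidean):
    """Return the n_neighbors (distance, case) pairs closest to `current`."""
    scored = [(distance(current, case), case) for case in case_library]
    scored.sort(key=lambda pair: pair[0])  # smallest distance first
    return scored[:n_neighbors]

# Hypothetical case library: each case is a vector of two software metrics.
case_library = [(10.0, 2.0), (12.0, 3.0), (40.0, 9.0), (11.0, 2.0)]
current_case = (11.0, 3.0)

neighbors = nearest_neighbors(current_case, case_library, n_neighbors=2)
print(neighbors)
```

Passing `distance=city_block` switches the similarity function without changing the neighbor search.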
Case-Based Reasoning (cont.)

Mahalanobis distance

$$
d_{ij} = (c_j - x_i)' \, S^{-1} \, (c_j - x_i)
$$

where
– x_i stands for the current case
– c_j is the jth case in the case library
– the prime (′) implies a transpose
– S is the variance-covariance matrix of the independent variables over the entire case library
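
A minimal pure-Python sketch of this distance, assuming S is estimated as the sample covariance of the case library. The helper names and the small Gaussian-elimination solver are illustrative choices, not part of the original material. Following the formula as written, no square root is taken.

```python
def covariance_matrix(cases):
    """Sample variance-covariance matrix of the columns of `cases`."""
    n, k = len(cases), len(cases[0])
    means = [sum(row[j] for row in cases) / n for j in range(k)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in cases) / (n - 1)
             for b in range(k)] for a in range(k)]

def solve(matrix, rhs):
    """Solve matrix @ y = rhs by Gaussian elimination with partial pivoting."""
    k = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, k + 1):
                aug[r][j] -= factor * aug[col][j]
    y = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        y[i] = (aug[i][k] - sum(aug[i][j] * y[j] for j in range(i + 1, k))) / aug[i][i]
    return y

def mahalanobis(case, current, cov):
    """d_ij = (c_j - x_i)' S^{-1} (c_j - x_i), computed without forming S^{-1}."""
    diff = [c - x for c, x in zip(case, current)]
    return sum(d * y for d, y in zip(diff, solve(cov, diff)))

# Hypothetical case library with two independent variables.
library = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
S = covariance_matrix(library)
print(mahalanobis((2.0, 2.0), (0.0, 0.0), S))
```

Solving S y = (c_j - x_i) rather than inverting S keeps the sketch short. With an identity covariance matrix the result reduces to the squared Euclidean distance.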
Case-Based Reasoning (cont.)

A generalized data clustering classification rule is
used as the solution algorithm of the CBR system

$$
\mathrm{Class}(x_i) =
\begin{cases}
fp, & \text{if } \dfrac{d_{nfp}(x_i)}{d_{fp}(x_i)} > c \\
nfp, & \text{otherwise}
\end{cases}
$$
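
The rule can be sketched in a few lines. Two assumptions are made here that the slide does not state explicitly: the comparison is read as d_nfp / d_fp > c, and d_fp and d_nfp are taken to be the average distances from the current module to its nearest fp and nfp cases. All names and numbers are hypothetical.

```python
def average_distance(distances, n_neighbors):
    """Mean of the n_neighbors smallest distances (assumed definition of d_fp / d_nfp)."""
    nearest = sorted(distances)[:n_neighbors]
    return sum(nearest) / len(nearest)

def classify(d_nfp, d_fp, c):
    """Return 'fp' when d_nfp / d_fp exceeds the threshold c, else 'nfp'.

    Assumes d_fp > 0; a zero distance would need separate handling.
    """
    return "fp" if d_nfp / d_fp > c else "nfp"

# Hypothetical distances from the current module to library cases of each class.
dist_to_fp = [1.0, 2.0, 9.0]
dist_to_nfp = [3.0, 3.0, 4.0]

d_fp = average_distance(dist_to_fp, n_neighbors=2)    # (1.0 + 2.0) / 2
d_nfp = average_distance(dist_to_nfp, n_neighbors=2)  # (3.0 + 3.0) / 2
label = classify(d_nfp, d_fp, c=0.95)
print(label)
```

Under this reading, raising c makes the rule less willing to label a module fp.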
Case-Based Reasoning (cont.)

In the context of a two-group classification model (fault-prone, fp, versus not fault-prone, nfp), two types of misclassifications can occur:
– Type I (nfp module classified as fp)
– Type II (fp module classified as nfp)
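
A small sketch of how the two error rates might be computed from actual and predicted labels. It assumes each rate is normalized by the size of its own class (Type I over all nfp modules, Type II over all fp modules), which is a common convention but is not spelled out in the slides. The labels are made up.

```python
def error_rates(actual, predicted):
    """Return (type_i, type_ii, overall) misclassification rates.

    Type I: nfp module classified as fp; Type II: fp module classified as nfp.
    Assumes both classes occur at least once in `actual`.
    """
    n_nfp = sum(1 for a in actual if a == "nfp")
    n_fp = sum(1 for a in actual if a == "fp")
    type_i_count = sum(1 for a, p in zip(actual, predicted) if a == "nfp" and p == "fp")
    type_ii_count = sum(1 for a, p in zip(actual, predicted) if a == "fp" and p == "nfp")
    type_i = type_i_count / n_nfp
    type_ii = type_ii_count / n_fp
    overall = (type_i_count + type_ii_count) / len(actual)
    return type_i, type_ii, overall

# Hypothetical labels for six modules.
actual = ["nfp", "nfp", "nfp", "nfp", "fp", "fp"]
predicted = ["nfp", "nfp", "nfp", "fp", "fp", "nfp"]

type_i, type_ii, overall = error_rates(actual, predicted)
print(type_i, type_ii, overall)
```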
Case-Based Reasoning (cont.)
For a given nN, an inverse relationship between the Type I and Type II error rates is observed when varying the value of c.
The preferred balance is that the two error rates are approximately equal, with the Type II error rate being as low as possible.

[Figure: Type I and Type II error rates for varying c]

An Example:
preferred balance: C = 0.95, Type I = 23.16%, Type II = 23.14%
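
One possible way to mechanize this selection is sketched below: pick the candidate with the smallest gap between the two error rates and break ties by the lower Type II rate. This is only one reading of "approximately equal"; the slides leave the final choice to manual judgment. Only the C = 0.95 row comes from the example above. The other rows are made-up values.

```python
def preferred_model(results):
    """Pick the candidate whose Type I and Type II rates are closest,
    breaking ties in favour of the lower Type II rate."""
    return min(results, key=lambda r: (abs(r["type_i"] - r["type_ii"]), r["type_ii"]))

# The c = 0.95 row is the slide's example; the other two rows are
# illustrative values invented to show the inverse relationship.
candidates = [
    {"c": 0.90, "type_i": 0.2900, "type_ii": 0.1800},
    {"c": 0.95, "type_i": 0.2316, "type_ii": 0.2314},
    {"c": 1.00, "type_i": 0.1700, "type_ii": 0.3000},
]

best = preferred_model(candidates)
print(best["c"])
```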
Case-Based Reasoning (cont.)
1. Create a new project
2. Choose the fit data set
Cross validation:
In K-fold cross-validation, the original
sample is partitioned into K subsamples.
Of the K subsamples, a single subsample
is retained as the validation data for
testing the model, and the remaining K − 1
subsamples are used as training data. The
cross-validation process is then repeated
K times (the folds), with each of the K
subsamples used exactly once as the
validation data. The K results from the
folds then can be averaged (or otherwise
combined) to produce a single estimation.
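
The K-fold procedure described above can be sketched as follows. The `evaluate` callback is a hypothetical stand-in for "build the model on the training indices and score it on the validation indices". No shuffling is done, so in practice the data would usually be shuffled first.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

def cross_validate(n_samples, k, evaluate):
    """Average the score returned by evaluate(train, validation) over k folds."""
    scores = [evaluate(train, val) for train, val in k_fold_splits(n_samples, k)]
    return sum(scores) / len(scores)

folds = list(k_fold_splits(10, 3))
for train, validation in folds:
    print(validation)
```

Each sample index lands in exactly one validation subsample, matching "each of the K subsamples used exactly once as the validation data".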
Case-Based Reasoning (cont.)
3. Select the metrics (independent
variables) and dependent variable.
4. Choose the model with CBR
Case-Based Reasoning (cont.)
5. Create a new experiment
6. Choose the similarity function
Note: If you choose Mahalanobis
distance, use the “pooled
covariance”
7. Model type should be
“classification”
Case-Based Reasoning (cont.)
Press the “Execute”
button to run the
program.
Case-Based Reasoning (cont.)
Results will be displayed in
this box.
8. Choose the
preferred model based
on the model selection
strategy.
Case-Based Reasoning (cont.)
Which one is the preferred model?
C=0.6 and nN=14 or C=0.6 and nN=15
Type I error rate =27.551%
Type II error rate =28.571%
Case-Based Reasoning (cont.)
9. Once you choose the preferred model, record the parameters you used.
For example, C=0.6 and nN=15.
10. Then apply the selected model (i.e., the selected parameters) to the test data set.
Case-Based Reasoning (cont.)
This is the prediction result on the test data set
Case-Based Reasoning (cont.)
12. Calculate the ECM (expected cost of misclassification).
For example:
ECM = (15×1 + 14×5)/94 = 0.904255
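
The arithmetic in this example can be checked with a short sketch. The reading assumed here is 15 Type I errors at a cost of 1 each, 14 Type II errors at a cost of 5 each, and 94 modules in the test data set. The slide gives only the numbers, so the parameter names are an interpretation.

```python
def expected_cost_of_misclassification(n_type_i, cost_i, n_type_ii, cost_ii, n_modules):
    """ECM = (n_type_i * cost_i + n_type_ii * cost_ii) / n_modules."""
    return (n_type_i * cost_i + n_type_ii * cost_ii) / n_modules

# Assumed reading of the slide's numbers: (15 x 1 + 14 x 5) / 94.
ecm = expected_cost_of_misclassification(15, 1, 14, 5, 94)
print(round(ecm, 6))  # 0.904255
```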
Case-Based Reasoning (cont.)
In terms of Type I and Type II error rates

                                  Fit                          Test
Threshold  Similarity function    Type I  Type II  Overall     Type I  Type II  Overall
1          City Block
1          Euclidean
1          Mahalanobis
2          City Block
2          Euclidean
2          Mahalanobis