
CODING USED FOR FRAYM DATA CASE

1.) Comparing Lat/Long of customer locations with census data locations to extract a third variable
a. Case: % Access to Electricity
>> %Load energy table
>> energy = readtable('Energy_Sources_LATLONG.csv');
>> %Convert the table to a numeric matrix
>> energyMat = table2array(energy);
>>
>> %Offset duplicate values in energyMat so every sample point is distinct
>> %(scatteredInterpolant requires unique sample points)
>> [uvals, ~, uidx] = unique(energyMat, 'stable');
energyMat2 = energyMat; %mostly to copy the class and size
for K = 1 : length(uvals)
    mask = uidx == K;
    energyMat2(mask) = uvals(K) + (0 : nnz(mask) - 1) * 0.01;
end
>>
>> %Nearest-neighbor interpolant whose "value" is the row index of the
>> %closest energy-table point for each query location
>> E = scatteredInterpolant(energyMat2(:,1), energyMat2(:,2), ...
       (1:size(energyMat2,1)).', 'nearest');
>> custEnerNear = E(custProfMat(:,1:2));
>> energyNear = energyMat2(custEnerNear, :);
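The index-returning nearest-neighbor trick above can be checked on a tiny made-up grid; everything below (coordinates, variable names) is illustrative, not from the case data.

```matlab
% Minimal sketch of the index-returning nearest-neighbor lookup, on
% made-up coordinates (all names and values here are illustrative).
censusLatLong = [0 0; 0 1; 1 0; 1 1];       %known (lat, long) sample points
custLatLong   = [0.1 0.2; 0.9 0.8];         %query locations
% The interpolated "value" is the ROW INDEX of the nearest sample point
F = scatteredInterpolant(censusLatLong(:,1), censusLatLong(:,2), ...
    (1:size(censusLatLong,1)).', 'nearest');
nearestIdx  = F(custLatLong);               %index of nearest census point
nearestRows = censusLatLong(nearestIdx, :); %matched census rows
```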
MACHINE LEARNING STEPS
>>C = readtable('customer_profiles_F.csv');
>>%Create categorical array for HighRepayor variable
>>%(>=70% = High, >=30% and <70% = Med, <30% = Low)
>> C.HighRepayors = categorical(C.HighRepayors);
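The threshold rule in the comment above (>=70% High, 30–70% Med, <30% Low) could be produced with discretize; this is a hypothetical sketch, since the raw repayment-rate column is not shown in the session.

```matlab
% Hypothetical sketch: build the three-level label from a numeric
% repayment-rate column (the column name and values are assumptions).
repaymentRate = [85; 50; 12; 70; 29];
edges = [0 30 70 100];                      %[0,30) Low, [30,70) Med, [70,100] High
HighRepayors = discretize(repaymentRate, edges, 'categorical', ...
    {'Low','Med','High'});
```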
%Remove unnecessary variables
>> C(:,1:4) = [];
%Split into training and test datasets
>> cvpt = cvpartition(C.HighRepayors,'Holdout',0.3);
%Create logical vectors containing 'true' for rows used to train the classifier
%and 'false' for rows held out for testing
>> trainingIdx = training(cvpt);
>> testIdx = test(cvpt);
%Create table containing all data to train
>> trainingData = C(trainingIdx,:);
>> testData = C(testIdx,:);
%Fit a k-NN (k-nearest neighbors) classification model to the training data
>> knnMdl = fitcknn(trainingData,'HighRepayors');
%Use model to predict which group the test data belongs to
>> predictedGroups = predict(knnMdl,testData);
%%Evaluate the classification
%Calculate Training and Test Error
>> trainErr = resubLoss(knnMdl)
trainErr =
0
>> testErr = loss(knnMdl,testData)
testErr =
0.1992
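For a default classification model, loss reports the misclassification rate, so the test error above should equal the fraction of wrong predictions computed by hand:

```matlab
% Sketch: recompute the test error as the fraction of misclassified
% test rows (predictedGroups and testData as in the session above).
manualTestErr = mean(predictedGroups ~= testData.HighRepayors);
```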
%To see how the data is misclassified, use a confusion matrix
>> [cm,grp] = confusionmat(testData.HighRepayors,predictedGroups)
cm =
2 3 1
2 48 1
2 3 3
grp =
High
Low
Med
% Calculate rate at which each category was misclassified:
% High as Low
>> misClass = cm(grp=='High',grp=='Low');
>> falseNeg = 100*misClass/height(testData);
>> disp(['Percentage of False Negatives: ',num2str(falseNeg),'%'])
Percentage of False Negatives: 4.6154%
% Low as High
>> misClass = cm(grp=='Low',grp=='High');
>> falseNeg = 100*misClass/height(testData);
>> disp(['Percentage of False Negatives: ',num2str(falseNeg),'%'])
Percentage of False Negatives: 3.0769%
%High as Med
>> misClass = cm(grp=='High',grp=='Med');
>> falseNeg = 100*misClass/height(testData);
>> disp(['Percentage of False Negatives: ',num2str(falseNeg),'%'])
Percentage of False Negatives: 1.5385%
%Med as High
>> misClass = cm(grp=='Med',grp=='High');
>> falseNeg = 100*misClass/height(testData);
>> disp(['Percentage of False Negatives: ',num2str(falseNeg),'%'])
Percentage of False Negatives: 3.0769%
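Rather than repeating the lookup for each label pair, every off-diagonal misclassification rate can be printed in one pass; a sketch using the cm and grp variables from above:

```matlab
% Sketch: report every off-diagonal cell of the confusion matrix as a
% percentage of the test set (cm and grp as returned by confusionmat).
nTest = sum(cm(:));
for i = 1:numel(grp)
    for j = 1:numel(grp)
        if i ~= j && cm(i,j) > 0
            fprintf('%s classified as %s: %.4f%%\n', ...
                char(grp(i)), char(grp(j)), 100*cm(i,j)/nTest);
        end
    end
end
```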
%Cross validation techniques
%Leave-one-out cross-validation (k-fold with k equal to the number of observations)
>> knnMdl2 = fitcknn(C,'HighRepayors','Leaveout','on');
>> mdl2Loss = kfoldLoss(knnMdl2)
mdl2Loss =
0.1689
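Leave-one-out fits one model per observation, which gets expensive on larger tables; 10-fold cross-validation is a common cheaper alternative (the fold count here is an assumption, not from the case):

```matlab
% Sketch: 10-fold cross-validation instead of leave-one-out
knnMdl3 = fitcknn(C,'HighRepayors','KFold',10);
mdl3Loss = kfoldLoss(knnMdl3);
```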
%%Principal Component Analysis (Feature Transformation)
>> [pcs,scrs,~,~,pexp] = pca(C{:,1:end-1});
>> pareto(pexp)
%Visualize the contributions of each variable to the principal components as an image
>> imagesc(abs(pcs(:,1:5)))
>>xlabel('Principal Component')
>>colorbar
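The pareto plot shows the explained variance per component; a common follow-up is to keep the smallest set of components reaching a variance target (the 95% threshold below is an assumption, not from the case):

```matlab
% Sketch: smallest number of components explaining >= 95% of the variance
% (pexp as returned by pca above)
nComp = find(cumsum(pexp) >= 95, 1);
fprintf('Components needed for 95%% variance: %d\n', nComp);
```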
%%Decide on predictor importance to an accurate model
%Some classifiers, such as decision trees, have their own built-in methods of
%feature selection.
%Check the methods associated with the decision tree model. One of them,
%predictorImportance, can be used to identify the predictor variables that are
%important for creating an accurate model.
>> treeMdl = fitctree(trainingData,'HighRepayors');
>> methods(treeMdl)
Methods for class ClassificationTree:
compact
margin
resubMargin
compareHoldout
predict
resubPredict
crossval
predictorImportance
surrogateAssociation
cvloss
prune
view
edge
resubEdge
loss
resubLoss
>> p = predictorImportance(treeMdl);
>>bar(p)
>> trainTreeErr = resubLoss(treeMdl)
trainTreeErr =
0.1039
>> testTreeErr = loss(treeMdl,testData)
testTreeErr =
0.1989
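The two classifiers can be compared side by side using the test errors reported above; a small summary sketch:

```matlab
% Sketch: tabulate the test errors reported in the session above
results = table(["k-NN"; "Decision tree"], [0.1992; 0.1989], ...
    'VariableNames', {'Model','TestError'});
disp(results)
```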