Uploaded by jialu zhang

gaozi

1. My project is about building a machine learning model that predicts whether customers will
be interested in vehicle insurance.
2. Here is the outline of this presentation.
3. The background of my task is that an insurance company wants to know whether its current
customers will be interested in vehicle insurance.
4. We can therefore refine this task into a more specific question: which variables have a
significant effect on customer interest? By calculating correlations, we aim to determine
the effect of one binary factor, whether the vehicle has been damaged or not. The null
hypothesis for this study is that customers with vehicle damage and customers without
vehicle damage show the same response. The alternative hypothesis is that customers with
vehicle damage show a higher response than customers without vehicle damage. In addition,
some relationships between the independent variables are reported in the results.
5. Before modelling, we need to carry out the following preprocessing steps on the training data.
6. We train a logistic regression model and tree-based classification models, such as a
decision tree and a random forest, for this binary classification problem. We set the
parameters manually first, then apply grid search to select parameters that fit better.
7. To evaluate the models, we use a 5% significance level to test reliability; the F1 score,
accuracy score, and AUC score to measure effectiveness; and McNemar's test to compare
different models.
8. However, when two models have similar accuracy scores, accuracy alone cannot separate
them. For this vehicle insurance problem we care most about whether customers are
actually interested, so the higher the precision score, the better. For this reason the
final model is the logistic regression model. Because there is no significant difference
between logistic regression and random forest, the feature importance from the random
forest can be used to test the null hypothesis: vehicle damage is regarded as a much more
important factor in customers purchasing car insurance than no vehicle damage.
9. As a result, combining this finding with the correlations among the variables from the
Spearman correlation, the selling strategy should focus on older customers whose cars
were bought within the past year but have already been damaged and who are looking for a
new insurer, with traditional agents as the main communication channel.
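
The model comparison mentioned in item 7 can be illustrated with a small example. The following is only a minimal sketch of an exact (binomial) McNemar's test written with the Python standard library; it is not the code used in the project. The function name `mcnemar_exact` and the toy labels and predictions are hypothetical and chosen purely for illustration.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar's test for two classifiers on the same test set.

    Returns (b, c, p_value), where
      b = samples model A classified correctly and model B misclassified,
      c = samples model A misclassified and model B classified correctly.
    Samples where both models are right or both are wrong do not enter the test.
    """
    b = c = 0
    for truth, a, bb in zip(y_true, pred_a, pred_b):
        if a == truth and bb != truth:
            b += 1
        elif a != truth and bb == truth:
            c += 1

    n = b + c
    if n == 0:
        # The two models never disagree in correctness: no evidence of a difference.
        return b, c, 1.0

    # Under the null hypothesis, each discordant sample favours A or B with
    # probability 0.5, so b follows Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical toy data: 1 = interested, 0 = not interested.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
pred_logreg = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]  # assumed logistic regression output
pred_forest = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]  # assumed random forest output

b, c, p = mcnemar_exact(y_true, pred_logreg, pred_forest)
alpha = 0.05  # the 5% significance level from item 7
print(f"b={b}, c={c}, p-value={p:.4f}")
print("significant difference" if p < alpha else "no significant difference")
```

A p-value above 0.05 means the test finds no significant difference between the two models, which is the situation described in item 8. In that case another criterion, such as the precision score, is needed to choose between them.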
That concludes my presentation. Thank you.