Uploaded by lijinze0430

Final Project

Overview of the data
• Data info
• 7420 rows of client data
• 15 attributes
• Key attributes
• Balance
• Offer
• Check
• Card
• Data Types
• Continuous
• Nominal
• Target Variable
• Revenue
Data Cleaning and Preparation
• Used missing-data analysis to check for patterns in the missing data.
• Found linear dependencies between predictor variables (heat map) and removed the redundant variables to avoid singularity:
• Loan = CD
• INSUR = MM
• INSUR = Savings
• MM = Savings
• Decided to exclude (hide) one outlying revenue observation.
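The heat-map check above can be approximated in code as a minimal sketch. The column names Loan, CD, and Balance come from the slides, but the values, the `redundant_pairs` helper, and the 0.95 threshold are hypothetical choices for illustration, not the team's actual JMP procedure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def redundant_pairs(columns, threshold=0.95):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical toy values: Loan and CD are identical here, so the pair is
# perfectly correlated, while Balance is nearly unrelated to both.
toy = {
    "Loan": [0, 1, 0, 1, 1, 0],
    "CD": [0, 1, 0, 1, 1, 0],
    "Balance": [120.0, 80.0, 300.0, 45.0, 500.0, 210.0],
}
flagged = redundant_pairs(toy)
print(flagged)
```

Keeping only one variable from each flagged pair is one way to avoid a singular design matrix in the regression step.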
Numerical Variable Highlights
Upon investigation of numerical variables, the team highlighted several key statistics below:
Categorical Variable Highlights
Upon investigation of categorical variables, the team highlighted several key findings below:
Data Transforming – Highly skewed variables
• Revenue and balance are highly skewed; taking the logarithm reduces the impact of extreme values and makes the distributions more symmetric.
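As a rough illustration of why the log transform helps, the sketch below compares the skewness of a right-skewed sample before and after the transform. The balance values are made up, and `log1p` (log of 1 + x) is an assumption used here so that zero values would not break the transform; the slides only say "logarithm".

```python
from math import log1p


def skewness(values):
    """Population skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical balances with one extreme value in the right tail.
balances = [10.0, 20.0, 30.0, 50.0, 80.0, 150.0, 400.0, 5000.0]
logged = [log1p(v) for v in balances]

print("skewness before:", skewness(balances))
print("skewness after: ", skewness(logged))
```

On this toy sample the transformed values are still right-skewed, but much less so than the raw values.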
Model building - Stepwise Linear Regression
• Used stepwise linear regression to build the model because the target, revenue, is a continuous variable.
• Four variables remained in the final model.
• RSquare = 0.5979, RSquare Adj = 0.5976: the two are close, so no overfitting problem was indicated.
• Also fit Lasso and Ridge models; RSquare did not change much.
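The closeness of RSquare and RSquare Adj can be checked with the standard adjusted R-squared formula. This is a sketch that assumes all 7420 rows and the four retained predictors entered the fit; the slides do not state the exact row count used after the excluded observation, so the result may differ from the reported value in the last digit.

```python
def adjusted_r2(r2, n_rows, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_rows - 1) / (n_rows - n_predictors - 1)


# Values from the slides; n_rows = 7420 is an assumption (see lead-in).
r2 = 0.5979
adj = adjusted_r2(r2, n_rows=7420, n_predictors=4)
gap = r2 - adj

print("adjusted R^2:", adj)
print("penalty for 4 predictors:", gap)
```

With thousands of rows and only four predictors, the penalty term is very small, which is why the two statistics nearly coincide.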
Recommendations
Our predictive model identified the factors we can use to predict revenue.
1. Do more promotions to attract customers.
2. Attract customers with high account balances.
3. Having a checking account is associated with lower revenue. Avoid targeting promotions at customers who have a checking account.
What we can do better with more specific data
1. Revenue: total revenue generated by the customer over 6 months. Knowing how often and when each account starts generating profit would help improve accuracy.
2. Avoid Simpson’s paradox by collecting more basic customer information, such as gender and occupation.
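To make the Simpson's paradox concern concrete, here is a small sketch with entirely hypothetical counts (the segment names, group labels, and numbers are invented for illustration and are not drawn from the project data). Within each customer segment the "offer" group has the higher success rate, yet the pooled rate points the other way because the groups are distributed unevenly across segments.

```python
# Hypothetical (successes, customers) counts per segment and group.
data = {
    "segment_a": {"offer": (81, 87), "no_offer": (234, 270)},
    "segment_b": {"offer": (192, 263), "no_offer": (55, 80)},
}


def rate(successes, total):
    """Success rate as a fraction."""
    return successes / total


def pooled_rate(group):
    """Success rate for a group after pooling all segments together."""
    successes = sum(data[seg][group][0] for seg in data)
    total = sum(data[seg][group][1] for seg in data)
    return rate(successes, total)


for seg, groups in data.items():
    print(seg, {g: round(rate(*counts), 3) for g, counts in groups.items()})
print("pooled", {g: round(pooled_rate(g), 3) for g in ("offer", "no_offer")})
```

Segment-level attributes such as gender or occupation are what would allow this kind of reversal to be detected before drawing conclusions from pooled data.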
Overview of the data
• Data info
• 3332 rows of client data
• 19 attributes
• Key attributes
• Churn
• State
• IntPlan
• DayMinutes
• Data Types
• Continuous
• Nominal
• Target Variable
• Churn
Data Cleaning and Preparation
• No missing data found.
• Found strong pairwise relationships among some variables (e.g., IntlCharge & IntMin). After careful examination and online research, decided NOT to remove those variables.
• Plan to use JMP's automatic feature selection to help make the decision.
Data Exploration
After bivariate analysis, several findings drew our attention.
For example, the contingency analysis of Churn by VMPlan shows that a large majority, 85.50%, of individuals did not churn. Individuals without a VMPlan make up most of both the churned and non-churned groups, while only a small proportion, 2.40%, of those with a VMPlan churned. The mosaic plot reinforces this distribution visually, showing that non-churners dominate, especially among those without a VMPlan.
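A contingency table reports more than one kind of percentage, and the sketch below shows how two of them are computed. The records are hypothetical (VMPlan, Churn) pairs, not the project data; the point is that a cell's share of all customers and the churn rate within a plan group are different quantities and should be read separately.

```python
from collections import Counter

# Hypothetical (VMPlan, Churn) records for illustration only.
records = (
    [("no", "no")] * 11
    + [("no", "yes")] * 3
    + [("yes", "no")] * 5
    + [("yes", "yes")] * 1
)

counts = Counter(records)
total = sum(counts.values())

# Total %: each cell as a share of all customers.
total_pct = {cell: 100.0 * n / total for cell, n in counts.items()}


def churn_rate(plan):
    """Row %: share of customers in one plan group who churned."""
    group_size = sum(n for (p, _), n in counts.items() if p == plan)
    return 100.0 * counts[(plan, "yes")] / group_size


print("share of all customers with a plan who churned:", total_pct[("yes", "yes")])
print("churn rate among plan holders:", churn_rate("yes"))
print("churn rate among non-holders:", churn_rate("no"))
```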
Overall, there is a discernible trend linking customer
service interactions and churn rates: as the number of
customer service calls increases, so does the likelihood
of churn. This could suggest dissatisfaction or recurring
issues among certain customers. Furthermore, the
relationship between 'IntlPlan' and churn is similar to the
earlier 'VMPlan' observation: users without an 'IntlPlan'
have a higher propensity to churn. Interestingly,
geographical factors (State) also influence churn, but the
impact varies per state, possibly due to regional service
quality, marketing campaigns, or other location-specific
factors. In summary, service quality, plan offerings, and
regional factors all play a role in influencing customer
retention.
Data Transforming – Highly skewed variables
Used a log transformation to normalize highly skewed data: NVMailMsgs became Log[NVMailMsgs], which makes the distribution easier to read.