Overview of the data
• Data info
  • 7420 rows of client data
  • 15 attributes
• Key attributes
  • Balance
  • Offer
  • Check
  • Card
• Data Types
  • Continuous
  • Nominal
• Target Variable
  • Revenue

Data Cleaning and Preparation
• Used missing data analysis to check for patterns in the missing data.
• Found linear dependencies between predictor variables (Heat Map). Removed the redundant variables to avoid singularity.
  • Loan = CD
  • INSUR = MM
  • INSUR = Savings
  • MM = Savings
• Decided to exclude/hide one outlying revenue data point.

Numerical Variable Highlights
Upon investigation of the numerical variables, the team highlighted several key statistics below:

Categorical Variable Highlights
Upon investigation of the categorical variables, the team highlighted several key data points below:

Data Transforming – Highly skewed variables
• Revenue and balance are highly skewed. Taking the logarithm helps reduce the impact of extreme values and makes the distribution more symmetric.

Model building - Stepwise Linear Regression
• Used stepwise linear regression to build the model because the target, revenue, is a continuous variable.
• Four variables remained in the final model.
• RSquare = 0.5979, RSquare Adj = 0.5976. The two values are nearly identical, so there is no sign of overfitting.
• Also fitted Lasso and Ridge models; RSquare did not change much.

Recommendations
Our predictive model identified which factors can be used to predict revenue.
1. Run more promotions to attract customers.
2. Attract customers with high account balances.
3. Having a checking account is associated with lower revenue. Avoid targeting promotions at customers who have a checking account.

What can we do better with specific data
1. Revenue: total revenue generated by the customer over 6 months. Knowing how often and when each account starts generating profit would help improve accuracy.
2. Avoid Simpson's paradox by collecting more basic customer information, such as gender and occupation.
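
As a rough check on the "no overfit" reading in the model-building slide, adjusted R-squared can be recomputed from the reported RSquare. This is a minimal sketch, not the JMP output: it assumes all 7420 rows were used in the fit and that the four remaining variables correspond to exactly four estimated parameters (a nominal variable with several levels would add more). The function name `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2, n_rows, n_predictors):
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    return 1 - (1 - r2) * (n_rows - 1) / (n_rows - n_predictors - 1)


# Values taken from the slides; the row and predictor counts are assumptions
# about what actually entered the fit.
r2 = 0.5979
n_rows = 7420
n_predictors = 4

adj = adjusted_r2(r2, n_rows, n_predictors)
gap = r2 - adj  # a small gap means the penalty for model size is negligible

print(f"adjusted R2 = {adj:.4f}, gap = {gap:.4f}")
```

Under these assumptions the result lands within about 0.0001 of the reported RSquare Adj. With thousands of rows and only a handful of predictors, the penalty term is tiny, which is why the two statistics are nearly equal.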
Overview of the data
• Data info
  • 3332 rows of client data
  • 19 attributes
• Key attributes
  • Churn
  • State
  • IntPlan
  • DayMinutes
• Data Types
  • Continuous
  • Nominal
• Target Variable
  • Churn

Data Cleaning and Preparation
• No missing data found.
• Found strong pairwise relationships among some variables (ex: IntlCharge & IntMin). After careful review and online research, decided NOT to remove those variables.
• Plan to use JMP's automatic variable-selection features to help make the decision.

Data Exploration
After bivariate analysis, several findings drew our attention. For example:
The Contingency Analysis of Churn by VMPlan reveals that a predominant majority, 85.50%, of individuals did not churn. Individuals without a VMPlan make up most of both the churned and non-churned categories, while a notably small proportion, only 2.40%, of those with a VMPlan churned. The Mosaic Plot visually reinforces this distribution, highlighting the pronounced presence of non-churners, especially among those without a VMPlan.
Overall, there is a discernible trend linking customer service interactions and churn rates: as the number of customer service calls increases, so does the likelihood of churn. This could suggest dissatisfaction or recurring issues among certain customers. Furthermore, the relationship between 'IntlPlan' and churn is similar to the earlier 'VMPlan' observation: users without an 'IntlPlan' have a higher propensity to churn. Interestingly, geographical factors (State) also influence churn, but the impact varies by state, possibly due to regional service quality, marketing campaigns, or other location-specific factors.
In summary, service quality, plan offerings, and regional factors all play a role in influencing customer retention.

Data Transforming – Highly skewed variables
• Used the 'Log' method to normalize highly skewed data.
• Transformed NVMailMsgs to Log[NVMailMsgs] to make its distribution easier to read.
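
The effect of the log transform on a skewed variable can be sketched as follows. This is an illustration only: the `sample` values are made-up placeholder counts, not NVMailMsgs data, and the helper names are hypothetical. The sketch uses `math.log1p` (the log of 1 + x) instead of a plain log, on the assumption that a count column may contain zeros, for which a plain log is undefined; the slides do not say how zeros were handled.

```python
import math


def skewness(values):
    """Population skewness: third central moment over variance to the 3/2 power."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform(values):
    """Apply log(1 + x) so that zero counts stay defined (an assumption)."""
    return [math.log1p(v) for v in values]


# Hypothetical right-skewed counts, for illustration only.
sample = [0, 1, 2, 2, 3, 4, 5, 8, 15, 40, 120]

raw_skew = skewness(sample)
log_skew = skewness(log_transform(sample))

print(f"skewness before: {raw_skew:.2f}, after log1p: {log_skew:.2f}")
```

On this toy sample the skewness drops noticeably after the transform, because the logarithm compresses the few very large values much more than the many small ones.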