1. Provide a concise description of your data. For example, what is the average age of
the customers, employment status, percentage of males/females, average revenue,
maximum amount spent on excursions/entertainment/spa services? What
percentage of customers in the dataset intend to rebook? Feel free to consider
additional relevant variables and consider visualizing your results using graphs.
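One way to produce the summary asked for in question 1 is sketched below. The records, field names (`age`, `gender`, `employed`, `revenue`, `spa_spend`, `rebook`) and values are hypothetical stand-ins, not the actual dataset. The function only illustrates which statistics to compute.

```python
from collections import Counter
from statistics import mean

# Toy records; every field name and value here is a hypothetical stand-in.
customers = [
    {"age": 34, "gender": "F", "employed": "yes", "revenue": 1200.0, "spa_spend": 80.0, "rebook": 1},
    {"age": 52, "gender": "M", "employed": "yes", "revenue": 800.0, "spa_spend": 0.0, "rebook": 0},
    {"age": 41, "gender": "F", "employed": "no", "revenue": 1500.0, "spa_spend": 150.0, "rebook": 1},
    {"age": 29, "gender": "F", "employed": "yes", "revenue": 600.0, "spa_spend": 40.0, "rebook": 1},
    {"age": 63, "gender": "M", "employed": "no", "revenue": 950.0, "spa_spend": 0.0, "rebook": 0},
    {"age": 47, "gender": "M", "employed": "yes", "revenue": 1100.0, "spa_spend": 60.0, "rebook": 0},
]


def describe_customers(rows):
    """Return the descriptive statistics named in question 1."""
    n = len(rows)
    gender_counts = Counter(row["gender"] for row in rows)
    employed_counts = Counter(row["employed"] for row in rows)
    return {
        "mean_age": mean(row["age"] for row in rows),
        "pct_female": 100.0 * gender_counts["F"] / n,
        "pct_male": 100.0 * gender_counts["M"] / n,
        "pct_employed": 100.0 * employed_counts["yes"] / n,
        "mean_revenue": mean(row["revenue"] for row in rows),
        "max_spa_spend": max(row["spa_spend"] for row in rows),
        "pct_rebook": 100.0 * sum(row["rebook"] for row in rows) / n,
    }


summary = describe_customers(customers)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

With the real data, the same statistics could be reported in a short table and supported by a histogram of age or a bar chart of rebooking intent.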
2. Use intuition (bonus points: theory) to select variables in your dataset that might
influence the rebooking probability. Clearly explain why you think a variable might
have an effect.
Correlation size thresholds (x = absolute value of the correlation):
Small: x ≤ .3
Medium: .3 < x < .5
Large: x ≥ .5
Top 3 correlations with rebooking by size (all negative):
Age (-0.306): the older customers get, the less likely they are to travel.
Membership tier (-0.291): high-tier members might have already used the service a lot.
Months since first visit (-0.231): customers who first visited long ago may no longer be interested in the hotel.
Top 3 positive correlations by size:
Last visit to SG property (0.159)
No. of times the customer has booked (0.146)
Max time spent in hotel spa (0.105)
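The size thresholds can be applied to the reported correlations as in the sketch below. It assumes the thresholds refer to the absolute value of the correlation, so that negative and positive correlations are judged on the same scale. The function name `effect_size` is illustrative.

```python
def effect_size(r):
    """Label a correlation by its absolute size using the thresholds above."""
    size = abs(r)  # assumption: thresholds apply to the absolute value
    if size <= 0.3:
        return "small"
    if size < 0.5:
        return "medium"
    return "large"


# Correlations with rebooking, as reported above.
correlations = {
    "Age": -0.306,
    "Membership tier": -0.291,
    "Months since first visit": -0.231,
    "Last visit to SG property": 0.159,
    "No. of times the customer has booked": 0.146,
    "Max time spent in hotel spa": 0.105,
}

# Rank by absolute size, largest first.
ranked = sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)
for name, r in ranked:
    print(f"{name}: r = {r:+.3f} ({effect_size(r)})")
```

Under this reading, only Age falls in the medium range, and only just; the other five correlations are small.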
3. Estimate a binary logistic regression model with the variables selected under 2. What
is the hit rate? Which variables have a significant effect on rebooking? How do you
interpret the signs of the significant coefficients?
Hit rate: 52.8%
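The hit rate is the percentage of customers whose predicted outcome matches their actual outcome. The sketch below shows how it could be computed from a fitted logistic model, assuming a 0.5 classification cutoff. The coefficients, the two predictors (age and number of bookings) and the toy rows are hypothetical; they are not the estimates behind the figure reported above.

```python
import math


def predict_probability(intercept, coefs, row):
    """Logistic model: p = 1 / (1 + exp(-(b0 + b1*x1 + ...)))."""
    z = intercept + sum(b * x for b, x in zip(coefs, row))
    return 1.0 / (1.0 + math.exp(-z))


def hit_rate(actual, probs, cutoff=0.5):
    """Percentage of observations classified correctly at the given cutoff."""
    predicted = [1 if p >= cutoff else 0 for p in probs]
    hits = sum(a == y for a, y in zip(actual, predicted))
    return 100.0 * hits / len(actual)


# Hypothetical estimates: intercept, then coefficients for age and times booked.
intercept = 1.0
coefs = [-0.05, 0.4]

# Toy observations: [age, times booked] and the actual rebooking outcome.
rows = [[30, 2], [65, 1], [45, 3], [70, 0], [25, 0]]
actual = [1, 0, 1, 0, 0]

probs = [predict_probability(intercept, coefs, row) for row in rows]
rate = hit_rate(actual, probs)
print(f"Hit rate: {rate:.1f}%")

# Sign interpretation: exp(b) < 1 lowers the odds of rebooking, exp(b) > 1 raises them.
for name, b in zip(["age", "times booked"], coefs):
    print(f"{name}: coefficient {b:+.2f}, odds ratio {math.exp(b):.3f}")
```

A hit rate is usually judged against a naive benchmark such as always predicting the more common outcome, so the share of rebookers in the data is worth reporting alongside it.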