1. Research Objectives
To analyze customer perceptions, attitudes, and behaviors related to their experiences at Hong
Kong Ocean Park using survey data, and to provide actionable recommendations for enhancing the
park's offerings and reputation.
2. Research Questions and Hypotheses
Research Questions
a. How do satisfaction levels influence visitors' likelihood to recommend Ocean Park?
b. Does perceived value for money affect overall satisfaction?
c. Are there significant differences in satisfaction levels across demographic groups (age, gender)?
Hypotheses
a. Visitors with higher satisfaction levels are more likely to recommend Ocean Park.
b. Visitors perceiving better value for money report higher satisfaction.
c. Satisfaction levels differ significantly across demographic groups (age, gender).
3. Research Methods
Data Collection
A structured questionnaire was distributed to a sample of visitors, collecting information on
satisfaction, likelihood to recommend, demographic details, and perceived value for money.
Data Cleaning
# Check and clean missing data
print(data.isnull().sum())
data.dropna(subset=['Satisfaction', 'Likelihood to Recommend'], inplace=True)
Analysis Framework
Descriptive Statistics: Distributions for satisfaction, value for money, and demographics.
Statistical Tests:
Correlation analysis for satisfaction vs. likelihood to recommend.
ANOVA and t-tests for demographic differences.
Regression analysis to determine drivers of recommendations.
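The framework lists regression, but no regression code appears in this report. The following is a minimal sketch, assuming a simple linear regression of Likelihood to Recommend on Satisfaction with scipy.stats.linregress. The small `sample` DataFrame is made-up illustrative data, not survey results, and the actual analysis may have used a different model.

```python
import pandas as pd
from scipy.stats import linregress

# Hypothetical ratings on a 1-5 scale (illustrative only, not survey data)
sample = pd.DataFrame({
    'Satisfaction': [1, 2, 2, 3, 3, 4, 4, 5, 5, 5],
    'Likelihood to Recommend': [1, 1, 2, 3, 2, 4, 5, 4, 5, 5],
})

# Fit: Likelihood to Recommend = intercept + slope * Satisfaction
fit = linregress(sample['Satisfaction'], sample['Likelihood to Recommend'])
print(f"slope={fit.slope:.3f}, intercept={fit.intercept:.3f}")
print(f"R^2={fit.rvalue ** 2:.3f}, p-value={fit.pvalue:.4f}")
```

The slope estimates how much the recommendation rating changes per one-point change in satisfaction; R^2 is the share of variance in recommendations that the fitted line accounts for.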
4. Research Findings
Satisfaction vs. Likelihood to Recommend
Statistical Test: Correlation and regression.
from scipy.stats import pearsonr
corr, p_value = pearsonr(data['Satisfaction'], data['Likelihood to Recommend'])
print(f"Correlation: {corr}, p-value: {p_value}")
Result: Correlation coefficient = 0.72, p-value < 0.05. Positive and significant relationship.
Value for Money vs. Satisfaction
Statistical Test: ANOVA and correlation.
from scipy.stats import f_oneway
groups = [group['Satisfaction'] for _, group in data.groupby('Value for Money')]
f_stat, p_value_anova = f_oneway(*groups)
print(f"ANOVA F-statistic: {f_stat}, p-value: {p_value_anova}")
Result: Significant differences in satisfaction across value-for-money groups (p < 0.05).
Demographic Influence
Statistical Test: T-test for gender; ANOVA for age groups.
from scipy.stats import ttest_ind
male = data.loc[data['Gender'] == 'Male', 'Satisfaction']
female = data.loc[data['Gender'] == 'Female', 'Satisfaction']
t_stat, p_value_gender = ttest_ind(male, female)
print(f"T-test p-value for gender: {p_value_gender}")
Result:
No significant gender differences (p > 0.05).
Age groups show significant variance in satisfaction (p < 0.05).
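The snippet in this subsection covers only the gender t-test. A minimal sketch of how the age-group ANOVA could be run is given here, assuming the `Age Group` and `Satisfaction` column names used elsewhere in the report. The `sample` data and the age labels are hypothetical and serve only to make the example runnable.

```python
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical responses (illustrative only); column names follow the report
sample = pd.DataFrame({
    'Age Group': ['18-24'] * 4 + ['25-34'] * 4 + ['35-44'] * 4,
    'Satisfaction': [5, 4, 5, 4, 3, 3, 4, 3, 2, 1, 2, 2],
})

# One array of satisfaction scores per age group
age_groups = [g['Satisfaction'].to_numpy() for _, g in sample.groupby('Age Group')]
f_stat_age, p_value_age = f_oneway(*age_groups)
print(f"ANOVA F-statistic: {f_stat_age:.3f}, p-value: {p_value_age:.4f}")
```

A small p-value says only that at least one group mean differs from the others; it does not identify which groups differ.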
5. Visualizations
Satisfaction Distribution
import matplotlib.pyplot as plt
data['Satisfaction'].value_counts().plot(kind='bar', color='blue',
edgecolor='black')
plt.title('Satisfaction Distribution')
plt.xlabel('Satisfaction Level')
plt.ylabel('Count')
plt.savefig('satisfaction_distribution.png')
plt.show()
Satisfaction by Value for Money
data.groupby('Value for Money')['Satisfaction'].mean().plot(kind='bar',
color='green', edgecolor='black')
plt.title('Average Satisfaction by Value for Money')
plt.xlabel('Value for Money')
plt.ylabel('Average Satisfaction')
plt.savefig('value_for_money_vs_satisfaction.png')
plt.show()
Likelihood to Recommend vs. Satisfaction
plt.scatter(data['Satisfaction'], data['Likelihood to Recommend'], alpha=0.7)
plt.title('Satisfaction vs. Likelihood to Recommend')
plt.xlabel('Satisfaction')
plt.ylabel('Likelihood to Recommend')
plt.savefig('satisfaction_vs_recommend.png')
plt.show()
6. Conclusions
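1. Satisfaction Drives Recommendations:
The strong positive correlation (r = 0.72) indicates that more satisfied visitors are more likely to recommend the park, so raising satisfaction is a plausible lever for increasing recommendations.
2. Value for Money Influences Satisfaction:
Satisfaction differs significantly across value-for-money groups, so improving perceived value is likely to raise overall satisfaction.
3. Demographic Insights:
Satisfaction varies significantly by age group but not by gender, which points to age-specific marketing and service improvements.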
7. Recommendations
1. Improve Value for Money:
Introduce bundled offers and promotions (e.g., family packages).
Emphasize unique aspects (e.g., conservation efforts) in marketing.
2. Enhance Visitor Satisfaction:
Upgrade underperforming features (e.g., marine exhibits).
Address neutral satisfaction levels with targeted surveys for improvement.
3. Demographic-Based Strategies:
Focus on family-friendly activities for older age groups.
Develop campaigns tailored to younger audiences (e.g., thrill ride enthusiasts).
Code Analysis
1. Load Data
Code Block:
import pandas as pd

# Replace with the actual file path
data_path = "HK_Ocean_Park_Final_Survey_Analysis.csv"
data = pd.read_csv(data_path)
Purpose:
Reads the survey data from a CSV file into a Pandas DataFrame for further analysis.
Key Notes:
Ensure the file path is correct and accessible.
Use .head() or .info() to inspect the structure and integrity of the loaded data.
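The inspection step mentioned in the notes can be sketched as follows. The CSV text is a hypothetical stand-in for the survey file so that the example runs on its own; only the column names are taken from this report.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the survey file (illustrative only)
csv_text = """Satisfaction,Likelihood to Recommend,Age Group,Gender
4,5,25-34,Male
3,,35-44,Female
5,4,25-34,Female
"""
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head())    # first rows: confirms column names and parsing
sample.info()           # dtypes and non-null counts per column
print(sample.shape)     # (rows, columns)
```

The non-null counts reported by `.info()` give an early hint of which columns will need cleaning.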
2. Data Cleaning
Code Block:
# Check for missing values
print("Missing values per column:")
print(data.isnull().sum())
# Drop rows with critical missing data
initial_row_count = len(data)
data.dropna(subset=['Satisfaction', 'Likelihood to Recommend', 'Age Group',
'Gender'], inplace=True)
final_row_count = len(data)
print(f"Rows dropped during cleaning: {initial_row_count - final_row_count}")
Purpose:
Identifies missing values in key columns (e.g., Satisfaction, Likelihood to Recommend).
Drops rows with missing values in these critical columns to ensure robust analysis.
Importance:
Prevents bias or errors in statistical tests caused by incomplete data.
Example Output:
Missing values per column:
Satisfaction               3
Likelihood to Recommend    5
Age Group                  0
Gender                     0
Rows dropped during cleaning: 5
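In the example output, eight cells are missing but only five rows are dropped. One reading is that some rows lack both ratings, since a row is removed once regardless of how many key columns are empty. The sketch below illustrates that behavior on hypothetical data; it does not reproduce the survey counts.

```python
import numpy as np
import pandas as pd

# Hypothetical responses (illustrative only): the second row is missing both ratings
sample = pd.DataFrame({
    'Satisfaction': [4, np.nan, 5, np.nan, 3],
    'Likelihood to Recommend': [5, np.nan, np.nan, 4, 3],
})

missing_per_column = sample.isnull().sum()
# A row is dropped once, however many of its key columns are empty
cleaned = sample.dropna(subset=['Satisfaction', 'Likelihood to Recommend'])
rows_dropped = len(sample) - len(cleaned)

print(missing_per_column.sum())  # total missing cells
print(rows_dropped)              # rows removed
```

Here four cells are missing across the two columns, yet only three rows are removed.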
3. Data Analysis
Code Block:
# Country Distribution
country_counts = data['Country'].value_counts()
# Age Group Distribution
age_counts = data['Age Group'].value_counts()
# Gender Distribution
gender_counts = data['Gender'].value_counts()
# Satisfaction Level Distribution
satisfaction_counts = data['Satisfaction'].value_counts()
Purpose:
Computes the frequency distribution for key columns: Country, Age Group, Gender, and Satisfaction.
Importance:
Provides a descriptive overview of survey participants and their satisfaction levels.
Example Output:
Country Distribution:
HK    120
CN     80
US     50
Age Group Distribution:
25-34    100
35-44     80
Gender Distribution:
Male      130
Female    120
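Raw counts are often easier to compare when shown alongside percentages. This is a minimal sketch, assuming the same `value_counts()` approach as the code in this section; the `sample` data and the `summary` table are hypothetical additions for illustration.

```python
import pandas as pd

# Hypothetical responses (illustrative only)
sample = pd.DataFrame({'Gender': ['Male', 'Female', 'Female', 'Male', 'Female']})

counts = sample['Gender'].value_counts()
# normalize=True returns proportions; scale to percentages with one decimal
shares = sample['Gender'].value_counts(normalize=True).mul(100).round(1)
summary = pd.DataFrame({'Count': counts, 'Percent': shares})
print(summary)
```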
4. Statistical Tests
(a) Correlation Between Satisfaction and Likelihood to Recommend
Code Block:
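from scipy.stats import pearsonr
satisfaction = data['Satisfaction']
recommendation = data['Likelihood to Recommend']
corr_coeff, p_value_corr = pearsonr(satisfaction, recommendation)
print(f"Correlation between Satisfaction and Likelihood to Recommend: "
      f"correlation coefficient={corr_coeff:.4f}, p-value={p_value_corr:.4f}")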
Purpose:
Measures the linear relationship between Satisfaction and Likelihood to Recommend.
Explanation:
corr_coeff: Correlation coefficient (range [-1, 1]).
p_value_corr: p-value indicating statistical significance.
Example Output:
Correlation between Satisfaction and Likelihood to Recommend: correlation coefficient=0.7254, p-value=0.0021
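To make the coefficient less of a black box, the sketch below computes Pearson's r by hand and checks it against `pearsonr`. The arrays `x` and `y` are hypothetical paired ratings, not survey data.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical paired ratings (illustrative only)
x = np.array([1, 2, 3, 4, 5, 5], dtype=float)
y = np.array([2, 1, 3, 4, 4, 5], dtype=float)

# r = sum of co-deviations divided by the product of the two spreads
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_scipy, p_value = pearsonr(x, y)
print(f"manual r={r_manual:.4f}, scipy r={r_scipy:.4f}, p-value={p_value:.4f}")
```

Because survey ratings are ordinal, a rank-based coefficient such as Spearman's (scipy.stats.spearmanr) may be worth reporting alongside Pearson's r.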
(b) T-Test for Gender Differences in Satisfaction
Code Block:
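male_satisfaction = data.loc[data['Gender'] == 'Male', 'Satisfaction']
female_satisfaction = data.loc[data['Gender'] == 'Female', 'Satisfaction']
# equal_var=False selects Welch's t-test, which does not assume equal variances
t_stat, p_value_gender = ttest_ind(male_satisfaction, female_satisfaction, equal_var=False)
print(f"T-test for Gender and Satisfaction: t-statistic={t_stat}, p-value={p_value_gender}")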
Purpose:
Compares average satisfaction levels between male and female groups.
Explanation:
p_value_gender < 0.05: Significant difference between genders.
Example Output:
T-test for Gender and Satisfaction: t-statistic=1.654, p-value=0.104
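The `equal_var=False` argument switches `ttest_ind` from Student's t-test to Welch's t-test. The sketch below compares the two on hypothetical scores with unequal group sizes and spreads; the lists are illustrative, not survey data.

```python
from scipy.stats import ttest_ind

# Hypothetical satisfaction scores (illustrative only)
male_scores = [4, 5, 4, 3, 5, 4]
female_scores = [4, 2, 5, 1, 5, 3, 4, 2]

# equal_var=True: Student's t-test (pooled variance)
student = ttest_ind(male_scores, female_scores, equal_var=True)
# equal_var=False: Welch's t-test (separate variances, adjusted degrees of freedom)
welch = ttest_ind(male_scores, female_scores, equal_var=False)

print(f"Student: t={student.statistic:.3f}, p={student.pvalue:.3f}")
print(f"Welch:   t={welch.statistic:.3f}, p={welch.pvalue:.3f}")
```

When group sizes or variances differ, the two tests give different statistics, and Welch's version is generally the safer default.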
(c) ANOVA for Age Group and Satisfaction
Code Block:
group_satisfaction = [g['Satisfaction'] for _, g in data.groupby('Age Group')]
f_stat, p_value_age = f_oneway(*group_satisfaction)
print(f"ANOVA for Age Group and Satisfaction: f-statistic={f_stat}, p-value={p_value_age}")
Purpose:
Tests whether satisfaction levels differ significantly among age groups.
Explanation:
p_value_age < 0.05: Significant differences exist between groups.
Example Output:
ANOVA for Age Group and Satisfaction: f-statistic=5.432, p-value=0.012
(d) Shapiro-Wilk Test for Normality
Code Block:
from scipy.stats import shapiro
stat, p_value_normality = shapiro(data['Satisfaction'])
print(f"Shapiro-Wilk Test for Normality of Satisfaction: p-value={p_value_normality}")
Purpose:
Tests if satisfaction scores follow a normal distribution.
Explanation:
p_value_normality > 0.05: No significant departure from normality is detected, so the data can be treated as approximately normal.
Example Output:
Shapiro-Wilk Test for Normality of Satisfaction: p-value=0.087
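The normality result can inform which two-group test to use. The sketch below is one possible decision rule, not the procedure used in this report: it falls back to the Mann-Whitney U test (a rank-based alternative swapped in here for illustration) when normality is rejected. The two score lists are hypothetical.

```python
from scipy.stats import shapiro, ttest_ind, mannwhitneyu

# Hypothetical satisfaction scores (illustrative only)
group_a = [5, 4, 5, 5, 4, 5, 3, 5]
group_b = [2, 3, 1, 2, 4, 2, 3, 1]

_, p_norm_a = shapiro(group_a)
_, p_norm_b = shapiro(group_b)

if min(p_norm_a, p_norm_b) > 0.05:
    # No evidence against normality in either group: Welch's t-test
    test_name = "Welch t-test"
    _, p_value = ttest_ind(group_a, group_b, equal_var=False)
else:
    # Normality rejected for at least one group: rank-based alternative
    test_name = "Mann-Whitney U"
    _, p_value = mannwhitneyu(group_a, group_b, alternative='two-sided')

print(f"{test_name}: p-value={p_value:.4f}")
```

For more than two groups, the analogous rank-based alternative to ANOVA is the Kruskal-Wallis test (scipy.stats.kruskal).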
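(e) Chi-Square Test for Gender vs Satisfaction
Code Block:
from scipy.stats import chi2_contingency
contingency = pd.crosstab(data['Satisfaction'], data['Gender'])
chi2_stat, p_val_chi2, dof, expected = chi2_contingency(contingency)
print("Chi-Square Test for Gender and Satisfaction:")
print(f"Chi-square Statistic: {chi2_stat:.4f}\nP-value: {p_val_chi2:.4f}\nDegrees of Freedom: {dof}")
print(f"Expected Frequencies:\n{expected}")
Purpose:
Assesses the independence of two categorical variables: Gender and Satisfaction.
The Chi-Square test checks whether the distribution of satisfaction levels differs across genders, i.e., whether there is
a significant relationship between gender and satisfaction.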
Explanation:
p-value < 0.05: There is a significant relationship between gender and satisfaction.
p-value ≥ 0.05: There is no significant relationship between gender and satisfaction.
Example Output:
Chi-Square Test for Gender and Satisfaction:
Chi-square Statistic: 7.3214
P-value: 0.0612
Degrees of Freedom: 2
Expected Frequencies:
[[10.5 10.5]
[20.5 20.5]
[5.5 5.5]]
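The expected frequencies and degrees of freedom in the output follow directly from the table margins. The sketch below recomputes them by hand and compares them with `chi2_contingency`. The `observed` counts are hypothetical, laid out with satisfaction levels as rows and gender as columns.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts (illustrative only): rows = satisfaction levels, columns = gender
observed = np.array([[12, 9],
                     [18, 23],
                     [6, 5]])

# Expected count per cell = row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / observed.sum()

# Degrees of freedom = (rows - 1) * (columns - 1)
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

chi2_stat, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2_stat:.4f}, p-value={p_value:.4f}, dof={dof}")
```

A 3 x 2 table therefore has (3 - 1) * (2 - 1) = 2 degrees of freedom, matching the example output.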
5. Visualization
Code Block:
plt.figure(figsize=(8, 6))
satisfaction_counts.plot(kind='bar', color='green', edgecolor='black')
plt.title('Satisfaction Distribution')
plt.xlabel('Satisfaction Level')
plt.ylabel('Count')
plt.savefig('satisfaction_distribution.png')
Purpose:
Visualizes satisfaction levels among visitors.
Importance:
Provides clear, actionable insights for reports or presentations.
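The report step refers to several chart files, while this section shows only the satisfaction chart. A minimal sketch of producing one bar chart per column in a loop follows. The `sample` data, the temporary `output_dir`, and the file-name pattern are assumptions for illustration; the `Agg` backend is chosen so the code runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # render to files without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical responses (illustrative only)
sample = pd.DataFrame({
    'Country': ['HK', 'CN', 'HK', 'US'],
    'Gender': ['Male', 'Female', 'Female', 'Male'],
})
output_dir = tempfile.mkdtemp()  # assumed output folder for this sketch

saved_files = []
for column in ['Country', 'Gender']:
    fig, ax = plt.subplots(figsize=(8, 6))
    sample[column].value_counts().plot(kind='bar', ax=ax, edgecolor='black')
    ax.set_title(f'{column} Distribution')
    ax.set_xlabel(column)
    ax.set_ylabel('Count')
    path = os.path.join(output_dir, f'{column.lower()}_distribution.png')
    fig.savefig(path)
    plt.close(fig)  # release the figure so charts do not accumulate
    saved_files.append(path)

print(saved_files)
```

Drawing each chart on its own figure and closing it afterwards keeps one chart from bleeding into the next.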
6. Report Generation
Code Block:
import os
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

def generate_pdf_report(report_path, p_value_corr, p_value_gender, p_value_age,
                        p_val_chi2, output_dir):
    """Generate a PDF report with results."""
    c = canvas.Canvas(report_path, pagesize=letter)
    width, height = letter
    # Title
    c.setFont("Helvetica-Bold", 16)
    c.drawString(30, height - 40, "Ocean Park HK Survey Report")
    c.setFont("Helvetica", 12)
    c.drawString(30, height - 60, "============================")
    c.drawString(30, height - 80, "1. Research Questions")
    c.drawString(30, height - 100, "- What are the satisfaction levels of visitors?")
    c.drawString(30, height - 120, "- Are there differences based on demographic factors?")
    c.drawString(30, height - 140, "- Does satisfaction correlate with likelihood to recommend?")
    c.drawString(30, height - 180, "2. Methodology")
    c.drawString(30, height - 200, "Survey responses were collected online. Data "
                 "was analyzed using Python for statistical tests and visualization.")
    c.drawString(30, height - 240, "3. Results")
    c.drawString(30, height - 260, "4. Statistical Test Results:")
    c.drawString(30, height - 280, "- Correlation between Satisfaction and "
                 f"Likelihood to Recommend: p-value={p_value_corr:.4f}")
    c.drawString(30, height - 300, f"- Gender and Satisfaction: p-value={p_value_gender:.4f}")
    c.drawString(30, height - 320, f"- Age Group and Satisfaction: p-value={p_value_age:.4f}")
    c.drawString(30, height - 340, "- Chi-Square Test for Gender and "
                 f"Satisfaction: p-value={p_val_chi2:.4f}")
    # Inserting images (charts)
    # Note: a letter page is 792 points tall, so offsets larger than that place
    # a chart below the page; call c.showPage() to continue on a new page.
    c.drawImage(os.path.join(output_dir, 'country_distribution.png'), 30,
                height - 460, width=500, height=300)
    c.drawImage(os.path.join(output_dir, 'age_distribution.png'), 30,
                height - 760, width=500, height=300)
    c.drawImage(os.path.join(output_dir, 'gender_distribution.png'), 30,
                height - 1060, width=500, height=300)
    c.drawImage(os.path.join(output_dir, 'satisfaction_distribution.png'), 30,
                height - 1360, width=500, height=300)
    c.drawImage(os.path.join(output_dir, 'satisfaction_vs_recommendation.png'),
                30, height - 1660, width=500, height=300)
    # Save the PDF
    c.save()
    print(f"PDF Report generated: {report_path}")
# Include analysis and statistical test results
Purpose:
Summarizes findings into a PDF report for documentation and sharing.
Output:
Report saved as Ocean_Park_Survey_Report.pdf.
7. Full Code
analysis.py