
A large telecommunications company conducts a marketing survey to better
understand the perceptions about their products and the competition. Our
consulting team has worked with the company to identify a variety of
questions to answer. This client would also benefit from building a dynamic
reporting engine that can be run as a shiny application in R.
In addition to building these tools, our team would also like you to identify
opportunities for the client to make use of the information you’re working with
in novel ways.
This project is divided into 4 parts:

• Part 1: Summarizing the data.
• Part 2: Answering specific questions about the respondents and their
perceptions of the industry’s products.
• Part 3: Building a dynamic reporting engine to explore many facets of
the survey’s information.
• Part 4: Identifying opportunities.
Details
Part 1: Summary
How would you summarize the data? Briefly describe what is measured in the
data and provide a summary of the information. You can show a table or
graphic, but keep things brief.
This part of the report will be directed to your internal team at the consulting
company. It is intended to document the sources of information that were
used in the project. It will also describe the data in less technical terms to
team members who are not data scientists. If another member of the team
joins the project later, they will rely on your descriptions to gain familiarity with
the data. To that end, we recommend providing some instructions that will
help other consultants use the information more effectively.
Part 2: Specific Questions
Our prior work has identified specific questions of interest. Please provide
these answers in output that is easy to read (e.g. tables or graphs).
This part of the report will be directed to marketing and product managers
throughout the client’s company. The idea is to give them the useful
information they need to act on the specific questions they posed. Plan your
communication accordingly.
Questions
1. Respondent Variables
o In percentage terms, how were the survey’s respondents
divided into categories for the following variables? Hint: Keep
in mind that each respondent may appear multiple times in the
data set.
• Age Group. The age groups are defined as:
  • At least 18 and under 35;
  • At least 35 and under 50;
  • At least 50 and under 65;
  • At least 65.
• Gender
• Income Group. The income groups are defined as:
  • Under 50 thousand;
  • At least 50 thousand and under 75 thousand;
  • At least 75 thousand and under 100 thousand;
  • At least 100 thousand and under 150 thousand;
  • At least 150 thousand.
• Region
• Persona
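The grouping rules and the hint about repeated respondents can be sketched as follows. This is a minimal illustration on toy data, not the survey file: the column names (id, Age, Income, Product) and the group labels are assumptions, and the real data may be laid out differently.

```r
library(data.table)

# Toy data in an assumed long format: one row per respondent-product pair.
# All column names here are hypothetical.
dat <- data.table(
  id      = c(1, 1, 2, 2, 3, 3, 4, 4),
  Age     = c(22, 22, 40, 40, 58, 58, 70, 70),
  Income  = c(40000, 40000, 80000, 80000, 120000, 120000, 200000, 200000),
  Product = rep(c("A", "B"), times = 4)
)

# right = FALSE gives half-open intervals [lower, upper),
# which matches "at least X and under Y".
dat[, Age_Group := cut(Age, breaks = c(18, 35, 50, 65, Inf), right = FALSE,
                       labels = c("18-34", "35-49", "50-64", "65+"))]
dat[, Income_Group := cut(Income, breaks = c(0, 50, 75, 100, 150, Inf) * 1000,
                          right = FALSE,
                          labels = c("<50K", "50K-<75K", "75K-<100K",
                                     "100K-<150K", "150K+"))]

# Percentage of unique respondents in each category of one variable.
# Rows are de-duplicated first so repeated respondents count once.
respondent_percentages <- function(dat, variable, id_name = "id") {
  respondents <- unique(dat[, c(id_name, variable), with = FALSE])
  tab <- respondents[, .(N = .N), keyby = variable]
  tab[, Percentage := 100 * N / sum(N)]
  tab[]
}

age_tab <- respondent_percentages(dat, "Age_Group")
income_tab <- respondent_percentages(dat, "Income_Group")
print(age_tab)
```

The same function can be called with "Gender", "Region", or "Persona" if those columns exist under those names.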
2. Segmented Outcomes
o Part A: What are the top 5 products by Awareness rates in the
Northeast?
o Part B: What are the top 5 products by Advocacy rates among
females who earn at least $100,000?
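One way to approach both parts is a single ranking function that takes a row filter. The sketch below assumes a long-format table with hypothetical columns Region, Product, and Awareness, where a binary outcome is coded 1 (yes), 0 (no), or NA (not asked).

```r
library(data.table)

# Hypothetical toy data; the real column names and coding may differ.
dat <- data.table(
  Region    = c("Northeast", "Northeast", "Northeast", "Northeast", "South", "South"),
  Product   = c("A", "A", "B", "B", "A", "B"),
  Awareness = c(1, 0, 1, 1, 0, NA)
)

# Rank products by the rate (in percent) of a binary outcome
# within the rows flagged TRUE in `rows`.
top_products <- function(dat, outcome, rows = rep(TRUE, nrow(dat)), k = 5) {
  rates <- dat[rows,
               .(Rate = 100 * mean(get(outcome), na.rm = TRUE)),
               by = Product]
  setorderv(rates, cols = "Rate", order = -1)
  head(rates, k)
}

northeast <- top_products(dat, "Awareness", rows = dat$Region == "Northeast")
print(northeast)
```

Part B would follow the same pattern with a different filter, for example rows flagged by a gender column equal to "Female" and an income column of at least 100000, assuming such columns exist.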
3. Overall Brand Perceptions
o What are the top 5 brands by the overall average perceptions?
o Evaluating this question can be tricky. Some of the perceptions
are for positive traits, and others are for negative traits. The
brand with the best overall perception would have the highest
scores for the positive traits and the lowest scores for the
negative traits. To aggregate these scores, we will follow a
number of steps:
1. For each brand, compute the average score of each
brand perception variable. In computing these
averages, remove any missing values from the
calculations.
2. Then, for the questions that assess negative
perceptions, invert the scores to place them on a
comparable scale with the positive traits. To do this,
use the conversion formula:
Inverted Average Score = 10 - Average Score.
3. With all of the average scores of each perception now
recorded on the same scale, we can aggregate them
into one measure, the Overall Average Perception of a
product. For each brand, compute the mean of these
scores.
4. Now rank the brands in decreasing order of their
Overall Average Perception scores.
5. Show the results for the top 5 brands.
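The five steps above can be sketched as follows. The perception column names (BP_Fun, BP_Stylish, BP_Expensive) and the choice of which traits count as negative are assumptions made for illustration; the real lists would come from the survey's codebook.

```r
library(data.table)

# Hypothetical toy data: perceptions scored from 0 to 10.
dat <- data.table(
  Product      = c("A", "A", "B", "B"),
  BP_Fun       = c(8, 6, 4, NA),
  BP_Stylish   = c(9, 7, 5, 5),
  BP_Expensive = c(2, 4, 8, 6)
)
positive_traits <- c("BP_Fun", "BP_Stylish")  # assumed positive
negative_traits <- "BP_Expensive"             # assumed negative
traits <- c(positive_traits, negative_traits)

# Step 1: average each perception per brand, dropping missing values.
avg <- dat[, lapply(.SD, mean, na.rm = TRUE), by = Product, .SDcols = traits]

# Step 2: invert the negative traits with 10 - average.
avg[, (negative_traits) := lapply(.SD, function(x) 10 - x),
    .SDcols = negative_traits]

# Step 3: take the mean across all traits for each brand.
avg[, Overall := rowMeans(.SD), .SDcols = traits]

# Steps 4 and 5: sort in decreasing order and keep the top 5.
setorderv(avg, cols = "Overall", order = -1)
top5 <- head(avg[, .(Product, Overall)], 5)
print(top5)
```

Note that the inversion is applied to the per-brand averages, as the steps specify, rather than to individual responses.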
4. Gaps in Outcomes
o The marketing department wants to identify products with
engagement that is underperforming in some ways. The best
products should have high rates of engagement across all of
the outcomes, but that is not always the case.
o For the purposes of this question, we will create a scoring
metric that ranges from 0% to 100%. For binary outcomes
(awareness, consideration, consumption, and advocacy), the
score will be the percentage of the respondents who answered
yes to the question among those who were asked. For
outcomes on an integer scale (e.g. Satisfaction), the score will
be the average value as a percentage of the maximum score.
So, for instance, if the average satisfaction for a product is 7
out of 10, then its percentage rating would be 70%.
• Part A: Which 5 products have the largest gap between
the rate of consumption and the rate of awareness?
This would correspond to a formula of Difference =
Rate of Consumption - Rate of Awareness. (Please
use this exact formula.) Display a bar graph showing
the 5 largest differences in decreasing sorted order.
• Part B: Which 5 products have the largest gap between
the rate of Awareness and the average Satisfaction (in
percentage terms)? Here the formula would be
Difference = Rate of Awareness - Percentage Average
Satisfaction. (Please use this exact formula.) Display
a bar graph showing the 5 largest differences in
decreasing sorted order.
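A minimal sketch of the scoring metric and the two differences, assuming hypothetical columns Product, Awareness, Consumption, and Satisfaction, with NA marking respondents who were not asked and a maximum satisfaction score of 10 (taken from the "7 out of 10" example):

```r
library(data.table)

# Hypothetical toy data; NA means the respondent was not asked.
dat <- data.table(
  Product      = c("A", "A", "B", "B"),
  Awareness    = c(1, 1, 1, 0),
  Consumption  = c(1, NA, 0, NA),
  Satisfaction = c(6, NA, 8, NA)
)

# Score on a 0-100 scale. Binary outcomes use max_value = 1;
# integer-scale outcomes divide by their maximum.
outcome_score <- function(x, max_value = 1) {
  100 * mean(x, na.rm = TRUE) / max_value
}

scores <- dat[, .(
  Awareness    = outcome_score(Awareness),
  Consumption  = outcome_score(Consumption),
  Satisfaction = outcome_score(Satisfaction, max_value = 10)
), by = Product]

# The exact formulas requested in Parts A and B.
scores[, Gap_A := Consumption - Awareness]
scores[, Gap_B := Awareness - Satisfaction]

# Five largest differences in decreasing order.
gap_a <- head(scores[order(-Gap_A), .(Product, Gap_A)], 5)
gap_b <- head(scores[order(-Gap_B), .(Product, Gap_B)], 5)
print(gap_a)
print(gap_b)
```

Either table can then be passed to a plotting call such as barplot(gap_a$Gap_A, names.arg = gap_a$Product) to produce the requested bar graph.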
5. Aggregated Engagement
o How much does a respondent’s engagement depend on the
product, and how much depends on the respondent? One way
we might investigate this further is to see whether the
respondent’s outcomes in other products have an impact on this
one.
o For each marketing outcome and product, we will define a
user’s aggregated engagement as the average value of that
outcome variable on all of the products not being modeled.
For instance, consider a model of Awareness on the product
Buzzdial. A single user’s aggregated engagement would then
be the user’s average awareness on all of the other products
(not including Buzzdial). Any missing scores should be
removed from the calculation of the aggregated engagement. If
a user has no measured scores in the other products, define
the aggregated engagement as zero.
• Part A: How much impact do a respondent’s overall
trends in awareness have on that person’s awareness
of Buzzdial phones? To answer this question, we
want to create a logistic regression model. The
outcome will be the respondents’ Awareness of
Buzzdial. The variables in the model will include age
group, gender, income group, region, persona, and
the aggregated engagement (calculated on
awareness). Then, fit the logistic regression model.
Display a table including the model’s Odds Ratios,
95% confidence intervals for the Odds Ratios, and the
p-values.
• Part B: How much impact do a respondent’s overall
trends in satisfaction have on that person’s satisfaction
with Buzzdial phones? To answer this question, we
want to create a linear regression model. The outcome
will be the respondents’ Satisfaction with Buzzdial. The
variables in the model will include age group, gender,
income group, region, persona, and the aggregated
engagement (computed on satisfaction). Display a
table including the model’s coefficients, 95%
confidence intervals for the coefficients, and the p-values.
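The aggregated engagement definition and the Part A model can be sketched as below. The data are simulated, the column names (id, Gender, Product, Awareness) are hypothetical, and only Gender is included as a covariate to keep the example short; the full model would add age group, income group, region, and persona. The intervals shown are Wald intervals from confint.default(), which is one of several reasonable choices.

```r
library(data.table)

# Mean of `outcome` across all products except `target_product`,
# computed per respondent and attached to the target product's rows.
aggregated_engagement <- function(dat, outcome, target_product, id_name = "id") {
  others <- dat[Product != target_product,
                .(Aggregated.Engagement = mean(get(outcome), na.rm = TRUE)),
                by = id_name]
  target <- merge(dat[Product == target_product], others,
                  by = id_name, all.x = TRUE)
  # NA: no rows for other products. NaN: rows exist but all are missing.
  # is.na() is TRUE for both, and both are defined as zero.
  target[is.na(Aggregated.Engagement), Aggregated.Engagement := 0]
  target[]
}

# Simulated long-format data (purely illustrative).
set.seed(1)
n <- 300
people <- data.table(
  id         = 1:n,
  Gender     = sample(c("Female", "Male"), n, replace = TRUE),
  propensity = runif(n, min = 0.2, max = 0.8)
)
dat <- people[rep(1:n, each = 3)]
dat[, Product := rep(c("Buzzdial", "Other1", "Other2"), times = n)]
dat[, Awareness := rbinom(.N, size = 1, prob = propensity)]

model_dat <- aggregated_engagement(dat, "Awareness", "Buzzdial")
fit <- glm(Awareness ~ Gender + Aggregated.Engagement,
           family = binomial, data = model_dat)

# Odds ratios with 95% Wald intervals and p-values.
coefs <- summary(fit)$coefficients
ci <- confint.default(fit)
or_table <- data.table(
  Variable    = rownames(coefs),
  Odds.Ratio  = exp(coefs[, "Estimate"]),
  OR.Lower.95 = exp(ci[, 1]),
  OR.Upper.95 = exp(ci[, 2]),
  P.Value     = coefs[, "Pr(>|z|)"]
)
print(or_table)
```

For Part B, the same helper could be called with "Satisfaction" and the model fitted with lm(), reporting the coefficients and their intervals without exponentiating.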

Part 3: Reporting Engine
Each of the specific questions in Part 2 can be generalized. A reporting
engine can allow a user to select many different outcomes or subgroups to
explore. In this portion, we will construct a dynamic reporting engine as a
shiny application in R. The sections of the reporting engine should include
user interfaces and reactive content for the scenarios presented in Part 2.
Between Parts 2 and 3, you will be generating two kinds of reports – one
static and one dynamic. Since these reports rely on common data and similar
calculations, this is a good opportunity for you to build an infrastructure for the
software. To that end, you should create two additional files:
• constants.R
• functions.R
Then many of the initial commonalities in the two reports can be unified by
referring to your constants and functions. For each specific question in Part 2,
you should be able to write a single function that can be called to answer it
and also be called in the dynamic reporting engine of Part 3.
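As a rough illustration of how the two files might divide responsibilities, the sketch below puts names and fixed values in constants.R and reusable calculations in functions.R. Everything here is hypothetical, including the column names and the object names; the contents are shown in one block only so the example runs on its own.

```r
# ---- constants.R (hypothetical contents) ----
id.name          <- "id"
product.name     <- "Product"
binary.outcomes  <- c("Awareness", "Consideration", "Consumption", "Advocacy")
scaled.outcomes  <- "Satisfaction"
max.satisfaction <- 10

# ---- functions.R (hypothetical contents) ----
# Score an outcome on a 0-100 scale, choosing the divisor from the constants.
outcome.score <- function(x, outcome) {
  max.value <- if (outcome %in% scaled.outcomes) max.satisfaction else 1
  100 * mean(x, na.rm = TRUE) / max.value
}

# ---- near the top of both .Rmd files ----
# source("constants.R")
# source("functions.R")

awareness.score    <- outcome.score(c(1, 0, NA, 1), outcome = "Awareness")
satisfaction.score <- outcome.score(c(7, 7, NA), outcome = "Satisfaction")
print(c(awareness.score, satisfaction.score))
```

With this split, the static report and the reporting engine call the same function, so a change to the scoring rule only needs to be made once.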
Your dynamic reporting engine will also be directed to the marketing and
product managers throughout the client’s company. The idea is to give them
tools that will help them answer novel questions and go beyond the specific
information they requested. Plan your communication accordingly.
Keep in mind that this reporting engine should be designed to be used by
others on their own machine. To simplify that process, please keep your files
in a folder structure:

• The Project's Folder (named according to your preference)
  o Data
  o Reports
The R Markdown file should be within the Reports subfolder. It should read
the data from the Data folder using a relative path:
library(data.table)
dat <- fread(input = "../Data/mobile phone survey data.csv", verbose = FALSE)
Otherwise the program will have difficulty operating on other machines.
Questions
1. Respondent Variables
o Depict the information produced in Q1 of Part 2. Allow the user
to select which variable to explore. Then create a graph that
depicts the percentages of respondents in each category for
that variable.
2. Segmented Outcomes
o Build a dynamic, visual display ranking the products by their
outcomes in the manner of Question 2 of Part 2. The user will
make the following selections:
• State of engagement: Only a single state may be
selected at once.
• Other variables: Age Group, Gender, Income Group,
Region, Persona
o Then, for all of the other variables, any combination of categories
may be selected, so long as at least one category from each
variable is chosen. For instance, for Gender, the user may select
Male only, Female only, or both Male and Female.
o Then, the user should be able to select how many products to
display. Once a number is selected, the outcome rates should be
graphically displayed in sorted decreasing order for the top
products in the selected subgroups. If 5 is selected for Awareness,
then the 5 products with the highest rates of Awareness for the
specified subgroup will be depicted.
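A minimal Shiny sketch of this display is shown below as a standalone app; in an R Markdown document with runtime: shiny, the inputs and the render call could be placed directly in code chunks instead. Only Gender is wired up as a subgroup filter, the toy data and column names are hypothetical, and the layout is deliberately bare.

```r
library(shiny)
library(data.table)

# Hypothetical toy data in long format.
dat <- data.table(
  Gender    = c("Female", "Female", "Male", "Male", "Female", "Male"),
  Product   = c("A", "B", "A", "B", "A", "B"),
  Awareness = c(1, 0, 1, 1, 1, NA),
  Advocacy  = c(0, 0, 1, NA, 1, 1)
)

# Plain function doing the calculation, so the static report can reuse it.
top_rates <- function(dat, outcome, genders, k) {
  rates <- dat[Gender %in% genders,
               .(Rate = 100 * mean(get(outcome), na.rm = TRUE)),
               by = Product]
  head(rates[order(-Rate)], k)
}

ui <- fluidPage(
  # A single state of engagement at a time.
  selectInput("outcome", "State of engagement:",
              choices = c("Awareness", "Advocacy")),
  # Any combination of categories may be ticked.
  checkboxGroupInput("gender", "Gender:",
                     choices = c("Female", "Male"),
                     selected = c("Female", "Male")),
  sliderInput("k", "Number of products:",
              min = 1, max = uniqueN(dat$Product), value = 2, step = 1),
  plotOutput("rates_plot")
)

server <- function(input, output) {
  output$rates_plot <- renderPlot({
    req(input$gender)  # wait until at least one category is selected
    rates <- top_rates(dat, input$outcome, input$gender, input$k)
    barplot(rates$Rate, names.arg = rates$Product,
            ylab = "Rate (%)", ylim = c(0, 100))
  })
}

# Launch only in an interactive session.
if (interactive()) shinyApp(ui, server)
```

The remaining variables (Age Group, Income Group, Region, Persona) would each get their own checkboxGroupInput and an extra condition in the row filter.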
3. Overall Brand Perceptions
o Generate a dynamic, graphical display that allows the user to
perform a calculation of the Overall Brand Perception in
selected subgroups. Much like the previous question, the user
may make any combination of selections in the following
variables, provided that at least one category of each variable
is selected: Age Group, Gender, Income Group, Region,
Persona.
o Also allow the user to select how many brands should be
displayed, with the top k brands depicted in decreasing sorted
order. All results should display the overall average perception
for the brand.
4. Gaps in Outcomes
o Create a dynamic, graphical display that ranks the products in
terms of the difference in averages between any two selected
outcomes. The user will be allowed to make the following
selections:
• First Outcome: One of the outcome variables.
• Second Outcome: Another outcome variable.
o The difference in rates will be Difference = Average First
Outcome - Average Second Outcome per product.
o Number of Top Products: The user will select how many
products to display.
o Display Percentages: If checked, the bar graph will display the
percentages for each product.
o Digits: How many digits should the percentages be rounded
to? 1 digit would be a number like 84.2%.
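The calculation behind this display might look like the sketch below, which takes the user's selections as function arguments. It assumes a table of per-product scores already on a 0-100 scale; the table, its column names, and the function name top_gaps are all hypothetical.

```r
library(data.table)

# Hypothetical per-product scores, already expressed as percentages.
scores <- data.table(
  Product      = c("A", "B", "C"),
  Awareness    = c(90, 60, 75.56),
  Satisfaction = c(70, 65, 40)
)

# Difference = first outcome - second outcome, top k in decreasing order.
# When show_percentages is TRUE, a text label rounded to `digits` is added.
top_gaps <- function(scores, first, second, k = 5,
                     show_percentages = TRUE, digits = 1) {
  gaps <- scores[, .(Product, Difference = get(first) - get(second))]
  gaps <- head(gaps[order(-Difference)], k)
  if (show_percentages) {
    gaps[, Label := sprintf("%.*f%%", digits, Difference)]
  }
  gaps[]
}

res <- top_gaps(scores, "Awareness", "Satisfaction", k = 2)
print(res)
```

In the app, the Label column could be drawn above each bar with text() when the Display Percentages box is checked.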
5. Aggregated Engagement
o Let’s allow the user to build a model including an aggregated
outcome for a specific product. The site should include the
following features:
• The user can select the products (1 or more).
• The user can select the state of engagement as the
outcome.
• The user can select the other variables to include in the
model. The list of choices should include the age
group, gender, income group, region, persona, brand
perceptions, and the Aggregated Engagement. Each
person’s aggregated engagement will be calculated as
the average score of the selected state of engagement
across the measured values of the other products.
You can give this variable a name like
“Aggregated.Engagement”.
The user’s selections will then be incorporated into a model. For
Satisfaction outcomes, use a linear regression. For all of the
other outcomes, use a logistic regression. Then create a
dynamic table showing the model’s results. For logistic
regressions, this must include the Odds Ratios, 95%
confidence intervals for the Odds ratios, and the p-values. For
linear regressions, this must include the coefficients, 95%
confidence intervals for the coefficients, and the p-values.
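The model-selection rule can be sketched with a formula built from the user's selections. The data below are simulated and the function names are hypothetical. Intervals come from confint() for the linear model and from confint.default() (Wald intervals) for the logistic model, which is an assumption about the preferred interval type.

```r
# Fit a linear model for Satisfaction and a logistic model otherwise.
fit_engagement_model <- function(dat, outcome, inputs) {
  the_formula <- reformulate(termlabels = inputs, response = outcome)
  if (outcome == "Satisfaction") {
    lm(the_formula, data = dat)
  } else {
    glm(the_formula, family = binomial, data = dat)
  }
}

# Results table: odds ratios for logistic fits, coefficients for linear fits.
model_table <- function(fit) {
  is_logistic <- inherits(fit, "glm")
  coefs <- summary(fit)$coefficients
  ci <- if (is_logistic) confint.default(fit) else confint(fit)
  back <- if (is_logistic) exp else identity
  out <- data.frame(
    Variable = rownames(coefs),
    Estimate = back(coefs[, 1]),
    Lower.95 = back(ci[, 1]),
    Upper.95 = back(ci[, 2]),
    P.Value  = coefs[, 4],
    row.names = NULL
  )
  names(out)[2] <- if (is_logistic) "Odds.Ratio" else "Coefficient"
  out
}

# Simulated data (purely illustrative).
set.seed(2)
n <- 200
toy <- data.frame(
  Gender = sample(c("Female", "Male"), n, replace = TRUE),
  Aggregated.Engagement = runif(n)
)
toy$Awareness <- rbinom(n, 1, plogis(-1 + 2 * toy$Aggregated.Engagement))
toy$Satisfaction <- round(pmin(10, pmax(0,
  5 + 3 * toy$Aggregated.Engagement + rnorm(n))))

inputs <- c("Gender", "Aggregated.Engagement")
logit_tab  <- model_table(fit_engagement_model(toy, "Awareness", inputs))
linear_tab <- model_table(fit_engagement_model(toy, "Satisfaction", inputs))
print(logit_tab)
print(linear_tab)
```

In the app, `outcome` and `inputs` would come from the user's selections, and the returned table could be shown with a rendering function such as renderTable().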
Part 4: Opportunities
This part of the report will be directed externally to your client’s senior
leadership. Your work will help to determine the future direction of the project
and the company’s contract with this client. Plan your communication
accordingly.
Questions
1. How would you build on the reporting capabilities that you have
created? What would you design next?
2. What are some opportunities to learn valuable information and inform
strategic decisions? List a number of questions that you might explore.
3. How would you approach other decision-makers within the client’s
organization to assess their priorities and help them better utilize the
available information?
4. Video Submission: Make a 2-minute pitch to the client with a
proposal for the next phase of work. Include in your request a budget,
a time frame, and staffing levels. Explain why this proposal would be
valuable for the client and worth the investment in your consulting
services. Please submit this answer as a short video recording. You
may use any video recording program you feel comfortable with. The
only requirements are that you are visible and audible in the
video. You may also submit a file of slides if that is part of your pitch.
Assessment

• Part 1: Summarizing the marketing data: 10 points.
• Part 2: Answering specific questions about the surveys: 15 points,
evenly divided among each question. Partial credit may be awarded
for modest effort in the right direction (1-2 points) or a largely correct
approach with small mistakes (3-4 points).
• Part 3: Creating a dynamic reporting engine for further exploration: 50
points, evenly divided among each question. Partial credit may be
awarded based on judgments.
• Part 4: Identifying opportunities: 25 points, with 5 points each for
Questions 1-3 and 10 points for Question 4.

Submission
Please turn in the following files:
• Your reporting code for the static report (.Rmd file);
• Your code for the dynamic application (.Rmd file);
• Any supplementary coding files (constants.R, functions.R, and any
additional files);
• Output file showing your answers to Parts 1-4 (.html file);
• Video file for Part 4.