Group Assignment 1
Foundations of Business Analytics
Semester 2, 2025

Assignment Information

This group assignment is due on Friday, September 5th at 23:59. Your task is to draw data-driven insights from data provided by a Chicago bike-sharing app called Divvy. A quick introduction to Divvy and bike sharing, along with a table describing the variables included in the dataset, is provided in the file “Divvy_Intro.pdf”, which is included in the same zip file as this document.

This question document and an answer document for you to fill in are available on Canvas as a zip file, “group_assignment_1.zip”. A link to the Posit Cloud workspace containing the data and a Quarto Document where you can write the R code and explore the data is also on Canvas.

The majority of questions in this assignment are multiple choice questions. Answer these by selecting the answer box on the Answer Document. For questions where you need to submit a short text-based answer or a figure, the answer document clearly points you to where to insert the necessary text and/or figure(s). Answers to short answer questions should be contained within the text boxes. Sentence limits often apply to questions that require written answers; stay at or under them to get full credit. Graders are instructed not to grade any content that exceeds the sentence limits.

Point allocations are clearly marked next to each question. There are a total of 65 points across all questions.

Fill in the answer document using Google Docs, not Microsoft Word. The checkboxes are only guaranteed to work with Google Docs, which is also collaboration friendly. When you are ready to submit, download the file as a PDF.

Submit your assignment on Canvas before the due date. To submit your assignment, upload the answer sheet in PDF form. Submissions that do not use the Answer Sheet provided will receive a grade of zero. Submissions that are not received as a PDF will also receive a grade of zero.
You do not need to submit the Quarto Document as part of your assignment solutions.

Part A: Understanding the Business and Data Context

Question 1 [1 point]. What types of business decisions might Divvy support using this data? Select all that apply:
A. Deciding where to add new docking stations
B. Personalizing pricing for each rider
C. Understanding how casual and member riders differ
D. Planning winter service levels
E. Tracking bike maintenance compliance
F. Evaluating ad effectiveness on Instagram

Question 2 [1 point]. Which variables in the dataset help distinguish how and when people ride? Select all that apply:
A. rideable_type
B. start_lat
C. ride_length
D. started_at
E. ride_id

Question 3 [1 point]. Which of the following best describes the structure of this dataset?
A. Panel data tracking individuals over time
B. Aggregated data by station and hour
C. A single-level transaction log where each row is a ride
D. Experimental data with random assignment

Question 4 [1 point]. Why is the variable member_casual central to Divvy’s business decisions?
A. It allows Divvy to identify repeat riders by ID
B. It distinguishes ride types that are priced differently and may reflect different user goals
C. It tracks whether the rider started at a high-traffic station
D. It is needed to calculate average trip distance

Part B: Data Structure and Cleaning

Before you can analyze or visualize data, you need to understand its structure and address any issues that could distort your results. This section focuses on identifying and fixing problems such as implausible values or missing station names.

Question 5 [1 point]. Which line of code returns the number of rows and columns?
A. colnames(df)
B. glimpse(df)
C. dim(df)
D. df |> count()

Question 6 [1 point]. Which function displays the variable names in the dataset?
A. glimpse(df)
B. colnames(df)
C. names(df)
D. Both B and C

Question 7 [1 point]. Which variables are most useful for calculating ride duration and classifying user type?
A. ride_id and rideable_type
B. start_station_name and end_station_id
C. started_at, ended_at and member_casual
D. ride_length and ride_id

Question 8 [1 point]. What does this code calculate?

    df |>
      summarise(
        missing_start = sum(is.na(start_station_name)),
        missing_end = sum(is.na(end_station_name))
      )

A. It removes rides with missing stations
B. It counts missing station names
C. It replaces missing values with "Unknown"
D. It filters out duplicates

Question 9 [1 point]. Why is it important to identify rides with missing start or end station names?
A. These rides tend to be duplicates
B. They may lead to incorrect grouping or mapping
C. Riders without a station ID are likely commuters
D. Missing stations are more common on weekends

Question 10 [3 points]. You want to remove all rides from the data that meet any of these conditions:
● A ride length of 0 minutes or less
● A ride length over 24 hours
● A missing start_station_name or end_station_name
Write the code to complete these steps and store the updated data for the remainder of your analysis.

Question 11 [1 point]. Why is it important to remove these implausible rides?
A. They indicate theft or vandalism
B. They can distort averages and mislead conclusions
C. They show which users need account review
D. They reduce file size for plotting

Part C: Exploratory Analysis of Rider Behavior

Once your data is clean, the next step is to summarize it. In this section, you’ll compute and interpret key statistics to uncover patterns in rider behavior across time, rider types, and trip characteristics.

Question 12 [1 point]. Which line of code correctly counts the number of rides per month?
A. df |> summarise(n = month)
B. df |> count(month)
C. df |> filter(month)
D. df |> group_by(month) |> summarise(ride_length)

Question 13 [1 point]. What can you learn from ride counts by month?
A. Differences in average trip length by month
B. Seasonality in ride volume
C. Whether riders prefer weekdays or weekends
D. Membership sign-up trends

Question 14 [3 points]. You want to calculate ride counts by month and member_casual, then pivot the result so that each row is a month and the columns show separate counts for members and casual riders. Complete the code below to return the correct result.

    df |>
      group_by(______, ______) |>
      summarise(n_rides = n(), .groups = "drop") |>
      _______(names_from = ______, values_from = ______)

Question 15 [1 point]. What does the table from Question 14 reveal?
A. Member usage peaks in winter; casuals peak in spring
B. Casual ridership is highly seasonal; member usage is steadier
C. Member and casual riders show identical monthly patterns
D. Casual users avoid weekends

Question 16 [4 points]. Write code that calculates the average ride duration by rider type and its standard deviation.

Question 17 [1 point]. You find that casual riders take longer trips on average. What is the most likely explanation?
A. Members take short rides due to price incentives
B. Casual riders are tourists or leisure users
C. Casual users are penalized for long rides
D. Members use Divvy only during business hours

Question 18 [2 points]. Write code that counts rides by day of the week and rider type. Use the count() function.

Question 19 [1 point]. Casual rider volume peaks on weekends, while member usage is more consistent. What does this suggest?
A. Casual riders commute more than members
B. Members mostly ride on Sundays
C. Casuals use Divvy for leisure; members ride regularly
D. Weekends are when bike maintenance occurs

Question 20 [4 points]. Write code that summarizes ride volume by hour and user type. Use the group_by() function.

Question 21 [1 point]. You observe that member ride volume peaks around 8am and 5pm, while casual rider volume peaks in the mid-to-late afternoon. What is the most plausible interpretation?
A. Member usage reflects commuting behavior, while casual riders are more likely riding for leisure
B. Casual riders commute at less congested times to avoid traffic
C. Members prefer to ride during peak pricing hours
D. Both groups are most active during the workday, just at different times

Part D: Data Visualization

In this section, you'll create visualizations to explore rider behavior across time and user type. Your goal is to translate descriptive patterns into clear, well-labeled plots that support the insights you'll later communicate in your management summary.

Question 22 [3 points]. Create a line plot showing the number of rides taken by members and casual riders each month. Your plot should reveal any seasonal trends in ride volume and clearly distinguish between rider types.

Question 23 [3 points]. Create a bar plot comparing the number of rides taken on each day of the week, separated by rider type. Use side-by-side bars or grouped formatting to allow easy comparison.

Question 24 [3 points]. Create a line plot showing the number of rides that begin during each hour of the day, separated by rider type.

Question 25 [3 points]. Create a box plot comparing the distribution of ride lengths (in minutes) between members and casual riders. Use appropriate filtering to remove extreme outliers if needed, and explain your choices as necessary.

Question 26 [2 points]. Combine your four plots into a single 2×2 figure. Arrange the plots in two rows and ensure that all plots share a single combined legend at the bottom of the figure. You may need to adjust your plot code above to ensure consistency across all four plots.
Hint: You may want to use the following lines of code to collect the legends from multiple plots into one common legend:
● plot_layout(guides = "collect")
● theme(legend.position = "bottom")

Part E: Management Summary

Question 27 [5 points]. Summarize what you learned from the analysis. Highlight at least two important behavioral differences between members and casual users. (Max 2 paragraphs, 8 sentences)
Hint 1: Think of this summary as the first thing that will be read by managers and analysts attending a meeting that you will lead to report on the data you have examined. The meeting would be run as follows: (i) there are no slides or PowerPoint presentations; (ii) for the first 15 minutes of the meeting, all participants read a four-page narrative memo summarising the data analysis and its results, of which your Management Summary is the first page; (iii) after reading the report, the meeting will continue and attendees will be able to ask your team further questions to develop a deeper understanding of what you have done.
Hint 2: Use the tips from the document “Write like an Amazonian” that we have posted on Canvas. These tips are written to help staff at Amazon write concise and meaningful documents, and they will help ensure your writing gets to the point.

Question 28 [3 points]. Based on your analysis, propose one actionable change Divvy could implement to better serve its users or optimize operations. Your recommendation should be supported by evidence from your analysis. (Max 3 sentences)

Question 29 [5 points]. Suppose Divvy wants to act on your recommendation. How could you test its effectiveness using data? Describe an analytical approach that could evaluate its impact. (Max 4 sentences)