Data Engineer Assignment

Question 1:
Scenario: You are working on a real-time data processing project for a social media platform.
The platform generates a massive stream of user activity data, including posts, likes, and
comments. As a data engineer, you need to design a data pipeline that can handle this
continuous stream of data efficiently.
Question: How would you utilize Google's Pub/Sub in this scenario to process and analyze the
real-time user activity data from the social media platform? Explain the key components and
steps involved in setting up the data pipeline using Pub/Sub.
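For orientation only, the sketch below shows one way a single user activity event could be packaged for Pub/Sub, whose messages carry a byte payload plus optional string attributes. The event fields (`user_id`, `event_type`, `post_id`, `occurred_at`) and the helper name `encode_activity_event` are hypothetical and not prescribed by this assignment. The publish call appears only as a comment, since it assumes the `google-cloud-pubsub` client library and an existing GCP project and topic.

```python
import json
from datetime import datetime, timezone

# Event types taken from the scenario: posts, likes, and comments.
ALLOWED_EVENT_TYPES = {"post", "like", "comment"}


def encode_activity_event(user_id, event_type, post_id, occurred_at=None):
    """Return (data, attributes) for one Pub/Sub message.

    data is UTF-8 encoded JSON. attributes holds string key/value pairs so
    that a subscription can filter on them without decoding the payload.
    All field names here are illustrative assumptions.
    """
    if event_type not in ALLOWED_EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type!r}")
    occurred_at = occurred_at or datetime.now(timezone.utc)
    payload = {
        "user_id": user_id,
        "event_type": event_type,
        "post_id": post_id,
        "occurred_at": occurred_at.isoformat(),
    }
    data = json.dumps(payload, sort_keys=True).encode("utf-8")
    attributes = {"event_type": event_type}
    return data, attributes


# With the google-cloud-pubsub client installed and a topic created,
# the pair could be sent roughly as follows (not executed here):
#   publisher = pubsub_v1.PublisherClient()
#   topic_path = publisher.topic_path(project_id, topic_id)
#   publisher.publish(topic_path, data=data, **attributes)
```

Keeping the event type in an attribute as well as in the payload is a design choice: it lets separate subscriptions for likes, comments, and posts be defined on the same topic.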
Question 2:
Scenario: You are part of a team responsible for building and managing data pipelines for a
multinational e-commerce company. The company has a diverse range of data sources,
including transactional databases, customer reviews, and website clickstream data. The team is
looking for a solution to streamline the data integration and transformation processes.
Question: Explain how Google's Data Fusion can be leveraged in this scenario to simplify the
building and management of data pipelines. Discuss the key features and advantages of using
Data Fusion in an e-commerce data engineering workflow.
Question 3 (Bonus Question):
Scenario: You are working for a transportation company that operates a large fleet of vehicles.
The company collects extensive log data from its vehicles, including GPS coordinates, engine
diagnostics, and fuel consumption. The management team wants to optimize the operational
efficiency of the fleet and identify potential maintenance issues proactively.
Question: Describe the importance of log processing in this transportation company's data
engineering workflow. Explain how log processing can be used to monitor, debug, and optimize
the data pipelines that handle the vehicle log data. Provide an example of how log processing
can help in identifying maintenance issues and improving operational efficiency.
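As a rough illustration of the kind of example Question 3 asks for, the sketch below scans vehicle log records and flags vehicles whose engine temperature stays above a limit for several consecutive readings. The CSV layout, the column names, the 110 °C limit, and the three-reading rule are all assumptions made for this sketch, not values supplied by the company in the scenario. It also assumes that each vehicle's rows arrive in time order.

```python
import csv
import io
from collections import defaultdict

ENGINE_TEMP_LIMIT_C = 110.0  # assumed threshold, for illustration only
MIN_CONSECUTIVE = 3          # assumed number of consecutive hot readings

# Hypothetical sample log; a real pipeline would read from files or a stream.
SAMPLE_LOG = """timestamp,vehicle_id,engine_temp_c
2024-01-01T08:00:00Z,V1,95.0
2024-01-01T08:00:00Z,V2,112.5
2024-01-01T08:05:00Z,V1,111.0
2024-01-01T08:05:00Z,V2,113.0
2024-01-01T08:10:00Z,V1,98.0
2024-01-01T08:10:00Z,V2,115.2
"""


def flag_overheating(log_text):
    """Return sorted IDs of vehicles with a run of hot engine readings."""
    streaks = defaultdict(int)  # current run of hot readings per vehicle
    flagged = set()
    for row in csv.DictReader(io.StringIO(log_text)):
        vehicle = row["vehicle_id"]
        try:
            temp = float(row["engine_temp_c"])
        except (TypeError, ValueError):
            # Malformed or missing reading: skip it, leave the run unchanged.
            continue
        if temp > ENGINE_TEMP_LIMIT_C:
            streaks[vehicle] += 1
            if streaks[vehicle] >= MIN_CONSECUTIVE:
                flagged.add(vehicle)
        else:
            streaks[vehicle] = 0  # a normal reading ends the run
    return sorted(flagged)
```

Calling `flag_overheating(SAMPLE_LOG)` returns `['V2']`: vehicle V2 has three hot readings in a row, while V1 has a single spike that resets on the next reading.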
Please note that the scenario descriptions are provided to add context to the questions. Feel
free to modify or elaborate on the scenarios based on your specific requirements or
preferences.
Assignment: Visualizing BigQuery Data in Looker or Power BI
Task:
Using either Looker or Power BI, create a visualization dashboard that displays meaningful
insights from a dataset stored in Google BigQuery. The dataset should contain relevant
information that can be visualized effectively to derive valuable business insights.
Instructions:
- Select a dataset from Google BigQuery that aligns with the assigned task.
- Determine the key metrics or dimensions that are relevant to the dataset and could
provide valuable insights.
- Create an account in Looker or Power BI if you don't have one already.
- Connect Looker or Power BI to Google BigQuery and load the selected dataset.
- Design a dashboard in Looker or Power BI that includes at least three visualizations
showcasing different aspects of the data.
- Ensure that the visualizations are visually appealing, clearly labeled, and provide
meaningful insights.
- Annotate the visualizations with explanatory notes, highlighting the key findings or
trends.
- Export the dashboard as a sharable link or document for evaluation.
Evaluation Criteria:
Your assignment will be evaluated based on the following criteria:
- Understanding of data visualization principles and best practices.
- Effectiveness and relevance of chosen visualizations in conveying insights.
- Clarity and conciseness of explanatory notes accompanying the visualizations.
- Overall design aesthetics and user-friendliness of the dashboard.
- Creativity and originality in presenting the data.
Submission:
Form URL: https://forms.office.com/r/hB3eXea8Kp
Submit your assignment by providing the shareable link or document containing the
visualization dashboard in Looker or Power BI. Additionally, include any necessary credentials
or access permissions to view the dashboard.
There are three expected deliverables from candidates:
1) Create a document in Google Docs or any other suitable tool, and enter a shareable,
unprotected URL in Field 4 of the form. All submissions will go through a plagiarism
check, and candidates should avoid using generative AI. We prioritize concise and
straightforward responses that focus on the main content.
2) Provide a live link to the dashboard graphs or, if the dashboard is not deployed,
PDF snapshots. Enter the link or upload the snapshots in Field 5 of the form.
3) Record a screen capture with a voice-over explaining the dashboard you have created.
Upload the video to any drive or YouTube as a private video and submit the link in Field
6 of the form.
Note: Ensure that any sensitive or confidential information is removed or anonymized from the
dataset before submission.
In case of any queries, please contact one of the following:
1) Hema: hema@vigaet.com
2) Chinmay: chinmay.p@vigaet.com
3) Siddesh: siddesh@vigaet.com
FAQ
Which dataset should be used in BigQuery?
Ans: Use any built-in dataset provided by GCP or populate a custom one.
What if we don't have access to GCP credits?
Ans: If you don't have GCP credits, answer the theoretical questions as well as you
can. If you cannot use Google's Looker, use Power BI to create the graphs.
Which visualization tool should be used, Looker or Power BI?
Ans: Candidates can select either one. Those who don't have GCP free credits can
submit work done in Power BI, but Looker is preferred.
How should the assignment be submitted?
Ans: This document contains a form link, which you can fill in to submit your work.