Uploaded by Norman Chua

Learning Summary - Data Analysis (Google Certification)

Course Outline
Foundations
Ask
Prepare
Process
Analyze
Share
Act
Capstone
Skillsets
 Using data in everyday life
 Thinking analytically
 Applying tools from the data analytics toolkit
 Showing trends and patterns with data visualizations
 Ensuring your data analysis is fair
 Asking SMART and effective questions
 Structuring how you think
 Summarizing data
 Putting things into context
 Managing team and stakeholder expectations
 Problem-solving and conflict-resolution
 Ensuring ethical data analysis practices
 Addressing issues of bias and credibility
 Accessing databases and importing data
 Writing simple queries
 Organizing and protecting data
 Connecting with the data community (optional)
 Connecting business objectives to data analysis
 Identifying clean and dirty data
 Cleaning small datasets using spreadsheet tools
 Cleaning large datasets by writing SQL queries
 Documenting data-cleaning processes
 Sorting data in spreadsheets and by writing SQL queries
 Filtering data in spreadsheets and by writing SQL queries
 Converting data
 Formatting data
 Substantiating data analysis processes
 Seeking feedback and support from others during data analysis
 Creating visualizations and dashboards in Tableau
 Addressing accessibility issues when communicating about data
 Understanding the purpose of different business communication tools
 Telling a data-driven story
 Presenting to others about data
 Answering questions about data
 Coding in R
 Writing functions in R
 Accessing data in R
 Cleaning data in R
 Generating data visualizations in R
 Reporting on data analysis to stakeholders
 Building a portfolio
 Increasing your employability
 Showcasing your data analytics knowledge, skill, and technical expertise
 Sharing your work during an interview
 Communicating your unique value proposition to a potential employer
Course 1 – Foundations: Data, Data, Everywhere
Course 1, Week 1
Transforming data into insights
Data – a collection of facts that can be used to draw conclusions, make predictions, and assist in decision-making.

Data needs to be controlled by businesses so they can use it to improve processes, identify
opportunities and trends, launch new products, serve customers, and make thoughtful
decisions.
Analysis – turning data into insights.
Data Analysis – the collection, transformation, and organization of data in order to draw conclusions, make
predictions, and drive informed decision-making.
Data Analysis Process:
o Ask
o Prepare
o Process
o Analyze
o Share
o Act
Data Science branches into:
a) Machine Learning / AI
- used for automation / making many decisions under uncertainty
- excellence requires high performance (e.g. success rate, accuracy)
b) Statistics
- used for making a few important decisions under uncertainty
- excellence requires care and rigor, with the intent to protect decision makers from coming to
the wrong conclusion
c) Analytics
- used for uncovering the unknown and finding inspiration for decisions (i.e. when it is unclear how
many decisions need to be made)
- excellence requires speed amidst ambiguity
Business Analytics – the use of math and statistics to collect, analyze, and interpret data to make better business
decisions.
o Descriptive Analytics – the interpretation of historical data to identify trends and patterns (“what
happened?”)
o Diagnostic Analytics – used to identify root causes of problems and correlations between
variables (“why did this happen?”)
o Predictive Analytics – taking interpreted data and using it to forecast future outcomes to inform
business strategies (“what might happen in the future?”)
o Prescriptive Analytics – used to determine which outcome will yield the best result given a
scenario (“what should we do next?”)
Business Analytics vs. Data Science
Business Analytics – main goal is to extract meaningful insights from data to guide organizational decisions
(tasks such as budgeting, forecasting, product development)
Data Science – focused on turning raw data into meaningful conclusions through using algorithms and statistical
models (tasks such as data wrangling*, programming, statistical modeling)
(https://online.hbs.edu/blog/post/importance-of-business-analytics)
(https://online.hbs.edu/blog/post/business-analytics-examples)
* Data Wrangling – also called data cleaning, data remediation, or data munging – refers to a variety of
processes designed to transform data into more readily used formats. Can be manual or automated (e.g. merging
multiple data sources into a single dataset for analysis, identifying gaps in data, deleting data that’s either
unnecessary or irrelevant to the project being worked on, identifying extreme outliers in data and either explaining
the discrepancies or removing them).
o Discovery – becoming familiar with the data to conceptualize how to employ it
o Structuring – transforming data so it can be readily used
o Cleaning – removing inherent errors in data that might distort the analysis
o Enriching – determining whether to enrich or augment existing data
o Verifying – confirming that data is consistent and of high quality
o Publishing – making data available for analysis
(https://online.hbs.edu/blog/post/data-wrangling)
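The wrangling tasks listed above (merging sources, removing repeat records, flagging gaps) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, field names, and values below are made up, and the helper wrangle() is hypothetical rather than part of any tool named in these notes.

```python
# Hypothetical records from two sources; all names and values are made up.
crm_rows = [
    {"customer_id": 1, "name": "Ana", "monthly_spend": 120.0},
    {"customer_id": 2, "name": "Ben", "monthly_spend": 95.0},
    {"customer_id": 2, "name": "Ben", "monthly_spend": 95.0},  # repeat record
    {"customer_id": 3, "name": "Cy", "monthly_spend": None},   # gap in the data
]
region_rows = [
    {"customer_id": 1, "region": "North"},
    {"customer_id": 2, "region": "South"},
    {"customer_id": 3, "region": "North"},
]


def wrangle(crm, regions):
    """Merge two sources, drop repeat records, and set aside rows with gaps."""
    region_by_id = {r["customer_id"]: r["region"] for r in regions}
    seen = set()
    cleaned, gaps = [], []
    for row in crm:
        cid = row["customer_id"]
        if cid in seen:  # cleaning: skip records already seen
            continue
        seen.add(cid)
        if row["monthly_spend"] is None:  # verifying: note the gap for follow-up
            gaps.append(cid)
            continue
        # enriching: add the region from the second source
        cleaned.append({**row, "region": region_by_id.get(cid)})
    return cleaned, gaps


cleaned, gaps = wrangle(crm_rows, region_rows)
print(cleaned)
print("customer ids with missing spend:", gaps)
```

Rows with gaps are set aside rather than silently dropped, so they can be explained or filled in later.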
5 Business Analytics Skills for Professionals
1. Data Literacy*
- Familiarity with the language of data, including different types, sources, tools, and techniques
2. Data Collection
- Examples include existing datasets, customer surveys, interviews, questionnaires, and focus
groups
3. Statistical Analysis
- Methods include:
  o Hypothesis Testing (a statistical means of testing an assumption)
  o Linear Regression Analysis (used to evaluate the relationship between two variables)
  o Multiple Regression Analysis (used to evaluate the relationship between three or more variables)
4. Communication
- Includes oral communication and presentation skills, and written communication in the form of
reports
5. Data Visualization
- Allows findings to be presented in easily digestible formats for those who may not be as data literate
- It is more effective to distill findings into key takeaways and present them in a manner that’s easy to
understand
(https://online.hbs.edu/blog/post/business-analytics-skills)
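As an illustration of the linear regression method mentioned above, the following sketch fits a straight line to two variables using ordinary least squares. The data (ad_spend, sales) and the function name fit_line() are assumptions made up for this example; real analyses would normally rely on a statistics package.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up data that lies exactly on the line y = 2x + 1.
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

slope, intercept = fit_line(ad_spend, sales)
print(f"sales = {slope:.2f} * ad_spend + {intercept:.2f}")
```

The slope estimates how much the second variable changes for each one-unit change in the first.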
*Data Literacy Skills & Concepts
o Data Analysis
  - Descriptive Analysis – seeks to explain or describe what has happened
  - Diagnostic Analysis – seeks to explain or diagnose why something has happened
  - Predictive Analysis – seeks to forecast what might happen
  - Prescriptive Analysis – seeks to prescribe a course of action that might lead to a desired
outcome
o Data Wrangling
  - Act of transforming data from its raw state into something that can be readily used
  - Also known as data munging or data cleaning
o Data Visualization
  - Process of creating a graphical or visual representation of data; often a crucial part of
effectively communicating insights both inside and outside the organization
o Data Ecosystem
  - Refers to all of the components an organization leverages to collect, store, and analyze data
  - Includes physical infrastructure like server space and cloud storage solutions, and non-physical
components like data sources, programming languages, code packages, algorithms, and
software
o Data Governance
  - The process and practices an organization uses to formally manage its data assets
  - Typically broken down into:
    Quality: how to ensure data remains accurate, trustworthy, and complete
    Security: how to secure data from unauthorized access
    Privacy: how to protect sensitive information collected and stored
    Stewardship: how to ensure data processes are followed appropriately
o Data Team
  - Data Scientists – leverage advanced mathematics, programming, and tools to conduct and
manage large-scale analyses
  - Data Engineers – responsible for building and maintaining datasets that are leveraged in data
projects
  - Data Analysts – conduct the majority of the analyses an organization requires
(https://online.hbs.edu/blog/post/data-literacy)
Top Data Science Skills:
1. Critical Thinking – ability to recognize business problems, conduct testing, and swiftly identify trends in
data
2. Mathematical Ability – including statistics, probability, linear algebra, multivariable calculus
3. Data Visualization – transforming data into compelling visuals that tell a story
4. Programming Skills – including Python, R, SQL
5. Data Wrangling – cleaning up data in preparation for analysis
6. Business Fluency – necessary to understand what information drives business decisions
7. Communication – with the help of data visualization
8. Machine Learning – the use of computer algorithms that automatically learn and adapt from data. Uses
include risk management, performance analysis, trading, and automation
9. Ethical Skills
(https://online.hbs.edu/blog/post/data-science-skills)
4 Ways to Improve Analytical Skills
(https://online.hbs.edu/blog/post/how-to-improve-analytical-skills)
6 Steps to Analyzing Datasets
1. Clean up data
Data wrangling – the process of uncovering and correcting or eliminating inaccurate or repeat
records from a dataset, transforming raw data into a useful format for analysis.
2. Identify the right questions
Questions should be easily measurable and closely related to specific business problems.
Know what should be learned, what is expected to be learned, and how the information will be
used.
3. Break down data into segments
Break datasets into smaller, defined groups
4. Visualize the data
Data visualization* – the process of creating graphical representations of data to help easily
identify any trends or patterns and obvious outliers. Engaging visuals also help in effectively
communicating findings to key stakeholders.
5. Use data to answer questions
If results are inconclusive, revisit previous steps in the process
6. Supplement with qualitative data
Pair quantitative findings with qualitative information (which may be captured via
questionnaires, interviews, or testimonials). Datasets tend to be useful in understanding “what”,
while qualitative information tends to give insights on “why.”
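The "break down data into segments" step above can be sketched as a simple group-and-summarize operation. The order records, the segment labels, and the helper average_by_segment() are assumptions invented for this example.

```python
from collections import defaultdict
from statistics import mean

# Made-up orders; "segment" is the field used to split the dataset.
orders = [
    {"segment": "new", "order_value": 20.0},
    {"segment": "returning", "order_value": 50.0},
    {"segment": "new", "order_value": 30.0},
    {"segment": "returning", "order_value": 70.0},
]


def average_by_segment(rows, key, value):
    """Group rows by `key` and return the mean of `value` for each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {name: mean(values) for name, values in groups.items()}


averages = average_by_segment(orders, "segment", "order_value")
print(averages)
```

Comparing the per-segment averages is often a quicker way to spot a pattern than scanning the full dataset.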
*Data Visualization – the process of creating a visual representation of the information within a dataset. The idea
is to make data more accessible across an organization and to better communicate with external parties. Most
common techniques include:
- Pie Charts
- Bar Charts
- Histograms
- Gantt Charts
- Heat Maps
- Box-and-Whisker Plots
- Waterfall Charts
- Area Charts
- Scatter Plots
- Infographics
- Maps
(https://online.hbs.edu/blog/post/how-to-analyze-datasets)
Data Visualization Techniques
(https://online.hbs.edu/blog/post/data-visualization-techniques)
Data Visualization Tools:
1. Microsoft Excel (and Power BI)
2. Google Charts
3. Tableau
4. Zoho Analytics
5. HubSpot
6. Databox
7. Datawrapper
8. Infogram
(https://online.hbs.edu/blog/post/data-visualization-tools)
Understanding the data ecosystem
Ecosystem – a group of elements that interact with one another
Data Ecosystem – a combination of hardware tools, software tools, and human resource that interact with one
another in order to produce, manage, store, organize, analyze, and share data*
Cloud – a place to keep data online
*Job of data analysts is to harness the power of the data ecosystem to find the right information, and provide
analysis that helps make smart decisions
Data Scientist vs. Data Analyst
Data Science – creating new ways of modeling and understanding the unknown by using raw data
Data Scientists – creates new questions using data
Data Analysts – finds answers to existing questions by creating insights from data sources
Data Analysis vs. Data Analytics
Data Analysis – collection, transformation, and organization of data in order to draw conclusions that help drive
informed decision-making
Data Analytics – the science of data, a broad concept that encompasses everything from the job of managing and
using data to the tools and methods that data workers use every day
Data-driven decision-making
- Using facts to guide business strategy
- Involves figuring out business needs, finding relevant data, analyzing it, then using it to uncover
trends, patterns, and relationships
- Most powerful when data is combined with human experience, observation, and intuition (in
certain cases)
- Insights from subject matter experts must be included, as they can identify inconsistencies,
make sense of gray areas, and eventually validate choices being made
Data Analysis Process:
o Ask questions and define the problem
o Prepare data by collecting and storing the information
o Process data by cleaning and checking the information
o Analyze data to find patterns, relationships, and trends
o Share data with your audience
o Act on the data and use the analysis results
* Blending data with business knowledge, plus sometimes a touch of gut instinct, is a common part of
the process.
* Good questions to ask when figuring out how much business knowledge and gut instinct should be
involved in each project:
(a) What kind of results are needed?
(b) Who will be informed?
(c) Am I answering the question being asked?
(d) How quickly does a decision need to be made?
Data Analysis Life Cycle
*No single defined structure of phases, but fundamentals are always shared.
(https://online.hbs.edu/blog/post/data-life-cycle)
(https://www.informit.com/articles/article.aspx?p=2473128&seqNum=11&ranMID=24808)
Google’s process:
1. Ask: Business Challenge/Objective/Question
2. Prepare: Data generation, collection, storage, and data management
3. Process: Data cleaning/data integrity
4. Analyze: Data exploration, visualization, and analysis
5. Share: Communicating and interpreting results
6. Act: Putting your insights to work to solve the problem
Dell’s process:
1. Discovery
2. Pre-processing data
3. Model planning
4. Model building
5. Communicate results
6. Operationalize
Project-based data analytics life cycle:
1. Identifying the problem
2. Designing data requirements
3. Pre-processing data
4. Performing data analysis
5. Visualizing data
(http://pingax.com/understanding-data-analytics-project-life-cycle/#google_vignette)
Big Data analytics life cycle:
1. Business case evaluation
2. Data identification
3. Data acquisition and filtering
4. Data extraction
5. Data validation and cleaning
6. Data aggregation and representation
7. Data analysis
8. Data visualization
9. Utilization of analysis results
(https://www.informit.com/articles/article.aspx?p=2473128&seqNum=11&ranMID=24808)
Course 1, Week 2
Embracing Data Analyst Skills
Analytical Skills – qualities and characteristics associated with solving problems using facts.
1. Curiosity – wanting to learn something
2. Understanding Context – the condition in which something exists or happens
3. Having a Technical Mindset – the ability to break things down into smaller steps or pieces, and work with
them in an orderly and logical way
4. Data Design – the skill of organizing information
5. Data Strategy – the management of people, processes, and tools used in data analysis
a. People need to know how to use the right data to find solutions to the problem
b. Processes ensure the path to the solution is clear and accessible
c. The right technological tools need to be used for the job
Thinking About Analytical Thinking
Analytical Thinking – identifying and defining a problem and then solving it by using data in an organized, step-by-step manner
Five Key Aspects to Analytical Thinking:
1. Visualization
- The graphical representation of information that allows information to be understood and
explained more effectively
- Examples include graphs, maps, or other design elements
2. Strategy
- Necessary to stay focused and on track amidst a large quantity of data
- Involves understanding what needs to be achieved with data and identifying how to get there
- Improves the quality and usefulness of data collected
3. Problem-Orientation
- Using a problem-oriented approach in identifying, describing, and solving problems
- Keeping the problem top of mind throughout the entire project
4. Correlation
- Identifying relationships between two or more pieces of data
- REMINDER: CORRELATION DOES NOT EQUAL CAUSATION. Just because pieces of data
are trending in the same direction does not necessarily mean they are related
5. Big-Picture / Detail-Oriented Thinking
a. Big-Picture Thinking
- Looking at the whole picture without getting stuck on every tiny piece of information
- Important to zoom out and see possibilities and opportunities
b. Detail-Oriented Thinking
- Figuring out all of the aspects that will help execute plans
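To make the Correlation aspect above concrete, the sketch below computes the Pearson correlation coefficient for two made-up series. The series names and values are invented for illustration, and pearson_r() is a hypothetical helper written for this example.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up monthly figures that rise together.
ice_cream_sales = [10, 20, 30, 40, 50]
sunburn_cases = [1, 2, 3, 4, 5]

r = pearson_r(ice_cream_sales, sunburn_cases)
print(f"r = {r:.2f}")
```

A value near 1 only says the two series move together. It does not show that one causes the other; in this invented example a third factor, such as sunny weather, could drive both.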
Thinking Methods:
1. Analytical Thinking
2. Critical Thinking
3. Creative Thinking
Usual questions by Data Analysts:
1. “What is the root cause of the problem?”
- Can be addressed through the Five Whys
Ask “why?” five times to get to the root cause of the problem
2. “Where are the gaps in our process?”
- Can be addressed through Gap Analysis
Examination and evaluation of how a process currently works in order to get to the ideal future
state
General approach is to understand where the process is now in comparison to where it should
ideally be, then identify the gap and how to bridge the current and future state
3. “What did we not consider before?”
- A great way to think about information or procedures that might be missing from a process in order to
improve future decision-making and strategy
Thinking About Outcomes


Data-driven decision-making allows businesses to:
 Gain valuable insights
 Verify theories or assumptions
 Better understand opportunities and challenges
 Support objectives
 Help make plans
 Allows for greater confidence in choices and ability to address business challenges
 Allows for proactivity when opportunities present themselves
 Allows saving of time and effort when working towards a goal
Practicing necessary skills:
 Curiosity and Context – curiosity in patterns and relationships in everyday life, then using
context to make predictions, research answers, and draw conclusions
 Having a Technical Mindset – building on gut feelings and using a technical approach to explore
them (seek out facts, analyze, then use insights to make informed decisions)
 Data Design – actively designing day-to-day data so that they are organized in a logical way
that makes them easy to access, understand, and make the most of
 Data Strategy – making sure others are on board on the procedures in place and technology
being used in gathering and using data
Course 1, Week 3
Data Life Cycle
Six Stages of Data Life Cycle:
1. Plan
- When it’s decided what kind of data is needed, how it will be managed, and who will be
responsible for it
2. Capture
- When data is collected from a variety of sources
- Data can be collected from outside resources (e.g. publicly available datasets) or from an internal
database*
- *When maintaining an internal database, ensuring data integrity, credibility, and privacy is
important
3. Manage
- When data is cared for and maintained. Includes determining how and where it is stored and
the tools used to do so.
- Integral to the process of data cleansing
4. Analyze
- When data is used to solve problems, make informed decisions, and support business goals
5. Archive
- When data is stored long-term for future reference
6. Destroy
- When data is removed from storage and any shared copies are deleted to protect a company’s
private information and private data about its customers
- To destroy data on hard drives, secure data erasure software is used. To destroy paper
files, they are shredded
(https://www.sfmagazine.com/articles/2018/july/the-data-life-cycle/?psso=true)
(https://online.hbs.edu/blog/post/data-life-cycle)
Data Analysis Process Outline
Six Phases of Data Analysis Life Cycle:
1. Ask: Define the problem and confirm stakeholder expectations
- “What is the problem that we’re trying to solve?”
- “What is the purpose of this analysis?”
- “What are we hoping to learn?”
2. Prepare: Collect and store data for analysis
- Think about what kind of data we need based on what is learned from the Ask phase (e.g. qualitative
vs. quantitative, cross-sectional / points in time vs. longitudinal over a long period of time)
- Think about how to collect data (e.g. existing data or brand new data?)
3. Process: Clean and transform data to ensure integrity
- Begins with cleaning data, and understanding its structure, quirks, nuances, and its potential to
answer business questions
- Involves quality assurance checks (e.g. checking if all data anticipated is available, minimizing
missing data or gaps in the data collection effort, identifying outliers) to ensure data can be
analyzed appropriately and responsibly
4. Analyze: Use data analysis tools to draw conclusions
- Must be objective and unbiased
- Involves a series of analyses that are planned as early as the Ask phase
- Involves looking for patterns (while being mindful of personal intuition)
5. Share: Interpret and communicate results to others to make data-driven decisions
- May begin by sharing high-level findings with leadership, followed by gradual digging and
sharing of information with the rest of the organization
6. Act: Put your insights to work in order to solve the original problem
- Based on results of analysis, decide on interventions not only on an organizational level but
also on the team level
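The quality assurance checks mentioned under the Process phase (missing data, outliers) can be sketched as below. This is one possible approach, not a method prescribed by the course: it assumes missing values are stored as None and uses the common 1.5 × IQR rule to flag outliers. The data and the helper quality_report() are made up.

```python
from statistics import quantiles


def quality_report(values):
    """Count missing entries and flag outliers with the 1.5 * IQR rule."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)  # quartiles of the non-missing values
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "missing": len(values) - len(present),
        "outliers": [v for v in present if v < low or v > high],
    }


# Made-up daily order counts with two gaps and one suspicious value.
daily_orders = [12, 14, None, 13, 15, 11, 120, 14, None, 13]

report = quality_report(daily_orders)
print(report)
```

Flagged values still need a human decision: an outlier may be a data-entry error to remove or a real event to explain.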
Data Analysis Toolbox
Most common tools:
1. Spreadsheets
- A digital worksheet that allows users to:
  o Collect, store, organize, and sort information
  o Identify patterns and piece the data together in a way that works for each specific data
project
  o Create excellent data visualizations, like graphs and charts
- Two most popular ones are Microsoft Excel and Google Sheets
- Useful features include Formulas and Functions
  o Formulas – a set of instructions that performs a specific calculation using data in a spreadsheet
(e.g. MDAS, average, sum of values that meet a particular rule, etc.)
  o Functions – preset commands that automatically perform a specific process or task
using the data in a spreadsheet, allowing for efficiency
2. Query Languages
- A computer programming language that allows users to:
  o Isolate specific information from a database(s)*
  o Make it easier to learn and understand requests made to databases
  o Select, create, add, or download data from a database for analysis
- *A database is a collection of structured data stored in a computer system
- Most popular is SQL (Structured Query Language), often pronounced “sequel”
- Akin to requesting the database to act on a command (e.g. insert, delete, select, or update data)
3. Visualization Tools
- Using graphical representation of information (e.g. graphs, maps, tables) to better communicate
insights to others
- Most popular ones include Tableau and Looker
  o Tableau – simple drag and drop feature lets users create interactive graphs in
dashboards and worksheets
  o Looker – communicates directly with a database, allowing connection with data right
to the visual tool chosen
- Allows users to:
  o Turn complex numbers into a story that people can understand
  o Help stakeholders come up with conclusions that lead to informed decisions and
effective business strategies
4. Programming Languages
- Most common languages used by Data Analysts include R and Python, both used for
statistical analysis, visualization, and other data analysis
Choosing the right tools
- During the Share phase of Data Analysis, Data Visualization tools are mostly used to create complex
and eye-catching visualizations
- During the Prepare, Process, and Analyze phases of Data Analysis, Spreadsheets and Query Languages are
most useful. The differences between the two are outlined below:

Spreadsheets                              | Databases
Software applications                     | Data stores - accessed using a query language (e.g. SQL)
Structure data in a row and column format | Structure data using rules and relationships
Organize information in cells             | Organize information in complex collections
Provide access to a limited amount of data| Provide access to huge amounts of data
Manual data entry                         | Strict and consistent data entry
Generally one user at a time              | Multiple users
Controlled by the user                    | Controlled by a database management system
Course 1, Week 4
Mastering Spreadsheet Basics
Main features of a spreadsheet:
1. Cell
2. Column
- Column labels are called Attributes (also referred to as column names, column labels, headers,
or header row)
3. Row
- Also called an Observation
4. Formulas
- A set of instructions that perform specific actions using the data in the spreadsheet
- Use cell references for the values calculated
- Always begin with the “=” sign
5. Functions
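As a rough analogy for the difference between formulas and functions described above, the sketch below models a tiny sheet as a Python dictionary keyed by cell reference. The cell values are made up, the helper cell_range() is hypothetical, and the spreadsheet expressions in the comments (=A1*B1, =SUM, =AVERAGE, =SUMIF) are shown only for comparison.

```python
# A tiny, hypothetical sheet: cell references map to values.
sheet = {"A1": 10, "A2": 25, "A3": 40, "B1": 2}


def cell_range(column, first_row, last_row):
    """Return the values of a range such as A1:A3 as a list."""
    return [sheet[f"{column}{row}"] for row in range(first_row, last_row + 1)]


# A formula built by hand from cell references and operators, like =A1*B1
formula_result = sheet["A1"] * sheet["B1"]

# Preset functions applied to a range, like =SUM(A1:A3) and =AVERAGE(A1:A3)
sum_result = sum(cell_range("A", 1, 3))
average_result = sum_result / len(cell_range("A", 1, 3))

# A sum of values that meet a rule, like =SUMIF(A1:A3, ">20")
sumif_result = sum(v for v in cell_range("A", 1, 3) if v > 20)

print(formula_result, sum_result, average_result, sumif_result)
```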
Training references:
Google Sheets Training and Help
https://support.google.com/a/users/answer/9282959?visit_id=637361702049227170-1815413770&rd=1
Google Sheets Cheat Sheet
https://support.google.com/a/users/answer/9300022
Microsoft Excel Video Training
https://support.microsoft.com/en-us/office/excel-video-training-9bc05390-e94c-46af-a5b3-d7c22f6990bb
Structured Query Language (SQL)
- Useful for storing, organizing, and analyzing data just like Spreadsheets, but at a larger scale
(like a supersized Spreadsheet)
- Needs a database that understands its language, and its queries* are universal
- *Query – a request for data or information from a database
- Syntax
  o A unique set of guidelines followed by programming languages (like SQL)
  o The predetermined structure of a language that includes all required words, symbols, and
punctuation, as well as their proper placement
- SQL Syntax:
  o SELECT – use to choose the columns you want to return
  o FROM – use to choose the tables where the columns you want are located
  o WHERE – use to filter for certain information
- Sample structure:
#2 SELECT
    [choose the column(s) you want]
#1 FROM
    [from the appropriate table the data lives on]
#3 WHERE
    [a certain condition is met]
  o Fill in the information in sequence #1-3; the suggested order is to start big (data table) and
go small (specific conditions)
  o New lines and indentation are not required by SQL, but they make the query easier to read
Example (pulling data on customers with the first name Tony):
SELECT
    first_name
FROM
    customer_data.customer_name
WHERE
    first_name = 'Tony'
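The query above can be tried end to end with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the table contents are made up, and because SQLite has no dataset prefix, the table is simply called customer_name instead of customer_data.customer_name.

```python
import sqlite3

# In-memory database with a made-up customer_name table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_name (customer_id INTEGER, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO customer_name VALUES (?, ?, ?)",
    [(1, "Tony", "Magnolia"), (2, "Rosa", "Chavez"), (3, "Tony", "Chen")],
)

query = """
SELECT
    first_name
FROM
    customer_name
WHERE
    first_name = 'Tony'
"""
rows = conn.execute(query).fetchall()
print(rows)  # one tuple per matching row
conn.close()
```

Each returned row is a tuple containing only the selected column, one per customer whose first name matches.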

Multiple columns in a query structure:
SELECT
    columnA,
    columnB,
    columnC
FROM
    Table where the data lives
WHERE
    Certain condition is met
Example (pulling data on customers with the first name Tony):
SELECT
    customer_id,
    first_name,
    last_name
FROM
    customer_data.customer_name
WHERE
    first_name = 'Tony'
- In general, it is more efficient to select only the columns that will actually be used,
rather than every column in the table
Multiple columns and WHERE clause in a query structure:
SELECT
    columnA,
    columnB,
    columnC
FROM
    Table where the data lives
WHERE
    Condition 1
    AND Condition 2
    AND Condition 3
- The SELECT command uses a comma to separate fields/variable parameters (no comma after the last one)
- The WHERE command uses the AND statement to connect conditions
- There are other connectors/operators for the WHERE command, such as OR and
NOT
Example (pulling data on customers with multiple conditions):
SELECT
    customer_id,
    first_name,
    last_name
FROM
    customer_data.customer_name
WHERE
    customer_id > 0
    AND first_name = 'Tony'
    AND last_name = 'Magnolia'
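To see how AND, OR, and NOT change which rows come back, the sketch below runs three WHERE clauses against the same made-up table using Python's built-in sqlite3 module. The rows are invented for illustration and the table name drops the customer_data prefix, which SQLite does not use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_name (customer_id INTEGER, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO customer_name VALUES (?, ?, ?)",
    [(1, "Tony", "Magnolia"), (2, "Rosa", "Chavez"), (3, "Tony", "Chen")],
)

# AND: every condition must be true for a row to be returned.
and_rows = conn.execute("""
    SELECT customer_id, first_name, last_name
    FROM customer_name
    WHERE customer_id > 0
      AND first_name = 'Tony'
      AND last_name = 'Magnolia'
""").fetchall()

# OR: at least one condition must be true.
or_rows = conn.execute("""
    SELECT customer_id
    FROM customer_name
    WHERE first_name = 'Rosa' OR last_name = 'Chen'
    ORDER BY customer_id
""").fetchall()

# NOT: keeps the rows where the condition is false.
not_rows = conn.execute("""
    SELECT customer_id
    FROM customer_name
    WHERE NOT first_name = 'Tony'
""").fetchall()

print(and_rows, or_rows, not_rows)
conn.close()
```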
SQL Guide
A. Capitalization, indentation, and semicolons
o SQL queries can be written in all lower case and with extra spaces between words
o Capitalization and indentation can help read information more easily (better formatting)
o Semicolon may be required as a statement terminator
  - The semicolon is part of the American National Standards Institute (ANSI) SQL-92 standard,
which is recommended as a common syntax
  - If a statement works without a semicolon, it’s fine
B. WHERE conditions
o SELECT clause – identifies the column you want to pull data from
o FROM clause – identifies the table where the column is located
o WHERE clause – narrows the query so that the database returns only the data with an exact value
match or the data that matches the input condition
o LIKE – can be used to tell the database to look for certain patterns
o % (percent sign) – wildcard that matches zero or more characters; some databases use * (asterisk) instead
o <> – used to create conditions meaning “does not equal”
o Example:
  - Specific query:
    WHERE field1 = 'Chavez'
  - Pattern query:
    WHERE field1 LIKE 'Ch%'
    WHERE field1 LIKE 'Ch*'
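The exact-match, pattern, and "does not equal" conditions above can be compared side by side with Python's built-in sqlite3 module. The table name customers and its rows are assumptions made up for this sketch. Two caveats: SQLite uses % as its LIKE wildcard, and its LIKE is case-insensitive for ASCII letters by default, so other databases may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (field1 TEXT)")  # field1 holds last names
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("Chavez",), ("Chen",), ("Magnolia",), ("Ch",)],
)


def last_names(where_clause):
    """Run a query with the given WHERE clause and return the matching names."""
    sql = f"SELECT field1 FROM customers WHERE {where_clause} ORDER BY field1"
    return [row[0] for row in conn.execute(sql)]


exact = last_names("field1 = 'Chavez'")       # exact value match
pattern = last_names("field1 LIKE 'Ch%'")     # starts with 'Ch'; % can match nothing
not_equal = last_names("field1 <> 'Chavez'")  # everything except 'Chavez'

print(exact, pattern, not_equal)
conn.close()
```

The pattern result includes the bare value 'Ch', which shows that % also matches zero characters.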
C. SELECT all columns
o SELECT * – selects all columns in the table
o Although a correct SQL statement from a syntax point of view, it should be used sparingly and
with caution as it can cause a query to run slowly
D. Comments
o Used when tables aren’t designed with descriptive enough naming conventions
o Good practice that saves time and energy when trying to understand previously written queries
o Comments are text placed between the characters /* and */, or after two dashes (--)
o Can be placed outside of a statement or within a statement
o Example:
SELECT
    field1 /* this is the last name column */
FROM
    table -- this is the customer data table
WHERE
    field1 LIKE 'Ch%';
o Example:
-- This is an important query used later to join with the accounts table
SELECT
    rowkey, -- key used to join with account_id
    info.date, -- date is in string format YYYY-MM-DD HH:MM:SS
    info.code -- e.g., 'pub-###'
FROM
    Publishers
E. Aliases
o Assigns a new name or alias to the column or table names to make them easier to work with
o Uses the SQL “AS” clause
o Can be used to avoid the need for comments
o Good only for the duration of the query
o Doesn’t change the actual name of a column or table in the database
o Example:
SELECT
    field1 AS last_name -- alias to make my work easier
FROM
    table AS customer -- alias to make my work easier
WHERE
    field1 LIKE 'Ch%'
F. SQL Tutorials:
(https://www.w3schools.com/sql/default.asp)
(https://www.sqltutorial.org/sql-cheat-sheet/)
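Comments and aliases from sections D and E can be checked together using Python's built-in sqlite3 module. In this sketch the table name customer_table, the column field1, and the rows are assumptions made up for illustration. The final query against sqlite_master is specific to SQLite and is only there to show that the alias did not rename anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_table (field1 TEXT)")
conn.executemany(
    "INSERT INTO customer_table VALUES (?)",
    [("Chavez",), ("Chen",), ("Magnolia",)],
)

query = """
-- field1 holds the customer's last name
SELECT
    c.field1 AS last_name   /* column alias */
FROM
    customer_table AS c     -- table alias
WHERE
    c.field1 LIKE 'Ch%'
ORDER BY
    last_name
"""
cursor = conn.execute(query)
column_names = [d[0] for d in cursor.description]  # names as seen in the result
rows = cursor.fetchall()

# The alias lasts only for the query; the stored table is unchanged.
stored_tables = [
    r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

print(column_names, rows, stored_tables)
conn.close()
```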
Data Visualization
- Graphical representation of information
- Allows data to be easily understood and interesting to look at
Steps to plan a data visualization:
1. Explore the data for patterns
- Reviewing basic information, behaviors, numerical data (e.g. sales, basket size),
qualitative data (e.g. gender, mobile/desktop), geographical information, etc.
2. Plan your visuals
- Identify which data, findings, and patterns should be included in the visualization
- Ex. Show sales over time, connect sales to location, show relationship between sales
and website use, show which customers fuel growth
3. Create your visuals
- Creating the right visualization for a presentation is a process which involves trying
different visualization formats and making adjustments as necessary
- The idea is to create the most compelling story for stakeholders
- Line charts – can track sales over time
- Maps – can connect sales to locations
- Donut charts – can show customer segments
- Bar charts – can compare total visitors and visitors that make a purchase
Data Visualization Toolkit:
- Can use built-in visualization tools in spreadsheets
- Can use more advanced tools such as Tableau that allow data to be integrated into dashboard-style
visualizations
- With the programming language R, can use visualization tools in RStudio (an independent
integrated development environment / IDE for visualization needs)
- Choice will be driven by a variety of factors including the size of the data and the process used for analyzing
data (e.g. spreadsheet, database/queries, or programming languages)
Course 1, Week 5
Issue – A topic or subject to investigate
Question – Designed to discover information
Problem – An obstacle or complication that needs to be worked out
Business Task – The question or problem data analysis answers for a business
(e.g. “Analyze weather data from the last decade to identify predictable patterns”)
Data-driven decision-making – Using facts observed from data analysis to guide business strategy
Fairness – Ensuring that analysis doesn’t create or reinforce bias
Course 2 – Ask Questions to Make Data-Driven Decisions
Course 2, Week 1
Problem-solving and effective questioning
Structured thinking
– The process of recognizing the current problem or situation, organizing available information, revealing
gaps and opportunities, and identifying the options
– Used to address a vague, complex problem by breaking it down into smaller steps that will lead to
logical solutions
Take action with data
Structured Thinking:
- Breaking the data analysis process into smaller, manageable parts (four basic activities):
  o Recognizing the current problem or situation
  o Organizing available information
  o Revealing gaps and opportunities
  o Identifying options

Phase: Ask
Primary Goal: Figure out what problem is being solved
Objectives/Considerations:
- Define the problem
- Understand stakeholders’ expectations
- Focus on the actual problem and avoid distractions
- Collaborate with stakeholders and keep an open line of communication
- Take a step back and see the whole situation in context
Relevant Questions:
1. What are my stakeholders saying their problems are?
2. How can I help the stakeholders resolve their questions?

Phase: Prepare
Primary Goal: Decide what data needs to be collected and how to organize it in order to answer questions / resolve problems
Objectives/Considerations:
- What metrics to measure
- Locate data in database
- Create security measures to protect data
Relevant Questions:
1. What do I need to figure out how to solve this problem?
2. What research do I need to do?

Phase: Process
Primary Goal: Clean data and get rid of possible errors, inaccuracies, or inconsistencies
Objectives/Considerations:
- Use spreadsheet functions to find incorrectly entered data
- Use SQL functions to check for extra spaces
- Remove repeated entries
- Check as much as possible for bias in the data
Relevant Questions:
1. What data errors or inaccuracies might get in my way of getting the best possible answer to the problem I am trying to solve?
2. How can I clean my data so the information I have is more consistent?

Phase: Analyze
Primary Goal: Think analytically about data (make sure it’s sorted and formatted for easier use)
Objectives/Considerations:
- Perform calculations
- Combine data from multiple sources
- Create tables with your results
Relevant Questions:
1. What story is my data telling me?
2. How will my data help me solve this problem?
3. Who needs my company’s products or services? What type of person is most likely to use it?

Phase: Share
Primary Goal: Summarize results with clear and enticing visuals to show stakeholders how to solve problems and how the answers were reached
Objectives/Considerations:
- Make better decisions
- Make more informed decisions
- Lead to stronger outcomes
- Successfully communicate findings
Relevant Questions:
1. How can I make what I present to the stakeholders engaging and easy to understand?
2. What would help me understand this if I were the listener?

Phase: Act
Primary Goal: Take everything learned from data analysis and put it to use (e.g. provide stakeholders with recommendations based on findings in order to make data-driven decisions)
Relevant Questions:
1. How can I use the feedback I received during the share phase (step 5) to actually meet the stakeholder’s needs and expectations?

Solve problems with data
Common Problem Types:
1. Making predictions
   Using data to make an informed business decision about how things may be in the future
   (e.g. a company that wants to know the best advertising method to bring in new customers)
2. Categorizing things
   Assigning information to different groups or clusters based on common features
   (e.g. to improve customer satisfaction, classify customer service calls based on keywords or scores to identify top performers and help correlate certain actions taken with higher customer satisfaction scores)
3. Spotting something unusual
   Identifying data that is different from the norm
   (e.g. smart watches analyzing aggregated health data can help product developers determine the right algorithms to spot or set off alarms when certain data doesn’t trend normally)
4. Identifying themes
   Grouping categorized information into broader concepts
   (e.g. UX designers needing help to identify themes in order to prioritize the right product features for improvement. Examples of themes in a user study include beliefs, practices, and needs)
5. Discovering connections
   Finding similar challenges faced by different entities and combining data and insights to address them
   (e.g. a 3PL working with another company to get shipments delivered to customers on time by analyzing wait times at shipping hubs)
6. Finding patterns
   Using historical data to understand what happened in the past and is therefore likely to happen again
   (e.g. minimizing downtime caused by machine failure by analyzing maintenance data to discover when / why most failures happen)
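As a rough illustration of the "spotting something unusual" problem type, the sketch below flags readings that sit far from the average. The z-score approach, the threshold of 2, the function name flag_unusual, and the heart-rate readings are all assumptions made for this example; they are not taken from the course material, and real products would likely use more careful methods.

```python
from statistics import mean, stdev

def flag_unusual(values, threshold=2.0):
    """Return values whose distance from the mean exceeds `threshold` standard deviations."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical resting heart-rate readings (beats per minute); made-up data
readings = [62, 64, 63, 61, 65, 63, 62, 110, 64, 63]
unusual = flag_unusual(readings)
print(unusual)
```

With this made-up data only the reading of 110 is flagged. A single extreme value also inflates the standard deviation, so on small samples a simple check like this can miss milder anomalies.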
Craft effective questions
Avoid asking leading questions
- Questions which lead respondents to answer in a certain way
Avoid asking close-ended questions
- Questions that can be answered with a yes or no
Avoid asking vague questions
- Questions that are too vague and lack context
SMART Questions
1. Specific
   Specific questions are simple, significant, and focused on a single topic or a few closely related ideas
   Bad Question: “Are kids getting enough exercise these days?”
   Good Question: “What percentage of kids achieve the recommended 60 minutes of physical activity at least five days a week?”
2. Measurable
   Measurable questions can be quantified and assessed
   Bad Question: “Why did our recent video go viral?”
   Good Question: “How many times was our video shared on social channels the first week it was posted?”
3. Action-oriented
   Action-oriented questions encourage change
   Bad Question: “How can we get customers to recycle our product packaging?”
   Good Question: “What design features will make our packaging easier to recycle?”
4. Relevant
   Relevant questions matter, are important, and have significance to the problem you’re trying to solve
   Bad Question: “Why does it matter that Pine Barrens tree frogs started disappearing?”
   Good Question: “What environmental factors changed in Durham, North Carolina, between 1983 and 2004 that could cause Pine Barrens tree frogs to disappear from the Sandhills Regions?”
5. Time-bound
   Time-bound questions specify the time to be studied
   Bad Question: “Why does it matter that Pine Barrens tree frogs started disappearing?”
   Good Question: “What environmental factors changed in Durham, North Carolina, between 1983 and 2004 that could cause Pine Barrens tree frogs to disappear from the Sandhills Regions?”
Common topics for SMART questions:
- Objectives (e.g. “What are the goals of the deep dive? What, if any, questions are expected to be answered by this deep dive?”)
- Audience (e.g. “Who are the stakeholders? Who is interested or concerned about the results of this deep dive? Who is the audience for the presentation?”)
- Time (e.g. “What is the time frame for completion? By what date does this need to be done?”)
- Resources (e.g. “What resources are available to accomplish the deep dive's goals?”)
- Security (e.g. “Who should have access to the information?”)
Questions should always take into account fairness
- Ensuring that questions don’t create or reinforce bias
- Examples of unfair questions, which push or assume a particular answer:
  o “These are the best sandwiches ever, aren’t they?”
  o “What do you love most about our exhibits?”
Sample questions:
Take good notes
- Ideal process is to ask questions, clarify understanding of responses, and then briefly record them in notes
- If a question is worth asking, then the answer is worth recording
Important aspects of the conversation to note include:
- Facts: Write down any concrete piece of information, such as dates, times, names, and other specifics
- Context: Facts without context are useless. Note any relevant details that are needed in order to understand the information being gathered
- Unknowns: Sometimes there are important questions missed during a conversation. Make a note of when they happen so answers can be figured out later
Sample notes:
A good guideline to think about:
- Stakeholder’s business goals; in this case, the person you had a conversation with
- Identifying the data needed to answer the SMART questions
- Exploring what data the stakeholder already has
- Determining the data that you don’t have, but need in order to answer the questions
Course 2, Week 2
Understand the Power of Data
Data-Driven Decision Making
- Finding patterns and important insights from a collection of facts to make informed business decisions
Data-Inspired Decision Making
- Exploring different data sources to find out what they have in common
Algorithm
- A process or set of rules to be followed for a specific task
Quantitative Data
- Specific and objective measures of numerical facts
- What? How many? How often? / Things that can be measured
- Examples of measurable questions:
  o “How many negative reviews are there?”
  o “What’s the average rating?”
  o “How many of these reviews use the same keywords?”
Qualitative Data
- Subjective or explanatory measures of qualities and characteristics
- Things that can’t be measured by numerical data
- Great for helping answer “why” questions
- Examples of immeasurable questions:
  o “Why are customers unsatisfied?”
  o “How can we improve their experience?”
Qualitative Data Tools
- Focus Groups
- Social Media Text Analysis
- In-Person Interviews
Quantitative Data Tools
- Structured Interviews
- Surveys
- Polls
Follow the Evidence
Two common data presentation tools include:
1. Reports
- A static collection of data given to stakeholders periodically
2. Dashboards
- Live reflection of incoming data. Organizes information from multiple datasets into one central location
Data Presentation Tools Compared
Reports
Description: A static collection of data given to stakeholders periodically
Pros:
- High-level historical data
- Easy to design
- Static, pre-cleaned and sorted data
Cons:
- Continual maintenance
- Less visually appealing
Dashboards
Description: Monitors live, incoming data
Pros:
- Dynamic, automatic, and interactive
- Shows more data
- More stakeholder access
- Low maintenance
- Visually appealing
Cons:
- Labor-intensive design
- Can be confusing
- Not suitable if the data report will not be used very often
- Potentially uncleaned data
- Interface is susceptible to breaking
Pivot Table
- A data summarization tool that is used in data processing. Pivot tables are used to summarize, sort, reorganize, group, count, total, or average data stored in a database.
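To make the idea of a pivot table concrete, here is a minimal sketch in plain Python that groups records by one field and then counts, totals, and averages them, which is roughly what a spreadsheet pivot table does behind the scenes. The sales records, field layout, and the function name pivot_by_region are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# Hypothetical sales records: (region, salesperson, amount); made-up data
sales = [
    ("North", "Ana", 120.0),
    ("North", "Ben", 80.0),
    ("South", "Ana", 200.0),
    ("South", "Cara", 50.0),
    ("North", "Ana", 100.0),
]

def pivot_by_region(records):
    """Group records by region and compute count, total, and average amount."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for region, _salesperson, amount in records:
        summary[region]["count"] += 1
        summary[region]["total"] += amount
    # Add the average once all rows have been accumulated
    return {
        region: {**stats, "average": stats["total"] / stats["count"]}
        for region, stats in sorted(summary.items())
    }

by_region = pivot_by_region(sales)
for region, stats in by_region.items():
    print(region, stats)
```

In a spreadsheet the same summary would come from choosing "region" as the row field and "amount" as the value field, summarized by count, sum, or average.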
Metrics
- A single quantifiable type of data that can be used for measurement
- Usually involves simple math and can be combined into formulas that numerical data can be plugged into, e.g.
  o Revenue by individual salesperson = (number of individual sales) x (sales price)
  o ROI (Return on Investment) = (net profit over a period of time) / (cost of investment)
  o Customer Retention Rate (ability to keep customers over time) = (customers at the end of the period, excluding new customers gained during it) / (customers at the beginning of the period)
Metric Goal
- A measurable goal set by a company and evaluated using metrics
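The metric formulas can be sketched as small Python functions. The function and parameter names and the sample numbers are assumptions made for illustration. The retention function uses a commonly cited convention, (customers at end of period minus new customers) divided by customers at start of period, which is itself an assumption here rather than something confirmed by the course notes.

```python
def revenue(number_of_sales, sales_price):
    """Revenue for one salesperson: number of individual sales times sales price."""
    return number_of_sales * sales_price

def roi(net_profit, cost_of_investment):
    """Return on investment: net profit over a period divided by the cost of the investment."""
    return net_profit / cost_of_investment

def retention_rate(customers_at_start, customers_at_end, new_customers=0):
    """Share of starting customers still present at the end of the period (assumed convention)."""
    return (customers_at_end - new_customers) / customers_at_start

# Hypothetical figures, for illustration only
print(revenue(40, 25.0))             # 40 sales at 25.0 each
print(roi(5_000, 20_000))            # 5,000 profit on a 20,000 investment
print(retention_rate(200, 190, 30))  # 200 at start, 190 at end, 30 of them new
```

A metric goal could then be expressed as a comparison against one of these values, for example checking whether retention_rate(...) is at least 0.8.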
Benefits of using dashboards for analysts and stakeholders:
Creating a dashboard
Here is a process you can follow to create a dashboard:
1. Identify the stakeholders who need to see the data and how they will use it
To get started with this, you need to ask effective questions. Check out this Requirements Gathering Worksheet
to explore a wide range of good questions you can use to identify relevant stakeholders and their data
needs. This is a great resource to help guide you through this process again and again.
2. Design the dashboard (what should be displayed)
Use these tips to help make your dashboard design clear, easy to follow, and simple:
- Use a clear header to label the information
- Add short text descriptions to each visualization
- Show the most important information at the top
3. Create mock-ups if desired
This is optional, but a lot of data analysts like to sketch out their dashboards before creating them.
4. Select the visualizations you will use on the dashboard
You have a lot of options here and it all depends on what data story you are telling. If you need to show a change
of values over time, line charts or bar graphs might be the best choice. If your goal is to show how each part
contributes to the whole amount being reported, a pie or donut chart is probably a better choice.
To learn more about choosing the right visualizations, check out Tableau’s galleries:
- For more samples of area charts, column charts, and other visualizations, visit Tableau’s Viz Gallery. This gallery is full of great examples that were created using real data; explore this resource on your own to get some inspiration.
- Explore Tableau’s Viz of the Day to see visualizations curated by the community. These are visualizations created by Tableau users and are a great way to learn more about how other data analysts are using data visualization tools.
5. Create filters as needed
Filters show certain data while hiding the rest of the data in a dashboard. This can be a big help to identify
patterns while keeping the original data intact. It is common for data analysts to use and share the same
dashboard, but manage their part of it with a filter. To dig deeper into filters and find an example of filters in
action, you can visit Tableau’s page on Filter Actions. This is a useful resource to save and come back to when
you start practicing using filters in Tableau on your own.
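The idea behind a filter, showing some rows while leaving the original data intact, can be sketched in a few lines of Python. This is not how Tableau implements filters; the rows, field names, and the apply_filter helper are hypothetical and only illustrate the concept.

```python
# Hypothetical rows behind a shared dashboard (all values are made up)
rows = [
    {"region": "North", "month": "Jan", "sales": 120},
    {"region": "South", "month": "Jan", "sales": 95},
    {"region": "North", "month": "Feb", "sales": 140},
    {"region": "South", "month": "Feb", "sales": 110},
]

def apply_filter(data, **criteria):
    """Return only the rows matching every criterion; the input list is not modified."""
    return [
        row for row in data
        if all(row.get(field) == value for field, value in criteria.items())
    ]

# One analyst's view of the shared data: only the North region
north_view = apply_filter(rows, region="North")
north_total = sum(row["sales"] for row in north_view)
print(len(rows), len(north_view), north_total)
```

Because apply_filter builds a new list, other analysts looking at the same rows still see the full dataset.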
Types of Dashboards
- Strategic: focuses on long-term goals and strategies at the highest level of metrics
- Operational: short-term performance tracking and intermediate goals
- Analytical: consists of the datasets and the mathematics used in these sets
Strategic Dashboards
- Used in evaluating and aligning strategic goals, providing information over the longest time frame (from a single financial quarter to years)
- Typically contain information used for enterprise-wide decision-making
Operational Dashboards
- Arguably the most common type of dashboard; contain information on a time scale of days, weeks, or months, giving almost real-time insight into performance
- Allow businesses to track and maintain immediate operational processes in light of strategic goals
Analytical Dashboards
- Contain vast amounts of data used by data analysts, along with details involved in the usage, analysis, and predictions made by data scientists
- Created and maintained by data science teams and rarely shared with upper management, as they are difficult to understand
Connecting the Data Dots
Mathematical Thinking
- Looking at a problem and logically breaking it down step-by-step to see the relationship of patterns in data in order to better analyze problems
- Helps figure out the best tools to use for analysis (e.g. different sizes of datasets will require different tools)
Small Data
- Specific
- Short time-period
- Day-to-day decisions
Big Data
- Large and less-specific
- Long time-period
- Big decisions
Course 2, Week 3
Working with Spreadsheets
Common Spreadsheet Math Functions:
- Sum
- Average
- Count
- Min
- Max
Spreadsheet Tasks:
- Organize your data
  o Pivot table
  o Sort and filter
- Calculate your data
  o Formulas
  o Functions
Spreadsheets and the Data Life Cycle
Sample Open Data Sources:
- World Bank (https://data.worldbank.org/)
- World Health Organization
- Google Public Data Explorer
- U.S. Census Bureau
- Philippine Statistics Authority (https://www.psa.gov.ph/)
Formulas in Spreadsheets
Formulas
- A set of instructions that performs a specific calculation
Operator
- Symbols that name the type of operation or calculation to be performed
Cell Reference
- A single cell or a range of cells in a worksheet that can be used in a formula
Range
- Collection of two or more cells
Functions in Spreadsheets
Functions
- A preset command that automatically performs a specific process or task using the data
- e.g. SUM, AVERAGE, COUNT, MIN, MAX
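For readers who also work in code, the five spreadsheet functions above have close counterparts in Python's standard library. The sketch below is only an analogy: the list named cells stands in for a hypothetical range such as A1:A5, and the values are made up.

```python
from statistics import mean

# Stand-in for a hypothetical spreadsheet range such as A1:A5 (made-up values)
cells = [12, 7, 3, 25, 8]

summary = {
    "SUM": sum(cells),        # like =SUM(A1:A5)
    "AVERAGE": mean(cells),   # like =AVERAGE(A1:A5)
    "COUNT": len(cells),      # like =COUNT(A1:A5) when every cell holds a number
    "MIN": min(cells),        # like =MIN(A1:A5)
    "MAX": max(cells),        # like =MAX(A1:A5)
}

for name, value in summary.items():
    print(name, value)
```

One difference worth keeping in mind: a spreadsheet COUNT only counts numeric cells, whereas len counts every item in the list, so the two agree only when the range holds nothing but numbers.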
Save Time with Structured Thinking
Problem Domain
- The specific area of analysis that encompasses every activity affecting or affected by the problem
Structured Thinking
- The process of recognizing the current problem or situation, organizing available information, revealing gaps and opportunities, and identifying the options
Scope of Work (SOW)
- An agreed-upon outline of the work you’re going to perform on a project
- Usually includes work details, schedules, and reports
- For data analysis, normally includes data preparation, validation, analysis of quantitative and qualitative datasets, initial results, and reporting visuals
To keep data collection objective, ask the following about the data collected:
- Who
- What
- When
- Where
- How
- Why
Course 2, Week 4
Balance Team and Stakeholder Needs
Stakeholders
- People that have invested time, interest, and resources into the projects to be worked on
Project Managers
- Responsible for planning and executing projects
Common Project Stakeholders:
Working Effectively w/ Stakeholders:
Focus on what matters:
1. Who are the primary and secondary stakeholders?
2. Who is managing the data?
3. Where can we go for help?
Communication is Key
- Different teams have different expectations on communication
- It’s normal to feel lost at first, but it’s important to keep learning as you go and not be afraid to ask for clarification
- Set realistic timelines and prepare for roadblocks
- Flag potential delays to stakeholders as early as possible
Good writing habits for emails:
- Use complete sentences with proper spelling and punctuation
- Write clearly enough that anyone could understand
- Read emails out loud
- Don’t write too much; be clear and concise
- Keep it short and to the point; polite and well-written
- Answer in a timely manner
- If the discussion is too long, set up a meeting instead
Communication skills needed by data analysts:
- Listening
- Speaking
- Presenting
- Writing
Communication tip:
- Know your audience (can be used as a guideline for email flow)
  1. Who is your audience?
  2. What do they already know?
  3. What do they need to know?
  4. How can you best communicate what they need to know?
When approached with a request that is hasty:
1. Reframe question
2. Problems
3. Challenges
4. Solutions
5. Timelines
e.g. “I can certainly check out the rates of completion, but I sense there may be more to the story here. Could you give me two days to run some reports and learn what’s really going on?”
Limitations of Data:
When thinking about communicating findings, consider the following:
1. Does the analysis answer the original question?
2. Are there angles that haven’t been considered?
3. Can we answer questions that may get asked about the data and analysis?
4. How detailed should we be when sharing the results? Would high-level analysis be okay?
5. Does the analysis help the team make better, more informed decisions?
Amazing Teamwork
Meeting Best Practices:
Do’s:
- Come prepared
  o Bring what you need
  o Read the meeting agenda
  o Prepare notes and presentations
  o Be ready to answer questions
- Be on time
- Pay attention
- Ask questions
Don’ts:
- Show up unprepared
- Arrive late
- Be distracted
- Dominate the conversation (give others a chance to talk and let them finish speaking)
- Talk over others
- Distract people with unfocused discussion
Other tips:
- Every meeting should focus on making a clear decision and should include the person needed to make the decision
- Schedule meetings immediately if decisions need to be made
- Try to keep meeting participants under 10
- When leading a meeting:
  o Make sure to build and send an agenda beforehand
  o Try to keep everyone involved
  o Let everyone know the floor is open for questions after the meeting
  o Take notes
  o Afterwards, follow up on questions and send updates
  o Try to have everyone put their phones or computers on silent when not speaking
Leading great meetings:
Conflict resolution:
- Most common reasons for conflict are mismatched expectations and miscommunications
- When conflicts arise, instead of focusing on who’s at fault, it’s best to reframe the problem
  o e.g. “How can I best help you reach your goal?”
- Find opportunities for the team to work together instead of feeling frustrated by the problem
- Discussion is key to conflict resolution
- Start a conversation
- Understand the context
Course 3, Week 1
Collecting Data
Differentiate Between Data Formats & Structures
Explore Data Types, Fields, and Values