Uploaded by Otar Tchitchinadze

BentleySpring2023Lectures

Bentley160
Lecture -1-5
Dr. Hema Seshadri
Faculty Profile
https://dataworksai.com/?page_id=9
Upcoming Book
Analytics for Business Success
A Guide to Analytics Fitness™
by Hema Seshadri, Ph.D.
Book Synopsis: https://dataworksai.com/?page_id=529
Book Blurb: https://dataworksai.com/?page_id=27
Bentley Honor Code
1. Academic Integrity System Structure
2. Faculty and Student Responsibilities and Rights in the Academic Integrity
System
3. Violation Levels Defined and Recommended Sanctions
4. Academic Integrity Incident Reports and Consequences
5. Academic Integrity Hearing
https://catalog.bentley.edu/undergraduate/academic-policiesprocedures/academic-integrity/
Academic Support
1. Academic Learning Centers and Labs
2. Academic Skills Assistance
3. First Year Seminar
4. Peer Tutoring Assistance
https://catalog.bentley.edu/undergraduate/academic-programsresources/academic-support-services/
Disability Services
● Learning disabilities
● Attention Deficit/Hyperactivity Disorders
● Mobility, visual and hearing impairments
● Medical conditions
● Psychiatric/psychological disabilities
Services include:
● Academic accommodations
● Assistance with accessibility issues
● Community education
● Individual coaching and support
https://catalog.bentley.edu/undergraduate/academicprograms-resources/disability-services/
Tableau Accounts Setup
You will need at least two Tableau accounts. The first is for the downloaded Tableau Desktop.
Sign up here:
https://www.tableau.com/academic/teaching
The second is for Tableau Public:
https://public.tableau.com/en-us/s/
CIS Sandbox
● This semester we have four tutors in the CIS Sandbox who can
help with CS160: Grace Hesterberg, John Giaquinto, Tomas Hahn,
and Chris Hagedorn.
● Together they are working about 20 hours per week of drop-in
time (the schedule has not been finalized yet).
● Tutor and student responsibilities
CIS Sandbox
● Sandbox tutors for CS150, 350, 605 will be able to help with Tableau,
Lucid Charts, ERD and SQL
● When you come to see a tutor either for drop-in or appointment help,
be prepared for the meeting.
● Check the CIS Sandbox home page (http://cissandbox.com) to see
when Grace, John, Tomas, or Chris are working; the schedule should
be finalized by the end of next week.
● Contact the tutors to set up an appointment during their regular
hours or outside of them.
● Tutors have limited availability, so be prepared for your session by
providing as much background information as possible in the email
that you send them.
Tableau
https://www.tableau.com/why-tableau/what-is-tableau
https://www.tableau.com/learn/training/20222
Getting Started
Tableau Prep
Connecting to Data
Lucid Charts
https://lucid.app/documents#/dashboard
https://www.lucidchart.com/blog/getting-started-in-lucidchart
https://www.youtube.com/watch?v=kM1B-jQUeVI
Welcome to CS160
The primary objective of this course is to expose the student to the breadth, depth,
versatility and usefulness of data and databases in problem solving. This course
will develop the students’ foundational competencies related to data management
that allow them to critically analyze complex problems using a variety of data
sources and tools and to effectively present their ideas to others.
Welcome to CS160
The key learning objectives of this course are:
1. Understanding how data can support effective problem solving and
decision making in specific problem contexts,
2. Understanding how data are stored, organized, managed, and how data
can support effective problem solving and decision making in specific
problem contexts,
3. Acquiring, cleaning, and structuring data for analysis and decision
support,
4. Analyzing the data with relevant tools, and
5. Presenting the results of the analysis effectively to various stakeholder
groups
Data analyst salary and job outlook
● The average base salary for a data analyst in the US was $69,517 as of
December 2021, according to Glassdoor.
● This can vary depending on your seniority, where in the US you’re located,
and other factors.
● Data analysts are in high demand. The World Economic Forum listed it as
number two in growing jobs in the US [1].
● The Bureau of Labor Statistics also reports related occupations as having
extremely high growth rates.
https://www.coursera.org/articles/data-analytics-projects-for-beginners
Data analyst salary and job outlook
● From 2020 to 2030, operations research analyst positions are expected to
grow by 25 percent, market research analysts by 22 percent, and
mathematicians and statisticians by 33 percent.
● That’s a lot higher than the total employment growth rate of 7.7 percent.
https://www.coursera.org/articles/data-analytics-projects-for-beginners
Types of Data Analysts
People who perform data analysis might have other titles such as:
● Medical and health care analyst
● Market research analyst
● Business analyst
● Business intelligence analyst
● Operations research analyst
● Intelligence analyst
https://www.coursera.org/articles/data-analytics-projects-for-beginners
What Do Data Analysts Do?
● Analysts are data storytellers.
● Their mandate is to summarize interesting facts and to use data
for inspiration.
● In some organizations those facts and that inspiration become
input for human decision-makers.
● But in more sophisticated data operations, data-driven inspiration
gets flagged for proper statistical follow-up.
● Good analysts have unwavering respect for the one golden rule of
their profession: do not come to conclusions beyond the data (and
prevent your audience from doing it, too).
https://hbr.org/2018/12/what-great-data-analysts-do-and-why-every-organization-needsthem#:~:text=Analysts%20are%20data%20storytellers.,input%20for%20human%20decision%2Dmakers.
Analytics for decision-making
● Analysts should lay out the story they’re tempted to tell and poke it from
several angles with follow-up investigations to see if it holds water
before bringing it to decision-makers.
● The decision-maker should then function as a filter between exploratory
data analytics and statistical rigor.
● If someone with decision responsibility finds the analyst’s exploration
promising for a decision they have to make, they then can sign off on a
statistician spending the time to do a more rigorous analysis. This
process indicates why just telling analysts to get better at statistics
misses the point in an important way.
https://hbr.org/2018/12/what-great-data-analysts-do-and-why-every-organization-needsthem#:~:text=Analysts%20are%20data%20storytellers.,input%20for%20human%20decision%2Dmakers .
What Do Data Analysts Do?
● Good analysts use softened, hedging language.
● For example, not “we conclude” but “we are inspired to wonder”.
They also discourage leaders’ overconfidence by emphasizing a
multitude of possible interpretations for every insight.
● As long as analysts stick to the facts — saying only “This is what
is here.” — and don’t take themselves too seriously, the worst
crime they could commit is wasting someone’s time when they run
it by them.
https://hbr.org/2018/12/what-great-data-analysts-do-and-why-every-organization-needsthem#:~:text=Analysts%20are%20data%20storytellers.,input%20for%20human%20decision%2Dmakers.
Analytics for decision-making
● If someone with decision responsibility finds the analyst’s
exploration promising for a decision they have to make, they
then can sign off on a statistician spending the time to do a
more rigorous analysis.
● This process indicates why just telling analysts to get better at
statistics misses the point in an important way.
● Not only are the two activities separate, but another person sits
in between them, meaning it’s not necessarily any more efficient
for one person to do both things.
https://hbr.org/2018/12/what-great-data-analysts-do-and-why-every-organization-needsthem#:~:text=Analysts%20are%20data%20storytellers.,input%20for%20human%20decision%2Dmakers.
Excellence in analytics: speed
● The best analysts are lightning-fast coders who can surf vast
datasets quickly, encountering and surfacing potential insights
faster than those other specialists can say “whiteboard.”
● Speed is their highest virtue, closely followed by the ability to
identify potentially useful gems.
● A mastery of visual presentation of information helps, too:
beautiful and effective plots allow the mind to extract information
faster, which pays off in time-to-potential-insights.
https://hbr.org/2018/12/what-great-data-analysts-do-and-why-every-organization-needsthem#:~:text=Analysts%20are%20data%20storytellers.,input%20for%20human%20decision%2Dmakers.
Project Portfolio
● If you’re getting ready to launch a new career as a data analyst,
you may face a familiar dilemma: job listings ask for experience, but
how do you get experience if you’re looking for your first data
analyst job?
● This is where your portfolio comes in.
● The projects you include in your portfolio demonstrate your skills
and experience—even if it’s not from a previous data analytics
job—to hiring managers and interviewers.
https://www.coursera.org/articles/what-does-a-data-analyst-do-a-career-guide
Project Portfolio
● What do I put in my portfolio if I don’t have work experience?
If you’re just starting out and don’t yet have work experience as a data
analyst, include projects you’ve completed on your own or as part of
your coursework.
● Start with small projects, and add them as you go. Once you learn
how to scrape a website, for example, you can add a screenshot
of your code, as well as a short paragraph explaining what you
did.
https://www.coursera.org/articles/what-does-a-data-analyst-do-a-career-guide
How to build a data analytics portfolio
What to include in your portfolio
A simple portfolio should include at least two sections:
● An “About me” section
● Data analytics projects
https://www.coursera.org/articles/what-does-a-data-analyst-do-a-career-guide
How to build a data analytics portfolio
About me
The “About me” page gives you an opportunity to introduce prospective
employers to who you are, what you do, and why it’s important to you. You
can use this section to explain:
● How you got started in data analysis
● What about data interests you most
● Where your passions lie in relation to data analytics
● This is also a great place to include your contact details (if you don’t
have them on a separate page) and links to your social media accounts.
https://www.coursera.org/articles/what-does-a-data-analyst-do-a-career-guide
How to build a data analytics portfolio
Projects
● Visualize data to tell a story: Create a chart, map, graph, or other visualization
to make your data easier to understand.
● Communicate complex ideas: Consider writing a blog post that outlines your
process or explains a difficult data concept to highlight your communication
skills.
● Collaborate with others: If you’ve worked on a group project, be sure to
include it.
● Use data analysis tools: Share projects that show off your ability to use SQL,
Python, R, Tableau, etc.
https://www.coursera.org/articles/what-does-a-data-analyst-do-a-career-guide
How to build a data analytics portfolio
Projects
The bulk of your portfolio will likely comprise a series of projects and case
studies that demonstrate your key skills. In general, your portfolio should
showcase your best or latest work. Try to include projects that highlight your
ability to:
● Scrape data from websites: Show your code, and use hashed
comments to explain your thinking.
● Clean data: Take a dataset with missing, duplicate, or other
problematic data, and walk through your data cleaning process.
● Perform different types of analysis: Use data to perform diagnostic,
descriptive, predictive, and prescriptive analysis.
https://www.coursera.org/articles/what-does-a-data-analyst-do-a-career-guide
Excellence in analytics: speed
● As analysts mature, they’ll begin to get the hang of judging
what’s important in addition to what’s interesting, allowing
decision-makers to step away from the middleman role.
● The result is that the business gets a finger on its pulse and
eyes on previously-unknown unknowns.
● This generates the inspiration that helps decision-makers select
valuable quests to send statisticians and ML engineers on,
saving them from mathematically-impressive excavations of
useless rabbit holes.
https://hbr.org/2018/12/what-great-data-analysts-do-and-why-every-organization-needsthem#:~:text=Analysts%20are%20data%20storytellers.,input%20for%20human%20decision%2Dmakers.
Business Objectives
● What are the business objective(s)?
● What are the business metrics to measure?
● What are the business output & outcomes?
● Why are they important?
● How should you answer the question(s)?
● How do you know when the question(s) are answered?
https://hbr.org/2018/12/what-great-data-analysts-do-and-why-every-organization-needsthem#:~:text=Analysts%20are%20data%20storytellers.,input%20for%20human%20decision%2Dmakers.
https://www.dailymail.co.uk/news/article-3168408/Where-s-sun-s-lolly-Rising-temperatures-lead-boomsales-icecreams-beer-cider.html
Ice Cream Sales (Monthly)
http://www.mas.ncl.ac.uk/~nag48/teaching/MAS1403/notes7slr.pdf
Sales Distribution
1. The Sales distribution of various categories relative to each other
2. Their respective Profit margins
3. Each Category’s Sub-Category Product Sales
4. Sales growth of Categories over the years
https://www.analyticsvidhya.com/blog/2017/07/data-visualisation-made-easy/
PROJECT
As part of this course, students will undertake a real-world data project. The project will consist of
addressing several questions and requirements using data and analytic approaches and tools.
The project will be carried out in multiple phases, each requiring a mandatory in-class
presentation. The following questions will help you to consider your project:
● What are the business objective(s)?
● What are the business metrics to measure?
● What are the business output & outcomes?
● Why are they important?
● How should you answer the question(s)?
● How do you know when the question(s) are answered?
Class Project and Presentations
Throughout the course, we will be using the same dataset as this gives you the
opportunity to become well versed in the data. With this dataset you will complete
the following:
Phase I
● Initial analysis in Tableau that will provide the foundation for your subsequent
analyses
● Tools: Group Tableau Dashboard, slides and presentation
Phase II
● Incorporating Business objectives, consulting for context, storytelling and
visualization concepts
● Tools: Group Tableau Dashboard, Lucid Charts, slides and presentation
Class Project and Presentations
Phase III
● Final Project Presentation
● Final Group Dashboard (Tableau, Lucid Charts, slides) and mandatory
in-class presentation
Dataset criteria
● At least two data sets with a common key to be joined*
● A data set that needs some level of cleaning
● At least one field of one of the data sets would need some prep
● A data set that lends itself to some basic stats
● The data sets should have more than seven data points
(variables/attributes) and at least one thousand rows/records
● Data sets that provide a rich story to be told, that is, rich in
patterns and/or some statistical correlation
https://learning.oreilly.com/library/view/effective-data-storytelling/9781119615712/c07.xhtml#usec0003
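The dataset criteria above can be checked mechanically before a group commits to a data set. The following is only a minimal sketch under stated assumptions: the tables are held in memory as lists of dictionaries, and the names `orders`, `customers`, `customer_id`, and `check_dataset_criteria` are hypothetical, as are the sample values. The thresholds mirror the slide (more than seven variables, at least one thousand rows).

```python
def check_dataset_criteria(left, right, key, min_columns=8, min_rows=1000):
    """Return pass/fail checks for two tables given as lists of dicts."""
    left_keys = {row[key] for row in left if row.get(key) is not None}
    right_keys = {row[key] for row in right if row.get(key) is not None}
    columns = set()
    for row in left:
        columns.update(row.keys())
    return {
        # At least one key value appears in both tables, so they can be joined.
        "has_common_key": bool(left_keys & right_keys),
        # "More than seven" variables means eight or more columns.
        "enough_columns": len(columns) >= min_columns,
        "enough_rows": len(left) >= min_rows,
        # Missing or empty values suggest the data needs some cleaning.
        "needs_cleaning": any(v in (None, "") for row in left for v in row.values()),
    }


# Tiny hypothetical tables, far too small to pass the size checks.
orders = [
    {"order_id": 1, "customer_id": "C1", "amount": 20.0},
    {"order_id": 2, "customer_id": "C2", "amount": None},
]
customers = [{"customer_id": "C1", "region": "East"}]

report = check_dataset_criteria(orders, customers, key="customer_id")
for check, passed in report.items():
    print(f"{check}: {passed}")
```

A real project data set would be loaded from files rather than typed in; the same checks apply once the rows are in memory.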
Assignment 1 [Phase I]
● Tools: PowerPoint, Tableau
Purpose of Assignment (WHY)
● Communicating complex data information and insights through storytelling with
data visualizations using dashboards, scorecards, spatial data representations and
use of annotations is an important aspect of data analytics.
● In various roles you must be able to evaluate, propose and implement appropriate
visualizations for a specified audience using key informational design concepts.
● In this assignment you will conduct an initial analysis of the data set of your
choosing.
● Your group will be using the same dataset as this gives you the opportunity to
become well versed in the data.
Assignment 1 - Group Activity
Tools: Tableau, PowerPoint
Assignment Description (WHAT)
● Once you understand the data, it’s much easier to design a dashboard knowing the
key variables.
● First, review the data set and start looking for valuable information to determine
key drivers of the analysis.
● Analyze the variables/attributes/columns, information and seek correlations that
complete a basic analysis.
● In your analysis, answer the following questions.
Collaborate, Collaborate, Collaborate
● We are using groups for assignments and projects
● Our colleagues are sometimes our best resource when
learning a new theory or application.
● Therefore, it is important that you get to know your group
members as soon as possible.
● Use your group as a resource throughout the course when
you encounter software, technical, or any other questions
about the project.
https://www.coursera.org/articles/data-analytics-projects-for-beginners
In Class Activity
1. Class members introduction
2. Find your group for projects
3. Dataset review
4. Lucid Chart set up
From:
Degree:
Class:
ADD PHOTO
HERE
Most Interesting or Unusual Job:
Hobbies:
Recent Achievement or News:
Fun Facts:
Name
On the Bucket List:
Favorite Meal:
Major
Goals:
Popular Analytics Data Sets
https://careerfoundry.com/en/blog/data-analytics/data-analytics-portfolio-examples/
https://dataworksai.com/?p=389
Part I: Essentials
Concepts and Terminology
Datasets
● Collections or groups of related data are generally referred to as datasets.
● Each group or dataset member (datum) shares the same set of attributes
or properties as others in the same dataset.
● Some examples of datasets are:
• tweets stored in a flat file
• a collection of image files in a directory
• an extract of rows from a database table stored in a CSV formatted file
• historical weather observations that are stored as XML files
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch01.xhtml#ch01lev2sec2
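To make the idea of a dataset concrete, the sketch below builds a small collection of records that all share the same attributes and writes it out as CSV text, similar to the "extract of rows stored in a CSV formatted file" example above. The weather values and field names are made up for illustration.

```python
import csv
import io

# Hypothetical dataset: every datum has the same three attributes.
observations = [
    {"station": "BOS", "date": "2023-01-01", "temp_c": 3.5},
    {"station": "BOS", "date": "2023-01-02", "temp_c": 1.0},
    {"station": "NYC", "date": "2023-01-01", "temp_c": 5.2},
]

# Confirm that each record shares the same set of attributes.
attribute_sets = {frozenset(row) for row in observations}
shares_attributes = len(attribute_sets) == 1

# Serialize the dataset as CSV text (an in-memory stand-in for a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["station", "date", "temp_c"])
writer.writeheader()
writer.writerows(observations)
csv_text = buffer.getvalue()

print(shares_attributes)
print(csv_text)
```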
Analytics Terminology
● Data are observations or measurements (unprocessed or processed)
represented as text, numbers, or multimedia.
● A dataset is a structured collection of data generally associated with a
unique body of work.
● A database is an organized collection of data stored as multiple datasets.
Those datasets are generally stored and accessed electronically from a
computer system that allows the data to be easily accessed, manipulated,
and updated
https://www.usgs.gov/faqs/what-are-differences-between-data-dataset-and-database
Analytics Terminology
Data Science vs. Data Analytics
Data science is the process of building, cleaning, and structuring
datasets to analyze and extract meaning.
Data analytics, on the other hand, refers to the process and practice of
analyzing data to answer questions, extract insights, and identify trends.
You can think of data science as a precursor to data analysis. If your
dataset isn’t structured, cleaned, and wrangled, how will you be able to
draw accurate, insightful conclusions? Below is a deeper dive into each
field’s role in business
https://online.hbs.edu/Documents/a-beginners-guide-to-data-and-analytics.pdf?hsCtaTracking=2bb079d4-1f8a-4052-95482430ccb52d48%7C4d888017-3b60-48fb-abd2-754f4106abb4
Data Analytics in Business
The main goal of business analytics is to extract meaningful insights from data
that an organization can use to inform its strategy and, ultimately, reach its
objectives.
Business analytics can be used for:
Budgeting and forecasting:
● By assessing a company’s historical revenue, sales, and costs data
alongside its goals for future growth, an analyst can identify the budget and
investments required to make those goals a reality.
Risk management:
● By understanding the likelihood of certain business risks occurring—and
their associated expenses—an analyst can make cost-effective
recommendations to help mitigate them.
https://online.hbs.edu/Documents/a-beginners-guide-to-data-and-analytics.pdf?hsCtaTracking=2bb079d4-1f8a-4052-95482430ccb52d48%7C4d888017-3b60-48fb-abd2-754f4106abb4
Data Analytics in Business
Business analytics can be used for:
Marketing and sales:
● By understanding key metrics, such as lead-to-customer conversion
rate, a marketing analyst can identify the number of leads their
efforts must generate to fill the sales pipeline.
Product development (or research and development):
● By understanding how customers reacted to product features in the
past, an analyst can help guide product development, design, and
user experience in the future.
https://online.hbs.edu/Documents/a-beginners-guide-to-data-and-analytics.pdf?hsCtaTracking=2bb079d4-1f8a-4052-95482430ccb52d48%7C4d888017-3b60-48fb-abd2-754f4106abb4
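The lead-to-customer conversion rate mentioned under marketing and sales turns into simple arithmetic: leads needed = target customers / conversion rate, rounded up. A minimal sketch, with a hypothetical function name and hypothetical figures:

```python
import math


def leads_needed(target_customers, conversion_rate):
    """Leads required to reach a customer target at a given conversion rate."""
    if not 0 < conversion_rate <= 1:
        raise ValueError("conversion_rate must be greater than 0 and at most 1")
    # Round up: a fraction of a lead cannot close a deal.
    return math.ceil(target_customers / conversion_rate)


# Hypothetical pipeline: 30 new customers wanted, one in four leads converts.
print(leads_needed(30, 0.25))
```

The conversion rate itself would come from historical data (customers won divided by leads generated over the same period).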
Analytics Terminology
● Data preparation (data wrangling) is the process of cleaning and
transforming raw data prior to processing and analysis. It is an
important step prior to processing and often involves reformatting
data, making corrections to data and the combining of data sets to
enrich data.
● Data preparation is often a lengthy undertaking for data
professionals or business users, but it is essential as a prerequisite
to put data in context in order to turn it into insights and eliminate
bias resulting from poor data quality.
● For example, the data preparation process usually includes
standardizing data formats, enriching source data, and/or removing
outliers.
https://www.talend.com/resources/what-is-data-preparation/
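The preparation steps named above (standardizing formats, making corrections, removing outliers) might look like the following in code. This is a sketch under assumptions, not a prescribed workflow: the field names `date`, `region`, and `amount` and the sample rows are hypothetical, and the 1.5 × IQR outlier rule is one common rule of thumb rather than the only option.

```python
from datetime import datetime
from statistics import quantiles


def standardize_date(text):
    """Parse a couple of common date layouts into ISO format (YYYY-MM-DD)."""
    for layout in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text.strip(), layout).date().isoformat()
        except ValueError:
            continue
    return None  # unparseable dates are flagged rather than guessed


def prepare(rows):
    """Standardize formats, then drop incomplete and duplicate rows."""
    seen, cleaned = set(), []
    for row in rows:
        date = standardize_date(row["date"])
        amount = row["amount"]
        if date is None or amount in (None, ""):
            continue  # skip rows with a bad date or a missing amount
        record = (date, row["region"].strip().title(), float(amount))
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned


def remove_outliers(values, k=1.5):
    """Keep values within k * IQR of the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if low <= v <= high]


raw = [
    {"date": "2023-01-05", "region": " east ", "amount": "120.50"},
    {"date": "01/05/2023", "region": "East", "amount": "120.50"},  # duplicate once standardized
    {"date": "2023-01-06", "region": "west", "amount": ""},        # missing amount
    {"date": "not a date", "region": "west", "amount": "80"},      # bad date
]
print(prepare(raw))
print(remove_outliers([10, 12, 11, 13, 12, 11, 95]))
```

In the course, much of this work would be done in Tableau Prep; the code only illustrates what each step does to the rows.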
Reporting vs Analysis
● Reporting translates raw
data into information.
● Reporting helps companies to
monitor their online business
and be alerted to when data
falls outside of expected
ranges.
● Reporting raises questions
about the business from its
end users.
● Reporting provides no or
limited context about what’s
happening in the data.
● Analysis transforms data and information
into insights.
● The goal of analysis is to answer
questions by interpreting the data at a
deeper level and providing actionable
recommendations.
● Through the process of performing
analysis you may raise additional
questions, but the goal is to identify
answers, or at least potential answers that
can be tested.
● Context is critical to good analysis.
In summary, reporting shows you what is happening, while analysis focuses on explaining why it is happening and what you can do about it.
https://blog.adobe.com/en/publish/2010/10/19/reporting-vs-analysis-whats-the-difference#gs.ndyrvp
Data Driven Organization
https://www.oreilly.com/radar/being-data-driven-its-all-about-the-culture/
Analytics Value Chain
https://www.oreilly.com/radar/being-data-driven-its-all-about-the-culture/
Driving Value
● Think of the data-driven stages (data > reporting > analysis > decision >
action > value) as a series of dominoes.
● If you remove a domino, it can be more difficult or impossible to achieve
the desired value.
● Analysis provides specific guidance (recommendations) on what actions to
take based on the key insights found in the data.
● Once a recommendation has been made, follow-up is another potent
outcome because recommendations demand decisions to be made (go/no
go/explore further).
● Decisions precede action.
● Action precedes value.
https://www.oreilly.com/radar/being-data-driven-its-all-about-the-culture/
Analytics Value Chain
Types of Reporting: Canned reports, dashboards, and
alerts.
Canned reports
● These are the out-of-the-box and custom reports that you can access
within the analytics tool or which can also be delivered on a recurring
basis to a group of end users.
● Canned reports are fairly static with fixed metrics and dimensions.
● In general, some canned reports are more valuable than others, and a
report’s value may depend on how relevant it is to an individual’s role
(e.g., SEO specialist vs. web producer).
https://blog.adobe.com/en/publish/2010/10/19/reporting-vs-analysis-whats-the-difference#gs.njnk2d
Analytics Value Chain
Types of Reporting: Canned reports, dashboards, and
alerts.
Dashboards
● These custom-made reports combine different KPIs and reports to
provide a comprehensive, high-level view of business performance for
specific audiences.
● Dashboards may include data from various data sources
● Dashboards can be static or dynamic
https://blog.adobe.com/en/publish/2010/10/19/reporting-vs-analysis-whats-thedifference#gs.njnk2d
Analytics Value Chain
Types of Reporting: Canned reports, dashboards, and
alerts.
Alerts
● These conditional reports are triggered when data falls outside of
expected ranges or some other pre-defined criterion is met.
● Once people are notified of what happened, they can take appropriate
action as necessary.
https://blog.adobe.com/en/publish/2010/10/19/reporting-vs-analysis-whats-thedifference#gs.njnk2d
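An alert of the kind described above is essentially a rule that compares a metric with its expected range. A minimal sketch, assuming hypothetical metric names, ranges, and values:

```python
def check_alerts(metrics, expected_ranges):
    """Return a message for every metric that falls outside its expected range."""
    alerts = []
    for name, value in metrics.items():
        low, high = expected_ranges[name]
        if not low <= value <= high:
            alerts.append(
                f"ALERT: {name}={value} outside expected range [{low}, {high}]"
            )
    return alerts


# Hypothetical expected ranges and today's observed values.
expected_ranges = {"daily_visits": (800, 1500), "conversion_rate": (0.02, 0.06)}
todays_metrics = {"daily_visits": 420, "conversion_rate": 0.031}

for message in check_alerts(todays_metrics, expected_ranges):
    print(message)
```

In practice the ranges would be set from historical data, and the messages would be emailed or posted to the people who can act on them.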
Analytics Value Chain
Ad hoc responses:
● Analysts receive requests to answer a variety of business questions,
which may be spurred by questions raised by the reporting.
● Typically, these urgent requests are time sensitive and demand a quick
turnaround.
● The analytics team may have to juggle multiple requests at the same
time.
● As a result, the analyses cannot go as deep or wide as the analysts may
like, and the deliverable is a short and concise report, which may or may
not include any specific recommendations.
https://blog.adobe.com/en/publish/2010/10/19/reporting-vs-analysis-whats-thedifference#gs.njnk2d
Analytics Value Chain
Analysis presentations
● Some business questions are more complex in nature and require more
time to perform a comprehensive, deep-dive analysis.
● These analysis projects result in a more formal deliverable, which
includes two key sections: key findings and recommendations.
● The key findings section highlights the most meaningful and actionable
insights gleaned from the analyses performed.
● The recommendations section provides guidance on what actions to take
based on the analysis findings.
https://blog.adobe.com/en/publish/2010/10/19/reporting-vs-analysis-whats-thedifference#gs.njnk2d
Analytics Categories
Descriptive Analytics
● Descriptive analysis is the simplest type of analysis.
● It describes and summarizes a dataset quantitatively.
● It characterizes the sample of data at hand and does not attempt
to describe anything about the population from which it comes.
● It can often form the data that is displayed in dashboards, such as
number of new members this week or booking year to date
https://learning.oreilly.com/library/view/creating-a-datadriven/9781491916902/ch05.html#chap_analysis
Descriptive Analytics
● The goal of descriptive analysis is to describe the key features of the
sample numerically.
● It should shed light on the key numbers that summarize distributions
within the data.
● It may describe or show the relationships among variables with
metrics that describe association, or by tables that cross-tabulate
counts.
https://learning.oreily.com/library/view/creating-a-data-driven/9781491916902/ch05.html#chap_analysis
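Two of the ideas above, tables that cross-tabulate counts and metrics that describe association, can be sketched in a few lines. The member records and number lists below are hypothetical, and Pearson's correlation coefficient is used as one common association metric among several.

```python
from collections import Counter


def cross_tabulate(records, row_field, col_field):
    """Count how often each (row value, column value) pair occurs."""
    return Counter((r[row_field], r[col_field]) for r in records)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical membership records.
members = [
    {"plan": "basic", "region": "East"},
    {"plan": "basic", "region": "West"},
    {"plan": "premium", "region": "East"},
    {"plan": "basic", "region": "East"},
]
counts = cross_tabulate(members, "plan", "region")
for (plan, region), count in sorted(counts.items()):
    print(plan, region, count)

# A perfectly linear relationship gives a coefficient of 1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```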
Descriptive Analytics Metrics
Some of the simplest but most important measures are:
● Sample size: The number of data points or records in the sample.
● Mean (average): The arithmetic mean of the data: sum of values
divided by number of values.
● Median: The 50th percentile.
● Mode: The most frequently occurring value.
● Minimum: The smallest value in the sample (0th percentile).
● Q1: The 25th percentile. The value such that one quarter of the sample
has values below this value. Also known as the lower hinge.
● Q3: The 75th percentile. Also known as the upper hinge.
● Maximum: The largest value in the sample (100th percentile).
● Interquartile range: The central 50% of data; that is, Q3 – Q1.
https://learning.oreily.com/library/view/creating-a-data-driven/9781491916902/ch05.html#chap_analysis
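The metrics listed above can be computed with Python's standard `statistics` module. This is a sketch with a hypothetical sample; note that tools differ slightly in how they interpolate quartiles, so Q1 and Q3 from Excel or Tableau may not match these values exactly.

```python
import statistics


def describe(values):
    """Summarize a sample with the descriptive metrics from the slide."""
    data = sorted(values)
    # The "inclusive" method treats the data as covering the full range.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "sample_size": len(data),
        "mean": statistics.mean(data),
        "median": median,
        "mode": statistics.mode(data),
        "minimum": data[0],
        "q1": q1,
        "q3": q3,
        "maximum": data[-1],
        "iqr": q3 - q1,
    }


# Hypothetical hourly wages for seven occupations.
wages = [15, 18, 18, 21, 24, 30, 42]
summary = describe(wages)
for metric, value in summary.items():
    print(f"{metric}: {value}")
```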
Descriptive Analytics
Measures of dispersion include:
● Range: Maximum minus minimum.
● Standard deviation: Measure of the dispersion from the arithmetic
mean of a sample. It is the square root of variance, and its units are
the same as the sample data.
● Variance: Another measure of dispersion. It is the average squared
difference from the arithmetic mean, and it is the square of the
standard deviation. Its units are the square of those of the data.
https://learning.oreily.com/library/view/creating-a-data-driven/9781491916902/ch05.html#chap_analysis
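The dispersion measures above can be sketched with the standard `statistics` module. The values are hypothetical. The "average squared difference" definition divides by n (population variance); when estimating from a sample, it is common to divide by n - 1 instead, so both versions are shown.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

value_range = max(values) - min(values)

# Population form: average squared difference from the mean (divide by n).
population_variance = statistics.pvariance(values)
population_std = statistics.pstdev(values)

# Sample form: divide by n - 1 to estimate the variance of a larger population.
sample_variance = statistics.variance(values)
sample_std = statistics.stdev(values)

print("range:", value_range)
print("population variance and std:", population_variance, population_std)
print("sample variance and std:", sample_variance, sample_std)
```

Spreadsheet tools make the same distinction: Excel, for example, has separate functions for the population and sample versions.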
Descriptive Analytics
This seeks answers about:
● "What is happening now?"
● "What has happened in the past?"
Descriptive analytics, also called business intelligence (BI) or
operational analytics, is the gateway to advanced analytics.
In Class Activity
Data Exploration
Data: US Bureau of Labor Statistics
https://www.bls.gov/oes/current/oes_nat.htm
Goals: Data Exploration, Descriptive Analytics, Presentation, Collaboration,
Communication, Knowledge sharing
● Group activity (in person or via Zoom)
● Combine your processes and data
● Add content to Lecture 2 activity file lname1_lname2_cs160_Assignment2
● Save file as lname1_lname2_cs160_Assignment3
● Present results in the classroom on the designated date (PPT slides)
Locate Descriptive Analytics Metrics
● Download Data: downloadable XLS file.
● Sample size: The number of data points or records in the sample.
● Mean (average): The arithmetic mean of the data: sum of values
divided by number of values.
● Median: The 50th percentile.
● Mode: The most frequently occurring value.
● Minimum: The smallest value in the sample (0th percentile).
● Q1: The 25th percentile. The value such that one quarter of the sample
has values below this value. Also known as the lower hinge.
● Q3: The 75th percentile. Also known as the upper hinge.
● Maximum: The largest value in the sample (100th percentile).
● Interquartile range: The central 50% of data; that is, Q3 – Q1.
https://learning.oreily.com/library/view/creating-a-data-driven/9781491916902/ch05.html#chap_analysis
Locate Descriptive Analytics
● Download Data: downloadable XLS file.
● Range: Maximum minus minimum.
● Standard deviation: Measure of the dispersion from the arithmetic
mean of a sample. It is the square root of variance, and its units are
the same as the sample data.
● Variance: Another measure of dispersion and is the average squared
difference from the arithmetic mean and is the square of standard
deviation. Its units are the square of those of the data.
https://learning.oreily.com/library/view/creating-a-data-driven/9781491916902/ch05.html#chap_analysis
Descriptive Analytics Activity
● Download Data: downloadable XLS file.
Partner with 1 other person (in person or via Zoom)
● Introduce yourselves
● Review/identify 5-10 descriptive metrics covered in the Excel sheet
● Document values
● Find 1-3 metrics you find interesting not covered in the lecture
● Google the definitions
● Combine your processes and data
https://learning.oreily.com/library/view/creating-a-data-driven/9781491916902/ch05.html#chap_analysis
Descriptive Analytics- Activity
● Open a document on your Google Drive or local drive
● Add content to activity from Lecture 2
○ lname1_lname2_cs160_Assignment3
○ Add Title, your name, activity title
○ Summarize the metrics in the Excel sheet that you identified in the lectures
○ Summarize 1-3 metrics in the Excel sheet that were not covered
○ Add definitions for the 1-3 metrics above
Predictive Analytics
● The goal of Predictive analytics is to learn about relationships
among variables from an existing training dataset and
develop a statistical model that can predict values of
attributes for new, incomplete, or future data points.
● Predictive analysis can then be used to generate forecasts,
that is, future predictions in a time series, which in turn can be
used to generate plans of when to manufacture or buy goods,
how many to make or buy, when to have them shipped to
stores, and so on.
https://learning.oreilly.com/library/view/creating-a-data-driven/9781491916902/ch05.html
Predictive Analytics
● Predictive analysis can also make predictions about which class
an object might fall into.
● For instance, given a person’s salary information, credit card
purchase history, and history of paying (or not paying) bills, we
can predict their credit risk.
● Given a set of tweets that contain a short movie review, each of
which has been labeled by a human as being positive (“I loved
this movie.”) or negative (“This movie sucked.”), we can develop
a model that will predict the sentiment—positive or negative—for
new tweets, such as “The movie’s special effects were
awesome,” that the model was not trained upon.
https://learning.oreilly.com/library/view/creating-a-data-driven/9781491916902/ch05.html
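The tweet-sentiment idea can be sketched with a tiny naive Bayes classifier in plain Python. This is an illustration under stated assumptions, not the model from the book: the function names are hypothetical, two of the four training reviews are invented to give the model something to learn from, and a real model would need far more labeled data.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase the text and strip punctuation from the ends of each word."""
    words = (w.strip(".,!?\"'") for w in text.lower().split())
    return [w for w in words if w]

def train(labeled_reviews):
    """Count words per label; returns (word_counts, label_counts, vocabulary)."""
    word_counts = {}
    label_counts = Counter()
    vocabulary = set()
    for text, label in labeled_reviews:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocabulary.add(word)
    return word_counts, label_counts, vocabulary

def predict(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # prior for this label
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Human-labeled training reviews (the second and fourth are invented).
training = [
    ("I loved this movie.", "positive"),
    ("The acting was awesome.", "positive"),
    ("This movie sucked.", "negative"),
    ("The plot was boring.", "negative"),
]
model = train(training)

# A new tweet the model was not trained on.
print(predict(model, "The movie's special effects were awesome"))
```

The new tweet is labeled positive because "awesome" appeared only in positive training reviews.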
Predictive Analytics
● Dating apps: Good recommendations of who to date.
● Stock prediction software (caveat emptor!): By tracking movements in
stock prices and identifying patterns, algorithms can attempt to buy low,
sell high, and maximize returns.
● Content apps: Good recommendations of what to watch (Netflix)
lead to higher retention and lower churn.
● Social networking: LinkedIn’s “People You May Know” increases
the user’s network effect and provides both the user greater value
and the service more valuable data.
https://learning.oreilly.com/library/view/creating-a-data-driven/9781491916902/ch05.html
Predictive Analytics
Predictions that can drive higher conversion and basket sizes:
● Cross sell and upsell: Even simple association-based
recommendations, such as “Customers Who Bought the Frozen
DVD Also Bought The Little Mermaid” (Amazon), lead to higher
sales and, for some, make holiday shopping quicker and easier.
● Ads and coupons: Learning an individual’s history and predicting
an individual’s state, interest, or intent can drive more relevant
display ads or effective supermarket coupons.
https://learning.oreilly.com/library/view/creating-a-data-driven/9781491916902/ch05.html
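A "customers who bought X also bought Y" recommendation can be approximated by counting which items appear together in the same basket. The sketch below is a simple co-occurrence count on invented baskets. It is not Amazon's actual algorithm, and the function names are hypothetical.

```python
from collections import Counter
from itertools import permutations

def build_cooccurrence(baskets):
    """Count how often each ordered pair of items appears in the same basket."""
    pair_counts = Counter()
    for basket in baskets:
        # set() ignores duplicates of an item within one basket.
        for a, b in permutations(set(basket), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def also_bought(pair_counts, item, top_n=3):
    """Items most often bought together with `item`, most frequent first."""
    related = Counter({b: n for (a, b), n in pair_counts.items() if a == item})
    return [name for name, _ in related.most_common(top_n)]

# Invented purchase history: each inner list is one customer's basket.
baskets = [
    ["Frozen", "The Little Mermaid", "Popcorn"],
    ["Frozen", "The Little Mermaid"],
    ["Frozen", "Popcorn"],
    ["The Little Mermaid", "Batteries"],
    ["Frozen", "The Little Mermaid", "Batteries"],
]
counts = build_cooccurrence(baskets)
print(also_bought(counts, "Frozen"))
```

Here "The Little Mermaid" is recommended first because it shares three baskets with "Frozen", more than any other item.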
Prescriptive Analytics
● Prescriptive analytics is the upper echelon in the advanced
analytics taxonomy. Prescriptive analytics seeks to answer
"What should I do now?" or "How should I do it?" or "What
caused this to happen (root cause analysis)?"
● Predictive and prescriptive analytics are collectively called
advanced analytics and AI or strategic analytics.
Analytics Strategy
● Analytics strategy is simply a blueprint for using analytics and
detailing analytics capabilities, organizational capabilities,
management systems, resources, communication, alignment,
execution, and other efforts to enable an organization to be
successful in reaching its goals
Part II: Applying the Essentials
Activity -Finding your data
● Google Advanced Search Operators
https://www.spyfu.com/blog/google-search-operators/
Tableau Data sets for everyone
https://www.tableau.com/learn/articles/free-public-data-sets
Kaggle Data Sets
https://www.kaggle.com/datasets
Activity -Finding data sets
Find data sets for the following data topics (Pick any three)
● Trends in Digital transformation
● Trends in AI and automation
● Job growth in your area/field of study
● Job openings for Data position
● Trends in Cloud Computing
● Investments in Data and Analytics
● Types of Analytics used in field of study
● Others that interest you
Add content to Data Exploration and Data Blending activity
○ lname1_lname2_cs160_Assignment3
Concepts and Terminology
Data Analysis
● Data analysis is the process of examining data to find facts, relationships,
patterns, insights and/or trends.
● The overall goal of data analysis is to support better decision-making.
● Carrying out data analysis helps establish patterns and relationships
among the data being analyzed.
● https://www.analyticsvidhya.com/blog/2017/07/data-visualisation-made-easy/
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch01.xhtml#ch01lev2sec3
https://www.dailymail.co.uk/news/article-3168408/Where-s-sun-s-lolly-Rising-temperatures-lead-boomsales-icecreams-beer-cider.html
Ice Cream Sales (Monthly)
http://www.mas.ncl.ac.uk/~nag48/teaching/MAS1403/notes7slr.pdf
Sales Distribution
1. The Sales distribution of various categories relative to each other
2. Their respective Profit margins
3. Each Category’s Sub-Category Product Sales
4. Sales growth of Categories over the years
https://www.analyticsvidhya.com/blog/2017/07/data-visualisation-made-easy/
Data & Analytics Concepts and Terminology
Part I-Essentials
Concepts and Terminology
Data Analytics
● Different kinds of organizations use data analytics tools and techniques in
different ways.
● In business-oriented environments, data analytics results can lower
operational costs and facilitate strategic decision-making.
● In the scientific domain, data analytics can help identify the cause of a
phenomenon to improve the accuracy of predictions.
● In service-based environments like public sector organizations, data
analytics can help strengthen the focus on delivering high-quality services
by driving down costs.
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch01.xhtml#ch01lev2sec3
Concepts and Terminology
Data Analytics and Data Science
● Enterprises are collecting, procuring, storing, curating and processing
increasing quantities of data.
● Find new insights that can drive more efficient and effective operations
● Provide management the ability to steer the business proactively
● Allow the C-suite to better formulate and assess their strategic initiatives.
● Looking for new ways to gain a competitive edge
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch01.xhtml#ch01lev2sec3
Innovation
● Companies focus outward, looking to find new customers and keep
existing customers from defecting to marketplace competitors.
● They do so by offering new products and services and delivering
increased value propositions to customers.
● Organizations can gain a competitive edge via Digitization and Innovation.
● The need for techniques and technologies that can extract meaningful
information and insights has increased.
● Computational approaches, statistical techniques and data warehousing
have advanced significantly.
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch01.xhtml#ch01lev2sec3
Digitization
● Digital mediums have replaced physical mediums as the de facto
communications and delivery mechanism.
● The use of digital artifacts saves both time and cost as distribution
is supported by the vast pre-existing infrastructure of the Internet.
● As consumers connect to a business through their interaction with
these digital substitutes, it leads to an opportunity to collect further
“secondary” data.
● Secondary data can be collected for data mining, e.g., by requesting a
customer to provide feedback.
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch01.xhtml#ch01lev2sec3
Affordable Technology
● Technology capable of storing and processing large quantities
of diverse data has become increasingly affordable
● Data solutions often leverage open-source software that
executes on commodity hardware, further reducing costs.
● The combination of commodity hardware and open source
software has virtually eliminated the advantage that large
enterprises used to hold
● Technology becomes the platform upon which the business
executes.
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch01.xhtml#ch01lev2sec3
Affordable Technology
● Data storage prices have declined significantly over the past 20 years.
● From a business standpoint, utilization of affordable technology
and commodity hardware to generate analytic results that can
further optimize the execution of its business processes is the
path to competitive advantage.
● The use of commodity hardware makes the adoption of Data
solutions accessible to businesses without large capital
investments.
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch01.xhtml#ch01lev2sec3
Part II: Applying the Essentials
In Class Activity- I
Tableau Set up
You will need at least 2 tableau accounts – the first will
be the downloaded tableau desktop
Sign up here
https://www.tableau.com/academic/teaching
The second is tableau public
https://public.tableau.com/en-us/s/
In Class Group Activity- 2
Data Exploration
Data: US Bureau of Labor Statistics
https://www.bls.gov/oes/current/oes_nat.htm
Goals: Data Exploration, Data Blending, Presentation, Collaboration,
Communication, Knowledge sharing
● Work with your assigned group members
● Combine your processes and data
● Save file as lname1_lname2_cs160_Assignment1
● Present results in the classroom on the designated date (PPT slides)
Data Exploration
Data: US Bureau of Labor Statistics
● May 2020 National Occupational Employment and Wage Estimates
● These estimates are calculated with data collected from employers in all
industry sectors in metropolitan and nonmetropolitan areas in every state
and the District of Columbia.
● Additional information, including the hourly and annual 10th, 25th, 75th,
and 90th percentile wages, is available
● Download Data: downloadable XLS file.
Data Exploration
Data:
US Bureau of Labor Statistics
https://blackboard.bentley.edu/webapps/blackboard/execute/content/file?cmd=view&mode=designer&content_id=_1798627_1&course_id=_24334_1
Data Exploration
Data: US Bureau of Labor Statistics
● Find your major: Major Occupational Groups
Data Exploration
Data: US Bureau of Labor Statistics
● Find your occupation: use text search option
Data Exploration & Blending
Data: Nerd Wallet
● https://www.nerdwallet.com/cost-of-living-calculator/compare/boston-ma-vs-colorado-springs-co
● Review background (About the data)
● Use of the cost of living calculator functionality
○ Where you live
○ Where you want to live
○ Income after graduation
Data Exploration & Blending
Data: Nerd Wallet
● Enter Salary information
● Enter current area of residence (Boston, MA)
● Enter desired area of residence (Colorado Springs, CO)
Data Exploration & Blending
Data: Nerd Wallet
● Review Results
Data Exploration & Blending
Data: Nerd Wallet & US Bureau of Labor Statistics
● https://www.bls.gov/oes/current/oes_nat.htm
● https://www.nerdwallet.com/cost-of-living-calculator/compare/boston-ma-vs-colorado-springs-co
● Review background (About the data)
● Use of the cost of living calculator functionality
● Find your major and associated salary for your profession
● Provide at least 3 cities with reasonable cost of living
● Combine your processes and data
Results
● Open a document on your Google Drive
● Call it lname1_lname2_cs160_group#_assignment1
● Combine your processes and data
○ Occupation, salary, career growth, cost-of-living results
Answer the following questions
● What does the salary data tell you?
● What does the cost of living calculator tell you?
● After graduation what steps can you take?
Assignments 1&2&3
Goals: Data Exploration, Data Blending, Descriptive Statistics,
Process Mapping, Collaboration, Storytelling, and Presentation
Assignments 1&2
1. Explore information about the data, professions, and entry-level and manager position salaries from the United States
Bureau of Labor Statistics (USBLS) https://www.bls.gov/oes/current/oes_nat.htm
2. Explore a total of six descriptive statistics. Learn about three new descriptive statistics not covered in class in the USBLS data.
3. Explore information about the data and identify three cities with a reasonable cost of living from Nerd Wallet
https://www.nerdwallet.com/cost-of-living-calculator/compare/boston-ma-vs-colorado-springs-co
4. Tell your team members the story behind your choice of location, its benefits, and its affordability.
5. Create a process map using Lucid Charts to outline steps 1 through 4.
6. Identify websites with rich datasets available for projects.
7. Include links to 3-5 analytics-rich datasets you can use for future assignments.
8. Prepare a slide deck that includes an “about you” slide and a storytelling narrative for steps 1-5.
9. Present your team results in an in-class presentation (15-minute presentation with 5 minutes for questions).
Homework
Mandatory:
Visit the CIS Sandbox in person or attend a review session via Zoom.
Get started on the following:
1. Get Tableau Set up
2. Get Lucid Account
3. Get SQL server set up
Optional: Visit Data.World website
Lucid Charts In Class Activity
Map dataset 1 and 2 activity process In Lucid Charts; Data Discovery to
Presentation
Process Mapping Guidelines
Data Discovery to Presentation
https://www.riversideca.gov/audit/pdf/Process%20Mapping%20Guidelines.pdf
In Class Activity
Map Lecture 2 activity process In Lucid Charts; Data Discovery to Presentation
Data Exploration - US Bureau of Labor Statistics
● Partner with your group members
● Introduce yourselves
● https://www.bls.gov/oes/current/oes_nat.htm
● Review background (About the data)
● Download data
● Review data in Excel
● Navigate to the USBLS web page
● Find your occupation: use text search option
● Find your major and associated salary for your profession
● Find the salary associated with the manager position
● Combine your processes and data
Nerd Wallet data -Data Blending
● https://www.nerdwallet.com/cost-of-living-calculator/compare/boston-ma-vs-colorado-springs-co
● Review background (About the data)
● Use of the cost of Living calculator functionality
● Find your major and associated salary for your profession from activity 1a
● Provide at least 3 cities with reasonable cost of living
● Combine your processes and data
● Summarize
Presenting Results
● Open a document on your Google Drive
● Call it lname1_lname2_cs160A1
● Combine your processes and data from activity 1a and 1b
● Occupation, salary, career growth, cost-of-living results
● What does the salary data tell you?
○ Hint: Information from Activity 1a
● What does the cost of living calculator tell you?
○ Hint: Information from Activity 1a
● After graduation what steps can you take?
○ Hint: Blend information from Activity 1a and 1b
In Class Activity
Map your process In Lucid Charts; Data Discovery to Presentation
Submit lname_fname_lecture4a_CS160
Data Types
● The data processed by Big Data solutions can be human-generated or
machine-generated.
● It is ultimately the responsibility of machines to generate the
analytic results.
● Human-generated data is the result of human interaction with
systems, such as online services and digital devices.
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch01.xhtml
Data Types
● Machine-generated data is generated by software programs and
hardware devices in response to real-world events.
● For example,
○ a log file captures an authorization decision made by a
security service
○ a point-of-sale system generates a transaction against
inventory to reflect items purchased by a customer.
● From a hardware perspective:
○ an example of machine-generated data would be
information conveyed from the numerous sensors in a
cellphone that may be reporting information, including
position and cell tower signal strength.
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch01.xhtml
Data Types
● Human-generated and machine-generated data can come from a
variety of sources and be represented in various formats or types.
● The primary types of data that are processed by Big Data solutions
are:
○ structured data
○ unstructured data
○ semi-structured data
● These data types refer to the internal organization of data and are
sometimes called data formats. Apart from these three fundamental
data types, another important type of data in Big Data environments
is metadata.
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch01.xhtml
Data Types -Structured Data
● Structured data conforms to a data model or schema and is often
stored in tabular form.
● It is used to capture relationships between different entities and is
therefore most often stored in a relational database.
● Structured data is frequently generated by enterprise applications
and information systems like ERP and CRM systems.
● Due to the abundance of tools and databases that natively support
structured data, it rarely requires special consideration with regard to
processing or storage.
● Examples of this type of data include banking transactions, invoices,
and customer records.
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch01.xhtml
Data Types -UnStructured Data
● Data that does not conform to a data model or data schema is known
as unstructured data.
● It is estimated that unstructured data makes up 80% of the data
within any given enterprise.
● Unstructured data has a faster growth rate than structured data.
● Common types of unstructured data include:
○ Textual or binary data, often conveyed via files that are self-contained
and non-relational.
○ A text file may contain the contents of various tweets or blog
postings.
○ Binary files are often media files that contain image, audio or
video data.
○ Technically, both text and binary files have a structure defined by
the file format itself, but this aspect is disregarded, and the notion of
being unstructured is in relation to the format of the data contained in
the file itself.
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch01.xhtml
Data Types -UnStructured Data
● Special purpose logic is usually required to process and store
unstructured data.
● For example, to play a video file, it is essential that the correct
codec (coder-decoder) is available.
● Unstructured data cannot be directly processed or queried
using SQL.
● If it is required to be stored within a relational database, it is
stored in a table as a Binary Large Object (BLOB).
● Alternatively, a Not-only SQL (NoSQL) database is a non-relational
database that can be used to store unstructured data alongside
structured data.
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch01.xhtml
Data Types -SemiStructured Data
● Semi-structured data has a defined level of structure and
consistency, but is not relational in nature.
● Semi-structured data is hierarchical or graph-based.
● This kind of data is commonly stored in files that contain text.
● XML and JSON files are common forms of semi-structured data.
● Due to the textual nature of this data and its conformance to some
level of structure, it is more easily processed than unstructured
data.
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch01.xhtml
Data Types -SemiStructured Data
XML example: https://www.javatpoint.com/xml-example
JSON example: https://www.javatpoint.com/json-example
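Because XML and JSON are text with a known hierarchical structure, both can be parsed with Python's standard library. The sketch below reads the same record from each format. The record itself (an employee named Ana with two skills) is invented for illustration and is not taken from the linked examples.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record expressed in two semi-structured formats.
json_text = '{"employee": {"name": "Ana", "skills": ["SQL", "Tableau"]}}'
xml_text = """
<employee>
  <name>Ana</name>
  <skills>
    <skill>SQL</skill>
    <skill>Tableau</skill>
  </skills>
</employee>
""".strip()

# JSON maps directly onto nested dictionaries and lists.
record = json.loads(json_text)
json_name = record["employee"]["name"]
json_skills = record["employee"]["skills"]

# XML is navigated as a tree of elements using simple path expressions.
root = ET.fromstring(xml_text)
xml_name = root.find("name").text
xml_skills = [node.text for node in root.findall("./skills/skill")]

print(json_name, json_skills)
print(xml_name, xml_skills)
```

Neither format requires a fixed table schema: a second employee could carry extra fields or no skills at all and the files would still parse.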
Data Types -Meta Data
● Metadata provides information about a dataset’s characteristics and
structure.
● This type of data is mostly machine-generated and can be appended
to data.
● The tracking of metadata is crucial to Big Data processing, storage
and analysis because it provides information about the pedigree of
the data and its provenance during processing.
● Examples of metadata include:
○ XML tags providing the author and creation date of a document
○ Attributes providing the file size and resolution of a digital
photograph
● Big Data solutions rely on metadata, particularly when processing
semi-structured and unstructured data.
https://www.javatpoint.com/json-example
Database Terminology
● Datasets: Collections or groups of related data are generally referred to
as datasets. Each group or dataset member (datum) shares the same set
of attributes or properties as others in the same dataset. Some
examples of datasets are:
○ tweets stored in a flat file
○ a collection of image files in a directory
○ an extract of rows from a database table stored in a CSV formatted
file
○ historical weather observations that are stored as XML files
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch01.xhtml#ch01lev2sec1
Database Terminology
● Tables: Tables contain all the necessary information in the database.
They have a format similar to a spreadsheet, with rows (records) and
columns (fields).
● Relational databases such as MySQL, Postgres, and Oracle databases are
often used to store structured data.
● RDBMS: The systems that manage relational databases, which store
transactional records or data structured or arranged in a predetermined
format, are called relational database management systems (RDBMS).
Why Databases- Video
https://www.youtube.com/watch?v=wR0jg0eQsZA
Online Transaction Processing (OLTP)
● OLTP is a software system that processes transaction-oriented data.
● The term “online transaction” refers to the completion of an activity in
real time, as opposed to batch processing.
● OLTP systems store operational data that is normalized.
● This data is a common source of structured data and serves as input to
many analytic processes.
● OLTP systems, for example a point of sale system, execute business
processes in support of corporate operations.
● OLTP systems perform transactions against a relational database.
● Big Data analysis results can be used to augment OLTP data stored in
the underlying relational databases.
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch04.xhtml#ch04lev1sec1
Online Transaction Processing (OLTP)
● The queries supported by OLTP systems are comprised of simple insert,
delete and update operations with sub-second response times.
● Examples include ticket reservation systems, banking and point of sale
systems.
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch04.xhtml#ch04lev1sec1
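The simple insert, update, and delete operations that make up OLTP workloads can be sketched with Python's built-in `sqlite3` module. The `sales` table and its rows are hypothetical, and an in-memory SQLite database stands in for a production point-of-sale database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, item TEXT, quantity INTEGER, price REAL)"
)

# Each `with conn:` block is one transaction: it commits on success
# and rolls back if an exception is raised inside the block.
with conn:
    conn.execute(
        "INSERT INTO sales (item, quantity, price) VALUES (?, ?, ?)", ("coffee", 2, 3.50)
    )
    conn.execute(
        "INSERT INTO sales (item, quantity, price) VALUES (?, ?, ?)", ("bagel", 1, 2.25)
    )

with conn:
    # The customer adds one more coffee to the order.
    conn.execute("UPDATE sales SET quantity = ? WHERE item = ?", (3, "coffee"))

with conn:
    # The bagel is returned.
    conn.execute("DELETE FROM sales WHERE item = ?", ("bagel",))

rows = conn.execute("SELECT item, quantity, price FROM sales").fetchall()
print(rows)
```

Each statement touches a single row identified by a simple condition, which is why such queries can complete in well under a second.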
Online Analytical Processing (OLAP)
● Online analytical processing (OLAP) systems are used for processing data
analysis queries.
● OLAPs form an integral part of business intelligence, data mining and
machine learning processes.
● They are relevant to Big Data in that they can serve as both a data source
as well as a data sink that is capable of receiving data.
● They are used in diagnostic, predictive and prescriptive analytics.
● OLAP systems perform long-running, complex queries against a
multidimensional database whose structure is optimized for performing
advanced analytics.
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch04.xhtml#ch04lev1sec1
Online Analytical Processing (OLAP)
● OLAP systems store historical data that is aggregated and
denormalized to support fast reporting capability.
● They further use databases that store historical data in
multidimensional structures and can answer complex queries
based on the relationships between multiple aspects of the data.
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch04.xhtml#ch04lev1sec1
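An OLAP-style query aggregates historical data along one or more dimensions. The sketch below imitates this with `GROUP BY` on a small denormalized table in SQLite. SQLite is not a multidimensional database, and the table, rows, and `revenue_by` helper are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One wide, denormalized table of historical facts.
conn.execute(
    "CREATE TABLE sales_history (year INTEGER, region TEXT, category TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO sales_history VALUES (?, ?, ?, ?)",
    [
        (2021, "East", "Furniture", 120.0),
        (2021, "West", "Furniture", 80.0),
        (2021, "East", "Technology", 200.0),
        (2022, "East", "Furniture", 150.0),
        (2022, "West", "Technology", 250.0),
    ],
)

def revenue_by(*dimensions):
    """Total revenue grouped by the given columns (the 'dimensions')."""
    allowed = {"year", "region", "category"}
    if not dimensions or not set(dimensions) <= allowed:
        raise ValueError(f"dimensions must be chosen from {sorted(allowed)}")
    cols = ", ".join(dimensions)  # safe: names were checked against `allowed`
    query = (
        f"SELECT {cols}, SUM(revenue) FROM sales_history "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(query).fetchall()

print(revenue_by("year"))                # revenue per year
print(revenue_by("region", "category"))  # revenue per region and category
```

The same facts answer different questions depending on which dimensions are grouped, which is the idea behind "slicing" data in OLAP tools.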
Extract Transform Load
● Extract Transform Load (ETL) is a process of loading data from a
source system into a target system.
● The source system can be a database, a flat file, or an
application.
● Similarly, the target system can be a database or some other
storage system.
● ETL represents the main operation through which data
warehouses are fed data.
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch04.xhtml#ch04lev1sec1
Extract Transform Load
● Required data is first obtained or extracted from the sources, after which the
extracts are modified or transformed by the application of rules.
● Finally, the data is inserted or loaded into the target system.
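The three ETL stages can be sketched as three small Python functions. Everything specific here is an assumption made for illustration: the CSV text stands in for a flat-file source, the wage figures are invented, the `*` marks a missing value, and an in-memory SQLite table plays the role of the target system.

```python
import csv
import io
import sqlite3

# Hypothetical source: a flat file, simulated here with in-memory CSV text.
SOURCE_CSV = '''occupation,annual_wage
Data Scientist,"100,000"
Financial Analyst,"85,000"
Unknown,*
'''

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: apply rules (drop unusable rows, convert wages to integers)."""
    cleaned = []
    for row in rows:
        wage = row["annual_wage"].replace(",", "")
        if wage.isdigit():  # skips placeholders such as "*"
            cleaned.append((row["occupation"].strip(), int(wage)))
    return cleaned

def load(rows, conn):
    """Load: insert the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wages (occupation TEXT, annual_wage INTEGER)"
    )
    with conn:  # commit all inserts as one transaction
        conn.executemany("INSERT INTO wages VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
loaded = conn.execute("SELECT * FROM wages").fetchall()
print(loaded)
```

Keeping the stages separate makes it easy to swap the source (a database or an application instead of a file) or the target without touching the transformation rules.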
Data Warehouses
● A data warehouse is a central, enterprise-wide repository consisting of
historical and current data.
● Data warehouses are heavily used by BI to run various analytical queries,
and they usually interface with an OLAP system to support
multidimensional analytical queries.
● Data pertaining to multiple business entities from different operational
systems is periodically extracted, validated, transformed and
consolidated into a single denormalized database.
● With periodic data imports from across the enterprise, the amount of
data contained in a given data warehouse will continue to increase.
● Over time this leads to slower query response times for data analysis
tasks. To resolve this shortcoming, data warehouses usually contain
optimized databases, called analytical databases, to handle reporting
and data analysis tasks. An analytical database can exist as a separate
DBMS, as in the case of an OLAP database.
Customer Database
https://www.w3schools.com/sql/sql_syntax.asp
NorthWind Database
https://www.w3schools.com/sql/sql_syntax.asp
SQL Data Types
https://www.w3schools.com/sql/sql_datatypes.asp
W3 School
● Access NorthWind Data base
● Go to
https://www.w3schools.com/sql/sql_ref_database.asp
Entity Relationship Diagram
https://www.youtube.com/watch?v=QpdhBUYk7Kk
Business Metrics
● So what metrics do companies care about?
● The sole purpose of businesses, according to the Nobel-winning economist
Milton Friedman, is to maximize profits for shareholders.
● The ultimate goal of any project within a business is, therefore, to increase
profits, either directly or indirectly:
○ Directly such as increasing sales (conversion rates) and cutting costs;
○ Indirectly such as higher customer satisfaction and increasing time spent
on a website.
https://learning.oreilly.com/library/view/designing-machine-learning/9781098107956/ch02.html
Business Objectives
● We first need to consider the objectives of the proposed projects.
● When working on a project, data analysts and data scientists tend to care
about the metrics they can measure.
○ Performance of their ML models such as accuracy, F1 score,
inference latency, etc.
● They get excited about improving their model’s accuracy from 94% to 94.2%
and might spend a ton of resources—data, compute, and engineering time—
to achieve that.
● But the truth is: most companies don’t care about the fancy ML metrics or
complex visualization.
https://learning.oreilly.com/library/view/designing-machine-learning/9781098107956/ch02.html
Business Objectives
● They don’t care about increasing a model’s accuracy from 94% to 94.2%
unless it moves some business metrics.
● A pattern in many short-lived ML projects and visualization projects is that
the data analysts and data scientists become too focused on hacking ML
metrics or complex visualization without paying attention to business metrics.
● Their managers, however, only care about business metrics and, after failing
to see how an ML project can help push their business metrics, kill the
projects prematurely (and possibly let go of the data science team involved).
https://learning.oreilly.com/library/view/designing-machine-learning/9781098107956/ch02.html
Business Metrics- ML metrics example
● Imagine that you work for an ecommerce site that cares about
purchase-through rate.
● You want to move your recommender system from batch prediction to online
prediction.
● You might reason that online prediction will enable recommendations more
relevant to users right now, which can lead to a higher purchase-through rate.
● You can even do an experiment to show that online prediction can improve
your recommender system’s predictive accuracy by X%.
● Historically on your site, each percent increase in the recommender system’s
predictive accuracy led to a certain increase in purchase-through rate.
Business Metrics- ML metrics example
● One of the reasons why predicting ad click-through rates and fraud detection
are among the most popular use cases for ML today is that it’s easy to map
ML models’ performance to business metrics:
○ Every increase in click-through rate results in actual ad revenue, and
every fraudulent transaction stopped results in actual money saved.
https://learning.oreilly.com/library/view/designing-machine-learning/9781098107956/ch02.html
Mapping Business Metrics to ML metrics
● Many companies create their own metrics to map business metrics to ML
metrics.
● For example, Netflix measures the performance of their recommender
system using take-rate:: the number of quality plays divided by the number of
recommendations a user sees.
● The higher the take-rate, the better the recommender system.
● Netflix also put a recommender system’s take-rate in the context of their
other business metrics like
○ total streaming hours and subscription cancellation rate.
● They found that a higher take-rate also results in higher total streaming hours
and lower subscription cancellation rates
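The take-rate definition above is a simple ratio, sketched here in Python. The function name and the sample counts are hypothetical; only the formula (quality plays divided by recommendations seen) comes from the slide.

```python
def take_rate(quality_plays, recommendations_seen):
    """Quality plays divided by the number of recommendations a user sees."""
    if recommendations_seen == 0:
        return 0.0  # nothing was recommended, so nothing could be taken
    return quality_plays / recommendations_seen


# Hypothetical counts for one user, for illustration only.
rate = take_rate(quality_plays=12, recommendations_seen=80)
print(f"take-rate: {rate:.0%}")
```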
PROJECT PLANNING
Even the most complex project can be tackled successfully if broken down in a
series of steps. This sequence will help you to consider your project in stages:
1. Understand Overall Project
2. Detail Objectives
3. Brainstorm Resources to Complete Deliverable
4. Establish Project Timeline in Phases
5. Research--Identify What Resources Are Available
6. Analyze Research/Findings
7. Outline Finished Product
8. Write/Compile Document Presentation
Exploratory Data Analysis (EDA)
● Classical statistics focused almost exclusively on inference, a sometimes
complex set of procedures for drawing conclusions about large
populations based on small samples.
● In 1962, John W. Tukey called for a reformation of statistics in his
seminal paper “The Future of Data Analysis” [Tukey-1962].
● He proposed a new scientific discipline called data analysis that included
statistical inference as just one component.
● Tukey forged links to the engineering and computer science communities
(he coined the terms bit, short for binary digit, and software), and his
original tenets are surprisingly durable and form part of the foundation for
data science.
https://learning.oreilly.com/library/view/practical-statistics-for/9781492072935/ch01.html
Additional Data Camp resources
Follow this link to learn about Data Camp.
These practice assignments can help you learn, practice and build your datasheet skills in Tableau.
These Data Camp Training Resources can help you build other Tableau skills. You will have to create a user
in Data Camp before beginning.
Popular Analytics Data Sets
● Google Dataset Search: Sample dataset: Global price of coffee, 1990-present
● Kaggle: Sample dataset: Daily temperature of major cities
● Data.Gov: Sample dataset: Lobster Report for Transshipment and
Sales
● Datahub.io: Sample dataset: Average mass of glaciers since 1945
● UCI Machine Learning Repository: Sample dataset: Behavior of
urban traffic in Sao Paulo, Brazil
● Earth Data: Sample dataset: Environmental conditions during fall
moose hunting season in Alaska, 2000-2016
● FBI Crime Data Explorer: FBI Crime Data Explorer
https://careerfoundry.com/en/blog/data-analytics/where-to-find-free-datasets/
Popular Analytics Data Sets
● Pro Football Reference: Contains all-time statistics from NFL players and teams (Pro-Football-Reference.com)
● Kaggle: Contains a library of various user-submitted data sets
● Data.gov: Contains datasets from various US agencies
● Academic Torrents: Provides datasets from academic papers
● Nasdaq Data: Provides financial and economic datasets (Nasdaq Data Link)
● World Bank Open Data: Economic datasets from around the world, covering several topics https://data.worldbank.org/
● NASA: Datasets covering the agency’s scientific/astronomic discoveries https://data.nasa.gov/
● WHO Global Health Observatory: Provides datasets from medical research https://www.who.int/data/gho
● Spotify: Datasets on preferences in music genres/podcasts https://research.atspotify.com/datasets/
● OECD: Datasets on different countries’ state of living and financial states https://stats.oecd.org/
Shared by Ampie S (CS160 Spring2022)
PROJECT PHASE I
Module 2 Learning objectives
● Describe what Tableau is
● Explain the basic functions in Tableau
https://learning.oreilly.com/library/view/effective-data-storytelling/9781119615712/c07.xhtml#usec0003
Tableau
https://www.tableau.com/learn/get-started
1. Click on Free training
2. Enter login credentials or Create one to access
Getting Started
https://www.tableau.com/learn/tutorials/on-demand/getting-started?playlist=391099
Connect to Data
https://www.tableau.com/learn/tutorials/on-demand/getting-started-part3?playlist=391099
https://learning.oreilly.com/library/view/practical-tableau/9781491977309/ch08.html#an_introduction_to_aggregation_in_tablea
Data Preparation
Data Preparation – The Data Interpreter (4:29)
See Tableau Public’s ideal data structure, and learn how to
use the Data Interpreter to clean data
Covers:
● How your data should (ideally) be structured
● How to clean your data using the Data Interpreter
Data: World Bank CO2 (.xlsx)
https://public.tableau.com/s/resources
Data Preparation
Data Preparation – Pivoting your Data (4:54)
Learn how to pivot your data structure in Tableau
Covers:
● Why you might need to pivot your data structure
● How to use Tableau Public’s pivot function
Data: World Bank CO2 (.xlsx)
https://public.tableau.com/s/resources
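To make pivoting concrete outside Tableau, here is a minimal Python sketch of the same wide-to-long reshaping. The country names, years, and values are made-up placeholders (not the World Bank CO2 data), and `pivot_longer` is a hypothetical helper, not a Tableau function.

```python
def pivot_longer(rows, id_field, key_name, value_name):
    """Turn one wide row per entity into one long row per (entity, column) pair."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue  # the identifier is repeated on every long row instead
            long_rows.append({id_field: row[id_field], key_name: column, value_name: value})
    return long_rows


# Wide layout: one column per year (placeholder values).
wide = [
    {"Country": "Country A", "2019": 1.2, "2020": 1.1},
    {"Country": "Country B", "2019": 3.4, "2020": 3.0},
]

# Long layout: one row per country and year, which is easier to chart over time.
long_rows = pivot_longer(wide, id_field="Country", key_name="Year", value_name="CO2")
for row in long_rows:
    print(row)
```

The long layout is usually the better fit for time-based charts, because "Year" becomes a single field that can go on one axis.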
Data Preparation
Data Preparation – Splitting your Data (2:26)
Learn how to split a field into multiple fields in Tableau
Covers:
● Why you might need to split a field in Tableau
Public
● How to use Tableau Public’s split function
Data: World Bank CO2 (.xlsx)
https://public.tableau.com/s/resources
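As a rough analogue of a split, the Python sketch below breaks one text field into two named fields. The sample value, the field names, and the `split_field` helper are assumptions for illustration; Tableau's own split function is configured in its interface and is not shown here.

```python
def split_field(value, separator, names):
    """Split one text value into len(names) named parts, padding with None."""
    parts = [part.strip() for part in value.split(separator, len(names) - 1)]
    parts += [None] * (len(names) - len(parts))  # pad when the separator is missing
    return dict(zip(names, parts))


# Hypothetical combined field holding a city and a state.
record = split_field("Boston, MA", ",", ["City", "State"])
print(record)
```

Splitting is useful when two facts share one column and you want to filter or group by each of them separately.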
Data Preparation
Data Preparation – Joins and Unions (6:28)
Learn how to join multiple data sets together in Tableau
Covers:
● What are joins and unions
● How to join two data sets together
● How to union multiple data sets
Data: World Bank CO2 (.xlsx)
https://public.tableau.com/s/resources
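The difference between a join and a union can be sketched in a few lines of Python. This is an illustration under assumptions: the rows are invented, `left_join` is a hypothetical helper that assumes each key appears at most once in the right-hand table, and the union assumes both tables have the same columns.

```python
def left_join(left, right, key):
    """Add the right table's columns to each left row that shares the key."""
    lookup = {row[key]: row for row in right}
    right_fields = sorted({field for row in right for field in row if field != key})
    joined = []
    for row in left:
        match = lookup.get(row[key], {})
        merged = dict(row)
        for field in right_fields:
            merged[field] = match.get(field)  # None when there is no match
        joined.append(merged)
    return joined


# Join: widen the table by matching rows on a shared key.
orders = [
    {"Order ID": "O-1", "Sales": 120.0},
    {"Order ID": "O-2", "Sales": 75.5},
]
returns = [{"Order ID": "O-2", "Returned": "Yes"}]
orders_with_returns = left_join(orders, returns, key="Order ID")

# Union: lengthen the table by stacking rows that share the same columns.
more_orders = [{"Order ID": "O-3", "Sales": 300.0}]
all_orders = orders + more_orders

print(orders_with_returns)
print(all_orders)
```

A join adds columns; a union adds rows.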
Charts
Learn about the logic of how Tableau Public creates charts
Covers:
● Overview of Dimensions and Measures
● Overview of Columns and Rows shelf
● Overview of the Marks card
Data: World Bank CO2 (.xlsx)
https://public.tableau.com/s/resources
World Map Chart
Data: World Bank CO2 (.xlsx)
https://public.tableau.com/s/resources
Dashboards
Combining Sheets on a Dashboard (5:27)
See how to combine your visualizations together on a
dashboard
Covers:
● How to combine sheets on a dashboard
● How to re-arrange and add items to a dashboard
Data: World Bank CO2 (.xlsx)
Viz: Combine Sheets on a Dashboard
https://public.tableau.com/s/resources
Dashboards
Adding Interactivity to Dashboards (4:30)
Learn how to add interactivity to your dashboards
Covers:
● See how to add filter actions
● See how to add highlight actions
Data: World Bank CO2 (.xlsx)
Viz: Combine Sheets on a Dashboard
https://public.tableau.com/s/resources
DESCRIPTIVE STATISTICS
https://www.pluralsight.com/guides/tableau-worksheetsummary-card:-quick-descriptive-statistics
https://public.tableau.com/s/resources
STORIES
Creating Stories (5:55)
Learn how to turn your data into a cohesive
narrative using Story Points
Covers:
● See examples of data stories
● Learn how to create story points
Data: World Bank CO2 (.xlsx)
https://public.tableau.com/s/resources
STORIES
Formatting Story Points (6:54)
Make your stories come to life with these formatting tips
Covers:
● Learn how to fit your dashboards to the story points
● See how to format the story points
● See how to add annotations to your story
Data: World Bank CO2 (.xlsx)
https://public.tableau.com/s/resources
TABLEAU TUTORIAL-1
https://mdl.library.utoronto.ca/technology/tutorials/creatingdata-visualizations-using-tableau-desktop-beginner#7
https://public.tableau.com/s/resources
TABLEAU TUTORIAL-2
https://maps.library.utoronto.ca/workshops/Tableau1hrOnline/Demo.pdf
https://maps.library.utoronto.ca/workshops/Tableau1hrOnline/SetupInstructions.pdf
https://public.tableau.com/s/resources
CALCULATED FIELDS
https://learning.oreilly.com/library/view/practical-tableau/9781491977309/ch08.html#an_introduction_to_aggregation_in_tablea
Connect to Data
Watch Video:
https://www.tableau.com/learn/tutorials/on-demand/getting-started-part3?playlist=391099
Tableau Workspace
https://help.tableau.com/current/pro/desktop/en-us/environment_workspace.htm
Watch: Tour the Tableau Interface
Dive deeper: The Tableau Workspace
Getting Started with Visual Analytics
QUESTIONS TO ANSWER:
1. Sales over time
2. Profit over time
3. Relationship between shipping cost and profit
https://www.tableau.com/learn/tutorials/on-demand/getting-started-visual-analytics
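The three questions above map onto two calculations: totals per time period, and the correlation between two measures. The Python sketch below shows both on a handful of invented order rows; the field names and numbers are assumptions, and in the course the same questions are answered visually in Tableau.

```python
from collections import defaultdict
from math import sqrt


def total_by_month(rows, field):
    """Sum a numeric field per calendar month (YYYY-MM), in date order."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["Order Date"][:7]] += row[field]
    return dict(sorted(totals.items()))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented rows for illustration only.
sample_orders = [
    {"Order Date": "2023-01-05", "Sales": 100.0, "Profit": 20.0, "Shipping Cost": 5.0},
    {"Order Date": "2023-01-20", "Sales": 50.0, "Profit": 5.0, "Shipping Cost": 8.0},
    {"Order Date": "2023-02-02", "Sales": 200.0, "Profit": 60.0, "Shipping Cost": 4.0},
    {"Order Date": "2023-02-15", "Sales": 80.0, "Profit": 10.0, "Shipping Cost": 9.0},
]

sales_over_time = total_by_month(sample_orders, "Sales")    # question 1
profit_over_time = total_by_month(sample_orders, "Profit")  # question 2
shipping_vs_profit = pearson(                               # question 3
    [row["Shipping Cost"] for row in sample_orders],
    [row["Profit"] for row in sample_orders],
)

print(sales_over_time)
print(profit_over_time)
print(round(shipping_vs_profit, 2))
```

Questions 1 and 2 correspond to line charts over time; question 3 corresponds to a scatter plot of the two measures.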
Download Video from BlackBoard or Webpage
ORDERS, PEOPLE, RETURNS
https://www.tableau.com/learn/tutorials/on-demand/getting-started-part3?playlist=391099
JOIN ORDERS and RETURNS
https://www.tableau.com/learn/tutorials/on-demand/getting-started-part3?playlist=391099
Tableau Calculations
Watch
Dive deeper: The Tableau Workspace
COMMON CHARTS
https://help.tableau.com/current/pro/desktop/en-us/dataview_examples.htm
● Build an Area Chart
● Build a Bar Chart
● Build a Box Plot
● Build a Bullet Graph
● Build with Density Marks (Heatmap)
● Build a Gantt Chart
● Build a Highlight Table
● Build a Histogram
● Build a Line Chart
● Build a Packed Bubble Chart
● Build a Pie Chart
● Build a Scatter Plot
● Build a Text Table
● Build a Treemap
● Build Combination Charts
AREA CHARTS
https://help.tableau.com/current/pro/desktop/en-us/qs_area_charts.htm
BAR CHARTS
https://help.tableau.com/current/pro/desktop/en-us/buildexamples_bar.htm
Build Visualization
Watch: Build a Visualization
Additional reading: Viz Building Basics
Learn how the 'Show Me' menu can help you.
BOX WHISKERS PLOT
https://help.tableau.com/current/pro/desktop/en-us/buildexamples_boxplot.htm
PIE PLOT
https://help.tableau.com/current/pro/desktop/en-us/buildexamples_pie.htm
CREATE DASHBOARDS
https://help.tableau.com/current/pro/desktop/en-us/dashboards.htm
● Best Practices for Effective Dashboards
● Create a Dashboard
● Accelerators for Cloud-based Data
● Size and Lay Out Your Dashboard
● Create Dashboard Layouts for Different Devices
● Build Accessible Dashboards
● Manage Sheets in Dashboards and Stories
● Use Dashboard Extensions
STORIES
https://help.tableau.com/current/pro/desktop/en-us/stories.htm
In Tableau, a story is a sequence of visualizations that work
together to convey information. You can create stories to tell a
data narrative, provide context, demonstrate how decisions relate
to outcomes, or to simply make a compelling case.
Additional resources
● This article, The Ultimate Cheat Sheet on Tableau Charts by Kate Strachnyi,
explains why Tableau Desktop is a superior data analysis and visualization tool.
● When you click on this link you will see a Tableau Example.
Interviewing data: exploratory graphical analysis
● Please review these free training videos from Tableau
PROJECT PHASE I
PROJECT
As part of this course, students will undertake a real-world data project. The project
will consist of addressing several questions and requirements using data and analytic
approaches and tools. The project will be carried out in multiple phases, each
requiring a mandatory in class presentation. The following questions will help you to
consider your project:
● What are the business question(s)?
● What are the business objective(s)?
● What are the business metrics to measure?
● What are the business output & outcomes?
● Why are they important?
● How should you answer the question(s)?
● How do you know when the question(s) are answered?
Class Project and Presentations
Throughout the course, we will be using the same dataset as this gives you the
opportunity to become well versed in the data. With this dataset you will complete
the following:
Phase I
● Initial analysis in Tableau that will provide the foundation for your subsequent
analyses (Assignment I)
a. Group Tableau Dashboard, slides and presentation
Phase II
● Incorporating business objectives, consulting for context, storytelling and
visualization concepts
a. Group Tableau Dashboard, slides and presentation
PHASE I PROJECT - Guidelines
Phase I
● Initial analysis in Tableau that will provide the foundation for your subsequent analyses
(Assignment I)
● Group Tableau Dashboard, slides and presentation
● In class mandatory group presentation
● Each member participates in the presentation (under 3 mins per student)
● Slide deck should include the following areas
○ Include “About you” section in the Slide deck
○ Understand Overall Project
○ Project Objectives
○ Data set to use for the project
Phase I- Guidelines
Mandatory in-class presentation (under 3 mins per student). Students will walk
the class through the Tableau dashboard. We will skip student intro “About you”
during in-class presentation.
○ Resources to Complete Deliverable
○ Establish Project Timeline in Phases
○ Create a chart, map, graph, or other visualization to make your data easier to
understand.
■ Use Descriptive Analytics
■ Snapshot of Tableau graphs summary in slide deck
■ Each team member presents
● Submit slides individually (fn_ln_Grp#_CS160_month_day_year)
● Submit Tableau workbook individually (fn_ln_Grp#_CS160_month_day_year)
Project Phase II
Phase II- Guidelines
○ Resources to Complete Deliverable
○ Establish Project Timeline in Phases
○ Visualize data to tell a story
○ Incorporate Consulting for Context principles
○ Iterate on Phase I Tableau Dashboard
■ Create a chart, map, graph, or other visualization to make your data
easier to understand.
■ Use Descriptive Analytics
■ Snapshot of Tableau graphs
● Submit slides individually (fn_ln_Grp#_ALY6070_1_30_2023)
● Submit Tableau workbook individually (fn_ln_Grp#_ALY6070_1_30_2023)
Project Phase II- Guidelines
● Incorporating business objectives, consulting for context, storytelling and visualization
concepts
● Group Tableau Dashboard, slides and presentation
● In class mandatory group presentation
● Each member participates (under 3 mins per student) in the presentation
● Slide deck should include the following areas
a. Include “About you” section in the Slide deck
b. Understand Overall Project
c. Project Objectives
d. Data set to use for the project
Module 2 : Learning Objectives
By the end of this module, you should be able to:
● Explain the importance of data storytelling
● Explain the importance of context
● Differentiate between exploratory and explanatory data visualization
approaches
● Tailor data presentation to identified target audience
● Incorporate learnings in Phase II project
Exploratory vs. explanatory analysis
● Exploratory analysis is what you do to understand the data and figure
out what might be noteworthy or interesting to highlight to others.
● When we’re at the point of communicating our analysis to our
audience, we really want to be in the explanatory space.
● You have a specific thing you want to explain, a specific story you
want to tell.
Explanatory analysis- Who, what, and how
● When it comes to explanatory analysis, there are a few things to think
about and be extremely clear on before visualizing any data or
creating content.
● First: To whom are you communicating?
○ It is important to have a good understanding of who your
audience is and how they perceive you.
○ This can help you to identify common ground that will help you
ensure they hear your message.
Explanatory analysis- Who, what, and how
● Second: What do you want your audience to know or do?
○ Be clear about how you want your audience to act.
○ Take into account how you will communicate with them.
○ Consider the overall tone that you want to set for your communication.
● Third: How can you use data to help make your point?
Consulting for context: questions to ask
● What background information is relevant or essential?
● Who is the audience or decision maker?
○ What do we know about them?
● What biases does our audience have that might make them supportive of
or resistant to our message?
● What data is available that would strengthen our case?
● Is our audience familiar with this data, or is it new?
Consulting for context: questions to ask
● Where are the risks: what factors could weaken our case and do we
need to proactively address them?
● What would a successful outcome look like?
● If you only had a limited amount of time or a single sentence to tell
your audience what they need to know, what would you say?
Storytelling
● Find a subject you care about.
○ It is this genuine caring, and not games with language, which will be the most
compelling and seductive element in your style.
● Do not ramble, though.
● Keep it simple.
○ “To be or not to be?” asks Shakespeare’s Hamlet. The longest word is three
letters.
● Have the guts to cut. If a sentence, no matter how excellent, does not illuminate
your subject in some new and useful way, scratch it out.
● Sound like yourself.
● Say what you meant to say.
● Pity the readers. Our audience requires us to be sympathetic and patient
teachers, ever willing to simplify and clarify.
https://learning.oreilly.com/library/view/storytelling-with-data/9781119002253/c07.xhtml#c7_2
Constructing the story- The beginning
● The setting: When and where does the story take place?
● The main character: Who is driving the action? (This should be framed
in terms of your audience!)
● The imbalance: Why is it necessary, what has changed?
● The balance: What do you want to see happen?
● The solution: How will you bring about the changes?
Constructing the story- The middle
● Further develop the situation or problem by covering relevant
background.
● Incorporate external context or comparison points.
● Give examples that illustrate the issue.
● Include data that demonstrates the problem.
● Articulate what will happen if no action is taken or no change is made.
● Discuss potential options for addressing the problem.
● Illustrate the benefits of your recommended solution.
● Make it clear to your audience why they are in a unique position to make
a decision or drive action.
Constructing the story- The end
● End with a call to action: make it totally clear to your audience what you want
them to do with the new understanding or knowledge that you’ve imparted to
them.
● One classic way to end a story is to tie it back to the beginning.
● At the beginning of our story, we set up the plot and introduced the dramatic
tension.
● To wrap up, you can think about recapping this problem and the resulting
need for action, reiterating any sense of urgency and sending your
audience off ready to act.
3-minute story
● The 3-minute story is exactly that: if you had only three minutes to tell your
audience what they need to know, what would you say?
● This is a great way to ensure you are clear on and can articulate the story
you want to tell.
● Being able to do this removes you from dependence on your slides or
visuals for a presentation.
● This is useful in the situation where your boss asks you what you’re working
on or if you find yourself in an elevator with one of your stakeholders and
want to give her the quick rundown.
● Or if your half-hour on the agenda gets shortened to ten minutes, or to five.
● If you know exactly what it is you want to communicate, you can make it fit
the time slot you’re given, even if it isn’t the one for which you are prepared.
Big Idea
● The Big Idea boils the so-what down even further: to a single
sentence.
● Big Idea has three components:
○ It must articulate your unique point of view;
○ It must convey what’s at stake; and
○ It must be a complete sentence.
Storyboarding
● The storyboard establishes a structure for your communication.
● It is a visual outline of the content you plan to create.
● It can be subject to change as you work through the details.
● Establishing a structure early on will set you up for success.
● When you can (and as makes sense), get acceptance from your client or
stakeholder at this step.
● It will help ensure that what you’re planning is in line with the need.
● Use a whiteboard, Post-it notes, or plain paper.
● It’s much easier to put a line through an idea on a piece of paper or recycle a
Post-it note without feeling the same sense of loss as when you cut something
you’ve spent time creating with your computer.
● With Post-it notes it’s easy to rearrange ideas and explore different narrative
flows.
Storyboarding
● Who are the users of this screen?
● What is the page showing?
● What questions will the page answer?
● What actions will that enable?
HOW TO STORYBOARD
1. Get large sticky notes or sheets of paper.
2. Brainstorm the elements in your story.
3. Play around with sequences that seem right.
4. Illustrate ideas that show what the step is about.
STORYBOARD EXAMPLE
We will create a storyboard for the feature in mapping software, such as
Google Maps, where a user can share their real-time route information with
others:
● Character: Susan is a sales representative. She uses mapping software
on her phone all the time to find her way to clients. She is a busy, single
mom to Simon.
● Context: She is in her car, stuck in traffic. Her phone is in its holder on the
dashboard. Simon, the second character, is waiting at school.
https://docs.google.com/presentation/d/14LXsWyo4fjnS28K9eHc0dnvy3ijyN307Y_USbAodWA/edit#slide=id.g1996323a360_0_356
STORYBOARD EXAMPLE
● Plot: Susan is rushing back from a client to fetch Simon from
school (Characters).
● She sees that she is late. Because of the unexpected traffic, she
will be very late fetching Simon.
● Simon is waiting at school, anxious and upset (Struggle).
● Susan remembers that she can send her real-time location to
Simon with the click of a button on her phone.
● Then he will know that she is coming.
● She is relieved (Trigger). Simon gets the notification and is
immediately relieved because he knows when Susan will be
arriving and that she is thinking of him (Climax + Resolution).
https://docs.google.com/presentation/d/14LXsWyo4fjnS28K9eHc0dnvy3ijyN307Y_USbAodWA/edit#slide=id.g1996323a360_0_356
STORYBOARD EXAMPLE
The next step is to separate the story into panels.
1. Susan driving from an appointment to fetch her son, Simon, from school.
She is late and the traffic is bad. She is upset and worried.
2. Closeup of Susan realizing that she can share her location with Simon, so
he can see where she is and know when she's going to arrive.
3. Simon waiting at school. He is sad and anxious because he doesn't know
when Susan will arrive and she is late. His phone buzzes with a message.
4. Simon sees Susan's progress and ETA on his phone. He is happy because
he sees that Susan is on her way and knows when she will arrive. He feels
cared for because he knows she thought of him.
SETTING THE SCENE FOR YOUR DATA STORY
https://learning.oreilly.com/library/view/effective-data-storytelling/9781119615712/c07.xhtml#usec0003
PHASE II PROJECT
https://learning.oreilly.com/library/view/effective-data-storytelling/9781119615712/c07.xhtml#usec0003
Module 3
As we move into this module, we will cover the
following topics:
● The importance of choosing an effective visual
display as well as decluttering your visuals.
● We will also examine what constitutes graphical
integrity.
Module 3
By the end of this module, you should be able to:
● Recognize appropriate data visualizations that help to
communicate the meaning of data
● Explain how to improve a visual to better communicate
the meaning of data
● Describe Graphical Integrity
DIRECTORY OF VISUALIZATIONS
Amounts
● The most common approach to visualizing amounts (i.e.,
numerical values shown for some set of categories) is using
bars, either vertically or horizontally arranged
● However, instead of using bars, we can also place dots at the
location where the corresponding bar would end
https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/ch05.html#amounts
DIRECTORY OF VISUALIZATIONS- Distributions
● Histograms and density plots provide the most intuitive
visualizations of a distribution, but both require arbitrary
parameter choices and can be misleading.
● Cumulative densities and quantile-quantile (q-q) plots always
represent the data faithfully but can be more difficult to interpret.
https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/ch05.html#amounts
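The point about arbitrary parameter choices can be seen with a small Python sketch: the same values produce different-looking histograms depending on the number of bins, while the empirical cumulative distribution needs no such choice. The data values and helper names are made up for illustration, and the histogram helper assumes the values are not all identical.

```python
def histogram_counts(values, bin_count):
    """Count values in bin_count equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bin_count  # assumes hi > lo
    counts = [0] * bin_count
    for value in values:
        index = min(int((value - lo) / width), bin_count - 1)  # max goes in last bin
        counts[index] += 1
    return counts


def ecdf(values):
    """Empirical cumulative distribution: (value, share of data <= value)."""
    ordered = sorted(values)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


data = [1, 2, 2, 3, 3, 3, 7, 8, 8, 9]  # made-up values

two_bins = histogram_counts(data, 2)   # one broad shape
four_bins = histogram_counts(data, 4)  # reveals an empty gap in the middle
cumulative = ecdf(data)                # no bin count to choose

print(two_bins)
print(four_bins)
print(cumulative[-1])
```

With two bins the gap between the low and high values disappears; with four bins it shows up as an empty bin. The cumulative view shows every data point but takes more practice to read.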
DIRECTORY OF VISUALIZATIONS- Distributions
● Boxplots, violin plots, strip charts, and sina plots are useful when
we want to visualize many distributions at once and/or if we are
primarily interested in overall shifts among the distributions.
● Stacked histograms and overlapping densities allow a more in-depth
comparison of a smaller number of distributions, though
stacked histograms can be difficult to interpret and are best
avoided.
● Ridgeline plots can be a useful alternative to violin plots and are
often useful when visualizing very large numbers of distributions
or changes in distributions over time
https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/ch05.html#amounts
DIRECTORY OF VISUALIZATIONS- Proportions
● Proportions can be visualized as pie charts, side-by-side bars,
or stacked bars.
● As for amounts, when we visualize proportions with bars, the
bars can be arranged either vertically or horizontally.
https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/ch05.html#amounts
DIRECTORY OF VISUALIZATIONS- Proportions
● Pie charts emphasize that the individual parts add up to a
whole and highlight simple fractions.
● However, the individual pieces are more easily compared in
side-by-side bars.
● Stacked bars look awkward for a single set of proportions, but
can be useful when comparing multiple sets of proportions.
https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/ch05.html#amounts
DIRECTORY OF VISUALIZATIONS- Proportions
● When visualizing multiple sets of proportions or changes in proportions
across conditions, pie charts tend to be space-inefficient and often
obscure relationships.
● Grouped bars work well as long as the number of conditions compared
is moderate, and stacked bars can work for large numbers of
conditions.
● Stacked densities are appropriate when the proportions change along a
continuous variable.
https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/ch05.html#amounts
DIRECTORY OF VISUALIZATIONS- Proportions
● When proportions are specified according to multiple grouping
variables, mosaic plots, treemaps, or parallel sets are useful
visualization approaches.
● Mosaic plots assume that every level of one grouping variable
can be combined with every level of another grouping variable.
https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/ch05.html#amounts
DIRECTORY OF VISUALIZATIONS- Proportions
● Treemaps do not make such an assumption.
● Treemaps work well even if the subdivisions of one group are
entirely distinct from the subdivisions of another.
● Parallel sets work better than either mosaic plots or treemaps
when there are more than two grouping variables.
https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/ch05.html#amounts
DIRECTORY OF VISUALIZATIONS :x-y relationships
● Scatterplots represent the archetypical visualization when we
want to show one quantitative variable relative to another.
● If we have three quantitative variables, we can map one onto
the dot size, creating a variant of the scatterplot called a
bubble chart.
https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/ch05.html#amounts
DIRECTORY OF VISUALIZATIONS :x-y relationships
● For paired data, where the variables along the x and y axes
are measured in the same units, it is generally helpful to add
a line indicating x = y (see “Paired Data”).
● Paired data can also be shown as a slopegraph of paired
points connected by straight lines.
https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/ch05.html#amounts
DIRECTORY OF VISUALIZATIONS :x-y relationships
● For large numbers of points, regular scatterplots can become
uninformative due to overplotting.
● In this case, contour lines, 2D bins, or hex bins may provide
an alternative
https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/ch05.html#amounts
DIRECTORY OF VISUALIZATIONS :x-y relationships
● When we want to visualize more than two quantities, on the
other hand, we may choose to plot correlation coefficients in
the form of a correlogram instead of the underlying raw data
(see “Correlograms”).
https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/ch05.html#amounts
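A correlogram is a picture of a correlation matrix, so the numbers behind it can be sketched directly. The Python example below computes pairwise Pearson correlations for three made-up columns; the column names, values, and helper functions are assumptions for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Correlation of every column with every other column."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up columns: profit rises with sales, discount falls as sales rise.
columns = {
    "sales": [10, 20, 30, 40],
    "profit": [1, 2, 3, 4],
    "discount": [4, 3, 2, 1],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

In a correlogram each cell of this matrix is drawn as a colored tile or circle, so strong positive and negative relationships stand out at a glance.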
DIRECTORY OF VISUALIZATIONS :x-y relationships
● When the x axis represents time or a strictly increasing quantity such as a
treatment dose, we commonly draw line graphs.
● If we have a temporal sequence of two response variables we can draw a
connected scatterplot.
https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/ch05.html#amounts
DIRECTORY OF VISUALIZATIONS :x-y relationships
● In a connected scatterplot, we first plot the two response variables in a
scatterplot and then connect dots corresponding to adjacent time points (see
“Time Series of Two or More Response Variables”).
● We can use smooth lines to represent trends in a larger dataset
https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/ch05.html#amounts
DIRECTORY OF VISUALIZATIONS :GeoSpatial data
● The primary mode of showing geospatial data is in the form of a map
(Chapter 15). A map takes coordinates on the globe and projects them onto
a flat surface, such that shapes and distances on the globe are
approximately represented by shapes and distances in the 2D
representation. In addition, we can show data values in different regions by
coloring those regions in the map according to the data. Such a map is
called a choropleth (see “Choropleth Mapping”). In some cases, it may be
helpful to distort the different regions according to some other quantity
(e.g., population number) or simplify each region into a square. Such
visualizations are called cartograms (see “Cartograms”).
https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/ch05.html#amounts
DIRECTORY OF VISUALIZATIONS :Uncertainty
Error bars are meant to indicate the range of
likely values for some estimate or
measurement. They extend horizontally
and/or vertically from some reference point
representing the estimate or measurement
(Chapter 16). Reference points can be shown
in various ways, such as by dots or by bars.
Graded error bars show multiple ranges at the
same time, where each range corresponds to a
different degree of confidence. They are in
effect multiple error bars with different line
thicknesses plotted on top of each other.
https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/ch05.html#amounts
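As a rough illustration of graded error bars, the sketch below computes intervals at several confidence levels around one estimate. The sample values are hypothetical, and the intervals use a normal approximation (an assumption; for a sample this small a t-distribution would normally be preferred).

```python
from statistics import NormalDist, mean, stdev

# Hypothetical measurements (illustrative numbers only).
sample = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0]

estimate = mean(sample)
standard_error = stdev(sample) / len(sample) ** 0.5


def error_bar(level):
    """Return (low, high) for a normal-approximation interval at `level`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * standard_error, estimate + z * standard_error


# Graded error bars: several confidence levels around the same estimate.
graded = {level: error_bar(level) for level in (0.80, 0.95, 0.99)}
for level, (low, high) in graded.items():
    print(f"{level:.0%}: {low:.3f} to {high:.3f}")
```

Each higher confidence level produces a wider range, which is why graded error bars can be drawn on top of each other with different line thicknesses.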
DIRECTORY OF VISUALIZATIONS :Uncertainty
To achieve a more detailed visualization than
is possible with error bars or graded error
bars, we can visualize the actual confidence
or posterior distributions (Chapter 16).
Confidence strips provide a visual sense of
uncertainty but are difficult to read
accurately. Eyes and half-eyes combine error
bars with approaches to visualize
distributions (violins and ridgelines,
respectively), and thus show both precise
ranges for some confidence levels and the
overall uncertainty distribution. A quantile
dot plot can serve as an alternative
visualization of an uncertainty distribution
(see “Framing Probabilities as Frequencies”).
Because it shows the distribution in discrete
units, the quantile dot plot is not as precise
but can be easier to read than the continuous
distribution shown by a violin or ridgeline
plot.
https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/ch05.html#amounts
DIRECTORY OF VISUALIZATIONS :Uncertainty
For smooth line graphs, the equivalent of an
error bar is a confidence band (see
“Visualizing the Uncertainty of Curve Fits”).
It shows a range of values the line might pass
through at a given confidence level. Like
with error bars, we can draw graded
confidence bands that show multiple
confidence levels at once. We can also show
individual fitted draws in lieu of or in
addition to the confidence bands.
https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/ch05.html#amounts
PROJECT-Phase II
Iterate on Assignment 1 (PowerPoint and Tableau Workbook) and
incorporate the consulting, storytelling, and visualization concepts you learnt.
● What are the business question(s)?
● What are the business objective(s)?
● What are the business metrics to measure?
● What are the business outputs and outcomes?
● Why are they important?
● How should you answer the question(s)?
● How do you know when the question(s) are answered?
PROJECT-Phase II-Submission guideline
● In class mandatory presentation Jan 30, 2023
● Submit slides individually (fn_ln_Grp#_ALY6070_1_30_2023_prj_2)
● Submit Tableau workbook individually (fn_ln_Grp#_ALY6070_1_23_2023_prj_2)
Chart Design principles
● There are so many different types of charts.
● However, just because data can be made into a chart doesn’t
necessarily mean that it should be turned into one.
● Before creating a chart, stop and ask:
● Does a visualized data pattern really matter to your story?
● Sometimes a simple table, or even text alone, can communicate the
idea more effectively to your audience.
● Creating a well-designed chart requires time and effort, so make sure it
enhances your data story.
https://learning.oreilly.com/library/view/hands-on-data-visualization/9781492085997/ch06.html#idm45750379312520
Chart Design principles
● Although not a science, data visualization comes with a set of
principles and best practices that serve as a foundation for
creating truthful and eloquent charts.
● In this section, we’ll identify some important rules about chart
design.
● You may be surprised to learn that some rules are less rigid
than others and can be “broken”
https://learning.oreilly.com/library/view/hands-on-data-visualization/9781492085997/ch06.html#idm45750379312520
Exploratory Data Analysis (EDA)
● Classical statistics focused almost exclusively on inference, a sometimes complex
set of procedures for drawing conclusions about large populations based on small
samples.
● In 1962, John W. Tukey called for a reformation of statistics in his seminal paper
“The Future of Data Analysis” [Tukey-1962].
● He proposed a new scientific discipline called data analysis that included
statistical inference as just one component.
● Tukey forged links to the engineering and computer science communities (he
coined the terms bit, short for binary digit, and software).
● His original tenets are surprisingly durable and form part of the foundation for data
science.
https://learning.oreilly.com/library/view/practical-statistics-for/9781492072935/ch01.html
Exploratory Data Analysis (EDA)
● The field of exploratory data analysis was established with Tukey’s 1977
now-classic book Exploratory Data Analysis [Tukey-1977].
● Tukey presented simple plots (e.g., boxplots, scatterplots) that, along with
summary statistics (mean, median, quantiles, etc.), help paint a picture of a
data set.
https://learning.oreilly.com/library/view/practical-statistics-for/9781492072935/ch01.html
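A small sketch of the summary statistics that sit behind a boxplot, using only Python's standard library. The data values are hypothetical, and the 1.5 × IQR cutoff is the usual boxplot convention rather than something stated on the slide.

```python
from statistics import mean, median, quantiles

# Hypothetical data set (illustrative numbers only).
data = [2, 4, 4, 5, 7, 9, 10, 12, 15, 40]

# Quartiles; the "inclusive" method treats the data as the whole population.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

summary = {
    "min": min(data),
    "q1": q1,
    "median": q2,
    "q3": q3,
    "max": max(data),
    "mean": mean(data),
}

# Boxplot convention: points more than 1.5 * IQR beyond the box are flagged.
iqr = q3 - q1
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(summary)
print("possible outliers:", outliers)
```

The box of a boxplot spans q1 to q3, the line inside it marks the median, and the flagged points are drawn individually.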
Types of Data
There are two basic types of structured data: numeric and categorical.
Numeric: Data that are expressed on a numeric scale.
● Continuous: Data that can take on any value in an interval. (Synonyms:
interval, float, numeric)
○ such as wind speed or time duration
● Discrete: Data that can take on only integer values, such as counts.
(Synonyms: integer, count)
○ such as the count of the occurrence of an event.
https://learning.oreilly.com/library/view/practical-statistics-for/9781492072935/ch01.html
Types of Data- Categorical
Categorical: Data that can take on only a specific set of values representing a
set of possible categories. (Synonyms: enums, enumerated, factors, nominal)
● Binary: A special case of categorical data with just two categories of
values, e.g., 0/1, true/false. (Synonyms: dichotomous, logical, indicator,
boolean)
● Ordinal: Categorical data that has an explicit ordering. (Synonym:
ordered factor)
https://learning.oreilly.com/library/view/practical-statistics-for/9781492072935/ch01.html
Types of Data- Categorical
● Categorical data takes only a fixed set of values, such as a type of TV
screen (plasma, LCD, LED, etc.) or a state name (Alabama, Alaska, etc.).
● Knowing that data is categorical can act as a signal telling software how
statistical procedures, such as producing a chart or fitting a model, should behave.
○ In R, ordinal data can be represented as an ordered.factor,
preserving a user-specified ordering in charts, tables, and models.
○ In Python, scikit-learn supports ordinal data with the
sklearn.preprocessing.OrdinalEncoder.
● Storage and indexing can be optimized (as in a relational database).
● The possible values a given categorical variable can take are
enforced in the software (like an enum).
https://learning.oreilly.com/library/view/practical-statistics-for/9781492072935/ch01.html
Types of Data- Categorical
● The third “benefit” can lead to unintended or unexpected behavior:
● In R: before version 4.0, the default behavior of data import functions
(e.g., read.csv) was to automatically convert a text column into a factor
(since R 4.0, stringsAsFactors defaults to FALSE).
○ Subsequent operations on a factor column will assume that the only
allowable values for that column are the ones originally imported, and
assigning a new text value will produce a warning and an
NA (missing value).
● In Python: The pandas package in Python will not make such a
conversion automatically.
○ However, you can specify a column as categorical explicitly in the
read_csv function.
https://learning.oreilly.com/library/view/practical-statistics-for/9781492072935/ch01.html
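The sketch below shows one way to declare a categorical column in pandas, assuming pandas is installed. The CSV content, column names, and category labels are hypothetical and exist only for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content (illustrative values only).
csv_text = "screen,size\nLCD,small\nLED,large\nplasma,medium\nLCD,large\n"

# Declare the text column as categorical explicitly at import time.
df = pd.read_csv(io.StringIO(csv_text), dtype={"screen": "category"})
print(df["screen"].dtype)  # the declared column is categorical
print(df["size"].dtype)    # the other column is left as plain text

# An ordinal variable: a fixed set of categories with an explicit order.
size = pd.Categorical(
    ["medium", "small", "large", "huge"],
    categories=["small", "medium", "large"],
    ordered=True,
)

# "huge" is not an allowed category, so it becomes a missing value (code -1).
print(size.codes.tolist())
print(size.isna().tolist())
```

The last two lines illustrate the enforcement of allowed values: a value outside the declared categories is stored as missing rather than silently added.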
Types of Data
https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/ch02.html
EDA terminology
https://learning.oreilly.com/library/view/practical-statistics-for/9781492072935/ch01.html#Percentiles
Descriptive Statistics
https://learning.oreilly.com/library/view/practical-statistics-for/9781492072935/ch01.html#Percentiles
EDA
● Documentation on data frames in R
● Documentation on data frames in Python
https://learning.oreilly.com/library/view/practical-statistics-for/9781492072935/ch01.html#Percentiles
EDA
● Variables with measured or count data might have thousands of
distinct values.
● A basic step in exploring your data is getting a “typical value” for
each feature (variable): an estimate of where most of the data is
located (i.e., its central tendency).
● At first glance, summarizing data might seem fairly trivial: just take
the mean of the data.
● While the mean is easy to compute and expedient to use, it may
not always be the best measure for a central value.
● For this reason, statisticians have developed and promoted several
alternative estimates to the mean.
https://learning.oreilly.com/library/view/practical-statistics-for/9781492072935/ch01.html#Percentiles
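To make the point about the mean concrete, here is a short sketch comparing it with two common alternatives, the median and a trimmed mean. The income figures are hypothetical, and `trimmed_mean` is a helper written for this example, not a standard-library function.

```python
from statistics import mean, median

# Hypothetical incomes in thousands, with one extreme value (illustrative only).
incomes = [32, 35, 38, 40, 41, 43, 45, 47, 50, 400]


def trimmed_mean(values, trim):
    """Mean after dropping the `trim` smallest and `trim` largest values."""
    ordered = sorted(values)
    kept = ordered[trim:len(ordered) - trim]
    return mean(kept)


print(mean(incomes))             # pulled upward by the single extreme value
print(median(incomes))           # middle of the sorted data
print(trimmed_mean(incomes, 1))  # mean after removing one value from each end
```

With these numbers the mean lands far above nine of the ten values, while the median and trimmed mean stay close to where most of the data is located.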
https://www.ibm.com/blogs/internet-of-things/what-is-the-iot/
Internet of Things (IoT)
● The Internet of Things, or IoT, refers to the billions of physical
devices around the world that are now connected to the internet,
all collecting and sharing data.
● Pretty much any physical object can be transformed into an IoT
device if it can be connected to the internet to be controlled or to
communicate information.
● A lightbulb that can be switched on using a smartphone app is an
IoT device, as is a motion sensor or a smart thermostat in your
office or a connected streetlight.
https://www.zdnet.com/article/what-is-the-internet-of-things-everything-you-need-to-know-about-the-iot-right-now/
Data and Analytics Concepts and Terminology - Advanced
Internet of Things (IoT)
https://learning.oreilly.com/library/view/big-data-fundamentals/9780134291185/ch01.xhtml#ch01lev2sec3
Internet of Things (IoT)
● The broadening coverage of the Internet and the
proliferation of cellular and Wi-Fi networks have enabled
more people and their devices to be continuously active in
virtual communities.
● Coupled with the proliferation of Internet-connected
sensors, this forms the underpinnings of the Internet of Things (IoT), a
vast collection of smart Internet-connected devices.
● This in turn has resulted in a massive increase in the
number of available data streams.
https://www.zdnet.com/article/what-is-the-internet-of-things-everything-you-need-to-know-about-the-iot-right-now/
https://aws.amazon.com/what-is-cloud-computing/
Cloud Computing
● Cloud computing advancements have led to the creation of
environments that are capable of providing highly scalable,
on-demand IT resources.
● Cloud computing environments can be leased via pay-as-you-go models.
● Businesses can leverage the infrastructure, storage, and
processing capabilities of these environments to build out
scalable data analytics solutions for large-scale automated analysis.
XaaS - Cloud product offerings
https://learning.oreilly.com/library/view/kubernetes-application-developer/9781484280324/html/511560_1_En_1_Chapter.xhtml
Data Ecosystem (Unified Analytics Platform)
● The term data ecosystem refers to the programming languages,
packages, algorithms, cloud-computing services, and general
infrastructure an organization uses to collect, store, analyze, and
leverage data.
● No two organizations leverage the same data in the same way.
● As such, each organization has a unique data ecosystem.
● Also referred to as a Unified Analytics Platform (UAP).
https://online.hbs.edu/Documents/a-beginners-guide-to-data-and-analytics.pdf?hsCtaTracking=2bb079d4-1f8a-4052-95482430ccb52d48%7C4d888017-3b60-48fb-abd2-754f4106abb4
Data Life Cycle
● While the data ecosystem encompasses everything that handles,
organizes, and processes data, the data life cycle describes the
path data takes from when it’s first generated to when it’s
interpreted into actionable insights.
● This life cycle can be split into eight steps: generation, collection,
processing, storage, management, analysis, visualization, and
interpretation.
● A data project’s steps are often described as a cycle because the
lessons learned and insights gleaned from one project typically
inform the next.
● In this way, the final step of the process feeds back into the first,
enabling you to start again with new goals and learnings.
https://online.hbs.edu/Documents/a-beginners-guide-to-data-and-analytics.pdf?hsCtaTracking=2bb079d4-1f8a-4052-95482430ccb52d48%7C4d888017-3b60-48fb-abd2-754f4106abb4
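One way to picture the cycle described above is as an ordered list of stages where the last one leads back to the first. The sketch below uses the eight step names from the slide; the `Stage` enum and `next_stage` helper are hypothetical names introduced only for illustration.

```python
from enum import Enum


class Stage(Enum):
    """The eight steps of the data life cycle, in order."""
    GENERATION = 1
    COLLECTION = 2
    PROCESSING = 3
    STORAGE = 4
    MANAGEMENT = 5
    ANALYSIS = 6
    VISUALIZATION = 7
    INTERPRETATION = 8


def next_stage(stage):
    """Return the following stage; the last stage wraps around to the first."""
    stages = list(Stage)
    return stages[(stages.index(stage) + 1) % len(stages)]


print(next_stage(Stage.ANALYSIS).name)
print(next_stage(Stage.INTERPRETATION).name)  # wraps back to the start of the cycle
```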
7 Data & Analytics Skills You Need
Critical Thinking
● If you’re interested in using data to solve business problems, you
need to be adept at thinking critically about challenges and
solutions.
● While data can provide many answers, it’s nothing without a
human’s discerning eye. “From the first steps of determining the
quality of a data source to determining the success of an algorithm,
critical thinking is at the heart of every decision data scientists—and
those who work with them—make,” Tingley (HBS)
https://online.hbs.edu/Documents/a-beginners-guide-to-data-and-analytics.pdf?hsCtaTracking=2bb079d4-1f8a-4052-95482430ccb52d48%7C4d888017-3b60-48fb-abd2-754f4106abb4
7 Data & Analytics Skills You Need
Hypothesis Formation and Testing
● At the heart of data and analytics is the desire to answer questions.
● The proposed explanations for these leading questions are called
hypotheses, which must be formed before analysis takes place.
● An example of a hypothesis is, “I predict that a person’s likelihood of
recommending our product is directly proportional to their reported
satisfaction with the product.”
● You predict the data will show this trend and must prove or disprove the
hypothesis through analysis.
● Without a hypothesis, your analysis has no clear direction.
https://online.hbs.edu/Documents/a-beginners-guide-to-data-and-analytics.pdf?hsCtaTracking=2bb079d4-1f8a-4052-95482430ccb52d48%7C4d888017-3b60-48fb-abd2-754f4106abb4
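A first look at the example hypothesis could be a simple correlation between the two survey scores. The sketch below uses hypothetical responses and a hand-written Pearson correlation; a strong correlation is consistent with the hypothesis but is not by itself a formal test, which would also need a significance check.

```python
from math import sqrt

# Hypothetical survey responses on 1-10 scales (illustrative numbers only).
satisfaction = [2, 4, 5, 6, 7, 8, 9, 10]
recommend = [1, 3, 5, 5, 8, 8, 9, 10]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(satisfaction, recommend)
print(f"correlation = {r:.2f}")
```

A value close to 1 means the two scores rise together almost linearly; a value near 0 would suggest the hypothesis does not hold for this data.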
7 Data & Analytics Skills You Need
Data Wrangling (Data Preparation)
● The process of cleaning raw data in preparation for analysis. It involves
identifying and resolving mistakes, filling in missing data, and organizing
and transferring it into an easily understandable format.
● This is an important skill for anyone dealing with data to acquire because
it leads to a more efficient and organized data analysis process.
● You can extract valuable insights from data more quickly when it’s
cleaned and in its optimal viewing format.
https://online.hbs.edu/Documents/a-beginners-guide-to-data-and-analytics.pdf?hsCtaTracking=2bb079d4-1f8a-4052-95482430ccb52d48%7C4d888017-3b60-48fb-abd2-754f4106abb4
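The wrangling steps named above (resolving mistakes, filling in missing data, and organizing the result) can be sketched with plain Python. The records, field names, and the `STATE_FIXES` lookup are hypothetical, and filling with the median is just one possible choice.

```python
from statistics import median

# Hypothetical raw records with inconsistent text and missing values.
raw = [
    {"customer": " alice ", "state": "ma", "spend": "120.50"},
    {"customer": "Bob", "state": "MA", "spend": ""},
    {"customer": "CAROL", "state": "N.Y.", "spend": "80"},
    {"customer": "dave", "state": "ny", "spend": "n/a"},
]

STATE_FIXES = {"N.Y.": "NY"}  # hypothetical lookup of known mistakes


def parse_spend(text):
    """Convert a spend field to float, or None when it is missing or invalid."""
    try:
        return float(text)
    except ValueError:
        return None


# Resolve mistakes and put every field into a consistent format.
clean = []
for row in raw:
    state = row["state"].strip().upper()
    clean.append({
        "customer": row["customer"].strip().title(),
        "state": STATE_FIXES.get(state, state),
        "spend": parse_spend(row["spend"]),
    })

# Fill missing spend with the median of the known values.
fill = median(r["spend"] for r in clean if r["spend"] is not None)
for r in clean:
    if r["spend"] is None:
        r["spend"] = fill

for r in clean:
    print(r)
```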
7 Data & Analytics Skills You Need
Mathematical Ability
● You don’t have to be a mathematician to become data literate, but strong
math skills become increasingly important as you deal with more complex
analyses.
● A seasoned data professional needs a solid understanding of statistics,
probability, linear algebra, and multivariable calculus.
● Data scientists often call on statistical methods to find structure in data
and make predictions, and linear algebra and calculus can make
machine-learning algorithms easier to comprehend.
● If you’re not a data scientist or analyst, your work may not require you to
understand the more complex mathematical concepts, but having a basic
understanding of statistics can go a long way.
https://online.hbs.edu/Documents/a-beginners-guide-to-data-and-analytics.pdf?hsCtaTracking=2bb079d4-1f8a-4052-95482430ccb52d48%7C4d888017-3b60-48fb-abd2-754f4106abb4
7 Data & Analytics Skills You Need
Data Visualization
● It’s crucial to know how to transform raw data into compelling visuals that
tell a story.
● Rather than simply presenting a list of values to your stakeholders, it’s
more effective to visually communicate data in a way that’s easily
digestible.
● Data visualization tools, a form of software designed to present data, allow
you to input a dataset and visually manipulate it.
● Most, but not all, come with built-in templates you can use to generate
basic visualizations (pie charts, bar charts, and histograms).
● Examples include Microsoft Excel and Power BI, Google Charts, Tableau,
Zoho Analytics, Datawrapper, and Infogram.
https://online.hbs.edu/Documents/a-beginners-guide-to-data-and-analytics.pdf?hsCtaTracking=2bb079d4-1f8a-4052-95482430ccb52d48%7C4d888017-3b60-48fb-abd2-754f4106abb4
7 Data & Analytics Skills You Need
Machine Learning (ML)
● As artificial intelligence (AI) grows in popularity, machine learning is a
highly valuable skill for professionals working with big data.
● Machine learning refers to the use of computer algorithms that
automatically learn from and adapt in response to data.
● Some business applications of machine learning include risk
management, performance analysis, trading, and automation.
● Even if you’re not responsible for writing code, knowing the basics of
machine learning can help you gain a deeper understanding of your
organization and boost efficiency through automation.
https://online.hbs.edu/Documents/a-beginners-guide-to-data-and-analytics.pdf?hsCtaTracking=2bb079d4-1f8a-4052-95482430ccb52d48%7C4d888017-3b60-48fb-abd2-754f4106abb4
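As a very small illustration of an algorithm that learns from data, the sketch below fits a straight line by least squares and uses it to predict a new value. The data and variable names are hypothetical, and a least-squares line is only one of the simplest possible examples of learning from data.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


# Hypothetical training data: ad spend vs. units sold (illustrative only).
spend = [1, 2, 3, 4, 5]
units = [12, 14, 16, 18, 20]

slope, intercept = fit_line(spend, units)


def predict(x):
    """Predict units sold for a given spend using the fitted line."""
    return slope * x + intercept


print(predict(6))
```

Refitting with additional observations changes the slope and intercept, which is the sense in which the model adapts in response to data.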
Data Science in Business
● In business, data science is used to
○ collect,
○ organize,
○ maintain data
○ often to write algorithms that make large-scale analysis
possible.
● When designed correctly and tested thoroughly, algorithms can
catch information or trends that humans miss.
● They can also significantly speed up the processes of gathering and
analyzing data.
https://online.hbs.edu/Documents/a-beginners-guide-to-data-and-analytics.pdf?hsCtaTracking=2bb079d4-1f8a-4052-95482430ccb52d48%7C4d888017-3b60-48fb-abd2-754f4106abb4
Data Science in Business
● You can use data science to:
○ Gain customer insights:
■ Data about your customers can reveal details about their
habits, demographics, preferences, and aspirations.
■ A foundational understanding of data science can help you
make sense of and leverage it to improve user
experiences and inform retargeting efforts.
https://online.hbs.edu/Documents/a-beginners-guide-to-data-and-analytics.pdf?hsCtaTracking=2bb079d4-1f8a-4052-95482430ccb52d48%7C4d888017-3b60-48fb-abd2-754f4106abb4
Data Science in Business
You can use data science to:
Increase security:
● Data science can increase your business’s security and protect sensitive
information.
● For example, ML algorithms can detect bank fraud faster and with
greater accuracy than humans, simply because of the sheer volume
of data generated every day.
Inform internal finances:
● Your organization’s financial team can utilize data science to create
reports, generate forecasts, and analyze financial trends.
● Data on a company’s cash flows, assets, and debts is constantly
gathered, which financial analysts use to manually or algorithmically
detect trends in financial growth or decline.
https://online.hbs.edu/Documents/a-beginners-guide-to-data-and-analytics.pdf?hsCtaTracking=2bb079d4-1f8a-4052-95482430ccb52d48%7C4d888017-3b60-48fb-abd2-754f4106abb4
Data Science in Business
You can use data science to:
Streamline manufacturing:
● Manufacturing machines gather data from production processes at
high volumes.
● In cases where the volume of data collected is too high for a human
to manually analyze it, an algorithm can be written to clean, sort,
and interpret it quickly and accurately to gather insights that drive
cost-saving improvements.
Predict future market trends:
● Collecting and analyzing data on a larger scale can enable you to
identify emerging trends in your market.
● By staying up to date on the behaviors of your target market, you
can make business decisions that allow you to get ahead of the
curve.
https://online.hbs.edu/Documents/a-beginners-guide-to-data-and-analytics.pdf?hsCtaTracking=2bb079d4-1f8a-4052-95482430ccb52d48%7C4d888017-3b60-48fb-abd2-754f4106abb4