Class1__Introduction_BI_Introduction

Big Data Issues &
Introduction to Business
Intelligence
Dr. Chang Liu
My Story …

Chair and Professor of MIS at the OM&IS Department, College
of Business, Northern Illinois University.
• Taught Database Management, Web Computing, Business Information
Technologies, and Business Intelligence Applications courses
• Received MIS Undergraduate Teaching Award in 2005.
• Received MIS Graduate Teaching Awards in 2001, 2002, and 2006.
• Recognized as one of the top 20 researchers by Information & Management in 2007


http://www.niutoday.info/2013/02/15/omis-chair-earns-2012-citation-of-excellence/
Born and raised in Beijing, China
Ph.D. from Mississippi State University in 1997: Doctor of
Business Administration in Management Information Systems
(MIS)

Taught at Beijing Institute of Business from 1988 to 1992

Served as a Project Manager for Motorola Inc. Beijing office
from 1992 to 1994

Interest:
• Ping Pong, Badminton
Department of
Operations Management & Information Systems
OM & IS
http://cob.niu.edu/omis
Information Technology + Business Processes
OM&IS Programs
• B.S. in Operations and Information Management
• M.S. in Management Information Systems
• Certificate Programs
  • Certificate of Graduate Study in Business Analytics Using SAP Software
  • Certificate of Undergraduate Study in Business Analytics Using SAP Software
  • Certificate of Graduate Study in Management Information Systems
  • Certificate of Undergraduate Study in Information Systems
  • Certificate of Undergraduate Study in Service Management
  • SAP Student Recognition Award
OM&IS Department

11 tenured & tenure-track faculty

7 full-time/part-time instructors

1 program advisor & internship coordinator

1 department secretary

200 Undergraduate OM&IS majors

95 MIS master's students
Top Ranked Programs
In 2013, Businessweek ranked the NIU College of Business's
Information Systems program #34 and its Operations Management
program #60 nationwide.
Introductions

Course Syllabus

Name

Background

Hobbies
Big Data Issue



Big data is a term that describes the
exponential growth and availability of data.
It is estimated that by 2020 the digital
universe will have grown to 44 times its
size in 2009.
Businesses must employ a data
management, analysis, and security plan in
order to stay compliant and competitive.
Big Data Big Opportunities
Business & Technology
Priorities in 2013
Rank  Top 10 Business Priorities                      Top 10 Technology Priorities
1     Increasing enterprise growth                    Business intelligence and analytics
2     Delivering operational results                  Mobile technologies
3     Reducing enterprise costs                       Cloud computing
4     Attracting and retaining new customers          Collaboration technologies
5     Improving IT applications & infrastructure      Legacy modernization
6     Creating new products & services (innovation)   IT management
7     Improving efficiency                            CRM
8     Attracting and retaining the workforce          Virtualization
9     Implementing analytics and big data             Security
10    Improving business processes                    ERP Applications
Source: Gartner EXP (January 2013)
Worldwide BI, Analytics and Performance
Management Revenue Estimates for 2011
(Millions of U.S. Dollars)

Company        2011      2011 Market  2010      2010 Market  2010-2011
               Revenue   Share (%)    Revenue   Share (%)    Growth (%)
SAP             2,883.5   23.6         2,413.1   23.0        19.5
Oracle          1,913.5   15.6         1,645.8   15.7        16.3
SAS Institute   1,542.8   12.6         1,386.5   13.2        11.3
IBM             1,477.6   12.1         1,222.0   11.6        20.9
Microsoft       1,059.9    8.7           913.7    8.7        16.0
Others          3,363.8   27.5         2,931.1   27.9        14.8
Total          12,241.0  100.0        10,512.2  100.0        16.4

Source: Gartner (March 2012),
http://www.gartner.com/it/page.jsp?id=1971516
How Come It Takes Me So
Long to Get Answers to
Simple Questions About My
Business?
Technologies for Business Intelligence
Why BI / What’s the problem?

Businesses (people, really) can’t get
answers efficiently.
32%!!!
An example – Let’s start small

Two spreadsheets. One has student
name, znumber (student id) and major,
the other has student name, znumber and
quiz score.
??
What’s “the answer”?


Database approach
Centralized, any-time, any-place data in a
data warehouse.
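A minimal sketch of the database approach for the two-spreadsheet example, using Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions made up for illustration; the point is only that once both sheets live in one database, a question such as "average quiz score by major" is a single join on znumber.

```python
import sqlite3

# Hypothetical sample rows standing in for the two spreadsheets.
majors = [("Ann Lee", "z100", "MIS"), ("Bob Ray", "z200", "Marketing")]
scores = [("Ann Lee", "z100", 9), ("Bob Ray", "z200", 7)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (znumber TEXT PRIMARY KEY, name TEXT, major TEXT)")
conn.execute("CREATE TABLE quiz (znumber TEXT REFERENCES student(znumber), score INTEGER)")
conn.executemany("INSERT INTO student (name, znumber, major) VALUES (?, ?, ?)", majors)
# The name is stored once, in student; quiz rows carry only the znumber.
conn.executemany("INSERT INTO quiz (znumber, score) VALUES (?, ?)",
                 [(z, s) for _name, z, s in scores])

# Average quiz score by major: one join on the shared znumber field.
avg_by_major = dict(conn.execute(
    "SELECT s.major, AVG(q.score) FROM student s "
    "JOIN quiz q ON q.znumber = s.znumber GROUP BY s.major"
).fetchall())
print(avg_by_major)
```

With spreadsheets, the same answer would require matching rows by hand or with lookup formulas in every copy of the file.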
And the biggest? The “HITECH Act”

The purpose of the HITECH Act is to
promote the use of health information
technology with a goal of utilization of an
electronic health record for each person in
the United States by 2014.
What can YOU do?
(technology-wise)



Step 1: Get familiar with Microsoft
Access (or any Relational Database
Management System, RDBMS).
Step 2: Make Access centrally
accessible to your employees.
Step 3: Build a company-wide
data warehouse.
What can YOU do?
(process-wise)


Step 1: Identify where your corporate
information comes from.
Step 2: For things that are spreadsheet-based,
consider moving that data out of spreadsheets.
Some Definitions -1

“Data” is characters, fields, and files that are stored
somewhere.

“Information” is data with meaning and context. It
is an organizational asset.

A database is a collection of related data.

A relational database is data stored in a table
format.

The most common use of a database is an “ad hoc”
query. For example, “How many cases of bottled
water did we sell to college students in September
vs. August?”
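The bottled-water question could be asked as an ad hoc query roughly like this. It is a sketch only: the sales table, its columns, and the sample rows are hypothetical, and Python's built-in sqlite3 module stands in for whatever DBMS the company actually runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, segment TEXT, sale_date TEXT, cases INTEGER)")
# Made-up rows; a real table would be fed by the order-entry system.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("bottled water", "college student", "2013-08-05", 40),
    ("bottled water", "college student", "2013-09-02", 65),
    ("bottled water", "college student", "2013-09-20", 30),
    ("bottled water", "retiree", "2013-09-11", 12),
    ("soda", "college student", "2013-09-15", 50),
])

# Total cases per month, restricted to one product, one segment, two months.
cases_by_month = dict(conn.execute(
    "SELECT strftime('%m', sale_date), SUM(cases) FROM sales "
    "WHERE product = ? AND segment = ? "
    "AND strftime('%m', sale_date) IN ('08', '09') "
    "GROUP BY strftime('%m', sale_date)",
    ("bottled water", "college student"),
).fetchall())
print(cases_by_month)
```

Nobody had to plan for this question in advance, which is what makes the query "ad hoc".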
Relationships of Users, Database
Applications, DBMS, and Database
Application
#1
Application
#2
Application
#3
DBMS
Database
containing
centralized
shared data
Hierarchy of Database
Elements
Fields
Records
Tables
(files)
Database
+
Metadata
Class Exercise – Database
Structure

You have been hired as the Personnel Director for a
medium-sized firm (500 employees) and are
expected to implement a database system to track
employee compensation. You want to be able to
calculate the age of every employee as well as
length of service. You want to know each employee’s
most recent performance evaluation. You want to be
able to calculate the amount of the most recent
salary increase, both in dollars and as a percentage
of the previous salary. You also want to know how
long the employee had to wait for that increase; that is,
how much time elapsed between the present
and previous salary. Design a database table capable
of providing this information.
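One possible starting point for the exercise, offered as a sketch rather than the expected answer. The idea is to store dates and amounts and derive everything else (age, length of service, raise, waiting time). All column names, the sample employee, and the fixed "as of" date are assumptions.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name TEXT,
        birth_date TEXT,            -- age is derived from this
        hire_date TEXT,             -- length of service is derived from this
        last_evaluation TEXT,       -- most recent performance evaluation
        current_salary REAL,
        current_salary_date TEXT,   -- date the current salary took effect
        previous_salary REAL,
        previous_salary_date TEXT   -- date the previous salary took effect
    )""")
conn.execute(
    "INSERT INTO employee VALUES (1, 'Pat Doe', '1980-06-15', '2005-03-01', "
    "'Exceeds expectations', 52000, '2013-01-01', 50000, '2011-07-01')")

def whole_years(start, end):
    """Completed years between two dates."""
    return end.year - start.year - ((end.month, end.day) < (start.month, start.day))

as_of = date(2013, 9, 1)  # fixed date so the example is reproducible
birth, hire, cur, cur_date, prev, prev_date = conn.execute(
    "SELECT birth_date, hire_date, current_salary, current_salary_date, "
    "previous_salary, previous_salary_date FROM employee WHERE emp_id = 1").fetchone()

age = whole_years(date.fromisoformat(birth), as_of)
service_years = whole_years(date.fromisoformat(hire), as_of)
raise_dollars = cur - prev
raise_percent = 100 * raise_dollars / prev
wait_days = (date.fromisoformat(cur_date) - date.fromisoformat(prev_date)).days
print(age, service_years, raise_dollars, raise_percent, wait_days)
```

Storing the birth date instead of the age keeps the table correct over time; a stored age would be wrong after the next birthday.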
Q. How can you make it work?
A. Centralized database, allowing for BI and mining.
Server - responds to client requests
DBMS - the program. Manages
interaction with databases.
request
response
Client - makes requests of
the DBMS server
database - the collection of data.
Created and defined to meet the
needs of the organization.
Database Management System (DBMS)
• a program for creating & managing databases; ex.
Oracle, MS-Access, SQL Server, Sybase.
• Basically synonymous with “database” at this point.
What Is File Processing?


The “old” way of doing things; still
often used in practice.
Separate information stored on
separate files.
File-Processing Systems
Problems with
File-Processing Systems






Data are separated and isolated
Data are often duplicated
Application program dependent
Incompatible data files
Difficult to understand
Create problems with data integrity because
data is:
Duplicated & Inconsistent
File System: An Example
Duplicate
Data
Benefits of
Database Systems






Data is integrated
Data duplication is reduced
Data is consistent
Data is program independent
Data is easy to understand
Data is:
Shared & Integrated
Class Exercise – Database
Relationship


How do you create two tables in a database,
one for Customer, the other for Order
information?
How do you build a relationship between the
two tables in a database?
Some Definitions - 2



A relational database has numerous tables
(like spreadsheets) which are tied together
by common fields.
Primary key: A field that uniquely
identifies each record in a table.
Common Field (Foreign Key): A field that
appears as a non-primary key (field) in one
table and as a primary key in another table.
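The two definitions above can be sketched with a Customer table and an Order table, again using Python's built-in sqlite3 module. The table is named orders because ORDER is a reserved word in SQL; that name, the columns, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
# customer_id is the primary key: it uniquely identifies each customer.
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
# In orders, customer_id is a foreign key pointing back to customer.
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount REAL
    )""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme Co.')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 250.0), (11, 1, 99.5)])

# The common field ties the two tables together in a join.
orders_for_acme = conn.execute(
    "SELECT o.order_id, o.amount FROM customer c "
    "JOIN orders o ON o.customer_id = c.customer_id "
    "WHERE c.name = ? ORDER BY o.order_id", ("Acme Co.",)).fetchall()

# An order for a customer that does not exist is rejected by the DBMS.
try:
    conn.execute("INSERT INTO orders VALUES (12, 999, 5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(orders_for_acme, rejected)
```

The rejected insert is the kind of integrity check that separate spreadsheets cannot provide on their own.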
Database Management? Ugh,
just give me a spreadsheet

There are a number of ‘versions’ of the
spreadsheet system, all with common
features …
• Lack of data definition/documentation
• Lack of ownership
• Lack of support
• Lack of change control
But Excel can be a good
front end tool:
Pivot Tables & PowerPivot

Pivot Tables can be used for data analysis
and presentation as a front end tool if all
of your data is in one spreadsheet.

Demo

Exercise Assignments
Business Intelligence (BI)
Definition

BI is a set of technologies for turning raw
data into actionable information
• Leads to better decisions that are in line with business
goals and objectives
• Helps organizations operate more efficiently
• Can lead to the discovery of new opportunities
• The BI market is growing rapidly.

BI gives decision makers and operational
staff access to ORGANIZATIONAL data
• Allows them to interact with the data
• Analyzes it
• Uses the data to perform forecasts, etc.
Data Warehouse

A subject-oriented, integrated, time-variant,
non-updatable collection of data used in
support of management decision-making
processes
• Subject-oriented: e.g. customers, patients,
students, products
• Integrated: consistent naming conventions,
formats, encoding structures; from multiple
data sources
• Time-variant: can study trends and changes
• Non-updatable: read-only, periodically
refreshed
Need for Data Warehousing


Integrated, company-wide view of high-quality
information (from disparate databases)
Separation of operational and informational
systems and data (for improved performance)
• Operational system – a system that is used to run
a business in real time, based on current data; also
called a system of record
• Informational system – a system designed to
support decision making based on historical point-in-time
and prediction data for complex queries or
data-mining applications
Data Warehouse Architectures




Independent Data Mart
Dependent Data Mart and Operational Data Store
Logical Data Mart and Near Real-Time Data Warehouse
Three-Layer Architecture
All involve some form of extract, transform and load (ETL)
Data Mart: A data warehouse that is limited in scope to support
a single business function or process.
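A toy illustration of extract, transform and load. Two source systems describe customers with different field names, gender codes, and date formats, and the transform step brings them to one convention before loading. Every name, format, and row here is hypothetical; real ETL tools do the same thing at far larger scale.

```python
import sqlite3

# Extract: hypothetical rows from two operational systems with different conventions.
crm_rows = [{"cust": "Ann Lee", "gender": "F", "joined": "02/15/2013"}]
billing_rows = [{"customer_name": "BOB RAY", "sex": "male", "start": "2013-03-01"}]

def transform_crm(row):
    """Convert MM/DD/YYYY dates to ISO format; names and codes already match."""
    month, day, year = row["joined"].split("/")
    return (row["cust"].title(), row["gender"], f"{year}-{month}-{day}")

def transform_billing(row):
    """Normalize name casing and map spelled-out gender to a one-letter code."""
    gender = {"male": "M", "female": "F"}[row["sex"]]
    return (row["customer_name"].title(), gender, row["start"])

# Load: write the cleaned rows into a single warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customer_dim (name TEXT, gender TEXT, start_date TEXT)")
clean = [transform_crm(r) for r in crm_rows] + [transform_billing(r) for r in billing_rows]
warehouse.executemany("INSERT INTO customer_dim VALUES (?, ?, ?)", clean)

loaded = warehouse.execute("SELECT * FROM customer_dim ORDER BY name").fetchall()
print(loaded)
```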
Figure 1: Independent data mart
data warehousing architecture
Data marts:
Mini-warehouses, limited in scope
Separate ETL for each
independent data mart
Data access complexity
due to multiple data marts
Figure 2: Dependent data mart with
operational data store: a three-level
architecture
ODS provides option for
obtaining current data
Simpler data access
Single ETL for
enterprise data warehouse (EDW)
Dependent data marts
loaded from EDW
Figure 3: Three-layer data architecture for a data
warehouse
Derived Data

Objectives
• Ease of use for decision support applications
• Fast response to predefined user queries
• Customized data for particular target audiences
• Ad-hoc query support
• Data mining capabilities
Star Schema – A simple database design in
which dimensional data are separated from
fact data
Most common data model = star schema
(also called “dimensional model”)
How can the dimensional model
solve the problem of analyzing data?


A retailer, John Doe, sells products to customers
over a period of time
Here are some questions John needs to ask to
analyze his business:
• How many units of products did I sell altogether?
• Which products sold the greatest number of units?
• How did sales in Week 1 compare with sales in Week 2?
• How did sales perform by street?
• Who are my top two customers?
• How do sales split by customer gender and by street?
To provide a solution for John, most people think SQL
is the answer – write different SQL for each question
• Slow, more joins, more load on the server
Dimensional Model (Cube) to the
Rescue

John needs to have a BI tool (such as SQL Server
Analysis Services or SAP Business Information
Warehouse) to create a cube for him
• The cube stores the numeric answers for all
combinations of product, store, customer, and time
• The numbers can easily be analyzed by customer, by
store, and by product. Customer, Time, Store, and
Product are the cube’s dimensions
• John does not need to run database queries as the
answers are pre-calculated
• All he needs is a nice client interface; e.g., Microsoft
Excel
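To make "pre-calculated answers" concrete, here is a toy cube in plain Python. It is not how SQL Server Analysis Services or SAP BW work internally; it only shows the idea of storing a total for every combination of product, store, customer, and time, so that each question becomes a lookup. The sales rows are made up.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sales rows: (product, store, customer, week, units).
sales = [
    ("apple", "Main St", "Ann", 1, 3),
    ("apple", "Oak St", "Bob", 1, 2),
    ("pear", "Main St", "Ann", 2, 5),
    ("pear", "Oak St", "Cid", 2, 1),
]

ALL = "*"  # marker meaning "summed over every value of this dimension"
cube = defaultdict(int)
for *dims, units in sales:
    # Each row contributes to 2**4 cells: every dimension is either kept or rolled up.
    for cell in product(*[(value, ALL) for value in dims]):
        cube[cell] += units

# John's questions are now lookups, not new queries.
total_units = cube[(ALL, ALL, ALL, ALL)]
units_week1 = cube[(ALL, ALL, ALL, 1)]
units_week2 = cube[(ALL, ALL, ALL, 2)]
units_by_product = {p: cube[(p, ALL, ALL, ALL)] for p in ("apple", "pear")}
ann_at_main = cube[(ALL, "Main St", "Ann", ALL)]
print(total_units, units_week1, units_week2, units_by_product, ann_at_main)
```

The trade-off is storage and load time: the cube grows with the number of dimension combinations, which is why it is built ahead of time rather than on each request.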
Figure 4: Components of a star schema
Fact tables contain
factual or quantitative
data
1:N relationship between
dimension tables and fact
tables
Dimension tables are denormalized
to maximize performance
Dimension tables contain
descriptions about the subjects of
the business
Figure 5: Star schema example
Fact table provides statistics for
sales broken down by product,
period and store dimensions
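A star schema along these lines can be sketched with Python's built-in sqlite3 module: one sales fact table in the middle, with product, period, and store dimension tables around it. The column names and sample rows are assumptions, not the contents of the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptions of the subjects of the business.
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE period  (period_id  INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    CREATE TABLE store   (store_id   INTEGER PRIMARY KEY, city TEXT);
    -- Fact table: one row per product/period/store, holding the numeric measures.
    CREATE TABLE sales (
        product_id INTEGER REFERENCES product(product_id),
        period_id  INTEGER REFERENCES period(period_id),
        store_id   INTEGER REFERENCES store(store_id),
        units_sold INTEGER,
        dollars_sold REAL,
        PRIMARY KEY (product_id, period_id, store_id)
    );
    INSERT INTO product VALUES (1, 'Desk'), (2, 'Chair');
    INSERT INTO period VALUES (1, 2013, 1), (2, 2013, 2);
    INSERT INTO store VALUES (1, 'DeKalb'), (2, 'Chicago');
    INSERT INTO sales VALUES (1, 1, 1, 4, 800.0), (2, 1, 1, 10, 500.0),
                             (1, 2, 2, 6, 1200.0), (2, 2, 1, 3, 150.0);
""")

# Slice the facts by one dimension: units sold per store city.
units_by_store = conn.execute("""
    SELECT st.city, SUM(f.units_sold)
    FROM sales f JOIN store st ON st.store_id = f.store_id
    GROUP BY st.city ORDER BY st.city
""").fetchall()
print(units_by_store)
```

Each dimension row can match many fact rows, which is the 1:N relationship between dimension tables and the fact table.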
Figure 6: Star schema with sample data
Cubes are intuitive, which makes them easy for non-technical users
Different Names, Same Ideas
• Data Mining - uncovers important
patterns in existing data to support
decision making.
• Online Analytical Processing (OLAP) –
User-driven discovery with
multidimensional views of their data
• Data Warehousing
Example: Sales Data in DB
Question 1: How can a star schema be designed to represent data in a cube?
Question 2: Which salespeople do better with certain customers?
Class Exercise –
Draw The Star Schema?
Case Study – Dimensional
Modeling