Uploaded by rummens.nico

More information about Business intelligence

1. Sources of data
A. Source: scanning of data
B. Source: manual registration of data


Definition: "registration of data"
Registration of data means putting data into a computer so that the data is stored. Such a
computer is often called a server or a database. Scanning and manual registration are only two
examples of registration of data; there are many more ways to register data, such as the use of
cameras (in traffic), sensors (measurement of temperature, ...), ...
Who does the registration?
o It can be you: when you register on a website, you start registering data about yourself: age,
name, address, ... The computer also keeps track of what you buy.
o It can be some other person: at school a staff member enrols you for the programme you
choose.
2. Where does the registered data go to?
All registered information flows by means of Wi-Fi, cables, ... to a data center. Most customers and staff
members don't know where that center is. The ICT staff is responsible for the storage of the data. In our
school we have our own ICT staff and our own local servers, but more and more companies store the data
in a remote data center, supervised and maintained by a company such as Google, HP, IBM, ... A data
center often contains thousands of servers. Below we see some 30 to 100 servers.
Inside the data center, data is stored in a database. In the picture above, the databases are integrated
in the black machines. A database is in fact a program, like Excel or Windows, but it is not designed to be
operated by clicking. The registered data flows in through wires at the back, coming from the outside
world (school, bank, shop, ...). The database program automatically stores the incoming data on hard
disks (just like the disk in a normal computer, but of higher quality and price).
Later, the data in the database can be retrieved when needed, by other programs, by ICT professionals or
by you (if you have the right to access the data). A database works completely autonomously, which means
that most of the time, no one is needed in the data center (except for maintenance, security, ...). The data
flows in and out of the databases automatically.
Inside the database, the data is stored in tables.
There is one separate table for each type of data. So, there is a separate table for orders, a separate table for
customers, ... In each table there is at least one column with numbers (IDs, short for "IDentification
numbers") that refer to a different table. Below, CustomerID refers to CustID.
Flow from data to report
o Sales are registered.
o Sales are stored in the data center.
o You can use Power Query to get the data to your laptop.
o The data flows to the Powerpivot "MODEL" in your laptop (structured storage).
o In Powerpivot, you can use the data of the MODEL to create tables/graphs.






Name 2 possible sources of data.
Describe the way data get into a database.
What can be seen in a data center?
How do database tables refer to one another?
Is a database physically often very close to you? Explain.
Is a Powerpivot model often very close to you? Explain.
3. What is "business intelligence" and how is it used?
With ICT, several goals can be achieved

Most ICT is used to manage day-to-day business. In a shop, a bank, a hospital, a school etc., people
make data flow by means of scanning, typing, presenting a credit card, ... This kind of ICT is the
most common ICT of everyday life. It can have all kinds of names, such as "registration", "filing",
"data entry", "administration", ...

A special kind of ICT is called business intelligence (BI). It refers to all actions, programs and reports
that aim at getting a better understanding of the business (business = selling, treating patients, ...).
Business intelligence does not deal with one customer/patient/student, but it looks at all
customers/patients/students.
In a hospital, BI is not about questions like "what is the telephone number of patient X?", but rather
about more general questions like "are patients on average staying longer in hospital now, than
they did a year ago?"
In a school, a BI-question could be "in which programmes is the average number of years needed to get a
diploma higher than in other programmes?". And what could be the reasons for that?
BI-questions typically involve the use of a lot of data (all customers, all students, several years...)
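The contrast between the two kinds of questions can be sketched in a few lines of Python. This is only an illustration: the admissions data, the field names and both function names are hypothetical and do not come from the course material.

```python
from statistics import mean

# Hypothetical admissions data (invented for this illustration).
admissions = [
    {"patient": "X", "year": 2023, "nights": 4, "phone": "555-0101"},
    {"patient": "Y", "year": 2023, "nights": 6, "phone": "555-0102"},
    {"patient": "Z", "year": 2024, "nights": 7, "phone": "555-0103"},
    {"patient": "X", "year": 2024, "nights": 5, "phone": "555-0101"},
]

def phone_of(patient):
    """Classical ICT: look up one fact about one patient."""
    return next(a["phone"] for a in admissions if a["patient"] == patient)

def average_stay(year):
    """BI: summarise all admissions of one year into a single number."""
    return mean(a["nights"] for a in admissions if a["year"] == year)

# BI-question: are patients on average staying longer now than a year ago?
stay_longer_now = average_stay(2024) > average_stay(2023)
print(phone_of("X"), average_stay(2023), average_stay(2024), stay_longer_now)
```

The first function touches a single record, while the second one needs every record of a whole year, which is why BI-questions involve so much data.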
In practice, BI is used mainly by directors and their assistants. The BI reports give an insight into how
well or badly the business is going. In a small company, a director can see how things are going just by walking
around. But in a big company with many facilities (buildings, ...) it is no longer possible to get an overview
by walking around. Moreover, in e-commerce, you can walk around in the warehouse, but you cannot
meet the customers anywhere. That is an important reason to create reports that reveal how customers
behave: how they use the website, what they buy and when, ...
Overview of the differences
Classical ICT | Business intelligence
Goal = making every-day-business work. Accomplishing the daily tasks. | Goal = getting insight in the business.
Deals with one customer/student/patient at a time (registering a patient, updating 1 telephone number, closing one sales transaction...). | Looks at a lot of data at the same time, in order to find an interesting insight.
Registering what happens now. | Comparing several months, years.
Lot of text, few tables and graphs. | Many tables and graphs.
Used by staff dealing with the every-day-business. | Used by directors and their assistants.
Absolutely needed to make the business work. This ICT has to be available 24/7. | Very useful, but the company can do without for a while. Deep knowledge and wisdom is not needed just to get through the day.
Every company has it. In most cases, it's already in place for many years. | Companies have it, but some companies are still looking how to get real good insight. BI is always evolving in a company, and each question answered leads to more interesting questions.
Examples of the use of BI
SCHOOL:
Managing a school on the medium term: "which educational programs are successful and which are not?"
Possible action: better targeting of marketing efforts on specific groups of secondary school students.
LOGISTICS:
20 to 30% of the (returning) trucks on the road are empty. What a waste!
Thus, the management of the logistics company wants to know: "why do the trucks return (half) empty?"
and "how can this be avoided?"
Although this is a very tough problem to solve, finding clues can be very rewarding, financially and for the
environment.
Possible action: use more information to determine what can be loaded on the way back.
HOSPITAL:
An empty hospital bed does not yield money.
Thus, the management wants to know "What is the average occupancy of a hospital bed in department x ?"
Possible action: beds/rooms can be used by some other department with a higher need of beds/rooms.
(Remark: in many companies, departments don't share much information, which leads to inefficiencies.
Consequently, reports that combine data of several departments, can be very revealing.)






What kind of people in a company use BI-reports?
What are these people looking for in these reports?
About a hospital: give an example of the use of classical ICT and the use of BI-reports.
Same question, for a logistics company.
Same question for a school.
Which type of ICT can be missed for a day or some days? Why?
4. Definition and importance of measures, such as sales, cost, ...
All companies deal with money, and profit is important for the survival of the company. Therefore, reports
about profits are created in all companies. In the old days, these reports were handcrafted by zealous
accountants, but nowadays, BI-tools make reporting a lot easier. The basic measures are:
measure | calculated as follows (example) | reference to accounting
sales | In this BI-course, "sales" is in fact the revenue (which is also equal to net sales). | revenue = net sales
cost | | Cost of Goods Sold (COGS)
margin | = sales - cost | difference between revenue and cost of goods sold (COGS)
margin % | = (sales - cost) / sales | Gross margin: the difference between revenue and cost of goods sold (COGS), divided by revenue.
As you see, this course is limited to a small part of the income statement. We don't calculate net profit, as
we omit important elements such as interest payments on loans, all other expenses (e.g. wages),
depreciation, ... These topics are treated in more detail in your course "Business fundamentals".
In many exercises, the margin % is calculated, because it is an important indicator of the quality of the
company (and of the quality of the management), as this margin % easily shrinks when competition gets
fierce (cf. the course Business fundamentals, Porter's competitive forces model).
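As a small worked example of the margin formulas, the Python sketch below uses invented figures (sales of 200 and a cost of 150). The function names are hypothetical. It also shows what happens when the brackets around "sales - cost" are left out: the division is then carried out before the subtraction.

```python
def margin(sales, cost):
    """margin = sales - cost"""
    return sales - cost

def margin_pct(sales, cost):
    """margin % = (sales - cost) / sales"""
    return (sales - cost) / sales

# Invented example figures.
sales, cost = 200.0, 150.0

with_brackets = margin_pct(sales, cost)    # (200 - 150) / 200
without_brackets = sales - cost / sales    # 200 - (150 / 200)

print(margin(sales, cost), with_brackets, without_brackets)
```

With brackets the result is a share of the sales (0.25, or 25%). Without brackets the result is almost equal to the sales itself, which is not a percentage at all.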

Which of these calculations is the correct one? Why?
o Margin % = sales - cost / sales
o Margin % = (sales - cost / sales )
o Margin % = (sales - cost) / sales

Explain the importance of margin %, in relation to competition.
5. What are typical problems related to getting good data?
A. The data is not available


If some important data is not collected by your company (e.g. customer satisfaction), then in the
short term you can try to create the data, for instance by conducting a survey among the
customers.
If the data you need does exist but the ICT staff doesn't provide it, try to meet these people and
ask nicely for the data. Explain to them why it is important, and don't forget to offer them lots of
coffee, cola and cookies. ;)
B. The data provided by the ICT staff is not exactly what I need.


In the short term, try to manipulate the data yourself to get it in the right shape.
In the meantime, ask the ICT staff whether they can deliver better/other data in the future. Try to
be very specific about what data you want, because creating a dataset you can download takes
effort, and the ICT staff don't have much time. So, do your homework (list explicitly
what you need) and, most importantly, keep the communication channel open to the ICT guys. Go
to them, talk to them. Mail and SMS are bad tools to connect to colleagues. Learn how to meet
new people and add them to your network.
C. How to manipulate data?
What is data manipulation and how to organize it?
If you want to calculate something and get one final result, then you can manipulate (change) the
data in any way you want, as long as you stick to the "truth". The techniques you learn in "digital
toolkit" are sufficient to get a good result.
But if you create a report, the data of that report will be updated daily, weekly or monthly. It is
important to do the manipulations in such a way that you don't have to redo every manipulation step
over and over again. By means of a QUERY, you can show Powerpivot which manipulations you want
to do. These steps are saved in the Excel file and will be re-executed every time you
refresh/update the source data.
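The idea of "steps that are saved once and re-executed at every refresh" can be imitated in plain Python. This is a sketch of the principle only, not of how Power Query works internally. The function name, the sample rows and the ";" delimiter are assumptions made for the illustration. The two steps shown are removing duplicate rows and splitting a merged column.

```python
def clean(rows):
    """Apply the same manipulation steps to any version of the source data."""
    seen = set()
    result = []
    for row in rows:
        # Step 1: remove duplicate rows (keep the first occurrence).
        if row in seen:
            continue
        seen.add(row)
        # Step 2: split the merged column on its delimiter (assumed to be ";").
        name, first_name, date_of_birth = row.split(";")
        result.append(
            {"Name": name, "Firstname": first_name, "DateOfBirth": date_of_birth}
        )
    return result

# Hypothetical source data with one merged column and one duplicate row.
source = ["Peeters;An;2001-03-14", "Janssens;Tom;2000-11-02", "Peeters;An;2001-03-14"]
first_load = clean(source)

# A refresh delivers a new row; the same steps run again without any rework.
source.append("Maes;Eva;2002-07-30")
after_refresh = clean(source)

print(len(first_load), len(after_refresh))
```

Because the steps live in one function, nothing has to be redone by hand when the source data grows.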
Some common issues and their solutions.
issue | solution (in a query)
duplicate rows | remove duplicate rows
incorrect data | change the value. After a refresh, the same change will be applied again.
merged information in one column, e.g. NameFirstnameDateOfBirth | split the column via an appropriate delimiter.

Issue: you need some other kind of information. E.g. we need to know how many students there are,
but this number depends on what we want to do with the information:
the number of students enrolled
<> ("is not equal to")
the number of students that are enrolled in a complete program
<>
the number of students that are still trying to get a diploma (some stopped their studies without
informing the school administration)
<>
the maximum number of students to be evacuated in case of a fire.
Solution (in a query):
- Look carefully at what the data in the dataset means.
- Think carefully about what you want to report.
- Manipulate the data accordingly (in a query), to create a data file that matches the problem you
are dealing with. (For instance, counting students that are fully or partially enrolled can be done by
checking whether the number of credits of the student enrolment equals 180.)
- Check your report. Are the tables and graphs correct?
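The point that "the number of students" depends on the question can be made concrete with a short Python sketch. The enrolment records and the "active" field are hypothetical. In reality the school may not know who has stopped, which is exactly the difficulty described above. Only the 180-credit check comes from the text.

```python
# The 180-credit rule is the example from the text; everything else is invented.
FULL_PROGRAMME_CREDITS = 180

enrolments = [
    {"student": "S1", "credits": 180, "active": True},
    {"student": "S2", "credits": 60, "active": True},
    {"student": "S3", "credits": 180, "active": False},  # stopped without notice
    {"student": "S4", "credits": 30, "active": True},
]

# Three different answers to "how many students are there?"
enrolled = len(enrolments)
fully_enrolled = sum(1 for e in enrolments if e["credits"] == FULL_PROGRAMME_CREDITS)
still_studying = sum(1 for e in enrolments if e["active"])

print(enrolled, fully_enrolled, still_studying)
```

All three counts are "correct", but each one answers a different question, so the report has to state which one it shows.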
Question: if we create a query and the result is good, can we be sure the query is really OK?
Answer: no, the same query must also work properly when the data is updated. In the future, new
customers and products will be added. A query should be written in such a way that it works fine
now and also in the future, with new data.


If the data provided does not perfectly fit your needs, what can you do?
(short term /medium term)
A query that works fine is not always an acceptable query. Why not?
6. About models.
A. Why do we need a model?
Updating data gets easier
By creating a model (linked to the corporate database(s)), we can afterwards update (refresh) the
data easily. All tables and graphs will update automatically. So, no rework any more.
Do the demo exercise "starter kit 4" and experience how refreshingly simple a "refresh" works.
Defining a data range is no longer needed
Without a model (without Powerpivot, only with the techniques of "Digital toolkit") we have to
define a range of data for each table or graph. Each time the data updates, the ranges have to be
checked and often need to be changed.
Combining tables gets easy
Without a model, combining data of different tables (orders - customers - ...) would be quite a job,
because this can only be done by means of a formula such as VLOOKUP. One separate VLOOKUP
for each column... this job could take a whole day, whereas Powerpivot can do it for you in the blink
of an eye, by use of the relations.
Thus, for reporting, a tool like Powerpivot is surely recommended.
B. How can we get the data into the model
As shown in the course, we can connect to the data:
o by means of a direct connection, using Powerpivot / Manage / some item in "Get External Data",
or
o by means of a query, using Data / some item in "Get & Transform Data".
In both cases, the data is physically copied to the background of the workbook, and the workbook
stores the path of the source data (e.g. "C:\data sources"). That way, Powerpivot can, on your
demand, perform a refresh from that same location.
When to use a query? Answer: in case data manipulation is needed.
C. What is the meaning and the purpose of the relations?
In the picture on the previous page, we see the relations as lines interconnecting the tables. Excel uses
these lines as follows:
When you create a pivot table with orders and customers, Powerpivot (PP) looks in the model (below) how
these tables are linked. The arrow in the model describes how the tables are linked (which column of the
first table is linked to which column of the second table). By means of that link, PP knows it can get,
for any order, all the information of the customer of that order. For each order, PP takes the CustomerID (5 in
this example), goes to the table Customers and looks for the corresponding number (also 5) in the column
CustID. There, all the customer information is found and becomes available for the pivot table. Which part of
the customer information appears in the pivot table depends on your selection in the right pane with pivot
table fields.
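The lookup described above can be imitated in a few lines of Python. This is an analogy, not Powerpivot's actual implementation. Only the column names CustomerID and CustID and the value 5 come from the text. The other columns, the sample rows and the function name are invented.

```python
from collections import defaultdict

# Hypothetical sample tables.
customers = [
    {"CustID": 5, "Country": "Belgium"},
    {"CustID": 8, "Country": "France"},
]
orders = [
    {"OrderID": 1, "CustomerID": 5, "Sales": 100.0},
    {"OrderID": 2, "CustomerID": 8, "Sales": 40.0},
    {"OrderID": 3, "CustomerID": 5, "Sales": 60.0},
]

def pivot_sales_by(field):
    """Sum the sales per value of one customer field, following the relation."""
    lookup = {c["CustID"]: c for c in customers}    # the "outside" table, by ID
    totals = defaultdict(float)
    for order in orders:
        customer = lookup[order["CustomerID"]]      # CustomerID 5 -> CustID 5
        totals[customer[field]] += order["Sales"]
    return dict(totals)

sales_by_country = pivot_sales_by("Country")
print(sales_by_country)
```

Choosing another customer field (the argument of the hypothetical `pivot_sales_by`) corresponds to ticking another field in the pane with pivot table fields.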
D. Common errors in a model + solutions.

Error: an inactive relation. The connection is a dotted line.
Solution: right click, tick activate. Next, you can use the activated relation or you can delete it,
if needed.

Error: a link from a number to a text. E.g. from ID to Name. Connect numbers to numbers (that
have exactly the same meaning). Such an error often leads to a very small pivot table, with no
values or "blank" values.
Solution: delete the relation. Create a correct new relation.

Error: a link drawn in the wrong direction. A link has to be drawn from the inside (where the
measures are) to the outside. In this course, you drag away from Orders. If it is drawn from the
outside to the inside, numbers in the pivot table can be wrong.
Solution: if you think the direction might be wrong, delete the relation and make sure to
create a new one from the inside (Orders) to the outside.
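Why a link from a number to a text (the second error above) produces blank values can be illustrated with the same kind of Python analogy. The tables and the function name are hypothetical, and the sketch does not claim to reproduce how Powerpivot handles such a relation internally.

```python
# Hypothetical sample tables.
customers = [{"CustID": 5, "Name": "Alice"}, {"CustID": 8, "Name": "Bob"}]
orders = [{"CustomerID": 5, "Sales": 100.0}, {"CustomerID": 8, "Sales": 40.0}]

def link(orders, customers, customer_column):
    """For each order, find the customer whose customer_column equals CustomerID."""
    index = {c[customer_column]: c for c in customers}
    # .get returns None ("blank") when no customer matches.
    return [index.get(o["CustomerID"]) for o in orders]

correct_link = link(orders, customers, "CustID")  # number linked to number
wrong_link = link(orders, customers, "Name")      # number linked to text

print(correct_link)
print(wrong_link)
```

The number 5 is never equal to a text such as "Alice", so with the wrong link no customer is ever found and every lookup comes back empty.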
Look in the model below (of exercise 007).
To create a pivot table like
Excel uses the following tables (make a choice)
o Orders
o All tables
o Orders and Customers
To create a pivot table like
Excel uses the following tables (make a choice)
o Orders
o All tables
o Orders and Customers
o Orders and Products
Suppose you have 2 tables, such as Orders and Customers, but you don't have
Powerpivot as an add-in (outdated version of Excel). Which formula can you
use to combine both tables into one large table?
When you create a new relation in the model, in which direction should you
drag the line?
o That never matters
o From the inside (Orders) to the outside
o From the Outside to the inside (Orders)
o From the largest to the smallest table
7. What are characteristics of a useful report?

It should be readable for the audience. If it will be presented on the intranet of the company, make
sure it nicely fits on the screen. If you will present it to your colleagues, make sure that all fonts are
large enough. Showing unreadable tables and graphs is a sign of poor preparation or even a lack of
respect for the audience. Respect is a virtue worth pursuing in life. So, use rather big fonts.

The report should focus the attention of the audience on something you want to show. The tables
and graphs are merely tools to support the story you bring. Your story is paramount. Such a story
could be for instance: "One year ago, we started our new way of dealing with complaints. As you
know, in this new policy we react within 6 hours by mail and we send a replacement product to the
customer, without waiting for the returned defective product. Customers are delighted!
On this graph (...show it now...) we can see that 90% of the customers who complained about a
product buy significantly more products in the year after the complaint. In 80% of the cases
(complaining customers), the sales to the customer grow by more than 20%. This is significantly
better than average, as you can see in the second graph."
What is a dashboard or a scorecard?
A dashboard is a collection of tables and graphs that belong together. Dashboarding is just a trendy
name for creating graphs.
For a real estate company (buying and selling buildings), a dashboard could look like this:
Such a dashboard gives managers, management assistants and other employees (sales
representatives, ...) an overview of how business is going. They can spot new trends in time (sales
going up or going down, ...). A dashboard is a start of thinking, discussing, changing plans, ...
A scorecard is a concise (small) dashboard with the important KPIs (key performance indicators)
that management wants to keep under control, e.g. profit, number of new customers, number of lost
customers, ...
Often, the managers predefine target values. A target value is the point where the indicator turns from
green to yellow: when a target is met, the indicator is green, otherwise it is yellow or red.
A scorecard scores the performance of the business. In each part of the scorecard, there is clear
indication whether the target is met or not.
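The colour logic of such an indicator can be sketched as a small Python function. The text only says that the target is where green turns into yellow. The second threshold (where yellow turns into red), the "higher is better" direction, the function name and the parameter names are assumptions made for this illustration.

```python
def indicator(value, target, red_below):
    """Colour of a scorecard indicator for a KPI where higher is better.

    target    : at or above this value the target is met (green)
    red_below : assumed second threshold; below it the indicator is red
    """
    if value >= target:
        return "green"
    if value >= red_below:
        return "yellow"
    return "red"

# Hypothetical KPI: number of new customers, with a target of 100.
for new_customers in (120, 90, 50):
    print(new_customers, indicator(new_customers, target=100, red_below=80))
```

For a KPI where lower is better (e.g. number of lost customers), the comparisons would have to be reversed.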
Managers often use dashboards as a starting point for further discussion about what is going on
(good/bad) in the company. After discussion, the needed action is taken.
What do dashboards and scorecards look like?
Give some examples of what can be seen on it.
8. How to deal with the formulas to create new measures?
In this course, only 3 types of formulas were used:

TOTALYTD([Sum of Sales]; 'Calendar'[Date])

CALCULATE([Sum of Sales]; PREVIOUSYEAR('Calendar'[Date]))

CALCULATE([Sum of Sales]; PARALLELPERIOD('Calendar'[Date]; -12; MONTH))
How to decide which one to use?
Well, PARALLELPERIOD is a very convenient formula, because it can be combined with any time line or
time filter. In most cases a period in the recent past has to be compared with a similar period further
in the past. With "parallelperiod" such a comparison can be done, for any length of period.
The formula PREVIOUSYEAR always takes the previous year, from 1/1 to 31/12.
PREVIOUSYEAR always shows a whole year (12 months).
TotalYTD selects a part of a year, a part starting from 1/1. If the targeted period is May 2008, TotalYTD
selects 1/1/2008-31/05/2008. So, it selects a part of a year.
Any of these 3 approaches can be useful in some specific situation, but parallelperiod surely is my
favourite!
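To make the three periods concrete, the Python sketch below computes the date range each formula selects when the targeted period is one month, using May 2008 from the text as the example. It only mimics the date ranges described above for a monthly targeted period. It is not DAX, and the function names are invented.

```python
from datetime import date, timedelta

def month_end(year, month):
    """Last day of the given month."""
    first_of_next = date(year + month // 12, month % 12 + 1, 1)
    return first_of_next - timedelta(days=1)

def total_ytd(year, month):
    """From 1 January up to the end of the targeted month."""
    return date(year, 1, 1), month_end(year, month)

def previous_year(year, month):
    """The whole previous year, whatever the targeted month is."""
    return date(year - 1, 1, 1), date(year - 1, 12, 31)

def parallel_period_minus_12_months(year, month):
    """The same month, 12 months earlier."""
    return date(year - 1, month, 1), month_end(year - 1, month)

# Targeted period: May 2008.
print(total_ytd(2008, 5))
print(previous_year(2008, 5))
print(parallel_period_minus_12_months(2008, 5))
```

Changing the targeted month changes the result of the first and the last function, while the previous-year range stays the same for every month of 2008.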
Question: fill in what belongs in the yellow zone.
(the answers are coloured white. First answer, then check by changing the colour)
CALCULATE([Sum of Sales]; PARALLELPERIOD(datetable[Date]; -12; MONTH))
=> sum of sales of period "May 2007"
TOTALYTD([Sum of Sales]; datetable[Date])
=> sum of sales of period "1 January 2008 - 31 May 2008"
CALCULATE([Sum of Sales]; PREVIOUSYEAR(datetable[Date]))
=> sum of sales of period "1 January 2007 - 31 December 2007"
It is important to know that these formulas can only work properly when a time line or time filter has
been used. The formulas need a targeted period to start from. In the example above, May 2008 is the
targeted period and the formulas start from there.
On the next row, the targeted period is June 2008 and so on.