Course Name: Business Intelligence
Year: 2009
Using Publicly Available Data
20th Meeting
Source of this Material
(2).
Loshin, David (2003). Business Intelligence: The Savvy Manager’s Guide.
Chapter 15.
Bina Nusantara University
The Business Case
The case for using public data is simple to make. Data that has been
collected and made available by government sources is available at low cost;
the only costs involve storage management and integration with other BI data.
In any company that has set up a BI environment, the processes for importing,
managing, and integrating data have already been streamlined for aggregating
internal data sets, so the only increase is in the variable costs of
executing those processes. On the other hand, in the right circumstances,
enhancing data with publicly available data can add significant value.
Management Issues
There are three major management issues associated with the use of publicly
available data: integration, privacy, and lack of structure.
The first major issue is integrating public data with the organization’s
other data. In fact, there are a number of companies whose business is to
enhance and improve public data sets and then resell them based on their
added value.
The second major issue revolves around personal privacy. There is a
perception that any organization that collects data about individuals and
then tries to exploit that information is invading a person’s privacy.
The third major issue is that much publicly available data is not in a
nicely structured form that is easily adaptable. Frequently, this data is
semistructured, which means that it requires some manipulation before it can
be successfully and properly integrated.
Public Data
A large amount of public data is easily accessible, and a survey of all of
it could fill an entire book. What matters here is the process of locating
the data resources that are available and determining how that data can be
used.
There are many ways that data sets can be categorized, but we will break the
realm of public data into these areas:
• Personal Information
Any data that describes attributes of a person could be called personal
information.
• Business Information
Aside from personal information, there is a lot of data that describes
attributes of business entities. The public records are frequently related to
rules and regulations imposed on business operations by federal or state
government jurisdictions. This kind of data includes the following.
  - Incorporations
  - Uniform Commercial Code (UCC)
Public Data (cont…)
  - Bankruptcy Filings
  - Professional Licensing
  - Securities Filings
  - Regulatory Licensing
  - Patents and Trademarks
• Legal Information
A large number of legal cases are accessible online, providing the names of the
parties involved in the cases as well as free text describing the case. These
documents, many of which have been indexed and made available for search,
contain embedded psychographic and geographic enhancement potential, along with
opportunities for entity extraction and entity linkage. Those linkages may represent
either personal or business relationships.
• Factual Information
There is an abundance of factual information embedded in available data sets.
Although there may be some restriction on specific uses of some of this data, there is
still much business value that can be derived from data sets such as the following.
Public Data (cont…)
  - Census Summary
  - Topologically Integrated Geographic Encoding and Referencing (TIGER)
    database
  - Federal Election Commission
  - Bureau of Labor Statistics (BLS)
  - Pharmaceutical Data
Data Resources
There are basically two approaches: gather data from the original source, or
pay a data aggregator for a value-added data set.
• Original Source
As mentioned in the previous sections, the government is a very good source of
publicly available data. Publicly available information may also be provided
by third parties in a form that is not meant for exploitation. A good example
is a Web site, which may have some data but not in a directly usable form. Another
interesting source of publicly available data is the subject of that data itself.
• Data Aggregators
We use the term data aggregator to refer to any organization that collects data
from one or more sources, provides some value-added processing, and repackages
the result in a usable form. Another method for providing aggregated data is
through a query-and-delivery process.
Semistructured Data
When the content is limited to a vocabulary or a format that can be
reasonably modeled, it is possible, with some degree of certainty, to
extract bits and pieces of information from semistructured data. The point is
that although the data has not been broken down into a distinct set of attributes
and their assigned values, there is some predictable context that appears
frequently enough to allow an application to extract information.
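As a rough illustration of extracting information from predictable context, the sketch below pulls a few fields out of a free-text vehicle listing with regular expressions. The listing text, the field names, the patterns, and the function name extract_fields are all hypothetical and chosen only for this example; real semistructured sources would need patterns modeled on their own vocabulary and format.

```python
import re

# Illustrative patterns only: each one encodes an assumed "predictable context"
# (a four-digit year, a dollar sign before a price, the word "miles" after a
# mileage figure, a seven-digit local phone number).
PATTERNS = {
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "price": re.compile(r"\$\s?\d{1,3}(?:,\d{3})*"),
    "mileage": re.compile(r"\d{1,3}(?:,\d{3})*(?=\s*miles\b)"),
    "phone": re.compile(r"\b\d{3}-\d{4}\b"),
}


def extract_fields(text):
    """Return a dict holding only the fields whose pattern occurs in the text."""
    record = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match is not None:
            # Keep the first occurrence; a missing field is simply left out.
            record[name] = match.group(0)
    return record


if __name__ == "__main__":
    listing = "2001 Honda Civic, 45,000 miles, runs great, $8,500 obo. Call 555-0142."
    print(extract_fields(listing))
```

Fields that do not appear are omitted rather than guessed, which reflects the "some degree of certainty" caveat: the extraction is only as reliable as the context it depends on.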
The Myth of Privacy
• Fear of Invasion
The truth is, as BI professionals, we are somewhat responsible for collecting
customer information and manipulating that information for marketing purposes, but
are we really guilty of invasion of privacy?
• The Value and Cost of Privacy
This demonstrates an interesting model of information valuation, in that the consumer
is being compensated in some way in return for providing information.
• The “Privacy” Statement
The issuing of a privacy statement does not imply that your data is being treated as
private data. These statements actually do the opposite: they tell the consumer how
the information is not being kept private.
• The Good News for Business Intelligence
There are a lot of benefits to society from the dissemination of personal information,
such as the ability to track down criminals, detect fraud, provide channels for
improved customer relationship management, and even track down terrorists. As BI
professionals, we have a twofold opportunity with respect to the privacy issue.
The Myth of Privacy (cont…)
The first is to raise awareness regarding the consumer’s value proposition with
respect to data provision, leading to raised awareness about both the legality and the
propriety of BI analysis and information use. The second is to build better BI
applications.
End of Slide