Course Name: Business Intelligence
Year: 2009
Data Enhancement
18th Meeting
Source of this Material
Loshin, David (2003). Business Intelligence: The Savvy Manager’s Guide. Chapter 13.
Bina Nusantara University
The Business Case
There are two aspects to the business value of data enhancement. The first is
that as organizational data environments mature and data managers want to
exploit the corporate data asset, there is an increased need to share
data across different groups. The second aspect emerges from the actionable
knowledge that can be discovered only by analyzing the result of combining
multiple data sets. Data enhancement is a critical component of the BI
program, especially as a value-adding process for the following:
• Competition in knowledge industries
• Customer relationship management
• Micromarketing and personalization
• Cooperative marketing
• Industry deregulation
Types of Data Enhancement
There are two approaches to data enhancement. One focuses on incrementally
improving or adding information as data is viewed or processed. Incremental
enhancements are useful as a component of a later analysis stage, such as
sequence pattern analysis and behavior modeling. The other approach is batch
enhancement, where data collections are aggregated and methods are applied
to the collection to create value-added information. Here are some examples.
• Auditing Enhancement
In business processes that require some degree of tracing capability, a frequent data
enhancement is the addition of auditing data. Creating a tracking system associated
with a sequence of related events provides a framework for evaluating efficiency
within a business process.
• Temporal Enhancement
Historical data provides critical insight to a BI program. Whereas in some cases the
history is embedded in the collected data, other instances require that activity be
enhanced by incrementally adding timestamps noting the time at which some event
occurred.
Types of Data Enhancement (cont…)
• Contextual Enhancement
The place, or context, of data manipulation is an enhancement as well. A physical
location, a path of access, and the login account through which a series of transactions
was performed are examples of context that can augment data. Contextual
enhancement also includes tagging data records so that they can be correlated with other
pieces of data.
• Geographic Enhancement
Data enhanced with geographic information allows for analysis based on regional
clustering and data inference based on predefined geodemographics. The first kind of
geographic enhancement is the process of address standardization, where addresses
are cleansed and then modified to fit a predefined postal standard.
• Demographic Enhancement
Demographics describe the similarities that exist within an entity cluster, such as
customer age, marital status, gender, income, and ethnic coding. Demographic
enhancements can be added as a by-product of geographic enhancement or through
direct information merging.
Types of Data Enhancement (cont…)
• Psychographic Enhancement
Psychographics describe what distinguishes individual entities within a cluster.
Psychographic information is frequently collected via surveys, contest forms,
customer service activity, registration cards, and specialized lists. The trick to
using psychographic data is being able to link the entity
within the organization's database to the supplied psychographic data set.
• Inference Enhancement
Information inference is a BI technique that allows the user to draw conclusions about
the examined entity based on supporting evidence and business rules. Inferred
knowledge can be used to augment data to reflect what we have learned, and this in
turn provides greater insight into solving the business problem at hand.
Incremental Enhancement
Incremental enhancements are those that can be attached to data in process.
• Provenance
The provenance of an item is its source. This idea generalizes the temporal and
auditing enhancements described earlier. A provenance can be as simple as a single
string data field describing the source or as complex as a separate table containing a
time stamp and a location code each time the record is updated, related through a
foreign key.
• Audit Trails
The combination of location, time, and activity information associated with a series of
manipulations of a data record allows us to trace back every occasion on which that
information was touched. The resulting audit data shows how activities
cause data to flow through a system.
• Context
This kind of enhanced data provides significant marketing benefit, because this
context information can be fed into a statistical framework for reporting on the
behavior of users based on their locations or times of activity.
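The provenance and audit trail ideas above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names, the field names (source, location, activity), and the sample values are hypothetical and do not come from the source material.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditEntry:
    """One touch of a record: when, where, and what was done (hypothetical fields)."""
    timestamp: datetime
    location: str
    activity: str


@dataclass
class TrackedRecord:
    """A data record that carries its own provenance and audit trail."""
    key: str
    values: dict
    source: str                      # simple provenance: a single string field
    trail: list = field(default_factory=list)

    def update(self, changes, location, activity, now=None):
        # Apply the change, then append a time stamp and location code for it.
        self.values.update(changes)
        stamp = now or datetime.now(timezone.utc)
        self.trail.append(AuditEntry(stamp, location, activity))

    def history(self):
        # Trace back every occasion on which the record was touched, oldest first.
        return [(e.timestamp.isoformat(), e.location, e.activity) for e in self.trail]
```

In a relational setting the trail would more likely live in a separate table related to the record through a foreign key; the in-memory list here only stands in for that table.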
Batch Enhancements
Batch enhancements are applied to a large set of data instances as an offline
process. They typically involve the merging of data from multiple instances
within a single data set or multiple data instances drawn from multiple data
sets.
• Householding
Householding is a process that attempts to reduce a set of individuals to a single
grouped housing unit based on the database record attribution. A household consists
of all people living as an entity within the same residence.
• Organizational Merging
When organizations merge, they will eventually want to merge their vendor,
customer, and employee databases as well as their base reference data.
• Other Batch Enhancements
Other batch enhancements include data scrubbing, data cleansing, and health care
diagnosis assistance, as well as building affinity programs and constructing relational
associations, among others.
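As a rough illustration of householding, the sketch below groups individuals by a normalized key. The grouping rule (same surname, same address line, same five-digit ZIP) and the field names are assumptions made for the example; real householding rules are usually richer and must handle different surnames at one residence.

```python
from collections import defaultdict


def household_key(record):
    # Assumption: matching normalized surname + address + 5-digit ZIP
    # identifies one household.
    return (
        record["last_name"].strip().upper(),
        " ".join(record["address"].upper().split()),
        record["zip"][:5],
    )


def household(records):
    # Reduce a set of individuals to grouped housing units.
    groups = defaultdict(list)
    for rec in records:
        groups[household_key(rec)].append(rec["first_name"])
    return dict(groups)
```

Because the key is built from normalized values, the quality of the result depends directly on how well the addresses were standardized beforehand.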
Bina Nusantara University
9
Standardization
Standardization refers to ensuring that a data instance conforms to a
predefined expected format. A data standard is a format representation for
data values that can be described using a series of rules. Because a standard
is a distinct model to which all items in a set must conform, this means we can
try to automate two components of any standardization process:
• Determination of conformance to the standard
• Bringing a nonstandard data instance into conformance with the standard
There is usually a well-defined rule set describing both how to determine if an
item conforms to the standard and what actions need to be taken to bring the
offending item into conformance.
• Data Standard and Standardization
The value of data standardization lies in the notion that given the right base of
reference information and a well-defined rule set, additional data can be added to a
record in a purely automated way. Probably the most important benefit of
standardization is that through the process of defining standards, organizations
create a streamlined means for the transference and sharing of information.
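The two automatable components, checking conformance and bringing a value into conformance, can be sketched for a simple date standard. The choice of YYYY-MM-DD as the standard and the list of accepted input formats are assumptions for illustration only; in particular, reading 05/06/2009 as month/day is itself a regional convention.

```python
import re
from datetime import datetime

# Assumed standard for this sketch: ISO-style YYYY-MM-DD.
STANDARD = re.compile(r"\d{4}-\d{2}-\d{2}")
# Assumed nonstandard input formats that the rule set knows how to repair.
KNOWN_FORMATS = ("%m/%d/%Y", "%Y%m%d")


def conforms(value):
    # Component 1: determine conformance to the standard.
    if not STANDARD.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False          # right shape, but not a real calendar date
    return True


def standardize(value):
    # Component 2: bring a nonstandard instance into conformance.
    value = value.strip()
    if conforms(value):
        return value
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None               # no rule applies; leave for manual review
```

Returning None instead of guessing keeps the process purely rule-driven: anything the rule set cannot repair is flagged rather than silently altered.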
Standardization (cont…)
• Kinds of Standards
Most standards either are dictated by some authority (such as the government), are
developed through cooperation (such as an industry-defined standard), or are derived
from common use (such as geographical biases toward representing dates).
Example: Address Standardization
In this section, we look at the different components of an address.
• The Address Standard
  - Recipient line
    The recipient line indicates the person or entity to which the mail is to be delivered.
  - Delivery address line
    The delivery address line contains the specific location associated with
    the recipient.
  - Last line
    The last line of the address includes the city name, state, and ZIP code.
• Standard Abbreviations
The postal service provides a set of enumerations of standard abbreviations,
including U.S. state and possession abbreviations, street abbreviations, and
common business word abbreviations.
Example: Address Standardization (cont…)
• ZIP + 4
ZIP codes are postal codes assigned to delivery areas to improve the precision of
sorting and delivering mail. ZIP + 4 codes are a further refinement, narrowing down a
delivery location to a subsection of a building or a street.
• Address Standardization Software
Because the USPS addressing standard is so well documented, it is relatively
straightforward to build automated address standardization software, which eases the
way in which this enhancement can be performed.
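A minimal sketch of what such software does for two of the address components is shown below. The suffix table is a tiny illustrative subset, the regular expression handles only a simple "City, ST 12345-6789" last line, and both function names are hypothetical; production software relies on the full postal reference data.

```python
import re

# Tiny illustrative subset of street-suffix abbreviations (assumption:
# a real table is far longer and comes from postal reference data).
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "ROAD": "RD"}

# Last line: city name, two-letter state, ZIP code with optional +4 part.
LAST_LINE = re.compile(
    r"^(?P<city>[A-Za-z .]+?),?\s+(?P<state>[A-Za-z]{2})\s+"
    r"(?P<zip>\d{5})(?:-(?P<plus4>\d{4}))?$"
)


def standardize_delivery_line(line):
    # Uppercase, drop periods, and abbreviate known suffix words.
    # Naive: every matching word is abbreviated, wherever it appears.
    words = line.upper().replace(".", "").split()
    return " ".join(SUFFIXES.get(w, w) for w in words)


def parse_last_line(line):
    # Split the last line into its components, or return None if it
    # does not fit the expected pattern.
    m = LAST_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "city": m.group("city").strip().upper(),
        "state": m.group("state").upper(),
        "zip": m.group("zip"),
        "plus4": m.group("plus4"),
    }
```

For example, standardize_delivery_line("123 Main Street") yields "123 MAIN ST", and a last line that fails to parse can be routed to a cleansing step instead of being enhanced.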
Enhancement Methodologies
There are many issues involved in data enhancement, but because a large
number of them revolve around information record linkage, it is worthwhile to
explore this in greater detail.
• Record Linkage
Any two records that can be connected based on a set of chosen attributes are
candidates to be linked together. Usually record linkage is performed only when the
chosen attributes match exactly, but simple record linkage is limited for the following
reasons:
  - Information is missing
  - Information sources are in different formats
  - Record linkage is imprecise
  - Information is out of synchronization
  - Information is lost
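The contrast between exact linkage and a more tolerant approach can be sketched as follows. The similarity measure (Python's difflib.SequenceMatcher averaged over attributes), the 0.85 threshold, and the function names are assumptions chosen for illustration, not the method prescribed by the source.

```python
from difflib import SequenceMatcher


def normalize(value):
    # Reduce format differences: case, punctuation, extra spaces.
    return " ".join(value.upper().replace(".", "").replace(",", "").split())


def exact_link(a, b, attributes):
    # Simple record linkage: every chosen attribute must match exactly.
    return all(a.get(k) == b.get(k) for k in attributes)


def similarity(a, b, attributes):
    # Average string similarity over the attributes both records have.
    scores = []
    for k in attributes:
        if not a.get(k) or not b.get(k):
            continue          # missing information: no evidence either way
        scores.append(SequenceMatcher(None, normalize(a[k]), normalize(b[k])).ratio())
    return sum(scores) / len(scores) if scores else 0.0


def approximate_link(a, b, attributes, threshold=0.85):
    # Link two records when their similarity reaches the assumed threshold.
    return similarity(a, b, attributes) >= threshold
```

With this sketch, "John Q. Smith" and "JOHN Q SMITH" fail the exact test but link approximately. The threshold trades missed links against false links and would need tuning on real data.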
• Semistructured Data
Semistructured data refers to information that is partially formatted, such as data
elements on a web page or the comments field in a customer service database.
Enhancement Methodologies (cont…)
Semistructured data may be a good source for both association and relation
information, but the problem of extracting information out of the data is particularly
difficult.
• Inference
An inference is an application of a heuristic rule that essentially creates a piece of
information where it didn’t exist before. Even though inferencing represents the
application of intuition, it is done in a way that can be automated. Inference rules
usually reflect some understood business analysis that can be boiled down to a set of
business rules.
• Types of Inference
Enhancements based on inferencing are usually very focused bits of information
relevant within a particular analytical context. Inferences are likely to center on
demographic or psychographic details that can be derived as a direct result of data
merging and analysis.
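Automated inference of this kind can be sketched as a list of business rules applied to each record. The two rules below (a mortgage suggesting home ownership, age 65 or over suggesting retirement) and all attribute names are hypothetical examples, not rules taken from the source.

```python
def infer(record, rules):
    # Apply each rule in order; a rule adds an attribute only when the
    # record does not already carry it and the rule's condition holds.
    enhanced = dict(record)
    for attribute, condition, value in rules:
        if attribute not in enhanced and condition(enhanced):
            enhanced[attribute] = value
    return enhanced


# Hypothetical heuristic rules: (attribute to add, condition, inferred value).
RULES = [
    ("likely_homeowner", lambda r: r.get("has_mortgage") is True, True),
    ("life_stage", lambda r: r.get("age", 0) >= 65, "retired"),
]
```

Rules run in order against the partly enhanced record, so a later rule may build on an earlier inference. Existing values are never overwritten, which keeps collected data distinct from inferred data.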
Management Issues
• Buy versus Build
In the software and services market, the term data enhancement is overloaded and
can be used to refer to anything from data cleansing and address standardization all
the way to services-based record linkage as a means to add data fields to submitted
data, such as credit ratings.
• Performance Issues
Some data enhancement applications are likely to be of high computational
complexity, and therefore members of the team should be familiar with high-performance
computing as well as database manipulation, ETL, and pattern matching.
End of Slide