CUSTOMER_CODE SMUDE DIVISION_CODE SMUDE

EVENT_CODE
SMUAPR15
ASSESSMENT_CODE
MC0088_SMUAPR15
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
5231
QUESTION_TEXT
Explain the data warehouse models from the architecture point of
view.
SCHEME OF EVALUATION
Enterprise warehouse: An enterprise warehouse collects all of the
information about subjects spanning the entire organization. It
provides corporate-wide data integration, usually from one or more
operational systems or external information providers, and is
cross-functional in scope. It typically contains detailed data as well as
summarized data, and can range in size from a few gigabytes to
hundreds of gigabytes, terabytes, or beyond. An enterprise data
warehouse may be implemented on traditional mainframes, UNIX
superservers, or parallel architecture platforms. It requires extensive
business modeling and may take years to design and build.
(3.5 marks)
Data mart: A data mart contains a subset of corporate-wide data
that is of value to a specific group of users. Its scope is confined to
specific selected subjects. For example, a marketing data mart may
confine its subjects to customer, item, and sales. The data contained
in data marts tends to be summarized. Depending on the source of
data, data marts can be categorized as independent or dependent.
Independent data marts are sourced from data captured from one or
more operational systems or external information providers, or from
data generated locally within a particular department or geographic
area. Dependent data marts are sourced directly from enterprise data
warehouses.
(3.5 marks)
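The dependent case described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under assumptions, not a prescribed design: the table names ew_sales and marketing_mart, their columns, and the sample rows are hypothetical. The mart is derived directly from the enterprise warehouse table, keeps only the customer, item and sales subjects, and stores them in summarized form.

```python
import sqlite3

# Hypothetical enterprise warehouse table: detailed, corporate-wide sales rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ew_sales (
    customer TEXT, item TEXT, branch TEXT, sale_date TEXT, amount REAL
);
INSERT INTO ew_sales VALUES
    ('alice', 'tv',     'north', '2015-04-01', 300.0),
    ('alice', 'tv',     'north', '2015-04-03', 320.0),
    ('bob',   'laptop', 'south', '2015-04-02', 900.0),
    ('carol', 'phone',  'north', '2015-04-02', 200.0);

-- Dependent marketing data mart: sourced directly from the warehouse,
-- confined to the customer, item and sales subjects, stored summarized.
CREATE TABLE marketing_mart AS
SELECT customer, item, COUNT(*) AS num_sales, SUM(amount) AS total_sales
FROM ew_sales
GROUP BY customer, item;
""")

rows = conn.execute(
    "SELECT customer, item, num_sales, total_sales "
    "FROM marketing_mart ORDER BY customer, item"
).fetchall()
print(rows)  # [('alice', 'tv', 2, 620.0), ('bob', 'laptop', 1, 900.0), ('carol', 'phone', 1, 200.0)]
```

The branch and date details stay in the warehouse; the mart holds only the per-customer, per-item totals its users need.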
Virtual warehouse: A virtual warehouse is a set of views over
operational databases. For efficient query processing, only some of
the possible summary views may be materialized. A virtual
warehouse is easy to build but requires excess capacity on operational
database servers.
(3 marks)
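The trade-off in a virtual warehouse, views over the operational data with only some summaries materialized, can be illustrated with sqlite3. SQLite has ordinary views but no materialized views, so this sketch assumes that "materializing" a view means copying its result into a table. The table and view names and the data are hypothetical.

```python
import sqlite3

# Hypothetical operational database; there is no separate warehouse store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, customer TEXT, item TEXT, qty INTEGER, price REAL
);
INSERT INTO orders (customer, item, qty, price) VALUES
    ('alice', 'tv', 1, 300.0),
    ('bob', 'phone', 2, 200.0),
    ('alice', 'phone', 1, 200.0);

-- The virtual warehouse: summary views evaluated on the operational server.
CREATE VIEW sales_by_item AS
    SELECT item, SUM(qty * price) AS revenue FROM orders GROUP BY item;
CREATE VIEW sales_by_customer AS
    SELECT customer, SUM(qty * price) AS revenue FROM orders GROUP BY customer;

-- Only one summary is materialized (copied into a table) for faster reads.
CREATE TABLE mv_sales_by_item AS SELECT * FROM sales_by_item;
""")

def query(sql):
    return conn.execute(sql).fetchall()

# A new operational transaction arrives after materialization.
conn.execute("INSERT INTO orders (customer, item, qty, price) VALUES ('carol', 'tv', 1, 300.0)")

live = query("SELECT item, revenue FROM sales_by_item ORDER BY item")
stored = query("SELECT item, revenue FROM mv_sales_by_item ORDER BY item")
by_customer = query("SELECT customer, revenue FROM sales_by_customer ORDER BY customer")
print(live)    # [('phone', 600.0), ('tv', 600.0)]  view recomputed from orders
print(stored)  # [('phone', 600.0), ('tv', 300.0)]  materialized copy is stale
```

Because the views are recomputed against the operational tables on every query, they always reflect the latest orders, but that recomputation is the extra load the operational servers must absorb; the materialized table is cheap to read but must be refreshed.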
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
72562
QUESTION_TEXT
What are the four different views regarding the design of a
data warehouse? Explain.
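SCHEME OF EVALUATION
The views are:
● Top-down view: allows the selection of the relevant information necessary for the data warehouse, matching current and future business needs.
● Data source view: exposes the information being captured, stored, and managed by operational systems.
● Data warehouse view: includes fact tables and dimension tables; it represents the information stored inside the data warehouse, including precalculated totals and counts.
● Business query view: the perspective of the data in the data warehouse from the viewpoint of the end user.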
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
117788
QUESTION_TEXT
Explain the methods of classification by Decision Tree Induction.
SCHEME OF EVALUATION
A decision tree is a flowchart-like tree structure, where each
internal node denotes a test on an attribute, each branch represents an
outcome of the test, and leaf nodes represent classes or class
distributions. The topmost node in a tree is the root node.
Algorithm: Generate_decision_tree. Generate a decision tree from
the given training data.
Input: The training samples, samples, represented by discrete-valued
attributes; the set of candidate attributes, attribute-list.
Output: A decision tree.
Method:
1.  create a node N;
2.  if samples are all of the same class, C, then
3.      return N as a leaf node labeled with the class C;
4.  if attribute-list is empty then
5.      return N as a leaf node labeled with the most common class in samples; // majority voting
6.  select test-attribute, the attribute among attribute-list with the highest information gain;
7.  label node N with test-attribute;
8.  for each known value a_i of test-attribute // partition the samples
9.      grow a branch from node N for the condition test-attribute = a_i;
10.     let s_i be the set of samples in samples for which test-attribute = a_i; // a partition
11.     if s_i is empty then
12.         attach a leaf labeled with the most common class in samples;
13.     else attach the node returned by Generate_decision_tree(s_i, attribute-list − test-attribute);
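A compact Python rendering of Generate_decision_tree may help connect the numbered steps to running code. It is a sketch in the style of ID3, assuming samples are dicts of discrete attribute values, leaves are plain class labels, internal nodes are dicts, and a hypothetical domains argument supplies the "known values" of each attribute (so the empty partition of step 11 can actually occur). The weather-style training data is made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D) = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def info_gain(samples, attr, target):
    """Info(D) minus the weighted entropy of the partitions induced by attr."""
    remainder = 0.0
    for value in {s[attr] for s in samples}:
        part = [s[target] for s in samples if s[attr] == value]
        remainder += len(part) / len(samples) * entropy(part)
    return entropy([s[target] for s in samples]) - remainder

def generate_decision_tree(samples, attribute_list, target, domains):
    """Return a class label (leaf) or {test_attribute: {value: subtree}}."""
    labels = [s[target] for s in samples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:            # steps 2-3: one class only
        return labels[0]
    if not attribute_list:               # steps 4-5: majority voting
        return majority
    test_attr = max(attribute_list, key=lambda a: info_gain(samples, a, target))  # step 6
    branches = {}                        # step 7: node N labeled with test_attr
    rest = [a for a in attribute_list if a != test_attr]
    for value in domains[test_attr]:     # steps 8-9: one branch per known value
        part = [s for s in samples if s[test_attr] == value]  # step 10
        if not part:                     # steps 11-12
            branches[value] = majority
        else:                            # step 13
            branches[value] = generate_decision_tree(part, rest, target, domains)
    return {test_attr: branches}

def classify(tree, sample):
    """Follow branches from the root until a leaf (class label) is reached."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[sample[attr]]
    return tree

# Hypothetical training data: (outlook, windy, play).
rows = [("sunny", "no", "no"), ("sunny", "yes", "no"), ("overcast", "no", "yes"),
        ("rain", "no", "yes"), ("rain", "yes", "no"), ("overcast", "yes", "yes"),
        ("sunny", "no", "no")]
samples = [{"outlook": o, "windy": w, "play": p} for o, w, p in rows]
domains = {"outlook": ["sunny", "overcast", "rain", "snow"], "windy": ["no", "yes"]}
tree = generate_decision_tree(samples, ["outlook", "windy"], "play", domains)
print(tree)
print(classify(tree, {"outlook": "rain", "windy": "yes"}))  # no
```

Here outlook has the higher information gain and becomes the root test; the "snow" branch has no training samples, so it receives the majority class of the parent's samples, exactly as in step 12.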
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
117790
QUESTION_TEXT
Define FP-Tree. Explain FP-tree construction Algorithm
SCHEME OF EVALUATION
A frequent-pattern tree (or FP-tree) is a tree structure consisting of
an item-prefix tree and a frequent-item header table. (3 marks)
● Item-prefix tree:
    ● It consists of a root node labelled "null".
    ● Each non-root node consists of three fields:
        ● Item name: the item this node represents
        ● Support count: the number of transactions represented by the path reaching this node
        ● Node link: a link to the next node in the FP-tree carrying the same item name, or null if there is none
● Frequent-item header table: it consists of two fields:
    ● Item name
    ● Head of node link, which points to the first node in the FP-tree carrying that item name
(7 marks)
The FP-tree construction algorithm is given on page no. 113.
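The construction algorithm is only cited by page number; the usual two-scan construction can be sketched in Python as follows: count item supports, then insert each transaction's frequent items in descending-support order, sharing common prefixes and threading each new node onto its item's node link. Breaking support ties alphabetically, the class name FPNode, the helper node_link_counts and the sample transactions are assumptions of this sketch, so the item order may differ from the textbook's.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item = item          # item name (None for the "null" root)
        self.count = 0            # support count
        self.parent = parent
        self.children = {}        # item name -> child FPNode
        self.node_link = None     # next node in the tree with the same item name

def build_fp_tree(transactions, min_support):
    # First scan: support of every item; keep the frequent ones.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}
    # List L: descending support, ties broken alphabetically (an assumption).
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}

    root = FPNode(None, None)
    header = {item: None for item in order}   # item name -> head of node link
    tail = {}                                 # last node on each node-link chain

    # Second scan: insert each transaction's frequent items in L order.
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in frequent), key=rank.get):
            child = node.children.get(item)
            if child is None:                 # new node, appended to the node link
                child = FPNode(item, node)
                node.children[item] = child
                if header[item] is None:
                    header[item] = child
                else:
                    tail[item].node_link = child
                tail[item] = child
            child.count += 1                  # shared prefix: just count it
            node = child
    return root, header, order

def node_link_counts(header, item):
    """Follow the node link from the header table and collect node counts."""
    counts, node = [], header[item]
    while node is not None:
        counts.append(node.count)
        node = node.node_link
    return counts

# Hypothetical transaction database, minimum support count 3.
db = [list("facdgimp"), list("abcflmo"), list("bfhjo"), list("bcksp"), list("afcelpmn")]
root, header, order = build_fp_tree(db, 3)
print(order)                                            # ['c', 'f', 'a', 'b', 'm', 'p']
print({i: node_link_counts(header, i) for i in order})
```

Summing the counts along an item's node link gives that item's support, and the links let a mining step visit every occurrence of an item without searching the whole tree.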
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
117793
QUESTION_TEXT
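List and explain the various criteria used to compare
classification methods.
SCHEME OF EVALUATION
● Predictive accuracy: the ability of the model to correctly predict the class label of new or previously unseen data.
● Speed: the computation costs involved in generating and using the model.
● Robustness: the ability of the model to make correct predictions given noisy data or data with missing values.
● Scalability: the ability to construct the model efficiently given large amounts of data.
● Interpretability: the level of understanding and insight provided by the model.
5×2=10 marks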
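Predictive accuracy, speed and robustness can be measured directly on a held-out test set. In this sketch the rule-based classifier, its attributes and the test data are hypothetical; robustness is probed by blanking one attribute, which is only one of several possible ways to simulate noisy or missing data.

```python
import time

def predictive_accuracy(predict, test_set):
    """Fraction of held-out (features, label) pairs that are labeled correctly."""
    correct = sum(1 for features, label in test_set if predict(features) == label)
    return correct / len(test_set)

def blank_attribute(test_set, attr):
    """Copy of the test set with `attr` missing (None) in every sample."""
    return [({**features, attr: None}, label) for features, label in test_set]

# Hypothetical classifier: one hand-written rule standing in for a trained model.
def predict(features):
    return "yes" if features.get("income") == "high" else "no"

# Hypothetical held-out data that was not used to build the model.
test_set = [
    ({"income": "high", "student": "no"}, "yes"),
    ({"income": "low", "student": "yes"}, "yes"),
    ({"income": "low", "student": "no"}, "no"),
    ({"income": "high", "student": "yes"}, "yes"),
]

start = time.perf_counter()
accuracy = predictive_accuracy(predict, test_set)    # predictive accuracy: 0.75
elapsed = time.perf_counter() - start                # speed: time to use the model
noisy_accuracy = predictive_accuracy(predict, blank_attribute(test_set, "income"))  # robustness: 0.25
print(accuracy, noisy_accuracy, elapsed)
```

A drop from 0.75 to 0.25 when income is missing would indicate a model that is not robust to missing values; scalability and interpretability are harder to reduce to a single number.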
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
117794
QUESTION_TEXT
List and explain the web content mining challenges.
SCHEME OF EVALUATION
● Data/Information extraction
● Web information integration and schema matching
● Opinion extraction from online sources
● Knowledge synthesis
● Segmenting web pages and detecting noise
5×2=10 marks
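As a rough illustration of segmenting web pages and detecting noise, the sketch below uses Python's standard html.parser to keep only text outside tags that often hold navigation, styling, ads or footers. The NOISE_TAGS set is an assumption of this sketch; real pages frequently place ads and menus in ordinary div elements, so practical systems rely on richer layout or learned cues.

```python
from html.parser import HTMLParser

# Assumed set of tags whose text is treated as noise (menus, styling, ads, footers).
NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside"}

class MainTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noise_depth = 0   # > 0 while inside a noise element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.noise_depth == 0:   # keep only non-noise text
            self.chunks.append(text)

def main_text(html):
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

# Hypothetical product page.
page = """<html><head><style>p { color: red; }</style></head><body>
<nav><a href="/">Home</a> | <a href="/deals">Deals</a></nav>
<h1>Camera X100</h1><p>Great battery life.</p>
<aside>Sponsored: buy now!</aside>
<footer>Copyright notice</footer></body></html>"""
print(main_text(page))  # Camera X100 Great battery life.
```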