CUSTOMER_CODE SMUDE
DIVISION_CODE SMUDE
EVENT_CODE SMUAPR15
ASSESSMENT_CODE MC0088_SMUAPR15

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 5231
QUESTION_TEXT Explain the data warehouse models from the architecture point of view.
SCHEME OF EVALUATION
Enterprise warehouse: An enterprise warehouse collects all of the information about subjects spanning the entire organization. It provides corporate-wide data integration, usually from one or more operational systems or external information providers, and is cross-functional in scope. It typically contains detailed data as well as summarized data, and can range in size from a few gigabytes to hundreds of gigabytes, terabytes or beyond. An enterprise data warehouse may be implemented on traditional mainframes, UNIX superservers, or parallel architecture platforms. It requires extensive business modeling and may take years to design and build. (3.5 marks)

Data mart: A data mart contains a subset of corporate-wide data that is of value to a specific group of users. The scope is confined to specific selected subjects. For example, a marketing data mart may confine its subjects to customer, item, and sales. The data contained in data marts tend to be summarized. Depending on the sources of data, data marts can be categorized as independent or dependent. Independent data marts are sourced from data captured from one or more operational systems or external information providers, or from data generated locally within a particular department or geographic area. Dependent data marts are sourced directly from enterprise data warehouses. (3.5 marks)

Virtual warehouse: A virtual warehouse is a set of views over operational databases. For efficient query processing, only some of the possible summary views may be materialized. A virtual warehouse is easy to build but requires excess capacity on operational database servers.
(3 marks)

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 72562
QUESTION_TEXT What are the four different views regarding the design of a data warehouse? Explain.
SCHEME OF EVALUATION
The views are:
● Top-down view
● Data source view
● Data warehouse view
● Business query view

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 117788
QUESTION_TEXT Explain the methods of classification by Decision Tree Induction.
SCHEME OF EVALUATION
A decision tree is a flow-chart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and leaf nodes represent classes or class distributions. The top-most node in a tree is the root node.
Algorithm: Generate_decision_tree. Generate a decision tree from the given training data.
Input: The training samples, samples, represented by discrete-valued attributes; the set of candidate attributes, attribute-list.
Output: A decision tree.
Method:
1. create a node N;
2. if samples are all of the same class, C then
3.    return N as a leaf node labeled with the class C;
4. if attribute-list is empty then
5.    return N as a leaf node labeled with the most common class in samples; // majority voting
6. select test-attribute, the attribute among attribute-list with the highest information gain;
7. label node N with test-attribute;
8. for each known value ai of test-attribute // partition the samples
9.    grow a branch from node N for the condition test-attribute = ai;
10.   let si be the set of samples in samples for which test-attribute = ai; // a partition
11.   if si is empty then
12.      attach a leaf labeled with the most common class in samples;
13.   else attach the node returned by Generate_decision_tree(si, attribute-list - test-attribute); // attribute-list with test-attribute removed

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 117790
QUESTION_TEXT Define FP-Tree.
Explain the FP-tree construction algorithm.
SCHEME OF EVALUATION
A frequent pattern tree (or FP-tree) is a tree structure consisting of an item-prefix tree and a frequent-item header table. (3 marks)
* Item-prefix tree:
  * It consists of a root node labelled null
  * Each non-root node consists of three fields:
    * Item name
    * Support count
    * Node link
* Frequent-item header table: it consists of two fields:
  * Item name
  * Head of node link, which points to the first node in the FP-tree carrying that item name
(7 marks) The FP-tree construction algorithm is given on page no. 113.

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 117793
QUESTION_TEXT List and explain the various criteria used to compare the classification methods.
SCHEME OF EVALUATION
* Predictive accuracy
* Speed
* Robustness
* Scalability
* Interpretability
5×2=10 marks

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 117794
QUESTION_TEXT List and explain the web content mining challenges.
SCHEME OF EVALUATION
* Data/Information extraction
* Web information integration and schema matching
* Opinion extraction from online sources
* Knowledge synthesis
* Segmenting web pages and detecting noise
5×2=10 marks
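
Illustration for QUESTION_ID 117788 (not part of the marking scheme): the Generate_decision_tree steps can be sketched in Python roughly as follows. This is a minimal sketch, assuming samples are dictionaries of discrete-valued attributes; the function names, the `domains` argument (the known values of each attribute) and the toy weather-style data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(samples, target):
    """Expected information (in bits) needed to classify a sample."""
    counts = Counter(s[target] for s in samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(samples, attribute, target):
    """Entropy reduction obtained by partitioning samples on attribute."""
    total = len(samples)
    remainder = 0.0
    for value in {s[attribute] for s in samples}:
        subset = [s for s in samples if s[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(samples, target) - remainder


def majority_class(samples, target):
    return Counter(s[target] for s in samples).most_common(1)[0][0]


def generate_decision_tree(samples, attribute_list, target, domains):
    classes = {s[target] for s in samples}
    if len(classes) == 1:                       # steps 2-3: pure node
        return classes.pop()
    if not attribute_list:                      # steps 4-5: majority voting
        return majority_class(samples, target)
    test_attribute = max(                       # step 6: highest information gain
        attribute_list, key=lambda a: information_gain(samples, a, target)
    )
    node = {"attribute": test_attribute, "branches": {}}   # step 7
    remaining = [a for a in attribute_list if a != test_attribute]
    for value in domains[test_attribute]:       # steps 8-9: one branch per value
        subset = [s for s in samples if s[test_attribute] == value]  # step 10
        if not subset:                          # steps 11-12: empty partition
            node["branches"][value] = majority_class(samples, target)
        else:                                   # step 13: recurse
            node["branches"][value] = generate_decision_tree(
                subset, remaining, target, domains
            )
    return node


def classify(tree, sample):
    """Follow branches until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][sample[tree["attribute"]]]
    return tree


# Hypothetical toy training data.
samples = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
domains = {"outlook": ["sunny", "overcast", "rainy"], "windy": ["no", "yes"]}
tree = generate_decision_tree(samples, ["outlook", "windy"], "play", domains)
print(tree)
```

On this toy data, "outlook" has the larger information gain and becomes the root test; only the "rainy" branch needs a further test on "windy". Leaves are stored as plain class labels and internal nodes as dictionaries, which keeps the sketch short but is only one of several reasonable representations.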
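
Illustration for QUESTION_ID 117790 (not part of the marking scheme): the node fields and header table described in that answer can be sketched in Python as below. The scheme itself only points to page no. 113 for the construction algorithm, so this follows the commonly taught two-scan construction and is an assumption about what that page contains. The names `FPNode`, `build_fp_tree`, `support_via_links` and the sample transactions are hypothetical.

```python
from collections import Counter


class FPNode:
    """A node of the item-prefix tree."""

    def __init__(self, item, parent):
        self.item = item          # item name (None for the null root)
        self.count = 0            # support count
        self.node_link = None     # next node carrying the same item name
        self.parent = parent
        self.children = {}        # item name -> child FPNode


def build_fp_tree(transactions, min_support):
    # First scan: count the support of every item and keep the frequent ones.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}
    # Frequent items in descending support order (ties broken by name).
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}

    root = FPNode(None, None)
    header = {item: None for item in order}   # item name -> head of node link
    tails = {}                                # last node seen per item

    # Second scan: insert each transaction's frequent items in rank order.
    for t in transactions:
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                if header[item] is None:
                    header[item] = child          # first node for this item
                else:
                    tails[item].node_link = child  # extend the node-link chain
                tails[item] = child
            child.count += 1
            node = child
    return root, header


def support_via_links(header, item):
    """Sum the counts of all nodes reached through the item's node links."""
    total, node = 0, header[item]
    while node is not None:
        total += node.count
        node = node.node_link
    return total


# Hypothetical transactions, minimum support count of 2.
transactions = [["a", "b"], ["b", "c", "d"], ["a", "b", "c"], ["b", "c"]]
root, header = build_fp_tree(transactions, min_support=2)
print(list(header), support_via_links(header, "a"))
```

Here item "d" is dropped as infrequent, the shared prefix b, c is stored once with accumulated counts, and item "a" appears in two branches that are chained together through the node link starting at its header-table entry.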