CS2032 DATA WAREHOUSING AND DATA MINING

SURYA GROUP OF INSTITUTIONS
SCHOOL OF ENGINEERING & TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
ACADEMIC YEAR 2011-2012 / ODD SEMESTER

SUBJECT CODE / SUBJECT NAME: CS2032 / DATA WAREHOUSING AND DATA MINING
YEAR/SEM: IV/VII

UNIT 3 - DATA MINING

PART A (2 MARKS)

1. What is data mining?
2. Describe the need for data mining.
3. What is meant by KDD?
4. Write down the steps involved in knowledge discovery.
5. Write the role of data mining in data warehousing.
6. Define DMQL.
7. State the significance of hierarchy of data.
8. State the importance of statistics in data mining.
9. Describe the role of DBMS in data mining.
10. Define object-relational database.
11. Define temporal, sequence and time-series databases.
12. Define spatial and spatiotemporal databases.
13. Define data streams.
14. Define concept description.
15. Define classification and prediction.
16. What is outlier analysis?
17. What makes a pattern interesting? Can a data mining system generate all of the interesting patterns? Can a data mining system generate only interesting patterns?
18. List the classifications of data mining.
19. Define no coupling.
20. Define loose coupling.
21. Define semi-tight coupling.
22. Define tight coupling.
23. What are the various forms of data preprocessing?
24. What are the methods of data preprocessing?
25. Define data cleaning.
26. Define data integration.
27. Define data transformation.
28. Define data reduction.
29. What is linear regression?
30. State the methods used to avoid redundancy in data integration.
31. What is data generalization?
32. Define normalization.
33. Define attribute construction.
34. Define attribute subset selection.
35. Define decision tree induction.
36. Define PCA.
37. List the classifications of sampling.
38. Define entropy-based discretization.
39. What is a concept hierarchy? Give an example.

PART B

1. Explain the steps of knowledge discovery in databases with a neat sketch. (8)
2. Describe the data mining functionalities and examine what kinds of patterns can be mined. (16)
3. Explain the classification of data mining systems. (4)
4. Discuss the five primitives for specifying data mining tasks.
5. List out and describe the primitives for specifying a data mining task.
6. Describe the major issues in data mining regarding mining methodology, user interaction, performance and diverse data types. (12)
7. Describe the various issues in data mining. (8)
8. List and discuss the various data mining techniques. (8)
9. Explain the need for and the steps involved in data preprocessing.
10. What are the types of data preprocessing techniques? Explain them in detail.
11. Why do we preprocess the data? Explain how data preprocessing techniques can improve the quality of the data. (8)
12. Explain the various methods of data cleaning in detail.
13. Explain the smoothing techniques.
14. Suppose that the data for analysis include the attribute age. The age values for the data tuples are 13, 15, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
    a. Use smoothing by bin means to smooth the above data, using a bin depth of 3. Illustrate your steps.
    b. How will you determine outliers in the data?
    c. What other methods are there for data smoothing? (16)
15. Discuss the various issues that have to be addressed during data integration. (8)
16. Explain data integration and transformation techniques in detail.
17. Explain data transformation in detail.
18. What are the value ranges of the following normalization methods?
    a. Min-max normalization
    b. Z-score normalization
    c. Normalization by decimal scaling
19. Explain data reduction.
20. Explain any two dimensionality reduction methods.
21. Explain parametric and non-parametric methods of reduction.
22. Explain data discretization and concept hierarchy generation.
23. Describe how concept hierarchies are useful in data mining.
24. Elaborately explain the discretization and concept hierarchy generation for numeric data and categorical data.

STAFF INCHARGE                                        HOD
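
Worked sketch for Part B, Question 14(a). The following is a minimal illustration of smoothing by bin means, not a model answer. It assumes equal-depth (equal-frequency) partitioning of the sorted values and that a short final bin is kept as-is; the function name smooth_by_bin_means is hypothetical. With the 25 age values listed in the question and a bin depth of 3, the last bin holds the single value 70.

```python
def smooth_by_bin_means(values, depth):
    """Sort the values, cut them into consecutive bins of `depth` items,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), depth):
        bin_values = ordered[start:start + depth]  # last bin may be shorter
        mean = sum(bin_values) / len(bin_values)
        # Rounding to 2 decimals is a presentation choice, not part of the method.
        smoothed.extend([round(mean, 2)] * len(bin_values))
    return smoothed


# Age values exactly as given in Question 14.
ages = [13, 15, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

smoothed_ages = smooth_by_bin_means(ages, 3)

# Print each bin next to its smoothed replacement.
for start in range(0, len(ages), 3):
    print(ages[start:start + 3], "->", smoothed_ages[start:start + 3])
```

Each printed row pairs one bin with its smoothed replacement, which mirrors the step-by-step illustration the question asks for.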