Uploaded by Kartikey Chaubey

KIET Group of Institutions, Ghaziabad
Department of Computer Science & Information Technology
(An ISO – 9001: 2015 Certified & ‘A+’ Grade accredited Institution by NAAC)
DATA ANALYTICS – KCS051
QUESTION BANK
UNIT 1
1. Define the terms data science and data analytics.
2. What are the different types of data?
3. Develop and explain the data analytics life cycle, naming each of its phases.
4. Compare and contrast analysis and reporting in data analytics with a suitable example.
5. What are the various stages in big data analytics life cycle? Illustrate with a figure, explaining
each of them.
6. What is big data? Describe the main features of big data in detail.
7. Compare structured, semi-structured and unstructured data types.
8. Discuss supervised and unsupervised learning with examples.
9. Steps in the Data Analysis Process.
10. Types of Data Analytics.
11. Analysis vs reporting.
12. Traditional Analytics Structure vs Modern Analytics Architecture.
UNIT 2
1. What is Decision Tree?
2. How can you deal with uncertainty?
3. Differentiate between fuzzy logic and Boolean logic.
4. Discuss some applications of genetic algorithm-based classification.
5. Discuss different types of time series data analysis along with their major application areas.
6. Discuss some applications of genetic algorithm-based classification.
7. Given the data X = {2, 3, 4, 5, 6, 7} and Y = {1, 5, 3, 6, 7, 8}, compute the principal component using PCA.
8. What is prediction error? State and explain prediction error in regression and classification
with suitable examples.
9. Discuss Bayesian data analysis.
10. Discuss technologies used for Multivariate analysis.
11. The following table shows the midterm and final exam grades obtained for students in a
database course.
(a) Plot the data. Do x and y seem to have a linear relationship?
(b) Use the method of least squares to find an equation for the prediction of a student’s final
exam grade based on the student’s midterm grade in the course.
(c) Predict the final exam grade of a student who received an 86 on the midterm exam.
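For Q7 of this unit, the PCA steps can be checked with a short Python sketch using NumPy. It assumes that the six values before the semicolon are the x-coordinates and the six after it are the y-coordinates of six observations, and it uses the sample covariance (division by n - 1). The function name `pca` is hypothetical, chosen for illustration.

```python
import numpy as np

# Unit 2, Q7 (assumed layout): six observations of two variables
x = np.array([2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([1, 5, 3, 6, 7, 8], dtype=float)
data = np.column_stack((x, y))  # shape (6, 2): one row per observation

def pca(data):
    """Return eigenvalues (largest first), eigenvectors as columns, and projected scores."""
    centered = data - data.mean(axis=0)          # 1. subtract the mean of each variable
    cov = np.cov(centered, rowvar=False)         # 2. sample covariance matrix (divides by n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)       # 3. eigen-decomposition of the symmetric matrix
    order = np.argsort(eigvals)[::-1]            # 4. order components by variance explained
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centered @ eigvecs                  # 5. project the centered data onto the components
    return eigvals, eigvecs, scores

eigvals, eigvecs, scores = pca(data)
print("covariance matrix:", np.cov(data, rowvar=False).tolist())
print("eigenvalues:", eigvals)
print("first principal component:", eigvecs[:, 0])
print("scores on PC1:", scores[:, 0])
```

Under these assumptions the covariance matrix is [[3.5, 4.4], [4.4, 6.8]], the eigenvalues are approximately 9.85 and 0.45, and the first principal component is approximately (0.570, 0.822). The sign of an eigenvector is arbitrary, so a valid answer may show both components negated; dividing by n instead of n - 1 scales the eigenvalues but leaves the directions unchanged.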
UNIT 3
1. Discuss RTAP (Real-Time Analytics Platform).
2. Why is data sampling crucial for data analytics?
3. How can you deal with uncertainty?
4. State two issues in stream processing.
5. Discuss the concept of estimating moments in stream mining.
6. Discuss the architecture/components of a general stream processing model. List a few sources
of streaming data.
7. Discuss the Publish/Subscribe model of streaming architecture.
8. Explain the Datar-Gionis-Indyk-Motwani (DGIM) algorithm with an example. Why must the number
of buckets representing a window be small?
9. Explain the distinct count problem for data streams. How do you identify unique users who have
made web server requests (log analysis) in each month?
10. Explain any one algorithm to count the number of distinct elements in a data stream.
11. Discuss any two sampling techniques.
12. Discuss the FM algorithm. Apply the Flajolet-Martin algorithm to the following stream of data to
estimate the number of unique elements in the stream.
S = 1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1
Given: h(x) = (6x + 2) mod 5
13. Discuss time series data analysis along with its application areas.
14. Discuss Support vector and Kernel Methods of Data Analysis.
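For Q8 of this unit, a minimal Python sketch of DGIM, assuming the usual textbook variant: each bucket stores the timestamp of its most recent 1 and a power-of-two size, at most two buckets of each size are kept, and a query counts only half of the oldest bucket. The class `DGIM` and its methods are hypothetical names for illustration, and the random bit stream is illustrative, not data from the question bank.

```python
import random

class DGIM:
    """Approximate count of 1s among the last `window` bits of a binary stream."""

    def __init__(self, window):
        self.window = window
        self.t = 0          # timestamp of the most recent bit (1-based)
        self.buckets = []   # [end_timestamp, size] pairs, newest bucket first

    def add(self, bit):
        self.t += 1
        # Drop the oldest bucket once its most recent 1 has fallen out of the window.
        if self.buckets and self.buckets[-1][0] <= self.t - self.window:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, [self.t, 1])  # every new 1 starts a bucket of size 1
        # Allow at most two buckets of each size: when three share a size, merge the
        # two oldest of them (the merged bucket keeps the newer end timestamp).
        i = 0
        while i + 2 < len(self.buckets) and self.buckets[i][1] == self.buckets[i + 2][1]:
            self.buckets[i + 1][1] *= 2
            del self.buckets[i + 2]
            i += 1

    def estimate(self):
        """Sum all bucket sizes, counting only half of the oldest bucket,
        which may straddle the window boundary."""
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        return total - self.buckets[-1][1] / 2

# Illustrative run on a random bit stream
random.seed(0)
window = 16
bits = [random.randint(0, 1) for _ in range(100)]
dgim = DGIM(window)
for b in bits:
    dgim.add(b)
print("DGIM estimate:", dgim.estimate(), " exact count:", sum(bits[-window:]))
```

Because sizes are powers of two with at most two buckets of each size, a window of N bits needs only O(log N) buckets, each holding a timestamp and a size in O(log N) bits. The whole window is therefore summarized in O(log² N) bits, and the estimate stays within 50% of the true count.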
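For Q12 of this unit, the Flajolet-Martin computation can be traced with the Python sketch below. It uses the hash function given in the question and estimates the distinct count as 2^R, where R is the largest number of trailing zeros among the hashed values. One assumption is loudly flagged: a hash value of 0 is treated as having 0 trailing zeros; other conventions for a zero hash exist and would change R. The helper names are hypothetical.

```python
def h(x):
    """Hash function given in Unit 3, Q12."""
    return (6 * x + 2) % 5

def trailing_zeros(value):
    """Count trailing 0 bits of value; a hash of 0 is counted as 0 (an assumed convention)."""
    if value == 0:
        return 0
    zeros = 0
    while value % 2 == 0:
        value //= 2
        zeros += 1
    return zeros

def flajolet_martin(stream, hash_fn):
    """Return (2**R, R), where R is the longest run of trailing zeros over all hashed items."""
    r = 0
    for item in stream:
        r = max(r, trailing_zeros(hash_fn(item)))
    return 2 ** r, r

stream = [1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1]
for x in sorted(set(stream)):
    print(f"x={x}  h(x)={h(x)}  binary={h(x):03b}  trailing zeros={trailing_zeros(h(x))}")
estimate, r = flajolet_martin(stream, h)
print(f"R = {r}, estimated distinct count = {estimate}")
```

With this convention, h maps 1, 2, 3, 4 to 3, 4, 0, 1. The largest tail length is R = 2 (from h(2) = 4 = 100 in binary), so the estimate is 2^2 = 4, which here matches the four distinct values in S.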
UNIT 4
1. Briefly describe association rule mining.
2. Briefly describe the working of the CLIQUE algorithm.
3. Discuss CLIQUE vs PROCLUS clustering.
4. What is the curse of dimensionality? Explain with an example.
5. Suppose that the data mining task is to cluster points (with (x, y) representing location) into three
clusters, where the points are
A1(2, 10), A2(2, 5), A3(8, 4), B1(5, 8), B2(7, 5), B3(6, 4), C1(1, 2), C2(4, 9). The distance function is
Manhattan distance. Suppose initially we assign A1, B1, and C1 as the center of each cluster,
respectively. Use the k-means algorithm to show only (A) the three cluster centers after the
second iteration and (B) the final three clusters.
6. Explain the Apriori algorithm in detail. Also, give a short example to show that items in a
strong association rule may actually be negatively correlated.
7. Explain K Means Algorithm.
8. PCY vs Apriori.
9. Define lift in association rule mining.
10. Hierarchical clustering or types of clustering.
11. Find all the association rules from the transactions given below, with minsup = 50% and
minconf = 50%.
TID   Items Bought
10    Beer, Nuts, Diaper
20    Beer, Coffee, Diaper
30    Beer, Diaper, Eggs
40    Nuts, Eggs, Milk
50    Nuts, Coffee, Diaper, Eggs, Milk
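For Q5 of this unit, the k-means iterations can be checked with a Python sketch using NumPy. It assumes that points are assigned to the nearest center by Manhattan distance, that each center is then updated to the mean of its points (the usual k-means update; a k-medians variant would use the coordinate-wise median instead), and that one iteration means one assignment step followed by one update step. The function name `kmeans_manhattan` is hypothetical.

```python
import numpy as np

# Points from Unit 4, Q5
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4),
    "B1": (5, 8), "B2": (7, 5), "B3": (6, 4),
    "C1": (1, 2), "C2": (4, 9),
}

def kmeans_manhattan(points, initial, max_iter=100):
    """k-means with Manhattan-distance assignment and mean-based center updates.

    Returns the centers computed after each iteration and the final clusters.
    Assumes no cluster ever becomes empty (true for this data set).
    """
    names = list(points)
    X = np.array([points[n] for n in names], dtype=float)
    centers = np.array([points[n] for n in initial], dtype=float)
    labels, history = None, []
    for _ in range(max_iter):
        # Assignment step: L1 distance from every point to every center, shape (n_points, k)
        dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # no point changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: each center moves to the mean of the points assigned to it
        centers = np.array([X[labels == j].mean(axis=0) for j in range(len(centers))])
        history.append(centers)
    clusters = [[names[i] for i in np.flatnonzero(labels == j)] for j in range(len(centers))]
    return history, clusters

history, clusters = kmeans_manhattan(points, initial=["A1", "B1", "C1"])
print("centers after iteration 2:", history[1].tolist())
print("final clusters:", clusters)
```

Under these assumptions the centers after the second iteration are (3, 9.5), (6.5, 5.25) and (1.5, 3.5), and the algorithm converges to the clusters {A1, B1, C2}, {A3, B2, B3} and {A2, C1}.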
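For Q11 of this unit (and the lift measure asked about in Q9), a small Python sketch of level-wise Apriori followed by rule generation. It assumes that support is measured as a fraction of the five transactions, that an itemset is frequent when its support is at least minsup, and that lift is confidence divided by the support of the consequent. The helper names `count`, `apriori` and `association_rules` are hypothetical.

```python
from itertools import combinations

transactions = {
    10: {"Beer", "Nuts", "Diaper"},
    20: {"Beer", "Coffee", "Diaper"},
    30: {"Beer", "Diaper", "Eggs"},
    40: {"Nuts", "Eggs", "Milk"},
    50: {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
}

def count(itemset, baskets):
    """Number of baskets that contain every item of `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)

def apriori(baskets, minsup):
    """Return {frequent itemset: support count} using level-wise Apriori search."""
    n = len(baskets)
    items = sorted({item for basket in baskets for item in basket})
    level = [frozenset([i]) for i in items if count(frozenset([i]), baskets) / n >= minsup]
    frequent, k = {}, 1
    while level:
        for itemset in level:
            frequent[itemset] = count(itemset, baskets)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must be frequent, then check support
        level = [c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                 and count(c, baskets) / n >= minsup]
    return frequent

def association_rules(frequent, n, minconf):
    """Yield (lhs, rhs, support, confidence, lift) for every rule meeting minconf."""
    for itemset, both in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                rhs = itemset - lhs
                confidence = both / frequent[lhs]
                if confidence >= minconf:
                    lift = both * n / (frequent[lhs] * frequent[rhs])
                    yield lhs, rhs, both / n, confidence, lift

baskets = list(transactions.values())
frequent = apriori(baskets, minsup=0.5)
rules = list(association_rules(frequent, len(baskets), minconf=0.5))
for lhs, rhs, sup, conf, lift in rules:
    print(f"{sorted(lhs)} -> {sorted(rhs)}: support={sup:.0%}, confidence={conf:.0%}, lift={lift:.2f}")
```

With minsup = 50%, an itemset must appear in at least 3 of the 5 transactions, so the frequent itemsets are {Beer}, {Nuts}, {Diaper}, {Eggs} and {Beer, Diaper}. The only rules are Beer -> Diaper (confidence 100%) and Diaper -> Beer (confidence 75%), each with support 60% and lift 1.25.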
UNIT 5
1. Name some visualization tools.
2. Differentiate between RDBMS and Hadoop.
3. How do you find the mean and sum of a matrix using R?
4. List two data visualization tools.
5. What is the role of the NameNode in Hadoop?
6. Discuss the heartbeat in HDFS.
7. Illustrate and explain the concept of the MapReduce framework.
8. Briefly discuss the features of YARN.
9. Explain the working of the Hadoop Distributed File System (HDFS).
10. List and explain five R functions used in descriptive statistics.
11. Describe the Hive architecture and its features.
12. Discuss the main components of MapReduce.
13. NoSQL vs RDBMS.
14. Box plot in R.