Uploaded by AMIRHANI2

Review Sheet 1.2

advertisement
Review Sheet 1
Question I: MCQ
1) Which of the following refers to the problem of finding abstracted patterns (or structures) in the unlabeled data?
a.
b.
c.
d.
Supervised learning
Unsupervised learning
Hybrid learning
Reinforcement learning
2) Which one of the following refers to querying the unstructured textual data?
a.
b.
c.
d.
Information access
Information update
Information retrieval
Information manipulation
3) Which of the following can be considered as the correct process of Data Mining?
a.
b.
c.
d.
Infrastructure, Exploration, Analysis, Interpretation, Exploitation
Exploration, Infrastructure, Analysis, Interpretation, Exploitation
Exploration, Infrastructure, Interpretation, Analysis, Exploitation
Exploration, Infrastructure, Analysis, Exploitation, Interpretation
4) Which of the following is an essential process in which the intelligent methods are applied to extract data
patterns?
a.
b.
c.
d.
Warehousing
Data Mining
Text Mining
Data Selection
5) What is KDD in data mining?
a.
b.
c.
d.
Knowledge Discovery Database
Knowledge Discovery Data
Knowledge Data definition
Knowledge data house
6) The adaptive system management refers to:
a.
b.
c.
d.
Science of making machine performs the task that would require intelligence when performed by
humans.
A computational procedure that takes some values as input and produces some values as the output.
It uses machine learning techniques, in which programs learn from their past experience and adapt
themself to new conditions or situations.
All of the above.
7) For what purpose, the analysis tools pre-compute the summaries of the huge amount of data?
a.
b.
c.
d.
In order to maintain consistency
For authentication
For data access
To obtain the queries response
8) What are the functions of Data Mining?
a.
b.
c.
d.
Association and correctional analysis classification
Prediction and characterization
Cluster analysis and Evolution analysis
All of the above
9) In the following given diagram, which type of clustering is used?
a.
b.
c.
d.
Hierarchal
Naive Bayes
Partitional
None of the above
10) Which of the following statements is incorrect about the hierarchal clustering?
a.
b.
c.
d.
The hierarchal type of clustering is also known as the HCA
The choice of an appropriate metric can influence the shape of the cluster
In general, the splits and merges both are determined in a greedy manner
All of the above
11) Which one of the following can be considered as the final output of the hierarchal type of clustering?
a.
b.
c.
d.
A tree which displays how the close thing are to each other
Assignment of each point to clusters
Finalize estimation of cluster centroids
None of the above
12) Which one of the following statements about the K-means clustering is incorrect?
a.
b.
c.
d.
The goal of the k-means clustering is to partition (n) observation into (k) clusters
K-means clustering can be defined as the method of quantization
The nearest neighbor is the same as the K-means
All of the above
13) Which of the following technique is supervised learning?
a.
b.
Classification
Clustering
c.
d.
Association rules
None of the above
14) Which one of the clustering technique needs the merging approach?
a.
b.
c.
d.
Partitioned
Naïve Bayes
Hierarchical
Both A and C
15) The self-organizing maps can also be considered as the instance of _________ type of learning.
a.
b.
c.
d.
Supervised learning
Unsupervised learning
Missing data imputation
Both A & C
16) The following given statement can be considered as the examples of_________. “Suppose one wants to predict
the number of newborns according to the size of storks' population by performing supervised learning”
a.
b.
c.
d.
Structural equation modeling
Clustering
Regression
Classification
17) In the example predicting the number of newborns, the final number of total newborns can be considered as
the _________
a.
b.
c.
d.
Features
Observation
Attribute
Outcome
18) Which of the following statement best describes classification?
a.
b.
c.
d.
It is a measure of accuracy
It is a function that maps a data object to a predefined class
It is the task of assigning a classification
None of the above
19) Which of the following statements is correct about data mining?
a.
b.
c.
d.
It can be referred to as the procedure of mining knowledge from data
Data mining can be defined as the procedure of extracting information from a set of the data
The procedure of data mining also involves several other processes like data cleaning, data
transformation, and data integration
All of the above
20) In data mining, how many categories of functions are included?
a.
b.
5
4
c.
d.
2
3
21) Which of the following can be considered as mapping of a set or objects with some predefined group or
classes?
a.
b.
c.
d.
Data set
Data Characterization
Data Sub Structure
Data Discrimination
22) The analysis performed to uncover the interesting statistical relationship between associated -attributes value
pairs are known as the _______.
a.
b.
c.
d.
Mining of association
Mining of correlation
Mining of clusters
All of the above
23) Which one of the following can be defined as the data object which does not comply with the general behavior
(or the model of available data)?
a.
b.
c.
d.
Evaluation Analysis
Outliner Analysis
Classification
Prediction
24) Which one of the following statements is correct about the data cleaning?
a.
b.
c.
d.
It refers to removing noise and duplicate data
It refers to the transformation of wrong data into correct data
It refers to removing missing and inconsistent data
All of the above
25) The classification of the data mining system involves:
a.
b.
c.
d.
Database technology
Information Science
Machine learning
All of the above
26) The issues like efficiency, scalability of data mining algorithms comes under_______
a.
b.
c.
d.
Performance issues
Diverse data type issues
Mining methodology and user interaction
All of the above
27) Which one of the following correctly defines the term cluster?
a.
b.
Group of similar objects that differ significantly from other objects
Symbolic representation of facts or ideas from which information can potentially be extracted
c.
d.
Operations on a database to transform or simplify data in order to prepare it for a machine-learning
algorithm
All of the above
28) Which one of the following refers to the binary attribute?
a.
b.
c.
d.
This takes only two values. In general, these values will be 0 and 1, and they can be coded as one bit
The natural environment of a certain species
Systems that can be used without knowledge of internal operations
All of the above
29) Which of the following correctly refers the data selection?
a.
b.
c.
d.
A subject-oriented integrated time-variant non-volatile collection of data in support of management
The actual discovery phase of a knowledge discovery process
The stage of selecting the right data for a KDD process
All of the above
30) Which one of the following correctly refers to the task of the classification?
a.
b.
c.
d.
A measure of the accuracy, of the classification of a concept that is given by a certain theory
The task of assigning a classification to a set of examples
A subdivision of a set of examples into a number of classes
None of the above
31) Which of the following correctly defines the term "Hybrid"?
a.
b.
c.
d.
Approach to the design of learning algorithms that is structured along the lines of the theory of evolution.
Decision support systems that contain an information base filled with the knowledge of an expert
formulated in terms of if-then rules.
Combining different types of method or information
None of these
32) Which of the following correctly defines the term "Discovery"?
a.
b.
c.
d.
It is hidden within a database and can only be recovered if one is given certain clues (an example IS
encrypted information).
An extremely complex molecule that occurs in human chromosomes and that carries genetic information
in the form of genes.
It is a kind of process of executing implicit, previously unknown and potentially useful information from
data
None of the above
33) Euclidean distance measure is defined as ___________
a.
b.
c.
d.
The process of finding a solution for a problem simply by enumerating all possible solutions according
to some predefined order and then testing them
The distance between two points as calculated using the Pythagoras theorem
A stage of the KDD process in which new data is added to the existing selection.
All of the above
34) Which one of the following can be considered as the correct application of the data mining?
a.
b.
c.
d.
Fraud detection
Corporate Analysis & Risk management
Management and market analysis
All of the above
35) Which of the following refers to the sequence of pattern that occurs frequently?
a.
b.
c.
d.
Frequent sub-sequence
Frequent sub-structure
Frequent sub-items
All of the above
36) The issues like "handling the rational and complex types of data" comes under which of the following
category?
a.
b.
c.
d.
Diverse Data Type
Mining methodology and user interaction Issues
Performance issues
All of the above
37) Which of the following also used as the first step in the knowledge discovery process?
a.
b.
c.
d.
Data selection
Data cleaning
Data transformation
Data integration
38) Which of the following refers to the steps of the knowledge discovery process, in which the several data
sources are combined?
a.
b.
c.
d.
Data selection
Data cleaning
Data transformation
Data integration
39) In certain cases, it is not clear what kind of pattern need to find, data mining should_________:
a.
b.
c.
d.
Try to perform all possible tasks
Perform both predictive and descriptive task
It may allow interaction with the user so that he can guide the mining process
All of the above
40) Which of the following is a probabilistic model?
a.
b.
c.
d.
Decision tree
Naïve bayes
Artificial Neural Network
Support Vector Machine
41) An example of continuous attribute is
a.
b.
c.
d.
Color
Gender
Weight
Zip code
42) An example of discrete attribute is
a.
b.
c.
d.
Temperature
Height
Income
Marital status
43) An attribute encompassing the notion of Good, Better, Best is
a.
b.
c.
d.
Nominal
Ordinal
Interval
Ratio
44) The Celsius temperature is --------- attribute
a.
b.
c.
d.
Nominal
Ordinal
Interval
Ratio
45) The length is --------- attribute
a.
b.
c.
d.
Nominal
Ordinal
Interval
Ratio
46) A categorical attribute can be
a.
b.
c.
d.
Nominal or ordinal
Ordinal or Interval
Interval or ratio
Ratio or nominal
47) which of the following operations are applicable to Nominal attribute
a.
b.
c.
d.
Distinctness
Addition
Multiplication
Order
48) Which of the following operations are applicable to Ratio attribute
a. Distinctness
b. Addition/Subtraction
c. Multiplication/Division
d. All of the above
49) Which of the following operations are applicable to temperature in Fahrenheit
a.
b.
c.
d.
Division
Addition
Multiplication
All of the above
50) Which of the following are ordered data
a.
b.
c.
d.
Temporal data
Molecular Structures
Document Data
World Wide Web
51) The data in the table are what type of data
a.
b.
c.
d.
Spatial Data
Transaction Data
Document Data
Sequential Data
TID
Items
1
2
3
4
5
Bread, Coke, Milk
Beer, Bread
Beer, Coke, Diaper, Milk
Beer, Bread, Diaper, Milk
Coke, Diaper, Milk
Question II: Fill in the spaces
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
social media, cyber security, satellite sensors
Sources of Big Data includes -------,-------, and ------5 Vs
Characteristics of big data includes --------, -------, --------, --------, and --------Velocity
High speed data generation and storage is called ------semi-structured, unstructured
Variety means different types of data such as --------, --------, and Structured,
-------Examples of high speed data are -------, -------, -------, and ---------------- is the Non-trivial extraction of implicit, previously unknown and potentially useful knowledge
from data Datamining
Ai, Machine learning, statistics,
Datamining draws ideas from other fields such as ------, -------, ------, ------, and ------database, pattern recognition
The main phases in datamining are -------,-------,-------, -------, and -------Aggregation,
sampling,
feature
selection, dimensionality
Data preprocessing comprises ------,-------,------,and ------reduction
Data cleaning involves --------, -------, and ------Datamining techniques includes two categories, they are --------- andpredictive,
----------- descriptive
supervised
In datamining there are two types of learning, they are --------- and (un)
-------Classification is and example for -------- learning while ------- is unsupervised learning supervised, clustering
Association rule is a type of -------- learning, while regression is -------- learning
Supervised learning creates models from ---------- data objects labeled
16. Unsupervised learning creates models from ---------- data objects unlabeled
classes
17. A classifier maps input data objects topredefined
---------18. --------- finds groups of objects such that the objects in a group will be similar to one another and
different from the objects in other groups clustering
19. -------- use some variables to predict unknown or future values of other variables predictive method
20. -------- find human-interpretable patterns that describe the data descriptive
21. Sky Survey Cataloging is an application of classification
-------clustering
22. Market segmentation is an application of ------23. --------- predicts a value of a given continuous valued variable based on the values of other variables,
assuming a linear or nonlinear model of dependency regression
24. Given a set of records each of which contain some number of items from a given collection
---------- produce dependency rules which will predict occurrence of an item based on occurrences of
other items. Association rules
25. ------- is an application of association rules market-basket analysis
26. ------- detects significant deviations from normal behavior anomaly detection
anomaly detection
27. Network Intrusion Detection is an application of --------28. Examples of classification models are ------, -------, and KNN,
------- naïve Bayes, neural network, tree
29. Examples of clustering models are --------, ---------, and -------- hieratical, density, grid, model based
30. Web mining employs -------- technique. clustering
31. An ----------- is a property or characteristic of an object attribute
object
32. A collection of attributes describes an ……….
field, feature, dimension, characteristic
33. Attribute is also known as …….., ……., or ----------34. Object is also known as …………, ………..,, or ---------- record, instance, sample
35. Types of attributes are ----------,------------,------------, and --------- nominal, ordinal, interval, ratio
36. ----------- attribute has only a finite or countably infinite set of values discrete
37. ----------- attribute has real numbers as attribute values continuous
sparsity, resolution, size
38. The important characteristics of data include -------,----------, ----------, anddimensionality,
-----------noise,
outliers,
fake,
missing,
wrong,
duplicate
39. Example of problem that affect data quality are ……….,…………….,………….,and………
40. ------------ are data objects with characteristics that are considerably different than most of the other
data objects in the data set outliers
Question III: Problems
1. Assume X and Y are two object in a training dataset where
X= [ 2 4 5 -1 0] and Y=[1 4 -3 6 7]
Compute the Euclidean Distance between X and Y
2. If X and Y are two binary objects where
X= 1000000000
Y= 0000001001
Compute the proximity using Jacard and Simple Matching Coefficients
3.
Compute the similarity between the two documents
d1 = 3 2 0 5 0 0 0 2 0 0
d2 = 1 0 0 0 0 0 0 1 0 2
4. Compute the correlation between the following two objects
x = (1, 2, 4, 3, 0, 0, 0), y = (1, 2, 3, 4, 0, 0, 0)
5. For the following splits compute
a. Gini-split
b. Gain-split
c. Gain-ratio
d. Classification error
Parent
C0
7
C1
5
Node 1 Node 2
C0
5
2
C1
1
4
Question IV: True/False
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Ordinal attributes are considered as continuous attributes
false
Classification is a type of unsupervised learning
false
Nominal attributes are quantitative
false
Data cleaning comprises removing noise and duplicated data
true
A classification model is induced from unlabeled data
false
true
A cluster is a group of similar objects
Recommendation System predicts whether a user would like a given resource or not true
Document recognition can recognize whether an e-mail is spam or not, based on the
message header and content true
Choosing a model is the first step in pattern recognition life cycle
false
Clustering Model is trained with data samples with unknown labels true
Number of classes is known a priori
true
Entropy measure is used for evaluating proximity
false
Gini index can be used as a measure for impurity measure
true
true
SVM is an algorithm for clustering
K-means is an algorithm for classification
false
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
Download