File - Department of Information Technology-SRIT

UNIT V
RECENT TRENDS
1. What are the classifications of tools for data mining?
• Commercial tools
• Public domain tools

2. What are commercial tools?
Commercial tools are products sold by a vendor, usually together with
consulting services from the same company. Examples:
1. ‘Intelligent Miner’ from IBM
2. ‘SAS’ System from SAS Institute
3. ‘Thought’ from Right Information Systems, etc.
3. What is web usage mining?
• It mines web log records to discover user access patterns of web pages.
• It can identify potential customers for e-commerce, enhance the quality and
delivery of Internet information services to the end user, and improve web
server system performance.
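As a minimal sketch of web usage mining, the Python below counts page hits and
page-to-page transitions per visitor from a few log records. The log lines,
their simplified "client path" layout, and the function name
mine_access_patterns are assumptions made for illustration; real server logs
carry many more fields.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified log records: "<client-ip> <requested-path>"
LOG_LINES = [
    "10.0.0.1 /home",
    "10.0.0.1 /products",
    "10.0.0.2 /home",
    "10.0.0.1 /cart",
    "10.0.0.2 /products",
    "10.0.0.2 /cart",
    "10.0.0.3 /home",
]


def mine_access_patterns(lines):
    """Return (page hit counts, page-to-page transition counts)."""
    hits = Counter()
    sessions = defaultdict(list)  # client -> ordered list of pages visited
    for line in lines:
        client, path = line.split()
        hits[path] += 1
        sessions[client].append(path)
    transitions = Counter()
    for pages in sessions.values():
        # Count each consecutive pair of pages within one client's visit.
        transitions.update(zip(pages, pages[1:]))
    return hits, transitions


hits, transitions = mine_access_patterns(LOG_LINES)
print(hits.most_common(1))
print(transitions.most_common(2))
```

Frequent transitions such as /home to /products are the kind of access pattern
that could inform page layout or caching decisions.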
4. What are the applications of spatial databases?
• Geographic information
• Geomarketing
• Remote sensing
• Image database exploration
• Medical imaging
• Navigation
5. What are research prototypes?
Research prototypes are data mining systems developed at research
institutions; some of them may find their way into the commercial market.
Examples are ‘DB Miner’ from Simon Fraser University, British Columbia, and
‘Mining Kernel System’ from the University of Ulster, Northern Ireland.
6. What is the difference between generic single-task tools and generic
multi-task tools?
Generic single-task tools generally use neural networks or decision trees.
They cover only the data mining part and require extensive pre-processing and
post-processing steps.
Generic multi-task tools offer modules for the pre-processing and
post-processing steps and also offer a broad selection of popular data mining
algorithms, such as clustering.
7. What kinds of association can be mined from multimedia data?
• Associations between image content and non-image content features
• Associations among image contents that are not related to spatial
relationships
• Associations among image contents related to spatial relationships
8. What are the areas in which data warehouses are used at present and in
future?
The potential subject areas in which data warehouses may be developed at
present and in future include:
• Census data:
The Registrar General and Census Commissioner of India decennially
compiles information on all individuals, villages, population groups, etc. This
information is wide ranging, for example the individual slip, a compilation of
information on individual households, of which a database of a 5% sample is
maintained for analysis. A data warehouse can be built from this database, upon
which OLAP techniques can be applied. Data mining can also be performed for
analysis and knowledge discovery.
• Prices of essential commodities:
The Ministry of Food and Civil Supplies, Government of India, compiles
daily data for about 300 observation centers in the entire country on the
prices of essential commodities such as rice, edible oil, etc. A data warehouse
can be built for this data and OLAP techniques can be applied for its analysis.
9. What are the other areas for data warehousing and data mining?
• Commerce and Trade
10. Specify some of the sectors in which data warehousing and data mining are
used.
• Program Implementation
11. What are the uses and applications of DBMiner?
DBMiner is used to perform data mining functions, including characterization,
association, classification, prediction and clustering.
The DBMiner system can be used as a general-purpose online analytical
mining system for both OLAP and data mining in relational databases and
data warehouses. It is used with medium to large relational databases and
offers fast response time.
12. Give some data mining tools.
• DBMiner
• GeoMiner
• Multimedia Miner
• WeblogMiner
13. Mention some of the application areas of data mining.
• DNA analysis
• Financial data analysis
• Retail industry
• Telecommunication industry
• Market analysis
• Banking industry
• Health care analysis
14. Differentiate data query and knowledge query
A data query finds concrete data stored in a database and corresponds to a
basic retrieval statement in a database system.
A knowledge query finds rules, patterns and other kinds of knowledge in a
database and corresponds to querying database knowledge including
deduction rules, integrity constraints, generalized rules, frequent patterns and
other regularities.
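The contrast can be sketched with Python's built-in sqlite3 module. The table
purchases, its contents, and the helper frequent_pairs are hypothetical; the
"knowledge query" here is only a toy frequent-pair search, not a full mining
system.

```python
import sqlite3
from collections import Counter
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (basket_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, "bread"), (1, "milk"), (2, "bread"), (2, "milk"),
     (2, "eggs"), (3, "milk"), (3, "eggs")],
)

# Data query: retrieve concrete rows that are stored in the database.
data_answer = conn.execute(
    "SELECT item FROM purchases WHERE basket_id = 2 ORDER BY item"
).fetchall()


def frequent_pairs(connection, min_support):
    """Toy knowledge query: item pairs found together in at least
    min_support baskets (a simple frequent pattern)."""
    baskets = {}
    for basket_id, item in connection.execute(
        "SELECT basket_id, item FROM purchases"
    ):
        baskets.setdefault(basket_id, set()).add(item)
    counts = Counter()
    for items in baskets.values():
        counts.update(combinations(sorted(items), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


knowledge_answer = frequent_pairs(conn, min_support=2)
print(data_answer)
print(knowledge_answer)
```

The data query returns rows exactly as stored, while the knowledge query
returns a regularity that no single row contains.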
15. Differentiate direct query answering and intelligent query answering.
Direct query answering means that a query is answered by returning exactly
what is asked for.
Intelligent query answering consists of analyzing the intent of the query and
providing generalized, neighborhood, or associated information relevant to the
query.
16. What is time series analysis, and what are its goals?
A time series is a set of attribute values over a period of time. Time series
analysis may be viewed as finding patterns in the data and predicting future
values. Its goals are:
1. Finding patterns in the data
2. Predicting future values
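Both goals can be illustrated with a simple moving average, which is only one
of many possible techniques and is chosen here for brevity. The series
monthly_sales and the function names are hypothetical.

```python
def moving_average(values, window):
    """Smooth a series with a trailing moving average to expose its trend."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def forecast_next(values, window):
    """Predict the next value as the mean of the last `window` values."""
    return sum(values[-window:]) / window


monthly_sales = [10, 12, 14, 13, 15, 17]  # hypothetical observations
smoothed = moving_average(monthly_sales, window=3)
prediction = forecast_next(monthly_sales, window=3)
print(smoothed)
print(prediction)
```

The smoothed series rises steadily, which is the pattern; the forecast simply
extends the most recent average one step ahead.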
17. Define visual data mining.
Visual data mining discovers implicit and useful knowledge from large data
sets using data and/or knowledge visualization techniques.
It is the integration of data visualization and data mining.
18. What does audio data mining mean?
Audio data mining uses audio signals to indicate patterns in data or the
features of data mining results. Patterns are transformed into sound and music.
Interesting or unusual patterns are then identified by listening to pitches,
rhythms, tune and melody.
19. What are the steps involved in DNA analysis?
• Semantic integration of heterogeneous, distributed genome databases
• Similarity search and comparison among DNA sequences
• Association analysis: identification of co-occurring gene sequences
• Path analysis: linking genes to different stages of disease development
• Visualization tools and genetic data analysis
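The similarity search step can be sketched with edit distance, one common
measure of how far apart two sequences are. This is an illustrative choice,
not the method prescribed by the notes; the sequences and the helper
most_similar are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def most_similar(query, database):
    """Return the database sequence with the smallest edit distance."""
    return min(database, key=lambda seq: edit_distance(query, seq))


database = ["GACTATA", "TTTTTTT", "GATTACC"]  # hypothetical sequences
best_match = most_similar("GATTACA", database)
print(best_match, edit_distance("GATTACA", best_match))
```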
20. What are the factors involved in choosing a data mining system?
• Data types
• System issues
• Data sources
• Data mining functions and methodologies
• Coupling data mining with database and/or data warehouse systems
• Scalability
• Visualization tools
• Data mining query language and graphical user interface
21. Define DMQL.
DMQL stands for Data Mining Query Language.
It specifies clauses and syntax for performing different types of data mining
tasks, for example data classification, data clustering and mining association
rules. It uses SQL-like syntax to mine databases.
22. Define text mining.
Text mining is the extraction of meaningful information from large amounts of
free-format textual data. It is useful in artificial intelligence and pattern
matching.
It is also known as text data mining, knowledge discovery from text, or content
analysis.
23. What does web mining mean?
Web mining is a technique for processing information available on the web and
searching it for useful data.
It is used to discover web pages, text documents, multimedia files, images, and
other types of resources on the web.
It is used in several fields such as e-commerce, information filtering, fraud
detection, and education and research.
24. Define spatial data mining.
Spatial data mining is the extraction of undiscovered and implied spatial
information.
Spatial data is data that is associated with a location.
It is used in several fields such as geography, geology, medical imaging, etc.
25. Explain multimedia data mining.
Multimedia data mining mines large multimedia databases.
It does not retrieve any specific information from the multimedia database;
instead it derives new relationships, trends, and patterns from the stored
multimedia data.
It is used in medical diagnosis, stock markets, the animation industry, the
airline industry, traffic management systems, surveillance systems, etc.
26. Give the lifecycle of data mining in technology adoption terms.
The stages are as follows:
• Innovators: The technology takes form when there is a need for methods to
solve a particular problem.
• Early adopters: Interest increases as more and more methods are proposed.
• Chasm: This represents the hurdles or challenges that must be met before the
technology can be widely accepted.
• Early majority: The technology becomes mature and is generally accepted and
used.
• Late majority: The technology is well accepted, but interest in it declines.
• Laggards: The technology is outdated and regarded as an old version.
27. What are the three types of dimensions in a spatial database?
Nonspatial dimension: contains only nonspatial data.
Spatial-to-nonspatial dimension: a dimension whose primitive-level data are
spatial but whose generalization, starting at a certain high level, becomes
nonspatial.
Spatial-to-spatial dimension: a dimension whose primitive level and all of its
high-level generalized data are spatial.
28. Name some examples of data mining in the retail industry.
• Design and construction of data warehouses based on the benefits of data mining
• Multidimensional analysis of sales, customers, products, time and region
• Analysis of the effectiveness of sales campaigns
• Customer retention: analysis of customer loyalty
• Purchase recommendation and cross-reference of items
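Multidimensional analysis of sales amounts to aggregating a measure along
chosen dimensions. A minimal sketch follows; the records in SALES, the
dimension names, and the function roll_up are hypothetical, and a real system
would use an OLAP engine rather than plain Python.

```python
from collections import defaultdict

DIMENSIONS = ("region", "product", "quarter")

# Hypothetical sales facts: (region, product, quarter, amount)
SALES = [
    ("North", "tea", "Q1", 100),
    ("North", "coffee", "Q1", 150),
    ("South", "tea", "Q1", 80),
    ("North", "tea", "Q2", 120),
    ("South", "coffee", "Q2", 60),
]


def roll_up(records, keep):
    """Sum the amount grouped by the dimensions named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[p] for p in positions)
        totals[key] += record[-1]  # the last field is the sales amount
    return dict(totals)


by_region = roll_up(SALES, keep=["region"])
by_region_product = roll_up(SALES, keep=["region", "product"])
print(by_region)
print(by_region_product)
```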
29. What are the various approaches to image signatures?
• Color histogram-based signature: The signature of an image includes color
histograms based on the color composition of the image.
• Multifeature composed signature: The signature of an image includes a
composition of multiple features.
• Wavelet-based signature: Uses the dominant wavelet coefficients of an
image.
• Wavelet-based signature with region-based granularity: The computation and
comparison of signatures are done at the granularity of regions, not the
entire image.
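The first approach can be sketched in a few lines. Here an image is assumed to
be a plain list of (r, g, b) tuples with values from 0 to 255, and histogram
intersection is used as the comparison measure; both are illustrative
assumptions, as are the function names and sample images.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Normalized histogram over quantized (r, g, b) values in 0..255."""
    counts = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each channel value to a bin index in 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        counts[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1
    total = len(pixels)
    return [count / total for count in counts]


def histogram_intersection(sig_a, sig_b):
    """Similarity in [0, 1]; 1 means identical color composition."""
    return sum(min(a, b) for a, b in zip(sig_a, sig_b))


# Hypothetical four-pixel images.
red_image = [(255, 0, 0)] * 4
mostly_red = [(250, 10, 10)] * 3 + [(0, 0, 255)]
blue_image = [(0, 0, 255)] * 4

sig_red = color_histogram(red_image)
print(histogram_intersection(sig_red, color_histogram(mostly_red)))
print(histogram_intersection(sig_red, color_histogram(blue_image)))
```

A color histogram ignores where colors appear in the image, which is one
motivation for the region-based signature in the list above.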
30. What are the basic measures for assessing the quality of text retrieval?
Precision: This is the percentage of retrieved documents that are in fact
relevant to the query.
Precision = |{Relevant} ∩ {Retrieved}| / |{Retrieved}|
Recall: This is the percentage of documents that are relevant to the query and
were in fact retrieved.
Recall = |{Relevant} ∩ {Retrieved}| / |{Relevant}|
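A short worked example of the two measures, treating the relevant and
retrieved documents as sets. The document identifiers are hypothetical, and
returning 0.0 for an empty set is a convention chosen here, not something the
definitions specify.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for two collections of document ids."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = relevant & retrieved  # {Relevant} ∩ {Retrieved}
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


relevant_docs = {"d1", "d2", "d3", "d4"}   # hypothetical ground truth
retrieved_docs = {"d2", "d3", "d5"}        # hypothetical system output
precision, recall = precision_recall(relevant_docs, retrieved_docs)
print(precision, recall)
```

Two of the three retrieved documents are relevant, so precision is 2/3; two of
the four relevant documents were retrieved, so recall is 1/2.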
31. How can complex structure-valued attributes be generalized?
The methods are:
• Generalizing each attribute in the structure while maintaining the shape of
the structure
• Flattening the structure and generalizing the flattened structure
• Summarizing the low-level structures by high-level concepts or aggregations
• Returning the type or an overview of the structure
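Two of these methods can be sketched for a set-valued attribute: summarizing
low-level values by high-level concepts, and returning only an overview. The
concept hierarchy CONCEPT_OF, the sample attribute hobbies, and both function
names are hypothetical.

```python
from collections import Counter

# Hypothetical concept hierarchy: low-level value -> high-level concept
CONCEPT_OF = {
    "tennis": "sports",
    "hockey": "sports",
    "chess": "games",
    "violin": "music",
}


def generalize_values(values, hierarchy):
    """Replace each value with its high-level concept and count them."""
    return Counter(hierarchy.get(value, "other") for value in values)


def overview(values):
    """Return only the type and size of the structure."""
    return {"type": type(values).__name__, "size": len(values)}


hobbies = ["tennis", "hockey", "chess", "violin"]
generalized = generalize_values(hobbies, CONCEPT_OF)
summary = overview(hobbies)
print(generalized)
print(summary)
```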
32. What are the two kinds of multimedia indexing and retrieval systems?
Description-based retrieval systems: These build indices and perform object
retrieval based on image descriptions such as keywords, captions, size and
time. Doing this manually is labour intensive; if automated, the results are
typically of poor quality.
Content-based retrieval systems: These support retrieval based on the image
content, such as color histogram, texture, objects and wavelet transforms.
They use visual features to index images and promote object retrieval based on
feature similarity, which is highly desirable in many applications.
PART B
1. Explain briefly about mining spatial databases.
2. Explain briefly how to store and manage multimedia objects.
3. Explain briefly about mining text databases.
4. What are the four major components that are used to characterize time series
data? Explain with examples.
5. Discuss in detail any four data mining applications.
6. What are the basic measures used for text retrieval? Explain each with an
example.
7. What are the salient features of time series data mining?
8. Explain the various trends in data mining.
9. Explain the role of data mining in financial data analysis.
10. Explain briefly about mining the World Wide Web.