International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 3 - Apr 2014
Web Data Mining: Survey
Avneet*, Hardeep Singh#
*M.Tech Student, #Assistant Professor
1 Dept of CSE, Lovely Professional University, Phagwara, Punjab
2 Dept of ECE, Lovely Professional University, Phagwara, Punjab
Abstract: - Use of the World Wide Web increases day by day, and with this rapid growth the web has become a very large database. Most people access web services publicly on a daily basis and generate a huge amount of data, such as text, images, multimedia, queries, user logs, and blog data. Mining web data is quite difficult because the data is huge, dynamic, and diverse. Web Usage Mining is available to mine user logs, Web Structure Mining to mine the link structure between web pages, and Web Content Mining to mine the content of web pages. This survey presents the types of web data mining, the problems that occur in web data mining, and its applications.
Keywords: Web Data Mining, Web Usage Mining, Web Structure Mining, Web Content Mining, Problems, Applications
I. INTRODUCTION
Data mining (knowledge discovery) is the process that helps to analyze data from different perspectives, summarize it into valuable information, and discover interesting patterns to improve business processes. It can also be called “GOLD MINING”. Web Data Mining is an application of Data Mining and uses the techniques of data mining to find relevant information. With the rapid growth of web data it has become difficult to handle, because necessary and unnecessary data are available together. With the availability of huge web data, users face many problems while interacting with the World Wide Web. The main problems are:-
A. Extract new information using the available data
The World Wide Web contains a large number of datasets, and there is a need to find new information from those available datasets, for example by accessing user logs to extract the behavior of users and their interests.
B. To find the relevant and effective knowledge
The web is a collection of a large number of datasets, and when users query for any information they may get accurate or inaccurate information.
C. To understand the users
The web contains large records of users, from which we can understand the behavior of users and provide information to users according to their needs.
To find relevant and useful information in this very large database, data mining techniques called web data mining are used. Web data mining accesses online web data in order to clean it, extract from it, and find relevant information (knowledge discovery). Web data mining also reduces the irrelevant data from a web page.
“Web data mining is an application of data mining that finds the interesting and potentially useful knowledge from web page. Normally, it is expected that hyperlink structure and user data both helpful for mining process.”
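The cleaning and extraction that the introduction attributes to web data mining can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method from the surveyed literature: the class name `PageCleaner`, the function `clean_page`, and the sample page are hypothetical, and treating only `script` and `style` elements as irrelevant data is a deliberate simplification. The sketch keeps the visible text (content) and the hyperlink targets (structure) of one page.

```python
from html.parser import HTMLParser


class PageCleaner(HTMLParser):
    """Collects visible text and hyperlink targets from one HTML page."""

    # Assumption: only these elements count as irrelevant data in this sketch.
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.text_parts = []   # visible text fragments, in document order
        self.links = []        # href values of <a> elements
        self._skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when it is outside script/style and not blank.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def clean_page(html):
    """Return (visible_text, links) for one HTML document."""
    parser = PageCleaner()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


# Hypothetical sample page, invented for illustration.
sample_html = """
<html><head><style>p {color: red}</style></head>
<body><script>trackVisitor();</script>
<p>Cheap flights to Delhi</p>
<a href="/offers">See offers</a></body></html>
"""
text, links = clean_page(sample_html)
print(text)
print(links)
```

The extracted text would feed content mining, while the collected links are the raw material for structure mining. Real pages need more careful noise handling (navigation bars, advertisements) than this sketch attempts.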
II. TYPES OF WEB DATA MINING
A. Web Usage Mining
Web usage mining is used to analyze user logs and predict the behavior of website users. It uses the secondary data that becomes available on the web as users access it. Most of this information is usually generated automatically by Web servers and collected in server access logs. Other sources
ISSN: 2231-5381
http://www.ijettjournal.org
Page 144
of user information include referrer logs, which contain information about the referring pages for each page reference, and user registration or survey data [6]. Analyzing such data can help organizations determine the lifetime value of customers and cross-marketing strategies across products. Analysis of server access logs and user registration data can also provide valuable information on how to better structure a Web site in order to create a more effective presence for the organization [4]. Usage mining provides effective information through the following steps:-
1) Data Collection: First, collect the user-specific data from the web servers.
2) Data Preprocessing: After the data is collected, preprocess the logs and user-specific data to obtain useful information from the huge volume of data.
3) Data Clustering: After cleaning, the data is divided into clusters of similar records for easier pattern discovery.
4) Pattern Discovery and Analysis: After clustering, patterns are identified in the data and then analyzed to gain knowledge.
B. Web Structure Mining
Structure mining summarizes the structure of web pages. Content mining is concerned with the inner structure of a web page, whereas structure mining considers the link structure of hyperlinks at the inter-page level. Web Structure Mining covers the link structure of web pages and categorizes the pages to find similarities and relationships between the link structures. The main purpose of structure mining is to extract previously unknown relationships between Web pages. Web structure mining lets a business link the information of its own Web site to enable navigation and cluster information into site maps. This gives its users the ability to access the desired information through keyword association and content mining. The hyperlink hierarchy is also determined, to trace how the related information within the sites relates to competitor links and to connections through search engines and third-party co-links [5]. This enables clustering of connected Web pages to establish the relationship of these pages. It is also quite useful for establishing the connection between two or more organizations. Web structure mining is the process of using graph theory to analyze the node and connection structure of a web site.
C. Web Content Mining
Web content mining extracts hidden, user-specific data or knowledge from the available data or documents. To improve accessibility, every organization focuses on building better web pages with large-scale content. A web page contains large amounts of data such as text, video, audio, and images, available in unstructured, semi-structured, and structured forms, so it is difficult to find interesting and knowledgeable patterns in it. The various challenges faced by content mining are:-
1) Information Extraction: - The required information/content is extracted from various web pages, mostly with the aim of extracting structured data. There are various techniques, but extracting the information automatically from web pages is difficult.
2) Web information integrating and schema matching: - Although the Web contains a huge amount of data, each web site (or even page) represents similar information differently. It is difficult to integrate the different types of data and then match their schemas.
3) Detecting the noise: - Detecting the noise in web pages is a very difficult task. Automatically segmenting a Web page to extract its main content is an interesting problem. A web page contains a huge amount of content that changes from time to time, so it is difficult to find specific information.
4) Opinion extraction from online sources: There are many online opinion sources, e.g., customer reviews
of products, forums, blogs, and chat rooms. Mining opinions (especially consumer opinions) is of great importance for marketing intelligence and product benchmarking. We will introduce a few tasks and techniques to mine such sources.
III. PROBLEMS OF WEB DATA MINING
A. Indexing Data in Search Engines
A search engine mines data to assign page rankings and to improve indexing, but indexing large-scale data for a search engine is sometimes complicated.
B. Update Data
Most web pages require their data to be updated from time to time, so mining the data of those pages and predicting future trends is more difficult.
C. Real time data updating
Web pages that work in a real-time environment need more consideration when mining their data, because the data updates or changes within milliseconds, as in air ticketing.
D. Predict the user requirements
To advertise on a user's page, you have to mine the user's search data to find the user's interests and changes in need, so as to provide better facilities and advertisements according to the user's needs. Sometimes the user's needs change from time to time, so it is difficult to predict the user's requirements.
E. Fraud and Threat Analysis
Data mining helps with fraud detection by checking the activities of a particular user, but this is difficult if a person gives wrong personal information on web sites.
F. Security
Through mining we can learn a user's personal information. Even when we do not know who a particular user is, we still come to know that user's activities, so mining user log and user profile information is an ethical issue.
G. Web Content Mining
It is difficult to mine the content of social network sites because the content is available in huge amounts and changes within milliseconds. Finding relevant information is difficult when the available data is noisy and complicated.
H. Web Structure Mining
It is difficult to discover the complicated link structure between pages, and also difficult to find duplicate web pages or duplicate web page content.
I. Web Usage Mining
User data is private, so mining it to provide users with better facilities sometimes creates problems. For example, when a user accesses YouTube, the record of the user's searches is mined to offer recently added videos related to his/her interests, but it is not ethical to use search data that is personal to the user.
IV. APPLICATIONS OF WEB DATA MINING
Today most organizations rely on the internet to improve their business and build better relationships with customers. Web data mining provides analysis of available web data, and it now extends that analysis by combining corporate information with Web traffic data. Web mining tools can be extended and programmed to answer a wide range of queries. Web data mining tools are used in the following areas:
A. Web mining can provide companies with managerial insight into visitor profiles, which helps top management take strategic actions accordingly [2].
B. The company can obtain some subjective measurements through Web Mining on the effectiveness of its marketing campaigns or marketing research, which will help the business to improve and align its marketing strategies in a timely manner.
C. In the business world, structure mining can be quite useful in determining the connection between two or more business Web sites.
D. The company can identify the strengths and weaknesses of its web marketing campaign through Web Mining, make strategic adjustments, and then obtain feedback from Web Mining again to see the improvement.
E. The Google search engine provides advanced and efficient searching capabilities [6].
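The visitor insight and campaign feedback listed above rest on the usage mining steps of Section II (collection, preprocessing, clustering, pattern discovery). The following Python sketch walks through those steps on a toy scale. It is an assumption-laden illustration rather than a method taken from the cited works: the log lines are invented sample data in the Common Log Format, the helper names `preprocess`, `group_by_visitor`, and `frequent_transitions` are hypothetical, and grouping requests by host is a simplified stand-in for real clustering.

```python
import re
from collections import Counter, defaultdict

# Step 1 (collection): hypothetical access-log lines, invented for illustration.
LOG_LINES = [
    '10.0.0.1 - - [01/Apr/2014:10:00:00 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Apr/2014:10:00:05 +0000] "GET /logo.png HTTP/1.1" 200 90',
    '10.0.0.1 - - [01/Apr/2014:10:01:00 +0000] "GET /products HTTP/1.1" 200 734',
    '10.0.0.2 - - [01/Apr/2014:10:02:00 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Apr/2014:10:03:00 +0000] "GET /products HTTP/1.1" 200 734',
    '10.0.0.2 - - [01/Apr/2014:10:04:00 +0000] "GET /missing HTTP/1.1" 404 0',
]

# Captures host, method, path, and status from a Common Log Format line.
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) \S+$'
)
STATIC_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js")


def preprocess(lines):
    """Step 2: keep successful page requests; drop static files and errors."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip malformed lines
        host, _method, path, status = match.groups()
        if status == "200" and not path.endswith(STATIC_SUFFIXES):
            yield host, path


def group_by_visitor(requests):
    """Step 3 (simplified): group requests per host instead of true clustering."""
    visits = defaultdict(list)
    for host, path in requests:
        visits[host].append(path)
    return dict(visits)


def frequent_transitions(visits):
    """Step 4: count page-to-page transitions across all visitors."""
    counts = Counter()
    for pages in visits.values():
        counts.update(zip(pages, pages[1:]))
    return counts


visits = group_by_visitor(preprocess(LOG_LINES))
transitions = frequent_transitions(visits)
print(visits)
print(transitions.most_common(1))
```

In this toy data both visitors move from `/home` to `/products`, which is the kind of navigation pattern a company could compare before and after a campaign. A production pipeline would also need session timeouts, robot filtering, and privacy safeguards, which this sketch omits.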
V. CONCLUSION
The Internet has grown from a simple search tool into a gold mine, but only for those companies that adopt a web data mining strategy. Web data mining contains different categories for analyzing the different kinds of available web data. Most companies implement web data mining to understand their customers' profiles, and also to identify the strengths and weaknesses of their e-marketing efforts on the web through continuous improvement.
VI. REFERENCES
[1] Robert Cooley, Bamshad Mobasher, Jaideep Srivastava, “Web Mining: Information and Pattern Discovery on the WWW”.
[2] Mary Garvin, “Data Mining and the Web: What They Can Do Together”.
[3] B. Masand, M. Spiliopoulou, J. Srivastava, O. Zaiane, eds., Proceedings of “WebKDD – Web Mining for Usage Patterns and User Profiles 2002”, Edmonton, CA.
[4] M. Spiliopoulou, “Data Mining for the Web”, Proceedings of the Symposium on Principles of Knowledge Discovery in Databases (PKDD).
[5] R. Kosala, H. Blockeel, “Web Mining Research: A Survey”, in SIGKDD Explorations 2(1), ACM, July 2000.
[6] R. Cooley, B. Mobasher, J. Srivastava, “Data Preparation for Mining World Wide Web Browsing Patterns”.