Research Journal of Applied Sciences, Engineering and Technology 7(22): 4779-4794,... ISSN: 2040-7459; e-ISSN: 2040-7467

advertisement
Research Journal of Applied Sciences, Engineering and Technology 7(22): 4779-4794, 2014
ISSN: 2040-7459; e-ISSN: 2040-7467
© Maxwell Scientific Organization, 2014
Submitted: January 23, 2014
Accepted: February 25, 2014
Published: June 10, 2014
An Extensive Review of Significant Researches in Data Mining
1
Paul P. Mathai and 2R.V. Siva Balan
Department of Computer Science and Engineering,
2
Department of Computer Applications, Noorul Islam University, Kanya Kumari, India
1
Abstract: An action that removes a few novel nontrivial data enclosed in large databases is defined as Data Mining.
On noticing the statistical connections between the items that are more regular in the operation databases traditional
data mining methods have spotlighted mostly. Numerous functions are using data mining in dissimilar fields like
medical, marketing and so on commonly. Several methods and techniques have been extended for mine the in order
from the databases. In this study, we provide a comprehensive survey and study of various methods in existence for
item set mining based on the utility and frequency and association rule mining based research works and also
presented a brief introduction about data mining and its advantages. Moreover we present a concise description
about the Data Mining techniques, performance review and the instructions for future research.
Keywords: Data mining, frequency, item sets, Knowledge Discovery Database (KDD), utility
INTRODUCTION
Data mining has emerged as a significant research
area over the previous two decades. Data mining has set
up its position as an outstanding and chief research area
(Tsytsarau and Palpanas, 2012) and moreover identified
as knowledge detection in databases. Data mining is
apprehended with study of large volumes of
information to routinely find out interesting regularities
or associations which in turn leads to enhanced
considering of the essential processes (Laxman and
Sastry, 2006). In trade research data mining has been
significantly applied in the study of customer dealings
where it is named as market basket study. To recognize
the buy patterns of the alpha consumer (Bhattacharya
and Dubey, 2012) market basket study has also been
employed.
For recognizing samples and tendencies from large
quantities of data mining knowledge has come forward
as a means. Knowledge Discovery in Databases, also
identified as data mining, has been described as "The
nontrivial extraction of implicit, formerly unidentified
and potentially helpful information from data" (Deypir
and Sadreddini, 2011). To remove configured
understanding routinely from huge data sets Data
mining is applied. The data that is ‘mined’ is conveyed
as a replica of the semantic arrangement of the dataset,
where in the forecast or categorization of the attained
information is made easy with the help of the model
(Patil et al., 2012; Shahnawaz et al., 2011).
The arrangement of the data mining method and
algorithms with the widespread economic intentions of
the tasks sustained by data mining is necessary so as to
allow the supplementary impact of data mining on
business functions (Sreenivasarao and Vidyavathi,
2010). The final cost-effective usefulness attained as
the product of the data mining artifact has the impact of
all the different stages of the data mining procedures. It
is significant to regard the economic function of getting
information, removing a model and using the gained
knowledge (Okada et al., 2011). The assessment of the
decisions made on the basis of the educated knowledge
is powered by the economic effectiveness. The
uncomplicated evaluation measures such as prophetic
precision (Erwin et al., 2007) is substituted by the
economic measures, for instance, abundance and revisit
on investment. To mean any form of large-scale
information or in sequence processing the expression is
often distorted. The automatic or semi-automatic
analysis of huge quantities of data to extract formerly
unfamiliar interesting patterns (Geurts et al., 2005) is
the real data mining job.
Descriptive Mining and Predictive Mining are the
two Data mining assignments. To discover humaninterpretable patterns that explain the information, the
Descriptive Mining techniques such as Clustering,
Association Rule Discovery, Sequential Pattern
Discovery, is applied. In order to forecast unidentified
or future values of other variables (Tsay and Chiang,
2005) the Predictive Mining methods like
Classification, Regression and Deviation Detection,
uses a few variables.
Mining based on item-sets frequency: In data mining
regular item set mining is a conventional and significant
problem. An item-set is repeated if its support is not
Corresponding Author: Paul P. Mathai, Department of Computer Science and Engineering, Noorul Islam University, Kanya
Kumari, India
4779
Res. J. Appl. Sci. Eng. Technol., 7(22): 4779-4794, 2014
less than a brink stated by users (Otey et al., 2004).
Conventional regular item-set mining approaches have
chiefly regarded as the crisis of mining static operation
databases. In the operation data set regular item-sets are
the item-sets that happen often. To recognize all the
regular item-sets in a operation dataset (Valtchev et al.,
2002) is the objective of Frequent Item-set Mining.
Within the finding of relationship rules it created as a
phase, but has been simplified autonomous of these to
several other samples. It is confronting to enlarge
scalable methods for mining regular item-sets in a huge
operation database (Mao et al., 2007) as there are
frequently a great number of diverse single items in a
distinctive transaction database and their groupings may
form a very vast number of item-sets.
Mining based on item-sets utility: Depending upon
his circumstance of usage (Pillai and Vyas, 2010) as
precised by the user a high utility item-set is the one
with utility value larger than the minimum brink utility.
A wide topic that wraps all features of economic utility
in data mining (Kannimuthu et al., 2011) is known to
be utility-based data mining. It includes the work in
cost-sensitive education and dynamic learning as well
as work on the recognition of uncommon events of high
effectiveness value by itself. By maintaining this in
mind, we at this point offer a set of algorithms for
mining all sorts of utility and frequency based item-sets
from a trade business deal database which would
considerably aid in inventory control and sales
promotion (Weiss et al., 2005). Consideration of a
utility based mining approach was motivated by
researchers due to the limitations of frequent or rare
item-set mining, which permits a user to suitably
communicate his or her views regarding the usefulness
of item-sets as utility values and then find item-sets
with high utility values higher than a threshold (Song
and Rajasekaran, 2006). Identifying the lively
customers of each such type of item-set mined and rank
them based on their total business value can be done by
these set of algorithms. This would be enormously
supportive in developing Customer Relationship
Management (CRM) processes like campaign
management and customer segmentation (Gopalan and
Sucahyo, 2002). In all types of utility factors like profit,
significance, subjective interestingness, aesthetic value
etc the utility based data mining is a newly absorbed
research area. This can add economic and business
utility to existing data mining processes and techniques.
A research area inside utility based data mining
identified as high utility item-set mining is intended to
discover item-sets that introduce high utility (Shankar
et al., 2009).
EXTENSIVE REVIEW OF SIGNIFICANT
RESEARCHES ON MINING
TECHNIQUES
A range of research methodologies employed for
developing the successful framework for mining the
item sets or patterns from the database. Several research
works were developed to accomplish the mining
process in database based on the item sets frequency
and utility. The techniques and algorithms perform the
mining process based on the item-sets frequency and
utility are reviewed in this section.
Literature survey on data mining techniques based
on item-sets frequency: An algorithm for mining
regular item-sets data mining assignments has been
proposed by Grahne and Zhu (2005). They offered FParray technique in this proposed method that seriously
diminishs the require to go across FP-trees, therefore
attaining considerably enhanced presentation for FPtree-based algorithms. For light data sets this proposed
method labors particularly well. In addition, they
offered an algorithm for mining all, maximal and
secured frequent item-sets. The effects have explained
that methods were the high-speed for several cases.
Algorithms were still the fastest ones when the
minimum hold up was low, although the algorithms
devour much recall when the data sets were light.
CHARM, a competent algorithm for mining all
regular sealed item-sets has been presented by Zaki and
Hsiao (2005). By means of a competent hybrid search
that skips many levels, it specifies sealed sets by a dual
item-set tidset search tree. To diminish the recall
footprint of transitional computations it furthermore
applies a technique named diff sets. At last, it uses a
fast hash-based approach to eradicate any “non-closed”
sets found during calculation. For rule production and
apparition CHARM-L, an algorithm was presented by
them that yield the sealed item-set lattice, which was
extremely useful. A widespread investigational
assessment on a number of actual and artificial
databases illustrates that CHARM was a state-of-the-art
algorithm that do better than earlier methods.
Supplementary, CHARM-L clearly produces the
regular sealed item-set lattice.
A usefulness based item-set mining approached to
rise above this restriction has been proposed by Yao
and Hamilton (2006). The suggested approach allows
clients to enumerate their favorites regarding the
effectiveness of item-sets by means of utility values.
The effectiveness of an item-set was exemplified as a
utility restraint. To facilitate was, an item-set was
fascinating to the customer only if it pleases a certain
usefulness restraint. To utility restraints the reducing
strategies utilized in preceding item-set mining
approaches could be used was shown by them.
Numerous mathematical properties of usefulness
restraints were identified by them. Subsequently, two
new reducing strategies are planned. By integrating
these reducing strategies, two algorithms for usefulness
based item-set mining were built up. As a result of
relating them to artificial and true world databases the
algorithms were assessed. Investigational effects
4780 Res. J. Appl. Sci. Eng. Technol., 7(22): 4779-4794, 2014
demonstrate that the proposed algorithms were efficient
on the databases analyzed.
Mining regular samples on approach for mining
item-sets has been proposed by Raissi et al. (2007). An
approach in proposed method has the subsequent
benefits: a competent symbol of things and a new data
configuration to keep regular patterns combined with a
high-speed reducing approach. Customers could subject
requests for regular item-sets more than a random time
interval at whatever time. The approach creates a fairly
accurate answer with a guarantee that it was not bypass
user-defined occurrence and temporal brinks moreover.
By a series of tests on dissimilar datasets this proposed
method was examined at last.
A technique of regular sample mining has been
proposed by Han et al. (2007). They provide a short
over-view of the present class of regular sample mining
and explain a few promising study directions in this
proposed method. They considered that regular sample
mining study has considerably widened the range of
information analysis and will have unfathomable blow
on data mining methodologies and functions in the long
run. There was however some taxing study issues that
need to be worked out before regular pattern mining
could be declare a cornerstone approach in data mining
functions on the other hand.
Liu et al. (2008) have proposed to eradicate the
redundancy. Generator based representations depend on
a negative border to formulate the representation
without loss. On the other hand, the number of itemsets on a negative border now and then even goes
beyond the entire number of frequent item-sets. A
positive border mutually with frequent generators to
form a lossless representation is being proposed to be
used in this proposed method. A positive border was
generally in orders of enormity lesser than its
equivalent negative border. A set of common generators
plus its positive border was at all times no larger than
the equivalent complete set of frequent item-sets, thus it
was an accurate brief representation. The depth-firstsearch strategy is used by the GrGrowth algorithm to
discover the search space, which was a lot more capable
than the breadth-first-search approach adopted by most
of the existing generator mining algorithms. The
GrGrowth algorithm was extensively faster than levelwise algorithms for mining generator based
representations and was comparable to the state-of-theart algorithms for mining frequent closed item-sets.
This was shown by the experimental results.
Han et al. (2008) have proposed an efficient onepass algorithm in sliding windows over data streams
with an error bound guarantee. This algorithm did not
need to refer to obsolete transactions when they were
removed from the sliding window. It exploits a compact
data structure to maintain potentially frequent item-sets
so that it can output recent frequent item-sets at any
time. Flexible queries for continuous transactions in the
sliding window can be answered with an error bound
guarantee. The experimentation on this collection
shows that proposal gets better results than graph-based
image classification using some algorithms reported in
related work.
An algorithm for mining regular item-sets were
decisive for mining organization rules as well as for
numerous other data mining tasks has been proposed by
Songa et al. (2008). Using BitTable configuration
techniques for mining regular item-sets have been
executed. BitTableFI was such a newly suggested
competent BitTable-based algorithm, which develops
BitTable both parallel and upright. By using breadthfirst hunt at one time and by calculating the count
index, those item-sets that co-occurrence with delegate
item could be recognized rapidly. After that, depth-first
hunt approach was applied to create all other regular
item-sets for the effecting item-sets produced via the
index display. The hybrid look for is executed and the
hunt space is diminished greatly as a result. The
benefits of the suggested methods were as follows. The
unnecessary procedures on connection of tidsets and
frequency-checking could be kept away from greatly on
the one hand. With all the mixtures of items in its
subsume index it was confirmed that regular item-sets,
together with representative thing and having the
similar supports as representative thing, could be
recognized directly by linking the representative item
on the other hand. The effectiveness is enhanced and
the cost for dealing out this type of item-sets was
subordinated accordingly. Investigational effects have
explained that the suggested algorithm was competent
particularly for solid datasets.
A technique of Non-derivable regular item-sets has
been proposed by Li and Chen (2009), which
accumulates all of the data enclosed in regular item-sets
by less space, therefore being more appropriate for
stream mining. To recognize the addressed, namely,
how to extract non-derivable regular item-sets in an
incremental fashion is to be the considered problem.
Competently to continue a vigorously chosen set of
item-sets, they have planned a compacted data structure
named NDFIT. In order to diminish the unnecessary
computational cost based on their properties the nodes
were classified into four groups in NDFIT. To produce
non-derivable regular item-sets above stream sliding
window an optimized algorithm named NDFIoDS was
suggested as a result.
Masseglia et al. (2008) have approaches which has
two most important disadvantages. They depend on the
aforementioned uninformed association of information
initially. From among the accumulated information,
next they cannot routinely remove “seasonal peaks”. In
order to expose the densest eras routinely a specific
data mining process (in particular, to extract frequent
behavior patterns) is used in this suggested method.
Method removes the regular chronological samples
4781 Res. J. Appl. Sci. Eng. Technol., 7(22): 4779-4794, 2014
associated to the removed eras from the entire set of
feasible amalgamations. A stage was well thought-out
to be solid if it encloses at least one regular
chronological sample for the set of clients associated to
the website in that era. Our approach is capable to
remove both regular chronological samples and the
tests demonstrate that the removed periods were related
and the linked solid periods.
For taking out regular item sets in a record, a
competent Regular Items Ultra-metric Trees (FIUT)
technique has been proposed by Tsaya et al. (2009). To
improve its effectiveness in attaining regular item sets,
FIUT used a particular regular Items Ultra-metric tree
(FIU-tree) arrangement. FIUT has four major benefits
compared to associated work. Primary, by scrutinizing
the database only two times it was reduced I/O
overhead. Next, the FIU-tree was an enhanced way to
separating a database, which effects from gathering
operations and considerably diminishes the look for
space. Third, for condensed storage only regular items
in every operation were introduced as nodes into the
FIU-tree. Lastly, all regular item sets were produced by
testing the leaves of each FIU-tree, with no go across
the tree recursively, which considerably diminishes
calculating time. A famous and widely used algorithm,
FIUT was balanced with FP-growth and the
reproduction effects illustrated that the FIUT
outperforms the FP-growth.
A competent method to find out the whole set of
current regular patterns from a quick data torrent over a
downhill window has been proposed by Tanbeer et al.
(2009). To incarcerate the current torrent information
satisfied and competently eliminate the outdated old
torrent information content they widened a Compact
Pattern Stream tree (CPS-tree). In order to generate an
extremely compressed frequency-descending tree
arrangement at runtime, they moreover established the
thought of active tree reforming in our CPS-tree. By
means of a FP-growth mining method the entire set of
current regular samples was attained from the CPS-tree
of the recent window. Wide-ranging investigational
analyses have illustrated that their CPS-tree was
extremely competent in terms of memory and time
intricacy when discovering new frequent patterns from
a rapid data torrent.
Woo and Lee (2009) have proposed a method of
tracing the set of MFIs instantly over an online data
stream. The method, namely estMax, maintains the set
of frequent item sets by a prefix tree and extracts all
MFIs without any additional superset/subset checking
mechanism. Upon processing a new transaction, those
frequent item sets that are matched maximally by the
transaction were newly marked in their corresponding
nodes of the prefix tree as candidates for MFIs. At the
same time, if any subset of a newly marked item set has
been already marked as a candidate MFI by a previous
transaction, it was cleared as well. By employing this
additional step, it was possible to extract the set of
MFIs at any moment. The performance of the estMax
method was comparatively analyzed by a series of
experiments to identify its various characteristics.
A method for regular sealed item-sets for frequent
item-sets has been proposed by Liu et al. (2009). From
a milestone window mining regular closed item-sets
above data streams was a taxing problem on the other
hand. In this algorithm (called FP-CDS) that can
confine all regular sealed item-sets and a novel storage
arrangement (called FP-CDS tree) that can be
vigorously altered to imitate the development of itemsets’ frequencies over time. A milestone window was
separated into a number of fundamental windows and
these fundamental windows were utilized as revised
units. Based on a few suggested approaches potential
regular sealed item-sets in every basic window were
mined and stored in FP-CDS tree. To certify the
suggested method widespread tests were performed.
In order to mine the set of repeated item-sets from
data flows inside a transaction-sensitive sliding window
which consists of a permanent number of transactions a
successful bit-sequence based, one-pass algorithm,
named MFI-TransSW (Mining Frequent Item-sets
within a Transaction-sensitive Sliding Window), has
been proposed by Li and Chen (2009). There are three
phases in the proposed MFI-TransSW algorithm:
window initialization, window sliding and pattern
generation. Initially, in the window initialization stage
each piece of every transaction was programmed in an
efficient bit-sequence illustration. To decrease the time
and memory required to slip the windows in the
subsequent stages the suggested bit-sequence
illustration of point was applied. Next, to go down the
windows competently in the window sliding phase,
MFI-Trans SW applies the left bit-shift method. At last,
the entire set of regular item-sets inside the current
sliding window was produced by a level-wise technique
in the sample production phase.
To diminish the injuries to isolation sourced by
mean parties throughout the rule mining process,
isolation Preserving Data Mining (PPDM) algorithms
effort has been offered by Sekhavat and Fathian (2010).
Generally, these algorithms were planned for the semihonest model, where members do not depart from the
protocol. They safeguard the isolation of members in a
collusion-free replica of the repeated item-set mining
process, where the protocol guards against inquisitive
harass and complicity in this suggested method. Two
privacy-preserving repeated item-set mining algorithms
were explained in this suggested method for both twoparty and multi-party states in a collusion-free model
for up and down partitioned (heterogeneous) data.
Additionally, an isolated evaluating technique was
suggested, which counts solitude based on the amount
of released perceptive data.
4782 Res. J. Appl. Sci. Eng. Technol., 7(22): 4779-4794, 2014
Shankar et al. (2010) have presented a technique
for ‘data-centered pattern mining’ to ‘domain driven
actionable knowledge discovery’ for considering the
business yield (utility) and demand or rate of recurrence
of the items (frequency). In rhis proposed method, they
presented a set of algorithms for mining all types of
utility and frequency based item-sets from a retail
business transaction database which would significantly
aid in inventory control and sales promotion. This set of
algorithms were also capable of identifying the active
customers of each such type of item-set mined and rank
them based on their total or lifetime business value
which would be extremely helpful in improving
Customer Relationship Management (CRM) processes
like campaign management and customer segmentation.
A method for ‘data-centered pattern mining’ to
‘domain driven actionable knowledge discovery’ for
taking into account the business yield (utility) and
require or rate of recurrence of the items (frequency)
has been offered by Shankar et al. (2010). They offered
a set of algorithms for mining all kinds of usefulness
and occurrence based item-sets from a trade business
operation database which would considerably help in
inventory manage and sales endorsement in this
suggested method. Based on their total or lifetime
business value which would be awfully useful in
developing Customer Relationship Management (CRM)
procedures like campaign administration and customer
segmentation, this set of algorithms were furthermore
capable of recognizing the dynamic customers of every
such type of item-set mined and grade them.
In order to develop the grouping precision of
Frequent Item-set-Based Hierarchical Clustering
(FIHC) method, an efficient Fuzzy Frequent Item-setbased Hierarchical Clustering (F2IHC) approach, which
exploits fuzzy association rule mining algorithm, has
been presented by Chen et al. (2010). The key phrases
were removed from the document set and every
document was pre-processed into the nominated
representation for the following mining process in this
approach. After that, to find out a set of highly-related
fuzzy repeated item-sets, a fuzzy organization rule
mining algorithm for content was occupied which
enclose key terms to be considered as the labels of the
applicant clusters. By referring to these applicant
clusters these manuscripts was grouped into a
hierarchical cluster tree at last. They have carried out
tests to assess the presentation based on Classic 4,
Hitech, Re 0, Reuters and Wap datasets. The
investigational effects demonstrate that approach not
only utterly keeps the advantages of FIHC, however
furthermore develops the exactness value of FIHC.
Attention was made by Guo et al. (2011) on
resourcefully discovering for the global Approximate
Closed Frequent Item-sets (ACFIs) over streams. This
purposely is for a multiple, continuous, rapid and timevarying data stream, a fast, incremental, real-time and
little-memory-cost algorithm. A Max-Frequency
Pattern Tree (MFP-Tree) structure is established which
is based on the max-frequency window model to
maintain summary information over the global stream.
Consequently, a novel algorithm Generating Global
Approximate Closed Frequent Item-sets on MaxFrequency Window model (GGACFI-MFW) was
proposed to update the MFP-Tree with higher
efficiency. The effectiveness and efficiency of the
proposed approach is shown by the case studies.
A proposal is made to mine frequent patterns by
agreeable minimum support threshold by NadimiShahrakia et al. (2011). Deciding on a suitable value for
minimum support threshold was generally difficult.
This is because the suitable value depends on the type
of application and anticipation of the user. In various
real-time applications such as web mining and ebusiness, discovering correlations between patterns by
changing the minimum support threshold was required.
Researchers have introduced interactive mining of
frequent patterns since rerunning mining algorithms
from scratch was incredibly costly and takes long time.
Based on prime number and its characteristics mainly
for interactive mining of frequent patterns this proposal
is made. Separation of the mining model from the
mining process is done by this method in such a way
that once the mining model was constructed; it can be
regularly used by mining process with various
minimum support thresholds. During the mining
process, a candidate set called candidate head set and
more than a few efficient pruning techniques are
utilized by the mining algorithm which reduces the
number of candidate patterns and comparisons. The
efficiency of method for interactive mining of frequent
patterns is verified by the experimental results.
Literature survey on data mining techniques based
on item-sets utility: A crisis however with two
supplementary developments has been proposed by Hu
and Chen (2006). Primarily, to accumulate the decisive
data about repeated samples they propose a FP-tree-like
arrangement, MIS-tree. For mining all repeated itemsets, a competent MIS-tree-based algorithm, named the
CFP-growth algorithm, was improved as a result. In
order to rate up this time-consuming modification
process, a competent algorithm which can continue the
MIS-tree arrangement devoid of rescanning file was
intended. Tests on both artificial and real-life datasets
illustrate that our algorithms were much more
competent and scalable than the earlier algorithm.
An incremental algorithm for upholding the
generator demonstration in active datasets has been
proposed by Xu and Xie (2006). A type of lossless,
brief demonstration of the set of repeated item-sets is
the generator representation. It may be arrays of degree
lesser than the set of repeated item-sets. Moreover, the
algorithm employs a new optimization based on
4783 Res. J. Appl. Sci. Eng. Technol., 7(22): 4779-4794, 2014
generator edges for the first time in the prose. Generator
edges were the mediocre among repeated generators
and further item-sets. Repeated generators could be
produced through observing them. Widespread tests
demonstrate that this algorithm was more competent
than earlier solutions.
The idea of a profound item-set has been presented
by Palshikar et al. (2007). An item-set A was intense
(for given support and confidence values) if all feasible
organization rules made up of things merely in A were
present. An effortless needed and adequate condition
for an item-set to be intense was proved by them. A
principle for the number of feasible rules for a specified
heavy item-set and demonstrate that a heavy item-set
packed together symbolizes an exponential number of
organization rules are presented by them. To produce a
group of displace grave item-sets in a specified
operation database they give a competent insatiable
algorithm beside with two easy search algorithms. An
adapted apriori algorithm that begins with a set
collection of displace heavy item-sets and discerns
more heavy item-sets, not necessarily displace with the
known one is then presented by them.
An algorithm for repeated item set mining that
recognizes high-utility item amalgamations has been
presented by Hu and Mojsilovic (2007). The objective
of the algorithm was to discover segments of
information, described via mixtures of few items
(rules), which please assured conditions as an assembly
and exploit a predefined intention purpose in difference
to the long-established organization rule and repeated
item mining methods. They put together the charge as
an optimization problem, give a competent rough
calculation to work out it through focused division
trees, named High-Yield Partition Trees and examine
the presentation of dissimilar splitting approaches.
A technique the Isolated Items Discarding Strategy
(IIDS), for presented level-wise usefulness mining
technique to diminish applicants has been proposed by
Li et al. (2008). A widespread form of the share mining
model, utility mining, efforts to surmount this
difficulty. Given that Apriori trimming approach could
recognize high usefulness item-sets, raising a wellorganized algorithm was critical for usefulness mining.
This suggested technique, the Isolated Items Discarding
Strategy (IIDS), which could be used to any accessible
level-wise usefulness mining technique to diminish
applicants and to enhance presentation. The most
competent branded models for split mining were
ShFSM and DCG, which also labor sufficiently for
usefulness mining as well. The two techniques FUM
and DCG were executed, correspondingly by using
IIDS to ShFSM and DCG. For both artificial and actual
datasets, investigational effects expose that the
presentation of FUM and DCG+ was more competent
than that of ShFSM and DCG, correspondingly. Hence,
for usefulness mining IIDS was an efficient approach.
For mining secular high usefulness item-sets from
information streams competently and successfully a
new method, namely THUI (Temporal High Utility
Item-sets)-Mine has been proposed by Chu et al.
(2008). This effort was on mining temporal high
usefulness item-sets from information streams to the
awareness. The new involvement of THUI-Mine was
that it can successfully recognize the secular high
usefulness item-sets by producing smaller quantity
candidate item-sets such that the implementation time
can be diminished considerably in mining all high
usefulness item-sets in information streams., By means
of less memory gap and implementation time, the
course of finding out all temporal high usefulness itemsets under all-time windows of information streams
could be realized successfully in this suggested method.
This convenes the serious constraints on time and space
effectiveness for mining information streams. By
investigational assessment, THUI-Mine was shown to
considerably outperform other presented techniques
like
Two-Phase
algorithm
under
different
investigational conditions.
Li and Lee (2008) have regarded a setback how to
mine non-derivable repeated item-sets in an incremental
fashion that to the best of information has not been
addressed. To competently uphold an energetically
chosen set of item-sets they propose a compressed data
structure called NDFIT. To decrease the unnecessary
computational cost based on their properties the nodes
were divided into four categories in NDFIT.
Accordingly, an optimized algorithm called NDFIoDS
was suggested to produce non-derivable repeated itemsets above stream sliding window. Investigational
effects demonstrate that this technique is efficient and
more competent than earlier approaches.
To find out item-sets in a operation database with
usefulness values High utility item-sets mining method
broadens repeated sample mining has been proposed by
Erwin et al. (2008). Transaction Weighted Utility
(TWU) suggested newly by investigators has antimonotone property, however it was an over value of
item-set usefulness and thus directs to a bigger search
space. Based on a compressed usefulness sample tree
information structure, they suggested an algorithm that
employs TWU with sample growth. In this algorithm
executes a similar protrusion proposal to employ disk
storage when the main memory was insufficient for
dealing with great datasets. Our algorithm was more
competent compared to preceding algorithms and might
be larger datasets of both solid and thin information
holding long patterns was shown by investigational
assessment.
For competently and successfully mining high
usefulness item-sets from large databases with
contemplation of pessimistic item values a method,
namely HUINIV (High Utility Item-sets with Negative
Item Values)-Mine has been proposed by Chu et al.
4784 Res. J. Appl. Sci. Eng. Technol., 7(22): 4779-4794, 2014
(2009). To the information, this was the primary work
that believes the idea of negative item values in
usefulness mining. By producing smaller quantity of
high transaction-weighted consumption item-sets such
that the implementation time could be diminished
considerably in mining the high usefulness item-sets,
the new involvement of HUINIV-Mine was that it can
successfully recognize high utility item-sets. In this
suggested method, the procedure of finding out all high
usefulness item-sets with deliberation of negative item
values could be achieved successfully with less
necessities on memory space and CPU I/O., It was
shown in the course of experimental assessment that
HUINIV-Mine superior in performance comparing to
other methods significantly by producing much less
applicant item-sets under divergent experimental
environment.
Synergies of Operations Research and Data Mining
have been proposed by Meisel and Mattfeld (2010).
Synergies can be accomplished by incorporation of
optimization methods into Data Mining and vice versa.
They characterize three classes of synergies and
demonstrate all of them by cases in particular. Based on
a common report of aims, the classification was done
prerequisites as well as procedure models of Operations
Research and Data Mining. At the junction of the two
systems it provides as a structure for the evaluation of
approaches.
Lan et al. (2011) have planned a type of samples,
named high on-shelf usefulness item-sets, which
regards not only entity yield and number of every item
in an operation however moreover common on-shelf
time eras of a product amalgamation. They furthermore
offer a two-phased mining algorithm to successfully
and competently find out high on-shelf usefulness itemsets. The feasible applicant on-shelf usefulness itemsets inside every time period were established level by
level in the initial phase. The applicant on-shelf
usefulness item-sets were more made sure for their real
usefulness values by an added database check in the
next phase. The test effects on artificial datasets
furthermore demonstrate the suggested approach has an
excellent presentation at last.
According to clients’ first choice, Lin et al. (2011)
have proposed to mine organization rules; Utility
mining was therefore put forwarded to regard costs,
profits and further measures. To obtain high usefulness
samples successfully and competently the High
Usefulness sample tree (HUP tree) was planned and the
HUP-growth mining algorithm was suggested in this
proposed technique. In order to make use of the
downward-closure property and produce a compacted
tree arrangement the suggested approach incorporates
the earlier two-phase process for usefulness mining and
the FP-tree idea. In implementation time test effects
moreover demonstrate that the suggested approach has
presentation than Liu et al. (2009) two-phase algorithm.
The numbers of tree nodes produced from three
dissimilar thing arranging techniques were furthermore
compared, with effects demonstrating that the
frequency ordering makes less tree nodes than the other
two at last.
A method to discover the high average-utility itemsets has been proposed by Hong et al. (2011). The
utility of an item-set was the abridgment of the utilities
of the item-set in all the operations in spite of its length
by tradition. To make public an enhanced utility result
of combining numerous items than the original
usefulness measure the usual utility measure was
therefore implemented in this thesis. It was identified as
the total usefulness of an item-set separated by its
number of things inside it. The average-utility itemsets, plus the unique utility item-sets, does not contain
the ‘‘down-ward-closure’’ property. To competently
discover the high average-utility item-sets a mining
algorithm was then recommended. In order to
miscalculate the real average utilities of the item-set
and processes it in two stages, it make use of the
abstract of the maximal usefulness among the things in
every operation with the tar-get item-set as the higher
leap. Unsurprisingly, the mined elevated average-utility
item-sets in the recommended way will be lesser than
the high usefulness item-sets under the similar porch.
Consequently with a further important and related
criterion the recommended approach can therefore be
performed under a bigger threshold than the unique.
The presentation of the recommended algorithm was
demonstrated by the test effects too.
Chang (2011) have proposed a method for
weighted sequential pattern mining, usually pre
assigned weights of data elements are used to get the
importance, which were derived from their quantitative
information and their importance in real world
application domains. In general sequential pattern
mining, the generation order of data elements was
considered to find sequential patterns. However, their
generation times and time-intervals are also important
in real world application domains. Therefore, timeinterval information of data elements can be helpful in
finding more interesting sequential patterns. In this
proposed method for finding Time-interval Weighted
Sequential (TiWS) patterns in a sequence database and
Time-interval Weighted support (TiW-support) to find
the TiWS patterns. In addition, a method of mining
TiWS patterns in a sequence database. In the proposed
framework of TiWS pattern mining, the weight of each
sequence in a sequence database was first obtained
from the time-intervals of elements in the sequence and
subsequently TiWS patterns were found considering the
weight.
Mining weighted association rules, weighted
sequential patterns, weighted closed patterns, frequent
patterns with dynamic weights, weighted graphs and
weighted sub-trees or sub arrangements has been
4785 Res. J. Appl. Sci. Eng. Technol., 7(22): 4779-4794, 2014
proposed by Yun and Ryu (2011). In earlier approaches
of weighted repeated pattern mining, weighted supports
of samples were precisely equivalent to reduce
weighted infrequent samples. The small alter in weights
or hold ups of items distress the effect sets critically in
the loud background on the other hand. This could
formulate the weighted repeated samples less helpful in
the loud environment. In this recommended method,
they suggest the vigorous notion of mining fairly
accurate weighted repeated samples. An estimated
factor is delineated to loosen up the necessity for
precise impartiality between weighted supports of
samples and a least threshold based on the structure of
weight based sample mining. To discover significant
patterns with/with no the loud data we concentrate on
the idea of mining estimated weighted repeated patterns
subsequently.
Shie et al. (2012) have Data stream mining is an
appearing study theme in the data mining field. One of
the most significant chores in data stream mining with
extensive functions like online e-business and web
click-stream analysis is the Finding frequent item-sets.
Conversely, two most significant problems presented in
related studies:
•
•
The utilities (e.g., importance or profits) of items
are not regarded. In repeated item-sets real utilities
of samples cannot be replicated.
Presented usefulness mining techniques generate
moreover numerous samples and this makes it
difficult for the customer’s to strain useful patterns
with the vast set of patterns.
In outlook of this, to discover maximal high usefulness
item-sets from information streams with dissimilar
models, i.e., landmark, descending window and time
vanishing models we pro-pose a novel framework,
named GUIDE (Generation of maximal high Utility
Item-sets from Data Streams), in this study. The
suggested arrangement, named MUI-Tree (Maximal
high Utility Item-set Tree), upholds necessary data for
the mining processes and the suggested approaches
more facilitates the presentation of GUIDE. Major
involvements of this thesis are as follows:
•
•
•
To the greatest of our information, this is the initial
work on mining the compressed form of high
usefulness samples from information streams.
GUIDE is a successful one-pass structure which
gathers the obligations of information stream
mining.
GUIDE produces new samples which are not only
high usefulness but in addition maximal, which
offer compressed and perceptive unseen data in the
data streams.
Under different circumstances in information stream
surroundings on dissimilar models test effects
demonstrate that our approach outperforms the state-ofthe-art algorithms.
A technique to regard as further measures, such as
profits or costs according to customer first choice has
been presented by Lin et al. (2012a). A two-phase
mining algorithm was planned for fast discerning high
usefulness item-sets from databases in the suggested
method. The approach requires progressing all the
operations in a group way when information comes
now and then. To take care of the above condition an
incremental mining algorithm for efficiently mining
high utility item-sets was suggested in this proposed
method. First of all in the proposed approach item-sets
are divided into four parts in accordance with their high
transaction-weighted utilization item-sets in the original
database and in the recently inserted transactions. Then
each part has its own procedure to be executed. In the
intermittent data environment it has been proved by
experimental results that the proposed algorithm
functions quicker than the two-phase batch mining
algorithm.
A proposal is made to find out recent HUPs by
using a sliding window by Ahmed et al. (2012).
However, they suffer from the level-wise candidate
generation-and-test problem. Consequently, they
require a large quantity of completing time and
memory. Furthermore, their data structures were not
appropriate for interactive mining. A novel tree
structure, called HUS-tree (High Utility Stream tree)
and a new algorithm, called HUPMS (high utility
pattern mining over stream data) for incremental and
interactive HUP mining over data streams with a sliding
window was proposed to resolve these problems of the
presented algorithms. Our HUPMS algorithm can mine
all the HUPs in the current window with a pattern
growth approach by capturing the important
information of stream data into an HUS-tree. Moreover,
HUS-tree was very well-organized for interactive
mining. The algorithm was extremely competent for
incremental and interactive HUP mining over data
streams and considerably outperforms the existing
sliding window-based HUP mining algorithms which
have been shown by extensive performance analyses.
A proposal for enhancing the efficiencies of data
mining was proposed by Tseng (2013). This proposed
methodology, hierarchical partitioning, for mining
frequent item-sets in large databases is based on a new
data structure called the Frequent Pattern List (FPL).
The capability to partition the database and thus
transform the database into a set of sub-databases of
manageable sizes is one of the major features of the
FPL. Accordingly, a divide-and-conquer approach can
be created to perform the desired data-mining tasks.
The hierarchical partitioning was able to mining
frequent item-sets and frequent closed item-sets in
extremely huge databases which has been proved by
experimental results.
4786 Res. J. Appl. Sci. Eng. Technol., 7(22): 4779-4794, 2014
A proposal is made for frequent pattern mining
over data streams by Nori et al. (2013). Inside
descending window mining repeated sealed item-sets as
an alternative of whole set of repeated item-set was
motivating as it required a restricted sum of memory
and processing control. It needs stretchy and competent
information arrangements with instinctive algorithms
on the other hand. We have brought in an successful
and competent algorithm for sealed repeated item-set
mining above information streams functioning in the
sliding window model in this thesis. For storing
operations of the window and equivalent repeated
sealed item-sets this algorithm applies a new
information configuration. A mature sample was
eliminated from the watching set when it was no longer
repeated sealed item-set and the hold up of a novel
repeated sealed item-set is competently calculated
additionally. Wide-ranging tests on both actual and
artificial information streams demonstrate that the
suggested algorithm was better-quality to formerly
developed algorithms in terms of runtime and memory
procedure.
Literature survey on other data mining techniques:
Standard algorithms for organization regulation mining
based on recognition of repeated item-sets have been
proposed by Zhong (2007). They learn how to keep
privacy in dispersed mining of repeated item-sets in this
suggested method. So as to was, they learn how two (or
more) parties can discover repeated item-sets in a
dispersed database with no exposing every party’s
section of the information to the other. The presented
solution for flat partitioned information only efforts for
three parties or more whereas the presented solution for
upright partitioned information seeps out an important
amount of data. With cryptographically well-built
privacy, we intend algorithms for both upright and flat
partitioned information in this document. An algorithm
for flat divided information works for two parties and
more than and is more competent than the presented
solution.
Valtchev et al. (2008) have proposed repeated
Item-sets (FIs), whereby the quantity of FIs may be
potentially vast. When a novel operation was added to a
database whose mining effects are accessible they
demonstrate here how FCI scan be mined incrementally
so far competently. Our approach for taking out FIs in
active databases relies on current effects about lattice
incremental rearranging and lattice erection. The basics
of the incremental FCI mining mission are conversed
and their diminutions to the trouble of lattice
modernize, via the CI family, was prepared explicit.
The associated structural effects inspire two algorithms
for updating the set of FCIs of a known TDB upon the
incorporation of a novel operation. An uncomplicated
technique searches for essential achievements right
through the whole CI family, while a next technique
develops lattice properties to bound the search to CIs
which share at least one thing with the novel
transaction. Collectively with a set of effects from an
opening study of the method’s practical presentations
competent completions of the parsimonious technique
was conversed in the study.
A proposal is made on an efficient algorithm,
called ICMiner (Inter-transaction Closed patterns
Miner), for mining closed inter-transaction item-sets by
Lee et al. (2008). There are two phases in this proposed
algorithm. First foremost to find out the frequent items
they scan the database once. The ICMiner converts the
original transaction database into a set of domain
attributes, called a dataset for each frequent item found.
Enumeration is done then for the closed intertransaction item-sets by means of an item-set dataset
tree, called an ID-tree. To keep away from costly
applicant production and frequent support counting, the
ICMiner can be implant successful pruning approaches,
by using the ID-tree and datasets to mine sealed intertransaction item-sets. The research effects demonstrate
that the suggested algorithm outperforms the EHApriori, FITI, clogged PROWL and ITP-Miner
algorithms in majority cases.
A new solitude defense technique for distance
based mining algorithms that provides most awful case
privacy guarantees and guards the information beside
correlation-based and transform-based assaults has been
offered by Mukherjee et al. (2008). The subsequent
three new features are in this technique. Initially, to
offer theoretical bound of solitude break in the most
awful case, this technique employs a structure. This
structure offers effortless to make sure circumstances
that one can decide whether a technique offers most
awful case assurance. Rapid type of assessment shows
that unusual types of noise such as Laplace noise offer
worst case guarantee, while the majority of existing
techniques such as addition of normal or uniform noise,
with random projection method do not offer worst case
guarantee. Secondly, the proposed method combines
the positive features of additive perturbation and
orthogonal transform methods. Thirdly, the proposed
method develops precision of one of the popular
distance-based classification algorithms: K-nearest
neighbor classification, by considering the degree of
distance distortion introduced by sanitization. The
efficiency of the proposed method can be proved by
wide-ranging experiments.
For the improvement in the efficiency of finding
large item-sets a mining algorithm is presented by Hong
et al. (2009). Our technique considers the data
dependency in the given transactions to forecast
capable and non-capable candidate item-sets based on
the idea of prediction proposed in the (n, p) algorithm.
This projected method estimates for each level a diverse
support threshold that is a derivative from a data
dependency parameter and determines if an item should
be integrated in a capable candidate item-set directly.
Herein the proposed method, they preserve the
4787 Res. J. Appl. Sci. Eng. Technol., 7(22): 4779-4794, 2014
competence of finding large item-sets by lessening the
number of scanning the input dataset and the number
candidate items. When the minimum support value is
small experimental outcome illustrate that our method
has a superior effectiveness than the apriori and the (n,
p) algorithms.
A Mining closed frequent item-sets from data
streams of interest has been proposed recently by Li
and Chen (2009). Deciding an appropriate least support
threshold was however not easy for users. Therefore, it
was more practical to request users to set a bounce on
the result size. Hence, an interactive single-pass
algorithm, called TKC-DS (Top-K frequent Closed
item-sets of Data Streams), was proposed for mining
top-K closed item-sets from data streams proficiently.
A new data structure, called CIL (Closed Item-set
Lattice), was created for maintaining the necessary
information of closed item-sets generated until now.
The proposed TKC-DS algorithm was a competent
method for mining top-K frequent item-sets from data
streams which has been shown by experimental results.
A mining algorithm for discovering repeated itemsets over the transactional information stream has been
proposed by Jea and Li (2009). This technique labors
based on the hypothesis of Approximate Inclusion–
Exclusion not like most of the presented algorithms.
They can fairly accurate the item-sets’ counts according
to assured kept data and the counts bounding technique
devoid of incrementally upholding the overall summary
of the stream. A few other techniques were planned and
incorporated into the algorithm for presentation
development. In addition, the presentation of the
suggested algorithm was experimented and examined
via a cycle of researches.
A competent Graph-Based Mining (GBM)
algorithm for mining the repeated curve samples in a
spatial-temporal database has been proposed by Lee
et al. (2009). The offered technique consists of two
stages. First, to produce a mapping graph and course
Information records (TI-lists) they scrutinize the
database one time. After that, to mine all repeated
course samples in the database they go across the
mapping graph in a depth-first seek manner. The GBM
algorithm can focus support count up and sample
addition in a petite number of TI-lists by means of the
mapping chart and TI-lists. Besides, it makes use of the
adjacency property to diminish the seek out space. So,
they suggested technique can competently mine the
repeated course samples in the database. The test
effects demonstrate that it do better than the Aprioribased and PrefixSpan-based techniques by more than
one array of degree.
A frequent-pattern mining algorithms for
intimately reliant on information being holded, i.e.,
sparse or dense has been presented by Hamrouni et al.
(2009). With reverence to the withdrawal mission and
the gained compression rates and for other information
mining methods such as clustering and for the mining
algorithms of dissimilar sample groups such as hyper
graphs the similar condition applies to the competence
of brief demonstrations of repeatedly happening
samples. They offer an original categorization of
framework sparseness, in addition to a novel sparseness
measure which effects from the aggregation of two
corresponding measures, namely the briefness and
trimness measures of every similarity class, encouraged
by the Galois closure operator. Tests solved principally
reach a finer categorization of target circumstances and,
after that, authenticate our point of view that the
‘‘dense’’ and ‘‘sparse’’ markers were not complete.
For creating a covered edition of information that
conserves both entity solitude and data utility for group
study, a realistic information circulating structure has
been proposed by Fung et al. (2009). Tests on real-life
information propose that by spotlighting on preserving
group arrangement in the covering procedure, the group
value was considerably improved than the group value
of the covered information with no such spotlight. To
direct the covering procedure the main dispute of
covering information for group study was the need of
class tags that can be applied. For categorization study,
an approach changes the problem into the equivalent
problem in which class tags instruct the group
arrangement in the information and gives a structure to
assess the group value on the covered information.
Alzghoul and Lofstrand (2011) have proposed
OCSVM technique, not astonishingly, superior to the
other two techniques, except in industrial functions the
OCSVMs’ relatively long time required for preparation
was a feasible dispute. The technique can be
effortlessly customized to be incremental and the major
benefit of the grid-based categorization technique was
that it permits for computation of the possibility (%)
that an information point belongs to a particular class.
To notice the breakdowns at a premature stage, the high
categorization precision could be developed thus raising
the consistency and therefore the accessibility of the
artifact (since availability is a function of
maintainability and reliability). Besides, the results of
equipment breakdowns in terms of time and cost can be
moderated.
Guns et al. (2011) have this document and gives a
declarative restraint training approach to information
mining. We demonstrate that it is feasible to use offthe-shelf constraint programming techniques for
modeling and working out an extensive range of
constraint-based item-set mining tasks, such as
frequent, closed, discriminative and cost-based item-set
mining further distinctively. To understand the other
locations we expand a fundamental restraint
programming model for indicating repeated item-sets
and demonstrate that this model can simply be
expanded specifically. These differences with
distinctive procedural information mining schemes
where the fundamental process requires to be altered in
order to contain novel kinds of restraint, or new
4788 Res. J. Appl. Sci. Eng. Technol., 7(22): 4779-4794, 2014
mixtures thereof. We moreover demonstrate that there
live problems where the restraint training advance
guides to important presentation developments over
state-of-the-art techniques in information mining and in
addition to novel imminent into the fundamental
information mining problems, although the presentation
of state-of-the-art information mining schemes do better
than that of the restraint programming approach on a
few ordinary missions. By connecting the fundamental
seek algorithms of information mining and restraint
programming schemes to one another a lot of such
imminent can be achieved. Through the declarative
restraint programming approach to information mining,
we converse a quantity of exciting novel explores
querys and disputes lifted.
Organization regulation mining is a significant
information study technique for the finding of
organizations inside data has been given by Weng
(2011). A lot of studies spotlighted on discovering
unclear organization regulations from operation
databases are there. In the actual globe, one may have
accessible moderately irregular information, plus
repeated information regrettably. We are able to
discover a set of odd item-sets that will be helpful for
teachers to discover out which scholars require further
assist in education from irregular information. To verify
the significance of a regulation compiled of occurrence
based information items, this is not enough. They
expand an algorithm based on the Apriori approach to
mine unclear exact odd item-sets from quantitative
information to cure this problem. Unclear organization
regulations can be produced from these unclear exact
odd item-sets at last. To find out education problems
the samples were helpful. Tests effects demonstrate that
the suggested approach was capable to find out
fascinating and precious samples from the review
information.
An automatic summary of clusters and entities
common follow in information culture has been
presented by Schermer (2011). They offer a general
idea of the hazards related with information mining and
the approaches that have been suggested over the years
to moderate these hazards in this technique.
Commencing there we shall inspect whether existing
defends that are chiefly based on solitude and
information protection law (such as data minimization
and data exclusion) are enough. They will propose
substitute strategy choices based on these results and
dictatorial appliances for dealing with the hazards of
data mining, incorporating plans from the meadow of
computer science and that of law and ethics.
A statistics representation called NC_set, which
remains path of the entire data applied for mining
erasable item-sets has been presented by Deng and Xu
(2012). They suggest a novel algorithm named MERIT
for mining erasable item-sets competently based on
NC_set. The effectiveness of MERIT is attained with
three methods as follows. Primarily, the NC_set was a
compressed arrangement, which trims immaterial
information routinely. Next, the calculation of the
increase of an item-set was changed into the
amalgamation of NC_sets, which can be finished in
linear time difficulty by an inventive approach. Third,
MERIT could straightly locate erasable item-sets
devoid of making applicant item-sets in a few cases.
They have performed widespread tests on a lot of
artificial artifact databases for assessing MERIT. shown
that The MERIT was competent and was on standard
about two orders of degree faster than the META, was
shown by the presentation learn and the initial
algorithm for taking out erasable item-sets.
To considerably diminish the number of potential
item-sets in the initial step, an algorithm named UMMI
(high Utility Mining using the Maximal Item-set
property) has been proposed by Lin et al. (2012b).
UMMI exploits an efficient lexicographic tree
arrangement to decide all of the high usefulness itemsets in the next step. UMMI do better than all three of
the formerly applied algorithms, together with CTUPRO, an optimized TWU-mining algorithm and TwoPhase, in tests by means of artificial datasets in
common. On regular, UMMI is 5, 3 and 7 times faster
than CTU-PRO, TWU-mining and Two-Phase,
correspondingly. UMMI is 6 times faster than TwoPhase in an actual information test. In a rational amount
of time the extra two algorithms are not competent of
completing the mining footstep. An approximately
predetermined quantity of memory was used by UMMI
that was usually less than the other algorithms for each
mining. The proposed algorithm could mine the high
utility item-sets efficiently which was shown by
experimental results. The great number of probable
item-sets in the first step was usually the mining
bottleneck. The mining performance could improve
drastically provided they can lessen the number of
potential item-sets,
The practical problem of frequent-item-set
discovery in data-stream environments which possibly
will suffer from data overload has been discussed by Li
et al. (2012). Frequent-pattern mining and dataoverload handling are the two main issues. Hence, a
mining algorithm jointly with two committed overloadhandling mechanisms was projected. The algorithm
keeps the information in its data structure after
extracting basic information from streaming data. We
have found out from the experimental data that our
mining algorithm is efficient and possesses good
accuracy.
Prominently,
the
overload-handling
mechanisms could effectively manage data overload.
Our research results may well show the way towards a
practicable solution for frequent-pattern mining in
dynamic data streams.
Mining repeated item-sets from information
streams by the model of descending window has been
proposed by Dai and Chen (2012). For mining the
repeated item-sets from information streams this
suggested technique offered an algorithm AFPCFI-DS.
Using a FP-tree in every descending window the
algorithm identifies the repeated items. To evade the
4789 Res. J. Appl. Sci. Eng. Technol., 7(22): 4779-4794, 2014
time-consuming functions of exploring in the full tree
to add or erase operations the algorithm implements
local revising approach. The test effects demonstrate
that the algorithm was more competent and had lower
time and memory complication than the algorithms
Moment and FPCFI-DS.
Sensitive knowledge thrashing is the crisis of
eliminating perceptive information from databases
previous to publishing has been proposed by Abul and
Gokce (2012). To conceal repeated item-sets and
organization regulations the problem was widely
studied in the milieu of relational databases. Newly,
chronological sample thrashing from sequential (both
sequence and spatio-temporal) databases has been
examined. Novel forms of information and databases
should be addressed as well with the ever rising flexible
function orders.
Hamrouni et al. (2013) have brought in a novel
precise short depiction of repeated item-sets. This
depiction was based on an examination of the
disjunctive hunt space. The disjunctive item-sets
communicate data regarding the corresponding
happening of items in a dataset. The suggested worker
plans at mapping a lot of disjunctive item-sets to a
single one, named a disjunctive seald item-set. Inside
the indented representation it allows to radically
diminish the number of handled item-sets consequently.
Fascinatingly, the suggested depiction presents straight
contact to the disjunctive and pessimistic holds of
repeated item-sets while making sure the source of their
precise conjunctive supports. They bring to a close
from the test effects stated and conversed here that
depiction was successful and resonance in similarity
with dissimilar other brief depictions.
RESULT ANALYSIS
FIDS algorithm with lever item-sets in information
streams and they find a way around user-defined
occurrence and sequential brinks has been proposed by
Raissi et al. (2007). Based on our FIU-tree arrangement
for mining total repeated item-sets and they find the
FIUT and the FP-growth amplified value when the
number of operations amplified, FIUT method a has
been proposed by Tsaya et al. (2009). Grahne and Zhu
(2005) have proposed FP-array method that permits
using FP-trees more competently when mining repeated
item-sets and they attained fastest algorithms and
devour less memory than other techniques when the
information sets are solid. Based on the quantity of
revealed receptive data, Sekhavat and Fathian (2010)
have put forward Privacy conserving data mining
(PPDM) algorithms effort to diminish the injuries to
solitude caused by nasty parties during the regulation
mining process and they provide quantifies solitude.
Fast Utility Mining (FUM) algorithm has been
proposed by Shankar et al. (2009) and they demonstrate
that FUM algorithm is quicker, more precise and more
dependable and best set for area driven data mining
functions than the presented UMining algorithm.
To attain a frequency-descending highly
compressed prefix-tree arrangement by a single-pass
they get competent in terms of memory and time
complication when discovering new frequent pat-terns
from a high-speed information stream, CPS-tree, an
algorithm that uses tree reforming has been proposed by
Tanbeer et al. (2009). Non-derivable repeated item-set
mining over the descending window of a stream and
they find dissimilar quantity of NDFIT nodes, the
number of CETnodes produced by MOMENT and
every one of the repeated item-sets has been offered by
Li and Chen (2006). Repeated sample mining
investigate has sub-statically widened the range of
information study and they get profound contact on
information mining methodologies and functions in the
long run has been proposed by Han et al. (2007). To
discover the worldwide fairly accurate sealed many
item-sets over the information streams with the Maxfrequency Window model an algorithm called
GGACFI-MFW has been proposed by Guo et al.
(2011). Liu et al. (2008) have proposed repeated itemsets, which applies an optimistic border as a substitute
of a pessimistic border collectively with repeated
generators to outline a lossless brief representation and
they accomplished 3 times larger than the number of
repeated sealed item-sets on pumsb with least support
of 50%.
Chen et al. (2010) watch a common direction that
the best option of MinSup can be set between 3 and 6%
and Fuzzy Frequent Item-set-Based Hierarchical
Clustering (F2 IHC) approach has been proposed by
them. Li and Lee (2009) have proposed MFI-TimeSW
algorithm to mine the set of repeated item-sets
competently over time-sensitive descending windows
they get extremely precise mining effects, however
moreover run important more rapidly and devour less
memory.
BitTableFI
algorithm attains
good
presentation expanded by reducing the cost of applicant
production and support counting has been proposed by
Songa et al. (2008). Li et al. (2012) have proposed
single-pass algorithm Moment to excavate the set of
sealed repeated item-sets over information streams with
a transaction-sensitive descending window and they
obtain highly precise mining results, although moreover
run important faster and devour less memory than
accessible algorithm. Single-pass algorithm named
DSM-MFI which mines the set of all maximal repeated
item-sets over the complete history of the streaming
information and find incremental maintaining the
necessary information about maximal repeated itemsets implanted in the stream has been offered by Li and
Lee (2009). A technique based on major number that is
planned particularly for interactive mining of repeated
samples and them able to diminish the number of
applicant samples and comparisons by means of
competent pruning methods has been offered by Zaki
and Hsiao (2005).
Two algorithms for utility based item-set mining
are developed by integrating these pruning approaches
and they find out high usefulness item-sets competently
4790 Res. J. Appl. Sci. Eng. Technol., 7(22): 4779-4794, 2014
has been proposed by Yao and Hamilton (2006). Tsaya
et al. (2009) have offered Isolated Items Discarding
Strategy (IIDS) to recognize lonely items from
operations and the presentation of FUM and DCG+ is
more resourceful than that of ShFSM and DCG,
correspondingly. Usefulness item-sets with negative
item assessments in large databases and HUINIV-Mine
delivers a standard development about 99.2% over
MEU method in terms of implementation presentation
has been proposed by Chu et al. (2009). Hu and
Mojsilovic (2007) have suggested repeated item set
mining that recognizes high-utility item mixtures and
they find moderately common and can effortlessly
account for dissimilar item-attributes and purpose
utilities. A structure, namely GUIDE, for competently
mining maximal high usefulness item-sets from data
streams and they obtain precision of GUIDE is not
100% as the transaction-projection offers unfinished
data for GUIDE has been proposed by Shie et al.
(2012). The standard accuracy is about 83%.
An algorithm THUI-Mine, which can find out
sequential high usefulness item-sets from information
streams competently and successfully and they obtain
the implementation time ratio of the scale-up tests with
least support thresholds assorted from 0.6 to 1% stays
steady at just about 0.4% has been proposed by Chu
et al. (2008). Lan et al. (2011) have offered a twophased mining algorithm to successfully and
competently find out high on-shelf usefulness item-sets
they get T10I4N4K datasets for dissimilar sizes ranging
from 100 to 500 K when k is set at 0.2%. An algorithm
Usefulness model tree (HUP tree) and then obtain the
minimum usefulness threshold was set at from 70 to 90,
with 5% increase every time has been proposed by Lin
et al. (2012a). Tzung-Pei Hong et al. have suggested a
two-phase mining algorithm to find out high averageutility item-sets and they get made known an enhanced
usefulness result of combining numerous items than the
unique usefulness assess.
An incremental mining algorithm for competently
mining high usefulness item-sets is suggested to grip
the above condition and they obtain the minimum
usefulness thresholds were then place from 1.0 to 0.2%
(decrease 0.2% each time) has been proposed by Lin
et al. (2012a). Ahmed et al. (2012) have suggested
High Usefulness Pattern (HUP) mining over
information streams has turn into a taxing research
subject in data mining and find the applicants of
threshold 6% are a division of the applicants of
threshold 5%. A methodology, hierarchical partitioning,
for mining repeated item-sets in large databases, based
on a new information structure named the Frequent
Pattern List (FPL) and they obtain normal run time of
FPLCI-Miningis 246 sec, whereas the standard run time
of partition-based FPL_HPFCI-Mining is only 187 sec
an development of 32% has been proposed by Tseng
(2013).
CHARM, an competent algorithm for mining all
repeated sealed item-sets and offer CHARM and
CHARM-L can supply orders of degree development
over presented techniques for mining sealed item-sets
has been proposed by Zaki and Hsiao (2005). Erwin
et al. (2007) have offered CTU-PROL algorithm to
mine the entire set of high usefulness item-sets from
both thin and comparatively thick datasets with small or
longer high usefulness samples. An approach called T
Moment for mining all sealed repeated item-set inside
descending window over information streams and they
obtain assessments on actual and non-natural datasets
has been proposed by Nori et al. (2013).
FI mining apply the sealed item-set paradigm to
bound the mining attempt to a subset of the whole FI
family, the repeated sealed Item-sets (FCIs) and find
bound the search to CIs which go halves at least one
item with the novel operation has been proposed by
Valtchev et al. (2002). Guo et al. (2011) have
recommended a stretchy stream mining approach for
hot chirp subject finding and they obtain FP-Stream
assessment to mine repeated samples in twitter dataset.
Xinjun Qi and Mingkui Zong have Mobile data mining
and data stream mining relating to about solitude in
information mining which is an assuring way and they
carry out, information usefulness, solitude guard degree
and data mining complexity. An algorithm, operation
weighted usefulness of sample calculated by means of
the anticipated series and they find a competent and
faster, since it searches for usefulness samples in the
diminished search space with anticipated series have
been proposed by Thilagu and Nadarajan (2013).
The idea of weighty item-set and they find apriori
algorithm that make use of known compilation of heavy
item-sets and notices more heavy item-sets, not
unavoidably displace with the specified ones and of
course the outstanding organization regulations has
been proposed by Palshikar et al. (2007). To work out
the fairly accurate supports of item-sets by means of the
Principle of Inclusion and Exclusion a support
approximation method, CAC algorithm has been
proposed by Jea and Li (2009). Along with they obtain
CAC on standard increases by 23.46% mining
exactness and by 39.90% system presentation compared
with its precursor.
EH-Apriori, FITI, Closed PROWL and ITP-Miner
algorithms in the majority case and they acquire
enhanced presentation in sealed inter-transaction
samples in a depth-first search way has been proposed
by Lee et al. (2008). Guns et al. (2011) have offered
aggressive with existing focused systems for constraintbased item-set mining and offers improved effect.
Hamrouni et al. (2013) have suggested two
corresponding measures, namely the succinctness and
compression measures of every similarity class,
provoked by the Galois closure worker and give
viewpoint that the ‘‘dense’’ and ‘‘sparse’’ labels are not
complete. GBorder2, a competent algorithm for mining
the generator demonstration in active datasets and they
find the amount of informs should be much less that
50% of D has been proposed and assessed by Xu and
Xie (2006). They applied a support threshold of 0.02%
for the T10 dataset and 0.12% for the kosarak dataset.
4791 Res. J. Appl. Sci. Eng. Technol., 7(22): 4779-4794, 2014
Mining algorithm FARIM which can be employed
to find out low-rank fuzzy rare item-sets and they find
the possibility of the suggested FARIM algorithm for
learning data has been proposed by Weng (2011). Hong
et al. (2009) proposed mining algorithm that merges the
benefits of the apriori and the (n, p) algorithm in
discovering large item-sets and presentation the (n, p)
algorithm. Single-pass algorithm, named TKC-DS (topK repeated sealed item-sets of data streams) and TKCDS algorithm applies less memory than that of both
algorithms Moment-K and NewMoment-K has been
proposed by Li (2009). Deng and Xu (2012) have
suggested NC_sets for accumulating compacted, critical
data about item-sets of a creation database and they get
standard creation size and the possibility division of
product yield steady and differ the database size from
20 to 100 k. An algorithm AFPCFI-DS for mining the
repeated item-sets from information streams and they
find Moment requires the majority memory and
AFPCFI-DS wants the slightest has been presented by
Dai and Chen (2012). AFPCFI-DS is clearly faster than
Moment, however a petite slower than FPCFI-DS.
CONCLUSION
In this study, we have performed an extensive
survey about different data mining techniques and
algorithms used in the mining process. Numerous
efficient algorithms are available in the literature for
mining frequent item-sets or patterns based on the
frequency and utility. In this study, we have presented a
brief review of the various approaches and algorithms
for mining of item-sets. Here the researches are
categorized based on the mining techniques and also an
introduction about data mining and its advantages are
presented. From this review, the researchers can able to
know about several techniques existing in data mining.
RECOMMENDATIONS
In the review work, we will present a deeper
insight into the different data mining techniques that
used to mine the significant information from the
databases based on the patterns (or) item sets frequency
or utility. Here all the proposed methods are working
efficiently, but these techniques not considered the item
sets frequency and utility in the future i.e., these works
do not provide assurance that the extracted patterns will
continue to provide the same level of utility and
frequency in the future. As a result of this review, paper
will be supportive for the researchers to improve the
concentration on this data mining techniques by
considering the item sets or patterns utility and
frequency in the future also.
REFERENCES
Abul, O. and H. Gokce, 2012. Knowledge hiding from
tree and graph databases. Data Knowl. Eng., 72:
148-171.
Ahmed, C.F., S.K. Tanbeer, B. Jeong and H. Choi,
2012. Interactive mining of high utility patterns
over data streams. Expert Syst. Appl., 39(15):
11979-11991.
Alzghoul, A. and M. Lofstrand, 2011. Increasing
availability of industrial systems through data
stream mining. Comput. Ind. Eng., 60(2): 195-205.
Bhattacharya, S. and D. Dubey, 2012. High utility itemset mining. Int. J. Emerg. Technol. Adv. Eng., 2(8):
476-481.
Chang, J.H., 2011. Mining weighted sequential patterns
in a sequence database with a time-interval weight.
Knowl-Based Syst., 24(1): 1-9.
Chen, C., F.S.C. Tseng and T. Liang, 2010. Mining
fuzzy frequent item-sets for hierarchical document
clustering. Inform. Process. Manag., 46(2):
193-211.
Chu, C., V.S. Tseng and T. Liang, 2008. An efficient
algorithm for mining temporal high utility item sets
from data streams. J. Syst. Software, 81(7):
1105-1117.
Chu, C., V.S. Tseng and T. Liang, 2009. An efficient
algorithm for mining high utility item-sets with
negative item values in large databases. Appl.
Math. Comput., 215(2): 767-778.
Dai, C. and L. Chen, 2012. An algorithm for mining
frequent closed item-sets in data stream. Phys.
Procedia, 24(3): 1722-1728.
Deng, Z. and X. Xu, 2012. Fast mining erasable itemsets using NC_sets. Expert Syst. Appl., 39(4):
4453-4463.
Deypir, M. and M.H. Sadreddini, 2011. An efficient
algorithm for mining frequent item-sets within
large windows over data streams. Int. J. Data Eng.,
2(3): 119-125.
Erwin, A., R.P. Gopalan and N.R. Achuthan, 2007.
CTU-mine: An efficient high utility item-set
mining algorithm using the pattern growth
approach. Preceding of the 7th IEEE International
Conference on Computer and Information
Technology. Aizu-Wakamatsu, Fukushima, pp:
71-76.
Erwin, A., R.P. Gopalan and N.R. Achuthan, 2008.
Efficient mining of high utility item-sets from large
datasets. Lect. Notes Artif. Int., 5012: 554-561.
Fung, B.C.M., K. Wang, L. Wang and P.C.K. Hung,
2009. Privacy-preserving data publishing for
cluster analysis. Data Knowl. Eng., 68(6): 552-575.
Geurts, K., I. Thomas and G. Wets, 2005.
Understanding spatial concentrations of road
accidents using frequent item sets. Accident Anal.
Prev., 37(4): 787-799.
Gopalan, R.P. and Y.G. Sucahyo, 2002. TreeITL-mine:
Mining frequent item-sets using pattern growth, tid
intersection and prefix tree. Lect. Notes Comput.
Sc., 2557: 535-546.
Grahne, G. and J. Zhu, 2005. Fast algorithms for
frequent item-set mining using FP-trees. IEEE
T. Knowl. Data En., 17(10): 1347-1362.
4792 Res. J. Appl. Sci. Eng. Technol., 7(22): 4779-4794, 2014
Guns, T., S. Nijssen and L.D. Raedt, 2011. Item-set
mining: A constraint programming perspective.
Artif. Intell., 175(12-13): 1951-1983.
Guo, L., H. Su and Y. Qu, 2011. Approximate mining
of global closed frequent item-sets over data
streams. J. Frankl. Inst., 348(6): 1052-1081.
Hamrouni, S.T., E.B. Yahia and M. Nguifo, 2009.
Sweeping the disjunctive search space towards
mining new exact concise representations of
frequent item-sets. Data Knowl. Eng., 68(10):
1091-1111.
Hamrouni, T., S.B. Yahia and E.M. Nguifo, 2013.
Looking for a structural characterization of the
sparseness measure of (frequent closed) item-set
contexts. Inform. Sciences, 222: 343-361.
Han, C., L. Xu and G. He, 2008. Mining recent frequent
item-sets in sliding windows over data streams.
Comput. Inform., 27: 315-339.
Han, J., H. Cheng, D. Xin and X. Yan, 2007. Frequent
pattern mining: current status and future directions.
Data Min. Knowl. Disc., 15(1): 55-86.
Hong, T., C. Lee and S. Wang, 2011. Effective utility
mining with the measure of average utility. Expert
Syst. Appl., 38(7): 8259-8265.
Hong, T., C. Horng, C. Wu and S. Wang, 2009. An
improved data mining approach using predictive
item-sets. Expert Syst. Appl., 36(1): 72-80.
Hu, Y. and Y. Chen, 2006. Mining association rules
with multiple minimum supports: A new mining
algorithm and a support tuning mechanism. Decis.
Support Syst., 42(1): 1-24.
Hu, J. and A. Mojsilovic, 2007. High-utility pattern
mining: A method for discovery of high-utility
item sets. Pattern Recogn., 40(11): 3317-3324.
Jea, K. and C. Li, 2009. Discovering frequent item-sets
over transactional data streams through an efficient
and stable approximate approach. Expert Syst.
Appl., 36(10): 12323-12331.
Kannimuthu, S., K. Premalatha and S. Shankar, 2011.
iFUM: Improved fast utility mining. Int.
J. Comput. Appl., 27(11): 32-36.
Lan, G., T. Hong and V.S. Tseng, 2011. Discovery of
high utility item-sets from on-shelf time periods of
products. Expert Syst. Appl., 38(5): 5851-5857.
Laxman, S. and P.S. Sastry, 2006. A survey of temporal
data mining. Sadhana, 31(2): 173-198.
Lee, A.J.T., Y. Chen and W. Weng-Chong Ip, 2009.
Mining frequent trajectory patterns in spatialtemporal databases. Inform. Sciences, 179(13):
2218-2231.
Lee, A.J.T., C. Wang, W. Weng, Y. Chen and H. Wu,
2008. An efficient algorithm for mining closed
inter-transaction item-sets. Data Knowl. Eng.,
66(1): 68-91.
Li, H., 2009. Interactive mining of top-K frequent
closed item-sets from data streams. Expert Syst.
Appl., 36(7): 10779-10788.
Li, H. and S. Lee, 2009. Mining frequent item-sets over
data streams using efficient window sliding
techniques. Expert Syst. Appl., 36(2): 1466-1477.
Li, H. and S. Lee, 2008. Approximate mining of
maximal frequent item-sets in data streams with
different window models. Expert Syst. Appl.,
35(3): 781-789.
Li, H. and H. Chen, 2009. Mining non-derivable
frequent item-sets over data stream. Data Knowl.
Eng., 68(5): 481-498.
Li, Y., J. Yeh and C. Chang, 2008. Isolated items
discarding strategy for discovering high utility
item-sets. Data Knowl. Eng., 64(1): 198-217.
Li, C., K. Jea, R. Lin, S. Yen and C. Hsu, 2012. Mining
frequent patterns from dynamic data streams with
data load management. J. Syst. Software, 85(6):
1346-1362.
Lin, C., T. Hong and W. Lu, 2011. An effective tree
structure for mining high utility item-sets. Expert
Syst. Appl., 38(6): 7419-7424.
Lin, C., G. Lan and T. Hong, 2012a. An incremental
mining algorithm for high utility item-sets. Expert
Syst. Appl., 39(8): 7173-7180.
Lin, M., T. Tu and S. Hsueh, 2012b. High utility pattern
mining using the maximal item-set property and
lexicographic tree structures. Inform. Sciences,
215: 1-14.
Liu, G., J. Li and L. Wong, 2008. A new concise
representation of frequent item-sets using
generators and a positive border. Knowl. Inf.
Syst., 17(1): 35-56.
Liu, X., J. Guan and P. Hu, 2009. Mining frequent
closed item-sets from a landmark window over
online data streams. Comput. Math. Appl., 57(6):
927-936.
Mao, G., X. Wu, X. Zhu and G. Chen, 2007. Mining
maximal frequent item-sets from data streams.
J. Inf. Sci., 33(3): 251-262.
Masseglia, F., P. Poncelet, M. Teisseire and
A. Marascu, 2008. Web usage mining: Extracting
unexpected periods from web logs. Data Min.
Knowl. Disc., 16(1): 39-65.
Meisel, S. and D. Mattfeld, 2010. Synergies of
operations research and data mining. Eur. J. Oper.
Res., 206(1): 1-10.
Mukherjee, S., M. Banerjee, Z. Chen and
A. Gangopadhyay, 2008. A privacy preserving
technique for distance-based classification with
worst case privacy guarantee. Data Knowl. Eng.,
66(2): 264-288.
Nadimi-Shahrakia, M.H., N. Mustapha, M.N. Sulaiman
and A. Mamat, 2011. Efficient prime-based method
for interactive mining of frequent patterns. Expert
Syst. Appl., 38(10): 12654-12670.
Nori, F., M. Deypir and M.H. Sadreddini, 2013. A
sliding window based algorithm for frequent closed
item-set mining over data streams. J. Syst.
Software, 86(3): 615-623.
4793 Res. J. Appl. Sci. Eng. Technol., 7(22): 4779-4794, 2014
Okada, Y., T. Tada, K. Fukuta and T. Nagashima, 2011.
Audio classification based on closed item-set
mining algorithm. Proceeding of International
Conference on Computer Information Systems and
Industrial Management Applications. AizuWakamatsu, Fukushima, 3: 159-164.
Otey, M.E., S. Parthasarathy, C. Wang, A. Veloso and
W.J. Meira, 2004. Parallel and distributed methods
for incremental frequent item-set mining. IEEE
T. Syst. Man Cyb., 34(6): 2439-2450.
Palshikar, G.K., M.S. Kale and M.M. Apte, 2007.
Association rules mining using heavy item-set.
Data Knowl. Eng., 61(1): 93-113.
Patil, S.P., U.M. Patil and S. Borse, 2012. The novel
approach for improving apriori algorithm for
mining association rule. World J. Sci. Technol.,
2(3): 75-78.
Pillai, J. and O.P. Vyas, 2010. Overview of item-set
utility mining and its applications. Int. J. Comput.
Appl., 5(11): 9-13.
Raissi, C., P. Poncelet and M. Teisseire, 2007. Towards
a new approach for mining frequent item-sets on
data stream. J. Intell. Inf. Syst., 28(1): 23-36.
Schermer, B.W., 2011. The limits of privacy in
automated profiling and data mining. Comput. Law
Secur. Rev., 27(1): 45-52.
Sekhavat, Y.A. and M. Fathian, 2010. Mining frequent
item-sets in the presence of malicious participants.
Inst. Eng. Technol., 4(2): 80-92.
Shahnawaz, M., A. Ranjan and M. Danish, 2011.
Temporal data mining: An overview. Int. J. Eng.
Adv. Technol., 1(1): 20-24.
Shankar, S., T. Purusothaman, S. Jayanthi and N. Babu,
2009. A fast algorithm for mining high utility itemsets. Proceeding of the IEEE International Advance
Computing Conference. Patiala, pp: 459- 1464.
Shankar, S., T. Purusothaman, S. Kannimuthu and
K.V. Priya, 2010. A novel utility and frequency
based item-set mining approach for improving
CRM in retail business. Int. J. Comput. Appl.,
1(16): 87-94.
Shie, B., P.S. Yu and V.S. Tseng, 2012. Efficient
algorithms for mining maximal high utility itemsets from data streams with different models.
Expert Syst. Appl., 39(17): 12947-12960.
Song, M. and S. Rajasekaran, 2006. A transaction
mapping algorithm for frequent item-sets mining.
IEEE T. Knowl. Data En., 18(4): 472-481.
Songa, W., B. Yang and Z. Xu, 2008. IndexBitTableFI: An improved algorithm for mining
frequent item-sets. Knowl-Based Syst., 21(6):
507-513.
Sreenivasarao, V. and S. Vidyavathi, 2010.
Comparative analysis of fuzzy C-mean and
Modified Fuzzy Possibilistic C-mean algorithms in
data mining. Int. J. Comput. Sci. Technol., 1(1):
104-106.
Tanbeer, S.K., C.F. Ahmed, B. Jeong and Y. Lee, 2009.
Sliding window-based frequent pattern mining over
data streams. Inform. Sciences, 179(22):
3843-3865.
Tsay, Y. and J. Chiang, 2005. CBAR: An efficient
method for mining association rules. Knowl-Based
Syst., 18(2-3): 99-105.
Tsaya, Y., T. Hsua and J. Yub, 2009. FIUT: A new
method for mining frequent item-sets. Inform.
Sciences, 179(11): 1724-1737.
Tseng, F., 2013. Mining frequent item-sets in large
databases: The hierarchical partitioning approach.
Expert Syst. Appl., 40(5): 1654-1661.
Tsytsarau, M. and T. Palpanas, 2012. Survey on mining
subjective data on the web. Data Min. Knowl.
Disc., 24(3): 478-514.
Thilagu, M. and R. Nadarajan, 2013. Investigating
Significant Changes in Users’ Interest on Web
Traversal Patterns, International Journal on
Cybernetics and Informatics (IJCI), 2(4): 39-55.
Valtchev, P., R. Missaoui and R. Godin, 2008. A
framework for incremental generation of closed
item-set. Discrete Appl. Math., 156(6): 924-949.
Valtchev, P., R. Missaoui, R. Godin and M. Meridji,
2002. Generating frequent item-sets incrementally:
Two novel approaches based on Galois lattice
theory. J. Exp. Theor. Artif. In., 14(2-3): 115-142.
Weiss, G., M. Saar-Tsechansky and B. Zadrozny, 2005.
Report on UBDM-05: Workshop on utility-based
data mining. ACM SIGKDD Explor. Newslett.,
7(2): 145-147.
Weng, C., 2011. Mining fuzzy specific rare item-sets
for education data. Knowl-Based Syst., 24(5):
697-708.
Woo, H.J. and W.S. Lee, 2009. estMax: Tracing
maximal frequent item sets instantly over online
transactional data streams. IEEE T. Knowl. Data
En., 21(10): 1418-1431.
Xu, L. and K. Xie, 2006. Incremental mining of
generator representation using border sets. Inform.
Softw. Technol., 48(8): 756-764.
Yao, H. and H.J. Hamilton, 2006. Mining item-set
utilities from transaction databases. Data Knowl.
Eng., 59(3): 603-626.
Yun, U. and K.H. Ryu, 2011. Approximate weighted
frequent pattern mining with/without noisy
environments. Knowl-Based Syst., 24(1): 73-82.
Zaki, M.J. and C. Hsiao, 2005. Efficient algorithms for
mining closed item-sets and their lattice structure.
IEEE T. Knowl. Data En., 17(4): 462-478.
Zhong, S., 2007. Privacy-preserving algorithms for
distributed mining of frequent item-sets. Inform.
Sciences, 177(2): 490-503.
4794 
Download