Distributed Classification Based on Association rules (CBA) algorithm

By Shuanghui Luo
Data mining refers to extracting knowledge from large amounts of data. The Classification
Based on Association rules (CBA) algorithm integrates two important data mining
techniques: classification rule mining and association rule mining. The strength of CBA
is its ability to use the most accurate rules for classification. However, existing
techniques based on exhaustive search face a challenge when the amount of data is huge,
because of their computational complexity. Moreover, CBA deals with centralized databases,
whereas in today’s Internet environment databases may be heterogeneous and scattered over
different locations. We will combine CBA with distributed techniques to develop a
distributed CBA algorithm that mines distributed and heterogeneous databases.
The goal of this research is to improve the scalability and performance of the CBA
algorithm and to apply it to a distributed database environment. The first step is to
survey and understand both the CBA algorithm and distributed techniques. To identify
possible bottlenecks in applying CBA to a distributed environment, it is wise to examine
the performance of various CBA algorithms on a meta-database that hides the distributed
nature of the data and appears as one integrated database. Common methods in distributed
computing, such as divide-and-conquer, may then be combined with the CBA algorithm to
develop a distributed CBA algorithm for mining distributed databases.
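
As a rough illustration of how divide-and-conquer might combine with class association
rules, the Python sketch below counts candidate rules separately at each site and then
merges the counts to compute global support and confidence. It is not the algorithm
proposed here or the rule generator of CBA itself: the function names, thresholds, and
sample data are hypothetical, candidates are enumerated by brute force rather than
pruned, and the sites are assumed to share one schema (horizontally partitioned data),
which sidesteps the heterogeneity problem.

```python
from collections import Counter
from itertools import combinations


def local_counts(rows, max_len=2):
    """Count itemsets and (itemset, class) pairs within one site's partition.

    Each row is (set_of_items, class_label). Brute-force enumeration up to
    max_len items; a real miner would prune candidates instead.
    """
    itemset_counts = Counter()
    rule_counts = Counter()
    for items, label in rows:
        for k in range(1, max_len + 1):
            for itemset in combinations(sorted(items), k):
                itemset_counts[itemset] += 1
                rule_counts[(itemset, label)] += 1
    return itemset_counts, rule_counts, len(rows)


def merge_counts(partials):
    """Combine per-site counts; only counts travel, not the raw rows."""
    total_items, total_rules, n = Counter(), Counter(), 0
    for itemset_counts, rule_counts, size in partials:
        total_items.update(itemset_counts)  # Counter.update adds counts
        total_rules.update(rule_counts)
        n += size
    return total_items, total_rules, n


def class_association_rules(partitions, min_support=0.3, min_confidence=0.7):
    """Return rules (itemset, label, support, confidence) over all partitions."""
    itemset_counts, rule_counts, n = merge_counts(
        local_counts(p) for p in partitions
    )
    rules = []
    for (itemset, label), count in rule_counts.items():
        support = count / n
        confidence = count / itemset_counts[itemset]
        if support >= min_support and confidence >= min_confidence:
            rules.append((itemset, label, support, confidence))
    # Prefer higher confidence, then higher support, then shorter rules.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


# Hypothetical data held at two sites.
site_a = [({"outlook=sunny", "windy=no"}, "play"),
          ({"outlook=sunny", "windy=yes"}, "stay")]
site_b = [({"outlook=rain", "windy=yes"}, "stay"),
          ({"outlook=sunny", "windy=no"}, "play")]

for itemset, label, support, confidence in class_association_rules([site_a, site_b]):
    print(itemset, "->", label, f"support={support:.2f} confidence={confidence:.2f}")
```

Because support and confidence depend only on counts, merging per-site counts yields the
same rules as mining the concatenated data, so in this simplified setting each site needs
to ship only its counts rather than its rows.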