Participants are required to email to nnoriel@gmail

Summary for PAKDD competition
Submitted by Data Mining Group, Software and Computing Division, IHPC

Understanding of the problem:
A data set with 249 attributes is given for this classification problem. 18,000 samples with class labels are used for training and validation, and 6,000 samples are available for evaluating classification performance. The task is to distinguish 2G customers from 3G customers; the resulting model will be used to predict potential 3G customers. The attributes describe existing customers and include mobile usage and demographic data.

Full technical details of algorithm(s) used:
a) Attributes are ranked before the classifiers are trained. A Chi-squared Ranking Filter is used to rank the importance of attributes.
b) Support vector machines are trained as classifiers on the top 10 attributes.
c) A linguistic rule extraction algorithm is used to extract rules describing the classification decisions.

The classification model produced:
Five SVM classifiers are generated, and the votes of the five classifiers are used as the final decision when classifying the unlabelled data.

Insights obtained from the classification model:
The attribute subset composed of the top 10 attributes gives better prediction results than other attribute subsets (600 3G and 600 2G samples are used for validation). 103 linguistic rules, composed of 10 premises in “IF… THEN…” form, are obtained. The rules give 60% prediction accuracy in our validation.
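The ranking and voting steps described above can be illustrated with a small sketch. This is not the group's implementation: it assumes attributes are already discrete (a real ranking filter would need to discretize numeric attributes first), the function names `chi_squared`, `rank_attributes` and `majority_vote` are hypothetical, and the SVM training itself is omitted, so the five votes below are made-up placeholders standing in for classifier outputs.

```python
from collections import Counter


def chi_squared(values, labels):
    """Pearson chi-squared statistic between one discrete attribute and the class label."""
    n = len(values)
    observed = Counter(zip(values, labels))   # counts per (attribute value, class) cell
    value_totals = Counter(values)
    label_totals = Counter(labels)
    stat = 0.0
    for v, v_count in value_totals.items():
        for c, c_count in label_totals.items():
            expected = v_count * c_count / n  # count expected under independence
            stat += (observed[(v, c)] - expected) ** 2 / expected
    return stat


def rank_attributes(rows, labels, top_k):
    """Return the indices of the top_k attributes, highest chi-squared score first."""
    n_attrs = len(rows[0])
    scores = [chi_squared([r[j] for r in rows], labels) for j in range(n_attrs)]
    order = sorted(range(n_attrs), key=lambda j: scores[j], reverse=True)
    return order[:top_k]


def majority_vote(predictions):
    """Combine the labels predicted by several classifiers for one sample.

    With five voters and two classes a tie cannot occur.
    """
    return Counter(predictions).most_common(1)[0][0]


# Toy data (invented for illustration): attribute 0 separates the classes
# perfectly, attribute 1 carries no information about the class.
rows = [("high", "a"), ("high", "b"), ("low", "a"), ("low", "b")]
labels = ["3G", "3G", "2G", "2G"]

top = rank_attributes(rows, labels, top_k=1)

# Placeholder outputs of five classifiers for a single unlabelled sample.
votes = ["3G", "2G", "3G", "3G", "2G"]
final = majority_vote(votes)
```

In the setting of the summary, `top_k` would be 10 and each of the five votes would come from one trained SVM; the sketch only shows how the attribute scores and the final decision could be computed.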
