PAKDD 2006 Data Mining Competition
Approach and Understanding of the Problem
The PAKDD 2006 competition poses a binary classification problem: each subscriber
must be labeled as either 2G or 3G. PAKDD provides a labeled training set, so we used
supervised learning: the classifier is first trained on the training data and then
applied to the unlabeled prediction data. We used a neural network as the classifier.
Technical Details
We used a multilayer feed-forward neural network to classify the data set, trained
with the backpropagation algorithm. The network has three layers: 1) an input layer,
2) a hidden layer, and 3) an output layer. Before feeding the data to the network, we
performed a preprocessing step. We eliminated all attributes that were irrelevant to
the classification, thereby reducing the dimensionality of the data. We also encoded
and normalized the remaining attributes so that they met the input requirements of
the neural network.
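The preprocessing step described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the competition: the text does not say which encoding or normalization was applied, so one-hot encoding and min-max scaling are assumed here, and the helper names (preprocess, one_hot_encode, min_max_normalize) and the toy attribute names are hypothetical.

```python
def min_max_normalize(values):
    """Scale a list of numbers into [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector, one slot per category."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]


def preprocess(rows, drop, categorical):
    """rows: list of dicts. Returns one numeric feature vector per row."""
    # Drop the attributes judged irrelevant (assumed to be chosen by hand).
    kept = [name for name in rows[0] if name not in drop]
    columns = []
    for name in kept:
        raw = [row[name] for row in rows]
        if name in categorical:
            encoded = one_hot_encode(raw)
            # Transpose so each category becomes its own column.
            columns.extend([list(col) for col in zip(*encoded)])
        else:
            columns.append(min_max_normalize([float(v) for v in raw]))
    # Transpose back: one feature vector per input row.
    return [list(vec) for vec in zip(*columns)]


# Hypothetical toy records; the attribute names are invented for illustration.
rows = [
    {"id": 1, "age": 20, "plan": "basic"},
    {"id": 2, "age": 40, "plan": "premium"},
    {"id": 3, "age": 30, "plan": "basic"},
]
features = preprocess(rows, drop={"id"}, categorical={"plan"})
```

In this toy run the "id" attribute is dropped, "age" is scaled into [0, 1], and "plan" expands into one column per category, so every value handed to the network lies in the same range.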
Classification Model
Our model uses 224 of the 249 given features, so the input layer of the neural network
has 224 perceptrons. The hidden layer has 8 perceptrons and the output layer has 1.
We first performed supervised training of the network on the training data. The
network was trained for 10000 loops.