Open24 - Nanyang Technological University


PAKDD 2006 Data Mining Competition

Submitted by:

Kamal Bhatia

Pavan Kaniganti

Kapil Warikoo

Sundar Saravana

(Oklahoma State University, Stillwater, OK - 74075)


Table of Contents

Executive Summary
Introduction
Business Understanding
Data Understanding
Data Preparation
Multicollinearity
Missing Values
Sampling
Stratified Sampling
Data Partition
Data Transformation
Modeling
Logistic Regression
Decision Tree
Decision Tree using Gini Split
Decision Tree Using Entropy Split
Artificial Neural Network
ANN with MLP (Decision Tree)
ANN with MLP (Variable Selection in SAS)
ANN with RBF (Variable Selection in SAS)
Ensemble Model
Evaluation
Measures of Evaluation
Sensitivity
Accuracy
Lift Charts
Deployment
Conclusion
References


Executive summary

Objective: Although mobile phones have become a fundamental part of personal communication across the globe during the past ten years, consumer research has devoted little specific attention to the motives and choices underlying the mobile phone buying decision process. We have been provided a dataset containing the demographic and usage data of customers of a telecom company, along with whether they switched from a 2G network service to a 3G network service. We need to determine which customers would switch from 2G to 3G network based on their data. Therefore, the main objective of this project is to develop prediction models to predict whether a customer would switch service or not.

Methods and material: We used SAS Data Miner, Version 4.3, to develop the prediction models. For the entire project we followed the CRISP-DM methodology, whose six phases helped us work through the project in a systematic way. We used two popular data mining algorithms (artificial neural networks and decision trees) along with a commonly used statistical method (logistic regression) to develop the prediction models using a large dataset (more than 20,000 cases). We also developed an ensemble model combining the predictions of the three models developed. A decision tree was also used as a tool for variable selection for the artificial neural network model.

Results: We compared the three models on accuracy, false positives and, mainly, sensitivity. The results indicated that the artificial neural network is the best predictor with 78% accuracy on the holdout sample (test dataset); the decision tree came second with 77.0% accuracy, and the logistic regression model came last of the three with 76.74% accuracy. Our main model comparison was on the basis of the model’s sensitivity (the true positive rate). The artificial neural network came out best again with a sensitivity of 81.00% on the test set, logistic regression came second with 76.41% sensitivity, and the decision tree models came last of the three with 72.61% sensitivity. Therefore, we selected the artificial neural network model as our best model for prediction. This model used a decision tree for initial variable selection.


Conclusion: The comparative study of multiple prediction models for predicting customers that would switch from 2G to 3G network based on their usage and demographic data using a large dataset provided us with an insight into the relative prediction ability of different data mining methods. Using sensitivity analysis on the different prediction models that were developed gave us an insight on the relative importance of the variables (demographic or usage factors) in determining whether a customer would switch from one network to another or not.


1. Introduction

There are numerous complex factors that need to be taken into account when exploring the mobile phone buying decision process, including both macro and micro economic conditions that affect the evolution of the mobile phone market in general and individual consumers’ motives and decision making in particular.

The introduction of 3G in 2001 opened a new page in mobile phone history. Faster data communication and rapidly growing demand for memory-consuming applications raised the requirements for CPU performance and created demand for an application processor to offload the communication processor [1].

The 3G enhancements include:

(1) Communication services such as voice, text and pictures,

(2) Wireless Internet services such as browsing, corporate access and e-mail, and

(3) Media services such as motion pictures, games and music.

Factors affecting choice of 3G networks (Based on the literature survey)

- Decrease in handset prices (especially on the standard models) and also cheaper rate plans [2].

- Adoption of a common platform, which will reduce handset prices (mobile phone manufacturers install their own manufacturer-specific software modules and user interfaces on the same design platform). This not only decreases costs for the manufacturer but also simplifies the functionality, as the customer can easily understand the technology even when changing phones.

- Demographic factors have an influence on the evaluations of different attributes related to mobile phone choice. Specifically, gender and social class will impact on the evaluations of the attributes as men belonging to higher social class seem to be more technology savvy.

- Other factors such as contract type, monthly expenditure, choice of operator, daily use of voice services, weekly use of SMS, and familiarity with mobile devices’ functionalities and mobile technologies (e.g. knowledge of GPRS) [3].

- Broadband Internet functions, PC synchronization, messenger, map and positioning, games and email service are the other important features that affect the buying behavior of consumers shifting to 3G.

2. Business Understanding

An Asian telecom operator which has successfully launched a third generation (3G) mobile telecommunications network would like to make use of existing customer usage and demographic data to identify which customers are likely to switch to using their 3G network. We have been provided a dataset of 24,000 records, of which 18,000 records carry the value of the dependent or output variable (CUSTOMER_TYPE) and would be used to build and train the model. The remaining 6,000 records would be used to score the performance of the model.

We need to build different prediction models to predict the binary output variable in order to determine whether a person in question is a potential 3G service customer based on his/her demographic and usage data. We intend to develop mainly three different types of prediction models: logistic regression, decision trees and artificial neural networks. A number of models would be developed based on these algorithms and compared on accuracy and, mainly, sensitivity, which is the percentage of actual 3G customers that the model predicts correctly. Finally, after the model is selected based on its sensitivity and accuracy, we would apply the model to the score set in order to determine how well the model performs on completely unseen data.

3. Data understanding

In order to perform our analysis we have used a dataset provided by an Asian Telco operator describing their current customer usage along with their demographic data. The dataset has already been split into two parts. The first part is called the training data which would be used to build the model. The other part is called the scoring data which would be used to score and understand the performance of the model on unseen data.

An original sample dataset of 20,000 2G network customers and 4,000 3G network customers has been provided with 252 data fields. The target categorical variable is “Customer_Type” (2G/3G). A 3G customer is defined as a customer who has a 3G Subscriber Identity Module (SIM) card and is currently using a 3G network compatible mobile phone. Using these different variables we would build a model in order to predict the Customer Type. As the output/dependent variable is binary (2G/3G), we can consider the problem to be a binary classification problem.

Three quarters of the dataset (18,000 records) consists of 15,000 customers subscribed to the 2G network and 3,000 customers subscribed to the 3G network. This portion has the target field available and is meant to be used for training/testing. The remaining 6,000 records, of which 5,000 customers are subscribed to the 2G network and 1,000 to the 3G network, have the target field withheld and are meant to be used for verifying the prediction performance of the model.

4. Data preparation

This was probably the most difficult step in our data mining application. Almost 70% of the time and effort in this project was spent on cleaning and preparing the data for predictive modeling.

4.1 Multicollinearity

There were a total of 252 variables in the dataset including the target variable. There were variables in the dataset that contained redundant information such as variable overrides and recodes. For all redundant variables that provided the same information we selected just the one variable that would provide the information and the remaining variables were deleted.


For example, the total number of minutes used, the average number of minutes used and the standard deviation of the minutes used in the last six months carry largely overlapping information. We kept the average minutes used and rejected the other two variables. A large number of variables were rejected on similar grounds.
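The redundancy check described above was done by inspecting the variables, but a similar screen can be sketched programmatically. The following minimal Python example assumes a simple Pearson-correlation threshold, which is not the procedure used in this report. The column names AVG_MINS and TOT_MINS, the data values and the 0.95 threshold are all hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def drop_redundant(columns, threshold=0.95):
    """Keep a column only if it is not highly correlated with one already kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical six-month usage figures for five customers.
columns = {
    "AVG_MINS": [100.0, 250.0, 80.0, 400.0, 150.0],
    "TOT_MINS": [600.0, 1500.0, 480.0, 2400.0, 900.0],  # exactly 6 x AVG_MINS
    "AGE": [25.0, 41.0, 52.0, 33.0, 29.0],
}
kept = drop_redundant(columns)
print(kept)
```

Columns are visited in insertion order, so the first of a correlated group is the one retained. In practice the choice of which variable to keep would still need domain judgment.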

4.2. Missing Values

We first examined the interval variables to determine which had missing values. These are shown in Table1.

Variable          Missing%
AGE               3%
HS_MANUFACTURE    3%

Table1 : Missing Value Percentages for Interval Variables

We then examined the class variables to determine which had missing values. These are shown in Table2.

Variable          Missing%
MARITAL_STATUS    6%
OCCUP_CD          63%
CONTRACT_FLAG     5%
PAY_METD          5%

Table2 : Missing Value Percentages for Class Variables

We used tree imputation to replace the missing values of interval variables and the default constant U (or .) for class variables. We deleted the variable OCCUP_CD because the majority of the records had a missing value for it. We applied the replacement in a way that would not skew the original distribution of the variable. Fig1 below shows the effect of replacement on the AGE variable. From the figure, we see that there is almost no change in the distribution of the variable after imputation.

Fig1 - Effect of Imputation (panels: Before Imputation, After Imputation)
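As an illustration of the class-variable handling only, the sketch below computes missing-value percentages, drops a variable that is mostly missing and fills the remaining gaps with the constant U. The 50% drop threshold and the sample records are assumptions made for this example. Tree imputation for interval variables is not reproduced here.

```python
def missing_pct(values):
    """Percentage of entries that are missing (None)."""
    return 100.0 * sum(v is None for v in values) / len(values)

def prepare_class_variables(table, drop_above=50.0, fill="U"):
    """Drop mostly-missing class variables; fill the rest with a constant."""
    cleaned = {}
    for name, values in table.items():
        if missing_pct(values) > drop_above:
            continue  # too sparse to be useful, as with OCCUP_CD
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned

# Hypothetical records; None marks a missing value.
table = {
    "MARITAL_STATUS": ["M", None, "S", "M", "S"],
    "OCCUP_CD": [None, None, "ENG", None, None],
}
cleaned = prepare_class_variables(table)
print(cleaned)
```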

4.3. Sampling

Based on the distribution of the output variable, we notice that approximately 83% of the records belong to one class label (2G). Since we are using SAS Data Miner to develop our predictive models, we need to change the distribution of the output variable: on such imbalanced data, SAS Data Miner would tend to predict all the records to be in one class (2G). Such a model would be 83% accurate, as it would predict the correct class 83% of the time, but it would not be useful for our purpose because it would never identify a 3G customer. As the aim of our prediction model is to predict people who will switch from 2G to 3G network service, the model needs not only high accuracy but also high sensitivity. For this purpose we used stratified sampling, which is described in the following section.
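The arithmetic behind this argument can be checked with a few lines of Python, using the class counts of the 18,000 labelled records. The "always predict 2G" model is the degenerate case described above, not one of the models built in this report.

```python
n_2g, n_3g = 15000, 3000  # labelled records per class

# A model that always predicts 2G gets every 2G record right and every 3G record wrong.
true_neg, false_neg = n_2g, n_3g
true_pos, false_pos = 0, 0

accuracy = (true_pos + true_neg) / (n_2g + n_3g)
sensitivity = true_pos / (true_pos + false_neg)

print(f"accuracy = {accuracy:.1%}, sensitivity = {sensitivity:.1%}")
```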

4.3.1. Stratified Sampling

The literature widely reports that an even distribution of the dependent variable is important for developing accurate prediction models. Therefore, in order to develop a model using SAS Data Miner, we need an almost even distribution of records between the two class labels [4].

For this purpose we used stratified sampling. By using stratified sampling on the output variable we get a set of records from the clean dataset with an equal distribution of the output variable. We used a random seed of 4633 for stratified sampling.

After performing stratified sampling we have 6,000 records left in the cleaned dataset, but this dataset has a far more even distribution of the output variable than the previous dataset. This is shown in Fig2.

Fig2 – Stratified Sampling using Output Variable (panels: Before Sampling, After Sampling)

From the figure, we see that customers with 3G service make up about 17% of the training data. Using stratified sampling, we balanced the data to have an equal distribution of 3G and 2G network service users.
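One way to obtain such a balanced sample is to downsample each class to the size of the smallest class. The sketch below assumes this equal-size scheme; the helper name balance_by_class is hypothetical, and Python's random generator with seed 4633 will not reproduce the sample drawn by SAS.

```python
import random

def balance_by_class(records, label_key, seed):
    """Randomly downsample every class to the size of the smallest class."""
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[label_key], []).append(rec)
    target = min(len(group) for group in by_class.values())
    rng = random.Random(seed)
    sample = []
    for label in sorted(by_class):
        sample.extend(rng.sample(by_class[label], target))
    rng.shuffle(sample)
    return sample

# Synthetic stand-in for the 18,000 labelled records (15,000 2G, 3,000 3G).
records = ([{"CUSTOMER_TYPE": "2G"} for _ in range(15000)]
           + [{"CUSTOMER_TYPE": "3G"} for _ in range(3000)])
balanced = balance_by_class(records, "CUSTOMER_TYPE", seed=4633)
counts = {label: sum(r["CUSTOMER_TYPE"] == label for r in balanced)
          for label in ("2G", "3G")}
print(len(balanced), counts)
```

With 3,000 records in the smaller class, the balanced sample has 6,000 records, which matches the size reported above.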

4.4. Data Partition

For prediction modeling we split the cleansed and transformed data into three parts using stratified sampling based on the dependent variable. The first part, called training data, contained 60% of the data and was used to build the model. The second part, called validation data, contained 30% of the data and was used to validate the performance of the model on data it was not built on. The final part, called test data, contained 10% of the data and was used to see the performance of the model on unseen data. The test set is the most important in judging the performance of our model for scoring unseen and new datasets with similar variables. To build the three partitions we used random sampling with a seed of 12345. To keep the model unbiased, we needed the distribution of the output variable “CUSTOMER_TYPE” to be similar across the training, validation and test data. This was achieved, as seen in Fig3 below.

Fig3 – Distribution of Dependent Variable (panels: Training Data, Validation Data, Test Data)
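A 60/30/10 partition that preserves the class mix can be sketched as follows. This is a minimal illustration that assumes each class is shuffled and sliced separately. The function name stratified_partition is hypothetical, and the seed 12345 will not reproduce the SAS partition.

```python
import random

def stratified_partition(labels, fractions=(0.6, 0.3, 0.1), seed=12345):
    """Return index lists (train, validation, test) with a similar class mix."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    parts = ([], [], [])
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_valid = round(fractions[1] * len(idx))
        parts[0].extend(idx[:n_train])
        parts[1].extend(idx[n_train:n_train + n_valid])
        parts[2].extend(idx[n_train + n_valid:])  # remainder goes to the test set
    return parts

# Synthetic labels standing in for the balanced 6,000-record sample.
labels = ["2G"] * 3000 + ["3G"] * 3000
train, valid, test = stratified_partition(labels)
print(len(train), len(valid), len(test))
```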

4.5. Data Transformation

On further exploration of the data we also found that independent variables such as AVG_VAS_SMS, AVG_VAS_GAMES, AVG_VAS_WAP, AVG_VAS_XP and many other variables selected into the model were right-skewed. Regression analysis and its associated significance tests, such as the t-test or ANOVA (Analysis of Variance), work best when the variables involved are approximately normally distributed [5]. We therefore transformed most of these variables using bucket transformation to bring them closer to normality and reduce their skewness. An insight node added to the replacement node shows the imputed values inserted into fields that were earlier missing, with indicators having a value of ‘1’ if imputed and ‘0’ if not imputed. An example result of data transformation for the variable AVG_VAS_SMS is shown in Fig4 below.

Fig4 – Transformation Results on AVG_VAS_SMS (panels: Before Transformation, After Transformation)
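For readers unfamiliar with bucketing, the sketch below shows the idea under the assumption that a bucket transformation means equal-width binning over the observed range. The number of buckets and the sample values are hypothetical and are not the settings used in this report.

```python
def bucket_transform(values, n_buckets=4):
    """Map each value to the index of its equal-width bucket (0 .. n_buckets-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_buckets
    # The maximum lands exactly on the upper edge, so clamp it into the last bucket.
    return [min(int((v - lo) / width), n_buckets - 1) for v in values]

# Hypothetical right-skewed usage values: many small, a few large.
avg_vas_sms = [0, 2, 4, 9, 13, 18, 27, 40]
buckets = bucket_transform(avg_vas_sms)
print(buckets)
```

Bucketing replaces the long right tail with a small number of ordered levels, which limits the influence of extreme values.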

5. Modeling

We used three different types of prediction models: artificial neural networks, decision trees, and logistic regression. These models were selected for inclusion in this project due to their popularity in the recently published literature as well as their better than average performance in our initial comparisons. Finally, we also built an ensemble model combining the predictions of the three models developed. We consider the three models in detail in the next three sections.

5.1. Logistic Regression

Logistic regression is a generalization of linear regression [6]. It is used primarily for predicting binary or multi-class dependent variables. As our dependent variable (Customer_Type) is a categorical variable, we have used logistic regression as a prediction model. The logistic regression model was built on the training data as explained in section 4.4. The missing value replacement and data transformation explained in sections 4.2 and 4.5 were carried out before the logistic regression model was developed. The results obtained on running logistic regression on the cleansed and transformed data are shown in Fig5.


Fig5 – Logistic Regression Results
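As a reminder of how a fitted logistic regression model turns inputs into a class prediction, the sketch below applies the logistic function to a linear combination of two inputs. The coefficients, the intercept and the 0.5 cutoff are hypothetical values chosen only for illustration and are not the fitted values from this report.

```python
import math

def predict_proba(features, coefficients, intercept):
    """Logistic model: P(3G) = 1 / (1 + exp(-(intercept + sum(b_i * x_i))))."""
    z = intercept + sum(coefficients[name] * x for name, x in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, not the fitted model.
coefficients = {"HS_AGE": -0.05, "AVG_BILL_AMT": 0.01}
customer = {"HS_AGE": 6.0, "AVG_BILL_AMT": 80.0}

p = predict_proba(customer, coefficients, intercept=0.0)
label = "3G" if p >= 0.5 else "2G"
print(round(p, 3), label)
```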

The variables selected in the model in descending order of importance starting from the most important variable are shown in Table3 below.

Variable              Description
HS_AGE                Handset Age in Months
AVG_BILL_AMT          Average billing amount in last six months
HS_MANUFACTURE        Handset manufacturer
AGE                   Age of the customer
TOT_RETENTION_CAMP    Total number of received retention campaign in the last six months
LINE_TENURE           Line tenure in days

Table3 – Variables Selected In Logistic Regression


In order to understand the performance of the model we developed a 2 x 2 confusion matrix with the target variable “CUSTOMER_TYPE” versus the predicted variable I_CUSTOMER_TYPE. This is as shown in Fig6 below.

Fig6 – Confusion Matrix for Logistic Regression on Test set

From the above confusion matrix, we see that the model has an accuracy of 76.64% and a sensitivity of 76.41%. Accuracy and sensitivity are used in section 6 for evaluation of the models in detail.

5.2. Decision Tree

Decision trees are powerful classification algorithms that are becoming increasingly popular with the growth of data mining in the field of information systems [7].

As the name implies, this technique recursively separates observations in branches to construct a tree for the purpose of improving the prediction accuracy. In doing so, they use mathematical algorithms (e.g., information gain, Gini index, and Chi-squared test) to identify a variable and corresponding threshold for the variable that splits the input observation into two or more subgroups. This step is repeated at each leaf node until the complete tree is constructed.
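The split search described above can be illustrated with a small Python sketch that scores candidate thresholds on a single variable using either Gini impurity or entropy. It covers only binary splits on one interval variable; the handset ages and labels are hypothetical.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels, impurity):
    """Return the binary split 'value <= t' with the lowest weighted impurity."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(values))[:-1]:  # the largest value would leave the right side empty
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical handset ages (months) and customer types.
hs_age = [2, 4, 5, 18, 24, 30]
ctype = ["3G", "3G", "3G", "2G", "2G", "2G"]
print(best_threshold(hs_age, ctype, gini))
print(best_threshold(hs_age, ctype, entropy))
```

A full tree repeats this search over all variables at every node and stops according to settings such as the minimum number of observations per leaf.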


We developed decision tree models using each of the three splitting criteria. In the following two subsections we discuss the decision trees based on the Gini and entropy splitting criteria, as they came out best among a number of decision trees.

5.2.1. Decision Tree using Gini Split

We ran a number of decision trees based on the Gini split. We varied the minimum number of observations in a leaf and the number of observations required for a split search in order to improve the performance of our model. We also compared the performance of binary and multi-way trees. For the Gini split, the best performance was observed with a minimum of 10 observations per leaf and 100 observations required for a split search. The tree selected was a 3-way split decision tree. The variables selected in the tree, in order of importance starting from the most important variable, are shown in Table4 below.

Table4 – Variables Selected for Decision Tree

The variables with role rejected in Table4 were not taken into the model.


In order to understand the performance of the model we developed a 2 x 2 confusion matrix with the target variable “CUSTOMER_TYPE” versus the predicted variable I_CUSTOMER_TYPE. This is as shown in Fig7 below.

Fig7 – Confusion Matrix for Decision Tree on Test set

From the above confusion matrix, we see that the model has an accuracy of 77.16% and a sensitivity of 75.67%. Accuracy and sensitivity are used in section 6 for evaluation of the models in detail.

5.2.2. Decision Tree Using Entropy Split

As with the Gini split, we ran a number of decision trees by varying the same settings. We also compared the performance of binary and multi-way trees. For the Entropy split, the best performance was observed with a minimum of 10 observations per leaf and 100 observations required for a split search. The tree selected was a 3-way split decision tree. The variables selected in the tree were exactly the same as for the Gini split, with the same importance as well.


In order to understand the performance of the model we developed a 2 x 2 confusion matrix with the target variable “CUSTOMER_TYPE” versus the predicted variable I_CUSTOMER_TYPE. This is as shown in Fig8 below.

Fig8 – Confusion Matrix for Decision Tree on Test set

From the above confusion matrix, we see that the model has an accuracy of 76.66% and a sensitivity of 76.00%. Accuracy and sensitivity are used in section 6 for evaluation of the models in detail.

Decision Tree Model Selection: Based on the sensitivity and accuracy numbers we selected Decision Tree which used entropy split discussed in section 5.2.2 as our best DT model and used it for further comparisons between different models for evaluation as discussed in section 6.

5.3. Artificial Neural Network

Artificial neural networks (ANNs) are commonly known as biologically inspired, highly sophisticated analytical techniques, capable of modeling extremely complex non-linear functions. We first used a popular ANN architecture called multi-layer perceptron (MLP) with back-propagation (a supervised learning algorithm). The MLP is known to be a powerful function approximator for prediction and classification problems. It is arguably the most commonly used and well-studied ANN architecture [8].

We also developed an artificial neural network based on the radial basis function architecture. For all our artificial neural networks, we selected the misclassification model type and used one hidden layer.
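To make the two architectures concrete, the sketch below shows the forward pass of an MLP with one hidden layer and, separately, a single Gaussian radial basis unit. The tanh hidden activation, the logistic output unit and all weights are assumptions made for illustration. They are not the trained networks, and training by back-propagation is not shown.

```python
import math

def mlp_forward(x, hidden_weights, hidden_biases, out_weights, out_bias):
    """One hidden layer with tanh units, followed by a logistic output unit."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(hidden_weights, hidden_biases)]
    z = sum(w * h for w, h in zip(out_weights, hidden)) + out_bias
    return 1.0 / (1.0 + math.exp(-z))

def rbf_unit(x, center, width):
    """Gaussian radial basis unit: the response decays with distance from the center."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / (2.0 * width ** 2))

# Arbitrary illustrative inputs and weights.
x = [0.5, -1.0]
p = mlp_forward(x, hidden_weights=[[1.0, 0.5], [-0.5, 1.0]], hidden_biases=[0.0, 0.1],
                out_weights=[1.5, -2.0], out_bias=0.2)
r = rbf_unit(x, center=[0.5, -1.0], width=1.0)
print(round(p, 3), round(r, 3))
```

The MLP hidden units respond to weighted sums of the inputs, whereas an RBF hidden unit responds to how close the input lies to its center.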

The different ANN models developed for our study are described in the following sections.

5.3.1. ANN with MLP (Decision Tree Variable Selection)

As the neural network in SAS Data Miner version 4.3 uses all the variables in the data provided to build the model, we performed variable selection before running the ANN. First, we performed variable selection using a binary decision tree with the Gini index as the splitting criterion and a depth of 8. The variables selected for the ANN model are the same as shown in Table4 in section 5.2. In order to understand the performance of the model we developed a 2 x 2 confusion matrix with the target variable “CUSTOMER_TYPE” versus the predicted variable I_CUSTOMER_TYPE as before. We got an accuracy of 78% and sensitivity of 81% on the test set.

5.3.2. ANN with MLP (Variable Selection in SAS)

We next performed variable selection using the variable selection node in SAS Data Miner. The variables selected for the ANN model are the same as shown in Table4 above. In order to understand the performance of the model we developed a 2 x 2 confusion matrix with the target variable “CUSTOMER_TYPE” versus the predicted variable I_CUSTOMER_TYPE. This is as shown in Fig9 below.


Fig9 – Confusion Matrix for ANN using Variable Selection for Test set

From the above confusion matrix, we see that the model has an accuracy of 77.83% and a sensitivity of 79.33%. Accuracy and sensitivity are used in section 6 for evaluation of the models in detail.

5.3.3. ANN with RBF (Variable Selection in SAS)

We then used a different ANN architecture called radial basis function. Variable selection was performed using variable selection node in SAS Data Miner. The variables selected for the ANN model are the same as shown in Table4 above. In order to understand the performance of the model we developed a 2 x 2 confusion matrix with the target variable “CUSTOMER_TYPE” versus the predicted variable I_CUSTOMER_TYPE. This is as shown in Fig10 below.


Fig10 - Confusion Matrix for ANN (RBF) using Variable Selection for Test set

From the above confusion matrix, we see that the model has an accuracy of 78.83% and a sensitivity of 78.33%. Accuracy and sensitivity are used in section 6 for evaluation of the models in detail.

ANN Model Selection: Based on the sensitivity and accuracy numbers we selected ANN which used MLP architecture with decision tree as variable selection discussed in section 5.3.1 as our best ANN model and used it for further comparisons between different models for evaluation as discussed in section 6.

5.4. Ensemble Model

With an ensemble model we tried to combine the prediction capabilities of our logistic regression model, best decision tree model and best ANN model. For this purpose, we used the ensemble node in SAS Data Miner. In order to understand the performance of the model we developed a 2 x 2 confusion matrix with the target variable “CUSTOMER_TYPE” versus the predicted variable I_CUSTOMER_TYPE. This is as shown in Fig11 below.


Fig11 - Confusion Matrix for Ensemble for Test set

From the above confusion matrix, we see that the model has an accuracy of 77.83% and a sensitivity of 79.33%. Accuracy and sensitivity are used in section 6 for evaluation of the models in detail.
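One common way to combine models is to average their predicted probabilities and then apply a cutoff. The sketch below assumes that scheme. Whether the ensemble node was configured this way is not stated in this report, and the probabilities shown are hypothetical.

```python
def ensemble_predict(posteriors, cutoff=0.5):
    """Average the P(3G) estimates of several models and apply a cutoff."""
    avg = sum(posteriors.values()) / len(posteriors)
    return avg, ("3G" if avg >= cutoff else "2G")

# Hypothetical posterior probabilities for a single customer.
posteriors = {"logistic_regression": 0.42, "decision_tree": 0.55, "ann": 0.71}
avg, label = ensemble_predict(posteriors)
print(round(avg, 2), label)
```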

6. Evaluation

6.1. Measures of Evaluation (Accuracy and Sensitivity)

For the purpose of evaluating the results of our prediction model we have mainly used three evaluation parameters namely accuracy, sensitivity and false positives. Accuracy is defined as the ratio of the total correct predictions by total number of observations. In other words it is the ratio of the sum of the true positives and true negatives divided by total number of observations [5].

Another important evaluation factor for our prediction model is sensitivity. Sensitivity is defined as the ratio of the positive observations that are predicted correctly to all the actual positive observations in the dataset [9]. In other words, it is the percentage of customers who switched from 2G to 3G network that the model predicted correctly. Our final evaluation factor was false positives, which is the number of observations incorrectly predicted as positive. In other words, these are customers predicted to switch from 2G to 3G network who did not actually switch.
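These measures follow directly from the four cells of a 2 x 2 confusion matrix. The sketch below computes them for a small set of hypothetical labels, treating 3G as the positive class.

```python
def confusion_counts(actual, predicted, positive="3G"):
    """Count true/false positives and negatives for a binary target."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    return tp, fn, fp, tn

def accuracy(tp, fn, fp, tn):
    """Share of all observations that are predicted correctly."""
    return (tp + tn) / (tp + fn + fp + tn)

def sensitivity(tp, fn):
    """Share of actual positives that are predicted as positive."""
    return tp / (tp + fn)

# Hypothetical labels for ten customers.
actual    = ["3G", "3G", "3G", "3G", "2G", "2G", "2G", "2G", "2G", "2G"]
predicted = ["3G", "3G", "3G", "2G", "2G", "2G", "2G", "2G", "3G", "3G"]
tp, fn, fp, tn = confusion_counts(actual, predicted)
print(tp, fn, fp, tn, accuracy(tp, fn, fp, tn), sensitivity(tp, fn))
```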


6.1.1. Sensitivity

We first compared the three selected models on the basis of their prediction sensitivity. The prediction sensitivity for the models over the training, validation and test set is shown in Table5.

Model                  Training (%)   Validation (%)   Test (%)
Logistic Regression    77.27          80.22            76.41
Decision Tree          81.62          79.05            76.00
ANN                    81.90          82.43            81.00
Ensemble Model         79.30          79.62            79.33

Table5: Sensitivity of the four models

For our evaluation of the model, sensitivity was of main concern as we needed to predict people that would switch from 2G to 3G network. From the above table we see that artificial neural network performs better than logistic regression and decision tree on comparison over all three training, validation and test data. Therefore, on account of sensitivity we conclude that artificial neural network performs best compared to the other two models.

6.1.2. Accuracy

We then compared the three selected models on the basis of their prediction accuracy. The prediction accuracy for the models over the training, validation and test set is shown in Table6. From Table6 we see that no single model performs best on accuracy across the training, validation and test sets. However, the most important figure here is the performance on the test set, as this data was not seen while the model was built and was used only to score the performance of the model developed.

Model                  Training (%)   Validation (%)   Test (%)
Logistic Regression    77.26          78.27            76.74
Decision Tree          81.94          76.44            76.66
ANN                    78.59          77.34            78.00
Ensemble Model         77.41          74.88            77.83

Table6: Accuracy of the four models

We see that ANN performs best on account of accuracy on the test set as well.

We therefore select Artificial Neural Network which used decision tree for variable selection as our best model for prediction on account of sensitivity and accuracy.

6.2. Lift Charts

Lift charts are pictorial representations that show the merit of developing a model based on data mining principles. The performance of binary models can be assessed by means of lift charts. A lift chart plots the response obtained from the customers ranked highest by the model on a relative scale, with ‘building no model’ as the baseline measure. These are shown in Fig12 below.

Fig12 – Lift Charts (panels: Lift Value (DT vs ANN), % Captured Response (DT vs ANN))

By analyzing the models on sensitivity and accuracy, we found the artificial neural network to be our best model, closely followed by the decision tree. From the lift chart on the left of Fig12, we see that using either of the two models for prediction gives a much higher response rate than not using any model. For example, mailing to the top 40th percentile selected by the model, we would have a response rate of 1.65 times (ANN) and 1.62 times (DT) that obtained by randomly selecting 40% of customers. Comparing the two models, ANN performs better on this account.

The lift chart on the right of Fig12 shows the captured responses of the models. We see that by having a model we have 50% more captured response at the 30% level for the DT model and 53% more captured response at the 30% level for the ANN model. This means that if the company sent out offers to the top 30% of customers selected by the model, it would have a 53% (ANN) higher positive response rate than by sending offers to 30% of its customers at random.

Again we see that ANN performs better on this account as well. Therefore, we select our ANN model as our final model.
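The two quantities plotted in a lift chart can be computed from model scores as sketched below. The scores and labels are hypothetical and the function name lift_and_capture is illustrative. Lift is the response rate among the top-ranked customers divided by the overall response rate, and captured response is the share of all responders found in that top group.

```python
def lift_and_capture(scores, actual, depth, positive="3G"):
    """Lift and captured response when targeting the top `depth` fraction by score."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    n_top = round(depth * len(ranked))
    top_hits = sum(label == positive for _, label in ranked[:n_top])
    all_hits = sum(label == positive for _, label in ranked)
    lift = (top_hits / n_top) / (all_hits / len(ranked))
    captured = top_hits / all_hits
    return lift, captured

# Hypothetical model scores P(3G) for ten customers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = ["3G", "3G", "2G", "3G", "2G", "3G", "2G", "2G", "3G", "2G"]
lift, captured = lift_and_capture(scores, actual, depth=0.4)
print(round(lift, 2), round(captured, 2))
```

The two measures are linked: captured response equals lift multiplied by the targeting depth.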

7. Deployment

The deployment of the model refers to the application of a model for prediction or classification of new data. We can then use the model that has been developed to predict the people who would subscribe to 3G network service based on their demographic and usage data. This model can also be used for prediction on real-world data where we don’t know whether the customer would subscribe to 3G service or not.

8. Conclusion

We had been provided a dataset that has the demographic and usage data of customers of a telecom company and whether they switched from a 2G network service to 3G network service. We needed to determine which customers would switch from 2G to 3G network based on their data. Therefore, the main objective of this project was to develop prediction models to predict whether a customer would switch service or not.

We used SAS Data Miner, Version 4.3, to develop the prediction models. For the entire project we followed the CRISP-DM methodology. We used two popular data mining algorithms (artificial neural networks and decision trees) along with a commonly used statistical method (logistic regression) to develop the prediction models using a large dataset (more than 20,000 cases). We also developed an ensemble model combining the predictions of the three models developed. A decision tree was also used as a tool for variable selection for the artificial neural network model.

The results indicated that the artificial neural network is the best predictor with 78% accuracy on the holdout sample (test dataset); the decision tree came second with 77.0% accuracy, and the logistic regression model came out worst of the three with 76.74% accuracy. Our main model comparison was on the basis of the model’s sensitivity (the true positive rate). The artificial neural network came out best again with a sensitivity of 81.00% on the test set, logistic regression came second with 76.41% sensitivity, and the decision tree models came out worst of the three with 72.61% sensitivity. Therefore, we selected the artificial neural network model as our best model for prediction. This model used a decision tree for initial variable selection.


References

[1] Tomi T Ahonen, Timo Kasper, Sara Melkko. 3G marketing- Communities and strategic partnerships. John Wiley and Sons 2002;17-22.

[2] S.Wolfe, S.Robinson. Offering the town Orange 3G Services. Telecommunication Review 2004;12-13.

[3] Mort G.S.; Drennan J. Mobile digital technology: Emerging issue for marketing. The Journal of Database Marketing, Volume 10, Number 1, September 2002, pp. 9-23(15)

[4] M. Joanne Morgan, John M. Hoenig. Estimating Maturity-at-Age from Length Stratified Sampling. Journal of Northwest Atlantic Fishery Science 2002;21: 51-63.

[5] Dursun Delen, Glenn Walker, Amit Kadam. Predicting breast cancer survivability: a comparison of three data mining methods. Oklahoma State University 2004;2-3.

[6] Hastie T, Tibshirani R, Friedman J. The elements of statistical learning. New York, NY: Springer-Verlag; 2001.

[7] Quinlan J. Induction of decision trees. Mach Learn 1986;1:81-106.

[8] Haykin S. Neural networks: a comprehensive foundation. New Jersey: Prentice Hall; 1998.

[9] Robert M. Nosofsky, Thomas J. Palmeri, Stephen C. McKinley, Paul Glauthier. Comparing Models of Rule-Based Classification Learning. Department of Psychology Indiana University 2003,14-18.

[10] 3G marketing- Communities and strategic partnerships by Tomi T Ahonen, Timo Kasper, Sara Melkko.

Web resources:

1) http://www.oasis.oulu.fi/publications/jem-05-hk.pdf

2) www.Gartner.com

3) www.marshall.usc.edu/ctm/publications/FITCE2002.pdf

4) www.emeraldinsight.com

5) Doebele, J. (2002), "No A for Asia", Forbes , No.April, pp.35-7

6) http://www.aprg.com/asp/ana_SR.asp (Asia pacific research group)
