SECTION 1: INTRODUCTION

1.1 OVERVIEW

The process of data collection, no matter the objective, can provide a person or entity with a large amount of generic information on a particular subject. However, learning from and interpreting the data requires much more than simply gathering it. For example, a business may build meaningful relationships with its customers by learning from previous interactions with them, observing their needs, and remembering their preferences in order to determine how to serve them better in the future. For this type of learning to take place, data must first be collected and organized in a useful and consistent way. This procedure is known as data warehousing. Data warehousing allows a user to remember what has been noticed in the data. Afterwards, the data must be analyzed, interpreted, and transformed into useful information. This is where data mining comes into play. Data mining is the exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules (Berry, Linoff, pg. 5).

Data mining can be applied in a wide variety of areas, from sports to law enforcement to education. In this project, I use data mining techniques to predict the current contraceptive method choice (no use, long-term methods, or short-term methods) of Indonesian women based on their demographic and socio-economic characteristics. The algorithms implemented here are Naive Bayesian Classification, One-Rule Classification, and Decision Tree.

This project presents a Web-based client/server application. The project makes use of the three-tier client/server architecture, with the Web browser as the client front-end; the Common Gateway Interface (CGI), Perl, Visual Basic, and Active Server Pages (ASP) as the middle-tier software; and Microsoft Access 2000 and a comma-separated value (CSV) text file as the database back-end. The database administrator has the capability to add, delete, edit, and search for records. The administrator can also change the administrator password and add users who have permission to access the website. Users have privileges to add records and to search for records. A logging system is also implemented, which keeps track of the time, date, host server, browser, and operating system of users that access the database. The log is accessible by both the administrator and the users.

1.2 BACKGROUND INFORMATION

According to the Central Bureau of Statistics, Indonesia is the fourth-most populous country in the world, with an estimated total population of 207 million in 2000 (United Nations Population Fund). Indonesia has a growth rate of 1.5 percent a year, and although the population growth rate is at a moderate level, the country has significant momentum of growth. The government of Indonesia is concerned about the uneven distribution of the population and the scale of the population growth. This is especially true when considering overcrowding in urban, densely populated areas, such as Java and Bali. Other areas of concern are the relatively high infant and under-five mortality rates (52 and 71 per 1,000, respectively) and the persistently high maternal mortality ratio (estimated at 370 per 100,000 births). Indonesia has been recognized for the success of its family planning efforts.
However, according to the United Nations Population Fund, progress in the contraceptive prevalence rate (CPR) seems to have stalled at about 57 percent. Also, the burden of contraceptive use appears to be unevenly shouldered by women, as the male-based CPR is less than 2 percent. And even though the "unmet need" for contraceptives of currently married women has been estimated at a relatively small 9.2 percent, this number is probably considerably higher when unmarried men and women are taken into account. In order to meet this need, it is paramount that the quality and scope of contraceptive services and information be expanded. A critical challenge for Indonesia remains access to affordable contraceptives for all its citizens, especially the poor.

1.3 ABOUT THE DATASET

This dataset is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey. It was created and donated by Tjen-Sien Lim on June 7, 1997, and its contents were downloaded from the UCI Machine Learning Repository. The samples contained in the survey are of married women who were either not pregnant or did not know if they were pregnant at the time of the interview. The problem is to predict the contraceptive method choice of a woman based on her demographic and socio-economic characteristics. Predicting the contraceptive method choice of Indonesian women can assist the government in deciding how and where to target and provide information on contraceptive choices for its female population. The three choices are no use, long-term methods, or short-term methods. The number of instances is 1473, and the number of attributes is 11, including the primary key (ID) and the classifying attribute (cmchoice).

1.4 ATTRIBUTE INFORMATION

No.  Attribute   Description                    Type / Values
1.   ID          ID number                      (primary key attribute)
2.   wife_age    Wife's age                     (numerical)
3.   wife_ed     Wife's education               (categorical) 1=low level, 2, 3, 4=high level
4.   hus_ed      Husband's education            (categorical) 1=low level, 2, 3, 4=high level
5.   no_child    Number of children ever born   (numerical)
6.   wife_rel    Wife's religion                (binary) 0=Non-Islam, 1=Islam
7.   wife_work   Wife now working?              (binary) 0=Yes, 1=No
8.   hus_oc      Husband's occupation           (categorical) 1=low level, 2, 3, 4=high level
9.   st_live     Standard-of-living index       (categorical) 1=low, 2, 3, 4=high
10.  media       Media exposure                 (binary) 0=Good, 1=Not good
11.  cmchoice    Contraceptive method used      (class) 1=No-use, 2=Long-term, 3=Short-term

Figure 1: Attribute Information

There are no missing values in the dataset.

SECTION 2: TECHNICAL DESCRIPTION

2.1 THE THREE-TIER ARCHITECTURE

The most commonly used application development architecture, and the one supported by most application servers, is a component-based, three-tier model (Directions on Microsoft). Components increase code reuse and simplify development. By using components, a developer can package compiled (binary) code in such a way that another developer is able to easily and efficiently discover the functions provided by the component (usually from a programming tool such as Visual Basic) and invoke those functions. This is accomplished while keeping the internal workings of the component hidden. The three-tier architecture increases scalability and reliability by separating the three major logical functions of an application (user interaction, business logic, data storage) from one another.
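To make this separation concrete for a project like this one, the following is a minimal sketch, not the project's actual code, of how the business-logic and data tiers might be packaged behind a single Perl module that a presentation-tier CGI script could call. The module name CmcLogic, the validation rule, and the ODBC data source name cmcdb are all hypothetical.

    # CmcLogic.pm -- hypothetical middle-tier module: business rules plus data access.
    package CmcLogic;
    use strict;
    use warnings;
    use DBI;

    # Business-logic tier: accept a record only if the wife's age is a plausible number.
    sub valid_record {
        my ($rec) = @_;
        return $rec->{wife_age} =~ /^\d+$/
            && $rec->{wife_age} >= 15
            && $rec->{wife_age} <= 60;
    }

    # Data tier: store the record through an ODBC data source (e.g., the Access back-end).
    sub add_record {
        my ($rec) = @_;
        my $dbh = DBI->connect('dbi:ODBC:cmcdb', '', '', { RaiseError => 1 });
        $dbh->do('INSERT INTO cmc (wife_age, cmchoice) VALUES (?, ?)',
                 undef, $rec->{wife_age}, $rec->{cmchoice});
        $dbh->disconnect;
    }

    1;  # a presentation-tier CGI script would simply: use CmcLogic; CmcLogic::add_record($rec);

Because the GUI script only calls valid_record and add_record, the database or the validation rules can change without any change to the pages the user sees.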
Many Web services must provide functionality that displays the graphical user interface (GUI), performs the main logic of the program, and then stores and retrieves data. Although a developer may write a single module that interconnects the three functions of user interaction, logic, and data storage, such an approach would require a great deal of work in maintenance and in deployment. Therefore, developers attempt to divide the application's functionality into tiers, or layers.

Years ago, as business applications moved from minicomputer or mainframe systems to the PC, developers adopted a two-tier strategy, also known as the client-server model. In this model, the data storage (typically provided by a server running a database management system such as SQL Server or DB2) is separated from the rest of the application (typically running on desktop PCs). As a result, many developer tools were created around the client-server model. However, the client-server model had its drawbacks, which included the following, as described in (Directions on Microsoft):

Difficult to evolve. Because the client piece of a client-server system included both the GUI and the business logic, developers updating the GUI could inadvertently change the business logic as well.

Difficult to deploy. A client application had to be deployed on the desktop PC of each user who wanted to access the application, potentially requiring thousands of deployments.

Difficult to scale. Each running client connected directly to the database, thereby consuming server resources and often limiting the number of simultaneous users that could access an application.

On the other hand, the three-tier model introduces an intermediate business-logic tier between the GUI and the data storage, which provides these advantages over the client-server model:

Increased scalability. Logic components can be pooled and shared across multiple running clients.

Easier maintenance. Since the GUI code is separate from the business logic, the GUI can be changed and enhanced without accidentally altering core business rules. In addition, when the business logic must be changed, only a relatively small number of middle-tier servers need to be updated instead of a larger number of desktop PCs.

Shared business logic and support for multiple interfaces. The same business logic can be used from a Web-based interface and from a thick-client interface.

Figure 2 illustrates the setup of a typical three-tier architectural model:

Figure 2 (Delphi 2)

2.2 WEB BROWSER / HTML

HTML, the HyperText Markup Language, is the standard authoring language for publishing on the World Wide Web. Having gone through several stages of evolution, today's HTML has a wide range of features reflecting the needs of a very diverse and international community wishing to make information available on the Web (HTML Activity Statement). HTML defines the layout and structure of a Web document by using a series of tags and attributes. In this project, I use HTML for the structure of the Web pages within my project site. A Web browser is a software application used to locate and display HTML pages. The Microsoft Internet Explorer Web browser serves as the client in this application.

2.3 CGI

The Common Gateway Interface (CGI) is a standard for interfacing external applications with information servers, such as HTTP or Web servers (CGI: Common Gateway Interface).
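A short example makes the idea concrete. The following is a minimal sketch of a Perl CGI script of the kind discussed in this section and in Section 2.4; the form field name is hypothetical, and the script is illustrative rather than one of the project's actual scripts.

    #!/usr/bin/perl
    # Minimal CGI sketch: read one query parameter and return a dynamically generated page.
    use strict;
    use warnings;
    use CGI;

    my $q    = CGI->new;
    my $name = $q->param('name') || 'visitor';   # 'name' is a hypothetical form field

    print $q->header('text/html');               # the required Content-type header
    print "<html><body><p>Hello, $name. This page was generated at ",
          scalar localtime, ".</p></body></html>\n";

Each time the script runs, the timestamp and the echoed parameter change, which is exactly the dynamic behavior described next.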
A CGI program is executed in real time, which means that it can output dynamic data to a Web page. A plain HTML document, by contrast, contains static information: it exists in a constant state, and the information sent to the screen does not change. Because a CGI program is executable, it allows visitors to a Web page to run a program on the server where the CGI document is hosted. For this and other reasons, authors of CGI scripts must take security measures regarding the execution of the scripts. CGI programs must reside in a special directory, so that the Web server knows to execute the program instead of merely displaying it to the browser. Typically, this directory is under the direct control of the webmaster, which prevents the average user from creating CGI programs. The most common practice is to place CGI programs in a directory named /cgi-bin.

2.4 PERL

A CGI program can be written in any language that can be executed on the Web server, and Perl is the language of choice for many developers. Perl is an acronym for the Practical Extraction and Report Language. Perl is available for most operating systems, including virtually all Unix-like platforms (Perl). The language is optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. Perl can handle many system management tasks, and the language's designers intended it to be practical, easy to use, and efficient. Perl combines many features of C, sed, awk, and sh, as well as csh, Pascal, and BASIC-PLUS. Expression syntax in Perl corresponds closely to C expression syntax. Perl, unlike most Unix utilities, does not arbitrarily limit the size of the user's data, as long as the required memory is available; for example, Perl can parse a whole file as a single string. Recursion in Perl is of unlimited depth. The tables used by hashes, commonly referred to as associative arrays, grow as necessary to prevent diminished performance. One of Perl's most useful capabilities is that it can use sophisticated pattern matching techniques to scan large amounts of data quickly. And although optimized for scanning text, Perl can also deal with binary data. In this project, I use Perl to implement CGI scripts for performing the database manipulation operations, such as insert, delete, edit, and search. Perl and CGI serve as part of the middle tier of this application.

2.5 ASP / VBScript

Active Server Pages (ASP) are components that allow Web developers to create server-side scripted templates. In turn, these templates generate dynamic, interactive Web server applications. By embedding special programmatic code in standard HTML pages, a user can interact with page objects such as ActiveX or Java components, access data in a database, or create other types of dynamic output. The HTML output by an Active Server Page is browser independent, which means that it can be read equally well by Microsoft Internet Explorer, Netscape Navigator, or most other browsers (ASP-help.com). In this project, I use ASP technology to implement the user login feature, as well as the add user function, which is done using Visual Basic script, or VBScript. ASP / VBScript serve as part of the middle tier of this application.

2.6 B-Course

B-Course is a Web-based data analysis tool for Naive Bayesian modeling. Specifically, B-Course is used for dependence and classification modeling.
B-Course can be freely used for educational and research purposes as an analysis tool wherever dependence or classification modeling based on data is needed. The software provides two types of modeling: dependency modeling and classification.

2.7 VISUAL BASIC DATA MINING.NET

Visual Basic Data Mining.Net is a Web portal that provides data mining algorithm and application documentation, as well as various source code in .Net and Visual Basic. These features of the site demonstrate how the .NET Framework and/or Visual Basic can be used either to learn how data mining algorithms and applications function or to build data mining applications. Visual Basic Data Mining.Net also offers a data mining community and provides working data mining algorithms and applications. The site provides a wizard-based interface for implementing the algorithms. Visual Basic Data Mining.Net can be found online at http://www.visual-basic-data-mining.net.

2.8 SEE5

See5 analyzes data to produce decision trees and/or rulesets that relate a case's class to the values of its attributes (See5). In See5, an application consists of a collection of text files. These files define classes and attributes, describe the cases to be analyzed, provide new cases to test the classifiers produced by See5, and specify misclassification costs or penalties. A See5 application consists of two mandatory files, a .names file and a .data file. The .names file defines the classes and attributes associated with the data. The .data file contains the actual cases to be analyzed by See5 in the process of producing a classifier.

SECTION 3: DATA MINING ALGORITHMS

3.1 NAIVE BAYESIAN CLASSIFICATION

Bayes' Theorem shows how to calculate the probability of one event given that some other event is known to have occurred. Expressed algebraically, the relationship is:

P(A|B) = P(A) * P(B|A) / P(B)

That is, the probability that A takes place given that B has occurred, P(A|B), equals the probability that A occurs, P(A), times the probability that B occurs if A has happened, P(B|A), divided by the probability of B occurring, P(B).

Naive Bayesian classifiers make the assumption that an attribute's effect on a given class is independent of the values of the other attributes; this assumption is known as class conditional independence. It is made to simplify the computation and is in this sense considered "naive" (Naive Bayes – Introduction). The independence assumption that underlies the Naive Bayesian classification technique is a strong one and therefore may not be realistic. However, a Naive Bayesian classifier can still yield excellent predictions. One example occurs when a feature selection process is applied to the data prior to classification, which ensures that only one member of any pair of highly correlated features is kept and used in the classification process. When dealing with gene expression data, feature selection must be performed prior to classification due to the extremely high dimensionality of the feature space (Wallach, 2003).

A Bayesian network consists of nodes and arcs that connect pairs of nodes (Myllymäki et al.). For each variable, exactly one node exists. A major restriction on a Bayesian network is that the arcs are not allowed to form loops: if the arcs can be followed in their direction so that some node is visited twice, the model is not a Bayesian network.
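Before looking at the Bayesian network examples in Figures 3 and 4, the naive Bayes computation itself can be sketched in a few lines. The following is only an illustrative Perl sketch, assuming categorical attributes stored in hashes and using simple add-one smoothing to avoid zero probabilities; it is not the B-Course implementation.

    # Minimal naive Bayes sketch for categorical attributes (illustrative only).
    use strict;
    use warnings;

    # Count class frequencies and, within each class, attribute-value frequencies.
    sub train {
        my (@records) = @_;    # each record: { class => $c, attrs => { attr => value, ... } }
        my (%class_count, %value_count);
        for my $r (@records) {
            $class_count{ $r->{class} }++;
            while (my ($a, $v) = each %{ $r->{attrs} }) {
                $value_count{ $r->{class} }{$a}{$v}++;
            }
        }
        return (\%class_count, \%value_count, scalar @records);
    }

    # Score each class as P(class) times the product of P(value | class) over the attributes,
    # applying the class conditional independence assumption described above.
    sub classify {
        my ($class_count, $value_count, $n, $attrs) = @_;
        my ($best_class, $best_score);
        for my $c (keys %$class_count) {
            my $score = $class_count->{$c} / $n;
            for my $a (keys %$attrs) {
                my $cnt = $value_count->{$c}{$a}{ $attrs->{$a} } || 0;
                $score *= ($cnt + 1) / ($class_count->{$c} + 1);   # add-one smoothing
            }
            ($best_class, $best_score) = ($c, $score)
                if !defined $best_score || $score > $best_score;
        }
        return $best_class;
    }

The class with the highest score is reported as the prediction; the denominator P(B) can be ignored because it is the same for every class.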
Figure 3 is an example of a network that is NOT a Bayesian network:

Figure 3 (Myllymäki et al.)

Presented next is a dependency model for a Bayesian network. This example model is given in (Myllymäki et al.):

    A and B are dependent on each other if we know something about C or D (or both).
    A and C are dependent on each other no matter what we know and what we don't know about B or D (or both).
    B and C are dependent on each other no matter what we know and what we don't know about A or D (or both).
    C and D are dependent on each other no matter what we know and what we don't know about A or B (or both).
    There are no other dependencies that do not follow from those listed above.

Figure 4 shows the Bayesian network for these dependencies:

Figure 4 (Myllymäki et al.)

A and B are considered dependent, given a (possibly empty) set S containing some other variables of the network, if one can freely travel the arcs from A to B. If the arcs cannot be freely traveled from A to B, then A and B are not dependent given S. The ability to travel an arc is generally independent of the direction of the arc. If S is an empty set, one may travel the arcs forward and backward, provided that the same node is never visited twice and that one never travels an arc forward into a node and then immediately travels backward out of that node along another arc.

In this project, I use B-Course to perform Naive Bayesian dependency modeling and Naive Bayesian Classification on the contraceptive method choice database.

3.2 ONE-RULE CLASSIFICATION

The one-rule algorithm creates one data mining rule for the dataset based on one attribute (one column in a database table). After comparing the error rates from all the attributes, it chooses the rule that gives the lowest classification error. The rule assigns each distinct value of the chosen attribute to one category or class. The algorithm can be defined in pseudocode as (Tagbo):

    For each attribute in the data set
        For each distinct value of the attribute
            Find the most frequent classification
            Assign the classification to the value
            Calculate the error rate for the value
        Calculate the total error rate for the attribute
    Choose the attribute with the lowest error rate
    Create one rule for the chosen attribute

The goal of the one-rule data mining algorithm in this implementation is to classify each distinct value of the attributes wife_age, wife_ed, hus_ed, no_child, wife_rel, wife_work, hus_oc, st_live, and media of the contraceptive method choice database as no use, long-term methods, or short-term methods. Afterwards, the attribute with the lowest error rate is chosen as the best rule. In this project, I use Visual Basic Data Mining.Net to process the results of the One-Rule Classification algorithm on the contraceptive method choice database.

3.3 DECISION TREE

A visual aid for data mining is the decision tree. A decision tree is, in essence, a flow chart of questions or data points that eventually lead to a decision. Decision tree algorithms begin by finding the test that best splits the data among the desired categories. At each successive level of the tree, the subsets created by the previous split are themselves split, making a path down the tree. Each of the paths through the tree represents a rule. However, some rules are more useful than others, and in some cases the predictive power of the entire tree can be improved by pruning back the weaker branches.
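The One-Rule classifier of Section 3.2 can in fact be viewed as a decision tree of depth one: it splits the data exactly once, on the attribute whose single-attribute rule misclassifies the fewest records. As a small illustrative sketch (not the Visual Basic Data Mining.Net implementation), and assuming each record has been read into a hash keyed by attribute name with the class stored under cmchoice, the procedure might look like this in Perl:

    # One-Rule sketch: choose the attribute whose one-attribute rule makes the fewest errors.
    use strict;
    use warnings;

    sub one_rule {
        my ($records, @attributes) = @_;   # records: [ { wife_ed => 1, ..., cmchoice => 2 }, ... ]
        my ($best_attr, $best_rule, $best_errors);
        for my $attr (@attributes) {
            my %count;                     # class counts for every distinct value of this attribute
            $count{ $_->{$attr} }{ $_->{cmchoice} }++ for @$records;
            my %rule;
            my $errors = 0;
            for my $value (keys %count) {
                # the most frequent class for this value becomes the rule's prediction
                my ($majority) = sort { $count{$value}{$b} <=> $count{$value}{$a} }
                                 keys %{ $count{$value} };
                $rule{$value} = $majority;
                my $total = 0;
                $total  += $_ for values %{ $count{$value} };
                $errors += $total - $count{$value}{$majority};
            }
            if (!defined $best_errors || $errors < $best_errors) {
                ($best_attr, $best_rule, $best_errors) = ($attr, \%rule, $errors);
            }
        }
        return ($best_attr, $best_rule, $best_errors);
    }

The deeper trees discussed in this section generalize this idea by continuing to split each resulting subset on further attributes.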
At each node of the tree, three things can be measured: the number of records entering the node, the percentage of records classified correctly at the node, and the way the records would be classified if the node were a leaf. The tree continues to grow until it is no longer possible to find useful ways to split the incoming records. Decision trees thus create a set of bins, or boxes, into which the data miner may place records.

Figure 5 shows a partial binary tree for the classification of musical instruments. The gap in the center of the row of bins corresponds to the root node of the tree. All stringed instruments fall to the left of the gap, and all other instruments fall to the right.

Figure 5 (Berry, Linoff, pg. 245)

In this project, I use See5 to construct decision trees and process those results for the contraceptive method choice database.

SECTION 4: SYSTEM DESIGN

4.1 SYSTEM LAYOUT

Figure 6: Project System Flow (diagram; components include User Login, Administrator Login, Query / Manipulation (Search, Add Records, Delete, Edit, Add Users, Change Admin Password), Logs, and Data Mining (Naive Bayes, One Rule, Decision Tree))

4.2 WEBSITE PRESENTATION

Figure 7: The contraceptive method choice database homepage

Figure 8: Administrator Login Page

To guarantee security, only the privileged database administrator can log in to the database to perform three of the database manipulation functions, which are to add users, delete records, and edit records. The administrator can also change the admin password.

Figure 9: Administrator Options

After the administrator successfully logs in, the administrator options are presented. These options include: search records, change password, add records, add users, delete records, and edit records. NOTE: Clicking the "Delete Record" button next to an entry will delete that entry from the database.

Figure 10: Password Change Success Page

Figure 11: Add User Page

Figure 12: Add User Success Page

Figure 13: Edit Record Page

Figure 14: Edit Record Success Page

Figure 15: User Login Page

Users have privileges to add records and to search for records.

Figure 16: Bad User Login Page

Figure 17: User Request Page

Figure 18: User Request Success Page

Figure 19: Email Message

This is the email that the system automatically sends to the database administrator when a user requests a login name and password.

Figure 20: Search Page

Figure 21: Search Page Results

Both the database administrator and users have access to the search function.

Figure 22: Add Record Page

Figure 23: Add Record Success Page

Both the database administrator and users have access to the add records function.

Figure 24: Access Log Detail

Both the database administrator and users have access to the access log feature. A count is kept for the different types of browsers and operating systems used. The log detail contains the time, date, host server, browser, and operating system of the computer that accesses the system.

SECTION 5: DISCUSSION

5.1 NAIVE BAYESIAN RESULTS

B-Course was used to construct Bayesian dependency models for the contraceptive method choice database. All variables, excluding the primary key ID, were used in constructing the model. When the software is invoked, B-Course searches for the most probable model for the data and returns intermediate results.
B-Course can then continue, using a search strategy that selects models resembling the current best model instead of picking models randomly from a set. As B-Course continues, it collects a set of relatively good models and then attempts to combine the best parts of these models so that the resulting combined model is better than any of the original models. After evaluating 8539 candidate models, B-Course returned the following Bayesian network as the best model:

Figure 25: Bayesian Network (Myllymäki et al.)

B-Course was started again and evaluated 444681 more candidate models, for a grand total of 453220 models evaluated. After searching these candidate models, B-Course located a new Bayesian network that represents the same model as the previous network:

Figure 26: New Bayesian Network (Myllymäki et al.)

B-Course also provides for Naive Bayesian classification. In classification modeling, one attribute of the data is chosen as the class variable, and the other attributes become predictor variables. The ultimate goal is to find the model that, given the values of the predictor variables, deduces the value of the class variable. Classification modeling can also help to test whether some classes are similar or not. For example, if a model can correctly tell the classes apart, then there must be some difference between those particular classes. Further analysis can measure how significant the differences between the classes are.

B-Course merges many quantitative models to build one single classification model. After running B-Course, 301 candidate models were evaluated. The estimated classification accuracy of the best model found was 48.74%. On average, the correct class received 36.56% probability. Figure 27 displays the variables B-Course found to be the best subset for predicting the class variable:

Figure 27: Classification model (Myllymäki et al.)

Figure 28: Class arc weights (Myllymäki et al.)

It was estimated that if the selected models were used, then 48.74% of future classifications would be done correctly. B-Course built 1473 models, each constructed using all but one of the data items in the dataset (a leave-one-out procedure); each model was then used to classify the single data item not used in its construction. Out of the 1473 models, 718 classified the one unseen data item correctly.

A confusion matrix displays how many members of a certain class were predicted to be members of a different class. Figure 29 shows the confusion matrix for the Naive Bayesian classifier; the diagonal entries (shown in bold in the original output) are the numbers of correct classifications.

                          Predicted
Actual        Long-term     No-use     Short-term
Long-term         102          60           171
No-use             79         319           231
Short-term         66         147           297

Figure 29: Confusion Matrix (Myllymäki et al.)

5.2 ONE-RULE RESULTS

Using the Visual Basic Data Mining.Net software, I applied the one-rule classification algorithm to the contraceptive method choice database. The steps used in producing the one-rule results are as follows:

Step 1: Decide which of the attributes will be used to create the best one-rule for the dataset. Attribute ID is not chosen because it is the primary key for the database. Attribute cmchoice is not selected because it is the class attribute, containing the categories needed for classification. The remaining 9 attributes are chosen.

Step 2: List the distinct values of each attribute. These values can be seen in Figure 1.
Step 3: Find the most frequent classification for every distinct value of an attribute using the contraceptive method choice class values (no use, long-term methods, short-term methods). For example, according to the output, when no_child = 8 there were 9 cases of category no use, 7 cases of category long-term methods, and 8 cases of category short-term methods. Therefore, the most frequent classification is category no use, and a rule is made classifying 8 children as category no use (8 children -> No Use). The error rate for 8 children is the total number of times it appears in the dataset (24) minus the number of instances of its most frequent class (9), divided by the total (24). So the error rate in this case is 15/24.

Step 4: Repeat Step 3 for each value of each attribute.

Step 5: Choose the attribute with the lowest error rate.

Step 6: Create a one-rule classification based on this attribute.

Figure 30 displays a portion of the one-rule classification output. As shown, the attribute with the lowest error rate, which was selected as the best rule, is no_child.

Attribute   IsNumeric  BestRule  Value  L.P.B.  U.P.B.  Class  Frequency  Total
wife_work   False      False
wife_rel    False      False
wife_ed     False      False
st_liv      False      False
media       False      False
hus_oc      False      False
hus_ed      False      False
wife_age    True       False
no_child    True       True      0      0       0       1      62         62
no_child    True       True      0      0       0       1      33         34
no_child    True       True      0      0       1       1      94         95
no_child    True       True      0      1       1       2      31         31
no_child    True       True      0      1       1       3      61         61
no_child    True       True      0      1       1       1      49         49
no_child    True       True      0      1       1       2      15         15
no_child    True       True      0      1       1.5     3      26         26
no_child    True       True      0      1.5     2       1      83         83
no_child    True       True      0      2       2       2      39         39
no_child    True       True      0      2       2       3      77         77
no_child    True       True      0      2       2       1      31         31
no_child    True       True      0      2       2       2      17         17
no_child    True       True      0      2       2.5     3      29         29
no_child    True       True      0      2.5     3       1      46         46
no_child    True       True      0      3       3       2      44         44
no_child    True       True      0      3       3       3      90         90
no_child    True       True      0      3       3       1      24         24
no_child    True       True      0      3       3       2      26         26
no_child    True       True      0      3       3.5     3      29         29
no_child    True       True      0      3.5     4       1      37         37

Figure 30: One Rule Output (Tagbo)

5.3 DECISION TREE RESULTS

I performed decision tree analysis on the contraceptive method choice dataset using See5. There are 1473 instances in the dataset, with 10 attributes plus the unique identifier ID. However, this version of See5 allowed a maximum of 400 cases at a time. The class attribute, cmchoice, is represented by three categories (1 = no use; 2 = long-term; 3 = short-term). In the output, the numbers shown between 0 and 1 represent the probability that a case meeting the given criteria belongs to the specific class (no use, long-term, short-term). The 400 cases were selected so that relatively equal numbers of cases for each contraceptive method choice classification are present. Thus, for the cmc.data file, the breakdown by ID is as follows:

No use (1): ID# 1-133
Long-term (2): ID# 416-549
Short-term (3): ID# 643-776

Below is a partial screen shot of a mining run for the ruleset of the attributes. A 95% confidence interval was used for all runs.

Figure 31: Ruleset (Quinlan)

Figure 32: Decision Tree Output (Quinlan)

See5 creates a decision tree of the results.
To paraphrase, the tree can be translated in this manner:

    if no_child is less than or equal to 0, then no use
    else if no_child > 0
        if wife_ed = 1
            if wife_age > 36, then no use
            else if wife_age <= 36
                if st_live = 1, then no use
                if st_live = 2, then long-term
                if st_live = 3, then short-term
                if st_live = 4, then long-term
        if wife_ed = 2
            ... (etc.)

From the decision tree, conclusions can be drawn about which contraceptive method choice an Indonesian woman is most likely to make. For example, a woman with no children would be most likely to choose no use. A wife with at least one child, a low educational level, and above the age of 36 is predicted for no use. A wife with at least one child, a low educational level, who is no more than 36 years old and has a standard-of-living index of 2 is predicted to use long-term methods. A wife with those same characteristics but a standard-of-living index of 3 is predicted to use short-term methods. Numerous predictions can be read throughout the decision tree.

In many cases, the true classification changes only gradually as attribute values change. For example, a threshold may assign values less than or equal to 0.5 to one classification, say long-term methods, and values greater than 0.5 to another classification, say short-term methods. If the former holds, we go no further and predict long-term methods. By default, a threshold such as this is sharp: a case with a hypothetical value of 0.49 is treated quite differently from one with a value of 0.51.

See5 contains an option to use fuzzy thresholds instead of sharp thresholds like the one mentioned in the previous paragraph. A fuzzy set is a set whose elements are usually neither totally in the set nor totally out of the set (Meadow, et al., pg. 217). When this option is invoked, each threshold is broken into three values: a lower bound lb, an upper bound ub, and a central value t. If the attribute value in question is below lb or above ub, classification is made using the single branch corresponding to the '<=' or '>' result, respectively. If the value falls between lb and ub, both branches of the tree are investigated, and the results are combined probabilistically. The values of lb and ub are determined by See5 based on a study of the perceived sensitivity of classification to small changes in the threshold. Figure 33 shows a screenshot of the classifier construction options, and Figure 34 displays part of the decision tree with fuzzy thresholds:

Figure 33: Classifier Construction Options (Quinlan)

Figure 34: Decision Tree Output with Fuzzy Thresholds (Quinlan)

Of note is how the upper and lower bounds of the thresholds are specified. For instance, in the non-fuzzy example, when no_child > 0 and wife_ed = 1, wife_age has one threshold, 36: if wife_age is greater than 36, no use is returned; if wife_age is less than or equal to 36, the tree branches to the st_live attribute to determine the appropriate class. In the fuzzy example, however, there is no single cut-off. If no_child >= 1 and wife_ed = 1, then when wife_age >= 38, no use is returned, and when wife_age <= 35, the tree branches to the st_live attribute to determine which class is predicted. The fuzzy thresholds option constructs an interval close to the threshold; within this interval, both branches of the tree are explored, and the results are combined to give a predicted class.
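See5's exact weighting scheme inside the interval is not reproduced here; the following is only a minimal Perl sketch of one plausible linear interpolation between the two branches, with hypothetical class distributions, to illustrate the idea of combining both outcomes probabilistically.

    # Illustrative only: combine the two branch predictions for a value inside (lb, ub).
    sub fuzzy_combine {
        my ($value, $lb, $ub, $le_branch, $gt_branch) = @_;   # branches: { class => probability }
        return $le_branch if $value <= $lb;                   # clearly on the '<=' side
        return $gt_branch if $value >= $ub;                   # clearly on the '>' side
        my $w = ($value - $lb) / ($ub - $lb);                 # weight of the '>' branch grows with the value
        my %combined;
        for my $class (keys %$le_branch, keys %$gt_branch) {
            $combined{$class} = (1 - $w) * ($le_branch->{$class} || 0)
                              +      $w  * ($gt_branch->{$class} || 0);
        }
        return \%combined;
    }

    # Example call: wife_age = 36.5 with lb = 35 and ub = 38 mixes the two branches.
    # The class distributions passed in here are hypothetical placeholders.
    # my $result = fuzzy_combine(36.5, 35, 38,
    #                            { 'no use' => 0.2, 'long-term' => 0.8 },
    #                            { 'no use' => 0.9, 'long-term' => 0.1 });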
When wife_age is greater than 35 and less than 38 (35 < wife_age < 38), the prediction becomes imprecise. A wife_age value of 36.5 is chosen as the fuzzy threshold.

5.4 CONCLUSION

All three data mining algorithms were successful at predicting the contraceptive method choice of an Indonesian woman based on her demographic and socio-economic characteristics. B-Course created Bayesian dependency networks for the attributes of the dataset. The estimated classification accuracy of the best model found was 48.74%. With the resulting classification accuracy below 50% in this case, the Naive Bayesian algorithm may not be the best model for this dataset. It is possible that evaluating more candidate models would increase the accuracy. One-Rule classification determined that the no_child attribute, the number of children born to an Indonesian woman, produced the best rule for predicting the contraceptive method choice. The decision tree algorithm determined that the best predictor of the contraceptive method choice was the rule no_child <= 0, which predicts the no use category (95.7%). In comparing the regular decision tree to the decision tree with fuzzy thresholds, the regular decision tree had an error rate of 25.0%, while the decision tree with fuzzy thresholds had an error rate of 25.5%. There was not a significant difference between these two methods.

WORKS CITED

ASP-help.com. "What are Active Server Pages?". Retrieved March 8, 2003, from the World Wide Web. http://www.asp-help.com/getstarted/gs_aboutasp.asp

Berry, Michael, and Gordon Linoff. Data Mining Techniques for Marketing, Sales, and Customer Support. New York: John Wiley and Sons, 1997.

CGI: Common Gateway Interface. Retrieved March 8, 2003, from the World Wide Web. http://hoohoo.ncsa.uiuc.edu/cgi/intro.html

Delphi 2 – Developing for Multi-Tier Distributed Computing Architectures. Retrieved March 9, 2003, from the World Wide Web. http://community.borland.com/article/0,1410,10343,00.html#three

Directions on Microsoft. "What is an Application Server?". Retrieved March 9, 2003, from the World Wide Web. http://www.directionsonmicrosoft.com/sample/DOMIS/research/2002/12dec/1202wiaas.htm

HTML Activity Statement. Retrieved March 8, 2003, from the World Wide Web. http://www.w3.org/MarkUp/Activity

Lewis, David. "Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval". Proceedings of ECML-98, 10th European Conference on Machine Learning. Florham Park, NJ: AT&T Labs Research, 1998.

Meadow, Charles, B. R. Boyce, and D. H. Kraft. Text Information Retrieval Systems, 2nd Edition. San Diego: Academic Press, 2000.

Myllymäki, P., T. Silander, H. Tirri, and P. Uronen. "B-Course: A Web-Based Tool for Bayesian and Causal Data Analysis". International Journal on Artificial Intelligence Tools, Vol. 11, No. 3 (2002): 369-387.

"Naive Bayes – Introduction". Retrieved February 5, 2003, from the World Wide Web. http://www.resample.com/xlminer/help/NaiveBC/classiNB_intro.htm

O'Reilly and Associates. "Perl". Retrieved March 8, 2003, from the World Wide Web. http://www.perldoc.com/perl5.6/pod/perl.html

Quinlan, Ross. "RuleQuest Research Data Mining Tools". Retrieved March 18, 2003, from the World Wide Web. http://www.rulequest.com/

Tagbo, Kingsley. "Visual Basic Data Mining.Net". 2002. http://www.visual-basic-data-mining.net

United Nations Population Fund – Indonesia. Retrieved March 16, 2003, from the World Wide Web. http://www.un.or.id/unfpa/idpop.html
Wallach, Hannah. "Supervised Learning Methods". Retrieved March 14, 2003, from the World Wide Web. http://www.srcf.ucam.org/~hmw26/coursework/dme/node14.html