Clustering Tutorial for IBM Intelligent Miner

1. Select Start -> Programs -> DB2 Intelligent Miner -> Intelligent Miner.
2. Click Cancel in the window that opens.
3. In the next window, called the mining base container, right-click the Data folder
in the left panel and click Create data in the popup menu.
4. In the following window, select Flat files as your data format and name your
dataset “banking”. Click Next to continue.
5. In the next window, type the path and file name of your dataset file (banking.txt),
click Add file, select Read only as your Use mode, and then click Next.
6. Now specify each data field in the source dataset. For each data field, enter
the Begin and End positions, the Field name, and the Data type as shown in
the screenshot, and click Add when you finish specifying each field. The
resulting field list is shown in the screenshot. Click Next to continue.
7. Click Next again to skip defining computed fields, then click Finish in
the last wizard window.
8. Save your work in the mining base by selecting Mining Base -> Save Mining
Base As in the menu bar. Type your mining base name in the popup window
and click Save.
9. To perform clustering, right-click the Mining folder in the Mining base container and
select Create mining in the popup menu.
10. Click Next in the Welcome window. In the following window, select Clustering –
Demographic as your mining function, type your Settings name (Banking
Clustering Model), check Show the advanced pages and controls, and click Next
to continue.
11. In the following window, select banking as the input data, choose to optimize the
mining run for Disk space, and then click the button next to the filter records condition.
12. In the popup window that opens, use the following steps to enter the condition
product = 1:
a. Click the AND push button. The expression builder creates a template for
the expression, which displays as ((Arg1 = Arg2)).
b. In the Category list, click on Field Names. The Value list displays all the
available fields that you can include in this expression.
c. From the Value list, select the field product.
d. Click on the Arg1 button. This sets the field product as the first argument
in the expression.
e. Select Constants from the Category list.
f. Double-click on <new constant> in the Value list.
g. Type in the new constant value 1.
h. Press Enter. The new constant is added to the list of constants.
i. Select the constant 1 from the Value list.
j. Click the Arg2 button. This sets the constant value of 1 as the second
argument in the expression.
k. Click OK to return to the Input data page of the Mining wizard.
l. Click Next to continue.
13. In the following window, make sure you select the Clustering mode option and
check Use default for all four parameters. Click Next to continue.
14. In the following window, select age, income, siblings and type as Active fields,
and select gender as a Supplementary field, since bank policy does not allow
gender to be used in marketing decisions. Click Next to continue.
15. On the next Field parameters page of the wizard, click Next to continue.
16. On the Additional field parameters page of the wizard, click Next to continue.
17. On the Outlier treatment page of the wizard, click Next to continue.
18. On the Similarity matrix page of the wizard, click Next to continue.
19. In the output fields window, make sure the Do not create output option is
selected, and click Next to continue.
20. In the result window, check the option If a result with this name exists,
overwrite it. Click Next to continue.
21. Click Finish in the last summary window.
22. In the main window, expand the Mining folder and select Clustering in the
Mining base container, then select Banking Clustering Model in the upper-right
pane. Click the Run icon to start mining.
23. When mining finishes, Intelligent Miner opens a result window. It presents all the
clusters and the distribution of values of each active and supplementary field in
each cluster and in the whole data set. You can view detailed information by:
a. Double-clicking a statistical graph in the window.
b. Selecting a statistical graph or cluster, then choosing Selected -> Details for ->
Partition or Fields in the menu bar.
c. Right-clicking the part of the result window you are interested in, then selecting
the appropriate item in the popup menu.
24. Do not forget to save your work in the mining base.
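Steps 6 and 12 can be mirrored outside the tool as a rough sketch: each field is cut from a fixed-width line by its begin and end position, and the filter condition product = 1 keeps only the matching records. The field positions and sample lines below are hypothetical (the real layout of banking.txt is not given in this tutorial), and the sketch does not reproduce Intelligent Miner's clustering itself.

```python
# Hypothetical field layout: (name, begin, end), 1-based inclusive positions,
# mirroring the Begin/End columns entered in step 6. These positions are
# assumptions for illustration, not the real layout of banking.txt.
FIELDS = [
    ("age", 1, 3),
    ("income", 4, 10),
    ("product", 11, 11),
]


def parse_line(line, fields=FIELDS):
    """Slice one fixed-width record into a dict of stripped strings."""
    return {name: line[begin - 1:end].strip() for name, begin, end in fields}


def filter_records(lines, field, value):
    """Keep only records whose field equals value (as in 'product = 1')."""
    records = (parse_line(line) for line in lines if line.strip())
    return [r for r in records if r[field] == value]


# Invented sample lines, each 11 characters wide to match FIELDS above.
sample = [
    " 34  520001",
    " 51  310000",
    " 27  480001",
]

selected = filter_records(sample, "product", "1")
print(selected)
```

Values are kept as strings here because a flat file carries no type information; the data type chosen in step 6 is what tells the tool how to interpret each slice.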
Apply clustering model
25. To apply the clustering model, right-click the Mining folder in the Mining base container
and select Create mining in the popup menu.
26. Click Next in the Welcome window. In the next window, select Clustering –
Demographic as your mining function, type your Settings name (Banking Apply
Model), check Show the advanced pages and controls, and click Next to
continue.
27. In the following window, select banking as the input data, choose to optimize the
mining run for Disk space, and then click the button next to the filter records condition.
28. In the popup window, use the following steps to enter the condition
product <> 1:
a. Click the AND push button. The expression builder creates a template for
the expression, which displays as ((Arg1 = Arg2)).
b. In the Category list, click on Field Names. The Value list displays all the
available fields that you can include in this expression.
c. From the Value list, select the field product.
d. Click on the Arg1 button. This sets the field product as the first argument
in the expression.
e. Click the <> button. This sets the operator to “not equal to”.
f. Select Constants from the Category list.
g. Select the constant 1 from the Value list.
h. Click the Arg2 button. This sets the constant value of 1 as the second
argument in the expression.
i. Click OK to return to the Input data page of the Mining wizard.
j. Click Next to continue.
29. In the following window, make sure you select the Application mode option and
select the Banking Clustering Model. Click Next to continue.
30. In the next window, select age, income, siblings and type as Active fields, and
select gender as a Supplementary field, since bank policy does not allow
gender to be used in marketing decisions. Click Next to continue.
31. On the next Field parameters page of the wizard, click Next to continue.
32. On the Additional field parameters page of the wizard, click Next to continue.
33. On the Outlier treatment page of the wizard, click Next to continue.
34. On the Similarity matrix page of the wizard, click Next to continue.
35. In the output fields window, select the Create output data option. Select the
available fields to include as output fields. Type clusterID in the Cluster ID field name
entry field, type score in the Record score field name entry field, and type
conf in the Confidence field name entry field. Click Next to continue.
36. In the output data window, click the Create data button.
37. In the Welcome window, click Next to continue.
38. Select Flat files, type a settings name, such as Bankapp, and click Next.
39. On the Flat files page, change to the directory that contains the file banking.txt. In
the Path and file name entry field, append bankapp.txt to the path, then click
Add file. Select Read and Write as the Use mode and check The specified flat file
does not yet exist. Click Next to continue.
40. On the Summary page of the Data wizard, click Finish to continue.
41. When you return to the output data window, select Bankapp as the output data and click
Next.
42. Click Finish in the last summary window.
43. In the main window, expand the Mining folder and select Clustering in the
Mining base container, then select Banking Apply Model in the upper-right pane.
Click the Run icon to start.
44. After the run finishes, do not forget to save your work in the mining base.
45. Use Notepad or TextPad to open the output data file (bankapp.txt) in your
working directory.
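The two filter conditions used in this tutorial split the data into two disjoint sets: records with product = 1 build the model, and records with product <> 1 are scored by it. A minimal sketch of that split, plus a tally of how many scored records fall into each cluster, might look like the following. The records and the assigned cluster IDs are invented for illustration, and the records are assumed to be already parsed, because the column layout of bankapp.txt is not described here.

```python
from collections import Counter

# Hypothetical, already-parsed records; all values are invented.
records = [
    {"age": 34, "income": 52000, "product": 1},
    {"age": 51, "income": 31000, "product": 0},
    {"age": 27, "income": 48000, "product": 1},
    {"age": 45, "income": 39000, "product": 2},
]


def split_by_product(rows, value=1):
    """Return (build, apply) sets: product == value versus product != value."""
    build = [r for r in rows if r["product"] == value]
    apply_ = [r for r in rows if r["product"] != value]
    return build, apply_


def cluster_sizes(rows):
    """Count scored records per cluster ID."""
    return Counter(r["clusterID"] for r in rows)


build_set, apply_set = split_by_product(records)

# Stand-in for the scored output: the cluster IDs 2 and 0 are made up here.
# In the tutorial they come from the clusterID field written to bankapp.txt.
scored = [dict(r, clusterID=cid) for r, cid in zip(apply_set, [2, 0])]

sizes = cluster_sizes(scored)
print(len(build_set), len(apply_set), dict(sizes))
```

Because the two conditions are complementary, every record lands in exactly one of the two sets, so no record is both used to build the model and scored by it.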
See the official IBM tutorial for further information:
http://www-3.ibm.com/software/data/iminer/fordata/scenario1.pdf