UNIVERSITY OF GUJRAT, HAFIZ HAYAT
DEPARTMENT OF INFORMATION TECHNOLOGY
Project No. 1
Names: Mubashir Ismail, Ahmad Khan, Ehtisham Murtaza
Roll No: 19011556-020, 19011556-022, 19011556-030
Teacher: Dr. Samina Naz
Section: IT-19-A

K-nn Classification:
k-NN (k-nearest neighbors) is a supervised learning method used for classification and regression. We took two datasets from the UCI Machine Learning Repository, imported them into MATLAB, and applied the k-NN classification algorithm to them.

1. Haberman Dataset
This dataset comes from a study of the survival of patients who had undergone surgery for breast cancer. It contains 306 instances with the following attributes:
1) Age of the patient
2) Year of operation
3) Number of positive axillary nodes detected
4) Survival status (class attribute):
   -- Survived 5 years or longer
   -- Survived less than 5 years
The Haberman dataset therefore has 2 classes and 4 attributes in total. We applied the Fine KNN classifier to it, tried several values of K, and recorded the accuracy for each:

K     Accuracy
3     67.4%
7     72.3%
10    72.6%
35    74.6%

2. Caesarian Section Classification Dataset
This dataset was created by M. Zain Amin from a study of whether or not deliveries were performed by caesarean section. Its attributes are:
1) Age of the patient
2) Delivery number
3) Delivery time
4) Blood pressure
5) Heart problem
The dataset has 80 observations and 5 attributes. We obtained the following results:

K     Accuracy
3     52.5%
6     67.5%

K-Means Clustering:
K-Means clustering is an unsupervised learning algorithm: unlike supervised learning, it works on data that has no labels. K-Means divides the objects into clusters so that objects in the same cluster are similar to one another and dissimilar to the objects in other clusters.
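The partitioning described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the MATLAB code used in this project: it assumes Euclidean distance and initial centroids drawn at random from the data (the usual Lloyd-style procedure), and the function name `kmeans` and its parameters `max_iter` and `seed` are hypothetical names chosen for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style K-Means on a list of equal-length numeric tuples.

    Hypothetical helper for illustration. Returns (labels, centroids),
    where labels[i] is the cluster index assigned to points[i].
    """
    rng = random.Random(seed)
    # Initial centroids: k distinct data points chosen at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Convergence: stop when no point changed cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Update: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
    labels, centroids = kmeans(data, k=2)
    # The three points near (1, 1) share one label; the three near (8, 8) share the other.
    print(labels, centroids)
```

Because the result can depend on which points are picked as the starting centroids, the sketch takes a `seed` so that runs are repeatable.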
Working Mechanism
1. Choose the value of K.
2. Randomly select K data points as the initial cluster centroids.
3. Assign every data point to its nearest cluster centroid.
4. Move each centroid to the average (mean) of the points in its cluster.
5. Repeat steps 3 and 4 until the clusters no longer change.

K is the number of clusters, and it must be given to the algorithm in advance; for example, K = 2 means two clusters. The best (optimal) value of K for a given dataset can be estimated, for example with the Elbow method.

1. K-Means Clustering on the Iris Dataset
The Iris dataset has the following attributes: sepal length, sepal width, petal length, petal width, and the class, which takes one of three values:
-- Setosa
-- Versicolor
-- Virginica
The dataset contains 50 instances of each of the 3 classes.

The next step is to find the number of clusters, because the clusters can only be formed once K is known. We used the Elbow curve method to choose the number of clusters. From this analysis we obtained K = 2 and divided the dataset into 2 clusters. If we change the value to K = 3, we get the following results. This comparison shows that we formed the clusters carefully and chose the value of K deliberately.

5. Comparative Analysis of k-NN and K-Means
K-Means is an unsupervised clustering algorithm, while k-NN is a supervised classification algorithm. K-Means creates groups out of unlabeled data, whereas k-NN assigns data to existing classes learned from labeled data. k-NN has proven very useful for solving classification problems, but selecting K can be difficult and the method needs a large number of samples to be accurate. It requires no training phase: all the work is done during the testing phase. Traditional k-NN is simple, effective, and non-parametric, and it is widely used for classification, but it may not be effective for large-scale databases or for data with many categories.
Moreover, it uses all training samples for every classification and prediction, because each new sample is compared with every stored sample, which can become a problem for large-scale databases.
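To make the comparison concrete, here is a minimal k-NN classifier in plain Python. It is a sketch, not MATLAB's Fine KNN preset used in the experiments above: it assumes Euclidean distance and a simple majority vote, and the class name `KNNClassifier`, its `fit`/`predict` methods, and the `accuracy` helper are hypothetical names chosen for illustration. Note that `fit` only stores the data, while `predict_one` measures the distance to every training sample, which is why k-NN has no real training phase but becomes expensive on large databases.

```python
import math
from collections import Counter


class KNNClassifier:
    """Hypothetical minimal k-NN classifier (Euclidean distance, majority vote)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only stores the labelled samples; no model is built.
        self.X = [tuple(x) for x in X]
        self.y = list(y)
        return self

    def predict_one(self, query):
        # All the work happens at prediction time: the distance from the
        # query to every stored training sample is computed and sorted.
        distances = sorted(
            (math.dist(query, x), label) for x, label in zip(self.X, self.y)
        )
        nearest = [label for _, label in distances[: self.k]]
        # Majority vote among the k nearest labels; on a tie, the label
        # that appears first among the nearest neighbours wins.
        return Counter(nearest).most_common(1)[0][0]

    def predict(self, queries):
        return [self.predict_one(q) for q in queries]


def accuracy(model, X_test, y_test):
    """Fraction of test samples whose predicted label matches the true one."""
    predictions = model.predict(X_test)
    return sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)


if __name__ == "__main__":
    X_train = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
    y_train = ["A", "A", "A", "B", "B", "B"]
    X_test, y_test = [(1.2, 1.4), (8.7, 8.3)], ["A", "B"]
    # Try several values of K, as in the experiments above.
    for k in (1, 3, 5):
        model = KNNClassifier(k=k).fit(X_train, y_train)
        print(k, accuracy(model, X_test, y_test))
```

Sorting all distances keeps the sketch short; a real implementation on a large dataset would typically use a partial sort or a spatial index (such as a k-d tree) to avoid scanning every sample.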