Assignment 2 Weekly Logs – Back Up
Steven Graham B00444855

Table of Contents
Learning Log
Week 1 – 20th September – 24th September
  Lecture
  Tutorial
  Practical
    Exercise 2
    Exercise 3
Week 2 – 27th September – 1st October
  Lecture
  Tutorial
  Practical
Learning Log
Week 3 – 4th October – 8th October
  Lecture
  Tutorial
  Practical
    Telecare
    Telehealth
Learning Log
Week 4 – 11th October – 15th October
  Lecture
  Tutorial
  Practical
Learning Log
Week 5 – 18th October – 22nd October
  Lecture
  Tutorial
Learning Log
Week 6 – 25th October – 29th October
  Lecture
  Tutorial
  Practical
Week 7 – 1st November – 5th November
  Practical
Week 8 – 8th November – 12th November
  Practical
  Task 1
  Task 2
    Resubmit
Learning Log
Week 9 – 15th November – 19th November
  Practical
    Task 3.1
    Task 3.2
    Task 3.3
    Task 3.4
    Task 3.5
    Task 3.6
Learning Log
Week 10 – 22nd November – 26th November
  Practical
    Task 2.1
    Task 2.3
    Task 2.5
    Task 2.6
    Tasks 3.1
    Task 3.2

Learning Log
Week 1 – 20th September – 24th September

Lecture
This week’s lecture was an introduction to the module. It outlined the areas we will cover during the module, and I found that I was interested in the topics listed. I was also happy that I had chosen the Emerging healthcare module for the second semester. I also got a better understanding of what health informatics means and how it has evolved within the medical profession. I believe that this module is going to be very interesting and enjoyable.

Tutorial
This week’s tutorial was an introduction to the Matlab software. We were taken through some of the basic Matlab commands in preparation for the practical.

Practical
This week’s practical was an introduction to Matlab. Below are the answers to the practical questions.

Exercise 2
Q1.
>> a = [1 2 3 4 5]
a =
     1     2     3     4     5
Q2.
>> b = [6, 7 , 8 , 9]
b =
     6     7     8     9
Q3.
>> c = [1; 2; 3; 4; 5]
c =
     1
     2
     3
     4
     5
Q4.
>> d = [ 123 456 789]
d = 123 456 789
Q5.
>> e = d'
e = 123 456 789
Q6.
>> f = [1 2 3 4 5; 6 7 8 9 10];
Q7.
A ' after a matrix transposes it, swapping its rows and columns, so numbers that went down a number of rows now go along just one row.
Q8.
A ; at the end of a command stops the result being shown on screen.
Q9.
>> [12345]
ans =
       12345
Q10.
The above row vector was stored inside a variable called ans.
Q11.
>> u = [0:8]
u =
     0     1     2     3     4     5     6     7     8
Q12.
>> s = [0:2:100]
s =
  Columns 1 through 13
     0     2     4     6     8    10    12    14    16    18    20    22    24
  Columns 14 through 26
    26    28    30    32    34    36    38    40    42    44    46    48    50
  Columns 27 through 39
    52    54    56    58    60    62    64    66    68    70    72    74    76
  Columns 40 through 51
    78    80    82    84    86    88    90    92    94    96    98   100
Q13.
>> t = [2:2:6; 7 2 9]
t =
     2     4     6
     7     2     9
Q14.
>> t = t'
t =
     2     7
     4     2
     6     9
Q15.
>> v = t(1:3)
v =
     2     4     6
Q16.
>> v = t(1,2)
v =
     7
Q17.
>> v = t(2,1)
v =
     4
Q18.
>> a = [1 2 3 4 5; 6:10; 11:2:19]
a =
     1     2     3     4     5
     6     7     8     9    10
    11    13    15    17    19
Q19.
>> a(:,2)=[]
a =
     1     3     4     5
     6     8     9    10
    11    15    17    19
Q20.
>> a = [1 2 3 4 5; 6:10; 11:2:19]
a =
     1     2     3     4     5
     6     7     8     9    10
    11    13    15    17    19
Q21.
>> a(4,3)
??? Attempted to access a(4,3); index out of bounds because size(a)=[3,5].
Q22.
>> a
a =
     1     2     3     4     5
     6     7     8     9    10
    11    13    15    17    19
Q23.
>> a - 1
ans =
     0     1     2     3     4
     5     6     7     8     9
    10    12    14    16    18
Q24.
>> a = a - 1
a =
     0     1     2     3     4
     5     6     7     8     9
    10    12    14    16    18

Exercise 3
Q1.
>> a = a'
a =
     0     5    10
     1     6    12
     2     7    14
     3     8    16
     4     9    18
Q3.
>> a = [1 2 3 4];
>> a
a =
     1     2     3     4
a now contains the numbers 1 2 3 4.
Q4.
>> b = [ 5 6 7 8];
>> b
b =
     5     6     7     8
b now contains the numbers 5 6 7 8.
Q5.
>> c = a + b
c =
     6     8    10    12
c now contains variable a plus variable b. Each pair of corresponding numbers is added together, for example a(1) + b(1) = 1 + 5 = 6, which is stored in c(1).
Q6.
>> m = [1:2:9; 10:2:19];
>> m
m =
     1     3     5     7     9
    10    12    14    16    18
This creates a variable called m holding a matrix of two rows by five columns. The first row contains the numbers 1 to 9 in steps of two, and the second row contains the numbers from 10 in steps of two, stopping at 18 because the next step would go past 19.
Q7.
>> b = [2:2:10; 11:2:20]
b =
     2     4     6     8    10
    11    13    15    17    19
The above statement does the same as the previous question, with different start and end values.
Q8.
>> c = m-b
c =
    -1    -1    -1    -1    -1
    -1    -1    -1    -1    -1
This statement creates a variable called c and performs a calculation, subtracting each number in variable b from the corresponding number in variable m.
Q9.
>> a =[1 2 3; 4 5 6]
a =
     1     2     3
     4     5     6
Q10.
>> b = a'
b =
     1     4
     2     5
     3     6
This transposes variable a, turning its two rows into two columns, and stores the result in a new variable called b. Variable a itself is not changed.
Q11.
>> c = a*b
c =
    14    32
    32    77
This multiplies the two matrices. Each number in c comes from one row of a and one column of b. To work out 14 we use the first row of a and the first column of b:
Multiply a(1,1) by b(1,1): 1 x 1 = 1
Multiply a(1,2) by b(2,1): 2 x 2 = 4
Multiply a(1,3) by b(3,1): 3 x 3 = 9
Add the three answers together: 1 + 4 + 9 = 14
Below is the website link which I used to learn how matrices are multiplied:
http://www.intmath.com/Matrices-determinants/4_Multiplying-matrices.php
Accessed on 28/09/2010 at 21:03
Q12.
>> a(3,:) = []
??? Index of element to remove exceeds matrix dimensions.
Q13.
>> a = [a(1,:) ; a(2,:); [7 8 0]]
a =
     1     2     3
     4     5     6
     7     8     0
Q14.
>> z = []
z =
     []
This creates a variable called z but places no information inside it.
Q15.
>> for i = 1:10; z(i) = i*i; end
Q16.
z =
     1     4     9    16    25    36    49    64    81   100

Week 2 – 27th September – 1st October

Lecture
As this week’s lecture was not available, I used the time to read one of the articles we have access to, namely the Medical Informatics Past, Present and Future 2010 article. The article was a great read and gave a good understanding of how far health informatics has come over the last decade.
As it says in the paper, “We can hardly imagine diagnostic procedures without, for instance, diagnostic imaging tools such as computer tomography, or therapeutic actions without the software that checks for medication interactions or uses computer-assisted tools for surgery”. I personally would find it very strange to walk into my own doctor’s surgery and not check myself in automatically using the touch screen arrival system, or to see the doctor writing inside one of the old patient files rather than filling in boxes on a computer screen.

From reading the article you can see that health informatics is still a relatively new discipline. It is only over the last couple of years that it has really grown, and more and more medical tasks now involve some sort of computing. Health care is constantly changing and so is health informatics. People and groups are constantly trying to improve existing methods and create new ways of achieving something, be this new equipment to help monitor our elderly relatives in their homes or new ways of manipulating and creating images from scans to achieve better diagnoses.

Tutorial
This week’s tutorial was a further extension of the Matlab work. In the tutorial we were shown how Matlab could be used to:
1. Plot 2D graphs
2. Manipulate the lines on the graphs, e.g. colour, style and marker style
3. Produce figure windows, which could then be saved as a JPEG file
4. Create an executable file
5. Plot medical data
6. Create subplots
7. Create contour plots
8. Create surface plots
This tutorial has shown how powerful this piece of software is and how useful it could be within health informatics.

Practical
This week’s practical is to help me gain a better understanding of how to turn medical data into visual medical data, i.e. graphs.
In exercise one I was asked to create an executable file which would plot a graph. Below are the commands from the file and the figure that they produced.

In exercise two I was asked to plot a graph of EMG data; on this graph I also had to change the line colours for each finger. Below are a few screenshots taken from exercise two.

The above image shows the commands used to plot the EMG data graph, which is also shown in the image.

The above image shows the EMG commands and a subplot of averages, which is now included inside the same figure.

The above image was produced using the commands seen in the screenshot. The graph shows the plotting of ECG data.

Learning Log
Week 3 – 4th October – 8th October

Lecture
The first part of this week’s lecture showed us where to upload our assignments for the module. The main part of the lecture was based on patient informatics. This part of medical informatics is new and its aim is to empower patients. Two of the main facilities that would be desired are:
1. Email reminders of appointments
2. Scheduling your own appointments online

More and more people are now using the internet to research their medical conditions, and are also using this information to tell their doctors what they think is wrong (self-diagnosis). Although there are a number of good websites out there, there are also quite a few bad ones. One patient education website which we were shown during the lecture was WebMD. The main features of this website are:
Massive health library
New treatment information
Symptom checker

Another is Revolution Health, which appears to be more personalised, with its:
Personal health record
Health checker

Tutorial
This week’s tutorial was an overview of the second assignment.

Practical
This week’s practical was a research based task.
I had to research companies that claim to provide telecare and telehealth. Below are definitions of what telecare and telehealth are.

Telecare
Telecare is a term given to offering remote care of elderly and vulnerable people, providing the care and reassurance needed to allow them to remain living in their own homes.
Wikipedia - http://en.wikipedia.org/wiki/Telecare

Below are a few companies within Northern Ireland who claim to provide telecare:
Figure 1 - http://www.aidcall.co.uk/healthcare/
Figure 2 - http://www.mcelwainegroup.com/index.php?page=mcelwaine-smart
Figure 3 - http://www.foldgroup.co.uk/pages/27/telecare

Telehealth
Telehealth is the delivery of health-related services and information via telecommunications technologies.
Wikipedia - http://en.wikipedia.org/wiki/Telehealth

Below are a few companies within Northern Ireland who claim to provide telehealth:
Figure 4 - http://www.hometelehealthltd.co.uk/
Figure 5 - http://www.telehealthsolutions.co.uk/

The results from the research will be the basis of our discussion topic in the week 4 tutorial.

Learning Log
Week 4 – 11th October – 15th October

Lecture
This week’s lecture was based on technologies that are used to take measurements of our bodies. The topics discussed were:
X-ray
MRI
CT
Ultrasound
ECG
EEG
EMG
EOG

The above technologies have allowed health care professionals to examine a patient’s:
1. Nervous system
2. Cardiovascular system
3. Respiratory system
4. Skeletal system

Although x-rays are still safer than surgery, they have their problems, such as radiation sickness, and can lead to mutations such as cancers. A major drawback of the x-ray is that it can only produce a two-dimensional image; this was overcome by the invention of the CAT or CT scan, which allowed 2D x-rays to be processed into 3D images. The MRI scan then took this imaging to a new level of quality; it also reduced the risk and is said not to be as harmful.
The next types of measurement are bio signals, and these include:

Electrocardiogram (ECG)
The ECG is used to measure the electrical activity of the patient’s heart, and from it the heart rate, usually using a 12-lead ECG machine.

Electroencephalogram (EEG)
The EEG is used to measure the brain’s electrical activity. This is done by using between 16 and 25 electrodes on the patient’s scalp.

Electromyogram (EMG)
The EMG is used to measure muscle function and activity. This is achieved either by placing needle electrodes into the muscles or by placing gel electrodes on the skin. Placing the needle electrodes into the muscle gives a more specific measurement.

Electrooculogram (EOG)
The EOG is used to measure the resting potential of the retina. This is achieved by placing the electrodes above, below or to the side of the eye.

Another type of measurement is the acoustic measurement, which picks up vibrations from the heart and lungs and turns them into sounds. These measurements are taken using stethoscopes; in more recent years electronic stethoscopes have been invented, which have a sensor inside the chest piece.

Tutorial
I used this week’s tutorial time to do some research for my assignment, as I was unable to attend the tutorial.

Practical
This week’s practical was an introduction to audio processing using a piece of software called GoldWave. From using this product for the practical I found that the quality of the sound it produced after applying the filters was excellent. After completing the last task, where I removed the person talking from the breathing patterns, I thought that the software was very powerful; it didn’t seem to lose any quality or any of the breathing patterns. Below are a couple of screenshots taken from the practical.

Learning Log
Week 5 – 18th October – 22nd October

Lecture
This week’s lecture was an introduction to medical data and how it is processed, the PACS system, ePrescribing and the associated security issues.
Medical data is crucial to information processing and decision making; computers are used to process this information in three ways:
1. Observation
2. Diagnosis
3. Therapy

This medical data can be anything from ECG results to family history; it is usually something that can be observed. There are four different types of data:
1. Narrative data
2. Discrete numerical values
3. Analog data
4. Visual data

Picture Archiving and Communication Systems, or PACS, are computers, commonly servers, which allow medical professionals to:
View images, for example X-rays
Archive images
Communicate these images between different areas

PACS uses an independent standard for image storage: Digital Imaging and Communications in Medicine, or DICOM.

ePrescribing is the introduction of paperless prescriptions. The doctor simply fills in the prescription on screen and sends it directly to your pharmacy. The aim of ePrescribing is to reduce the number of errors that currently occur; for example, 1 in 20 hospital admissions within the UK are thought to be related to medication errors.

ePrescribing may be a good idea and may save lives, but thinking about it for my own local area, I have two pharmacies next to my doctor’s surgery that I could use, and I know people who will travel to a pharmacy nearer their home, for example in Lisburn, where there must be at least 20 pharmacies. So when the system is implemented all these pharmacies will have to be listed, and an error could occur where the doctor accidentally selects the wrong pharmacy. The patient won’t find out that their prescription has gone to the wrong pharmacy until they go to their usual one, and then how do they find out which pharmacy it has gone to? That is one problem that I envisage could happen, but I am sure some sort of preventative measure could be put in place.

Tutorial
This week’s tutorial is a reading week.
While writing the log I searched for some information on PACS and found a couple of websites which talk about PACS. Please find the links below:
NHS Connect - http://www.connectingforhealth.nhs.uk/systemsandservices/pacs
eHow - http://www.ehow.co.uk/about_6771301_job-description-pacs-administrator.html

Learning Log
Week 6 – 25th October – 29th October

Lecture
This week’s lecture was based on patient records; these are historical records of patient care. Previously patient records had been paper based, and this led to a number of problems, which included:
1. Illegible handwriting
2. Loss due to fire
3. Loss due to flood etc.
4. Loss due to human error

Paper records also take up a lot of room; if every person on earth had a paper patient record there wouldn’t be enough room to store them all. Below is an example of a patient records warehouse.

Now the paper-free patient records era has begun with the introduction of the EHR, or Electronic Health Record, which is a repository of electronically maintained information about an individual’s health. The electronic health records system has five functional components:
1. Integrated view of data
2. Clinical decision support
3. Clinical order entry
4. Access to knowledge resources
5. Integrated communications support

EHR systems have the potential to bring huge benefits to both patients and health professionals, and this is the reason why they are being implemented across the developed world. The EHR aims to provide easy navigation through the entire medical history of a patient.

There are a number of different uses for the EHR system. These include:
1. Inpatient
2. Outpatient
3. Primary care
4. Disease specific
5. Intensive care
6. Emergency department
7. Hospitals
8. Nursing homes
9. Research departments

The main disadvantages of this system are the:
Initial costs
Maintenance costs
Treatment of old paper based records
Security

Below is an example of an EHR system.

Tutorial
This week’s tutorial was an introduction to the English health service’s IT programme, called NPfIT or the National Programme for IT. It was announced in 2002 and was due to be completed within 7 to 8 years at a cost of £6 billion. The project has still not been completed and is well over budget.

The main components of the system were to be:
National record system
  o Electronic transfer of prescriptions
  o Choose and book
  o PACS
  o NHS care records service
IT infrastructure

The aim of the system was to provide:
  o Improved sharing of patient records
  o A way for patients and GPs to book hospital appointments
  o ePrescribing
  o A national network (N3)
  o NHS email services
  o PACS
  o Online personal health organiser
  o NHS care website for both patients and care providers
  o Common user interface in partnership with Microsoft. In researching the user interface I found the Microsoft website - http://www.mscui.net/

Practical
I will include this week’s practical in the logs for weeks 7 and 8.

Week 7 – 1st November – 5th November

Practical
Attributes are the variables.
Total number of instances – 150
Percentage of correctly classified – 96%
Percentage of incorrectly classified – 4%
Cross-validation is a method of estimating the performance of a predictive model.
A confusion matrix is a visualisation tool used in supervised learning.
Each row represents the instances from one class. In a perfect matrix every instance would be counted in its correct class.
In this confusion matrix:
Class a has 49 correct and one has been incorrectly placed in class b
Class b has 47 correct and 3 have been placed incorrectly in class c
Class c has 48 correct and 2 have been put in class b
Overall 6 have been wrongly classified

Week 8 – 8th November – 12th November

Practical
University of Massachusetts Amherst

Citation Policy:
If you publish material based on databases obtained from this repository, then, in your acknowledgements, please note the assistance you received by using this repository. This will help others to obtain the same data sets and replicate your experiments. We suggest the following pseudo-APA reference format for referring to this repository:

Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

Here is a BiBTeX citation as well:

@misc{Frank+Asuncion:2010,
  author = "A. Frank and A. Asuncion",
  year = "2010",
  title = "{UCI} Machine Learning Repository",
  url = "http://archive.ics.uci.edu/ml",
  institution = "University of California, Irvine, School of Information and Computer Sciences"
}

A few data sets have additional citation requests. These requests can be found on the bottom of each data set's web page.

To donate a data set to the repository you simply fill out the online form and attach the dataset.
http://archive.ics.uci.edu/ml/about.html
http://archive.ics.uci.edu/ml/citation_policy.html
http://archive.ics.uci.edu/ml/donation_policy.html
http://archive.ics.uci.edu/ml/donation_form.html

Task 1
Citation - B. Kaluza, V. Mirchevska, E. Dovgan, M. Lustrek, M. Gams, An Agent-based Approach to Care in Independent Living, International Joint Conference on Ambient Intelligence (AmI-10), Malaga, Spain, In press
Abstract: Data contains recordings of five people performing different activities.
Each person wore four sensors (tags) while performing the same scenario five times.
http://archive.ics.uci.edu/ml/datasets/Localization+Data+for+Person+Activity

Task 2
Data Set Characteristics: Multivariate
Number of Instances: 336
Attribute Characteristics: Real
Number of Attributes: 8
Associated Tasks: Classification
Missing Values? No

Instances – data points / records
Attributes – features / variables
Dataset – collection of data points / records
Associated tasks – the kind of machine learning task the data set is suited to, here classification
Missing values – whether any values are missing from the dataset; missing values can lead to incorrect results

Attribute list
1. Sequence Name: Accession number for the SWISS-PROT database
2. mcg: McGeoch's method for signal sequence recognition.
3. gvh: von Heijne's method for signal sequence recognition.
4. lip: von Heijne's Signal Peptidase II consensus sequence score. Binary attribute.
5. chg: Presence of charge on N-terminus of predicted lipoproteins. Binary attribute.
6. aac: score of discriminant analysis of the amino acid content of outer membrane and periplasmic proteins.
7. alm1: score of the ALOM membrane spanning region prediction program.
8. alm2: score of ALOM program after excluding putative cleavable signal regions from the sequence.
http://archive.ics.uci.edu/ml/datasets/Ecoli

Resubmit
Cross-validation is a technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, multiple rounds of cross-validation are performed using different partitions, and the validation results are averaged over the rounds.
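The rounds described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not what Weka does internally: the names k_fold_indices and cross_validate are made up for this sketch, the split is a plain shuffled partition (Weka reports a stratified one), and the evaluate callback stands in for training and testing whatever model is being assessed. The 336 instances match the Ecoli data set from Task 2.

```python
import random


def k_fold_indices(n_instances, k, seed=0):
    """Partition the indices 0..n_instances-1 into k folds of near-equal size."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(n_instances, k, evaluate, seed=0):
    """Run k rounds; each fold is the test set once. Return the mean score."""
    folds = k_fold_indices(n_instances, k, seed)
    scores = []
    for i, test in enumerate(folds):
        # The training set is every instance that is not in the test fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        scores.append(evaluate(train, test))
    return sum(scores) / k


# Hypothetical usage: 10 folds over 336 instances.
folds = k_fold_indices(336, 10)
fold_sizes = sorted(len(fold) for fold in folds)
print(fold_sizes)
```

With 336 instances and 10 folds, six folds hold 34 instances and four hold 33, so every instance is used for testing exactly once and for training in the other nine rounds.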
Cross-validation is used within data mining to fine tune or improve on the results.

A confusion matrix is a visualisation tool used for data sets. In Weka's output each row of the matrix shows the instances in an actual class and each column shows the instances in a predicted class. It can be used to make sure that a system is not confusing two classes. A perfect matrix would have numbers only on the diagonal.

Learning Log
Week 9 – 15th November – 19th November

Practical
Task 1
Task 2
Task 3.1
Supervised learning is where the machine infers a function from supervised training data. The training data consists of training examples. Each example is a pair consisting of an input object and an output value. The supervised learning algorithm analyses the training data and produces an inferred function or classifier.

Task 3.2
=== Run information ===

Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2
Relation: WDBC-weka.filters.unsupervised.attribute.ReorderR2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,1weka.filters.supervised.attribute.AddClassification-Wweka.classifiers.rules.ZeroRweka.filters.supervised.attribute.AddClassification-Wweka.classifiers.rules.ZeroRweka.filters.supervised.attribute.AddClassification-Wweka.classifiers.rules.ZeroRweka.filters.supervised.attribute.AddClassification-Wweka.classifiers.rules.ZeroRweka.filters.AllFilter
Instances: 569
Attributes: 31
    radius1
    texture1
    perimeter1
    area1
    smoothness1
    compactness1
    concavity1
    concave1
    symmetry1
    fractal_dimension1
    radius2
    texture2
    perimeter2
    area2
    smoothness2
    compactness2
    concavity2
    concave2
    symmetry2
    fractal_dimension2
    radius3
    texture3
    perimeter3
    area3
    smoothness3
    compactness3
    concavity3
    concave3
    symmetry3
    fractal_dimension3
    class
Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree
------------------
area3 <= 880.8
|   concave3 <= 0.1357
|   |   area2 <= 36.46: B (319.0/3.0)
|   |   area2 > 36.46
|   |   |   radius1 <= 14.97
|   |   |   |   texture2 <= 1.978: B (11.0)
|   |   |   |   texture2 > 1.978
|   |   |   |   |   texture2 <= 2.239: M (2.0)
|   |   |   |   |   texture2 > 2.239: B (3.0)
|   |   |   radius1 > 14.97: M (2.0)
|   concave3 > 0.1357
|   |   texture3 <= 27.37
|   |   |   concave3 <= 0.1789
|   |   |   |   area2 <= 21.91: B (12.0)
|   |   |   |   area2 > 21.91
|   |   |   |   |   perimeter2 <= 2.615: M (6.0/1.0)
|   |   |   |   |   perimeter2 > 2.615: B (6.0)
|   |   |   concave3 > 0.1789: M (4.0)
|   |   texture3 > 27.37: M (21.0)
area3 > 880.8
|   concavity1 <= 0.0716
|   |   texture1 <= 19.54: B (9.0/1.0)
|   |   texture1 > 19.54: M (10.0)
|   concavity1 > 0.0716: M (164.0)

Number of Leaves : 13
Size of the tree : 25

Time taken to build model: 0.06 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         530      93.1459 %
Incorrectly Classified Instances        39       6.8541 %
Kappa statistic                          0.8544
Mean absolute error                      0.0741
Root mean squared error                  0.2579
Relative absolute error                 15.8366 %
Root relative squared error             53.331  %
Total Number of Instances              569

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 0.925     0.064     0.895      0.925     0.91        0.927    M
                 0.936     0.075     0.954      0.936     0.945       0.927    B
Weighted Avg.    0.931     0.071     0.932      0.931     0.932       0.927

=== Confusion Matrix ===

   a   b   <-- classified as
 196  16 |   a = M
  23 334 |   b = B

Task 3.3
Sensitivity (weighted average TP rate) – 0.931
Specificity (1 minus the weighted average FP rate of 0.071) – 0.929

Task 3.4
Random Forest Classification
Correct Classification – 95.7821%
Incorrect Classification – 4.2179%
TP – 0.958
Kappa – 0.91

Decision Table Classification
Correct Classification – 94.0246%
Incorrect Classification – 5.9754%
TP – 0.94
Kappa – 0.871

JRIP Classification
Correct Classification – 92.7944%
Incorrect Classification – 7.2056%
TP – 0.928
Kappa – 0.846

Task 3.5
In terms of correct classification, random forest classification is the best with 95.7821%.
In terms of TP, random forest classification had the highest with 0.958.
In terms of kappa, random forest had the highest with 0.91.
Of the three classifications above, the method which provided the best results was Random Forest Classification.
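To check how the figures in Weka's summary relate to the J48 confusion matrix in Task 3.2, the calculation can be sketched in Python. This is a minimal sketch under my own assumptions: the function name binary_metrics is made up, and I treat M as the positive class, so the sensitivity and specificity it gives are for class M alone rather than Weka's weighted averages. The four counts are read straight from the confusion matrix (rows are the actual class, columns are the class each instance was classified as).

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity and Cohen's kappa for a 2x2 matrix."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # TP rate: positives correctly found
    specificity = tn / (tn + fp)   # TN rate: 1 minus the FP rate
    # Agreement expected by chance, from the row and column totals.
    chance_pos = ((tp + fn) / total) * ((tp + fp) / total)
    chance_neg = ((fp + tn) / total) * ((fn + tn) / total)
    expected = chance_pos + chance_neg
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Counts from the J48 confusion matrix, assuming M is the positive class:
# 196 M classified as M, 16 M classified as B,
# 23 B classified as M, 334 B classified as B.
metrics = binary_metrics(tp=196, fn=16, fp=23, tn=334)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The accuracy (0.9315) and kappa (0.8544) agree with the Weka summary, and the sensitivity of about 0.925 matches the TP rate Weka lists for class M; the specificity of about 0.936 is 1 minus the 0.064 FP rate for that class.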
Task 3.6
Results from increasing the fold

Fold       Correct Classification   Incorrect Classification   TP      Kappa
10 fold    95.7821%                 4.2179%                    0.958   0.91
20 fold    93.3216%                 6.6784%                    0.933   0.8577
30 fold    94.9033%                 5.0967%                    0.949   0.8905
40 fold    94.0246%                 5.9754%                    0.94    0.8719
50 fold    95.2548%                 4.7452%                    0.953   0.8982

Changing the fold from 10 up to 40 made the results worse; 10 fold provided the best classification. When I entered 50 fold the results appeared to start improving. To see whether a higher fold gives a better result I decided to enter a fold of 100. Below are the results.

Fold       Correct Classification   Incorrect Classification   TP      Kappa
100 fold   95.4306%                 4.5694%                    0.954   0.9011

As you can see, the results improved slightly compared with 50 fold, although 10 fold still gave the highest correct classification.

Learning Log
Week 10 – 22nd November – 26th November

Practical
Task 2.1
Unsupervised learning is a class of problems where you seek to determine how the data is organised. Many of the methods employed are based on data mining methods used to preprocess data. It is different from supervised learning because the learner is only given unlabelled examples.

Task 2.3
I expect to see two clusters from the dataset.

Task 2.5
Sensitivity = 0.08421
Specificity = 0.04761

Task 2.6
EM -1
Using EM-1 did not cluster the data correctly.
EM -2
Sensitivity = 0.5507
Specificity = 0.1383

Tasks 3.1
Data cleansing is the detection and correction or removal of corrupt or inaccurate records in a record set.

Task 3.2
Data cleansing algorithms can be found under the pre-process tab, by selecting filter.