Weekly Logs

Assignment 2
Weekly Logs – Back Up
Steven Graham
B00444855
Table of Contents
Learning Log
  Week 1 – 20th September – 24th September
    Lecture
    Tutorial
    Practical
      Exercise 2
      Exercise 3
  Week 2 – 27th September – 1st October
    Lecture
    Tutorial
    Practical
Learning Log
  Week 3 – 4th October - 8th October
    Lecture
    Tutorial
    Practical
      Telecare
      Telehealth
Learning Log
  Week 4 – 11th October - 15th October
    Lecture
    Tutorial
    Practical
Learning Log
  Week 5 – 18th October – 22nd October
    Lecture
    Tutorial
Learning Log
  Week 6 – 25th October - 29th October
    Lecture
    Tutorial
    Practical
  Week 7 – 1st November – 5th November
    Practical
  Week 8 – 8th November – 12th November
    Practical
    Task 1
    Task 2
      Resubmit
Learning Log
  Week 9 – 15th November – 19th November
    Practical
      Task 3.1
      Task 3.2
      Task 3.3
      Task 3.4
      Task 3.5
      Task 3.6
Learning Log
  Week 10 – 22nd November – 26th November
    Practical
      Task 2.1
      Task 2.3
      Task 2.5
      Task 2.6
      Tasks 3.1
      Task 3.2
Learning Log
Week 1 – 20th September – 24th September
Lecture
This week’s lecture was an introduction to the module. It covered the different areas that we will
be studying during the module, and I found that I was interested in the topics that
were being listed. I was also happy that I had chosen the Emerging Healthcare module for the
second semester. I also gained a better understanding of what health informatics means and how it has
evolved within the medical profession.
I believe that this module is going to be very interesting and enjoyable.
Tutorial
This week’s tutorial was an introduction to the Matlab software. We were taken through some of
the basic commands used in Matlab in preparation for the practical.
Practical
This week’s practical was an introduction to Matlab. Below are the answers to the practical
questions.
Exercise 2
Q1. >> a = [1 2 3 4 5]
a =
     1     2     3     4     5
Q2. >> b = [6, 7 , 8 , 9]
b =
     6     7     8     9
Q3. >> c = [1; 2; 3; 4; 5]
c=
1
2
3
4
5
Q4. >> d = [
123
456
789]
d=
123
456
789
Q5. >> e = d'
e=
123 456 789
Q6. >> f = [1 2 3 4 5; 6 7 8 9 10];
Q7. The ' operator transposes a matrix, swapping its rows and columns. Here it turned the
column vector d into a single row.
Q8. A ; at the end of a statement stops the result being shown on screen.
Q9. >> [12345]
ans =
12345
Q10. The above row vector was stored inside a variable called ans
Q11. >> u = [0:8]
u =
     0     1     2     3     4     5     6     7     8
Q12. >> s = [0:2:100]
s =
  Columns 1 through 13
     0     2     4     6     8    10    12    14    16    18    20    22    24
  Columns 14 through 26
    26    28    30    32    34    36    38    40    42    44    46    48    50
  Columns 27 through 39
    52    54    56    58    60    62    64    66    68    70    72    74    76
  Columns 40 through 51
    78    80    82    84    86    88    90    92    94    96    98   100
Q13. >> t = [2:2:6; 7 2 9]
t =
     2     4     6
     7     2     9
Q14. >> t = t'
t =
     2     7
     4     2
     6     9
Q15. >> v = t(1:3)
v =
     2     4     6
Q16. >> v = t(1,2)
v=
7
Q17. >> v = t (2,1)
v=
4
Q18. >> a = [1 2 3 4 5; 6:10; 11:2:19]
a =
     1     2     3     4     5
     6     7     8     9    10
    11    13    15    17    19
Q19. >> a(:,2)=[]
a =
     1     3     4     5
     6     8     9    10
    11    15    17    19
Q20. >> a = [1 2 3 4 5; 6:10; 11:2:19]
a =
     1     2     3     4     5
     6     7     8     9    10
    11    13    15    17    19
Q21. >> a(4,3)
??? Attempted to access a(4,3); index out of bounds because size(a)=[3,5].
Q22. >> a
a =
     1     2     3     4     5
     6     7     8     9    10
    11    13    15    17    19
Q23. >> a - 1
ans =
     0     1     2     3     4
     5     6     7     8     9
    10    12    14    16    18
Q24. >> a = a - 1
a =
     0     1     2     3     4
     5     6     7     8     9
    10    12    14    16    18
Exercise 3
Q1. >> a = a'
a =
     0     5    10
     1     6    12
     2     7    14
     3     8    16
     4     9    18
Q3. >> a = [1 2 3 4];
>> a
a =
     1     2     3     4
a now contains the numbers 1 2 3 4
Q4. >> b = [ 5 6 7 8];
>> b
b =
     5     6     7     8
b now contains the numbers 5 6 7 8
Q5. >> c = a + b
c =
     6     8    10    12
c now contains variable a plus variable b. Each number in a is added to the number in the
same position in b; for example, 1 from a plus 5 from b gives 6 in c.
Q6. >> m = [1:2:9; 10:2:19];
>> m
m =
     1     3     5     7     9
    10    12    14    16    18
This creates a variable called m holding a matrix of two rows by five columns. The first row
contains the numbers 1 to 9 in steps of two. The second row starts at 10 and goes up in steps
of two, stopping at 18 because the next value would be past 19.
Q7. >> b = [2:2:10; 11:2:20]
b =
     2     4     6     8    10
    11    13    15    17    19
The above statement does the same as the previous question, this time storing the matrix in b.
Q8. >> c = m-b
c =
    -1    -1    -1    -1    -1
    -1    -1    -1    -1    -1
This statement creates a variable called c by subtracting each number in variable b from the
number in the same position in variable m.
Q9. >> a =[1 2 3; 4 5 6]
a =
     1     2     3
     4     5     6
Q10. >> b = a'
b =
     1     4
     2     5
     3     6
This transposes variable a, placing its data into two columns, and stores the result in a
new variable called b.
Q11. >> c = a*b
c =
    14    32
    32    77
This is multiplying the two matrices. To work out the 14, we take the first row of a and the
first column of b and do the following
Multiply the 1 in a by the 1 in b, which equals 1
We then
Multiply the 2 in a by the 2 in b, which equals 4
We then
Multiply the 3 in a by the 3 in b, which equals 9
We then add the three answers together
1 + 4 + 9 this equals 14
Below is the website link which I used to learn how matrices are multiplied
http://www.intmath.com/Matrices-determinants/4_Multiplying-matrices.php
Accessed on 28/09/2010 at 21:03
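The row-by-column rule described above can be checked with a short sketch. This is written in Python rather than Matlab, purely as an illustration, and the function name mat_mul is my own and not part of the practical.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # Each entry is the sum of products of one row of a with one column of b.
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

a = [[1, 2, 3], [4, 5, 6]]
b = [[1, 4], [2, 5], [3, 6]]   # the transpose of a, as in Q10
c = mat_mul(a, b)
print(c)
```

The top-left entry of c is 1*1 + 2*2 + 3*3, which is the same working as in the answer to Q11.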
Q12. >> a(3,:) = []
??? Index of element to remove exceeds matrix dimensions.
Q13. >> a = [a(1,:) ; a(2,:); [7 8 0]]
a =
     1     2     3
     4     5     6
     7     8     0
Q14. >> z = []
z=
[]
This creates a variable called z but places no information inside it.
Q15. >> for i = 1:10;
z(i) = i*i;
end
Q16. z =
     1     4     9    16    25    36    49    64    81   100
Week 2 – 27th September – 1st October
Lecture
As this week’s lecture was not available, I used the time to read through one of the articles to which we
have access, namely the Medical Informatics Past, Present and Future 2010 article. The article was
a great read and gave a good understanding of how far health informatics has come over the last
decade. As it says in the paper, “We can hardly imagine diagnostic procedures without, for instance,
diagnostic imaging tools such as computer tomography, or therapeutic actions without the software
that checks for medication interactions or uses computer-assisted tools for surgery”. I personally
would find it very strange to walk into my own doctor’s surgery and not check myself in automatically
using the touch-screen arrival system, or to see the doctor writing in one of the old patient files rather
than filling in boxes on a computer screen.
From reading the article you can see that health informatics is still a relatively new discipline, and it is
only in the last couple of years that it has really grown, with more and more medical tasks
turning to some form of computing.
Health care is constantly changing and so is health informatics. People and groups are constantly
trying to improve on and create new ways of achieving something, be this new equipment to help
monitor our elderly relatives in their homes or new ways of manipulating and creating images from
scans to achieve better diagnoses.
Tutorial
This week’s tutorial was a further extension of the Matlab material. In the tutorial we were
shown how Matlab could be used to:
1. Plot 2D graphs
2. Manipulate the lines on the graphs, e.g. colour, style and marker style
3. Produce figure windows, which can then be saved as a JPEG file
4. Create an executable file
5. Plot medical data
6. Create subplots
7. Create contour plots
8. Create surface plots
This tutorial has shown how powerful this piece of software is and how useful it could be within
health informatics.
Practical
This week’s practical was to help me gain a better understanding of how to turn
medical data into visual medical data, i.e. graphs. In exercise one I was asked to create an
executable file which would plot a graph; below are the commands from the file and the
figure that they produced.
In exercise two I was asked to plot a graph of EMG data; on this graph I also had to change
the line colour for each finger. Below are a few screenshots taken from exercise two.
The above image shows the commands used to plot the EMG data graph, which is also shown in the image.
The above image shows the EMG commands and a subplot of averages, which is now
included inside the same figure as seen above.
The above image was produced using the commands seen in the screenshot. The graph
shows a plot of ECG data.
Learning Log
Week 3 –4th October - 8th October
Lecture
The first part of this week’s lecture showed us where to upload our assignments for the
module.
The main part of this week’s lecture was based on patient informatics. This part of medical
informatics is new and its aim is to empower patients. Two of the main facilities that would
be desired are:
1. Email reminders of appointments
2. Scheduling your own appointments online
More and more people are now using the internet to research their medical conditions
and are also using this information to tell their doctors what they think is wrong (self-diagnosis).
Although there are a number of good websites out there, there are also quite a few bad ones.
One patient education website which we were shown during the lecture was the WebMD website.
The main features of this website are:
• Massive health library
• New treatment information
• Symptom checker
Another is Revolution Health, which appears to be more personalised, with its:
• Personal health record
• Health checker
Tutorial
This week’s tutorial was an overview of the second assignment.
Practical
This week’s practical was a research-based task. I had to research companies that claim to
provide telecare and telehealth. Below are definitions of telecare and telehealth:
Telecare
Telecare is a term given to offering remote care of elderly and vulnerable people, providing
the care and reassurance needed to allow them to remain living in their own homes.
Wikipedia - http://en.wikipedia.org/wiki/Telecare
Below are a few companies within Northern Ireland who claim to provide telecare
Figure 1- http://www.aidcall.co.uk/healthcare/
Figure 2 - http://www.mcelwainegroup.com/index.php?page=mcelwaine-smart
Figure 3 - http://www.foldgroup.co.uk/pages/27/telecare
Telehealth
Telehealth is the delivery of health-related services and information via telecommunications
technologies.
Wikipedia - http://en.wikipedia.org/wiki/Telehealth
Below are a few companies within Northern Ireland who claim to provide telehealth
Figure 4 - http://www.hometelehealthltd.co.uk/
Figure 5 - http://www.telehealthsolutions.co.uk/
The results of this research will form the basis of our discussion topic in the week 4 tutorial.
Learning Log
Week 4 – 11th October - 15th October
Lecture
This week’s lecture was based on technologies that are used to take measurements of our bodies. The
topics discussed were:
• X-ray
• MRI
• CT
• Ultrasound
• ECG
• EEG
• EMG
• EOG
The above technologies have allowed health care professionals to examine patients’:
1. Nervous System
2. Cardiovascular System
3. Respiratory System
4. Skeletal System
Although X-rays are still safer than surgery, they have their problems: the radiation can cause radiation sickness
and can lead to mutations such as cancers. A major drawback of the X-ray is that it can only produce a flat,
two-dimensional image; this was overcome by the invention of the CAT or CT scan, which allows
2D X-rays to be processed into 3D images.
The MRI scan then took this imaging to a new level of quality; it also reduces the risk and
is said not to be as harmful.
The next types of measurement are biosignals, and these include:
• Electrocardiogram (ECG)
The ECG is used to measure the patient’s heart rate, usually using a 12-lead ECG
machine.
• Electroencephalogram (EEG)
The EEG is used to measure the brain’s electrical activity. This is done by using
between 16 and 25 electrodes on the patient’s scalp.
• Electromyogram (EMG)
The EMG is used to measure muscle function and activity. This is achieved by either
placing electrodes into the muscles or by placing gel electrodes on the skin.
Placing the electrode needles into the muscle gives a more specific
measurement.
• Electrooculogram (EOG)
The EOG is used to measure the resting potential of the retina. This is achieved by
placing the electrodes above, below or to the side of the eye.
Another type of measurement is the acoustic measurement, which picks up vibrations from the
heart and lungs and turns them into sounds. These measurements are taken by using stethoscopes
and in more recent years electronic stethoscopes have been invented, where there is now a sensor
inside the chest piece of the stethoscope.
Tutorial
I used this week’s tutorial time to do some research for my assignment, as I was unable to attend the
tutorial this week.
Practical
This week’s practical was an introduction to audio processing using a piece of software called
Goldwave. From using this product in the practical I found that the quality of the sound it
produced after applying the filters was excellent. After completing the last task, where I
removed the person talking from the breathing patterns, I thought that the software was very
powerful; it didn’t seem to lose any quality or any of the breathing patterns. Below are a couple
of screenshots taken from the practical.
Learning Log
Week 5 – 18th October – 22nd October
Lecture
This week’s lecture was an introduction to medical data and how it is processed, PACS, and
ePrescribing and the associated security issues.
Medical data is crucial to information processing and decision making; computers are used to
process this information in three ways:
1. Observation
2. Diagnosis
3. Therapy
This medical data can be anything from ECG results to family history; it is usually something that can be
observed. There are four different types of data:
1. Narrative data
2. Discrete numerical values
3. Analog data
4. Visual data
Picture Archiving and Communication Systems, or PACS, are computers, commonly servers, which allow
medical professionals to:
• View images, for example X-rays
• Archive images
• Communicate these images between different areas
PACS uses an independent standard for image storage: Digital Imaging and
Communications in Medicine, or DICOM.
ePrescribing is the introduction of paperless prescriptions. The doctor simply fills in the
prescription on screen and sends it directly to your pharmacy. The aim of ePrescribing is to reduce
the number of errors that currently occur; for example, 1 in 20 hospital admissions within the UK are
thought to be related to medication errors.
ePrescribing may be a good idea and may save lives, but thinking about it for my own local area, I
have two pharmacies next to my doctor’s surgery that I could use, and I know people who will travel to a
pharmacy nearer their home, for example in Lisburn, where there must be at least 20 pharmacies. So
when the system is implemented all these pharmacies will have to be listed, and an
error could occur where the doctor accidentally selects the wrong pharmacy. The patient won’t find
out that their prescription has gone to the wrong pharmacy until they go to their usual one, and then how
do they find out which pharmacy it has gone to?
That’s one problem that I envisage could happen, but I am sure some sort of measure
could be put in place to prevent this.
Tutorial
This week’s tutorial was a reading week. While writing the log I searched for some information
on PACS and found a couple of websites which talk about PACS; please find the links below.
NHS Connect - http://www.connectingforhealth.nhs.uk/systemsandservices/pacs
eHow - http://www.ehow.co.uk/about_6771301_job-description-pacs-administrator.html
Learning Log
Week 6 – 25th October - 29th October
Lecture
This week’s lecture was based on patient records; these are historical records of patient care.
Previously patient records were paper based, and this led to a number of problems, which
included:
1. Illegible handwriting
2. Loss due to fire
3. Loss due to flood etc.
4. Loss due to human error
Paper records also take up a lot of room; if every person on earth had a paper patient record there
wouldn’t be enough room to store them all. Below is an example of a patient records warehouse.
Now the paper-free patient records era has begun with the introduction of the EHR, or Electronic Health
Record, which is a repository of electronically maintained information about an individual’s
health.
The electronic health records system has five functional components:
1. Integrated view of data
2. Clinical decision support
3. Clinical order entry
4. Access to knowledge resources
5. Integrated communications support
EHR systems have the potential to bring huge benefits to both patients and health professionals, and
this is the reason why they are being implemented across the developed world. The EHR aims to
provide easy navigation through the entire medical history of a patient.
There are a number of different uses for the EHR system. These include:
1. Inpatient
2. Outpatient
3. Primary care
4. Disease specific
5. Intensive care
6. Emergency department
7. Hospitals
8. Nursing homes
9. Research departments
The main disadvantages of this system are:
• Initial costs
• Maintenance costs
• Treatment of old paper-based records
• Security
Below is an example of an EHR system.
Tutorial
This week’s tutorial was an introduction to the English health service’s IT programme, called NPfIT or the National
Programme for IT. It was announced in 2002 and was due to be completed within 7 to 8 years at a
cost of £6 billion. The project has still not been completed and is well over budget. The main
components of the system were to be:
• National record system
  o Electronic transfer of prescriptions
  o Choose and book
  o PACS
  o NHS care records service
• IT infrastructure
The aims of the system were to provide:
o Improved sharing of patient records
o The ability for patients and GPs to book hospital appointments
o ePrescribing
o A national network (N3)
o NHS email services
o PACS
o An online personal health organiser
o An NHS care website for both patients and care providers
o A common user interface in partnership with Microsoft. In researching the user interface I
found the Microsoft website - http://www.mscui.net/
Practical
I will include this week’s practical in the logs for weeks 7 and 8.
Week 7 – 1st November – 5th November
Practical

• Attributes are the variables
• Total number of instances – 150
• Percentage of correctly classified instances – 96%
• Percentage of incorrectly classified instances – 4%
• Cross-validation is a method of estimating the performance of a predictive model
• A confusion matrix is a visualisation tool used in supervised learning. Each row represents the
instances in an actual class
• Every instance would contain the correct number
• In the confusion matrix, class a has 49 correct and one has been incorrectly placed in class b
• Class b has 47 correct and 3 have been placed incorrectly in class c
• Class c has 48 correct and 2 have been placed incorrectly in class b
• Overall 6 have been wrongly classified
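The figures above can be cross-checked from the confusion matrix itself. The sketch below is Python rather than Weka output and is only an illustration: the matrix is rebuilt from the counts listed above, and I have assumed that every cell not mentioned is zero.

```python
# Rows are the actual classes a, b, c; columns are the classes they were placed in.
confusion = [
    [49, 1, 0],
    [0, 47, 3],
    [0, 2, 48],
]

def accuracy(matrix):
    """Fraction of instances on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(confusion)))
print(total, correct, total - correct, accuracy(confusion))
```

The diagonal holds the correctly classified instances, so the off-diagonal cells add up to the number wrongly classified.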
Week 8 – 8th November – 12th November
Practical

• University of Massachusetts Amherst
Citation Policy:
If you publish material based on databases obtained from this repository, then, in your
acknowledgements, please note the assistance you received by using this repository. This will help
others to obtain the same data sets and replicate your experiments. We suggest the following
pseudo-APA reference format for referring to this repository:
Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml].
Irvine, CA: University of California, School of Information and Computer Science.
Here is a BiBTeX citation as well:
@misc{Frank+Asuncion:2010 ,
author = "A. Frank and A. Asuncion",
year = "2010",
title = "{UCI} Machine Learning Repository",
url = "http://archive.ics.uci.edu/ml",
institution = "University of California, Irvine, School of Information and Computer Sciences" }
A few data sets have additional citation requests. These requests can be found on the bottom of
each data set's web page.

• To donate a data set to the repository you simply fill out the online form and attach the
dataset
http://archive.ics.uci.edu/ml/about.html
http://archive.ics.uci.edu/ml/citation_policy.html
http://archive.ics.uci.edu/ml/donation_policy.html
http://archive.ics.uci.edu/ml/donation_form.html
Task 1
Citation - B. Kaluza, V. Mirchevska, E. Dovgan, M. Lustrek, M. Gams, An Agent-based Approach to
Care in Independent Living, International Joint Conference on Ambient Intelligence (AmI-10),
Malaga, Spain, In press
Abstract: Data contains recordings of five people performing different activities. Each person wore
four sensors (tags) while performing the same scenario five times.
http://archive.ics.uci.edu/ml/datasets/Localization+Data+for+Person+Activity
Task 2
Data Set Characteristics: Multivariate
Number of Instances: 336
Attribute Characteristics: Real
Number of Attributes: 8
Associated Tasks: Classification
Missing Values? No
Instances – data points / records
Attributes – features / variables
Dataset – collection of data points / records
Associated tasks – the kind of task the dataset is intended for, in this case classification
Missing values – whether any values are missing from the dataset; missing values can lead to incorrect results
Attribute list
1. Sequence Name: Accession number for the SWISS-PROT database
2. mcg: McGeoch's method for signal sequence recognition.
3. gvh: von Heijne's method for signal sequence recognition.
4. lip: von Heijne's Signal Peptidase II consensus sequence score. Binary attribute.
5. chg: Presence of charge on N-terminus of predicted lipoproteins. Binary attribute.
6. aac: score of discriminant analysis of the amino acid content of outer membrane and periplasmic
proteins.
7. alm1: score of the ALOM membrane spanning region prediction program.
8. alm2: score of ALOM program after excluding putative cleavable signal regions from the
sequence.
http://archive.ics.uci.edu/ml/datasets/Ecoli
Resubmit
Cross-validation is a technique for assessing how the results of a statistical analysis will generalize to
an independent data set. It is mainly used in settings where the goal is prediction, and one wants to
estimate how accurately a predictive model will perform in practice. One round of cross-validation
involves partitioning a sample of data into complementary subsets, performing the analysis on one
subset (called the training set), and validating the analysis on the other subset (called the validation
set or testing set). To reduce variability, multiple rounds of cross-validation are performed using
different partitions, and the validation results are averaged over the rounds.
Cross-validation is used within data mining to estimate how well a model will perform on unseen
data, which helps when fine-tuning it.
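The rounds described above can be sketched in a few lines. This is a minimal Python illustration and not how Weka implements it: the function names are my own, the data is made up, the "model" simply predicts the mean of the training set, and the folds are not shuffled or stratified. It also assumes the number of folds is no larger than the number of samples.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k rounds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

def cross_validate(values, k):
    """Average validation error of a 'predict the training mean' model."""
    errors = []
    for train, validation in k_fold_splits(len(values), k):
        prediction = sum(values[i] for i in train) / len(train)
        fold_error = sum(abs(values[i] - prediction) for i in validation) / len(validation)
        errors.append(fold_error)
    # The validation results are averaged over the rounds.
    return sum(errors) / len(errors)

data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]   # made-up values
print(cross_validate(data, k=3))
```

Every sample is used for validation exactly once and for training in all the other rounds.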
A confusion matrix is a visualisation tool used to assess how a classifier performs on a data set. In Weka’s
output, each row of the matrix shows the instances in an actual class and each column shows the instances
in a predicted class. It can be used to check whether the system is confusing two classes.
A perfect confusion matrix would have all of its counts on the diagonal.
Learning Log
Week 9 – 15th November – 19th November
Practical
Task 1
Task 2
Task 3.1
Supervised learning is where the machine infers a function from labelled training data. The
training data consists of training examples. Each example is a pair consisting of an input object
and an output value. The supervised algorithm analyses the training data and produces an
inferred function or classifier.
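As a minimal sketch of this idea, the Python below builds a classifier from input and output pairs. It uses a simple nearest-neighbour rule rather than the J48 decision tree used in the practical, and the training pairs and function name are made up for illustration.

```python
def train_nearest_neighbour(examples):
    """Return a classifier built from (input, label) training pairs."""
    def classify(x):
        # Predict the label of the closest training input.
        nearest_input, nearest_label = min(examples, key=lambda pair: abs(pair[0] - x))
        return nearest_label
    return classify

# Made-up training pairs: (measurement, class label).
training_data = [(1.0, "B"), (1.5, "B"), (4.0, "M"), (4.5, "M")]
classifier = train_nearest_neighbour(training_data)
print(classifier(1.2), classifier(4.2))
```

The returned function plays the role of the inferred function: it maps a new input to a predicted output.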
Task 3.2
=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2
Relation: WDBC-weka.filters.unsupervised.attribute.ReorderR2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,1weka.filters.supervised.attribute.AddClassification-Wweka.classifiers.rules.ZeroRweka.filters.supervised.attribute.AddClassification-Wweka.classifiers.rules.ZeroRweka.filters.supervised.attribute.AddClassification-Wweka.classifiers.rules.ZeroRweka.filters.supervised.attribute.AddClassification-Wweka.classifiers.rules.ZeroRweka.filters.AllFilter
Instances: 569
Attributes: 31
radius1
texture1
perimeter1
area1
smoothness1
compactness1
concavity1
concave1
symmetry1
fractal_dimension1
radius2
texture2
perimeter2
area2
smoothness2
compactness2
concavity2
concave2
symmetry2
fractal_dimension2
radius3
texture3
perimeter3
area3
smoothness3
compactness3
concavity3
concave3
symmetry3
fractal_dimension3
class
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
J48 pruned tree
------------------
area3 <= 880.8
| concave3 <= 0.1357
| | area2 <= 36.46: B (319.0/3.0)
| | area2 > 36.46
| | | radius1 <= 14.97
| | | | texture2 <= 1.978: B (11.0)
| | | | texture2 > 1.978
| | | | | texture2 <= 2.239: M (2.0)
| | | | | texture2 > 2.239: B (3.0)
| | | radius1 > 14.97: M (2.0)
| concave3 > 0.1357
| | texture3 <= 27.37
| | | concave3 <= 0.1789
| | | | area2 <= 21.91: B (12.0)
| | | | area2 > 21.91
| | | | | perimeter2 <= 2.615: M (6.0/1.0)
| | | | | perimeter2 > 2.615: B (6.0)
| | | concave3 > 0.1789: M (4.0)
| | texture3 > 27.37: M (21.0)
area3 > 880.8
| concavity1 <= 0.0716
| | texture1 <= 19.54: B (9.0/1.0)
| | texture1 > 19.54: M (10.0)
| concavity1 > 0.0716: M (164.0)
Number of Leaves : 13
Size of the tree : 25
Time taken to build model: 0.06 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances         530               93.1459 %
Incorrectly Classified Instances        39                6.8541 %
Kappa statistic                          0.8544
Mean absolute error                      0.0741
Root mean squared error                  0.2579
Relative absolute error                 15.8366 %
Root relative squared error             53.331 %
Total Number of Instances              569
=== Detailed Accuracy By Class ===
               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
               0.925     0.064     0.895       0.925    0.91        0.927      M
               0.936     0.075     0.954       0.936    0.945       0.927      B
Weighted Avg.  0.931     0.071     0.932       0.931    0.932       0.927
=== Confusion Matrix ===
   a   b   <-- classified as
 196  16 |   a = M
  23 334 |   b = B
Task 3.3
Sensitivity – 0.931 (weighted average TP rate)
Specificity – 0.929 (1 minus the weighted average FP rate of 0.071)
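Sensitivity and specificity can also be worked out for a single class directly from the confusion matrix in Task 3.2. The Python sketch below is an illustration only: it treats M as the positive class, which is my assumption, and the helper name is my own. Specificity is the true negative rate, which is one minus the false positive rate.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from the four confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate, i.e. 1 - false positive rate
    return sensitivity, specificity

# Counts from the J48 confusion matrix, treating M as positive:
# 196 M classified as M, 16 M classified as B, 23 B classified as M, 334 B classified as B.
sens, spec = sensitivity_specificity(tp=196, fn=16, fp=23, tn=334)
print(round(sens, 3), round(spec, 3))
```

These per-class values line up with the TP rates that Weka reports for M and B, so they differ slightly from the weighted averages.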
Task 3.4
Random Forest Classification –
Correct Classification – 95.7821%
Incorrect Classification – 4.2179%
Tp – 0.958
Kappa – 0.91
Decision Table Classification –
Correct Classification – 94.0246%
Incorrect Classification – 5.9754%
Tp – 0.94
Kappa – 0.871
JRIP Classification –
Correct Classification – 92.7944%
Incorrect Classification – 7.2056%
Tp – 0.928
Kappa – 0.846
Task 3.5
In terms of correct classification, random forest classification is the best with 95.7821%.
In terms of TP, random forest classification had the highest with 0.958.
In terms of kappa, random forest had the highest with 0.91.
Of the three classifications above, the method which provided the best results was Random Forest
Classification.
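The kappa statistic used in this comparison measures agreement beyond what would be expected by chance, and it can be computed from a confusion matrix. The Python sketch below is an illustration with a function name of my own; it uses the J48 confusion matrix from Task 3.2, because the matrices for the other three classifiers are not recorded in this log.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows actual, columns predicted)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total
    # Agreement expected by chance, from the row and column totals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)

j48_matrix = [[196, 16], [23, 334]]   # J48 confusion matrix from Task 3.2
kappa = cohen_kappa(j48_matrix)
print(round(kappa, 4))
```

A kappa of 1 would mean perfect agreement, and 0 would mean no better than chance.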
Task 3.6
Results from increasing the number of folds
Folds      Correct Classification   Incorrect Classification   TP      Kappa
10 fold    95.7821%                 4.2179%                    0.958   0.91
20 fold    93.3216%                 6.6784%                    0.933   0.8577
30 fold    94.9033%                 5.0967%                    0.949   0.8905
40 fold    94.0246%                 5.9754%                    0.94    0.8719
50 fold    95.2548%                 4.7452%                    0.953   0.8982
Changing the fold from 10 up to 40 made the results worse; 10 fold provided the best
classification. When I entered 50 fold the results appeared to start improving. To see whether a higher
fold gives a better result, I decided to enter a fold of 100. Below are the results.
Folds      Correct Classification   Incorrect Classification   TP      Kappa
100 fold   95.4306%                 4.5694%                    0.954   0.9011
As you can see, the results slightly improved on the 50 fold results.
Learning Log
Week 10 – 22nd November – 26th November
Practical
Task 2.1
Unsupervised learning is a class of problems where you seek to determine how the data is organised.
Many of the methods employed are based on data mining methods used to pre-process data. It is
different from supervised learning in that the learner is only given unlabelled examples.
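As a rough sketch of learning from unlabelled examples, the Python below splits some made-up values into two clusters. It uses a basic k-means loop, which is a simpler technique than the EM clusterer used in the practical, and the function name and data are my own.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two clusters with a basic k-means loop."""
    centres = [min(values), max(values)]   # simple deterministic starting points
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to its nearest centre; no labels are used.
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return clusters, centres

values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]   # made-up, unlabelled measurements
clusters, centres = two_means(values)
print(clusters)
```

The algorithm is never told which group a value belongs to; the grouping comes only from how the values are organised.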
Task 2.3
I expect to see two clusters from the dataset
Task 2.5
Sensitivity = 0.08421
Specificity = 0.04761
Task 2.6
EM -1
Using EM-1 did not cluster the data correctly.
EM -2
Sensitivity = 0.5507
Specificity = 0.1383
Tasks 3.1
Data cleansing is the detection and correction or removal of corrupt or inaccurate records
from a record set.
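The two parts of that definition, removal and correction, can be shown with a small sketch. This is Python rather than a Weka filter, and the records, field names and function name are hypothetical and only for illustration.

```python
def cleanse(records, required_fields):
    """Drop records with missing required fields; tidy the class label of the rest."""
    cleaned = []
    for record in records:
        # Removal: skip records where a required value is absent or empty.
        if any(record.get(field) in (None, "") for field in required_fields):
            continue
        # Correction: normalise stray whitespace and letter case in the class label.
        fixed = dict(record)
        fixed["class"] = fixed["class"].strip().upper()
        cleaned.append(fixed)
    return cleaned

# Hypothetical records; the field names are illustrative only.
records = [
    {"radius": 14.2, "class": " m "},
    {"radius": None, "class": "B"},
    {"radius": 11.0, "class": "b"},
]
result = cleanse(records, required_fields=["radius", "class"])
print(result)
```

The sketch assumes that "class" is always one of the required fields, so the label is never missing when it is tidied.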
Task 3.2
Data cleansing algorithms can be found under the pre-process tab, and selecting filter.