COIT11224 – Computer Systems – Lecture – Week 6

2.1 Listing Numerical Data
Suppose a class of 60 students has submitted assignment 1 (worth a
maximum of 10 marks) for the course COIT11224 – Computer Systems. We would
have 60 pieces of data, each being a number in the range 0 to 10.
The data may look like the following:
 5   2   9   5   0   8   7   4   9  10
 6   9   9   9   8   8   8   6   3   7
 7   6   7   8  10   5   8   7   4   3
 7   1   6   7   3   8   9   1   5   4
 2   0   5   6   8   8   9   2   6   8
 7   9   1   4   8   7   9  10   1   5
This is called raw data. A very basic listing has been made simply by arranging the
marks in six rows of ten. By inspection, we can see that some students
received the top mark of 10. We can also see that some students really struggled
and didn’t get any marks at all. Quite a few marks are greater than five, so it
appears that more students passed the assignment than failed. However,
to understand the data properly, listing it in different ways or grouping
it into categories makes it easier to analyse just how well the students
performed.
We could list the above data in ascending order and see how it looks.
 0   0   1   1   1   1   2   2   2   3
 3   3   4   4   4   4   5   5   5   5
 5   5   6   6   6   6   6   6   7   7
 7   7   7   7   7   7   7   8   8   8
 8   8   8   8   8   8   8   8   9   9
 9   9   9   9   9   9   9  10  10  10
It is reasonably easy to see that 2 students received 0 and another 3 students
received the maximum mark of 10. It is also relatively easy to see that far more
students passed than failed the assignment. Can we list or group the data even more
effectively?
Dot diagrams use dots (and other symbols) to indicate the number of times each data
value occurs. Using the above data, we have already noticed that 2 students obtained 0
marks and another 3 obtained 10. We can continue in like manner and count the
number of students achieving each mark represented by the above data. Then we can
construct a dot diagram. Below is a dot diagram of the above data. I have used the *
symbol, but could have used other symbols or a dot.
# of students
11 |                          *
10 |                          *
 9 |                       *  *  *
 8 |                       *  *  *
 7 |                       *  *  *
 6 |                 *  *  *  *  *
 5 |                 *  *  *  *  *
 4 |     *        *  *  *  *  *  *
 3 |     *  *  *  *  *  *  *  *  *  *
 2 |  *  *  *  *  *  *  *  *  *  *  *
 1 |  *  *  *  *  *  *  *  *  *  *  *
   +----------------------------------
      0  1  2  3  4  5  6  7  8  9 10
         Marks for Assignment 1
Very easily we can see that the mark of 8 was the most common mark obtained by the
students. It is also easy to see that the cluster of asterisks is far more plentiful on the
right-hand side of the diagram. This indicates that far more students passed than
failed. By grouping data, we can improve the view. It is easier to gauge how well
overall the class performed in assignment 1 with the above dot diagram.
The above dot graph uses vertically displayed asterisks to represent the number of
marks in each mark category. We could quite easily rearrange the diagram such that
the asterisks were horizontal and the category listing of the marks shown vertically on
the left-hand side.
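The counting behind a dot diagram, and the horizontal layout just described, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `dot_diagram` and its parameters are made up for this example, and the marks are the 60 raw marks listed earlier.

```python
from collections import Counter

# Raw assignment 1 marks for the 60 students, as listed earlier.
marks = [
    5, 2, 9, 5, 0, 8, 7, 4, 9, 10,
    6, 9, 9, 9, 8, 8, 8, 6, 3, 7,
    7, 6, 7, 8, 10, 5, 8, 7, 4, 3,
    7, 1, 6, 7, 3, 8, 9, 1, 5, 4,
    2, 0, 5, 6, 8, 8, 9, 2, 6, 8,
    7, 9, 1, 4, 8, 7, 9, 10, 1, 5,
]

# Count how many students obtained each mark (the frequency of each mark).
frequency = Counter(marks)


def dot_diagram(freq, lowest=0, highest=10):
    """Return one text line per mark, with one asterisk per student.

    `freq` is expected to be a Counter, so a mark nobody obtained counts as 0.
    """
    lines = []
    for mark in range(lowest, highest + 1):
        lines.append(f"{mark:>2} | " + "*" * freq[mark])
    return lines


# Print the horizontal dot diagram: marks down the left, asterisks to the right.
for line in dot_diagram(frequency):
    print(line)
```

Each printed row corresponds to one column of the vertical diagram, so the longest row (the mark of 8) is the most common mark.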
Another popular way to represent the frequency of categories is to use proportional
length rectangles or bars. Such diagrams are referred to as bar graphs. In the above
diagram, we would simply replace each column of asterisks with a similar height bar.
These bars are usually separated but sometimes you will see bars touching each other.
The important part is that the height of each bar is proportional to the frequency of
its category. Again, we can also have horizontal bars, with the length of the bar to
the right indicating the frequency.
2.2 Stem-and-Leaf Displays
Please refer to section 2.2, page 302 of the textbook.
2.3 Frequency Distribution
Suppose now we turn our attention from the marks allocated for assignment 1 to the
overall marks out of 100 that each of the 60 students obtained for the course
COIT11224. The possible range of marks is now 0 to 100. If we tried to apply a dot
diagram to represent the overall marks, we would soon see that the resultant diagram is not
so useful. Imagine a graph with 101 mark columns along the horizontal axis.
It would be about 10 times the width of our assignment 1 diagram. Additionally, we only have
60 marks in total spread out over 101 vertical columns or mark categories. We would
also have many categories with a frequency of zero particularly towards the lower end
of the marks. The rest of the diagram would consist of a light spattering of data
symbols and the diagram would lose its appeal. Trends on how well students were
performing overall could be difficult to see by inspection.
We can make improvements to the dot diagram to convey a more meaningful view.
We can introduce classes of data (intervals or categories that cover a range of data).
Instead of having a diagram with 101 individual mark categories, we could have five
classes of marks whereby each class represents a range of marks. Performing this
operation loses the individual raw marks, but can improve the visual appeal and
understanding of the diagram.
Using the above example, we can construct custom classes to line up with the grading
system used at CQU. For example, CQU awards an HD for 85 marks or higher in a
course, a D for 75 to 84, a C for 65 to 74, a P for 50 to 64, and finally an F for
all marks less than 50.
Using these classes, we sort and tally all the student marks and count the number of
marks (frequency) in each class. Again, we can use a dot or bar diagram to represent
graphically the frequency of data in each class. Alternatively, we can simply tabulate
the frequency distribution as follows:
Grade          # of students
______________________________
HD                   3
D                   11
C                   15
P                   22
F                    9
______________________________
Total               60
A simple dot diagram of the above could look like the following:
Grade
HD  | ***
D   | ***********
C   | ***************
P   | **********************
F   | *********
      Number of Students
The above dot diagram indicates visually that the class or category with the greatest
number of students was P (Pass). General conclusions can be drawn quite quickly on
how well overall students performed using the above frequency distribution diagrams.
A couple of points should be mentioned here. Firstly, our classes should be such that
every piece of data fits within exactly one class, never in more than one.
Normally, classes would cover equal ranges of values. Our example above doesn’t,
because the classes are aimed at matching the allocation of
grades. So, some freedom in using or not using equal class sizes can improve the use
and effectiveness of the diagram.
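The idea of sorting marks into non-overlapping classes and tallying them can be sketched as follows. The class boundaries are the CQU grade ranges given above. The names `GRADE_CLASSES`, `grade` and `frequency_distribution` are made up for this sketch, and the ten overall marks are invented purely for illustration (the 60 real overall marks are not listed in these notes).

```python
from collections import Counter

# Class boundaries from the grading scheme described above:
# (lowest mark in the class, grade), checked from the highest class down.
GRADE_CLASSES = [(85, "HD"), (75, "D"), (65, "C"), (50, "P"), (0, "F")]


def grade(mark):
    """Return the single grade class that a mark out of 100 falls into."""
    if not 0 <= mark <= 100:
        raise ValueError(f"mark out of range: {mark}")
    for lowest, label in GRADE_CLASSES:
        if mark >= lowest:
            return label


def frequency_distribution(marks):
    """Count how many marks fall into each grade class."""
    counts = Counter(grade(m) for m in marks)
    return {label: counts[label] for _, label in GRADE_CLASSES}


# Hypothetical overall marks, invented for illustration only.
example_marks = [91, 78, 84, 66, 70, 52, 64, 50, 49, 12]

# Print a small horizontal dot diagram of the resulting distribution.
for label, count in frequency_distribution(example_marks).items():
    print(f"{label:<3}| " + "*" * count)
```

Because the classes are checked from the highest lower boundary down and the first match is returned, every valid mark lands in exactly one class, which is the property the paragraph above asks for.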
2.4 Graphical Presentations
Histograms are often used to graphically represent frequency distributions.
Histograms are like bar or dot graphs with the intervals on the horizontal axis
representing a range of values.
Other graphical representations of frequency distributions are frequency polygons,
pictograms and pie charts.
Refer to Section 2.4 of the textbook.
2.5 Summarizing Two-Variable Data
Lightly read Section 2.5 of the textbook and have an understanding of a scatter
diagram.
3.1 Populations and Samples
What is a population in statistical terms? It is the set of data that covers all
conceivable possibilities in the area of data under consideration. For example, if there
are 60 students in the class and we have the 60 marks, we would say that the set of
data constitutes a population. If we pick (usually randomly) a group of student marks
from the population, we refer to this smaller number of marks as a sample. For
example, we may randomly pick 32 student marks from the population. These 32
marks form a sample. Interestingly, a sample of reasonable size can be used to predict
the trends of the population with reasonable accuracy.
3.2 The Mean
One of the most important calculations that is done on a population or sample is the
“average” or in statistical terms, the arithmetic mean or simply the mean. The mean of
n numbers is their sum divided by n.
For example, to calculate the mean of the 60 student marks, we would add all of the
60 marks together then divide by 60. This gives us the population mean. We
can calculate the mean of our sample of 32 student marks in the same way: we simply
add all the 32 marks together and divide by 32, giving us the sample mean. We might
be surprised at just how close the value of our sample mean is to the population mean.
Why do we use samples? Consider the case where there may be more than 5,000
students. It could be quite a task analyzing and calculating the population mean. We
could simply select a random sample of 50 student marks and calculate the sample
mean. The resulting sample mean and sample dot diagram usually give a good
estimate of the population mean and population trends with far less work.
In mathematical terms, the mean of the sample is represented by:
$$\bar{x} = \frac{\sum x}{n}$$
where $\bar{x}$ = the sample mean, $\sum x$ = the sum of the data in the sample and $n$ = the
number of data objects in the sample.
If we consider the mean of the total population,
$$\mu = \frac{\sum x}{N}$$
where $\mu$ = the population mean, $\sum x$ = the sum of the data in the population and $N$ = the
number of data objects in the population.
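As a rough sketch of the two formulas, the code below treats the 60 assignment 1 marks as the population and draws a random sample of 32 from it. The helper name `mean`, the variable names and the seed value are arbitrary choices made for this example. The population is rebuilt from the frequencies shown in the assignment 1 dot diagram.

```python
import random

# Assignment 1 frequencies read off the dot diagram: mark -> number of students.
frequency = {0: 2, 1: 4, 2: 3, 3: 3, 4: 4, 5: 6, 6: 6, 7: 9, 8: 11, 9: 9, 10: 3}

# Expand the frequencies back into the 60 individual marks (the population).
population = [mark for mark, count in frequency.items() for _ in range(count)]


def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


# Population mean: (sum of x) / N, with N = 60.
population_mean = mean(population)

# Draw a random sample of n = 32 marks without replacement.
rng = random.Random(6)  # fixed seed (arbitrary) so a run can be repeated
sample = rng.sample(population, 32)

# Sample mean: (sum of x) / n, with n = 32.
sample_mean = mean(sample)

print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
```

For this data the marks sum to 363, so the population mean is 363 / 60 = 6.05. The sample mean depends on which 32 marks happen to be drawn, so it will vary with the seed.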
3.3 The Weighted Mean
Skip this section dealing with “Weighted Mean”, “Geometric Mean” and
“Harmonic Mean”.
3.4 Median
The median is the value of the middle term when n is odd, and the mean of the middle
two terms when n is even. Before the median can be found or calculated, the data
must be arranged according to size (i.e. in ascending or descending order).
The median of a sample is denoted by $\tilde{x}$. The median of a population is denoted by $\tilde{\mu}$.
The median is the value of the $\frac{n+1}{2}$th term. When n is even, this position falls
halfway between two terms, so we take the mean of those two terms.
3.5 Other Fractiles
Skip this section on “Other Fractiles”.
3.6 Mode
The mode is another statistical measure. It is simply the value or category that occurs with
the highest frequency.
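The median rule from section 3.4 and the mode can both be sketched in Python using the assignment 1 marks. The function names `median` and `mode` are chosen for this example (Python's standard `statistics` module offers functions with the same names), and the marks are rebuilt from the frequencies in the assignment 1 dot diagram.

```python
from collections import Counter

# Assignment 1 frequencies read off the dot diagram: mark -> number of students.
frequency = {0: 2, 1: 4, 2: 3, 3: 3, 4: 4, 5: 6, 6: 6, 7: 9, 8: 11, 9: 9, 10: 3}
marks = [mark for mark, count in frequency.items() for _ in range(count)]


def median(values):
    """Middle term if n is odd, mean of the two middle terms if n is even."""
    ordered = sorted(values)  # the data must be arranged by size first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def mode(values):
    """Return the value that occurs most often (the first one found if tied)."""
    return Counter(values).most_common(1)[0][0]


print("median =", median(marks))
print("mode   =", mode(marks))
```

With n = 60 the median position (n + 1) / 2 = 30.5 falls between the 30th and 31st ordered marks, which are both 7, so the median is 7. The mode is 8, matching the tallest column of the dot diagram.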
3.7 The Description of Grouped Data
Skip section 3.7.
3.8 Technical Note (Summations)
The notation $\sum x$ used above does not tell us how many and which values of $x$ we
should add together. We can use a more explicit notation as follows:
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \dots + x_n$$
where it is clear we are adding all the terms of $x$ with subscripts of $i$ ranging from 1
to $n$.
In next week’s class, we will come across notation as follows:
$$\sum_{i=1}^{n} x_i^2 = x_1^2 + x_2^2 + x_3^2 + \dots + x_n^2$$
which simply means that we are summing the squares of each term of $x$ with
subscripts of $i$ ranging from 1 to $n$.
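The two summations can be written out directly in code, which may help to show what the subscript i is doing. The three values of x below are invented for illustration only.

```python
# A short list of values x_1, ..., x_n, invented for illustration.
x = [3, 5, 8]
n = len(x)

# Sum of x_i for i = 1..n. Python lists index from 0, hence x[i - 1].
sum_x = sum(x[i - 1] for i in range(1, n + 1))

# Sum of the squares of x_i for i = 1..n.
sum_x_squared = sum(x[i - 1] ** 2 for i in range(1, n + 1))

print(sum_x)          # 3 + 5 + 8 = 16
print(sum_x_squared)  # 9 + 25 + 64 = 98
```

Note that the sum of the squares (98) is not the same as the square of the sum (16 squared is 256).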