STATISTICS
• Data Collection
• Data Organisation
• Data Interpretation
• Data Presentation
• Descriptive – the processed data provides a summary of the observations/measurements, such as averages, variations and graphs
• Inferential – the processed data is used to make judgements or predictions, such as trends or indications of variations between different samples
a) the average December temperature in Sydney has increased by 1 °C in the last 50 years
Descriptive
b) it is expected that the average December temperature in Sydney will increase by another 1 °C within 25 years
Inferential
c) 25% of people surveyed at a shopping centre indicated that they were aware of increasing temperatures in Sydney
Descriptive
d) a survey has shown that 75% of Sydneysiders are ignorant of the changing climatic conditions in their city
Inferential
Identify the sample and the population in the following.
(a) a bottle of water is taken from a dam to be tested
Sample – the water in the bottle
Population – all the water in the dam
(b) the frog population of a large wetland is checked by looking at two separate hectares
Sample – the two hectares
Population – the whole wetland
(c) the levels of lead in fallout around a smelter are assessed by testing a selection of properties
Sample – the selected properties
Population – the whole area
(d) people in a shopping centre are asked their opinions … to determine the level of awareness in the community
Sample – the people asked
Population – the community
• variable – characteristic being measured
• category – result of measurement is a “word”, e.g. yes (or no), truck, bird, sparrow, first (or second) etc
• numerical – measurement produces a number
  • could be limited to certain values (e.g. whole numbers)
  • any value (e.g. mass of an object)
Exercise 1.3
(a) lead levels in fallout
numerical – any value
(b) types of birds observed
category
(c) numbers of birds observed in different locations
numerical – set values
• large quantities of raw data are not useful for presenting the results of the tests
• they need to be organised to show the results on a smaller scale
• tables
• graphs
• averages
• comparisons
• organising it so that it can be evaluated more easily
• generally some sort of table
• category data is most usually grouped (tallied)
• the number of times each different category occurs is the recorded result
• can also be used where the data is numerical
  • only with fixed and pre-known values
  • a large number of data points
• numerical (all values) data presents a problem
  • must be grouped into ranges
  • information is lost, e.g. 0.1 and 4.9 both fit into the 0-5 range
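Tallying category data can be sketched in a few lines of Python. This is only an illustration: the function name `tally` and the bird observations are invented for the example and are not part of the course material.

```python
from collections import Counter


def tally(observations):
    """Return a dict mapping each category to the number of times it occurs."""
    return dict(Counter(observations))


# Hypothetical field observations, invented for illustration only.
birds = ["sparrow", "magpie", "sparrow", "galah", "magpie", "sparrow"]

for category, count in tally(birds).items():
    print(category, count)
```

The recorded result is the count per category, which is what a hand-drawn tally table would show.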
• identify the minimum and maximum values
• decide how many groups are appropriate for the size of the dataset
• determine the groups (which should be equivalent ranges – for example, 0-5, 6-10 etc, but not 0-5, 6-20)
Class Exercise 1.4
You have a data set of 100 pH measurements of river water, ranging from 5 to 9. What would be an appropriate way of grouping them?
• 8 ranges of 0.5, e.g. 5.00-5.49, 5.50-5.99 etc
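The grouping in this exercise could be done along the following lines in Python. This is a sketch under stated assumptions: the function name `group_counts` and the sample readings are hypothetical, and a reading equal to the maximum (9.0) is placed in the last group, which is a choice made here rather than something the exercise specifies.

```python
def group_counts(values, lowest, highest, n_groups):
    """Count how many values fall into each of n_groups equal-width ranges."""
    width = (highest - lowest) / n_groups
    counts = [0] * n_groups
    for v in values:
        if not lowest <= v <= highest:
            raise ValueError(f"{v} is outside {lowest}-{highest}")
        # Assumption: a value equal to `highest` goes into the last group.
        index = min(int((v - lowest) / width), n_groups - 1)
        counts[index] += 1
    return counts


# Hypothetical pH readings, invented for illustration only.
ph = [5.0, 5.4, 5.5, 6.2, 7.0, 7.1, 7.3, 8.9, 9.0]

# 8 groups of width 0.5 between pH 5 and pH 9.
print(group_counts(ph, 5.0, 9.0, 8))
```

Each position in the returned list is the frequency for one range, starting from 5.00-5.49.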
• number of times a particular value or range occurs is the frequency
• spread of data across the range of values is the distribution
  • Is it evenly spread across the groups?
  • Do certain groups have higher frequencies?
  • Is there any pattern?
• frequency should be considered in relation to the total number of data values
• relative frequency – the frequency as a proportion (often a percentage) of the total dataset
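Relative frequency is a one-line calculation once the frequencies are known. The sketch below assumes the frequencies are already tallied; the function name `relative_frequencies` and the counts are hypothetical.

```python
def relative_frequencies(frequencies):
    """Convert a dict of group -> frequency into group -> percentage of total."""
    total = sum(frequencies.values())
    if total == 0:
        raise ValueError("no data values to compare against")
    return {group: 100 * count / total for group, count in frequencies.items()}


# Hypothetical frequencies for three pH ranges, invented for illustration only.
counts = {"5.00-5.49": 10, "5.50-5.99": 30, "6.00-6.49": 60}

for group, percent in relative_frequencies(counts).items():
    print(group, f"{percent:.1f}%")
```

Expressing each frequency as a percentage makes datasets of different sizes directly comparable.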
• manually tallying large data sets (how many occurrences of each value) is boring, tiring and potentially inaccurate
• Excel has some functions which help:
  • COUNT(cell range)
  • COUNTIF(range, criterion)
  • FREQUENCY(range, group) – probably more trouble than it’s worth
• COUNT returns the total number of cells with numerical data
• ignores blank cells and non-numerical values
Example: column A (cells A1:A9) holds the entries 0, n/a, ***, 10, 3, 4, 7, 8 and 2. =COUNT(A1:A9) returns 7.
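For readers working outside Excel, the behaviour of COUNT can be approximated in Python. This is an illustrative analogue rather than Excel's own logic: the function name `count_numeric` is hypothetical, blank cells are represented by `None`, and the row order of the nine entries is assumed.

```python
def count_numeric(cells):
    """Count cells holding numbers, skipping text and blanks (None)."""
    return sum(
        1
        for c in cells
        # bool is excluded because Python treats True/False as integers.
        if isinstance(c, (int, float)) and not isinstance(c, bool)
    )


# The nine entries from the COUNT example; row order is an assumption.
column_a = [0, "n/a", "***", 10, 3, 4, 7, 8, 2]

print(count_numeric(column_a))  # 7, matching =COUNT(A1:A9)
```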
• COUNTIF returns the number of cells meeting a given criterion
• criteria include =, > or <
Example: column A (cells A1:A9) holds the entries 10, 3, n/a, ***, 4, 0, 7, 8 and 2. =COUNTIF(A1:A9,">5") returns 3.
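A rough Python analogue of COUNTIF, restricted to the three numerical criteria listed above, might look like this. The function name `count_if` and its argument layout are hypothetical, and the row order of the entries is assumed; Excel's COUNTIF accepts a wider range of criteria than this sketch does.

```python
import operator

# Only the three comparisons named on the slide are supported in this sketch.
_COMPARISONS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def count_if(cells, comparison, threshold):
    """Count numeric cells for which `cell <comparison> threshold` holds."""
    compare = _COMPARISONS[comparison]
    return sum(
        1
        for c in cells
        if isinstance(c, (int, float))
        and not isinstance(c, bool)
        and compare(c, threshold)
    )


# The nine entries from the COUNTIF example; row order is an assumption.
column_a = [10, 3, "n/a", "***", 4, 0, 7, 8, 2]

print(count_if(column_a, ">", 5))  # 3 (the entries 10, 7 and 8)
```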
• FREQUENCY tallies data into user-chosen groups
• entered as an array formula
  • highlight a group of cells where you want the frequencies to appear
  • type in the formula and then hit the key combination CTRL+SHIFT+ENTER
Example: column A (cells A1:A9) holds the entries 0, n/a, ***, 10, 3, 4, 7, 8 and 2; cells B6:B7 hold 5 and 10, the values for groups 0-5 and 6-10. =FREQUENCY(A1:A9,B6:B7) returns 4 and 3.
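The way FREQUENCY assigns values to groups can be mimicked in Python. In this sketch the function name `frequency` is hypothetical, the group values are treated as inclusive upper limits supplied in ascending order, and the final element of the result counts anything above the last limit, which is how Excel's FREQUENCY is documented to behave.

```python
from bisect import bisect_left


def frequency(cells, upper_limits):
    """Tally numeric cells into groups defined by inclusive upper limits.

    Returns len(upper_limits) + 1 counts; the last one is for values
    above the highest limit.
    """
    limits = sorted(upper_limits)
    counts = [0] * (len(limits) + 1)
    for c in cells:
        if isinstance(c, (int, float)) and not isinstance(c, bool):
            # bisect_left finds the first limit that is >= the value.
            counts[bisect_left(limits, c)] += 1
    return counts


# The nine entries from the FREQUENCY example; row order is an assumption.
column_a = [0, "n/a", "***", 10, 3, 4, 7, 8, 2]

print(frequency(column_a, [5, 10]))  # [4, 3, 0]
```

The first two counts match the 4 and 3 shown in the example; the trailing 0 confirms that no entry exceeds 10.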
One sample set – two variables
Variables: general state of health (Healthy, Ill) and sex of koala (Male, Female)
Counts: 45, 21, 28, 9
Two sample sets – one variable
Origin of plant     Type of parkland
                    Urban     Undeveloped
Native              37%       65%
Introduced          51%       20%
Not identified      12%       15%
• represents all the data values with one or two numbers
  • average – some way of representing the “most common” value
  • variation – how much spread there is in the data set
• category variables
  • average – class with highest frequency (mode)
  • variation cannot be measured
• numerical variables
  • mean – what we normally refer to as average
  • mode – most common value (used in grouped data)
  • median – the value in the middle when arranged in order
  • range – highest minus lowest
  • standard deviation – calculated from the differences of all points from the mean
• mean & std dev normally used in scientific data
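The numerical summaries listed above can all be computed with Python's standard `statistics` module. This is a sketch only: the function name `summarise` and the readings are hypothetical, and the sample standard deviation (dividing by n - 1) is assumed, since the slide does not say which form is intended.

```python
import statistics


def summarise(values):
    """Return the average and variation measures for a numerical data set."""
    return {
        "mean": statistics.mean(values),
        "mode": statistics.mode(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "std_dev": statistics.stdev(values),
    }


# Hypothetical readings, invented for illustration only.
readings = [6.5, 7.0, 7.0, 7.5, 8.0]

for name, value in summarise(readings).items():
    print(f"{name}: {value:.2f}")
```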
• large amount of data
• simple formulas required
• all questions and directions contained in Excel spreadsheet