Sci Info Skills
STATISTICS
Data Collection Data Organisation Data Interpretation Data Presentation
Descriptive
• the processed data provides a summary of the observations/measurements, such as averages, variations and graphs
Inferential
• the processed data is used to make judgements or predictions, such as trends, indications of variations between different samples
a) the average December temperature in Sydney has increased by 1°C in the last 50 years
Descriptive
b) it is expected that the average December temperature in Sydney will increase by another 1°C within 25 years
Inferential
c) 25% of people surveyed at a shopping centre indicated that they were aware of increasing temperatures in Sydney
Descriptive
d) a survey has shown that 75% of Sydneysiders are ignorant of the changing climatic conditions in their city
Inferential
Identify the sample and the population in the following.
(a) a bottle of water is taken from a dam to be tested
Sample – the water in the bottle
Population – all the water in the dam
(b) the frog population of a large wetland is checked by looking at two separate hectares
Sample – the two hectares
Population – the whole wetland
(c) the levels of lead in fallout around a smelter are assessed by testing a selection of properties
Sample – the selected properties
Population – the whole area
(d) people in a shopping centre are asked their opinions … to determine the level of awareness in the community
Sample – the people asked
Population – the community
• variable – the characteristic being measured
• category – result of measurement is a “word”, e.g. yes (or no), truck, bird, sparrow, first (or second) etc
• numerical – measurement produces a number
• could be limited to certain values (e.g. whole numbers)
• or could take any value (e.g. mass of an object)
Exercise 1.3
(a) lead levels in fallout
numerical – any value
(b) types of birds observed
category
(c) numbers of birds observed in different locations
numerical – set values
• large quantities of raw data are not useful for presenting the results of the tests
• they need to be organised to show the results on a smaller scale
• tables
• graphs
• averages
• comparisons
• organising it so that it can be evaluated more easily
• generally some sort of table
• category data is most usually grouped (tallied)
• the number of times each different category occurs is the recorded result
• can also be used where the data is numerical
• only with fixed and pre-known values
• a large number of data points
• numerical (all values) data presents a problem
• must be grouped into ranges
• information is lost, e.g. 0.1 and 4.9 both fit into 0-5 range
• identify the minimum and maximum values
• decide how many groups are appropriate for the size of the dataset
• determine the groups (which should be equivalent ranges – for example, 0-5, 6-10 etc, but not 0-5, 6-20)
Class Exercise 1.4
• You have a data set of 100 pH measurements of river water, ranging from 5 to 9. What would be an appropriate way of grouping them?
• 8 ranges of 0.5
• e.g. 5.0-5.49, 5.50-5.99 etc
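The grouping steps (find the minimum and maximum, choose a number of groups, make the ranges equal) can be sketched in Python. This is only an illustration: the function name `equal_width_groups` is made up here, and how a reading of exactly 9.0 is treated (it sits on the upper limit of the last group) is a choice the slides leave open.

```python
def equal_width_groups(minimum, maximum, n_groups):
    """Return (lower, upper) limits for n_groups ranges of equal width."""
    width = (maximum - minimum) / n_groups
    return [(minimum + i * width, minimum + (i + 1) * width)
            for i in range(n_groups)]

# pH readings from 5 to 9 split into 8 groups, so each group is 0.5 wide
groups = equal_width_groups(5.0, 9.0, 8)
for lower, upper in groups:
    print(f"{lower:.1f} up to {upper:.1f}")
```

With 100 measurements, 8 groups gives roughly 12 values per group on average, which is usually enough to see a pattern.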
• number of times a particular value or range occurs is the frequency
• spread of data across the range of values is the distribution
• Is it evenly spread across the groups?
• Do certain groups have higher frequencies?
• Is there any pattern?
• frequency should be considered in relation to the total number of data values
• relative frequency – the frequency as a proportion (often a percentage) of the total dataset
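As a rough illustration of frequency and relative frequency, the sketch below tallies a short list of bird sightings. The sightings are invented for the example and are not from any real survey.

```python
from collections import Counter

# Hypothetical category data: one entry per bird observed
observations = ["sparrow", "magpie", "sparrow", "galah",
                "sparrow", "magpie", "galah", "sparrow"]

# Frequency: how many times each category occurs
frequency = Counter(observations)

# Relative frequency: each frequency as a percentage of the total
total = sum(frequency.values())
relative_frequency = {bird: 100 * count / total
                      for bird, count in frequency.items()}

for bird, count in frequency.items():
    print(f"{bird}: {count} ({relative_frequency[bird]:.0f}%)")
```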
• manually tallying - how many occurrences of each value – of large data sets is boring, tiring and potentially inaccurate
• Excel has some functions which help:
• COUNT(cell range)
• COUNTIF(range , criterion)
• FREQUENCY (range , group) – probably more trouble than it’s worth
• COUNT returns the total number of cells with numerical data
• ignores blank cells and nonnumerical values
• example: cells A1:A9 contain 0, n/a, ***, 10, 3, 4, 7, 8, 2
• =COUNT(A1:A9) returns 7
• COUNTIF returns the number of cells meeting a given criterion
• criteria include =, > or <
• example: cells A1:A9 contain 0, n/a, ***, 10, 3, 4, 7, 8, 2
• =COUNTIF(A1:A9,">5") returns 3
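For comparison, the same two counts can be reproduced outside Excel. The Python sketch below is not how Excel works internally; it simply mimics COUNT and COUNTIF on the nine cell values from the example (the order of the cells is assumed).

```python
# The nine cells A1:A9 from the example; "n/a" and "***" are text, not numbers
column_a = [0, "n/a", "***", 10, 3, 4, 7, 8, 2]

# Keep only the numerical entries, as COUNT ignores text and blanks
numbers = [value for value in column_a if isinstance(value, (int, float))]

# Like =COUNT(A1:A9): how many cells hold a number
count = len(numbers)

# Like =COUNTIF(A1:A9,">5"): how many numbers are greater than 5
count_over_5 = sum(1 for value in numbers if value > 5)

print(count, count_over_5)
```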
• FREQUENCY tallies data into user-chosen groups
• entered as an array formula
• highlight a group of cells where you want the frequencies to appear
• type in the formula and then hit the key combination CTRL+SHIFT+ENTER
• example: cells A1:A9 contain 0, n/a, ***, 10, 3, 4, 7, 8, 2; cells B6:B7 contain 5 and 10, the values for groups 0-5 and 6-10
• =FREQUENCY(A1:A9,B6:B7) returns 4 and 3
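The way FREQUENCY uses the group values as upper limits can be hard to see from the formula alone. The Python sketch below imitates that behaviour under two assumptions: the upper limits are listed in increasing order, and non-numerical cells are skipped. The function name `frequency` is just a label for this example. Like Excel, it returns one extra count for values above the highest limit.

```python
from bisect import bisect_left

def frequency(values, upper_limits):
    """Tally numbers into groups defined by sorted upper limits."""
    counts = [0] * (len(upper_limits) + 1)  # last slot: above the top limit
    for value in values:
        if isinstance(value, (int, float)):  # skip text such as "n/a"
            # index of the first upper limit that is >= value
            counts[bisect_left(upper_limits, value)] += 1
    return counts

column_a = [0, "n/a", "***", 10, 3, 4, 7, 8, 2]
result = frequency(column_a, [5, 10])   # groups 0-5 and 6-10
print(result)
```

Here the first count covers values up to and including 5, the second covers values above 5 up to and including 10, and the third covers anything above 10.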
One sample set – two variables
[Table: general state of health (Healthy, Ill) against sex of koala (Male, Female); counts 45, 21, 28, 9]
Two sample sets – one variable
Origin of plant     Type of parkland
                    Urban     Undeveloped
Native              37%       65%
Introduced          51%       20%
Not identified      12%       15%
• a summary represents all the data values with one or two numbers
• average – some way of representing the “most common” value
• variation – how much spread there is in data set
• category variables – class with highest frequency (mode)
• variation cannot be measured
• numerical variables
• mean – what we normally refer to as average
• mode – most common value (used in grouped data)
• median – the value in the middle when arranged in order
• range – highest value minus lowest value
• standard deviation – a measure of how far the data points typically lie from the mean
• mean & std dev normally used in scientific data
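These measures can be tried out on a small dataset. The sketch below uses Python's built-in `statistics` module on six made-up values. It assumes the sample standard deviation (dividing by n - 1) is wanted; `statistics.pstdev` would give the population version instead.

```python
import statistics

# Hypothetical numerical data, invented for illustration
data = [2, 4, 4, 5, 7, 8]

average = statistics.mean(data)        # sum of values / number of values
middle = statistics.median(data)       # middle value when sorted
most_common = statistics.mode(data)    # value that occurs most often
data_range = max(data) - min(data)     # highest value minus lowest value
spread = statistics.stdev(data)        # sample standard deviation

print(average, middle, most_common, data_range, round(spread, 2))
```

The mean and standard deviation use every data point, which is one reason they are the usual pair for scientific data.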
• large amount of data
• simple formulas required
• all questions and directions contained in Excel spreadsheet