1. Data Processing

Sci Info Skills

Statistics – making sense of numbers

STATISTICS

• Data Collection

• Data Organisation

• Data Interpretation

• Data Presentation

Types of statistics

Descriptive

• the processed data provides a summary of the observations/measurements, such as averages, variations and graphs

Inferential

• the processed data is used to make judgements or predictions, such as trends, indications of variations between different samples

Class Exercise 1.1

a) the average December temperature in Sydney has increased by 1 °C in the last 50 years

Descriptive

b) it is expected that the average December temperature in Sydney will increase by another 1 °C within 25 years

Inferential

c) 25% of people surveyed at a shopping centre indicated that they were aware of increasing temperatures in Sydney

Descriptive

d) a survey has shown that 75% of Sydneysiders are ignorant of the changing climatic conditions in their city

Inferential

Class Exercise 1.2

Identify the sample and the population in the following.

(a) a bottle of water is taken from a dam to be tested

Sample – the water in the bottle

Population – all the water in the dam

(b) the frog population of a large wetland is checked by looking at two separate hectares

Sample – the two hectares

Population – the whole wetland

(c) the levels of lead in fallout around a smelter are assessed by testing a selection of properties

Sample – the selected properties

Population – the whole area

(d) people in a shopping centre are asked their opinions … to determine the level of awareness in the community

Sample – the people asked at the shopping centre

Population – the community

Variables

• a variable is the characteristic being measured

• category – the result of the measurement is a "word", e.g. yes (or no), truck, bird, sparrow, first (or second) etc.

• numerical – the measurement produces a number

• numerical (set values) – limited to certain values (e.g. whole numbers)

• numerical (any value) – can take any value (e.g. mass of an object)

Exercise 1.3

(a) …

numerical – any value

(b) types of birds observed

category

(c) numbers of birds observed in different locations

numerical – set values

Presenting & organising data

• large quantities of raw data are not useful for presenting the results of tests

• they need to be organised to show the results in a more compact form

• tables

• graphs

• averages

• comparisons

Tabulating data

• organising it so that it can be evaluated more easily

• generally some sort of table

• category data is usually grouped (tallied)

• the number of times each category occurs is the recorded result

• tallying can also be used where the data is numerical

• only with fixed values that are known in advance

• with a large number of data points

• numerical data that can take any value presents a problem

• it must be grouped into ranges

• information is lost, e.g. 0.1 and 4.9 both fit into the 0-5 range

Grouping numerical data

• identify the minimum and maximum values

• decide how many groups are appropriate for the size of the dataset

• determine the groups (which should be ranges of equal size – for example, 0-5, 6-10 etc., but not 0-5, 6-20)

Class Exercise 1.4

• You have a data set of 100 pH measurements of river water, ranging from 5 to 9. What would be an appropriate way of grouping them?

• 8 ranges of 0.5

• e.g. 5.0-5.49, 5.50-5.99 etc
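The grouping steps can be sketched in Python. This is only an illustration under stated assumptions: the function name `group_into_ranges` and the ten pH readings are made up for the example; only the span of 5 to 9 and the range width of 0.5 come from the exercise.

```python
def group_into_ranges(values, start, width, n_groups):
    """Tally values into n_groups equal-width ranges beginning at start.

    Each range includes its lower limit and excludes its upper limit,
    except the last range, which also takes the overall maximum.
    Assumes every value lies between start and start + width * n_groups.
    """
    counts = [0] * n_groups
    for v in values:
        index = int((v - start) / width)
        # A value equal to the overall maximum goes into the last range
        index = min(index, n_groups - 1)
        counts[index] += 1
    return counts


# Hypothetical pH readings (illustrative only, not real measurements)
ph_readings = [5.0, 5.3, 6.1, 6.4, 6.4, 7.0, 7.2, 7.9, 8.5, 9.0]

counts = group_into_ranges(ph_readings, start=5.0, width=0.5, n_groups=8)
for i, count in enumerate(counts):
    low = 5.0 + i * 0.5
    print(f"{low:.1f} to {low + 0.5:.1f}: {count}")
```

Putting the maximum value (9.0) into the last range is a design choice made here so that eight ranges cover the whole span; a ninth range would be the alternative.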

Frequencies

• number of times a particular value or range occurs is the frequency

• spread of data across the range of values is the distribution

• Is it evenly spread across the groups?

• Do certain groups have higher frequencies?

• Is there any pattern?

• frequency should be considered in relation to the total number of data values

• relative frequency – the frequency expressed as a proportion (often a percentage) of the total number of data values
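As a rough illustration of frequency and relative frequency, the sketch below tallies a small set of category data in Python. The bird observations are invented for the example and the variable names are arbitrary.

```python
from collections import Counter

# Hypothetical category data (illustrative only)
observations = [
    "sparrow", "magpie", "sparrow", "ibis",
    "sparrow", "magpie", "ibis", "sparrow",
]

# Frequency: how many times each category occurs
frequencies = Counter(observations)
total = len(observations)

# Relative frequency: each frequency as a percentage of the total
relative = {
    category: count / total * 100
    for category, count in frequencies.items()
}

for category, count in frequencies.most_common():
    print(f"{category}: {count} ({relative[category]:.0f}%)")
```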

Excel & tally charts

• manually tallying large data sets (counting how many occurrences there are of each value) is boring, tiring and potentially inaccurate

• Excel has some functions which help:

• COUNT(cell range)

• COUNTIF(range, criterion)

• FREQUENCY(range, groups) – probably more trouble than it's worth

COUNT()

• returns the total number of cells with numerical data

• ignores blank cells and non-numerical values

Row   A
1     0
2     n/a
3     ***
4     10
5     3
6     4
7     7
8     8
9     2

Column B: =COUNT(A1:A9) returns 7
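For comparison, the same count can be sketched outside Excel. The Python below is an approximation of what COUNT does for this column, not a full reimplementation; the function name `count_numeric` is made up for the example, and the values are those of column A.

```python
# Column A from the example; None would stand for a blank cell
column_a = [0, "n/a", "***", 10, 3, 4, 7, 8, 2]


def count_numeric(cells):
    """Count the cells that hold numbers, ignoring blanks and text."""
    return sum(
        1
        for cell in cells
        if isinstance(cell, (int, float)) and not isinstance(cell, bool)
    )


print(count_numeric(column_a))  # 7, matching =COUNT(A1:A9)
```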

COUNTIF()

• returns the number of cells meeting a given criterion

• criteria include =, > or <

Row   A
1     10
2     3
3     n/a
4     ***
5     4
6     0
7     7
8     8
9     2

Column B: =COUNT(A1:A9) returns 7

Column B: =COUNTIF(A1:A9,">5") returns 3
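A conditional count of this kind can be sketched in Python as follows. The helper name `count_if` is hypothetical, and the sketch handles numerical comparisons only; it does not attempt the text matching that COUNTIF also supports.

```python
# Column A from the example
column_a = [10, 3, "n/a", "***", 4, 0, 7, 8, 2]


def count_if(cells, predicate):
    """Count the numerical cells for which predicate(cell) is true."""
    numeric = (
        cell
        for cell in cells
        if isinstance(cell, (int, float)) and not isinstance(cell, bool)
    )
    return sum(1 for cell in numeric if predicate(cell))


print(count_if(column_a, lambda x: x > 5))  # 3, matching the ">5" criterion
```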

FREQUENCY()

• tallies data into user-chosen groups

• entered as an array formula

• highlight a group of cells where you want the frequencies to appear

• type in the formula and then press the key combination CTRL+SHIFT+ENTER

Row   A
1     0
2     n/a
3     ***
4     10
5     3
6     4
7     7
8     8
9     2

Cells B6:B7 hold the upper limits of the groups 0-5 and 6-10, i.e. 5 and 10

Column B: =FREQUENCY(A1:A9,B6:B7) returns 4 and 3
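The tallying that FREQUENCY performs can be approximated in Python. In this sketch the function name `frequency` is made up, the upper limits are assumed to be sorted in ascending order, and non-numerical cells are skipped. As in Excel, the result has one more entry than there are upper limits; the last entry counts values above the highest limit.

```python
from bisect import bisect_left

# Column A and the group upper limits from the example
column_a = [0, "n/a", "***", 10, 3, 4, 7, 8, 2]
upper_limits = [5, 10]  # groups 0-5 and 6-10


def frequency(cells, limits):
    """Tally numerical cells into groups defined by sorted upper limits."""
    counts = [0] * (len(limits) + 1)
    for cell in cells:
        if isinstance(cell, (int, float)) and not isinstance(cell, bool):
            # Index of the first limit that is >= cell;
            # values above every limit fall into the final, extra slot
            counts[bisect_left(limits, cell)] += 1
    return counts


print(frequency(column_a, upper_limits))  # [4, 3, 0]
```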

Two-way frequency tables

One sample set – two variables

Table: sex of koala (male, female) against general state of health (healthy, ill); cell counts 45, 21, 28, 9
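A two-way frequency table for one sample set with two variables can be built by tallying pairs of values. The sketch below uses invented records, not the koala counts from the slide, and its variable names are arbitrary.

```python
from collections import Counter

# Hypothetical records, one (sex, health) pair per animal (illustrative only)
records = [
    ("male", "healthy"), ("male", "ill"), ("female", "healthy"),
    ("female", "healthy"), ("male", "healthy"), ("female", "ill"),
]

# Each distinct pair is one cell of the two-way table
table = Counter(records)

rows = ["male", "female"]
columns = ["healthy", "ill"]

# Print the table with one row per sex and one column per state of health
print(f"{'':8}" + "".join(f"{c:>10}" for c in columns))
for r in rows:
    print(f"{r:8}" + "".join(f"{table[(r, c)]:>10}" for c in columns))
```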

Two sample sets – one variable

                   Type of parkland
Origin of plant    Urban    Undeveloped
Native             37%      65%
Introduced         51%      20%
Not identified     12%      15%

The typical value

• represents all the data values with one or two numbers

• average – some way of representing the “most common” value

• variation – how much spread there is in data set

• category variables – class with highest frequency (mode)

• variation cannot be measured

• numerical variables

• mean – what we normally refer to as average

• mode – most common value (used in grouped data)

• median – the value in the middle when arranged in order

• range – highest value minus lowest value

• standard deviation – a measure of how far the data points typically lie from the mean

• mean & standard deviation are normally used for scientific data
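The measures listed above can be computed with Python's standard `statistics` module. The data set is invented for illustration. Two standard deviations are shown because the slide does not say which is meant: `stdev` treats the data as a sample (divides by n - 1), while `pstdev` treats it as the whole population (divides by n).

```python
import statistics

# Hypothetical measurements (illustrative only)
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)              # sum of values / number of values
median = statistics.median(data)          # middle value when sorted
mode = statistics.mode(data)              # most common value
data_range = max(data) - min(data)        # highest minus lowest
sample_sd = statistics.stdev(data)        # divides by n - 1
population_sd = statistics.pstdev(data)   # divides by n

print(f"mean={mean}, median={median}, mode={mode}, range={data_range}")
print(f"sample sd={sample_sd:.3f}, population sd={population_sd:.3f}")
```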

Assignment 1

• large amount of data

• simple formulas required

• all questions and directions contained in Excel spreadsheet