# 1. Data Processing

Sci Info Skills

### Statistics – making sense of numbers

Statistics covers:

• data collection

• data organisation

• data interpretation

• data presentation

### Types of statistics

Descriptive

• the processed data provides a summary of the observations/measurements, such as averages, variations and graphs

Inferential

• the processed data is used to make judgements or predictions, such as identifying trends or indicating whether there are differences between samples

### Class Exercise 1.1

Classify each statement as descriptive or inferential.

a) the average December temperature in Sydney has increased by 1 °C in the last 50 years

Descriptive

b) it is expected that the average December temperature in Sydney will increase by another 1 °C within 25 years

Inferential

c) 25% of people surveyed at a shopping centre indicated that they were aware of increasing temperatures in Sydney

Descriptive

d) a survey has shown that 75% of Sydneysiders are ignorant of the changing climatic conditions in their city

Inferential

### Class Exercise 1.2

Identify the sample and the population in the following.

(a) a bottle of water is taken from a dam to be tested

Sample – the water in the bottle

Population – all the water in the dam

(b) the frog population of a large wetland is checked by looking at two separate hectares

Sample – the two hectares

Population – the whole wetland


(c) the levels of lead in fallout around a smelter are assessed by testing a selection of properties

Sample – the selected properties

Population – the whole area around the smelter

(d) people in a shopping centre are asked their opinions … to determine the level of awareness in the community

Sample – the people asked at the shopping centre

Population – the community

### Variables

• a variable is the characteristic being measured

• category – the result of the measurement is a "word", e.g. yes (or no), truck, bird, sparrow, first (or second) etc.

• numerical – the measurement produces a number, which

  • could be limited to certain values (e.g. whole numbers), or

  • could take any value (e.g. the mass of an object)

### Class Exercise 1.3

(a) … – numerical, any value

(b) types of birds observed – category

(c) numbers of birds observed in different locations – numerical, set values

### Presenting & organising data

• large quantities of raw data are not useful for presenting the results of the tests

• they need to be organised so that the results can be shown in a smaller, summarised form, using

• tables

• graphs

• averages

• comparisons

### Tabulating data

• organising data so that it can be evaluated more easily

• generally some sort of table

• category data is usually grouped (tallied)

• the number of times each different category occurs is the recorded result

• tallying can also be used for numerical data, but only when

• the data can take only fixed, pre-known values, and

• there is a large number of data points

• numerical data that can take any value presents a problem

• it must be grouped into ranges

• information is lost, e.g. 0.1 and 4.9 both fall into the 0-5 range
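Outside Excel, the same tally can be sketched in a few lines of Python. This is a minimal sketch that assumes the category observations are already in a list; the bird names are hypothetical example data, not results from any survey.

```python
from collections import Counter

# Hypothetical category observations: bird types recorded on a survey walk
observations = ["sparrow", "magpie", "sparrow", "ibis", "magpie", "sparrow"]

# Counter tallies how many times each category occurs
tally = Counter(observations)

# Print the tally, most frequent category first
for category, count in tally.most_common():
    print(f"{category:<10} {count}")
```

Each category name becomes a row of the tally table and its count is the recorded result.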

### Grouping numerical data

• identify the minimum and maximum values

• decide how many groups are appropriate for the size of the dataset

• determine the groups, which should have equal ranges (for example, 0-5, 6-10 etc., but not 0-5, 6-20)
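The grouping steps above can be sketched in Python, assuming the data can take any value between a known minimum and maximum. The helper names `equal_width_groups` and `group_index` are hypothetical. Here each group includes its lower boundary but not its upper one (the overall maximum goes into the last group), rather than the whole-number style 0-5, 6-10.

```python
def equal_width_groups(minimum, maximum, n_groups):
    """Split the interval from minimum to maximum into n_groups equal ranges.

    Returns (lower, upper) pairs; every group has the same width.
    """
    width = (maximum - minimum) / n_groups
    return [(minimum + i * width, minimum + (i + 1) * width)
            for i in range(n_groups)]


def group_index(value, minimum, maximum, n_groups):
    """Return the 0-based group a value falls into.

    Assumes minimum <= value <= maximum. A group includes its lower boundary
    but not its upper one; the overall maximum is put into the last group.
    """
    width = (maximum - minimum) / n_groups
    index = int((value - minimum) // width)
    return min(index, n_groups - 1)


print(equal_width_groups(0, 10, 2))   # [(0.0, 5.0), (5.0, 10.0)]
print(group_index(0.1, 0, 10, 2), group_index(4.9, 0, 10, 2))   # 0 0
```

The last line shows the information loss mentioned earlier: 0.1 and 4.9 end up in the same group and can no longer be told apart.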

### Class Exercise 1.4

• You have a data set of 100 pH measurements of river water, ranging from 5 to 9. What would be an appropriate way of grouping them?

• 8 ranges of 0.5 pH units

• e.g. 5.00-5.49, 5.50-5.99, …, 8.50-9.00
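A sketch of the Exercise 1.4 grouping in Python. It assumes a short list of hypothetical pH readings (eight rather than 100). Each group includes its lower boundary, and a reading of exactly 9.0 is placed in the last group so that it is not left out.

```python
import math

# Hypothetical pH readings (the exercise's real data set has 100 values)
ph_values = [5.2, 6.7, 7.1, 7.4, 8.9, 9.0, 6.1, 7.0]

LOWER, WIDTH, N_GROUPS = 5.0, 0.5, 8   # 8 groups of 0.5 pH units from 5.0 to 9.0

counts = [0] * N_GROUPS
for ph in ph_values:
    i = math.floor((ph - LOWER) / WIDTH)   # which 0.5-wide group the reading is in
    i = min(i, N_GROUPS - 1)               # a reading of exactly 9.0 goes in the last group
    counts[i] += 1

for i, count in enumerate(counts):
    low = LOWER + i * WIDTH
    print(f"{low:.2f}-{low + WIDTH:.2f}: {count}")
```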

### Frequencies

• the number of times a particular value or range occurs is its frequency

• the spread of data across the range of values is the distribution

• Is it evenly spread across the groups?

• Do certain groups have higher frequencies?

• Is there any pattern?

• frequency should be considered in relation to the total number of data values

relative frequency – the frequency as a proportion (often a percentage) of the total number of data values
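As a worked illustration, relative frequency is each frequency divided by the total number of values. The sketch below assumes hypothetical counts of birds seen at 10 locations and reports relative frequency as a percentage.

```python
from collections import Counter

# Hypothetical data: numbers of birds seen at each of 10 locations
birds_per_site = [2, 3, 2, 0, 1, 2, 3, 2, 1, 2]

frequency = Counter(birds_per_site)   # how often each value occurs
total = len(birds_per_site)           # total number of data values

for value in sorted(frequency):
    relative = frequency[value] / total * 100   # relative frequency as a percentage
    print(f"{value}: frequency {frequency[value]}, relative frequency {relative:.0f}%")
```

For this data the value 2 occurs 5 times out of 10, a relative frequency of 50%.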

### Excel & tally charts

• manually tallying large data sets (counting how many times each value occurs) is boring, tiring and potentially inaccurate

• Excel has some functions which help:

• COUNT(range)

• COUNTIF(range, criterion)

• FREQUENCY(range, groups) – probably more trouble than it's worth

### COUNT()

• returns the number of cells in a range that contain numerical data

• ignores blank cells and non-numerical values

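Example: cells A1:A9 contain the seven numbers 0, 10, 3, 4, 7, 8 and 2, plus the text n/a and one other non-numerical or blank cell, so =COUNT(A1:A9) returns 7.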
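To make the counting rule concrete, here is a rough Python equivalent of what COUNT does with the example column. It is only an illustration, not how Excel works internally. The order of the cells and the use of `None` for the blank cell are assumptions; `excel_like_count` is a hypothetical name.

```python
# Example column A: seven numbers, the text "n/a" and a blank cell (None).
# The cell order is an assumption; it does not affect the count.
column_a = [0, 10, 3, 4, "n/a", 7, 8, None, 2]


def excel_like_count(cells):
    """Count cells holding a number, ignoring blanks (None) and text,
    roughly mirroring =COUNT() on a cell range."""
    return sum(1 for c in cells
               if isinstance(c, (int, float)) and not isinstance(c, bool))


print(excel_like_count(column_a))   # 7
```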

### COUNTIF()

• returns the number of cells that meet a given criterion

• the criterion can use comparisons such as =, > or <

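Example: with the same data in A1:A9 (0, 10, 3, 4, 7, 8, 2, n/a and one non-numerical or blank cell), =COUNTIF(A1:A9,">5") returns 3, because only 10, 7 and 8 are greater than 5. The criterion goes inside double quotes.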
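A rough Python sketch of a numeric COUNTIF criterion such as ">5". It handles only numbers and the simple operators >=, <=, >, < and =; real Excel criteria support more (text, wildcards, "<>"), so treat this as an illustration under those assumptions. `countif_numeric` is a hypothetical name.

```python
import operator

# Example column A: seven numbers, the text "n/a" and a blank cell (None).
# The cell order is an assumption; it does not affect the count.
column_a = [0, 10, 3, 4, "n/a", 7, 8, None, 2]

# Two-character operators are listed first so ">=" is not read as ">".
OPERATORS = [(">=", operator.ge), ("<=", operator.le),
             (">", operator.gt), ("<", operator.lt), ("=", operator.eq)]


def countif_numeric(cells, criterion):
    """Count numeric cells meeting a criterion such as ">5" (numbers only)."""
    for symbol, compare in OPERATORS:
        if criterion.startswith(symbol):
            threshold = float(criterion[len(symbol):])
            break
    else:
        raise ValueError(f"unsupported criterion: {criterion!r}")
    return sum(1 for c in cells
               if isinstance(c, (int, float)) and not isinstance(c, bool)
               and compare(c, threshold))


print(countif_numeric(column_a, ">5"))   # 3 (the values 10, 7 and 8)
```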

### FREQUENCY()

• tallies data into user-chosen groups

• entered as an array formula

• highlight the group of cells where you want the frequencies to appear

• type in the formula, then press the key combination CTRL+SHIFT+ENTER

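Example: with the same data in A1:A9, the upper limits of the groups 0-5 and 6-10 (5 and 10) are entered in B6:B7. Entering =FREQUENCY(A1:A9,B6:B7) as an array formula gives the frequencies 4 (values 0-5) and 3 (values 6-10).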
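A Python sketch of what FREQUENCY calculates, assuming the group boundaries are upper limits in ascending order. Excel's result has one more element than there are boundaries; the extra, final count is for values above the last boundary. That is why this sketch returns three counts for two boundaries, although only two cells were highlighted in the example. `excel_like_frequency` is a hypothetical name.

```python
def excel_like_frequency(cells, bins):
    """Sketch of =FREQUENCY(data, bins).

    bins are upper group limits in ascending order (5 and 10 in the example).
    Returns len(bins) + 1 counts; the last is for values above the final limit.
    Text and blank (None) cells are ignored.
    """
    numbers = [c for c in cells
               if isinstance(c, (int, float)) and not isinstance(c, bool)]
    counts = [0] * (len(bins) + 1)
    for x in numbers:
        for i, upper in enumerate(bins):
            if x <= upper:          # first group whose upper limit covers x
                counts[i] += 1
                break
        else:
            counts[-1] += 1         # larger than every limit
    return counts


column_a = [0, 10, 3, 4, "n/a", 7, 8, None, 2]   # assumed cell order
print(excel_like_frequency(column_a, [5, 10]))    # [4, 3, 0]
```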

### Two-way frequency tables

One sample set – two variables (sex of koala and general state of health)

| General state of health | Male | Female |
|---|---|---|
| Healthy | 45 | 28 |
| Ill | 21 | 9 |

Two sample sets (type of parkland) – one variable (origin of plant)

| Origin of plant | Urban | Undeveloped |
|---|---|---|
| Native | 37% | 65% |
| Introduced | 51% | 20% |
| Not identified | 12% | 15% |
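A two-way table is just a tally of pairs of values. The sketch below assumes hypothetical raw records of (sex, state of health) for individual koalas, counts each combination, and then converts each column to percentages in the style of the parkland table.

```python
from collections import Counter

# Hypothetical paired observations: (sex, state of health) for each koala
koalas = [("male", "healthy"), ("female", "ill"), ("male", "healthy"),
          ("female", "healthy"), ("male", "ill"), ("female", "healthy")]

# Count each (sex, health) combination: these are the cells of the table
table = Counter(koalas)

SEXES = ("male", "female")
STATES = ("healthy", "ill")

print(f"{'':<8}", *(f"{s:>7}" for s in SEXES))
for health in STATES:
    print(f"{health:<8}", *(f"{table[(s, health)]:>7}" for s in SEXES))

# Column percentages, as in the parkland table (each column sums to 100%)
column_pct = {
    (s, h): 100 * table[(s, h)] / sum(table[(s, x)] for x in STATES)
    for s in SEXES for h in STATES
}
```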

### The typical value

• represents all the data values with one or two numbers

• average – a single value representing the "typical" value

• variation – how much spread there is in the data set

• category variables – the average is the class with the highest frequency (the mode)

• variation cannot be measured

• numerical variables

• mean – what we normally refer to as the average

• mode – the most common value (often used for grouped data)

• median – the value in the middle when the data are arranged in order

• range – highest value minus lowest value

• standard deviation – a measure of spread, calculated from the differences of all data points from the mean

• the mean and standard deviation are normally used for scientific data
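The measures above can be checked with Python's standard `statistics` module. This is a sketch on hypothetical measurements. `statistics.stdev` gives the sample standard deviation (dividing by n - 1, like Excel's STDEV.S); `statistics.pstdev` would give the population version.

```python
import statistics

# Hypothetical measurements (for example, masses in grams)
data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.4, 12.2]

mean = statistics.mean(data)            # the usual "average"
median = statistics.median(data)        # middle value once sorted
mode = statistics.mode(data)            # most common value
value_range = max(data) - min(data)     # highest minus lowest
sd = statistics.stdev(data)             # sample standard deviation (divides by n - 1)

print(f"mean {mean:.2f}, median {median}, mode {mode}, "
      f"range {value_range:.1f}, std dev {sd:.2f}")
```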

### Assignment 1

• large amount of data

• simple formulas required

• all questions and directions are contained in the Excel spreadsheet