
Sci Info Skills

**STATISTICS**

Data Collection → Data Organisation → Data Interpretation → Data Presentation

*Descriptive*

• the processed data provides a summary of the observations/measurements, such as averages, variations and graphs

*Inferential*

• the processed data is used to make judgements or predictions, such as trends, indications of variations between different samples

a) the average December temperature in Sydney has increased by 1 °C in the last 50 years – *Descriptive*

b) it is expected that the average December temperature in Sydney will increase by another 1 °C within 25 years – *Inferential*

c) 25% of people surveyed at a shopping centre indicated that they were aware of increasing temperatures in Sydney – *Descriptive*

d) a survey has shown that 75% of Sydneysiders are ignorant of the changing climatic conditions in their city – *Inferential*

*Identify the sample and the population in the following.*

(a) a bottle of water is taken from a dam to be tested

**Sample – the water in the bottle**

**Population – all the water in the dam**

(b) the frog population of a large wetland is checked by looking at two separate hectares

**Sample – the two hectares**

**Population – the whole wetlands**

(c) the levels of lead in fallout around a smelter are assessed by testing a selection of properties

**Sample – the selected properties**

**Population – the whole area**

(d) people in a shopping centre are asked their opinions … to determine the level of awareness in the community

**Sample – the people asked**

**Population – the community**

• characteristic being measured

• **category** – result of measurement is a “word”, e.g. yes (or no), truck, bird, sparrow, first (or second) etc.

• **numerical** – measurement produces a number

• could be limited to certain values (e.g. whole numbers)

• any value (e.g. mass of an object)

**Exercise 1.3**

(a) lead levels in fallout – *numerical – any value*

(b) types of birds observed – *category*

(c) numbers of birds observed in different locations – *numerical – set values*

• large quantities of raw data are not useful for presenting the results of the tests

• they need to be organised to show the results on a smaller scale

• tables

• graphs

• averages

• comparisons

• organising it so that it can be evaluated more easily

• generally some sort of table

• category data is most usually grouped (tallied)

• the number of times each different category occurs is the recorded result

• can also be used where the data is numerical

• only with fixed and pre-known values

• a large number of data points

• numerical (all values) data presents a problem

• must be grouped into ranges

• information is lost, e.g. 0.1 and 4.9 both fit into 0-5 range

• identify the minimum and maximum values

• decide how many groups are appropriate for the size of the dataset

• determine the groups (which should be equivalent ranges – for example, 0-5, 6-10 etc., but not 0-5, 6-20)

**Class Exercise 1.4**

*• You have a data set of 100 pH measurements of river water, ranging from 5 to 9. What would be an appropriate way of grouping them?*

• 8 ranges of 0.5

• e.g. 5.0-5.49, 5.50-5.99 etc
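The grouping steps can be tried out in a few lines of Python. This is only a sketch: the function name `group_into_ranges` and the pH readings are made up for illustration, and the maximum value (pH 9.0) is assumed to belong in the last range.

```python
def group_into_ranges(values, start, width, n_groups):
    """Count how many values fall into each of n_groups equal-width ranges.

    Assumes every value is at least `start`.
    """
    counts = [0] * n_groups
    for v in values:
        index = int((v - start) // width)
        # Assumption: the maximum value (e.g. pH 9.0) goes into the last range
        index = min(index, n_groups - 1)
        counts[index] += 1
    return counts


# Made-up pH readings, for illustration only
ph_values = [5.0, 5.3, 5.6, 6.1, 6.4, 6.4, 7.0, 7.2, 7.9, 8.5, 9.0]

counts = group_into_ranges(ph_values, start=5.0, width=0.5, n_groups=8)

for i, count in enumerate(counts):
    low = 5.0 + i * 0.5
    print(f"from {low:.1f} up to {low + 0.5:.1f}: {count}")
```

The counts always add up to the number of readings, which is a quick check that no value has been left out of the groups.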

• number of times a particular value or range occurs is the **frequency**

• spread of data across the range of values is the **distribution**

• Is it evenly spread across the groups?

• Do certain groups have higher frequencies?

• Is there any pattern?

• frequency should be considered in relation to the total number of data values

• **relative frequency** – the frequency expressed as a proportion (often a percentage) of the total dataset
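As a rough illustration of frequency and relative frequency for category data, the sketch below tallies a short list of bird sightings. The sightings are invented for the example; `Counter` is from Python's standard library.

```python
from collections import Counter

# Made-up bird sightings, for illustration only
observations = ["sparrow", "magpie", "sparrow", "ibis",
                "sparrow", "magpie", "ibis", "sparrow"]

# Frequency: how many times each category occurs
frequency = Counter(observations)

# Relative frequency: each frequency as a proportion of the total
total = len(observations)
relative_frequency = {bird: count / total for bird, count in frequency.items()}

for bird, count in frequency.most_common():
    print(f"{bird}: frequency {count}, relative frequency {relative_frequency[bird]:.0%}")
```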

• manually tallying large data sets (counting how many times each value occurs) is boring, tiring and potentially inaccurate

• Excel has some functions which help:

• COUNT(cell range)

• COUNTIF(range, criterion)

• FREQUENCY(range, group) – probably more trouble than it’s worth

• COUNT returns the total number of cells in the range that contain numerical data

• ignores blank cells and nonnumerical values

Example: if cells A1:A9 contain 0, n/a, ***, 10, 3, 4, 7, 8 and 2 (in any order), then =COUNT(A1:A9) returns 7.
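For comparison, the same count can be sketched in Python using the nine cell values from the COUNT example. The helper `count_numeric` is a hypothetical name, and it only loosely mirrors COUNT: it treats `None` as a blank cell and skips any text.

```python
def count_numeric(cells):
    """Count the cells that hold a number, skipping blanks (None) and text."""
    return sum(
        1
        for cell in cells
        if isinstance(cell, (int, float)) and not isinstance(cell, bool)
    )


# The nine cell values from the COUNT example
column_a = [0, "n/a", "***", 10, 3, 4, 7, 8, 2]

print(count_numeric(column_a))  # prints 7, matching =COUNT(A1:A9)
```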

• COUNTIF returns the number of cells meeting a given criterion

• criteria include =, > or <

Example: if cells A1:A9 contain 10, 3, n/a, ***, 4, 0, 7, 8 and 2 (in any order), then =COUNT(A1:A9) returns 7 and =COUNTIF(A1:A9,">5") returns 3.
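A conditional count along the lines of COUNTIF might look like this in Python. It is a sketch only: `count_if` is a hypothetical helper, and the criterion is passed as a small function instead of a text string such as ">5".

```python
def count_if(cells, criterion):
    """Count the numeric cells for which criterion(cell) is true."""
    return sum(
        1
        for cell in cells
        if isinstance(cell, (int, float)) and criterion(cell)
    )


# The nine cell values from the COUNTIF example
column_a = [10, 3, "n/a", "***", 4, 0, 7, 8, 2]

greater_than_5 = count_if(column_a, lambda x: x > 5)
print(greater_than_5)  # prints 3 (the values 10, 7 and 8)
```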

• FREQUENCY tallies data into user-chosen groups

• entered as an array formula

• highlight a group of cells where you want the frequencies to appear

• type in the formula and then hit the key combination CTRL+SHIFT+ENTER

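Example: cells A1:A9 contain 0, n/a, ***, 10, 3, 4, 7, 8 and 2 (in any order), and cells B6:B7 contain 5 and 10, the upper values for the groups 0-5 and 6-10. The array formula =FREQUENCY(A1:A9,B6:B7) returns 4 for the group 0-5 and 3 for the group 6-10.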
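The tallying that FREQUENCY performs can be sketched in Python as follows. The function name `frequency` and its behaviour are an approximation written for this example: it assumes the upper limits are given in increasing order, skips non-numerical cells, and (like the Excel function) adds one extra count at the end for values above the highest limit.

```python
def frequency(values, upper_limits):
    """Tally numeric values into groups defined by increasing upper limits.

    Returns one count per group, plus a final count for values
    larger than the highest upper limit.
    """
    counts = [0] * (len(upper_limits) + 1)
    for v in values:
        if not isinstance(v, (int, float)):
            continue  # skip text such as "n/a"
        for i, limit in enumerate(upper_limits):
            if v <= limit:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # larger than every upper limit
    return counts


# The nine cell values from the FREQUENCY example
column_a = [0, "n/a", "***", 10, 3, 4, 7, 8, 2]

print(frequency(column_a, [5, 10]))  # prints [4, 3, 0]
```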

One sample set – two variables

Two-way table of general state of health (Healthy, Ill) against sex of koala (Male, Female), with cell counts 45, 21, 28 and 9.

Two sample sets – one variable

| Origin of plant | Urban parkland | Undeveloped parkland |
|---|---|---|
| Native | 37% | 65% |
| Introduced | 51% | 20% |
| Not identified | 12% | 15% |

• represents all the data values with one or two summary values

• average – some way of representing the “most common” value

• variation – how much spread there is in data set

• category variables – class with highest frequency (mode)

• variation cannot be measured

• numerical variables

• mean – what we normally refer to as average

• mode – most common value (used in grouped data)

• median – the value in the middle when arranged in order

• range – highest value minus lowest value

• standard deviation – a measure of how far the data points typically lie from the mean

• mean & std dev normally used in scientific data
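The averages and measures of variation listed above can be calculated with Python's standard `statistics` module. The data values are made up for illustration, and the sketch uses the sample standard deviation (`statistics.stdev`), which is an assumption about which version of the formula is wanted.

```python
import statistics

# Made-up measurements, for illustration only
data = [2, 4, 4, 5, 7, 8, 12]

mean = statistics.mean(data)        # sum of the values divided by how many there are
median = statistics.median(data)    # middle value when arranged in order
mode = statistics.mode(data)        # most common value
data_range = max(data) - min(data)  # highest value minus lowest value
std_dev = statistics.stdev(data)    # sample standard deviation

print(f"mean = {mean}, median = {median}, mode = {mode}")
print(f"range = {data_range}, standard deviation = {std_dev:.2f}")
```

For this data set the mean is 6, the median 5, the mode 4 and the range 10; the standard deviation is the square root of 11, about 3.32.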

• large amount of data

• simple formulas required

• all questions and directions contained in Excel spreadsheet