
**STATISTICS**

• Data Collection

• Data Organisation

• Data Interpretation

• Data Presentation

• *Descriptive* – the processed data provides a summary of the observations/measurements, such as averages, variations and graphs

• *Inferential* – the processed data is used to make judgements or predictions, such as trends or indications of variations between different samples

*Classify each of the following as descriptive or inferential.*

a) the average December temperature in Sydney has increased by 1 °C in the last 50 years

*Descriptive*

b) it is expected that the average December temperature in Sydney will increase by another 1 °C within 25 years

*Inferential*

c) 25% of people surveyed at a shopping centre indicated that they were aware of increasing temperatures in Sydney

*Descriptive*

d) a survey has shown that 75% of Sydneysiders are ignorant of the changing climatic conditions in their city

*Inferential*

*Identify the sample and the population in the following.*

(a) a bottle of water is taken from a dam to be tested

**Sample – the water in the bottle**

**Population – all the water in the dam**

(b) the frog population of a large wetland is checked by looking at two separate hectares

**Sample – the two hectares**

**Population – the whole wetland**

(c) the levels of lead in fallout around a smelter are assessed by testing a selection of properties

**Sample – the selected properties**

**Population – the whole area around the smelter**

(d) people in a shopping centre are asked their opinions … to determine the level of awareness in the community

**Sample – the people asked**

**Population – the community**

• variable – the characteristic being measured

• **category** – the result of measurement is a "word", e.g. yes (or no), truck, bird, sparrow, first (or second), etc.

• **numerical** – measurement produces a number

  • could be limited to certain values (e.g. whole numbers)

  • or could take any value (e.g. mass of an object)

**Exercise 1.3**

(a) lead levels in fallout

*numerical – any value*

(b) types of birds observed

*category*

(c) numbers of birds observed in different locations

*numerical – set values*

• large quantities of raw data are not useful for presenting the results of the tests

• they need to be organised to show the results in a more compact form, using:

  • tables

  • graphs

  • averages

  • comparisons


• organising data so that it can be evaluated more easily, generally in some sort of table

• category data is most usually grouped (tallied): the number of times each different category occurs is the recorded result

• tallying can also be used where the data is numerical, but only with fixed and pre-known values

• a large number of data points of numerical (all values) data presents a problem:

  • the data must be grouped into ranges

  • information is lost, e.g. 0.1 and 4.9 both fit into the 0-5 range
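Tallying category data is the same as counting how often each category occurs. A minimal sketch in Python using the standard-library `collections.Counter`; the bird observations are made-up illustrative values, not data from these notes:

```python
from collections import Counter

# Hypothetical category data: birds recorded on a survey walk (illustrative only)
observations = ["sparrow", "magpie", "sparrow", "ibis", "magpie", "sparrow"]

# Counter tallies how many times each category occurs
tally = Counter(observations)

# most_common() lists the categories from highest to lowest frequency
for category, frequency in tally.most_common():
    print(f"{category:10s} {frequency}")
```

Sorting by frequency with `most_common()` also makes the class with the highest frequency (the mode) easy to spot.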


• identify the minimum and maximum values

• decide how many groups are appropriate for the size of the dataset

• determine the groups, which should be equivalent ranges (for example, 0-5, 6-10, etc., but not 0-5, 6-20)
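The grouping steps can be sketched in Python as below. This is one possible approach under stated assumptions: `make_groups` and `tally_groups` are hypothetical helper names, each range is treated as "from the lower limit up to, but not including, the upper limit" (with the maximum value placed in the last range), and the data are the numeric values used in the Excel examples later in these notes.

```python
def make_groups(values, n_groups):
    """Split [min, max] of the data into n_groups equal-width ranges.

    Returns a list of (lower, upper) pairs.
    """
    low, high = min(values), max(values)        # step 1: minimum and maximum
    width = (high - low) / n_groups             # equal widths for every group
    # use the true maximum as the last edge to avoid floating-point drift
    edges = [low + i * width for i in range(n_groups)] + [high]
    return list(zip(edges[:-1], edges[1:]))


def tally_groups(values, groups):
    """Count values per group; a value on a shared limit goes into the
    higher group, and the maximum value goes into the last group."""
    counts = [0] * len(groups)
    last = len(groups) - 1
    for v in values:
        for i, (lower, upper) in enumerate(groups):
            if lower <= v < upper or (i == last and v == upper):
                counts[i] += 1
                break
    return counts


data = [0, 10, 3, 4, 7, 8, 2]
groups = make_groups(data, 2)   # step 2: 2 groups chosen for 7 values
print(groups)
print(tally_groups(data, groups))
```

Whatever boundary rule is chosen, it should be applied consistently so that every value falls into exactly one group.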


**Class Exercise 1.4**

*You have a data set of 100 pH measurements of river water, ranging from 5 to 9. What would be an appropriate way of grouping them?*

• 8 ranges of 0.5 (the 4 pH units from 5 to 9 split into 8 equal groups), e.g. 5.00-5.49, 5.50-5.99, etc.
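For the pH exercise, the range each reading belongs to can be computed rather than looked up. A sketch, assuming readings are plain numbers between 5 and 9 and that a reading of exactly 9.0 is counted in the top range; `ph_group` and `tally_ph` are hypothetical names, and the six readings are invented stand-ins for the 100 measurements:

```python
def ph_group(ph, start=5.0, width=0.5, n_groups=8):
    """Index (0 to n_groups-1) of the 0.5-wide pH range that contains ph.

    Ranges are 5.00-5.49, 5.50-5.99, ..., 8.50-9.00; a pH of exactly 9.0
    is put in the last range rather than opening a ninth one.
    """
    top = start + width * n_groups
    if not start <= ph <= top:
        raise ValueError(f"pH {ph} is outside {start}-{top}")
    return min(int((ph - start) // width), n_groups - 1)


def tally_ph(readings):
    """Frequency of readings in each of the 8 pH ranges."""
    counts = [0] * 8
    for ph in readings:
        counts[ph_group(ph)] += 1
    return counts


# Hypothetical readings standing in for the 100 river-water measurements
readings = [5.2, 6.8, 7.1, 7.3, 9.0, 5.5]
for i, count in enumerate(tally_ph(readings)):
    lower = 5.0 + 0.5 * i
    upper = 9.0 if i == 7 else lower + 0.49
    print(f"{lower:.2f}-{upper:.2f}: {count}")
```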

• the number of times a particular value or range occurs is the **frequency**

• the spread of data across the range of values is the **distribution**

  • Is it evenly spread across the groups?

  • Do certain groups have higher frequencies?

  • Is there any pattern?

• frequency should be considered in relation to the total number of data values

**relative frequency** – the frequency expressed as a proportion (often as a percentage) of the total dataset
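Relative frequency is each frequency divided by the total, often multiplied by 100. A short Python sketch (the helper name `relative_frequencies` is hypothetical), using the group frequencies 4 and 3 from the Excel FREQUENCY example in these notes:

```python
def relative_frequencies(frequencies):
    """Convert a {group: frequency} mapping into percentages of the total."""
    total = sum(frequencies.values())
    return {group: 100 * f / total for group, f in frequencies.items()}


# Frequencies of the groups 0-5 and 6-10 from the Excel FREQUENCY example
freq = {"0-5": 4, "6-10": 3}
total = sum(freq.values())
for group, pct in relative_frequencies(freq).items():
    print(f"{group}: {freq[group]} of {total} = {pct:.1f}%")
```

Here 4 of the 7 values (about 57.1%) fall in the 0-5 group; the percentages always add up to 100.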

• manually tallying large data sets (counting how many occurrences there are of each value) is boring, tiring and potentially inaccurate

• Excel has some functions which help:

  • COUNT(cell range)

  • COUNTIF(range, criterion)

  • FREQUENCY(range, groups) – probably more trouble than it's worth

• **COUNT** returns the total number of cells with numerical data; it ignores blank cells and non-numerical values

e.g. with the numbers 0, 2, 3, 4, 7, 8 and 10 and the text entries n/a and *** in cells A1:A9, =COUNT(A1:A9) returns **7**
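For comparison outside Excel, the behaviour described for COUNT can be mimicked in Python. This is an analogue, not Excel itself: blank cells are modelled as `None`, text as strings, the function names are hypothetical, and the order of the cells in column A is assumed (only which values are present affects the count).

```python
def is_number(cell):
    # bool is a subclass of int in Python, so exclude True/False explicitly
    return isinstance(cell, (int, float)) and not isinstance(cell, bool)


def count(cells):
    """Rough analogue of Excel's COUNT: only numeric cells are counted;
    blanks (None) and text such as "n/a" are ignored."""
    return sum(1 for cell in cells if is_number(cell))


# Column A from the example; the cell order is an assumption
column_a = [0, "n/a", "***", 10, 3, 4, 7, 8, 2]
print(count(column_a))
```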

• **COUNTIF** returns the number of cells meeting a given criterion; criteria can include =, > or <

e.g. with the same data in A1:A9, =COUNTIF(A1:A9,">5") returns **3** (the values 7, 8 and 10)
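A simplified Python analogue of COUNTIF, restricted to numeric criteria written as strings such as `">5"`. Real COUNTIF also handles text and wildcard criteria, which this sketch does not attempt; the operator table and function name are hypothetical, and the cell order is again assumed.

```python
import operator

# Two-character operators are listed first so ">=" is not read as ">"
OPERATORS = [(">=", operator.ge), ("<=", operator.le), ("<>", operator.ne),
             (">", operator.gt), ("<", operator.lt), ("=", operator.eq)]


def countif(cells, criterion):
    """Simplified COUNTIF for numeric criteria such as ">5" or "=3".

    Text and blank cells are skipped; a bare number means "equal to".
    """
    for symbol, op in OPERATORS:
        if criterion.startswith(symbol):
            target = float(criterion[len(symbol):])
            break
    else:
        op, target = operator.eq, float(criterion)
    return sum(
        1 for cell in cells
        if isinstance(cell, (int, float)) and not isinstance(cell, bool)
        and op(cell, target)
    )


column_a = [0, "n/a", "***", 10, 3, 4, 7, 8, 2]  # cell order assumed
print(countif(column_a, ">5"))
```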

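• **FREQUENCY** tallies data into user-chosen groups

• it is entered as an array formula:

  • highlight the group of cells where you want the frequencies to appear

  • type in the formula and then press the key combination CTRL+SHIFT+ENTER (in newer versions of Excel with dynamic arrays, ENTER alone is enough)

e.g. with the same data in A1:A9 and the upper limits of the groups 0-5 and 6-10 (the values 5 and 10) in B6:B7, =FREQUENCY(A1:A9,B6:B7) returns **4** and **3** (FREQUENCY also returns one extra count, for values above the highest limit, which here is 0)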
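FREQUENCY's grouping rule (each count covers values up to and including a group's upper limit, plus one extra count for values above the highest limit) can be sketched with Python's standard `bisect` module. `frequency` is a hypothetical name, the sketch sorts the limits before use, and the cell order is assumed:

```python
from bisect import bisect_left


def frequency(cells, upper_limits):
    """Sketch of FREQUENCY-style tallying.

    counts[0] is values <= the first limit, counts[i] is values above
    limit i-1 and <= limit i, and the final extra entry counts values
    above the highest limit. Text and blank cells are ignored.
    """
    limits = sorted(upper_limits)
    counts = [0] * (len(limits) + 1)
    for cell in cells:
        if isinstance(cell, (int, float)) and not isinstance(cell, bool):
            # index of the first limit >= cell, i.e. the group it falls in
            counts[bisect_left(limits, cell)] += 1
    return counts


column_a = [0, "n/a", "***", 10, 3, 4, 7, 8, 2]  # cell order assumed
print(frequency(column_a, [5, 10]))
```

With limits 5 and 10 the result has three entries; the last is 0 because no value exceeds 10.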

One sample set – two variables

*General state of health by sex of koala*

| Sex of koala | Healthy | Ill |
| --- | --- | --- |
| Male | 45 | 21 |
| Female | 28 | 9 |
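Comparing groups of different sizes in a two-way table is easier with row percentages. A sketch in Python, assuming the koala table is read with sex as rows and state of health as columns (so 66 males and 37 females); `row_percentages` is a hypothetical helper:

```python
# Counts read from the koala table, assuming rows are sex and columns health
koalas = {
    "Male": {"Healthy": 45, "Ill": 21},
    "Female": {"Healthy": 28, "Ill": 9},
}


def row_percentages(table):
    """Express each row of a two-way table as percentages of its row total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result


for sex, pcts in row_percentages(koalas).items():
    print(sex, {health: round(p, 1) for health, p in pcts.items()})
```

Under that reading about 31.8% of males and 24.3% of females were ill. The parkland table is already in this form: each of its rows sums to 100%.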

Two sample sets – one variable

*Origin of plant by type of parkland*

| Type of parkland | Native | Introduced | Not identified |
| --- | --- | --- | --- |
| Urban | 37% | 51% | 12% |
| Undeveloped | 65% | 20% | 15% |

• summary measures represent all the data values with one or two numbers:

  • average – some way of representing the "most common" value

  • variation – how much spread there is in the data set

• category variables

  • average – the class with the highest frequency (mode)

  • variation cannot be measured

• numerical variables

  • mean – what we normally refer to as the average

  • mode – most common value (used with grouped data)

  • median – the value in the middle when the data are arranged in order

  • range – highest value minus lowest value

  • standard deviation – a measure of how far the data points typically lie from the mean

  • mean and standard deviation are normally used for scientific data
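The summary statistics listed above can be checked with Python's standard `statistics` module. A sketch using the seven numeric values from the Excel examples; the bird list is hypothetical. `stdev` gives the sample standard deviation and `pstdev` the population one (Excel's STDEV.S and STDEV.P make the same distinction):

```python
import statistics

# Numeric values from the Excel examples (column A without the text cells)
values = [0, 10, 3, 4, 7, 8, 2]

mean = statistics.mean(values)             # sum divided by count
median = statistics.median(values)         # middle value once sorted
value_range = max(values) - min(values)    # highest minus lowest
sample_sd = statistics.stdev(values)       # sample standard deviation
population_sd = statistics.pstdev(values)  # treats the data as the whole population

# For category data, the mode (most frequent class) is the usual "average"
birds = ["sparrow", "magpie", "sparrow", "ibis"]  # hypothetical
most_common_bird = statistics.mode(birds)

print(f"mean={mean:.3f} median={median} range={value_range}")
print(f"sample sd={sample_sd:.3f} population sd={population_sd:.3f}")
print(f"mode of birds={most_common_bird}")
```

For these values the mean is 34/7 ≈ 4.857, the median is 4 and the range is 10.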

• large amount of data

• simple formulas required

• all questions and directions are contained in the Excel spreadsheet