Module Title: Business Decision Making
Module Code: BA 5001
Session: 2012/13
Teaching Period: Year long
Janet Geary 2012
Module Booklet Contents

Welcome to Business Decision Making

Details of the staff teaching team

Module Leaders: Janet Geary and Maurice Pratt

Office Location:
Janet Geary: Stapleton House SH317
Maurice Pratt: Calcutta House CM147

Email:
j.geary@londonmet.ac.uk
m.pratt@londonmet.ac.uk

Telephone:
Janet: 0207 133 3839
Maurice: 0207 320 3270

Office Hours:
Maurice: Monday 11am to 12, Thursday 12 to 1pm
Janet:
MODULE SPECIFICATION

1. Module title: Business Decision Making
2. Module code: BA 5001
3. Module Level: Level 5
4. Module Leader: Janet Geary & Maurice Pratt
5. Faculty: Business School
6. Teaching site(s) for course: cross-campus
7. Teaching period: year long (30 weeks)
8. Teaching mode: Day
9. Module Type: Year long
10. Credit rating: 30
11. Prerequisites and corequisites: BA 4002 Managing Information and Accounting
12. Module description
The Business Decision Making module is directed at students who are following a number of
different business degree programmes and requires that students have previously completed
the Level 4 module Managing Information and Accounting or its equivalent. The year-long
module is designed to enable students to understand the role of quantitative and statistical
techniques in managerial decision making and also to familiarise students with management
accounting concepts and techniques with an emphasis on decision making.
Assessment: Data analysis and reporting on findings 30%, In-class tests 30%, and integrated
assignment 40%
13. Module aims
The aims of the module are:
• To provide an introduction to business decision making through the application of selected quantitative/statistical techniques and associated specialist software.
• To develop an understanding of how such analyses fit into the wider business and management context.
• To enable students to assess the reliability and usefulness of any information generated by the analysis and hence justify decisions made.
• To develop students’ understanding of the major uses of accounting information by management in problem-solving, decision-making and planning and control.
• To familiarise students with the decision-making framework for internal users of accounting information in short-term decision-making.
• To examine alternative techniques in decision-making situations, including capital investment appraisal.
• To enable students to design spreadsheet models and interpret the managerial accounting information outputs of such models.
• To prepare students, where appropriate, for Level 6 project work in this area.
14. Module learning outcomes
On successful completion of this module, students will be able to:
• Understand how uncertainty can be built into business decision making.
• Apply linear programming techniques to optimise constrained decision choices.
• Use multiple regression analysis to model business problems.
• Apply a range of approaches to collect and analyse survey data.
• Apply selected techniques to manage projects.
• Use specialist software (e.g. SPSS, MS Project) to support business decision making.
• Account for short-term decisions, allocating costs utilising traditional methods and activity-based costing systems.
• Calculate and interpret accounting information for relevant costs and benefits in short-term decision-making situations.
• Demonstrate knowledge and understanding of the budgeting process and how to construct budgets, including the importance of behavioural implications.
• Calculate and communicate financial and non-financial measures of attractiveness in investment appraisal decisions and understand the role of cost of capital.
15. Syllabus
Survey methods: sampling, collection and analysis, using SPSS; statistical hypothesis tests.
Managing projects: scheduling for efficient resourcing; using MSProject.
Using SPSS to model relationships between variables - multiple regression, hypothesis tests.
Linear programming models in situations of constrained optimisation; using QSB (or
equivalent).
The value of accounting information and theories of decision-making.
Accounting for short-term decisions, the nature and classification of costs, relevant costing.
Application of cost-volume-profit (CVP) analysis in decision-making for organisations and risk
and uncertainty in decision-making.
The budgeting process, responsibility accounting and the effects of budgeting on motivation.
Accounting for long-term decisions: objectives of capital budgeting; the notion of time value of
money; compounding and discounting.
Determination of a company’s cost of capital and capital rationing, methods of evaluating
investment projects: Payback, ARR, NPV and IRR.
16. Assessment Strategy
The assessment will be in three parts. Formal assessment will comprise:
Assessment 1 (30%): Students will be presented with case study material that enables them to
model management data in order to facilitate decision making. Students will be required to
report on the results of their analysis.
Assessment 2 (30%): In-class tests. Students will be set a number of in-class tests focussing on
Management Accounting.
Assessment 3 (40%): Integrated assignment, combining decision-making and project
management.
17. Summary description of assessment items

Assessment Type | Description of item | Weighting | Week Due
CWK | Report on the findings of data analysis | 30% | 13
CWK | In-class tests | 30% | 19, 23, 27
CWK | Report on a case study involving decision-making and project management | 40% | 30
18. Learning and teaching

19. Learning and Teaching strategy for the module, including approach to blended learning, students’ study responsibilities and opportunities for reflective learning/PDP

The teaching will consist of 2½-hour blocks per week, some of which will be in IT labs and some in classrooms. This will enable students to develop skills in the software appropriate to the area of the syllabus being covered.
Bibliography

Indicative bibliography and key online resources

Atrill, P. and McLaney, E. (2010) Management Accounting for Decision Makers, 6th edition, Financial Times Press.
Bryman, A. and Cramer, D. (2011) Quantitative Data Analysis with IBM SPSS 17, 18 & 19 – A Guide for Social Scientists, Routledge.
Dewhurst, F. (2006) Quantitative Methods for Business and Management, 2nd edition, McGraw-Hill.
Drury, C. (2009) Management Accounting for Business, 4th edition, Cengage Learning EMEA.
Horngren, C., Bhimani, A., Datar, S. and Foster, G. (2012) Management and Cost Accounting, 5th edition, Financial Times Press.
Oakshott, L. (2012) Essential Quantitative Methods, 5th edition, Palgrave.
Proctor, R. (2009) Managerial Accounting for Business Decisions, 3rd edition, Financial Times Press.
Render, B., Stair, R. and Hanna, M. (2012) Quantitative Analysis for Management, 11th edition, Pearson.
Rowntree, D. (2004) Statistics Without Tears: An Introduction for Non-Mathematicians, Penguin Books.
Swift, L. and Piff, S. (2010) Quantitative Methods for Business, Management and Finance, 3rd edition, Palgrave.
Wisniewski, M. (2009) Quantitative Methods for Decision Makers, 5th edition, Financial Times/Prentice Hall.
20. Approved to run from: September 2012

21. Module multivalency:

22. Subject Standards Board:
Weekly Programme: Lecture Topics – Seminar/Workshop/Practical details

Week 1: Introduction to the Module; setting up data files in SPSS
Week 2: Normal Distribution; producing graphs and descriptive statistics in SPSS
Week 3: Hypothesis testing: chi-squared; chi-squared using SPSS
Week 4: Hypothesis testing: significance tests for means; t-test for a population mean and two-sample t-test using SPSS
Week 5: Confidence intervals; using SPSS for confidence intervals
Week 6: Correlation and regression; using SPSS for bivariate correlation and regression
Week 7: Student Activity week
Week 8: Multiple regression; using SPSS for multiple regression
Week 9: Interpreting multiple regression output
Week 10: Graphical linear programming; using WinQSB for LP
Week 11: Linear programming: shadow prices and sensitivity analysis; using WinQSB for LP
Week 12: Linear programming: extension to more than two variables; using WinQSB for LP
Week 13: Project management; using MS Project. Coursework 1 (30%).
Week 14: Project management; using MS Project
Week 15: Project management; using MS Project
Week 16: Introduction to accounting for management decisions
Week 17: Cost behaviour and cost-volume-profit (CVP) analysis
Week 18: Product costing – traditional overhead costing and activity-based costing (ABC) systems
Week 19: Management Accounting Test One. Management Accounting Test 1 (10%).
Week 20: Student Activity week
Week 21: Budgeting I – budgeting process and the preparation of sales and operational budgets
Week 22: Budgeting II – preparation of cash budgets and the master budget
Week 23: Management Accounting Test Two. Management Accounting Test 2 (10%).
Week 24: Relevant costs and benefits for decisions
Week 25: Relevant information for operating decisions: short-term decisions with scarce resources
Week 26: Risk and uncertainty in decision-making
Week 27: Management Accounting Test Three. Management Accounting Test 3 (10%).
Week 28: Long-term decisions II – methods of investment appraisal and capital rationing
Week 29: Revision Week
Week 30: Final coursework (40%)
Essential Books/online resources including Weblearn/Blackboard
For the first 15 weeks, the module booklet provides most of the required material. Students should
supplement their reading of the module booklet with the recommended texts.
Required/Weekly Reading/Practice/online resources including any Weblearn/Blackboard
Students will need to look at Weblearn on a regular basis (twice a week, say) for:
1. PowerPoint slides
2. data files
3. answers to selected exercises
4. additional readings
Additional/Weekly Reading/Practice/online resources including any Weblearn/Blackboard
This module is supported by Weblearn – students are advised to access the site on a regular basis,
at least once a week
Module Assessment Details, including Assessment Criteria for all elements of the assessment, including any examination

• Well-structured, well-written reports for courseworks
• Appropriate use of software packages
• Accurate interpretations of the output from software packages
• Accurate calculations

Assessment criteria for individual assessments will be provided along with the assessment guidelines.
Assessment completion dates/deadlines
Coursework 1: Friday of week 13
Test 1: In class during week 19
Test 2: In class during week 23
Test 3: In class during week 27
Coursework 2: Friday of week 30
Please ensure that coursework is handed in at the Assessments Unit not later than 5pm on
the due date.
Practical session 1

In order to learn how to use SPSS, we will base our exercises around the results of a survey using the following questionnaire. We will not be analysing all the results.
Survey of Seabridge Fitness & Sports Centre
The Centre’s management wants to ensure that its members get full value for money so is undertaking this
survey. Included are a number of potential developments to the Centre, and we would appreciate your help
in determining the nature and priority of these developments, as well as your opinion on other aspects of
the Centre. Please complete this questionnaire (which is both confidential & anonymous) and post it in the
box at the reception desk.
1. Which sport/activity have you taken part in during this visit to the Centre?
Tick one box only (main sport/activity)
Swimming 1          Keep Fit/Aerobics 2       Judo 3
Badminton 4         Basketball 5              Gym Training 6
Specialist classes: Pilates 7                 Alexander Technique 8

2. What is your main reason for taking part in this sport/activity?
Tick one box only
To get fit 1        Social 2
Competition 3       Skill development 4

3. What is your opinion on the range of sports/activities offered at the Centre?
Tick one box only
Very good 1    Good 2    Average 3    Poor 4    Very poor 5

4. We are thinking of introducing a number of changes at the Centre and need to identify priorities. Please show your preferences by ticking the two most important developments.
Offering hot meals in the café area 1
Building a sauna adjacent to the swimming pool 2
Introducing longer opening hours (7am – 11pm) 3
Introducing a family membership scheme 4
Introducing weekday-only membership 5
Expanding the range of bookings for team sports 6
Expanding the range of specialist classes 7
Other (please state) …………………………………………………. 8

5. Are you: Male 1    Female 2

6. How old are you? ________________ years

7. When do you normally attend the Centre?
Tick one box only, for each of a) & b) below
a) Time of day: Morning 1    Afternoon 2    Evening 3
b) Day: Weekday 1    Weekend 2

8. How long have you been a member of the Centre?
Please enter the length of your membership to the nearest number of years ______________

9. Please rate your fitness levels when you joined the Centre and now:
Tick one box only, for each of a) & b) below
(Very Good 1, Good 2, Fairly Good 3, Average/Moderate 4, Fairly Poor 5, Poor 6, Very Poor 7)
a) Fitness when I joined the Centre:  1  2  3  4  5  6  7
b) Fitness now:                       1  2  3  4  5  6  7

10. What was the main reason for joining the Centre?
Tick one box only
Location of the Centre 1
Range of facilities available 2
Recommended by a friend or relative 3
Membership rates 4
Other (please state) …………………………………………………. 5

11. Please rate each aspect of service quality:
Tick one box only, for each of a) & b) below
(Very Good 1, Good 2, Average 3, Poor 4, Very Poor 5)
a) Helpfulness of the reception staff:   1  2  3  4  5
b) Cleanliness of the changing rooms:    1  2  3  4  5

12. Please use the space below to add any comments about the Centre.

Thank you for completing this questionnaire. Please post it in the box at the reception desk.
[Thanks to Richard Charlesworth for letting us use this questionnaire and associated data file ]
Questionnaire responses (respondents 1–13 shown below):

Resp  Q1     Q2      Q3       Q4(1)     Q4(2)     Q5      Q6   Q7a   Q7b  Q8       Q9a       Q9b       Q10   Q11a         Q11b
      Sport  Reason  Variety  Changes1  Changes2  Gender  Age  Time  Day  Howlong  Fitness1  Fitness2  Join  Helpfulness  Cleanliness
1     6      3       2        3         4         2       29   2     1    2        2         2         3     2            3
2     1      1       2        1         6         1       34   1     1    1        5         5         5     3            2
3     7      1       4        2         7         2       44   1     1    2        7         4         1     1            1
4     3      4       2        3         4         2       42   1     1    2        3         3         4     5            4
5     5      2       4        3         5         1       28   1     2    4        2         3         5     4            4
6     1      1       2        5         6         2       36   1     1    2        4         3         5     4            5
7     5      3       1        1         5         1       47   3     1    3        3         2         5     5            5
8     2      1       3        2         8         2       19   1     1    1        6         5         1     4            3
9     5      4       2        1         4         2       27   1     2    2        5         4         4     3            3
10    4      1       3        2         8         2       999  2     1    4        4         2         1     2            3
11    6      2       1        1         3         1       37   2     1    3        4         2         5     4            4
12    3      4       1        6         7         1       24   1     1    1        4         2         5     3            5
13    4      4       4        4         7         1       52   2     2    3        7         3         3     4            3
[Responses for respondents 14–60 continue in the same format; the full set of 60 responses is provided in the seabridge.xls data file.]
Tasks required in the first practical session.
1. Load the Excel data file seabridge.xls into SPSS
2. Set up the Variable View page
3. Save the file as Seabridge.sav
[Thanks to Richard Charlesworth for the notes for this session]
Using SPSS v18/19 to analyse survey data
These notes give an introduction to using SPSS (Statistical Package for the Social Sciences) for
survey analysis. They are based around the questionnaire ‘Survey of Seabridge Fitness & Sports
Centre’ and the survey data.
1. Entering data into SPSS
Data entry
Data can be entered in three ways:
(i) directly within SPSS (via the data editor);
(ii) copied from a spreadsheet (e.g. Excel) or Word table, and pasted into the SPSS data editor;
(iii) imported to SPSS from a previously created spreadsheet file (e.g. Excel).
Methods (i) and (ii) are the easiest and are described below.
Data entry using the SPSS data editor
If data is to be entered directly within SPSS, click on ‘Type in data’ (see Figure 1), then the ‘Data
View’ tab at the bottom of the screen. This takes you into the SPSS data editor which has the same
format as a spreadsheet with each column representing a different variable (a question), and each
row the record of a different respondent (or ‘case’). Note, however, that from time to time it is
necessary to use more than one column for a single question. For example, if multiple responses are
permitted/required, a separate column is needed to record each response (e.g. see Q4 in the
‘Seabridge Fitness & Sports Centre’ questionnaire, which requires two answers).
Figure 1 – entering data directly into SPSS
Data entry by copying data from a spreadsheet or Word table
If we choose to simply copy data from a previously prepared spreadsheet such as Excel or a
Word table to the SPSS data editor, remember to paste the data into the ‘Data View’ page,
and not the ‘Variable View’ page. It is important to paste only the numerical data and not any
variable names, which might also have been entered on the spreadsheet or table. SPSS will
then automatically assign default variable names VAR00001, VAR00002 etc.
2. Defining the variables
Before undertaking any analysis it is important to fully define each variable. This is carried out
on the ‘Variable View’ page. In Figure 2 the variable names have already been added –
respondent, sport, reason etc, (note that ‘respondent’ is simply the questionnaire (or
respondent/user number) whereas the remaining variables relate to specific questions). We
are in the process of defining the variable ‘sport’: the ‘(Variable) Label’ enables the user to give
a brief description so the variable ‘sport’ represents ‘Sport or activity undertaken’; similarly we
can define the meaning of each of the possible numerical responses under ‘Value Labels’, so
a response ‘1’ represents ‘Swimming’, ‘2’ represents ‘Keep Fit/Aerobics’ etc; under ‘Measure’
the user has already confirmed that ‘Sport’ is a ‘nominal’ (or categorical) variable.
Figure 2 - defining the variable ‘Sport’
Note with large questionnaires it is often helpful to insert the question number at the beginning
of the ‘(Variable) Label’. So here we have entered ‘Q1 Sport or activity undertaken’ etc. This
makes it easier to locate specific questions when conducting analyses.
The ‘Missing’ category enables us to record the code used for any missing values. For
example, respondent No. 10 has failed to divulge his/her age, recorded here with a code of
999. Figure 3 shows ‘999’ being defined as the code for a missing response to Question 6
(‘Age’).
Figure 3 - defining 999 as the code for a missing value for Question 6 (‘age’)
Note that SPSS enables us to define up to three distinct missing value codes. The reason for
this is that in addition to genuinely missing responses, we may choose to treat some
‘legitimate’ responses as though they were missing.
For instance, respondent number 14 has ticked more than one box for Question 10 (main
reason for joining the Centre). Two or more reasons may have persuaded this respondent to
join the Centre, so although this response may technically be a correct & honest reply, the
ticking of only one box was requested (‘the main reason..’), and it would be inappropriate for
us to choose one of these replies to code at the expense of the other, or to count them both.
Instead we can introduce a missing value code (here 888) to record multiple responses.
When you have finished defining all the variables, save the file. We will be using this file next
week.
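The same convention can be reproduced outside SPSS. The sketch below, in plain Python with illustrative field names, shows the idea of declaring per-variable missing-value codes (999 for a missing age, 888 for a multiple response) and converting them before analysis; it is a minimal sketch, not SPSS itself.

```python
# A minimal sketch of SPSS-style missing-value handling in plain Python.
# The field names and codes follow the Seabridge survey; the record below
# is illustrative, not read from the data file.

MISSING_CODES = {"age": {999}, "join": {888}}  # per-variable missing-value codes

def clean(record):
    """Return a copy of the record with declared missing-value codes replaced by None."""
    return {
        var: (None if value in MISSING_CODES.get(var, set()) else value)
        for var, value in record.items()
    }

# Respondent 10 did not give an age, recorded with the code 999:
respondent_10 = {"resp": 10, "sport": 4, "age": 999, "join": 1}
print(clean(respondent_10)["age"])  # None
```

Any analysis (means, counts, cross-tabulations) can then simply skip `None` values, which is what SPSS does internally for declared missing codes.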
Topic 1
Normal Distribution
Probability Distributions
Consider the case where we have a number of customers and we have recorded their ages.
If we take age groups of 10 years and use probabilities rather than frequencies we get a histogram.
[The probability is the frequency of each group divided by the total frequency. The total of all probabilities adds to 1.]

[Histogram: ages of customers in 10-year groups (20 to 30, …, 70 to 80), with probability on the vertical axis]

If we take age groups of 5 years we get:

[Histogram: ages of customers in 5-year groups (20 to 25, …, 75 to 80)]

With age groups of 2 years we get:

[Histogram: ages of customers in 2-year groups (20 to 22, …, 78 to 80)]

If we continue this process, we can draw a curve rather than a histogram. The curve will join the
midpoint of the top of each column.
A continuous random variable, such as age measured to the nearest day, hour, minute etc., has an
infinite number of possible values and it is not possible to list every one with its associated probability.
Furthermore the probability of each value will be so small that it must be considered approximately
equal to zero. As a result we can only consider the probability that a value lies within a particular
interval. In the graph above we have used classes such as 22 up to 24 to represent an interval.
For example: When measuring heights the probability that an individual is exactly 1.63 metres tall is
very small but the probability that someone has a height between 1.625 and 1.635 metres is much
easier, and sensible, to determine.
The Normal Distribution is one of the most important in Statistics, not only because many data
distributions conform to this pattern, but also because the Normal Distribution underpins the subject of
statistical inference.
The Normal Distribution curve is shown below.
The total area under the curve represents the total probability, so the total area equals 1.

[Figure: standard Normal distribution curve, with the horizontal axis marked in standard deviations from the mean, from -4 to 4]
Properties of the Normal Distribution
The Normal Curve has the following properties:
• It is symmetrical about the mean.
• The curve is bell-shaped.
• The total area under the curve equals 1.
• 68.26% of the area under the curve lies within 1 standard deviation of the mean.
• 95.44% of the area under the curve lies within 2 standard deviations of the mean.
• 99.73% of the area under the curve lies within 3 standard deviations of the mean.
The exact shape and position of the Normal curve depends upon the values of the mean and standard
deviation. As Normal Distributions have an infinite combination of means and standard deviations,
there is a problem in compiling tables for the probabilities.
This problem is solved by standardising the variables.
The formula below takes x values (actual measurements) and converts them into z values (for use in
tables):

z = (x - μ) / σ

where μ is the mean and σ is the standard deviation.
If actual measurements, x, are converted into z scores by use of the formula, then a Normal
Distribution with mean μ and standard deviation σ is transformed into a Normal Distribution with
mean 0 and standard deviation 1.
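The standardisation formula is simple enough to check with a couple of lines of Python (a minimal sketch; the mean of 50 and standard deviation of 10 are just illustrative values):

```python
def z_score(x, mu, sigma):
    """Number of standard deviations between x and the mean: z = (x - mu) / sigma."""
    return (x - mu) / sigma

# An observation of 40 from a distribution with mean 50 and s.d. 10
# standardises to -1: one standard deviation below the mean.
print(z_score(40, 50, 10))  # -1.0
print(z_score(60, 50, 10))  # 1.0
```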
Example:
The examination marks for a large group of students followed a Normal Distribution. The mean mark
was 50 and the standard deviation was 10 marks.
What percentage of students gained marks in the range:
a) between 40 and 60
b) between 30 and 70
c) above 70

a) between 40 and 60 (mean = 50, s.d. = 10)
x = 40: z = (x - μ)/σ = (40 - 50)/10 = -10/10 = -1, i.e. 1 standard deviation below the mean.
x = 60: z = (x - μ)/σ = (60 - 50)/10 = 10/10 = 1, i.e. 1 standard deviation above the mean.
Thus a value between 40 and 60 is equivalent to a value between 1 standard deviation below and 1
standard deviation above the mean.
Using the ‘Properties of the Normal Distribution’ above, the area is 68.26%.
Thus we would expect 68.26% of students to gain marks in the range 40 to 60.
b) between 30 and 70 (mean = 50, s.d. = 10)
x = 30: z = (x - μ)/σ = (30 - 50)/10 = -20/10 = -2, so x = 30 is 2 standard deviations below the mean.
x = 70: z = (x - μ)/σ = (70 - 50)/10 = 20/10 = 2, so x = 70 is 2 standard deviations above the mean.
Using the ‘Properties of the Normal Distribution’, the area between 2 standard deviations above and
below the mean is 95.44%.
Thus we would expect 95.44% of students to gain marks in the 30 to 70 range.

c) above 70
From part b) we found that 95.44% of students will gain marks in the range 30 to 70. This leaves
4.56% that are either above 70 or below 30. As the Normal curve is symmetrical, 2.28% will be above
70 and 2.28% will be below 30.
Thus we would expect 2.28% of students to gain marks above 70.
In this example, we could use the information from the ‘Properties of the Normal Distribution’ as the x
values gave z values of 1 and 2. In most examples the z values will not be integers and we must use
tables to find the areas and thus determine the probabilities.
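The three answers in the example above can also be verified numerically with Python's standard-library `statistics.NormalDist`; the exact values differ from the four-figure tables only in the last decimal place.

```python
from statistics import NormalDist

marks = NormalDist(mu=50, sigma=10)  # exam marks: mean 50, s.d. 10

p_a = marks.cdf(60) - marks.cdf(40)  # a) between 40 and 60
p_b = marks.cdf(70) - marks.cdf(30)  # b) between 30 and 70
p_c = 1 - marks.cdf(70)              # c) above 70

# Approximately 68.27%, 95.45% and 2.28% respectively
print(f"{p_a:.2%} {p_b:.2%} {p_c:.2%}")
```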
Use of tables for the Normal Distribution
There are different ways of expressing tables for the Normal Distribution. These notes refer to the
tables that follow on the next page.
The tables give the area in the right-hand tail. This area represents the probability that a random
variable is greater than z.
For example, the probability that z > 1.35 is 0.0885.
Tables for the Normal Distribution
(Each entry is the area in the right-hand tail beyond z; rows give z to one decimal place, columns the second decimal place.)

z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0  0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
0.1  0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.2  0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
0.3  0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
0.4  0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
0.5  0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
0.6  0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
0.7  0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
0.8  0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
0.9  0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
1.0  0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
1.1  0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
1.2  0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
1.3  0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
1.4  0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
1.5  0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
1.6  0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
1.7  0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
1.8  0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
2.0  0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
2.1  0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
2.2  0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
2.3  0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
2.4  0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
2.5  0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
2.6  0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
2.7  0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
2.8  0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
2.9  0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
3.0  0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
3.1  0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
3.2  0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
3.3  0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003
3.4  0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
3.5  0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002
3.6  0.0002 0.0002 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
3.7  0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
When dealing with problems involving the use of Normal tables, it is advisable to draw sketches to
identify the required areas.

1: Find the probability that z is greater than 1.
From the tables, z = 1.00 gives 0.1587.
Thus P(z > 1) = 0.1587.

2: Find the probability that z is less than 2.01.
Looking up z = 2.01 gives 0.0222, which is the area above z = 2.01.
The required area is 1 - 0.0222 = 0.9778.
Thus P(z < 2.01) = 0.9778.

3: Find the probability that z is between 0.6 and 2.01.
From the tables, z = 0.6 gives 0.2743 and z = 2.01 gives 0.0222.
The required area is the area above 0.6 minus the area above 2.01: 0.2743 - 0.0222 = 0.2521.
Thus P(0.6 < z < 2.01) = 0.2521.

4: Find the probability that z is less than -1.5.
We cannot look up z = -1.5 directly, as only positive values are tabulated. However, as the Normal
curve is symmetrical, the area below z = -1.5 is the same as the area above z = +1.5.
From the tables, z = 1.5 gives 0.0668, so the area below z = -1.5 is 0.0668.
Thus P(z < -1.5) = 0.0668.

5: Find the probability that z lies between -1.5 and 0.6.
From the tables, z = 1.5 gives 0.0668 (the area below z = -1.5, by symmetry) and z = 0.6 gives
0.2743 (the area above z = 0.6).
The required area is 1 - 0.2743 - 0.0668 = 0.6589.
Thus P(-1.5 < z < 0.6) = 0.6589.
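The five table lookups can be checked with `statistics.NormalDist` from the Python standard library. Note that `cdf` gives the area to the left of z, whereas the printed tables give the right-hand tail, so the two are related by `tail = 1 - cdf`.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal distribution: mean 0, s.d. 1

def right_tail(value):
    """Area in the right-hand tail, as printed in the tables."""
    return 1 - z.cdf(value)

print(round(right_tail(1.0), 4))           # 0.1587  P(z > 1)
print(round(z.cdf(2.01), 4))               # 0.9778  P(z < 2.01)
print(round(z.cdf(2.01) - z.cdf(0.6), 4))  # ~0.2520  P(0.6 < z < 2.01); the 4-figure tables give 0.2521
print(round(z.cdf(-1.5), 4))               # 0.0668  P(z < -1.5)
print(round(z.cdf(0.6) - z.cdf(-1.5), 4))  # 0.6589  P(-1.5 < z < 0.6)
```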
Worked Example:
The mean weight of 500 schoolboys is 55 kg and the standard deviation is 4 kg.
Assuming that the weights are Normally distributed, find the percentages of boys that are expected to
weigh:
a) more than 58 kg
b) between 58 and 60 kg
c) between 50 and 65 kg

[Sketch: Normal curve of weights, horizontal axis marked 40 to 70 kg]

a) more than 58 kg (μ = 55, σ = 4)
x = 58: z = (x - μ)/σ = (58 - 55)/4 = 3/4 = 0.75
From the tables, z = 0.75 gives 0.2266, so the required area is 0.2266.
Thus we would expect 22.66% of the boys to weigh more than 58 kg.

b) between 58 and 60 kg
x = 58: z = 0.75 and the tables give 0.2266, so Area 1 (above 0.75) = 0.2266.
x = 60: z = (x - μ)/σ = (60 - 55)/4 = 5/4 = 1.25 and the tables give 0.1056, so Area 2 (above 1.25) = 0.1056.
The area between z = 0.75 and z = 1.25 is Area 1 - Area 2 = 0.2266 - 0.1056 = 0.1210.
Thus we would expect 12.1% of the boys to weigh between 58 and 60 kg.

c) between 50 and 65 kg
x = 50: z = (x - μ)/σ = (50 - 55)/4 = -5/4 = -1.25. From the tables, z = 1.25 gives 0.1056, so
Area 1 (below -1.25) = 0.1056.
x = 65: z = (x - μ)/σ = (65 - 55)/4 = 10/4 = 2.5. From the tables, z = 2.5 gives 0.0062, so
Area 2 (above 2.5) = 0.0062.
The required area is the total area minus Area 1 and Area 2: 1 - 0.1056 - 0.0062 = 0.8882.
Thus we would expect 88.82% of the boys to weigh between 50 and 65 kg.
Janet Geary 2012
Page 19
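The three percentages in the worked example can be verified directly. A sketch assuming SciPy (the figures in the notes come from printed tables, so software values may differ slightly in the last decimal place):

```python
# Checking the schoolboys worked example: weights Normal with mean 55 kg, s.d. 4 kg.
from scipy.stats import norm

mu, sigma = 55, 4  # mean and standard deviation of the weights

# a) more than 58 kg
p_a = norm.sf(58, loc=mu, scale=sigma)

# b) between 58 and 60 kg
p_b = norm.cdf(60, loc=mu, scale=sigma) - norm.cdf(58, loc=mu, scale=sigma)

# c) between 50 and 65 kg
p_c = norm.cdf(65, loc=mu, scale=sigma) - norm.cdf(50, loc=mu, scale=sigma)

# close to the hand results 22.66%, 12.1% and 88.82%
print(f"a) {p_a:.2%}  b) {p_b:.2%}  c) {p_c:.2%}")
```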
Example: The weekly output of a production line varies according to the Normal distribution with a
mean of 1,500 units and a standard deviation of 120 units.
The manager of the factory wants to know what the production output is for 95% of the time.
To answer this question:
We wish to find the limits of production output that enclose 95% of the area.
Taking this 95% symmetrically leaves two tails of 2.5% each.
Thus the area in each tail = 2.5% or 0.025.
Scanning the tables for an area of 0.025 we find the z value to be 1.96.
Thus 95% of the area lies between z = -1.96 and z = +1.96.
We can now use this statement to solve equations, with mean = 1500 and s.d. = 120.

Upper limit:
z = (x - mean)/s.d.
1.96 = (x - 1500)/120
1.96 × 120 = x - 1500
235.2 = x - 1500
x = 235.2 + 1500 = 1735.2

Lower limit:
z = (x - mean)/s.d.
-1.96 = (x - 1500)/120
-1.96 × 120 = x - 1500
-235.2 = x - 1500
x = -235.2 + 1500 = 1264.8

Thus 95% of the time, the production output will be between 1265 and 1735 units a week.
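The same interval can be computed in one step. A sketch assuming SciPy; `norm.ppf` is the inverse of the cumulative Normal, so `ppf(0.975)` returns the 1.96 found from the tables:

```python
# 95% interval for weekly production output: mean 1500, s.d. 120.
from scipy.stats import norm

mean, sd = 1500, 120
z = norm.ppf(0.975)          # upper 2.5% point, approximately 1.96
lower = mean - z * sd
upper = mean + z * sd

# close to the hand results 1264.8 and 1735.2
print(round(lower, 1), round(upper, 1))
```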
Seminar Questions:
1. A thousand candidates sat an examination, the results of which were Normally distributed with
a mean mark of 50 and a standard deviation of 10 marks.
How many candidates would be expected to score:
a) less than 75 marks
b) less than 25 marks
c) more than 60 marks
d) a grade C (i.e. 50 to 59 inclusive)
2. A machine in a factory produces components whose lengths are Normally distributed with
mean = 102 mm and standard deviation = 1.5 mm.
a) Find the probability that, if a component is selected at random and measured, its length will be
i) less than 100 mm
ii) greater than 104 mm
b) If a component is only acceptable when its length lies in the range 100 mm to 104 mm,
find the percentage of acceptable components.
3. As a result of tests on light bulbs, it was found that the life-time of a particular make of bulb
was distributed Normally with an average life of 2040 hours and a standard deviation of 60 hours.
What percentage of bulbs are expected to last:
a) for more than 2150 hours
b) for more than 1960 hours
4. As a result of the introduction of Income Self-Assessment, at the end of the financial year,
taxpayers may either have overpaid or underpaid their tax. Those taxpayers who have
overpaid are entitled to a refund, whilst those who have underpaid owe the Tax Office money.
Past experience leads the Tax Office to believe that the amounts to be refunded or owed are
Normally distributed.
This year, for one category of taxpayer, the mean of these amounts was a refund of £607
with a standard deviation of £320.
a) What proportion of taxpayers is entitled to a tax refund greater than £1,100?
b) What proportion of taxpayers owe money to the Tax Office?
c) What proportion of taxpayers is entitled to a refund of between £50 and £250?
d) For a separate class of taxpayers, the standard deviation is unknown. The mean refund for
this class is £550 and 71.9% of these taxpayers have a refund greater than £300.
What is the standard deviation for this class of taxpayer?
5. A box of breakfast cereal states that it weighs 500 grams. This is the nominal weight.
When the person responsible for quality control in the company that produces the cereal
measured a sample of 100 boxes of cereal, the mean weight was 504 gms and the standard
deviation was 2 gms. These values were exactly correct according to the company's policy.
Why would the company plan to have a mean weight greater than the nominal weight?
What is the probability that a customer, buying one packet of cereal, would buy a packet:
a) weighing more than 503 gms
b) weighing below the nominal weight
Answers
1. a) 993.79    b) 6.21     c) 158.7   d) 315.9
2. a) i) 0.0918    ii) 0.0918    b) 81.64%
3. a) 3.36%     b) 90.82%
4. a) 6.18%     b) 2.87%    c) 8.68%   d) £431
5. a) 0.6915    b) 0.02275
Practical session 2 [Using the file seabridge.sav with all variables defined]
1. Producing descriptive statistics in SPSS
2. Drawing graphs in SPSS
Data summary & analysis [Thanks to Richard Charlesworth for some of these notes on SPSS]
Figure 4 shows the basic command sequence for an elementary statistical analysis. Click first
on ‘Analyze’, then ‘Descriptive Statistics’ then ‘Frequencies’
Figure 4 (above) - using ‘Analyze’ to provide some elementary statistical analysis
Figure 5 (below) - requesting an analysis of the variable ‘sport’
In Figure 5 the researcher has requested an analysis of the variable ‘Sport’ and samples of
output are in Figures 6 & 7. Had we wished to do so, we could have requested the same
analysis of several variables, rather than just ‘Sport’. Note the left-hand window records the
output requested so far – useful for quickly locating earlier analyses.
Figure 6 (above) - a frequency table of ‘Sport’
Figure 7 (below) - a bar chart of ‘Sport’
To produce descriptive statistics for age
The output produced is:

Descriptive Statistics
                        N    Minimum   Maximum   Mean    Std. Deviation
Q6 age of respondent    58   17        68        36.19   11.607
Valid N (listwise)      58
4. Incorporating output in a document
The focus of these notes is about interacting with SPSS, and hence the figures illustrate the
screen display. However SPSS often produces more output than is needed (so you need to
be selective); moreover this is not always in an appropriate style for inclusion in a document
such as a dissertation or report.
Tables and graphs can easily be copied into a document using copy and paste commands.
SPSS tables are copied to a Word document as a Word table. You may wish to reformat
aspects of the table (e.g. column widths may need readjusting, changing lines/ borders etc.)
to get closer to the preferred style, and there is the added benefit that you can edit out any
redundant or irrelevant information. SPSS graphs will be copied as a picture, which can be
adjusted for size and position to fit in the document.
5. Saving SPSS files
Remember to save both the data and output files before exiting SPSS (you will be reminded
of this by SPSS if you fail to do so). In Figure 8 the user has given the name ‘Seabridge
Fitness & Sports Centre survey’ to the output file; the full filename is ‘Seabridge Fitness &
Sports Centre survey.spv’. Similarly if the data file is also called ‘Seabridge Fitness & Sports
Centre survey’, the full filename is ‘Seabridge Fitness & Sports Centre survey.sav’.
Figure 8 - the output file is saved as 'Seabridge Fitness & Sports Centre survey.spv'
Topic 2:
Hypothesis Testing: Chi-Squared test
Chi-square test of association
This test is one of the most important tests used in survey analysis. It is used to test whether two
types of classification (i.e. answers to two questions) are statistically independent.
The test is extremely useful in management research and can be used to explore whether, for
instance, there is any relationship between gender and management grade, gender and motivation,
stress and salary level etc.
In a survey, airline passengers were asked to answer the question 'How did you book your flight?'
and were also asked for their gender. The chi-squared test allows us to test whether there is any
significant difference in the responses of the male and female passengers.
The resulting contingency table was:

                          method of booking
gender        internet (1)   telephone (2)   travel agent (3)   Total
male (1)      14             16              22                 52
female (2)    15             24              9                  48
Total         29             40              31                 100
Looking at this table, it would seem that women were more likely than men to use the telephone
and that men were more likely to use travel agents. However, we need to perform the chi-squared test
to see if these differences are statistically significant.
The chi-square test compares the 'observed' results with those that would be 'expected' if the results
were independent. The expected results are calculated by the formula:
Expected = (row total × column total) / grand total

As an example, the number of males who use the internet was observed to be 14.
The expected value for this cell is:
Expected = (row total × column total) / grand total = (52 × 29) / 100 = 15.08
The table of observed and expected values [in brackets] is shown below:

                          method of booking
gender        internet       telephone      travel agent    Total
male          14 [15.08]     16 [20.80]     22 [16.12]      52
female        15 [13.92]     24 [19.20]     9  [14.88]      48
Total         29             40             31              100
Formally we test whether our null hypothesis of independence is supported by the evidence of our
survey. If not, we will reject the null hypothesis in favour of the alternative hypothesis which claims
that method of booking and gender are not independent.
Formally:
Null hypothesis         H0: method of booking is independent of gender
Alternative hypothesis  H1: method of booking is not independent of gender

The overall comparative measure is provided by the test statistic:

χ² = Σ (o - e)² / e
This is calculated by:

observed, o   expected, e   o - e   (o - e)²   (o - e)²/e
14            15.08         -1.08   1.1664     0.0773
15            13.92          1.08   1.1664     0.0838
16            20.8          -4.8    23.04      1.1077
24            19.2           4.8    23.04      1.2
22            16.12          5.88   34.5744    2.1448
9             14.88         -5.88   34.5744    2.3235
                                    total      6.9371
In this case
χ2 = 6.9371
The degrees of freedom, df, can be found from the formula:
df = (number of rows - 1) × (number of columns - 1)
Note: Do not count the "totals" in the number of rows and columns.
In this case df = (2-1) × (3-1) = 2
Testing at a level of significance of 5%, the critical value of the chi-square distribution (with 2 df) is
5.991 (see tables at the end of this section). We compare our calculated test statistic with the
tabulated value. The 'critical value' is exceeded by the test statistic, as can be clearly seen in the
graph below.
The test statistic falls into the 'rejection (or critical) region' (5.991 and above), and consequently we
reject the null hypothesis that the method of booking and gender are independent. We therefore take
this test result as evidence that there is an association between the method of booking and gender.
As the largest relative difference is between the use of travel agents, we can say that men are more
likely to book through travel agents and women less likely.
The relevant SPSS printout is on the next page:
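The same test can also be run outside SPSS. A minimal sketch assuming SciPy is available; `chi2_contingency` returns the test statistic, the p-value, the degrees of freedom and the expected counts:

```python
# Chi-squared test of association for the airline booking survey.
# Rows are gender (male, female); columns are booking method
# (internet, telephone, travel agent).
from scipy.stats import chi2_contingency

observed = [[14, 16, 22],
            [15, 24, 9]]

chi2, p, dof, expected = chi2_contingency(observed)

# matches the hand calculation: chi-squared = 6.937 with 2 df, p = 0.031
print(round(chi2, 3), dof, round(p, 3))
```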
Case Processing Summary
             Valid           Missing         Total
             N     Percent   N     Percent   N     Percent
B6 * C3      100   100.0%    0     .0%       100   100.0%

B6 * C3 Crosstabulation
                             C3
                        1       2       Total
B6     1   Count        14      15      29
           Expected     15.1    13.9    29.0
       2   Count        16      24      40
           Expected     20.8    19.2    40.0
       3   Count        22      9       31
           Expected     16.1    14.9    31.0
Total      Count        52      48      100
           Expected     52.0    48.0    100.0

Chi-Square Tests
                               Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square             6.937    2    .031
Likelihood Ratio               7.109    2    .029
Linear-by-Linear Association   3.204    1    .073
N of Valid Cases               100
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 13.92.
The value for 'Pearson Chi-Square' is given as 6.937.
Interpretation of the output:
This gives a calculated value of chi-squared of 6.937, which can be compared with the critical value of
5.9915 (from statistics tables using a 5% level of significance). As the calculated value is greater than
the critical value, it lies in the 'reject' region of the curve (i.e. the area to the right of our calculated
value is less than 5%). SPSS gives the area to the right of the calculated value as 0.031. We could
have used this to see that the calculated value of chi-squared lies in the tail, and thus we do not need
to use statistics tables.
If the value in the last column for Pearson Chi-Square had been greater than 5% (0.05), we would have
concluded that the method of booking and gender were independent at the 5% level of significance.
In this example, we conclude that there is evidence of an association between gender and the
method used to book the flight.
Some limitations of the chi-square test
There are some limitations of the chi-square test:
• the test identifies only the presence of relationships, not the focus or direction;
• the statistical properties of the test require that:
  o the expected values must be ≥ 5. If this is not the case, appropriate rows or columns
    must be pooled together; however in some cases we simply do not have enough data
    (or density of data throughout the table), and combining rows and columns results in
    collating data into meaningless overarching classes, rendering the test of little or no
    value.
Note: The often stated requirement above, that all expected values must be ≥ 5, is a safety first
measure. In practice, we can accommodate up to 20% of the cells having expected values of between
1 and 5 (notice that the SPSS output provides a reminder of this). In the event of too many cells
containing small expected frequencies we would need to combine rows or columns as detailed above.
• the χ² formula needs to be modified slightly for 2 by 2 tables. We use the formula:

  χ² = Σ (|o - e| - 0.5)² / e
This involves subtracting 0.5 from the absolute difference between o and e, before squaring
the result and then proceeding as before. The problem arises because we are using the
continuous chi-square distribution to approximate to the discrete cell frequencies; with a small
number of cells (that is 2 by 2 tables) this approximation needs a ‘continuity correction’. In fact,
if the o and e values are quite large, the original uncorrected result will be close to the
adjusted value; however the problem becomes more marked when the frequencies are small,
so it is wise to always use the modified formula with 2 by 2 tables. Notice that SPSS provides
the continuity correction when a 2 by 2 table is analysed.
The test assumes a null hypothesis that ‘gender’ and ‘method of booking’ are statistically independent,
and expected frequencies for each of the categories are calculated. These are compared with what
actually was recorded in the survey, and a test statistic is computed. The value of the relevant test
statistic above is 6.937, which would occur by chance with a probability of 0.031 (or 3.1%) when
‘gender’ and ‘method of booking’ are statistically independent.
In such testing we typically assume a probability value of less than 0.05 (or 5%) is evidence that the
null hypothesis is incorrect. In this case we therefore have sufficient evidence to reject the null
hypothesis and conclude that ‘gender’ and ‘method of booking’ are not independent. So male and
females have used different methods of booking.
The chi-square test is used here because both variables are nominal. If both were ordinal the chi-square test could still be used.
Yates Correction for 2x2 Contingency Tables:
This is used for 2 by 2 contingency tables when the total number of items in the table is relatively
small (say, < 50). With a 2x2 table the degrees of freedom = (r-1) × (c-1) = 1.
Example: 42 people were asked about their voting intentions in a forthcoming referendum on the
single currency. The results are shown below:
            would vote for   would vote against   total
female      12               7                    19
male        8                15                   23
total       20               22                   42

With the expected values in brackets:

            would vote for   would vote against   total
female      12 (9.05)        7 (9.95)             19
male        8 (10.95)        15 (12.05)           23
total       20               22                   42
We have calculated the 'expected values' as usual, i.e. expected = (row total × column total) / grand total.
The test statistic, with Yates Correction, is given by:

χ² = Σ (|o - e| - 0.5)² / e

where |o - e| denotes the absolute value of a number (regardless of sign), so |3| = |-3| = 3.
H0: rows and columns are independent   [voting does not depend on gender]
H1: some form of dependence exists     [the genders vote differently]
level of the test: 5%
o     e       o - e    |o-e|-0.5   (|o-e|-0.5)²   (|o-e|-0.5)²/e
12    9.05     2.95    2.45        6.0025         0.6633
7     9.95    -2.95    2.45        6.0025         0.6033
8     10.95   -2.95    2.45        6.0025         0.5482
15    12.05    2.95    2.45        6.0025         0.4981
                                   total          2.3129
Since the test statistic, 2.3129, is less than the critical value of χ² with 1 degree of freedom at the 5%
level, 3.8415 (from tables), we fail to reject H0 and conclude that, on the basis of the evidence
available to us, rows and columns are independent. Thus there is no evidence to support the claim
that males and females vote differently.
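The Yates-corrected test can also be run in software. A sketch assuming SciPy; `chi2_contingency` applies the continuity correction automatically for 2x2 tables when `correction=True` (the default):

```python
# Yates-corrected chi-squared test for the 2x2 voting-intention table.
from scipy.stats import chi2_contingency

observed = [[12, 7],    # female: would vote for, would vote against
            [8, 15]]    # male:   would vote for, would vote against

chi2, p, dof, expected = chi2_contingency(observed, correction=True)

# close to the hand value 2.3129 (the notes round the expected counts to 2 d.p.,
# so the software figure differs slightly); p > 0.05, so H0 is not rejected
print(round(chi2, 2), dof, round(p, 3))
```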
Seminar Exercise
1. The expected percentages of defects found in a product are:

Number of defects found in a product:   0     1     2     3     4 or more
Expected percentage:                    5%    15%   30%   30%   20%

We take a random sample of 35 items and observe the following numbers of units with the
stated number of defects:

Number of defects found in a product:   0     1     2     3     4 or more
Numbers observed:                       3     8     4     11    9

Test to see if this data is consistent with the claimed proportions of defectives.
(Remember to ensure that all expected values are greater than or equal to 5.)
2. A random sample of people reporting sick over a five day period was:

Monday   Tuesday   Wednesday   Thursday   Friday
8        20        14          18         25

Is there any evidence to suggest that the reporting is not spread evenly through the week?
3. A firm uses three similar machines to produce a large number of components. On a particular
day a random sample of 99 from the defective components produced on the early shift were
traced back to the machine that produced them. The same was done with a random sample
of 65 defectives from the late shift. The table below shows the number of defectives found to
be from each machine on each shift:

             Early Shift   Late Shift
machine A    37            13
machine B    29            16
machine C    33            36
Test the hypothesis that the probability of a defective coming from a particular machine is
independent of the shift in which it was produced.
4. Stapleton Electrics manufacture televisions at four different factories and quality control is of
great interest to the company. The table below shows the reliability of the machines in a
particular month:

                    Factory A   Factory B   Factory C   Factory D
Needed repair       4           15          9           12
No repair needed    8           10          6           6

a) Test to see if there is any association between factory and reliability of machines.
b) Subsequently, it was discovered that the figures above had been divided by 10 to
   make the arithmetic easier.
   Correct for this and repeat the test. Does it make any difference to the result?
5. Debtovia is a small country in the middle of a recession. As part of a nation-wide survey a
particular town was selected at random and the question "Would you support an incomes
policy?" was asked of a number of workers selected at random. Their answers (Yes, No or
Don't Know) and their employment status (skilled/unskilled and Union/Non-Union) were
recorded and the data is presented below:

              Skilled and   Skilled and    Unskilled and   Unskilled and
              in union      not in union   in union        not in union
Yes           7             7              9               12
No            24            21             9               11
Don't know    29            27             17              27

a) Test the hypothesis that there is no association between the responses to the above question
   and employment status.
b) Form a new 2x2 contingency table from the above data by omitting all the "Don't know"
   responses and then pooling the remaining responses to obtain one column for "Skilled" and
   one column for "Unskilled" workers. Test to see if there is any association evident in your
   new table.
c) Comment on your findings in both cases and compare the two situations.
Answers
1. χ² test = 6.904, critical = 7.815, do not reject H0.
   The data is consistent with the claimed proportions.
2. χ² test = 9.647, critical = 9.488, reject H0.
   Reporting sick is not evenly spread. Monday has fewer than expected, Friday has more.
3. χ² test = 8.7346, critical = 5.991, reject H0.
   Number of defectives varies according to shift.
4. a) χ² test = 3.5773, critical = 7.815, do not reject H0.
      No relationship between factory and reliability.
   b) New χ² test = 35.77 (i.e. 10 times as big), reject H0.
5. a) χ² test = 8.43, critical = 12.592, do not reject H0.
      Conclude no association between response and employment status.
   b) χ² test = 6.8727, critical = 3.841, reject H0.
      Conclude there is an association between response and employment.
Chi-Squared Tables
(rows: degrees of freedom; columns: area in one tail)

        0.100     0.050     0.025     0.010     0.005     0.001
 1     2.7055    3.8415    5.0239    6.6349    7.8794   10.8276
 2     4.6052    5.9915    7.3778    9.2103   10.5966   13.8155
 3     6.2514    7.8147    9.3484   11.3449   12.8382   16.2662
 4     7.7794    9.4877   11.1433   13.2767   14.8603   18.4668
 5     9.2364   11.0705   12.8325   15.0863   16.7496   20.5150
 6    10.6446   12.5916   14.4494   16.8119   18.5476   22.4577
 7    12.0170   14.0671   16.0128   18.4753   20.2777   24.3219
 8    13.3616   15.5073   17.5345   20.0902   21.9550   26.1245
 9    14.6837   16.9190   19.0228   21.6660   23.5894   27.8772
10    15.9872   18.3070   20.4832   23.2093   25.1882   29.5883
11    17.2750   19.6751   21.9200   24.7250   26.7568   31.2641
12    18.5493   21.0261   23.3367   26.2170   28.2995   32.9095
13    19.8119   22.3620   24.7356   27.6882   29.8195   34.5282
14    21.0641   23.6848   26.1189   29.1412   31.3193   36.1233
15    22.3071   24.9958   27.4884   30.5779   32.8013   37.6973
16    23.5418   26.2962   28.8454   31.9999   34.2672   39.2524
17    24.7690   27.5871   30.1910   33.4087   35.7185   40.7902
18    25.9894   28.8693   31.5264   34.8053   37.1565   42.3124
19    27.2036   30.1435   32.8523   36.1909   38.5823   43.8202
20    28.4120   31.4104   34.1696   37.5662   39.9968   45.3147
21    29.6151   32.6706   35.4789   38.9322   41.4011   46.7970
22    30.8133   33.9244   36.7807   40.2894   42.7957   48.2679
23    32.0069   35.1725   38.0756   41.6384   44.1813   49.7282
24    33.1962   36.4150   39.3641   42.9798   45.5585   51.1786
25    34.3816   37.6525   40.6465   44.3141   46.9279   52.6197
26    35.5632   38.8851   41.9232   45.6417   48.2899   54.0520
27    36.7412   40.1133   43.1945   46.9629   49.6449   55.4760
28    37.9159   41.3371   44.4608   48.2782   50.9934   56.8923
29    39.0875   42.5570   45.7223   49.5879   52.3356   58.3012
30    40.2560   43.7730   46.9792   50.8922   53.6720   59.7031
31    41.4217   44.9853   48.2319   52.1914   55.0027   61.0983
32    42.5847   46.1943   49.4804   53.4858   56.3281   62.4872
33    43.7452   47.3999   50.7251   54.7755   57.6484   63.8701
34    44.9032   48.6024   51.9660   56.0609   58.9639   65.2472
35    46.0588   49.8018   53.2033   57.3421   60.2748   66.6188
36    47.2122   50.9985   54.4373   58.6192   61.5812   67.9852
37    48.3634   52.1923   55.6680   59.8925   62.8833   69.3465
38    49.5126   53.3835   56.8955   61.1621   64.1814   70.7029
39    50.6598   54.5722   58.1201   62.4281   65.4756   72.0547
40    51.8051   55.7585   59.3417   63.6907   66.7660   73.4020
41    52.9485   56.9424   60.5606   64.9501   68.0527   74.7449
42    54.0902   58.1240   61.7768   66.2062   69.3360   76.0838
43    55.2302   59.3035   62.9904   67.4593   70.6159   77.4186
44    56.3685   60.4809   64.2015   68.7095   71.8926   78.7495
45    57.5053   61.6562   65.4102   69.9568   73.1661   80.0767
46    58.6405   62.8296   66.6165   71.2014   74.4365   81.4003
Practical session 3.
Using SPSS for the Chi-squared test and the Seabridge file
As an example, suppose we are interested to see if there is any association between gender (Q5)
and the main reason for joining the Centre (Q10).
Using Analyze, Descriptive Statistics, Crosstabs:
Move "Q5 gender" into the rows and "Q10 main reason" into the columns.
Click on the Statistics button and choose Chi-squared.
From the Cells button, choose Observed and Expected.
Now choose Continue and OK.
The resulting output is:

Case Processing Summary
                                Valid          Missing        Total
                                N    Percent   N    Percent   N    Percent
Q5 gender * Q10 main
reason for joining              58   96.7%     2    3.3%      60   100.0%
Q5 gender * Q10 main reason for joining Crosstabulation

                          Q10 main reason for joining
                          membership  location   range of    recommended by    other   Total
                          rates       of centre  facilities  friend/relative
Q5 gender  male
           Count          2           9          4           3                 9       27
           Expected Count 4.2         6.5        5.6         5.1               5.6     27.0
           female
           Count          7           5          8           8                 3       31
           Expected Count 4.8         7.5        6.4         5.9               6.4     31.0
Total      Count          9           14         12          11                12      58
           Expected Count 9.0         14.0       12.0        11.0              12.0    58.0
Chi-Square Tests
                      Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square    10.300a   4    .036
Likelihood Ratio      10.682    4    .030
N of Valid Cases      58
a. 2 cells (20.0%) have expected count less than 5. The minimum expected count is 4.19.
As the area in the tail of the chi-squared distribution is 0.036 (see value in last column) and thus less
than 5%, we conclude that there is an association between gender and the reason for joining the
centre.
2 cells (20%) have expected values of less than 5. This is acceptable.
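The same Pearson chi-square can be reproduced outside SPSS. A sketch assuming SciPy; the counts are taken from the crosstabulation above:

```python
# Chi-squared test of association between gender and main reason for joining.
from scipy.stats import chi2_contingency

observed = [[2, 9, 4, 3, 9],    # male counts across the five reasons
            [7, 5, 8, 8, 3]]    # female counts across the five reasons

chi2, p, dof, expected = chi2_contingency(observed)

# agrees with the SPSS output: Pearson chi-square = 10.300, df = 4, sig. = .036
print(round(chi2, 3), dof, round(p, 3))
```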
Topic 3:
Significance Testing For Means

Notation:
µ    is the mean of the population
x̄    is the mean of the sample
σ    is the standard deviation of the population
s    is the best estimate of the population standard deviation from a sample

[Reminder: We use the formula σ = √(Σ(x - µ)²/n) for the standard deviation of a
population and the formula s = √(Σ(x - x̄)²/(n - 1)) for the best estimate of the population
standard deviation from a sample.]
The Central Limit Theorem states that if random samples of the same size are repeatedly drawn from
a population of any distribution, the means of those samples will be Normally Distributed.
Additionally, the mean of the sample means will be the same as the population mean.
Mean of sample means = µ

If the population size is large relative to the sample size, n, then the standard deviation of the sample
means is equal to the population standard deviation divided by the square root of the sample size:

s.d. of sample means = σ/√n
Z-test for a Population Mean
We use the z-test for a population mean when the standard deviation of the population is known.
Procedure (two-tailed test):
1) Set up hypotheses                  H0: µ = some value    [null hypothesis]
                                      H1: µ ≠ some value    [alternative hypothesis]
2) Choose significance level          5% is commonly used
3) Calculate test statistic           z = (x̄ - µ) / (σ/√n)
4) Find critical value(s) from tables For a 2-tailed test, the area in each tail is
                                      half the significance level.
5) Accept or reject hypothesis
6) Summarise findings
Example:
The lengths of metal bars produced by a particular machine are Normally distributed with a mean
length of 420 cm and a standard deviation of 12 cm. After a recent service to the machine, a sample
of 100 bars was taken and the mean length of this sample was found to be 423 cm.
Is there any evidence, at the 5% level, of a change in the mean length of the bars produced by the
machine? Assume that the standard deviation remains the same after the service.
In this example:
Population mean                 µ = 420
Population standard deviation   σ = 12
Sample mean                     x̄ = 423
Sample size                     n = 100
1) H0: µ = 420    [mean length remains the same]
   H1: µ ≠ 420    [mean length changes, 2-tailed test]

2) Significance level, α = 0.05    [5%]

3) z = (x̄ - µ) / (σ/√n) = (423 - 420) / (12/√100) = 3/1.2 = 2.5
4) Critical values:
   In this example we are using a two-tailed test as the alternative hypothesis has two parts
   (less than 420 and greater than 420). We have to split the 5% significance level into 2 halves
   of 0.025 each. From the Normal tables (below), an area in a tail of 0.025 gives
   critical values of z = ±1.96.

5) The diagram (to be completed in seminar) shows the "accept" and "reject" areas for H0.
   As the test statistic z = 2.5 lies in the reject region, we reject H0.

6) We conclude that there is evidence to suggest, at the 5% level, that the mean length of the
   metal bars has changed since the machine was serviced.
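The six-step procedure for this example can be sketched in a few lines of code (assuming SciPy; only `norm.sf` is used, to attach a p-value to the test statistic):

```python
# z-test for the metal-bar example: known population s.d., two-tailed, 5% level.
from math import sqrt
from scipy.stats import norm

mu0, sigma, xbar, n = 420, 12, 423, 100

z = (xbar - mu0) / (sigma / sqrt(n))   # test statistic, as in step 3
p = 2 * norm.sf(abs(z))                # two-tailed p-value
reject = abs(z) > 1.96                 # compare with the 5% critical value

# z = 2.5 lies beyond 1.96, so H0 is rejected at the 5% level
print(z, round(p, 4), reject)
```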
N.B.
In this example we assumed that the standard deviation had not changed although we were
unsure about whether the mean length had changed. This is not very realistic. In general, if
we know the population standard deviation, we are likely to know the population mean as well
and thus have no need to test it.
It is much more usual that we know neither the population mean nor the population
standard deviation. This scenario is covered by the t-test.
If the sample size is large, we can still use the t-test.
Critical values of the Normal Distribution
Alpha is the area in one tail.

alpha    z value        alpha   z value
0.005    2.5758         0.05    1.6449
0.010    2.3263         0.10    1.2816
0.015    2.1701         0.15    1.0364
0.020    2.0537         0.20    0.8416
0.025    1.9600         0.25    0.6745
0.030    1.8808         0.30    0.5244
0.035    1.8119         0.35    0.3853
0.040    1.7507         0.40    0.2533
0.045    1.6954         0.45    0.1257
0.050    1.6449         0.50    0.0000
The t-test for a population mean
The t-test is used to test the value of a population mean when the population standard deviation is
not known.

Procedure:
1) Calculate the sample mean, x̄, and the estimated standard deviation, s.
   s = √(Σ(x - x̄)² / (n - 1))
2) Set up hypotheses                  H0: µ = some value
3) Choose significance level          5% is commonly used
4) Calculate test statistic           t = (x̄ - µ) / (s/√n)
5) Find critical value(s) from tables degrees of freedom = n - 1
6) Accept or reject hypothesis
7) Summarise findings
Example:
A random sample of 8 women yielded the following cholesterol levels:
3.1   2.8   1.5   1.7   2.4   1.9   3.3   1.6
Test whether the sample could be drawn from a population whose mean cholesterol level is 3.1.
Test at the 5% level of significance.

Population mean (to be tested):   µ = 3.1
Sample size:                      n = 8
Sample mean:                      x̄ = 2.2875    [from calculator]
Estimated standard deviation:     s = 0.7120    [from calculator]

H0: µ = 3.1
H1: µ ≠ 3.1    [2-tailed test]

Significance level = 5%, thus the area in each tail is α = 0.05/2 = 0.025    [2-tailed test]

t = (x̄ - µ) / (s/√n) = (2.2875 - 3.1) / (0.7120/√8) = -0.8125 / 0.2517 = -3.2276

degrees of freedom = n - 1 = 7
From the t-tables (next page), α = 0.025 gives critical values of ±2.3646.
As the value of the test statistic t = -3.2276 is outside the critical values, we reject H0.
There is evidence to suggest that this sample is not drawn from a population with a mean cholesterol
level of 3.1.
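This one-sample t-test is available as a single function call. A sketch assuming SciPy; `ttest_1samp` returns the t statistic and the two-tailed p-value:

```python
# One-sample t-test for the cholesterol example.
from scipy.stats import ttest_1samp

levels = [3.1, 2.8, 1.5, 1.7, 2.4, 1.9, 3.3, 1.6]

t_stat, p_value = ttest_1samp(levels, popmean=3.1)

# t is approximately -3.2276, matching the hand calculation; p < 0.05, so reject H0
print(round(t_stat, 4), round(p_value, 4))
```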
t- tables
0.1
area in one tail
0.05
0.025
0.01
0.005
0.0025
1
3.0777
6.3138
12.7062
31.8205
63.6567
127.3213
2
1.8856
2.9200
4.3027
6.9646
9.9248
14.0890
3
1.6377
2.3534
3.1824
4.5407
5.8409
7.4533
4
1.5332
2.1318
2.7764
3.7469
4.6041
5.5976
5
1.4759
2.0150
2.5706
3.3649
4.0321
4.7733
6
1.4398
1.9432
2.4469
3.1427
3.7074
4.3168
7
1.4149
1.8946
2.3646
2.9980
3.4995
4.0293
8
1.3968
1.8595
2.3060
2.8965
3.3554
3.8325
9
1.3830
1.8331
2.2622
2.8214
3.2498
3.6897
10
1.3722
1.8125
2.2281
2.7638
3.1693
3.5814
11
1.3634
1.7959
2.2010
2.7181
3.1058
3.4966
12
1.3562
1.7823
2.1788
2.6810
3.0545
3.4284
13
1.3502
1.7709
2.1604
2.6503
3.0123
3.3725
14
1.3450
1.7613
2.1448
2.6245
2.9768
3.3257
15
1.3406
1.7531
2.1314
2.6025
2.9467
3.2860
16
1.3368
1.7459
2.1199
2.5835
2.9208
3.2520
17
degrees of freedom   0.10     0.05     0.025    0.01     0.005    0.0025
17                   1.3334   1.7396   2.1098   2.5669   2.8982   3.2224
18                   1.3304   1.7341   2.1009   2.5524   2.8784   3.1966
19                   1.3277   1.7291   2.0930   2.5395   2.8609   3.1737
20                   1.3253   1.7247   2.0860   2.5280   2.8453   3.1534
21                   1.3232   1.7207   2.0796   2.5176   2.8314   3.1352
22                   1.3212   1.7171   2.0739   2.5083   2.8188   3.1188
23                   1.3195   1.7139   2.0687   2.4999   2.8073   3.1040
24                   1.3178   1.7109   2.0639   2.4922   2.7969   3.0905
25                   1.3163   1.7081   2.0595   2.4851   2.7874   3.0782
26                   1.3150   1.7056   2.0555   2.4786   2.7787   3.0669
27                   1.3137   1.7033   2.0518   2.4727   2.7707   3.0565
28                   1.3125   1.7011   2.0484   2.4671   2.7633   3.0469
29                   1.3114   1.6991   2.0452   2.4620   2.7564   3.0380
30                   1.3104   1.6973   2.0423   2.4573   2.7500   3.0298
31                   1.3095   1.6955   2.0395   2.4528   2.7440   3.0221
32                   1.3086   1.6939   2.0369   2.4487   2.7385   3.0149
33                   1.3077   1.6924   2.0345   2.4448   2.7333   3.0082
34                   1.3070   1.6909   2.0322   2.4411   2.7284   3.0020
35                   1.3062   1.6896   2.0301   2.4377   2.7238   2.9960
36                   1.3055   1.6883   2.0281   2.4345   2.7195   2.9905
37                   1.3049   1.6871   2.0262   2.4314   2.7154   2.9852
38                   1.3042   1.6860   2.0244   2.4286   2.7116   2.9803
39                   1.3036   1.6849   2.0227   2.4258   2.7079   2.9756
40                   1.3031   1.6839   2.0211   2.4233   2.7045   2.9712
41                   1.3025   1.6829   2.0195   2.4208   2.7012   2.9670
42                   1.3020   1.6820   2.0181   2.4185   2.6981   2.9630
43                   1.3016   1.6811   2.0167   2.4163   2.6951   2.9592
44                   1.3011   1.6802   2.0154   2.4141   2.6923   2.9555
45                   1.3006   1.6794   2.0141   2.4121   2.6896   2.9521
46                   1.3002   1.6787   2.0129   2.4102   2.6870   2.9488
50                   1.2987   1.6759   2.0086   2.4033   2.6778   2.9370
55                   1.2971   1.6730   2.0040   2.3961   2.6682   2.9247
60                   1.2958   1.6706   2.0003   2.3901   2.6603   2.9146
65                   1.2947   1.6686   1.9971   2.3851   2.6536   2.9060
70                   1.2938   1.6669   1.9944   2.3808   2.6479   2.8987
75                   1.2929   1.6654   1.9921   2.3771   2.6430   2.8924
80                   1.2922   1.6641   1.9901   2.3739   2.6387   2.8870
85                   1.2916   1.6630   1.9883   2.3710   2.6349   2.8822
90                   1.2910   1.6620   1.9867   2.3685   2.6316   2.8779
95                   1.2905   1.6611   1.9853   2.3662   2.6286   2.8741
100                  1.2901   1.6602   1.9840   2.3642   2.6259   2.8707
200                  1.2858   1.6525   1.9719   2.3451   2.6006   2.8385
300                  1.2844   1.6499   1.9679   2.3388   2.5923   2.8279
400                  1.2837   1.6487   1.9659   2.3357   2.5882   2.8227
500                  1.2832   1.6479   1.9647   2.3338   2.5857   2.8195
600                  1.2830   1.6474   1.9639   2.3326   2.5840   2.8175
700                  1.2828   1.6470   1.9634   2.3317   2.5829   2.8160
800                  1.2826   1.6468   1.9629   2.3310   2.5820   2.8148
900                  1.2825   1.6465   1.9626   2.3305   2.5813   2.8140
1000                 1.2824   1.6464   1.9623   2.3301   2.5808   2.8133
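Software can reproduce the table entries: a minimal Python sketch, assuming the scipy library is available and that the six columns are the one-tail probabilities 0.10, 0.05, 0.025, 0.01, 0.005 and 0.0025.

```python
# Check entries of the t-table using scipy's t-distribution (assumed available).
from scipy.stats import t

def t_critical(df, one_tail_alpha):
    """Upper-tail critical value of the t-distribution with df degrees of freedom."""
    return t.ppf(1 - one_tail_alpha, df)

# e.g. the df = 19 row of the table above
row_19 = [round(t_critical(19, a), 4) for a in (0.10, 0.05, 0.025, 0.01, 0.005, 0.0025)]
print(row_19)
```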
Seminar Exercise
1.
The commissions earned by an estate agent last year were Normally distributed with a mean
of £2,560 and a standard deviation of £310. A random sample of 24 sales from this year was
examined and the mean commission found to be £2,690.
Assuming no change in the standard deviation, does this evidence provide significant
evidence of a change in the mean commission?
Test at the 5% level of significance.
2.
A random sample of 40 sacks of animal feed is taken from a population whose mean is
unknown but whose variance is 15.7 kg2. The mean weight of the sample is 86.3 kg.
Test the hypothesis that the mean weight of the sacks is 86 kg.
Test at the 5% significance level.
3.
The mean time taken by a standard piece of software to run a particular task is 64 seconds. In
order to compare a different piece of software it was given a sample of 15 tasks of the same
type, the times being:
61.8
65.0
73.0
67.5
72.0
69.0
68.1
59.8
63.1
61.5
71.5
63.1
69.4
68.9
67.0
Does this data indicate a difference in the mean times between the two pieces of
software? Test at the 5% level of significance.
4.
You have been told by "experts" that the average weekly salary for a part-time secretary is
£275. You decide to check this assertion by taking a random sample of 30 secretaries and
obtain a sample mean of £270 and a standard deviation of £40.
Test at the 5% level to determine whether the experts are correct.
5.
During 2011 the number of beds required at a hospital was Normally distributed with a mean
of 1800 a day and a standard deviation of 190 per day.
During the first 50 days of 2012, the average daily requirement for beds was 1830. This data
is considered to be a valid sample for 2012. A senior hospital manager has claimed that this
gives evidence that the requirement for beds has changed since 2011. Would you agree?
Is the sampling method valid?
Answers:
1.
Test statistic, z = 2.05
reject H0
mean has changed
2.
Test statistic z = 0.4788
do not reject H0
3.
Test statistic t = 2.54
reject H0
times are not the same
4.
Test statistic t = -0.6847
do not reject H0
experts could be right
5.
Test statistic z = 1.116
do not reject H0
requirement for beds has not changed
Sampling method is questionable as only covers winter (January and February) and thus will
not represent the demand for beds for the whole year. Illnesses are often seasonal.
One-tailed t-test for a Population Mean
In a one tailed test the alternative hypothesis is that the population mean is either greater than a
particular value or it is less than a particular value. (With a two-tailed test it was simply “not equal”)
Example:
A jeweller was sold some silver wire that he was suspicious about. If the wire was pure silver it would
give a reading of 1.5 ohms when tested for electrical resistance. If the wire was not pure silver the
resistance would be increased.
The jeweller tested 5 pieces of the dubious wire and the following readings for electrical resistance
were obtained:
1.51
1.49
1.54
1.52
1.54
Test at the 5% level the hypothesis that the wire is pure silver.
µ = 1.5 [to be tested]
x̅ = 1.52 [from calculator]
s = 0.0212 [from calculator]
n = 5
H0 : µ = 1.5 [it is pure silver]
H1 : µ > 1.5 [it is not pure silver, resistance is increased] [1-tailed test]
Significance level = 5%, thus α = 0.05
t = (x̅ − µ) / (s/√n) = (1.52 − 1.5) / (0.0212/√5) = 0.02 / 0.0095 = 2.105
α = 0.05
degrees of freedom = n-1 = 4
critical value = +2.1318
[we need the positive critical value for H1 > some value]
[if H1 had been < some value, we would have used the
negative critical value]
Do not reject H0.
There is no significant evidence to suggest that the silver wire was impure.
The jeweller should allay his suspicions.
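The silver-wire test above can also be run in software; a sketch in Python, assuming the scipy library is available (`alternative='greater'` selects the one-tailed test).

```python
# One-tailed one-sample t-test for the silver-wire example above.
from scipy import stats

readings = [1.51, 1.49, 1.54, 1.52, 1.54]

# alternative='greater' tests H1: mean > 1.5 (one-tailed)
t_stat, p_value = stats.ttest_1samp(readings, popmean=1.5, alternative='greater')

print(round(t_stat, 3))   # ~2.108 (the handout rounds s to 0.0212 and gets 2.105)
print(p_value > 0.05)     # True -> do not reject H0 at the 5% level
```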
Exercises
One tailed t-test
1.
A firm which manufactures panels tests a sample of 13 panels, loading them until they crack.
The results of this test are the following loads (in Newtons).
2.7  7.2  4.6  8.6  3.1  8.9  4.2  7.3  3.3  8.9  6.7  8.9  8.6
An important customer of this firm asserts that the quality of the panels is getting worse and
that the mean cracking load is now less than the previous value of 7 Newtons.
Does the evidence bear out the customer's claim?
Test at the 5% level.
2.
A health clinic claims that people following its diet programme will lose, on average, at least 8
kilograms during the programme. A random sample of 41 people on the programme showed a
mean weight loss of 7 kilograms. The sample standard deviation, s, was found to be 3.1
kilograms. Test at the 5% level of significance whether the company is exaggerating, i.e. the
mean weight loss, in general, is less than 8 kilograms.
3.
A manager in a university department claimed that the average hours members of staff work
each week is 35 hours. The number of hours worked in a particular week by a random sample
of 12 lecturers was recorded. The results are given below:
37, 41, 32, 45, 39, 35, 43, 38, 40, 42, 38, 34
The staff union claims that the average number of hours worked is greater than 35.
Does the evidence bear out the union's claim?
Test at the 5% level of significance.
Answers:
1.
mean =6.3846, sd = 2.4535, Test statistic t = -0.9044, critical = -1.782
do not reject H0 Evidence does not support the claim
2.
Test statistic t = -2.065, critical = -1.684
reject H0 Mean weight loss is less than 8 kg. Company is exaggerating.
3.
Mean = 38.66, sd = 3.821,
Test statistic t = +3.3237, critical = + 1.796
reject H0 Mean hours worked is greater than 35.
Evidence does bear out the union's claim.
Topic 4
Two-sample test and test for proportions
Two Sample t-Test for Population Means
Suppose we have two independent random samples:
Sample A: x1, x2, x3, ....., xnx from a Normal distribution with mean µx and standard deviation σ;
it has sample mean x̅, sd = sx and sample size nx
Sample B: y1, y2, y3, ....., yny from a Normal distribution with mean µy and standard deviation σ;
it has sample mean ȳ, sd = sy and sample size ny
Notice that we are assuming that the population standard deviations are the same. Since this standard
deviation is unknown we have to find an estimate for it.
Notation
                            sample A    sample B
sample size                 nx          ny
sample mean                 x̅           ȳ
sample standard deviation   sx          sy
The estimate for the common standard deviation is given by s, where

s² = [(nx − 1)sx² + (ny − 1)sy²] / (nx + ny − 2)

The test statistic is

t = [(x̅ − ȳ) − (µx − µy)] / [s√(1/nx + 1/ny)]

which fits a t-distribution with nx + ny − 2 degrees of freedom.

We usually use a two-sample t-test to determine if the population means are equal,
i.e. (µx − µy = 0).
In this case, the test statistic is

t = (x̅ − ȳ) / [s√(1/nx + 1/ny)]
Example:
As part of an investigation undertaken by a telephone company, a comparison of the weekly phone
bills in two areas was undertaken. The figures are given below :
Area A: 5.7  12.0  10.1  13.7  11.9  11.7  10.4  7.3  5.3  6.8  11.8
Area B: 8.9  3.0   8.2   5.2   2.2   5.7   3.2   9.6  3.1  3.9
Test, at the 5% level to see if there is any difference between the mean phone bills in the two areas.
Calculations:
                Area A            Area B
sample means    x̅ = 9.7           ȳ = 5.3
sample s.d.     sx = 2.91067      sy = 2.7109
                sx² = 8.47200     sy² = 7.3489
sample size     nx = 11           ny = 10

s² = [(nx − 1)sx² + (ny − 1)sy²] / (nx + ny − 2) = [(11−1)×8.472 + (10−1)×7.3489] / (11+10−2)
   = (84.72 + 66.1401)/19 = 7.940

common variance, s² = 7.940
common standard deviation, s = 2.8178

H0: µx = µy or µx − µy = 0 [population means are the same]
H1: µx ≠ µy [pop. means are not the same]
Choose α = 5%
The test statistic is

t = (x̅ − ȳ) / [s√(1/nx + 1/ny)] = (9.7 − 5.3) / (2.8178√(1/11 + 1/10)) = 4.4 / (2.8178√0.1909) = 3.574
If α = 5% for a 2-tailed test, each tail has 0.025, degrees of freedom = nx + ny -2 = 19
critical values = ± 2.0930
As the test statistic t = 3.574 is outside the "accept region" (± 2.0930), reject H0.
The mean amount spent on telephone bills is not the same in the two areas
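The phone-bill comparison above can be sketched with scipy's pooled two-sample t-test (scipy assumed available; `equal_var=True` uses the common variance, matching the formula in this section).

```python
# Pooled two-sample t-test for the phone-bill example above.
from scipy import stats

area_a = [5.7, 12.0, 10.1, 13.7, 11.9, 11.7, 10.4, 7.3, 5.3, 6.8, 11.8]
area_b = [8.9, 3.0, 8.2, 5.2, 2.2, 5.7, 3.2, 9.6, 3.1, 3.9]

# equal_var=True pools the two sample variances, as in the hand calculation
t_stat, p_value = stats.ttest_ind(area_a, area_b, equal_var=True)

print(round(t_stat, 3))   # ~3.574
print(p_value < 0.05)     # True -> reject H0: the mean bills differ
```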
Exercise:
1.
2 sample t-test
Independent random samples of current account balances at two branches of a particular bank
yielded the following results:
Branch      number of accounts sampled   sample mean balance   sample standard deviation
Holloway    12                           £1,000                £150
Islington   10                           £920                  £120
Test , at the 5% level of significance, whether there is any difference between the mean
balances in the two branches of the bank.
2.
A firm is studying the delivery times of two raw material suppliers. The firm is basically satisfied
with supplier A and is prepared to stay with that supplier if the mean delivery time is roughly
the same as that of supplier B.
Independent samples gave the following results:
              sample size   sample mean delivery time   sample standard deviation
supplier A    50            14 days                     3 days
supplier B    30            12.5 days                   2 days
Test, at the 5% level of significance, whether there is any difference in the mean times for
delivery.
3.
In a wage discrimination case involving male and female employees, independent samples of
male and female employees with five years' experience or more provided the hourly wage
results shown in the table.
                   sample size   sample mean (£ per hour)   sample standard deviation
male employees     44            £9.25                      £1.00
female employees   32            £8.70                      £0.80
Does wage discrimination appear to be present in this case?
Test at the 5% level of significance.
Test at the 1% level of significance.
Do your conclusions depend upon the level of significance?
4.
A University careers officer decided to collect data on the starting salaries for graduates in
different subjects ten years ago. She then wanted to compare the averages from ten years
ago with those from last year's graduates. Among the areas investigated were accounting
graduates and general business graduates.
The salaries (in £1,000 per annum) found from two samples are given below:
Accounting: 14.4  12.6  13.1  14.0  13.5  13.1  14.1  12.4  12.6  14.9  14.6  14.6
Business:   13.2  11.8  12.5  11.5  14.9  12.3  14.5  13.7  12.2  13.4  13.1
Use a 5% level of significance to test the hypothesis that there is no difference between the
mean annual starting salary of Accounting graduates and the mean starting salary of Business
graduates ten years ago.
What is your conclusion?
Answers:
1.
Common s = 137.31, t = 1.36, critical = ± 2.0860
do not reject H0 : no significant difference in means
2.
Common s = 2.6723, t = 2.4306, critical = ± 2.0 (approx)
reject H0 : there is a difference in the means
3.
Common s = 0.9215, t = 2.569
a) critical = ± 1.9921 (approx); reject H0 : there is a difference in the means
b) critical = ± 2.6430 (approx); do not reject H0 : there is no difference in the means
The difference in wages is significant at the 5% level but not at the 1% level.
4.
Accounting: mean = 13.6583, sample s.d. = 0.8867
Business: mean = 13.0091, sample s.d. = 1.0784
common s = 0.9827, t = 1.5826, critical = ± 2.0796
do not reject H0 : no difference in the mean starting salaries
Test for a difference in proportions
Consider 2 random samples of sizes nx and ny with proportions of a 'success' equal to px and py
respectively. We wish to determine if there is any significant difference between the 2 proportions.
The first step is to calculate the common proportion P.
P = (nx px + ny py) / (nx + ny)

then Q = 1 − P

Note: P can be remembered as: P = total number of successes / total sampled
The test statistic is

Z = (px − py) / √(PQ(1/nx + 1/ny))
which fits a Normal distribution with mean 0 and standard deviation 1, i.e. the standardised Normal
distribution as found in tables.
The null hypothesis, H0 , will be that there is no difference between the 2 proportions.
Example:
Two newspapers conducted opinion polls asking voters whether they would vote for Mr Whittington as
the next Mayor of London.
In the Evening News poll, 325 voters out of 500 said that they would vote for Mr Whittington.
In the Morning Metro poll, 201 voters out of 300 polled said they would vote for Mr. Whittington.
Is there any difference in the results of the two polls?
Answer:
H0 : there is no difference between the proportions saying they will vote for Mr Whittington.
H1 : there is a significant difference between the proportions saying they will vote for Mr Whittington.
                      Evening News           Morning Metro        total
number of successes   325                    201                  526
sample proportions    px = 325/500 = 0.65    py = 201/300 = 0.67
sample size           nx = 500               ny = 300             800

P = (nx px + ny py) / (nx + ny) = (500×0.65 + 300×0.67) / (500 + 300) = (325 + 201)/800
  = 526/800 = 0.6575

Note: we could have calculated P directly from the totals column.
P = 0.6575, thus Q = 1 − P = 0.3425
The test statistic is

Z = (px − py) / √(PQ(1/nx + 1/ny)) = (0.65 − 0.67) / √(0.6575 × 0.3425 × (1/500 + 1/300))
  = −0.02 / √(0.225194 × (0.002 + 0.00333)) = −0.02 / √0.001201 = −0.5771
Testing at the 5% level of significance, with a 2-tailed test, the critical values are ± 1.96
Do not reject H0 ,
Conclude:
There is no significant difference between the results of the 2 opinion polls.
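The opinion-poll test above can be sketched with only the Python standard library:

```python
# Two-proportion z-test for the opinion-poll example above.
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: the two population proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    P = (x1 + x2) / (n1 + n2)          # pooled (common) proportion
    Q = 1 - P
    return (p1 - p2) / sqrt(P * Q * (1 / n1 + 1 / n2))

# Evening News: 325 of 500; Morning Metro: 201 of 300
z = two_proportion_z(325, 500, 201, 300)
print(round(z, 4))   # -0.5771 -> inside ±1.96, do not reject H0
```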
Seminar Exercises: Difference in proportions
1.
A medical research unit decided to test two drugs, A and B, for reducing blood pressure. The
drugs were given to 2 sets of volunteers. One group of 90 volunteers was treated with drug A
and 60 of these volunteers reported lower blood pressure. The second group of 80 volunteers
was treated with Drug B; of these 50 reported lower blood pressure. Test at the 5% level of
significance if there is any difference between the 2 drugs' ability to lower blood pressure.
2.
A survey firm conducted door-to-door interviews on a new consumer product. Some
individuals co-operated with the interviewers and completed the questionnaire whilst other
individuals did not co-operate. The sample data is shown in the table below.
Testing at the 5% level of significance, test the hypothesis that the rate of co-operation is the
same for both men and women.
          sample size   number co-operating
men       200           110
women     300           210
3.
Two universities were planning to merge. At one of the 2 universities, NLU, 200 staff were
interviewed. Of these 44 said that they thought the new merged University would be “good for
the local area”. At the second university, GLU, 48 of the 300 staff interviewed thought the new
University would be “good for the local area”.
Test, at the 5% level of significance, whether there is any difference in the two proportions who
think that the new university would be “good for the local area”.
Test at the 10% level of significance.
4.
A check was conducted on a form completed in 2 offices. The first office yielded a random
sample of 250 forms, of which 35 contained errors. The second office had a sample size of
300 and 27 forms that contained errors.
Test at the 10% level of significance whether there is any difference in the proportion of forms
containing errors between the 2 offices.
Answers:
1.
P = 0.647, z = 0.5679. No significant difference in the proportions.
2.
P = 0.64, z = −3.423. There is a significant difference.
3.
P = 0.184, z = 1.69. No significant difference at the 5% level; significant difference at the 10% level.
4.
P = 0.1127, z = 1.85. There is a significant difference.
Practical session
Use of SPSS for a t-test for a population mean and a two-sample t-test
One sample t-test
Example : We wish to test if the mean age of the users of the Seabridge sports centre could come from
a population with mean age = 40.
Choose Analyse, Compare Means , One-Sample T-test
Choose age as the “test variable” and enter 40 as the “test value”
Using the Options button choose 95% as the Confidence Interval Percentage
Now choose Continue and OK
The SPSS output is :
One-Sample Statistics
                        N    Mean    Std. Deviation   Std. Error Mean
Q6 age of respondent    58   36.19   11.607           1.524
One-Sample Test
Test Value = 40
                                                                95% Confidence Interval
                                                                of the Difference
                        t        df   Sig. (2-tailed)   Mean     Lower    Upper
                                                        Difference
Q6 age of respondent    -2.500   57   .015              -3.810   -6.86    -.76
In this case we would reject the hypothesis that the mean age of the population is 40 in favour of the
alternative hypothesis that it is not 40 (the value in the tail is 0.015, which is less than 0.05). This is not
surprising as the sample mean is 36.19.
If we repeat this test with a test value of 36
We get :
One-Sample Test
Test Value = 36
                                                                95% Confidence Interval
                                                                of the Difference
                        t      df   Sig. (2-tailed)   Mean       Lower   Upper
                                                      Difference
Q6 age of respondent    .124   57   .901              .190       -2.86   3.24
In this case we would not reject the null hypothesis that the mean age of the population is 36 as the
value in the tail is 0.901 (i.e. above 0.05).
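The t value in the SPSS output can be reproduced by hand from the summary statistics alone; a sketch in plain Python (n, mean and sd are taken from the output above):

```python
# One-sample t statistic from summary statistics (values from the SPSS output).
from math import sqrt

n, xbar, s = 58, 36.19, 11.607
test_value = 40

std_error = s / sqrt(n)                 # the "Std. Error Mean" column (~1.524)
t_stat = (xbar - test_value) / std_error

print(round(std_error, 3))   # 1.524
print(round(t_stat, 2))      # -2.5, matching the reported t of -2.500
```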
Using SPSS for a two-sample t-test.
As an example we can assume that the male and female members were surveyed independently and
thus can be considered as two independent samples . We can now test to see if there is a difference in
the mean age of the male and female members.
Choose age as the Test Variable and gender as the Grouping Variable
Now define groups using the codes on the data file ( 1 and 2)
Choose Continue and then OK
The output takes the form of :
Group Statistics
Q6 age of respondent
Q5 gender   N    Mean    Std. Deviation   Std. Error Mean
male        27   34.67   11.622           2.237
female      31   37.52   11.619           2.087
Independent Samples Test
Q6 age of respondent
                              Levene's Test for
                              Equality of Variances   t-test for Equality of Means
                                                                                                        95% Confidence Interval
                                                             Sig.         Mean         Std. Error      of the Difference
                              F       Sig.    t       df     (2-tailed)   Difference   Difference      Lower     Upper
Equal variances assumed       .037    .849    -.932   56     .356         -2.849       3.059           -8.977    3.278
Equal variances not assumed                   -.932   54.907 .356         -2.849       3.059           -8.980    3.281
The values in the tails are 0.356 if we assume that the populations from which the samples were drawn
have equal variances and 0.356 if we do not make this assumption. In both cases the values are
greater than 0.05 and thus we do not reject the hypothesis that they have equal means.
This is effectively saying that a female mean age of 37.52 and a male mean age of 34.67 are not
significantly different.
Topic 5
Confidence Intervals
Point and Interval Estimates.
The mean rent paid by students in a provincial town is £102 per week with a standard deviation of £10.
If we were ignorant of this fact (which we would be unless a census had been taken) we would have
to conduct a survey to estimate the population mean.
If the results of the survey gave a sample mean of £100 we could use this to estimate the value of the
population mean. This value of the sample mean, £100, is known as a point estimate because it
consists of just one value. Point estimates can be very misleading as they give a false impression of
accuracy. By themselves they do not recognise the fact that they are only estimates and thus are
subject to a degree of uncertainty.
This question of uncertainty is addressed by using interval estimates such as confidence intervals.
A point estimate is: 'The mean weekly rent is £100.'
An interval estimate would be: 'The mean weekly rent is in the range £96 to £104.'
Confidence Interval for a Population Mean - Known Population Standard Deviation
If the population standard deviation is known then we can use the Normal distribution in our
calculation of the confidence interval.
The formula for a Confidence Interval is

x̅ ± zα σ/√n

where zα is found from the Normal tables, n is the size of the sample and σ is the standard deviation
of the population from which the sample is drawn.
For a 95% confidence interval α = (100% − 95%)/2 = 5%/2, i.e. α = 0.025, zα = 1.96
For a 90% confidence interval α = (100% − 90%)/2 = 10%/2, i.e. α = 0.05, zα = 1.6449
For an 80% confidence interval α = (100% − 80%)/2 = 20%/2, i.e. α = 0.10, zα = 1.2816
Interpretation of a Confidence Interval
A 95 % confidence interval gives a range within which there is a 95% probability that the population
mean lies.
If we took 100 samples and for each calculated the sample mean, standard deviation and hence the
95% confidence intervals, 95 of these intervals would contain the population mean.
Thus 5% of the time the population mean will NOT lie in the 95% confidence interval.
Example:
An accountant knows, from past experience that the standard deviation of the value of all invoices is
£11.60. However he has forgotten the value of the mean.
In order to estimate the value of the mean, he takes a sample of 20 invoices and finds a sample mean
of £51.41.
Determine a 95% confidence interval for the mean value of all invoices.
95% confidence interval:

x̅ ± zα σ/√n = 51.41 ± 1.96 × 11.60/√20 = 51.41 ± 5.084 = [46.33, 56.49]

95% confidence interval is [£46.33, £56.49]
Thus there is a 95% confidence that the mean value of all invoices lies in the range £46.33 to £56.49.
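The invoice calculation can be sketched in plain Python (1.96 is the 95% z value from the section above):

```python
# z-based confidence interval when the population standard deviation is known.
from math import sqrt

xbar, sigma, n = 51.41, 11.60, 20
z = 1.96                               # 95% two-sided critical value

margin = z * sigma / sqrt(n)
ci = (xbar - margin, xbar + margin)

print(round(margin, 3))                      # ~5.084
print(round(ci[0], 2), round(ci[1], 2))      # 46.33 56.49, as above
```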
Confidence Interval for a Population Mean - Unknown Population Standard Deviation
If the population standard deviation is not known then we use the t distribution in our calculation of the
confidence interval. We use the value s as the best estimate of the population standard deviation.
The formula for a Confidence Interval is

x̅ ± tα s/√n

where tα is found from the t-tables. The degrees of freedom is n − 1.
Example:
The heights, in cm, of 6 policemen were:
180  176  179  181  183  179
Calculate
a)
a 90% Confidence Interval for the mean heights of all policemen
b)
a 95% Confidence Interval for the mean heights of all policemen
c)
If we had obtained the same sample statistics (mean and standard deviation) from a sample of
60 policemen what effect would this have on the 95% Confidence Interval?
Mean x̅ = 179.667, sample standard deviation s = 2.3381 [results from calculator]

a) 90% Confidence Interval
α = 0.05, degrees of freedom = n − 1 = 5, tα = 2.0150

x̅ ± tα s/√n = 179.667 ± 2.015 × 2.3381/√6 = 179.667 ± 1.9234 = [177.74, 181.59]
There is a 90% confidence that the mean height of all policemen lies in the range 177.74cm to 181.59
cm.
b) 95% Confidence Interval
α = 0.025, degrees of freedom = n − 1 = 5, tα = 2.5706

x̅ ± tα s/√n = 179.667 ± 2.5706 × 2.3381/√6 = 179.667 ± 2.4537 = [177.21, 182.12]

N.B. A higher percentage confidence interval gives a wider interval.
c) 95% Confidence Interval with sample size = 60
α = 0.025, degrees of freedom = n − 1 = 59, tα = 2.0003 (using df = 60)

x̅ ± tα s/√n = 179.667 ± 2.0003 × 2.3381/√60 = 179.667 ± 0.60368 = [179.06, 180.27]
N.B. Larger sample sizes give narrower confidence intervals.
Thus for a "more accurate" estimate of the mean (i.e. a narrower confidence interval), take a larger
sample. I hope this was obvious before, but we have now shown it to be the case.
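The policemen example can be checked in software; a sketch assuming scipy and numpy are available:

```python
# t-based confidence intervals for the policemen heights above.
import numpy as np
from scipy import stats

heights = np.array([180, 176, 179, 181, 183, 179])
n = len(heights)
xbar = heights.mean()
s = heights.std(ddof=1)                     # sample standard deviation (~2.3381)

for conf in (0.90, 0.95):
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    margin = t_crit * s / np.sqrt(n)
    print(conf, round(xbar - margin, 2), round(xbar + margin, 2))
# 0.90 -> 177.74 181.59 and 0.95 -> 177.21 182.12, as above
```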
Confidence Interval for a Population Proportion
A proportion can represent any set amount and is mainly used when the data is not numerical.
Let p = sample proportion
(expressed as a decimal)
then q = 1- p
If np > 5 and nq > 5 then we can calculate a Confidence Interval for a proportion by:
p ± Zα √(pq/n)
Example:
A random sample of 1,000 electors was polled and 400 of the electors said that they will vote Labour.
How accurate an estimate is this sample proportion with respect to how all electors will vote?
p = 400/1000 = 0.4, thus q = 0.6
For a 95% Confidence Interval, Zα = 1.96

95% Confidence Interval:
p ± Zα √(pq/n) = 0.4 ± 1.96 × √((0.4 × 0.6)/1000) = 0.4 ± 1.96 × √0.00024
              = 0.4 ± 1.96 × 0.0155 = 0.4 ± 0.03036

95% Confidence Interval: (0.3696, 0.43036)
There is a 95% probability that the proportion of all electors who will vote Labour lies in the range 36.96%
to 43.036%. More sensibly (37% to 43%) will vote Labour with a confidence of 95%.
This is usually reported in the press as “ 40% will vote Labour with an error due to sampling of plus or
minus 3%”
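A plain-Python sketch of the same proportion interval:

```python
# Normal-approximation confidence interval for a proportion (polling example above).
from math import sqrt

x, n = 400, 1000
p = x / n
q = 1 - p
z = 1.96                             # 95% two-sided critical value

margin = z * sqrt(p * q / n)
print(round(margin, 5))                             # ~0.03036
print(round(p - margin, 4), round(p + margin, 4))   # 0.3696 0.4304
```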
Sample Size Required for a given level of accuracy
We can re-arrange the formula above to help us calculate the sample size required for a given level of
accuracy.
Notation : let M be the size of the margin of error .
Then from above: M = Zα √(pq/n)
For a 95% confidence interval Z α = 1.96
The size of the product pq will vary according to the value of p. But as q = 1 − p, we can say that the
quantity pq = p(1 − p).
From the graph below of y = p(1 − p) we can see that the largest possible value of pq is 0.25. This
occurs when both p and q are 0.5.

[Graph: value of pq = p(1 − p) plotted against p, rising from 0 at p = 0 to a maximum of 0.25 at
p = 0.5 and falling back to 0 at p = 1]
For a 95% confidence interval we now have:

largest error: M = Zα √(pq/n) = 1.96 √(0.25/n)

rearranging this formula to make n the subject gives:

M/1.96 = √(0.25/n)
M²/1.96² = 0.25/n
n = 0.25 × 1.96² / M² = 0.9604/M²
Thus if a 95% Confidence Interval required a maximum margin of error of 2% (M = 0.02):

n = 0.25 × 1.96² / M² = 0.9604/(0.02)² = 2401

Some results:

margin M    0.9604/M²    minimum sample size
1%          9604         9604
2%          2401         2401
3%          1067.1111    1068
4%          600.25       601
5%          384.16       385

NB: Sample size must be an integer.
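The rule n = 0.9604/M² can be wrapped in a small Python helper (`math.ceil` rounds up to the integer sample size; the tiny guard only avoids floating-point noise at exact integers):

```python
# Minimum sample size for a 95% CI on a proportion with margin of error M,
# using the worst case pq = 0.25 as derived above.
from math import ceil

def min_sample_size(margin, z=1.96):
    n = 0.25 * z * z / (margin * margin)
    return ceil(n - 1e-9)   # guard against floating-point noise at exact integers

for m in (0.01, 0.02, 0.03, 0.04, 0.05):
    print(m, min_sample_size(m))
# 9604, 2401, 1068, 601 and 385, matching the table above
```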
Seminar Exercise
1.
An ambulance station is looking at its response times and takes a sample of 16 call outs and
finds that the mean of the sample is 17 minutes whilst the standard deviation of the sample is
5 minutes. What are the 99% confidence limits for the mean response time of all call outs?
2.
In order to evaluate the success of a television advertising campaign for a new product, a
company interviewed 400 residents in the television area. 120 of them knew about the
product. How accurately does this estimate the percentage of residents in the area who know
about the product?
Calculate a 95% confidence interval.
3.
A random sample of 400 rail passengers is taken and 55% are in favour of proposed new
timetables. Calculate a 95% confidence interval for the proportion of all passengers that are in
favour of the timetables.
4.
A manufacturer needs to estimate the mean life of batteries they are producing. To do this
they take a sample of 100 batteries and find that the mean life of this sample is 50 hours.
The standard deviation of all battery lifetimes is known to be 6 hours.
Calculate
a)
a 95% Confidence Interval for the mean of all battery lifetimes.
b)
a 99% Confidence Interval for the mean of all battery lifetimes.
5.
The Human Resources department of a company is leading a campaign to reduce
absenteeism of staff.
Last year 15% of the 59,202 hours which should have been worked over a 46 week year
were lost due to absenteeism.
The campaign that was set up has now been running for 20 weeks. Absenteeism reports for
the last 10 weeks of the campaign were examined. The weekly hours lost were:
week no.     11   12   13   14   15   16   17   18   19   20
hours lost   195  190  162  170  177  190  198  177  184  191
a)
Considering these 10 weeks as a random sample of all the weeks to be worked, has
there been a significant decrease in absenteeism over last year? You should carry
out a test for the mean number of hours lost per week. Test at the 10% level of
significance, and explain your choice of test statistic.
b)
Calculate a 90% confidence interval for the mean number of hours lost per week after
the campaign, and explain its meaning.
6. Last Christmas, the average value of purchases per customer at a toyshop was £36.00 with a
standard deviation of £10.25.
The toyshop is anxious to know early on whether this year's average Christmas spend is
different, so it takes a random sample of 15 customers. Their spend on toys is (in £):
22  48  36  45  35  11  22  69  86  45  57  43  22  24  17
The shop wishes to be 90% confident that the error due to sampling is no larger than £3.00.
Assuming that the population standard deviation remains at £10.25:
a)
Is the sample taken sufficiently large to achieve the accuracy objective?
b)
Test whether the average spend this Christmas is significantly different from last
year’s. Test at the 10% level of significance.
c)
Establish an 80% confidence interval for the average spend this year.
d)
Are your answers to (b) and (c) consistent? Explain your reasoning.
Answers
1.
(13.32, 20.68)
2.
(25.5%, 34.5%)
3.
(50.1% , 59.9%)
4
a)
(48.824, 51.176)
b)
(48.455 , 51.545)
5
a) sample mean = 183.4, sample sd = 11.6065, last year mean = 193.05
t = -2.629, critical = -1.383; reject H0 : there has been a decrease in hours lost
b) (176.67 , 190.13)
6
a)
sample size is too small, it should be at least 32
b) z = 1.058
do not reject H0 ,
Mean is not significantly different from last year.
c) (35.41 , 42.19)
d)
both b) and c) imply that the value 36 would lie in a
90% confidence interval, i.e. are consistent
Practical session : Using SPSS to find Confidence Intervals
To find a 95% confidence interval for the length of membership:
Choose Analyze, Descriptive Statistics and then Explore
Select “length of membership” as the dependent list.
Now choose Statistics button and use 95% for the Confidence Interval for Mean
Now choose Continue and then OK
The output is :
Case Processing Summary
                          Cases
                          Valid           Missing        Total
                          N    Percent    N    Percent   N    Percent
Q8 length of membership   60   100.0%     0    .0%       60   100.0%
Descriptives
                                                     Statistic   Std. Error
Q8 length of membership
  Mean                                               2.32        .131
  95% Confidence Interval for Mean   Lower Bound     2.05
                                     Upper Bound     2.58
  5% Trimmed Mean                                    2.30
  Median                                             2.00
  Variance                                           1.034
  Std. Deviation                                     1.017
  Minimum                                            1
  Maximum                                            4
  Range                                              3
  Interquartile Range                                1
  Skewness                                           .320        .309
  Kurtosis                                           -.956       .608
This gives a confidence interval of [2.05, 2.58]. We could report this as:
There is a 95% confidence that the mean length of membership is between 2 and 2½ years.
Topic 6: Correlation and regression
1. Pearson product moment coefficient of correlation
If we wish to measure the strength of linear relationship between two variables (x and y), and the data
is cardinal, we use the Pearson product moment coefficient of correlation (r), which is defined as
follows:
r = (nΣxy − ΣxΣy) / √((nΣx² − (Σx)²) × (nΣy² − (Σy)²))
The value of r lies in the range −1 ≤ r ≤ +1. An r value of −1 signifies a perfect negative linear correlation,
+1 a perfect positive linear correlation, and 0 no linear correlation. These scenarios are most easily
demonstrated on a scatter graph. Note that the coefficient only measures the strength of relationship;
it is not evidence of cause and effect.
In effect r measures how adequately a scatter of observations can be represented by a straight line.
However it is more meaningful to interpret the strength of the relationship by looking at r 2 (known as
the coefficient of determination, where 0 < r2 < +1) which measures the proportion (or percentage) of
variation in y which is explained by the variation in x. To focus on r tends to overstate the strength of
the relationship (e.g. r = 0.7 seems to suggest a fairly strong relationship, but it explains less than 50%
of the variation).
We can test the significance of the coefficient using the test statistic:

t = r√(N − 2) / √(1 − r²)

which has a t-distribution with N − 2 degrees of freedom.
Example
In a survey of 30 company employees, the correlation between length of service and age is 0.87207.
A scatter diagram of the data is shown below.
[Scatter graph: employee data on age (x-axis, 0 to 60) against length of service (y-axis, 0 to 26),
with fitted line y = 0.4573x − 8.2194, R² = 0.7605]
The details of the calculation for the value of the correlation coefficient are shown below:
service y   age x   y²     x²      xy
2           24      4      576     48
24          51      576    2601    1224
2           25      4      625     50
9           34      81     1156    306
12          40      144    1600    480
6           32      36     1024    192
1           23      1      529     23
5           28      25     784     140
3           21      9      441     63
4           33      16     1089    132
3           26      9      676     78
1           21      1      441     21
14          57      196    3249    798
5           27      25     729     135
4           23      16     529     92
13          45      169    2025    585
1           22      1      484     22
11          43      121    1849    473
8           35      64     1225    280
1           19      1      361     19
8           28      64     784     224
12          44      144    1936    528
13          40      169    1600    520
3           35      9      1225    105
9           48      81     2304    432
2           25      4      625     50
11          43      121    1849    473
4           24      16     576     96
5           30      25     900     150
6           35      36     1225    210
totals      Σy = 202   Σx = 981   Σy² = 2168   Σx² = 35017   Σxy = 7949
r = (nΣxy − ΣxΣy) / √[(nΣx² − (Σx)²) × (nΣy² − (Σy)²)]

r = (30 × 7949 − 981 × 202) / √[(30 × 35017 − 981²) × (30 × 2168 − 202²)]

r = (238470 − 198162) / √[(1050510 − 962361) × (65040 − 40804)]

r = 40308 / √(88149 × 24236)

r = 40308 / 46221 = 0.87207

As r = 0.87207, the value of r² is 0.7605.
Thus 76% of the variation in the length of service can be explained by the variation in age .
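The worked example can be checked in a few lines of code. This sketch (plain Python) simply reproduces the arithmetic from the column totals above:

```python
import math

n = 30
sum_x, sum_y = 981, 202                      # Σx, Σy from the table
sum_x2, sum_y2, sum_xy = 35017, 2168, 7949   # Σx², Σy², Σxy

numerator = n * sum_xy - sum_x * sum_y       # 238470 - 198162 = 40308
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator                  # ≈ 0.87207
r_squared = r * r                            # ≈ 0.7605
```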
The SPSS printout looks like:

Correlations
                                 AGE       SERVICE
AGE      Pearson Correlation     1         0.872**
         Sig. (1-tailed)         .         .000
         N                       30        30
SERVICE  Pearson Correlation     0.872**   1
         Sig. (1-tailed)         .000      .
         N                       30        30
**. Correlation is significant at the 0.01 level (1-tailed).
Thus r = 0.872071 and r² = 0.760508; therefore we have a strong positive correlation in which
approximately 76% of the variation in length of service is explained by the variation in age. The
remaining 24% will be due to other factors (e.g. not all employees will have 'worked their way up'
through the company; some will have joined from other companies, etc.). We can test the significance
of this result by computing the test statistic and comparing it with critical values from the t-distribution.
Our null hypothesis is that the true population correlation ρ = 0; our alternative hypothesis is that ρ > 0.
The test statistic

t = (0.872071 × √(30 − 2)) / √(1 − 0.760508) = 9.4294
Testing at a level of significance of 5%, the critical t-value is 1.701. Since the test statistic exceeds the
critical value we can reject the null hypothesis, and conclude that we have evidence of a genuine
linear relationship between length of service and age. Moreover, the fact that 9.4294 is much larger
than 1.701 shows that we could adopt a much smaller level of significance and still reject the
hypothesis of no relationship.
Note the output does not specifically provide the test statistic but does indicate that the correlation is
statistically significant at a 1% level of significance.
Note we are using a one-tailed test of significance with alternative hypothesis ρ > 0; if r had been
negative, our alternative hypothesis would be ρ < 0, the test statistic would also be negative, and the
critical t-value would be on the negative side of the t-distribution. Thus the null hypothesis would be
rejected if the test statistic is less than the critical value (i.e. closer to the tail). The test is therefore a
complete mirror image of the example above.
Regression
Whilst correlation measures the extent to which there is a linear relationship between two variables,
regression enables us to find the equation that describes that linear relationship.
The equation of a regression line has the form:
Y = a + bX
where Y is the dependent variable (the one we wish to predict / explain) and X is the independent
variable. The value "a" is known as the intercept of the line and "b" measures the gradient of the
line.
The relevant formulae are:

gradient:    b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)

intercept:   a = (Σy − bΣx) / n
Looking at the calculation above, we can see that the "top" of b is the same as the "top" of r, and
that the "left-bottom bracket" of r is the same as the "bottom" of b.

Thus

b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) = 40308 / 88149 = 0.45727

and

a = (Σy − bΣx) / n = (202 − 0.45727 × 981) / 30 = −8.2194
Thus the equation of the line linking length of service (y) and age (x) is:
y = -8.2194 + 0.45727x
This equation can then be used to make predictions.
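The same column totals give the regression coefficients directly. The sketch below (plain Python) reproduces the calculation and wraps the fitted line in a small prediction function; the name predict_service is my own label, not from the module:

```python
n = 30
sum_x, sum_y = 981, 202          # Σx, Σy from the age/service table
sum_x2, sum_xy = 35017, 7949     # Σx², Σxy

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # gradient ≈ 0.45727
a = (sum_y - b * sum_x) / n                                    # intercept ≈ -8.2194

def predict_service(age):
    # predicted length of service for a given age, from y = a + b*x
    return a + b * age
```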
The SPSS regression output, with interpretation in italics, looks like:

Variables Entered/Removed(b)
Model   Variables Entered   Variables Removed   Method
1       age(a)              .                   Enter

Variables Entered/Removed
This simply tells us that 'age' was the independent variable and 'service' the dependent variable.
Model Summary(b)
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .872a   .761       .752                2.629

Model summary
The value of the correlation coefficient r was 0.872 and the value of r² was 0.761.
Coefficients(a)
                Unstandardized    Standardized
                Coefficients      Coefficients                    95.0% Confidence Interval for B
Model           B        Std. Error   Beta    t        Sig.   Lower Bound   Upper Bound
1  (Constant)   -8.219   1.657                -4.961   .000   -11.613       -4.826
   age          .457     .048         .872    9.429    .000   .358          .557
a. Dependent Variable: service
Coefficients
The unstandardized coefficients give us the values of a and b in the regression equation.
Thus the equation here is y = -8.219 + 0.457x
The final column "Sig" gives values less than 0.01; thus we can say that the coefficients of the
regression equation are significantly different from zero at the 1% level (and thus at the 5% level).
Casewise Diagnostics(a)
Case Number   Std. Residual   service   Predicted Value   Residual
2             3.385           24        15.10             8.899
a. Dependent Variable: service
Casewise diagnostics
During the input dialogue, we asked for any standardised residuals outside the range -3 to + 3.
The output shows that one reading, case number 2, had a large standardised residual. This indicates
that this point does not fit the general trend of the straight line and can be regarded as an outlier (i.e.
an unusual reading). Case number 2 is an employee aged 51 with 24 years of service. On the scatter
diagram, we can see that this point is a long way from the regression line.
Residuals Statistics(a)
                       Minimum   Maximum   Mean   Std. Deviation   N
Predicted Value        .47       17.85     6.73   4.603            30
Residual               -4.785    8.899     .000   2.583            30
Std. Predicted Value   -1.361    2.414     .000   1.000            30
Std. Residual          -1.820    3.385     .000   .983             30
a. Dependent Variable: service
Residual Statistics
This table can be ignored for simple cases.
Rank Correlation
If we wish to measure the strength of relationship between two variables, and at least one of them is
ordinal, we need to use a nonparametric measure of correlation in which the strength of relationship is
based on the ranks of the data. There are two main measures of rank correlation, Kendall's τ
statistic and Spearman's rank correlation coefficient; we shall focus on the latter.
Observed values for x and y are replaced by their ranks, and the difference (d) in ranking between
each set of paired observations is calculated. Spearman's coefficient (rs) is found from the following
formula:

rs = 1 − 6Σd² / (N(N² − 1))
As with Pearson's coefficient, we can also test whether the result is statistically significant. Provided
N ≥ 10 we can use the t statistic as before:

t = rs√(N − 2) / √(1 − rs²)

which has a t-distribution with N − 2 degrees of freedom.
For larger samples (say N > 30) we can use a normal approximation:

z = rs√(N − 1)
Example 1
[thanks to Richard Charlesworth for this example]
A survey of MBA students included a question which asked the respondent to identify the most
important factor in their choice of UNL for their MBA.
Factor                                                   Full-time (%)   Part-time (%)
MBA course programme/design                              34              25
Discussion with tutor at recruitment fair/open evening    9              10
Recommended by a colleague/friend                        22              18
Living in London                                         21              28
Credit transfer/flexible study                            0               9
Prior study at UNL                                        4              10
Photos of the staff in the brochure                      10               0
Factor                       Full-time   Part-time   FT      PT       d      d²
                             MBA (%)     MBA (%)     ranks   ranks
MBA programme/design         34          25          1       2        -1     1
Discussion at fair/open eve   9          10          5       4.5       0.5   0.25
Recommendation               22          18          2       3        -1     1
Living in London             21          28          3       1         2     4
CATS/flexible study           0           9          7       6         1     1
Prior study at UNL            4          10          6       4.5       1.5   2.25
Photos of the staff          10           0          4       7        -3     9
                                                                 Σd² = 18.50
Therefore

rs = 1 − 6Σd² / (N(N² − 1)) = 1 − (6 × 18.5) / (7 × (49 − 1)) = 0.6696
Note that tied ranks are averaged (i.e. the scores of 10% for Part-time MBA jointly cover the rankings
4 and 5, so both take on the rank 4.5).
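The Σd² and rs values above can be reproduced with a short calculation. This sketch (plain Python) takes the ranks exactly as tabulated, with the tied part-time ranks already averaged:

```python
ft_ranks = [1, 5, 2, 3, 7, 6, 4]      # full-time rankings of the seven factors
pt_ranks = [2, 4.5, 3, 1, 6, 4.5, 7]  # part-time rankings (tie averaged to 4.5)

n = len(ft_ranks)
sum_d2 = sum((f - p) ** 2 for f, p in zip(ft_ranks, pt_ranks))  # Σd² = 18.5
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))                        # ≈ 0.6696
```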
SPSS Output: Spearman's rank correlation coefficient

Nonparametric Correlations

Correlations (Spearman's rho)
                                     FT MBA   PT MBA
FT MBA   Correlation Coefficient     1        0.667
         Sig. (1-tailed)             .        0.051
         N                           7        7
PT MBA   Correlation Coefficient     0.667    1
         Sig. (1-tailed)             0.051    .
         N                           7        7

(Note that SPSS reports 0.667 rather than 0.6696: when there are tied ranks, SPSS computes the
Pearson coefficient on the ranks, which differs slightly from the simplified formula.)
How to use SPSS for Correlation and Regression
Example: load up the SPSS file age&service.sav
To produce a correlation matrix
Choose Analyze, Correlate, Bivariate
Select age and service as the input variables. Check that Pearson is ticked. Choose OK
The resulting output looks like:

Correlations
                              service   age
service   Pearson Correlation   1       .872**
          Sig. (2-tailed)               .000
          N                     30      30
age       Pearson Correlation   .872**  1
          Sig. (2-tailed)       .000
          N                     30      30
**. Correlation is significant at the 0.01 level (2-tailed).
To produce Regression output
Select Analyze, Regression, Linear.
Select age as the independent variable.
Select service as the dependent variable.
Select Enter as the method.
Then select the Statistics button.
Select Casewise diagnostics, Outliers outside 3 standard deviations.
Tick Confidence Intervals and choose 95% as the Level.
Then Continue, then OK.
The output was shown previously.
SPSS has the capacity to produce further analyses for regression, for example an analysis of the
residuals is possible (and extremely useful).
Topic 7: Multiple Regression
In many situations we want to be able to use more than one independent variable to predict the value
of the dependent variable. We use multiple regression in these cases.
For example, we might suspect that the price of a house depends not only on the number of
bedrooms it has but also depends on the number of living rooms, number of bath/shower rooms, size
of garden and so on. To predict the price of a house we would need a model that took into account all
the significant factors.
We denote the independent variables as x1, x2, ..., xk and the associated coefficients as b1, b2, ..., bk.
The constant of the regression equation is denoted by a.
Thus the regression equation is:
y = a + b1x1 + b2x2 + b3x3 + .......... + bkxk
The formulae for calculating the values of a , b1 , b2 , b3.......... bk are too complicated to include here.
We will focus on analysing the SPSS printouts.
t-test on Regression Coefficients
We can use the t-test to decide if the coefficients of the regression line are significant:
H0: βi = 0 [coefficient = 0: no significant contribution to the linear relationship]
H1: βi ≠ 0
The test statistic is given by:
t = bi / (Std Error of bi)
We reject H0 if |t| > t(α/2, n−2) [found from statistics tables].
As before, we can reject H0 if the value of p < 0.025 in the SPSS printout.
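This test is simple enough to sketch directly. The coefficient and standard error below are hypothetical illustrative values, and the critical value 2.086 is the tabulated t(0.025, 20):

```python
def coeff_t(b, se):
    # t statistic for H0: beta_i = 0
    return b / se

# Hypothetical coefficient and standard error, for illustration only:
t = coeff_t(101.514, 13.178)    # ≈ 7.703
# Critical value from t tables, e.g. t(0.025, 20) = 2.086:
significant = abs(t) > 2.086    # True, so H0 would be rejected
```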
Multicollinearity
If a regression equation contains two or more “independent” variables that have a strong linear
relationship between them, then the model has “multicollinearity”. This means that the independent
variables are not really independent in the sense that they are related to each other.
This strong linear relationship between two or more of the independent variables may make the
estimates of the coefficients unreliable and may, in fact, make some coefficients negative when they
should be positive (or vice versa).
We can check for collinearity by looking at the correlation matrix .
Multiple Regression Example:
London Theatres
Data was obtained from the Society of London Theatres about various statistics recorded for London
Theatres from 1986 to 2010. The data set was obtained from
http://www.solt.co.uk/downloads/pdfs/theatreland/2010-Graph2.pdf
The full data file is shown overleaf.
year   attendances    gross box office    average no of   no of          no of new
       in thousands   revenue £ thousands theatres open   performances   productions
1986   10236          112068              42              16543          213
1987   10881          129589              42              16603          212
1988   10897          139338              43              16970          28
1989   10945          153251              42              16436          237
1990   11321          177904              40              15887          187
1991   10905          186790              39              15508          192
1992   10900          194772              41              15916          193
1993   11503          215619              41              15922          198
1994   11163          217763              41              16063          208
1995   11938          238741              43              17163          208
1996   11179          229017              41              16084          186
1997   11466          246082              39              15568          195
1998   11925          257920              41              16018          207
1999   11931          266565              44              17089          265
2000   11555          286556              43              16633          252
2001   11735          298989              44              17035          264
2002   12064          327972              44              17090          221
2003   11585          321485              42              16664          225
2004   12025          343674              43              17235          225
2005   12319          383942              45              17406          221
2006   12351          400853              43              16912          268
2007   13636          469939              44              17455          243
2008   13892          483349              45              18275          241
2009   14257          504984              45              17923          260
2010   14152          512332              46              18615          264
We are going to use multiple regression to produce a model that will enable the value of gross box
office revenue to be explained by the other variables.
There are two basic approaches to multiple regression, top-down and bottom-up.
With top-down regression, we start by using all the independent variables in our model and then
successively eliminate those have values of bi that are not significant (Sig T >0.025) or are highly
correlated with other variables (multicollinearity)
Bottom-up approaches successively add variables to the model until no improvement can be made.
The order in which the variables are added is often the order of the values of the correlation
coefficients with the dependent variable (variable with the highest r is used first, then variable with
second highest r is added and so on) excluding multicollinearity at each stage.
For both of these approaches we need to know what the values of the various correlation coefficients
are. This can be obtained by using the correlate command in SPSS .
Note: SPSS highlights any correlations that are significant by the use of * and **.
We get:
Correlations
                                    attendance   revenue   theatresopen   performances   newproductions
attendance      Pearson Correlation  1           .954**    .720**         .803**         .489*
                Sig. (2-tailed)                  .000      .000           .000           .013
                N                    25          25        25             25             25
revenue         Pearson Correlation  .954**      1         .715**         .771**         .559**
                Sig. (2-tailed)      .000                  .000           .000           .004
                N                    25          25        25             25             25
theatresopen    Pearson Correlation  .720**      .715**    1              .955**         .397*
                Sig. (2-tailed)      .000        .000                     .000           .050
                N                    25          25        25             25             25
performances    Pearson Correlation  .803**      .771**    .955**         1              .367
                Sig. (2-tailed)      .000        .000      .000                          .071
                N                    25          25        25             25             25
newproductions  Pearson Correlation  .489*       .559**    .397*          .367           1
                Sig. (2-tailed)      .013        .004      .050           .071
                N                    25          25        25             25             25
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
From this correlation matrix we can see that:
 All of the other variables are correlated with revenue
 Attendance is highly correlated with the number of theatres open and the number of
performances
As we want to use “revenue” as our dependent variable, we can guess that “attendance” will be in the
model and at most one of “theatres open” and “performances”. We would not expect both “theatres
open” and “performances” to be included as they are highly correlated and thus not really
independent.
“Top-Down” Approach
The “top-down” approach initially involves using all four independent variables in the model.
The relevant printout is shown below with some comments in italics.
The SPSS commands are summarised below
The output is shown below:
Variables Entered/Removed(b)
Model   Variables Entered                                        Variables Removed   Method
1       newproductions, performances, attendance, theatresopen   .                   Enter
a. All requested variables entered.
b. Dependent Variable: revenue

Variables Entered/Removed
All four possible independent variables have been added using the "Enter" method.
Revenue is the dependent variable.
This indicates that all 4 independent variables were used in the model.
Model Summary(b)
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .961a   .924       .909                36143.148
a. Predictors: (Constant), newproductions, performances, attendance, theatresopen
b. Dependent Variable: revenue
Model Summary
The value of r = 0.961 and r² = 0.924.
However, for multiple regression models it is usual to quote the adjusted R square value of 0.909 for r²,
as this value takes into account the number of variables being used as well as the strength of the
correlation.
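The adjusted value can be reproduced from R², the sample size and the number of independent variables. A sketch in plain Python, using the standard adjustment formula (the formula itself is not quoted in the module text):

```python
def adjusted_r2(r2, n, k):
    # adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)
    # n = number of observations, k = number of independent variables
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# 25 years of data, 4 independent variables, R^2 = 0.924 as in the printout:
adj = adjusted_r2(0.924, 25, 4)   # ≈ 0.909
```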
ANOVA(b)
Model          Sum of Squares   df   Mean Square   F        Sig.
1  Regression  3.166E11         4    7.915E10      60.586   .000a
   Residual    2.613E10         20   1.306E9
   Total       3.427E11         24

ANOVA
The F statistic of 60.586 indicates that the model as a whole is significant (sig < 0.025).
Coefficients(a)
                    Unstandardized Coefficients   Standardized Coefficients
Model               B               Std. Error    Beta      t        Sig.
1  (Constant)       -1043627.767    177852.629              -5.868   .000
   attendance       101.514         13.178        .913      7.703    .000
   theatresopen     12735.681       14486.506     .200      .879     .390
   performances     -28.361         39.178        -.191     -.724    .478
   newproductions   260.547         186.511       .104      1.397    .178
a. Dependent Variable: revenue
Coefficients
The regression equation is given by:
revenue = −1043627.767 + 101.514×attendance + 12735.681×theatres open − 28.361×performances + 260.547×new productions
Looking at the p values (Sig. column):
The coefficient of the constant is significantly different from zero (p = 0.000 < 0.025).
The coefficient of attendance is significantly different from zero (p = 0.000 < 0.025).
The coefficient of theatres open is NOT significantly different from zero (p = 0.390).
The coefficient of performances is NOT significantly different from zero (p = 0.478).
The coefficient of new productions is NOT significantly different from zero (p = 0.178).
This implies that we should get a better model if we went through a process of deleting variables one
by one. The first one to delete would be "performances", as it has the highest value of p and is
correlated with other variables.
Residuals Statistics(a)
                       Minimum      Maximum     Mean        Std. Deviation   N
Predicted Value        116690.98    536192.75   283979.76   114851.420       25
Residual               -50616.152   68157.070   .000        32994.029        25
Std. Predicted Value   -1.457       2.196       .000        1.000            25
Std. Residual          -1.400       1.886       .000        .913             25
a. Dependent Variable: revenue
Residual Statistics
The maximum standardised residual = 1.886 and the minimum standardised residual = −1.400.
Thus all points are fairly close to the regression line and there are no outliers.
Conclusion:
Try another model that leaves out the variable performances.
However, we will not go through the whole procedure ourselves as we can utilise a special facility in
SPSS that uses a “bottom up” procedure.
"Bottom-Up" Approach (SPSS STEPWISE facility)
From the correlation matrix, we can see that the correlation coefficients with revenue, in order, are:
1. attendance (0.954)
2. performances (0.771)
3. theatres open (0.715)
4. new productions (0.559)
Attendance is highly correlated with the number of theatres open and the number of performances.
This means that we have a case of multicollinearity here. We are unlikely to use all three of these in
the "best" model.
The rest of this printout is the result of the stepwise multiple regression command.
NB. The user did not have to specify the order in which the variables were introduced.
The “stepwise” command works by introducing the variables, one at a time, based on the value of the
correlation coefficients. “Stepwise” stops when it cannot find a “better” model.
Thus the last model produced by stepwise is considered the best model to use for predictions.
We can also choose to have some useful plots here.
The step-wise process is summarised in the final output.
Variables Entered/Removed(a)
Model   Variables Entered   Variables Removed   Method
1       attendance          .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
a. Dependent Variable: revenue
The final (and best) model only uses “attendance” to predict the gross box office revenue.
Model Summary(b)
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .954a   .909       .905                36753.551
a. Predictors: (Constant), attendance
b. Dependent Variable: revenue
The value of adjusted r2 is high at 0.905
ANOVA(b)
Model          Sum of Squares   df   Mean Square   F         Sig.
1  Regression  3.116E11         1    3.116E11      230.702   .000a
   Residual    3.107E10         23   1.351E9
   Total       3.427E11         24
a. Predictors: (Constant), attendance
b. Dependent Variable: revenue
The regression model as a whole is significant as F has p=0.000
Coefficients(a)
                Unstandardized Coefficients   Standardized Coefficients
Model           B              Std. Error     Beta      t         Sig.
1  (Constant)   -974724.442    83195.461                -11.716   .000
   attendance   106.037        6.981          .954      15.189    .000
a. Dependent Variable: revenue
The regression equation is revenue = -974724 + 106.037×attendance
Both of the coefficients of the constant and attendance are significantly different from zero.
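The fitted equation can then be used for prediction. A sketch in plain Python, applying it to the 2010 attendance figure from the data table (the gap between predicted and actual is that year's residual):

```python
a, b = -974724.442, 106.037   # coefficients from the stepwise output

def predicted_revenue(attendance_thousands):
    # gross box office revenue (in £ thousands) predicted from attendance
    return a + b * attendance_thousands

pred_2010 = predicted_revenue(14152)   # ≈ 525911, vs the actual 512332
```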
Excluded Variables(b)
                                                              Collinearity Statistics
Model               Beta In   t       Sig.   Partial Correlation   Tolerance
1  theatresopen     .060a     .650    .523   .137                  .482
   performances     .013a     .124    .902   .026                  .355
   newproductions   .122a     1.770   .091   .353                  .761
a. Predictors in the Model: (Constant), attendance
b. Dependent Variable: revenue
Residuals Statistics(a)
                       Minimum      Maximum     Mean        Std. Deviation   N
Predicted Value        110668.88    537043.06   283979.76   113951.373       25
Residual               -52402.609   67772.398   .000        35979.706        25
Std. Predicted Value   -1.521       2.221       .000        1.000            25
Std. Residual          -1.426       1.844       .000        .979             25
a. Dependent Variable: revenue
The minimum standardised residual is -1.426 and the maximum is 1.844 so there are no
outliers
The histogram of the residuals closely approximates to a normal distribution, thus the model is a good
one.
The points on the p-p plot are quite close to the diagonal line and do not really exhibit any particular
pattern.
A scatter graph of the standardised residuals and the standardised predicted values is evenly
scattered above and below the zero line and there is no particular pattern here .
Topic 7: Interpreting Regression Output
Using the boats data for multiple regression in SPSS
Data was collected on the prices charged for weekly boat hire at Easter and in the Summer from a
number of boatyards on the Norfolk Broads. For each boat, the fields shown below were recorded.
Data definitions:
We want to see how the maximum price charged during Easter for a week’s hire is related to the
attributes of the boat (length, width, number of fixed berths, maximum number of berths )
The correlation matrix gives:

Correlations
                                          length in   width in   maximum no.   number of      maximum
                                          metres      metres     of berths     fixed berths   Easter price
length in metres    Pearson Correlation   1           .688**     .769**        .828**         .857**
                    Sig. (2-tailed)                   .000       .000          .000           .000
                    N                     112         112        112           112            112
width in metres     Pearson Correlation   .688**      1          .498**        .492**         .537**
                    Sig. (2-tailed)       .000                   .000          .000           .000
                    N                     112         112        112           112            112
maximum number      Pearson Correlation   .769**      .498**     1             .891**         .748**
of berths           Sig. (2-tailed)       .000        .000                     .000           .000
                    N                     112         112        112           112            112
number of fixed     Pearson Correlation   .828**      .492**     .891**        1              .863**
berths              Sig. (2-tailed)       .000        .000       .000                         .000
                    N                     112         112        112           112            112
maximum Easter      Pearson Correlation   .857**      .537**     .748**        .863**         1
price               Sig. (2-tailed)       .000        .000       .000          .000
                    N                     112         112        112           112            112
**. Correlation is significant at the 0.01 level (2-tailed).
There appears to be some multi-collinearity here as there are strong correlations between some of
the “independent” variables. In particular, “number of fixed berths” is strongly correlated with
“maximum number of berths”. Thus we would not expect both of these variables to be present in a
“good” model.
Using the step-wise approach:
Variables Entered/Removed(a)
Model   Variables Entered        Variables Removed   Method
1       number of fixed berths   .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
2       length in metres         .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
a. Dependent Variable: maximum Easter price
The first model just used “number of fixed berths” as this has the highest correlation with maximum
Easter price. To this model, the variable “length in metres“ was added. No further additions were
made as the other variables correlated strongly with these two.
Model Summary(c)
                                            Std. Error of   Change Statistics
Model   R       R Square   Adjusted R Sq.   the Estimate    R Sq. Change   F Change   df1   df2   Sig. F Change
1       .863a   .745       .743             67.792          .745           322.149    1     110   .000
2       .900b   .810       .806             58.845          .064           36.988     1     109   .000
a. Predictors: (Constant), number of fixed berths
b. Predictors: (Constant), number of fixed berths, length in metres
c. Dependent Variable: maximum Easter price
The value of adjusted r² was 0.743 for model 1 (number of fixed berths). This rose to an adjusted r² of
0.806 when "length in metres" was added to model 1 to form model 2.
Coefficients(a)
                             Unstandardized     Standardized                      Collinearity
                             Coefficients       Coefficients                      Statistics
Model                        B         Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)                263.311   15.076               17.466   .000
   number of fixed berths    62.209    3.466        .863    17.949   .000   1.000       1.000
2  (Constant)                12.597    43.251               .291     .771
   number of fixed berths    35.206    5.363        .489    6.564    .000   .315        3.178
   length in metres          33.623    5.528        .453    6.082    .000   .315        3.178
a. Dependent Variable: maximum Easter price
In model 1, the coefficient of number of fixed berths is significantly different from zero (p = 0.000 < 0.05).
In model 2, both coefficients are significantly different from zero (p = 0.000 in both cases).
Casewise Diagnostics(a)
Case Number   Std. Residual   maximum Easter price   Predicted Value   Residual
15            3.397           945                    745.12            199.877
a. Dependent Variable: maximum Easter price
The only real outlier is case number 15 as it is the only point with a
standardised residual outside the range -3 to + 3.
Residuals Statistics(a)
                       Minimum    Maximum   Mean     Std. Deviation   N
Predicted Value        328.79     806.65    508.26   120.382          112
Residual               -158.095   199.877   .000     58.313           112
Std. Predicted Value   -1.491     2.479     .000     1.000            112
Std. Residual          -2.687     3.397     .000     .991             112
a. Dependent Variable: maximum Easter price
The graph of the residuals roughly resembles a normal distribution curve. Thus the assumption
that the residuals are normally distributed seems reasonable.
The points are close to the straight line and there is no real pattern in the points; thus the assumption
of normality seems reasonable.
The residuals are fairly evenly spread about the horizontal line through zero. The difference from the
zero line does not change much as the standardised predictions increase.
Conclusion : This is a good model.
Compare this output with one that includes all possible variables.
Variables Entered/Removed(b)
Model   Variables Entered                                                                    Variables Removed   Method
1       maximum number of berths, width in metres, length in metres, number of fixed berths  .                   Enter
a. All requested variables entered.
b. Dependent Variable: maximum Easter price
Model Summary(b)
                                            Std. Error of   Change Statistics
Model   R       R Square   Adjusted R Sq.   the Estimate    R Sq. Change   F Change   df1   df2   Sig. F Change
1       .903a   .816       .809             58.396          .816           118.850    4     107   .000
a. Predictors: (Constant), maximum number of berths, width in metres, length in metres, number of fixed berths
b. Dependent Variable: maximum Easter price
ANOVA(b)
Model          Sum of Squares   df    Mean Square   F         Sig.
1  Regression  1621146.826      4     405286.707    118.850   .000a
   Residual    364878.665       107   3410.081
   Total       1986025.491      111
a. Predictors: (Constant), maximum number of berths, width in metres, length in metres, number of fixed berths
b. Dependent Variable: maximum Easter price

The F statistic is significant and thus indicates that there is a linear relationship between maximum
Easter price and at least one of the other variables.
Coefficients(a)
                               Unstandardized    Standardized                       Collinearity
                               Coefficients      Coefficients                       Statistics
Model                          B         Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)                  34.804    73.494               .474     .637
   length in metres            36.055    6.704        .485    5.378    .000   .211        4.746
   width in metres             -8.502    27.532       -.018   -.309    .758   .503        1.987
   number of fixed berths      44.915    7.726        .623    5.814    .000   .149        6.696
   maximum number of berths    -10.983   5.921        -.172   -1.855   .066   .201        4.985
a. Dependent Variable: maximum Easter price
(If we were building a model ourselves, we would eliminate "width in metres" and "maximum number
of berths", as their coefficients are not significantly different from zero and their signs are not what we
would expect: we would expect wider boats with more berths to command a higher price.)
Casewise Diagnostics(a)
Case Number   Std. Residual   maximum Easter price   Predicted Value   Residual
15            3.191           945                    758.64            186.357
a. Dependent Variable: maximum Easter price
Residuals Statistics(a)
                       Minimum    Maximum   Mean     Std. Deviation   N
Predicted Value        336.28     780.69    508.26   120.851          112
Residual               -147.015   186.357   .000     57.334           112
Std. Predicted Value   -1.423     2.254     .000     1.000            112
Std. Residual          -2.518     3.191     .000     .982             112
a. Dependent Variable: maximum Easter price
Features of a Good Linear Regression Model [SPSS printout]

   Source of information    Desirable feature
1  Model Summary            A high value of adjusted r².
                            [A low value indicates a poor linear fit]
2  Table of Correlations    Model chosen should not contain independent variables that are highly
                            correlated with each other.
                            [Model should not exhibit any multi-collinearity]
3  Coefficients table       All coefficients significantly different from zero, i.e. Sig column has
                            values below 0.025.
                            [If a coefficient is close to zero it adds nothing useful to the model]
4  ANOVA                    The F statistic is significantly different from zero, indicating that
                            there is a relationship between the dependent variable and at least one
                            of the independent variables.
5  Casewise diagnostics     Only 5% of readings are outside the range -2 to +2 in the 'Std.
                            Residual' column. Any outliers are outside the range -3 to +3.
                            [If there are a number of outliers, the data points could just be 'odd'
                            cases or could indicate lack of a linear relationship.]
6  Histogram of residuals   Histogram fits the superimposed Normal curve (or is close).
                            [If the histogram does not approximate to the Normal curve, it implies
                            the residuals are not normally distributed and thus the model is not a
                            good fit. Should consider the possibility of a non-linear relationship]
7  Normal p-p plot          Points are scattered about the diagonal line, close to the diagonal
                            line, and do not exhibit any pattern.
                            [If points are not as stated above, the model is not a good fit. Should
                            consider the possibility of a non-linear relationship]
Seminar Exercise
Using the SPSS data file boats.sav, produce the best multiple regression model for predicting the
maximum summer prices.
Write a report on your printouts.
Topic 8 : Graphical Linear Programming
Linear Programming methods can be used to solve problems where we wish to maximise (or
minimise) a linear function subject to a number of linear constraints.
Graphical linear programming can be used when there are only two variables.
The stages of graphical linear programming are:
1. Formulation of the problem. This involves translating a description of a problem into a
mathematical format. In particular, the linear constraints and objective function have to be generated.
2. Determination of the set of feasible solutions. This involves drawing a graph to determine the
region where all the constraints are met.
3. Finding the optimal solution. This involves finding the "best" feasible solution.
Example:
The Soft Toy Company
A manufacturer of expensive soft toys makes giant teddy bears and fluffy rabbits.
Each teddy bear has a contribution to profit of £30 whilst each rabbit has a contribution of £40. (They
are very expensive.)
The manufacturer wants to determine which combination of teddy bears and rabbits should be made
in order to maximise the contribution.
Each teddy bear requires 2 hours of machining and 2 hours of hand labour whilst each rabbit requires
1 hour of machining and 3 hours of hand labour.
Each teddy bear and each rabbit requires 1 kilogram of stuffing.
The manufacturer has certain limitations on the possible production. In particular, there are only 50
hours of machining and 90 hours of hand labour available each week.
Stuffing is in short supply, so the manufacturer can only rely on 40 kilogrammes each week.
Determine which combination of teddy bears and rabbits should be produced in order to maximise
profit.
Problem Formulation :
Summarising the details given above:
                 machining (hours)   labour (hours)   stuffing (kg)
teddy bear              2                  2               1
rabbit                  1                  3               1
total per week         50                 90              40
Defining the Variables:
Let x be the number of teddy bears produced and sold each week
Let y be the number of rabbits produced and sold each week.
Formulating the Objective Function:
Contribution is C = 30x + 40y
so our objective function is Max 30x + 40y
Formulate Constraints:
machine hours:  2x + y ≤ 50
hand labour:    2x + 3y ≤ 90
stuffing:       x + y ≤ 40
As we cannot have a negative number of toys we should include: x ≥ 0, y ≥ 0
Formulation Summary:
max 30x + 40y
such that
2x + y ≤ 50
2x + 3y ≤ 90
x + y ≤ 40
x ≥ 0
y ≥ 0
Determining the Feasible Region
To draw a graph of the feasible region we have to consider each constraint in turn.
Machine hours:  2x + y ≤ 50
In order to draw the line 2x + y = 50, we need two points.
Choosing x = 0 and y = 0 for our two points we get:
x = 0:  2x + y = 50  →  y = 50
y = 0:  2x + y = 50  →  2x = 50  →  x = 25
The graph now looks like:
[Graph: the line 2x + y = 50, with teddy bears (0 to 25) on the horizontal axis and rabbits (0 to 60) on the vertical axis]
We now have to decide which side of the line is required, i.e. on which side of the line 2x + y is actually less than 50.
In general, if the constraint in x and y includes ≤ the required area will be to the “bottom left” of the line.
If the constraint includes ≥ the required area will be to the “top right”.
Hand labour:  2x + 3y ≤ 90
In order to draw the line 2x + 3y = 90:
x = 0:  2x + 3y = 90  →  3y = 90  →  y = 30
y = 0:  2x + 3y = 90  →  2x = 90  →  x = 45
Again we will want the area under the line.
The graph now looks like:
[Graph: the machine line 2x + y = 50 and the labour line 2x + 3y = 90, with teddy bears on the horizontal axis and rabbits on the vertical axis]
Stuffing:  x + y ≤ 40
x = 0:  x + y = 40  →  y = 40
y = 0:  x + y = 40  →  x = 40
We want the area under this line as well.
The two further constraints x ≥ 0 and y ≥ 0 can be included to give the graph shown:
[Graph: the machine, labour, and stuffing constraint lines, with teddy bears on the horizontal axis and rabbits on the vertical axis]
In this graph the feasible region can be shaded. It is the region where all the constraints are satisfied.
[Graph: the shaded feasible region bounded by the machine, labour, and stuffing lines and the axes]
Optimising
In this example we wish to maximise the objective function:
30x + 40y
Method 1:
We can use the gradient of the line in order to draw an objective function line.
In general, any line with equation ax + by = c has gradient -a/b.
The line 30x + 40y = c has gradient -30/40 = -3/4.
Thus we can draw any line with such a gradient (3 down and 4 along) as our initial objective function
line. This line is then moved parallel to find the optimal solution.
Method 2:
Linear programming theory tells us that the optimal solution will always lie at a vertex (corner) of the
feasible region.
Trying each vertex in turn:
vertex             value of objective function
x = 0,  y = 0      30x + 40y = 0
x = 0,  y = 30     30x + 40y = 1200
x = 15, y = 20     30x + 40y = 1250
x = 25, y = 0      30x + 40y = 750
The highest value of the objective function is found at x = 15, y = 20.
Production Plan.
In order to maximise the value of the weekly contribution, the manufacturer should produce and sell
15 teddy bears and 20 rabbits each week.
This will generate a weekly contribution of £1250
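The vertex method can also be checked with a short brute-force script. The sketch below (plain Python, no solver; all names are my own) intersects every pair of constraint boundary lines, discards infeasible points, and evaluates the contribution at each remaining vertex:

```python
from itertools import combinations

# Each constraint written as a*x + b*y <= c; non-negativity is included
# as -x <= 0 and -y <= 0.
constraints = [
    (2, 1, 50),    # machining hours
    (2, 3, 90),    # hand labour hours
    (1, 1, 40),    # stuffing (kg)
    (-1, 0, 0),    # x >= 0
    (0, -1, 0),    # y >= 0
]

def intersect(c1, c2):
    """Corner point where two constraint boundary lines cross (Cramer's rule)."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-9:                      # parallel lines: no vertex
        return None
    return (r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det

def feasible(x, y):
    return all(a * x + b * y <= c + 1e-9 for a, b, c in constraints)

best, best_value = None, float("-inf")
for c1, c2 in combinations(constraints, 2):
    p = intersect(c1, c2)
    if p is not None and feasible(*p):
        value = 30 * p[0] + 40 * p[1]        # contribution at this vertex
        if value > best_value:
            best, best_value = p, value

print(best, best_value)   # (15.0, 20.0) 1250.0
```

The highest value, £1250 at (15, 20), matches the table of vertices above.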
Example: Camping Trip
A youth club is planning a camping trip.
Two sizes of tent are available 4-person and 8-person tents.
There are 64 people that want to go on the trip but there is only room on the site for 13 tents. Only 8
4-person tents are available.
If each 4-person tent costs £15 a night and each 8-person tent costs £45 per night, how many of each
type of tent minimises the nightly cost?
Formulation:
Let x be the number of 4-person tents used and let y be the number of 8-person tents used.
Minimise cost:              Cost = 15x + 45y
subject to:
total number of tents:      x + y ≤ 13
number of 4-person tents:   x ≤ 8
people accommodated:        4x + 8y ≥ 64
non-negativity:             x ≥ 0, y ≥ 0
Drawing:
x + y = 13:    x = 0, y = 13  and  y = 0, x = 13
4x + 8y = 64:  x = 0, y = 8   and  y = 0, x = 16
The graph looks like:
[WinQSB graph: the feasible region for the camping trip problem, with 4-person tents (0 to 17) on the horizontal axis and 8-person tents (0 to 14) on the vertical axis]
The objective function is 15x + 45y, which has gradient -15/45 = -1/3.
Thus we can draw a line anywhere with this gradient (1 down and 3 along).
As we wish to minimise the objective function, we must move the objective function line towards the origin.
The optimal solution is: x = 8, y = 4
The cost will be: 15×8 + 45×4 = 300
Thus the trip organisers should book 8 4-person tents and 4 8-person tents.
This will cost £300 per night and will allow 64 people to be accommodated.
A total of 12 tents are used.
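Since tent counts are whole numbers and the ranges are small, the answer can also be sanity-checked by brute force over every integer combination (a quick check added here, not part of the WinQSB output):

```python
# Enumerate every whole-number combination of tents satisfying the
# constraints and keep the cheapest one.
candidates = [
    (x, y)
    for x in range(9)          # at most 8 four-person tents
    for y in range(14)         # at most 13 tents in total
    if x + y <= 13 and 4 * x + 8 * y >= 64
]
x, y = min(candidates, key=lambda t: 15 * t[0] + 45 * t[1])
cost = 15 * x + 45 * y
print(x, y, cost)   # 8 4 300
```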
Seminar Sheet
1.
A company manufactures two types of sweatshirts: hooded and round neck.
Each hooded sweatshirt makes a contribution to profit of £4 and each round neck
sweatshirt makes a contribution of £3.
Each hooded sweatshirt requires 1 hour of labour and each round neck requires 2
hours. There are 110 hours of labour available each day. There are limitations in the
production capacity so that only 70 hooded sweatshirts can be made in a day.
The company wishes to maximise the contribution from these sweatshirts.
a)  Formulate as a linear programming problem.
b)  Determine the number of hooded and round neck sweatshirts that should be made each day in order to maximise contribution.
c)  What is the value of the maximum daily contribution?
2.
A health enthusiast would like to organise his food consumption of two diet supplements Vita
and Glow so that his minimum daily requirement of three basic nutrients A, B and C is
satisfied.
The minimum daily requirements are 14 units of A, 12 units of B and 18 units of C.
Product Vita has 2 units of A and one unit each of B and C in each packet.
Product Glow has one unit each of A and B and 3 units of C in each packet.
The price of Vita is 20p and the price of Glow is 40p per packet.
The health enthusiast wants to determine the level of consumption of Vita and Glow that will
minimise expenditure whilst satisfying the minimum daily requirements.
a)  Formulate the description given above as a Linear Programming problem.
b)  Advise the user on the best combination of Vita and Glow to use.
c)  What is the minimum cost?
3.
A furniture manufacturer makes two types of tables: Traditional and Modern.
Each traditional table requires 6 hours of cutting time, 5 hours of sanding time and 2 hours
for staining.
Each traditional table sold gives a contribution to profit of £50.
Each modern table requires 2 hours of cutting time, 5 hours of sanding time and 4 hours of
staining time.
Each modern table sold makes a contribution of £30.
Each day the manufacturer has available 36 hours of cutting time,40 hours of sanding time
and 28 hours of staining time.
All other inputs are available as required. The company can sell all the tables it makes.
a)  Find the number of each type of table that should be made in order to maximise the contribution.
b)  Find the maximum contribution possible.
4.
A clothes manufacturer offers two versions of a particular t-shirt; one printed and the other
plain.
The manufacturing requirements are:
Cutting and printing time: The printed t-shirt takes 9 minutes each whilst each plain t-shirt
takes only 3 minutes to cut and print. There are 360 minutes available for cutting and printing
each day.
Each printed t-shirt takes 5 minutes for sewing and packing whilst each plain t-shirt takes only
3 minutes. There are 240 minutes available for sewing and packing of these t-shirts each day.
A contract with a local shop requires a minimum of 12 printed t-shirts to be produced each day.
The manufacturer makes a contribution to profit of £6 from the manufacture and sale of each
printed t-shirt and £5 for each plain t-shirt.
The manufacturer wishes to maximise this contribution to profit.
Formulate the scenario described above as a linear programming problem. You should clearly
indicate the meaning of each constraint and the meaning of any variables.
Using a graphical method, or otherwise, determine which combination of printed and plain t-shirts the manufacturer should produce and sell in order to maximise contribution to profit.
5.
A small engineering company makes two types of engine parts, coded as part A and part B.
Part A has a contribution of £30 per unit and part B £40. The company wishes to establish the
weekly production plan which maximises contribution.
Production data are as follows:
                           machining (hours)   labour (hours)   materials (kg)
part A                            4                  4                1
part B                            2                  6                1
total available per week        100                180               40
Because of a trade agreement, sales of part A are limited to a weekly maximum of 20 units
and to honour an agreement with an old established customer at least 10 units of part B must
be made each week.
Formulate the scenario described above as a linear programming problem. You should clearly
indicate the meaning of each constraint and the meaning of any variables.
Using a graphical method, or otherwise, determine which combination of part A and part B the
company should produce and sell in order to maximise contribution to profit.
What is the expected contribution to profit?
Answers:
1.  a)  max 4x + 3y
        such that  x + 2y ≤ 110
        and        x ≤ 70
    b)  70 hooded and 20 round neck
    c)  £340
2.  a)  min 20x + 40y
        such that  2x + y ≥ 14
        and        x + y ≥ 12
        and        x + 3y ≥ 18
    b)  9 packets of Vita and 3 packets of Glow
    c)  £3
3.  max 50x + 30y
    such that  6x + 2y ≤ 36
               5x + 5y ≤ 40
               2x + 4y ≤ 28
    a)  5 traditional and 3 modern each day
    b)  Maximum contribution: £340
4.  max 6x + 5y
    such that  9x + 3y ≤ 360
               5x + 3y ≤ 240
               x ≥ 12
    Produce 12 printed and 60 plain t-shirts each day
5.  max 30A + 40B
    such that  4A + 2B ≤ 100
               4A + 6B ≤ 180
               A + B ≤ 40
               A ≤ 20
               B ≥ 10
    Produce 15 units of part A and 20 units of part B each week. The expected contribution is £1250 per week from the sale of these parts.
Topic 9 : Linear Programming - Shadow Prices and Sensitivity Analysis
Consider the following problem:
A company makes two kinds of armchair; Model A with loose covers and Model B with fitted covers
only.
The company estimates that it can sell as many of the armchairs as it can make. Management must
now determine the production targets for the next few months in order to maximise profit.
The company knows that it will make a profit of £50 on each type A model and £40 on each type B
model.
As Model A chairs require extra packing, chairs and loose covers, the company can only manage to
make a maximum of 8 chairs a day.
In the machining department, where the wood for the chairs is shaped, each Model A armchair
requires 1 hour whilst each Model B requires 1.5 hours. There are 15 hours of shaping time available
each day.
In the upholstery department each Model A requires 3 hours and each Model B requires 2 hours.
There is a total of 30 hours available for upholstering each day.
Determine the optimal production plan.
Furthermore answer the following questions:
1.
If an extra hour each day could be made available for shaping the wood, what would this be
worth?
2.
If an extra hour of upholstery time could be made available, by how much would profit be
increased?
3.
By how much could the profit on type A chairs alter before the original optimal plan
changes?
4.
If the original solution is to remain optimal, by how much could the profit from type B
change?
Model:
Let x be the number of model A armchairs produced each day.
Let y be the number of model B armchairs produced each day.
maximise profit:  max 50x + 40y
subject to:
max A:        x ≤ 8
shaping:      x + 1.5y ≤ 15
upholstery:   3x + 2y ≤ 30
and           x ≥ 0, y ≥ 0
The outline graph of the problem is given below:
The objective function 50x + 40y has gradient -50/40 = -5/4.
[Any line ax + by = c has gradient -a/b]
We can thus draw a profit line with this gradient and move it “top-right” to maximise.
The optimal solution is found at the point x = 6 and y = 6.
The optimal production plan is to make 6 model A armchairs and 6 model B armchairs each day.
This will produce a maximum profit of:
Model A:  6 @ £50 = £300
Model B:  6 @ £40 = £240
Total:              £540
Thus the maximum profit available is £540 a day.
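The optimum can be reproduced by solving the two constraints that meet at that corner as simultaneous equations. A minimal sketch (the helper name is my own):

```python
def solve2(a1, b1, r1, a2, b2, r2):
    """Solve a1*x + b1*y = r1 and a2*x + b2*y = r2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    return (r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det

x, y = solve2(1, 1.5, 15,    # shaping:    x + 1.5y = 15
              3, 2, 30)      # upholstery: 3x + 2y  = 30
profit = 50 * x + 40 * y
print(x, y, profit)          # 6.0 6.0 540.0
```

The solution x = 6, y = 6 also satisfies x ≤ 8, confirming the max A constraint is not binding.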
Slack and Binding Constraints.
Since the optimal solution is bounded by two constraints, shaping and upholstery, these constraints
are known as binding.
Considering each constraint in turn.
Shaping
x + 1.5y ≤ 15
When x = 6 and y = 6,
x + 1.5y = 6 + 1.5× 6 = 6 + 9 = 15
All of the available time for shaping has been used. Thus the slack on this constraint is 0.
Upholstery
When x = 6 and y = 6
3x + 2y ≤ 30
3x + 2y = 3 × 6 + 2 × 6 = 18 + 12 = 30
All of the time available for upholstering has been used. The slack on this constraint is 0.
Maximum model A:  x ≤ 8
When x = 6 and y = 6:
Left hand side, LHS:   x = 6
Right hand side, RHS:  8
Slack = RHS - LHS = 2
Not all of the model A armchairs possible have been made. Thus there is a slack on this constraint of 2.
N.B. Binding constraints have zero slack.
Shadow Prices
The shadow price ( or dual price) for a particular constraint shows the amount of improvement in the
optimal objective value as the right hand side of that constraint is increased by one unit, with all other
data held fixed
(Eppen, Gould and Schmidt, Introductory Management Science)
Thus if we are trying to maximise the objective function, the shadow price gives the increase in the
value of the objective function. If we are seeking to minimise, the shadow price gives the decrease in
the objective function value.
Shaping
x + 1.5y ≤ 15
If the time available for shaping is increased by 1 hour the new constraint will be:
x + 1.5y ≤ 16
The new optimal solution will be where this line meets the upholstery line, (see graph).
New: shaping      x + 1.5y = 16   (× 3:  3x + 4.5y = 48)
Old: upholstery   3x + 2y = 30
Subtracting:      2.5y = 18, so y = 7.2
When y = 7.2:     3x + 2y = 30  →  3x + 14.4 = 30  →  3x = 15.6  →  x = 5.2
Thus the new optimal solution would be 5.2 of model A and 7.2 of model B per day.
In practice this would mean producing 26 model A and 36 model B in a five day week.
New profit = 50x + 40y = 50 × 5.2 + 40 × 7.2 = 548
Old profit = 540
Extra profit = 8
One extra hour of shaping is worth £8. The shadow price for shaping is £8.
Upholstery
3x + 2y ≤ 30
If upholstery time is increased by 1 hour, the constraint becomes:
3x + 2y ≤ 31
and the new optimal solution will be where this constraint line meets the line for shaping.
New: upholstery   3x + 2y = 31
Old: shaping      x + 1.5y = 15   (× 3:  3x + 4.5y = 45)
Subtracting:      2.5y = 14, so y = 5.6
When y = 5.6:     3x + 2y = 31  →  3x + 11.2 = 31  →  3x = 19.8  →  x = 6.6
New optimal solution is x = 6.6, y = 5.6
New profit = 50x + 40y = 50 × 6.6 + 40 × 5.6 = 554
Old profit = 540
Extra profit = 14
One extra hour of time for upholstery is worth £14. The shadow price of upholstery is £14.
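Both shadow prices can be reproduced numerically by re-solving the pair of binding constraints with one right-hand side increased by a single unit, exactly as done above. A sketch (helper names are my own):

```python
def solve2(a1, b1, r1, a2, b2, r2):
    """Solve a1*x + b1*y = r1 and a2*x + b2*y = r2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    return (r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det

def profit(point):
    x, y = point
    return 50 * x + 40 * y

base = profit(solve2(1, 1.5, 15, 3, 2, 30))                    # 540.0

# Shadow price = profit gain when one RHS rises by one unit.
shadow_shaping = profit(solve2(1, 1.5, 16, 3, 2, 30)) - base
shadow_upholstery = profit(solve2(1, 1.5, 15, 3, 2, 31)) - base
print(round(shadow_shaping, 4), round(shadow_upholstery, 4))   # 8.0 14.0
```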
Sensitivity Analysis on the Objective Coefficient Ranges
The objective coefficient ranges tell us the changes that can be made in the objective function
coefficients without changing the optimal solution.
This is particularly useful as profits, costs etc are likely to change over time.
Our objective function line, profit = 50x + 40y, has gradient -50/40 = -5/4 = -1.25
The shaping constraint line x + 1.5y = 15 has gradient -1/1.5 = -0.667
The upholstery constraint line 3x + 2y = 30 has gradient -3/2 = -1.5
The profit line will reach an optimal solution at the intersection of the shaping and upholstery lines
whilst the gradient of the profit line lies between the gradients of these two binding constraints.
Thus the current solution will remain optimal whilst the gradient of the objective function lies
between the gradient of the two binding constraints.
Model A
Let the profit from model A chairs change to a new amount called new.
The objective function will now look like: (new)x + 40y, which has gradient -new/40.
The current optimal solution will remain optimal whilst:
gradient of upholstery < gradient of profit line < gradient of shaping
-1.5 < -new/40 < -0.667
Taking each inequality in turn:
-1.5 < -new/40    →  -60 < -new      →  new < 60
-new/40 < -0.667  →  -new < -26.667  →  26.667 < new
Hence: 26.667 < new < 60
The current solution remains optimal whilst the profit from model A lies in the range £26.67 to £60.
This means that the profit could rise by less than £60 - £50 = £10, or fall by less than
£50 - £26.67 = £23.33, without affecting the optimal solution.
Model B
Let the profit from each model B armchair change to a new amount 'new'.
The new objective function will be: 50x + (new)y, which has gradient -50/new.
The current solution will remain optimal whilst the gradient of the profit line is between the
gradients of the two binding constraints:
-1.5 < -50/new < -0.667
Taking each inequality in turn:
-1.5 < -50/new    →  -1.5×new < -50      →  50 < 1.5×new    →  new > 50/1.5 = 33.333
-50/new < -0.667  →  -50 < -0.667×new   →  0.667×new < 50  →  new < 50/0.667 = 75
Hence: 33.33 < new < 75
The current solution will remain optimal whilst the profit from each model B armchair lies in the range
£33.33 to £75.
This is the same as saying:
The current solution will remain optimal whilst the value of the profit from each model B does not rise
by more than £35 [£75 - £40] or fall by more than £6.67 [£40 - £33.33].
Thus continue to produce 6 of each model each day whilst the profit from each model B is between
£33.33 and £75.
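The two coefficient ranges follow mechanically from the gradient condition, so they can be computed in a few lines (a check under the same assumption that only one coefficient changes at a time):

```python
g_upholstery = -3 / 2      # gradient of 3x + 2y = 30
g_shaping = -1 / 1.5       # gradient of x + 1.5y = 15

# Model A: profit-line gradient is -new/40, kept between the two above.
a_lo, a_hi = -g_shaping * 40, -g_upholstery * 40
# Model B: profit-line gradient is -50/new, kept between the two above.
b_lo, b_hi = -50 / g_upholstery, -50 / g_shaping

print(round(a_lo, 2), round(a_hi, 2))   # 26.67 60.0
print(round(b_lo, 2), round(b_hi, 2))   # 33.33 75.0
```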
The WinQSB input and output look like:
Choose Edit and then variable names
Choose Edit and then Constraints
The expression can now be entered into the spreadsheet format
To get the output: Choose “Solve and Analyze” then “Solve the problem”
The solution is:
Notes for a report
The optimal production plan is to make 6 model A armchairs and 6 model B armchairs each day.
This will produce a maximum daily profit of £540
If the profit from each model A armchair remains in the range £26.67 to £60 the current solution will
remain optimal
The current solution will remain optimal whilst the profit from each model B armchair lies in the range
£33.33 to £75.
All of the available time for shaping has been used. Thus the slack on this constraint is 0.
One extra hour of shaping is worth £8. The shadow price for shaping is £8.
All of the time available for upholstering has been used. The slack on this constraint is 0.
One extra hour of time for upholstery is worth £14. The shadow price of upholstery is £14.
Not all of the 8 model A armchairs possible have been made: only 6 have been made, so there is a
slack of 2 on this constraint.
N.B. Only one change can be made at a time
Seminar Exercise (the first part of each question was done last week)
1.
A company manufactures two types of sweatshirts: hooded and round neck.
Each hooded sweatshirt makes a contribution to profit of £4 and each round neck
sweatshirt makes a contribution of £3.
Each hooded sweatshirt requires 1 hour of labour and each round neck requires 2
hours. There are 110 hours of labour available each day. There are limitations in the
production capacity so that only 70 hooded sweatshirts can be made in a day.
The company wishes to maximise the contribution from these sweatshirts.
a)  Formulate as a linear programming problem.
b)  Determine the number of hooded and round neck sweatshirts that should be made each day in order to maximise contribution.
c)  What is the value of the maximum daily contribution?
d)  If another hour of labour was available each day, what would be the maximum daily contribution?
e)  If the limitations in the production process were changed so that 71 hooded sweatshirts could be made each day, what would be the effect on the maximum daily contribution?
f)  Within what limits could the contribution from hooded sweatshirts lie, without the optimal production plan changing?
2.
A health enthusiast would like to organise his food consumption of two diet supplements Vita
and Glow so that his minimum daily requirement of three basic nutrients A, B and C is
satisfied.
The minimum daily requirements are 14 units of A, 12 units of B and 18 units of C.
Product Vita has 2 units of A and one unit each of B and C in each packet.
Product Glow has one unit each of A and B and 3 units of C in each packet.
The price of Vita is 20p and the price of Glow is 40p per packet.
The health enthusiast wants to determine the level of consumption of Vita and Glow that will
minimise expenditure whilst satisfying the minimum daily requirements.
a)  Formulate the description given above as a Linear Programming problem.
b)  Advise the user on the best combination of Vita and Glow to use.
c)  What is the minimum cost?
d)  If the minimum daily requirement for the nutrient A was increased by 1 unit, what would the minimum cost be?
e)  If the minimum daily requirement for nutrient A was reduced by 1 unit, what would the minimum cost be?
f)  Within what range could the price of Vita lie without changing the optimal combination of Vita and Glow?
3.
A furniture manufacturer makes two types of tables: Traditional and Modern.
Each traditional table requires 6 hours of cutting time, 5 hours of sanding time and 2 hours
for staining.
Each traditional table sold gives a contribution to profit of £50.
Each modern table requires 2 hours of cutting time, 5 hours of sanding time and 4 hours of
staining time.
Each modern table sold makes a contribution of £30.
Each day the manufacturer has available 36 hours of cutting time, 40 hours of sanding time
and 28 hours of staining time.
All other inputs are available as required. The company can sell all the tables it makes.
a)  Find the number of each type of table that should be made in order to maximise the contribution, and find the maximum contribution possible.
b)  How will the company’s optimal mix of tables change if there were only 30 hours of cutting time available each day?
c)  If an extra hour of cutting time became available each day, by how much would the maximum contribution change?
d)  By how much could the contribution to profit of a Modern table increase before the optimal production plan changes?
4.
A clothes manufacturer offers two versions of a particular t-shirt; one printed and the other
plain.
The manufacturing requirements are:
Cutting and printing time: The printed t-shirt takes 9 minutes each whilst each plain t-shirt
takes only 3 minutes to cut and print. There are 360 minutes available for cutting and printing
each day.
Each printed t-shirt takes 5 minutes for sewing and packing whilst each plain t-shirt takes only
3 minutes. There are 240 minutes available for sewing and packing of these t-shirts each day.
A contract with a local shop requires a minimum of 12 printed t-shirts to be produced each day.
The manufacturer makes a contribution to profit of £6 from the manufacture and sale of each
printed t-shirt and £5 for each plain t-shirt.
The manufacturer wishes to maximise this contribution to profit.
Formulate the scenario described above as a linear programming problem. You should
clearly indicate the meaning of each constraint and the meaning of any variables.
Using a graphical method, or otherwise, determine which combination of printed and plain t-shirts the manufacturer should produce and sell in order to maximise contribution to profit.
Identify any binding constraints.
If the time available for sewing and packing the t-shirts was to increase by 1 minute each day,
what effect would this have on the maximum contribution that could be earned from the
manufacture and sale of the t-shirts?
At present the contribution from a printed t-shirt is £6. By how much could this contribution
increase before the production plan would need to be changed?
5.
A small engineering company makes two types of engine parts, coded as part A and part B.
Part A has a contribution of £30 per unit and part B £40. The company wishes to establish the
weekly production plan which maximises contribution.
Production data are as follows:
                           machining (hours)   labour (hours)   materials (kg)
part A                            4                  4                1
part B                            2                  6                1
total available per week        100                180               40
Because of a trade agreement, sales of part A are limited to a weekly maximum of 20 units
and to honour an agreement with an old established customer at least 10 units of part B must
be made each week.
Formulate the scenario described above as a linear programming problem. You should clearly
indicate the meaning of each constraint and the meaning of any variables.
Using a graphical method, or otherwise, determine which combination of part A and part B the
company should produce and sell in order to maximise contribution to profit.
What is the expected contribution to profit?
b)  If the number of hours available for machining each week increased by 5, by how much would the maximum possible contribution change?
c)  If the amount of material available each week increased by 1 kg, what effect would this have on the maximum contribution?
d)  Within what range could the contribution from a part B lie, without altering the optimal solution found previously?
Answers:
1.  d)  maximum contribution would increase by £1.50 to £341.50
    e)  maximum contribution would increase by £2.50
    f)  contribution from hooded must be above £1.50
2.  d)  minimum cost will not change, i.e. £3
    e)  minimum cost will not change, i.e. £3
    f)  price of Vita must lie in the range 13.3p to 40p
3.  a)  Maximum contribution: £340
    b)  3.5 Traditional and 4.5 Modern each day (7 Traditional and 9 Modern every 2 days)
    c)  maximum contribution would increase by £5
    d)  could increase by up to £20
4.  Produce 12 printed and 60 plain t-shirts each day
    Binding constraints: sewing and order (printed ≥ 12)
    If sewing and packing time increases by 1 min, max contribution increases by £1.67
    Contribution from a printed t-shirt could increase by up to £2.33 before the optimal solution changes.
5.  Produce 15 units of part A and 20 units of part B each week. The expected contribution is £1250 per week from the sale of these parts.
    b)  5 @ £1.25 = £6.25
    c)  no effect as slack constraint
    d)  £15 to £45
Topic 10 : Linear Programming - Extension to more than two variables
There are many types of problems that can be solved using linear programming techniques.
We shall limit our consideration to just two examples: financial planning and farm production.
Example:
Lottery Winner
A lottery winner, Fred, has instructed his financial adviser to invest £100,000 of his winnings in the
best combination of three stocks, Alpha, Beta and Gamma.
The details on these stocks are given in the table below:
Stock    price per share    estimated annual return per share    maximum possible investment
Alpha          60                         7                              60,000
Beta           25                         3                              25,000
Gamma          20                         3                              30,000
Formulate, and solve, a linear programme to show how many shares of each stock Fred should
purchase in order to maximise the estimated total annual return.
Advise Fred on the results of the sensitivity analysis.
Formulation
Let A be the number of Alpha shares purchased.
Let B be the number of Beta shares purchased.
Let C be the number of Gamma shares purchased.
Total annual return per share is 7A + 3B + 3C
Total amount of money to be invested is £100,000
Thus 60A + 25B + 20C ≤ 100,000
[assuming we do not have to invest all of it]
Only £60,000 can be invested in Alpha @ £60 each, therefore A ≤ 1,000
Only £25,000 can be invested in Beta @ £25 each, therefore B ≤ 1,000
Only £30,000 can be invested in Gamma @ £20 each, therefore C ≤ 1,500
Problem:
max 7A + 3B + 3C
subject to
total investment:   60A + 25B + 20C ≤ 100,000
max A:              A ≤ 1,000
max B:              B ≤ 1,000
max C:              C ≤ 1,500
non-negativity:     A ≥ 0, B ≥ 0, C ≥ 0
Computer Input
Computer Output
Decision variable    Solution value
Alpha                      750
Beta                     1,000
Gamma                    1,500
Objective Function (max) = 12,750

constraint           Slack or Surplus    Shadow Price
total investment            0               0.1167
max in A                  250               0
max in B                    0               0.0833
max in C                    0               0.6667
Optimal solution:
Buy 750 shares in Alpha, 1,000 shares in Beta and 1,500 shares in Gamma.
This will give a total estimated return of £12,750.
Slacks:
Constraint total investment has slack = S1 = 0. All of the £100,000 has been invested.
Constraint max in A has slack = S2 = 250.
The number of Alpha shares bought was 250 less than the number available.
[1000 were available but only 750 bought]
Constraint max in B has slack = S3 = 0.
The maximum number of Beta shares has been purchased.
Constraint max in C has slack = S4 = 0.
The maximum number of Gamma shares has been purchased.
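Because Beta and Gamma are bought to their limits and the remaining cash goes to Alpha, this particular LP behaves like a bounded fractional knapsack: allocating the budget in order of estimated return per pound invested reproduces the solver's answer. A sketch (variable names are my own, not WinQSB output):

```python
budget = 100_000
# (name, price per share, estimated annual return per share, max investment)
stocks = [
    ("Alpha", 60, 7, 60_000),
    ("Beta", 25, 3, 25_000),
    ("Gamma", 20, 3, 30_000),
]

shares = {}
# Return per pound: Gamma 0.15, Beta 0.12, Alpha 0.1167 - invest greedily.
for name, price, ret, cap in sorted(stocks, key=lambda s: s[2] / s[1], reverse=True):
    spend = min(cap, budget)
    shares[name] = spend // price          # whole shares only
    budget -= shares[name] * price

total_return = sum(shares[name] * ret for name, _, ret, _ in stocks)
print(shares, total_return)
# {'Gamma': 1500, 'Beta': 1000, 'Alpha': 750} 12750
```

The greedy pass works here only because the caps are independent; for a general LP of this kind a proper solver is still needed.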
Binding Constraints
The binding constraints are total investment, max in B, and max in C. (They all have zero slack.)
Thus the maximum return is limited by the total amount of money that can be spent, the number of
Beta shares available and the number of Gamma shares available.
Shadow Prices
Total Investment:
Total investment has a shadow price of 0.1167
If the total investment possible was increased by £1, a further £0.1167 could be earned.
Since any increased investment would be spent on Alpha shares, (Beta and Gamma shares are all
used), every extra £1 spent on Alpha shares would generate £7/60 = £0.1167.
Max in A
The maximum Alpha constraint has a shadow price of zero: increasing the availability of Alpha
shares earns nothing, as there are already unused shares.
Max in B
If the total number of Beta shares available was increased by 1, a further £0.0833 could be earned.
The extra Beta share costs £25, money which would otherwise buy Alpha shares, so the net gain is
£3 - 25 × (7/60) = £0.0833.
Max in C
If the total number of Gamma shares available was increased by 1, a further £0.67 could be earned.
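All three non-zero shadow prices can be rationalised the same way as the total-investment figure: every marginal pound comes from, or goes to, Alpha, the one stock not bought to its limit. A quick check of the reported values under that assumption:

```python
alpha_per_pound = 7 / 60          # Alpha is the marginal investment

shadow_total = alpha_per_pound                 # extra £1 of budget
shadow_beta = 3 - 25 * alpha_per_pound         # extra Beta share, funded from Alpha
shadow_gamma = 3 - 20 * alpha_per_pound        # extra Gamma share, funded from Alpha

print(round(shadow_total, 4), round(shadow_beta, 4), round(shadow_gamma, 4))
# 0.1167 0.0833 0.6667
```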
Sensitivity Analysis
Decision Variable    Unit Cost or Profit    Allowable Min    Allowable Max
A                            7                    0                7.2
B                            3                 2.9167              M (= Infinity)
C                            3                 2.3333              M (= Infinity)
A shares
The current suggested portfolio of shares will remain optimal whilst the return on an Alpha share
remains in the range 0 to £7.20, assuming the other returns remain constant.
B shares
The current suggested portfolio will remain optimal whilst the return on a Beta share remains above
£2.92.
C shares
The current portfolio will remain optimal whilst the return on a Gamma share remains above £2.33.
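The same per-pound comparison explains these ranges: Alpha's return can rise until its return per pound catches Beta's, while Beta's or Gamma's can fall until they drop to Alpha's per-pound rate on their respective prices. A sketch under that assumption:

```python
per_pound = {"Alpha": 7 / 60, "Beta": 3 / 25, "Gamma": 3 / 20}

a_max = per_pound["Beta"] * 60    # Alpha return where it ties Beta per pound
b_min = per_pound["Alpha"] * 25   # Beta return where it falls to Alpha's rate
c_min = per_pound["Alpha"] * 20   # Gamma return where it falls to Alpha's rate

print(round(a_max, 4), round(b_min, 4), round(c_min, 4))
# 7.2 2.9167 2.3333
```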
Ranges for Shadow Prices
Constraint          Right Hand Side    Shadow Price    Allowable Min RHS    Allowable Max RHS
total investment        100,000           0.1167            55,000              115,000
max in A                  1,000           0                     750              M
max in B                  1,000           0.0833                400              2,800
max in C                  1,500           0.6667                750              3,750
These values give the ranges within which the shadow prices apply.
Total investment
If the RHS of the total investment constraint increases to £115,000 or less, then for each extra
£1 available for investment the return will increase by £0.1167. However, if the increase takes the
total investment available above £115,000, we do not know what the extra return will be.
Similar arguments apply for reductions in the maximum total investment, provided the total available
does not fall below £55,000.
Max in A
If the RHS of constraint max A, maximum number of Alpha shares, takes any value above 750, then
for each additional share, the extra return will be zero. (This confirms that we only needed 750 Alpha
shares for the original solution.)
Max in B
If the RHS of constraint max B, maximum number of Beta shares, is in the range 400 to 2,800 shares,
the shadow price of £0.0833 will apply.
Max in C
As long as the maximum number of Gamma shares is in the range 750 to 3,750, the shadow price of
£0.6667 will apply.
Farm Production
A farmer has two farms, Home Farm and Meadow Farm, in which he grows corn and wheat.
Both farms are 40 acres in size.
To satisfy a contract with a local mill, the farmer must produce 7,000 barrels of corn and 11,000
barrels of wheat each year.
The farmer wishes to minimise the cost of meeting the contract.
The data for each farm is given below:
Home Farm:
          yield per acre   cost per acre
corn      400 barrels      £100
wheat     300 barrels      £90

Meadow Farm:
          yield per acre   cost per acre
corn      650 barrels      £120
wheat     350 barrels      £80
Write a report to the farmer analysing your findings.
Formulation:
Variables:
We need to distinguish between production at Home and Meadow Farm.
Let CH be the number of acres of corn planted at Home Farm
Let CM be the number of acres of corn planted at Meadow Farm
Let WH be the number of acres of wheat planted at Home Farm
Let WM be the number of acres of wheat planted at Meadow Farm
Minimise Cost:
Minimise 100CH + 90WH + 120CM + 80WM
subject to:
corn yield (1):        400CH + 650CM ≥ 7,000
wheat yield (2):       300WH + 350WM ≥ 11,000
Home Farm area (3):    CH + WH ≤ 40
Meadow Farm area (4):  CM + WM ≤ 40
CH ≥ 0, CM ≥ 0, WH ≥ 0, WM ≥ 0
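The notes solve this LP with QSB for Windows; as an illustrative cross-check (a sketch assuming SciPy is available, not part of the original materials), the same formulation can be entered into `scipy.optimize.linprog`:

```python
# Farm LP from the formulation above, solved with SciPy's linprog.
# Variable order: CH, WH, CM, WM (acres of corn/wheat at Home/Meadow).
from scipy.optimize import linprog

c = [100, 90, 120, 80]  # planting cost per acre

# linprog expects A_ub @ x <= b_ub, so the >= yield constraints are negated.
A_ub = [
    [-400,    0, -650,    0],  # corn yield  >= 7,000 barrels
    [   0, -300,    0, -350],  # wheat yield >= 11,000 barrels
    [   1,    1,    0,    0],  # Home Farm   <= 40 acres
    [   0,    0,    1,    1],  # Meadow Farm <= 40 acres
]
b_ub = [-7000, -11000, 40, 40]

res = linprog(c, A_ub=A_ub, b_ub=b_ub)  # default bounds are x >= 0
print(round(res.fun, 2))                # minimum cost, about 3861.54
print([round(v, 2) for v in res.x])     # about [0.0, 2.56, 10.77, 29.23]
```

The minimum cost and the planting plan agree with the figures analysed in the report.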
The input for Farms is:
Report
1. Planting Plan
In order to minimise the planting costs, the following should be implemented:
Plant Home Farm with 2.56 acres of wheat
Plant Meadow Farm with 10.77 acres of corn and 29.23 acres of wheat.
This will incur costs of £3,861.54, the minimum that can be achieved.
2. Implications of Suggested Planting Programme
The planting plan given above will result in exactly 7,000 barrels of corn and exactly
11,000 barrels of wheat being produced.
[S1 = 0, S2 = 0]
37.44 acres of Home Farm will not be planted, whilst all 40 acres of Meadow Farm will be
used.
3. Scope of the Recommendations
3.1 Change in Costs
The planting plan given above will result in a minimum cost whilst the following
conditions hold:
* The cost of planting corn at Home Farm stays above £89.23 per acre.
* The cost of planting corn at Meadow Farm stays below £137.50 per acre.
* The cost of planting wheat at Home Farm stays in the range £68.57 to £105 per acre.
* The cost of planting wheat at Meadow Farm stays in the range £62.50 to £105 per acre.
If any one of the conditions above fails to hold, a new planting plan will be required. These ranges
apply to one cost changing at a time; if two or more of the present costs change simultaneously, a new
plan may also be required.
As these ranges are relatively large, the proposed plan should hold good for some time.
3.2 Change in Contract
The proposed plan exactly meets the contracts for 7,000 barrels of corn and 11,000
barrels of wheat.
If the requirement for corn was to increase by one barrel, the extra costs incurred in
meeting this target would be 22.3 pence. This marginal cost of 22.3 pence per barrel
applies for levels of production between 5,571.43 and 26,000 barrels.
If the minimum amount of wheat required was increased, the extra cost would be 30
pence per barrel. This extra cost per barrel will apply whilst the contract for wheat lies
in the range 10,230.8 to 22,230.8 barrels.
If the minimum requirement for corn or wheat fell outside these ranges, a new planting plan would be
required.
3.3 Changes in Farm Size
In total only 2.56 acres of Home Farm are being used. Clearly it would not make
sense to consider increasing the size of Home Farm.
All 40 acres of Meadow Farm are being used. If the size of Meadow Farm could be
increased by 1 acre, the minimum cost could be reduced by £25. This marginal
reduction in cost applies when the size of Meadow Farm lies in the range 10.77 acres
to 42.2 acres.
Similarly, if the size of Meadow Farm is reduced by 1 acre, the minimum cost will rise
by £25.
[If we could increase the size of Meadow Farm by 1 acre, we could produce an extra
350 barrels of wheat at Meadow Farm at a cost of £80.
At the same time we could reduce the wheat grown at Home Farm by 350 barrels,
i.e. by 350/300 = 1.1667 acres. This reduction would save 1.1667 × £90 = £105.
Thus the net reduction in cost would be £105 - £80 = £25.]
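The bracketed argument can also be checked numerically (again a sketch assuming SciPy, standing in for QSB's sensitivity output): re-solve the farm LP with 41 acres at Meadow Farm and compare the minimum costs.

```python
# Numerical check of the £25 shadow price on Meadow Farm acreage.
from scipy.optimize import linprog

c = [100, 90, 120, 80]              # costs per acre: CH, WH, CM, WM
A_ub = [[-400, 0, -650, 0],         # corn  >= 7,000 barrels (negated)
        [0, -300, 0, -350],         # wheat >= 11,000 barrels (negated)
        [1, 1, 0, 0],               # Home Farm acreage
        [0, 0, 1, 1]]               # Meadow Farm acreage

cost_40 = linprog(c, A_ub=A_ub, b_ub=[-7000, -11000, 40, 40]).fun
cost_41 = linprog(c, A_ub=A_ub, b_ub=[-7000, -11000, 40, 41]).fun
print(round(cost_40 - cost_41, 2))  # about 25.0, matching the net reduction
```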
Seminar Exercise
Question 1.
Formulate the problem below as a linear programming problem.
Solve the problem using QSB for Windows.
Write some notes on the interpretation of the output.
During the seminar discuss the formulation and the output.
A company has two factories A and B. Each factory makes two products, standard and deluxe. A unit
of standard gives a profit contribution of £10, while a unit of deluxe gives a profit contribution of £15.
The company wishes to maximise this profit contribution.
Each factory uses two processes, grinding and polishing, for producing its products.
Factory A has a grinding capacity of 80 hours per week and a polishing capacity of 60 hours per week.
For factory B these capacities are 60 and 75 hours per week respectively.
The grinding and polishing times in hours for a unit of each type of product in each factory are given in
the table below:
             Factory A            Factory B
             standard   deluxe    standard   deluxe
grinding     4          2         5          3
polishing    2          5         5          6
It is possible, for example, that factory B has older machines than factory A, resulting in higher unit
processing times.
In addition each unit of each product uses 4 kilograms of a raw material which we shall refer to as
“raw”. The company has 120 kilograms of “raw” available per week.
(Example is taken from H. P. Williams, 'Model Building in Mathematical Programming',
3rd edition, Wiley, 1990, page 45)
[9.166 units of standard in factory A, 8.33 units of deluxe in factory A, 12.5 units of deluxe
in factory B each week. Maximum contribution = £404.17 per week]
Question 2.
Hitech Training
A small private college Hitech Training provides training in web-page design and development for
companies. The companies send their employees to the college premises for short courses. Each
course lasts 8 hours. The distribution of the 8 hours is flexible; a course may take place during one
day or be spread over a number of days. It is also possible for one course to be spread over a number
of weeks, e.g. 2 hours on Monday evening for 4 weeks.
The companies send employees in groups of 10 or fewer, as the college's computer rooms can only
accommodate 10 trainees at a time. If a company chooses to send fewer than 10 employees on a
particular course, it is not charged a reduced fee, as the teaching costs are fixed. If the company
wishes to send more than 10 employees, it must book more than one course.
The prices charged to the companies for a course are as follows:
Introductory Web Design    £500   for an 8-hour course for up to 10 employees
Intermediate Web Design    £600   for an 8-hour course for up to 10 employees
Advanced Web Design        £700   for an 8-hour course for up to 10 employees
The college employs trainers to deliver these courses. The trainers are only paid for the hours they
work. The rates of pay are as follows:
Introductory Web Design    £18 an hour
Intermediate Web Design    £21 an hour
Advanced Web Design        £35 an hour
Demand for these training courses is such that the college is confident of having enough companies
wanting courses to fill all the courses it provides. However, the number of courses that can be
provided each week is limited by the fact that the college only has 200 trainer-hours available each
week. [A trainer-hour is defined as one trainer teaching for one hour.]
The college has enough rooms and trainers to meet any time-tabling problems that could arise.
Of these 200 trainer-hours, only 80 are available for the Advanced Web Design course. A total of 160
trainer-hours are available for teaching the Intermediate or Advanced courses.
The trainers that can teach the Advanced courses can also teach the Intermediate courses. All
trainers can teach the Introductory course.
The college's analysis of the demand for its courses indicates that at least 5 of the courses offered
each week should be at Advanced level and at least 5 at Intermediate level.
The college wishes to maximise its Net Revenue.
This is defined as:
Net Revenue = Revenue from courses - Labour Costs.
Required:
• Formulate the scenario described above as a Linear Programme.
  Keep a copy of any notes made during the formulation, as you may need these in the
  examination.
• Enter the problem into QSB for Windows (or a similar package) and produce printouts of:
  - the input data
  - the solution of the problem
  - the sensitivity analysis
  You will need to ensure that your printouts include: shadow prices, right-hand side ranges,
  ranges of objective coefficients.
• Bring these printouts and any notes to the seminar.
Answer:
Question 1.
Let AS be the number of standard products made in factory A each week
Let BS be the number of standard products made in factory B each week
Let AD be the number of deluxe products made in factory A each week
Let BD be the number of deluxe products made in factory B each week
max 10AS + 10BS + 15AD + 15BD
subject to:
grinding in A:    4AS + 2AD ≤ 80
polishing in A:   2AS + 5AD ≤ 60
grinding in B:    5BS + 3BD ≤ 60
polishing in B:   5BS + 6BD ≤ 75
raw:              4AS + 4BS + 4AD + 4BD ≤ 120
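The formulation for Question 1 can be checked numerically; a sketch using SciPy's linprog as a stand-in for QSB (linprog minimises, so the profit coefficients are negated):

```python
# Factory LP from the Question 1 formulation. Variable order: AS, BS, AD, BD.
from scipy.optimize import linprog

c = [-10, -10, -15, -15]  # negated profit contributions (maximisation)
A_ub = [
    [4, 0, 2, 0],   # grinding in A  <= 80
    [2, 0, 5, 0],   # polishing in A <= 60
    [0, 5, 0, 3],   # grinding in B  <= 60
    [0, 5, 0, 6],   # polishing in B <= 75
    [4, 4, 4, 4],   # raw material   <= 120
]
b_ub = [80, 60, 60, 75, 120]

res = linprog(c, A_ub=A_ub, b_ub=b_ub)
print(round(-res.fun, 2))            # maximum contribution, about 404.17
print([round(v, 2) for v in res.x])  # one optimal production plan
```

The objective value matches the £404.17 per week quoted with the problem.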
Question 2:
Output.
Topic 11: Project Management: Critical Path
Introduction
In every industry there are concerns about how to manage large-scale, complicated projects
effectively. Millions of pounds in cost overruns have been wasted due to the poor planning and control
of projects.
Project planning
Planning the project requires that the objectives are clearly defined so the project team knows exactly
what is required of it. All the activities involved in the project must be clearly identified. An activity
is the performance of an individual job that requires labour, resources and time. Once the activities
have been defined, their sequential relationships (which activity comes first, which activities must
precede others) must be determined.
Once the activities and their relationships to each other are known, a network of activities can be
drawn up. The time estimates for each activity then enable a total project completion time to be
calculated. If this total completion time is longer than that specified in the objectives, means must be
found to reduce the total project time (a process known as crashing). This reduction in time is usually
achieved by assigning more resources to certain activities, which will reduce the overall project time.
Project Control
Once the planning process is complete and the work has begun, the focus moves on to controlling the
actual work involved. Controlling a project involves ensuring that the timetable set in the planning
stage is adhered to and that the activities are completed in the appropriate sequence. Should any
unforeseen problems arise at this stage, the project may have to be re-scheduled and additional
resources allocated to ensure that the original completion time is adhered to.
We are going to cover how to:
• plan, monitor and control projects with the use of CPM;
• determine the earliest start, earliest finish, latest finish and float times for each activity, along
  with the total project completion time;
• produce Gantt charts to facilitate monitoring of the project;
• reduce the total project time, at the least total cost, by crashing the network.
Critical Path Method
The Critical Path Method (CPM) is a popular technique that helps managers plan, schedule, monitor
and control large and complex projects.
The steps in the procedure are:
1. Define the project and all of its significant activities.
2. Decide which activities must precede others.
3. Draw the network connecting all of the activities.
4. Assign time and/or cost estimates to each activity.
5. Compute the critical path through the network.
6. Use the network to plan, schedule, monitor and control the project.
Example: to produce a text book

Activity code   Activity description                  Duration (months)   Preceding activities
A               Initial consultation with publisher   3                   None
B               Prepare proposal                      2                   A
C               Sign contract                         1                   B
D               Write material                        18                  B
E               First proof-read                      4                   C, D
F               Final proof-read and publish          6                   E
The CPM diagram consists of a network of boxes of the form
    Activity | EST  EFT
    duration | LST  LFT

where EST = earliest start time, EFT = earliest finish time, LST = latest start time and
LFT = latest finish time.

[Network diagram: A (3) → B (2); B leads to both C (1) and D (18); C and D lead to E (4);
E → F (6). Each activity box has spaces for EST, EFT, LST and LFT, to be filled in during
the forward and backward passes.]
Forward Pass
The forward pass involves calculating the earliest time that each activity can start and finish.
Let activity A start at time 0 (its earliest start time)
As activity A takes 3 months its earliest finish time is 0+3
Thus earliest finish time (EFT) = earliest start time (EST) + duration.
B cannot start until A has completed, thus its earliest start time is the same as A's earliest finish
time. Thus B has EST = 3. The duration is 2 and thus the earliest finish time = 5.
C follows on from B and thus its earliest start time = B's earliest finish time = 5. The earliest finish time
is 5 + 1 = 6.
D follows on from B and thus its earliest start time is 5. Its earliest finish time is 5 + 18 = 23.
E follows from both C and D. As D has an earliest finish time of 23, E cannot start until 23. Thus EST =
23. Duration is 4, thus earliest finish time = 23 + 4 = 27.
F cannot start until E has finished, thus earliest start time = 27 and earliest finish time = 33.
Thus the project will take 33 months.
[Network after the forward pass: A: EST 0, EFT 3. B: EST 3, EFT 5. C: EST 5, EFT 6.
D: EST 5, EFT 23. E: EST 23, EFT 27. F: EST 27, EFT 33. The LST and LFT boxes are still blank.]
Backward Pass
The backward pass is used to determine which activities are "critical" and works by considering the
latest start time. Latest start time (LST) = latest finish time (LFT) - duration.
The latest finish time of F = 33 (as we do not want to extend the total time taken). The latest start time
= 33 - duration (6) = 27.
As F can start at 27 months, this will be E's latest finish time. E has a latest start time of 27 - 4 = 23.
C has a latest finish time of 23 and a latest start time of 23 - 1 = 22.
D has a latest finish time of 23 and a latest start time of 23 - 18 = 5.
B must occur before C and D. Taking the smaller of the two latest start times, B has a latest finish
time of 5. The latest start time is 5 - 2 = 3.
A has a latest finish time of 3 and a latest start time of 3 - 3 = 0.
[Completed network: A: EST 0, LST 0, EFT 3, LFT 3. B: EST 3, LST 3, EFT 5, LFT 5.
C: EST 5, LST 22, EFT 6, LFT 23. D: EST 5, LST 5, EFT 23, LFT 23.
E: EST 23, LST 23, EFT 27, LFT 27. F: EST 27, LST 27, EFT 33, LFT 33.]
The critical path is where the earliest finishing time equals the latest finishing time. In this example, the
critical path is ABDEF. The minimum time the project can take is 33 months.
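The forward and backward passes above can be automated; the following Python sketch (purely illustrative, not part of the module's software) reproduces the result for the textbook example:

```python
# Illustrative CPM sketch: forward and backward passes for the textbook
# example. Dict order follows the table, which is already a topological
# order of the activities.
durations = {'A': 3, 'B': 2, 'C': 1, 'D': 18, 'E': 4, 'F': 6}
preds = {'A': [], 'B': ['A'], 'C': ['B'], 'D': ['B'], 'E': ['C', 'D'], 'F': ['E']}

# Forward pass: EST = max EFT of predecessors; EFT = EST + duration.
est, eft = {}, {}
for a in durations:
    est[a] = max((eft[p] for p in preds[a]), default=0)
    eft[a] = est[a] + durations[a]

# Backward pass: LFT = min LST of successors; LST = LFT - duration.
project_end = max(eft.values())
succs = {a: [b for b in preds if a in preds[b]] for a in durations}
lft, lst = {}, {}
for a in reversed(list(durations)):
    lft[a] = min((lst[s] for s in succs[a]), default=project_end)
    lst[a] = lft[a] - durations[a]

critical = [a for a in durations if eft[a] == lft[a]]
print(project_end, critical)  # 33 ['A', 'B', 'D', 'E', 'F']
```

The activities with EFT = LFT are exactly A, B, D, E and F, giving the critical path ABDEF and a 33-month minimum duration.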
Example: An Estate Agency is planning to open a new office in North London.
The main activities are as follows:

Activity   Description                  Preceding activity   Duration (weeks)
A          Find new office location     None                 9
B          Recruit new staff            None                 7
C          Make alterations to office   A                    5
D          Order equipment              A                    3
E          Install new equipment        D                    3
F          Train staff                  B                    4
G          Test operations              C, E, F              1
Forward Pass
A: EST = 0, duration = 9, EFT = 0 + 9 = 9
B: EST = 0, duration = 7, EFT = 0 + 7 = 7
C: EST = 9, duration = 5, EFT = 9 + 5 = 14
D: EST = 9, duration = 3, EFT = 9 + 3 = 12
E: EST = 12, duration = 3, EFT = 12 + 3 = 15
F: EST = 7, duration = 4, EFT = 7 + 4 = 11
G: G cannot start until C, E and F have all finished. These have EFT values of 14, 15 and 11.
   Thus EST = 15, duration = 1, EFT = 15 + 1 = 16.
Thus the earliest finishing time of the entire project is 16 weeks.
Backward Pass
G: LFT = 16 [same as the final EFT], LST = LFT - duration = 16 - 1 = 15
E: LFT = 15, LST = 15 - 3 = 12
C: LFT = 15, LST = 15 - 5 = 10
D: LFT = LST of E = 12, LST = 12 - 3 = 9
F: LFT = 15, LST = 15 - 4 = 11
A: LFT = 9 (the smaller of the LSTs of C and D), LST = 9 - 9 = 0
B: LFT = 11, LST = 11 - 7 = 4

[Completed network: A: EST 0, LST 0, EFT 9, LFT 9. B: EST 0, LST 4, EFT 7, LFT 11.
C: EST 9, LST 10, EFT 14, LFT 15. D: EST 9, LST 9, EFT 12, LFT 12.
E: EST 12, LST 12, EFT 15, LFT 15. F: EST 7, LST 11, EFT 11, LFT 15.
G: EST 15, LST 15, EFT 16, LFT 16.]
The critical path is the route joining the nodes where EFT = LFT.
In this example, the critical path is ADEG. The shortest finishing time for the project is 16
weeks.
Floats
The floats measure the amount of time that an activity can over-run without affecting the
total completion time.
Float = Latest finish time (LFT) - Earliest finish time (EFT)
Considering the Estate Agency example used earlier:
[Completed network, repeated: A: 0, 0, 9, 9. B: 0, 4, 7, 11. C: 9, 10, 14, 15. D: 9, 9, 12, 12.
E: 12, 12, 15, 15. F: 7, 11, 11, 15. G: 15, 15, 16, 16 (EST, LST, EFT, LFT).]
The critical path is ADEG. The shortest finishing time for the project is 16 weeks.
Activity   EFT   LFT   float   Critical
A          9     9     0       yes
B          7     11    4
C          14    15    1
D          12    12    0       yes
E          15    15    0       yes
F          11    15    4
G          16    16    0       yes
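The float table can be reproduced in code; a minimal Python sketch (illustrative only) using the estate agency data, with float = LFT - EFT:

```python
# Illustrative sketch: compute floats for the estate agency example
# after the forward and backward passes.
durations = {'A': 9, 'B': 7, 'C': 5, 'D': 3, 'E': 3, 'F': 4, 'G': 1}
preds = {'A': [], 'B': [], 'C': ['A'], 'D': ['A'],
         'E': ['D'], 'F': ['B'], 'G': ['C', 'E', 'F']}

est, eft = {}, {}
for a in durations:                      # forward pass (table order is topological)
    est[a] = max((eft[p] for p in preds[a]), default=0)
    eft[a] = est[a] + durations[a]

end = max(eft.values())
succs = {a: [b for b in preds if a in preds[b]] for a in durations}
lft, lst = {}, {}
for a in reversed(list(durations)):      # backward pass
    lft[a] = min((lst[s] for s in succs[a]), default=end)
    lst[a] = lft[a] - durations[a]

floats = {a: lft[a] - eft[a] for a in durations}
print(end, floats)  # 16 {'A': 0, 'B': 4, 'C': 1, 'D': 0, 'E': 0, 'F': 4, 'G': 0}
```

The zero-float activities A, D, E and G form the critical path ADEG, as in the table above.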
Gantt Charts
Gantt Charts can be used by managers to chart the progress of a project. In these charts, time is
measured along the horizontal axis, each activity is listed on the vertical axis, and a bar is drawn to
show the duration of each activity. The variety of Gantt chart we will use shows each bar starting at its
earliest start time (EST); the length of the bar represents the duration of the activity, with any 'float'
being shown at the end of the duration.
The Gantt Chart for the Estate Agency is:

[Gantt chart: activities A to G plotted against weeks 0 to 16; each bar starts at its EST, with the
duration shown as a solid bar and any float shaded at its end.]
Monitoring progress on a Gantt chart
A crucial step in meeting a completion target is to monitor the project's progress. Gantt charts
provide a visual means of tracking the progress of the project's activities against the scheduled
completion dates. The chart also lets the manager know where there is some 'slack', in the sense of
the 'float' times.
For example, at the end of week 9, activity A should be completed and F should have been started.
Activities C and D should begin in the next week. Ideally activity B will have finished, but as long as it
completes by week 11, the project can still complete on time.
Gantt Charts
Advantages
1. The chart is easy to draw, particularly if appropriate software is used.
2. An earliest possible completion date is clearly visible.
3. For each activity the earliest possible start and finishing times are shown.
Disadvantages
1. The chart only gives one possible sequence for the activities (the earliest possible completion
   time).
2. The precedence rules are not clear from the chart, so it is not obvious how a delay in one
   activity will affect the project.
Seminar Activity:
For the following cases, draw the network, find the critical path and shortest duration.
Identify the "float" for each activity.

a)
Activity   Preceding   Duration
A          None        3
B          None        4
C          A           4
D          A           7
E          B, C        2

b)
Activity   Preceding   Duration
A          None        4
B          None        6
C          A           4
D          A           3
E          C           5
F          B           2
G          D, F        8
H          B           5

c)
Activity   Preceding   Duration
A          None        4
B          A           12
C          A           11
D          None        20
E          D           6
F          B, C, E     7
G          F           10
H          E           5
I          G, H        4
Topic 11: Project Management: Crashing
Some definitions:
Normal Cost: The cost associated with the normal time estimate for the activity.
Crash Cost:
The cost associated with the minimum possible time for an activity. Crash Costs,
because of the extra wages, overtime payments etc., are higher than the normal costs.
Crash time:
The minimum possible time that an activity can take.
Slope:
The average cost of shortening an activity by one time unit.
Slope = increase in cost / decrease in time = (crash cost - normal cost) / (normal time - crash time)
Crashing:
This process involves finding the least cost method of reducing the overall project
duration.
This is done by reducing the time of the activity with the minimum ‘slope’ providing this
activity is on the critical path.
The process is repeated until no further savings can be made.
Example:

Activity   Preceding activity   Normal duration   Crash duration   Normal cost (£)   Crash cost (£)
A          None                 4                 3                360               420
B          None                 8                 5                300               510
C          A                    5                 3                170               270
D          A                    9                 7                220               300
E          B, C                 5                 3                200               360
                                                  total             1250

Firstly: draw the network, find the critical path and shortest duration.

[Network: A: EST 0, LST 0, EFT 4, LFT 4. B: EST 0, LST 1, EFT 8, LFT 9.
C: EST 4, LST 4, EFT 9, LFT 9. D: EST 4, LST 5, EFT 13, LFT 14.
E: EST 9, LST 9, EFT 14, LFT 14.]

Critical path = A, C, E; normal duration = 14 days; Total Cost = £1250
Crashing:
First Crash:
Step 1: Calculate the 'slopes' using slope = (crash cost - normal cost) / (normal time - crash time).

Activity   critical   Normal duration   Crash duration   Normal cost (£)   Crash cost (£)   slope
A          yes        4                 3                360               420              (420 - 360) / (4 - 3) = 60
B                     8                 5                300               510
C          yes        5                 3                170               270              (270 - 170) / (5 - 3) = 50
D                     9                 7                220               300
E          yes        5                 3                200               360              (360 - 200) / (5 - 3) = 80
                                        total            1250
Thus activity C has the minimum slope and thus is the cheapest way of reducing the total duration by
1 day. Reducing C by 1 day costs £50.
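The slope calculation is easily scripted; a small Python sketch (illustrative only) using the data from the table:

```python
# Crash 'slopes': slope = (crash cost - normal cost) / (normal time - crash time).
normal_time = {'A': 4, 'B': 8, 'C': 5, 'D': 9, 'E': 5}
crash_time  = {'A': 3, 'B': 5, 'C': 3, 'D': 7, 'E': 3}
normal_cost = {'A': 360, 'B': 300, 'C': 170, 'D': 220, 'E': 200}
crash_cost  = {'A': 420, 'B': 510, 'C': 270, 'D': 300, 'E': 360}

slope = {a: (crash_cost[a] - normal_cost[a]) / (normal_time[a] - crash_time[a])
         for a in normal_time}
print(slope)  # {'A': 60.0, 'B': 70.0, 'C': 50.0, 'D': 40.0, 'E': 80.0}

# The first crash reduces the cheapest activity on the critical path ACE:
critical = ['A', 'C', 'E']
cheapest = min(critical, key=slope.get)
print(cheapest, slope[cheapest])  # C 50.0
```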
Step 2: New network:

[Network with C reduced to 4 days: A: EST 0, LST 0, EFT 4, LFT 4. B: EST 0, LST 0, EFT 8, LFT 8.
C: EST 4, LST 4, EFT 8, LFT 8. D: EST 4, LST 4, EFT 13, LFT 13. E: EST 8, LST 8, EFT 13, LFT 13.]

Step 3: New critical path, total duration and cost
Critical paths = ACE, AD and BE
Total Cost = £1250 + £50 = £1300 and duration = 13 days
Second Crash: duration of C is now 4 and all activities are critical.
Step 1: Calculate the 'slopes' using slope = (crash cost - normal cost) / (normal time - crash time).

Activity   critical   Normal duration   Crash duration   Normal cost (£)   Crash cost (£)   slope
A          yes        4                 3                360               420              60 [from before]
B          yes        8                 5                300               510              (510 - 300) / (8 - 5) = 70
C          yes        4                 3                170 (220)         270              50 [from before]
D          yes        9                 7                220               300              (300 - 220) / (9 - 7) = 40
E          yes        5                 3                200               360              80 [from before]
                                        total            1250 (1300)

Thus activity D has the minimum slope and thus is the cheapest way of reducing the total duration by
1 day. Reducing D by 1 day costs £40.
However, we cannot reduce D on its own as all paths are critical.
We have to consider combinations of reductions:
A and B:      extra cost = 60 + 70 = 130
D and E:      extra cost = 40 + 80 = 120
B, C and D:   extra cost = 70 + 50 + 40 = 160
A and E:      extra cost = 60 + 80 = 140
All of these combinations will reduce the total duration by 1 day.
The cheapest option is D and E.
Step 2: New network:

[Network with D reduced to 8 and E reduced to 4 days: A: EST 0, LST 0, EFT 4, LFT 4.
B: EST 0, LST 0, EFT 8, LFT 8. C: EST 4, LST 4, EFT 8, LFT 8.
D: EST 4, LST 4, EFT 12, LFT 12. E: EST 8, LST 8, EFT 12, LFT 12.]

Step 3: New critical path, total duration and cost
Critical paths = ACE, AD and BE
Total Cost = £1300 + £120 = £1420 and duration = 12 days
Third Crash: duration of C is now 4, D is 8 and E is 4, and all activities are critical.
Step 1:

Activity   critical   Normal duration   Crash duration   slope
A          yes        4                 3                60
B          yes        8                 5                70
C          yes        4                 3                50
D          yes        8                 7                40
E          yes        4                 3                80

We cannot reduce one activity on its own as all paths are critical. We have to consider combinations
of reductions:
A and B:      extra cost = 130
D and E:      extra cost = 120
B, C and D:   extra cost = 160
A and E:      extra cost = 140
All of these combinations will reduce the total duration by 1 day. The cheapest option is D and E.
Step 2: New network:

[Network with D reduced to 7 and E reduced to 3 days: A: EST 0, LST 0, EFT 4, LFT 4.
B: EST 0, LST 0, EFT 8, LFT 8. C: EST 4, LST 4, EFT 8, LFT 8.
D: EST 4, LST 4, EFT 11, LFT 11. E: EST 8, LST 8, EFT 11, LFT 11.]

Step 3: New critical path, total duration and cost
Critical paths = ACE, AD and BE
Total Cost = £1420 + £120 = £1540 and duration = 11 days

Fourth Crash: duration of C is now 4, D is 7 and E is 3, and all activities are critical.
Step 1:

Activity   critical   Normal duration   Crash duration   slope
A          yes        4                 3                60
B          yes        8                 5                70
C          yes        4                 3                50
D          yes        7                 7                40
E          yes        3                 3                80

We cannot reduce one activity on its own as all paths are critical, and we cannot reduce
activities D and E as they are at their 'crash duration'.
We have to consider combinations of reductions:
A and B:      extra cost = 130
D and E:      D and E cannot be reduced
B, C and D:   D cannot be reduced
A and E:      E cannot be reduced
The only possible option left is to reduce A and B. This will reduce the total duration by 1 day.
Step 2: New network:

[Network with A reduced to 3 and B reduced to 7 days: A: EST 0, LST 0, EFT 3, LFT 3.
B: EST 0, LST 0, EFT 7, LFT 7. C: EST 3, LST 3, EFT 7, LFT 7.
D: EST 3, LST 3, EFT 10, LFT 10. E: EST 7, LST 7, EFT 10, LFT 10.]

Step 3: New critical path, total duration and cost
Critical paths = ACE, AD and BE
Total Cost = £1540 + £130 = £1670 and duration = 10 days

Activity   critical   Normal duration   Crash duration   slope
A          yes        3                 3                60
B          yes        7                 5                70
C          yes        4                 3                50
D          yes        7                 7                40
E          yes        3                 3                80

The only activities that are not at their minimum duration are B and C.
As reducing B and C will not reduce the total duration, we cannot 'crash' any further.
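The whole crashing sequence can be sketched as a greedy loop in Python (an illustration of the step-by-step procedure above, with the three paths ACE, AD and BE hard-coded; note that in general this kind of step-by-step reduction is a heuristic):

```python
# Greedy crashing sketch: at each step, find the cheapest set of reducible
# activities that covers every critical path, and shorten each by one day.
from itertools import combinations

paths = {'ACE': ['A', 'C', 'E'], 'AD': ['A', 'D'], 'BE': ['B', 'E']}
normal = {'A': 4, 'B': 8, 'C': 5, 'D': 9, 'E': 5}
crash  = {'A': 3, 'B': 5, 'C': 3, 'D': 7, 'E': 3}
slope  = {'A': 60, 'B': 70, 'C': 50, 'D': 40, 'E': 80}

d = dict(normal)
total = 1250                 # total normal cost
history = []
while True:
    lengths = {p: sum(d[a] for a in acts) for p, acts in paths.items()}
    dur = max(lengths.values())
    critical = [p for p, L in lengths.items() if L == dur]
    reducible = [a for a in d if d[a] > crash[a]]
    # cheapest combination of reducible activities hitting every critical path
    options = []
    for r in range(1, len(reducible) + 1):
        for combo in combinations(reducible, r):
            if all(any(a in paths[p] for a in combo) for p in critical):
                options.append((sum(slope[a] for a in combo), combo))
    if not options:
        break
    cost, combo = min(options)
    for a in combo:
        d[a] -= 1
    total += cost
    history.append((dur - 1, total))

print(history)  # [(13, 1300), (12, 1420), (11, 1540), (10, 1670)]
```

The loop reproduces the four crashes above: C alone, then D and E twice, then A and B, stopping at 10 days and £1670.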
Example:
Klone Computers
Klone Computers is a small manufacturer of personal computers that is about to design, manufacture
and market the Klone 2000 palmbook computer. The company faces three major tasks in introducing
a new computer:
1. Manufacturing the new computer
2. Training staff and sales teams to operate the new computer
3. Advertising the new computer
When the proposed specifications for the new computer have been reviewed, the manufacturing
phase begins with the design of a prototype computer. Once the design is determined, the required
materials are purchased and the prototypes are manufactured. Prototype models are then tested and
analysed by staff who have completed the staff training course. Based on their input, refinements are
made to the prototype and an initial production run of computers is scheduled.
Staff training of company personnel begins once the computer is designed allowing staff to test the
prototypes once they have been manufactured. After the computer design has been revised based on
staff input, the sales force undergoes full-scale training.
Advertising is a two-phase process. First, a small group works closely with the design team so that
once a product design has been chosen, the marketing team can begin an initial pre-production
advertising campaign. Following this initial campaign and completion of the final design revisions, a
larger advertising campaign team is introduced to the special features of the computer, and a full-scale
advertising programme is launched.
The entire project is concluded when the initial production run is completed, the sales staff are trained,
and the advertising campaign is underway.
Klone has come up with the following information to assist the planning of the project .
Phase           Activity   Description                       Immediate predecessors   Estimated completion time (days)
Manufacturing   A          Prototype model design            None                     90
                B          Purchase of materials             A                        15
                C          Manufacture of prototype models   B                        5
                D          Revision of design                G                        20
                E          Initial production run            D                        21
Training        F          Staff training                    A                        25
                G          Staff input on prototype models   C, F                     14
                H          Sales training                    D                        28
Advertising     I          Pre-production advertising        A                        30
                J          Post-production advertising       D, I                     45
[Based on an example in Lawrence, John & Pasternack, Barry, Applied Management Science,
2nd edition, Wiley, 2002]
[Network after the forward pass: A: EST 0, EFT 90. B: EST 90, EFT 105. C: EST 105, EFT 110.
F: EST 90, EFT 115. I: EST 90, EFT 120. G: EST 115, EFT 129. D: EST 129, EFT 149.
E: EST 149, EFT 170. H: EST 149, EFT 177. J: EST 149, EFT 194.]
Forward pass:
Node
A: EST = 0, EFT = EST + duration of A = 0 + 90 = 90
B: EST = 90, EFT = 90 + duration of B = 90 + 15 = 105
C: EST = 105, EFT = 105 + duration of C = 105 + 5 = 110
F: EST = 90, EFT = 90 + duration of F = 90 + 25 = 115
G: G cannot start until C and F have finished. EST = 115, EFT = 115 + duration of G = 115 + 14 = 129
D: EST = 129, EFT = 129 + duration of D = 129 + 20 = 149
E: EST = 149, EFT = 149 + duration of E = 149 + 21 = 170
H: EST = 149, EFT = 149 + duration of H = 149 + 28 = 177
I: EST = 90, EFT = 90 + duration of I = 90 + 30 = 120
J: J cannot start until both D and I have finished. EST = 149, EFT = 149 + duration of J = 149 + 45 = 194
Backward pass:
Node
J: LFT = 194, LST = 194 - duration of J = 194 - 45 = 149
H: LFT = 194, LST = 194 - duration of H = 194 - 28 = 166
E: LFT = 194, LST = 194 - duration of E = 194 - 21 = 173
D: Via E: LFT = LST of E = 173. Via H: LFT = LST of H = 166. Via J: LFT = LST of J = 149.
   Thus LFT of D = 149, LST of D = 149 - duration of D = 149 - 20 = 129
G: LFT = LST of D = 129, LST of G = 129 - duration of G = 129 - 14 = 115
C: LFT = LST of G = 115, LST of C = 115 - duration of C = 115 - 5 = 110
B: LFT = LST of C = 110, LST of B = 110 - duration of B = 110 - 15 = 95
F: LFT = LST of G = 115, LST of F = 115 - duration of F = 115 - 25 = 90
I: LFT = LST of J = 149, LST of I = 149 - duration of I = 149 - 30 = 119
A: Via B: LFT = LST of B = 95. Via F: LFT = LST of F = 90. Via I: LFT = LST of I = 119.
   Thus LFT of A = 90, LST of A = 90 - duration of A = 90 - 90 = 0

The critical path is AFGDJ; the minimum completion time is 194 days.
Floats
The floats measure the amount of time that an activity can over-run without affecting the total
completion time.
Float = Latest finish time (LFT) - Earliest finish time (EFT)
Activity   EFT   LFT   float   Critical
A          90    90    0       yes
B          105   110   5
C          110   115   5
D          149   149   0       yes
E          170   194   24
F          115   115   0       yes
G          129   129   0       yes
H          177   194   17
I          120   149   29
J          194   194   0       yes
From the Gantt chart below, we can see the floats
[Gantt chart for Klone Computers: activities A to J plotted against days 0 to 200, each bar starting
at its EST.]
In the Gantt chart the activities are presented at their EST with the duration represented in black and
float in grey.
Seminar Question
A bank is planning to install a new computerised accounting system. The bank management has
determined the activities required to complete the project, the precedence relationships, and activity
time estimates (in weeks) as shown in the table below.
activity   description                         predecessor   Normal time   Crash time   Normal cost   Crash cost
A          Recruit staff                       -             9             7            4800          6300
B          Systems development                 -             11            9            9100          15500
C          Systems training                    A             7             5            3000          4000
D          Equipment training                  A             10            8            3600          5000
E          Manual system test                  B, C          1             1            0             0
F          Preliminary system changeover       B, C          5             3            1500          2000
G          Computer-personnel interface        D, E          6             5            1800          2000
H          Equipment modification              D, E          3             3            0             0
I          Equipment testing                   H             1             1            0             0
J          System debugging and installation   F, G          2             2            0             0
K          Equipment changeover                G, I          8             6            5000          7000

a. Determine the project completion time and the critical path.
b. Represent the project as a Gantt chart.
c. Crash the network to 26 weeks. Indicate how much this will cost the bank. Identify the new
   critical path.

[Example is based on an example from Taylor, Bernard, Introduction to Management Science,
7th edition, Prentice Hall, 2002]
How to use MS project 2010
Using the estate agents example we had before, we can use Microsoft Project to schedule the tasks.
The main difference we will notice at first is that MS Project uses a calendar. To make the comparison
easy with our manual example, let us start the project on 1st January 2013. We calculated that the
project would take at least 16 weeks. 16 weeks from Tuesday 1st January would take us to
Monday 22nd April.
The data entry looks like:
Note: only the first date, 1st January, was entered; the rest were calculated by the software.
Data entered:
As a network diagram this looks like:
Janet Geary 2012
Page 124
As a Gantt chart we get:

Using the example above:
Load up Microsoft Office Project.
Choose File, New, Blank Project.
To use the system with "auto scheduled tasks" we need to turn off manually scheduled tasks.
We do this by selecting File, Options, Schedule; under "New tasks created" choose "Auto Scheduled",
then OK.
To start to enter the project details:
The screen is split in two; you will need to use the left-hand side to add predecessors.
The first entry is Z: Start project, with 0 duration. It is here that we set the start date of 1st January.
The rest of the details can now be added:
Notes:
a) Only the first starting date is added by us.
b) The predecessors have to be added using the activity number given in the first column.
To get the different displays:
Using the View tab, choose the Gantt chart icon. You will have to increase the size of the right-hand
side of the screen to see the whole chart.
Using the View tab, choose the network icon. This looks like:
Seminar Activity
Use MS Project to determine the critical path and the project duration for:
a. Klone Computers
b. Bank Accounting System
Answers
Critical path is A, F, G, D, J; start date = Tuesday 1st January, end date = Friday 27th September
(38 weeks and 4 days = 38 × 5 + 4 = 194 working days).
Critical path = A, D, G, K; duration = 33 weeks.