Linear Models in R

Regression and Analysis of Variance
What do you know already?
Regression
 Continuous Dependent Variable
 Continuous Independent Variable
 Assumptions
 Normality
 Independence
 Constant variance
   N(0, 2)
 Linear or curvilinear
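
The regression assumptions above are usually checked on the residuals of a fitted model. The following is a minimal sketch only, using simulated data that is not from these slides; the names x, y, fit, res and sw are assumptions made for illustration.

```r
# Simulated data for illustration only (not from the slides).
set.seed(1)
x <- runif(40, 0, 10)                    # continuous independent variable
y <- 2 + 0.5 * x + rnorm(40, sd = 1)     # errors drawn from N(0, 1)

fit <- lm(y ~ x)
res <- residuals(fit)

# Normality: Shapiro-Wilk test on the residuals.
# A large p-value means no evidence against normality.
sw <- shapiro.test(res)
print(sw$p.value)

# Constant variance and linearity: the residuals vs fitted plot should show
# no pattern, and the normal Q-Q plot should be close to a straight line.
if (interactive()) {
  plot(fit, which = 1:2)
}
```

Independence cannot be tested from a plot in general; it comes mainly from how the data were collected.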
ANOVA
 Continuous Dependent Variable
 Discrete Independent Variable
 Assumptions
 Normality
 Independence
 Constant variance
   N(0, 2)
 Factor level variances are equal
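
For ANOVA, the equal-variance assumption can be examined level by level. The sketch below uses simulated data (not from these slides) and Bartlett's test as one possible check; the names group, y, level_var and bt are assumptions made for illustration.

```r
# Simulated data for illustration only: three factor levels, 10 values each.
set.seed(2)
group <- factor(rep(c("A", "B", "C"), each = 10))
y <- rnorm(30, mean = c(10, 12, 15)[as.integer(group)], sd = 2)

# Sample variance within each factor level.
level_var <- tapply(y, group, var)
print(level_var)

# Bartlett's test: the null hypothesis is that all level variances are equal.
bt <- bartlett.test(y ~ group)
print(bt$p.value)
```

Bartlett's test is itself sensitive to non-normality, so it is best read alongside a residual plot and not on its own.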
Linear Models
 Regression and ANOVA (and in fact ANCOVA) are all related
mathematically to one another.
 Exactly the same mathematics is used throughout.
 The only difference is the type (and number) of independent
variables that you are working with.
 The base assumptions are required for all linear models.
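
One way to see that the mathematics is shared is to fit all three kinds of model with lm() and look at the design matrix each one builds. This is a sketch under assumptions: the data are simulated and the names x_cont, x_fact and fit_* are invented for illustration.

```r
# Simulated data for illustration only (not from the slides).
set.seed(3)
n <- 30
x_cont <- runif(n, 0, 10)                              # continuous predictor
x_fact <- factor(rep(c("low", "mid", "high"), each = 10),
                 levels = c("low", "mid", "high"))     # discrete predictor
y <- 5 + 1.5 * x_cont + rnorm(n)

fit_reg    <- lm(y ~ x_cont)            # regression
fit_anova  <- lm(y ~ x_fact)            # one-way ANOVA
fit_ancova <- lm(y ~ x_cont + x_fact)   # ANCOVA

# Each model is y = X b + e; only the columns of X differ.
print(head(model.matrix(fit_reg), 3))    # intercept + numeric column
print(head(model.matrix(fit_anova), 3))  # intercept + 0/1 indicator columns

# The same least squares solution, b = (X'X)^(-1) X'y, gives the ANOVA fit.
X <- model.matrix(fit_anova)
beta_manual <- solve(crossprod(X), crossprod(X, y))
print(all.equal(as.numeric(beta_manual), as.numeric(coef(fit_anova))))
```

The factor is turned into indicator columns, after which the fitting step is identical to regression.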
What procedure are we going to use to
analyse linear model data?
Wagga House Prices
 A Wagga Wagga real estate agent wishes to use data from 30
recent house sales to predict future selling prices ($000)
from land area (m²).
 The data was collected from the internet from any real estate
listings that included the land size and the listing price.
 Most of the included listings were for 2 bedroom, 1
bathroom and 1 garage houses.
Call:
lm(formula = Price ~ Land, data = dat)

Residuals:
     Min       1Q   Median       3Q      Max
-169.486  -57.992    1.337   68.666  169.565

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  53.9420    56.8561   0.949    0.351
Land          0.6202     0.0931   6.662 3.14e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 92.94 on 28 degrees of freedom
Multiple R-squared:  0.6132,  Adjusted R-squared:  0.5994
F-statistic: 44.39 on 1 and 28 DF,  p-value: 3.141e-07
anova(dat.lm)
Analysis of Variance Table

Response: Price
          Df Sum Sq Mean Sq F value    Pr(>F)
Land       1 383397  383397  44.385 3.141e-07 ***
Residuals 28 241863    8638
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
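
The quantities in the two outputs are linked, and the links can be checked by hand. The sketch below copies the sums of squares, degrees of freedom and Land coefficients from the printed output and recomputes the summary statistics from them. The 600 m² block in the last step is a hypothetical value chosen for illustration, and the prediction is only sensible if 600 m² lies within the range of land areas in the data, which the slides do not state.

```r
# Values copied from the printed output above.
ss_land <- 383397     # Sum Sq for Land
ss_res  <- 241863     # Sum Sq for Residuals
df_res  <- 28         # residual degrees of freedom
b0      <- 53.9420    # intercept estimate
b1      <- 0.6202     # slope estimate for Land
se_b1   <- 0.0931     # standard error of the slope

ms_res <- ss_res / df_res                    # residual mean square, about 8638
r2     <- ss_land / (ss_land + ss_res)       # Multiple R-squared, about 0.6132
adj_r2 <- 1 - (1 - r2) * (29 / df_res)       # Adjusted R-squared, about 0.5994
rse    <- sqrt(ms_res)                       # residual standard error, about 92.94
f_stat <- ss_land / ms_res                   # F value, about 44.385
t_land <- b1 / se_b1                         # t value for Land, about 6.66
p_val  <- pf(f_stat, 1, df_res, lower.tail = FALSE)  # p-value of the F test

# With one predictor, the squared t value equals F (up to rounding).
print(c(t_squared = t_land^2, F = f_stat))

# Predicted price ($000) for a hypothetical 600 m^2 block.
pred_600 <- b0 + b1 * 600                    # 426.062
print(pred_600)
```

With a fitted model object, predict(dat.lm, newdata = data.frame(Land = 600)) would give the same point prediction without retyping the coefficients.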
Bottlenose Dolphins
 Neonate bottlenose dolphins produce many sounds just after
birth. Prior to suckling these sounds intensify, and then as the
neonate prepares to feed the sounds cease; this is called the
latency period (LP). It is thought that the LP is related to the
suckling frequency. A study was conducted to collect
information about the length of the LP and the suckling
frequency, with the aim of defining this relationship if it
exists.
Johne’s Disease
To eliminate Johne’s disease from an infected farm or to prevent
transmission, it is essential that susceptible animals are not
exposed to an environment contaminated with the bacterium. The
bacterium causing Johne’s disease is capable of persisting in the
environment for long periods due to the high lipid content of its cell
wall and the metabolic inactivity of the organism. Factors that could
influence the survival of the bacterium in the soil, including
temperature, pH, organic matter, exposure to ultraviolet light and
moisture content, were investigated under controlled conditions.
Johne’s Disease continued
This experiment involved trays of contaminated soil randomised
to 12 unique treatments, formed by changing the pH, UV light and
moisture content. They are labelled Treatments 1 to 12. The
treatments were allocated to the trays of soil in a completely
randomised fashion so that each treatment was replicated 5 times.
The ln(number of bacteria) remaining was the response, measured as an
indication of the effectiveness of the treatment. The aim of the
experiment is to determine the “best” treatment for removing the
bacterium from the soil.
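
The design described here (12 treatments, 5 replicate trays each, completely randomised) has a discrete independent variable, so one plausible analysis is a one-way ANOVA fitted with lm(). The sketch below only illustrates the layout and the calls: the response is simulated, not the experimental data, and the names soil, tray, treatment, ln_count and soil.lm are assumptions.

```r
# Layout: 12 treatments x 5 replicates = 60 trays.
set.seed(4)
treatment <- factor(rep(1:12, each = 5))
tray <- sample(60)    # random allocation of tray numbers to treatments

# Simulated response standing in for ln(number of organisms); NOT real data.
ln_count <- rnorm(60, mean = 8 - 0.2 * as.integer(treatment), sd = 0.5)
soil <- data.frame(tray, treatment, ln_count)

# One-way ANOVA through the linear model framework.
soil.lm <- lm(ln_count ~ treatment, data = soil)
print(anova(soil.lm))    # 11 df for treatment, 48 df for residuals

# Treatment means, smallest first (fewest organisms remaining).
trt_means <- sort(tapply(soil$ln_count, soil$treatment, mean))
print(trt_means)

# Pairwise comparisons with Tukey's HSD, which needs an aov fit.
tuk <- TukeyHSD(aov(ln_count ~ treatment, data = soil))
print(nrow(tuk$treatment))    # number of pairwise comparisons
```

Choosing the “best” treatment then means looking for the lowest mean that is significantly different from the others, not just the lowest observed mean.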