2.4 19.4 R-tutorial to fit non-linear curve with 2.18.funding.sav data (1) LOWESS does not provide p-value or confidence intervals, therefore we cannot make any inference with the non-linear curve from LOWESS. R software allows you to fit non-linear curve with p-value and confidence intervals. R-tutorial to fit non-linear curve with 2.18.funding.sav data. Open R (download R from http://www.r-project.org if have not done so) Go to Packages, and then select Load packages, load Base and Foreign package We are going to use Dr. Harrell’s libraries, first load Hmisc and Design libraries from MENU Go to Packages, and then select Install packages from CRAN Load Hmisc and Design Delete downloaded files (y/N)? N #R is capital letter sensitive R-tutorial to fit non-linear curve with 2.18.funding.sav data (2) # There is another step in order to complete loading job. # at the command line “>” type the following # (you can cut and paste the following commands from here to R): library(Hmisc) library(Design) library(foreign) library(MASS) # See what are in there by typing library(help="Hmisc") library(help="Design") # Now look in Hmisc for a command to read SPSS file help.search("bootstrap") #general search # R can be used as calculator 1+2 [1] 3 # a に1を代入 a<-1 # b に2を代入 b<-2 a+b [1] 3 a*b [1] 2 b ** b [1] 4 R-tutorial to fit non-linear curve with 2.18.funding.sav data (3) #If you want to know instruction how to use spss.get to read SPSS file ?spss.get #Now let’s read in the dataset (first, you need to move the file to the #directory called c:\\temp, if you want to use the following command, #otherwise, specify the directory you stored the dataset.) support<-spss.get('c://Rdata//support.sav', lowernames=T ) # List name of variables names(support) [1] "age" "sex" "hospdead" "slos" "d.time" "dzgroup" [7] "dzclass" "num.co" "edu" "income" "scoma" "charges" [13] "totcst" "totmcst" "avtisst" "race" "meanbp" "hrt" [19] "pafi" "bili" "crea" "ph" "wblc" "resp" [25] "temp" "alb" "sod" "glucose" "bun" "urine" [31] "adlp" "adls" "pre.1" "pred.cat" "rand.num" "rrand.nu" [37] "filter.." "pre.2" "death" "years" "year3" "status3" #if you want to know the contents of the datasets, type name of the dataset support age 1 2 3 4 5 6 7 8 9 10 11 12 13 sex hospdead slos d.time 43.53998 female 0 115 63.66299 female 1 14 41.52197 male 1 21 89.58795 male 1 4 67.49097 male 0 24 72.83795 male 1 109 75.36798 male 1 13 37.71899 male 1 7 58.95999 female 0 26 25.48700 male 1 19 56.66498 male 1 14 38.88300 male 0 15 66.54596 male 1 45 2022 14 21 4 1951 109 13 7 1882 19 14 1807 45 dzgroup ARF/MOSF w/Sepsis ARF/MOSF w/Sepsis MOSF w/Malig ARF/MOSF w/Sepsis ARF/MOSF w/Sepsis ARF/MOSF w/Sepsis ARF/MOSF w/Sepsis ARF/MOSF w/Sepsis ARF/MOSF w/Sepsis ARF/MOSF w/Sepsis ARF/MOSF w/Sepsis ARF/MOSF w/Sepsis ARF/MOSF w/Sepsis R-tutorial to fit non-linear curve with 2.18.funding.sav data (4): R-output from describe function #describe is tremendously useful to see what is in the dataset describe(support) 42 Variables 1000 Observations ------------------------------------------------------------------------------------------------------------------age : Age n missing unique Mean .05 .10 .25 .50 .75 .90 .95 1000 0 970 62.47 33.76 38.91 51.81 64.90 74.50 81.87 86.00 lowest : 18.04 18.41 19.76 20.30 20.31, highest: 95.51 96.02 96.71 100.13 101.85 ------------------------------------------------------------------------------------------------------------------sex n missing unique 1000 0 2 female (438, 44%), male (562, 56%) ------------------------------------------------------------------------------------------------------------------hospdead n missing unique Sum Mean 1000 0 2 253 0.253 ------------------------------------------------------------------------------------------------------------------slos : Days from study enrollment to hospital discharge n missing unique Mean .05 .10 .25 .50 .75 .90 .95 1000 0 88 17.86 4 4 6 11 20 37 53 lowest : 3 4 5 6 7, highest: 145 164 202 236 241 ------------------------------------------------------------------------------------------------------------------- # To create nice graphs to describe data datadensity(support) #Scatter plot 2e+05 0e+00 totcst 4e+05 plot(totcst~alb, data=support) 1 2 3 4 5 #Scatter plot 9 10 8 7 log(totcst) 12 plot(log(totcst)~alb, data=support) 1 2 3 alb 4 5 # Save a new variable, ln.totcst into support data support$ln.totcst <- log(support$totcst + 1) #Type the next 2 lines before you do any graphical work dd <- datadist(support) options(datadist='dd') # Lowess curve plsmo(support$alb, support$ln.totcst, datadensity=T) # Fitting linear regression f.linear<-ols(ln.totcst~alb, data=support) # Show results of the regression f.linear ols(formula = ln.totcst ~ alb, data = support) Frequencies of Missing Values Due to Each Variable ln.totcst alb 105 378 n Model L.R. 565 66.88 Residuals: Total RCC cost Min 1Q -9.93285 -0.77134 Median 0.02745 d.f. 1 R2 0.1116 3Q 0.74721 Max 3.07144 R2 Sigma 1.153 Coefficients: Value Std. Error t Pr(>|t|) Intercept 11.1960 0.18971 59.017 0.000e+00 alb -0.5263 0.06258 -8.411 4.441e-16 Residual standard error: 1.153 on 563 degrees of freedom Adjusted R-Squared: 0.11 P-value # Plot result of the linear regression plot(f.linear, alb=NA) # Check normality of residuals hist(f.linear$residuals) #Fitting non-lineaer linear regression f.nonlinear<-ols(ln.totcst~rcs(alb,3), data=support) # Graph the non-linear regression plot(f.nonlinear, alb=NA) #Viewing the result of the regression anova(f.nonlinear) Overall effect of ALB p<0.0001 indicates significant effect by ALB Analysis of Variance Response: ln.totcst Factor d.f. Partial SS MS F alb 2 96.384725 48.192363 36.29 Nonlinear 1 2.322377 2.322377 1.75 REGRESSION 2 96.384725 48.192363 36.29 ERROR 562 746.263561 1.327871 P <.0001 0.1865 <.0001 P<0.05 indicates non-linearity P<0.05 indicates the model is useful Box-Cox transformation (1): Finding an optimal choice for power transformation in R A useful method so called Box-Cox transformation, will help you to identify the optimal power transformation to achieve normality of residuals. Now we find the best transformation for a regression of Crea= age f.linear.crea<-ols(crea~age, data=support) hist(f.linear.crea$residuals) anova(f.linear.crea) plot(f.linear.crea, age=NA) Analysis of Variance Table Response: crea Df Sum Sq Mean Sq F value Pr(>F) age 1 0.13 0.13 0.045 0.832 Residuals 995 2915.09 2.93 Box-Cox transformation (2): Finding an optimal choice for power transformation in R f.linear.crea<-lm(crea~age, data=support) bcout<-boxcox(f.linear.crea) bcout$x[bcout$y == max(bcout$y)] [1] -0.5050505 Indicates that you may try transfomation by Y-0.5 # Create a new variable support$crea05<-support$crea**(-0.5050505) Box-Cox transformation (3): Finding an optimal choice for power transformation in R # Re-do linear regression with transformed variable f.new<-ols(crea05~age, data=support) # Check residuals again hist(f.new$residuals) # Now you can see p-values anova(f.new) plot(f.new, age=NA) Analysis of Variance Factor d.f. Partial SS age 1 1.198304 REGRESSION 1 1.198304 ERROR 995 68.590890 Response: crea05 MS F P 1.19830356 17.38 <.0001 1.19830356 17.38 <.0001 0.06893557 Homework assignment 1 Using Support.sav and use R-software, answer the following questions. 1. Plot non-linear regression slope of log-transformed total cost by serum albumin level with 95% CI for the slope. (a) Does R2 improve from the analysis of 19.1.1? (b) Does the test of non-linearity for serum albumin level suggest non-linear effect of serum albumin level? (c) Is there association between transformed total cost and serum albumin level? Homework assignment 2 Using Support.sav and use R-software, answer the following questions. 2. Plot simple non-linear regression slope of log-transformed total cost by SUPPORT coma score. (a) Does R2 improve from the analysis of 19.2.1? (b) Does the test of non-linearity for SUPPORT coma score suggest non-linear effect of SUPPORT coma score? (c) Is there association between transformed total cost and SUPPORT coma score?