Symmetric regression models

Orthogonal (symmetric, total) regression

Which variable is y and which is x? Should height be used to predict weight, or weight to predict height? Should a father's height predict his child's height, or should the child's height be used to infer the father's?

    # heights {alr3}
    # Karl Pearson organized the collection of data on over 1100 families in England
    # in the period 1893 to 1898. This particular data set gives the heights
    # in inches of mothers and their daughters, with up to two daughters per mother.
    # All daughters are at least age 18, and all mothers are younger than 65.
    # Data were given in the source as a frequency table to the nearest inch.
    # Rounding error has been added to remove discreteness from graph.

    # Davis {car}
    # The Davis data frame has 200 rows and 5 columns. The subjects were men and
    # women engaged in regular exercise. There are some missing data.

    # father.son {UsingR}
    # 1078 measurements of a father's height and his son's height.
    library(UsingR)   # provides the father.son data frame

    summary(lm(fheight ~ sheight, father.son))   # father's height on son's height
    summary(lm(sheight ~ fheight, father.son))   # son's height on father's height
    o1 = lm(fheight ~ sheight, father.son)
    o2 = lm(sheight ~ fheight, father.son)

    plot(fheight ~ sheight, father.son)
    # Line from o1: predict fheight over a grid of sheight values
    s.prid = expand.grid(sheight = seq(50, 90, 1))
    s.prid$fheight = predict(o1, s.prid)
    # Line from o2: predict sheight from fheight, then draw it in the same (sheight, fheight) plane
    s.prid2 = expand.grid(fheight = s.prid$fheight)
    s.prid2$sheight = predict(o2, s.prid2)
    lines(fheight ~ sheight, s.prid, col = "red")
    lines(fheight ~ sheight, s.prid2, col = "blue")
    legend("topleft", c("fheight~sheight", "sheight~fheight"), lty = 1, col = c("red", "blue"))

Symmetric regression

If it is hard to decide which of x and y is the response variable, how should the functional relationship between them be modeled?
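The asymmetry can be made concrete with a small sketch that reuses the IQ (x) and P (y) vectors listed in this section. It fits y~x and x~y by ordinary least squares, rewrites the x~y fit as a slope dy/dx so both lines live in the same plane, and checks the standard identity that the ratio of the two slopes equals r^2. The variable names are hypothetical, chosen for this illustration only.

```r
# Sketch: the OLS lines y~x and x~y are different lines unless |r| = 1.
# Data: the IQ (x) and P (y) vectors used in this section.
x <- c(90, 92, 93, 95, 97, 98, 100)
y <- c(39, 42, 36, 45, 39, 45, 42)

fit_y_on_x <- lm(y ~ x)   # minimizes squared vertical distances
fit_x_on_y <- lm(x ~ y)   # minimizes squared horizontal distances

slope_y_on_x <- unname(coef(fit_y_on_x)[2])
# x = a + c*y rewritten as y = -a/c + (1/c)*x, so its slope in the (x, y) plane is 1/c
slope_x_on_y_dydx <- 1 / unname(coef(fit_x_on_y)[2])

r <- cor(x, y)
ratio <- slope_y_on_x / slope_x_on_y_dydx   # should equal r^2

cat("y~x slope:", slope_y_on_x, "\n")
cat("x~y slope (as dy/dx):", slope_x_on_y_dydx, "\n")
cat("ratio:", ratio, " r^2:", r^2, "\n")
```

Because |r| < 1 in practice, the y~x line is always flatter than the x~y line, so picking either one builds in a direction of prediction; the symmetric methods that follow avoid that choice.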
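If x and y play equal (symmetric) roles, neither y~x nor x~y is appropriate. A symmetric regression method should be used instead, for example major axis regression (orthogonal regression), reduced major axis regression (impartial regression), or bisector regression (double regression).

Pearson introduced major axis regression (also called orthogonal regression), a symmetric regression method that minimizes the sum of squared perpendicular distances from the points to the line.

Reduced major axis regression (impartial regression): the SD line, whose slope is sign(r)·s_y/s_x.

Other symmetric regression methods

Bisector regression (double regression): the line that bisects the angle between the y~x and x~y regression lines.

Bivariate normal distribution: regression and inverse regression

Code

    # Slopes (all expressed as dy/dx) of five lines through (mean(x), mean(y)).
    ol <- function(x, y) {
      s_xy = sum((x - mean(x)) * (y - mean(y)))
      s_xx = sum((x - mean(x))^2)
      s_yy = sum((y - mean(y))^2)
      b1 = s_xy / s_xx          # OLS slope of y~x
      b2 = s_yy / s_xy          # OLS fit of x~y, rewritten as a slope dy/dx
      r = cor(x, y)
      d = b2 - 1 / b1           # equals (s_yy - s_xx) / s_xy
      b_ol = (d + sign(r) * sqrt(4 + d^2)) / 2                          # major axis (orthogonal)
      b_sd = sign(r) * sqrt(b1 * b2)                                    # reduced major axis (SD line)
      b_bi = (b1 * b2 - 1 + sqrt((1 + b1^2) * (1 + b2^2))) / (b1 + b2)  # bisector
      B = list(b_xy = b1, b_yx = b2, b_ol = b_ol, b_sd = b_sd, b_bi = b_bi)
      return(B)
    }

Data

    IQ = c(90, 92, 93, 95, 97, 98, 100)
    P = c(39, 42, 36, 45, 39, 45, 42)

Analysis

    B = unlist(ol(IQ, P))
    A = mean(P) - B * mean(IQ)    # every line passes through (mean(IQ), mean(P))
    plot(IQ, P)
    lines(IQ, A[1] + B[1] * IQ)
    lines(IQ, A[2] + B[2] * IQ, col = "purple")
    lines(IQ, A[3] + B[3] * IQ, col = "red")
    lines(IQ, A[4] + B[4] * IQ, col = "blue")
    lines(IQ, A[5] + B[5] * IQ, col = "green")
    legend("topleft", c("y~x", "x~y", "ol", "sd", "bi"), lty = 1,
           col = c("black", "purple", "red", "blue", "green"))

Some special issues
1. Outliers / standardization
2. Centering

1. Outliers / standardization

Example 1.1. IQ scores of young people are normally distributed; those above the 99th percentile can be defined as intellectually gifted (outliers):

Example 1.2. Body mass index. An inappropriate definition of obesity: anyone whose weight exceeds the 95th percentile of the population is obese. This fails because people of different heights, sexes, and ages are not comparable; that is, the mean μ is a function of several factors. A simple but tedious fix is stratification: within each given group, find the distribution of W and define as obese those above C (for example, the 95th percentile of the standard normal distribution, once W is standardized within the group).

Another approach is to remove the effect of log(H) on log(W) (while controlling for sex G and age), i.e., to assume the regression model:

2. Centering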
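Centering subtracts the sample means, moving the centroid to the origin. Since every line in the analysis above is drawn through (mean(IQ), mean(P)), after centering each line is determined by its slope alone. The sketch below, again on the IQ/P data, checks that centering leaves the OLS slope unchanged and makes the intercept zero. As an independent check on the major axis slope, it also compares the closed form (d + sign(s_xy)·sqrt(d^2 + 4))/2, with d = (s_yy − s_xx)/s_xy, against the leading eigenvector of the covariance matrix. That comparison relies on the standard equivalence between the major axis and the first principal axis; the variable names are hypothetical.

```r
# Sketch: centering, and the major axis as the first principal axis.
x <- c(90, 92, 93, 95, 97, 98, 100)   # IQ data from this section
y <- c(39, 42, 36, 45, 39, 45, 42)    # P data from this section

# Centering: subtract the means so the centroid moves to (0, 0).
xc <- x - mean(x)
yc <- y - mean(y)

# OLS before and after centering: same slope, intercept becomes ~0.
fit_raw <- lm(y ~ x)
fit_c   <- lm(yc ~ xc)
intercept_c <- unname(coef(fit_c)[1])
slope_raw   <- unname(coef(fit_raw)[2])
slope_c     <- unname(coef(fit_c)[2])

# Major axis slope from the leading eigenvector of the covariance matrix
# (eigen() returns eigenvalues in decreasing order for symmetric input).
e <- eigen(cov(cbind(xc, yc)), symmetric = TRUE)
v <- e$vectors[, 1]
slope_eigen <- v[2] / v[1]

# Closed-form major axis slope from the centered sums of squares.
s_xx <- sum(xc^2)
s_yy <- sum(yc^2)
s_xy <- sum(xc * yc)
d <- (s_yy - s_xx) / s_xy
slope_closed <- (d + sign(s_xy) * sqrt(d^2 + 4)) / 2

cat("centered intercept:", intercept_c, "\n")
cat("OLS slope raw / centered:", slope_raw, slope_c, "\n")
cat("major axis slope (eigen / closed form):", slope_eigen, slope_closed, "\n")
```

The eigenvector route generalizes to more than two variables, while the closed form is convenient for a single (x, y) pair.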
