Non-linear regression
• All regression analyses are for finding the relationship
between a dependent variable (y) and one or more
independent variables (x), by estimating the parameters that
define the relationship.
• Non-linear relationships whose parameters can be estimated
by linear regression after a transformation, e.g., $y = ax^b$, $y = ab^x$, $y = ae^{bx}$
• Non-linear relationships whose parameters can be estimated
by non-linear regression, e.g., $y = \dfrac{bx}{1 + ax}$, $y = \ldots\, e^{-\ldots(x - \ldots)}$
• Non-linear relationships that cannot be represented by a
function: loess
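The three forms that linear regression can handle each become a straight line after taking logarithms. A short derivation, assuming natural logarithms (any base would work) and positive y:

```latex
\begin{align*}
y = a x^{b}  &\;\Longrightarrow\; \ln y = \ln a + b \ln x \\
y = a b^{x}  &\;\Longrightarrow\; \ln y = \ln a + (\ln b)\, x \\
y = a e^{bx} &\;\Longrightarrow\; \ln y = \ln a + b x
\end{align*}
```

In each case the fitted intercept estimates ln a, and the fitted slope estimates b (or ln b for the second form).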
Xuhua Xia
Slide 1
Growth curve of E. coli
• A researcher wishes to estimate the growth curve of
E. coli. He put a very small number of E. coli cells
into a large flask with rich growth medium, and took
samples every half hour to estimate the density
(n/L).
• 14 data points over 7 hours were obtained.
• What is the instantaneous rate of growth (r)? What is
the initial density (N0)?
• As the flask is very large, he assumed that the growth
should be exponential, i.e., $y = a e^{bx}$. (Which
parameter corresponds to r and which to N0?)
SAS Program
/* Fictitious data */
data Ecoli;
Input Density @@;
Time = _N_;
lnD = log(Density);
datalines;
20.023 39.833 80.571 161.102 317.923 635.672 1284.544 2569.430
5082.654 10220.777 20673.873 40591.439 81374.642 163963.873
;
proc reg;
/* The var statement is necessary; otherwise Density is an unknown
   variable to the reg procedure and the first plot statement fails. */
var Density lnD Time;
model lnD = Time;
plot Density*Time/ symbol='.';
plot lnD*Time/ symbol='.';
run;
Run, and ask students to compute parameters a and b from the regression
output. What is the initial density?
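The log-linear fit can also be cross-checked outside SAS. The Python sketch below is an illustration only and not part of the original program; the helper name `fit_line` is made up here. It uses the fictitious densities from the data step, with Time = 1, 2, ..., 14 as produced by `Time = _N_`.

```python
import math

# Fictitious densities from the data step above; Time = _N_ = 1..14.
density = [20.023, 39.833, 80.571, 161.102, 317.923, 635.672, 1284.544,
           2569.430, 5082.654, 10220.777, 20673.873, 40591.439,
           81374.642, 163963.873]
time = list(range(1, len(density) + 1))


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# ln(y) = ln(a) + b * Time: the intercept estimates ln(a), the slope estimates b.
intercept, b = fit_line(time, [math.log(d) for d in density])
a = math.exp(intercept)
print(f"a = {a:.3f} (fitted density at Time = 0)")
print(f"b = {b:.4f} (per sampling interval)")
```

Because Time counts samples rather than hours, b is a rate per half-hour sampling interval; it would need to be doubled to express it per hour.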
Body weight of wild elephant
• A researcher wishes to estimate the body weight of wild
elephants.
• He measured the body weight of 13 captured elephants of
different sizes as well as a number of predictor variables, such
as leg length, trunk length, etc. Through stepwise regression,
he found that the inter-leg distance (shown in the figure) is the
best predictor of body weight.
• He learned from his former biology professor that the
allometric law governing the body weight (W) and the length
of a body part (L) states that
$W = aL^b$
• Use linear regression to find
parameters a and b.
SAS Program
/* Fictitious data */
data Elephant;
Input L W @@;
lnL = log(L);
lnW = log(W);
datalines;
0.3 1.657 0.4 2.500 0.5 4.680 0.6 7.075 0.7 10.070
0.8 11.988 0.9 14.836 1 18.318 1.1 23.496 1.2 27.897
1.3 36.796 1.4 44.611 1.5 50.183
;
proc reg;
var L W lnL lnW;
model lnW = lnL;
plot W*L/ symbol='.';
plot lnW*lnL/ symbol='.';
run;
Run and ask students to calculate the parameters a and b.
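As a check on the allometric fit, the same log-log regression can be sketched in Python. This is an illustration only, not the author's program; `fit_line` is a hypothetical helper, and the data are the fictitious elephant measurements from the data step.

```python
import math

# Fictitious (L, W) pairs from the Elephant data step.
length = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
weight = [1.657, 2.500, 4.680, 7.075, 10.070, 11.988, 14.836, 18.318,
          23.496, 27.897, 36.796, 44.611, 50.183]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# W = a * L**b  becomes  ln(W) = ln(a) + b * ln(L).
ln_l = [math.log(v) for v in length]
ln_w = [math.log(v) for v in weight]
intercept, b = fit_line(ln_l, ln_w)
a = math.exp(intercept)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Since ln(L) = 0 at L = 1, the fitted a is simply the predicted weight at unit length, which gives a quick sanity check against the observation at L = 1.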
DNA and protein gel electrophoresis
• How to estimate the molecular
mass of a protein?
– A ladder: proteins with known
molecular mass
– Deriving a calibration curve
relating molecular mass (M) to
migration distance (D): D =
F(M)
– Measure D and obtain M
• The calibration curve is
obtained by fitting a regression
model
Protein molecular mass
• The equation $D = a e^{bM}$ appears to describe the
relationship between D and M quite well.
This relationship is better than some
published relationships, e.g.,
$D = a - b \ln(M)$
• The data are my measurements of D and
M for a subset of secreted proteins from
the gastric pathogen Helicobacter pylori
(Bumann et al., 2002).
• Write a SAS program to use the data to
find parameters a and b.
Mass   D
5      14.5
10     12.6
20     9.4
30     7.1
40     5.3
50     3.9
60     3.05
70     2.3
80     1.75
Bumann, D., Aksu, S., Wendland, M., Janek, K., Zimny-Arndt, U., Sabarth, N., Meyer, T.F., and
Jungblut, P.R., 2002, Proteome analysis of secreted proteins of the gastric pathogen Helicobacter pylori.
Infect. Immun. 70: 3396-3403.
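The exercise asks for a SAS program; as a language-neutral check on whatever it returns, the fit of $D = a e^{bM}$ can be sketched in Python by regressing ln(D) on M. The helper `fit_line` is hypothetical, and the numbers are the Mass and D values tabulated on this slide.

```python
import math

# Mass (M) and migration distance (D) from the slide's table.
mass = [5, 10, 20, 30, 40, 50, 60, 70, 80]
dist = [14.5, 12.6, 9.4, 7.1, 5.3, 3.9, 3.05, 2.3, 1.75]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# D = a * exp(b * M)  becomes  ln(D) = ln(a) + b * M.
intercept, b = fit_line(mass, [math.log(d) for d in dist])
a = math.exp(intercept)
print(f"a = {a:.2f}, b = {b:.4f}")


def mass_from_distance(d):
    """Invert the calibration curve: M = ln(d / a) / b."""
    return math.log(d / a) / b


print(f"Estimated mass for D = 6.0: {mass_from_distance(6.0):.1f}")
```

The inverse function is the practical use of the calibration curve: measure D for an unknown protein and read off M.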
Area and Radius
What is the functional relationship
between the area and the radius?
Toxicity study: pesticide
[Figure: percentage killed (0 to 100) plotted against dosage (25 to 70)]
What transformation to use?
Probit and probit transformation
• Probit has two names/definitions, both
associated with the standard normal
distribution:
– the inverse cumulative distribution
function (CDF)
– the quantile function
• The CDF is denoted $\Phi(z)$, which is a
continuous, monotonically increasing
sigmoid function with range (0, 1),
e.g.,
$\Phi(z) = p$
$\Phi(-1.96) = 0.025 = 1 - \Phi(1.96)$
• The probit function gives the 'inverse'
computation, formally denoted $\Phi^{-1}(p)$,
i.e.,
$\mathrm{probit}(p) = \Phi^{-1}(p)$
$\mathrm{probit}(0.025) = -1.96 = -\mathrm{probit}(0.975)$
• $\Phi[\mathrm{probit}(p)] = p$, and $\mathrm{probit}[\Phi(z)] = z$.
[Figure: the standard normal CDF (0 to 1) plotted against z (-2.5 to 1.5)]
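The probit identities on this slide can be verified numerically. The sketch below uses Python's standard-library `statistics.NormalDist` rather than SAS's `probit` function; the wrapper name `probit` is chosen here only to mirror the slide.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def probit(p):
    """Quantile function (inverse CDF) of the standard normal, for 0 < p < 1."""
    return std_normal.inv_cdf(p)


z = probit(0.025)
print(f"probit(0.025) = {z:.2f}")
print(f"probit(0.975) = {probit(0.975):.2f}")

# Round trip: applying the CDF to a probit recovers the probability.
print(f"CDF(probit(0.3)) = {std_normal.cdf(probit(0.3)):.3f}")

# A percentage must be converted to a proportion before the transform.
print(f"probit(49.86 / 100) = {probit(49.86 / 100):.4f}")
```

`inv_cdf` is defined only for proportions strictly between 0 and 1, so observed values of exactly 0% or 100% cannot be transformed without some adjustment.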
SAS program
Data Pesticide;
Input Dosage Percent @@;
/* Why divide Percent by 100? */
NuProbit = probit(Percent/100);
Cards;
27 0.90 28 1.39 31 2.40 31 2.49 35 6.42 36 7.78 37 9.16
38 10.21 38 11.71 40 16.24 41 16.90 43 22.94 44 27.35
44 27.45 44 28.14 45 28.97 45 29.96 45 30.50 46 34.30
46 35.39 46 35.65 47 37.55 47 38.46 48 40.97 49 44.37
49 45.71 49 46.66 49 47.38 50 49.86 50 52.26 51 55.12
51 56.12 52 57.68 52 59.99 52 60.30 53 60.51 53 61.82
53 62.00 53 62.92 54 66.06 54 67.14 55 70.58 55 71.57
56 74.11 56 74.12 57 76.77 57 77.01 58 78.56 58 79.01
59 83.53 60 83.96 60 84.40 61 87.95 62 88.74 63 91.13
64 92.64 64 92.67 66 95.49 68 97.00 69 97.15
;
Proc reg;
/* CLM: confidence limits of the mean;
   CLI: confidence limits of an individual observation */
Model Percent = Dosage / R CLM alpha = 0.01 CLI;
Plot Percent*Dosage / symbol = '.';
Model NuProbit = Dosage / R CLM alpha = 0.01 CLI;
Plot NuProbit*Dosage / symbol = '.';
run;
Run and explain the graphic contrast between the original and the transformed DV.
Non-linear regression
• In rapidly replicating unicellular eukaryotes such as
yeast, highly expressed intron-containing genes
require more efficient splicing sites than lowly
expressed genes.
• Natural selection will operate on mutations at the
splicing sites to optimize splicing efficiency.
• Designate splicing efficiency as SE and gene
expression as GE.
• Certain biochemical reasoning suggests that SE and
GE will follow this relationship:
$$E[SE \mid GE] = \begin{cases} \alpha + \beta\,GE + \gamma\,GE^2 & \text{if } GE < GE_0 \\ c & \text{if } GE \ge GE_0 \end{cases}$$
GE    SE
1     0.46
2     0.47
3     0.57
4     0.61
5     0.62
6     0.68
7     0.69
8     0.78
9     0.7
10    0.74
11    0.77
12    0.78
13    0.74
13    0.8
15    0.8
16    0.78
Guess initial values
$$E[SE \mid GE] = \begin{cases} \alpha + \beta\,GE + \gamma\,GE^2 & \text{if } GE < GE_0 \\ c & \text{if } GE \ge GE_0 \end{cases}$$
When GE = 0, SE = $\alpha$, so $\alpha \approx 0.4$.
When GE increases from 2 to 8, SE increases from 0.47 to 0.75, so
$\beta \approx (0.75 - 0.47)/(8 - 2) \approx 0.047$.
With $\alpha \approx 0.4$ and $\beta \approx 0.047$, SE for GE = 12 should be
$0.4 + 0.047 \times 12 = 0.96$, but the actual SE is only about 0.77.
This must be due to the quadratic term $\gamma\,GE^2$, i.e.,
$(0.77 - 0.96) = \gamma \times 12^2$, so $\gamma \approx -0.19/144 \approx -0.0013$,
i.e., roughly $-0.002$ as a starting value.
[Figure: SE (0.4 to 0.85) plotted against GE (0 to 16)]
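A few more twists
$$E[SE \mid GE] = \begin{cases} \alpha + \beta\,GE + \gamma\,GE^2 & \text{if } GE < GE_0 \\ c & \text{if } GE \ge GE_0 \end{cases}$$
The continuity condition requires that
$$E[SE \mid GE_0] = \alpha + \beta\,GE_0 + \gamma\,GE_0^2$$
The smoothness condition requires that
$$\frac{\partial E[SE \mid GE_0]}{\partial GE_0} = \beta + 2\gamma\,GE_0 = 0$$
The two conditions imply that
$$GE_0 = -\frac{\beta}{2\gamma}, \qquad c = \alpha + \beta\,GE_0 + \gamma\,GE_0^2 = \alpha - \frac{\beta^2}{4\gamma}$$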
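The join point and plateau implied by the continuity and smoothness conditions can be computed directly from the three parameters. The Python sketch below is an illustration with made-up function names; it plugs in the starting values that the SAS program passes to proc nlin (alpha=.4, beta=.047, gamma=-.002), not fitted estimates.

```python
def join_point_and_plateau(alpha, beta, gamma):
    """Return (GE0, c) for the quadratic-with-plateau model.

    GE0 = -beta / (2 * gamma) is where the quadratic's slope reaches zero;
    c = alpha - beta**2 / (4 * gamma) is the quadratic's value at GE0.
    """
    ge0 = -beta / (2.0 * gamma)
    c = alpha - beta ** 2 / (4.0 * gamma)
    return ge0, c


def expected_se(ge, alpha, beta, gamma):
    """Piecewise mean: quadratic below GE0, constant plateau at or above it."""
    ge0, c = join_point_and_plateau(alpha, beta, gamma)
    if ge < ge0:
        return alpha + beta * ge + gamma * ge * ge
    return c


alpha, beta, gamma = 0.4, 0.047, -0.002  # starting values, not estimates
ge0, c = join_point_and_plateau(alpha, beta, gamma)
print(f"GE0 = {ge0:.2f}, plateau c = {c:.4f}")
print(f"E[SE | GE = 5]  = {expected_se(5, alpha, beta, gamma):.4f}")
print(f"E[SE | GE = 20] = {expected_se(20, alpha, beta, gamma):.4f}")
```

Because GE0 and c are functions of the other three parameters, the model has only three free parameters to estimate.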
SAS program
/* Fictitious data */
data Intron;
input SE GE @@;
datalines;
.46 1 .47 2 .57 3 .61 4 .62 5 .68 6 .69 7 .78 8 .70 9
.74 10 .77 11 .78 12 .74 13 .80 13 .80 15 .78 16
;
title 'Quadratic Model with Plateau';
proc nlin data=Intron;
/* Initial guesstimates */
parms alpha=.4 beta=.047 gamma=-.002;
GE0 = -.5*beta / gamma;
/* A conditional statement is needed to model the two segments */
if (GE < GE0) then Y = alpha + beta*GE + gamma*GE*GE;
else Y = alpha + beta*GE0 + gamma*GE0*GE0;
model SE = Y;
/* Write out GE0 and plateau. They could be computed from the estimated
   alpha, beta, and gamma, but the printed estimates contain rounding errors. */
if _obs_ =1 and _iter_ =. then do;
plateau =alpha + beta*GE0 + gamma*GE0*GE0;
put / GE0= plateau= ;
end;
/* Compute predicted values for plotting and save them to data set b */
output out=b predicted=SEp;
run;
Run and explain
Plot
proc sgplot data=b noautolegend;
yaxis label='Observed or Predicted';
refline 0.7775 / axis=y label="Plateau" labelpos=min;
refline 12.7476 / axis=x label="GE0" labelpos=min;
scatter y=SE x=GE;
series y=SEp x=GE;
run;
Fitting another function
/* Fictitious data */
data Intron;
input SE GE @@;
datalines;
.46 1 .47 2 .57 3 .61 4 .62 5 .68 6 .69 7 .78 8 .70 9
.74 10 .77 11 .78 12 .74 13 .80 13 .80 15 .78 16
;
title 'Quadratic Model with Plateau';
proc nlin data=Intron;
parms alpha=.4 beta=.8 gamma=1;
model SE = (alpha+beta*GE)/(1+gamma*GE);
output out=b predicted=SEp;
run;
proc sgplot data=b noautolegend;
yaxis label='Observed or Predicted';
scatter y=SE x=GE;
series y=SEp x=GE;
run;
Fitted function: $SE = \dfrac{\alpha + \beta\,GE}{1 + \gamma\,GE}$
Run and explain the output
Plot output
Robust regression
• LOWESS: robust local regression between Y and X, with
linear fitting
• LOESS: robust local regression between Y and one or more
Xs, with linear or quadratic fitting
• Used with relations that cannot be expressed in functional
forms
• SAS: proc loess
• Data:
– Data set 1: monthly averaged atmospheric pressure differences
between Easter Island and Darwin, Australia for a period of 168
months (NIST, 1998), suspected to exhibit 12-month (annual),
42-month (El Nino), and 25-month (Southern Oscillation) cycles (from
Robert Cohen of SAS Institute)
– Data set 2: two-channel microarray data; background correction and
two-channel loess normalization (from Wudu).
SmoothingParameter=0.2
[Figure: Pressure (0 to 18) plotted against Month (0 to 170)]
SmoothingParameter=0.05
[Figure: Pressure (0 to 18) plotted against Month (0 to 170)]
Step 1: characterize the most obvious
trend.
data ENSO;
input Pressure @@;
Month=_N_;
datalines;
12.9 11.3 10.6 11.2 10.9 7.5 7.7 11.7 12.9 14.3 10.9 13.7 17.1 14.0 15.3 8.5
5.7 5.5 7.6 8.6 7.3 7.6 12.7 11.0 12.7 12.9 13.0 10.9 10.4 10.2 8.0 10.9
13.6 10.5 9.2 12.4 12.7 13.3 10.1 7.8 4.8 3.0 2.5 6.3 9.7 11.6 8.6 12.4
10.5 13.3 10.4 8.1 3.7 10.7 5.1 10.4 10.9 11.7 11.4 13.7 14.1 14.0 12.5 6.3
9.6 11.7 5.0 10.8 12.7 10.8 11.8 12.6 15.7 12.6 14.8 7.8 7.1 11.2 8.1 6.4
5.2 12.0 10.2 12.7 10.2 14.7 12.2 7.1 5.7 6.7 3.9 8.5 8.3 10.8 16.7 12.6
12.5 12.5 9.8 7.2 4.1 10.6 10.1 10.1 11.9 13.6 16.3 17.6 15.5 16.0 15.2 11.2
14.3 14.5 8.5 12.0 12.7 11.3 14.5 15.1 10.4 11.5 13.4 7.5 0.6 0.3 5.5 5.0
4.6 8.2 9.9 9.2 12.5 10.9 9.9 8.9 7.6 9.5 8.4 10.7 13.6 13.7 13.7 16.5
16.8 17.1 15.4 9.5 6.1 10.1 9.3 5.3 11.2 16.6 15.6 12.0 11.5 8.6 13.8 8.7
8.6 8.6 8.7 12.8 13.2 14.0 13.4 14.8
;
ods output OutputStatistics=ENSOstats FitSummary=ENSOsummary;
proc loess data=ENSO;
model Pressure=Month / CLM
smooth = 0.02 to 0.2 by 0.01
dfmethod=exact;
run;
/* The output data set ENSOstats contains the variables
   SmoothingParameter, Month, DepVar, Pred, etc. */
symbol1 c=black i=join value=dot;
symbol2 c=black i=join value=none;
proc gplot data=ENSOstats;
by SmoothingParameter;
plot (DepVar Pred)*Month/overlay;
run;
Three criteria for choosing the smoothing parameter:
GCV: generalized cross-validation (Craven and Wahba 1979)
AIC: Akaike information criterion (Akaike 1973)
AICC1: bias-corrected AIC (Hurvich and Simonoff 1988)
Summary output: kd Tree, linear
N. Fitting Points       168     168     168     168     105     105     105      65      65
kd Tree Bucket Size       1       1       1       1       2       2       2       3       3
Deg. Local Polynom.       1       1       1       1       1       1       1       1       1
Smoothing Param.       0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09     0.1
Neighbor Points         3.0     5.0     6.0     8.0    10.0    11.0    13.0    15.0    16.0
RSS                     0.0   293.1   468.8   603.6   737.2   736.7   868.5  1057.6  1196.5
Trace[L]              168.0    72.3    49.1    37.1    29.6    29.4    24.7    21.1    18.8
GCV                       .     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.1
AICC                      .     3.1     2.9     2.9     2.9     2.9     3.0     3.1     3.2
AICC1                     .   567.8   496.1   487.4   495.1   494.5   507.2   530.0   544.1
Delta1                  0.0    82.3   110.1   124.4   132.6   132.8   138.6   142.6   145.6
Delta2                  0.0    78.6   108.3   123.5   130.0   130.2   136.9   140.3   143.8
Equivalent N. Param.  168.0    58.9    40.3    30.6    23.8    23.7    20.1    16.9    15.1
Lookup DF                 .    86.1   111.8   125.3   135.2   135.4   140.3   144.9   147.4
Residual SE               .     1.9     2.1     2.2     2.4     2.4     2.5     2.7     2.9
kd Tree Bucket Size: max(1, n*s/5)
Neighbor Points: int(n*s), where n is the number of observations (168 here; it equals 'N. Fitting Points' only in the first four columns) and s is 'Smoothing Param.'
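The two annotations on the summary output, max(1, n*s/5) for the kd tree bucket size and int(n*s) for the number of neighbor points, can be checked against the tabulated rows. In the Python sketch below, taking n as the 168 observations and truncating n*s/5 to an integer are assumptions made here because they reproduce the table; they are not statements about how proc loess is implemented.

```python
import math

N_OBS = 168  # months in the ENSO series (assumed to be the n in both formulas)
smoothing = [0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1]


def neighbor_points(n, s):
    """Points in each local neighborhood: int(n * s)."""
    return int(n * s)


def bucket_size(n, s):
    """kd tree bucket size: max(1, n * s / 5), truncated to an integer."""
    return max(1, math.floor(n * s / 5))


neighbors = [neighbor_points(N_OBS, s) for s in smoothing]
buckets = [bucket_size(N_OBS, s) for s in smoothing]
for s, k, b in zip(smoothing, neighbors, buckets):
    print(f"s = {s:<4}  neighbor points = {k:>2}  bucket size = {b}")
```

With s = 0.02 each local fit uses only 3 points, which is why that column shows RSS = 0 and undefined criteria: the local line passes through nearly every observation.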
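More on Output
$$\delta_1 = \mathrm{Trace}\left[(I - L)^T (I - L)\right]$$
$$\delta_2 = \mathrm{Trace}\left\{\left[(I - L)^T (I - L)\right]^2\right\}$$
$$\delta_1^2 / \delta_2 = \text{Lookup Degrees of Freedom}$$
$$\text{Residual SE} = \sqrt{RSS / \delta_1}$$
$$AIC = n \ln\left(\frac{RSS}{n}\right) + 2p$$
$$AICC1 = n \ln\left(\frac{RSS}{n}\right) + (\text{greater penalty than } 2p)$$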
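Two of these relationships can be checked against the ENSO summary output. The Python sketch below takes RSS, Delta1 and Delta2 from the column for smoothing parameter 0.03 (293.1, 82.3, 78.6) and recomputes the lookup degrees of freedom and the residual standard error. The square root in the residual SE is an assumption adopted here because it reproduces the tabulated value; the function names are illustrative.

```python
import math


def lookup_df(delta1, delta2):
    """Lookup degrees of freedom: delta1 squared divided by delta2."""
    return delta1 ** 2 / delta2


def residual_se(rss, delta1):
    """Residual standard error, assuming the form sqrt(RSS / delta1)."""
    return math.sqrt(rss / delta1)


# Column for smoothing parameter 0.03 in the ENSO summary output.
rss, delta1, delta2 = 293.1, 82.3, 78.6
df = lookup_df(delta1, delta2)
se = residual_se(rss, delta1)
print(f"Lookup DF   = {df:.1f}  (table: 86.1)")
print(f"Residual SE = {se:.1f}  (table: 1.9)")
```

Small discrepancies in the last digit are expected, since the table entries themselves are rounded to one decimal place.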
SmoothingParameter=0.05
[Figure: Pressure (0 to 18) plotted against Month (0 to 170)]
proc loess data=ENSO;
model Pressure=Month /
smooth = 0.01 to 0.2 by 0.01 dfmethod=exact degree = 2;
run;
SmoothingParameter=0.05
[Figure: Pressure (0 to 18) plotted against Month (0 to 170)]
99% Confidence limits
ods output OutputStatistics=ENSOstats FitSummary=ENSOsummary;
proc loess data=ENSO;
model Pressure=Month / r CLM smooth = 0.05 alpha=0.01 dfmethod=exact;
run;
symbol1 c=black i=none value=dot;
symbol2 c=black i=join value=none;
symbol3 c=red i=join value=none;
symbol4 c=red i=join value=none;
proc gplot data=ENSOstats;
format DepVar 4.0;
plot (DepVar Pred UpperCl LowerCL)*Month/ overlay hminor = 0 vminor = 0 vaxis = axis1 frame;
run;
99% confidence limits
[Figure: Pressure (0 to 20) plotted against Month (0 to 170)]
Step 2: Characterize other trends
/* modify data set ENSO to filter out the 12-month cycle */
data ENSO(drop=pi);
set ENSO;
pi = 4 * atan (1); /* or pi = 3.1415926 */
cos1 = cos(2*pi*Month/12);
sin1 = sin(2*pi*Month/12);
proc reg data=ENSO;
model Pressure = cos1 sin1;
/* r: residual, i.e., what is left after the 12-month cycle has been removed */
output out=ENSO1 r=FilteredPressure;
run;
proc print data=ENSO1;
var Month Pressure FilteredPressure;
run;
ods output OutputStatistics=ENSO1stats;
proc loess data=ENSO1;
model FilteredPressure=Month/
smooth = 0.12;
run;
proc gplot data=ENSO1stats;
plot (DepVar Pred)*Month/overlay;
run;
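The filtering step regresses Pressure on cos1 and sin1 and keeps the residuals. The Python sketch below illustrates the same idea on a synthetic series rather than the ENSO data. It relies on a simplification that is an assumption of this sketch, not of the SAS program: when the series length is a whole number of cycles (168 = 14 × 12), the intercept, cosine and sine regressors are mutually orthogonal, so the least-squares coefficients reduce to simple projections. The function name `remove_cycle` is made up.

```python
import math


def remove_cycle(series, period):
    """Remove a sinusoidal cycle of the given period (in months).

    Returns (a, b, residuals) for the fit mean + a*cos + b*sin, with
    Month = 1..n. Requires len(series) to be a whole multiple of period,
    which makes the projections below equal to the OLS estimates.
    """
    n = len(series)
    if period <= 2 or n % period != 0:
        raise ValueError("need period > 2 and a whole number of cycles")
    mean = sum(series) / n
    angles = [2.0 * math.pi * month / period for month in range(1, n + 1)]
    a = 2.0 / n * sum(y * math.cos(w) for y, w in zip(series, angles))
    b = 2.0 / n * sum(y * math.sin(w) for y, w in zip(series, angles))
    residuals = [y - (mean + a * math.cos(w) + b * math.sin(w))
                 for y, w in zip(series, angles)]
    return a, b, residuals


# Synthetic 168-month series containing only a 12-month cycle.
synthetic = [10.0 + 3.0 * math.cos(2.0 * math.pi * m / 12)
             + 2.0 * math.sin(2.0 * math.pi * m / 12) for m in range(1, 169)]
a, b, res = remove_cycle(synthetic, 12)
print(f"a = {a:.3f}, b = {b:.3f}, largest residual = {max(abs(r) for r in res):.2e}")
```

On real data the residuals are not zero; they carry whatever structure remains, which is what the subsequent loess fit is meant to reveal.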
Cos1 and Sin1
[Figure: Cos1 and Sin1 (-1 to 1) plotted against Month (0 to 180)]
Filtered Pressure
[Figure: Pressure and FilteredP plotted against Cos(2*Pi*Month/12)]
~42-month cycle
[Figure: Residual (-8 to 7) plotted against Month (0 to 170)]
Any other trend?
/* modify data set ENSO to filter out the 12-month and 42-month cycles */
data ENSO(drop=pi);
set ENSO;
pi = 4 * atan (1);
cos1 = cos(2*pi*Month/12);
sin1 = sin(2*pi*Month/12);
cos2 = cos(2*pi*Month/42);
sin2 = sin(2*pi*Month/42);
proc reg data=ENSO;
model Pressure = cos1 sin1 cos2 sin2;
output out=ENSO1 r=FilteredPressure;
run;
proc print data=ENSO1;
var Month Pressure FilteredPressure;
run;
ods output OutputStatistics=ENSO1stats;
proc loess data=ENSO1;
model FilteredPressure=Month/
smooth = 0.12;
run;
proc gplot data=ENSO1stats;
plot (DepVar Pred)*Month/overlay;
run;
~25-month cycle
[Figure: Residual (-7 to 6) plotted against Month (0 to 170)]
Microarray
• Diagnosis
• Signature genes
• Gene regulation pathway
[Figures: avian coronavirus; bovine coronavirus]
Spotted cDNA Microarray
cDNA “A” (Cy5 labeled) + cDNA “B” (Cy3 labeled)
Hybridization: DNA molecules are immobilized by high-speed robots on a solid surface such as glass.
Scanning: Laser 1 + Laser 2
Image Capture
Analysis
Microarray Data
Number  Array Row  Array Column  Row  Column  X Location  Y Location  ch1 Intensity  ch1 Background  Name
1       1          1             1    1       2110        8690        727.8          274.9352        61m12 Vacuolar ATP synthase subunit H (EC 3.6.3.14) (V-ATPa
2       1          1             1    2       2330        8690        620.5389       302.3055        61o12 Hypothetical protein ygaF
3       1          1             1    3       2560        8690        615.8312       253.7037        61a24 Protein C20orf24 homolog
4       1          1             1    4       2800        8690        670.1639       343.8796        61c24 unknown
5       1          1             1    5       3020        8680        1002.349       255.6019        61e24 Voltage-gated potassium channel beta-2 subunit (K(+)
6       1          1             1    6       3260        8690        790.1502       303.2963        61g24 hypothetical protein LOC550520 [Danio rerio]
7       1          1             1    7       3490        8690        877.9728       337.6204        61i24 Unclassifiable EST
8       1          1             1    8       3770        8670        822.9764       316.0185        61k24 PREDICTED: similar to adenylate kinase 5 isoform 1 [Dan
9       1          1             1    9       3970        8680        747.2612       289.8148        61m24 Unclassifiable EST
10      1          1             1    10      4200        8680        1418.041       354.5648        61o24 Unclassifiable EST
11      1          1             1    11      4430        8680        595.088        305.9445        84a12 RXRgamma?
12      1          1             1    12      4670        8680        545.9477       228.4444        84c12 buffer
13      1          1             1    13      4900        8680        524.8138       248.9259
14      1          1             1    14      5140        8680        556.7717       273.9722
15      1          1             1    15      5370        8680        697.3549       299.3148
16      1          1             1    16      5600        8680        633.922        260.9445
17      1          1             1    17      5840        8680        579.86         267.9908
18      1          1             1    18      6080        8680        672.7692       247.2037
19      1          1             1    19      6310        8680        738.2451       267.5092
20      1          1             2    1       2100        8980        837.2375       337.4908        60g12 unknown
21      1          1             2    2       2320        8970        782.3333       333.6759        60i12 unknown
22      1          1             2    3       2550        8970        906.7733       314.9352        60k12 60S ribosomal protein L6 (TAX-responsive enhancer elem
23      1          1             2    4       2790        8970        775.3107       329.3704        60m12 Unclassifiable EST
Basic Microarray Data Analysis
• Basic information
– Channel 1 background and foreground intensity: B1, F1
– Channel 2 background and foreground intensity: B2, F2
• Because two-channel microarrays typically use red and green
dyes for the two channels, we have: Rb, Rf, Gb, Gf
• Background correction
$R = R_f - R_b$
$G = G_f - G_b$
• Within-slide normalization
$M = \log_2(R/G) = \log_2 R - \log_2 G$
$A = \log_2\sqrt{RG} = (\log_2 R + \log_2 G)/2$
• Identification of differentially expressed genes
– Expectation of M is zero.
– A gene is overexpressed (or underexpressed)
if its M is significantly greater (or smaller) than 0
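The background correction and the M and A values can be sketched for a single spot as follows. This Python illustration uses a made-up function name and treats channel 1 as red and channel 2 as green, which matches how the SAS data step in this deck assigns R and G. The intensities are those of the first spot in the WuduPGF7 listing.

```python
import math


def ma_values(fg_red, bg_red, fg_green, bg_green):
    """Return (M, A) for one spot, or None if either corrected signal is not positive.

    R = Rf - Rb and G = Gf - Gb are the background-corrected intensities;
    M = log2(R / G) and A = log2(R * G) / 2.
    """
    r = fg_red - bg_red
    g = fg_green - bg_green
    if r <= 0 or g <= 0:
        return None  # logarithms undefined; corresponds to missing M and A
    m = math.log2(r / g)
    a = math.log2(r * g) / 2.0
    return m, a


# First spot: Ch1I, Ch1B, Ch2I, Ch2B.
m, a = ma_values(469.22876, 166.962967, 231.892853, 54.888889)
print(f"M = {m:.3f}, A = {a:.3f}")

# A spot whose background exceeds its foreground yields no M or A.
print(ma_values(100.0, 150.0, 200.0, 50.0))
```

Returning None mirrors the SAS program's choice of setting M and A to missing when R or G is not positive, so such spots drop out of the MA plot.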
Routine analysis
ODS GRAPHICS / ANTIALIASMAX=5800;
Data WuduPGF7;
Input X Y Ch1I Ch1B Ch2I Ch2B;
R=Ch1I - Ch1B;
G=Ch2I - Ch2B;
if R>0 and G>0 then do;
M=log2(R/G);
A=log2(R*G)/2;
end;
else do;
M=.;
A=.;
end;
cards;
2280 8520 469.22876 166.962967 231.892853 54.888889
2510 8500 443.9776 116.055557 295.28302 73.981483
..
;
proc sgplot;
scatter x=A y=M ;
run;
MA plot
[Figure: M (-3 to 6) plotted against A (3 to 15)]
Microarray data: Background correction
/*partial data from slide PGF7*/
Data WuduPGF7;
Input X Y Ch1I Ch1B Ch2I Ch2B;
cards;
2280 8520 469.22876 166.962967 231.892853 54.888889
2510 8500 443.9776 116.055557 295.28302 73.981483
2750 8500 421.165527 133.40741 227.327103 74.953705
2990 8510 401.165405 151.361115 223.877365 53.722221
...
;
proc g3d;
scatter X*Y=Ch1B;
run;
Run the WuduPGFA.sas file
proc loess data=WuduPGF7;
model Ch1B=X Y/smooth=0.01 to 0.04 by 0.01 dfmethod=exact degree=2;
run;
Two different kinds of CLM
ods output OutputStatistics=StatOut FitSummary=OutSummary;
proc loess data=WuduPGF7;
model M=A / r CLM smooth = 0.09 alpha=0.01 dfmethod=exact;
run;
symbol1 c=black i=none value=dot;
symbol2 c=black i=join value=none;
symbol3 c=red i=join value=none;
symbol4 c=red i=join value=none;
proc sort data=StatOut out = StatOut2;
by A;
run;
proc gplot data=StatOut2;
format DepVar 4.0;
plot (DepVar Pred UpperCl LowerCL)*A/ overlay;
run;
proc reg data=StatOut;
model Residual=A / R CLM alpha = 0.01 CLI ;
plot Residual *A / pred ;
run;
CLM of the mean
CLM of the observations
Channel-1 background: PGF Slide 7
[Figure: Ch1B plotted over X and Y]
Channel-2 background: PGF Slide 7
[Figure: Ch2B plotted over X and Y]
Channel-1 background: 1720p Slide 1
[Figure: Ch1B plotted over X and Y]
Channel-2 background: 1720p Slide 1
[Figure: Ch2B plotted over X and Y]
Summary output
Number of Observations                    5776          5776          5776
Number of Fitting Points                  1245           480           480
kd Tree Bucket Size                         11            23            34
Degree of Local Polynomials                  2             2             2
Smoothing Parameter                       0.01          0.02          0.03
Points in Local Neighborhood                57           115           173
Residual Sum of Squares              179605778     194043883     196730653
Trace[L]                             426.43824     194.50014     166.10841
GCV                                    6.27601       6.22871        6.2512
AICC                                  11.50467      11.49221      11.49548
AICC1                                    66457         66381         66400
Delta1                              5314.57324    5560.04084    5570.37194
Delta2                              5372.63763    5571.93922    5544.77939
Equivalent Number of Parameters      391.44973     173.04112     126.58875
Lookup Degrees of Freedom           5257.13638    5548.16787     5596.0826
Residual Standard Error              183.83405     186.81467       187.929
Which smoothing parameter is the best?
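One way to approach the question is to compare the three selection criteria across the candidate smoothing parameters, on the usual convention that a smaller GCV, AICC or AICC1 indicates a better trade-off between fit and complexity. The Python sketch below copies the three criteria from the summary output; the dictionary layout and function name are illustrative only.

```python
# GCV, AICC and AICC1 for each smoothing parameter, from the summary output.
criteria = {
    0.01: {"GCV": 6.27601, "AICC": 11.50467, "AICC1": 66457},
    0.02: {"GCV": 6.22871, "AICC": 11.49221, "AICC1": 66381},
    0.03: {"GCV": 6.2512, "AICC": 11.49548, "AICC1": 66400},
}


def best_smoothing(criterion):
    """Smoothing parameter with the smallest value of the given criterion."""
    return min(criteria, key=lambda s: criteria[s][criterion])


for name in ("GCV", "AICC", "AICC1"):
    print(f"{name}: smallest at smoothing parameter {best_smoothing(name)}")
```

The criteria need not agree in general, and the differences here are small, so inspecting the fitted surface remains worthwhile before settling on a value.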
Fit background surface
data PredGrid;
do Y = 8320 to 26230 by 230;
do X = 2220 to 20250 by 230;
output;
end;
end;
ods output ScoreResults=Ch1Bscore;
proc loess data=WuduPGF7;
model Ch1B=X Y/smooth=0.02 dfmethod=exact;
score data=PredGrid;
run;
proc g3d data=Ch1Bscore;
format X f4.0;
format Y f4.0;
format p_Ch1B f4.1;
plot X*Y=p_Ch1B/
tilt=60 rotate=80;
run;
Fitted channel-1 background: PGF Slide7
[Figure: Predicted Ch1B plotted over X and Y]
SAS statements for Channel-2
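/* With iterative reweighting */
/* Request the score table again; otherwise Ch1Bscore still holds the earlier fit */
ods output ScoreResults=Ch1Bscore;
proc loess data=WuduPGF7;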
model Ch1B=X Y/smooth=0.01 dfmethod=exact ITERATIONS=5;
score data=PredGrid;
run;
proc g3d data=Ch1Bscore;
format X f4.0;
format Y f4.0;
format p_Ch1B f4.1;
plot X*Y=p_Ch1B/
tilt=60 rotate=80;
run;
ods output OutputStatistics=Wudustats;
proc loess data=WuduPGF7;
model Ch1B=X Y/smooth=0.01 dfmethod=exact ITERATIONS=5;
run;
proc g3d;
scatter X*Y=Pred;
run;
Fitted channel-1 background: PGF Slide7 with ITERATIONS = 5
[Figures: two plots of Predicted Ch1B over X and Y]