Chapter 4: Elements of Statistics
Moments – Linear Regression

4-6 Curve Fitting and Linear Regression
4-7 Correlation Between Two Sets of Data
Concepts

How close are the sample values to the underlying pdf values?

Practical curve fitting, using an NTC resistor to measure temperature.
Statistics Definition: The science of assembling, classifying, tabulating, and analyzing data or facts.
Descriptive statistics – collecting, grouping, and presenting data in a way that can be easily understood or assimilated.
Inductive statistics or statistical inference – using data to draw conclusions about, or estimate parameters of, the environment from which the data came.
Notes and figures are based on or taken from materials in the course textbook: Probabilistic Methods of Signal and System
Analysis (3rd ed.) by George R. Cooper and Clare D. McGillem; Oxford Press, 1999. ISBN: 0-19-512354-9.
B.J. Bazuin, Spring 2016
1 of 17
ECE 3800
4-6 Curve Fitting and Linear Regression
Fitting lines/curves to scatter plots.
The data are provided as (x,y) pairs. Is there a function that goes through all the points? Yes, if you are willing to use a polynomial of degree n-1 for n pairs (with distinct x values). But we usually want simple curves to represent the data, such as lines or parabolas, where
$$ y = a + b x \quad \text{or} \quad y = a + b x + c x^2 $$
To fit the curve we want to minimize the following function (the squared error):
$$ \sum_{i=1}^{n} \left[ y_i - \left( a + b \cdot x_i + c \cdot x_i^2 \right) \right]^2 $$
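As a quick illustration of the trade-off described above, the following sketch fits four made-up points two ways: with a cubic (degree n-1 = 3), which passes through every point, and with a line, which leaves a nonzero squared error. The data values and variable names are illustrative assumptions, not taken from the text.

```matlab
% Illustrative data (assumed values): n = 4 pairs with distinct x
x = [0; 1; 2; 3];
y = [1.0; 2.7; 3.1; 5.2];
n = numel(x);

% Degree n-1 polynomial: as many coefficients as points, so it interpolates
p_exact = polyfit(x, y, n-1);
err_exact = sum((y - polyval(p_exact, x)).^2);

% Degree 1 polynomial (a line): minimizes the squared error but does not
% pass through every point
p_line = polyfit(x, y, 1);
err_line = sum((y - polyval(p_line, x)).^2);

fprintf('Squared error, cubic: %g\n', err_exact);
fprintf('Squared error, line:  %g\n', err_line);
```

The cubic reproduces the data to within rounding, but its coefficients chase the noise; the line is the simpler description that the rest of this section develops.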
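For a linear regression (a line), we have
$$ \text{err} = \sum_{i=1}^{n} \left[ y_i - \left( a + b \cdot x_i \right) \right]^2 $$
To minimize for the values a and b, take the derivatives and set them equal to zero. Then solve for a and b:
$$ \frac{d(\text{err})}{da} = \sum_{i=1}^{n} -2 \left[ y_i - \left( a + b \cdot x_i \right) \right] = 0 $$
$$ \frac{d(\text{err})}{db} = \sum_{i=1}^{n} -2 \left[ y_i - \left( a + b \cdot x_i \right) \right] x_i = 0 $$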
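The two zero-derivative conditions are linear in a and b, so they can also be written as a 2-by-2 system and solved numerically. The sketch below does this for made-up data and cross-checks the result against polyfit; the data values and the names M, v, and ab are illustrative assumptions.

```matlab
% Illustrative data (assumed values)
x = [0; 1; 2; 3; 4];
y = [1.1; 2.9; 5.2; 6.8; 9.1];
n = numel(x);

% The two zero-derivative conditions as a linear system M*[a; b] = v:
%   n*a      + sum(x)*b    = sum(y)
%   sum(x)*a + sum(x.^2)*b = sum(x.*y)
M = [n,      sum(x);
     sum(x), sum(x.^2)];
v = [sum(y); sum(x.*y)];
ab = M \ v;
a = ab(1);
b = ab(2);

% Cross-check against polyfit (returns [slope, intercept] for degree 1)
p = polyfit(x, y, 1);
fprintf('Linear system: a = %g, b = %g\n', a, b);
fprintf('polyfit:       a = %g, b = %g\n', p(2), p(1));
```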
Solving results in
$$ a = \frac{\sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i}{n} $$
and
$$ b = \frac{n \sum_{i=1}^{n} y_i x_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} $$
What happens when we take expected values?
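Proof:
$$ \frac{d(\text{err})}{da} = \sum_{i=1}^{n} -2 \left[ y_i - \left( a + b \cdot x_i \right) \right] = 0 $$
$$ \sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \left( a + b \cdot x_i \right) = n \cdot a + b \sum_{i=1}^{n} x_i $$
$$ a = \frac{\sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i}{n} = \frac{1}{n} \sum_{i=1}^{n} y_i - b \cdot \frac{1}{n} \sum_{i=1}^{n} x_i $$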
Working on b
$$ \frac{d(\text{err})}{db} = \sum_{i=1}^{n} -2 \left[ y_i - \left( a + b \cdot x_i \right) \right] x_i = 0 $$
$$ \sum_{i=1}^{n} y_i x_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 $$
Substituting for the computation for a
$$ \sum_{i=1}^{n} y_i x_i = \frac{1}{n} \left[ \sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i \right] \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 $$
$$ \sum_{i=1}^{n} y_i x_i = \frac{1}{n} \sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i - b \cdot \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2 + b \sum_{i=1}^{n} x_i^2 = \frac{1}{n} \sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i + b \left[ \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2 \right] $$
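Isolating b
$$ b = \frac{\sum_{i=1}^{n} y_i x_i - \frac{1}{n} \sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2} = \frac{\frac{1}{n} \sum_{i=1}^{n} y_i x_i - \left( \frac{1}{n} \sum_{i=1}^{n} y_i \right) \left( \frac{1}{n} \sum_{i=1}^{n} x_i \right)}{\frac{1}{n} \sum_{i=1}^{n} x_i^2 - \left( \frac{1}{n} \sum_{i=1}^{n} x_i \right)^2} $$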
Now that b is determined from the data values, return to a.
Substituting the computation for b into a
$$ a = \frac{1}{n} \sum_{i=1}^{n} y_i - b \cdot \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1}{n} \sum_{i=1}^{n} y_i - \frac{\frac{1}{n} \sum_{i=1}^{n} y_i x_i - \left( \frac{1}{n} \sum_{i=1}^{n} y_i \right) \left( \frac{1}{n} \sum_{i=1}^{n} x_i \right)}{\frac{1}{n} \sum_{i=1}^{n} x_i^2 - \left( \frac{1}{n} \sum_{i=1}^{n} x_i \right)^2} \cdot \frac{1}{n} \sum_{i=1}^{n} x_i $$
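$$ a = \frac{\left( \frac{1}{n} \sum_{i=1}^{n} y_i \right) \left( \frac{1}{n} \sum_{i=1}^{n} x_i^2 \right) - \left( \frac{1}{n} \sum_{i=1}^{n} y_i \right) \left( \frac{1}{n} \sum_{i=1}^{n} x_i \right)^2 - \left( \frac{1}{n} \sum_{i=1}^{n} y_i x_i \right) \left( \frac{1}{n} \sum_{i=1}^{n} x_i \right) + \left( \frac{1}{n} \sum_{i=1}^{n} y_i \right) \left( \frac{1}{n} \sum_{i=1}^{n} x_i \right)^2}{\frac{1}{n} \sum_{i=1}^{n} x_i^2 - \left( \frac{1}{n} \sum_{i=1}^{n} x_i \right)^2} $$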
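Therefore a becomes
$$ a = \frac{\left( \frac{1}{n} \sum_{i=1}^{n} y_i \right) \left( \frac{1}{n} \sum_{i=1}^{n} x_i^2 \right) - \left( \frac{1}{n} \sum_{i=1}^{n} x_i \right) \left( \frac{1}{n} \sum_{i=1}^{n} y_i x_i \right)}{\frac{1}{n} \sum_{i=1}^{n} x_i^2 - \left( \frac{1}{n} \sum_{i=1}^{n} x_i \right)^2} $$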
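Alternate formulation using the computed sample means of x and y
$$ b = \frac{\frac{1}{n} \sum_{i=1}^{n} y_i x_i - \hat{Y} \cdot \hat{X}}{\frac{1}{n} \sum_{i=1}^{n} x_i^2 - \hat{X}^2} $$
$$ a = \frac{\hat{Y} \cdot \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \hat{X} \cdot \frac{1}{n} \sum_{i=1}^{n} y_i x_i}{\frac{1}{n} \sum_{i=1}^{n} x_i^2 - \hat{X}^2} $$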
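As a small hand check, take the three made-up pairs (0,1), (1,3), (2,4); these values are an illustrative assumption, not data from the text. The sum form and the sample-mean form of the solution give the same line:

```latex
\begin{align*}
n &= 3, \quad \sum x_i = 3, \quad \sum y_i = 8, \quad \sum y_i x_i = 11, \quad \sum x_i^2 = 5 \\
b &= \frac{n \sum y_i x_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}
   = \frac{3 \cdot 11 - 3 \cdot 8}{3 \cdot 5 - 3^2} = \frac{9}{6} = 1.5 \\
\hat{X} &= 1, \quad \hat{Y} = \tfrac{8}{3} \\
b &= \frac{\tfrac{11}{3} - \tfrac{8}{3} \cdot 1}{\tfrac{5}{3} - 1^2} = \frac{1}{2/3} = 1.5 \\
a &= \frac{\tfrac{8}{3} \cdot \tfrac{5}{3} - 1 \cdot \tfrac{11}{3}}{\tfrac{5}{3} - 1^2}
   = \frac{7/9}{2/3} = \frac{7}{6} \approx 1.167
\end{align*}
```

The same intercept follows from the first form, a = (8 - 1.5 * 3)/3 = 7/6.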
Linear regression example p. 180. Figure 4-5.
%%
% Figure 4-5: linear regression example
%
clear;
close all;
x=(0:0.5:10)';
% Reference line y = a + b*x
a=2; b=4;
yref = a+b*x;
% Random noise (standard deviation 5) added to the line
ydata = yref + 5*randn(size(x));
figure
plot(x,ydata,'x',x,yref)
legend('Data','Ref Line')
% Sample moments used by the closed-form estimates
meanx=mean(x);
meany=mean(ydata);
meanxsq = mean(x.^2);
meanysq = mean(ydata.^2);
meancorr = mean(x.*ydata);
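% Closed-form least-squares estimates of the intercept (a) and slope (b)
aest_equ = (meany*meanxsq-meanx*meancorr)/(meanxsq-meanx^2);
best_equ = (meancorr-meany*meanx)/(meanxsq-meanx^2);
yest_equ = aest_equ + best_equ*x;
% Same fit from polyfit; coefficients are returned highest power first
p=polyfit(x,ydata,1);
aest = p(2);
best = p(1);
yest = polyval(p,x);
figure
plot(x,ydata,'bo',x,yref,'k',x,yest,'r',x,yest_equ,'m');
legend('Data','Ref Line','Polyfit Line','Equ Line')
% Largest difference between the two fitted lines
fprintf('Computation error\n')
max(abs(yest_equ-yest))
% Normalized correlation of the raw data (means not removed)
rxy = meancorr/sqrt(meanxsq*meanysq)
% Correlation coefficient with the means removed, as defined in Section 4-7
rho_xy = (meancorr-meanx*meany)/sqrt((meanxsq-meanx^2)*(meanysq-meany^2))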
Non-linear estimation using a polynomial fit
Example: Taking the data from Table 4-3 on p. 180.
  i     T, x_i    V_B, y_i
  1       10        425
  2       20        400
  3       30        366
  4       40        345
  5       50        283
  6       60        298
  7       70        205
  8       80        189
  9       90         83
 10      100         22
[Figure 4-6: Breakdown Voltage versus Temperature (in C), showing the data points and the fitted curve V = 426.05 - 0.654015*x - 0.0333712*x^2.]
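% Table 4-3 data: temperature T (deg C) and breakdown voltage VB
x = (10:10:100)';
y = [425 400 366 345 283 298 205 189 83 22]';
% Quadratic least-squares fit; polyfit returns the highest power first
p=polyfit(x,y,2);
a = p(3);
b = p(2);
c = p(1);
% Fitted curve evaluated at the data temperatures
z = a + b*x + c*x.^2;
figure
plot(x,y,'bo',x,z,'r');
xlabel('Temperature (in C)')
ylabel('Breakdown Voltage')
title('Figure 4-6')
grid
% Annotate the plot with the fitted equation
atxt=sprintf('V=%g+%g*x+%g*x^2',a,b,c);
text(50,375,atxt);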
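To see why a parabola is used here rather than a line, the sketch below compares the summed squared error of a degree-1 and a degree-2 fit on the Table 4-3 data. The variable names p1, p2, sse1, and sse2 are illustrative choices, not from the text.

```matlab
% Table 4-3 data: temperature (deg C) and breakdown voltage
x = (10:10:100)';
y = [425; 400; 366; 345; 283; 298; 205; 189; 83; 22];

% Least-squares fits of degree 1 (line) and degree 2 (parabola)
p1 = polyfit(x, y, 1);
p2 = polyfit(x, y, 2);

% Sum of squared errors for each fit
sse1 = sum((y - polyval(p1, x)).^2);
sse2 = sum((y - polyval(p2, x)).^2);

fprintf('SSE, line:     %g\n', sse1);
fprintf('SSE, parabola: %g\n', sse2);
fprintf('Parabola: V = %g + %g*x + %g*x^2\n', p2(3), p2(2), p2(1));
```

Because the line is a special case of the parabola (c = 0), the quadratic fit can never have a larger squared error; how much smaller it is indicates whether the extra term is worthwhile.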
4-7 Correlation of Discrete Random Variables
For a single random variable, we have defined measures of the relationship of one sample or
event and the next. These are the means and moments and the variance.
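Mean or 1st Moment
$$ E[X] = \int_{-\infty}^{\infty} x \cdot f(x) \, dx $$
$$ E[X] = \int_{-\infty}^{\infty} x \cdot \sum_{i=1}^{n} \frac{1}{n} \delta(x - x_i) \, dx $$
$$ \mu_X = \bar{X} = E[X] = \frac{1}{n} \sum_{i=1}^{n} x_i $$
2nd Moment
$$ E[X^2] = \int_{-\infty}^{\infty} x^2 \cdot f(x) \, dx $$
$$ E[X^2] = \int_{-\infty}^{\infty} x^2 \cdot \sum_{i=1}^{n} \frac{1}{n} \delta(x - x_i) \, dx $$
$$ R_{XX} = E[X^2] = \frac{1}{n} \sum_{i=1}^{n} x_i^2 $$
2nd Central Moment
$$ E\left[ (X - \bar{X})^2 \right] = \int_{-\infty}^{\infty} (x - \bar{X})^2 \cdot f(x) \, dx $$
$$ E\left[ (X - \bar{X})^2 \right] = \int_{-\infty}^{\infty} (x - \bar{X})^2 \cdot \sum_{i=1}^{n} \frac{1}{n} \delta(x - x_i) \, dx = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{X})^2 $$
$$ E\left[ (X - \bar{X})^2 \right] = \frac{1}{n} \sum_{i=1}^{n} \left( x_i^2 - 2 x_i \bar{X} + \bar{X}^2 \right) = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - 2 \bar{X} \cdot \frac{1}{n} \sum_{i=1}^{n} x_i + \bar{X}^2 $$
$$ E\left[ (X - \bar{X})^2 \right] = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - 2 \bar{X}^2 + \bar{X}^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{X}^2 $$
$$ \sigma_X^2 = C_{XX} = E\left[ (X - \bar{X})^2 \right] = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \left( \frac{1}{n} \sum_{i=1}^{n} x_i \right)^2 $$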
The variance is a measure of the similarity of successive samples or events with each other. How close or correlated with the others would an event be expected to be?
$$ \sigma_X^2 = C_{XX} = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \left( \frac{1}{n} \sum_{i=1}^{n} x_i \right)^2 = R_{XX} - \mu_X^2 $$
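The moment relations above can be checked numerically. The sketch below computes the mean, 2nd moment, and 2nd central moment of a small made-up sample using the 1/n weighting of these notes and confirms that the variance equals R_XX minus the squared mean. The sample values and variable names are illustrative assumptions.

```matlab
% Illustrative samples (assumed values)
x = [2; 4; 4; 5; 7; 8];

mu_x = mean(x);              % mean, (1/n)*sum(x_i)
Rxx  = mean(x.^2);           % 2nd moment, (1/n)*sum(x_i^2)
Cxx  = mean((x - mu_x).^2);  % 2nd central moment (variance, 1/n form)

% Identity from the derivation: Cxx = Rxx - mu_x^2
fprintf('Cxx = %g, Rxx - mu_x^2 = %g\n', Cxx, Rxx - mu_x^2);

% var(x, 1) also normalizes by n; the default var(x) divides by n-1
fprintf('var(x,1) = %g, var(x) = %g\n', var(x, 1), var(x));
```

Note that the default var(x) uses the n-1 normalization, so it will not match the 1/n formulas in these notes unless the second argument is set to 1.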

Correlation between discrete random variables X and Y
For two sequences or paired groupings (x,y).
If we assume that every (x,y) pair is equally likely, the joint pmf has the same value for every pair. Repeated pairs simply sum the probability at the point. So, for correlation,
$$ E[X \cdot Y] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x \cdot y \cdot f(x, y) \, dx \, dy $$
For $(x_i, y_i)$ pairs, $i = 1$ to $n$, we can define a pmf for each sample point as 1/n. Therefore,
$$ E[X \cdot Y] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x \cdot y \cdot \sum_{i=1}^{n} \frac{1}{n} \delta(x - x_i) \, \delta(y - y_i) \, dx \, dy $$
$$ R_{XY} = E[X \cdot Y] = \frac{1}{n} \sum_{i=1}^{n} x_i \cdot y_i $$
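Defining the covariance (the cross correlation with the means removed)
$$ E\left[ (X - \bar{X})(Y - \bar{Y}) \right] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \bar{X})(y - \bar{Y}) \cdot f(x, y) \, dx \, dy $$
$$ E\left[ (X - \bar{X})(Y - \bar{Y}) \right] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \bar{X})(y - \bar{Y}) \cdot \sum_{i=1}^{n} \frac{1}{n} \delta(x - x_i) \, \delta(y - y_i) \, dx \, dy $$
$$ E\left[ (X - \bar{X})(Y - \bar{Y}) \right] = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y}) $$
$$ E\left[ (X - \bar{X})(Y - \bar{Y}) \right] = \frac{1}{n} \sum_{i=1}^{n} \left( x_i y_i - x_i \bar{Y} - y_i \bar{X} + \bar{X} \bar{Y} \right) $$
$$ E\left[ (X - \bar{X})(Y - \bar{Y}) \right] = \frac{1}{n} \sum_{i=1}^{n} x_i y_i - \frac{1}{n} \cdot n \cdot \bar{X} \bar{Y} - \frac{1}{n} \cdot n \cdot \bar{X} \bar{Y} + \frac{1}{n} \cdot n \cdot \bar{X} \bar{Y} $$
$$ C_{XY} = E\left[ (X - \bar{X})(Y - \bar{Y}) \right] = \frac{1}{n} \sum_{i=1}^{n} x_i y_i - \bar{X} \cdot \bar{Y} $$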
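The covariance result can be confirmed on a few made-up pairs by computing it two ways: directly from the mean-removed products, and from R_XY minus the product of the means. The data values and variable names below are illustrative assumptions.

```matlab
% Illustrative paired data (assumed values)
x = [1; 2; 3; 4; 5];
y = [2; 1; 4; 3; 6];

% Correlation R_XY = (1/n)*sum(x_i*y_i)
Rxy = mean(x .* y);

% Covariance computed directly from the mean-removed products
Cxy_direct = mean((x - mean(x)) .* (y - mean(y)));

% Covariance from the moment form: R_XY - Xbar*Ybar
Cxy_moment = Rxy - mean(x) * mean(y);

fprintf('Rxy = %g\n', Rxy);
fprintf('Cxy (direct) = %g, Cxy (moment form) = %g\n', Cxy_direct, Cxy_moment);
```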
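The Discrete Correlation coefficient
For two sequences or paired groupings (x,y).
If we assume that every (x,y) pair is equally likely, the pmf of the functions has the same value for every pair, 1/n. Repeated pairs simply sum the probability at the point. So,
$$ E\left[ \left( \frac{X - \bar{X}}{\sigma_X} \right) \left( \frac{Y - \bar{Y}}{\sigma_Y} \right) \right] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left( \frac{x - \bar{X}}{\sigma_X} \right) \left( \frac{y - \bar{Y}}{\sigma_Y} \right) f(x, y) \, dx \, dy $$
$$ E\left[ \left( \frac{X - \bar{X}}{\sigma_X} \right) \left( \frac{Y - \bar{Y}}{\sigma_Y} \right) \right] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left( \frac{x - \bar{X}}{\sigma_X} \right) \left( \frac{y - \bar{Y}}{\sigma_Y} \right) \sum_{i=1}^{n} \frac{1}{n} \delta(x - x_i) \, \delta(y - y_i) \, dx \, dy $$
$$ r = E\left[ \left( \frac{X - \bar{X}}{\sigma_X} \right) \left( \frac{Y - \bar{Y}}{\sigma_Y} \right) \right] = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - \bar{X}}{\sigma_X} \right) \left( \frac{y_i - \bar{Y}}{\sigma_Y} \right) $$
$$ E\left[ \left( \frac{X - \bar{X}}{\sigma_X} \right) \left( \frac{Y - \bar{Y}}{\sigma_Y} \right) \right] = \frac{1}{n} \cdot \frac{1}{\sigma_X \sigma_Y} \sum_{i=1}^{n} \left( x_i y_i - x_i \bar{Y} - y_i \bar{X} + \bar{X} \bar{Y} \right) $$
$$ E\left[ \left( \frac{X - \bar{X}}{\sigma_X} \right) \left( \frac{Y - \bar{Y}}{\sigma_Y} \right) \right] = \frac{1}{\sigma_X \sigma_Y} \left[ \frac{1}{n} \sum_{i=1}^{n} x_i y_i - \frac{1}{n} \cdot n \cdot \bar{X} \bar{Y} - \frac{1}{n} \cdot n \cdot \bar{X} \bar{Y} + \frac{1}{n} \cdot n \cdot \bar{X} \bar{Y} \right] $$
$$ r = \rho_{XY} = E\left[ \left( \frac{X - \bar{X}}{\sigma_X} \right) \left( \frac{Y - \bar{Y}}{\sigma_Y} \right) \right] = \frac{\frac{1}{n} \sum_{i=1}^{n} x_i y_i - \bar{X} \cdot \bar{Y}}{\sigma_X \cdot \sigma_Y} = \frac{C_{XY}}{\sigma_X \cdot \sigma_Y} $$
or making it fully data driven
$$ r = \rho_{XY} = \frac{\frac{1}{n} \sum_{i=1}^{n} x_i y_i - \left( \frac{1}{n} \sum_{i=1}^{n} x_i \right) \left( \frac{1}{n} \sum_{i=1}^{n} y_i \right)}{\sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2 - \left( \frac{1}{n} \sum_{i=1}^{n} x_i \right)^2} \cdot \sqrt{\frac{1}{n} \sum_{i=1}^{n} y_i^2 - \left( \frac{1}{n} \sum_{i=1}^{n} y_i \right)^2}} $$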
The text defines this as Pearson's r, the linear correlation coefficient between two sets of data.
See Wikipedia, "Pearson product-moment correlation coefficient":
http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
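A short numerical check of the fully data-driven form is sketched below: r is computed from the sample moments with 1/n weighting and compared with the built-in corrcoef. The data values and variable names are illustrative assumptions.

```matlab
% Illustrative paired data (assumed values)
x = [1; 2; 3; 4; 5];
y = [2; 1; 4; 3; 6];

% Covariance and standard deviations from the 1/n sample moments
Cxy  = mean(x .* y) - mean(x) * mean(y);
sigx = sqrt(mean(x.^2) - mean(x)^2);
sigy = sqrt(mean(y.^2) - mean(y)^2);

% Correlation coefficient r = C_XY / (sigma_X * sigma_Y)
r = Cxy / (sigx * sigy);

% Cross-check with corrcoef; each column of the input is one variable,
% and the off-diagonal entry of the result is the x-y coefficient
Rmat = corrcoef([x, y]);
fprintf('r (moments) = %.6f, r (corrcoef) = %.6f\n', r, Rmat(1, 2));
```

The two values agree even though corrcoef is built on n-1 normalized estimates, because the normalization factor cancels between the numerator and the denominator.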
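Based on the discrete terms, the linear estimation coefficients become
$$ a = \frac{\hat{Y} \cdot R_{XX} - \hat{X} \cdot R_{XY}}{R_{XX} - \hat{X}^2} = \frac{\mu_Y \cdot R_{XX} - \mu_X \cdot R_{XY}}{C_{XX}} $$
and
$$ b = \frac{R_{XY} - \hat{Y} \cdot \hat{X}}{R_{XX} - \hat{X}^2} = \frac{C_{XY}}{C_{XX}} $$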
Pavlovian conditioning for sampled data … always compute the following with data:
x: Mean, 2nd moment, variance ($\mu_X$, $R_{XX}$, and $\sigma_X^2$)
y: Mean, 2nd moment, variance ($\mu_Y$, $R_{YY}$, and $\sigma_Y^2$)
x and y: $R_{XY}$, $C_{XY}$, and $\rho_{XY}$
$$ \mu_X = E[X] = \frac{1}{n} \sum_{i=1}^{n} x_i $$
$$ E[X^2] = R_{XX} = \frac{1}{n} \sum_{i=1}^{n} x_i^2 $$
$$ \sigma_X^2 = C_{XX} = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \left( \frac{1}{n} \sum_{i=1}^{n} x_i \right)^2 = R_{XX} - \mu_X^2 $$
$$ R_{XY} = E[X \cdot Y] = \frac{1}{n} \sum_{i=1}^{n} x_i \cdot y_i $$
$$ C_{XY} = E\left[ (X - \bar{X})(Y - \bar{Y}) \right] = \frac{1}{n} \sum_{i=1}^{n} x_i \cdot y_i - \mu_X \cdot \mu_Y $$
$$ \rho_{XY} = \frac{C_{XY}}{\sigma_X \cdot \sigma_Y} $$
For more information:
Alberto Leon-Garcia, “Probability, Statistics, and Random Processes For Electrical Engineering,
3rd ed.”, Pearson Prentice Hall, Upper Saddle River, NJ, 2008. Chap. 8.
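The checklist above translates directly into a few lines of code. The sketch below gathers every listed quantity into one struct and then forms the regression coefficients a and b from those same moments. The data values and the struct field names are illustrative assumptions.

```matlab
% Illustrative paired data (assumed values)
x = [0; 1; 2; 3];
y = [1; 3; 2; 5];

% x: mean, 2nd moment, variance
s.mu_x  = mean(x);
s.Rxx   = mean(x.^2);
s.var_x = s.Rxx - s.mu_x^2;

% y: mean, 2nd moment, variance
s.mu_y  = mean(y);
s.Ryy   = mean(y.^2);
s.var_y = s.Ryy - s.mu_y^2;

% x and y: correlation, covariance, correlation coefficient
s.Rxy = mean(x .* y);
s.Cxy = s.Rxy - s.mu_x * s.mu_y;
s.rho = s.Cxy / sqrt(s.var_x * s.var_y);

% Linear estimation y = a + b*x from the same moments
s.b = s.Cxy / s.var_x;
s.a = (s.mu_y * s.Rxx - s.mu_x * s.Rxy) / s.var_x;

disp(s)
```

Once these moments are available, the regression line and the correlation coefficient need no further passes over the data.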
Practical Example: NTC Resistor Temperature Measurements Sunseeker
Based on Vishay BCComponents, Resistor Products Application Note, Document Number:
29053, 24 May, 2012.
Note: The B constant will be called a K constant in the following material.
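The resistance data follow an exponential curve with respect to temperature.
NTC resistors are typically referenced to 25° C, or 298.15 K.
For the 1st order approximation, assume
$$ R(T_2) = R(T_1) \cdot \exp\left[ K \cdot \left( \frac{1}{T_2} - \frac{1}{T_1} \right) \right] $$
where $T_1$ and $T_2$ are absolute temperatures in kelvin.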
Plotting the resistance versus temperature based on the data and some approximations, we have:
[Figure: Sunseeker NTC Resistor Temperature Curves. Resistance (ohms) versus Temperature (deg C), with three curves: Data Sheet, K25/85, and K25/60.]
See Excel Spread Sheet for values
Typically, data sheets provide K values based on 25° C and 85° C.
$$ K = \frac{\ln\left( \dfrac{R(T_2)}{R(T_1)} \right)}{\dfrac{1}{T_2} - \dfrac{1}{T_1}} = 4190 $$
For better accuracy within a critical region, the K can be computed to bound desired temperature
operating points.
For Sunseeker, key temperatures for battery operation are 45° C and 60° C. Therefore a K based
on 25° C and 60° C is sufficient for operation. This resulted in a portion of the spread sheet
analysis.
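The first-order model and the K calculation can be sketched as two small anonymous functions. In the example below, K = 4190 is the value quoted in these notes, while the 100 kΩ resistance at 25° C and the function names ntc_r and ntc_k are hypothetical choices made for illustration; they are not data sheet values.

```matlab
% First-order NTC model: R(T2) = R(T1)*exp(K*(1/T2 - 1/T1)), T in kelvin
ntc_r = @(R1, T1, K, T2) R1 .* exp(K .* (1 ./ T2 - 1 ./ T1));

% K from two (temperature, resistance) points
ntc_k = @(R1, T1, R2, T2) log(R2 ./ R1) ./ (1 ./ T2 - 1 ./ T1);

T25 = 25 + 273.15;   % reference temperature, K
T60 = 60 + 273.15;
T85 = 85 + 273.15;

R25 = 100e3;         % ASSUMED resistance at 25 C, ohms (hypothetical)
K   = 4190;          % K value quoted in the notes

% Predicted resistance at 60 C and 85 C from the 25 C reference
R60 = ntc_r(R25, T25, K, T60);
R85 = ntc_r(R25, T25, K, T85);

% Recover K from the 25 C and 85 C points; with model-generated points
% this returns the K that was used to create them
K_back = ntc_k(R25, T25, R85, T85);

fprintf('R60 = %.0f ohm, R85 = %.0f ohm, K = %.1f\n', R60, R85, K_back);
```

With measured or data sheet resistances in place of the model-generated ones, ntc_k evaluated at the 25/60 pair and at the 25/85 pair would generally give slightly different K values, which is the reason for choosing the pair that brackets the operating region.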
Designing with an NTC Thermistor.
EPCOS NTC Thermistor Application Notes, Feb. 2009.
A reference current or voltage is required. In this case a known voltage is provided to a resistor
divider and the output voltage is indicative of the temperature.
The resulting curve is highly non-linear due to the exponential nature of the device. To “linearize
the curve” and reduce the steepest part of the curve, place the NTC in parallel with a large
resistor.
For Sunseeker:
A 2.5 V reference (Vref) drives the resistor divider. The upper resistor is 100kΩ and the resistor in parallel with the NTC thermistor is 330kΩ. No inverting op-amp is used; the divider output connects directly to a 24-bit ADC.
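The divider behavior can be sketched numerically. Vref, the 100 kΩ upper resistor, and the 330 kΩ parallel resistor come from the description above. The thermistor parameters (100 kΩ at 25° C, K = 4190) and the placement of the NTC in the lower leg, with the output taken across it, are assumptions made for this illustration.

```matlab
Vref = 2.5;      % reference voltage, V (from the notes)
Rup  = 100e3;    % upper divider resistor, ohms (from the notes)
Rpar = 330e3;    % resistor in parallel with the NTC, ohms (from the notes)

% ASSUMED thermistor parameters (hypothetical; not given in the notes)
R25 = 100e3;     % resistance at 25 C, ohms
K   = 4190;      % K constant quoted earlier in the notes
T25 = 298.15;    % 25 C in kelvin

Tc   = (0:20:100)';   % temperatures, deg C
Rntc = R25 .* exp(K .* (1 ./ (Tc + 273.15) - 1 / T25));

% ASSUMED topology: the lower leg is the NTC in parallel with Rpar,
% and the output voltage is taken across the lower leg
Rlow = (Rntc .* Rpar) ./ (Rntc + Rpar);
Vout = Vref .* Rlow ./ (Rup + Rlow);

for k = 1:numel(Tc)
    fprintf('%5.0f C  Rntc = %9.0f ohm  Vout = %.4f V\n', Tc(k), Rntc(k), Vout(k));
end
```

The parallel resistor caps the lower-leg resistance at 330 kΩ, which flattens the cold end of the curve where the bare thermistor changes fastest.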
The resulting voltage to temperature curve is
[Figure: "NTC" divider output voltage versus temperature; vertical axis from 0 to 1.8, horizontal axis from 0 to 100.]
See the spread sheet for the expected ADC outputs and hexadecimal digital values.