Analysis of RT distributions with R
Emil Ratko-Dehnert
WS 2010/2011
Session 11 – 01.02.2011
Last time...
• RT distributions in the field (very briefly)
• Survivor function
• Hazard and Cumulative Hazard function

[Figure: Comparing Ex-Gauss CDF and hazard function, for exg(mu=0, sigma=1, nu=1) and exg(mu=0, sigma=0.5, nu=1)]
III
ESTIMATION THEORY
Overview
• Introductory words on estimation theory (examples: X̄, LSM)
• Properties of good estimators
– Unbiasedness, Consistency, Efficiency, MLE
– Cutoffs and inverse transformation
• Parametric vs. non-parametric estimation
– CDF and quantile estimates
– Density estimators
– Hazard estimators
Estimation
• The goal of estimation is to determine the properties of the (RT) distribution from sampled data.
• These may be...
– gross properties such as the mean or skewness,
– parameters of a distribution (parametric),
– or the exact functional form of the distribution (non-parametric)
Complications
• Every estimator is itself a random variable, i.e. it has its own variance and distributional form (mostly unknown).
• Therefore, one should use estimators that are "good" in a well-defined sense.
Example: X̄
• The sample mean is an estimate of the parameter μ, the population mean:

\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i

• X̄ will change for different samples, even when those samples are taken from the same population.
• By the Central Limit Theorem, we know that the distribution associated with X̄ is approximately normal (X̄ ~ N(μ, σ²/n) when the Xᵢ are iid).
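A small R sketch of this sampling behaviour (the data are simulated here for illustration, not from the slides):

  set.seed(1)
  n     <- 50                                          # sample size
  means <- replicate(10000, mean(rexp(n, rate = 1)))   # 10000 sample means
  mean(means)                # close to the population mean mu = 1
  var(means)                 # close to sigma^2/n = 1/50
  hist(means, freq = FALSE)  # approximately normal, as the CLT predicts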
Least-squares minimization
• Estimate a function that follows the data (x, y) while minimizing the sum of squared residuals:

\min \sum_i \left( y_i - f(x_i) \right)^2 , \quad \text{where } f \text{ is a model function}

• In case f is linear in its parameters (a linear combination of basis functions), the problem can be solved analytically (without numerical approximation)
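A hedged R sketch of both routes; the toy data and the linear model are illustrative assumptions:

  set.seed(2)
  x <- 1:20
  y <- 2 + 0.5 * x + rnorm(20, sd = 1)   # linear "truth" plus noise

  fit <- lm(y ~ x)          # analytic solution: f is linear in its parameters
  coef(fit)                 # estimated intercept and slope

  sse <- function(p) sum((y - (p[1] + p[2] * x))^2)   # sum of squared residuals
  optim(c(0, 0), sse)$par   # numerical minimization gives (almost) the same result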
PROPERTIES OF ESTIMATORS
1) Unbiasedness
• An estimator α̂ for the parameter α is said to be unbiased if

E(\hat{\alpha}) = \alpha

meaning the expected value of α̂ is equal to the parameter it is meant to estimate (α).
1*) Asymptotic unbiasedness
• A sequence of estimators α̂ₙ is called asymptotically unbiased if its expectation converges to the "real" parameter α:

\lim_{n \to \infty} E(\hat{\alpha}_n) = \alpha

• Drawback: you might need a very large n to get into the vicinity of α
2) Consistency
• An estimator is said to be consistent when (for growing n) the probability that it differs from the estimated parameter by more than some ε shrinks to zero:

\lim_{n \to \infty} P\left( \left| \hat{\alpha} - \alpha \right| \geq \varepsilon \right) = 0
2) Consistency
• Interpretation: "The accuracy of the estimator α̂ can be improved by increasing the sample size n."
• For e.g. X̄, this property is guaranteed by the Law of Large Numbers
3) Efficiency
• Efficiency refers to the variance of an estimator.
• Because the estimate of a parameter is a RV, we would like the variance of that variable to be relatively small.
• If the estimator is unbiased, we can be fairly certain that it does not vary too much from the true value of the parameter.
3) Efficiency
• Def.: An unbiased estimator α̂ whose efficiency

e(\hat{\alpha}) = \frac{1 / I(\alpha)}{\mathrm{var}(\hat{\alpha})} = 1

for all parameters α is called efficient.
• Here I(α) is the Fisher information matrix of the sample.
4) Maximum likelihood
• Let x₁, x₂, ..., xₙ be the observed iid data and θ the parameter vector of the model.
• Then the joint density function

f(x_1, x_2, \ldots, x_n \mid \theta) = f(x_1 \mid \theta) \cdot f(x_2 \mid \theta) \cdots f(x_n \mid \theta)

describes the likelihood of observing the data under the assumption of θ
Log-likelihood and MLE
• The log-likelihood reads this the other way round, as a function of θ given the observed data:

\hat{l}(\theta \mid x_1, \ldots, x_n) = \frac{1}{n} \ln L(\theta \mid x_1, \ldots, x_n) = \frac{1}{n} \sum_{i=1}^{n} \ln f(x_i \mid \theta)

• The maximum likelihood estimator is defined as

\hat{\theta}_{mle} = \arg\max_{\theta} \, \hat{l}(\theta \mid x_1, \ldots, x_n)
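A minimal R sketch of the MLE idea, assuming a normal model purely for illustration (the data are simulated here; optimizing log(σ) keeps the standard deviation positive):

  set.seed(3)
  x <- rnorm(200, mean = 450, sd = 80)   # stand-in for RT-like data

  negll <- function(p) -sum(dnorm(x, mean = p[1], sd = exp(p[2]), log = TRUE))
  fit <- optim(c(mean(x), log(sd(x))), negll)    # minimize the negative log-likelihood
  c(mu = fit$par[1], sigma = exp(fit$par[2]))    # MLE of (mu, sigma)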
Rationale and constraints
• "If we observe a particular sample, then it must have been highly probable, and so we should choose the estimates of the population parameters that make this probability as large as possible."
• But: one requires a parametric model and the associated likelihood function(!)
Remarks on estimators so far
• Sample mean X̄ and sample standard deviation s:
– are not robust to "outliers"
– RT distributions are right-skewed, so both are inappropriate for statistical testing (power issues)
• Median Md or interquartile range (IQR := Q3 − Q1):
– are robust, but their standard errors are larger than those of X̄ and s
– also require larger sample sizes to approximate normality
– and Md is biased when estimating a skewed distribution
Cutoffs
• Monte Carlo simulations paper (Ratcliff, 1993)
• Investigated methods to reduce or eliminate effects of outliers on X̄, s:
– Md → lower power in any case
– fixed cutoffs → no hard rule
– cutoffs at 3 sd from the mean → disastrous effects on power
– cutoffs in general → can introduce asymmetric biases
Inverse transformation (I)
• The next most powerful alternative for minimizing effects of outliers: transform RT to speed
• Ratcliff doubled the slowest RT, and the standard deviation of X = 1/RT changed only from s = 0.466×10⁻³/ms to 0.494×10⁻³/ms
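A quick R illustration of this robustness (simulated RTs, not Ratcliff's data):

  set.seed(4)
  rt  <- rexp(100, rate = 1/300) + 200    # right-skewed RT-like data (ms)
  rt2 <- rt
  rt2[which.max(rt2)] <- 2 * max(rt2)     # double the slowest RT

  c(sd(rt),   sd(rt2))     # the raw RT scale changes noticeably
  c(sd(1/rt), sd(1/rt2))   # the speed scale 1/RT barely changes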
Summary
• Try to avoid using cutoffs
• If you do, always present the results for both complete and trimmed data
• Rather, use robust methods (e.g. the inverse transformation) instead
PARAMETRIC VS. NON-PARAMETRIC ESTIMATION
Reflections
• Least-squares minimization and MLE ensure that estimates are unbiased, consistent, efficient, ...
• But:
– the preconditions for these methods are rarely met (normality, iid, ...)
– furthermore, when estimating e.g. density functions, the extent of bias depends on the true form of the population distribution (which is unknown).
Options
• When a model is specified, then one can use parametric procedures
• Else one has to use non-parametric procedures
Estimating the CDF
• We already know that the CDF can be estimated by the ECDF:

\hat{F}_n(t) = \frac{\#\{ x_i \leq t \}}{n} = \frac{1}{n} \sum_{i=1}^{n} 1_{x_i \leq t}

• This also gives us an estimator for the survivor function(!)
• The ECDF is unbiased and asymptotically normal
Quantiles
• You can easily estimate the pth quantile
• For example, the median (p = 0.5)
• Start by sorting your data from smallest to largest
• If your data RT₁, ..., RTₙ have e.g. n = 45 values, then n·p = 22.5, and the estimate interpolates between the 22nd and 23rd sorted values: (RT(22) + RT(23))/2
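In R, with simulated data standing in for the RTs (note that quantile() offers several interpolation conventions via its type argument, so the values can differ slightly):

  set.seed(5)
  rt  <- rexp(45, rate = 1/300) + 200
  srt <- sort(rt)
  (srt[22] + srt[23]) / 2     # the slide's interpolation for p = 0.5, n = 45
  median(rt)                  # R's default convention (the 23rd sorted value here)
  quantile(rt, probs = 0.5)   # the general quantile function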
DENSITY ESTIMATORS
Classes of density estimators
• General weight-function estimators
– e.g. the histogram or weighted histogram
– are easy to compute, but tend to be inaccurate
• Kernel estimators
– are more difficult to understand and need more computation
Warning
• "It is important to realize that a density estimate is probably biased, the degree of bias will be unobservable, and the bias will not necessarily get smaller with increase in sample size." (Van Zandt, 2002)
• Also: never use density estimates for your parameter estimation (i.e. model fitting)!
Histogram
\hat{f}(t) = \frac{\text{number of observations in bin } i}{n h_i}, \quad t_{i-1} < t \leq t_i

• The larger n is, the smaller the bin widths hᵢ have to be. The hᵢ can be fixed or variable.
• But: there is no algorithm to adjust hᵢ for increasing n (accuracy need not improve with greater n!)
→ inappropriate for serious RT analysis
[Figure: "Histogram of eruptions" example, with Density on the y-axis]
Kernel estimators
• Idea: at every point, the kernel estimator is a weighted average of all of the observations in the sample:

\hat{f}(t) = \frac{1}{n h_n} \sum_{i=1}^{n} K\!\left( \frac{T_i - t}{h_n} \right)

• hₙ is again the bandwidth, and the larger hₙ, the smoother the estimate will be. Tᵢ is the i-th observation.
• The kernel K itself is a density function, which integrates to 1 over x and is typically symmetric
Ex: Gaussian Kernel

K(x) = \frac{1}{\sqrt{2\pi}} \, e^{-x^2 / 2}
Gaussian Kernel
• Is a generally good estimator of RT densities
• Especially for larger samples (n > 500)
• As the kernel is continuous, f̂(t) is, too
• Again: even with large samples n, the Gaussian kernel estimate might be biased (Van Zandt, 2000)
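To make the formula concrete, a hand-rolled Gaussian kernel estimate in R (simulated data; the evaluation grid and the rule-of-thumb bandwidth bw.nrd0, which anticipates the next slide, are illustrative choices):

  set.seed(6)
  x <- rnorm(100)
  h <- bw.nrd0(x)                     # rule-of-thumb bandwidth (next slide)
  t <- seq(-4, 4, length.out = 200)   # evaluation grid
  fhat <- sapply(t, function(tt) mean(dnorm((x - tt) / h)) / h)
  plot(t, fhat, type = "l")           # compare: lines(density(x), lty = 2)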
Choosing hₙ
• Silverman (1986) proposed a method to choose the smoothing parameter hₙ of the kernel estimate as a function of the spread of the data:

h_n = 0.9 \, \min\!\left( s, \frac{\mathrm{IQR}}{1.349} \right) n^{-1/5}

• Here s is the sample standard deviation and IQR is the interquartile range
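The rule written out in R, next to the built-in bw.nrd0 (which uses IQR/1.34, so the two values differ slightly; data simulated here):

  set.seed(7)
  x  <- rexp(200, rate = 1/300) + 200
  hn <- 0.9 * min(sd(x), IQR(x) / 1.349) * length(x)^(-1/5)
  hn
  bw.nrd0(x)   # R's version of this rule of thumb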
HAZARD ESTIMATES
Hazard function
• Since the hazard function was defined to be

h(t) = \frac{f(t)}{1 - F(t)}

one might try to estimate it by

\hat{h}(t) = \frac{\hat{f}(t)}{1 - \hat{F}(t)}

with the kernel estimates

\hat{f}(t) = \frac{1}{n h_n} \sum_{i=1}^{n} K\!\left( \frac{t - T_i}{h_n} \right), \qquad \hat{F}(t) = \frac{1}{n} \sum_{i=1}^{n} \tilde{K}\!\left( \frac{t - T_i}{h_n} \right)

where K̃ is the integrated kernel (defined on the next slides).
Problems
• Since the denominator 1 − F̂(t) goes to 0 in the tail, errors inflate!
• Also, the sparseness of data from the tail of the distribution in the sample generally makes hazard functions difficult to observe
Epanechnikov kernel

K(x) = \begin{cases} \frac{3}{4\sqrt{5}} \left( 1 - \frac{x^2}{5} \right) & \text{if } |x| \leq \sqrt{5} \\ 0 & \text{else} \end{cases}

\tilde{K}(x) = \int_{-\sqrt{5}}^{x} K(u) \, du = \frac{1}{2} + \frac{3}{4\sqrt{5}} \left( x - \frac{x^3}{15} \right)
Epanechnikov estimator
• By using the Silverman smoothing estimate for hₙ, one avoids discontinuities at the tail of the data
• Eventually, though, ĥ(t) will show a tremendous acceleration towards ∞
• All in all, the Epanechnikov estimator gives the most accurate estimates over the greatest range, and one easily sees where it becomes inaccurate
IV
MODEL FITTING
AND NOW TO R
ecdf(x)
• The ecdf(x) command generates a step function – the empirical cumulative distribution function
• One can plot the result via plot.ecdf(x) or plot(ecdf(x))
• One can access the data via knots(ecdf(x)) or summary(ecdf(x))
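A short usage sketch (data simulated here):

  set.seed(8)
  x  <- rexp(100, rate = 1/300) + 200
  Fn <- ecdf(x)    # Fn is itself a function
  Fn(500)          # estimated P(X <= 500)
  1 - Fn(500)      # estimated survivor function at t = 500
  plot(Fn)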
hist(x)
hist(x, breaks = b, freq = FALSE)
• breaks can be either
– a vector giving the breakpoints between the bins
– a single number, giving the number of bins
– a function to compute the number of cells
– a character string naming a break algorithm ("Sturges", "Scott", "FD")
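For example (simulated data):

  set.seed(9)
  x <- rexp(100, rate = 1/300) + 200
  hist(x, breaks = 20, freq = FALSE)     # roughly 20 bins, on the density scale
  hist(x, breaks = "FD", freq = FALSE)   # Freedman-Diaconis break algorithm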
density(x)
density(x, bw = "nrd0", adjust = 1, kernel = "gaussian")
• This computes kernel density estimates and can also be used for plotting (plot(density(x)))
• bw = smoothing bandwidth ("nrd0" = Silverman's rule of thumb)
• adjust = scales the bandwidth relative to bw (e.g. adjust = 0.5 halves it)
• kernel = "gaussian", "rectangular", "epanechnikov", ...
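For example (simulated data):

  set.seed(10)
  x <- rexp(500, rate = 1/300) + 200
  d <- density(x, kernel = "epanechnikov")   # Epanechnikov kernel estimate
  plot(d)
  lines(density(x, adjust = 0.5), lty = 2)   # half the default bandwidth: rougher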
Estimating hazard functions
• There is a package "muhaz" in R, but it only deals with exponential hazard rates
• But with the commands from before, one can implement a generic hazard function estimate, as sketched below
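A minimal sketch of such a generic estimate, ĥ(t) = f̂(t)/(1 − F̂(t)), built from density() and ecdf() as on the previous slides (data simulated here; the evaluation grid and the 5% tail cutoff are arbitrary choices):

  set.seed(11)
  rt <- rexp(500, rate = 1/300) + 200

  f  <- density(rt, kernel = "epanechnikov")   # kernel density estimate
  Fn <- ecdf(rt)                               # ECDF

  S <- 1 - Fn(f$x)                             # estimated survivor function
  h <- f$y / S                                 # hazard estimate on the grid f$x

  ok <- S > 0.05                               # drop the far tail, where the denominator explodes
  plot(f$x[ok], h[ok], type = "l", xlab = "t", ylab = "estimated hazard")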