Analysis of RT distributions with R
Emil Ratko-Dehnert
WS 2010/2011
Session 11 – 01.02.2011

Last time...
[Figure: Comparing Ex-Gauss CDF and hazard function, for exg(mu = 0, sigma = 1, nu = 1) and exg(mu = 0, sigma = 0.5, nu = 1)]
• RT distributions in the field (very briefly)
• Survivor function
• Hazard and cumulative hazard function

III ESTIMATION THEORY

Overview
• Introductory words on estimation theory (examples: X̄, LSM)
• Properties of good estimators
  – Unbiasedness, consistency, efficiency, MLE
  – Cutoffs and inverse transformation
• Parametric vs. non-parametric estimation
  – CDF and quantile estimates
  – Density estimators
  – Hazard estimators

Estimation
• The goal of estimation is to determine the properties of the (RT) distribution from sampled data.
• These may be...
  – gross properties such as the mean or skewness,
  – parameters of a distribution (parametric),
  – or the exact functional form of the distribution (non-parametric).

Complications
• Every estimator is itself a random variable, i.e. it has a specific variance and distributional form (mostly unknown).
• Therefore, one should use estimators that are "good" in a well-defined sense.

Example: X̄
• The sample mean is an estimate of the parameter μ, the population mean:

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

• X̄ will change for different samples, even when those samples are taken from the same population.
• By the central limit theorem, we know that the distribution associated with X̄ is approximately normal, X̄ ~ N(μ, σ²/n), when the Xi are iid.

Least-squares minimization
• Estimate a function that follows the data (x, y), while minimizing the sum of squared residuals:

$$\min_{f} \sum_{x} \left( y - f(x) \right)^2$$

where f is a model function.
• If f is a linear combination of basis functions (i.e. linear in its parameters), the problem can be solved analytically (without numerical approximation).

PROPERTIES OF ESTIMATORS

1) Unbiasedness
• An estimator α̂ for the parameter α is said to be unbiased if

$$E(\hat{\alpha}) = \alpha,$$

meaning the expected value of α̂ equals the parameter it is meant to estimate (α).

1*) Asymptotic unbiasedness
• A series of estimators α̂n is called asymptotically unbiased if its expectation converges to the "real" parameter α:

$$\lim_{n \to \infty} E(\hat{\alpha}_n) = \alpha$$

• Drawback: you might need a very large n to get into the vicinity of α.

2) Consistency
• An estimator is said to be consistent when (for growing n) the probability that it differs from the estimated parameter shrinks to zero:

$$\lim_{n \to \infty} P\left( \left| \hat{\alpha}_n - \alpha \right| > \varepsilon \right) = 0 \quad \text{for every } \varepsilon > 0$$

• Interpretation: "The accuracy of the estimator α̂ can be improved by increasing the sample size n."
• For X̄, for example, this property is guaranteed by the law of large numbers.

3) Efficiency
• Efficiency refers to the variance of an estimator.
• Because the estimate of a parameter is a RV, we would like the variance of that variable to be relatively small.
• If the estimator is unbiased, we can then be fairly certain that it does not vary too much from the true value of the parameter.
• Def.: An unbiased estimator α̂ whose efficiency

$$e(\hat{\alpha}) = \frac{1 / I(\alpha)}{\mathrm{var}(\hat{\alpha})} = 1$$

for all parameters α is called efficient.
• Here I(α) is the Fisher information matrix of the sample.

4) Maximum likelihood
• Let x1, x2, ..., xn be the observed iid data and θ the parameter vector of the model.
• Then the joint density function

$$f(x_1, x_2, \dots, x_n \mid \theta) = f(x_1 \mid \theta) \, f(x_2 \mid \theta) \cdots f(x_n \mid \theta)$$

describes the likelihood of observing the data under the assumption of θ.

Log-likelihood and MLE
• The log-likelihood reverses the interpretation: the joint density is read as a function of θ given the observed data,

$$\hat{\ell}(\theta \mid x_1, \dots, x_n) = \frac{1}{n} \ln L(\theta \mid x_1, \dots, x_n) = \frac{1}{n} \sum_{i=1}^{n} \ln f(x_i \mid \theta).$$

• The maximum likelihood estimator is defined as

$$\hat{\theta}_{\mathrm{mle}} = \arg\max_{\theta} \hat{\ell}(\theta \mid x_1, \dots, x_n).$$
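As a concrete illustration of these two definitions, here is a minimal R sketch that fits an ex-Gaussian (the exg(mu, sigma, nu) distribution from the title slide) by minimizing the negative mean log-likelihood with optim(). The density dexg(), the simulated RTs, and the starting values are assumptions of this sketch, not course code.

    ## hand-rolled ex-Gaussian density: Normal(mu, sigma^2) + Exp(mean nu)
    dexg <- function(x, mu, sigma, nu) {
      (1 / nu) * exp((mu - x) / nu + sigma^2 / (2 * nu^2)) *
        pnorm((x - mu) / sigma - sigma / nu)
    }

    ## negative mean log-likelihood -l-hat(theta | x), the quantity to minimize
    negloglik <- function(par, x) {
      -mean(log(dexg(x, par[1], par[2], par[3])))
    }

    set.seed(1)
    rt <- rnorm(500, mean = 400, sd = 40) + rexp(500, rate = 1/100)  # fake RTs (ms)

    fit <- optim(par = c(mu = 350, sigma = 30, nu = 80), fn = negloglik, x = rt,
                 method = "L-BFGS-B", lower = c(-Inf, 1e-3, 1e-3))
    fit$par  # MLE estimates of mu, sigma, nu

Note that dividing by n and taking logs only rescale the likelihood monotonically, so minimizing this quantity yields the same θ̂ as maximizing L itself.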
Rationale and constraints
• "If we observe a particular sample, then it must have been highly probable, and so we should choose the estimates of the population parameters that make this probability as large as possible."
• But: one requires a parametric model and the associated likelihood function(!)

Remarks on estimators so far
• Sample mean X̄ and sample standard deviation s:
  – are not robust to "outliers"
  – RT distributions are right-skewed, so both are inappropriate for statistical testing (power issues)
• Median Md and interquartile range (IQR := Q3 − Q1):
  – are robust, but their standard errors are larger than those of X̄ and s
  – also require larger sample sizes to approximate normality
  – and Md is biased when estimating a skewed distribution

Cutoffs
• Monte Carlo simulation paper (Ratcliff, 1993)
• Investigated methods to reduce or eliminate the effects of outliers on X̄ and s:
  – Md → lower power in any case
  – fixed cutoffs → no hard rule
  – cutoffs at mean ± 3 SD → disastrous effects on power
  – cutoffs can introduce asymmetric biases

Inverse transformation (I)
• The next most powerful alternative for minimizing the effects of outliers: transform RT to speed.
• Ratcliff doubled the slowest RT, and the standard deviation of X = 1/RT worsened only from s = .466×10⁻³ (1/ms) to .494×10⁻³ (1/ms).

Summary
• Try to avoid using cutoffs.
• If you do, always present the results for both the complete and the trimmed data.
• Rather, use robust methods (e.g. the inverse transformation) instead.

PARAMETRIC VS NON-PARAMETRIC ESTIMATION

Reflections
• Least-squares minimization and MLE ensure that estimates are unbiased, consistent, efficient, ...
• But:
  – the preconditions for these methods are rarely met (normality, iid, ...)
  – furthermore, when estimating e.g. density functions, the extent of bias depends on the true form of the population distribution (which is unknown).

Options
• When a model is specified, one can use parametric procedures.
• Otherwise one has to use non-parametric procedures.

Estimating the CDF
• We already know that the CDF can be estimated by the ECDF:

$$\hat{F}_n(t) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{x_i \le t} = \frac{\text{number of elements} \le t}{n}$$

• This also gives us an estimator for the survivor function(!)
• The ECDF is unbiased and asymptotically normal.

Quantiles
• You can easily estimate the p-th quantile, for example the median (p = 0.5).
• Start by sorting your data from smallest to largest.
• If your data RT1, ..., RTn comprise e.g. n = 45 values, then n·p = 22.5, and the estimate is taken between the neighbouring order statistics: (RT(22) + RT(23))/2.

DENSITY ESTIMATORS

Classes of density estimators
• General weight function estimators
  – e.g. the histogram or the weighted histogram
  – are easy to compute, but tend to be inaccurate
• Kernel estimators
  – are more difficult to understand and need more computation

Warning
• "It is important to realize that a density estimate is probably biased, the degree of bias will be unobservable, and the bias will not necessarily get smaller with increase in sample size." (Van Zandt, 2002)
• Also: never use density estimates for your parameter estimation (i.e. model fitting)!

Histogram

$$\hat{f}(t) = \frac{\text{number of observations in bin } i}{n h_i}, \qquad t_{i-1} < t \le t_i$$

• The more n you have, the smaller your bin widths hn have to be; hn can be fixed or variable.
• But: there is no algorithm to adjust hn for increasing n (accuracy need not improve with greater n!)
⇒ inappropriate for serious RT analysis (a hand-rolled version is sketched below)
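For illustration only, this sketch implements the bin-count formula above with a fixed bin width h and cross-checks it against R's own hist() on the density scale; the function name fhat.hist and the simulated data are ad hoc assumptions.

    ## histogram density estimate by hand: count in bin i / (n * h)
    fhat.hist <- function(t, x, h) {
      breaks <- seq(floor(min(x)), max(x) + h, by = h)
      counts <- tabulate(findInterval(x, breaks), nbins = length(breaks) - 1)
      counts[findInterval(t, breaks)] / (length(x) * h)
    }

    set.seed(2)
    rt <- rnorm(200, 450, 60) + rexp(200, 1/80)  # fake RTs (ms)
    fhat.hist(500, rt, h = 50)

    ## cross-check against R's histogram on the density scale
    hist(rt, breaks = seq(floor(min(rt)), max(rt) + 50, by = 50),
         plot = FALSE)$density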
[Figure: Histogram of eruptions, plotted on the density scale]

Kernel estimators
• Idea: at every point, the kernel estimator is a weighted average of all of the observations in the sample:

$$\hat{f}(t) = \frac{1}{n h_n} \sum_{i=1}^{n} K\!\left( \frac{T_i - t}{h_n} \right)$$

• hn is again the width: the larger hn, the smoother the estimate will be. Ti denotes the i-th observation.
• The kernel K itself is a density function, which integrates to 1 over x and is typically symmetric.

Ex: Gaussian kernel

$$K(x) = \frac{1}{\sqrt{2\pi}} \, e^{-x^2 / 2}$$

Gaussian kernel
• Is a generally good estimator of RT densities, especially for larger samples (n > 500).
• As the kernel is continuous, f̂(t) is, too.
• Again: even with large samples n, the Gaussian kernel estimate might be biased (Van Zandt, 2000).

Choosing hn
• Silverman (1986) proposed a method to choose the smoothing parameter hn of the kernel estimate as a function of the spread of the data:

$$h_n = 0.9 \, \min\!\left( s, \frac{\mathrm{IQR}}{1.349} \right) n^{-0.2}$$

• Here s is the sample standard deviation and IQR is the interquartile range.

HAZARD ESTIMATES

Hazard function
• Since the hazard function was defined to be

$$h(t) = \frac{f(t)}{1 - F(t)},$$

one might try to estimate it by

$$\hat{h}(t) = \frac{\hat{f}(t)}{1 - \hat{F}(t)}$$

with

$$\hat{f}(t) = \frac{1}{n h_n} \sum_{i=1}^{n} K\!\left( \frac{t - T_i}{h_n} \right), \qquad \tilde{F}(t) = \frac{1}{n} \sum_{i=1}^{n} \tilde{K}\!\left( \frac{t - T_i}{h_n} \right),$$

where K̃ is the integrated kernel.

Problems
• Since the denominator goes to 0, errors inflate!
• Also, the sparseness of data from the tail of the distribution in the sample generally makes hazard functions difficult to observe.

Epanechnikov kernel

$$K(x) = \begin{cases} \dfrac{3}{4\sqrt{5}} \left( 1 - \dfrac{x^2}{5} \right) & \text{if } |x| \le \sqrt{5} \\[1ex] 0 & \text{else} \end{cases}$$

$$\tilde{K}(x) = \int_{-\sqrt{5}}^{x} K(u) \, du = \frac{1}{2} + \frac{3}{4\sqrt{5}} \left( x - \frac{x^3}{15} \right)$$

Epanechnikov estimator
• By using the Silverman smoothing estimate for hn, one avoids discontinuities at the tail of the data.
• Eventually, though, the hazard estimate will show a tremendous acceleration towards ∞.
• All in all, the Epanechnikov estimator gives the most accurate estimates over the greatest range, and one easily sees where it becomes inaccurate.

IV MODEL FITTING

AND NOW TO R

ecdf(x)
• The ecdf(x) command generates a step function: the empirical cumulative distribution function.
• One can plot the result via plot.ecdf(x) or plot(ecdf(x)).
• One can access the data via knots(ecdf(x)) or summary(ecdf(x)).

hist(x)
hist(x, breaks = b, freq = FALSE)
• breaks can be either
  – a vector giving the breakpoints between the bins,
  – a single number, giving the number of bins,
  – a function to compute the number of cells,
  – or a character string naming a break algorithm ("Sturges", "Scott", "FD").

density(x)
density(x, bw = "nrd0", adjust = 1, kernel = "gaussian")
• This computes kernel density estimates and can also be used for plotting: plot(density(x)).
• bw = smoothing bandwidth ("nrd0" = Silverman's rule of thumb)
• adjust = scales the bandwidth relative to bw (e.g. 0.5)
• kernel = "gaussian", "rectangular", "epanechnikov", ...

Estimating hazard functions
• There is a package "muhaz" in R, but it only deals with exponential hazard rates.
• But with the commands from above one can implement a generic hazard function estimate, as in the sketch below.
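Here is one possible generic implementation, combining density() for f̂ and ecdf() for F̂; the function name haz.est, the kernel choice, and the simulated data are assumptions of this sketch.

    ## generic hazard estimate: h-hat(t) = f-hat(t) / (1 - F-hat(t))
    haz.est <- function(x, kernel = "epanechnikov") {
      d  <- density(x, kernel = kernel)  # f-hat on a grid (Silverman-type bandwidth)
      Fn <- ecdf(x)                      # F-hat, the step-function ECDF
      S  <- 1 - Fn(d$x)                  # survivor estimate on the same grid
      list(t = d$x, h = ifelse(S > 0, d$y / S, NA))  # guard against division by 0
    }

    set.seed(3)
    rt <- rnorm(300, 450, 60) + rexp(300, 1/80)  # fake RTs (ms)
    hz <- haz.est(rt)
    plot(hz$t, hz$h, type = "l", xlab = "t (ms)", ylab = "hazard")

As the "Problems" slide predicts, the right tail of the resulting curve is unstable: 1 − F̂(t) approaches 0 there, so small errors in f̂(t) inflate.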