Time Series Frequency Domain

Warning: this section has depleted my mathematical typesetting resources. Those who want the best explanation possible are advised to also read the book, which gives the mathematical equations in a less ambiguous font.

Observations are made at discrete, equally spaced intervals, i.e. t = 1, 2, 3, etc. Rather than thinking of the time series as processes as we did before, think of them as sums of cosine waves:

    X(t) = sum_j Rj cos(wj t + thetaj) + Zt

where w is really an omega, and Zt is white noise. For a very simple time series:

    X(t) = R cos(wt + theta) + Zt

R is the amplitude -- how high/low the curve goes on the y-axis.
w is the frequency -- how quickly the curve oscillates; the number of radians per unit of time.
theta is the phase -- shifts the curve left and right along the x-axis.

Another related concept: f = w/2pi = # of cycles per unit time.

If we think of t as continuous, then this is just a plot of the cosine function. But we don't make continuous observations; we see only t = 0, 1, 2, 3, etc. (Note: the following discussion is easier with a picture, which unfortunately I can't reproduce here too easily.)

Note that it is important that your observations be at the right frequency. If your observations come only once per cycle (once every 2pi radians), you will observe a flat line. Even observing twice per cycle can show a flat line, if the evenly spaced observations happen to land on the zero crossings. If you make, say, 3 observations per cycle, you will eventually see different values, but it will take a long time for you to see all the possible values.

This suggests that there is a certain minimum frequency at which you must sample if you want to observe a particular wave. More disturbing, since we don't know ahead of time what the frequency is, it means that we will never be able to tell if we're missing a wave oscillating at a much higher frequency.

In fact, the rule is that if you sample once per unit of time, the fastest wave you can observe is the one with w = pi (that oscillates pi radians per unit of time; in other words, that goes through half of its cycle in each time interval). This corresponds to a frequency of 1/2 a cycle per unit of time. Or look at it like this: to see a complete wave, you must see the peak and the trough, which means you must sample at 0 radians and at pi radians. This frequency is called the Nyquist frequency. Hence, if we sample every second, the fastest wave we can see is the one that oscillates pi radians per second. If we sample yearly, the fastest we can observe is the one that cycles pi radians per year.

When planning a study, this plays a role in the following sense: if you sampled air temperature every day at noon, then you would be able to model day-to-day cycles, but you could not model cycles that occurred within the day. Common sense.

If the process depends on frequencies faster than the Nyquist frequency, then we get an effect called aliasing. This means that these higher frequencies appear -- to us -- to be caused by slower waves. Essentially, we are "out of sync" with the process. Again, the book has some nice illustrations. A trig identity is behind this: for k and t integers,

    cos((w + k*pi) t) = cos(wt)          if k is an even integer
                      = cos((pi - w) t)  if k is odd

So, for example, if omega = 3pi/2, meaning the wave goes through 3pi/2 radians per unit of time, then cos((3pi/2) t) = cos((pi/2) t + pi*t) = cos((pi - pi/2) t) (using the identity with omega = pi/2 and k = 1) = cos((pi/2) t).
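To see this concretely, here is a tiny numerical sketch (Python with numpy; the code and names are mine, not the book's). It checks two claims from above: sampling a wave exactly once per cycle yields a flat line, and at integer observation times the wave with w = 3pi/2 is indistinguishable from its alias at w = pi/2.

    import numpy as np

    t = np.arange(0, 20)  # discrete, equally spaced observation times

    # Once-per-cycle sampling: the wave with w = 2pi looks like a flat line.
    once_per_cycle = np.cos(2 * np.pi * t)
    print(np.allclose(once_per_cycle, 1.0))  # True: every sample lands on a peak

    # Aliasing: a wave above the Nyquist frequency (pi) masquerades as a
    # slower one below it, exactly as the trig identity predicts.
    fast = np.cos(3 * np.pi / 2 * t)
    slow = np.cos(np.pi / 2 * t)
    print(np.allclose(fast, slow))  # True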
Hence, we can't distinguish between cos((3pi/2) t) and cos((pi/2) t).

We can "tease out" the phase by using this identity: cos(wt + theta) = cos(wt) cos(theta) - sin(wt) sin(theta). Since X(t) = sum_j Rj cos(wj t + thetaj) + Zt, applying this identity and letting

    aj = Rj cos(thetaj)
    bj = -Rj sin(thetaj)

gives

    X(t) = sum_j [aj cos(wj t) + bj sin(wj t)] + Zt.

So we need to estimate the a's, the b's, and the w's. Note: we're going to flip back and forth between these two representations: the one in terms of R's and thetas versus the one with a's and b's.

Just as there is a fastest observable wave, there is also a slowest observable one. This would be the wave that completes just a single cycle during our study period. Assuming observations at equally spaced intervals which we label 1, 2, ..., N, this wave is the one that goes through 2pi radians in N units of time, and therefore its frequency is omega = 2pi/N.

Put it all together, and it means that there are only a certain number of frequencies we'll be able to estimate. From slowest to fastest:

    2pi/N, 4pi/N, 6pi/N, ..., pi

which are equal to

    1*(2pi/N), 2*(2pi/N), ..., (N/2)*(2pi/N)

These frequencies can be represented by wp, where p = 1, 2, ..., N/2 (assuming N is even). In other words, there are half as many frequencies as we have data points. We now might model the data as a sum of the terms

    ap cos(wp t) + bp sin(wp t)

for each value of p above. Each term in this model is called the "pth harmonic."

Given this, one straightforward strategy for estimating the a's and b's might be as follows. Fix a frequency, wp. For that frequency, apply the least squares criterion to find the a and b that best fit the data. With much algebra and trig (and a little bit of calculus), one can find nice formulas for these parameters. I won't reproduce them here, but they are essentially averages of the data, weighted by either cos(wp t) or sin(wp t). (A small numerical sketch follows below.)

We can do this for each one of the N/2 frequencies. But note that we will then have "overfit" the data: at each frequency we fit two parameters (a and b), and since there are N/2 fittable frequencies, we have (N/2)*2 = N parameters. This means that there is no "error" in the fit; no residuals. That is fine if you want to perfectly fit the observations, but our model allows for some "white noise" -- random deviations from the model. Including white noise in the model allows us to use the model to make predictions about the future. Put slightly differently, the problem with overfitting is that you're never sure whether certain frequencies are included in the model solely because of aberrations in this particular data set; you can't be certain they'll be there again if you recollect the data.

So we can't fit the function at every frequency; we must therefore choose some frequencies. But how to choose? Clearly we want only the "most important" frequencies, and one means of assessing importance is to look at the percentage of variation explained by including a particular frequency in the fit. Think of it like this: if we fit only the mean -- that is, if we modelled the process as a straight line about the central value (estimated by the average of all of the data points) -- there would be considerable deviation from this line. In fact, one could estimate the variance as the average squared deviation from the average value. We could then ask how much of the variation would be cut down if we were to include in the model the estimate for one particular frequency.
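Here is that fitting strategy as a short Python/numpy sketch. The notes above deliberately don't write out the closed-form weighted averages, so treat the exact formulas below as my reconstruction of the standard textbook ones; the sanity check against a brute-force least squares fit shows they agree.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 100
    t = np.arange(1, N + 1)

    # Simulated series: one cosine wave (R = 2, theta = 0.7) plus white noise.
    x = 2.0 * np.cos(2 * np.pi * 5 / N * t + 0.7) + rng.normal(0.0, 1.0, N)

    p = 5                      # which harmonic to fit
    w_p = 2 * np.pi * p / N    # one of the frequencies 2pi/N, 4pi/N, ..., pi

    # Least squares estimates in closed form: averages of the data,
    # weighted by cos(w_p t) and sin(w_p t).
    a_p = (2 / N) * np.sum(x * np.cos(w_p * t))
    b_p = (2 / N) * np.sum(x * np.sin(w_p * t))

    # Sanity check against a generic least squares fit at the same frequency.
    design = np.column_stack([np.cos(w_p * t), np.sin(w_p * t)])
    (a_ls, b_ls), *_ = np.linalg.lstsq(design, x, rcond=None)
    print(a_p - a_ls, b_p - b_ls)   # both essentially zero

    # Flip back to the other representation: a = R cos(theta), b = -R sin(theta).
    R_p = np.hypot(a_p, b_p)
    theta_p = np.arctan2(-b_p, a_p)
    print(R_p, theta_p)             # close to the true 2.0 and 0.7

The exact agreement is no accident: at these particular frequencies the cosine and sine columns are orthogonal, which is what makes the weighted-average formulas drop out of the least squares algebra.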
This overall variance could then be partitioned into two parts: the amount of variation that is "explained by" (or removed by) the new model, and the "noise" that is still left unexplained. It turns out that with just a little bit of algebra, we can write out the amount explained by each of the harmonics, and that this variation is equal to

    Rp^2 / 2       for p = 1, ..., (N/2) - 1
    a_{N/2}^2      for p = N/2

where Rp = sqrt(ap^2 + bp^2). This means that by estimating the a's and b's, and then converting them back to R's, we have a quick way of seeing how much variation is explained by each component, and hence how important each frequency is.
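And a quick numerical check of that partition (again my own Python sketch, reusing the simulated series and the reconstructed closed-form estimates from above): fit every harmonic, convert to R's, and confirm that the Rp^2/2 terms plus the a_{N/2}^2 term add up to the overall variance, with the injected p = 5 harmonic accounting for the lion's share.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 100
    t = np.arange(1, N + 1)
    x = 2.0 * np.cos(2 * np.pi * 5 / N * t + 0.7) + rng.normal(0.0, 1.0, N)

    total_var = np.mean((x - x.mean()) ** 2)  # average squared deviation

    explained = np.zeros(N // 2 + 1)          # slot p holds the pth harmonic's share
    for p in range(1, N // 2):                # p = 1, ..., N/2 - 1
        w_p = 2 * np.pi * p / N
        a_p = (2 / N) * np.sum(x * np.cos(w_p * t))
        b_p = (2 / N) * np.sum(x * np.sin(w_p * t))
        explained[p] = (a_p ** 2 + b_p ** 2) / 2   # Rp^2 / 2

    # p = N/2 (the Nyquist frequency) contributes a^2 rather than R^2/2,
    # because sin(pi * t) is identically zero at the integer observation times.
    a_nyq = (1 / N) * np.sum(x * np.cos(np.pi * t))
    explained[N // 2] = a_nyq ** 2

    print(total_var, explained.sum())  # equal: the partition is exact
    print(explained.argmax())          # 5 -- the injected frequency dominates

Ranking the entries of explained from largest to smallest is then exactly the "quick way" of seeing which frequencies matter most.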