DesignCon 2008

Characterization of Gaussian Noise Sources

Ransom Stephens, Ph.D., Ransom’s Notes
Ransom@RansomsNotes.com

Robert Muro, NoiseCom, Inc.

Abstract

Key applications in high-rate serial data technologies assume that Random Jitter (RJ) follows a Gaussian distribution and require that receivers be tested under the stress of a calibrated level of Gaussian RJ. However, the most commonly available noise sources have never been rigorously characterized, until now. We perform a complete statistical analysis of industry-standard noise sources and report on the extent to which the Gaussian assumption is valid. By including both statistical and systematic errors in complete chi-squared and maximum likelihood analyses we show how to calculate the “goodness-of-fit” confidence level without resorting to the commonly used but inadequate least-squares approach.

Authors’ Biographies

Ransom Stephens’ company, Ransom’s Notes, produces and presents content at every level of technical sophistication to help engineers advance to technology's cutting edge. He is the author of more than 200 articles in the electronics industry, science journals, and magazines. Dr. Stephens has introduced new measurement techniques for electrical and optical systems, invented methods for extracting signals from noise, led an engineering commando team, and served on high data-rate standards committees. Contact him at www.RansomsNotes.com.

Bob Muro, NoiseCom Product Manager, has worked in the Test & Measurement Industry for over 10 years, initially as an application engineer for a large digital oscilloscope company and now as a product manager for the NoiseCom division of the Wireless Telecom Group in Parsippany, N.J. He has a BSES from the New Jersey Institute of Technology and is working towards an MS in Bio-Medical Engineering.

Characterization of Gaussian Noise Sources

Receiver testing requires well calibrated worst-case signals: the fabled stressed-eye.
The worst-case prescription inevitably calls for specified amounts of Random Jitter (RJ) and, increasingly, Random Noise (RN). In most specifications (PCI-Express, sATA, FBDIMM, FibreChannel, et cetera) RJ is required to be mostly flat in the jitter-frequency domain and Gaussian in the time domain. The use of spectrum analyzers simplifies the frequency domain analysis, but in the time domain, sophisticated statistical analysis techniques are required to verify the Gaussian nature.

Random Jitter is applied to signals by converting voltage noise to phase noise with a voltage-controlled delay. The Voltage-Controlled Delay (VCD) translates the distribution of voltage noise to a phase noise time domain distribution with the same shape. An ideal VCD has a linear translation coefficient, jitter/volt, in ps/mV.

Analog noise generators are the most economical and common source for generating RJ and RN with the broad crest factor necessary to probe receivers with sensitivity sufficient to reach a bit error ratio of 1E-12. Most pattern generators produced by the major test and measurement companies use RJ sources similar to, and sometimes precisely, those reported here. Many of the biggest component manufacturers use a combination of off-the-shelf pattern generators and noise sources developed in-house that are also based on the generators analyzed in this study.

Test equipment manufacturers typically quote an uncertainty of 10% or more in the amplitude of their RJ/RN generation, but they don’t quote the degree to which the distribution is Gaussian, or the range over which it is Gaussian; we suspect this is because they don’t know. Neither do they provide documentation of test results or procedures to check calibration. This paper provides both: results for the most common noise sources and techniques for test engineers to perform the analysis themselves.

The Gaussian nature of the source is, in most cases, more important than the crest factor.
The crest factor is relevant only to measurements performed by bit error ratio testers that truly probe extremely low bit error ratios, like 1E-12, a measurement that can take from half an hour to several hours. The vast majority of measurements (any jitter analysis performed on any type of oscilloscope or time interval analyzer, and any fast measurement on bit error ratio testers) involve extrapolation [1]. The extrapolation relies much more on the assumption of a Gaussian shape in the center of the distribution than on the extent of the distribution’s tails and, hence, the crest factor. A nice quality of commercial noise sources is that Gaussian nature and large crest factor go hand-in-hand. It is when large power is needed that sacrifices must be made. Power is increased by amplifying the source and, even with state-of-the-art amplifiers, nonlinearities cause deviations from Gaussian structure and reduce the crest factor.

The Gaussian nature of a source can be checked with a least-squares fit, but least-squares is not useful for what statisticians call “hypothesis testing.” Chi-squared and maximum likelihood analyses built from a combination of statistical and systematic uncertainties and models of the measuring instrumentation are necessary for the calculation of a genuine “goodness-of-fit” hypothesis consistency parameter. A benefit of the proper analyses is that accurate uncertainties of the fit parameters can also be calculated; there’s no way to do this with least-squares tests. Since this “goodness-of-fit” parameter (also called the “confidence level of the fit,” as opposed to the confidence intervals of fit parameters) is poorly understood in the engineering community, this paper begins with a short but rigorous tutorial.

Statistical techniques for hypothesis testing

Given a set of data, we are compelled to explain it with our best hypothesis, but how can we tell which hypothesis is best [2]?
Noise sources generate random events that follow some type of distribution. Figure 1 shows a histogram of random events. A histogram is the number of times that different types of events occur; in this case, the number of times that a measurement of the noise yielded a voltage in a small range, or bin, of voltage: N vs. V. Our hypothesis for this data is that it follows a Gaussian distribution (of course we’re cheating, the data is a simulated Gaussian). The “central limit theorem” of probability and statistics says that the combination of an infinite number of small, unrelated effects follows a Gaussian distribution. A Gaussian is also called a “Normal distribution” or “bell curve.” The normalized version is:

$$G(x) = \frac{N}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \tag{1}$$

where $\sigma$ is the standard deviation or width of the distribution, $\mu$ is its mean, and $N$ is the total number of entries.

To evaluate the hypothesis, first we need to find the parameters $(\mu, N, \sigma)$ that best fit the data.

Figure 1: A histogram (bars) with a hypothetical distribution (smooth curve).

Least Chi-Squared Fitting

In a least-squares fit, one determines the set of parameters $(\mu, N, \sigma)$ that minimizes the sum of the squares of differences between the data and hypothesis,

$$\sum_{k=1}^{K} \left(n_k - G(x_k)\right)^2 \tag{2}$$

where $K$ is the total number of bins in the histogram, $k$ indicates the bin number, $n_k$ is the number of events in the kth bin, and $x_k$ is the value of the random variable (e.g., voltage) at the center of the kth bin.

In practice, fitting is easy if you have the right software. Reference [3] has some good software routines for finding the global minimum of any function. Matlab and Mathematica have several optimizers built-in and even Microsoft’s Excel has one.

There are two crippling problems with least-squares fitting.

1. Least-squares fits don’t incorporate measurement uncertainty. Clearly an accurate measurement should have a greater effect on the choice of parameters than an inaccurate one.
2. Least-squares fits don’t provide a measure of how good (or bad) the hypothesis is.

By virtue of incorporating measurement uncertainties, the least chi-squared fit solves both of these problems. Chi-squared is given by

$$\chi^2 = \sum_{k=1}^{K} \frac{\left(n_k - G(x_k)\right)^2}{\sigma(n_k)^2}. \tag{3}$$

Measurements with larger uncertainties, $\sigma(n_k)$, have smaller impacts on chi-squared. Figure 2 shows the histogram from Figure 1 with error bars of length $\sigma(n_k)$ and $G(x)$ determined by minimizing chi-squared. Notice that if the curve just touches the end of the error bar, then that data point contributes one unit to chi-squared; if the curve is two error bars from the data point, it contributes $4 = 2^2$ to chi-squared.

Figure 2: The histogram expressed with statistical uncertainty (triangles with error bars) and optimized fit (smooth line).

Statistical and Systematic Uncertainties

In a purely random process, like the histogram in Figure 1, the statistical uncertainty of the number of events in a single bin is simply the square-root of that number:

$$\sigma_{\mathrm{stat}}(n_k) = \sqrt{n_k}. \tag{4}$$

If the number of events is less than about 5 we run into an annoying technical problem that requires us to use the maximum likelihood technique, discussed below, instead of least chi-squared. Using Eq. (3) to determine the best fit parameters is always superior to using Eq. (2), so we’ll put off worrying about this technicality until it can’t be avoided; besides, everything we say about chi-squared turns out to be true for maximum likelihood anyway.

The fractional uncertainty is then $\sigma(n_k)/n_k = 1/\sqrt{n_k}$ which, as expected, gets smaller as the sample size gets larger and the measurement becomes more precise.

We have to be careful with the magnitude of the uncertainty; it should be what statisticians call “one standard error” so that the other pieces of the puzzle fit together properly. One standard error corresponds to a 68% confidence interval, which is the fraction of entries under a Gaussian distribution, Eq. (1), within one standard deviation of the mean.
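As a rough illustration of Eqs. (2) through (4), the following Python sketch compares the least-squares sum with chi-squared for a small, made-up histogram. The function names, the bin contents, and the model values are hypothetical, and skipping empty bins is a simplification chosen for this sketch, not a step taken from the analysis in this paper.

```python
import math

def stat_uncertainty(n_k):
    """Eq. (4): statistical uncertainty of a bin holding n_k random events."""
    return math.sqrt(n_k)

def least_squares(counts, model):
    """Eq. (2): every bin carries the same weight."""
    return sum((n - g) ** 2 for n, g in zip(counts, model))

def chi_squared(counts, model):
    """Eq. (3) with sigma(n_k) = sqrt(n_k); empty bins are skipped."""
    total = 0.0
    for n, g in zip(counts, model):
        if n == 0:
            continue  # sigma would be zero; such bins call for maximum likelihood
        total += (n - g) ** 2 / stat_uncertainty(n) ** 2
    return total

# Hypothetical bin contents n_k and model predictions G(x_k).
counts = [100, 400, 900, 400, 100]
model = [110.0, 380.0, 930.0, 420.0, 100.0]

print("least squares:", least_squares(counts, model))
print("chi-squared:  ", chi_squared(counts, model))
```

In this made-up example each of the four bins that disagrees with the model sits exactly one error bar away from it, so each contributes one unit to chi-squared, whereas the least-squares sum is weighted most heavily by the most populated bin.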
Eq. (4) rests on the assumption that the number of entries in the kth bin is independent of the number in the (k+1)th or (k-1)th bins. For random noise, this assumption is valid. A case where it might not be valid would be non-random noise, for example, periodic noise. If the voltage as a function of time follows a known trajectory, then if we know the number of entries in the (k-1)th bin we could predict the number in the kth bin; in this case the (k-1)th and kth bins are not independent, they are correlated.

Systematic, as opposed to statistical, uncertainties are caused by both mistakes and the reality of test equipment. Poor calibration, unaccounted-for noise, detector sensitivity and limited bandwidth, and incorrect technique are all examples of sources of systematic uncertainty. For measurements performed on oscilloscopes, the dominant systematic uncertainties are caused by noise, calibration errors, and ADC nonlinearities, and are usually reported by the manufacturer in the form of sensitivity, rms noise, accuracy and resolution.

Unfortunately, test and measurement manufacturers do not report their systematic uncertainties in terms of “one standard error.” What they report is better thought of as a tolerance: the product team’s best guess at the worst case inaccuracy of their equipment. For the purposes of this paper, we’ll assume that these tolerances correspond to 99.7% confidence intervals, that is, to three standard errors. We’ll see below that there are ways to determine if this assumption is merited. It can be a tricky game and the best approach is to use the most accurate equipment possible. Often, the “best guess” technique is all we have to estimate systematic uncertainty; after all, if we knew what was wrong with a measurement, we’d do it right.

Where a statistical uncertainty causes individual measurements to scatter evenly about their true mean, systematic uncertainties can result in a bias, skew, or offset of the observed values from true values.
Chi-squared in Eq. (3) requires the total uncertainty, $\sigma(n_k)$. For what we’re doing here, and in most cases, it is reasonable to assume that the statistical and systematic uncertainties are independent of each other. In practice, independence means that if we change one there is no effect on the other. Simple error propagation shows that independent uncertainties combine like the sides of a right triangle [2]:

$$\sigma(n_k) = \sqrt{\sigma_{\mathrm{sys}}(n_k)^2 + \sigma_{\mathrm{stat}}(n_k)^2} = \sqrt{\sigma_{\mathrm{sys}}(n_k)^2 + n_k}. \tag{5}$$

Uncertainty of Fit Parameters

Since every entry in the histogram has its own uncertainty, we should be able to use something like propagation of errors to obtain the uncertainty of the fit parameters, $(N, \mu, \sigma)$. Rather than go through the whole derivation, let me just sketch where it comes from and report the result.

Notice the similarity between the arguments of chi-squared and the exponent of the Gaussian distribution:

$$\chi^2 = \sum_{k=1}^{K} \frac{\left(n_k - G(x_k)\right)^2}{\sigma(n_k)^2} \quad \text{(Eq. (3))}$$

and

$$G(x) = \frac{N}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \quad \text{(Eq. (1))};$$

both have something like (z - expectation)^2/width^2, where we expect to find z close to the expectation and how close we expect it to be is given by the width of the distribution. In fact, the close relationship between chi-squared and the sum of the arguments of a Gaussian (you might even say that chi-squared looks like the exponent of the product of many Gaussians) gives it almost magic properties. The width is the sort of thing we’re looking for in the fit-parameter uncertainty.

If we were to change one of the measured values by a single unit of uncertainty, then chi-squared would increase by one unit. Similarly, if we vary one of the fit parameters, say $\mu$, so that chi-squared varies by one unit, then it makes sense that the amount we’ve varied $\mu$ is precisely one standard error.
In other words, let $(N_{\mathrm{fit}}, \mu_{\mathrm{fit}}, \sigma_{\mathrm{fit}})$ be the fit parameters for which chi-squared is a minimum,

$$\chi^2_{\min} = \chi^2(N_{\mathrm{fit}}, \mu_{\mathrm{fit}}, \sigma_{\mathrm{fit}});$$

then the one standard error uncertainty in $\mu$ is given by solving

$$\chi^2(N_{\mathrm{fit}}, \mu, \sigma_{\mathrm{fit}}) = \chi^2_{\min} + 1$$

for $\mu$ so that

$$\sigma(\mu) = \left|\mu_{\mathrm{fit}} - \mu\right|. \tag{6}$$

Of course the same argument holds for the uncertainties in $N$ and $\sigma$. Since we usually solve Eq. (6) numerically, it’s equally likely to get $\mu_{\mathrm{fit}} > \mu$ or $\mu_{\mathrm{fit}} < \mu$; the absolute value bars are there so that we aren’t confused by negative uncertainties.

In practice, since we know the number of events in a distribution, we can let $N$ vary in the fit and compare the fit result with the actual number of events in the distribution. If the difference between the fitted and measured values of $N$ is appreciably larger than the uncertainty of $N$, a problem is indicated in either the fitting software or the hypothesis being tested. In the fits reported below, $N$ is fixed at the measured value; only $\mu$ and $\sigma$ are allowed to vary in the fit.

Hypothesis Testing, Statistical Consistency, and Goodness of fit

The best a statistical test can do in evaluating a hypothesis is to “not reject it.” Like proving the negative (we cannot prove that we were not somewhere at some time, the negative, except by proving that we were somewhere else at that time, the positive), it is impossible to prove that a hypothesis is correct. However, statistical consistency, that is, the non-rejection of a hypothesis, is the next best thing. As we’ll see, consistency dramatically narrows the field.

A good fit is a necessary but insufficient condition on the validity of a hypothesis. This is a key concept to understanding goodness-of-fit: if two hypotheses are both consistent with the data, but one is a better fit, there is still no statistically valid reason to choose one hypothesis over the other.
On the other hand, if you have to choose between two hypotheses, you could do worse than picking one on the basis that it has a better fit than the other; it’s just not rigorous.

To determine if the hypothetical distribution is consistent with the true distribution, we calculate the probability for the hypothesized distribution to yield a worse fit than the one we got. The probability of getting a chi-squared worse than the one we observe is calculated by integrating the chi-squared distribution from the value we obtained to infinity:

$$p = \int_{\chi^2}^{\infty} f(z; n_d)\, dz \tag{7}$$

where $f(z; n_d)$ describes the chi-squared distribution [2]. The number of degrees of freedom, $n_d$, is given by the number of ways that the actual distribution can vary from the hypothetical distribution. For example, if we fit a three parameter function to three data points, then since the number of parameters is the same as the number of data points, minimizing chi-squared results in a fit that passes exactly through the three data points. The fit doesn’t tell us anything about the hypothesis; that is, there are no degrees of freedom. In Figure 2, $n_d$ is given by the number of data points minus the number of parameters in the fit, $n_d = K - 3$.

In words: The goodness-of-fit, p, is defined as the probability to find chi-squared equally or less compatible with the hypothesis than the level of compatibility observed in the actual data.

The obvious question is: What are good and bad values for goodness-of-fit, p? To get the answer, we simulated 400 different random Gaussian distributions like those in Figure 1 and Figure 2 and fit a Gaussian hypothesis to each. The resulting goodness-of-fit values, p, are shown in Figure 3a. The p distribution is flat; this means that the probability that a correct hypothesis will give a p-value of 0.01 is the same as the probability that it will give a p-value of 0.99.
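The integral in Eq. (7) can be evaluated without a statistics package. The Python sketch below uses the closed-form series for the upper tail of the chi-squared distribution that holds for integer degrees of freedom; the function names and the example chi-squared values are hypothetical, and the default of three fitted parameters follows the $n_d = K - 3$ counting used for Figure 2.

```python
import math

def chi2_upper_tail(chi2, n_d):
    """Probability that a chi-squared variable with n_d (integer >= 1)
    degrees of freedom exceeds chi2, i.e. the integral in Eq. (7)."""
    if chi2 <= 0.0:
        return 1.0
    h = chi2 / 2.0
    if n_d % 2 == 0:
        # Even n_d: exp(-h) * sum_{i=0}^{n_d/2 - 1} h^i / i!
        term = math.exp(-h)
        total = term
        for i in range(1, n_d // 2):
            term *= h / i
            total += term
        return total
    # Odd n_d: erfc(sqrt(h)) plus terms h^(i - 1/2) exp(-h) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(h))
    term = math.exp(-h) * math.sqrt(h) / math.gamma(1.5)
    for i in range(1, (n_d - 1) // 2 + 1):
        total += term
        term *= h / (i + 0.5)
    return total

def goodness_of_fit(chi2, n_bins, n_params=3):
    """Goodness-of-fit p with n_d = K - (number of fitted parameters)."""
    n_d = n_bins - n_params
    if n_d < 1:
        raise ValueError("no degrees of freedom left to test the hypothesis")
    return chi2_upper_tail(chi2, n_d)

# Hypothetical chi-squared values from fits to a 40-bin histogram.
for chi2 in (20.0, 37.0, 80.0):
    print(chi2, goodness_of_fit(chi2, n_bins=40))
```

If only two parameters are allowed to vary in a fit, `n_params=2` would be the matching choice under the same counting rule.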
Since each data point is a random variable, even when our hypothesis is correct, obtaining consistently small values for chi-squared (and correspondingly large values for p) is just as unlikely as obtaining consistently large chi-squared values (and correspondingly small values for p).

Figure 3: (a) Distribution of goodness-of-fit parameters, p, for 400 fits where the hypothetical distribution is correct, (b) with a tiny amount (0.5%) of additional flat background noise, and (c) and (d) where the hypothetical distribution is incorrect.

The goodness-of-fit parameter is extremely sensitive to the agreement of the data and the hypothesis. Figure 3b shows the goodness-of-fit distribution for simulations with 4975 Gaussian samples and 25 samples that follow a flat distribution, the sort of effect you might expect of an instrument background. Even this 0.5% effect has a huge impact on the goodness-of-fit. Accurate hypothesis tests rely heavily on understanding how the measurement equipment affects the data.

Figure 3c shows a Gaussian hypothesis applied to a simulated Cauchy distribution. The Cauchy has the same qualitative bell-shaped curve as a Gaussian. Figure 3d shows the goodness-of-fit parameter for 500 fits of Gaussian hypotheses to simulated Cauchy distributions. The highest goodness-of-fit of the lot is less than 0.005. Figure 3b and Figure 3d indicate the power of the goodness-of-fit distribution in rejecting bad hypotheses.

The conclusions are:

1. Goodness-of-fit values for correct hypotheses are evenly distributed: 0 < p < 1.
2. It is not wise to judge a hypothesis from a single fit.
3. Incorrect hypotheses result in goodness-of-fit values p << 0.01.

A few comments are warranted before we go off and start testing hypotheses. First, the goodness-of-fit p-value should not be confused with the significance level of a test or a confidence interval of a measured variable.
However, goodness-of-fit is sometimes referred to as “the confidence level of a fit”; while not quite a misnomer, it is misleading because, as we see in Figure 3, the chance for a correct hypothesis to have a “1% confidence level,” p = 0.01, is the same as for it to have a “99% confidence level,” p = 0.99. In other words, goodness-of-fit is nothing like a probability that the hypothesis is true.

Nor should goodness-of-fit be confused with the correlation parameter of two distributions, R^2. The correlation is sometimes mistakenly used as a consistency test. In truth, the correlation tends to 1 if two data sets depend on the same parameters, but doesn’t say anything about whether the two data sets are the same.

Maximum Likelihood Fits

The maximum likelihood technique is a generalization of the least chi-squared technique that naturally accommodates bins with small numbers of events and sometimes makes it easier to account for backgrounds and test equipment irregularities that can be directly observed.

In a maximum likelihood fit we write down the probability that a given histogram is described by a particular hypothesis and then determine the set of parameters, in this case $(N, \mu, \sigma)$, for which that probability is largest. The likelihood, $L$, is given by the product of the probabilities for each bin in the histogram. It is essentially the probability that a particular histogram could occur given the underlying process. Since a huge number of different configurations can occur, the likelihood tends to be a very small number that is useful for evaluating a hypothesis only after the goodness-of-fit parameter, Eq. (7), is evaluated.

Let $f(n_k; N, \mu, \sigma)$ be the probability for $n_k$ entries to appear in the kth bin if the true distribution parameters are $(N, \mu, \sigma)$. Then the likelihood for a given histogram is

$$L = \prod_{k=1}^{K} f(n_k; N, \mu, \sigma). \tag{8}$$

In a random process, the probability of seeing $n_k$ entries in the kth bin of a histogram follows the Poisson distribution, $P$ [2]:

$$P(n_k) = \frac{\lambda_k^{\,n_k}\, e^{-\lambda_k}}{n_k!}, \tag{9}$$

where $\lambda_k$ is the average number of events expected in that bin. For our Gaussian hypothesis, from Eq. (1),

$$\lambda_k = \frac{N}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x_k-\mu)^2}{2\sigma^2}\right]. \tag{10}$$

Plugging Eq. (10) into Eq. (9) and using it in Eq. (8) makes a big algebraic mess. Since all we care about is finding the maximum of Eq. (8), the mess can be converted into a more manageable form by using the fact that

$$\chi^2 = -2\ln(L) + \text{constant}. \tag{11}$$

Thankfully, $-2\ln(L) + \text{constant}$ follows the chi-squared distribution and we can use Eqs. (6) and (7) to determine the uncertainties in the fit parameters, the goodness-of-fit and so forth. The constant in Eq. (11) is determined by requiring that $-2\ln(L) + \text{constant} = 0$ when all $\lambda_k = n_k$.

Analysis of commercial noise sources

The industry-standard noise sources use diodes. As with the thermal or Johnson noise generated by a resistor, we expect the noise to be essentially white in frequency and Gaussian in voltage. It’s ironic that Gaussian RJ and RN effects are a problem in electronics design, yet the effects are quite difficult to produce on purpose.

The noise generated with a diode is amplified to increase the spectral output power to levels useful for practical applications. Nonlinearities in the amplification introduce the potential for the output to deviate from Gaussian behavior.

Analysis game plan

We’ve seen that a single hypothesis test is inadequate to determine whether or not a source follows a given hypothesis. It’s much more effective to perform a large set of tests, evaluate each goodness-of-fit and compare them with Figure 3. If the comparison favors Figure 3a, then it’s reasonable to conclude that the true underlying mechanism is consistent with our hypothesis; if it looks more like Figure 3d, then we should reject the hypothesis.
If, on the other hand, it is between these extremes, like Figure 3b, then we can say that our hypothesis isn’t too far off and that the source is overwhelmingly, if not completely, pure Gaussian.

Figure 4 shows the test setup. The noise source is a NoiseCom 1108A model which has a spectral density covering 100 Hz to 500 MHz, with a total output power of up to 10 dBm.

Figure 4: Test setup.

The noise output is captured by a LeCroy WavePro 7200A real-time oscilloscope. A noise “trace” is acquired simply by triggering the scope and filling the scope memory with subsequent voltage samples. The result is a long list of voltages, each corresponding to a single instance, or event, of the source emitting a given voltage level. Since the noise is random, no interpolation is used and it is incorrect to think of the signal as we might a waveform trace. Rather, we consider it a set of independent data points. We could quibble over the independence of adjacent samples, but any correlation between samples has a tiny effect on the results of this analysis. A primary source of dependence could arise were the scope low-pass filtering the data. Here, the scope bandwidth, 2 GHz, is appreciably higher than the noise bandwidth.

The data is accumulated in a text file (a million measurements of voltage) which is ported to a PC for analysis. The analysis is performed in the Ransom’s Notes lab and consists of the techniques detailed above; all software was implemented in Visual Basic from first principles, except the minimization routine, which came from the Microsoft Excel dynamic link library.

Finally, the game plan: after a cursory look at the whole distribution, we apply maximum likelihood fits to 200 sub-samples, each with 5000 events. In this first stage of analysis, we include only statistical uncertainties, Eq. (4). The goodness-of-fit parameter, p, is calculated for each and plotted as a histogram, as was done for simulated noise in Figure 3.
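The first stage of this game plan can be sketched in Python as follows. Combining Eqs. (8), (9) and (11), and fixing the constant so that the result vanishes when every $\lambda_k = n_k$, gives $\chi^2 = 2\sum_k \left[\lambda_k - n_k + n_k \ln(n_k/\lambda_k)\right]$, which is what `poisson_chi2` computes. Everything else is an assumption made for illustration: the data are simulated rather than measured, the function names and binning are hypothetical, the expected counts include the bin width, and the sample mean and rms stand in for a numerical minimization of the likelihood.

```python
import math
import random

def split_into_subsets(samples, subset_size):
    """Cut the acquisition into consecutive, non-overlapping subsets."""
    n_subsets = len(samples) // subset_size
    return [samples[i * subset_size:(i + 1) * subset_size]
            for i in range(n_subsets)]

def histogram(samples, lo, hi, n_bins):
    """Return (bin centres, counts); samples outside [lo, hi) are dropped."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in samples:
        k = int((v - lo) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    centres = [lo + (k + 0.5) * width for k in range(n_bins)]
    return centres, counts

def expected_counts(centres, n_total, mu, sigma, width):
    """Eq. (10) scaled by the bin width (an assumption of this sketch)."""
    norm = n_total * width / (sigma * math.sqrt(2.0 * math.pi))
    return [norm * math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))
            for x in centres]

def poisson_chi2(counts, expected):
    """-2 ln(L) + constant, equal to zero when expected == counts."""
    total = 0.0
    for n, lam in zip(counts, expected):
        total += 2.0 * (lam - n)
        if n > 0:
            total += 2.0 * n * math.log(n / lam)
    return total

# Simulated stand-in for an acquisition: 4 subsets of 5000 samples.
random.seed(1)
samples = [random.gauss(0.0, 0.1) for _ in range(20000)]
lo, hi, n_bins = -0.5, 0.5, 40
for subset in split_into_subsets(samples, 5000):
    # Shortcut: sample mean and rms instead of a numerical minimization.
    mu = sum(subset) / len(subset)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in subset) / len(subset))
    centres, counts = histogram(subset, lo, hi, n_bins)
    lam = expected_counts(centres, len(subset), mu, sigma, (hi - lo) / n_bins)
    print(round(poisson_chi2(counts, lam), 1))
```

Each printed value would then be converted to a goodness-of-fit p with Eq. (7) and entered into a histogram for comparison with Figure 3.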
We then apply a model of the oscilloscope behavior to the Gaussian hypothesis, smearing the hypothesis into a shape that we could reasonably expect to observe on the oscilloscope. By comparing the hypothesis with its smeared replica, we derive the systematic uncertainty. The systematic uncertainty is then included in the calculation of the maximum likelihood and the smeared hypothesis is fit to, again, 200 subsets of 5000 noise samples. The goodness-of-fit histogram is then evaluated to determine whether or not the Gaussian hypothesis is consistent with our observations.

Analysis and results

Figure 5 shows the distribution of a million noise samples with a maximum likelihood fit. To the eye, the distribution is indistinguishable from a Gaussian, but the goodness-of-fit parameter for this fit is nearly zero. Of course, since the distribution of goodness-of-fit is flat, it is not wise to judge a hypothesis from a single fit.

Figure 5: Histogram of the full 1 Mpoint noise sample on (a) a linear scale and (b) a logarithmic scale.

Figure 6 shows the distribution of goodness-of-fit for 200 maximum likelihood fits, one applied to each 5000 sample subset of the million sample data. If the distribution were truly Gaussian, we would expect a flat distribution, like we saw for fits to simulated Gaussians in Figure 3a. Of course, we have yet to include the instrument effects, i.e., systematic uncertainty, in the analysis. Figure 7 shows results for the fits that had the largest and smallest goodness-of-fit; notice the sensitivity of the goodness-of-fit parameter to seemingly minor fluctuations.

Figure 6: Distribution of goodness-of-fit for 200 maximum likelihood fits applied to each 5000 sample subset of the million sample data.

It is unreasonable to expect any test and measurement equipment to leave no footprint whatsoever on a measurement.
Our task is to make the best estimate possible of how the equipment affects the data and then include the uncertainty in our calculation of goodness-of-fit.

Figure 8 shows the intrinsic noise of the oscilloscope. Notice that the noise distribution demonstrates the same slight asymmetry that we see on a much larger scale in the distribution of the whole data set, Figure 5a.

In estimating systematic errors, the real challenge is combining the oscilloscope data sheet information and the observed noise, Figure 8, into bin-by-bin systematic uncertainties, $\sigma_{\mathrm{sys}}(n_k)$. The oscilloscope data sheet reports the following relevant uncertainties:

Sensitivity:             2 mV at 1 V/div into 50 Ohm
Vertical gain accuracy:  ± 1.5% (1% typical) of full scale
Offset accuracy:         ± 1.5% full scale + 0.5% of offset value + 2 mV

Figure 7: Histograms of 5000 event subsets that have (a) the highest (at 0.993) and (b) lowest (at 5E-07) goodness-of-fit values.

Figure 8: The intrinsic noise of the oscilloscope used to collect the data, (a) the “trace” and (b) the distribution.

Since the uncertainties of all real-time oscilloscopes include components that scale with the dynamic range setting, we need to formulate a way to incorporate them into systematic uncertainties for use in Eq. (5). As mentioned above, T&M manufacturers report tolerances rather than 68% confidence interval standard errors, so despite the 1% typical DC Gain offset reported, we should consider something closer to a third of that.

First, consider what happens as the input signal is processed by the oscilloscope. The oscilloscope properties are convolved with the signal. Let the effect of the scope on the data, that is, its transfer function, be $T(x)$ and let the Gaussian hypothesis be $G(x)$; then, if the hypothesis is correct, we observe $h(x)$, given by

$$\int G(x-u)\,T(u)\,du = h(x). \tag{12}$$

Of course the actual oscilloscope transfer function is complicated and very difficult to measure.
While we can’t actually calculate the corrected hypothesis $h(x)$, we can estimate what the effect of the oscilloscope transfer function is on the hypothesis by smearing the hypothetical distribution with the observed oscilloscope background. The difference between the hypothesis and the smeared version provides a scale for the systematic uncertainty,

$$\sigma_{\mathrm{sys}}(n_k) \approx f \left| G(k) - \sum_{i=1}^{K} G(k-i)\, b(i) \right|, \tag{13}$$

where $b(i)$ is the background or noise distribution of the oscilloscope and includes its transfer function.

Now we turn to the scope data sheet. The vertical gain accuracy scales with the dynamic range, so in evaluating Eq. (13) we rescale $b(i)$ so that it results in about a 0.5% bin-to-bin smearing and, based (however loosely) on the assumption that the manufacturer’s “typical” uncertainty is about three times a standard error, set $f = 0.003$. We then combine $\sigma_{\mathrm{sys}}(n_k)$ and $\sigma_{\mathrm{stat}}(n_k)$ with Eq. (5) and refit the data.

Figure 9 shows the resulting goodness-of-fit distribution; it is the same as in Figure 6 but now with the complete uncertainty. Overall the distribution is quite flat, though there is still a small handful of fits that produced low goodness-of-fit values. Comparing with the simulations in Figure 3b, only the smallest variations from the tested hypothesis are likely to result in this distribution. There are two possibilities: we have not fully characterized the oscilloscope transfer function in our systematic uncertainties and/or there are tiny non-Gaussian variations in the output signal.

Figure 9: Distribution of goodness-of-fit for 200 maximum likelihood fits applied to each 5000 sample subset of the million sample data but now including the systematic error.

Overall, it is reasonable to conclude that the voltage distribution of the NoiseCom 1108A is overwhelmingly, if not completely, pure Gaussian.
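The smearing step can be illustrated with a short Python sketch. It follows one reading of Eq. (13), in which f multiplies the magnitude of the difference between the hypothesis and its smeared replica, and then combines the result with the statistical uncertainty as in Eq. (5). The function names, the three-point kernel centred on zero offset, and the per-bin values are hypothetical; a measured oscilloscope background would take the place of the kernel in practice.

```python
import math

def smear(hypothesis, kernel):
    """Discrete convolution of per-bin expected counts with a kernel.

    The kernel has odd length and is centred on its middle element,
    which is a layout chosen for this sketch."""
    half = len(kernel) // 2
    smeared = []
    for k in range(len(hypothesis)):
        total = 0.0
        for j, weight in enumerate(kernel):
            src = k - (j - half)  # bin that leaks into bin k
            if 0 <= src < len(hypothesis):
                total += hypothesis[src] * weight
        smeared.append(total)
    return smeared

def systematic_uncertainty(hypothesis, kernel, f):
    """One reading of Eq. (13): f * |G(k) - smeared G(k)| for each bin."""
    smeared = smear(hypothesis, kernel)
    return [f * abs(g - s) for g, s in zip(hypothesis, smeared)]

def total_uncertainty(n_k, sigma_sys):
    """Eq. (5): systematic and statistical parts added in quadrature."""
    return math.sqrt(sigma_sys ** 2 + n_k)

# Hypothetical per-bin hypothesis values and a kernel that leaks about
# 0.5% of each bin into its two neighbours.
hypothesis = [5.0, 60.0, 400.0, 1100.0, 1400.0, 1100.0, 400.0, 60.0, 5.0]
kernel = [0.0025, 0.995, 0.0025]
f = 0.003

sys_err = systematic_uncertainty(hypothesis, kernel, f)
for g, s in zip(hypothesis, sys_err):
    # The hypothesis value stands in for the observed count n_k here.
    print(g, s, total_uncertainty(g, s))
```

With a kernel that is a single unit spike the smeared hypothesis equals the original, so the systematic term vanishes and Eq. (5) reduces to the purely statistical Eq. (4).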
Gaussian Nature, Large Crest Factor, and Output Power

Crest factor, given by

$$CF = \frac{V_{\mathrm{peak}}}{V_{\mathrm{rms}}} = \frac{\tfrac{1}{2}\, V_{\mathrm{peak\text{-}to\text{-}peak}}}{V_{\mathrm{rms}}}, \tag{14}$$

indicates the breadth of the tails of a distribution. The crest factor indicates whether or not a noise source can probe the low Bit Error Ratios, typically 1E-12, required for compliance tests in serial-data applications. In practice, few engineers ever actually make measurements to such low BERs; acquisition of some 1E13 bits is required to make an accurate measurement that low. Instead, clever algorithms are used to extrapolate data sets of several million bits [4] to predict the BER and total jitter defined at a bit error ratio. The accuracy of these algorithms is much more dependent on the Gaussian structure than on a large crest factor. However, the study of rarely occurring errors does require a large crest factor.

Fortunately, large crest factor and Gaussian structure go hand-in-hand. The problem occurs when an application requires high output power. Power is increased by amplifying the source. Ideally, the amplifiers would have no effect on crest factor or Gaussian structure, but even state-of-the-art amplifiers introduce nonlinearities that clip the crest factor and distort the shape of the distribution.

Figure 10 through Figure 13 show the evolution of effects as a source is amplified. At -10 dBm, the source has a large crest factor and nice Gaussian shape, but both properties decay as the power is increased until, at +10 dBm, the distribution is clearly non-Gaussian and the crest factor has been reduced by at least 2 dB.

Figure 10: -10 dBm noise source has a large crest factor, 16 dB, and Gaussian shape.

Figure 11: -5 dBm noise source exhibits a slightly reduced crest factor and some asymmetry in the distribution.

Figure 12: +5 dBm noise source exhibits a reduced crest factor, 14 dB, and a distribution that is not consistent with a Gaussian.
Figure 13: At +10 dBm amplifier nonlinearities have destroyed the Gaussian structure of the source.

Conclusion

The Gaussian character of noise sources is a key attribute in their application to stress tests of receivers. Most of these tests require tolerance to a combination of random and deterministic jitter. Unless the complete bathtub plot, i.e., the bit error ratio measured as a function of sampling point delay, is measured all the way down to bit error ratios below 1E-12 (a measurement that takes many hours), total jitter measurements performed on oscilloscopes and time interval analyzers, as well as the fast jitter analysis tools on bit error ratio testers, rely heavily on the assumption that the source is Gaussian.

In this paper we have presented the first complete characterization of the Gaussian nature of an industry-standard noise source. The paper began with a tutorial on statistical analysis and hypothesis testing. It was shown that the least chi-squared and maximum likelihood methods of fitting hypotheses to data are superior to the commonly used least-squares technique: they allow for calculation of the goodness-of-fit parameter and uncertainties in the parameters determined by the fit. We hope that our tutorial is sufficient to enable users of Gaussian noise sources to perform their own evaluations.

We followed the statistical tutorial with a complete analysis of an industry-standard noise source, the NoiseCom 1108A. The analysis demonstrated that the source is consistent with a pure Gaussian. The tiny deviations we observed could be a remnant of the transfer function of the oscilloscope used to acquire the data.

Finally, we compared noise sources of different power and crest factor. Large crest factors are needed to study rare events. Fortunately, when using a quality noise source, there is no need to sacrifice Gaussian nature for large crest factor. Unfortunately, there is a tradeoff between output power on the one hand and Gaussian nature and crest factor on the other.
Even the finest amplifiers available distort the noise distribution and clip the crest factor.

[1] Ransom Stephens, The Rules of Jitter Analysis, published in AnalogZone, www.analogzone.com/nett0927.pdf, 2004.
[2] See any standard probability and statistics text, for example, Anthony J. Hayter, Probability and Statistics For Engineers and Scientists, 2nd ed. (Brooks/Cole Publishing, 2002); a remarkably complete, if terse, description is available from the Lawrence Berkeley Particle Data Group web site: G. Cowan et al., Statistics, pdg.lbl.gov/2007/reviews/statrpp.pdf, 2007.
[3] William H. Press et al., Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. (Cambridge University Press, New York, 1997).
[4] Dennis Derickson and Marcus Mueller (editors), Digital Communications Test and Measurement (Prentice Hall, Boston, 2007).