The arithmetic mean is the "standard" average, often simply called the "mean".

The standard deviation (SD) quantifies variability. If the data follow a bell-shaped Gaussian distribution, then about 68% of the values lie within one SD of the mean (on either side) and about 95% of the values lie within two SD of the mean. The SD is expressed in the same units as your data.

To apply a significance test, a hypothesis must be clearly stated and must have a quantity with a calculated probability associated with it. This is the fundamental difference between a hunch and a hypothesis test: a quantity and a probability. The hypothesis will be accepted or rejected on the basis of a comparison of the calculated quantity with a table of values relating to a normal distribution. As with the confidence interval, the analyst selects an associated level of certainty, typically 95%. The starting hypothesis takes the form of the null hypothesis.

What is a null hypothesis?

The null hypothesis is stated in such a way as to say that there is no difference between the calculated quantity and the expected quantity, save that attributable to normal random error. With regard to the outlier in question, the null hypothesis for the chemist and the trainee states that the 11.0% value is not an outlier and that any difference between the calculated and expected value can be attributed to normal random error.

The P value is a probability, with a value ranging from zero to one. If the P value is small, you'll conclude that the difference is unlikely to be a coincidence:

P < 0.05: "significant"
P > 0.05: "not significant"

t test

Another hypothesis test used in forensic chemistry is one that compares the means of two data sets. In the supervisor–trainee example, the two chemists are analyzing the same unknown, but obtain different means. The t-test of means can be used to determine whether the difference of the means is significant. The t-value is the same as that used for determining confidence intervals.
This makes sense; the goal of the t-test of means is to determine whether the spread of two sets of data overlap sufficiently for one to conclude that they are or are not representative of the same population. In the supervisor–trainee example, the null hypothesis could be stated as "The mean obtained by the trainee is not significantly different than the mean obtained by the supervisor at the 95% confidence level." Stated another way, the means are the same and any difference between them is due to small random errors.

Table : Bullets and fragments received by the FBI.

Specimen        Description                                                       Total weight, grains   Total weight, mg
CE 399 (Q1)     Bullet from stretcher (lead core plus jacket)                     158.6                  10,277
CE 567 (Q2)     Bullet fragment from seat cushion (lead core plus brass jacket)   44.6                   2,890
CE 569 (Q3)     Bullet fragment from front seat (jacket)                          21.0                   1,361
CE 843 (Q4,5)   Two lead fragments from President's head[2]                       1.65; 0.15             107; 9.7
CE 842 (Q9)     Three lead fragments from Connally's arm                          0.5                    32
CE 840 (Q14)    Three lead fragments from rear carpet                             0.9, 0.7, 0.7          58, 45, 45
CE 841 (Q15)    Scraping from inside surface of windshield                        None listed

Table : Individual determinations of antimony in the FBI's Run 4

Specimen   Weight of subfragment, mg   Sb, ppm
Q1         7.16                        643
           4.20                        636
           1.79                        750
           1.24                        749
           1.16*                       749
           15.55 (total)               705±60* (mean ± SD)

Q9         1.92                        690
           2.07                        662
           1.34                        677
           5.33 (total)                676±14 (mean ± SD)

Q2         39.75                       521
           21.60                       521
           3.84                        578
           3.68                        515
           68.87 (total)               534±30 (mean ± SD)

Table : The FBI's results for silver and antimony in bullets and fragments (concentrations in ppm).

Specimen    Q1        Q9        Q2       Q4,5     Q14
Ag          9.4±0.3   9.2±0.1   7.9±0.9  8.5±0.4  8.5±0.2
Sb Run 1    945±16    977±24    745±16   783±5    793±10
Sb Run 2    1002±13   1090±37   747±20   858±46   879±33
Sb Run 3    813±43    773±22    626±57   614±37   629±18
Sb Run 4    705±54    676±14    534±30   561±32   562±21
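As an illustration of the t-test of means described above, the following sketch applies it to the individual Run 4 antimony determinations for Q1, Q9, and Q2. It is a minimal example under stated assumptions, not the FBI's or any particular analyst's procedure: it assumes the pooled-variance (equal-variance) form of the test, takes the two-tailed 95% critical values from a standard t-table, and uses names (RUN4_SB, T_CRIT_95, pooled_t, compare) that are hypothetical and chosen only for this example.

```python
from math import sqrt
from statistics import mean, stdev

# Antimony (ppm) in each subfragment, copied from the Run 4 table above.
RUN4_SB = {
    "Q1": [643, 636, 750, 749, 749],
    "Q9": [690, 662, 677],
    "Q2": [521, 521, 578, 515],
}

# Two-tailed critical t values at the 95% level from a standard t-table,
# keyed by degrees of freedom (only the two values needed here).
T_CRIT_95 = {6: 2.447, 7: 2.365}


def pooled_t(a, b):
    """Return (t, degrees of freedom) for a pooled-variance t-test of means.

    Assumption: both samples share a common variance.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


def compare(name_a, name_b):
    """Test the null hypothesis 'the two means are the same' at the 95% level."""
    t, df = pooled_t(RUN4_SB[name_a], RUN4_SB[name_b])
    reject = abs(t) > T_CRIT_95[df]
    return t, df, reject


# Mean and sample SD of each specimen, for comparison with the table.
for name, values in RUN4_SB.items():
    print(f"{name}: mean = {mean(values):.0f} ppm, SD = {stdev(values):.0f} ppm")

# Compare Q1 against the other two specimens.
for pair in [("Q1", "Q9"), ("Q1", "Q2")]:
    t, df, reject = compare(*pair)
    verdict = "reject" if reject else "retain"
    print(f"{pair[0]} vs {pair[1]}: t = {t:.2f}, df = {df}, {verdict} null hypothesis")
```

Under these assumptions, the Q1 and Q9 means give a t of roughly 0.8, well below the critical value, so the null hypothesis is retained. The Q1 and Q2 means give a t of roughly 5.2, which exceeds the critical value, so the null hypothesis is rejected. The samples are very small and their SDs differ considerably (60 versus 14 ppm for Q1 and Q9), so the equal-variance assumption is questionable; an unequal-variance (Welch) test would be a reasonable alternative.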