Data Domains and Introduction to Statistics
Chemistry 243

Instrumental methods and what they measure
- Electromagnetic methods
- Electrical methods

Instruments are translators
Convert physical or chemical properties that we cannot directly observe into information that we can interpret.

\[ T = \frac{P}{P_0} \qquad A = \varepsilon b c = -\log T = -\log\frac{P}{P_0} \]

Sometimes multiple translations are needed
- Thermometer: a bimetallic coil converts temperature to physical displacement; a scale converts the angle of the pointer to an observable value of meaning.
- Thermostat: the displacement is used to activate a switch.
Adapted from C.G. Enke, The Art and Science of Chemical Analysis, 2001.
Image sources:
http://upload.wikimedia.org/wikipedia/commons/d/d2/Bimetaal.jpg
http://upload.wikimedia.org/wikipedia/commons/2/26/Bimetal_coil_reacts_to_lighter.gif
http://static.howstuffworks.com/gif/home-thermostat-thermometer.jpg

Components in translation

Data domains
Information is encoded and transferred between domains.
- Non-electrical domains: beginning and end of a measurement
- Electrical domains: intermediate data collection and processing

Data domains
- Initial conversion device
- Intermediate conversion device
- Readout conversion device (often viewed on a GUI, graphical user interface)
Example chain: PMT → resistor → digital voltmeter

Electrical domains
- Analog signals: magnitude of voltage, current, charge, or power; continuous in both amplitude and time
- Time-domain signals: time relationship of signal fluctuations (not amplitudes); frequency, pulse width, phase
- Digital information: data encoded in only two discrete levels; a simplification for transmission and storage of information, which can be recombined with great accuracy and precision; the heart of modern electronics

Digital and analog signals
- Analog signals: magnitude of voltage, current, charge, or power; continuous in both amplitude and time
- Digital information: data encoded in only discrete levels

Analog-to-digital conversion
Limited by the bit resolution of the ADC:
- A 4-bit card has $2^4 = 16$ discrete binary levels
- An 8-bit card has $2^8 = 256$ discrete binary levels
- A 32-bit card has $2^{32} = 4{,}294{,}967{,}296$ discrete binary levels (common today)
Maximum resolution comes from full use of the ADC voltage range.
Trade-offs: more bits is usually slower and more expensive.
K.A. Rubinson, J.F. Rubinson, Contemporary Instrumental Analysis, 2000.

Byte prefixes
- kilo: about 1000
- mega: about a million
- giga: about a billion

Serial and parallel binary encoding
- Slow: not digital; outdated (serial)
- Fast: between instruments; "serial-coded binary" data
- Binary parallel, very fast: within an instrument; "parallel digital" data

Introductory statistics
Statistical handling of data is incredibly important because it gives the data significance. The ability or inability to definitively state that two values are statistically different has profound ramifications in data interpretation. Measurements are not absolute, and robust methods for establishing run-to-run reproducibility and instrument-to-instrument variability are essential.

Introductory statistics: Mean, median, and mode
Population mean ($\mu$): average value of replicate data

\[ \mu = \lim_{N \to \infty} \frac{\sum_{i=1}^{N} x_i}{N} = \lim_{N \to \infty} \frac{x_1 + x_2 + x_3 + \dots + x_N}{N} \]

Median ($\mu_{1/2}$): ½ of the observations are greater; ½ are less
Mode ($\mu_{md}$): most probable value
For a symmetrical distribution: $\mu_{1/2} = \mu_{md} = \mu$
Real distributions are rarely perfectly symmetrical.

Statistical distribution
Often follows a Gaussian functional form.

Introductory statistics: Standard deviation and variance
Standard deviation ($\sigma$):

\[ \sigma = \sqrt{\lim_{N \to \infty} \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]

Variance ($\sigma^2$):

\[ \sigma^2 = \lim_{N \to \infty} \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]

Gaussian distribution
Common distribution with well-defined statistics:

\[ y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / 2\sigma^2} \]

- 68.3% of data is within $1\sigma$ of the mean
- 95.5% at $2\sigma$
- 99.7% at $3\sigma$

Statistical distribution
50 Abs measurements of an identical sample (Table a1-1, Skoog). Let's go to Excel.

But no one has an infinite data set …
Sample mean, sample standard deviation, and sample variance:

\[ \bar{x} = \frac{\sum_{i=1}^{N} x_i}{N} \qquad s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}} \qquad s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1} \]

Standard deviation and variance, continued
$s$ is a measure of precision (magnitude of indeterminate error). Variances from independent sources add:

\[ s_{total}^2 = s_1^2 + s_2^2 + s_3^2 + \dots + s_n^2 \]

Other useful definitions:
Standard error of the mean:

\[ s_m = \frac{s}{\sqrt{N}} \]

Confidence intervals
In most situations $\mu$ cannot be determined, because that would require an infinite number of measurements. Statistically, we can establish a confidence interval around $\bar{x}$ in which $\mu$ is expected to lie with a certain level of probability.

Calculating confidence intervals
We cannot absolutely determine $\sigma$, so when $s$ is not a good estimate of $\sigma$ (small number of samples) use:

\[ \text{CI for } \mu = \bar{x} \pm \frac{t s}{\sqrt{N}} \]

Note that $t$ approaches $z$ as $N$ increases.
[Table: two-sided t values]

Example of confidence interval determination for a smaller number of samples
Given the following values for serum carcinoembryonic antigen (CEA) measurements, determine the 95% confidence interval:
16.9 ng/mL, 12.7 ng/mL, 15.3 ng/mL, 17.2 ng/mL
Sample mean = 15.525 ng/mL
s = 2.059733 ng/mL
With N = 4 there are N − 1 = 3 degrees of freedom, so t = 3.18 at the 95% level.
Answer: 15.525 ± 3.277 ng/mL, but when you consider sig figs you get: 16 ± 3 ng/mL

Propagation of errors
How do errors at each step contribute to the final result?

\[ x = f(p, q, r, \dots) \]
\[ dx_i = f(dp_i, dq_i, dr_i, \dots) \]
\[ dx = \left(\frac{\partial x}{\partial p}\right)_v dp + \left(\frac{\partial x}{\partial q}\right)_v dq + \left(\frac{\partial x}{\partial r}\right)_v dr + \dots \]
\[ s_x^2 = \left(\frac{\partial x}{\partial p}\right)^2 s_p^2 + \left(\frac{\partial x}{\partial q}\right)^2 s_q^2 + \left(\frac{\partial x}{\partial r}\right)^2 s_r^2 + \dots \]
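
The propagation-of-errors rule can be tried out numerically. The sketch below is one possible illustration, not a prescribed method: it estimates each partial derivative with a central finite difference and sums the squared contributions. It uses the absorbance relation A = −log(P/P0) from earlier in these notes. The function names `propagate` and `absorbance`, the power readings (P = 50, P0 = 100) and their standard deviations (0.5 each) are all hypothetical values chosen for illustration.

```python
import math


def propagate(f, values, sigmas, rel_step=1e-6):
    """Estimate s_x for x = f(p, q, r, ...) from independent s_p, s_q, s_r, ...

    Each partial derivative is approximated by a central finite difference,
    then s_x^2 = sum over variables of (df/dv)^2 * s_v^2.
    """
    variance = 0.0
    for i, (v, s) in enumerate(zip(values, sigmas)):
        h = abs(v) * rel_step or rel_step  # fall back to rel_step when v == 0
        up = list(values)
        down = list(values)
        up[i] = v + h
        down[i] = v - h
        dfdv = (f(*up) - f(*down)) / (2 * h)
        variance += (dfdv * s) ** 2
    return math.sqrt(variance)


def absorbance(p, p0):
    """A = -log10(P / P0)."""
    return -math.log10(p / p0)


# Hypothetical readings (illustration only): P = 50, P0 = 100, each with s = 0.5
P, P0 = 50.0, 100.0
s_P, s_P0 = 0.5, 0.5

s_A_numeric = propagate(absorbance, [P, P0], [s_P, s_P0])

# Analytical check: dA/dP = -1/(P ln 10), dA/dP0 = 1/(P0 ln 10)
s_A_exact = math.sqrt((s_P / (P * math.log(10))) ** 2
                      + (s_P0 / (P0 * math.log(10))) ** 2)

print(f"A = {absorbance(P, P0):.4f}")
print(f"s_A (numeric) = {s_A_numeric:.6f}")
print(f"s_A (exact)   = {s_A_exact:.6f}")
```

The numerical and analytical results should agree closely, which is a useful sanity check when a function is too awkward to differentiate by hand. The rule assumes the errors in p, q, r, … are independent.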
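
The CEA confidence-interval example can also be reproduced in a few lines. This is a minimal sketch using only the Python standard library. The small dictionary `T_95` of two-sided 95% Student's t values, keyed by degrees of freedom (N − 1), is hard-coded here as an assumption in place of the t table in the notes, and `confidence_interval_95` is a hypothetical helper name.

```python
import math
import statistics

# Two-sided 95% Student's t critical values, keyed by degrees of freedom (N - 1).
# Hard-coded from a standard t table; extend as needed.
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def confidence_interval_95(values):
    """Return (mean, s, half_width) so that the 95% CI is mean +/- half_width."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)        # sample standard deviation (N - 1 denominator)
    s_m = s / math.sqrt(n)              # standard error of the mean
    half_width = T_95[n - 1] * s_m      # t * s / sqrt(N)
    return mean, s, half_width


cea = [16.9, 12.7, 15.3, 17.2]          # ng/mL, from the example
mean, s, half_width = confidence_interval_95(cea)
print(f"mean = {mean:.3f} ng/mL, s = {s:.6f} ng/mL")
print(f"95% CI: {mean:.3f} +/- {half_width:.3f} ng/mL")
```

With four measurements there are three degrees of freedom, so the sketch uses t = 3.182 and gives a half-width of about 3.28 ng/mL, which rounds to 16 ± 3 ng/mL once significant figures are considered.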
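
Returning to analog-to-digital conversion, the sketch below illustrates how bit depth sets the number of discrete levels and why maximum resolution comes from full use of the ADC voltage range. It assumes an idealized ADC with uniform steps. The 0 to 10 V input range and the helper names `adc_levels`, `adc_step` and `quantize` are hypothetical, chosen only for illustration.

```python
def adc_levels(bits):
    """Number of discrete binary levels for an n-bit ADC: 2**n."""
    return 2 ** bits


def adc_step(bits, v_min, v_max):
    """Smallest voltage change an idealized ADC can resolve over its input range."""
    return (v_max - v_min) / adc_levels(bits)


def quantize(voltage, bits, v_min, v_max):
    """Map an analog voltage to an integer code, clamped to the valid code range."""
    code = int((voltage - v_min) / adc_step(bits, v_min, v_max))
    return min(max(code, 0), adc_levels(bits) - 1)


for bits in (4, 8, 32):
    print(f"{bits}-bit: {adc_levels(bits):,} levels")

# Assumed 0-10 V input range (illustration only)
print(f"8-bit step over 0-10 V: {adc_step(8, 0.0, 10.0)} V")

# A signal that only spans 0-1 V uses a small fraction of the available codes
codes_used = quantize(1.0, 8, 0.0, 10.0) + 1
print(f"codes used by a 0-1 V signal on an 8-bit, 0-10 V ADC: {codes_used} of {adc_levels(8)}")
```

In this idealized picture, a signal that fills only a tenth of the input range is effectively digitized with far fewer levels than the card offers, which is why matching the signal to the ADC range (for example by amplification) matters.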