SIVVU™

Dr. Douglas Vander Griend
Michael DeVries
Department of Chemistry & Biochemistry
Calvin College
Grand Rapids, MI 49546-4403
Contact: Sivvu@calvin.edu

Outline

I. Getting Started
    A. Overview
    B. Key Features
    C. Limitations
    D. Loading the Program
II. The Sivvu™ Data Input Graphical User Interface
    A. Data File
    B. Inspection Buttons: Preview & Factors
    C. Building a Chemical Model
        i. Chemical Species
        ii. Solvent
        iii. Activity Coefficient Model
        iv. Initial Spectra Guesses
        v. Mass Balance and Chemical Reaction Equations
        vi. Simulation Button
    D. Process Buttons: Save, Optimize, Copy, & Cancel
III. The Sivvu™ Control Graphical User Interface
    A. User Control Buttons: Optimize ΔG° Values, Modify Model, Save ΔG° Values, Calculate Error, Write Report & Copy Page
    B. Plots: Absorbance, Concentration, Absorptivity, Solution Fits, Residual, Residual Factors
    C. Ignore Solutions
IV. Calculations
    A. Unrestricted Model Calculations
    B. Equilibrium Restricted Model Calculations
    C. Error Calculations
Appendix 1. Data for Solvents

I. Getting Started

A. Overview

Sivvu™: the name says it all (UV-vis backwards). In a UV-vis spectrometer, molecular species achieve chemical equilibrium and absorb light according to Beer’s Law. This program is designed to start with absorbance data for a series of solutions and ultimately extract the thermodynamic information that describes the chemical equilibria of the solutions, as well as the molar absorptivity values of all the individual species in solution.

How does it work? The user must provide a set of chemical reactions that potentially govern all the equilibria in the system, along with the raw absorbance data and information on the makeup of each solution. Starting from initial guesses for the ΔG°rxn values, Sivvu™ refines them in order to best model the entire dataset, all in a matter of seconds.

B.
Key Features

- Easy to load data: one Excel-based input file
- Easily transportable results: Excel files and clipboard images
- Models raw absorbance or effective absorptivity
- Graphical user interfaces
- Extensive ‘model-free’ analysis to aid in deconvolution of data
- Self-regulation of thermodynamic model construction
- Ready simulation to aid in experimental design
- High-speed analysis: full optimizations in seconds
- Calculates activity coefficients according to various literature models
- Calculates standard deviation and confidence limits for thermodynamic parameters
- Estimates dependent and independent sensitivity of optimized ΔG°rxn values
- Numerous inquiry buttons to help guide new users

C. Limitations

Sivvu™ is a powerful tool for studying solution-phase thermodynamics and for obtaining information about pure chemical compounds without the need to isolate them chemically. While a great deal of information can be extracted from a dataset by modeling it, the calculations are not magic. The program can only extract information as it exists in the dataset, based on the mathematical relationships of chemical equilibria and Beer’s Law.

Furthermore, the amount of information about a particular species is proportional to its total level of occurrence within the set of solutions. Therefore, if a species only exists at a very low level in all the solutions, then the uncertainty of the molar absorptivity values for that species must be understood to be quite large. For a thorough treatment of the power and limitations of such a method, please see Malinowski’s Factor Analysis in Chemistry.[1]

Additionally, the resolution between two species may be poor within the dataset, particularly if their chemical or spectroscopic behaviors are comparable. For example, if two species grow in comparably over the course of a titration, their molar absorptivity values may be irresolvable. This can be true of the reactant and product in dimerization reactions, for example.
Finally, since the dependence of equilibrium concentration values on the equilibrium constant is non-linear, there are regimes where concentration values, and therefore absorbance values, are insensitive to the equilibrium constant:
1. When the equilibrium constant or the initial concentrations are too large.
2. When the amount of product is too small (< 20%) or too large (> 80%).
In these regimes the equilibrium constants cannot be modeled precisely, because the data are too insensitive to them.[2]

D. Loading the Program

Matlab must already be installed. Four files are generally included with the program:
1. ‘Sivvu.zip’
2. ‘Sivvu Manual.doc’
3. ‘example.xls’
4. ‘example.mat’ (this file may not display the ‘.mat’ extension, but it will appear when viewed from within Matlab).

Unzip the main folder into Matlab’s toolbox folder, which is where all m-files for Matlab are typically grouped and stored. Place the example Excel and Matlab files into Matlab’s working directory, which by default is called ‘work’ but can be changed from within the Matlab environment. Note: the data file must be in whichever folder is designated as the working directory.

From within the Matlab environment, select ‘set path’ under the file menu to bring up a pathway window. If a previous version of Sivvu™ has been installed, remove its folder from the path, then add the newly unzipped folder to the path. This will ensure that all calls to m-files within Sivvu™ find the proper versions.

To initialize Sivvu™, type ‘Sivvu’ into the command line of Matlab, then load the data file by typing the name of the file into the appropriate box in the data input GUI that appears. This can also be done in one step by typing Sivvu('filename') in the command line.
Example: Nickel(II) with 2,2’-bipyridine

The example file provided is based on work detailed in the paper "Detailed Spectroscopic, Thermodynamic and Kinetic Characterization of Nickel(II) Complexes with 2,2’-bipyridine and 1,10-Phenanthroline attained via Equilibrium-Restricted Factor Analysis," Vander Griend, D. A.; Bediako, D. K.; DeVries, M. J.; DeJong, N. A.; Heeringa, L. P., Inorg. Chem. 2008, 47, 656-662. To view it, make sure that both example files are in the working directory of Matlab, and that the Sivvu m-files have been properly installed in the Matlab toolbox. Then enter Sivvu('example') into the command line. After a few seconds, the interface window should appear with all the appropriate entries completed.

II. The Sivvu™ Data Input Graphical User Interface

The initial interface of the program allows the user to upload a data file, graph the data, examine the mathematical structure of the data, build a thermodynamic model, and simulate the effects of dilution/concentration or temperature change on the experiment. Click on the ‘?’ buttons to find brief explanations of the interface.

A. The Data File

Sivvu™ requires exactly one input file containing all relevant experimental data. It must be the first worksheet of a Microsoft Excel spreadsheet (.xls), organized as shown in Figure 1. A typical dataset will be a set of absorbance curves from a spectrophotometric titration, along with the composition of the solution for each curve. However, any set of curves where the signal results from the additive effect of one or more chemical species at equilibrium can potentially be modeled.

Besides the major block of data, there are four words that Sivvu™ will recognize in the first column of the header rows of the spreadsheet:
1. Comment: indicates a row that will be totally ignored. It can be used multiple times.
2. Pathlength: Sivvu™ expects a number in the corresponding cell of column 2, which is the experimental pathlength in units of cm.
If this line is not included, the assumed pathlength is 1 cm. The user may also input a distinct pathlength for each solution.
3. Solvent: Sivvu™ takes the contents of the corresponding cell of column 2 as the solvent name. If the solvent is not recognized by the program, Sivvu™ will ask for its density and dielectric constant, which are used for any activity coefficient calculations.
4. Temperature: Sivvu™ takes the number in the corresponding cell of column 2 as the experimental temperature in Kelvin. If this line is not included, the assumed temperature is 298 K.

The header rows may be placed in any order in the spreadsheet, but must precede the data.

Comment:          This line is optional, and is ignored by the program.
Pathlength (cm):  1           1           0.1         0.1         0.1
Solvent:          toluene
Temperature (K):  298
M++               0.1         0.1         0.1         0.1         0.1
Ligand            0           0.04        0.06        0.5         0.7
BF4-              0.2         0.2         0.2         0.2         0.2
900               0.04369301  0.0470115   0.05090182  0.09089655  0.10069366
899               0.04439201  0.05059578  0.05434557  0.09110519  0.10145286
898               0.04186945  0.04922064  0.05312673  0.09252504  0.10289368
897               0.04229023  0.04666939  0.04944631  0.088114    0.10079306
896               0.03923234  0.04789623  0.04886965  0.08724912  0.0996298
…                 …           …           …           …           …

900               1           1           1           1           1
899               1           1           1           1           1
898               1           1           1           1           1
897               1           1           1           1           1
…                 …           …           …           …           …

Figure 1. Archetypal Excel Spreadsheet Input File for Sivvu™

After the header information, the first column must list reagent names followed by values for the independent axis for the absorbance data. Each subsequent column corresponds to one chemical solution. The first rows contain the composition, followed by the raw absorbance data. Figure 1 shows an example where the metal ion concentration is constant at 0.1 M for each solution, while the ligand concentration varies from 0.0 to 0.7 M. It is recommended to have 10 solutions per equivalent of titrant and hundreds of wavelengths. It is advantageous to order the reagents from least varying downward, and then sort the compositions with the smallest equivalents to the left.
For straightforward titrations, this means the first reagent listed should be the analyte, and the first absorbance curve should be that of pure analyte.

The program is designed to use units of molarity because it is directly proportional to absorbance. Other molar units of concentration can be used, and the only substantial difference will be that the optimized ΔG°rxn values will be slightly different; the absorptivity values should then be understood to have corresponding units.

The absorbance data is listed with the energy axis in the first column. Sivvu™ will annotate it as wavelength in nanometers, but any scaling will work for the calculations. The file should be saved into a folder which will be selected as the working directory of the Matlab interface. Any references to the solutions within Sivvu™ will be to their ordinal position in the data file. The only limit to the number of solutions in a data file is the standard limit of 256 columns in an Excel worksheet. Upon loading a data file, the interface will display the temperature, pathlength, solvent, number of solutions and full wavelength range (see Figure 4).

One final set of information that can be passed to the program through the input file is a block of numbers, identical in size to the data block. This block of numbers will be multiplied entry-by-entry with the residuals before the root-mean-square residual is calculated, and therefore functions as a weighting mask for the data. If desired, this block of numbers should be placed below the data block with exactly one blank row in the spreadsheet in between. A duplicate energy scale can be placed in column A in front of the weighting mask, but it is not required. If only one row of numbers is placed below the data, then it will be used as a weighting mask as well, with a single value being used to weight all the data in a column.

B.
Inspection Buttons: Preview & Factors

Calculations can be restricted to an arbitrary energy range and step-size through the appropriate inputs on the Data Input GUI. Step-size limitations are accomplished through averaging. Sivvu™ will restrict the data and update the inputs to the closest possible values.

The ‘Preview’ button simply plots, in a new window, the absorbance data upon which all calculations will be based, along with the composition data, for the user to inspect and verify.

Figure 2. Plot of absorbance data with composition data inset on log plot.

The ‘Factors’ button will run an extensive protocol which aims to model the mathematical structure of the data without any chemical parameters. This is called model-free factor analysis. A figure with six plots will appear.

Figure 3. Model-free analysis of raw absorbance data. See text for detailed explanation.

The upper-left plot graphs the significance of the additive factors in the data (either the raw absorbance data or the effective molar absorptivity). A factor represents the portion of the data that derives from a single unique contributor. The significance of a factor derives from how much of the data can be additively reconstructed with it. The nth point in the plot is the ratio of the nth largest factor to the (n+1)th largest factor, less one. Using this formulation, the values for the factors for a set of random numbers (also plotted) will be nearly zero. Any non-random additive mathematical factor in the data will have a significance that is substantially more than that of the next most significant factor. Therefore, the number of factors in the data corresponds to the right-most value that is off the baseline. For some datasets, this graphical feature is quite distinct; for others it can only be inferred with some uncertainty.
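The significance measure described above can be illustrated with a short Python sketch. This is not Sivvu™ code: the function names, the example values, and the baseline threshold of 0.5 are all hypothetical, and the input is assumed to be a list of factor magnitudes that have already been computed.

```python
def factor_significance(factor_sizes):
    """Return (nth largest / (n+1)th largest) - 1 for each consecutive pair."""
    ranked = sorted(factor_sizes, reverse=True)
    return [ranked[i] / ranked[i + 1] - 1.0 for i in range(len(ranked) - 1)]


def count_factors(significance, baseline=0.5):
    """Position (1-based) of the right-most point above the baseline.

    The baseline value is an arbitrary, hypothetical cutoff; in practice the
    judgment is made by eye from the plot.
    """
    above = [i + 1 for i, value in enumerate(significance) if value > baseline]
    return above[-1] if above else 0


# Hypothetical factor magnitudes: three real factors followed by noise.
example = [120.0, 30.0, 6.0, 0.50, 0.46, 0.43, 0.40]
significance = factor_significance(example)
n_factors = count_factors(significance)
print([round(value, 2) for value in significance])
print(n_factors)
```

In this made-up example the first three points sit well above zero and the remaining points stay near zero, so the right-most point off the baseline is the third.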
The data used for Figure 3 clearly shows five significant factors because the 5th factor is over five times larger than the 6th, but after that all the factors are small and similar in size, so their significance is close to zero. Note: if the molar absorptivity curve of a factor is essentially, but not exactly, zero over the range of interest, its significance can be lost in the baseline.

The other two plots on the top of the figure depict the result if the same factor analysis is carried out, not on the whole dataset as in the upper-left plot, but on a subset of the solutions. By starting with the first solution, which obviously contains just 1 full factor, and adding additional solutions in sequence, the evolution of factors in the dataset can be seen (upper-middle). Likewise, the backwards evolution can be seen by starting with the last solution and working backwards (upper-right). These 2 plots can be used together to estimate the location within the dataset of particular factors, which is often a good indication of their empirical formula when compared to the composition of the solutions, which is plotted on the lower-left. Note that this evolutionary factor analysis requires that the compositions in the original data file be ordered appropriately along a single reaction coordinate.

The final two plots may take a little time to appear, because a self-consistent calculation is running to find the best non-negative, unimodal profiles to model the data (lower-middle). This is the best purely mathematical way to approximate the concentration profiles of a titration, and it will lead to a reasonable picture of the spectral signatures for each chemical species (lower-right). Remember that this is all done based solely on the structure of the data, without any restrictions pertaining to the nature of chemical reactions.

All output for these calculations is stored in a Matlab variable structure called ‘Sivvu_free’.

C.
Building a Chemical Model

The true power of the program comes in the ability to force the concentration profiles to adhere to the strictures of equilibrium for chemical reactions. This requires the input of a set of balanced chemical reactions that relate the reagents to any potential chemical species in the model.

i. Chemical Species

A list of the chemical species present in the solutions, listed as ions, is the first requirement in constructing a thermodynamic model for the data. The equilibrium concentration of each species listed here will be calculated as part of the fitting process. Therefore, in the chemical species edit box, type in a name for every species to be used in the equilibrium reactions, as well as all reagents listed in the input file. The species names are only labels and need not represent the stoichiometry of the compound. Separate the names of the species with spaces. If desired, the charge on each species may be indicated with the correct number of pluses or minuses (e.g. Ni++). This is required only for using non-trivial activity models. The maximum limit at present is 12 species.

ii. Solvent

The identity of the solvent will have been harvested from the data file if it existed there. The densities and dielectric constants for many solvents are already coded into the program (see Appendix 1), but if another solvent is used, a dialog box will pop up that allows the user to enter the density and dielectric constant of any solvent desired.

iii. Activity Coefficient Model

An activity coefficient, γ, is the proportionality constant between concentration (ideal activity) and true chemical activity, which is used in calculating equilibrium constants. There are five activity coefficient models to choose from in Sivvu™, each of which uses a different equation to calculate the activity coefficients. The first model is ‘none’, which simply assigns unity to all the activity coefficients.
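To give a feel for how the model choice affects the result, here is a minimal Python sketch of the five options. It assumes the standard literature forms of the Davies, extended Debye-Hückel, Güntelberg and Scatchard equations; it is not taken from the Sivvu™ source. The function name, the model labels, and the default values of A (0.509) and B (0.329, for radii in ångströms) are assumptions chosen as typical values for water near 298 K.

```python
import math


def log10_gamma(model, charge, ionic_strength, A=0.509, B=0.329, a0=0.0):
    """Return log10 of the activity coefficient for an ion of the given charge.

    A and B default to typical water values near 298 K (an assumption);
    a0 is the ion-size parameter, used only by the 'huckel' model.
    """
    root_i = math.sqrt(ionic_strength)
    if model == "none":
        return 0.0  # activity coefficient of exactly one
    if model == "davies":
        term = root_i / (1.0 + root_i) - 0.3 * ionic_strength
    elif model == "huckel":
        term = root_i / (1.0 + B * a0 * root_i)
    elif model == "guntelberg":
        term = root_i / (1.0 + root_i)
    elif model == "scatchard":
        term = root_i / (1.0 + 1.5 * root_i)
    else:
        raise ValueError("unknown model: " + model)
    return -A * charge ** 2 * term


# Hypothetical case: a 2+ ion with a0 = 6 at an ionic strength of 0.1 M.
models = ["none", "davies", "huckel", "guntelberg", "scatchard"]
gammas = {m: 10 ** log10_gamma(m, 2, 0.1, a0=6.0) for m in models}
for name in models:
    print(name, round(gammas[name], 3))
```

For a doubly charged ion at this ionic strength, every non-trivial model gives an activity coefficient well below one, which is why the charges must be entered with the species names whenever such a model is selected.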
In the equations that follow, the variables A, B, Z, I, and a0 are used. Z is the charge vector. I is the ionic strength of a solution. A and B are calculated at a given temperature, T, using the density, d, and dielectric constant, ε, of the solvent. Their equations are:

$$A = \frac{1823928\,\sqrt{d}}{(\varepsilon T)^{3/2}} \qquad B = \frac{50.3}{\sqrt{\varepsilon T}}$$

The equations for the other models are as follows.

Davies:
$$\log \gamma = -A Z^{2}\left(\frac{\sqrt{I}}{1+\sqrt{I}} - 0.3\,I\right)$$

Hückel:
$$\log \gamma = -A Z^{2}\,\frac{\sqrt{I}}{1 + B a_{0}\sqrt{I}}$$

Guntelberg:
$$\log \gamma = -A Z^{2}\,\frac{\sqrt{I}}{1+\sqrt{I}}$$

Scatchard:
$$\log \gamma = -A Z^{2}\,\frac{\sqrt{I}}{1+1.5\sqrt{I}}$$

The a0 values, which correspond to ionic radii in solution, are taken from the ‘Azero’ edit box. The Hückel model is the only one which uses a0 values, so if any other model is selected, the edit box for the a0 values will be disabled and filled in with a zero vector of the correct length. Non-zero a0 values are only used for charged species. Some common a0 values are:

Ion   Co2+   Ni2+   Cl-   Br-   BF4-   NEt4+   Li+
a0    6      6      3     3     6      6       6

iv. Initial Spectra Guesses

The initial spectra box takes three different types of values. Each number in this vector corresponds to a respective chemical species.
- If a species does not absorb, enter a zero into the box for it.
- Any positive number indicates that the species absorbs and that its spectrum is to be refined by Sivvu™.
- If the species absorbs and there is a solution amidst the data that corresponds precisely to the curve of that species (at any concentration), enter the number of the solution, but as a negative value.

This last feature allows the user to enter curves of species with known molar absorptivity values. Sivvu™ will use them in the refinement, but will not change the molar absorptivity values or the concentrations of that solution. The program assumes that any unrefined solution consists of exactly one absorbing species, and a warning message will appear if the input file is not consistent with this, but the optimization can still be carried out.

Figure 4. Sivvu™ Data Input GUI

v.
Mass Balance and Reaction Vectors

After the chemical species box is filled in, the appropriate number of lines for the mass balance equations and chemical reaction equations will be enabled. The total number of equations needed (mass balance plus reaction) is equal to the number of chemical species.

A mass balance equation simply defines the reagent constitution of each chemical species. One mass balance equation is required for each reagent listed in the original data file. Therefore, a mass balance vector edit box will be enabled with the name of a reagent in front of it. A vector with the same length as the chemical species list must be entered, with each value indicating the number of reagent molecules in the corresponding chemical species. For example, if the list of chemical species was ‘BF4- Cl- Fe+++ FeCl++ FeCl2+’, then the mass balance vector for the reagent Cl- would be ‘0 1 0 1 2’.

After these vectors are filled in, the reaction vectors remain. This is how chemical reactions are encoded. For each, a vector with the same length as the chemical species list must be entered, with each value indicating the stoichiometry of the corresponding chemical species in an arbitrary chemical reaction. A positive number in the vector indicates a product in the reaction, while a negative number indicates a reactant. For example, if the species list was ‘BF4- Li+ Br- LiBr Py NiBr4-- NiBr2Py2 NiPy4++’, then the reaction vector for the creation of NiBr2Py2 from NiBr4-- would be ‘0 0 2 0 -2 -1 1 0’. Note that Sivvu™ will not allow the user to continue until each reaction vector is balanced for mass and charge.

Once this vector is entered, simply type an initial guess for the thermodynamic values for that reaction in the appropriate box on the left, and click the radio button next to it to prepare to refine those values.
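The balance requirement can be illustrated with a small Python sketch. It is only an illustration of the idea, not the check that Sivvu™ itself performs. The component names (Ni, Br, Py, Li, BF4), the helper functions, and the reading of the tetrabromonickelate species as carrying a 2- charge (written ‘NiBr4--’) are assumptions made for this example.

```python
def charge_of(name):
    """Charge from trailing '+'/'-' characters, e.g. 'Ni++' -> 2, 'Cl-' -> -1."""
    stripped = name.rstrip("+-")
    suffix = name[len(stripped):]
    return suffix.count("+") - suffix.count("-")


def dot(u, v):
    """Sum of element-by-element products of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def is_balanced(reaction, mass_vectors, charges):
    """A reaction vector is balanced if it conserves every component and charge."""
    mass_ok = all(dot(reaction, row) == 0 for row in mass_vectors.values())
    return mass_ok and dot(reaction, charges) == 0


species = "BF4- Li+ Br- LiBr Py NiBr4-- NiBr2Py2 NiPy4++".split()
charges = [charge_of(name) for name in species]

# Hypothetical mass balance vectors: count of each component in each species.
mass_vectors = {
    "Ni":  [0, 0, 0, 0, 0, 1, 1, 1],
    "Br":  [0, 0, 1, 1, 0, 4, 2, 0],
    "Py":  [0, 0, 0, 0, 1, 0, 2, 4],
    "Li":  [0, 1, 0, 1, 0, 0, 0, 0],
    "BF4": [1, 0, 0, 0, 0, 0, 0, 0],
}

good = [0, 0, 2, 0, -2, -1, 1, 0]   # NiBr4-- + 2 Py -> NiBr2Py2 + 2 Br-
bad = [0, 0, 1, 0, -2, -1, 1, 0]    # one bromide short: unbalanced

print(is_balanced(good, mass_vectors, charges))
print(is_balanced(bad, mass_vectors, charges))
```

The first vector conserves every component and the total charge; the second loses one bromide and one unit of negative charge, so it would be rejected.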
Recommendation: Even though thermodynamically it does not make a difference how the chemical species are reacted with each other, some reaction schemes may be more logically advantageous. Since Sivvu™ has no prior knowledge of the chemical species, it is advisable when entering multiple chemical reactions to make them linear in relation to each other (A → B, B → C, C → D, etc.) rather than branching (A → B, A → C, A → D, etc.) whenever possible. This will help eliminate any arbitrary confusion between different species.

Important Note: Dissociative reactions (A → B + C) are problematic for the program because they imply a hidden mass equation (B = C), and therefore should not be used. They can be completely avoided by simply setting up the input file with more fundamental reagents that do not dissociate. For example, rather than putting the concentration of reagent ‘A’ into the input file, replace it with a row for ‘B’ and another row for ‘C’. Then the associative reaction (B + C → A) can be entered into Sivvu™ as a reaction vector.

Note: If the initial guesses at the thermodynamic values lead to concentration profiles that are totally flat, then it may be helpful to input different guesses to ensure that the model is initially sensitive to the thermodynamic values.

vi. Simulation

Once a consistent chemical model is constructed, pressing the ‘Simulation’ button will open up a new Simulator figure in which the user can see the concentration profiles of all the chemical species in accord with the ΔG°rxn values that were input. In the Simulator, the concentration profiles can be compared with a model-free result, and the impact of the ΔG°rxn values can be seen. This latter set of values quantifies how sensitive the concentration profiles are to the ΔG°rxn values themselves. This is important with non-linear dependence like that of equilibrium concentrations on equilibrium constants.
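The non-linear dependence can be seen in a simple Python sketch for a hypothetical 1:1 association, A + B ⇌ AB. It is not Sivvu™ code: the reaction, the concentrations, and the function names are assumptions chosen for illustration, and the relation K = exp(-ΔG°/RT) is applied without activity corrections.

```python
import math

R = 8.314  # gas constant in J/(mol K)


def equilibrium_constant(delta_g_kj, temperature=298.0):
    """K = exp(-dG/(R T)), with dG given in kJ/mol."""
    return math.exp(-delta_g_kj * 1000.0 / (R * temperature))


def product_fraction(delta_g_kj, a_total, b_total, temperature=298.0):
    """Fraction of A converted to AB at equilibrium for A + B <=> AB."""
    k = equilibrium_constant(delta_g_kj, temperature)
    # K = x / ((a - x)(b - x)) rearranges to K x^2 - (K(a + b) + 1) x + K a b = 0;
    # the smaller root is the physically meaningful one.
    s = k * (a_total + b_total) + 1.0
    x = (s - math.sqrt(s * s - 4.0 * k * k * a_total * b_total)) / (2.0 * k)
    return x / a_total


def sensitivity(delta_g_kj, a_total=0.1, b_total=0.1):
    """Change in product fraction when dG is raised by 1 kJ/mol."""
    return abs(product_fraction(delta_g_kj + 1.0, a_total, b_total)
               - product_fraction(delta_g_kj, a_total, b_total))


moderate = sensitivity(-10.0)  # partially formed product
extreme = sensitivity(-40.0)   # product essentially fully formed
print(moderate, extreme)
```

With these made-up numbers, a 1 kJ/mol change shifts the product fraction far more at -10 kJ/mol than at -40 kJ/mol, where the reaction is essentially complete. The Simulator presents the corresponding comparison for the user's actual model.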
All output for these calculations is stored in a Matlab variable structure called ‘Sivvu_sim’.

Figure 5: Simulation Interface

D. Process Buttons: Save, Optimize, Copy, & Cancel

‘Save’ will first check over all of the vectors to make sure they are all of the same length, and notify the user if any of the chemical reactions are not balanced. If corrections are required, the save protocol will be aborted. The user must make the corrections and try to save again. The program will then pull in all of the information entered, place it in a Matlab variable structure called ‘Sivvu_input’, and create a file with this information in it, which is saved in the current directory. The file created has the extension ‘.mat’ and its name is the same as that of the data file. Once this file is saved, Sivvu™ will load it every time it processes the data file of the same name. To modify it, simply load it into Sivvu™, make the necessary changes and save the file again. The .mat file will be overwritten.

‘Optimize’ initiates several calculations on the data. The concentration profiles of all the species listed are calculated from the initial thermodynamic data provided (ΔG°rxn’s). Then the spectra of the absorbing species to be refined are determined with a least squares fitting protocol. Finally, the Sivvu™ Data Input GUI will be replaced with the Sivvu™ Control GUI. The ‘Continue’ button protocol will also automatically save the user input in the same manner as the ‘Save’ button if and only if the .mat file does not yet exist.

‘Copy’ places a figure file of the Sivvu™ window on the computer clipboard, from where it can be pasted into other programs.

‘Cancel’ closes Sivvu™ and clears all variable information from the Matlab environment. Any unsaved information will be lost.

III. The Sivvu™ Control Graphical User Interface

Figure 6 shows six views of the Control GUI, which contains a set of user controls on the left, two sets of axes, and some key information in the bottom left.
The concentration profiles will be immediately displayed on the main axes. The bottom set of axes always shows a bar graph of the root-mean-square residual for individual solutions or individual wavelengths, for both the equilibrium-restricted fit and the model-free fit. At this point several options are available.

Figure 6: Six examples of the plot window in the Sivvu™ Control GUI. Notice the impact of the ignore solutions entries in the concentration plot on the top right.

A. User Control Buttons: Optimize ΔG° Values, Modify Model, Save ΔG° Values, Calculate Error, Write Report & Copy Page

The ‘Optimize ΔG° Values’ button in the upper left corner of the screen begins the main function of the program: it will optimize the thermodynamic values (ΔG°rxn’s), starting with the initial guesses and narrowing in on the values that produce the least error between the model and the data. Specifically, the root-mean-square of the residuals (the differences between the observed and calculated values) is minimized. The values will then be placed in a Matlab structure called ‘Sivvu_result.RefinedGvalues’. This process will normally take less than a minute, as Sivvu™ can typically check 10 sets of thermodynamic parameters per second. The only output during the optimization will be visible in the Matlab command window. When the optimization is complete, the absorptivity values will automatically be plotted on the main graph.

To end the optimization procedure prematurely, go to the command window and simultaneously press ‘Ctrl’ and ‘C’. This will terminate all procedures running in Matlab.

The sliders at the top left of the interface allow the user to adjust the tolerances on the search for the minimum residual. Sivvu™ will optimize the ΔG°rxn’s until the optimal values are constant within the specified tolerance and the root-mean-square residual is minimized within its tolerance.

If you want the calculations to go even faster, select the ‘Suppress Output’ check box in the upper left corner.
No running output will be written to the command line, and so the calculation goes about twice as fast.

The ‘Optimal Non-negativity’ checkbox toggles between two methods for calculating the molar absorptivity values at each wavelength. The optimal method uses a Matlab function for non-negative least squares, which is slow but can lead to a better-behaved model, especially while the model is being refined. Alternatively, the unrestricted least squares solution is found, and then any negative values are replaced with zeros. In that case the non-negativity is enforced via truncation and not optimization. The latter is much faster. Fortunately, both methods can lead to the best model, often with no detectable difference. However, it is often useful to check both ways. If the initial RMS residual value is very high, then switching to optimal non-negativity can often be helpful.

The ‘Calculate Error’ button launches an extensive protocol to calculate the standard deviation in the optimized free energies in four different ways.

First, the ΔG°rxn values are re-optimized repeatedly (40 times) with 50% of the wavelengths randomly ignored. The standard deviation of the re-optimized values for each free energy value is then calculated. The running output in the command line is suppressed during these calculations to save time. Note: the non-negativity setting is carried through here.

Second, the ΔG°rxn values are re-optimized repeatedly (40 times) after random, normally distributed error is added to the dataset (0.0002 absorbance units is typical for modern bench-top spectrometers). The standard deviation of the re-optimized values for each free energy value is then calculated. From both standard deviation values, the 95% confidence value is calculated assuming a Student’s t-distribution.

Next, independent error is calculated by sequentially changing each ΔG°rxn to a value one kJ/mol greater and one kJ/mol less than the optimized value, and calculating the associated increase in the RMS residual.
This is a measure of how much each ΔG°rxn independently affects the quality of fit.

Finally, the dependent error is calculated in the same way as the independent error, except that the unchanged ΔG°rxn values are re-optimized before the increase in RMS residual is calculated. This then measures how much a ΔG°rxn, in conjunction with the other values, affects the quality of fit.

The results of all error calculations are written to the command line and stored in ‘Sivvu_result’. They are also included when a report is generated.

The ‘Save ΔG° Values’ button saves any newly refined thermodynamic values into the .mat file of the corresponding data set. The next time the data file is loaded into the Sivvu™ Data Input GUI, the refined values will be in place of the original initial guesses, so the program will start any new refinement with more accurate values.

At any time the ‘Modify Model’ button will send the user back to the Sivvu™ Data Input GUI. Initially, all of the data stored in the .mat file for the system will be loaded, and any necessary changes can be made, or a new data file can be loaded. Note: Pressing the ‘Modify Model’ button will clear the ‘Sivvu_result’ variable structure, and thus lose any optimized ΔG°rxn values and error calculation results that have not been saved.

The ‘Write Report’ button will create an Excel spreadsheet containing relevant information about the thermodynamic model, the thermodynamic values and their standard deviations (if error calculations have been done), and the molar extinction coefficients for the absorbing species. It will be located in the working directory. The name will be the original data file name with the suffix ‘report’.

The ‘Copy Page’ button will copy a screenshot of the Control GUI onto the Windows clipboard, from where it can be pasted into other Windows programs.

B.
Plots: Absorbances, Concentration, Absorptivity, Solution Fits, Residual, & Residual Factors

There are many options for plots to show on the large axes (see Figure 6 for examples), and many of the buttons produce alternative plots upon repeated calls.

‘Plot Absorbances’ will simply take the portion of the original data matrix that is used for the refinement and plot the absorbance data found there.

‘Plot Concentrations’ will plot the calculated equilibrium concentration profiles (concentration as a function of solution) for each of the absorbing species in the system; repeated calls on this button will change the x-axis to equivalents rather than solution number.

‘Plot Absorptivity’ will plot the molar absorptivity values; repeated calls cycle through individual chemical species.

‘Plot Solution Fits’ will display the observed and calculated data for individual data curves; repeated calls on this button will cycle through all of the curves in the dataset.

‘Plot Residuals’ will calculate all the residuals in the system and display them as a contour map. Note: due to the slowness of this plotting feature for large datasets, the data is filtered. Repeated calls on this button will cycle through all filter sets.

‘Plot Residual Factors’ will generate a plot of the mathematical factors in the residual matrix. The blue line is always the most significant and points to the primary deficiency in the model.

C. Ignore Solutions

If the bar graph of residual per solution at the bottom reveals that one or more solutions have an unusually high amount of residual relative to the others, Sivvu™ provides the option to ignore these solutions in its calculations. Type the number or numbers of the solutions to be ignored into the ignore solutions edit box, with spaces between, and press enter. From that point on, Sivvu™ will leave these solutions out of all calculations. Therefore optimization of thermodynamic values will occur with these solutions ignored.
To reinstate the solution(s), simply clear the ignore solutions edit box and press enter, and Sivvu™ will be ready to optimize with that data included.

IV. Calculations

A. Model-free Calculations

The factor significance is determined using singular value decomposition, which takes advantage of the fact that every n × p matrix of real numbers, M, can be factored into three new matrices:

$$M = U S V^{*}$$

where U is an n × n unitary matrix ($U^{*}U = I$, where $U^{*}$ is the conjugate transpose of U), V is a p × p unitary matrix, and S is an n × p nonnegative diagonal matrix (its only nonzero entries are on the diagonal). The diagonal values of S are a unique set of n or p (whichever is smaller) ranked nonnegative numbers called the singular values, and they correspond to the additive significance of the columns of the U and V matrices.

The singular values are the starting point for breaking down the data. The sum of all the singular values corresponds to all of the information (signal and error) in the dataset. The contribution of the m largest singular values corresponds to the fraction of the data that can be accounted for with m factors. For random error, the decrease in a list of singular values is characteristically slow (~10%). If there are meaningful signals in the data, the list of singular values will start off quite high and eventually drop (often precipitously) once all the significant additive features have been accounted for. After the drop, the singular values decrease as they would for random numbers. See Figure 3, top left, for an example of this.

Ultimately, we want to factor the data matrix into just two matrices, one n × m and the other m × p, because this corresponds to the molar absorptivity and concentration factors in Beer’s law. However, there are many solutions to this factorization over the set of real numbers.
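The singular value analysis and the non-uniqueness of the two-matrix factorization can be illustrated with a small numerical sketch. This is not Sivvu™ code: it is written in Python with NumPy, and the two-species dataset, the band shapes, the noise level, and the mixing matrix `T` are all made-up assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical system: n wavelengths, p solutions, m absorbing species.
n, p, m = 50, 12, 2
wavelengths = np.linspace(0.0, 1.0, n)

# Made-up Gaussian absorptivity bands (n x m) and linear
# concentration profiles (m x p); both are assumptions.
eps = np.column_stack([
    np.exp(-((wavelengths - 0.3) / 0.10) ** 2),
    np.exp(-((wavelengths - 0.7) / 0.15) ** 2),
])
frac = np.linspace(0.0, 1.0, p)
conc = np.vstack([1.0 - frac, frac])

# Synthetic absorbance data with a small amount of random noise.
data = eps @ conc + 1e-3 * rng.standard_normal((n, p))

# Singular values are returned ranked from largest to smallest.
U, s, Vh = np.linalg.svd(data, full_matrices=False)

# Fraction of the data accounted for by the m largest singular values.
fraction = s[:m].sum() / s.sum()

# Non-uniqueness: any invertible m x m matrix T yields a different
# pair of factors with exactly the same product.
T = np.array([[2.0, 1.0], [0.5, 3.0]])
eps_alt = eps @ T
conc_alt = np.linalg.inv(T) @ conc
same_product = np.allclose(eps @ conc, eps_alt @ conc_alt)
has_negative = bool((conc_alt < 0).any())

print("leading singular values:", np.round(s[:4], 4))
print("fraction from first", m, "factors:", round(float(fraction), 4))
print("alternative factors give same product:", same_product)
print("alternative concentrations contain negatives:", has_negative)
```

In this sketch the first two singular values stand well above the rest, which decay slowly as expected for noise. The alternative factor pair reproduces the data equally well but contains negative concentrations, so it is not chemically sensible.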
Requiring the matrices to consist entirely of non-negative numbers, which is true of all chemically sensible answers, greatly reduces the number of optimal solutions. Furthermore, by restricting one of the matrices to contain factor traces that are unimodal (possessing only one maximum), which is true of the equilibrium concentration of species as a function of a single reaction coordinate, the number of optimal solutions can be reduced further, possibly even to one. This is the type of calculation that the model-free figure is based on. The starting point is derived from evolving factor analysis, which identifies when during the course of the titration each species appears and disappears. The final answer is achieved by iteratively calculating one matrix from the other and vice versa until self-consistency is achieved. Within each iteration, the non-negativity and unimodality constraints are imposed.3

The exact calculations for this within Sivvu™ are somewhat rudimentary, because they are designed to aid in establishing a viable chemical model and are not meant to be used as standalone results. More thorough calculations can be done, but they are quite time-consuming.

B. Equilibrium Restricted Calculations

Sivvu™ makes use of two major mathematical relationships in chemistry. The first is the Beer-Lambert law for spectroscopic data. Because it is an additive law, it can be expanded to include multiple species and is therefore directly amenable to factor analysis in all of its forms. It can be conveniently written in matrix form according to the following equation (the pathlength is accounted for by dividing it into the raw absorbance):4

$$\mathrm{Abs}_{n \times p} = \varepsilon_{n \times m}\, C_{m \times p} + R_{n \times p}$$

Abs = absorbance (n × p); ε = molar absorptivity (n × m); C = equilibrium activities (m × p); R = residual (n × p).
n = number of wavelengths; p = number of solution mixtures; m = number of pure chemical species in the system.

Figure 7: The Beer-Lambert Law in schematic matrix form.
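The matrix form of the Beer-Lambert law can be checked numerically with a few lines of Python and NumPy. The sketch below is not part of Sivvu™. The absorptivity values, concentrations, and pathlength are invented for illustration, and the least-squares call is simply one standard way to recover the absorptivity matrix when the concentration matrix is known.

```python
import numpy as np

# Hypothetical dimensions: n wavelengths, p solutions, m species.
n, p, m = 4, 6, 2

# Invented molar absorptivities (n x m), in M^-1 cm^-1.
eps = np.array([
    [1000.0, 50.0],
    [800.0, 200.0],
    [300.0, 600.0],
    [20.0, 900.0],
])

# Invented concentrations (m x p), in M: species 1 converts to species 2.
conc = np.array([
    [1.0e-3, 0.8e-3, 0.6e-3, 0.4e-3, 0.2e-3, 0.0],
    [0.0, 0.2e-3, 0.4e-3, 0.6e-3, 0.8e-3, 1.0e-3],
])

# Raw absorbance for an assumed 0.5 cm cell, then divided by the
# pathlength so that Abs = eps @ conc holds directly.
path_length_cm = 0.5
raw_absorbance = path_length_cm * (eps @ conc)
absorbance = raw_absorbance / path_length_cm  # n x p

# With conc known, each wavelength gives p linear equations in m
# unknowns; solve them all at once by least squares.
eps_fit = np.linalg.lstsq(conc.T, absorbance.T, rcond=None)[0].T

# Residual matrix R = Abs - eps_fit @ conc (zero here: no noise).
residual = absorbance - eps_fit @ conc

print("absorbance shape:", absorbance.shape)
print("recovered absorptivities match:", np.allclose(eps_fit, eps))
```

Because this synthetic dataset contains no noise, the residual matrix is zero to within rounding. With real data the same calculation leaves a nonzero residual.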
The second mathematical relationship is that of chemical potential, which leads to equilibrium concentrations. This relationship greatly restricts the model because the entire concentration matrix is now dependent on just a few independent values.

$$\Delta G^{\circ}_{rxn} = -RT\ln K - RT\ln A$$

$$\ln K = \sum n_i \ln M_{i,\,products} - \sum n_i \ln M_{i,\,reactants}$$

$$\ln A = \sum n_i \ln \gamma_{i,\,products} - \sum n_i \ln \gamma_{i,\,reactants}$$

Here γ is the activity coefficient, which is calculated from the concentration terms in a variety of ways (see section II.C.ii). For a given set of ΔG°rxn values, the equilibrium concentrations and activities are found that minimize the free energy, making ΔGrxn = 0.

The program begins by using the initial ΔG°rxn values for the equilibrium reactions to calculate initial equilibrium activities and concentrations that satisfy the mass and charge balance parameters.5 With matrices of absorbance data and concentrations, Sivvu™ can then directly optimize the molar absorptivity of each absorbing species at each wavelength by least squares minimization. This calculation amounts to solving (n × p) simultaneous linear equations for (n × m) molar absorptivity values, where p is the number of solutions, m is the number of species, and n is the number of wavelengths. Since m is often much smaller than p, this system of equations is largely over-determined, which means that it cannot be solved exactly and is instead solved in the least-squares sense, leaving a residual.

Now, having found spectral and concentration data for each species, Sivvu™ calculates the difference between the measured absorbance data and the product of the optimized molar absorptivity values with the calculated concentrations. The root-mean-square of these residuals is taken to ascertain the quality of fit for the data. Finally, Matlab searches for thermodynamic values that minimize the total root-mean-square residual.

Within the Matlab environment, Sivvu™ works with several variable structures for storing all of the important values used during the calculations.
‘Sivvu_data’ contains all of the important information from the input data file, and once populated this structure is not changed unless a new data file is called. ‘Sivvu_input’ contains user-defined information on the thermodynamic model, and ‘Sivvu_result’ contains the data actually used in calculations, trimmed down according to user specifications, as well as the output from the optimization calculations. ‘Sivvu_sim’ and ‘Sivvu_free’ are two additional variable structures used to store information relevant to the simulator and the model-free analysis, respectively. The contents of any variable structure can be viewed by typing its name in the Matlab command line. Several other variables also exist while Sivvu™ is being used. They include ‘filename’, which contains the name of the data file, and ‘track’, which records which buttons have been pressed in the User Control GUI.

C. Error Calculations

The primary error calculation is based on the residuals, that is, the point-by-point differences between the calculated and measured absorbance values. The squares of these values are averaged, and the square root of this average is presented as the total RMS error. This is the value that is minimized.

The program will also print out several other numbers that quantify the degree of fit. The unrestricted data reconstruction is the sum of the first m singular values of the data matrix divided by the sum of all of them. This represents the fraction of data that can be accounted for with m mathematical factors and no restrictions. The restricted data reconstruction is the sum of the singular values for the calculated data matrix divided by the sum of the singular values for the observed data matrix. This represents the fraction of data that can be accounted for with m factors that are forced to adhere to equilibrium concentrations according to the chemical model.
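The quantities defined in this section can be sketched end to end for a toy system. The Python/NumPy example below is an illustration under stated assumptions, not the Sivvu™ implementation: it assumes a single reaction A + B ⇌ AB with ideal activities (all γ = 1), only A and AB absorbing, an invented dataset, and a simple grid scan over ΔG° in place of a real optimizer. The function names `concentrations` and `fit` are hypothetical.

```python
import numpy as np

R = 8.314462618e-3  # gas constant in kJ/(mol K)
T = 298.15          # assumed temperature in K


def concentrations(dg_kj, a_total, b_total):
    """Equilibrium [A] and [AB] for A + B <=> AB, ideal activities."""
    k = np.exp(-dg_kj / (R * T))
    s = k * (a_total + b_total) + 1.0
    # Physically meaningful root of K(a - x)(b - x) = x.
    ab = (s - np.sqrt(s**2 - 4.0 * k**2 * a_total * b_total)) / (2.0 * k)
    return np.vstack([a_total - ab, ab])  # m x p, absorbing species only


def fit(dg_kj, data, a_total, b_total):
    """Least-squares absorptivities and total RMS residual for one dG."""
    conc = concentrations(dg_kj, a_total, b_total)
    eps = np.linalg.lstsq(conc.T, data.T, rcond=None)[0].T  # n x m
    calc = eps @ conc
    rms = np.sqrt(np.mean((data - calc) ** 2))
    return rms, calc


# Invented titration: fixed total A, increasing total B.
rng = np.random.default_rng(1)
n, p, m = 30, 15, 2
a_total = 1.0e-3
b_total = np.linspace(0.0, 3.0e-3, p)
x = np.linspace(0.0, 1.0, n)
eps_true = np.column_stack([
    1000.0 * np.exp(-((x - 0.35) / 0.15) ** 2),
    1500.0 * np.exp(-((x - 0.65) / 0.15) ** 2),
])
dg_true = -20.0  # kJ/mol, assumed
data = eps_true @ concentrations(dg_true, a_total, b_total)
data = data + 1.0e-3 * rng.standard_normal((n, p))

# Grid scan standing in for the optimizer: keep the dG with lowest RMS.
grid = np.arange(-30.0, -10.0 + 1e-9, 0.1)
rms_values = np.array([fit(dg, data, a_total, b_total)[0] for dg in grid])
dg_best = grid[np.argmin(rms_values)]
rms_best, calc = fit(dg_best, data, a_total, b_total)

# Unrestricted and restricted data reconstruction fractions.
s_obs = np.linalg.svd(data, compute_uv=False)
s_calc = np.linalg.svd(calc, compute_uv=False)
unrestricted = s_obs[:m].sum() / s_obs.sum()
restricted = s_calc.sum() / s_obs.sum()

print("best dG (kJ/mol):", round(float(dg_best), 1))
print("total RMS error:", float(rms_best))
print("unrestricted reconstruction:", float(unrestricted))
print("restricted reconstruction:", float(restricted))
```

In this sketch the restricted reconstruction can never exceed the unrestricted one, because the equilibrium model only narrows the set of allowed m-factor descriptions of the data.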
All forms of factor analysis have the added benefit of reducing error because much of the error, which is distributed evenly, is left behind when only a few factors are used to reconstruct the data.6 Some error, called imbedded error, does remain however.

Appendix A: Data for Common Solvents

Name                          Structure          Density  BP     Dipole      Dielectric
                                                 (g/ml)   (°C)   moment (D)  constant
acetic acid                                      1.049    118    1.74        6.15
acetone                                          0.79     56     2.88        20.7
acetonitrile                                     0.786    81     3.92        36.6
benzene                                          0.879    80     0           2.28
1-butanol                     CH3CH2CH2CH2-OH    0.81     118    1.66        17.8
γ-butyrolactone                                  1.12     204    4.4         39.1
carbon tetrachloride          CCl4               1.594    76     0           2.24
dichloromethane               CCl2H2             1.3266   40     1.14        9.1
diethyl ether                 CH3CH2OCH2CH3      0.713    35     1.15        4.34
N,N-dimethylformamide (DMF)                      0.944    153    3.82        38.3
dimethyl sulfoxide (DMSO)                        1.092    189    3.96        47.2
ethanol                       CH3CH2-OH          0.789    78     1.69        24.3
ethyl acetate                                    0.894    78     1.78        6.02
formamide                                        1.13     210    3.73        109
formic acid                                      1.22     100    1.41        58
hexane                        CH3(CH2)4CH3       0.655    69     0           2.02
methanol                      CH3-OH             0.791    68     1.70        33
methyl ethyl ketone                              0.805    80     2.78        18.5
methylene chloride            CH2Cl2             1.326    40     1.60        9.08
1-propanol                    CH3CH2CH2-OH       0.803    97     1.68        20.1
tetrahydrofuran (THF)                            0.886    66     1.63        7.52
water                         H-OH               0.998    100    1.85        80

References

1 Malinowski, E.R. Factor Analysis in Chemistry, 3rd Ed. Wiley Interscience, 2002.
2 Hirose, K. Analytical Methods in Supramolecular Chemistry. Ed. C. Schalley, Wiley, 2007, chapter 2.
3 Gampp, H.; Maeder, M.; Meyer, C.J.; Zuberbuehler, A.D. Talanta, 1986, 33, 943-951.
4 Wallace, R.M. J. Phys. Chem., 1960, 64, 899-901.
5 Wall, T.W.; Greening, D.; Woolsey, R.E.D. Oper. Res., 1986, 34, 345-355.
6 Malinowski, E.R. Factor Analysis in Chemistry, 3rd Ed. Wiley Interscience, 2002, chapter 4.