SIVVU™
Dr. Douglas Vander Griend
Michael DeVries
Department of Chemistry & Biochemistry
Calvin College
Grand Rapids, MI 49546-4403
Contact: Sivvu@calvin.edu
Outline
I. Getting Started
A. Overview
B. Key Features
C. Limitations
D. Loading the Program
II. The Sivvu™ Data Input Graphical User Interface
A. Data File
B. Inspection Buttons: Preview & Factors
C. Building a Chemical Model
i. Chemical Species
ii. Solvent
iii. Activity Coefficient Model
iv. Initial Spectra Guesses
v. Mass Balance and Chemical Reaction Equations
vi. Simulation Button
D. Process Buttons: Save, Optimize, Copy, & Cancel
III. The Sivvu™ Control Graphical User Interface
A. User Control Buttons: Optimize ΔG° Values, Modify Model, Save ΔG° Values,
Calculate Error, Write Report & Copy Page
B. Plots: Absorbance, Concentration, Absorptivity, Solution Fits, Residual, Residual
Factors
C. Ignore Solutions
IV. Calculations
A. Unrestricted Model Calculations
B. Equilibrium Restricted Model Calculations
C. Error Calculations
Appendix
1. Data for Solvents
I. Getting Started
A. Overview
Sivvu™ – the name says it all (UV-vis backwards). In a UV-vis spectrometer, molecular
species achieve chemical equilibrium and absorb light according to Beer’s Law. This program is
designed to start with absorbance data for a series of solutions and ultimately extract the
thermodynamic information that describes the chemical equilibria of the solutions, as well as the
molar absorptivity values of all the individual species in solution.
How does it work? The user must provide a set of chemical reactions that potentially govern all
the equilibria in the system along with the raw absorbance data and information on the makeup of
each solution. Starting from initial guesses for the ΔG°rxn values, Sivvu™ refines them in order to
best model the entire dataset – all in a matter of seconds.
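The additive Beer's Law relationship that this analysis relies on can be sketched in a few lines of Python. This is only an illustration, not Sivvu™ code: the function name `absorbance` and all numerical values are hypothetical.

```python
def absorbance(concentrations, absorptivities, pathlength_cm=1.0):
    """Beer's Law for a mixture: A(k) = l * sum_j eps_j(k) * c_j."""
    n_points = len(next(iter(absorptivities.values())))
    return [
        pathlength_cm * sum(absorptivities[name][k] * conc
                            for name, conc in concentrations.items())
        for k in range(n_points)
    ]

# Hypothetical molar absorptivities (M^-1 cm^-1) at two wavelengths.
eps = {"M": [5.0, 1.0], "ML": [2.0, 8.0]}
# Hypothetical equilibrium concentrations (M) in one solution.
conc = {"M": 0.06, "ML": 0.04}

spectrum = absorbance(conc, eps)
print(spectrum)
```

Sivvu™ works in the opposite direction: it starts from measured absorbance values and searches for the concentrations (via the ΔG°rxn values) and absorptivities that reproduce them.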
B. Key Features
• Easy to load data – one Excel based input file
• Easily transportable results – Excel files and clipboard images
• Models raw absorbance or effective absorptivity
• Graphical user interfaces
• Extensive ‘model-free’ analysis to aid in deconvolution of data
• Self regulation of thermodynamic model construction
• Ready simulation to aid in experimental design
• High speed analysis: full optimizations in seconds
• Calculates activity coefficients according to various literature models
• Calculates standard deviation and confidence limits for thermodynamic parameters
• Estimates dependent and independent sensitivity of optimized ΔG°rxn values
• Numerous inquiry buttons to help guide new users
C. Limitations
Sivvu™ is a powerful tool for studying solution phase thermodynamics and for obtaining
information about pure chemical compounds without the need to isolate them chemically. While a
great deal of information can be extracted from a dataset by modeling it, the calculations are not
magic. The program can only extract information as it exists in the dataset based on the
mathematical relationships of chemical equilibria and Beer’s Law. Furthermore, the amount of
information about a particular species is proportional to the total level of occurrence within the set of
solutions. Therefore, if it only exists at a very low level in all the solutions, then the uncertainty of
the molar absorptivity values for that species must be understood to be quite large. For a thorough
treatment of the power and limitations of such a method, please see Malinowski’s Factor Analysis in
Chemistry.1
Additionally, the resolution between two species may be poor within the dataset, particularly
if their chemical or spectroscopic behaviors are comparable. For example, if two species grow in
comparably over the course of a titration, their molar absorptivity values may be irresolvable. This
can be true of the reactant and product in dimerization reactions for example.
Finally, since the dependence of equilibrium concentration values on the equilibrium
constant is non-linear, there are regimes where concentration, and therefore absorbance values, are
insensitive to the equilibrium constant:
1. When the equilibrium constant or the initial concentrations are too large.
2. When the amount of product is too small (<20%) or too large (>80%).
In these regimes the equilibrium constants cannot be modeled precisely, because the data are too
insensitive to them.2
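A minimal Python sketch can make this insensitivity concrete. It assumes a simple 1:1 association (A + B ⇌ AB) with hypothetical total concentrations of 0.1 M each; the function names are illustrative and are not part of Sivvu™.

```python
import math

def product_conc(K, a0, b0):
    """Equilibrium [AB] for A + B <-> AB with association constant K.

    Solves x^2 - s*x + a0*b0 = 0 with s = a0 + b0 + 1/K, using the
    numerically stable form of the smaller root.
    """
    s = a0 + b0 + 1.0 / K
    return 2.0 * a0 * b0 / (s + math.sqrt(s * s - 4.0 * a0 * b0))

def sensitivity(log10_K, a0, b0, h=0.01):
    """Central-difference change in product fraction per unit of log10(K)."""
    limiting = min(a0, b0)
    hi = product_conc(10 ** (log10_K + h), a0, b0) / limiting
    lo = product_conc(10 ** (log10_K - h), a0, b0) / limiting
    return (hi - lo) / (2 * h)

for log_k in (-3, 1, 8):
    fraction = product_conc(10 ** log_k, 0.1, 0.1) / 0.1
    print(log_k, round(fraction, 4), round(sensitivity(log_k, 0.1, 0.1), 4))
```

With these assumed concentrations, the product fraction responds strongly to K only in the intermediate case; at very small or very large K the fraction is pinned near 0 or 1 and barely moves.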
D. Loading the Program
Matlab must already be installed. Four files are generally included with the program:
1. ‘Sivvu.zip’
2. ‘Sivvu Manual.doc’
3. ‘example.xls’
4. ‘example.mat’ (this file may not display the ‘.mat’ extension, but it will appear when
viewed from within Matlab).
Unzip the main folder into Matlab’s toolbox folder, which is where all m-files for Matlab are
typically grouped and stored. Place the example Excel and Matlab files into Matlab’s working
directory, which by default is called ‘work’, but can be changed from within the Matlab
environment. Note: the data file must be in whichever folder is designated as the working directory.
From within the Matlab environment, select ‘set path’ under the file menu to bring up a
pathway window. If a previous version of Sivvu™ has been installed, remove the folder from the
path directory, then add the newly unzipped folder back to the path directory. This will ensure that
all calls to m-files within Sivvu™ will find the proper versions.
To initialize Sivvu™ type ‘Sivvu’ into the command line of Matlab, then load in the data file
by typing the name of the file into the appropriate box in the data input GUI that appears. This can
also be done in one step by typing “Sivvu (‘filename’)” in the command line.
Example: Nickel(II) with 2,2’-bipyridine
The example file provided is based on work detailed in the paper "Detailed
Spectroscopic, Thermodynamic and Kinetic Characterization of Nickel(II) Complexes
with 2,2’-bipyridine and 1,10-Phenanthroline attained via Equilibrium-Restricted
Factor Analysis," Vander Griend, D. A.; Bediako, D. K.; DeVries, M. J.; DeJong N. A.;
Heeringa, L. P., Inorg. Chem. 2008, 47, 656-662.
To view it, make sure that both example files are in the working directory of
Matlab, and that the Sivvu m-files have been properly installed in the Matlab toolbox.
Then enter “Sivvu(‘example’)” into the command line. After a few seconds, the
interface window should appear with all the appropriate entries completed.
II. The Sivvu™ Data Input Graphical User Interface
The initial interface of the program allows the user to upload a data file, graph the data,
examine the mathematical structure of the data, build a thermodynamic model, and simulate the
effects of dilution/concentration or temperature change on the experiment. Click on ‘?’ buttons to
find brief explanations for the interface.
A. The Data File
Sivvu™ requires exactly one input file containing all relevant experimental data. It must be
the first worksheet of a Microsoft Excel spreadsheet (.xls), organized as shown in Figure 1. A typical
dataset will be a set of absorbance curves from a spectrophotometric titration, along with the
composition of the solution for each curve. However, any set of curves where the signal results from
the additive effect of one or more chemical species at equilibrium can potentially be modeled.
Besides the major block of data, there are four words that Sivvu™ will recognize in the first
column of the header rows of the spreadsheet:
1. Comment: indicates a row that will be totally ignored. It can be used multiple times.
2. Pathlength: Sivvu™ expects a number in the corresponding cell of column 2 which is
the experimental pathlength in units of cm. If this line is not included, the assumed
pathlength is 1 cm. The user may also input a distinct pathlength for each solution.
3. Solvent: Sivvu™ takes the contents of the corresponding cell of column 2 as the
solvent name. If the solvent is not recognized by the program, Sivvu™ will ask for its
density and dielectric constant, which are used for any activity coefficient
calculations.
4. Temperature: Sivvu™ takes the number in the corresponding cell of column 2 for the
experimental temperature in Kelvin. If this line is not included, the assumed
temperature is 298 K.
The header rows may be placed in any order in the spreadsheet, but must precede the data.
Comment:          This line is optional, and is ignored by the program.
Pathlength (cm):  1
Solvent:          toluene
Temperature (K):  298
M++               0.1         0.1         0.1         0.1         0.1
Ligand            0           0.04        0.06        0.5         0.7
BF4-              0.2         0.2         0.2         0.2         0.2
900               0.04369301  0.0470115   0.05090182  0.09089655  0.10069366
899               0.04439201  0.05059578  0.05434557  0.09110519  0.10145286
898               0.04186945  0.04922064  0.05312673  0.09252504  0.10289368
897               0.04229023  0.04666939  0.04944631  0.088114    0.10079306
896               0.03923234  0.04789623  0.04886965  0.08724912  0.0996298
…                 …           …           …           …           …

900               1           1           1           1           1
899               1           1           1           1           1
898               1           1           1           1           1
897               1           1           1           1           1
…                 …           …           …           …           …
Figure 1. Archetypal Excel Spreadsheet Input File for Sivvu™
After the header information, the first column must list reagent names followed by values for
the independent axis for the absorbance data. Each subsequent column corresponds to one chemical
solution. The first rows contain the composition, followed by the raw absorbance data. Figure 1
shows an example where the metal ion concentration is constant at 0.1 M for each solution, while the
ligand concentration varies from 0.0 to 0.7 M. It is recommended to have 10 solutions per equivalent
of titrant and hundreds of wavelengths. It is advantageous to order the reagents from least varying
downward, and then sort the composition with the smallest equivalents to the left. For
straightforward titrations, this means the first reagent listed should be the analyte, and the first
absorbance curve should be that of pure analyte.
The program is designed to use units of molarity because it is directly proportional to
absorbance. Other molar units of concentration can be used and the only substantial difference will
be that the optimized ΔG°rxn values will be slightly different; the absorptivity values should then be
understood to have corresponding units.
The absorbance data is listed with the energy axis in the first column. Sivvu™ will annotate
with wavelength in nanometers, but any scaling will work for the calculations.
The file should be saved into a folder which will be selected as the working directory of the
Matlab interface. Any references to the solutions within Sivvu™ will be to the ordinal position in the
data file. The only limit to the number of solutions in a data file is the standard limit of 256 columns
in an Excel worksheet.
Upon loading a data file, the interface will display the temperature, pathlength, solvent,
number of solutions and full wavelength range (see Figure 4).
One final set of information that can be passed to the program through the input file is a
block of numbers, identical in size to the data block. This block of numbers will be entry-by-entry
multiplied with the residuals before the root-mean square residual is calculated and therefore
functions as a weighting mask for the data. If desired, this block of numbers should be placed below
the data block with exactly one blank row in the spreadsheet in between. A duplicate energy scale
can be placed in column A in front of the weighting mask, but it is not required. If only one row of
numbers is placed below the data, then it will be used as a weighting mask as well, with a single
value being used to weight all the data in a column.
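The effect of the weighting mask can be sketched as follows. This is an illustrative Python outline, not the program's own code; the function name `weighted_rms` is hypothetical, and dividing by the total number of entries is an assumption about how the root-mean-square is normalized.

```python
import math

def weighted_rms(observed, calculated, mask=None):
    """RMS of residuals after entry-by-entry multiplication by a mask.

    Rows are wavelengths and columns are solutions. `mask` may be a full
    block (same shape as the data) or a single row (one weight per column).
    """
    n_rows, n_cols = len(observed), len(observed[0])
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            if mask is None:
                weight = 1.0
            elif isinstance(mask[0], (list, tuple)):
                weight = mask[i][j]      # full weighting block
            else:
                weight = mask[j]         # single row: one weight per solution
            total += (weight * (observed[i][j] - calculated[i][j])) ** 2
    return math.sqrt(total / (n_rows * n_cols))

obs = [[1.0, 2.0], [3.0, 4.0]]
calc = [[1.0, 1.0], [1.0, 4.0]]
print(weighted_rms(obs, calc))               # all entries weighted equally
print(weighted_rms(obs, calc, mask=[1, 0]))  # second solution masked out
```

A weight of zero removes an entry from the fit entirely, while weights between 0 and 1 reduce its influence.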
B. Inspection Buttons: Preview & Factors
Calculations can be restricted to an arbitrary energy range and step-size through the
appropriate inputs on the Data Input GUI. Step-size limitations are accomplished through averaging.
Sivvu™ will restrict the data and update the inputs to the closest possible values.
The ‘Preview’ button simply plots the absorbance data upon which all calculations will be
based in a new window, along with the composition data, for the user to inspect and verify.
Figure 2. Plot of absorbance data with composition data inset on log plot.
The ‘Factors’ button will run an extensive protocol which aims to model the mathematical
structure of the data without any chemical parameters. This is called model-free factor analysis. A
figure with six plots will appear.
Figure 3. Model-free analysis of raw absorbance data. See text for detailed explanation.
The upper-left plot graphs the significance of the additive factors in the data (either the raw
absorbance data or the effective molar absorptivity). A factor represents the portion of the data that
derives from a single unique contributor. The significance of a factor derives from how much of the
data can be additively reconstructed with it. The nth point in the plot is the ratio of the nth largest
factor to the (n+1)th factor, less one. Using this formulation, the values for the factors for a set of
random numbers (also plotted) will be nearly zero. Any non-random additive mathematical factors in
the data will have a significance that is substantially more than that of the next most significant
factor. Therefore, the number of factors in the data corresponds to the right-most value that is off the
base line. For some datasets, this graphical feature is quite distinct, for others it can only be inferred
with some uncertainty. The data used for Figure 3 clearly shows five significant factors because the
5th factor is over five times larger than the 6th, but after that all the factors are small and similar in
size, so their significance is close to zero. Note: if the molar absorptivity curve of a factor is
essentially, but not exactly, zero over the range of interest, its significance can be lost in the baseline.
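The significance formula described above can be sketched in Python. The sketch assumes that the "size" of each factor is its singular value (as described in Section IV); the singular values and the cut-off threshold below are hypothetical, chosen only to mimic a dataset with five factors.

```python
def factor_significance(singular_values):
    """Significance of the nth factor: s[n] / s[n+1] - 1 (factors ranked by size)."""
    s = sorted(singular_values, reverse=True)
    return [s[n] / s[n + 1] - 1.0 for n in range(len(s) - 1)]

def count_factors(significance, threshold):
    """Position of the right-most significance value that is off the baseline."""
    count = 0
    for n, value in enumerate(significance, start=1):
        if value > threshold:
            count = n
    return count

# Hypothetical singular values: five real factors followed by noise.
singular_values = [120.0, 45.0, 20.0, 8.0, 3.0, 0.5, 0.45, 0.42]
significance = factor_significance(singular_values)
n_factors = count_factors(significance, threshold=1.0)
print([round(v, 3) for v in significance], n_factors)
```

In practice the baseline is judged graphically against the random-number curve rather than with a fixed threshold, so the cut-off used here is only a stand-in for that visual judgment.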
The other two plots on the top of the figure depict the result if the same factor analysis is
carried out, not on the whole dataset as in the upper-left plot, but on a subset of the solutions. By
starting with the first solution, which obviously contains just 1 full factor, and adding additional
solutions in sequence, the evolution of factors in the dataset can be seen (upper-middle). Likewise
the backwards evolution can be seen by starting with the last solution and working backwards
(upper-right). These 2 plots can be used together to estimate the location within the dataset of
particular factors, which is often a good indication of their empirical formula when compared to the
composition of the solutions, which is plotted on the lower-left. Note this evolutionary factor
analysis requires that the compositions in the original data file be ordered appropriately along a
single reaction coordinate.
The final two plots may take a little time to appear, but that is because a self-consistent
calculation is running to find the best non-negative, unimodal profiles to model the data (lower-middle). This is the best purely mathematical way to approximate the concentration profiles of a
titration and will lead to a reasonable picture of the spectral signatures for each chemical species
(lower-right). Remember that this is all done based solely on the structure of the data without any
restrictions pertaining to the nature of chemical reactions.
All output for these calculations is stored in a Matlab variable structure called ‘Sivvu_free’.
C. Building a Chemical Model
The true power of the program comes in the ability to force the concentration profiles to
adhere to the strictures of equilibrium for chemical reactions. This requires the inputting of a set of
balanced chemical reactions to relate the reagents to any potential chemical species in the model.
i. Chemical Species
A list of the chemical species present in the solutions, listed as ions, is the first requirement
in constructing a thermodynamic model for the data. The equilibrium concentration of each species
listed here will be calculated as part of the fitting process. Therefore, in the chemical species edit
box, type in a name for all species to be used in the equilibrium reactions as well as all reagents
listed in the input file. The species names are only labels and need not represent the stoichiometry of
the compound. Separate the name of each species with a space. If desired, the charge on each species
may be indicated with the correct number of plus’s or minus’s (e.g. Ni++). This is required only for
using non-trivial activity models. The maximum limit at present is 12 species.
ii. Solvent
The identity of the solvent will have been harvested from the data file if it existed there. The
densities and dielectric constants for many solvents are already coded into the program (see
Appendix 1), but if another solvent is used a dialog box will pop up that allows the user to enter the
density and dielectric constant of any solvent desired.
iii. Activity Coefficient Model
An activity coefficient, γ, is the proportionality constant between concentration (ideal
activity) and true chemical activity, which is used in calculating equilibrium constants. There are
five activity coefficient models to choose from in Sivvu™, each of which uses a different equation to
calculate the activity coefficients. The first model is ‘none’, which simply assigns unity to all the
activity coefficients.
In the equations that follow, the variables A, B, Z, I, and a0 are used. Z is the charge vector. I
is the ionic strength of a solution. A and B are calculated at a given temperature, T, using the solvent
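density, d, and dielectric constant, ε, of the solvent. Their equations are:
$$A = 1823928\,\sqrt{\frac{d}{(\varepsilon T)^{3}}} \qquad B = \frac{50.3}{\sqrt{\varepsilon T}}$$
The equations for the other models are as follows.
Davies:
$$-\log\gamma = A Z^{2}\left(\frac{\sqrt{I}}{1+\sqrt{I}} - 0.3\,I\right)$$
Hückel:
$$-\log\gamma = A Z^{2}\,\frac{\sqrt{I}}{1 + B a_{0}\sqrt{I}}$$
Güntelberg:
$$-\log\gamma = A Z^{2}\,\frac{\sqrt{I}}{1+\sqrt{I}}$$
Scatchard:
$$-\log\gamma = A Z^{2}\,\frac{\sqrt{I}}{1 + 1.5\sqrt{I}}$$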
The a0 values, which correspond to solution radii, are taken from the ‘Azero’ edit box. The
Hückel model is the only one which uses a0 values, so if any other model is selected, the edit box for
the a0 values will be disabled and filled in with a zero vector of the correct length. Non-zero a0
values are only used for charged species. Some common a0 values are:
Ion   Co2+  Ni2+  Cl-  Br-  BF4-  NEt4+  Li+
a0    6     6     3    3    6     6      6
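The four non-trivial models can be sketched in Python as below. This is not Sivvu™ code. It assumes the constants take the form A = 1823928·√(d/(εT)³) and B = 50.3/√(εT), and that the ionic strength is I = ½ Σ c·Z². The function names and the solvent values used (water with d ≈ 0.997 g/mL and ε ≈ 78.4 at 298 K) are illustrative assumptions, not values taken from the program.

```python
import math

def debye_huckel_constants(density, dielectric, temperature):
    """A and B from solvent density, dielectric constant and temperature (assumed forms)."""
    eT = dielectric * temperature
    A = 1823928.0 * math.sqrt(density / eT ** 3)
    B = 50.3 / math.sqrt(eT)
    return A, B

def ionic_strength(concentrations, charges):
    """I = 1/2 * sum(c_i * Z_i^2)."""
    return 0.5 * sum(c * z * z for c, z in zip(concentrations, charges))

def activity_coefficient(model, z, I, A, B=0.0, a0=0.0):
    """gamma for one species of charge z at ionic strength I."""
    root = math.sqrt(I)
    if model == "none":
        return 1.0
    if model == "davies":
        term = root / (1.0 + root) - 0.3 * I
    elif model == "huckel":
        term = root / (1.0 + B * a0 * root)
    elif model == "guntelberg":
        term = root / (1.0 + root)
    elif model == "scatchard":
        term = root / (1.0 + 1.5 * root)
    else:
        raise ValueError("unknown model: " + model)
    return 10.0 ** (-A * z * z * term)

# Illustrative solvent values for water at 298 K (assumed, see lead-in).
A, B = debye_huckel_constants(density=0.997, dielectric=78.4, temperature=298.0)
# Hypothetical solution: 0.01 M of a 2+ cation with 0.02 M of a 1- anion.
I = ionic_strength([0.01, 0.02], [2, -1])
for model in ("none", "davies", "huckel", "guntelberg", "scatchard"):
    print(model, round(activity_coefficient(model, 2, I, A, B, a0=6.0), 4))
```

Neutral species (Z = 0) always get γ = 1 under every model, which is consistent with a0 values being used only for charged species.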
iv. Initial Spectra Guesses
The initial spectra box takes three different types of values. Each number in this vector
corresponds to a respective chemical species. If a species does not absorb, enter a zero into the box
for it. Any positive number indicates that the species absorbs and that the spectrum is to be refined
by Sivvu™. Finally, if the species absorbs and there is a solution amidst the data that corresponds
precisely to the curve of that species (at any concentration), enter the number of the solution, but as a
negative value.
This last feature allows the user to enter curves of species with known molar absorptivity
values. Sivvu™ will use them in the refinement, but not change the molar absorptivity values or the
concentrations of that solution. The program assumes that any unrefined solution consists of exactly
one absorbing species, and a warning message will appear if the input file is not consistent with this,
but the optimization can still be carried out.
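The three kinds of entries in the initial spectra vector can be summarized with a short Python sketch. The function name, the role labels and the species list are hypothetical and serve only to illustrate the convention described above.

```python
def interpret_initial_spectra(species, guesses):
    """Map each species to its role implied by the initial spectra vector."""
    roles = {}
    for name, value in zip(species, guesses):
        if value == 0:
            roles[name] = ("non-absorbing", None)
        elif value > 0:
            roles[name] = ("refine", None)
        else:
            # Negative entry: spectrum is fixed from the solution with that number.
            roles[name] = ("fixed", int(abs(value)))
    return roles

# Hypothetical species list and vector: the second species is taken from solution 1.
species = ["BF4-", "Ni++", "bipy", "Nibipy++"]
guesses = [0, -1, 0, 1]
roles = interpret_initial_spectra(species, guesses)
for name in species:
    print(name, roles[name])
```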
Figure 4. Sivvu™ Data Input GUI
v. Mass Balance and Reaction Vectors
After the chemical species box is filled in, the appropriate number of lines for the mass
balance equations and chemical reaction equations will be enabled. The total number of mass
equations needed is equal to the number of chemical species.
A mass balance equation simply defines the reagent constitution of each chemical species.
One mass balance equation is required for each reagent listed in the original data file. Therefore, a
mass balance vector edit box will be enabled with the name of a reagent in front of it. A vector with
the same length as the chemical species list must be entered with each value indicating the number
of reagent molecules in the corresponding chemical species. For example, if the list of chemical
species was ‘BF4- Cl- Fe+++ FeCl++ FeCl2+’, then the mass balance vector for the reagent Cl- would be ‘0 1 0 1 2’.
After these vectors are filled in, the mass action (reaction) vectors remain. This is how
chemical reactions are encoded. For each, a vector with the same length as the chemical species list
must be entered with each value indicating the stoichiometry of the corresponding chemical species
in an arbitrary chemical reaction. A positive number in the vector indicates a product in the reaction,
while a negative number indicates a reactant. For example, if the species list was ‘BF4- Li+ Br- LiBr Py
NiBr4-- NiBr2Py2 NiPy4++’, then the mass action vector for the creation of NiBr2Py2 from NiBr4-- would be ‘0 0 2 0 -2 -1 1 0’. Note that Sivvu™ will not allow the user to continue until each mass
action vector is balanced for mass and charge. Once this vector is entered, simply type an initial
guess for the thermodynamic values for that reaction in the appropriate box on the left and click the
radio button next to it to prepare to refine those values.
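The balance check on a mass action vector can be sketched in Python. This is only an outline of the idea, not the program's implementation. It reuses the iron chloride species list from the mass balance example; the mass balance vectors for BF4- and Fe+++ and both reaction vectors are assumptions added for illustration.

```python
def charge_of(name):
    """Charge implied by trailing plus or minus signs, e.g. 'Fe+++' -> 3."""
    stripped = name.rstrip("+-")
    signs = name[len(stripped):]
    return signs.count("+") - signs.count("-")

def is_balanced(reaction, mass_balance, charges):
    """True if the reaction vector conserves every reagent and the total charge."""
    mass_ok = all(sum(n * r for n, r in zip(row, reaction)) == 0
                  for row in mass_balance.values())
    charge_ok = sum(z * r for z, r in zip(charges, reaction)) == 0
    return mass_ok and charge_ok

species = ["BF4-", "Cl-", "Fe+++", "FeCl++", "FeCl2+"]
charges = [charge_of(name) for name in species]
mass_balance = {
    "BF4-":  [1, 0, 0, 0, 0],   # assumed
    "Cl-":   [0, 1, 0, 1, 2],   # from the example in the text
    "Fe+++": [0, 0, 1, 1, 1],   # assumed
}

good = [0, -1, -1, 1, 0]   # Fe+++ + Cl- -> FeCl++
bad = [0, -1, -1, 0, 1]    # Fe+++ + Cl- -> FeCl2+ (one chloride short)
print(is_balanced(good, mass_balance, charges), is_balanced(bad, mass_balance, charges))
```

A reaction vector passes when its dot product with every mass balance vector and with the charge vector is zero.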
Recommendation: Even though thermodynamically it does not make a difference how the
chemical species are reacted with each other, some reaction schemes may be more logically
advantageous. Since Sivvu™ has no prior knowledge of the chemical species, it is advisable when
entering multiple chemical reactions to make them linear in relation to each other (A → B, B → C,
C → D, etc.) rather than branching (A → B, A → C, A → D, etc.) whenever possible. This will help
eliminate any arbitrary confusion between different species.
Important Note: Dissociative reactions (A → B + C) are problematic for the program
because they imply a hidden mass equation (B = C) and therefore should not be used. They can be
completely avoided by simply setting up the input file with more fundamental reagents that do not
dissociate. For example, rather than putting the concentration of reagent ‘A’ into the input file,
replace it with a row for ‘B’ and another row for ‘C’. Then the associative reaction (B + C → A)
can be entered into Sivvu as a reaction vector.
Note: If the initial guesses at the thermodynamic values lead to concentration profiles that
are totally flat, then it may be helpful to input different guesses to ensure that the model is initially
sensitive to the thermodynamic values.
vi. Simulation
Once a consistent chemical model is constructed, pressing the ‘Simulation’ button will open
up a new Simulator figure in which the user can see the concentration profiles of all the chemical
species in accord with the ΔG°rxn values that were input. In the Simulator, the concentration profiles
can be compared with a model free result, and the impact of the ΔG°rxn values can be seen. This
latter set of values quantifies how sensitive the concentration profiles are to the ΔG°rxn values
themselves. This is important with non-linear dependence like that of equilibrium concentrations on
equilibrium constants. All output for these calculations is stored in a Matlab variable structure called
‘Sivvu_sim’.
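The kind of calculation behind a simulated concentration profile can be sketched in Python. The sketch assumes a single 1:1 reaction (M + L ⇌ ML), ideal behavior (no activity coefficients), and a hypothetical ΔG°rxn of -20 kJ/mol. The compositions follow the archetypal input file (0.1 M metal, 0 to 0.7 M ligand). None of the function names come from Sivvu™.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def equilibrium_constant(delta_g, temperature=298.0):
    """K = exp(-dG/(R*T)), with dG in kJ/mol."""
    return math.exp(-delta_g / (R * temperature))

def complex_conc(K, metal_total, ligand_total):
    """Equilibrium [ML] for M + L <-> ML (smaller root of the mass-action quadratic)."""
    s = metal_total + ligand_total + 1.0 / K
    return 2.0 * metal_total * ligand_total / (
        s + math.sqrt(s * s - 4.0 * metal_total * ligand_total))

delta_g = -20.0                      # hypothetical initial guess, kJ/mol
K = equilibrium_constant(delta_g)
metal_total = 0.1
ligand_totals = [0.0, 0.04, 0.06, 0.5, 0.7]

profile = []
for ligand_total in ligand_totals:
    ml = complex_conc(K, metal_total, ligand_total)
    profile.append((metal_total - ml, ligand_total - ml, ml))  # [M], [L], [ML]

for row in profile:
    print(["%.5f" % value for value in row])
```

Re-running such a sketch with a shifted ΔG°rxn shows how strongly, or how weakly, the profiles respond to that value.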
Figure 5: Simulation Interface
D. Process Buttons: Save, Optimize, Copy, & Cancel
‘Save’ will first check over all of the vectors to make sure they are all of the same length, and
notify the user if any of the chemical reactions are not balanced. If corrections are
required, the save protocol will be aborted. The user must make the corrections and
try to save again. The program will then pull in all of the information entered, place it
in a Matlab variable structure called ‘Sivvu_input’, and create a file with this
information in it, which is saved in the current directory. The file created has the
extension ‘.mat’ and its name is the same as that of the data file. Once this file is
saved, Sivvu™ will load it every time it processes the data file of the same name. To
modify it, simply load it into Sivvu™, make the necessary changes, and save the file
again. The .mat file will be overwritten.
‘Optimize’ initiates several calculations on the data. The concentration profiles of all the
species listed are calculated from the initial thermodynamic data provided (ΔG°rxn’s).
Then the spectra of the absorbing species to be refined are determined with a least
squares fitting protocol. Finally, the Sivvu™ Data Input GUI will be replaced with
the Sivvu™ Control GUI. The ‘Optimize’ button protocol will also automatically
save the user input in the same manner as the ‘Save’ button if and only if the mat file
does not yet exist.
‘Copy’ places a figure file of the Sivvu™ window on the computer clipboard from where it
can be pasted into other programs.
‘Cancel’ closes Sivvu™ and clears all variable information from the Matlab environment.
Any unsaved information will be lost.
III. The Sivvu™ Control Graphical User Interface
Figure 6 shows six views of the Control GUI, which contains a set of user controls on the
left, two sets of axes, and some key information in the bottom left. The concentration profiles will be
immediately displayed on the main axes. The bottom set of axes always shows a bar graph of the
root-mean-square residual for individual solutions or individual wavelengths, for both the
equilibrium-restricted fit and the model-free fit. At this point several options are available.
Figure 6: Six examples of the plot window in the Sivvu™ Control GUI. Notice the impact of
the ignore solutions entries in the concentration plot on the top right.
A. User Control Buttons: Optimize ΔG° Values, Modify Model, Save ΔG° Values, Calculate
Error, Write Report & Copy Page
The ‘Optimize ΔG° Values’ button in the upper left corner of the screen begins the main
function of the program; it will optimize the thermodynamic values (ΔG°rxn’s) starting with the
initial guesses and narrow in on the values that will produce the least error between the model and
the data. Specifically, the root-mean-square of the residuals (the difference between the observed
and calculated values) is minimized. The values will then be placed in a Matlab structure called
‘Sivvu_result.RefinedGvalues’. This process will normally take less than a minute as Sivvu™ can
typically check 10 sets of thermodynamic parameters per second. The only output during the
optimization will be visible in the Matlab command window. When complete the absorptivity values
will automatically be plotted on the main graph. To end the optimization procedure prematurely, go
to the command window and simultaneously press ‘Ctrl’ and ‘C’. This will terminate all procedures
running in Matlab.
The sliders at the top left of the interface allow the user to adjust the tolerances on the search
for the minimum residual. Sivvu™ will optimize the ΔG°rxn’s until the optimal values are constant
within the specified tolerance and the root-mean-square residual is minimized within its tolerance.
If you want the calculations to go even faster, select the ‘Suppress Output’ check box in the upper
left corner. No running output will be written to the command line and so the calculation goes about
twice as fast.
The ‘Optimal Non-negativity’ checkbox toggles between two methods for calculating the
molar absorptivity values at each wavelength. Optimization uses a Matlab function called nonnegative least squares, which is slow, but can lead to a better-behaved model, especially when the
model is refining. Alternatively, the unrestricted least squares solution is found, and then any
negative values are replaced with zeros. Thus the non-negativity is enforced via truncation and not
optimization. The latter is much faster. Fortunately, both methods can lead to the best model, often
with no detectable difference. However it is often useful to check both ways. If the initial RMS
Residual value is very high, then switching to optimal non-negativity can often be helpful.
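The difference between the two approaches can be illustrated with a small Python sketch for a single wavelength and only two absorbing species. It is a simplified stand-in for Matlab's non-negative least squares routine, not a copy of it: with two unknowns, the constrained optimum can be found by comparing the boundary solutions directly. The concentration matrix and absorbance values are hypothetical.

```python
import math

def least_squares_2(C, a):
    """Unrestricted least squares for two species via the normal equations."""
    m = [[sum(row[i] * row[j] for row in C) for j in range(2)] for i in range(2)]
    v = [sum(row[i] * y for row, y in zip(C, a)) for i in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(v[0] * m[1][1] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - v[0] * m[1][0]) / det]

def residual_norm(C, a, e):
    return math.sqrt(sum((sum(c * x for c, x in zip(row, e)) - y) ** 2
                         for row, y in zip(C, a)))

def truncated(C, a):
    """Non-negativity by truncation: clip negative absorptivities to zero."""
    return [max(0.0, x) for x in least_squares_2(C, a)]

def nonnegative_2(C, a):
    """Optimal non-negative solution for two species (boundary enumeration)."""
    e = least_squares_2(C, a)
    if min(e) >= 0.0:
        return e
    candidates = [[0.0, 0.0]]
    for j in range(2):
        num = sum(row[j] * y for row, y in zip(C, a))
        den = sum(row[j] ** 2 for row in C)
        cand = [0.0, 0.0]
        cand[j] = max(0.0, num / den)   # best fit using species j alone
        candidates.append(cand)
    return min(candidates, key=lambda c: residual_norm(C, a, c))

# Hypothetical concentrations (rows: solutions) and absorbances at one wavelength.
C = [[1.0, 0.9], [0.5, 0.6], [0.2, 0.3]]
a = [0.88, 0.62, 0.32]
e_trunc, e_opt = truncated(C, a), nonnegative_2(C, a)
print(e_trunc, residual_norm(C, a, e_trunc))
print(e_opt, residual_norm(C, a, e_opt))
```

In this example both methods set the first absorptivity to zero, but the optimal method also readjusts the second value, which gives a smaller residual than truncation alone.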
The ‘Calculate Error’ button launches an extensive protocol to calculate the standard
deviation in the optimized free energies in four different ways. First, ΔG°rxn values are re-optimized
repeatedly (40 times) with 50% of the wavelengths randomly ignored. The standard deviation of the
re-optimized values for each free energy value is then calculated. The running output in the
command line is suppressed during these calculations to save time. Note: the non-negativity setting
is carried through here.
Second, the ΔG°rxn values are re-optimized repeatedly (40 times) after random, normally distributed
error is added to the dataset (0.0002 absorbance units is typical for modern bench-top
spectrometers). The standard deviation of the re-optimized values for each free energy value is then
calculated. From both standard deviation values, the 95% confidence value is calculated assuming a
Student’s t-distribution.
Next, independent error is calculated by sequentially changing each ΔG°rxn to a value one
kJ/mol greater and one kJ/mol less than the optimized value, and calculating the associated increase in the
RMS residual. This is a measure of how much each ΔG°rxn independently affects the quality of fit.
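The independent error procedure can be outlined in Python. The sketch assumes that some function returning the RMS residual for a given list of ΔG°rxn values is available; here a hypothetical quadratic function `toy_rms` stands in for the real model, and all names and numbers are illustrative.

```python
def independent_error(rms_of, optimized, step=1.0):
    """Increase in RMS residual when each value is shifted by +step and -step.

    All other values are held at their optimized values (no re-optimization).
    """
    baseline = rms_of(optimized)
    increases = []
    for i in range(len(optimized)):
        pair = []
        for sign in (+1.0, -1.0):
            trial = list(optimized)
            trial[i] += sign * step
            pair.append(rms_of(trial) - baseline)
        increases.append(tuple(pair))
    return increases

def toy_rms(dg):
    """Hypothetical residual surface: steep in dg[0], shallow in dg[1]."""
    return 0.001 + 0.0005 * (dg[0] + 25.0) ** 2 + 0.00001 * (dg[1] + 10.0) ** 2

optimized = [-25.0, -10.0]           # hypothetical optimized values, kJ/mol
result = independent_error(toy_rms, optimized)
print(result)
```

A large increase indicates a well-determined value, while a small increase indicates that the fit is nearly indifferent to that ΔG°rxn.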
Finally, the dependent error is calculated in the same way as the independent error, except
that the unchanged ΔG°rxn values are re-optimized before the increase in RMS residual is calculated.
This then measures how much a ΔG°rxn in conjunction with the other values affects the quality of fit.
The results of all error calculations are written to the command line and stored in
‘Sivvu_result’. They are also included when a report is generated.
The ‘Save ΔG° Values’ button saves any newly refined thermodynamic values into the .mat
file of the corresponding data set. The next time the data file is loaded into the Sivvu™ Data Input
GUI, the refined values will be in place of the original initial guesses, so the program will start any
new refinement with more accurate values.
At any time the ‘Modify Model’ button will send the user back to the Sivvu™ Data Input
GUI. Initially, all of the data stored in the .mat file for the system will be loaded up, and any
necessary changes can be made, or a new data file can be loaded.
Note: Pressing the ‘Modify Model’ button will clear the ‘Sivvu_result’ variable structure and
thus lose any optimized ΔG°rxn values and error calculation results that have not been saved.
The ‘Write Report’ button will create an Excel spreadsheet containing relevant information
about the thermodynamic model, the thermodynamic values and their standard deviations (if error
calculations have been done), and the molar extinction coefficients for the absorbing species. It will
be located in the working directory. The name will be the original data file name with the suffix
‘report’.
The ‘Copy Page’ button will copy a screen shot of the Control GUI onto the Windows
clipboard that can be pasted into other Windows programs.
B. Plots: Absorbances, Concentration, Absorptivity, Solution Fits, Residual, & Residual
Factors
There are many options for plots to show on the large axes (see Figure 6 for examples), and
many of the buttons produce alternative plots upon repeated calls.
‘Plot Absorbances’ will simply take the portion of the original data matrix that is used for the
refinement and plot the absorbance data found there.
‘Plot Concentrations’ will plot the calculated equilibrium concentration profiles
(concentration as a function of solution) for each of the absorbing species in the
system; repeated calls on this button will change the x-axis to equivalents rather than
solution number.
‘Plot Absorptivity’ will plot the molar absorptivity values; repeated calls cycle through
individual chemical species.
‘Plot Solution Fits’ will display the observed and calculated data for individual data curves;
repeated calls on this button will cycle through all of the curves in the dataset.
‘Plot Residuals’ will calculate all the residuals in the system and display them as a
contour map. Note: because this plot is slow to draw for large datasets, the
data are filtered first. Repeated calls on this button will cycle through all filter sets.
‘Plot Residual Factors’ will generate a plot of the mathematical factors in the residual
matrix. The blue line is always the most significant and points to the primary
deficiency in the model.
C. Ignore Solutions
If the bar graph of residual per solution at the bottom reveals that one or more solutions have
an unusually high amount of residual relative to the others, Sivvu™ provides the option to ignore
these solutions in its calculations. Type in the number or numbers of the solutions to be ignored in
the ignore solutions edit box, with spaces between, and press enter. From that point on, Sivvu™ will
leave these solutions out of all calculations. Therefore optimization of thermodynamic values will
occur with these solutions ignored. To reinstate the solution(s), simply clear out the ignore solutions
edit box and press enter, and Sivvu™ will be ready to optimize with that data included.
IV. Calculations
A. Model-free Calculations
The factor significance is determined using singular value decomposition, which takes
advantage of the fact that every n × p matrix of real numbers, M, can be factored into three
matrices:
$$M = U S V^{*}$$
where U is an n × n unitary matrix ($U^{*}U = I$, where $U^{*}$ is the conjugate transpose of U), V is a
p × p unitary matrix ($V^{*}$ is its conjugate transpose), and S is an n × p non-negative diagonal
matrix (nonzero numbers appear only on the diagonal). The diagonal values of S are a unique set of n
or p (whichever is smaller) ranked non-negative numbers called the singular values, and they
correspond to the additive significance of the columns of the U and V matrices.
The singular values then are the starting point for breaking down the data. The sum of all the
singular values corresponds to all of the information (signal and error) in the dataset. The
contribution of the m largest singular values corresponds to the fraction of the data that can be
accounted for with m factors. For random error, the decrease in a list of singular values is
characteristically slow (~10%). If there are meaningful signals in the data, the list of the singular
values will start off quite high and eventually drop (often precipitously) once all the significant
additive features have been accounted for. After the drop, the decrease in the singular values will
behave as it would for random numbers. See Figure 3, top left, for an example of this.
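The behavior described above can be reproduced on synthetic data. The following is a minimal Python/NumPy sketch, not Sivvu™ code: the two Gaussian "spectra", the linear concentration profiles, and the noise level are all invented for illustration, and `factor_fractions` is a hypothetical helper name.

```python
import numpy as np

def factor_fractions(data):
    """Return the singular values and the cumulative fraction of their sum."""
    s = np.linalg.svd(data, compute_uv=False)  # sorted from largest to smallest
    return s, np.cumsum(s) / s.sum()

# Hypothetical dataset: 2 absorbing species, 50 wavelengths, 12 solutions.
rng = np.random.default_rng(0)
wavelengths = np.linspace(0.0, 1.0, 50)
spectra = np.column_stack([
    np.exp(-((wavelengths - 0.3) / 0.10) ** 2),
    np.exp(-((wavelengths - 0.7) / 0.15) ** 2),
])                                             # n x m
fraction = np.linspace(0.0, 1.0, 12)
conc = np.vstack([1.0 - fraction, fraction])   # m x p
noise = rng.normal(scale=1e-3, size=(50, 12))  # assumed random error
data = spectra @ conc + noise                  # n x p

singular_values, cumulative = factor_fractions(data)
# cumulative[m - 1] is the fraction of the data accounted for by m factors.
```

In this example the first two singular values carry nearly all of the signal, and the remaining ten form the slowly decreasing tail that is characteristic of random error.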
Ultimately, we want to factor the data matrix into just two matrices, one n × m and the other
m × p, because this corresponds to the molar absorptivity and concentration factors in Beer’s law.
However, there are many solutions to this factorization over the set of real numbers. Requiring the
matrices to consist of entirely non-negative numbers, which is true of all chemically sensible
answers, greatly reduces the number of optimal solutions. Furthermore, by restricting one of the
matrices to contain factor traces that are unimodal (possessing only one maximum), which is true of
the equilibrium concentration of species as a function of a single reaction coordinate, the number of
optimal solutions can be reduced further, possibly even to one. This is the type of calculation that
the model-free figure is based on. The starting point is derived from evolutionary factor analysis,
which identifies when during the course of the titration each species appears and disappears. The
final answer is achieved by iteratively calculating one matrix from the other and vice versa
until self-consistency is achieved. Within each iteration, the non-negativity and unimodality
constraints are imposed [3].
The exact calculations for this within Sivvu™ are somewhat rudimentary. This is because
they are designed to aid in the establishment of a viable chemical model and not be used as
standalone results. More thorough calculations can be done, but are quite time-consuming.
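The iterative scheme described in this section can be sketched as follows. This is a generic alternating least-squares illustration in Python/NumPy under simplifying assumptions, not the routine used inside Sivvu™: non-negativity is imposed by simple clipping rather than a true non-negative least-squares solve, the starting profiles are a made-up guess rather than the output of evolutionary factor analysis, and all function names are hypothetical.

```python
import numpy as np

def enforce_unimodal(trace):
    """Force a 1-D profile to rise to its maximum and then fall."""
    out = trace.copy()
    peak = int(np.argmax(out))
    for i in range(peak - 1, -1, -1):      # left of the peak: non-decreasing
        out[i] = min(out[i], out[i + 1])
    for i in range(peak + 1, len(out)):    # right of the peak: non-increasing
        out[i] = min(out[i], out[i - 1])
    return out

def alternating_least_squares(data, conc_guess, n_iter=200, tol=1e-12):
    """Factor data (n x p) ~= spectra (n x m) @ conc (m x p) iteratively."""
    conc = conc_guess.copy()
    previous = np.inf
    for _ in range(n_iter):
        # Spectra from concentrations, then clip negative values to zero.
        spectra = np.clip(data @ np.linalg.pinv(conc), 0.0, None)
        # Concentrations from spectra, clipped and then made unimodal.
        conc = np.clip(np.linalg.pinv(spectra) @ data, 0.0, None)
        conc = np.vstack([enforce_unimodal(row) for row in conc])
        rms = float(np.sqrt(np.mean((data - spectra @ conc) ** 2)))
        if abs(previous - rms) < tol:      # self-consistency reached
            break
        previous = rms
    return spectra, conc, rms

# Hypothetical noise-free two-species dataset (40 wavelengths, 10 solutions).
grid = np.linspace(0.0, 1.0, 40)
true_spectra = np.column_stack([np.exp(-((grid - 0.3) / 0.1) ** 2),
                                np.exp(-((grid - 0.7) / 0.1) ** 2)])
progress = np.linspace(0.0, 1.0, 10)
true_conc = np.vstack([1.0 - progress, progress])
data = true_spectra @ true_conc

# Assumed rough starting profiles with the right appearance order.
guess = np.vstack([(1.0 - progress) ** 2, progress ** 2])
spectra, conc, rms = alternating_least_squares(data, guess)
```

Because several factorizations can fit the data equally well, the profiles returned from a rough guess need not match the true ones; this is the same ambiguity that the constraints are meant to narrow.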
B. Equilibrium Restricted Calculations
Sivvu™ makes use of two major mathematical relationships in chemistry. The first is the
Beer-Lambert law for spectroscopic data. Because it is an additive law it can be expanded to include
multiple species and is therefore directly amenable to factor analysis in all of its forms. It can be
conveniently written in matrix form according to the following equation (the pathlength is accounted
for by dividing it into the raw absorbance) [4]:
$$\mathrm{Abs}_{n \times p} = \varepsilon_{n \times m}\, C_{m \times p} + R_{n \times p}$$
Abs = absorbance (n × p)
ε = molar absorptivity (n × m)
C = equilibrium activities (m × p)
R = residual (n × p)
n = number of wavelengths
p = number of solution mixtures
m = number of pure chemical species in the system
Figure 7: The Beer-Lambert Law in schematic matrix form.
The second mathematical relationship is that of chemical potential, which leads to equilibrium
concentrations. This greatly restricts the model because the entire concentration matrix is now
dependent on just a few independent values.
$$\Delta G^{\circ}_{rxn} = -RT \ln K - RT \ln A$$
$$\ln K = \sum n_i \ln M_{i,\mathrm{products}} - \sum n_i \ln M_{i,\mathrm{reactants}}$$
$$\ln A = \sum n_i \ln \gamma_{i,\mathrm{products}} - \sum n_i \ln \gamma_{i,\mathrm{reactants}}$$
Here M is the molar concentration of a species, n is its stoichiometric coefficient, and γ is its
activity coefficient, which is calculated from the concentration terms in a variety of
ways (see section II.C.ii). For a given set of ΔG°rxn values, the equilibrium concentrations and
activities are found that minimize the free energy, making ΔGrxn = 0.
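For the simplest case, a single association reaction A + B ⇌ AB in an ideal solution (all activity coefficients equal to 1, so ln A = 0), the relationship ΔG°rxn = -RT ln K can be solved in closed form. The sketch below is an illustration in Python, not Sivvu™ code: the function name, the 1 M standard state, the temperature, and the example concentrations are assumptions, and systems with several coupled reactions require a numerical free-energy minimization instead.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)

def association_equilibrium(delta_g, total_a, total_b, temperature=298.15):
    """Equilibrium [A], [B], [AB] (mol/L) for A + B <=> AB, ideal solution.

    delta_g is the standard free energy of reaction in J/mol (1 M standard
    state assumed), total_a and total_b are the analytical concentrations.
    """
    k = math.exp(-delta_g / (R * temperature))   # K = exp(-dG/RT)
    # Mass balance gives K*x^2 - (K*(a + b) + 1)*x + K*a*b = 0 with x = [AB];
    # the smaller root is the physically meaningful one.
    s = k * (total_a + total_b) + 1.0
    ab = (s - math.sqrt(s * s - 4.0 * k * k * total_a * total_b)) / (2.0 * k)
    return total_a - ab, total_b - ab, ab, k

# Hypothetical example: dG = -20 kJ/mol, 1 mM of each component.
a, b, ab, k = association_equilibrium(-20000.0, 1.0e-3, 1.0e-3)
```

The returned concentrations satisfy both the mass balance (a + ab equals the total of A) and the equilibrium expression ab / (a · b) = K.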
The program begins by using the initial ΔG°rxn values for the equilibrium reactions to
calculate initial equilibrium activities and concentrations that satisfy the mass and charge balance
parameters [5]. With matrices of absorbance data and concentrations, Sivvu™ can then directly
optimize the molar absorptivity of each absorbing species at each wavelength by least squares
minimization. This calculation amounts to solving (n × p) simultaneous linear equations for (n × m)
molar absorptivity values, where p is the number of solutions, m is the number of species, and n is
the number of wavelengths. Since m is often much smaller than p, this system of equations is largely
over-determined, which means that rather than being solved exactly, it is solved in the least-squares
sense, leaving a residual.
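The least-squares step can be illustrated with a short Python/NumPy sketch. It assumes the layout of Figure 7 (absorbance n × p, molar absorptivity n × m, concentrations m × p); the function name and the synthetic numbers are hypothetical, and this is not the code used inside Sivvu™.

```python
import numpy as np

def fit_absorptivity(absorbance, conc):
    """Least-squares molar absorptivities (n x m) for Abs ~= eps @ conc."""
    # Abs = eps @ conc  <=>  conc.T @ eps.T = Abs.T, the form lstsq expects.
    eps_t, *_ = np.linalg.lstsq(conc.T, absorbance.T, rcond=None)
    return eps_t.T

# Hypothetical noise-free example: 30 wavelengths, 2 species, 8 solutions.
rng = np.random.default_rng(1)
true_eps = rng.uniform(0.0, 5000.0, size=(30, 2))          # n x m
conc = np.vstack([np.linspace(1.0e-3, 0.0, 8),
                  np.linspace(0.0, 1.0e-3, 8)])            # m x p
absorbance = true_eps @ conc                               # n x p

eps = fit_absorptivity(absorbance, conc)
residual = absorbance - eps @ conc                         # n x p
```

With real, noisy data the residual matrix is not zero, and its size is what the subsequent steps evaluate.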
Now, having found spectral and concentration data for each species, Sivvu™ calculates the
difference between the measured absorbance data and the product of the optimized molar
absorptivity values with the calculated concentrations. The root-mean-square is taken of these
residuals to ascertain the quality of fit for the data. Finally, Matlab searches for thermodynamic
values that minimize the total root-mean-square residual.
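The complete cycle (equilibrium concentrations, least-squares absorptivities, RMS residual, search over ΔG°rxn) can be sketched for a hypothetical 1:1 host-guest titration. This Python/NumPy illustration is written under stated assumptions and is not Sivvu™ code: the solution is treated as ideal, the data are synthetic and noise-free, and a coarse scan followed by a golden-section search stands in for whatever optimizer Matlab applies.

```python
import math
import numpy as np

R = 8.314462618   # gas constant, J/(mol K)
T = 298.15        # assumed temperature, K

def concentrations(delta_g, total_h, total_g):
    """Ideal-solution equilibrium [H], [G], [HG] for H + G <=> HG (m x p)."""
    k = math.exp(-delta_g / (R * T))
    s = k * (total_h + total_g) + 1.0
    hg = (s - np.sqrt(s * s - 4.0 * k * k * total_h * total_g)) / (2.0 * k)
    return np.vstack([total_h - hg, total_g - hg, hg])

def rms_residual(delta_g, absorbance, total_h, total_g):
    """Fit absorptivities by least squares, then return the RMS residual."""
    conc = concentrations(delta_g, total_h, total_g)
    eps_t, *_ = np.linalg.lstsq(conc.T, absorbance.T, rcond=None)
    return float(np.sqrt(np.mean((absorbance - eps_t.T @ conc) ** 2)))

def golden_section(f, lo, hi, tol=1.0):
    """Minimize a one-dimensional function on [lo, hi] to within tol."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                      # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = f(c)
        else:                            # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

# Synthetic titration: 15 solutions, 40 wavelengths, true dG = -20 kJ/mol.
total_h = np.full(15, 1.0e-3)
total_g = np.linspace(0.0, 3.0e-3, 15)
grid = np.linspace(0.0, 1.0, 40)
eps_true = 1.0e4 * np.column_stack([
    np.exp(-((grid - 0.25) / 0.1) ** 2),
    np.exp(-((grid - 0.50) / 0.1) ** 2),
    np.exp(-((grid - 0.75) / 0.1) ** 2),
])
absorbance = eps_true @ concentrations(-20000.0, total_h, total_g)

def objective(delta_g):
    return rms_residual(delta_g, absorbance, total_h, total_g)

# Coarse scan to bracket the minimum, then refine within the bracket.
coarse = np.arange(-30000.0, -9000.0, 1000.0)
start = coarse[int(np.argmin([objective(dg) for dg in coarse]))]
best_delta_g = golden_section(objective, start - 1000.0, start + 1000.0)
```

The design choice worth noting is that the molar absorptivities are never search variables: they are recomputed by linear least squares for each trial ΔG°rxn, so only the thermodynamic values are searched nonlinearly.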
Within the Matlab environment, Sivvu™ works with several variable structures for storing
all of the important values used during the calculations. ‘Sivvu_data’ contains all of the important
information from the input data file, and once populated this structure is not changed unless a new
data file is called. ‘Sivvu_input’ contains user-defined information on the thermodynamic model,
and ‘Sivvu_result’ contains the data actually used in calculations, trimmed down according to user
specifications, as well as the output from the optimization calculations. ‘Sivvu_sim’ and
‘Sivvu_free’ are two additional variable structures used to store information relevant to the simulator
and the model free analysis, respectively. The contents of any variable structure can be viewed by
typing its name in the Matlab command line.
There are also several other variables that exist while Sivvu™ is being used. They include
‘filename’, which contains the name of the data file, and ‘track’, which records which buttons have
been pressed in the User Control GUI.
C. Error Calculations
The primary error calculation is based on the residuals – the point by point difference between
the calculated and measured absorbance values. The squares of these values are averaged. The
square root of this average is what is presented as total RMS error, and this is the value that is
minimized. The program will also print out several other numbers which quantify the degree of fit.
The unrestricted data reconstruction is the sum of the first m singular values of the data matrix
divided by the sum of all of them. This represents the fraction of data that can be accounted for with
m mathematical factors and no restrictions. The restricted data reconstruction is the sum of the
singular values for the calculated data matrix divided by the sum of the singular values for the
observed data matrix. This represents the fraction of data that can be accounted for with m factors
that are forced to adhere to equilibrium concentrations according to the chemical model.
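The three measures described above can be written compactly. The notation in this block is introduced here for illustration and is an assumption rather than the manual's own: $s_k^{obs}$ and $s_k^{calc}$ are the ranked singular values of the observed and calculated data matrices, and the RMS average is taken over all n × p data points.

```latex
\mathrm{RMS} = \sqrt{\frac{1}{n\,p}\sum_{i=1}^{n}\sum_{j=1}^{p}
    \left(\mathrm{Abs}^{calc}_{ij} - \mathrm{Abs}^{obs}_{ij}\right)^{2}}

\text{unrestricted data reconstruction} =
    \frac{\sum_{k=1}^{m} s^{obs}_{k}}{\sum_{k=1}^{\min(n,p)} s^{obs}_{k}}

\text{restricted data reconstruction} =
    \frac{\sum_{k} s^{calc}_{k}}{\sum_{k=1}^{\min(n,p)} s^{obs}_{k}}
```

Since the calculated data matrix is a product of an n × m and an m × p matrix, it has at most m nonzero singular values, which is why the two reconstruction values are directly comparable.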
All forms of factor analysis have the added benefit of reducing error because much of the
error, which is distributed evenly, is left behind when only a few factors are used to reconstruct the
data [6]. Some error, called imbedded error, does remain, however.
Appendix A: Data for Common Solvents
Name                          Structure         Density (g/mL)  BP (°C)  Dipole moment (D)  Dielectric constant
acetic acid                   (not shown)       1.049           118      1.74               6.15
acetone                       (not shown)       0.79            56       2.88               20.7
acetonitrile                  (not shown)       0.786           81       3.92               36.6
benzene                       (not shown)       0.879           80       0                  2.28
1-butanol                     CH3CH2CH2CH2-OH   0.81            118      1.66               17.8
γ-butyrolactone               (not shown)       1.12            204      4.4                39.1
carbon tetrachloride          CCl4              1.594           76       0                  2.24
dichloromethane               CCl2H2            1.3266          40       1.14               9.1
diethyl ether                 CH3CH2OCH2CH3     0.713           35       1.15               4.34
N,N-dimethylformamide (DMF)   (not shown)       0.944           153      3.82               38.3
dimethyl sulfoxide (DMSO)     (not shown)       1.092           189      3.96               47.2
ethanol                       CH3CH2-OH         0.789           78       1.69               24.3
ethyl acetate                 (not shown)       0.894           78       1.78               6.02
formamide                     (not shown)       1.13            210      3.73               109
formic acid                   (not shown)       1.22            100      1.41               58
hexane                        CH3(CH2)4CH3      0.655           69       0                  2.02
methanol                      CH3-OH            0.791           68       1.70               33
methyl ethyl ketone           (not shown)       0.805           80       2.78               18.5
methylene chloride            CH2Cl2            1.326           40       1.60               9.08
1-propanol                    CH3CH2CH2-OH      0.803           97       1.68               20.1
tetrahydrofuran (THF)         (not shown)       0.886           66       1.63               7.52
water                         H-OH              0.998           100      1.85               80
References
[1] Malinowski, E.R. Factor Analysis in Chemistry, 3rd Ed. Wiley Interscience, 2002.
[2] Hirose, K. Analytical Methods in Supramolecular Chemistry. Ed. C. Schalley, Wiley, 2007, chapter 2.
[3] Gampp, H.; Maeder, M.; Meyer, C.J.; Zuberbuehler, A.D. Talanta, 1986, 33, 943-951.
[4] Wallace, R.M. J. Phys. Chem., 1960, 64, 899-901.
[5] Wall, T.W.; Greening, D.; Woolsey, R.E.D. Oper. Res., 1986, 34, 345-355.
[6] Malinowski, E.R. Factor Analysis in Chemistry, 3rd Ed. Wiley Interscience, 2002, chapter 4.