Source Apportionment through Advanced Factor Analysis Modeling

Brian Diefendorf
Honors Thesis Proposal
The field of aerosol chemistry is very diverse and extensive. The goal of this project is to
analyze several sets of particle composition data such as data from the Regional Air Pollution
Study of St. Louis using Positive Matrix Factorization. With this approach, we seek to resolve
and distinguish the sources contributing to the measured particulate matter concentration at each
site. If successful, the project will yield a greater understanding of source contributions and a
more accurate basis for controlling particulate matter in the atmosphere.
The first National Ambient Air Quality Standards were established in 1971. These
standards set limits on allowable levels of particulate matter suspended in the air. However, these
standards only focused on total particulate mass and did not distinguish between the different
sizes of particles present in air. As such, control efforts were directed towards the larger particles,
as they comprise a larger proportion of the total particulate mass and are more easily removed
from the air. There is increasing evidence that the smaller particles pose a greater threat to human
health. These risks are mostly the result of respiratory problems that arise from the continued
inhalation of the particulate matter. Thus, it is important to control and regulate the inhalable
particles, those with diameters of ten μm or less. The current NAAQS for particles of 10 μm or less
in diameter is an annual arithmetic mean of 50 μg/m³ and a 24-hour average of 150 μg/m³.
Standards for particles in the 2.5 μm or less range have also been established. The standard calls
for an annual mean of 15 μg/m³ and a 24-hour average of 65 μg/m³. The measurement and
tracking of the relative emission rates from the sources would enhance the controls developed
from these standards. Nonetheless, the complexity of urban ecosystems makes determining the
sources of particulate matter a very difficult problem.
The fundamental principle behind source receptor relationships is the conservation of
mass and the application of a mass balance analysis. This analysis can be used to identify and
assign sources of airborne particulate matter in the atmosphere. This particular method is known
as receptor modeling. There are two approaches to obtaining a data set for receptor modeling.
One is to determine a large number of chemical constituents such as elemental concentrations in a
number of samples. The other is to use automated electron microscopy to characterize the
composition and shape of particles in a series of particle samples. Regardless of the approach
used, a mass balance can be made to account for all chemical species in the samples as
contributions from the independent sources.
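The mass balance described above is commonly written in the following form. The symbols are notation introduced here for illustration and do not appear in the original text: x_ij is the measured concentration of species j in sample i, g_ik is the mass contribution of source k to sample i, f_kj is the mass fraction of species j in the emissions of source k, e_ij is the unexplained residual, and p is the number of independent sources.

```latex
x_{ij} = \sum_{k=1}^{p} g_{ik}\, f_{kj} + e_{ij},
\qquad i = 1,\dots,n, \quad j = 1,\dots,m
```

In matrix form this reads X = GF + E, where X holds n samples by m species.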
Natural physical constraints exist on the system and must be considered in developing
any model and obtaining physically realistic solutions from the model [Henry, 1991]. These
fundamental, natural constraints are:
1) The model must reproduce the original data; the model must explain the
observations.
2) The predicted source compositions must be non-negative; a source cannot have a
negative percentage of an element.
3) The predicted source contributions to the aerosol concentrations must all be
non-negative; a source cannot emit negative mass.
4) The sum of the predicted elemental mass contributions for each source must be less
than or equal to the total measured mass for each element; the whole is greater than
or equal to the sum of its parts.
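As a rough illustration, the four constraints above can be checked numerically for a candidate solution. The sketch below assumes a measured data matrix X (samples by species), a contribution matrix G (samples by sources), and a profile matrix F (sources by species). The function name, the tolerances, and the use of a relative norm for constraint 1 are assumptions made for this example, not part of Henry's formulation.

```python
import numpy as np

def check_constraints(X, G, F, rel_tol=0.1, abs_tol=1e-12):
    """Check the four natural constraints for a candidate solution.

    X: (n, m) measured concentrations, G: (n, p) source contributions,
    F: (p, m) source profiles. Tolerances are illustrative assumptions.
    """
    fitted = G @ F  # predicted concentrations summed over all sources
    return {
        # 1) the model must reproduce the data (relative misfit is small)
        "reproduces_data": bool(
            np.linalg.norm(X - fitted) <= rel_tol * np.linalg.norm(X)
        ),
        # 2) source compositions must be non-negative
        "profiles_nonnegative": bool(np.all(F >= 0)),
        # 3) source contributions must be non-negative
        "contributions_nonnegative": bool(np.all(G >= 0)),
        # 4) summed source contributions must not exceed the measured mass
        "sum_within_measured": bool(
            np.all(fitted <= X * (1 + rel_tol) + abs_tol)
        ),
    }
```

A solution that fails any of these checks is not physically realistic, even if it fits the data well in a purely statistical sense.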
There are several methods utilized to model such problems. Miller et al. (1972) initially
used a chemical element balance analysis to solve these problems. In this method, it is assumed
that the number and composition of the sources are known. The observed composition data is
then regressed against the known source profile matrix. This method has produced very good fits
to the data in recent studies. The major drawback in this method is that it requires an a priori
knowledge of both the number and composition of the source emissions.
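The regression step of a chemical element balance can be sketched as a weighted least-squares fit for a single sample. This is a simplified illustration and not the procedure of Miller et al.: it assumes the source profiles are known exactly, weights each species by an assumed measurement uncertainty, and does not enforce non-negative contributions. The function and variable names are hypothetical.

```python
import numpy as np

def cmb_fit(x, profiles, sigma):
    """Estimate source contributions for one sample by weighted least squares.

    x: (m,) measured species concentrations
    profiles: (p, m) known source profiles (mass fraction of each species)
    sigma: (m,) assumed measurement uncertainties for each species
    """
    # Scale each species equation by its uncertainty, then solve A g = b.
    A = profiles.T / sigma[:, None]  # (m, p)
    b = x / sigma                    # (m,)
    contributions, *_ = np.linalg.lstsq(A, b, rcond=None)
    return contributions
```

The need to supply `profiles` in advance is exactly the a priori knowledge that the method requires.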
Thus, it is necessary to estimate the number and composition of the sources as well as
their contributions to the measured particulate mass. The multivariate data analysis methods that
are used are referred to as factor analysis. The most common form of factor analysis is Principal
Components Analysis (PCA). The PCA results are usually calculated using an eigenvector
analysis of a correlation matrix [Hopke, 1985; Henry, 1991]. The PCA method utilizes a singular
value decomposition of the matrix. However, there are numerous problems that arise from using
the PCA method. Paatero and Tapper [1993] show that in PCA, there is a scaling of the data by
column or row. This scaling will lead to distortions in the analysis. They also show that the
optimum scaling of the data would be to scale each data point individually so that more
precise data points have greater influence on the solution than points with higher uncertainties.
However, point by point scaling results in a data matrix that cannot be reproduced by a
conventional factor analysis method based on the singular value decomposition. Therefore, it is
necessary to use a different form of factor analysis.
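The column scaling discussed above is visible in a minimal sketch of PCA computed from an eigenvector analysis of the correlation matrix. Standardizing each column is the scaling step: every value in a column is divided by the same number, regardless of how uncertain the individual measurement is. The function name is an assumption for this example.

```python
import numpy as np

def pca_correlation(X):
    """PCA from the eigenvectors of the correlation matrix of X (n, m)."""
    # Column scaling: center each species and divide by its standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = (Z.T @ Z) / (X.shape[0] - 1)      # (m, m) correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]     # reorder so the largest comes first
    return eigvals[order], eigvecs[:, order]
```

Because the eigenvalues of a correlation matrix sum to the number of variables, each eigenvalue divided by that number gives the fraction of the standardized variance carried by the corresponding component.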
Recently, a new technique known as Positive Matrix Factorization has been developed.
Positive Matrix Factorization (PMF) differs from previous analysis methods in that it is not
eigenvector based and it specifically addresses the problem of non-optimal scaling.
PMF utilizes error estimates of the data to provide optimum data
point scaling. This scaling is accomplished by treating the problem as a least-squares
problem. Initially the problem was solved iteratively using alternating least squares [Paatero and
Tapper, 1993]. In an early version of this approach, one of the matrices is taken as known and the
chi-squared is minimized with respect to the other matrix as a weighted linear-least-squares
problem. Then the roles of the matrices are reversed and the process is repeated. This reversal is
continued until convergence. The drawback to solving PMF in this fashion is that the process is
slow. In order to improve speed, each step in the iteration was changed so that modifications are
made to both matrices instead of only one [Paatero and Tapper, 1994]. Subsequently, a computer
program, PMF2, determines the joint solution.
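The alternating scheme described above can be sketched as follows. This is an illustrative approximation and not the PMF2 algorithm: it assumes a data matrix X with a matching matrix S of point-by-point uncertainties, solves each weighted linear least-squares subproblem with NumPy, and imposes non-negativity by simply clipping negative values to zero, which is a crude stand-in for the constrained solution. All names and defaults are hypothetical.

```python
import numpy as np

def weighted_q(X, S, G, F):
    """Chi-squared: sum of squared residuals scaled by their uncertainties."""
    return float(np.sum(((X - G @ F) / S) ** 2))

def als_sketch(X, S, p, n_iter=200, seed=0):
    """Alternate weighted least-squares fits of G (n, p) and F (p, m)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    G = rng.random((n, p))
    F = rng.random((p, m))
    for _ in range(n_iter):
        # Take F as known: fit each sample's contributions (a row of G).
        for i in range(n):
            A = F.T / S[i][:, None]
            b = X[i] / S[i]
            sol = np.linalg.lstsq(A, b, rcond=None)[0]
            G[i] = np.clip(sol, 0.0, None)  # crude non-negativity (assumption)
        # Reverse the roles: take G as known, fit each species column of F.
        for j in range(m):
            A = G / S[:, j][:, None]
            b = X[:, j] / S[:, j]
            sol = np.linalg.lstsq(A, b, rcond=None)[0]
            F[:, j] = np.clip(sol, 0.0, None)
    return G, F, weighted_q(X, S, G, F)
```

Each pass solves one small regression per row and one per column, which gives a sense of why the alternating approach is slow on large data sets.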
Now that the problem and the data analysis methods utilized have been identified, it is
necessary to determine a data set to examine PMF's analysis power. Data from the Regional Air
Pollution Study (RAPS) of St. Louis, MO makes an excellent choice for PMF analysis. From
May 1975 to April 1977, roughly 35,000 ambient aerosol samples were collected at 10 sampling
sites in and around the city of St. Louis (Goulding et al., 1981). Coarse and fine fractions of
particles were deposited on membrane filters utilizing dichotomous air samplers (Nelson, 1979).
The total mass of each sample was measured by β-gauge, and concentrations of up to
27 elements were determined by energy dispersive X-ray fluorescence analysis (Goulding et al., 1981).
This data set is not only large, but it has also been examined using numerous analysis
methods and thus provides an excellent opportunity to compare PMF's accuracy and
precision with those of other methods.
It is likely that additional data from Washington, DC with more complete elemental
analysis will also be examined as time permits. These data include organic and elemental carbon,
which were not measured in St. Louis.
Recent unpublished studies have suggested that these variables can be used to separate
diesel from spark ignition sources, which is a critical problem currently facing receptor modeling.
The Washington data will provide a good test to see if such an analysis can be duplicated
elsewhere.
References:
Alpert, D.J. and P.K. Hopke (1981) A Determination of the Sources of Airborne Particles
Collected During the Regional Air Pollution Study, Atmospheric Environment 15:675-687.
Chang, S.N., P.K. Hopke, G.E. Gordon and S.W. Rheingrover (1988) Target-Transformation
Factor Analysis of Airborne Particulate Samples Selected by Wind-Trajectory Analysis,
Aerosol Sci. Technol. 8:63-80.
Cobourn, W.G., and R.B. Husar (1982) Diurnal and Seasonal Patterns of Particulate Sulfur and
Sulfuric Acid in St. Louis, July 1977-June 1978, Atmospheric Environment 16:1441-1450.
Dzubay, T.G. (1980) Chemical Element Balance Method Applied to Dichotomous Sampler Data,
New York Academy of Sciences 338:126-144.
Goulding, F.S., J.M. Jaklevic and B.W. Loo (1978) Aerosol Analysis for the Regional Air
Pollution Study-Interim Report. EPA-600/4-78-034, U.S. Environmental Protection
Agency, Research Triangle Park, N.C.
Henry, R.C. (1991) Multivariate Receptor Models, In: Receptor Modeling for Air Quality
Management, P.K. Hopke, ed., Elsevier Science Publishers, Amsterdam, 117-147.
Hopke, P.K. (2000) A Guide to Positive Matrix Factorization
Hopke, P.K. (1985) Receptor Modeling in Environmental Chemistry, John Wiley & Sons, Inc.,
New York.
Hwang, C.S., K.G. Severin, and P.K. Hopke (1984) A Comparison of R- and Q-Modes in Target
Transformation Factor Analysis for Resolving Environmental Data, Atmospheric
Environment 18:345-352.
Jaklevic, J.M., R.C. Gatti, F.S. Goulding, B.W. Loo and A.C. Thompson (1981) Aerosol Analysis
for the Regional Air Pollution Study-Final Report. EPA-600/4-81-006, U.S.
Environmental Protection Agency, Research Triangle Park, N.C.
Karl, T.R. (1980) A Study on the Spatial Variability of Ozone and Other Pollutants at St. Louis,
Missouri, Atmospheric Environment 14:681-694.
Liu, C.K., B.A. Roscoe, K.G. Severin and P.K. Hopke (1982) The Application of Factor Analysis
to Source Apportionment of Aerosol Mass, Am. Ind. Hyg. Assoc. 43:314-318.